
BLU Discuss list archive


[Discuss] What's the best site-crawler utility?

Bill Horne wrote:
> I need to copy the contents of a wiki into static pages, so please
> recommend a good web-crawler that can download an existing site into
> static content pages. It needs to run on Debian 6.0.

Remember when I wrote that wikis have a spate of problems? This is the 
biggest one: there's no way to dump a MediaWiki in a humanly-readable 
form. There just isn't.

The best option is usually the dumpBackup.php maintenance script, which 
dumps the database as XML that you then have to parse somehow. This 
requires shell access on the server, and it dumps everything, wiki 
markup included; there's no way to exclude it.
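If you do have shell access, the dump step itself is short. A minimal sketch, assuming the wiki lives in /var/www/wiki (the path and output name are placeholders; adjust for your install):

```shell
# Run from the MediaWiki installation directory (path is an assumption).
cd /var/www/wiki

# Name the dump file after today's date.
OUT=wiki-dump-$(date +%Y%m%d).xml

# Dump the current revision of every page as XML.
# Use --full instead of --current to include the whole revision history.
php maintenance/dumpBackup.php --current > "$OUT"
```

The resulting XML still contains raw wiki markup inside each <text> element, so converting it to readable static pages is a separate parsing job.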

Method number two is the Special:Export page, if it hasn't been 
disabled. It can export multiple pages at once, but every page title 
must be listed explicitly in the export. The output is essentially the 
same as dumpBackup.php's, just page by page instead of the whole 
database.
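Special:Export also answers plain HTTP POSTs, so the page-by-page export can be scripted. A sketch, assuming a hypothetical wiki at wiki.example.org and two placeholder page titles:

```shell
# Page titles go in the 'pages' field, one title per line (placeholders).
TITLES='Main_Page
Some_Other_Page'

# POST the title list to Special:Export.
# curonly=1 limits the export to current revisions.
curl --data-urlencode "pages=$TITLES" \
     --data 'curonly=1' \
     'https://wiki.example.org/index.php?title=Special:Export' \
     -o pages.xml
```

Feed the real title list in from Special:AllPages or a database query; the point is that the titles must be enumerated up front, which is the method's main drawback.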

Rich P.

BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.

