[Discuss] What's the best site-crawler utility?

Richard Pieri richard.pieri at gmail.com
Tue Jan 7 19:12:29 EST 2014


Bill Horne wrote:
> I need to copy the contents of a wiki into static pages, so please
> recommend a good web-crawler that can download an existing site into
> static content pages. It needs to run on Debian 6.0.

Remember how I wrote that wikis have a spate of problems? This is the 
biggest one: there's no way to dump a MediaWiki in a human-readable 
form. There just isn't.

The best option is usually to use the dumpBackup.php maintenance script 
to dump the database as XML and then parse that somehow. This requires 
shell access on the server. It will get everything, markup included; 
there's no way to exclude it.
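
Something along these lines, assuming shell access and a wiki installed 
under /var/www/wiki (adjust the path to your install):

    # dump only the current revision of each page; use --full instead
    # of --current if you want the complete revision history
    cd /var/www/wiki/maintenance
    php dumpBackup.php --current > wiki-dump.xml

The output is MediaWiki's XML export format, with the page text still 
in wiki markup, so you still have the parsing problem afterward.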

Method number two is to use the Special:Export page, if it hasn't been 
disabled, to export each page in the wiki. It can handle multiple pages 
at once, but every page has to be named explicitly in the export. This 
is essentially the same as dumpBackup.php except that it works page by 
page instead of on the whole database.
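
Roughly like this with curl, assuming the stock index.php URL layout 
and that the export form still takes its usual "pages" and "curonly" 
fields (check the form on your own wiki before trusting this):

    # single page, via GET
    curl -o Main_Page.xml \
        'http://wiki.example.org/index.php?title=Special:Export/Main_Page'

    # several pages in one request; titles separated by newlines (%0A),
    # curonly=1 asks for the current revision only
    curl -d 'pages=Main_Page%0AAnother_Page' -d 'curonly=1' \
        'http://wiki.example.org/index.php?title=Special:Export' > pages.xml

Either way you end up with the same XML you'd get from dumpBackup.php, 
just one batch of pages at a time.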

-- 
Rich P.
