[Discuss] What's the best site-crawler utility?

Greg Rundlett (freephile) greg at freephile.com
Tue Jan 7 23:02:15 EST 2014


Also, I just discovered a MediaWiki extension written by Tim Starling that
may suit your needs.  As the name implies, it's for dumping a wiki to HTML.

http://www.mediawiki.org/wiki/Extension:DumpHTML

As for processing the XML produced by "export" or the MediaWiki dump tools,
the XML schema is documented here:
http://meta.wikimedia.org/wiki/Help:Export#Export_format

And here are some of the tools you can use to process MediaWiki XML:
http://wikipapers.referata.com/wiki/List_of_data_processing_tools
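
If you would rather script it yourself, here is a minimal sketch (not taken
from any of the tools listed above) that pulls page titles and wikitext out
of an export file using only the Python standard library.  It assumes the
usual export layout of <page> elements holding a <title> and one or more
<revision> elements with a <text> child.  The sample document, its namespace
version, and the names iter_pages and local_name are made up for
illustration; check them against your own dump.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample; a real export carries many more fields.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Main Page</title>
    <revision><id>1</id><text>Hello, wiki.</text></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) per <page>, taking the last <revision> listed."""
    # iterparse streams the file, so large dumps need not fit in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "revision":
                for sub in child:
                    if local_name(sub.tag) == "text":
                        text = sub.text or ""
        yield title, text
        elem.clear()  # drop the finished page's children to save memory


if __name__ == "__main__":
    for title, text in iter_pages(io.BytesIO(SAMPLE)):
        print(title, len(text))
```

Matching on the local tag name avoids hard-coding the schema version in the
namespace, which differs between MediaWiki releases.  To read a real dump,
pass a filename or an open binary file to iter_pages instead of the BytesIO
wrapper.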


Greg Rundlett


