
BLU Discuss list archive



[Discuss] What's the best site-crawler utility?



Also, I just discovered a MediaWiki extension written by Tim Starling that
may suit your needs.  As the name implies, it's for dumping to HTML.

http://www.mediawiki.org/wiki/Extension:DumpHTML

As for processing the XML produced by "export" or the MediaWiki dump tools,
here is info on that XML schema:
http://meta.wikimedia.org/wiki/Help:Export#Export_format

And here are some of the tools you can use to process MediaWiki XML:
http://wikipapers.referata.com/wiki/List_of_data_processing_tools
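
If you would rather roll your own than use one of those tools, here is a
rough sketch of reading an export file with nothing but the Python standard
library.  It assumes the usual export layout (a <mediawiki> root containing
<page> elements, each with a <title> and one or more <revision><text>
elements).  The function name iter_pages and the SAMPLE data are made up for
illustration, and the namespace URI in SAMPLE is just an example since it
changes with the export version, so the code strips namespaces rather than
hard-coding one.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export.

    If a page carries several revisions, the text of the last one that
    appears in the file is the one returned.
    """
    title, text = None, ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # release the finished page's subtree


# Made-up sample data standing in for a real export file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Main Page</title>
    <revision><text>Hello wiki</text></revision>
  </page>
  <page>
    <title>Help</title>
    <revision><text>See [[Main Page]]</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, wikitext in iter_pages(io.BytesIO(SAMPLE)):
        print(page_title, len(wikitext))
```

For a real dump you would pass an open file (opened in binary mode) instead
of the BytesIO object.  Because it uses iterparse and clears each page once
it has been yielded, it should cope with fairly large exports.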


Greg Rundlett



BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org