
BLU Discuss list archive


[Discuss] What's the best site-crawler utility?

Bill Horne wrote:
> I need to copy the contents of a wiki into static pages, so please
> recommend a good web-crawler that can download an existing site into
> static content pages. It needs to run on Debian 6.0.

Remember when I wrote that wikis have a spate of problems? This is the 
biggest one: there's no way to dump a MediaWiki in a humanly-readable 
form. There just isn't.

The best option is usually the dumpBackup.php maintenance script, which 
dumps the database as XML that you then have to parse somehow. This 
requires shell access on the server, and it dumps everything, wiki 
markup included; there's no way to exclude it.
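If you do have shell access, the dump step itself is short. A minimal sketch, assuming the wiki lives in /var/www/wiki (the path and output name are placeholders; adjust for your install):

```shell
# Run from the MediaWiki installation directory (path is an assumption).
cd /var/www/wiki

# Name the dump file after today's date.
OUT=wiki-dump-$(date +%Y%m%d).xml

# Dump the current revision of every page as XML.
# Use --full instead of --current to include the whole revision history.
php maintenance/dumpBackup.php --current > "$OUT"
```

The resulting XML still contains raw wiki markup inside each <text> element, so converting it to readable static pages is a separate parsing job.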

Method number two is the Special:Export page, if it hasn't been 
disabled. It can export multiple pages at once, but every page title 
must be listed explicitly in the export. The output is essentially the 
same as dumpBackup.php's, just page by page instead of the whole 
database.
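Special:Export also answers plain HTTP POSTs, so the page-by-page export can be scripted. A sketch, assuming a hypothetical wiki at wiki.example.org and two placeholder page titles:

```shell
# Page titles go in the 'pages' field, one title per line (placeholders).
TITLES='Main_Page
Some_Other_Page'

# POST the title list to Special:Export.
# curonly=1 limits the export to current revisions.
curl --data-urlencode "pages=$TITLES" \
     --data 'curonly=1' \
     'https://wiki.example.org/index.php?title=Special:Export' \
     -o pages.xml
```

Feed the real title list in from Special:AllPages or a database query; the point is that the titles must be enumerated up front, which is the method's main drawback.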

Rich P.

BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.

