
BLU Discuss list archive



[Discuss] What's the best site-crawler utility?



On 1/7/2014 6:49 PM, Bill Horne wrote:
> I need to copy the contents of a wiki into static pages, so please
> recommend a good web-crawler that can download an existing site into
> static content pages. It needs to run on Debian 6.0.

  wget -k -m -np http://mysite

is what I used to use.  -k converts links so they point to the local copy
of each page, -m turns on the options needed for recursive mirroring, and
-np ("no parent") restricts the download to URLs at or below the starting
one.  (The recursive option by itself is pretty dangerous: most sites have
a banner or something that links back to a top-level page, which then
pulls in the whole rest of the site.)
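A somewhat fuller invocation in the same spirit, for anyone who wants the result to render cleanly offline. The extra flags (-p, -E, -w) are my additions, not part of the command above; they're standard GNU wget options, but check the wget on Debian 6.0 supports them:

```shell
#   -m  (--mirror)            recursion + timestamping, for mirroring
#   -k  (--convert-links)     rewrite links to point at the local copies
#   -np (--no-parent)         never ascend above the starting URL
#   -p  (--page-requisites)   also fetch the CSS/images each page needs
#   -E  (--adjust-extension)  save pages with an .html suffix (wiki URLs
#                             often lack one, which confuses local viewing)
#   -w 1                      wait a second between requests, to be polite
wget -m -k -np -p -E -w 1 http://mysite
```

Note that -E renames downloaded pages, so -k is what keeps the converted links consistent with the renamed files.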

HTH,
Matt



BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org