Matthew Gillen wrote:
> wget -k -m -np http://mysite

I create an "emergency backup" static version of dynamic sites using:

  wget -q -N -r -l inf -p -k --adjust-extension http://mysite

The option -m is equivalent to "-r -N -l inf --no-remove-listing", but
I didn't want --no-remove-listing (I don't recall why), so I specified
the individual options, and added:

  -p
  --page-requisites
      This option causes Wget to download all the files that are
      necessary to properly display a given HTML page. This includes
      such things as inlined images, sounds, and referenced
      stylesheets.

  --adjust-extension
      If a file of type application/xhtml+xml or text/html is
      downloaded and the URL does not end with the regexp
      \.[Hh][Tt][Mm][Ll]?, this option will cause the suffix .html to
      be appended to the local filename. This is useful, for instance,
      when you're mirroring a remote site that uses .asp pages, but
      you want the mirrored pages to be viewable on your stock Apache
      server. Another good use for this is when you're downloading
      CGI-generated materials. A URL like http://site.com/article.cgi?25
      will be saved as article.cgi?25.html.

> '-k' ... may or may not produce what you want if you want to actually
> replace the old site, with the intention of accessing it through a web
> server.

Works for me. I've republished sites captured with the above through a
server and found them usable.

But generally speaking, not all dynamic sites can successfully be
crawled without customizing the crawler. And as Rich points out, if
your objective is not just to end up with what appears to be a
mirrored site, but actual clean HTML suitable for hand-editing, then
you've still got lots of work ahead of you.

 -Tom

-- 
Tom Metro
The Perl Shop, Newton, MA, USA
"Predictable On-demand Perl Consulting."
http://www.theperlshop.com/
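P.S. If you wanted to run the above unattended and keep dated
snapshots, a wrapper along these lines would do it. This is just a
sketch, untested; SITE and BACKUP_ROOT are placeholders you'd adjust
for your own setup:

  #!/bin/sh
  # Nightly "emergency backup" snapshot of a dynamic site.
  # SITE and BACKUP_ROOT are placeholders; set them for your setup.
  SITE="http://mysite"
  BACKUP_ROOT="$HOME/site-backups"
  DEST="$BACKUP_ROOT/$(date +%Y-%m-%d)"

  mkdir -p "$DEST"
  # Same options as the command above, plus -np to stay within the
  # site and -P to drop the mirror into the dated directory.
  wget -q -N -r -l inf -np -p -k --adjust-extension -P "$DEST" "$SITE"

Drop it in cron and prune old directories however you like.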