Boston Linux & Unix (BLU)

BLU Discuss list archive


[Discuss] What's the best site-crawler utility?

On 1/8/2014 10:10 AM, Richard Pieri wrote:
> Daniel Barrett wrote:
>> Well, a script doesn't need human-readability. :-) Trust me, this is
>> not hard. I did it a few years ago with minimal difficulty (using a
>> couple of Emacs macros, if memory serves).
> If you recall, the decision is that a novice has volunteered to take 
> over as a way to learn HTML and related. It doesn't matter how easy 
> you think it is. What matters is that this novice is handed something 
> usable. Database dumps and web scrapes typically are anything but that.

Daniel and Richard,

Thank you for your suggestions. I appreciate, as always, the way that 
BLU members step up and try to help.

I'm going to test for transfer, and ask that Richard clarify why he 
feels that "web scrapes" are not usable: I was under the impression that 
the result of mirroring a site would be a lot of separate html files, 
one for each link on the site. Is this not correct?
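To illustrate the question, here is a minimal Python sketch of what "mirroring" generally means: start at one URL, save the page, follow every link that stays on the same host, and write one local file per fetched URL. This is not how wget or any particular tool is implemented. The function names, the "mirror" output directory and the page limit are hypothetical, and the sketch ignores robots.txt, rate limiting, non-HTML responses, query strings and HTTP errors.

```python
"""Minimal same-host crawler: one saved HTML file per fetched URL.

Illustrative sketch only. Assumptions: mirror(), extract_links(),
local_path(), the "mirror" output directory and the page limit are
hypothetical names/values; robots.txt, politeness delays, non-HTML
responses, query strings and HTTP errors are not handled.
"""
import os
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs, with any #fragment removed, for each <a href>."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


def local_path(url, out_dir):
    """Naively map a URL's path to a file under out_dir.

    Directory-style URLs ("/" or "/docs/") are saved as index.html.
    """
    path = urlparse(url).path
    if path == "" or path.endswith("/"):
        path += "index.html"
    return os.path.join(out_dir, path.lstrip("/"))


def mirror(start_url, out_dir="mirror", max_pages=100):
    """Breadth-first crawl of start_url's host; returns pages saved."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    saved = 0
    while queue and saved < max_pages:
        url = queue.popleft()
        with urlopen(url) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        dest = local_path(url, out_dir)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with open(dest, "w", encoding="utf-8") as f:
            f.write(html)  # the page exactly as the server rendered it
        saved += 1
        # Queue same-host links only; mailto: and off-site links are skipped.
        for link in extract_links(url, html):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return saved
```

A call such as `mirror("http://www.example.org/")` (placeholder URL) would leave files like `mirror/index.html` and `mirror/about.html`. So yes, the result is a set of separate HTML files, but each one is a snapshot of the server's generated output: any header, menu or footer the site repeats is copied into every file, and nothing records which parts were shared templates. That may be part of what makes a scrape harder for a novice to maintain, though the thread itself does not spell this out.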


Bill Horne
William Warren Consulting

BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.

