Bill Horne wrote:
> Tom Metro wrote:
>> I just tried looking something up in the BLU (discuss list) archives:
>> http://olduvai.blu.org/pipermail/discuss/
>> and noticed we don't have a search engine.
>>
>> Are there steps we can take to get Google to spider the site?
>
> Please don't. The discuss archive has so many of my "private" email
> addresses in it that I'd have to go into the Internet witness protection
> program if Google gets to it. Let's leave the archive un-indexed.

I agree that divulging email addresses is bad. (I have a "Google Alert" that helps me know when archives are "leaking" my addresses.)

But the software BLU is using does obfuscate email addresses in the headers and the body of the messages. Unfortunately, it does so in a fairly predictable way (replacing "@" with "at" in the displayed text and hyperlinking to the mailing list addresses). A determined spammer could easily get around this, and it's probably the same scheme used by all pipermail archives (pipermail is the software BLU uses[1]).

1. http://www.amk.ca/python/unmaintained/pipermail.html

It could be argued that we'd be better off using one of the public archive sites that use more sophisticated obfuscation (such as converting the addresses to images). Public archive services include openSubscriber.com[1] (which also provides RSS feeds), The Mail Archive[2], and Gmane[3] (which also provides NNTP and RSS access). The mailing list[4] for Boston Perl Mongers is archived by all of these services, so you can compare for yourself the presentation and obfuscation used by each.

1. http://www.opensubscriber.com/
2. http://www.mail-archive.com/
3. http://gmane.org/
4. http://boston.pm.org/kwiki/index.cgi?MongerLists

Some services (e.g., Gmane) even support an X-No-Archive header, so users can control whether their messages get archived.
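For anyone curious what that opt-out looks like on an outgoing message, here is a minimal Python sketch using the standard library's `email.message.EmailMessage`. It assumes the conventional header value `yes`; the function name `build_unarchived_message` and the example addresses are hypothetical, and whether a given archive actually honors the header is up to that service.

```python
from email.message import EmailMessage


def build_unarchived_message(sender: str, to: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text message that asks archives not to store it.

    X-No-Archive is a convention, not a formal standard: archives that
    honor it (e.g., Gmane) skip the message, while others may ignore it.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = subject
    # Conventional opt-out value; compliance is up to each archive service.
    msg["X-No-Archive"] = "yes"
    # set_content() only manages Content-* headers, so X-No-Archive is kept.
    msg.set_content(body)
    return msg


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    m = build_unarchived_message(
        "member@example.org",
        "discuss@example.org",
        "Re: searchable archives",
        "Please keep this one out of the archives.",
    )
    print(m.as_string())
```

Most mail clients can add a custom header like this as well; the point is simply that the opt-out travels with each message rather than being a list-wide setting.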
Adding a list to these services is as simple as adding an address to the BLU discuss subscription list:

http://www.opensubscriber.com/faq.html
http://www.mail-archive.com/addlist.html
http://gmane.org/add.php

(I didn't run across any options for importing past archives.)

As for Google spidering, the cat's already out of the bag. Google already has some of the archives, and the archives are publicly accessible, so either Google will eventually get the rest of them, or a spammer will spider the site themselves.

Matt Galster wrote:
> I agree with Bill.

For the same or different reasons?

> You can download a copy of the archive and search it if you want.

I already have a full archive locally. Having a public, searchable archive has several benefits:

Users who aren't list subscribers can discover BLU when one of our postings turns up as an answer to their query (particularly if we get indexed by Google).

New BLU members can find answers to common questions by searching the archives.

Existing BLU subscribers can point new BLU members to past postings that answer their questions.

Existing BLU subscribers who don't bother to archive the postings can more conveniently find things they saw on the list in the past.

The archives embody the collective knowledge of the group, and as a group that follows the traditions of open source, I'd think we'd want to make that information public.

-Tom

--
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/
BLU is a member of BostonUserGroups.
We also thank MIT for the use of their facilities.