BLU archives

Tom Metro blu at vl.com
Mon Jan 30 12:42:21 EST 2006


Bill Horne wrote:
> Tom Metro wrote:
>> I just tried looking something up in the BLU (discuss list) archives:
>> http://olduvai.blu.org/pipermail/discuss/
>> and noticed we don't have a search engine.
>>
>> Are there steps we can take to get Google to spider the site?
>
> Please don't. The discuss archive has so many of my "private" email
> addresses in it that I'd have to go into the Internet witness protection
> program if Google gets to it. Let's leave the archive un-indexed.

I agree that divulging email addresses is bad. (I have a "Google Alert" 
that helps me know when archives are "leaking" my addresses.)

But the software BLU uses, pipermail[1], does obfuscate email addresses 
in the headers and the body of the messages. Unfortunately, it does so 
in a fairly predictable way (replacing "@" with " at " in the displayed 
text and hyperlinking to the mailing list addresses). A determined 
spammer could easily get around this, and it's probably the same scheme 
used by all pipermail archives.

1. http://www.amk.ca/python/unmaintained/pipermail.html
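
To illustrate how predictable the "@"-to-"at" substitution is, here is a 
minimal Python sketch. It is not pipermail's actual code; the function 
names and the simplified address pattern are my own assumptions, chosen 
only to show that the transformation is reversible with a single 
regular expression.

```python
import re

# Simplified address pieces (an assumption, not a full RFC 5322 parser).
LOCAL = r"(\w[\w.+-]*)"
DOMAIN = r"([\w-]+(?:\.[\w-]+)+)"


def obfuscate(text):
    """Mimic the archive's scheme: show "user at host" instead of "user@host"."""
    return re.sub(LOCAL + "@" + DOMAIN, r"\1 at \2", text)


def deobfuscate(text):
    """Reverse the scheme, which takes one equally simple substitution."""
    return re.sub(LOCAL + " at " + DOMAIN, r"\1@\2", text)


original = "Contact user@example.com for details."
hidden = obfuscate(original)      # what an archive page would display
recovered = deobfuscate(hidden)   # what a harvester could reconstruct
```

Since the round trip restores the original text, the scheme deters only 
the most naive harvesters, which is the argument for image-based or 
otherwise less mechanical obfuscation.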

It could be argued that we'd be better off using one of the public 
archive sites that use more sophisticated obfuscation (such as 
converting the addresses to images).

Some public archive services include openSubscriber.com[1] (also 
provides RSS feeds), The Mail Archive[2], and Gmane[3] (also provides 
NNTP and RSS access). The mailing list[4] for Boston Perl Mongers is 
archived by all of these services, and you can compare for yourself the 
presentation and obfuscation used by each.

1. http://www.opensubscriber.com/
2. http://www.mail-archive.com/
3. http://gmane.org/
4. http://boston.pm.org/kwiki/index.cgi?MongerLists

Some (Gmane) even support an X-No-Archive header so users can control 
whether their messages get archived.
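
As a rough sketch of what opting out looks like from the sender's side, 
the following Python builds a message with that header set. The 
build_post helper, the addresses, and the "yes" value are assumptions 
for illustration; each archive service decides for itself whether and 
how it honors the header.

```python
from email.message import EmailMessage


def build_post(sender, list_address, subject, body, archive=True):
    """Build a list posting, optionally asking archives not to keep it."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = list_address
    msg["Subject"] = subject
    if not archive:
        # Conventional opt-out value; support varies by archive service.
        msg["X-No-Archive"] = "yes"
    msg.set_content(body)
    return msg


# Hypothetical addresses, used only for the example.
msg = build_post("user@example.com", "discuss@example.org",
                 "Test posting", "Please do not archive this.",
                 archive=False)
```

Most mail clients let you add a custom header like this without any 
code, so the sketch is mainly a way to show where the header lives.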

Adding a list to these services is as simple as adding an address to the 
BLU discuss subscription list:

http://www.opensubscriber.com/faq.html
http://www.mail-archive.com/addlist.html
http://gmane.org/add.php

(I didn't run across any options for importing past archives.)


As for Google spidering, the cat's already out of the bag. Google 
already has some of the archives, and the archives are publicly 
accessible, so either Google will eventually get the rest of them, or a 
spammer will spider the site themselves.


Matt Galster wrote:
> I agree with Bill. 

For the same or different reasons?


> You can download a copy of the archive and search it if you want.

I already have a full archive locally.
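
For anyone who does want to search a downloaded copy, here is a minimal 
Python sketch. It assumes the archive is a file in mbox format; the 
function names are mine and the file name in the usage note is 
hypothetical.

```python
import mailbox


def message_text(msg):
    """Collect the decoded text parts of a message into one string."""
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True)
            if payload:
                parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)


def search_archive(path, term):
    """Return subjects of messages whose subject or body contains term."""
    term = term.lower()
    hits = []
    # create=False raises an error instead of creating a missing file.
    for msg in mailbox.mbox(path, create=False):
        subject = str(msg.get("Subject", ""))
        if term in subject.lower() or term in message_text(msg).lower():
            hits.append(subject)
    return hits
```

Calling search_archive("2006-January.txt", "google") would list the 
subjects of matching messages in that file. It works, but it only helps 
people who already have the archive and know how to script against it.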

Having a public, searchable archive has several benefits:

Users who aren't list subscribers can discover BLU when one of our 
postings turns up as an answer to their query (particularly true if we 
get indexed by Google).

New BLU members can find answers to common questions by searching the 
archives.

Existing BLU subscribers can point new BLU members to past postings that 
answer their questions.

Existing BLU subscribers who don't bother to archive the postings can 
more conveniently find things they saw on the list in the past.


The archives embody the collective knowledge of the group, and as a 
group that follows the traditions of open source, I'd think we'd want to 
make that information public.

  -Tom

-- 
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/


