BLU Discuss list archive


FW: FYI - Telephone Scam



A good explanation of this technique (it's called Bayesian filtering) can be
found at: http://www.paulgraham.com/spam.html
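For the gist of the linked essay, here is a minimal Python sketch of its two formulas as we understand them. The first is a per-token spam probability: good-mail occurrences are counted double, tokens seen fewer than 5 times are ignored, and results are clamped to [0.01, 0.99]. The second combines the 15 most "interesting" token probabilities, meaning those furthest from 0.5. The function names are hypothetical, and the sketch assumes the token counts and the message totals (`n_bad`, `n_good`) are already available.

```python
from math import prod
from typing import Optional, Sequence


def token_probability(bad_count: int, good_count: int,
                      n_bad: int, n_good: int) -> Optional[float]:
    """Spam probability for one token (hypothetical helper).

    bad_count/good_count: occurrences in spam/non-spam training mail.
    n_bad/n_good: number of spam/non-spam messages (assumed > 0).
    """
    g = 2 * good_count          # weight good mail double to avoid false positives
    b = bad_count
    if g + b < 5:               # too rare to be trusted
        return None
    bad_freq = min(1.0, b / n_bad)
    good_freq = min(1.0, g / n_good)
    p = bad_freq / (good_freq + bad_freq)
    return max(0.01, min(0.99, p))  # never treat one token as certain


def combine(probs: Sequence[float], top_n: int = 15) -> float:
    """Combine the top_n probabilities furthest from 0.5 into one score."""
    interesting = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
    spam = prod(interesting)
    ham = prod(1 - p for p in interesting)
    return spam / (spam + ham)
```

For example, `combine([0.99, 0.99, 0.2])` comes out above 0.99: two strongly spammy tokens outweigh one mildly innocent one.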

-derek

dsr at tao.merseine.nu writes:

> On Tue, Feb 11, 2003 at 01:31:24PM -0500, David Kramer wrote:
> > 
> > OK, when he first started defining the product, he led me to believe that
> > there were going to be new things in there, like basing rules on the relative
> > spacing between words and the order of phrases, that could be used for this
> > purpose.  I realize that is not the same thing as AI.  Can they not be used
> > for this purpose, or are those features not in there?
> 
> (For everyone else: we're talking about the Controlled Regex Mutilator,
> CRM-114, available from crm114.sourceforge.net. It's possibly the
> world's best spam filter right now, as well as being a complete
> programming language.)
> 
> "Phrases" is an odd concept in CRM. Anything separated by any amount
> of whitespace or punctuation is a potential phrase-unit; tokenization
> builds phrases of up to 5 phrase-units. Each phrase-unit is considered
> both on its own (like SpamAssassin) and in its appearance next to the
> other phrase-units (almost like Graham's binomial Bayesian correlation,
> except on a polynomial of degree 1-5).
> 
> So, spacing can count, but usually doesn't, and order of phrases does
> count, but not as much as proximity.
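Here is a rough Python sketch of the phrase-unit idea, built on assumptions of our own. Text is split on anything other than ASCII letters, digits, or apostrophes. A window holds 5 phrase-units, and each new unit is paired with every subset of the 4 units before it. Dropped positions are marked with a hypothetical `<skip>` placeholder so relative position survives. This roughly follows how CRM-114's sparse binary polynomial hashing is usually described, but it is not CRM-114's code.

```python
import re
from itertools import product

WINDOW = 5          # phrase-units per window, as described above
SKIP = "<skip>"     # hypothetical placeholder for a dropped position


def tokenize(text: str) -> list:
    # Assumption: anything other than ASCII letters, digits, or apostrophes separates units.
    return re.findall(r"[a-z0-9']+", text.lower())


def phrase_features(tokens: list, window: int = WINDOW) -> list:
    """Combine each token with every subset of the (window - 1) tokens before it."""
    features = []
    for i, newest in enumerate(tokens):
        history = tokens[max(0, i - window + 1):i]
        # Each earlier slot is either kept or replaced by SKIP.
        for keep_mask in product((False, True), repeat=len(history)):
            parts = [tok if keep else SKIP for tok, keep in zip(history, keep_mask)]
            # Drop leading placeholders so the lone token is always just `newest`.
            while parts and parts[0] == SKIP:
                parts.pop(0)
            features.append(" ".join(parts + [newest]))
    return features
```

For `["buy", "cheap", "meds"]` this yields `buy`, `cheap`, `buy cheap`, `meds`, `cheap meds`, `buy <skip> meds`, and `buy cheap meds`. Because each feature keeps its slot positions, order matters ("buy cheap" is not "cheap buy"). Units more than four positions apart never share a feature.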
> 
> Then CRM builds a great superhash map in a 1 MB memory space. Each
> value from a phrase correlation increments the value of a particular
> byte in the map. If we routinely saw very long spam that was hard to
> tell apart from very long non-spam, this 1 MB would have to be
> increased... but we don't, and it can trivially be increased anyway.
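Here is a minimal Python sketch of a fixed-size map of byte counters. It assumes one byte per slot in a 1 MB `bytearray`, BLAKE2b (from Python's `hashlib`) as the hash, and counters that saturate at 255 rather than wrapping. The class name and these choices are illustrative assumptions; CRM-114's real on-disk format is not reproduced here.

```python
import hashlib
from typing import Iterable

MAP_SIZE = 1 << 20  # 1 MB: one byte-sized counter per slot


class HashedCounterMap:
    """Fixed-size table of byte counters indexed by a hash of each feature
    (hypothetical class; hash choice and saturation are assumptions)."""

    def __init__(self, size: int = MAP_SIZE) -> None:
        self.counts = bytearray(size)  # all counters start at zero

    def _slot(self, feature: str) -> int:
        digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big") % len(self.counts)

    def learn(self, features: Iterable[str]) -> None:
        for feature in features:
            i = self._slot(feature)
            if self.counts[i] < 255:   # saturate instead of wrapping to 0
                self.counts[i] += 1

    def lookup(self, feature: str) -> int:
        return self.counts[self._slot(feature)]
```

Every feature shares the same table, so unrelated features can land in the same slot (with `size=1`, everything collides). A larger table makes that rarer, which is why growing the 1 MB map is an easy fix when it is needed.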
> 
> The map acts as a learning memory, very similar to the "teach matchboxes
> how to play tic-tac-toe" article from Scientific American (? need to
> check on that; it's covered in
> http://www.okstate.edu/cocim/members/eswar/neuralterm.pdf).
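To make the "learning memory" idea concrete, here is a small Python sketch with one count table per class. Teaching a message increments the counts of its features. Classification then picks the table under which the message's features are most likely. Two choices here are our own substitutions for illustration, not CRM-114's actual classifier: the scorer (add-one-smoothed log-likelihoods, a naive-Bayes-style comparison with equal priors) and the word-plus-word-pair features. All names are hypothetical.

```python
import math
import re
from collections import Counter


def features(text: str) -> list:
    # Assumption: single words plus adjacent word pairs stand in for CRM-114's phrases.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


class LearningMemory:
    """Two count tables that 'learn' by being incremented on each training example."""

    def __init__(self) -> None:
        self.counts = {"spam": Counter(), "nonspam": Counter()}

    def teach(self, text: str, label: str) -> None:
        self.counts[label].update(features(text))

    def score(self, text: str, label: str) -> float:
        table = self.counts[label]
        total = sum(table.values())
        vocab = len(self.counts["spam"] | self.counts["nonspam"]) + 1
        # Add-one smoothed log-likelihood of the message's features under this table.
        return sum(math.log((table[f] + 1) / (total + vocab)) for f in features(text))

    def classify(self, text: str) -> str:
        return max(self.counts, key=lambda label: self.score(text, label))
```

After teaching "buy cheap meds now" as spam and "meeting notes for the linux cafe" as non-spam, `classify("cheap meds")` returns `"spam"`. Each further lesson shifts the counts, much as each game adjusted the matchboxes.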
> 
> > -dsk- (who hosted the project for a while before he moved to SourceForge)
> 
> Heh. Point.
> 
> -dsr-
> -- 
> Network engineer looking for work in Boston area.
> Resume at http://tao.merseine.nu/~dsr/
> _______________________________________________
> Discuss mailing list
> Discuss at blu.org
> http://www.blu.org/mailman/listinfo/discuss

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord at MIT.EDU                        PGP key available




BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.

Boston Linux & Unix / webmaster@blu.org