A good explanation of this (it's called Bayesian Filters) can be seen
at: http://www.paulgraham.com/spam.html

-derek

dsr at tao.merseine.nu writes:

> On Tue, Feb 11, 2003 at 01:31:24PM -0500, David Kramer wrote:
> >
> > OK, when he first started defining the product, he led me to believe that
> > there was going to be new things in there, like basing rules on the relative
> > space between words, and the order of phrases, that could be used for this
> > purpose. I realize that is not the same thing as AI. Can they not be used
> > for this purpose, or are those features not in there?
>
> (For everyone else: we're talking about the Controlled Regex Mutilator,
> CRM-114, available from crm114.sourceforge.net. It's possibly the
> world's best spam filter right now, as well as being a complete
> programming language.)
>
> "Phrases" is an odd concept in CRM. Everything separated by any amount
> of whitespace or punctuation is a potential phrase; tokenization
> happens up to a length of 5 phrase-units. All the phrase-units are
> considered in their existence (like SpamAssassin) and in their
> appearance next to all other phrase-units (almost like Graham's binomial
> Bayesian correlation, except on a polynomial of degree 1-5).
>
> So, spacing can count, but usually doesn't, and order of phrases does
> count, but not as much as proximity.
>
> Then, CRM does this great superhash map with a 1 MB memory space. Each
> value from a phrase correlation increments the value of a particular
> byte in the map. If we routinely saw spam with gigantic lengths that
> were difficult to differentiate from non-spam with gigantic lengths,
> this 1 MB would have to be increased... but we don't, and it can
> trivially be increased anyway.
>
> The map acts as a learning memory, very similar to the "teach matchboxes
> how to play tic-tac-toe" article from Scientific American (? need to
> check on that -- it's covered in
> http://www.okstate.edu/cocim/members/eswar/neuralterm.pdf
>
> > -dsk- (who hosted the project for a while before he moved to sourceforge)
>
> Heh. Point.
>
> -dsr-
> --
> Network engineer looking for work in Boston area.
> Resume at http://tao.merseine.nu/~dsr/
> _______________________________________________
> Discuss mailing list
> Discuss at blu.org
> http://www.blu.org/mailman/listinfo/discuss

--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
warlord at MIT.EDU PGP key available
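
The windowed phrase-units and the hashed byte map described in the quoted message can be sketched roughly as follows. This is a minimal Python illustration, not CRM-114's actual code: the function names, the "<skip>" placeholder, the use of CRC32 as the hash, the saturation at 255, and the choice to keep one map for spam and one for non-spam are all assumptions made for the example. Only the 5-unit window and the 1 MB map of byte counters come from the message.

```python
import re
import zlib
from itertools import combinations

MAP_SIZE = 1 << 20   # 1 MB of one-byte counters, as in the message
WINDOW = 5           # up to 5 phrase-units per window, as in the message
SKIP = "<skip>"      # hypothetical placeholder for a skipped unit


def tokenize(text):
    # Anything separated by whitespace or punctuation is a phrase-unit.
    return re.findall(r"\w+", text.lower())


def features(tokens, window=WINDOW):
    # For each position, pair the leading unit with every subset of the
    # units that follow it inside the window, keeping their positions.
    for i in range(len(tokens)):
        head, rest = tokens[i], tokens[i + 1:i + window]
        for r in range(len(rest) + 1):
            for kept in combinations(range(len(rest)), r):
                slots = [rest[j] if j in kept else SKIP
                         for j in range(len(rest))]
                while slots and slots[-1] == SKIP:
                    slots.pop()          # trailing skips carry no information
                yield " ".join([head] + slots)


def bucket(feature):
    # Assumption: CRC32 stands in for whatever hash the real tool uses.
    return zlib.crc32(feature.encode("utf-8")) % MAP_SIZE


def train(counter_map, text):
    # Each feature increments one byte in the map, saturating at 255.
    for f in features(tokenize(text)):
        b = bucket(f)
        if counter_map[b] < 255:
            counter_map[b] += 1


def score(spam_map, ham_map, text):
    # Crude ratio of spam hits to all hits; 0.5 means "no evidence".
    spam_hits = ham_hits = 0
    for f in features(tokenize(text)):
        b = bucket(f)
        spam_hits += spam_map[b]
        ham_hits += ham_map[b]
    total = spam_hits + ham_hits
    return 0.5 if total == 0 else spam_hits / total


spam_map = bytearray(MAP_SIZE)
ham_map = bytearray(MAP_SIZE)
train(spam_map, "buy cheap pills now")
train(ham_map, "meeting notes for the linux group")
print(score(spam_map, ham_map, "cheap pills now"))
print(score(spam_map, ham_map, "linux group meeting"))
```

Because each feature only touches one byte, the memory cost stays fixed at the map size no matter how much mail is learned; the price is that unrelated features can collide in the same bucket, which is why a larger map would be needed if long spam and long non-spam became hard to tell apart.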
BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.