On Tue, Feb 11, 2003 at 01:31:24PM -0500, David Kramer wrote:
>
> OK, when he first started defining the product, he led me to believe that
> there was going to be new things in there, like basing rules on the relative
> space between words, and the order of phrases, that could be used for this
> purpose. I realize that is not the same thing as AI. Can they not be used
> for this purpose, or are those features not in there?

(For everyone else: we're talking about the Controllable Regex Mutilator,
CRM-114, available from crm114.sourceforge.net. It's possibly the world's
best spam filter right now, as well as being a complete programming
language.)

"Phrases" is an odd concept in CRM. Everything separated by any amount of
whitespace or punctuation is a potential phrase-unit; tokenization builds
phrases up to a length of 5 phrase-units. Each phrase-unit is considered
both for its existence (like SpamAssassin) and for its appearance next to
all the other phrase-units nearby (almost like Graham's binomial Bayesian
correlation, except on a polynomial of degree 1-5). So spacing can count,
but usually doesn't, and order of phrases does count, but not as much as
proximity.

Then CRM feeds all of that into this great superhash map with a 1 MB memory
space. Each value from a phrase correlation increments the value of a
particular byte in the map. If we routinely saw spam with gigantic lengths
that was difficult to differentiate from non-spam with gigantic lengths,
this 1 MB would have to be increased... but we don't, and it can trivially
be increased anyway.

The map acts as a learning memory, very similar to the "teach matchboxes
how to play tic-tac-toe" article from Scientific American (? need to check
on that -- it's covered in
http://www.okstate.edu/cocim/members/eswar/neuralterm.pdf).

> -dsk- (who hosted the project for a while before he moved to sourceforge)

Heh. Point.

-dsr-

-- 
Network engineer looking for work in Boston area.
Resume at http://tao.merseine.nu/~dsr/
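
For anyone who wants to see the phrase-unit windows and the byte map in concrete terms, here is a rough sketch in Python (not CRM's own language, and not CRM-114's actual code). The window of 5 and the 1 MB map come from the description above; everything else is an assumption made for illustration: the `\w+` tokenizer, the hypothetical `<skip>` placeholder for a dropped phrase-unit, the use of CRC32 as the hash, and the choice to stop counting at 255 rather than wrap.

```python
import re
import zlib

WINDOW = 5             # phrase-units per window, as described in the post
MAP_SIZE = 1 << 20     # 1 MB of one-byte counters, as described in the post
SKIP = "<skip>"        # hypothetical marker for a dropped phrase-unit


def tokenize(text):
    # Assumption: a phrase-unit is a run of word characters, so any
    # whitespace or punctuation acts as a separator.
    return re.findall(r"\w+", text)


def features(tokens, window=WINDOW):
    """Yield every phrase ending at each token, within the window.

    For each token, every subset of the up-to-4 preceding tokens is
    combined with it; dropped tokens become SKIP so that distance to
    the newest token is preserved.
    """
    for i in range(len(tokens)):
        earlier = tokens[max(0, i - window + 1):i]
        for mask in range(1 << len(earlier)):
            kept = [tok if (mask >> k) & 1 else SKIP
                    for k, tok in enumerate(earlier)]
            # Leading skips carry no information, so drop them.
            while kept and kept[0] == SKIP:
                kept.pop(0)
            yield tuple(kept + [tokens[i]])


def train(counts, text):
    """Increment one byte in the map for each phrase feature in text."""
    for feat in features(tokenize(text)):
        # CRC32 is a stand-in hash, not the one CRM-114 uses.
        slot = zlib.crc32(" ".join(feat).encode("utf-8")) % MAP_SIZE
        if counts[slot] < 255:   # saturate: a byte cannot exceed 255
            counts[slot] += 1
```

To use it, make one `bytearray(MAP_SIZE)` for spam and one for non-spam and call `train` with the matching map for each message. A full window of 5 phrase-units yields 16 features per new token, which is why proximity ends up mattering more than strict order.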