
BLU Discuss list archive



FW: FYI - Telephone Scam



On Tue, Feb 11, 2003 at 01:31:24PM -0500, David Kramer wrote:
> 
> OK, when he first started defining the product, he led me to believe that 
> there was going to be new things in there, like basing rules on the relative 
> space between words, and the order of phrases, that could be used for this 
> purpose.  I realize that is not the same thing as AI.  Can they not be used 
> for this purpose,  or are those features not in there?

(For everyone else: we're talking about the Controllable Regex Mutilator,
CRM-114, available from crm114.sourceforge.net. It's possibly the
world's best spam filter right now, as well as being a complete
programming language.)

"Phrases" is an odd concept in CRM. Everything separated by any amount
of whitespace or punctuation is a potential phrase-unit; tokenization
happens up to a length of 5 phrase-units. Each phrase-unit is
considered both on its own existence (like SpamAssassin) and on its
appearance next to all the other phrase-units (almost like Graham's binomial
Bayesian correlation, except on a polynomial of degree 1-5).

So, spacing can count, but usually doesn't, and order of phrases does
count, but not as much as proximity.
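
A rough sketch of that windowing idea, in Python rather than CRM. This
is not CRM-114's actual code: the tokenizer regex, the "_" skip marker
and the names tokenize and window_features are assumptions made up for
illustration. Each phrase-unit is emitted alone and together with every
subset of the (up to) four units that follow it, keeping their order.

```python
import re
from itertools import combinations

WINDOW = 5  # maximum phrase-units per feature, per the description above


def tokenize(text):
    # Assumption: anything that is not a letter or digit separates phrase-units.
    return [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]


def window_features(tokens, window=WINDOW):
    """Return features anchored at each token.

    For every position, the anchor token is combined with each subset of
    the following tokens inside the window. Skipped positions between
    kept tokens are written as "_", so order and distance are preserved.
    """
    features = []
    for i in range(len(tokens)):
        span = tokens[i:i + window]
        rest = range(1, len(span))
        for r in range(len(span)):
            for keep in combinations(rest, r):
                last = keep[-1] if keep else 0
                parts = [
                    span[j] if j == 0 or j in keep else "_"
                    for j in range(last + 1)
                ]
                features.append(" ".join(parts))
    return features


if __name__ == "__main__":
    for feature in window_features(tokenize("buy cheap pills now, today!")):
        print(feature)
```

A full window of five units yields 16 features for its anchor, which is
why nearby words end up correlated far more often than distant ones.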

Then, CRM builds this great superhash map with a 1 MB memory space. Each
value from a phrase correlation is hashed to a particular byte in the
map, and that byte is incremented. If we routinely saw spam with gigantic lengths that
were difficult to differentiate from non-spam with gigantic lengths,
this 1 MB would have to be increased... but we don't, and it can
trivially be increased anyway.
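
To make the map idea concrete, here is a minimal sketch in Python. It
is an illustration only, not how CRM-114 is implemented: the FeatureMap
class, the use of zlib.crc32 as the hash, and the choice to stop
counting at 255 are all assumptions.

```python
import zlib

MAP_SIZE = 1 << 20  # 1 MB: one single-byte counter per slot


class FeatureMap:
    """Hypothetical hashed counter map with one byte per slot."""

    def __init__(self, size=MAP_SIZE):
        self.size = size
        self.slots = bytearray(size)  # all counters start at zero

    def _index(self, feature):
        # Assumption: CRC32 stands in for whatever hash the real filter uses.
        return zlib.crc32(feature.encode("utf-8")) % self.size

    def learn(self, features):
        for feature in features:
            i = self._index(feature)
            if self.slots[i] < 255:  # a byte cannot hold more than 255
                self.slots[i] += 1

    def count(self, feature):
        # Collisions are possible, so this is an upper bound on the true count.
        return self.slots[self._index(feature)]


if __name__ == "__main__":
    seen = FeatureMap()
    seen.learn(["buy cheap", "buy cheap", "buy _ pills"])
    print(seen.count("buy cheap"))
```

Making the map bigger is just a matter of passing a larger size, which
only lowers the chance that two different features share a byte.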

The map acts as a learning memory, very similar to the "teach matchboxes
how to play tic-tac-toe" article from Scientific American (? need to
check on that -- it's covered in
http://www.okstate.edu/cocim/members/eswar/neuralterm.pdf ).

> -dsk- (who hosted the project for a while before he moved to sourceforge)

Heh. Point.

-dsr-
-- 
Network engineer looking for work in Boston area.
Resume at http://tao.merseine.nu/~dsr/




BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org