FW: FYI - Telephone Scam
Derek Atkins
warlord at MIT.EDU
Tue Feb 11 18:39:37 EST 2003
A good explanation of this (it's called Bayesian filtering) can be
found at: http://www.paulgraham.com/spam.html
-derek
dsr at tao.merseine.nu writes:
> On Tue, Feb 11, 2003 at 01:31:24PM -0500, David Kramer wrote:
> >
> > OK, when he first started defining the product, he led me to believe that
> > there were going to be new things in there, like basing rules on the relative
> > space between words, and the order of phrases, that could be used for this
> > purpose. I realize that is not the same thing as AI. Can they not be used
> > for this purpose, or are those features not in there?
>
> (For everyone else: we're talking about the Controllable Regex Mutilator,
> CRM-114, available from crm114.sourceforge.net. It's possibly the
> world's best spam filter right now, as well as being a complete
> programming language.)
>
> "Phrases" is an odd concept in CRM. Everything separated by any amount
> of whitespace or punctuation is a potential phrase; tokenization
> happens up to a length of 5 phrase-units. All the phrase-units are
> considered in their existence (like SpamAssassin) and in their
> appearance next to all other phrase-units (almost like Graham's binomial
> Bayesian correlation, except on a polynomial of degree 1-5).
>
> So, spacing can count, but usually doesn't, and order of phrases does
> count, but not as much as proximity.
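
A rough sketch of the windowing described above, in Python rather than CRM. It is one plausible reading, not CRM-114's actual code: the tokenizer regex, the "_" placeholder for skipped units, and the names tokenize and phrase_features are all assumptions made for illustration.

```python
import re
from itertools import combinations

WINDOW = 5  # maximum phrase length in units, per the description above


def tokenize(text):
    # Assumption: a unit is any run of letters or digits; whitespace and
    # punctuation only act as separators.
    return re.findall(r"[A-Za-z0-9]+", text)


def phrase_features(text, window=WINDOW):
    """Return every phrase of 1..window units starting at each position."""
    tokens = tokenize(text)
    features = []
    for i in range(len(tokens)):
        span = tokens[i:i + window]
        head, rest = span[0], span[1:]
        # Keep the head unit, then every subset of the units that follow it.
        # Skipped positions become "_" so order and distance are preserved.
        for r in range(len(rest) + 1):
            for keep in combinations(range(len(rest)), r):
                parts = [head] + [
                    rest[j] if j in keep else "_" for j in range(len(rest))
                ]
                # Drop trailing placeholders so "buy _ _" is just "buy".
                while parts[-1] == "_":
                    parts.pop()
                features.append(" ".join(parts))
    return features


if __name__ == "__main__":
    for feature in phrase_features("buy cheap pills"):
        print(feature)
```

Each unit shows up on its own and in combination with its neighbours, so "buy cheap" and "buy _ pills" are separate features. That is how proximity and order can both carry weight.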
>
> Then, CRM does this great superhash map with a 1 MB memory space. Each
> value from a phrase correlation increments the value of a particular
> byte in the map. If we routinely saw spam with gigantic lengths that
> were difficult to differentiate from non-spam with gigantic lengths,
> this 1 MB would have to be increased... but we don't, and it can
> trivially be increased anyway.
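
A minimal sketch of that kind of map, assuming a 1 MB array of one-byte counters indexed by a hash of each phrase. CRC-32 is only a stand-in for whatever hash CRM-114 really uses, and new_map, bucket, learn and score are hypothetical names, not CRM-114 functions.

```python
import zlib

MAP_SIZE = 1 << 20  # 1 MB, one byte per counter


def new_map(size=MAP_SIZE):
    # A bytearray starts out as all zeros.
    return bytearray(size)


def bucket(feature, size):
    # Assumption: CRC-32 modulo the map size picks the byte to touch.
    return zlib.crc32(feature.encode("utf-8")) % size


def learn(counts, features):
    for feature in features:
        i = bucket(feature, len(counts))
        if counts[i] < 255:  # saturate rather than overflow the byte
            counts[i] += 1


def score(counts, features):
    # Sum of the counters hit by this message's features.
    return sum(counts[bucket(f, len(counts))] for f in features)


if __name__ == "__main__":
    spam, ham = new_map(), new_map()
    learn(spam, ["buy cheap", "cheap pills"])
    learn(ham, ["lunch today"])
    message = ["buy cheap", "pills today"]
    print("spam score:", score(spam, message))
    print("ham score:", score(ham, message))
```

Keeping one map per class and comparing the two scores is the simplest way to use it. Unrelated phrases can collide in the same byte, which is why a bigger map would help if collisions ever became a problem.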
>
> The map acts as a learning memory, very similar to the "teach matchboxes
> how to play tic-tac-toe" article from Scientific American (I need to
> check that attribution; it's covered in
> http://www.okstate.edu/cocim/members/eswar/neuralterm.pdf).
>
> > -dsk- (who hosted the project for a while before he moved to sourceforge)
>
> Heh. Point.
>
> -dsr-
> --
> Network engineer looking for work in Boston area.
> Resume at http://tao.merseine.nu/~dsr/
> _______________________________________________
> Discuss mailing list
> Discuss at blu.org
> http://www.blu.org/mailman/listinfo/discuss
--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
warlord at MIT.EDU PGP key available