On Tue, 22 Feb 2005, Bill Holt wrote:

> Hello, I have postfix/spam assassin/redhat es4.0 I'm stumped on how to
> seed the bayesian database. The corpus @ wiki is old (don't want to seed
> it with email from 2004), and I am using this machine as a gateway to an
> exchange server. So by the time the email gets to the exchange server,
> It's useless to me. My question is how to get the spam back on the
> gateway for processing. Do I just take spam from users and write rules
> accordingly? I'm a little lost at the best way to approach this. Any
> pointers in the right direction would be greatly appreciated. Thank you,
> Bill

I was just talking to a coworker (and now BLU member) about that this morning. Steve, consider this your answer, too.

You know that spamassassin doesn't say whether an email is spam or not. It gives it a numerical rating, and you can do different things with emails of different ratings. I have mailboxes for _SpamMaybe and _SpamSAYes, where possible and very likely spam messages respectively get dumped.

I also have folders SpamSASpam and SpamSAHam. As I find messages not rated highly enough as spam, either in _SpamMaybe or any other folder, I move them to SpamSASpam. Likewise, any non-spam messages that get caught as spam, I copy to SpamSAHam. Then I have a script on my mail server that trains the database from those folders, and moves their content to an offline file.

This is a cut-down version of this script:

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#!/bin/bash
SRCDIR=~/IMAP
DSTDIR=~/IMAPARCHIVE

# Spam that was missed: train on it, archive it, empty the folder.
if [ -s "$SRCDIR/SpamSASpam" ] ; then
    echo Found spam
    sa-learn --spam --mbox "$SRCDIR/SpamSASpam"
    cat "$SRCDIR/SpamSASpam" >> "$DSTDIR/SpamSASpam"
    cp /dev/null "$SRCDIR/SpamSASpam"
fi

# Spam that was already caught: archive it without retraining.
if [ -s "$SRCDIR/_SpamSAYes" ] ; then
    echo Found spam already caught
    cat "$SRCDIR/_SpamSAYes" >> "$DSTDIR/SpamSASpam"
    cp /dev/null "$SRCDIR/_SpamSAYes"
fi

# Ham that was wrongly caught: train on it, archive it, empty the folder.
if [ -s "$SRCDIR/SpamSAHam" ] ; then
    echo Found ham
    sa-learn --ham --mbox "$SRCDIR/SpamSAHam"
    cat "$SRCDIR/SpamSAHam" >> "$DSTDIR/SpamSAHam"
    cp /dev/null "$SRCDIR/SpamSAHam"
fi
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

NB: This script should really look for the procmail lock files before copying/truncating the files, but it's just not that big a deal.

You will note that $DSTDIR/SpamSASpam grows indefinitely. This is a good thing. I just had a problem on my system where an update of Perl broke DB_File (Thank you, SuSE), and all hell broke loose on my bayes files. Upgrading spamassassin did no good (though the new version is MUCH better). I eventually ended up deleting them, but I had my big, fat corpus of spam for the past year or so to retrain with.

WARNING: Bayes won't work well unless you feed it ham, too. Don't forget to train both ham and spam.

You're welcome to my corpus, if the fact that the emails are to me instead of you won't affect it. It's about 24MB.

--
DDDD   David Kramer   david at thekramers.net   http://thekramers.net
DK KD
DKK D  It is the business of the future to be dangerous
DK KD
DDDD                                            -DJ SPooky
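
On the NB above about procmail lock files: one way the copy-and-truncate step could respect a lock is sketched below. This is not part of the original script. The function name archive_mbox and its arguments are made up for illustration, and it assumes procmail is using its default local lock file (the mailbox name plus ".lock"). It takes the lock with the shell's noclobber option, which may not be reliable on NFS; the lockfile utility shipped with procmail is another option.

```shell
#!/bin/bash
# Sketch only: archive_mbox and its arguments are illustrative names,
# not part of the original script.

# Usage: archive_mbox SRC DST [MAX_TRIES]
# Append mailbox SRC to archive DST, then empty SRC, while holding SRC.lock.
# Assumption: procmail delivers with its default local lock file, which is
# the mailbox name plus ".lock".
archive_mbox() {
    local src=$1 dst=$2 max=${3:-10}
    local lock="$src.lock" tries=0 rc=0

    # Nothing to do for a missing or empty mailbox.
    [ -s "$src" ] || return 0

    # With noclobber set, ">" refuses to overwrite an existing file, so
    # creating the lock file fails while another process holds it.
    until ( set -o noclobber; : > "$lock" ) 2>/dev/null; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max" ]; then
            echo "could not lock $src" >&2
            return 1
        fi
        sleep 1
    done

    # Truncate only if the append succeeded. A failure in this chain becomes
    # the function's return status; the lock is removed either way.
    cat "$src" >> "$dst" && : > "$src" || rc=$?
    rm -f "$lock"
    return "$rc"
}
```

In the script above, each cat/cp pair could then become a single call, for example: archive_mbox "$SRCDIR/SpamSASpam" "$DSTDIR/SpamSASpam". Unlike the original, this leaves the source folder untouched if the append fails.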