> Anyone have a simple procmail recipe for eliminating duplicate mail?

First, I'd like to point out the following example included in the
procmail documentation (try 'man procmailex'):

    :0 Wh: msgid.lock
    | formail -D 8192 msgid.cache

This is the canonical duplicate message filter. It simply tosses any
message that has the same Message-ID as one you've already received.

You may also want to check the procmail mailing list archive, which gets
this question probably once or twice a day :)

    http://www.xray.mpe.mpg.de/mailing-lists/procmail/

Here are some of my solutions...

[Note: the following examples were cribbed straight from my procmail
configuration, and use several variables that you won't actually see
defined in this message. If their content is not immediately apparent,
feel free to ask me for clarification.]

The following is what I'm actually using. Rather than just discarding
the message, it puts a note in the log file, marks the message header,
and files the message in my dupes folder (from where it will be
automatically expired at some later date):

    ##
    ## MESSAGE-ID CHECK
    ##
    :0
    * ^Message-id:
    * ? formail -D $msgid_cache_size $msgid_cache_file
    {
        LOG="dupecheck: msgid discard$NL"

        :0fwh
        | formail -A "$STATUS_HEADER: msgid duplicate"

        :0
        {
            FOLDER=$dupedest
            INCLUDERC=$RCDIR/save.rc
        }
    }

The downside to Message-ID checking is that if 5 people forward you the
exact same thing, each copy arrives with its own Message-ID, so this
filter won't catch it.

If you've got spare cycles on your machine, the following filter may be
of interest. It collapses runs of whitespace in the message body
(newlines, tabs, and spaces all become a single space) and then computes
the MD5 checksum of what's left. It caches the checksum, and checks
future messages against the cache. It will weed out all messages with
duplicate content:

    ##
    ## CONTENT MD5 CHECK
    ##

    ## get the MD5 checksum for this message
    :0b
    md5sum=|tr -s '\n\t ' ' '\
           |md5

    ## if a duplicate checksum exists, dump the message
    :0
    * ? fgrep -s $md5sum $md5_cache_file
    {
        LOG="dupecheck: md5 discard$NL"

        :0fwh
        | formail -A "$STATUS_HEADER: md5 duplicate"

        :0
        {
            FOLDER=$dupedest
            INCLUDERC=$RCDIR/save.rc
        }
    }

    ## Otherwise, add the checksum to the md5 cache and continue to process
    ## the message.
    :0Ehci
    | echo "$md5sum" >> $md5_cache_file

    ## Delete the cache if delivery of this message fails. This will
    ## ensure that redelivery attempts won't be rejected.
    TRAP="${TRAP:+${TRAP}; } test \$EXITCODE -eq 75 && rm -f $md5_cache_file"

Note that there is an external script, run out of cron, that
periodically truncates the cache file so that it doesn't grow without
bound.

Isn't this far more information than you wanted? :)

-- Lars

=====
lars at larsshack.org --> http://www.larsshack.org/
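The message mentions a cron-driven script that truncates the MD5 cache but does not show it. The following is only a minimal shell sketch of what such a helper might look like, together with a stand-alone version of the whitespace-normalizing checksum so it can be tried outside procmail. The function names `body_checksum` and `trim_cache` and the 1000-line default are hypothetical, and the sketch assumes GNU `md5sum` rather than the BSD `md5` command used in the recipe. It is not Lars's actual script.

```shell
#!/bin/sh
# Sketch only: helper names and defaults are illustrative assumptions.

# Same normalization as the recipe: turn newlines, tabs, and spaces into
# spaces, squeeze each run down to one space, then hash what is left.
# Assumes GNU md5sum, whose output is "<hash>  -"; keep only the hash.
body_checksum() {
    tr -s '\n\t ' ' ' | md5sum | cut -d' ' -f1
}

# Keep only the newest MAX_LINES entries of a checksum cache.
# Usage: trim_cache FILE [MAX_LINES]
trim_cache() {
    cache=$1
    max=${2:-1000}
    # Nothing to do if the cache has not been created yet.
    [ -f "$cache" ] || return 0
    tmp="$cache.tmp.$$"
    # Write the tail to a temporary file first, then rename it over the
    # original so a reader never sees a half-written cache.
    tail -n "$max" "$cache" > "$tmp" && mv "$tmp" "$cache"
}
```

A cron-run script could source these functions and call something like `trim_cache "$HOME/.procmail/md5.cache" 1000`, where both the path and the limit are placeholders. Because procmail may append to the cache while the trim runs, a checksum written between the `tail` and the `mv` could be lost; guarding the trim with the `lockfile` utility that ships with procmail is one way to close that window.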