SA rules_dujour

Louis LeBlanc FreeBSD at
Thu Jul 7 18:58:06 GMT 2005

On 07/07/05 02:04 PM, Jean-Paul Natola sat at the `puter and typed:
> Hi everyone, 
> I have installed Spamassassin and rules_dujour
> I have NOT changed any settings , it is a vanilla install no config files or
> anything has been modified, yet  spam is coming right through
> Here's the header of one; I feel I'm missing something , I mean just by the
> subject I would think it would detect it--
> X-Spam-Score: 0.2 (/)
> X-Spam-Report: Spam detection software, running on the system "mfilter", has
> 	identified this incoming email as possible spam.  The original message
> 	has been attached to this so you can view it (if it isn't spam) or label
> 	similar future email.  If you have any questions, see
> 	the administrator of that system for details.
> 	Content preview:  Tuesday/Wednesday Sale! Fares from $49*. Tuesdays and
> 	Wednesdays just became your favorite days of the week! Take advantage
> 	of these great low fares and book your vacation today! Book on
> by 11:59PM EST on 7/6/05 for travel on Tuesdays and
> 	Wednesdays only from 7/12/05-9/28/05 unless otherwise noted. [...] 
> 	Content analysis details:   (0.2 points, 5.0 required)
> 	pts rule name              description

Note the score and the required score.  0.2 is a good long way from 5.0.
You may wish to take any saved spam and use it to teach SA what spam is,
because the Bayes learner is actually quite good at swaying this the
right way.  Also, if you're like me, you'll want to bump the required
score down.
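
To put numbers on that, here is a rough Python sketch (not part of SA itself) that pulls the score and threshold out of the "Content analysis details" line quoted above and checks a couple of lower thresholds. The function names and the trial thresholds 4.0 and 3.0 are made up for illustration; the tagging rule (score at or above the required score) is my understanding of how SA decides.

```python
import re

# Matches the summary line in the SA report, e.g.
# "Content analysis details:   (0.2 points, 5.0 required)"
DETAILS_RE = re.compile(
    r"\(\s*(-?\d+(?:\.\d+)?) points, (-?\d+(?:\.\d+)?) required\)"
)

def parse_details(report):
    """Return (score, required) as floats, or None if the line is absent."""
    m = DETAILS_RE.search(report)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))

def would_tag(score, required):
    """Assumed rule: a message is tagged once its score reaches the threshold."""
    return score >= required

report = "Content analysis details:   (0.2 points, 5.0 required)"
score, required = parse_details(report)

# Try the current threshold and two hypothetical lower ones.
for threshold in (required, 4.0, 3.0):
    print(f"threshold {threshold}: tagged={would_tag(score, threshold)}")
```

A 0.2 message stays untagged at any sensible threshold, so lowering the required score only helps with borderline mail; for a message like this one, training Bayes is what moves the score.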

When I was getting 200+ spams a day (some days over 300), SA was letting
through 2 or 3 a week.  Now I get 3 or 4 a week (I shut off the problem
domain for 6 months) and SA lets 1 or 2 through a week.  It's a numbers
game.  The more educated Bayes is, the smaller the percentage of false
negatives (FNs) is.

The problem is that Bayes won't kick in until you teach it with enough
spam - I don't remember the kick-in point offhand.  I've seen a message
get pushed through several different installations of SA (all the same
version and config) and come out with drastically different scores, all
because of Bayes.  The better systems are ALWAYS educated on a regular
basis.
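
For checking whether Bayes has kicked in yet, something like the Python sketch below may help. It parses text in the shape I recall `sa-learn --dump magic` printing and compares the learned counts against 200 spam and 200 ham, which as far as I remember are the defaults for `bayes_min_spam_num` and `bayes_min_ham_num`; treat both the line format and the numbers as assumptions and check your own install. Note that by that recollection Bayes wants ham as well as spam. The sample text is invented for illustration, not real output.

```python
import re

# Assumed defaults for bayes_min_spam_num / bayes_min_ham_num; verify locally.
MIN_SPAM = 200
MIN_HAM = 200

# Assumed line shape: four columns, then "non-token data: nspam" or "... nham".
# The third column is taken to be the count.
MAGIC_RE = re.compile(
    r"^\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (nspam|nham)\s*$"
)

def learned_counts(dump_text):
    """Extract the nspam/nham counts from dump text; missing lines count as 0."""
    counts = {"nspam": 0, "nham": 0}
    for line in dump_text.splitlines():
        m = MAGIC_RE.match(line.strip())
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts

def bayes_active(counts, min_spam=MIN_SPAM, min_ham=MIN_HAM):
    """True once both the spam and the ham counts reach their minimums."""
    return counts["nspam"] >= min_spam and counts["nham"] >= min_ham

# Invented sample, for illustration only.
sample = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        108          0  non-token data: nspam
0.000          0        332          0  non-token data: nham
"""

counts = learned_counts(sample)
print(counts, "active:", bayes_active(counts))
```

In this made-up sample there is plenty of ham but too little spam, so Bayes would still be sitting out.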

Also, since you're starting off, you'll get a lot of mileage out of the
SA list.

I use Maildir mailboxes on my system and when I learned how important
teaching Bayes is, I actually wrote a little perl script to check for
spam messages marked as read (Maildir/.spam/cur/*) and pipe them
automagically through the Bayes learner before moving them off to the
spam backup directory.  I also separate out the autolearned spam and
just push that off to the backup regardless of the read/unread status.
I think I've gotten about 2 FPs in 3 years of using SA.  Those FPs
really weren't spam, but they were all messages I didn't want to get
anyway, like chain letters or some other rubbish.  :)
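
For anyone who wants to try the same trick, here is a minimal Python sketch of the idea. It is not my Perl script, just the read-messages part of it redone under a few assumptions: messages marked as read carry the `S` flag after `:2,` in the Maildir filename, `sa-learn --spam <file>` is the learner, and the backup directory is whatever you pass in. It does not cover the separate handling of autolearned spam.

```python
import os
import shutil
import subprocess

def is_seen(filename):
    """True if the Maildir filename carries the S (seen) flag after ':2,'."""
    _, sep, flags = filename.rpartition(":2,")
    return bool(sep) and "S" in flags

def sa_learn_spam(path):
    """Feed one message file to the Bayes learner; True if it exited cleanly."""
    return subprocess.run(["sa-learn", "--spam", path]).returncode == 0

def sweep(spam_cur, backup_dir, learn=sa_learn_spam):
    """Learn each read message in spam_cur, then move it to backup_dir.

    Unread messages, and messages the learner failed on, are left in place.
    Returns the list of filenames that were moved.
    """
    os.makedirs(backup_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(spam_cur)):
        src = os.path.join(spam_cur, name)
        if not os.path.isfile(src) or not is_seen(name):
            continue
        if learn(src):
            shutil.move(src, os.path.join(backup_dir, name))
            moved.append(name)
    return moved

# Example call (both paths are placeholders; adjust to your layout):
# sweep(os.path.expanduser("~/Maildir/.spam/cur"),
#       os.path.expanduser("~/spam-backup"))
```

The `learn` argument exists so the sweep can be dry-run with a stand-in function before it is pointed at real mail; run from cron, it keeps Bayes fed without any manual step.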

