milter-regex doesn't seem to be miltering!
Mike Brown
mike at skew.org
Wed Sep 27 08:29:45 PDT 2006
Chris Martin wrote:
> I am trying to use milter-regex to pre-sort e-mail/spam before passing
> it on to clamav and spamassassin, but it doesn't seem to be working.
>
> Here are my first, slightly lame, rules:
>
> reject "Spam not welcome"
> header /Subject:/ /\b(PHA)+([a-zA-Z]+(RMA))\b/
>
> reject "Spam not welcome"
> header /Subject:/ /\b(PHA)+([a-zA-Z]+(RMACY))\b/
>
> discard
> header /Subject:/ /TESTSTRING45819203/
This isn't really the place to ask about it, but there's not really a better
forum, either. Maybe freebsd-questions. Anyway, lots of things could be going
wrong.

First, the obvious: is milter-regex running?

    # ps -auwwx | fgrep milter
    mailnull 34677 0.0 1.3 14772 6800 ?? Ss 28Aug06 38:12.65 /usr/local/libexec/milter-regex -c /usr/local/etc/milter-regex.conf

Did you follow the instructions in the port's pkg-install to set it up to
start at boot time? It involves editing /etc/rc.conf.local (or rc.conf) and
/etc/rc.local.

Did you set up logging? Make sure your /etc/syslog.conf contains lines like
the following:

    *.=debug /var/log/debug.log
    !milter-regex
    daemon.err;daemon.notice /var/log/maillog

and then 'kill -HUP `cat /var/run/syslog.pid`'. Now you should get copious
logs to look at. If your milter-regex.conf has errors, you should see a
message about it in maillog. In debug.log you should see everything the milter
is processing, up to the point where a rule is matched. I like to tail -f my
debug.log sometimes and see what gets through, and make sure I don't have any
false positives.
You might want to take a look at my milter-regex.conf:
http://skew.org/~mike/milter-regex.conf
In any case, you definitely have problems with your regexes. milter-regex uses
basic POSIX regular expressions by default, but you're using "+" to mean
1-or-more, so you need to append an "e" to the end to flag it as an 'extended'
POSIX regex. Your "\b" is presumably meant to be a word boundary, but that's a
feature of Perl-compatible regexes, not POSIX, so get rid of those.
Also, I'm not sure about what you're trying to match. (PHA)+ would match one
or more "PHA"s. The parentheses in ([a-zA-Z]+(RMA)) are not doing anything but
wasting memory; [a-zA-Z]+RMA would mean the same thing, matching 1 or more a-z
(case insensitive) followed by "RMA". If you want the "CY" at the end to be
optional, you'd add "(CY)?" instead of creating a new regex for it.
The colon isn't included in the header that gets tested, so you'll never match
with "Subject:". You want "Subject". But I prefer "^Subject$" because it
ensures that it matches only "Subject" and not something like
"X-Original-Subject".
Finally, if you have multiple rules, you can put them together under one
"reject" line. Again, see my milter-regex.conf for examples, and take note of
the comments therein... For example, I'm doing a lot of "reject"ing but
ultimately I think I want to discard spam, not reject it, in order to avoid
having the sending system generate a bounce that goes to the poor soul whose
email was used as the return address.
Mike
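
Added example, not part of the original message and not milter-regex's own code: a small Python linter that checks "header" rules for the mistakes the reply identifies (a colon in the header name, "\b", extended operators without the "e" flag) plus the unanchored-name caveat. It assumes "/" is the delimiter and only looks at "header" lines; the function names and the sample config are constructed for illustration.

```python
import re

EXPR = re.compile(r'/((?:\\.|[^/\\])*)/([a-z]*)')  # /regex/flags, '/' assumed as delimiter
BRACKET = re.compile(r'\[\^?\]?[^\]]*\]')          # POSIX bracket expression, e.g. [a-zA-Z]
ERE_ONLY = set('+?|(){}')                          # ordinary characters in a basic regex

def lint_expr(expr, flags):
    """Return warnings for one expression, per the points made in the reply."""
    out = []
    if r'\b' in expr:
        out.append(r'\b is a Perl-style word boundary, not POSIX')
    bare = re.sub(r'\\.', '', BRACKET.sub('', expr))  # drop brackets and escapes
    used = sorted(ERE_ONLY & set(bare))
    if used and 'e' not in flags:
        out.append('uses %s without the "e" (extended) flag' % ' '.join(used))
    return out

def lint_config(text):
    """Lint 'header /name/ /value/' lines; returns [(line_number, warning), ...]."""
    findings = []
    for no, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line.startswith('header'):
            continue
        exprs = EXPR.findall(line[len('header'):])
        if len(exprs) != 2:
            findings.append((no, 'expected two /regex/ arguments'))
            continue
        (name, nflags), (value, vflags) = exprs
        if ':' in name:
            findings.append((no, 'header name is tested without the colon'))
        elif not (name.startswith('^') and name.endswith('$')):
            findings.append((no, 'unanchored name also matches e.g. X-Original-Subject'))
        for expr, flags in ((name, nflags), (value, vflags)):
            findings += [(no, w) for w in lint_expr(expr, flags)]
    return findings

# Constructed sample: the first rule from the quoted message, then a rewrite
# that follows the reply's advice.
SAMPLE = r'''reject "Spam not welcome"
header /Subject:/ /\b(PHA)+([a-zA-Z]+(RMA))\b/
reject "Spam not welcome"
header /^Subject$/ /(PHA)+[a-zA-Z]+RMA(CY)?/e
'''

if __name__ == '__main__':
    for no, warning in lint_config(SAMPLE):
        print('line %d: %s' % (no, warning))
```

Running it on a config before reloading the milter flags rules that would load but never match.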