Non English Spam
Erik Norgaard
norgaard at locolomo.org
Sat Oct 14 05:39:36 PDT 2006
Beech Rintoul wrote:
> I'm getting a ton of spam every day that comes from China, Japan and Korea.
> Spam Assassin completely ignores it because it has all non-english characters
> and slows kmail to a crawl loading. Is there a way to filter on non-english
> either using Spam Assassin or procmail?
I get none after adding a few simple header check rules (PCRE syntax) to postfix:
# Accepted MIME headers: (ASCII, UTF-8 and ISO-8859-X)
/^Content-Type:.*?charset\s*=\s*"?(us-ascii|iso-8859-\d+|utf-8)"?/
    OK HDR2000 Accepted charset: $1
Strictly speaking, you could reject every other character set, but I chose
to make the rejections explicit:
# Reject specific character sets
# Chinese, Japanese and Korean
/^Content-Type:.*?charset\s*=\s*"?(Big5|gb2312|euc-cn)"?/
    REJECT HDR2100: Unaccepted character set: "$1"
/^Content-Type:.*?charset\s*=\s*"?(euc-kr|iso-2022-kr)"?/
    REJECT HDR2110: Unaccepted character set: "$1"
/^Content-Type:.*?charset\s*=\s*"?(iso-2022-\w+|euc-jp|shift_jis)"?/
    REJECT HDR2120: Unaccepted character set: "$1"
# Cyrillic character sets: Russian/Ukrainian
# (note: windows-1250 is Central European, not Cyrillic)
/^Content-Type:.*?charset\s*=\s*"?(koi8-(?:r|u))"?/
    REJECT HDR2200: Unaccepted character set: "$1"
/^Content-Type:.*?charset\s*=\s*"?(windows-(?:1250|1251))"?/
    REJECT HDR2210: Unaccepted character set: "$1"
And then you may want a catch-all rule for unknown character sets:
/^Content-Type:.*?charset\s*=\s*"?([\w-]+)"?/
    WARN HDR2299: Unknown character set: "$1"
You may change WARN to REJECT.
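
Before loading rules like these into a live mail server, it can help to try them against a few sample headers. The following is only a rough Python sketch of first-match evaluation, not how postfix itself works internally. The function name classify and the sample headers are made up for illustration, the catch-all is written as [\w-]+ so that it captures the whole charset name (an assumption about the intent), and case-insensitive matching is switched on to mirror the default of postfix PCRE tables.

```python
import re

# (pattern, action) pairs in the same order as the rules in this mail.
# The first pattern that matches a header line decides the action.
RULES = [
    (r'^Content-Type:.*?charset\s*=\s*"?(us-ascii|iso-8859-\d+|utf-8)"?', "OK"),
    (r'^Content-Type:.*?charset\s*=\s*"?(Big5|gb2312|euc-cn)"?', "REJECT"),
    (r'^Content-Type:.*?charset\s*=\s*"?(euc-kr|iso-2022-kr)"?', "REJECT"),
    (r'^Content-Type:.*?charset\s*=\s*"?(iso-2022-\w+|euc-jp|shift_jis)"?', "REJECT"),
    (r'^Content-Type:.*?charset\s*=\s*"?(koi8-(?:r|u))"?', "REJECT"),
    (r'^Content-Type:.*?charset\s*=\s*"?(windows-(?:1250|1251))"?', "REJECT"),
    # Catch-all (assumed form): anything that still declares a charset.
    (r'^Content-Type:.*?charset\s*=\s*"?([\w-]+)"?', "WARN"),
]

# Case-insensitive, like postfix PCRE tables by default.
COMPILED = [(re.compile(p, re.IGNORECASE), action) for p, action in RULES]


def classify(header_line):
    """Return (action, charset) for one header line, or (None, None)."""
    for regex, action in COMPILED:
        match = regex.search(header_line)
        if match:
            return action, match.group(1)
    return None, None


if __name__ == "__main__":
    samples = [
        'Content-Type: text/plain; charset="UTF-8"',
        "Content-Type: text/plain; charset=gb2312",
        'Content-Type: text/html; charset="iso-2022-jp"',
        "Content-Type: text/plain; charset=windows-1252",
        "Subject: hello",
    ]
    for line in samples:
        print(classify(line), line)
```

Because the first match wins, the order matters: the catch-all has to stay last, otherwise it would shadow the accept and reject rules.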
I have noticed, however, that some subscribers to this list write English
encoded in one of the above character sets. I don't know enough about
the character set definitions, but it seems that the English characters
are a subset of every character set?
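
On the subset question, one rough way to check is to encode a plain 7-bit English string with several of the character sets named above and compare the bytes with the US-ASCII encoding. This is a small Python sketch under stated assumptions: the codec names are Python's own spellings, the helper name same_bytes_as_ascii is made up, and the stateful ISO-2022 variants are left out because they can add escape sequences of their own.

```python
# Python codec names for some of the character sets named in the rules above.
CODECS = [
    "ascii", "iso8859_1", "utf_8",
    "koi8_r", "koi8_u", "cp1250", "cp1251",
    "big5", "gb2312", "euc_kr", "euc_jp", "shift_jis",
]


def same_bytes_as_ascii(text):
    """Map each codec name to True if it encodes text to the same bytes as ASCII."""
    reference = text.encode("ascii")
    return {name: text.encode(name) == reference for name in CODECS}


if __name__ == "__main__":
    for name, same in same_bytes_as_ascii("Plain English text, 7-bit only.").items():
        print(f"{name:10s} {'same bytes' if same else 'different bytes'}")
```

If the bytes come out identical for a given character set, a message written purely in English looks the same on the wire whichever of those labels it carries, so a filter on the Content-Type header alone cannot tell such a message from one that really uses the other characters.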
What is the recommended policy here? Should subscribers be advised to
change character set when posting to the list?
Cheers, Erik
--
Ph: +34.666334818 web: http://www.locolomo.org
X.509 Certificate: http://www.locolomo.org/crt/8D03551FFCE04F0C.crt
Key ID: 69:79:B8:2C:E3:8F:E7:BE:5D:C3:C3:B1:74:62:B8:3F:9F:1F:69:B9