Non English Spam

Erik Norgaard norgaard at locolomo.org
Sat Oct 14 05:39:36 PDT 2006


Beech Rintoul wrote:
> I'm getting a ton of spam every day  that comes from China, Japan and Korea. 
> Spam Assassin completely ignores it because it has all non-english characters 
> and slows kmail to a crawl loading. Is there a way to filter on non-english 
> either using Spam Assassin or procmail? 

I get none since adding a few simple header_checks rules (PCRE table 
syntax) to Postfix:

# Accepted MIME charsets: (ASCII, UTF-8 and ISO-8859-X)
# Note: in header_checks, Postfix treats OK the same as DUNNO.
/^Content-Type:.*?charset\s*=\s*"?(us-ascii|iso-8859-\d+|utf-8)"?/
     OK     HDR2000 Accepted charset: $1

Strictly speaking, you could reject every other character set, but I 
chose to list the rejected ones explicitly:

# Reject specific character sets
# Chinese, Japanese and Korean
/^Content-Type:.*?charset\s*=\s*"?(Big5|gb2312|euc-cn)"?/
     REJECT HDR2100: Unaccepted character set: "$1"
/^Content-Type:.*?charset\s*=\s*"?(euc-kr|iso-2022-kr)"?/
     REJECT HDR2110: Unaccepted character set: "$1"
/^Content-Type:.*?charset\s*=\s*"?(iso-2022-\w+|euc-jp|shift_jis)"?/
     REJECT HDR2120: Unaccepted character set: "$1"
# Cyrillic character sets: Russian/Ukrainian
/^Content-Type:.*?charset\s*=\s*"?(koi8-(?:r|u))"?/
     REJECT HDR2200: Unaccepted character set: "$1"
# Windows code pages: 1250 is Central European, 1251 is Cyrillic
/^Content-Type:.*?charset\s*=\s*"?(windows-(?:1250|1251))"?/
     REJECT HDR2210: Unaccepted character set: "$1"

Finally, you may want a catch-all rule for any other character set:

/^Content-Type:.*?charset\s*=\s*"?([\w.:-]+)"?/
     WARN   HDR2299: Unknown character set: "$1"

WARN only logs the match; you may change WARN to REJECT to refuse such 
mail outright.
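
Before loading rules like these into a live mail server, it can help to 
try them against a few sample headers. The following is only a rough 
Python sketch of the first-match behaviour, not Postfix itself: the 
names RULES and check_header are made up for illustration, the 
catch-all pattern is assumed to capture the whole charset token, and 
re.IGNORECASE stands in for the case-insensitive matching that Postfix 
pcre tables use by default.

```python
import re

# Rules in table order: (pattern, action). As in a Postfix pcre table,
# the first pattern that matches a header line decides the action.
PREFIX = r'^Content-Type:.*?charset\s*=\s*"?'
RULES = [
    (PREFIX + r'(us-ascii|iso-8859-\d+|utf-8)"?', "OK"),
    (PREFIX + r'(Big5|gb2312|euc-cn)"?', "REJECT"),
    (PREFIX + r'(euc-kr|iso-2022-kr)"?', "REJECT"),
    (PREFIX + r'(iso-2022-\w+|euc-jp|shift_jis)"?', "REJECT"),
    (PREFIX + r'(koi8-(?:r|u))"?', "REJECT"),
    (PREFIX + r'(windows-(?:1250|1251))"?', "REJECT"),
    # Assumed catch-all: captures the full charset name, not one character.
    (PREFIX + r'([\w.:-]+)"?', "WARN"),
]


def check_header(line):
    """Return (action, charset) for the first matching rule, else None."""
    for pattern, action in RULES:
        match = re.search(pattern, line, re.IGNORECASE)
        if match:
            return action, match.group(1)
    return None


if __name__ == "__main__":
    for header in (
        'Content-Type: text/plain; charset="UTF-8"',
        "Content-Type: text/html; charset=big5",
        "Content-Type: text/plain; charset=windows-1252",
        "Content-Type: text/plain",
    ):
        print(header, "->", check_header(header))
```

Note that with this rule order windows-1252, which is common in 
Western European mail, falls through to the catch-all, so it is worth 
watching the log for a while before switching WARN to REJECT.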

I have noticed, however, that some subscribers to this list write 
English encoded in one of the above character sets. I don't know enough 
about the character set definitions, but it seems that the English 
(ASCII) characters are a subset of nearly every character set?
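
One way to check the subset question for the byte-oriented character 
sets above is to encode the same English text in each and compare the 
bytes with plain ASCII. This is a small sketch using Python's built-in 
codecs; SAMPLE, CHARSETS and same_as_ascii are illustrative names, and 
the sample deliberately sticks to letters, digits and spaces. The 
escape-sequence based iso-2022-* encodings are left out, and as far as 
I know a few punctuation characters (backslash and tilde in some 
Shift_JIS variants, for example) are not guaranteed to match.

```python
# English text limited to letters, digits and spaces.
SAMPLE = "Plain English text with digits 0123456789"

# Python codec names for several of the charsets rejected above.
CHARSETS = [
    "big5", "gb2312", "euc_kr", "euc_jp", "shift_jis",
    "koi8_r", "koi8_u", "cp1250", "cp1251",
]


def same_as_ascii(text, charset):
    """True if text encodes to the same bytes in charset as in ASCII."""
    return text.encode(charset) == text.encode("ascii")


results = {name: same_as_ascii(SAMPLE, name) for name in CHARSETS}

if __name__ == "__main__":
    for name, identical in results.items():
        print(f"{name:10s} identical to ASCII: {identical}")
```

If the bytes are identical, a message written only in English but 
labelled with one of these charsets is still perfectly readable, which 
is why rejecting on the charset label alone can catch legitimate list 
mail.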

What is the recommended policy here? Should subscribers be advised to 
change character set when posting to the list?

Cheers, Erik
-- 
Ph: +34.666334818                      web: http://www.locolomo.org
X.509 Certificate: http://www.locolomo.org/crt/8D03551FFCE04F0C.crt
Key ID: 69:79:B8:2C:E3:8F:E7:BE:5D:C3:C3:B1:74:62:B8:3F:9F:1F:69:B9


More information about the freebsd-questions mailing list