HowTos/SpamAssassin

From Scalix Wiki
Revision as of 07:43, 21 May 2009 by Putztzu (Talk | contribs) (Installation)


Introduction

SpamAssassin is an Open Source package available for RedHat and SuSE Linux. "Client" processes communicate with a daemon (spamd) to check a message. In most cases, the client hands the daemon a complete message, and the daemon returns it with a series of added header lines that indicate the "spamminess" of the contents. For more information about SpamAssassin, refer to the Apache Foundation website: http://spamassassin.apache.org.

Installation

Download and install sendmail-devel and spamass-milter RPMs. These RPMs are readily available on the Internet and can be easily located using one of the rpmfind web sites.

If you are using Scalix 11, you will already have installed the sendmail-devel package.

If you are using Debian 4.0 Etch, you can install the packages with:

apt-get install spamassassin spamass-milter

Using SLES 10.2 (msaavedra@pixelcom.ch):
[Update 05/20/2009 by tonysu@su-networking.com]
Install the following prerequisites from the SuSE sources before anything else:

sendmail-devel
C++ GNU compiler
make

The updated download URL is:
http://mirror.its.uidaho.edu/pub/savannah/spamass-milt/spamass-milter-0.3.1.tar.gz

For SUSE 10.2, download the milter, unpack the tarball, and install spamass-milter-x.y.z:

(Download Area at http://download.savannah.nongnu.org/releases/spamass-milt/)

wget http://download.savannah.nongnu.org/releases/spamass-milt/spamass-milter-0.3.1.tar.gz
tar xvf spamass-milter-0.3.1.tar.gz
cd spamass-milter-0.3.1/ 
./configure
make
make install
cd /usr/local/sbin/
ls -la
spamass-milter -p /var/run/spamass.sock -D localhost -x

After getting the following error: "Milter (spamassassin): error connecting to filter: Connection refused by /var/spamass/spamass.sock", I changed the socket path and it worked:

spamass-milter -p /var/run/spamass-milter.sock -f

See the result with:

tail -f /var/log/mail.xyz [whichever log file you want to watch]

Scalix Configuration

Scalix can now be configured to filter mail through Sendmail, and in turn SpamAssassin, by adding one option to the smtpd.cfg configuration file. The Linux commands to do this are as follows.

Make a copy of the current configuration file.

Scalix 10

cp /var/opt/scalix/sys/smtpd.cfg /var/opt/scalix/sys/smtpd.cfg.orig

Scalix 11

cp /var/opt/scalix/NN/s/sys/smtpd.cfg /var/opt/scalix/NN/s/sys/smtpd.cfg.orig

--> NN is the first and last characters of your hostname.
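
The NN rule above can be sketched as a small shell helper. This is only an illustration: the function name scalix_nn is made up for this page, and stripping the domain part before taking the characters is an assumption, so check the result against the directory that actually exists under /var/opt/scalix.

```shell
# Hypothetical helper (not part of Scalix): print the "NN" directory
# component, i.e. the first and last characters of the hostname.
scalix_nn() {
    host=${1%%.*}                                        # assumption: use the short hostname only
    first=$(printf '%s' "$host" | cut -c1)               # first character
    last=$(printf '%s' "$host" | sed 's/.*\(.\)$/\1/')   # last character
    printf '%s%s\n' "$first" "$last"
}

# Example use: build the Scalix 11 path to smtpd.cfg for this machine.
# cfg="/var/opt/scalix/$(scalix_nn "$(hostname)")/s/sys/smtpd.cfg"
```

For a host called mailhost this prints mt, giving /var/opt/scalix/mt/s/sys/smtpd.cfg.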

Open the configuration file with your favorite text editor. This example uses vi.

Scalix 10

vi /var/opt/scalix/sys/smtpd.cfg

Scalix 11

vi /var/opt/scalix/NN/s/sys/smtpd.cfg

Add this line:

SMTPFILTER=TRUE

above the lines beginning with:

RELAY accept 127.0.0.1

Save the file.
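
If you prefer to script the edit rather than use vi, the following is a minimal sketch of the same change. The function name add_smtpfilter is hypothetical, and the sketch assumes the file contains a line starting with "RELAY accept 127.0.0.1"; if it does not, the file is left untouched.

```shell
# Hypothetical helper: insert SMTPFILTER=TRUE above the first
# "RELAY accept 127.0.0.1" line of the smtpd.cfg file given as $1.
add_smtpfilter() {
    cfg=$1
    # Nothing to do if the option is already present.
    if grep -q '^SMTPFILTER=' "$cfg"; then
        return 0
    fi
    # Refuse to guess a position if the expected RELAY line is missing.
    if ! grep -q '^RELAY accept 127\.0\.0\.1' "$cfg"; then
        echo "no 'RELAY accept 127.0.0.1' line found in $cfg" >&2
        return 1
    fi
    cp "$cfg" "$cfg.orig" || return 1    # same backup name as the manual step
    awk '
        /^RELAY accept 127\.0\.0\.1/ && !seen { print "SMTPFILTER=TRUE"; seen = 1 }
        { print }
    ' "$cfg.orig" > "$cfg"
}

# Example (Scalix 10 path):
# add_smtpfilter /var/opt/scalix/sys/smtpd.cfg
```

Running the helper a second time changes nothing, because the first check finds the existing SMTPFILTER line.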

Sendmail Configuration

Please note that depending upon distribution, the socket file that is used for communication between spamass-milter and sendmail may not match the instructions below. To determine the correct socket file name, look at the file /etc/sysconfig/spamass-milter

Scalix 10

Back up sendmail.cf:

cp /etc/mail/sendmail.cf /etc/mail/sendmail.cf.orig

If you are using SuSE, the file will be /etc/sendmail.cf

Uncomment the line:

#O InputMailFilters

and change it to:

O InputMailFilters=spamassassin

Immediately below that line, add the following:

# Milter options
#O Milter.LogLevel
O Milter.macros.connect=b, j, _, {daemon_name}, {if_name}, {if_addr}
O Milter.macros.helo={tls_version}, {cipher}, {cipher_bits}, {cert_subject}, {cert_issuer}
O Milter.macros.envfrom=i, {auth_type}, {auth_authen}, {auth_ssf}, {auth_author}, {mail_mailer}, {mail_host}, {mail_addr}
O Milter.macros.envrcpt={rcpt_mailer}, {rcpt_host}, {rcpt_addr}

In the section MAIL FILTER DEFINITIONS, add the following line:

Xspamassassin, S=local:/var/run/spamass.sock, F=, T=C:15m;S:4m;R:4m;E:10m
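
Because the socket named in this line must match the one spamass-milter was started with (its -p option), it can help to read the path back out of sendmail.cf and check it. The helper below is a sketch only; filter_socket is a made-up name, and it assumes the filter is called spamassassin and uses a local: socket, as in the line above.

```shell
# Hypothetical helper: print the socket path from the Xspamassassin
# filter definition in the sendmail.cf file given as $1.
filter_socket() {
    sed -n 's/^Xspamassassin,.*S=local:\([^,]*\),.*/\1/p' "$1"
}

# Example: warn if the socket does not exist yet (milter not running).
# sock=$(filter_socket /etc/mail/sendmail.cf)
# [ -S "$sock" ] || echo "socket $sock is missing; is spamass-milter running?"
```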

Scalix 11, or if you are using m4 macros

First, check whether a sendmail.mc file exists (openSUSE 10.X doesn't have this file, as sendmail is configured by YaST by default). If sendmail.mc exists, continue with Step 1; otherwise jump to Step 2:

1. sendmail.mc exists

You can add the following line to the end of your sendmail.mc file

INPUT_MAIL_FILTER(`spamassassin',`S=local:/var/run/spamass.sock, F=, T=C:15m;S:4m;R:4m;E:10m')dnl

Note that each quoted string begins with a ` and ends in a '.

2. sendmail.mc does NOT exist

Some preparation is needed first. To begin, stop YaST/SuSEconfig from generating the sendmail.cf file.

To do that, in /etc/sysconfig/mail set:

MAIL_CREATE_CONFIG="no"

Next, back up /etc/sendmail.cf and /etc/mail/linux.mc:

# cp /etc/sendmail.cf /etc/sendmail.cf.backup
# cp /etc/mail/linux.mc /etc/mail/linux.mc.backup

Afterwards, edit /etc/mail/linux.mc, adding the following line at the end of the file:

INPUT_MAIL_FILTER(`spamassassin',`S=local:/var/run/spamass.sock, F=, T=C:15m;S:4m;R:4m;E:10m')dnl

[edit by tonysu@su-networking.com] The above new line entry in linux.mc should be prepended by a "dnl" like all other entries in the file

Note that each quoted string begins with a ` and ends in a '.

For all versions:

Now it's time to regenerate the sendmail.cf file by doing:

sudo sh -c "m4 /etc/mail/linux.mc > /etc/sendmail.cf"

If the error "m4: INTERNAL ERROR: recursive push_string!" appears, add a new empty line to the end of /etc/mail/linux.mc and regenerate the sendmail.cf file; it should then work.
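
One reading of the advice above is that the .mc file must end with a newline. Under that assumption, the following sketch appends one only when it is missing; ensure_trailing_newline is a hypothetical helper name.

```shell
# Hypothetical helper: make sure the file given as $1 ends with a newline.
ensure_trailing_newline() {
    f=$1
    [ -s "$f" ] || return 0            # empty or missing file: nothing to fix
    # $(...) strips a trailing newline, so a non-empty result means the
    # last byte of the file is some other character.
    if [ -n "$(tail -c 1 "$f")" ]; then
        printf '\n' >> "$f"
    fi
}

# Example, followed by the regeneration command from above:
# ensure_trailing_newline /etc/mail/linux.mc
# sh -c "m4 /etc/mail/linux.mc > /etc/sendmail.cf"
```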

Configure spamassassin and spamass-milter to start on boot

To ensure that your spamassassin daemon is running after a reboot, use the commands:

chkconfig --add spamassassin
chkconfig --add spamass-milter
chkconfig --level 345 spamassassin on
chkconfig --level 345 spamass-milter on
/etc/init.d/spamassassin start
/etc/init.d/spamass-milter start

Under SuSE, spamd is configured by default not to apply any rules that require Internet access (such as Pyzor or blocklist lookups). To fix this, edit /etc/sysconfig/spamd, look for the following line, and remove the "-L" switch:

 SPAMD_ARGS="-d -c -L"
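
The same edit can be scripted. This is a rough sketch, not a tested SuSE tool: strip_local_only is a hypothetical name, and it only handles a -L that follows another option inside the quotes, as in the line shown above.

```shell
# Hypothetical helper: remove the -L switch from the SPAMD_ARGS line
# of the sysconfig file given as $1.
strip_local_only() {
    cfg=$1
    tmp=$(mktemp) || return 1
    # Only touch the SPAMD_ARGS line; drop " -L" when it is followed by
    # a space or the closing quote.
    sed '/^[[:space:]]*SPAMD_ARGS=/ s/ -L\([ "]\)/\1/' "$cfg" > "$tmp" &&
        cat "$tmp" > "$cfg"            # cat keeps the original file's permissions
    rm -f "$tmp"
}

# Example:
# cp /etc/sysconfig/spamd /etc/sysconfig/spamd.orig
# strip_local_only /etc/sysconfig/spamd
```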

For SUSE 10.2, use spamd instead of spamassassin:

-bash: chkconfig --add spamd
insserv: Warning, current runlevel(s) of script `spamd' overwrites defaults.
spamd                     0:off  1:off  2:off  3:on   4:on   5:on   6:off
-bash: chkconfig --level 345 spamd on
-bash: /etc/init.d/spamd start
(To start the SpamAssassin daemon in daemon mode by hand, type: spamd -d)

On a Debian 4.0 Etch installation, the links for starting spamassassin and spamass-milter are set by default. However, spamassassin won't start until you enable it. To do so, edit the /etc/default/spamassassin file

vi /etc/default/spamassassin

Change the option ENABLED=0 to ENABLED=1.

Restart sendmail

Use the command

/etc/init.d/sendmail restart

SUSE 10.2 error-message if spamass-milter isn't installed

Initializing SMTP port (sendmail)WARNING:
Xspamassassin: local socket name /var/run/spamass.sock missing
# ps aux | grep spam

Restart the Scalix SMTP Relay

Use the commands

omoff -w -d 0 smtpd
omon smtpd

Confirm mail is being scanned

Use the command

tail -f /var/log/mail.log

A working SpamAssassin configuration should produce this type of output in the log file:

Nov 3 09:39:56 scal4 sendmail[27547]: jA3Hdueo027547: from=<Kent.Brake@scalix.com>, size=2089, class=0, nrcpts=1,  msgid=<H00000b60014d0c8.1131039536.hagrid.scalix.local@MHS>, proto=ESMTP, daemon=MTA, relay=localhost [127.0.0.1]
Nov 3 09:39:56 scal4 spamd[24498]: connection from localhost [127.0.0.1] at port 59807
Nov 3 09:39:56 scal4 spamd[24498]: info: setuid to root succeeded
Nov 3 09:39:56 scal4 spamd[24498]: Still running as root: user not specified with -u, not found, or set to root. Fall back to nobody.
Nov 3 09:39:56 scal4 spamd[24498]: processing message <H00000b60014d0c8.1131039536.hagrid.scalix.local@MHS> for root:65534.
Nov 3 09:39:56 scal4 spamd[24498]: clean message (-1.0/5.0) for root:65534 in 0.1 seconds, 2338 bytes.
Nov 3 09:39:56 scal4 spamd[24498]: result: . -1 - ALL_TRUSTED,WEIRD_QUOTING scantime=0.1,size=2338,mid=<H00000b60014d0c8.1131039536.hagrid.scalix.local@MHS>,autolearn=failed
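
To pick just the verdict and score out of a busy log, a filter along these lines may help. It is a sketch that assumes the spamd line layout shown in the sample above ("<verdict> (<score>/<required>) for <user>"); spamd_verdicts is a hypothetical name.

```shell
# Hypothetical helper: read mail log lines on stdin and print
# "<verdict> <score> <required>" for each spamd verdict line.
spamd_verdicts() {
    sed -nE 's/.*spamd\[[0-9]+\]: ([a-z ]+) \((-?[0-9.]+)\/([0-9.]+)\) for .*/\1 \2 \3/p'
}

# Example:
# tail -f /var/log/mail.log | spamd_verdicts
```

For the sample log above this prints a single line: clean message -1.0 5.0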

For SUSE 10.2:

tail -f /var/log/mail
tail -f /var/log/mail.error

Configure spamassassin (header and score)

for SUSE 10.2

If you need to change the score for system-wide processing, edit one of the following files:

/etc/mail/spamassassin/local.cf

or

/etc/spamassassin/local.cf

Add your own customisations to this file.

# rewrite_header Subject ****SPAM(_SCORE_)****
rewrite_header Subject **** your company ANTI-SPAM(_SCORE_)****
# Set the score required before a mail is considered spam.
# For example, change "required_score 3.00" to:
required_score 5.00

ALWAYS lint your rules:

spamassassin --lint

If the --lint output doesn't give you enough information, use:

spamassassin --lint -D

The Care and Feeding of your Bayes

This was initially contributed by Leigh. Many thanks.

Spamassassin's Bayesian database needs a balanced supply of both Spam and Ham in order to function properly.
By feeding in false positives as Ham, and feeding false negatives as spam, we can keep the bayes database up to date.
Spamassassin also provides a facility to report spam to various anti-spam sites such as Razor, Pyzor and SpamCop.
Using the mboxadmin facility of Scalix, we can automate this task quite easily.
However, we need to be careful about what we feed into the bayes. We can't always trust our users to put spam into the right folders, and we can't expect them to hand-feed ham into our bayes. Many people use a public folder for their spam. This allows everyone to dump their false negatives into a single folder, which is then automatically fed into the bayes. Unfortunately, this doesn't allow for feeding it ham as well, and bayes needs a balanced diet. The other problem with public folders is that they are just that: public. We can't expect users to place ham into a public folder for all to see.
Here is a method for ensuring your bayes gets fed a proper balanced diet, and only spam gets fed in as spam, and only ham gets fed in as ham.

Requirements

Perl
perl-Mail-IMAPClient (probably available in your package manager)

Before using the tool, a user must exist which has the mboxadmin capability. To add this capability to a user, one can use ommodu. For example:

ommodu -o TestUser -c +mboxadmin

Set up two cron jobs on your server. Run this script every hour:

#!/usr/bin/perl
use strict;
use warnings;
use Mail::IMAPClient;
my $host="your_mail_server_ip";
my $username="mboxadmin_user_name";
my $password="mboxadmin_password";
my @real_users=`/opt/scalix/bin/omshowu -m all -i`;	# get all real user names.
foreach my $punter (@real_users)			# Loop over them all.
{
	chomp $punter;					# Remove trailing newline.
	print "$punter\n";				# Some output. Feel free to remove.
	my $user="mboxadmin:$username:$punter";		# Set up superuser login.
	my $imap  = new Mail::IMAPClient( 'Server' => $host , 'User' => $user , 'Password' => $password  ) or next;	# connect to server.
	my @folders=$imap->folders;			# list folders.
	foreach  my $folder (@folders)			# Look through each of them.
	{
                if (lc($folder) eq "junk e-mail")							      		# "junk email" folder.
                {
                        print "Found a spam folder: $folder\n";
                       $imap->select($folder) or next;                                                                  # Select the folder.
                        print "Folder $folder selected.\n";
                        my @list=$imap->messages or next;                                                              # List all messages in folder.
                        print scalar(@list)." messages in folder.\n";
                        foreach my $msg (reverse(@list))                                                                # Loop over them all.
                        {
                                my @email=$imap->fetch($msg,'RFC822');                                                  # Fetch message.
                                open (SALEARN,"|/usr/bin/spamassassin -d | /usr/bin/sa-learn --spam") or print "$!\n";  # Feed to sa-learn.
                                print SALEARN "$email[1]";
                                close SALEARN;
                                open (REPORT,"|/usr/bin/spamassassin -d | /usr/bin/spamassassin -r") or print "$!\n";   # Report it. (SpamCop and Pyzor).
                                print REPORT "$email[1]";
                                close REPORT;
                                $imap->delete_message($msg) or next;                                                    # Delete it.
                        }
                        $imap->expunge($folder) or next;                                                                #Expunge folder.
                }
	}
}





And this one every week:

#!/usr/bin/perl
use strict;
use warnings;
use Mail::IMAPClient;
my $host="your_server_ip_address";
my $username="mboxadmin_user_name";
my $password="mboxadmin_password";
my @real_users=`/opt/scalix/bin/omshowu -m all -i`;	# get all real user names.
foreach my $punter (@real_users)			# Loop over them all.
{
	chomp $punter;					# Remove trailing newline.
	print "$punter\n";				# Some output. Feel free to remove.
	my $user="mboxadmin:$username:$punter";		# Set up superuser login.
	my $imap  = new Mail::IMAPClient( 'Server' => $host , 'User' => $user , 'Password' => $password  ) or next;	# connect to server.
	my @folders=$imap->folders;			# list folders.
	foreach  my $folder (@folders)			# Look through each of them.
	{
		if (lc($folder) eq "inbox")		# "Inbox" is guaranteed to only have ham in it.
		{
			print "Inbox found.\n";		# Some debug output.
			$imap->select($folder) or next;	# Select folder.
			print "Folder $folder selected.\n";
			my @list=$imap->seen or next;	# Get only messages which have been read. Saves the possibility of reading in false positives. Also stops us interfering with people's mail.
			print scalar(@list)." messages in folder.\n";
			my $counter=0;			# Initialise counter. - we don't want the entire inbox.
			foreach my $msg (reverse(@list))		# Loop over each message.
			{
				my @email=$imap->fetch($msg,'RFC822');	# Fetch it.
				open (SALEARN,"|/usr/bin/spamassassin -d | /usr/bin/sa-learn --ham") or next;		# Feed it to sa-learn. 
				print SALEARN "$email[1]\n";
				close SALEARN;
				$counter +=1;		# Increment counter.
				last if ($counter>100); # We only want 100 messages.
			}
		}
		elsif (lc($folder) eq "possible spam") 									# "Possible Spam" folder.
		{
			print "Found a spam folder: $folder\n";
                       $imap->select($folder) or next;									# Select the folder.
                        print "Folder $folder selected.\n";
			my $lastweek=time()-604800;									# Get timestamp for this time last week.
			my @list = $imap->before($lastweek) or next; 							# List all messages older than that.
                        print scalar(@list)." messages in folder.\n";
                        foreach my $msg (reverse(@list))								# Loop over them all.
                        {
                                my @email=$imap->fetch($msg,'RFC822');							# Fetch message.
                               	open (SALEARN,"|/usr/bin/spamassassin -d | /usr/bin/sa-learn --spam") or print "$!\n";	# Feed to sa-learn.
                               	print SALEARN "$email[1]";
                               	close SALEARN;
				open (REPORT,"|/usr/bin/spamassassin -d | /usr/bin/spamassassin -r") or print "$!\n";	# Report it. (SpamCop and Pyzor).
				print REPORT "$email[1]";
				close REPORT;
				$imap->delete_message($msg) or next;							# Delete it.
                        }
			$imap->expunge($folder) or next;								#Expunge folder.
		}
		elsif(lc($folder) eq "non-spam")
		{
                       $imap->select($folder) or next;                                                                  # Select the folder.
                        print "Folder $folder selected.\n";
                        my @list=$imap->messages or next;                                                              # List all messages in folder.
                        print scalar(@list)." messages in folder.\n";
                        foreach my $msg (reverse(@list))                                                                # Loop over them all.
                        {
                                my @email=$imap->fetch($msg,'RFC822');                                                  # Fetch message.
                                open (SALEARN,"|/usr/bin/spamassassin -d | /usr/bin/sa-learn --forget") or print "$!\n";# Sa-learn forget this message if already seen.
                                print SALEARN "$email[1]";
                                close SALEARN or print "$!\n";
                                open (SALEARN,"|/usr/bin/spamassassin -d | /usr/bin/sa-learn --ham") or next;          # Feed to sa-learn as ham.
                                print SALEARN "$email[1]";
                                close SALEARN;
                        }
 
		}
                elsif (lc($folder) eq "spam")					                                      # "spam"  folder.
                {
                        print "Found a spam folder: $folder\n";
                       $imap->select($folder) or next;                                                                  # Select the folder.
                        print "Folder $folder selected.\n";
                        my $lastweek=time()-604800;                                                                     # Get timestamp for this time last week.
                        my @list = $imap->before($lastweek) or next;                                                    # List all messages older than that.
                        print scalar(@list)." messages in folder.\n";
                        foreach my $msg (reverse(@list))                                                                # Loop over them all.
                        {
                        	my $subject=$imap->subject($msg);                                                       # Fetch subject for message.
                                my @email=$imap->fetch($msg,'RFC822');                                                  # Fetch message.
                                unless ($subject=~m/\[SPAM\]/)
				{
					print "Learning message with subject: $subject\n";
                                        open (SALEARN,"|/usr/bin/spamassassin -d | /usr/bin/sa-learn --spam") or print "$!\n";  # Feed to sa-learn.
                        	        print SALEARN "$email[1]";
                                       	close SALEARN;
				}
                                open (REPORT,"|/usr/bin/spamassassin -d | /usr/bin/spamassassin -r") or print "$!\n";   # Report it. (SpamCop and Pyzor).
                                print REPORT "$email[1]";
                                close REPORT;
                                $imap->delete_message($msg) or next;                                                    # Delete it.
                        }
                        $imap->expunge($folder) or next;                                                                #Expunge folder.
                }
 
	}
}
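
The hourly and weekly schedules for the two scripts above could be expressed as cron entries like the ones this sketch prints. Everything specific here is an assumption: the script paths, the run times, running as root, and the /etc/cron.d file name are placeholders to adapt to your site.

```shell
# Hypothetical locations of the two Perl scripts shown above.
HOURLY_SCRIPT=/usr/local/sbin/sa-learn-junk.pl
WEEKLY_SCRIPT=/usr/local/sbin/sa-learn-weekly.pl

# Print entries in /etc/cron.d format:
# minute hour day-of-month month day-of-week user command
bayes_cron_entries() {
    printf '%s\n' \
        "5 * * * * root $HOURLY_SCRIPT >/dev/null 2>&1" \
        "30 3 * * 0 root $WEEKLY_SCRIPT >/dev/null 2>&1"
}

# Example: install the entries (file name is a placeholder).
# bayes_cron_entries > /etc/cron.d/scalix-bayes
```

The first entry runs at five past every hour; the second runs at 03:30 every Sunday.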


The first script, run every hour, checks each user's "junk email" folder. Each message it finds has its spamassassin headers removed and is fed to sa-learn as spam. It is then submitted to spamassassin's reporting facility to be reported to SpamCop, Pyzor, etc.
For the second script to be effective, we need to set up some rules and educate our users a little.
To be as aggressive as possible with spam, but also as safe as possible, set up two folders for each user: "Spam" and "Possible Spam". Each user then needs two server-side rules: All mail marked as spam by spamassassin goes into the "spam" folder, and all mail not marked as spam, but with a score above 3, goes into the "possible spam" folder. Also create a "non-spam" folder for each user. This is where they are to place copies of legitimate email which gets incorrectly tagged as spam.
Our second script then does the following:
Each user's inbox is scanned. The newest 100 messages which have already been read are fed to sa-learn as ham. This assumes that nobody is going to read a piece of spam and then leave it in their inbox. If they do, they deserve to get more spam, quite frankly.
Each user's "possible spam" folder is also read. Messages which are older than a week are fed to sa-learn as spam and then deleted. There is no point reporting these after they are a week old. If a user does not check their possible spam folder each week, they risk losing mail. This gives them the incentive to keep an eye on it.
The "non-spam" folder is also checked. Anything in here is fed to sa-learn as well. First, sa-learn is told to un-learn this message, in case auto-learn has already classified it as spam, and then it is learnt as ham.
The "spam" folder is then checked. It is handled almost the same way as the "possible spam" folder, except that anything which has already been tagged as spam by spamassassin is not fed to sa-learn again. This gives people the option of placing spam in the "spam" folder, which is a little more intuitive for them. Also, those not running Outlook 2K3 or later may not have a "junk email" folder. To check whether a message has already been tagged as spam, this script looks at the message subject. If it contains "[SPAM]", the message is not learnt; it is only reported and deleted. Feeding lots of messages into your bayes database which have this tag in the header could do more harm than good. Spamassassin may start to think that all spam has the tag "[SPAM]" in its subject, and down-grade any messages which don't.
One thing about mboxadmin is worth noting. As of Scalix 10, an mboxadmin user cannot access the mailbox of another mboxadmin user. This means you must not have any other mboxadmins on your system, or our scripts will not be able to read their mail.

The 100-message limit on the inbox can be changed to suit your site by altering this line:

				last if ($counter>100); # We only want 100 messages.

Season to taste.
Please note: If the setup described above makes a mess of your Bayes, I will not be held responsible. Use this method at your own risk. Make sure you understand the requirements and use your own judgement. Back up your Bayes database first.

Below Contributed by Mike Lee on 11-9-2006

I recommend using a username that does not belong to a real user who will be checking email. The reason is that if you use a real user's name and grant it mboxadmin permissions via ommodu username -c +mboxadmin, that user's mailbox will not get checked. This has been my experience with this script. Therefore use an account such as admin@yourdomain.com as the mboxadmin, provided you do not receive email at that account or do not care about being unable to search for spam in it.


Below Contributed by Mark Nikkels on 17-4-2009

If your spamassassin subject headers are NOT being rewritten as per your config file, check the following.

Make sure your spamassassin config file contains the following:

# Whether to change the subject of suspected spam
rewrite_subject 1

# Text to prepend to subject if rewrite_subject is used
rewrite_header Subject ****SPAM****

If you're using spamass-milter, make sure it is not started with the -m flag (on Red Hat systems it is by default).

Edit your /etc/init.d/spamass-milter file and change

EXTRA_FLAGS="-m -r 15"

to

EXTRA_FLAGS="-r 15"

The -m flag tells spamass-milter NOT to modify any header or body information, so remove it and you should be fine.
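
A quick way to check for the flag without opening the file is sketched below. The function name milter_may_rewrite is hypothetical, and the check assumes the flags are set on a single EXTRA_FLAGS= line as shown above.

```shell
# Hypothetical helper: succeed (exit status 0) when the EXTRA_FLAGS line
# in the file given as $1 does NOT contain the -m flag.
milter_may_rewrite() {
    ! grep -Eq '^[[:space:]]*EXTRA_FLAGS=.*[" ]-m([" ]|$)' "$1"
}

# Example:
# milter_may_rewrite /etc/init.d/spamass-milter ||
#     echo "spamass-milter is started with -m; subjects will not be rewritten"
```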