Training spamassassin using sa-learn

Discuss the Scalix Server software


MDAFederal
Posts: 57
Joined: Fri Feb 10, 2006 2:49 pm
Location: Rockville, MD

Post by MDAFederal » Thu Feb 23, 2006 1:39 pm

Is this using a global Bayes database under root, or is there a per-user Bayes database stored in each user's home directory or virtual home directory? I currently have a generic Sendmail setup running SpamAssassin 2.64 with procmail, and I can get it to save the information on a per-user basis. With Scalix and SpamAssassin 3.1.0, I can only get it to save a global Bayes database and user_prefs. I would prefer to set it up on a per-user basis. Any thoughts or suggestions?
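A minimal sketch of one per-user approach. It assumes sa-learn's `--dbpath` option, which overrides the `bayes_path` setting (confirm with `sa-learn --help` on your version), and a hypothetical layout with one database per user under /var/spool/spamassassin/bayes. The `learn_cmd` helper is illustrative only and is not part of SpamAssassin or Scalix. Without `--dbpath`, sa-learn falls back to the default `bayes_path` of `~/.spamassassin/bayes`, so running it as each user is the other way to keep the data separate.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical layout: one Bayes database per mailbox owner under $BASE.
# Adjust to wherever your per-user (or virtual-user) data should live.
my $BASE = '/var/spool/spamassassin/bayes';

# Build the sa-learn command for one user. --dbpath overrides bayes_path,
# so each user trains a separate database instead of the global one.
# The value is a path plus file prefix (e.g. .../jsmith/bayes -> bayes_toks).
sub learn_cmd {
    my ($user, $mode) = @_;
    die "mode must be 'spam' or 'ham'\n" unless $mode =~ /\A(?:spam|ham)\z/;
    # Refuse names that could escape $BASE (e.g. "..") or inject shell syntax.
    die "suspicious user name: $user\n" unless $user =~ /\A\w[\w.-]*\z/;
    return "sa-learn --$mode --dbpath $BASE/$user/bayes";
}

print learn_cmd('jsmith', 'spam'), "\n";
```

Each per-user directory has to exist and be writable by the account that runs sa-learn, and whatever scans incoming mail for that user must read the same path, otherwise the training is never used.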

jedwards

Post by jedwards » Thu Feb 23, 2006 6:34 pm

Help! I'm overlooking the obvious and I can't stop. Thanks, Leigh. But, but, but, I thought sendmail used mbox. If I copy the spam mail spool from sendmail on the old box, sa-learn accepts it. No matter, you've led me to the water.

Jim Edwards

cjwilber
Posts: 24
Joined: Tue Feb 21, 2006 6:18 am

Post by cjwilber » Mon Mar 06, 2006 12:14 pm

I have found a lot of useful information in this thread, so I thought I'd post a heavily plagiarised script.
It seemed to me that having to create an mbox file with a Perl script, then run a bash script to break it apart again and feed it into sa-learn, was not the best way. So what this script does is:
1) Takes both a spam folder and a ham folder as arguments.
2) Loops through the folders until it finds one matching either the ham or the spam folder.
3) For the spam folder, it learns from all the spam and then deletes the messages.
4) For the ham folder, it only learns from the new mail of the last day (there will be some overlap, but it's better than learning the entire inbox each day). Note that reading the ham folder makes the IMAP server change the Seen and Recent flags on the messages, so I record the status beforehand and restore it afterwards.
5) Instead of writing to a file, it writes to a pipe, sending the messages one at a time into sa-learn after filtering them through spamassassin to remove the spam headers.
6) In the ham folder it restores the flags (only \Seen can actually be restored, because IMAP clients are not allowed to set \Recent); in the spam folder it just deletes everything.
There are some issues: for example, the password has to be passed on the command line. It would also be useful to find out how to set up a mailboxadmin user who could cycle through all of the users' mailboxes. Anyway, here is the listing; I hope it is useful for some people.

#!/usr/bin/perl
use strict;
use warnings;
use Mail::IMAPClient;

my $usage =
"ARGS must be:
\targv1 : imap host
\targv2 : imap user
\targv3 : spam mailbox on imap server
\targv4 : ham mailbox on imap server
\targv5 : password\n";

die($usage) if (@ARGV != 5);
my ($host, $user, $spam, $ham, $password) = @ARGV;

my $imap = Mail::IMAPClient->new(
    Server   => $host,
    User     => $user,
    Password => $password,
) or die "Unable to connect to imap server: $@\n";

# Pipe one message through "spamassassin -d" (removes SpamAssassin markup)
# into sa-learn, where $mode is 'spam' or 'ham'.
sub learn_message {
    my ($mess, $mode) = @_;
    my $msg = $imap->message_string($mess);    # fetching sets \Seen
    unless (defined $msg) {
        warn "Unable to fetch message $mess: $@\n";
        return;
    }
    open(my $learn, '|-', "spamassassin -d | sa-learn --$mode")
        or die "Can't open pipe: $!";
    print $learn $msg;
    close($learn) or warn "Learning pipeline failed for message $mess\n";
}

foreach my $folder ($imap->folders) {
    # Only select the two folders we care about (SELECT also clears \Recent)
    next unless $folder eq $spam or $folder eq $ham;
    $imap->select($folder) or next;
    if ($folder eq $spam) {
        # For spam fetch all messages, because we delete them each day
        print "Processing $folder folder\n";
        my @list = $imap->messages or next;
        print "Found " . @list . " messages in the $folder folder\n";
        learn_message($_, 'spam') foreach @list;

        ### Remove seen spam messages, because we don't need them anymore
        my $nrDeleted = $imap->delete_message(scalar($imap->seen)) || 0;
        warn "Could not delete_message: $@\n" unless $nrDeleted;
        print "$nrDeleted messages deleted\n";

        ### Ok, the messages are deleted, but in fact they aren't (welcome to IMAP ;-))
        ### So, we should expunge the folder to actually delete the messages
        $imap->expunge($folder) or die "Could not expunge: $@\n";
    }
    elsif ($folder eq $ham) {
        print "Processing $folder folder\n";
        # Fetching marks messages \Seen, so note which were unseen and restore them afterwards
        print "Checking for Unseen and Recent messages:\n";
        my @unseen = $imap->unseen;
        my @recent = $imap->recent;
        print "There are " . @recent . " recent messages, and " . @unseen
            . " unseen messages in the $folder folder\n";

        # For ham we only fetch a day's worth of messages, otherwise we would
        # continually re-learn the same messages
        my $yesterday = time() - 86400;
        my @list = $imap->since($yesterday)
            or warn "search: No emails found since yesterday\n";
        print "Found " . @list . " messages in $folder folder\n";
        learn_message($_, 'ham') foreach @list;

        # Restore the Unseen status of the messages that were unseen before
        if (@unseen) {
            print "Restoring the Unseen status for " . @unseen . " messages in the $folder folder\n";
            $imap->unset_flag("Seen", @unseen)
                or warn "Could not reset flag for Unseen messages: $@\n";
        }
        # \Recent is not restored: IMAP (RFC 3501) does not let a client set
        # or clear it, so a set_flag("Recent", ...) call cannot succeed.
    }
}
$imap->disconnect() or die "Unable to disconnect\n";

print "Spamassassin learning of imap $ham and $spam folders finished\n";
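The "there will be some overlap" remark in step 4 follows from how IMAP date searches work: SEARCH SINCE takes a date such as 6-Mar-2006 and ignores the time of day (RFC 3501), so asking for mail since "yesterday" returns everything dated yesterday or today. The sketch below shows that conversion; `imap_date` is a hypothetical helper for illustration only, since Mail::IMAPClient's `since` already accepts epoch seconds and converts them itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# IMAP SEARCH SINCE wants an RFC 3501 date (d-Mon-yyyy) and compares dates
# only, disregarding time of day. Hypothetical helper for illustration.
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

sub imap_date {
    my ($epoch) = @_;
    my (undef, undef, undef, $mday, $mon, $year) = localtime($epoch);
    return sprintf('%d-%s-%d', $mday, $MON[$mon], $year + 1900);
}

# "Since yesterday" therefore matches anything dated yesterday or today,
# i.e. up to about 48 hours of mail, so consecutive daily runs overlap.
my $cutoff = imap_date(time() - 86400);
print "SEARCH SINCE $cutoff\n";
```

The overlap should be harmless: sa-learn keeps track of the messages it has already learned and skips them when they are fed in again.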

mephisto

Spamassassin bounce messages

Post by mephisto » Sat Mar 18, 2006 12:49 pm

Hi,

Somehow my SpamAssassin/sendmail combination is sending out bounce messages to spam senders:


The original message was received at Sat, 18 Mar 2006 16:40:06 +0100 from root@localhost

   ----- The following addresses had permanent fatal errors -----
<me@mydomain>
    (reason: 550 5.7.1 Blocked by SpamAssassin)
    (expanded from: <me@mydomain>)

   ----- Transcript of session follows -----
... while talking to [127.0.0.1]:
>>> DATA
<<< 550 5.7.1 Blocked by SpamAssassin
554 5.0.0 Service unavailable

Does anyone know how to turn this off?

ScalixSupport
Scalix
Posts: 5503
Joined: Thu Mar 25, 2004 8:15 pm

Post by ScalixSupport » Sat Mar 18, 2006 12:59 pm

Neither of those is sending a bounce; it's the sending MTA that is doing it.

By default, spamass-milter is configured to reject anything that has a spam score above 15. As part of the SMTP conversation, once the DATA phase has been completed and SpamAssassin has analysed the message, a 550 error is issued rather than a 250 OK if the score is above that value.
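A simplified model of that end-of-DATA decision, assuming the threshold of 15 mentioned above. `end_of_data_reply` is a hypothetical helper, not spamass-milter code, and whether the boundary is `>` or `>=` is an assumption to check against your milter's documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper modelling the reply an MTA running spamass-milter gives
# at the end of the DATA phase, once SpamAssassin has scored the message.
sub end_of_data_reply {
    my ($score, $reject_threshold) = @_;
    # Assumed strict ">" per the description above; check -r's exact boundary.
    if ($score > $reject_threshold) {
        return '550 5.7.1 Blocked by SpamAssassin';    # permanent failure
    }
    return '250 OK';                                   # accepted for delivery
}

# Threshold 15, as in the default -r setting described above.
printf "%-5s => %s\n", $_, end_of_data_reply($_, 15) for (4.2, 15.0, 22.7);
```

Raising the -r value only moves the cut-off; any message above it still gets the 550.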

You need to edit /etc/sysconfig/spamass-milter or /etc/init.d/spamass-milter to change the -r option to a higher value.

But, as has been commented in other posts here, even accepting a message with that high a spam score is questionable.

Cheers

Dave

mephisto

Post by mephisto » Sat Mar 18, 2006 1:05 pm

Hi Dave,

thanks for the swift reply.

ScalixSupport wrote: You need to edit /etc/sysconfig/spamass-milter or /etc/init.d/spamass-milter to change the -r option to a higher value.


So I have the choice between filling up my users' spam folders with junk or filling my own with non-delivery messages? I'm getting one each time a spam message hits my system:


The original message was received at Sat, 18 Mar 2006 15:18:09 +0100 from localhost [127.0.0.1] with id k2IEI2TL022006

   ----- The following addresses had permanent fatal errors -----
<andreas.palladium@itsower.com>
    (reason: 554 delivery error: dd This user doesn't have a itsower.com account (andreas.palladium@itsower.com) [0] - mta113.biz.mail.re2.yahoo.com)

   ----- Transcript of session follows -----
... while talking to mx1.biz.mail.yahoo.com.:
>>> DATA
<<< 451 mta109.biz.mail.re2.yahoo.com Resources temporarily unavailable. Please try again later [#4.16.5].
... while talking to mx5.biz.mail.yahoo.com.:
>>> DATA
<<< 554 delivery error: dd This user doesn't have a itsower.com account (andreas.palladium@itsower.com) [0] - mta113.biz.mail.re2.yahoo.com
554 5.0.0 Service unavailable

So there are messages going from my server to the faked email address? What's the point of this?

ScalixSupport

Post by ScalixSupport » Sat Mar 18, 2006 1:37 pm

It's called spam :-)

Most spam is from faked addresses (as are virus messages).

Unfortunately, there is nothing in spamass-milter to prevent this. You might want to look at another milter.

The spamass-milter site at savannah.nongnu.org has some patches to CVS which (although they were submitted in 2004) include a -s switch as an alternative to -r, where -s will discard instead of reject. The URL is http://savannah.nongnu.org/patch/?func=detailitem&item_id=3513
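Why a reject turns into a bounce while a discard does not can be modelled in a few lines. This is a simplified sketch assuming the setup visible in the bounce transcripts above, where a relay on the same box hands the message to the filtering MTA on 127.0.0.1, and assuming the patched -s switch discards as described. The `outcome` helper and its wording are illustrative only. If the spammer's client talks to the filtering MTA directly, a 550 goes back to that client and your server generates no bounce at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative model only: a relay on the same box has already accepted the
# message and hands it to the filtering MTA, which either rejects it
# (spamass-milter -r) or discards it (the patched -s switch).
sub outcome {
    my ($action, $envelope_sender) = @_;
    if ($action eq 'reject') {
        # 550 after DATA: the relay still owns the message, so it must
        # report the failure to the envelope sender, which spam usually forges.
        return "550 to relay; relay bounces to $envelope_sender";
    }
    if ($action eq 'discard') {
        # 250 OK, then the message is silently dropped: no bounce is generated.
        return '250 to relay; message dropped, no bounce';
    }
    die "unknown action: $action\n";
}

print outcome($_, 'andreas.palladium@itsower.com'), "\n" for qw(reject discard);
```

The trade-off is that a discarded false positive disappears silently, whereas a reject at least tells a legitimate sender that something went wrong.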

Take a look at it and see if that meets your needs.

Cheers

Dave

mephisto

Post by mephisto » Sat Mar 18, 2006 1:40 pm

I'm just confused because I never had such problems with amavis running on my old server. I'm a bit sad I have to leave that behind now and deal with the SpamAssassin configuration myself. I might look into the patches you mentioned.

ScalixSupport

Post by ScalixSupport » Sat Mar 18, 2006 2:00 pm

There's no reason why you can't use the amavis-milter with sendmail.

It's not something that we have a technote for, maybe you could write it ;-)

Cheers

Dave

mephisto

Post by mephisto » Sat Mar 18, 2006 2:09 pm


[root@scalix logs]# yum search amavis
Searching Packages:
Setting up repositories
Reading repository metadata in from local files
No Matches found

Man, those were the days when an "apt-get install amavisd-new" was all that was needed.

ScalixSupport

Post by ScalixSupport » Sat Mar 18, 2006 2:21 pm

Dag Wieers' site at apt.sw.be is a great site for things like this. It focuses mainly on the Red Hat distributions, i.e. Fedora and RHEL, but the source RPMs are also included.

Cheers

Dave

burhankhalid
Posts: 137
Joined: Mon Dec 19, 2005 8:31 am

Post by burhankhalid » Mon Mar 20, 2006 9:06 am

Hey mephisto:


[root@avalon ~]# yum info amavisd-new
Setting up repositories
Reading repository metadata in from local files
Available Packages
Name   : amavisd-new
Arch   : noarch
Version: 2.3.3
Release: 5.fc4
Size   : 486 k
Repo   : extras
Summary: Email filter with virus scanner and spamassassin support
Description:
 amavisd-new is a high-performance and reliable interface between mailer
(MTA) and one or more content checkers: virus scanners, and/or
Mail::SpamAssassin Perl module. It is written in Perl, assuring high
reliability, portability and maintainability. It talks to MTA via (E)SMTP
or LMTP, or by using helper programs. No timing gaps exist in the design,
which could cause a mail loss.


:)

mephisto

Post by mephisto » Mon Mar 20, 2006 12:03 pm

burhankhalid wrote: Release: 5.fc4

Well, this is Fedora Core; I'm using CentOS 4.3 (or RHEL4, for that matter). I'll probably use the repository Dave mentioned, but I'm a bit hesitant, as it might include a lot of unofficial RPMs.

ScalixSupport

Post by ScalixSupport » Mon Mar 20, 2006 12:21 pm

You don't have to use it for updates to everything.

It's the only site we know of that has the spamass-milter RPM available.

Cheers

Dave

mephisto

Post by mephisto » Mon Mar 20, 2006 12:29 pm

ScalixSupport wrote: It's the only site we know of that has the spamass-milter RPM available.

CentOS includes it if you activate all repositories (incl. addons, extras, centosplus and contrib) in /etc/yum.repos.d/CentOS-Base.repo.

