How do I get SpamAssassin to learn spam from a public folder?

florian
Scalix
Scalix
Posts: 3852
Joined: Fri Dec 24, 2004 8:16 am
Location: Frankfurt, Germany
Contact:

Postby florian » Sun Aug 06, 2006 9:50 pm

This has now been included in http://www.scalix.com/wiki/index.php?title=HowTos/SpamAssassin. Many thanks. Florian.
Florian von Kurnatowski, Die Harder!

kali
Posts: 64
Joined: Sat Oct 29, 2005 12:13 am

Postby kali » Sun Aug 06, 2006 11:53 pm

leigh wrote: Hmm. Good point. Perhaps it doesn't scale as well as it could.
I'm running it on less than 25 users.
The thread referenced at the top of this one has a method of creating an mbox file from IMAP, but it also uses Mail::IMAPClient.
The main reason I used a separate instance for each mail is that I needed to do something different to each message, depending on which mailbox it is in. Also, I thought it would be prudent to use the "forget" function before feeding your ham into sa-learn. I also felt it was appropriate to remove the SpamAssassin markup first.


Hi Leigh,

Just another quick point. Learning a message as spam automatically "unlearns" (forgets) the message as ham and vice versa (documented in SA). So you probably don't need the call to "forget" in the weekly script and will save running an additional instance of sa-learn.

Good luck to you and thanks again. My current shell script using mailutil is running very well.
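
To illustrate the point about "forget": because learning a message as spam also unlearns it as ham (and vice versa), one sa-learn run per exported mbox file is enough. Below is a minimal Perl sketch, assuming the mail has already been exported to mbox files. The helper name sa_learn_args and the file paths are hypothetical; --spam, --ham and --mbox are existing sa-learn options.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the argument list for one sa-learn run over a whole mbox file.
# No separate "--forget" pass is included, because learning a message
# as spam also unlearns it as ham (and vice versa).
sub sa_learn_args {
    my ($class, $mbox) = @_;
    die "class must be 'spam' or 'ham'\n" unless $class =~ /\A(?:spam|ham)\z/;
    return ('sa-learn', "--$class", '--mbox', $mbox);
}

# Usage with a hypothetical mbox path:
#   ./learn.pl spam /tmp/spam.mbox
if (@ARGV == 2) {
    my @cmd = sa_learn_args(@ARGV);
    system(@cmd) == 0 or die "sa-learn failed: exit status $?\n";
}
```

Passing the command as a list to system avoids going through a shell, so odd characters in the mbox path need no quoting.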

KKJensen
Posts: 142
Joined: Wed Sep 06, 2006 9:34 am
Contact:

Postby KKJensen » Tue Nov 28, 2006 4:59 pm

I'm using FC5... in which package do I find IMAPClient? I searched in Yum Extender and it came up with some Perl packages, but they looked like they were for a different kind of mail server.

Does this script need to be put in any specific folder to run properly? I'm using Scalix 11 b2.

ianare
Posts: 61
Joined: Tue Sep 19, 2006 1:13 pm

Postby ianare » Wed Nov 29, 2006 3:30 pm

KKJensen wrote: I'm using FC5... in which package do I find IMAPClient? I searched in Yum Extender and it came up with some Perl packages, but they looked like they were for a different kind of mail server.

Does this script need to be put in any specific folder to run properly? I'm using Scalix 11 b2.


First Google match:

http://dries.ulyssis.org/rpm/packages/p ... /info.html

or

yum install perl-Mail-IMAPClient


I put all my scalix scripts in one place, but that's just me.

KKJensen
Posts: 142
Joined: Wed Sep 06, 2006 9:34 am
Contact:

Postby KKJensen » Wed Nov 29, 2006 4:38 pm

I've installed the rpm (I was googling using google.com/linux and the first one only showed rpms for FC3 and older).

When I run the script, it just returns to the prompt without any output, and the spam folder stays full. I'm trying to run it on "Public Folder/junkmail/SPAM"... am I missing something in the code? I've never used Perl before. Does specific formatting matter, like tabs vs. spaces? Case sensitivity?

Code:

#!/usr/bin/perl
use strict;
use warnings;
use Mail::IMAPClient;
my $host="192.168.5.201";
my $username="sxadmin";
my $password="elisen";
my $imap = new Mail::IMAPClient( 'Server' => $host , 'User' => $username , 'Password' =>$password ) or next; #connect to server
my @folders=$imap->folders;  # list folders
foreach my $folder (@folders)  #look through each of them
{
    if (lc($folder) eq "public folders\\junkmail\\SPAM")    #Make sure we only use the right folder.
    {
        $imap->select($folder) or next;    #Select folder
        print "Folder $folder selected.\n";
        my @list=$imap->messages or next;    #List all messages in folder
        print scalar(@list)."messages in folder.\n";
        foreach my $msg (@list)        #Loop over each message.
        {
            my @email=$imap->fetch($msg,'RFC822');    #Fetch it.
            open (SALEARN,"|/usr/bin/spamassassin -d | /usr/bin/sa-learn --spam") or next;  #feed it to sa-learn.
            print SALEARN "$email[1]\n";
            close SALEARN;
            open (REPORT,"|/usr/bin/spamassassin -d | /usr/bin/spamassassin -r") or print "$!\n"; #report it
            print REPORT "$email[1]";
            close REPORT;
            $imap->delete_message($msg) or next;    #delete it
        }
        $imap->expunge($folder) or next;  #expunge folder
    }
}



It seems to run very fast and I'm not sure if this is normal...I've got about 300 spam emails in the folder. It isn't erasing what was in the public folder either so I guess it isn't running. Is there a way to get some debugging info out of it, like running in verbose mode?

Code:

 # time ./train

says
real 0m1.060s
user 0m0.874s
sys 0m0.075s
Last edited by KKJensen on Wed Nov 29, 2006 5:27 pm, edited 1 time in total.

ianare
Posts: 61
Joined: Tue Sep 19, 2006 1:13 pm

Postby ianare » Wed Nov 29, 2006 5:15 pm

It's not running then. :(

try changing this: public folders\\junkmail\\SPAM
to this: public folders/junkmail/SPAM
or this: public\ folders/junkmail/SPAM

If that doesn't work, try replacing some of the 'or next' with 'or die "some descriptive message"' to see where it's crapping out.
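
A minimal sketch of that debugging approach, under a few assumptions: the environment variable names (IMAP_HOST, IMAP_USER, IMAP_PASS, IMAP_DEBUG) and the helper find_folder are hypothetical, and the folder name is the one suggested above, which may differ on your server. One thing worth noting about the script as posted: lc($folder) is compared against a string that still contains the capitals "SPAM", so that test can never be true whatever separator is used. The sketch lowercases both sides and prints the folder names exactly as the server reports them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the folder whose name matches $wanted, ignoring case.
# Both sides are lowercased: comparing lc($folder) against a string
# that still contains capitals (such as "SPAM") can never succeed.
sub find_folder {
    my ($wanted, @folders) = @_;
    my ($match) = grep { lc($_) eq lc($wanted) } @folders;
    return $match;
}

# The IMAP part only runs when IMAP_HOST is set (hypothetical
# environment variables), so the helper above can be tried offline.
if ($ENV{IMAP_HOST}) {
    require Mail::IMAPClient;
    my $imap = Mail::IMAPClient->new(
        Server   => $ENV{IMAP_HOST},
        User     => $ENV{IMAP_USER},
        Password => $ENV{IMAP_PASS},
        Debug    => ($ENV{IMAP_DEBUG} ? 1 : 0),   # 1 prints the IMAP dialogue
    ) or die "Cannot connect or log in: $@\n";

    my @folders = grep { defined } $imap->folders;
    die "Cannot list folders: " . $imap->LastError . "\n" unless @folders;
    print "Server reports: $_\n" for @folders;    # shows the real separator

    my $folder = find_folder('public folders/junkmail/SPAM', @folders)
        or die "No folder with that name in the list above\n";
    $imap->select($folder)
        or die "Cannot select $folder: " . $imap->LastError . "\n";
    my $count = $imap->message_count($folder);
    print "$count messages in $folder\n";
    $imap->logout;
}
```

Each step dies with the server's own error text instead of silently skipping, so a run that finishes in about a second with no output is no longer possible.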

KKJensen
Posts: 142
Joined: Wed Sep 06, 2006 9:34 am
Contact:

Postby KKJensen » Wed Nov 29, 2006 5:44 pm

I've been reading some of the other posts linked on the same subject. Just to be sure I'm understanding, this script doesn't depend on the mail all being copied out somewhere else first?
[edit] it DOES depend on the mail getting copied somewhere else. I'm going to test the code that is posted in the wiki that cycles through each user, then each folder until it finds the "junk mail" folder.

Will the default admin, "sxadmin", suffice for the "mboxadmin" spoken of in the template?

KKJensen
Posts: 142
Joined: Wed Sep 06, 2006 9:34 am
Contact:

Postby KKJensen » Wed Nov 29, 2006 6:50 pm

Okay... got the mboxadmin privileges taken care of.

The script is now running (it seems to only work on folders named "junk e-mail", even after I find/replaced every "junk e-mail" with another name). It says it's learning, but it fails on the report with a message about needing authentication.

Can anyone report? I heard that only a list of trusted groups could add to these lists so they didn't get swarmed with bad entries.

leigh
Posts: 109
Joined: Tue Feb 07, 2006 11:35 pm
Location: At my desk.
Contact:

Postby leigh » Wed Dec 13, 2006 8:33 pm

Hi,
Sorry I never responded to your Q's, I somehow got off the watchlist for this thread.

I presume you found and installed Mail::IMAPClient. For those looking for the info, it's a Perl module, so it can be installed with:

Code:

perl -MCPAN -e shell
install Mail::IMAPClient


The 'needing authentication' message sounds like it comes from Razor, Pyzor, DCC, or one of the other reporting methods SpamAssassin uses. If I recall correctly, DCC needs to authenticate in order to report spam. I wouldn't worry too much about it, as long as you are seeing the bayes being trained.
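
One way to see whether the bayes is being trained is to compare the spam and ham counters from "sa-learn --dump magic" before and after a run. The sketch below is only that, a sketch: the helper name bayes_counts and the --run switch are hypothetical, and the parser assumes the usual layout in which the third column holds the value and the line ends in "non-token data: nspam" or "non-token data: nham".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull the nspam/nham counters out of "sa-learn --dump magic" output.
# Assumes the third column is the value and the line ends with
# "non-token data: nspam" or "non-token data: nham".
sub bayes_counts {
    my ($text) = @_;
    my %count;
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (nspam|nham)\s*$/) {
            $count{$2} = $1;
        }
    }
    return %count;
}

# Only call sa-learn when asked to, so the parser can be tried alone.
if (@ARGV && $ARGV[0] eq '--run') {
    my $out = `sa-learn --dump magic`;
    die "sa-learn failed: exit status $?\n" if $? != 0;
    my %c = bayes_counts($out);
    printf "spam learned: %s, ham learned: %s\n",
        (exists $c{nspam} ? $c{nspam} : 'unknown'),
        (exists $c{nham}  ? $c{nham}  : 'unknown');
}
```

If the spam counter goes up after the training script has run, learning is working even when the reporting step complains.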
I recommend using the script posted in the wiki. It's a bit more complete, and it's easier for your users.
The sxadmin user should suffice for the mboxadmin user, but I don't recommend it. There is a security issue with having your username and password embedded in the script, so anybody looking at your script can find out your sxadmin password. Not something I would like to expose myself to.
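
One common way to reduce that exposure is to keep the login out of the script and read it from a file that only the owner can access. This is a minimal sketch of that idea and not part of the wiki script; the helper name read_credentials, the one-line "user:password" file format and the example path are all assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read "user:password" from a file that only its owner may access.
# Refuses to continue if group or others have any permission bits set.
sub read_credentials {
    my ($path) = @_;
    my @st = stat($path) or die "Cannot stat $path: $!\n";
    die "$path must not be accessible by group or others (chmod 600)\n"
        if $st[2] & 077;
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    my $line = <$fh>;
    close $fh;
    die "$path is empty\n" unless defined $line;
    chomp $line;
    # Split on the first colon only, so the password may contain colons.
    my ($user, $password) = split /:/, $line, 2;
    die "Expected user:password in $path\n"
        unless defined $password && length $user;
    return ($user, $password);
}

# Usage with a hypothetical path given on the command line:
#   ./creds.pl /root/.sa-train-credentials
if (@ARGV) {
    my ($user) = read_credentials($ARGV[0]);
    print "Loaded credentials for $user\n";
}
```

The returned pair can then be passed as User and Password when connecting. A dedicated account with only the rights the script needs limits the damage further if the file does leak.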

fb
Posts: 22
Joined: Sun Jul 01, 2007 10:45 am

Postby fb » Wed Jul 18, 2007 7:28 am

leigh,

I read your HowTo carefully and I need some help.

Why do I need to set up 4 folders: Junk e-mail, spam, non-spam, possible spam?

Aren't spam and junk e-mail the same?

And are there any requirements for those folders, such as having to be public folders?

leigh
Posts: 109
Joined: Tue Feb 07, 2006 11:35 pm
Location: At my desk.
Contact:

Postby leigh » Wed Jul 18, 2007 10:04 pm

Why do I need to set up 4 Folders: Junk e-mail, spam, non-spam, possible spam ?

non-spam is for people to put false positives in. It is then learnt as ham in spamassassin.
Junk email is automatic on Outlook 2K3 and above.
The other two are used to keep spam out of your inbox. I use rules to put anything tagged as spam into the spam folder, and anything not tagged, but with a score above 3 into the 'possible spam' folder.
Because the 'spam' folder is filled automatically, we don't use anything in it which is less than a week old. That gives our users enough time to check the folder and remove any false positives.

Aren't spam and junk e-mail the same?

Yes, they are, but Outlook 2K3 automatically creates the 'junk-email' folder for you.
People use it just because it's there. Therefore we need to read the spam from it.
Because of this, I have made the 'junk email' folder the most aggressive. It gets read every hour or so, and anything in it is reported immediately. Basically, this is where people put junk email they get in their inbox.
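
The folder rules described above can be sketched as a small lookup table. This is an illustration, not the wiki script: the names %POLICY and action_for are hypothetical, folder names are matched in lowercase, and two details are assumptions because the post does not state them (whether the 'spam' folder is also reported, and that 'non-spam' has no waiting period). 'possible spam' is left out because it is only described as a holding area.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Per-folder policy. 'junk e-mail' is learned and reported at once,
# 'spam' is only used once a message is at least a week old,
# 'non-spam' (false positives) is learned as ham.
my %POLICY = (
    'junk e-mail' => { learn => 'spam', report => 1, min_age_days => 0 },
    'spam'        => { learn => 'spam', report => 0, min_age_days => 7 },  # report => 0 is an assumption
    'non-spam'    => { learn => 'ham',  report => 0, min_age_days => 0 },  # min_age_days is an assumption
);

# Returns the policy hash reference if a message of the given age in
# the given folder should be processed now, or undef to leave it alone.
sub action_for {
    my ($folder, $age_days) = @_;
    my $policy = $POLICY{ lc $folder } or return;
    return if $age_days < $policy->{min_age_days};
    return $policy;
}

# Walk through a few sample cases.
for my $case (['Junk E-mail', 0], ['spam', 2], ['spam', 8], ['non-spam', 1]) {
    my ($folder, $age) = @$case;
    my $p = action_for($folder, $age);
    printf "%-12s age %d: %s\n", $folder, $age,
        $p ? "learn as $p->{learn}" . ($p->{report} ? ' and report' : '') : 'skip';
}
```

Keeping the rules in one table makes it easy to change the waiting period or add a folder without touching the loop that walks the mailboxes.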

Ruthiness
Posts: 79
Joined: Tue Nov 13, 2007 8:11 pm

Postby Ruthiness » Sun Jan 13, 2008 8:09 pm

We have Scalix 11.2 and SpamAssassin installed and are ready for the next step - training it with sa-learn. I want to use the script provided by Leigh, but we have an issue with some good mail getting routed to the Junk Mail folder so if I run these scripts, it may tag good mail as spam.

How do I find out how and why good mail is getting routed to the Junk folders? I have been given a couple of example mails with subjects and to/from to search in maillog for. I am running this on RedHat Linux ES.

Any ideas? Also have the Razor plugin to SpamAssassin... Which log file might have this info?

