How to make SpamAssassin learn spam from a public folder?

Discuss the Scalix Server software


apdata


Postby apdata » Tue Jul 25, 2006 1:10 pm

Hello! I installed Scalix and SpamAssassin successfully (I think :-)).
Now I have added a public folder "Spam" where everyone should copy their spam mails, so that I build up a big spam archive. I want to set up a cron job so that SpamAssassin looks in that folder and learns that its contents are spam, but I don't know how to do it. Can someone give me the command line for SpamAssassin?

Thanks

Alexander Peters
Germany

davidollis

Postby davidollis » Tue Jul 25, 2006 9:45 pm

ap,

I followed this thread.
http://www.scalix.com/community/viewtopic.php?t=1075

Basically, I created a utility user with read/write access to the public Spam folder and run a cron job as the SA user.

I found the imap2mbox script in the above thread. This is the script I run from cron:

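#!/bin/sh
# Dump the public Spam/Ham folders via IMAP and train SpamAssassin's Bayes DB.
WORKDIR="/var/run/spamass-milter/scalix_spamassassin"
IMAP2MBOX="/usr/local/bin/imap2mbox.pl"
IMAPUSER="scripted.tasks@****"
IMAPPASS="****"
IMAPHOST="localhost"
IMAP_SPAMFOLDER="Public Folders/Spam"
IMAP_HAMFOLDER="Public Folders/Ham"
SPAM_MBOX=spam.$$    # temporary mbox files, named after this script's PID
HAM_MBOX=ham.$$
SYNC_NEEDED=N

# Work in WORKDIR so the temporary mbox files are created there.
cd "$WORKDIR" || exit 1

# Copy each IMAP folder into a local mbox file.
$IMAP2MBOX "$SPAM_MBOX" "$IMAPHOST" "$IMAPUSER" "$IMAP_SPAMFOLDER" "$IMAPPASS"
$IMAP2MBOX "$HAM_MBOX" "$IMAPHOST" "$IMAPUSER" "$IMAP_HAMFOLDER" "$IMAPPASS"

# formail -s splits the mbox and runs sa-learn once per message.
if [ -s "$SPAM_MBOX" ]; then
    formail -e -d -s sa-learn --spam --no-sync < "$SPAM_MBOX"
    SYNC_NEEDED=Y
fi
if [ -s "$HAM_MBOX" ]; then
    formail -e -d -s sa-learn --ham --no-sync < "$HAM_MBOX"
    SYNC_NEEDED=Y
fi
rm -f "$SPAM_MBOX" "$HAM_MBOX"

# Sync the Bayes journal once, after all learning ("=" is the POSIX test operator, "==" is not).
if [ "$SYNC_NEEDED" = "Y" ]; then
    sa-learn --sync
fi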
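Because `formail -s` starts a new sa-learn process for every message, one possible alternative is to hand each whole mbox file to a single sa-learn run via its `--mbox` option. This is a minimal sketch, assuming your sa-learn version supports `--mbox`; the function name `learn_mbox` and the `SA_LEARN` override are hypothetical, added only so the command can be dry-run:

```shell
#!/bin/sh
# Hypothetical helper: learn a whole mbox file in ONE sa-learn process
# instead of one process per message (as "formail -s" does).
# SA_LEARN may be overridden, e.g. SA_LEARN=echo for a dry run.
SA_LEARN=${SA_LEARN:-sa-learn}

# learn_mbox KIND FILE -- KIND is "spam" or "ham".
# Returns 1 if FILE is missing or empty (nothing to learn), 2 on a bad KIND.
learn_mbox() {
    kind=$1
    file=$2
    case $kind in
        spam|ham) ;;
        *) echo "learn_mbox: unknown kind '$kind'" >&2; return 2 ;;
    esac
    [ -s "$file" ] || return 1
    # --no-sync defers the journal sync; run "sa-learn --sync" once at the end.
    $SA_LEARN "--$kind" --no-sync --mbox "$file"
}
```

In the script above, the two `formail` blocks could then become `learn_mbox spam "$SPAM_MBOX" && SYNC_NEEDED=Y` and `learn_mbox ham "$HAM_MBOX" && SYNC_NEEDED=Y`.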

leigh
Posts: 109
Joined: Tue Feb 07, 2006 11:35 pm
Location: At my desk.

Postby leigh » Thu Jul 27, 2006 8:37 pm

Try this on for size:


#!/usr/bin/perl
use strict;
use warnings;
use Mail::IMAPClient;

my $host     = "IP address of mail server";
my $username = "username";
my $password = "password";

# Connect to the server ("next" is not valid outside a loop, so die instead).
my $imap = Mail::IMAPClient->new(
    Server   => $host,
    User     => $username,
    Password => $password,
) or die "Cannot connect to $host: $@\n";

my @folders = $imap->folders;                           # List folders.
foreach my $folder (@folders)                           # Look through each of them.
{
    next unless lc($folder) eq "public folders\\spam";  # Make sure we only use the right folder.
    $imap->select($folder) or next;                     # Select folder.
    print "Folder $folder selected.\n";
    my @list = $imap->messages or next;                 # List all messages in folder.
    print scalar(@list) . " messages in folder.\n";
    foreach my $msg (@list)                             # Loop over each message.
    {
        my $email = $imap->message_string($msg) or next;    # Fetch the full message.
        # Strip SpamAssassin markup and feed the message to sa-learn as spam.
        open(my $salearn, "|/usr/bin/spamassassin -d | /usr/bin/sa-learn --spam") or next;
        print $salearn $email;
        close $salearn;
        # Report it (SpamCop and Pyzor).
        if (open(my $report, "|/usr/bin/spamassassin -d | /usr/bin/spamassassin -r")) {
            print $report $email;
            close $report;
        } else {
            warn "Cannot run spamassassin -r: $!\n";
        }
        $imap->delete_message($msg) or next;            # Delete it.
    }
    $imap->expunge($folder) or next;                    # Expunge folder.
}
$imap->logout;


This loops over the Public Folders/Spam folder and, for each message, it:
* Feeds it to sa-learn as spam
* Reports it to SpamCop and Pyzor (and probably Razor2 and DCC if you have them).
* Deletes it.

It should be a piece of cake to alter it to feed in ham as well.
You just need to alter the IP address, username and password near the top of it.
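For the ham variant, the per-message steps differ only in the sa-learn class and in skipping the report. A minimal shell sketch of those steps, mirroring the `spamassassin -d | sa-learn` and `spamassassin -d | spamassassin -r` pipelines from the Perl script; it assumes each message has already been saved to its own file, and the function `learn_one` plus the `SPAMASSASSIN`/`SA_LEARN` overrides are hypothetical names for illustration:

```shell
#!/bin/sh
# Sketch of the Perl script's per-message steps for one saved message file.
# SPAMASSASSIN and SA_LEARN are hypothetical overrides (e.g. for dry runs).
SPAMASSASSIN=${SPAMASSASSIN:-/usr/bin/spamassassin}
SA_LEARN=${SA_LEARN:-/usr/bin/sa-learn}

# learn_one KIND FILE -- KIND is "spam" or "ham".
learn_one() {
    kind=$1
    file=$2
    case $kind in
        spam|ham) ;;
        *) echo "learn_one: unknown kind '$kind'" >&2; return 2 ;;
    esac
    # Strip SpamAssassin markup (-d), then let sa-learn read the message from stdin.
    "$SPAMASSASSIN" -d < "$file" | "$SA_LEARN" "--$kind" || return 1
    # Only spam is reported (spamassassin -r: SpamCop, Pyzor, etc.).
    if [ "$kind" = spam ]; then
        "$SPAMASSASSIN" -d < "$file" | "$SPAMASSASSIN" -r
    fi
}
```

Usage: `learn_one ham /path/to/message` for mail from a ham folder, `learn_one spam /path/to/message` for mail from the spam folder.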

florian
Scalix
Posts: 3852
Joined: Fri Dec 24, 2004 8:16 am
Location: Frankfurt, Germany

Postby florian » Thu Jul 27, 2006 11:36 pm

Maybe to add one missing bit of information:

The user used to access the public folder must be a "Premium" user, as only Premium users can access the Public Folder namespace using IMAP.

This is actually a useful piece. Would you like to format it and post it to our Wiki at www.scalix.com/wiki, somewhere under contributions and Howtos? Once it's finished, I'm happy to set up a link to it on the Wiki's main page.

:-)
Florian.
Florian von Kurnatowski, Die Harder!

leigh

Postby leigh » Thu Jul 27, 2006 11:55 pm

If you wait a little while, I may have something even better, when I get around to it.
We find it difficult to get users to use "public folders/spam" and "public folders/non-spam" mailboxes properly (yeah, I know, it shouldn't be that difficult, but it is), so I'm working on a method that doesn't need public folders. I've written it and I'm just testing it now. Seems OK so far.
Basically, it uses the 'junk e-mail' folder in Outlook, with mboxadmin rights to read everyone's folders.
Then it takes ham from their inbox and looks for any false negatives in their own 'spam' folder.
Stay tuned.

florian

Postby florian » Fri Jul 28, 2006 6:58 am

;-) Sounds good to me, though offering both methods might not hurt either! ;-)

thanks,
Florian.

kali
Posts: 64
Joined: Sat Oct 29, 2005 12:13 am

Postby kali » Fri Jul 28, 2006 4:58 pm

leigh wrote: If you wait a little while, I may have something even better, when I get around to it.
We find it difficult to get users to use "public folders/spam" and "public folders/non-spam" mailboxes properly (yeah, I know, it shouldn't be that difficult, but it is), so I'm working on a method that doesn't need public folders. I've written it and I'm just testing it now. Seems OK so far.
Basically, it uses the 'junk e-mail' folder in Outlook, with mboxadmin rights to read everyone's folders.
Then it takes ham from their inbox and looks for any false negatives in their own 'spam' folder.
Stay tuned.


Leigh,

I would be very interested in this approach (an mboxadmin user looping through the standard Outlook Junk E-mail folders). Do you have a draft of this? I would very much like to get this into production for clients.

Feel free to PM me if you're not quite ready to go public with this!

Thanks - and let me know when you can.

jcaudell
Posts: 73
Joined: Tue Jul 18, 2006 9:56 am

Postby jcaudell » Fri Jul 28, 2006 5:20 pm

Leigh,

I am also very interested in this and would very much like to help in any way I can if you need any additional testers or anyone to help out.

Thanks!

leigh

Postby leigh » Sun Jul 30, 2006 6:58 pm

It's a work still in progress, but here's the start:
http://www.scalix.com/wiki/index.php?title=Care_and_feeding_of_your_Bayes
When I get the time, I'll come back and finish it, complete with an explanation of what it does and why I chose to do it that way.

kali

Postby kali » Mon Jul 31, 2006 11:08 pm

Nice script - I like it very much. Simple, effective.

Just one quick question: how does one create an mboxadmin user? Can an existing user be given that privilege?

leigh

Postby leigh » Mon Jul 31, 2006 11:14 pm

ommodu username -c +mboxadmin

man ommodu for more details.

kali

Postby kali » Tue Aug 01, 2006 1:27 pm

Yup - figured that one out very quickly!

Here's a thought (and I'm willing to help). Your script pipes each message to a new instance of sa-learn, and that is very slow. I ran it for a 600-user client and it took a VERY long time. I recall from my IMAP days that sa-learn, run once on one mailbox with lots of messages, got through them very quickly.

So, rather than finding each spam folder and piping each individual message to a new sa-learn instance, how about this: create a temporary (empty) mbox file, find each user's spam folder, "move" (append) all of its messages to the temp mbox, do that for all users, then run one instance of sa-learn on the mbox file and delete the file?

I will take a look at Mail::IMAPClient and see if that is possible, but you may be on the faster track with this.

Thanks for the good stuff, Leigh,

kali

Got It!

Postby kali » Tue Aug 01, 2006 6:39 pm

Leigh - and others,

I have solved this (for me, anyway). I was not terribly fond of the Perl script (very high resource utilization using Mail::IMAPClient), so instead I took the path of a bash script that uses UW-IMAP's mailutil program.

I scripted mailutil to go to each user's Junk E-mail folder and "appenddelete" (a very fast function) the messages to a local mailbox in mbx format (a very fast mailbox format). Then, after all mail has been appended and removed from the users' spam folders, it launches sa-learn once on the mbx file, which reads and learns all the mail (also very fast), and then deletes the temporary mbx file.
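Kali's script itself wasn't posted, so the following is only a minimal sketch of the collect-then-learn-once flow, assuming UW-IMAP's `mailutil appenddelete SOURCE DESTINATION` command and sa-learn's `--mbx` option. The function names, the `{host/user=NAME}Junk E-mail` folder spec built by the hypothetical `junk_folder_spec`, and the `MAILUTIL`/`SA_LEARN` overrides are all assumptions; how an mboxadmin login reaches other users' folders and how authentication is supplied are not covered. It also assumes the temporary mailbox was created beforehand in mbx format.

```shell
#!/bin/sh
# Sketch: move every user's junk mail into ONE local mbx mailbox, then
# run sa-learn a single time on it. MAILUTIL/SA_LEARN are hypothetical overrides.
MAILUTIL=${MAILUTIL:-mailutil}
SA_LEARN=${SA_LEARN:-sa-learn}

# Hypothetical: c-client spec for a user's Junk E-mail folder (adjust for your server).
junk_folder_spec() {
    printf '{%s/user=%s}Junk E-mail' "${IMAPHOST:-localhost}" "$1"
}

# collect_and_learn TMPBOX USER... (TMPBOX must already exist in mbx format)
collect_and_learn() {
    tmpbox=$1
    shift
    for user in "$@"; do
        # Append this user's junk to TMPBOX and delete it from the source folder.
        "$MAILUTIL" appenddelete "$(junk_folder_spec "$user")" "$tmpbox" || return 1
    done
    "$SA_LEARN" --spam --mbx "$tmpbox"
    status=$?
    rm -f "$tmpbox"
    return $status
}
```

If any `appenddelete` fails, the function returns before running sa-learn or deleting the temporary mailbox, so messages already moved out of users' folders are not lost.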

I know this is somewhat "specialized", but I needed it for large customers (500+ users), which required faster processing of the junk mail: it took almost an hour using the Perl script from Leigh. That said, having current Bayes information is invaluable for the SpamAssassin engine to make good determinations.

Please let me know if you have any questions and I can document and/or provide the script to anyone interested.

Regards,

leigh

Postby leigh » Tue Aug 01, 2006 7:18 pm

Hmm. Good point. Perhaps it doesn't scale as well as it could.
I'm running it on fewer than 25 users.
The thread referenced at the top of this one has a method of creating an mbox file from IMAP, but it also uses Mail::IMAPClient.
The main reason I used a separate instance for each mail is that I needed to do something different to each message, depending on which mailbox it comes from. Also, I thought it would be prudent to use the "forget" function before feeding your ham into sa-learn, and I felt it was appropriate to remove the SpamAssassin markup first.
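Per-folder treatment does not necessarily require one process per message: each folder's messages can be collected into one mbox and trained as a batch with the class chosen from the folder. A minimal sketch, assuming sa-learn's `--forget` and `--mbox` options can be combined as shown; the folder names, the functions `class_for_folder`/`learn_folder`, and the `SA_LEARN` override are hypothetical, and the forget-before-ham step simply mirrors the choice described above rather than a requirement confirmed here:

```shell
#!/bin/sh
# Sketch: choose the sa-learn class from the source folder, then train a
# whole folder's mbox at once. SA_LEARN can be overridden for a dry run.
SA_LEARN=${SA_LEARN:-sa-learn}

# class_for_folder FOLDER: print "spam" or "ham"; fail for other folders.
# The folder names are examples only.
class_for_folder() {
    case $1 in
        "Junk E-mail"|Spam) echo spam ;;
        Inbox|INBOX)        echo ham ;;
        *)                  return 1 ;;
    esac
}

# learn_folder FOLDER MBOX: train on MBOX (one folder's messages) as a batch.
learn_folder() {
    class=$(class_for_folder "$1") || return 1
    if [ "$class" = ham ]; then
        # Forget any earlier training on these messages before learning them as ham.
        $SA_LEARN --forget --no-sync --mbox "$2" || return 1
    fi
    $SA_LEARN "--$class" --no-sync --mbox "$2"
}
```

As with the other batch variants, a single `sa-learn --sync` after all folders have been processed completes the run.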

kali

Postby kali » Tue Aug 01, 2006 7:30 pm

Leigh,

Just to let you know, it was YOUR script that provided the basis for me to move forward with this - so I definitely have you to thank for that! :D

I'm pretty sure the SA headers are ignored in Bayes learning so you probably don't need to worry about those. The other issues can also be covered (different folders, different options etc.), as I really do the same thing you do - just using compiled mailutil rather than perl.

Thanks again for the groundwork!

