Corrupt Messages Stop Server
Posted: Thu Dec 06, 2007 11:51 am
by dougp23
I had a few users this morning telling me they were not getting email. After a few tests, yes this was indeed the case.
I looked, and 186 messages were in the Corrupt Message Queue. Some were legitimate, some were spam.
I tried stopping and restarting clamd, and 'omresub -q error'. When I did the omresub command, spamd went to 100%, yet all the messages stayed in the queue. I finally had to flush the queue (delete all messages including all the legitimate ones).
This server has been running for more than a year, so I don't think it's clamav being misconfigured. But this has happened to this box before: it stops forwarding mail, everything looks fine, and the only way to get it going again is to kill everything in the corrupt message queue.
Can someone offer some help? Deleting legitimate messages is not a good thing, as you can imagine!
Posted: Fri Dec 07, 2007 5:30 am
by mikethebike
doug,
could you send an email to one of the affected users, then have a look in the fatal log (~logs/fatal) to see if there are any errors?
Does this happen with all mail to those users?
Mick
Posted: Fri Dec 07, 2007 10:26 am
by dougp23
Well, it looks like when this was occurring, ALL mail was moving to the "Corrupt Message Queue".
The fatal logs were talking about a 'mapper error', which means that clamav had apparently updated improperly, and was not running.
Is it common to see ANY messages in the Corrupt Queue, or if you have even one, should you be concerned??
Posted: Fri Dec 07, 2007 3:30 pm
by mikethebike
Doug,
I would be tempted to disable clamav if you think it is that causing the issue, and if there is only that error in the fatal log at that time.
Have you checked you have the correct files in the ~rules directory?
If you turn off the service router and turn it back on, do you get any errors?
When you look at the error queue, are the messages marked as "ERR" or "MSG"?
Mick
Posted: Mon Dec 10, 2007 8:54 am
by dougp23
How would I disable clamav? The same thing is happening this morning, and I don't have time to spend hours figuring out why....
If I just shutdown clamd, I don't think that is gonna help me....
Posted: Mon Dec 10, 2007 9:47 am
by mikethebike
doug,
Are the messages marked as "ERR" or "MSG"?
I think clamav uses scalix rules. You could run:
omshowrt -q all -d
Make sure the rule is in ~scalix/rules, and make sure any files that rule references are in that directory. Ensure they are readable by scalix.
If you do not have time to troubleshoot, you could always remove the rules altogether:
omshowrt -q all |awk '{print $2}' |while read route;do
ommodrt -m $route -d""
done
This may take a few minutes to run.
You will need to stop and restart the router after doing this.
Check in the fatal log for any errors after restarting.
This assumes you have no other rules applied.
Posted: Mon Dec 10, 2007 6:35 pm
by dougp23
Output from
omshowrt -q all -d
goes like this:
The global virus scanning/cleaning rule file exists
UNIX internet MIME
UNIX internet,tnef TNEF
LOCAL qmail
So that looks fine, I think. (qmail is the name of the mailserver itself).
In the ~scalix/rules dir, I have these:
-rw-r--r-- 1 root 54 Mar 13 2007 ALL-ROUTES.VIR
-rw-r--r-- 1 root 67 Mar 13 2007 ndninfo.txt
-r-xr-xr-x 1 root 35644 Mar 13 2007 omvscan.map
so that looks OK, as the read bit is set for everyone.
In fatal logs, I have this:
ERROR Service Router(Service Router) Mon Dec 10 07:58:58 2007
[OM 5181] Reply timed out or invalid - Mapper protocol problem.
Command sent: <none - expect greeting reply>
Reply received: 503 "ClamAV" cannot scan Scalix-owned file
Pid of logging process: 3408
Followed by many of these:
ERROR Service Router(Service Router) Mon Dec 10 08:34:35 2007
[OM 5183] A Mapper error has been detected.
Pid of logging process: 4651
A grep of clamav in /etc/group shows this
scalix:x:101:clamav
clamav:x:102:
The odd thing is I get nightly reports, and it will tell me how many times it found what viruses, and how many times the clamav db was reloaded:
--------------------- Clamav Begin ------------------------
Viruses detected:
HTML.Phishing.Bank-362: 8 Time(s)
Daemon check list:
Database status OK: 41 Time(s)
**Unmatched Entries**
Database correctly reloaded (173161 signatures)
---------------------- Clamav End -------------------------
Any ideas?? That error in the fatal log does seem to indicate that clamav can't scan a scalix owned file, but not sure why I would get that error. And not TONS of times, our mailserver only processes about 3000 emails a day, but it would seem I would have a log FILLED with that permission error if that was the prob.
Anyone?
Posted: Mon Dec 10, 2007 7:42 pm
by mikevl
Hi, what is the ownership of the ClamAV process?
Did you put Scalix in the ClamAV alternate groups?
Mike
Posted: Mon Dec 10, 2007 7:54 pm
by dougp23
HI.
#ps aux | grep clam
clamav 3085 0.6 2.8 97312 59604 ? Ss 07:58 4:25 clamd
a grep of scalix in /etc/group gives me this:
scalix:x:101:clamav
Does that help?
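The group check above can also be done without reading /etc/group by eye. Here is a minimal sketch, assuming a standard group(5) file layout (name:passwd:gid:members). The function name in_group is made up for illustration, and it only looks at the supplementary member list in the fourth field, not at a user's primary group.

```shell
#!/bin/sh
# in_group USER GROUP [FILE]
# Succeeds if USER is listed as a member of GROUP in a group(5)-format file.
# Hypothetical helper: it checks only the comma-separated member list
# (field 4), so a user's primary group is not detected.
in_group() {
    user=$1
    group=$2
    file=${3:-/etc/group}
    awk -F: -v u="$user" -v g="$group" '
        $1 == g {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == u) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$file"
}

# Example use (not run here):
#   in_group clamav scalix && echo "clamav is a member of scalix"
```

With the /etc/group lines posted above, clamav is a member of scalix, but scalix is not a member of clamav.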
Posted: Tue Dec 11, 2007 10:33 am
by les
dougp23 wrote:HI.
#ps aux | grep clam
clamav 3085 0.6 2.8 97312 59604 ? Ss 07:58 4:25 clamd
a grep of scalix in /etc/group gives me this:
scalix:x:101:clamav
Does that help?
Do you have "AllowSupplementaryGroups yes" in /etc/clamd.conf?
The error is likely just the generic error that gets returned. Your setup all looks right, so something else is amiss.
You should enable debug logging in clamav to see what's going on.
Edit /var/opt/scalix/xx/s/sys/omvscan.cfg and set OMAV_LOGLEVEL to 3 for debug.
Restart the service router after the change, then send a test message through and watch the log file /var/opt/scalix/xx/s/logs/omvscan.log.
If errors are happening they should be logged there in more detail.
Hope that helps.
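To confirm the AllowSupplementaryGroups setting without opening the file, something like the following could work. It is only a sketch: conf_value is a hypothetical helper name, and it assumes the usual clamd.conf layout of one "Option value" pair per line with # for comments.

```shell
#!/bin/sh
# conf_value OPTION [FILE]
# Prints the value of OPTION from a clamd.conf-style file and succeeds,
# or prints nothing and fails if the option is not set.
# Hypothetical helper: assumes "Option value" lines and "#" comments;
# if the option appears more than once, the last occurrence is printed.
conf_value() {
    key=$1
    file=${2:-/etc/clamd.conf}
    awk -v k="$key" '
        $1 ~ /^#/ { next }          # skip commented-out lines
        $1 == k   { val = $2 }
        END { if (val != "") print val; else exit 1 }
    ' "$file"
}

# Example use (not run here):
#   conf_value AllowSupplementaryGroups || echo "option not set"
```

A commented-out line such as "# AllowSupplementaryGroups no" is ignored, so only an active setting is reported.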
Posted: Wed Dec 12, 2007 10:20 am
by dougp23
Yes, I have the "AllowSuppGroups" set to yes.
As a temp fix, I removed freshclam from /etc/cron.d/daily. I have to believe that this is due to clam updating. When it updates, it restarts clamd. And if you are in the midst of processing a message, it might get flaky. (It would seem the better way to do this would be to shut down the service router, update clamav, wait about 1 minute for clamav to reload, then restart the service router).
So for now, I will manually run freshclam when I am near the system.
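The stop, update, wait, restart sequence described above could be scripted roughly like this. It is a sketch only: the commands that stop and start the Scalix service router are not given in this thread, so ROUTER_STOP and ROUTER_START are placeholders you would have to fill in, and RELOAD_WAIT and DRY_RUN are made-up variable names. The 60-second default matches the "about 1 minute" guess above and is not a measured value.

```shell
#!/bin/sh
# Sketch of the sequence: stop the router, update ClamAV, wait for clamd
# to settle, start the router again.
# ASSUMPTIONS: ROUTER_STOP / ROUTER_START hold site-specific commands for
# the Scalix service router (placeholders, not taken from this thread).
# Set DRY_RUN=1 to print each step instead of running it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

update_clamav() {
    : "${ROUTER_STOP:?set ROUTER_STOP to the command that stops the service router}"
    : "${ROUTER_START:?set ROUTER_START to the command that starts the service router}"
    wait_secs=${RELOAD_WAIT:-60}

    # Do not touch ClamAV unless the router actually stopped.
    run sh -c "$ROUTER_STOP" || return 1

    # A non-zero exit from freshclam is reported but not treated as fatal,
    # so the router is always started again afterwards.
    run freshclam || echo "freshclam exited non-zero" >&2

    run sleep "$wait_secs"
    run sh -c "$ROUTER_START"
}
```

Running it once with DRY_RUN=1 shows the four steps in order without stopping any mail flow, which is a cheap way to check the placeholders before putting it in cron.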
Posted: Wed Dec 12, 2007 5:30 pm
by les
dougp23 wrote:Yes, I have the "AllowSuppGroups" set to yes.
As a temp fix, I removed freshclam from /etc/cron.d/daily. I have to believe that this is due to clam updating. When it updates, it restarts clamd. And if you are in the midst of processing a message, it might get flaky. (It would seem the better way to do this would be to shut down the service router, update clamav, wait about 1 minute for clamav to reload, then restart the service router).
So for now, I will manually run freshclam when I am near the system.
Yes, it is a fragile environment!
These days I prefer to have clamav hooked directly into sendmail via clamav-milter.
In the past I've had multiple sites where the service router has stopped and blocked mail due to an error from clamav.
clamav-milter coexists nicely with spamassassin and spamass-milter, and it takes the service router out of play in terms of stopping all mail if it cannot scan a file.
On multiple sites I have yet to see any issues with this setup, and I still have freshclam running.