Error Queue Filling Up

mshade
Posts: 19
Joined: Fri Apr 20, 2007 2:49 pm

Postby mshade » Mon Jun 18, 2007 12:21 pm

Hi all,

Recently, more and more messages have been getting dumped into the error queue -- this morning, I noticed that even Scalix's own messages are getting routed to the error queue. Here's an example:

Code:

Messages on the Scalix ERROR Queue
----------------------------------------------------
31143364   prosody.com            MSG   N boxes                       06.16.07
31150595   root / internet        MSG   N Cron <root@ps> /usr/sbin/up 06.17.07
31153667   root / internet        MSG   N <mailnode> - Scali 06.17.07
31154611   root / internet        MSG   N <mailnode> - Scali 06.17.07
31154851   fvpf-dev-bounces       MSG   N [Fvpf-dev] [FVPF-dev_01 Tas 06.17.07
31155299   fvpf-dev-bounces       MSG   N [Fvpf-dev] [FVPF-dev_01 Tas 06.17.07
31155939   root / internet        MSG   N Cron <root@f1mail> /usr/bin 06.17.07
31156692   nvqv / internet        MSG   N astounding purchases simpli 06.17.07
31157044   ccampbell / internet   MSG   N New Contact Info            06.17.07
31158515   root / internet        MSG   N Cron <root> /var/ww 00:52:28


Part of the problem seems to lie with the Mapper script that handles the connection between the Service Router and ClamAV, but there are also numerous SERIOUS ERROR lines regarding local delivery.

One of the many entries in the fatal log:

Code:

SERIOUS ERROR           Local Delivery(Local Delivery) Mon Jun 18 11:50:13 2007
[OM 10272] BACKTRACE:
/opt/scalix/lib/libom_er.so(er_add_backtrace+0xc6)[0xa0aee6]
/opt/scalix/lib/libom_cvc.so(cvc_enhCnvString+0x107)[0x362230]
/opt/scalix/lib/libom_cvc.so(cvc_ConvertString+0x3d)[0x362cc5]
/opt/scalix/lib/libom_rtfl.so(rtfl_BuildLine+0x3e9)[0xe78897]
/opt/scalix/lib/libom_rtfl.so[0xe7aa50]
/opt/scalix/lib/libom_rtfl.so(rtfl_Parse+0x276)[0xe7c41f]
/opt/scalix/lib/libom_rtfl.so(rtfl_search+0x109)[0xe79d83]
/opt/scalix/lib/libom_flt.so[0x9ebef9]
/opt/scalix/lib/libom_flt.so(flt_ApplyTextMatch+0xe1)[0x9ec036]
/opt/scalix/lib/libom_flt.so(Test_TextBody_Att+0x1eb)[0x9e9afe]
/opt/scalix/lib/libom_flt.so(flt_ApplySingle+0x6c8)[0x9e86c0]
/opt/scalix/lib/libom_flt.so(flt_ApplyNextFilter+0x228)[0x9e79da]
/opt/scalix/lib/libom_flt.so(flt_ApplyOrGroup+0xb6)[0x9e7ca5]
/opt/scalix/lib/libom_flt.so(flt_ApplyNextFilter+0x1e8)[0x9e799a]
/opt/scalix/lib/libom_flt.so(flt_ApplyOrGroup+0xb6)[0x9e7ca5]
/opt/scalix/lib/libom_flt.so(flt_ApplyNextFilter+0x1e8)[0x9e799a]
/opt/scalix/lib/libom_flt.so(flt_ApplyOuterGroup+0xcb)[0x9e7663]
/opt/scalix/lib/libom_flt.so(flt_ApplyFC+0x140)[0x9e7184]
local.delivery[0x8057cd9]
local.delivery[0x80530a4]
local.delivery[0x805c2ab]
local.delivery[0x805dfa3]
local.delivery[0x805ec4b]
/lib/tls/libc.so.6(__libc_start_main+0xd3)[0x48ade3]
Pid of logging process: 14927
  Last Msg Id: H000008700966330.1182181781.<mailnode>
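A backtrace like this one is easier to read once it is reduced to the sequence of objects it passes through. The following is only a rough sketch, assuming each frame has the `object(symbol+0xoffset)[0xaddress]` shape shown above; the function names `parse_frames` and `collapse` are made up for illustration and are not part of Scalix.

```python
import os
import re

# One frame per line: object, optional (symbol+0xoffset), then [0xaddress].
FRAME = re.compile(
    r"^(?P<obj>[^\s(\[]+)"
    r"(?:\((?P<sym>\w+)\+0x[0-9a-fA-F]+\))?"
    r"\[0x[0-9a-fA-F]+\]$"
)


def parse_frames(lines):
    """Return (object basename, symbol or None) for every frame line."""
    frames = []
    for line in lines:
        m = FRAME.match(line.strip())
        if m:
            frames.append((os.path.basename(m.group("obj")), m.group("sym")))
    return frames


def collapse(frames):
    """Merge consecutive frames from the same object into (object, count)."""
    runs = []
    for obj, _sym in frames:
        if runs and runs[-1][0] == obj:
            runs[-1][1] += 1
        else:
            runs.append([obj, 1])
    return [tuple(run) for run in runs]
```

Feeding the frames above through `collapse(parse_frames(...))` gives a short list of objects in call order, innermost first, which makes it easier to compare one SERIOUS ERROR entry against another.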


Also receiving these errors:

Code:

ERROR                   Service Router(Service Router) Wed Jun 13 17:55:31 2007
[OM 5183] A Mapper error has been detected.
Pid of logging process: 27770


ERROR                   Service Router(Service Router) Wed Jun 13 18:02:08 2007
[OM 5181] Reply timed out or invalid - Mapper protocol problem.
Command sent: SCAN:/var/opt/scalix/fl/s/data/00000ij/00tj4i7
Reply received: 504 anti-virus engine "ClamAV" exhibits unexpected behavior
Pid of logging process: 27770


ERROR                   Service Router(Service Router) Wed Jun 13 18:02:08 2007
[OM 5181] Reply timed out or invalid - Mapper protocol problem.
Command sent: QUIT Please Close This Session
Reply received:
Pid of logging process: 27770


ERROR                   Service Router(Service Router) Wed Jun 13 18:02:08 2007
[OM 5183] A Mapper error has been detected.
Pid of logging process: 27770


Are the service router and ClamAV errors related to the Serious Error local delivery errors, and what else should we be looking for to remedy the root of our problem?

We are running Scalix 11.0.4, upgraded nearly a month ago from 10.

Thanks!

chris
Scalix Star
Posts: 321
Joined: Mon May 09, 2005 2:56 pm
Location: Freiburg, Germany

Postby chris » Mon Jun 18, 2007 7:25 pm

Are you getting any errors from ClamAV that correlate to those errors you see in the Scalix logs?

Error 504 can be a lot of different things. According to RFC2821 (http://www.ietf.org/rfc/rfc2821.txt) it's Command Parameter Not Implemented, which means that ClamAV is getting a parameter it doesn't understand for a command which it does understand.

I'll bet there's something in the Clam logs that should point in the right direction.

/c

mshade
Posts: 19
Joined: Fri Apr 20, 2007 2:49 pm

Postby mshade » Tue Jun 19, 2007 9:39 am

Chris, thanks for the response.

I did turn on verbose logging when I noticed the problem, but clamd.log isn't showing anything useful. It's mainly virus database updates and database checks -- not an error in sight that seems to correspond to anything the Scalix fatal log is showing.

Here's a snippet:

Code:

Tue Jun 19 01:47:58 2007 -> Reading databases from /var/clamav
Tue Jun 19 01:49:05 2007 -> Database correctly reloaded (232369 signatures)
Tue Jun 19 02:00:18 2007 -> SelfCheck: Database status OK.
Tue Jun 19 02:31:23 2007 -> SelfCheck: Database status OK.
Tue Jun 19 02:57:33 2007 -> Reading databases from /var/clamav
Tue Jun 19 02:58:32 2007 -> Database correctly reloaded (232378 signatures)
Tue Jun 19 03:03:38 2007 -> SelfCheck: Database status OK.
Tue Jun 19 03:35:41 2007 -> SelfCheck: Database status OK.
Tue Jun 19 04:06:20 2007 -> SelfCheck: Database status OK.
Tue Jun 19 04:36:51 2007 -> SelfCheck: Database status OK.

etcetera...
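One way to check for a correlation is to line up the timestamps in the two logs. The sketch below assumes the clamd.log format shown above (`timestamp -> message`) and the Scalix header format from the earlier log excerpts; `correlate` and the 120-second default window are illustrative choices, not anything Scalix or ClamAV provides.

```python
import re
from datetime import datetime, timedelta

# ctime-style stamp used by both logs, e.g. "Tue Jun 19 01:47:58 2007".
STAMP = r"\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}"
CLAMD_LINE = re.compile(rf"^({STAMP}) -> (.*)$")
SCALIX_HEADER = re.compile(rf"^(?:SERIOUS ERROR|ERROR)\s+.*?\s({STAMP})\s*$")


def parse_stamp(text):
    # Collapse repeated spaces; day/month names assume an English (C) locale.
    return datetime.strptime(" ".join(text.split()), "%a %b %d %H:%M:%S %Y")


def correlate(scalix_lines, clamd_lines, window_seconds=120):
    """For each Scalix error header, list clamd messages logged nearby in time."""
    window = timedelta(seconds=window_seconds)
    clamd = []
    for line in clamd_lines:
        m = CLAMD_LINE.match(line.strip())
        if m:
            clamd.append((parse_stamp(m.group(1)), m.group(2)))
    hits = []
    for line in scalix_lines:
        m = SCALIX_HEADER.match(line.strip())
        if not m:
            continue
        when = parse_stamp(m.group(1))
        near = [msg for t, msg in clamd if abs(t - when) <= window]
        hits.append((when, near))
    return hits
```

If most errors turn out to sit next to a "Reading databases" / "Database correctly reloaded" pair, that would hint at scans stalling during a reload; if the nearby lists are empty, the two logs are probably not telling the same story.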

Do you think the Local Delivery serious errors are also related to ClamAV?

Thanks again

chris
Scalix Star
Posts: 321
Joined: Mon May 09, 2005 2:56 pm
Location: Freiburg, Germany

Postby chris » Tue Jun 19, 2007 9:48 am

I'd guess that they are related, but I'll check with engineering to see whether anyone thinks differently.

dkelly
Scalix
Posts: 593
Joined: Thu Mar 18, 2004 2:03 pm

Postby dkelly » Tue Jun 19, 2007 10:13 am

What version of Scalix are you currently running?

The problems that you list here are two-fold.

The first one is the virus scanner, which is the simplest to fix. The key is this error:

Code:

[OM 5181] Reply timed out or invalid - Mapper protocol problem.

If you search for "clamav" and "timeout", you'll find this post at http://www.scalix.com/forums/viewtopic.php?t=6959&highlight=clamav+timeout. The link in that post shows the configuration options you need to include in general.cfg.

The Local Delivery error is something different. It is most likely bug 14709, which was fixed in our April release, 11.0.3.

Cheers

Dave

mshade
Posts: 19
Joined: Fri Apr 20, 2007 2:49 pm

Postby mshade » Tue Jun 19, 2007 10:34 am

We're running 11.0.4, upgraded from 10.0.5 -- on RHEL 4.

Thanks for that link above! I've gone ahead and upped the Mapper timeout to 120 seconds; I know this is at least part of the issue since we've always had trouble with messages with large attachments ending up on the error queue. The problem is that many of the messages on the error queue now have no attachments at all and so probably wouldn't have fallen victim to a timeout.

Any other ideas about the Serious Errors? If this bug was fixed in 11.0.3, could there have been a problem with the upgrade?

Thanks again

