
SERIOUS ERROR Local Delivery(Local Delivery)

Posted: Wed May 23, 2007 10:27 am
by eyalm
Hi,
I've been getting this error lately:

omshowlog -s ld
SERIOUS ERROR Local Delivery(Local Delivery) 05.23.07 09:18:09
[OM 10272] BACKTRACE:
/opt/scalix/lib/libom_er.so(er_add_backtrace+0xc6)[0xf7f2eee6]
/opt/scalix/lib/libom_cvc.so(cvc_enhCnvString+0x107)[0xf7c5f230]
/opt/scalix/lib/libom_cvc.so(cvc_ConvertString+0x3d)[0xf7c5fcc5]
/opt/scalix/lib/libom_rtfl.so(rtfl_BuildLine+0x3e9)[0xf7c26897]
/opt/scalix/lib/libom_rtfl.so[0xf7c28a50]
/opt/scalix/lib/libom_rtfl.so(rtfl_Parse+0x175)[0xf7c2a31e]
/opt/scalix/lib/libom_rtfl.so(rtfl_search+0x109)[0xf7c27d83]
/opt/scalix/lib/libom_flt.so[0xf7f1eef9]
/opt/scalix/lib/libom_flt.so(flt_ApplyTextMatch+0xe1)[0xf7f1f036]
/opt/scalix/lib/libom_flt.so(Test_TextBody_Att+0x1eb)[0xf7f1cafe]
/opt/scalix/lib/libom_flt.so(flt_ApplySingle+0x6c8)[0xf7f1b6c0]
/opt/scalix/lib/libom_flt.so(flt_ApplyNextFilter+0x228)[0xf7f1a9da]
/opt/scalix/lib/libom_flt.so(flt_ApplyOrGroup+0xb6)[0xf7f1aca5]
/opt/scalix/lib/libom_flt.so(flt_ApplyNextFilter+0x1e8)[0xf7f1a99a]
/opt/scalix/lib/libom_flt.so(flt_ApplyOrGroup+0xb6)[0xf7f1aca5]
/opt/scalix/lib/libom_flt.so(flt_ApplyNextFilter+0x1e8)[0xf7f1a99a]
/opt/scalix/lib/libom_flt.so(flt_ApplyAndGroup+0xb6)[0xf7f1aadc]
/opt/scalix/lib/libom_flt.so(flt_ApplyNextFilter+0x1c9)[0xf7f1a97b]
/opt/scalix/lib/libom_flt.so(flt_ApplyOuterGroup+0xcb)[0xf7f1a663]
/opt/scalix/lib/libom_flt.so(flt_ApplyFC+0x140)[0xf7f1a184]
local.delivery[0x8057cd9]
local.delivery[0x80530a4]
local.delivery[0x805c2ab]
local.delivery[0x805dfa3]
Last Msg Id: L20398BF263BE44efACFB1B8E89CF456E.1179929848.scalix.int.cardonhealthcare.com


It shows up each time with a different 'Msg Id'.

Is there any way to track what's going on?

Posted: Thu May 24, 2007 11:18 am
by eyalm
Anybody?
It's still showing up every so often.

Posted: Thu May 24, 2007 5:08 pm
by Shredder
omsolve gives the following:


mail:~ # omsolve -n OM 10272
-------------------------------------------------------------------------------
Error Group: OM  Error Number: 10272

BACKTRACE:
[]

Diagnostic Stack Trace - normally produced after message a serious
problem which is reported separately.  This message is kept separate
because in the case of memory corruptions, there may be problems
generating it.
-------------------------------------------------------------------------------


I would check if something is stuck in the local delivery queue (omstat -q local).

You can also use SAC to look at it. (Under the Server Info section, then under Queues, Local Delivery)

Shredder

Posted: Thu May 24, 2007 5:49 pm
by eyalm
Thanks for the reply.
I checked that before, but the local queue is empty:

[root@scalix logs]# omstat -q local
omstat : There are no messages on the queue


I also checked the other queues in SAC and they're all empty.

Posted: Thu May 24, 2007 6:11 pm
by Shredder
You could try restarting the local delivery service:


omoff -d0 -w ld
omreset -o off ld
omon ld


Shredder

Posted: Thu May 24, 2007 6:30 pm
by eyalm
I'll try that and see if the error keeps showing up.

thanks!

Posted: Fri May 25, 2007 10:48 am
by eyalm
The error is still showing up, with a different Msg Id each time.

Any other ideas?

Posted: Fri May 25, 2007 11:10 am
by Shredder
Can you post the contents of /var/opt/scalix/??/s/logs/fatal

Maybe there is something in there.

Shredder

Posted: Wed May 30, 2007 7:40 am
by gren
Hi,

If you do "omstat -q ERROR" on your Scalix server, are there any messages reported?
Also, for the POISON queue, does "omstat -q POISON" report anything?

If so, these may be messages we are encountering problems processing.

Which version of Scalix are you using? I know some fixes have been made in a similar area in recent 11.X fix releases.

Things you can try:
omresub -q ERROR
omresub -q POISON

Note that if the messages are still causing problems, then local.delivery will die again and need restarting.

Any messages that end up back on the ERROR queue or the POISON queue would be interesting.
It would be useful if you could send me examples of these messages dumped from the queues. The "omqdump" command can do this. The password is "A##E", where ## is today's day of the month + 10, so for 30th May the password is "A40E".
The "o" command will output a message to a set of files, if you could tar these up and send the result to gren dot elliot at scalix dot com with dots and ats replaced, that would be great. With these, we may be able to pinpoint the exact error and fix it :)
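For anyone who wants to avoid doing the date arithmetic by hand, here is a minimal sketch of the password rule described above. The function name `queue_tool_password` is made up for illustration and is not part of Scalix; the only thing taken from this post is the "A" + (day of month + 10) + "E" pattern.

```python
from datetime import date


def queue_tool_password(today: date) -> str:
    """Build the password as described in the post: 'A', then the day of
    the month plus 10, then 'E'. Hypothetical helper, not a Scalix tool."""
    return f"A{today.day + 10}E"


if __name__ == "__main__":
    # Password for the current day, following the rule from the post.
    print(queue_tool_password(date.today()))
```

Calling it with 30th May 2007 gives the same "A40E" as the example in the post.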

Regards,
Gren.

Posted: Wed May 30, 2007 9:57 am
by eyalm
Gren,
Both the ERROR and POISON queues are empty.
Also, local.delivery doesn't die; it just gives me that error in omshowlog -s ld, with a different Msg Id every time.

Thanks.

Posted: Thu May 31, 2007 5:49 am
by gren
Ahhh! Actually, that does make sense now I think about it. This error occurs once a message has been delivered into an intray. It happens during the creation of the cached image of a message for IMAP usage.

If you could locate one of the messages and do an "sxmboxexp" of the direct reference for the message, that would be great. It looks like the direct reference isn't logged in this case. (It would be after the "Last Msg Id:") Is that correct?

You could use audit logging (omconfaud) to track who has just had a message delivered when the error happens. Use omcontain to navigate to that user's intray (same password as for omqdump) and use the "r" command to find the direct reference. It will be in a line similar to this:
Direct Index: 972121 (69120) 000ed5592d544c93

The important bit is the last string similar to "000ed5592d544c93".
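If the omcontain output has been saved to a text file, the direct reference could be pulled out with something like the sketch below. It assumes the line always looks like the sample above (a number, a number in parentheses, then a hex string); `find_direct_reference` is a hypothetical helper, not a Scalix command.

```python
import re

# Pattern based only on the sample line in this post:
#   Direct Index: 972121 (69120) 000ed5592d544c93
DIRECT_INDEX = re.compile(r"Direct Index:\s*\d+\s*\(\d+\)\s*([0-9a-fA-F]+)")


def find_direct_reference(text):
    """Return the trailing hex string of the first 'Direct Index' line,
    or None if no such line is present."""
    match = DIRECT_INDEX.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = "Direct Index: 972121 (69120) 000ed5592d544c93"
    print(find_direct_reference(sample))
```

The returned string is the value to pass to the commands that take a direct reference.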

If you have successfully found a problem message, then, from the command line as root:

mime.browse -x -m 000ed5592d544c93

will probably cause the same problem in the error logs and probably a non-zero exit code.
To produce a dump of the problem message in "foo.mbox", use:
sxmboxexp --force -a foo.mbox --dref 000ed5592d544c93

Thanks,
Gren.

Posted: Thu May 31, 2007 7:50 pm
by eyalm
Thanks Gren,
I'll check it this weekend and post the results.

Posted: Tue Jun 05, 2007 12:29 pm
by eyalm
I found a message that was producing the error.
I ran mime.browse on that message and got no error.
What I did notice is that today 3 messages showed errors in omshowlog -s ld, and all 3 were delivered to the same user.

I'll keep checking if that user is always a recipient when ld generates that error.

Posted: Wed Jun 06, 2007 7:21 pm
by kjakkanen
We get this error occasionally too, though it doesn't seem to happen regularly. I don't quite understand how I could check afterwards for the messages causing this problem. I know I could do it in real time by tailing the audit and error logs, but I might have to sit tight for a long period?

-Kimmo
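One possible way to look at this after the fact, rather than tailing logs live, is to save the output of omshowlog -s ld to a file and pull out the timestamp and "Last Msg Id" of each error, then compare those times against the audit log by hand. The sketch below assumes the entries look like the one in the first post and that the date is month.day.year; `extract_errors` is a hypothetical helper, and the audit log format is not covered here because it isn't shown in this thread.

```python
import re
from datetime import datetime

# Header line as shown in the first post, e.g.
#   SERIOUS ERROR Local Delivery(Local Delivery) 05.23.07 09:18:09
HEADER = re.compile(r"^SERIOUS ERROR .*?(\d{2}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2})\s*$")
MSG_ID = re.compile(r"^Last Msg Id:\s*(\S+)")


def extract_errors(log_text):
    """Return a list of (timestamp, msg_id) pairs, one per error entry.

    Assumes each entry starts with a SERIOUS ERROR header and ends with
    a 'Last Msg Id:' line, and that dates are MM.DD.YY (an assumption
    based on the single sample in this thread)."""
    entries = []
    current_time = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        header = HEADER.match(line)
        if header:
            current_time = datetime.strptime(header.group(1), "%m.%d.%y %H:%M:%S")
            continue
        msg = MSG_ID.match(line)
        if msg and current_time is not None:
            entries.append((current_time, msg.group(1)))
            current_time = None
    return entries
```

With the timestamps in hand, it should be easier to see whether the same recipient turns up around each error.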