How to recover from looping non-delivery messages?

mhanisch
Posts: 31
Joined: Mon Jan 02, 2006 11:53 am
Location: Munich, Germany

Postby mhanisch » Sat Jan 20, 2007 11:13 am

I have a bad problem with non-delivery notifications that keep looping.

No matter what I do, these messages (well, copies of the same message) keep
on appearing on my local delivery queue.

At the moment, I'm seeing about 160000 of these messages on the ld queue, and that
number is increasing by the second.

For some reason, I cannot get rid of them.
Yesterday evening I managed to get the number down to about 100000, but this morning
the number went up again.

Is there any way to discard all NDNs on a queue?
Or can I make sure that they are delivered only once?
How can I delete them from the queue (they all share the same subject line)?

This seems to be a bug in Scalix that is triggered by an incorrect RCPT address in the original mail.

ls-al
Scalix Star
Posts: 510
Joined: Tue Jun 29, 2004 8:28 am
Location: Leipzig, Germany

Postby ls-al » Sat Jan 20, 2007 12:24 pm

Hi,
First of all, I would suggest shutting down the Service Router, Local Delivery and the SMTP Relay Daemon to prevent further messages, regardless of where they come from.

Code:

omoff -d 0 sr smtpd ld


Then, if you are sure that the subject is unique, delete them from the queues with a slightly modified version of the script from viewtopic.php?p=25355.

Code:

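# router queue: list it, keep lines containing the subject (fixed string), take the first field (message ID)
omstat -q router | grep -F "put in the subject here" | cut -d" " -f 1 | while read -r msgid
do
    omstat -q router -j -R -m "$msgid"
done
# same for the local delivery queue
omstat -q local | grep -F "put in the subject here" | cut -d" " -f 1 | while read -r msgid
do
    omstat -q local -j -R -m "$msgid"
done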

This will take a while.
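If you want to review what would be deleted before committing, the two loops can be folded into one function with a dry-run mode. This is a minimal sketch, not a tested Scalix tool: ids_matching, purge_by_subject and DRY_RUN are hypothetical names, it assumes (like the script above) that the first field of each omstat -q line is the message ID, it matches the subject as a fixed string, and the -j -R -m flags are copied unchanged from the script rather than checked against the Scalix documentation.

```shell
#!/bin/sh
# Hypothetical helper: read a queue listing on stdin and print the first
# field (ASSUMED to be the message ID) of every line that contains the
# subject given as $1. grep -F matches it as a fixed string, not a regex.
ids_matching() {
    grep -F -- "$1" | awk '{ print $1 }'
}

# Hypothetical wrapper: purge_by_subject QUEUE SUBJECT
# Lists QUEUE and removes every message whose line contains SUBJECT.
# Unless DRY_RUN=0 is set, it only prints the commands it would run.
purge_by_subject() {
    queue=$1
    subject=$2
    omstat -q "$queue" | ids_matching "$subject" | while read -r msgid
    do
        if [ "${DRY_RUN:-1}" != 0 ]; then
            echo "would run: omstat -q $queue -j -R -m $msgid"
        else
            # </dev/null keeps omstat from reading the ID list on stdin.
            omstat -q "$queue" -j -R -m "$msgid" </dev/null
        fi
    done
}
```

After checking the dry-run output, the real run would be DRY_RUN=0 purge_by_subject router "put in the subject here", followed by the same for the local queue.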
There is currently too little information to determine the cause of the problem.
Maybe you can post more once your system is up and running again. Don't forget to start the services again afterwards:

Code:

omon ld sr smtpd


Best regards from one Free State to another. :wink:

mhanisch

NDNs are piling up even though router etc. are disabled

Postby mhanisch » Sat Jan 20, 2007 12:48 pm

Hi,

thanks for the heads up.

A while ago I started a script like the one you proposed,
but it's veeeeery slow.
The new messages come in much faster than the script can delete them, so
I'm afraid this won't really help me.
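Whether any deletion loop can catch up is simple arithmetic: with a backlog of N messages, a deletion rate d and an arrival rate a (both in messages per second), the queue only shrinks when d > a, and it then takes roughly N / (d - a) seconds to empty. A minimal sketch with integer arithmetic; drain_seconds is a hypothetical name, and the rates would have to be measured, for example by counting the lines of omstat -q local output twice, a minute apart (assuming one line per queued message).

```shell
#!/bin/sh
# Hypothetical helper: drain_seconds BACKLOG DELETE_RATE ARRIVAL_RATE
# Rates are whole messages per second. Prints the approximate number of
# seconds until the backlog is empty, or "never" if messages arrive at
# least as fast as they are deleted.
drain_seconds() {
    backlog=$1
    del_rate=$2
    arr_rate=$3
    net=$((del_rate - arr_rate))
    if [ "$net" -le 0 ]; then
        echo never
    else
        # Round up so a partial last second still counts.
        echo $(( (backlog + net - 1) / net ))
    fi
}
```

For example, drain_seconds 100000 50 10 prints 2500, while any arrival rate at or above the deletion rate prints never, which is the situation described here.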

Unfortunately stopping ld, sr and smtpd doesn't help - I'm still getting more and more messages, only NDNs. No idea where they come from or which service is creating them :-/

Do you have an idea where the non-delivery reports are being created?
I would have assumed that it's the Service Router, but apparently that's not the case - the service router has been stopped a while ago, and still there are more and more messages.

My next steps:
- shut down smtpd so that at least no more emails from the outside get in
- keep ld running to reduce the number of messages in the queue
- at some point, there should only be error messages in the queue, since no emails are accepted from the outside.
- when that happens, I'll delete all the remaining messages in the queue using omqdump; that's fast enough to at least reduce the number of messages in the queue
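The third step, waiting until only the error messages are left, can be checked against a queue listing. A minimal sketch, assuming that each message occupies one line of omstat -q local output, that the listing has no header lines (or that they are stripped first), and that the NDNs share one fixed subject string; only_matching_left is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: only_matching_left SUBJECT < queue-listing
# Succeeds (exit status 0) if every non-empty line read from stdin
# contains SUBJECT as a fixed string, i.e. only the NDNs are left.
only_matching_left() {
    # Count non-empty lines that do NOT contain SUBJECT; "|| true" keeps
    # grep's exit status 1 (nothing counted) from mattering under set -e.
    others=$(grep -v -F -- "$1" | grep -c . || true)
    [ "$others" -eq 0 ]
}
```

A polling loop could then look like: until omstat -q local | only_matching_left "put in the subject here"; do sleep 60; done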

Anyway, thanks for the reply!

I hope that I'll manage to stabilize the system to a point where I can start investigating
the origin of these pesky NDNs; otherwise I don't see how I can get everything up and running again... :-(


Greetings from Munich,
Michael.

mhanisch

Found the culprit

Postby mhanisch » Sat Jan 20, 2007 1:18 pm

Ok, I've found the culprit:

One of our users sent a mail with a broken recipient address through our server.
The internet mail gateway noticed that the recipient address was broken and
sent a NDN back to the service router.
In addition, the contents were returned. (I've only just disabled that.)

But apparently that recipient address was screwed up so badly that unix.out
could not remove that message from its queue, so it kept sending one NDN after
the other.
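A loop like this shows up in a queue listing as the same line, apart from the message ID, repeated thousands of times. A minimal sketch for spotting the top offenders, assuming as above that the first field of each omstat -q line is the message ID and that the rest of the line is identical for every copy; top_repeats is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: top_repeats [N] < queue-listing
# Drops the first field (ASSUMED to be the per-message ID), then prints
# the N most frequent remaining line texts with their counts.
top_repeats() {
    n=${1:-5}
    awk '{ $1 = ""; sub(/^ +/, ""); print }' | sort | uniq -c | sort -rn | head -n "$n"
}
```

Usage would be omstat -q local | top_repeats 10; a single entry with a count far above the rest points at the looping message.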

Now that the Internet Mail Gateway has been stopped, there are no more new mails.
I deleted the offending email from the queue, so hopefully this will keep the problem at
bay.

Now I only need to take care of the 230000 messages that are waiting in the router and local queues... :?

dkelly
Scalix
Posts: 593
Joined: Thu Mar 18, 2004 2:03 pm

Postby dkelly » Sat Jan 20, 2007 2:51 pm

What version of the Scalix server are you running? This behaviour was addressed in Scalix 11, so if you are still seeing it, please tell us, as we would obviously have more fixes to do.

Cheers

Dave

mhanisch

handling of known bugs and other support issues

Postby mhanisch » Sun Jan 21, 2007 1:42 pm

Hi Dave,

we're running 10.0.3 at the moment.

I did not know that this was a known bug (obviously); at least it's one more reason to upgrade to v11 now that it's ready.

BTW, what's the preferred/official way to stay up to date on known issues like this one? I would assume that (with an enterprise subscription and such) we would be informed about new known issues as they come up, but maybe it was there and I just missed the corresponding entry in the release notes...
Or should I just check the Scalix bug tracker on a regular basis?

On a related note, we're currently evaluating an extension of our Scalix deployment to 2 additional locations, i.e. ~60-70 additional users and some additional servers for load-balancing and failover. But given the problems and downtime we've had over the past 6 months or so, it's kind of hard to justify this additional investment in Scalix, especially since the support response times have been less than ideal recently. (I still haven't heard from Scalix support on this issue despite the 2 emails I've sent there in the past few days.)

I know that it's not your fault, but I definitely need to address this issue with someone from Scalix sales or our Scalix distribution partner, since we seem to be having a real communication problem lately.

Anyway, thanks for the heads up. At least our mail system is back in working order again...

