
seriously delayed emails

Posted: Wed Dec 13, 2006 3:00 pm
by souperdad
We are having a problem where a lot of our emails are being delayed, sometimes by up to 8 hours. As you can see from the /var/log/maillog excerpt below, the connection was refused by 127.0.0.1, but the message was in fact delivered 6 hours later (email addresses changed for privacy reasons):

Dec 12 10:25:00 mail sendmail[24332]: kBCIOsEC024332: from=<sender@sender.com>, size=103892, class=0, nrcpts=1, msgid=<4AEEB55F8E7E954897DDFAC79D82468C3CC696@gaalpsvr0281>, proto=ESMTP, relay=root@localhost
Dec 12 10:25:00 mail sendmail[24332]: kBCIOsEC024332: to=<recipient@recipient.com>, delay=00:00:03, mailer=relay, pri=133892, stat=queued
Dec 12 12:23:57 mail sm-msp-queue[31253]: kBCIOsEC024332: to=<recipient@recipient.com>, delay=01:59:00, xdelay=00:00:00, mailer=relay, pri=223892, relay=[127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by [127.0.0.1]
Dec 12 18:13:55 mail sm-msp-queue[28917]: kBCIOsEC024332: to=<recipient@recipient.com>, delay=07:48:58, xdelay=00:00:09, mailer=relay, pri=313892, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (kBD1HrKC029435 Message accepted for delivery)
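
To see how many messages are affected and for how long, the delivery lines can be pulled out of the log with a short script. This is only a minimal sketch: the regular expression is written against the lines quoted above, and the names `parse_delivery` and `delay_seconds` are made up for illustration, not part of sendmail.

```python
import re

# Matches the "to=" delivery lines shown above; "from=" lines do not match.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s\S+\s(?P<proc>[\w-]+)\[\d+\]:\s"
    r"(?P<qid>\w+):\sto=(?P<rcpt>[^,]+),\sdelay=(?P<delay>[\d:+]+),"
    r".*?stat=(?P<stat>.*)$"
)


def delay_seconds(delay):
    """Convert a sendmail delay of the form [days+]HH:MM:SS into seconds."""
    days = 0
    if "+" in delay:
        day_part, delay = delay.split("+", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in delay.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def parse_delivery(line):
    """Return a dict of fields for a delivery line, or None for other lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["delay_s"] = delay_seconds(record["delay"])
    return record


if __name__ == "__main__":
    # Path taken from the post; adjust if your syslog writes elsewhere.
    with open("/var/log/maillog") as log:
        for raw in log:
            rec = parse_delivery(raw)
            if rec and rec["stat"].startswith("Deferred"):
                print(rec["ts"], rec["qid"], rec["delay"], rec["stat"])
```

Run against the excerpt above, this would list the 12:23:57 deferral; changing the filter to `rec["delay_s"] > 3600` would instead list every delivery attempt that was more than an hour late.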

Anyone have any ideas why these emails might be sitting on the server for this time instead of being delivered?

From the Scalix server I can successfully telnet to 127.0.0.1.

Here is the smtpd.cfg:

SMTPFILTER=TRUE
RELAY accept 127.0.0.1
RELAY accept .lush.com
RELAY accept .lushcanada.com
RELAY accept .lushusa.com
RELAY Log_Reject ALL

# extra rules added to prevent open relay usage
RECIPIENT Log_Reject *@*@*
RECIPIENT Log_Reject *%*
RECIPIENT Log_Reject *!*
RECIPIENT Log_Reject *#*@*

and /etc/sysconfig/sendmail:
DAEMON=yes
QUEUE=1h

Posted: Wed Dec 13, 2006 8:32 pm
by kanderson
How busy is your server? Sendmail might be refusing connections if it's too busy (a load average of 12 or higher, from memory). Maybe a backup or something similar is running?
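
A rough way to check this is to compare the current load average against sendmail's two load thresholds, the QueueLA and RefuseLA options (set through `confQUEUE_LA` and `confREFUSE_LA` in sendmail.mc). The sketch below assumes defaults of 8 and 12, which is what I recall for sendmail 8.x but should be checked against your own sendmail.cf; the function name `sendmail_state` is hypothetical, and the exact comparison sendmail uses at the boundary is also an assumption.

```python
import os

# Assumed defaults; check the QueueLA / RefuseLA lines in your sendmail.cf.
QUEUE_LA = 8    # at or above this load, new mail is queued rather than delivered at once
REFUSE_LA = 12  # at or above this load, incoming SMTP connections are refused


def sendmail_state(load, queue_la=QUEUE_LA, refuse_la=REFUSE_LA):
    """Classify a 1-minute load average against the two thresholds."""
    if load >= refuse_la:
        return "refusing connections"
    if load >= queue_la:
        return "queueing only"
    return "normal"


if __name__ == "__main__":
    one_minute, _, _ = os.getloadavg()
    print("load %.2f: %s" % (one_minute, sendmail_state(one_minute)))
```

If this reports "refusing connections" at the times the maillog shows "Connection refused by [127.0.0.1]", the load threshold is the likely cause.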

Posted: Sat Dec 16, 2006 1:45 am
by souperdad
Our server is very busy, and quite frequently the load average is up in the high teens.
The server specs should be more than enough (64-bit dual core with 2 GB of RAM).

Can you change the threshold at which sendmail starts refusing connections?

I'm wondering if it would be a better idea to move the spam filtering to another server, since according to top it always seems to be spamd that is hogging the resources.

Posted: Mon Dec 18, 2006 4:54 am
by Valerion
With load averages that high your whole system will suffer. You HAVE to reduce that, otherwise your users will barely be able to access their email. I had this at a client's site and it affected Scalix very badly. Simply allowing sendmail to keep accepting mail at that load will just make the problem worse.

Ideally you should have no more than a load average of 1 per CPU. For your system I would say anything above 5 or 6 (for short periods) or 2-4 normally would raise a red flag and cause me to investigate the matter. If you can find the culprit and move it to a different machine, it would be a much better solution.
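
The "1 per CPU" rule of thumb can be checked with a few lines of Python. This is just a sketch: the function names are made up, and the figure of 18 in the usage note is only an illustrative stand-in for "high teens" on a dual-core box.

```python
import os


def load_per_cpu(load, cpus=None):
    """Divide a load average by the CPU count (detected if not given)."""
    if cpus is None:
        cpus = os.cpu_count() or 1
    return load / cpus


def is_overloaded(load, cpus=None, limit=1.0):
    """True when the per-CPU load exceeds the rule-of-thumb limit."""
    return load_per_cpu(load, cpus) > limit


if __name__ == "__main__":
    one_minute, five_minute, fifteen_minute = os.getloadavg()
    for label, value in (("1 min", one_minute),
                         ("5 min", five_minute),
                         ("15 min", fifteen_minute)):
        flag = "HIGH" if is_overloaded(value) else "ok"
        print("%-6s %.2f per CPU  %s" % (label, load_per_cpu(value), flag))
```

For example, `load_per_cpu(18, 2)` gives 9.0, so a load of 18 on two cores is nine times the suggested ceiling.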

Posted: Wed Dec 20, 2006 3:47 pm
by souperdad
When the load average is really high, top shows spamd at the top of the list nearly every time.

Same problem but load average is lower

Posted: Wed Dec 27, 2006 9:41 am
by simonwelch
We also get serious email delivery problems. It has got so bad that we instruct people to send to a Gmail account instead. The output of uptime is below, and our logs don't show "Connection refused".

RHEL 4, Dell Dual Xeon 2850, 4 GB memory.

Simon

13:31:03 up 46 days, 18:11, 1 user, load average: 1.72, 1.55, 1.52