
Scalix server slow with High IO/Wait

Posted: Tue Dec 11, 2007 10:38 am
by jskubal
We have been running 10.0.5 for quite some time without any problems. Recently I have seen the IO/wait on the server jump into the 96% to 98% range. This goes on for up to a couple of hours, and then it frees up without any apparent cause. The server runs so slowly while this is happening that our users assume that the server has crashed.

During this time the Local Delivery queue grows. I have shut off the incoming mail relay, and watched the local queue process at the rate of less than 1 message every 5 minutes.

Also, I have tried stopping the Local Delivery service and the omscan server (because I saw it running). Nothing seems to help. I have gotten it down to just the "ual.remote" and "in.imap41d" tasks running, each with a "D" status.

Also, in the "top" command the load average normally runs around 0.02 to 0.05. During one of these incidents the load average spikes to 20 to 30 (in one incident it went to 130).
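As a footnote on the load-average numbers: on Linux, processes stuck in uninterruptible "D" sleep are counted in the load average even though they use no CPU, so heavy I/O wait by itself can push the load into the tens or hundreds. A simple way to watch for this while an incident is in progress (standard tools only, nothing Scalix-specific):

Code:

# Sample every 5 seconds; the "b" column counts processes blocked in
# uninterruptible (D) sleep, and bi/bo show blocks read/written per second
vmstat 5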

I would appreciate any help you can give me; I can't think of anything else to try.

Posted: Tue Dec 11, 2007 4:14 pm
by mikevl
How many users do you have?
How many messages a day (approx) does the Scalix server service?
How is your disk subsystem configured?

Mike

Posted: Tue Dec 11, 2007 4:45 pm
by jskubal
Hi Mike,
We have about 75 active users; normally I see anywhere from 50 to 65 users connected at peak times of the day.

Our mail activity is less than 2,000 emails per day (in and out), excluding spam. Spam accounts for another 8,000 to 10,000 incoming messages each day.

I have a 466 GB ST3500630A disk for /var/opt/, a 373 GB HDS724040KLAT80 disk for /bak/, and a 38 GB ST340014A disk for the operating system (RHEL3).

John

Posted: Tue Dec 11, 2007 5:36 pm
by mikevl
Hi John

It is my guess that your disk subsystem is not optimal for your requirements.
A quick search of ST3500630A shows that to be an ATA 100 interface. Scalix relies on good I/O; it is really Scalix's only hard requirement. It doesn't need much CPU or much memory, but I/O is king. With that many users and that many messages, I would think the minimum specification you should use is SATA2. You could use RAID1 if you want, but RAID1+0 gives the best performance.

SAS drives in the above configurations give the best performance of all but are very expensive. SATA should do fine.
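One quick check worth making on the existing setup, as a sketch only (assuming the Scalix data lives on the IDE drive /dev/hdd mentioned later in the thread): confirm that the drive is actually running with DMA enabled and see which UDMA mode it has negotiated, since an IDE disk that has dropped back to PIO or a low UDMA mode will show exactly this kind of I/O wait.

Code:

# Drive identification; the active UDMA mode is marked with an asterisk
hdparm -i /dev/hdd

# Current DMA setting for the drive (using_dma = 1 means DMA is on)
hdparm -d /dev/hdd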

Mike

Posted: Tue Dec 11, 2007 5:43 pm
by jskubal
Mike,
My last post got me thinking, and I think this might be relevant.

I normally see only a few messages in the local delivery queue at any one time. But when one of these slow-downs happens I have always found a large number of spam emails that appear to have arrived in a short time. Typically I might get between 150 and 800 spam emails in a matter of a few minutes.

When the server is running properly I've watched the spam messages being delivered to the filter mailbox and have seen it process as many as 30 messages per minute. But during a slow-down I might see a single message take up to 5 minutes to deliver.

Could it be that processing a large number of emails for a single mailbox causes the local delivery service to lock up, or possibly to begin to thrash?

Given your comments in your last post: does the disk access requirement increase in a non-linear fashion with higher message volume for a given mailbox?

John

Posted: Tue Dec 11, 2007 5:52 pm
by mikevl
Hi John

The other option is to filter your spam at the gateway. There are many products that do this, both commercial and open source. IPCop is one that others have had success with, but there are many to choose from.

Mike

Posted: Wed Dec 12, 2007 3:49 am
by Valerion
Your problem here is probably disk thrashing. Under high load (lots of incoming emails) each new email needs to be written to disk as it enters the queues for the first time. If the load is very high, you will probably end up with an I/O subsystem that cannot handle both the incoming mail and the disk pressure from the SR/Archiver/LD/Internet mail gateway trying to process all of it. I think it would be worthwhile to do at least some filtering before mail reaches the server, to alleviate the load, or to re-spec your disks to keep up.

You can (if you have more than one core) try to increase the number of workers on a queue, resulting in faster processing of the queue, but that will increase the CPU usage, memory usage and disk I/O all at once. One worker should easily keep up with that kind of load.

Posted: Tue Feb 12, 2008 3:57 pm
by jskubal
I have a filter installed in a qmail server that eliminates the 8,000 to 10,000 daily spam messages. In the last 4 days the total messages processed through the Scalix Local Delivery queue is down to 8,652 (or around 2,200 messages per day).

Using the sar command, I see the following when I run my daily backup.

Code:

04:30:01 AM       CPU     %user     %nice   %system   %iowait     %idle
04:40:01 AM       all      1.26      0.00      0.55      0.04     98.15
04:50:01 AM       all      1.68      0.01      0.90      1.93     95.48
05:00:01 AM       all      1.64      0.02      1.02      0.88     96.43
05:10:04 AM       all      7.21      0.00     13.37     79.42      0.00
05:20:02 AM       all      3.50      0.01     64.77     31.73      0.00
05:30:02 AM       all      4.66      0.00     58.61     36.73      0.00
05:40:02 AM       all      6.22      0.00     73.89     19.88      0.00
05:50:03 AM       all      5.98      0.00     77.48     16.54      0.00
06:00:03 AM       all      8.97      0.01     78.63     12.39      0.00
06:10:02 AM       all      6.23      0.01     83.12     10.64      0.00
06:20:02 AM       all      5.92      0.07     10.64     83.36      0.01


Notice that when the IOwait increases at 5:10, the %system value also increases. This is about what I would expect to see in this situation. I think the IOwait is a little high, but it doesn't seem to be causing a problem.
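Since the overall %iowait figure doesn't say which device is waiting, it may also be worth pulling per-device numbers. A minimal sketch (the sar -A output later in this thread already includes per-device counters such as dev22-64, and iostat ships in the same sysstat package, assuming the installed version supports the -x option):

Code:

# Per-device transfer and sector rates from the data sar has already collected
sar -d

# Extended per-device statistics sampled live, every 5 seconds for 60 samples
iostat -x 5 60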


When I look at the sar output later in the day, when one of the slow-down events happens, I see:

Code:

07:40:01 AM       CPU     %user     %nice   %system   %iowait     %idle
07:50:01 AM       all      1.43      0.01       0.41      0.32     97.83
08:00:01 AM       all      1.18      0.01       0.30      0.10     98.40
08:10:01 AM       all      8.30      0.01       2.44     89.26      0.00
08:20:02 AM       all      2.76      0.02       2.91     94.31      0.00
08:30:01 AM       all      2.56      0.01       2.72     94.71      0.00


Notice how the IOwait jumps, but all other activity remains low. At 8:00 AM the number of messages in the Local Delivery queue was 2. Over the next hour the queue grew to 56 messages. Also, during this time the Total Messages Processed count for the Local Delivery queue did not change. I don't believe this is a problem with incoming message volume; all the system did with the incoming messages over that hour was post them to the queue.

I should mention that 8:00 AM is when most of our users are just arriving. The Remote Client Interface count (using /etc/init.d/scalix status) was 35 at 8:00 AM. No new users could log in during the slow-down, which continued for about an hour.
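As a cross-check on the connection counts during one of these hangs, the established client sessions can also be counted directly with netstat (a rough sketch; port 143 covers the in.imap41d sessions, and the port would need adjusting for other client protocols):

Code:

# Count established IMAP (port 143) connections to this server
netstat -tn | grep ':143 ' | grep ESTABLISHED | wc -l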

Even when the system is running fairly well the iowait is high, but even under these conditions the server can deliver messages as fast as they come in. In fact, I very rarely see a message in the queue waiting for delivery. Here is an example of a good-sized load on the system (75 users, 200 messages per hour):

Code:

02:30:01 PM       CPU     %user     %nice   %system   %iowait     %idle
02:40:01 PM       all      5.11      0.32      5.58     60.17     28.82
02:50:01 PM       all      5.41      0.20      5.72     43.41     45.27
03:00:01 PM       all      4.71      0.18      4.60     27.46     63.05
03:10:01 PM       all      6.15      0.23      5.53     16.82     71.27
03:20:01 PM       all      4.86      0.29      8.07     24.84     61.94

Also, I ran the hdparm -tT /dev/hdd command while there were no users on the system and got the following:

Timing cached reads: 2180 MB in 2.00 seconds = 1089.08 MB/sec
Timing buffered disk reads: 212 MB in 3.01 seconds = 70.35 MB/sec

Under moderate / normal load it looks like this:
Timing cached reads: 2112 MB in 2.00 seconds = 1054.05 MB/sec
Timing buffered disk reads: 204 MB in 3.02 seconds = 67.63 MB/sec

Running the same command during the slow-down gave me this:
Timing cached reads: 1974 MB in 2.00 seconds = 987.16 MB/sec
Timing buffered disk reads: 4 MB in 3.93 seconds = 1.02 MB/sec

At the same time, the "top" command shows a ual.remote task and a couple of in.imap41d tasks (all with a status of "D"). The only thing with an "R" status is the top task itself.

These slow-down events typically last from an hour to an hour-and-a-half. During that time the system seems to seize-up. Nothing gets processed, and then all of a sudden everything is back to normal, and the messages that had queued up over the last hour process through in a matter of seconds.
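One way to see where those "D" tasks are actually stuck is to log their kernel wait channel while an event is in progress. A small sketch using standard ps fields (assuming the stock ps supports the stat, wchan and comm format keywords):

Code:

# Every 10 seconds, list processes in uninterruptible ("D") sleep together
# with the kernel function they are blocked in (WCHAN)
while true; do
    date
    ps -eo stat,pid,wchan,comm | grep '^D'
    sleep 10
done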

I understand the comments about thrashing disk, but I don't think that theory explains this behavior.

Is it at all possible that this is related to the size of our users' mail folders? Looking at the admin screen I see that over 50 of our user mailboxes are under 2 GB. I have 14 users between 2 GB and 5 GB, 4 users between 5 and 8 GB, 1 user at 8.2 GB, 1 user at 12.9 GB, and 1 user at 17.3 GB.

Posted: Wed Feb 13, 2008 3:33 am
by Valerion
Offhand I can't think of anything that would explain your behaviour. You may need to have Scalix engineers take a look to see where the bottlenecks are. The figures you quoted seem wrong to me; you should get more consistent behaviour.

Could it be that your RAM is getting exhausted at times, possibly due to a large load on Tomcat? That would both shrink your disk cache and cause your machine to swap, increasing your disk IO.

I have seen slowdowns when maintenance apps are run (especially omscan), or when the indexer gets VERY busy. The newer Scalix releases do schedule omscan for off-peak times, though.

When it seems to hang, are there any processes in top that seem to take up lots of CPU time?

Posted: Wed Feb 13, 2008 11:57 am
by netcomrade
Your hdparm results are probably not giving you a good idea.
Run some dd tests (for at least 5-10 mins).
We've had performance increase dramatically when we moved to SCSI disks that could do 50 MB/s sustained writes.
However, I do agree that no mail server should generate this much I/O.
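A rough sketch of the kind of sustained dd test meant here (the file name and size are placeholders; point it at a scratch file on the Scalix data disk and run it off-peak, making the file large enough that the page cache doesn't hide the real disk rate):

Code:

# Sustained sequential write: 4 GB in 1 MB blocks to a scratch file on the data disk
time dd if=/dev/zero of=/var/opt/ddtest.tmp bs=1024k count=4096
sync

# Sustained sequential read of the same file
time dd if=/var/opt/ddtest.tmp of=/dev/null bs=1024k
rm -f /var/opt/ddtest.tmp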

-a

Posted: Wed Feb 13, 2008 3:39 pm
by jskubal
I agree, the data seems inconsistent. To try to make sense of this I used the "sar -A" command to compare before, during, and after an event, to see if I could find any correlations.

The other day the slow-down started around 8:50 AM and lasted for almost an hour. Responsiveness of the system was about the same before and after the event, except that after the event there was more load on the system (more users and more messages per minute).
Here is the output from: sar -A

Code:

              proc/s
08:00:01 AM      0.62
09:00:01 AM      0.58
10:00:01 AM      2.35
Average:         0.48
There is a weak inverse correlation with processes created per second, although the system was running slower, so I would expect fewer new processes to be created.

Code:

12:00:01 AM   cswch/s
08:00:01 AM    307.03
09:00:01 AM    442.54
10:00:01 AM    587.26
Average:       406.33
No correlation with context switches per second. This seems to rule out excessive context switching as the cause.

Code:

12:00:01 AM       CPU     %user     %nice   %system   %iowait     %idle
08:00:01 AM       all      3.36      0.04      2.10     12.66     81.84
09:00:01 AM       all      3.37      0.06      3.14     93.42      0.00
10:00:01 AM       all      6.03      0.21      5.51     32.16     56.09
Average:          all      3.09      0.06      4.36     21.87     70.61
Notice that the %user, %nice, and %system don't change much from 8 AM to 9 AM. But while the overall load is heavier at 10 AM than at 9 AM, the performance of the system is better at 10 AM.

Code:

12:00:01 AM       INTR       intr/s
08:00:01 AM       sum   1214.47
09:00:01 AM       sum   1191.98
10:00:01 AM       sum   1315.75
Average:          sum   1154.81
I believe this shows interrupts per second. Again there is a weak inverse correlation, which doesn't make sense unless it is simply because the system was running slower, so fewer interrupts were processed per second.

Code:

12:00:01 AM  pswpin/s pswpout/s
08:00:01 AM      0.02      0.00
09:00:01 AM      0.03      0.00
10:00:01 AM      0.00      0.00
Average:         0.30      0.47
Slightly elevated page swap-ins per second during the event, but well below the average, which again would indicate that page swapping is not the cause.

Code:

12:00:01 AM   frmpg/s   bufpg/s   campg/s
08:00:01 AM      6.31      3.95    -13.75
09:00:01 AM    -12.30     -5.77     18.11
10:00:01 AM      0.16      2.44     -3.08
Average:         -0.94     -1.09      0.75
Notice that during the event the freed-memory-pages-per-second figure was negative (meaning an additional 12.3 pages per second were being allocated during the event). During the event we also released buffer pages and increased cache pages. I'm not sure what this means, but it is the strongest correlation so far.


Code:

12:00:01 AM     IFACE      rxpck/s   txpck/s    rxbyt/s    txbyt/s    rxcmp/s    txcmp/s  rxmcst/s
08:00:01 AM        lo         1.25      1.25     132.10       132.10      0.00      0.00      0.00
08:00:01 AM      eth0        82.03    139.84     5637.85   193314.43      0.00      0.00      0.00
08:00:01 AM      sit0         0.00      0.00       0.00         0.00      0.00      0.00      0.00

09:00:01 AM        lo        11.02     11.02   13834.38     13834.38      0.00      0.00      0.00
09:00:01 AM      eth0        21.72     31.04    5234.00     31502.83       0.00      0.00      0.00
09:00:01 AM      sit0         0.00      0.00       0.00         0.00      0.00      0.00      0.00

10:00:01 AM        lo         7.58      7.58   21201.44     21201.44      0.00      0.00      0.00
10:00:01 AM      eth0       113.38    147.01   45511.81    166212.41      0.00      0.00      0.00
10:00:01 AM      sit0         0.00      0.00       0.00         0.00      0.00      0.00      0.00

Average:           lo         4.20      4.20     6681.51      6681.51      0.00      0.00      0.00
Average:         eth0        37.64     51.42     4596.90     67603.34       0.00      0.00      0.00
Average:         sit0         0.00      0.00        0.00         0.00       0.00      0.00      0.00
There is no correlation with network activity.

Code:

12:00:01 AM       DEV       tps    rd_sec/s    wr_sec/s
08:00:01 AM  dev22-64      86.81    792.27      199.41
09:00:01 AM  dev22-64     141.51   2450.55     443.91
10:00:01 AM  dev22-64     137.23   1664.80     881.61
Average:     dev22-64      70.07   1436.01     274.58
The transfers per second and read sectors per second are high during the event, but the writes are lower, so there is at least some correlation with disk activity. The interesting thing is that all of this read activity results in very little process activity (no local deliveries were taking place, and there was no client responsiveness).


Code:

12:00:01 AM  pgpgin/s   pgpgout/s     fault/s    majflt/s
08:00:01 AM     399.87       266.19     435.14      0.96
09:00:01 AM   1238.93       953.53     464.32      0.09
10:00:01 AM     837.37       551.36   1108.04      0.04
Average:        840.55        330.31    319.63      0.16
Again, a higher number of page-ins corresponds with the event. But the major faults per second are low, indicating that the system is not having to load many memory pages in from disk.

Code:

12:00:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
08:00:01 AM    20532  1502592    98.65    89716   1013896   4001440     95124      2.32     17428
09:00:01 AM      3400  1519724    99.78    51832    944604   4003456     93108      2.27     15876
10:00:01 AM      4908  1518216    99.68   102692    818640   4003748     92816      2.27     15904
Average:        92411  1430713    93.93   208877    745044   4029299     67265      1.64     10276
I think the only thing I can conclude from this is that the overall memory utilization is high. But I have noticed that when the server is rebooted it starts with about 750 MB free. Over time the free memory drops to the levels we see here; free memory seems to vary between about 3,000 and 25,000 KB. I have not seen any difference in memory utilization during an event.
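For what it is worth, the kbcached column above shows that most of the "used" memory is page cache rather than process memory, and on Linux free memory will normally trend toward zero after a reboot as the cache fills. A quick way to see the split and to confirm whether swapping is happening (standard tools only):

Code:

# The "-/+ buffers/cache" line shows memory use excluding buffers and page cache
free -m

# Ten 5-second samples; the si/so columns show pages swapped in/out per second
vmstat 5 10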

Code:

12:00:01 AM dentunusd   file-sz  inode-sz  super-sz %super-sz  dquot-sz %dquot-sz  rtsig-sz %rtsig-sz
08:00:01 AM      4154     11160      5261         0      0.00         0      0.00         0      0.00
09:00:01 AM      1379     14190      4971         0      0.00         0      0.00         0      0.00
10:00:01 AM      4903     14925      6417         0      0.00         0      0.00         0      0.00
Average:        19341     10458     17254         0      0.00         0      0.00         0      0.00
Here the number of unused directory cache entries drops during the event, but it is not an unreasonable number.

Code:

12:00:01 AM    totsck    tcpsck    udpsck    rawsck   ip-frag
08:00:01 AM       202        74        10         0         0
09:00:01 AM       314       124        10         0         0
10:00:01 AM       358       145        10         0         0
Average:          180        69        10         0         0
Nothing unusual in the number of sockets in use.

Code:

12:00:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
08:00:01 AM         1       207      0.42      0.26      0.15
09:00:01 AM         0       262      8.09      9.64     10.63
10:00:01 AM         0       282      0.48      0.77      1.99
Average:            1       197      1.67      1.69      1.66
This is interesting: during the event there are no processes waiting for run time, and the number of processes in the process list is not unreasonable. The load average is one of the key indicators that there is a problem. I have seen the iowait get up into the sixties and seventies and we still get reasonable response times, unless the load average starts to rise above 2 or 3.

So I saw high page-in rates and high disk-read rates, and there are a number of indications that paging is going on, but nothing to indicate that this is related to the problem. Maybe I just don't see the implications of these statistics, but I can't find any meaningful correlation between the data and the timing of the event.

You also mentioned Tomcat memory usage. I have long suspected that it may have something to do with the problem. Right now the java task is using 26.3% of memory, and I have "niced" its priority down to 12. I think the next time we get a slow-down I will try stopping Tomcat to see if that has any effect.
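For reference, a rough sketch of the two experiments described above. The renice value and PID are placeholders, and the scalix-tomcat init script name is an assumption about a standard Scalix install; substitute whatever Tomcat init script is actually in use:

Code:

# Lower the priority of the running java (Tomcat) process; replace <pid> with the PID shown in top
renice 12 -p <pid>

# During a slow-down, stop Tomcat to see whether the I/O wait drops, then restart it
/etc/init.d/scalix-tomcat stop
/etc/init.d/scalix-tomcat start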

Posted: Wed Feb 13, 2008 4:36 pm
by jskubal
Sorry, Valerion, I didn't address a couple of your comments.
Valerion wrote: I have seen slowdowns when maintenance apps are run (especially omscan), or when the indexer gets VERY busy. The newer Scalix releases do schedule omscan for off-peak times, though.


Early on in this problem I saw the omscan server running and thought it might be contributing to the problem. I ran an experiment where I monitored system performance while omscan was running. I found that it consumed most of the available CPU time, but it didn't seem to hurt our client responsiveness. I let the service run until a slow-down event started, then stopped the omscan service, but that had no effect. I then left the service off, and a few hours later we had another slow-down event.

Valerion wrote: When it seems to hang, are there any processes in top that seem to take up lots of CPU time?


During an event, I usually see anywhere from 1 to 4 user-related processes (in.imap41d or ual.remote). Occasionally I will see a kjournald task, an omctmon task, or a mime.browse task. In all, there will be 3 to 5 tasks, each using 0.3% to 1.5% of memory, and the CPU% for each one is always less than 1%. The status column on all of the Scalix tasks is "D". Once in a while a system task like kjournald will have a status of "S". About the only task that is ever running (status = "R") is the top command itself.

Posted: Thu Feb 14, 2008 3:56 am
by Valerion
This is very strange. I would suggest having someone local who is knowledgeable about such things take a look at your machine.

Posted: Sat Feb 16, 2008 1:28 am
by mikevl
Hi

Still... running 75 active users on an IDE hard disk is a big ask!

Is all of your spam being removed by the gateway now, or is some of it still being handled by the Scalix server?

I noticed in one of the sar reports that when iowait was high, CPU idle was 0. Something is certainly working hard.

What is in mailq?


Mike v