
100's of unix.in processes running

Posted: Fri Sep 28, 2007 11:29 am
by computernay
Platform:
Scalix 11.1
CentOS 5 x86_64
P4 processor
2 GB RAM
Single 320 GB SATA drive
Around 60 users, all premium

Our server has been running fine for the last 3 months, until 2 days ago. I installed an updated kernel and rebooted overnight. A number of other issues occurred, but this one remains.
After the Internet Mail Gateway has been running for a while (say, 15 minutes), it starts more and more unix.in processes until the server is completely overloaded. The load average this morning was over 400, and there were around 450 unix.in processes. If I stop the Internet Mail Gateway using 'omoff -d0 -w unix', the unix.in processes keep running, and I have to run 'killall unix.in' to make the server responsive again. The problem is also preventing incoming mail from being processed.
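For anyone who wants to watch the build-up described above, here is a minimal sketch that takes one sample of the unix.in process count. It uses only standard ps and grep; the count_procs helper and the WARN_AT threshold are names made up for illustration and are not part of Scalix.

```shell
#!/bin/sh
# Count running processes whose command name is exactly "$1".
# -F treats the name as a literal string (so the dot in unix.in is not a
# regex wildcard) and -x requires the whole line to match.
# "|| true" keeps the exit status at 0 when grep finds no match.
count_procs() {
    ps -e -o comm= | grep -c -x -F "$1" || true
}

# WARN_AT is a hypothetical threshold chosen for this example,
# not a Scalix setting.
WARN_AT=50

n=$(count_procs unix.in)
echo "unix.in processes: $n"
if [ "$n" -gt "$WARN_AT" ]; then
    echo "warning: more than $WARN_AT unix.in processes are running"
fi
```

Running this from cron or in a loop alongside 'uptime' would show whether the process count and the load average climb together.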

I need help quickly, even if it's only a pointer to what may be wrong. I will provide more info if needed.

Posted: Sun Sep 30, 2007 1:57 pm
by chris
Hi, if you need help with a guaranteed response time, you'll normally need to open a support case. Unfortunately, however, we don't support CentOS. The reason is cases like this one: we haven't tested with the CentOS build environment. Although CentOS compiles the same source as RHEL, Red Hat doesn't release its build specifications, which can create differences, particularly around kernels.

That being said, are you seeing any kind of errors generated by the unix.in processes? The unix.in process is spawned by omsmtpd in order to put incoming messages on the router queue. So it's a strange place to have processes hanging with no other outward sign of issues.

Have you tried rolling back to the previous kernel to see if that alleviates the situation?

Chris

Posted: Tue Oct 02, 2007 11:06 am
by computernay
I was able to get the issue under control using the smtpd.cfg options MAX_CLIENTS and MAX_SUBPROCS. I found these options in some old Scalix 10 documentation we had lying around. The issue isn't completely resolved, but it's limited enough that the server stays responsive.

For future reference, how do I open a support case?

To answer your questions: I couldn't see any errors being generated by the unix.in processes. I was watching the audit log and occasionally omshowlog. We did try rolling back to the previous kernel, but it made no difference. I should also note that we have SpamAssassin installed.
Another thing to note: the /etc/init.d/scalix script was not enabled ('chkconfig --list scalix' showed all runlevels off). As a result, the server was shutting down without stopping the Scalix daemons, and I had to start them manually after each boot. I'm not sure whether this was my fault (something I missed in the installation notes?) or whether the installer didn't enable it.
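In case it helps someone else, the init-script check can be sketched as below. It assumes a Red Hat style system where chkconfig is available and the service is named scalix, as on this server; the service_enabled helper is a hypothetical name used only for this example.

```shell
#!/bin/sh
# Succeeds if a line of `chkconfig --list` output shows at least one
# runlevel switched on, for example:
#   scalix  0:off 1:off 2:on 3:on 4:on 5:on 6:off
service_enabled() {
    printf '%s\n' "$1" | grep -q ':on'
}

# An empty result (chkconfig missing or service unknown) counts as
# "not enabled".
line=$(chkconfig --list scalix 2>/dev/null || true)

if service_enabled "$line"; then
    echo "scalix is enabled at boot"
else
    echo "scalix is not enabled at boot; as root, run: chkconfig scalix on"
fi
```

After 'chkconfig scalix on', running 'chkconfig --list scalix' again should show the default runlevels as on, so the daemons are started at boot and stopped at shutdown.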

We are going to install a RAID 1 array just for the Scalix data and see if it makes a difference.