Post by egadinc » Tue Nov 14, 2006 5:32 pm
Sorry for not posting the resolution earlier.
Once the server was running properly, it became a case of out of sight, out of mind.
Overall, it wasn't entirely clear what all the problems were. It appears that minor changes to timing made the problems go away.
Here's the summary of the situation as I got it from SCALIX SUPPORT:
-----------------------------------------------------------------------------------------------------------
I believe what resolved the smtpd "Partially Aborted" problem was twofold.
First, getting your /etc/mail/local-host-names file configured with your domain so that mail to invalid/unknown addresses would be rejected rather than accepted by sendmail.
Second, bringing your greet pause down from between 5 and 10 seconds to 0 seconds. This was done by editing your /etc/sendmail.cf file and changing this section:
######################################################################
### greet_pause: lookup pause time before 220 greeting
###
### Parameters:
### $1: {client_name}
### $2: {client_addr}
######################################################################
SLocal_greet_pause
Sgreet_pause
R$* $: <$1><?> $| $>"Local_greet_pause" $1
R<$*><?> $| $#$* $#$2
R<$*><?> $| $* $: $1
R$+ $| $+ $: $>D < $1 > <?> <! GreetPause> < $2 >
R $| $+ $: $>A < $1 > <?> <! GreetPause> <> empty client_name
R<?> <$+> $: $>A < $1 > <?> <! GreetPause> <> no: another lookup
R<?> <$*> $# 0000
R<$* <TMPF>> <$*> $@
R<$+> <$*> $# $1
so that this line has all zeros (0 second greet pause) in it:
R<?> <$*> $# 0000
Once those were done, I don't believe smtpd ever went "partially aborted" again.
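As an illustration only (this is not from the support ticket), here is a minimal Python sketch for confirming which default greet pause a sendmail.cf actually contains. It assumes the default rule has the `R<?> <$*> $# NNNN` shape shown above and that the number is in milliseconds, which is how sendmail's greet_pause feature expresses it. The function name `default_greet_pause_ms` is made up for this example.

```python
import re

# Matches the default rule of the greet_pause ruleset, for example:
#   R<?> <$*>   $# 5000
# The number is assumed to be the pause in milliseconds.
DEFAULT_RULE = re.compile(r"^R<\?>\s*<\$\*>\s+\$#\s*(\d+)\s*$")


def default_greet_pause_ms(cf_text):
    """Return the default greet pause in ms, or None if no such rule is found."""
    in_ruleset = False
    for line in cf_text.splitlines():
        if line.startswith("S"):
            # A ruleset definition line; track whether it is greet_pause.
            in_ruleset = line.strip() == "Sgreet_pause"
            continue
        if in_ruleset:
            match = DEFAULT_RULE.match(line)
            if match:
                return int(match.group(1))
    return None
```

To use it, read the file named in the message, e.g. `default_greet_pause_ms(open("/etc/sendmail.cf").read())`. A result of 0 means no pause before the 220 greeting.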
From there, the time was spent parsing your smtpd.log file and determining why the PortCheck program kept timing out on your smtp socket.
At present, your timeout value is set to 180 seconds in ~root/PortCheck.pl and we may be able to bring it down to 120 seconds. I'd like to leave it at 180 seconds for another week.
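The contents of PortCheck.pl are not shown here, so the following is only a rough Python sketch of the same idea under that assumption: connect to the SMTP port and measure how long the greeting takes to arrive. The function name `time_smtp_banner` and its defaults are hypothetical; the 180-second value simply mirrors the timeout mentioned above.

```python
import socket
import time


def time_smtp_banner(host, port=25, timeout=180.0):
    """Connect and wait for the SMTP greeting; return (banner, seconds).

    banner is None if the connection failed or nothing arrived in time.
    The timeout applies to the connect and to the read separately.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            data = conn.recv(512)
    except OSError:
        # Covers refused connections and timeouts alike.
        return None, time.monotonic() - start
    banner = data.decode("ascii", errors="replace").strip() or None
    return banner, time.monotonic() - start
```

Logging the elapsed seconds from repeated runs would show how close the greeting gets to the timeout before lowering it from 180 to 120.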
As to why there is a need for an extended timeout period, it appears to be network and/or DNS related.
The part of the code that's being processed when there were delays is the section that does a reverse lookup on the inbound connection in order to match the value that the incoming host provides with the HELO command with what the address *really* is.
The reason we can't determine if it's network performance or DNS related is because all we can see is that we're waiting for a response to the DNS lookup.
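One way to collect some evidence, sketched here in Python as an illustration rather than as what smtpd does internally, is to time reverse lookups from the server itself. The function name `time_reverse_lookup` is hypothetical, and the sketch uses the system resolver, so it still cannot separate a slow DNS server from a slow network path to it.

```python
import socket
import time


def time_reverse_lookup(ip_address):
    """Return (hostname or None, seconds) for a reverse lookup of ip_address."""
    start = time.monotonic()
    try:
        hostname = socket.gethostbyaddr(ip_address)[0]
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses.
        hostname = None
    return hostname, time.monotonic() - start
```

Running this against the addresses of a few connecting hosts during a slow period, and again during a quiet one, gives numbers to compare.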
If I were to guess, I would say it's 20% DNS related and 80% network related. The reason I say that is because I've been logged onto your server via SSH and at times it's very non-responsive.
If I look at "top" during that time, your system isn't being heavily taxed from a memory or CPU point of view, so the performance hit is likely coming from the network. You would really have to have some type of network traffic analyzer hooked to your Internet connection for a couple of weeks to really prove this out.
---------------------------------------------------------------------------------------------------------
Hope this helps.
After this, I moved to openSUSE 10.0 and ran with Scalix 10. This worked well. Unfortunately, I never found a convenient way to copy over the list of Accounts, so it was a manual data entry process.
Since then, I've moved on to openSUSE 10.1 and installed Scalix 11 Beta 1.
I was updating crashed server hardware, and since I didn't want to go back to openSUSE 10.0 and then forward to openSUSE 10.1, I took my chances with Scalix 11 Beta 1.
Once again, no obvious way to export Accounts and import them, so it was back to retyping all the Accounts in SAC.
This move has been bad for the SMTP problem. No matter what I do, I've been unable to fix the problem that the original suggestions had solved.
RESULT: Haven't had outgoing email (SMTP) since moving to Beta 1. Also, incoming mail seems to hang during the night (every night), and simply stopping and restarting Scalix doesn't help. So every morning at 7:00am I manually restart the whole server. Incoming mail is good until the next morning.
I've created a new server, hoping to go to Beta 2, and have this problem fixed.
Unfortunately, once again, I have not been able to copy out the Accounts and import them into the new server without botching the whole Scalix 11 Beta 2 install.
In a few hours, since I'm going to be out of town for a week, I'm going to be doing the following:
a) Uninstall Scalix 11 Beta 2
b) Reinstall Scalix 11 Beta 2, with no attempt at copying data files.
c) Retype all the accounts into SAC, again.
d) Install ommaint, and set it to go every 30 minutes.
e) Add a cron job to restart the entire server at 7:00am every morning.
f) Pray a lot...
g) Hope none of my mail users notice the flaky behavior.