SMTP stops periodically - again..

Postby **ScalixSupport** » Thu Apr 13, 2006 5:32 pm

Hi Peter,

Using telnet, I found that your address resolves quickly, while non-Scalix entries (e.g. root) and bogus addresses resolve slowly. On your end, however, the latter two resolve quickly. I then tried resolving another address (sxadmin), and that is very slow as well. So, there may be something going on with your directory.

Can you do an:

omsearch -e cn=*|wc -l

and post how long it took and how many entries it found?

Thanks,
Rachel
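
For anyone repeating this test, the elapsed time and the entry count can be captured together. The following is only a minimal Python sketch, not a Scalix tool: `time_command` is a hypothetical helper, and the only thing taken from the thread is the `omsearch -e cn=*` command itself, which is assumed to be on the PATH.

```python
import subprocess
import time


def time_command(argv):
    """Run argv and return (elapsed_seconds, number_of_output_lines)."""
    start = time.monotonic()
    # No shell is involved, so "cn=*" is passed literally and never globbed.
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    elapsed = time.monotonic() - start
    return elapsed, len(result.stdout.splitlines())


# Example use on a Scalix server (assumes omsearch is installed):
# elapsed, count = time_command(["omsearch", "-e", "cn=*"])
# print(f"{count} entries in {elapsed:.2f} s")
```

Running it several times in a row shows whether the lookup time is consistent or only occasionally slow.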

Postby **peterz** » Thu Apr 13, 2006 5:46 pm

Rachel:

Here's the omsearch output as requested. Each result was immediate after hitting the Return key at the end of the command.

------------------------------------------------------------------------------------------

mail:~ # omsearch -e cn=*|wc -l
57
mail:~ # omsearch -e cn=*|wc -l
57
mail:~ # omsearch -e cn=*|wc -l
57
mail:~ # omsearch -e cn=*|wc -l
57
mail:~ #

------------------------------------------------------------------------------------------

I did this about 20 times to make sure it was consistent.

Regards,

- Peter

Postby **ScalixSupport** » Thu Apr 13, 2006 6:01 pm

Hi Peter,

I think at this point, we're just spinning our wheels. We really need to get access to your server, because what we're seeing when telnetting to port 25 is very different from what you're seeing on your server. It's not the Scalix directory, it's not sendmail and it's not the SMTP Relay.

Two things to keep in mind about purchasing a Support Incident. First, if the problem turns out to be a bug, it won't use up one of your incidents. Second, we can always circle back here and post the results so others can see the resolution.

Thanks,
Rachel

Postby **peterz** » Thu Apr 13, 2006 6:09 pm

Rachel:

So, what's the cost of this support incident? If it's reasonable, let's proceed.

Regards,

-Peter

Postby **ScalixSupport** » Thu Apr 13, 2006 6:12 pm

Hi Peter,

I've emailed you the pricing info and order form.

Thanks,
Rachel

Postby **peterz** » Thu Apr 13, 2006 6:22 pm

Rachel:

I've filled in the form and emailed it to you.
A little steep for a single incident, but I need to resolve this.

Regards,

- Peter

Postby **btisdall** » Thu Apr 20, 2006 2:33 pm

If both parties are happy to divulge the details, I'm curious as to what the problem was and how it was resolved.

Regards,

Ben Tisdall
RedCircle IT Ltd
London

Postby **peterz** » Thu Apr 20, 2006 2:46 pm

I will gladly provide an update as soon as it becomes clear what the problem was.

Rachel, from Support, seems to have reduced the problem significantly.
She's been doing a terrific job of tracking this problem down and tweaking the system.

However, the story seems to be that it's a 'sendmail' setup problem, and this is not covered by the Scalix automatic installation. Evidently there is a misunderstanding between Scalix and the Community about what is and is not offered in this email package.

Anyway, since I'm paying for this Support Incident, I will be posting every detail of this issue and resolution here and thereby share my $300.00 experience with everyone.

-------------------------------------------------------------------------------------------------------
General status: the sudden aborted SMTP server incidents appear to have just about disappeared over the last 5 days. There have been a few errors reported by 'ommaint' that might be part of the problem.

My strongest suggestion at this point is that the 'ommaint' utility is not merely optional but a mandatory component of running Scalix. Unfortunately, it's not part of the automatic installation; it has to be installed manually and set up in crontab. (i.e., some Linux skills are necessary to run this email server!)

Regards,

- Peter

Postby **btisdall** » Fri Apr 21, 2006 5:38 am

Glad to hear things are improving.

Regards,

Ben Tisdall
RedCircle IT Ltd
London

Postby **PeyotePipesmoker** » Fri Apr 28, 2006 12:46 pm

Greetings everyone,

I am currently running Red Hat ES Ver 4 with Scalix 10. I too am getting SMTP Relay failures on a constant basis. I am running approximately the same number of users as peterz, with some using POP3 and others MAPI. We have an Exchange cluster and introduced Scalix into our mail system. Up until now the only problem has been SMTP Relay partially aborting. Any help is much appreciated!!

Regards,
Ryan W.

Postby **ScalixSupport** » Fri Apr 28, 2006 2:38 pm

Hi Ryan,

Given that Peter paid for Support, I'll leave it up to him to post specifics on his setup and what we've done to resolve the problem. So, let's look at your site individually. Can you provide some details on your system: how much memory, what type of network connection, DNS configuration, etc.? Have you enabled SMTP debug logging? If so, do you have some snippets from the log you can post? Have you made sure your /etc/mail/local-host-names file is correct?

Thanks,
Rachel
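
A quick way to sanity-check the local-host-names file is to confirm that the mail domain is actually listed in it. The following is a rough Python sketch under stated assumptions: the file is treated as one host or domain name per line with `#` comments, and both helper names (`parse_local_host_names`, `is_local_domain`) and the `example.com` domain are hypothetical, not part of sendmail or Scalix.

```python
def parse_local_host_names(text):
    """Return the set of lowercase host/domain names listed in the file text."""
    names = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if entry:
            names.add(entry.lower())
    return names


def is_local_domain(domain, text):
    """True if domain appears in the parsed list (case-insensitive)."""
    return domain.lower() in parse_local_host_names(text)


# Example use (path taken from the post above; the domain is a placeholder):
# with open("/etc/mail/local-host-names") as fh:
#     print(is_local_domain("example.com", fh.read()))
```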

Postby **souperdad** » Tue Nov 14, 2006 5:03 pm

I'm having the same issue. Has the issue been resolved for Peter? I haven't seen any resolution posted.

Postby **egadinc** » Tue Nov 14, 2006 5:32 pm

Sorry for not posting the resolution earlier.

Once the server was running properly, it became an issue of out of sight, out of mind.

Overall, it wasn't entirely clear what all the problems were. It appears that minor changes to timing made the problems go away.

Here's the summation of the situation as I got from SCALIX SUPPORT:

-----------------------------------------------------------------------------------------------------------

I believe what resolved the smtpd "Partially Aborted" problem was twofold.

First, getting your /etc/mail/local-host-names file configured with your domain so that mail to invalid/unknown addresses would be rejected rather than accepted by sendmail.

Second, bringing your greet pause down from between 5 and 10 seconds to 0 seconds. This was done by editing your /etc/sendmail.cf file and changing this section:

######################################################################
### greet_pause: lookup pause time before 220 greeting
###
### Parameters:
### $1: {client_name}
### $2: {client_addr}
######################################################################
SLocal_greet_pause
Sgreet_pause
R$* $: <$1><?> $| $>"Local_greet_pause" $1
R<$*><?> $| $#$* $#$2
R<$*><?> $| $* $: $1
R$+ $| $+ $: $>D < $1 > <?> <! GreetPause> < $2 >
R $| $+ $: $>A < $1 > <?> <! GreetPause> <> empty client_name
R<?> <$+> $: $>A < $1 > <?> <! GreetPause> <> no: another lookup
R<?> <$*> $# 0000
R<$* <TMPF>> <$*> $@
R<$+> <$*> $# $1

so that this line has all zeros (a 0-second greet pause) in it:

R<?> <$*> $# 0000

Once those were done, I don't believe smtpd ever went "partially aborted" again.

From there, the time was spent parsing your smtpd.log file and determining why the PortCheck program kept timing out on your smtp socket.

At present, your timeout value is set to 180 seconds in ~root/PortCheck.pl and we may be able to bring it down to 120 seconds. I'd like to leave it at 180 seconds for another week.
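
The contents of ~root/PortCheck.pl are not shown in this thread, so the following is not that script. It is only a minimal Python sketch of the same kind of check, assuming the check connects to the SMTP port and waits for the "220" greeting. The function name `check_smtp_greeting` is hypothetical; the 180-second default mirrors the timeout mentioned above.

```python
import socket
import time


def check_smtp_greeting(host, port=25, timeout=180.0):
    """Connect and wait for the SMTP greeting.

    Returns (ok, elapsed_seconds, first_line_or_error). ok is True only if
    the server's first line starts with "220". Note that the timeout applies
    to each socket operation, not to the check as a whole.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = b""
            while b"\n" not in data:
                chunk = sock.recv(1024)
                if not chunk:  # server closed the connection early
                    break
                data += chunk
    except OSError as exc:  # covers timeouts and refused connections
        return False, time.monotonic() - start, str(exc)
    line = data.split(b"\n", 1)[0].decode("ascii", "replace").strip()
    return line.startswith("220"), time.monotonic() - start, line


# Example use against the local server:
# ok, elapsed, line = check_smtp_greeting("localhost", 25, timeout=180)
# print(ok, f"{elapsed:.1f}s", line)
```

Logging the elapsed time on every run, rather than only pass or fail, makes it easier to judge whether a lower timeout such as 120 seconds would be safe.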

As to why there is a need for an extended timeout period, it appears to be network and/or DNS related.

The part of the code that's being processed when there were delays is the section that does a reverse lookup on the inbound connection in order to match the value that the incoming host provides with the HELO command with what the address *really* is.

The reason we can't determine if it's network performance or DNS related is because all we can see is that we're waiting for a response to the DNS lookup.
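
One way to get more visibility is to time the reverse lookup on its own, outside of the mail path. The following is a minimal Python sketch rather than anything taken from the Scalix code: `timed_reverse_lookup` and `helo_matches` are hypothetical helpers, and the comparison against the HELO name is a simplified assumption about what the check does.

```python
import socket
import time


def timed_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname_or_None, elapsed_seconds) for a reverse lookup of ip."""
    start = time.monotonic()
    try:
        hostname = resolver(ip)[0]
    except OSError:  # socket.herror / socket.gaierror: no PTR record or lookup failed
        hostname = None
    return hostname, time.monotonic() - start


def helo_matches(helo_name, ip, resolver=socket.gethostbyaddr):
    """Compare the name given with HELO to the reverse-resolved name."""
    hostname, elapsed = timed_reverse_lookup(ip, resolver)
    match = (
        hostname is not None
        and hostname.rstrip(".").lower() == helo_name.rstrip(".").lower()
    )
    return match, elapsed


# Example use with a connecting client's address (placeholder values):
# print(helo_matches("mail.example.com", "192.0.2.1"))
```

If lookups like this are consistently slow for many different addresses while other network traffic is fine, that points more toward DNS; if they are slow only when the whole connection is sluggish, the network is the likelier cause.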

If I were to guess, I would say it's 20% DNS related and 80% network related. The reason I say that is because I've been logged onto your server via SSH and at times it's very non-responsive.

If I look at "top" during that time, your system isn't being heavily taxed from a memory or CPU point of view, so the performance hit is likely coming from the network. You would really have to have some type of network traffic analyzer hooked to your Internet connection for a couple of weeks to really prove this out.

---------------------------------------------------------------------------------------------------------

Hope this helps.

After this, I moved to openSUSE 10.0 and ran with Scalix 10. This worked well. Unfortunately, I never found a convenient way to copy over the list of Accounts, so it was a manual data entry process.

Since then, I've moved on to openSUSE 10.1 and installed Scalix 11 Beta 1.

I was replacing crashed server hardware, and since I didn't want to go back to openSUSE 10.0 and then forward to openSUSE 10.1, I took my chances with Scalix 11 Beta 1.

Once again, no obvious way to export Accounts and import them, so it was back to retyping all the Accounts in SAC.

This move has been bad for the SMTP problem. No matter what I do, I've been unable to fix it, even with the suggestions that solved the original problem.

RESULT: I haven't had outgoing email (SMTP) since moving to Beta 1. Also, incoming mail seems to hang during the night (every night), and simply stopping and starting Scalix doesn't help. So, every morning at 7:00am I manually restart the whole server. Incoming mail is then good until the next morning.

I've created a new server, hoping to go to Beta 2, and have this problem fixed.

Unfortunately, once again, I have not been able to copy out the Accounts and import them into the new server without botching the whole Scalix 11 Beta 2 install.

In a few hours, since I'm going to be out of town for a week, I'm going to be doing the following:

a) Uninstall Scalix 11 Beta 2
b) Reinstall Scalix 11 Beta 2, with no attempt at copying data files.
c) Retype in all the accounts into SAC, again.
d) Install ommaint, and set it to go every 30 minutes.
e) Add a cron to restart the entire server at 7:00am every morning.
f) Pray a lot...
g) Hope none of my mail users notice the flaky behavior.

Postby **benmclendon** » Fri Oct 24, 2008 11:49 am

I'm sorry to say that things are not a lot better since your posts..

We were running 11.0 on RHEL 4 and having similar issues. We upgraded to 11.4.2 hoping for some bug fixes... and it's been downhill from there. We spent a lot of time staging a fresh build on CentOS 4 with 11.4.2 and restoring our store.... got the same result.

I am working aggressively on quotes for a new Exchange system to replace Scalix... that is, when I'm not babysitting the Scalix server, killing PIDs, restarting services, or rebooting the P.O.S.
