Heavy IMAP Usage is Killing the Server

roastpork
Posts: 34
Joined: Thu Mar 23, 2006 1:55 am

Postby roastpork » Sat Oct 14, 2006 12:03 am

OK, I'll chime in with another: "happens here too"

Our problem surfaced recently when a user went on vacation and tried to use SWA. I checked the server and saw results (logs, I/O waits, etc.) similar to what has been posted in this thread. The in.imap41d process goes nuts whenever the user switches between messages in SWA.

In our case we are running in VMware Server on a Windows 2003 box. I noticed a mention earlier in the thread of a customer with a VMware setup. Are there VMware-specific changes I can try?

I can provide more details about our environment, but barring a VMware problem I think we're experiencing the same issue as everyone else in this thread.

Server: P4 3.2 GHz, 2 GB RAM, two 250 GB SATA drives in a hardware mirror
VMware: 768 MB RAM allocated to the session, but memory isn't the problem.
Users: 12, 8 on-site via Outlook-Scalix, 4 off-site POP3.

I know this box is short on disk I/O, but I have to work with what I have.

The user we are seeing this with has a fairly large number of messages (thousands), perhaps hundreds of them in the Inbox.

I am planning on rolling this out to another client with much more demanding message stores (and better server disk I/O), but I will have to postpone the migration or look at another product, as this doesn't really look hardware-related. Hasn't this shown up repeatedly with moderately demanding users?

-Mike

ScalixSupport
Scalix
Posts: 5503
Joined: Thu Mar 25, 2004 8:15 pm

Postby ScalixSupport » Sat Oct 14, 2006 12:14 am

VMware and Scalix don't mix well. The problem is that you are virtualizing an IO-bound application, and by giving it little memory you are doing the caching no favor. You should either increase the memory available to the subsystem (at least 1 GB) or change the disk subsystem setup to RAID 1+0. SATA drives in particular are not a good choice; they are not server grade. You get what you pay for in the end.

Cheers,

Sascha.

roastpork
Posts: 34
Joined: Thu Mar 23, 2006 1:55 am

Postby roastpork » Sat Oct 14, 2006 12:39 am

Thanks for the quick response. I understand, and I will evaluate hardware improvements while keeping an eye on this thread as well.

-Mike

paintbuoy
Posts: 14
Joined: Tue Oct 10, 2006 9:54 pm

Postby paintbuoy » Mon Oct 16, 2006 5:46 am

ScalixSupport wrote:SATA drives in particular are not a good choice

This got me thinking about the server that is having the Session Monitor issues due to excessive thread usage. It uses SATA drives with Suse 10.0, LVM and ReiserFS running on top.
I did some searching and found a number of performance issues related to SATA drives and the 2.6.13 kernel (used in Suse 10) and degraded ReiserFS performance over time.

To ensure I was not judging Scalix on the performance of other components, I compiled and installed the 2.6.16 kernel used in Suse 10.1 and reformatted the server's partition as ext3 (a move helped in part by Novell announcing that the default filesystem from Suse 10.2 onward would be ext3).

The performance of Scalix on the new kernel and different filesystem appears to be significantly improved. During OSX Mail synchronise tasks the number of concurrent user threads is reduced. Since the kernel/filesystem change I have not been able to reproduce the Session Monitor errors which is a good sign though I will keep a close eye on things as time progresses.

Here is what I've learnt from the process:
1. Using the 2.6.16+ kernel with SATA drives offers speed improvements (on this specific hardware at least)
2. The ext3 filesystem and Scalix appear to play nicer together than reiser3.
3. The number of concurrent OSX Mail connections would seem to be related to the quantity of user email folders (the more folders in the email account the more concurrent connections).

Thanks for the assistance during this process. If I encounter this error in the future, I will post my logs here.

P.S. I would still be very pleased to see the concurrent IMAP thread limit raised (or turned into an adjustable variable) in a future version of Scalix. Plus, while you are at it, please allow the LDAP server port to be changed so that it doesn't conflict with existing services; it is the only other thing that annoys me about Scalix :D :D

paintbuoy
Posts: 14
Joined: Tue Oct 10, 2006 9:54 pm

Postby paintbuoy » Mon Oct 16, 2006 7:28 pm

I managed to break the Session Monitor by accidentally logging in two OSX Mail sessions under the same username on different computers. The Session Monitor refuses connections from the affected user but still allows access by other users and via Webmail/SAC.
Below is the log:

scalix:/ # omshowlog -F 11:30

ERROR                          IMAP Server Da(IMAP Server Pr) 10.17.06 12:03:22
[OM 24070] Debug message for Lab use :
imapSatAuthenticate:Could not register with Session Monitor.
User Name: David Harrison / scalix, stress-free/CN=David Harrison


ERROR                          IMAP Server Da(IMAP Server Pr) 10.17.06 12:04:39
[OM 24070] Debug message for Lab use :
imapSatAuthenticate:Could not register with Session Monitor.
User Name: David Harrison / scalix, stress-free/CN=David Harrison


Restarting the IMAP daemon corrected the issue without interrupting other users:

# stop the IMAP service, then start it again only after omoff has returned
omoff -d 0 -a IMAP && omon IMAP

dkelly
Scalix
Posts: 593
Joined: Thu Mar 18, 2004 2:03 pm

Postby dkelly » Mon Oct 16, 2006 8:27 pm

paintbuoy wrote:Plus while you are at it please allow the LDAP server port to be changed so that it doesn't conflict with existing services, it is the only other thing that annoys me about Scalix :D :D


Please take a look at ~scalix/sys/slapd.conf, in particular the portNum setting.

If you make this change, you'll also need to edit ubermanager.query.server.port in /etc/opt/scalix/caa/scalix.res/config/ubermanager.properties
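
For anyone scripting the second edit, here is a minimal Ruby sketch that rewrites the ubermanager.query.server.port entry. It assumes the file uses plain Java-properties `key=value` lines; the script and helper names are hypothetical, and the portNum change in slapd.conf is still made by hand since its exact syntax isn't shown here. Back up both files before trying it.

```ruby
# Hypothetical sketch: point ubermanager at a new LDAP port.
# Assumes the .properties file uses plain "key=value" lines.
PROPS = "/etc/opt/scalix/caa/scalix.res/config/ubermanager.properties"
PORT_KEY = "ubermanager.query.server.port"

# Return +text+ with +key+ set to +value+, appending the key if it is missing.
def set_property(text, key, value)
  # [ \t]* rather than \s* so a match never runs onto the next line
  pattern = /^([ \t]*#{Regexp.escape(key)}[ \t]*=[ \t]*).*$/
  if text.match?(pattern)
    text.sub(pattern) { "#{Regexp.last_match(1)}#{value}" }
  else
    sep = (text.empty? || text.end_with?("\n")) ? "" : "\n"
    "#{text}#{sep}#{key}=#{value}\n"
  end
end

# Usage: ruby set_ldap_port.rb NEW_PORT  (use the same value as portNum)
if __FILE__ == $PROGRAM_NAME && ARGV.length == 1
  new_port = Integer(ARGV[0])
  original = File.read(PROPS)
  File.write("#{PROPS}.bak", original) # keep a backup of the original file
  File.write(PROPS, set_property(original, PORT_KEY, new_port))
end
```

Only the first matching line is changed, and commented-out lines are left alone because the pattern requires the key at the start of the line.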

Cheers

Dave

paintbuoy
Posts: 14
Joined: Tue Oct 10, 2006 9:54 pm

Postby paintbuoy » Mon Oct 16, 2006 8:35 pm

Brilliant! I had searched the forums and found a post saying that the slapd daemon port could be changed, but that some of the external services (I believe Webmail) were hard-coded to use port 389. I will try this out this evening to see if it works.

Derek
Posts: 169
Joined: Fri Mar 24, 2006 4:53 pm

Postby Derek » Mon Jul 09, 2007 9:42 am

florian wrote:When deleting stuff from the "temp" directory, you should be careful to leave stuff under temp/mime_cache intact - this is what is used by the IMAP server for rendering Scalix messages into mime. The cache will be automatically aged by the omscan command as part of monthly message store maintenance.

Out of curiosity - are the majority of these files found in temp or in temp/mime_cache?

Thx,
Florian.


Something is cleaning out my temp/mime_cache automatically, and I can't figure out what or why. I'm thinking this may be the reason IMAP/SWA is so ungodly slow. What is doing it? I've checked ommaint: it isn't doing anything, and it runs at most once an hour. I am seeing files drop out of temp/mime_cache at intervals of about a minute, not hours.

I found the configuration option MIME_CACHE_TARGET_SIZE and see that it defaults to 1 (1 MB). This seems far too small for an organization of almost any size. The description says the cache can grow larger, but if temp/mime_cache is routinely emptied to try to reach that size, it isn't going to do much good. I think this explains why, when I log into webmail, I get an error, and when I log in again it comes right up: my data is now cached. But if I log back into webmail a couple of hours later, it is slow again.
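
To check whether the cache really is being trimmed toward that target, a small Ruby sketch can total the size of temp/mime_cache and compare it with MIME_CACHE_TARGET_SIZE. The directory path is passed in because the full location of temp/ isn't given here, the value is read as megabytes (taken as 1024 × 1024 bytes, an assumption), and the script and helper names are hypothetical.

```ruby
require "find"

# Hypothetical sketch: size of the MIME cache versus its configured target.
# Sum the sizes of regular files under +dir+, recursively.
def dir_bytes(dir)
  total = 0
  Find.find(dir) do |path|
    # File.size? returns nil if the file vanished mid-scan (or is empty)
    total += (File.size?(path) || 0) if File.file?(path)
  end
  total
end

# Compare against a target given in megabytes (1 MB taken as 1024 * 1024 bytes).
def cache_report(dir, target_mb)
  bytes = dir_bytes(dir)
  target = target_mb * 1024 * 1024
  { bytes: bytes, target_bytes: target, over_target: bytes > target }
end

# Usage: ruby mime_cache_size.rb PATH/TO/temp/mime_cache [TARGET_MB]
# Run it every minute (cron or a shell loop) to see when files drop out.
if __FILE__ == $PROGRAM_NAME && ARGV.length.between?(1, 2)
  r = cache_report(ARGV[0], Integer(ARGV.fetch(1, "1")))
  puts "#{Time.now} #{r[:bytes]} bytes (target #{r[:target_bytes]}), over target: #{r[:over_target]}"
end
```

If the total keeps snapping back to roughly the target size shortly after it grows, that would be consistent with the cache being aged toward MIME_CACHE_TARGET_SIZE rather than by ommaint.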

Anyone have any comments about this? I am desperate to get IMAP/SWA running/loading faster.

kanderson

Postby kanderson » Sun Mar 30, 2008 3:44 pm

Note for everyone.

The line I recommended earlier in this thread has been found to cause Java IO errors when it deletes files the IMAP cache needs, and it will delete them.

Please be aware of that, and remove that line if you have it in place now.

Thanks
Kev.

thecowster
Posts: 62
Joined: Fri Oct 12, 2007 8:48 am

Yet another "happens here too"

Postby thecowster » Tue Apr 22, 2008 8:16 am

Hi there,

We're setting up Scalix but have not gone into production yet. We thought the system was rock solid, but then we ran an internal test and found some concerning behavior. One aspect of this behavior is 100% CPU usage by an in.imap41d process.

The good news is that I can reproduce this at will. I have an email which I simply drag and drop with an IMAP client into a Scalix account, and bang, off goes the CPU. I say this is good news because it should make the problem easier to locate and solve.

I actually have a number of these emails now, which I have been collecting over time. Oddly, I don't see a pattern in them, so I cannot say empirically what is causing the problem. Also, the sample set I have collected these from is far, far smaller than the total mail of our company, which is due to be migrated to Scalix in the near future. So expect to see more of these if we proceed.

If anyone would like to see a sample "email of death", just let me know. But be aware that saving it in an email client or forwarding it appears to filter out whatever makes the mail fatal, so it's best transferred via an IMAP append.

Edit: I should mention that with an IMAP log level of 15 I see 50 MB of logs per minute when this happens, but it's all just one looooong call stack. It appears that a Scalix 11.3 IMAP process can get stuck in an infinite loop. It's not a tight loop, though, as at least one file appears to be opened and closed during the 100% CPU usage. This continues indefinitely until I kill the process (with kill -9).

Cheers

PS: we're using Scalix 11.3, with an enterprise license.

Valerion
Scalix Star
Posts: 2730
Joined: Thu Feb 26, 2004 7:40 am
Location: Johannesburg, South Africa

Postby Valerion » Thu Apr 24, 2008 8:04 am

The best way to approach this is to get in contact with Scalix Support. You should have received a number of incident points with your purchase in any event, and for true bugs (which this sounds like, offhand) those will likely be refunded. They can then log an issue with the developers and hopefully get this sorted out.

thecowster
Posts: 62
Joined: Fri Oct 12, 2007 8:48 am

Postby thecowster » Thu Apr 24, 2008 8:20 am

Sadly, the incident points given with a new purchase expire after 3 months, and we didn't see these issues until after that. I did get a response from Scalix, however, so I mailed them some more details. That was a couple of days ago, and I haven't heard anything since. In the meantime I've found another way to kill the server: just try to connect with KOrganizer using GroupDAV, and wham, an IMAP process shoots up to 100% CPU.

seanyseansean
Posts: 29
Joined: Wed Apr 09, 2008 9:05 am

Postby seanyseansean » Mon Apr 28, 2008 4:26 am

Hi, I'm a colleague of thecowster above.

In addition to his post - we have tested the hardware fully this weekend (without Scalix running) and there are no throughput problems with the Infortrend RAID system or the NICs.

We'd really appreciate some guidance here. We have debugged as far as we can, and I'd imagine a single email killing the server is something Scalix would be interested in solving.

thecowster
Posts: 62
Joined: Fri Oct 12, 2007 8:48 am

Postby thecowster » Tue Apr 29, 2008 7:03 am

I have saved the mail as a text file, and written a ruby script to append it to a mail server account via IMAP (simulating what an IMAP mail client does). When you run the script, it kills Scalix.
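
The original script wasn't posted, so here is a minimal Ruby sketch of the same idea using Net::IMAP#append, which sends an IMAP APPEND command. The environment variable names (IMAP_HOST, IMAP_USER, IMAP_PASS, IMAP_MAILBOX), plain IMAP on port 143 without TLS, and the helper names are all assumptions; only point it at a test account.

```ruby
# Hypothetical sketch of an IMAP "append" uploader, similar in spirit to the
# script described above (the original was not posted).
# Connection settings come from assumed environment variables.

# IMAP expects CRLF line endings, but a message saved as a text file on Unix
# usually has bare LF, so convert before uploading.
def to_crlf(raw)
  raw.gsub(/\r?\n/, "\r\n")
end

def append_message(raw, host:, user:, password:, mailbox: "INBOX", port: 143)
  require "net/imap" # loaded lazily so to_crlf works without the gem
  imap = Net::IMAP.new(host, port: port) # plain IMAP, no TLS (assumption)
  begin
    imap.login(user, password)
    imap.append(mailbox, to_crlf(raw)) # sends an IMAP APPEND command
  ensure
    begin
      imap.logout
    rescue StandardError
      nil # the server may already have dropped the connection
    end
    imap.disconnect unless imap.disconnected?
  end
end

# Usage: IMAP_HOST=... IMAP_USER=... IMAP_PASS=... ruby imap_append.rb message.eml
if __FILE__ == $PROGRAM_NAME && ARGV.length == 1
  append_message(File.binread(ARGV[0]),
                 host: ENV.fetch("IMAP_HOST"),
                 user: ENV.fetch("IMAP_USER"),
                 password: ENV.fetch("IMAP_PASS"),
                 mailbox: ENV.fetch("IMAP_MAILBOX", "INBOX"))
end
```

The file is read with File.binread so the bytes are otherwise passed through untouched. If the exact line endings could be part of what triggers the problem, drop the to_crlf call and compare the results.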

If you would like a copy of the "killer" email and the upload script, just pm me. This is just a mail I received in the course of business in the last few years, and does not represent any malicious intent to harm Scalix ;-)

Note that the script is only useful for getting the email back into a mail account. You could use it to put the mail into a non-Scalix system, then simply use drag and drop in an IMAP mail client to copy it to a Scalix account (i.e., you don't need the script to kill Scalix; you only need the email, available on a mail server).

Cheers
Andy

Valerion
Scalix Star
Posts: 2730
Joined: Thu Feb 26, 2004 7:40 am
Location: Johannesburg, South Africa

Postby Valerion » Thu May 08, 2008 7:00 am

This needs to be passed to the developers. The best route to do this is via Scalix Support, though you can also post it in bugzilla.scalix.com if you can recreate it in a clean mailstore.