Mailbox Corruption?


kluss0
Posts: 118
Joined: Sat Jan 07, 2006 1:40 pm

Postby kluss0 » Tue Mar 28, 2006 8:31 am

The problem seems to be spreading. This morning two more users got "Bad Magic Number in Index" errors and couldn't log in to their e-mail. Does anyone know what could be causing this?

Thanks,
Kenny

ScalixSupport
Scalix
Posts: 5503
Joined: Thu Mar 25, 2004 8:15 pm

Postby ScalixSupport » Tue Mar 28, 2006 8:43 am

Hi,

Where are your messages stored, what is the underlying storage system, and what is in /var/log/messages, dmesg, etc.? You seem to have a file system problem.

Cheers,

Sascha.
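
A minimal sketch of the log check suggested above, for readers who want a starting point. The function name and the pattern list are assumptions made for illustration; they cover a few common kernel messages for disk, controller, and ext3 trouble and are not an exhaustive catalogue.

```shell
#!/bin/sh
# Patterns that often accompany disk, controller, or ext3 trouble in kernel logs.
# This list is an illustrative assumption, not an exhaustive catalogue.
STORAGE_ERROR_PATTERN='i/o error|ext3-fs error|aborting journal|medium error|3w-9xxx.*error'

# scan_storage_errors [FILE...]: print matching lines from the given files, or
# from standard input when no file is given. Exit status is grep's (1 = no match).
scan_storage_errors() {
    grep -i -E "$STORAGE_ERROR_PATTERN" "$@"
}
```

Typical use would be "scan_storage_errors /var/log/messages" or "dmesg | scan_storage_errors". An empty result only means none of these patterns appeared; it does not prove the storage is healthy.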

kluss0
Posts: 118
Joined: Sat Jan 07, 2006 1:40 pm

Postby kluss0 » Tue Mar 28, 2006 9:32 am

Hi Sascha,

I am running RHEL ES4 on a 900GB RAID 1+0 system (6x300GB drives) using EXT3. I have looked at messages, dmesg, etc., and there is nothing there that would indicate a problem with the SCSI card, the drives, or the FS. I just brought the system down and forced an fsck on reboot, so that might show something.

The message store uses all of the default paths from installation. FSCK just finished with no errors or warnings. The fs seems to be fine.

Thanks,
Kenny

ScalixSupport
Scalix
Posts: 5503
Joined: Thu Mar 25, 2004 8:15 pm

Postby ScalixSupport » Tue Mar 28, 2006 9:42 am

So this is a block device and not a remote NFS mount?

kluss0
Posts: 118
Joined: Sat Jan 07, 2006 1:40 pm

Postby kluss0 » Tue Mar 28, 2006 10:02 am

No, this is not an NFS mount. I was originally going to do that (mount the storage space from a NetApp), but I was told in the pre-sales process that I couldn't do that. The storage is all local SATA drives (I mistakenly said SCSI earlier) on a 3Ware 9500S.

kluss0
Posts: 118
Joined: Sat Jan 07, 2006 1:40 pm

Postby kluss0 » Mon Apr 03, 2006 7:46 am

Another mailbox has just become corrupted with the same error as the first one:
[root@postal logs]# omtidyu -B -u "Himansu Sahu" -M
A fatal error has occurred - see the system error log
Content Record has not been upgraded to current container format.
This command is not allowed in this context
This command is not allowed in this context
This command is not allowed in this context
This command is not allowed in this context
This command is not allowed in this context
This command is not allowed in this context


When I open the container in omcontain, everything looks fine. I just don't know how to fix this other than deleting the mailbox and restoring it. Any ideas?

Thanks,
Kenny

ScalixSupport
Scalix
Posts: 5503
Joined: Thu Mar 25, 2004 8:15 pm

Postby ScalixSupport » Mon Apr 03, 2006 10:31 am

You are not by any chance an ex-Samsung customer with users still using their old Contact SPI or something like that?

Sascha.

kluss0
Posts: 118
Joined: Sat Jan 07, 2006 1:40 pm

Postby kluss0 » Mon Apr 03, 2006 10:45 am

No, we never used Samsung Contact. We are an ex-Exchange shop, but we never had Exchange in-house. Our mail was outsourced until Nov. 2005, when I brought it in-house with Scalix Enterprise. Everyone was accessing their mail via POP3. I converted everyone to Scalix v9, then upgraded to v10 last month. Everything was fine until about a week ago, when all of these really strange mailbox corruptions started to happen.

I have had some users that end up with 20 or so folders called "Calendar", multiple "Tasks" folders, but their mail is still functioning. Some users get the "Bad Magic Number" error, and two have had the "Content Record has not been upgraded to current container format." error. If it weren't such a critical system, I would be tempted to blow the whole thing away and start from scratch...

Thanks,
Kenny

florian
Scalix
Posts: 3852
Joined: Fri Dec 24, 2004 8:16 am
Location: Frankfurt, Germany

Postby florian » Mon Apr 03, 2006 10:49 am

Hi Kenny,

This sounds like you're a Scalix customer; if that's the case, I believe you should open an official support case through support@scalix.com.

This sounds environment-specific (maybe hardware/filesystem corruption involved) and I believe it's too risky to just rely on the forum for this.

Cheers,
Florian.
Florian von Kurnatowski, Die Harder!

kluss0
Posts: 118
Joined: Sat Jan 07, 2006 1:40 pm

Postby kluss0 » Mon Apr 03, 2006 10:58 am

Hi Florian,

Yes, I do need to open an official support request at this point. I like to post things to the forum so that if anyone else in the community runs into something similar, they can find the answers here. The forums/community, to me, were one of the main selling points for Scalix, so I want to contribute as much information as possible back.

Thanks,
Kenny

florian
Scalix
Posts: 3852
Joined: Fri Dec 24, 2004 8:16 am
Location: Frankfurt, Germany

Postby florian » Mon Apr 03, 2006 11:01 am

Kenny,

Absolutely, I agree, and both I and the community thank you for that. I believe that if the resolution found through support offers any value to the rest of the readers, you can certainly post it here.

:-) Great attitude, thanks again,
Florian.
Florian von Kurnatowski, Die Harder!

Stephen
Posts: 17
Joined: Wed Feb 23, 2005 11:48 am
Location: Dallas Texas

What was the resolution?

Postby Stephen » Fri Jul 21, 2006 7:32 pm

We're seeing this "Content Record has not been upgraded to current container format." error too. It is exactly the same thing as in the early part of this thread. What was the resolution?
Stephen Eaton, EMCom
stephen@getemcom.com
www.getemcom.com
Certified Scalix Professional
Certified Scalix Instructor

kluss0
Posts: 118
Joined: Sat Jan 07, 2006 1:40 pm

Postby kluss0 » Sat Jul 22, 2006 7:52 pm

Unfortunately, there is no resolution yet. I have had a support ticket open for quite some time, and the Scalix support staff has been working extremely hard to figure this out. We thought that we had it narrowed down to a configuration error that was causing the MIME data to be constantly rebuilt, but then the corruption continued to happen.

Every time it happens, I have to delete the account, re-create it, then import the user's data from backups using omcpinu. The strange thing is that I usually cannot delete the account outright. I have to delete it, wait for an error stating that the user cannot be deleted, then remove the user entry from the directory with omdelent.

There is apparently a way to work around the problem by zeroing out the index of the corrupted message. However, this procedure is far beyond my expertise and so far has only been performed by a member of the Scalix escalation group.

FYI,
Kenny

kluss0
Posts: 118
Joined: Sat Jan 07, 2006 1:40 pm

Postby kluss0 » Wed Oct 04, 2006 1:02 pm

I finally have an apparent cause of this problem, and something of a solution.

The problem is caused by this bug: http://bugzilla.scalix.com/show_bug.cgi?id=12624 . Basically, under several sets of circumstances, pending deletes are not cleaned up, so when omoff is used to shut down imap and rci, the containers are left open and the processes are killed while there are still pending deletes. This causes the mailstore to become corrupt, producing either a "Bad magic number in header" error or a "Content Record has not been upgraded to current container format." error. I have been told that this bug is fixed in v11.

The solution: don't use omoff. I converted my server to use LVM and take a snapshot backup to another server. On the other server, I bring up Scalix and run omcpoutu individually for each user. This works great, as long as you have an up-to-date version of Red Hat (or any other distribution). If you are using RHEL, you *MUST* have "Nahant Update 3" or later. I don't know what you need for other distributions. The problem with older releases is a bug in the device-mapper code that would crash the system when using an LVM snapshot (http://www.scalix.com/community/viewtop ... hlight=lvm).

I could have just waited until v11 becomes available, but I chose to go down the LVM path for the simple reason that not only does it give me a backup solution, but it also gives me a warm-spare server should anything happen to my main server.
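
For anyone wanting to try the same approach, here is a rough sketch of the snapshot-and-copy step described above. It is not Kenny's actual script: the volume group, logical volume, snapshot size, mount point, destination host and path are all hypothetical placeholders, and rsync is an assumed transfer tool that the post does not name. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of a snapshot-based copy of the mail store to a second server.
# Every name below (volume group, LV, snapshot size, mount point, host, path)
# is a hypothetical placeholder; rsync is an assumed transfer tool.
VG=${VG:-vg0}
LV=${LV:-scalix}
SNAP=${SNAP:-scalix_snap}
SNAP_SIZE=${SNAP_SIZE:-10G}
MNT=${MNT:-/mnt/scalix_snap}
DEST=${DEST:-backuphost:/var/opt/scalix/}

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

snapshot_backup() {
    # Create a point-in-time snapshot of the live volume; Scalix keeps running.
    run lvcreate --snapshot --size "$SNAP_SIZE" --name "$SNAP" "/dev/$VG/$LV" || return 1
    run mkdir -p "$MNT"
    if run mount -o ro "/dev/$VG/$SNAP" "$MNT"; then
        # Copy the frozen view of the mail store to the second server.
        run rsync -a --delete "$MNT/" "$DEST"
        status=$?
        run umount "$MNT"
    else
        status=1
    fi
    # Always drop the snapshot so it cannot fill up and become invalid.
    run lvremove -f "/dev/$VG/$SNAP"
    return "$status"
}
```

Running "DRY_RUN=1 snapshot_backup" first shows exactly what would be executed. The per-user export on the second server is then done with omcpoutu as described in the post; its arguments are deliberately left out here and should be taken from the Scalix documentation.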

tbarber

Postby tbarber » Tue Oct 17, 2006 2:21 pm

Has anyone found a solution for error 3541?

