RedHat kernel problems with LVM snapshots (RHEL4,RHEL3)
Posted: Thu Dec 01, 2005 7:30 pm
by ScalixSupport
We are aware of a number of issues with RedHat kernels relating to kernel hangs or panics when using LVM snapshots.
On FC4, using kernel-2.6.8-1.541 or above, attempting to create a snapshot generates a kernel error. This is raised as RedHat bug 132057.
As of 1st December 2005, there is no indication of any imminent fix.
There is a corresponding RHEL4 bug for the same issue: 164959.
As of 1st December 2005, a fix is expected to be available in RHEL 4 Update 3.
On RHEL3, after installing Update 6, attempts to mount a snapshot volume cause a hang. This is raised as RedHat bug 171983.
As of 1st December 2005, there is no indication of any imminent fix.
We advise the following:
- For a new installation, use RHEL3 Update 5. This is the most tried and tested RedHat version that Scalix customers are using.
- Customers who have installed RHEL3 Update 6 do not need to back anything out provided that they use the kernel that was available in RHEL 3 Update 5.
- In the case of FC4, the only alternative is to stop the Scalix server and take an offline backup.
Best regards
Scalix Support Team
Posted: Thu Feb 16, 2006 9:36 am
by Brian_K
Has anyone tested RHEL 3's latest kernel, version 2.4.21 release 37.0.1.EL, with Scalix yet? I know the kernel prior to this one caused problems at sites that use the LVM snapshot approach for their backups. I would like to update to the latest and greatest kernel, but I am worried that the same LVM issue will arise in this release as well. Any information anyone can provide on the stability of this version would be great.
Thanks.
Brian
Posted: Mon Feb 20, 2006 9:03 am
by jch
That's news to me. I know we had issues with RHEL3 update 6, but so far as I know the kernel has always been OK, even for LVM snapshots. If I'm wrong, do you have a Red Hat bugzilla number? I like to keep an eye on these things.
The RHEL4 kernel -- currently 2.6.9-22.0.2 -- does have problems with LVM snapshots though. The RHEL4 update 3 kernel, currently in beta, supposedly fixes the snapshot problem. Alas, I can't tell you positively that this is the case, since I've never been able to reproduce it and I haven't been able to persuade anyone who can to do a quick test for me :-( The final release of RHEL4u3 should be out in a few weeks, and if anyone listening can run a test, I'd love to have confirmation that the bug is fixed. If it isn't, we can report it to Red Hat so that, with luck, it gets fixed rather than having to wait until the middle of summer for the next RHEL4 update.
jch
server development
Posted: Tue Apr 25, 2006 2:08 pm
by Sneeper
Greetings!
I'm having problems with lvremove hanging in an uninterruptible sleep when trying to remove a snapshot. I'm on Fedora Core 4 with the latest RH kernel (2.6.16-1.2096_FC4smp). I just came across this thread.
Is this thread still valid for the FC4 2.6.16 Red Hat kernel? If so, is it specific to Red Hat compiled kernels (i.e. is it worth pursuing compiling my own kernel on Fedora Core 4 as a fix)? Are there other officially recommended distros or versions for using LVM snapshots with Scalix where this does work?
Finally, would it be worthwhile to just not use LVM and use rsync during an omsuspend to create a snapshot that way?
Thanks for any info.
--Sneeper
Posted: Wed Apr 26, 2006 4:01 am
by jch
I think this is a different problem. The hanging mount relates to the RHEL3 2.4-based kernel; the crash on RHEL4 relates to creating the snapshot. I know that the RHEL4 problem was fixed in RHEL4 update 3 and I believe that the RHEL3 problem was fixed in update 7 (I haven't had any positive confirmation of that though).
I'd try the previous two FC4 kernels (2.6.16-2069 and 2.6.15-1833, if memory serves) and see if either of them has the problem. I'd also check the Fedora lists and bugzilla to see if anyone else has reported it.
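When comparing kernels, it helps to confirm that lvremove really is stuck in uninterruptible sleep (state D) rather than just slow. A minimal sketch, assuming a Linux /proc filesystem; the function names are made up for illustration and are not part of Scalix or LVM:

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' rather than on whitespace.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_text), comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc, nothing to inspect
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except OSError:
            continue  # process exited (or is unreadable) since listdir
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

A process that shows up here briefly is normal disk wait; an lvremove that stays in the list across repeated runs matches the hang described above.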
You probably can't do a backup while omsuspend is running. The omsuspend will only stop system activity for at most five minutes (well, actually 299 seconds, it's a minor bug) and that isn't long enough to do a backup. omsuspend is there specifically to allow a mirror to be split or a snapshot to be taken, which should only take a few seconds (see the man page).
jch
Posted: Tue Dec 26, 2006 7:58 am
by hughesjr
I am the lead developer for CentOS-4x and I can confirm that at the EL4 update 3 level (corresponds to CentOS-4.3 and higher, RHEL4 update 3 and higher) the snapshot hang issue is completely gone.
I highly recommend starting with a snapshot to perform backups. On my test system it takes only 2 seconds to take a snapshot of a 20GB /var/opt/scalix partition.
2 seconds of omsuspend is much preferable to several minutes of downtime to rsync (or copy) the files to another spot on your disc ... and doing inconsistent backups means that you cannot do a full restore.
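For anyone scripting this, here is a rough sketch of the order of operations: suspend, snapshot, resume, then back up from the read-only snapshot at leisure. It only builds the command list and runs nothing. The suspend and resume commands are passed in by the caller because the exact omsuspend options are not given in this thread (check the man page); the wrapper script names, volume names, sizes and paths in the example are all hypothetical.

```python
import shlex


def snapshot_backup_plan(suspend_cmd, resume_cmd, origin_lv, snap_name,
                         snap_size, mount_point, backup_cmd):
    """Return the ordered commands for one snapshot-based backup."""
    # The snapshot lives in the same volume group as its origin volume.
    vg_dir = origin_lv.rsplit("/", 1)[0]
    snap_lv = f"{vg_dir}/{snap_name}"
    return [
        suspend_cmd,  # quiesce Scalix (caller-supplied omsuspend call)
        ["lvcreate", "--snapshot", "--size", snap_size,
         "--name", snap_name, origin_lv],
        resume_cmd,   # resume Scalix as soon as the snapshot exists
        ["mount", "-o", "ro", snap_lv, mount_point],
        backup_cmd,   # the slow part, done while Scalix is running
        ["umount", mount_point],
        ["lvremove", "-f", snap_lv],
    ]


if __name__ == "__main__":
    plan = snapshot_backup_plan(
        suspend_cmd=["/usr/local/sbin/scalix-suspend"],  # hypothetical wrapper
        resume_cmd=["/usr/local/sbin/scalix-resume"],    # hypothetical wrapper
        origin_lv="/dev/vg0/scalix",                     # hypothetical volume
        snap_name="scalix_snap",
        snap_size="2G",
        mount_point="/mnt/scalix_snap",
        backup_cmd=["tar", "-czf", "/backup/scalix.tar.gz",
                    "-C", "/mnt/scalix_snap", "."],
    )
    for command in plan:
        print(shlex.join(command))
```

A real script should run the resume step even if lvcreate fails, and should size the snapshot to hold the changes written while the backup runs.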