Bart Schaefer
2009-May-06 10:32 UTC
[CentOS] Strange problem with filesystem changes reverting on reboot
Our sysadmin was doing midnight work on moving some hardware to new power outlets. We'd recently done a CentOS 5.3 install on one of those machines and then "yum install" with the centosplus kernel and some rpmforge packages. It had been up and running fine for at least two weeks in that configuration. He sent this message:

On reboot the root file system seems to have reverted to the previous startup -- no CentOS plus, no Dag repository info in /etc/yum.repos.d, no xfs, and therefore no /var/lib/mysql. This is at least the second time we've experienced this phenomenon ... suffice to say I am really really suspicious about ext3 now.

The previous time this occurred was quite some time ago, probably soon after the CentOS 5.1 release -- we'd written it off as pilot error of some kind. The root is not an LVM, but it is on a software RAID -- my suspicion leans more toward a RAID issue than ext3.

Does any of this sound familiar to anyone?
Alexx
2009-May-06 11:18 UTC
[CentOS] Strange problem with filesystem changes reverting on reboot
Bart Schaefer wrote:
> Does any of this sound familiar to anyone?

Yep, it does. I run a cluster with gLite middleware on Scientific Linux 3.x (a "flavour" of RHEL 3). Its "/" was on software RAID1, and the same thing you described happened several times. After a reboot (not necessarily the first one), just about everything in the file system reverted to the previous state, just like after a fresh install. *Poof.*

Currently I'm using Scientific Linux 4.7 on LVM in the mix with software RAID1. So far so good. I didn't have time to experiment with software RAID1 after the stuff disappeared. The machine had to be installed fast and put into operation, so I just abandoned software RAID, and I have no clue what the cause was or how the issue could be solved.

I'll be glad if someone could shed some light on this.

Cheers, Alexei Altuhov.
Les Mikesell
2009-May-06 12:47 UTC
[CentOS] Strange problem with filesystem changes reverting on reboot
Bart Schaefer wrote:
> On reboot the root file system seems to have reverted to the previous startup -- no CentOS plus, no Dag repository info in /etc/yum.repos.d, no xfs, and therefore no /var/lib/mysql.

The only way I can even imagine that happening would be on a RAID1 where the mirrors were not in sync when you made the changes, so they only happened on one drive. There are reasonably common circumstances that cause this, so you should always check with 'cat /proc/mdstat' to be sure both mirrors are active.

Then, the more unlikely part is that when you rebooted, the previously active mirror was not recognized and the previously idle mirror became active instead. Again, 'cat /proc/mdstat' would have shown the problem - and if that was it, unless the drives had started to sync in the wrong direction, the quick fix would have been to simply remove the drive with the old contents, forcing the other one to be used.

One possibility here would be that the partition type on the drive that didn't join the RAID at reboot was not set to 'FD' for autodetect. And the one that had the old contents might have had some error that caused it to be kicked out of the set earlier - the system does seem to be very sensitive about that, whereas if there is only one drive it will do many more retries.

--
Les Mikesell lesmikesell at gmail.com
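A minimal sketch of automating the 'cat /proc/mdstat' check suggested above. It assumes the usual md status layout, where the line after each array header carries a counter such as [2/1] (members expected / members active). The function name, the sample text, and the device names are made up for illustration and do not come from anyone's actual system in this thread.

```python
import re

# Hypothetical sample in the usual /proc/mdstat layout (illustrative only).
SAMPLE_MDSTAT = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      78043648 blocks [2/1] [U_]

unused devices: <none>
"""


def find_degraded_arrays(mdstat_text):
    """Return {array: (expected, active)} for arrays missing a member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        # Array header, e.g. "md1 : active raid1 sda2[0]"
        header = re.match(r"^(md\S+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status line, e.g. "78043648 blocks [2/1] [U_]"
        status = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded[current] = (expected, active)
            current = None
    return degraded


for name, (expected, active) in find_degraded_arrays(SAMPLE_MDSTAT).items():
    print(f"{name}: {active} of {expected} members active")
# prints "md1: 1 of 2 members active"
```

On a live machine the same function could be fed the contents of /proc/mdstat instead of the sample string. An empty result only means every array reports all of its members; it says nothing about which mirror holds the newer data.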
jacob at aers.ca
2009-May-06 17:40 UTC
[CentOS] Strange problem with filesystem changes reverting on reboot
I've had similar issues with certain brands of "hardware" RAID controllers. It was as if the mirror didn't sync and it picked a random drive on boot to be the master. So the changes happened, just not to both drives, and on reboot the drive without the changes might become master and everything would seem to have vanished. I fixed it by disabling the hardware RAID and going with a pure software RAID 1.