Alexey Loukianov
2006-Dec-04 22:07 UTC
[CentOS] Strange issue with device-mapper lib in CentOS
Hello all,

A couple of weeks ago I installed CentOS 4.2 on a very old server: an MSI server board based on the Intel 440GX chipset, two 500 MHz Xeons and 1 GB of RAM. The storage controller is an AMI MegaRAID 467, which causes some trouble during installation, because the stock CentOS 4 install and production kernels don't ship the older megaraid.ko module, but there are plenty of not very difficult ways around that. After a clean and relatively fast install, the first thing I did was update it to CentOS 4.4 with up2date. I then did some basic initial reconfiguration, turned the machine off and left it lying around doing nothing.

Today its time came, and I turned it on. While booting I saw a "Segmentation fault" message just after the "Starting up LVM2" line. That confused me a bit. I logged in as root and typed vgdisplay: I got the normal output followed by "Segmentation fault". lvdisplay behaved the same way: a normal listing of all LVM logical volumes and "Segmentation fault" at the bottom.

The next step was obvious:

# rpm -Va

Huh, here we are. A bunch of RPMs contain binary files that have changed since they were installed. Looks like the work of a virus, I thought. But wait, that's very strange! This server had been lying around powered off ever since I finished the installation. There was NO possible time for a virus to infect the system. In any case, I took my special LiveCD with ClamAV on it and a RAM disk where freshclam stores the updated virus databases, booted it, mounted the possibly infected system and checked it with clamscan. NO viruses were found.

Then I thought this might be caused by a faulty SCSI disk in the array that distorts the data written to it instead of informing the host about a bad block. OK, that's easy to check: go to single-user mode, reinstall the damaged RPMs using rpm -Uvh --replacepkgs, do a couple of 'sync's, remount all filesystems with -o ro,sync, and check the installed RPMs with rpm -Va. I did all of the above and rpm -Va reported nothing. After the reinstall all files were correct again, and the LVM tools went back to behaving correctly, without "Segmentation fault". Hmm... that's strange, I thought. Well, at least for the moment I had a correctly functioning system without viruses.

Now it was time to reboot and see how it performs. I'm going to do unattended reboots in the future, so it should reboot seamlessly, without asking questions.

# shutdown -r now

The reboot went smoothly, but just as LVM2 was initializing I got the "Segmentation fault" message again! Damn! What's wrong?! I logged in, ran rpm -Va, and gotcha: the device-mapper RPM was broken again. Well, let's reinstall it again, sync, remount root read-only, and check with rpm -V device-mapper. Done: everything seemed OK, and no output from rpm -V means the files are intact. I rebooted again and ran:

[root@omega MegaMgr5.20]# rpm -V device-mapper
..5.....    /lib/libdevmapper.so.1.02

That's it. After each and every reboot this file is corrupted. It doesn't look like a faulty HDD or a faulty RAID controller. Most likely something corrupts this file during the shutdown process or during the boot process. I haven't got enough time today to investigate more deeply; I'll continue tomorrow and post the results here, if any.

--
Best regards,
Alexey Loukianov mailto:aloukianov at lavtech.ru
System Engineer,
IT Department,
Lavtech Corp
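For reading the rpm -V line reported above, here is a minimal sketch. It assumes the 8-column flag layout that rpm printed on CentOS 4 (S M 5 D L U G T, where a dot means the check passed and the 5 marks an MD5 checksum mismatch). The function name decode_rpm_verify_flags is hypothetical and not part of rpm; newer rpm releases print additional columns, which this sketch ignores.

```shell
#!/bin/sh
# Decode the 8-character flag field of an "rpm -V" output line.
# Assumed column order (rpm as shipped with CentOS 4): S M 5 D L U G T.
# decode_rpm_verify_flags is a hypothetical helper, not an rpm command.
decode_rpm_verify_flags() {
    flags=$1
    out=""
    i=1
    for name in size mode md5 device symlink user group mtime; do
        # Pick the i-th character of the flag field.
        c=$(printf '%s' "$flags" | cut -c"$i")
        case "$c" in
            .|'') ;;                                  # check passed or column absent
            \?)   out="$out $name(unverifiable)" ;;   # rpm could not run the check
            *)    out="$out $name" ;;                 # check failed
        esac
        i=$((i + 1))
    done
    # Print the failed checks separated by spaces, or "ok" if none failed.
    if [ -n "$out" ]; then
        printf '%s\n' "${out# }"
    else
        printf 'ok\n'
    fi
}

# Example:
#   decode_rpm_verify_flags '..5.....'
```

For the flag field in the report, only the md5 check is listed as failed: the file contents differ from what the RPM database recorded, while the size, mode, ownership and mtime columns all passed.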
Hi Alexey,

On 12/4/06, Alexey Loukianov <aloukianov at lavtech.ru> wrote:

> [root@omega MegaMgr5.20]# rpm -V device-mapper
> ..5.....    /lib/libdevmapper.so.1.02

Is that the only file consistently corrupted, or are there others? I've seen similar "mysteries" before that turned out to be memory issues: once system memory, once CPU cache (that was really weird). I'm not sure this is a memory problem, but it wouldn't hurt to run a memory test.

Nick
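Whether the same file is damaged every time can be checked by saving the rpm -Va output after each reboot and comparing the listings. What follows is a minimal sketch under that assumption: the function name common_failures and the file names in the usage comment are hypothetical, the path is taken to be the last whitespace-separated field of each line, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Print the paths that fail verification in BOTH saved "rpm -Va" listings.
# $1 and $2 are files holding the output of two separate runs.
# common_failures is a hypothetical helper, not an rpm command.
common_failures() {
    awk 'NF == 0 { next }                       # skip blank lines
         NR == FNR { seen[$NF] = 1; next }      # first file: remember each path
         ($NF in seen) && !printed[$NF]++ {     # second file: path also in first
             print $NF
         }' "$1" "$2"
}

# Example (hypothetical file names):
#   rpm -Va > /root/verify-boot1.txt    # after the first reboot
#   rpm -Va > /root/verify-boot2.txt    # after the second reboot
#   common_failures /root/verify-boot1.txt /root/verify-boot2.txt
```

A file that shows up in every comparison points to something systematic, while a different set of files each time would fit random corruption such as bad memory better.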
Alexey Loukianov
2006-Dec-06 11:37 UTC
[CentOS] Strange issue with device-mapper lib in CentOS
Greetings, Nick.

On 6 December 2006, 0:01:41 you wrote:

>> [root@omega MegaMgr5.20]# rpm -V device-mapper
>> ..5.....    /lib/libdevmapper.so.1.02

> Is that the only file consistently corrupted or are there others?
> I've seen similar "mysteries" before that turned out to be a memory
> issue, once system memory, once CPU cache (that was really weird).
> I'm not sure this is a memory problem, but wouldn't hurt to run a memory test.

It's the only consistent corruption; the other files are left intact. I gave the machine a chance to show memory trouble: I left memtest86 running last night, and it ran without errors until morning. Now I'm proceeding with a full reinstall and will see what that gives me in the end ;).

--
Best regards,
Alexey Loukianov mailto:aloukianov at lavtech.ru
System Engineer,
IT Department,
Lavtech Corp