Aleksandar Milivojevic
2005-Apr-11 16:23 UTC
[CentOS] trouble booting the system with I2O hardware RAID
I've just made (yet another) CentOS 4 installation. The install process seems to go fine, however the machine doesn't want to boot.

The system in question has one of the I2O Adaptec RAID controllers. I've configured LVM with one volume group and several volumes. If I boot into rescue mode, all looks fine and dandy. Anaconda finds the installation, and I can access all volumes.

However, when doing a "real" boot, it gets into trouble. All required modules are loaded from the initrd image (as far as I can tell). The I2O modules are able to locate the RAID devices (I see all partitions reported: /dev/i2o/hda1 (unused), /dev/i2o/hdb1 (/boot), and /dev/i2o/hdb2 (the rest of the system under LVM)). The only thing different from rescue mode is that i2o/hda and i2o/hdb are reversed (this is strange, but it shouldn't affect things since the /boot partition has the label "/boot", and all the rest is under LVM, so everything should be device name independent). I have no idea why the i2o device drivers behave differently when loaded from the initrd image during boot and by Anaconda during installation.

The last couple of messages printed on the screen are:

Creating root device
Mounting root file system
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
mount: error 2 mounting none
Switching to new root
WARNING: can't access (null)
exec of init ((null)) failed!!!: 14
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!

Looking at the "init" script from the initrd image, this corresponds to:

echo Mounting root filesystem
mount -o defaults --ro -t ext3 /dev/root /sysroot
mount -t tmpfs --bind /dev /sysroot/dev
echo Switching to new root
switchroot /sysroot
umount /initrd/dev

Which would indicate that the mount of the root file system went OK, but then it failed to mount the /dev filesystem (basically, move the already mounted /dev to /sysroot/dev). After switchroot /sysroot, the old /dev mount point became invalid (non-accessible), the new /dev mount point was not there, and of course everything broke from that point on.

I've Googled around a bit, and the only relevant thing Google gave me was this French page. There were a couple more pages with similar but different problems (modules failing to load and/or detect disk drives, which is not the case here; all modules were loaded correctly, as witnessed by the successful LVM initialization and successful root file system mount).

http://www.fedora-france.org/modules/newbb/viewtopic.php?topic_id=3838&forum=6&post_id=20970

I do live in Canada, but don't speak a word of French (shame on me, but in my defense it is on my todo list). However, I did manage to figure out that somebody suggested going with Grub instead of LILO. IMO, Grub or LILO shouldn't make any difference, since the error is happening way after the boot loader has done its job. Anyhow, just for fun, I reinstalled the system from scratch, this time choosing Grub as the boot loader to be installed into the MBR. However, for whatever reason, Anaconda did not install Grub (dd & less showed no signs of Grub in the MBR). Boot into rescue, chroot, grub-install, OK, now I have Grub in the MBR. But again, no joy. Grub doesn't even start and the system simply hangs in mid-air. No errors printed, nothing at all.

Currently, I'm kind of stuck and idea-less. The system worked perfectly in the past with Red Hat 7.3 (and LILO as the boot loader), and exactly the same hardware RAID configuration (two volumes, one for system, one for data).

Any help, hint, etc. would be greatly appreciated.
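P.S. In case somebody wants to poke at the same thing, this is roughly how the init script can be pulled out of the initrd image for inspection. This is a sketch, assuming the image is a gzipped cpio archive (which it appears to be on CentOS 4; older releases used a compressed ext2 image that has to be loop-mounted instead), and assuming you adjust the path (it lives under /mnt/sysimage/boot in rescue mode) and kernel version to match your system:

  mkdir /tmp/initrd-contents && cd /tmp/initrd-contents
  # unpack the initrd image (gzipped cpio archive); substitute your own kernel version
  zcat /boot/initrd-2.6.9-5.EL.img | cpio -idmv
  # "init" is the nash script that runs during early boot
  cat init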
-- 
Aleksandar Milivojevic <amilivojevic at pbl.ca>    Pollard Banknote Limited
Systems Administrator                             1499 Buffalo Place
Tel: (204) 474-2323 ext 276                       Winnipeg, MB R3T 1L7
Aleksandar Milivojevic
2005-Apr-11 19:00 UTC
[CentOS] trouble booting the system with I2O hardware RAID
Aleksandar Milivojevic wrote:
> The system in question has one of the I2O Adaptec RAID controllers. I've
> configured LVM with one volume group and several volumes. If I boot
> into rescue mode, all looks fine and dandy. Anaconda finds the
> installation, and I can access all volumes.
>
> However, when doing a "real" boot, it gets into trouble. All required
> modules are loaded from the initrd image (as far as I can tell). The I2O
> modules are able to locate the RAID devices (I see all partitions
> reported: /dev/i2o/hda1 (unused), /dev/i2o/hdb1 (/boot), and
> /dev/i2o/hdb2 (the rest of the system under LVM)). The only thing
> different from rescue mode is that i2o/hda and i2o/hdb are reversed
> (this is strange, but it shouldn't affect things since the /boot
> partition has the label "/boot", and all the rest is under LVM, so
> everything should be device name independent). I have no idea why the
> i2o device drivers behave differently when loaded from the initrd image
> during boot and by Anaconda during installation.
>
> The last couple of messages printed on the screen are:
>
> Creating root device
> Mounting root file system
> kjournald starting. Commit interval 5 seconds
> EXT3-fs: mounted filesystem with ordered data mode.
> mount: error 2 mounting none
> Switching to new root
> WARNING: can't access (null)
> exec of init ((null)) failed!!!: 14
> umount /initrd/dev failed: 2
> Kernel panic - not syncing: Attempted to kill init!

Ah, found it... I was bitten by that nonsense called file system labels. Again. And it even might be that the LVM volume information was also read from the wrong place. The problem isn't I2O related, and can probably happen with any other hardware configuration. I'll summarize, so that folks with similar problems in the future know what to do.

Configuration: I2O RAID controller with two volumes. The first RAID volume is used for the system. The second RAID volume is used for some data storage. Since the kernel assigns them different device names during installation and when the system is booted from disk after installation, I'll call them the "system RAID volume" and the "data RAID volume". When I reference device names, it is only a note of what name the system saw them under in that particular step.

During installation, the i2o device drivers report the volumes in the expected order: /dev/i2o/hda is the system RAID volume, /dev/i2o/hdb is the data RAID volume. Exactly the order they are defined in the I2O BIOS. hdb is not touched by the installation process, and it contained a single partition, hdb1. /boot is installed on hda1 with the "/boot" file system label written onto it. hda2 is configured as an LVM physical volume with the rest of the system (including the root partition).

After the installation is done and the system reboots, for whatever strange reason the data RAID volume is detected as /dev/i2o/hda, and the system RAID volume as /dev/i2o/hdb. This should theoretically work fine, since device names are never used as-is in the system's configuration. However, the disks in the data RAID volume had been used before (they were not clean), and since the system detected them first, this was the root of the problem. It seems those disks once upon a time had a system on them, with a set of LVM volumes defined, so that information was used instead of the "real" information from the first RAID volume. I'm not sure if the disks were previously used connected to this I2O controller, or if they were used somewhere else and the stale data just happened to fall into the "right" spot when the RAID volume was assembled.

OK, so I wiped all partitions from the data RAID volume.
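(For anyone hitting the same thing, something along these lines from rescue mode should show whether stale LVM metadata is being picked up, and zeroing the first sector is one way to get rid of an old partition table. This is only a sketch; the device name below is how this particular boot happened to see the data RAID volume, so double-check before zeroing anything.)

  # which partitions the LVM tools see as physical volumes, and which
  # volume groups they claim to belong to (in the rescue environment
  # these may need to be run as "lvm pvscan" / "lvm vgscan")
  pvscan
  vgscan
  # one way to throw away the stale partition table on the data RAID
  # volume -- destructive, so make very sure this really is the data volume
  dd if=/dev/zero of=/dev/i2o/hda bs=512 count=1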
This time the system actually boots (because it can see only the partitions on the system RAID volume, which it detected as /dev/i2o/hdb, so it reads the correct LVM information). But the story does not end here.

I created a single partition on the data RAID volume (/dev/i2o/hda), defined it as an LVM physical volume, and created a new volume group with a single logical volume on it. Created a file system, mounted it, updated fstab. So far so good. Reboot. Oops, the system doesn't boot, and complains about duplicate "/boot" labels.

Back into rescue mode. And sure enough, there it was. e2label reports that the first partition on the data RAID volume (which is of type LVM and contains an LVM physical volume) and the first partition on the system RAID volume (which is of type Linux native and contains an ext3 file system) both have the label "/boot". Oops. Apparently, Anaconda was smart enough to ignore the label on something that was not a file system. Whatever goes on during the "real" boot wasn't that smart.

Used e2label to wipe the label from the data RAID volume. This time the system booted, no problems at all. For good measure I wiped out the logical volume/group and physical volume from the data RAID volume and recreated them (didn't want to risk e2label, used on something that is not a file system, having screwed up some LVM metadata). All is happy now.

It could have saved me tons of time and grief if Anaconda had checked for (and detected) conflicting LVM information and conflicting file system labels during the install process. Or if file system labels were randomly generated (instead of using mount point names), like the labels used by the MD and LVM drivers.

Hopefully this info will be useful to somebody in the future.

-- 
Aleksandar Milivojevic <amilivojevic at pbl.ca>    Pollard Banknote Limited
Systems Administrator                             1499 Buffalo Place
Tel: (204) 474-2323 ext 276                       Winnipeg, MB R3T 1L7