Robert Heller
2011-Jul-22 19:08 UTC
[CentOS] Strange problem with LVM, device-mapper, and software RAID...
Running on an up-to-date CentOS 5.6 x86_64 machine:

[heller@ravel ~]$ uname -a
Linux ravel.60villagedrive 2.6.18-238.19.1.el5 #1 SMP Fri Jul 15 07:31:24 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

with a TYAN Computer Corp S4881 motherboard, which has an nVidia 4-channel
SATA controller. It also has a Marvell Technology Group Ltd. 88SX7042 PCI-e
4-port SATA-II (rev 02) controller.

This machine has a 120G SATA disk on the motherboard controller as the
system disk:

[heller@ravel ~]$ sudo /sbin/fdisk -l /dev/sda

Disk /dev/sda: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         125     1004031   83  Linux
/dev/sda2             126       14593   116214210   8e  Linux LVM

/dev/sda1 is /boot and /dev/sda2 is an LVM volume group (named
"RavelSystem") with two logical volumes (named "root" and "swap"),
containing the root file system and a base 1G swap area.

So far so good.

On the Marvell controller are four 1.5TB disks arranged as a RAID10 array:

[heller@ravel ~]$ cat /proc/mdstat
Personalities : [raid10]
md1 : active raid10 sdg1[3] sdf1[2] sde1[1] sdd1[0]
      2930270208 blocks 512K chunks 2 far-copies [4/4] [UUUU]

unused devices: <none>

[heller@ravel ~]$ sudo /sbin/mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Tue Jun 21 19:04:19 2011
     Raid Level : raid10
     Array Size : 2930270208 (2794.52 GiB 3000.60 GB)
  Used Dev Size : 1465135616 (1397.26 GiB 1500.30 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Jul 22 14:37:04 2011
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : far=2
     Chunk Size : 512K

           UUID : 7a257206:a2b7c9d9:c5d004ae:2bdf6faf
         Events : 0.578

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       8       65        1      active sync   /dev/sde1
       2       8       81        2      active sync   /dev/sdf1
       3       8       97        3      active sync   /dev/sdg1

This RAID10 array has a second LVM volume group (named "RavelData2") on it,
also with two logical volumes (named "data" and "largeswap"), holding a
large data file system and a 16G swap area.

We are getting a strange message from device-mapper on boot up:

[heller@ravel ~]$ dmesg | grep device-mapper -A 5 -B 5
sdg: Write Protect is off
sdg: Mode Sense: 00 3a 00 00
SCSI device sdg: drive cache: write back
 sdg: sdg1
sd 7:0:0:0: Attached scsi disk sdg
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.11.5-ioctl (2007-12-12) initialised: dm-devel@redhat.com
device-mapper: dm-raid45: initialized v0.2594l
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
SELinux:  Disabled at runtime.
SELinux:  Unregistering netfilter hooks
type=1404 audit(1311358429.773:2): selinux=0 auid=4294967295 ses=4294967295
--
md: bind<sdg1>
md: running: <sdg1><sdf1><sde1><sdd1>
md: raid10 personality registered for level 10
raid10: raid set md1 active with 4 out of 4 devices
md: ... autorun DONE.
device-mapper: multipath: version 1.0.6 loaded
device-mapper: table: 253:6: linear: dm-linear: Device lookup failed
device-mapper: ioctl: error adding target to table
device-mapper: ioctl: device doesn't appear to be in the dev hash table.
EXT3 FS on dm-5, internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 180 seconds

'253:6' happens to be the large swap space:

[heller@ravel ~]$ dir -l /dev/mapper/
total 0
crw------- 1 root root  10, 63 Jul 22 14:13 control
brw-rw---- 1 root disk 253,  2 Jul 22 14:13 nvidia_biaabdaf
brw-rw---- 1 root disk 253,  3 Jul 22 14:13 nvidia_biaabdafp1
brw-rw---- 1 root disk 253,  0 Jul 22 14:13 nvidia_efjjfcad
brw-rw---- 1 root disk 253,  1 Jul 22 14:13 nvidia_efjjfcadp1
brw-rw---- 1 root disk 253,  7 Jul 22 14:13 RavelData2-data
brw-rw---- 1 root disk 253,  6 Jul 22 14:13 RavelData2-largeswap
brw-rw---- 1 root disk 253,  5 Jul 22 14:13 RavelSystem-root
brw-rw---- 1 root disk 253,  4 Jul 22 14:13 RavelSystem-swap

The weirdness is this:

1) The large swap space is not being activated automatically on boot. It
   can be activated manually with the swapon command, and as a stopgap
   measure I've added a swapon command to rc.local.

2) vgscan, vgchange, and vgdisplay are not seeing RavelData2, *even though*
   the system has no problem mounting the data disk during boot and swapon
   has no problem activating the largeswap swap area (at least once the
   system is in full multi-user mode). And (apparently) the device mapper
   is perfectly happy to do its thing (more or less) with these logical
   volumes.

[heller@ravel ~]$ sudo /sbin/vgscan -v --ignorelockingfailure --mknodes -d
    Wiping cache of LVM-capable devices
    Wiping internal VG cache
  Reading all physical volumes.  This may take a while...
    Finding all volume groups
    Finding volume group "RavelSystem"
  Found volume group "RavelSystem" using metadata type lvm2
    Finding all logical volumes
[heller@ravel ~]$ sudo /sbin/vgchange -a y
  2 logical volume(s) in volume group "RavelSystem" now active
[heller@ravel ~]$ sudo /sbin/vgchange -a y RavelData2
  Volume group "RavelData2" not found
[heller@ravel ~]$ sudo /usr/sbin/vgdisplay RavelData2
  Volume group "RavelData2" not found

Does anyone have any guesses as to what is going on here?
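
The step from the '253:6' in the kernel message to a volume name can be
checked mechanically rather than by eye. What follows is a minimal Python
sketch, assuming the listing has the same column layout as the
/dev/mapper output in this post. The function name dm_name and the
regular expression are illustrative assumptions, not part of any LVM or
device-mapper tool, and the sample text is copied from that listing.

```python
import re

# Sample copied from the `dir -l /dev/mapper/` output in this post.
LISTING = """\
total 0
crw------- 1 root root  10, 63 Jul 22 14:13 control
brw-rw---- 1 root disk 253,  2 Jul 22 14:13 nvidia_biaabdaf
brw-rw---- 1 root disk 253,  3 Jul 22 14:13 nvidia_biaabdafp1
brw-rw---- 1 root disk 253,  0 Jul 22 14:13 nvidia_efjjfcad
brw-rw---- 1 root disk 253,  1 Jul 22 14:13 nvidia_efjjfcadp1
brw-rw---- 1 root disk 253,  7 Jul 22 14:13 RavelData2-data
brw-rw---- 1 root disk 253,  6 Jul 22 14:13 RavelData2-largeswap
brw-rw---- 1 root disk 253,  5 Jul 22 14:13 RavelSystem-root
brw-rw---- 1 root disk 253,  4 Jul 22 14:13 RavelSystem-swap
"""

# Hypothetical pattern for one device line of `ls -l` output:
# mode, link count, owner, group, "major, minor", month, day, time, name.
ENTRY = re.compile(
    r"^[bc]\S+\s+\d+\s+\S+\s+\S+\s+(\d+),\s*(\d+)\s+\S+\s+\d+\s+\S+\s+(\S+)$"
)


def dm_name(listing, major, minor):
    """Return the device name whose major/minor pair matches, or None."""
    for line in listing.splitlines():
        m = ENTRY.match(line.strip())
        if m and (int(m.group(1)), int(m.group(2))) == (major, minor):
            return m.group(3)
    return None


if __name__ == "__main__":
    # Look up the device named in "device-mapper: table: 253:6: ..."
    print(dm_name(LISTING, 253, 6))
```

Lines that are not device nodes (such as "total 0") do not match the
pattern and are skipped, and a pair that is not present returns None.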
-- 
Robert Heller             -- 978-544-6933 / heller at deepsoft.com
Deepwoods Software        -- http://www.deepsoft.com/
()  ascii ribbon campaign -- against html e-mail
/\  www.asciiribbon.org   -- against proprietary attachments