I thought I'd test replacing a failed drive in a 4-drive RAID 10 array on
a CentOS 5.2 box before it goes online and before a drive really fails.

I 'mdadm --fail'ed and 'mdadm --remove'd the drive, powered off, replaced
it, copied the partition table with 'sfdisk -d /dev/sda | sfdisk /dev/sdb',
and finally 'mdadm --add'ed it back.

Everything seems fine until I try to create a snapshot LV. (Creating a
snapshot LV worked before I replaced the drive.) Here's what I'm seeing:

# lvcreate -p r -s -L 8G -n home-snapshot /dev/vg0/homelv
  Couldn't find device with uuid 'yIIGF9-9f61-QPk8-q6q1-wn4D-iE1x-MJIMgi'.
  Couldn't find all physical volumes for volume group vg0.
  Volume group for uuid not found:
I4Gf5TUB1M1TfHxZNg9cCkM1SbRo8cthCTTjVHBEHeCniUIQ03Ov4V1iOy2ciJwm
  Aborting. Failed to activate snapshot exception store.

So then I try:

# pvdisplay
  --- Physical volume ---
  PV Name               /dev/md3
  VG Name               vg0
  PV Size               903.97 GB / not usable 3.00 MB
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              231416
  Free PE               44536
  Allocated PE          186880
  PV UUID               yIIGF9-9f61-QPk8-q6q1-wn4D-iE1x-MJIMgi

Subsequent runs of pvdisplay eventually return nothing. 'pvck /dev/md3'
seems to restore that, but creating a snapshot volume still fails.

It's as if the "PV stuff" is not on the new drive. I (probably
incorrectly) assumed that just adding the drive back into the RAID array
would take care of that.

I've searched quite a bit but have not found any clues. Anyone?

--
Thanks, Mike
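In shell terms, the replacement sequence was roughly the following
(sdb4 and md3 are stand-ins for whichever member partition and array
are involved on your box; substitute your own):

  # mark the outgoing drive's member partition as failed, then remove it
  mdadm /dev/md3 --fail /dev/sdb4
  mdadm /dev/md3 --remove /dev/sdb4

  # power off, swap the physical drive, boot back up, then copy the
  # partition table from a surviving drive onto the replacement
  sfdisk -d /dev/sda | sfdisk /dev/sdb

  # add the new member back; md rebuilds it in the background
  mdadm /dev/md3 --add /dev/sdb4

  # watch the resync progress
  cat /proc/mdstat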
Ross S. W. Walker
2008-Jul-17 22:55 UTC
[CentOS] lvm errors after replacing drive in raid 10 array
Mike wrote:
> I thought I'd test replacing a failed drive in a 4-drive RAID 10 array
> on a CentOS 5.2 box before it goes online and before a drive really
> fails.
>
> I 'mdadm --fail'ed and 'mdadm --remove'd the drive, powered off,
> replaced it, copied the partition table with
> 'sfdisk -d /dev/sda | sfdisk /dev/sdb', and finally 'mdadm --add'ed
> it back.
>
> Everything seems fine until I try to create a snapshot LV. (Creating a
> snapshot LV worked before I replaced the drive.) Here's what I'm
> seeing:
>
> # lvcreate -p r -s -L 8G -n home-snapshot /dev/vg0/homelv
>   Couldn't find device with uuid
>   'yIIGF9-9f61-QPk8-q6q1-wn4D-iE1x-MJIMgi'.
>   Couldn't find all physical volumes for volume group vg0.
>   Volume group for uuid not found:
> I4Gf5TUB1M1TfHxZNg9cCkM1SbRo8cthCTTjVHBEHeCniUIQ03Ov4V1iOy2ciJwm
>   Aborting. Failed to activate snapshot exception store.
>
> So then I try:
>
> # pvdisplay
>   --- Physical volume ---
>   PV Name               /dev/md3
>   VG Name               vg0
>   PV Size               903.97 GB / not usable 3.00 MB
>   Allocatable           yes
>   PE Size (KByte)       4096
>   Total PE              231416
>   Free PE               44536
>   Allocated PE          186880
>   PV UUID               yIIGF9-9f61-QPk8-q6q1-wn4D-iE1x-MJIMgi
>
> Subsequent runs of pvdisplay eventually return nothing. 'pvck /dev/md3'
> seems to restore that, but creating a snapshot volume still fails.
>
> It's as if the "PV stuff" is not on the new drive. I (probably
> incorrectly) assumed that just adding the drive back into the RAID
> array would take care of that.
>
> I've searched quite a bit but have not found any clues. Anyone?

It would be interesting to see what 'mdadm --detail /dev/mdX' says.

I see the VG is made of a single PV, md3? What are md0, md1, and md2
doing? I can guess md0 is probably /boot, but what about 1 and 2?

It wouldn't hurt to give the sfdisk partition dumps for the drives in
question too.

-Ross
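Concretely, something like this would show the array state and the
partition layout (md3, sda, and sdb are taken from your earlier
commands; adjust to the actual devices):

  # membership, state, and rebuild status of the array backing the PV
  mdadm --detail /dev/md3

  # quick overview of all md arrays
  cat /proc/mdstat

  # partition dumps for the surviving drive and the replacement
  sfdisk -d /dev/sda
  sfdisk -d /dev/sdb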
Mike
2008-Jul-18 14:37 UTC
[CentOS] Re: lvm errors after replacing drive in raid 10 array [SOLVED ?]
Just for the record, I'm about 98.7% sure that the root problem here was
that the LVM setup (pvcreate, vgcreate, lvcreate) was done while booted
from systemrescuecd, and had nothing to do with replacing a failed drive.

The output from 'pvcreate --version' on the systemrescuecd is:

  LVM version:     2.02.33 (2008-01-31)
  Library version: 1.02.26 (2008-06-06)
  Driver version:  4.13.0

And when booted from CentOS 5.2:

  LVM version:     2.02.32-RHEL5 (2008-03-04)
  Library version: 1.02.24 (2007-12-20)
  Driver version:  4.11.5

When [pv|vg|lv]create is done like it should have been (after booting
CentOS), snapshot volume creation works as expected, even after
replacing a failed drive.

On Thu, 17 Jul 2008, Mike wrote:
> I thought I'd test replacing a failed drive in a 4-drive RAID 10 array
> on a CentOS 5.2 box before it goes online and before a drive really
> fails.
>
> I 'mdadm --fail'ed and 'mdadm --remove'd the drive, powered off,
> replaced it, copied the partition table with
> 'sfdisk -d /dev/sda | sfdisk /dev/sdb', and finally 'mdadm --add'ed
> it back.
>
> Everything seems fine until I try to create a snapshot LV. (Creating a
> snapshot LV worked before I replaced the drive.) Here's what I'm
> seeing:
>
> # lvcreate -p r -s -L 8G -n home-snapshot /dev/vg0/homelv
>   Couldn't find device with uuid
>   'yIIGF9-9f61-QPk8-q6q1-wn4D-iE1x-MJIMgi'.
>   Couldn't find all physical volumes for volume group vg0.
>   Volume group for uuid not found:
> I4Gf5TUB1M1TfHxZNg9cCkM1SbRo8cthCTTjVHBEHeCniUIQ03Ov4V1iOy2ciJwm
>   Aborting. Failed to activate snapshot exception store.
>
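For completeness, redoing the LVM setup under the installed system looks
roughly like this. This is a sketch, not a tested recipe: pvcreate and
vgcreate wipe the existing LVM metadata, so it assumes you can recreate
or restore the LVs afterwards, and the vg0/md3/homelv names and sizes
are just the ones from this thread:

  # confirm you're running the distribution's LVM userland,
  # not the rescue CD's
  pvcreate --version

  # recreate the PV, VG, and LV(s) with the CentOS tools
  # (this destroys the old metadata -- back up first!)
  pvcreate /dev/md3
  vgcreate vg0 /dev/md3
  lvcreate -L 730G -n homelv vg0    # size is illustrative

  # the snapshot that failed before should now succeed
  lvcreate -p r -s -L 8G -n home-snapshot /dev/vg0/homelv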