Rudi Ahlers
2009-May-08 12:29 UTC
[Xen-users] domU corrupt after server crash, help needed trying to recover domU
Hi all,

One of our Dell servers has failed badly, and one of the domUs was corrupted in the process. It boots up to a point and then gives me a kernel panic:

Loading dm-zero.ko module
Loading dm-snapshot.ko module
Scanning and configuring dmraid supported devices
Scanning logical volumes
  Reading all physical volumes. This may take a while...
  No volume groups found
Activating logical volumes
  Volume group "VolGroup00" not found
Creating root device.
Mounting root filesystem.
mount: could not find filesystem '/dev/root'
Setting up other filesystems.
Setting up new root fs
setuproot: moving /dev failed: No such file or directory
no fstab.sys, mounting internal defaults
setuproot: error mounting /proc: No such file or directory
setuproot: error mounting /sys: No such file or directory
Switching to new root and running init.
unmounting old /dev
unmounting old /proc
unmounting old /sys
switchroot: mount failed: No such file or directory
Kernel panic - not syncing: Attempted to kill init!

It shows up as a Zombie:

[root@xen ~]# xm list
Name                ID Mem(MiB) VCPUs State   Time(s)
Domain-0             0     1439     1 r-----    329.0
Zombie-hfserver2    15     1024     1 ----c-      0.5
hfdns02             10      519     2 r-----   1552.8

I can't mount it either:

[root@xen ~]# mount /dev/data/hf
hfdns02    hfserver2
[root@xen ~]# mount /dev/data/hfserver2 /mnt/cpanel/
mount: you must specify the filesystem type
[root@xen ~]# mount -o loop /dev/data/hfserver2 /mnt/cpanel/
mount: you must specify the filesystem type

Here's the output of the LVM partitions:

[root@xen ~]# lvscan
  ACTIVE            '/dev/data/cpanel002' [100.00 GB] inherit
  ACTIVE            '/dev/data/windows2003_web' [30.00 GB] inherit
  ACTIVE            '/dev/data/storage' [50.00 GB] inherit
  ACTIVE   Original '/dev/data/hfserver2' [30.00 GB] inherit
  ACTIVE            '/dev/data/hfdns02' [30.00 GB] inherit
  ACTIVE            '/dev/data/pluto' [30.00 GB] inherit
  ACTIVE   Snapshot '/dev/data/pluto_s' [30.00 GB] inherit
  ACTIVE            '/dev/system/root' [39.06 GB] inherit
  ACTIVE            '/dev/system/swap' [9.75 GB] inherit
[root@xen ~]# vgscan
  Reading all physical volumes. This may take a while...
  Found volume group "data" using metadata type lvm2
  Found volume group "system" using metadata type lvm2

Does anyone know how to fix an LVM volume like this?

-- 
Kind Regards
Rudi Ahlers
CEO, SoftDux Hosting
Web: http://www.SoftDux.com
Office: 087 805 9573
Cell: 082 554 7532
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Rudi Ahlers
2009-May-08 13:43 UTC
[Xen-users] Re: domU corrupt after server crash, help needed trying to recover domU
On Fri, May 8, 2009 at 2:29 PM, Rudi Ahlers <rudiahlers@gmail.com> wrote:
> Does anyone know how to fix a LVM like this?

Here's what I've done so far:

[root@xen ~]# losetup /dev/loop4 /dev/data/hfserver2

# This attaches the logical volume to /dev/loop4 as a loop device
# data is the volume group (VG) name

[root@xen ~]# kpartx -va /dev/loop4
add map loop4p1 : 0 208782 linear /dev/loop4 63
add map loop4p2 : 0 62701695 linear /dev/loop4 208845

# This creates device maps in /dev/mapper for the partitions inside /dev/data/hfserver2

[root@xen ~]# vgscan
  Reading all physical volumes. This may take a while...
  Found volume group "VolGroup00" using metadata type lvm2
  Found volume group "data" using metadata type lvm2
  Found volume group "system" using metadata type lvm2

[root@xen ~]# lvscan
  inactive          '/dev/VolGroup00/LogVol00' [27.94 GB] inherit
  inactive          '/dev/VolGroup00/LogVol01' [1.94 GB] inherit
  ACTIVE            '/dev/data/cpanel002' [100.00 GB] inherit
  ACTIVE            '/dev/data/windows2003_web' [30.00 GB] inherit
  ACTIVE            '/dev/data/storage' [50.00 GB] inherit
  ACTIVE   Original '/dev/data/hfserver2' [30.00 GB] inherit
  ACTIVE            '/dev/data/hfdns02' [30.00 GB] inherit
  ACTIVE            '/dev/data/pluto' [30.00 GB] inherit
  ACTIVE   Snapshot '/dev/data/pluto_s' [30.00 GB] inherit
  ACTIVE            '/dev/system/root' [39.06 GB] inherit
  ACTIVE            '/dev/system/swap' [9.75 GB] inherit

[root@xen ~]# lvchange -ay VolGroup00
[root@xen ~]# lvscan
  ACTIVE            '/dev/VolGroup00/LogVol00' [27.94 GB] inherit
  ACTIVE            '/dev/VolGroup00/LogVol01' [1.94 GB] inherit
  ACTIVE            '/dev/data/cpanel002' [100.00 GB] inherit
  ACTIVE            '/dev/data/windows2003_web' [30.00 GB] inherit
  ACTIVE            '/dev/data/storage' [50.00 GB] inherit
  ACTIVE   Original '/dev/data/hfserver2' [30.00 GB] inherit
  ACTIVE            '/dev/data/hfdns02' [30.00 GB] inherit
  ACTIVE            '/dev/data/pluto' [30.00 GB] inherit
  ACTIVE   Snapshot '/dev/data/pluto_s' [30.00 GB] inherit
  ACTIVE            '/dev/system/root' [39.06 GB] inherit
  ACTIVE            '/dev/system/swap' [9.75 GB] inherit

[root@xen ~]# e2fsck /dev/VolGroup00/LogVol00
e2fsck 1.39 (29-May-2006)
/dev/VolGroup00/LogVol00: clean, 631982/7325696 files, 4512772/7323648 blocks

At first it found a whole lot of damaged inodes, which I repaired.

Then I reversed the process:

[root@xen ~]# lvchange -an VolGroup00
[root@xen ~]# lvscan
  inactive          '/dev/VolGroup00/LogVol00' [27.94 GB] inherit
  inactive          '/dev/VolGroup00/LogVol01' [1.94 GB] inherit
  ACTIVE            '/dev/data/cpanel002' [100.00 GB] inherit
  ACTIVE            '/dev/data/windows2003_web' [30.00 GB] inherit
  ACTIVE            '/dev/data/storage' [50.00 GB] inherit
  ACTIVE   Original '/dev/data/hfserver2' [30.00 GB] inherit
  ACTIVE            '/dev/data/hfdns02' [30.00 GB] inherit
  ACTIVE            '/dev/data/pluto' [30.00 GB] inherit
  ACTIVE   Snapshot '/dev/data/pluto_s' [30.00 GB] inherit
  ACTIVE            '/dev/system/root' [39.06 GB] inherit
  ACTIVE            '/dev/system/swap' [9.75 GB] inherit

[root@xen ~]# vgchange -an VolGroup00
  0 logical volume(s) in volume group "VolGroup00" now active
[root@xen ~]# kpartx -d /dev/loop4
[root@xen ~]# losetup -d /dev/loop4

[root@xen ~]# xm create -c /etc/xen/hfserver2

And then it dies:

  Reading all physical volumes. This may take a while...
  Found volume group "VolGroup00" using metadata type lvm2
Activating logical volumes
  2 logical volume(s) in volume group "VolGroup00" now active
Creating root device.
Mounting root filesystem.
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Setting up other filesystems.
Setting up new root fs
no fstab.sys, mounting internal defaults
Switching to new root and running init.
unmounting old /dev
unmounting old /proc
unmounting old /sys
exec of init (/sbin/init) failed!!!: No such file or directory
Kernel panic - not syncing: Attempted to kill init!
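For reference, the two "add map" lines printed by kpartx can be read mechanically. What follows is a minimal sketch, assuming device-mapper's usual 512-byte sectors; the function name kpartx_maps is made up for illustration. For each mapping it prints the /dev/mapper path, the size in MiB (rounded down), and the byte offset of the partition inside the volume.

```shell
# Hypothetical helper: turn "kpartx -av" output into device paths and sizes.
# Assumption: lengths and offsets are counted in 512-byte sectors.
kpartx_maps() {
  awk '$1 == "add" && $2 == "map" {
    # $3 = map name, $6 = length in sectors, $9 = start sector on the parent
    printf "/dev/mapper/%s %d %d\n", $3, int($6 / 2048), $9 * 512
  }'
}

# Example with the two lines from the session in this thread:
printf '%s\n' \
  'add map loop4p1 : 0 208782 linear /dev/loop4 63' \
  'add map loop4p2 : 0 62701695 linear /dev/loop4 208845' | kpartx_maps
```

Fed those two lines, it reports loop4p1 at about 101 MiB (presumably /boot, though that is a guess) and loop4p2 at about 30616 MiB, which is roughly in line with the 27.94 GB + 1.94 GB that lvscan shows for VolGroup00.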
Rudi Ahlers
2009-May-08 18:21 UTC
[Xen-users] Re: domU corrupt after server crash, help needed trying to recover domU
Does anyone know how to fix this?

On 5/8/09, Rudi Ahlers <rudiahlers@gmail.com> wrote:
> exec of init (/sbin/init) failed!!!: No such file or directory
> Kernel panic - not syncing: Attempted to kill init!
Fajar A. Nugraha
2009-May-08 20:55 UTC
Re: [Xen-users] Re: domU corrupt after server crash, help needed trying to recover domU
On Sat, May 9, 2009 at 1:21 AM, Rudi Ahlers <rudiahlers@gmail.com> wrote:
> Does anyone know how to fix this?

You pretty much did everything correctly. If it still refuses to come up, the only available option is to restore from backup.

I see that you use losetup, meaning file-backed storage for the domU. Do you also use file:/ in the domU config file? If yes, then that configuration is prone to failures. You should use tap:aio:/ for image files, or phy:/ with LVM-backed storage.

Regards,

Fajar
Rudi Ahlers
2009-May-08 22:42 UTC
Re: [Xen-users] Re: domU corrupt after server crash, help needed trying to recover domU
On Fri, May 8, 2009 at 10:55 PM, Fajar A. Nugraha <fajar@fajar.net> wrote:
> I see that you use losetup, meaning file-backed storage for domU. Do
> you also use file:/ in domU config file? If yes, then that
> configuration is prone to failures. You should use tap:aio:/ or phy:/
> with LVM-backed storage.

Hi Fajar,

I got the commands via a Google search, so I didn't know that losetup was only meant for file-backed storage. Here's the domU configuration:

[root@xen ~]# more /etc/xen/hfserver2
name = "hfserver2"
uuid = "073b87c5-0317-6eb3-f07f-b8978246ec48"
maxmem = 4096
memory = 1024
vcpus = 2
bootloader = "/usr/bin/pygrub"
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ ]
disk = [ "phy:/dev/data/hfserver2,xvda,w" ]
vif = [ "mac=00:16:3e:2d:a1:fe,bridge=xenbr0" ]

Unfortunately there are no backups :(
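For anyone reading along, the disk line is the part of this config that answers the file-backed versus block-device question. A minimal sketch that splits such an entry into its parts follows; parse_disk_spec is a made-up helper name, and it assumes the "backend:/absolute/path,frontend,mode" form shown in the config.

```shell
# Hypothetical helper: split one entry of the "disk = [ ... ]" list.
# Assumption: the entry looks like backend:/absolute/path,frontend,mode.
parse_disk_spec() (
  spec="$1"
  target="${spec%%,*}"        # e.g. phy:/dev/data/hfserver2
  rest="${spec#*,}"           # e.g. xvda,w
  frontend="${rest%%,*}"      # device name seen inside the domU
  mode="${rest#*,}"           # w = read-write, r = read-only
  backend="${target%%:/*}"    # phy, file, tap:aio, ...
  path="/${target#*:/}"       # path on dom0
  echo "backend=$backend path=$path frontend=$frontend mode=$mode"
)

parse_disk_spec "phy:/dev/data/hfserver2,xvda,w"
```

Here the backend is phy, so the domU sits directly on the dom0 block device /dev/data/hfserver2 (an LVM logical volume) rather than on an image file.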
Paul Choi
2009-May-08 23:07 UTC
Re: [Xen-users] Re: domU corrupt after server crash, help needed trying to recover domU
Have you checked /dev/mapper/? Since you ran "kpartx -a /dev/loop4", you should see:

/dev/mapper/loop4p1
/dev/mapper/loop4p2
/dev/mapper/loop4p3

and so on...

You can treat each of them like a device. So, you can fsck /dev/mapper/loop4p1, for example, and mount it from dom0 to get the data. Well, if fsck works out ok, maybe the domU will work.

Since you went as far as doing losetup, I figure you tried doing fsck and mount, but since I don't see it in your email, just in case...

I hope this helps.

-Paul Choi

Rudi Ahlers wrote:
> Does anyone know how to fix this?
Fajar A. Nugraha
2009-May-09 03:19 UTC
Re: [Xen-users] Re: domU corrupt after server crash, help needed trying to recover domU
On Sat, May 9, 2009 at 5:42 AM, Rudi Ahlers <rudiahlers@gmail.com> wrote:
> Hi Fajar,
>
> I got the commands via google search, so I didn't know that losetup was only
> meant for file-backed storage.

If it's a block device (LVM, partition, etc.) you can skip losetup and go directly to

kpartx -av /dev/data/hfserver2

> Unfortunately there's no backups :(

Ouch. Sorry to hear that.
So that makes it what ... your second corruption?
In my environment, FS corruption is USUALLY caused by one of these:
- human error (like the admin mounting the same block device twice on different servers). This usually happens on shared-storage systems (SAN, NAS, etc).
- SAN error (like when it got temporarily disconnected, and then reconnected again)
- server hardware error (bad memory, bad disk controller, etc.)

I suggest you check all three to make sure corruption doesn't happen again. If both corruptions are on the same hardware, then most likely the server hardware is bad.

You could PROBABLY still salvage some data from the broken domU. Try shutting it down, and mount it again on dom0. Sometimes fsck will place recovered inodes in /lost+found, so perhaps some of your data is still there.

BTW, VolGroup00 IS the name of domU's VG, right? It's not dom0's VG? Cause if it were dom0's you might have more problems ahead.

Regards,

Fajar
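The salvage route described above (map the partitions straight from the logical volume, activate the domU's volume group, mount on dom0) could be sketched roughly as below. This is an illustration under assumptions, not a tested recovery script: the device, VG, LV and mount point names are the ones from this thread, the helper names are made up, and the check-only e2fsck -n and the read-only mount are cautious additions that nobody in the thread prescribed. By default it only prints the commands.

```shell
# Sketch only: prints the commands by default (DRY_RUN=1).
# Set DRY_RUN=0 and run as root on dom0, with the domU shut down, to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# attach_domu LV VG ROOT_LV MOUNTPOINT
attach_domu() {
  run kpartx -av "$1"               # map the partitions inside the LV
  run vgchange -ay "$2"             # activate the domU's own volume group
  run e2fsck -n "/dev/$2/$3"        # check only; -n answers "no" to every fix
  run mount -o ro "/dev/$2/$3" "$4" # read-only mount for copying data off
}

# detach_domu LV VG MOUNTPOINT
detach_domu() {
  run umount "$3"
  run vgchange -an "$2"             # deactivate before removing the maps
  run kpartx -d "$1"
}

attach_domu /dev/data/hfserver2 VolGroup00 LogVol00 /mnt/cpanel
detach_domu /dev/data/hfserver2 VolGroup00 /mnt/cpanel
```

Detaching in the reverse order matters: the volume group has to be deactivated before kpartx can remove the partition maps underneath it.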
Rudi Ahlers
2009-May-10 07:59 UTC
Re: [Xen-users] Re: domU corrupt after server crash, help needed trying to recover domU
On Sun, May 10, 2009 at 9:38 AM, Rudi Ahlers <rudiahlers@gmail.com> wrote:
> On Sat, May 9, 2009 at 5:19 AM, Fajar A. Nugraha <fajar@fajar.net> wrote:
>> If it's a block device (LVM, partition, etc.) you can skip losetup and
>> go directly to
>> kpartx -av /dev/data/hfserver2
>
> Really? Cool, now I've learned something :)
>
>> Ouch. Sorry to hear that.
>> So that makes it what ... your second corruption?
>> On my environment FS corruption is USUALLY because one of these :
>> - human error (like the admin mounting the same block device twice on
>> different servers). This usually happens on shared-storage systems
>> (SAN, NAS, etc).
>> - SAN error (like when it got temporarily disconnected, and then
>> reconnected again)
>> - server hardware error (bad memory, bad disk controller, etc.)
>
> Yes, but on a different server, different client, different reason. The
> only thing that's the same is the IDC, and the server setup. Both have
> CentOS on the host node, and run cPanel on the domU VPSs. I'd love to
> set up a shared NAS and have 2 servers share the data from there, but funds
> are a bit limited :(
>
>> I suggest you check all three to make sure corruption doesn't happen
>> again. If both corruption are on the same hardware, then most likely
>> the server hardware is bad.
>
> The problem is due to the RAM. The ECC (non buffered) Kingston Memory
> modules don't work as expected on the Dell PE860 platform. Strangely when I
> put normal desktop RAM into the server, it worked fine. So, I'm taking the
> RAM back to the supplier on Monday.
>
>> You could PROBABLY still salvage some data from the broken domU. Try
>> shutting it down, and mount it again on dom0. Sometimes fsck will find
>> recovered inodes in /lost+found, so perhaps some of your data is still
>> there.
>
> Yes, I'm going to try this and see how far I can get.
>
>> BTW, VolGroup00 IS the name of domU's VG right? It's not dom0's VG?
>> Cause if it were dom0's you might have more problems ahead.
>
> no, the hostnode's LVM has been renamed to /dev/data/root, /dev/home/swap &
> /dev/data/home for this very reason

Just as a matter of interest, the number of recovered files in /mnt/cpanel/lost+found/ (this is the mounted VolGroup00 partition) is 32536.

And the files all look like this:

-rw-r----- 1 root 32046 94816 Feb 12 10:28 #1865704
-rw-r--r-- 1 root root     91 Feb 12 10:28 #1865705
-rw-r----- 1 root 32052 94816 Feb 12 10:28 #1865707
-rw-r----- 1 root 32022   901 Feb 12 10:28 #1865709
-rw-r----- 1 root 32052 94816 Feb 12 10:28 #1865710
-rw-r----- 1 root 32013 94816 Feb 12 10:28 #1865711
-rw-r----- 1 root 32013 94816 Feb 12 10:28 #1865713
-rw-r--r-- 1 root root     91 Feb 12 10:28 #1865714
-rw-r----- 1 root 32037 94816 Feb 12 10:28 #1865715
-rw-r----- 1 root   506 94816 Feb 12 10:44 #1865716
-rw-r----- 1 root 32037  6202 Feb 12 10:28 #1865717
-rw-r--r-- 1 root root     92 Feb 12 10:44 #1865718
-rw-r--r-- 1 root root    105 Feb 12 10:44 #1865719
-rw-r----- 1 root 32029 94816 Feb 12 10:44 #1865721
-rw-r----- 1 root 32029 94816 Feb 12 10:44 #1865722
-rw-r----- 1 root 32029 94816 Feb 12 10:44 #1865723
[...]
-rw-r----- 1 root 32029 94816 Feb 12 16:47 #1865795
[root@xen cpanel]# ll lost+found/ | wc -l

What can I do with these files, apart from deleting them?
Ciro Iriarte
2009-May-10 19:08 UTC
Re: [Xen-users] Re: domU corrupt after server crash, help needed trying to recover domU
2009/5/10 Rudi Ahlers <rudiahlers@gmail.com>:
>
>
> On Sun, May 10, 2009 at 9:38 AM, Rudi Ahlers <rudiahlers@gmail.com> wrote:
>>
>>
>> On Sat, May 9, 2009 at 5:19 AM, Fajar A. Nugraha <fajar@fajar.net> wrote:
>>>
>>> On Sat, May 9, 2009 at 5:42 AM, Rudi Ahlers <rudiahlers@gmail.com> wrote:
>>> > Hi Fajar,
>>> >
>>> > I got the commands via google search, so I didn't know that losetup was only
>>> > meant for file-backed storage.
>>>
>>> If it's a block device (LVM, partition, etc.) you can skip losetup and
>>> go directly to
>>> kpartx -av /dev/data/hfserver2
>>
>> Really? Cool, now I've learned something :)
>>
>>>
>>> > Unfortunately there's no backups :(
>>>
>>> Ouch. Sorry to hear that.
>>> So that makes it what ... your second corruption?
>>> On my environment FS corruption is USUALLY because one of these :
>>> - human error (like the admin mounting the same block device twice on
>>> different servers). This usually happens on shared-storage systems
>>> (SAN, NAS, etc).
>>> - SAN error (like when it got temporarily disconnected, and then
>>> reconnected again)
>>> - server hardware error (bad memory, bad disk controller, etc.)
>>
>> Yes, but on a different server, different client, different reason. The
>> only thing that's the same is the IDC, and the server setup. Both have
>> CentOS on the host node, and runs cPanel on the domU VPS's. I'd love to
>> setup a shared NAS and have 2 servers shared the data from there, but funds
>> are a bit limited :(
>>
>>>
>>> I suggest you check all three to make sure corruption doesn't happen
>>> again. If both corruption are on the same hardware, then most likely
>>> the server hardware is bad.
>>
>> The problem is due to the RAM. The ECC (non buffered) Kingston Memory
>> modules don't work as expected on the Dell PE860 platform. Strangely when I
>> put normal desktop RAM into the server, it worked fine. So, I'm taking the
>> RAM back to the supplier on Monday.
>>
>>>
>>> You could PROBABLY still salvage some data from the broken domU. Try
>>> shutting it down, and mount it again on dom0. Sometimes fsck will find
>>> recovered inodes in /lost+found, so perhaps some of your data is still
>>> there.
>>
>> Yes, I'm going to try this and see how far I can get.
>>
>>>
>>> BTW, VolGroup00 IS the name of domU's VG right? It's not dom0's VG?
>>> Cause if it were dom0's you might have more problems ahead.
>>
>> no, the hostnode's LVM has been renamed to /dev/data/root, /dev/home/swap
>> & /dev/data/home for this very reason
>>
>>>
>>> Regards,
>>>
>>> Fajar
>>
>
>
> Just as matter of interest, the amount of recovered files in
> /mnt/cpanel/lost+found/ (this is the mounted VolGroup00 partition) is 32536
>
> And the files all looks like this:
>
> -rw-r----- 1 root 32046 94816 Feb 12 10:28 #1865704
> -rw-r--r-- 1 root root 91 Feb 12 10:28 #1865705
> -rw-r----- 1 root 32052 94816 Feb 12 10:28 #1865707
> -rw-r----- 1 root 32022 901 Feb 12 10:28 #1865709
> -rw-r----- 1 root 32052 94816 Feb 12 10:28 #1865710
> -rw-r----- 1 root 32013 94816 Feb 12 10:28 #1865711
> -rw-r----- 1 root 32013 94816 Feb 12 10:28 #1865713
> -rw-r--r-- 1 root root 91 Feb 12 10:28 #1865714
> -rw-r----- 1 root 32037 94816 Feb 12 10:28 #1865715
> -rw-r----- 1 root 506 94816 Feb 12 10:44 #1865716
> -rw-r----- 1 root 32037 6202 Feb 12 10:28 #1865717
> -rw-r--r-- 1 root root 92 Feb 12 10:44 #1865718
> -rw-r--r-- 1 root root 105 Feb 12 10:44 #1865719
> -rw-r----- 1 root 32029 94816 Feb 12 10:44 #1865721
> -rw-r----- 1 root 32029 94816 Feb 12 10:44 #1865722
> -rw-r----- 1 root 32029 94816 Feb 12 10:44 #1865723
> -rw-r-0m#1865794
> -rw-r----- 1 root 32029 94816 Feb 12 16:47 #1865795
> [root@xen cpanel]# ll lost+found/ | wc -l
>
>
>
> What can I do with these files, apart from deleting them?
>
> --
> Kind Regards
> Rudi Ahlers
> CEO, SoftDux Hosting
> Web: http://www.SoftDux.com
> Office: 087 805 9573
> Cell: 082 554 7532
>

Those are from a fsck execution.
Your only hope is to run the "file" command on each of them to guess what kind of file each one is. If they are not corrupt, you can copy them back to where they belong, but there's no automatic procedure for this...

Regards,

--
Ciro Iriarte
http://cyruspy.wordpress.com
--
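
For reference, the "run file on each of them" suggestion could be scripted roughly as below. This is only a sketch: classify_lost_found is a made-up helper name, the report file name is an assumption, and it relies on the -maxdepth option of GNU/BSD find. fsck names the entries in lost+found after their inode numbers, so the original names and paths are gone; the script only helps to group files by detected type and does not put anything back.

```shell
#!/bin/sh
# Sketch only: classify_lost_found is a hypothetical helper, not part of fsck or e2fsprogs.
# It prints "<detected type><TAB><path>" for each regular file directly inside DIR,
# sorted so that files of the same type end up next to each other.
# It only reads the files; nothing is moved, renamed or deleted.
classify_lost_found() {
    dir=$1
    # -maxdepth 1 keeps find out of recovered directories (GNU/BSD find extension).
    find "$dir" -maxdepth 1 -type f | while IFS= read -r path; do
        # file -b prints only the detected type, without the file name.
        printf '%s\t%s\n' "$(file -b "$path")" "$path"
    done | sort
}

# Example (the report file name is an assumption):
#   classify_lost_found /mnt/cpanel/lost+found > /root/lost_found_types.txt
#   cut -f1 /root/lost_found_types.txt | sort | uniq -c | sort -rn   # count per type
```

Writing the report to a file first seems the safer choice with 32536 entries: the list can be reviewed and filtered by type before anything is copied back by hand.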