Andrew
2010-Mar-11 11:32 UTC
[zfs-discuss] ZFS - VMware ESX --> vSphere Upgrade : Zpool Faulted
Hi All,

We recently upgraded our Solaris 10 servers from ESX 3.5 to vSphere, and in the process the zpools appeared to become FAULTED even though we did not touch the OS. We detached the physical RDM (1TB) from the virtual machine and attached it to another identical virtual machine to see if that fixed the problem, but unfortunately zpool status and zpool import find nothing, even though "format" and "format -e" display the 1TB volume.

Are there any known problems, or ways to reimport a supposedly lost/confused zpool on a new host?

Thanks

Andrew
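For reference, the usual first steps when a pool has moved between hosts look like this; "data" is the pool name mentioned later in the thread, and the flags are standard zpool options:

  zpool import                 # scan /dev/dsk for importable pools
  zpool import -d /dev/dsk     # search an explicit device directory
  zpool import -f data         # force the import if the pool was never exported

If the device labels were intact, even an un-exported pool should show up in the first scan, so a completely empty listing points at damaged or hidden labels rather than a simple hostid mismatch.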
Andrew
2010-Mar-11 13:27 UTC
[zfs-discuss] ZFS - VMware ESX --> vSphere Upgrade : Zpool Faulted
Ok,

The fault appears to be unrelated to the move to vSphere, as we've now moved the host back to ESX 3.5, from whence it came, and the problem still exists.

Looks to me like the fault occurred as a result of a reboot.

Any help and advice would be greatly appreciated.
Ross Walker
2010-Mar-11 14:45 UTC
[zfs-discuss] ZFS - VMware ESX --> vSphere Upgrade : Zpool Faulted
On Mar 11, 2010, at 8:27 AM, Andrew <acmcomputers at hotmail.com> wrote:

> Ok,
>
> The fault appears to be unrelated to the move to vSphere, as we've now
> moved the host back to ESX 3.5, from whence it came, and the problem
> still exists.
>
> Looks to me like the fault occurred as a result of a reboot.
>
> Any help and advice would be greatly appreciated.

It appears the RDM might have had something to do with this.

Try a different RDM setting than physical, such as virtual. Or try mounting the disk via an iSCSI initiator inside the VM instead of via RDM.

If you have tried fiddling with the ESX RDM options and it still doesn't work:

Inside the Solaris VM, dump the first 128K of the disk to a file using dd, then use a hex editor to find out which LBA contains the MBR. It should be LBA 0, but I suspect it will be offset. The GPT will then run from MBR LBA + 1 through MBR LBA + 33. Use the Wikipedia entry on the MBR; there is a unique signature in there to search for. There is also a backup GPT in the last 33 sectors of the disk.

Once you find the offset, it is best to just dump those 34 sectors (0-33) to another file. Edit each MBR and GPT entry to take the offset into account, then copy the first 34 sectors of the file into the first 34 sectors of the disk, and the last 33 sectors of the file into the last 33 sectors of the disk. Rescan, and hopefully it will see the disk.

If the offset is in the other direction, then it means the disk has been padded, probably with metainfo, and you will need to get rid of the RDM and use the iSCSI initiator in the Solaris VM to mount the volume. See how the first 34 sectors look, and if they are damaged use the backup GPT to reconstruct the primary GPT and recreate the MBR.

-Ross
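A sketch of the dump-and-restore sequence Ross outlines, assuming 512-byte sectors; the device path, file names, and total sector count below are placeholders, not values from this thread:

  # Dump the first 128K (sectors 0-255); the MBR and primary GPT
  # normally live in sectors 0-33 of this region.
  dd if=/dev/rdsk/c8t4d0p0 of=/var/tmp/head.img bs=512 count=256

  # Dump the backup GPT from the last 33 sectors. SECTORS is the
  # total sector count reported by format/prtvtoc (hypothetical here).
  SECTORS=1953525168
  dd if=/dev/rdsk/c8t4d0p0 of=/var/tmp/tail.img bs=512 \
      skip=`expr $SECTORS - 33` count=33

  # After fixing the offsets in a hex editor, write the repaired
  # 34-sector label back. Destructive -- double-check the target.
  dd if=/var/tmp/head.fixed of=/dev/rdsk/c8t4d0p0 bs=512 count=34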
Andrew
2010-Mar-11 16:20 UTC
[zfs-discuss] ZFS - VMware ESX --> vSphere Upgrade : Zpool Faulted
Hi Ross,

Thanks for your advice. I've tried presenting as virtual and as physical, but sadly to no avail. I'm guessing that if it was going to work, a quick zpool import or zpool status would at the very least show me the "data" pool that's gone missing.

The RDM is from an FC SAN, so unfortunately I can't rely on connecting with an iSCSI initiator within the OS to attach the volume. I guess I have to dive straight into checking the MBR at this stage. I'll no doubt need some help here, so please forgive me if I fall at the first hurdle.

Kind Regards

Andrew
Andrew
2010-Mar-11 17:31 UTC
[zfs-discuss] ZFS - VMware ESX --> vSphere Upgrade : Zpool Faulted
Hi Ross,

Ok - as a Solaris newbie, I'm going to need your help.

format produces the following:

  c8t4d0 (VMware-Virtualdisk-1.0 cyl 65268 alt 2 hd 255 sec 126)
     /pci at 0,0/pci15ad,1976 at 10/sd at 4,0

What dd command do I need to run to reference this disk? I've tried /dev/rdsk/c8t4d0 and /dev/dsk/c8t4d0, but neither of them is valid.

Kind Regards

Andrew
Ross Walker
2010-Mar-11 23:31 UTC
[zfs-discuss] ZFS - VMware ESX --> vSphere Upgrade : Zpool Faulted
On Mar 11, 2010, at 12:31 PM, Andrew <acmcomputers at hotmail.com> wrote:

> Hi Ross,
>
> Ok - as a Solaris newbie, I'm going to need your help.
>
> format produces the following:
>
>   c8t4d0 (VMware-Virtualdisk-1.0 cyl 65268 alt 2 hd 255 sec 126)
>      /pci at 0,0/pci15ad,1976 at 10/sd at 4,0
>
> What dd command do I need to run to reference this disk? I've tried
> /dev/rdsk/c8t4d0 and /dev/dsk/c8t4d0, but neither of them is valid.

dd if=/dev/rdsk/c8t4d0p0 of=~/disk.out bs=512 count=256

That should get you the first 128K.

As for a hex editor, try bvi; it is like vi but for binary files and supports most of the vi commands. Search for the boot signature, which should be bytes 511 and 512 of the MBR: the bytes 55 AA (the 16-bit value 0xAA55 stored little endian).

There is also the possibility that these sectors were wiped somehow, or even cached in VMware and lost during a VM reset.

-Ross
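A quick way to check the dump without a hex editor, assuming the labels sit at their normal positions (file name as in Ross's command; the p0 suffix is the x86 whole-disk device, which is why the bare c8t4d0 paths were rejected):

  # Bytes 510-511 (0-based) should be the MBR boot signature 55 aa.
  dd if=$HOME/disk.out bs=1 skip=510 count=2 2>/dev/null | od -t x1

  # Byte 512 onward should start with the ASCII GPT header signature.
  dd if=$HOME/disk.out bs=1 skip=512 count=8 2>/dev/null
  # expected output: EFI PART

If neither string appears at those offsets, search the whole dump for them; the distance between where they are found and where they belong is the offset Ross mentions.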
Andrew
2010-Mar-16 08:08 UTC
[zfs-discuss] ZFS - VMware ESX --> vSphere Upgrade : Zpool Faulted
Hi again,

Out of interest, could this problem have been avoided if the ZFS configuration didn't rely on a single disk, i.e. RAIDZ etc.?

Thanks
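For reference, redundant layouts of the kind Andrew asks about are chosen at pool creation; the device names here are hypothetical:

  # Three-disk single-parity RAIDZ
  zpool create data raidz c8t4d0 c8t5d0 c8t6d0

  # Or a two-way mirror
  zpool create data mirror c8t4d0 c8t5d0

Redundancy across separate LUNs lets ZFS self-heal a damaged device, though label damage applied identically to every underlying device (as a hypervisor change might do) could still fault the pool.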
Andrew
2010-Mar-17 14:24 UTC
[zfs-discuss] ZFS - VMware ESX --> vSphere Upgrade : Zpool Faulted
Hi all,

Great news - by attaching an identically sized RDM to the server and then grabbing its first 128K using the command you specified, Ross:

  dd if=/dev/rdsk/c8t4d0p0 of=~/disk.out bs=512 count=256

we then proceeded to inject this into the faulted RDM, and lo and behold, the volume recovered!

  dd if=~/disk.out of=/dev/rdsk/c8t5d0p0 bs=512 count=256

Thanks for your help!
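A final sanity check after label surgery like this would look something like the following ("data" being the pool named earlier in the thread):

  zpool import data      # re-import the recovered pool
  zpool status -v data   # confirm the vdev is ONLINE
  zpool scrub data       # walk every block and verify checksums

A scrub is worthwhile here: the labels were rebuilt from a different disk, so letting ZFS read and checksum everything is the only way to be sure the data itself survived intact.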