Chris Murray
2011-Jan-04 07:18 UTC
[zfs-discuss] Single VDEV pool permanent and checksum errors after replace
Hi, I have some strange goings-on with my VM of Solaris Express 11, and I hope someone can help. It shares out other virtual machine files for use in ESXi 4.0 (it, too, runs in there).

I had two disks inside the VM - one for rpool and one for 'vmpool'. All was fine. vmpool has some deduped data. That was also fine. I added a Samsung SSD to the ESXi host, created a 512MB VMDK and a 20GB VMDK, and added them as log and cache devices, respectively. This also worked fine. At this point, the pool is made of c8t1d0 (data), c8t2d0 (log), c8t3d0 (cache).

I decide that, to add some redundancy, I'll add a mirrored virtual disk. At this point, it happens that the VMDK for this disk (c8t4d0) actually resides on the same physical disk as c8t1d0. The idea was to perform the logical split in Solaris Express first, deal with the IO penalty of writing everything twice to the same physical disk (even though Solaris thinks they're two separate ones), then move that VMDK onto a separate physical disk shortly afterwards. This should, in the short term, protect against bit-flips and small errors on the single physical disk that ESXi has, until a second one is installed.

I have a think about capacity, though, and decide I'd prefer the mirror to be of c8t4d0 and c8t5d0 instead. So, it seems I want to go from a single disk (c8t1d0) to a mirror of c8t4d0 and c8t5d0. In my mind, that's a 'zpool replace' onto c8t4d0 and a 'zpool attach' of c8t5d0. I kick off the replace, and all goes fine. Part way through I try to do the attach as well, but am politely told I can't. The replace itself completed without complaint; however, on completion, virtual machines whose disks are inside 'vmpool' start hanging, checksum errors rapidly start counting up, and since there's no redundancy, nothing can be done to repair them:

  pool: vmpool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: resilvered 48.2G in 2h53m with 0 errors on Mon Jan  3 20:45:49 2011
config:

        NAME        STATE     READ WRITE CKSUM
        vmpool      DEGRADED     0     0 25.6K
          c8t4d0    DEGRADED     0     0 25.6K  too many errors
        logs
          c8t2d0    ONLINE       0     0     0
        cache
          c8t3d0    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /vmpool/nfs/duck/duck_1-flat.vmdk
        /vmpool/nfs/panda/template.xppro-flat.vmdk

At this point, I remove disk c8t1d0 and snapshot the entire VM in case I do any further damage. This leads to my first two questions:

#1 - are there any suspicions as to what's happened here? How come the resilver completed fine, but now there are checksum errors on the replacement disk? It does reside on the same physical disk, after all. Could this be something to do with me attempting the attach during the replace?

#2 - in my mind, c8t1d0 contains the state of the pool just prior to the cutover to c8t4d0. Is there any way I can get this back, and scrap the contents of c8t4d0? A 'zpool import -D' is fruitless, but I imagine there's some way of tricking Solaris into seeing c8t1d0 as a single-disk pool again?
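For reference, the replace and attempted attach described above were along these lines (device names as listed; quoted from memory rather than from shell history, so the exact form of the attach is my best recollection):

    zpool replace vmpool c8t1d0 c8t4d0    # start moving the data onto the new VMDK
    zpool attach vmpool c8t4d0 c8t5d0     # tried part way through; refused while the replace was still resilvering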
Now that I've snapshotted the VM and have a sort of safety net, I run a scrub, which unsurprisingly unearths checksum errors and lists all of the files which have problems:

  pool: vmpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0h30m with 95 errors on Mon Jan  3 21:47:25 2011
config:

        NAME        STATE     READ WRITE CKSUM
        vmpool      ONLINE       0     0   190
          c8t4d0    ONLINE       0     0   190
        logs
          c8t2d0    ONLINE       0     0     0
        cache
          c8t3d0    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /vmpool/nfs/duck/duck-flat.vmdk
        /vmpool/nfs/duck/Windows Server 2003 Standard Edition.nvram
        /vmpool/nfs/duck/duck_1-flat.vmdk
        /vmpool/nfs/eagle/eagle-flat.vmdk
        /vmpool/nfs/eagle/eagle_1-flat.vmdk
        /vmpool/nfs/eagle/eagle_2-flat.vmdk
        /vmpool/nfs/eagle/eagle_3-flat.vmdk
        /vmpool/nfs/eagle/eagle_5-flat.vmdk
        /vmpool/nfs/panda/Windows XP Professional.nvram
        /vmpool/nfs/panda/panda-flat.vmdk
        /vmpool/nfs/panda/template.xppro-flat.vmdk

I 'zpool clear vmpool', power on one of the VMs, and the checksum count quickly reaches 970.

#3 - why would this be the case? I thought the purpose of a scrub was to traverse all blocks, read them, and unearth problems? I'm wondering why these 970 errors weren't found in the scrub?

I power off the VM and perform another scrub. This time, 94 errors:

  pool: vmpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0h33m with 94 errors on Mon Jan  3 22:27:30 2011
config:

        NAME        STATE     READ WRITE CKSUM
        vmpool      ONLINE       0     0 1.13K
          c8t4d0    ONLINE       0     0 1.13K
        logs
          c8t2d0    ONLINE       0     0     0
        cache
          c8t3d0    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /vmpool/nfs/duck/duck-flat.vmdk
        /vmpool/nfs/duck/duck_1-flat.vmdk
        /vmpool/nfs/eagle/eagle-flat.vmdk
        /vmpool/nfs/eagle/eagle_1-flat.vmdk
        /vmpool/nfs/eagle/eagle_2-flat.vmdk
        /vmpool/nfs/eagle/eagle_3-flat.vmdk
        /vmpool/nfs/eagle/eagle_5-flat.vmdk
        /vmpool/nfs/panda/Windows XP Professional.nvram
        /vmpool/nfs/panda/panda-flat.vmdk
        /vmpool/nfs/panda/template.xppro-flat.vmdk

I then set the failmode of the pool to continue, in the hope that while ZFS thinks there are errors, the files would still be accessible over NFS and ESXi wouldn't care about them. Unsurprisingly, the VMs still don't boot.

Can anyone help? I see the instructions on restoring files, but I'm quite surprised that a replace seems to have induced this problem. The last time I saw checksum problems was when I had a load of SATA disks behind USB controllers and I suffered power loss part way through a replace, so I can understand there's the potential to get into a mess. On this occasion there wasn't any power loss, and the replace itself reported success.
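For completeness, the clear, scrubs and failmode change above were simply (again from memory):

    zpool clear vmpool                    # reset the error counters before powering a VM back on
    zpool scrub vmpool                    # the scrubs described above
    zpool set failmode=continue vmpool    # hoping NFS would keep serving the files despite the errors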
Thank you in advance, Chris
Edward Ned Harvey
2011-Jan-04 13:13 UTC
[zfs-discuss] Single VDEV pool permanent and checksum errors after replace
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Chris Murray
>
> I have some strange goings-on with my VM of Solaris Express 11, and I
> hope someone can help.
>
> It shares out other virtual machine files for use in ESXi 4.0 (it,
> too, runs in there)

The first thing I'm noticing is that you're running sol11express inside ESXi 4.0. This is an unsupported configuration, and in my personal experience, whenever you run an unsupported OS as either the host or guest of any virtualization (VMware or other), the end result is random errors and general instability. Maybe that's not the problem for you, but I would certainly consider it suspicious.

> So, it seems I want to go from a single disk (c8t1d0) to a mirror of
> c8t4d0 and c8t5d0. In my mind, that's a 'zpool replace' onto c8t4d0
> and a 'zpool attach' of c8t5d0. I kick off the replace, and all goes
> fine. Part way through I try to do the attach as well, but am
> politely told I can't.

This also might not be the cause of your problem, but you should probably have done the attach first, waited for it to complete, run a scrub for good measure, and then done the replace (there's a rough sketch at the end of this message). In fact, I am surprised to find out it's even POSSIBLE to do a replace on a pool that has only one disk. I didn't know you could do that until now.

> The replace itself completed without complaint; however, on
> completion, virtual machines whose disks are inside 'vmpool' start
> hanging, checksum errors rapidly start counting up, and since there's
> no redundancy, nothing can be done to repair them.

You did the replace, and it wrote to the new disk without reading anything back from it, so there was no way it could detect checksum errors during the replace. After the replace completed, it started reading the data that had just been written, and upon read, it discovered checksum mismatches. You will need to go back to your original disk from before the replace.

> #1 - are there any suspicions as to what's happened here? How come
> the resilver completed fine, but now there are checksum errors on the
> replacement disk? It does reside on the same physical disk, after
> all. Could this be something to do with me attempting the attach
> during the replace?

Well - even though the new VMDK exists on the same physical disk, the fact is, the new VMDK is reporting checksum errors. You had better consider the possibilities that (a) the disk is actually experiencing hardware failure and you should back up as soon as possible, or (b) you're running into an unsupported virtual hardware glitch, as I hinted above.

> #2 - in my mind, c8t1d0 contains the state of the pool just prior to
> the cutover to c8t4d0. Is there any way I can get this back, and
> scrap the contents of c8t4d0? A 'zpool import -D' is fruitless, but I
> imagine there's some way of tricking Solaris into seeing c8t1d0 as a
> single-disk pool again?

Good question. I am not sure precisely what "zpool replace" does. Maybe somebody else can answer this. After doing a "zpool replace", is it possible to move the old disk to a new system and import the pool? Or is the pool permanently removed from the old disk? I would boot into command-line mode from the Solaris CD, with only that one disk attached, and then try the "zpool import" ... If that works, you know you have something. And if it doesn't work, I don't know what to tell you. You might be hosed.
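A rough sketch of that import attempt from the CD environment (illustrative only - whether the label on c8t1d0 is still usable after the replace is exactly the open question):

    zpool import                  # scan /dev/dsk and list any importable pools
    zpool import -d /dev/dsk      # the same, with the search directory spelled out
    zpool import -f vmpool        # force the import if the pool is listed but looks in use elsewhere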
> Now that I've snapshotted the VM and have a sort of safety net, I run
> a scrub, which unsurprisingly unearths checksum errors and lists all
> of the files which have problems:

Because we're talking about ZFS, you should probably specify that you created a VMware snapshot of the machine. You're not talking about a ZFS snapshot, are you?

> I 'zpool clear vmpool', power on one of the VMs, and the checksum
> count quickly reaches 970.
>
> #3 - why would this be the case? I thought the purpose of a scrub was
> to traverse all blocks, read them, and unearth problems? I'm
> wondering why these 970 errors weren't found in the scrub?

Checksum errors are not correctable when you have no redundancy. That means, although you ran the scrub and the clear, the problems have not been fixed. They will come back as soon as those blocks are read again.

> I power off the VM and perform another scrub. This time, 94 errors:

It is strange that the number of errors is lower the second time around. The one thing that's sure: you either have a hardware failure, or something that looks like one (such as a bug caused by an unsupported virtualization configuration).
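For the record, the attach-first ordering I mean would look roughly like this (same device names as your pool; a sketch, not something I've tested against your setup):

    zpool attach vmpool c8t1d0 c8t4d0     # mirror the existing disk first and let the resilver finish
    zpool scrub vmpool                    # verify both sides of the mirror agree
    zpool replace vmpool c8t1d0 c8t5d0    # only then swap the original disk out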
Chris Murray
2011-Jan-05 11:33 UTC
[zfs-discuss] Single VDEV pool permanent and checksum errors after replace
Hi Edward,

Thank you for the feedback. All makes sense. To clarify, yes, I snapshotted the VM within ESXi, not the filesystems within the pool.

Unfortunately, because of my misunderstanding of how ESXi snapshotting works, I'm now left without the option of investigating whether the replaced disk could be used to create a new pool. For anyone interested: I removed the c8t1d0 disk from the VM, snapshotted, messed around a little, removed the 'corrupt' disks, added c8t1d0 back in, and performed a 'zdb -l', which did show a disk of type 'replacing' with two children. That looked quite promising, but I wanted to wait until anyone had chipped in with suggestions about how to recover from the replaced disk, so I decided to look at the corrupt data again. I reverted back to the snapshot in ESXi, which brought back my corrupt disks (as you'd expect), but which unfortunately *deleted* (!?) the VMDK files which related to c8t1d0. Not a ZFS/Solaris issue of any kind, I know, but one to watch out for if anyone else is trying things out in this unsupported configuration.

Shame I can't look into getting data back from the 'good' virtual disk - that's probably something I'd like answered, so I might look into it again once I've put this matter to bed. In the meantime, I'll see what I can do with dd_rescue, or dd with 'noerror,sync', to produce some swiss-cheese VMDK files and see whether the content can be repaired. It's not the end of the world if they're gone, but I'd like to satisfy my own curiosity with this little exercise in recovery.

Thanks again for the input,
Chris
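The sort of thing I have in mind is below - a sketch only: the output path is made up, the block size is a guess, and conv=noerror,sync simply carries on past read errors, padding the unreadable stretches with zeros:

    dd if=/vmpool/nfs/duck/duck-flat.vmdk of=/recovered/duck-flat.vmdk \
       bs=128k conv=noerror,sync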
Edward Ned Harvey
2011-Jan-05 13:26 UTC
[zfs-discuss] Single VDEV pool permanent and checksum errors after replace
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Chris Murray
>
> Thank you for the feedback. All makes sense.

Sorry to hear about what's probably an unfortunate loss of a nonredundant disk...

One comment about etiquette, though: you changed the subject, so your reply is no longer threaded with the original thread. This makes it difficult for people to see this message and understand what you're talking about in context. Also, when you replied, you didn't quote any of what you were replying to. Maybe it doesn't matter this time, but it's useful to quote the relevant parts of what you're replying to, so people can see them without having to flip back and read old messages.

Please bear in mind that some people use this list via a web interface (which shows previous messages in the thread on the same page), but many people use email, where each message is saved separately... The previous messages in a thread are not displayed while reading the current message. So quoting is very important for readers to understand the context.
Chris Murray
2011-Jan-06 20:02 UTC
[zfs-discuss] Single VDEV pool permanent and checksum errors after replace
On 5 January 2011 13:26, Edward Ned Harvey
<opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:
> One comment about etiquette though:

I'll certainly bear your comments in mind in future, however I'm not sure what happened to the subject, as I used the interface at http://opensolaris.org/jive/, which I thought would keep the subject the same. Plus, my gmail account appears to have joined up my reply from the web interface with the original thread too. Anyhow, I do see your point about quoting, and will do so from now on.

For anyone wondering about the extent of the checksum problems in my VMDK files, they range from only 128KB worth in some to 640KB in others. Unfortunately it appears that the bad parts are in critical areas of the filesystem, but that's not a ZFS matter, so I'll see what can be done by way of repair with Windows/NTFS inside each affected VM. So whatever went wrong, it affected only a small amount of data.

Thanks again,
Chris
Chris Murray
2011-Jan-06 20:03 UTC
[zfs-discuss] Single VDEV pool permanent and checksum errors after replace
On 6 January 2011 20:02, Chris Murray <chrismurray84 at gmail.com> wrote:
> On 5 January 2011 13:26, Edward Ned Harvey
> <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:
>> One comment about etiquette though:
>
> I'll certainly bear your comments in mind in future, however I'm not
> sure what happened to the subject, as I used the interface at
> http://opensolaris.org/jive/, which I thought would keep the subject
> the same. Plus, my gmail account appears to have joined up my reply
> from the web interface with the original thread too. Anyhow, I do see
> your point about quoting, and will do so from now on.
>
> For anyone wondering about the extent of the checksum problems in my
> VMDK files, they range from only 128KB worth in some to 640KB in
> others. Unfortunately it appears that the bad parts are in critical
> areas of the filesystem, but that's not a ZFS matter, so I'll see
> what can be done by way of repair with Windows/NTFS inside each
> affected VM. So whatever went wrong, it affected only a small amount
> of data.
>
> Thanks again,
> Chris

I'll get the hang of this e-mail lark one of these days, I'm sure :-)