Chris Murray
2011-Jan-04 07:18 UTC
[zfs-discuss] Single VDEV pool permanent and checksum errors after replace
Hi, I have some strange goings-on with my VM of Solaris Express 11, and I hope someone can help. It shares out other virtual machine files for use in ESXi 4.0 (it, too, runs in there).

I had two disks inside the VM - one for rpool and one for 'vmpool'. All was fine. vmpool has some deduped data. That was also fine. I added a Samsung SSD to the ESXi host, created a 512MB VMDK and a 20GB VMDK, and added them as log and cache devices, respectively. This also worked fine. At this point, the pool is made of c8t1d0 (data), c8t2d0 (log), c8t3d0 (cache).

I decide that, to add some redundancy, I'll add a mirrored virtual disk. At this point, it happens that the VMDK for this disk (c8t4d0) actually resides on the same physical disk as c8t1d0. The idea was to perform the logical split in Solaris Express first, deal with the IO penalty of writing everything twice to the same physical disk (even though Solaris thinks they're two separate ones), then move that VMDK onto a separate physical disk shortly afterwards. This should, in the short term, protect against bit-flips and small errors on the single physical disk that ESXi has, until a second one is installed.

I have a think about capacity, though, and decide I'd prefer the mirror to be of c8t4d0 and c8t5d0 instead. So, it seems I want to go from a single disk (c8t1d0) to a mirror of c8t4d0 and c8t5d0. In my mind, that's a 'zpool replace' onto c8t4d0 and a 'zpool attach' of c8t5d0. I kick off the replace, and all goes fine. Part way through I try to do the attach as well, but am politely told I can't. The replace itself completed without complaint; however, on completion, virtual machines whose disks are inside 'vmpool' start hanging, checksum errors rapidly start counting up, and since there's no redundancy, nothing can be done to repair them:

  pool: vmpool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: resilvered 48.2G in 2h53m with 0 errors on Mon Jan  3 20:45:49 2011
config:

        NAME        STATE     READ WRITE CKSUM
        vmpool      DEGRADED     0     0 25.6K
          c8t4d0    DEGRADED     0     0 25.6K  too many errors
        logs
          c8t2d0    ONLINE       0     0     0
        cache
          c8t3d0    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /vmpool/nfs/duck/duck_1-flat.vmdk
        /vmpool/nfs/panda/template.xppro-flat.vmdk

At this point, I remove disk c8t1d0 and snapshot the entire VM in case I do any further damage. This leads to my first two questions:

#1 - are there any suspicions as to what's happened here? How come the resilver completed fine, but now there are checksum errors on the replacement disk? It does reside on the same physical disk, after all. Could this be something to do with me attempting the attach during the replace?

#2 - in my mind, c8t1d0 contains the state of the pool just prior to the cutover to c8t4d0. Is there any way I can get this back, and scrap the contents of c8t4d0? A 'zpool import -D' is fruitless, but I imagine there's some way of tricking Solaris into seeing c8t1d0 as a single-disk pool again?
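For reference, the replace and attempted attach described above were along these lines (device names as listed; quoted from memory rather than from shell history, so the exact form of the attach is my best recollection):

    zpool replace vmpool c8t1d0 c8t4d0    # start moving the data onto the new VMDK
    zpool attach vmpool c8t4d0 c8t5d0     # tried part way through; refused while the replace was still resilvering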
Now that I've snapshotted the VM and have a sort of safety net, I run a scrub, which unsurprisingly unearths checksum errors and lists all of the files which have problems:

  pool: vmpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0h30m with 95 errors on Mon Jan  3 21:47:25 2011
config:

        NAME        STATE     READ WRITE CKSUM
        vmpool      ONLINE       0     0   190
          c8t4d0    ONLINE       0     0   190
        logs
          c8t2d0    ONLINE       0     0     0
        cache
          c8t3d0    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /vmpool/nfs/duck/duck-flat.vmdk
        /vmpool/nfs/duck/Windows Server 2003 Standard Edition.nvram
        /vmpool/nfs/duck/duck_1-flat.vmdk
        /vmpool/nfs/eagle/eagle-flat.vmdk
        /vmpool/nfs/eagle/eagle_1-flat.vmdk
        /vmpool/nfs/eagle/eagle_2-flat.vmdk
        /vmpool/nfs/eagle/eagle_3-flat.vmdk
        /vmpool/nfs/eagle/eagle_5-flat.vmdk
        /vmpool/nfs/panda/Windows XP Professional.nvram
        /vmpool/nfs/panda/panda-flat.vmdk
        /vmpool/nfs/panda/template.xppro-flat.vmdk

I 'zpool clear vmpool', power on one of the VMs, and the checksum count quickly reaches 970.

#3 - why would this be the case? I thought the purpose of a scrub was to traverse all blocks, read them, and unearth problems? I'm wondering why these 970 errors weren't found in the scrub?

I power off the VM and perform another scrub. This time, 94 errors:

  pool: vmpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0h33m with 94 errors on Mon Jan  3 22:27:30 2011
config:

        NAME        STATE     READ WRITE CKSUM
        vmpool      ONLINE       0     0 1.13K
          c8t4d0    ONLINE       0     0 1.13K
        logs
          c8t2d0    ONLINE       0     0     0
        cache
          c8t3d0    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /vmpool/nfs/duck/duck-flat.vmdk
        /vmpool/nfs/duck/duck_1-flat.vmdk
        /vmpool/nfs/eagle/eagle-flat.vmdk
        /vmpool/nfs/eagle/eagle_1-flat.vmdk
        /vmpool/nfs/eagle/eagle_2-flat.vmdk
        /vmpool/nfs/eagle/eagle_3-flat.vmdk
        /vmpool/nfs/eagle/eagle_5-flat.vmdk
        /vmpool/nfs/panda/Windows XP Professional.nvram
        /vmpool/nfs/panda/panda-flat.vmdk
        /vmpool/nfs/panda/template.xppro-flat.vmdk

I then set the failmode of the pool to continue, in the hope that while ZFS thinks there are errors, the files would still be accessible over NFS and ESXi wouldn't care about them. Unsurprisingly, the VMs still don't boot.

Can anyone help? I see the instructions on restoring files, but I'm quite surprised that a replace seems to have induced this problem. The last time I saw checksum problems was when I had a load of SATA disks behind USB controllers and I suffered power loss part way through a replace, so I can understand there's the potential to get into a mess. On this occasion there wasn't any power loss, and the replace itself reported success.
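For completeness, the clear, scrubs and failmode change above were simply (again from memory):

    zpool clear vmpool                    # reset the error counters before powering a VM back on
    zpool scrub vmpool                    # the scrubs described above
    zpool set failmode=continue vmpool    # hoping NFS would keep serving the files despite the errors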
Thank you in advance, Chris
Edward Ned Harvey
2011-Jan-04 13:13 UTC
[zfs-discuss] Single VDEV pool permanent and checksum errors after replace
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Chris Murray
>
> I have some strange goings-on with my VM of Solaris Express 11, and I
> hope someone can help.
>
> It shares out other virtual machine files for use in ESXi 4.0 (it,
> too, runs in there)

The first thing I'm noticing is that you're running sol11express inside ESXi 4.0. This is an unsupported configuration, and in my personal experience, whenever you run an unsupported OS as either the host or guest of any virtualization (VMware or other), the end result is random errors and general instability. Maybe that's not the problem for you, but I would certainly consider it suspicious.

> So, it seems I want to go from a single disk (c8t1d0) to a mirror of
> c8t4d0 and c8t5d0. In my mind, that's a 'zpool replace' onto c8t4d0
> and a 'zpool attach' of c8t5d0. I kick off the replace, and all goes
> fine. Part way through I try to do the attach as well, but am
> politely told I can't.

This also might not be the cause of your problem, but you should probably have done the attach first, waited for it to complete, run a scrub for good measure, and then done the replace (there's a rough sketch at the end of this message). In fact, I am surprised to find out it's even POSSIBLE to do a replace on a pool that has only one disk. I didn't know you could do that until now.

> The replace itself completed without complaint; however, on
> completion, virtual machines whose disks are inside 'vmpool' start
> hanging, checksum errors rapidly start counting up, and since there's
> no redundancy, nothing can be done to repair them.

You did the replace, and it wrote to the new disk without reading anything back from it, so there was no way it could detect checksum errors during the replace. After the replace completed, it started reading the data that had just been written, and upon read, it discovered checksum mismatches. You will need to go back to your original disk from before the replace.

> #1 - are there any suspicions as to what's happened here? How come
> the resilver completed fine, but now there are checksum errors on the
> replacement disk? It does reside on the same physical disk, after
> all. Could this be something to do with me attempting the attach
> during the replace?

Well - even though the new VMDK exists on the same physical disk, the fact is, the new VMDK is reporting checksum errors. You had better consider the possibilities that (a) the disk is actually experiencing hardware failure and you should back up as soon as possible, or (b) you're running into an unsupported virtual hardware glitch, as I hinted above.

> #2 - in my mind, c8t1d0 contains the state of the pool just prior to
> the cutover to c8t4d0. Is there any way I can get this back, and
> scrap the contents of c8t4d0? A 'zpool import -D' is fruitless, but I
> imagine there's some way of tricking Solaris into seeing c8t1d0 as a
> single-disk pool again?

Good question. I am not sure precisely what "zpool replace" does. Maybe somebody else can answer this. After doing a "zpool replace", is it possible to move the old disk to a new system and import the pool? Or is the pool permanently removed from the old disk? I would boot into command-line mode from the Solaris CD, with only that one disk attached, and then try the "zpool import" ... If that works, you know you have something. And if it doesn't work, I don't know what to tell you. You might be hosed.
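A rough sketch of that import attempt from the CD environment (illustrative only - whether the label on c8t1d0 is still usable after the replace is exactly the open question):

    zpool import                  # scan /dev/dsk and list any importable pools
    zpool import -d /dev/dsk      # the same, with the search directory spelled out
    zpool import -f vmpool        # force the import if the pool is listed but looks in use elsewhere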
> Now that I've snapshotted the VM and have a sort of safety net, I run
> a scrub, which unsurprisingly unearths checksum errors and lists all
> of the files which have problems:

Because we're talking about ZFS, you should probably specify that you created a VMware snapshot of the machine. You're not talking about a ZFS snapshot, are you?

> I 'zpool clear vmpool', power on one of the VMs, and the checksum
> count quickly reaches 970.
>
> #3 - why would this be the case? I thought the purpose of a scrub was
> to traverse all blocks, read them, and unearth problems? I'm
> wondering why these 970 errors weren't found in the scrub?

Checksum errors are not correctable when you have no redundancy. That means, although you ran the scrub and the clear, the problems have not been fixed. They will come back as soon as those blocks are read again.

> I power off the VM and perform another scrub. This time, 94 errors:

It is strange that the number of errors is lower the second time around. The one thing that's sure: you either have a hardware failure, or something that looks like one (such as a bug caused by an unsupported virtualization configuration).
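For the record, the attach-first ordering I mean would look roughly like this (same device names as your pool; a sketch, not something I've tested against your setup):

    zpool attach vmpool c8t1d0 c8t4d0     # mirror the existing disk first and let the resilver finish
    zpool scrub vmpool                    # verify both sides of the mirror agree
    zpool replace vmpool c8t1d0 c8t5d0    # only then swap the original disk out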
Chris Murray
2011-Jan-05 11:33 UTC
[zfs-discuss] Single VDEV pool permanent and checksum errors after replace
Hi Edward,

Thank you for the feedback. All makes sense. To clarify, yes, I snapshotted the VM within ESXi, not the filesystems within the pool.

Unfortunately, because of my misunderstanding of how ESXi snapshotting works, I'm now left without the option of investigating whether the replaced disk could be used to create a new pool. For anyone interested: I removed the c8t1d0 disk from the VM, snapshotted, messed around a little, removed the 'corrupt' disks, added c8t1d0 back in, and performed a 'zdb -l', which did show a disk of type 'replacing' with two children. That looked quite promising, but I wanted to wait until anyone had chipped in with suggestions about how to recover from the replaced disk, so I decided to look at the corrupt data again. I reverted back to the snapshot in ESXi, which brought back my corrupt disks (as you'd expect), but which unfortunately *deleted* (!?) the VMDK files which related to c8t1d0. Not a ZFS/Solaris issue of any kind, I know, but one to watch out for if anyone else is trying things out in this unsupported configuration.

Shame I can't look into getting data back from the 'good' virtual disk - that's probably something I'd like answered, so I might look into it again once I've put this matter to bed. In the meantime, I'll see what I can do with dd_rescue, or dd with 'noerror,sync', to produce some swiss-cheese VMDK files and see whether the content can be repaired. It's not the end of the world if they're gone, but I'd like to satisfy my own curiosity with this little exercise in recovery.

Thanks again for the input,
Chris
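The sort of thing I have in mind is below - a sketch only: the output path is made up, the block size is a guess, and conv=noerror,sync simply carries on past read errors, padding the unreadable stretches with zeros:

    dd if=/vmpool/nfs/duck/duck-flat.vmdk of=/recovered/duck-flat.vmdk \
       bs=128k conv=noerror,sync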
Edward Ned Harvey
2011-Jan-05 13:26 UTC
[zfs-discuss] Single VDEV pool permanent and checksum errors after replace
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Chris Murray
>
> Thank you for the feedback. All makes sense.

Sorry to hear about what's probably an unfortunate loss of a nonredundant disk...

One comment about etiquette, though: you changed the subject, so your reply is no longer threaded with the original thread. This makes it difficult for people to see this message and understand what you're talking about in context. Also, when you replied, you didn't quote any of what you were replying to. Maybe it doesn't matter this time, but it's useful to quote the relevant parts of what you're replying to, so people can see them without having to flip back and read old messages.

Please bear in mind that some people use this list via a web interface (which shows previous messages in the thread on the same page), but many people use email, where each message is saved separately... The previous messages in a thread are not displayed while reading the current message. So quoting is very important for readers to understand the context.
Chris Murray
2011-Jan-06 20:02 UTC
[zfs-discuss] Single VDEV pool permanent and checksum errors after replace
On 5 January 2011 13:26, Edward Ned Harvey
<opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:
> One comment about etiquette though:

I'll certainly bear your comments in mind in future, however I'm not sure what happened to the subject, as I used the interface at http://opensolaris.org/jive/, which I thought would keep the subject the same. Plus, my gmail account appears to have joined up my reply from the web interface with the original thread too. Anyhow, I do see your point about quoting, and will do so from now on.

For anyone wondering about the extent of the checksum problems in my VMDK files, they range from only 128KB worth in some to 640KB in others. Unfortunately it appears that the bad parts are in critical areas of the filesystem, but that's not a ZFS matter, so I'll see what can be done by way of repair with Windows/NTFS inside each affected VM. So whatever went wrong, it affected only a small amount of data.

Thanks again,
Chris
Chris Murray
2011-Jan-06 20:03 UTC
[zfs-discuss] Single VDEV pool permanent and checksum errors after replace
On 6 January 2011 20:02, Chris Murray <chrismurray84 at gmail.com> wrote:
> On 5 January 2011 13:26, Edward Ned Harvey
> <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:
>> One comment about etiquette though:
>
> I'll certainly bear your comments in mind in future, however I'm not
> sure what happened to the subject, as I used the interface at
> http://opensolaris.org/jive/, which I thought would keep the subject
> the same. Plus, my gmail account appears to have joined up my reply
> from the web interface with the original thread too. Anyhow, I do see
> your point about quoting, and will do so from now on.
>
> For anyone wondering about the extent of the checksum problems in my
> VMDK files, they range from only 128KB worth in some to 640KB in
> others. Unfortunately it appears that the bad parts are in critical
> areas of the filesystem, but that's not a ZFS matter, so I'll see
> what can be done by way of repair with Windows/NTFS inside each
> affected VM. So whatever went wrong, it affected only a small amount
> of data.
>
> Thanks again,
> Chris

I'll get the hang of this e-mail lark one of these days, I'm sure :-)