Freddie Cash
2011-Apr-29 20:23 UTC
[zfs-discuss] Still no way to recover a "corrupted" pool
Is there any way, yet, to import a pool with corrupted space_map errors, or "zio->io_type != ZIO_TYPE_WRITE" assertions?

I have a pool comprised of 4 raidz2 vdevs of 6 drives each. I have almost 10 TB of data in the pool (3 TB of actual disk space used, thanks to dedup and compression). While testing various failure modes, I have managed to corrupt the pool to the point where it won't import. So much for being bulletproof. :(

If I try to import the pool normally, it gives corrupted space_map errors.

If I try to "import -F" the pool, it complains that "zio->io_type != ZIO_TYPE_WRITE".

I've also tried the above with "-o readonly=on" and "-R some/other/root" variations.

There's also no zpool.cache file anywhere to be found, and creating a blank file doesn't help.

Does this mean that a 10 TB pool can be lost due to a single file being corrupted, or a single piece of pool metadata being corrupted? And that there are *still* no recovery tools for situations like this?

Running ZFSv28 on 64-bit FreeBSD 8-STABLE.

For the curious, the failure mode that caused this: rebooting while 8 simultaneous rsyncs were running. They were not killed by the shutdown process for some reason, which prevented 8 ZFS filesystems from being unmounted, which prevented the pool from being exported (even though I have "zfs unmount -f" and "zpool export -f" fail-safes), which locked up the shutdown process and required a power reset. :(

--
Freddie Cash
fjwcash at gmail.com
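For reference, the import attempts described above look roughly like the following; the pool name "storage" and the alternate root path are placeholders, not taken from the post:

# plain import -- fails with corrupted space_map errors
# zpool import storage
# rewind/recovery import -- trips the zio->io_type != ZIO_TYPE_WRITE assertion
# zpool import -F storage
# read-only and alternate-root variations, same results
# zpool import -F -o readonly=on storage
# zpool import -F -o readonly=on -R /altroot storage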
Freddie Cash
2011-Apr-29 23:21 UTC
[zfs-discuss] Still no way to recover a "corrupted" pool
On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash <fjwcash at gmail.com> wrote:
> Is there any way, yet, to import a pool with corrupted space_map
> errors, or "zio->io_type != ZIO_TYPE_WRITE" assertions?
>
> I have a pool comprised of 4 raidz2 vdevs of 6 drives each. I have
> almost 10 TB of data in the pool (3 TB of actual disk space used,
> thanks to dedup and compression). While testing various failure modes,
> I have managed to corrupt the pool to the point where it won't import.
> So much for being bulletproof. :(
>
> If I try to import the pool normally, it gives corrupted space_map errors.
>
> If I try to "import -F" the pool, it complains that "zio->io_type !=
> ZIO_TYPE_WRITE".
>
> I've also tried the above with "-o readonly=on" and "-R
> some/other/root" variations.
>
> There's also no zpool.cache file anywhere to be found, and creating a
> blank file doesn't help.
>
> Does this mean that a 10 TB pool can be lost due to a single file
> being corrupted, or a single piece of pool metadata being corrupted?
> And that there are *still* no recovery tools for situations like this?
>
> Running ZFSv28 on 64-bit FreeBSD 8-STABLE.
>
> For the curious, the failure mode that caused this: rebooting while 8
> simultaneous rsyncs were running. They were not killed by the
> shutdown process for some reason, which prevented 8 ZFS filesystems
> from being unmounted, which prevented the pool from being exported
> (even though I have "zfs unmount -f" and "zpool export -f"
> fail-safes), which locked up the shutdown process and required a power
> reset.

Well, by commenting out the VERIFY line for zio->io_type != ZIO_TYPE_WRITE and compiling a new kernel, I can import the pool, but only with -F and -o readonly=on. :( Trying to import it read-write gives dmu_free_range errors and panics the system.

So a kernel with that assertion commented out allows the pool to be imported read-only, while importing it read-write just hits a bunch of other dmu panics. :( :( :(

How can it be that after 28 pool format revisions and 5+ years of development, ZFS is still this brittle? I've found lots of threads from 2007 about this very issue, with "don't do that" and "it's not an issue" and "there's no need for a pool consistency checker" and other similar "head in the sand" responses. :( But there is still no way to prevent or fix this form of corruption.

It's great that I can get the pool to import read-only, so the data is still available. But that really doesn't help when I've already rebuilt this pool twice due to this issue.

--
Freddie Cash
fjwcash at gmail.com
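A rough outline of the workaround described above, for anyone following along. The pool name, kernel config name, and the exact source file carrying the failing VERIFY are assumptions, not stated in the thread; on FreeBSD 8 the ZFS sources live under /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/:

# edit the ZFS sources under /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/
# to comment out the failing VERIFY, then rebuild and install the kernel
# cd /usr/src
# make buildkernel KERNCONF=GENERIC
# make installkernel KERNCONF=GENERIC
# shutdown -r now
# after the reboot, the pool only comes in read-only, with rewind
# zpool import -F -o readonly=on storage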
Alexander J. Maidak
2011-Apr-30 00:00 UTC
[zfs-discuss] Still no way to recover a "corrupted" pool
On Fri, 2011-04-29 at 16:21 -0700, Freddie Cash wrote:
> On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash <fjwcash at gmail.com> wrote:
> > Is there any way, yet, to import a pool with corrupted space_map
> > errors, or "zio->io_type != ZIO_TYPE_WRITE" assertions?
> ...
> Well, by commenting out the VERIFY line for zio->io_type !=
> ZIO_TYPE_WRITE and compiling a new kernel, I can import the pool, but
> only with -F and -o readonly=on. :(
>
> It's great that I can get the pool to import read-only, so the data is
> still available. But that really doesn't help when I've already
> rebuilt this pool twice due to this issue.

Just curious, did you try an import or recovery with Solaris 11 Express build 151a? I expect it wouldn't have made a difference, but I'd be curious to know.

-Alex
Freddie Cash
2011-Apr-30 00:02 UTC
[zfs-discuss] Still no way to recover a "corrupted" pool
On Fri, Apr 29, 2011 at 5:00 PM, Alexander J. Maidak <ajmaidak at mchsi.com> wrote:
> On Fri, 2011-04-29 at 16:21 -0700, Freddie Cash wrote:
>> On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash <fjwcash at gmail.com> wrote:
>> > Is there any way, yet, to import a pool with corrupted space_map
>> > errors, or "zio->io_type != ZIO_TYPE_WRITE" assertions?
>> ...
>> Well, by commenting out the VERIFY line for zio->io_type !=
>> ZIO_TYPE_WRITE and compiling a new kernel, I can import the pool, but
>> only with -F and -o readonly=on. :(
>>
>> It's great that I can get the pool to import read-only, so the data is
>> still available. But that really doesn't help when I've already
>> rebuilt this pool twice due to this issue.
>
> Just curious, did you try an import or recovery with Solaris 11 Express
> build 151a? I expect it wouldn't have made a difference, but I'd be
> curious to know.

No, that's on the menu for next week: trying a couple of OpenSolaris, Solaris Express, and Nexenta LiveCDs to see if they make a difference.

--
Freddie Cash
fjwcash at gmail.com
Brandon High
2011-Apr-30 00:17 UTC
[zfs-discuss] Still no way to recover a "corrupted" pool
On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash <fjwcash at gmail.com> wrote:
> Running ZFSv28 on 64-bit FreeBSD 8-STABLE.

I'd suggest trying to import the pool into snv_151a (Solaris 11 Express), which is the reference and development platform for ZFS.

-B

--
Brandon High : bhigh at freaks.com
Freddie Cash
2011-May-16 20:55 UTC
[zfs-discuss] Still no way to recover a "corrupted" pool
On Fri, Apr 29, 2011 at 5:17 PM, Brandon High <bhigh at freaks.com> wrote:
> On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash <fjwcash at gmail.com> wrote:
>> Running ZFSv28 on 64-bit FreeBSD 8-STABLE.
>
> I'd suggest trying to import the pool into snv_151a (Solaris 11
> Express), which is the reference and development platform for ZFS.

It would not import in Solaris 11 Express. :( It could not even find any pools to import, even when using "zpool import -d /dev/dsk" or any other import commands. Most likely this is due to using a FreeBSD-specific method of labelling the disks.

I've since rebuilt the pool (a third time), this time using GPT partitions, GPT labels on the partitions, and those labels in the pool configuration. That should make it importable across OSes (FreeBSD, Solaris, Linux, etc.).

It's just frustrating that it's still possible to corrupt a pool in such a way that "nuke and pave" is the only solution, especially when this same assertion was discussed back in 2007 ... with no workaround or fix implemented four years later.

What's most frustrating is that this is the third time I've built this pool due to corruption like this, within three months. :(

--
Freddie Cash
fjwcash at gmail.com
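A minimal sketch of the GPT-label approach described above, using FreeBSD's gpart(8). The device names (da0, da1, ...), the label names, and the pool name are placeholders; the real layout in the thread is 4 x 6-disk raidz2:

# create a GPT scheme and a labelled freebsd-zfs partition on each disk
# gpart create -s gpt da0
# gpart add -t freebsd-zfs -l disk00 da0
# gpart create -s gpt da1
# gpart add -t freebsd-zfs -l disk01 da1
# ... repeat for the remaining disks ...
# build the pool from the /dev/gpt/ label devices instead of raw disk names
# zpool create storage raidz2 gpt/disk00 gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 gpt/disk05

Because the pool members are referenced by GPT labels rather than OS-specific device nodes, the same labels show up on any OS that understands GPT, which is what should make the pool portable.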
Brandon High
2011-May-16 21:46 UTC
[zfs-discuss] Still no way to recover a "corrupted" pool
On Mon, May 16, 2011 at 1:55 PM, Freddie Cash <fjwcash at gmail.com> wrote:
> It would not import in Solaris 11 Express. :( It could not even find any
> pools to import, even when using "zpool import -d /dev/dsk" or any
> other import commands. Most likely this is due to using a FreeBSD-specific
> method of labelling the disks.

I think someone solved this before by creating a directory and making symlinks to the correct partitions/slices on each disk. Then you can use 'zpool import -d /tmp/foo' to do the import. e.g.:

# mkdir /tmp/fbsd
# create a temp directory with links to the p0 partitions of the relevant disks
# ln -s /dev/dsk/c8t1d0p0 /tmp/fbsd/
# ln -s /dev/dsk/c8t2d0p0 /tmp/fbsd/
# ln -s /dev/dsk/c8t3d0p0 /tmp/fbsd/
# ln -s /dev/dsk/c8t4d0p0 /tmp/fbsd/
# zpool import -d /tmp/fbsd/ $POOLNAME

I've never used FreeBSD, so I can't offer any advice about which device name is correct or whether this will work. Posts from February 2010, "Import zpool from FreeBSD in OpenSolaris", indicate that you want p0.

> It's just frustrating that it's still possible to corrupt a pool in
> such a way that "nuke and pave" is the only solution, especially when

I'm not sure it was the only solution; it's just the one you followed.

> What's most frustrating is that this is the third time I've built this
> pool due to corruption like this, within three months. :(

You may have an underlying hardware problem, or there could be a bug in the FreeBSD implementation that you're tripping over.

-B

--
Brandon High : bhigh at freaks.com
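One way to check which device node actually carries the ZFS labels (given the p0-vs-slice uncertainty above) is zdb -l, which dumps the four vdev labels from a device; the device name here is just reused from the example above:

# zdb -l /dev/dsk/c8t1d0p0

If the labels print, that node is the one to symlink; if zdb reports it cannot unpack the labels, try the other partition or slice names for that disk.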
Dick Hoogendijk
2011-May-17 09:02 UTC
[zfs-discuss] Still no way to recover a "corrupted" pool
On 16-5-2011 22:55, Freddie Cash wrote:
> On Fri, Apr 29, 2011 at 5:17 PM, Brandon High <bhigh at freaks.com> wrote:
>> On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash <fjwcash at gmail.com> wrote:
>>> Running ZFSv28 on 64-bit FreeBSD 8-STABLE.
>> I'd suggest trying to import the pool into snv_151a (Solaris 11
>> Express), which is the reference and development platform for ZFS.
> It would not import in Solaris 11 Express. :( It could not even find any
> pools to import, even when using "zpool import -d /dev/dsk" or any
> other import commands. Most likely this is due to using a FreeBSD-specific
> method of labelling the disks.

That should not be the case. You either use ZFS or you don't.

> What's most frustrating is that this is the third time I've built this
> pool due to corruption like this, within three months. :(

Three times in three months is not normal. You should be looking for the cause in your hardware, IMHO. ZFS is very stable in my experience, but I must say I have always run it on Solaris (10/11), not FreeBSD. I once tried that combination, immediately ran into troubles I had never seen before, and dropped the ZFS/FreeBSD pairing. ;-)