Freddie Cash
2011-Apr-29 20:23 UTC
[zfs-discuss] Still no way to recover a "corrupted" pool
Is there any way, yet, to import a pool with corrupted space_map errors, or "zio->io_type != ZIO_TYPE_WRITE" assertions?

I have a pool comprised of 4 raidz2 vdevs of 6 drives each. I have almost 10 TB of data in the pool (3 TB of actual disk space used, thanks to dedup and compression). While testing various failure modes, I have managed to corrupt the pool to the point where it won't import. So much for being bulletproof. :(

If I try to import the pool normally, it gives corrupted space_map errors.

If I try to "import -F" the pool, it complains that "zio->io_type != ZIO_TYPE_WRITE".

I've also tried the above with "-o readonly=on" and "-R some/other/root" variations.

There's also no zpool.cache file anywhere to be found, and creating a blank file doesn't help.

Does this mean that a 10 TB pool can be lost due to a single file being corrupted, or a single piece of pool metadata being corrupted? And that there are *still* no recovery tools for situations like this?

Running ZFSv28 on 64-bit FreeBSD 8-STABLE.

For the curious, the failure mode that caused this: rebooting while 8 simultaneous rsyncs were running. They were not killed by the shutdown process for some reason, which prevented 8 ZFS filesystems from being unmounted, which prevented the pool from being exported (even though I have "zfs unmount -f" and "zpool export -f" fail-safes), which locked up the shutdown process and required a power reset. :(

--
Freddie Cash
fjwcash at gmail.com
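For reference, the import attempts described above look roughly like the following; the pool name "storage" and the alternate root path are placeholders, not taken from the post:

# plain import -- fails with corrupted space_map errors
# zpool import storage
# rewind/recovery import -- trips the zio->io_type != ZIO_TYPE_WRITE assertion
# zpool import -F storage
# read-only and alternate-root variations, same results
# zpool import -F -o readonly=on storage
# zpool import -F -o readonly=on -R /altroot storage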
Freddie Cash
2011-Apr-29 23:21 UTC
[zfs-discuss] Still no way to recover a "corrupted" pool
On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash <fjwcash at gmail.com> wrote:
> Is there any way, yet, to import a pool with corrupted space_map
> errors, or "zio->io_type != ZIO_TYPE_WRITE" assertions?
>
> I have a pool comprised of 4 raidz2 vdevs of 6 drives each. I have
> almost 10 TB of data in the pool (3 TB of actual disk space used,
> thanks to dedup and compression). While testing various failure modes,
> I have managed to corrupt the pool to the point where it won't import.
> So much for being bulletproof. :(
>
> If I try to import the pool normally, it gives corrupted space_map errors.
>
> If I try to "import -F" the pool, it complains that "zio->io_type !=
> ZIO_TYPE_WRITE".
>
> I've also tried the above with "-o readonly=on" and "-R
> some/other/root" variations.
>
> There's also no zpool.cache file anywhere to be found, and creating a
> blank file doesn't help.
>
> Does this mean that a 10 TB pool can be lost due to a single file
> being corrupted, or a single piece of pool metadata being corrupted?
> And that there are *still* no recovery tools for situations like this?
>
> Running ZFSv28 on 64-bit FreeBSD 8-STABLE.
>
> For the curious, the failure mode that caused this: rebooting while 8
> simultaneous rsyncs were running. They were not killed by the
> shutdown process for some reason, which prevented 8 ZFS filesystems
> from being unmounted, which prevented the pool from being exported
> (even though I have "zfs unmount -f" and "zpool export -f"
> fail-safes), which locked up the shutdown process and required a power
> reset.

Well, by commenting out the VERIFY line for zio->io_type != ZIO_TYPE_WRITE and compiling a new kernel, I can import the pool, but only with -F and -o readonly=on. :( Trying to import it read-write gives dmu_free_range errors and panics the system.

So a kernel with that assertion commented out allows the pool to be imported read-only, while importing it read-write just hits a bunch of other dmu panics. :( :( :(

How can it be that after 28 pool format revisions and 5+ years of development, ZFS is still this brittle? I've found lots of threads from 2007 about this very issue, with "don't do that" and "it's not an issue" and "there's no need for a pool consistency checker" and other similar "head in the sand" responses. :( But there is still no way to prevent or fix this form of corruption.

It's great that I can get the pool to import read-only, so the data is still available. But that really doesn't help when I've already rebuilt this pool twice due to this issue.

--
Freddie Cash
fjwcash at gmail.com
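A rough outline of the workaround described above, for anyone following along. The pool name, kernel config name, and the exact source file carrying the failing VERIFY are assumptions, not stated in the thread; on FreeBSD 8 the ZFS sources live under /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/:

# edit the ZFS sources under /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/
# to comment out the failing VERIFY, then rebuild and install the kernel
# cd /usr/src
# make buildkernel KERNCONF=GENERIC
# make installkernel KERNCONF=GENERIC
# shutdown -r now
# after the reboot, the pool only comes in read-only, with rewind
# zpool import -F -o readonly=on storage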
Alexander J. Maidak
2011-Apr-30 00:00 UTC
[zfs-discuss] Still no way to recover a "corrupted" pool
On Fri, 2011-04-29 at 16:21 -0700, Freddie Cash wrote:
> On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash <fjwcash at gmail.com> wrote:
> > Is there any way, yet, to import a pool with corrupted space_map
> > errors, or "zio->io_type != ZIO_TYPE_WRITE" assertions?
> ...
> Well, by commenting out the VERIFY line for zio->io_type !=
> ZIO_TYPE_WRITE and compiling a new kernel, I can import the pool, but
> only with -F and -o readonly=on. :(
>
> It's great that I can get the pool to import read-only, so the data is
> still available. But that really doesn't help when I've already
> rebuilt this pool twice due to this issue.

Just curious, did you try an import or recovery with Solaris 11 Express build 151a? I expect it wouldn't have made a difference, but I'd be curious to know.

-Alex
Freddie Cash
2011-Apr-30 00:02 UTC
[zfs-discuss] Still no way to recover a "corrupted" pool
On Fri, Apr 29, 2011 at 5:00 PM, Alexander J. Maidak <ajmaidak at mchsi.com> wrote:
> On Fri, 2011-04-29 at 16:21 -0700, Freddie Cash wrote:
>> On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash <fjwcash at gmail.com> wrote:
>> > Is there any way, yet, to import a pool with corrupted space_map
>> > errors, or "zio->io_type != ZIO_TYPE_WRITE" assertions?
>> ...
>> Well, by commenting out the VERIFY line for zio->io_type !=
>> ZIO_TYPE_WRITE and compiling a new kernel, I can import the pool, but
>> only with -F and -o readonly=on. :(
>>
>> It's great that I can get the pool to import read-only, so the data is
>> still available. But that really doesn't help when I've already
>> rebuilt this pool twice due to this issue.
>
> Just curious, did you try an import or recovery with Solaris 11 Express
> build 151a? I expect it wouldn't have made a difference, but I'd be
> curious to know.

No, that's on the menu for next week: trying a couple of OpenSolaris, Solaris Express, and Nexenta LiveCDs to see if they make a difference.

--
Freddie Cash
fjwcash at gmail.com
Brandon High
2011-Apr-30 00:17 UTC
[zfs-discuss] Still no way to recover a "corrupted" pool
On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash <fjwcash at gmail.com> wrote:
> Running ZFSv28 on 64-bit FreeBSD 8-STABLE.

I'd suggest trying to import the pool into snv_151a (Solaris 11 Express), which is the reference and development platform for ZFS.

-B

--
Brandon High : bhigh at freaks.com
Freddie Cash
2011-May-16 20:55 UTC
[zfs-discuss] Still no way to recover a "corrupted" pool
On Fri, Apr 29, 2011 at 5:17 PM, Brandon High <bhigh at freaks.com> wrote:
> On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash <fjwcash at gmail.com> wrote:
>> Running ZFSv28 on 64-bit FreeBSD 8-STABLE.
>
> I'd suggest trying to import the pool into snv_151a (Solaris 11
> Express), which is the reference and development platform for ZFS.

It would not import in Solaris 11 Express. :( It could not even find any pools to import, even when using "zpool import -d /dev/dsk" or any other import commands. Most likely this is due to using a FreeBSD-specific method of labelling the disks.

I've since rebuilt the pool (a third time), this time using GPT partitions, GPT labels on the partitions, and those labels in the pool configuration. That should make it importable across OSes (FreeBSD, Solaris, Linux, etc.).

It's just frustrating that it's still possible to corrupt a pool in such a way that "nuke and pave" is the only solution, especially when this same assertion was discussed back in 2007 ... with no workaround or fix implemented four years later.

What's most frustrating is that this is the third time I've built this pool due to corruption like this, within three months. :(

--
Freddie Cash
fjwcash at gmail.com
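A minimal sketch of the GPT-label approach described above, using FreeBSD's gpart(8). The device names (da0, da1, ...), the label names, and the pool name are placeholders; the real layout in the thread is 4 x 6-disk raidz2:

# create a GPT scheme and a labelled freebsd-zfs partition on each disk
# gpart create -s gpt da0
# gpart add -t freebsd-zfs -l disk00 da0
# gpart create -s gpt da1
# gpart add -t freebsd-zfs -l disk01 da1
# ... repeat for the remaining disks ...
# build the pool from the /dev/gpt/ label devices instead of raw disk names
# zpool create storage raidz2 gpt/disk00 gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 gpt/disk05

Because the pool members are referenced by GPT labels rather than OS-specific device nodes, the same labels show up on any OS that understands GPT, which is what should make the pool portable.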
Brandon High
2011-May-16 21:46 UTC
[zfs-discuss] Still no way to recover a "corrupted" pool
On Mon, May 16, 2011 at 1:55 PM, Freddie Cash <fjwcash at gmail.com> wrote:
> It would not import in Solaris 11 Express. :( It could not even find any
> pools to import, even when using "zpool import -d /dev/dsk" or any
> other import commands. Most likely this is due to using a FreeBSD-specific
> method of labelling the disks.

I think someone solved this before by creating a directory and making symlinks to the correct partitions/slices on each disk. Then you can use 'zpool import -d /tmp/foo' to do the import. e.g.:

# mkdir /tmp/fbsd
# create a temp directory with links to the p0 partitions of the relevant disks
# ln -s /dev/dsk/c8t1d0p0 /tmp/fbsd/
# ln -s /dev/dsk/c8t2d0p0 /tmp/fbsd/
# ln -s /dev/dsk/c8t3d0p0 /tmp/fbsd/
# ln -s /dev/dsk/c8t4d0p0 /tmp/fbsd/
# zpool import -d /tmp/fbsd/ $POOLNAME

I've never used FreeBSD, so I can't offer any advice about which device name is correct or whether this will work. Posts from February 2010, "Import zpool from FreeBSD in OpenSolaris", indicate that you want p0.

> It's just frustrating that it's still possible to corrupt a pool in
> such a way that "nuke and pave" is the only solution, especially when

I'm not sure it was the only solution; it's just the one you followed.

> What's most frustrating is that this is the third time I've built this
> pool due to corruption like this, within three months. :(

You may have an underlying hardware problem, or there could be a bug in the FreeBSD implementation that you're tripping over.

-B

--
Brandon High : bhigh at freaks.com
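One way to check which device node actually carries the ZFS labels (given the p0-vs-slice uncertainty above) is zdb -l, which dumps the four vdev labels from a device; the device name here is just reused from the example above:

# zdb -l /dev/dsk/c8t1d0p0

If the labels print, that node is the one to symlink; if zdb reports it cannot unpack the labels, try the other partition or slice names for that disk.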
Dick Hoogendijk
2011-May-17 09:02 UTC
[zfs-discuss] Still no way to recover a "corrupted" pool
On 16-5-2011 22:55, Freddie Cash wrote:
> On Fri, Apr 29, 2011 at 5:17 PM, Brandon High <bhigh at freaks.com> wrote:
>> On Fri, Apr 29, 2011 at 1:23 PM, Freddie Cash <fjwcash at gmail.com> wrote:
>>> Running ZFSv28 on 64-bit FreeBSD 8-STABLE.
>> I'd suggest trying to import the pool into snv_151a (Solaris 11
>> Express), which is the reference and development platform for ZFS.
> It would not import in Solaris 11 Express. :( It could not even find any
> pools to import, even when using "zpool import -d /dev/dsk" or any
> other import commands. Most likely this is due to using a FreeBSD-specific
> method of labelling the disks.

That should not be the case. You either use ZFS or you don't.

> What's most frustrating is that this is the third time I've built this
> pool due to corruption like this, within three months. :(

Three times in three months is not normal. You should be looking for the cause in your hardware, IMHO. ZFS is very stable in my experience, but I must say I have always run it on Solaris (10/11), not FreeBSD. I once tried that combination, immediately ran into troubles I had never seen before, and dropped the ZFS/FreeBSD pairing. ;-)