James Risner
2009-Dec-03 06:35 UTC
[zfs-discuss] ZIL corrupt, not recoverable even with logfix
I have a 9-drive system (four mirrors of two disks each, plus one hot spare) with a tenth drive, an SSD, for the ZIL. The ZIL is corrupt. I've been unable to recover the pool using FreeBSD 8, OpenSolaris x86, or logfix (http://github.com/pjjw/logfix).

In FreeBSD 8.0RC3 and below (uses v13 ZFS):
1) Boot Single User (both i386 and amd64)
2) /etc/rc.d/hostid start
3) "zpool import" results in system lockup (infinite time or at least 3 days)

In FreeBSD 8.0 Release:
1) Do #1 & #2 above, then "zpool import -f" reports that there are missing elements (namely the log disk ad4p2)

In OpenSolaris x86:
1) "zpool import -f" reports log disk is missing.

Use logfix under OpenSolaris:
1) Make a new pool, junkpool.
2) Run logfix using a disk from the pool, the new log disk, and the guid of the old corrupt ZIL log from the FreeBSD box.
3) "zpool import -f" now behaves differently: it shows the new log but reports a disk pair missing (the mirror of da4p5 & da5p5, using the FreeBSD names since I don't understand OpenSolaris names). They show up before the log disk is changed, but not afterward.
4) If I remove the log disk, they reappear.
5) Of note, 8 of the disks (the four mirrors) are on one SAS HBA. The spare is on another SATA controller with the SSD disk.
6) Could it be that the disks span controllers? Like c8t[1-8]d0s4 are the 8 disks and c7d0 is the spare and c8d1 is the SSD.

I've spent 2 weeks trying to recover this pool, and have been unable to do so in FreeBSD or OpenSolaris. Is there anyone who could help, or suggest things I have not tried? I'm fine with copying the data off if I could just mount the thing read-only.

-- This message posted from opensolaris.org
Anon Y Mous
2009-Dec-03 17:24 UTC
[zfs-discuss] ZIL corrupt, not recoverable even with logfix
Was the zpool originally created by a FreeBSD operating system or by an OpenSolaris operating system? And which version of FreeBSD, SXCE, or OpenSolaris Indiana created it?

The reason I'm asking is that different versions of OpenSolaris ship different versions of ZFS, so if you take a zpool with a newer version and try to mount it on an older OpenSolaris, it won't mount.

The last time I tried it, a long time ago, ZFS in FreeBSD was pretty unstable and still under heavy development. That was the sole reason I migrated my storage server, with my important data on it, to OpenSolaris, and it has been rock-solid stable since.
James Risner
2009-Dec-04 06:33 UTC
[zfs-discuss] ZIL corrupt, not recoverable even with logfix
It was created on AMD64 FreeBSD with 8.0RC2 (which was version 13 of ZFS iirc.)

At some point I knocked it out (export) somehow; I don't remember doing so intentionally. So I can't run commands like zpool replace, since there are no pools.

It says it was last used by the FreeBSD box, but the FreeBSD box does not show it with the "zpool status" command.

I'm going down tomorrow to work on it again, and I'm going to try 8.0 Release AMD64 FreeBSD (I've already tried i386 AMD64 FreeBSD 8.0 Release) and OpenSolaris dev-127.

I was just hoping there was some way I'm missing to mount it read-only (I have tried "zpool import -f -o readonly=yes" but that doesn't work either.)
max at bruningsystems.com
2009-Dec-11 07:52 UTC
[zfs-discuss] ZIL corrupt, not recoverable even with logfix
Hi James,

I just spent about a week recovering about 10TB of file data for someone who encountered a (somewhat?) similar problem to what you are seeing. If you are still having problems with this, please contact me off-list.

Regards,
max
Victor Latushkin
2009-Dec-15 06:26 UTC
[zfs-discuss] ZIL corrupt, not recoverable even with logfix
On Dec 4, 2009, at 9:33, James Risner <ros at akira.stdio.com> wrote:

> It was created on AMD64 FreeBSD with 8.0RC2 (which was version 13 of
> ZFS iirc.)
>
> At some point I knocked it out (export) somehow, I don't remember
> doing so intentionally. So I can't do commands like zpool replace
> since there are no pools.

Have you tried build 128, which includes pool recovery support?

> It says it was last used by the FreeBSD box, but the FreeBSD does
> not show it with "zpool status" command.

This is because the FreeBSD hostname (and hostid?) is recorded in the labels along with the active pool state.

> I'm going down tomorrow to work on it again, and I'm going to try
> 8.0 Release AMD64 FreeBSD (I've already tried i386 AMD64 FreeBSD 8.0
> Release) and Opensolaris dev-127.
>
> I was just hoping there was some way I'm missing to mount it read
> only (I have tried "zpool import -f -o readonly=yes" but that
> doesn't work either.)

It does not work that way at the moment, though readonly import is quite a useful option that can be tried.

If you still need help with this and can provide remote access to your box while it is running OpenSolaris, please let me know.

Regards
Victor
James Risner
2009-Dec-19 13:45 UTC
[zfs-discuss] ZIL corrupt, not recoverable even with logfix
Written by jktorn:
> Have you tried build 128 which includes pool recovery support?
>
> This is because FreeBSD hostname (and hostid?) is recorded in the
> labels along with active pool state.
>
> It does not work that way at the moment, though readonly import is
> quite useful option that can be tried.

Yes, I tried 128a and 129. Neither worked; both failed just like the 127 version. Specifically, they all reported that the pool was in use by another system and ignored the -f option to import anyway.

Things I had tried before I programmed myself a solution include:
With or without pool name ("zpool import tank" "zpool import")
With or without -f option
With or without -V (undocumented "do anyway" option)
With or without -F (lose data option)
With or without -FX (lose massive data option)

I finally decided to install an OpenSolaris machine and compile a fixed version of logfix.c, which I detailed here:
http://opensolaris.org/jive/thread.jspa?threadID=62831&tstart=0

Short summary of the problems preventing me from mounting this forged pool:
1) The pool had been accidentally exported on the host FreeBSD system.
2) The pool had 10 drives, and logfix was written to assume only one vdev for data and one for the log. The guid needed to be changed, and the generic sequential "id" also needed to be set to a unique value. This is why my da4/da5 mirror disks disappeared whenever I used logfix to mark up the log device (it was given the same "id", in my case "1").
3) The marked-up log device had 1 label matching my pool "tank" and the other 3 labels matching a scratch pool named junkpool. These labels had to be removed to prevent it from reporting this error:
   Assertion failed: rn->rn_nozpool == B_FALSE, file ../common/libzfs_import.c, line 1078, function zpool_open_func
4) The pool was last used on FreeBSD, and the FreeBSD device names differed from the OpenSolaris ones; for some reason they could not be properly detected. So I had to make a directory and use "-d /tmp/fbsd" to point zpool to the directory to find the original device names.
5) The pool had lost 84 seconds of data on the log disk that cannot be recovered (which required me to use -F to lose it and mount).
6) Since the log device I made to repair this pool is a file, I needed to use lofiadm to make a block/character device for it.

Happy!

# pfexec zpool import -d /tmp/fbsd/ -f -F tank
Pool tank returned to its state as of November 13, 2009 10:50:11 AM PST.
Discarded approximately 25 seconds of transactions.
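Editor's sketch of steps 4 and 6 above, for readers trying something similar. The post does not say what was placed in /tmp/fbsd; populating it with symlinks named after the FreeBSD devices is an assumption here, and the helper name link_devices, the name=target pairs, and the file path in the usage notes are hypothetical placeholders rather than values from the thread. Only "lofiadm -a" and the "zpool import" options already quoted above are real commands.

```shell
#!/bin/sh
# Sketch only: link_devices is a hypothetical helper, not part of ZFS or logfix.
#
# link_devices DIR NAME=TARGET...
# Create DIR and, for each NAME=TARGET pair, a symlink DIR/NAME -> TARGET,
# so that "zpool import -d DIR" can be pointed at a directory of devices
# under names of your choosing.
link_devices() {
    dir=$1
    shift
    mkdir -p "$dir" || return 1
    for pair in "$@"; do
        name=${pair%%=*}    # text before the first '='
        target=${pair#*=}   # text after the first '='
        ln -sf "$target" "$dir/$name" || return 1
    done
}

# Possible usage (all device paths below are made-up examples):
#
#   pfexec lofiadm -a /path/to/logfile     # prints a /dev/lofi/N device
#   link_devices /tmp/fbsd \
#       da4p5=/dev/dsk/c8t5d0s4 \
#       ad4p2=/dev/lofi/1
#   pfexec zpool import -d /tmp/fbsd/ -f -F tank
```

The helper only creates symlinks; it never touches the pool. The import itself, including -F and the transactions it discards, remains a manual step exactly as shown in the message above.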