Matthew Ellison
2010-Aug-18 07:15 UTC
[zfs-discuss] Kernel panic on import / interrupted zfs destroy
I have a box running snv_134 that had a little boo-boo.

The problem first started a couple of weeks ago with some corruption on two filesystems in an 11-disk 10 TB raidz2 set. A couple of scrubs revealed a handful of corrupt files on my two deduplicated ZFS filesystems. No biggie.

I thought my problems had something to do with deduplication in 134, so I went about creating new filesystems and copying the "good" files over to another box. Every time I touched the "bad" files I got a filesystem error 5 (EIO). Trying to delete them manually caused kernel panics, which eventually turned into reboot loops.

I installed Nexenta on another disk to see if that would get me past the reboot loop, which it did. I finished moving the "good" files over (using rsync, which skipped the error 5 files, unlike cp or mv) and destroyed one of the two filesystems. Unfortunately, this caused a kernel panic in the middle of the destroy operation, which then became another panic / reboot loop.

I was able to get in with milestone=none and delete the ZFS cache (zpool.cache), but now I have a new problem: any attempt to import the pool results in a panic. I have tried from my snv_134 install, from the live CD, and from Nexenta. I have tried various zdb incantations (with aok=1 and zfs:zfs_recover=1) to no avail; these error out after a few minutes. I have even tried another controller.

I now have zdb -e -bcsvL running from 134 (without aok=1), and it has been going for several hours. Can zdb recover from this kind of situation (a half-destroyed filesystem that panics the kernel on import)? What is the impact of the above zdb operation without aok=1? Is there any likelihood of recovering the unaffected filesystems?

Any suggestions?

Regards,

Matthew Ellison
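P.S. In case the exact steps matter, this is roughly what I did, reconstructed from memory (flags may be slightly off, and "tank" below is just a stand-in for the real pool name):

  1. Booted with "-m milestone=none" appended to the GRUB kernel line, then
     removed the cached pool config so nothing tries to import it at boot:

       # "tank" is a placeholder; this just prevents the auto-import
       rm /etc/zfs/zpool.cache

  2. For the earlier import and zdb attempts, added the recovery tunables to
     /etc/system and rebooted:

       set aok=1
       set zfs:zfs_recover=1

  3. The walk that is running now, against the exported (not imported) pool,
     this time without aok=1:

       zdb -e -bcsvL tank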
Matthew Ellison
2010-Aug-18 20:29 UTC
[zfs-discuss] Fwd: Kernel panic on import / interrupted zfs destroy
Hmm, still running zdb since last night. Anyone have any suggestions or advice on how to proceed with this issue?

Thanks,

Matthew Ellison

Begin forwarded message:

> From: Matthew Ellison <matt at mattellison.com>
> Date: August 18, 2010 3:15:39 AM EDT
> To: zfs-discuss at opensolaris.org
> Subject: Kernel panic on import / interrupted zfs destroy
>
> [...]
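P.S. For reference, the invocation that has been going since last night is still the one from the original message, run against the exported pool (pool name replaced with "tank" here):

    zdb -e -bcsvL tank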