Running ZFS on a Nexenta box, I had a mirror break, and apparently the metadata is now corrupt. If I mount vol2 it works, but if I run mount -a or mount vol2/vm2, the box instantly kernel panics and reboots. Is it possible to recover from this? I don't care if I lose the file listed below, but I would really like to get the rest of the data in the pool back. I have scrubbed the pool to no avail. Any other thoughts?

    zpool status -xv vol2
      pool: vol2
     state: ONLINE
    status: One or more devices has experienced an error resulting in data
            corruption.  Applications may be affected.
    action: Restore the file in question if possible.  Otherwise restore the
            entire pool from backup.
       see: http://www.sun.com/msg/ZFS-8000-8A
     scrub: none requested
    config:

            NAME          STATE     READ WRITE CKSUM
            vol2          ONLINE       0     0     0
              mirror-0    ONLINE       0     0     0
                c3t3d0    ONLINE       0     0     0
                c3t2d0    ONLINE       0     0     0

    errors: Permanent errors have been detected in the following files:

            vol2/vm2@snap-daily-1-2010-05-06-0000:/as5/as5-flat.vmdk

-- 
John
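The permanent-error entry has the form dataset@snapshot:/path, so the damaged file above lives in snapshot snap-daily-1-2010-05-06-0000 of vol2/vm2 rather than being named by a live path. A minimal sketch for pulling those fields out of zpool status -v output, assuming the layout shown above (entries one per line after the "errors: Permanent errors" header); the function name perm_error_entries is hypothetical and not part of any ZFS tool.

```shell
# Hypothetical helper: read `zpool status -v` output on stdin and print each
# permanent-error entry as "dataset<TAB>snapshot<TAB>path".
# Assumes entries follow the "errors: Permanent errors" header, one per line.
# Entries that are plain paths (no colon) are printed with empty dataset and
# snapshot fields; entries without "@snapshot" get an empty snapshot field.
perm_error_entries() {
    awk '
        /^errors: Permanent errors/ { in_list = 1; next }
        in_list && NF > 0 {
            line = $0
            sub(/^[ \t]+/, "", line)            # drop indentation
            colon = index(line, ":")
            if (colon == 0) {                   # plain path, no dataset prefix
                printf "\t\t%s\n", line
                next
            }
            ds = substr(line, 1, colon - 1)     # "dataset" or "dataset@snap"
            path = substr(line, colon + 1)
            snap = ""
            at = index(ds, "@")
            if (at > 0) {
                snap = substr(ds, at + 1)
                ds = substr(ds, 1, at - 1)
            }
            printf "%s\t%s\t%s\n", ds, snap, path
        }
    '
}
```

For the pool above, zpool status -v vol2 | perm_error_entries would print vol2/vm2, snap-daily-1-2010-05-06-0000 and /as5/as5-flat.vmdk separated by tabs.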
Do you have a coredump?  Or a stack trace of the panic?

On Wed, 19 May 2010, John Andrunas wrote:
> Running ZFS on a Nexenta box, I had a mirror get broken and apparently
> the metadata is corrupt now. [...]
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Regards,
markm
Not to my knowledge. How would I go about getting one? (CC'ing discuss)

On Wed, May 19, 2010 at 8:46 AM, Mark J Musante <Mark.Musante at oracle.com> wrote:
> Do you have a coredump?  Or a stack trace of the panic?

-- 
John
On 19.05.10 17:53, John Andrunas wrote:
> Not to my knowledge, how would I go about getting one? (CC'ing discuss)

man savecore and dumpadm.

Michael
-- 
michael.schuster at oracle.com     http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
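A minimal sketch of what those man pages cover, assuming an OpenSolaris-based system with a ZFS root pool. A dedicated dump device must be set with dumpadm -d; without one, a panic only logs "skipping system dump - no dump device configured". The dump device path, the savecore directory, and the helper names run and configure_dump are assumptions for illustration; setting DRY_RUN prints the commands instead of executing them.

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: configure a dedicated dump device and a savecore
# directory, enable savecore on reboot, then display the configuration.
configure_dump() {
    dump_dev=$1       # assumed, e.g. /dev/zvol/dsk/rpool/dump
    savecore_dir=$2   # assumed, e.g. /var/crash/cluster
    run dumpadm -d "$dump_dev"       # where the kernel writes the dump at panic
    run dumpadm -s "$savecore_dir"   # where savecore copies it after reboot
    run dumpadm -y                   # run savecore automatically on reboot
    run dumpadm                      # show the resulting configuration
}
```

DRY_RUN=1 configure_dump /dev/zvol/dsk/rpool/dump /var/crash/cluster lists the four commands; running it as root without DRY_RUN applies them.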
Hmmm... no coredump, even though I configured it. Here is the trace, though; I will see what I can do about the coredump.

root@cluster:/export/home/admin# zfs mount vol2/vm2

panic[cpu3]/thread=ffffff001f45ec60: BAD TRAP: type=e (#pf Page fault)
rp=ffffff001f45e950 addr=30 occurred in module "zfs" due to a NULL
pointer dereference

zpool-vol2: #pf Page fault
Bad kernel fault at addr=0x30
pid=1469, pc=0xfffffffff795d054, sp=0xffffff001f45ea48, eflags=0x10296
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 30  cr3: 5000000  cr8: c

        rdi:                0 rsi: ffffff05208b2388 rdx: ffffff001f45e888
        rcx:                0  r8:        3000900ff  r9:         198f5ff6
        rax:                0 rbx:              200 rbp: ffffff001f45ea50
        r10:         c0130803 r11: ffffff001f45ec60 r12: ffffff05208b2388
        r13: ffffff0521fc4000 r14: ffffff050c0167e0 r15: ffffff050c0167e8
        fsb:                0 gsb: ffffff04eb9b8080  ds:               4b
         es:               4b  fs:                0  gs:              1c3
        trp:                e err:                2 rip: fffffffff795d054
         cs:               30 rfl:            10296 rsp: ffffff001f45ea48
         ss:               38

ffffff001f45e830 unix:die+dd ()
ffffff001f45e940 unix:trap+177b ()
ffffff001f45e950 unix:cmntrap+e6 ()
ffffff001f45ea50 zfs:ddt_phys_decref+c ()
ffffff001f45ea80 zfs:zio_ddt_free+55 ()
ffffff001f45eab0 zfs:zio_execute+8d ()
ffffff001f45eb50 genunix:taskq_thread+248 ()
ffffff001f45eb60 unix:thread_start+8 ()

syncing file systems... done
skipping system dump - no dump device configured
rebooting...

On Wed, May 19, 2010 at 8:55 AM, Michael Schuster <michael.schuster at oracle.com> wrote:
> On 19.05.10 17:53, John Andrunas wrote:
>> Not to my knowledge, how would I go about getting one?  (CC'ing discuss)
>
> man savecore and dumpadm.
>
> Michael

-- 
John
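When comparing a panic against known bugs, the useful part of the trace is the list of frames after the unix trap handlers (die, trap, cmntrap). A minimal sketch, assuming the "address module:function+offset ()" frame format shown in this trace; the function name panic_frames is hypothetical.

```shell
# Hypothetical helper: read panic console output on stdin and print the
# "module:function" of each stack frame, dropping the frame address, the
# "+offset" suffix, and the unix trap-handling frames that precede the fault.
panic_frames() {
    awk '
        # Frame lines look like: ffffff001f45ea50 zfs:ddt_phys_decref+c ()
        $1 ~ /^[0-9a-f]+$/ && $2 ~ /^[a-z_0-9]+:[A-Za-z_0-9]+([+][0-9a-f]+)?$/ && $3 == "()" {
            frame = $2
            sub(/[+][0-9a-f]+$/, "", frame)             # strip "+offset"
            if (frame ~ /^unix:(die|trap|cmntrap)$/) next # trap handling only
            print frame
        }
    '
}
```

Run over the trace above, the first frame printed is zfs:ddt_phys_decref, the function that took the fault. The register dump shows rdi (the first argument register on amd64) as 0 and the fault address as 0x30, which is consistent with, though not proof of, ddt_phys_decref() being handed a NULL pointer.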
OK, I got a core dump. What do I do with it now? It is 1.2G in size.

On Wed, May 19, 2010 at 10:54 AM, John Andrunas <john at andrunas.net> wrote:
> Hmmm... no coredump even though I configured it.
> [...]
> skipping system dump - no dump device configured
> rebooting...

-- 
John
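Before uploading, the dump can also be examined locally with mdb. A minimal sketch, assuming savecore put the files in a directory passed as the first argument and that, on builds that write compressed crash dumps, only a vmdump.N file is present, which savecore -f expands into unix.N and vmcore.N. The helper names run and inspect_dump are hypothetical; setting DRY_RUN prints the commands instead of executing them.

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: expand a compressed dump if needed, then print the
# panic summary, the panic stack, and the console log with mdb.
# Runs in a subshell so the cd does not leak into the caller.
inspect_dump() (
    dir=$1   # savecore directory (assumed)
    n=$2     # dump index N
    cd "$dir" || exit 1
    if [ ! -f "vmcore.$n" ] && [ -f "vmdump.$n" ]; then
        run savecore -vf "vmdump.$n" .   # writes unix.N and vmcore.N here
    fi
    # ::status  - panic string and dump summary
    # $C        - stack of the panicking thread, with frame pointers
    # ::msgbuf  - console messages leading up to the panic
    if [ -n "${DRY_RUN:-}" ]; then
        echo "mdb unix.$n vmcore.$n"
    else
        printf '%s\n' '::status' '$C' '::msgbuf' | mdb "unix.$n" "vmcore.$n"
    fi
)
```

The dcmds are fed on standard input so the output can be saved to a text file and pasted into a bug report alongside the uploaded core.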
First, I suggest you open a bug at https://defect.opensolaris.org/bz and get a bug number. Then name your core dump something like "bug.<bugnumber>" and upload it using the instructions here:

http://supportfiles.sun.com/upload

Update the bug once you've uploaded the core, and supply the name of the core file.

Lori

On 05/19/10 12:40 PM, John Andrunas wrote:
> OK, I got a core dump, what do I do with it now?
>
> It is 1.2G in size.
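The naming step can be scripted. A minimal sketch, assuming the dump is the unix.N/vmcore.N pair that savecore writes and that a gzip-compressed tar archive is an acceptable upload format (check the upload instructions before relying on that); the function name package_core is hypothetical. Compression also helps with a 1.2G dump, though the ratio will vary.

```shell
# Hypothetical helper: bundle unix.N and vmcore.N from a savecore directory
# into bug.<bugnumber>.tar.gz in the current directory and print its name.
# Note: an uncompressed bug.<bugnumber>.tar is written first, so the current
# directory needs roughly the dump's size in free space.
package_core() {
    bugnum=$1   # bug number from defect.opensolaris.org
    dir=$2      # savecore directory (assumed), e.g. /var/crash/cluster
    n=$3        # dump index N
    base="bug.$bugnum.tar"
    for f in "unix.$n" "vmcore.$n"; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing $dir/$f" >&2
            return 1
        fi
    done
    # Archive from inside the dump directory so the tar holds bare file names;
    # the redirection is opened in the caller's directory before the cd runs.
    (cd "$dir" && tar -cf - "unix.$n" "vmcore.$n") > "$base" || return 1
    gzip -f "$base" || return 1
    echo "$base.gz"
}
```

The printed file name is what would go into the bug update that Lori describes.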
On Wed, 19 May 2010, John Andrunas wrote:
> ffffff001f45e830 unix:die+dd ()
> ffffff001f45e940 unix:trap+177b ()
> ffffff001f45e950 unix:cmntrap+e6 ()
> ffffff001f45ea50 zfs:ddt_phys_decref+c ()
> ffffff001f45ea80 zfs:zio_ddt_free+55 ()
> ffffff001f45eab0 zfs:zio_execute+8d ()
> ffffff001f45eb50 genunix:taskq_thread+248 ()
> ffffff001f45eb60 unix:thread_start+8 ()

This shows you're running recent bits that include dedup. How recent is your build?

The stack you show here is similar to the one in CR 6915314, which we haven't been able to root-cause yet. Let me know if you get a chance to upload the core as Lori Alt outlined, and I can update our bug tracking system to reflect that.

Regards,
markm
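The dedup observation follows from the zfs:ddt_phys_decref and zfs:zio_ddt_free frames, which belong to the dedup table (DDT) code. A minimal sketch for checking a trace for such frames, plus the commands that would answer the build question; the helper names run, uses_dedup_path and report_build are hypothetical, and setting DRY_RUN prints the commands instead of running them.

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: succeed if a panic stack on stdin passes through the
# dedup table code, i.e. contains a zfs:ddt_* or zfs:zio_ddt_* frame.
uses_dedup_path() {
    grep -Eq 'zfs:(zio_)?ddt_[A-Za-z0-9_]+'
}

# Hypothetical helper: commands that report the running build and whether
# dedup is in use on the pool named by the first argument.
report_build() {
    pool=$1
    run uname -v                    # kernel build string
    run cat /etc/release            # distribution release
    run zpool get version "$pool"   # dedup requires pool version 21 or later
    run zfs get -r dedup "$pool"    # dedup property on every dataset
}
```

Piping the stack quoted above into uses_dedup_path succeeds because of the ddt frames; report_build vol2 would gather the version details markm asks for.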