Running ZFS on a Nexenta box, I had a mirror get broken and apparently
the metadata is corrupt now. If I try to mount vol2 it works, but if
I try mount -a or mount vol2/vm2, it instantly kernel panics and
reboots. Is it possible to recover from this? I don't care if I lose
the file listed below, but the other data in the volume would be
really nice to get back. I have scrubbed the volume to no avail. Any
other thoughts?
zpool status -xv vol2
  pool: vol2
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        vol2        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        vol2/vm2@snap-daily-1-2010-05-06-0000:/as5/as5-flat.vmdk
--
John
Do you have a coredump? Or a stack trace of the panic?

On Wed, 19 May 2010, John Andrunas wrote:
> Running ZFS on a Nexenta box, I had a mirror get broken and apparently
> the metadata is corrupt now.

Regards,
markm
Not to my knowledge. How would I go about getting one? (CC'ing discuss)

On Wed, May 19, 2010 at 8:46 AM, Mark J Musante <Mark.Musante at oracle.com> wrote:
> Do you have a coredump? Or a stack trace of the panic?

--
John
On 19.05.10 17:53, John Andrunas wrote:
> Not to my knowledge. How would I go about getting one? (CC'ing discuss)

man savecore and dumpadm.

Michael

--
michael.schuster at oracle.com     http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
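
[For readers of the archive: a minimal sketch of the setup those man pages
describe. The device and directory names below are placeholders, not taken
from John's system; see dumpadm(1M) for the defaults on your build.]

    # Send crash dumps to a dedicated slice (placeholder device name) and
    # choose the directory savecore writes into after the next reboot.
    dumpadm -d /dev/dsk/c0t0d0s1
    dumpadm -s /var/crash/`hostname`
    dumpadm -y            # run savecore automatically during boot

    # If a dump was written but never extracted, pull it out by hand:
    savecore -v /var/crash/`hostname`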
Hmmm... no coredump, even though I configured it.
Here is the trace, though; I will see what I can do about the coredump.
root@cluster:/export/home/admin# zfs mount vol2/vm2
panic[cpu3]/thread=ffffff001f45ec60: BAD TRAP: type=e (#pf Page fault)
rp=ffffff001f45e950 addr=30 occurred in module "zfs" due to a NULL
pointer dereference
zpool-vol2: #pf Page fault
Bad kernel fault at addr=0x30
pid=1469, pc=0xfffffffff795d054, sp=0xffffff001f45ea48, eflags=0x10296
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>  cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 30  cr3: 5000000  cr8: c
rdi: 0 rsi: ffffff05208b2388 rdx: ffffff001f45e888
rcx: 0 r8: 3000900ff r9: 198f5ff6
rax: 0 rbx: 200 rbp: ffffff001f45ea50
r10: c0130803 r11: ffffff001f45ec60 r12: ffffff05208b2388
r13: ffffff0521fc4000 r14: ffffff050c0167e0 r15: ffffff050c0167e8
fsb: 0 gsb: ffffff04eb9b8080 ds: 4b
es: 4b fs: 0 gs: 1c3
trp: e err: 2 rip: fffffffff795d054
cs: 30 rfl: 10296 rsp: ffffff001f45ea48
ss: 38
ffffff001f45e830 unix:die+dd ()
ffffff001f45e940 unix:trap+177b ()
ffffff001f45e950 unix:cmntrap+e6 ()
ffffff001f45ea50 zfs:ddt_phys_decref+c ()
ffffff001f45ea80 zfs:zio_ddt_free+55 ()
ffffff001f45eab0 zfs:zio_execute+8d ()
ffffff001f45eb50 genunix:taskq_thread+248 ()
ffffff001f45eb60 unix:thread_start+8 ()
syncing file systems... done
skipping system dump - no dump device configured
rebooting...
On Wed, May 19, 2010 at 8:55 AM, Michael Schuster
<michael.schuster at oracle.com> wrote:
> man savecore and dumpadm.
--
John
OK, I got a core dump, what do I do with it now?

It is 1.2G in size.

On Wed, May 19, 2010 at 10:54 AM, John Andrunas <john at andrunas.net> wrote:
> Hmmm... no coredump, even though I configured it.
> Here is the trace, though; I will see what I can do about the coredump.

--
John
First, I suggest you open a bug at https://defect.opensolaris.org/bz and get
a bug number. Then, name your core dump something like "bug.<bugnumber>" and
upload it using the instructions here: http://supportfiles.sun.com/upload

Update the bug once you've uploaded the core and supply the name of the core
file.

Lori

On 05/19/10 12:40 PM, John Andrunas wrote:
> OK, I got a core dump, what do I do with it now?
> It is 1.2G in size.
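
[A side note on packaging: one way to bundle a 1.2G dump under the naming
scheme Lori describes is sketched below. BUGID stands in for whatever number
the defect tracker assigns, and unix.0/vmcore.0 are the default savecore
file names, not names confirmed from John's machine.]

    # Placeholder: substitute the bug number assigned by defect.opensolaris.org.
    BUGID=NNNNNNN
    cd /var/crash/`hostname`
    # Bundle the kernel image with the dump and compress it before uploading.
    tar cf - unix.0 vmcore.0 | gzip -9 > /var/tmp/bug.$BUGID.tar.gz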
On Wed, 19 May 2010, John Andrunas wrote:
> ffffff001f45e830 unix:die+dd ()
> ffffff001f45e940 unix:trap+177b ()
> ffffff001f45e950 unix:cmntrap+e6 ()
> ffffff001f45ea50 zfs:ddt_phys_decref+c ()
> ffffff001f45ea80 zfs:zio_ddt_free+55 ()
> ffffff001f45eab0 zfs:zio_execute+8d ()
> ffffff001f45eb50 genunix:taskq_thread+248 ()
> ffffff001f45eb60 unix:thread_start+8 ()

This shows you're using some recent bits that include dedup. How recent is
your build? The stack you show here is similar to that in CR 6915314, which
we haven't been able to root-cause yet. Let me know if you get a chance to
upload the core as Lori Alt outlined, and I can update our bug tracking
system to reflect that.

Regards,
markm
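
[One last aside for the archive: the faulting address 0x30, together with the
ddt_phys_decref+c frame, looks consistent with a NULL ddt_phys_t pointer being
dereferenced at its reference-count field (the three DVAs that precede it
should occupy exactly 0x30 bytes, if memory serves). With the saved dump, that
reading could be sanity-checked roughly as follows; unix.0/vmcore.0 are the
default savecore names, and ::offsetof needs the zfs module's CTF data to be
present in the dump.]

    cd /var/crash/`hostname`
    # Panic summary, the panicking thread's stack, and the console ring buffer.
    printf '::status\n::stack\n::msgbuf\n' | mdb -k unix.0 vmcore.0
    # Check where ddp_refcnt sits inside ddt_phys_t (expected: offset 0x30).
    printf '::offsetof ddt_phys_t ddp_refcnt\n' | mdb -k unix.0 vmcore.0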