I encountered the following ZFS panic today and am looking for suggestions
on how to resolve it.
First panic:
panic[cpu0]/thread=ffffff000fa9cc80: assertion failed: 0 ==
dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs,
&dbp), file: ../../common/fs/zfs/dmu.c, line: 435
ffffff000fa9c8d0 genunix:assfail+7e ()
ffffff000fa9c980 zfs:dmu_write+15c ()
ffffff000fa9ca40 zfs:space_map_sync+27a ()
ffffff000fa9cae0 zfs:metaslab_sync+24f ()
ffffff000fa9cb40 zfs:vdev_sync+b5 ()
ffffff000fa9cbd0 zfs:spa_sync+1e2 ()
ffffff000fa9cc60 zfs:txg_sync_thread+19a ()
ffffff000fa9cc70 unix:thread_start+8 ()
Recursive panic at boot:
panic[cpu3]/thread=ffffff000f818c80: BAD TRAP: type=e (#pf Page fault)
rp=ffffff000f818610 addr=0 occurred in module "unix" due to a NULL
pointer dereference
sched: #pf Page fault
Bad kernel fault at addr=0x0
pid=0, pc=0xfffffffffb83d24b, sp=0xffffff000f818708, eflags=0x10246
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4:
6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 0 cr3: 8c00000 cr8: c
rdi: 0 rsi: 1 rdx: ffffff000f818c80
rcx: 23 r8: 0 r9: fffffffec2a65e80
rax: 0 rbx: 0 rbp: ffffff000f818780
r10: 6ea8f08d984d6b r11: fffffffec2a38000 r12: 2bd40a
r13: 400 r14: 1b0005c8800 r15: fffffffecc992580
fsb: fffffd7fff382000 gsb: fffffffec28aaa80 ds: 0
es: 0 fs: 0 gs: 0
trp: e err: 2 rip: fffffffffb83d24b
cs: 30 rfl: 10246 rsp: ffffff000f818708
ss: 38
ffffff000f8184f0 unix:die+c8 ()
ffffff000f818600 unix:trap+135b ()
ffffff000f818610 unix:_cmntrap+e9 ()
ffffff000f818780 unix:mutex_enter+b ()
ffffff000f8187f0 zfs:metaslab_free+97 ()
ffffff000f818820 zfs:zio_dva_free+29 ()
ffffff000f818840 zfs:zio_next_stage+b3 ()
ffffff000f818860 zfs:zio_gang_pipeline+31 ()
ffffff000f818880 zfs:zio_next_stage+b3 ()
ffffff000f8188d0 zfs:zio_wait_for_children+5d ()
ffffff000f8188f0 zfs:zio_wait_children_ready+20 ()
ffffff000f818910 zfs:zio_next_stage_async+bb ()
ffffff000f818930 zfs:zio_nowait+11 ()
ffffff000f8189c0 zfs:arc_free+174 ()
ffffff000f818a50 zfs:dsl_dataset_block_kill+25f ()
ffffff000f818ad0 zfs:dmu_objset_sync+90 ()
ffffff000f818b40 zfs:dsl_pool_sync+199 ()
ffffff000f818bd0 zfs:spa_sync+1c5 ()
ffffff000f818c60 zfs:txg_sync_thread+19a ()
ffffff000f818c70 unix:thread_start+8 ()
The events that led up to this:
- "data" zpool consists of two 7-disk raidz1 vdevs with 2 hot spares;
  one disk in each vdev was faulted, with hot spares active.
- the faulted disks were cleared because the errors looked suspicious; the
  hot spares were detached and marked as available. zpool status showed
  the pool as normal.
- "data_new" zpool was created, consisting of two 8-disk raidz2 vdevs.
- a recursive zfs send/recv was issued to copy the filesystems and
  snapshots from "data" to "data_new", which immediately triggered the
  first panic.
- the system now refuses to boot, hitting the recursive panic while
  trying to mount the zpools.
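For reference, the migration step was roughly the following (pool names
are from the description above; the snapshot name and the -R recursive
send syntax are assumptions, since the exact command isn't shown and
older builds may require iterating over datasets individually):

```shell
# Hypothetical reconstruction -- "migrate" is an assumed snapshot name.

# Take a recursive snapshot of every dataset in "data".
zfs snapshot -r data@migrate

# Recursively send the snapshot stream and receive it into "data_new".
# This is the step that immediately triggered the first panic.
zfs send -R data@migrate | zfs receive -d data_new
```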
Bill Sommerfeld on #opensolaris suggested that this may be related
to bug ID 6393634, but that bug doesn't offer any suggestion on how to
recover. I'm guessing I can boot from DVD and remove zpool.cache
in order to get the machine to boot, but I may not be able to import
the affected pool afterward. I'd like to save it if possible.
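In case it helps frame the question, the recovery path I have in mind
looks like the sketch below. The cache-file path is the standard Solaris
location; the root device name is only an example, and whether the import
succeeds at all is exactly what I'm unsure about:

```shell
# Boot from the install DVD (or failsafe), then mount the root slice and
# move the cache file aside so ZFS doesn't try to auto-import the pools
# at boot time. The device name here is an example, not my actual disk.
mount /dev/dsk/c0t0d0s0 /a
mv /a/etc/zfs/zpool.cache /a/etc/zfs/zpool.cache.bad
reboot

# After the machine comes up, attempt a manual import of the damaged pool.
zpool import data
```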
Any help would be appreciated.
-phillip