On two x86 32-bit systems, running snv_34 bfu'ed to on-20060313 release bits [*], I'm getting a reproducible "dangling dbufs" panic when I try to poweroff these systems by pressing the ACPI power button. The problem does not happen when I use the "poweroff" or "halt" command.

Both systems have their "/usr" copied to (and mounted from) a compressed zfs filesystem.

A third box with /usr on ufs doesn't show this problem.

[*] actually on-20060313, with the two fixes from this thread:
http://www.opensolaris.org/jive/thread.jspa?threadID=6882&tstart=0

> ::status
debugging crash dump vmcore.1 (32-bit) from max
operating system: 5.11 wos_b36_2 (i86pc)
panic message: dangling dbufs (dn=d507f048, dbuf=d5e4df00)

dump content: kernel pages only
> $c
vpanic(f9b14564, d507f048, d5e4df00)
dnode_evict_dbufs+0x179()
dmu_objset_evict_dbufs+0xcd()
zfs_objset_close+0x15d(d45f7680)
zfs_umount+0xf6()
fsop_unmount+0x18(d45dd000, 0, d377bf38)
dounmount+0x46(d45dd000, 0, d377bf38)
vfs_unmountall+0x7e()
kadmin+0x3a1(2, 6, 0, d377bf38)
uadmin+0x85()
sys_call+0x104()
> ::fsinfo
    VFSP FS              MOUNT
fec63030 ufs             /
fec63970 devfs           /devices
d3a0e080 ctfs            /system/contract
d3a0b500 proc            /proc
d3a0a000 mntfs           /etc/mnttab
d3a4bac0 tmpfs           /etc/svc/volatile
d3a4b040 objfs           /system/object
d45de500 namefs          /etc/svc/volatile/repository_door
d45dd000 zfs             /usr
> ::ps
S    PID   PPID   PGID    SID    UID      FLAGS     ADDR NAME
R      0      0      0      0      0 0x00000001 fec1dd7c sched
R      3      0      0      0      0 0x00020001 d40d0278 fsflush
R      2      0      0      0      0 0x00020001 d40d0ae0 pageout
Z      1      0      0      0      0 0x42024002 d40d1348 init
R      7      1      7      7      0 0x42020002 d40cf1a8 svc.startd
> ::msgbuf
...

panic[cpu0]/thread=d45f8000:
dangling dbufs (dn=d507f048, dbuf=d5e4df00)


d4647d28 zfs:dnode_evict_dbufs+179 (d507f048, d45f770c,)
d4647d54 zfs:dmu_objset_evict_dbufs+cd (d5e50f70, d377bf38,)
d4647d88 zfs:zfs_objset_close+15d (d45f7680)
d4647dac zfs:zfs_umount+f6 (d45dd000, 0, d377bf)
d4647dc0 genunix:fsop_unmount+18 (d45dd000, 0, d377bf)
d4647de0 genunix:dounmount+46 (d45dd000, 0, d377bf)
d4647e0c genunix:vfs_unmountall+7e (0, 2, d377bf38, 5, )
d4647e3c genunix:kadmin+3a1 (2, 6, 0, d377bf38)
d4647f84 genunix:uadmin+85 (2, 6, 0, d4647fac, )

syncing file systems...
 done
dumping to /dev/dsk/c0d0s1, offset 107806720, content: kernel


===================================================================

And on another box:

# mdb -k 3
Loading modules: [ unix krtld genunix specfs dtrace uppc pcplusmp ufs ip sctp usba uhci s1394 nca zfs random fctl lofs nfs audiosup: crypto sppp ptm ipc ]
> ::status
debugging crash dump vmcore.3 (32-bit) from tiger2
operating system: 5.11 wos_b36 (i86pc)
panic message: dangling dbufs (dn=d1afddf8, dbuf=d10a3e50)

dump content: kernel pages only
> ::ps
S    PID   PPID   PGID    SID    UID      FLAGS     ADDR NAME
R      0      0      0      0      0 0x00000001 fec1dd7c sched
R      3      0      0      0      0 0x00020001 d02a8278 fsflush
R      2      0      0      0      0 0x00020001 d02a8ae0 pageout
Z      1      0      0      0      0 0x42024002 d02a9348 init
R      7      1      7      7      0 0x42020002 d02a71a8 svc.startd
> ::fsinfo
    VFSP FS              MOUNT
fec63030 ufs             /
fec63970 devfs           /devices
cfd4b580 ctfs            /system/contract
cfd4ab00 proc            /proc
cfdcf500 mntfs           /etc/mnttab
cfdce000 tmpfs           /etc/svc/volatile
cfdc7540 objfs           /system/object
cff47080 namefs          /etc/svc/volatile/repository_door
cff48580 zfs             /usr
> $c
vpanic(f7db056c, d1afddf8, d10a3e50)
dnode_evict_dbufs+0x179()
dmu_objset_evict_dbufs+0xcd()
zfs_objset_close+0x15d(cfd7c680)
zfs_umount+0xf6()
fsop_unmount+0x18(cff48580, 0, d6b01f38)
dounmount+0x46(cff48580, 0, d6b01f38)
vfs_unmountall+0x7e()
kadmin+0x3a1(2, 6, 0, d6b01f38)
uadmin+0x85()
sys_call+0x104()
> ::msgbuf
MESSAGE
sd1 is /pci@0,0/pci1022,7414@7,4/storage@3/disk@0,1
/pci@0,0/pci1022,7414@7,4/storage@3/disk@0,1 (sd1) online
....
asy1 is /isa/asy@1,2f8
pseudo-device: pm0
pm0 is /pseudo/pm@0

panic[cpu1]/thread=d044e000:
dangling dbufs (dn=d1afddf8, dbuf=d10a3e50)


d04a4d28 zfs:dnode_evict_dbufs+179 (d1afddf8, cfd7c70c,)
d04a4d54 zfs:dmu_objset_evict_dbufs+cd (d0ec6700, d6b01f38,)
d04a4d88 zfs:zfs_objset_close+15d (cfd7c680)
d04a4dac zfs:zfs_umount+f6 (cff48580, 0, d6b01f)
d04a4dc0 genunix:fsop_unmount+18 (cff48580, 0, d6b01f)
d04a4de0 genunix:dounmount+46 (cff48580, 0, d6b01f)
d04a4e0c genunix:vfs_unmountall+7e (0, 2, d6b01f38, 5, )
d04a4e3c genunix:kadmin+3a1 (2, 6, 0, d6b01f38)
d04a4f84 genunix:uadmin+85 (2, 6, 0, d04a4fac, )

syncing file systems...
 done
dumping to /dev/dsk/c0d0s1, offset 429391872, content: kernel
> d1afddf8::whatis
d1afddf8 is d1afddf8+0, allocated from dnode_t
> d1afddf8::print dnode_t
{
    dn_struct_rwlock = {
        _opaque = [ 0 ]
    }
    dn_link = {
        list_next = 0xd10a564c
        list_prev = 0xd1afdc74
    }
    dn_objset = 0xd05186c0
    dn_object = 0x2
    dn_dbuf = 0xd152d260
    dn_phys = 0xd1bfc400
    dn_type = 0t22 (DMU_OT_DELETE_QUEUE)
    dn_bonuslen = 0
    dn_bonustype = 0
    dn_nblkptr = 0x3
    dn_checksum = 0
    dn_compress = 0
    dn_nlevels = 0x2
    dn_indblkshift = 0xe
    dn_datablkshift = 0xe
    dn_datablkszsec = 0x20
    dn_datablksz = 0x4000
    dn_maxblkid = 0x40
    dn_next_nlevels = [ 0, 0, 0, 0 ]
    dn_next_indblkshift = [ 0, 0, 0, 0 ]
    dn_next_blksz = [ 0, 0, 0, 0 ]
    dn_dirty_link = [
        {
            list_next = 0
            list_prev = 0
        }
        {
            list_next = 0
            list_prev = 0
        }
        {
            list_next = 0
            list_prev = 0
        }
        {
            list_next = 0
            list_prev = 0
        }
    ]
    dn_mtx = {
        _opaque = [ 0, 0 ]
    }
    dn_dirty_dbufs = [
        {
            list_size = 0xc0
            list_offset = 0x6c
            list_head = {
                list_next = 0xd1afde7c
                list_prev = 0xd1afde7c
            }
        }
        {
            list_size = 0xc0
            list_offset = 0x74
            list_head = {
                list_next = 0xd1afde8c
                list_prev = 0xd1afde8c
            }
        }
        {
            list_size = 0xc0
            list_offset = 0x7c
            list_head = {
                list_next = 0xd1afde9c
                list_prev = 0xd1afde9c
            }
        }
        {
            list_size = 0xc0
            list_offset = 0x84
            list_head = {
                list_next = 0xd1afdeac
                list_prev = 0xd1afdeac
            }
        }
    ]
    dn_ranges = [
        {
            avl_root = 0
            avl_compar = free_range_compar
            avl_offset = 0
            avl_numnodes = 0
            avl_size = 0x20
        }
        {
            avl_root = 0
            avl_compar = free_range_compar
            avl_offset = 0
            avl_numnodes = 0
            avl_size = 0x20
        }
        {
            avl_root = 0
            avl_compar = free_range_compar
            avl_offset = 0
            avl_numnodes = 0
            avl_size = 0x20
        }
        {
            avl_root = 0
            avl_compar = free_range_compar
            avl_offset = 0
            avl_numnodes = 0
            avl_size = 0x20
        }
    ]
    dn_allocated_txg = 0
    dn_free_txg = 0
    dn_assigned_txg = 0
    dn_assigned_tx = 0
    dn_notxholds = {
        _opaque = 0
    }
    dn_dirtyctx = 0 (DN_UNDIRTIED)
    dn_dirtyctx_firstset = 0
    dn_tx_holds = {
        rc_count = 0
    }
    dn_holds = {
        rc_count = 0x4
    }
    dn_dbufs_mtx = {
        _opaque = [ 0, 0 ]
    }
    dn_dbufs = {
        list_size = 0xc0
        list_offset = 0x64
        list_head = {
            list_next = 0xd10a3eb4
            list_prev = 0xd152d204
        }
    }
    dn_bonus = 0
    dn_zfetch = {
        zf_rwlock = {
            _opaque = [ 0 ]
        }
        zf_stream = {
            list_size = 0x48
            list_offset = 0x38
            list_head = {
                list_next = 0xd151fa58
                list_prev = 0xd151fa58
            }
        }
        zf_dnode = 0xd1afddf8
        zf_stream_cnt = 0x1
        zf_alloc_fail = 0x1
    }
}
> d10a3e50::whatis
d10a3e50 is d10a3e50+0, allocated from dmu_buf_impl_t
> d10a3e50::print dmu_buf_impl_t
{
    db = {
        db_object = 0x2
        db_offset = 0x4000
        db_size = 0x4000
        db_data = 0xd1bf4000
    }
    db_objset = 0xd05186c0
    db_dnode = 0xd1afddf8
    db_parent = 0xd152d1a0
    db_hash_next = 0
    db_blkid = 0x1
    db_blkptr = 0xd1bf8080
    db_level = 0
    db_mtx = {
        _opaque = [ 0, 0 ]
    }
    db_state = 3 (DB_CACHED)
    db_holds = {
        rc_count = 0x1
    }
    db_buf = 0xd0ba16c8
    db_changed = {
        _opaque = 0
    }
    db_data_pending = 0
    db_dirtied = 0
    db_link = {
        list_next = 0xd10a3f74
        list_prev = 0xd1afdf4c
    }
    db_dirty_node = [
        {
            list_next = 0
            list_prev = 0
        }
        {
            list_next = 0
            list_prev = 0
        }
        {
            list_next = 0
            list_prev = 0
        }
        {
            list_next = 0
            list_prev = 0
        }
    ]
    db_dirtycnt = 0
    db_d = {
        db_user_ptr = 0xd0a6e650
        db_user_data_ptr_ptr = 0xd0a6e664
        db_evict_func = zap_leaf_pageout
        db_immediate_evict = 0
        db_freed_in_flight = 0
        db_data_old = [ 0, 0, 0, 0 ]
        db_overridden_by = [ 0, 0, 0, 0 ]
    }
}
>

This message posted from opensolaris.org
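As a sanity check on the vmcore.3 dump, the dnode and dbuf fields printed above can be cross-checked against each other. The sketch below is only an illustration in Python, not ZFS code. It assumes that dn_datablkszsec counts 512-byte sectors, that a level-0 dbuf's db_offset equals db_blkid times the block size, and that list_head.list_next minus list_offset gives the address of the first object on a list. All numeric values are copied from the ::print output; the function and dictionary key names are made up for the example.

```python
# Values copied from the ::print output of vmcore.3.
DNODE_ADDR = 0xD1AFDDF8
DBUF_ADDR = 0xD10A3E50

dnode = {
    "dn_object": 0x2,
    "dn_datablkshift": 0xE,
    "dn_datablkszsec": 0x20,
    "dn_datablksz": 0x4000,
    "dn_maxblkid": 0x40,
    "dn_dbufs_list_offset": 0x64,
    "dn_dbufs_list_next": 0xD10A3EB4,
}

dbuf = {
    "db_object": 0x2,
    "db_offset": 0x4000,
    "db_size": 0x4000,
    "db_dnode": 0xD1AFDDF8,
    "db_blkid": 0x1,
    "db_level": 0,
    "db_holds": 0x1,
}

SECTOR_SIZE = 512  # assumption: dn_datablkszsec is in 512-byte sectors


def cross_check(dn, db):
    """Return named consistency checks; True means the fields agree."""
    blksz = dn["dn_datablksz"]
    return {
        "block size matches shift": (1 << dn["dn_datablkshift"]) == blksz,
        "block size matches sector count": dn["dn_datablkszsec"] * SECTOR_SIZE == blksz,
        "dbuf size matches block size": db["db_size"] == blksz,
        # assumption: level-0 offset is blkid * block size
        "dbuf offset matches blkid": db["db_blkid"] * blksz == db["db_offset"],
        "blkid within dn_maxblkid": db["db_blkid"] <= dn["dn_maxblkid"],
        "dbuf belongs to this object": db["db_object"] == dn["dn_object"],
        "dbuf points back at dnode": db["db_dnode"] == DNODE_ADDR,
        # assumption: list node is embedded at list_offset inside the dbuf
        "dbuf is first on dn_dbufs": (
            dn["dn_dbufs_list_next"] - dn["dn_dbufs_list_offset"] == DBUF_ADDR
        ),
    }


if __name__ == "__main__":
    for name, ok in cross_check(dnode, dbuf).items():
        print(f"{'ok ' if ok else 'BAD'} {name}")
    print(f"holds still on the dbuf: {dbuf['db_holds']}")
```

Under these assumptions the reported dbuf is the first entry on the dnode's dn_dbufs list, and it still carries one hold (db_holds.rc_count = 0x1). That may be why it could not be evicted at unmount, although the dump alone does not show who owns the hold.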
Jürgen Keil wrote:
> On two x86 32-bit systems, running snv_34 bfu'ed to on-20060313 release bits [*], I'm getting a
> reproducible "dangling dbufs" panic when I try to poweroff these systems by pressing the ACPI
> power button. The problem does not happen when I use the "poweroff" or "halt" command.

A couple of questions:

- specifically what kind of systems are these?
- have you tried this in 64-bit mode?

I'll see if I can reproduce this here later today.

Dana
> Jürgen Keil wrote:
> > On two x86 32-bit systems, running snv_34 bfu'ed to on-20060313 release bits [*], I'm getting a
> > reproducible "dangling dbufs" panic when I try to poweroff these systems by pressing the ACPI
> > power button. The problem does not happen when I use the "poweroff" or "halt" command.
>
> A couple of questions:
>
> - specifically what kind of systems are these?

- a Toshiba Tecra S1 laptop (Pentium M / Centrino)
- a dual processor Tyan Tiger MP system (2x AMD MP CPU).

> - have you tried this in 64-bit mode?

Nope, no 64-bit mode is supported on these systems.
> Jürgen Keil wrote:
> > On two x86 32-bit systems, running snv_34 bfu'ed to on-20060313 release bits [*], I'm getting a
> > reproducible "dangling dbufs" panic when I try to poweroff these systems by pressing the ACPI
> > power button.
>
> I'll see if I can reproduce this here later today.

Now that on-20060320 is out, I compiled release and debug versions and bfu'ed to them. With on-20060320 I have so far been unable to reproduce the dangling dbufs panic. I tested both the unmodified on-20060320 bits and my modified zfs module that calls kmem_reap() when the kernel heap is >75% full.

To be sure, I booted from an alternate on-20060313 root ufs filesystem / on-20060313 zfs /usr filesystem, and in this environment the dangling dbufs panic remains reproducible.

Hmm, on-20060320 includes a change for zfs unmount:

BUG/RFE: 6393443 Remove remaining txg_wait_synced() from zfs unmount path.
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6393443

It seems as if 6393443 is a fix for this dangling dbufs panic...
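For readers wondering what the "kmem_reap() when the kernel heap is >75% full" modification amounts to, it is essentially a threshold check. The following is a minimal Python sketch of that idea only, not the actual kernel change. The function names, the byte-count parameters, and the reap callback (standing in for kmem_reap()) are all hypothetical.

```python
def heap_over_threshold(used_bytes, total_bytes, percent=75):
    """Return True when heap usage is strictly above `percent` of the total.

    Integer arithmetic is used to avoid floating-point rounding.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes * 100 > total_bytes * percent


def maybe_reap(used_bytes, total_bytes, reap, percent=75):
    """Call `reap` (a stand-in for kmem_reap()) if usage is above the threshold.

    Returns True if the reap callback was invoked.
    """
    if heap_over_threshold(used_bytes, total_bytes, percent):
        reap()
        return True
    return False


if __name__ == "__main__":
    calls = []
    # 800 MB used out of 1024 MB is about 78%, so the callback fires.
    fired = maybe_reap(800 << 20, 1024 << 20, lambda: calls.append("reap"))
    print(fired, calls)
```

The comparison is strict, so a heap that is exactly 75% full does not trigger a reap. Whether the real module uses a strict or inclusive comparison is not stated in this thread.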