I've experienced a hang (cannot re-establish a lost VNC session, ssh in as myself, or log in as root on the console) on an snv_80+ system. CTEact tells me:

Searching for mutex deadlocks:

### MUTEX ###

thread 2a102765ca0 is waiting for mutex 0x3000466e5c8
mutex is owned by thread 2a10282dca0

stack of thread 2a102765ca0
last ran: 2 mins 43 secs ago, 1 min 30 secs before panic
stack trace is:
unix: swtch ()
genunix: turnstile_block+0x5a4 (0,0,0x3000466e5c8,mutex_sobj_ops,0,0)
unix: mutex_vector_enter+0x528 (0x3000466e5c8)
unix: mutex_enter (0x3000466e5c8)
zfs: vdev_queue_io+0x7c (0x3004f73dd20)
zfs: vdev_disk_io_start+0x158 (0x3004f73dd20)
zfs: zio_vdev_io_start (0x3004f73dd20)
zfs: zio_execute+0xf4 (0x3004f73dd20)
zfs: zio_nowait (?)
zfs: vdev_mirror_io_start+0x1e4 (0x3004f754700)
zfs: zio_vdev_io_start (0x3004f754700)
zfs: zio_execute+0xf4 (0x3004f754700)
zfs: zio_nowait (?)
zfs: vdev_mirror_io_start+0x1e4 (0x3004e468828)
zfs: zio_vdev_io_start (0x3004e468828)
zfs: zio_execute+0xf4 (0x3004e468828)
genunix: taskq_thread+0x1f0 (0x3000506d3c0)
unix: thread_start+4 ()

stack of thread 2a10282dca0
last ran: 1 min 13 secs ago, 17 ticks before panic
stack trace is:
unix: swtch ()
genunix: cv_wait+0x5c (0x2a10282de46,0x2a10282de48)
genunix: delay+0x84 (0x19)
unix: page_resv+0x78 (3,0)
unix: segkmem_xalloc+0xcc (0x3000000a000,0,0x6000,0,0,segkmem_page_create,kvp)
unix: segkmem_alloc_vn+0xac (0x3000000a000,0x6000,0,kvp)
unix: segkmem_alloc (*0x3000001a060,0x6000,?)
genunix: vmem_xalloc+0x6dc (0x3000001a000,0x6000,0x2000,0,0,0,0,0)
genunix: vmem_alloc+0x210 (0x3000001a000,0x6000,0)
genunix: kmem_slab_create+0x44 (0x300060a9908,0)
genunix: kmem_slab_alloc+0x5c (0x300060a9908,0)
genunix: kmem_cache_alloc+0x144 (0x300060a9908,0)
zfs: zio_buf_alloc (0x5e00)
zfs: vdev_queue_io_to_issue+0x168 (0x3000466e528,0x23)
zfs: vdev_queue_io_done+0x54 (0x3005df0fbc8)
zfs: vdev_disk_io_done+4 (0x3005df0fbc8)
zfs: zio_vdev_io_done (0x3005df0fbc8)
zfs: zio_execute+0xf4 (0x3005df0fbc8)
genunix: taskq_thread+0x1f0 (0x3000506d4b0)
unix: thread_start+4 ()

There are 30 other threads, some of which have stacks like the one blocked in turnstile_block() above, and others that look more like:

thread 2a10281dca0 is waiting for mutex 0x3000466e5c8
mutex is owned by thread 2a10282dca0

stack of thread 2a10281dca0
last ran: 2 mins 43 secs ago, 1 min 30 secs before panic
stack trace is:
unix: swtch ()
genunix: turnstile_block+0x5a4 (0x30006630790,0,0x3000466e5c8,mutex_sobj_ops,0,0)
unix: mutex_vector_enter+0x528 (0x3000466e5c8)
unix: mutex_enter (0x3000466e5c8)
zfs: vdev_queue_io_done+0x9c (0x3004f49e5c0)
zfs: vdev_disk_io_done+4 (0x3004f49e5c0)
zfs: zio_vdev_io_done (0x3004f49e5c0)
zfs: zio_execute+0xf4 (0x3004f49e5c0)
genunix: taskq_thread+0x1f0 (0x3000506d4b0)
unix: thread_start+4 ()

This kernel is one that I built from bits a little newer than snv_80. The tip of my repository looks like:

changeset:   5732:d351713150c2
tag:         tip
user:        randyf
date:        Thu Dec 20 16:51:30 2007 -0800
summary:     6631164 AD_FORCE_SUSPEND_TO_RAM test case actually powers off, when it should just return

While all of this was happening, I was patiently waiting for this hg clone to finish. Normally it would be done in a few minutes; this time I let it run overnight.
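If I'm reading the stacks right, this is not a classic two-lock deadlock. Thread 2a10282dca0 took the vdev queue mutex in the io_done path and then went to sleep in a blocking allocation (zio_buf_alloc down through kmem_cache_alloc, ending in page_resv/delay), waiting for memory on a 768 MB machine. Every other ZFS taskq thread then piles up behind that same mutex in turnstile_block(). Here is a minimal sketch of the pattern as I read it; the names below are illustrative, not the actual ZFS code:

#include <sys/types.h>
#include <sys/mutex.h>
#include <sys/kmem.h>

#define	BUF_SIZE	0x5e00	/* matches zio_buf_alloc(0x5e00) above */

typedef struct io_queue {
	kmutex_t	q_lock;		/* per-vdev queue lock */
	/* ... queue state elided ... */
} io_queue_t;

/*
 * What the mutex owner (2a10282dca0) appears to be doing: take the
 * queue lock in the io_done path, then attempt a blocking (KM_SLEEP)
 * allocation to build the next I/O.  On a memory-starved box the
 * allocator parks this thread in page_resv()/delay() with the lock
 * still held.
 */
static void
queue_io_done(io_queue_t *q)
{
	void *buf;

	mutex_enter(&q->q_lock);
	buf = kmem_alloc(BUF_SIZE, KM_SLEEP);	/* may sleep indefinitely */
	/* ... issue the next I/O from buf ... */
	kmem_free(buf, BUF_SIZE);
	mutex_exit(&q->q_lock);
}

/*
 * What the other 30-odd taskq threads are doing: blocking on the same
 * lock in mutex_vector_enter()/turnstile_block(), exactly as in the
 * dump above.
 */
static void
queue_io(io_queue_t *q)
{
	mutex_enter(&q->q_lock);
	/* ... enqueue the zio ... */
	mutex_exit(&q->q_lock);
}

If memory cannot be freed without this pool completing I/O (dirty data being written back, say), nothing ever wakes the owner and the whole pipeline stays wedged. I'll leave it to the ZFS folks to say whether that reading is right.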
*** process id 104159 is /usr/bin/python /usr/bin/hg clone ssh://anon@hg.opensolaris.org/hg/onnv/onnv-ga, parent process is 104133
uid is 0x3e8 0t1000, gid is 0x3e8 0t1000
thread addr 3005131ed60, proc addr 300096658a8, lwp addr 3004754c0e8
t_state is 0x1 - TS_SLEEP
Scheduling info:
t_pri is 0x3b, t_epri is 0, t_cid is 0x1
scheduling class is: TS
t_disp_time: is 0xd0fd1c, 0t13696284
last ran: 10 hours 23 mins 0 secs ago, 10 hours 21 mins 47 secs before panic
on cpu 0
pc is 1104ba8, sp is 2a102ec9440, t_stk 2a102ec9ae0
stack trace is:
unix: swtch ()
genunix: cv_wait+0x5c (0x300492dde18,0x300492dde10)
zfs: zio_wait+0x54 (0x300492ddb88)
zfs: dmu_buf_hold_array_by_dnode+0x1c0 (0x3004a16b7f8,0,0x154,1,0x7ae76a35,0x2a102ec97fc,0x2a102ec97f0)
zfs: dmu_buf_hold_array+0x60 (0x300082ed2f0,?,0,0x154,1,0x7ae76a35,0x2a102ec97fc,0x2a102ec97f0)
zfs: dmu_read_uio+0x30 (0x300082ed2f0,*0x30049cc2070,0x2a102ec9a10,0x154)
zfs: zfs_read+0x1e8 (0x30023cecf80,0x2a102ec9a10,?,?,?)
genunix: fop_read+0x48 (0x30023cecf80,0x2a102ec9a10,0,0x300039600e8,0)
genunix: read+0x1fc (3,?,0x200?)
unix: syscall_trap32+0x1e8 ()

The system is a dual-processor Ultra 2 with 768 MB RAM. I've done very similar things on snv_76 and earlier. The key difference is that I had previously been running an unmirrored zpool. Now it looks like...

# zpool status
  pool: pool0
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        pool0         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t1d0s7  ONLINE       0     0     0
            c0t0d0s7  ONLINE       0     0     0

errors: No known data errors

I'll keep the crash dump around for a while in the event that someone has interest in digging into it more.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/