Hi. snv_39, SPARC. I have several pools (no redundancy on the ZFS side) with several filesystems inside each pool. Data are served by nfsd (over 3000 active threads). Recently I changed /etc/system:

set rpcmod:cotsmaxdupreqs=8192
set rpcmod:maxdupreqs=8192

And now I observe that every few hours nfsd stops issuing any IOs to most pools, and all of its threads (over 3000 right now, which is its configured limit) hang. Locally I can issue IOs to the zfs filesystems without any problem. After 10-25 minutes the problem goes away on its own, and then comes back later. Most nfsd threads are hanging in ZFS.

zpool iostat 1
nfs-s5-p0   2.45T  2.08T      0      0      0      0
nfs-s5-p1   1.60T  2.93T      0      0      0      0
nfs-s5-p2   1.39T  3.14T      0      0      0      0
nfs-s5-p3   41.2G  4.49T      0      0      0      0
nfs-s5-s8   4.40T   137G      0      7      0   255K
----------  -----  -----  -----  -----  -----  -----

mdb -kw
> ::ps!grep nfsd
R    320      1    320    320      1  0x42300902 0000030035d9a040 nfsd
> 0000030035d9a040::walk thread|::findstack -v
stack pointer for thread 30001305020: 2a100687021
[ 000002a100687021 cv_wait+0x40() ]
  000002a1006870d1 exitlwps+0x11c(0, 200000, 42000002, 30035d9a040, 100000, 30035d9a106)
  000002a100687181 proc_exit+0x1c(1, 0, ff131c80, 0, f, 18afe38)
  000002a100687231 exit+8(1, 0, ff131c80, 0, f, ff3a2400)
  000002a1006872e1 syscall_trap32+0xcc(0, 0, ff131c80, 0, f, ff3a2400)
stack pointer for thread 3004138b920: 2a102176621
[ 000002a102176621 cv_wait+0x40() ]
  000002a1021766d1 zil_commit+0x74(600012742ec, 26ca1, 10, 60001274280, 0, 26ca1)
  000002a102176781 zfs_fsync+0xa8(0, 0, 3000081cf94, 0, 300d7fe1000, 0)
  000002a102176831 fop_fsync+0x14(300d7fea040, 0, 300be5d1358, 3ade4e4, 0, 7ba3d40c)
  000002a1021768e1 rfs3_remove+0x22c(2a102177198, 2a102177398, 0, 2a102177698, 300be5d1358, 2a102177220)
  000002a102176ab1 common_dispatch+0x44c(2a102177698, 300c07cbdc0, 2a102177500, 6003298f200, 7017a1c0, 7bb9c7a8)
  000002a102176dd1 svc_getreq+0x210(300c07cbdc0, 600096837c0, 6003269bc50, 300844084f8, 18feb90, 6003269bac0)
  000002a102176f21 svc_run+0x194(60001125190, 0, 0, 1, 600011251c8, 30035d9a040)
  000002a102176fd1 nfssys+0x1a4(e, ff0a1f9c, 7bb2f800, c, c, 1d0)
  000002a1021772e1 syscall_trap32+0xcc(e, ff0a1f9c, 0, 0, 0, 0)
stack pointer for thread 300be20d600: 2a101862621
[ 000002a101862621 cv_wait+0x40() ]
  000002a1018626d1 zil_commit+0x74(300418bb62c, 26a38, 10, 300418bb5c0, 0, 26a38)
  000002a101862781 zfs_fsync+0xa8(0, 0, 6000724d994, 0, 30060eb0010, 0)
  000002a101862831 fop_fsync+0x14(30182c5fa00, 0, 300be5d0dd8, 3aee1e6, 0, 7ba3d40c)
  000002a1018628e1 rfs3_remove+0x22c(2a101863198, 2a101863398, 0, 2a101863698, 300be5d0dd8, 2a101863220)
  000002a101862ab1 common_dispatch+0x44c(2a101863698, 300c0892c80, 2a101863500, 6002fddb000, 7017a1c0, 7bb9c7a8)
  000002a101862dd1 svc_getreq+0x210(300c0892c80, 6001966b0c0, 6002fe72750, c00, 18feb90, 6002fe725c0)
  000002a101862f21 svc_run+0x194(60001125190, 0, 160, 1, 600011251c8, 30035d9a040)
  000002a101862fd1 nfssys+0x1a4(e, fefe1f9c, 7bb2f800, c, c, 1d0)
  000002a1018632e1 syscall_trap32+0xcc(e, fefe1f9c, 0, 0, 0, 0)
stack pointer for thread 3000131b960: 2a1002ae621
[ 000002a1002ae621 cv_wait+0x40() ]
  000002a1002ae6d1 zil_commit+0x74(6000574486c, 22049, 10, 60005744800, 0, 22049)
  000002a1002ae781 zfs_fsync+0xa8(0, 0, 60005770d14, 0, 3007b62a648, 0)
  000002a1002ae831 fop_fsync+0x14(3008549dd00, 0, 300416fc220, 3ad91ac, 0, 7ba3d40c)
  000002a1002ae8e1 rfs3_remove+0x22c(2a1002af198, 2a1002af398, 0, 2a1002af698, 300416fc220, 2a1002af220)
  000002a1002aeab1 common_dispatch+0x44c(2a1002af698, 600013a96c0, 2a1002af500, 300b7293180, 7017a1c0, 7bb9c7a8)
  000002a1002aedd1 svc_getreq+0x210(600013a96c0, 6002e3d2980, 60021510710, 60003f5d0f8, 18feb90, 60021510580)
  000002a1002aef21 svc_run+0x194(60001125190, 0, 0, 1, 600011251c8, 30035d9a040)
  000002a1002aefd1 nfssys+0x1a4(e, fef31f9c, 7bb2f800, c, c, 1d0)
  000002a1002af2e1 syscall_trap32+0xcc(e, fef31f9c, 0, 0, 0, 0)
stack pointer for thread 300b38dd620: 2a1031f64e1
[ 000002a1031f64e1 cv_wait+0x40() ]
  000002a1031f6591 zil_commit+0x74(300418bbdac, 27f84, 10, 300418bbd40, 39081e08, 27f84)
  000002a1031f6641 zfs_fsync+0xa8(0, 10000, 30090ab13d4, 0, 3014523f8c8, 0)
  000002a1031f66f1 fop_fsync+0x14(3009fde44c0, 10000, 60001002218, a, 39081e08, 7ba3d40c)
  000002a1031f67a1 rfs3_create+0x7bc(2a1031f7500, 2a1031f7080, 1, 0, 60001002218, 2a1031f7220)
  000002a1031f6ab1 common_dispatch+0x44c(2a1031f7698, 300bfd2ce40, 2a1031f7500, 300b70f2340, 7017a1c0, 7bb9b568)
  000002a1031f6dd1 svc_getreq+0x210(300bfd2ce40, 60034dda0c0, 300ea327690, 1e3, 18feb90, 300ea327500)
  000002a1031f6f21 svc_run+0x194(60001125190, 1, 0, 1, 600011251c8, 30035d9a040)
  000002a1031f6fd1 nfssys+0x1a4(e, fedf1f9c, 7bb2f800, c, c, 1d0)
  000002a1031f72e1 syscall_trap32+0xcc(e, fedf1f9c, 0, 0, 0, 0)
[...]

Using mpstat I can see that one CPU is 100% utilized:

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0   86  214  114   0    0    0    0   0     0   0   1   0  99
  1    0   0    0    1    0   0    0    0    0   0     0   0   0   0 100
  2    0   0    0   37    0  73    0    0    0   0     1   0   0   0 100
  3    0   0    3   12    0  22    0    1    1   0     0   0   0   0 100
  4    0   0    0    1    0   0    0    0    0   0     0   0   0   0 100
  5    0   0    2   10    0  18    0    0    0   0     0   0   0   0 100
  6    0   0    1    3    0   4    0    0    1   0     0   0   0   0 100
  7    0   0    3    9    0  20    0    0    0   0     0   0   1   0  99
  8    0   0    0    1    0   0    0    0    0   0     0   0   0   0 100
  9    0   0    0   10    2  14    0    0    0   0     0   0   1   0  99
 10    0   0    3    3    0   4    0    0    1   0     0   0   0   0 100
 11    0   0    0    1    0   0    0    0    0   0     0   0   0   0 100
 12    0   0    0    6    0  10    0    1    0   0     0   0   0   0 100
 13    0   0    2    6    0  10    0    0    0   0     0   0   0   0 100
 14    0   0    1    6    0  10    0    0    1   0   226   0   0   0 100
 15    0   0    0    1    0   0    0    0    0   0     0   0   0   0 100
 16    0   0    0    3    0   4    0    0    0   0     0   0   0   0 100
 17    0   0    0    1    0   0    0    0    0   0     0   0   0   0 100
 18    0   0    0    4    0   6    0    0    0   0     0   0   0   0 100
 19    0   0    0    6    0  10    0    0    0   0     0   0   0   0 100
 20    0   0    1    5    0   8    0    1    0   0    18   1   0   0  99
 21    0   0   20   24   21   4    0    0    0   0     0   0   0   0 100
 22    0   0   22   47   38  16    0    0    0   0     0   0   0   0 100
 23    0   0    4   16    4  22    0    1    1   0     0   0   0   0 100
 24    0   0    4    5    4   0    0    0    1   0     0   0   0   0 100
 25    0   0    0    1    0   0    0    0    0   0     0   0   0   0 100
 26    0   0    0    1    0   0    0    0    0   0     0   0   0   0 100
 27    0   0    0    1    0   0    0    0    0   0     0   0 100   0   0
 28    0   0    1   10    0  18    0    0    2   0     5   0   0   0 100
 29    0   0    2    8    0  14    0    0    0   0     0   0   0   0 100
 30    0   0    0    8    0  14    0    0    2   0   165   1   0   0  99
 31    0   0    2   11    0  20    0    0    0   0    19   0   0   0 100
^C
bash-3.00#

Well, I wanted to play with dtrace, but the problem (the CPU usage) has gone away, and now the system is almost 100% idle - but still no IOs. Locally I can issue IOs to these zfs filesystems without any problem. Finally nfsd stopped (I issued kill -9 to nfsd - and it exited after... I don't know, 10-15 minutes).

This message posted from opensolaris.org
I issued svcadm disable nfs/server. nfsd is still there, with about 1300 threads (down from 2052). mpstat shows at least one CPU with 0% idle all the time, and:

bash-3.00# dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt=0;}' -n tick-10s'{printa(@);exit(0);}'
[...]
  page_add_common          10514532
  bcopy                    10545304
  page_vpsub               11011128
  page_try_reclaim_lock    11599172
  sfmmu_mlist_enter        12704152
  sfmmu_mlist_exit         13718000
  page_next_scan_large     17375924
  mutex_vector_enter       27981368
  pid_entry                33161076
  sfmmu_mlspl_enter        33803644
  send_mondo_set           38210784
  xc_serv                  44234968
  disp_getwork             83185868
  avl_walk                356750584
  disp_anywork            583123100

> ::ps!grep nfsd
R   3865      1   3865   3865      1  0x42300002 00000300116ec7e0 nfsd
> 00000300116ec7e0::walk thread|::findstack -v
stack pointer for thread 3002f4bd300: 2a1084b7021
[ 000002a1084b7021 cv_wait+0x40() ]
  000002a1084b70d1 exitlwps+0x11c(0, 200000, 42000002, 300116ec7e0, 100000, 300116ec8a6)
  000002a1084b7181 proc_exit+0x1c(1, 0, ffbff9d0, 0, f, 18afe38)
  000002a1084b7231 exit+8(1, 0, ffbff9d0, 0, f, ff3a2000)
  000002a1084b72e1 syscall_trap32+0xcc(0, 0, ffbff9d0, 0, f, ff3a2000)
stack pointer for thread 301b39883a0: 2a106e4e4e1
[ 000002a106e4e4e1 cv_wait+0x40() ]
  000002a106e4e591 zil_commit+0x74(300418bb62c, 29805, 10, 300418bb5c0, 93d0ed08, 29805)
  000002a106e4e641 zfs_fsync+0xa8(0, 10000, 6000724d994, 0, 300418fe008, 0)
  000002a106e4e6f1 fop_fsync+0x14(300418f07c0, 10000, 3000f8ad200, a, 93d0ed08, 7ba3d40c)
  000002a106e4e7a1 rfs3_create+0x7bc(2a106e4f500, 2a106e4f080, 1, 0, 3000f8ad200, 2a106e4f220)
  000002a106e4eab1 common_dispatch+0x44c(2a106e4f698, 6002bbf2e40, 2a106e4f500, 6003298f200, 7017a1c0, 7bb9b568)
  000002a106e4edd1 svc_getreq+0x210(6002bbf2e40, 60003b42800, 301d3cee1d0, 3015c253af8, 18feb90, 301d3cee040)
  000002a106e4ef21 svc_run+0x194(6002410c8f8, 0, 0, 1, 6002410c930, 300116ec7e0)
  000002a106e4efd1 nfssys+0x1a4(e, ff0d1f9c, 7bb2f800, c, c, 1d0)
  000002a106e4f2e1 syscall_trap32+0xcc(e, ff0d1f9c, 0, 0, 0, 0)
stack pointer for thread 301bcd8e340: 2a107f364e1
[ 000002a107f364e1 cv_wait+0x40() ]
  000002a107f36591 zil_commit+0x74(300418bb62c, 2980a, 10, 300418bb5c0, 93d0ed08, 2980a)
  000002a107f36641 zfs_fsync+0xa8(0, 10000, 6000724d994, 0, 3013617ca80, 0)
  000002a107f366f1 fop_fsync+0x14(300360f6a40, 10000, 301c07bea58, a, 93d0ed08, 7ba3d40c)
  000002a107f367a1 rfs3_create+0x7bc(2a107f37500, 2a107f37080, 1, 0, 301c07bea58, 2a107f37220)
  000002a107f36ab1 common_dispatch+0x44c(2a107f37698, 6002dd3e900, 2a107f37500, 6003298f200, 7017a1c0, 7bb9b568)
  000002a107f36dd1 svc_getreq+0x210(6002dd3e900, 6001fd85180, 3019a3c4190, 171, 18feb90, 3019a3c4000)
  000002a107f36f21 svc_run+0x194(6002410c8f8, 0, 0, 1, 6002410c930, 300116ec7e0)
  000002a107f36fd1 nfssys+0x1a4(e, ff091f9c, 7bb2f800, c, c, 1d0)
  000002a107f372e1 syscall_trap32+0xcc(e, ff091f9c, 0, 0, 0, 0)
stack pointer for thread 3007fab5900: 2a1062464e1
[ 000002a1062464e1 cv_wait+0x40() ]
  000002a106246591 zil_commit+0x74(6000127546c, 25af2, 10, 60001275400, 29646c08, 25af2)
  000002a106246641 zfs_fsync+0xa8(0, 10000, 6000724c494, 0, 3012b2776b0, 0)
  000002a1062466f1 fop_fsync+0x14(300e667dcc0, 10000, 300416fd090, a, 29646c08, 7ba3d40c)
  000002a1062467a1 rfs3_create+0x7bc(2a106247500, 2a106247080, 1, 0, 300416fd090, 2a106247220)
  000002a106246ab1 common_dispatch+0x44c(2a106247698, 30039c25c00, 2a106247500, 30009ca9440, 7017a1c0, 7bb9b568)
  000002a106246dd1 svc_getreq+0x210(30039c25c00, 3005bfc0d80, 300ea327690, 30012d4d2b8, 18feb90, 300ea327500)
  000002a106246f21 svc_run+0x194(6002410c8f8, 0, 0, 1, 6002410c930, 300116ec7e0)
  000002a106246fd1 nfssys+0x1a4(e, ff081f9c, 7bb2f800, c, c, 1d0)
  000002a1062472e1 syscall_trap32+0xcc(e, ff081f9c, 0, 0, 0, 0)
stack pointer for thread 301d6b6c6c0: 2a10849e4e1
[ 000002a10849e4e1 cv_wait+0x40() ]
  000002a10849e591 zil_commit+0x74(6000574486c, 23d14, 10, 60005744800, c8968e08, 23d14)
  000002a10849e641 zfs_fsync+0xa8(0, 10000, 60005770d14, 0, 30191847508, 0)
  000002a10849e6f1 fop_fsync+0x14(3012a1bd480, 10000, 301e272a390, a, c8968e08, 7ba3d40c)
  000002a10849e7a1 rfs3_create+0x7bc(2a10849f500, 2a10849f080, 1, 0, 301e272a390, 2a10849f220)
  000002a10849eab1 common_dispatch+0x44c(2a10849f698, 6002dd3f540, 2a10849f500, 300b72a7ac0, 7017a1c0, 7bb9b568)
  000002a10849edd1 svc_getreq+0x210(6002dd3f540, 60010279140, 301a20eac90, 301105bb638, 18feb90, 301a20eab00)
  000002a10849ef21 svc_run+0x194(6002410c8f8, 0, 0, 1, 6002410c930, 300116ec7e0)
  000002a10849efd1 nfssys+0x1a4(e, ff001f9c, 7bb2f800, c, c, 1d0)
  000002a10849f2e1 syscall_trap32+0xcc(e, ff001f9c, 0, 0, 0, 0)
stack pointer for thread 30023ed2c80: 2a1039964e1
[ 000002a1039964e1 cv_wait+0x40() ]
  000002a103996591 zil_commit+0x74(30039c275ec, 27e41, 10, 30039c27580, a0369a08, 27e41)
  000002a103996641 zfs_fsync+0xa8(0, 10000, 30090ab16d4, 0, 300b3e12e58, 0)
  000002a1039966f1 fop_fsync+0x14(3000300a9c0, 10000, 30020ef5410, a, a0369a08, 7ba3d40c)
  000002a1039967a1 rfs3_create+0x7bc(2a103997500, 2a103997080, 1, 0, 30020ef5410, 2a103997220)
  000002a103996ab1 common_dispatch+0x44c(2a103997698, 300be5a4580, 2a103997500, 60030100100, 7017a1c0, 7bb9b568)
  000002a103996dd1 svc_getreq+0x210(300be5a4580, 600055c6080, 6002d688c90, 60034e68bb8, 18feb90, 6002d688b00)
  000002a103996f21 svc_run+0x194(6002410c8f8, 0, 0, 1, 6002410c930, 300116ec7e0)
  000002a103996fd1 nfssys+0x1a4(e, fefa1f9c, 7bb2f800, c, c, 1d0)
  000002a1039972e1 syscall_trap32+0xcc(e, fefa1f9c, 0, 0, 0, 0)

So these threads are in zfs again. I see no disk activity (in both iostat and zpool iostat).

bash-3.00# dtrace -n fbt::avl_walk:entry'{@[stack()]=count();}'
[about 5s]
[...]
              genunix`rm_assize+0x1a4
              procfs`prgetpsinfo32+0x3a4
              procfs`pr_read_psinfo_32+0x38
              genunix`fop_read+0x20
              genunix`read+0x29c
              unix`syscall_trap32+0xcc
            10720

              zfs`metaslab_ff_alloc+0x98
              zfs`space_map_alloc+0x10
              zfs`metaslab_group_alloc+0x1d4
              zfs`metaslab_alloc_dva+0x10c
              zfs`metaslab_alloc+0x2c
              zfs`zio_write_allocate_gang_members+0x328
              zfs`zio_write_compress+0x1e4
              zfs`arc_write+0xbc
              zfs`dbuf_sync+0x6b0
              zfs`dnode_sync+0x300
              zfs`dmu_objset_sync_dnodes+0x68
              zfs`dmu_objset_sync+0x50
              zfs`dsl_dataset_sync+0xc
              zfs`dsl_pool_sync+0x60
              zfs`spa_sync+0xe0
              zfs`txg_sync_thread+0x130
              unix`thread_start+0x4
           362020

              zfs`metaslab_ff_alloc+0x98
              zfs`space_map_alloc+0x10
              zfs`metaslab_group_alloc+0x1d4
              zfs`metaslab_alloc_dva+0x10c
              zfs`metaslab_alloc+0x2c
              zfs`zio_dva_allocate+0x50
              zfs`zio_write_compress+0x1e4
              zfs`arc_write+0xbc
              zfs`dbuf_sync+0x6b0
              zfs`dnode_sync+0x300
              zfs`dmu_objset_sync_dnodes+0x68
              zfs`dmu_objset_sync+0x50
              zfs`dsl_dataset_sync+0xc
              zfs`dsl_pool_sync+0x60
              zfs`spa_sync+0xe0
              zfs`txg_sync_thread+0x130
              unix`thread_start+0x4
           439189

Eventually, after 3-5 minutes, nfsd exited.
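The dominant stacks above are the txg sync thread inside metaslab_ff_alloc (ZFS's first-fit metaslab allocator) walking the AVL tree of free segments, with the zio_write_allocate_gang_members path suggesting no single free segment was big enough. A toy Python model (hypothetical segment sizes, not ZFS's actual data structures) of why a first-fit scan gets expensive once a nearly-full pool is fragmented:

```python
def first_fit(free_segments, size):
    """Scan free segments in offset order and return the index of the
    first one large enough -- a sketch of first-fit allocation."""
    for i, (offset, length) in enumerate(free_segments):
        if length >= size:
            return i
    return None  # nothing fits: a real allocator would fall back to gang blocks

# A nearly-full, fragmented map: many tiny fragments, one big extent at the end.
fragmented = [(i * 10, 1) for i in range(10000)] + [(200000, 128)]
# A healthy map: a big extent up front.
healthy = [(0, 128)] + [(i * 10 + 1000, 1) for i in range(10000)]

print(first_fit(fragmented, 128), first_fit(healthy, 128))  # -> 10000 0
```

On the fragmented map every allocation walks past all 10000 tiny segments before finding space; with this pool at 4.40T used / ~137G free, that kind of walk happening for every block in spa_sync would plausibly explain one CPU pegged in avl_walk while all IO queues behind the sync thread.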
It's not only when I try to stop nfsd - during normal operations I see that one CPU has 0% idle, all traffic goes to only one pool (and this is very small traffic), and all nfs threads hang - I guess all these threads are for this pool.

bash-3.00# zpool iostat 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
nfs-s5-p0   2.46T  2.07T      9     44   591K  2.13M
nfs-s5-p1   1.61T  2.92T     11     57   690K  2.79M
nfs-s5-p2   1.41T  3.12T     14     98   923K  7.71M
nfs-s5-p3   42.4G  4.49T      1     19   122K   570K
nfs-s5-s8   4.40T   134G     40     36  1.69M   775K
----------  -----  -----  -----  -----  -----  -----
nfs-s5-p0   2.46T  2.07T      0      0      0      0
nfs-s5-p1   1.61T  2.92T      0      0      0      0
nfs-s5-p2   1.41T  3.12T      0      0      0      0
nfs-s5-p3   42.4G  4.49T      0      0      0      0
nfs-s5-s8   4.40T   134G      0     27      0  31.6K
----------  -----  -----  -----  -----  -----  -----
nfs-s5-p0   2.46T  2.07T      0      0      0      0
nfs-s5-p1   1.61T  2.92T      0      0      0      0
nfs-s5-p2   1.41T  3.12T      0      0      0      0
nfs-s5-p3   42.4G  4.49T      0      0      0      0
nfs-s5-s8   4.40T   134G      0     27      0  36.6K
----------  -----  -----  -----  -----  -----  -----
nfs-s5-p0   2.46T  2.07T      0      0      0      0
nfs-s5-p1   1.61T  2.92T      0      0      0      0
nfs-s5-p2   1.41T  3.12T      0      0      0      0
nfs-s5-p3   42.4G  4.49T      0      0      0      0
nfs-s5-s8   4.40T   134G      0     29      0  37.1K
----------  -----  -----  -----  -----  -----  -----
^C
bash-3.00#

Pool nfs-s5-s8 is:

nfs-s5-s8            4.40T  61.3G  39.5K  /nfs-s5-s8
nfs-s5-s8/d5201       395G  61.3G   357G  /nfs-s5-s8/d5201
nfs-s5-s8/d5201@r1   17.2G      -   331G  -
nfs-s5-s8/d5201@r2    467M      -   314G  -
nfs-s5-s8/d5202       385G  61.3G   349G  /nfs-s5-s8/d5202
nfs-s5-s8/d5202@r1   16.9G      -   320G  -
nfs-s5-s8/d5202@r2    457M      -   304G  -
nfs-s5-s8/d5203       392G  61.3G   353G  /nfs-s5-s8/d5203
nfs-s5-s8/d5203@r1   17.8G      -   326G  -
nfs-s5-s8/d5203@r2    496M      -   309G  -
nfs-s5-s8/d5204       381G  61.3G   344G  /nfs-s5-s8/d5204
nfs-s5-s8/d5204@r1   17.1G      -   315G  -
nfs-s5-s8/d5204@r2    482M      -   299G  -
nfs-s5-s8/d5205       381G  61.3G   346G  /nfs-s5-s8/d5205
nfs-s5-s8/d5205@r1   14.9G      -   316G  -
nfs-s5-s8/d5205@r2    357M      -   302G  -
nfs-s5-s8/d5206       383G  61.3G   348G  /nfs-s5-s8/d5206
nfs-s5-s8/d5206@r1   14.6G      -   317G  -
nfs-s5-s8/d5206@r2    355M      -   303G  -
nfs-s5-s8/d5207       331G  61.3G   321G  /nfs-s5-s8/d5207
nfs-s5-s8/d5207@r1   10.3G      -   243G  -
nfs-s5-s8/d5208       314G  61.3G   303G  /nfs-s5-s8/d5208
nfs-s5-s8/d5208@r1   10.9G      -   250G  -
nfs-s5-s8/d5209       323G  61.3G   311G  /nfs-s5-s8/d5209
nfs-s5-s8/d5209@r1   11.4G      -   258G  -
nfs-s5-s8/d5210       382G  61.3G   369G  /nfs-s5-s8/d5210
nfs-s5-s8/d5210@r1   13.2G      -   317G  -
nfs-s5-s8/d5211       409G  61.3G   396G  /nfs-s5-s8/d5211
nfs-s5-s8/d5211@r1   13.2G      -   323G  -
nfs-s5-s8/d5212       429G  61.3G   417G  /nfs-s5-s8/d5212
nfs-s5-s8/d5212@r1   11.5G      -   252G  -

Right now I can't export that pool or destroy a snapshot in that pool - both commands are hanging.

bash-3.00# zfs destroy nfs-s5-s8/d5212@r1
[it's been sitting here for ~3 minutes]
bash-3.00# zpool export nfs-s5-s8
^C^C^C^C
[it's been sitting here for ~3 minutes]

bash-3.00# mdb -kw
Loading modules: [ unix krtld genunix specfs dtrace ufs sd px md ip sctp usba lofs zfs random qlc fctl fcp ssd nfs crypto ptm ]
> ::ps!grep zfs
R    720    348    720    342      0  0x4a004000 0000060005673378 zfs
> 0000060005673378::walk thread|::findstack -v
stack pointer for thread 300024ea060: 2a102c4cc31
[ 000002a102c4cc31 cv_wait+0x40() ]
  000002a102c4cce1 txg_wait_synced+0x54(3000052eb90, 43ca2, 300419b600f, 3000052ebd0, 3000052ebd2, 3000052eb88)
  000002a102c4cd91 dsl_dataset_destroy+0x64(300419b6000, 5, 7ba13d84, 2a102c4d828, 300419b600f, 3000052eac0)
  000002a102c4cf71 dmu_objset_destroy+0x3c(300419b6000, 0, 0, 2, 0, 700d0f08)
  000002a102c4d031 zfsdev_ioctl+0x158(700d0c00, 54, ffbfedb8, 1c, 70, 300419b6000)
  000002a102c4d0e1 fop_ioctl+0x20(600189b61c0, 5a1c, ffbfedb8, 100003, 60019bd6850, 11fc888)
  000002a102c4d191 ioctl+0x184(4, 600011c4078, ffbfedb8, 4, 40490, 5a1c)
  000002a102c4d2e1 syscall_trap32+0xcc(4, 5a1c, ffbfedb8, 4, 40490, 2)
> ::ps!grep zpool
R    568    445    568    439      0  0x4a004000 00000301249b1000 zpool
> 00000301249b1000::walk thread|::findstack -v
stack pointer for thread 300027adc40: 2a103936f41
[ 000002a103936f41 cv_wait+0x40() ]
  000002a103936ff1 zil_commit+0x74(6001969f72c, ffffffffffffffff, 10, 6001969f6c0, 300011ba000, bfa)
  000002a1039370a1 zfs_sync+0x9c(6001977ca80, 0, 60019759540, 60019759594, 0, 0)
  000002a103937151 dounmount+0x28(6001977ca80, 0, 6002a3312a8, 600196ba940, 300011ba000, 300c8d92780)
  000002a103937201 umount2+0x12c(4c4658, 0, ffbfb6b8, 3ef6c, 0, ff38c194)
  000002a1039372e1 syscall_trap32+0xcc(4c4658, 0, ffbfb6b8, 3ef6c, 20a50, ff38c194)
>

bash-3.00# dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt=0;}' -n tick-10s'{printa(@);exit(0);}'
[...]
  resume                   6644840
  hwblkclr                 8601940
  lock_set_spl_spin       11818300
  send_mondo_set          15736288
  pid_entry               25685856
  page_next_scan_large    49429932
  xc_serv                 50147308
  disp_getwork            65131372
  mutex_vector_enter     220272764
  avl_walk               253932168
  disp_anywork           393395416

Maybe it's because of the snapshots in nfs-s5-s8?

OK, I did BREAK/sync while nfsd, zpool and zfs were hanging. IOs were going only to nfs-s5-s8. A crashdump could be provided - sorry, but not for public eyes.
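The hung zfs destroy above is parked in txg_wait_synced(): the ioctl cannot return until the sync thread has pushed its transaction group to disk, so if spa_sync is stuck in allocation, every administrative command queues behind it. A minimal Python sketch of that wait protocol (a toy model, not the actual ZFS code; names like TxgState are mine):

```python
import threading

class TxgState:
    """Toy model of txg_wait_synced(): callers sleep on a condition
    variable until the sync thread has synced their txg."""
    def __init__(self):
        self.synced_txg = 0
        self.cv = threading.Condition()

    def wait_synced(self, txg):
        # What `zfs destroy` is doing in the stack above: cv_wait in a loop.
        with self.cv:
            while self.synced_txg < txg:
                self.cv.wait()

    def sync_thread_finished(self, txg):
        # Only the sync thread completing a txg can release the waiters.
        with self.cv:
            self.synced_txg = txg
            self.cv.notify_all()

state = TxgState()
waiter = threading.Thread(target=state.wait_synced, args=(2,))
waiter.start()
state.sync_thread_finished(1)   # not enough: the waiter keeps sleeping
state.sync_thread_finished(2)   # sync completes; "destroy" can return
waiter.join(timeout=5)
print(waiter.is_alive())        # -> False
```

The point of the sketch: nothing in wait_synced() can time out or make progress on its own, which matches the observation that zfs destroy, zpool export, and the fsync-heavy nfsd threads all hang together until whatever is stalling the sync thread clears.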
Robert Milkowski wrote:
> I issued svcadm disable nfs/server
> nfsd is still there with about 1300 threads (down from 2052).
>
> stack pointer for thread 3002f4bd300: 2a1084b7021
> [ 000002a1084b7021 cv_wait+0x40() ]
>   000002a1084b70d1 exitlwps+0x11c(0, 200000, 42000002, 300116ec7e0, 100000, 300116ec8a6)
>   000002a1084b7181 proc_exit+0x1c(1, 0, ffbff9d0, 0, f, 18afe38)
>   000002a1084b7231 exit+8(1, 0, ffbff9d0, 0, f, ff3a2000)
>   000002a1084b72e1 syscall_trap32+0xcc(0, 0, ffbff9d0, 0, f, ff3a2000)
> stack pointer for thread 301b39883a0: 2a106e4e4e1
> [ 000002a106e4e4e1 cv_wait+0x40() ]
>   000002a106e4e591 zil_commit+0x74(300418bb62c, 29805, 10, 300418bb5c0, 93d0ed08, 29805)
>   000002a106e4e641 zfs_fsync+0xa8(0, 10000, 6000724d994, 0, 300418fe008, 0)
>   000002a106e4e6f1 fop_fsync+0x14(300418f07c0, 10000, 3000f8ad200, a, 93d0ed08, 7ba3d40c)
>   000002a106e4e7a1 rfs3_create+0x7bc(2a106e4f500, 2a106e4f080, 1, 0, 3000f8ad200, 2a106e4f220)
>   000002a106e4eab1 common_dispatch+0x44c(2a106e4f698, 6002bbf2e40, 2a106e4f500, 6003298f200, 7017a1c0, 7bb9b568)
>   000002a106e4edd1 svc_getreq+0x210(6002bbf2e40, 60003b42800, 301d3cee1d0, 3015c253af8, 18feb90, 301d3cee040)
>   000002a106e4ef21 svc_run+0x194(6002410c8f8, 0, 0, 1, 6002410c930, 300116ec7e0)
>   000002a106e4efd1 nfssys+0x1a4(e, ff0d1f9c, 7bb2f800, c, c, 1d0)
>   000002a106e4f2e1 syscall_trap32+0xcc(e, ff0d1f9c, 0, 0, 0, 0)
> [remaining stacks snipped - all in zil_commit via zfs_fsync/rfs3_create, as quoted above]

It looks like a fair number of these threads are doing fsync(), which looks similar to 6404018:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6404018

and it might be related to CR 6413510 (when it rains, it pours - I just mentioned this CR in another note).

Dana
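Dana's fsync observation fits the stacks: every NFSv3 CREATE or REMOVE ends in fop_fsync -> zil_commit, and log commits are batched - later callers piggyback on an in-progress commit and all sleep on the same condition variable, so one stalled commit parks every nfsd thread at once. A rough Python sketch of that group-commit pattern (illustrative only, not the actual ZIL code; LogCommitter and its fields are invented names):

```python
import threading

class LogCommitter:
    """Toy group-commit: many fsync() callers, few physical log writes.
    The first caller becomes the writer; later callers just wait for a
    commit that covers their sequence number."""
    def __init__(self):
        self.lock = threading.Condition()
        self.committed_seq = 0   # highest seq known to be "on disk"
        self.next_seq = 0        # highest seq appended to the log
        self.writer_busy = False
        self.flushes = 0         # number of physical log writes

    def append(self):
        with self.lock:
            self.next_seq += 1
            return self.next_seq

    def commit(self, seq):
        with self.lock:
            while self.committed_seq < seq:
                if self.writer_busy:
                    self.lock.wait()   # piggyback on the commit in flight
                    continue
                self.writer_busy = True
                target = self.next_seq
                self.lock.release()
                try:
                    pass               # the (possibly slow) log write goes here
                finally:
                    self.lock.acquire()
                self.flushes += 1
                self.committed_seq = target
                self.writer_busy = False
                self.lock.notify_all()

committer = LogCommitter()

def fsync_caller():
    # Each "nfsd thread": append a record, then require it to be committed.
    committer.commit(committer.append())

threads = [threading.Thread(target=fsync_caller) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(committer.committed_seq, committer.flushes)
```

If the log write itself stalls (in this thread's case, apparently allocation from a nearly-full, fragmented pool), writer_busy never clears and every caller sits in the equivalent of cv_wait - which would look exactly like the thousands of zil_commit+0x74 frames quoted above.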
The system took over 30 minutes to boot (it was stuck just after the ufs checking was done) - I could see on the array that the disks in pool nfs-s5-s8 were blinking, so I guess it was hanging during the zfs import/mount. This was reported during the ZFS beta and was supposed to be fixed - looks like it's not.