hi,
i have an x86 system here running zfs:
SunOS hyacinth 5.11 snv_27 i86pc i386 i86pc
>zpool status
pool: dsk
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
dsk ONLINE 0 0 0
c1d1s0 ONLINE 0 0 0
c1d0s0 ONLINE 0 0 0>zfs list
NAME USED AVAIL REFER MOUNTPOINT
dsk 27.1G 115G 35.8M /dsk
dsk/home 245M 115G 245M /dsk/home
dsk/mp3 25.0G 115G 25.0G /dsk/mp3
dsk/on 52.2M 115G 52.2M /dsk/on
dsk/pkgmaps 2.04M 115G 2.04M /dsk/pkgmaps
dsk/pkgs 1.61G 115G 9.00K /dsk/pkgs
dsk/pkgs/nv27a 1.61G 115G 1.61G /dsk/pkgs/nv27a
dsk/tars 154M 115G 154M /dsk/tars
after running for a couple of days, two processes (mozilla & svn) hang with
about 30% CPU use. both of them are unkillable. system CPU is >90%:
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr cd cd cd s0 in sy cs us sy id
0 0 0 1529052 10960 1 9 1 0 0 0 0 0 0 0 0 28565 967 77636 2 93 6
1 0 0 1529052 10960 0 0 0 0 0 0 0 0 0 0 0 28542 906 77407 2 93 6
1 0 0 1529052 10960 0 0 0 0 0 0 0 0 0 0 0 28349 929 77550 2 93 5
0 0 0 1529052 10960 0 0 1 0 0 0 0 0 0 0 0 28431 907 77597 2 93 6
1 0 0 1529052 10960 0 0 0 0 0 0 0 0 0 0 0 28356 912 77137 2 93 5
0 0 0 1529052 10956 0 0 1 0 0 0 0 0 0 0 0 28285 914 77165 2 93 5
0 0 0 1529052 10960 0 0 0 0 0 0 0 4 0 0 0 28390 922 77326 2 93 5
0 0 0 1529052 10956 0 0 1 0 0 0 0 0 0 0 0 27722 1198 77365 3 92 5
0 0 0 1529052 10956 0 0 0 0 0 0 0 0 0 0 0 28219 901 77294 2 93 5
0 0 0 1529052 10956 0 0 0 0 0 0 0 0 0 0 0 28244 931 77020 2 93 5
with a lot of lock contention:
>lockstat sleep 15
Adaptive mutex spin: 869541 events in 16.107 seconds (53985 events/sec)
Count indv cuml rcnt spin Lock Caller
-------------------------------------------------------------------------------
189407 22% 22% 0.00 56 0xd9f6df70 dbuf_remove_ref+0x44
140175 16% 38% 0.00 103 0xd45f1b6c cv_wait+0x54
124045 14% 52% 0.00 39 dbuf_hash_table+0x5cc dbuf_find+0x57
75292 9% 61% 0.00 6 0xd9f6b1bc dbuf_find+0xa0
58586 7% 68% 0.00 54 0xd9f6b1bc dbuf_remove_ref+0x58
30169 3% 71% 0.00 68 0xd45f1b6c txg_quiesce_thread+0x117
29615 3% 74% 0.00 224 0xd47daac0 txg_quiesce+0x3c
29531 3% 78% 0.00 318 0xd47daac0 txg_rele_to_sync+0x1a
26134 3% 81% 0.00 37 0xdd8c4c6c
dsl_dataset_block_freeable+0x21
25057 3% 84% 0.00 183 0xd47dab00 txg_rele_to_sync+0x1a
24927 3% 87% 0.00 209 0xd47dab00 txg_quiesce+0xb0
24301 3% 89% 0.00 80 0xd45f1b6c txg_wait_open+0x1e
19829 2% 92% 0.00 47 0xdd8c4c6c
dsl_dir_tempreserve_clear+0x3a
18359 2% 94% 0.00 60 0xd45f1b6c txg_sync_thread+0x1eb
17324 2% 96% 0.00 129 0xd47daac0 txg_hold_open+0x26
17007 2% 98% 0.00 201 0xd47dab00 txg_quiesce+0x3c
7494 1% 99% 0.00 115 0xd47dab00 txg_hold_open+0x26
6104 1% 99% 0.00 63 0xdd8c4c6c
dsl_dir_tempreserve_impl+0x3a
3189 0% 100% 0.00 55 0xd4a78e2c
dsl_dir_tempreserve_clear+0x3a
858 0% 100% 0.00 15 0xd8c8fd18 dnode_hold_impl+0x170
660 0% 100% 0.00 17 0xd8c8fd18 dmu_tx_try_assign+0xb2
550 0% 100% 0.00 13 0xd4a78e2c
dsl_dir_tempreserve_impl+0x3a
495 0% 100% 0.00 7 0xd8c8fd18 dmu_tx_unassign+0x45
398 0% 100% 0.00 311 0xd47daac0 txg_quiesce+0xb0
19 0% 100% 0.00 22 0xdb15a000 callout_execute+0xa3
5 0% 100% 0.00 2 0xdb162000 untimeout+0x24
4 0% 100% 0.00 60 0xdb15a000 timeout_common+0x15
3 0% 100% 0.00 4 0xdb15a000 untimeout+0x24
1 0% 100% 0.00 1 0xdb162000 timeout_common+0x15
1 0% 100% 0.00 16 0xdb15a000 callout_execute+0x18
1 0% 100% 0.00 5 0xd4189400 post_syscall+0x2a7
1 0% 100% 0.00 2 0xd4365000 timeout_common+0x15
-------------------------------------------------------------------------------
Adaptive mutex block: 2367 events in 16.107 seconds (147 events/sec)
Count indv cuml rcnt nsec Lock Caller
-------------------------------------------------------------------------------
860 36% 36% 0.00 148368 0xd45f1b6c cv_wait+0x54
258 11% 47% 0.00 264247 0xd9f6df70 dbuf_remove_ref+0x44
240 10% 57% 0.00 377724 0xd45f1b6c txg_wait_open+0x1e
216 9% 66% 0.00 58067 dbuf_hash_table+0x5cc dbuf_find+0x57
159 7% 73% 0.00 63749 0xd47daac0 txg_quiesce+0x3c
148 6% 79% 0.00 57023 0xd47dab00 txg_quiesce+0x3c
73 3% 83% 0.00 58242 0xd9f6b1bc dbuf_find+0xa0
58 2% 85% 0.00 58718 0xd47daac0 txg_rele_to_sync+0x1a
54 2% 87% 0.00 101248 0xd9f6b1bc dbuf_remove_ref+0x58
45 2% 89% 0.00 60380 0xd45f1b6c txg_sync_thread+0x1eb
42 2% 91% 0.00 54663 0xd45f1b6c txg_quiesce_thread+0x117
35 1% 92% 0.00 579898 0xd47dab00 txg_hold_open+0x26
35 1% 94% 0.00 348679 0xd47daac0 txg_hold_open+0x26
32 1% 95% 0.00 53097 0xd47dab00 txg_quiesce+0xb0
29 1% 96% 0.00 54720 0xd47daac0 cv_wait+0x54
22 1% 97% 0.00 64088 0xdd8c4c6c
dsl_dataset_block_freeable+0x21
19 1% 98% 0.00 126725 0xd47dab00 txg_rele_to_sync+0x1a
14 1% 99% 0.00 302130 0xd8c8fd18 dnode_hold_impl+0x170
10 0% 99% 0.00 83035 0xd8c8fd18 dmu_tx_try_assign+0xb2
7 0% 100% 0.00 103280 0xd4a78e2c
dsl_dir_tempreserve_impl+0x3a
5 0% 100% 0.00 53437 0xdd8c4c6c
dsl_dir_tempreserve_clear+0x3a
4 0% 100% 0.00 75689 0xd8c8fd18 dmu_tx_unassign+0x45
1 0% 100% 0.00 91938 0xdd8c4c6c
dsl_dir_tempreserve_impl+0x3a
1 0% 100% 0.00 22649 0xd4a78e2c
dsl_dir_tempreserve_clear+0x3a
-------------------------------------------------------------------------------
Spin lock spin: 14238 events in 16.107 seconds (884 events/sec)
Count indv cuml rcnt spin Lock Caller
-------------------------------------------------------------------------------
9093 64% 64% 0.00 37 0xd467a760 lock_set_spl+0x41
4002 28% 92% 0.00 15 cpu0_disp lock_set_spl+0x41
586 4% 96% 0.00 73 0xd467a760 disp_lock_enter_high+0xe
256 2% 98% 0.00 62 cpu0_disp disp_lock_enter_high+0xe
85 1% 98% 0.00 155 turnstile_table+0x744 lock_set_spl+0x41
57 0% 99% 0.00 167 0xd70c5799 lock_set_spl+0x41
42 0% 99% 0.00 165 turnstile_table+0x65c lock_set_spl+0x41
33 0% 99% 0.00 195 turnstile_table+0x6fc lock_set_spl+0x41
32 0% 100% 0.00 203 turnstile_table+0x63c lock_set_spl+0x41
18 0% 100% 0.00 101 turnstile_table+0x42c lock_set_spl+0x41
12 0% 100% 0.00 95 turnstile_table+0x4ac lock_set_spl+0x41
9 0% 100% 0.00 204 turnstile_table+0x60c lock_set_spl+0x41
4 0% 100% 0.00 161 turnstile_table+0x624 lock_set_spl+0x41
4 0% 100% 0.00 1 0xd70c57f9 lock_set_spl+0x41
2 0% 100% 0.00 99 sleepq_head+0x3c4 disp_lock_enter_high+0xe
2 0% 100% 0.00 13 sleepq_head+0x3c4 lock_set_spl+0x41
1 0% 100% 0.00 12 0xd70c5319 lock_set_spl+0x41
-------------------------------------------------------------------------------
Thread lock spin: 200 events in 16.107 seconds (12 events/sec)
Count indv cuml rcnt spin Lock Caller
-------------------------------------------------------------------------------
52 26% 26% 0.00 145 cpu[0]+0x84 ts_tick+0x18
34 17% 43% 0.00 79 cpu[1]+0x84 ts_tick+0x18
29 14% 58% 0.00 60 cpu[1]+0x84 cv_wait+0x1c
26 13% 70% 0.00 124 turnstile_table+0x744 ts_tick+0x18
15 8% 78% 0.00 191 cpu[0]+0x84 cv_wait+0x1c
11 6% 84% 0.00 205 sleepq_head+0x3c4 ts_tick+0x18
9 4% 88% 0.00 43 cpu[1]+0x84 turnstile_block+0x6b
8 4% 92% 0.00 145 turnstile_table+0x65c ts_tick+0x18
4 2% 94% 0.00 198 turnstile_table+0x624 ts_tick+0x18
3 2% 96% 0.00 123 turnstile_table+0x6fc ts_tick+0x18
3 2% 97% 0.00 449 turnstile_table+0x63c ts_tick+0x18
2 1% 98% 0.00 227 turnstile_table+0x4ac ts_tick+0x18
1 0% 98% 0.00 325 cpu[0]+0x84 ts_update_list+0x49
1 0% 99% 0.00 36 cpu0_disp lwp_continue+0x2a
1 0% 100% 0.00 1 turnstile_table+0x60c ts_tick+0x18
1 0% 100% 0.00 1 0xd467a760 ts_update_list+0x49
-------------------------------------------------------------------------------
the high number of txg locks makes me think this is something inside ZFS. the
rest of the filesystem seems fine. at the time the problem started, both
processes (mozilla & svn) were writing to the same filesystem, /dsk/on. the
fs
seems fine, though, and other file operations are still working. disk usage is
almost entirely idle.
any ideas?
larne.