Hello zfs-discuss,
S10U2+patches. ZFS pool of about 2TB in size. Each day a snapshot is
created and 7 copies are kept. There's a quota set on the file system,
however there's always at least 50GB of free space in the file system
(and much more in the pool). The ZFS file system is exported over NFS.
Snapshots consume about 280GB of space.
We have noticed some performance problems on NFS clients of this file
system, even during times with lighter load. Raising the quota didn't
help. However, removing the oldest snapshot immediately solved the
performance problems.
I do not have more details - sorry.
Is it expected for snapshots to have a very noticeable performance
impact on the file system being snapshotted?
--
Best regards,
Robert mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
Robert Milkowski wrote:
> [...]
> Is it expected for snapshots to have a very noticeable performance
> impact on the file system being snapshotted?

No, this behavior is unexpected. The only way that snapshots should have a
performance impact on access to the filesystem is if you are running low on
space in the pool or quota (which it sounds like you are not).

Can you describe what the performance problems were? What was the workload
like? What problem did you identify? How did it improve when you
'zfs destroy'-ed the oldest snapshot? Are you sure that the oldest snapshot
wasn't pushing you close to your quota?

--matt
Matthew Ahrens wrote On 10/16/06 09:07,:
> Robert Milkowski wrote:
>> [...]
> No, this behavior is unexpected. The only way that snapshots should
> have a performance impact on access to the filesystem is if you are
> running low on space in the pool or quota (which it sounds like you
> are not).
> [...]

I could well believe there would be a hiccup on the rest of the pool when the
snapshot is taken. Each snapshot calls txg_wait_synced four times: a few
related to the ZIL and one from dsl_sync_task_group_wait.
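If someone wants to confirm that locally, a minimal sketch (the pool and
filesystem names are placeholders; it simply counts all txg_wait_synced()
callers, system-wide, while the snapshot command runs) would be:

# dtrace -n 'fbt::txg_wait_synced:entry{@[stack()] = count()}' -c 'zfs snapshot tank/fs@probe'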
Hello Matthew,

Monday, October 16, 2006, 5:07:50 PM, you wrote:

MA> Robert Milkowski wrote:
>> [...]
MA> No, this behavior is unexpected. [...]
MA> Can you describe what the performance problems were? What was the
MA> workload like? What problem did you identify? How did it improve when
MA> you 'zfs destroy'-ed the oldest snapshot? Are you sure that the oldest
MA> snapshot wasn't pushing you close to your quota?

I wasn't on-site every time it happened... but it looks like all NFS clients
of that file system have problems accessing data for 2-5s, and iostat on the
clients shows 100% busy to the disk.

There was well over 100GB of free space in the file system, and raising the
quota to add another 100GB didn't help. After removing snapshots all problems
disappeared almost immediately. No new snapshots were created during this
time.

It happened a second time - and again, quite a lot of free space, and
destroying one snapshot helps.

If it happens again I'll try to get some more specific data - however it
depends on when it happens, as during peak hours I'll probably just destroy a
snapshot to get it working.

--
Best regards,
Robert mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
Robert Milkowski wrote:
> If it happens again I'll try to get some more specific data - however
> it depends on when it happens, as during peak hours I'll probably just
> destroy a snapshot to get it working.

If it happens again, it would be great if you could gather some data before
you destroy the snapshot so we have some chance of figuring out what's going
on here.

'iostat -xnpc 1' will tell us if it's CPU or disk bound.

'lockstat -kgiw sleep 10' will tell us what functions are using CPU.

'echo "::walk thread|::findstack" | mdb -k' will tell us where threads are
stuck.

Actually, if you could gather each of those both while you're observing the
problem, and then after the problem goes away, that would be helpful.

--matt
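A minimal wrapper for collecting all three at once (purely illustrative; the
file locations and the good/bad label are arbitrary, and it uses the -kgIW
lockstat flags Matt corrects to later in the thread):

#!/bin/sh
# Capture the requested diagnostics, labelled "bad" (while the problem is
# visible) or "good" (after it goes away), into /var/tmp.
LABEL=${1:-bad}
TS=`date '+%Y%m%d-%H%M%S'`
iostat -xnpc 1 10                         > /var/tmp/iostat.$LABEL.$TS
lockstat -kgIW sleep 10                   > /var/tmp/lockstat.$LABEL.$TS
echo "::walk thread|::findstack" | mdb -k > /var/tmp/threads.$LABEL.$TS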
Hi.
On the NFS clients which mount file system f3-1/d611 I can see 3-5s periods
of 100% busy (iostat) and almost no I/Os issued to the NFS server; on the NFS
server at the same time disk activity is almost zero (both iostat and zpool
iostat). However, CPU activity in SYS increases during those periods.
There's quite a lot of free space in f3-1/d611, as you can see:
# zfs list | egrep "611|NAM"
NAME USED AVAIL REFER MOUNTPOINT
f3-1/d611 381G 219G 151G //f3-1/d611
f3-1/d611@auto-2006-10-18 48.2G - 156G -
f3-1/d611@auto-2006-10-19 19.4G - 145G -
f3-1/d611@auto-2006-10-20 21.9G - 155G -
f3-1/d611@auto-2006-10-21 22.9G - 151G -
f3-1/d611@auto-2006-10-22 18.5G - 145G -
f3-1/d611@auto-2006-10-23 17.6G - 148G -
f3-1/d611@auto-2006-10-24 14.5G - 154G -
And much more in a pool itself:
# zfs list | egrep "f3-1 |NAM"
NAME USED AVAIL REFER MOUNTPOINT
f3-1 563G 642G 9.18G //f3-1
# zfs get all f3-1/d611
NAME PROPERTY VALUE SOURCE
f3-1/d611 type filesystem -
f3-1/d611 creation Tue Aug 29 2:07 2006 -
f3-1/d611 used 381G -
f3-1/d611 available 219G -
f3-1/d611 referenced 151G -
f3-1/d611 compressratio 1.00x -
f3-1/d611 mounted yes -
f3-1/d611 quota 600G local
f3-1/d611 reservation none default
f3-1/d611 recordsize 128K default
f3-1/d611 mountpoint //f3-1/d611 default
f3-1/d611 sharenfs off default
f3-1/d611 checksum on default
f3-1/d611 compression off default
f3-1/d611 atime off inherited from f3-1
f3-1/d611 devices on default
f3-1/d611 exec on default
f3-1/d611 setuid on default
f3-1/d611 readonly off default
f3-1/d611 zoned off default
f3-1/d611 snapdir hidden default
f3-1/d611 aclmode groupmask default
f3-1/d611 aclinherit secure default
Access to this file system is only over NFS. Right now nfsd has less than 400
active threads.
Now, output from several of the commands you requested, taken during the
times when disk activity is low, CPU usage in SYS is higher, and the NFS
clients see the problem.
# zpool iostat 1
[...]
f3-1 562G 662G 20 0 156K 0
f3-1 562G 662G 29 6 842K 887K
f3-1 562G 662G 12 0 238K 0
f3-1 562G 662G 927 25 11.0M 2.38M
f3-1 562G 662G 1.96K 221 19.8M 5.13M
f3-1 562G 662G 110 112 2.09M 1.72M
# iostat -xnzp 1
[...]
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
44.0 0.0 819.4 0.0 0.0 0.3 0.0 7.2 0 22
c5t600C0FF000000000098FD57F9DA83C00d0
44.0 0.0 819.4 0.0 0.0 0.3 0.0 7.2 0 22
c5t600C0FF000000000098FD57F9DA83C00d0s0
26.0 0.0 179.5 0.0 0.0 0.2 0.0 6.8 0 14
c5t600C0FF000000000098FD55DBA4EA000d0
26.0 0.0 179.5 0.0 0.0 0.2 0.0 6.8 0 14
c5t600C0FF000000000098FD55DBA4EA000d0s0
26.0 0.0 224.0 0.0 0.0 0.2 0.0 7.2 0 15
c5t600C0FF000000000098FD516E4403200d0
26.0 0.0 224.0 0.0 0.0 0.2 0.0 7.2 0 15
c5t600C0FF000000000098FD516E4403200d0s0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
12.0 0.0 216.5 0.0 0.0 0.1 0.0 6.1 0 6
c5t600C0FF000000000098FD57F9DA83C00d0
12.0 0.0 216.5 0.0 0.0 0.1 0.0 6.1 0 6
c5t600C0FF000000000098FD57F9DA83C00d0s0
10.0 0.0 70.5 0.0 0.0 0.1 0.0 6.4 0 6
c5t600C0FF000000000098FD55DBA4EA000d0
10.0 0.0 70.5 0.0 0.0 0.1 0.0 6.4 0 6
c5t600C0FF000000000098FD55DBA4EA000d0s0
8.0 0.0 63.0 0.0 0.0 0.0 0.0 6.0 0 5
c5t600C0FF000000000098FD516E4403200d0
8.0 0.0 63.0 0.0 0.0 0.0 0.0 6.0 0 5
c5t600C0FF000000000098FD516E4403200d0s0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 1.0 0.0 0.5 0.0 0.0 0.0 12.3 0 1 c1t0d0
0.0 1.0 0.0 0.5 0.0 0.0 0.0 12.3 0 1 c1t0d0s6
0.0 1.0 0.0 0.5 0.0 0.0 0.0 9.5 0 1 c1t1d0
0.0 1.0 0.0 0.5 0.0 0.0 0.0 9.5 0 1 c1t1d0s6
7.0 0.0 55.8 0.0 0.0 0.1 0.0 7.8 0 5
c5t600C0FF000000000098FD57F9DA83C00d0
7.0 0.0 55.8 0.0 0.0 0.1 0.0 7.8 0 5
c5t600C0FF000000000098FD57F9DA83C00d0s0
6.0 7.0 66.8 893.2 0.0 0.1 0.0 6.4 0 5
c5t600C0FF000000000098FD55DBA4EA000d0
6.0 7.0 66.8 893.2 0.0 0.1 0.0 6.4 0 5
c5t600C0FF000000000098FD55DBA4EA000d0s0
9.0 0.0 665.4 0.0 0.0 0.1 0.0 6.5 0 4
c5t600C0FF000000000098FD516E4403200d0
9.0 0.0 665.4 0.0 0.0 0.1 0.0 6.5 0 4
c5t600C0FF000000000098FD516E4403200d0s0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
5.0 0.0 36.6 0.0 0.0 0.0 0.0 7.3 0 4
c5t600C0FF000000000098FD57F9DA83C00d0
5.0 0.0 36.6 0.0 0.0 0.0 0.0 7.3 0 4
c5t600C0FF000000000098FD57F9DA83C00d0s0
11.0 0.0 84.3 0.0 0.0 0.1 0.0 6.1 0 6
c5t600C0FF000000000098FD55DBA4EA000d0
11.0 0.0 84.3 0.0 0.0 0.1 0.0 6.1 0 6
c5t600C0FF000000000098FD55DBA4EA000d0s0
5.0 0.0 181.1 0.0 0.0 0.0 0.0 7.1 0 4
c5t600C0FF000000000098FD516E4403200d0
5.0 0.0 181.1 0.0 0.0 0.0 0.0 7.1 0 4
c5t600C0FF000000000098FD516E4403200d0s0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
229.9 10.0 2795.9 639.9 0.0 1.9 0.0 7.8 0 49
c5t600C0FF000000000098FD57F9DA83C00d0
229.9 10.0 2795.9 639.9 0.0 1.9 0.0 7.8 0 49
c5t600C0FF000000000098FD57F9DA83C00d0s0
243.9 13.0 3790.1 1151.7 0.0 2.0 0.0 7.6 0 54
c5t600C0FF000000000098FD55DBA4EA000d0
243.9 13.0 3790.1 1151.7 0.0 2.0 0.0 7.6 0 54
c5t600C0FF000000000098FD55DBA4EA000d0s0
198.0 0.0 2355.5 0.0 0.0 1.5 0.0 7.7 0 51
c5t600C0FF000000000098FD516E4403200d0
198.0 0.0 2355.5 0.0 0.0 1.5 0.0 7.7 0 51
c5t600C0FF000000000098FD516E4403200d0s0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
712.1 71.0 6038.5 1469.6 0.0 8.4 0.0 10.8 0 98
c5t600C0FF000000000098FD57F9DA83C00d0
712.1 71.0 6038.5 1469.6 0.0 8.4 0.0 10.8 0 98
c5t600C0FF000000000098FD57F9DA83C00d0s0
759.1 164.0 7574.6 2283.2 0.0 10.2 0.0 11.1 0 100
c5t600C0FF000000000098FD55DBA4EA000d0
759.1 164.0 7574.6 2283.2 0.0 10.2 0.0 11.1 0 100
c5t600C0FF000000000098FD55DBA4EA000d0s0
716.1 40.0 8526.2 1956.2 0.0 8.7 0.0 11.5 0 98
c5t600C0FF000000000098FD516E4403200d0
716.1 40.0 8526.2 1956.2 0.0 8.7 0.0 11.5 0 98
c5t600C0FF000000000098FD516E4403200d0s0
# mpstat 1
[...]
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 7 72 61 563 6 103 13 0 253 1 20 0 79
1 0 0 3 57 31 359 0 70 11 1 14 0 82 0 18
2 0 0 8 369 356 755 3 144 18 0 140 0 3 0 97
3 0 0 9 321 218 334 1 18 8 0 32 0 57 0 43
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 1 21 16 192 0 13 3 0 40 0 55 0 45
1 0 0 1 54 27 324 2 50 8 0 195 0 15 0 85
2 0 0 2 165 161 148 0 39 2 0 49 0 1 0 99
3 0 0 7 260 158 144 1 36 4 0 31 0 60 0 40
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 15 36 29 113 1 35 6 0 18 0 87 0 13
1 0 0 5 60 33 497 1 59 11 0 108 0 48 0 52
2 0 0 5 491 483 477 25 72 9 0 214 1 12 0 87
3 0 0 7 288 183 328 2 30 4 0 45 0 46 0 54
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 7 15 12 270 0 33 6 0 175 0 12 0 88
1 0 0 3 47 25 178 0 42 4 0 47 1 5 0 94
2 0 0 1 64 60 70 0 22 3 0 75 0 0 0 100
3 0 0 2 267 162 2 2 1 1 0 0 0 100 0 0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 112 493 428 3068 7 638 279 91 24 0 50 0 50
1 0 0 55 87 30 1541 4 402 167 71 154 0 70 0 30
2 0 0 145 4074 3932 4564 5 998 490 497 106 0 51 0 49
3 0 0 139 764 587 4220 12 798 334 461 17 0 66 0 34
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 281 1189 902 9050 22 1867 1173 1573 139 0 62 0 38
1 0 0 136 357 27 7267 11 1645 985 1128 42 0 76 0 24
2 0 0 168 7332 7079 7255 21 1608 952 1138 80 0 63 0 37
3 0 0 331 1199 1058 1589 19 272 253 58 34 0 84 0 16
Different time period when disk activity is small:
# lockstat -kgIw sleep 10 | less
Profiling interrupt: 3904 events in 10.055 seconds (388 events/sec)
Count genr cuml rcnt nsec CPU+PIL Hottest Caller
-------------------------------------------------------------------------------
8925 229% ---- 0.00 5067 cpu[1] nfssys
7386 189% ---- 0.00 2610 cpu[3] thread_start
6566 168% ---- 0.00 4404 cpu[2] thread_start
6453 165% ---- 0.00 3325 cpu[0] thread_start
1259 32% ---- 0.00 1756 cpu[0]+11 thread_start
1161 30% ---- 0.00 2247 cpu[2]+11 thread_start
1159 30% ---- 0.00 1472 cpu[3]+11 thread_start
939 24% ---- 0.00 2459 cpu[1]+11 thread_start
190 5% ---- 0.00 2880 cpu[0]+6 pci_intr_wrapper
181 5% ---- 0.00 2914 cpu[3]+6 ql_response_pkt
103 3% ---- 0.00 4591 cpu[2]+6 intr_thread
7 0% ---- 0.00 1118 cpu[3]+12 rmc_comm_serdev_receive
5 0% ---- 0.00 1022 cpu[3]+13 cyclic_softint
-------------------------------------------------------------------------------
Now, during another period when disk activity is low and the NFS clients see
the problem:
# dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n
fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt=0;}'
-n tick-5s'{printa(@);exit(0);}'
[...]
page_sub_common 840400
mutex_vector_enter 880400
callout_schedule_1 1106200
kmem_cache_alloc 1182400
lock_set_spl_spin 1240000
thread_lock 1269600
syscall_mstate 1359200
set_freemem 1575800
page_trylock 1587500
hwblkpagecopy 1885100
set_anoninfo 1904400
page_pptonum 2044900
xc_serv 2147600
lzjb_decompress 3209600
resume 4783500
hwblkclr 7101100
page_next_scan_large 23648600
generic_idle_cpu 69234100
disp_getwork 139261800
avl_walk 669424900
Let's see how it looks when there's no problem:
# dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n
fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt=0;}'
-n tick-5s'{printa(@);exit(0);}'
[...]
avl_insert 8591400
sleepq_insert 8660800
fletcher_4_native 9789400
buf_hash 10454600
copy_pattern 12395200
hat_getpfnum 14970100
ddi_get_instance 16348900
ddi_get_soft_state 19676500
pci_pbm_dma_sync 21993300
space_map_seg_compare 25628100
lock_set_spin 31066500
ip_cksum 31528900
kmem_cache_alloc 34203500
kmem_free 35418800
page_next_scan_large 35833700
fletcher_2_native 48868100
kmem_cache_free 54456400
avl_destroy_nodes 63271000
lzjb_decompress 72163800
lock_set_spl_spin 72655100
lzjb_compress 91146600
mutex_vector_enter 113553300
resume 2836183200
No avl_walk :)
Heh, I'm sure I have seen avl_walk consuming a lot of CPU before...
So I wait for another such period and run (for 6-7 seconds):
# dtrace -n fbt::avl_walk:entry'{@[stack()]=count();}'
[...]
zfs`space_map_add+0x160
zfs`space_map_load+0x218
zfs`metaslab_activate+0x3c
zfs`metaslab_group_alloc+0x1c0
zfs`metaslab_alloc_dva+0x114
zfs`metaslab_alloc+0x2c
zfs`zio_dva_allocate+0x4c
zfs`zio_write_compress+0x1ec
zfs`arc_write+0xe4
zfs`dbuf_sync+0x6d8
zfs`dnode_sync+0x35c
zfs`dmu_objset_sync_dnodes+0x6c
zfs`dmu_objset_sync+0x54
zfs`dsl_dataset_sync+0xc
zfs`dsl_pool_sync+0x104
zfs`spa_sync+0xe4
zfs`txg_sync_thread+0x134
unix`thread_start+0x4
1259
zfs`space_map_add+0x174
zfs`space_map_load+0x218
zfs`metaslab_activate+0x3c
zfs`metaslab_group_alloc+0x1c0
zfs`metaslab_alloc_dva+0x114
zfs`metaslab_alloc+0x2c
zfs`zio_alloc_blk+0x70
zfs`zil_lwb_write_start+0x8c
zfs`zil_lwb_commit+0x1ac
zfs`zil_commit+0x1b0
zfs`zfs_fsync+0xa8
genunix`fop_fsync+0x14
nfssrv`rfs3_create+0x700
nfssrv`common_dispatch+0x444
rpcmod`svc_getreq+0x154
rpcmod`svc_run+0x198
nfs`nfssys+0x1c4
unix`syscall_trap32+0xcc
183895
zfs`space_map_add+0x160
zfs`space_map_load+0x218
zfs`metaslab_activate+0x3c
zfs`metaslab_group_alloc+0x1c0
zfs`metaslab_alloc_dva+0x114
zfs`metaslab_alloc+0x2c
zfs`zio_alloc_blk+0x70
zfs`zil_lwb_write_start+0x8c
zfs`zil_lwb_commit+0x1ac
zfs`zil_commit+0x1b0
zfs`zfs_fsync+0xa8
genunix`fop_fsync+0x14
nfssrv`rfs3_create+0x700
nfssrv`common_dispatch+0x444
rpcmod`svc_getreq+0x154
rpcmod`svc_run+0x198
nfs`nfssys+0x1c4
unix`syscall_trap32+0xcc
196122
zfs`metaslab_ff_alloc+0x9c
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x1e4
zfs`metaslab_alloc_dva+0x114
zfs`metaslab_alloc+0x2c
zfs`zio_alloc_blk+0x70
zfs`zil_lwb_write_start+0x8c
zfs`zil_commit+0x21c
zfs`zfs_fsync+0xa8
genunix`fop_fsync+0x14
nfssrv`rfs3_remove+0x218
nfssrv`common_dispatch+0x444
rpcmod`svc_getreq+0x154
rpcmod`svc_run+0x198
nfs`nfssys+0x1c4
unix`syscall_trap32+0xcc
854743
zfs`metaslab_ff_alloc+0x9c
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x1e4
zfs`metaslab_alloc_dva+0x114
zfs`metaslab_alloc+0x2c
zfs`zio_dva_allocate+0x4c
zfs`zio_write_compress+0x1ec
zfs`arc_write+0xe4
zfs`dbuf_sync+0x6d8
zfs`dnode_sync+0x35c
zfs`dmu_objset_sync_dnodes+0x6c
zfs`dmu_objset_sync+0x54
zfs`dsl_dataset_sync+0xc
zfs`dsl_pool_sync+0x104
zfs`spa_sync+0xe4
zfs`txg_sync_thread+0x134
unix`thread_start+0x4
1042372
zfs`metaslab_ff_alloc+0x9c
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x1e4
zfs`metaslab_alloc_dva+0x114
zfs`metaslab_alloc+0x2c
zfs`zio_alloc_blk+0x70
zfs`zil_lwb_write_start+0x8c
zfs`zil_lwb_commit+0x1ac
zfs`zil_commit+0x1b0
zfs`zfs_fsync+0xa8
genunix`fop_fsync+0x14
nfssrv`rfs3_create+0x700
nfssrv`common_dispatch+0x444
rpcmod`svc_getreq+0x154
rpcmod`svc_run+0x198
nfs`nfssys+0x1c4
unix`syscall_trap32+0xcc
1415957
And another "bad" time period:
# dtrace -n fbt::avl_walk:entry'{self->vt=vtimestamp;}' -n
fbt::avl_walk:return'/self->vt/{@[stack()]=sum(vtimestamp-self->vt);self->vt=0;}'
-n tick-6s'{printa(@);exit(0);}'
[...]
zfs`vdev_queue_io_to_issue+0xc4
zfs`vdev_queue_io_done+0x4c
zfs`vdev_disk_io_done+0x4
genunix`taskq_thread+0x1a4
unix`thread_start+0x4
11700
zfs`vdev_queue_io_to_issue+0x58
zfs`vdev_queue_io_done+0x4c
zfs`vdev_disk_io_done+0x4
genunix`taskq_thread+0x1a4
unix`thread_start+0x4
20800
zfs`metaslab_ff_alloc+0x9c
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x1e4
zfs`metaslab_alloc_dva+0x114
zfs`metaslab_alloc+0x2c
zfs`zio_dva_allocate+0x4c
zfs`zio_write_compress+0x1ec
zfs`arc_write+0xe4
zfs`dbuf_sync+0x6d8
zfs`dnode_sync+0x35c
zfs`dmu_objset_sync_dnodes+0x6c
zfs`dmu_objset_sync+0x54
zfs`dsl_dataset_sync+0xc
zfs`dsl_pool_sync+0x104
zfs`spa_sync+0xe4
zfs`txg_sync_thread+0x134
unix`thread_start+0x4
382903900
zfs`metaslab_ff_alloc+0x9c
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x1e4
zfs`metaslab_alloc_dva+0x114
zfs`metaslab_alloc+0x2c
zfs`zio_alloc_blk+0x70
zfs`zil_lwb_write_start+0x8c
zfs`zil_lwb_commit+0x1ac
zfs`zil_commit+0x1b0
zfs`zfs_write+0x6c4
genunix`fop_write+0x20
nfssrv`rfs3_write+0x368
nfssrv`common_dispatch+0x444
rpcmod`svc_getreq+0x154
rpcmod`svc_run+0x198
nfs`nfssys+0x1c4
unix`syscall_trap32+0xcc
475080700
zfs`metaslab_ff_alloc+0x9c
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x1e4
zfs`metaslab_alloc_dva+0x114
zfs`metaslab_alloc+0x2c
zfs`zio_alloc_blk+0x70
zfs`zil_lwb_write_start+0x8c
zfs`zil_lwb_commit+0x1ac
zfs`zil_commit+0x1b0
zfs`zfs_fsync+0xa8
genunix`fop_fsync+0x14
nfssrv`rfs3_create+0x700
nfssrv`common_dispatch+0x444
rpcmod`svc_getreq+0x154
rpcmod`svc_run+0x198
nfs`nfssys+0x1c4
unix`syscall_trap32+0xcc
481455800
So let's destroy the oldest snapshot:
# zfs destroy f3-1/d611@auto-2006-10-18
[it took about 4 minutes!]
# mdb -k
Loading modules: [ unix krtld genunix dtrace specfs ufs sd md ip sctp usba fcp
fctl qlc ssd lofs logindmux ptm cpc fcip random crypto zfs nfs
]
> ::ps!grep zfs
R 20905 20464 20905 20458 0 0x4a004000 00000301ec367c20 zfs
> 00000301ec367c20::walk thread|::findstack -v
stack pointer for thread 301d1a02fe0: 2a103a28b71
[ 000002a103a28b71 cv_wait+0x38() ]
000002a103a28c21 txg_wait_synced+0x54(60007327290, 1a7bda, 1a7bd9,
600073272d0, 600073272d2, 60007327288)
000002a103a28cd1 dsl_dir_sync_task+0x124(60003867d00, 0, 60003f6800a, 0, 0,
1a7bda)
000002a103a28d81 dsl_dataset_destroy+0x98(60003f68000, 5, 3, 60003f68009,
600073271c0, 0)
000002a103a28f61 dmu_objset_destroy+0x3c(60003f68000, 0, 0, 2, 0, 54)
000002a103a29021 zfsdev_ioctl+0x160(70370c00, 54, ffbfed38, 1c, 70, e68)
000002a103a290d1 fop_ioctl+0x20(301ee6220c0, 5a1c, ffbfed38, 100003,
60003dcebf8, 12019f0)
000002a103a29191 ioctl+0x184(4, 3005b7c2ba8, ffbfed38, ff38db68, 40350, 5a1c)
000002a103a292e1 syscall_trap32+0xcc(4, 5a1c, ffbfed38, ff38db68, 40350, 0)
After the snapshot was destroyed the problem is completely gone.
And right now there is no avl_walk function at the top:
# dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n
fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt=0;}'
-n tick-5s'{printa(@);exit(0);}'
[...]
cv_broadcast 11204800
vdev_cache_offset_compare 11977900
avl_walk 12568500
hat_getpfnum 13012900
zio_next_stage 13649700
page_next_scan_large 15850300
ddi_get_instance 16842800
kmem_free 20050300
pci_pbm_dma_sync 22854300
ddi_get_soft_state 25803500
space_map_seg_compare 32883300
kmem_cache_free 34151000
ip_cksum 34840900
kmem_cache_alloc 41557900
resume 47279100
fletcher_2_native 55060000
lzjb_compress 94374200
lzjb_decompress 97102700
mutex_vector_enter 115330500
Let's try a small experiment - see if lowering the quota for the fs will make
things bad again:
# zfs list | egrep "NAME|f3-1/d611"
NAME USED AVAIL REFER MOUNTPOINT
f3-1/d611 334G 266G 151G //f3-1/d611
f3-1/d611@auto-2006-10-19 25.7G - 145G -
f3-1/d611@auto-2006-10-20 21.9G - 155G -
f3-1/d611@auto-2006-10-21 22.9G - 151G -
f3-1/d611@auto-2006-10-22 18.5G - 145G -
f3-1/d611@auto-2006-10-23 17.6G - 148G -
f3-1/d611@auto-2006-10-24 14.7G - 154G -
# zfs get quota f3-1/d611
NAME PROPERTY VALUE SOURCE
f3-1/d611 quota 600G local
# zfs set quota=400g f3-1/d611
# zfs list | egrep "NAME|f3-1/d611"
NAME USED AVAIL REFER MOUNTPOINT
f3-1/d611 334G 66.1G 151G //f3-1/d611
f3-1/d611@auto-2006-10-19 25.7G - 145G -
f3-1/d611@auto-2006-10-20 21.9G - 155G -
f3-1/d611@auto-2006-10-21 22.9G - 151G -
f3-1/d611@auto-2006-10-22 18.5G - 145G -
f3-1/d611@auto-2006-10-23 17.6G - 148G -
f3-1/d611@auto-2006-10-24 14.7G - 154G -
Forgot to mention - after the quota was lowered there is still no problem;
everything works OK.
Robert Milkowski wrote:
> On the NFS clients which mount file system f3-1/d611 I can see 3-5s
> periods of 100% busy (iostat) and almost no I/Os issued to the NFS
> server; on the NFS server at the same time disk activity is almost zero
> (both iostat and zpool iostat). However, CPU activity in SYS increases
> during those periods.

> Different time period when disk activity is small:
>
> # lockstat -kgIw sleep 10 | less

Did you happen to get 'lockstat -kgIW' output while the problem was
occurring? (note the capital W) I'm not sure how to interpret the -w
output... (and sorry I gave you the wrong flags before).

> Now, during another period when disk activity is low and the NFS clients
> see the problem:
>
> # dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt=0;}' -n tick-5s'{printa(@);exit(0);}'
> [...]
> page_next_scan_large 23648600
> generic_idle_cpu 69234100
> disp_getwork 139261800
> avl_walk 669424900

Hmm, that's a possibility, but the method you're using to gather this
information (tracing *every* function entry and exit) is a bit heavy-handed,
and it may be distorting the results.

> Heh, I'm sure I have seen avl_walk consuming a lot of CPU before...
>
> So I wait for another such period and run (for 6-7 seconds):
>
> # dtrace -n fbt::avl_walk:entry'{@[stack()]=count();}'
> [...]
> zfs`metaslab_ff_alloc+0x9c
> zfs`space_map_alloc+0x10
> zfs`metaslab_group_alloc+0x1e4
> zfs`metaslab_alloc_dva+0x114
> zfs`metaslab_alloc+0x2c
> zfs`zio_alloc_blk+0x70
> zfs`zil_lwb_write_start+0x8c
> zfs`zil_lwb_commit+0x1ac
> zfs`zil_commit+0x1b0
> zfs`zfs_fsync+0xa8
> genunix`fop_fsync+0x14
> nfssrv`rfs3_create+0x700
> [...]
> 1415957

Hmm, assuming that avl_walk() is actually consuming most of our CPU (which
the lockstat -kgIW will confirm), this seems to indicate that we're taking a
long time trying to find free chunks of space. This may happen if you have
lots of small fragments of free space, but no chunks large enough to hold the
block we're trying to allocate. We try to avoid this situation by trying to
allocate from the metaslabs with the most free space, but it's possible that
there's an error in this algorithm.

> So let's destroy the oldest snapshot:
>
> # zfs destroy f3-1/d611@auto-2006-10-18
> [it took about 4 minutes!]
>
> After the snapshot was destroyed the problem is completely gone.

FYI, destroying the snapshot probably helped simply by (a) returning some big
chunks of space to the pool and/or (b) perturbing the system enough so that
we try different metaslabs which aren't so fragmented.

--matt
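For what it's worth, a lighter-weight way to get at the same question than
tracing every function entry/return (a minimal sketch; it only instruments
space_map_alloc(), one of the allocator functions visible in the stacks
above, and reports the time spent per call) would be:

# dtrace -n 'fbt::space_map_alloc:entry{self->ts = timestamp;}' \
         -n 'fbt::space_map_alloc:return/self->ts/{@["ns per space_map_alloc call"] = quantize(timestamp - self->ts); self->ts = 0;}' \
         -n 'tick-10s{printa(@); exit(0);}'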
Hello Matthew,

Wednesday, October 25, 2006, 7:08:56 AM, you wrote:

MA> Did you happen to get 'lockstat -kgIW' output while the problem was
MA> occurring? (note the capital W) I'm not sure how to interpret the -w
MA> output... (and sorry I gave you the wrong flags before).

Sorry, I just blindly copied it (changed i to I); next time I'll try.

MA> Hmm, that's a possibility, but the method you're using to gather this
MA> information (tracing *every* function entry and exit) is a bit
MA> heavy-handed, and it may be distorting the results.

That could be the case - however, during normal periods avl_walk, measured
the same way, is not at the top.

MA> Hmm, assuming that avl_walk() is actually consuming most of our CPU
MA> (which the lockstat -kgIW will confirm), this seems to indicate that
MA> we're taking a long time trying to find free chunks of space. This may
MA> happen if you have lots of small fragments of free space, but no chunks
MA> large enough to hold the block we're trying to allocate.

MA> FYI, destroying the snapshot probably helped simply by (a) returning
MA> some big chunks of space to the pool and/or (b) perturbing the system
MA> enough so that we try different metaslabs which aren't so fragmented.

Well, that's scary - it means that when I fill up the pool a little bit more
I could hit a big problem even without using snapshots, which is much worse,
as there would be no easy and quick way to relieve the symptoms like just
destroying a snapshot.

--
Best regards,
Robert mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
Robert Milkowski
2006-Oct-25 13:35 UTC
[zfs-discuss] Re: Re: Snapshots impact on performance
Hi.
Today again the same problem - however not as severe, so I'm not sure if the
lockstat output below is representative.
# lockstat -kgIW sleep 5 | less
Profiling interrupt: 1968 events in 5.072 seconds (388 events/sec)
Count genr cuml rcnt nsec Hottest CPU+PIL Caller
-------------------------------------------------------------------------------
1560 79% ---- 0.00 2520 cpu[3] thread_start
1018 52% ---- 0.00 2069 cpu[2]+11 idle
851 43% ---- 0.00 2093 cpu[2]+11 disp_getwork
591 30% ---- 0.00 3647 cpu[1] metaslab_alloc
591 30% ---- 0.00 3647 cpu[1] metaslab_alloc_dva
591 30% ---- 0.00 3647 cpu[1] metaslab_group_alloc
404 21% ---- 0.00 3488 cpu[1] space_map_load
404 21% ---- 0.00 3488 cpu[1] metaslab_activate
382 19% ---- 0.00 3564 cpu[1] zio_dva_allocate
382 19% ---- 0.00 3564 cpu[1] zio_write_compress
382 19% ---- 0.00 3564 cpu[1] txg_sync_thread
382 19% ---- 0.00 3564 cpu[1] spa_sync
382 19% ---- 0.00 3564 cpu[1] dsl_pool_sync
382 19% ---- 0.00 3564 cpu[1] dsl_dataset_sync
382 19% ---- 0.00 3564 cpu[1] dnode_sync
382 19% ---- 0.00 3564 cpu[1] dmu_objset_sync
382 19% ---- 0.00 3564 cpu[1] dmu_objset_sync_dnodes
382 19% ---- 0.00 3564 cpu[1] dbuf_sync
382 19% ---- 0.00 3564 cpu[1] arc_write
363 18% ---- 0.00 3510 cpu[1] syscall_trap32
361 18% ---- 0.00 3514 cpu[1] nfssys
361 18% ---- 0.00 3514 cpu[1] svc_run
350 18% ---- 0.00 3532 cpu[1] svc_getreq
349 18% ---- 0.00 3533 cpu[1] common_dispatch
251 13% ---- 0.00 3813 cpu[1] avl_find
238 12% ---- 0.00 3320 cpu[1] space_map_remove
211 11% ---- 0.00 3798 cpu[1] zil_commit
209 11% ---- 0.00 3800 cpu[1] zio_alloc_blk
209 11% ---- 0.00 3800 cpu[1] zil_lwb_write_start
198 10% ---- 0.00 4256 cpu[1] metaslab_ff_alloc
196 10% ---- 0.00 3867 cpu[1] zfs_putpage
[...]
Matthew Ahrens
2006-Oct-25 16:58 UTC
[zfs-discuss] Re: Re: Snapshots impact on performance
Robert Milkowski wrote:
> Today again the same problem - however not as severe, so I'm not sure if
> the lockstat output below is representative.
>
> # lockstat -kgIW sleep 5 | less
> Profiling interrupt: 1968 events in 5.072 seconds (388 events/sec)
>
> Count genr cuml rcnt nsec Hottest CPU+PIL Caller
> -------------------------------------------------------------------------------
> 1560 79% ---- 0.00 2520 cpu[3] thread_start
> 1018 52% ---- 0.00 2069 cpu[2]+11 idle
> 851 43% ---- 0.00 2093 cpu[2]+11 disp_getwork
> 591 30% ---- 0.00 3647 cpu[1] metaslab_alloc
> 591 30% ---- 0.00 3647 cpu[1] metaslab_alloc_dva
> 591 30% ---- 0.00 3647 cpu[1] metaslab_group_alloc
> 404 21% ---- 0.00 3488 cpu[1] space_map_load
> 404 21% ---- 0.00 3488 cpu[1] metaslab_activate

Nice, this is definitely pointing the finger more definitively. Next time
could you try:

dtrace -n 'profile-97{@[stack(20)] = count()}' -c 'sleep 5'

(just send the last 10 or so stack traces)

In the mean time I'll talk with our SPA experts and see if I can figure out
how to fix this...

--matt
> Nice, this is definitely pointing the finger more definitively. Next
> time could you try:
>
> dtrace -n 'profile-97{@[stack(20)] = count()}' -c 'sleep 5'
>
> (just send the last 10 or so stack traces)
>
> In the mean time I'll talk with our SPA experts and see if I can figure
> out how to fix this...

By any chance is the pool fairly close to full? The fuller it gets, the
harder it becomes to find long stretches of free space.

Jeff
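Pool-level fullness (as opposed to the per-filesystem quota) is what matters
for that question; a quick check, using the pool name from earlier in this
thread:

# zpool list f3-1
# zfs list f3-1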
Robert Milkowski
2006-Oct-30 08:54 UTC
[zfs-discuss] Re: Re: Snapshots impact on performance
Hello Jeff,

Monday, October 30, 2006, 2:03:52 AM, you wrote:

JB> By any chance is the pool fairly close to full? The fuller it gets,
JB> the harder it becomes to find long stretches of free space.

Nope - there is at least 600GB free in the pool all the time.

Also it's not the quota - if I raise the quota for the file system (+100GB
for example) it doesn't help. If I remove the oldest snapshot it always helps
immediately. Sometimes I have to remove the two oldest snapshots.

--
Best regards,
Robert mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
Chris Gerhard
2006-Dec-06 17:23 UTC
[zfs-discuss] Re: Re: Re: Snapshots impact on performance
One of our file servers internally at Sun reproduces this running nv53; here
is the dtrace output:
unix`mutex_vector_enter+0x120
zfs`metaslab_group_alloc+0x1a0
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_dva_allocate+0x54
zfs`zio_write_compress+0x248
zfs`arc_write+0xec
zfs`dbuf_sync+0x698
zfs`dnode_sync+0x2ec
zfs`dmu_objset_sync_dnodes+0x60
zfs`dmu_objset_sync+0x78
zfs`dsl_dataset_sync+0xc
zfs`dsl_pool_sync+0x64
zfs`spa_sync+0x1b4
zfs`txg_sync_thread+0x120
unix`thread_start+0x4
8
unix`mutex_vector_enter+0x120
zfs`metaslab_group_alloc+0x1a0
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_commit_writer+0x2ac
zfs`zil_commit+0x68
zfs`zfs_fsync+0xa4
genunix`fop_fsync+0x14
nfssrv`rfs3_setattr+0x4ec
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
8
genunix`avl_walk+0x3c
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_lwb_commit+0x94
zfs`zil_commit_writer+0x1e4
zfs`zil_commit+0x68
zfs`zfs_putpage+0x1d8
genunix`fop_putpage+0x1c
nfssrv`rfs3_commit+0x130
nfssrv`common_dispatch+0x4ec
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
8
genunix`avl_walk+0x40
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_commit_writer+0x2ac
zfs`zil_commit+0x68
zfs`zfs_putpage+0x1d8
genunix`fop_putpage+0x1c
nfssrv`rfs3_commit+0x130
nfssrv`common_dispatch+0x4ec
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
8
genunix`avl_walk+0x4c
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_dva_allocate+0x54
zfs`zio_write_compress+0x248
zfs`arc_write+0xec
zfs`dbuf_sync+0x698
zfs`dnode_sync+0x2ec
zfs`dmu_objset_sync_dnodes+0x60
zfs`dmu_objset_sync+0x50
zfs`dsl_dataset_sync+0xc
zfs`dsl_pool_sync+0x64
zfs`spa_sync+0x1b4
zfs`txg_sync_thread+0x120
unix`thread_start+0x4
8
zfs`fletcher_2_native+0x2c
zfs`arc_cksum_verify+0x64
zfs`arc_buf_thaw+0x38
zfs`dbuf_dirty+0x10c
zfs`dmu_write_uio+0xc4
zfs`zfs_write+0x3ac
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
8
zfs`fletcher_2_native+0x2c
zfs`arc_cksum_verify+0x64
zfs`arc_buf_destroy+0x1c
zfs`arc_evict+0x1f0
zfs`arc_adjust+0xf8
zfs`arc_kmem_reclaim+0x100
zfs`arc_kmem_reap_now+0x20
zfs`arc_reclaim_thread+0xdc
unix`thread_start+0x4
8
zfs`fletcher_2_native+0x2c
zfs`arc_cksum_compute+0x6c
zfs`dbuf_rele+0x40
zfs`dmu_buf_rele_array+0x34
zfs`dmu_write_uio+0x13c
zfs`zfs_write+0x3ac
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
8
unix`disp_getwork+0x7c
unix`idle+0xd0
unix`thread_start+0x4
9
unix`mutex_vector_enter+0x134
zfs`metaslab_free_dva+0xb0
zfs`metaslab_free+0x78
zfs`zio_dva_free+0x14
zfs`zio_wait+0xc
zfs`arc_free+0x13c
zfs`dsl_dataset_block_kill+0x168
zfs`dbuf_do_kill+0x14
genunix`taskq_thread+0x19c
unix`thread_start+0x4
9
genunix`avl_walk+0x3c
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_dva_allocate+0x54
zfs`zio_write_compress+0x248
zfs`arc_write+0xec
zfs`dbuf_sync+0x698
zfs`dnode_sync+0x2ec
zfs`dmu_objset_sync_dnodes+0x60
zfs`dmu_objset_sync+0x50
zfs`dsl_dataset_sync+0xc
zfs`dsl_pool_sync+0x64
zfs`spa_sync+0x1b4
zfs`txg_sync_thread+0x120
unix`thread_start+0x4
9
10
unix`mutex_vector_enter+0x120
zfs`metaslab_group_alloc+0x1a0
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_lwb_commit+0x94
zfs`zil_commit_writer+0x1e4
zfs`zil_commit+0x68
zfs`zfs_putpage+0x1d8
genunix`fop_putpage+0x1c
nfssrv`rfs3_commit+0x130
nfssrv`common_dispatch+0x4ec
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
10
unix`mutex_vector_enter+0x134
zfs`metaslab_group_alloc+0x1a0
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_commit_writer+0x2ac
zfs`zil_commit+0x68
zfs`zfs_fsync+0xa4
genunix`fop_fsync+0x14
nfssrv`rfs3_setattr+0x4ec
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
10
unix`disp_getwork+0x8c
unix`idle+0xd0
unix`thread_start+0x4
11
unix`mutex_vector_enter+0x134
zfs`metaslab_group_alloc+0x1a0
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_dva_allocate+0x54
zfs`zio_write_compress+0x248
zfs`arc_write+0xec
zfs`dbuf_sync+0x698
zfs`dnode_sync+0x2ec
zfs`dmu_objset_sync_dnodes+0x60
zfs`dmu_objset_sync+0x78
zfs`dsl_dataset_sync+0xc
zfs`dsl_pool_sync+0x64
zfs`spa_sync+0x1b4
zfs`txg_sync_thread+0x120
unix`thread_start+0x4
12
genunix`avl_walk+0x50
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_dva_allocate+0x54
zfs`zio_write_compress+0x248
zfs`arc_write+0xec
zfs`dbuf_sync+0x698
zfs`dnode_sync+0x2ec
zfs`dmu_objset_sync_dnodes+0x60
zfs`dmu_objset_sync+0x50
zfs`dsl_dataset_sync+0xc
zfs`dsl_pool_sync+0x64
zfs`spa_sync+0x1b4
zfs`txg_sync_thread+0x120
unix`thread_start+0x4
12
zfs`metaslab_ff_alloc+0x7c
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_lwb_commit+0x94
zfs`zil_commit_writer+0x1e4
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
12
genunix`avl_walk+0x40
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_dva_allocate+0x54
zfs`zio_write_compress+0x248
zfs`arc_write+0xec
zfs`dbuf_sync+0x698
zfs`dnode_sync+0x2ec
zfs`dmu_objset_sync_dnodes+0x60
zfs`dmu_objset_sync+0x50
zfs`dsl_dataset_sync+0xc
zfs`dsl_pool_sync+0x64
zfs`spa_sync+0x1b4
zfs`txg_sync_thread+0x120
unix`thread_start+0x4
13
unix`mutex_vector_enter+0x120
zfs`metaslab_free_dva+0xb0
zfs`metaslab_free+0x78
zfs`zio_dva_free+0x14
zfs`zio_wait+0xc
zfs`arc_free+0x13c
zfs`dsl_dataset_block_kill+0x168
zfs`dbuf_do_kill+0x14
genunix`taskq_thread+0x19c
unix`thread_start+0x4
18
genunix`avl_walk+0x50
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_commit_writer+0x2ac
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
18
zfs`metaslab_ff_alloc+0x74
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_commit_writer+0x2ac
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
19
genunix`avl_walk+0x4c
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_commit_writer+0x2ac
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
20
genunix`avl_walk+0x3c
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_commit_writer+0x2ac
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
22
unix`mutex_vector_enter+0x134
zfs`metaslab_group_alloc+0x1a0
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_lwb_commit+0x94
zfs`zil_commit_writer+0x1e4
zfs`zil_commit+0x68
zfs`zfs_putpage+0x1d8
genunix`fop_putpage+0x1c
nfssrv`rfs3_commit+0x130
nfssrv`common_dispatch+0x4ec
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
26
genunix`avl_walk+0x40
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_commit_writer+0x2ac
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
27
zfs`metaslab_ff_alloc+0x74
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_lwb_commit+0x94
zfs`zil_commit_writer+0x1e4
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
38
genunix`avl_walk+0x50
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_lwb_commit+0x94
zfs`zil_commit_writer+0x1e4
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
45
genunix`avl_walk+0x40
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_lwb_commit+0x94
zfs`zil_commit_writer+0x1e4
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
46
genunix`avl_walk+0x3c
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_lwb_commit+0x94
zfs`zil_commit_writer+0x1e4
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
57
genunix`avl_walk+0x4c
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_lwb_commit+0x94
zfs`zil_commit_writer+0x1e4
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
57
Robert Milkowski
2006-Dec-12 17:28 UTC
[zfs-discuss] Re: Re: Re: Snapshots impact on performance
Hello Chris,
Wednesday, December 6, 2006, 6:23:48 PM, you wrote:
CG> One of our file servers internally to Sun that reproduces this
CG> running nv53 here is the dtrace output:
Any conclusions yet?
--
Best regards,
Robert mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
Chris Gerhard
2006-Dec-13 11:25 UTC
[zfs-discuss] Re: Re: Re: Snapshots impact on performance
Robert Milkowski wrote:
> Hello Chris,
>
> Wednesday, December 6, 2006, 6:23:48 PM, you wrote:
>
> CG> One of our file servers internally at Sun reproduces this
> CG> running nv53; here is the dtrace output:
>
> Any conclusions yet?

Not yet. We had to delete all the "automatic" snapshots we had so that the
system became usable again. An engineer is looking into reproducing this on a
lab system. I hope they will turn up on this thread with a progress report.

--
Chris Gerhard.
Principal Engineer, Sun Microsystems Limited
Phone: +44 (0) 1252 426033 (ext 26033)
http://blogs.sun.com/chrisg
Robert Milkowski
2006-Dec-22 12:33 UTC
[zfs-discuss] Re: Re: Re: Snapshots impact on performance
Hi.
The problem is getting worse... now I get the performance problem even if I
destroy all snapshots in the pool, and even with zil_disable set to 1.
Although I have the limit for maximum NFS threads set to 2048, I get only
about 1700.
If I want to stop the nfsd server it takes 1-4 minutes until all threads are
finished (and most of the time over 1000 threads are still there). While nfsd
is stopping I can see I/Os (99% reads) to the pool.
Also, simple zfs commands (like changing the quota for a file system, etc.)
take far too long to complete (like 3-5 minutes to set a quota for a file
system).
There's still over 700GB of free storage in the pool; setting the quota to
none doesn't help.
From iostat it looks like the disks aren't saturated. It looks almost as if a
lot of nfsd threads are spinning (probably in ZFS) - I don't recall nfsd
taking so long to stop on UFS file systems.
bash-3.00# iostat -xnzC 1
[... first output]
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1782.3 0.0 12700.7 0.0 0.0 11.1 0.0 6.2 0 293 c5
584.3 0.0 4102.2 0.0 0.0 3.6 0.0 6.2 0 96
c5t600C0FF000000000098FD57F9DA83C00d0
572.1 0.0 4144.8 0.0 0.0 3.7 0.0 6.5 0 99
c5t600C0FF000000000098FD55DBA4EA000d0
625.8 0.0 4453.7 0.0 0.0 3.7 0.0 5.9 0 98
c5t600C0FF000000000098FD516E4403200d0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
2381.9 0.0 17242.1 0.0 0.0 16.7 0.0 7.0 0 291 c5
800.0 0.0 5827.9 0.0 0.0 5.5 0.0 6.9 0 96
c5t600C0FF000000000098FD57F9DA83C00d0
690.0 0.0 4991.9 0.0 0.0 4.9 0.0 7.1 0 97
c5t600C0FF000000000098FD55DBA4EA000d0
892.0 0.0 6422.3 0.0 0.0 6.3 0.0 7.1 0 98
c5t600C0FF000000000098FD516E4403200d0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1890.2 0.0 12826.0 0.0 0.0 11.4 0.0 6.0 0 292 c5
611.1 0.0 3998.5 0.0 0.0 3.4 0.0 5.6 0 96
c5t600C0FF000000000098FD57F9DA83C00d0
604.1 0.0 4259.5 0.0 0.0 3.9 0.0 6.4 0 98
c5t600C0FF000000000098FD55DBA4EA000d0
675.1 0.0 4568.0 0.0 0.0 4.1 0.0 6.0 0 98
c5t600C0FF000000000098FD516E4403200d0
^C
Now the dtrace output you requested:
dtrace -n 'profile-97{@[stack(20)] = count()}' -c 'sleep 5'
[...]
unix`i_ddi_splhigh
unix`disp_getwork+0x38
unix`idle+0xd4
unix`thread_start+0x4
35
unix`i_ddi_splx
unix`disp_getwork+0x160
unix`idle+0xd4
unix`thread_start+0x4
38
unix`disp_getwork+0x70
unix`idle+0xd4
unix`thread_start+0x4
38
unix`disp_getwork+0x158
unix`idle+0xd4
unix`thread_start+0x4
38
unix`i_ddi_splhigh+0x14
unix`disp_getwork+0x38
unix`idle+0xd4
unix`thread_start+0x4
39
unix`disp_getwork+0x8c
unix`idle+0xd4
unix`thread_start+0x4
39
unix`idle+0x12c
unix`thread_start+0x4
44
unix`disp_getwork+0x1a8
unix`idle+0xd4
unix`thread_start+0x4
47
unix`disp_getwork+0x7c
unix`idle+0xd4
unix`thread_start+0x4
49
unix`disp_getwork+0x90
unix`idle+0xd4
unix`thread_start+0x4
56
unix`disp_getwork+0x10c
unix`idle+0xd4
unix`thread_start+0x4
62
unix`i_ddi_splx+0x1c
unix`disp_getwork+0x160
unix`idle+0xd4
unix`thread_start+0x4
117
bash-3.00#
Robert Milkowski
2006-Dec-22 12:42 UTC
[zfs-discuss] Re: Re: Re: Snapshots impact on performance
bash-3.00# lockstat -kgIW sleep 100 | head -30

Profiling interrupt: 38844 events in 100.098 seconds (388 events/sec)

Count genr cuml rcnt nsec Hottest CPU+PIL Caller
-------------------------------------------------------------------------------
32081 83% ---- 0.00 2432 cpu[1] thread_start
24347 63% ---- 0.00 2116 cpu[1]+11 idle
20724 53% ---- 0.00 2125 cpu[1]+11 disp_getwork
5525 14% ---- 0.00 3571 cpu0 syscall_trap32
5470 14% ---- 0.00 3570 cpu0 nfssys
5470 14% ---- 0.00 3570 cpu0 svc_run
5225 13% ---- 0.00 3595 cpu0 svc_getreq
5142 13% ---- 0.00 3600 cpu0 common_dispatch
4488 12% ---- 0.00 3040 cpu[2] taskq_thread
3658 9% ---- 0.00 2946 cpu[2] zio_vdev_io_assess
2943 8% ---- 0.00 2874 cpu[2] zio_read_decompress
2846 7% ---- 0.00 2120 cpu[1] splx
2654 7% ---- 0.00 4105 cpu[1] putnext
2541 7% ---- 0.00 2785 cpu[2] lzjb_decompress
2382 6% ---- 0.00 2138 cpu[2] i_ddi_splhigh
2056 5% ---- 0.00 3481 cpu0 fop_lookup
2031 5% ---- 0.00 3485 cpu0 zfs_lookup
1979 5% ---- 0.00 3482 cpu0 zfs_dirlook
1935 5% ---- 0.00 3492 cpu0 zfs_dirent_lock
1910 5% ---- 0.00 3835 cpu[1] txg_sync_thread
1910 5% ---- 0.00 3835 cpu[1] spa_sync
1896 5% ---- 0.00 3842 cpu[1] dsl_pool_sync
1896 5% ---- 0.00 3842 cpu[1] dmu_objset_sync
1895 5% ---- 0.00 3843 cpu[1] dmu_objset_sync_dnodes
1894 5% ---- 0.00 3843 cpu[1] dnode_sync
bash-3.00#
Robert Milkowski
2007-Jan-05 22:59 UTC
[zfs-discuss] Re: Re: Re: Snapshots impact on performance
Hello Chris,

Wednesday, December 13, 2006, 12:25:40 PM, you wrote:

CG> Not yet. We had to delete all the "automatic" snapshots we had so that
CG> the system became usable again. An engineer is looking into reproducing
CG> this on a lab system. I hope they will turn up on this thread with a
CG> progress report.

The problem here is that even without snapshots we're seeing the same
problem. It's as if snapshots only make it worse.

Hope it'll be solved quickly (I've also had it escalated as a P1 for some
time...).

--
Best regards,
Robert mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
Gino
2007-Jun-27 08:59 UTC
[zfs-discuss] Re: Re[2]: Re: Re: Re: Snapshots impact on performance
Same problem here (snv_60).
Robert, did you find any solutions?

gino
Victor Latushkin
2007-Jun-27 12:19 UTC
[zfs-discuss] Re: Re[2]: Re: Re: Re: Snapshots impact on performance
Gino wrote:
> Same problem here (snv_60).
> Robert, did you find any solutions?

A couple of weeks ago I put together an implementation of space maps which
completely eliminates loops and recursion from the space map alloc operation,
and allows different allocation strategies to be implemented quite easily (of
which I put together 3 more). It looks like it works for me on a thumper and
on my notebook with ZFS root, though I have almost no time to test it more
these days due to year end. I haven't done a SPARC build yet and I do not
have a test case to test against.

Also, it comes at a price - I have to spend some more time (logarithmic,
though) during all other operations on space maps, and it is not optimized
yet.

victor
Hello Victor,

Wednesday, June 27, 2007, 1:19:44 PM, you wrote:

VL> Gino wrote:
>> Same problem here (snv_60).
>> Robert, did you find any solutions?

VL> A couple of weeks ago I put together an implementation of space maps
VL> which completely eliminates loops and recursion from the space map alloc
VL> operation, and allows different allocation strategies to be implemented
VL> quite easily (of which I put together 3 more). [...]

VL> Also, it comes at a price - I have to spend some more time (logarithmic,
VL> though) during all other operations on space maps, and it is not
VL> optimized yet.

Lukasz (cc) - maybe you can test it and even help on tuning it?

--
Best regards,
Robert mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
Łukasz K
2007-Jul-27 12:41 UTC
[zfs-discuss] Odp: Re[2]: Re: Re[2]: Re: Re: Re: Snapshots impact on performance
On 26-07-2007 at 13:31, Robert Milkowski wrote:
> [...]
> Lukasz (cc) - maybe you can test it and even help on tuning it?

Yes, I can test it. I'm building an environment to compile OpenSolaris and
test ZFS. I will be ready next week.

Victor, can you tell me where to look for your changes? How do I change the
allocation strategy? I can see that by changing space_map_ops_t I can declare
different callback functions.

Lukas
> Same problem here (snv_60).
> Robert, did you find any solutions?
>
> gino

Check this:
http://www.opensolaris.org/jive/thread.jspa?threadID=34423&tstart=0

Check the spa_sync function time (remember to change POOL_NAME!):

dtrace -q -n fbt::spa_sync:entry'/(char *)(((spa_t*)arg0)->spa_name) == "POOL_NAME"/{ self->t = timestamp; }' -n fbt::spa_sync:return'/self->t/{ @m = max((timestamp - self->t)/1000000); self->t = 0; }' -n tick-10m'{ printa("%@u", @m); exit(0); }'

If you have long spa_sync times, try to check whether you have problems
finding new blocks in the space map with this script:

#!/usr/sbin/dtrace -s
fbt::space_map_alloc:entry { self->s = arg1; }
fbt::space_map_alloc:return /arg1 != -1/ { self->s = 0; }
fbt::space_map_alloc:return /self->s && (arg1 == -1)/ { @s = quantize(self->s); self->s = 0; }
tick-10s { printa(@s); }

Then change the recordsize: zfs set recordsize=XX POOL_NAME. Make sure that
all filesystems inherit the recordsize:

# zfs get -r recordsize POOL_NAME

The other thing is space map size. Check the map size:

echo '::spa' | mdb -k | grep 'f[0-9]*-[0-9]*' \
| while read pool_ptr state pool_name
do
echo "${pool_ptr}::walk metaslab|::print -d struct metaslab ms_smo.smo_objsize" \
| mdb -k \
| nawk '{sub("^0t","",$3);sum+=$3}END{print sum}'
done

The value you get is the space map size on disk. In memory the space map will
take about 4 * size_on_disk. Sometimes during snapshot removal the kernel
will have to load all space maps into memory. For example, if the space map
on disk takes 1GB, then the kernel in the spa_sync function will:
- read 1GB from disk (or from cache)
- allocate 4GB for AVL trees
- do all operations on the AVL trees
- save the maps

It is good to have enough free memory for these operations. You can reduce
the space map by copying all filesystems to another pool. I recommend zfs
send.

regards
Lukas
Victor Latushkin
2007-Aug-24 12:57 UTC
[zfs-discuss] Odp: Re[2]: Re: Re[2]: Re: Re: Re: Snapshots impact on performance
Hi Lukasz and all,

I just returned from a month-long sick-leave, so I need some time to sort the
pile of emails, do a SPARC build and some testing, and then I'll be able to
provide you with my changes in some form. Hope this will happen next week.

Cheers,
Victor

Łukasz K wrote:
> Yes, I can test it. I'm building an environment to compile OpenSolaris and
> test ZFS. I will be ready next week.
>
> Victor, can you tell me where to look for your changes?
> How do I change the allocation strategy?
> I can see that by changing space_map_ops_t I can declare different
> callback functions.
>
> Lukas
Łukasz K
2007-Aug-24 13:49 UTC
[zfs-discuss] Odp: Re[2]: Re: Re[2]: Re: Re: Re: Snapshots impacton performance
Great, I have the latest snv_70 sources and I'm working on it.

Lukas

On 24-08-2007 at 14:57, Victor Latushkin wrote:
> Hi Lukasz and all,
>
> I just returned from a month-long sick-leave, so I need some time to sort
> the pile of emails, do a SPARC build and some testing, and then I'll be
> able to provide you with my changes in some form. Hope this will happen
> next week.
>
> Cheers,
> Victor
Hello Erblichs,
Two years later... :)
It turned out to be a problem with metaslab fragmentation.
Lowering the recordsize to 8k helps slightly, but it's still not fixed.
--
Best regards,
Robert Milkowski mailto:milek at task.gda.pl
http://milek.blogspot.com
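For reference, the recordsize change mentioned above is a per-filesystem
property and only affects data written after the change; a minimal sketch
using the filesystem name from earlier in the thread:

# zfs set recordsize=8k f3-1/d611
# zfs get -r recordsize f3-1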