Hello zfs-discuss,

S10U2 + patches. ZFS pool of about 2 TB in size. A snapshot is created each day and 7 copies are kept. There's a quota set on the file system; however, there's always at least 50 GB of free space in the file system (and much more in the pool). The ZFS file system is exported over NFS.

Snapshots consume about 280 GB of space.

We have noticed some performance problems on NFS clients of this file system, even during times with smaller load. Raising the quota didn't help. However, removing the oldest snapshot immediately solved the performance problems.

I do not have more details - sorry.

Is it expected for snapshots to have a very noticeable performance impact on the file system being snapshotted?

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
Robert Milkowski wrote:
> Hello zfs-discuss,
>
> S10U2 + patches. ZFS pool of about 2 TB in size. A snapshot is created
> each day and 7 copies are kept. There's a quota set on the file system;
> however, there's always at least 50 GB of free space in the file system
> (and much more in the pool). The ZFS file system is exported over NFS.
>
> Snapshots consume about 280 GB of space.
>
> We have noticed some performance problems on NFS clients of this file
> system, even during times with smaller load. Raising the quota didn't
> help. However, removing the oldest snapshot immediately solved the
> performance problems.
>
> I do not have more details - sorry.
>
> Is it expected for snapshots to have a very noticeable performance
> impact on the file system being snapshotted?

No, this behavior is unexpected. The only way that snapshots should have a performance impact on access to the filesystem is if you are running low on space in the pool or quota (which it sounds like you are not).

Can you describe what the performance problems were? What was the workload like? What problem did you identify? How did it improve when you 'zfs destroy'-ed the oldest snapshot? Are you sure that the oldest snapshot wasn't pushing you close to your quota?

--matt
Matthew Ahrens wrote On 10/16/06 09:07,:
> Robert Milkowski wrote:
>
>> Hello zfs-discuss,
>>
>> S10U2 + patches. ZFS pool of about 2 TB in size. A snapshot is created
>> each day and 7 copies are kept. There's a quota set on the file system;
>> however, there's always at least 50 GB of free space in the file system
>> (and much more in the pool). The ZFS file system is exported over NFS.
>>
>> Snapshots consume about 280 GB of space.
>>
>> We have noticed some performance problems on NFS clients of this file
>> system, even during times with smaller load. Raising the quota didn't
>> help. However, removing the oldest snapshot immediately solved the
>> performance problems.
>>
>> I do not have more details - sorry.
>>
>> Is it expected for snapshots to have a very noticeable performance
>> impact on the file system being snapshotted?
>
> No, this behavior is unexpected. The only way that snapshots should
> have a performance impact on access to the filesystem is if you are
> running low on space in the pool or quota (which it sounds like you are
> not).
>
> Can you describe what the performance problems were? What was the
> workload like? What problem did you identify? How did it improve when
> you 'zfs destroy'-ed the oldest snapshot? Are you sure that the oldest
> snapshot wasn't pushing you close to your quota?
>
> --matt

I could well believe there would be a hiccup on the rest of the pool when the snapshot is taken. Each snapshot calls txg_wait_synced four times: a few related to the ZIL and one from dsl_sync_task_group_wait.
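For reference, one minimal way to confirm that call count on a live system is sketched below; it simply counts txg_wait_synced() entries (with a short stack) while a snapshot is taken. The dataset name is a placeholder, not one from this setup:

# dtrace -n 'fbt::txg_wait_synced:entry { @[stack(5)] = count(); }' \
    -c 'zfs snapshot pool/fs@probe'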
Hello Matthew,

Monday, October 16, 2006, 5:07:50 PM, you wrote:

MA> Robert Milkowski wrote:
>> Hello zfs-discuss,
>>
>> S10U2 + patches. ZFS pool of about 2 TB in size. A snapshot is created
>> each day and 7 copies are kept. There's a quota set on the file system;
>> however, there's always at least 50 GB of free space in the file system
>> (and much more in the pool). The ZFS file system is exported over NFS.
>>
>> Snapshots consume about 280 GB of space.
>>
>> We have noticed some performance problems on NFS clients of this file
>> system, even during times with smaller load. Raising the quota didn't
>> help. However, removing the oldest snapshot immediately solved the
>> performance problems.
>>
>> I do not have more details - sorry.
>>
>> Is it expected for snapshots to have a very noticeable performance
>> impact on the file system being snapshotted?

MA> No, this behavior is unexpected. The only way that snapshots should
MA> have a performance impact on access to the filesystem is if you are
MA> running low on space in the pool or quota (which it sounds like you are
MA> not).

MA> Can you describe what the performance problems were? What was the
MA> workload like? What problem did you identify? How did it improve when
MA> you 'zfs destroy'-ed the oldest snapshot? Are you sure that the oldest
MA> snapshot wasn't pushing you close to your quota?

I wasn't on-site every time it happened... but it looks like all NFS clients of that file system have problems accessing data for 2-5 s, and iostat on the clients shows 100% busy for the disk. There was well over 100 GB of free space in the file system, and raising the quota to add another 100 GB didn't help. After removing snapshots all problems disappeared almost immediately. No new snapshots were created during this time.

It happened a second time - and again, quite a lot of free space, and destroying one snapshot helped. If it happens again I'll try to get some more specific data - however it depends on when it happens, as during peak hours I'll probably just destroy a snapshot to get it working.

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
Robert Milkowski wrote:
> If it happens again I'll try to get some more specific data - however
> it depends on when it happens, as during peak hours I'll probably just
> destroy a snapshot to get it working.

If it happens again, it would be great if you could gather some data before you destroy the snapshot so we have some chance of figuring out what's going on here.

'iostat -xnpc 1' will tell us if it's CPU or disk bound.
'lockstat -kgiw sleep 10' will tell us what functions are using CPU.
'echo "::walk thread|::findstack" | mdb -k' will tell us where threads are stuck.

Actually, if you could gather each of those both while you're observing the problem, and then after the problem goes away, that would be helpful.

--matt
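One possible way to capture all three in a single pass is sketched below (run it once while the problem is visible and once after it clears; the output file names are arbitrary):

# iostat -xnpc 1 30 > /var/tmp/iostat.out &
# lockstat -kgiw sleep 10 > /var/tmp/lockstat.out
# echo "::walk thread|::findstack" | mdb -k > /var/tmp/stacks.out
# wait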
Hi.

On NFS clients which mount the file system f3-1/d611 I can see 3-5 s periods of 100% busy (iostat) and almost no I/Os issued to the NFS server; on the NFS server at the same time disk activity is almost 0 (both iostat and zpool iostat). However, CPU activity in SYS increases during those periods.

There's quite a lot of free space in f3-1/d611 as you can see:

# zfs list | egrep "611|NAM"
NAME                        USED  AVAIL  REFER  MOUNTPOINT
f3-1/d611                   381G   219G   151G  //f3-1/d611
f3-1/d611@auto-2006-10-18  48.2G      -   156G  -
f3-1/d611@auto-2006-10-19  19.4G      -   145G  -
f3-1/d611@auto-2006-10-20  21.9G      -   155G  -
f3-1/d611@auto-2006-10-21  22.9G      -   151G  -
f3-1/d611@auto-2006-10-22  18.5G      -   145G  -
f3-1/d611@auto-2006-10-23  17.6G      -   148G  -
f3-1/d611@auto-2006-10-24  14.5G      -   154G  -

And much more in the pool itself:

# zfs list | egrep "f3-1 |NAM"
NAME   USED  AVAIL  REFER  MOUNTPOINT
f3-1   563G   642G  9.18G  //f3-1

# zfs get all f3-1/d611
NAME       PROPERTY       VALUE                  SOURCE
f3-1/d611  type           filesystem             -
f3-1/d611  creation       Tue Aug 29  2:07 2006  -
f3-1/d611  used           381G                   -
f3-1/d611  available      219G                   -
f3-1/d611  referenced     151G                   -
f3-1/d611  compressratio  1.00x                  -
f3-1/d611  mounted        yes                    -
f3-1/d611  quota          600G                   local
f3-1/d611  reservation    none                   default
f3-1/d611  recordsize     128K                   default
f3-1/d611  mountpoint     //f3-1/d611            default
f3-1/d611  sharenfs       off                    default
f3-1/d611  checksum       on                     default
f3-1/d611  compression    off                    default
f3-1/d611  atime          off                    inherited from f3-1
f3-1/d611  devices        on                     default
f3-1/d611  exec           on                     default
f3-1/d611  setuid         on                     default
f3-1/d611  readonly       off                    default
f3-1/d611  zoned          off                    default
f3-1/d611  snapdir        hidden                 default
f3-1/d611  aclmode        groupmask              default
f3-1/d611  aclinherit     secure                 default

Access to this file system is only over NFS. Right now nfsd has fewer than 400 active threads.

Now, output from several of the commands you requested, taken during the times when disk activity is low, CPU usage is higher and the NFS clients see the problem.

# zpool iostat 1
[...]
f3-1   562G   662G     20      0   156K      0
f3-1   562G   662G     29      6   842K   887K
f3-1   562G   662G     12      0   238K      0
f3-1   562G   662G    927     25  11.0M  2.38M
f3-1   562G   662G  1.96K    221  19.8M  5.13M
f3-1   562G   662G    110    112  2.09M  1.72M

# iostat -xnzp 1
[...]
extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 44.0 0.0 819.4 0.0 0.0 0.3 0.0 7.2 0 22 c5t600C0FF000000000098FD57F9DA83C00d0 44.0 0.0 819.4 0.0 0.0 0.3 0.0 7.2 0 22 c5t600C0FF000000000098FD57F9DA83C00d0s0 26.0 0.0 179.5 0.0 0.0 0.2 0.0 6.8 0 14 c5t600C0FF000000000098FD55DBA4EA000d0 26.0 0.0 179.5 0.0 0.0 0.2 0.0 6.8 0 14 c5t600C0FF000000000098FD55DBA4EA000d0s0 26.0 0.0 224.0 0.0 0.0 0.2 0.0 7.2 0 15 c5t600C0FF000000000098FD516E4403200d0 26.0 0.0 224.0 0.0 0.0 0.2 0.0 7.2 0 15 c5t600C0FF000000000098FD516E4403200d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 12.0 0.0 216.5 0.0 0.0 0.1 0.0 6.1 0 6 c5t600C0FF000000000098FD57F9DA83C00d0 12.0 0.0 216.5 0.0 0.0 0.1 0.0 6.1 0 6 c5t600C0FF000000000098FD57F9DA83C00d0s0 10.0 0.0 70.5 0.0 0.0 0.1 0.0 6.4 0 6 c5t600C0FF000000000098FD55DBA4EA000d0 10.0 0.0 70.5 0.0 0.0 0.1 0.0 6.4 0 6 c5t600C0FF000000000098FD55DBA4EA000d0s0 8.0 0.0 63.0 0.0 0.0 0.0 0.0 6.0 0 5 c5t600C0FF000000000098FD516E4403200d0 8.0 0.0 63.0 0.0 0.0 0.0 0.0 6.0 0 5 c5t600C0FF000000000098FD516E4403200d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1.0 0.0 0.5 0.0 0.0 0.0 12.3 0 1 c1t0d0 0.0 1.0 0.0 0.5 0.0 0.0 0.0 12.3 0 1 c1t0d0s6 0.0 1.0 0.0 0.5 0.0 0.0 0.0 9.5 0 1 c1t1d0 0.0 1.0 0.0 0.5 0.0 0.0 0.0 9.5 0 1 c1t1d0s6 7.0 0.0 55.8 0.0 0.0 0.1 0.0 7.8 0 5 c5t600C0FF000000000098FD57F9DA83C00d0 7.0 0.0 55.8 0.0 0.0 0.1 0.0 7.8 0 5 c5t600C0FF000000000098FD57F9DA83C00d0s0 6.0 7.0 66.8 893.2 0.0 0.1 0.0 6.4 0 5 c5t600C0FF000000000098FD55DBA4EA000d0 6.0 7.0 66.8 893.2 0.0 0.1 0.0 6.4 0 5 c5t600C0FF000000000098FD55DBA4EA000d0s0 9.0 0.0 665.4 0.0 0.0 0.1 0.0 6.5 0 4 c5t600C0FF000000000098FD516E4403200d0 9.0 0.0 665.4 0.0 0.0 0.1 0.0 6.5 0 4 c5t600C0FF000000000098FD516E4403200d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 5.0 0.0 36.6 0.0 0.0 0.0 0.0 7.3 0 4 c5t600C0FF000000000098FD57F9DA83C00d0 5.0 0.0 36.6 0.0 0.0 0.0 0.0 7.3 0 4 c5t600C0FF000000000098FD57F9DA83C00d0s0 11.0 0.0 84.3 0.0 0.0 0.1 0.0 6.1 0 6 c5t600C0FF000000000098FD55DBA4EA000d0 11.0 0.0 84.3 0.0 0.0 0.1 0.0 6.1 0 6 c5t600C0FF000000000098FD55DBA4EA000d0s0 5.0 0.0 181.1 0.0 0.0 0.0 0.0 7.1 0 4 c5t600C0FF000000000098FD516E4403200d0 5.0 0.0 181.1 0.0 0.0 0.0 0.0 7.1 0 4 c5t600C0FF000000000098FD516E4403200d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 229.9 10.0 2795.9 639.9 0.0 1.9 0.0 7.8 0 49 c5t600C0FF000000000098FD57F9DA83C00d0 229.9 10.0 2795.9 639.9 0.0 1.9 0.0 7.8 0 49 c5t600C0FF000000000098FD57F9DA83C00d0s0 243.9 13.0 3790.1 1151.7 0.0 2.0 0.0 7.6 0 54 c5t600C0FF000000000098FD55DBA4EA000d0 243.9 13.0 3790.1 1151.7 0.0 2.0 0.0 7.6 0 54 c5t600C0FF000000000098FD55DBA4EA000d0s0 198.0 0.0 2355.5 0.0 0.0 1.5 0.0 7.7 0 51 c5t600C0FF000000000098FD516E4403200d0 198.0 0.0 2355.5 0.0 0.0 1.5 0.0 7.7 0 51 c5t600C0FF000000000098FD516E4403200d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 712.1 71.0 6038.5 1469.6 0.0 8.4 0.0 10.8 0 98 c5t600C0FF000000000098FD57F9DA83C00d0 712.1 71.0 6038.5 1469.6 0.0 8.4 0.0 10.8 0 98 c5t600C0FF000000000098FD57F9DA83C00d0s0 759.1 164.0 7574.6 2283.2 0.0 10.2 0.0 11.1 0 100 c5t600C0FF000000000098FD55DBA4EA000d0 759.1 164.0 7574.6 2283.2 0.0 10.2 0.0 11.1 0 100 c5t600C0FF000000000098FD55DBA4EA000d0s0 716.1 40.0 8526.2 1956.2 0.0 8.7 0.0 11.5 0 98 c5t600C0FF000000000098FD516E4403200d0 716.1 40.0 8526.2 1956.2 0.0 8.7 0.0 11.5 0 98 c5t600C0FF000000000098FD516E4403200d0s0 # 
mpstat 1
[...]
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    7    72   61  563    6  103   13    0   253    1  20   0  79
  1    0   0    3    57   31  359    0   70   11    1    14    0  82   0  18
  2    0   0    8   369  356  755    3  144   18    0   140    0   3   0  97
  3    0   0    9   321  218  334    1   18    8    0    32    0  57   0  43
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    1    21   16  192    0   13    3    0    40    0  55   0  45
  1    0   0    1    54   27  324    2   50    8    0   195    0  15   0  85
  2    0   0    2   165  161  148    0   39    2    0    49    0   1   0  99
  3    0   0    7   260  158  144    1   36    4    0    31    0  60   0  40
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0   15    36   29  113    1   35    6    0    18    0  87   0  13
  1    0   0    5    60   33  497    1   59   11    0   108    0  48   0  52
  2    0   0    5   491  483  477   25   72    9    0   214    1  12   0  87
  3    0   0    7   288  183  328    2   30    4    0    45    0  46   0  54
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    7    15   12  270    0   33    6    0   175    0  12   0  88
  1    0   0    3    47   25  178    0   42    4    0    47    1   5   0  94
  2    0   0    1    64   60   70    0   22    3    0    75    0   0   0 100
  3    0   0    2   267  162    2    2    1    1    0     0    0 100   0   0
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0  112   493  428 3068    7  638  279   91    24    0  50   0  50
  1    0   0   55    87   30 1541    4  402  167   71   154    0  70   0  30
  2    0   0  145  4074 3932 4564    5  998  490  497   106    0  51   0  49
  3    0   0  139   764  587 4220   12  798  334  461    17    0  66   0  34
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0  281  1189  902 9050   22 1867 1173 1573   139    0  62   0  38
  1    0   0  136   357   27 7267   11 1645  985 1128    42    0  76   0  24
  2    0   0  168  7332 7079 7255   21 1608  952 1138    80    0  63   0  37
  3    0   0  331  1199 1058 1589   19  272  253   58    34    0  84   0  16

A different time period, when disk activity is small:

# lockstat -kgIw sleep 10 | less
Profiling interrupt: 3904 events in 10.055 seconds (388 events/sec)

Count genr cuml rcnt     nsec CPU+PIL        Hottest Caller
-------------------------------------------------------------------------------
 8925 229% ---- 0.00     5067 cpu[1]         nfssys
 7386 189% ---- 0.00     2610 cpu[3]         thread_start
 6566 168% ---- 0.00     4404 cpu[2]         thread_start
 6453 165% ---- 0.00     3325 cpu[0]         thread_start
 1259  32% ---- 0.00     1756 cpu[0]+11      thread_start
 1161  30% ---- 0.00     2247 cpu[2]+11      thread_start
 1159  30% ---- 0.00     1472 cpu[3]+11      thread_start
  939  24% ---- 0.00     2459 cpu[1]+11      thread_start
  190   5% ---- 0.00     2880 cpu[0]+6       pci_intr_wrapper
  181   5% ---- 0.00     2914 cpu[3]+6       ql_response_pkt
  103   3% ---- 0.00     4591 cpu[2]+6       intr_thread
    7   0% ---- 0.00     1118 cpu[3]+12      rmc_comm_serdev_receive
    5   0% ---- 0.00     1022 cpu[3]+13      cyclic_softint
-------------------------------------------------------------------------------

Now, during another period when disk activity is low and the NFS clients see the problem:

# dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt=0;}' -n tick-5s'{printa(@);exit(0);}'
[...]
  page_sub_common           840400
  mutex_vector_enter        880400
  callout_schedule_1       1106200
  kmem_cache_alloc         1182400
  lock_set_spl_spin        1240000
  thread_lock              1269600
  syscall_mstate           1359200
  set_freemem              1575800
  page_trylock             1587500
  hwblkpagecopy            1885100
  set_anoninfo             1904400
  page_pptonum             2044900
  xc_serv                  2147600
  lzjb_decompress          3209600
  resume                   4783500
  hwblkclr                 7101100
  page_next_scan_large    23648600
  generic_idle_cpu        69234100
  disp_getwork           139261800
  avl_walk               669424900

Let's see how it looks when there's no problem:

# dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt=0;}' -n tick-5s'{printa(@);exit(0);}'
[...]
  avl_insert               8591400
  sleepq_insert            8660800
  fletcher_4_native        9789400
  buf_hash                10454600
  copy_pattern            12395200
  hat_getpfnum            14970100
  ddi_get_instance        16348900
  ddi_get_soft_state      19676500
  pci_pbm_dma_sync        21993300
  space_map_seg_compare   25628100
  lock_set_spin           31066500
  ip_cksum                31528900
  kmem_cache_alloc        34203500
  kmem_free               35418800
  page_next_scan_large    35833700
  fletcher_2_native       48868100
  kmem_cache_free         54456400
  avl_destroy_nodes       63271000
  lzjb_decompress         72163800
  lock_set_spl_spin       72655100
  lzjb_compress           91146600
  mutex_vector_enter     113553300
  resume                2836183200

No avl_walk :)

Heh, I'm sure I have seen avl_walk consuming a lot of CPU before... So wait for another such period and (6-7 seconds):

# dtrace -n fbt::avl_walk:entry'{@[stack()]=count();}'
[...]
              zfs`space_map_add+0x160
              zfs`space_map_load+0x218
              zfs`metaslab_activate+0x3c
              zfs`metaslab_group_alloc+0x1c0
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_dva_allocate+0x4c
              zfs`zio_write_compress+0x1ec
              zfs`arc_write+0xe4
              zfs`dbuf_sync+0x6d8
              zfs`dnode_sync+0x35c
              zfs`dmu_objset_sync_dnodes+0x6c
              zfs`dmu_objset_sync+0x54
              zfs`dsl_dataset_sync+0xc
              zfs`dsl_pool_sync+0x104
              zfs`spa_sync+0xe4
              zfs`txg_sync_thread+0x134
              unix`thread_start+0x4
              1259

              zfs`space_map_add+0x174
              zfs`space_map_load+0x218
              zfs`metaslab_activate+0x3c
              zfs`metaslab_group_alloc+0x1c0
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_alloc_blk+0x70
              zfs`zil_lwb_write_start+0x8c
              zfs`zil_lwb_commit+0x1ac
              zfs`zil_commit+0x1b0
              zfs`zfs_fsync+0xa8
              genunix`fop_fsync+0x14
              nfssrv`rfs3_create+0x700
              nfssrv`common_dispatch+0x444
              rpcmod`svc_getreq+0x154
              rpcmod`svc_run+0x198
              nfs`nfssys+0x1c4
              unix`syscall_trap32+0xcc
              183895

              zfs`space_map_add+0x160
              zfs`space_map_load+0x218
              zfs`metaslab_activate+0x3c
              zfs`metaslab_group_alloc+0x1c0
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_alloc_blk+0x70
              zfs`zil_lwb_write_start+0x8c
              zfs`zil_lwb_commit+0x1ac
              zfs`zil_commit+0x1b0
              zfs`zfs_fsync+0xa8
              genunix`fop_fsync+0x14
              nfssrv`rfs3_create+0x700
              nfssrv`common_dispatch+0x444
              rpcmod`svc_getreq+0x154
              rpcmod`svc_run+0x198
              nfs`nfssys+0x1c4
              unix`syscall_trap32+0xcc
              196122

              zfs`metaslab_ff_alloc+0x9c
              zfs`space_map_alloc+0x10
              zfs`metaslab_group_alloc+0x1e4
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_alloc_blk+0x70
              zfs`zil_lwb_write_start+0x8c
              zfs`zil_commit+0x21c
              zfs`zfs_fsync+0xa8
              genunix`fop_fsync+0x14
              nfssrv`rfs3_remove+0x218
              nfssrv`common_dispatch+0x444
              rpcmod`svc_getreq+0x154
              rpcmod`svc_run+0x198
              nfs`nfssys+0x1c4
              unix`syscall_trap32+0xcc
              854743

              zfs`metaslab_ff_alloc+0x9c
              zfs`space_map_alloc+0x10
              zfs`metaslab_group_alloc+0x1e4
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_dva_allocate+0x4c
              zfs`zio_write_compress+0x1ec
              zfs`arc_write+0xe4
              zfs`dbuf_sync+0x6d8
              zfs`dnode_sync+0x35c
              zfs`dmu_objset_sync_dnodes+0x6c
              zfs`dmu_objset_sync+0x54
              zfs`dsl_dataset_sync+0xc
              zfs`dsl_pool_sync+0x104
              zfs`spa_sync+0xe4
              zfs`txg_sync_thread+0x134
              unix`thread_start+0x4
              1042372

              zfs`metaslab_ff_alloc+0x9c
              zfs`space_map_alloc+0x10
              zfs`metaslab_group_alloc+0x1e4
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_alloc_blk+0x70
              zfs`zil_lwb_write_start+0x8c
              zfs`zil_lwb_commit+0x1ac
              zfs`zil_commit+0x1b0
              zfs`zfs_fsync+0xa8
              genunix`fop_fsync+0x14
              nfssrv`rfs3_create+0x700
              nfssrv`common_dispatch+0x444
              rpcmod`svc_getreq+0x154
              rpcmod`svc_run+0x198
              nfs`nfssys+0x1c4
              unix`syscall_trap32+0xcc
              1415957

And another "bad" time period:

# dtrace -n fbt::avl_walk:entry'{self->vt=vtimestamp;}' -n fbt::avl_walk:return'/self->vt/{@[stack()]=sum(vtimestamp-self->vt);self->vt=0;}' -n tick-6s'{printa(@);exit(0);}'
[...]
              zfs`vdev_queue_io_to_issue+0xc4
              zfs`vdev_queue_io_done+0x4c
              zfs`vdev_disk_io_done+0x4
              genunix`taskq_thread+0x1a4
              unix`thread_start+0x4
              11700

              zfs`vdev_queue_io_to_issue+0x58
              zfs`vdev_queue_io_done+0x4c
              zfs`vdev_disk_io_done+0x4
              genunix`taskq_thread+0x1a4
              unix`thread_start+0x4
              20800

              zfs`metaslab_ff_alloc+0x9c
              zfs`space_map_alloc+0x10
              zfs`metaslab_group_alloc+0x1e4
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_dva_allocate+0x4c
              zfs`zio_write_compress+0x1ec
              zfs`arc_write+0xe4
              zfs`dbuf_sync+0x6d8
              zfs`dnode_sync+0x35c
              zfs`dmu_objset_sync_dnodes+0x6c
              zfs`dmu_objset_sync+0x54
              zfs`dsl_dataset_sync+0xc
              zfs`dsl_pool_sync+0x104
              zfs`spa_sync+0xe4
              zfs`txg_sync_thread+0x134
              unix`thread_start+0x4
              382903900

              zfs`metaslab_ff_alloc+0x9c
              zfs`space_map_alloc+0x10
              zfs`metaslab_group_alloc+0x1e4
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_alloc_blk+0x70
              zfs`zil_lwb_write_start+0x8c
              zfs`zil_lwb_commit+0x1ac
              zfs`zil_commit+0x1b0
              zfs`zfs_write+0x6c4
              genunix`fop_write+0x20
              nfssrv`rfs3_write+0x368
              nfssrv`common_dispatch+0x444
              rpcmod`svc_getreq+0x154
              rpcmod`svc_run+0x198
              nfs`nfssys+0x1c4
              unix`syscall_trap32+0xcc
              475080700

              zfs`metaslab_ff_alloc+0x9c
              zfs`space_map_alloc+0x10
              zfs`metaslab_group_alloc+0x1e4
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_alloc_blk+0x70
              zfs`zil_lwb_write_start+0x8c
              zfs`zil_lwb_commit+0x1ac
              zfs`zil_commit+0x1b0
              zfs`zfs_fsync+0xa8
              genunix`fop_fsync+0x14
              nfssrv`rfs3_create+0x700
              nfssrv`common_dispatch+0x444
              rpcmod`svc_getreq+0x154
              rpcmod`svc_run+0x198
              nfs`nfssys+0x1c4
              unix`syscall_trap32+0xcc
              481455800

So let's destroy the oldest snapshot:

# zfs destroy f3-1/d611@auto-2006-10-18
[it took about 4 minutes!]

# mdb -k
Loading modules: [ unix krtld genunix dtrace specfs ufs sd md ip sctp usba fcp fctl qlc ssd lofs logindmux ptm cpc fcip random crypto zfs nfs ]
> ::ps!grep zfs
R  20905  20464  20905  20458      0 0x4a004000 00000301ec367c20 zfs
> 00000301ec367c20::walk thread|::findstack -v
stack pointer for thread 301d1a02fe0: 2a103a28b71
[ 000002a103a28b71 cv_wait+0x38() ]
  000002a103a28c21 txg_wait_synced+0x54(60007327290, 1a7bda, 1a7bd9, 600073272d0, 600073272d2, 60007327288)
  000002a103a28cd1 dsl_dir_sync_task+0x124(60003867d00, 0, 60003f6800a, 0, 0, 1a7bda)
  000002a103a28d81 dsl_dataset_destroy+0x98(60003f68000, 5, 3, 60003f68009, 600073271c0, 0)
  000002a103a28f61 dmu_objset_destroy+0x3c(60003f68000, 0, 0, 2, 0, 54)
  000002a103a29021 zfsdev_ioctl+0x160(70370c00, 54, ffbfed38, 1c, 70, e68)
  000002a103a290d1 fop_ioctl+0x20(301ee6220c0, 5a1c, ffbfed38, 100003, 60003dcebf8, 12019f0)
  000002a103a29191 ioctl+0x184(4, 3005b7c2ba8, ffbfed38, ff38db68, 40350, 5a1c)
  000002a103a292e1 syscall_trap32+0xcc(4, 5a1c, ffbfed38, ff38db68, 40350, 0)

After the snapshot was destroyed the problem is completely gone. And right now there is no avl_walk function at the top:

# dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt=0;}' -n tick-5s'{printa(@);exit(0);}'
[...]
  cv_broadcast                11204800
  vdev_cache_offset_compare   11977900
  avl_walk                    12568500
  hat_getpfnum                13012900
  zio_next_stage              13649700
  page_next_scan_large        15850300
  ddi_get_instance            16842800
  kmem_free                   20050300
  pci_pbm_dma_sync            22854300
  ddi_get_soft_state          25803500
  space_map_seg_compare       32883300
  kmem_cache_free             34151000
  ip_cksum                    34840900
  kmem_cache_alloc            41557900
  resume                      47279100
  fletcher_2_native           55060000
  lzjb_compress               94374200
  lzjb_decompress             97102700
  mutex_vector_enter         115330500

Let's try a small experiment - see if lowering the quota for the fs makes things bad again:

# zfs list | egrep "NAME|f3-1/d611"
NAME                        USED  AVAIL  REFER  MOUNTPOINT
f3-1/d611                   334G   266G   151G  //f3-1/d611
f3-1/d611@auto-2006-10-19  25.7G      -   145G  -
f3-1/d611@auto-2006-10-20  21.9G      -   155G  -
f3-1/d611@auto-2006-10-21  22.9G      -   151G  -
f3-1/d611@auto-2006-10-22  18.5G      -   145G  -
f3-1/d611@auto-2006-10-23  17.6G      -   148G  -
f3-1/d611@auto-2006-10-24  14.7G      -   154G  -

# zfs get quota f3-1/d611
NAME       PROPERTY  VALUE  SOURCE
f3-1/d611  quota     600G   local

# zfs set quota=400g f3-1/d611

# zfs list | egrep "NAME|f3-1/d611"
NAME                        USED  AVAIL  REFER  MOUNTPOINT
f3-1/d611                   334G  66.1G   151G  //f3-1/d611
f3-1/d611@auto-2006-10-19  25.7G      -   145G  -
f3-1/d611@auto-2006-10-20  21.9G      -   155G  -
f3-1/d611@auto-2006-10-21  22.9G      -   151G  -
f3-1/d611@auto-2006-10-22  18.5G      -   145G  -
f3-1/d611@auto-2006-10-23  17.6G      -   148G  -
f3-1/d611@auto-2006-10-24  14.7G      -   154G  -
Forgot to mention - after the quota was lowered there is still no problem; everything works OK.
Robert Milkowski wrote:
> Hi.
>
> On NFS clients which mount the file system f3-1/d611 I can see 3-5 s
> periods of 100% busy (iostat) and almost no I/Os issued to the NFS
> server; on the NFS server at the same time disk activity is almost 0
> (both iostat and zpool iostat). However, CPU activity in SYS increases
> during those periods.

> A different time period, when disk activity is small:
>
> # lockstat -kgIw sleep 10 | less

Did you happen to get 'lockstat -kgIW' output while the problem was occurring? (note the capital W) I'm not sure how to interpret the -w output... (and sorry I gave you the wrong flags before).

> Now, during another period when disk activity is low and the NFS clients see the problem:
>
> # dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt=0;}' -n tick-5s'{printa(@);exit(0);}'
> [...]
>   page_next_scan_large    23648600
>   generic_idle_cpu        69234100
>   disp_getwork           139261800
>   avl_walk               669424900

Hmm, that's a possibility, but the method you're using to gather this information (tracing *every* function entry and exit) is a bit heavy-handed, and it may be distorting the results.

> Heh, I'm sure I have seen avl_walk consuming a lot of CPU before...
>
> So wait for another such period and (6-7 seconds):
>
> # dtrace -n fbt::avl_walk:entry'{@[stack()]=count();}'
> [...]

>               zfs`metaslab_ff_alloc+0x9c
>               zfs`space_map_alloc+0x10
>               zfs`metaslab_group_alloc+0x1e4
>               zfs`metaslab_alloc_dva+0x114
>               zfs`metaslab_alloc+0x2c
>               zfs`zio_alloc_blk+0x70
>               zfs`zil_lwb_write_start+0x8c
>               zfs`zil_lwb_commit+0x1ac
>               zfs`zil_commit+0x1b0
>               zfs`zfs_fsync+0xa8
>               genunix`fop_fsync+0x14
>               nfssrv`rfs3_create+0x700
>               nfssrv`common_dispatch+0x444
>               rpcmod`svc_getreq+0x154
>               rpcmod`svc_run+0x198
>               nfs`nfssys+0x1c4
>               unix`syscall_trap32+0xcc
>               1415957

Hmm, assuming that avl_walk() is actually consuming most of our CPU (which the lockstat -kgIW will confirm), this seems to indicate that we're taking a long time trying to find free chunks of space. This may happen if you have lots of small fragments of free space, but no chunks large enough to hold the block we're trying to allocate. We try to avoid this situation by trying to allocate from the metaslabs with the most free space, but it's possible that there's an error in this algorithm.

> So let's destroy the oldest snapshot:
>
> # zfs destroy f3-1/d611@auto-2006-10-18
> [it took about 4 minutes!]

> After the snapshot was destroyed the problem is completely gone.

FYI, destroying the snapshot probably helped simply by (a) returning some big chunks of space to the pool and/or (b) perturbing the system enough so that we try different metaslabs which aren't so fragmented.

--matt
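As a lighter-weight way to test that hypothesis (a sketch only - it instruments just the first-fit allocator instead of every function, so the probe effect should be small; long tails in the distribution would point at fragmented space maps):

# dtrace -n 'fbt::metaslab_ff_alloc:entry { self->t = timestamp; }' \
         -n 'fbt::metaslab_ff_alloc:return /self->t/ { @["ns per alloc"] = quantize(timestamp - self->t); self->t = 0; }' \
         -n 'tick-30s { exit(0); }'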
Hello Matthew,

Wednesday, October 25, 2006, 7:08:56 AM, you wrote:

MA> Robert Milkowski wrote:
>> Hi.
>>
>> On NFS clients which mount the file system f3-1/d611 I can see 3-5 s
>> periods of 100% busy (iostat) and almost no I/Os issued to the NFS
>> server; on the NFS server at the same time disk activity is almost 0
>> (both iostat and zpool iostat). However, CPU activity in SYS increases
>> during those periods.

>> A different time period, when disk activity is small:
>>
>> # lockstat -kgIw sleep 10 | less

MA> Did you happen to get 'lockstat -kgIW' output while the problem was
MA> occurring? (note the capital W) I'm not sure how to interpret the -w
MA> output... (and sorry I gave you the wrong flags before).

Sorry, I just blindly copied it (changed i to I); next time I'll try that.

>> Now, during another period when disk activity is low and the NFS clients see the problem:
>>
>> # dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt=0;}' -n tick-5s'{printa(@);exit(0);}'
>> [...]
>>   page_next_scan_large    23648600
>>   generic_idle_cpu        69234100
>>   disp_getwork           139261800
>>   avl_walk               669424900

MA> Hmm, that's a possibility, but the method you're using to gather this
MA> information (tracing *every* function entry and exit) is a bit
MA> heavy-handed, and it may be distorting the results.

That could be the case - however, during normal periods avl_walk, measured the same way, is not at the top.

>> Heh, I'm sure I have seen avl_walk consuming a lot of CPU before...
>>
>> So wait for another such period and (6-7 seconds):
>>
>> # dtrace -n fbt::avl_walk:entry'{@[stack()]=count();}'
>> [...]

>>               zfs`metaslab_ff_alloc+0x9c
>>               zfs`space_map_alloc+0x10
>>               zfs`metaslab_group_alloc+0x1e4
>>               zfs`metaslab_alloc_dva+0x114
>>               zfs`metaslab_alloc+0x2c
>>               zfs`zio_alloc_blk+0x70
>>               zfs`zil_lwb_write_start+0x8c
>>               zfs`zil_lwb_commit+0x1ac
>>               zfs`zil_commit+0x1b0
>>               zfs`zfs_fsync+0xa8
>>               genunix`fop_fsync+0x14
>>               nfssrv`rfs3_create+0x700
>>               nfssrv`common_dispatch+0x444
>>               rpcmod`svc_getreq+0x154
>>               rpcmod`svc_run+0x198
>>               nfs`nfssys+0x1c4
>>               unix`syscall_trap32+0xcc
>>               1415957

MA> Hmm, assuming that avl_walk() is actually consuming most of our CPU
MA> (which the lockstat -kgIW will confirm), this seems to indicate that
MA> we're taking a long time trying to find free chunks of space. This may
MA> happen if you have lots of small fragments of free space, but no chunks
MA> large enough to hold the block we're trying to allocate. We try to
MA> avoid this situation by trying to allocate from the metaslabs with the
MA> most free space, but it's possible that there's an error in this algorithm.

>> So let's destroy the oldest snapshot:
>>
>> # zfs destroy f3-1/d611@auto-2006-10-18
>> [it took about 4 minutes!]

>> After the snapshot was destroyed the problem is completely gone.

MA> FYI, destroying the snapshot probably helped simply by (a) returning
MA> some big chunks of space to the pool and/or (b) perturbing the system
MA> enough so that we try different metaslabs which aren't so fragmented.

Well, that's scary - it means that when I fill up the pool a little bit more I could hit a big problem even without using snapshots, which is much worse, as there would be no easy and quick way to relieve the symptoms like just destroying a snapshot.

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
Robert Milkowski
2006-Oct-25 13:35 UTC
[zfs-discuss] Re: Re: Snapshots impact on performance
Hi.

Today the same problem again - however not as severe, so I'm not sure if the lockstat below is representative.

# lockstat -kgIW sleep 5 | less
Profiling interrupt: 1968 events in 5.072 seconds (388 events/sec)

Count genr cuml rcnt     nsec Hottest CPU+PIL Caller
-------------------------------------------------------------------------------
 1560  79% ---- 0.00     2520 cpu[3]          thread_start
 1018  52% ---- 0.00     2069 cpu[2]+11       idle
  851  43% ---- 0.00     2093 cpu[2]+11       disp_getwork
  591  30% ---- 0.00     3647 cpu[1]          metaslab_alloc
  591  30% ---- 0.00     3647 cpu[1]          metaslab_alloc_dva
  591  30% ---- 0.00     3647 cpu[1]          metaslab_group_alloc
  404  21% ---- 0.00     3488 cpu[1]          space_map_load
  404  21% ---- 0.00     3488 cpu[1]          metaslab_activate
  382  19% ---- 0.00     3564 cpu[1]          zio_dva_allocate
  382  19% ---- 0.00     3564 cpu[1]          zio_write_compress
  382  19% ---- 0.00     3564 cpu[1]          txg_sync_thread
  382  19% ---- 0.00     3564 cpu[1]          spa_sync
  382  19% ---- 0.00     3564 cpu[1]          dsl_pool_sync
  382  19% ---- 0.00     3564 cpu[1]          dsl_dataset_sync
  382  19% ---- 0.00     3564 cpu[1]          dnode_sync
  382  19% ---- 0.00     3564 cpu[1]          dmu_objset_sync
  382  19% ---- 0.00     3564 cpu[1]          dmu_objset_sync_dnodes
  382  19% ---- 0.00     3564 cpu[1]          dbuf_sync
  382  19% ---- 0.00     3564 cpu[1]          arc_write
  363  18% ---- 0.00     3510 cpu[1]          syscall_trap32
  361  18% ---- 0.00     3514 cpu[1]          nfssys
  361  18% ---- 0.00     3514 cpu[1]          svc_run
  350  18% ---- 0.00     3532 cpu[1]          svc_getreq
  349  18% ---- 0.00     3533 cpu[1]          common_dispatch
  251  13% ---- 0.00     3813 cpu[1]          avl_find
  238  12% ---- 0.00     3320 cpu[1]          space_map_remove
  211  11% ---- 0.00     3798 cpu[1]          zil_commit
  209  11% ---- 0.00     3800 cpu[1]          zio_alloc_blk
  209  11% ---- 0.00     3800 cpu[1]          zil_lwb_write_start
  198  10% ---- 0.00     4256 cpu[1]          metaslab_ff_alloc
  196  10% ---- 0.00     3867 cpu[1]          zfs_putpage
[...]
Matthew Ahrens
2006-Oct-25 16:58 UTC
[zfs-discuss] Re: Re: Snapshots impact on performance
Robert Milkowski wrote:
> Hi.
>
> Today the same problem again - however not as severe, so I'm not sure if the lockstat below is representative.
>
> # lockstat -kgIW sleep 5 | less
> Profiling interrupt: 1968 events in 5.072 seconds (388 events/sec)
>
> Count genr cuml rcnt     nsec Hottest CPU+PIL Caller
> -------------------------------------------------------------------------------
>  1560  79% ---- 0.00     2520 cpu[3]          thread_start
>  1018  52% ---- 0.00     2069 cpu[2]+11       idle
>   851  43% ---- 0.00     2093 cpu[2]+11       disp_getwork
>   591  30% ---- 0.00     3647 cpu[1]          metaslab_alloc
>   591  30% ---- 0.00     3647 cpu[1]          metaslab_alloc_dva
>   591  30% ---- 0.00     3647 cpu[1]          metaslab_group_alloc
>   404  21% ---- 0.00     3488 cpu[1]          space_map_load
>   404  21% ---- 0.00     3488 cpu[1]          metaslab_activate

Nice, this is definitely pointing the finger more definitively. Next time could you try:

dtrace -n 'profile-97{@[stack(20)] = count()}' -c 'sleep 5'

(just send the last 10 or so stack traces)

In the mean time I'll talk with our SPA experts and see if I can figure out how to fix this...

--matt
> Nice, this is definitely pointing the finger more definitively. Next
> time could you try:
>
> dtrace -n 'profile-97{@[stack(20)] = count()}' -c 'sleep 5'
>
> (just send the last 10 or so stack traces)
>
> In the mean time I'll talk with our SPA experts and see if I can figure
> out how to fix this...

By any chance is the pool fairly close to full? The fuller it gets, the harder it becomes to find long stretches of free space.

Jeff
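(A quick way to answer that, not specific to this setup: the CAP column of 'zpool list' shows how full the pool is, and the default 'zfs list' output shows per-dataset USED/AVAIL.)

# zpool list
# zfs list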
Robert Milkowski
2006-Oct-30 08:54 UTC
[zfs-discuss] Re: Re: Snapshots impact on performance
Hello Jeff,

Monday, October 30, 2006, 2:03:52 AM, you wrote:

>> Nice, this is definitely pointing the finger more definitively. Next
>> time could you try:
>>
>> dtrace -n 'profile-97{@[stack(20)] = count()}' -c 'sleep 5'
>>
>> (just send the last 10 or so stack traces)
>>
>> In the mean time I'll talk with our SPA experts and see if I can figure
>> out how to fix this...

JB> By any chance is the pool fairly close to full? The fuller it gets,
JB> the harder it becomes to find long stretches of free space.

Nope - there is at least 600 GB free in the pool at all times. It's not the quota either - if I raise the quota for the file system (+100 GB, for example) it doesn't help. If I remove the oldest snapshot it always helps immediately. Sometimes I have to remove the two oldest snapshots.

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
Chris Gerhard
2006-Dec-06 17:23 UTC
[zfs-discuss] Re: Re: Re: Snapshots impact on performance
One of our file servers internally to Sun that reproduces this running nv53 here is the dtrace output: unix`mutex_vector_enter+0x120 zfs`metaslab_group_alloc+0x1a0 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_dva_allocate+0x54 zfs`zio_write_compress+0x248 zfs`arc_write+0xec zfs`dbuf_sync+0x698 zfs`dnode_sync+0x2ec zfs`dmu_objset_sync_dnodes+0x60 zfs`dmu_objset_sync+0x78 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b4 zfs`txg_sync_thread+0x120 unix`thread_start+0x4 8 unix`mutex_vector_enter+0x120 zfs`metaslab_group_alloc+0x1a0 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_commit_writer+0x2ac zfs`zil_commit+0x68 zfs`zfs_fsync+0xa4 genunix`fop_fsync+0x14 nfssrv`rfs3_setattr+0x4ec nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 8 genunix`avl_walk+0x3c zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_lwb_commit+0x94 zfs`zil_commit_writer+0x1e4 zfs`zil_commit+0x68 zfs`zfs_putpage+0x1d8 genunix`fop_putpage+0x1c nfssrv`rfs3_commit+0x130 nfssrv`common_dispatch+0x4ec rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 8 genunix`avl_walk+0x40 zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_commit_writer+0x2ac zfs`zil_commit+0x68 zfs`zfs_putpage+0x1d8 genunix`fop_putpage+0x1c nfssrv`rfs3_commit+0x130 nfssrv`common_dispatch+0x4ec rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 8 genunix`avl_walk+0x4c zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_dva_allocate+0x54 zfs`zio_write_compress+0x248 zfs`arc_write+0xec zfs`dbuf_sync+0x698 zfs`dnode_sync+0x2ec zfs`dmu_objset_sync_dnodes+0x60 zfs`dmu_objset_sync+0x50 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b4 zfs`txg_sync_thread+0x120 unix`thread_start+0x4 8 zfs`fletcher_2_native+0x2c zfs`arc_cksum_verify+0x64 zfs`arc_buf_thaw+0x38 zfs`dbuf_dirty+0x10c zfs`dmu_write_uio+0xc4 zfs`zfs_write+0x3ac genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 8 zfs`fletcher_2_native+0x2c zfs`arc_cksum_verify+0x64 zfs`arc_buf_destroy+0x1c zfs`arc_evict+0x1f0 zfs`arc_adjust+0xf8 zfs`arc_kmem_reclaim+0x100 zfs`arc_kmem_reap_now+0x20 zfs`arc_reclaim_thread+0xdc unix`thread_start+0x4 8 zfs`fletcher_2_native+0x2c zfs`arc_cksum_compute+0x6c zfs`dbuf_rele+0x40 zfs`dmu_buf_rele_array+0x34 zfs`dmu_write_uio+0x13c zfs`zfs_write+0x3ac genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 8 unix`disp_getwork+0x7c unix`idle+0xd0 unix`thread_start+0x4 9 unix`mutex_vector_enter+0x134 zfs`metaslab_free_dva+0xb0 zfs`metaslab_free+0x78 zfs`zio_dva_free+0x14 zfs`zio_wait+0xc zfs`arc_free+0x13c zfs`dsl_dataset_block_kill+0x168 zfs`dbuf_do_kill+0x14 genunix`taskq_thread+0x19c unix`thread_start+0x4 9 genunix`avl_walk+0x3c zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c 
zfs`metaslab_alloc+0x3c zfs`zio_dva_allocate+0x54 zfs`zio_write_compress+0x248 zfs`arc_write+0xec zfs`dbuf_sync+0x698 zfs`dnode_sync+0x2ec zfs`dmu_objset_sync_dnodes+0x60 zfs`dmu_objset_sync+0x50 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b4 zfs`txg_sync_thread+0x120 unix`thread_start+0x4 9 10 unix`mutex_vector_enter+0x120 zfs`metaslab_group_alloc+0x1a0 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_lwb_commit+0x94 zfs`zil_commit_writer+0x1e4 zfs`zil_commit+0x68 zfs`zfs_putpage+0x1d8 genunix`fop_putpage+0x1c nfssrv`rfs3_commit+0x130 nfssrv`common_dispatch+0x4ec rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 10 unix`mutex_vector_enter+0x134 zfs`metaslab_group_alloc+0x1a0 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_commit_writer+0x2ac zfs`zil_commit+0x68 zfs`zfs_fsync+0xa4 genunix`fop_fsync+0x14 nfssrv`rfs3_setattr+0x4ec nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 10 unix`disp_getwork+0x8c unix`idle+0xd0 unix`thread_start+0x4 11 unix`mutex_vector_enter+0x134 zfs`metaslab_group_alloc+0x1a0 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_dva_allocate+0x54 zfs`zio_write_compress+0x248 zfs`arc_write+0xec zfs`dbuf_sync+0x698 zfs`dnode_sync+0x2ec zfs`dmu_objset_sync_dnodes+0x60 zfs`dmu_objset_sync+0x78 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b4 zfs`txg_sync_thread+0x120 unix`thread_start+0x4 12 genunix`avl_walk+0x50 zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_dva_allocate+0x54 zfs`zio_write_compress+0x248 zfs`arc_write+0xec zfs`dbuf_sync+0x698 zfs`dnode_sync+0x2ec zfs`dmu_objset_sync_dnodes+0x60 zfs`dmu_objset_sync+0x50 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b4 zfs`txg_sync_thread+0x120 unix`thread_start+0x4 12 zfs`metaslab_ff_alloc+0x7c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_lwb_commit+0x94 zfs`zil_commit_writer+0x1e4 zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 12 genunix`avl_walk+0x40 zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_dva_allocate+0x54 zfs`zio_write_compress+0x248 zfs`arc_write+0xec zfs`dbuf_sync+0x698 zfs`dnode_sync+0x2ec zfs`dmu_objset_sync_dnodes+0x60 zfs`dmu_objset_sync+0x50 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b4 zfs`txg_sync_thread+0x120 unix`thread_start+0x4 13 unix`mutex_vector_enter+0x120 zfs`metaslab_free_dva+0xb0 zfs`metaslab_free+0x78 zfs`zio_dva_free+0x14 zfs`zio_wait+0xc zfs`arc_free+0x13c zfs`dsl_dataset_block_kill+0x168 zfs`dbuf_do_kill+0x14 genunix`taskq_thread+0x19c unix`thread_start+0x4 18 genunix`avl_walk+0x50 zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_commit_writer+0x2ac zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 
rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 18 zfs`metaslab_ff_alloc+0x74 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_commit_writer+0x2ac zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 19 genunix`avl_walk+0x4c zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_commit_writer+0x2ac zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 20 genunix`avl_walk+0x3c zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_commit_writer+0x2ac zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 22 unix`mutex_vector_enter+0x134 zfs`metaslab_group_alloc+0x1a0 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_lwb_commit+0x94 zfs`zil_commit_writer+0x1e4 zfs`zil_commit+0x68 zfs`zfs_putpage+0x1d8 genunix`fop_putpage+0x1c nfssrv`rfs3_commit+0x130 nfssrv`common_dispatch+0x4ec rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 26 genunix`avl_walk+0x40 zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_commit_writer+0x2ac zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 27 zfs`metaslab_ff_alloc+0x74 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_lwb_commit+0x94 zfs`zil_commit_writer+0x1e4 zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 38 genunix`avl_walk+0x50 zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_lwb_commit+0x94 zfs`zil_commit_writer+0x1e4 zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 45 genunix`avl_walk+0x40 zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_lwb_commit+0x94 zfs`zil_commit_writer+0x1e4 zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 
nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 46 genunix`avl_walk+0x3c zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_lwb_commit+0x94 zfs`zil_commit_writer+0x1e4 zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 57 genunix`avl_walk+0x4c zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_lwb_commit+0x94 zfs`zil_commit_writer+0x1e4 zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 57 This message posted from opensolaris.org
Robert Milkowski
2006-Dec-12 17:28 UTC
[zfs-discuss] Re: Re: Re: Snapshots impact on performance
Hello Chris,

Wednesday, December 6, 2006, 6:23:48 PM, you wrote:

CG> One of our file servers internally to Sun that reproduces this
CG> running nv53 here is the dtrace output:

Any conclusions yet?

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
Chris Gerhard
2006-Dec-13 11:25 UTC
[zfs-discuss] Re: Re: Re: Snapshots impact on performance
Robert Milkowski wrote:
> Hello Chris,
>
> Wednesday, December 6, 2006, 6:23:48 PM, you wrote:
>
> CG> One of our file servers internally to Sun that reproduces this
> CG> running nv53 here is the dtrace output:
>
> Any conclusions yet?

Not yet. We had to delete all the "automatic" snapshots we had so that the system became usable again. An engineer is looking into reproducing this on a lab system. I hope they will turn up on this thread with a progress report.

--
Chris Gerhard.  __o  __o  __o           Principal Engineer
  _`\<,`\<,`\<,_                        Sun Microsystems Limited
 (*)/---/---/ (*)                       Phone: +44 (0) 1252 426033 (ext 26033)
                                        http://blogs.sun.com/chrisg
Robert Milkowski
2006-Dec-22 12:33 UTC
[zfs-discuss] Re: Re: Re: Snapshots impact on performance
Hi.

The problem is getting worse... now even if I destroy all snapshots in the pool I get the performance problem, even with zil_disable set to 1.

Despite having the limit for maximum NFS threads set to 2048, I get only about 1700. If I want to kill the nfsd server it takes 1-4 minutes until all threads are finished (and most of the time over 1000 threads are still there). While nfsd is stopping I can see I/Os (99% reads) to the pool.

Also, simple zfs commands (like changing the quota for a file system, etc.) take far too long to complete (like 3-5 minutes to set a quota for a file system). There's still over 700 GB of free storage in the pool; setting the quota to none doesn't help.

Using iostat it looks like the disks aren't saturated. It looks almost as if lots of nfsd threads are spinning (probably in zfs) - I don't recall nfsd taking so long to stop on UFS file systems.

bash-3.00# iostat -xnzC 1
[... first output]
                    extended device statistics
    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1782.3    0.0 12700.7    0.0  0.0 11.1    0.0    6.2   0 293 c5
  584.3    0.0  4102.2    0.0  0.0  3.6    0.0    6.2   0  96 c5t600C0FF000000000098FD57F9DA83C00d0
  572.1    0.0  4144.8    0.0  0.0  3.7    0.0    6.5   0  99 c5t600C0FF000000000098FD55DBA4EA000d0
  625.8    0.0  4453.7    0.0  0.0  3.7    0.0    5.9   0  98 c5t600C0FF000000000098FD516E4403200d0
                    extended device statistics
    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 2381.9    0.0 17242.1    0.0  0.0 16.7    0.0    7.0   0 291 c5
  800.0    0.0  5827.9    0.0  0.0  5.5    0.0    6.9   0  96 c5t600C0FF000000000098FD57F9DA83C00d0
  690.0    0.0  4991.9    0.0  0.0  4.9    0.0    7.1   0  97 c5t600C0FF000000000098FD55DBA4EA000d0
  892.0    0.0  6422.3    0.0  0.0  6.3    0.0    7.1   0  98 c5t600C0FF000000000098FD516E4403200d0
                    extended device statistics
    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1890.2    0.0 12826.0    0.0  0.0 11.4    0.0    6.0   0 292 c5
  611.1    0.0  3998.5    0.0  0.0  3.4    0.0    5.6   0  96 c5t600C0FF000000000098FD57F9DA83C00d0
  604.1    0.0  4259.5    0.0  0.0  3.9    0.0    6.4   0  98 c5t600C0FF000000000098FD55DBA4EA000d0
  675.1    0.0  4568.0    0.0  0.0  4.1    0.0    6.0   0  98 c5t600C0FF000000000098FD516E4403200d0
^C

Now the dtrace output you requested.

dtrace -n 'profile-97{@[stack(20)] = count()}' -c 'sleep 5'
[...]
              unix`i_ddi_splhigh
              unix`disp_getwork+0x38
              unix`idle+0xd4
              unix`thread_start+0x4
               35

              unix`i_ddi_splx
              unix`disp_getwork+0x160
              unix`idle+0xd4
              unix`thread_start+0x4
               38

              unix`disp_getwork+0x70
              unix`idle+0xd4
              unix`thread_start+0x4
               38

              unix`disp_getwork+0x158
              unix`idle+0xd4
              unix`thread_start+0x4
               38

              unix`i_ddi_splhigh+0x14
              unix`disp_getwork+0x38
              unix`idle+0xd4
              unix`thread_start+0x4
               39

              unix`disp_getwork+0x8c
              unix`idle+0xd4
              unix`thread_start+0x4
               39

              unix`idle+0x12c
              unix`thread_start+0x4
               44

              unix`disp_getwork+0x1a8
              unix`idle+0xd4
              unix`thread_start+0x4
               47

              unix`disp_getwork+0x7c
              unix`idle+0xd4
              unix`thread_start+0x4
               49

              unix`disp_getwork+0x90
              unix`idle+0xd4
              unix`thread_start+0x4
               56

              unix`disp_getwork+0x10c
              unix`idle+0xd4
              unix`thread_start+0x4
               62

              unix`i_ddi_splx+0x1c
              unix`disp_getwork+0x160
              unix`idle+0xd4
              unix`thread_start+0x4
              117

bash-3.00#
Robert Milkowski
2006-Dec-22 12:42 UTC
[zfs-discuss] Re: Re: Re: Snapshots impact on performance
bash-3.00# lockstat -kgIW sleep 100 | head -30

Profiling interrupt: 38844 events in 100.098 seconds (388 events/sec)

Count genr cuml rcnt     nsec Hottest CPU+PIL Caller
-------------------------------------------------------------------------------
32081  83% ---- 0.00     2432 cpu[1]          thread_start
24347  63% ---- 0.00     2116 cpu[1]+11       idle
20724  53% ---- 0.00     2125 cpu[1]+11       disp_getwork
 5525  14% ---- 0.00     3571 cpu0            syscall_trap32
 5470  14% ---- 0.00     3570 cpu0            nfssys
 5470  14% ---- 0.00     3570 cpu0            svc_run
 5225  13% ---- 0.00     3595 cpu0            svc_getreq
 5142  13% ---- 0.00     3600 cpu0            common_dispatch
 4488  12% ---- 0.00     3040 cpu[2]          taskq_thread
 3658   9% ---- 0.00     2946 cpu[2]          zio_vdev_io_assess
 2943   8% ---- 0.00     2874 cpu[2]          zio_read_decompress
 2846   7% ---- 0.00     2120 cpu[1]          splx
 2654   7% ---- 0.00     4105 cpu[1]          putnext
 2541   7% ---- 0.00     2785 cpu[2]          lzjb_decompress
 2382   6% ---- 0.00     2138 cpu[2]          i_ddi_splhigh
 2056   5% ---- 0.00     3481 cpu0            fop_lookup
 2031   5% ---- 0.00     3485 cpu0            zfs_lookup
 1979   5% ---- 0.00     3482 cpu0            zfs_dirlook
 1935   5% ---- 0.00     3492 cpu0            zfs_dirent_lock
 1910   5% ---- 0.00     3835 cpu[1]          txg_sync_thread
 1910   5% ---- 0.00     3835 cpu[1]          spa_sync
 1896   5% ---- 0.00     3842 cpu[1]          dsl_pool_sync
 1896   5% ---- 0.00     3842 cpu[1]          dmu_objset_sync
 1895   5% ---- 0.00     3843 cpu[1]          dmu_objset_sync_dnodes
 1894   5% ---- 0.00     3843 cpu[1]          dnode_sync
bash-3.00#
Robert Milkowski
2007-Jan-05 22:59 UTC
[zfs-discuss] Re: Re: Re: Snapshots impact on performance
Hello Chris,

Wednesday, December 13, 2006, 12:25:40 PM, you wrote:

CG> Robert Milkowski wrote:
>> Hello Chris,
>>
>> Wednesday, December 6, 2006, 6:23:48 PM, you wrote:
>>
>> CG> One of our file servers internally to Sun that reproduces this
>> CG> running nv53 here is the dtrace output:
>>
>> Any conclusions yet?

CG> Not yet. We had to delete all the "automatic" snapshots we had so that
CG> the system became usable again. An engineer is looking into reproducing
CG> this on a lab system. I hope they will turn up on this thread with a
CG> progress report.

The problem here is that even without snapshots we're seeing the same problem. It's as if snapshots only make it worse.

Hope it'll be solved quickly (I've also had it escalated at P1 for some time...).

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
Gino
2007-Jun-27 08:59 UTC
[zfs-discuss] Re: Re[2]: Re: Re: Re: Snapshots impact on performance
Same problem here (snv_60).
Robert, did you find any solutions?

gino
Victor Latushkin
2007-Jun-27 12:19 UTC
[zfs-discuss] Re: Re[2]: Re: Re: Re: Snapshots impact on performance
Gino wrote:
> Same problem here (snv_60).
> Robert, did you find any solutions?

A couple of weeks ago I put together an implementation of space maps which completely eliminates loops and recursion from the space map alloc operation, and allows different allocation strategies to be implemented quite easily (of which I put together 3 more). It looks like it works for me on a Thumper and on my notebook with ZFS root, though I have almost no time to test it further these days due to year end. I haven't done a SPARC build yet and I do not have a test case to test against.

Also, it comes at a price - I have to spend some more time (logarithmic, though) during all other operations on space maps, and it is not optimized yet.

victor
Hello Victor,

Wednesday, June 27, 2007, 1:19:44 PM, you wrote:

VL> Gino wrote:
>> Same problem here (snv_60).
>> Robert, did you find any solutions?

VL> A couple of weeks ago I put together an implementation of space maps which
VL> completely eliminates loops and recursion from the space map alloc
VL> operation, and allows different allocation strategies to be implemented
VL> quite easily (of which I put together 3 more). It looks like it works for me
VL> on a Thumper and on my notebook with ZFS root, though I have almost no
VL> time to test it further these days due to year end. I haven't done a SPARC
VL> build yet and I do not have a test case to test against.

VL> Also, it comes at a price - I have to spend some more time (logarithmic,
VL> though) during all other operations on space maps, and it is not optimized yet.

Lukasz (cc) - maybe you can test it and even help on tuning it?

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
Łukasz K
2007-Jul-27 12:41 UTC
[zfs-discuss] Odp: Re[2]: Re: Re[2]: Re: Re: Re: Snapshots impact on performance
On 26-07-2007 at 13:31 Robert Milkowski wrote:
> Hello Victor,
>
> Wednesday, June 27, 2007, 1:19:44 PM, you wrote:
>
> VL> Gino wrote:
> >> Same problem here (snv_60).
> >> Robert, did you find any solutions?
>
> VL> A couple of weeks ago I put together an implementation of space maps which
> VL> completely eliminates loops and recursion from the space map alloc
> VL> operation, and allows different allocation strategies to be implemented
> VL> quite easily (of which I put together 3 more). It looks like it works for me
> VL> on a Thumper and on my notebook with ZFS root, though I have almost no
> VL> time to test it further these days due to year end. I haven't done a SPARC
> VL> build yet and I do not have a test case to test against.
>
> VL> Also, it comes at a price - I have to spend some more time (logarithmic,
> VL> though) during all other operations on space maps, and it is not optimized yet.
>
> Lukasz (cc) - maybe you can test it and even help on tuning it?
>

Yes, I can test it. I'm building an environment to compile OpenSolaris and test ZFS. I will be ready next week.

Victor, can you tell me where to look for your changes? How do I change the allocation strategy? I can see that by changing space_map_ops_t I can declare different callback functions.

Lukas
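(For anyone following along in the source: the allocator callbacks and the first-fit implementation seen in the stack traces earlier in this thread live in the ZFS kernel sources; paths below assume the usual OpenSolaris tree layout, and a simple grep finds the ops table and its users.)

$ grep -n "space_map_ops_t" usr/src/uts/common/fs/zfs/sys/space_map.h \
      usr/src/uts/common/fs/zfs/metaslab.c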
> Same problem here (snv_60).
> Robert, did you find any solutions?
>
> gino

Check this: http://www.opensolaris.org/jive/thread.jspa?threadID=34423&tstart=0

Check the spa_sync function time (remember to change POOL_NAME!):

dtrace -q -n fbt::spa_sync:entry'/(char *)(((spa_t*)arg0)->spa_name) == "POOL_NAME"/{ self->t = timestamp; }' -n fbt::spa_sync:return'/self->t/{ @m = max((timestamp - self->t)/1000000); self->t = 0; }' -n tick-10m'{ printa("%@u", @m); exit(0); }'

If you have long spa_sync times, check whether you have problems finding new blocks in the space map with this script:

#!/usr/sbin/dtrace -s

fbt::space_map_alloc:entry
{
        self->s = arg1;
}

fbt::space_map_alloc:return
/arg1 != -1/
{
        self->s = 0;
}

fbt::space_map_alloc:return
/self->s && (arg1 == -1)/
{
        @s = quantize(self->s);
        self->s = 0;
}

tick-10s
{
        printa(@s);
}

Then change 'zfs set recordsize=XX POOL_NAME'. Make sure that all filesystems inherit the recordsize:

# zfs get -r recordsize POOL_NAME

The other thing is space map size. Check the map size:

echo '::spa' | mdb -k | grep 'f[0-9]*-[0-9]*' \
  | while read pool_ptr state pool_name
do
  echo "${pool_ptr}::walk metaslab|::print -d struct metaslab ms_smo.smo_objsize" \
    | mdb -k \
    | nawk '{sub("^0t","",$3);sum+=$3}END{print sum}'
done

The value you get is the space map size on disk. In memory the space map will take about 4 * size_on_disk.

Sometimes during snapshot removal the kernel has to load all space maps into memory. For example, if the space map on disk takes 1 GB, then the kernel, in the spa_sync function, will:
- read 1 GB from disk (or from cache)
- allocate 4 GB for AVL trees
- do all operations on the AVL trees
- save the maps

It is good to have enough free memory for these operations.

You can reduce the space map by copying all filesystems to another pool. I recommend zfs send.

regards
Lukas
Victor Latushkin
2007-Aug-24 12:57 UTC
[zfs-discuss] Odp: Re[2]: Re: Re[2]: Re: Re: Re: Snapshots impact on performance
Hi Lukasz and all,

I just returned from a month-long sick leave, so I need some time to sort through a pile of emails, do a SPARC build and some testing, and then I'll be able to provide you with my changes in some form. Hope this will happen next week.

Cheers,
Victor

Łukasz K wrote:
> On 26-07-2007 at 13:31 Robert Milkowski wrote:
>> Hello Victor,
>>
>> Wednesday, June 27, 2007, 1:19:44 PM, you wrote:
>>
>> VL> Gino wrote:
>>>> Same problem here (snv_60).
>>>> Robert, did you find any solutions?
>>
>> VL> A couple of weeks ago I put together an implementation of space maps which
>> VL> completely eliminates loops and recursion from the space map alloc
>> VL> operation, and allows different allocation strategies to be implemented
>> VL> quite easily (of which I put together 3 more). It looks like it works for me
>> VL> on a Thumper and on my notebook with ZFS root, though I have almost no
>> VL> time to test it further these days due to year end. I haven't done a SPARC
>> VL> build yet and I do not have a test case to test against.
>>
>> VL> Also, it comes at a price - I have to spend some more time (logarithmic,
>> VL> though) during all other operations on space maps, and it is not optimized yet.
>>
>> Lukasz (cc) - maybe you can test it and even help on tuning it?
>>
> Yes, I can test it. I'm building an environment to compile OpenSolaris
> and test ZFS. I will be ready next week.
>
> Victor, can you tell me where to look for your changes?
> How do I change the allocation strategy?
> I can see that by changing space_map_ops_t
> I can declare different callback functions.
>
> Lukas
Łukasz K
2007-Aug-24 13:49 UTC
[zfs-discuss] Odp: Re[2]: Re: Re[2]: Re: Re: Re: Snapshots impact on performance
Great, I have the latest snv_70 sources and I'm working on it.

Lukas

On 24-08-2007 at 14:57 Victor Latushkin wrote:
> Hi Lukasz and all,
>
> I just returned from a month-long sick leave, so I need some time to sort
> through a pile of emails, do a SPARC build and some testing, and then I'll
> be able to provide you with my changes in some form. Hope this will happen
> next week.
>
> Cheers,
> Victor
>
> Łukasz K wrote:
> > On 26-07-2007 at 13:31 Robert Milkowski wrote:
> >> Hello Victor,
> >>
> >> Wednesday, June 27, 2007, 1:19:44 PM, you wrote:
> >>
> >> VL> Gino wrote:
> >>>> Same problem here (snv_60).
> >>>> Robert, did you find any solutions?
> >>
> >> VL> A couple of weeks ago I put together an implementation of space maps which
> >> VL> completely eliminates loops and recursion from the space map alloc
> >> VL> operation, and allows different allocation strategies to be implemented
> >> VL> quite easily (of which I put together 3 more). It looks like it works for me
> >> VL> on a Thumper and on my notebook with ZFS root, though I have almost no
> >> VL> time to test it further these days due to year end. I haven't done a SPARC
> >> VL> build yet and I do not have a test case to test against.
> >>
> >> VL> Also, it comes at a price - I have to spend some more time (logarithmic,
> >> VL> though) during all other operations on space maps, and it is not optimized yet.
> >>
> >> Lukasz (cc) - maybe you can test it and even help on tuning it?
> >>
> > Yes, I can test it. I'm building an environment to compile OpenSolaris
> > and test ZFS. I will be ready next week.
> >
> > Victor, can you tell me where to look for your changes?
> > How do I change the allocation strategy?
> > I can see that by changing space_map_ops_t
> > I can declare different callback functions.
> >
> > Lukas
Hello Erblichs,

Two years later... :)

It turned out to be a problem with metaslab fragmentation. Lowering the recordsize to 8k helps slightly, but it is still not fixed.

-- 
Best regards,
Robert Milkowski               mailto:milek at task.gda.pl
                                      http://milek.blogspot.com
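(For completeness, the recordsize workaround mentioned above is applied along these lines; pool/fs is a placeholder, and note that a smaller recordsize only affects blocks written after the change - existing data keeps its old block size until rewritten.)

# zfs set recordsize=8k pool/fs
# zfs get -r recordsize pool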