Hello zfs-discuss,

S10U2 + patches. ZFS pool of about 2 TB in size. A snapshot is created each day and 7 copies are kept. There's a quota set on the file system; however, there's always at least 50 GB of free space in the file system (and much more in the pool). The ZFS file system is exported over NFS.

Snapshots consume about 280 GB of space.

We have noticed some performance problems on NFS clients of this file system, even during times with smaller load. Raising the quota didn't help. However, removing the oldest snapshot immediately solved the performance problems.

I do not have more details - sorry.

Is it expected for snapshots to have a very noticeable performance impact on the file system being snapshotted?

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
Robert Milkowski wrote:
> Hello zfs-discuss,
>
> S10U2 + patches. ZFS pool of about 2 TB in size. A snapshot is created
> each day and 7 copies are kept. There's a quota set on the file system;
> however, there's always at least 50 GB of free space in the file system
> (and much more in the pool). The ZFS file system is exported over NFS.
>
> Snapshots consume about 280 GB of space.
>
> We have noticed some performance problems on NFS clients of this file
> system, even during times with smaller load. Raising the quota didn't
> help. However, removing the oldest snapshot immediately solved the
> performance problems.
>
> I do not have more details - sorry.
>
> Is it expected for snapshots to have a very noticeable performance
> impact on the file system being snapshotted?

No, this behavior is unexpected. The only way that snapshots should have a performance impact on access to the filesystem is if you are running low on space in the pool or quota (which it sounds like you are not).

Can you describe what the performance problems were? What was the workload like? What problem did you identify? How did it improve when you 'zfs destroy'-ed the oldest snapshot? Are you sure that the oldest snapshot wasn't pushing you close to your quota?

--matt
Matthew Ahrens wrote On 10/16/06 09:07,:
> Robert Milkowski wrote:
>
>> Hello zfs-discuss,
>>
>> S10U2 + patches. ZFS pool of about 2 TB in size. A snapshot is created
>> each day and 7 copies are kept. There's a quota set on the file system;
>> however, there's always at least 50 GB of free space in the file system
>> (and much more in the pool). The ZFS file system is exported over NFS.
>>
>> Snapshots consume about 280 GB of space.
>>
>> We have noticed some performance problems on NFS clients of this file
>> system, even during times with smaller load. Raising the quota didn't
>> help. However, removing the oldest snapshot immediately solved the
>> performance problems.
>>
>> I do not have more details - sorry.
>>
>> Is it expected for snapshots to have a very noticeable performance
>> impact on the file system being snapshotted?
>
> No, this behavior is unexpected. The only way that snapshots should
> have a performance impact on access to the filesystem is if you are
> running low on space in the pool or quota (which it sounds like you are
> not).
>
> Can you describe what the performance problems were? What was the
> workload like? What problem did you identify? How did it improve when
> you 'zfs destroy'-ed the oldest snapshot? Are you sure that the oldest
> snapshot wasn't pushing you close to your quota?
>
> --matt

I could well believe there would be a hiccup on the rest of the pool when the snapshot is taken. Each snapshot calls txg_wait_synced four times: a few related to the ZIL and one from dsl_sync_task_group_wait.
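For reference, one minimal way to confirm that call count on a live system is sketched below; it simply counts txg_wait_synced() entries (with a short stack) while a snapshot is taken. The dataset name is a placeholder, not one from this setup:

# dtrace -n 'fbt::txg_wait_synced:entry { @[stack(5)] = count(); }' \
    -c 'zfs snapshot pool/fs@probe'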
Hello Matthew,

Monday, October 16, 2006, 5:07:50 PM, you wrote:

MA> Robert Milkowski wrote:
>> Hello zfs-discuss,
>>
>> S10U2 + patches. ZFS pool of about 2 TB in size. A snapshot is created
>> each day and 7 copies are kept. There's a quota set on the file system;
>> however, there's always at least 50 GB of free space in the file system
>> (and much more in the pool). The ZFS file system is exported over NFS.
>>
>> Snapshots consume about 280 GB of space.
>>
>> We have noticed some performance problems on NFS clients of this file
>> system, even during times with smaller load. Raising the quota didn't
>> help. However, removing the oldest snapshot immediately solved the
>> performance problems.
>>
>> I do not have more details - sorry.
>>
>> Is it expected for snapshots to have a very noticeable performance
>> impact on the file system being snapshotted?

MA> No, this behavior is unexpected. The only way that snapshots should
MA> have a performance impact on access to the filesystem is if you are
MA> running low on space in the pool or quota (which it sounds like you are
MA> not).

MA> Can you describe what the performance problems were? What was the
MA> workload like? What problem did you identify? How did it improve when
MA> you 'zfs destroy'-ed the oldest snapshot? Are you sure that the oldest
MA> snapshot wasn't pushing you close to your quota?

I wasn't on-site every time it happened... but it looks like all NFS clients of that file system have problems accessing data for 2-5 s, and iostat on the clients shows 100% busy for the disk. There was well over 100 GB of free space in the file system, and raising the quota to add another 100 GB didn't help. After removing snapshots all problems disappeared almost immediately. No new snapshots were created during this time.

It happened a second time - and again, quite a lot of free space, and destroying one snapshot helped. If it happens again I'll try to get some more specific data - however it depends on when it happens, as during peak hours I'll probably just destroy a snapshot to get it working.

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
Robert Milkowski wrote:
> If it happens again I'll try to get some more specific data - however
> it depends on when it happens, as during peak hours I'll probably just
> destroy a snapshot to get it working.

If it happens again, it would be great if you could gather some data before you destroy the snapshot so we have some chance of figuring out what's going on here.

'iostat -xnpc 1' will tell us if it's CPU or disk bound.
'lockstat -kgiw sleep 10' will tell us what functions are using CPU.
'echo "::walk thread|::findstack" | mdb -k' will tell us where threads are stuck.

Actually, if you could gather each of those both while you're observing the problem, and then after the problem goes away, that would be helpful.

--matt
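One possible way to capture all three in a single pass is sketched below (run it once while the problem is visible and once after it clears; the output file names are arbitrary):

# iostat -xnpc 1 30 > /var/tmp/iostat.out &
# lockstat -kgiw sleep 10 > /var/tmp/lockstat.out
# echo "::walk thread|::findstack" | mdb -k > /var/tmp/stacks.out
# wait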
Hi.

On NFS clients which mount the file system f3-1/d611 I can see 3-5 s periods of 100% busy (iostat) and almost no I/Os issued to the NFS server; on the NFS server at the same time disk activity is almost 0 (both iostat and zpool iostat). However, CPU activity in SYS increases during those periods.

There's quite a lot of free space in f3-1/d611 as you can see:

# zfs list | egrep "611|NAM"
NAME                        USED  AVAIL  REFER  MOUNTPOINT
f3-1/d611                   381G   219G   151G  //f3-1/d611
f3-1/d611@auto-2006-10-18  48.2G      -   156G  -
f3-1/d611@auto-2006-10-19  19.4G      -   145G  -
f3-1/d611@auto-2006-10-20  21.9G      -   155G  -
f3-1/d611@auto-2006-10-21  22.9G      -   151G  -
f3-1/d611@auto-2006-10-22  18.5G      -   145G  -
f3-1/d611@auto-2006-10-23  17.6G      -   148G  -
f3-1/d611@auto-2006-10-24  14.5G      -   154G  -

And much more in the pool itself:

# zfs list | egrep "f3-1 |NAM"
NAME   USED  AVAIL  REFER  MOUNTPOINT
f3-1   563G   642G  9.18G  //f3-1

# zfs get all f3-1/d611
NAME       PROPERTY       VALUE                  SOURCE
f3-1/d611  type           filesystem             -
f3-1/d611  creation       Tue Aug 29  2:07 2006  -
f3-1/d611  used           381G                   -
f3-1/d611  available      219G                   -
f3-1/d611  referenced     151G                   -
f3-1/d611  compressratio  1.00x                  -
f3-1/d611  mounted        yes                    -
f3-1/d611  quota          600G                   local
f3-1/d611  reservation    none                   default
f3-1/d611  recordsize     128K                   default
f3-1/d611  mountpoint     //f3-1/d611            default
f3-1/d611  sharenfs       off                    default
f3-1/d611  checksum       on                     default
f3-1/d611  compression    off                    default
f3-1/d611  atime          off                    inherited from f3-1
f3-1/d611  devices        on                     default
f3-1/d611  exec           on                     default
f3-1/d611  setuid         on                     default
f3-1/d611  readonly       off                    default
f3-1/d611  zoned          off                    default
f3-1/d611  snapdir        hidden                 default
f3-1/d611  aclmode        groupmask              default
f3-1/d611  aclinherit     secure                 default

Access to this file system is only over NFS. Right now nfsd has fewer than 400 active threads.

Now, output from several of the commands you requested, taken during the times when disk activity is low, CPU usage is higher and the NFS clients see the problem.

# zpool iostat 1
[...]
f3-1   562G   662G     20      0   156K      0
f3-1   562G   662G     29      6   842K   887K
f3-1   562G   662G     12      0   238K      0
f3-1   562G   662G    927     25  11.0M  2.38M
f3-1   562G   662G  1.96K    221  19.8M  5.13M
f3-1   562G   662G    110    112  2.09M  1.72M

# iostat -xnzp 1
[...]
extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 44.0 0.0 819.4 0.0 0.0 0.3 0.0 7.2 0 22 c5t600C0FF000000000098FD57F9DA83C00d0 44.0 0.0 819.4 0.0 0.0 0.3 0.0 7.2 0 22 c5t600C0FF000000000098FD57F9DA83C00d0s0 26.0 0.0 179.5 0.0 0.0 0.2 0.0 6.8 0 14 c5t600C0FF000000000098FD55DBA4EA000d0 26.0 0.0 179.5 0.0 0.0 0.2 0.0 6.8 0 14 c5t600C0FF000000000098FD55DBA4EA000d0s0 26.0 0.0 224.0 0.0 0.0 0.2 0.0 7.2 0 15 c5t600C0FF000000000098FD516E4403200d0 26.0 0.0 224.0 0.0 0.0 0.2 0.0 7.2 0 15 c5t600C0FF000000000098FD516E4403200d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 12.0 0.0 216.5 0.0 0.0 0.1 0.0 6.1 0 6 c5t600C0FF000000000098FD57F9DA83C00d0 12.0 0.0 216.5 0.0 0.0 0.1 0.0 6.1 0 6 c5t600C0FF000000000098FD57F9DA83C00d0s0 10.0 0.0 70.5 0.0 0.0 0.1 0.0 6.4 0 6 c5t600C0FF000000000098FD55DBA4EA000d0 10.0 0.0 70.5 0.0 0.0 0.1 0.0 6.4 0 6 c5t600C0FF000000000098FD55DBA4EA000d0s0 8.0 0.0 63.0 0.0 0.0 0.0 0.0 6.0 0 5 c5t600C0FF000000000098FD516E4403200d0 8.0 0.0 63.0 0.0 0.0 0.0 0.0 6.0 0 5 c5t600C0FF000000000098FD516E4403200d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1.0 0.0 0.5 0.0 0.0 0.0 12.3 0 1 c1t0d0 0.0 1.0 0.0 0.5 0.0 0.0 0.0 12.3 0 1 c1t0d0s6 0.0 1.0 0.0 0.5 0.0 0.0 0.0 9.5 0 1 c1t1d0 0.0 1.0 0.0 0.5 0.0 0.0 0.0 9.5 0 1 c1t1d0s6 7.0 0.0 55.8 0.0 0.0 0.1 0.0 7.8 0 5 c5t600C0FF000000000098FD57F9DA83C00d0 7.0 0.0 55.8 0.0 0.0 0.1 0.0 7.8 0 5 c5t600C0FF000000000098FD57F9DA83C00d0s0 6.0 7.0 66.8 893.2 0.0 0.1 0.0 6.4 0 5 c5t600C0FF000000000098FD55DBA4EA000d0 6.0 7.0 66.8 893.2 0.0 0.1 0.0 6.4 0 5 c5t600C0FF000000000098FD55DBA4EA000d0s0 9.0 0.0 665.4 0.0 0.0 0.1 0.0 6.5 0 4 c5t600C0FF000000000098FD516E4403200d0 9.0 0.0 665.4 0.0 0.0 0.1 0.0 6.5 0 4 c5t600C0FF000000000098FD516E4403200d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 5.0 0.0 36.6 0.0 0.0 0.0 0.0 7.3 0 4 c5t600C0FF000000000098FD57F9DA83C00d0 5.0 0.0 36.6 0.0 0.0 0.0 0.0 7.3 0 4 c5t600C0FF000000000098FD57F9DA83C00d0s0 11.0 0.0 84.3 0.0 0.0 0.1 0.0 6.1 0 6 c5t600C0FF000000000098FD55DBA4EA000d0 11.0 0.0 84.3 0.0 0.0 0.1 0.0 6.1 0 6 c5t600C0FF000000000098FD55DBA4EA000d0s0 5.0 0.0 181.1 0.0 0.0 0.0 0.0 7.1 0 4 c5t600C0FF000000000098FD516E4403200d0 5.0 0.0 181.1 0.0 0.0 0.0 0.0 7.1 0 4 c5t600C0FF000000000098FD516E4403200d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 229.9 10.0 2795.9 639.9 0.0 1.9 0.0 7.8 0 49 c5t600C0FF000000000098FD57F9DA83C00d0 229.9 10.0 2795.9 639.9 0.0 1.9 0.0 7.8 0 49 c5t600C0FF000000000098FD57F9DA83C00d0s0 243.9 13.0 3790.1 1151.7 0.0 2.0 0.0 7.6 0 54 c5t600C0FF000000000098FD55DBA4EA000d0 243.9 13.0 3790.1 1151.7 0.0 2.0 0.0 7.6 0 54 c5t600C0FF000000000098FD55DBA4EA000d0s0 198.0 0.0 2355.5 0.0 0.0 1.5 0.0 7.7 0 51 c5t600C0FF000000000098FD516E4403200d0 198.0 0.0 2355.5 0.0 0.0 1.5 0.0 7.7 0 51 c5t600C0FF000000000098FD516E4403200d0s0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 712.1 71.0 6038.5 1469.6 0.0 8.4 0.0 10.8 0 98 c5t600C0FF000000000098FD57F9DA83C00d0 712.1 71.0 6038.5 1469.6 0.0 8.4 0.0 10.8 0 98 c5t600C0FF000000000098FD57F9DA83C00d0s0 759.1 164.0 7574.6 2283.2 0.0 10.2 0.0 11.1 0 100 c5t600C0FF000000000098FD55DBA4EA000d0 759.1 164.0 7574.6 2283.2 0.0 10.2 0.0 11.1 0 100 c5t600C0FF000000000098FD55DBA4EA000d0s0 716.1 40.0 8526.2 1956.2 0.0 8.7 0.0 11.5 0 98 c5t600C0FF000000000098FD516E4403200d0 716.1 40.0 8526.2 1956.2 0.0 8.7 0.0 11.5 0 98 c5t600C0FF000000000098FD516E4403200d0s0 # 
mpstat 1
[...]
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    7    72   61  563    6  103   13    0   253    1  20   0  79
  1    0   0    3    57   31  359    0   70   11    1    14    0  82   0  18
  2    0   0    8   369  356  755    3  144   18    0   140    0   3   0  97
  3    0   0    9   321  218  334    1   18    8    0    32    0  57   0  43
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    1    21   16  192    0   13    3    0    40    0  55   0  45
  1    0   0    1    54   27  324    2   50    8    0   195    0  15   0  85
  2    0   0    2   165  161  148    0   39    2    0    49    0   1   0  99
  3    0   0    7   260  158  144    1   36    4    0    31    0  60   0  40
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0   15    36   29  113    1   35    6    0    18    0  87   0  13
  1    0   0    5    60   33  497    1   59   11    0   108    0  48   0  52
  2    0   0    5   491  483  477   25   72    9    0   214    1  12   0  87
  3    0   0    7   288  183  328    2   30    4    0    45    0  46   0  54
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    7    15   12  270    0   33    6    0   175    0  12   0  88
  1    0   0    3    47   25  178    0   42    4    0    47    1   5   0  94
  2    0   0    1    64   60   70    0   22    3    0    75    0   0   0 100
  3    0   0    2   267  162    2    2    1    1    0     0    0 100   0   0
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0  112   493  428 3068    7  638  279   91    24    0  50   0  50
  1    0   0   55    87   30 1541    4  402  167   71   154    0  70   0  30
  2    0   0  145  4074 3932 4564    5  998  490  497   106    0  51   0  49
  3    0   0  139   764  587 4220   12  798  334  461    17    0  66   0  34
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0  281  1189  902 9050   22 1867 1173 1573   139    0  62   0  38
  1    0   0  136   357   27 7267   11 1645  985 1128    42    0  76   0  24
  2    0   0  168  7332 7079 7255   21 1608  952 1138    80    0  63   0  37
  3    0   0  331  1199 1058 1589   19  272  253   58    34    0  84   0  16

A different time period, when disk activity is small:

# lockstat -kgIw sleep 10 | less
Profiling interrupt: 3904 events in 10.055 seconds (388 events/sec)

Count genr cuml rcnt     nsec CPU+PIL        Hottest Caller
-------------------------------------------------------------------------------
 8925 229% ---- 0.00     5067 cpu[1]         nfssys
 7386 189% ---- 0.00     2610 cpu[3]         thread_start
 6566 168% ---- 0.00     4404 cpu[2]         thread_start
 6453 165% ---- 0.00     3325 cpu[0]         thread_start
 1259  32% ---- 0.00     1756 cpu[0]+11      thread_start
 1161  30% ---- 0.00     2247 cpu[2]+11      thread_start
 1159  30% ---- 0.00     1472 cpu[3]+11      thread_start
  939  24% ---- 0.00     2459 cpu[1]+11      thread_start
  190   5% ---- 0.00     2880 cpu[0]+6       pci_intr_wrapper
  181   5% ---- 0.00     2914 cpu[3]+6       ql_response_pkt
  103   3% ---- 0.00     4591 cpu[2]+6       intr_thread
    7   0% ---- 0.00     1118 cpu[3]+12      rmc_comm_serdev_receive
    5   0% ---- 0.00     1022 cpu[3]+13      cyclic_softint
-------------------------------------------------------------------------------

Now, during another period when disk activity is low and the NFS clients see the problem:

# dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt=0;}' -n tick-5s'{printa(@);exit(0);}'
[...]
  page_sub_common           840400
  mutex_vector_enter        880400
  callout_schedule_1       1106200
  kmem_cache_alloc         1182400
  lock_set_spl_spin        1240000
  thread_lock              1269600
  syscall_mstate           1359200
  set_freemem              1575800
  page_trylock             1587500
  hwblkpagecopy            1885100
  set_anoninfo             1904400
  page_pptonum             2044900
  xc_serv                  2147600
  lzjb_decompress          3209600
  resume                   4783500
  hwblkclr                 7101100
  page_next_scan_large    23648600
  generic_idle_cpu        69234100
  disp_getwork           139261800
  avl_walk               669424900

Let's see how it looks when there's no problem:

# dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt=0;}' -n tick-5s'{printa(@);exit(0);}'
[...]
  avl_insert               8591400
  sleepq_insert            8660800
  fletcher_4_native        9789400
  buf_hash                10454600
  copy_pattern            12395200
  hat_getpfnum            14970100
  ddi_get_instance        16348900
  ddi_get_soft_state      19676500
  pci_pbm_dma_sync        21993300
  space_map_seg_compare   25628100
  lock_set_spin           31066500
  ip_cksum                31528900
  kmem_cache_alloc        34203500
  kmem_free               35418800
  page_next_scan_large    35833700
  fletcher_2_native       48868100
  kmem_cache_free         54456400
  avl_destroy_nodes       63271000
  lzjb_decompress         72163800
  lock_set_spl_spin       72655100
  lzjb_compress           91146600
  mutex_vector_enter     113553300
  resume                2836183200

No avl_walk :)

Heh, I'm sure I have seen avl_walk consuming a lot of CPU before... So wait for another such period and (6-7 seconds):

# dtrace -n fbt::avl_walk:entry'{@[stack()]=count();}'
[...]
              zfs`space_map_add+0x160
              zfs`space_map_load+0x218
              zfs`metaslab_activate+0x3c
              zfs`metaslab_group_alloc+0x1c0
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_dva_allocate+0x4c
              zfs`zio_write_compress+0x1ec
              zfs`arc_write+0xe4
              zfs`dbuf_sync+0x6d8
              zfs`dnode_sync+0x35c
              zfs`dmu_objset_sync_dnodes+0x6c
              zfs`dmu_objset_sync+0x54
              zfs`dsl_dataset_sync+0xc
              zfs`dsl_pool_sync+0x104
              zfs`spa_sync+0xe4
              zfs`txg_sync_thread+0x134
              unix`thread_start+0x4
              1259

              zfs`space_map_add+0x174
              zfs`space_map_load+0x218
              zfs`metaslab_activate+0x3c
              zfs`metaslab_group_alloc+0x1c0
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_alloc_blk+0x70
              zfs`zil_lwb_write_start+0x8c
              zfs`zil_lwb_commit+0x1ac
              zfs`zil_commit+0x1b0
              zfs`zfs_fsync+0xa8
              genunix`fop_fsync+0x14
              nfssrv`rfs3_create+0x700
              nfssrv`common_dispatch+0x444
              rpcmod`svc_getreq+0x154
              rpcmod`svc_run+0x198
              nfs`nfssys+0x1c4
              unix`syscall_trap32+0xcc
              183895

              zfs`space_map_add+0x160
              zfs`space_map_load+0x218
              zfs`metaslab_activate+0x3c
              zfs`metaslab_group_alloc+0x1c0
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_alloc_blk+0x70
              zfs`zil_lwb_write_start+0x8c
              zfs`zil_lwb_commit+0x1ac
              zfs`zil_commit+0x1b0
              zfs`zfs_fsync+0xa8
              genunix`fop_fsync+0x14
              nfssrv`rfs3_create+0x700
              nfssrv`common_dispatch+0x444
              rpcmod`svc_getreq+0x154
              rpcmod`svc_run+0x198
              nfs`nfssys+0x1c4
              unix`syscall_trap32+0xcc
              196122

              zfs`metaslab_ff_alloc+0x9c
              zfs`space_map_alloc+0x10
              zfs`metaslab_group_alloc+0x1e4
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_alloc_blk+0x70
              zfs`zil_lwb_write_start+0x8c
              zfs`zil_commit+0x21c
              zfs`zfs_fsync+0xa8
              genunix`fop_fsync+0x14
              nfssrv`rfs3_remove+0x218
              nfssrv`common_dispatch+0x444
              rpcmod`svc_getreq+0x154
              rpcmod`svc_run+0x198
              nfs`nfssys+0x1c4
              unix`syscall_trap32+0xcc
              854743

              zfs`metaslab_ff_alloc+0x9c
              zfs`space_map_alloc+0x10
              zfs`metaslab_group_alloc+0x1e4
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_dva_allocate+0x4c
              zfs`zio_write_compress+0x1ec
              zfs`arc_write+0xe4
              zfs`dbuf_sync+0x6d8
              zfs`dnode_sync+0x35c
              zfs`dmu_objset_sync_dnodes+0x6c
              zfs`dmu_objset_sync+0x54
              zfs`dsl_dataset_sync+0xc
              zfs`dsl_pool_sync+0x104
              zfs`spa_sync+0xe4
              zfs`txg_sync_thread+0x134
              unix`thread_start+0x4
              1042372

              zfs`metaslab_ff_alloc+0x9c
              zfs`space_map_alloc+0x10
              zfs`metaslab_group_alloc+0x1e4
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_alloc_blk+0x70
              zfs`zil_lwb_write_start+0x8c
              zfs`zil_lwb_commit+0x1ac
              zfs`zil_commit+0x1b0
              zfs`zfs_fsync+0xa8
              genunix`fop_fsync+0x14
              nfssrv`rfs3_create+0x700
              nfssrv`common_dispatch+0x444
              rpcmod`svc_getreq+0x154
              rpcmod`svc_run+0x198
              nfs`nfssys+0x1c4
              unix`syscall_trap32+0xcc
              1415957

And another "bad" time period:

# dtrace -n fbt::avl_walk:entry'{self->vt=vtimestamp;}' -n fbt::avl_walk:return'/self->vt/{@[stack()]=sum(vtimestamp-self->vt);self->vt=0;}' -n tick-6s'{printa(@);exit(0);}'
[...]
              zfs`vdev_queue_io_to_issue+0xc4
              zfs`vdev_queue_io_done+0x4c
              zfs`vdev_disk_io_done+0x4
              genunix`taskq_thread+0x1a4
              unix`thread_start+0x4
              11700

              zfs`vdev_queue_io_to_issue+0x58
              zfs`vdev_queue_io_done+0x4c
              zfs`vdev_disk_io_done+0x4
              genunix`taskq_thread+0x1a4
              unix`thread_start+0x4
              20800

              zfs`metaslab_ff_alloc+0x9c
              zfs`space_map_alloc+0x10
              zfs`metaslab_group_alloc+0x1e4
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_dva_allocate+0x4c
              zfs`zio_write_compress+0x1ec
              zfs`arc_write+0xe4
              zfs`dbuf_sync+0x6d8
              zfs`dnode_sync+0x35c
              zfs`dmu_objset_sync_dnodes+0x6c
              zfs`dmu_objset_sync+0x54
              zfs`dsl_dataset_sync+0xc
              zfs`dsl_pool_sync+0x104
              zfs`spa_sync+0xe4
              zfs`txg_sync_thread+0x134
              unix`thread_start+0x4
              382903900

              zfs`metaslab_ff_alloc+0x9c
              zfs`space_map_alloc+0x10
              zfs`metaslab_group_alloc+0x1e4
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_alloc_blk+0x70
              zfs`zil_lwb_write_start+0x8c
              zfs`zil_lwb_commit+0x1ac
              zfs`zil_commit+0x1b0
              zfs`zfs_write+0x6c4
              genunix`fop_write+0x20
              nfssrv`rfs3_write+0x368
              nfssrv`common_dispatch+0x444
              rpcmod`svc_getreq+0x154
              rpcmod`svc_run+0x198
              nfs`nfssys+0x1c4
              unix`syscall_trap32+0xcc
              475080700

              zfs`metaslab_ff_alloc+0x9c
              zfs`space_map_alloc+0x10
              zfs`metaslab_group_alloc+0x1e4
              zfs`metaslab_alloc_dva+0x114
              zfs`metaslab_alloc+0x2c
              zfs`zio_alloc_blk+0x70
              zfs`zil_lwb_write_start+0x8c
              zfs`zil_lwb_commit+0x1ac
              zfs`zil_commit+0x1b0
              zfs`zfs_fsync+0xa8
              genunix`fop_fsync+0x14
              nfssrv`rfs3_create+0x700
              nfssrv`common_dispatch+0x444
              rpcmod`svc_getreq+0x154
              rpcmod`svc_run+0x198
              nfs`nfssys+0x1c4
              unix`syscall_trap32+0xcc
              481455800

So let's destroy the oldest snapshot:

# zfs destroy f3-1/d611@auto-2006-10-18
[it took about 4 minutes!]

# mdb -k
Loading modules: [ unix krtld genunix dtrace specfs ufs sd md ip sctp usba fcp fctl qlc ssd lofs logindmux ptm cpc fcip random crypto zfs nfs ]
> ::ps!grep zfs
R  20905  20464  20905  20458      0 0x4a004000 00000301ec367c20 zfs
> 00000301ec367c20::walk thread|::findstack -v
stack pointer for thread 301d1a02fe0: 2a103a28b71
[ 000002a103a28b71 cv_wait+0x38() ]
  000002a103a28c21 txg_wait_synced+0x54(60007327290, 1a7bda, 1a7bd9, 600073272d0, 600073272d2, 60007327288)
  000002a103a28cd1 dsl_dir_sync_task+0x124(60003867d00, 0, 60003f6800a, 0, 0, 1a7bda)
  000002a103a28d81 dsl_dataset_destroy+0x98(60003f68000, 5, 3, 60003f68009, 600073271c0, 0)
  000002a103a28f61 dmu_objset_destroy+0x3c(60003f68000, 0, 0, 2, 0, 54)
  000002a103a29021 zfsdev_ioctl+0x160(70370c00, 54, ffbfed38, 1c, 70, e68)
  000002a103a290d1 fop_ioctl+0x20(301ee6220c0, 5a1c, ffbfed38, 100003, 60003dcebf8, 12019f0)
  000002a103a29191 ioctl+0x184(4, 3005b7c2ba8, ffbfed38, ff38db68, 40350, 5a1c)
  000002a103a292e1 syscall_trap32+0xcc(4, 5a1c, ffbfed38, ff38db68, 40350, 0)

After the snapshot was destroyed the problem is completely gone. And right now there is no avl_walk function at the top:

# dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt=0;}' -n tick-5s'{printa(@);exit(0);}'
[...]
  cv_broadcast                11204800
  vdev_cache_offset_compare   11977900
  avl_walk                    12568500
  hat_getpfnum                13012900
  zio_next_stage              13649700
  page_next_scan_large        15850300
  ddi_get_instance            16842800
  kmem_free                   20050300
  pci_pbm_dma_sync            22854300
  ddi_get_soft_state          25803500
  space_map_seg_compare       32883300
  kmem_cache_free             34151000
  ip_cksum                    34840900
  kmem_cache_alloc            41557900
  resume                      47279100
  fletcher_2_native           55060000
  lzjb_compress               94374200
  lzjb_decompress             97102700
  mutex_vector_enter         115330500

Let's try a small experiment - see if lowering the quota for the fs makes things bad again:

# zfs list | egrep "NAME|f3-1/d611"
NAME                        USED  AVAIL  REFER  MOUNTPOINT
f3-1/d611                   334G   266G   151G  //f3-1/d611
f3-1/d611@auto-2006-10-19  25.7G      -   145G  -
f3-1/d611@auto-2006-10-20  21.9G      -   155G  -
f3-1/d611@auto-2006-10-21  22.9G      -   151G  -
f3-1/d611@auto-2006-10-22  18.5G      -   145G  -
f3-1/d611@auto-2006-10-23  17.6G      -   148G  -
f3-1/d611@auto-2006-10-24  14.7G      -   154G  -

# zfs get quota f3-1/d611
NAME       PROPERTY  VALUE  SOURCE
f3-1/d611  quota     600G   local

# zfs set quota=400g f3-1/d611

# zfs list | egrep "NAME|f3-1/d611"
NAME                        USED  AVAIL  REFER  MOUNTPOINT
f3-1/d611                   334G  66.1G   151G  //f3-1/d611
f3-1/d611@auto-2006-10-19  25.7G      -   145G  -
f3-1/d611@auto-2006-10-20  21.9G      -   155G  -
f3-1/d611@auto-2006-10-21  22.9G      -   151G  -
f3-1/d611@auto-2006-10-22  18.5G      -   145G  -
f3-1/d611@auto-2006-10-23  17.6G      -   148G  -
f3-1/d611@auto-2006-10-24  14.7G      -   154G  -
Forgot to mention - after the quota was lowered there is still no problem; everything works OK.
Robert Milkowski wrote:
> Hi.
>
> On NFS clients which mount the file system f3-1/d611 I can see 3-5 s
> periods of 100% busy (iostat) and almost no I/Os issued to the NFS
> server; on the NFS server at the same time disk activity is almost 0
> (both iostat and zpool iostat). However, CPU activity in SYS increases
> during those periods.

> A different time period, when disk activity is small:
>
> # lockstat -kgIw sleep 10 | less

Did you happen to get 'lockstat -kgIW' output while the problem was occurring? (note the capital W) I'm not sure how to interpret the -w output... (and sorry I gave you the wrong flags before).

> Now, during another period when disk activity is low and the NFS clients see the problem:
>
> # dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt=0;}' -n tick-5s'{printa(@);exit(0);}'
> [...]
>   page_next_scan_large    23648600
>   generic_idle_cpu        69234100
>   disp_getwork           139261800
>   avl_walk               669424900

Hmm, that's a possibility, but the method you're using to gather this information (tracing *every* function entry and exit) is a bit heavy-handed, and it may be distorting the results.

> Heh, I'm sure I have seen avl_walk consuming a lot of CPU before...
>
> So wait for another such period and (6-7 seconds):
>
> # dtrace -n fbt::avl_walk:entry'{@[stack()]=count();}'
> [...]

>               zfs`metaslab_ff_alloc+0x9c
>               zfs`space_map_alloc+0x10
>               zfs`metaslab_group_alloc+0x1e4
>               zfs`metaslab_alloc_dva+0x114
>               zfs`metaslab_alloc+0x2c
>               zfs`zio_alloc_blk+0x70
>               zfs`zil_lwb_write_start+0x8c
>               zfs`zil_lwb_commit+0x1ac
>               zfs`zil_commit+0x1b0
>               zfs`zfs_fsync+0xa8
>               genunix`fop_fsync+0x14
>               nfssrv`rfs3_create+0x700
>               nfssrv`common_dispatch+0x444
>               rpcmod`svc_getreq+0x154
>               rpcmod`svc_run+0x198
>               nfs`nfssys+0x1c4
>               unix`syscall_trap32+0xcc
>               1415957

Hmm, assuming that avl_walk() is actually consuming most of our CPU (which the lockstat -kgIW will confirm), this seems to indicate that we're taking a long time trying to find free chunks of space. This may happen if you have lots of small fragments of free space, but no chunks large enough to hold the block we're trying to allocate. We try to avoid this situation by trying to allocate from the metaslabs with the most free space, but it's possible that there's an error in this algorithm.

> So let's destroy the oldest snapshot:
>
> # zfs destroy f3-1/d611@auto-2006-10-18
> [it took about 4 minutes!]

> After the snapshot was destroyed the problem is completely gone.

FYI, destroying the snapshot probably helped simply by (a) returning some big chunks of space to the pool and/or (b) perturbing the system enough so that we try different metaslabs which aren't so fragmented.

--matt
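As a lighter-weight way to test that hypothesis (a sketch only - it instruments just the first-fit allocator instead of every function, so the probe effect should be small; long tails in the distribution would point at fragmented space maps):

# dtrace -n 'fbt::metaslab_ff_alloc:entry { self->t = timestamp; }' \
         -n 'fbt::metaslab_ff_alloc:return /self->t/ { @["ns per alloc"] = quantize(timestamp - self->t); self->t = 0; }' \
         -n 'tick-30s { exit(0); }'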
Hello Matthew,

Wednesday, October 25, 2006, 7:08:56 AM, you wrote:

MA> Robert Milkowski wrote:
>> Hi.
>>
>> On NFS clients which mount the file system f3-1/d611 I can see 3-5 s
>> periods of 100% busy (iostat) and almost no I/Os issued to the NFS
>> server; on the NFS server at the same time disk activity is almost 0
>> (both iostat and zpool iostat). However, CPU activity in SYS increases
>> during those periods.

>> A different time period, when disk activity is small:
>>
>> # lockstat -kgIw sleep 10 | less

MA> Did you happen to get 'lockstat -kgIW' output while the problem was
MA> occurring? (note the capital W) I'm not sure how to interpret the -w
MA> output... (and sorry I gave you the wrong flags before).

Sorry, I just blindly copied it (changed i to I); next time I'll try that.

>> Now, during another period when disk activity is low and the NFS clients see the problem:
>>
>> # dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt=0;}' -n tick-5s'{printa(@);exit(0);}'
>> [...]
>>   page_next_scan_large    23648600
>>   generic_idle_cpu        69234100
>>   disp_getwork           139261800
>>   avl_walk               669424900

MA> Hmm, that's a possibility, but the method you're using to gather this
MA> information (tracing *every* function entry and exit) is a bit
MA> heavy-handed, and it may be distorting the results.

That could be the case - however, during normal periods avl_walk, measured the same way, is not at the top.

>> Heh, I'm sure I have seen avl_walk consuming a lot of CPU before...
>>
>> So wait for another such period and (6-7 seconds):
>>
>> # dtrace -n fbt::avl_walk:entry'{@[stack()]=count();}'
>> [...]

>>               zfs`metaslab_ff_alloc+0x9c
>>               zfs`space_map_alloc+0x10
>>               zfs`metaslab_group_alloc+0x1e4
>>               zfs`metaslab_alloc_dva+0x114
>>               zfs`metaslab_alloc+0x2c
>>               zfs`zio_alloc_blk+0x70
>>               zfs`zil_lwb_write_start+0x8c
>>               zfs`zil_lwb_commit+0x1ac
>>               zfs`zil_commit+0x1b0
>>               zfs`zfs_fsync+0xa8
>>               genunix`fop_fsync+0x14
>>               nfssrv`rfs3_create+0x700
>>               nfssrv`common_dispatch+0x444
>>               rpcmod`svc_getreq+0x154
>>               rpcmod`svc_run+0x198
>>               nfs`nfssys+0x1c4
>>               unix`syscall_trap32+0xcc
>>               1415957

MA> Hmm, assuming that avl_walk() is actually consuming most of our CPU
MA> (which the lockstat -kgIW will confirm), this seems to indicate that
MA> we're taking a long time trying to find free chunks of space. This may
MA> happen if you have lots of small fragments of free space, but no chunks
MA> large enough to hold the block we're trying to allocate. We try to
MA> avoid this situation by trying to allocate from the metaslabs with the
MA> most free space, but it's possible that there's an error in this algorithm.

>> So let's destroy the oldest snapshot:
>>
>> # zfs destroy f3-1/d611@auto-2006-10-18
>> [it took about 4 minutes!]

>> After the snapshot was destroyed the problem is completely gone.

MA> FYI, destroying the snapshot probably helped simply by (a) returning
MA> some big chunks of space to the pool and/or (b) perturbing the system
MA> enough so that we try different metaslabs which aren't so fragmented.

Well, that's scary - it means that when I fill up the pool a little bit more I could hit a big problem even without using snapshots, which is much worse, as there would be no easy and quick way to relieve the symptoms like just destroying a snapshot.

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
Robert Milkowski
2006-Oct-25 13:35 UTC
[zfs-discuss] Re: Re: Snapshots impact on performance
Hi.

Today the same problem again - however not as severe, so I'm not sure if the lockstat below is representative.

# lockstat -kgIW sleep 5 | less
Profiling interrupt: 1968 events in 5.072 seconds (388 events/sec)

Count genr cuml rcnt     nsec Hottest CPU+PIL Caller
-------------------------------------------------------------------------------
 1560  79% ---- 0.00     2520 cpu[3]          thread_start
 1018  52% ---- 0.00     2069 cpu[2]+11       idle
  851  43% ---- 0.00     2093 cpu[2]+11       disp_getwork
  591  30% ---- 0.00     3647 cpu[1]          metaslab_alloc
  591  30% ---- 0.00     3647 cpu[1]          metaslab_alloc_dva
  591  30% ---- 0.00     3647 cpu[1]          metaslab_group_alloc
  404  21% ---- 0.00     3488 cpu[1]          space_map_load
  404  21% ---- 0.00     3488 cpu[1]          metaslab_activate
  382  19% ---- 0.00     3564 cpu[1]          zio_dva_allocate
  382  19% ---- 0.00     3564 cpu[1]          zio_write_compress
  382  19% ---- 0.00     3564 cpu[1]          txg_sync_thread
  382  19% ---- 0.00     3564 cpu[1]          spa_sync
  382  19% ---- 0.00     3564 cpu[1]          dsl_pool_sync
  382  19% ---- 0.00     3564 cpu[1]          dsl_dataset_sync
  382  19% ---- 0.00     3564 cpu[1]          dnode_sync
  382  19% ---- 0.00     3564 cpu[1]          dmu_objset_sync
  382  19% ---- 0.00     3564 cpu[1]          dmu_objset_sync_dnodes
  382  19% ---- 0.00     3564 cpu[1]          dbuf_sync
  382  19% ---- 0.00     3564 cpu[1]          arc_write
  363  18% ---- 0.00     3510 cpu[1]          syscall_trap32
  361  18% ---- 0.00     3514 cpu[1]          nfssys
  361  18% ---- 0.00     3514 cpu[1]          svc_run
  350  18% ---- 0.00     3532 cpu[1]          svc_getreq
  349  18% ---- 0.00     3533 cpu[1]          common_dispatch
  251  13% ---- 0.00     3813 cpu[1]          avl_find
  238  12% ---- 0.00     3320 cpu[1]          space_map_remove
  211  11% ---- 0.00     3798 cpu[1]          zil_commit
  209  11% ---- 0.00     3800 cpu[1]          zio_alloc_blk
  209  11% ---- 0.00     3800 cpu[1]          zil_lwb_write_start
  198  10% ---- 0.00     4256 cpu[1]          metaslab_ff_alloc
  196  10% ---- 0.00     3867 cpu[1]          zfs_putpage
[...]
Matthew Ahrens
2006-Oct-25 16:58 UTC
[zfs-discuss] Re: Re: Snapshots impact on performance
Robert Milkowski wrote:
> Hi.
>
> Today the same problem again - however not as severe, so I'm not sure if the lockstat below is representative.
>
> # lockstat -kgIW sleep 5 | less
> Profiling interrupt: 1968 events in 5.072 seconds (388 events/sec)
>
> Count genr cuml rcnt     nsec Hottest CPU+PIL Caller
> -------------------------------------------------------------------------------
>  1560  79% ---- 0.00     2520 cpu[3]          thread_start
>  1018  52% ---- 0.00     2069 cpu[2]+11       idle
>   851  43% ---- 0.00     2093 cpu[2]+11       disp_getwork
>   591  30% ---- 0.00     3647 cpu[1]          metaslab_alloc
>   591  30% ---- 0.00     3647 cpu[1]          metaslab_alloc_dva
>   591  30% ---- 0.00     3647 cpu[1]          metaslab_group_alloc
>   404  21% ---- 0.00     3488 cpu[1]          space_map_load
>   404  21% ---- 0.00     3488 cpu[1]          metaslab_activate

Nice, this is definitely pointing the finger more definitively. Next time could you try:

dtrace -n 'profile-97{@[stack(20)] = count()}' -c 'sleep 5'

(just send the last 10 or so stack traces)

In the mean time I'll talk with our SPA experts and see if I can figure out how to fix this...

--matt
> Nice, this is definitely pointing the finger more definitively. Next
> time could you try:
>
> dtrace -n 'profile-97{@[stack(20)] = count()}' -c 'sleep 5'
>
> (just send the last 10 or so stack traces)
>
> In the mean time I'll talk with our SPA experts and see if I can figure
> out how to fix this...

By any chance is the pool fairly close to full? The fuller it gets, the harder it becomes to find long stretches of free space.

Jeff
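(A quick way to answer that, not specific to this setup: the CAP column of 'zpool list' shows how full the pool is, and the default 'zfs list' output shows per-dataset USED/AVAIL.)

# zpool list
# zfs list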
Robert Milkowski
2006-Oct-30 08:54 UTC
[zfs-discuss] Re: Re: Snapshots impact on performance
Hello Jeff,

Monday, October 30, 2006, 2:03:52 AM, you wrote:

>> Nice, this is definitely pointing the finger more definitively. Next
>> time could you try:
>>
>> dtrace -n 'profile-97{@[stack(20)] = count()}' -c 'sleep 5'
>>
>> (just send the last 10 or so stack traces)
>>
>> In the mean time I'll talk with our SPA experts and see if I can figure
>> out how to fix this...

JB> By any chance is the pool fairly close to full? The fuller it gets,
JB> the harder it becomes to find long stretches of free space.

Nope - there is at least 600 GB free in the pool at all times. It's not the quota either - if I raise the quota for the file system (+100 GB, for example) it doesn't help. If I remove the oldest snapshot it always helps immediately. Sometimes I have to remove the two oldest snapshots.

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
Chris Gerhard
2006-Dec-06 17:23 UTC
[zfs-discuss] Re: Re: Re: Snapshots impact on performance
One of our file servers internally to Sun that reproduces this running nv53 here is the dtrace output: unix`mutex_vector_enter+0x120 zfs`metaslab_group_alloc+0x1a0 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_dva_allocate+0x54 zfs`zio_write_compress+0x248 zfs`arc_write+0xec zfs`dbuf_sync+0x698 zfs`dnode_sync+0x2ec zfs`dmu_objset_sync_dnodes+0x60 zfs`dmu_objset_sync+0x78 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b4 zfs`txg_sync_thread+0x120 unix`thread_start+0x4 8 unix`mutex_vector_enter+0x120 zfs`metaslab_group_alloc+0x1a0 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_commit_writer+0x2ac zfs`zil_commit+0x68 zfs`zfs_fsync+0xa4 genunix`fop_fsync+0x14 nfssrv`rfs3_setattr+0x4ec nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 8 genunix`avl_walk+0x3c zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_lwb_commit+0x94 zfs`zil_commit_writer+0x1e4 zfs`zil_commit+0x68 zfs`zfs_putpage+0x1d8 genunix`fop_putpage+0x1c nfssrv`rfs3_commit+0x130 nfssrv`common_dispatch+0x4ec rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 8 genunix`avl_walk+0x40 zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_commit_writer+0x2ac zfs`zil_commit+0x68 zfs`zfs_putpage+0x1d8 genunix`fop_putpage+0x1c nfssrv`rfs3_commit+0x130 nfssrv`common_dispatch+0x4ec rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 8 genunix`avl_walk+0x4c zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_dva_allocate+0x54 zfs`zio_write_compress+0x248 zfs`arc_write+0xec zfs`dbuf_sync+0x698 zfs`dnode_sync+0x2ec zfs`dmu_objset_sync_dnodes+0x60 zfs`dmu_objset_sync+0x50 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b4 zfs`txg_sync_thread+0x120 unix`thread_start+0x4 8 zfs`fletcher_2_native+0x2c zfs`arc_cksum_verify+0x64 zfs`arc_buf_thaw+0x38 zfs`dbuf_dirty+0x10c zfs`dmu_write_uio+0xc4 zfs`zfs_write+0x3ac genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 8 zfs`fletcher_2_native+0x2c zfs`arc_cksum_verify+0x64 zfs`arc_buf_destroy+0x1c zfs`arc_evict+0x1f0 zfs`arc_adjust+0xf8 zfs`arc_kmem_reclaim+0x100 zfs`arc_kmem_reap_now+0x20 zfs`arc_reclaim_thread+0xdc unix`thread_start+0x4 8 zfs`fletcher_2_native+0x2c zfs`arc_cksum_compute+0x6c zfs`dbuf_rele+0x40 zfs`dmu_buf_rele_array+0x34 zfs`dmu_write_uio+0x13c zfs`zfs_write+0x3ac genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 8 unix`disp_getwork+0x7c unix`idle+0xd0 unix`thread_start+0x4 9 unix`mutex_vector_enter+0x134 zfs`metaslab_free_dva+0xb0 zfs`metaslab_free+0x78 zfs`zio_dva_free+0x14 zfs`zio_wait+0xc zfs`arc_free+0x13c zfs`dsl_dataset_block_kill+0x168 zfs`dbuf_do_kill+0x14 genunix`taskq_thread+0x19c unix`thread_start+0x4 9 genunix`avl_walk+0x3c zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c 
zfs`metaslab_alloc+0x3c zfs`zio_dva_allocate+0x54 zfs`zio_write_compress+0x248 zfs`arc_write+0xec zfs`dbuf_sync+0x698 zfs`dnode_sync+0x2ec zfs`dmu_objset_sync_dnodes+0x60 zfs`dmu_objset_sync+0x50 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b4 zfs`txg_sync_thread+0x120 unix`thread_start+0x4 9 10 unix`mutex_vector_enter+0x120 zfs`metaslab_group_alloc+0x1a0 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_lwb_commit+0x94 zfs`zil_commit_writer+0x1e4 zfs`zil_commit+0x68 zfs`zfs_putpage+0x1d8 genunix`fop_putpage+0x1c nfssrv`rfs3_commit+0x130 nfssrv`common_dispatch+0x4ec rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 10 unix`mutex_vector_enter+0x134 zfs`metaslab_group_alloc+0x1a0 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_commit_writer+0x2ac zfs`zil_commit+0x68 zfs`zfs_fsync+0xa4 genunix`fop_fsync+0x14 nfssrv`rfs3_setattr+0x4ec nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 10 unix`disp_getwork+0x8c unix`idle+0xd0 unix`thread_start+0x4 11 unix`mutex_vector_enter+0x134 zfs`metaslab_group_alloc+0x1a0 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_dva_allocate+0x54 zfs`zio_write_compress+0x248 zfs`arc_write+0xec zfs`dbuf_sync+0x698 zfs`dnode_sync+0x2ec zfs`dmu_objset_sync_dnodes+0x60 zfs`dmu_objset_sync+0x78 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b4 zfs`txg_sync_thread+0x120 unix`thread_start+0x4 12 genunix`avl_walk+0x50 zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_dva_allocate+0x54 zfs`zio_write_compress+0x248 zfs`arc_write+0xec zfs`dbuf_sync+0x698 zfs`dnode_sync+0x2ec zfs`dmu_objset_sync_dnodes+0x60 zfs`dmu_objset_sync+0x50 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b4 zfs`txg_sync_thread+0x120 unix`thread_start+0x4 12 zfs`metaslab_ff_alloc+0x7c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_lwb_commit+0x94 zfs`zil_commit_writer+0x1e4 zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 12 genunix`avl_walk+0x40 zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_dva_allocate+0x54 zfs`zio_write_compress+0x248 zfs`arc_write+0xec zfs`dbuf_sync+0x698 zfs`dnode_sync+0x2ec zfs`dmu_objset_sync_dnodes+0x60 zfs`dmu_objset_sync+0x50 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b4 zfs`txg_sync_thread+0x120 unix`thread_start+0x4 13 unix`mutex_vector_enter+0x120 zfs`metaslab_free_dva+0xb0 zfs`metaslab_free+0x78 zfs`zio_dva_free+0x14 zfs`zio_wait+0xc zfs`arc_free+0x13c zfs`dsl_dataset_block_kill+0x168 zfs`dbuf_do_kill+0x14 genunix`taskq_thread+0x19c unix`thread_start+0x4 18 genunix`avl_walk+0x50 zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_commit_writer+0x2ac zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 
rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 18 zfs`metaslab_ff_alloc+0x74 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_commit_writer+0x2ac zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 19 genunix`avl_walk+0x4c zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_commit_writer+0x2ac zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 20 genunix`avl_walk+0x3c zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_commit_writer+0x2ac zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 22 unix`mutex_vector_enter+0x134 zfs`metaslab_group_alloc+0x1a0 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_lwb_commit+0x94 zfs`zil_commit_writer+0x1e4 zfs`zil_commit+0x68 zfs`zfs_putpage+0x1d8 genunix`fop_putpage+0x1c nfssrv`rfs3_commit+0x130 nfssrv`common_dispatch+0x4ec rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 26 genunix`avl_walk+0x40 zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_commit_writer+0x2ac zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 27 zfs`metaslab_ff_alloc+0x74 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_lwb_commit+0x94 zfs`zil_commit_writer+0x1e4 zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 38 genunix`avl_walk+0x50 zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_lwb_commit+0x94 zfs`zil_commit_writer+0x1e4 zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 45 genunix`avl_walk+0x40 zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_lwb_commit+0x94 zfs`zil_commit_writer+0x1e4 zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 
nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 46 genunix`avl_walk+0x3c zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_lwb_commit+0x94 zfs`zil_commit_writer+0x1e4 zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 57 genunix`avl_walk+0x4c zfs`metaslab_ff_alloc+0x90 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x200 zfs`metaslab_alloc_dva+0x10c zfs`metaslab_alloc+0x3c zfs`zio_alloc_blk+0x34 zfs`zil_lwb_write_start+0xdc zfs`zil_lwb_commit+0x94 zfs`zil_commit_writer+0x1e4 zfs`zil_commit+0x68 zfs`zfs_write+0x648 genunix`fop_write+0x20 nfssrv`rfs3_write+0x3d8 nfssrv`common_dispatch+0x3c8 rpcmod`svc_getreq+0x20c rpcmod`svc_run+0x1ac nfs`nfssys+0x18c unix`syscall_trap32+0xcc 57 This message posted from opensolaris.org
Robert Milkowski
2006-Dec-12 17:28 UTC
[zfs-discuss] Re: Re: Re: Snapshots impact on performance
Hello Chris,

Wednesday, December 6, 2006, 6:23:48 PM, you wrote:

CG> One of our file servers internally to Sun that reproduces this
CG> running nv53 here is the dtrace output:

Any conclusions yet?

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
Chris Gerhard
2006-Dec-13 11:25 UTC
[zfs-discuss] Re: Re: Re: Snapshots impact on performance
Robert Milkowski wrote:
> Hello Chris,
>
> Wednesday, December 6, 2006, 6:23:48 PM, you wrote:
>
> CG> One of our file servers internally to Sun that reproduces this
> CG> running nv53 here is the dtrace output:
>
> Any conclusions yet?

Not yet. We had to delete all the "automatic" snapshots we had so that the system became usable again. An engineer is looking into reproducing this on a lab system. I hope they will turn up on this thread with a progress report.

--
Chris Gerhard.  __o  __o  __o           Principal Engineer
  _`\<,`\<,`\<,_                        Sun Microsystems Limited
 (*)/---/---/ (*)                       Phone: +44 (0) 1252 426033 (ext 26033)
                                        http://blogs.sun.com/chrisg
Robert Milkowski
2006-Dec-22 12:33 UTC
[zfs-discuss] Re: Re: Re: Snapshots impact on performance
Hi.

The problem is getting worse... now even if I destroy all snapshots in the pool I get the performance problem, even with zil_disable set to 1.

Despite having the limit for maximum NFS threads set to 2048, I get only about 1700. If I want to kill the nfsd server it takes 1-4 minutes until all threads are finished (and most of the time over 1000 threads are still there). While nfsd is stopping I can see I/Os (99% reads) to the pool.

Also, simple zfs commands (like changing the quota for a file system, etc.) take far too long to complete (like 3-5 minutes to set a quota for a file system). There's still over 700 GB of free storage in the pool; setting the quota to none doesn't help.

Using iostat it looks like the disks aren't saturated. It looks almost as if lots of nfsd threads are spinning (probably in zfs) - I don't recall nfsd taking so long to stop on UFS file systems.

bash-3.00# iostat -xnzC 1
[... first output]
                    extended device statistics
    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1782.3    0.0 12700.7    0.0  0.0 11.1    0.0    6.2   0 293 c5
  584.3    0.0  4102.2    0.0  0.0  3.6    0.0    6.2   0  96 c5t600C0FF000000000098FD57F9DA83C00d0
  572.1    0.0  4144.8    0.0  0.0  3.7    0.0    6.5   0  99 c5t600C0FF000000000098FD55DBA4EA000d0
  625.8    0.0  4453.7    0.0  0.0  3.7    0.0    5.9   0  98 c5t600C0FF000000000098FD516E4403200d0
                    extended device statistics
    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 2381.9    0.0 17242.1    0.0  0.0 16.7    0.0    7.0   0 291 c5
  800.0    0.0  5827.9    0.0  0.0  5.5    0.0    6.9   0  96 c5t600C0FF000000000098FD57F9DA83C00d0
  690.0    0.0  4991.9    0.0  0.0  4.9    0.0    7.1   0  97 c5t600C0FF000000000098FD55DBA4EA000d0
  892.0    0.0  6422.3    0.0  0.0  6.3    0.0    7.1   0  98 c5t600C0FF000000000098FD516E4403200d0
                    extended device statistics
    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1890.2    0.0 12826.0    0.0  0.0 11.4    0.0    6.0   0 292 c5
  611.1    0.0  3998.5    0.0  0.0  3.4    0.0    5.6   0  96 c5t600C0FF000000000098FD57F9DA83C00d0
  604.1    0.0  4259.5    0.0  0.0  3.9    0.0    6.4   0  98 c5t600C0FF000000000098FD55DBA4EA000d0
  675.1    0.0  4568.0    0.0  0.0  4.1    0.0    6.0   0  98 c5t600C0FF000000000098FD516E4403200d0
^C

Now the dtrace output you requested.

dtrace -n 'profile-97{@[stack(20)] = count()}' -c 'sleep 5'
[...]
              unix`i_ddi_splhigh
              unix`disp_getwork+0x38
              unix`idle+0xd4
              unix`thread_start+0x4
               35

              unix`i_ddi_splx
              unix`disp_getwork+0x160
              unix`idle+0xd4
              unix`thread_start+0x4
               38

              unix`disp_getwork+0x70
              unix`idle+0xd4
              unix`thread_start+0x4
               38

              unix`disp_getwork+0x158
              unix`idle+0xd4
              unix`thread_start+0x4
               38

              unix`i_ddi_splhigh+0x14
              unix`disp_getwork+0x38
              unix`idle+0xd4
              unix`thread_start+0x4
               39

              unix`disp_getwork+0x8c
              unix`idle+0xd4
              unix`thread_start+0x4
               39

              unix`idle+0x12c
              unix`thread_start+0x4
               44

              unix`disp_getwork+0x1a8
              unix`idle+0xd4
              unix`thread_start+0x4
               47

              unix`disp_getwork+0x7c
              unix`idle+0xd4
              unix`thread_start+0x4
               49

              unix`disp_getwork+0x90
              unix`idle+0xd4
              unix`thread_start+0x4
               56

              unix`disp_getwork+0x10c
              unix`idle+0xd4
              unix`thread_start+0x4
               62

              unix`i_ddi_splx+0x1c
              unix`disp_getwork+0x160
              unix`idle+0xd4
              unix`thread_start+0x4
              117

bash-3.00#
Robert Milkowski
2006-Dec-22 12:42 UTC
[zfs-discuss] Re: Re: Re: Snapshots impact on performance
bash-3.00# lockstat -kgIW sleep 100 | head -30

Profiling interrupt: 38844 events in 100.098 seconds (388 events/sec)

Count genr cuml rcnt     nsec Hottest CPU+PIL Caller
-------------------------------------------------------------------------------
32081  83% ---- 0.00     2432 cpu[1]          thread_start
24347  63% ---- 0.00     2116 cpu[1]+11       idle
20724  53% ---- 0.00     2125 cpu[1]+11       disp_getwork
 5525  14% ---- 0.00     3571 cpu0            syscall_trap32
 5470  14% ---- 0.00     3570 cpu0            nfssys
 5470  14% ---- 0.00     3570 cpu0            svc_run
 5225  13% ---- 0.00     3595 cpu0            svc_getreq
 5142  13% ---- 0.00     3600 cpu0            common_dispatch
 4488  12% ---- 0.00     3040 cpu[2]          taskq_thread
 3658   9% ---- 0.00     2946 cpu[2]          zio_vdev_io_assess
 2943   8% ---- 0.00     2874 cpu[2]          zio_read_decompress
 2846   7% ---- 0.00     2120 cpu[1]          splx
 2654   7% ---- 0.00     4105 cpu[1]          putnext
 2541   7% ---- 0.00     2785 cpu[2]          lzjb_decompress
 2382   6% ---- 0.00     2138 cpu[2]          i_ddi_splhigh
 2056   5% ---- 0.00     3481 cpu0            fop_lookup
 2031   5% ---- 0.00     3485 cpu0            zfs_lookup
 1979   5% ---- 0.00     3482 cpu0            zfs_dirlook
 1935   5% ---- 0.00     3492 cpu0            zfs_dirent_lock
 1910   5% ---- 0.00     3835 cpu[1]          txg_sync_thread
 1910   5% ---- 0.00     3835 cpu[1]          spa_sync
 1896   5% ---- 0.00     3842 cpu[1]          dsl_pool_sync
 1896   5% ---- 0.00     3842 cpu[1]          dmu_objset_sync
 1895   5% ---- 0.00     3843 cpu[1]          dmu_objset_sync_dnodes
 1894   5% ---- 0.00     3843 cpu[1]          dnode_sync
bash-3.00#
Robert Milkowski
2007-Jan-05 22:59 UTC
[zfs-discuss] Re: Re: Re: Snapshots impact on performance
Hello Chris,

Wednesday, December 13, 2006, 12:25:40 PM, you wrote:

CG> Robert Milkowski wrote:
>> Hello Chris,
>>
>> Wednesday, December 6, 2006, 6:23:48 PM, you wrote:
>>
>> CG> One of our file servers internally to Sun that reproduces this
>> CG> running nv53 here is the dtrace output:
>>
>> Any conclusions yet?

CG> Not yet. We had to delete all the "automatic" snapshots we had so that
CG> the system became usable again. An engineer is looking into reproducing
CG> this on a lab system. I hope they will turn up on this thread with a
CG> progress report.

The problem here is that even without snapshots we're seeing the same problem. It's as if snapshots only make it worse.

Hope it'll be solved quickly (I've also had it escalated at P1 for some time...).

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
Gino
2007-Jun-27 08:59 UTC
[zfs-discuss] Re: Re[2]: Re: Re: Re: Snapshots impact on performance
Same problem here (snv_60).
Robert, did you find any solutions?

gino
Victor Latushkin
2007-Jun-27 12:19 UTC
[zfs-discuss] Re: Re[2]: Re: Re: Re: Snapshots impact on performance
Gino wrote:
> Same problem here (snv_60).
> Robert, did you find any solutions?

A couple of weeks ago I put together an implementation of space maps which completely eliminates loops and recursion from the space map alloc operation, and allows different allocation strategies to be implemented quite easily (of which I put together 3 more). It looks like it works for me on a Thumper and on my notebook with ZFS root, though I have almost no time to test it further these days due to year end. I haven't done a SPARC build yet and I do not have a test case to test against.

Also, it comes at a price - I have to spend some more time (logarithmic, though) during all other operations on space maps, and it is not optimized yet.

victor
Hello Victor,

Wednesday, June 27, 2007, 1:19:44 PM, you wrote:

VL> Gino wrote:
>> Same problem here (snv_60).
>> Robert, did you find any solutions?

VL> A couple of weeks ago I put together an implementation of space maps which
VL> completely eliminates loops and recursion from the space map alloc
VL> operation, and allows different allocation strategies to be implemented
VL> quite easily (of which I put together 3 more). It looks like it works for me
VL> on a Thumper and on my notebook with ZFS root, though I have almost no
VL> time to test it further these days due to year end. I haven't done a SPARC
VL> build yet and I do not have a test case to test against.

VL> Also, it comes at a price - I have to spend some more time (logarithmic,
VL> though) during all other operations on space maps, and it is not optimized yet.

Lukasz (cc) - maybe you can test it and even help on tuning it?

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
Łukasz K
2007-Jul-27 12:41 UTC
[zfs-discuss] Odp: Re[2]: Re: Re[2]: Re: Re: Re: Snapshots impact on performance
On 26-07-2007 at 13:31 Robert Milkowski wrote:
> Hello Victor,
>
> Wednesday, June 27, 2007, 1:19:44 PM, you wrote:
>
> VL> Gino wrote:
> >> Same problem here (snv_60).
> >> Robert, did you find any solutions?
>
> VL> A couple of weeks ago I put together an implementation of space maps which
> VL> completely eliminates loops and recursion from the space map alloc
> VL> operation, and allows different allocation strategies to be implemented
> VL> quite easily (of which I put together 3 more). It looks like it works for me
> VL> on a Thumper and on my notebook with ZFS root, though I have almost no
> VL> time to test it further these days due to year end. I haven't done a SPARC
> VL> build yet and I do not have a test case to test against.
>
> VL> Also, it comes at a price - I have to spend some more time (logarithmic,
> VL> though) during all other operations on space maps, and it is not optimized yet.
>
> Lukasz (cc) - maybe you can test it and even help on tuning it?
>

Yes, I can test it. I'm building an environment to compile OpenSolaris and test ZFS. I will be ready next week.

Victor, can you tell me where to look for your changes? How do I change the allocation strategy? I can see that by changing space_map_ops_t I can declare different callback functions.

Lukas
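(For anyone following along in the source: the allocator callbacks and the first-fit implementation seen in the stack traces earlier in this thread live in the ZFS kernel sources; paths below assume the usual OpenSolaris tree layout, and a simple grep finds the ops table and its users.)

$ grep -n "space_map_ops_t" usr/src/uts/common/fs/zfs/sys/space_map.h \
      usr/src/uts/common/fs/zfs/metaslab.c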
> Same problem here (snv_60).
> Robert, did you find any solutions?
>
> gino

Check this: http://www.opensolaris.org/jive/thread.jspa?threadID=34423&tstart=0

Check the spa_sync function time (remember to change POOL_NAME!):

dtrace -q -n fbt::spa_sync:entry'/(char *)(((spa_t*)arg0)->spa_name) == "POOL_NAME"/{ self->t = timestamp; }' -n fbt::spa_sync:return'/self->t/{ @m = max((timestamp - self->t)/1000000); self->t = 0; }' -n tick-10m'{ printa("%@u", @m); exit(0); }'

If you have long spa_sync times, check whether you have problems finding new blocks in the space map with this script:

#!/usr/sbin/dtrace -s

fbt::space_map_alloc:entry
{
        self->s = arg1;
}

fbt::space_map_alloc:return
/arg1 != -1/
{
        self->s = 0;
}

fbt::space_map_alloc:return
/self->s && (arg1 == -1)/
{
        @s = quantize(self->s);
        self->s = 0;
}

tick-10s
{
        printa(@s);
}

Then change 'zfs set recordsize=XX POOL_NAME'. Make sure that all filesystems inherit the recordsize:

# zfs get -r recordsize POOL_NAME

The other thing is space map size. Check the map size:

echo '::spa' | mdb -k | grep 'f[0-9]*-[0-9]*' \
  | while read pool_ptr state pool_name
do
  echo "${pool_ptr}::walk metaslab|::print -d struct metaslab ms_smo.smo_objsize" \
    | mdb -k \
    | nawk '{sub("^0t","",$3);sum+=$3}END{print sum}'
done

The value you get is the space map size on disk. In memory the space map will take about 4 * size_on_disk.

Sometimes during snapshot removal the kernel has to load all space maps into memory. For example, if the space map on disk takes 1 GB, then the kernel, in the spa_sync function, will:
- read 1 GB from disk (or from cache)
- allocate 4 GB for AVL trees
- do all operations on the AVL trees
- save the maps

It is good to have enough free memory for these operations.

You can reduce the space map by copying all filesystems to another pool. I recommend zfs send.

regards
Lukas
Victor Latushkin
2007-Aug-24 12:57 UTC
[zfs-discuss] Odp: Re[2]: Re: Re[2]: Re: Re: Re: Snapshots impact on performance
Hi Lukasz and all,

I just returned from a month-long sick leave, so I need some time to sort through a pile of emails, do a SPARC build and some testing, and then I'll be able to provide you with my changes in some form. Hope this will happen next week.

Cheers,
Victor

Łukasz K wrote:
> On 26-07-2007 at 13:31 Robert Milkowski wrote:
>> Hello Victor,
>>
>> Wednesday, June 27, 2007, 1:19:44 PM, you wrote:
>>
>> VL> Gino wrote:
>>>> Same problem here (snv_60).
>>>> Robert, did you find any solutions?
>>
>> VL> A couple of weeks ago I put together an implementation of space maps which
>> VL> completely eliminates loops and recursion from the space map alloc
>> VL> operation, and allows different allocation strategies to be implemented
>> VL> quite easily (of which I put together 3 more). It looks like it works for me
>> VL> on a Thumper and on my notebook with ZFS root, though I have almost no
>> VL> time to test it further these days due to year end. I haven't done a SPARC
>> VL> build yet and I do not have a test case to test against.
>>
>> VL> Also, it comes at a price - I have to spend some more time (logarithmic,
>> VL> though) during all other operations on space maps, and it is not optimized yet.
>>
>> Lukasz (cc) - maybe you can test it and even help on tuning it?
>>
> Yes, I can test it. I'm building an environment to compile OpenSolaris
> and test ZFS. I will be ready next week.
>
> Victor, can you tell me where to look for your changes?
> How do I change the allocation strategy?
> I can see that by changing space_map_ops_t
> I can declare different callback functions.
>
> Lukas
Łukasz K
2007-Aug-24 13:49 UTC
[zfs-discuss] Odp: Re[2]: Re: Re[2]: Re: Re: Re: Snapshots impact on performance
Great, I have the latest snv_70 sources and I'm working on it.

Lukas

On 24-08-2007 at 14:57 Victor Latushkin wrote:
> Hi Lukasz and all,
>
> I just returned from a month-long sick leave, so I need some time to sort
> through a pile of emails, do a SPARC build and some testing, and then I'll
> be able to provide you with my changes in some form. Hope this will happen
> next week.
>
> Cheers,
> Victor
>
> Łukasz K wrote:
> > On 26-07-2007 at 13:31 Robert Milkowski wrote:
> >> Hello Victor,
> >>
> >> Wednesday, June 27, 2007, 1:19:44 PM, you wrote:
> >>
> >> VL> Gino wrote:
> >>>> Same problem here (snv_60).
> >>>> Robert, did you find any solutions?
> >>
> >> VL> A couple of weeks ago I put together an implementation of space maps which
> >> VL> completely eliminates loops and recursion from the space map alloc
> >> VL> operation, and allows different allocation strategies to be implemented
> >> VL> quite easily (of which I put together 3 more). It looks like it works for me
> >> VL> on a Thumper and on my notebook with ZFS root, though I have almost no
> >> VL> time to test it further these days due to year end. I haven't done a SPARC
> >> VL> build yet and I do not have a test case to test against.
> >>
> >> VL> Also, it comes at a price - I have to spend some more time (logarithmic,
> >> VL> though) during all other operations on space maps, and it is not optimized yet.
> >>
> >> Lukasz (cc) - maybe you can test it and even help on tuning it?
> >>
> > Yes, I can test it. I'm building an environment to compile OpenSolaris
> > and test ZFS. I will be ready next week.
> >
> > Victor, can you tell me where to look for your changes?
> > How do I change the allocation strategy?
> > I can see that by changing space_map_ops_t
> > I can declare different callback functions.
> >
> > Lukas
Hello Erblichs,

Two years later... :)

It turned out to be a problem with metaslab fragmentation. Lowering the recordsize to 8k helps slightly, but it is still not fixed.

-- 
Best regards,
Robert Milkowski               mailto:milek at task.gda.pl
                                      http://milek.blogspot.com
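(For completeness, the recordsize workaround mentioned above is applied along these lines; pool/fs is a placeholder, and note that a smaller recordsize only affects blocks written after the change - existing data keeps its old block size until rewritten.)

# zfs set recordsize=8k pool/fs
# zfs get -r recordsize pool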