Tomas Ögren
2006-Nov-09 16:59 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Hello. We're currently using a Sun Blade 1000 (2x750MHz, 1G RAM, 2x160MB/s mpt SCSI buses, skge GigE network) as an NFS backend with ZFS for distribution of free software like Debian (cdimage.debian.org, ftp.se.debian.org) and have run into some performance issues. We are running SX snv_48 and have run with a raidz2 of 7x300G for a while now; we just added another 7x300G raidz2 today, but I'll stick to the old information so far. We tried Sol10u2 before, but NFS writes killed every bit of performance; snv_48 works much better in that regard.

The working data set is about 1.2TB over ~550k inodes right now. The backend serves data to 2-4 Linux frontends running Apache (with a local raid0 mod_disk_cache), rsync (looking through entire Debian trees every now and then) and vsftp (not used much).

There are (at least?) two types of performance issues we've run into..

1. DNLC-through-ZFS doesn't seem to listen to ncsize.

The filesystem currently has ~550k inodes and large portions of it are frequently looked over with rsync (over NFS). mdb said ncsize was about 68k and vmstat -s said we had a hit rate of ~30%, so I set ncsize to 600k and rebooted.. That didn't seem to change much: hit rates are still about the same and a manual find(1) doesn't seem to be cached that well (according to vmstat and dnlcsnoop.d).
When booting, the following messages came up, not sure if they matter or not:

NOTICE: setting nrnode to max value of 351642
NOTICE: setting nrnode to max value of 235577

Is there a separate ZFS-DNLC knob to adjust for this? Wild guess is that ZFS has its own implementation which is integrated with the rest of the ZFS cache and throws out metadata cache in favour of data cache.. or something..

2. Readahead (or something) is killing all signs of performance.

Since there can be quite a few requests in the air at the same time, we're having issues with readahead.. Some typical numbers are 7x13MB/s being read from disk according to 'iostat -xnzm 5' and 'zpool iostat -v 5', while maybe 5MB/s is being sent back over the network.. This means about 20x more is read from disk than is actually used. When testing single streams, the readahead helps and data isn't thrown away.. but when a bazillion NFS requests come at once, ZFS reads far too much compared to what was actually requested/delivered. I saw some stuff about zfs_prefetch_disable in current ("unreleased") code, will this help us perhaps? I've read about two layers of prefetch, one per vdev and one per disk.. Since the current working set is about 1.2TB, the server has 1GB of memory, and the load is mostly one-shot file requests, we'd like to disable as much readahead and data cache as possible (since the chance of a positive data cache hit is very low).. Keeping DNLC stuff in memory would help though.

Some URLs:

zfs_prefetch_disable being integrated:
http://dlc.sun.com/osol/on/downloads/current/on-changelog-20061103.html

zfs_prefetch_disable itself:
http://src.opensolaris.org/source/search?q=zfs_prefetch_disable&defs=&refs=&path=&hist

Soft Track Buffer / Prefetch:
http://blogs.sun.com/roch/entry/the_dynamics_of_zfs

As far as I've been able to tell using mdb, this is already lowered in b48?
http://blogs.sun.com/roch/entry/tuning_the_knobs

Suggestions, ideas etc?

/Tomas
-- 
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
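For reference, the ncsize bump itself was just the usual /etc/system tweak; the prefetch line is only what I assume we would add once we are on bits that actually contain the knob, so treat it as a sketch rather than something tested here:

    set ncsize = 600000
    * only on builds that have the prefetch knob integrated (guess, untested):
    * set zfs:zfs_prefetch_disable = 1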
Neil Perrin
2006-Nov-09 18:35 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Tomas Ögren wrote On 11/09/06 09:59:

> 1. DNLC-through-ZFS doesn't seem to listen to ncsize.
>
> The filesystem currently has ~550k inodes and large portions of it are
> frequently looked over with rsync (over NFS). mdb said ncsize was about
> 68k and vmstat -s said we had a hit rate of ~30%, so I set ncsize to
> 600k and rebooted.. Didn't seem to change much, still seeing hit rates
> at about the same and a manual find(1) doesn't seem to be that cached
> (according to vmstat and dnlcsnoop.d).
> When booting, the following messages came up, not sure if they matter or not:
> NOTICE: setting nrnode to max value of 351642
> NOTICE: setting nrnode to max value of 235577
>
> Is there a separate ZFS-DNLC knob to adjust for this? Wild guess is that
> it has its own implementation which is integrated with the rest of the
> ZFS cache which throws out metadata cache in favour of data cache.. or
> something..

A more complete and useful set of DNLC statistics can be obtained via
"kstat -n dnlcstats". As well as the soft limit on DNLC entries (ncsize),
the current number of cached entries is also useful:

    echo ncsize/D | mdb -k
    echo dnlc_nentries/D | mdb -k

NFS does have a maximum number of rnodes which is calculated from the
memory available. It doesn't look like nrnode_max can be overridden.

Having said that, I actually think your problem is lack of memory.
Each ZFS vnode held by the DNLC uses a *lot* more memory than, say, a
UFS one. Consequently the DNLC has to purge entries, and I suspect that
with only 1GB the ZFS ARC doesn't allow many DNLC entries. I don't know
if that number is maintained anywhere for you to check. Mark?

Neil.
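If you want to see whether dnlc_nentries ever gets anywhere near ncsize, a quick sampling loop along these lines (just a sketch, untested) run for a few hours would tell us:

    # sample the DNLC size and hit/miss counters once a minute
    while true; do
            date
            echo dnlc_nentries/D | mdb -k
            kstat -p -n dnlcstats | egrep 'hits|misses'
            sleep 60
    done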
eric kustarz
2006-Nov-09 20:15 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Neil Perrin wrote:

> Tomas Ögren wrote On 11/09/06 09:59:
>
>> 1. DNLC-through-ZFS doesn't seem to listen to ncsize.
>> [...]
>> Is there a separate ZFS-DNLC knob to adjust for this? Wild guess is that
>> it has its own implementation which is integrated with the rest of the
>> ZFS cache which throws out metadata cache in favour of data cache.. or
>> something..
>
> A more complete and useful set of DNLC statistics can be obtained via
> "kstat -n dnlcstats". As well as the soft limit on DNLC entries (ncsize),
> the current number of cached entries is also useful:
>
>     echo ncsize/D | mdb -k
>     echo dnlc_nentries/D | mdb -k
>
> NFS does have a maximum number of rnodes which is calculated from the
> memory available. It doesn't look like nrnode_max can be overridden.
>
> Having said that, I actually think your problem is lack of memory.
> Each ZFS vnode held by the DNLC uses a *lot* more memory than, say, a
> UFS one. Consequently the DNLC has to purge entries, and I suspect that
> with only 1GB the ZFS ARC doesn't allow many DNLC entries. I don't know
> if that number is maintained anywhere for you to check. Mark?
>
> Neil.

If the ARC detects low memory (via arc_reclaim_needed()), then we call
arc_kmem_reap_now() and subsequently dnlc_reduce_cache() - which reduces
the # of DNLC entries by 3% (ARC_REDUCE_DNLC_PERCENT).

So yeah, dnlc_nentries would be really interesting to see (especially if
it's << ncsize).

eric
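If you want to confirm that this path is actually firing on your box, a DTrace one-liner on those kernel functions should do it (assuming fbt can see them on your bits; untested here):

    # count calls to arc_kmem_reap_now() and dnlc_reduce_cache() every 10s
    dtrace -n '
    fbt::arc_kmem_reap_now:entry,
    fbt::dnlc_reduce_cache:entry
    {
            @[probefunc] = count();
    }
    tick-10s
    {
            printa(@);
            trunc(@);
    }'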
Tomas Ögren
2006-Nov-09 20:47 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
On 09 November, 2006 - Neil Perrin sent me these 1,6K bytes:

> A more complete and useful set of DNLC statistics can be obtained via
> "kstat -n dnlcstats". As well as the soft limit on DNLC entries (ncsize),
> the current number of cached entries is also useful:

This is after ~28h uptime:

module: unix                            instance: 0
name:   dnlcstats                       class:    misc
        crtime                          47.5600948
        dir_add_abort                   0
        dir_add_max                     0
        dir_add_no_memory               0
        dir_cached_current              4
        dir_cached_total                107
        dir_entries_cached_current      4321
        dir_fini_purge                  0
        dir_hits                        11000
        dir_misses                      172814
        dir_reclaim_any                 25
        dir_reclaim_last                16
        dir_remove_entry_fail           0
        dir_remove_space_fail           0
        dir_start_no_memory             0
        dir_update_fail                 0
        double_enters                   234918
        enters                          59193543
        hits                            36690843
        misses                          59384436
        negative_cache_hits             1366345
        pick_free                       0
        pick_heuristic                  57069023
        pick_last                       2035111
        purge_all                       1
        purge_fs1                       0
        purge_total_entries             3748
        purge_vfs                       187
        purge_vp                        95
        snaptime                        99177.711093

vmstat -s:
     96080561 total name lookups (cache hits 38%)

>     echo ncsize/D | mdb -k
>     echo dnlc_nentries/D | mdb -k

ncsize:         600000
dnlc_nentries:  19230

Not quite the same..

> NFS does have a maximum number of rnodes which is calculated from the
> memory available. It doesn't look like nrnode_max can be overridden.

An rnode seems to take 472 bytes according to my test program.. which is
"a bit more" than the 64 bytes per dnlc entry in the ncsize docs..

> Having said that, I actually think your problem is lack of memory.
> Each ZFS vnode held by the DNLC uses a *lot* more memory than, say, a
> UFS one. Consequently the DNLC has to purge entries, and I suspect that
> with only 1GB the ZFS ARC doesn't allow many DNLC entries. I don't know
> if that number is maintained anywhere for you to check. Mark?

Current memory usage (for some values of usage ;):

# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                      95584               746   75%
Anon                        20868               163   16%
Exec and libs                1703                13    1%
Page cache                   1007                 7    1%
Free (cachelist)               97                 0    0%
Free (freelist)              7745                60    6%

Total                      127004               992
Physical                   125192               978

/Tomas
-- 
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Brian Wong
2006-Nov-09 20:54 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
eric kustarz wrote:

> If the ARC detects low memory (via arc_reclaim_needed()), then we call
> arc_kmem_reap_now() and subsequently dnlc_reduce_cache() - which
> reduces the # of DNLC entries by 3% (ARC_REDUCE_DNLC_PERCENT).
>
> So yeah, dnlc_nentries would be really interesting to see (especially
> if it's << ncsize).

The version of statit that we're using is still attached to ancient
32-bit counters that /are/ overflowing on our runs. I'm fixing this at
the moment and I'll send around a new binary this afternoon.

blw
Tomas Ögren
2006-Nov-09 21:05 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
On 09 November, 2006 - Tomas Ögren sent me these 4,4K bytes:

> On 09 November, 2006 - Neil Perrin sent me these 1,6K bytes:
>
>> NFS does have a maximum number of rnodes which is calculated from the
>> memory available. It doesn't look like nrnode_max can be overridden.
>
> An rnode seems to take 472 bytes according to my test program.. which is
> "a bit more" than the 64 bytes per dnlc entry in the ncsize docs..

But wait a minute.. I'm not interested in being an NFS client.. this is
a server.. so wasting ~100MB on NFS client stuff that will never be used
isn't that great.. Setting it to something really low and rebooting now..

/Tomas
-- 
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
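For the record, the knob I'm assuming here is the documented nfs:nrnode tunable; something like this in /etc/system before the reboot, with a token value since the client side won't really be used:

    * shrink the NFS client rnode cache on a box that is only an NFS server
    set nfs:nrnode = 256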
eric kustarz
2006-Nov-09 21:22 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Brian Wong wrote:

> eric kustarz wrote:
>
>> If the ARC detects low memory (via arc_reclaim_needed()), then we call
>> arc_kmem_reap_now() and subsequently dnlc_reduce_cache() - which
>> reduces the # of DNLC entries by 3% (ARC_REDUCE_DNLC_PERCENT).
>>
>> So yeah, dnlc_nentries would be really interesting to see (especially
>> if it's << ncsize).
>
> The version of statit that we're using is still attached to ancient
> 32-bit counters that /are/ overflowing on our runs. I'm fixing this at
> the moment and I'll send around a new binary this afternoon.
>
> blw

Spencer and I just fixed some statit bugs (such as getting it not to
core dump on a Thumper)... he has the changes, so I'd sync up with him
(I'm not sure if they are the same bugs, though).

eric
Robert Milkowski
2006-Nov-09 22:46 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Hello Tomas,

Thursday, November 9, 2006, 9:47:17 PM, you wrote:

TÖ> On 09 November, 2006 - Neil Perrin sent me these 1,6K bytes:

TÖ> Current memory usage (for some values of usage ;):
TÖ> # echo ::memstat | mdb -k
TÖ> Page Summary                Pages                MB  %Tot
TÖ> ------------     ----------------  ----------------  ----
TÖ> Kernel                      95584               746   75%
TÖ> Anon                        20868               163   16%
TÖ> Exec and libs                1703                13    1%
TÖ> Page cache                   1007                 7    1%
TÖ> Free (cachelist)               97                 0    0%
TÖ> Free (freelist)              7745                60    6%
TÖ> Total                      127004               992
TÖ> Physical                   125192               978

Well, when I raised ncsize on an NFS server I got memory pressure
problems. Leaving ncsize at the default solved the problem.

-- 
Best regards,
 Robert                            mailto:rmilkowski at task.gda.pl
                                   http://milek.blogspot.com
Neil Perrin
2006-Nov-10 00:45 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Tomas Ögren wrote On 11/09/06 13:47:

> [...]
>
> ncsize:         600000
> dnlc_nentries:  19230
>
> Not quite the same..
>
> [...]
>
> Current memory usage (for some values of usage ;):
> # echo ::memstat | mdb -k
> Page Summary                Pages                MB  %Tot
> ------------     ----------------  ----------------  ----
> Kernel                      95584               746   75%
> Anon                        20868               163   16%
> Exec and libs                1703                13    1%
> Page cache                   1007                 7    1%
> Free (cachelist)               97                 0    0%
> Free (freelist)              7745                60    6%
>
> Total                      127004               992
> Physical                   125192               978

This memory usage shows nearly all of memory consumed by the kernel, and
probably by ZFS. ZFS can't add any more DNLC entries without purging
others, due to lack of memory. This can be seen from dnlc_nentries being
way less than ncsize.

I don't know if there's a DMU or ARC bug filed to reduce the memory
footprint of their internal structures for situations like this, but we
are aware of the issue.

Neil.
Sanjeev Bagewadi
2006-Nov-10 12:25 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Comments inline...

Neil Perrin wrote:

> This memory usage shows nearly all of memory consumed by the kernel,
> and probably by ZFS. ZFS can't add any more DNLC entries without
> purging others, due to lack of memory. This can be seen from
> dnlc_nentries being way less than ncsize.
> I don't know if there's a DMU or ARC bug filed to reduce the memory
> footprint of their internal structures for situations like this, but
> we are aware of the issue.

Can you please check the zio buffers and the ARC status?

Here is how you can do it:

- Start mdb, i.e. mdb -k, and run:

      > ::kmem_cache

  In the output generated above, check the amount consumed by the
  zio_buf_*, arc_buf_t and arc_buf_hdr_t caches.

- Dump the values of the arc struct:

      > arc::print struct arc

  This should give you something like below:

  -- snip --
  > arc::print struct arc
  {
      anon = ARC_anon
      mru = ARC_mru
      mru_ghost = ARC_mru_ghost
      mfu = ARC_mfu
      mfu_ghost = ARC_mfu_ghost
      size = 0x3e20000      <-- tells you the current memory consumed by
                                the ARC buffers (including the memory
                                consumed for the data cached, i.e. zio_buf_*)
      p = 0x1d06a06
      c = 0x4000000
      c_min = 0x4000000
      c_max = 0x2f9aa800
      hits = 0x2fd2
      misses = 0xd1c
      deleted = 0x296
      skipped = 0
      hash_elements = 0xa85
      hash_elements_max = 0xcc0
      hash_collisions = 0x173
      hash_chains = 0xbe
      hash_chain_max = 0x2
      no_grow = 0           <-- this would be set to 1 if we have a
                                memory crunch
  }
  -- snip --

And as Neil pointed out, we would probably need some way of limiting the
ARC consumption.

Regards,
Sanjeev.

-- 
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel: x27521 +91 80 669 27521
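For convenience, both checks can be run non-interactively; a rough sketch using only the commands above (the member names are the ones shown in the dump):

    #!/bin/sh
    # summarize the ZFS buffer caches and the ARC state (run as root)
    echo "::kmem_cache" | mdb -k | egrep 'zio_buf_|arc_buf'
    echo "arc::print struct arc size c c_min c_max no_grow" | mdb -k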
Tomas Ögren
2006-Nov-10 13:32 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
On 10 November, 2006 - Sanjeev Bagewadi sent me these 3,5K bytes:

> Can you please check the zio buffers and the ARC status?
>
> Here is how you can do it:
> - Start mdb, i.e. mdb -k, and run:
>       > ::kmem_cache
>   In the output generated above, check the amount consumed by the
>   zio_buf_*, arc_buf_t and arc_buf_hdr_t caches.

ADDR             NAME                      FLAG    CFLAG  BUFSIZE  BUFTOTL

0000030002640a08 zio_buf_512               0000   020000      512   102675
0000030002640c88 zio_buf_1024              0200   020000     1024       48
0000030002640f08 zio_buf_1536              0200   020000     1536       70
0000030002641188 zio_buf_2048              0200   020000     2048       16
0000030002641408 zio_buf_2560              0200   020000     2560        9
0000030002641688 zio_buf_3072              0200   020000     3072       16
0000030002641908 zio_buf_3584              0200   020000     3584       18
0000030002641b88 zio_buf_4096              0200   020000     4096       12
0000030002668008 zio_buf_5120              0200   020000     5120       32
0000030002668288 zio_buf_6144              0200   020000     6144        8
0000030002668508 zio_buf_7168              0200   020000     7168     1032
0000030002668788 zio_buf_8192              0200   020000     8192        8
0000030002668a08 zio_buf_10240             0200   020000    10240        8
0000030002668c88 zio_buf_12288             0200   020000    12288        4
0000030002668f08 zio_buf_14336             0200   020000    14336      468
0000030002669188 zio_buf_16384             0200   020000    16384     3326
0000030002669408 zio_buf_20480             0200   020000    20480       16
0000030002669688 zio_buf_24576             0200   020000    24576        3
0000030002669908 zio_buf_28672             0200   020000    28672       12
0000030002669b88 zio_buf_32768             0200   020000    32768     1935
000003000266c008 zio_buf_40960             0200   020000    40960       13
000003000266c288 zio_buf_49152             0200   020000    49152        9
000003000266c508 zio_buf_57344             0200   020000    57344        7
000003000266c788 zio_buf_65536             0200   020000    65536     3272
000003000266ca08 zio_buf_73728             0200   020000    73728       10
000003000266cc88 zio_buf_81920             0200   020000    81920        7
000003000266cf08 zio_buf_90112             0200   020000    90112        5
000003000266d188 zio_buf_98304             0200   020000    98304        7
000003000266d408 zio_buf_106496            0200   020000   106496       12
000003000266d688 zio_buf_114688            0200   020000   114688        6
000003000266d908 zio_buf_122880            0200   020000   122880        5
000003000266db88 zio_buf_131072            0200   020000   131072       92

0000030002670508 arc_buf_hdr_t             0000   000000      128    11970
0000030002670788 arc_buf_t                 0000   000000       40     7308

> - Dump the values of the arc struct:
>       > arc::print struct arc

> arc::print struct arc
{
    anon = ARC_anon
    mru = ARC_mru
    mru_ghost = ARC_mru_ghost
    mfu = ARC_mfu
    mfu_ghost = ARC_mfu_ghost
    size = 0x6f7a400
    p = 0x5d9bd5a
    c = 0x5f6375a
    c_min = 0x4000000
    c_max = 0x2e82a000
    hits = 0x40e0a15
    misses = 0x1cec4a4
    deleted = 0x1b0ba0d
    skipped = 0x24ea64e13
    hash_elements = 0x179d
    hash_elements_max = 0x60bb
    hash_collisions = 0x8dca3a
    hash_chains = 0x391
    hash_chain_max = 0x8
    no_grow = 0x1
}

So, about 100MB and a memory crunch..

> And as Neil pointed out, we would probably need some way of limiting
> the ARC consumption.

/Tomas
-- 
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Sanjeev Bagewadi
2006-Nov-13 05:25 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Tomas,

Comments inline...

Tomas Ögren wrote:

> > arc::print struct arc
> {
>     [...]
>     size = 0x6f7a400
>     p = 0x5d9bd5a
>     c = 0x5f6375a
>     c_min = 0x4000000
>     c_max = 0x2e82a000
>     [...]
>     no_grow = 0x1
> }
>
> So, about 100MB and a memory crunch..

Interesting! So it is not the ARC which is consuming too much memory...
It is some other piece (not sure if it belongs to ZFS) which is causing
the crunch...

Or the other possibility is that the ARC ate up too much and caused a
near-crunch situation, the kmem allocator hit back, and the ARC freed up
its buffers (hence the no_grow flag being enabled). So the ARC could be
oscillating between caching a lot and then purging its caches.

You might want to keep track of these values (the ARC size and the
no_grow flag) and see how they change over a period of time. This would
help us understand the pattern.

And if we know it is the ARC which is causing the crunch, we could
manually change the value of c_max to a comfortable value, and that
would limit the size of the ARC. However, I would suggest that you try
it out on a non-production machine first.

By default, c_max is set to 75% of physmem and that is the hard limit.
"c" is the soft limit and the ARC will try to grow up to "c". The value
of "c" is adjusted when there is a need to cache more, but it will never
exceed c_max.

Regarding the huge number of reads, I am sure you have already tried
disabling the VDEV prefetch. If not, it is worth a try.

Thanks and regards,
Sanjeev.

-- 
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel: x27521 +91 80 669 27521
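If you do want to try capping c_max, something along these lines should work; the address differs per boot, so treat this purely as a sketch and double-check before poking a live kernel:

    # 1. find the address of arc.c_max
    echo "arc::print -a struct arc c_max" | mdb -k
    # 2. write the new cap (example: 0x10000000 = 256MB) at that address,
    #    replacing <addr> with the address printed above
    echo "<addr>/Z 0x10000000" | mdb -kw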
Tomas Ögren
2006-Nov-13 09:51 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
On 13 November, 2006 - Sanjeev Bagewadi sent me these 7,1K bytes:

> You might want to keep track of these values (the ARC size and the
> no_grow flag) and see how they change over a period of time. This
> would help us understand the pattern.

I would guess it grows after boot until it hits some max and then stays
there.. but I can check it out..

> And if we know it is the ARC which is causing the crunch, we could
> manually change the value of c_max to a comfortable value, and that
> would limit the size of the ARC.

But in the ZFS world, DNLC is part of the ARC, right?
My original question was how to get rid of "data cache", but keep
"metadata cache" (such as DNLC)...

> Regarding the huge number of reads, I am sure you have already tried
> disabling the VDEV prefetch. If not, it is worth a try.

That was part of my original question, how? :)

/Tomas
-- 
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Roch - PAE
2006-Nov-13 13:41 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Tomas Ögren writes:

> But in the ZFS world, DNLC is part of the ARC, right?
> My original question was how to get rid of "data cache", but keep
> "metadata cache" (such as DNLC)...

Under memory pressure the ARC will shrink, and it will also shrink the
DNLC by 3%:

    arc_reduce_dnlc_percent = 3

You could try to tune that number.

-r
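For experimenting, something like this ought to do it on a live kernel (check the read first; I have not verified this on snv_48 bits):

    # current value
    echo "arc_reduce_dnlc_percent/D" | mdb -k
    # make memory pressure trim the DNLC less aggressively, e.g. 1%
    echo "arc_reduce_dnlc_percent/W 0t1" | mdb -kw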
Eric Kustarz
2006-Nov-13 21:40 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Tomas Ögren wrote:

> On 13 November, 2006 - Sanjeev Bagewadi sent me these 7,1K bytes:
>
>> Regarding the huge number of reads, I am sure you have already tried
>> disabling the VDEV prefetch. If not, it is worth a try.
>
> That was part of my original question, how? :)

On recent bits, you can set 'zfs_vdev_cache_max' to 1 to disable the
vdev cache.

eric
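In /etc/system that would presumably look like the first line below, with the mdb write as the live-kernel equivalent; I haven't tried either on your exact bits, so take it as a sketch:

    * effectively disable the vdev cache by capping it at 1 byte
    set zfs:zfs_vdev_cache_max = 1

    # or, on a running kernel:
    echo "zfs_vdev_cache_max/W 0t1" | mdb -kw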
Tomas Ögren
2006-Nov-14 17:23 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
On 13 November, 2006 - Eric Kustarz sent me these 2,4K bytes:

> On recent bits, you can set 'zfs_vdev_cache_max' to 1 to disable the
> vdev cache.

On earlier versions (snv_48), I did something similar with ztune.sh[0],
adding cache_size, which I set to 0 (instead of 10M). This helped quite
a lot, but there seems to be one more level of prefetching..

Example:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
ftp         1.67T  2.15T  1.26K     23  40.9M   890K
  raidz2    1.37T   551G    674     10  22.3M   399K
    c4t0d0      -      -    210      3  3.19M  80.4K
    c4t1d0      -      -    211      3  3.19M  80.4K
    c4t2d0      -      -    211      3  3.19M  80.4K
    c5t0d0      -      -    210      3  3.19M  80.4K
    c5t1d0      -      -    242      4  3.19M  80.4K
    c5t2d0      -      -    211      3  3.19M  80.4K
    c5t3d0      -      -    211      3  3.19M  80.4K
  raidz2     305G  1.61T    614     12  18.6M   491K
    c4t3d0      -      -    222      5  2.66M  99.1K
    c4t4d0      -      -    223      5  2.66M  99.1K
    c4t5d0      -      -    224      5  2.66M  99.1K
    c4t8d0      -      -    190      5  2.66M  99.1K
    c5t4d0      -      -    190      5  2.66M  99.1K
    c5t5d0      -      -    226      5  2.66M  99.1K
    c5t8d0      -      -    225      5  2.66M  99.1K
----------  -----  -----  -----  -----  -----  -----

Before this fix, the 'read bandwidth' for the disks in the first raidz2
added up to way more than the raidz2 itself.. Now it adds up correctly,
but some other readahead still causes a 1-10x factor too much, mostly
hovering around 2-3x.. Before, it was hovering around 8-10x..

[0]: http://blogs.sun.com/roch/resource/ztune.sh

/Tomas
-- 
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Sanjeev Bagewadi
2006-Nov-15 09:43 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Tomas,

Apologies for the delayed response...

Tomas Ögren wrote:

> I would guess it grows after boot until it hits some max and then stays
> there.. but I can check it out..

No, that is not true. It shrinks when there is memory pressure. The
values of 'c' and 'p' are adjusted accordingly.

> But in the ZFS world, DNLC is part of the ARC, right?

Not really... ZFS uses the regular DNLC for lookup optimization.
However, the metadata/data is cached in the ARC.

> My original question was how to get rid of "data cache", but keep
> "metadata cache" (such as DNLC)...

This is a good question. AFAIK the ARC does not really differentiate
between metadata and data, so I am not sure if we can control it.
However, as I mentioned above, ZFS still uses the DNLC caching.

> That was part of my original question, how? :)

Apologies :-) I was digging around the code and I find that
zfs_vdev_cache_bshift is the one which controls the amount that is read.
Currently it is set to 16, so we should be able to modify this and
reduce the prefetch. However, I will have to double-check with more
people and get back to you.

Thanks and regards,
Sanjeev.

-- 
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel: x27521 +91 80 669 27521
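To make the units concrete: a bshift of 16 means the vdev cache inflates reads to 2^16 = 64KB each. If you just want to peek at it (or experimentally lower it) while I check, something like this should work, with the usual caveat that I have not verified it on snv_48:

    # current value (expect 16, i.e. 64KB inflated reads)
    echo "zfs_vdev_cache_bshift/D" | mdb -k
    # drop it to 13 (8KB) on a live kernel and watch iostat / zpool iostat
    echo "zfs_vdev_cache_bshift/W 0t13" | mdb -kw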