Tomas Ögren
2006-Nov-09 16:59 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Hello. We're currently using a Sun Blade 1000 (2x750MHz, 1G RAM, 2x160MB/s mpt SCSI buses, skge GigE network) as an NFS backend with ZFS for distribution of free software like Debian (cdimage.debian.org, ftp.se.debian.org) and have run into some performance issues. We are running SX snv_48 and have run with a raidz2 of 7x300G for a while now; we just added another 7x300G raidz2 today, but I'll stick to the old information so far. We tried Sol10u2 before, but NFS writes killed every bit of performance; snv_48 works much better in that regard.

The working data set is about 1.2TB over ~550k inodes right now. The backend serves data to 2-4 Linux frontends running Apache (with a local raid0 mod_disk_cache), rsync (looking through entire Debian trees every now and then) and vsftp (not used much).

There are (at least?) two types of performance issues we've run into..

1. DNLC-through-ZFS doesn't seem to listen to ncsize.

The filesystem currently has ~550k inodes and large portions of it are frequently looked over with rsync (over NFS). mdb said ncsize was about 68k and vmstat -s said we had a hit rate of ~30%, so I set ncsize to 600k and rebooted.. That didn't seem to change much: hit rates are still about the same and a manual find(1) doesn't seem to be cached that well (according to vmstat and dnlcsnoop.d).
When booting, the following messages came up, not sure if they matter or not:

NOTICE: setting nrnode to max value of 351642
NOTICE: setting nrnode to max value of 235577

Is there a separate ZFS-DNLC knob to adjust for this? Wild guess is that ZFS has its own implementation which is integrated with the rest of the ZFS cache and throws out metadata cache in favour of data cache.. or something..

2. Readahead (or something) is killing all signs of performance.

Since there can be quite a few requests in the air at the same time, we're having issues with readahead.. Some typical numbers are 7x13MB/s being read from disk according to 'iostat -xnzm 5' and 'zpool iostat -v 5', while maybe 5MB/s is being sent back over the network.. This means about 20x more is read from disk than is actually used. When testing single streams, the readahead helps and data isn't thrown away.. but when a bazillion NFS requests come at once, ZFS reads far too much compared to what was actually requested/delivered. I saw some stuff about zfs_prefetch_disable in current ("unreleased") code, will this help us perhaps? I've read about two layers of prefetch, one per vdev and one per disk.. Since the current working set is about 1.2TB, the server has 1GB of memory, and the load is mostly one-shot file requests, we'd like to disable as much readahead and data cache as possible (since the chance of a positive data cache hit is very low).. Keeping DNLC stuff in memory would help though.

Some URLs:

zfs_prefetch_disable being integrated:
http://dlc.sun.com/osol/on/downloads/current/on-changelog-20061103.html

zfs_prefetch_disable itself:
http://src.opensolaris.org/source/search?q=zfs_prefetch_disable&defs=&refs=&path=&hist

Soft Track Buffer / Prefetch:
http://blogs.sun.com/roch/entry/the_dynamics_of_zfs

As far as I've been able to tell using mdb, this is already lowered in b48?
http://blogs.sun.com/roch/entry/tuning_the_knobs

Suggestions, ideas etc?

/Tomas
-- 
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
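For reference, the ncsize bump itself was just the usual /etc/system tweak; the prefetch line is only what I assume we would add once we are on bits that actually contain the knob, so treat it as a sketch rather than something tested here:

    set ncsize = 600000
    * only on builds that have the prefetch knob integrated (guess, untested):
    * set zfs:zfs_prefetch_disable = 1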
Neil Perrin
2006-Nov-09 18:35 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Tomas Ögren wrote On 11/09/06 09:59:

> 1. DNLC-through-ZFS doesn't seem to listen to ncsize.
>
> The filesystem currently has ~550k inodes and large portions of it are
> frequently looked over with rsync (over NFS). mdb said ncsize was about
> 68k and vmstat -s said we had a hit rate of ~30%, so I set ncsize to
> 600k and rebooted.. Didn't seem to change much, still seeing hit rates
> at about the same and a manual find(1) doesn't seem to be that cached
> (according to vmstat and dnlcsnoop.d).
> When booting, the following messages came up, not sure if they matter or not:
> NOTICE: setting nrnode to max value of 351642
> NOTICE: setting nrnode to max value of 235577
>
> Is there a separate ZFS-DNLC knob to adjust for this? Wild guess is that
> it has its own implementation which is integrated with the rest of the
> ZFS cache which throws out metadata cache in favour of data cache.. or
> something..

A more complete and useful set of DNLC statistics can be obtained via
"kstat -n dnlcstats". As well as the soft limit on DNLC entries (ncsize),
the current number of cached entries is also useful:

    echo ncsize/D | mdb -k
    echo dnlc_nentries/D | mdb -k

NFS does have a maximum number of rnodes which is calculated from the
memory available. It doesn't look like nrnode_max can be overridden.

Having said that, I actually think your problem is lack of memory.
Each ZFS vnode held by the DNLC uses a *lot* more memory than, say, a
UFS one. Consequently the DNLC has to purge entries, and I suspect that
with only 1GB the ZFS ARC doesn't allow many DNLC entries. I don't know
if that number is maintained anywhere for you to check. Mark?

Neil.
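If you want to see whether dnlc_nentries ever gets anywhere near ncsize, a quick sampling loop along these lines (just a sketch, untested) run for a few hours would tell us:

    # sample the DNLC size and hit/miss counters once a minute
    while true; do
            date
            echo dnlc_nentries/D | mdb -k
            kstat -p -n dnlcstats | egrep 'hits|misses'
            sleep 60
    done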
eric kustarz
2006-Nov-09 20:15 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Neil Perrin wrote:

> Tomas Ögren wrote On 11/09/06 09:59:
>
>> 1. DNLC-through-ZFS doesn't seem to listen to ncsize.
>> [...]
>> Is there a separate ZFS-DNLC knob to adjust for this? Wild guess is that
>> it has its own implementation which is integrated with the rest of the
>> ZFS cache which throws out metadata cache in favour of data cache.. or
>> something..
>
> A more complete and useful set of DNLC statistics can be obtained via
> "kstat -n dnlcstats". As well as the soft limit on DNLC entries (ncsize),
> the current number of cached entries is also useful:
>
>     echo ncsize/D | mdb -k
>     echo dnlc_nentries/D | mdb -k
>
> NFS does have a maximum number of rnodes which is calculated from the
> memory available. It doesn't look like nrnode_max can be overridden.
>
> Having said that, I actually think your problem is lack of memory.
> Each ZFS vnode held by the DNLC uses a *lot* more memory than, say, a
> UFS one. Consequently the DNLC has to purge entries, and I suspect that
> with only 1GB the ZFS ARC doesn't allow many DNLC entries. I don't know
> if that number is maintained anywhere for you to check. Mark?
>
> Neil.

If the ARC detects low memory (via arc_reclaim_needed()), then we call
arc_kmem_reap_now() and subsequently dnlc_reduce_cache() - which reduces
the # of DNLC entries by 3% (ARC_REDUCE_DNLC_PERCENT).

So yeah, dnlc_nentries would be really interesting to see (especially if
it's << ncsize).

eric
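If you want to confirm that this path is actually firing on your box, a DTrace one-liner on those kernel functions should do it (assuming fbt can see them on your bits; untested here):

    # count calls to arc_kmem_reap_now() and dnlc_reduce_cache() every 10s
    dtrace -n '
    fbt::arc_kmem_reap_now:entry,
    fbt::dnlc_reduce_cache:entry
    {
            @[probefunc] = count();
    }
    tick-10s
    {
            printa(@);
            trunc(@);
    }'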
Tomas Ögren
2006-Nov-09 20:47 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
On 09 November, 2006 - Neil Perrin sent me these 1,6K bytes:

> A more complete and useful set of DNLC statistics can be obtained via
> "kstat -n dnlcstats". As well as the soft limit on DNLC entries (ncsize),
> the current number of cached entries is also useful:

This is after ~28h uptime:

module: unix                            instance: 0
name:   dnlcstats                       class:    misc
        crtime                          47.5600948
        dir_add_abort                   0
        dir_add_max                     0
        dir_add_no_memory               0
        dir_cached_current              4
        dir_cached_total                107
        dir_entries_cached_current      4321
        dir_fini_purge                  0
        dir_hits                        11000
        dir_misses                      172814
        dir_reclaim_any                 25
        dir_reclaim_last                16
        dir_remove_entry_fail           0
        dir_remove_space_fail           0
        dir_start_no_memory             0
        dir_update_fail                 0
        double_enters                   234918
        enters                          59193543
        hits                            36690843
        misses                          59384436
        negative_cache_hits             1366345
        pick_free                       0
        pick_heuristic                  57069023
        pick_last                       2035111
        purge_all                       1
        purge_fs1                       0
        purge_total_entries             3748
        purge_vfs                       187
        purge_vp                        95
        snaptime                        99177.711093

vmstat -s:
     96080561 total name lookups (cache hits 38%)

>     echo ncsize/D | mdb -k
>     echo dnlc_nentries/D | mdb -k

ncsize:         600000
dnlc_nentries:  19230

Not quite the same..

> NFS does have a maximum number of rnodes which is calculated from the
> memory available. It doesn't look like nrnode_max can be overridden.

An rnode seems to take 472 bytes according to my test program.. which is
"a bit more" than the 64 bytes per dnlc entry in the ncsize docs..

> Having said that, I actually think your problem is lack of memory.
> Each ZFS vnode held by the DNLC uses a *lot* more memory than, say, a
> UFS one. Consequently the DNLC has to purge entries, and I suspect that
> with only 1GB the ZFS ARC doesn't allow many DNLC entries. I don't know
> if that number is maintained anywhere for you to check. Mark?

Current memory usage (for some values of usage ;):

# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                      95584               746   75%
Anon                        20868               163   16%
Exec and libs                1703                13    1%
Page cache                   1007                 7    1%
Free (cachelist)               97                 0    0%
Free (freelist)              7745                60    6%

Total                      127004               992
Physical                   125192               978

/Tomas
-- 
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Brian Wong
2006-Nov-09 20:54 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
eric kustarz wrote:

> If the ARC detects low memory (via arc_reclaim_needed()), then we call
> arc_kmem_reap_now() and subsequently dnlc_reduce_cache() - which
> reduces the # of DNLC entries by 3% (ARC_REDUCE_DNLC_PERCENT).
>
> So yeah, dnlc_nentries would be really interesting to see (especially
> if it's << ncsize).

The version of statit that we're using is still attached to ancient
32-bit counters that /are/ overflowing on our runs. I'm fixing this at
the moment and I'll send around a new binary this afternoon.

blw
Tomas Ögren
2006-Nov-09 21:05 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
On 09 November, 2006 - Tomas Ögren sent me these 4,4K bytes:

> On 09 November, 2006 - Neil Perrin sent me these 1,6K bytes:
>
>> NFS does have a maximum number of rnodes which is calculated from the
>> memory available. It doesn't look like nrnode_max can be overridden.
>
> An rnode seems to take 472 bytes according to my test program.. which is
> "a bit more" than the 64 bytes per dnlc entry in the ncsize docs..

But wait a minute.. I'm not interested in being an NFS client.. this is
a server.. so wasting ~100MB on NFS client stuff that will never be used
isn't that great.. Setting it to something really low and rebooting now..

/Tomas
-- 
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
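For the record, the knob I'm assuming here is the documented nfs:nrnode tunable; something like this in /etc/system before the reboot, with a token value since the client side won't really be used:

    * shrink the NFS client rnode cache on a box that is only an NFS server
    set nfs:nrnode = 256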
eric kustarz
2006-Nov-09 21:22 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Brian Wong wrote:

> eric kustarz wrote:
>
>> If the ARC detects low memory (via arc_reclaim_needed()), then we call
>> arc_kmem_reap_now() and subsequently dnlc_reduce_cache() - which
>> reduces the # of DNLC entries by 3% (ARC_REDUCE_DNLC_PERCENT).
>>
>> So yeah, dnlc_nentries would be really interesting to see (especially
>> if it's << ncsize).
>
> The version of statit that we're using is still attached to ancient
> 32-bit counters that /are/ overflowing on our runs. I'm fixing this at
> the moment and I'll send around a new binary this afternoon.
>
> blw

Spencer and I just fixed some statit bugs (such as getting it not to
core dump on a Thumper)... he has the changes, so I'd sync up with him
(I'm not sure if they are the same bugs, though).

eric
Robert Milkowski
2006-Nov-09 22:46 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Hello Tomas,

Thursday, November 9, 2006, 9:47:17 PM, you wrote:

TÖ> On 09 November, 2006 - Neil Perrin sent me these 1,6K bytes:

TÖ> Current memory usage (for some values of usage ;):
TÖ> # echo ::memstat | mdb -k
TÖ> Page Summary                Pages                MB  %Tot
TÖ> ------------     ----------------  ----------------  ----
TÖ> Kernel                      95584               746   75%
TÖ> Anon                        20868               163   16%
TÖ> Exec and libs                1703                13    1%
TÖ> Page cache                   1007                 7    1%
TÖ> Free (cachelist)               97                 0    0%
TÖ> Free (freelist)              7745                60    6%
TÖ> Total                      127004               992
TÖ> Physical                   125192               978

Well, when I raised ncsize on an NFS server I got memory pressure
problems. Leaving ncsize at the default solved the problem.

-- 
Best regards,
 Robert                            mailto:rmilkowski at task.gda.pl
                                   http://milek.blogspot.com
Neil Perrin
2006-Nov-10 00:45 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Tomas Ögren wrote On 11/09/06 13:47:

> [...]
>
> ncsize:         600000
> dnlc_nentries:  19230
>
> Not quite the same..
>
> [...]
>
> Current memory usage (for some values of usage ;):
> # echo ::memstat | mdb -k
> Page Summary                Pages                MB  %Tot
> ------------     ----------------  ----------------  ----
> Kernel                      95584               746   75%
> Anon                        20868               163   16%
> Exec and libs                1703                13    1%
> Page cache                   1007                 7    1%
> Free (cachelist)               97                 0    0%
> Free (freelist)              7745                60    6%
>
> Total                      127004               992
> Physical                   125192               978

This memory usage shows nearly all of memory consumed by the kernel, and
probably by ZFS. ZFS can't add any more DNLC entries without purging
others, due to lack of memory. This can be seen from dnlc_nentries being
way less than ncsize.

I don't know if there's a DMU or ARC bug filed to reduce the memory
footprint of their internal structures for situations like this, but we
are aware of the issue.

Neil.
Sanjeev Bagewadi
2006-Nov-10 12:25 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Comments inline...

Neil Perrin wrote:

> This memory usage shows nearly all of memory consumed by the kernel,
> and probably by ZFS. ZFS can't add any more DNLC entries without
> purging others, due to lack of memory. This can be seen from
> dnlc_nentries being way less than ncsize.
> I don't know if there's a DMU or ARC bug filed to reduce the memory
> footprint of their internal structures for situations like this, but
> we are aware of the issue.

Can you please check the zio buffers and the ARC status?

Here is how you can do it:

- Start mdb, i.e. mdb -k, and run:

      > ::kmem_cache

  In the output generated above, check the amount consumed by the
  zio_buf_*, arc_buf_t and arc_buf_hdr_t caches.

- Dump the values of the arc struct:

      > arc::print struct arc

  This should give you something like below:

  -- snip --
  > arc::print struct arc
  {
      anon = ARC_anon
      mru = ARC_mru
      mru_ghost = ARC_mru_ghost
      mfu = ARC_mfu
      mfu_ghost = ARC_mfu_ghost
      size = 0x3e20000      <-- tells you the current memory consumed by
                                the ARC buffers (including the memory
                                consumed for the data cached, i.e. zio_buf_*)
      p = 0x1d06a06
      c = 0x4000000
      c_min = 0x4000000
      c_max = 0x2f9aa800
      hits = 0x2fd2
      misses = 0xd1c
      deleted = 0x296
      skipped = 0
      hash_elements = 0xa85
      hash_elements_max = 0xcc0
      hash_collisions = 0x173
      hash_chains = 0xbe
      hash_chain_max = 0x2
      no_grow = 0           <-- this would be set to 1 if we have a
                                memory crunch
  }
  -- snip --

And as Neil pointed out, we would probably need some way of limiting the
ARC consumption.

Regards,
Sanjeev.

-- 
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel: x27521 +91 80 669 27521
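For convenience, both checks can be run non-interactively; a rough sketch using only the commands above (the member names are the ones shown in the dump):

    #!/bin/sh
    # summarize the ZFS buffer caches and the ARC state (run as root)
    echo "::kmem_cache" | mdb -k | egrep 'zio_buf_|arc_buf'
    echo "arc::print struct arc size c c_min c_max no_grow" | mdb -k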
Tomas Ögren
2006-Nov-10 13:32 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
On 10 November, 2006 - Sanjeev Bagewadi sent me these 3,5K bytes:

> Can you please check the zio buffers and the ARC status?
>
> Here is how you can do it:
> - Start mdb, i.e. mdb -k, and run:
>       > ::kmem_cache
>   In the output generated above, check the amount consumed by the
>   zio_buf_*, arc_buf_t and arc_buf_hdr_t caches.

ADDR             NAME                      FLAG    CFLAG  BUFSIZE  BUFTOTL

0000030002640a08 zio_buf_512               0000   020000      512   102675
0000030002640c88 zio_buf_1024              0200   020000     1024       48
0000030002640f08 zio_buf_1536              0200   020000     1536       70
0000030002641188 zio_buf_2048              0200   020000     2048       16
0000030002641408 zio_buf_2560              0200   020000     2560        9
0000030002641688 zio_buf_3072              0200   020000     3072       16
0000030002641908 zio_buf_3584              0200   020000     3584       18
0000030002641b88 zio_buf_4096              0200   020000     4096       12
0000030002668008 zio_buf_5120              0200   020000     5120       32
0000030002668288 zio_buf_6144              0200   020000     6144        8
0000030002668508 zio_buf_7168              0200   020000     7168     1032
0000030002668788 zio_buf_8192              0200   020000     8192        8
0000030002668a08 zio_buf_10240             0200   020000    10240        8
0000030002668c88 zio_buf_12288             0200   020000    12288        4
0000030002668f08 zio_buf_14336             0200   020000    14336      468
0000030002669188 zio_buf_16384             0200   020000    16384     3326
0000030002669408 zio_buf_20480             0200   020000    20480       16
0000030002669688 zio_buf_24576             0200   020000    24576        3
0000030002669908 zio_buf_28672             0200   020000    28672       12
0000030002669b88 zio_buf_32768             0200   020000    32768     1935
000003000266c008 zio_buf_40960             0200   020000    40960       13
000003000266c288 zio_buf_49152             0200   020000    49152        9
000003000266c508 zio_buf_57344             0200   020000    57344        7
000003000266c788 zio_buf_65536             0200   020000    65536     3272
000003000266ca08 zio_buf_73728             0200   020000    73728       10
000003000266cc88 zio_buf_81920             0200   020000    81920        7
000003000266cf08 zio_buf_90112             0200   020000    90112        5
000003000266d188 zio_buf_98304             0200   020000    98304        7
000003000266d408 zio_buf_106496            0200   020000   106496       12
000003000266d688 zio_buf_114688            0200   020000   114688        6
000003000266d908 zio_buf_122880            0200   020000   122880        5
000003000266db88 zio_buf_131072            0200   020000   131072       92

0000030002670508 arc_buf_hdr_t             0000   000000      128    11970
0000030002670788 arc_buf_t                 0000   000000       40     7308

> - Dump the values of the arc struct:
>       > arc::print struct arc

> arc::print struct arc
{
    anon = ARC_anon
    mru = ARC_mru
    mru_ghost = ARC_mru_ghost
    mfu = ARC_mfu
    mfu_ghost = ARC_mfu_ghost
    size = 0x6f7a400
    p = 0x5d9bd5a
    c = 0x5f6375a
    c_min = 0x4000000
    c_max = 0x2e82a000
    hits = 0x40e0a15
    misses = 0x1cec4a4
    deleted = 0x1b0ba0d
    skipped = 0x24ea64e13
    hash_elements = 0x179d
    hash_elements_max = 0x60bb
    hash_collisions = 0x8dca3a
    hash_chains = 0x391
    hash_chain_max = 0x8
    no_grow = 0x1
}

So, about 100MB and a memory crunch..

> And as Neil pointed out, we would probably need some way of limiting
> the ARC consumption.

/Tomas
-- 
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Sanjeev Bagewadi
2006-Nov-13 05:25 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Tomas,

Comments inline...

Tomas Ögren wrote:

> > arc::print struct arc
> {
>     [...]
>     size = 0x6f7a400
>     p = 0x5d9bd5a
>     c = 0x5f6375a
>     c_min = 0x4000000
>     c_max = 0x2e82a000
>     [...]
>     no_grow = 0x1
> }
>
> So, about 100MB and a memory crunch..

Interesting! So it is not the ARC which is consuming too much memory...
It is some other piece (not sure if it belongs to ZFS) which is causing
the crunch...

Or the other possibility is that the ARC ate up too much and caused a
near-crunch situation, the kmem allocator hit back, and the ARC freed up
its buffers (hence the no_grow flag being enabled). So the ARC could be
oscillating between caching a lot and then purging its caches.

You might want to keep track of these values (the ARC size and the
no_grow flag) and see how they change over a period of time. This would
help us understand the pattern.

And if we know it is the ARC which is causing the crunch, we could
manually change the value of c_max to a comfortable value, and that
would limit the size of the ARC. However, I would suggest that you try
it out on a non-production machine first.

By default, c_max is set to 75% of physmem and that is the hard limit.
"c" is the soft limit and the ARC will try to grow up to "c". The value
of "c" is adjusted when there is a need to cache more, but it will never
exceed c_max.

Regarding the huge number of reads, I am sure you have already tried
disabling the VDEV prefetch. If not, it is worth a try.

Thanks and regards,
Sanjeev.

-- 
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel: x27521 +91 80 669 27521
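If you do want to try capping c_max, something along these lines should work; the address differs per boot, so treat this purely as a sketch and double-check before poking a live kernel:

    # 1. find the address of arc.c_max
    echo "arc::print -a struct arc c_max" | mdb -k
    # 2. write the new cap (example: 0x10000000 = 256MB) at that address,
    #    replacing <addr> with the address printed above
    echo "<addr>/Z 0x10000000" | mdb -kw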
Tomas Ögren
2006-Nov-13 09:51 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
On 13 November, 2006 - Sanjeev Bagewadi sent me these 7,1K bytes:

> You might want to keep track of these values (the ARC size and the
> no_grow flag) and see how they change over a period of time. This
> would help us understand the pattern.

I would guess it grows after boot until it hits some max and then stays
there.. but I can check it out..

> And if we know it is the ARC which is causing the crunch, we could
> manually change the value of c_max to a comfortable value, and that
> would limit the size of the ARC.

But in the ZFS world, DNLC is part of the ARC, right?
My original question was how to get rid of "data cache", but keep
"metadata cache" (such as DNLC)...

> Regarding the huge number of reads, I am sure you have already tried
> disabling the VDEV prefetch. If not, it is worth a try.

That was part of my original question, how? :)

/Tomas
-- 
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Roch - PAE
2006-Nov-13 13:41 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Tomas Ögren writes:

> But in the ZFS world, DNLC is part of the ARC, right?
> My original question was how to get rid of "data cache", but keep
> "metadata cache" (such as DNLC)...

Under memory pressure the ARC will shrink, and it will also shrink the
DNLC by 3%:

    arc_reduce_dnlc_percent = 3

You could try to tune that number.

-r
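For experimenting, something like this ought to do it on a live kernel (check the read first; I have not verified this on snv_48 bits):

    # current value
    echo "arc_reduce_dnlc_percent/D" | mdb -k
    # make memory pressure trim the DNLC less aggressively, e.g. 1%
    echo "arc_reduce_dnlc_percent/W 0t1" | mdb -kw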
Eric Kustarz
2006-Nov-13 21:40 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Tomas Ögren wrote:

> On 13 November, 2006 - Sanjeev Bagewadi sent me these 7,1K bytes:
>
>> Regarding the huge number of reads, I am sure you have already tried
>> disabling the VDEV prefetch. If not, it is worth a try.
>
> That was part of my original question, how? :)

On recent bits, you can set 'zfs_vdev_cache_max' to 1 to disable the
vdev cache.

eric
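In /etc/system that would presumably look like the first line below, with the mdb write as the live-kernel equivalent; I haven't tried either on your exact bits, so take it as a sketch:

    * effectively disable the vdev cache by capping it at 1 byte
    set zfs:zfs_vdev_cache_max = 1

    # or, on a running kernel:
    echo "zfs_vdev_cache_max/W 0t1" | mdb -kw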
Tomas Ögren
2006-Nov-14 17:23 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
On 13 November, 2006 - Eric Kustarz sent me these 2,4K bytes:

> On recent bits, you can set 'zfs_vdev_cache_max' to 1 to disable the
> vdev cache.

On earlier versions (snv_48), I did something similar with ztune.sh[0],
adding cache_size, which I set to 0 (instead of 10M). This helped quite
a lot, but there seems to be one more level of prefetching..

Example:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
ftp         1.67T  2.15T  1.26K     23  40.9M   890K
  raidz2    1.37T   551G    674     10  22.3M   399K
    c4t0d0      -      -    210      3  3.19M  80.4K
    c4t1d0      -      -    211      3  3.19M  80.4K
    c4t2d0      -      -    211      3  3.19M  80.4K
    c5t0d0      -      -    210      3  3.19M  80.4K
    c5t1d0      -      -    242      4  3.19M  80.4K
    c5t2d0      -      -    211      3  3.19M  80.4K
    c5t3d0      -      -    211      3  3.19M  80.4K
  raidz2     305G  1.61T    614     12  18.6M   491K
    c4t3d0      -      -    222      5  2.66M  99.1K
    c4t4d0      -      -    223      5  2.66M  99.1K
    c4t5d0      -      -    224      5  2.66M  99.1K
    c4t8d0      -      -    190      5  2.66M  99.1K
    c5t4d0      -      -    190      5  2.66M  99.1K
    c5t5d0      -      -    226      5  2.66M  99.1K
    c5t8d0      -      -    225      5  2.66M  99.1K
----------  -----  -----  -----  -----  -----  -----

Before this fix, the 'read bandwidth' for the disks in the first raidz2
added up to way more than the raidz2 itself.. Now it adds up correctly,
but some other readahead still causes a 1-10x factor too much, mostly
hovering around 2-3x.. Before, it was hovering around 8-10x..

[0]: http://blogs.sun.com/roch/resource/ztune.sh

/Tomas
-- 
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Sanjeev Bagewadi
2006-Nov-15 09:43 UTC
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Tomas,

Apologies for the delayed response...

Tomas Ögren wrote:

> I would guess it grows after boot until it hits some max and then stays
> there.. but I can check it out..

No, that is not true. It shrinks when there is memory pressure. The
values of 'c' and 'p' are adjusted accordingly.

> But in the ZFS world, DNLC is part of the ARC, right?

Not really... ZFS uses the regular DNLC for lookup optimization.
However, the metadata/data is cached in the ARC.

> My original question was how to get rid of "data cache", but keep
> "metadata cache" (such as DNLC)...

This is a good question. AFAIK the ARC does not really differentiate
between metadata and data, so I am not sure if we can control it.
However, as I mentioned above, ZFS still uses the DNLC caching.

> That was part of my original question, how? :)

Apologies :-) I was digging around the code and I find that
zfs_vdev_cache_bshift is the one which controls the amount that is read.
Currently it is set to 16, so we should be able to modify this and
reduce the prefetch. However, I will have to double-check with more
people and get back to you.

Thanks and regards,
Sanjeev.

-- 
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel: x27521 +91 80 669 27521
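To make the units concrete: a bshift of 16 means the vdev cache inflates reads to 2^16 = 64KB each. If you just want to peek at it (or experimentally lower it) while I check, something like this should work, with the usual caveat that I have not verified it on snv_48:

    # current value (expect 16, i.e. 64KB inflated reads)
    echo "zfs_vdev_cache_bshift/D" | mdb -k
    # drop it to 13 (8KB) on a live kernel and watch iostat / zpool iostat
    echo "zfs_vdev_cache_bshift/W 0t13" | mdb -kw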