Hi,

Please forgive me if my searching-fu has failed me in this case, but I've
been unable to find any information on how people are going about
monitoring and alerting regarding memory usage on Solaris hosts using ZFS.

The problem is not that the ZFS ARC is using up the memory, but that the
script Nagios is using to check memory usage simply sees, say, 96% RAM
used, and alerts.

The options I can see, and the risks I see with them, are:

1) Raise the alert thresholds so that both (warn and crit) are above the
maximum the ARC should let itself reach. The problem is I can see those
being in the order of 98/99%, which doesn't leave a lot of room for
response if memory usage is headed towards 100%.

2) Alter the warning script to "ignore" the ARC cache and do alerting
based on what's left, perhaps with a third threshold somewhere above where
the ARC should let things get, in case for some reason the ARC isn't
returning memory to apps (rough sketch at the end of this mail). The risk
I see here is that ignoring the ARC may present other odd scenarios where
I'm essentially ignoring what's causing the memory problems.

So how are others monitoring memory usage on ZFS servers?

I've read (but can't find a written reference) that the ARC limits itself
such that 1GB of memory is always free. Is that a hard-coded number? Is
there a bit of leeway around it, or can I rely on that exact number of
bytes being free unless there is impending 100% memory utilisation?

Regards,
Troy Nancarrow
Systems Architect - UNIX Systems
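PS: a rough sketch of what I mean by option 2, in case it helps. This is
a sketch only, assuming the usual kstat names are right; the thresholds
are placeholders, not recommendations:

  #!/bin/sh
  # Report percent of RAM used with the ARC counted as reclaimable
  # (i.e. effectively free), then alert Nagios-style.
  WARN=90
  CRIT=95

  PGSZ=`pagesize`
  PCT=`kstat -p unix:0:system_pages:physmem \
           unix:0:system_pages:freemem \
           zfs:0:arcstats:size | nawk -v pgsz=$PGSZ '
      /physmem/       { phys = $2 * pgsz }  # pages -> bytes
      /freemem/       { free = $2 * pgsz }  # pages -> bytes
      /arcstats:size/ { arc  = $2 }         # already in bytes
      END { printf("%d\n", (phys - free - arc) * 100 / phys) }'`

  if [ "$PCT" -ge "$CRIT" ]; then
      echo "CRITICAL - ${PCT}% RAM used (ARC excluded)"; exit 2
  elif [ "$PCT" -ge "$WARN" ]; then
      echo "WARNING - ${PCT}% RAM used (ARC excluded)"; exit 1
  fi
  echo "OK - ${PCT}% RAM used (ARC excluded)"; exit 0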
On Wed, May 6, 2009 at 1:08 PM, Troy Nancarrow (MEL)
<Troy.Nancarrow at foxtel.com.au> wrote:
> So how are others monitoring memory usage on ZFS servers?

I think you can get the amount of memory the ZFS ARC uses with
arcstat.pl:
http://www.solarisinternals.com/wiki/index.php/Arcstat

IMHO it's probably best to set a limit on ARC size and treat it like any
other memory used by applications.

Regards,
Fajar
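Invocation works like vmstat's, e.g. (if I remember the columns right,
"arcsz" is the current ARC size and "c" its current target):

  # One line of ARC statistics every 5 seconds.
  ./arcstat.pl 5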
Troy Nancarrow (MEL) schrieb:
> I've been unable to find any information on how people are going about
> monitoring and alerting regarding memory usage on Solaris hosts using
> ZFS.
> [...]
> So how are others monitoring memory usage on ZFS servers?

The ZFS Evil Tuning Guide contains a description of how to limit the ARC
size. Look here:
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Solaris_10_8.2F07_and_Solaris_Nevada_.28snv_51.29_Releases

Concerning monitoring of ARC size, I use (of course) my own tool called
sysstat. It shows all key system metrics on one terminal page, similar
to top. You can get it here:
http://www.maier-komor.de/sysstat.html

HTH,
Thomas
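For the archives: the mechanism the guide describes is a single line in
/etc/system, which takes effect at the next reboot. The 4 GByte value
below is only an example; size it to your workload:

  * Cap the ZFS ARC at 4 GBytes (the value is in bytes).
  set zfs:zfs_arc_max = 0x100000000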
Fajar A. Nugraha wrote:
> On Wed, May 6, 2009 at 1:08 PM, Troy Nancarrow (MEL)
> <Troy.Nancarrow at foxtel.com.au> wrote:
>> So how are others monitoring memory usage on ZFS servers?
>
> I think you can get the amount of memory the ZFS ARC uses with
> arcstat.pl:
> http://www.solarisinternals.com/wiki/index.php/Arcstat

arcstat is a perl script which reads the ARC kstats (mainly because
there is a nifty perl module which reads kstats, which is also why
/usr/bin/kstat is written in perl). You can always do similar.

> IMHO it's probably best to set a limit on ARC size and treat it like
> any other memory used by applications.

There are a few cases where this makes sense, but not many. The ARC will
shrink, as needed. With the new write throttle, the perception of ARC
dominance is significantly reduced. The main reason to limit ARC today
is to prevent the large-page/small-page change that can happen if you
need to restart a large-page-using application (e.g. databases).
-- richard
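For example, straight from the shell (both values are in bytes):

  # Current ARC size and its maximum target:
  kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max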
On Wed, 6 May 2009, Troy Nancarrow (MEL) wrote:
> The problem is not that the ZFS ARC is using up the memory, but that
> the script Nagios is using to check memory usage simply sees, say,
> 96% RAM used, and alerts.

Memory is meant to be used. 96% RAM use is good, since it represents an
effective use of your investment.

In the old days there was concern about whether an application and its
data could fit in RAM; if there was a shortage of RAM, there could be a
lot of activity on the swap device and the system would slow to a crawl.

Nowadays RAM in server applications is primarily used as cache. It could
be caching application executable pages, shared library executable
pages, memory-mapped file pages, and normal filesystem data. Caching is
good, so if a busy server shows a lot of memory free, then perhaps
someone wasted money purchasing more RAM than was needed. Or perhaps
this extra memory is in reserve for that unusual high-load day.

If there is insufficient RAM, there may still be 96% RAM in use, but
since there is less useful caching, other metrics such as hard paging
rates or disk access rates may go up, leading to a slow system.

It seems like this Nagios script is not very useful, since the notion of
"free memory" has become antiquated.

Bob
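One concrete signal for that on Solaris: the page scanner only runs when
free memory is genuinely short, so a sustained nonzero scan rate says
far more than any percent-used figure:

  # Watch the "sr" (scan rate) column; sustained nonzero values mean
  # real memory pressure, whatever the %RAM-used number claims.
  vmstat 5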
Bob Friesenhahn wrote:
> On Wed, 6 May 2009, Troy Nancarrow (MEL) wrote:
>> The problem is not that the ZFS ARC is using up the memory, but that
>> the script Nagios is using to check memory usage simply sees, say,
>> 96% RAM used, and alerts.
>
> Memory is meant to be used. 96% RAM use is good, since it represents
> an effective use of your investment.

Actually, I think a percentage of RAM is a bogus metric to measure. For
example, on a 2 TByte system, keeping 4% free means leaving 80 GBytes
idle. Perhaps you should look for a more meaningful threshold.
-- richard
On Wed, 6 May 2009, Richard Elling wrote:
>> Memory is meant to be used. 96% RAM use is good, since it represents
>> an effective use of your investment.
>
> Actually, I think a percentage of RAM is a bogus metric to measure.
> For example, on a 2 TByte system, keeping 4% free means leaving 80
> GBytes idle. Perhaps you should look for a more meaningful threshold.

So percent of memory consumed is not a useful efficiency metric? Is this
true even if a double precision floating point value is used? :-)

It seems like a more useful measure (on a server) is a caching
efficiency metric. If the cache hit ratios are poor yet the cache is
continually being loaded with new data, then there may be a resource
availability issue.

Bob
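The raw counters for such a metric are already exported by the ARC
(cumulative since boot, so sample twice and diff for a current rate):

  # Hit-ratio inputs: hits / (hits + misses).
  kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses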
Ben Rockwood's written a very useful util called arc_summary:
http://www.cuddletech.com/blog/pivot/entry.php?id=979

It's really good for looking at ARC usage (including memory usage).

You might be able to make some guesses based on "kstat -n zfs_file_data"
and "kstat -n zfs_file_data_buf". Look for mem_inuse.

Running "::memstat" in "mdb -k" also shows kernel memory usage (which
probably includes ZFS overhead) and "ZFS File Data" memory usage. But
it's painfully slow to run. kstat is probably better.

-Paul Choi
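Spelled out as commands (the exact kstat names may vary by release):

  # ZFS file data cache accounting; mem_inuse is in bytes:
  kstat -n zfs_file_data
  kstat -n zfs_file_data_buf

  # Kernel-wide page accounting, including "ZFS File Data"
  # (slow on large-memory machines):
  echo "::memstat" | mdb -k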
Bob Friesenhahn wrote:
> It seems like this Nagios script is not very useful, since the notion
> of "free memory" has become antiquated.

Not true. The script is simply not intelligent enough. There are really
3 broad kinds of RAM usage:

A) Unused
B) Unfreeable by the kernel (normal process memory)
C) Freeable by the kernel (buffer cache, ARC, etc.)

Monitoring usually should focus on keeping (A+C) above some threshold.
On Solaris, this means parsing some rather obscure kstats, sadly (not
that Linux's /proc/meminfo is much better). Or has vmstat grown more
intelligence when I wasn't looking?

-- Carson
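For a first approximation, these are the kstats in question; note this
counts only the ARC toward C, so other freeable caches are missed:

  # A (unused), in pages; multiply by `pagesize` for bytes:
  kstat -p unix:0:system_pages:freemem
  # The ARC's share of C (kernel-freeable), already in bytes:
  kstat -p zfs:0:arcstats:size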
On Wed, May 6, 2009 at 10:17 PM, Richard Elling
<richard.elling at gmail.com> wrote:
> Fajar A. Nugraha wrote:
>> IMHO it's probably best to set a limit on ARC size and treat it like
>> any other memory used by applications.
>
> There are a few cases where this makes sense, but not many. The ARC
> will shrink, as needed. With the new write throttle, the perception of
> ARC dominance is significantly reduced. The main reason to limit ARC
> today is to prevent the large-page/small-page change that can happen
> if you need to restart a large-page-using application (e.g. databases).

I got a recommendation to limit ARC size when using zfs in xen dom0:
https://opensolaris.org/jive/thread.jspa?messageID=355579

Is it among the "not many" cases, or is it irrelevant today with the new
write throttle?
Fajar A. Nugraha wrote:
> I got a recommendation to limit ARC size when using zfs in xen dom0:
> https://opensolaris.org/jive/thread.jspa?messageID=355579
>
> Is it among the "not many" cases, or is it irrelevant today with the
> new write throttle?

I see no justification in that post for limiting memory use. You can
easily measure memory usage with vmstat, and ARC-specific usage with
arcstat, and see if you are running into a memory shortfall. If not,
then the recommendation may be misguided.
-- richard
Carson Gaspar wrote:
> Not true. The script is simply not intelligent enough. There are
> really 3 broad kinds of RAM usage:
>
> A) Unused
> B) Unfreeable by the kernel (normal process memory)
> C) Freeable by the kernel (buffer cache, ARC, etc.)
>
> Monitoring usually should focus on keeping (A+C) above some threshold.
> On Solaris, this means parsing some rather obscure kstats, sadly (not
> that Linux's /proc/meminfo is much better).

B) is freeable, but requires moving pages to spinning rust. There's a
subset of B (call it B1) that is the active processes' working sets,
which are basically useless to swap out since they'll be swapped right
back in again.

Two other important types of RAM usage in many modern situations:

D) Unpageable (pinned) memory
E) Memory that is presented to the OS but that is thin-provisioned by a
hypervisor or other virtualization layer (use of this memory may mean
that the hypervisor moves pages to spinning rust).

For virtualized systems, you should limit the size of A+B1+C so that it
does not extend into memory of type E. There's no point in having data
in the ARC if the hypervisor has to go to disk to get it. Considering
that the size of E depends on the memory demands on the host server
(which the guest has no insight into), this is a Very Hard problem.
Often this is arranged by having the hypervisor break the virtualization
containment via a memory management driver (VMware Tools provides a
memory control, for example) which steals pages of virtual-chip memory
to avoid hypervisor swapping.

--Joe
On Thu, 7 May 2009, Moore, Joe wrote:
> B) is freeable, but requires moving pages to spinning rust. There's a
> subset of B (call it B1) that is the active processes' working sets,
> which are basically useless to swap out since they'll be swapped right
> back in again.

Yes. Solaris memory use is much more complex than can be described by a
simple A, B, C. Using pmap on a running process reveals this complexity,
yet this only shows the complexity from a process's view, and excludes
usages such as the zfs ARC.

Bob
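For instance (1234 is a placeholder PID; -x adds per-mapping resident,
anonymous and locked columns):

  # Per-mapping memory breakdown for one process, plus a total line:
  pmap -x 1234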