Florian Heigl
2011-Feb-26 14:34 UTC
[Xen-devel] "right" way to gather domU stats in xen 3 & 4?
Hi all,

I'm building a xen agent for nagios / check_mk. Automatic inventory of VMs and the basic up / down reporting are reliable now, and I'm looking at the next items on my list.

* Free memory. This seems easy at first: look at xm info and that's mostly it. I can have a different color for memory allocated to dom0 minus the dom0 lower balloon limit, but I'll also have a check that will go to full alarm if anyone is crazy enough to use dom0 ballooning. ;)
  What I don't know is whether I also need to subtract something for the Xen heap. Long ago it used to default to 32MB, I think. Can someone clue me in about that - is it relevant to xm info free / total mem?

* Per domU I also wanna look at memory statistics.
  - One thing is mem vs. mem-max, to show ballooning.
  - The other thing is tmem: I don't know if I should spend the time getting it right, as I'm getting the impression that since it was added by Dan, and now tmem2 was added, two and a half years have gone by in which it's considered working and implemented but nobody bothers to make it work for everyone, e.g. the recent remark that the direct ballooning daemon was just a lab exercise ;) If you know of any people that successfully run xen with tmem2 and such, I'd love to work with them to build the nagios-sy statistics. Otherwise I'll save myself the headaches.

* Per domU cpu percent (to show how much of the dom0 power the vm is consuming)...

Speed issues:
Usually checks in check_mk are fired off every minute, so it would be good if I could go directly via xenstore to collect and report my data within 1-2 seconds or less. Speed seems to be an issue I have to worry about - on my "top of the shelf" xen host it takes around 0.6 seconds to query a meager 5 VMs. That's just a 1.5GHz VIA box, but I'll have to see how long it takes for 100 VMs or more.

Documentation??
What I'm missing is some document that'd show all nodes in the xenstore that are readable. I've poked around a lot already but the statistics are hiding from me.
Also I would try to use something that can work in xen4 and xen3. But that's not mandatory, I can fall back from xl to xenstore-read to xm to libvirt.

Why you might want to help:
Using check_mk you can pull off all kinds of crazy stuff with the data it collects:
* trend analysis on disk usage ("simple" example: get an alert if your vm store is growing at a rate that will let it run out of space in 3 days)
* if somebody feels they need it, use the block IO rates to trigger an event handler that will put io & cpu caps on a VM (hosters might love that :)
I think most of these features are not implemented in any nagios checks so far.

If I just hack it in ksh, it *will work*, but be ugly and slow :)
And of course you won't have to bother with any config files to add a VM!

Maybe someone likes xenstore *a lot* and can point me at the right spots.

Florian

p.s.: could interested parties consider spending a day to improve the xm list output? It may technically make sense that a vm created using xm new has no ID and no status instead of "-------", and that a VM that is running but didn't use CPU during the microsecond we queried it is shown as blocked. But it has made life harder for each and every xen user for 5 or 6 years now, and technical reasons really don't cut it if they turn information into worthless bytes. (I still feel you would get an "-r-----" state most of the time back in Xen2...)

--
the purpose of libvirt is to provide an abstraction layer hiding all xen features added since 2006 until they were finally understood and copied by the kvm devs.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
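The "run out of space in 3 days" trend check mentioned above can be sketched roughly as follows. This is only an illustration, not how check_mk implements its trend analysis: the function name `days_until_full`, the sample format and the least-squares fit are all assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple


def days_until_full(samples: Sequence[Tuple[float, float]],
                    capacity: float) -> Optional[float]:
    """Estimate how many days remain until `capacity` is reached.

    `samples` is a list of (day, used) pairs, oldest first. The growth rate
    is the slope of a least-squares line through the samples. Returns None
    when usage is flat or shrinking, or when there is too little data.
    (Hypothetical helper for illustration only.)
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples taken at the same time
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None  # not growing, no fill date to project
    _, last_used = samples[-1]
    # Project forward from the most recent measurement.
    return max(0.0, (capacity - last_used) / slope)


if __name__ == "__main__":
    # Made-up numbers: GB used on three consecutive days, 150 GB store.
    history = [(0, 100), (1, 110), (2, 120)]
    remaining = days_until_full(history, capacity=150)
    if remaining is not None and remaining <= 3:
        print(f"WARN - store full in about {remaining:.1f} days")
```

An alert threshold of 3 days is just the figure from the example in the text; a real check would make it configurable.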
Stefano Stabellini
2011-Feb-28 11:47 UTC
Re: [Xen-devel] "right" way to gather domU stats in xen 3 & 4?
On Sat, 26 Feb 2011, Florian Heigl wrote:
> Hi all,
>
> I'm building a xen agent for nagios / check_mk.
> Automatic inventory of VMs and the basic up / down reporting are
> reliable now, and I'm looking at the next items on my list.

It looks like an interesting and useful project.

> * Free memory. This seems easy at first, look at xm info and that's
> mostly it. I can have a different color for memory allocated to dom0
> minus the dom0 lower balloon limit, but I'll also have a check that
> will go to full alarm if anyone is crazy enough to use dom0 ballooning.
> ;)
> What I don't know is if I also need to subtract something for the
> Xen heap? Long ago it used to default to 32MB i think. Can someone
> clue me in about that - is it relevant to xm info free / total mem?

libxenlight provides a function called libxl_get_free_memory that returns the amount of free memory in the system. You can call it directly (adding a libxenlight dependency to your code) or you could simply take a look at the implementation (tools/libxl/libxl.c:libxl_get_free_memory).
Also, on hosts managed by libxenlight there is an additional xenstore node called /local/domain/0/memory/freemem-slack that contains the amount of memory that is going to be left free for Xen.

In case you are wondering, xen 4.1 is going to ship with two toolstacks: the old xend, and a new one that consists of a library called libxenlight plus a minimal C utility called xl to invoke the library functions. xl/libxenlight are recommended over xend.

> * per domU I also wanna look at memory statistics.
> - one thing is: mem vs. mem-max to show ballooning.
> - the other thing is tmem: i don't know if i should spend the time
> getting it right as I start getting the impression that since it was
> added by Dan and now tmem2 was added, two and a half years have gone by
> in which it's considered working and implemented but nobody bothers to
> make it work for everyone, e.g. the recent remark that the direct
> ballooning daemon was just a lab exercise ;) If you know of any people that
> successfully run xen with tmem2 and such, I'd love to work with them
> to build the nagios-sy statistics. Otherwise I'll save myself the
> headaches.
>
> * per domU cpu percent (to show how much of the dom0 power the vm is
> consuming)...
>
>
> Speed issues:
> Usually checks in check_mk are fired off every minute, so it would be
> good if I could go directly via xenstore to collect and report my data
> within 1-2 seconds or less. Speed seems to be an issue I have to worry
> about - on my "top of the shelf" xen host it takes around
> 0.6 seconds to query a meager 5 VMs.
> That's just a 1.5GHz VIA box, but I'll have to see how long it takes
> for 100 VMs or more.

Xenstore can become very busy on systems with many VMs running.
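For an agent that does not want to link against libxenlight, the free-memory figure can also be taken from the text output of `xm info` or `xl info`. The sketch below assumes that output consists of `key : value` lines with `free_memory` and `total_memory` given in MiB, and that the freemem-slack value read from xenstore is in KiB; both unit assumptions should be checked against the Xen version in use. The helper names and the sample text are made up for illustration.

```python
def parse_info(text: str) -> dict:
    """Parse 'key : value' lines, as printed by `xm info` / `xl info`."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only; values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def usable_free_mib(info: dict, slack_kib: int = 0) -> int:
    """Free memory in MiB after subtracting the slack reserved for Xen.

    `slack_kib` is assumed to come from the xenstore node
    /local/domain/0/memory/freemem-slack (assumed unit: KiB).
    """
    free_mib = int(info["free_memory"])
    return max(0, free_mib - slack_kib // 1024)


# Hypothetical excerpt of `xm info` output, not taken from a real host.
SAMPLE = """\
host                   : xenhost1
total_memory           : 8190
free_memory            : 1024
"""

if __name__ == "__main__":
    info = parse_info(SAMPLE)
    print("total:", info["total_memory"], "MiB")
    print("usable free:", usable_free_mib(info, slack_kib=131072), "MiB")
```

Parsing the whole output once into a dictionary keeps the number of tool invocations per check at one, which matters given the timing concerns raised in the thread.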
Florian Heigl
2011-Feb-28 23:16 UTC
Re: [Xen-devel] "right" way to gather domU stats in xen 3 & 4?
Hi Stefano,

first of all, thanks for your reply!

2011/2/28 Stefano Stabellini <stefano.stabellini@eu.citrix.com>:
> On Sat, 26 Feb 2011, Florian Heigl wrote:
>> Hi all,
>>
>> I'm building a xen agent for nagios / check_mk.
>> Automatic inventory of VMs and the basic up / down reporting are
>> reliable now, and I'm looking at the next items on my list.
>
> it looks like an interesting and useful project

I hope it'll be helpful; it definitely works well for me. Things are a lot easier if you can just say "scan for any VMs on that host" and then they're monitored / assigned to clusters.
( you can read here if you wanna:
http://deranfangvomende.wordpress.com/2011/02/09/check_mk-xen-plugin-online/ ).

I'm a *very* great fan of libxenlight. Many years ago there was "libxen", which wasn't brought over to Xen3, and it was really time for a new fast tool "to rule them all". (i just had to).

The host-side agent is very small, and thus it'll just be in /bin/sh and use xm/xl as available. I could use python, too, but if libxenlight is around the corner i don't wanna re-introduce a python dependency :)
I'm gonna trash the local agent code a few more times since it's neither elegant nor fast yet.
Both shell and python should work on Linux/NetBSD/Solaris. On the other hand, the python bindings as shown at http://wiki.xensource.com/xenwiki/XenApi are probably completely outdated, and libxenlight is only available on Xen4.1, which severely limits its usability right now.

Not sure how to go about this, but I think it will pay off to start simple with "xm", not thinking about performance impact, and then rewrite the host agent later on to mostly use xl via e.g. python.

I understand I gave too much thought to free memory and how much of it is used by dom0/hypervisor/free. Apart from the free memory, nobody ever cares, me included. On most of my hosts I couldn't say how much "total_mb" they display, because I just look at the "free_mb". So that point is sorted.

I will try digging into xentop over the next days, as the main magic of breaking down stats per domU is still open. I hope I will find other data than cpu seconds used, because that would mean UGLY calculations (in theory: divide the seconds used by the domain by the uptime multiplied by the number of cores?)

Any comment about tmem / balloon would still be great... why doesn't anyone jump when our coolest features are mentioned? :)
I think it's important to make them visible to the general users...

>> That's just a 1.5GHz VIA box, but I'll have to see how long it takes
>> for 100 VMs or more.
>
> Xenstore can become very busy on systems with many VMs running.

So, any advice? Obviously, limiting my queries is the main trick, but it seems the tools do a lot of calls internally.

I wonder if that post about xenstore IO performance
http://xen.1045712.n5.nabble.com/Revisiting-XenD-XenStored-performance-scalability-issues-td2504870.html
still applies. I'll try the ramdisk hack he described out of curiosity.

Florian

--
the purpose of libvirt is to provide an abstraction layer hiding all xen features added since 2006 until they were finally understood and copied by the kvm devs.
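If cumulative CPU seconds turn out to be the only figure available, the calculation is less ugly when done over the interval between two checks rather than over the whole uptime. A minimal sketch, assuming the agent stores the previous sample and that the percentage is expressed relative to all host CPUs; the function and parameter names are invented for this example.

```python
def cpu_percent(cpu_s_prev: float, cpu_s_now: float,
                wall_s_prev: float, wall_s_now: float,
                host_cpus: int) -> float:
    """Share of total host CPU capacity a domain used between two samples.

    cpu_s_*   cumulative CPU seconds consumed by the domain
    wall_s_*  wall-clock timestamps of the two samples, in seconds
    host_cpus number of physical CPUs the percentage is relative to
    """
    elapsed = wall_s_now - wall_s_prev
    if elapsed <= 0 or host_cpus <= 0:
        raise ValueError("need a positive interval and CPU count")
    # A counter that went backwards (e.g. domain restarted) counts as zero.
    used = max(0.0, cpu_s_now - cpu_s_prev)
    return 100.0 * used / (elapsed * host_cpus)


if __name__ == "__main__":
    # Made-up samples taken 60 s apart on a 4-core host:
    # the domain burned 30 CPU seconds in that minute.
    print(cpu_percent(100.0, 130.0, 0.0, 60.0, host_cpus=4))
```

Dropping `host_cpus` from the divisor would instead give the percentage of a single core, which can exceed 100 for multi-vcpu domains.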
Stefano Stabellini
2011-Mar-01 11:13 UTC
Re: [Xen-devel] "right" way to gather domU stats in xen 3 & 4?
On Mon, 28 Feb 2011, Florian Heigl wrote:
> Any comment about tmem / balloon would still be great... why doesn't
> anyone jump when our coolest features are mentioned? :)
> I think it's important to make them visible to the general users...

Ballooning shouldn't be difficult: it is just a matter of reading memory/target and memory/static-max from xenstore. You could also read the actual memory used by the domain and compare it with target.
Regarding tmem, I'll let Dan comment on it.

> >> That's just a 1.5GHz VIA box, but I'll have to see how long it takes
> >> for 100 VMs or more.
> >
> > Xenstore can become very busy on systems with many VMs running.
>
> So, any advice? Obviously, limiting my queries is the main trick, but
> seems the tools do a lot of calls internally.
>
> I wonder if that post about xenstore IO performance
> http://xen.1045712.n5.nabble.com/Revisiting-XenD-XenStored-performance-scalability-issues-td2504870.html
> still applies. I'll try the ramdisk hack he described out of
> curiosity.

It still applies to XenD, but nowadays development is mostly on xl/libxenlight, which in response to "xl list" does one xenstore read per domain to resolve the domain name.
If you are not interested in the domain name, you could just call libxl_list_domain to get the list of running domains with a basic set of information (see libxl_dominfo: it contains memory usage, cpu usage and number of online vcpus), with no xenstore transactions at all!
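The ballooning check described above could be sketched like this, using the xenstore-read command line tool from dom0. It assumes the per-domain nodes live under /local/domain/<domid>/memory/ and hold values in KiB; the `balloon_status` helper and its return format are made up for this example, and the reader function is injectable so the logic can be exercised without a Xen host.

```python
import subprocess
from typing import Callable


def xenstore_read(path: str) -> str:
    """Read one xenstore node via the xenstore-read CLI (run in dom0)."""
    result = subprocess.run(["xenstore-read", path], check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def balloon_status(domid: int,
                   read: Callable[[str], str] = xenstore_read) -> dict:
    """Compare a domain's balloon target with its static maximum.

    Values are assumed to be in KiB, as stored in xenstore.
    """
    base = f"/local/domain/{domid}/memory"
    target_kib = int(read(f"{base}/target"))
    static_max_kib = int(read(f"{base}/static-max"))
    return {
        "target_kib": target_kib,
        "static_max_kib": static_max_kib,
        # How far below its maximum the domain is currently ballooned.
        "ballooned_kib": max(0, static_max_kib - target_kib),
        "ballooned": target_kib < static_max_kib,
    }


if __name__ == "__main__":
    # Fake xenstore contents for a hypothetical domain 5.
    fake = {
        "/local/domain/5/memory/target": "524288",
        "/local/domain/5/memory/static-max": "1048576",
    }
    print(balloon_status(5, read=fake.__getitem__))
```

This costs two xenstore reads per domain, so on hosts with many VMs the libxl_list_domain route mentioned above is likely the cheaper way to get memory figures.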
Dan Magenheimer
2011-Mar-01 16:32 UTC
RE: [Xen-devel] "right" way to gather domU stats in xen 3 & 4?
> Any comment about tmem / balloon would still be great... why doesn't
> anyone jump when our coolest features are mentioned? :)
> I think it's important to make them visible to the general users...

Hi Florian --

Tmem has no value without guest kernel changes, and getting those changes (even though very small) into the Linux kernel has proven to be a very long, frustrating experience, which I hope will finally come to fruition soon. Once in the upstream kernel, distro domUs will still need to merge and enable these changes.

A couple of key things to plan for in your management tools:
1) Don't assume that the amount of memory used by a guest is fixed and/or only under the control of your tools.
2) When tmem is in use, make sure you understand the difference between "free memory" and "freeable memory".

Dan
Florian Heigl
2011-Mar-01 16:59 UTC
Re: [Xen-devel] "right" way to gather domU stats in xen 3 & 4?
Hi Dan,

2011/3/1 Dan Magenheimer <dan.magenheimer@oracle.com>:
> Tmem has no value without guest kernel changes and getting those
> changes (even though very small) into the Linux kernel has proven
> to be a very long frustrating experience, which I hope will

I've wondered for some time now... can't you just push it into Oracle VM & OEL in the meantime? Even as an unsupported kernel, it would "work for me" and my customers.
About the long frustrating experience, see my sig :)

> A couple of key things to plan for in your management tools:
> 1) Don't assume that the amount of memory used by a guest is
> fixed and/or only under the control of your tools.

That's why I've been asking so intently. Right now it will be great to have a graph showing mem and maxmem, but once tmem sees more adoption, any balloon stats become less useful. Also, as of today, half of the distros don't have working cpu hotplug or ballooning anyway.

Anyway, thanks for the update :)

Flo

--
the purpose of libvirt is to provide an abstraction layer hiding all xen features added since 2006 until they were finally understood and copied by the kvm devs.