Andres Lagar-Cavilla
2012-Mar-01 02:25 UTC
[PATCH 0 of 2] Virq for low memory condition, V4
Changes from V3 posted Feb 28th:
- lowmemd is now xen-lowmemd
- .hgignore rune added for xen-lowmemd
- User can specify zero on the command line to disable the virq altogether
- Addressed two comments from Jan Beulich
  + Better detection of no user-provided command line threshold
  + Deal with the case in which the threshold may end up being zero.

Patch 1 is hypervisor (xen/common bits), patch 2 is tools.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

 xen/common/page_alloc.c  | 112 +++++++++++++++++++++++++++++++++++
 xen/include/public/xen.h |   1 +
 .hgignore                |   1 +
 tools/misc/Makefile      |   7 +-
 tools/misc/xen-lowmemd.c | 148 +++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 267 insertions(+), 2 deletions(-)
Andres Lagar-Cavilla
2012-Mar-01 02:25 UTC
[PATCH 1 of 2] Global virq for low memory situations
 xen/common/page_alloc.c  | 112 +++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/xen.h |   1 +
 2 files changed, 113 insertions(+), 0 deletions(-)

When a low memory threshold on the Xen heap is reached, we fire a global dom0
virq. If someone's listening, they can free up some more memory. The low
threshold is configurable via the command line token "low_mem_virq_limit", and
defaults to 64MiB. If the user specifies zero via the command line, the virq is
disabled.

We define a new virq VIRQ_ENOMEM. Potential listeners include squeezed,
xenballoond, or anything else that can be fired through xencommons.

We error-check the low mem virq against initial available heap (after dom0
allocation), to avoid firing immediately.

Virq issuing is controlled by a hysteresis algorithm: when memory dips below a
threshold, the virq is issued and the next virq will fire when memory shrinks
another order of magnitude. The virq will not fire again in the current "band"
until memory grows over the next higher order of magnitude.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 0696aa1de7c2 -r ac846c7ddaba xen/common/page_alloc.c
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -35,6 +35,7 @@
 #include <xen/perfc.h>
 #include <xen/numa.h>
 #include <xen/nodemask.h>
+#include <xen/event.h>
 #include <xen/tmem.h>
 #include <xen/tmem_xen.h>
 #include <public/sysctl.h>
@@ -300,6 +301,111 @@ static unsigned long init_node_heap(int
     return needed;
 }
 
+/* Default to 64 MiB */
+#define DEFAULT_LOW_MEM_VIRQ (((paddr_t) 64) << 20)
+#define MAX_LOW_MEM_VIRQ (((paddr_t) 1024) << 20)
+
+static paddr_t __read_mostly opt_low_mem_virq = ((paddr_t) -1);
+size_param("low_mem_virq_limit", opt_low_mem_virq);
+
+/* Thresholds to control hysteresis. In pages */
+/* When memory grows above this threshold, reset hysteresis.
+ * -1 initially to not reset until at least one virq issued. */
+static unsigned long low_mem_virq_high = -1UL;
+/* Threshold at which we issue virq */
+static unsigned long low_mem_virq_th = 0;
+/* Original threshold after all checks completed */
+static unsigned long low_mem_virq_orig = 0;
+/* Order for current threshold */
+static unsigned int low_mem_virq_th_order = 0;
+
+/* Perform bootstrapping checks and set bounds */
+static void __init setup_low_mem_virq(void)
+{
+    unsigned int order;
+    paddr_t threshold;
+    bool_t halve = 0;
+
+    /* If the user specifies zero, then he/she doesn't want this virq
+     * to ever trigger. */
+    if ( opt_low_mem_virq == 0 )
+    {
+        low_mem_virq_th = -1UL;
+        return;
+    }
+
+    /* If the user did not specify a knob, remember that */
+    if ( opt_low_mem_virq == ((paddr_t) -1) )
+    {
+        halve = 1;
+        threshold = DEFAULT_LOW_MEM_VIRQ;
+    } else
+        threshold = opt_low_mem_virq;
+
+    /* Dom0 has already been allocated by now. So check we won't be
+     * complaining immediately with whatever's left of the heap. */
+    threshold = min(threshold,
+                    ((paddr_t) total_avail_pages) << PAGE_SHIFT);
+
+    /* Then, cap to some predefined maximum */
+    threshold = min(threshold, MAX_LOW_MEM_VIRQ);
+
+    /* If the user specified no knob, and we are at the current available
+     * level, halve the threshold. */
+    if ( halve &&
+         (threshold == (((paddr_t) total_avail_pages) << PAGE_SHIFT)) )
+        threshold >>= 1;
+
+    /* Zero? Have to fire immediately */
+    threshold = max(threshold, (paddr_t) PAGE_SIZE);
+
+    /* Threshold bytes -> pages */
+    low_mem_virq_th = threshold >> PAGE_SHIFT;
+
+    /* Next, round the threshold down to the next order */
+    order = get_order_from_pages(low_mem_virq_th);
+    if ( (1UL << order) > low_mem_virq_th )
+        order--;
+
+    /* Set bounds, ready to go */
+    low_mem_virq_th = low_mem_virq_orig = 1UL << order;
+    low_mem_virq_th_order = order;
+
+    printk("Initial low memory virq threshold set at 0x%lx pages.\n",
+           low_mem_virq_th);
+}
+
+static void check_low_mem_virq(void)
+{
+    if ( unlikely(total_avail_pages <= low_mem_virq_th) )
+    {
+        send_global_virq(VIRQ_ENOMEM);
+
+        /* Update thresholds. Next warning will be when we drop below
+         * next order. However, we wait until we grow beyond one
+         * order above us to complain again at the current order */
+        low_mem_virq_high = 1UL << (low_mem_virq_th_order + 1);
+        if ( low_mem_virq_th_order > 0 )
+            low_mem_virq_th_order--;
+        low_mem_virq_th = 1UL << low_mem_virq_th_order;
+        return;
+    }
+
+    if ( unlikely(total_avail_pages >= low_mem_virq_high) )
+    {
+        /* Reset hysteresis. Bring threshold up one order.
+         * If we are back where originally set, set high
+         * threshold to -1 to avoid further growth of
+         * virq threshold. */
+        low_mem_virq_th_order++;
+        low_mem_virq_th = 1UL << low_mem_virq_th_order;
+        if ( low_mem_virq_th == low_mem_virq_orig )
+            low_mem_virq_high = -1UL;
+        else
+            low_mem_virq_high = 1UL << (low_mem_virq_th_order + 2);
+    }
+}
+
 /* Allocate 2^@order contiguous pages. */
 static struct page_info *alloc_heap_pages(
     unsigned int zone_lo, unsigned int zone_hi,
@@ -420,6 +526,8 @@ static struct page_info *alloc_heap_page
     total_avail_pages -= request;
     ASSERT(total_avail_pages >= 0);
 
+    check_low_mem_virq();
+
     if ( d != NULL )
         d->last_alloc_node = node;
 
@@ -1022,6 +1130,10 @@ void __init scrub_heap_pages(void)
     }
 
     printk("done.\n");
+
+    /* Now that the heap is initialized, run checks and set bounds
+     * for the low mem virq algorithm. */
+    setup_low_mem_virq();
 }
 
diff -r 0696aa1de7c2 -r ac846c7ddaba xen/include/public/xen.h
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -157,6 +157,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_pfn_t);
 #define VIRQ_PCPU_STATE 9 /* G. (DOM0) PCPU state changed */
 #define VIRQ_MEM_EVENT 10 /* G. (DOM0) A memory event has occured */
 #define VIRQ_XC_RESERVED 11 /* G. Reserved for XenClient */
+#define VIRQ_ENOMEM 12 /* G. (DOM0) Low on heap memory */
 
 /* Architecture-specific VIRQ definitions. */
 #define VIRQ_ARCH_0 16
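
[Editorial sketch, not part of the posted patch.] The hysteresis in check_low_mem_virq() may be easier to follow outside the hypervisor. The following is a minimal user-space model, assuming the same update rules as the patch; the names struct lowmem_state, lowmem_init() and lowmem_check() are hypothetical and exist only for illustration. Note that the "order" in the code is a power of two, so each band halves or doubles the threshold.

```c
#include <assert.h>

/* Hypothetical stand-alone model of the patch's hysteresis state.
 * All values are in pages, mirroring the low_mem_virq_* variables. */
struct lowmem_state {
    unsigned long high;    /* reset threshold; -1UL means "never reset" */
    unsigned long th;      /* threshold at which the virq would fire */
    unsigned long orig;    /* threshold chosen at boot */
    unsigned int th_order; /* log2 of th */
};

/* Start with a threshold of 2^order pages (illustrative helper). */
static void lowmem_init(struct lowmem_state *s, unsigned int order)
{
    /* Shifts by up to order + 1 must stay within unsigned long. */
    assert(order + 1 < 8 * sizeof(unsigned long));
    s->high = -1UL;
    s->th = s->orig = 1UL << order;
    s->th_order = order;
}

/* Returns 1 where the patch would call send_global_virq(), else 0. */
static int lowmem_check(struct lowmem_state *s, unsigned long avail)
{
    if (avail <= s->th) {
        /* Fire, then arm the next lower band. */
        s->high = 1UL << (s->th_order + 1);
        if (s->th_order > 0)
            s->th_order--;
        s->th = 1UL << s->th_order;
        return 1;
    }

    if (avail >= s->high) {
        /* Memory recovered: raise the threshold one order. */
        s->th_order++;
        s->th = 1UL << s->th_order;
        if (s->th == s->orig)
            s->high = -1UL;
        else
            s->high = 1UL << (s->th_order + 2);
    }
    return 0;
}
```

With order 14 (64 MiB of 4 KiB pages), a drop to 16000 available pages fires once and lowers the threshold to 8192 pages; the model does not fire again until availability falls to 8192 pages or the state has been reset by memory growing past the high mark.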
Andres Lagar-Cavilla
2012-Mar-01 02:25 UTC
[PATCH 2 of 2] Lowmemd: Simple demo code to show use of VIRQ_ENOMEM
 .hgignore                |   1 +
 tools/misc/Makefile      |   7 +-
 tools/misc/xen-lowmemd.c | 148 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 154 insertions(+), 2 deletions(-)

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r ac846c7ddaba -r 4929082ea9f7 .hgignore
--- a/.hgignore
+++ b/.hgignore
@@ -202,6 +202,7 @@
 ^tools/misc/xenperf$
 ^tools/misc/xenpm$
 ^tools/misc/xen-hvmctx$
+^tools/misc/xen-lowmemd$
 ^tools/misc/gtraceview$
 ^tools/misc/gtracestat$
 ^tools/misc/xenlockprof$
diff -r ac846c7ddaba -r 4929082ea9f7 tools/misc/Makefile
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -9,7 +9,7 @@ CFLAGS += $(CFLAGS_xeninclude)
 HDRS = $(wildcard *.h)
 
 TARGETS-y := xenperf xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd
-TARGETS-$(CONFIG_X86) += xen-detect xen-hvmctx xen-hvmcrash
+TARGETS-$(CONFIG_X86) += xen-detect xen-hvmctx xen-hvmcrash xen-lowmemd
 TARGETS-$(CONFIG_MIGRATE) += xen-hptool
 TARGETS := $(TARGETS-y)
 
@@ -21,7 +21,7 @@ INSTALL_BIN-y := xencons
 INSTALL_BIN-$(CONFIG_X86) += xen-detect
 INSTALL_BIN := $(INSTALL_BIN-y)
 
-INSTALL_SBIN-y := xm xen-bugtool xen-python-path xend xenperf xsview xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd xen-ringwatch
+INSTALL_SBIN-y := xm xen-bugtool xen-python-path xend xenperf xsview xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd xen-ringwatch xen-lowmemd
 INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx xen-hvmcrash
 INSTALL_SBIN-$(CONFIG_MIGRATE) += xen-hptool
 INSTALL_SBIN := $(INSTALL_SBIN-y)
@@ -70,6 +70,9 @@ xen-hptool: xen-hptool.o
 xenwatchdogd: xenwatchdogd.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
+xen-lowmemd: xen-lowmemd.o
+	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(LDLIBS_libxenstore) $(APPEND_LDFLAGS)
+
 gtraceview: gtraceview.o
 	$(CC) $(LDFLAGS) -o $@ $< $(CURSES_LIBS) $(APPEND_LDFLAGS)
 
diff -r ac846c7ddaba -r 4929082ea9f7 tools/misc/xen-lowmemd.c
--- /dev/null
+++ b/tools/misc/xen-lowmemd.c
@@ -0,0 +1,148 @@
+/*
+ * xen-lowmemd: demo VIRQ_ENOMEM
+ * Andres Lagar-Cavilla (GridCentric Inc.)
+ */
+
+#include <stdio.h>
+#include <xenctrl.h>
+#include <xs.h>
+#include <stdlib.h>
+#include <string.h>
+
+static evtchn_port_t virq_port = -1;
+static xc_evtchn *xce_handle = NULL;
+static xc_interface *xch = NULL;
+static struct xs_handle *xs_handle = NULL;
+
+void cleanup(void)
+{
+    if (virq_port > -1)
+        xc_evtchn_unbind(xce_handle, virq_port);
+    if (xce_handle)
+        xc_evtchn_close(xce_handle);
+    if (xch)
+        xc_interface_close(xch);
+    if (xs_handle)
+        xs_daemon_close(xs_handle);
+}
+
+/* Never shrink dom0 below 1 GiB */
+#define DOM0_FLOOR (1 << 30)
+#define DOM0_FLOOR_PG ((DOM0_FLOOR) >> 12)
+
+/* Act if free memory is less than 92 MiB */
+#define THRESHOLD (92 << 20)
+#define THRESHOLD_PG ((THRESHOLD) >> 12)
+
+#define BUFSZ 512
+void handle_low_mem(void)
+{
+    xc_dominfo_t dom0_info;
+    xc_physinfo_t info;
+    unsigned long long free_pages, dom0_pages, diff, dom0_target;
+    char data[BUFSZ], error[BUFSZ];
+
+    if (xc_physinfo(xch, &info) < 0)
+    {
+        perror("Getting physinfo failed");
+        return;
+    }
+
+    free_pages = (unsigned long long) info.free_pages;
+    printf("Available free pages: 0x%llx:%llux\n",
+           free_pages, free_pages);
+
+    /* Don't do anything if we have more than the threshold free */
+    if ( free_pages >= THRESHOLD_PG )
+        return;
+    diff = THRESHOLD_PG - free_pages;
+
+    if (xc_domain_getinfo(xch, 0, 1, &dom0_info) < 1)
+    {
+        perror("Failed to get dom0 info");
+        return;
+    }
+
+    dom0_pages = (unsigned long long) dom0_info.nr_pages;
+    printf("Dom0 pages: 0x%llx:%llu\n", dom0_pages, dom0_pages);
+    dom0_target = dom0_pages - diff;
+    if (dom0_target <= DOM0_FLOOR_PG)
+        return;
+
+    printf("Shooting for dom0 target 0x%llx:%llu\n",
+           dom0_target, dom0_target);
+
+    snprintf(data, BUFSZ, "%llu", dom0_target);
+    if (!xs_write(xs_handle, XBT_NULL,
+                  "/local/domain/0/memory/target", data, strlen(data)))
+    {
+        snprintf(error, BUFSZ,"Failed to write target %s to xenstore", data);
+        perror(error);
+    }
+}
+
+int main(int argc, char *argv[])
+{
+    int rc;
+
+    atexit(cleanup);
+
+    xch = xc_interface_open(NULL, NULL, 0);
+    if (xch == NULL)
+    {
+        perror("Failed to open xc interface");
+        return 1;
+    }
+
+    xce_handle = xc_evtchn_open(NULL, 0);
+    if (xce_handle == NULL)
+    {
+        perror("Failed to open evtchn device");
+        return 2;
+    }
+
+    xs_handle = xs_daemon_open();
+    if (xs_handle == NULL)
+    {
+        perror("Failed to open xenstore connection");
+        return 3;
+    }
+
+    if ((rc = xc_evtchn_bind_virq(xce_handle, VIRQ_ENOMEM)) == -1)
+    {
+        perror("Failed to bind to domain exception virq port");
+        return 4;
+    }
+
+    virq_port = rc;
+
+    while(1)
+    {
+        evtchn_port_t port;
+
+        if ((port = xc_evtchn_pending(xce_handle)) == -1)
+        {
+            perror("Failed to listen for pending event channel");
+            return 5;
+        }
+
+        if (port != virq_port)
+        {
+            char data[BUFSZ];
+            snprintf(data, BUFSZ, "Wrong port, got %d expected %d", port, virq_port);
+            perror(data);
+            return 6;
+        }
+
+        if (xc_evtchn_unmask(xce_handle, port) == -1)
+        {
+            perror("Failed to unmask port");
+            return 7;
+        }
+
+        printf("Got a virq kick, time to get work\n");
+        handle_low_mem();
+    }
+
+    return 0;
+}
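
[Editorial sketch, not part of the posted patch.] The target arithmetic in handle_low_mem() can be isolated from the libxc and xenstore calls. The sketch below assumes 4 KiB pages, as the demo's shift by 12 does; the function name dom0_target_pages() and the FLOOR_PG/LOWMEM_PG macros are hypothetical stand-ins for the demo's DOM0_FLOOR_PG and THRESHOLD_PG. The guard against unsigned underflow is an addition that the demo does not contain.

```c
#include <assert.h>

/* Same values as the demo's DOM0_FLOOR_PG and THRESHOLD_PG, assuming
 * 4 KiB pages; written with ULL constants for this illustration. */
#define FLOOR_PG  ((1ULL << 30) >> 12)   /* 1 GiB  -> 262144 pages */
#define LOWMEM_PG ((92ULL << 20) >> 12)  /* 92 MiB -> 23552 pages  */

/* Hypothetical helper: returns the new dom0 target in pages, or 0 when
 * the demo would take no action. */
static unsigned long long dom0_target_pages(unsigned long long free_pages,
                                            unsigned long long dom0_pages)
{
    unsigned long long diff, target;

    /* Enough free memory: nothing to do. */
    if (free_pages >= LOWMEM_PG)
        return 0;
    diff = LOWMEM_PG - free_pages;

    /* Added guard (not in the demo): avoid wrapping below zero. */
    if (dom0_pages <= diff)
        return 0;

    target = dom0_pages - diff;
    /* Never shrink dom0 to the floor or below it. */
    if (target <= FLOOR_PG)
        return 0;

    assert(target < dom0_pages); /* the target always shrinks dom0 */
    return target;
}
```

For example, with 20000 free pages and a 500000-page dom0, the shortfall is 3552 pages and the computed target is 496448 pages.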
Ian Jackson
2012-Mar-01 15:30 UTC
Re: [PATCH 2 of 2] Lowmemd: Simple demo code to show use of VIRQ_ENOMEM
Andres Lagar-Cavilla writes ("[Xen-devel] [PATCH 2 of 2] Lowmemd: Simple demo code to show use of VIRQ_ENOMEM"):
> .hgignore | 1 +
> tools/misc/Makefile | 7 +-
> tools/misc/xen-lowmemd.c | 148 +++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 154 insertions(+), 2 deletions(-)

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

This can go in when the hypervisor changes do.

If for any reason you need to repost it would be nice to add a .gitignore change along with the .hgignore one, but that's strictly optional.

Ian.
Dan Magenheimer
2012-Mar-01 19:19 UTC
Re: [PATCH 1 of 2] Global virq for low memory situations
> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
> Sent: Wednesday, February 29, 2012 7:26 PM
> To: xen-devel@lists.xensource.com
> Cc: ian.campbell@citrix.com; andres@gridcentric.ca; tim@xen.org; JBeulich@suse.com;
> ian.jackson@citrix.com; adin@gridcentric.ca
> Subject: [Xen-devel] [PATCH 1 of 2] Global virq for low memory situations
>
> xen/common/page_alloc.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++
> xen/include/public/xen.h | 1 +
> 2 files changed, 113 insertions(+), 0 deletions(-)

Just a note that I think this will trigger false positives when Xen tmem is working (with tmem-enabled guests). With tmem, Xen has two classes of free memory: free and "freeable". In many situations, free will approach 0 but there will still be lots of "freeable" memory which is available for guest ballooning, launching new domains, etc.

It may be the case that your virq may never be used on a system with a tmem guest, but if it is, weird things (like unnecessary hypervisor swapping) may happen.

Dan
Andres Lagar-Cavilla
2012-Mar-01 19:26 UTC
Re: [PATCH 1 of 2] Global virq for low memory situations
>> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
>> Sent: Wednesday, February 29, 2012 7:26 PM
>> To: xen-devel@lists.xensource.com
>> Cc: ian.campbell@citrix.com; andres@gridcentric.ca; tim@xen.org;
>> JBeulich@suse.com;
>> ian.jackson@citrix.com; adin@gridcentric.ca
>> Subject: [Xen-devel] [PATCH 1 of 2] Global virq for low memory
>> situations
>>
>> xen/common/page_alloc.c | 112
>> +++++++++++++++++++++++++++++++++++++++++++++++
>> xen/include/public/xen.h | 1 +
>> 2 files changed, 113 insertions(+), 0 deletions(-)
>
> Just a note that I think this will trigger false positives when
> Xen tmem is working (with tmem-enabled guests). With tmem,
> Xen has two classes of free memory: free and "freeable".
> In many situations, free will approach 0 but there will still
> be lots of "freeable" memory which is available for guest
> ballooning, launching new domains, etc.
>
> It may be the case that your virq may never be used on
> a system with a tmem guest, but if it is, weird things
> (like unnecessary hypervisor swapping) may happen.

Since I know zero point zero zero zip nada about tmem, maybe the most efficient path is that you throw on top of the patch the tmem magic. Which would seem to be all about factoring tmem_freeable_pages() into the calculation?

Thanks,
Andres

>
> Dan
>
Dan Magenheimer
2012-Mar-01 21:08 UTC
Re: [PATCH 1 of 2] Global virq for low memory situations
> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
> Subject: RE: [Xen-devel] [PATCH 1 of 2] Global virq for low memory situations
>
> >> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
> >> Sent: Wednesday, February 29, 2012 7:26 PM
> >> To: xen-devel@lists.xensource.com
> >> Cc: ian.campbell@citrix.com; andres@gridcentric.ca; tim@xen.org;
> >> JBeulich@suse.com;
> >> ian.jackson@citrix.com; adin@gridcentric.ca
> >> Subject: [Xen-devel] [PATCH 1 of 2] Global virq for low memory
> >> situations
> >>
> >> xen/common/page_alloc.c | 112
> >> +++++++++++++++++++++++++++++++++++++++++++++++
> >> xen/include/public/xen.h | 1 +
> >> 2 files changed, 113 insertions(+), 0 deletions(-)
> >
> > Just a note that I think this will trigger false positives when
> > Xen tmem is working (with tmem-enabled guests). With tmem,
> > Xen has two classes of free memory: free and "freeable".
> > In many situations, free will approach 0 but there will still
> > be lots of "freeable" memory which is available for guest
> > ballooning, launching new domains, etc.
> >
> > It may be the case that your virq may never be used on
> > a system with a tmem guest, but if it is, weird things
> > (like unnecessary hypervisor swapping) may happen.
>
> Since I know zero point zero zero zip nada about tmem, maybe the most
> efficient path is that you throw on top of the patch the tmem magic. Which
> would seem to be all about factoring tmem_freeable_pages() into the
> calculation?

Since I know less than zero (:-) about what you plan to use the virq for, and neither of us has tested one in the presence of the other, the sane short term solution may be to at least WARN_ON_ONCE and possibly disable your virq if opt_tmem is set. For now, tmem requires an explicit boot option (which sets opt_tmem) so you should be safe if opt_tmem==0.

Longer term, we are going to need to test environments where both xenpaging/sharing is enabled for legacy/Windows HVM guests AND tmem is enabled for tmem-savvy guests.

Does that make sense?

Dan
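
[Editorial sketch, not part of the thread.] The two options raised above, disabling the virq when tmem is enabled versus counting "freeable" pages as available, can be compared in a small user-space model. Everything here is an assumption for illustration: the enum tmem_policy, the function low_mem_should_fire() and its parameters are hypothetical, and the plain integer arguments stand in for whatever total_avail_pages, tmem_freeable_pages() and opt_tmem would supply inside the hypervisor.

```c
#include <assert.h>

/* Hypothetical policies corresponding to the two suggestions above. */
enum tmem_policy {
    TMEM_DISABLE_VIRQ,   /* short term: never fire when tmem is on */
    TMEM_COUNT_FREEABLE  /* treat freeable pages as available */
};

/* Returns 1 if the low-memory virq should fire, else 0. All sizes are
 * in pages; tmem_enabled models a boot option such as opt_tmem. */
static int low_mem_should_fire(unsigned long free_pages,
                               unsigned long freeable_pages,
                               unsigned long threshold_pages,
                               int tmem_enabled,
                               enum tmem_policy policy)
{
    unsigned long effective = free_pages;

    assert(policy == TMEM_DISABLE_VIRQ || policy == TMEM_COUNT_FREEABLE);

    if (tmem_enabled) {
        if (policy == TMEM_DISABLE_VIRQ)
            return 0;
        /* Saturating add so a huge freeable count cannot wrap. */
        if (freeable_pages > -1UL - free_pages)
            effective = -1UL;
        else
            effective = free_pages + freeable_pages;
    }

    return effective <= threshold_pages;
}
```

Under the counting policy, a host with 100 free pages and 50000 freeable pages stays quiet at a 16384-page threshold, which is the false positive described earlier in the thread; without tmem the same host would fire.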