Olaf Hering
2011-Nov-02 14:45 UTC
[Xen-devel] [PATCH 0 of 4] libxl: initial support for xenpaging
The following series adds initial support for xenpaging to libxl.
It depends on two series I sent earlier:

tools/xenpaging fixes for xen-unstable, sent on 2011-10-21
http://lists.xensource.com/archives/html/xen-devel/2011-10/msg01542.html

libxl: make spawn interface more generic, sent on 2011-10-27
http://lists.xensource.com/archives/html/xen-devel/2011-10/msg01912.html

This series reverses the logic of xenpaging. It now monitors the guest's
tot_pages value and works toward that number by either paging out more
pages or writing pages back into the guest. Target changes are received
from the guest's "memory/target-tot_pages" xenstore path.

Three new configuration file options specific to xenpaging were added:

actmem=<int>
xenpaging_file=<string> (optional)
xenpaging_extra=[ 'string', 'string' ] (optional)

xenpaging will only be started if actmem= is set and not zero.

An xl mem-SOMETHING command is not yet part of this series. I will add it
once a suitable name is found.

There has been some discussion regarding the naming of the config option,
and how to drive xenpaging via xl commands.
http://lists.xensource.com/archives/html/xen-devel/2011-10/msg00110.html

The term "actual memory" was suggested by IanC, that's why the option is
now 'actmem=' instead of 'totmem='. So far I couldn't come up with a
better name that follows the current scheme.

George Dunlap suggested the following off-list for the related xl mem-*
commands:

'xl mem-set' should continue to change the balloon target as it does
today. But it should also update "memory/target-tot_pages" with the same
value. There could be some churn when the balloon driver and xenpaging
both try to reach that value. Eventually xenpaging will be faster to free
pages, while the balloon driver still tries to reach its target. In my
opinion that's not an issue if mem-set really means 'release as much
memory back to Xen, as fast as possible'. If the guest is actually using
a lot of memory, the balloon driver (in its role as memory hog) cannot do
much to reach its target. But xenpaging can swap out some parts of the
guest to free memory on the host.

Two other 'xl mem-*' commands should be added to tweak just the balloon
driver and xenpaging. 'xl mem-balloon-target' does what 'mem-set' does
today, and 'xl mem-swap-target' will tweak "memory/target-tot_pages".

Olaf

 tools/libxl/libxl.h          |    1
 tools/libxl/libxl_create.c   |  126 ++++++++++++++++++++++++++
 tools/libxl/libxl_dom.c      |    8 +
 tools/libxl/libxl_memory.txt |   57 +++++++-----
 tools/libxl/libxl_types.idl  |    3
 tools/libxl/xl_cmdimpl.c     |   31 ++++++
 tools/xenpaging/xenpaging.c  |  201 +++++++++++++++++++++++++++++++++++--------
 tools/xenpaging/xenpaging.h  |    1
 8 files changed, 368 insertions(+), 60 deletions(-)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Olaf Hering
2011-Nov-02 14:45 UTC
[Xen-devel] [PATCH 1 of 4] xenpaging: use guests tot_pages as working target
# HG changeset patch # User Olaf Hering <olaf@aepfle.de> # Date 1320244382 -3600 # Node ID 0d872bf1203dd36200477f688908797875035b50 # Parent f057eb06706e2bacaadb41cf80fa45001e786e69 xenpaging: use guests tot_pages as working target This change reverses the task of xenpaging. Before this change a fixed number of pages was paged out. With this change the guest will not have access to more than the given number of pages at the same time. Signed-off-by: Olaf Hering <olaf@aepfle.de> diff -r f057eb06706e -r 0d872bf1203d tools/xenpaging/policy_default.c --- a/tools/xenpaging/policy_default.c +++ b/tools/xenpaging/policy_default.c @@ -71,7 +71,6 @@ int policy_init(xenpaging_t *paging) /* Start in the middle to avoid paging during BIOS startup */ current_gfn = max_pages / 2; - current_gfn -= paging->num_pages / 2; rc = 0; out: diff -r f057eb06706e -r 0d872bf1203d tools/xenpaging/xenpaging.c --- a/tools/xenpaging/xenpaging.c +++ b/tools/xenpaging/xenpaging.c @@ -136,6 +136,21 @@ err: return rc; } +static int xenpaging_get_tot_pages(xenpaging_t *paging) +{ + xc_interface *xch = paging->xc_handle; + xc_domaininfo_t domain_info; + int rc; + + rc = xc_domain_getinfolist(xch, paging->mem_event.domain_id, 1, &domain_info); + if ( rc != 1 ) + { + PERROR("Error getting domain info"); + return -1; + } + return domain_info.tot_pages; +} + static void *init_page(void) { void *buffer; @@ -161,7 +176,7 @@ static void *init_page(void) return NULL; } -static xenpaging_t *xenpaging_init(domid_t domain_id, int num_pages) +static xenpaging_t *xenpaging_init(domid_t domain_id, int target_tot_pages) { xenpaging_t *paging; xc_domaininfo_t domain_info; @@ -296,12 +311,7 @@ static xenpaging_t *xenpaging_init(domid } DPRINTF("max_pages = %d\n", paging->max_pages); - if ( num_pages < 0 || num_pages > paging->max_pages ) - { - num_pages = paging->max_pages; - DPRINTF("setting num_pages to %d\n", num_pages); - } - paging->num_pages = num_pages; + paging->target_tot_pages = target_tot_pages; /* 
Initialise policy */ rc = policy_init(paging); @@ -648,7 +658,9 @@ int main(int argc, char *argv[]) xenpaging_victim_t *victims; mem_event_request_t req; mem_event_response_t rsp; + int num, prev_num = 0; int i; + int tot_pages; int rc = -1; int rc1; xc_interface *xch; @@ -659,7 +671,7 @@ int main(int argc, char *argv[]) if ( argc != 3 ) { - fprintf(stderr, "Usage: %s <domain_id> <num_pages>\n", argv[0]); + fprintf(stderr, "Usage: %s <domain_id> <tot_pages>\n", argv[0]); return -1; } @@ -672,7 +684,7 @@ int main(int argc, char *argv[]) } xch = paging->xc_handle; - DPRINTF("starting %s %u %d\n", argv[0], paging->mem_event.domain_id, paging->num_pages); + DPRINTF("starting %s %u %d\n", argv[0], paging->mem_event.domain_id, paging->target_tot_pages); /* Open file */ sprintf(filename, "page_cache_%u", paging->mem_event.domain_id); @@ -704,9 +716,6 @@ int main(int argc, char *argv[]) /* listen for page-in events to stop pager */ create_page_in_thread(paging); - i = evict_pages(paging, fd, victims, paging->num_pages); - DPRINTF("%d pages evicted. 
Done.\n", i); - /* Swap pages in and out */ while ( 1 ) { @@ -771,12 +780,8 @@ int main(int argc, char *argv[]) goto out; } - /* Evict a new page to replace the one we just paged in, - * or clear this pagefile slot on exit */ - if ( interrupted ) - victims[i].gfn = INVALID_MFN; - else - evict_victim(paging, &victims[i], fd, i); + /* Clear this pagefile slot */ + victims[i].gfn = INVALID_MFN; } else { @@ -823,6 +828,43 @@ int main(int argc, char *argv[]) if ( interrupted ) break; + /* Check if the target has been reached already */ + tot_pages = xenpaging_get_tot_pages(paging); + if ( tot_pages < 0 ) + goto out; + + /* Resume all pages if paging is disabled or no target was set */ + if ( paging->target_tot_pages == 0 ) + { + if ( paging->num_paged_out ) + resume_pages(paging, paging->num_paged_out); + } + /* Evict more pages if target not reached */ + else if ( tot_pages > paging->target_tot_pages ) + { + num = tot_pages - paging->target_tot_pages; + if ( num != prev_num ) + { + DPRINTF("Need to evict %d pages to reach %d target_tot_pages\n", num, paging->target_tot_pages); + prev_num = num; + } + /* Limit the number of evicts to be able to process page-in requests */ + if ( num > 42 ) + num = 42; + evict_pages(paging, fd, victims, num); + } + /* Resume some pages if target not reached */ + else if ( tot_pages < paging->target_tot_pages && paging->num_paged_out ) + { + num = paging->target_tot_pages - tot_pages; + if ( num != prev_num ) + { + DPRINTF("Need to resume %d pages to reach %d target_tot_pages\n", num, paging->target_tot_pages); + prev_num = num; + } + resume_pages(paging, num); + } + } DPRINTF("xenpaging got signal %d\n", interrupted); diff -r f057eb06706e -r 0d872bf1203d tools/xenpaging/xenpaging.h --- a/tools/xenpaging/xenpaging.h +++ b/tools/xenpaging/xenpaging.h @@ -50,7 +50,7 @@ typedef struct xenpaging { /* number of pages for which data structures were allocated */ int max_pages; int num_paged_out; - int num_pages; + int target_tot_pages; int 
policy_mru_size; unsigned long pagein_queue[XENPAGING_PAGEIN_QUEUE_SIZE]; } xenpaging_t;
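The per-iteration decision this patch adds to the main loop can be read as a small pure function. What follows is only a sketch of that logic, not code from the patch: the names pager_next_step, struct pager_step and EVICT_BATCH_MAX are made up for illustration, while the three branches and the limit of 42 evictions per iteration mirror the hunk in main().

```c
#include <assert.h>

/* Per-iteration eviction cap; the value 42 is taken from the patch. */
#define EVICT_BATCH_MAX 42

/* Hypothetical names, introduced only for this illustration. */
enum pager_action { PAGER_IDLE, PAGER_EVICT, PAGER_RESUME };

struct pager_step {
    enum pager_action action;
    int num;                /* number of pages to evict or resume */
};

/*
 * Decide what the pager should do in one loop iteration.
 * tot_pages:        pages currently assigned to the guest
 * target_tot_pages: working target, 0 means paging is disabled
 * num_paged_out:    pages currently held in the pagefile
 */
static struct pager_step pager_next_step(int tot_pages, int target_tot_pages,
                                         int num_paged_out)
{
    struct pager_step step = { PAGER_IDLE, 0 };

    assert(tot_pages >= 0 && num_paged_out >= 0);

    if (target_tot_pages == 0) {
        /* No target: bring back everything that was paged out. */
        if (num_paged_out) {
            step.action = PAGER_RESUME;
            step.num = num_paged_out;
        }
    } else if (tot_pages > target_tot_pages) {
        /* Above target: evict, but in small batches so that page-in
         * requests can still be processed in between. */
        step.action = PAGER_EVICT;
        step.num = tot_pages - target_tot_pages;
        if (step.num > EVICT_BATCH_MAX)
            step.num = EVICT_BATCH_MAX;
    } else if (tot_pages < target_tot_pages && num_paged_out) {
        /* Below target: resume pages, as far as the target allows. */
        step.action = PAGER_RESUME;
        step.num = target_tot_pages - tot_pages;
    }
    return step;
}
```

A caller would evaluate this once per loop iteration and pass step.num to the eviction or resume routine, depending on step.action.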
Olaf Hering
2011-Nov-02 14:45 UTC
[Xen-devel] [PATCH 2 of 4] xenpaging: watch the guests memory/target-tot_pages xenstore value
# HG changeset patch
# User Olaf Hering <olaf@aepfle.de>
# Date 1320244383 -3600
# Node ID 434f0b4da9148b101e184e0108be6c31f67038f4
# Parent  0d872bf1203dd36200477f688908797875035b50
xenpaging: watch the guests memory/target-tot_pages xenstore value

Subsequent patches will use xenstored to store the number of pages
xenpaging is supposed to page out.
Remove num_pages and use target_pages instead.

Signed-off-by: Olaf Hering <olaf@aepfle.de>

diff -r 0d872bf1203d -r 434f0b4da914 tools/xenpaging/xenpaging.c
--- a/tools/xenpaging/xenpaging.c
+++ b/tools/xenpaging/xenpaging.c
@@ -19,8 +19,10 @@
  */
 
 #define _XOPEN_SOURCE 600
+#define _GNU_SOURCE
 
 #include <inttypes.h>
+#include <stdio.h>
 #include <stdlib.h>
 #include <stdarg.h>
 #include <time.h>
@@ -35,6 +37,10 @@
 #include "policy.h"
 #include "xenpaging.h"
 
+/* Defines number of mfns a guest should use at a time, in KiB */
+#define WATCH_TARGETPAGES "memory/target-tot_pages"
+static char *watch_target_tot_pages;
+static char *dom_path;
 static char watch_token[16];
 static char filename[80];
 static int interrupted;
@@ -72,7 +78,7 @@ static int xenpaging_wait_for_event_or_t
 {
     xc_interface *xch = paging->xc_handle;
     xc_evtchn *xce = paging->mem_event.xce_handle;
-    char **vec;
+    char **vec, *val;
     unsigned int num;
     struct pollfd fd[2];
     int port;
@@ -111,6 +117,25 @@ static int xenpaging_wait_for_event_or_t
                     rc = 0;
                 }
             }
+            else if ( strcmp(vec[XS_WATCH_PATH], watch_target_tot_pages) == 0 )
+            {
+                int ret, target_tot_pages;
+                val = xs_read(paging->xs_handle, XBT_NULL, vec[XS_WATCH_PATH], NULL);
+                if ( val )
+                {
+                    ret = sscanf(val, "%d", &target_tot_pages);
+                    if ( ret > 0 )
+                    {
+                        /* KiB to pages */
+                        target_tot_pages >>= 2;
+                        if ( target_tot_pages < 0 || target_tot_pages > paging->max_pages )
+                            target_tot_pages = paging->max_pages;
+                        paging->target_tot_pages = target_tot_pages;
+                        DPRINTF("new target_tot_pages %d\n", target_tot_pages);
+                    }
+                    free(val);
+                }
+            }
             free(vec);
         }
     }
@@ -216,6 +241,25 @@ static xenpaging_t *xenpaging_init(domid
         goto err;
     }
 
+    /* Watch xenpagings working target */
+    dom_path = xs_get_domain_path(paging->xs_handle, domain_id);
+    if ( !dom_path )
+    {
+        PERROR("Could not find domain path\n");
+        goto err;
+    }
+    if ( asprintf(&watch_target_tot_pages, "%s/%s", dom_path, WATCH_TARGETPAGES) < 0 )
+    {
+        PERROR("Could not alloc watch path\n");
+        goto err;
+    }
+    DPRINTF("watching '%s'\n", watch_target_tot_pages);
+    if ( xs_watch(paging->xs_handle, watch_target_tot_pages, "") == false )
+    {
+        PERROR("Could not bind to xenpaging watch\n");
+        goto err;
+    }
+
     p = getenv("XENPAGING_POLICY_MRU_SIZE");
     if ( p && *p )
     {
@@ -342,6 +386,8 @@ static xenpaging_t *xenpaging_init(domid
             free(paging->mem_event.ring_page);
         }
 
+        free(dom_path);
+        free(watch_target_tot_pages);
         free(paging->bitmap);
         free(paging);
     }
@@ -357,6 +403,9 @@ static int xenpaging_teardown(xenpaging_
     if ( paging == NULL )
         return 0;
 
+    xs_unwatch(paging->xs_handle, watch_target_tot_pages, "");
+    xs_unwatch(paging->xs_handle, "@releaseDomain", watch_token);
+
     xch = paging->xc_handle;
     paging->xc_handle = NULL;
     /* Tear down domain paging in Xen */
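The watch handler in this patch reads a KiB value from xenstore, converts it to pages and clamps it to max_pages. A minimal standalone sketch of just that conversion is shown here, with the xenstore read left out. The function name parse_target_tot_pages is hypothetical, and the shift by 2 assumes 4 KiB pages, as the patch does. Negative input is checked before shifting, which gives the same result as the patch on platforms with arithmetic right shift but avoids relying on that behaviour.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Convert a xenstore string holding a KiB value into a page target.
 * Returns the target in pages, clamped to max_pages, or -1 if val
 * does not start with a number.
 * Hypothetical helper; assumes 4 KiB pages (KiB >> 2 == pages).
 */
static int parse_target_tot_pages(const char *val, int max_pages)
{
    int kib, pages;

    assert(max_pages >= 0);

    if (!val || sscanf(val, "%d", &kib) != 1)
        return -1;

    /* Negative values are treated like "too large": use max_pages. */
    if (kib < 0)
        return max_pages;

    pages = kib >> 2;           /* KiB to pages */
    if (pages > max_pages)
        pages = max_pages;
    return pages;
}
```

With max_pages of 2048, the string "4096" yields 1024 pages, while "16384" is clamped to 2048.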
Olaf Hering
2011-Nov-02 14:45 UTC
[Xen-devel] [PATCH 3 of 4] xenpaging: add cmdline interface for pager
# HG changeset patch
# User Olaf Hering <olaf@aepfle.de>
# Date 1320244384 -3600
# Node ID a51d4fab351d2d1a38b82cbd7ad925f76fce9e9a
# Parent  434f0b4da9148b101e184e0108be6c31f67038f4
xenpaging: add cmdline interface for pager

Introduce cmdline handling for the pager. This simplifies libxl support;
debug and mru_size are no longer passed via the environment.

The new interface looks like this:

xenpaging [options] -f <pagefile> -d <domain_id>

options:
 -d <domid>     --domain=<domid>         numerical domain_id of guest. This option is required.
 -f <file>      --pagefile=<file>        pagefile to use. This option is required.
 -m <max_memkb> --max_memkb=<max_memkb>  maximum amount of memory to handle.
 -r <num>       --mru_size=<num>         number of paged-in pages to keep in memory.
 -v             --verbose                enable debug output.
 -h             --help                   this output.

Signed-off-by: Olaf Hering <olaf@aepfle.de>

diff -r 434f0b4da914 -r a51d4fab351d tools/xenpaging/xenpaging.c
--- a/tools/xenpaging/xenpaging.c
+++ b/tools/xenpaging/xenpaging.c
@@ -31,6 +31,7 @@
 #include <poll.h>
 #include <xc_private.h>
 #include <xs.h>
+#include <getopt.h>
 
 #include "xc_bitops.h"
 #include "file_ops.h"
@@ -42,12 +43,12 @@
 static char *watch_target_tot_pages;
 static char *dom_path;
 static char watch_token[16];
-static char filename[80];
+static char *filename;
 static int interrupted;
 
 static void unlink_pagefile(void)
 {
-    if ( filename[0] )
+    if ( filename && filename[0] )
     {
         unlink(filename);
         filename[0] = '\0';
@@ -201,11 +202,85 @@ static void *init_page(void)
     return NULL;
 }
 
-static xenpaging_t *xenpaging_init(domid_t domain_id, int target_tot_pages)
+static void usage(void)
+{
+    printf("usage:\n\n");
+
+    printf(" xenpaging [options] -f <pagefile> -d <domain_id>\n\n");
+
+    printf("options:\n");
+    printf(" -d <domid> --domain=<domid> numerical domain_id of guest. This option is required.\n");
+    printf(" -f <file> --pagefile=<file> pagefile to use.
This option is required.\n"); + printf(" -m <max_memkb> --max_memkb=<max_memkb> maximum amount of memory to handle.\n"); + printf(" -r <num> --mru_size=<num> number of paged-in pages to keep in memory.\n"); + printf(" -v --verbose enable debug output.\n"); + printf(" -h --help this output.\n"); +} + +static int xenpaging_getopts(xenpaging_t *paging, int argc, char *argv[]) +{ + int ch; + static const char sopts[] = "hvd:f:m:r:"; + static const struct option lopts[] = { + {"help", 0, NULL, ''h''}, + {"verbose", 0, NULL, ''v''}, + {"domain", 1, NULL, ''d''}, + {"pagefile", 1, NULL, ''f''}, + {"mru_size", 1, NULL, ''m''}, + { } + }; + + while ((ch = getopt_long(argc, argv, sopts, lopts, NULL)) != -1) + { + switch(ch) { + case ''d'': + paging->mem_event.domain_id = atoi(optarg); + break; + case ''f'': + filename = strdup(optarg); + break; + case ''m'': + /* KiB to pages */ + paging->max_pages = atoi(optarg) >> 2; + break; + case ''r'': + paging->policy_mru_size = atoi(optarg); + break; + case ''v'': + paging->debug = 1; + break; + case ''h'': + case ''?'': + usage(); + return 1; + } + } + + argv += optind; argc -= optind; + + /* Path to pagefile is required */ + if ( !filename ) + { + printf("Filename for pagefile missing!\n"); + usage(); + return 1; + } + + /* Set domain id */ + if ( !paging->mem_event.domain_id ) + { + printf("Numerical <domain_id> missing!\n"); + return 1; + } + + return 0; +} + +static xenpaging_t *xenpaging_init(int argc, char *argv[]) { xenpaging_t *paging; xc_domaininfo_t domain_info; - xc_interface *xch; + xc_interface *xch = NULL; xentoollog_logger *dbg = NULL; char *p; int rc; @@ -215,7 +290,12 @@ static xenpaging_t *xenpaging_init(domid if ( !paging ) goto err; - if ( getenv("XENPAGING_DEBUG") ) + /* Get cmdline options and domain_id */ + if ( xenpaging_getopts(paging, argc, argv) ) + goto err; + + /* Enable debug output */ + if ( paging->debug ) dbg = (xentoollog_logger *)xtl_createlogger_stdiostream(stderr, XTL_DEBUG, 0); /* Open 
connection to xen */ @@ -234,7 +314,7 @@ static xenpaging_t *xenpaging_init(domid } /* write domain ID to watch so we can ignore other domain shutdowns */ - snprintf(watch_token, sizeof(watch_token), "%u", domain_id); + snprintf(watch_token, sizeof(watch_token), "%u", paging->mem_event.domain_id); if ( xs_watch(paging->xs_handle, "@releaseDomain", watch_token) == false ) { PERROR("Could not bind to shutdown watch\n"); @@ -242,7 +322,7 @@ static xenpaging_t *xenpaging_init(domid } /* Watch xenpagings working target */ - dom_path = xs_get_domain_path(paging->xs_handle, domain_id); + dom_path = xs_get_domain_path(paging->xs_handle, paging->mem_event.domain_id); if ( !dom_path ) { PERROR("Could not find domain path\n"); @@ -260,16 +340,6 @@ static xenpaging_t *xenpaging_init(domid goto err; } - p = getenv("XENPAGING_POLICY_MRU_SIZE"); - if ( p && *p ) - { - paging->policy_mru_size = atoi(p); - DPRINTF("Setting policy mru_size to %d\n", paging->policy_mru_size); - } - - /* Set domain id */ - paging->mem_event.domain_id = domain_id; - /* Initialise shared page */ paging->mem_event.shared_page = init_page(); if ( paging->mem_event.shared_page == NULL ) @@ -335,17 +405,21 @@ static xenpaging_t *xenpaging_init(domid paging->mem_event.port = rc; - rc = xc_domain_getinfolist(xch, paging->mem_event.domain_id, 1, - &domain_info); - if ( rc != 1 ) + /* Get max_pages from guest if not provided via cmdline */ + if ( !paging->max_pages ) { - PERROR("Error getting domain info"); - goto err; + rc = xc_domain_getinfolist(xch, paging->mem_event.domain_id, 1, + &domain_info); + if ( rc != 1 ) + { + PERROR("Error getting domain info"); + goto err; + } + + /* Record number of max_pages */ + paging->max_pages = domain_info.max_pages; } - /* Record number of max_pages */ - paging->max_pages = domain_info.max_pages; - /* Allocate bitmap for tracking pages that have been paged out */ paging->bitmap = bitmap_alloc(paging->max_pages); if ( !paging->bitmap ) @@ -355,8 +429,6 @@ static 
xenpaging_t *xenpaging_init(domid } DPRINTF("max_pages = %d\n", paging->max_pages); - paging->target_tot_pages = target_tot_pages; - /* Initialise policy */ rc = policy_init(paging); if ( rc != 0 ) @@ -718,25 +790,18 @@ int main(int argc, char *argv[]) mode_t open_mode = S_IRUSR | S_IRGRP | S_IROTH | S_IWUSR | S_IWGRP | S_IWOTH; int fd; - if ( argc != 3 ) - { - fprintf(stderr, "Usage: %s <domain_id> <tot_pages>\n", argv[0]); - return -1; - } - /* Initialise domain paging */ - paging = xenpaging_init(atoi(argv[1]), atoi(argv[2])); + paging = xenpaging_init(argc, argv); if ( paging == NULL ) { - fprintf(stderr, "Error initialising paging"); + fprintf(stderr, "Error initialising paging\n"); return 1; } xch = paging->xc_handle; - DPRINTF("starting %s %u %d\n", argv[0], paging->mem_event.domain_id, paging->target_tot_pages); + DPRINTF("starting %s for domain_id %u with pagefile %s\n", argv[0], paging->mem_event.domain_id, filename); /* Open file */ - sprintf(filename, "page_cache_%u", paging->mem_event.domain_id); fd = open(filename, open_flags, open_mode); if ( fd < 0 ) { diff -r 434f0b4da914 -r a51d4fab351d tools/xenpaging/xenpaging.h --- a/tools/xenpaging/xenpaging.h +++ b/tools/xenpaging/xenpaging.h @@ -52,6 +52,7 @@ typedef struct xenpaging { int num_paged_out; int target_tot_pages; int policy_mru_size; + int debug; unsigned long pagein_queue[XENPAGING_PAGEIN_QUEUE_SIZE]; } xenpaging_t; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
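For readers who want to try the option handling in isolation, here is a minimal sketch of a getopt_long() parser for the interface described in the usage text. It is not the patch's code: struct pager_opts and parse_pager_opts are hypothetical names, and the long-option table follows the usage text (--max_memkb maps to 'm', --mru_size maps to 'r'), whereas the table in the patch maps "mru_size" to 'm' and has no "max_memkb" entry. As in the patch, a domain id of 0 is treated as missing.

```c
#include <assert.h>
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical container for the parsed options. */
struct pager_opts {
    int domain_id;
    const char *pagefile;
    int max_pages;
    int mru_size;
    int verbose;
};

/* Returns 0 on success, 1 on a usage error or missing required option. */
static int parse_pager_opts(struct pager_opts *o, int argc, char *argv[])
{
    static const char sopts[] = "hvd:f:m:r:";
    static const struct option lopts[] = {
        {"help",      no_argument,       NULL, 'h'},
        {"verbose",   no_argument,       NULL, 'v'},
        {"domain",    required_argument, NULL, 'd'},
        {"pagefile",  required_argument, NULL, 'f'},
        {"max_memkb", required_argument, NULL, 'm'},
        {"mru_size",  required_argument, NULL, 'r'},
        {NULL, 0, NULL, 0}
    };
    int ch;

    assert(o != NULL);

    optind = 1;     /* restart scanning so the function can be reused */
    while ((ch = getopt_long(argc, argv, sopts, lopts, NULL)) != -1) {
        switch (ch) {
        case 'd':
            o->domain_id = atoi(optarg);
            break;
        case 'f':
            o->pagefile = optarg;
            break;
        case 'm':
            o->max_pages = atoi(optarg) >> 2;   /* KiB to 4 KiB pages */
            break;
        case 'r':
            o->mru_size = atoi(optarg);
            break;
        case 'v':
            o->verbose = 1;
            break;
        default:    /* 'h', '?' and anything unexpected */
            return 1;
        }
    }

    /* Both the pagefile and the domain id are required. */
    if (!o->pagefile || !o->domain_id)
        return 1;
    return 0;
}
```

Called with the arguments "-d 5 -f /tmp/pagefile --max_memkb=4096", this fills in domain_id 5 and max_pages 1024. The pagefile path here is only an example.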
Olaf Hering
2011-Nov-02 14:45 UTC
[Xen-devel] [PATCH 4 of 4] xenpaging: initial libxl support
# HG changeset patch # User Olaf Hering <olaf@aepfle.de> # Date 1320244864 -3600 # Node ID ab5406a5b1d01e3828f0dcd833f99b70e4fbad72 # Parent a51d4fab351d2d1a38b82cbd7ad925f76fce9e9a xenpaging: initial libxl support Add initial support to libxl for starting xenpaging. The patch adds three new config options: actmem=<int>, the amount of memory in MiB for the guest xenpaging_file=<string>, pagefile to use (optional) xenpaging_extra=[ ''string'', ''string'' ], additional args for xenpaging (optional) If ''actmem='' is not specified in config file, xenpaging will not start. If ''xenpaging_file='' is not specified in config file, /var/lib/xen/xenpaging/<domain_name>.<domaind_id>.paging is used. Signed-off-by: Olaf Hering <olaf@aepfle.de> diff -r a51d4fab351d -r ab5406a5b1d0 tools/libxl/libxl.h --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -261,6 +261,7 @@ int libxl_init_dm_info(libxl_ctx *ctx, typedef int (*libxl_console_ready)(libxl_ctx *ctx, uint32_t domid, void *priv); int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config, libxl_console_ready cb, void *priv, uint32_t *domid); int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config, libxl_console_ready cb, void *priv, uint32_t *domid, int restore_fd); +int libxl__create_xenpaging(libxl_ctx *ctx, libxl_domain_config *d_config, uint32_t domid, char *path); void libxl_domain_config_destroy(libxl_domain_config *d_config); int libxl_domain_suspend(libxl_ctx *ctx, libxl_domain_suspend_info *info, uint32_t domid, int fd); diff -r a51d4fab351d -r ab5406a5b1d0 tools/libxl/libxl_create.c --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -429,6 +429,122 @@ retry_transaction: return rc; } +static int create_xenpaging(libxl__gc *gc, char *dom_name, uint32_t domid, + libxl_domain_build_info *b_info) +{ + libxl__spawner_starting *buf_starting; + libxl_string_list xpe = b_info->u.hvm.xenpaging_extra; + int i, rc; + char *logfile; + int logfile_w, null; + char *path, 
*dom_path, *value; + char **args; + char *xp; + flexarray_t *xp_args; + libxl_ctx *ctx = libxl__gc_owner(gc); + + /* Nothing to do */ + if (!b_info->tot_memkb) + return 0; + + /* Check if paging is already enabled */ + dom_path = libxl__xs_get_dompath(gc, domid); + if (!dom_path ) { + rc = ERROR_NOMEM; + goto out; + } + path = libxl__sprintf(gc, "%s/xenpaging/state", dom_path); + if (!path ) { + rc = ERROR_NOMEM; + goto out; + } + value = xs_read(ctx->xsh, XBT_NULL, path, NULL); + rc = value && strcmp(value, "running") == 0; + free(value); + /* Already running, nothing to do */ + if (rc) + return 0; + + /* Check if xenpaging is present */ + xp = libxl__abs_path(gc, "xenpaging", libxl_libexec_path()); + if (access(xp, X_OK) < 0) { + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "%s is not executable", xp); + rc = ERROR_FAIL; + goto out; + } + + /* Initialise settings for child */ + buf_starting = calloc(sizeof(*buf_starting), 1); + if (!buf_starting) { + rc = ERROR_NOMEM; + goto out; + } + buf_starting->domid = domid; + buf_starting->dom_path = dom_path; + buf_starting->pid_path = "xenpaging/xenpaging-pid"; + buf_starting->for_spawn = calloc(sizeof(libxl__spawn_starting), 1); + if (!buf_starting->for_spawn) { + rc = ERROR_NOMEM; + goto out; + } + + /* Assemble arguments for xenpaging */ + xp_args = flexarray_make(8, 1); + if (!xp_args) { + rc = ERROR_NOMEM; + goto out; + } + /* Set executable path */ + flexarray_append(xp_args, xp); + + /* Append pagefile option */ + flexarray_append(xp_args, "-f"); + if (b_info->u.hvm.xenpaging_file) + flexarray_append(xp_args, b_info->u.hvm.xenpaging_file); + else + flexarray_append(xp_args, libxl__sprintf(gc, "%s/%s.%u.paging", + libxl_xenpaging_dir_path(), dom_name, domid)); + + /* Set maximum amount of memory xenpaging should handle */ + flexarray_append(xp_args, "-m"); + flexarray_append(xp_args, libxl__sprintf(gc, "%d", b_info->max_memkb)); + + /* Append extra args for pager */ + for (i = 0; xpe && xpe[i]; i++) + 
flexarray_append(xp_args, xpe[i]); + /* Append domid for pager */ + flexarray_append(xp_args, "-d"); + flexarray_append(xp_args, libxl__sprintf(gc, "%u", domid)); + flexarray_append(xp_args, NULL); + args = (char **) flexarray_contents(xp_args); + + /* Initialise logfile */ + libxl_create_logfile(ctx, libxl__sprintf(gc, "xenpaging-%s", dom_name), + &logfile); + logfile_w = open(logfile, O_WRONLY|O_CREAT, 0644); + free(logfile); + null = open("/dev/null", O_RDONLY); + + /* Spawn the child */ + rc = libxl__spawn_spawn(gc, buf_starting->for_spawn, "xenpaging", + libxl_spawner_record_pid, buf_starting); + if (rc < 0) + goto out_close; + if (!rc) { /* inner child */ + setsid(); + /* Finally run xenpaging */ + libxl__exec(null, logfile_w, logfile_w, xp, args); + } + rc = libxl__spawn_confirm_offspring_startup(gc, 5, "xenpaging", path, + "running", buf_starting); +out_close: + close(null); + close(logfile_w); + free(args); +out: + return rc; +} + static int do_domain_create(libxl__gc *gc, libxl_domain_config *d_config, libxl_console_ready cb, void *priv, uint32_t *domid_out, int restore_fd) @@ -614,6 +730,16 @@ static int do_domain_create(libxl__gc *g goto error_out; } + if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM) { + ret = create_xenpaging(gc, d_config->dm_info.dom_name, domid, + &d_config->b_info); + if (ret) { + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, + "Failed to start xenpaging.\n"); + goto error_out; + } + } + *domid_out = domid; return 0; diff -r a51d4fab351d -r ab5406a5b1d0 tools/libxl/libxl_dom.c --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -108,7 +108,7 @@ int libxl__build_post(libxl__gc *gc, uin if (info->cpuid != NULL) libxl_cpuid_set(ctx, domid, info->cpuid); - ents = libxl__calloc(gc, 12 + (info->max_vcpus * 2) + 2, sizeof(char *)); + ents = libxl__calloc(gc, 14 + (info->max_vcpus * 2) + 2, sizeof(char *)); ents[0] = "memory/static-max"; ents[1] = libxl__sprintf(gc, "%d", info->max_memkb); ents[2] = "memory/target"; @@ -121,9 
+121,11 @@ int libxl__build_post(libxl__gc *gc, uin ents[9] = libxl__sprintf(gc, "%"PRIu32, state->store_port); ents[10] = "store/ring-ref"; ents[11] = libxl__sprintf(gc, "%lu", state->store_mfn); + ents[12] = "memory/target-tot_pages"; + ents[13] = libxl__sprintf(gc, "%d", info->tot_memkb); for (i = 0; i < info->max_vcpus; i++) { - ents[12+(i*2)] = libxl__sprintf(gc, "cpu/%d/availability", i); - ents[12+(i*2)+1] = (i && info->cur_vcpus && !(info->cur_vcpus & (1 << i))) + ents[14+(i*2)] = libxl__sprintf(gc, "cpu/%d/availability", i); + ents[14+(i*2)+1] = (i && info->cur_vcpus && !(info->cur_vcpus & (1 << i))) ? "offline" : "online"; } diff -r a51d4fab351d -r ab5406a5b1d0 tools/libxl/libxl_memory.txt --- a/tools/libxl/libxl_memory.txt +++ b/tools/libxl/libxl_memory.txt @@ -1,28 +1,28 @@ /* === Domain memory breakdown: HVM guests ================================= - + +----------+ + - | | shadow | | - | +----------+ | - overhead | | extra | | - | | external | | - | +----------+ + | - | | extra | | | - | | internal | | | - + +----------+ + | | footprint - | | video | | | | - | +----------+ + + | | xen | - | | | | | | actual | maximum | - | | | | | | target | | - | | guest | | | build | | | - | | | | | start | | | - static | | | | | | | | - maximum | +----------+ | + + + + - | | | | - | | | | - | | balloon | | build - | | | | maximum - | | | | - + +----------+ + + + +----------+ + + | | shadow | | + | +----------+ | + overhead | | extra | | + | | external | | + | +----------+ + | + | | extra | | | + | | internal | | | + + +----------+ + | | footprint + | | video | | | | + | +----------+ + + + | | xen | + | | | | guest OS | | | actual | maximum | + | | guest | | real RAM | | | target | | + | | | | | | build | | | + | |----------+ + | | start + | | + static | | paging | | | | | + maximum | +----------+ | + + + + | | | | + | | | | + | | balloon | | build + | | | | maximum + | | | | + + +----------+ + extra internal = LIBXL_MAXMEM_CONSTANT @@ -34,6 +34,17 @@ 
libxl_domain_setmaxmem -> xen maximum libxl_set_memory_target -> actual target + build maximum = RAM as seen inside the virtual machine + Guest OS has to configure itself for this amount of memory + Increase/Decrease via memory hotplug of virtual hardware. + xl mem-max + build start = RAM usable by the guest OS + Guest OS sees balloon driver as memory hog + Increase/Decrease via commands to the balloon driver + xl mem-set + actual target = RAM allocated for the guest + Increase/Decrease via commands to paging daemon + xl mem-paging_target (?) === Domain memory breakdown: PV guests ================================= diff -r a51d4fab351d -r ab5406a5b1d0 tools/libxl/libxl_types.idl --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -157,6 +157,7 @@ libxl_domain_build_info = Struct("domain ("tsc_mode", integer), ("max_memkb", uint32), ("target_memkb", uint32), + ("tot_memkb", uint32), ("video_memkb", uint32), ("shadow_memkb", uint32), ("disable_migrate", bool), @@ -174,6 +175,8 @@ libxl_domain_build_info = Struct("domain ("vpt_align", bool), ("timer_mode", integer), ("nested_hvm", bool), + ("xenpaging_file", string), + ("xenpaging_extra", libxl_string_list), ])), ("pv", Struct(None, [("kernel", libxl_file_reference), ("slack_memkb", uint32), diff -r a51d4fab351d -r ab5406a5b1d0 tools/libxl/xl_cmdimpl.c --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -346,6 +346,7 @@ static void printf_info(int domid, printf("\t\t\t(firmware %s)\n", b_info->u.hvm.firmware); printf("\t\t\t(video_memkb %d)\n", b_info->video_memkb); printf("\t\t\t(shadow_memkb %d)\n", b_info->shadow_memkb); + printf("\t\t\t(tot_memkb %d)\n", b_info->tot_memkb); printf("\t\t\t(pae %d)\n", b_info->u.hvm.pae); printf("\t\t\t(apic %d)\n", b_info->u.hvm.apic); printf("\t\t\t(acpi %d)\n", b_info->u.hvm.acpi); @@ -380,6 +381,7 @@ static void printf_info(int domid, printf("\t\t\t(spicedisable_ticketing %d)\n", dm_info->spicedisable_ticketing); printf("\t\t\t(spiceagent_mouse 
%d)\n", dm_info->spiceagent_mouse); + printf("\t\t\t(xenpaging_file %s)\n", b_info->u.hvm.xenpaging_file); printf("\t\t)\n"); break; case LIBXL_DOMAIN_TYPE_PV: @@ -515,6 +517,28 @@ static void parse_disk_config(XLU_Config parse_disk_config_multistring(config, 1, &spec, disk); } +static void parse_xenpaging_extra(const XLU_Config *config, libxl_string_list *xpe) +{ + XLU_ConfigList *args; + libxl_string_list l; + const char *val; + int nr_args = 0, i; + + if (xlu_cfg_get_list(config, "xenpaging_extra", &args, &nr_args, 1)) + return; + + l = xmalloc(sizeof(char*)*(nr_args + 1)); + if (!l) + return; + + l[nr_args] = NULL; + for (i = 0; i < nr_args; i++) { + val = xlu_cfg_get_listitem(args, i); + l[i] = val ? strdup(val) : NULL; + } + *xpe = l; +} + static void parse_config_data(const char *configfile_filename_report, const char *configfile_data, int configfile_len, @@ -620,6 +644,9 @@ static void parse_config_data(const char if (!xlu_cfg_get_long (config, "maxmem", &l)) b_info->max_memkb = l * 1024; + if (!xlu_cfg_get_long (config, "actmem", &l)) + b_info->tot_memkb = l * 1024; + if (xlu_cfg_get_string (config, "on_poweroff", &buf)) buf = "destroy"; if (!parse_action_on_shutdown(buf, &d_config->on_poweroff)) { @@ -695,6 +722,10 @@ static void parse_config_data(const char b_info->u.hvm.timer_mode = l; if (!xlu_cfg_get_long (config, "nestedhvm", &l)) b_info->u.hvm.nested_hvm = l; + + xlu_cfg_replace_string (config, "xenpaging_file", &b_info->u.hvm.xenpaging_file); + parse_xenpaging_extra(config, &b_info->u.hvm.xenpaging_extra); + break; case LIBXL_DOMAIN_TYPE_PV: { diff -r a51d4fab351d -r ab5406a5b1d0 tools/xenpaging/xenpaging.c --- a/tools/xenpaging/xenpaging.c +++ b/tools/xenpaging/xenpaging.c @@ -40,6 +40,8 @@ /* Defines number of mfns a guest should use at a time, in KiB */ #define WATCH_TARGETPAGES "memory/target-tot_pages" +/* Defines path to startup confirmation */ +#define WATCH_STARTUP "xenpaging/state" static char *watch_target_tot_pages; static char 
*dom_path; static char watch_token[16]; @@ -772,6 +774,20 @@ static int evict_pages(xenpaging_t *pagi return num; } +static void xenpaging_confirm_startup(xenpaging_t *paging) +{ + xc_interface *xch = paging->xc_handle; + char *path; + int len; + + len = asprintf(&path, "%s/%s", dom_path, WATCH_STARTUP); + if ( len < 0 ) + return; + DPRINTF("confirming startup in %s\n", path); + xs_write(paging->xs_handle, XBT_NULL, path, "running", len); + free(path); +} + int main(int argc, char *argv[]) { struct sigaction act; @@ -830,6 +846,9 @@ int main(int argc, char *argv[]) /* listen for page-in events to stop pager */ create_page_in_thread(paging); + /* Confirm startup to caller */ + xenpaging_confirm_startup(paging); + /* Swap pages in and out */ while ( 1 ) { _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
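The argument vector that create_xenpaging() assembles for the pager can be sketched without libxl. The following is an illustration only: struct pager_args and build_pager_args are hypothetical names, fixed-size buffers stand in for libxl's gc allocations and flexarray, and the xenpaging_extra arguments are left out for brevity. The order of the arguments and the default pagefile pattern "<dir>/<domain_name>.<domid>.paging" follow the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the argument strings and the argv array. */
struct pager_args {
    char pagefile[256];
    char max_memkb[16];
    char domid[16];
    const char *argv[8];    /* exe -f file -m kb -d id NULL */
};

/*
 * Fill pa->argv for a xenpaging invocation.
 * pagefile may be NULL, in which case "<dir>/<dom_name>.<domid>.paging"
 * is used. Returns the number of arguments, or -1 if the pagefile path
 * does not fit into the buffer.
 */
static int build_pager_args(struct pager_args *pa, const char *exe,
                            const char *pagefile, const char *dir,
                            const char *dom_name, unsigned int domid,
                            unsigned int max_memkb)
{
    int len, n = 0;

    assert(pa && exe && dir && dom_name);

    if (pagefile)
        len = snprintf(pa->pagefile, sizeof(pa->pagefile), "%s", pagefile);
    else
        len = snprintf(pa->pagefile, sizeof(pa->pagefile), "%s/%s.%u.paging",
                       dir, dom_name, domid);
    if (len < 0 || (size_t)len >= sizeof(pa->pagefile))
        return -1;

    snprintf(pa->max_memkb, sizeof(pa->max_memkb), "%u", max_memkb);
    snprintf(pa->domid, sizeof(pa->domid), "%u", domid);

    pa->argv[n++] = exe;
    pa->argv[n++] = "-f";
    pa->argv[n++] = pa->pagefile;
    pa->argv[n++] = "-m";
    pa->argv[n++] = pa->max_memkb;
    pa->argv[n++] = "-d";
    pa->argv[n++] = pa->domid;
    pa->argv[n] = NULL;     /* exec-style terminator */
    return n;
}
```

For a guest named "guest" with domid 7 and no xenpaging_file= setting, the third argument becomes "/var/lib/xen/xenpaging/guest.7.paging", matching the default described in the commit message.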
Stefano Stabellini
2011-Nov-07 11:02 UTC
Re: [Xen-devel] [PATCH 4 of 4] xenpaging: initial libxl support
On Wed, 2 Nov 2011, Olaf Hering wrote:
> diff -r a51d4fab351d -r ab5406a5b1d0 tools/libxl/libxl_create.c
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -429,6 +429,122 @@ retry_transaction:
>      return rc;
>  }
>
> +static int create_xenpaging(libxl__gc *gc, char *dom_name, uint32_t domid,
> +                            libxl_domain_build_info *b_info)
> +{
> +    libxl__spawner_starting *buf_starting;
> +    libxl_string_list xpe = b_info->u.hvm.xenpaging_extra;
> +    int i, rc;
> +    char *logfile;
> +    int logfile_w, null;
> +    char *path, *dom_path, *value;
> +    char **args;
> +    char *xp;
> +    flexarray_t *xp_args;
> +    libxl_ctx *ctx = libxl__gc_owner(gc);
> +
> +    /* Nothing to do */
> +    if (!b_info->tot_memkb)
> +        return 0;

I think that using tot_memkb to store the actual memory target and then
checking whether it is 0 to detect if paging is active/inactive is
confusing.
If tot_memkb is the pod target of the domain, we should be coherent and
set it equal to target_memkb when paging is inactive.

> @@ -34,6 +34,17 @@
>      libxl_domain_setmaxmem -> xen maximum
>      libxl_set_memory_target -> actual target
>
> + build maximum = RAM as seen inside the virtual machine
> +     Guest OS has to configure itself for this amount of memory
> +     Increase/Decrease via memory hotplug of virtual hardware.
> +     xl mem-max
> + build start = RAM usable by the guest OS
> +     Guest OS sees balloon driver as memory hog
> +     Increase/Decrease via commands to the balloon driver
> +     xl mem-set
> + actual target = RAM allocated for the guest
> +     Increase/Decrease via commands to paging daemon
> +     xl mem-paging_target (?)

maybe xl mem-paging is specific enough

> === Domain memory breakdown: PV guests =================================
>
> diff -r a51d4fab351d -r ab5406a5b1d0 tools/libxl/libxl_types.idl
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -157,6 +157,7 @@ libxl_domain_build_info = Struct("domain
>      ("tsc_mode", integer),
>      ("max_memkb", uint32),
>      ("target_memkb", uint32),
> +    ("tot_memkb", uint32),
>      ("video_memkb", uint32),
>      ("shadow_memkb", uint32),
>      ("disable_migrate", bool),

I would like a comment somewhere of what tot_memkb is supposed to
represent.
Olaf Hering
2011-Nov-07 12:55 UTC
Re: [Xen-devel] [PATCH 4 of 4] xenpaging: initial libxl support
On Mon, Nov 07, Stefano Stabellini wrote:
> I think that using tot_memkb to store the actual memory target and then
> checking whether it is 0 to detect if paging is active/inactive is
> confusing.

tot_memkb is only set when it was specified in the config file, and
perhaps later when a suitable xl mem-FOO command and a related watch on
the target-tot_pages node are added.

> If tot_memkb is the pod target of the domain, we should be coherent and
> set it equal to target_memkb when paging is inactive.

So far PoD and paging are unrelated and mean different things.
I think the difference between max_memkb and tot_memkb could be the
trigger to start paging.

> > === Domain memory breakdown: PV guests =================================
> >
> > diff -r a51d4fab351d -r ab5406a5b1d0 tools/libxl/libxl_types.idl
> > --- a/tools/libxl/libxl_types.idl
> > +++ b/tools/libxl/libxl_types.idl
> > @@ -157,6 +157,7 @@ libxl_domain_build_info = Struct("domain
> >      ("tsc_mode", integer),
> >      ("max_memkb", uint32),
> >      ("target_memkb", uint32),
> > +    ("tot_memkb", uint32),
> >      ("video_memkb", uint32),
> >      ("shadow_memkb", uint32),
> >      ("disable_migrate", bool),
>
> I would like a comment somewhere of what tot_memkb is supposed to
> represent.

Yes, sorry, documentation is lacking in that change.

Olaf
Stefano Stabellini
2011-Nov-07 13:28 UTC
Re: [Xen-devel] [PATCH 4 of 4] xenpaging: initial libxl support
On Mon, 7 Nov 2011, Olaf Hering wrote:
> > If tot_memkb is the pod target of the domain, we should be coherent and
> > set it equal to target_memkb when paging is inactive.
>
> So far PoD and paging are unrelated and mean different things.
> I think the difference between max_memkb and tot_memkb could be the
> trigger to start paging.

Yes, I think it would be better.
On Mon, Nov 07, Stefano Stabellini wrote:
> On Mon, 7 Nov 2011, Olaf Hering wrote:
> > > If tot_memkb is the pod target of the domain, we should be coherent and
> > > set it equal to target_memkb when paging is inactive.
> >
> > So far PoD and paging are unrelated and mean different things.
> > I think the difference between max_memkb and tot_memkb could be the
> > trigger to start paging.
>
> Yes, I think it would be better.

I have to disagree here.

After looking at the code in parse_config_data(), tot_memkb is only set
if actmem= is listed in the configfile. And if actmem= is set, it's the
trigger to run xenpaging and let it work toward the specified number.
So checking for a non-zero tot_memkb in create_xenpaging() looks like
the correct way to me to decide whether xenpaging should be started.

Olaf
Stefano Stabellini
2011-Nov-21 10:53 UTC
Re: [PATCH 4 of 4] xenpaging: initial libxl support
On Sun, 20 Nov 2011, Olaf Hering wrote:
> On Mon, Nov 07, Stefano Stabellini wrote:
>
> > On Mon, 7 Nov 2011, Olaf Hering wrote:
> > > > If tot_memkb is the pod target of the domain, we should be coherent and
> > > > set it equal to target_memkb when paging is inactive.
> > >
> > > So far PoD and paging are unrelated and mean different things.
> > > I think the difference between max_memkb and tot_memkb could be the
> > > trigger to start paging.
> >
> > Yes, I think it would be better.
>
> I have to disagree here.
>
> After looking at the code in parse_config_data(), tot_memkb is only set
> if actmem= is listed in the configfile. And if actmem= is set, it's the
> trigger to run xenpaging and let it work toward the specified number.
> So checking for a non-zero tot_memkb in create_xenpaging() looks like
> the correct way to me to decide whether xenpaging should be started.

What if tot_memkb is bigger than target_memkb? Or even bigger than
max_memkb?
On Mon, Nov 21, Stefano Stabellini wrote:
> what if tot_memkb is bigger than target_memkb? Or even bigger than
> max_memkb?

tot_memkb is unrelated to target_memkb, also somewhat unrelated to
max_memkb.

xenpaging will look at the tot_memkb value (at "memory/target-tot_pages"
to be precise) and try to reach that number of domain->tot_pages. If the
tot_memkb number is larger than max_memkb nothing will happen.

Right now there is not much checking anyway, memory=1024 maxmem=1 in the
config is accepted in my testing.

Olaf
On Mon, Nov 21, 2011 at 3:13 PM, Olaf Hering <olaf@aepfle.de> wrote:
> On Mon, Nov 21, Stefano Stabellini wrote:
>
>> what if tot_memkb is bigger than target_memkb? Or even bigger than
>> max_memkb?
>
> tot_memkb is unrelated to target_memkb, also somewhat unrelated to
> max_memkb.

I'd love to contribute to this discussion, but I don't know what these
different names mean. I think what we need to talk about is all of
the different memory parameters we need, and then what each of the
individual names mean -- what they currently map to, and then what we
want them to map to. At very least they should be in a comment
somewhere.

>
> xenpaging will look at tot_memkb value (at "memory/target-tot_pages" to
> be precise) and try to reach that number of domain->tot_pages. If the
> tot_memkb number is larger than max_memkb nothing will happen.
>
>
> Right now there is not much checking anyway, memory=1024 maxmem=1 in the
> config is accepted in my testing.
>
> Olaf
On Mon, 2011-11-21 at 16:40 +0000, George Dunlap wrote:
> On Mon, Nov 21, 2011 at 3:13 PM, Olaf Hering <olaf@aepfle.de> wrote:
> > On Mon, Nov 21, Stefano Stabellini wrote:
> >
> >> what if tot_memkb is bigger than target_memkb? Or even bigger than
> >> max_memkb?
> >
> > tot_memkb is unrelated to target_memkb, also somewhat unrelated to
> > max_memkb.
>
> I'd love to contribute to this discussion, but I don't know what these
> different names mean. I think what we need to talk about is all of
> the different memory parameters we need, and then what each of the
> individual names mean -- what they currently map to, and then what we
> want them to map to. At very least they should be in a comment
> somewhere.

tools/libxl/libxl_memory.txt covers some of that (and Olaf patched it
IIRC) although it is not so clear on the mapping to xl configuration
keys.

Ian.

> >
> > xenpaging will look at tot_memkb value (at "memory/target-tot_pages" to
> > be precise) and try to reach that number of domain->tot_pages. If the
> > tot_memkb number is larger than max_memkb nothing will happen.
> >
> >
> > Right now there is not much checking anyway, memory=1024 maxmem=1 in the
> > config is accepted in my testing.
> >
> > Olaf
Stefano Stabellini
2011-Nov-22 10:58 UTC
Re: [PATCH 4 of 4] xenpaging: initial libxl support
On Mon, 21 Nov 2011, Olaf Hering wrote:
> On Mon, Nov 21, Stefano Stabellini wrote:
>
> > what if tot_memkb is bigger than target_memkb? Or even bigger than
> > max_memkb?
>
> tot_memkb is unrelated to target_memkb, also somewhat unrelated to
> max_memkb.

At build time ballooning is not active yet and target_memkb represents
the amount of memory available to the VM plus the videoram (see
libxl__build_hvm).
As a consequence I think that tot_memkb cannot be higher than
target_memkb - videoram_memkb (that is build_start in the diagram).

So, what is going to happen if tot_memkb is higher than target_memkb -
videoram_memkb?
Also, what is going to happen if it is lower?

> xenpaging will look at tot_memkb value (at "memory/target-tot_pages" to
> be precise) and try to reach that number of domain->tot_pages. If the
> tot_memkb number is larger than max_memkb nothing will happen.

How is it going to reach the tot_pages target? Where is it going to take
the memory from? Is it going to automatically page out memory from other
VMs?

> Right now there is not much checking anyway, memory=1024 maxmem=1 in the
> config is accepted in my testing.

That is a correct configuration: it means that the domain has 1024MB of
RAM but it cannot allocate any more (maximum allocation limit being 1MB).
maxmem doesn't influence the current memory of the VM, only future
allocations.
On Tue, Nov 22, Stefano Stabellini wrote:
> On Mon, 21 Nov 2011, Olaf Hering wrote:
> > On Mon, Nov 21, Stefano Stabellini wrote:
> >
> > > what if tot_memkb is bigger than target_memkb? Or even bigger than
> > > max_memkb?
> >
> > tot_memkb is unrelated to target_memkb, also somewhat unrelated to
> > max_memkb.
>
> At build time ballooning is not active yet and target_memkb represents
> the amount of memory available to the VM plus the videoram (see
> libxl__build_hvm).
> As a consequence I think that tot_memkb cannot be higher than
> target_memkb - videoram_memkb (that is build_start in the diagram).

It can, because with xenpaging the target_memkb turns from real memory
into virtual memory, and tot_memkb is the new amount of real memory.

The actual checking whether tot_memkb/target_memkb/max_memkb are sane
can either be done when they are changed with xl mem-XY, as is done now,
or we add new code to do such checking already during config parsing.

> So, what is going to happen if tot_memkb is higher than target_memkb -
> videoram_memkb?

Nothing happens, since xenpaging is the only consumer of that variable
(via xenstore). See below.

> Also, what is going to happen if it is lower?

If it's lower, xenpaging will page out some pages, add them back if the
guest happens to access them, and page out some other pages. The guest
still has access to all memory it thinks it has (target_memkb).

> > xenpaging will look at tot_memkb value (at "memory/target-tot_pages" to
> > be precise) and try to reach that number of domain->tot_pages. If the
> > tot_memkb number is larger than max_memkb nothing will happen.
>
> How is it going to reach the tot_pages target? Where is it going to take
> the memory from? Is it going to automatically page out memory from other
> VMs?

xenpaging does not add new memory.
If it has no pages to page in and the target is still higher than
tot_pages, it will do nothing.

> > Right now there is not much checking anyway, memory=1024 maxmem=1 in the
> > config is accepted in my testing.
>
> That is a correct configuration: it means that the domain has 1024MB of
> RAM but it cannot allocate any more (maximum allocation limit being 1MB).
> maxmem doesn't influence the current memory of the VM, only future
> allocations.

It causes a stall in the host, perhaps due to an integer overflow (I have
not analyzed it yet).

Olaf
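The pager behaviour described in this reply can be summarised in a few lines. This is only a sketch of the decision as described in the thread, not the xenpaging source; `pager_action` and `pager_step()` are hypothetical names and all values are page counts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result: how many pages to evict or to bring back. */
struct pager_action {
    uint64_t page_out;
    uint64_t page_in;
};

/*
 * tot_pages:    pages currently allocated to the guest
 * target_pages: value derived from memory/target-tot_pages
 * paged_out:    pages currently held in the pagefile
 */
static struct pager_action pager_step(uint64_t tot_pages,
                                      uint64_t target_pages,
                                      uint64_t paged_out)
{
    struct pager_action act = { 0, 0 };

    if (tot_pages > target_pages) {
        /* Above target: evict the difference. */
        act.page_out = tot_pages - target_pages;
    } else if (tot_pages < target_pages) {
        /* Below target: only previously evicted pages can come back;
         * the pager never adds new memory. */
        uint64_t want = target_pages - tot_pages;
        act.page_in = want < paged_out ? want : paged_out;
    }
    assert(act.page_out == 0 || act.page_in == 0);
    return act;
}
```

With nothing paged out and a target above tot_pages, the result is all zeros, which matches the "nothing happens" case above.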
On Mon, Nov 21, 2011 at 3:13 PM, Olaf Hering <olaf@aepfle.de> wrote:
> On Mon, Nov 21, Stefano Stabellini wrote:
>
>> what if tot_memkb is bigger than target_memkb? Or even bigger than
>> max_memkb?
>
> tot_memkb is unrelated to target_memkb, also somewhat unrelated to
> max_memkb.

It seems to me the opposite: tot_memkb (as you're describing here) and
target_memkb both mean, "How much Xen memory the administrator wants
allocated to the VM." Before either paging or PoD, the only way to
modify the amount of memory allocated to a VM was via the balloon
driver. PoD introduced a mechanism that allows the domain builder to
start a VM with less memory than static_max, and allow the VM to run
until balloon driver can normalize things. Paging introduces a
separate mechanism for the administrator to modify the amount of
memory allocated to the VM.

It seems to me like paging and ballooning should both use
target_memkb. We just need to figure out how to make sure that paging
only comes on when it's needed. When it might be needed includes:
* For guests that don't have a balloon driver
* For guests whose balloon driver is not meeting target_memkb (either
because it's unresponsive, rebellious, or because it can't get more
memory from the guest OS)
* Potentially, between domain creation and the time the balloon driver
comes up (i.e., replacing PoD).

It seems like having some kind of a flag or setting would be better.
Various factors:
* Do we start the paging daemon?
* Do we use paging during boot? Only matters if max_memkb !=
target_memkb. If no, the domain builder uses PoD mode. If yes, the
domain builder will fill in target_memkb worth of guest memory, and
then fill the rest with swapped-out entries. (If max_memkb ==
target_memkb, domain builder fills in all entries.)
* When does the paging daemon respond to changes to target_memkb?
This could be:
 - Immediately (assume no balloon driver)
 - PoD mode: Start immediately, but when you notice the balloon driver
reaching the initial target_memkb, turn off, or switch into the next
mode
 - Fallback mode: Pay attention to changes in target_memkb, but don't
act immediately. Wait for paging_delay secs for the balloon driver to
handle it; if it doesn't respond, then start paging (and perhaps
switch to "Immediately" mode).

What do you think?

-George
Let's resume this discussion now to get it sorted out for 4.2.

On Tue, Nov 22, George Dunlap wrote:
> On Mon, Nov 21, 2011 at 3:13 PM, Olaf Hering <olaf@aepfle.de> wrote:
> > On Mon, Nov 21, Stefano Stabellini wrote:
> >
> >> what if tot_memkb is bigger than target_memkb? Or even bigger than
> >> max_memkb?
> >
> > tot_memkb is unrelated to target_memkb, also somewhat unrelated to
> > max_memkb.
>
> It seems to me the opposite: tot_memkb (as you're describing here) and
> target_memkb both mean, "How much Xen memory the administrator wants
> allocated to the VM." Before either paging or PoD, the only way to
> modify the amount of memory allocated to a VM was via the balloon
> driver. PoD introduced a mechanism that allows the domain builder to
> start a VM with less memory than static_max, and allow the VM to run
> until balloon driver can normalize things. Paging introduces a
> separate mechanism for the administrator to modify the amount of
> memory allocated to the VM.
>
> It seems to me like paging and ballooning should both use
> target_memkb. We just need to figure out how to make sure that paging
> only comes on when it's needed. When it might be needed includes:
> * For guests that don't have a balloon driver
> * For guests whose balloon driver is not meeting target_memkb (either
> because it's unresponsive, rebellious, or because it can't get more
> memory from the guest OS)
> * Potentially, between domain creation and the time the balloon driver
> comes up (i.e., replacing PoD).
>
> It seems like having some kind of a flag or setting would be better.
> Various factors:
> * Do we start the paging daemon?
> * Do we use paging during boot? Only matters if max_memkb !=
> target_memkb. If no, the domain builder uses PoD mode. If yes, the
> domain builder will fill in target_memkb worth of guest memory, and
> then fill the rest with swapped-out entries. (If max_memkb ==
> target_memkb, domain builder fills in all entries.)
> * When does the paging daemon respond to changes to target_memkb?
> This could be:
>  - Immediately (assume no balloon driver)
>  - PoD mode: Start immediately, but when you notice the balloon driver
> reaching the initial target_memkb, turn off, or switch into the next
> mode
>  - Fallback mode: Pay attention to changes in target_memkb, but don't
> act immediately. Wait for paging_delay secs for the balloon driver to
> handle it; if it doesn't respond, then start paging (and perhaps
> switch to "Immediately" mode).
>
> What do you think?

So there is that maxmem= setting to let the guest OS configure itself
for a given amount of pseudo-physical memory. Then there is a way to cut
down the guest OS memory usage, both with balloon driver in guest and
later with PoD.
Isn't paging a better (or: just different) way to control the memory
usage of a guest OS (It costs diskspace in dom0)?

If a guest OS is configured with maxmem=4096, but then restricted with
memory=3072 in the next line, why is maxmem= there in the first place?
Would it be clearer to say: The guest OS has a certain workload which
requires 3072MB. But maybe at some point the guest needs the full
4096MB, then it can access all of it at the cost of some IO due to
swapping in dom0.

I think the balloon driver in the guest is not really needed anymore, it
could just be there and do nothing. If there is physical memory to
release to the host, the pager can do it on behalf of the balloon
driver.

What if the config format is like this:

Do things as they were done until now (PoD + balloon driver):
 memory=3072
 maxmem=4096
 paging=0 (or not specified at all)

Do things with pager instead of balloon driver and/or PoD:
 memory=3072
 maxmem=4096
 paging=1, or xenpaging=1
 xenpaging_extra=[ '-f', '/path/to/pagefile_guestname' ] (optional)

And have mem-set adjust memory/target-tot_pages to tell pager about the
new target. The builder could create some sort of PoD for a paged guest
so that during startup only the amount of memory= needs to be allocated.
This needs to be implemented, right now a starting guest needs the full
amount of memory until the pager starts to page out pages.

Olaf
On Mon, 2012-01-09 at 19:21 +0000, Olaf Hering wrote:
> So there is that maxmem= setting to let the guest OS configure itself
> for a given amount of pseudo-physical memory. Then there is a way to cut
> down the guest OS memory usage, both with balloon driver in guest and
> later with PoD.
> Isnt paging a better (or: just different) way to control the memory
> usage of a guest OS (It costs diskspace in dom0)?

On the contrary, hypervisor swapping is definitely *much worse* than
using a balloon driver. The balloon driver was an innovation developed
specifically to avoid hypervisor swapping if at all possible[1]. We
need hypervisor swapping as a back-stop for situations where the balloon
driver is non-existent, or can't function immediately for some reason
(e.g., we've been using page-sharing to do memory overcommit and
suddenly have a bunch of pages un-shared); but it should always be a
last resort, and would ideally be mitigated by the balloon driver as
soon as possible.

[1] http://www.waldspurger.org/carl/papers/esx-mem-osdi02.pdf

> If a guest OS is configured with maxmem=4096, but then restricted with
> memory=3072 in the next line, why is maxmem= there in the first place?

Because for HVM guests at least, the guest OS will never recognize more
memory than was reported in the e820 map at boot. So if you boot with
maxmem=3072, the VM will *never* be able to see more than 3072 megabytes
of RAM. If you want to start a VM with 3072 MiB, but want the
flexibility of allowing the VM to use up to 4096 MiB at some point in
the future, you need to have 4096MiB in the e820 map.

> Would it clearer to say: The guest OS has a certain workload which
> requires 3072MB. But maybe at some point the guest needs the full
> 4096MB, then it can access all of it at the cost of some IO due to
> swapping in dom0.

The very best thing is if the guest does its own swapping.
If its working set is 4096MiB, but its available memory is only 3072MiB,
it's better to tell the guest it only has 3072MiB to work with, so it
can do the swapping optimally.

> I think the balloon driver in the guest is not really needed anymore, it
> could just be there and do nothing. IF there is physical memory to
> release to the host, the pager can do it on behalf of the balloon
> driver.

Hopefully it's clear that I disagree with this completely.

> What if the config format is like this:
>
> Do things as they were done until now (PoD + balloon driver):
>  memory=3072
>  maxmem=4096
>  paging=0 (or not specified at all)
>
> Do things with pager instead of balloon driver and/or PoD:
>  memory=3072
>  maxmem=4096
>  paging=1, or xenpaging=1
>  xenpaging_extra=[ '-f', '/path/to/pagefile_guestname' ] (optional)

Except that this makes paging and ballooning mutually exclusive. What
we want is to make them work together -- to have paging as a back-up
when ballooning fails (or isn't fast enough). We'd also like to
experiment with having a special-case of paging replace PoD; in that
case, we need to start with this special-case paging and then transition
into ballooning.

It may be that we don't have time to make them work together before the
4.2 release; in that case, we may need to make them mutually exclusive
for that release, to be fixed up in 4.3. But if we can make them work
together by 4.2, that would be the best; and in any case, we need to
make sure we're planning for them to work together, and minimize the
interface changes when we do.

> The builder could create some sort PoD for a paged guest so
> that during startup only the amount of memory= needs to be allocated.
> This needs to be implemented, right now a starting guest needs the full
> amount of memory until the pager starts to page-out pages.

Yes, the builder needs to be able to start a guest with pages pre-paged
out, for the same reason we introduced PoD: that is, if you page a guest
from 4096MiB down to 3072MiB, and then reboot the guest, you may only
have 3072MiB available. So if you want maxmem=4096 still, you need to
start with some pages "pre-paged" out. We need that mechanism for
robustness anyway; we can then experiment with using it to replace PoD.

 -George
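To put a number on the "pre-paged" case: a guest built with maxmem=4096 but only 3072 MiB available would need the difference represented as paged-out entries at build time. A small sketch of that arithmetic, assuming 4 KiB pages; `prepaged_pages()` is a hypothetical helper, not an existing libxc or libxl function.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE_KIB 4u  /* assumption: 4 KiB pages */

/* Pages the builder would have to mark as paged out at start. */
static uint64_t prepaged_pages(uint32_t maxmem_mib, uint32_t memory_mib)
{
    assert(memory_mib <= maxmem_mib);
    return (uint64_t)(maxmem_mib - memory_mib) * 1024u / SKETCH_PAGE_SIZE_KIB;
}
```

For maxmem=4096 and memory=3072 this gives (4096 - 3072) * 1024 / 4 = 262144 pages.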
On Tue, Jan 10, George Dunlap wrote:
> On Mon, 2012-01-09 at 19:21 +0000, Olaf Hering wrote:
> > So there is that maxmem= setting to let the guest OS configure itself
> > for a given amount of pseudo-physical memory. Then there is a way to cut
> > down the guest OS memory usage, both with balloon driver in guest and
> > later with PoD.
> > Isnt paging a better (or: just different) way to control the memory
> > usage of a guest OS (It costs diskspace in dom0)?
>
> On the contrary, hypervisor swapping is definitely *much worse* than
> using a balloon driver. The balloon driver was an innovation developed
> specifically to avoid hypervisor swapping if at all possible[1]. We
> need hypervisor swapping as a back-stop for situations where the balloon
> driver is non-existent, or can't function immediately for some reason
> (e.g., we've been using page-sharing to do memory overcommit and
> suddenly have a bunch of pages un-shared); but it should always be a
> last resort, and would ideally be mitigated by the balloon driver as
> soon as possible.

Isn't that up to the host admin to decide where to take the memory from?
So if it's acceptable to swap parts of a VM (independent from what the
guest OS thinks it has), so be it.

We just need the right knobs. So far we have two knobs:
 maxmem= xl mem-max
 memory= xl mem-set (and guest OS balloon driver via sysfs)

Another knob for paging is needed. A while ago you proposed two new
commands: mem-balloon_target and mem-swap_target. Perhaps these terms
should be used also in the config file to set the initial memory/target
and memory/target-tot_pages values. If the latter is set, start the
pager. And if the latter is called, start the pager if it doesn't run
already.

At some point we will have the code ready so that PoD and paging can
coexist, so that the guest's memory usage can grow on the guest's demand
as it does now. This is just a detail, independent from config options
and commands.

To summarize:
 maxmem=                      ; xl mem-max
 memory= mem-balloon_target=  ; xl mem-balloon_target, xl mem-set
 mem-swap_target=             ; xl mem-swap_target

The rule could be like this:
 mem-swap_target <= mem-balloon_target <= mem-max

> > What if the config format is like this:
> >
> > Do things as they were done until now (PoD + balloon driver):
> >  memory=3072
> >  maxmem=4096
> >  paging=0 (or not specified at all)
> >
> > Do things with pager instead of balloon driver and/or PoD:
> >  memory=3072
> >  maxmem=4096
> >  paging=1, or xenpaging=1
> >  xenpaging_extra=[ '-f', '/path/to/pagefile_guestname' ] (optional)
>
> Except that this makes paging and ballooning mutually exclusive. What
> we want is to make them work together -- to have paging as a back-up
> when ballooning fails (or isn't fast enough).

Ballooning in the guest will still work. For example via sysfs, the
guest driver can release pages any time it wants to. But with the above
knobs the balloon driver can still be tweaked from the host.

Olaf
At 15:58 +0100 on 11 Jan (1326297501), Olaf Hering wrote:
> On Tue, Jan 10, George Dunlap wrote:
>
> > On Mon, 2012-01-09 at 19:21 +0000, Olaf Hering wrote:
> > > So there is that maxmem= setting to let the guest OS configure itself
> > > for a given amount of pseudo-physical memory. Then there is a way to cut
> > > down the guest OS memory usage, both with balloon driver in guest and
> > > later with PoD.
> > > Isnt paging a better (or: just different) way to control the memory
> > > usage of a guest OS (It costs diskspace in dom0)?
> >
> > On the contrary, hypervisor swapping is definitely *much worse* than
> > using a balloon driver. The balloon driver was an innovation developed
> > specifically to avoid hypervisor swapping if at all possible[1]. We
> > need hypervisor swapping as a back-stop for situations where the balloon
> > driver is non-existent, or can't function immediately for some reason
> > (e.g., we've been using page-sharing to do memory overcommit and
> > suddenly have a bunch of pages un-shared); but it should always be a
> > last resort, and would ideally be mitigated by the balloon driver as
> > soon as possible.
>
> Isnt that up to the host admin to decide where to take the memory from?
> So if its acceptable to swap parts of a VM (independent from what the
> guest OS thinks it has), so be it.

Why? The _only_ reason I can imagine for wanting to use paging is when
the balloon driver can't or won't do its job. There's no advantage to
paging except that you can always force it to happen.

I think it makes sense to have two separate targets at the libxl level
(one for the balloon driver and one for the external pager/PoD), but at
the xl level (i.e. in config files and commands) there should be only one
target for memory-actually-in-use-by-the-guest and xl should DTRT to
achieve it. This interface is already baffling enough. :)

Tim.
On Wed, Jan 11, Tim Deegan wrote:
> > Isnt that up to the host admin to decide where to take the memory from?
> > So if its acceptable to swap parts of a VM (independent from what the
> > guest OS thinks it has), so be it.
>
> Why? The _only_ reason I can imagine for wanting to use paging is when
> the balloon driver can't or won't do its job. There's no advantage to
> paging except that you can always force it to happen.

Isn't that the whole point of paging, to make it happen at will without
the guest (or the application at process level) noticing it?

Olaf
At 17:38 +0100 on 11 Jan (1326303483), Olaf Hering wrote:
> On Wed, Jan 11, Tim Deegan wrote:
>
> > > Isnt that up to the host admin to decide where to take the memory from?
> > > So if its acceptable to swap parts of a VM (independent from what the
> > > guest OS thinks it has), so be it.
> >
> > Why? The _only_ reason I can imagine for wanting to use paging is when
> > the balloon driver can't or won't do its job. There's no advantage to
> > paging except that you can always force it to happen.
>
> Isnt that the whole point of paging, to make it happen at will without
> the guest (or the application at process level) noticing it?

Yes, but that's a _bad_ thing. :) If the guest can co-operate, you'll
get way better eviction choices, better performance, and better
accounting (since the I/O is done by the guest to guest-owned disk).

That's why I think both mechanisms should be visible up to the libxl
layer, but xl itself should just implement the one sensible policy:
try ballooning first, then page if that fails.

Tim.
On Wed, Jan 11, Tim Deegan wrote:
> At 17:38 +0100 on 11 Jan (1326303483), Olaf Hering wrote:
> > On Wed, Jan 11, Tim Deegan wrote:
> >
> > > > Isnt that up to the host admin to decide where to take the memory from?
> > > > So if its acceptable to swap parts of a VM (independent from what the
> > > > guest OS thinks it has), so be it.
> > >
> > > Why? The _only_ reason I can imagine for wanting to use paging is when
> > > the balloon driver can't or won't do its job. There's no advantage to
> > > paging except that you can always force it to happen.
> >
> > Isnt that the whole point of paging, to make it happen at will without
> > the guest (or the application at process level) noticing it?
>
> Yes, but that's a _bad_ thing. :) If the guest can co-operate, you'll
> get way better eviction choices, better performance, and better
> accounting (since the I/O is done by the guest to guest-owned disk).

Hmm, I think it's slightly like an 'rm -rf *' accident: bad, but allowed.

> That's why I think both mechanisms should be visible up to the libxl
> layer, but xl itself should just implement the one sensible policy:
> try ballooning first, then page if that fails.

So you are saying xl should take care of an improved mem-set command?
Perhaps by tweaking memory/target first, monitoring something like
tot_pages, and if memory/target isn't reached after some time, tweak
memory/target-tot_pages so that xenpaging takes care of the rest?

Olaf
On Thu, 2012-01-12 at 14:12 +0000, Olaf Hering wrote:
> On Wed, Jan 11, Tim Deegan wrote:
>
> > At 17:38 +0100 on 11 Jan (1326303483), Olaf Hering wrote:
> > > On Wed, Jan 11, Tim Deegan wrote:
> > >
> > > > > Isnt that up to the host admin to decide where to take the memory from?
> > > > > So if its acceptable to swap parts of a VM (independent from what the
> > > > > guest OS thinks it has), so be it.
> > > >
> > > > Why? The _only_ reason I can imagine for wanting to use paging is when
> > > > the balloon driver can't or won't do its job. There's no advantage to
> > > > paging except that you can always force it to happen.
> > >
> > > Isnt that the whole point of paging, to make it happen at will without
> > > the guest (or the application at process level) noticing it?
> >
> > Yes, but that's a _bad_ thing. :) If the guest can co-operate, you'll
> > get way better eviction choices, better performance, and better
> > accounting (since the I/O is done by the guest to guest-owned disk).
>
> Hmm, I think its slightly like an 'rm -rf *' accident: bad, but allowed.

Your analogy is bogus, the "support" for "rm -rf *" comes legitimately
(even if unfortunately) from the combined semantics of the shell and rm
and just falls out from the normal use cases. In the case of paging we
would have to add explicit support for doing something which we think
has no purpose.

I think it is OK for libxl to offer the flexibility to toolstack authors
to do this however they want but xl should only expose a single "target"
value.

> > That's why I think both mechanisms should be visible up to the libxl
> > layer, but xl itself should just implement the one sensible policy:
> > try ballooning first, then page if that fails.
>
> So you are saying xl should take care of an improved mem-set command?
> Perhaps by tweaking memory/target first, monitoring something like
> tot_pages, and if memory/target isnt reached after some time, tweak
> memory/target-tot_pages so that xenpaging takes care of the rest?

Yes, although there is no need for monitoring, it can just set
memory/target, wait, then set memory/target-tot_pages. If the balloon
driver has caught up then setting target-tot_pages will be a nop but it
still correctly reflects the desired state of the system and so we
should set it.

Another alternative would be for the pager to add some hysteresis after
it observes a change in the target before it starts "implementing" it.
This would allow the toolstack to just set things one shot. I'm not sure
that this is better though -- it makes things a little less flexible for
the toolstack and encodes policy in the pager.

Ian.
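The "set, wait, set" sequence described above can be sketched as follows. This is an illustration only: the xenstore writes are replaced by a small in-memory log, and `write_log`, `log_write()`, `node_path()` and `mem_set_sketch()` are hypothetical names. Both nodes are assumed to take a value in KiB, as the comment next to WATCH_TARGETPAGES in the patch suggests for memory/target-tot_pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum mem_node { NODE_TARGET, NODE_TARGET_TOT_PAGES };

/* Path relative to the domain's xenstore directory. */
static const char *node_path(enum mem_node n)
{
    return n == NODE_TARGET ? "memory/target" : "memory/target-tot_pages";
}

#define MAX_WRITES 4

/* Hypothetical stand-in for xenstore: remembers the writes in order. */
struct write_log {
    enum mem_node node[MAX_WRITES];
    uint64_t value_kib[MAX_WRITES];
    unsigned n;
};

static void log_write(struct write_log *log, enum mem_node node, uint64_t kib)
{
    assert(log != NULL && log->n < MAX_WRITES);
    log->node[log->n] = node;
    log->value_kib[log->n] = kib;
    log->n++;
}

/* Hypothetical wait hook; a real toolstack would sleep or poll here. */
typedef void (*wait_fn)(unsigned secs);

static void mem_set_sketch(struct write_log *log, uint64_t target_kib,
                           unsigned delay_secs, wait_fn wait)
{
    /* Ask the balloon driver first ... */
    log_write(log, NODE_TARGET, target_kib);
    if (wait != NULL)
        wait(delay_secs);
    /* ... then tell the pager; a nop if ballooning already caught up. */
    log_write(log, NODE_TARGET_TOT_PAGES, target_kib);
}
```

The second write is unconditional, following the argument above that it should always reflect the desired state even when the balloon driver has already reached the target.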