Implement vcpu soft affinity for credit1

Hello everyone,

Take 3 for this series.

Very briefly, it allows each vcpu to have:
 - a hard affinity, which they already have, and which we usually call pinning. This is the list of pcpus where a vcpu is allowed to run;
 - a soft affinity, which this series introduces. This is the list of pcpus where a vcpu *prefers* to run.

Once that is done, per-vcpu NUMA-aware scheduling is easily implemented on top of it, just by instructing libxl to issue the proper call to set up the soft affinity of the domain's vcpus to be equal to its node-affinity.

Wrt v2 review[*], I have addressed all the comments (see individual changelogs). In particular, I have completely redesigned the libxl interface. It now allows both of the following usage patterns:
 1. changing either soft affinity only, or hard affinity only, or both of them to the same value, and getting DEBUG or WARN output if that results in an inconsistent state;
 2. changing both hard and soft affinity, each one to its own value, and getting DEBUG or WARN output only if the *final* state is inconsistent.
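To make the hard/soft distinction concrete, here is a minimal C sketch of how a scheduler could use the two masks when choosing a pCPU: prefer an idle pCPU that is in both hard and soft affinity, fall back to an idle pCPU in hard affinity only, and never go outside hard affinity. It is not the credit1 code from this series; the `cpumask_t` type (a plain 64-bit mask), `first_set()` and `pick_cpu()` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical: one bit per pCPU, so at most 64 pCPUs. */
typedef uint64_t cpumask_t;

/* Index of the lowest set bit, or -1 if the mask is empty. */
static int first_set(cpumask_t m)
{
    for (int i = 0; i < 64; i++)
        if (m & ((cpumask_t)1 << i))
            return i;
    return -1;
}

/*
 * Two-step pCPU choice for a vCPU:
 *  1. an idle pCPU in hard AND soft affinity (where it prefers to run);
 *  2. otherwise, an idle pCPU in hard affinity only (where it may run).
 * Soft affinity never widens the set: a pCPU outside hard affinity is
 * never returned. Returns -1 if no allowed pCPU is idle.
 */
static int pick_cpu(cpumask_t idle, cpumask_t hard, cpumask_t soft)
{
    cpumask_t preferred = idle & hard & soft;

    if (preferred)
        return first_set(preferred);
    return first_set(idle & hard);
}
```

For example, with hard affinity 0xF0 (pCPUs 4-7) and soft affinity 0xC0 (pCPUs 6-7), `pick_cpu(0xFF, 0xF0, 0xC0)` returns 6, while `pick_cpu(0x3F, 0xF0, 0xC0)` falls back to 4 because pCPUs 6 and 7 are not idle.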
The series is also available here:

 git://xenbits.xen.org/people/dariof/xen.git numa/per-vcpu-affinity-v3

Thanks and Regards,
Dario

[*] http://bugs.xenproject.org/xen/mid/%3C20131113190852.18086.5437.stgit@Solace%3E/

---

Dario Faggioli (14):
a     xl: match output of vcpu-list with pinning syntax
n     libxl: sanitize error handling in libxl_get_max_{cpus,nodes}
ra    xl: allow for node-wise specification of vcpu pinning
a     xl: implement and enable dryrun mode for `xl vcpu-pin'
a     xl: test script for the cpumap parser (for vCPU pinning)
r     xen: sched: rename v->cpu_affinity into v->cpu_hard_affinity
r     xen: sched: introduce soft-affinity and use it instead d->node-affinity
      xen: derive NUMA node affinity from hard and soft CPU affinity
      xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
      libxc: get and set soft and hard affinity
      libxl: get and set soft affinity
      xl: enable getting and setting soft
*     xl: enable for specifying node-affinity in the config file
a     libxl: automatic NUMA placement affects soft affinity

n = new in v3
r = has been 'Reviewed-by'
a = has been 'Acked-by'
* = has been 'Acked-by' but the implementation changed a bit (IanJ explicitly said to point this out)

 docs/man/xl.cfg.pod.5                           |   66 +++-
 docs/man/xl.pod.1                               |   24 +-
 docs/misc/xl-numa-placement.markdown            |  164 +++++++----
 tools/libxc/xc_domain.c                         |   47 ++-
 tools/libxc/xenctrl.h                           |   44 +++
 tools/libxl/Makefile                            |    2
 tools/libxl/check-xl-vcpupin-parse              |  294 +++++++++++++++++++
 tools/libxl/check-xl-vcpupin-parse.data-example |   53 +++
 tools/libxl/libxl.c                             |  146 +++++++++
 tools/libxl/libxl.h                             |   30 ++
 tools/libxl/libxl_create.c                      |    6
 tools/libxl/libxl_dom.c                         |   23 +
 tools/libxl/libxl_types.idl                     |    4
 tools/libxl/libxl_utils.c                       |   48 +++
 tools/libxl/libxl_utils.h                       |   41 +--
 tools/libxl/xl_cmdimpl.c                        |  356 ++++++++++++++++-------
 tools/libxl/xl_cmdtable.c                       |    5
 tools/ocaml/libs/xc/xenctrl_stubs.c             |    8 -
 tools/python/xen/lowlevel/xc/xc.c               |    6
 xen/arch/x86/traps.c                            |   13 -
 xen/common/domain.c                             |   87 +++---
 xen/common/domctl.c                             |   54 +++
 xen/common/keyhandler.c                         |    4
 xen/common/sched_credit.c                       |  161 ++++------
 xen/common/sched_sedf.c                         |    2
 xen/common/schedule.c                           |   57 ++--
 xen/common/wait.c                               |   10 -
 xen/include/public/domctl.h                     |   15 +
 xen/include/xen/sched.h                         |   14 +
 29 files changed, 1377 insertions(+), 407 deletions(-)
 create mode 100755 tools/libxl/check-xl-vcpupin-parse
 create mode 100644 tools/libxl/check-xl-vcpupin-parse.data-example

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Dario Faggioli
2013-Nov-18 18:16 UTC
[PATCH v3 01/14] xl: match output of vcpu-list with pinning syntax
in fact, pinning to all the pcpus happens by specifying "all" (either on the command line or in the config file), while `xl vcpu-list' reports it as "any cpu".

Change this into something more consistent, by using "all" everywhere.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes since v1:
 * this patch was not there in v1. It is now, as using the same syntax for both input and output was requested during review.
---
 tools/libxl/xl_cmdimpl.c |   27 +++++++--------------------
 1 file changed, 7 insertions(+), 20 deletions(-)

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 8690ec7..13e97b3 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -3101,8 +3101,7 @@ out:
     }
 }
 
-/* If map is not full, prints it and returns 0. Returns 1 otherwise. */
-static int print_bitmap(uint8_t *map, int maplen, FILE *stream)
+static void print_bitmap(uint8_t *map, int maplen, FILE *stream)
 {
     int i;
     uint8_t pmap = 0, bitmask = 0;
@@ -3140,28 +3139,16 @@ static int print_bitmap(uint8_t *map, int maplen, FILE *stream)
     case 2:
         break;
     case 1:
-        if (firstset == 0)
-            return 1;
+        if (firstset == 0) {
+            fprintf(stream, "all");
+            break;
+        }
     case 3:
         fprintf(stream, "%s%d", state > 1 ? "," : "", firstset);
         if (i - 1 > firstset)
             fprintf(stream, "-%d", i - 1);
         break;
     }
-
-    return 0;
-}
-
-static void print_cpumap(uint8_t *map, int maplen, FILE *stream)
-{
-    if (print_bitmap(map, maplen, stream))
-        fprintf(stream, "any cpu");
-}
-
-static void print_nodemap(uint8_t *map, int maplen, FILE *stream)
-{
-    if (print_bitmap(map, maplen, stream))
-        fprintf(stream, "any node");
 }
 
 static void list_domains(int verbose, int context, int claim, int numa,
@@ -3234,7 +3221,7 @@ static void list_domains(int verbose, int context, int claim, int numa,
             libxl_domain_get_nodeaffinity(ctx, info[i].domid, &nodemap);
 
             putchar(' ');
-            print_nodemap(nodemap.map, physinfo.nr_nodes, stdout);
+            print_bitmap(nodemap.map, physinfo.nr_nodes, stdout);
         }
         putchar('\n');
     }
@@ -4446,7 +4433,7 @@ static void print_vcpuinfo(uint32_t tdomid,
     /* TIM */
     printf("%9.1f ", ((float)vcpuinfo->vcpu_time / 1e9));
     /* CPU AFFINITY */
-    print_cpumap(vcpuinfo->cpumap.map, nr_cpus, stdout);
+    print_bitmap(vcpuinfo->cpumap.map, nr_cpus, stdout);
     printf("\n");
 }
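The range-compressed output that print_bitmap() produces ("0-3,5", "all" for a full map after this patch, and "none" for an empty one, as the test data in patch 05 shows) can be sketched as a standalone C function. This is not the xl implementation, which uses a small state machine and writes straight to a FILE *. `format_bitmap()` and `bit_is_set()` are hypothetical helpers that write into a caller-provided buffer, assuming the libxl_bitmap bit layout (bit i lives in byte i/8, at position i%8).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static int bit_is_set(const uint8_t *map, int bit)
{
    return (map[bit / 8] >> (bit % 8)) & 1;
}

/*
 * Hypothetical helper: render the first nbits bits of map as "none",
 * "all", or a comma-separated list of ranges such as "0-3,5".
 * len must be at least 1; output is truncated (but NUL-terminated)
 * if buf is too small.
 */
static void format_bitmap(const uint8_t *map, int nbits, char *buf, size_t len)
{
    int count = 0, i = 0;
    size_t used = 0;

    for (int b = 0; b < nbits; b++)
        count += bit_is_set(map, b);

    if (count == 0) {
        snprintf(buf, len, "none");
        return;
    }
    if (count == nbits) {
        snprintf(buf, len, "all");
        return;
    }

    buf[0] = '\0';
    while (i < nbits) {
        if (!bit_is_set(map, i)) {
            i++;
            continue;
        }
        int start = i, n;

        while (i < nbits && bit_is_set(map, i))
            i++;
        /* [start, i - 1] is a maximal run of set bits */
        if (i - 1 > start)
            n = snprintf(buf + used, len - used, "%s%d-%d",
                         used ? "," : "", start, i - 1);
        else
            n = snprintf(buf + used, len - used, "%s%d",
                         used ? "," : "", start);
        if (n < 0 || (size_t)n >= len - used)
            return;                         /* truncated */
        used += (size_t)n;
    }
}
```

With 16 bits and every bit but 12 set, this yields "0-11,13-15", which is what the test data expects for "all,^12".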
Dario Faggioli
2013-Nov-18 18:16 UTC
[PATCH v3 02/14] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
as well as both error handling and logging in libxl_cpu_bitmap_alloc and libxl_node_bitmap_alloc.

Now libxl_get_max_{cpus,nodes} either return a positive number, or a libxl error code. Thanks to that, it is possible to fix logging for the two bitmap allocation functions, which now happens _inside_ the functions themselves, and report what happens more accurately.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Changes from v2:
 * this wasn't there in v2, but fixing this for v3 was requested during v2 review.
---
 tools/libxl/libxl.c       |    8 ++------
 tools/libxl/libxl_utils.c |   48 +++++++++++++++++++++++++++++++++++++++++++--
 tools/libxl/libxl_utils.h |   32 ++++++------------------------
 3 files changed, 54 insertions(+), 34 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 0de1112..d3ab65e 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -616,10 +616,8 @@ static int cpupool_info(libxl__gc *gc,
     info->n_dom = xcinfo->n_dom;
     rc = libxl_cpu_bitmap_alloc(CTX, &info->cpumap, 0);
     if (rc)
-    {
-        LOG(ERROR, "unable to allocate cpumap %d\n", rc);
         goto out;
-    }
+
     memcpy(info->cpumap.map, xcinfo->cpumap, info->cpumap.size);
 
     rc = 0;
@@ -4204,10 +4202,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid,
     }
 
     for (*nb_vcpu = 0; *nb_vcpu <= domaininfo.max_vcpu_id; ++*nb_vcpu, ++ptr) {
-        if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0)) {
-            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "allocating cpumap");
+        if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0))
             return NULL;
-        }
         if (xc_vcpu_getinfo(ctx->xch, domid, *nb_vcpu, &vcpuinfo) == -1) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu info");
             return NULL;
diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
index 682f874..2a51c9c 100644
--- a/tools/libxl/libxl_utils.c
+++ b/tools/libxl/libxl_utils.c
@@ -645,6 +645,46 @@ char *libxl_bitmap_to_hex_string(libxl_ctx *ctx, const libxl_bitmap *bitmap)
     return q;
 }
 
+inline int libxl_cpu_bitmap_alloc(libxl_ctx *ctx,
+                                  libxl_bitmap *cpumap,
+                                  int max_cpus)
+{
+    if (max_cpus < 0)
+        return ERROR_INVAL;
+
+    if (max_cpus == 0)
+        max_cpus = libxl_get_max_cpus(ctx);
+
+    if (max_cpus <= 0) {
+        LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
+                   "failed to retrieve the maximum number of cpus");
+        return ERROR_FAIL;
+    }
+
+    /* This can't fail: no need to check and log */
+    return libxl_bitmap_alloc(ctx, cpumap, max_cpus);
+}
+
+int libxl_node_bitmap_alloc(libxl_ctx *ctx,
+                            libxl_bitmap *nodemap,
+                            int max_nodes)
+{
+    if (max_nodes < 0)
+        return ERROR_INVAL;
+
+    if (max_nodes == 0)
+        max_nodes = libxl_get_max_nodes(ctx);
+
+    if (max_nodes <= 0) {
+        LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
+                   "failed to retrieve the maximum number of nodes");
+        return ERROR_FAIL;
+    }
+
+    /* This can't fail: no need to check and log */
+    return libxl_bitmap_alloc(ctx, nodemap, max_nodes);
+}
+
 int libxl_nodemap_to_cpumap(libxl_ctx *ctx,
                             const libxl_bitmap *nodemap,
                             libxl_bitmap *cpumap)
@@ -713,12 +753,16 @@ int libxl_cpumap_to_nodemap(libxl_ctx *ctx,
 
 int libxl_get_max_cpus(libxl_ctx *ctx)
 {
-    return xc_get_max_cpus(ctx->xch);
+    int max_cpus = xc_get_max_cpus(ctx->xch);
+
+    return max_cpus <= 0 ? ERROR_FAIL : max_cpus;
 }
 
 int libxl_get_max_nodes(libxl_ctx *ctx)
 {
-    return xc_get_max_nodes(ctx->xch);
+    int max_nodes = xc_get_max_nodes(ctx->xch);
+
+    return max_nodes <= 0 ? ERROR_FAIL : max_nodes;
 }
 
 int libxl__enum_from_string(const libxl_enum_string_table *t,
diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h
index 7b84e6a..b11cf28 100644
--- a/tools/libxl/libxl_utils.h
+++ b/tools/libxl/libxl_utils.h
@@ -98,32 +98,12 @@ static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit)
 #define libxl_for_each_set_bit(v, m) for (v = 0; v < (m).size * 8; v++) \
                                              if (libxl_bitmap_test(&(m), v))
 
-static inline int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap,
-                                         int max_cpus)
-{
-    if (max_cpus < 0)
-        return ERROR_INVAL;
-    if (max_cpus == 0)
-        max_cpus = libxl_get_max_cpus(ctx);
-    if (max_cpus == 0)
-        return ERROR_FAIL;
-
-    return libxl_bitmap_alloc(ctx, cpumap, max_cpus);
-}
-
-static inline int libxl_node_bitmap_alloc(libxl_ctx *ctx,
-                                          libxl_bitmap *nodemap,
-                                          int max_nodes)
-{
-    if (max_nodes < 0)
-        return ERROR_INVAL;
-    if (max_nodes == 0)
-        max_nodes = libxl_get_max_nodes(ctx);
-    if (max_nodes == 0)
-        return ERROR_FAIL;
-
-    return libxl_bitmap_alloc(ctx, nodemap, max_nodes);
-}
+int libxl_cpu_bitmap_alloc(libxl_ctx *ctx,
+                           libxl_bitmap *cpumap,
+                           int max_cpus);
+int libxl_node_bitmap_alloc(libxl_ctx *ctx,
+                            libxl_bitmap *nodemap,
+                            int max_nodes);
 
 /* Populate cpumap with the cpus spanned by the nodes in nodemap */
 int libxl_nodemap_to_cpumap(libxl_ctx *ctx,
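The convention this patch sets up (a size getter returns either a positive count or a negative error code, and the allocator logs the failure itself, in one place) can be sketched in isolation. All names and values below are hypothetical: `EX_ERROR_FAIL`/`EX_ERROR_INVAL` stand in for libxl's ERROR_FAIL/ERROR_INVAL, and the `host_max` argument stands in for whatever xc_get_max_cpus() reported.

```c
#include <stdio.h>

/* Hypothetical stand-ins for libxl's ERROR_FAIL and ERROR_INVAL. */
#define EX_ERROR_FAIL  (-3)
#define EX_ERROR_INVAL (-6)

/* Like libxl_get_max_cpus() after this patch: a count > 0, or an error. */
static int get_max_or_error(int raw_from_hypervisor)
{
    return raw_from_hypervisor <= 0 ? EX_ERROR_FAIL : raw_from_hypervisor;
}

/*
 * Like the size selection in libxl_cpu_bitmap_alloc(): a negative request
 * is invalid, 0 means "as many as the host has", and a failure to learn
 * the host size is logged here rather than by every caller.
 * On success, stores the chosen size in *nbits and returns 0.
 */
static int choose_bitmap_size(int requested, int host_max, int *nbits)
{
    if (requested < 0)
        return EX_ERROR_INVAL;

    if (requested == 0)
        requested = get_max_or_error(host_max);

    if (requested <= 0) {
        fprintf(stderr, "failed to retrieve the maximum number of cpus\n");
        return EX_ERROR_FAIL;
    }

    *nbits = requested;
    return 0;
}
```

The `<= 0` test matters: the removed inline helpers only checked `== 0`, so a negative value from the lower layer would have been passed on to libxl_bitmap_alloc() as a bitmap size.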
Dario Faggioli
2013-Nov-18 18:16 UTC
[PATCH v3 03/14] xl: allow for node-wise specification of vcpu pinning
Making it possible to use something like the following:
 * "nodes:0-3": all pCPUs of nodes 0,1,2,3;
 * "nodes:0-3,^node:2": all pCPUs of nodes 0,1,3;
 * "1,nodes:1-2,^6": pCPU 1 plus all pCPUs of nodes 1,2 but not pCPU 6;
 * ...

In both the domain config file and `xl vcpu-pin'.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes from v1 (of this very series):
 * actually checking for both "nodes:" and "node:", as the doc says;
 * using strcmp() (rather than strncmp()) when matching "all", to avoid returning success on any longer string that just begins with "all";
 * fixing the handling (well, the rejection, actually) of "^all" and "^nodes:all";
 * make some string pointers const.

Changes from v2 (of original series):
 * turned a 'return' into 'goto out', consistently with most of the exit patterns;
 * harmonized error handling: now parse_range() returns a libxl error code, as requested during review;
 * dealing with "all" moved inside update_cpumap_range(). It's tricky to move it into parse_range() (as requested during review), since we need the cpumap being modified at hand when dealing with it. However, having it in update_cpumap_range() simplifies the code just as much;
 * explicitly checking for junk after a valid value or range in parse_range(), as requested during review;
 * xl exits on parsing failure, so there is no need to reset the cpumap to something sensible in vcpupin_parse(), as suggested during review;

Changes from v1 (of original series):
 * code rearranged to be simpler to follow and understand, as requested during review;
 * improved docs in xl.cfg.pod.5, as requested during review;
 * the result of strtoul() is now stored in an unsigned long, and the case where it returns ULONG_MAX is now taken into account, as requested during review;
 * stuff like "all,^7" now works, as requested during review. Specifying just "^7" does not work either before or after this change;
 * killed some magic (i.e., `ptr += 5 + (ptr[4] == 's')`) by introducing the STR_SKIP_PREFIX() macro, as requested during review.
---
 docs/man/xl.cfg.pod.5    |   20 ++++++
 tools/libxl/xl_cmdimpl.c |  153 +++++++++++++++++++++++++++++++++-------------
 2 files changed, 128 insertions(+), 45 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index e6fc83f..5dbc73c 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -115,7 +115,25 @@ To allow all the vcpus of the guest to run on all the cpus on the host.
 
 =item "0-3,5,^1"
 
-To allow all the vcpus of the guest to run on cpus 0,2,3,5.
+To allow all the vcpus of the guest to run on cpus 0,2,3,5. Combining
+this with "all" is possible, meaning "all,^7" results in all the vcpus
+of the guest running on all the cpus on the host except cpu 7.
+
+=item "nodes:0-3,node:^2"
+
+To allow all the vcpus of the guest to run on the cpus from NUMA nodes
+0,1,3 of the host. So, if cpus 0-3 belongs to node 0, cpus 4-7 belongs
+to node 1 and cpus 8-11 to node 3, the above would mean all the vcpus
+of the guest will run on cpus 0-3,8-11.
+
+Combining this notation with the one above is possible. For instance,
+"1,node:2,^6", means all the vcpus of the guest will run on cpu 1 and
+on all the cpus of NUMA node 2, but not on cpu 6. Following the same
+example as above, that would be cpus 1,4,5,7.
+
+Combining this with "all" is also possible, meaning "all,^nodes:1"
+results in all the vcpus of the guest running on all the cpus on the
+host, except for the cpus belonging to the host NUMA node 1.
 
 =item ["2", "3"] (or [2, 3])
 
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 13e97b3..5f5cc43 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -59,6 +59,11 @@
         }                                                               \
     })
 
+#define STR_HAS_PREFIX( a, b )  \
+    ( strncmp(a, b, strlen(b)) == 0 )
+#define STR_SKIP_PREFIX( a, b ) \
+    ( STR_HAS_PREFIX(a, b) ? ((a) += strlen(b), 1) : 0 )
+
 
 int logfile = 2;
 
@@ -513,61 +518,121 @@ static void split_string_into_string_list(const char *str,
     free(s);
 }
 
-static int vcpupin_parse(char *cpu, libxl_bitmap *cpumap)
+static int parse_range(const char *str, unsigned long *a, unsigned long *b)
 {
-    libxl_bitmap exclude_cpumap;
-    uint32_t cpuida, cpuidb;
-    char *endptr, *toka, *tokb, *saveptr = NULL;
-    int i, rc = 0, rmcpu;
+    const char *nstr;
+    char *endptr;
 
-    if (!strcmp(cpu, "all")) {
-        libxl_bitmap_set_any(cpumap);
-        return 0;
+    *a = *b = strtoul(str, &endptr, 10);
+    if (endptr == str || *a == ULONG_MAX)
+        return ERROR_INVAL;
+
+    if (*endptr == '-') {
+        nstr = endptr + 1;
+
+        *b = strtoul(nstr, &endptr, 10);
+        if (endptr == nstr || *b == ULONG_MAX || *b < *a)
+            return ERROR_INVAL;
+    }
+
+    /* Valid value or range so far, but we also don't want junk after that */
+    if (*endptr != '\0')
+        return ERROR_INVAL;
+
+    return 0;
+}
+
+/*
+ * Add or removes a specific set of cpus (specified in str, either as
+ * single cpus or as entire NUMA nodes) to/from cpumap.
+ */
+static int update_cpumap_range(const char *str, libxl_bitmap *cpumap)
+{
+    unsigned long ida, idb;
+    libxl_bitmap node_cpumap;
+    bool is_not = false, is_nodes = false;
+    int rc = 0;
+
+    libxl_bitmap_init(&node_cpumap);
+
+    rc = libxl_node_bitmap_alloc(ctx, &node_cpumap, 0);
+    if (rc) {
+        fprintf(stderr, "libxl_node_bitmap_alloc failed.\n");
+        goto out;
     }
 
-    if (libxl_cpu_bitmap_alloc(ctx, &exclude_cpumap, 0)) {
-        fprintf(stderr, "Error: Failed to allocate cpumap.\n");
-        return ENOMEM;
+    /* Are we adding or removing cpus/nodes? */
+    if (STR_SKIP_PREFIX(str, "^")) {
+        is_not = true;
     }
 
-    for (toka = strtok_r(cpu, ",", &saveptr); toka;
-         toka = strtok_r(NULL, ",", &saveptr)) {
-        rmcpu = 0;
-        if (*toka == '^') {
-            /* This (These) Cpu(s) will be removed from the map */
-            toka++;
-            rmcpu = 1;
-        }
-        /* Extract a valid (range of) cpu(s) */
-        cpuida = cpuidb = strtoul(toka, &endptr, 10);
-        if (endptr == toka) {
-            fprintf(stderr, "Error: Invalid argument.\n");
-            rc = EINVAL;
-            goto vcpp_out;
-        }
-        if (*endptr == '-') {
-            tokb = endptr + 1;
-            cpuidb = strtoul(tokb, &endptr, 10);
-            if (endptr == tokb || cpuida > cpuidb) {
-                fprintf(stderr, "Error: Invalid argument.\n");
-                rc = EINVAL;
-                goto vcpp_out;
+    /* Are we dealing with cpus or full nodes? */
+    if (STR_SKIP_PREFIX(str, "node:") || STR_SKIP_PREFIX(str, "nodes:")) {
+        is_nodes = true;
+    }
+
+    if (strcmp(str, "all") == 0) {
+        /* We do not accept "^all" or "^nodes:all" */
+        if (is_not) {
+            fprintf(stderr, "Can't combine \"^\" and \"all\".\n");
+            rc = ERROR_INVAL;
+        } else
+            libxl_bitmap_set_any(cpumap);
+        goto out;
+    }
+
+    rc = parse_range(str, &ida, &idb);
+    if (rc) {
+        fprintf(stderr, "Invalid pcpu range: %s.\n", str);
+        goto out;
+    }
+
+    /* Add or remove the specified cpus in the range */
+    while (ida <= idb) {
+        if (is_nodes) {
+            /* Add/Remove all the cpus of a NUMA node */
+            int i;
+
+            rc = libxl_node_to_cpumap(ctx, ida, &node_cpumap);
+            if (rc) {
+                fprintf(stderr, "libxl_node_to_cpumap failed.\n");
+                goto out;
             }
+
+            /* Add/Remove all the cpus in the node cpumap */
+            libxl_for_each_set_bit(i, node_cpumap) {
+                is_not ? libxl_bitmap_reset(cpumap, i) :
+                         libxl_bitmap_set(cpumap, i);
+            }
+        } else {
+            /* Add/Remove this cpu */
+            is_not ? libxl_bitmap_reset(cpumap, ida) :
+                     libxl_bitmap_set(cpumap, ida);
         }
-        while (cpuida <= cpuidb) {
-            rmcpu == 0 ? libxl_bitmap_set(cpumap, cpuida) :
-                         libxl_bitmap_set(&exclude_cpumap, cpuida);
-            cpuida++;
-        }
+        ida++;
     }
 
-    /* Clear all the cpus from the removal list */
-    libxl_for_each_set_bit(i, exclude_cpumap) {
-        libxl_bitmap_reset(cpumap, i);
-    }
+ out:
+    libxl_bitmap_dispose(&node_cpumap);
+    return rc;
+}
 
-vcpp_out:
-    libxl_bitmap_dispose(&exclude_cpumap);
+/*
+ * Takes a string representing a set of cpus (specified either as
+ * single cpus or as eintire NUMA nodes) and turns it into the
+ * corresponding libxl_bitmap (in cpumap).
+ */
+static int vcpupin_parse(char *cpu, libxl_bitmap *cpumap)
+{
+    char *ptr, *saveptr = NULL;
+    int rc = 0;
+
+    for (ptr = strtok_r(cpu, ",", &saveptr); ptr;
+         ptr = strtok_r(NULL, ",", &saveptr)) {
+        rc = update_cpumap_range(ptr, cpumap);
+        if (rc)
+            break;
+    }
 
     return rc;
 }
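The token semantics implemented above ("all", a leading "^" to remove, "node:"/"nodes:" to expand whole NUMA nodes, tokens applied left to right) can be exercised without libxl using a self-contained C sketch. Everything here is a hypothetical stand-in: a fixed host of NR_NODES nodes with CPUS_PER_NODE contiguous pCPUs each (the same layout assumption check-xl-vcpupin-parse makes), a 64-bit mask instead of libxl_bitmap, and -1 instead of libxl error codes. The upper-bound check on ids is this sketch's own addition. Empty tokens are skipped, as strtok_r() would skip them.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical host: NR_NODES nodes, CPUS_PER_NODE contiguous pCPUs each. */
#define NR_NODES      2
#define CPUS_PER_NODE 8
#define NR_CPUS       (NR_NODES * CPUS_PER_NODE)   /* must stay below 64 */

/* Same idea as the macros added by the patch above. */
#define STR_HAS_PREFIX(a, b)  (strncmp(a, b, strlen(b)) == 0)
#define STR_SKIP_PREFIX(a, b) (STR_HAS_PREFIX(a, b) ? ((a) += strlen(b), 1) : 0)

/* "N" or "N-M" with N <= M and nothing after it; 0 on success, -1 otherwise. */
static int parse_range(const char *str, unsigned long *a, unsigned long *b)
{
    char *end;

    *a = *b = strtoul(str, &end, 10);
    if (end == str || *a == ULONG_MAX)
        return -1;
    if (*end == '-') {
        const char *nstr = end + 1;

        *b = strtoul(nstr, &end, 10);
        if (end == nstr || *b == ULONG_MAX || *b < *a)
            return -1;
    }
    return *end == '\0' ? 0 : -1;
}

/* Apply one token ("3", "2-5", "^7", "nodes:1", "^node:0", "all") to *mask. */
static int update_mask(const char *tok, uint64_t *mask)
{
    const uint64_t all = (UINT64_C(1) << NR_CPUS) - 1;
    unsigned long a, b, limit;
    bool is_not = STR_SKIP_PREFIX(tok, "^");
    bool is_nodes = STR_SKIP_PREFIX(tok, "node:") ||
                    STR_SKIP_PREFIX(tok, "nodes:");

    if (strcmp(tok, "all") == 0) {
        if (is_not)
            return -1;              /* "^all" and "^nodes:all" are rejected */
        *mask = all;
        return 0;
    }
    if (parse_range(tok, &a, &b))
        return -1;

    limit = is_nodes ? NR_NODES : NR_CPUS;
    if (b >= limit)
        return -1;                  /* bounds check: this sketch's own addition */

    for (unsigned long id = a; id <= b; id++) {
        uint64_t bits = is_nodes
            ? ((UINT64_C(1) << CPUS_PER_NODE) - 1) << (id * CPUS_PER_NODE)
            : UINT64_C(1) << id;

        if (is_not)
            *mask &= ~bits;
        else
            *mask |= bits;
    }
    return 0;
}

/* Parse a comma-separated list, tokens applied left to right; -1 on error. */
static int parse_cpumap(const char *str, uint64_t *mask)
{
    char tok[64];

    *mask = 0;
    while (*str) {
        size_t n = strcspn(str, ",");

        if (n == 0) {               /* empty token: skip it, like strtok_r() */
            str++;
            continue;
        }
        if (n >= sizeof(tok))
            return -1;
        memcpy(tok, str, n);
        tok[n] = '\0';
        if (update_mask(tok, mask))
            return -1;
        str += n;
        if (*str == ',')
            str++;
    }
    return 0;
}
```

On this hypothetical 2-node, 16-pCPU host, "nodes:1,^8" yields 0xFE00 (pCPUs 9-15), and "all,^nodes:0,all" yields 0xFFFF, because the final "all" re-adds node 0. Both results match the corresponding entries in check-xl-vcpupin-parse.data-example.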
Dario Faggioli
2013-Nov-18 18:17 UTC
[PATCH v3 04/14] xl: implement and enable dryrun mode for `xl vcpu-pin''
As it can be useful to see if the outcome of some complex vCPU pinning bitmap specification looks as expected.

This also allows for the introduction of some automatic testing and verification for the bitmap parsing code, as already happens in check-xl-disk-parse and check-xl-vif-parse.

In particular, to make the above possible, this commit also changes the implementation of the vcpu-pin command so that, instead of always returning 0, it returns an error if the parsing fails.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes since v2 (of original series):
 * fixed a typo in the changelog
---
 tools/libxl/xl_cmdimpl.c  |   49 +++++++++++++++++++++++++++++++++------------
 tools/libxl/xl_cmdtable.c |    2 +-
 2 files changed, 37 insertions(+), 14 deletions(-)

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 5f5cc43..cf237c4 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -4566,30 +4566,53 @@ int main_vcpulist(int argc, char **argv)
     return 0;
 }
 
-static void vcpupin(uint32_t domid, const char *vcpu, char *cpu)
+static int vcpupin(uint32_t domid, const char *vcpu, char *cpu)
 {
     libxl_vcpuinfo *vcpuinfo;
     libxl_bitmap cpumap;
 
     uint32_t vcpuid;
     char *endptr;
-    int i, nb_vcpu;
+    int i = 0, nb_vcpu, rc = -1;
+
+    libxl_bitmap_init(&cpumap);
 
     vcpuid = strtoul(vcpu, &endptr, 10);
     if (vcpu == endptr) {
         if (strcmp(vcpu, "all")) {
             fprintf(stderr, "Error: Invalid argument.\n");
-            return;
+            goto out;
         }
         vcpuid = -1;
     }
 
-    if (libxl_cpu_bitmap_alloc(ctx, &cpumap, 0)) {
-        goto vcpupin_out;
-    }
+    if (libxl_cpu_bitmap_alloc(ctx, &cpumap, 0))
+        goto out;
 
     if (vcpupin_parse(cpu, &cpumap))
-        goto vcpupin_out1;
+        goto out;
+
+    if (dryrun_only) {
+        libxl_cputopology *info = libxl_get_cpu_topology(ctx, &i);
+
+        if (!info) {
+            fprintf(stderr, "libxl_get_cpu_topology failed.\n");
+            goto out;
+        }
+        libxl_cputopology_list_free(info, i);
+
+        fprintf(stdout, "cpumap: ");
+        print_bitmap(cpumap.map, i, stdout);
+        fprintf(stdout, "\n");
+
+        if (ferror(stdout) || fflush(stdout)) {
+            perror("stdout");
+            exit(-1);
+        }
+
+        rc = 0;
+        goto out;
+    }
 
     if (vcpuid != -1) {
         if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap) == -1) {
@@ -4599,7 +4622,7 @@ static void vcpupin(uint32_t domid, const char *vcpu, char *cpu)
     else {
         if (!(vcpuinfo = libxl_list_vcpu(ctx, domid, &nb_vcpu, &i))) {
             fprintf(stderr, "libxl_list_vcpu failed.\n");
-            goto vcpupin_out1;
+            goto out;
         }
         for (i = 0; i < nb_vcpu; i++) {
             if (libxl_set_vcpuaffinity(ctx, domid, vcpuinfo[i].vcpuid,
@@ -4610,10 +4633,11 @@ static void vcpupin(uint32_t domid, const char *vcpu, char *cpu)
         }
         libxl_vcpuinfo_list_free(vcpuinfo, nb_vcpu);
     }
- vcpupin_out1:
+
+    rc = 0;
+ out:
     libxl_bitmap_dispose(&cpumap);
- vcpupin_out:
-    ;
+    return rc;
 }
 
 int main_vcpupin(int argc, char **argv)
@@ -4624,8 +4648,7 @@ int main_vcpupin(int argc, char **argv)
         /* No options */
     }
 
-    vcpupin(find_domain(argv[optind]), argv[optind+1] , argv[optind+2]);
-    return 0;
+    return vcpupin(find_domain(argv[optind]), argv[optind+1] , argv[optind+2]);
 }
 
 static void vcpuset(uint32_t domid, const char* nr_vcpus, int check_host)
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 326a660..d3dcbf0 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -211,7 +211,7 @@ struct cmd_spec cmd_table[] = {
       "[Domain, ...]",
     },
     { "vcpu-pin",
-      &main_vcpupin, 0, 1,
+      &main_vcpupin, 1, 1,
       "Set which CPUs a VCPU can use",
       "<Domain> <VCPU|all> <CPUs|all>",
     },
Dario Faggioli
2013-Nov-18 18:17 UTC
[PATCH v3 05/14] xl: test script for the cpumap parser (for vCPU pinning)
This commit introduces "check-xl-vcpupin-parse" to help verify and debug the (v)CPU bitmap parsing code in xl.

The script runs "xl -N vcpu-pin 0 all <some strings>" repeatedly, with various input strings, and checks that the output is as expected.

This is what the script can do:

# ./check-xl-vcpupin-parse -h
usage: ./check-xl-vcpupin-parse [options]

Tests various vcpu-pinning strings. If run without arguments acts
as follows:
 - generates some test data and saves them in check-xl-vcpupin-parse.data;
 - tests all the generated configurations (reading them back from
   check-xl-vcpupin-parse.data).

An example of a test vector file is provided in check-xl-vcpupin-parse.data-example.

Options:
 -h         prints this message
 -r seed    uses seed for initializing the rundom number generator
            (default: the script PID)
 -s string  tries using string as a vcpu pinning configuration and
            reports whether that succeeds or not
 -o ofile   save the test data in ofile (default: check-xl-vcpupin-parse.data)
 -i ifile   read test data from ifile

An example test data file (generated on a host with 2 NUMA nodes and 16 CPUs) is provided in check-xl-vcpupin-parse.data-example.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
Changes from v2 (of original series):
 * killed the `sleep 1', as requested during review;
 * allow for passing a custom random seed, and report the actual random seed used, as requested during review;
 * allow for testing specific pinning configuration strings, as suggested during review;
 * store the test data in a file after generating them, and read them back from there for actual testing, as suggested during review;
 * allow for reading the test data from an existing test file instead of always generating new ones.

Changes from v1 (of original series):
 * this was not there in v1, and adding it has been requested during review.
--- tools/libxl/check-xl-vcpupin-parse | 294 +++++++++++++++++++++++ tools/libxl/check-xl-vcpupin-parse.data-example | 53 ++++ 2 files changed, 347 insertions(+) create mode 100755 tools/libxl/check-xl-vcpupin-parse create mode 100644 tools/libxl/check-xl-vcpupin-parse.data-example diff --git a/tools/libxl/check-xl-vcpupin-parse b/tools/libxl/check-xl-vcpupin-parse new file mode 100755 index 0000000..21f8421 --- /dev/null +++ b/tools/libxl/check-xl-vcpupin-parse @@ -0,0 +1,294 @@ +#!/bin/bash + +set -e + +if [ -x ./xl ] ; then + export LD_LIBRARY_PATH=.:../libxc:../xenstore: + XL=./xl +else + XL=xl +fi + +fprefix=tmp.check-xl-vcpupin-parse +outfile=check-xl-vcpupin-parse.data + +usage () { +cat <<END +usage: $0 [options] + +Tests various vcpu-pinning strings. If run without arguments acts +as follows: + - generates some test data and saves them in $outfile; + - tests all the generated configurations (reading them back from + $outfile). + +An example of a test vector file is provided in ${outfile}-example. + +Options: + -h prints this message + -r seed uses seed for initializing the rundom number generator + (default: the script PID) + -s string tries using string as a vcpu pinning configuration and + reports whether that succeeds or not + -o ofile save the test data in ofile (default: $outfile) + -i ifile read test data from ifile +END +} + +expected () { + cat >$fprefix.expected +} + +# by default, re-seed with our PID +seed=$$ +failures=0 + +# Execute one test and check the result against the provided +# rc value and output +one () { + expected_rc=$1; shift + printf "test case %s...\n" "$*" + set +e + ${XL} -N vcpu-pin 0 all "$@" </dev/null >$fprefix.actual 2>/dev/null + actual_rc=$? + if [ $actual_rc != $expected_rc ]; then + diff -u $fprefix.expected $fprefix.actual + echo >&2 "test case \`$*'' failed ($actual_rc $diff_rc)" + failures=$(( $failures + 1 )) + fi + set -e +} + +# Write an entry in the test vector file. 
Format is as follows: +# test-string*expected-rc*expected-output +write () { + printf "$1*$2*$3\n" >> $outfile +} + +complete () { + if [ "$failures" = 0 ]; then + echo all ok.; exit 0 + else + echo "$failures tests failed."; exit 1 + fi +} + +# Test a specific pinning string +string () { + expected_rc=$1; shift + printf "test case %s...\n" "$*" + set +e + ${XL} -N vcpu-pin 0 all "$@" &> /dev/null + actual_rc=$? + set -e + + if [ $actual_rc != $expected_rc ]; then + echo >&2 "test case \`$*'' failed ($actual_rc)" + else + echo >&2 "test case \`$*'' succeeded" + fi + + exit 0 +} + +# Read a test vector file (provided as $1) line by line and +# test all the entries it contains +run () +{ + while read line + do + if [ ${line:0:1} != ''#'' ]; then + test_string="`echo $line | cut -f1 -d''*''`" + exp_rc="`echo $line | cut -f2 -d''*''`" + exp_output="`echo $line | cut -f3 -d''*''`" + + expected <<END +$exp_output +END + one $exp_rc "$test_string" + fi + done < $1 + + complete + + exit 0 +} + +while getopts "hr:s:o:i:" option +do + case $option in + h) + usage + exit 0 + ;; + r) + seed=$OPTARG + ;; + s) + string 0 "$OPTARG" + ;; + o) + outfile=$OPTARG + ;; + i) + run $OPTARG + ;; + esac +done + +#---------- test data ---------- +# +nr_cpus=`xl info | grep nr_cpus | cut -f2 -d'':''` +nr_nodes=`xl info | grep nr_nodes | cut -f2 -d'':''` +nr_cpus_per_node=`xl info -n | sed ''/cpu:/,/numa_info/!d'' | head -n -1 | \ + awk ''{print $4}'' | uniq -c | tail -1 | awk ''{print $1}''` +cat >$outfile <<END +# WARNING: some of these tests are topology based tests. +# Expect failures if the topology is not detected correctly +# detected topology: $nr_cpus CPUs, $nr_nodes nodes, $nr_cpus_per_node CPUs per node. +# +# seed used for random number generation: seed=${seed}. 
+# +# Format is as follows: +# test-string*expected-return-code*expected-output +# +END + +# Re-seed the random number generator +RANDOM=$seed + +echo "# Testing a wrong configuration" >> $outfile +write foo 255 "" + +echo "# Testing the ''all'' syntax" >> $outfile +write "all" 0 "cpumap: all" +write "nodes:all" 0 "cpumap: all" +write "all,nodes:all" 0 "cpumap: all" +write "all,^nodes:0,all" 0 "cpumap: all" + +echo "# Testing the empty cpumap case" >> $outfile +write "^0" 0 "cpumap: none" + +echo "# A few attempts of pinning to just one random cpu" >> $outfile +if [ $nr_cpus -gt 1 ]; then + for i in `seq 0 3`; do + cpu=$(($RANDOM % nr_cpus)) + write "$cpu" 0 "cpumap: $cpu" + done +fi + +echo "# A few attempts of pinning to all but one random cpu" >> $outfile +if [ $nr_cpus -gt 2 ]; then + for i in `seq 0 3`; do + cpu=$(($RANDOM % nr_cpus)) + if [ $cpu -eq 0 ]; then + expected_range="1-$((nr_cpus - 1))" + elif [ $cpu -eq 1 ]; then + expected_range="0,2-$((nr_cpus - 1))" + elif [ $cpu -eq $((nr_cpus - 2)) ]; then + expected_range="0-$((cpu - 1)),$((nr_cpus - 1))" + elif [ $cpu -eq $((nr_cpus - 1)) ]; then + expected_range="0-$((nr_cpus - 2))" + else + expected_range="0-$((cpu - 1)),$((cpu + 1))-$((nr_cpus - 1))" + fi + write "all,^$cpu" 0 "cpumap: $expected_range" + done +fi + +echo "# A few attempts of pinning to a random range of cpus" >> $outfile +if [ $nr_cpus -gt 2 ]; then + for i in `seq 0 3`; do + cpua=$(($RANDOM % nr_cpus)) + range=$((nr_cpus - cpua)) + cpub=$(($RANDOM % range)) + cpubb=$((cpua + cpub)) + if [ $cpua -eq $cpubb ]; then + expected_range="$cpua" + else + expected_range="$cpua-$cpubb" + fi + write "$expected_range" 0 "cpumap: $expected_range" + done +fi + +echo "# A few attempts of pinning to just one random node" >> $outfile +if [ $nr_nodes -gt 1 ]; then + for i in `seq 0 3`; do + node=$(($RANDOM % nr_nodes)) + # this assumes that the first $nr_cpus_per_node (from cpu + # 0 to cpu $nr_cpus_per_node-1) are assigned to the first node + # (node 0), 
the second $nr_cpus_per_node (from $nr_cpus_per_node + # to 2*$nr_cpus_per_node-1) are assigned to the second node (node + # 1), etc. Expect failures if that is not the case. + write "nodes:$node" 0 "cpumap: $((nr_cpus_per_node*node))-$((nr_cpus_per_node*(node+1)-1))" + done +fi + +echo "# A few attempts of pinning to all but one random node" >> $outfile +if [ $nr_nodes -gt 1 ]; then + for i in `seq 0 3`; do + node=$(($RANDOM % nr_nodes)) + # this assumes that the first $nr_cpus_per_node (from cpu + # 0 to cpu $nr_cpus_per_node-1) are assigned to the first node + # (node 0), the second $nr_cpus_per_node (from $nr_cpus_per_node + # to 2*$nr_cpus_per_node-1) are assigned to the second node (node + # 1), etc. Expect failures if that is not the case. + if [ $node -eq 0 ]; then + expected_range="$nr_cpus_per_node-$((nr_cpus - 1))" + elif [ $node -eq $((nr_nodes - 1)) ]; then + expected_range="0-$((nr_cpus - nr_cpus_per_node - 1))" + else + expected_range="0-$((nr_cpus_per_node*node-1)),$((nr_cpus_per_node*(node+1)))-$nr_cpus" + fi + write "all,^nodes:$node" 0 "cpumap: $expected_range" + done +fi + +echo "# A few attempts of pinning to a random range of nodes" >> $outfile +if [ $nr_nodes -gt 1 ]; then + for i in `seq 0 3`; do + nodea=$(($RANDOM % nr_nodes)) + range=$((nr_nodes - nodea)) + nodeb=$(($RANDOM % range)) + nodebb=$((nodea + nodeb)) + # this assumes that the first $nr_cpus_per_node (from cpu + # 0 to cpu $nr_cpus_per_node-1) are assigned to the first node + # (node 0), the second $nr_cpus_per_node (from $nr_cpus_per_node + # to 2*$nr_cpus_per_node-1) are assigned to the second node (node + # 1), etc. Expect failures if that is not the case. 
+ if [ $nodea -eq 0 ] && [ $nodebb -eq $((nr_nodes - 1)) ]; then + expected_range="all" + else + expected_range="$((nr_cpus_per_node*nodea))-$((nr_cpus_per_node*(nodebb+1) - 1))" + fi + write "nodes:$nodea-$nodebb" 0 "cpumap: $expected_range" + done +fi + +echo "# A few attempts of pinning to a node but excluding one random cpu" >> $outfile +if [ $nr_nodes -gt 1 ]; then + for i in `seq 0 3`; do + node=$(($RANDOM % nr_nodes)) + # this assumes that the first $nr_cpus_per_node (from cpu + # 0 to cpu $nr_cpus_per_node-1) are assigned to the first node + # (node 0), the second $nr_cpus_per_node (from $nr_cpus_per_node + # to 2*$nr_cpus_per_node-1) are assigned to the second node (node + # 1), etc. Expect failures if that is not the case. + cpu=$(($RANDOM % nr_cpus_per_node + nr_cpus_per_node*node)) + if [ $cpu -eq $((nr_cpus_per_node*node)) ]; then + expected_range="$((nr_cpus_per_node*node + 1))-$((nr_cpus_per_node*(node+1) - 1))" + elif [ $cpu -eq $((nr_cpus_per_node*node + 1)) ]; then + expected_range="$((nr_cpus_per_node*node)),$((nr_cpus_per_node*node + 2))-$((nr_cpus_per_node*(node+1) - 1))" + elif [ $cpu -eq $((nr_cpus_per_node*(node+1) - 2)) ]; then + expected_range="$((nr_cpus_per_node*node))-$((nr_cpus_per_node*(node+1) - 3)),$((nr_cpus_per_node*(node+1) - 1))" + elif [ $cpu -eq $((nr_cpus_per_node*(node+1) - 1)) ]; then + expected_range="$((nr_cpus_per_node*node))-$((nr_cpus_per_node*(node+1) - 2))" + else + expected_range="$((nr_cpus_per_node*node))-$((cpu - 1)),$((cpu + 1))-$((nr_cpus_per_node*(node+1) - 1))" + fi + write "nodes:$node,^$cpu" 0 "cpumap: $expected_range" + done +fi + +run $outfile diff --git a/tools/libxl/check-xl-vcpupin-parse.data-example b/tools/libxl/check-xl-vcpupin-parse.data-example new file mode 100644 index 0000000..4bbd5de --- /dev/null +++ b/tools/libxl/check-xl-vcpupin-parse.data-example @@ -0,0 +1,53 @@ +# WARNING: some of these tests are topology based tests. 
+# Expect failures if the topology is not detected correctly +# detected topology: 16 CPUs, 2 nodes, 8 CPUs per node. +# +# seed used for random number generation: seed=13328. +# +# Format is as follows: +# test-string*expected-return-code*expected-output +# +# Testing a wrong configuration +foo*255* +# Testing the 'all' syntax +all*0*cpumap: all +nodes:all*0*cpumap: all +all,nodes:all*0*cpumap: all +all,^nodes:0,all*0*cpumap: all +# Testing the empty cpumap case +^0*0*cpumap: none +# A few attempts of pinning to just one random cpu +0*0*cpumap: 0 +9*0*cpumap: 9 +6*0*cpumap: 6 +0*0*cpumap: 0 +# A few attempts of pinning to all but one random cpu +all,^12*0*cpumap: 0-11,13-15 +all,^6*0*cpumap: 0-5,7-15 +all,^3*0*cpumap: 0-2,4-15 +all,^7*0*cpumap: 0-6,8-15 +# A few attempts of pinning to a random range of cpus +13-15*0*cpumap: 13-15 +7*0*cpumap: 7 +3-5*0*cpumap: 3-5 +8-11*0*cpumap: 8-11 +# A few attempts of pinning to just one random node +nodes:1*0*cpumap: 8-15 +nodes:0*0*cpumap: 0-7 +nodes:0*0*cpumap: 0-7 +nodes:0*0*cpumap: 0-7 +# A few attempts of pinning to all but one random node +all,^nodes:0*0*cpumap: 8-15 +all,^nodes:1*0*cpumap: 0-7 +all,^nodes:1*0*cpumap: 0-7 +all,^nodes:0*0*cpumap: 8-15 +# A few attempts of pinning to a random range of nodes +nodes:1-1*0*cpumap: 8-15 +nodes:1-1*0*cpumap: 8-15 +nodes:0-1*0*cpumap: all +nodes:0-0*0*cpumap: 0-7 +# A few attempts of pinning to a node but excluding one random cpu +nodes:1,^8*0*cpumap: 9-15 +nodes:0,^6*0*cpumap: 0-5,7 +nodes:1,^9*0*cpumap: 8,10-15 +nodes:0,^5*0*cpumap: 0-4,6-7
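The expected outputs in the data file follow one rendering rule: a full map prints as `all`, an empty one as `none`, and anything else as comma-separated ranges. The C sketch below reproduces that rule so the expected strings can be checked by hand. `format_cpumap()` and `node_mask()` are hypothetical helpers (not xl or libxl functions), a 64-bit word stands in for a real cpumap, and `node_mask()` makes the same contiguous node-to-CPU assumption the script warns about.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not an xl/libxl function): render a CPU bitmap the
 * way the expected-output column above does. Bit i set means pCPU i is in
 * the map; a 64-bit word stands in for a real cpumap, so nr_cpus <= 64.
 */
static void format_cpumap(uint64_t map, int nr_cpus, char *buf, size_t len)
{
    assert(nr_cpus > 0 && nr_cpus <= 64 && len > 0);
    uint64_t full = (nr_cpus == 64) ? ~UINT64_C(0)
                                    : (UINT64_C(1) << nr_cpus) - 1;
    size_t off = 0;

    map &= full;
    buf[0] = '\0';
    if (map == full) {
        snprintf(buf, len, "all");
        return;
    }
    if (map == 0) {
        snprintf(buf, len, "none");
        return;
    }

    for (int i = 0; i < nr_cpus; i++) {
        if (!(map & (UINT64_C(1) << i)))
            continue;
        /* Extend the run of set bits starting at i as far as it goes. */
        int j = i;
        while (j + 1 < nr_cpus && (map & (UINT64_C(1) << (j + 1))))
            j++;
        /* A lone CPU prints as "i", a run as "i-j"; runs are comma-separated. */
        int n = (j == i)
            ? snprintf(buf + off, len - off, "%s%d", off ? "," : "", i)
            : snprintf(buf + off, len - off, "%s%d-%d", off ? "," : "", i, j);
        if (n < 0 || (size_t)n >= len - off)
            return; /* output truncated: stop instead of overflowing buf */
        off += (size_t)n;
        i = j;
    }
}

/*
 * Hypothetical helper: CPUs of `node`, making the same assumption as the
 * script, i.e. node k owns CPUs k*per_node .. (k+1)*per_node - 1.
 */
static uint64_t node_mask(int node, int per_node)
{
    assert(node >= 0 && per_node > 0 && (node + 1) * per_node <= 64);
    uint64_t one_node = (per_node == 64) ? ~UINT64_C(0)
                                         : (UINT64_C(1) << per_node) - 1;
    return one_node << (node * per_node);
}
```

With the topology from the example (16 CPUs, 8 per node), `format_cpumap(node_mask(0, 8) & ~(UINT64_C(1) << 6), 16, buf, sizeof buf)` yields `0-5,7`, matching the `nodes:0,^6` entry, and removing CPU 12 from a full map yields `0-11,13-15`, matching `all,^12`.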
Dario Faggioli
2013-Nov-18 18:17 UTC
[PATCH v3 06/14] xen: sched: rename v->cpu_affinity into v->cpu_hard_affinity
in order to distinguish it from the cpu_soft_affinity, which will be introduced in a later commit ("xen: sched: introduce soft-affinity and use it instead d->node-affinity").

This patch does not imply any functional change; it is basically the result of something like the following:

 s/cpu_affinity/cpu_hard_affinity/g
 s/cpu_affinity_tmp/cpu_hard_affinity_tmp/g
 s/cpu_affinity_saved/cpu_hard_affinity_saved/g

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
---
Changes from v2:
 * patch has been moved one step up in the series.
---
 xen/arch/x86/traps.c      | 11 ++++++-----
 xen/common/domain.c       | 22 +++++++++++-----------
 xen/common/domctl.c       |  2 +-
 xen/common/keyhandler.c   |  2 +-
 xen/common/sched_credit.c | 12 ++++++------
 xen/common/sched_sedf.c   |  2 +-
 xen/common/schedule.c     | 21 +++++++++++----------
 xen/common/wait.c         |  4 ++--
 xen/include/xen/sched.h   |  8 ++++----
 9 files changed, 43 insertions(+), 41 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index e5b3585..4279cad 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -3083,7 +3083,8 @@ static void nmi_mce_softirq(void) /* Set the tmp value unconditionally, so that * the check in the iret hypercall works. */ - cpumask_copy(st->vcpu->cpu_affinity_tmp, st->vcpu->cpu_affinity); + cpumask_copy(st->vcpu->cpu_hard_affinity_tmp, + st->vcpu->cpu_hard_affinity); if ((cpu != st->processor) || (st->processor != st->vcpu->processor)) @@ -3118,11 +3119,11 @@ void async_exception_cleanup(struct vcpu *curr) return; /* Restore affinity.
*/ - if ( !cpumask_empty(curr->cpu_affinity_tmp) && - !cpumask_equal(curr->cpu_affinity_tmp, curr->cpu_affinity) ) + if ( !cpumask_empty(curr->cpu_hard_affinity_tmp) && + !cpumask_equal(curr->cpu_hard_affinity_tmp, curr->cpu_hard_affinity) ) { - vcpu_set_affinity(curr, curr->cpu_affinity_tmp); - cpumask_clear(curr->cpu_affinity_tmp); + vcpu_set_affinity(curr, curr->cpu_hard_affinity_tmp); + cpumask_clear(curr->cpu_hard_affinity_tmp); } if ( !(curr->async_exception_mask & (curr->async_exception_mask - 1)) ) diff --git a/xen/common/domain.c b/xen/common/domain.c index 2cbc489..d8116c7 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -125,9 +125,9 @@ struct vcpu *alloc_vcpu( tasklet_init(&v->continue_hypercall_tasklet, NULL, 0); - if ( !zalloc_cpumask_var(&v->cpu_affinity) || - !zalloc_cpumask_var(&v->cpu_affinity_tmp) || - !zalloc_cpumask_var(&v->cpu_affinity_saved) || + if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) || + !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) || + !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) || !zalloc_cpumask_var(&v->vcpu_dirty_cpumask) ) goto fail_free; @@ -156,9 +156,9 @@ struct vcpu *alloc_vcpu( fail_wq: destroy_waitqueue_vcpu(v); fail_free: - free_cpumask_var(v->cpu_affinity); - free_cpumask_var(v->cpu_affinity_tmp); - free_cpumask_var(v->cpu_affinity_saved); + free_cpumask_var(v->cpu_hard_affinity); + free_cpumask_var(v->cpu_hard_affinity_tmp); + free_cpumask_var(v->cpu_hard_affinity_saved); free_cpumask_var(v->vcpu_dirty_cpumask); free_vcpu_struct(v); return NULL; @@ -371,7 +371,7 @@ void domain_update_node_affinity(struct domain *d) for_each_vcpu ( d, v ) { - cpumask_and(online_affinity, v->cpu_affinity, online); + cpumask_and(online_affinity, v->cpu_hard_affinity, online); cpumask_or(cpumask, cpumask, online_affinity); } @@ -734,9 +734,9 @@ static void complete_domain_destroy(struct rcu_head *head) for ( i = d->max_vcpus - 1; i >= 0; i-- ) if ( (v = d->vcpu[i]) != NULL ) { - free_cpumask_var(v->cpu_affinity); - 
free_cpumask_var(v->cpu_affinity_tmp); - free_cpumask_var(v->cpu_affinity_saved); + free_cpumask_var(v->cpu_hard_affinity); + free_cpumask_var(v->cpu_hard_affinity_tmp); + free_cpumask_var(v->cpu_hard_affinity_saved); free_cpumask_var(v->vcpu_dirty_cpumask); free_vcpu_struct(v); } @@ -875,7 +875,7 @@ int vcpu_reset(struct vcpu *v) v->async_exception_mask = 0; memset(v->async_exception_state, 0, sizeof(v->async_exception_state)); #endif - cpumask_clear(v->cpu_affinity_tmp); + cpumask_clear(v->cpu_hard_affinity_tmp); clear_bit(_VPF_blocked, &v->pause_flags); clear_bit(_VPF_in_reset, &v->pause_flags); diff --git a/xen/common/domctl.c b/xen/common/domctl.c index 904d27b..5e0ac5c 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -629,7 +629,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) else { ret = cpumask_to_xenctl_bitmap( - &op->u.vcpuaffinity.cpumap, v->cpu_affinity); + &op->u.vcpuaffinity.cpumap, v->cpu_hard_affinity); } } break; diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c index 8e4b3f8..c11f577 100644 --- a/xen/common/keyhandler.c +++ b/xen/common/keyhandler.c @@ -296,7 +296,7 @@ static void dump_domains(unsigned char key) !vcpu_event_delivery_is_enabled(v)); cpuset_print(tmpstr, sizeof(tmpstr), v->vcpu_dirty_cpumask); printk("dirty_cpus=%s ", tmpstr); - cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_affinity); + cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_hard_affinity); printk("cpu_affinity=%s\n", tmpstr); printk(" pause_count=%d pause_flags=%lx\n", atomic_read(&v->pause_count), v->pause_flags); diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index db5512e..c6a2560 100644 --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -332,13 +332,13 @@ csched_balance_cpumask(const struct vcpu *vc, int step, cpumask_t *mask) if ( step == CSCHED_BALANCE_NODE_AFFINITY ) { cpumask_and(mask, CSCHED_DOM(vc->domain)->node_affinity_cpumask, - vc->cpu_affinity); + vc->cpu_hard_affinity); if ( 
unlikely(cpumask_empty(mask)) ) - cpumask_copy(mask, vc->cpu_affinity); + cpumask_copy(mask, vc->cpu_hard_affinity); } else /* step == CSCHED_BALANCE_CPU_AFFINITY */ - cpumask_copy(mask, vc->cpu_affinity); + cpumask_copy(mask, vc->cpu_hard_affinity); } static void burn_credits(struct csched_vcpu *svc, s_time_t now) @@ -407,7 +407,7 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new) if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY && !__vcpu_has_node_affinity(new->vcpu, - new->vcpu->cpu_affinity) ) + new->vcpu->cpu_hard_affinity) ) continue; /* Are there idlers suitable for new (for this balance step)? */ @@ -642,7 +642,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit) /* Store in cpus the mask of online cpus on which the domain can run */ online = cpupool_scheduler_cpumask(vc->domain->cpupool); - cpumask_and(&cpus, vc->cpu_affinity, online); + cpumask_and(&cpus, vc->cpu_hard_affinity, online); for_each_csched_balance_step( balance_step ) { @@ -1498,7 +1498,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step) * or counter. 
*/ if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY - && !__vcpu_has_node_affinity(vc, vc->cpu_affinity) ) + && !__vcpu_has_node_affinity(vc, vc->cpu_hard_affinity) ) continue; csched_balance_cpumask(vc, balance_step, csched_balance_mask); diff --git a/xen/common/sched_sedf.c b/xen/common/sched_sedf.c index 7c24171..c219aed 100644 --- a/xen/common/sched_sedf.c +++ b/xen/common/sched_sedf.c @@ -396,7 +396,7 @@ static int sedf_pick_cpu(const struct scheduler *ops, struct vcpu *v) cpumask_t *online; online = cpupool_scheduler_cpumask(v->domain->cpupool); - cpumask_and(&online_affinity, v->cpu_affinity, online); + cpumask_and(&online_affinity, v->cpu_hard_affinity, online); return cpumask_cycle(v->vcpu_id % cpumask_weight(&online_affinity) - 1, &online_affinity); } diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 0f45f07..c4236c5 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -194,9 +194,9 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor) */ v->processor = processor; if ( is_idle_domain(d) || d->is_pinned ) - cpumask_copy(v->cpu_affinity, cpumask_of(processor)); + cpumask_copy(v->cpu_hard_affinity, cpumask_of(processor)); else - cpumask_setall(v->cpu_affinity); + cpumask_setall(v->cpu_hard_affinity); /* Initialise the per-vcpu timers. 
*/ init_timer(&v->periodic_timer, vcpu_periodic_timer_fn, @@ -285,7 +285,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c) migrate_timer(&v->singleshot_timer, new_p); migrate_timer(&v->poll_timer, new_p); - cpumask_setall(v->cpu_affinity); + cpumask_setall(v->cpu_hard_affinity); lock = vcpu_schedule_lock_irq(v); v->processor = new_p; @@ -457,7 +457,7 @@ static void vcpu_migrate(struct vcpu *v) */ if ( pick_called && (new_lock == per_cpu(schedule_data, new_cpu).schedule_lock) && - cpumask_test_cpu(new_cpu, v->cpu_affinity) && + cpumask_test_cpu(new_cpu, v->cpu_hard_affinity) && cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) ) break; @@ -561,7 +561,7 @@ void restore_vcpu_affinity(struct domain *d) { printk(XENLOG_DEBUG "Restoring affinity for d%dv%d\n", d->domain_id, v->vcpu_id); - cpumask_copy(v->cpu_affinity, v->cpu_affinity_saved); + cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved); v->affinity_broken = 0; } @@ -604,20 +604,21 @@ int cpu_disable_scheduler(unsigned int cpu) unsigned long flags; spinlock_t *lock = vcpu_schedule_lock_irqsave(v, &flags); - cpumask_and(&online_affinity, v->cpu_affinity, c->cpu_valid); + cpumask_and(&online_affinity, v->cpu_hard_affinity, c->cpu_valid); if ( cpumask_empty(&online_affinity) && - cpumask_test_cpu(cpu, v->cpu_affinity) ) + cpumask_test_cpu(cpu, v->cpu_hard_affinity) ) { printk(XENLOG_DEBUG "Breaking affinity for d%dv%d\n", d->domain_id, v->vcpu_id); if (system_state == SYS_STATE_suspend) { - cpumask_copy(v->cpu_affinity_saved, v->cpu_affinity); + cpumask_copy(v->cpu_hard_affinity_saved, + v->cpu_hard_affinity); v->affinity_broken = 1; } - cpumask_setall(v->cpu_affinity); + cpumask_setall(v->cpu_hard_affinity); } if ( v->processor == cpu ) @@ -665,7 +666,7 @@ int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity) lock = vcpu_schedule_lock_irq(v); - cpumask_copy(v->cpu_affinity, affinity); + cpumask_copy(v->cpu_hard_affinity, affinity); /* Always ask the scheduler to 
re-evaluate placement * when changing the affinity */ diff --git a/xen/common/wait.c b/xen/common/wait.c index 3c9366c..3f6ff41 100644 --- a/xen/common/wait.c +++ b/xen/common/wait.c @@ -134,7 +134,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv) /* Save current VCPU affinity; force wakeup on *this* CPU only. */ wqv->wakeup_cpu = smp_processor_id(); - cpumask_copy(&wqv->saved_affinity, curr->cpu_affinity); + cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) { gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); @@ -183,7 +183,7 @@ void check_wakeup_from_wait(void) { /* Re-set VCPU affinity and re-enter the scheduler. */ struct vcpu *curr = current; - cpumask_copy(&wqv->saved_affinity, curr->cpu_affinity); + cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) { gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index cbdf377..40e5927 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -192,11 +192,11 @@ struct vcpu spinlock_t virq_lock; /* Bitmask of CPUs on which this VCPU may run. */ - cpumask_var_t cpu_affinity; + cpumask_var_t cpu_hard_affinity; /* Used to change affinity temporarily. */ - cpumask_var_t cpu_affinity_tmp; + cpumask_var_t cpu_hard_affinity_tmp; /* Used to restore affinity across S3. */ - cpumask_var_t cpu_affinity_saved; + cpumask_var_t cpu_hard_affinity_saved; /* Bitmask of CPUs which are holding onto this VCPU's state.
*/ cpumask_var_t vcpu_dirty_cpumask; @@ -792,7 +792,7 @@ void watchdog_domain_destroy(struct domain *d); #define has_hvm_container_domain(d) ((d)->guest_type != guest_type_pv) #define has_hvm_container_vcpu(v) (has_hvm_container_domain((v)->domain)) #define is_pinned_vcpu(v) ((v)->domain->is_pinned || \ - cpumask_weight((v)->cpu_affinity) == 1) + cpumask_weight((v)->cpu_hard_affinity) == 1) #ifdef HAS_PASSTHROUGH #define need_iommu(d) ((d)->need_iommu) #else
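Beyond the mechanical rename, the `cpu_disable_scheduler()` and `restore_vcpu_affinity()` hunks show how hard affinity is handled when a pCPU goes away: if none of the pCPUs in a vCPU's hard affinity remain valid, the affinity is widened to all pCPUs, and during suspend the old mask is saved so it can be put back on resume. The toy C model below sketches that logic. The `toy_*` names are hypothetical, a 64-bit word stands in for `cpumask_t`, locking and vCPU migration are omitted, and `valid` is assumed to already exclude the pCPU being removed (as the check itself implies).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model: a 64-bit word stands in for cpumask_t. */
typedef uint64_t toy_mask_t;

struct toy_vcpu {
    toy_mask_t hard_affinity;       /* plays the role of v->cpu_hard_affinity */
    toy_mask_t hard_affinity_saved; /* ... of v->cpu_hard_affinity_saved */
    bool affinity_broken;           /* ... of v->affinity_broken */
};

/*
 * Simplified version of the check in cpu_disable_scheduler(): `valid` models
 * the pCPUs still usable in the vCPU's cpupool and is assumed to no longer
 * contain `cpu`. If no valid pCPU is left in the hard affinity, and `cpu`
 * was in it, the affinity is "broken" (widened to every pCPU). When
 * suspending, the old mask is saved first so it can be restored on resume.
 */
static void toy_disable_cpu(struct toy_vcpu *v, unsigned int cpu,
                            toy_mask_t valid, bool suspending)
{
    assert(cpu < 64);
    toy_mask_t online_affinity = v->hard_affinity & valid;

    if (online_affinity == 0 && (v->hard_affinity & (UINT64_C(1) << cpu))) {
        if (suspending) {
            v->hard_affinity_saved = v->hard_affinity;
            v->affinity_broken = true;
        }
        v->hard_affinity = ~UINT64_C(0); /* like cpumask_setall() */
    }
}

/* Simplified version of restore_vcpu_affinity() for a single vCPU. */
static void toy_restore_affinity(struct toy_vcpu *v)
{
    if (v->affinity_broken) {
        v->hard_affinity = v->hard_affinity_saved;
        v->affinity_broken = false;
    }
}
```

A vCPU pinned only to pCPU 3 gets its affinity broken when pCPU 3 is disabled and recovers the single-pCPU mask after `toy_restore_affinity()`; a vCPU whose hard affinity is {2, 3} is left untouched, since pCPU 2 is still valid.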
Dario Faggioli
2013-Nov-18 18:17 UTC
[PATCH v3 07/14] xen: sched: introduce soft-affinity and use it instead d->node-affinity
Before this change, each vcpu had its own vcpu-affinity (in v->cpu_affinity), representing the set of pcpus where the vcpu is allowed to run. Ever since NUMA-aware scheduling was introduced, the (credit1 only, for now) scheduler has also tried, as much as it can, to run all the vcpus of a domain on one of the nodes that constitute the domain's node-affinity.

The idea here is to make the mechanism more general by:
 * allowing this 'preference' for some pcpus/nodes to be expressed on a per-vcpu basis, rather than for the domain as a whole. That is to say, each vcpu should have its own set of preferred pcpus/nodes, rather than it being the very same for all the vcpus of the domain;
 * generalizing the idea of 'preferred pcpus' beyond NUMA awareness and support. That is to say, regardless of whether it is (mostly) useful on NUMA systems or not, it should be possible to specify, for each vcpu, a set of pcpus where it prefers to run (in addition to, and possibly unrelated to, the set of pcpus where it is allowed to run).

We will call this set of *preferred* pcpus the vcpu's soft affinity. This change introduces it and starts using it for scheduling, replacing the indirect use of the domain's NUMA node-affinity. This is more general, as soft affinity does not have to be related to NUMA. Nevertheless, it allows achieving the same results as NUMA-aware scheduling, just by making the soft affinity of all the vCPUs equal to the domain's node-affinity (e.g., from the toolstack).

This also means renaming most of the NUMA-aware scheduling related functions in credit1 to something more generic, hinting at the concept of soft affinity rather than directly at NUMA awareness.

As a side effect, this simplifies the code quite a bit. In fact, prior to this change, we needed to cache the translation of d->node_affinity (which is a nodemask_t) to a cpumask_t, since that is what scheduling decisions require (we used to keep it in node_affinity_cpumask). This, and all the complicated logic required to keep it updated, is no longer necessary.

The high level description of NUMA placement and scheduling in docs/misc/xl-numa-placement.markdown is updated too, to match the new architecture.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
---
Changes from v2:
 * this patch folds patches 6 ("xen: sched: make space for cpu_soft_affinity") and 10 ("xen: sched: use soft-affinity instead of domain's node-affinity"), as suggested during review. 'Reviewed-by' from George is there since both patches 6 and 10 had it, and I didn't do anything other than squash them.

Changes from v1:
 * in v1, "7/12 xen: numa-sched: use per-vcpu node-affinity for actual scheduling" was doing something very similar to this patch.
---
 docs/misc/xl-numa-placement.markdown | 148 ++++++++++++++++++++++-----------
 xen/common/domain.c                  |   5 +
 xen/common/keyhandler.c              |   2
 xen/common/sched_credit.c            | 153 +++++++++++++++++++++---------------------
 xen/common/schedule.c                |   3 +
 xen/include/xen/sched.h              |   3 +
 6 files changed, 168 insertions(+), 146 deletions(-)

diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown index caa3fec..b1ed361 100644 --- a/docs/misc/xl-numa-placement.markdown +++ b/docs/misc/xl-numa-placement.markdown @@ -12,13 +12,6 @@ is quite more complex and slow. On these machines, a NUMA node is usually defined as a set of processor cores (typically a physical CPU package) and the memory directly attached to the set of cores. -The Xen hypervisor deals with NUMA machines by assigning to each domain -a "node affinity", i.e., a set of NUMA nodes of the host from which they -get their memory allocated. Also, even if the node affinity of a domain -is allowed to change on-line, it is very important to "place" the domain -correctly when it is fist created, as the most of its memory is allocated -at that time and can not (for now) be moved easily.
- NUMA awareness becomes very important as soon as many domains start running memory-intensive workloads on a shared host. In fact, the cost of accessing non node-local memory locations is very high, and the @@ -27,14 +20,37 @@ performance degradation is likely to be noticeable. For more information, have a look at the [Xen NUMA Introduction][numa_intro] page on the Wiki. +## Xen and NUMA machines: the concept of _node-affinity_ ## + +The Xen hypervisor deals with NUMA machines through the concept of +_node-affinity_. The node-affinity of a domain is the set of NUMA nodes +of the host where the memory for the domain is being allocated (mostly, +at domain creation time). This is, at least in principle, different and +unrelated to the vCPU (hard and soft, see below) scheduling affinity, +which instead is the set of pCPUs where the vCPU is allowed (or prefers) +to run. + +Of course, despite the fact that they belong to and affect different +subsystems, the domain node-affinity and the vCPUs affinity are not +completely independent. +In fact, if the domain node-affinity is not explicitly specified by the +user, via the proper libxl calls or xl config item, it will be computed +based on the vCPUs' scheduling affinity. + +Notice that, even if the node affinity of a domain may change on-line, +it is very important to "place" the domain correctly when it is first +created, as most of its memory is allocated at that time and can +not (for now) be moved easily. + ### Placing via pinning and cpupools ### -The simplest way of placing a domain on a NUMA node is statically pinning -the domain's vCPUs to the pCPUs of the node. This goes under the name of -CPU affinity and can be set through the "cpus=" option in the config file -(more about this below). Another option is to pool together the pCPUs -spanning the node and put the domain in such a cpupool with the "pool=" -config option (as documented in our [Wiki][cpupools_howto]).
+The simplest way of placing a domain on a NUMA node is setting the hard +scheduling affinity of the domain's vCPUs to the pCPUs of the node. This +also goes under the name of vCPU pinning, and can be done through the +"cpus=" option in the config file (more about this below). Another option +is to pool together the pCPUs spanning the node and put the domain in +such a _cpupool_ with the "pool=" config option (as documented in our +[Wiki][cpupools_howto]). In both the above cases, the domain will not be able to execute outside the specified set of pCPUs for any reasons, even if all those pCPUs are @@ -45,24 +61,45 @@ may come at he cost of some load imbalances. ### NUMA aware scheduling ### -If the credit scheduler is in use, the concept of node affinity defined -above does not only apply to memory. In fact, starting from Xen 4.3, the -scheduler always tries to run the domain's vCPUs on one of the nodes in -its node affinity. Only if that turns out to be impossible, it will just -pick any free pCPU. - -This is, therefore, something more flexible than CPU affinity, as a domain -can still run everywhere, it just prefers some nodes rather than others. -Locality of access is less guaranteed than in the pinning case, but that -comes along with better chances to exploit all the host resources (e.g., -the pCPUs). - -In fact, if all the pCPUs in a domain's node affinity are busy, it is -possible for the domain to run outside of there, but it is very likely that -slower execution (due to remote memory accesses) is still better than no -execution at all, as it would happen with pinning. For this reason, NUMA -aware scheduling has the potential of bringing substantial performances -benefits, although this will depend on the workload. +If using the credit1 scheduler, and starting from Xen 4.3, the scheduler +itself always tries to run the domain's vCPUs on one of the nodes in +its node-affinity. Only if that turns out to be impossible, it will just +pick any free pCPU.
Locality of access is less guaranteed than in the +pinning case, but that comes along with better chances to exploit all +the host resources (e.g., the pCPUs). + +Starting from Xen 4.4, credit1 supports two forms of affinity: hard and +soft, both on a per-vCPU basis. This means each vCPU can have its own +soft affinity, stating where such vCPU prefers to execute. This is +less strict than what (also starting from 4.4) is called hard affinity, +as the vCPU can potentially run everywhere, it just prefers some pCPUs +rather than others. +In Xen 4.4, therefore, NUMA-aware scheduling is achieved by matching the +soft affinity of the vCPUs of a domain with its node-affinity. + +In fact, as it was for 4.3, if all the pCPUs in a vCPU's soft affinity +are busy, it is possible for the domain to run outside of there. The +idea is that slower execution (due to remote memory accesses) is still +better than no execution at all (as would happen with pinning). For +this reason, NUMA aware scheduling has the potential of bringing +substantial performance benefits, although this will depend on the +workload. + +Notice that, for each vCPU, the following three scenarios are possible: + + * a vCPU *is pinned* to some pCPUs and *does not have* any soft affinity. + In this case, the vCPU is always scheduled on one of the pCPUs to which + it is pinned, without any specific preference among them. + * a vCPU *has* its own soft affinity and *is not* pinned to any particular + pCPU. In this case, the vCPU can run on every pCPU. Nevertheless, the + scheduler will try to have it running on one of the pCPUs in its soft + affinity; + * a vCPU *has* its own vCPU soft affinity and *is also* pinned to some + pCPUs. In this case, the vCPU is always scheduled on one of the pCPUs + onto which it is pinned, with, among them, a preference for the ones + that also form its soft affinity.
In case pinning and soft affinity + form two disjoint sets of pCPUs, pinning "wins", and the soft affinity + is just ignored. ## Guest placement in xl ## @@ -71,25 +108,23 @@ both manual or automatic placement of them across the host's NUMA nodes. Note that xm/xend does a very similar thing, the only differences being the details of the heuristics adopted for automatic placement (see below), -and the lack of support (in both xm/xend and the Xen versions where that\ +and the lack of support (in both xm/xend and the Xen versions where that was the default toolstack) for NUMA aware scheduling. ### Placing the guest manually ### Thanks to the "cpus=" option, it is possible to specify where a domain should be created and scheduled on, directly in its config file. This -affects NUMA placement and memory accesses as the hypervisor constructs -the node affinity of a VM basing right on its CPU affinity when it is -created. +affects NUMA placement and memory accesses as, in this case, the +hypervisor constructs the node-affinity of a VM based directly on its +vCPU pinning when it is created. This is very simple and effective, but requires the user/system -administrator to explicitly specify affinities for each and every domain, +administrator to explicitly specify the pinning for each and every domain, or Xen won't be able to guarantee the locality for their memory accesses. -Notice that this also pins the domain's vCPUs to the specified set of -pCPUs, so it not only sets the domain's node affinity (its memory will -come from the nodes to which the pCPUs belong), but at the same time -forces the vCPUs of the domain to be scheduled on those same pCPUs. +That, of course, also means the vCPUs of the domain will only be able to +execute on those same pCPUs. ### Placing the guest automatically ### @@ -97,7 +132,9 @@ If no "cpus=" option is specified in the config file, libxl tries to figure out on its own on which node(s) the domain could fit best.
If it finds one (some), the domain's node affinity get set to there, and both memory allocations and NUMA aware scheduling (for the credit -scheduler and starting from Xen 4.3) will comply with it. +scheduler and starting from Xen 4.3) will comply with it. Starting from +Xen 4.4, this also means that the mask resulting from this "fitting" +procedure will become the soft affinity of all the vCPUs of the domain. It is worthwhile noting that optimally fitting a set of VMs on the NUMA nodes of an host is an incarnation of the Bin Packing Problem. In fact, @@ -142,34 +179,43 @@ any placement from happening: libxl_defbool_set(&domain_build_info->numa_placement, false); -Also, if `numa_placement` is set to `true`, the domain must not -have any CPU affinity (i.e., `domain_build_info->cpumap` must -have all its bits set, as it is by default), or domain creation -will fail returning `ERROR_INVAL`. +Also, if `numa_placement` is set to `true`, the domain's vCPUs must +not be pinned (i.e., `domain_build_info->cpumap` must have all its +bits set, as it is by default), or domain creation will fail with +`ERROR_INVAL`. Starting from Xen 4.3, in case automatic placement happens (and is -successful), it will affect the domain's node affinity and _not_ its -CPU affinity. Namely, the domain's vCPUs will not be pinned to any +successful), it will affect the domain's node-affinity and _not_ its +vCPU pinning. Namely, the domain's vCPUs will not be pinned to any pCPU on the host, but the memory from the domain will come from the selected node(s) and the NUMA aware scheduling (if the credit scheduler -is in use) will try to keep the domain there as much as possible. +is in use) will try to keep the domain's vCPUs there as much as possible. Besides than that, looking and/or tweaking the placement algorithm search "Automatic NUMA placement" in libxl\_internal.h. Note this may change in future versions of Xen/libxl.
+## Xen < 4.4 ## + +The concept of vCPU soft affinity has been introduced for the first time +in Xen 4.4. In 4.3, it is the domain's node-affinity that drives the +NUMA-aware scheduler. The main difference is that soft affinity is per-vCPU, +and so each vCPU can have its own mask of pCPUs, while node-affinity is +per-domain, which is the equivalent of having all the vCPUs with the same +soft affinity. + ## Xen < 4.3 ## As NUMA aware scheduling is a new feature of Xen 4.3, things are a little bit different for earlier version of Xen. If no "cpus=" option is specified and Xen 4.2 is in use, the automatic placement algorithm still runs, but the results is used to _pin_ the vCPUs of the domain to the output node(s). -This is consistent with what was happening with xm/xend, which were also -affecting the domain's CPU affinity. +This is consistent with what was happening with xm/xend. On a version of Xen earlier than 4.2, there is not automatic placement at -all in xl or libxl, and hence no node or CPU affinity being affected. +all in xl or libxl, and hence no node-affinity, vCPU affinity or pinning +being introduced/modified.
## Limitations ## diff --git a/xen/common/domain.c b/xen/common/domain.c index d8116c7..d6ac4d1 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -128,6 +128,7 @@ struct vcpu *alloc_vcpu( if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) || !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) || !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) || + !zalloc_cpumask_var(&v->cpu_soft_affinity) || !zalloc_cpumask_var(&v->vcpu_dirty_cpumask) ) goto fail_free; @@ -159,6 +160,7 @@ struct vcpu *alloc_vcpu( free_cpumask_var(v->cpu_hard_affinity); free_cpumask_var(v->cpu_hard_affinity_tmp); free_cpumask_var(v->cpu_hard_affinity_saved); + free_cpumask_var(v->cpu_soft_affinity); free_cpumask_var(v->vcpu_dirty_cpumask); free_vcpu_struct(v); return NULL; @@ -390,8 +392,6 @@ void domain_update_node_affinity(struct domain *d) node_set(node, d->node_affinity); } - sched_set_node_affinity(d, &d->node_affinity); - spin_unlock(&d->node_affinity_lock); free_cpumask_var(online_affinity); @@ -737,6 +737,7 @@ static void complete_domain_destroy(struct rcu_head *head) free_cpumask_var(v->cpu_hard_affinity); free_cpumask_var(v->cpu_hard_affinity_tmp); free_cpumask_var(v->cpu_hard_affinity_saved); + free_cpumask_var(v->cpu_soft_affinity); free_cpumask_var(v->vcpu_dirty_cpumask); free_vcpu_struct(v); } diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c index c11f577..42fb418 100644 --- a/xen/common/keyhandler.c +++ b/xen/common/keyhandler.c @@ -298,6 +298,8 @@ static void dump_domains(unsigned char key) printk("dirty_cpus=%s ", tmpstr); cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_hard_affinity); printk("cpu_affinity=%s\n", tmpstr); + cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_soft_affinity); + printk("cpu_soft_affinity=%s\n", tmpstr); printk(" pause_count=%d pause_flags=%lx\n", atomic_read(&v->pause_count), v->pause_flags); arch_dump_vcpu_info(v); diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index c6a2560..8b02b7b 100644 --- a/xen/common/sched_credit.c 
+++ b/xen/common/sched_credit.c @@ -112,10 +112,24 @@ /* - * Node Balancing + * Hard and soft affinity load balancing. + * + * The idea is that each vcpu has some pcpus that it prefers, some that it + * does not prefer but is OK with, and some that it cannot run on at all. The + * first set of pcpus are the ones that are both in the soft affinity *and* in + * the hard affinity; the second set of pcpus are the ones that are in the + * hard affinity but *not* in the soft affinity; the third set of pcpus are + * the ones that are not in the hard affinity. + * + * We implement a two-step balancing logic. Basically, every time there is + * the need to decide where to run a vcpu, we first check the soft affinity + * (well, actually, the && between soft and hard affinity), to see if we can + * send it where it prefers to (and can) run. However, if the first step + * does not find any suitable and free pcpu, we fall back to checking the + * hard affinity. */ -#define CSCHED_BALANCE_NODE_AFFINITY 0 -#define CSCHED_BALANCE_CPU_AFFINITY 1 +#define CSCHED_BALANCE_SOFT_AFFINITY 0 +#define CSCHED_BALANCE_HARD_AFFINITY 1 /* * Boot parameters @@ -138,7 +152,7 @@ struct csched_pcpu { /* * Convenience macro for accessing the per-PCPU cpumask we need for - * implementing the two steps (vcpu and node affinity) balancing logic. + * implementing the two steps (soft and hard affinity) balancing logic. * It is stored in csched_pcpu so that serialization is not an issue, * as there is a csched_pcpu for each PCPU and we always hold the * runqueue spin-lock when using this. @@ -178,9 +192,6 @@ struct csched_dom { struct list_head active_vcpu; struct list_head active_sdom_elem; struct domain *dom; - /* cpumask translated from the domain's node-affinity. - * Basically, the CPUs we prefer to be scheduled on.
*/ - cpumask_var_t node_affinity_cpumask; uint16_t active_vcpu_count; uint16_t weight; uint16_t cap; @@ -261,59 +272,28 @@ __runq_remove(struct csched_vcpu *svc) list_del_init(&svc->runq_elem); } -/* - * Translates node-affinity mask into a cpumask, so that we can use it during - * actual scheduling. That of course will contain all the cpus from all the - * set nodes in the original node-affinity mask. - * - * Note that any serialization needed to access mask safely is complete - * responsibility of the caller of this function/hook. - */ -static void csched_set_node_affinity( - const struct scheduler *ops, - struct domain *d, - nodemask_t *mask) -{ - struct csched_dom *sdom; - int node; - - /* Skip idle domain since it doesn''t even have a node_affinity_cpumask */ - if ( unlikely(is_idle_domain(d)) ) - return; - - sdom = CSCHED_DOM(d); - cpumask_clear(sdom->node_affinity_cpumask); - for_each_node_mask( node, *mask ) - cpumask_or(sdom->node_affinity_cpumask, sdom->node_affinity_cpumask, - &node_to_cpumask(node)); -} #define for_each_csched_balance_step(step) \ - for ( (step) = 0; (step) <= CSCHED_BALANCE_CPU_AFFINITY; (step)++ ) + for ( (step) = 0; (step) <= CSCHED_BALANCE_HARD_AFFINITY; (step)++ ) /* - * vcpu-affinity balancing is always necessary and must never be skipped. - * OTOH, if a domain''s node-affinity is said to be automatically computed - * (or if it just spans all the nodes), we can safely avoid dealing with - * node-affinity entirely. + * Hard affinity balancing is always necessary and must never be skipped. + * OTOH, if the vcpu''s soft affinity is full (it spans all the possible + * pcpus) we can safely avoid dealing with it entirely. 
* - * Node-affinity is also deemed meaningless in case it has empty - * intersection with mask, to cover the cases where using the node-affinity + * A vcpu''s soft affinity is also deemed meaningless in case it has empty + * intersection with mask, to cover the cases where using the soft affinity * mask seems legit, but would instead led to trying to schedule the vcpu * on _no_ pcpu! Typical use cases are for mask to be equal to the vcpu''s - * vcpu-affinity, or to the && of vcpu-affinity and the set of online cpus + * hard affinity, or to the && of hard affinity and the set of online cpus * in the domain''s cpupool. */ -static inline int __vcpu_has_node_affinity(const struct vcpu *vc, +static inline int __vcpu_has_soft_affinity(const struct vcpu *vc, const cpumask_t *mask) { - const struct domain *d = vc->domain; - const struct csched_dom *sdom = CSCHED_DOM(d); - - if ( d->auto_node_affinity - || cpumask_full(sdom->node_affinity_cpumask) - || !cpumask_intersects(sdom->node_affinity_cpumask, mask) ) + if ( cpumask_full(vc->cpu_soft_affinity) + || !cpumask_intersects(vc->cpu_soft_affinity, mask) ) return 0; return 1; @@ -321,23 +301,22 @@ static inline int __vcpu_has_node_affinity(const struct vcpu *vc, /* * Each csched-balance step uses its own cpumask. This function determines - * which one (given the step) and copies it in mask. For the node-affinity - * balancing step, the pcpus that are not part of vc''s vcpu-affinity are + * which one (given the step) and copies it in mask. For the soft affinity + * balancing step, the pcpus that are not part of vc''s hard affinity are * filtered out from the result, to avoid running a vcpu where it would * like, but is not allowed to! 
*/ static void csched_balance_cpumask(const struct vcpu *vc, int step, cpumask_t *mask) { - if ( step == CSCHED_BALANCE_NODE_AFFINITY ) + if ( step == CSCHED_BALANCE_SOFT_AFFINITY ) { - cpumask_and(mask, CSCHED_DOM(vc->domain)->node_affinity_cpumask, - vc->cpu_hard_affinity); + cpumask_and(mask, vc->cpu_soft_affinity, vc->cpu_hard_affinity); if ( unlikely(cpumask_empty(mask)) ) cpumask_copy(mask, vc->cpu_hard_affinity); } - else /* step == CSCHED_BALANCE_CPU_AFFINITY */ + else /* step == CSCHED_BALANCE_HARD_AFFINITY */ cpumask_copy(mask, vc->cpu_hard_affinity); } @@ -398,15 +377,15 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new) else if ( !idlers_empty ) { /* - * Node and vcpu-affinity balancing loop. For vcpus without - * a useful node-affinity, consider vcpu-affinity only. + * Soft and hard affinity balancing loop. For vcpus without + * a useful soft affinity, consider hard affinity only. */ for_each_csched_balance_step( balance_step ) { int new_idlers_empty; - if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY - && !__vcpu_has_node_affinity(new->vcpu, + if ( balance_step == CSCHED_BALANCE_SOFT_AFFINITY + && !__vcpu_has_soft_affinity(new->vcpu, new->vcpu->cpu_hard_affinity) ) continue; @@ -418,11 +397,11 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new) /* * Let''s not be too harsh! If there aren''t idlers suitable - * for new in its node-affinity mask, make sure we check its - * vcpu-affinity as well, before taking final decisions. + * for new in its soft affinity mask, make sure we check its + * hard affinity as well, before taking final decisions. */ if ( new_idlers_empty - && balance_step == CSCHED_BALANCE_NODE_AFFINITY ) + && balance_step == CSCHED_BALANCE_SOFT_AFFINITY ) continue; /* @@ -649,23 +628,23 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit) /* * We want to pick up a pcpu among the ones that are online and * can accommodate vc, which is basically what we computed above - * and stored in cpus. 
As far as vcpu-affinity is concerned, + * and stored in cpus. As far as hard affinity is concerned, * there always will be at least one of these pcpus, hence cpus * is never empty and the calls to cpumask_cycle() and * cpumask_test_cpu() below are ok. * - * On the other hand, when considering node-affinity too, it + * On the other hand, when considering soft affinity too, it * is possible for the mask to become empty (for instance, if the * domain has been put in a cpupool that does not contain any of the - * nodes in its node-affinity), which would result in the ASSERT()-s + * pcpus in its soft affinity), which would result in the ASSERT()-s * inside cpumask_*() operations triggering (in debug builds). * - * Therefore, in this case, we filter the node-affinity mask against - * cpus and, if the result is empty, we just skip the node-affinity + * Therefore, in this case, we filter the soft affinity mask against + * cpus and, if the result is empty, we just skip the soft affinity * balancing step all together. 
 */ - if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY - && !__vcpu_has_node_affinity(vc, &cpus) ) + if ( balance_step == CSCHED_BALANCE_SOFT_AFFINITY + && !__vcpu_has_soft_affinity(vc, &cpus) ) continue; /* Pick an online CPU from the proper affinity mask */ @@ -1122,13 +1101,6 @@ csched_alloc_domdata(const struct scheduler *ops, struct domain *dom) if ( sdom == NULL ) return NULL; - if ( !alloc_cpumask_var(&sdom->node_affinity_cpumask) ) - { - xfree(sdom); - return NULL; - } - cpumask_setall(sdom->node_affinity_cpumask); - /* Initialize credit and weight */ INIT_LIST_HEAD(&sdom->active_vcpu); INIT_LIST_HEAD(&sdom->active_sdom_elem); @@ -1158,9 +1130,6 @@ csched_dom_init(const struct scheduler *ops, struct domain *dom) static void csched_free_domdata(const struct scheduler *ops, void *data) { - struct csched_dom *sdom = data; - - free_cpumask_var(sdom->node_affinity_cpumask); xfree(data); } @@ -1486,19 +1455,19 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step) BUG_ON( is_idle_vcpu(vc) ); /* - * If the vcpu has no useful node-affinity, skip this vcpu. - * In fact, what we want is to check if we have any node-affine - * work to steal, before starting to look at vcpu-affine work. + * If the vcpu has no useful soft affinity, skip this vcpu. + * In fact, what we want is to check if we have any "soft-affine + * work" to steal, before starting to look at "hard-affine work". * * Notice that, if not even one vCPU on this runq has a useful - * node-affinity, we could have avoid considering this runq for - * a node balancing step in the first place. This, for instance, + * soft affinity, we could have avoided considering this runq for + * a soft balancing step in the first place. This, for instance, * can be implemented by taking note of on what runq there are - * vCPUs with useful node-affinities in some sort of bitmap + * vCPUs with useful soft affinities in some sort of bitmap * or counter.
*/ - if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY - && !__vcpu_has_node_affinity(vc, vc->cpu_hard_affinity) ) + if ( balance_step == CSCHED_BALANCE_SOFT_AFFINITY + && !__vcpu_has_soft_affinity(vc, vc->cpu_hard_affinity) ) continue; csched_balance_cpumask(vc, balance_step, csched_balance_mask); @@ -1546,17 +1515,17 @@ csched_load_balance(struct csched_private *prv, int cpu, SCHED_STAT_CRANK(load_balance_other); /* - * Let''s look around for work to steal, taking both vcpu-affinity - * and node-affinity into account. More specifically, we check all + * Let''s look around for work to steal, taking both hard affinity + * and soft affinity into account. More specifically, we check all * the non-idle CPUs'' runq, looking for: - * 1. any node-affine work to steal first, - * 2. if not finding anything, any vcpu-affine work to steal. + * 1. any "soft-affine work" to steal first, + * 2. if not finding anything, any "hard-affine work" to steal. */ for_each_csched_balance_step( bstep ) { /* * We peek at the non-idling CPUs in a node-wise fashion. In fact, - * it is more likely that we find some node-affine work on our same + * it is more likely that we find some affine work on our same * node, not to mention that migrating vcpus within the same node * could well expected to be cheaper than across-nodes (memory * stays local, there might be some node-wide cache[s], etc.). @@ -1982,8 +1951,6 @@ const struct scheduler sched_credit_def = { .adjust = csched_dom_cntl, .adjust_global = csched_sys_cntl, - .set_node_affinity = csched_set_node_affinity, - .pick_cpu = csched_cpu_pick, .do_schedule = csched_schedule, diff --git a/xen/common/schedule.c b/xen/common/schedule.c index c4236c5..c9ae521 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -198,6 +198,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor) else cpumask_setall(v->cpu_hard_affinity); + cpumask_setall(v->cpu_soft_affinity); + /* Initialise the per-vcpu timers. 
*/ init_timer(&v->periodic_timer, vcpu_periodic_timer_fn, v, v->processor); @@ -286,6 +288,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c) migrate_timer(&v->poll_timer, new_p); cpumask_setall(v->cpu_hard_affinity); + cpumask_setall(v->cpu_soft_affinity); lock = vcpu_schedule_lock_irq(v); v->processor = new_p; diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 40e5927..3575312 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -198,6 +198,9 @@ struct vcpu /* Used to restore affinity across S3. */ cpumask_var_t cpu_hard_affinity_saved; + /* Bitmask of CPUs on which this VCPU prefers to run. */ + cpumask_var_t cpu_soft_affinity; + /* Bitmask of CPUs which are holding onto this VCPU''s state. */ cpumask_var_t vcpu_dirty_cpumask;
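As an illustration only (not code from this series), the two-step soft/hard balancing described in the sched_credit.c comments above can be modelled with plain 64-bit masks. The type cpumask64_t and the helpers has_soft_affinity(), balance_mask() and pick_idler() are hypothetical simplifications, assuming at most 64 pcpus; the real scheduler works on cpumask_t while holding the runqueue lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplification: one bit per pcpu, at most 64 pcpus. */
typedef uint64_t cpumask64_t;
#define ALL_CPUS (~(cpumask64_t)0)

enum { BALANCE_SOFT_AFFINITY = 0, BALANCE_HARD_AFFINITY = 1 };

/* Like __vcpu_has_soft_affinity(): soft affinity is only useful if it is
 * not full and intersects the mask the vcpu is restricted to. */
static bool has_soft_affinity(cpumask64_t soft, cpumask64_t mask)
{
    return soft != ALL_CPUS && (soft & mask) != 0;
}

/* Like csched_balance_cpumask(): the soft step uses soft & hard, falling
 * back to hard if that is empty; the hard step uses hard affinity only. */
static cpumask64_t balance_mask(cpumask64_t soft, cpumask64_t hard, int step)
{
    if (step == BALANCE_SOFT_AFFINITY) {
        cpumask64_t m = soft & hard;
        return m != 0 ? m : hard;
    }
    return hard;
}

/* Hypothetical helper: lowest-numbered idle pcpu the vcpu may run on,
 * trying its soft affinity first and then its hard affinity; -1 if none. */
static int pick_idler(cpumask64_t soft, cpumask64_t hard, cpumask64_t idlers)
{
    for (int step = BALANCE_SOFT_AFFINITY; step <= BALANCE_HARD_AFFINITY; step++) {
        /* Skip the soft step when soft affinity cannot help. */
        if (step == BALANCE_SOFT_AFFINITY && !has_soft_affinity(soft, hard))
            continue;
        cpumask64_t candidates = balance_mask(soft, hard, step) & idlers;
        for (int cpu = 0; cpu < 64; cpu++)
            if (candidates & ((cpumask64_t)1 << cpu))
                return cpu;
        /* No suitable idler in this step: fall through to the next one. */
    }
    return -1;
}
```

For example, with soft = 0x3, hard = 0xF and idle pcpus 2 and 3 only, the soft step finds nothing and the hard step picks pcpu 2, which is the "let's not be too harsh" fallback described in __runq_tickle().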
Dario Faggioli
2013-Nov-18 18:17 UTC
[PATCH v3 08/14] xen: derive NUMA node affinity from hard and soft CPU affinity
If a domain's NUMA node-affinity (which is what controls memory allocations) is provided by the user/toolstack, it is left untouched. However, if the user does not say anything, leaving it all to Xen, let's compute it in the following way: 1. cpupool's cpus & hard-affinity & soft-affinity; 2. if (1) is empty: cpupool's cpus & hard-affinity. This guarantees that memory is allocated from the narrowest possible set of NUMA nodes, and makes it relatively easy to set up NUMA-aware scheduling on top of soft affinity. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- Changes from v2: * the loop computing the mask is now only executed when it really is useful, as suggested during review; * the loop, and all the cpumask handling, is optimized, in a way similar to what was suggested during review. --- xen/common/domain.c | 62 +++++++++++++++++++++++++++++++++------------------ 1 file changed, 40 insertions(+), 22 deletions(-) diff --git a/xen/common/domain.c b/xen/common/domain.c index d6ac4d1..721678a 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -353,17 +353,17 @@ struct domain *domain_create( void domain_update_node_affinity(struct domain *d) { - cpumask_var_t cpumask; - cpumask_var_t online_affinity; + cpumask_var_t dom_cpumask, dom_cpumask_soft; + cpumask_t *dom_affinity; const cpumask_t *online; struct vcpu *v; - unsigned int node; + unsigned int cpu; - if ( !zalloc_cpumask_var(&cpumask) ) + if ( !zalloc_cpumask_var(&dom_cpumask) ) return; - if ( !alloc_cpumask_var(&online_affinity) ) + if ( !zalloc_cpumask_var(&dom_cpumask_soft) ) { - free_cpumask_var(cpumask); + free_cpumask_var(dom_cpumask); return; } @@ -371,31 +371,49 @@ void domain_update_node_affinity(struct domain *d) spin_lock(&d->node_affinity_lock); - for_each_vcpu ( d, v ) - { - cpumask_and(online_affinity, v->cpu_hard_affinity, online); - cpumask_or(cpumask, cpumask, online_affinity); - } - /* - * If d->auto_node_affinity is true, the domain's node-affinity mask - *
(d->node_affinity) is automaically computed from all the domain's - * vcpus' vcpu-affinity masks (the union of which we have just built - * above in cpumask). OTOH, if d->auto_node_affinity is false, we - * must leave the node-affinity of the domain alone. + * If d->auto_node_affinity is true, let's compute the domain's + * node-affinity and update d->node_affinity accordingly. If false, + * just leave d->node_affinity alone. */ if ( d->auto_node_affinity ) { + /* + * We want the narrowest possible set of pcpus (to get the narrowest + * possible set of nodes). What we need is the cpumask of where the + * domain can run (the union of the hard affinity of all its vcpus), + * and the full mask of where it would prefer to run (the union of + * the soft affinity of all its vcpus). Let's build them. + */ + cpumask_clear(dom_cpumask); + cpumask_clear(dom_cpumask_soft); + for_each_vcpu ( d, v ) + { + cpumask_or(dom_cpumask, dom_cpumask, v->cpu_hard_affinity); + cpumask_or(dom_cpumask_soft, dom_cpumask_soft, + v->cpu_soft_affinity); + } + /* Filter out non-online cpus */ + cpumask_and(dom_cpumask, dom_cpumask, online); + /* And compute the intersection between hard, online and soft */ + cpumask_and(dom_cpumask_soft, dom_cpumask_soft, dom_cpumask); + + /* + * If not empty, the intersection of hard, soft and online is the + * narrowest set we want. If empty, we fall back to hard&online. + */ + dom_affinity = cpumask_empty(dom_cpumask_soft) ? + dom_cpumask : dom_cpumask_soft; + nodes_clear(d->node_affinity); - for_each_online_node ( node ) - if ( cpumask_intersects(&node_to_cpumask(node), cpumask) ) - node_set(node, d->node_affinity); + for_each_cpu( cpu, dom_affinity ) + node_set(cpu_to_node(cpu), d->node_affinity); } spin_unlock(&d->node_affinity_lock); - free_cpumask_var(online_affinity); - free_cpumask_var(cpumask); + free_cpumask_var(dom_cpumask_soft); + free_cpumask_var(dom_cpumask); }
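To make the node-affinity derivation above concrete, here is a hedged, self-contained model of the auto_node_affinity branch of domain_update_node_affinity(). It is not Xen code: struct vcpu_aff, derive_node_affinity() and the uniform topology (pcpu N on node N / cpus_per_node) are hypothetical simplifications standing in for struct vcpu, cpu_to_node() and cpumask_t.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpumask64_t;   /* hypothetical: one bit per pcpu (<= 64) */
typedef uint64_t nodemask64_t;  /* hypothetical: one bit per NUMA node    */

/* Hypothetical stand-in for the two per-vcpu masks in struct vcpu. */
struct vcpu_aff {
    cpumask64_t hard;
    cpumask64_t soft;
};

/* Sketch of the auto_node_affinity branch, assuming a uniform topology
 * where pcpu N lives on node N / cpus_per_node. */
static nodemask64_t derive_node_affinity(const struct vcpu_aff *v, int nr_vcpus,
                                         cpumask64_t online,
                                         unsigned int cpus_per_node)
{
    cpumask64_t dom_hard = 0, dom_soft = 0;

    for (int i = 0; i < nr_vcpus; i++) {
        dom_hard |= v[i].hard;   /* where the domain can run */
        dom_soft |= v[i].soft;   /* where the domain prefers to run */
    }
    dom_hard &= online;          /* filter out non-online pcpus */
    dom_soft &= dom_hard;        /* hard & online & soft */

    /* Narrowest non-empty set: hard&online&soft, else hard&online. */
    cpumask64_t dom_affinity = dom_soft != 0 ? dom_soft : dom_hard;

    nodemask64_t nodes = 0;
    for (unsigned int cpu = 0; cpu < 64; cpu++)
        if (dom_affinity & ((cpumask64_t)1 << cpu))
            nodes |= (nodemask64_t)1 << (cpu / cpus_per_node);
    return nodes;
}
```

With 4 pcpus per node, a single vcpu with hard affinity 0xFF and soft affinity 0x30 yields node 1 only, while a soft affinity disjoint from the hard one (say hard 0x0F, soft 0xF0) falls back to the hard mask and yields node 0.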
Dario Faggioli
2013-Nov-18 18:17 UTC
[PATCH v3 09/14] xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
by adding a flag for the caller to specify which one it cares about. Also add another cpumap there. This way, in case of DOMCTL_setvcpuaffinity, Xen can return to the caller the "effective affinity" of the vcpu. We call effective affinity the intersection between the cpupool's cpus, the (new?) hard affinity and the (new?) soft affinity. The purpose of this is to allow the toolstack to figure out whether or not the requested change produced sensible results, when combined with the other settings that are already in place. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- Changes from v2: * in DOMCTL_[sg]etvcpuaffinity, flag is really a flag now, i.e., we accept requests for setting and getting: (1) only hard affinity; (2) only soft affinity; (3) both; as suggested during review. --- tools/libxc/xc_domain.c | 4 ++- xen/arch/x86/traps.c | 4 ++- xen/common/domctl.c | 54 ++++++++++++++++++++++++++++++++++++++++--- xen/common/schedule.c | 35 +++++++++++++++++++--------- xen/common/wait.c | 6 ++--- xen/include/public/domctl.h | 15 ++++++++++-- xen/include/xen/sched.h | 3 ++ 7 files changed, 97 insertions(+), 24 deletions(-) diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index 1ccafc5..f9ae4bf 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -215,7 +215,9 @@ int xc_vcpu_setaffinity(xc_interface *xch, domctl.cmd = XEN_DOMCTL_setvcpuaffinity; domctl.domain = (domid_t)domid; - domctl.u.vcpuaffinity.vcpu = vcpu; + domctl.u.vcpuaffinity.vcpu = vcpu; + /* Soft affinity is there, but not used anywhere for now, so... */ + domctl.u.vcpuaffinity.flags = XEN_VCPUAFFINITY_HARD; memcpy(local, cpumap, cpusize); diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 4279cad..196ff68 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -3093,7 +3093,7 @@ static void nmi_mce_softirq(void) * Make sure to wakeup the vcpu on the * specified processor.
 */ - vcpu_set_affinity(st->vcpu, cpumask_of(st->processor)); + vcpu_set_hard_affinity(st->vcpu, cpumask_of(st->processor)); /* Affinity is restored in the iret hypercall. */ } @@ -3122,7 +3122,7 @@ void async_exception_cleanup(struct vcpu *curr) if ( !cpumask_empty(curr->cpu_hard_affinity_tmp) && !cpumask_equal(curr->cpu_hard_affinity_tmp, curr->cpu_hard_affinity) ) { - vcpu_set_affinity(curr, curr->cpu_hard_affinity_tmp); + vcpu_set_hard_affinity(curr, curr->cpu_hard_affinity_tmp); cpumask_clear(curr->cpu_hard_affinity_tmp); } diff --git a/xen/common/domctl.c b/xen/common/domctl.c index 5e0ac5c..84be0d6 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -617,19 +617,65 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) if ( op->cmd == XEN_DOMCTL_setvcpuaffinity ) { cpumask_var_t new_affinity; + cpumask_t *online; ret = xenctl_bitmap_to_cpumask( &new_affinity, &op->u.vcpuaffinity.cpumap); - if ( !ret ) + if ( ret ) + break; + + ret = -EINVAL; + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_HARD ) + ret = vcpu_set_hard_affinity(v, new_affinity); + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT ) + ret = vcpu_set_soft_affinity(v, new_affinity); + + if ( ret ) + goto setvcpuaffinity_out; + + /* + * Report back to the caller the "effective affinity", that is, + * the intersection of the cpupool's pcpus, the (new?) hard + * affinity and the (new?) soft affinity.
+ */ + if ( !guest_handle_is_null(op->u.vcpuaffinity.eff_cpumap.bitmap) ) { - ret = vcpu_set_affinity(v, new_affinity); - free_cpumask_var(new_affinity); + online = cpupool_online_cpumask(v->domain->cpupool); + cpumask_and(new_affinity, online, v->cpu_hard_affinity); + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT) + cpumask_and(new_affinity, new_affinity, + v->cpu_soft_affinity); + + ret = cpumask_to_xenctl_bitmap( + &op->u.vcpuaffinity.eff_cpumap, new_affinity); } + + setvcpuaffinity_out: + free_cpumask_var(new_affinity); } else { + cpumask_var_t affinity; + + /* + * If the caller asks for both _HARD and _SOFT, what we return + * is the intersection of hard and soft affinity for the vcpu. + */ + if ( !alloc_cpumask_var(&affinity) ) { + ret = -EFAULT; + break; + } + cpumask_setall(affinity); + + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_HARD ) + cpumask_copy(affinity, v->cpu_hard_affinity); + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT ) + cpumask_and(affinity, affinity, v->cpu_soft_affinity); + ret = cpumask_to_xenctl_bitmap( - &op->u.vcpuaffinity.cpumap, v->cpu_hard_affinity); + &op->u.vcpuaffinity.cpumap, affinity); + + free_cpumask_var(affinity); } } break; diff --git a/xen/common/schedule.c b/xen/common/schedule.c index c9ae521..6c53287 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -654,22 +654,14 @@ void sched_set_node_affinity(struct domain *d, nodemask_t *mask) SCHED_OP(DOM2OP(d), set_node_affinity, d, mask); } -int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity) +static int vcpu_set_affinity( + struct vcpu *v, const cpumask_t *affinity, cpumask_t **which) { - cpumask_t online_affinity; - cpumask_t *online; spinlock_t *lock; - if ( v->domain->is_pinned ) - return -EINVAL; - online = VCPU2ONLINE(v); - cpumask_and(&online_affinity, affinity, online); - if ( cpumask_empty(&online_affinity) ) - return -EINVAL; - lock = vcpu_schedule_lock_irq(v); - cpumask_copy(v->cpu_hard_affinity, affinity); + 
cpumask_copy(*which, affinity); /* Always ask the scheduler to re-evaluate placement * when changing the affinity */ @@ -688,6 +680,27 @@ int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity) return 0; } +int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity) +{ + cpumask_t online_affinity; + cpumask_t *online; + + if ( v->domain->is_pinned ) + return -EINVAL; + + online = VCPU2ONLINE(v); + cpumask_and(&online_affinity, affinity, online); + if ( cpumask_empty(&online_affinity) ) + return -EINVAL; + + return vcpu_set_affinity(v, affinity, &v->cpu_hard_affinity); +} + +int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity) +{ + return vcpu_set_affinity(v, affinity, &v->cpu_soft_affinity); +} + /* Block the currently-executing domain until a pertinent event occurs. */ void vcpu_block(void) { diff --git a/xen/common/wait.c b/xen/common/wait.c index 3f6ff41..1f6b597 100644 --- a/xen/common/wait.c +++ b/xen/common/wait.c @@ -135,7 +135,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv) /* Save current VCPU affinity; force wakeup on *this* CPU only. */ wqv->wakeup_cpu = smp_processor_id(); cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); - if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) + if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) { gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); domain_crash_synchronous(); @@ -166,7 +166,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv) static void __finish_wait(struct waitqueue_vcpu *wqv) { wqv->esp = NULL; - (void)vcpu_set_affinity(current, &wqv->saved_affinity); + (void)vcpu_set_hard_affinity(current, &wqv->saved_affinity); } void check_wakeup_from_wait(void) @@ -184,7 +184,7 @@ void check_wakeup_from_wait(void) /* Re-set VCPU affinity and re-enter the scheduler. 
*/ struct vcpu *curr = current; cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); - if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) + if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) { gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); domain_crash_synchronous(); diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h index 01a3652..4f71450 100644 --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -300,8 +300,19 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_nodeaffinity_t); /* XEN_DOMCTL_setvcpuaffinity */ /* XEN_DOMCTL_getvcpuaffinity */ struct xen_domctl_vcpuaffinity { - uint32_t vcpu; /* IN */ - struct xenctl_bitmap cpumap; /* IN/OUT */ + /* IN variables. */ + uint32_t vcpu; + /* Set/get the hard affinity for vcpu */ +#define _XEN_VCPUAFFINITY_HARD 0 +#define XEN_VCPUAFFINITY_HARD (1U<<_XEN_VCPUAFFINITY_HARD) + /* Set/get the soft affinity for vcpu */ +#define _XEN_VCPUAFFINITY_SOFT 1 +#define XEN_VCPUAFFINITY_SOFT (1U<<_XEN_VCPUAFFINITY_SOFT) + uint32_t flags; + /* IN/OUT variables. */ + struct xenctl_bitmap cpumap; + /* OUT variables. */ + struct xenctl_bitmap eff_cpumap; }; typedef struct xen_domctl_vcpuaffinity xen_domctl_vcpuaffinity_t; DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpuaffinity_t); diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 3575312..0f728b3 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -755,7 +755,8 @@ void scheduler_free(struct scheduler *sched); int schedule_cpu_switch(unsigned int cpu, struct cpupool *c); void vcpu_force_reschedule(struct vcpu *v); int cpu_disable_scheduler(unsigned int cpu); -int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity); +int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity); +int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity); void restore_vcpu_affinity(struct domain *d); void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate);
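The flag semantics of DOMCTL_[sg]etvcpuaffinity in this patch can be summarised with a small model. This is an illustrative sketch, not hypervisor code: the two XEN_VCPUAFFINITY_* values are copied from the public/domctl.h hunk above, while cpumask64_t, get_affinity() and effective_affinity() are hypothetical names, and the hard/soft arguments stand for the vcpu's masks *after* any set operation has been applied.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpumask64_t;  /* hypothetical: one bit per pcpu */

/* Flag values as defined in public/domctl.h by this patch. */
#define XEN_VCPUAFFINITY_HARD (1U << 0)
#define XEN_VCPUAFFINITY_SOFT (1U << 1)

/* Models the DOMCTL_getvcpuaffinity branch: hard only, soft only, or,
 * if both flags are given, the intersection of the two. */
static cpumask64_t get_affinity(cpumask64_t hard, cpumask64_t soft,
                                uint32_t flags)
{
    cpumask64_t aff = ~(cpumask64_t)0;   /* like cpumask_setall() */
    if (flags & XEN_VCPUAFFINITY_HARD)
        aff = hard;
    if (flags & XEN_VCPUAFFINITY_SOFT)
        aff &= soft;
    return aff;
}

/* Models the "effective affinity" returned by DOMCTL_setvcpuaffinity:
 * cpupool pcpus & hard affinity, further restricted by the soft affinity
 * only when the caller asked to set the soft one. */
static cpumask64_t effective_affinity(cpumask64_t pool, cpumask64_t hard,
                                      cpumask64_t soft, uint32_t flags)
{
    cpumask64_t eff = pool & hard;
    if (flags & XEN_VCPUAFFINITY_SOFT)
        eff &= soft;
    return eff;
}
```

Note that, as in the patch, the soft affinity only contributes to the effective affinity when XEN_VCPUAFFINITY_SOFT is part of the request.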
Dario Faggioli
2013-Nov-18 18:18 UTC
[PATCH v3 10/14] libxc: get and set soft and hard affinity
by using the new flag introduced in the parameters of the DOMCTL_{get,set}vcpuaffinity hypercall. This is done by adding a new parameter (flags) to xc_vcpu_setaffinity() and xc_vcpu_getaffinity(), so that the caller can decide to set either the soft or the hard affinity, or both. When setting both hard and soft, they are set to the same cpumap. xc_vcpu_setaffinity() also takes another new parameter, for reporting back to the caller the affinity the scheduler will actually use after a successful call. When asking to get both hard and soft, what the caller gets is their intersection. In-tree callers are also fixed to cope with the new interface. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- Changes from v2: * better cleanup logic in _vcpu_setaffinity() (regarding xc_hypercall_buffer_{alloc,free}()), as suggested during review; * make it more evident that DOMCTL_setvcpuaffinity has an out parameter, by calling it ecpumap_out, and improving the comment wrt that; * change the interface of xc_vcpu_[sg]etaffinity() so that they take the new parameters (flags and ecpumap_out), and fix the in-tree callers.
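The point of ecpumap_out is to let the caller judge whether the request was honoured. A minimal sketch of such a check follows, assuming a hypothetical check_effective() helper and aff_check enum that are not part of libxc or libxl. Given the hypervisor code in the previous patch, the effective affinity is always an intersection that includes the requested cpumap, so it can only be equal to it, a strict subset of it, or empty.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpumask64_t;  /* hypothetical: one bit per pcpu */

/* Hypothetical classification of the outcome of a set operation. */
enum aff_check {
    AFF_OK,          /* effective affinity == requested cpumap      */
    AFF_PARTIAL,     /* only part of the requested cpumap is usable */
    AFF_INEFFECTIVE, /* none of the requested cpumap is usable      */
};

/* Compare the cpumap passed to xc_vcpu_setaffinity() with what came back
 * in ecpumap_out (both flattened into 64-bit masks for this sketch). */
static enum aff_check check_effective(cpumask64_t requested,
                                      cpumask64_t effective)
{
    if (effective == 0)
        return AFF_INEFFECTIVE;
    if (effective != requested)
        return AFF_PARTIAL;
    return AFF_OK;
}
```

A toolstack could, for instance, map AFF_PARTIAL to a debug message and AFF_INEFFECTIVE to a warning; which severity to use is a policy choice, not something this sketch prescribes.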
--- tools/libxc/xc_domain.c | 47 +++++++++++++++++++++-------------- tools/libxc/xenctrl.h | 44 ++++++++++++++++++++++++++++++++- tools/libxl/libxl.c | 7 ++++- tools/ocaml/libs/xc/xenctrl_stubs.c | 8 ++++-- tools/python/xen/lowlevel/xc/xc.c | 6 +++- 5 files changed, 86 insertions(+), 26 deletions(-) diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index f9ae4bf..bddf4e0 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -192,44 +192,52 @@ int xc_domain_node_getaffinity(xc_interface *xch, int xc_vcpu_setaffinity(xc_interface *xch, uint32_t domid, int vcpu, - xc_cpumap_t cpumap) + xc_cpumap_t cpumap, + uint32_t flags, + xc_cpumap_t ecpumap_out) { DECLARE_DOMCTL; - DECLARE_HYPERCALL_BUFFER(uint8_t, local); + DECLARE_HYPERCALL_BUFFER(uint8_t, cpumap_local); + DECLARE_HYPERCALL_BUFFER(uint8_t, ecpumap_local); int ret = -1; int cpusize; cpusize = xc_get_cpumap_size(xch); - if (!cpusize) + if ( !cpusize ) { PERROR("Could not get number of cpus"); - goto out; + return -1; } - local = xc_hypercall_buffer_alloc(xch, local, cpusize); - if ( local == NULL ) + cpumap_local = xc_hypercall_buffer_alloc(xch, cpumap_local, cpusize); + ecpumap_local = xc_hypercall_buffer_alloc(xch, ecpumap_local, cpusize); + if ( cpumap_local == NULL || ecpumap_local == NULL ) { - PERROR("Could not allocate memory for setvcpuaffinity domctl hypercall"); + PERROR("Could not allocate hcall buffers for DOMCTL_setvcpuaffinity"); goto out; } domctl.cmd = XEN_DOMCTL_setvcpuaffinity; domctl.domain = (domid_t)domid; domctl.u.vcpuaffinity.vcpu = vcpu; - /* Soft affinity is there, but not used anywhere for now, so...
*/ - domctl.u.vcpuaffinity.flags = XEN_VCPUAFFINITY_HARD; - - memcpy(local, cpumap, cpusize); - - set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, local); + domctl.u.vcpuaffinity.flags = flags; + memcpy(cpumap_local, cpumap, cpusize); + set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, cpumap_local); domctl.u.vcpuaffinity.cpumap.nr_bits = cpusize * 8; + set_xen_guest_handle(domctl.u.vcpuaffinity.eff_cpumap.bitmap, + ecpumap_local); + domctl.u.vcpuaffinity.eff_cpumap.nr_bits = cpusize * 8; + ret = do_domctl(xch, &domctl); - xc_hypercall_buffer_free(xch, local); + if ( ecpumap_out != NULL ) + memcpy(ecpumap_out, ecpumap_local, cpusize); out: + xc_hypercall_buffer_free(xch, cpumap_local); + xc_hypercall_buffer_free(xch, ecpumap_local); return ret; } @@ -237,6 +245,7 @@ int xc_vcpu_setaffinity(xc_interface *xch, int xc_vcpu_getaffinity(xc_interface *xch, uint32_t domid, int vcpu, + uint32_t flags, xc_cpumap_t cpumap) { DECLARE_DOMCTL; @@ -245,22 +254,23 @@ int xc_vcpu_getaffinity(xc_interface *xch, int cpusize; cpusize = xc_get_cpumap_size(xch); - if (!cpusize) + if ( !cpusize ) { PERROR("Could not get number of cpus"); - goto out; + return -1; } local = xc_hypercall_buffer_alloc(xch, local, cpusize); - if (local == NULL) + if ( local == NULL ) { PERROR("Could not allocate memory for getvcpuaffinity domctl hypercall"); - goto out; + return -1; } domctl.cmd = XEN_DOMCTL_getvcpuaffinity; domctl.domain = (domid_t)domid; domctl.u.vcpuaffinity.vcpu = vcpu; + domctl.u.vcpuaffinity.flags = flags; set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, local); domctl.u.vcpuaffinity.cpumap.nr_bits = cpusize * 8; @@ -270,7 +280,6 @@ int xc_vcpu_getaffinity(xc_interface *xch, memcpy(cpumap, local, cpusize); xc_hypercall_buffer_free(xch, local); -out: return ret; } diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h index 4ac6b8a..a97ed67 100644 --- a/tools/libxc/xenctrl.h +++ b/tools/libxc/xenctrl.h @@ -579,13 +579,55 @@ int 
xc_domain_node_getaffinity(xc_interface *xch, uint32_t domind, xc_nodemap_t nodemap); +/** + * This function specifies the CPU affinity for a vcpu. + * + * There are two kinds of affinity. Soft affinity is the set of pcpus on + * which a vcpu prefers to run. Hard affinity is the set of pcpus on which + * a vcpu is allowed to run. If flags contains *only* XEN_VCPUAFFINITY_SOFT, + * it is the soft affinity that is set. If flags contains *only* + * XEN_VCPUAFFINITY_HARD, it is the hard affinity that is set. If flags + * contains *both*, both are set to the same value, provided in cpumap. + * + * The function also returns the effective affinity, via the ecpumap_out + * parameter. Effective affinity is the intersection of the hard affinity, + * the soft affinity (only if XEN_VCPUAFFINITY_SOFT is in flags) and the set + * of cpus of the cpupool the domain belongs to. It is basically what the + * Xen scheduler will actually use. Reporting it back allows the caller to + * check whether it matches, or is at least good enough for, its purposes. + * + * @param xch a handle to an open hypervisor interface. + * @param domid the id of the domain to which the vcpu belongs + * @param vcpu the vcpu id within the domain + * @param cpumap the (hard, soft, both) new affinity map one wants to set + * @param flags what we want to set + * @param ecpumap_out where the effective affinity for the vcpu is returned + */ int xc_vcpu_setaffinity(xc_interface *xch, uint32_t domid, int vcpu, - xc_cpumap_t cpumap); + xc_cpumap_t cpumap, + uint32_t flags, + xc_cpumap_t ecpumap_out); + +/** + * This function retrieves hard or soft CPU affinity (or their intersection) + * for a vcpu, depending on flags. + * + * Soft affinity is returned if *only* XEN_VCPUAFFINITY_SOFT is set in flags. + * Hard affinity is returned if *only* XEN_VCPUAFFINITY_HARD is set in flags. + * If both are set, what is returned is the intersection of the two. + * + * @param xch a handle to an open hypervisor interface.
+ * @param domid the id of the domain to which the vcpu belongs + * @param vcpu the vcpu id within the domain + * @param flags what we want to get + * @param cpumap is where the desired affinity is returned + */ int xc_vcpu_getaffinity(xc_interface *xch, uint32_t domid, int vcpu, + uint32_t flags, xc_cpumap_t cpumap); diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index d3ab65e..d0db3f0 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -4208,7 +4208,9 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu info"); return NULL; } - if (xc_vcpu_getaffinity(ctx->xch, domid, *nb_vcpu, ptr->cpumap.map) == -1) { + if (xc_vcpu_getaffinity(ctx->xch, domid, *nb_vcpu, + XEN_VCPUAFFINITY_HARD, + ptr->cpumap.map) == -1) { LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity"); return NULL; } @@ -4225,7 +4227,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, libxl_bitmap *cpumap) { - if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map)) { + if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map, + XEN_VCPUAFFINITY_HARD, NULL)) { LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting vcpu affinity"); return ERROR_FAIL; } diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c index f5cf0ed..30327d4 100644 --- a/tools/ocaml/libs/xc/xenctrl_stubs.c +++ b/tools/ocaml/libs/xc/xenctrl_stubs.c @@ -438,7 +438,9 @@ CAMLprim value stub_xc_vcpu_setaffinity(value xch, value domid, c_cpumap[i/8] |= 1 << (i&7); } retval = xc_vcpu_setaffinity(_H(xch), _D(domid), - Int_val(vcpu), c_cpumap); + Int_val(vcpu), c_cpumap, + XEN_VCPUAFFINITY_HARD, + NULL); free(c_cpumap); if (retval < 0) @@ -460,7 +462,9 @@ CAMLprim value stub_xc_vcpu_getaffinity(value xch, value domid, failwith_xc(_H(xch)); retval = xc_vcpu_getaffinity(_H(xch), _D(domid), - Int_val(vcpu), c_cpumap); +
Int_val(vcpu), + XEN_VCPUAFFINITY_HARD, + c_cpumap); if (retval < 0) { free(c_cpumap); failwith_xc(_H(xch)); diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c index 2625fc4..9348ce6 100644 --- a/tools/python/xen/lowlevel/xc/xc.c +++ b/tools/python/xen/lowlevel/xc/xc.c @@ -256,7 +256,8 @@ static PyObject *pyxc_vcpu_setaffinity(XcObject *self, } } - if ( xc_vcpu_setaffinity(self->xc_handle, dom, vcpu, cpumap) != 0 ) + if ( xc_vcpu_setaffinity(self->xc_handle, dom, vcpu, cpumap, + XEN_VCPUAFFINITY_HARD, NULL) != 0 ) { free(cpumap); return pyxc_error_to_exception(self->xc_handle); @@ -403,7 +404,8 @@ static PyObject *pyxc_vcpu_getinfo(XcObject *self, if(cpumap == NULL) return pyxc_error_to_exception(self->xc_handle); - rc = xc_vcpu_getaffinity(self->xc_handle, dom, vcpu, cpumap); + rc = xc_vcpu_getaffinity(self->xc_handle, dom, vcpu, + XEN_VCPUAFFINITY_HARD, cpumap); if ( rc < 0 ) { free(cpumap);
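To make the "effective affinity" rule in the xc_vcpu_setaffinity() comment above concrete, here is a minimal sketch, not libxc or Xen code, that computes it over plain byte-array cpumaps laid out the way the OCaml stub builds them (c_cpumap[i/8] |= 1 << (i&7), i.e. pcpu 0 is the least significant bit of byte 0). The helper name effective_affinity() and the explicit cpupool map passed in by the caller are assumptions for illustration only; in the series the intersection is computed by the hypervisor and handed back through ecpumap_out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: eff = hard & soft & pool, byte by byte, over
 * cpumaps of nr_bytes bytes (one bit per pcpu). Returns how many pcpus
 * end up in the effective map.
 */
static int effective_affinity(const uint8_t *hard, const uint8_t *soft,
                              const uint8_t *pool, uint8_t *eff,
                              size_t nr_bytes)
{
    int count = 0;

    assert(nr_bytes > 0);
    for (size_t i = 0; i < nr_bytes; i++) {
        eff[i] = hard[i] & soft[i] & pool[i];
        /* Count the set bits of this byte by clearing the lowest one. */
        for (uint8_t b = eff[i]; b; b &= (uint8_t)(b - 1))
            count++;
    }
    return count;
}
```

A result of zero is the situation the libxl side of the series reports with a warning, since only hard affinity is then considered for scheduling.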
Make space for two new cpumap-s, one in vcpu_info (for getting soft affinity) and one in build_info (for setting it). Provide two new API calls: * libxl_set_vcpuaffinity2, taking a cpumap and setting hard, soft, or both affinities to it, depending on 'flags'; * libxl_set_vcpuaffinity3, taking two cpumaps, one for hard and one for soft affinity. The behavior of the existing libxl_set_vcpuaffinity is left unchanged, i.e., it only sets hard affinity. Getting soft affinity happens indirectly, via `xl vcpu-list' (as is already the case for hard affinity). The new calls include logic to check whether the affinity which will be used by Xen to schedule the vCPU(s) actually matches the cpumap provided. In fact, we want to allow every possible combination of hard and soft affinities to be set, but we warn the user upon particularly weird combinations (e.g., hard and soft being disjoint sets of pCPUs). Also, this is the first change breaking the libxl ABI, so it bumps the MAJOR. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- Changes from v2: * interface completely redesigned, as discussed during review. --- tools/libxl/Makefile | 2 - tools/libxl/libxl.c | 131 +++++++++++++++++++++++++++++++++++++++++++ tools/libxl/libxl.h | 30 ++++++++++ tools/libxl/libxl_create.c | 6 ++ tools/libxl/libxl_types.idl | 4 + tools/libxl/libxl_utils.h | 15 +++++ 6 files changed, 186 insertions(+), 2 deletions(-) diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile index cf214bb..cba32d5 100644 --- a/tools/libxl/Makefile +++ b/tools/libxl/Makefile @@ -5,7 +5,7 @@ XEN_ROOT = $(CURDIR)/../..
include $(XEN_ROOT)/tools/Rules.mk -MAJOR = 4.3 +MAJOR = 4.4 MINOR = 0 XLUMAJOR = 4.3 diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index d0db3f0..1122360 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -4204,6 +4204,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, for (*nb_vcpu = 0; *nb_vcpu <= domaininfo.max_vcpu_id; ++*nb_vcpu, ++ptr) { if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0)) return NULL; + if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap_soft, 0)) + return NULL; if (xc_vcpu_getinfo(ctx->xch, domid, *nb_vcpu, &vcpuinfo) == -1) { LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu info"); return NULL; @@ -4214,6 +4216,12 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity"); return NULL; } + if (xc_vcpu_getaffinity(ctx->xch, domid, *nb_vcpu, + XEN_VCPUAFFINITY_SOFT, + ptr->cpumap_soft.map) == -1) { + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity"); + return NULL; + } ptr->vcpuid = *nb_vcpu; ptr->cpu = vcpuinfo.cpu; ptr->online = !!vcpuinfo.online; @@ -4250,6 +4258,129 @@ int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, return rc; } +int libxl_set_vcpuaffinity2(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, + const libxl_bitmap *cpumap, int flags) +{ + libxl_cputopology *topology; + libxl_bitmap ecpumap; + int nr_cpus = 0, rc; + + topology = libxl_get_cpu_topology(ctx, &nr_cpus); + if (!topology) { + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "failed to retrieve CPU topology"); + return ERROR_FAIL; + } + libxl_cputopology_list_free(topology, nr_cpus); + + rc = libxl_cpu_bitmap_alloc(ctx, &ecpumap, 0); + if (rc) + return rc; + + if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map, + flags, ecpumap.map)) { + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting vcpu affinity"); + rc = ERROR_FAIL; + goto out; + } + + if (!libxl_bitmap_equal(cpumap, &ecpumap, nr_cpus)) + LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, + "New 
affinity for vcpu %d contains unreachable cpus", + vcpuid); + if (libxl_bitmap_is_empty(&ecpumap)) + LIBXL__LOG(ctx, LIBXL__LOG_WARNING, + "New affinity for vcpu %d has only unreachable cpus. " + "Only hard affinity will be considered for scheduling", + vcpuid); + + rc = 0; + out: + libxl_bitmap_dispose(&ecpumap); + return rc; +} + +int libxl_set_vcpuaffinity_all2(libxl_ctx *ctx, uint32_t domid, + unsigned int max_vcpus, + const libxl_bitmap *cpumap, int flags) +{ + int i, rc = 0; + + for (i = 0; i < max_vcpus; i++) { + if (libxl_set_vcpuaffinity2(ctx, domid, i, cpumap, flags)) { + LIBXL__LOG(ctx, LIBXL__LOG_WARNING, + "failed to set affinity for %d", i); + rc = ERROR_FAIL; + } + } + return rc; +} + +int libxl_set_vcpuaffinity3(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, + const libxl_bitmap *cpumap_hard, + const libxl_bitmap *cpumap_soft) +{ + libxl_cputopology *topology; + libxl_bitmap ecpumap; + int nr_cpus = 0, rc; + + topology = libxl_get_cpu_topology(ctx, &nr_cpus); + if (!topology) { + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "failed to retrieve CPU topology"); + return ERROR_FAIL; + } + libxl_cputopology_list_free(topology, nr_cpus); + + rc = libxl_cpu_bitmap_alloc(ctx, &ecpumap, 0); + if (rc) + return rc; + + if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap_hard->map, + XEN_VCPUAFFINITY_HARD, NULL)) { + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting vcpu hard affinity"); + rc = ERROR_FAIL; + goto out; + } + + if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap_soft->map, + XEN_VCPUAFFINITY_SOFT, ecpumap.map)) { + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting vcpu soft affinity"); + rc = ERROR_FAIL; + goto out; + } + + if (!libxl_bitmap_equal(cpumap_soft, &ecpumap, nr_cpus)) + LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, + "New soft affinity for vcpu %d contains unreachable cpus", + vcpuid); + if (libxl_bitmap_is_empty(&ecpumap)) + LIBXL__LOG(ctx, LIBXL__LOG_WARNING, + "New soft affinity for vcpu %d has only unreachable cpus. 
" + "Only hard affinity will be considered for scheduling", + vcpuid); + + rc = 0; + out: + libxl_bitmap_dispose(&ecpumap); + return rc; +} + +int libxl_set_vcpuaffinity_all3(libxl_ctx *ctx, uint32_t domid, + unsigned int max_vcpus, + const libxl_bitmap *cpumap_hard, + const libxl_bitmap *cpumap_soft) +{ + int i, rc = 0; + + for (i = 0; i < max_vcpus; i++) { + if (libxl_set_vcpuaffinity3(ctx, domid, i, cpumap_hard, cpumap_soft)) { + LIBXL__LOG(ctx, LIBXL__LOG_WARNING, + "failed to set affinity for %d", i); + rc = ERROR_FAIL; + } + } + return rc; +} + int libxl_domain_set_nodeaffinity(libxl_ctx *ctx, uint32_t domid, libxl_bitmap *nodemap) { diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index c7dceda..504c57b 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -82,6 +82,20 @@ #define LIBXL_HAVE_DOMAIN_NODEAFFINITY 1 /* + * LIBXL_HAVE_VCPUINFO_SOFTAFFINITY indicates that a 'cpumap_soft' + * field (of libxl_bitmap type) is present in libxl_vcpuinfo, + * containing the soft affinity for the vcpu. + */ +#define LIBXL_HAVE_VCPUINFO_SOFTAFFINITY 1 + +/* + * LIBXL_HAVE_BUILDINFO_SOFTAFFINITY indicates that a 'cpumap_soft' + * field (of libxl_bitmap type) is present in libxl_domain_build_info, + * containing the soft affinity for the vcpus. + */ +#define LIBXL_HAVE_BUILDINFO_SOFTAFFINITY 1 + +/* * LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE indicates that the * libxl_vendor_device field is present in the hvm sections of * libxl_domain_build_info.
This field tells libxl which @@ -973,6 +987,22 @@ int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, libxl_bitmap *cpumap); int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, unsigned int max_vcpus, libxl_bitmap *cpumap); +int libxl_set_vcpuaffinity2(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, + const libxl_bitmap *cpumap, int flags); +int libxl_set_vcpuaffinity_all2(libxl_ctx *ctx, uint32_t domid, + unsigned int max_vcpus, + const libxl_bitmap *cpumap, int flags); +int libxl_set_vcpuaffinity3(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, + const libxl_bitmap *cpumap_hard, + const libxl_bitmap *cpumap_soft); +int libxl_set_vcpuaffinity_all3(libxl_ctx *ctx, uint32_t domid, + unsigned int max_vcpus, + const libxl_bitmap *cpumap_hard, + const libxl_bitmap *cpumap_soft); +/* Flags, consistent with domctl.h */ +#define LIBXL_VCPUAFFINITY_HARD 1 +#define LIBXL_VCPUAFFINITY_SOFT 2 + int libxl_domain_set_nodeaffinity(libxl_ctx *ctx, uint32_t domid, libxl_bitmap *nodemap); int libxl_domain_get_nodeaffinity(libxl_ctx *ctx, uint32_t domid, diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index 5e9cdcc..c314bec 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -192,6 +192,12 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, libxl_bitmap_set_any(&b_info->cpumap); } + if (!b_info->cpumap_soft.size) { + if (libxl_cpu_bitmap_alloc(CTX, &b_info->cpumap_soft, 0)) + return ERROR_FAIL; + libxl_bitmap_set_any(&b_info->cpumap_soft); + } + libxl_defbool_setdefault(&b_info->numa_placement, true); if (!b_info->nodemap.size) { diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index de5bac3..4001761 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -297,6 +297,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ ("max_vcpus", integer), ("avail_vcpus", libxl_bitmap), ("cpumap", libxl_bitmap), + ("cpumap_soft", libxl_bitmap), ("nodemap", 
libxl_bitmap), ("numa_placement", libxl_defbool), ("tsc_mode", libxl_tsc_mode), @@ -509,7 +510,8 @@ libxl_vcpuinfo = Struct("vcpuinfo", [ ("blocked", bool), ("running", bool), ("vcpu_time", uint64), # total vcpu time ran (ns) - ("cpumap", libxl_bitmap), # current cpu's affinities + ("cpumap", libxl_bitmap), # current hard cpu affinity + ("cpumap_soft", libxl_bitmap), # current soft cpu affinity ], dir=DIR_OUT) libxl_physinfo = Struct("physinfo", [ diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h index b11cf28..fc3afee 100644 --- a/tools/libxl/libxl_utils.h +++ b/tools/libxl/libxl_utils.h @@ -98,6 +98,21 @@ static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit) #define libxl_for_each_set_bit(v, m) for (v = 0; v < (m).size * 8; v++) \ if (libxl_bitmap_test(&(m), v)) +static inline int libxl_bitmap_equal(const libxl_bitmap *ba, + const libxl_bitmap *bb, + int nr_bits) +{ + int i; + + /* Only check nr_bits (all bits if <= 0) */ + nr_bits = nr_bits <= 0 ? ba->size * 8 : nr_bits; + for (i = 0; i < nr_bits; i++) { + if (libxl_bitmap_test(ba, i) != libxl_bitmap_test(bb, i)) + return 0; + } + return 1; +} + int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap, int max_cpus);
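The post-hypercall checks in libxl_set_vcpuaffinity2() and libxl_set_vcpuaffinity3() boil down to comparing the requested map with the effective map returned by Xen. The sketch below is hypothetical (check_affinity() and its enum are not part of libxl) and works on raw byte arrays rather than libxl_bitmap; it compares bit by bit over nr_bits, like libxl_bitmap_equal(), and treats an empty effective map as the severe case that the patch logs at WARNING level. Where libxl would emit both the DEBUG and the WARNING message, the sketch only reports the more severe outcome.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome of comparing requested vs. effective affinity. */
enum affinity_check {
    AFFINITY_OK,          /* effective map equals the requested one */
    AFFINITY_PARTIAL,     /* some requested pcpus are unreachable (DEBUG) */
    AFFINITY_UNREACHABLE, /* no requested pcpu is usable (WARNING) */
};

static int test_bit(const uint8_t *map, int bit)
{
    return (map[bit / 8] >> (bit % 8)) & 1;
}

static enum affinity_check check_affinity(const uint8_t *requested,
                                          const uint8_t *effective,
                                          int nr_bits)
{
    int equal = 1, empty = 1;

    assert(nr_bits > 0);
    for (int i = 0; i < nr_bits; i++) {
        if (test_bit(requested, i) != test_bit(effective, i))
            equal = 0;
        if (test_bit(effective, i))
            empty = 0;
    }
    if (empty)
        return AFFINITY_UNREACHABLE;
    return equal ? AFFINITY_OK : AFFINITY_PARTIAL;
}
```

Keeping the two conditions separate mirrors the design choice in the series: every hard/soft combination is accepted, and the caller is only informed when the result is surprising.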
Getting happens via `xl vcpu-list', which now looks like this:

    # xl vcpu-list -s
    Name        ID  VCPU   CPU  State  Time(s)  Hard Affinity / Soft Affinity
    Domain-0     0     0    11   -b-       5.4  8-15 / all
    Domain-0     0     1    11   -b-       1.0  8-15 / all
    Domain-0     0    14    13   -b-       1.4  8-15 / all
    Domain-0     0    15     8   -b-       1.6  8-15 / all
    vm-test      3     0     4   -b-       2.5  0-12 / 0-7
    vm-test      3     1     0   -b-       3.2  0-12 / 0-7

Setting happens by adding a '-s'/'--soft' switch to `xl vcpu-pin'. xl manual page is updated accordingly. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- Changes from v2: * this patch folds what in v2 were patches 13 and 14; * `xl vcpu-pin' always shows both hard and soft affinity, without the need of passing '-s'. --- docs/man/xl.pod.1 | 24 +++++++++++++++++++- tools/libxl/xl_cmdimpl.c | 54 +++++++++++++++++++++++---------------------- tools/libxl/xl_cmdtable.c | 3 ++- 3 files changed, 53 insertions(+), 28 deletions(-) diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1 index e7b9de2..481fbdf 100644 --- a/docs/man/xl.pod.1 +++ b/docs/man/xl.pod.1 @@ -619,7 +619,7 @@ after B<vcpu-set>, go to B<SEE ALSO> section for information. Lists VCPU information for a specific domain. If no domain is specified, VCPU information for all domains will be provided. -=item B<vcpu-pin> I<domain-id> I<vcpu> I<cpus> +=item B<vcpu-pin> [I<OPTIONS>] I<domain-id> I<vcpu> I<cpus> Pins the VCPU to only run on the specific CPUs. The keyword B<all> can be used to apply the I<cpus> list to all VCPUs in the @@ -630,6 +630,28 @@ different run state is appropriate. Pinning can be used to restrict this, by ensuring certain VCPUs can only run on certain physical CPUs. +B<OPTIONS> + +=over 4 + +=item B<-s>, B<--soft> + +The same as above, but affects I<soft affinity> rather than pinning +(also called I<hard affinity>). + +Normally, VCPUs just wander among the CPUs where they are allowed to +run (either all the CPUs or the ones to which they are pinned, as said +for B<vcpu-list>).
Soft affinity offers a means to specify one or more +I<preferred> CPUs. Basically, among the ones where it can run, the +VCPU will greatly prefer to execute on one of these CPUs, +whenever that is possible. + +Notice that, in order for soft affinity to actually work, it needs +special support in the scheduler. Right now, only credit1 provides +that. + +=back + =item B<vm-list> Prints information about guests. This list excludes information about diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index cf237c4..d5c4eb1 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -4497,8 +4497,10 @@ static void print_vcpuinfo(uint32_t tdomid, } /* TIM */ printf("%9.1f ", ((float)vcpuinfo->vcpu_time / 1e9)); - /* CPU AFFINITY */ + /* CPU HARD AND SOFT AFFINITY */ print_bitmap(vcpuinfo->cpumap.map, nr_cpus, stdout); + printf(" / "); + print_bitmap(vcpuinfo->cpumap_soft.map, nr_cpus, stdout); printf("\n"); } @@ -4533,7 +4535,8 @@ static void vcpulist(int argc, char **argv) } printf("%-32s %5s %5s %5s %5s %9s %s\n", - "Name", "ID", "VCPU", "CPU", "State", "Time(s)", "CPU Affinity"); + "Name", "ID", "VCPU", "CPU", "State", "Time(s)", + "Hard Affinity / Soft Affinity"); if (!argc) { if (!(dominfo = libxl_list_domain(ctx, &nb_domain))) { fprintf(stderr, "libxl_list_domain failed.\n"); @@ -4566,17 +4569,33 @@ int main_vcpulist(int argc, char **argv) return 0; } -static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) +int main_vcpupin(int argc, char **argv) { libxl_vcpuinfo *vcpuinfo; libxl_bitmap cpumap; - - uint32_t vcpuid; - char *endptr; + uint32_t vcpuid, domid; + const char *vcpu; + char *endptr, *cpu; int i = 0, nb_vcpu, rc = -1; + int opt, flags = LIBXL_VCPUAFFINITY_HARD; + static struct option opts[] = { + {"soft", 0, 0, 's'}, + COMMON_LONG_OPTS, + {0, 0, 0, 0} + }; libxl_bitmap_init(&cpumap); + SWITCH_FOREACH_OPT(opt, "s", opts, "vcpu-pin", 3) { + case 's': + flags = LIBXL_VCPUAFFINITY_SOFT; + break; + } + + domid =
find_domain(argv[optind]); + vcpu = argv[optind+1]; + cpu = argv[optind+2]; + vcpuid = strtoul(vcpu, &endptr, 10); if (vcpu == endptr) { if (strcmp(vcpu, "all")) { @@ -4615,22 +4634,16 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) } if (vcpuid != -1) { - if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap) == -1) { + if (libxl_set_vcpuaffinity2(ctx, domid, vcpuid, &cpumap, flags)) fprintf(stderr, "Could not set affinity for vcpu `%u'.\n", vcpuid); - } } else { if (!(vcpuinfo = libxl_list_vcpu(ctx, domid, &nb_vcpu, &i))) { fprintf(stderr, "libxl_list_vcpu failed.\n"); goto out; } - for (i = 0; i < nb_vcpu; i++) { - if (libxl_set_vcpuaffinity(ctx, domid, vcpuinfo[i].vcpuid, - &cpumap) == -1) { - fprintf(stderr, "libxl_set_vcpuaffinity failed" - " on vcpu `%u'.\n", vcpuinfo[i].vcpuid); - } - } + if (libxl_set_vcpuaffinity_all2(ctx, domid, nb_vcpu, &cpumap, flags)) + fprintf(stderr, "libxl_set_vcpuaffinity_all2 failed.\n"); libxl_vcpuinfo_list_free(vcpuinfo, nb_vcpu); } @@ -4640,17 +4653,6 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) return rc; } -int main_vcpupin(int argc, char **argv) -{ - int opt; - - SWITCH_FOREACH_OPT(opt, "", NULL, "vcpu-pin", 3) { - /* No options */ - } - - return vcpupin(find_domain(argv[optind]), argv[optind+1] , argv[optind+2]); -} - static void vcpuset(uint32_t domid, const char* nr_vcpus, int check_host) { char *endptr; diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c index d3dcbf0..c97796f 100644 --- a/tools/libxl/xl_cmdtable.c +++ b/tools/libxl/xl_cmdtable.c @@ -213,7 +213,8 @@ struct cmd_spec cmd_table[] = { { "vcpu-pin", &main_vcpupin, 1, 1, "Set which CPUs a VCPU can use", - "<Domain> <VCPU|all> <CPUs|all>", + "[option] <Domain> <VCPU|all> <CPUs|all>", + "-s, --soft Deal with soft affinity", }, { "vcpu-set", &main_vcpuset, 0, 1,
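The Hard Affinity / Soft Affinity columns shown by xl vcpu-list print each cpumap as a compact range list such as "8-15" or "0-7". As an illustration only, assuming the same byte-array cpumap layout, a formatter along these lines produces that kind of output. format_cpumap() is a hypothetical stand-in, not xl's print_bitmap(), and printing "none" for an empty map is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static int bit_set(const uint8_t *map, int bit)
{
    return (map[bit / 8] >> (bit % 8)) & 1;
}

/*
 * Hypothetical formatter: "all" if every one of the nr_cpus bits is set,
 * "none" if no bit is set, otherwise comma-separated ranges ("0-3,7,12").
 * Output is truncated, but always NUL-terminated, if buf is too small.
 */
static void format_cpumap(const uint8_t *map, int nr_cpus,
                          char *buf, size_t len)
{
    size_t used = 0;
    int first = 1, all = 1;

    assert(len > 0);
    buf[0] = '\0';
    for (int i = 0; i < nr_cpus; i++)
        if (!bit_set(map, i))
            all = 0;
    if (all) {
        snprintf(buf, len, "all");
        return;
    }
    for (int i = 0; i < nr_cpus; i++) {
        if (!bit_set(map, i))
            continue;
        int start = i, n;
        while (i + 1 < nr_cpus && bit_set(map, i + 1))
            i++;                      /* extend the current run of set bits */
        if (start == i)
            n = snprintf(buf + used, len - used, "%s%d",
                         first ? "" : ",", start);
        else
            n = snprintf(buf + used, len - used, "%s%d-%d",
                         first ? "" : ",", start, i);
        if (n < 0 || (size_t)n >= len - used)
            return;                   /* truncated */
        used += (size_t)n;
        first = 0;
    }
    if (first)
        snprintf(buf, len, "none");
}
```

With this helper, a row like the ones above would be produced by formatting the hard map, printing " / ", then formatting the soft map, which is the order print_vcpuinfo() uses in the patch.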
Dario Faggioli
2013-Nov-18 18:18 UTC
[PATCH v3 13/14] xl: enable for specifying node-affinity in the config file
in a similar way to how it is possible to specify vcpu-affinity. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- Changes from v2: * use the new libxl API. Although the implementation changed only a little bit, I removed IanJ's Acked-by, but I am noting here that he did provide it, as requested. --- docs/man/xl.cfg.pod.5 | 27 ++++++++++++++-- tools/libxl/libxl_dom.c | 3 +- tools/libxl/xl_cmdimpl.c | 79 +++++++++++++++++++++++++++++++++++++++++++++- 3 files changed, 103 insertions(+), 6 deletions(-) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index 5dbc73c..733c74e 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -144,19 +144,40 @@ run on cpu #3 of the host. =back If this option is not specified, no vcpu to cpu pinning is established, -and the vcpus of the guest can run on all the cpus of the host. +and the vcpus of the guest can run on all the cpus of the host. If this +option is specified, the intersection of the vcpu pinning mask, provided +here, and the soft affinity mask, provided via B<cpus\_soft=> (if any), +is utilized to compute the domain node-affinity, for driving memory +allocations. If we are on a NUMA machine (i.e., if the host has more than one NUMA node) and this option is not specified, libxl automatically tries to place the guest on the least possible number of nodes. That, however, will not affect vcpu pinning, so the guest will still be able to run on -all the cpus, it will just prefer the ones from the node it has been -placed on. A heuristic approach is used for choosing the best node (or +all the cpus. A heuristic approach is used for choosing the best node (or set of nodes), with the goals of maximizing performance for the guest and, at the same time, achieving efficient utilization of host cpus and memory. See F<docs/misc/xl-numa-placement.markdown> for more details.
+=item B<cpus_soft="CPU-LIST"> + +Exactly as B<cpus=>, but specifies soft affinity, rather than pinning +(also called hard affinity). Starting from Xen 4.4, and if the credit +scheduler is used, this means the vcpus of the domain prefer to run on +these pcpus. Default is either all pcpus or what xl (via libxl) guesses +(depending on what other options are present). + +A C<CPU-LIST> is specified exactly as above, for B<cpus=>. + +If this option is not specified, the vcpus of the guest will not have +any preference regarding which cpu to run on, and the scheduler will +treat all the cpus where a vcpu can execute (if B<cpus=> is specified), +or all the host cpus (if not), the same. If this option is specified, +the intersection of the soft affinity mask, provided here, and the vcpu +pinning, provided via B<cpus=> (if any), is utilized to compute the +domain node-affinity, for driving memory allocations. + =back =head3 CPU Scheduling diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index a1c16b0..ceb37a3 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -236,7 +236,8 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid, return rc; } libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap); - libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap); + libxl_set_vcpuaffinity_all3(ctx, domid, info->max_vcpus, &info->cpumap, + &info->cpumap_soft); xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT); xs_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL); diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index d5c4eb1..660bb1f 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -76,8 +76,9 @@ xlchild children[child_max]; static const char *common_domname; static int fd_lock = -1; -/* Stash for specific vcpu to pcpu mappping */ +/* Stash for specific vcpu to pcpu hard and soft mapping */ static int *vcpu_to_pcpu; +static int *vcpu_to_pcpu_soft; static
const char savefileheader_magic[32]= "Xen saved domain, xl format\n \0 \r"; @@ -647,7 +648,8 @@ static void parse_config_data(const char *config_source, const char *buf; long l; XLU_Config *config; - XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms; + XLU_ConfigList *cpus, *cpus_soft, *vbds, *nics, *pcis; + XLU_ConfigList *cvfbs, *cpuids, *vtpms; XLU_ConfigList *ioports, *irqs, *iomem; int num_ioports, num_irqs, num_iomem; int pci_power_mgmt = 0; @@ -824,6 +826,50 @@ static void parse_config_data(const char *config_source, libxl_defbool_set(&b_info->numa_placement, false); } + if (!xlu_cfg_get_list (config, "cpus_soft", &cpus_soft, 0, 1)) { + int n_cpus = 0; + + if (libxl_cpu_bitmap_alloc(ctx, &b_info->cpumap_soft, 0)) { + fprintf(stderr, "Unable to allocate cpumap_soft\n"); + exit(1); + } + + /* As above, use a temporary storage for the single affinities */ + vcpu_to_pcpu_soft = xmalloc(sizeof(int) * b_info->max_vcpus); + memset(vcpu_to_pcpu_soft, -1, sizeof(int) * b_info->max_vcpus); + + libxl_bitmap_set_none(&b_info->cpumap_soft); + while ((buf = xlu_cfg_get_listitem(cpus_soft, n_cpus)) != NULL) { + i = atoi(buf); + if (!libxl_bitmap_cpu_valid(&b_info->cpumap_soft, i)) { + fprintf(stderr, "cpu %d illegal\n", i); + exit(1); + } + libxl_bitmap_set(&b_info->cpumap_soft, i); + if (n_cpus < b_info->max_vcpus) + vcpu_to_pcpu_soft[n_cpus] = i; + n_cpus++; + } + + /* We have a soft affinity map, disable automatic placement */ + libxl_defbool_set(&b_info->numa_placement, false); + } + else if (!xlu_cfg_get_string (config, "cpus_soft", &buf, 0)) { + char *buf2 = strdup(buf); + + if (libxl_cpu_bitmap_alloc(ctx, &b_info->cpumap_soft, 0)) { + fprintf(stderr, "Unable to allocate cpumap_soft\n"); + exit(1); + } + + libxl_bitmap_set_none(&b_info->cpumap_soft); + if (vcpupin_parse(buf2, &b_info->cpumap_soft)) + exit(1); + free(buf2); + + libxl_defbool_set(&b_info->numa_placement, false); + } + if (!xlu_cfg_get_long (config, "memory", &l, 0)) {
b_info->max_memkb = l * 1024; b_info->target_memkb = b_info->max_memkb; @@ -2183,6 +2229,35 @@ start: free(vcpu_to_pcpu); vcpu_to_pcpu = NULL; } + /* And do the same for single vcpu to soft-affinity mapping */ + if (vcpu_to_pcpu_soft) { + libxl_bitmap soft_cpumap; + + ret = libxl_cpu_bitmap_alloc(ctx, &soft_cpumap, 0); + if (ret) + goto error_out; + for (i = 0; i < d_config.b_info.max_vcpus; i++) { + + if (vcpu_to_pcpu_soft[i] != -1) { + libxl_bitmap_set_none(&soft_cpumap); + libxl_bitmap_set(&soft_cpumap, vcpu_to_pcpu_soft[i]); + } else { + libxl_bitmap_set_any(&soft_cpumap); + } + if (libxl_set_vcpuaffinity2(ctx, domid, i, &soft_cpumap, + LIBXL_VCPUAFFINITY_SOFT)) { + fprintf(stderr, "setting soft-affinity failed " + "on vcpu `%d'.\n", i); + libxl_bitmap_dispose(&soft_cpumap); + free(vcpu_to_pcpu_soft); + ret = ERROR_FAIL; + goto error_out; + } + } + libxl_bitmap_dispose(&soft_cpumap); + free(vcpu_to_pcpu_soft); vcpu_to_pcpu_soft = NULL; + } + ret = libxl_userdata_store(ctx, domid, "xl", config_data, config_len); if (ret) {
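The list form of cpus_soft= handled above does two things at once: it builds the domain-wide soft-affinity map as the union of all listed cpus, and it records, for vcpu i, the i-th list item as that vcpu's single preferred pcpu, which is later applied with libxl_set_vcpuaffinity2() and LIBXL_VCPUAFFINITY_SOFT (vcpus beyond the end of the list get "any"). A standalone sketch of that bookkeeping follows; build_soft_maps() is a hypothetical helper that takes already-parsed integers instead of xlu_cfg list items and a plain byte-array map instead of a libxl_bitmap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch: items[] are the cpus from cpus_soft=[...].
 * Fills domain_map (union of all items) and vcpu_to_pcpu (item i for
 * vcpu i, -1 meaning "no specific preference"). Returns -1 on an
 * out-of-range cpu, mirroring xl's "cpu %d illegal" error.
 */
static int build_soft_maps(const int *items, int nr_items,
                           int max_vcpus, int nr_cpus,
                           uint8_t *domain_map, int *vcpu_to_pcpu)
{
    assert(nr_cpus > 0 && max_vcpus >= 0);
    memset(domain_map, 0, (size_t)(nr_cpus + 7) / 8);
    for (int v = 0; v < max_vcpus; v++)
        vcpu_to_pcpu[v] = -1;

    for (int n = 0; n < nr_items; n++) {
        int cpu = items[n];

        if (cpu < 0 || cpu >= nr_cpus)
            return -1;
        domain_map[cpu / 8] |= (uint8_t)(1u << (cpu % 8));
        if (n < max_vcpus)              /* extra items only widen the map */
            vcpu_to_pcpu[n] = cpu;
    }
    return 0;
}
```

Note that, as in the patch, list items beyond max_vcpus still contribute to the domain-wide map, which is what feeds the node-affinity computation.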
Dario Faggioli
2013-Nov-18 18:18 UTC
[PATCH v3 14/14] libxl: automatic NUMA placement affects soft affinity
vCPU soft affinity and NUMA-aware scheduling do not have to be related. However, soft affinity is how NUMA-aware scheduling is actually implemented, and therefore, by default, the results of automatic NUMA placement (at VM creation time) are also used to set the soft affinity of all the vCPUs of the domain. Of course, this only happens if automatic NUMA placement is enabled and actually takes place (for instance, if the user does not specify any hard or soft affinity in the xl config file). This also takes care of the converse, i.e., automatic placement is not triggered if the config file specifies either a hard (the check for which was already there) or a soft (the check for which is introduced by this commit) affinity. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> --- docs/man/xl.cfg.pod.5 | 21 +++++++++++---------- docs/misc/xl-numa-placement.markdown | 16 ++++++++++++++-- tools/libxl/libxl_dom.c | 20 ++++++++++++++++++-- 3 files changed, 43 insertions(+), 14 deletions(-) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index 733c74e..d4a0a6f 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -150,16 +150,6 @@ here, and the soft affinity mask, provided via B<cpus\_soft=> (if any), is utilized to compute the domain node-affinity, for driving memory allocations. -If we are on a NUMA machine (i.e., if the host has more than one NUMA -node) and this option is not specified, libxl automatically tries to -place the guest on the least possible number of nodes. That, however, -will not affect vcpu pinning, so the guest will still be able to run on -all the cpus. A heuristic approach is used for choosing the best node (or -set of nodes), with the goals of maximizing performance for the guest -and, at the same time, achieving efficient utilization of host cpus -and memory. See F<docs/misc/xl-numa-placement.markdown> for more -details.
- =item B<cpus_soft="CPU-LIST"> Exactly as B<cpus=>, but specifies soft affinity, rather than pinning @@ -178,6 +168,17 @@ the intersection of the soft affinity mask, provided here, and the vcpu pinning, provided via B<cpus=> (if any), is utilized to compute the domain node-affinity, for driving memory allocations. +If this option is not specified (and B<cpus=> is not specified either), +libxl automatically tries to place the guest on the smallest possible +number of nodes. A heuristic approach is used for choosing the best +node (or set of nodes), with the goal of maximizing performance for +the guest and, at the same time, achieving efficient utilization of +host cpus and memory. In that case, the soft affinity of all the vcpus +of the domain will be set to the pcpus belonging to the NUMA nodes +chosen during placement. + +For more details, see F<docs/misc/xl-numa-placement.markdown>. + =back =head3 CPU Scheduling diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown index b1ed361..f644758 100644 --- a/docs/misc/xl-numa-placement.markdown +++ b/docs/misc/xl-numa-placement.markdown @@ -126,10 +126,22 @@ or Xen won't be able to guarantee the locality for their memory accesses. That, of course, also mean the vCPUs of the domain will only be able to execute on those same pCPUs. +Starting from 4.4, it is also possible to specify a "cpus\_soft=" option +in the xl config file. This, regardless of whether or not "cpus=" is +specified too, affects the NUMA placement in a way very similar to what +is described above. In fact, the hypervisor will build up the node-affinity +of the VM based on it or, if both pinning (via "cpus=") and soft +affinity (via "cpus\_soft=") are present, on their intersection. + +Besides that, "cpus\_soft=" also means, of course, that the vCPUs of the +domain will prefer to execute on, among the pCPUs where they can run, +those particular pCPUs.
+ + ### Placing the guest automatically ### -If no "cpus=" option is specified in the config file, libxl tries -to figure out on its own on which node(s) the domain could fit best. +If neither "cpus=" nor "cpus\_soft=" is present in the config file, libxl +tries to figure out on its own on which node(s) the domain could fit best. If it finds one (some), the domain's node affinity get set to there, and both memory allocations and NUMA aware scheduling (for the credit scheduler and starting from Xen 4.3) will comply with it. Starting from diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index ceb37a3..6599209 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -222,18 +222,34 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid, * some weird error manifests) the subsequent call to * libxl_domain_set_nodeaffinity() will do the actual placement, * whatever that turns out to be. + * + * As far as scheduling is concerned, we achieve NUMA-aware scheduling + * by having the results of placement affect the soft affinity of all + * the vcpus of the domain. Of course, we want that iff placement is + * enabled and actually happens, so we only change info->cpumap_soft to + * reflect the placement result if that is the case. */ if (libxl_defbool_val(info->numa_placement)) { - if (!libxl_bitmap_is_full(&info->cpumap)) { + /* We require both hard and soft affinity not to be set */ + if (!libxl_bitmap_is_full(&info->cpumap) || + !libxl_bitmap_is_full(&info->cpumap_soft)) { LOG(ERROR, "Can run NUMA placement only if no vcpu " - "affinity is specified"); + "(hard or soft) affinity is specified"); return ERROR_INVAL; } rc = numa_place_domain(gc, domid, info); if (rc) return rc; + + /* + * We change the soft affinity in domain_build_info here, of course + * after converting the result of placement from nodes to cpus. The + * following call to libxl_set_vcpuaffinity_all3() will do the + * actual updating of the domain's vcpus' soft affinity.
+ */ + libxl_nodemap_to_cpumap(ctx, &info->nodemap, &info->cpumap_soft); } libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap); libxl_set_vcpuaffinity_all3(ctx, domid, info->max_vcpus, &info->cpumap,
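For readers following the placement hunk: placement only runs when neither hard nor soft affinity was given, and the chosen node map is converted, via libxl_nodemap_to_cpumap(), into the cpumap stored in info->cpumap_soft. The sketch below models both steps with plain 64-bit masks. Everything named ex_* and the cpu_to_node[] topology table are hypothetical simplifications (at most 64 pCPUs and nodes, node ids below 64), not libxl code.

```c
#include <stdint.h>

/*
 * Hypothetical, simplified model (not libxl code): at most 64 pCPUs and
 * 64 nodes, each represented by one bit of a uint64_t. cpu_to_node[cpu]
 * gives the NUMA node of each pCPU.
 */

/* Convert a node map into the map of all pCPUs belonging to those nodes. */
static uint64_t ex_nodemap_to_cpumap(uint64_t nodemap,
                                     const unsigned int *cpu_to_node,
                                     unsigned int nr_cpus)
{
    uint64_t cpumap = 0;
    unsigned int cpu;

    for (cpu = 0; cpu < nr_cpus && cpu < 64; cpu++)
        if (cpu_to_node[cpu] < 64 &&
            (nodemap & (UINT64_C(1) << cpu_to_node[cpu])))
            cpumap |= UINT64_C(1) << cpu;

    return cpumap;
}

/*
 * Mirrors the check in libxl__build_pre(): automatic placement only runs
 * if neither hard (cpumap) nor soft (cpumap_soft) affinity was given,
 * i.e., both are still "full".
 */
static int ex_can_place(uint64_t hard, uint64_t soft, uint64_t all_cpus)
{
    return hard == all_cpus && soft == all_cpus;
}
```

For example, on a host where pCPUs 0-1 sit on node 0 and pCPUs 2-3 on node 1, a placement result of node 1 (node map 0x2) becomes the soft-affinity cpumap 0xC.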
On Mon, 2013-11-18 at 19:16 +0100, Dario Faggioli wrote:> Implement vcpu soft affinity for credit1 >Well, of course this ("Implement vcpu soft affinity for credit1") was the real subject, which I put one line below where I should have, when editing the cover letter. :-/ Sorry for that. -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
George Dunlap
2013-Nov-19 12:24 UTC
Re: [PATCH v3 02/14] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
On 11/18/2013 06:16 PM, Dario Faggioli wrote:> as well as both error handling and logging in libxl_cpu_bitmap_alloc > and libxl_node_bitmap_alloc. > > Now libxl_get_max_{cpus,nodes} either return a positive number, or > a libxl error code. Thanks to that, it is possible to fix logging for > the two bitmap allocation functions, which now happens _inside_ the > functions themselves, and report what happens more accurately. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> With one caveat...> diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c > index 682f874..2a51c9c 100644 > --- a/tools/libxl/libxl_utils.c > +++ b/tools/libxl/libxl_utils.c > @@ -645,6 +645,46 @@ char *libxl_bitmap_to_hex_string(libxl_ctx *ctx, const libxl_bitmap *bitmap) > return q; > } > > +inline int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, > + libxl_bitmap *cpumap, > + int max_cpus) Stray 'inline'. :-) -George
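The convention described in the patch can be sketched as follows: the "get max" query returns either a positive count or a libxl-style (negative) error code, and the allocation helper logs its own failures, so callers only need to propagate the code. All ex_* names, the two error values and the simulated host query are hypothetical stand-ins, not libxl's definitions. The helpers are ordinary (non-inline) functions, in line with the review comment above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for libxl's negative error codes (values illustrative). */
#define EX_ERROR_FAIL  (-3)
#define EX_ERROR_NOMEM (-5)

/* Hypothetical stand-in for libxl_bitmap: size is in bytes. */
typedef struct {
    unsigned int size;
    unsigned char *map;
} ex_bitmap;

/* Simulated host value; <= 0 makes the query below fail. */
static int ex_host_max_cpus = 8;

/* Returns a positive number of cpus, or a negative error code. */
static int ex_get_max_cpus(void)
{
    return ex_host_max_cpus > 0 ? ex_host_max_cpus : EX_ERROR_FAIL;
}

/*
 * Allocates a bitmap for max_cpus cpus (querying the host if
 * max_cpus <= 0) and logs its own failures, so that callers only have
 * to propagate the returned code.
 */
static int ex_cpu_bitmap_alloc(ex_bitmap *bm, int max_cpus)
{
    if (max_cpus <= 0) {
        max_cpus = ex_get_max_cpus();
        if (max_cpus < 0) {
            fprintf(stderr, "failed to retrieve the maximum number of cpus\n");
            return max_cpus;
        }
    }

    bm->size = (max_cpus + 7) / 8;
    bm->map = calloc(bm->size, 1);
    if (!bm->map) {
        fprintf(stderr, "failed to allocate cpu bitmap\n");
        return EX_ERROR_NOMEM;
    }
    return 0;
}
```

A caller can then simply do `rc = ex_cpu_bitmap_alloc(&bm, 0); if (rc) return rc;` without logging again.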
Dario Faggioli
2013-Nov-19 12:34 UTC
Re: [PATCH v3 02/14] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
On Tue, 2013-11-19 at 12:24 +0000, George Dunlap wrote:> On 11/18/2013 06:16 PM, Dario Faggioli wrote: > > as well as both error handling and logging in libxl_cpu_bitmap_alloc > > and libxl_node_bitmap_alloc. > > > > Now libxl_get_max_{cpus,nodes} either return a positive number, or > > a libxl error code. Thanks to that, it is possible to fix logging for > > the two bitmap allocation functions, which now happens _inside_ the > > functions themselves, and report what happens more accurately. > > > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > > Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> >Thanks.> > diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c > > index 682f874..2a51c9c 100644 > > --- a/tools/libxl/libxl_utils.c > > +++ b/tools/libxl/libxl_utils.c > > @@ -645,6 +645,46 @@ char *libxl_bitmap_to_hex_string(libxl_ctx *ctx, const libxl_bitmap *bitmap) > > return q; > > } > > > > +inline int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, > > + libxl_bitmap *cpumap, > > + int max_cpus) > > Stray 'inline'. :-) >Wow... How did he manage to survive? I mean, I killed his 'static' buddy but kept him? I guess I'm getting too old for late night hacking sessions! :-P Anyway, let's see how the rest of the review goes. If I have to resend, I will fix this. Otherwise I can just resend this patch, or do whatever the maintainers/committers are most comfortable with. Regards, Dario
George Dunlap
2013-Nov-19 14:14 UTC
Re: [PATCH v3 08/14] xen: derive NUMA node affinity from hard and soft CPU affinity
On 11/18/2013 06:17 PM, Dario Faggioli wrote:> if a domain's NUMA node-affinity (which is what controls > memory allocations) is provided by the user/toolstack, it > just is not touched. However, if the user does not say > anything, leaving it all to Xen, let's compute it in the > following way: > > 1. cpupool's cpus & hard-affinity & soft-affinity > 2. if (1) is empty: cpupool's cpus & hard-affinity > > This guarantees memory to be allocated from the narrowest > possible set of NUMA nodes, and makes it relatively easy to > set up NUMA-aware scheduling on top of soft affinity. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>> --- > Changes from v2: > * the loop computing the mask is now only executed when > it really is useful, as suggested during review; > * the loop, and all the cpumask handling is optimized, > in a way similar to what was suggested during review. > --- > xen/common/domain.c | 62 +++++++++++++++++++++++++++++++++------------------ > 1 file changed, 40 insertions(+), 22 deletions(-) > > diff --git a/xen/common/domain.c b/xen/common/domain.c > index d6ac4d1..721678a 100644 > --- a/xen/common/domain.c > +++ b/xen/common/domain.c > @@ -353,17 +353,17 @@ struct domain *domain_create( > > void domain_update_node_affinity(struct domain *d) > { > - cpumask_var_t cpumask; > - cpumask_var_t online_affinity; > + cpumask_var_t dom_cpumask, dom_cpumask_soft; > + cpumask_t *dom_affinity; > const cpumask_t *online; > struct vcpu *v; > - unsigned int node; > + unsigned int cpu; > > - if ( !zalloc_cpumask_var(&cpumask) ) > + if ( !zalloc_cpumask_var(&dom_cpumask) ) > return; > - if ( !alloc_cpumask_var(&online_affinity) ) > + if ( !zalloc_cpumask_var(&dom_cpumask_soft) ) > { > - free_cpumask_var(cpumask); > + free_cpumask_var(dom_cpumask); > return; > } > > @@ -371,31 +371,49 @@ void domain_update_node_affinity(struct domain *d) > > spin_lock(&d->node_affinity_lock); > > - for_each_vcpu ( d,
v ) > - { > - cpumask_and(online_affinity, v->cpu_hard_affinity, online); > - cpumask_or(cpumask, cpumask, online_affinity); > - } > - > /* > - * If d->auto_node_affinity is true, the domain's node-affinity mask > - * (d->node_affinity) is automaically computed from all the domain's > - * vcpus' vcpu-affinity masks (the union of which we have just built > - * above in cpumask). OTOH, if d->auto_node_affinity is false, we > - * must leave the node-affinity of the domain alone. > + * If d->auto_node_affinity is true, let's compute the domain's > + * node-affinity and update d->node_affinity accordingly. If false, > + * just leave d->node_affinity alone. > */ > if ( d->auto_node_affinity ) > { > + /* > + * We want the narrowest possible set of pcpus (to get the narrowest > + * possible set of nodes). What we need is the cpumask of where the > + * domain can run (the union of the hard affinity of all its vcpus), > + * and the full mask of where it would prefer to run (the union of > + * the soft affinity of all its various vcpus). Let's build them. > + */ > + cpumask_clear(dom_cpumask); > + cpumask_clear(dom_cpumask_soft); > + for_each_vcpu ( d, v ) > + { > + cpumask_or(dom_cpumask, dom_cpumask, v->cpu_hard_affinity); > + cpumask_or(dom_cpumask_soft, dom_cpumask_soft, > + v->cpu_soft_affinity); > + } > + /* Filter out non-online cpus */ > + cpumask_and(dom_cpumask, dom_cpumask, online); > + /* And compute the intersection between hard, online and soft */ > + cpumask_and(dom_cpumask_soft, dom_cpumask_soft, dom_cpumask); > + > + /* > + * If not empty, the intersection of hard, soft and online is the > + * narrowest set we want. If empty, we fall back to hard&online. > + */ > + dom_affinity = cpumask_empty(dom_cpumask_soft) ?
> + dom_cpumask : dom_cpumask_soft; > + > nodes_clear(d->node_affinity); > - for_each_online_node ( node ) > - if ( cpumask_intersects(&node_to_cpumask(node), cpumask) ) > - node_set(node, d->node_affinity); > + for_each_cpu( cpu, dom_affinity ) > + node_set(cpu_to_node(cpu), d->node_affinity); > } > > spin_unlock(&d->node_affinity_lock); > > - free_cpumask_var(online_affinity); > - free_cpumask_var(cpumask); > + free_cpumask_var(dom_cpumask_soft); > + free_cpumask_var(dom_cpumask); > } > > >
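The rule from the commit message (cpupool's cpus & hard & soft, falling back to cpupool's cpus & hard when that is empty) and the loop in the hunk can be modelled in user space with plain bitmasks. This is a hypothetical sketch: uint64_t stands in for cpumask_t (so at most 64 pcpus and nodes), `online` stands for the cpupool's online cpus, and a cpu_to_node[] array (node ids assumed below 64) replaces the hypervisor's topology information.

```c
#include <stdint.h>

/*
 * Hypothetical user-space model of domain_update_node_affinity():
 * hard[]/soft[] hold each vcpu's affinity, one bit per pcpu.
 * Returns the resulting node-affinity, one bit per node.
 */
static uint64_t ex_node_affinity(const uint64_t *hard, const uint64_t *soft,
                                 unsigned int nr_vcpus, uint64_t online,
                                 const unsigned int *cpu_to_node,
                                 unsigned int nr_cpus)
{
    uint64_t dom_cpumask = 0, dom_cpumask_soft = 0, dom_affinity;
    uint64_t nodes = 0;
    unsigned int v, cpu;

    /* Union of the hard and of the soft affinity over all the vcpus. */
    for ( v = 0; v < nr_vcpus; v++ )
    {
        dom_cpumask |= hard[v];
        dom_cpumask_soft |= soft[v];
    }
    dom_cpumask &= online;               /* filter out non-online cpus */
    dom_cpumask_soft &= dom_cpumask;     /* hard & online & soft */

    /* Narrowest non-empty set: hard&online&soft, else hard&online. */
    dom_affinity = dom_cpumask_soft ? dom_cpumask_soft : dom_cpumask;

    /* Set the node of every pcpu in the chosen set. */
    for ( cpu = 0; cpu < nr_cpus && cpu < 64; cpu++ )
        if ( (dom_affinity >> cpu) & 1 )
            nodes |= UINT64_C(1) << cpu_to_node[cpu];

    return nodes;
}
```

With pcpus 0-1 on node 0 and 2-3 on node 1, hard affinity 0xC and soft affinity 0x3 do not intersect, so the sketch falls back to hard&online and returns node 1 (0x2).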
George Dunlap
2013-Nov-19 14:32 UTC
Re: [PATCH v3 09/14] xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
On 11/18/2013 06:17 PM, Dario Faggioli wrote:> by adding a flag for the caller to specify which one he cares about. > > Add also another cpumap there. This way, in case of > DOMCTL_setvcpuaffinity, Xen can return back to the caller the > "effective affinity" of the vcpu. We call the effective affinity > the intersection between cpupool''s cpus, the (new?) hard affinity > and the (new?) soft affinity. > > The purpose of this is allowing the toolstack to figure out whether > or not the requested change produced sensible results, when combined > with the other settings that are already in place. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>Looks good: Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>> --- > Changes from v2: > * in DOMCTL_[sg]etvcpuaffinity, flag is really a flag now, > i.e., we accept request for setting and getting: (1) only > hard affinity; (2) only soft affinity; (3) both; as > suggested during review. > --- > tools/libxc/xc_domain.c | 4 ++- > xen/arch/x86/traps.c | 4 ++- > xen/common/domctl.c | 54 ++++++++++++++++++++++++++++++++++++++++--- > xen/common/schedule.c | 35 +++++++++++++++++++--------- > xen/common/wait.c | 6 ++--- > xen/include/public/domctl.h | 15 ++++++++++-- > xen/include/xen/sched.h | 3 ++ > 7 files changed, 97 insertions(+), 24 deletions(-) > > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c > index 1ccafc5..f9ae4bf 100644 > --- a/tools/libxc/xc_domain.c > +++ b/tools/libxc/xc_domain.c > @@ -215,7 +215,9 @@ int xc_vcpu_setaffinity(xc_interface *xch, > > domctl.cmd = XEN_DOMCTL_setvcpuaffinity; > domctl.domain = (domid_t)domid; > - domctl.u.vcpuaffinity.vcpu = vcpu; > + domctl.u.vcpuaffinity.vcpu = vcpu; > + /* Soft affinity is there, but not used anywhere for now, so... 
*/ > + domctl.u.vcpuaffinity.flags = XEN_VCPUAFFINITY_HARD; > > memcpy(local, cpumap, cpusize); > > diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c > index 4279cad..196ff68 100644 > --- a/xen/arch/x86/traps.c > +++ b/xen/arch/x86/traps.c > @@ -3093,7 +3093,7 @@ static void nmi_mce_softirq(void) > * Make sure to wakeup the vcpu on the > * specified processor. > */ > - vcpu_set_affinity(st->vcpu, cpumask_of(st->processor)); > + vcpu_set_hard_affinity(st->vcpu, cpumask_of(st->processor)); > > /* Affinity is restored in the iret hypercall. */ > } > @@ -3122,7 +3122,7 @@ void async_exception_cleanup(struct vcpu *curr) > if ( !cpumask_empty(curr->cpu_hard_affinity_tmp) && > !cpumask_equal(curr->cpu_hard_affinity_tmp, curr->cpu_hard_affinity) ) > { > - vcpu_set_affinity(curr, curr->cpu_hard_affinity_tmp); > + vcpu_set_hard_affinity(curr, curr->cpu_hard_affinity_tmp); > cpumask_clear(curr->cpu_hard_affinity_tmp); > } > > diff --git a/xen/common/domctl.c b/xen/common/domctl.c > index 5e0ac5c..84be0d6 100644 > --- a/xen/common/domctl.c > +++ b/xen/common/domctl.c > @@ -617,19 +617,65 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) > if ( op->cmd == XEN_DOMCTL_setvcpuaffinity ) > { > cpumask_var_t new_affinity; > + cpumask_t *online; > > ret = xenctl_bitmap_to_cpumask( > &new_affinity, &op->u.vcpuaffinity.cpumap); > - if ( !ret ) > + if ( ret ) > + break; > + > + ret = -EINVAL; > + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_HARD ) > + ret = vcpu_set_hard_affinity(v, new_affinity); > + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT ) > + ret = vcpu_set_soft_affinity(v, new_affinity); > + > + if ( ret ) > + goto setvcpuaffinity_out; > + > + /* > + * Report back to the caller what the "effective affinity", that > + * is the intersection of cpupool''s pcpus, the (new?) hard > + * affinity and the (new?) soft-affinity. 
> + */ > + if ( !guest_handle_is_null(op->u.vcpuaffinity.eff_cpumap.bitmap) ) > { > - ret = vcpu_set_affinity(v, new_affinity); > - free_cpumask_var(new_affinity); > + online = cpupool_online_cpumask(v->domain->cpupool); > + cpumask_and(new_affinity, online, v->cpu_hard_affinity); > + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT) > + cpumask_and(new_affinity, new_affinity, > + v->cpu_soft_affinity); > + > + ret = cpumask_to_xenctl_bitmap( > + &op->u.vcpuaffinity.eff_cpumap, new_affinity); > } > + > + setvcpuaffinity_out: > + free_cpumask_var(new_affinity); > } > else > { > + cpumask_var_t affinity; > + > + /* > + * If the caller asks for both _HARD and _SOFT, what we return > + * is the intersection of hard and soft affinity for the vcpu. > + */ > + if ( !alloc_cpumask_var(&affinity) ) { > + ret = -EFAULT; > + break; > + } > + cpumask_setall(affinity); > + > + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_HARD ) > + cpumask_copy(affinity, v->cpu_hard_affinity); > + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT ) > + cpumask_and(affinity, affinity, v->cpu_soft_affinity); > + > ret = cpumask_to_xenctl_bitmap( > - &op->u.vcpuaffinity.cpumap, v->cpu_hard_affinity); > + &op->u.vcpuaffinity.cpumap, affinity); > + > + free_cpumask_var(affinity); > } > } > break; > diff --git a/xen/common/schedule.c b/xen/common/schedule.c > index c9ae521..6c53287 100644 > --- a/xen/common/schedule.c > +++ b/xen/common/schedule.c > @@ -654,22 +654,14 @@ void sched_set_node_affinity(struct domain *d, nodemask_t *mask) > SCHED_OP(DOM2OP(d), set_node_affinity, d, mask); > } > > -int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity) > +static int vcpu_set_affinity( > + struct vcpu *v, const cpumask_t *affinity, cpumask_t **which) > { > - cpumask_t online_affinity; > - cpumask_t *online; > spinlock_t *lock; > > - if ( v->domain->is_pinned ) > - return -EINVAL; > - online = VCPU2ONLINE(v); > - cpumask_and(&online_affinity, affinity, online); > - if ( 
cpumask_empty(&online_affinity) ) > - return -EINVAL; > - > lock = vcpu_schedule_lock_irq(v); > > - cpumask_copy(v->cpu_hard_affinity, affinity); > + cpumask_copy(*which, affinity); > > /* Always ask the scheduler to re-evaluate placement > * when changing the affinity */ > @@ -688,6 +680,27 @@ int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity) > return 0; > } > > +int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity) > +{ > + cpumask_t online_affinity; > + cpumask_t *online; > + > + if ( v->domain->is_pinned ) > + return -EINVAL; > + > + online = VCPU2ONLINE(v); > + cpumask_and(&online_affinity, affinity, online); > + if ( cpumask_empty(&online_affinity) ) > + return -EINVAL; > + > + return vcpu_set_affinity(v, affinity, &v->cpu_hard_affinity); > +} > + > +int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity) > +{ > + return vcpu_set_affinity(v, affinity, &v->cpu_soft_affinity); > +} > + > /* Block the currently-executing domain until a pertinent event occurs. */ > void vcpu_block(void) > { > diff --git a/xen/common/wait.c b/xen/common/wait.c > index 3f6ff41..1f6b597 100644 > --- a/xen/common/wait.c > +++ b/xen/common/wait.c > @@ -135,7 +135,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv) > /* Save current VCPU affinity; force wakeup on *this* CPU only. 
*/ > wqv->wakeup_cpu = smp_processor_id(); > cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); > - if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) > + if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) > { > gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); > domain_crash_synchronous(); > @@ -166,7 +166,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv) > static void __finish_wait(struct waitqueue_vcpu *wqv) > { > wqv->esp = NULL; > - (void)vcpu_set_affinity(current, &wqv->saved_affinity); > + (void)vcpu_set_hard_affinity(current, &wqv->saved_affinity); > } > > void check_wakeup_from_wait(void) > @@ -184,7 +184,7 @@ void check_wakeup_from_wait(void) > /* Re-set VCPU affinity and re-enter the scheduler. */ > struct vcpu *curr = current; > cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); > - if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) > + if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) > { > gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); > domain_crash_synchronous(); > diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h > index 01a3652..4f71450 100644 > --- a/xen/include/public/domctl.h > +++ b/xen/include/public/domctl.h > @@ -300,8 +300,19 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_nodeaffinity_t); > /* XEN_DOMCTL_setvcpuaffinity */ > /* XEN_DOMCTL_getvcpuaffinity */ > struct xen_domctl_vcpuaffinity { > - uint32_t vcpu; /* IN */ > - struct xenctl_bitmap cpumap; /* IN/OUT */ > + /* IN variables. */ > + uint32_t vcpu; > + /* Set/get the hard affinity for vcpu */ > +#define _XEN_VCPUAFFINITY_HARD 0 > +#define XEN_VCPUAFFINITY_HARD (1U<<_XEN_VCPUAFFINITY_HARD) > + /* Set/get the soft affinity for vcpu */ > +#define _XEN_VCPUAFFINITY_SOFT 1 > +#define XEN_VCPUAFFINITY_SOFT (1U<<_XEN_VCPUAFFINITY_SOFT) > + uint32_t flags; > + /* IN/OUT variables. */ > + struct xenctl_bitmap cpumap; > + /* OUT variables. 
*/ > + struct xenctl_bitmap eff_cpumap; > }; > typedef struct xen_domctl_vcpuaffinity xen_domctl_vcpuaffinity_t; > DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpuaffinity_t); > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h > index 3575312..0f728b3 100644 > --- a/xen/include/xen/sched.h > +++ b/xen/include/xen/sched.h > @@ -755,7 +755,8 @@ void scheduler_free(struct scheduler *sched); > int schedule_cpu_switch(unsigned int cpu, struct cpupool *c); > void vcpu_force_reschedule(struct vcpu *v); > int cpu_disable_scheduler(unsigned int cpu); > -int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity); > +int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity); > +int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity); > void restore_vcpu_affinity(struct domain *d); > > void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate); >
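To make the flag semantics of the new DOMCTL interface concrete, here is a hypothetical user-space model of the three paths in the hunks above: setting (hard, soft or both), reporting the effective affinity, and getting (both flags give the intersection). The flag definitions are copied from the domctl.h hunk; struct ex_vcpu, the ex_* functions and the uint64_t masks are illustrative assumptions, and the is_pinned check is omitted. One deliberate difference: as far as the quoted do_domctl() code shows, when both flags are given and vcpu_set_hard_affinity() fails, the following vcpu_set_soft_affinity() call overwrites `ret`, so the error could be lost. This sketch returns early instead.

```c
#include <errno.h>
#include <stdint.h>

/* Flag values as defined in the xen/include/public/domctl.h hunk. */
#define _XEN_VCPUAFFINITY_HARD 0
#define XEN_VCPUAFFINITY_HARD  (1U<<_XEN_VCPUAFFINITY_HARD)
#define _XEN_VCPUAFFINITY_SOFT 1
#define XEN_VCPUAFFINITY_SOFT  (1U<<_XEN_VCPUAFFINITY_SOFT)

/* Hypothetical simplified vcpu: one bit per pcpu. */
struct ex_vcpu {
    uint64_t hard, soft;
};

/* Set path: neither flag gives -EINVAL, as in the hunk. */
static int ex_set_affinity(struct ex_vcpu *v, uint64_t new_affinity,
                           uint64_t pool_online, uint32_t flags)
{
    int ret = -EINVAL;

    if ( flags & XEN_VCPUAFFINITY_HARD )
    {
        /* Like vcpu_set_hard_affinity(): need at least one online pcpu. */
        if ( !(new_affinity & pool_online) )
            return -EINVAL;
        v->hard = new_affinity;
        ret = 0;
    }
    if ( flags & XEN_VCPUAFFINITY_SOFT )
    {
        v->soft = new_affinity;   /* no online check for soft affinity */
        ret = 0;
    }
    return ret;
}

/* Effective affinity reported back by the set path (eff_cpumap). */
static uint64_t ex_effective_affinity(const struct ex_vcpu *v,
                                      uint64_t pool_online, uint32_t flags)
{
    uint64_t eff = pool_online & v->hard;

    if ( flags & XEN_VCPUAFFINITY_SOFT )
        eff &= v->soft;
    return eff;
}

/* Get path: asking for both returns the intersection of hard and soft. */
static uint64_t ex_get_affinity(const struct ex_vcpu *v, uint32_t flags)
{
    uint64_t affinity = ~UINT64_C(0);   /* like cpumask_setall() */

    if ( flags & XEN_VCPUAFFINITY_HARD )
        affinity = v->hard;
    if ( flags & XEN_VCPUAFFINITY_SOFT )
        affinity &= v->soft;
    return affinity;
}
```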
George Dunlap
2013-Nov-19 14:51 UTC
Re: [PATCH v3 10/14] libxc: get and set soft and hard affinity
On 11/18/2013 06:18 PM, Dario Faggioli wrote:> by using the new flag introduced in the parameters of the > DOMCTL_{get,set}_vcpuaffinity hypercall. > > This happens by adding a new parameter (flags) to > xc_vcpu_setaffinity() and xc_vcpu_getaffinity(), so that the > caller can decide to set either the soft or hard affinity, or > even both. > > In case of setting both hard and soft, they are set to the > same cpumap. xc_vcpu_setaffinity() also takes another new param, > for reporting back to the caller what the actual affinity the > scheduler uses will be after a successful call. > In case of asking to get both hard and soft, what the caller > gets is the intersection between them. > > In-tree callers are also fixed to cope with the new interface. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> But...> --- > Changes from v2: > * better cleanup logic in _vcpu_setaffinity() (regarding > xc_hypercall_buffer_{alloc,free}()), as suggested during > review; > * make it more evident that DOMCTL_setvcpuaffinity has an out > parameter, by calling it ecpumap_out, and improving the comment > wrt that; > * change the interface of xc_vcpu_[sg]etaffinity() so > that they take the new parameters (flags and ecpumap_out) and > fix the in tree callers.
> --- > tools/libxc/xc_domain.c | 47 +++++++++++++++++++++-------------- > tools/libxc/xenctrl.h | 44 ++++++++++++++++++++++++++++++++- > tools/libxl/libxl.c | 7 ++++- > tools/ocaml/libs/xc/xenctrl_stubs.c | 8 ++++-- > tools/python/xen/lowlevel/xc/xc.c | 6 +++- > 5 files changed, 86 insertions(+), 26 deletions(-) > > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c > index f9ae4bf..bddf4e0 100644 > --- a/tools/libxc/xc_domain.c > +++ b/tools/libxc/xc_domain.c > @@ -192,44 +192,52 @@ int xc_domain_node_getaffinity(xc_interface *xch, > int xc_vcpu_setaffinity(xc_interface *xch, > uint32_t domid, > int vcpu, > - xc_cpumap_t cpumap) > + xc_cpumap_t cpumap, > + uint32_t flags, > + xc_cpumap_t ecpumap_out) > { > DECLARE_DOMCTL; > - DECLARE_HYPERCALL_BUFFER(uint8_t, local); > + DECLARE_HYPERCALL_BUFFER(uint8_t, cpumap_local); > + DECLARE_HYPERCALL_BUFFER(uint8_t, ecpumap_local); > int ret = -1; > int cpusize; > > cpusize = xc_get_cpumap_size(xch); > - if (!cpusize) > + if ( !cpusize )I know IanJ will have something to say about non-snuggly braces here and below. :-) -George
Ian Campbell
2013-Nov-19 14:57 UTC
Re: [PATCH v3 10/14] libxc: get and set soft and hard affinity
On Tue, 2013-11-19 at 14:51 +0000, George Dunlap wrote:> On 11/18/2013 06:18 PM, Dario Faggioli wrote: > > by using the new flag introduced in the parameters of the > > DOMCTL_{get,set}_vcpuaffinity hypercall. > > > > This happens by adding a new parameter (flags) to > > xc_vcpu_setaffinity() and xc_vcpu_getaffinity(), so that the > > caller can decide to set either the soft or hard affinity, or > > even both. > > > > In case of setting both hard and soft, they are set to the > > same cpumap. xc_get_setaffinity() also takes another new param, > > for reporting back to the caller what the actual affinity the > > scheduler uses will be after a successful call. > > In case of asking to get both hard and soft, what the caller > > gets is the intersection between them. > > > > In-tree callers are also fixed to cope with the new interface. > > > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > > Acked-by: George Dunlap <george.dunlap@eu.citrix.com> > > But... > > > --- > > Changes from v2: > > * better cleanup logic in _vcpu_setaffinity() (regarding > > xc_hypercall_buffer_{alloc,free}()), as suggested during > > review; > > * make it more evident that DOMCTL_setvcpuaffinity has an out > > parameter, by calling ecpumap_out, and improving the comment > > wrt that; > > * change the interface and have xc_vcpu_[sg]etaffinity() so > > that they take the new parameters (flags and ecpumap_out) and > > fix the in tree callers. 
> > --- > > tools/libxc/xc_domain.c | 47 +++++++++++++++++++++-------------- > > tools/libxc/xenctrl.h | 44 ++++++++++++++++++++++++++++++++- > > tools/libxl/libxl.c | 7 ++++- > > tools/ocaml/libs/xc/xenctrl_stubs.c | 8 ++++-- > > tools/python/xen/lowlevel/xc/xc.c | 6 +++- > > 5 files changed, 86 insertions(+), 26 deletions(-) > > > > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c > > index f9ae4bf..bddf4e0 100644 > > --- a/tools/libxc/xc_domain.c > > +++ b/tools/libxc/xc_domain.c > > @@ -192,44 +192,52 @@ int xc_domain_node_getaffinity(xc_interface *xch, > > int xc_vcpu_setaffinity(xc_interface *xch, > > uint32_t domid, > > int vcpu, > > - xc_cpumap_t cpumap) > > + xc_cpumap_t cpumap, > > + uint32_t flags, > > + xc_cpumap_t ecpumap_out) > > { > > DECLARE_DOMCTL; > > - DECLARE_HYPERCALL_BUFFER(uint8_t, local); > > + DECLARE_HYPERCALL_BUFFER(uint8_t, cpumap_local); > > + DECLARE_HYPERCALL_BUFFER(uint8_t, ecpumap_local); > > int ret = -1; > > int cpusize; > > > > cpusize = xc_get_cpumap_size(xch); > > - if (!cpusize) > > + if ( !cpusize ) > > I know IanJ will have something to say about non-snuggly braces here and > below. :-)The spaces are legit in libxc I think, it uses Xen coding style. It''s libxl which differs... Ian.
George Dunlap
2013-Nov-19 14:58 UTC
Re: [PATCH v3 10/14] libxc: get and set soft and hard affinity
On 11/19/2013 02:57 PM, Ian Campbell wrote:> On Tue, 2013-11-19 at 14:51 +0000, George Dunlap wrote: >> On 11/18/2013 06:18 PM, Dario Faggioli wrote: >>> by using the new flag introduced in the parameters of the >>> DOMCTL_{get,set}_vcpuaffinity hypercall. >>> >>> This happens by adding a new parameter (flags) to >>> xc_vcpu_setaffinity() and xc_vcpu_getaffinity(), so that the >>> caller can decide to set either the soft or hard affinity, or >>> even both. >>> >>> In case of setting both hard and soft, they are set to the >>> same cpumap. xc_get_setaffinity() also takes another new param, >>> for reporting back to the caller what the actual affinity the >>> scheduler uses will be after a successful call. >>> In case of asking to get both hard and soft, what the caller >>> gets is the intersection between them. >>> >>> In-tree callers are also fixed to cope with the new interface. >>> >>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> >> >> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> >> >> But... >> >>> --- >>> Changes from v2: >>> * better cleanup logic in _vcpu_setaffinity() (regarding >>> xc_hypercall_buffer_{alloc,free}()), as suggested during >>> review; >>> * make it more evident that DOMCTL_setvcpuaffinity has an out >>> parameter, by calling ecpumap_out, and improving the comment >>> wrt that; >>> * change the interface and have xc_vcpu_[sg]etaffinity() so >>> that they take the new parameters (flags and ecpumap_out) and >>> fix the in tree callers. 
>>> --- >>> tools/libxc/xc_domain.c | 47 +++++++++++++++++++++-------------- >>> tools/libxc/xenctrl.h | 44 ++++++++++++++++++++++++++++++++- >>> tools/libxl/libxl.c | 7 ++++- >>> tools/ocaml/libs/xc/xenctrl_stubs.c | 8 ++++-- >>> tools/python/xen/lowlevel/xc/xc.c | 6 +++- >>> 5 files changed, 86 insertions(+), 26 deletions(-) >>> >>> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c >>> index f9ae4bf..bddf4e0 100644 >>> --- a/tools/libxc/xc_domain.c >>> +++ b/tools/libxc/xc_domain.c >>> @@ -192,44 +192,52 @@ int xc_domain_node_getaffinity(xc_interface *xch, >>> int xc_vcpu_setaffinity(xc_interface *xch, >>> uint32_t domid, >>> int vcpu, >>> - xc_cpumap_t cpumap) >>> + xc_cpumap_t cpumap, >>> + uint32_t flags, >>> + xc_cpumap_t ecpumap_out) >>> { >>> DECLARE_DOMCTL; >>> - DECLARE_HYPERCALL_BUFFER(uint8_t, local); >>> + DECLARE_HYPERCALL_BUFFER(uint8_t, cpumap_local); >>> + DECLARE_HYPERCALL_BUFFER(uint8_t, ecpumap_local); >>> int ret = -1; >>> int cpusize; >>> >>> cpusize = xc_get_cpumap_size(xch); >>> - if (!cpusize) >>> + if ( !cpusize ) >> >> I know IanJ will have something to say about non-snuggly braces here and >> below. :-) > > The spaces are legit in libxc I think, it uses Xen coding style. It''s > libxl which differs...OK -- I just happened to see snuggly braces elsewhere in the file when I took a quick glance around. It has my Ack either way. -George
On 11/18/2013 06:18 PM, Dario Faggioli wrote:> Make space for two new cpumap-s, one in vcpu_info (for getting > soft affinity) and build_info (for setting it). Provide two > new API calls: > > * libxl_set_vcpuaffinity2, taking a cpumap and setting either > hard, soft or both affinity to it, depending on 'flags'; > * libxl_set_vcpuaffinity3, taking two cpumaps, one for hard > and one for soft affinity. > > The behavior of the existing libxl_set_vcpuaffinity is left > unchanged, i.e., it only sets hard affinity. > > Getting soft affinity happens indirectly, via `xl vcpu-list' > (as it is already for hard affinity). > > The new calls include logic to check whether the affinity which > will be used by Xen to schedule the vCPU(s) does actually match > with the cpumap provided. In fact, we want to allow every possible > combination of hard and soft affinities to be set, but we warn > the user upon particularly weird combinations (e.g., hard and > soft being disjoint sets of pCPUs). > > Also, this is the first change breaking the libxl ABI, so it > bumps the MAJOR. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> The interface is fine with me (I would probably just have 2 and not 3, but I'm OK with both).
Just a few minor comments:> @@ -973,6 +987,22 @@ int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, > libxl_bitmap *cpumap); > int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, > unsigned int max_vcpus, libxl_bitmap *cpumap); > +int libxl_set_vcpuaffinity2(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, > + const libxl_bitmap *cpumap, int flags); > +int libxl_set_vcpuaffinity_all2(libxl_ctx *ctx, uint32_t domid, > + unsigned int max_vcpus, > + const libxl_bitmap *cpumap, int flags); Should we have a bit more documentation about the behavior of "flags" somewhere?> diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h > index b11cf28..fc3afee 100644 > --- a/tools/libxl/libxl_utils.h > +++ b/tools/libxl/libxl_utils.h > @@ -98,6 +98,21 @@ static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit) > #define libxl_for_each_set_bit(v, m) for (v = 0; v < (m).size * 8; v++) \ > if (libxl_bitmap_test(&(m), v)) > > +static inline int libxl_bitmap_equal(const libxl_bitmap *ba, > + const libxl_bitmap *bb, > + int nr_bits) > +{ > + int i; > + > + /* Only check nr_bits (all bits if <= 0) */ > + nr_bits = nr_bits <=0 ? ba->size * 8 : nr_bits; The conditional should really have parentheses around it, I'm pretty sure; and the spacing is inconsistent. -George
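A self-contained version of the libxl_bitmap_equal() helper under review, with the conditional parenthesized and spaced consistently as suggested. Note that the parentheses are for readability only: since `<=` binds tighter than `?:`, which in turn binds tighter than `=`, the original expression already parses the same way. The ex_bitmap type is a hypothetical stand-in for libxl_bitmap, with size counted in bytes as the `(m).size * 8` in the quoted macro implies. Treating out-of-range bits as 0 is a choice made for this sketch.

```c
#include <stdint.h>

/* Hypothetical stand-in for libxl_bitmap: size is in bytes. */
typedef struct {
    uint32_t size;
    uint8_t *map;
} ex_bitmap;

/* Out-of-range bits read as 0, so bitmaps of different sizes compare safely. */
static int ex_bitmap_test(const ex_bitmap *b, int bit)
{
    if (bit < 0 || bit >= (int)(b->size * 8))
        return 0;
    return (b->map[bit / 8] & (1 << (bit & 7))) ? 1 : 0;
}

static int ex_bitmap_equal(const ex_bitmap *ba, const ex_bitmap *bb,
                           int nr_bits)
{
    int i;

    /* Only check nr_bits (all bits of ba if nr_bits <= 0). */
    nr_bits = (nr_bits <= 0) ? (int)(ba->size * 8) : nr_bits;

    for (i = 0; i < nr_bits; i++)
        if (ex_bitmap_test(ba, i) != ex_bitmap_test(bb, i))
            return 0;
    return 1;
}
```

For two 2-byte maps that differ only in bit 8, comparing the first 8 bits reports them equal, while `nr_bits <= 0` (all 16 bits) reports them different.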
On 11/18/2013 06:16 PM, Dario Faggioli wrote:> Implement vcpu soft affinity for credit1 > > Hello everyone, > > Take 3 for this series. > > Very briefly, what it does is allowing each vcpu to have: > - a hard affinity, which they already do, and we usually call pinning. This > is the list of pcpus where a vcpu is allowed to run; > - a soft affinity, which this series introduces. This is the list of pcpus > where a vcpu *prefers* to run. > > Once that is done, per-vcpu NUMA-aware scheduling is easily implemented on top > of that, just by instructing libxl to issue the proper call to set up the soft > affinity of the domain's vcpus to be equal to its node-affinity. > > Wrt v2 review[*] I have addressed all the comments (see individual changelogs). > In particular, I have completely redesigned the libxl interface. It now allows > both the following usage patterns: > 1. changing either soft affinity only or hard affinity only or both of them to > the same value, and getting DEBUG or WARN output if that results in an > inconsistent state; > 2. changing both hard and soft affinity, each one to its own value, and > getting DEBUG or WARN output only if the *final* state is inconsistent. > > The series is also available here: > > git://xenbits.xen.org/people/dariof/xen.git numa/per-vcpu-affinity-v3 > > Thanks and Regards, > Dario Release-wise, I think this is probably not a blocker, but it is a very cool feature. From a scheduler standpoint, it should be fairly low risk -- it is primarily simplifying a mechanism that was introduced in 4.3. Bugs in the library code should be fairly easy to catch, and fairly low impact if they are found. The primary thing to be concerned about is the interface; whether we have had enough time to consider the new interfaces before committing to support them. But they're fairly straightforward. Since we're only a day past the freezing point, and (I think) almost ready to be checked in, I'm inclined to give this a freeze exception.
Any other thoughts? -George
>>> On 19.11.13 at 17:00, George Dunlap <george.dunlap@eu.citrix.com> wrote: > Release-wise, I think this is probably not a blocker, but it is a very > cool feature. From a scheduler standpoint, it should be fairly low risk > -- it is primarily simplifying a mechanism that was introduced in 4.3. > Bugs in the library code should be fairly easy to catch, and fairly low > impact if they are found. > > The primary thing to be concerned about is the interface; whether we > have had enough time to consider the new interfaces before committing to > support them. But they're fairly straightforward. > > Since we're only a day past the freezing point, and (I think) almost > ready to be checked in, I'm inclined to give this a freeze exception. > > Any other thoughts? I agree (with the caveat that I haven't gone through patches 8 and 9 yet). Jan
Dario Faggioli
2013-Nov-19 16:09 UTC
Re: [PATCH v3 11/14] libxl: get and set soft affinity
On mar, 2013-11-19 at 15:41 +0000, George Dunlap wrote:
> On 11/18/2013 06:18 PM, Dario Faggioli wrote:
> > Make space for two new cpumap-s, one in vcpu_info (for getting
> > soft affinity) and build_info (for setting it). Provide two
> > new API calls:
> >
> > * libxl_set_vcpuaffinity2, taking a cpumap and setting either
> > hard, soft or both affinity to it, depending on 'flags';
> > * libxl_set_vcpuaffinity3, taking two cpumap, one for hard
> > and one for soft affinity.
> >
> > The bheavior of the existing libxl_set_vcpuaffinity is left
> > unchanged, i.e., it only set hard affinity.
> >
> > Getting soft affinity happens indirectly, via `xl vcpu-list'
> > (as it is already for hard affinity).
> >
> > The new calls include logic to check whether the affinity which
> > will be used by Xen to schedule the vCPU(s) does actually match
> > with the cpumap provided. In fact, we want to allow every possible
> > combination of hard and soft affinities to be set, but we warn
> > the user upon particularly weird combinations (e.g., hard and
> > soft being disjoint sets of pCPUs).
> >
> > Also, this is the first change breaking the libxl ABI, so it
> > bumps the MAJOR.
> >
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>
> The interface is fine with me (I would probably just have 2 and not 3,
> but I'm OK with both). Just a few minor comments:
>
the '3' variant (tries to) accomplish what IanJ explicitly asked: having a way to set both hard and soft affinity at the same time, and each with its own value, and only checking for consistency at the very end.

I also wasn't sure whether that would have been actually useful but, I have to admit, it turned out it is, as it can be seen in the following patches, when the interface is used to (re)implement both the existing and the new xl commands and command variants.

Let's see what IanJ thinks, I guess. :-)

> > @@ -973,6 +987,22 @@ int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid,
> > libxl_bitmap *cpumap);
> > int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid,
> > unsigned int max_vcpus, libxl_bitmap *cpumap);
> > +int libxl_set_vcpuaffinity2(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid,
> > + const libxl_bitmap *cpumap, int flags);
> > +int libxl_set_vcpuaffinity_all2(libxl_ctx *ctx, uint32_t domid,
> > + unsigned int max_vcpus,
> > + const libxl_bitmap *cpumap, int flags);
>
> Should we have a bit more documentation about the behavior of "flags"
> somewhere?
>
Fair enough. I can respin with that added.

> > diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h
> > index b11cf28..fc3afee 100644
> > --- a/tools/libxl/libxl_utils.h
> > +++ b/tools/libxl/libxl_utils.h
> > @@ -98,6 +98,21 @@ static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit)
> > #define libxl_for_each_set_bit(v, m) for (v = 0; v < (m).size * 8; v++) \
> > if (libxl_bitmap_test(&(m), v))
> >
> > +static inline int libxl_bitmap_equal(const libxl_bitmap *ba,
> > + const libxl_bitmap *bb,
> > + int nr_bits)
> > +{
> > + int i;
> > +
> > + /* Only check nr_bits (all bits if <= 0) */
> > + nr_bits = nr_bits <=0 ? ba->size * 8 : nr_bits;
>
> The conditional should really have parenthesis around it, I'm pretty
> sure; and the spacing is inconsistent.
>
Right. Will do.

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Jan Beulich
2013-Nov-19 16:20 UTC
Re: [PATCH v3 08/14] xen: derive NUMA node affinity from hard and soft CPU affinity
>>> On 18.11.13 at 19:17, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> if a domain's NUMA node-affinity (which is what controls
> memory allocations) is provided by the user/toolstack, it
> just is not touched. However, if the user does not say
> anything, leaving it all to Xen, let's compute it in the
> following way:
>
> 1. cpupool's cpus & hard-affinity & soft-affinity
> 2. if (1) is empty: cpupool's cpus & hard-affinity

Is this really guaranteed to always be non-empty? At least an ASSERT() to that effect would be nice, as it's not immediately obvious.

> - if ( !zalloc_cpumask_var(&cpumask) )
> + if ( !zalloc_cpumask_var(&dom_cpumask) )
> return;
> - if ( !alloc_cpumask_var(&online_affinity) )
> + if ( !zalloc_cpumask_var(&dom_cpumask_soft) )

So you use zalloc_cpumask_var() here ...

> if ( d->auto_node_affinity )
> {
> + /*
> + * We want the narrowest possible set of pcpus (to get the narowest
> + * possible set of nodes). What we need is the cpumask of where the
> + * domain can run (the union of the hard affinity of all its vcpus),
> + * and the full mask of where it would prefer to run (the union of
> + * the soft affinity of all its various vcpus). Let's build them.
> + */
> + cpumask_clear(dom_cpumask);
> + cpumask_clear(dom_cpumask_soft);

... and then clear the masks explicitly here?

Jan
Dario Faggioli
2013-Nov-19 16:35 UTC
Re: [PATCH v3 08/14] xen: derive NUMA node affinity from hard and soft CPU affinity
On mar, 2013-11-19 at 16:20 +0000, Jan Beulich wrote:
> >>> On 18.11.13 at 19:17, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> > if a domain's NUMA node-affinity (which is what controls
> > memory allocations) is provided by the user/toolstack, it
> > just is not touched. However, if the user does not say
> > anything, leaving it all to Xen, let's compute it in the
> > following way:
> >
> > 1. cpupool's cpus & hard-affinity & soft-affinity
> > 2. if (1) is empty: cpupool's cpus & hard-affinity
>
> Is this really guaranteed to always be non-empty? At least an
> ASSERT() to that effect would be nice, as it's not immediately
> obvious.
>
I think it is, based on how cpupools and hard affinity interact, even before this series (where hard affinity is v->cpu_affinity, the only per-vcpu affinity we have).

For instance, when you move a domain to a new cpupool, it always resets v->cpu_affinity to "all" for all the domain's vcpus (see sched_move_domain()). Similarly, when removing cpus from a cpupool, if some v->cpu_affinity becomes empty, it gets reset to "all" too (see cpu_disable_scheduler()). It also uses "all" as v->cpu_affinity for all the vcpus that, at domain creation time, have an affinity which has an empty intersection with the cpupool where the domain is being created.

So, yes, I really think (2.) is guaranteed to be non empty, and yes, I can add an ASSERT there.

> > - if ( !zalloc_cpumask_var(&cpumask) )
> > + if ( !zalloc_cpumask_var(&dom_cpumask) )
> > return;
> > - if ( !alloc_cpumask_var(&online_affinity) )
> > + if ( !zalloc_cpumask_var(&dom_cpumask_soft) )
>
> So you use zalloc_cpumask_var() here ...
>
> > if ( d->auto_node_affinity )
> > {
> > + /*
> > + * We want the narrowest possible set of pcpus (to get the narowest
> > + * possible set of nodes). What we need is the cpumask of where the
> > + * domain can run (the union of the hard affinity of all its vcpus),
> > + * and the full mask of where it would prefer to run (the union of
> > + * the soft affinity of all its various vcpus). Let's build them.
> > + */
> > + cpumask_clear(dom_cpumask);
> > + cpumask_clear(dom_cpumask_soft);
>
> ... and then clear the masks explicitly here?
>
AhA, right... I probably got a bit lost while reshuffling things. :-)

I'll ditch these two cpumask_clear().

Thanks and Regards,
Dario
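The node-affinity rule Jan quotes, together with the ASSERT() Dario agrees to add, fits in a few lines of C. This is a toy model, not the xen/common/domain.c code: masks are 64-bit integers, the topology is assumed flat, and `node_affinity_cpus()` and `cpus_to_nodes()` are hypothetical names. The per-domain masks start out zeroed and are never cleared again, which mirrors Jan's zalloc_cpumask_var() remark.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpumask64_t;   /* toy model: one bit per pCPU */

/*
 * The rule from the patch description:
 *   1. cpupool's cpus & hard-affinity & soft-affinity
 *   2. if (1) is empty: cpupool's cpus & hard-affinity
 * where hard/soft are the unions over all of the domain's vCPUs.
 */
static cpumask64_t node_affinity_cpus(cpumask64_t pool,
                                      const cpumask64_t *hard,
                                      const cpumask64_t *soft,
                                      int nr_vcpus)
{
    cpumask64_t dom_hard = 0, dom_soft = 0, cpus;   /* start empty */
    int v;

    for (v = 0; v < nr_vcpus; v++) {
        dom_hard |= hard[v];
        dom_soft |= soft[v];
    }

    cpus = pool & dom_hard & dom_soft;
    if (cpus == 0)
        cpus = pool & dom_hard;

    /* Hard affinity is reset to "all" whenever it would miss the pool. */
    assert(cpus != 0);
    return cpus;
}

/* Hypothetical flat topology: cpus_per_node consecutive pCPUs per node. */
static uint64_t cpus_to_nodes(cpumask64_t cpus, int cpus_per_node)
{
    uint64_t nodes = 0;
    int cpu;

    assert(cpus_per_node > 0);
    for (cpu = 0; cpu < 64; cpu++)
        if (cpus & ((cpumask64_t)1 << cpu))
            nodes |= (uint64_t)1 << (cpu / cpus_per_node);
    return nodes;
}
```

Intersecting before mapping to nodes is what yields the narrowest node set: in the first test case below, soft affinity narrows eight pCPUs over two nodes down to a single node.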
Jan Beulich
2013-Nov-19 16:39 UTC
Re: [PATCH v3 09/14] xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
>>> On 18.11.13 at 19:17, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -617,19 +617,65 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t)
> u_domctl)
> if ( op->cmd == XEN_DOMCTL_setvcpuaffinity )
> {
> cpumask_var_t new_affinity;
> + cpumask_t *online;
>
> ret = xenctl_bitmap_to_cpumask(
> &new_affinity, &op->u.vcpuaffinity.cpumap);
> - if ( !ret )
> + if ( ret )
> + break;
> +
> + ret = -EINVAL;
> + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_HARD )
> + ret = vcpu_set_hard_affinity(v, new_affinity);
> + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT )
> + ret = vcpu_set_soft_affinity(v, new_affinity);

You're discarding an eventual error indicator from vcpu_set_hard_affinity() here.

> +
> + if ( ret )
> + goto setvcpuaffinity_out;

Considering that you're going to return an error here, the caller may expect that the call did nothing, even if vcpu_set_hard_affinity() succeeded and vcpu_set_soft_affinity() failed. I know this is ugly to handle...

> +
> + /*
> + * Report back to the caller what the "effective affinity", that
> + * is the intersection of cpupool's pcpus, the (new?) hard
> + * affinity and the (new?) soft-affinity.
> + */
> + if ( !guest_handle_is_null(op->u.vcpuaffinity.eff_cpumap.bitmap) )
> {
> - ret = vcpu_set_affinity(v, new_affinity);
> - free_cpumask_var(new_affinity);
> + online = cpupool_online_cpumask(v->domain->cpupool);
> + cpumask_and(new_affinity, online, v->cpu_hard_affinity);
> + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT)
> + cpumask_and(new_affinity, new_affinity,
> + v->cpu_soft_affinity);
> +
> + ret = cpumask_to_xenctl_bitmap(
> + &op->u.vcpuaffinity.eff_cpumap, new_affinity);

Considering that you have two bitmaps available from the caller, can't you just return both when both flags are set?

> else
> {
> + cpumask_var_t affinity;
> +
> + /*
> + * If the caller asks for both _HARD and _SOFT, what we return
> + * is the intersection of hard and soft affinity for the vcpu.
> + */
> + if ( !alloc_cpumask_var(&affinity) ) {

Coding style.

> + ret = -EFAULT;

-ENOMEM

> + break;
> + }
> + cpumask_setall(affinity);
> +
> + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_HARD )
> + cpumask_copy(affinity, v->cpu_hard_affinity);
> + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT )
> + cpumask_and(affinity, affinity, v->cpu_soft_affinity);

Just like in the set case, you should fail when neither bit is set, and you could easily copy out both masks when both bits are set (or otherwise for the get case using the same interface structure is sort of bogus).

> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -654,22 +654,14 @@ void sched_set_node_affinity(struct domain *d, nodemask_t *mask)
> SCHED_OP(DOM2OP(d), set_node_affinity, d, mask);
> }
>
> -int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity)
> +static int vcpu_set_affinity(
> + struct vcpu *v, const cpumask_t *affinity, cpumask_t **which)

I don't think there's a need for the double * on "which".

Jan
Ian Campbell
2013-Nov-19 17:08 UTC
Re: [PATCH v3 10/14] libxc: get and set soft and hard affinity
On Mon, 2013-11-18 at 19:18 +0100, Dario Faggioli wrote:
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

There are a few preexisting issues with the setaffinity function, but this just duplicates them into the new cpumap, so I don't see any point in holding up the series for them. Perhaps you could put them on your todo list?

> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index f9ae4bf..bddf4e0 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -192,44 +192,52 @@ int xc_domain_node_getaffinity(xc_interface *xch,
> int xc_vcpu_setaffinity(xc_interface *xch,
> uint32_t domid,
> int vcpu,
> - xc_cpumap_t cpumap)
> + xc_cpumap_t cpumap,
> + uint32_t flags,
> + xc_cpumap_t ecpumap_out)
> {
> DECLARE_DOMCTL;
> - DECLARE_HYPERCALL_BUFFER(uint8_t, local);
> + DECLARE_HYPERCALL_BUFFER(uint8_t, cpumap_local);
> + DECLARE_HYPERCALL_BUFFER(uint8_t, ecpumap_local);
> int ret = -1;
> int cpusize;
>
> cpusize = xc_get_cpumap_size(xch);
> - if (!cpusize)
> + if ( !cpusize )
> {
> PERROR("Could not get number of cpus");
> - goto out;
> + return -1;;

Double ";;"?

> }
>
> - local = xc_hypercall_buffer_alloc(xch, local, cpusize);
> - if ( local == NULL )
> + cpumap_local = xc_hypercall_buffer_alloc(xch, cpumap_local, cpusize);
> + ecpumap_local = xc_hypercall_buffer_alloc(xch, ecpumap_local, cpusize);
> + if ( cpumap_local == NULL || cpumap_local == NULL)
> {
> - PERROR("Could not allocate memory for setvcpuaffinity domctl hypercall");
> + PERROR("Could not allocate hcall buffers for DOMCTL_setvcpuaffinity");
> goto out;
> }
>
> domctl.cmd = XEN_DOMCTL_setvcpuaffinity;
> domctl.domain = (domid_t)domid;
> domctl.u.vcpuaffinity.vcpu = vcpu;
> - /* Soft affinity is there, but not used anywhere for now, so... */
> - domctl.u.vcpuaffinity.flags = XEN_VCPUAFFINITY_HARD;
> -
> - memcpy(local, cpumap, cpusize);
> -
> - set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, local);
> + domctl.u.vcpuaffinity.flags = flags;
>
> + memcpy(cpumap_local, cpumap, cpusize);

This risks running off the end of the supplied cpumap, if it is smaller than cpusize. But more importantly, why is this not using the hypercall buffer bounce mechanism?

> + set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, cpumap_local);
> domctl.u.vcpuaffinity.cpumap.nr_bits = cpusize * 8;
>
> + set_xen_guest_handle(domctl.u.vcpuaffinity.eff_cpumap.bitmap,
> + ecpumap_local);
> + domctl.u.vcpuaffinity.eff_cpumap.nr_bits = cpusize * 8;
> +
> ret = do_domctl(xch, &domctl);
>
> - xc_hypercall_buffer_free(xch, local);
> + if ( ecpumap_out != NULL )
> + memcpy(ecpumap_out, ecpumap_local, cpusize);

Likewise this risks overrunning ecpumap_out, doesn't it?

> out:
> + xc_hypercall_buffer_free(xch, cpumap_local);
> + xc_hypercall_buffer_free(xch, ecpumap_local);
> return ret;
> }
>
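Both overrun concerns come down to copying at most min(caller size, hypervisor size) bytes in each direction. libxc's own answer is the hypercall bounce-buffer mechanism Ian mentions. The plain-C sketch below only illustrates the bounds discipline and is not that API. `copy_cpumap()` is a hypothetical helper, and it assumes the caller's map size is known, which the quoted `xc_vcpu_setaffinity()` signature does not provide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a cpumap of src_size bytes into a buffer of dst_size bytes.
 * Never reads past src or writes past dst; any unused tail of dst is
 * zeroed so no stale bits are passed on. Works for both directions:
 * caller map -> hypercall buffer, and effective map -> caller map.
 */
static void copy_cpumap(unsigned char *dst, size_t dst_size,
                        const unsigned char *src, size_t src_size)
{
    size_t n = src_size < dst_size ? src_size : dst_size;

    assert(dst != NULL && (src != NULL || src_size == 0));
    if (n > 0)
        memcpy(dst, src, n);
    memset(dst + n, 0, dst_size - n);
}
```

Zero-filling the tail means a caller with a smaller map simply leaves the higher-numbered pCPUs unset rather than passing garbage.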
On Tue, 2013-11-19 at 17:09 +0100, Dario Faggioli wrote:
> On mar, 2013-11-19 at 15:41 +0000, George Dunlap wrote:
> > On 11/18/2013 06:18 PM, Dario Faggioli wrote:
> > > Make space for two new cpumap-s, one in vcpu_info (for getting
> > > soft affinity) and build_info (for setting it). Provide two
> > > new API calls:
> > >
> > > * libxl_set_vcpuaffinity2, taking a cpumap and setting either
> > > hard, soft or both affinity to it, depending on 'flags';
> > > * libxl_set_vcpuaffinity3, taking two cpumap, one for hard
> > > and one for soft affinity.

I must confess that in the end I really dislike these foo, fooN, fooM style functions. Can we not use LIBXL_APIVERSION here to allow us to uprev the API?

> > >
> > > The bheavior of the existing libxl_set_vcpuaffinity is left
> > > unchanged, i.e., it only set hard affinity.
> > >
> > > Getting soft affinity happens indirectly, via `xl vcpu-list'
> > > (as it is already for hard affinity).
> > >
> > > The new calls include logic to check whether the affinity which
> > > will be used by Xen to schedule the vCPU(s) does actually match
> > > with the cpumap provided. In fact, we want to allow every possible
> > > combination of hard and soft affinities to be set, but we warn
> > > the user upon particularly weird combinations (e.g., hard and
> > > soft being disjoint sets of pCPUs).
> > >
> > > Also, this is the first change breaking the libxl ABI, so it
> > > bumps the MAJOR.
> > >
> > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> >
> > The interface is fine with me (I would probably just have 2 and not 3,
> > but I'm OK with both). Just a few minor comments:
> >
> the '3' variant (tries to) accomplish what IanJ explicitly asked: having
> a way to set both hard and soft affinity at the same time, and each with
> its own value, and only checking for consistency at the very end.

Can this not be accomplished by a single function which accepts one or zero of the bitmasks being NULL?

> I also wasn't sure whether that would have been actually useful but, I
> have to admit, it turned out it is, as it can be seen in the following
> patches, when the interface is used to (re)implement both the existing
> and the new xl commands and command variants.

So did ...2 turn out not to be useful? Let's not provide both in that case.

>
> Let's see what IanJ thinks, I guess. :-)
>

Ian.C.
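Ian's alternative is a single function in which one of the two bitmaps may be NULL. It covers both usage patterns from the cover letter and still checks consistency only on the final state. The toy model below is illustrative only. All names, the 64-bit masks and the three-way result are hypothetical. The real libxl code compares the requested maps against the effective map Xen returns, which this model skips.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cpumask64_t;      /* toy model: one bit per pCPU */

struct vcpu_aff {
    cpumask64_t hard, soft;
};

/* Outcome of the final consistency check (hypothetical classification). */
enum aff_check {
    AFF_CONSISTENT = 0,
    AFF_SOFT_PARTLY_OUTSIDE_HARD,  /* e.g. worth a DEBUG message */
    AFF_SOFT_DISJOINT_FROM_HARD,   /* e.g. worth a WARN message */
};

/*
 * Single entry point: a NULL pointer means "leave that affinity as it
 * is", so one call covers hard-only, soft-only and both-at-once. The
 * check runs once, on the final state only.
 */
static enum aff_check set_vcpu_affinity(struct vcpu_aff *v,
                                        const cpumask64_t *hard,
                                        const cpumask64_t *soft)
{
    assert(v != NULL && (hard != NULL || soft != NULL));

    if (hard)
        v->hard = *hard;
    if (soft)
        v->soft = *soft;

    if ((v->hard & v->soft) == 0)
        return AFF_SOFT_DISJOINT_FROM_HARD;
    if (v->soft & ~v->hard)
        return AFF_SOFT_PARTLY_OUTSIDE_HARD;
    return AFF_CONSISTENT;
}
```

Passing both masks in one call avoids a spurious warning for a transient state, such as new hard affinity combined with old soft affinity.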
On Mon, 2013-11-18 at 19:18 +0100, Dario Faggioli wrote:
> Make space for two new cpumap-s, one in vcpu_info (for getting
> soft affinity) and build_info (for setting it). Provide two
> new API calls:
>
> * libxl_set_vcpuaffinity2, taking a cpumap and setting either
> hard, soft or both affinity to it, depending on 'flags';
> * libxl_set_vcpuaffinity3, taking two cpumap, one for hard
> and one for soft affinity.
>
> The bheavior of the existing libxl_set_vcpuaffinity is left
> unchanged, i.e., it only set hard affinity.
>
> Getting soft affinity happens indirectly, via `xl vcpu-list'
> (as it is already for hard affinity).
>
> The new calls include logic to check whether the affinity which
> will be used by Xen to schedule the vCPU(s) does actually match
> with the cpumap provided. In fact, we want to allow every possible
> combination of hard and soft affinities to be set, but we warn
> the user upon particularly weird combinations (e.g., hard and
> soft being disjoint sets of pCPUs).
>
> Also, this is the first change breaking the libxl ABI, so it
> bumps the MAJOR.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Changes from v2:
> * interface completely redesigned, as discussed during
> review.
> ---
> tools/libxl/Makefile | 2 -
> tools/libxl/libxl.c | 131 +++++++++++++++++++++++++++++++++++++++++++
> tools/libxl/libxl.h | 30 ++++++++++
> tools/libxl/libxl_create.c | 6 ++
> tools/libxl/libxl_types.idl | 4 +
> tools/libxl/libxl_utils.h | 15 +++++
> 6 files changed, 186 insertions(+), 2 deletions(-)
>
> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
> index cf214bb..cba32d5 100644
> --- a/tools/libxl/Makefile
> +++ b/tools/libxl/Makefile
> @@ -5,7 +5,7 @@
> XEN_ROOT = $(CURDIR)/../..
> include $(XEN_ROOT)/tools/Rules.mk
>
> -MAJOR = 4.3
> +MAJOR = 4.4
> MINOR = 0
>
> XLUMAJOR = 4.3
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index d0db3f0..1122360 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -4204,6 +4204,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid,
> for (*nb_vcpu = 0; *nb_vcpu <= domaininfo.max_vcpu_id; ++*nb_vcpu, ++ptr) {
> if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0))
> return NULL;
> + if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap_soft, 0))
> + return NULL;
> if (xc_vcpu_getinfo(ctx->xch, domid, *nb_vcpu, &vcpuinfo) == -1) {
> LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu info");
> return NULL;
> @@ -4214,6 +4216,12 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid,
> LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity");
> return NULL;
> }
> + if (xc_vcpu_getaffinity(ctx->xch, domid, *nb_vcpu,
> + XEN_VCPUAFFINITY_SOFT,
> + ptr->cpumap_soft.map) == -1) {
> + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity");
> + return NULL;
> + }
> ptr->vcpuid = *nb_vcpu;
> ptr->cpu = vcpuinfo.cpu;
> ptr->online = !!vcpuinfo.online;
> @@ -4250,6 +4258,129 @@ int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid,
> return rc;
> }
>
> +int libxl_set_vcpuaffinity2(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid,
> + const libxl_bitmap *cpumap, int flags)

I think if we are going to duplicate the API in this way then we should still combine the implementation, either with an internal private helper or by making the old API a wrapper around the new one.

The internals of ...2 and ...3 should be shared as far as possible too.

> +{
> + libxl_cputopology *topology;
> + libxl_bitmap ecpumap;
> + int nr_cpus = 0, rc;
> +
> + topology = libxl_get_cpu_topology(ctx, &nr_cpus);
> + if (!topology) {
> + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "failed to retrieve CPU topology");

It's not consistent within the file but I think for new functions we should use the LOG macro variants.

> + return ERROR_FAIL;
> + }
> + libxl_cputopology_list_free(topology, nr_cpus);

Why are you retrieving this only to immediately throw it away?

> +
> + rc = libxl_cpu_bitmap_alloc(ctx, &ecpumap, 0);
> + if (rc)
> + return rc;
> +
> + if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map,
> + flags, ecpumap.map)) {
> + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting vcpu affinity");
> + rc = ERROR_FAIL;
> + goto out;
> + }
> +
> + if (!libxl_bitmap_equal(cpumap, &ecpumap, nr_cpus))
> + LIBXL__LOG(ctx, LIBXL__LOG_DEBUG,
> + "New affinity for vcpu %d contains unreachable cpus",
> + vcpuid);
> + if (libxl_bitmap_is_empty(&ecpumap))
> + LIBXL__LOG(ctx, LIBXL__LOG_WARNING,
> + "New affinity for vcpu %d has only unreachabel cpus. "

"unreachable"

> + "Only hard affinity will be considered for scheduling",
> + vcpuid);
> +
> + rc = 0;
> + out:
> + libxl_bitmap_dispose(&ecpumap);
> + return 0;
> +}
> +int libxl_set_vcpuaffinity3(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid,
> + const libxl_bitmap *cpumap_hard,
> + const libxl_bitmap *cpumap_soft)

Insert the same comments as ...2, because AFAICT it is mostly the same function.

> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index c7dceda..504c57b 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -82,6 +82,20 @@
> #define LIBXL_HAVE_DOMAIN_NODEAFFINITY 1
>
> /*
> + * LIBXL_HAVE_VCPUINFO_SOFTAFFINITY indicates that a 'cpumap_soft'
> + * field (of libxl_bitmap type) is present in libxl_vcpuinfo,
> + * containing the soft affinity for the vcpu.
> + */
> +#define LIBXL_HAVE_VCPUINFO_SOFTAFFINITY 1
> +
> +/*
> + * LIBXL_HAVE_BUILDINFO_SOFTAFFINITY indicates that a 'cpumap_soft'
> + * field (of libxl_bitmap type) is present in libxl_domain_build_info,
> + * containing the soft affinity for the vcpu.
> + */
> +#define LIBXL_HAVE_BUILDINFO_SOFTAFFINITY 1

Given that they arrive can we just use HAVE_SOFTAFFINITY?

> +
> +/*
> * LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE indicates that the
> * libxl_vendor_device field is present in the hvm sections of
> * libxl_domain_build_info. This field tells libxl which
> @@ -973,6 +987,22 @@ int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid,
> libxl_bitmap *cpumap);
> int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid,
> unsigned int max_vcpus, libxl_bitmap *cpumap);
> +int libxl_set_vcpuaffinity2(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid,
> + const libxl_bitmap *cpumap, int flags);
> +int libxl_set_vcpuaffinity_all2(libxl_ctx *ctx, uint32_t domid,
> + unsigned int max_vcpus,
> + const libxl_bitmap *cpumap, int flags);
> +int libxl_set_vcpuaffinity3(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid,
> + const libxl_bitmap *cpumap_hard,
> + const libxl_bitmap *cpumap_soft);
> +int libxl_set_vcpuaffinity_all3(libxl_ctx *ctx, uint32_t domid,
> + unsigned int max_vcpus,
> + const libxl_bitmap *cpumap_hard,
> + const libxl_bitmap *cpumap_soft);
> +/* Flags, consistent with domctl.h */
> +#define LIBXL_VCPUAFFINITY_HARD 1
> +#define LIBXL_VCPUAFFINITY_SOFT 2

Can these be an enum in the idl?

> +
> int libxl_domain_set_nodeaffinity(libxl_ctx *ctx, uint32_t domid,
> libxl_bitmap *nodemap);
> int libxl_domain_get_nodeaffinity(libxl_ctx *ctx, uint32_t domid,
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
Ian Campbell
2013-Nov-19 17:30 UTC
Re: [PATCH v3 12/14] xl: enable getting and setting soft
On Mon, 2013-11-18 at 19:18 +0100, Dario Faggioli wrote:
> Getting happens via `xl vcpu-list', which now looks like this:
>
> # xl vcpu-list -s
> Name       ID  VCPU   CPU  State  Time(s)  Hard Affinity / Soft Affinity
> Domain-0    0     0    11  -b-        5.4  8-15 / all

Since the / is never likely to align, how about "CPU Affinity (Hard/Soft)" as the title line?

> Domain-0    0     1    11  -b-        1.0  8-15 / all
> Domain-0    0    14    13  -b-        1.4  8-15 / all
> Domain-0    0    15     8  -b-        1.6  8-15 / all
> vm-test     3     0     4  -b-        2.5  0-12 / 0-7
> vm-test     3     1     0  -b-        3.2  0-12 / 0-7
>
> Setting happens by adding a '-s'/'--soft' switch to `xl vcpu-pin'.
>
> xl manual page is updated accordingly.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Changes from v2:
> * this patch folds what in v2 were patches 13 and 14;
> * `xl vcpu-pin' always shows both had and soft affinity,
> without the need of passing '-s'.
> ---
> docs/man/xl.pod.1 | 24 +++++++++++++++++++-
> tools/libxl/xl_cmdimpl.c | 54 +++++++++++++++++++++++----------------------
> tools/libxl/xl_cmdtable.c | 3 ++-
> 3 files changed, 53 insertions(+), 28 deletions(-)
>
> diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
> index e7b9de2..481fbdf 100644
> --- a/docs/man/xl.pod.1
> +++ b/docs/man/xl.pod.1
> @@ -619,7 +619,7 @@ after B<vcpu-set>, go to B<SEE ALSO> section for information.
> Lists VCPU information for a specific domain. If no domain is
> specified, VCPU information for all domains will be provided.
>
> -=item B<vcpu-pin> I<domain-id> I<vcpu> I<cpus>
> +=item B<vcpu-pin> [I<OPTIONS>] I<domain-id> I<vcpu> I<cpus>
>
> Pins the VCPU to only run on the specific CPUs. The keyword
> B<all> can be used to apply the I<cpus> list to all VCPUs in the
> @@ -630,6 +630,28 @@ different run state is appropriate. Pinning can be used to restrict
> this, by ensuring certain VCPUs can only run on certain physical
> CPUs.
>
> +B<OPTIONS>
> +
> +=over 4
> +
> +=item B<-s>, B<--soft>
> +
> +The same as above, but affect I<soft affinity> rather than pinning
> +(also called I<hard affinity>).
> +
> +Normally, VCPUs just wander among the CPUs where it is allowed to
> +run (either all the CPUs or the ones to which it is pinned, as said
> +for B<vcpu-list>). Soft affinity offer a mean to specify one or more

I think "offers a means" is correct. Or perhaps "affinities offer a means".

> +I<preferred> CPUs. Basically, among the ones where it can run, the
> +VCPU the VCPU will greately prefer to execute on one of these CPUs,

"greatly"

> diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
> index d3dcbf0..c97796f 100644
> --- a/tools/libxl/xl_cmdtable.c
> +++ b/tools/libxl/xl_cmdtable.c
> @@ -213,7 +213,8 @@ struct cmd_spec cmd_table[] = {
> { "vcpu-pin",
> &main_vcpupin, 1, 1,
> "Set which CPUs a VCPU can use",
> - "<Domain> <VCPU|all> <CPUs|all>",
> + "[option] <Domain> <VCPU|all> <CPUs|all>",
> + "-s, --soft Deal with soft affinity",

"Set soft affinity" ?

> },
> { "vcpu-set",
> &main_vcpuset, 0, 1,
>
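The vcpu-list output above uses a compact notation for affinities (`8-15`, `0-12`, `all`). The sketch below shows how such strings can be produced from a bitmask. `format_cpumask()` is a hypothetical helper that works on a 64-bit mask. It is not xl's actual printing code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: render the low nr_cpus bits of mask as "all" when
 * every CPU is set, otherwise as comma-separated ranges such as "0-7,12".
 * An empty mask gives an empty string; output stops if buf is too small.
 */
static void format_cpumask(uint64_t mask, int nr_cpus, char *buf, size_t len)
{
    uint64_t full;
    size_t used = 0;
    int cpu = 0;

    assert(nr_cpus > 0 && nr_cpus <= 64 && buf != NULL && len > 0);
    full = (nr_cpus == 64) ? UINT64_MAX : ((UINT64_C(1) << nr_cpus) - 1);
    mask &= full;
    buf[0] = '\0';

    if (mask == full) {
        snprintf(buf, len, "all");
        return;
    }

    while (cpu < nr_cpus) {
        int start, n;

        if (!(mask & (UINT64_C(1) << cpu))) {
            cpu++;
            continue;
        }
        start = cpu;
        while (cpu + 1 < nr_cpus && (mask & (UINT64_C(1) << (cpu + 1))))
            cpu++;                        /* extend the current run */

        if (start == cpu)
            n = snprintf(buf + used, len - used, "%s%d",
                         used ? "," : "", start);
        else
            n = snprintf(buf + used, len - used, "%s%d-%d",
                         used ? "," : "", start, cpu);
        if (n < 0 || (size_t)n >= len - used)
            return;                       /* out of space: stop */
        used += (size_t)n;
        cpu++;
    }
}
```

Range strings vary in width, which is why the "/" separator Ian mentions cannot be relied on to line up across rows.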
Ian Campbell
2013-Nov-19 17:35 UTC
Re: [PATCH v3 13/14] xl: enable for specifying node-affinity in the config file
On Mon, 2013-11-18 at 19:18 +0100, Dario Faggioli wrote:
> in a similar way to how it is possible to specify vcpu-affinity.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Changes from v2:
> * use the new libxl API. Although the implementation changed
> only a little bit, I removed IanJ's Acked-by, although I am
> here saying that he did provided it, as requested.
> ---
> docs/man/xl.cfg.pod.5 | 27 ++++++++++++++--
> tools/libxl/libxl_dom.c | 3 +-
> tools/libxl/xl_cmdimpl.c | 79 +++++++++++++++++++++++++++++++++++++++++++++-
> 3 files changed, 103 insertions(+), 6 deletions(-)
>
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> index 5dbc73c..733c74e 100644
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -144,19 +144,40 @@ run on cpu #3 of the host.
> =back
>
> If this option is not specified, no vcpu to cpu pinning is established,
> -and the vcpus of the guest can run on all the cpus of the host.
> +and the vcpus of the guest can run on all the cpus of the host. If this
> +option is specified, the intersection of the vcpu pinning mask, provided
> +here, and the soft affinity mask, provided via B<cpus\_soft=> (if any),
> +is utilized to compute the domain node-affinity, for driving memory
> +allocations.
>
> If we are on a NUMA machine (i.e., if the host has more than one NUMA
> node) and this option is not specified, libxl automatically tries to
> place the guest on the least possible number of nodes. That, however,
> will not affect vcpu pinning, so the guest will still be able to run on
> -all the cpus, it will just prefer the ones from the node it has been
> -placed on. A heuristic approach is used for choosing the best node (or
> +all the cpus. A heuristic approach is used for choosing the best node (or
> set of nodes), with the goals of maximizing performance for the guest
> and, at the same time, achieving efficient utilization of host cpus
> and memory. See F<docs/misc/xl-numa-placement.markdown> for more
> details.
>
> +=item B<cpus_soft="CPU-LIST">
> +
> +Exactly as B<cpus=>, but specifies soft affinity, rather than pinning
> +(also called hard affinity). Starting from Xen 4.4, and if the credit

I don't think we need to reference particular versions in what is effectively the manpage which comes with that version.

> +scheduler is used, this means the vcpus of the domain prefers to run
> +these pcpus. Default is either all pcpus or xl (via libxl) guesses
> +(depending on what other options are present).

No need to mention libxl here. TBH I would either document what the other options which affect the guess are or not mention it at all; as it stands the sentence doesn't tell me anything very useful.

> +
> +A C<CPU-LIST> is specified exactly as above, for B<cpus=>.
> +
> +If this option is not specified, the vcpus of the guest will not have
> +any preference regarding on what cpu to run, and the scheduler will
> +treat all the cpus where a vcpu can execute (if B<cpus=> is specified),
> +or all the host cpus (if not), the same. If this option is specified,
> +the intersection of the soft affinity mask, provided here, and the vcpu
> +pinning, provided via B<cpus=> (if any), is utilized to compute the
> +domain node-affinity, for driving memory allocations.
> +
> =back
>
> =head3 CPU Scheduling
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index a1c16b0..ceb37a3 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -236,7 +236,8 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
> return rc;
> }
> libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap);
> - libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap);
> + libxl_set_vcpuaffinity_all3(ctx, domid, info->max_vcpus, &info->cpumap,
> + &info->cpumap_soft);
>
> xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT);
> xs_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL);
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index d5c4eb1..660bb1f 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -76,8 +76,9 @@ xlchild children[child_max];
> static const char *common_domname;
> static int fd_lock = -1;
>
> -/* Stash for specific vcpu to pcpu mappping */
> +/* Stash for specific vcpu to pcpu hard and soft mappping */
> static int *vcpu_to_pcpu;
> +static int *vcpu_to_pcpu_soft;
>
> static const char savefileheader_magic[32]=
> "Xen saved domain, xl format\n \0 \r";
> @@ -647,7 +648,8 @@ static void parse_config_data(const char *config_source,
> const char *buf;
> long l;
> XLU_Config *config;
> - XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms;
> + XLU_ConfigList *cpus, *cpus_soft, *vbds, *nics, *pcis;
> + XLU_ConfigList *cvfbs, *cpuids, *vtpms;
> XLU_ConfigList *ioports, *irqs, *iomem;
> int num_ioports, num_irqs, num_iomem;
> int pci_power_mgmt = 0;
> @@ -824,6 +826,50 @@ static void parse_config_data(const char *config_source,
> libxl_defbool_set(&b_info->numa_placement, false);
> }
>
> + if (!xlu_cfg_get_list (config, "cpus_soft", &cpus_soft, 0, 1)) {

How much of this block duplicates the parsing of the pinning field? Can it be refactored?

> + int n_cpus = 0;
> +
> + if (libxl_node_bitmap_alloc(ctx, &b_info->cpumap_soft, 0)) {
> + fprintf(stderr, "Unable to allocate cpumap_soft\n");
> + exit(1);
> + }
> +
> + /* As above, use a temporary storage for the single affinities */

"use temporary storage..." (the "a" is redundant/sounds weird)

> + vcpu_to_pcpu_soft = xmalloc(sizeof(int) * b_info->max_vcpus);
> + memset(vcpu_to_pcpu_soft, -1, sizeof(int) * b_info->max_vcpus);
> +
> + libxl_bitmap_set_none(&b_info->cpumap_soft);
> + while ((buf = xlu_cfg_get_listitem(cpus_soft, n_cpus)) != NULL) {
> + i = atoi(buf);
> + if (!libxl_bitmap_cpu_valid(&b_info->cpumap_soft, i)) {
> + fprintf(stderr, "cpu %d illegal\n", i);
> + exit(1);
> + }
> + libxl_bitmap_set(&b_info->cpumap_soft, i);
> + if (n_cpus < b_info->max_vcpus)
> + vcpu_to_pcpu_soft[n_cpus] = i;
> + n_cpus++;
> + }
> +
> + /* We have a soft affinity map, disable automatic placement */
> + libxl_defbool_set(&b_info->numa_placement, false);
> + }
> + else if (!xlu_cfg_get_string (config, "cpus_soft", &buf, 0)) {
> + char *buf2 = strdup(buf);
> +
> + if (libxl_node_bitmap_alloc(ctx, &b_info->cpumap_soft, 0)) {
> + fprintf(stderr, "Unable to allocate cpumap_soft\n");
> + exit(1);
> + }
> +
> + libxl_bitmap_set_none(&b_info->cpumap_soft);
> + if (vcpupin_parse(buf2, &b_info->cpumap_soft))
> + exit(1);
> + free(buf2);
> +
> + libxl_defbool_set(&b_info->numa_placement, false);
> + }
> +
> if (!xlu_cfg_get_long (config, "memory", &l, 0)) {
> b_info->max_memkb = l * 1024;
> b_info->target_memkb = b_info->max_memkb;
> @@ -2183,6 +2229,35 @@ start:
> free(vcpu_to_pcpu); vcpu_to_pcpu = NULL;
> }
>
> + /* And do the same for single vcpu to soft-affinity mapping */

Another option to refactor common code then?
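On Ian's refactoring question: the list branch for `cpus_soft=` repeats the one for `cpus=`, so both could call one shared helper. The sketch below makes several assumptions. `parse_cpu_list()` is hypothetical. It works on a 64-bit mask rather than `libxl_bitmap`, and on plain strings rather than `XLU_ConfigList` items. It also uses `strtol()` where the quoted patch uses `atoi()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper shared by the list forms of cpus= and cpus_soft=:
 * every listed CPU is set in *mask, and the first max_vcpus entries are
 * also recorded as per-vCPU choices in vcpu_to_pcpu (entries without a
 * listed CPU stay -1). Returns 0, or -1 on a malformed or out-of-range
 * entry, leaving error reporting to the caller.
 */
static int parse_cpu_list(const char *const *items, int nr_items,
                          int nr_cpus, int max_vcpus,
                          uint64_t *mask, int *vcpu_to_pcpu)
{
    int i;

    assert(nr_cpus > 0 && nr_cpus <= 64);
    *mask = 0;
    for (i = 0; i < max_vcpus; i++)
        vcpu_to_pcpu[i] = -1;

    for (i = 0; i < nr_items; i++) {
        char *end;
        long cpu = strtol(items[i], &end, 10);

        /* stricter than atoi(): reject "", "3x" and out-of-range values */
        if (end == items[i] || *end != '\0' || cpu < 0 || cpu >= nr_cpus)
            return -1;
        *mask |= UINT64_C(1) << cpu;
        if (i < max_vcpus)
            vcpu_to_pcpu[i] = (int)cpu;
    }
    return 0;
}
```

The caller would pass `b_info->cpumap`/`vcpu_to_pcpu` for `cpus=` and the `_soft` counterparts for `cpus_soft=`. The later "apply per-vCPU mapping" block could likewise take the array and flags as parameters.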
Ian Campbell
2013-Nov-19 17:41 UTC
Re: [PATCH v3 14/14] libxl: automatic NUMA placement affects soft affinity
On Mon, 2013-11-18 at 19:18 +0100, Dario Faggioli wrote:> diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown > index b1ed361..f644758 100644 > --- a/docs/misc/xl-numa-placement.markdown > +++ b/docs/misc/xl-numa-placement.markdown > @@ -126,10 +126,22 @@ or Xen won't be able to guarantee the locality for their memory accesses. > That, of course, also mean the vCPUs of the domain will only be able to > execute on those same pCPUs. > > +Starting from 4.4, is is also possible to specify a "cpus\_soft=" options/Starting from 4.4, /It/> +in the xl config file. This, independently from whether or not "cpus=" is > +specified too, affect the NUMA placement in a way very similar to what"affects".> +is described above. In fact, the hypervisor will build up the node-affinity > +of the VM basing right on it or, if both pinning (via "cpus=") and soft"basing right on it" -- what does that mean?> +affinity (via "cpus\_soft=") are present, basing on their intersection."based on"> + > +Besides that, "cpus\_soft=" also means, of course, that the vCPUs of the s/Besides that, // s/, of course, // You aren't being graded against a target word limit you know ;-)> +domain will prefer to execute on, among the pCPUs where they can run, > +those particular pCPUs.Isn't "among the pCPUs where they can run" here somewhat redundant too? Ian.
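The rule this documentation paragraph is trying to state (node-affinity follows soft affinity, or the intersection of hard and soft affinity when both are given) can be modelled in a few lines. This is a simplified, hypothetical sketch, not the hypervisor's code: CPU masks are single 64-bit words, `cpu_to_node[]` is an assumed topology table, and an unspecified map is represented as all-ones so that "soft only" and "hard only" fall out of the same intersection. What should happen when the intersection is empty is not covered by the text and is not handled here.

```c
#include <stdint.h>

/*
 * Hypothetical model: derive a NUMA node mask from a vCPU's hard and soft
 * affinity by taking the nodes of the pCPUs in (hard & soft).
 * cpu_to_node[i] is the node of pCPU i; nr_cpus <= 64, nodes < 64.
 * Pass all-ones for a map that was not specified in the config.
 */
static uint64_t node_mask_from_affinity(uint64_t hard, uint64_t soft,
                                        const int *cpu_to_node, int nr_cpus)
{
    uint64_t cpus = hard & soft;   /* pCPUs the vCPU may and prefers to use */
    uint64_t nodes = 0;
    int cpu;

    for (cpu = 0; cpu < nr_cpus; cpu++)
        if (cpus & (UINT64_C(1) << cpu))
            nodes |= UINT64_C(1) << cpu_to_node[cpu];
    return nodes;
}
```

For example, on a host with pCPUs 0-3 on node 0 and 4-7 on node 1, "cpus_soft=" covering 4-7 with no pinning yields node 1 only.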
Dario Faggioli
2013-Nov-19 17:51 UTC
Re: [PATCH v3 11/14] libxl: get and set soft affinity
On mar, 2013-11-19 at 17:24 +0000, Ian Campbell wrote:> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>> > +int libxl_set_vcpuaffinity2(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, > > + const libxl_bitmap *cpumap, int flags) > > I think if we are going to duplicate the API in this way then we should > still combine the implementation, either with an internal private helper > or by making the old API a wrapper around the new one. > > The internals of ...2 and ...3 should be shared as far as possible too. >Ok, this can be done, I think.> > +{ > > + libxl_cputopology *topology; > > + libxl_bitmap ecpumap; > > + int nr_cpus = 0, rc; > > + > > + topology = libxl_get_cpu_topology(ctx, &nr_cpus); > > + if (!topology) { > > + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "failed to retrieve CPU topology"); > > It's not consistent within the file but I think for new functions we > should use the LOG macro variants. >Right, but don't I need a gc to use it? Should I "make up" one just for the purpose of using LOG/LOGE?> > + return ERROR_FAIL; > > + } > > + libxl_cputopology_list_free(topology, nr_cpus); > > Why are you retrieving this only to immediately throw it away? >Because I need nr_cpus. :-)> > +int libxl_set_vcpuaffinity3(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, > > + const libxl_bitmap *cpumap_hard, > > + const libxl_bitmap *cpumap_soft) > > Insert the same comments as ...2, because AFAICT it is mostly the same > function. >I will.> > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > > index c7dceda..504c57b 100644 > > --- a/tools/libxl/libxl.h > > +++ b/tools/libxl/libxl.h > > @@ -82,6 +82,20 @@ > > #define LIBXL_HAVE_DOMAIN_NODEAFFINITY 1 > > > > /* > > + * LIBXL_HAVE_VCPUINFO_SOFTAFFINITY indicates that a 'cpumap_soft' > > + * field (of libxl_bitmap type) is present in libxl_vcpuinfo, > > + * containing the soft affinity for the vcpu.
> > + */ > > +#define LIBXL_HAVE_VCPUINFO_SOFTAFFINITY 1 > > + > > +/* > > + * LIBXL_HAVE_BUILDINFO_SOFTAFFINITY indicates that a 'cpumap_soft' > > + * field (of libxl_bitmap type) is present in libxl_domain_build_info, > > + * containing the soft affinity for the vcpu. > > + */ > > +#define LIBXL_HAVE_BUILDINFO_SOFTAFFINITY 1 > > Given that they arrive can we just use HAVE_SOFTRAFFINITY? >You mean just introducing one #define? Sure... For some reason I assumed that every new field should come with its own symbol. But if it's fine to have one, I'm all for it. :-)> > +/* Flags, consistent with domctl.h */ > > +#define LIBXL_VCPUAFFINITY_HARD 1 > > +#define LIBXL_VCPUAFFINITY_SOFT 2 > > Can these be an enum in the idl? >I think yes. I did actually check and, of all the enum-s in the IDL, none are used as flags, they're rather used as "single values". OTOH, the only actual flags I found (I think it was LIBXL_SUSPEND_DEBUG, LIBXL_SUSPEND_LIVE) were defined like I did myself above... That's why I went for it. But again, if you're fine with these being enum, I will make them so. Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Dario Faggioli
2013-Nov-19 17:52 UTC
Re: [PATCH v3 12/14] xl: enable getting and setting soft
On mar, 2013-11-19 at 17:30 +0000, Ian Campbell wrote:> On Mon, 2013-11-18 at 19:18 +0100, Dario Faggioli wrote: > > Getting happens via `xl vcpu-list', which now looks like this: > > > > # xl vcpu-list -s > > Name ID VCPU CPU State Time(s) Hard Affinity / Soft Affinity > > Domain-0 0 0 11 -b- 5.4 8-15 / all > > Since the / is never likely to align, how about "CPU Affinity > (Hard/Soft)" as the title line? >Ok (to this and to all you say below). Dario
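Ian's suggested title ("CPU Affinity (Hard/Soft)") puts both maps in one column. A hypothetical sketch of how such a cell could be rendered, printing "all" when every pCPU is set and comma-separated ranges otherwise, as in the "8-15 / all" example above; the helper names are illustrative and not taken from xl_cmdimpl.c, and masks are limited to 64 pCPUs for brevity.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: render a CPU mask as "all" when all nr_cpus pCPUs
 * are set, otherwise as ranges such as "8-15" or "0,2-3".
 * Assumes 1 <= nr_cpus <= 64 and len >= 1; an empty mask gives "".
 */
static void format_cpumask(uint64_t mask, int nr_cpus, char *buf, size_t len)
{
    uint64_t all = (nr_cpus == 64) ? UINT64_MAX
                                   : (UINT64_C(1) << nr_cpus) - 1;
    size_t used = 0;
    int cpu = 0;

    buf[0] = '\0';
    if ((mask & all) == all) {
        snprintf(buf, len, "all");
        return;
    }
    while (cpu < nr_cpus && used < len) {
        int start, n;

        if (!(mask & (UINT64_C(1) << cpu))) {
            cpu++;
            continue;
        }
        /* Extend the run of consecutive set bits starting at 'cpu'. */
        start = cpu;
        while (cpu + 1 < nr_cpus && (mask & (UINT64_C(1) << (cpu + 1))))
            cpu++;
        if (start == cpu)
            n = snprintf(buf + used, len - used, "%s%d",
                         used ? "," : "", start);
        else
            n = snprintf(buf + used, len - used, "%s%d-%d",
                         used ? "," : "", start, cpu);
        if (n < 0)
            return;
        used += (size_t)n;   /* on truncation used >= len and the loop stops */
        cpu++;
    }
}

/* One "hard / soft" cell for the affinity column. */
static void format_affinity_cell(uint64_t hard, uint64_t soft, int nr_cpus,
                                 char *buf, size_t len)
{
    char h[128], s[128];

    format_cpumask(hard, nr_cpus, h, sizeof(h));
    format_cpumask(soft, nr_cpus, s, sizeof(s));
    snprintf(buf, len, "%s / %s", h, s);
}
```

Putting "(Hard/Soft)" in the title rather than aligning a "/" in each row also means the cell width can vary freely with the ranges printed.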
Dario Faggioli
2013-Nov-19 17:57 UTC
Re: [PATCH v3 14/14] libxl: automatic NUMA placement affects soft affinity
On mar, 2013-11-19 at 17:41 +0000, Ian Campbell wrote:> On Mon, 2013-11-18 at 19:18 +0100, Dario Faggioli wrote:> > + > > +Besides that, "cpus\_soft=" also means, of course, that the vCPUs of the > > s/Besides that, // > s/, of course, // > > You aren't being graded against a target word limit you know ;-) >Or so you assume... Are you sure? Have you read my contract? :-P :-P Anyway, thanks, will fix everything as you suggest. Dario
Dario Faggioli
2013-Nov-19 18:01 UTC
Re: [PATCH v3 10/14] libxc: get and set soft and hard affinity
On mar, 2013-11-19 at 17:08 +0000, Ian Campbell wrote:> On Mon, 2013-11-18 at 19:18 +0100, Dario Faggioli wrote: > > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > > Acked-by: Ian Campbell <ian.campbell@citrix.com> >Cool.> There are a few preexisting issues with the setaffinity function, but > this just duplicates them into the new cpumap, so I don't see any point > in holding up the series for them. Perhaps you could put them on your > todo list? >I certainly can.> > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c > > index f9ae4bf..bddf4e0 100644 > > --- a/tools/libxc/xc_domain.c > > +++ b/tools/libxc/xc_domain.c > > @@ -192,44 +192,52 @@ int xc_domain_node_getaffinity(xc_interface *xch, > > int xc_vcpu_setaffinity(xc_interface *xch, > > uint32_t domid, > > int vcpu, > > - xc_cpumap_t cpumap) > > + xc_cpumap_t cpumap, > > + uint32_t flags, > > + xc_cpumap_t ecpumap_out) > > { > > DECLARE_DOMCTL; > > - DECLARE_HYPERCALL_BUFFER(uint8_t, local); > > + DECLARE_HYPERCALL_BUFFER(uint8_t, cpumap_local); > > + DECLARE_HYPERCALL_BUFFER(uint8_t, ecpumap_local); > > int ret = -1; > > int cpusize; > > > > cpusize = xc_get_cpumap_size(xch); > > - if (!cpusize) > > + if ( !cpusize ) > > { > > PERROR("Could not get number of cpus"); > > - goto out; > > + return -1;; > > Double ";;"? >Ouch...
I'll have to respin this series (and will do that shortly), so I'll have a chance to fix this.> > } > > > > - local = xc_hypercall_buffer_alloc(xch, local, cpusize); > > - if ( local == NULL ) > > + cpumap_local = xc_hypercall_buffer_alloc(xch, cpumap_local, cpusize); > > + ecpumap_local = xc_hypercall_buffer_alloc(xch, ecpumap_local, cpusize); > > + if ( cpumap_local == NULL || cpumap_local == NULL) > > { > > - PERROR("Could not allocate memory for setvcpuaffinity domctl hypercall"); > > + PERROR("Could not allocate hcall buffers for DOMCTL_setvcpuaffinity"); > > goto out; > > } > > > > domctl.cmd = XEN_DOMCTL_setvcpuaffinity; > > domctl.domain = (domid_t)domid; > > domctl.u.vcpuaffinity.vcpu = vcpu; > > - /* Soft affinity is there, but not used anywhere for now, so... */ > > - domctl.u.vcpuaffinity.flags = XEN_VCPUAFFINITY_HARD; > > - > > - memcpy(local, cpumap, cpusize); > > - > > - set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, local); > > + domctl.u.vcpuaffinity.flags = flags; > > > > + memcpy(cpumap_local, cpumap, cpusize); > > This risks running off the end of the supplied cpumap, if it is smaller > than cpusize. >I see. Added to my todo list. :-)> But more importantly why is this not using the hypercall buffer bounce > mechanism? >I happen to have investigated that (with the aim of doing the switch). So, AFAICT, declaring a bounce buffer requires the size of it to be known at declaration time, is that correct? OTOH, looks like plain hypercall buffer is not interested in that until you get to xc_buffer_alloc() it. In this case, I don't have the size of the buffer until I get to call xc_get_cpumap_size(), and that's why I gave up turning this (and the new one I'm introducing) into bouncing buffers. If I'm missing/misunderstanding something, and it's actually possible to do so (how?), I'd be glad to.
Thanks and Regards, Dario
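Two details of the quoted allocation code can be shown in isolation: the NULL check should cover both buffers (the quoted condition tests cpumap_local twice), and the memcpy() should not read past the end of a caller's cpumap that is shorter than cpusize, as Ian points out. This is a hypothetical sketch: calloc() stands in for xc_hypercall_buffer_alloc(), and it assumes the caller's map size (`user_size`) is known, which the xc_vcpu_setaffinity() signature quoted above does not provide. It does not answer the bounce-buffer question.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pair of buffers; calloc() stands in for the hypercall
 * buffer allocator, free() for its release function. */
struct affinity_bufs {
    uint8_t *cpumap_local;   /* copy of the requested map, sent to Xen */
    uint8_t *ecpumap_local;  /* effective map, to be filled in by Xen */
};

static void affinity_bufs_free(struct affinity_bufs *b)
{
    free(b->cpumap_local);
    free(b->ecpumap_local);
    b->cpumap_local = b->ecpumap_local = NULL;
}

/*
 * Allocate both cpusize-byte buffers and copy the caller's map into the
 * first one. Returns 0 on success, -1 if either allocation fails.
 */
static int affinity_bufs_init(struct affinity_bufs *b, const uint8_t *cpumap,
                              size_t user_size, size_t cpusize)
{
    b->cpumap_local = calloc(1, cpusize);
    b->ecpumap_local = calloc(1, cpusize);
    /* Check both pointers, not the same one twice. */
    if (b->cpumap_local == NULL || b->ecpumap_local == NULL) {
        affinity_bufs_free(b);
        return -1;
    }
    /* Never read past the caller's map; the remaining bytes stay zero. */
    memcpy(b->cpumap_local, cpumap, user_size < cpusize ? user_size : cpusize);
    return 0;
}
```

Zero-filling the tail means CPUs beyond the caller's map are treated as "not requested", which seems the least surprising reading of a short map.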
Dario Faggioli
2013-Nov-19 18:58 UTC
Re: [PATCH v3 11/14] libxl: get and set soft affinity
On mar, 2013-11-19 at 17:15 +0000, Ian Campbell wrote:> On Tue, 2013-11-19 at 17:09 +0100, Dario Faggioli wrote: > > On mar, 2013-11-19 at 15:41 +0000, George Dunlap wrote: > > > On 11/18/2013 06:18 PM, Dario Faggioli wrote: > > > > Make space for two new cpumap-s, one in vcpu_info (for getting > > > > soft affinity) and build_info (for setting it). Provide two > > > > new API calls: > > > > > > > > * libxl_set_vcpuaffinity2, taking a cpumap and setting either > > > > hard, soft or both affinity to it, depending on 'flags'; > > > > * libxl_set_vcpuaffinity3, taking two cpumap, one for hard > > > > and one for soft affinity. > > I must confess that in the end I really dislike these foo, fooN, fooM > style functions. Can we not use LIBXL_APIVERSION here to allow us to > uprev the API? >We probably can. Situation is as follows: - George wanted something like what libxl_set_vcpuaffinity2 does - IanJ wanted something like what libxl_set_vcpuaffinity3 does - After having introduced them, I actually find uses for both of them, and hence I kept them As per the mechanism used to amend the current API, I don't have a real preference. Both me and George asked (in a previous thread) what mechanism we should use, but we did not get much feedback at that time (not complaining, just summing up what has happened). Right now, I think I just need for someone authoritative enough to express his preference. You certainly qualify as such, but, if possible, I'd like to see what the other Ian thinks too, as he's been quite deeply involved in reviewing this interface. So, IanJ, any thoughts?> > the '3' variant (tries to) accomplish what IanJ explicitly asked: having > > a way to set both hard and soft affinity at the same time, and each with > > its own value, and only checking for consistency at the very end. > > Can this not be accomplished by a single function which accepts one or > zero of the bitmasks being NULL?
>I'll see whether that's true, but yes, I think I can do that.> > I also wasn't sure whether that would have been actually useful but, I > > have to admit, it turned out it is, as it can be seen in the following > > patches, when the interface is used to (re)implement both the existing > > and the new xl commands and command variants. > > So did ...2 turn out not to be useful? Let's not provide both in that > case. >It (unfortunately) did. :-/ Thanks and Regards, Dario
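Ian's suggestion of a single function where one of the two bitmasks may be NULL can be sketched as follows. Everything here is hypothetical (`struct demo_vcpu` stands in for the per-vCPU state, masks are 64-bit words), and "inconsistent" is assumed, for illustration only, to mean that the hard and soft maps do not intersect. Both updates are applied before the check, so only the final state is judged, which is what the '3' variant was meant to provide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vCPU affinity state. */
struct demo_vcpu {
    uint64_t hard, soft;
};

/*
 * Single entry point: either map may be NULL, meaning "leave unchanged",
 * but not both. Returns -1 on bad arguments, 1 if the final state is
 * inconsistent (assumed here: hard and soft do not intersect, so the
 * caller would warn), 0 otherwise.
 */
static int demo_set_vcpuaffinity(struct demo_vcpu *v, const uint64_t *hard,
                                 const uint64_t *soft)
{
    if (hard == NULL && soft == NULL)
        return -1;
    if (hard != NULL)
        v->hard = *hard;
    if (soft != NULL)
        v->soft = *soft;
    /* Judge consistency only after both updates have been applied. */
    return (v->hard & v->soft) ? 0 : 1;
}
```

Passing one map covers what ...2 does with its flags (the flags become implicit in which pointer is non-NULL), and passing both covers ...3, so only one call would need to be added to the API.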
On Tue, 2013-11-19 at 18:51 +0100, Dario Faggioli wrote:> > > +{ > > > + libxl_cputopology *topology; > > > + libxl_bitmap ecpumap; > > > + int nr_cpus = 0, rc; > > > + > > > + topology = libxl_get_cpu_topology(ctx, &nr_cpus); > > > + if (!topology) { > > > + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "failed to retrieve CPU topology"); > > > > It's not consistent within the file but I think for new functions we > > should use the LOG macro variants. > > > Right, but don't I need a gc to use it? Should I "make up" one just for > the purpose of using LOG/LOGE?I think a call to GC_INIT/GC_FREE should be cheap enough.> > > + return ERROR_FAIL; > > > + } > > > + libxl_cputopology_list_free(topology, nr_cpus); > > > > Why are you retrieving this only to immediately throw it away? > > > Because I need nr_cpus. :-)Surely this is not the recommended way to get nr_cpus! libxl_get_cpu_topology() itself calls libxl_get_max_cpus() which seems like the obvious candidate.> > > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > > > index c7dceda..504c57b 100644 > > > --- a/tools/libxl/libxl.h > > > +++ b/tools/libxl/libxl.h > > > @@ -82,6 +82,20 @@ > > > #define LIBXL_HAVE_DOMAIN_NODEAFFINITY 1 > > > > > > /* > > > + * LIBXL_HAVE_VCPUINFO_SOFTAFFINITY indicates that a 'cpumap_soft' > > > + * field (of libxl_bitmap type) is present in libxl_vcpuinfo, > > > + * containing the soft affinity for the vcpu. > > > + */ > > > +#define LIBXL_HAVE_VCPUINFO_SOFTAFFINITY 1 > > > + > > > +/* > > > + * LIBXL_HAVE_BUILDINFO_SOFTAFFINITY indicates that a 'cpumap_soft' > > > + * field (of libxl_bitmap type) is present in libxl_domain_build_info, > > > + * containing the soft affinity for the vcpu. > > > + */ > > > +#define LIBXL_HAVE_BUILDINFO_SOFTAFFINITY 1 > > > > Given that they arrive can we just use HAVE_SOFTRAFFINITY? > > > You mean just introducing one #define? Sure... For some reason I assumed > that every new field should come with its own symbol.
But if it's fine > to have one, I'm all for it. :-)I think it's ok.> > > > +/* Flags, consistent with domctl.h */ > > > +#define LIBXL_VCPUAFFINITY_HARD 1 > > > +#define LIBXL_VCPUAFFINITY_SOFT 2 > > > > Can these be an enum in the idl? > > > I think yes. > > I did actually check and, of all the enum-s in the IDL, none are used as > flags, they're rather used as "single values". OTOH, the only actual > flags I found (I think it was LIBXL_SUSPEND_DEBUG, LIBXL_SUSPEND_LIVE) > were defined like I did myself above... That's why I went for it.I have a feeling they predate the IDL, or at least the Enumeration support. It's true that we don't have any other bit fields in enums though. I can't see the harm, it's probably not worth introducing a new IDL type for them.> > But again, if you're fine with these being enum, I will make them so. > > Thanks and Regards, > Dario >
On 20/11/13 11:27, Ian Campbell wrote:> On Tue, 2013-11-19 at 18:51 +0100, Dario Faggioli wrote: >>>> +{ >>>> + libxl_cputopology *topology; >>>> + libxl_bitmap ecpumap; >>>> + int nr_cpus = 0, rc; >>>> + >>>> + topology = libxl_get_cpu_topology(ctx, &nr_cpus); >>>> + if (!topology) { >>>> + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "failed to retrieve CPU topology"); >>> It's not consistent within the file but I think for new functions we >>> should use the LOG macro variants. >>> >> Right, but don't I need a gc to use it? Should I "make up" one just for >> the purpose of using LOG/LOGE? > I think a call to GC_INIT/GC_FREE should be cheap enough. > >>>> + return ERROR_FAIL; >>>> + } >>>> + libxl_cputopology_list_free(topology, nr_cpus); >>> Why are you retrieving this only to immediately throw it away? >>> >> Because I need nr_cpus. :-) > Surely this is not the recommended way to get nr_cpus! > > libxl_get_cpu_topology() itself calls libxl_get_max_cpus() which seems > like the obvious candidate. > > >>>> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h >>>> index c7dceda..504c57b 100644 >>>> --- a/tools/libxl/libxl.h >>>> +++ b/tools/libxl/libxl.h >>>> @@ -82,6 +82,20 @@ >>>> #define LIBXL_HAVE_DOMAIN_NODEAFFINITY 1 >>>> >>>> /* >>>> + * LIBXL_HAVE_VCPUINFO_SOFTAFFINITY indicates that a 'cpumap_soft' >>>> + * field (of libxl_bitmap type) is present in libxl_vcpuinfo, >>>> + * containing the soft affinity for the vcpu. >>>> + */ >>>> +#define LIBXL_HAVE_VCPUINFO_SOFTAFFINITY 1 >>>> + >>>> +/* >>>> + * LIBXL_HAVE_BUILDINFO_SOFTAFFINITY indicates that a 'cpumap_soft' >>>> + * field (of libxl_bitmap type) is present in libxl_domain_build_info, >>>> + * containing the soft affinity for the vcpu. >>>> + */ >>>> +#define LIBXL_HAVE_BUILDINFO_SOFTAFFINITY 1 >>> Given that they arrive can we just use HAVE_SOFTRAFFINITY? >>> >> You mean just introducing one #define? Sure... For some reason I assumed >> that every new field should come with its own symbol.
But if it's fine >> to have one, I'm all for it. :-) > I think it's ok. > >>>> +/* Flags, consistent with domctl.h */ >>>> +#define LIBXL_VCPUAFFINITY_HARD 1 >>>> +#define LIBXL_VCPUAFFINITY_SOFT 2 >>> Can these be an enum in the idl? >>> >> I think yes. >> >> I did actually check and, of all the enum-s in the IDL, none are used as >> flags, they're rather used as "single values". OTOH, the only actual >> flags I found (I think it was LIBXL_SUSPEND_DEBUG, LIBXL_SUSPEND_LIVE) >> were defined like I did myself above... That's why I went for it. > I have a feeling they predate the IDL, or at least the Enumeration > support. It's true that we don't have any other bit fields in enums > though. I can't see the harm, it's probably not worth introducing a new > IDL type for them.Since these are bits, not numbers, I don't think an enum is the right construct. Or, the enum values should be the *bit numbers*, and the flags should be (1<<[bit_number]). -George
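George's alternative, expressed in plain C rather than in the IDL (which, as noted later in the thread, would need a new type for this): the enum enumerates bit positions and the flag values are derived with a shift, so HARD and SOFT still come out as 1 and 2, matching the quoted #defines. All names below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical enum of bit positions, not of flag values. */
enum demo_vcpuaffinity_bit {
    DEMO_VCPUAFFINITY_BIT_HARD = 0,
    DEMO_VCPUAFFINITY_BIT_SOFT = 1,
};

/* Flags derived from the bit numbers: (1 << bit_number). */
#define DEMO_VCPUAFFINITY_FLAG(bit) (UINT32_C(1) << (bit))
#define DEMO_VCPUAFFINITY_HARD DEMO_VCPUAFFINITY_FLAG(DEMO_VCPUAFFINITY_BIT_HARD)
#define DEMO_VCPUAFFINITY_SOFT DEMO_VCPUAFFINITY_FLAG(DEMO_VCPUAFFINITY_BIT_SOFT)

/* Test whether a flags word has the given bit set. */
static int demo_flags_have(uint32_t flags, enum demo_vcpuaffinity_bit bit)
{
    return (flags & DEMO_VCPUAFFINITY_FLAG(bit)) != 0;
}
```

Keeping the enum as bit numbers means every enum value is a meaningful single value, while OR-ing the derived flags (HARD | SOFT == 3) still expresses "both".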
On Tue, 2013-11-19 at 19:58 +0100, Dario Faggioli wrote:> As per the mechanism used to amend the current API, I don't have a real > preference. Both me and George asked (in a previous thread) what > mechanism we should use, but we did not get much feedback at that time > (not complaining, just summing up what has happened).Yes, sorry about that.> Right now, I think I just need for someone authoritative enough to > express his preference. You certainly qualify as such, but, if possible, > I'd like to see what the other Ian thinks too, as he's been quite deeply > involved in reviewing this interface.I think the major argument in favour of the LIBXL_API_VERSION based approach is that this is how we have dealt with all the previous such changes. Having a mixture of this and fooN interfaces seems to me to be worse than either option by itself.> > > the '3' variant (tries to) accomplish what IanJ explicitly asked: having > > > a way to set both hard and soft affinity at the same time, and each with > > > its own value, and only checking for consistency at the very end. > > > > Can this not be accomplished by a single function which accepts one or > > zero of the bitmasks being NULL? > > > I'll see whether that's true, but yes, I think I can do that.Thanks, Ian.
On Wed, 2013-11-20 at 11:29 +0000, George Dunlap wrote:> On 20/11/13 11:27, Ian Campbell wrote: > > On Tue, 2013-11-19 at 18:51 +0100, Dario Faggioli wrote: > >>>> +{ > >>>> + libxl_cputopology *topology; > >>>> + libxl_bitmap ecpumap; > >>>> + int nr_cpus = 0, rc; > >>>> + > >>>> + topology = libxl_get_cpu_topology(ctx, &nr_cpus); > >>>> + if (!topology) { > >>>> + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "failed to retrieve CPU topology"); > >>> It's not consistent within the file but I think for new functions we > >>> should use the LOG macro variants. > >>> > >> Right, but don't I need a gc to use it? Should I "make up" one just for > >> the purpose of using LOG/LOGE? > > I think a call to GC_INIT/GC_FREE should be cheap enough. > > > >>>> + return ERROR_FAIL; > >>>> + } > >>>> + libxl_cputopology_list_free(topology, nr_cpus); > >>> Why are you retrieving this only to immediately throw it away? > >>> > >> Because I need nr_cpus. :-) > > Surely this is not the recommended way to get nr_cpus! > > > > libxl_get_cpu_topology() itself calls libxl_get_max_cpus() which seems > > like the obvious candidate. > > > > > >>>> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > >>>> index c7dceda..504c57b 100644 > >>>> --- a/tools/libxl/libxl.h > >>>> +++ b/tools/libxl/libxl.h > >>>> @@ -82,6 +82,20 @@ > >>>> #define LIBXL_HAVE_DOMAIN_NODEAFFINITY 1 > >>>> > >>>> /* > >>>> + * LIBXL_HAVE_VCPUINFO_SOFTAFFINITY indicates that a 'cpumap_soft' > >>>> + * field (of libxl_bitmap type) is present in libxl_vcpuinfo, > >>>> + * containing the soft affinity for the vcpu.
> >>>> + */ > >>>> +#define LIBXL_HAVE_VCPUINFO_SOFTAFFINITY 1 > >>>> + > >>>> +/* > >>>> + * LIBXL_HAVE_BUILDINFO_SOFTAFFINITY indicates that a 'cpumap_soft' > >>>> + * field (of libxl_bitmap type) is present in libxl_domain_build_info, > >>>> + * containing the soft affinity for the vcpu. > >>>> + */ > >>>> +#define LIBXL_HAVE_BUILDINFO_SOFTAFFINITY 1 > >>> Given that they arrive can we just use HAVE_SOFTRAFFINITY? > >>> > >> You mean just introducing one #define? Sure... For some reason I assumed > >> that every new field should come with its own symbol. But if it's fine > >> to have one, I'm all for it. :-) > > I think it's ok. > > > >>>> +/* Flags, consistent with domctl.h */ > >>>> +#define LIBXL_VCPUAFFINITY_HARD 1 > >>>> +#define LIBXL_VCPUAFFINITY_SOFT 2 > >>> Can these be an enum in the idl? > >>> > >> I think yes. > >> > >> I did actually check and, of all the enum-s in the IDL, none are used as > >> flags, they're rather used as "single values". OTOH, the only actual > >> flags I found (I think it was LIBXL_SUSPEND_DEBUG, LIBXL_SUSPEND_LIVE) > >> were defined like I did myself above... That's why I went for it. > > I have a feeling they predate the IDL, or at least the Enumeration > > support. It's true that we don't have any other bit fields in enums > > though. I can't see the harm, it's probably not worth introducing a new > > IDL type for them. > > Since these are bits, not numbers, I don't think an enum is the right > construct. Or, the enum values should be the *bit numbers*, and the > flags should be (1<<[bit_number]).That would need a new IDL type to express. In which case let's just leave the raw #defines, unless anyone else has a strong opinion. Ian.
Dario Faggioli
2013-Nov-20 11:40 UTC
Re: [PATCH v3 11/14] libxl: get and set soft affinity
On mer, 2013-11-20 at 11:32 +0000, Ian Campbell wrote:> On Wed, 2013-11-20 at 11:29 +0000, George Dunlap wrote: > > On 20/11/13 11:27, Ian Campbell wrote: > > >> I did actually check and, of all the enum-s in the IDL, none are used as > > >> flags, they're rather used as "single values". OTOH, the only actual > > >> flags I found (I think it was LIBXL_SUSPEND_DEBUG, LIBXL_SUSPEND_LIVE) > > >> were defined like I did myself above... That's why I went for it. > > > I have a feeling they predate the IDL, or at least the Enumeration > > > support. It's true that we don't have any other bit fields in enums > > > though. I can't see the harm, it's probably not worth introducing a new > > > IDL type for them. > > > > Since these are bits, not numbers, I don't think an enum is the right > > construct. Or, the enum values should be the *bit numbers*, and the > > flags should be (1<<[bit_number]). > > That would need a new IDL type to express. In which case let's just leave > the raw #defines, unless anyone else has a strong opinion. >That would probably be the best option, at least for now. Of course, I can add "introduce a new IDL type for bitfields" to my TODO list, and send a followup patch for 4.5. Dario
Dario Faggioli
2013-Nov-20 12:00 UTC
Re: [PATCH v3 11/14] libxl: get and set soft affinity
On mer, 2013-11-20 at 11:27 +0000, Ian Campbell wrote:> On Tue, 2013-11-19 at 18:51 +0100, Dario Faggioli wrote: > > Right, but don't I need a gc to use it? Should I "make up" one just for > > the purpose of using LOG/LOGE? > > I think a call to GC_INIT/GC_FREE should be cheap enough. >Fine then.> > > > + return ERROR_FAIL; > > > > + } > > > > + libxl_cputopology_list_free(topology, nr_cpus); > > > > > > Why are you retrieving this only to immediately throw it away? > > > > > Because I need nr_cpus. :-) > > Surely this is not the recommended way to get nr_cpus! > > libxl_get_cpu_topology() itself calls libxl_get_max_cpus() which seems > like the obvious candidate. >Well, it does indeed, but then it (most likely) returns something different from what libxl_get_max_cpus() says. In fact, what it does is use the result of such call to size the arrays needed for calling xc_topologyinfo. Then, it takes what the call to xc_topologyinfo() returns in tinfo.max_cpu_index and returns that (increased by one, as that's the index rather than the number) to the caller. I think the difference is libxl_get_max_cpus() returns the maximum possible number of supported cpus, while libxl_get_cputopology() --thanks to the fact that it goes through xc_topologyinfo(), not (only) xc_get_max_cpus(), like libxl_get_max_cpus() does-- returns the actual number of cpus. What I need is the latter, and I'm looking again, but I'm again not finding anything easier (and, I agree, less ugly) than this to get it. I can go through libxl_get_physinfo(), rather than through _topologyinfo(), but the result is not going to be that different. I'll keep looking but, in case I really don't find anything, do you want me to: - stick with this, at least for now; - introduce a new libxl (and probably libxc too) interface for this ? And I'm asking with this patch series in mind... I mean, I of course can add to my TODO list to do the latter, but do you think it's a prerequisite for accepting this patch?
Just let me know. Regards, Dario
On Wed, 2013-11-20 at 13:00 +0100, Dario Faggioli wrote:> > > > > + return ERROR_FAIL; > > > > > + } > > > > > + libxl_cputopology_list_free(topology, nr_cpus); > > > > > > > > Why are you retrieving this only to immediately throw it away? > > > > > > > Because I need nr_cpus. :-) > > > > Surely this is not the recommended way to get nr_cpus! > > > > libxl_get_cpu_topology() itself calls libxl_get_max_cpus() which seems > > like the obvious candidate. > > > Well, it does indeed, but then it (most likely) returns something > different from what libxl_get_max_cpus() says. > > In fact, what it does is use the result of such call to size the arrays > needed for calling xc_topologyinfo. Then, it takes what the call to > xc_topologyinfo() returns in tinfo.max_cpu_index and returns that > (increased by one, as that's the index rather than the number) to the > caller. > > I think the difference is libxl_get_max_cpus() returns the maximum > possible number of supported cpus, while libxl_get_cputopology() > --thanks to the fact that it goes through xc_topologyinfo(), not (only) > xc_get_max_cpus(), like libxl_get_max_cpus() does-- returns the actual > number of cpus. > > What I need is the latter, and I'm looking again, but I'm again not > finding anything easier (and, I agree, less ugly) than this to get it. > I can go through libxl_get_physinfo(), rather than through > _topologyinfo(), but the result is not going to be that different. > > I'll keep looking but, in case I really don't find anything, do you want > me to: > - stick with this, at least for now; > - introduce a new libxl (and probably libxc too) interface for thisI don't understand why libxl_get_max_cpus, which is certainly at least as big as you need, isn't sufficient here. Especially since in the same function you call libxl_cpu_bitmap_alloc(...,..., 0) which uses libxl_get_max_cpus.
You then use this with "libxl_bitmap_equal(cpumap, &ecpumap, nr_cpus)" where surely the sizes of cpumap and ecpumap could be used and/or are what actually matter? (and shouldn't you be checking that the sizes meet some constraint relative to each other?)> And I'm asking with this patch series in mind... I mean, I of course can > add to my TODO list to do the latter, but do you think it's a > prerequisite for accepting this patch?> > Just let me know. > > Regards, > Dario >
Dario Faggioli
2013-Nov-20 12:18 UTC
Re: [PATCH v3 11/14] libxl: get and set soft affinity
On mer, 2013-11-20 at 12:05 +0000, Ian Campbell wrote:> On Wed, 2013-11-20 at 13:00 +0100, Dario Faggioli wrote: > > I'll keep looking but, in case I really don't find anything, do you want > > me to: > > - stick with this, at least for now; > > - introduce a new libxl (and probably libxc too) interface for this > > I don't understand why libxl_get_max_cpus, which is certainly at least > as big as you need, isn't sufficient here. Especially since in the same > function you call libxl_cpu_bitmap_alloc(...,..., 0) which uses > libxl_get_max_cpus. >As a matter of fact, it's _too_ big! The point here is that, whenever the caller gives me a cpumap that comes from him specifying "all" in the config file, this cpumap is, for instance, on my system, an unsigned int full of 1-s. If you manage to (pretty) print it, that would look like "0-63". After going down to Xen and then back from there, i.e., what happens to ecpumap, the "all" from above has become, again on my system, where I have 16 cpus, something like "0-15". That is, looking at the bits in the actual uint, the first 16 of them set to 1, the others to 0. Well, what I want is to be able to _not_ warn the user in this case, since he asked for all the cpus, and that's what he's getting, but for doing that I need to be able to restrict the comparison that is happening in libxl_bitmap_equal() to *only* the actual number of cpus, not the theoretically supported one.> You then use this with "libxl_bitmap_equal(cpumap, &ecpumap, nr_cpus)" > where surely the sizes of cpumap and ecpumap could be used and/or are > what actually matter? (and shouldn't you be checking that the sizes meet > some constraint relative to each other?) >Fair enough. It's very unlikely, but yes, since the interface allows to define bitmaps of different sizes, I should probably check that. Good point actually, but unrelated to and unhelpful to solve my issue.
Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
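A minimal sketch of the comparison described above, assuming a bitmap is a byte array plus a size in bytes (roughly how libxl lays out its cpumaps). The `struct bitmap` type and the `bitmap_equal_upto` helper are hypothetical, not libxl API. The helper compares only the first `nr_bits` bits and refuses to compare if either map is too small, which also covers Ian's point about checking the two sizes against each other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a libxl-style bitmap: 'size' is in bytes. */
struct bitmap {
    const uint8_t *map;
    size_t size;
};

/* Value of bit 'bit' (bit 0 is the least significant bit of byte 0). */
static bool bitmap_test(const struct bitmap *b, size_t bit)
{
    return (b->map[bit / 8] >> (bit % 8)) & 1;
}

/*
 * Compare only the first nr_bits bits of a and b (e.g. nr_bits = number
 * of pCPUs actually present). Returns -1 if either bitmap is too small
 * to hold nr_bits bits, 1 if the two agree on that range, 0 otherwise.
 */
static int bitmap_equal_upto(const struct bitmap *a, const struct bitmap *b,
                             size_t nr_bits)
{
    size_t need = (nr_bits + 7) / 8;

    assert(a != NULL && b != NULL);
    if (a->size < need || b->size < need)
        return -1;
    for (size_t i = 0; i < nr_bits; i++)
        if (bitmap_test(a, i) != bitmap_test(b, i))
            return 0;
    return 1;
}
```

With the "all" map (64 bits set) and the map returned on a 16-pCPU host (bits 0-15 set), this reports equality for `nr_bits = 16` and a mismatch for `nr_bits = 64`; the latter is the spurious warning the patch wants to avoid.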
On Wed, 2013-11-20 at 13:18 +0100, Dario Faggioli wrote:
> As a matter of fact, it's _too_ big!
>
> The point here is that, whenever the caller gives me a cpumap that comes
> from him specifying "all" in the config file, this cpumap is, for
> instance, on my system, and unsigned int full of 1-s. If you manage to
> (pretty) print it, that would look like "0-63".
>
> After going down to Xen and then back from there, i.e., what happens to
> ecpumap, the "all" from above has become, again on my system, where I
> have 16 cpus, something like "0-15". That is, looking at the bits in the
> actual uint, the first 16 of them to 1, the other to 0.

And the returned bitmap doesn't have a size == 16? That's not very helpful, I suppose.

It seems like it should be quite quick to wire up xc_get_nr_cpus based on xc_get_max_cpus and use that.

Is there not a race condition here somewhere -- what happens if a CPU is on/offlined during all this?

Ian.
On 20/11/13 11:30, Ian Campbell wrote:
> On Tue, 2013-11-19 at 19:58 +0100, Dario Faggioli wrote:
>> As per the mechanism used to amend the current API, I don't have a real
>> preference. Both me and George asked (in a previous thread) what
>> mechanism we should use, but we did not get much feedback at that time
>> (not complain, just summing up what has happened).
> Yes, sorry about that.
>
>> Right now, I think I just need for someone authoritative enough to
>> express his preference. You certainly qualify as such, but, if possible,
>> I'd like to see what the other Ian thinks too, as he's been quite deeply
>> involved in reviewing this interface.
> I think the major argument in favour of the LIBXL_API_VERSION based
> approach is that this is how we have dealt with all the previous such
> changes. Having a mixture of this and fooN interfaces seems to me to be
> worse that either option by itself.

Just to be clear -- if we bump the API number, then we have to do some #ifdef-ery to allow programs compiled against the old API to compile, right?

-George
On Wed, 2013-11-20 at 13:59 +0000, George Dunlap wrote:
> On 20/11/13 11:30, Ian Campbell wrote:
> > I think the major argument in favour of the LIBXL_API_VERSION based
> > approach is that this is how we have dealt with all the previous such
> > changes. Having a mixture of this and fooN interfaces seems to me to be
> > worse that either option by itself.
>
> Just to be clear -- if we bump the API number, then we have to do some
> #ifdef-ery to allow programs compiled against the old API to compile, right?

Yes.

I expect that the libxl_domain_create_restore machinery is the template which would be desired here.

This does not remove the need for a LIBXL_HAVE_FOO either -- they are somewhat orthogonal. (In practice at least libvirt prefers the LIBXL_HAVE mechanism.)

Ian.
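A minimal sketch of the two mechanisms Ian mentions, with made-up names throughout: `MYLIB_API_VERSION`, `MYLIB_HAVE_SOFT_AFFINITY`, `mylib_set_vcpuaffinity`, `struct cpumap` and the version number are all hypothetical, not the real libxl declarations. The new function gains a soft-affinity argument. A caller that defines an older API version before including the header keeps compiling its 3-argument calls, because a compatibility macro supplies NULL for the new argument. Independently of that, a HAVE macro lets callers detect the feature at build time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cpumap { uint8_t bits[8]; };

/* Advertise the feature, in the spirit of a LIBXL_HAVE_* macro. */
#define MYLIB_HAVE_SOFT_AFFINITY 1

/* Records what the last call received, so the behaviour can be checked. */
static const struct cpumap *last_hard, *last_soft;

/* New API: soft may be NULL, meaning "leave soft affinity alone". */
static int mylib_set_vcpuaffinity(uint32_t domid, uint32_t vcpuid,
                                  const struct cpumap *hard,
                                  const struct cpumap *soft)
{
    (void)domid;
    (void)vcpuid;
    last_hard = hard;
    last_soft = soft;
    return 0;
}

/*
 * Old callers opt in by defining an older MYLIB_API_VERSION before
 * including the header. The macro comes after the definition so it does
 * not rewrite it, and a function-like macro is not re-expanded inside
 * its own replacement, so this does not recurse.
 */
#if defined(MYLIB_API_VERSION) && MYLIB_API_VERSION < 0x040400
#define mylib_set_vcpuaffinity(domid, vcpuid, hard) \
    mylib_set_vcpuaffinity((domid), (vcpuid), (hard), NULL)
#endif
```

An old program would write `#define MYLIB_API_VERSION 0x040300` before the include and keep calling `mylib_set_vcpuaffinity(domid, vcpu, &map)`. New programs can use `#ifdef MYLIB_HAVE_SOFT_AFFINITY` to decide whether to pass a soft map at all.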
On 19/11/13 16:09, Dario Faggioli wrote:
> On mar, 2013-11-19 at 15:41 +0000, George Dunlap wrote:
>> On 11/18/2013 06:18 PM, Dario Faggioli wrote:
>>> Make space for two new cpumap-s, one in vcpu_info (for getting
>>> soft affinity) and build_info (for setting it). Provide two
>>> new API calls:
>>>
>>> * libxl_set_vcpuaffinity2, taking a cpumap and setting either
>>> hard, soft or both affinity to it, depending on 'flags';
>>> * libxl_set_vcpuaffinity3, taking two cpumap, one for hard
>>> and one for soft affinity.
>>>
>>> The bheavior of the existing libxl_set_vcpuaffinity is left
>>> unchanged, i.e., it only set hard affinity.
>>>
>>> Getting soft affinity happens indirectly, via `xl vcpu-list'
>>> (as it is already for hard affinity).
>>>
>>> The new calls include logic to check whether the affinity which
>>> will be used by Xen to schedule the vCPU(s) does actually match
>>> with the cpumap provided. In fact, we want to allow every possible
>>> combination of hard and soft affinities to be set, but we warn
>>> the user upon particularly weird combinations (e.g., hard and
>>> soft being disjoint sets of pCPUs).
>>>
>>> Also, this is the first change breaking the libxl ABI, so it
>>> bumps the MAJOR.
>>>
>>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>> The interface is fine with me (I would probably just have 2 and not 3,
>> but I'm OK with both). Just a few minor comments:
>>
> the '3' variant (tries to) accomplish what IanJ explicitly asked: having
> a way to set both hard and soft affinity at the same time, and each with
> its own value, and only checking for consistency at the very end.
>
> I also wasn't sure whether that would have been actually useful but, I
> have to admit, it turned out it is, as it can be seen in the following
> patches, when the interface is used to (re)implement both the existing
> and the new xl commands and command variants.
>
> Let's see what IanJ thinks, I guess.
> :-)

BTW, I wasn't complaining -- I know IanJ wanted it the second way; I was just re-registering my preference. :-)

-George
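The patch description says the new calls warn on "particularly weird combinations (e.g., hard and soft being disjoint sets of pCPUs)". Below is a minimal sketch of just that one check. It assumes both cpumaps are plain byte arrays of the same length, and both function names are hypothetical. The actual libxl logic may cover more cases. Note that the sketch only warns and does not fail, matching the stated intent of allowing every combination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* True if at least one pCPU is set in both maps (each nbytes long). */
static bool cpumaps_intersect(const uint8_t *hard, const uint8_t *soft,
                              size_t nbytes)
{
    assert(hard != NULL && soft != NULL);
    for (size_t i = 0; i < nbytes; i++)
        if (hard[i] & soft[i])
            return true;
    return false;
}

/*
 * Warn (but do not fail) when hard and soft affinity share no pCPU.
 * Returns true if a warning was printed.
 */
static bool check_affinity_combination(const uint8_t *hard,
                                       const uint8_t *soft, size_t nbytes)
{
    if (!cpumaps_intersect(hard, soft, nbytes)) {
        fprintf(stderr, "warning: hard and soft affinity are disjoint\n");
        return true;
    }
    return false;
}
```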
On 20/11/13 11:40, Dario Faggioli wrote:
> On mer, 2013-11-20 at 11:32 +0000, Ian Campbell wrote:
>> On Wed, 2013-11-20 at 11:29 +0000, George Dunlap wrote:
>>> On 20/11/13 11:27, Ian Campbell wrote:
>>>>> I did actually check and, of all the enum-s in the IDL, none are used as
>>>>> flags, they're rather used as "single values". OTOH, the only actual
>>>>> flags I found (I think it was LIBXL_SUSPEND_DEBUG, LIBXL_SUSPEND_LIVE)
>>>>> were defined like I did myself above... That's why I went for it.
>>>> I have a feeling they predate the IDL, or at least the Enumeration
>>>> support. It's true that we don't have any other bit fields in enums
>>>> though. I can't see the harm, it's probably not worth introducing a new
>>>> IDL type for them.
>>> Since these are bits, not numbers, I don't think an enum is the right
>>> construct. Or, the enum values should be the *bit numbers*, and the
>>> flags should be (1<<[bit_humber]).
>> That would need a new IDL type to express. In which case lets just leave
>> the raw #defines, unless anyone else has a strong opinion.
>>
> That would probably the best option, at least for now. Of course, I can
> add "introduce a new IDL type for bitfields" to my TODO list, and send a
> followup patch for 4.5.

But if we end up doing as IanC suggests -- having a new function which accepts two pointers, either of which can be NULL -- then the need for these OR-able flags goes away, doesn't it?

-George
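George's alternative, where the enum holds *bit numbers* and the OR-able flags are derived as `1 << bit`, looks roughly like this in plain C. All names are hypothetical. In the thread the actual flags stayed as raw #defines, and expressing this pattern in the libxl IDL would need a new type. The validity check shows one benefit of having the known bits in one place: unknown bits can be rejected.

```c
#include <assert.h>
#include <stdint.h>

/* Enum values are bit positions, not flag values. */
enum vcpuaffinity_bit {
    VCPUAFFINITY_BIT_HARD = 0,
    VCPUAFFINITY_BIT_SOFT = 1,
};

/* OR-able flags derived from the bit numbers. */
#define VCPUAFFINITY_FLAG(bit) (UINT32_C(1) << (bit))
#define VCPUAFFINITY_HARD VCPUAFFINITY_FLAG(VCPUAFFINITY_BIT_HARD)
#define VCPUAFFINITY_SOFT VCPUAFFINITY_FLAG(VCPUAFFINITY_BIT_SOFT)

/* Reject an empty flag word, or one carrying bits nobody defined. */
static int vcpuaffinity_flags_valid(uint32_t flags)
{
    const uint32_t known = VCPUAFFINITY_HARD | VCPUAFFINITY_SOFT;

    return flags != 0 && (flags & ~known) == 0;
}
```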
Dario Faggioli
2013-Nov-20 14:50 UTC
Re: [PATCH v3 11/14] libxl: get and set soft affinity
On mer, 2013-11-20 at 12:26 +0000, Ian Campbell wrote:
> On Wed, 2013-11-20 at 13:18 +0100, Dario Faggioli wrote:
> > After going down to Xen and then back from there, i.e., what happens to
> > ecpumap, the "all" from above has become, again on my system, where I
> > have 16 cpus, something like "0-15". That is, looking at the bits in the
> > actual uint, the first 16 of them to 1, the other to 0.
>
> And the returned bitmap doesn't have a size == 16? That's not very
> helpful I suppose.
>
Where? I mean, what size are you talking about? In libxl, it is libxl itself that allocates the bitmap and decides, at allocation time (with the third parameter of libxl_cpu_bitmap_alloc()), how many bits I want there, and nothing changes that. So, after having allocated a 64-bit cpumap, there is no way it can tell that only 16 bits are meaningful.

I could allocate the cpumap more precisely but, for one, that would still require figuring out the actual number of pcpus. Also, that would work for 16, but for any other value that is not a multiple of the number of bits in a uint8_t (i.e., 8), I'd have to face the same problem.

> It seems like it should be quite quick to wire up xc_get_nr_cpus based
> on xc_get_max_cpus and use that.
>
No it's not. On it.

> Is there not a race condition here somewhere -- what happens if a CPU is
> on/offlined during all this?
>
Again, I'm not getting it. What's the window where you're worried about races, if on/offlining is involved? What do you refer to with "during all this"?

Regards,
Dario
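Allocating "more precisely" does not remove the problem. A bitmap of `nr` bits needs `(nr + 7) / 8` bytes, so unless `nr` is a multiple of 8, the last byte carries padding bits that a byte-wise comparison would still look at. The sketch below computes the size and the mask that selects only the valid bits of the last byte. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to hold nr_bits bits, rounded up. */
static size_t bitmap_bytes(size_t nr_bits)
{
    return (nr_bits + 7) / 8;
}

/* Mask of the meaningful bits in the last byte (0xff if none are padding). */
static uint8_t last_byte_mask(size_t nr_bits)
{
    assert(nr_bits > 0);
    size_t rem = nr_bits % 8;

    return rem ? (uint8_t)((1u << rem) - 1) : 0xff;
}
```

For example, with 12 pCPUs two bytes are needed and only the low nibble of the second byte is meaningful. An "all" map stored as `0xff 0xff` and a returned map of `0xff 0x0f` compare equal only after the last byte is masked with `0x0f`.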
Dario Faggioli
2013-Nov-20 14:52 UTC
Re: [PATCH v3 11/14] libxl: get and set soft affinity
On mer, 2013-11-20 at 14:45 +0000, George Dunlap wrote:
> On 20/11/13 11:40, Dario Faggioli wrote:
> > That would probably the best option, at least for now. Of course, I can
> > add "introduce a new IDL type for bitfields" to my TODO list, and send a
> > followup patch for 4.5.
>
> But if we end up doing as IanC suggests -- having a new function which
> accepts two pointers, either of which can be NULL -- then the need for
> these OR-able flags goes away, doesn't it?
>
Good point, it indeed does. That makes me even more of a fan of that solution. :-D

Regards,
Dario
On Wed, 2013-11-20 at 15:50 +0100, Dario Faggioli wrote:
> On mer, 2013-11-20 at 12:26 +0000, Ian Campbell wrote:
> > And the returned bitmap doesn't have a size == 16? That's not very
> > helpful I suppose.
> >
> Where? I mean what size are you talking about? In libxl, it is libxl
> itself that allocates the bitmap and decides, at allocation time (with
> the third parameter of libxl_cpu_bitmap_alloc()) how many bits I want
> there, and nothing changes that. So, after having allocated a cpumap 64
> bits big, there is no way it can tell that only 16 are worth.
>
> I can allocate the cpumap more precisely but, for one, that would still
> require figuring out the actual number of pcpus. Also, that would work
> for 16, but for any other value that is not multiple of sizeof(uint8_t),
> I'd have to face the same problem.

Oh, OK. I figured max_cpu would be stashed there somewhere.

> > It seems like it should be quite quick to wire up xc_get_nr_cpus based
> > on xc_get_max_cpus and use that.
> >
> No it's not. On it.

Great.

> > Is there not a race condition here somewhere -- what happens if a CPU is
> > on/offlined during all this?
> >
> Again, I'm not getting. What's the window where you're worried about
> races, if on/offlining is involved? What do you refer to with "during
> all this" ?

Between getting the maximum cpu number and checking the results of a pin call. What happens if a CPU went away, so that when checking you think there are 16 cpus (based on old information), but when the pin hypercall was made there were only 15? Or, conversely, if a CPU was plugged in?

Also, does this check fail if the cpumask is sparse? Is that something which can happen, e.g. when unplugging CPU#8 in a 16-CPU system?

Ian.
Dario Faggioli
2013-Nov-20 16:27 UTC
Re: [PATCH v3 11/14] libxl: get and set soft affinity
On mer, 2013-11-20 at 14:56 +0000, Ian Campbell wrote:
> On Wed, 2013-11-20 at 15:50 +0100, Dario Faggioli wrote:
> > Again, I'm not getting. What's the window where you're worried about
> > races, if on/offlining is involved? What do you refer to with "during
> > all this" ?
>
> Between getting the maximum cpu number and checking the results of a pin
> call. What happens if a CPU went away such that you think when checking
> that there is 16 cpus (based on old information) but when the pin
> hypercall was made there were only 15. Or conversely if a CPU was
> plugged in.
>
Well, for the sake of this patch, all we risk is printing a spurious warning, or missing one.

Anyway, I see what you mean now, and I guess I can try to mitigate it by moving the check for the number of cpus after the affinity-setting call. That would at least make it more likely for the information about the number of cpus to be consistent with the result of the call, but it won't eliminate the possibility of races. In fact, I don't think I can avoid that with 100% certainty, as there is no way to get both the result of the affinity-setting call and the number of cpus atomically, and I don't think it's worth introducing one just for this...

> Also, does this check fail if the cpumask is sparse? Is that something
> which can happen e.g. unplugging CPU#8 in a 16 CPU system?
>
Well, in that case I guess it'd be fine to print the warning. I mean, if the user wanted affinity to cpu#8 and that went away, it's good to tell him he's not getting (exactly) what he asked for, isn't it?

Regards,
Dario
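Here is a minimal sketch of the mitigation Dario describes: issue the affinity call first, query the number of pCPUs afterwards, and only then compare. The two callbacks are hypothetical stand-ins for the hypercall wrapper and for an "xc_get_nr_cpus"-style query, which per the thread did not exist yet. The simulated host (16 pCPUs, CPU#8 offline) is for illustration only. The race window is narrowed, not closed, so the outcome is at most a warning. A sparse result, as in the CPU#8 case, does trigger the warning, which is the behaviour Dario argues for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_BYTES 8  /* room for 64 pCPUs, as in the "0-63" example */

/* Hypothetical hooks: set affinity (fills the effective map), count pCPUs. */
typedef int (*set_affinity_fn)(const uint8_t *req, uint8_t *eff);
typedef int (*nr_cpus_fn)(void);

static bool test_bit(const uint8_t *map, int bit)
{
    return (map[bit / 8] >> (bit % 8)) & 1;
}

/*
 * Returns <0 on error, 1 if the effective map differs from the request
 * within the first nr_cpus bits (the caller should warn), 0 otherwise.
 */
static int set_and_check(const uint8_t *req, set_affinity_fn set,
                         nr_cpus_fn nr_cpus)
{
    uint8_t eff[MAP_BYTES] = {0};
    int rc = set(req, eff);

    if (rc < 0)
        return rc;

    /* Queried *after* the call, to narrow (not close) the race window. */
    int n = nr_cpus();
    if (n < 0)
        return n;
    if (n > MAP_BYTES * 8)
        n = MAP_BYTES * 8;

    for (int i = 0; i < n; i++)
        if (test_bit(req, i) != test_bit(eff, i))
            return 1;
    return 0;
}

/* Simulated host, for illustration only: 16 pCPUs, CPU#8 offline. */
static int fake_set_affinity(const uint8_t *req, uint8_t *eff)
{
    for (int i = 0; i < 16; i++)
        if (i != 8 && test_bit(req, i))
            eff[i / 8] |= (uint8_t)(1u << (i % 8));
    return 0;
}

static int fake_nr_cpus(void)
{
    return 16;
}
```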
Ian Campbell writes ("Re: [PATCH v3 11/14] libxl: get and set soft affinity"):
> On Wed, 2013-11-20 at 13:59 +0000, George Dunlap wrote:
> > Just to be clear -- if we bump the API number, then we have to do some
> > #ifdef-ery to allow programs compiled against the old API to compile, right?
>
> Yes.
>
> I expect that the libxl_domain_create_restore machinery is the template
> which would be desired here.
>
> This does not remove the need for a LIBXL_HAVE_FOO either -- they are
> somewhat orthogonal. (In practice at least libvirt prefers the
> LIBXL_HAVE mechanism)

For the record, I don't object to this approach, if that's what George and Ian C prefer. I agree that the profusion of variants is ugly.

Ian.
Dario Faggioli
2013-Nov-20 17:46 UTC
Re: [PATCH v3 11/14] libxl: get and set soft affinity
On mer, 2013-11-20 at 16:59 +0000, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH v3 11/14] libxl: get and set soft affinity"):
> > I expect that the libxl_domain_create_restore machinery is the template
> > which would be desired here.
> >
> > This does not remove the need for a LIBXL_HAVE_FOO either -- they are
> > somewhat orthogonal. (In practice at least libvirt prefers the
> > LIBXL_HAVE mechanism)
>
> For the record, I don't object to this approach, if that's what George
> and Ian C prefer. I agree that the profusion of variants is ugly.
>
Fine then. Going for it.

Thanks and Regards,
Dario
Dario Faggioli
2013-Nov-22 18:55 UTC
Re: [PATCH v3 09/14] xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
So, about this: here is what I did in the new version of this series, which I'm releasing right now.

On mar, 2013-11-19 at 16:39 +0000, Jan Beulich wrote:
> >>> On 18.11.13 at 19:17, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> > --- a/xen/common/domctl.c
> > +++ b/xen/common/domctl.c
> > @@ -617,19 +617,65 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> >          if ( op->cmd == XEN_DOMCTL_setvcpuaffinity )
> >          {
> >              cpumask_var_t new_affinity;
> > +            cpumask_t *online;
> >
> >              ret = xenctl_bitmap_to_cpumask(
> >                  &new_affinity, &op->u.vcpuaffinity.cpumap);
> > -            if ( !ret )
> > +            if ( ret )
> > +                break;
> > +
> > +            ret = -EINVAL;
> > +            if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_HARD )
> > +                ret = vcpu_set_hard_affinity(v, new_affinity);
> > +            if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT )
> > +                ret = vcpu_set_soft_affinity(v, new_affinity);
>
> You're discarding an eventual error indicator from
> vcpu_set_hard_affinity() here.
>
I fixed this.

> > +
> > +            if ( ret )
> > +                goto setvcpuaffinity_out;
>
> Considering that you're going to return an error here, the caller
> may expect that the call did nothing, even if
> vcpu_set_hard_affinity() succeeded and vcpu_set_soft_affinity()
> failed. I know this is ugly to handle...
>
And this too.

> > +
> > +            /*
> > +             * Report back to the caller what the "effective affinity", that
> > +             * is the intersection of cpupool's pcpus, the (new?) hard
> > +             * affinity and the (new?) soft-affinity.
> > +             */
> > +            if ( !guest_handle_is_null(op->u.vcpuaffinity.eff_cpumap.bitmap) )
> >              {
> > -                ret = vcpu_set_affinity(v, new_affinity);
> > -                free_cpumask_var(new_affinity);
> > +                online = cpupool_online_cpumask(v->domain->cpupool);
> > +                cpumask_and(new_affinity, online, v->cpu_hard_affinity);
> > +                if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT)
> > +                    cpumask_and(new_affinity, new_affinity,
> > +                                v->cpu_soft_affinity);
> > +
> > +                ret = cpumask_to_xenctl_bitmap(
> > +                    &op->u.vcpuaffinity.eff_cpumap, new_affinity);
>
> Considering that you have two bitmaps available from the caller,
> can't you just return both when both flags are set?
>
Well, it's true that there are two cpumaps, but only one is meant to be an output parameter. Also, I think it is more useful like this, i.e., returning either:
 - hard-affinity&online, or
 - hard-affinity&soft-affinity&online,

as can be seen in the libxl patch (in the new series).

Therefore, I kept this interface as it was here, also considering that:
 - it's pretty late to re-re-redesign;
 - neither this nor the xc one are stable interfaces, so we can come back and revisit this later, if we want to.

Do you think this could be acceptable?

Thanks and Regards,
Dario
Jan Beulich
2013-Nov-25 09:32 UTC
Re: [PATCH v3 09/14] xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
>>> On 22.11.13 at 19:55, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> On mar, 2013-11-19 at 16:39 +0000, Jan Beulich wrote:
>> >>> On 18.11.13 at 19:17, Dario Faggioli <dario.faggioli@citrix.com> wrote:
>> > +                online = cpupool_online_cpumask(v->domain->cpupool);
>> > +                cpumask_and(new_affinity, online, v->cpu_hard_affinity);
>> > +                if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT)
>> > +                    cpumask_and(new_affinity, new_affinity,
>> > +                                v->cpu_soft_affinity);
>> > +
>> > +                ret = cpumask_to_xenctl_bitmap(
>> > +                    &op->u.vcpuaffinity.eff_cpumap, new_affinity);
>>
>> Considering that you have two bitmaps available from the caller,
>> can't you just return both when both flags are set?
>>
> Well, it's true that there are two cpumaps, but only one is meant to be
> an output parameter.

"meant to be" based on what? Surely not because so far it was that way - as you say further down, this is not an interface required to be stable.

> Also, I think this is more useful like this, i.e.,
> returning either:
> - hard-affinity&online
> - hard-affinity&soft-affinity&online,
>
> As it can be seen in the libxl patch (in the new series).
>
> Therefore, I kept this interface as it was here, also considering that:
> - it's pretty late to re-re-redesign;
> - neither this nor the xc one are stable interfaces, so we can come
> back and revisit this later, if we want to.
>
> Do you think this could be acceptable?

I wouldn't veto it, but I also dislike reduced flexibility when more flexibility is obviously achievable without much effort.

Jan
Dario Faggioli
2013-Nov-25 09:54 UTC
Re: [PATCH v3 09/14] xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
On lun, 2013-11-25 at 09:32 +0000, Jan Beulich wrote:
> >>> On 22.11.13 at 19:55, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> > Therefore, I kept this interface as it was here, also considering that:
> > - it's pretty late to re-re-redesign;
> > - neither this nor the xc one are stable interfaces, so we can come
> > back and revisit this later, if we want to.
> >
> > Do you think this could be acceptable?
>
> I wouldn't veto it, but I also dislike reduced flexibility when more
> flexibility is obviously achievable without much effort.
>
Ok, understood. I'm up for changing this then. So, let me ask a few questions, just to make sure I get it right this time! ;-P

You are saying the interface should look as follows:

    int xc_vcpu_setaffinity(xc_interface *xch,
                            uint32_t domid,
                            int vcpu,
                            xc_cpumap_t cpumap_soft,
                            xc_cpumap_t cpumap_hard,
                            uint32_t flags);

where both cpumap_soft and cpumap_hard are IN/OUT parameters and, as far as OUT is concerned:
 - cpumap_hard will contain hard-affinity&online
 - cpumap_soft will contain what?
   (a) soft-affinity?
   (b) soft-affinity&online
   (c) soft-affinity&hard-affinity&online?

I would make it (c), as (a) is what xc_vcpu_getaffinity() retrieves, while the '&online' part is particularly hard to get from other existing calls (at either the libxc or the libxl level). Between (b) and (c), (c) is what would be most useful for a high-level caller (like libxl) to have ready. It's also true, though, that if I give it (b), it could in theory reconstruct (c) by itself (while the reverse is not true).

Thoughts?

Also, should I return something *only* in the map whose corresponding flag is set (and of course fill both when both flags are)? Or do we always want to get something back (when non-NULL, of course), and apply the flags only to the IN path (i.e., for setting either or both of the affinities)?
Thanks and Regards,
Dario
Jan Beulich
2013-Nov-25 10:00 UTC
Re: [PATCH v3 09/14] xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
>>> On 25.11.13 at 10:54, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> On lun, 2013-11-25 at 09:32 +0000, Jan Beulich wrote:
>> I wouldn't veto it, but I also dislike reduced flexibility when more
>> flexibility is obviously achievable without much effort.
>>
> Ok, understood. Ok, I'm up for changing this then. So, let m ask a few
> questions, just to make sure ot get it right this time! ;-P
>
> You are saying the interface should look as follows:
>
>     int xc_vcpu_setaffinity(xc_interface *xch,
>                             uint32_t domid,
>                             int vcpu,
>                             xc_cpumap_t cpumap_soft,
>                             xc_cpumap_t cpumap_hard,
>                             uint32_t flags);
>
> Where both cpumap_soft and cpumap_hard are IN/OUT parameters and, as far
> as OUT is concerned:
> - cpumap_hard will contain hard-affinity&online
> - cpumap_soft will contain what?
> (a) soft-affinity?
> (b) soft-affinity&online
> (c) soft-affinity&hard-affinity&online?

(c) seems the best fit - after all it should represent what the hypervisor will effectively use.

> I would make it (c), as (a) is what xc_vcpu_setaffinity() retrieves,
> while the '&online' part is particularly hard to get from other existing
> calls (either at libxc and libxl level). Among (b) and (c), (c) is what
> would be most useful for an high level caller (like libxl) to have it
> ready, but it's also true that, if I give (b) to it, it could in theory
> reconstruct (c) by himself (with the vice-versa not being true).
>
> Thoughts?
>
> Also, should I return something in either *only* of the maps when the
> corresponding flag is set (and of course fill both when both flags are)?

Of course - the other handle may validly be containing garbage in that case afaict.

Jan
George Dunlap
2013-Nov-25 10:58 UTC
Re: [PATCH v3 09/14] xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
On 11/25/2013 10:00 AM, Jan Beulich wrote:
>>>>> On 25.11.13 at 10:54, Dario Faggioli <dario.faggioli@citrix.com> wrote:
>> Where both cpumap_soft and cpumap_hard are IN/OUT parameters and, as far
>> as OUT is concerned:
>> - cpumap_hard will contain hard-affinity&online
>> - cpumap_soft will contain what?
>> (a) soft-affinity?
>> (b) soft-affinity&online
>> (c) soft-affinity&hard-affinity&online?
> (c) seems the best fit - after all it should represent what the
> hypervisor will effectively use.

+1