Dario Faggioli
2013-Nov-22 18:56 UTC
[PATCH v4 00/15] Implement vcpu soft affinity for credit1
Take 4 for soft affinities. Very briefly, what this series does is allow each vcpu to have:

 - a hard affinity, which vcpus already have, and which we usually call pinning. This is the list of pcpus where a vcpu is allowed to run;
 - a soft affinity, which this series introduces. This is the list of pcpus where a vcpu *prefers* to run.

Once that is in place, per-vcpu NUMA-aware scheduling is easily implemented on top of it, just by instructing libxl to issue the proper call to set up the soft affinity of the domain's vcpus to be equal to its node-affinity.

In this version, I addressed all the comments I got during v3 [*], mostly at the libxl and xl level. For xl, I re-organized and refactored the code, as was requested. For libxl, I changed the interface as IanC suggested, so now there is only one function dealing with both hard and soft affinities, via 2 cpumaps, either of which (but not both!) can be NULL. To honour libxl API stability, LIBXL_API_VERSION #if-deffery is exploited, as also suggested during v3 review.
Release wise, George expressed himself in favour of a freeze exception here:
http://bugs.xenproject.org/xen/mid/<528B8B25.6000608@eu.citrix.com>

The patch series is available in the following git branch:

  git://xenbits.xen.org/people/dariof/xen.git numa/per-vcpu-affinity-v4

Thanks and Regards,
Dario

[*] http://bugs.xenproject.org/xen/mid/<20131118175544.31002.79574.stgit@Solace>
---
Dario Faggioli (15):
 a    xl: match output of vcpu-list with pinning syntax
      libxl: sanitize error handling in libxl_get_max_{cpus,nodes}
 n    libxl: introduce libxl_get_nr_cpus()
 ra   xl: allow for node-wise specification of vcpu pinning
 a    xl: implement and enable dryrun mode for `xl vcpu-pin'
 a    xl: test script for the cpumap parser (for vCPU pinning)
 r    xen: sched: rename v->cpu_affinity into v->cpu_hard_affinity
 r    xen: sched: introduce soft-affinity and use it instead of d->node-affinity
 r    xen: derive NUMA node affinity from hard and soft CPU affinity
 r    xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
 a    libxc: get and set soft and hard affinity
      libxl: get and set soft affinity
      xl: enable getting and setting soft affinities
 *    xl: enable for specifying node-affinity in the config file
 a    libxl: automatic NUMA placement affects soft affinity

n = new in v4
r = has been 'Reviewed-by'
a = has been 'Acked-by'
* = has been 'Acked-by' but the implementation changed a bit (IanJ explicitly said to point this out)

 docs/man/xl.cfg.pod.5                           |   64 ++-
 docs/man/xl.pod.1                               |   24 +
 docs/misc/xl-numa-placement.markdown            |  162 +++---
 tools/libxc/xc_domain.c                         |   47 +-
 tools/libxc/xenctrl.h                           |   44 ++
 tools/libxl/Makefile                            |    2
 tools/libxl/check-xl-vcpupin-parse              |  294 ++++++++++++++
 tools/libxl/check-xl-vcpupin-parse.data-example |   53 +++
 tools/libxl/libxl.c                             |   96 ++++-
 tools/libxl/libxl.h                             |   41 ++
 tools/libxl/libxl_create.c                      |    6
 tools/libxl/libxl_dom.c                         |   23 +
 tools/libxl/libxl_types.idl                     |    4
 tools/libxl/libxl_utils.c                       |   76 ++++
 tools/libxl/libxl_utils.h                       |   47 +-
 tools/libxl/xl_cmdimpl.c                        |  482 +++++++++++++++--------
 tools/libxl/xl_cmdtable.c                       |    5
 tools/ocaml/libs/xc/xenctrl_stubs.c             |    8
 tools/python/xen/lowlevel/xc/xc.c               |    6
 xen/arch/x86/traps.c                            |   13 -
 xen/common/domain.c                             |   86 ++--
 xen/common/domctl.c                             |   92 ++++
 xen/common/keyhandler.c                         |    4
 xen/common/sched_credit.c                       |  161 +++---
 xen/common/sched_sedf.c                         |    2
 xen/common/schedule.c                           |   57 ++-
 xen/common/wait.c                               |   10
 xen/include/public/domctl.h                     |   15 +
 xen/include/xen/sched.h                         |   14 -
 29 files changed, 1452 insertions(+), 486 deletions(-)
 create mode 100755 tools/libxl/check-xl-vcpupin-parse
 create mode 100644 tools/libxl/check-xl-vcpupin-parse.data-example

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
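To make the hard/soft distinction described in the cover letter concrete, here is a toy model of the intended semantics (an illustrative sketch only, not the credit1 scheduler code; pick_pcpu() and the single-word bitmaps are made up for this example): a pcpu is eligible only if it is in the hard-affinity mask, and among the eligible ones, pcpus also in the soft-affinity mask are preferred, falling back to the plain hard mask when the two do not intersect.

```c
#include <stdint.h>

/* Toy model: affinity masks as 64-bit bitmaps, bit i == pcpu i.
 * pick_pcpu() is a made-up helper, NOT the actual Xen code. */
static int pick_pcpu(uint64_t hard, uint64_t soft)
{
    uint64_t preferred = hard & soft;                 /* allowed AND preferred */
    uint64_t eligible = preferred ? preferred : hard; /* fall back to hard */
    int cpu;

    if (!eligible)
        return -1;                                    /* empty hard affinity */
    for (cpu = 0; !(eligible & ((uint64_t)1 << cpu)); cpu++)
        ;                                             /* lowest eligible pcpu */
    return cpu;
}
```

The point of the fallback is that soft affinity is only a preference: an empty intersection must never prevent the vcpu from running inside its hard mask.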
Dario Faggioli
2013-Nov-22 18:56 UTC
[PATCH v4 01/15] xl: match output of vcpu-list with pinning syntax
in fact, pinning to all the pcpus happens by specifying "all" (either on the command line or in the config file), while `xl vcpu-list' reports it as "any cpu". Change this into something more consistent, by using "all" everywhere.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes since v1:
 * this patch was not there in v1. It is there now as using the same syntax for both input and output was requested during review.
---
 tools/libxl/xl_cmdimpl.c | 27 +++++++-------------------
 1 file changed, 7 insertions(+), 20 deletions(-)

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 341863e..26d7712 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -3118,8 +3118,7 @@ out:
     }
 }

-/* If map is not full, prints it and returns 0. Returns 1 otherwise. */
-static int print_bitmap(uint8_t *map, int maplen, FILE *stream)
+static void print_bitmap(uint8_t *map, int maplen, FILE *stream)
 {
     int i;
     uint8_t pmap = 0, bitmask = 0;
@@ -3157,28 +3156,16 @@ static int print_bitmap(uint8_t *map, int maplen, FILE *stream)
     case 2:
         break;
     case 1:
-        if (firstset == 0)
-            return 1;
+        if (firstset == 0) {
+            fprintf(stream, "all");
+            break;
+        }
     case 3:
         fprintf(stream, "%s%d", state > 1 ? "," : "", firstset);
         if (i - 1 > firstset)
             fprintf(stream, "-%d", i - 1);
         break;
     }
-
-    return 0;
-}
-
-static void print_cpumap(uint8_t *map, int maplen, FILE *stream)
-{
-    if (print_bitmap(map, maplen, stream))
-        fprintf(stream, "any cpu");
-}
-
-static void print_nodemap(uint8_t *map, int maplen, FILE *stream)
-{
-    if (print_bitmap(map, maplen, stream))
-        fprintf(stream, "any node");
 }

 static void list_domains(int verbose, int context, int claim, int numa,
@@ -3251,7 +3238,7 @@ static void list_domains(int verbose, int context, int claim, int numa,
             libxl_domain_get_nodeaffinity(ctx, info[i].domid, &nodemap);
             putchar(' ');
-            print_nodemap(nodemap.map, physinfo.nr_nodes, stdout);
+            print_bitmap(nodemap.map, physinfo.nr_nodes, stdout);
         }
         putchar('\n');
     }
@@ -4463,7 +4450,7 @@ static void print_vcpuinfo(uint32_t tdomid,
     /* TIM */
     printf("%9.1f ", ((float)vcpuinfo->vcpu_time / 1e9));
     /* CPU AFFINITY */
-    print_cpumap(vcpuinfo->cpumap.map, nr_cpus, stdout);
+    print_bitmap(vcpuinfo->cpumap.map, nr_cpus, stdout);
     printf("\n");
 }
Dario Faggioli
2013-Nov-22 18:56 UTC
[PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus,nodes}
as well as both error handling and logging in libxl_cpu_bitmap_alloc and libxl_node_bitmap_alloc. Now libxl_get_max_{cpus,nodes} either return a positive number, or a libxl error code. Thanks to that, it is possible to fix logging for the two bitmap allocation functions, which now happens _inside_ the functions themselves and reports what happens more accurately.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Changes from v3:
 * switch the functions to LOG() / LOGE().
 * take care of the callers of libxl_get_max_{cpus,nodes}() too.

Changes from v2:
 * this wasn't there in v2, but fixing this for v3 was requested during v2 review.
---
 tools/libxl/libxl.c       | 14 ++++-------
 tools/libxl/libxl_utils.c | 58 +++++++++++++++++++++++++++++++++++++++++++--
 tools/libxl/libxl_utils.h | 32 +++++--------------
 3 files changed, 67 insertions(+), 37 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 9b93262..61d8a76 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -611,10 +611,8 @@ static int cpupool_info(libxl__gc *gc,
     info->n_dom = xcinfo->n_dom;
     rc = libxl_cpu_bitmap_alloc(CTX, &info->cpumap, 0);
     if (rc)
-    {
-        LOG(ERROR, "unable to allocate cpumap %d\n", rc);
         goto out;
-    }
+
     memcpy(info->cpumap.map, xcinfo->cpumap, info->cpumap.size);

     rc = 0;
@@ -4351,7 +4349,7 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
     int max_cpus;

     max_cpus = libxl_get_max_cpus(ctx);
-    if (max_cpus == 0)
+    if (max_cpus <= 0)
     {
         LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of CPUS");
         ret = NULL;
@@ -4416,7 +4414,7 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
     int i, j, max_nodes;

     max_nodes = libxl_get_max_nodes(ctx);
-    if (max_nodes == 0)
+    if (max_nodes <= 0)
     {
         LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of NODES");
         ret = NULL;
@@ -4538,10 +4536,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid,
     }
     for (*nb_vcpu = 0; *nb_vcpu <= domaininfo.max_vcpu_id; ++*nb_vcpu, ++ptr) {
-        if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0)) {
-            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "allocating cpumap");
+        if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0))
             return NULL;
-        }
         if (xc_vcpu_getinfo(ctx->xch, domid, *nb_vcpu, &vcpuinfo) == -1) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu info");
             return NULL;
@@ -5304,7 +5300,7 @@ int libxl_get_freecpus(libxl_ctx *ctx, libxl_bitmap *cpumap)
     int ncpus;

     ncpus = libxl_get_max_cpus(ctx);
-    if (ncpus == 0)
+    if (ncpus <= 0)
         return ERROR_FAIL;

     cpumap->map = xc_cpupool_freeinfo(ctx->xch);
diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
index 9f5f589..1815422 100644
--- a/tools/libxl/libxl_utils.c
+++ b/tools/libxl/libxl_utils.c
@@ -651,6 +651,56 @@ char *libxl_bitmap_to_hex_string(libxl_ctx *ctx, const libxl_bitmap *bitmap)
     return q;
 }

+int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap, int max_cpus)
+{
+    GC_INIT(ctx);
+    int rc = 0;
+
+    if (max_cpus < 0) {
+        rc = ERROR_INVAL;
+        goto out;
+    }
+    if (max_cpus == 0)
+        max_cpus = libxl_get_max_cpus(ctx);
+    if (max_cpus <= 0) {
+        LOG(ERROR, "failed to retrieve the maximum number of cpus");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+    /* This can't fail: no need to check and log */
+    libxl_bitmap_alloc(ctx, cpumap, max_cpus);
+
+ out:
+    GC_FREE;
+    return rc;
+}
+
+int libxl_node_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *nodemap,
+                            int max_nodes)
+{
+    GC_INIT(ctx);
+    int rc = 0;
+
+    if (max_nodes < 0) {
+        rc = ERROR_INVAL;
+        goto out;
+    }
+
+    if (max_nodes == 0)
+        max_nodes = libxl_get_max_nodes(ctx);
+    if (max_nodes <= 0) {
+        LOG(ERROR, "failed to retrieve the maximum number of nodes");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+    /* This can't fail: no need to check and log */
+    libxl_bitmap_alloc(ctx, nodemap, max_nodes);
+
+ out:
+    GC_FREE;
+    return rc;
+}
+
 int libxl_nodemap_to_cpumap(libxl_ctx *ctx,
                             const libxl_bitmap *nodemap,
                             libxl_bitmap *cpumap)
@@ -719,12 +769,16 @@ int libxl_cpumap_to_nodemap(libxl_ctx *ctx,

 int libxl_get_max_cpus(libxl_ctx *ctx)
 {
-    return xc_get_max_cpus(ctx->xch);
+    int max_cpus = xc_get_max_cpus(ctx->xch);
+
+    return max_cpus <= 0 ? ERROR_FAIL : max_cpus;
 }

 int libxl_get_max_nodes(libxl_ctx *ctx)
 {
-    return xc_get_max_nodes(ctx->xch);
+    int max_nodes = xc_get_max_nodes(ctx->xch);
+
+    return max_nodes <= 0 ? ERROR_FAIL : max_nodes;
 }

 int libxl__enum_from_string(const libxl_enum_string_table *t,
diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h
index 7b84e6a..b11cf28 100644
--- a/tools/libxl/libxl_utils.h
+++ b/tools/libxl/libxl_utils.h
@@ -98,32 +98,12 @@ static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit)
 #define libxl_for_each_set_bit(v, m) for (v = 0; v < (m).size * 8; v++) \
         if (libxl_bitmap_test(&(m), v))

-static inline int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap,
-                                         int max_cpus)
-{
-    if (max_cpus < 0)
-        return ERROR_INVAL;
-    if (max_cpus == 0)
-        max_cpus = libxl_get_max_cpus(ctx);
-    if (max_cpus == 0)
-        return ERROR_FAIL;
-
-    return libxl_bitmap_alloc(ctx, cpumap, max_cpus);
-}
-
-static inline int libxl_node_bitmap_alloc(libxl_ctx *ctx,
-                                          libxl_bitmap *nodemap,
-                                          int max_nodes)
-{
-    if (max_nodes < 0)
-        return ERROR_INVAL;
-    if (max_nodes == 0)
-        max_nodes = libxl_get_max_nodes(ctx);
-    if (max_nodes == 0)
-        return ERROR_FAIL;
-
-    return libxl_bitmap_alloc(ctx, nodemap, max_nodes);
-}
+int libxl_cpu_bitmap_alloc(libxl_ctx *ctx,
+                           libxl_bitmap *cpumap,
+                           int max_cpus);
+int libxl_node_bitmap_alloc(libxl_ctx *ctx,
+                            libxl_bitmap *nodemap,
+                            int max_nodes);

 /* Populate cpumap with the cpus spanned by the nodes in nodemap */
 int libxl_nodemap_to_cpumap(libxl_ctx *ctx,
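The return convention this patch establishes is worth spelling out: callers may no longer test for == 0, since libxl_get_max_cpus() now yields either a positive count or a negative libxl error code, so the check becomes <= 0. A minimal sketch of the pattern (the stub function, its names and ERROR_FAIL's numeric value here are assumptions for illustration, not libxl itself):

```c
/* Sketch of the "positive count or negative error code" convention.
 * ERROR_FAIL's value is assumed for illustration only. */
enum { SKETCH_ERROR_FAIL = -3 };

/* Stands in for the new libxl_get_max_cpus(): backend_answer plays the
 * role of xc_get_max_cpus()'s result; 0 or negative means failure. */
static int get_max_cpus_sketch(int backend_answer)
{
    return backend_answer <= 0 ? SKETCH_ERROR_FAIL : backend_answer;
}

/* Caller side: check <= 0, exactly as the patch changes libxl callers
 * to do, instead of the old (and now wrong) == 0 test. */
static int count_or_minus_one(int backend_answer)
{
    int max_cpus = get_max_cpus_sketch(backend_answer);

    return max_cpus <= 0 ? -1 : max_cpus;
}
```

Collapsing "0 means failure" and "negative means failure" into one <= 0 test is what lets every caller treat the error code uniformly without inspecting which failure occurred.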
Dario Faggioli
2013-Nov-22 18:56 UTC
[PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
to retrieve the actual number of pCPUs on the host. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- tools/libxl/libxl.h | 3 +++ tools/libxl/libxl_utils.c | 18 ++++++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index a9663e4..2fab5ba 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -652,6 +652,9 @@ int libxl_domain_preserve(libxl_ctx *ctx, uint32_t domid, libxl_domain_create_in /* get max. number of cpus supported by hypervisor */ int libxl_get_max_cpus(libxl_ctx *ctx); +/* get the actual number of online cpus on the host */ +int libxl_get_nr_cpus(libxl_ctx *ctx); + /* get max. number of NUMA nodes supported by hypervisor */ int libxl_get_max_nodes(libxl_ctx *ctx); diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c index 1815422..8763070 100644 --- a/tools/libxl/libxl_utils.c +++ b/tools/libxl/libxl_utils.c @@ -774,6 +774,24 @@ int libxl_get_max_cpus(libxl_ctx *ctx) return max_cpus <= 0 ? ERROR_FAIL : max_cpus; } +int libxl_get_nr_cpus(libxl_ctx *ctx) +{ + GC_INIT(ctx); + xc_physinfo_t physinfo = { 0 }; + int rc = 0; + + if (xc_physinfo(ctx->xch, &physinfo)) { + LOGE(ERROR, "xc_physinfo failed."); + rc = ERROR_FAIL; + goto out; + } + rc = physinfo.nr_cpus; + + out: + GC_FREE; + return rc; +} + int libxl_get_max_nodes(libxl_ctx *ctx) { int max_nodes = xc_get_max_nodes(ctx->xch);
Dario Faggioli
2013-Nov-22 18:57 UTC
[PATCH v4 04/15] xl: allow for node-wise specification of vcpu pinning
Making it possible to use something like the following: * "nodes:0-3": all pCPUs of nodes 0,1,2,3; * "nodes:0-3,^node:2": all pCPUS of nodes 0,1,3; * "1,nodes:1-2,^6": pCPU 1 plus all pCPUs of nodes 1,2 but not pCPU 6; * ... In both domain config file and `xl vcpu-pin''. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> --- Changes from v1 (of this very series): * actually checking for both "nodes:" and "node:" as per the doc says; * using strcmp() (rather than strncmp()) when matching "all", to avoid returning success on any longer string that just begins with "all"; * fixing the handling (well, the rejection, actually) of "^all" and "^nodes:all"; * make some string pointers const. Changes from v2 (of original series): * turned a ''return'' into ''goto out'', consistently with the most of exit patterns; * harmonized error handling: now parse_range() return a libxl error code, as requested during review; * dealing with "all" moved inside update_cpumap_range(). It''s tricky to move it in parse_range() (as requested during review), since we need the cpumap being modified handy when dealing with it. However, having it in update_cpumap_range() simplifies the code just as much as that; * explicitly checking for junk after a valid value or range in parse_range(), as requested during review; * xl exits on parsing failing, so no need to reset the cpumap to something sensible in vcpupin_parse(), as suggested during review; Changes from v1 (of original series): * code rearranged in order to look more simple to follow and understand, as requested during review; * improved docs in xl.cfg.pod.5, as requested during review; * strtoul() now returns into unsigned long, and the case where it returns ULONG_MAX is now taken into account, as requested during review; * stuff like "all,^7" now works, as requested during review. 
Specifying just "^7" does not work either before or after this change * killed some magic (i.e., `ptr += 5 + (ptr[4] == ''s''`) by introducing STR_SKIP_PREFIX() macro, as requested during review. --- docs/man/xl.cfg.pod.5 | 20 ++++++ tools/libxl/xl_cmdimpl.c | 153 +++++++++++++++++++++++++++++++++------------- 2 files changed, 128 insertions(+), 45 deletions(-) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index 3b227b7..5e17b5d 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -115,7 +115,25 @@ To allow all the vcpus of the guest to run on all the cpus on the host. =item "0-3,5,^1" -To allow all the vcpus of the guest to run on cpus 0,2,3,5. +To allow all the vcpus of the guest to run on cpus 0,2,3,5. Combining +this with "all" is possible, meaning "all,^7" results in all the vcpus +of the guest running on all the cpus on the host except cpu 7. + +=item "nodes:0-3,node:^2" + +To allow all the vcpus of the guest to run on the cpus from NUMA nodes +0,1,3 of the host. So, if cpus 0-3 belongs to node 0, cpus 4-7 belongs +to node 1 and cpus 8-11 to node 3, the above would mean all the vcpus +of the guest will run on cpus 0-3,8-11. + +Combining this notation with the one above is possible. For instance, +"1,node:2,^6", means all the vcpus of the guest will run on cpu 1 and +on all the cpus of NUMA node 2, but not on cpu 6. Following the same +example as above, that would be cpus 1,4,5,7. + +Combining this with "all" is also possible, meaning "all,^nodes:1" +results in all the vcpus of the guest running on all the cpus on the +host, except for the cpus belonging to the host NUMA node 1. =item ["2", "3"] (or [2, 3]) diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 26d7712..8138cfe 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -59,6 +59,11 @@ } \ }) +#define STR_HAS_PREFIX( a, b ) \ + ( strncmp(a, b, strlen(b)) == 0 ) +#define STR_SKIP_PREFIX( a, b ) \ + ( STR_HAS_PREFIX(a, b) ? 
((a) += strlen(b), 1) : 0 ) + int logfile = 2; @@ -563,61 +568,121 @@ static void split_string_into_string_list(const char *str, free(s); } -static int vcpupin_parse(char *cpu, libxl_bitmap *cpumap) +static int parse_range(const char *str, unsigned long *a, unsigned long *b) { - libxl_bitmap exclude_cpumap; - uint32_t cpuida, cpuidb; - char *endptr, *toka, *tokb, *saveptr = NULL; - int i, rc = 0, rmcpu; + const char *nstr; + char *endptr; - if (!strcmp(cpu, "all")) { - libxl_bitmap_set_any(cpumap); - return 0; + *a = *b = strtoul(str, &endptr, 10); + if (endptr == str || *a == ULONG_MAX) + return ERROR_INVAL; + + if (*endptr == ''-'') { + nstr = endptr + 1; + + *b = strtoul(nstr, &endptr, 10); + if (endptr == nstr || *b == ULONG_MAX || *b < *a) + return ERROR_INVAL; + } + + /* Valid value or range so far, but we also don''t want junk after that */ + if (*endptr != ''\0'') + return ERROR_INVAL; + + return 0; +} + +/* + * Add or removes a specific set of cpus (specified in str, either as + * single cpus or as entire NUMA nodes) to/from cpumap. + */ +static int update_cpumap_range(const char *str, libxl_bitmap *cpumap) +{ + unsigned long ida, idb; + libxl_bitmap node_cpumap; + bool is_not = false, is_nodes = false; + int rc = 0; + + libxl_bitmap_init(&node_cpumap); + + rc = libxl_node_bitmap_alloc(ctx, &node_cpumap, 0); + if (rc) { + fprintf(stderr, "libxl_node_bitmap_alloc failed.\n"); + goto out; } - if (libxl_cpu_bitmap_alloc(ctx, &exclude_cpumap, 0)) { - fprintf(stderr, "Error: Failed to allocate cpumap.\n"); - return ENOMEM; + /* Are we adding or removing cpus/nodes? 
*/ + if (STR_SKIP_PREFIX(str, "^")) { + is_not = true; } - for (toka = strtok_r(cpu, ",", &saveptr); toka; - toka = strtok_r(NULL, ",", &saveptr)) { - rmcpu = 0; - if (*toka == ''^'') { - /* This (These) Cpu(s) will be removed from the map */ - toka++; - rmcpu = 1; - } - /* Extract a valid (range of) cpu(s) */ - cpuida = cpuidb = strtoul(toka, &endptr, 10); - if (endptr == toka) { - fprintf(stderr, "Error: Invalid argument.\n"); - rc = EINVAL; - goto vcpp_out; - } - if (*endptr == ''-'') { - tokb = endptr + 1; - cpuidb = strtoul(tokb, &endptr, 10); - if (endptr == tokb || cpuida > cpuidb) { - fprintf(stderr, "Error: Invalid argument.\n"); - rc = EINVAL; - goto vcpp_out; + /* Are we dealing with cpus or full nodes? */ + if (STR_SKIP_PREFIX(str, "node:") || STR_SKIP_PREFIX(str, "nodes:")) { + is_nodes = true; + } + + if (strcmp(str, "all") == 0) { + /* We do not accept "^all" or "^nodes:all" */ + if (is_not) { + fprintf(stderr, "Can''t combine \"^\" and \"all\".\n"); + rc = ERROR_INVAL; + } else + libxl_bitmap_set_any(cpumap); + goto out; + } + + rc = parse_range(str, &ida, &idb); + if (rc) { + fprintf(stderr, "Invalid pcpu range: %s.\n", str); + goto out; + } + + /* Add or remove the specified cpus in the range */ + while (ida <= idb) { + if (is_nodes) { + /* Add/Remove all the cpus of a NUMA node */ + int i; + + rc = libxl_node_to_cpumap(ctx, ida, &node_cpumap); + if (rc) { + fprintf(stderr, "libxl_node_to_cpumap failed.\n"); + goto out; } + + /* Add/Remove all the cpus in the node cpumap */ + libxl_for_each_set_bit(i, node_cpumap) { + is_not ? libxl_bitmap_reset(cpumap, i) : + libxl_bitmap_set(cpumap, i); + } + } else { + /* Add/Remove this cpu */ + is_not ? libxl_bitmap_reset(cpumap, ida) : + libxl_bitmap_set(cpumap, ida); } - while (cpuida <= cpuidb) { - rmcpu == 0 ? 
libxl_bitmap_set(cpumap, cpuida) : - libxl_bitmap_set(&exclude_cpumap, cpuida); - cpuida++; - } + ida++; } - /* Clear all the cpus from the removal list */ - libxl_for_each_set_bit(i, exclude_cpumap) { - libxl_bitmap_reset(cpumap, i); - } + out: + libxl_bitmap_dispose(&node_cpumap); + return rc; +} -vcpp_out: - libxl_bitmap_dispose(&exclude_cpumap); +/* + * Takes a string representing a set of cpus (specified either as + * single cpus or as eintire NUMA nodes) and turns it into the + * corresponding libxl_bitmap (in cpumap). + */ +static int vcpupin_parse(char *cpu, libxl_bitmap *cpumap) +{ + char *ptr, *saveptr = NULL; + int rc = 0; + + for (ptr = strtok_r(cpu, ",", &saveptr); ptr; + ptr = strtok_r(NULL, ",", &saveptr)) { + rc = update_cpumap_range(ptr, cpumap); + if (rc) + break; + } return rc; }
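The semantics of the new syntax are easiest to see worked through on a concrete string. The following toy model (NOT the xl parser; node_mask(), the byte-sized cpumap and the 2-nodes-by-4-cpus topology are all assumptions for this example) evaluates "1,node:1,^6" step by step, the way vcpupin_parse() feeds each comma-separated token to update_cpumap_range():

```c
#include <stdint.h>

/* Assumed toy topology: 2 NUMA nodes with 4 pcpus each, so
 * node 0 = cpus 0-3 and node 1 = cpus 4-7; cpumap fits in a byte. */
static uint8_t node_mask(int node)
{
    return (uint8_t)(0x0F << (4 * node));
}

/* Evaluate "1,node:1,^6": each token adds or removes cpus in order. */
static uint8_t eval_example(void)
{
    uint8_t map = 0;

    map |= (uint8_t)(1u << 1);  /* "1":      add cpu 1       */
    map |= node_mask(1);        /* "node:1": add cpus 4-7    */
    map &= (uint8_t)~(1u << 6); /* "^6":     remove cpu 6    */

    return map;                 /* cpus 1, 4, 5 and 7 remain */
}
```

The result, cpus 1,4,5,7, matches what the xl.cfg man-page addition documents for its analogous example. Order matters: a later "^" token can remove cpus that an earlier token (or "all") added.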
Dario Faggioli
2013-Nov-22 18:57 UTC
[PATCH v4 05/15] xl: implement and enable dryrun mode for `xl vcpu-pin''
As it can be useful to see if the outcome of some complex vCPU pinning bitmap specification looks as expected. This also allow for the introduction of some automatic testing and verification for the bitmap parsing code, as it happens already in check-xl-disk-parse and check-xl-vif-parse. In particular, to make the above possible, this commit also changes the implementation of the vcpu-pin command so that, instead of always returning 0, it returns an error if the parsing fails. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> --- Changes since v2 (of original series): * fixed a typo in the changelog --- tools/libxl/xl_cmdimpl.c | 48 +++++++++++++++++++++++++++++++++------------ tools/libxl/xl_cmdtable.c | 2 +- 2 files changed, 36 insertions(+), 14 deletions(-) diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 8138cfe..7b4d058 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -4583,40 +4583,62 @@ int main_vcpulist(int argc, char **argv) return 0; } -static void vcpupin(uint32_t domid, const char *vcpu, char *cpu) +static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) { libxl_vcpuinfo *vcpuinfo; libxl_bitmap cpumap; uint32_t vcpuid; char *endptr; - int i, nb_vcpu; + int i, nb_cpu, nb_vcpu, rc = -1; + + libxl_bitmap_init(&cpumap); vcpuid = strtoul(vcpu, &endptr, 10); if (vcpu == endptr) { if (strcmp(vcpu, "all")) { fprintf(stderr, "Error: Invalid argument.\n"); - return; + goto out; } vcpuid = -1; } - if (libxl_cpu_bitmap_alloc(ctx, &cpumap, 0)) { - goto vcpupin_out; - } + if (libxl_cpu_bitmap_alloc(ctx, &cpumap, 0)) + goto out; if (vcpupin_parse(cpu, &cpumap)) - goto vcpupin_out1; + goto out; + + if (dryrun_only) { + nb_cpu = libxl_get_nr_cpus(ctx); + if (nb_cpu <= 0) { + fprintf(stderr, "libxl_get_nr_cpus failed.\n"); + goto out; + } + + fprintf(stdout, "cpumap: "); + print_bitmap(cpumap.map, nb_cpu, stdout); + fprintf(stdout, "\n"); + + if (ferror(stdout) 
|| fflush(stdout)) { + perror("stdout"); + exit(-1); + } + + rc = 0; + goto out; + } if (vcpuid != -1) { if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap) == -1) { fprintf(stderr, "Could not set affinity for vcpu `%u''.\n", vcpuid); + goto out; } } else { if (!(vcpuinfo = libxl_list_vcpu(ctx, domid, &nb_vcpu, &i))) { fprintf(stderr, "libxl_list_vcpu failed.\n"); - goto vcpupin_out1; + goto out; } for (i = 0; i < nb_vcpu; i++) { if (libxl_set_vcpuaffinity(ctx, domid, vcpuinfo[i].vcpuid, @@ -4627,10 +4649,11 @@ static void vcpupin(uint32_t domid, const char *vcpu, char *cpu) } libxl_vcpuinfo_list_free(vcpuinfo, nb_vcpu); } - vcpupin_out1: + + rc = 0; + out: libxl_bitmap_dispose(&cpumap); - vcpupin_out: - ; + return rc; } int main_vcpupin(int argc, char **argv) @@ -4641,8 +4664,7 @@ int main_vcpupin(int argc, char **argv) /* No options */ } - vcpupin(find_domain(argv[optind]), argv[optind+1] , argv[optind+2]); - return 0; + return vcpupin(find_domain(argv[optind]), argv[optind+1] , argv[optind+2]); } static void vcpuset(uint32_t domid, const char* nr_vcpus, int check_host) diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c index 7709206..ebe0220 100644 --- a/tools/libxl/xl_cmdtable.c +++ b/tools/libxl/xl_cmdtable.c @@ -211,7 +211,7 @@ struct cmd_spec cmd_table[] = { "[Domain, ...]", }, { "vcpu-pin", - &main_vcpupin, 0, 1, + &main_vcpupin, 1, 1, "Set which CPUs a VCPU can use", "<Domain> <VCPU|all> <CPUs|all>", },
Dario Faggioli
2013-Nov-22 18:57 UTC
[PATCH v4 06/15] xl: test script for the cpumap parser (for vCPU pinning)
This commit introduces "check-xl-vcpupin-parse" for helping verify and debug the (v)CPU bitmap parsing code in xl. The script runs "xl -N vcpu-pin 0 all <some strings>" repeatedly, with various input strings, and checks that the output is as expected.

This is what the script can do:

 # ./check-xl-vcpupin-parse -h
 usage: ./check-xl-vcpupin-parse [options]

 Tests various vcpu-pinning strings. If run without arguments acts as follows:
  - generates some test data and saves them in check-xl-vcpupin-parse.data;
  - tests all the generated configurations (reading them back from
    check-xl-vcpupin-parse.data).

 An example of a test vector file is provided in check-xl-vcpupin-parse.data-example.

 Options:
  -h         prints this message
  -r seed    uses seed for initializing the random number generator
             (default: the script PID)
  -s string  tries using string as a vcpu pinning configuration and
             reports whether that succeeds or not
  -o ofile   save the test data in ofile
             (default: check-xl-vcpupin-parse.data)
  -i ifile   read test data from ifile

An example test data file (generated on a host with 2 NUMA nodes and 16 CPUs) is being provided in check-xl-vcpupin-parse.data-example.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
Changes from v2 (of original series):
 * killed the `sleep 1', as requested during review;
 * allow for passing a custom random seed, and report what the actual random seed used is, as requested during review;
 * allow for testing specific pinning configuration strings, as suggested during review;
 * store the test data in a file after generating them, and read them back from there for the actual testing, as suggested during review;
 * allow for reading the test data from an existing test file instead of always generating new ones.

Changes from v1 (of original series):
 * this was not there in v1, and adding it has been requested during review.
--- tools/libxl/check-xl-vcpupin-parse | 294 +++++++++++++++++++++++ tools/libxl/check-xl-vcpupin-parse.data-example | 53 ++++ 2 files changed, 347 insertions(+) create mode 100755 tools/libxl/check-xl-vcpupin-parse create mode 100644 tools/libxl/check-xl-vcpupin-parse.data-example diff --git a/tools/libxl/check-xl-vcpupin-parse b/tools/libxl/check-xl-vcpupin-parse new file mode 100755 index 0000000..21f8421 --- /dev/null +++ b/tools/libxl/check-xl-vcpupin-parse @@ -0,0 +1,294 @@ +#!/bin/bash + +set -e + +if [ -x ./xl ] ; then + export LD_LIBRARY_PATH=.:../libxc:../xenstore: + XL=./xl +else + XL=xl +fi + +fprefix=tmp.check-xl-vcpupin-parse +outfile=check-xl-vcpupin-parse.data + +usage () { +cat <<END +usage: $0 [options] + +Tests various vcpu-pinning strings. If run without arguments acts +as follows: + - generates some test data and saves them in $outfile; + - tests all the generated configurations (reading them back from + $outfile). + +An example of a test vector file is provided in ${outfile}-example. + +Options: + -h prints this message + -r seed uses seed for initializing the rundom number generator + (default: the script PID) + -s string tries using string as a vcpu pinning configuration and + reports whether that succeeds or not + -o ofile save the test data in ofile (default: $outfile) + -i ifile read test data from ifile +END +} + +expected () { + cat >$fprefix.expected +} + +# by default, re-seed with our PID +seed=$$ +failures=0 + +# Execute one test and check the result against the provided +# rc value and output +one () { + expected_rc=$1; shift + printf "test case %s...\n" "$*" + set +e + ${XL} -N vcpu-pin 0 all "$@" </dev/null >$fprefix.actual 2>/dev/null + actual_rc=$? + if [ $actual_rc != $expected_rc ]; then + diff -u $fprefix.expected $fprefix.actual + echo >&2 "test case \`$*'' failed ($actual_rc $diff_rc)" + failures=$(( $failures + 1 )) + fi + set -e +} + +# Write an entry in the test vector file. 
Format is as follows: +# test-string*expected-rc*expected-output +write () { + printf "$1*$2*$3\n" >> $outfile +} + +complete () { + if [ "$failures" = 0 ]; then + echo all ok.; exit 0 + else + echo "$failures tests failed."; exit 1 + fi +} + +# Test a specific pinning string +string () { + expected_rc=$1; shift + printf "test case %s...\n" "$*" + set +e + ${XL} -N vcpu-pin 0 all "$@" &> /dev/null + actual_rc=$? + set -e + + if [ $actual_rc != $expected_rc ]; then + echo >&2 "test case \`$*'' failed ($actual_rc)" + else + echo >&2 "test case \`$*'' succeeded" + fi + + exit 0 +} + +# Read a test vector file (provided as $1) line by line and +# test all the entries it contains +run () +{ + while read line + do + if [ ${line:0:1} != ''#'' ]; then + test_string="`echo $line | cut -f1 -d''*''`" + exp_rc="`echo $line | cut -f2 -d''*''`" + exp_output="`echo $line | cut -f3 -d''*''`" + + expected <<END +$exp_output +END + one $exp_rc "$test_string" + fi + done < $1 + + complete + + exit 0 +} + +while getopts "hr:s:o:i:" option +do + case $option in + h) + usage + exit 0 + ;; + r) + seed=$OPTARG + ;; + s) + string 0 "$OPTARG" + ;; + o) + outfile=$OPTARG + ;; + i) + run $OPTARG + ;; + esac +done + +#---------- test data ---------- +# +nr_cpus=`xl info | grep nr_cpus | cut -f2 -d'':''` +nr_nodes=`xl info | grep nr_nodes | cut -f2 -d'':''` +nr_cpus_per_node=`xl info -n | sed ''/cpu:/,/numa_info/!d'' | head -n -1 | \ + awk ''{print $4}'' | uniq -c | tail -1 | awk ''{print $1}''` +cat >$outfile <<END +# WARNING: some of these tests are topology based tests. +# Expect failures if the topology is not detected correctly +# detected topology: $nr_cpus CPUs, $nr_nodes nodes, $nr_cpus_per_node CPUs per node. +# +# seed used for random number generation: seed=${seed}. 
+# +# Format is as follows: +# test-string*expected-return-code*expected-output +# +END + +# Re-seed the random number generator +RANDOM=$seed + +echo "# Testing a wrong configuration" >> $outfile +write foo 255 "" + +echo "# Testing the ''all'' syntax" >> $outfile +write "all" 0 "cpumap: all" +write "nodes:all" 0 "cpumap: all" +write "all,nodes:all" 0 "cpumap: all" +write "all,^nodes:0,all" 0 "cpumap: all" + +echo "# Testing the empty cpumap case" >> $outfile +write "^0" 0 "cpumap: none" + +echo "# A few attempts of pinning to just one random cpu" >> $outfile +if [ $nr_cpus -gt 1 ]; then + for i in `seq 0 3`; do + cpu=$(($RANDOM % nr_cpus)) + write "$cpu" 0 "cpumap: $cpu" + done +fi + +echo "# A few attempts of pinning to all but one random cpu" >> $outfile +if [ $nr_cpus -gt 2 ]; then + for i in `seq 0 3`; do + cpu=$(($RANDOM % nr_cpus)) + if [ $cpu -eq 0 ]; then + expected_range="1-$((nr_cpus - 1))" + elif [ $cpu -eq 1 ]; then + expected_range="0,2-$((nr_cpus - 1))" + elif [ $cpu -eq $((nr_cpus - 2)) ]; then + expected_range="0-$((cpu - 1)),$((nr_cpus - 1))" + elif [ $cpu -eq $((nr_cpus - 1)) ]; then + expected_range="0-$((nr_cpus - 2))" + else + expected_range="0-$((cpu - 1)),$((cpu + 1))-$((nr_cpus - 1))" + fi + write "all,^$cpu" 0 "cpumap: $expected_range" + done +fi + +echo "# A few attempts of pinning to a random range of cpus" >> $outfile +if [ $nr_cpus -gt 2 ]; then + for i in `seq 0 3`; do + cpua=$(($RANDOM % nr_cpus)) + range=$((nr_cpus - cpua)) + cpub=$(($RANDOM % range)) + cpubb=$((cpua + cpub)) + if [ $cpua -eq $cpubb ]; then + expected_range="$cpua" + else + expected_range="$cpua-$cpubb" + fi + write "$expected_range" 0 "cpumap: $expected_range" + done +fi + +echo "# A few attempts of pinning to just one random node" >> $outfile +if [ $nr_nodes -gt 1 ]; then + for i in `seq 0 3`; do + node=$(($RANDOM % nr_nodes)) + # this assumes that the first $nr_cpus_per_node (from cpu + # 0 to cpu $nr_cpus_per_node-1) are assigned to the first node + # (node 0), 
the second $nr_cpus_per_node (from $nr_cpus_per_node + # to 2*$nr_cpus_per_node-1) are assigned to the second node (node + # 1), etc. Expect failures if that is not the case. + write "nodes:$node" 0 "cpumap: $((nr_cpus_per_node*node))-$((nr_cpus_per_node*(node+1)-1))" + done +fi + +echo "# A few attempts of pinning to all but one random node" >> $outfile +if [ $nr_nodes -gt 1 ]; then + for i in `seq 0 3`; do + node=$(($RANDOM % nr_nodes)) + # this assumes that the first $nr_cpus_per_node (from cpu + # 0 to cpu $nr_cpus_per_node-1) are assigned to the first node + # (node 0), the second $nr_cpus_per_node (from $nr_cpus_per_node + # to 2*$nr_cpus_per_node-1) are assigned to the second node (node + # 1), etc. Expect failures if that is not the case. + if [ $node -eq 0 ]; then + expected_range="$nr_cpus_per_node-$((nr_cpus - 1))" + elif [ $node -eq $((nr_nodes - 1)) ]; then + expected_range="0-$((nr_cpus - nr_cpus_per_node - 1))" + else + expected_range="0-$((nr_cpus_per_node*node-1)),$((nr_cpus_per_node*(node+1)))-$((nr_cpus - 1))" + fi + write "all,^nodes:$node" 0 "cpumap: $expected_range" + done +fi + +echo "# A few attempts of pinning to a random range of nodes" >> $outfile +if [ $nr_nodes -gt 1 ]; then + for i in `seq 0 3`; do + nodea=$(($RANDOM % nr_nodes)) + range=$((nr_nodes - nodea)) + nodeb=$(($RANDOM % range)) + nodebb=$((nodea + nodeb)) + # this assumes that the first $nr_cpus_per_node (from cpu + # 0 to cpu $nr_cpus_per_node-1) are assigned to the first node + # (node 0), the second $nr_cpus_per_node (from $nr_cpus_per_node + # to 2*$nr_cpus_per_node-1) are assigned to the second node (node + # 1), etc. Expect failures if that is not the case.
+ if [ $nodea -eq 0 ] && [ $nodebb -eq $((nr_nodes - 1)) ]; then + expected_range="all" + else + expected_range="$((nr_cpus_per_node*nodea))-$((nr_cpus_per_node*(nodebb+1) - 1))" + fi + write "nodes:$nodea-$nodebb" 0 "cpumap: $expected_range" + done +fi + +echo "# A few attempts of pinning to a node but excluding one random cpu" >> $outfile +if [ $nr_nodes -gt 1 ]; then + for i in `seq 0 3`; do + node=$(($RANDOM % nr_nodes)) + # this assumes that the first $nr_cpus_per_node (from cpu + # 0 to cpu $nr_cpus_per_node-1) are assigned to the first node + # (node 0), the second $nr_cpus_per_node (from $nr_cpus_per_node + # to 2*$nr_cpus_per_node-1) are assigned to the second node (node + # 1), etc. Expect failures if that is not the case. + cpu=$(($RANDOM % nr_cpus_per_node + nr_cpus_per_node*node)) + if [ $cpu -eq $((nr_cpus_per_node*node)) ]; then + expected_range="$((nr_cpus_per_node*node + 1))-$((nr_cpus_per_node*(node+1) - 1))" + elif [ $cpu -eq $((nr_cpus_per_node*node + 1)) ]; then + expected_range="$((nr_cpus_per_node*node)),$((nr_cpus_per_node*node + 2))-$((nr_cpus_per_node*(node+1) - 1))" + elif [ $cpu -eq $((nr_cpus_per_node*(node+1) - 2)) ]; then + expected_range="$((nr_cpus_per_node*node))-$((nr_cpus_per_node*(node+1) - 3)),$((nr_cpus_per_node*(node+1) - 1))" + elif [ $cpu -eq $((nr_cpus_per_node*(node+1) - 1)) ]; then + expected_range="$((nr_cpus_per_node*node))-$((nr_cpus_per_node*(node+1) - 2))" + else + expected_range="$((nr_cpus_per_node*node))-$((cpu - 1)),$((cpu + 1))-$((nr_cpus_per_node*(node+1) - 1))" + fi + write "nodes:$node,^$cpu" 0 "cpumap: $expected_range" + done +fi + +run $outfile diff --git a/tools/libxl/check-xl-vcpupin-parse.data-example b/tools/libxl/check-xl-vcpupin-parse.data-example new file mode 100644 index 0000000..4bbd5de --- /dev/null +++ b/tools/libxl/check-xl-vcpupin-parse.data-example @@ -0,0 +1,53 @@ +# WARNING: some of these tests are topology based tests. 
+# Expect failures if the topology is not detected correctly +# detected topology: 16 CPUs, 2 nodes, 8 CPUs per node. +# +# seed used for random number generation: seed=13328. +# +# Format is as follows: +# test-string*expected-return-code*expected-output +# +# Testing a wrong configuration +foo*255* +# Testing the ''all'' syntax +all*0*cpumap: all +nodes:all*0*cpumap: all +all,nodes:all*0*cpumap: all +all,^nodes:0,all*0*cpumap: all +# Testing the empty cpumap case +^0*0*cpumap: none +# A few attempts of pinning to just one random cpu +0*0*cpumap: 0 +9*0*cpumap: 9 +6*0*cpumap: 6 +0*0*cpumap: 0 +# A few attempts of pinning to all but one random cpu +all,^12*0*cpumap: 0-11,13-15 +all,^6*0*cpumap: 0-5,7-15 +all,^3*0*cpumap: 0-2,4-15 +all,^7*0*cpumap: 0-6,8-15 +# A few attempts of pinning to a random range of cpus +13-15*0*cpumap: 13-15 +7*0*cpumap: 7 +3-5*0*cpumap: 3-5 +8-11*0*cpumap: 8-11 +# A few attempts of pinning to just one random node +nodes:1*0*cpumap: 8-15 +nodes:0*0*cpumap: 0-7 +nodes:0*0*cpumap: 0-7 +nodes:0*0*cpumap: 0-7 +# A few attempts of pinning to all but one random node +all,^nodes:0*0*cpumap: 8-15 +all,^nodes:1*0*cpumap: 0-7 +all,^nodes:1*0*cpumap: 0-7 +all,^nodes:0*0*cpumap: 8-15 +# A few attempts of pinning to a random range of nodes +nodes:1-1*0*cpumap: 8-15 +nodes:1-1*0*cpumap: 8-15 +nodes:0-1*0*cpumap: all +nodes:0-0*0*cpumap: 0-7 +# A few attempts of pinning to a node but excluding one random cpu +nodes:1,^8*0*cpumap: 9-15 +nodes:0,^6*0*cpumap: 0-5,7 +nodes:1,^9*0*cpumap: 8,10-15 +nodes:0,^5*0*cpumap: 0-4,6-7
Dario Faggioli
2013-Nov-22 18:57 UTC
[PATCH v4 07/15] xen: sched: rename v->cpu_affinity into v->cpu_hard_affinity
in order to distinguish it from the cpu_soft_affinity which will be introduced in a later commit ("xen: sched: introduce soft-affinity and use it instead d->node-affinity"). This patch does not imply any functional change, it is basically the result of something like the following: s/cpu_affinity/cpu_hard_affinity/g s/cpu_affinity_tmp/cpu_hard_affinity_tmp/g s/cpu_affinity_saved/cpu_hard_affinity_saved/g Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> --- Changes from v2: * patch has been moved one step up in the series. --- xen/arch/x86/traps.c | 11 ++++++----- xen/common/domain.c | 22 +++++++++++----------- xen/common/domctl.c | 2 +- xen/common/keyhandler.c | 2 +- xen/common/sched_credit.c | 12 ++++++------ xen/common/sched_sedf.c | 2 +- xen/common/schedule.c | 21 +++++++++++---------- xen/common/wait.c | 4 ++-- xen/include/xen/sched.h | 8 ++++---- 9 files changed, 43 insertions(+), 41 deletions(-) diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index d8b3eac..157031e 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -3093,7 +3093,8 @@ static void nmi_mce_softirq(void) /* Set the tmp value unconditionally, so that * the check in the iret hypercall works. */ - cpumask_copy(st->vcpu->cpu_affinity_tmp, st->vcpu->cpu_affinity); + cpumask_copy(st->vcpu->cpu_hard_affinity_tmp, + st->vcpu->cpu_hard_affinity); if ((cpu != st->processor) || (st->processor != st->vcpu->processor)) @@ -3128,11 +3129,11 @@ void async_exception_cleanup(struct vcpu *curr) return; /* Restore affinity.
*/ - if ( !cpumask_empty(curr->cpu_affinity_tmp) && - !cpumask_equal(curr->cpu_affinity_tmp, curr->cpu_affinity) ) + if ( !cpumask_empty(curr->cpu_hard_affinity_tmp) && + !cpumask_equal(curr->cpu_hard_affinity_tmp, curr->cpu_hard_affinity) ) { - vcpu_set_affinity(curr, curr->cpu_affinity_tmp); - cpumask_clear(curr->cpu_affinity_tmp); + vcpu_set_affinity(curr, curr->cpu_hard_affinity_tmp); + cpumask_clear(curr->cpu_hard_affinity_tmp); } if ( !(curr->async_exception_mask & (curr->async_exception_mask - 1)) ) diff --git a/xen/common/domain.c b/xen/common/domain.c index 2cbc489..d8116c7 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -125,9 +125,9 @@ struct vcpu *alloc_vcpu( tasklet_init(&v->continue_hypercall_tasklet, NULL, 0); - if ( !zalloc_cpumask_var(&v->cpu_affinity) || - !zalloc_cpumask_var(&v->cpu_affinity_tmp) || - !zalloc_cpumask_var(&v->cpu_affinity_saved) || + if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) || + !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) || + !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) || !zalloc_cpumask_var(&v->vcpu_dirty_cpumask) ) goto fail_free; @@ -156,9 +156,9 @@ struct vcpu *alloc_vcpu( fail_wq: destroy_waitqueue_vcpu(v); fail_free: - free_cpumask_var(v->cpu_affinity); - free_cpumask_var(v->cpu_affinity_tmp); - free_cpumask_var(v->cpu_affinity_saved); + free_cpumask_var(v->cpu_hard_affinity); + free_cpumask_var(v->cpu_hard_affinity_tmp); + free_cpumask_var(v->cpu_hard_affinity_saved); free_cpumask_var(v->vcpu_dirty_cpumask); free_vcpu_struct(v); return NULL; @@ -371,7 +371,7 @@ void domain_update_node_affinity(struct domain *d) for_each_vcpu ( d, v ) { - cpumask_and(online_affinity, v->cpu_affinity, online); + cpumask_and(online_affinity, v->cpu_hard_affinity, online); cpumask_or(cpumask, cpumask, online_affinity); } @@ -734,9 +734,9 @@ static void complete_domain_destroy(struct rcu_head *head) for ( i = d->max_vcpus - 1; i >= 0; i-- ) if ( (v = d->vcpu[i]) != NULL ) { - free_cpumask_var(v->cpu_affinity); - 
free_cpumask_var(v->cpu_affinity_tmp); - free_cpumask_var(v->cpu_affinity_saved); + free_cpumask_var(v->cpu_hard_affinity); + free_cpumask_var(v->cpu_hard_affinity_tmp); + free_cpumask_var(v->cpu_hard_affinity_saved); free_cpumask_var(v->vcpu_dirty_cpumask); free_vcpu_struct(v); } @@ -875,7 +875,7 @@ int vcpu_reset(struct vcpu *v) v->async_exception_mask = 0; memset(v->async_exception_state, 0, sizeof(v->async_exception_state)); #endif - cpumask_clear(v->cpu_affinity_tmp); + cpumask_clear(v->cpu_hard_affinity_tmp); clear_bit(_VPF_blocked, &v->pause_flags); clear_bit(_VPF_in_reset, &v->pause_flags); diff --git a/xen/common/domctl.c b/xen/common/domctl.c index 904d27b..5e0ac5c 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -629,7 +629,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) else { ret = cpumask_to_xenctl_bitmap( - &op->u.vcpuaffinity.cpumap, v->cpu_affinity); + &op->u.vcpuaffinity.cpumap, v->cpu_hard_affinity); } } break; diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c index 8e4b3f8..c11f577 100644 --- a/xen/common/keyhandler.c +++ b/xen/common/keyhandler.c @@ -296,7 +296,7 @@ static void dump_domains(unsigned char key) !vcpu_event_delivery_is_enabled(v)); cpuset_print(tmpstr, sizeof(tmpstr), v->vcpu_dirty_cpumask); printk("dirty_cpus=%s ", tmpstr); - cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_affinity); + cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_hard_affinity); printk("cpu_affinity=%s\n", tmpstr); printk(" pause_count=%d pause_flags=%lx\n", atomic_read(&v->pause_count), v->pause_flags); diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index db5512e..c6a2560 100644 --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -332,13 +332,13 @@ csched_balance_cpumask(const struct vcpu *vc, int step, cpumask_t *mask) if ( step == CSCHED_BALANCE_NODE_AFFINITY ) { cpumask_and(mask, CSCHED_DOM(vc->domain)->node_affinity_cpumask, - vc->cpu_affinity); + vc->cpu_hard_affinity); if ( 
unlikely(cpumask_empty(mask)) ) - cpumask_copy(mask, vc->cpu_affinity); + cpumask_copy(mask, vc->cpu_hard_affinity); } else /* step == CSCHED_BALANCE_CPU_AFFINITY */ - cpumask_copy(mask, vc->cpu_affinity); + cpumask_copy(mask, vc->cpu_hard_affinity); } static void burn_credits(struct csched_vcpu *svc, s_time_t now) @@ -407,7 +407,7 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new) if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY && !__vcpu_has_node_affinity(new->vcpu, - new->vcpu->cpu_affinity) ) + new->vcpu->cpu_hard_affinity) ) continue; /* Are there idlers suitable for new (for this balance step)? */ @@ -642,7 +642,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit) /* Store in cpus the mask of online cpus on which the domain can run */ online = cpupool_scheduler_cpumask(vc->domain->cpupool); - cpumask_and(&cpus, vc->cpu_affinity, online); + cpumask_and(&cpus, vc->cpu_hard_affinity, online); for_each_csched_balance_step( balance_step ) { @@ -1498,7 +1498,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step) * or counter. 
*/ if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY - && !__vcpu_has_node_affinity(vc, vc->cpu_affinity) ) + && !__vcpu_has_node_affinity(vc, vc->cpu_hard_affinity) ) continue; csched_balance_cpumask(vc, balance_step, csched_balance_mask); diff --git a/xen/common/sched_sedf.c b/xen/common/sched_sedf.c index 7c24171..c219aed 100644 --- a/xen/common/sched_sedf.c +++ b/xen/common/sched_sedf.c @@ -396,7 +396,7 @@ static int sedf_pick_cpu(const struct scheduler *ops, struct vcpu *v) cpumask_t *online; online = cpupool_scheduler_cpumask(v->domain->cpupool); - cpumask_and(&online_affinity, v->cpu_affinity, online); + cpumask_and(&online_affinity, v->cpu_hard_affinity, online); return cpumask_cycle(v->vcpu_id % cpumask_weight(&online_affinity) - 1, &online_affinity); } diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 0f45f07..c4236c5 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -194,9 +194,9 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor) */ v->processor = processor; if ( is_idle_domain(d) || d->is_pinned ) - cpumask_copy(v->cpu_affinity, cpumask_of(processor)); + cpumask_copy(v->cpu_hard_affinity, cpumask_of(processor)); else - cpumask_setall(v->cpu_affinity); + cpumask_setall(v->cpu_hard_affinity); /* Initialise the per-vcpu timers. 
*/ init_timer(&v->periodic_timer, vcpu_periodic_timer_fn, @@ -285,7 +285,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c) migrate_timer(&v->singleshot_timer, new_p); migrate_timer(&v->poll_timer, new_p); - cpumask_setall(v->cpu_affinity); + cpumask_setall(v->cpu_hard_affinity); lock = vcpu_schedule_lock_irq(v); v->processor = new_p; @@ -457,7 +457,7 @@ static void vcpu_migrate(struct vcpu *v) */ if ( pick_called && (new_lock == per_cpu(schedule_data, new_cpu).schedule_lock) && - cpumask_test_cpu(new_cpu, v->cpu_affinity) && + cpumask_test_cpu(new_cpu, v->cpu_hard_affinity) && cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) ) break; @@ -561,7 +561,7 @@ void restore_vcpu_affinity(struct domain *d) { printk(XENLOG_DEBUG "Restoring affinity for d%dv%d\n", d->domain_id, v->vcpu_id); - cpumask_copy(v->cpu_affinity, v->cpu_affinity_saved); + cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved); v->affinity_broken = 0; } @@ -604,20 +604,21 @@ int cpu_disable_scheduler(unsigned int cpu) unsigned long flags; spinlock_t *lock = vcpu_schedule_lock_irqsave(v, &flags); - cpumask_and(&online_affinity, v->cpu_affinity, c->cpu_valid); + cpumask_and(&online_affinity, v->cpu_hard_affinity, c->cpu_valid); if ( cpumask_empty(&online_affinity) && - cpumask_test_cpu(cpu, v->cpu_affinity) ) + cpumask_test_cpu(cpu, v->cpu_hard_affinity) ) { printk(XENLOG_DEBUG "Breaking affinity for d%dv%d\n", d->domain_id, v->vcpu_id); if (system_state == SYS_STATE_suspend) { - cpumask_copy(v->cpu_affinity_saved, v->cpu_affinity); + cpumask_copy(v->cpu_hard_affinity_saved, + v->cpu_hard_affinity); v->affinity_broken = 1; } - cpumask_setall(v->cpu_affinity); + cpumask_setall(v->cpu_hard_affinity); } if ( v->processor == cpu ) @@ -665,7 +666,7 @@ int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity) lock = vcpu_schedule_lock_irq(v); - cpumask_copy(v->cpu_affinity, affinity); + cpumask_copy(v->cpu_hard_affinity, affinity); /* Always ask the scheduler to 
re-evaluate placement * when changing the affinity */ diff --git a/xen/common/wait.c b/xen/common/wait.c index 3c9366c..3f6ff41 100644 --- a/xen/common/wait.c +++ b/xen/common/wait.c @@ -134,7 +134,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv) /* Save current VCPU affinity; force wakeup on *this* CPU only. */ wqv->wakeup_cpu = smp_processor_id(); - cpumask_copy(&wqv->saved_affinity, curr->cpu_affinity); + cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) { gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); @@ -183,7 +183,7 @@ void check_wakeup_from_wait(void) { /* Re-set VCPU affinity and re-enter the scheduler. */ struct vcpu *curr = current; - cpumask_copy(&wqv->saved_affinity, curr->cpu_affinity); + cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) { gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index cbdf377..40e5927 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -192,11 +192,11 @@ struct vcpu spinlock_t virq_lock; /* Bitmask of CPUs on which this VCPU may run. */ - cpumask_var_t cpu_affinity; + cpumask_var_t cpu_hard_affinity; /* Used to change affinity temporarily. */ - cpumask_var_t cpu_affinity_tmp; + cpumask_var_t cpu_hard_affinity_tmp; /* Used to restore affinity across S3. */ - cpumask_var_t cpu_affinity_saved; + cpumask_var_t cpu_hard_affinity_saved; /* Bitmask of CPUs which are holding onto this VCPU''s state. 
*/ cpumask_var_t vcpu_dirty_cpumask; @@ -792,7 +792,7 @@ void watchdog_domain_destroy(struct domain *d); #define has_hvm_container_domain(d) ((d)->guest_type != guest_type_pv) #define has_hvm_container_vcpu(v) (has_hvm_container_domain((v)->domain)) #define is_pinned_vcpu(v) ((v)->domain->is_pinned || \ - cpumask_weight((v)->cpu_affinity) == 1) + cpumask_weight((v)->cpu_hard_affinity) == 1) #ifdef HAS_PASSTHROUGH #define need_iommu(d) ((d)->need_iommu) #else
Dario Faggioli
2013-Nov-22 18:57 UTC
[PATCH v4 08/15] xen: sched: introduce soft-affinity and use it instead d->node-affinity
Before this change, each vcpu had its own vcpu-affinity (in v->cpu_affinity), representing the set of pcpus where the vcpu is allowed to run. Since NUMA-aware scheduling was introduced, the (credit1 only, for now) scheduler also tries as much as it can to run all the vcpus of a domain on one of the nodes that constitute the domain's node-affinity.

The idea here is making the mechanism more general by:
 * allowing for this 'preference' for some pcpus/nodes to be expressed on a per-vcpu basis, rather than for the domain as a whole. That is to say, each vcpu should have its own set of preferred pcpus/nodes, instead of it being the very same for all the vcpus of the domain;
 * generalizing the idea of 'preferred pcpus' beyond NUMA awareness and support. That is to say, independently of whether or not it is (mostly) useful on NUMA systems, it should be possible to specify, for each vcpu, a set of pcpus where it prefers to run (in addition to, and possibly unrelated to, the set of pcpus where it is allowed to run).

We will be calling this set of *preferred* pcpus the vcpu's soft affinity, and this change introduces it and starts using it for scheduling, replacing the indirect use of the domain's NUMA node-affinity. This is more general, as soft affinity does not have to be related to NUMA. Nevertheless, it allows achieving the same results as NUMA-aware scheduling, just by making soft affinity equal to the domain's node-affinity, for all the vCPUs (e.g., from the toolstack).

This also means renaming most of the NUMA-aware scheduling related functions, in credit1, to something more generic, hinting toward the concept of soft affinity rather than directly to NUMA awareness. As a side effect, this simplifies the code quite a bit. In fact, prior to this change, we needed to cache the translation of d->node_affinity (which is a nodemask_t) to a cpumask_t, since that is what scheduling decisions require (we used to keep it in node_affinity_cpumask).
This, and all the complicated logic required to keep it updated, is not necessary any longer. The high level description of NUMA placement and scheduling in docs/misc/xl-numa-placement.markdown is being updated too, to match the new architecture. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> --- Changes from v2: * this patch folds patches 6 ("xen: sched: make space for cpu_soft_affinity") and 10 ("xen: sched: use soft-affinity instead of domain''s node-affinity"), as suggested during review. ''Reviewed-by'' from George is there since both patch 6 and 10 had it, and I didn''t do anything else than squashing them. Changes from v1: * in v1, "7/12 xen: numa-sched: use per-vcpu node-affinity for actual scheduling" was doing something very similar to this patch. --- docs/misc/xl-numa-placement.markdown | 148 ++++++++++++++++++++++----------- xen/common/domain.c | 5 + xen/common/keyhandler.c | 2 xen/common/sched_credit.c | 153 +++++++++++++--------------------- xen/common/schedule.c | 3 + xen/include/xen/sched.h | 3 + 6 files changed, 168 insertions(+), 146 deletions(-) diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown index caa3fec..b1ed361 100644 --- a/docs/misc/xl-numa-placement.markdown +++ b/docs/misc/xl-numa-placement.markdown @@ -12,13 +12,6 @@ is quite more complex and slow. On these machines, a NUMA node is usually defined as a set of processor cores (typically a physical CPU package) and the memory directly attached to the set of cores. -The Xen hypervisor deals with NUMA machines by assigning to each domain -a "node affinity", i.e., a set of NUMA nodes of the host from which they -get their memory allocated. Also, even if the node affinity of a domain -is allowed to change on-line, it is very important to "place" the domain -correctly when it is fist created, as the most of its memory is allocated -at that time and can not (for now) be moved easily. 
- NUMA awareness becomes very important as soon as many domains start running memory-intensive workloads on a shared host. In fact, the cost of accessing non node-local memory locations is very high, and the @@ -27,14 +20,37 @@ performance degradation is likely to be noticeable. For more information, have a look at the [Xen NUMA Introduction][numa_intro] page on the Wiki. +## Xen and NUMA machines: the concept of _node-affinity_ ## + +The Xen hypervisor deals with NUMA machines through the concept of +_node-affinity_. The node-affinity of a domain is the set of NUMA nodes +of the host where the memory for the domain is being allocated (mostly, +at domain creation time). This is, at least in principle, different from +and unrelated to the vCPU (hard and soft, see below) scheduling affinity, +which instead is the set of pCPUs where the vCPU is allowed (or prefers) +to run. + +Of course, despite the fact that they belong to and affect different +subsystems, the domain node-affinity and the vCPUs' affinity are not +completely independent. +In fact, if the domain node-affinity is not explicitly specified by the +user, via the proper libxl calls or xl config item, it will be computed +based on the vCPUs' scheduling affinity. + +Notice that, even if the node affinity of a domain may change on-line, +it is very important to "place" the domain correctly when it is first +created, as most of its memory is allocated at that time and can +not (for now) be moved easily. + ### Placing via pinning and cpupools ### -The simplest way of placing a domain on a NUMA node is statically pinning -the domain's vCPUs to the pCPUs of the node. This goes under the name of -CPU affinity and can be set through the "cpus=" option in the config file -(more about this below). Another option is to pool together the pCPUs -spanning the node and put the domain in such a cpupool with the "pool=" -config option (as documented in our [Wiki][cpupools_howto]).
+The simplest way of placing a domain on a NUMA node is setting the hard +scheduling affinity of the domain''s vCPUs to the pCPUs of the node. This +also goes under the name of vCPU pinning, and can be done through the +"cpus=" option in the config file (more about this below). Another option +is to pool together the pCPUs spanning the node and put the domain in +such a _cpupool_ with the "pool=" config option (as documented in our +[Wiki][cpupools_howto]). In both the above cases, the domain will not be able to execute outside the specified set of pCPUs for any reasons, even if all those pCPUs are @@ -45,24 +61,45 @@ may come at he cost of some load imbalances. ### NUMA aware scheduling ### -If the credit scheduler is in use, the concept of node affinity defined -above does not only apply to memory. In fact, starting from Xen 4.3, the -scheduler always tries to run the domain''s vCPUs on one of the nodes in -its node affinity. Only if that turns out to be impossible, it will just -pick any free pCPU. - -This is, therefore, something more flexible than CPU affinity, as a domain -can still run everywhere, it just prefers some nodes rather than others. -Locality of access is less guaranteed than in the pinning case, but that -comes along with better chances to exploit all the host resources (e.g., -the pCPUs). - -In fact, if all the pCPUs in a domain''s node affinity are busy, it is -possible for the domain to run outside of there, but it is very likely that -slower execution (due to remote memory accesses) is still better than no -execution at all, as it would happen with pinning. For this reason, NUMA -aware scheduling has the potential of bringing substantial performances -benefits, although this will depend on the workload. +If using the credit1 scheduler, and starting from Xen 4.3, the scheduler +itself always tries to run the domain''s vCPUs on one of the nodes in +its node-affinity. Only if that turns out to be impossible, it will just +pick any free pCPU. 
Locality of access is less guaranteed than in the +pinning case, but that comes along with better chances to exploit all +the host resources (e.g., the pCPUs). + +Starting from Xen 4.4, credit1 supports two forms of affinity: hard and +soft, both on a per-vCPU basis. This means each vCPU can have its own +soft affinity, stating where such a vCPU prefers to execute. This is +less strict than what is (also starting from 4.4) called hard affinity, +as the vCPU can potentially run everywhere, it just prefers some pCPUs +rather than others. +In Xen 4.4, therefore, NUMA-aware scheduling is achieved by matching the +soft affinity of the vCPUs of a domain with its node-affinity. + +In fact, as it was for 4.3, if all the pCPUs in a vCPU's soft affinity +are busy, it is possible for the domain to run outside of there. The +idea is that slower execution (due to remote memory accesses) is still +better than no execution at all (as it would happen with pinning). For +this reason, NUMA aware scheduling has the potential of bringing +substantial performance benefits, although this will depend on the +workload. + +Notice that, for each vCPU, the following three scenarios are possible: + + * a vCPU *is pinned* to some pCPUs and *does not have* any soft affinity. + In this case, the vCPU is always scheduled on one of the pCPUs to which + it is pinned, without any specific preference among them; + * a vCPU *has* its own soft affinity and *is not* pinned to any particular + pCPU. In this case, the vCPU can run on every pCPU. Nevertheless, the + scheduler will try to have it running on one of the pCPUs in its soft + affinity; + * a vCPU *has* its own soft affinity and *is also* pinned to some + pCPUs. In this case, the vCPU is always scheduled on one of the pCPUs + onto which it is pinned, with, among them, a preference for the ones + that also form its soft affinity.
In case pinning and soft affinity + form two disjoint sets of pCPUs, pinning "wins", and the soft affinity + is just ignored. ## Guest placement in xl ## @@ -71,25 +108,23 @@ both manual or automatic placement of them across the host's NUMA nodes. Note that xm/xend does a very similar thing, the only differences being the details of the heuristics adopted for automatic placement (see below), -and the lack of support (in both xm/xend and the Xen versions where that\ +and the lack of support (in both xm/xend and the Xen versions where that was the default toolstack) for NUMA aware scheduling. ### Placing the guest manually ### Thanks to the "cpus=" option, it is possible to specify where a domain should be created and scheduled on, directly in its config file. This -affects NUMA placement and memory accesses as the hypervisor constructs -the node affinity of a VM basing right on its CPU affinity when it is -created. +affects NUMA placement and memory accesses as, in this case, the +hypervisor constructs the node-affinity of a VM based right on its +vCPU pinning when it is created. This is very simple and effective, but requires the user/system -administrator to explicitly specify affinities for each and every domain, +administrator to explicitly specify the pinning for each and every domain, or Xen won't be able to guarantee the locality for their memory accesses. -Notice that this also pins the domain's vCPUs to the specified set of -pCPUs, so it not only sets the domain's node affinity (its memory will -come from the nodes to which the pCPUs belong), but at the same time -forces the vCPUs of the domain to be scheduled on those same pCPUs. +That, of course, also means the vCPUs of the domain will only be able to +execute on those same pCPUs. ### Placing the guest automatically ### @@ -97,7 +132,9 @@ If no "cpus=" option is specified in the config file, libxl tries to figure out on its own on which node(s) the domain could fit best.
If it finds one (some), the domain''s node affinity get set to there, and both memory allocations and NUMA aware scheduling (for the credit -scheduler and starting from Xen 4.3) will comply with it. +scheduler and starting from Xen 4.3) will comply with it. Starting from +Xen 4.4, this also means that the mask resulting from this "fitting" +procedure will become the soft affinity of all the vCPUs of the domain. It is worthwhile noting that optimally fitting a set of VMs on the NUMA nodes of an host is an incarnation of the Bin Packing Problem. In fact, @@ -142,34 +179,43 @@ any placement from happening: libxl_defbool_set(&domain_build_info->numa_placement, false); -Also, if `numa_placement` is set to `true`, the domain must not -have any CPU affinity (i.e., `domain_build_info->cpumap` must -have all its bits set, as it is by default), or domain creation -will fail returning `ERROR_INVAL`. +Also, if `numa_placement` is set to `true`, the domain''s vCPUs must +not be pinned (i.e., `domain_build_info->cpumap` must have all its +bits set, as it is by default), or domain creation will fail with +`ERROR_INVAL`. Starting from Xen 4.3, in case automatic placement happens (and is -successful), it will affect the domain''s node affinity and _not_ its -CPU affinity. Namely, the domain''s vCPUs will not be pinned to any +successful), it will affect the domain''s node-affinity and _not_ its +vCPU pinning. Namely, the domain''s vCPUs will not be pinned to any pCPU on the host, but the memory from the domain will come from the selected node(s) and the NUMA aware scheduling (if the credit scheduler -is in use) will try to keep the domain there as much as possible. +is in use) will try to keep the domain''s vCPUs there as much as possible. Besides than that, looking and/or tweaking the placement algorithm search "Automatic NUMA placement" in libxl\_internal.h. Note this may change in future versions of Xen/libxl. 
+## Xen < 4.4 ## + +The concept of vCPU soft affinity has been introduced for the first time +in Xen 4.4. In 4.3, it is the domain''s node-affinity that drives the +NUMA-aware scheduler. The main difference is soft affinity is per-vCPU, +and so each vCPU can have its own mask of pCPUs, while node-affinity is +per-domain, that is the equivalent of having all the vCPUs with the same +soft affinity. + ## Xen < 4.3 ## As NUMA aware scheduling is a new feature of Xen 4.3, things are a little bit different for earlier version of Xen. If no "cpus=" option is specified and Xen 4.2 is in use, the automatic placement algorithm still runs, but the results is used to _pin_ the vCPUs of the domain to the output node(s). -This is consistent with what was happening with xm/xend, which were also -affecting the domain''s CPU affinity. +This is consistent with what was happening with xm/xend. On a version of Xen earlier than 4.2, there is not automatic placement at -all in xl or libxl, and hence no node or CPU affinity being affected. +all in xl or libxl, and hence no node-affinity, vCPU affinity or pinning +being introduced/modified. 
## Limitations ## diff --git a/xen/common/domain.c b/xen/common/domain.c index d8116c7..d6ac4d1 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -128,6 +128,7 @@ struct vcpu *alloc_vcpu( if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) || !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) || !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) || + !zalloc_cpumask_var(&v->cpu_soft_affinity) || !zalloc_cpumask_var(&v->vcpu_dirty_cpumask) ) goto fail_free; @@ -159,6 +160,7 @@ struct vcpu *alloc_vcpu( free_cpumask_var(v->cpu_hard_affinity); free_cpumask_var(v->cpu_hard_affinity_tmp); free_cpumask_var(v->cpu_hard_affinity_saved); + free_cpumask_var(v->cpu_soft_affinity); free_cpumask_var(v->vcpu_dirty_cpumask); free_vcpu_struct(v); return NULL; @@ -390,8 +392,6 @@ void domain_update_node_affinity(struct domain *d) node_set(node, d->node_affinity); } - sched_set_node_affinity(d, &d->node_affinity); - spin_unlock(&d->node_affinity_lock); free_cpumask_var(online_affinity); @@ -737,6 +737,7 @@ static void complete_domain_destroy(struct rcu_head *head) free_cpumask_var(v->cpu_hard_affinity); free_cpumask_var(v->cpu_hard_affinity_tmp); free_cpumask_var(v->cpu_hard_affinity_saved); + free_cpumask_var(v->cpu_soft_affinity); free_cpumask_var(v->vcpu_dirty_cpumask); free_vcpu_struct(v); } diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c index c11f577..42fb418 100644 --- a/xen/common/keyhandler.c +++ b/xen/common/keyhandler.c @@ -298,6 +298,8 @@ static void dump_domains(unsigned char key) printk("dirty_cpus=%s ", tmpstr); cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_hard_affinity); printk("cpu_affinity=%s\n", tmpstr); + cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_soft_affinity); + printk("cpu_soft_affinity=%s\n", tmpstr); printk(" pause_count=%d pause_flags=%lx\n", atomic_read(&v->pause_count), v->pause_flags); arch_dump_vcpu_info(v); diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index c6a2560..8b02b7b 100644 --- a/xen/common/sched_credit.c 
+++ b/xen/common/sched_credit.c @@ -112,10 +112,24 @@ /* - * Node Balancing + * Hard and soft affinity load balancing. + * + * Idea is each vcpu has some pcpus that it prefers, some that it does not + * prefer but is OK with, and some that it cannot run on at all. The first + * set of pcpus are the ones that are both in the soft affinity *and* in the + * hard affinity; the second set of pcpus are the ones that are in the hard + * affinity but *not* in the soft affinity; the third set of pcpus are the + * ones that are not in the hard affinity. + * + * We implement a two step balancing logic. Basically, every time there is + * the need to decide where to run a vcpu, we first check the soft affinity + * (well, actually, the && between soft and hard affinity), to see if we can + * send it where it prefers to (and can) run on. However, if the first step + * does not find any suitable and free pcpu, we fall back checking the hard + * affinity. */ -#define CSCHED_BALANCE_NODE_AFFINITY 0 -#define CSCHED_BALANCE_CPU_AFFINITY 1 +#define CSCHED_BALANCE_SOFT_AFFINITY 0 +#define CSCHED_BALANCE_HARD_AFFINITY 1 /* * Boot parameters @@ -138,7 +152,7 @@ struct csched_pcpu { /* * Convenience macro for accessing the per-PCPU cpumask we need for - * implementing the two steps (vcpu and node affinity) balancing logic. + * implementing the two steps (soft and hard affinity) balancing logic. * It is stored in csched_pcpu so that serialization is not an issue, * as there is a csched_pcpu for each PCPU and we always hold the * runqueue spin-lock when using this. @@ -178,9 +192,6 @@ struct csched_dom { struct list_head active_vcpu; struct list_head active_sdom_elem; struct domain *dom; - /* cpumask translated from the domain''s node-affinity. - * Basically, the CPUs we prefer to be scheduled on. 
*/ - cpumask_var_t node_affinity_cpumask; uint16_t active_vcpu_count; uint16_t weight; uint16_t cap; @@ -261,59 +272,28 @@ __runq_remove(struct csched_vcpu *svc) list_del_init(&svc->runq_elem); } -/* - * Translates node-affinity mask into a cpumask, so that we can use it during - * actual scheduling. That of course will contain all the cpus from all the - * set nodes in the original node-affinity mask. - * - * Note that any serialization needed to access mask safely is complete - * responsibility of the caller of this function/hook. - */ -static void csched_set_node_affinity( - const struct scheduler *ops, - struct domain *d, - nodemask_t *mask) -{ - struct csched_dom *sdom; - int node; - - /* Skip idle domain since it doesn''t even have a node_affinity_cpumask */ - if ( unlikely(is_idle_domain(d)) ) - return; - - sdom = CSCHED_DOM(d); - cpumask_clear(sdom->node_affinity_cpumask); - for_each_node_mask( node, *mask ) - cpumask_or(sdom->node_affinity_cpumask, sdom->node_affinity_cpumask, - &node_to_cpumask(node)); -} #define for_each_csched_balance_step(step) \ - for ( (step) = 0; (step) <= CSCHED_BALANCE_CPU_AFFINITY; (step)++ ) + for ( (step) = 0; (step) <= CSCHED_BALANCE_HARD_AFFINITY; (step)++ ) /* - * vcpu-affinity balancing is always necessary and must never be skipped. - * OTOH, if a domain''s node-affinity is said to be automatically computed - * (or if it just spans all the nodes), we can safely avoid dealing with - * node-affinity entirely. + * Hard affinity balancing is always necessary and must never be skipped. + * OTOH, if the vcpu''s soft affinity is full (it spans all the possible + * pcpus) we can safely avoid dealing with it entirely. 
* - * Node-affinity is also deemed meaningless in case it has empty - * intersection with mask, to cover the cases where using the node-affinity + * A vcpu''s soft affinity is also deemed meaningless in case it has empty + * intersection with mask, to cover the cases where using the soft affinity * mask seems legit, but would instead led to trying to schedule the vcpu * on _no_ pcpu! Typical use cases are for mask to be equal to the vcpu''s - * vcpu-affinity, or to the && of vcpu-affinity and the set of online cpus + * hard affinity, or to the && of hard affinity and the set of online cpus * in the domain''s cpupool. */ -static inline int __vcpu_has_node_affinity(const struct vcpu *vc, +static inline int __vcpu_has_soft_affinity(const struct vcpu *vc, const cpumask_t *mask) { - const struct domain *d = vc->domain; - const struct csched_dom *sdom = CSCHED_DOM(d); - - if ( d->auto_node_affinity - || cpumask_full(sdom->node_affinity_cpumask) - || !cpumask_intersects(sdom->node_affinity_cpumask, mask) ) + if ( cpumask_full(vc->cpu_soft_affinity) + || !cpumask_intersects(vc->cpu_soft_affinity, mask) ) return 0; return 1; @@ -321,23 +301,22 @@ static inline int __vcpu_has_node_affinity(const struct vcpu *vc, /* * Each csched-balance step uses its own cpumask. This function determines - * which one (given the step) and copies it in mask. For the node-affinity - * balancing step, the pcpus that are not part of vc''s vcpu-affinity are + * which one (given the step) and copies it in mask. For the soft affinity + * balancing step, the pcpus that are not part of vc''s hard affinity are * filtered out from the result, to avoid running a vcpu where it would * like, but is not allowed to! 
*/ static void csched_balance_cpumask(const struct vcpu *vc, int step, cpumask_t *mask) { - if ( step == CSCHED_BALANCE_NODE_AFFINITY ) + if ( step == CSCHED_BALANCE_SOFT_AFFINITY ) { - cpumask_and(mask, CSCHED_DOM(vc->domain)->node_affinity_cpumask, - vc->cpu_hard_affinity); + cpumask_and(mask, vc->cpu_soft_affinity, vc->cpu_hard_affinity); if ( unlikely(cpumask_empty(mask)) ) cpumask_copy(mask, vc->cpu_hard_affinity); } - else /* step == CSCHED_BALANCE_CPU_AFFINITY */ + else /* step == CSCHED_BALANCE_HARD_AFFINITY */ cpumask_copy(mask, vc->cpu_hard_affinity); } @@ -398,15 +377,15 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new) else if ( !idlers_empty ) { /* - * Node and vcpu-affinity balancing loop. For vcpus without - * a useful node-affinity, consider vcpu-affinity only. + * Soft and hard affinity balancing loop. For vcpus without + * a useful soft affinity, consider hard affinity only. */ for_each_csched_balance_step( balance_step ) { int new_idlers_empty; - if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY - && !__vcpu_has_node_affinity(new->vcpu, + if ( balance_step == CSCHED_BALANCE_SOFT_AFFINITY + && !__vcpu_has_soft_affinity(new->vcpu, new->vcpu->cpu_hard_affinity) ) continue; @@ -418,11 +397,11 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new) /* * Let''s not be too harsh! If there aren''t idlers suitable - * for new in its node-affinity mask, make sure we check its - * vcpu-affinity as well, before taking final decisions. + * for new in its soft affinity mask, make sure we check its + * hard affinity as well, before taking final decisions. */ if ( new_idlers_empty - && balance_step == CSCHED_BALANCE_NODE_AFFINITY ) + && balance_step == CSCHED_BALANCE_SOFT_AFFINITY ) continue; /* @@ -649,23 +628,23 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit) /* * We want to pick up a pcpu among the ones that are online and * can accommodate vc, which is basically what we computed above - * and stored in cpus. 
As far as vcpu-affinity is concerned, + * and stored in cpus. As far as hard affinity is concerned, * there always will be at least one of these pcpus, hence cpus * is never empty and the calls to cpumask_cycle() and * cpumask_test_cpu() below are ok. * - * On the other hand, when considering node-affinity too, it + * On the other hand, when considering soft affinity too, it * is possible for the mask to become empty (for instance, if the * domain has been put in a cpupool that does not contain any of the - * nodes in its node-affinity), which would result in the ASSERT()-s + * pcpus in its soft affinity), which would result in the ASSERT()-s * inside cpumask_*() operations triggering (in debug builds). * - * Therefore, in this case, we filter the node-affinity mask against - * cpus and, if the result is empty, we just skip the node-affinity + * Therefore, in this case, we filter the soft affinity mask against + * cpus and, if the result is empty, we just skip the soft affinity * balancing step all together. 
*/ - if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY - && !__vcpu_has_node_affinity(vc, &cpus) ) + if ( balance_step == CSCHED_BALANCE_SOFT_AFFINITY + && !__vcpu_has_soft_affinity(vc, &cpus) ) continue; /* Pick an online CPU from the proper affinity mask */ @@ -1122,13 +1101,6 @@ csched_alloc_domdata(const struct scheduler *ops, struct domain *dom) if ( sdom == NULL ) return NULL; - if ( !alloc_cpumask_var(&sdom->node_affinity_cpumask) ) - { - xfree(sdom); - return NULL; - } - cpumask_setall(sdom->node_affinity_cpumask); - /* Initialize credit and weight */ INIT_LIST_HEAD(&sdom->active_vcpu); INIT_LIST_HEAD(&sdom->active_sdom_elem); @@ -1158,9 +1130,6 @@ csched_dom_init(const struct scheduler *ops, struct domain *dom) static void csched_free_domdata(const struct scheduler *ops, void *data) { - struct csched_dom *sdom = data; - - free_cpumask_var(sdom->node_affinity_cpumask); xfree(data); } @@ -1486,19 +1455,19 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step) BUG_ON( is_idle_vcpu(vc) ); /* - * If the vcpu has no useful node-affinity, skip this vcpu. - * In fact, what we want is to check if we have any node-affine - * work to steal, before starting to look at vcpu-affine work. + * If the vcpu has no useful soft affinity, skip this vcpu. + * In fact, what we want is to check if we have any "soft-affine + * work" to steal, before starting to look at "hard-affine work". * * Notice that, if not even one vCPU on this runq has a useful - * node-affinity, we could have avoid considering this runq for - * a node balancing step in the first place. This, for instance, + * soft affinity, we could have avoid considering this runq for + * a soft balancing step in the first place. This, for instance, * can be implemented by taking note of on what runq there are - * vCPUs with useful node-affinities in some sort of bitmap + * vCPUs with useful soft affinities in some sort of bitmap * or counter. 
*/ - if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY - && !__vcpu_has_node_affinity(vc, vc->cpu_hard_affinity) ) + if ( balance_step == CSCHED_BALANCE_SOFT_AFFINITY + && !__vcpu_has_soft_affinity(vc, vc->cpu_hard_affinity) ) continue; csched_balance_cpumask(vc, balance_step, csched_balance_mask); @@ -1546,17 +1515,17 @@ csched_load_balance(struct csched_private *prv, int cpu, SCHED_STAT_CRANK(load_balance_other); /* - * Let''s look around for work to steal, taking both vcpu-affinity - * and node-affinity into account. More specifically, we check all + * Let''s look around for work to steal, taking both hard affinity + * and soft affinity into account. More specifically, we check all * the non-idle CPUs'' runq, looking for: - * 1. any node-affine work to steal first, - * 2. if not finding anything, any vcpu-affine work to steal. + * 1. any "soft-affine work" to steal first, + * 2. if not finding anything, any "hard-affine work" to steal. */ for_each_csched_balance_step( bstep ) { /* * We peek at the non-idling CPUs in a node-wise fashion. In fact, - * it is more likely that we find some node-affine work on our same + * it is more likely that we find some affine work on our same * node, not to mention that migrating vcpus within the same node * could well expected to be cheaper than across-nodes (memory * stays local, there might be some node-wide cache[s], etc.). @@ -1982,8 +1951,6 @@ const struct scheduler sched_credit_def = { .adjust = csched_dom_cntl, .adjust_global = csched_sys_cntl, - .set_node_affinity = csched_set_node_affinity, - .pick_cpu = csched_cpu_pick, .do_schedule = csched_schedule, diff --git a/xen/common/schedule.c b/xen/common/schedule.c index c4236c5..c9ae521 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -198,6 +198,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor) else cpumask_setall(v->cpu_hard_affinity); + cpumask_setall(v->cpu_soft_affinity); + /* Initialise the per-vcpu timers. 
*/ init_timer(&v->periodic_timer, vcpu_periodic_timer_fn, v, v->processor); @@ -286,6 +288,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c) migrate_timer(&v->poll_timer, new_p); cpumask_setall(v->cpu_hard_affinity); + cpumask_setall(v->cpu_soft_affinity); lock = vcpu_schedule_lock_irq(v); v->processor = new_p; diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 40e5927..3575312 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -198,6 +198,9 @@ struct vcpu /* Used to restore affinity across S3. */ cpumask_var_t cpu_hard_affinity_saved; + /* Bitmask of CPUs on which this VCPU prefers to run. */ + cpumask_var_t cpu_soft_affinity; + /* Bitmask of CPUs which are holding onto this VCPU''s state. */ cpumask_var_t vcpu_dirty_cpumask;
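The two-step balancing logic that this patch introduces in sched_credit.c can be sketched with plain 64-bit masks standing in for cpumask_t (a toy model: `balance_mask()` and the `BALANCE_*` names are illustrative, not the Xen API):

```c
#include <assert.h>
#include <stdint.h>

#define BALANCE_SOFT 0  /* step 0: soft && hard affinity */
#define BALANCE_HARD 1  /* step 1: hard affinity only */

/* Toy stand-in for csched_balance_cpumask(): each bit is a pcpu. */
static uint64_t balance_mask(uint64_t hard, uint64_t soft, int step)
{
    if (step == BALANCE_SOFT) {
        uint64_t m = soft & hard;   /* where the vcpu prefers *and* may run */
        return m ? m : hard;        /* empty intersection: fall back to hard */
    }
    return hard;                    /* BALANCE_HARD: anywhere it may run */
}
```

As in the patch, a soft affinity whose intersection with the hard affinity is empty is deemed useless, and the soft step degenerates to the hard-affinity mask.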
Dario Faggioli
2013-Nov-22 18:57 UTC
[PATCH v4 09/15] xen: derive NUMA node affinity from hard and soft CPU affinity
if a domain's NUMA node-affinity (which is what controls memory allocations) is provided by the user/toolstack, it is simply not touched. However, if the user does not say anything, leaving it all to Xen, let's compute it in the following way:

 1. cpupool's cpus & hard-affinity & soft-affinity
 2. if (1) is empty: cpupool's cpus & hard-affinity

This guarantees that memory is allocated from the narrowest possible set of NUMA nodes, and makes it relatively easy to set up NUMA-aware scheduling on top of soft affinity.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
---
Changes from v3:
 * avoid pointless calls to cpumask_clear(), as requested during review;
 * ASSERT() non-emptiness of cpupool & hard affinity, as suggested during review.

Changes from v2:
 * the loop computing the mask is now only executed when it is really useful, as suggested during review;
 * the loop, and all the cpumask handling, is optimized in a way similar to what was suggested during review.
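The two rules above can be sketched as follows, with 64-bit masks in place of cpumask_t (an illustrative model, not the hypervisor code, which then maps the resulting pcpus to nodes):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy sketch of the node-affinity fallback rule:
 * use (cpupool cpus & hard & soft) if non-empty, else (cpupool cpus & hard).
 * Bits stand for pcpus.
 */
static uint64_t numa_affinity_cpus(uint64_t pool, uint64_t hard, uint64_t soft)
{
    uint64_t dom = pool & hard;         /* rule 2: where the domain can run */
    uint64_t dom_soft = dom & soft;     /* rule 1: ...and prefers to run */
    return dom_soft ? dom_soft : dom;   /* fall back when rule 1 is empty */
}
```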
--- xen/common/domain.c | 61 +++++++++++++++++++++++++++++++++------------------ 1 file changed, 39 insertions(+), 22 deletions(-) diff --git a/xen/common/domain.c b/xen/common/domain.c index d6ac4d1..4adc123 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -353,17 +353,17 @@ struct domain *domain_create( void domain_update_node_affinity(struct domain *d) { - cpumask_var_t cpumask; - cpumask_var_t online_affinity; + cpumask_var_t dom_cpumask, dom_cpumask_soft; + cpumask_t *dom_affinity; const cpumask_t *online; struct vcpu *v; - unsigned int node; + unsigned int cpu; - if ( !zalloc_cpumask_var(&cpumask) ) + if ( !zalloc_cpumask_var(&dom_cpumask) ) return; - if ( !alloc_cpumask_var(&online_affinity) ) + if ( !zalloc_cpumask_var(&dom_cpumask_soft) ) { - free_cpumask_var(cpumask); + free_cpumask_var(dom_cpumask); return; } @@ -371,31 +371,48 @@ void domain_update_node_affinity(struct domain *d) spin_lock(&d->node_affinity_lock); - for_each_vcpu ( d, v ) - { - cpumask_and(online_affinity, v->cpu_hard_affinity, online); - cpumask_or(cpumask, cpumask, online_affinity); - } - /* - * If d->auto_node_affinity is true, the domain''s node-affinity mask - * (d->node_affinity) is automaically computed from all the domain''s - * vcpus'' vcpu-affinity masks (the union of which we have just built - * above in cpumask). OTOH, if d->auto_node_affinity is false, we - * must leave the node-affinity of the domain alone. + * If d->auto_node_affinity is true, let''s compute the domain''s + * node-affinity and update d->node_affinity accordingly. if false, + * just leave d->auto_node_affinity alone. */ if ( d->auto_node_affinity ) { + /* + * We want the narrowest possible set of pcpus (to get the narowest + * possible set of nodes). What we need is the cpumask of where the + * domain can run (the union of the hard affinity of all its vcpus), + * and the full mask of where it would prefer to run (the union of + * the soft affinity of all its various vcpus). Let''s build them. 
+ */ + for_each_vcpu ( d, v ) + { + cpumask_or(dom_cpumask, dom_cpumask, v->cpu_hard_affinity); + cpumask_or(dom_cpumask_soft, dom_cpumask_soft, + v->cpu_soft_affinity); + } + /* Filter out non-online cpus */ + cpumask_and(dom_cpumask, dom_cpumask, online); + ASSERT(!cpumask_empty(dom_cpumask)); + /* And compute the intersection between hard, online and soft */ + cpumask_and(dom_cpumask_soft, dom_cpumask_soft, dom_cpumask); + + /* + * If not empty, the intersection of hard, soft and online is the + * narrowest set we want. If empty, we fall back to hard&online. + */ + dom_affinity = cpumask_empty(dom_cpumask_soft) ? + dom_cpumask : dom_cpumask_soft; + nodes_clear(d->node_affinity); - for_each_online_node ( node ) - if ( cpumask_intersects(&node_to_cpumask(node), cpumask) ) - node_set(node, d->node_affinity); + for_each_cpu( cpu, dom_affinity ) + node_set(cpu_to_node(cpu), d->node_affinity); } spin_unlock(&d->node_affinity_lock); - free_cpumask_var(online_affinity); - free_cpumask_var(cpumask); + free_cpumask_var(dom_cpumask_soft); + free_cpumask_var(dom_cpumask); }
Dario Faggioli
2013-Nov-22 18:57 UTC
[PATCH v4 10/15] xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
by adding a flag with which the caller specifies which one it cares about. Also add another cpumap there, so that, in case of DOMCTL_setvcpuaffinity, Xen can return to the caller the "effective affinity" of the vcpu. We call effective affinity the intersection of the cpupool's cpus, the (new?) hard affinity and the (new?) soft affinity. The purpose of this is to allow the toolstack to figure out whether or not the requested change produced sensible results, when combined with the other settings that are already in place.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
---
Changes since v3:
 * no longer discarding possible errors. Also, roll back setting hard affinity if setting soft affinity fails afterwards, so that the caller really sees no changes when the call fails, as requested during review;
 * fixed -EFAULT --> -ENOMEM in case of a failed memory allocation, as requested during review;
 * removed unnecessary use of a pointer to pointer, as requested during review.

Changes from v2:
 * in DOMCTL_[sg]etvcpuaffinity, flags is really a flag field now, i.e., we accept requests for setting and getting: (1) only hard affinity; (2) only soft affinity; (3) both; as suggested during review.
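The "effective affinity" reported back to the caller can be modelled like this (a sketch with 64-bit masks and illustrative names; the real code uses cpumask_t and the XEN_VCPUAFFINITY_* flags):

```c
#include <assert.h>
#include <stdint.h>

#define AFF_HARD (1U << 0)  /* mirrors XEN_VCPUAFFINITY_HARD */
#define AFF_SOFT (1U << 1)  /* mirrors XEN_VCPUAFFINITY_SOFT */

/* Effective affinity: cpupool cpus & hard affinity, further narrowed
 * by soft affinity when the soft affinity was (also) being set. */
static uint64_t effective_affinity(uint64_t online, uint64_t hard,
                                   uint64_t soft, uint32_t flags)
{
    uint64_t eff = online & hard;
    if (flags & AFF_SOFT)
        eff &= soft;
    return eff;
}
```

Comparing this result against the requested cpumap is how the toolstack can tell whether its request was narrowed down by the other settings in place.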
--- tools/libxc/xc_domain.c | 4 +- xen/arch/x86/traps.c | 4 +- xen/common/domctl.c | 92 +++++++++++++++++++++++++++++++++++++++++-- xen/common/schedule.c | 35 +++++++++++----- xen/common/wait.c | 6 +-- xen/include/public/domctl.h | 15 ++++++- xen/include/xen/sched.h | 3 + 7 files changed, 134 insertions(+), 25 deletions(-) diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index 1ccafc5..f9ae4bf 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -215,7 +215,9 @@ int xc_vcpu_setaffinity(xc_interface *xch, domctl.cmd = XEN_DOMCTL_setvcpuaffinity; domctl.domain = (domid_t)domid; - domctl.u.vcpuaffinity.vcpu = vcpu; + domctl.u.vcpuaffinity.vcpu = vcpu; + /* Soft affinity is there, but not used anywhere for now, so... */ + domctl.u.vcpuaffinity.flags = XEN_VCPUAFFINITY_HARD; memcpy(local, cpumap, cpusize); diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 157031e..ff4523b 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -3103,7 +3103,7 @@ static void nmi_mce_softirq(void) * Make sure to wakeup the vcpu on the * specified processor. */ - vcpu_set_affinity(st->vcpu, cpumask_of(st->processor)); + vcpu_set_hard_affinity(st->vcpu, cpumask_of(st->processor)); /* Affinity is restored in the iret hypercall. 
*/ } @@ -3132,7 +3132,7 @@ void async_exception_cleanup(struct vcpu *curr) if ( !cpumask_empty(curr->cpu_hard_affinity_tmp) && !cpumask_equal(curr->cpu_hard_affinity_tmp, curr->cpu_hard_affinity) ) { - vcpu_set_affinity(curr, curr->cpu_hard_affinity_tmp); + vcpu_set_hard_affinity(curr, curr->cpu_hard_affinity_tmp); cpumask_clear(curr->cpu_hard_affinity_tmp); } diff --git a/xen/common/domctl.c b/xen/common/domctl.c index 5e0ac5c..6081d22 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -616,20 +616,102 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) if ( op->cmd == XEN_DOMCTL_setvcpuaffinity ) { - cpumask_var_t new_affinity; + cpumask_var_t new_affinity, old_affinity; + cpumask_t *online; + + /* We need this to restore hard affinity if setting soft fails */ + if ( !alloc_cpumask_var(&old_affinity) ) + { + ret = -ENOMEM; + break; + } + cpumask_copy(old_affinity, v->cpu_hard_affinity); ret = xenctl_bitmap_to_cpumask( &new_affinity, &op->u.vcpuaffinity.cpumap); - if ( !ret ) + if ( ret ) + { + free_cpumask_var(old_affinity); + break; + } + + ret = -EINVAL; + if ( op->u.vcpuaffinity.flags == 0 ) + goto setvcpuaffinity_out; + + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_HARD ) + { + ret = vcpu_set_hard_affinity(v, new_affinity); + if ( ret ) + goto setvcpuaffinity_out; + } + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT ) + { + ret = vcpu_set_soft_affinity(v, new_affinity); + if ( ret ) + { + /* + * Since we''re returning error, the caller expects nothing + * happened, so we rollback the changes to hard affinity + * (if any). + */ + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_HARD ) + vcpu_set_hard_affinity(v, old_affinity); + goto setvcpuaffinity_out; + } + } + + /* + * Report back to the caller what the "effective affinity", that + * is the intersection of cpupool''s pcpus, the (new?) hard + * affinity and the (new?) soft-affinity. 
+ */ + if ( !guest_handle_is_null(op->u.vcpuaffinity.eff_cpumap.bitmap) ) { - ret = vcpu_set_affinity(v, new_affinity); - free_cpumask_var(new_affinity); + online = cpupool_online_cpumask(v->domain->cpupool); + cpumask_and(new_affinity, online, v->cpu_hard_affinity); + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT) + cpumask_and(new_affinity, new_affinity, + v->cpu_soft_affinity); + + ret = cpumask_to_xenctl_bitmap( + &op->u.vcpuaffinity.eff_cpumap, new_affinity); } + + setvcpuaffinity_out: + free_cpumask_var(new_affinity); + free_cpumask_var(old_affinity); } else { + cpumask_var_t affinity; + + /* + * If the caller asks for both _HARD and _SOFT, what we return + * is the intersection of hard and soft affinity for the vcpu. + */ + if ( !alloc_cpumask_var(&affinity) ) + { + ret = -ENOMEM; + break; + } + cpumask_setall(affinity); + + if ( op->u.vcpuaffinity.flags == 0 ) + { + ret = -EINVAL; + free_cpumask_var(affinity); + break; + } + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_HARD ) + cpumask_copy(affinity, v->cpu_hard_affinity); + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT ) + cpumask_and(affinity, affinity, v->cpu_soft_affinity); + ret = cpumask_to_xenctl_bitmap( - &op->u.vcpuaffinity.cpumap, v->cpu_hard_affinity); + &op->u.vcpuaffinity.cpumap, affinity); + + free_cpumask_var(affinity); } } break; diff --git a/xen/common/schedule.c b/xen/common/schedule.c index c9ae521..b1e9b08 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -654,22 +654,14 @@ void sched_set_node_affinity(struct domain *d, nodemask_t *mask) SCHED_OP(DOM2OP(d), set_node_affinity, d, mask); } -int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity) +static int vcpu_set_affinity( + struct vcpu *v, const cpumask_t *affinity, cpumask_t *which) { - cpumask_t online_affinity; - cpumask_t *online; spinlock_t *lock; - if ( v->domain->is_pinned ) - return -EINVAL; - online = VCPU2ONLINE(v); - cpumask_and(&online_affinity, affinity, online); - if ( 
cpumask_empty(&online_affinity) ) - return -EINVAL; - lock = vcpu_schedule_lock_irq(v); - cpumask_copy(v->cpu_hard_affinity, affinity); + cpumask_copy(which, affinity); /* Always ask the scheduler to re-evaluate placement * when changing the affinity */ @@ -688,6 +680,27 @@ int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity) return 0; } +int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity) +{ + cpumask_t online_affinity; + cpumask_t *online; + + if ( v->domain->is_pinned ) + return -EINVAL; + + online = VCPU2ONLINE(v); + cpumask_and(&online_affinity, affinity, online); + if ( cpumask_empty(&online_affinity) ) + return -EINVAL; + + return vcpu_set_affinity(v, affinity, v->cpu_hard_affinity); +} + +int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity) +{ + return vcpu_set_affinity(v, affinity, v->cpu_soft_affinity); +} + /* Block the currently-executing domain until a pertinent event occurs. */ void vcpu_block(void) { diff --git a/xen/common/wait.c b/xen/common/wait.c index 3f6ff41..1f6b597 100644 --- a/xen/common/wait.c +++ b/xen/common/wait.c @@ -135,7 +135,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv) /* Save current VCPU affinity; force wakeup on *this* CPU only. */ wqv->wakeup_cpu = smp_processor_id(); cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); - if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) + if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) { gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); domain_crash_synchronous(); @@ -166,7 +166,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv) static void __finish_wait(struct waitqueue_vcpu *wqv) { wqv->esp = NULL; - (void)vcpu_set_affinity(current, &wqv->saved_affinity); + (void)vcpu_set_hard_affinity(current, &wqv->saved_affinity); } void check_wakeup_from_wait(void) @@ -184,7 +184,7 @@ void check_wakeup_from_wait(void) /* Re-set VCPU affinity and re-enter the scheduler. 
*/ struct vcpu *curr = current; cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); - if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) + if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) { gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); domain_crash_synchronous(); diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h index 01a3652..4f71450 100644 --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -300,8 +300,19 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_nodeaffinity_t); /* XEN_DOMCTL_setvcpuaffinity */ /* XEN_DOMCTL_getvcpuaffinity */ struct xen_domctl_vcpuaffinity { - uint32_t vcpu; /* IN */ - struct xenctl_bitmap cpumap; /* IN/OUT */ + /* IN variables. */ + uint32_t vcpu; + /* Set/get the hard affinity for vcpu */ +#define _XEN_VCPUAFFINITY_HARD 0 +#define XEN_VCPUAFFINITY_HARD (1U<<_XEN_VCPUAFFINITY_HARD) + /* Set/get the soft affinity for vcpu */ +#define _XEN_VCPUAFFINITY_SOFT 1 +#define XEN_VCPUAFFINITY_SOFT (1U<<_XEN_VCPUAFFINITY_SOFT) + uint32_t flags; + /* IN/OUT variables. */ + struct xenctl_bitmap cpumap; + /* OUT variables. */ + struct xenctl_bitmap eff_cpumap; }; typedef struct xen_domctl_vcpuaffinity xen_domctl_vcpuaffinity_t; DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpuaffinity_t); diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 3575312..0f728b3 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -755,7 +755,8 @@ void scheduler_free(struct scheduler *sched); int schedule_cpu_switch(unsigned int cpu, struct cpupool *c); void vcpu_force_reschedule(struct vcpu *v); int cpu_disable_scheduler(unsigned int cpu); -int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity); +int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity); +int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity); void restore_vcpu_affinity(struct domain *d); void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate);
Dario Faggioli
2013-Nov-22 18:58 UTC
[PATCH v4 11/15] libxc: get and set soft and hard affinity
by using the new flag introduced in the parameters of the DOMCTL_{get,set}_vcpuaffinity hypercall. This happens by adding a new parameter (flags) to xc_vcpu_setaffinity() and xc_vcpu_getaffinity(), so that the caller can decide to set either the soft or hard affinity, or even both. In case of setting both hard and soft, they are set to the same cpumap. xc_vcpu_setaffinity() also takes another new param, for reporting back to the caller what affinity the scheduler will actually use after a successful call. In case of asking to get both hard and soft, what the caller gets is the intersection of the two. In-tree callers are also fixed to cope with the new interface.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
Changes from v2:
 * better cleanup logic in _vcpu_setaffinity() (regarding xc_hypercall_buffer_{alloc,free}()), as suggested during review;
 * make it more evident that DOMCTL_setvcpuaffinity has an out parameter, by calling it ecpumap_out, and improving the comment wrt that;
 * change the interface and have xc_vcpu_[sg]etaffinity() take the new parameters (flags and ecpumap_out) and fix the in-tree callers.
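The get side's flag semantics (hard only, soft only, or their intersection when both flags are set) can be sketched like so (toy masks and illustrative names; the real xc_vcpu_getaffinity() fills an xc_cpumap_t via the domctl):

```c
#include <assert.h>
#include <stdint.h>

#define GET_HARD (1U << 0)  /* mirrors XEN_VCPUAFFINITY_HARD */
#define GET_SOFT (1U << 1)  /* mirrors XEN_VCPUAFFINITY_SOFT */

/* What a getaffinity-style call returns for each flags combination;
 * flags == 0 is invalid (the real hypercall fails with -EINVAL). */
static uint64_t get_affinity(uint64_t hard, uint64_t soft, uint32_t flags)
{
    uint64_t m = ~UINT64_C(0);  /* start from "all pcpus" */
    if (!flags)
        return 0;
    if (flags & GET_HARD)
        m &= hard;
    if (flags & GET_SOFT)
        m &= soft;
    return m;
}
```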
--- tools/libxc/xc_domain.c | 47 +++++++++++++++++++++-------------- tools/libxc/xenctrl.h | 44 ++++++++++++++++++++++++++++++++- tools/libxl/libxl.c | 7 ++++- tools/ocaml/libs/xc/xenctrl_stubs.c | 8 ++++-- tools/python/xen/lowlevel/xc/xc.c | 6 +++- 5 files changed, 86 insertions(+), 26 deletions(-) diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index f9ae4bf..c137351 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -192,44 +192,52 @@ int xc_domain_node_getaffinity(xc_interface *xch, int xc_vcpu_setaffinity(xc_interface *xch, uint32_t domid, int vcpu, - xc_cpumap_t cpumap) + xc_cpumap_t cpumap, + uint32_t flags, + xc_cpumap_t ecpumap_out) { DECLARE_DOMCTL; - DECLARE_HYPERCALL_BUFFER(uint8_t, local); + DECLARE_HYPERCALL_BUFFER(uint8_t, cpumap_local); + DECLARE_HYPERCALL_BUFFER(uint8_t, ecpumap_local); int ret = -1; int cpusize; cpusize = xc_get_cpumap_size(xch); - if (!cpusize) + if ( !cpusize ) { PERROR("Could not get number of cpus"); - goto out; + return -1; } - local = xc_hypercall_buffer_alloc(xch, local, cpusize); - if ( local == NULL ) + cpumap_local = xc_hypercall_buffer_alloc(xch, cpumap_local, cpusize); + ecpumap_local = xc_hypercall_buffer_alloc(xch, ecpumap_local, cpusize); + if ( cpumap_local == NULL || cpumap_local == NULL) { - PERROR("Could not allocate memory for setvcpuaffinity domctl hypercall"); + PERROR("Could not allocate hcall buffers for DOMCTL_setvcpuaffinity"); goto out; } domctl.cmd = XEN_DOMCTL_setvcpuaffinity; domctl.domain = (domid_t)domid; domctl.u.vcpuaffinity.vcpu = vcpu; - /* Soft affinity is there, but not used anywhere for now, so... 
*/ - domctl.u.vcpuaffinity.flags = XEN_VCPUAFFINITY_HARD; - - memcpy(local, cpumap, cpusize); - - set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, local); + domctl.u.vcpuaffinity.flags = flags; + memcpy(cpumap_local, cpumap, cpusize); + set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, cpumap_local); domctl.u.vcpuaffinity.cpumap.nr_bits = cpusize * 8; + set_xen_guest_handle(domctl.u.vcpuaffinity.eff_cpumap.bitmap, + ecpumap_local); + domctl.u.vcpuaffinity.eff_cpumap.nr_bits = cpusize * 8; + ret = do_domctl(xch, &domctl); - xc_hypercall_buffer_free(xch, local); + if ( ecpumap_out != NULL ) + memcpy(ecpumap_out, ecpumap_local, cpusize); out: + xc_hypercall_buffer_free(xch, cpumap_local); + xc_hypercall_buffer_free(xch, ecpumap_local); return ret; } @@ -237,6 +245,7 @@ int xc_vcpu_setaffinity(xc_interface *xch, int xc_vcpu_getaffinity(xc_interface *xch, uint32_t domid, int vcpu, + uint32_t flags, xc_cpumap_t cpumap) { DECLARE_DOMCTL; @@ -245,22 +254,23 @@ int xc_vcpu_getaffinity(xc_interface *xch, int cpusize; cpusize = xc_get_cpumap_size(xch); - if (!cpusize) + if ( !cpusize ) { PERROR("Could not get number of cpus"); - goto out; + return -1; } local = xc_hypercall_buffer_alloc(xch, local, cpusize); - if (local == NULL) + if ( local == NULL ) { PERROR("Could not allocate memory for getvcpuaffinity domctl hypercall"); - goto out; + return -1; } domctl.cmd = XEN_DOMCTL_getvcpuaffinity; domctl.domain = (domid_t)domid; domctl.u.vcpuaffinity.vcpu = vcpu; + domctl.u.vcpuaffinity.flags = flags; set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, local); domctl.u.vcpuaffinity.cpumap.nr_bits = cpusize * 8; @@ -270,7 +280,6 @@ int xc_vcpu_getaffinity(xc_interface *xch, memcpy(cpumap, local, cpusize); xc_hypercall_buffer_free(xch, local); -out: return ret; } diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h index 4ac6b8a..a97ed67 100644 --- a/tools/libxc/xenctrl.h +++ b/tools/libxc/xenctrl.h @@ -579,13 +579,55 @@ int 
xc_domain_node_getaffinity(xc_interface *xch, uint32_t domind, xc_nodemap_t nodemap); +/** + * This function specifies the CPU affinity for a vcpu. + * + * There are two kinds of affinity. Soft affinity is the set of pcpus on + * which a vcpu prefers to run. Hard affinity is the set of pcpus on which + * a vcpu is allowed to run. + * If flags contains *only* XEN_VCPUAFFINITY_SOFT, it is the soft affinity + * that is set. If flags contains *only* XEN_VCPUAFFINITY_HARD, it is the + * hard affinity that is set. If flags contains *both*, both are set to the + * same value, provided in cpumap. + * + * The function also returns the effective affinity, via the ecpumap_out + * parameter. Effective affinity is the intersection of soft affinity, hard + * affinity and the set of the cpus of the cpupool the domain belongs to. + * It basically is what the Xen scheduler will actually use. Reporting it + * back to the caller allows the caller to check whether that matches, or + * at least is good enough for, the intended purposes. + * + * @param xch a handle to an open hypervisor interface. + * @param domid the id of the domain to which the vcpu belongs + * @param vcpu the vcpu id within the domain + * @param cpumap the (hard, soft, both) new affinity map one wants to set + * @param flags what we want to set + * @param ecpumap_out where the effective affinity for the vcpu is returned + */ int xc_vcpu_setaffinity(xc_interface *xch, uint32_t domid, int vcpu, - xc_cpumap_t cpumap); + xc_cpumap_t cpumap, + uint32_t flags, + xc_cpumap_t ecpumap_out); + +/** + * This function retrieves hard or soft CPU affinity (or their intersection) + * for a vcpu, depending on flags. + * + * Soft affinity is returned if *only* XEN_VCPUAFFINITY_SOFT is set in flags. + * Hard affinity is returned if *only* XEN_VCPUAFFINITY_HARD is set in flags. + * If both are set, what is returned is the intersection of the two. + * + * @param xch a handle to an open hypervisor interface.
+ * @param domid the id of the domain to which the vcpu belongs + * @param vcpu the vcpu id within the domain + * @param flags what we want to get + * @param cpumap is where the desired affinity is returned + */ int xc_vcpu_getaffinity(xc_interface *xch, uint32_t domid, int vcpu, + uint32_t flags, xc_cpumap_t cpumap); diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 61d8a76..961d55e 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -4542,7 +4542,9 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu info"); return NULL; } - if (xc_vcpu_getaffinity(ctx->xch, domid, *nb_vcpu, ptr->cpumap.map) == -1) { + if (xc_vcpu_getaffinity(ctx->xch, domid, *nb_vcpu, + XEN_VCPUAFFINITY_HARD, + ptr->cpumap.map) == -1) { LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity"); return NULL; } @@ -4559,7 +4561,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, libxl_bitmap *cpumap) { - if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map)) { + if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map, + XEN_VCPUAFFINITY_HARD, NULL)) { LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting vcpu affinity"); return ERROR_FAIL; } diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c index f5cf0ed..30327d4 100644 --- a/tools/ocaml/libs/xc/xenctrl_stubs.c +++ b/tools/ocaml/libs/xc/xenctrl_stubs.c @@ -438,7 +438,9 @@ CAMLprim value stub_xc_vcpu_setaffinity(value xch, value domid, c_cpumap[i/8] |= 1 << (i&7); } retval = xc_vcpu_setaffinity(_H(xch), _D(domid), - Int_val(vcpu), c_cpumap); + Int_val(vcpu), c_cpumap, + XEN_VCPUAFFINITY_HARD, + NULL); free(c_cpumap); if (retval < 0) @@ -460,7 +462,9 @@ CAMLprim value stub_xc_vcpu_getaffinity(value xch, value domid, failwith_xc(_H(xch)); retval = xc_vcpu_getaffinity(_H(xch), _D(domid), -
Int_val(vcpu), + XEN_VCPUAFFINITY_HARD, + c_cpumap); if (retval < 0) { free(c_cpumap); failwith_xc(_H(xch)); diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c index 2625fc4..9348ce6 100644 --- a/tools/python/xen/lowlevel/xc/xc.c +++ b/tools/python/xen/lowlevel/xc/xc.c @@ -256,7 +256,8 @@ static PyObject *pyxc_vcpu_setaffinity(XcObject *self, } } - if ( xc_vcpu_setaffinity(self->xc_handle, dom, vcpu, cpumap) != 0 ) + if ( xc_vcpu_setaffinity(self->xc_handle, dom, vcpu, cpumap, + XEN_VCPUAFFINITY_HARD, NULL) != 0 ) { free(cpumap); return pyxc_error_to_exception(self->xc_handle); @@ -403,7 +404,8 @@ static PyObject *pyxc_vcpu_getinfo(XcObject *self, if(cpumap == NULL) return pyxc_error_to_exception(self->xc_handle); - rc = xc_vcpu_getaffinity(self->xc_handle, dom, vcpu, cpumap); + rc = xc_vcpu_getaffinity(self->xc_handle, dom, vcpu, + XEN_VCPUAFFINITY_HARD, cpumap); if ( rc < 0 ) { free(cpumap);
Make space for two new cpumaps, one in vcpu_info (for getting soft affinity) and one in build_info (for setting it), and amend the API for setting vCPU affinity. libxl_set_vcpuaffinity() now takes two cpumaps, one for hard and one for soft affinity (LIBXL_API_VERSION is exploited to retain source level backward compatibility). Either of the two cpumaps can be NULL, in which case only the affinity corresponding to the non-NULL cpumap will be affected. Getting soft affinity happens indirectly, via `xl vcpu-list'' (as it is already for hard affinity). This commit also introduces some logic to check whether the affinity which will be used by Xen to schedule the vCPU(s) actually matches the cpumaps provided. In fact, we want to allow every possible combination of hard and soft affinity to be set, but we warn the user upon particularly weird situations (e.g., hard and soft being disjoint sets of pCPUs). This is the first change breaking the libxl ABI, so it bumps the MAJOR. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- Changes from v3: * only introduce one LIBXL_HAVE_ symbol for soft affinity, as requested during review; * use LIBXL_API_VERSION instead of having multiple version of the same function, as suggested during review; * use libxl_get_nr_cpus() rather than libxl_get_cputopology(), as suggested during review; * use LOGE() instead of LIBXL__LOG_ERRNO(), as requested during review; * kill the flags and use just one _set_vcpuaffinity() function with two cpumaps, allowing either of them to be NULL, as suggested during review; * avoid overflowing the bitmaps in libxl_bitmap_equal(), as suggested during review. Changes from v2: * interface completely redesigned, as discussed during review.
--- tools/libxl/Makefile | 2 + tools/libxl/libxl.c | 79 ++++++++++++++++++++++++++++++++++++++----- tools/libxl/libxl.h | 38 ++++++++++++++++++++- tools/libxl/libxl_create.c | 6 +++ tools/libxl/libxl_dom.c | 3 +- tools/libxl/libxl_types.idl | 4 ++ tools/libxl/libxl_utils.h | 25 +++++++++++++- tools/libxl/xl_cmdimpl.c | 6 ++- 8 files changed, 145 insertions(+), 18 deletions(-) diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile index d8495bb..1410c44 100644 --- a/tools/libxl/Makefile +++ b/tools/libxl/Makefile @@ -5,7 +5,7 @@ XEN_ROOT = $(CURDIR)/../.. include $(XEN_ROOT)/tools/Rules.mk -MAJOR = 4.3 +MAJOR = 4.4 MINOR = 0 XLUMAJOR = 4.3 diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 961d55e..4b871d7 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -4538,6 +4538,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, for (*nb_vcpu = 0; *nb_vcpu <= domaininfo.max_vcpu_id; ++*nb_vcpu, ++ptr) { if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0)) return NULL; + if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap_soft, 0)) + return NULL; if (xc_vcpu_getinfo(ctx->xch, domid, *nb_vcpu, &vcpuinfo) == -1) { LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu info"); return NULL; @@ -4548,6 +4550,12 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity"); return NULL; } + if (xc_vcpu_getaffinity(ctx->xch, domid, *nb_vcpu, + XEN_VCPUAFFINITY_SOFT, + ptr->cpumap_soft.map) == -1) { + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity"); + return NULL; + } ptr->vcpuid = *nb_vcpu; ptr->cpu = vcpuinfo.cpu; ptr->online = !!vcpuinfo.online; @@ -4559,28 +4567,81 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, } int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, - libxl_bitmap *cpumap) + const libxl_bitmap *cpumap_hard, + const libxl_bitmap *cpumap_soft) { - if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map, 
- XEN_VCPUAFFINITY_HARD, NULL)) { - LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting vcpu affinity"); - return ERROR_FAIL; + GC_INIT(ctx); + libxl_bitmap ecpumap; + int nr_cpus = 0, rc; + + if (!cpumap_hard && !cpumap_soft) + return ERROR_INVAL; + + rc = libxl_cpu_bitmap_alloc(ctx, &ecpumap, 0); + if (rc) + return rc; + + nr_cpus = libxl_get_nr_cpus(ctx); + if (nr_cpus <= 0) { + rc = ERROR_FAIL; + goto out; } + + if (cpumap_hard && xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, + cpumap_hard->map, + XEN_VCPUAFFINITY_HARD, + ecpumap.map)) { + LOGE(ERROR, "setting vcpu hard affinity"); + rc = ERROR_FAIL; + goto out; + } + if (cpumap_hard && !libxl_bitmap_equal(cpumap_hard, &ecpumap, nr_cpus)) + LOG(DEBUG, "New hard affinity for vcpu %d contains unreachable cpus", + vcpuid); + + if (cpumap_soft && xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, + cpumap_soft->map, + XEN_VCPUAFFINITY_SOFT, + ecpumap.map)) { + LOGE(ERROR, "setting vcpu soft affinity"); + rc = ERROR_FAIL; + goto out; + } + if (cpumap_soft && !libxl_bitmap_equal(cpumap_soft, &ecpumap, nr_cpus)) + LOG(DEBUG, "New soft affinity for vcpu %d contains unreachable cpus", + vcpuid); + + /* + * When setting hard affinity, it is guaranteed that the result will not + * be empty, so we need to check for that only if soft. + */ + if (cpumap_soft && libxl_bitmap_is_empty(&ecpumap)) + LOG(WARN, "New soft affinity for vcpu %d has only unreachable cpus." 
+ " Only hard affinity will be considered for scheduling", vcpuid); + + rc = 0; + out: + libxl_bitmap_dispose(&ecpumap); + GC_FREE; - return 0; + return rc; } int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, - unsigned int max_vcpus, libxl_bitmap *cpumap) + unsigned int max_vcpus, + const libxl_bitmap *cpumap_hard, + const libxl_bitmap *cpumap_soft) { + GC_INIT(ctx); int i, rc = 0; for (i = 0; i < max_vcpus; i++) { - if (libxl_set_vcpuaffinity(ctx, domid, i, cpumap)) { - LIBXL__LOG(ctx, LIBXL__LOG_WARNING, - "failed to set affinity for %d", i); + if (libxl_set_vcpuaffinity(ctx, domid, i, cpumap_hard, cpumap_soft)) { + LOG(WARN, "failed to set affinity for %d", i); rc = ERROR_FAIL; } } + + GC_FREE; return rc; } diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 2fab5ba..9864927 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -82,6 +82,14 @@ #define LIBXL_HAVE_DOMAIN_NODEAFFINITY 1 /* + * LIBXL_HAVE_SOFT_AFFINITY indicates that a ''cpumap_soft'' + * field (of libxl_bitmap type) is present in both + * libxl_domain_build_info and libxl_vcpuinfo, containing + * the soft affinity for the vcpu. + */ +#define LIBXL_HAVE_SOFT_AFFINITY 1 + +/* * LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE indicates that the * libxl_vendor_device field is present in the hvm sections of * libxl_domain_build_info.
This field tells libxl which @@ -994,9 +1002,35 @@ int libxl_userdata_retrieve(libxl_ctx *ctx, uint32_t domid, int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo); int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, - libxl_bitmap *cpumap); + const libxl_bitmap *cpumap_hard, + const libxl_bitmap *cpumap_soft); int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, - unsigned int max_vcpus, libxl_bitmap *cpumap); + unsigned int max_vcpus, + const libxl_bitmap *cpumap_hard, + const libxl_bitmap *cpumap_soft); + +#if defined (LIBXL_API_VERSION) && LIBXL_API_VERSION < 0x040400 + +static inline +int libxl_set_vcpuaffinity_0x040200(libxl_ctx *ctx, uint32_t domid, + uint32_t vcpuid, libxl_bitmap *cpumap) +{ + return libxl_set_vcpuaffinity(ctx, domid, vcpuid, cpumap, NULL); +} + +static inline +int libxl_set_vcpuaffinity_all_0x040200(libxl_ctx *ctx, uint32_t domid, + unsigned int max_vcpus, + libxl_bitmap *cpumap) +{ + return libxl_set_vcpuaffinity_all(ctx, domid, max_vcpus, cpumap, NULL); +} + +#define libxl_set_vcpuaffinity libxl_set_vcpuaffinity_0x040200 +#define libxl_set_vcpuaffinity_all libxl_set_vcpuaffinity_all_0x040200 + +#endif + int libxl_domain_set_nodeaffinity(libxl_ctx *ctx, uint32_t domid, libxl_bitmap *nodemap); int libxl_domain_get_nodeaffinity(libxl_ctx *ctx, uint32_t domid, diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index fe7ba0d..58f80df 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -193,6 +193,12 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, libxl_bitmap_set_any(&b_info->cpumap); } + if (!b_info->cpumap_soft.size) { + if (libxl_cpu_bitmap_alloc(CTX, &b_info->cpumap_soft, 0)) + return ERROR_FAIL; + libxl_bitmap_set_any(&b_info->cpumap_soft); + } + libxl_defbool_setdefault(&b_info->numa_placement, true); if (!b_info->nodemap.size) { diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 72489f8..4bfed60 100644 --- 
a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -236,7 +236,8 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid, return rc; } libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap); - libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap); + libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap, + &info->cpumap_soft); xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT); xs_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL); diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index cba8eff..48699c9 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -298,6 +298,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ ("max_vcpus", integer), ("avail_vcpus", libxl_bitmap), ("cpumap", libxl_bitmap), + ("cpumap_soft", libxl_bitmap), ("nodemap", libxl_bitmap), ("numa_placement", libxl_defbool), ("tsc_mode", libxl_tsc_mode), @@ -510,7 +511,8 @@ libxl_vcpuinfo = Struct("vcpuinfo", [ ("blocked", bool), ("running", bool), ("vcpu_time", uint64), # total vcpu time ran (ns) - ("cpumap", libxl_bitmap), # current cpu''s affinities + ("cpumap", libxl_bitmap), # current hard cpu affinity + ("cpumap_soft", libxl_bitmap), # current soft cpu affinity ], dir=DIR_OUT) libxl_physinfo = Struct("physinfo", [ diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h index b11cf28..c205dd7 100644 --- a/tools/libxl/libxl_utils.h +++ b/tools/libxl/libxl_utils.h @@ -90,7 +90,7 @@ static inline void libxl_bitmap_set_none(libxl_bitmap *bitmap) { memset(bitmap->map, 0, bitmap->size); } -static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit) +static inline int libxl_bitmap_valid(const libxl_bitmap *bitmap, int bit) { return bit >= 0 && bit < (bitmap->size * 8); } @@ -98,6 +98,29 @@ static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit) #define libxl_for_each_set_bit(v, m) for (v = 0; v < (m).size * 8; v++) \ if 
(libxl_bitmap_test(&(m), v)) +static inline int libxl_bitmap_equal(const libxl_bitmap *ba, + const libxl_bitmap *bb, + int nr_bits) +{ + int i; + + /* Only check nr_bits (all bits if <= 0) */ + nr_bits = (nr_bits <= 0) ? ba->size * 8 : nr_bits; + for (i = 0; i < nr_bits; i++) { + /* If overflowing one of the bitmaps, we call them different */ + if (!libxl_bitmap_valid(ba, i) || !libxl_bitmap_valid(bb, i)) + return 0; + if (libxl_bitmap_test(ba, i) != libxl_bitmap_test(bb, i)) + return 0; + } + return 1; +} + +static inline int libxl_bitmap_cpu_valid(libxl_bitmap *cpumap, int cpu) +{ + return libxl_bitmap_valid(cpumap, cpu); +} + int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap, int max_cpus); diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 7b4d058..17fffdd 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -2222,7 +2222,7 @@ start: } else { libxl_bitmap_set_any(&vcpu_cpumap); } - if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap)) { + if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap, NULL)) { fprintf(stderr, "setting affinity failed on vcpu `%d''.\n", i); libxl_bitmap_dispose(&vcpu_cpumap); free(vcpu_to_pcpu); @@ -4630,7 +4630,7 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) } if (vcpuid != -1) { - if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap) == -1) { + if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) { fprintf(stderr, "Could not set affinity for vcpu `%u''.\n", vcpuid); goto out; } @@ -4642,7 +4642,7 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) } for (i = 0; i < nb_vcpu; i++) { if (libxl_set_vcpuaffinity(ctx, domid, vcpuinfo[i].vcpuid, - &cpumap) == -1) { + &cpumap, NULL)) { fprintf(stderr, "libxl_set_vcpuaffinity failed" " on vcpu `%u''.\n", vcpuinfo[i].vcpuid); }
Getting happens via `xl vcpu-list'', which now looks like this: # xl vcpu-list Name ID VCPU CPU State Time(s) Affinity (Hard / Soft) Domain-0 0 0 11 -b- 5.4 8-15 / all Domain-0 0 1 11 -b- 1.0 8-15 / all Domain-0 0 14 13 -b- 1.4 8-15 / all Domain-0 0 15 8 -b- 1.6 8-15 / all vm-test 3 0 4 -b- 2.5 0-12 / 0-7 vm-test 3 1 0 -b- 3.2 0-12 / 0-7 Setting happens by adding a ''-s''/''--soft'' switch to `xl vcpu-pin''. xl manual page is updated accordingly. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- Changes from v3: * fix typos in doc, rephrased the help message and changed the title of the column for hard/soft affinity, as suggested during review. Changes from v2: * this patch folds what in v2 were patches 13 and 14; * `xl vcpu-list'' always shows both hard and soft affinity, without the need of passing ''-s''. --- docs/man/xl.pod.1 | 24 +++++++++++++++ tools/libxl/xl_cmdimpl.c | 70 +++++++++++++++++++++++++++------------------ tools/libxl/xl_cmdtable.c | 3 +- 3 files changed, 67 insertions(+), 30 deletions(-) diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1 index e7b9de2..b738ca2 100644 --- a/docs/man/xl.pod.1 +++ b/docs/man/xl.pod.1 @@ -619,7 +619,7 @@ after B<vcpu-set>, go to B<SEE ALSO> section for information. Lists VCPU information for a specific domain. If no domain is specified, VCPU information for all domains will be provided. -=item B<vcpu-pin> I<domain-id> I<vcpu> I<cpus> +=item B<vcpu-pin> [I<OPTIONS>] I<domain-id> I<vcpu> I<cpus> Pins the VCPU to only run on the specific CPUs. The keyword B<all> can be used to apply the I<cpus> list to all VCPUs in the @@ -630,6 +630,28 @@ different run state is appropriate. Pinning can be used to restrict this, by ensuring certain VCPUs can only run on certain physical CPUs. +B<OPTIONS> + +=over 4 + +=item B<-s>, B<--soft> + +The same as above, but affects I<soft affinity> rather than pinning +(also called I<hard affinity>).
+ +Normally, a VCPU just wanders among the CPUs where it is allowed to +run (either all the CPUs or the ones to which it is pinned, as said +for B<vcpu-list>). Soft affinity offers a means to specify one or +more I<preferred> CPUs. Basically, among the ones where it can run, +the VCPU will greatly prefer to execute on one of these +CPUs, whenever that is possible. + +Notice that, in order for soft affinity to actually work, it needs +special support in the scheduler. Right now, only credit1 provides +that. + +=back + =item B<vm-list> Prints information about guests. This list excludes information about diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 17fffdd..d6fda26 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -4514,8 +4514,10 @@ static void print_vcpuinfo(uint32_t tdomid, } /* TIM */ printf("%9.1f ", ((float)vcpuinfo->vcpu_time / 1e9)); - /* CPU AFFINITY */ + /* CPU HARD AND SOFT AFFINITY */ print_bitmap(vcpuinfo->cpumap.map, nr_cpus, stdout); + printf(" / "); + print_bitmap(vcpuinfo->cpumap_soft.map, nr_cpus, stdout); printf("\n"); } @@ -4550,7 +4552,8 @@ static void vcpulist(int argc, char **argv) } printf("%-32s %5s %5s %5s %5s %9s %s\n", - "Name", "ID", "VCPU", "CPU", "State", "Time(s)", "CPU Affinity"); + "Name", "ID", "VCPU", "CPU", "State", "Time(s)", + "Affinity (Hard / Soft)"); if (!argc) { if (!(dominfo = libxl_list_domain(ctx, &nb_domain))) { fprintf(stderr, "libxl_list_domain failed.\n"); @@ -4583,17 +4586,33 @@ int main_vcpulist(int argc, char **argv) return 0; } -static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) +int main_vcpupin(int argc, char **argv) { libxl_vcpuinfo *vcpuinfo; libxl_bitmap cpumap; - - uint32_t vcpuid; - char *endptr; - int i, nb_cpu, nb_vcpu, rc = -1; + uint32_t vcpuid, domid; + const char *vcpu; + char *endptr, *cpu; + int nb_cpu, nb_vcpu, rc = -1; + int opt, soft_affinity = 0; + static struct option opts[] = { + {"soft", 0, 0, ''s''}, + COMMON_LONG_OPTS, +
{0, 0, 0, 0} + }; libxl_bitmap_init(&cpumap); + SWITCH_FOREACH_OPT(opt, "s", opts, "vcpu-pin", 3) { + case ''s'': + soft_affinity = 1; + break; + } + + domid = find_domain(argv[optind]); + vcpu = argv[optind+1]; + cpu = argv[optind+2]; + vcpuid = strtoul(vcpu, &endptr, 10); if (vcpu == endptr) { if (strcmp(vcpu, "all")) { @@ -4630,23 +4649,29 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) } if (vcpuid != -1) { - if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) { - fprintf(stderr, "Could not set affinity for vcpu `%u''.\n", vcpuid); + if (!soft_affinity && + libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) { + fprintf(stderr, "Could not set hard affinity for vcpu `%u''.\n", + vcpuid); + goto out; + } else if (soft_affinity && + libxl_set_vcpuaffinity(ctx, domid, vcpuid, NULL, &cpumap)) { + fprintf(stderr, "Could not set soft affinity for vcpu `%u''.\n", + vcpuid); goto out; } } else { - if (!(vcpuinfo = libxl_list_vcpu(ctx, domid, &nb_vcpu, &i))) { + if (!(vcpuinfo = libxl_list_vcpu(ctx, domid, &nb_vcpu, &nb_cpu))) { fprintf(stderr, "libxl_list_vcpu failed.\n"); goto out; } - for (i = 0; i < nb_vcpu; i++) { - if (libxl_set_vcpuaffinity(ctx, domid, vcpuinfo[i].vcpuid, - &cpumap, NULL)) { - fprintf(stderr, "libxl_set_vcpuaffinity failed" - " on vcpu `%u''.\n", vcpuinfo[i].vcpuid); - } - } + if (!soft_affinity && + libxl_set_vcpuaffinity_all(ctx, domid, nb_vcpu, &cpumap, NULL)) + fprintf(stderr, "Could not set hard affinity.\n"); + else if (soft_affinity && + libxl_set_vcpuaffinity_all(ctx, domid, nb_vcpu, NULL, &cpumap)) + fprintf(stderr, "Could not set soft affinity.\n"); libxl_vcpuinfo_list_free(vcpuinfo, nb_vcpu); } @@ -4656,17 +4681,6 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) return rc; } -int main_vcpupin(int argc, char **argv) -{ - int opt; - - SWITCH_FOREACH_OPT(opt, "", NULL, "vcpu-pin", 3) { - /* No options */ - } - - return vcpupin(find_domain(argv[optind]), argv[optind+1] , 
argv[optind+2]); -} - static void vcpuset(uint32_t domid, const char* nr_vcpus, int check_host) { char *endptr; diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c index ebe0220..8aee343 100644 --- a/tools/libxl/xl_cmdtable.c +++ b/tools/libxl/xl_cmdtable.c @@ -213,7 +213,8 @@ struct cmd_spec cmd_table[] = { { "vcpu-pin", &main_vcpupin, 1, 1, "Set which CPUs a VCPU can use", - "<Domain> <VCPU|all> <CPUs|all>", + "[option] <Domain> <VCPU|all> <CPUs|all>", + "-s, --soft Set soft affinity", }, { "vcpu-set", &main_vcpuset, 0, 1,
Dario Faggioli
2013-Nov-22 18:58 UTC
[PATCH v4 14/15] xl: enable for specifying node-affinity in the config file
in a similar way to how it is possible to specify vcpu-affinity. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- Changes from v3: * fix typos and language issues in docs and comments, as suggested during review; * common code to soft and hard affinity parsing factored together, as requested during review. Changes from v2: * use the new libxl API. Although the implementation changed only a little bit, I removed IanJ''s Acked-by, although I am here saying that he did provide it, as requested. --- docs/man/xl.cfg.pod.5 | 25 +++++- tools/libxl/xl_cmdimpl.c | 192 ++++++++++++++++++++++++++++++---------------- 2 files changed, 146 insertions(+), 71 deletions(-) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index 5e17b5d..9566393 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -144,19 +144,38 @@ run on cpu #3 of the host. =back If this option is not specified, no vcpu to cpu pinning is established, -and the vcpus of the guest can run on all the cpus of the host. +and the vcpus of the guest can run on all the cpus of the host. If this +option is specified, the intersection of the vcpu pinning mask, provided +here, and the soft affinity mask, provided via B<cpus\_soft=> (if any), +is utilized to compute the domain node-affinity, for driving memory +allocations. If we are on a NUMA machine (i.e., if the host has more than one NUMA node) and this option is not specified, libxl automatically tries to place the guest on the least possible number of nodes. That, however, will not affect vcpu pinning, so the guest will still be able to run on -all the cpus, it will just prefer the ones from the node it has been -placed on. A heuristic approach is used for choosing the best node (or +all the cpus. A heuristic approach is used for choosing the best node (or set of nodes), with the goals of maximizing performance for the guest and, at the same time, achieving efficient utilization of host cpus and memory.
See F<docs/misc/xl-numa-placement.markdown> for more details. +=item B<cpus_soft="CPU-LIST"> + +Exactly as B<cpus=>, but specifies soft affinity, rather than pinning +(also called hard affinity). When using the credit scheduler, this +means the vcpus of the domain prefer to run on these pcpus. + +A C<CPU-LIST> is specified exactly as above, for B<cpus=>. + +If this option is not specified, the vcpus of the guest will not have +any preference regarding what cpus to run on, and the scheduler will +treat all the cpus where a vcpu can execute (if B<cpus=> is specified), +or all the host cpus (if not), the same. If this option is specified, +the intersection of the soft affinity mask, provided here, and the vcpu +pinning, provided via B<cpus=> (if any), is utilized to compute the +domain node-affinity, for driving memory allocations. + =back =head3 CPU Scheduling diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index d6fda26..abd3e42 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -76,8 +76,9 @@ xlchild children[child_max]; static const char *common_domname; static int fd_lock = -1; -/* Stash for specific vcpu to pcpu mappping */ +/* Stash for specific vcpu to pcpu hard and soft mapping */ static int *vcpu_to_pcpu; +static int *vcpu_to_pcpu_soft; static const char savefileheader_magic[32] = "Xen saved domain, xl format\n \0 \r"; @@ -687,6 +688,64 @@ static int vcpupin_parse(char *cpu, libxl_bitmap *cpumap) return rc; } +static int *parse_config_cpumap_list(XLU_ConfigList *cpus, + libxl_bitmap *cpumap, + int max_vcpus) +{ + int i, n_cpus = 0; + int *to_pcpu; + const char *buf; + + if (libxl_cpu_bitmap_alloc(ctx, cpumap, 0)) { + fprintf(stderr, "Unable to allocate cpumap\n"); + exit(1); + } + + /* Prepare the array for single vcpu to pcpu mappings */ + to_pcpu = xmalloc(sizeof(int) * max_vcpus); + memset(to_pcpu, -1, sizeof(int) * max_vcpus); + + /* + * Idea here is to let libxl think all the domain''s vcpus + * have cpu affinity
with all the pcpus on the list. Doing + * that ensures memory is allocated on the proper NUMA nodes. + * It is then us, here in xl, that matches each single vcpu + * to its pcpu (and that''s why we need to stash such info in + * the to_pcpu array now) after the domain has been created. + * This way, we avoid having to pass to libxl some big array + * hosting the single mappings. + */ + libxl_bitmap_set_none(cpumap); + while ((buf = xlu_cfg_get_listitem(cpus, n_cpus)) != NULL) { + i = atoi(buf); + if (!libxl_bitmap_cpu_valid(cpumap, i)) { + fprintf(stderr, "cpu %d illegal\n", i); + exit(1); + } + libxl_bitmap_set(cpumap, i); + if (n_cpus < max_vcpus) + to_pcpu[n_cpus] = i; + n_cpus++; + } + + return to_pcpu; +} + +static void parse_config_cpumap_string(const char *buf, libxl_bitmap *cpumap) +{ + char *buf2 = strdup(buf); + + if (libxl_cpu_bitmap_alloc(ctx, cpumap, 0)) { + fprintf(stderr, "Unable to allocate cpumap\n"); + exit(1); + } + + libxl_bitmap_set_none(cpumap); + if (vcpupin_parse(buf2, cpumap)) + exit(1); + free(buf2); +} + static void parse_config_data(const char *config_source, const char *config_data, int config_len, @@ -697,7 +756,8 @@ static void parse_config_data(const char *config_source, const char *buf; long l; XLU_Config *config; - XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms; + XLU_ConfigList *cpus, *cpus_soft, *vbds, *nics, *pcis; + XLU_ConfigList *cvfbs, *cpuids, *vtpms; XLU_ConfigList *ioports, *irqs, *iomem; int num_ioports, num_irqs, num_iomem; int pci_power_mgmt = 0; @@ -820,57 +880,24 @@ static void parse_config_data(const char *config_source, b_info->max_vcpus = l; if (!xlu_cfg_get_list (config, "cpus", &cpus, 0, 1)) { - int n_cpus = 0; - - if (libxl_cpu_bitmap_alloc(ctx, &b_info->cpumap, 0)) { - fprintf(stderr, "Unable to allocate cpumap\n"); - exit(1); - } - - /* Prepare the array for single vcpu to pcpu mappings */ - vcpu_to_pcpu = xmalloc(sizeof(int) * b_info->max_vcpus); - memset(vcpu_to_pcpu, -1, sizeof(int) * 
b_info->max_vcpus); - - /* - * Idea here is to let libxl think all the domain''s vcpus - * have cpu affinity with all the pcpus on the list. - * It is then us, here in xl, that matches each single vcpu - * to its pcpu (and that''s why we need to stash such info in - * the vcpu_to_pcpu array now) after the domain has been created. - * Doing it like this saves the burden of passing to libxl - * some big array hosting the single mappings. Also, using - * the cpumap derived from the list ensures memory is being - * allocated on the proper nodes anyway. - */ - libxl_bitmap_set_none(&b_info->cpumap); - while ((buf = xlu_cfg_get_listitem(cpus, n_cpus)) != NULL) { - i = atoi(buf); - if (!libxl_bitmap_cpu_valid(&b_info->cpumap, i)) { - fprintf(stderr, "cpu %d illegal\n", i); - exit(1); - } - libxl_bitmap_set(&b_info->cpumap, i); - if (n_cpus < b_info->max_vcpus) - vcpu_to_pcpu[n_cpus] = i; - n_cpus++; - } - + vcpu_to_pcpu = parse_config_cpumap_list(cpus, &b_info->cpumap, + b_info->max_vcpus); /* We have a cpumap, disable automatic placement */ libxl_defbool_set(&b_info->numa_placement, false); } else if (!xlu_cfg_get_string (config, "cpus", &buf, 0)) { - char *buf2 = strdup(buf); - - if (libxl_cpu_bitmap_alloc(ctx, &b_info->cpumap, 0)) { - fprintf(stderr, "Unable to allocate cpumap\n"); - exit(1); - } - - libxl_bitmap_set_none(&b_info->cpumap); - if (vcpupin_parse(buf2, &b_info->cpumap)) - exit(1); - free(buf2); + parse_config_cpumap_string(buf, &b_info->cpumap); + libxl_defbool_set(&b_info->numa_placement, false); + } + if (!xlu_cfg_get_list (config, "cpus_soft", &cpus_soft, 0, 1)) { + vcpu_to_pcpu_soft = + parse_config_cpumap_list(cpus_soft, &b_info->cpumap_soft, + b_info->max_vcpus); + libxl_defbool_set(&b_info->numa_placement, false); + } + else if (!xlu_cfg_get_string (config, "cpus_soft", &buf, 0)) { + parse_config_cpumap_string(buf, &b_info->cpumap_soft); libxl_defbool_set(&b_info->numa_placement, false); } @@ -1991,6 +2018,40 @@ static void
evdisable_disk_ejects(libxl_evgen_disk_eject **diskws, } } +static inline int set_vcpu_to_pcpu_affinity(uint32_t domid, int *to_pcpu, + int max_vcpus, int soft) +{ + libxl_bitmap vcpu_cpumap; + int i, ret = 0; + + ret = libxl_cpu_bitmap_alloc(ctx, &vcpu_cpumap, 0); + if (ret) + return -1; + + for (i = 0; i < max_vcpus; i++) { + if (to_pcpu[i] != -1) { + libxl_bitmap_set_none(&vcpu_cpumap); + libxl_bitmap_set(&vcpu_cpumap, to_pcpu[i]); + } else { + libxl_bitmap_set_any(&vcpu_cpumap); + } + if (!soft && + libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap, NULL)) { + fprintf(stderr, "setting hard affinity failed on vcpu `%d'.\n", i); + ret = -1; + break; + } else if (soft && + libxl_set_vcpuaffinity(ctx, domid, i, NULL, &vcpu_cpumap)) { + fprintf(stderr, "setting soft affinity failed on vcpu `%d'.\n", i); + ret = -1; + break; + } + } + libxl_bitmap_dispose(&vcpu_cpumap); + + return ret; +} + static uint32_t create_domain(struct domain_create *dom_info) { uint32_t domid = INVALID_DOMID; @@ -2207,31 +2268,26 @@ start: if ( ret ) goto error_out; - /* If single vcpu to pcpu mapping was requested, honour it */ + /* If single vcpu pinning or soft affinity was requested, honour it */ if (vcpu_to_pcpu) { - libxl_bitmap vcpu_cpumap; + ret = set_vcpu_to_pcpu_affinity(domid, vcpu_to_pcpu, + d_config.b_info.max_vcpus, 0); + free(vcpu_to_pcpu); - ret = libxl_cpu_bitmap_alloc(ctx, &vcpu_cpumap, 0); if (ret) goto error_out; - for (i = 0; i < d_config.b_info.max_vcpus; i++) { - if (vcpu_to_pcpu[i] != -1) { - libxl_bitmap_set_none(&vcpu_cpumap); - libxl_bitmap_set(&vcpu_cpumap, vcpu_to_pcpu[i]); - } else { - libxl_bitmap_set_any(&vcpu_cpumap); - } - if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap, NULL)) { - fprintf(stderr, "setting affinity failed on vcpu `%d'.\n", i); - libxl_bitmap_dispose(&vcpu_cpumap); - free(vcpu_to_pcpu); - ret = ERROR_FAIL; - goto error_out; - } - } - libxl_bitmap_dispose(&vcpu_cpumap); - free(vcpu_to_pcpu); vcpu_to_pcpu = NULL; + vcpu_to_pcpu =
NULL; + } + if (vcpu_to_pcpu_soft) { + ret = set_vcpu_to_pcpu_affinity(domid, vcpu_to_pcpu_soft, + d_config.b_info.max_vcpus, 1); + free(vcpu_to_pcpu_soft); + + if (ret) + goto error_out; + + vcpu_to_pcpu_soft = NULL; } ret = libxl_userdata_store(ctx, domid, "xl",
Dario Faggioli
2013-Nov-22 18:58 UTC
[PATCH v4 15/15] libxl: automatic NUMA placement affects soft affinity
vCPU soft affinity and NUMA-aware scheduling do not have to be related. However, soft affinity is how NUMA-aware scheduling is actually implemented, and therefore, by default, the results of automatic NUMA placement (at VM creation time) are also used to set the soft affinity of all the vCPUs of the domain. Of course, this only happens if automatic NUMA placement is enabled and actually takes place (for instance, if the user does not specify any hard or soft affinity in the xl config file). This also takes care of the vice versa, i.e., don't trigger automatic placement if the config file specifies either a hard (the check for which was already there) or a soft (the check for which is introduced by this commit) affinity. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> --- Changes from v3: * rephrase comments and docs, as suggested during review. --- docs/man/xl.cfg.pod.5 | 21 +++++++++++---------- docs/misc/xl-numa-placement.markdown | 14 ++++++++++++-- tools/libxl/libxl_dom.c | 20 ++++++++++++++++++-- 3 files changed, 41 insertions(+), 14 deletions(-) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index 9566393..f210201 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -150,16 +150,6 @@ here, and the soft affinity mask, provided via B<cpus\_soft=> (if any), is utilized to compute the domain node-affinity, for driving memory allocations. -If we are on a NUMA machine (i.e., if the host has more than one NUMA -node) and this option is not specified, libxl automatically tries to -place the guest on the least possible number of nodes. That, however, -will not affect vcpu pinning, so the guest will still be able to run on -all the cpus. A heuristic approach is used for choosing the best node (or -set of nodes), with the goals of maximizing performance for the guest -and, at the same time, achieving efficient utilization of host cpus -and memory.
See F<docs/misc/xl-numa-placement.markdown> for more -details. - =item B<cpus_soft="CPU-LIST"> Exactly as B<cpus=>, but specifies soft affinity, rather than pinning @@ -176,6 +166,17 @@ the intersection of the soft affinity mask, provided here, and the vcpu pinning, provided via B<cpus=> (if any), is utilized to compute the domain node-affinity, for driving memory allocations. +If this option is not specified (and B<cpus=> is not specified either), +libxl automatically tries to place the guest on the least possible +number of nodes. A heuristic approach is used for choosing the best +node (or set of nodes), with the goal of maximizing performance for +the guest and, at the same time, achieving efficient utilization of +host cpus and memory. In that case, the soft affinity of all the vcpus +of the domain will be set to the pcpus belonging to the NUMA nodes +chosen during placement. + +For more details, see F<docs/misc/xl-numa-placement.markdown>. + =back =head3 CPU Scheduling diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown index b1ed361..09ae95e 100644 --- a/docs/misc/xl-numa-placement.markdown +++ b/docs/misc/xl-numa-placement.markdown @@ -126,10 +126,20 @@ or Xen won't be able to guarantee the locality for their memory accesses. That, of course, also means the vCPUs of the domain will only be able to execute on those same pCPUs. +It is also possible to have a "cpus\_soft=" option in the xl config file, +to specify the soft affinity for all the vCPUs of the domain. This affects +the NUMA placement in the following way: + + * if only "cpus\_soft=" is present, the VM's node-affinity will be equal + to the nodes to which the pCPUs in the soft affinity mask belong; + * if both "cpus\_soft=" and "cpus=" are present, the VM's node-affinity + will be equal to the nodes to which the pCPUs present both in hard and + soft affinity belong.
+ ### Placing the guest automatically ### -If no "cpus=" option is specified in the config file, libxl tries -to figure out on its own on which node(s) the domain could fit best. +If neither "cpus=" nor "cpus\_soft=" is present in the config file, libxl +tries to figure out on its own on which node(s) the domain could fit best. If it finds one (some), the domain's node affinity get set to there, and both memory allocations and NUMA aware scheduling (for the credit scheduler and starting from Xen 4.3) will comply with it. Starting from diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 4bfed60..ceb8643 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -222,18 +222,34 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid, * some weird error manifests) the subsequent call to * libxl_domain_set_nodeaffinity() will do the actual placement, * whatever that turns out to be. + * + * As far as scheduling is concerned, we achieve NUMA-aware scheduling + * by having the results of placement affect the soft affinity of all + * the vcpus of the domain. Of course, we want that iff placement is + * enabled and actually happens, so we only change info->cpumap_soft to + * reflect the placement result if that is the case. */ if (libxl_defbool_val(info->numa_placement)) { - if (!libxl_bitmap_is_full(&info->cpumap)) { + /* We require both hard and soft affinity not to be set */ + if (!libxl_bitmap_is_full(&info->cpumap) || + !libxl_bitmap_is_full(&info->cpumap_soft)) { LOG(ERROR, "Can run NUMA placement only if no vcpu " - "affinity is specified"); + "(hard or soft) affinity is specified"); return ERROR_INVAL; } rc = numa_place_domain(gc, domid, info); if (rc) return rc; + + /* + * We change the soft affinity in domain_build_info here, of course + * after converting the result of placement from nodes to cpus. The + * following call to libxl_set_vcpuaffinity_all() will do the + * actual updating of the domain's vcpus' soft affinity.
+ */ + libxl_nodemap_to_cpumap(ctx, &info->nodemap, &info->cpumap_soft); } libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap); libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap,
George Dunlap
2013-Nov-25 17:26 UTC
Re: [PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
On 11/22/2013 06:56 PM, Dario Faggioli wrote:> as well as both error handling and logging in libxl_cpu_bitmap_alloc > and libxl_node_bitmap_alloc. > > Now libxl_get_max_{cpus,nodes} either return a positive number, or > a libxl error code. Thanks to that, it is possible to fix logging for > the two bitmap allocation functions, which now happens _inside_ the > functions themselves, and report what happens more accurately. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
On 11/22/2013 06:58 PM, Dario Faggioli wrote:> Make space for two new cpumap-s, one in vcpu_info (for getting > soft affinity) and build_info (for setting it) and amend the > API for setting vCPU affinity. > > libxl_set_vcpuaffinity() now takes two cpumaps, one for hard > and one for soft affinity (LIBXL_API_VERSION is exploited to > retain source level backward compatibility). Either of the > two cpumaps can be NULL, in which case, only the affinity > corresponding to the non-NULL cpumap will be affected. > > Getting soft affinity happens indirectly, via `xl vcpu-list' > (as it is already for hard affinity). > > This commit also introduces some logic to check whether the > affinity which will be used by Xen to schedule the vCPU(s) > does actually match with the cpumaps provided. In fact, we > want to allow every possible combination of hard and soft > affinity to be set, but we warn the user upon particularly > weird situations (e.g., hard and soft being disjoint sets > of pCPUs). > > This is the first change breaking the libxl ABI, so it bumps > the MAJOR. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Looks good: Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>> --- > Changes from v3: > * only introduce one LIBXL_HAVE_ symbol for soft affinity, > as requested during review; > * use LIBXL_API_VERSION instead of having multiple versions > of the same function, as suggested during review; > * use libxl_get_nr_cpus() rather than libxl_get_cputopology(), > as suggested during review; > * use LOGE() instead of LIBXL__LOG_ERRNO(), as requested > during review; > * kill the flags and use just one _set_vcpuaffinity() > function with two cpumaps, allowing either of them to > be NULL, as suggested during review; > * avoid overflowing the bitmaps in libxl_bitmap_equal(), > as suggested during review. > > Changes from v2: > * interface completely redesigned, as discussed during > review.
> --- > tools/libxl/Makefile | 2 + > tools/libxl/libxl.c | 79 ++++++++++++++++++++++++++++++++++++++----- > tools/libxl/libxl.h | 38 ++++++++++++++++++++- > tools/libxl/libxl_create.c | 6 +++ > tools/libxl/libxl_dom.c | 3 +- > tools/libxl/libxl_types.idl | 4 ++ > tools/libxl/libxl_utils.h | 25 +++++++++++++- > tools/libxl/xl_cmdimpl.c | 6 ++- > 8 files changed, 145 insertions(+), 18 deletions(-) > > diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile > index d8495bb..1410c44 100644 > --- a/tools/libxl/Makefile > +++ b/tools/libxl/Makefile > @@ -5,7 +5,7 @@ > XEN_ROOT = $(CURDIR)/../.. > include $(XEN_ROOT)/tools/Rules.mk > > -MAJOR = 4.3 > +MAJOR = 4.4 > MINOR = 0 > > XLUMAJOR = 4.3 > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c > index 961d55e..4b871d7 100644 > --- a/tools/libxl/libxl.c > +++ b/tools/libxl/libxl.c > @@ -4538,6 +4538,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, > for (*nb_vcpu = 0; *nb_vcpu <= domaininfo.max_vcpu_id; ++*nb_vcpu, ++ptr) { > if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0)) > return NULL; > + if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap_soft, 0)) > + return NULL; > if (xc_vcpu_getinfo(ctx->xch, domid, *nb_vcpu, &vcpuinfo) == -1) { > LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu info"); > return NULL; > @@ -4548,6 +4550,12 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, > LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity"); > return NULL; > } > + if (xc_vcpu_getaffinity(ctx->xch, domid, *nb_vcpu, > + XEN_VCPUAFFINITY_SOFT, > + ptr->cpumap_soft.map) == -1) { > + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity"); > + return NULL; > + } > ptr->vcpuid = *nb_vcpu; > ptr->cpu = vcpuinfo.cpu; > ptr->online = !!vcpuinfo.online; > @@ -4559,28 +4567,81 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, > } > > int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, > - libxl_bitmap *cpumap) > + const libxl_bitmap 
*cpumap_hard, > + const libxl_bitmap *cpumap_soft) > { > - if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map, > - XEN_VCPUAFFINITY_HARD, NULL)) { > - LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting vcpu affinity"); > - return ERROR_FAIL; > + GC_INIT(ctx); > + libxl_bitmap ecpumap; > + int nr_cpus = 0, rc; > + > + if (!cpumap_hard && !cpumap_soft) > + return ERROR_INVAL; > + > + rc = libxl_cpu_bitmap_alloc(ctx, &ecpumap, 0); > + if (rc) > + return rc; > + > + nr_cpus = libxl_get_nr_cpus(ctx); > + if (nr_cpus <= 0) { > + rc = ERROR_FAIL; > + goto out; > } > + > + if (cpumap_hard && xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, > + cpumap_hard->map, > + XEN_VCPUAFFINITY_HARD, > + ecpumap.map)) { > + LOGE(ERROR, "setting vcpu hard affinity"); > + rc = ERROR_FAIL; > + goto out; > + } > + if (cpumap_hard && !libxl_bitmap_equal(cpumap_hard, &ecpumap, nr_cpus)) > + LOG(DEBUG, "New hard affinity for vcpu %d contains unreachable cpus", > + vcpuid); > + > + if (cpumap_soft && xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, > + cpumap_soft->map, > + XEN_VCPUAFFINITY_SOFT, > + ecpumap.map)) { > + LOGE(ERROR, "setting vcpu soft affinity"); > + rc = ERROR_FAIL; > + goto out; > + } > + if (cpumap_soft && !libxl_bitmap_equal(cpumap_soft, &ecpumap, nr_cpus)) > + LOG(DEBUG, "New soft affinity for vcpu %d contains unreachable cpus", > + vcpuid); > + > + /* > + * When setting hard affinity, it is guaranteed that the result will not > + * be empty, so we need to check for that only if soft. > + */ > + if (cpumap_soft && libxl_bitmap_is_empty(&ecpumap)) > + LOG(WARN, "New soft affinity for vcpu %d has only unreachable cpus." 
> + " Only hard affinity will be considered for scheduling", vcpuid); > + > + rc = 0; > + out: > + libxl_bitmap_dispose(&ecpumap); > + GC_FREE; > + return rc; > } > > int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, > - unsigned int max_vcpus, libxl_bitmap *cpumap) > + unsigned int max_vcpus, > + const libxl_bitmap *cpumap_hard, > + const libxl_bitmap *cpumap_soft) > { > + GC_INIT(ctx); > int i, rc = 0; > > for (i = 0; i < max_vcpus; i++) { > - if (libxl_set_vcpuaffinity(ctx, domid, i, cpumap)) { > - LIBXL__LOG(ctx, LIBXL__LOG_WARNING, > - "failed to set affinity for %d", i); > + if (libxl_set_vcpuaffinity(ctx, domid, i, cpumap_hard, cpumap_soft)) { > + LOG(WARN, "failed to set affinity for %d", i); > rc = ERROR_FAIL; > } > } > + > + GC_FREE; > return rc; > } > > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > index 2fab5ba..9864927 100644 > --- a/tools/libxl/libxl.h > +++ b/tools/libxl/libxl.h > @@ -82,6 +82,14 @@ > #define LIBXL_HAVE_DOMAIN_NODEAFFINITY 1 > > /* > + * LIBXL_HAVE_SOFT_AFFINITY indicates that a 'cpumap_soft' > + * field (of libxl_bitmap type) is present in both > + * libxl_domain_build_info and libxl_vcpuinfo, containing > + * the soft affinity for the vcpu. > + */ > +#define LIBXL_HAVE_SOFT_AFFINITY 1 > + > +/* > * LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE indicates that the > * libxl_vendor_device field is present in the hvm sections of > * libxl_domain_build_info.
This field tells libxl which > @@ -994,9 +1002,35 @@ int libxl_userdata_retrieve(libxl_ctx *ctx, uint32_t domid, > > int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo); > int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, > - libxl_bitmap *cpumap); > + const libxl_bitmap *cpumap_hard, > + const libxl_bitmap *cpumap_soft); > int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, > - unsigned int max_vcpus, libxl_bitmap *cpumap); > + unsigned int max_vcpus, > + const libxl_bitmap *cpumap_hard, > + const libxl_bitmap *cpumap_soft); > + > +#if defined (LIBXL_API_VERSION) && LIBXL_API_VERSION < 0x040400 > + > +static inline > +int libxl_set_vcpuaffinity_0x040200(libxl_ctx *ctx, uint32_t domid, > + uint32_t vcpuid, libxl_bitmap *cpumap) > +{ > + return libxl_set_vcpuaffinity(ctx, domid, vcpuid, cpumap, NULL); > +} > + > +static inline > +int libxl_set_vcpuaffinity_all_0x040200(libxl_ctx *ctx, uint32_t domid, > + unsigned int max_vcpus, > + libxl_bitmap *cpumap) > +{ > + return libxl_set_vcpuaffinity_all(ctx, domid, max_vcpus, cpumap, NULL); > +} > + > +#define libxl_set_vcpuaffinity libxl_set_vcpuaffinity_0x040200 > +#define libxl_set_vcpuaffinity_all libxl_set_vcpuaffinity_all_0x040200 > + > +#endif > + > int libxl_domain_set_nodeaffinity(libxl_ctx *ctx, uint32_t domid, > libxl_bitmap *nodemap); > int libxl_domain_get_nodeaffinity(libxl_ctx *ctx, uint32_t domid, > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > index fe7ba0d..58f80df 100644 > --- a/tools/libxl/libxl_create.c > +++ b/tools/libxl/libxl_create.c > @@ -193,6 +193,12 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, > libxl_bitmap_set_any(&b_info->cpumap); > } > > + if (!b_info->cpumap_soft.size) { > + if (libxl_cpu_bitmap_alloc(CTX, &b_info->cpumap_soft, 0)) > + return ERROR_FAIL; > + libxl_bitmap_set_any(&b_info->cpumap_soft); > + } > + > libxl_defbool_setdefault(&b_info->numa_placement, true); > > if (!b_info->nodemap.size) { > 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c > index 72489f8..4bfed60 100644 > --- a/tools/libxl/libxl_dom.c > +++ b/tools/libxl/libxl_dom.c > @@ -236,7 +236,8 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid, > return rc; > } > libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap); > - libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap); > + libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap, > + &info->cpumap_soft); > > xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT); > xs_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL); > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl > index cba8eff..48699c9 100644 > --- a/tools/libxl/libxl_types.idl > +++ b/tools/libxl/libxl_types.idl > @@ -298,6 +298,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ > ("max_vcpus", integer), > ("avail_vcpus", libxl_bitmap), > ("cpumap", libxl_bitmap), > + ("cpumap_soft", libxl_bitmap), > ("nodemap", libxl_bitmap), > ("numa_placement", libxl_defbool), > ("tsc_mode", libxl_tsc_mode), > @@ -510,7 +511,8 @@ libxl_vcpuinfo = Struct("vcpuinfo", [ > ("blocked", bool), > ("running", bool), > ("vcpu_time", uint64), # total vcpu time ran (ns) > - ("cpumap", libxl_bitmap), # current cpu''s affinities > + ("cpumap", libxl_bitmap), # current hard cpu affinity > + ("cpumap_soft", libxl_bitmap), # current soft cpu affinity > ], dir=DIR_OUT) > > libxl_physinfo = Struct("physinfo", [ > diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h > index b11cf28..c205dd7 100644 > --- a/tools/libxl/libxl_utils.h > +++ b/tools/libxl/libxl_utils.h > @@ -90,7 +90,7 @@ static inline void libxl_bitmap_set_none(libxl_bitmap *bitmap) > { > memset(bitmap->map, 0, bitmap->size); > } > -static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit) > +static inline int libxl_bitmap_valid(const libxl_bitmap *bitmap, int bit) > { > return bit >= 0 && bit < 
(bitmap->size * 8); > } > @@ -98,6 +98,29 @@ static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit) > #define libxl_for_each_set_bit(v, m) for (v = 0; v < (m).size * 8; v++) \ > if (libxl_bitmap_test(&(m), v)) > > +static inline int libxl_bitmap_equal(const libxl_bitmap *ba, > + const libxl_bitmap *bb, > + int nr_bits) > +{ > + int i; > + > + /* Only check nr_bits (all bits if <= 0) */ > + nr_bits = (nr_bits <= 0) ? ba->size * 8 : nr_bits; > + for (i = 0; i < nr_bits; i++) { > + /* If overflowing one of the bitmaps, we call them different */ > + if (!libxl_bitmap_valid(ba, i) || !libxl_bitmap_valid(bb, i)) > + return 0; > + if (libxl_bitmap_test(ba, i) != libxl_bitmap_test(bb, i)) > + return 0; > + } > + return 1; > +} > + > +static inline int libxl_bitmap_cpu_valid(libxl_bitmap *cpumap, int cpu) > +{ > + return libxl_bitmap_valid(cpumap, cpu); > +} > + > int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, > libxl_bitmap *cpumap, > int max_cpus); > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c > index 7b4d058..17fffdd 100644 > --- a/tools/libxl/xl_cmdimpl.c > +++ b/tools/libxl/xl_cmdimpl.c > @@ -2222,7 +2222,7 @@ start: > } else { > libxl_bitmap_set_any(&vcpu_cpumap); > } > - if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap)) { > + if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap, NULL)) { > fprintf(stderr, "setting affinity failed on vcpu `%d''.\n", i); > libxl_bitmap_dispose(&vcpu_cpumap); > free(vcpu_to_pcpu); > @@ -4630,7 +4630,7 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) > } > > if (vcpuid != -1) { > - if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap) == -1) { > + if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) { > fprintf(stderr, "Could not set affinity for vcpu `%u''.\n", vcpuid); > goto out; > } > @@ -4642,7 +4642,7 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) > } > for (i = 0; i < nb_vcpu; i++) { > if (libxl_set_vcpuaffinity(ctx, domid, 
vcpuinfo[i].vcpuid, > - &cpumap) == -1) { > + &cpumap, NULL)) { > fprintf(stderr, "libxl_set_vcpuaffinity failed" > " on vcpu `%u''.\n", vcpuinfo[i].vcpuid); > } >
Jan Beulich
2013-Nov-27 13:11 UTC
Re: [PATCH v4 10/15] xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
>>> On 22.11.13 at 19:57, Dario Faggioli <dario.faggioli@citrix.com> wrote: > + /* > + * Report back to the caller what the "effective affinity", that > + * is the intersection of cpupool''s pcpus, the (new?) hard > + * affinity and the (new?) soft-affinity. > + */ > + if ( !guest_handle_is_null(op->u.vcpuaffinity.eff_cpumap.bitmap) ) > { > - ret = vcpu_set_affinity(v, new_affinity); > - free_cpumask_var(new_affinity); > + online = cpupool_online_cpumask(v->domain->cpupool); > + cpumask_and(new_affinity, online, v->cpu_hard_affinity); > + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT) > + cpumask_and(new_affinity, new_affinity, > + v->cpu_soft_affinity); > + > + ret = cpumask_to_xenctl_bitmap( > + &op->u.vcpuaffinity.eff_cpumap, new_affinity);So with both flags set, how is the caller supposed to know what hard affinity is now in effect? I said on the previous version already that with you _having_ two CPU masks, you should return both.> else > { > + cpumask_var_t affinity; > + > + /* > + * If the caller asks for both _HARD and _SOFT, what we return > + * is the intersection of hard and soft affinity for the vcpu. > + */ > + if ( !alloc_cpumask_var(&affinity) ) > + { > + ret = -ENOMEM; > + break; > + } > + cpumask_setall(affinity); > + > + if ( op->u.vcpuaffinity.flags == 0 ) > + { > + ret = -EINVAL; > + free_cpumask_var(affinity); > + break; > + } > + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_HARD ) > + cpumask_copy(affinity, v->cpu_hard_affinity); > + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT ) > + cpumask_and(affinity, affinity, v->cpu_soft_affinity); > + > ret = cpumask_to_xenctl_bitmap( > - &op->u.vcpuaffinity.cpumap, v->cpu_hard_affinity); > + &op->u.vcpuaffinity.cpumap, affinity);Similarly here. 
Of course, the name "eff_cpumap" then is likely not really suitable.> --- a/xen/include/public/domctl.h > +++ b/xen/include/public/domctl.h > @@ -300,8 +300,19 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_nodeaffinity_t); > /* XEN_DOMCTL_setvcpuaffinity */ > /* XEN_DOMCTL_getvcpuaffinity */ > struct xen_domctl_vcpuaffinity { > - uint32_t vcpu; /* IN */ > - struct xenctl_bitmap cpumap; /* IN/OUT */ > + /* IN variables. */ > + uint32_t vcpu; > + /* Set/get the hard affinity for vcpu */ > +#define _XEN_VCPUAFFINITY_HARD 0 > +#define XEN_VCPUAFFINITY_HARD (1U<<_XEN_VCPUAFFINITY_HARD) > + /* Set/get the soft affinity for vcpu */ > +#define _XEN_VCPUAFFINITY_SOFT 1 > +#define XEN_VCPUAFFINITY_SOFT (1U<<_XEN_VCPUAFFINITY_SOFT) > + uint32_t flags; > + /* IN/OUT variables. */In further revisions, please make these annotations correctly reflect what your patch does: In its current shape, this is an IN for set and an OUT for get, ...> + struct xenctl_bitmap cpumap; > + /* OUT variables. */... while this is an OUT for set, and unused for get (and I also said earlier that _if_ this remains unused for one of the two operations, then sharing the interface structure wouldn't be appropriate anymore). Jan
Ian Campbell
2013-Nov-27 13:45 UTC
Re: [PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
On Fri, 2013-11-22 at 19:56 +0100, Dario Faggioli wrote:> @@ -4351,7 +4349,7 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out) > int max_cpus; > > max_cpus = libxl_get_max_cpus(ctx); > - if (max_cpus == 0) > + if (max_cpus <= 0)Doesn't this slightly odd error condition (which includes 0 as an error) actually stem from xc_get_max_cpus's broken handling of the case where xc_physinfo failed? It'd be better IMHO to make sure that this call chain ultimately returns either a negative error or a positive number of cpus and never 0. I'd be inclined to do that in xc_get_max_cpus. I suppose then xc_get_{cpu,node}map_size would need error handling too. Handling it in libxl_get_max_cpus might be reasonable too. That function really ought to be returning libxl error codes, not -1 and errno as it currently does. If xc_physinfo is failing it seems unlikely that anything subsequent to that is going to make much useful progress.> @@ -4538,10 +4536,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, > } > > for (*nb_vcpu = 0; *nb_vcpu <= domaininfo.max_vcpu_id; ++*nb_vcpu, ++ptr) { > - if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0)) { > - LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "allocating cpumap"); > + if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0)) > return NULL;Isn't this leaking ret == ptr here and everywhere else this returns?> - } > if (xc_vcpu_getinfo(ctx->xch, domid, *nb_vcpu, &vcpuinfo) == -1) { > LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu info"); > return NULL; > @@ -5304,7 +5300,7 @@ int libxl_get_freecpus(libxl_ctx *ctx, libxl_bitmap *cpumap) > int ncpus; > > ncpus = libxl_get_max_cpus(ctx); > - if (ncpus == 0) > + if (ncpus <= 0) > return ERROR_FAIL;If libxl_get_max_cpus returned a proper rc you could propagate it.> > cpumap->map = xc_cpupool_freeinfo(ctx->xch); > diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c > index 9f5f589..1815422 100644 > --- a/tools/libxl/libxl_utils.c > +++
b/tools/libxl/libxl_utils.c > @@ -651,6 +651,56 @@ char *libxl_bitmap_to_hex_string(libxl_ctx *ctx, const libxl_bitmap *bitmap) > return q; > } > > +int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap, int max_cpus)You seem to have combined code motion with actual changes here. Please don't do that.> +{ > + GC_INIT(ctx); > + int rc = 0; > + > + if (max_cpus < 0) { > + rc = ERROR_INVAL;No need to log here?> + goto out; > + } > + if (max_cpus == 0) > + max_cpus = libxl_get_max_cpus(ctx); > + if (max_cpus <= 0) { > + LOG(ERROR, "failed to retrieve the maximum number of cpus"); > + rc = ERROR_FAIL; > + goto out; > + } > + /* This can't fail: no need to check and log */ > + libxl_bitmap_alloc(ctx, cpumap, max_cpus); > + > + out: > + GC_FREE; > + return rc; > +} > + > +int libxl_node_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *nodemap, > + int max_nodes)This is identical to cpu_bitmap alloc other than the parameter name and the log message, isn't it? A common internal function, perhaps taking a const char *what would eliminate that.> +{ > + GC_INIT(ctx); > + int rc = 0; > + > + if (max_nodes < 0) { > + rc = ERROR_INVAL; > + goto out; > + } > + > + if (max_nodes == 0) > + max_nodes = libxl_get_max_nodes(ctx); > + if (max_nodes <= 0) { > + LOG(ERROR, "failed to retrieve the maximum number of nodes"); > + rc = ERROR_FAIL; > + goto out; > + } > + /* This can't fail: no need to check and log */ > + libxl_bitmap_alloc(ctx, nodemap, max_nodes); > + > + out: > + GC_FREE; > + return rc; > +} > + > int libxl_nodemap_to_cpumap(libxl_ctx *ctx, > const libxl_bitmap *nodemap, > libxl_bitmap *cpumap) > @@ -719,12 +769,16 @@ int libxl_cpumap_to_nodemap(libxl_ctx *ctx, > > int libxl_get_max_cpus(libxl_ctx *ctx) > { > - return xc_get_max_cpus(ctx->xch); > + int max_cpus = xc_get_max_cpus(ctx->xch); > + > + return max_cpus <= 0 ? ERROR_FAIL : max_cpus;Aha, so there is indeed no need for all the <= changes in the callers. Good! Ian.
Ian Campbell
2013-Nov-27 13:49 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
On Fri, 2013-11-22 at 19:56 +0100, Dario Faggioli wrote:> to retrieve the actual number of pCPUs on the host. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > --- > tools/libxl/libxl.h | 3 +++ > tools/libxl/libxl_utils.c | 18 ++++++++++++++++++ > 2 files changed, 21 insertions(+) > > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > index a9663e4..2fab5ba 100644 > --- a/tools/libxl/libxl.h > +++ b/tools/libxl/libxl.h > @@ -652,6 +652,9 @@ int libxl_domain_preserve(libxl_ctx *ctx, uint32_t domid, libxl_domain_create_in > /* get max. number of cpus supported by hypervisor */ > int libxl_get_max_cpus(libxl_ctx *ctx); > > +/* get the actual number of online cpus on the host */ > +int libxl_get_nr_cpus(libxl_ctx *ctx);I suggest get_online_cpus if that is what it returns. "nr" could be counting various different types of cpu.> + > /* get max. number of NUMA nodes supported by hypervisor */ > int libxl_get_max_nodes(libxl_ctx *ctx); > > diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c > index 1815422..8763070 100644 > --- a/tools/libxl/libxl_utils.c > +++ b/tools/libxl/libxl_utils.c > @@ -774,6 +774,24 @@ int libxl_get_max_cpus(libxl_ctx *ctx) > return max_cpus <= 0 ? 
ERROR_FAIL : max_cpus; > } > > +int libxl_get_nr_cpus(libxl_ctx *ctx)Arguably for consistency with the existing interfaces this should be a wrapper around a (new) libxc function, but let's not go there today.> +{ > + GC_INIT(ctx); > + xc_physinfo_t physinfo = { 0 };I don't think this initialiser is actually needed (nowhere in libxc does it).> + int rc = 0; > + > + if (xc_physinfo(ctx->xch, &physinfo)) { > + LOGE(ERROR, "xc_physinfo failed."); > + rc = ERROR_FAIL; > + goto out; > + } > + rc = physinfo.nr_cpus; > + > + out: > + GC_FREE; > + return rc; > +} > + > int libxl_get_max_nodes(libxl_ctx *ctx) > { > int max_nodes = xc_get_max_nodes(ctx->xch); > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
George Dunlap
2013-Nov-27 14:17 UTC
Re: [PATCH v4 10/15] xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
On 11/27/2013 01:11 PM, Jan Beulich wrote:>>>> On 22.11.13 at 19:57, Dario Faggioli <dario.faggioli@citrix.com> wrote: >> + /* >> + * Report back to the caller what the "effective affinity", that >> + * is the intersection of cpupool''s pcpus, the (new?) hard >> + * affinity and the (new?) soft-affinity. >> + */ >> + if ( !guest_handle_is_null(op->u.vcpuaffinity.eff_cpumap.bitmap) ) >> { >> - ret = vcpu_set_affinity(v, new_affinity); >> - free_cpumask_var(new_affinity); >> + online = cpupool_online_cpumask(v->domain->cpupool); >> + cpumask_and(new_affinity, online, v->cpu_hard_affinity); >> + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT) >> + cpumask_and(new_affinity, new_affinity, >> + v->cpu_soft_affinity); >> + >> + ret = cpumask_to_xenctl_bitmap( >> + &op->u.vcpuaffinity.eff_cpumap, new_affinity); > So with both flags set, how is the caller supposed to know what > hard affinity is now in effect? I said on the previous version already > that with you _having_ two CPU masks, you should return both.If I recall the timing correctly, I think this series was sent out before you guys had come to that conclusion on the other thread. -George
Dario Faggioli
2013-Nov-27 14:31 UTC
Re: [PATCH v4 10/15] xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
On mer, 2013-11-27 at 14:17 +0000, George Dunlap wrote:
> On 11/27/2013 01:11 PM, Jan Beulich wrote:
> >>>> On 22.11.13 at 19:57, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> >> +    /*
> >> +     * Report back to the caller what the "effective affinity", that
> >> +     * is the intersection of cpupool's pcpus, the (new?) hard
> >> +     * affinity and the (new?) soft-affinity.
> >> +     */
> >> +    if ( !guest_handle_is_null(op->u.vcpuaffinity.eff_cpumap.bitmap) )
> >>      {
> >> -        ret = vcpu_set_affinity(v, new_affinity);
> >> -        free_cpumask_var(new_affinity);
> >> +        online = cpupool_online_cpumask(v->domain->cpupool);
> >> +        cpumask_and(new_affinity, online, v->cpu_hard_affinity);
> >> +        if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT )
> >> +            cpumask_and(new_affinity, new_affinity,
> >> +                        v->cpu_soft_affinity);
> >> +
> >> +        ret = cpumask_to_xenctl_bitmap(
> >> +            &op->u.vcpuaffinity.eff_cpumap, new_affinity);
> >
> > So with both flags set, how is the caller supposed to know what
> > hard affinity is now in effect? I said on the previous version already
> > that, with you _having_ two CPU masks, you should return both.
>
> If I recall the timing correctly, I think this series was sent out
> before you guys had come to that conclusion on the other thread.

Indeed. Anyway, a new version implementing exactly what we agreed upon
there is coming shortly (for real, I just got sidetracked by a couple
of other things, but it's almost ready).

Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
On Fri, 2013-11-22 at 19:58 +0100, Dario Faggioli wrote:> Make space for two new cpumap-s, one in vcpu_info (for getting > soft affinity) and build_info (for setting it) and amend the > API for setting vCPU affinity. > > libxl_set_vcpuaffinity() now takes two cpumaps, one for hard > and one for soft affinity (LIBXL_API_VERSION is exploited to > retain source level backword compatibility). Either of the > two cpumap can be NULL, in which case, only the affinity > corresponding to the non-NULL cpumap will be affected. > > Getting soft affinity happens indirectly, via `xl vcpu-list'' > (as it is already for hard affinity). > > This commit also introduces some logic to check whether the > affinity which will be used by Xen to schedule the vCPU(s) > does actually match with the cpumaps provided. In fact, we > want to allow every possible combination of hard and soft > affinity to be set, but we warn the user upon particularly > weird situations (e.g., hard and soft being disjoint sets > of pCPUs). > > This is the first change breaking the libxl ABI, so it bumps > the MAJOR. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > --- > Changes from v3: > * only introduce one LIBXL_HAVE_ symbol for soft affinity, > as requested during review; > * use LIBXL_API_VERSION instead of having multiple version > of the same function, as suggested during review; > * use libxl_get_nr_cpus() rather than libxl_get_cputopology(), > as suggested during review; > * use LOGE() instead of LIBXL__LOG_ERRNO(), as requested > during review; > * kill the flags and use just one _set_vcpuaffinity() > function with two cpumaps, allowing either of them to > be NULL, as suggested during review; > * avoid overflowing the bitmaps in libxl_bitmap_equal(), > as suggested during review. > > Changes from v2: > * interface completely redesigned, as discussed during > review. 
> --- > tools/libxl/Makefile | 2 + > tools/libxl/libxl.c | 79 ++++++++++++++++++++++++++++++++++++++----- > tools/libxl/libxl.h | 38 ++++++++++++++++++++- > tools/libxl/libxl_create.c | 6 +++ > tools/libxl/libxl_dom.c | 3 +- > tools/libxl/libxl_types.idl | 4 ++ > tools/libxl/libxl_utils.h | 25 +++++++++++++- > tools/libxl/xl_cmdimpl.c | 6 ++- > 8 files changed, 145 insertions(+), 18 deletions(-) > > diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile > index d8495bb..1410c44 100644 > --- a/tools/libxl/Makefile > +++ b/tools/libxl/Makefile > @@ -5,7 +5,7 @@ > XEN_ROOT = $(CURDIR)/../.. > include $(XEN_ROOT)/tools/Rules.mk > > -MAJOR = 4.3 > +MAJOR = 4.4 > MINOR = 0 > > XLUMAJOR = 4.3 > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c > index 961d55e..4b871d7 100644 > --- a/tools/libxl/libxl.c > +++ b/tools/libxl/libxl.c > @@ -4538,6 +4538,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, > for (*nb_vcpu = 0; *nb_vcpu <= domaininfo.max_vcpu_id; ++*nb_vcpu, ++ptr) { > if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0)) > return NULL; > + if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap_soft, 0)) > + return NULL;Leaks ptr and ptr->cpumap...> if (xc_vcpu_getinfo(ctx->xch, domid, *nb_vcpu, &vcpuinfo) == -1) { > LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu info"); > return NULL; > @@ -4548,6 +4550,12 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, > LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity"); > return NULL; > } > + if (xc_vcpu_getaffinity(ctx->xch, domid, *nb_vcpu, > + XEN_VCPUAFFINITY_SOFT, > + ptr->cpumap_soft.map) == -1) { > + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity"); > + return NULL; > + } > ptr->vcpuid = *nb_vcpu; > ptr->cpu = vcpuinfo.cpu; > ptr->online = !!vcpuinfo.online; > @@ -4559,28 +4567,81 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, > } > > int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, > - libxl_bitmap *cpumap) 
> + const libxl_bitmap *cpumap_hard, > + const libxl_bitmap *cpumap_soft) > { > - if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map, > - XEN_VCPUAFFINITY_HARD, NULL)) { > - LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting vcpu affinity"); > - return ERROR_FAIL; > + GC_INIT(ctx); > + libxl_bitmap ecpumap; > + int nr_cpus = 0, rc; > + > + if (!cpumap_hard && !cpumap_soft) > + return ERROR_INVAL; > + > + rc = libxl_cpu_bitmap_alloc(ctx, &ecpumap, 0); > + if (rc) > + return rc; > + > + nr_cpus = libxl_get_nr_cpus(ctx); > + if (nr_cpus <= 0) { > + rc = ERROR_FAIL; > + goto out; > } > + > + if (cpumap_hard && xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, > + cpumap_hard->map, > + XEN_VCPUAFFINITY_HARD, > + ecpumap.map)) { > + LOGE(ERROR, "setting vcpu hard affinity"); > + rc = ERROR_FAIL; > + goto out; > + } > + if (cpumap_hard && !libxl_bitmap_equal(cpumap_hard, &ecpumap, nr_cpus)) > + LOG(DEBUG, "New hard affinity for vcpu %d contains unreachable cpus", > + vcpuid); > + > + if (cpumap_soft && xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, > + cpumap_soft->map, > + XEN_VCPUAFFINITY_SOFT, > + ecpumap.map)) { > + LOGE(ERROR, "setting vcpu soft affinity"); > + rc = ERROR_FAIL; > + goto out; > + } > + if (cpumap_soft && !libxl_bitmap_equal(cpumap_soft, &ecpumap, nr_cpus)) > + LOG(DEBUG, "New soft affinity for vcpu %d contains unreachable cpus", > + vcpuid); > + > + /* > + * When setting hard affinity, it is guaranteed that the result will not > + * be empty, so we need to check for that only if soft. > + */ > + if (cpumap_soft && libxl_bitmap_is_empty(&ecpumap)) > + LOG(WARN, "New soft affinity for vcpu %d has only unreachable cpus." 
> + " Only hard affinity will be considered for scheduling", vcpuid); > + > + rc = 0; > + out: > + libxl_bitmap_dispose(&ecpumap); > + GC_FREE; > return 0;ITYM rreturn rc.> } > > int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, > - unsigned int max_vcpus, libxl_bitmap *cpumap) > + unsigned int max_vcpus, > + const libxl_bitmap *cpumap_hard, > + const libxl_bitmap *cpumap_soft) > { > + GC_INIT(ctx); > int i, rc = 0; > > for (i = 0; i < max_vcpus; i++) { > - if (libxl_set_vcpuaffinity(ctx, domid, i, cpumap)) { > - LIBXL__LOG(ctx, LIBXL__LOG_WARNING, > - "failed to set affinity for %d", i); > + if (libxl_set_vcpuaffinity(ctx, domid, i, cpumap_hard, cpumap_soft)) { > + LOG(WARN, "failed to set affinity for %d", i); > rc = ERROR_FAIL; > } > } > + > + GC_FREE; > return rc; > } > > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > index 2fab5ba..9864927 100644 > --- a/tools/libxl/libxl.h > +++ b/tools/libxl/libxl.h > @@ -82,6 +82,14 @@ > #define LIBXL_HAVE_DOMAIN_NODEAFFINITY 1 > > /* > + * LIBXL_HAVE_SOFT_AFFINITY indicates that a ''cpumap_soft'' > + * field (of libxl_bitmap type) is present in both > + * libxl_domain_build_info and libxl_vcpuinfo, containing > + * the soft affinity for the vcpu."the vcpu", in the build_info case I guess it applies to all vcpus?> + */ > +#define LIBXL_HAVE_SOFT_AFFINITY 1 > + > +/* > * LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE indicates that the > * libxl_vendor_device field is present in the hvm sections of > * libxl_domain_build_info. 
This field tells libxl which > @@ -994,9 +1002,35 @@ int libxl_userdata_retrieve(libxl_ctx *ctx, uint32_t domid, > > int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo); > int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, > - libxl_bitmap *cpumap); > + const libxl_bitmap *cpumap_hard, > + const libxl_bitmap *cpumap_soft); > int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, > - unsigned int max_vcpus, libxl_bitmap *cpumap); > + unsigned int max_vcpus, > + const libxl_bitmap *cpumap_hard, > + const libxl_bitmap *cpumap_soft); > + > +#if defined (LIBXL_API_VERSION) && LIBXL_API_VERSION < 0x040400 > + > +static inline > +int libxl_set_vcpuaffinity_0x040200(libxl_ctx *ctx, uint32_t domid, > + uint32_t vcpuid, libxl_bitmap *cpumap) > +{ > + return libxl_set_vcpuaffinity(ctx, domid, vcpuid, cpumap, NULL); > +} > + > +static inline > +int libxl_set_vcpuaffinity_all_0x040200(libxl_ctx *ctx, uint32_t domid, > + unsigned int max_vcpus, > + libxl_bitmap *cpumap) > +{ > + return libxl_set_vcpuaffinity_all(ctx, domid, max_vcpus, cpumap, NULL); > +} > + > +#define libxl_set_vcpuaffinity libxl_set_vcpuaffinity_0x040200 > +#define libxl_set_vcpuaffinity_all libxl_set_vcpuaffinity_all_0x040200This looks correct, thanks. 
It could also have jsut been #defined since the inline is only used in the libxl_domain_create_restore_0x040200 case because there is a local variable.> + > +#endif > + > int libxl_domain_set_nodeaffinity(libxl_ctx *ctx, uint32_t domid, > libxl_bitmap *nodemap); > int libxl_domain_get_nodeaffinity(libxl_ctx *ctx, uint32_t domid, > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > index fe7ba0d..58f80df 100644 > --- a/tools/libxl/libxl_create.c > +++ b/tools/libxl/libxl_create.c > @@ -193,6 +193,12 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, > libxl_bitmap_set_any(&b_info->cpumap); > } > > + if (!b_info->cpumap_soft.size) { > + if (libxl_cpu_bitmap_alloc(CTX, &b_info->cpumap_soft, 0)) > + return ERROR_FAIL; > + libxl_bitmap_set_any(&b_info->cpumap_soft); > + } > + > libxl_defbool_setdefault(&b_info->numa_placement, true); > > if (!b_info->nodemap.size) { > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c > index 72489f8..4bfed60 100644 > --- a/tools/libxl/libxl_dom.c > +++ b/tools/libxl/libxl_dom.c > @@ -236,7 +236,8 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid, > return rc; > } > libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap); > - libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap); > + libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap, > + &info->cpumap_soft); > > xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT); > xs_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL); > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl > index cba8eff..48699c9 100644 > --- a/tools/libxl/libxl_types.idl > +++ b/tools/libxl/libxl_types.idl > @@ -298,6 +298,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ > ("max_vcpus", integer), > ("avail_vcpus", libxl_bitmap), > ("cpumap", libxl_bitmap), > + ("cpumap_soft", libxl_bitmap), > ("nodemap", libxl_bitmap), > ("numa_placement", libxl_defbool), > 
("tsc_mode", libxl_tsc_mode), > @@ -510,7 +511,8 @@ libxl_vcpuinfo = Struct("vcpuinfo", [ > ("blocked", bool), > ("running", bool), > ("vcpu_time", uint64), # total vcpu time ran (ns) > - ("cpumap", libxl_bitmap), # current cpu''s affinities > + ("cpumap", libxl_bitmap), # current hard cpu affinity > + ("cpumap_soft", libxl_bitmap), # current soft cpu affinity > ], dir=DIR_OUT) > > libxl_physinfo = Struct("physinfo", [ > diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h > index b11cf28..c205dd7 100644 > --- a/tools/libxl/libxl_utils.h > +++ b/tools/libxl/libxl_utils.h > @@ -90,7 +90,7 @@ static inline void libxl_bitmap_set_none(libxl_bitmap *bitmap) > { > memset(bitmap->map, 0, bitmap->size); > } > -static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit) > +static inline int libxl_bitmap_valid(const libxl_bitmap *bitmap, int bit)This name suggests to me that it checks whether the entire bitmap is somehow valid. ..._bit_valid? (or valid_bit). An alternative might be for the places which use this to assume that any bit past the end of the mask must be zero (or 1, but I think 0 makes more sense).> { > return bit >= 0 && bit < (bitmap->size * 8); > } > @@ -98,6 +98,29 @@ static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit) > #define libxl_for_each_set_bit(v, m) for (v = 0; v < (m).size * 8; v++) \ > if (libxl_bitmap_test(&(m), v)) > > +static inline int libxl_bitmap_equal(const libxl_bitmap *ba, > + const libxl_bitmap *bb, > + int nr_bits) > +{ > + int i; > + > + /* Only check nr_bits (all bits if <= 0) */ > + nr_bits = (nr_bits <= 0) ? 
ba->size * 8 : nr_bits; > + for (i = 0; i < nr_bits; i++) { > + /* If overflowing one of the bitmaps, we call them different */ > + if (!libxl_bitmap_valid(ba, i) || !libxl_bitmap_valid(bb, i)) > + return 0; > + if (libxl_bitmap_test(ba, i) != libxl_bitmap_test(bb, i)) > + return 0; > + } > + return 1; > +} > + > +static inline int libxl_bitmap_cpu_valid(libxl_bitmap *cpumap, int cpu) > +{ > + return libxl_bitmap_valid(cpumap, cpu); > +} > + > int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, > libxl_bitmap *cpumap, > int max_cpus); > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c > index 7b4d058..17fffdd 100644 > --- a/tools/libxl/xl_cmdimpl.c > +++ b/tools/libxl/xl_cmdimpl.c > @@ -2222,7 +2222,7 @@ start: > } else { > libxl_bitmap_set_any(&vcpu_cpumap); > } > - if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap)) { > + if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap, NULL)) { > fprintf(stderr, "setting affinity failed on vcpu `%d''.\n", i); > libxl_bitmap_dispose(&vcpu_cpumap); > free(vcpu_to_pcpu); > @@ -4630,7 +4630,7 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) > } > > if (vcpuid != -1) { > - if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap) == -1) { > + if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) { > fprintf(stderr, "Could not set affinity for vcpu `%u''.\n", vcpuid); > goto out; > } > @@ -4642,7 +4642,7 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) > } > for (i = 0; i < nb_vcpu; i++) { > if (libxl_set_vcpuaffinity(ctx, domid, vcpuinfo[i].vcpuid, > - &cpumap) == -1) { > + &cpumap, NULL)) { > fprintf(stderr, "libxl_set_vcpuaffinity failed" > " on vcpu `%u''.\n", vcpuinfo[i].vcpuid); > } >
Ian Campbell
2013-Nov-27 14:57 UTC
Re: [PATCH v4 13/15] xl: enable getting and setting soft
On Fri, 2013-11-22 at 19:58 +0100, Dario Faggioli wrote:> Getting happens via `xl vcpu-list'', which now looks like this: > > # xl vcpu-list -s > Name ID VCPU CPU State Time(s) Affinity (Hard / Soft) > Domain-0 0 0 11 -b- 5.4 8-15 / all > Domain-0 0 1 11 -b- 1.0 8-15 / all > Domain-0 0 14 13 -b- 1.4 8-15 / all > Domain-0 0 15 8 -b- 1.6 8-15 / all > vm-test 3 0 4 -b- 2.5 0-12 / 0-7 > vm-test 3 1 0 -b- 3.2 0-12 / 0-7 > > Setting happens by adding a ''-s''/''--soft'' switch to `xl vcpu-pin''. > > xl manual page is updated accordingly. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > --- > Changes from v3: > * fix typos in doc, rephrased the help message and changed > the title of the column for hard/soft affinity, as suggested > during review. > > Changes from v2: > * this patch folds what in v2 were patches 13 and 14; > * `xl vcpu-pin'' always shows both had and soft affinity, > without the need of passing ''-s''. > --- > docs/man/xl.pod.1 | 24 +++++++++++++++ > tools/libxl/xl_cmdimpl.c | 70 +++++++++++++++++++++++++++------------------ > tools/libxl/xl_cmdtable.c | 3 +- > 3 files changed, 67 insertions(+), 30 deletions(-) > > diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1 > index e7b9de2..b738ca2 100644 > --- a/docs/man/xl.pod.1 > +++ b/docs/man/xl.pod.1 > @@ -619,7 +619,7 @@ after B<vcpu-set>, go to B<SEE ALSO> section for information. > Lists VCPU information for a specific domain. If no domain is > specified, VCPU information for all domains will be provided. > > -=item B<vcpu-pin> I<domain-id> I<vcpu> I<cpus> > +=item B<vcpu-pin> [I<OPTIONS>] I<domain-id> I<vcpu> I<cpus> > > Pins the VCPU to only run on the specific CPUs. The keyword > B<all> can be used to apply the I<cpus> list to all VCPUs in the > @@ -630,6 +630,28 @@ different run state is appropriate. Pinning can be used to restrict > this, by ensuring certain VCPUs can only run on certain physical > CPUs. 
>
> +B<OPTIONS>
> +
> +=over 4
> +
> +=item B<-s>, B<--soft>
> +
> +The same as above, but affect I<soft affinity> rather than pinning
> +(also called I<hard affinity>).
> +
> +Normally, VCPUs just wander among the CPUs where it is allowed to

s/it is/they are/

I'd usually avoid terms such as "wander" in formal writing. Perhaps: "A
VCPU is normally scheduled among a set of PCPUs."

> +run (either all the CPUs or the ones to which it is pinned, as said
> +for B<vcpu-list>). Soft affinity offers a means to specify one or

s/, as said for B<vcpu-list>// unless you think that adds something? I
can't see why vcpu-list is relevant though. Or just "(subject to
pinning)" would be a less wordy way to get the message across.

> +more I<preferred> CPUs. Basically, among the ones where it can run,
> +the VCPU the VCPU will greatly prefer to execute on one of these
> +CPUs, whenever that is possible.

"...one or more I<preferred> CPUs where the VCPU will prefer to run
whenever possible".

> +
> +Notice that, in order for soft affinity to actually work, it needs
> +special support in the scheduler. Right now, only credit1 provides
> +that.
> +
> +=back
> +
> =item B<vm-list>
>
> Prints information about guests. This list excludes information about
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 17fffdd..d6fda26 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -4630,23 +4649,29 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu)
>      }
>
>      if (vcpuid != -1) {
> -        if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) {
> -            fprintf(stderr, "Could not set affinity for vcpu `%u'.\n", vcpuid);
> +        if (!soft_affinity &&
> +            libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) {
> +            fprintf(stderr, "Could not set hard affinity for vcpu `%u'.\n",
> +                    vcpuid);
> +            goto out;
> +        } else if (soft_affinity &&
> +                   libxl_set_vcpuaffinity(ctx, domid, vcpuid, NULL, &cpumap)) {
> +            fprintf(stderr, "Could not set soft affinity for vcpu `%u'.\n",
> +                    vcpuid);

I think I would have written this as:

    libxl_bitmap *soft = NULL, *hard = NULL;
    if (soft_affinity)
        soft = &cpumap;
    else
        hard = &cpumap;

Then, in both of the branches of "if (vcpuid != -1)":

    libxl_set_vcpuaffinity(..., hard, soft)
    error handling

I think you can omit the soft/hard from the log message; other logging,
and/or the fact that callers know what they asked for, will cover the
distinction. If you really wanted, there's const char *what =
soft_affinity ? "soft" : "hard" to use.

Is there no option to set both?
Ian Campbell
2013-Nov-27 15:53 UTC
Re: [PATCH v4 14/15] xl: enable for specifying node-affinity in the config file
On Fri, 2013-11-22 at 19:58 +0100, Dario Faggioli wrote:> in a similar way to how it is possible to specify vcpu-affinity. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > --- > Changes from v3: > * fix typos and language issues in docs and comments, as > suggested during review; > * common code to soft and hard affinity parsing factored > together, as requested uring review. > > Changes from v2: > * use the new libxl API. Although the implementation changed > only a little bit, I removed IanJ''s Acked-by, although I am > here saying that he did provided it, as requested. > --- > docs/man/xl.cfg.pod.5 | 25 +++++- > tools/libxl/xl_cmdimpl.c | 192 ++++++++++++++++++++++++++++++---------------- > 2 files changed, 146 insertions(+), 71 deletions(-) > > diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 > index 5e17b5d..9566393 100644 > --- a/docs/man/xl.cfg.pod.5 > +++ b/docs/man/xl.cfg.pod.5 > +=item B<cpus_soft="CPU-LIST"> > + > +Exactly as B<cpus=>, but specifies soft affinity, rather than pinning > +(also called hard affinity). When using the credit scheduler, this > +means the vcpus of the domain prefers to run on these pcpus.Just "prefer".> + > +A C<CPU-LIST> is specified exactly as above, for B<cpus=>. > + > +If this option is not specified, the vcpus of the guest will not have > +any preference regarding on what cpu to run, and the scheduler will > +treat all the cpus where a vcpu can execute (if B<cpus=> is specified), > +or all the host cpus (if not), the same.You can end this sentence after "on what cpu to run" without losing information IMHO.> If this option is specified, > +the intersection of the soft affinity mask, provided here, and the vcpu > +pinning, provided via B<cpus=> (if any), is utilized to compute the > +domain node-affinity, for driving memory allocations. 
> + > =back > > =head3 CPU Scheduling > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c > index d6fda26..abd3e42 100644 > --- a/tools/libxl/xl_cmdimpl.c > +++ b/tools/libxl/xl_cmdimpl.c > @@ -76,8 +76,9 @@ xlchild children[child_max]; > static const char *common_domname; > static int fd_lock = -1; > > -/* Stash for specific vcpu to pcpu mappping */ > +/* Stash for specific vcpu to pcpu hard and soft mappping */You could fix the spelling of mapping while here.> static int *vcpu_to_pcpu; > +static int *vcpu_to_pcpu_soft; > > static const char savefileheader_magic[32]> "Xen saved domain, xl format\n \0 \r"; > @@ -687,6 +688,64 @@ static int vcpupin_parse(char *cpu, libxl_bitmap *cpumap) > return rc; > } > > +static int *parse_config_cpumap_list(XLU_ConfigList *cpus, > + libxl_bitmap *cpumap, > + int max_vcpus)I assume this is pure motion/refactoring? Or do I need to read it?> +{ > + int i, n_cpus = 0; > + int *to_pcpu; > + const char *buf; > + > + if (libxl_cpu_bitmap_alloc(ctx, cpumap, 0)) { > + fprintf(stderr, "Unable to allocate cpumap\n"); > + exit(1); > + } > + > + /* Prepare the array for single vcpu to pcpu mappings */ > + to_pcpu = xmalloc(sizeof(int) * max_vcpus); > + memset(to_pcpu, -1, sizeof(int) * max_vcpus); > + > + /* > + * Idea here is to let libxl think all the domain''s vcpus > + * have cpu affinity with all the pcpus on the list. Doing > + * that ensures memory is allocated on the proper NUMA nodes. > + * It is then us, here in xl, that matches each single vcpu > + * to its pcpu (and that''s why we need to stash such info in > + * the to_pcpu array now) after the domain has been created. > + * This way, we avoid having to pass to libxl some big array > + * hosting the single mappings. 
> + */ > + libxl_bitmap_set_none(cpumap); > + while ((buf = xlu_cfg_get_listitem(cpus, n_cpus)) != NULL) { > + i = atoi(buf); > + if (!libxl_bitmap_cpu_valid(cpumap, i)) { > + fprintf(stderr, "cpu %d illegal\n", i); > + exit(1); > + } > + libxl_bitmap_set(cpumap, i); > + if (n_cpus < max_vcpus) > + to_pcpu[n_cpus] = i; > + n_cpus++; > + } > + > + return to_pcpu; > +} > + > +static void parse_config_cpumap_string(const char *buf, libxl_bitmap *cpumap)Likewise this is also motion/refactoring, correct?> +{ > + char *buf2 = strdup(buf); > + > + if (libxl_cpu_bitmap_alloc(ctx, cpumap, 0)) { > + fprintf(stderr, "Unable to allocate cpumap\n"); > + exit(1); > + } > + > + libxl_bitmap_set_none(cpumap); > + if (vcpupin_parse(buf2, cpumap)) > + exit(1); > + free(buf2); > +} > + > static void parse_config_data(const char *config_source, > const char *config_data, > int config_len, > @@ -697,7 +756,8 @@ static void parse_config_data(const char *config_source, > const char *buf; > long l; > XLU_Config *config; > - XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms; > + XLU_ConfigList *cpus, *cpus_soft, *vbds, *nics, *pcis; > + XLU_ConfigList *cvfbs, *cpuids, *vtpms; > XLU_ConfigList *ioports, *irqs, *iomem; > int num_ioports, num_irqs, num_iomem; > int pci_power_mgmt = 0; > @@ -820,57 +880,24 @@ static void parse_config_data(const char *config_source, > b_info->max_vcpus = l; > > if (!xlu_cfg_get_list (config, "cpus", &cpus, 0, 1)) { > - int n_cpus = 0; > - > - if (libxl_cpu_bitmap_alloc(ctx, &b_info->cpumap, 0)) { > - fprintf(stderr, "Unable to allocate cpumap\n"); > - exit(1); > - } > - > - /* Prepare the array for single vcpu to pcpu mappings */ > - vcpu_to_pcpu = xmalloc(sizeof(int) * b_info->max_vcpus); > - memset(vcpu_to_pcpu, -1, sizeof(int) * b_info->max_vcpus); > - > - /* > - * Idea here is to let libxl think all the domain''s vcpus > - * have cpu affinity with all the pcpus on the list. 
> - * It is then us, here in xl, that matches each single vcpu > - * to its pcpu (and that''s why we need to stash such info in > - * the vcpu_to_pcpu array now) after the domain has been created. > - * Doing it like this saves the burden of passing to libxl > - * some big array hosting the single mappings. Also, using > - * the cpumap derived from the list ensures memory is being > - * allocated on the proper nodes anyway. > - */ > - libxl_bitmap_set_none(&b_info->cpumap); > - while ((buf = xlu_cfg_get_listitem(cpus, n_cpus)) != NULL) { > - i = atoi(buf); > - if (!libxl_bitmap_cpu_valid(&b_info->cpumap, i)) { > - fprintf(stderr, "cpu %d illegal\n", i); > - exit(1); > - } > - libxl_bitmap_set(&b_info->cpumap, i); > - if (n_cpus < b_info->max_vcpus) > - vcpu_to_pcpu[n_cpus] = i; > - n_cpus++; > - } > - > + vcpu_to_pcpu = parse_config_cpumap_list(cpus, &b_info->cpumap, > + b_info->max_vcpus); > /* We have a cpumap, disable automatic placement */ > libxl_defbool_set(&b_info->numa_placement, false); > } > else if (!xlu_cfg_get_string (config, "cpus", &buf, 0)) { > - char *buf2 = strdup(buf); > - > - if (libxl_cpu_bitmap_alloc(ctx, &b_info->cpumap, 0)) { > - fprintf(stderr, "Unable to allocate cpumap\n"); > - exit(1); > - } > - > - libxl_bitmap_set_none(&b_info->cpumap); > - if (vcpupin_parse(buf2, &b_info->cpumap)) > - exit(1); > - free(buf2); > + parse_config_cpumap_string(buf, &b_info->cpumap); > + libxl_defbool_set(&b_info->numa_placement, false); > + } > > + if (!xlu_cfg_get_list (config, "cpus_soft", &cpus_soft, 0, 1)) { > + vcpu_to_pcpu_soft > + parse_config_cpumap_list(cpus_soft, &b_info->cpumap_soft, > + b_info->max_vcpus); > + libxl_defbool_set(&b_info->numa_placement, false); > + } > + else if (!xlu_cfg_get_string (config, "cpus_soft", &buf, 0)) { > + parse_config_cpumap_string(buf, &b_info->cpumap_soft); > libxl_defbool_set(&b_info->numa_placement, false); > }This is / else if pattern is the same for both types of bitmap, right? 
Can you make it a function which takes config, "cpus_soft"/"cpus" and a pointer to the map to fill in etc?> > @@ -1991,6 +2018,40 @@ static void evdisable_disk_ejects(libxl_evgen_disk_eject **diskws, > } > } > > +static inline int set_vcpu_to_pcpu_affinity(uint32_t domid, int *to_pcpu, > + int max_vcpus, int soft) > +{ > + libxl_bitmap vcpu_cpumap; > + int i, ret = 0; > + > + ret = libxl_cpu_bitmap_alloc(ctx, &vcpu_cpumap, 0); > + if (ret) > + return -1; > + > + for (i = 0; i < max_vcpus; i++) { > + if (to_pcpu[i] != -1) { > + libxl_bitmap_set_none(&vcpu_cpumap); > + libxl_bitmap_set(&vcpu_cpumap, to_pcpu[i]); > + } else { > + libxl_bitmap_set_any(&vcpu_cpumap); > + } > + if (!soft && > + libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap, NULL)) { > + fprintf(stderr, "setting hard affinity failed on vcpu `%d''.\n", i); > + ret = -1; > + break; > + } else if (soft && > + libxl_set_vcpuaffinity(ctx, domid, i, NULL, &vcpu_cpumap)) {Another place where using termporary hard and soft pointers initialised from the soft boolean would work I think. Ian.
Ian Campbell
2013-Nov-27 15:55 UTC
Re: [PATCH v4 15/15] libxl: automatic NUMA placement affects soft affinity
On Fri, 2013-11-22 at 19:58 +0100, Dario Faggioli wrote:
> vCPU soft affinity and NUMA-aware scheduling do not have
> to be related. However, soft affinity is how NUMA-aware
> scheduling is actually implemented, and therefore, by default,
> the results of automatic NUMA placement (at VM creation time)
> are also used to set the soft affinity of all the vCPUs of
> the domain.
>
> Of course, this only happens if automatic NUMA placement is
> enabled and actually takes place (for instance, if the user
> does not specify any hard or soft affinity in the xl config
> file).
>
> This also takes care of the vice-versa, i.e., it doesn't trigger
> automatic placement if the config file specifies either a
> hard (the check for which was already there) or a soft (the
> check for which is introduced by this commit) affinity.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>
Dario Faggioli
2013-Dec-02 18:10 UTC
Re: [PATCH v4 13/15] xl: enable getting and setting soft
On mer, 2013-11-27 at 14:57 +0000, Ian Campbell wrote:> On Fri, 2013-11-22 at 19:58 +0100, Dario Faggioli wrote: > > --- a/tools/libxl/xl_cmdimpl.c > > +++ b/tools/libxl/xl_cmdimpl.c > > @@ -4630,23 +4649,29 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) > > } > > > > if (vcpuid != -1) { > > - if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) { > > - fprintf(stderr, "Could not set affinity for vcpu `%u''.\n", vcpuid); > > + if (!soft_affinity && > > + libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) { > > + fprintf(stderr, "Could not set hard affinity for vcpu `%u''.\n", > > + vcpuid); > > + goto out; > > + } else if (soft_affinity && > > + libxl_set_vcpuaffinity(ctx, domid, vcpuid, NULL, &cpumap)) { > > + fprintf(stderr, "Could not set soft affinity for vcpu `%u''.\n", > > + vcpuid); > > I think I would have written this as: > libxl_bitmap *soft = NULL, *hard = NULL; > if (soft_affinity) > soft = &cpumap > else > hard = &cpumap > > The in both of the branches "if (vcpu != -1)": > libxl_set_vcpuaffinity(..., hard, soft) > error handling > > I think you can omit the soft/hard from the log message other logging > and/or the fact that they know what they asked for will cover the > distinction. If you really wanted the const char *what = soft_affinity ? > "soft" : "hard" to use. >Done this, and all the that was suggested in this e-mail.> Is there no option to set both? >No, not for now. Do you think there should be one? Personally, I think we should keep `xl vcpu-pin <dom> <vcpu> <affinity>'' working, and doing the thing that people are most used with, i.e., setting the pinning (hard affinity). At that point, the most natural thing to do seemed to add a switch to have this deal with soft affinity, which is what happens in this patch. 
I can of course come up with pretty much any sort of behavior, e.g.,
setting both hard and soft affinity to the same cpumap if no switch is
provided, and then introducing both a '-h' and a '-s' for specifying
different cpumaps at the same time, etc. Of course, that could well
happen during 4.5, I guess. :-P

So, what do we want?

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Dario Faggioli
2013-Dec-02 18:17 UTC
Re: [PATCH v4 12/15] libxl: get and set soft affinity
On mer, 2013-11-27 at 14:45 +0000, Ian Campbell wrote:
> On Fri, 2013-11-22 at 19:58 +0100, Dario Faggioli wrote:
> > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> > index 961d55e..4b871d7 100644
> > --- a/tools/libxl/libxl.c
> > +++ b/tools/libxl/libxl.c
> > @@ -4538,6 +4538,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid,
> >      for (*nb_vcpu = 0; *nb_vcpu <= domaininfo.max_vcpu_id; ++*nb_vcpu, ++ptr) {
> >          if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0))
> >              return NULL;
> > +        if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap_soft, 0))
> > +            return NULL;
>
> Leaks ptr and ptr->cpumap...
>
Addressed in a separate patch (still part of the new version, v5, of the series).

> > + */
> > +#define LIBXL_HAVE_SOFT_AFFINITY 1
> > +
> > +/*
> >  * LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE indicates that the
> >  * libxl_vendor_device field is present in the hvm sections of
> >  * libxl_domain_build_info. This field tells libxl which
> > @@ -994,9 +1002,35 @@ int libxl_userdata_retrieve(libxl_ctx *ctx, uint32_t domid,
> >
> >  int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo);
> >  int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid,
> > -                           libxl_bitmap *cpumap);
> > +                           const libxl_bitmap *cpumap_hard,
> > +                           const libxl_bitmap *cpumap_soft);
> >  int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid,
> > -                               unsigned int max_vcpus, libxl_bitmap *cpumap);
> > +                               unsigned int max_vcpus,
> > +                               const libxl_bitmap *cpumap_hard,
> > +                               const libxl_bitmap *cpumap_soft);
> > +
> > +#if defined (LIBXL_API_VERSION) && LIBXL_API_VERSION < 0x040400
> > +
> > +static inline
> > +int libxl_set_vcpuaffinity_0x040200(libxl_ctx *ctx, uint32_t domid,
> > +                                    uint32_t vcpuid, libxl_bitmap *cpumap)
> > +{
> > +    return libxl_set_vcpuaffinity(ctx, domid, vcpuid, cpumap, NULL);
> > +}
> > +
> > +static inline
> > +int libxl_set_vcpuaffinity_all_0x040200(libxl_ctx *ctx, uint32_t domid,
> > +                                        unsigned int max_vcpus,
> > +                                        libxl_bitmap *cpumap)
> > +{
> > +    return libxl_set_vcpuaffinity_all(ctx, domid, max_vcpus, cpumap, NULL);
> > +}
> > +
> > +#define libxl_set_vcpuaffinity libxl_set_vcpuaffinity_0x040200
> > +#define libxl_set_vcpuaffinity_all libxl_set_vcpuaffinity_all_0x040200
>
> This looks correct, thanks. It could also have just been #defined, since
> the inline is only used in the libxl_domain_create_restore_0x040200 case
> because there is a local variable.
>
Yes, I saw that.

You mean something like this:

  #define libxl_set_vcpuaffinity(_ctx, _domid, _vcpuid, _map) \
      libxl_set_vcpuaffinity((_ctx), (_domid), (_vcpuid), (_map), NULL)

?

Well, I guess I can, but don't I get more protection for the arguments / less risk of clashing with the app namespace, if I do? That's the main reason why I used the inline variant, and that is true for v5 as well. If you prefer to go with the `#define' only, I can respin (either the series or the patch).

Thanks and Regards,
Dario
Dario Faggioli
2013-Dec-02 18:21 UTC
Re: [PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
On mer, 2013-11-27 at 13:45 +0000, Ian Campbell wrote:
> On Fri, 2013-11-22 at 19:56 +0100, Dario Faggioli wrote:
> >
> >      cpumap->map = xc_cpupool_freeinfo(ctx->xch);
> > diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
> > index 9f5f589..1815422 100644
> > --- a/tools/libxl/libxl_utils.c
> > +++ b/tools/libxl/libxl_utils.c
> > @@ -651,6 +651,56 @@ char *libxl_bitmap_to_hex_string(libxl_ctx *ctx, const libxl_bitmap *bitmap)
> >      return q;
> >  }
> >
> > +int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap, int max_cpus)
>
> You seem to have combined code motion with actual changes here. Please
> don't do that.
>
As I explained in the commit message for this patch in v5, I am indeed both moving and (slightly) changing the functions, mainly because the changes I'm doing require the move (e.g., GC_INIT/GC_FREE not being available in the original place).

I feel it's more natural to do it like this, rather than having a pre-patch just moving the code for no apparent reason... Isn't it? Or was it something else that you were pointing out?

(Anyway, feel free to look at v5 and reply there.)

Thanks and Regards,
Dario
Dario Faggioli
2013-Dec-02 18:22 UTC
Re: [PATCH v4 14/15] xl: enable for specifying node-affinity in the config file
On mer, 2013-11-27 at 15:53 +0000, Ian Campbell wrote:
> On Fri, 2013-11-22 at 19:58 +0100, Dario Faggioli wrote:
> > +static int *parse_config_cpumap_list(XLU_ConfigList *cpus,
> > +                                     libxl_bitmap *cpumap,
> > +                                     int max_vcpus)
>
> I assume this is pure motion/refactoring? Or do I need to read it?
>
It is indeed motion/refactoring, and so it is in v5.

> > +{
> > +    int i, n_cpus = 0;
> > +    int *to_pcpu;
> > +    const char *buf;
> > +
> > +    if (libxl_cpu_bitmap_alloc(ctx, cpumap, 0)) {
> > +        fprintf(stderr, "Unable to allocate cpumap\n");
> > +        exit(1);
> > +    }
> > +
> > +    /* Prepare the array for single vcpu to pcpu mappings */
> > +    to_pcpu = xmalloc(sizeof(int) * max_vcpus);
> > +    memset(to_pcpu, -1, sizeof(int) * max_vcpus);
> > +
> > +    /*
> > +     * The idea here is to let libxl think all the domain's vcpus
> > +     * have cpu affinity with all the pcpus on the list. Doing
> > +     * that ensures memory is allocated on the proper NUMA nodes.
> > +     * It is then us, here in xl, that match each single vcpu
> > +     * to its pcpu (and that's why we need to stash such info in
> > +     * the to_pcpu array now) after the domain has been created.
> > +     * This way, we avoid having to pass to libxl some big array
> > +     * hosting the single mappings.
> > +     */
> > +    libxl_bitmap_set_none(cpumap);
> > +    while ((buf = xlu_cfg_get_listitem(cpus, n_cpus)) != NULL) {
> > +        i = atoi(buf);
> > +        if (!libxl_bitmap_cpu_valid(cpumap, i)) {
> > +            fprintf(stderr, "cpu %d illegal\n", i);
> > +            exit(1);
> > +        }
> > +        libxl_bitmap_set(cpumap, i);
> > +        if (n_cpus < max_vcpus)
> > +            to_pcpu[n_cpus] = i;
> > +        n_cpus++;
> > +    }
> > +
> > +    return to_pcpu;
> > +}
> > +
> > +static void parse_config_cpumap_string(const char *buf, libxl_bitmap *cpumap)
>
> Likewise this is also motion/refactoring, correct?
>
Same here.

Thanks and Regards,
Dario
Ian Campbell
2013-Dec-03 09:32 UTC
Re: [PATCH v4 13/15] xl: enable getting and setting soft
On Mon, 2013-12-02 at 19:10 +0100, Dario Faggioli wrote:
> > Is there no option to set both?
> >
> No, not for now. Do you think there should be one?

Is it never necessary to set them "atomically"? Or is it always possible to set one arbitrarily before the other?

> Personally, I think we should keep `xl vcpu-pin <dom> <vcpu> <affinity>'
> working, and doing the thing that people are most used to, i.e.,
> setting the pinning (hard affinity).

Agreed.

> At that point, the most natural
> thing to do seemed to add a switch to have this deal with soft affinity,
> which is what happens in this patch.
>
> I of course can come up with pretty much all sorts of behavior, e.g.,
> setting both hard and soft affinity to the same cpumap, if no switch is
> provided, and then introduce both a '-h' and a '-s', for specifying
> different cpumaps, at the same time, etc.

Is setting them both the same pretty much identical in practice to setting the hard one and leaving the soft alone?

I was actually thinking of -b/--both, but -s as you suggest could work too, except -h is help so it would need a different letter.

Another option would be to take two affinities on the command line, the first being hard and the second soft, with some special syntax ("-"?) for "leave it alone".

> Of course, that could well happen during 4.5, I guess. :-P

Sure.

> So, what do we want?
>
> Regards,
> Dario
On Mon, 2013-12-02 at 19:17 +0100, Dario Faggioli wrote:
> On mer, 2013-11-27 at 14:45 +0000, Ian Campbell wrote:
> > > > + */
> > > > +#define LIBXL_HAVE_SOFT_AFFINITY 1
> > > > +
> > > > +/*
> > > >  * LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE indicates that the
> > > >  * libxl_vendor_device field is present in the hvm sections of
> > > >  * libxl_domain_build_info. This field tells libxl which
> > > > @@ -994,9 +1002,35 @@ int libxl_userdata_retrieve(libxl_ctx *ctx, uint32_t domid,
> > > >
> > > >  int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo);
> > > >  int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid,
> > > > -                           libxl_bitmap *cpumap);
> > > > +                           const libxl_bitmap *cpumap_hard,
> > > > +                           const libxl_bitmap *cpumap_soft);
> > > >  int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid,
> > > > -                               unsigned int max_vcpus, libxl_bitmap *cpumap);
> > > > +                               unsigned int max_vcpus,
> > > > +                               const libxl_bitmap *cpumap_hard,
> > > > +                               const libxl_bitmap *cpumap_soft);
> > > > +
> > > > +#if defined (LIBXL_API_VERSION) && LIBXL_API_VERSION < 0x040400
> > > > +
> > > > +static inline
> > > > +int libxl_set_vcpuaffinity_0x040200(libxl_ctx *ctx, uint32_t domid,
> > > > +                                    uint32_t vcpuid, libxl_bitmap *cpumap)
> > > > +{
> > > > +    return libxl_set_vcpuaffinity(ctx, domid, vcpuid, cpumap, NULL);
> > > > +}
> > > > +
> > > > +static inline
> > > > +int libxl_set_vcpuaffinity_all_0x040200(libxl_ctx *ctx, uint32_t domid,
> > > > +                                        unsigned int max_vcpus,
> > > > +                                        libxl_bitmap *cpumap)
> > > > +{
> > > > +    return libxl_set_vcpuaffinity_all(ctx, domid, max_vcpus, cpumap, NULL);
> > > > +}
> > > > +
> > > > +#define libxl_set_vcpuaffinity libxl_set_vcpuaffinity_0x040200
> > > > +#define libxl_set_vcpuaffinity_all libxl_set_vcpuaffinity_all_0x040200
> > >
> > > This looks correct, thanks. It could also have just been #defined, since
> > > the inline is only used in the libxl_domain_create_restore_0x040200 case
> > > because there is a local variable.
> > >
> Yes, I saw that.
>
> You mean something like this:
>
>   #define libxl_set_vcpuaffinity(_ctx, _domid, _vcpuid, _map) \
>       libxl_set_vcpuaffinity((_ctx), (_domid), (_vcpuid), (_map), NULL)
>
> ?

Something like that, yes.

> Well, I guess I can, but don't I get more protection for the arguments /
> less risk of clashing with the app namespace, if I do?

Since the arguments of the inner function which are common to the outer one have the same types, I don't think you gain any extra type checking, and I don't think the namespace clashes in practice. (BTW, there is no need for a leading _ in macro arguments, IIRC.)

> That's the main reason why I used the inline variant, and that is true
> for v5 as well. If you prefer to go with the `#define' only, I can
> respin (either the series or the patch).

I don't have a strong preference, I was just observing in case you thought it nicer.

Ian.
Ian Campbell
2013-Dec-03 09:41 UTC
Re: [PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
On Mon, 2013-12-02 at 19:21 +0100, Dario Faggioli wrote:
> On mer, 2013-11-27 at 13:45 +0000, Ian Campbell wrote:
> > On Fri, 2013-11-22 at 19:56 +0100, Dario Faggioli wrote:
> > >
> > >      cpumap->map = xc_cpupool_freeinfo(ctx->xch);
> > > diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
> > > index 9f5f589..1815422 100644
> > > --- a/tools/libxl/libxl_utils.c
> > > +++ b/tools/libxl/libxl_utils.c
> > > @@ -651,6 +651,56 @@ char *libxl_bitmap_to_hex_string(libxl_ctx *ctx, const libxl_bitmap *bitmap)
> > >      return q;
> > >  }
> > >
> > > +int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap, int max_cpus)
> >
> > You seem to have combined code motion with actual changes here. Please
> > don't do that.
> >
> As I explained in the commit message for this patch in v5, I am indeed
> both moving and (slightly) changing the functions, mainly because the
> changes I'm doing require the move (e.g., GC_INIT/GC_FREE not being
> available in the original place).
>
> I feel it's more natural to do it like this, rather than having a
> pre-patch just moving the code for no apparent reason... Isn't it?

Certainly, making changes which are necessarily required by the code motion is fine to do in a single step, although even then, if you can arrange to make the changes either before or after the move, it is better.

But is that what is happening here?

-static inline int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap,
-                                         int max_cpus)
-{
-    if (max_cpus < 0)
-        return ERROR_INVAL;
-    if (max_cpus == 0)
-        max_cpus = libxl_get_max_cpus(ctx);
-    if (max_cpus == 0)
-        return ERROR_FAIL;
-
-    return libxl_bitmap_alloc(ctx, cpumap, max_cpus);
-}

is becoming:

+int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap, int max_cpus)
+{
+    GC_INIT(ctx);
+    int rc = 0;
+
+    if (max_cpus < 0) {
+        rc = ERROR_INVAL;
+        goto out;
+    }
+    if (max_cpus == 0)
+        max_cpus = libxl_get_max_cpus(ctx);
+    if (max_cpus <= 0) {
+        LOG(ERROR, "failed to retrieve the maximum number of cpus");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+    /* This can't fail: no need to check and log */
+    libxl_bitmap_alloc(ctx, cpumap, max_cpus);
+
+ out:
+    GC_FREE;
+    return rc;
+}

which involves a lot more reworking than simply adding the GC. In any case, I don't see why the original function couldn't be moved as is and then have the GC-ification added in the new location; I don't think the move requires the change to using the GC in any way.

> Or was it something else that you were pointing out?
>
> (Anyway, feel free to look at v5 and reply there).

In general it would be better to reply to vN either up front or as you do the rework, so we can resolve any misunderstanding, rather than waiting until after you've posted vN+1 and perhaps requiring a vN+2.

Ian.
Dario Faggioli
2013-Dec-03 10:27 UTC
Re: [PATCH v4 13/15] xl: enable getting and setting soft
On mar, 2013-12-03 at 09:32 +0000, Ian Campbell wrote:
> On Mon, 2013-12-02 at 19:10 +0100, Dario Faggioli wrote:
> > > Is there no option to set both?
> > >
> > No, not for now. Do you think there should be one?
>
> Is it never necessary to set them "atomically"? Or is it always possible
> to set one arbitrarily before the other?
>
From the user perspective (as we're discussing the `xl' interface), there certainly are cases where one wants both to have the same or different values, but it is possible to achieve that by setting them independently (it just takes more time, since you need two `xl vcpu-pin' commands).

Setting soft affinity *never* fails, while setting hard affinity may fail, in the same exact circumstances in which it is failing right now (e.g., setting hard affinity completely outside of your cpupool). Moreover, setting either one does not interfere with the other (apart from the possibility of getting a warning, if the system goes through some funny transient state).

Therefore, I don't think there is a hard requirement for this, although I'm certainly up for trying to make this more comfortable to use, once in 4.5.

> > At that point, the most natural
> > thing to do seemed to add a switch to have this deal with soft affinity,
> > which is what happens in this patch.
> >
> > I of course can come up with pretty much all sorts of behavior, e.g.,
> > setting both hard and soft affinity to the same cpumap, if no switch is
> > provided, and then introduce both a '-h' and a '-s', for specifying
> > different cpumaps, at the same time, etc.
>
> Is setting them both the same pretty much identical in practice to
> setting the hard one and leaving the soft alone?
>
Yes it is. That's why I'd rather provide a mechanism for setting both, but each one to its own value (which could be hard=3-5, soft=all). I think it's more useful.

> I was actually thinking of -b/--both, but -s as you suggest could work too,
> except -h is help so would need a different letter.
>
Well, sure, -h was pretty stupid, sorry. :-)

> Another option would be to take two affinities on the command line, the
> first being hard and the second soft with some special syntax ("-"?) for
> "leave it alone".
>
Mmm... I think I like this better. So, if we take the approach implemented in this patch right now, and introduce something like that later, here's what we'd get:

 1. xl vcpu-pin 3 all 4-6      : hard affinity for all dom3 vcpus --> 4-6
 2. xl vcpu-pin -s 3 all 3,5   : soft affinity for all dom3 vcpus --> 3,5
 3. xl vcpu-pin 3 all 4-6 3,5  : combination of 1 and 2
 4. xl vcpu-pin 3 all - 3,5    : same as 2

Would that be ok?

Thanks and Regards,
Dario
Ian Campbell
2013-Dec-03 10:59 UTC
Re: [PATCH v4 13/15] xl: enable getting and setting soft
On Tue, 2013-12-03 at 11:27 +0100, Dario Faggioli wrote:
> On mar, 2013-12-03 at 09:32 +0000, Ian Campbell wrote:
> > Another option would be to take two affinities on the command line, the
> > first being hard and the second soft with some special syntax ("-"?) for
> > "leave it alone".
> >
> Mmm... I think I like this better. So, if we take the approach
> implemented in this patch right now, and introduce something like that
> later, here's what we'd get:
>
> 1. xl vcpu-pin 3 all 4-6 : hard affinity for all dom3 vcpus --> 4-6

You mean soft, don't you? And also hard -> all, not left alone.

> 2. xl vcpu-pin -s 3 all 3,5 : soft affinity for all dom3 vcpus --> 3,5

I think if you give -s then you should expect only one cpumap, which would be the soft affinity map.

> 3. xl vcpu-pin 3 all 4-6 3,5 : combination of 1 and 2

Modulo the correction above, yes.

> 4. xl vcpu-pin 3 all - 3,5 : same as 2

Yes.

> Would that be ok?

I like it.

Ian.
Dario Faggioli
2013-Dec-03 11:14 UTC
Re: [PATCH v4 13/15] xl: enable getting and setting soft
On mar, 2013-12-03 at 10:59 +0000, Ian Campbell wrote:
> On Tue, 2013-12-03 at 11:27 +0100, Dario Faggioli wrote:
> > On mar, 2013-12-03 at 09:32 +0000, Ian Campbell wrote:
> > > Another option would be to take two affinities on the command line, the
> > > first being hard and the second soft with some special syntax ("-"?) for
> > > "leave it alone".
> > >
> > Mmm... I think I like this better. So, if we take the approach
> > implemented in this patch right now, and introduce something like that
> > later, here's what we'd get:
> >
> > 1. xl vcpu-pin 3 all 4-6 : hard affinity for all dom3 vcpus --> 4-6
>
> You mean soft, don't you? And also hard -> all, not left alone.
>
Aha! :-) Actually, the 'all' here (and below) refers to "which of dom3's vcpu(s)?":

  xl vcpu-pin <OPTIONS> <domain-id> <vcpu> <cpus>

So, no, I did mean hard and, more specifically, I included that example because that is what is being used right now (and I mean _right_now_, so before this series), and that we both agreed should continue working.

Now that you made me think about this, going this way we risk having something like:

  xl vcpu-pin 3 all all all

which may not exactly look crystal-clear...

> > 2. xl vcpu-pin -s 3 all 3,5 : soft affinity for all dom3 vcpus --> 3,5
>
> I think if you give -s then you should expect only one cpumap, which
> would be the soft affinity map.
>
Same here. Another example could be:

  xl vcpu-pin -s 3 0 3,5

which has indeed only one cpumap, "3,5", while the "0" means vcpu #0 of domain 3.

> > Would that be ok?
>
> I like it.
>
So, is this still true? :-D

Dario
Ian Campbell
2013-Dec-03 11:18 UTC
Re: [PATCH v4 13/15] xl: enable getting and setting soft
On Tue, 2013-12-03 at 12:14 +0100, Dario Faggioli wrote:
> On mar, 2013-12-03 at 10:59 +0000, Ian Campbell wrote:
> > On Tue, 2013-12-03 at 11:27 +0100, Dario Faggioli wrote:
> > > On mar, 2013-12-03 at 09:32 +0000, Ian Campbell wrote:
> > > > Another option would be to take two affinities on the command line, the
> > > > first being hard and the second soft with some special syntax ("-"?) for
> > > > "leave it alone".
> > > >
> > > Mmm... I think I like this better. So, if we take the approach
> > > implemented in this patch right now, and introduce something like that
> > > later, here's what we'd get:
> > >
> > > 1. xl vcpu-pin 3 all 4-6 : hard affinity for all dom3 vcpus --> 4-6
> >
> > You mean soft, don't you? And also hard -> all, not left alone.
> >
> Aha! :-) Actually, the 'all' here (and below) refers to "which of dom3's
> vcpu(s)?":
>
>   xl vcpu-pin <OPTIONS> <domain-id> <vcpu> <cpus>
>
> So, no, I did mean hard and, more specifically, I included that example
> because that is what is being used right now (and I mean _right_now_, so
> before this series), and that we both agreed should continue working.

Oh, yes, I was confused by the "all" for the vcpu field.

> Now that you made me think about this, going this way we risk having
> something like:
>
>   xl vcpu-pin 3 all all all
>
> which may not exactly look crystal-clear...

It's a bit odd, isn't it, but I think it is livable with.

> > > 2. xl vcpu-pin -s 3 all 3,5 : soft affinity for all dom3 vcpus --> 3,5
> >
> > I think if you give -s then you should expect only one cpumap, which
> > would be the soft affinity map.
> >
> Same here. Another example could be:
>
>   xl vcpu-pin -s 3 0 3,5
>
> which has indeed only one cpumap, "3,5", while the "0" means vcpu #0
> of domain 3.
>
> > > Would that be ok?
> >
> > I like it.
> >
> So, is this still true? :-D

Yes. All IMHO of course -- would be nice to hear what others think.

Ian.
Dario Faggioli
2013-Dec-03 11:40 UTC
Re: [PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
On mar, 2013-12-03 at 09:41 +0000, Ian Campbell wrote:
> On Mon, 2013-12-02 at 19:21 +0100, Dario Faggioli wrote:
> > I feel it's more natural to do it like this, rather than having a
> > pre-patch just moving the code for no apparent reason... Isn't it?
>
> Certainly, making changes which are necessarily required by the code
> motion is fine to do in a single step, although even then, if you can
> arrange to make the changes either before or after the move, it is
> better.
>
> But is that what is happening here?
>
> -static inline int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap,
> -                                         int max_cpus)
> -{
> -    if (max_cpus < 0)
> -        return ERROR_INVAL;
> -    if (max_cpus == 0)
> -        max_cpus = libxl_get_max_cpus(ctx);
> -    if (max_cpus == 0)
> -        return ERROR_FAIL;
> -
> -    return libxl_bitmap_alloc(ctx, cpumap, max_cpus);
> -}
>
> is becoming:
>
> +int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap, int max_cpus)
> +{
> +    GC_INIT(ctx);
> +    int rc = 0;
> +
> +    if (max_cpus < 0) {
> +        rc = ERROR_INVAL;
> +        goto out;
> +    }
> +    if (max_cpus == 0)
> +        max_cpus = libxl_get_max_cpus(ctx);
> +    if (max_cpus <= 0) {
> +        LOG(ERROR, "failed to retrieve the maximum number of cpus");
> +        rc = ERROR_FAIL;
> +        goto out;
> +    }
> +    /* This can't fail: no need to check and log */
> +    libxl_bitmap_alloc(ctx, cpumap, max_cpus);
> +
> + out:
> +    GC_FREE;
> +    return rc;
> +}
>
> which involves a lot more reworking than simply adding the GC.
>
Well, is it? All I'm doing is adding the GC and one LOG() (in v5 it's 2 LOG()s). The rest of the rework is indeed related to the GC-ification, since I can't return without calling GC_FREE anymore...

> In any
> case, I don't see why the original function couldn't be moved as is and
> then have the GC-ification added in the new location; I don't think the
> move requires the change to using the GC in any way.
>
Mmm... I'm not sure I'm fully understanding what you're trying to say.

That function is in libxl_utils.h right now (IIRC, I put it there myself :-)) because it's a trivial wrapper of libxl_bitmap_alloc(). What is happening is that, as a result of changing the error handling in libxl_get_max_cpus(), and deciding to move the logging for when it fails closer to it --which is what happens in this very patch-- I just don't think this is a trivial enough wrapper any longer.

So, my view here is: if I modify the function, adding the GC and the LOG()s, then it should also be moved; if not, it can stay where it is. What you seem to be suggesting is that I (either in this or a previous patch) move the function as is, for no apparent reason, and then (either in a following patch or this one) introduce the GC and the LOG()s... Is that the case, or am I missing something?

I see the point of not combining functional and non-functional changes, but that really looks odd to me, in this particular case. Anyway, if that's what you want, I certainly can do that... I'm not being graded against a maximum number of patches in a series, am I? ;-P

> > (Anyway, feel free to look at v5 and reply there).
>
> In general it would be better to reply to vN either up front or as you
> do the rework, so we can resolve any misunderstanding, rather than
> waiting until after you've posted vN+1 and perhaps requiring a vN+2.
>
Definitely, and I always do that. This round suffered from a combination of weird circumstances on my side. Sorry for that.

Thanks and Regards,
Dario
Ian Campbell
2013-Dec-03 11:45 UTC
Re: [PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
On Tue, 2013-12-03 at 12:40 +0100, Dario Faggioli wrote:
> So, my view here is: if I modify the function, adding the GC and the LOG()s,
> then it should also be moved; if not, it can stay where it is. What you
> seem to be suggesting is that I (either in this or a previous patch)
> move the function as is, for no apparent reason,

It's not for "no apparent reason", it is to separate code motion from reworking the function, which you would explain in your commit message. If that were considered no apparent reason, then it wouldn't be possible to split the code motion from the functional changes in the majority of cases.

Ian.
Dario Faggioli
2013-Dec-03 12:06 UTC
Re: [PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
On mar, 2013-12-03 at 11:45 +0000, Ian Campbell wrote:
> On Tue, 2013-12-03 at 12:40 +0100, Dario Faggioli wrote:
> > So, my view here is: if I modify the function, adding the GC and the LOG()s,
> > then it should also be moved; if not, it can stay where it is. What you
> > seem to be suggesting is that I (either in this or a previous patch)
> > move the function as is, for no apparent reason,
>
> It's not for "no apparent reason", it is to separate code motion from
> reworking the function, which you would explain in your commit message.
>
Ok then.

> If that were considered no apparent reason, then it wouldn't be possible
> to split the code motion from the functional changes in the majority of
> cases.
>
Right. Thanks for the explanation, and sorry for not getting what you mean/want sooner. It just differs from the way I've done things up to now, so I really wasn't sure I was understanding, rather than not wanting to do it. :-)

Will do.

Thanks again and Regards,
Dario
Ian Jackson
2013-Dec-03 17:40 UTC
Re: [PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
Dario Faggioli writes ("Re: [Xen-devel] [PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}"):
> Thanks for the explanation, and sorry for not getting what you mean/want
> sooner. It just differs from the way I've done things up to now, so I
> really wasn't sure I was understanding, rather than not wanting to do
> it. :-)

I think it would be helpful for me to expand on and motivate what Ian C has said, which I agree with.

In general, code motion makes it very hard to review substantive changes. It makes it difficult to spot even very small substantive changes.

On the other hand, it is much easier to review a patch that says in its commit message that it contains _only_ code motion and a specifically enumerated set of absolutely required changes (new function beginnings and endings, GC_INIT and GC_FREE, etc. etc. (but don't write "etc. etc." in the commit message - spell it out)).

Firstly, if you know a patch is supposed to be just code motion, it is fairly easy to check this with an ad hoc approach. And secondly, if we trust that the person who wrote the commit message is scrupulous about not introducing other stuff into it, we can take more of it on trust.

And then the non-code-motion patch is much easier to review too, because it just says "add missing LOG calls" and we can verify that that's what it is doing.

In general, IMO, splitting patches up should be done as much as possible, so long as the changes themselves are not interrelated. The reason for this is that reviewing involves understanding the commit message and then reviewing the code changes against the claimed changes from the commit message. This task is much harder when the commit message is a rag-bag of small changes.

Ian.
Ian Jackson
2013-Dec-03 17:48 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
Dario Faggioli writes ("[PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()"):
> to retrieve the actual number of pCPUs on the host.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
>  tools/libxl/libxl.h       |  3 +++
>  tools/libxl/libxl_utils.c | 18 ++++++++++++++++++
>  2 files changed, 21 insertions(+)
>
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index a9663e4..2fab5ba 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -652,6 +652,9 @@ int libxl_domain_preserve(libxl_ctx *ctx, uint32_t domid, libxl_domain_create_in
>  /* get max. number of cpus supported by hypervisor */
>  int libxl_get_max_cpus(libxl_ctx *ctx);
>
> +/* get the actual number of online cpus on the host */
> +int libxl_get_nr_cpus(libxl_ctx *ctx);

This number might be out of date as soon as it is read, won't it?

Ian.
Dario Faggioli
2013-Dec-03 17:52 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
On mar, 2013-12-03 at 17:48 +0000, Ian Jackson wrote:
> Dario Faggioli writes ("[PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()"):
> > to retrieve the actual number of pCPUs on the host.
> >
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> > ---
> >  tools/libxl/libxl.h       |  3 +++
> >  tools/libxl/libxl_utils.c | 18 ++++++++++++++++++
> >  2 files changed, 21 insertions(+)
> >
> > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> > index a9663e4..2fab5ba 100644
> > --- a/tools/libxl/libxl.h
> > +++ b/tools/libxl/libxl.h
> > @@ -652,6 +652,9 @@ int libxl_domain_preserve(libxl_ctx *ctx, uint32_t domid, libxl_domain_create_in
> >  /* get max. number of cpus supported by hypervisor */
> >  int libxl_get_max_cpus(libxl_ctx *ctx);
> >
> > +/* get the actual number of online cpus on the host */
> > +int libxl_get_nr_cpus(libxl_ctx *ctx);
>
> This number might be out of date as soon as it is read, won't it?
>
Quite possible, yes.

So, are you suggesting that we shouldn't even allow the user to read it? Or that I should mention that in the comment? (Or something else?)

Dario
Ian Jackson
2013-Dec-03 17:54 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
Dario Faggioli writes ("Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()"):
> On mar, 2013-12-03 at 17:48 +0000, Ian Jackson wrote:
> > This number might be out of date as soon as it is read, won't it ?
>
> Quite possible, yes.
>
> So, are you suggesting that we shouldn't even allow the user to read it?
> Or that I should mention that in the comment? (Or something else?)

Perhaps I didn't explain my concerns clearly enough.

I wonder what is it for ? Isn't it difficult to use correctly ?

Ian.
George Dunlap
2013-Dec-03 18:09 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
On 12/03/2013 05:54 PM, Ian Jackson wrote:
> Dario Faggioli writes ("Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()"):
> > On mar, 2013-12-03 at 17:48 +0000, Ian Jackson wrote:
> > > This number might be out of date as soon as it is read, won't it ?
> >
> > Quite possible, yes.
> >
> > So, are you suggesting that we shouldn't even allow the user to read it?
> > Or that I should mention that in the comment? (Or something else?)
>
> Perhaps I didn't explain my concerns clearly enough.
>
> I wonder what is it for ? Isn't it difficult to use correctly ?

Dario uses it in the new version of libxl_vcpu_set_affinity() to limit
what was considered an "unreachable cpu" in its warning (v5 14/17).
That is, if you set the affinity to 1111111111111111111111..., and
there are only 4 pcpus, it will return 111100000..... These don't
match, and yet there are no unreachable cpus. So it asks nr_cpus first,
then only compares bits 1..[nr_cpus-1].

I'm not sure how often that would change, but if there was a race, I
think it would just be a spurious error message (or lack thereof).

 -George
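The comparison George describes can be sketched as self-contained C. The helpers below are illustrative, not the actual libxl code; the point is that only the first nr_cpus bits of the byte-array cpumap are examined, so a map that is merely wider than the number of online pcpus never triggers the warning:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper (not the actual libxl code): test one bit of a
 * little-endian byte-array cpumap. */
static int map_test_bit(const uint8_t *map, int bit)
{
    return (map[bit / 8] >> (bit % 8)) & 1;
}

/* Compare only bits 0 .. nr_cpus-1: a cpu is "unreachable" only when it
 * was requested, exists (i.e. its index is below nr_cpus), and did not
 * end up in the effective affinity. */
static int any_unreachable_cpu(const uint8_t *requested,
                               const uint8_t *effective, int nr_cpus)
{
    for (int i = 0; i < nr_cpus; i++)
        if (map_test_bit(requested, i) && !map_test_bit(effective, i))
            return 1;   /* cpu i was asked for but not granted */
    return 0;
}
```

With George's example (a 16-bit-wide "all cpus" request on a 4-pcpu host), comparing all 16 bits would flag 12 phantom cpus, while comparing only the first nr_cpus bits correctly reports none unreachable.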
Dario Faggioli
2013-Dec-03 18:15 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
On mar, 2013-12-03 at 17:54 +0000, Ian Jackson wrote:
> Dario Faggioli writes ("Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()"):
> > On mar, 2013-12-03 at 17:48 +0000, Ian Jackson wrote:
> > > This number might be out of date as soon as it is read, won't it ?
> >
> > Quite possible, yes.
> >
> > So, are you suggesting that we shouldn't even allow the user to read it?
> > Or that I should mention that in the comment? (Or something else?)
>
> Perhaps I didn't explain my concerns clearly enough.
>
> I wonder what is it for ? Isn't it difficult to use correctly ?
>
It is indeed. In fact, it is for now used only in the dryrun_only path
of the (newly introduced by this series) `xl vcpu-pin' command and,
later in the series, for the sake of providing some LOG_DEBUG or
LOG_WARN info.

Unfortunately, there are not many alternatives for figuring out which
of the bits in a libxl_bitmap (a cpumap, in this case) are set to 0
because the pcpu in question is up and running but just not part of
the mask, and which are 0 only because, say, the map is 64 bits wide
while we only have 8 pcpus. :-/

I really don't think there are better ways to solve this at the libxl
layer and, if we want to improve the situation, we should probably
amend the handling (not the API, hopefully) of libxl_bitmap in a more
fundamental way. I'm certainly up for it, but perhaps not at this
stage in the code freeze :-/

However, again, all this drives is a dryrun_only mode and some
debug/warn output, so I really think there is no real harm. On an IRC
conversation, IanJ seems to be convinced about this too:

 Diziet: Eg if you run it while also adding a cpu ?
 dariof: Diziet: it's in the dry_run path
 dariof: Diziet: so, when it prints the cpumap that would have been
         used (if not in dryrun_only mode), like this:
         print_bitmap(cpumap.map, nb_cpu, stdout);
 dariof: Diziet: it's possible for the output to not be consistent
         with the new situation
 dariof: Diziet: basically, if it asks for the number of online cpus,
         it gets 3, and then the mask is <0111>, it will print "all"
 Diziet: I think you have convinced me that this is too intractable a
         problem to solve at this level of the API and that therefore
         the call you propose to add is fine.
 Diziet: Shall I c&p this scroll into an email with my ack ?

I cut-&-pasted this with Ian's permission, but I'd rather avoid
providing a formal "Acked-by: Ian Jackson" line myself! :-P

Ian, please :-)

Thanks and Regards,
Dario
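The print_bitmap() decision discussed in the IRC excerpt can be sketched as follows. This is an illustrative stand-in, not the real xl code: if every one of the nr_cpus online pcpus is set in the map, the whole map collapses to "all"; and if nr_cpus was read just before a cpu came online, the summary can be stale, which is exactly the race under discussion:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch (not the real xl print_bitmap()): summarise a
 * cpumap given the number of online pcpus.  Bits at index >= nr_cpus
 * are ignored, so a <0111> mask on a 3-pcpu host reads as "all". */
static const char *cpumap_summary(const uint8_t *map, int nr_cpus)
{
    for (int i = 0; i < nr_cpus; i++)
        if (!((map[i / 8] >> (i % 8)) & 1))
            return "partial";   /* at least one online pcpu not in map */
    return "all";
}
```

Note how the answer flips from "all" to "partial" for the very same map if a fourth pcpu comes online between reading nr_cpus and printing, which is why the worst outcome of the race is misleading (dry-run or debug) output rather than wrong behaviour.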
Ian Jackson
2013-Dec-03 18:16 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
Dario Faggioli writes ("Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()"):
> On an IRC conversation, IanJ seems to be convinced about this too:
...
> Diziet: Eg if you run it while also adding a cpu ?
> dariof: Diziet: it's in the dry_run path
> dariof: Diziet: so, when it prints the cpumap that would have been used (if not in dryrun_only mode), like this: print_bitmap(cpumap.map, nb_cpu, stdout);
> dariof: Diziet: it's possible for the output to not be consistent with the new situation
> dariof: Diziet: basically, if it asks for the number of online cpus, it gets 3, and then the mask is <0111>, it will print "all"
> Diziet: I think you have convinced me that this is too intractable a problem to solve at this level of the API and that therefore the call you propose to add is fine.
> Diziet: Shall I c&p this scroll into an email with my ack ?
>
> I cut-&-pasted this with Ian's permission, but I'd rather avoid
> providing a formal "Acked-by: Ian Jackson" line myself! :-P

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Thanks,
Ian.
Konrad Rzeszutek Wilk
2013-Dec-03 18:17 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
On Tue, Dec 03, 2013 at 06:09:10PM +0000, George Dunlap wrote:
> On 12/03/2013 05:54 PM, Ian Jackson wrote:
> > Perhaps I didn't explain my concerns clearly enough.
> >
> > I wonder what is it for ? Isn't it difficult to use correctly ?
>
> Dario uses it in the new version of libxl_vcpu_set_affinity() to
> limit what was considered an "unreachable cpu" in its warning. (v5
> 14/17) That is, if you set the affinity to
> 1111111111111111111111..., and there are only 4 pcpus, it will
> return 111100000..... These don't match, and yet there are no
> unreachable cpus. So it asks nr_cpus first, then only compares bits
> 1..[nr_cpus-1].

What about the inverse? 2 PCPUs and 16 VCPUs? Won't that still throw
this calculation off?

> I'm not sure how often that would change, but if there was a race, I
> think it would just be a spurious error message (or lack thereof).
>
>  -George
Dario Faggioli
2013-Dec-03 18:19 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
On mar, 2013-12-03 at 18:09 +0000, George Dunlap wrote:
> Dario uses it in the new version of libxl_vcpu_set_affinity() to limit
> what was considered an "unreachable cpu" in its warning. (v5 14/17)
> That is, if you set the affinity to 1111111111111111111111..., and
> there are only 4 pcpus, it will return 111100000..... These don't
> match, and yet there are no unreachable cpus. So it asks nr_cpus
> first, then only compares bits 1..[nr_cpus-1].
>
Yes, and also in the new dryrun_only mode I'm introducing for
`xl vcpu-pin', as print_bitmap() in xl wants to know the total number
of pcpus so that, if the mask is <11110000> but we only have 4 pcpus,
it can print "all".

> I'm not sure how often that would change, but if there was a race, I
> think it would just be a spurious error message (or lack thereof).
>
Exactly. Also, as mentioned in the other e-mail I just sent, I'm up for
trying to figure out a better way to handle this as soon as 4.5 begins.

Thanks and Regards,
Dario
George Dunlap
2013-Dec-03 18:22 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
On Tue, Dec 3, 2013 at 6:17 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Tue, Dec 03, 2013 at 06:09:10PM +0000, George Dunlap wrote:
> > Dario uses it in the new version of libxl_vcpu_set_affinity() to
> > limit what was considered an "unreachable cpu" in its warning. (v5
> > 14/17) That is, if you set the affinity to
> > 1111111111111111111111..., and there are only 4 pcpus, it will
> > return 111100000..... These don't match, and yet there are no
> > unreachable cpus. So it asks nr_cpus first, then only compares bits
> > 1..[nr_cpus-1].
>
> What about the inverse? 2 PCPUs and 16 VCPUs? Won't that still throw
> this calculation off?

Both things being compared are pcpus.

 -George
Dario Faggioli
2013-Dec-03 18:26 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
On mar, 2013-12-03 at 13:17 -0500, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 03, 2013 at 06:09:10PM +0000, George Dunlap wrote:
> > Dario uses it in the new version of libxl_vcpu_set_affinity() to
> > limit what was considered an "unreachable cpu" in its warning. (v5
> > 14/17) That is, if you set the affinity to
> > 1111111111111111111111..., and there are only 4 pcpus, it will
> > return 111100000..... These don't match, and yet there are no
> > unreachable cpus. So it asks nr_cpus first, then only compares bits
> > 1..[nr_cpus-1].
>
> What about the inverse? 2 PCPUs and 16 VCPUs? Won't that still throw
> this calculation off?
>
That's not how it works: it is not the number of vcpus against the
number of pcpus that matters. As long as your 16 vcpus have affinity
with some _real_, _existing_ and _online_ pcpus, that is fine, and the
situation you mention won't cause any problem.

OTOH, if you offline any of the pcpus in the process, then no matter
the pcpus-vs-vcpus ratio, you risk getting a spurious warning (and
only that).

Regards,
Dario