Dario Faggioli
2013-Jul-11 16:10 UTC
[PATCH] xl: allow for node-wise specification of vcpu pinning
Making it possible to use something like the following: * "nodes:0-3": all pCPUs of nodes 0,1,2,3; * "nodes:0-3,^node:2": all pCPUS of nodes 0,1,3; * "1,nodes:1-2,^6": pCPU 1 plus all pCPUs of nodes 1,2 but not pCPU 6; * ... In both domain config file and `xl vcpu-pin''. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@citrix.com> Cc: George Dunlap <george.dunlap@eu.citrix.com> --- docs/man/xl.cfg.pod.5 | 20 ++++++++ tools/libxl/xl_cmdimpl.c | 114 +++++++++++++++++++++++++++++++--------------- 2 files changed, 96 insertions(+), 38 deletions(-) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index 069b73f..90ccbaf 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -125,6 +125,26 @@ run on cpu #3 of the host. =back +A C<CPU-LIST> may also be specified NUMA node-wise as follows: + +=over 4 + +=item "nodes:all" + +To allow all the vcpus of the guest to run on all the cpus of all the NUMA +nodes of the host. + +=item "nodes:0-3,node:^2" + +To allow all the vcpus of the guest to run on the cpus belonging to +the NUMA nodes 0,1,3 of the host. + +=back + +Combining the two is allowed. For instance, "1,node:2,^6" means all the +vcpus of the guest will run on cpus 1 and on all the cpus of NUMA node 2, +but not on cpu 6. + If this option is not specified, libxl automatically tries to place the new domain on the host''s NUMA nodes (provided the host has more than one NUMA node) by pinning it to the cpus of those nodes. A heuristic approach is diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 8a478ba..33e50b6 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -481,61 +481,99 @@ static void split_string_into_string_list(const char *str, free(s); } +static int range_parse_bitmap(const char *str, libxl_bitmap *map) +{ + char *nstr, *endptr; + uint32_t ida, idb; + + ida = idb = strtoul(str, &endptr, 10); + if (endptr == str) + return EINVAL; + + if (*endptr == ''-'') { + nstr = endptr + 1; + idb = strtoul(nstr, &endptr, 10); + if (endptr == nstr) + return EINVAL; + } + + libxl_bitmap_set_none(map); + while (ida <= idb) { + libxl_bitmap_set(map, ida); + ida++; + } + + return 0; +} + static int vcpupin_parse(char *cpu, libxl_bitmap *cpumap) { - libxl_bitmap exclude_cpumap; - uint32_t cpuida, cpuidb; - char *endptr, *toka, *tokb, *saveptr = NULL; - int i, rc = 0, rmcpu; + libxl_bitmap map, cpu_nodemap, *this_map; + char *ptr, *saveptr = NULL; + bool isnot, isnode; + int i, rc = 0; - if (!strcmp(cpu, "all")) { + if (!strcmp(cpu, "all") || !strcmp(cpu, "nodes:all")) { libxl_bitmap_set_any(cpumap); return 0; } - if (libxl_cpu_bitmap_alloc(ctx, &exclude_cpumap, 0)) { - fprintf(stderr, "Error: Failed to allocate cpumap.\n"); - return ENOMEM; + libxl_bitmap_init(&map); + libxl_bitmap_init(&cpu_nodemap); + + rc = libxl_node_bitmap_alloc(ctx, &cpu_nodemap, 0); + if (rc) { + fprintf(stderr, "libxl_node_bitmap_alloc failed.\n"); + goto out; } + rc = libxl_cpu_bitmap_alloc(ctx, &map, 0); + if (rc) { + fprintf(stderr, "libxl_cpu_bitmap_alloc failed.\n"); + goto out; + } + + for (ptr = strtok_r(cpu, ",", &saveptr); ptr; + ptr = strtok_r(NULL, ",", &saveptr)) { + isnot = isnode = false; - for (toka = strtok_r(cpu, ",", &saveptr); toka; - toka = strtok_r(NULL, ",", &saveptr)) { - rmcpu = 0; - if (*toka == ''^'') { - /* This (These) Cpu(s) will be removed from the map */ - toka++; - rmcpu = 1; + /* Are we dealing with cpus or nodes? */ + if (!strncmp(ptr, "node:", 5) || !strncmp(ptr, "nodes:", 6)) { + isnode = true; + ptr += 5 + (ptr[4] == ''s''); } - /* Extract a valid (range of) cpu(s) */ - cpuida = cpuidb = strtoul(toka, &endptr, 10); - if (endptr == toka) { + /* Are we adding or removing cpus/nodes? */ + if (*ptr == ''^'') { + isnot = true; + ptr++; + } + /* Get in map a bitmap representative of the range */ + if (range_parse_bitmap(ptr, &map)) { fprintf(stderr, "Error: Invalid argument.\n"); rc = EINVAL; - goto vcpp_out; - } - if (*endptr == ''-'') { - tokb = endptr + 1; - cpuidb = strtoul(tokb, &endptr, 10); - if (endptr == tokb || cpuida > cpuidb) { - fprintf(stderr, "Error: Invalid argument.\n"); - rc = EINVAL; - goto vcpp_out; - } + goto out; } - while (cpuida <= cpuidb) { - rmcpu == 0 ? libxl_bitmap_set(cpumap, cpuida) : - libxl_bitmap_set(&exclude_cpumap, cpuida); - cpuida++; + + /* Add or remove the specified cpus */ + if (isnode) { + rc = libxl_nodemap_to_cpumap(ctx, &map, &cpu_nodemap); + if (rc) { + fprintf(stderr, "libxl_nodemap_to_cpumap failed.\n"); + goto out; + } + this_map = &cpu_nodemap; + } else { + this_map = ↦ } - } - /* Clear all the cpus from the removal list */ - libxl_for_each_set_bit(i, exclude_cpumap) { - libxl_bitmap_reset(cpumap, i); + libxl_for_each_set_bit(i, *this_map) { + isnot ? libxl_bitmap_reset(cpumap, i) + : libxl_bitmap_set(cpumap, i); + } } -vcpp_out: - libxl_bitmap_dispose(&exclude_cpumap); + out: + libxl_bitmap_dispose(&map); + libxl_bitmap_dispose(&cpu_nodemap); return rc; }
Dario Faggioli
2013-Jul-11 16:15 UTC
Re: [PATCH] xl: allow for node-wise specification of vcpu pinning
On gio, 2013-07-11 at 18:10 +0200, Dario Faggioli wrote:> Making it possible to use something like the following: > * "nodes:0-3": all pCPUs of nodes 0,1,2,3; > * "nodes:0-3,^node:2": all pCPUS of nodes 0,1,3; > * "1,nodes:1-2,^6": pCPU 1 plus all pCPUs of nodes 1,2 > but not pCPU 6; > * ... > > In both domain config file and `xl vcpu-pin''. >And just allow me to clarify that this has nothing to do with NUMA node affinity, neither with the specification, nor with the behavior... I am working on per-vcpu node affinity (among all the other NUMA stuff, of course), and I will submit it soon... It just is not this, ok? :-D Here, I''m just introducing a new, and possibly more comfortable, way of specifying vCPU to pCPU _pinning_. Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Jackson
2013-Jul-12 16:51 UTC
Re: [PATCH] xl: allow for node-wise specification of vcpu pinning
Dario Faggioli writes ("[PATCH] xl: allow for node-wise specification of vcpu pinning"):> Making it possible to use something like the following: > * "nodes:0-3": all pCPUs of nodes 0,1,2,3; > * "nodes:0-3,^node:2": all pCPUS of nodes 0,1,3; > * "1,nodes:1-2,^6": pCPU 1 plus all pCPUs of nodes 1,2 > but not pCPU 6; > * ...I have some comments about this. I think the idea is fine, and the syntax is not bad. The documentation needs to explain the semantics more formally. The parsing code is starting to be rather impenetrable. Have you considered parsing this with flex or maybe doing this with regexps or something ? If you do decide to keep it the way it is:> + ida = idb = strtoul(str, &endptr, 10); > + if (endptr == str) > + return EINVAL;You need to check for overflow, too, everywhere you call strtoul. It may be easiest to wrap strtoul. What if "long" is bigger than uint32_t ? I think this can silently overflow. (Although perhaps people who write crazy strings of hex asking for vcpu 2^40 or something deserve to silently lose.)> - if (!strcmp(cpu, "all")) { > + if (!strcmp(cpu, "all") || !strcmp(cpu, "nodes:all")) { > libxl_bitmap_set_any(cpumap); > return 0;With your new "not" syntax, this needs to become not a special case. Someone might write all,^7 and it should work. I think your current code changes the meaning of ^7 from "not pcpu 7" to "empty set of pcpus", which is wrong.> + /* Are we dealing with cpus or nodes? */ > + if (!strncmp(ptr, "node:", 5) || !strncmp(ptr, "nodes:", 6)) { > + isnode = true; > + ptr += 5 + (ptr[4] == ''s'');Cripes. Please, no. None of these magic 4s and 5s and 6s, and the ptr[4]==''s'' thing is truly awful. How about some macro STR_SKIP_PREFIX(), and two separate ifs, I think this is starting to be complex enough that it ought to have some in-tree test vectors. Ian.
Dario Faggioli
2013-Aug-01 10:59 UTC
Re: [PATCH] xl: allow for node-wise specification of vcpu pinning
On ven, 2013-07-12 at 17:51 +0100, Ian Jackson wrote:> Dario Faggioli writes ("[PATCH] xl: allow for node-wise specification of vcpu pinning"): > > Making it possible to use something like the following: > > * "nodes:0-3": all pCPUs of nodes 0,1,2,3; > > * "nodes:0-3,^node:2": all pCPUS of nodes 0,1,3; > > * "1,nodes:1-2,^6": pCPU 1 plus all pCPUs of nodes 1,2 > > but not pCPU 6; > > * ... > > I have some comments about this. >Ok, thanks for reviewing. Also, sorry for being so late in replying, but I was travelling and, after that, busy enough that I could not come back to this before now. :-/> I think the idea is fine, and the syntax is not bad. >Ok, that''s already something. :-)> The documentation needs to explain the semantics more formally. >I''ll try to improve it. What was it that you think was not clear enough?> The parsing code is starting to be rather impenetrable. Have you > considered parsing this with flex or maybe doing this with regexps or > something ? >HeHe, actually I haven''t, for the very simple reason that I know nothing of flex. :-) That means, for me, turning it to flex include the overhead of learning what flex is and how to use it, which does not look worthwhile for "just" this change. Also, in this same patch I''m breaking down the parsing function, improving its readability (or so I think), and I''d rather try to push a bit more on this direction, rather than introducing flex. Honestly, I also think that this isn''t that bad, but of course my opinion is not at all authoritative, since I really have not idea how it would look like with flex. So, I think I''m sending this back to you, as a maintainer. :-) My view is that it is fine as it is, but I will conform with what _you_ think it''s best.> If you do decide to keep it the way it is: > > > + ida = idb = strtoul(str, &endptr, 10); > > + if (endptr == str) > > + return EINVAL; > > You need to check for overflow, too, everywhere you call strtoul. It > may be easiest to wrap strtoul. >I see what you mean. This is pretty much code motion, from the point of view of this patch, but that does not make it less worthwhile. What I''m thinking about is that there are other uses of strtol(), most of them assigning the returned result to an int, so I''m not sure how to wrap it... What the wrapper should return?> What if "long" is bigger than uint32_t ? I think this can silently > overflow. (Although perhaps people who write crazy strings of hex > asking for vcpu 2^40 or something deserve to silently lose.) >What about I change ida and idb to long? I checked and it looks like the worst that can happen, if we really get some crazy big number, is that libxl_bitmap_set() turns into a nop, and no range at all is set (which, as you say, is probably what those crazy people deserves). Thoughts?> > - if (!strcmp(cpu, "all")) { > > + if (!strcmp(cpu, "all") || !strcmp(cpu, "nodes:all")) { > > libxl_bitmap_set_any(cpumap); > > return 0; > > With your new "not" syntax, this needs to become not a special case. > Someone might write > all,^7 > and it should work. >Oh, I like that. will do.> I think your current code changes the meaning of > ^7 > from "not pcpu 7" to "empty set of pcpus", which is wrong. >Mmm... you mean if one only specifies "^7"? That does not work even right now (because it already translates to an "empty set of pcups", I just tried). Or was it something else that you were saying? If you''re saying that specifying just "^7" should mean "all,^7", I think I agree, and I can do that.> > + /* Are we dealing with cpus or nodes? */ > > + if (!strncmp(ptr, "node:", 5) || !strncmp(ptr, "nodes:", 6)) { > > + isnode = true; > > + ptr += 5 + (ptr[4] == ''s''); > > Cripes. Please, no. None of these magic 4s and 5s and 6s, and the > ptr[4]==''s'' thing is truly awful. > > How about some macro STR_SKIP_PREFIX(), and two separate ifs, >Another nice idea. I''ll go for it.> I think this is starting to be complex enough that it ought to have > some in-tree test vectors. >Oh, you mean like the one we have in check-xl-disk-parse, right? Ok, it makes sense, I will look into this too. Thanks again and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel