Hello,

Third version of the NUMA placement series for Xen 4.2. All the comments received during v2's review have been addressed (more details in the individual changelogs). The most notable changes are the following:

 - the libxl_cpumap --> libxl_bitmap renaming has been rebased on top of the recent patches that allow us to allocate bitmaps of different sizes;
 - the heuristics for deciding which NUMA placement is the best one have been redesigned, so that they now provide a total ordering.

Here is what this posting contains (* = acked during the previous round):

 * [PATCH 01 of 10 v3] libxl: add a new Array type to the IDL
   [PATCH 02 of 10 v3] libxl,libxc: introduce libxl_get_numainfo()
 * [PATCH 03 of 10 v3] xl: add more NUMA information to `xl info -n'
   [PATCH 04 of 10 v3] libxl: rename libxl_cpumap to libxl_bitmap
   [PATCH 05 of 10 v3] libxl: expand the libxl_bitmap API a bit
 * [PATCH 06 of 10 v3] libxl: introduce some node map helpers
   [PATCH 07 of 10 v3] libxl: explicitly check for libmath in autoconf

is where data structures, utility functions and infrastructure are introduced.

 * [PATCH 08 of 10 v3] libxl: enable automatic placement of guests on NUMA nodes
 * [PATCH 09 of 10 v3] libxl: have NUMA placement deal with cpupools

host the core of the mechanism.

 * [PATCH 10 of 10 v3] Some automatic NUMA placement documentation

provides some more documentation.

Thanks a lot and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Dario Faggioli
2012-Jul-04 16:02 UTC
[PATCH 01 of 10 v3] libxl: add a new Array type to the IDL
# HG changeset patch # User Ian Campbell <ian.campbell@citrix.com> # Date 1341416322 -7200 # Node ID 8e367818e194c212cd1470aad663f3243ff53bdb # Parent 42f76d536b116d2ebad1b6705ae51ecd171d2581 libxl: add a new Array type to the IDL And make all the required infrastructure updates to enable this. Since there are currently no uses of this type there is no change to the generated code. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> --- Changes from v2: * v2''s patch replaced by what Ian Campbell posted as <03b641aa89f979a1670b.1340791844@cosworth.uk.xensource.com>, as agreed during the review process of that patch. diff --git a/tools/libxl/gentest.py b/tools/libxl/gentest.py --- a/tools/libxl/gentest.py +++ b/tools/libxl/gentest.py @@ -27,6 +27,18 @@ def gen_rand_init(ty, v, indent = " " s = "" if isinstance(ty, idl.Enumeration): s += "%s = %s;\n" % (ty.pass_arg(v, parent is None), randomize_enum(ty)) + elif isinstance(ty, idl.Array): + if parent is None: + raise Exception("Array type must have a parent") + s += "%s = rand()%%8;\n" % (parent + ty.lenvar.name) + s += "%s = calloc(%s, sizeof(*%s));\n" % \ + (v, parent + ty.lenvar.name, v) + s += "{\n" + s += " int i;\n" + s += " for (i=0; i<%s; i++)\n" % (parent + ty.lenvar.name) + s += gen_rand_init(ty.elem_type, v+"[i]", + indent + " ", parent) + s += "}\n" elif isinstance(ty, idl.KeyedUnion): if parent is None: raise Exception("KeyedUnion type must have a parent") diff --git a/tools/libxl/gentypes.py b/tools/libxl/gentypes.py --- a/tools/libxl/gentypes.py +++ b/tools/libxl/gentypes.py @@ -11,8 +11,12 @@ def libxl_C_instance_of(ty, instancename return libxl_C_type_define(ty) else: return libxl_C_type_define(ty) + " " + instancename - else: - return ty.typename + " " + instancename + + s = "" + if isinstance(ty, idl.Array): + s += libxl_C_instance_of(ty.lenvar.type, ty.lenvar.name) + ";\n" + + return s + ty.typename + " " + instancename def libxl_C_type_define(ty, indent = ""): s = "" @@ -66,6 +70,21 @@ def libxl_C_type_dispose(ty, v, indent s += libxl_C_type_dispose(f.type, fexpr, indent + " ", nparent) s += " break;\n" s += "}\n" + elif isinstance(ty, idl.Array): + if parent is None: + raise Exception("Array type must have a parent") + if ty.elem_type.dispose_fn is not None: + s += "{\n" + s += " int i;\n" + s += " for (i=0; i<%s; i++)\n" % (parent + ty.lenvar.name) + s += libxl_C_type_dispose(ty.elem_type, v+"[i]", + indent + " ", parent) + if ty.dispose_fn is not None: + if ty.elem_type.dispose_fn is not None: + s += " " + s += "%s(%s);\n" % (ty.dispose_fn, ty.pass_arg(v, parent is None)) + if ty.elem_type.dispose_fn is not None: + s += "}\n" elif isinstance(ty, idl.Struct) and (parent is None or ty.dispose_fn is None): for f in [f for f in ty.fields if not f.const]: (nparent,fexpr) = ty.member(v, f, parent is None) @@ -164,7 +183,24 @@ def libxl_C_type_gen_json(ty, v, indent s = "" if parent is None: s += "yajl_gen_status s;\n" - if isinstance(ty, idl.Enumeration): + + if isinstance(ty, idl.Array): + if parent is None: + raise Exception("Array type must have a parent") + s += "{\n" + s += " int i;\n" + s += " s = yajl_gen_array_open(hand);\n" + s += " if (s != yajl_gen_status_ok)\n" + s += " goto out;\n" + s += " for (i=0; i<%s; i++) {\n" % (parent + ty.lenvar.name) + s += libxl_C_type_gen_json(ty.elem_type, v+"[i]", + indent + " ", parent) + s += " }\n" + s += " s = yajl_gen_array_close(hand);\n" + s += " if (s != 
yajl_gen_status_ok)\n" + s += " goto out;\n" + s += "}\n" + elif isinstance(ty, idl.Enumeration): s += "s = libxl__yajl_gen_enum(hand, %s_to_string(%s));\n" % (ty.typename, ty.pass_arg(v, parent is None)) s += "if (s != yajl_gen_status_ok)\n" s += " goto out;\n" diff --git a/tools/libxl/idl.py b/tools/libxl/idl.py --- a/tools/libxl/idl.py +++ b/tools/libxl/idl.py @@ -266,6 +266,17 @@ string = Builtin("char *", namespace = N json_fn = "libxl__string_gen_json", autogenerate_json = False) +class Array(Type): + """An array of the same type""" + def __init__(self, elem_type, lenvar_name, **kwargs): + kwargs.setdefault(''dispose_fn'', ''free'') + Type.__init__(self, namespace=elem_type.namespace, typename=elem_type.rawname + " *", **kwargs) + + lv_kwargs = dict([(x.lstrip(''lenvar_''),y) for (x,y) in kwargs.items() if x.startswith(''lenvar_'')]) + + self.lenvar = Field(integer, lenvar_name, **lv_kwargs) + self.elem_type = elem_type + class OrderedDict(dict): """A dictionary which remembers insertion order. diff --git a/tools/libxl/idl.txt b/tools/libxl/idl.txt --- a/tools/libxl/idl.txt +++ b/tools/libxl/idl.txt @@ -145,11 +145,24 @@ idl.KeyedUnion A subclass of idl.Aggregate which represents the C union type where the currently valid member of the union can be determined based - upon another member in the containing type. + upon another member in the containing type. An idl.KeyedUnion must + always be a member of a containing idl.Aggregate type. - The KeyedUnion.keyvar contains an idl.type the member of the - containing type which determines the valid member of the union. The - must be an instance of the Enumeration type. + The KeyedUnion.keyvar contains an idl.Field, this is the member of + the containing type which determines the valid member of the + union. The idl.Field.type of the keyvar must be an Enumeration type. + +idl.Array + + A class representing an array of similar elements. An idl.Array must + always be an idl.Field of a containing idl.Aggregate. + + idl.Array.elem_type contains an idl.Type which is the type of each + element of the array. + + idl.Array.len_var contains an idl.Field which is added to the parent + idl.Aggregate and will contain the length of the array. The field + MUST be named num_ARRAYNAME. 
Standard Types -------------- diff --git a/tools/ocaml/libs/xl/genwrap.py b/tools/ocaml/libs/xl/genwrap.py --- a/tools/ocaml/libs/xl/genwrap.py +++ b/tools/ocaml/libs/xl/genwrap.py @@ -55,7 +55,8 @@ def ocaml_type_of(ty): return "int%d" % ty.width else: return "int" - + elif isinstance(ty,idl.Array): + return "%s array" % ocaml_type_of(ty.elem_type) elif isinstance(ty,idl.Builtin): if not builtins.has_key(ty.typename): raise NotImplementedError("Unknown Builtin %s (%s)" % (ty.typename, type(ty))) @@ -138,6 +139,8 @@ def c_val(ty, c, o, indent="", parent = if not fn: raise NotImplementedError("No c_val fn for Builtin %s (%s)" % (ty.typename, type(ty))) s += "%s;" % (fn % { "o": o, "c": c }) + elif isinstance (ty,idl.Array): + raise("Cannot handle Array type\n") elif isinstance(ty,idl.Enumeration) and (parent is None): n = 0 s += "switch(Int_val(%s)) {\n" % o @@ -195,6 +198,16 @@ def ocaml_Val(ty, o, c, indent="", paren if not fn: raise NotImplementedError("No ocaml Val fn for Builtin %s (%s)" % (ty.typename, type(ty))) s += "%s = %s;" % (o, fn % { "c": c }) + elif isinstance(ty, idl.Array): + s += "{\n" + s += "\t int i;\n" + s += "\t value array_elem;\n" + s += "\t %s = caml_alloc(%s,0);\n" % (o, parent + ty.lenvar.name) + s += "\t for(i=0; i<%s; i++) {\n" % (parent + ty.lenvar.name) + s += "\t %s\n" % ocaml_Val(ty.elem_type, "array_elem", c + "[i]", "") + s += "\t Store_field(%s, i, array_elem);\n" % o + s += "\t }\n" + s += "\t}" elif isinstance(ty,idl.Enumeration) and (parent is None): n = 0 s += "switch(%s) {\n" % c diff --git a/tools/python/genwrap.py b/tools/python/genwrap.py --- a/tools/python/genwrap.py +++ b/tools/python/genwrap.py @@ -4,7 +4,7 @@ import sys,os import idl -(TYPE_DEFBOOL, TYPE_BOOL, TYPE_INT, TYPE_UINT, TYPE_STRING, TYPE_AGGREGATE) = range(6) +(TYPE_DEFBOOL, TYPE_BOOL, TYPE_INT, TYPE_UINT, TYPE_STRING, TYPE_ARRAY, TYPE_AGGREGATE) = range(7) def py_type(ty): if ty == idl.bool: @@ -18,6 +18,8 @@ def py_type(ty): return TYPE_INT else: return TYPE_UINT + if isinstance(ty, idl.Array): + return TYPE_ARRAY if isinstance(ty, idl.Aggregate): return TYPE_AGGREGATE if ty == idl.string: @@ -74,7 +76,7 @@ def py_attrib_get(ty, f): l.append('' return genwrap__ull_get(self->obj.%s);''%f.name) elif t == TYPE_STRING: l.append('' return genwrap__string_get(&self->obj.%s);''%f.name) - elif t == TYPE_AGGREGATE: + elif t == TYPE_AGGREGATE or t == TYPE_ARRAY: l.append('' PyErr_SetString(PyExc_NotImplementedError, "Getting %s");''%ty.typename) l.append('' return NULL;'') else: @@ -105,7 +107,7 @@ def py_attrib_set(ty, f): l.append('' return ret;'') elif t == TYPE_STRING: l.append('' return genwrap__string_set(v, &self->obj.%s);''%f.name) - elif t == TYPE_AGGREGATE: + elif t == TYPE_AGGREGATE or t == TYPE_ARRAY: l.append('' PyErr_SetString(PyExc_NotImplementedError, "Setting %s");''%ty.typename) l.append('' return -1;'') else:
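[Editor's note, purely illustrative and NOT part of the patch.] To make the new IDL type a bit more concrete: per the libxl_C_instance_of() hunk above, the generator emits the length variable right before the array pointer, and the default dispose function is free(). A hypothetical field declared with Array would therefore expand into the generated C header roughly as sketched here (struct name libxl_foo is made up; see patch 02 for a real user):

  #include <stdint.h>

  /* IDL declaration assumed for this sketch:
   *     ("dists", Array(uint32, "num_dists"))
   * which the generator would expand, roughly, into: */
  typedef struct libxl_foo {    /* hypothetical containing struct */
      int num_dists;            /* lenvar: MUST be named num_ARRAYNAME */
      uint32_t *dists;          /* elem_type pointer; the generated dispose
                                 * function free()s it */
  } libxl_foo;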
Dario Faggioli
2012-Jul-04 16:02 UTC
[PATCH 02 of 10 v3] libxl, libxc: introduce libxl_get_numainfo()
# HG changeset patch # User Dario Faggioli <raistlin@linux.it> # Date 1341416323 -7200 # Node ID 0ca91a203fc95d3d18bb436ecdc7106b0b2ff22f # Parent 8e367818e194c212cd1470aad663f3243ff53bdb libxl,libxc: introduce libxl_get_numainfo() Make some NUMA node information available to the toolstack. Achieve this by means of xc_numainfo(), which exposes memory size and amount of free memory of each node, as well as the relative distances of each node to all the others. For properly exposing distances we need the IDL to support arrays. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- Changes from v2: * converted libxl__zalloc(NULL, ...) to libxl_calloc(NOGC, ...). * Fixed the comment about memory ownership of libxl_get_numainfo(). * Added a comment for libxl_numainfo in libxl_types.idl. Changes from v1: * malloc converted to libxl__zalloc(NOGC, ...). * The patch also accommodates some bits of what was in "libxc, libxl: introduce xc_nodemap_t and libxl_nodemap" which was removed as well, as full support for node maps at libxc level is not needed (yet!). diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c --- a/tools/libxc/xc_misc.c +++ b/tools/libxc/xc_misc.c @@ -35,6 +35,20 @@ int xc_get_max_cpus(xc_interface *xch) return max_cpus; } +int xc_get_max_nodes(xc_interface *xch) +{ + static int max_nodes = 0; + xc_physinfo_t physinfo; + + if ( max_nodes ) + return max_nodes; + + if ( !xc_physinfo(xch, &physinfo) ) + max_nodes = physinfo.max_node_id + 1; + + return max_nodes; +} + int xc_get_cpumap_size(xc_interface *xch) { return (xc_get_max_cpus(xch) + 7) / 8; diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h --- a/tools/libxc/xenctrl.h +++ b/tools/libxc/xenctrl.h @@ -329,6 +329,12 @@ int xc_get_cpumap_size(xc_interface *xch /* allocate a cpumap */ xc_cpumap_t xc_cpumap_alloc(xc_interface *xch); + /* + * NODEMAP handling + */ +/* return maximum number of NUMA nodes the hypervisor supports */ +int xc_get_max_nodes(xc_interface *xch); + /* * DOMAIN DEBUGGING FUNCTIONS */ diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -3298,6 +3298,75 @@ fail: return ret; } +libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr) +{ + GC_INIT(ctx); + xc_numainfo_t ninfo; + DECLARE_HYPERCALL_BUFFER(xc_node_to_memsize_t, memsize); + DECLARE_HYPERCALL_BUFFER(xc_node_to_memfree_t, memfree); + DECLARE_HYPERCALL_BUFFER(uint32_t, node_dists); + libxl_numainfo *ret = NULL; + int i, j, max_nodes; + + max_nodes = libxl_get_max_nodes(ctx); + if (max_nodes == 0) + { + LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of NODES"); + ret = NULL; + goto out; + } + + memsize = xc_hypercall_buffer_alloc + (ctx->xch, memsize, sizeof(*memsize) * max_nodes); + memfree = xc_hypercall_buffer_alloc + (ctx->xch, memfree, sizeof(*memfree) * max_nodes); + node_dists = xc_hypercall_buffer_alloc + (ctx->xch, node_dists, sizeof(*node_dists) * max_nodes * max_nodes); + if ((memsize == NULL) || (memfree == NULL) || (node_dists == NULL)) { + LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM, + "Unable to allocate hypercall arguments"); + goto fail; + } + + set_xen_guest_handle(ninfo.node_to_memsize, memsize); + set_xen_guest_handle(ninfo.node_to_memfree, memfree); + set_xen_guest_handle(ninfo.node_to_node_distance, node_dists); + ninfo.max_node_index = max_nodes - 1; + if (xc_numainfo(ctx->xch, &ninfo) != 0) { + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting numainfo"); + goto fail; + } + + if (ninfo.max_node_index < max_nodes - 1) + max_nodes = 
ninfo.max_node_index + 1; + + *nr = max_nodes; + + ret = libxl__zalloc(NOGC, sizeof(libxl_numainfo) * max_nodes); + for (i = 0; i < max_nodes; i++) + ret[i].dists = libxl__calloc(NOGC, max_nodes, sizeof(*node_dists)); + + for (i = 0; i < max_nodes; i++) { +#define V(mem, i) (mem[i] == INVALID_NUMAINFO_ID) ? \ + LIBXL_NUMAINFO_INVALID_ENTRY : mem[i] + ret[i].size = V(memsize, i); + ret[i].free = V(memfree, i); + ret[i].num_dists = max_nodes; + for (j = 0; j < ret[i].num_dists; j++) + ret[i].dists[j] = V(node_dists, i * max_nodes + j); +#undef V + } + + fail: + xc_hypercall_buffer_free(ctx->xch, memsize); + xc_hypercall_buffer_free(ctx->xch, memfree); + xc_hypercall_buffer_free(ctx->xch, node_dists); + + out: + GC_FREE; + return ret; +} + const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx) { union { diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -532,6 +532,9 @@ int libxl_domain_preserve(libxl_ctx *ctx /* get max. number of cpus supported by hypervisor */ int libxl_get_max_cpus(libxl_ctx *ctx); +/* get max. number of NUMA nodes supported by hypervisor */ +int libxl_get_max_nodes(libxl_ctx *ctx); + int libxl_domain_rename(libxl_ctx *ctx, uint32_t domid, const char *old_name, const char *new_name); @@ -604,6 +607,10 @@ void libxl_vminfo_list_free(libxl_vminfo libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out); void libxl_cputopology_list_free(libxl_cputopology *, int nb_cpu); +#define LIBXL_NUMAINFO_INVALID_ENTRY (~(uint32_t)0) +libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr); +void libxl_numainfo_list_free(libxl_numainfo *, int nr); + libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, int *nb_vcpu, int *nr_vcpus_out); void libxl_vcpuinfo_list_free(libxl_vcpuinfo *, int nr_vcpus); diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -433,6 +433,15 @@ libxl_physinfo = Struct("physinfo", [ ("cap_hvm_directio", bool), ], dir=DIR_OUT) +# NUMA node characteristics: size and free are how much memory it has, and how +# much of it is free, respectively. dists is an array of distances from this +# node to each other node. 
+libxl_numainfo = Struct("numainfo", [ + ("size", uint64), + ("free", uint64), + ("dists", Array(uint32, "num_dists")), + ], dir=DIR_OUT) + libxl_cputopology = Struct("cputopology", [ ("core", uint32), ("socket", uint32), diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c --- a/tools/libxl/libxl_utils.c +++ b/tools/libxl/libxl_utils.c @@ -572,6 +572,11 @@ int libxl_get_max_cpus(libxl_ctx *ctx) return xc_get_max_cpus(ctx->xch); } +int libxl_get_max_nodes(libxl_ctx *ctx) +{ + return xc_get_max_nodes(ctx->xch); +} + int libxl__enum_from_string(const libxl_enum_string_table *t, const char *s, int *e) { @@ -594,6 +599,14 @@ void libxl_cputopology_list_free(libxl_c free(list); } +void libxl_numainfo_list_free(libxl_numainfo *list, int nr) +{ + int i; + for (i = 0; i < nr; i++) + libxl_numainfo_dispose(&list[i]); + free(list); +} + void libxl_vcpuinfo_list_free(libxl_vcpuinfo *list, int nr) { int i; diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -484,6 +484,7 @@ typedef struct xen_sysctl_topologyinfo x DEFINE_XEN_GUEST_HANDLE(xen_sysctl_topologyinfo_t); /* XEN_SYSCTL_numainfo */ +#define INVALID_NUMAINFO_ID (~0U) struct xen_sysctl_numainfo { /* * IN: maximum addressable entry in the caller-provided arrays.
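[Editor's note.] A minimal usage sketch, NOT part of the patch, showing how a libxl caller could consume the new API. It uses only what the patch introduces (libxl_get_numainfo(), libxl_numainfo_list_free(), LIBXL_NUMAINFO_INVALID_ENTRY and the libxl_numainfo fields) and assumes a libxl_ctx has already been set up by the caller:

  #include <stdio.h>
  #include <inttypes.h>
  #include <libxl.h>

  static void print_numainfo(libxl_ctx *ctx)
  {
      libxl_numainfo *info;
      int i, j, nr;

      info = libxl_get_numainfo(ctx, &nr);
      if (info == NULL)
          return;

      for (i = 0; i < nr; i++) {
          /* skip entries the hypervisor marks as invalid */
          if (info[i].size == LIBXL_NUMAINFO_INVALID_ENTRY)
              continue;
          printf("node %d: size=%"PRIu64" free=%"PRIu64" dists:",
                 i, info[i].size, info[i].free);
          for (j = 0; j < info[i].num_dists; j++)
              printf(" %"PRIu32, info[i].dists[j]);
          printf("\n");
      }

      /* the list is owned by the caller: free it with the provided helper */
      libxl_numainfo_list_free(info, nr);
  }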
Dario Faggioli
2012-Jul-04 16:02 UTC
[PATCH 03 of 10 v3] xl: add more NUMA information to `xl info -n'
# HG changeset patch # User Dario Faggioli <raistlin@linux.it> # Date 1341416323 -7200 # Node ID f1227d5a82e56d10e302aec4c3717d281718a349 # Parent 0ca91a203fc95d3d18bb436ecdc7106b0b2ff22f xl: add more NUMA information to `xl info -n'' So that the user knows how much memory there is on each node and how far they are from each others. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> --- Changes from v1: * integer division replaced by right shift. diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -4249,6 +4249,36 @@ static void output_physinfo(void) return; } +static void output_numainfo(void) +{ + libxl_numainfo *info; + int i, j, nr; + + info = libxl_get_numainfo(ctx, &nr); + if (info == NULL) { + fprintf(stderr, "libxl_get_numainfo failed.\n"); + return; + } + + printf("numa_info :\n"); + printf("node: memsize memfree distances\n"); + + for (i = 0; i < nr; i++) { + if (info[i].size != LIBXL_NUMAINFO_INVALID_ENTRY) { + printf("%4d: %6"PRIu64" %6"PRIu64" %d", i, + info[i].size >> 20, info[i].free >> 20, + info[i].dists[0]); + for (j = 1; j < info[i].num_dists; j++) + printf(",%d", info[i].dists[j]); + printf("\n"); + } + } + + libxl_numainfo_list_free(info, nr); + + return; +} + static void output_topologyinfo(void) { libxl_cputopology *info; @@ -4271,8 +4301,6 @@ static void output_topologyinfo(void) libxl_cputopology_list_free(info, nr); - printf("numa_info : none\n"); - return; } @@ -4282,8 +4310,10 @@ static void info(int numa) output_physinfo(); - if (numa) + if (numa) { output_topologyinfo(); + output_numainfo(); + } output_xeninfo();
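[Editor's note.] Just to give an idea, with this patch applied the NUMA-related part of `xl info -n' would look roughly as follows on a hypothetical two-node host (the figures are made up; sizes are printed in MB, since the byte counts are right-shifted by 20):

  numa_info              :
  node:    memsize    memfree    distances
     0:      8192       6816       10,20
     1:      8192       7944       20,10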
Dario Faggioli
2012-Jul-04 16:18 UTC
[PATCH 04 of 10 v3] libxl: rename libxl_cpumap to libxl_bitmap
# HG changeset patch # User Dario Faggioli <raistlin@linux.it> # Date 1341416323 -7200 # Node ID cfdd6d53f3dd3c6aa325fe6d8a17e4089daafae5 # Parent f1227d5a82e56d10e302aec4c3717d281718a349 libxl: rename libxl_cpumap to libxl_bitmap And leave to the caller the burden of knowing and remembering what kind of bitmap each instance of libxl_bitmap is. This is basically just some s/libxl_cpumap/libxl_bitmap/ (and some other related interface name substitution, e.g., libxl_for_each_cpu) in a bunch of files, with no real functional change involved. A specific allocation helper is introduced, besides libxl_bitmap_alloc(). It is called libxl_cpu_bitmap_alloc() and is meant at substituting the old libxl_cpumap_alloc(). It is just something easier to use in cases where one wants to allocate a libxl_bitmap that is going to serve as a cpu map. This is because we want to be able to deal with both cpu and NUMA node maps, but we don''t want to duplicate all the various helpers and wrappers. While at it, add the usual initialization function, common to all libxl data structures. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.eu.com> --- Changes from v2: * rebased on top of 51d2daabd428 (libxl: allow setting more than 31 vcpus). * Fixed one missing rename of cpumap into bitmap. * Added libxl_bitmap_init(). Changes from v1: * this patch replaces "libxl: abstract libxl_cpumap to just libxl_map" as it directly change the name of the old type instead of adding one more abstraction layer. diff --git a/tools/libxl/gentest.py b/tools/libxl/gentest.py --- a/tools/libxl/gentest.py +++ b/tools/libxl/gentest.py @@ -20,7 +20,7 @@ def randomize_case(s): def randomize_enum(e): return random.choice([v.name for v in e.values]) -handcoded = ["libxl_cpumap", "libxl_key_value_list", +handcoded = ["libxl_bitmap", "libxl_key_value_list", "libxl_cpuid_policy_list", "libxl_string_list"] def gen_rand_init(ty, v, indent = " ", parent = None): @@ -117,16 +117,16 @@ static void rand_bytes(uint8_t *p, size_ p[i] = rand() % 256; } -static void libxl_cpumap_rand_init(libxl_cpumap *cpumap) +static void libxl_bitmap_rand_init(libxl_bitmap *bitmap) { int i; - cpumap->size = rand() % 16; - cpumap->map = calloc(cpumap->size, sizeof(*cpumap->map)); - libxl_for_each_cpu(i, *cpumap) { + bitmap->size = rand() % 16; + bitmap->map = calloc(bitmap->size, sizeof(*bitmap->map)); + libxl_for_each_bit(i, *bitmap) { if (rand() % 2) - libxl_cpumap_set(cpumap, i); + libxl_bitmap_set(bitmap, i); else - libxl_cpumap_reset(cpumap, i); + libxl_bitmap_reset(bitmap, i); } } diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -586,7 +586,7 @@ static int cpupool_info(libxl__gc *gc, info->poolid = xcinfo->cpupool_id; info->sched = xcinfo->sched_id; info->n_dom = xcinfo->n_dom; - rc = libxl_cpumap_alloc(CTX, &info->cpumap, 0); + rc = libxl_cpu_bitmap_alloc(CTX, &info->cpumap, 0); if (rc) { LOG(ERROR, "unable to allocate cpumap %d\n", rc); @@ -3431,7 +3431,7 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ct } for (*nb_vcpu = 0; *nb_vcpu <= domaininfo.max_vcpu_id; ++*nb_vcpu, ++ptr) { - if (libxl_cpumap_alloc(ctx, &ptr->cpumap, 0)) { + if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0)) { LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "allocating cpumap"); return NULL; } @@ -3454,7 +3454,7 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ct } int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, - libxl_cpumap *cpumap) + libxl_bitmap *cpumap) { if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map)) { 
LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting vcpu affinity"); @@ -3464,7 +3464,7 @@ int libxl_set_vcpuaffinity(libxl_ctx *ct } int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, - unsigned int max_vcpus, libxl_cpumap *cpumap) + unsigned int max_vcpus, libxl_bitmap *cpumap) { int i, rc = 0; @@ -3478,7 +3478,7 @@ int libxl_set_vcpuaffinity_all(libxl_ctx return rc; } -int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_cpumap *cpumap) +int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_bitmap *cpumap) { GC_INIT(ctx); libxl_dominfo info; @@ -3498,7 +3498,7 @@ retry_transaction: for (i = 0; i <= info.vcpu_max_id; i++) libxl__xs_write(gc, t, libxl__sprintf(gc, "%s/cpu/%u/availability", dompath, i), - "%s", libxl_cpumap_test(cpumap, i) ? "online" : "offline"); + "%s", libxl_bitmap_test(cpumap, i) ? "online" : "offline"); if (!xs_transaction_end(ctx->xsh, t, 0)) { if (errno == EAGAIN) goto retry_transaction; @@ -4094,7 +4094,7 @@ int libxl_tmem_freeable(libxl_ctx *ctx) return rc; } -int libxl_get_freecpus(libxl_ctx *ctx, libxl_cpumap *cpumap) +int libxl_get_freecpus(libxl_ctx *ctx, libxl_bitmap *cpumap) { int ncpus; @@ -4113,7 +4113,7 @@ int libxl_get_freecpus(libxl_ctx *ctx, l int libxl_cpupool_create(libxl_ctx *ctx, const char *name, libxl_scheduler sched, - libxl_cpumap cpumap, libxl_uuid *uuid, + libxl_bitmap cpumap, libxl_uuid *uuid, uint32_t *poolid) { GC_INIT(ctx); @@ -4136,8 +4136,8 @@ int libxl_cpupool_create(libxl_ctx *ctx, return ERROR_FAIL; } - libxl_for_each_cpu(i, cpumap) - if (libxl_cpumap_test(&cpumap, i)) { + libxl_for_each_bit(i, cpumap) + if (libxl_bitmap_test(&cpumap, i)) { rc = xc_cpupool_addcpu(ctx->xch, *poolid, i); if (rc) { LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, @@ -4172,7 +4172,7 @@ int libxl_cpupool_destroy(libxl_ctx *ctx int rc, i; xc_cpupoolinfo_t *info; xs_transaction_t t; - libxl_cpumap cpumap; + libxl_bitmap cpumap; info = xc_cpupool_getinfo(ctx->xch, poolid); if (info == NULL) { @@ -4184,13 +4184,13 @@ int libxl_cpupool_destroy(libxl_ctx *ctx if ((info->cpupool_id != poolid) || (info->n_dom)) goto out; - rc = libxl_cpumap_alloc(ctx, &cpumap, 0); + rc = libxl_cpu_bitmap_alloc(ctx, &cpumap, 0); if (rc) goto out; memcpy(cpumap.map, info->cpumap, cpumap.size); - libxl_for_each_cpu(i, cpumap) - if (libxl_cpumap_test(&cpumap, i)) { + libxl_for_each_bit(i, cpumap) + if (libxl_bitmap_test(&cpumap, i)) { rc = xc_cpupool_removecpu(ctx->xch, poolid, i); if (rc) { LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, @@ -4219,7 +4219,7 @@ int libxl_cpupool_destroy(libxl_ctx *ctx rc = 0; out1: - libxl_cpumap_dispose(&cpumap); + libxl_bitmap_dispose(&cpumap); out: xc_cpupool_infofree(ctx->xch, info); GC_FREE; @@ -4287,7 +4287,7 @@ int libxl_cpupool_cpuadd_node(libxl_ctx { int rc = 0; int cpu, nr; - libxl_cpumap freemap; + libxl_bitmap freemap; libxl_cputopology *topology; if (libxl_get_freecpus(ctx, &freemap)) { @@ -4302,7 +4302,7 @@ int libxl_cpupool_cpuadd_node(libxl_ctx *cpus = 0; for (cpu = 0; cpu < nr; cpu++) { - if (libxl_cpumap_test(&freemap, cpu) && (topology[cpu].node == node) && + if (libxl_bitmap_test(&freemap, cpu) && (topology[cpu].node == node) && !libxl_cpupool_cpuadd(ctx, poolid, cpu)) { (*cpus)++; } @@ -4311,7 +4311,7 @@ int libxl_cpupool_cpuadd_node(libxl_ctx free(topology); out: - libxl_cpumap_dispose(&freemap); + libxl_bitmap_dispose(&freemap); return rc; } @@ -4353,7 +4353,7 @@ int libxl_cpupool_cpuremove_node(libxl_c if (poolinfo[p].poolid == poolid) { for (cpu = 0; cpu < nr_cpus; cpu++) { if ((topology[cpu].node == 
node) && - libxl_cpumap_test(&poolinfo[p].cpumap, cpu) && + libxl_bitmap_test(&poolinfo[p].cpumap, cpu) && !libxl_cpupool_cpuremove(ctx, poolid, cpu)) { (*cpus)++; } diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -285,8 +285,9 @@ typedef uint64_t libxl_ev_user; typedef struct { uint32_t size; /* number of bytes in map */ uint8_t *map; -} libxl_cpumap; -void libxl_cpumap_dispose(libxl_cpumap *map); +} libxl_bitmap; +void libxl_bitmap_init(libxl_bitmap *map); +void libxl_bitmap_dispose(libxl_bitmap *map); /* libxl_cpuid_policy_list is a dynamic array storing CPUID policies * for multiple leafs. It is terminated with an entry holding @@ -790,10 +791,10 @@ int libxl_userdata_retrieve(libxl_ctx *c int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo); int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, - libxl_cpumap *cpumap); + libxl_bitmap *cpumap); int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, - unsigned int max_vcpus, libxl_cpumap *cpumap); -int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_cpumap *cpumap); + unsigned int max_vcpus, libxl_bitmap *cpumap); +int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_bitmap *cpumap); libxl_scheduler libxl_get_scheduler(libxl_ctx *ctx); @@ -843,10 +844,10 @@ int libxl_tmem_shared_auth(libxl_ctx *ct int auth); int libxl_tmem_freeable(libxl_ctx *ctx); -int libxl_get_freecpus(libxl_ctx *ctx, libxl_cpumap *cpumap); +int libxl_get_freecpus(libxl_ctx *ctx, libxl_bitmap *cpumap); int libxl_cpupool_create(libxl_ctx *ctx, const char *name, libxl_scheduler sched, - libxl_cpumap cpumap, libxl_uuid *uuid, + libxl_bitmap cpumap, libxl_uuid *uuid, uint32_t *poolid); int libxl_cpupool_destroy(libxl_ctx *ctx, uint32_t poolid); int libxl_cpupool_rename(libxl_ctx *ctx, const char *name, uint32_t poolid); diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -203,16 +203,16 @@ int libxl__domain_build_info_setdefault( if (!b_info->max_vcpus) b_info->max_vcpus = 1; if (!b_info->avail_vcpus.size) { - if (libxl_cpumap_alloc(CTX, &b_info->avail_vcpus, 1)) + if (libxl_cpu_bitmap_alloc(CTX, &b_info->avail_vcpus, 1)) return ERROR_FAIL; - libxl_cpumap_set(&b_info->avail_vcpus, 0); + libxl_bitmap_set(&b_info->avail_vcpus, 0); } else if (b_info->avail_vcpus.size > HVM_MAX_VCPUS) return ERROR_FAIL; if (!b_info->cpumap.size) { - if (libxl_cpumap_alloc(CTX, &b_info->cpumap, 0)) + if (libxl_cpu_bitmap_alloc(CTX, &b_info->cpumap, 0)) return ERROR_FAIL; - libxl_cpumap_set_any(&b_info->cpumap); + libxl_bitmap_set_any(&b_info->cpumap); } if (b_info->max_memkb == LIBXL_MEMKB_DEFAULT) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -208,8 +208,8 @@ static char ** libxl__build_device_model NULL); } - nr_set_cpus = libxl_cpumap_count_set(&b_info->avail_vcpus); - s = libxl_cpumap_to_hex_string(CTX, &b_info->avail_vcpus); + nr_set_cpus = libxl_bitmap_count_set(&b_info->avail_vcpus); + s = libxl_bitmap_to_hex_string(CTX, &b_info->avail_vcpus); flexarray_vappend(dm_args, "-vcpu_avail", libxl__sprintf(gc, "%s", s), NULL); free(s); @@ -459,7 +459,7 @@ static char ** libxl__build_device_model flexarray_append(dm_args, "-smp"); if (b_info->avail_vcpus.size) { int nr_set_cpus = 0; - nr_set_cpus = libxl_cpumap_count_set(&b_info->avail_vcpus); + nr_set_cpus = libxl_bitmap_count_set(&b_info->avail_vcpus); 
flexarray_append(dm_args, libxl__sprintf(gc, "%d,maxcpus=%d", b_info->max_vcpus, diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -202,7 +202,7 @@ int libxl__build_post(libxl__gc *gc, uin ents[11] = libxl__sprintf(gc, "%lu", state->store_mfn); for (i = 0; i < info->max_vcpus; i++) { ents[12+(i*2)] = libxl__sprintf(gc, "cpu/%d/availability", i); - ents[12+(i*2)+1] = libxl_cpumap_test(&info->avail_vcpus, i) + ents[12+(i*2)+1] = libxl_bitmap_test(&info->avail_vcpus, i) ? "online" : "offline"; } diff --git a/tools/libxl/libxl_json.c b/tools/libxl/libxl_json.c --- a/tools/libxl/libxl_json.c +++ b/tools/libxl/libxl_json.c @@ -99,8 +99,8 @@ yajl_gen_status libxl_uuid_gen_json(yajl return yajl_gen_string(hand, (const unsigned char *)buf, LIBXL_UUID_FMTLEN); } -yajl_gen_status libxl_cpumap_gen_json(yajl_gen hand, - libxl_cpumap *cpumap) +yajl_gen_status libxl_bitmap_gen_json(yajl_gen hand, + libxl_bitmap *bitmap) { yajl_gen_status s; int i; @@ -108,8 +108,8 @@ yajl_gen_status libxl_cpumap_gen_json(ya s = yajl_gen_array_open(hand); if (s != yajl_gen_status_ok) goto out; - libxl_for_each_cpu(i, *cpumap) { - if (libxl_cpumap_test(cpumap, i)) { + libxl_for_each_bit(i, *bitmap) { + if (libxl_bitmap_test(bitmap, i)) { s = yajl_gen_integer(hand, i); if (s != yajl_gen_status_ok) goto out; } diff --git a/tools/libxl/libxl_json.h b/tools/libxl/libxl_json.h --- a/tools/libxl/libxl_json.h +++ b/tools/libxl/libxl_json.h @@ -26,7 +26,7 @@ yajl_gen_status libxl_defbool_gen_json(y yajl_gen_status libxl_domid_gen_json(yajl_gen hand, libxl_domid *p); yajl_gen_status libxl_uuid_gen_json(yajl_gen hand, libxl_uuid *p); yajl_gen_status libxl_mac_gen_json(yajl_gen hand, libxl_mac *p); -yajl_gen_status libxl_cpumap_gen_json(yajl_gen hand, libxl_cpumap *p); +yajl_gen_status libxl_bitmap_gen_json(yajl_gen hand, libxl_bitmap *p); yajl_gen_status libxl_cpuid_policy_list_gen_json(yajl_gen hand, libxl_cpuid_policy_list *p); yajl_gen_status libxl_string_list_gen_json(yajl_gen hand, libxl_string_list *p); diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -10,7 +10,7 @@ libxl_defbool = Builtin("defbool", passb libxl_domid = Builtin("domid", json_fn = "yajl_gen_integer", autogenerate_json = False) libxl_uuid = Builtin("uuid", passby=PASS_BY_REFERENCE) libxl_mac = Builtin("mac", passby=PASS_BY_REFERENCE) -libxl_cpumap = Builtin("cpumap", dispose_fn="libxl_cpumap_dispose", passby=PASS_BY_REFERENCE) +libxl_bitmap = Builtin("bitmap", dispose_fn="libxl_bitmap_dispose", passby=PASS_BY_REFERENCE) libxl_cpuid_policy_list = Builtin("cpuid_policy_list", dispose_fn="libxl_cpuid_dispose", passby=PASS_BY_REFERENCE) libxl_string_list = Builtin("string_list", dispose_fn="libxl_string_list_dispose", passby=PASS_BY_REFERENCE) @@ -198,7 +198,7 @@ libxl_cpupoolinfo = Struct("cpupoolinfo" ("poolid", uint32), ("sched", libxl_scheduler), ("n_dom", uint32), - ("cpumap", libxl_cpumap) + ("cpumap", libxl_bitmap) ], dir=DIR_OUT) libxl_vminfo = Struct("vminfo", [ @@ -247,8 +247,8 @@ libxl_domain_sched_params = Struct("doma libxl_domain_build_info = Struct("domain_build_info",[ ("max_vcpus", integer), - ("avail_vcpus", libxl_cpumap), - ("cpumap", libxl_cpumap), + ("avail_vcpus", libxl_bitmap), + ("cpumap", libxl_bitmap), ("tsc_mode", libxl_tsc_mode), ("max_memkb", MemKB), ("target_memkb", MemKB), @@ -409,7 +409,7 @@ libxl_vcpuinfo = Struct("vcpuinfo", [ ("blocked", bool), ("running", bool), 
("vcpu_time", uint64), # total vcpu time ran (ns) - ("cpumap", libxl_cpumap), # current cpu''s affinities + ("cpumap", libxl_bitmap), # current cpu''s affinities ], dir=DIR_OUT) libxl_physinfo = Struct("physinfo", [ diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c --- a/tools/libxl/libxl_utils.c +++ b/tools/libxl/libxl_utils.c @@ -487,79 +487,70 @@ int libxl_mac_to_device_nic(libxl_ctx *c return rc; } -int libxl_cpumap_alloc(libxl_ctx *ctx, libxl_cpumap *cpumap, int max_cpus) +int libxl_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *bitmap, int n_bits) { GC_INIT(ctx); int sz; - int rc; - if (max_cpus < 0) { - rc = ERROR_INVAL; - goto out; - } - if (max_cpus == 0) - max_cpus = libxl_get_max_cpus(ctx); - if (max_cpus == 0) { - rc = ERROR_FAIL; - goto out; - } + sz = (n_bits + 7) / 8; + bitmap->map = libxl__calloc(NOGC, sizeof(*bitmap->map), sz); + bitmap->size = sz; - sz = (max_cpus + 7) / 8; - cpumap->map = libxl__calloc(NOGC, sizeof(*cpumap->map), sz); - cpumap->size = sz; - - rc = 0; - out: GC_FREE; - return rc; + return 0; } -void libxl_cpumap_dispose(libxl_cpumap *map) +void libxl_bitmap_init(libxl_bitmap *map) +{ + memset(map, ''\0'', sizeof(*map)); +} + +void libxl_bitmap_dispose(libxl_bitmap *map) { free(map->map); } -int libxl_cpumap_test(const libxl_cpumap *cpumap, int cpu) +int libxl_bitmap_test(const libxl_bitmap *bitmap, int bit) { - if (cpu >= cpumap->size * 8) + if (bit >= bitmap->size * 8) return 0; - return (cpumap->map[cpu / 8] & (1 << (cpu & 7))) ? 1 : 0; + return (bitmap->map[bit / 8] & (1 << (bit & 7))) ? 1 : 0; } -void libxl_cpumap_set(libxl_cpumap *cpumap, int cpu) +void libxl_bitmap_set(libxl_bitmap *bitmap, int bit) { - if (cpu >= cpumap->size * 8) + if (bit >= bitmap->size * 8) return; - cpumap->map[cpu / 8] |= 1 << (cpu & 7); + bitmap->map[bit / 8] |= 1 << (bit & 7); } -void libxl_cpumap_reset(libxl_cpumap *cpumap, int cpu) +void libxl_bitmap_reset(libxl_bitmap *bitmap, int bit) { - if (cpu >= cpumap->size * 8) + if (bit >= bitmap->size * 8) return; - cpumap->map[cpu / 8] &= ~(1 << (cpu & 7)); + bitmap->map[bit / 8] &= ~(1 << (bit & 7)); } -int libxl_cpumap_count_set(const libxl_cpumap *cpumap) +int libxl_bitmap_count_set(const libxl_bitmap *bitmap) { - int i, nr_set_cpus = 0; - libxl_for_each_set_cpu(i, *cpumap) - nr_set_cpus++; + int i, nr_set_bits = 0; + libxl_for_each_set_bit(i, *bitmap) + nr_set_bits++; - return nr_set_cpus; + return nr_set_bits; } /* NB. 
caller is responsible for freeing the memory */ -char *libxl_cpumap_to_hex_string(libxl_ctx *ctx, const libxl_cpumap *cpumap) +char *libxl_bitmap_to_hex_string(libxl_ctx *ctx, const libxl_bitmap *bitmap) { GC_INIT(ctx); - int i = cpumap->size; - char *p = libxl__zalloc(NOGC, cpumap->size * 2 + 3); + int i = bitmap->size; + char *p = libxl__zalloc(NOGC, bitmap->size * 2 + 3); char *q = p; strncpy(p, "0x", 2); p += 2; while(--i >= 0) { - sprintf(p, "%02x", cpumap->map[i]); + sprintf(p, "%02x", bitmap->map[i]); p += 2; } *p = ''\0''; diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h --- a/tools/libxl/libxl_utils.h +++ b/tools/libxl/libxl_utils.h @@ -63,29 +63,44 @@ int libxl_devid_to_device_nic(libxl_ctx int libxl_vdev_to_device_disk(libxl_ctx *ctx, uint32_t domid, const char *vdev, libxl_device_disk *disk); -int libxl_cpumap_alloc(libxl_ctx *ctx, libxl_cpumap *cpumap, int max_cpus); -int libxl_cpumap_test(const libxl_cpumap *cpumap, int cpu); -void libxl_cpumap_set(libxl_cpumap *cpumap, int cpu); -void libxl_cpumap_reset(libxl_cpumap *cpumap, int cpu); -int libxl_cpumap_count_set(const libxl_cpumap *cpumap); -char *libxl_cpumap_to_hex_string(libxl_ctx *ctx, const libxl_cpumap *cpumap); -static inline void libxl_cpumap_set_any(libxl_cpumap *cpumap) +int libxl_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *bitmap, int n_bits); + /* Allocated bimap is from malloc, libxl_bitmap_dispose() to be + * called by the application when done. */ +int libxl_bitmap_test(const libxl_bitmap *bitmap, int bit); +void libxl_bitmap_set(libxl_bitmap *bitmap, int bit); +void libxl_bitmap_reset(libxl_bitmap *bitmap, int bit); +int libxl_bitmap_count_set(const libxl_bitmap *cpumap); +char *libxl_bitmap_to_hex_string(libxl_ctx *ctx, const libxl_bitmap *cpumap); +static inline void libxl_bitmap_set_any(libxl_bitmap *bitmap) { - memset(cpumap->map, -1, cpumap->size); + memset(bitmap->map, -1, bitmap->size); } -static inline void libxl_cpumap_set_none(libxl_cpumap *cpumap) +static inline void libxl_bitmap_set_none(libxl_bitmap *bitmap) { - memset(cpumap->map, 0, cpumap->size); + memset(bitmap->map, 0, bitmap->size); } -static inline int libxl_cpumap_cpu_valid(libxl_cpumap *cpumap, int cpu) +static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit) { - return cpu >= 0 && cpu < (cpumap->size * 8); + return bit >= 0 && bit < (bitmap->size * 8); } -#define libxl_for_each_cpu(var, map) for (var = 0; var < (map).size * 8; var++) -#define libxl_for_each_set_cpu(v, m) for (v = 0; v < (m).size * 8; v++) \ - if (libxl_cpumap_test(&(m), v)) +#define libxl_for_each_bit(var, map) for (var = 0; var < (map).size * 8; var++) +#define libxl_for_each_set_bit(v, m) for (v = 0; v < (m).size * 8; v++) \ + if (libxl_bitmap_test(&(m), v)) -static inline uint32_t libxl__sizekb_to_mb(uint32_t s) { +static inline int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap, + int max_cpus) +{ + if (max_cpus < 0) + return ERROR_INVAL; + if (max_cpus == 0) + max_cpus = libxl_get_max_cpus(ctx); + if (max_cpus == 0) + return ERROR_FAIL; + + return libxl_bitmap_alloc(ctx, cpumap, max_cpus); +} + + static inline uint32_t libxl__sizekb_to_mb(uint32_t s) { return (s + 1023) / 1024; } diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -492,19 +492,19 @@ static void split_string_into_string_lis free(s); } -static int vcpupin_parse(char *cpu, libxl_cpumap *cpumap) -{ - libxl_cpumap exclude_cpumap; +static int vcpupin_parse(char *cpu, libxl_bitmap 
*cpumap) +{ + libxl_bitmap exclude_cpumap; uint32_t cpuida, cpuidb; char *endptr, *toka, *tokb, *saveptr = NULL; int i, rc = 0, rmcpu; if (!strcmp(cpu, "all")) { - libxl_cpumap_set_any(cpumap); + libxl_bitmap_set_any(cpumap); return 0; } - if (libxl_cpumap_alloc(ctx, &exclude_cpumap, 0)) { + if (libxl_cpu_bitmap_alloc(ctx, &exclude_cpumap, 0)) { fprintf(stderr, "Error: Failed to allocate cpumap.\n"); return ENOMEM; } @@ -534,19 +534,19 @@ static int vcpupin_parse(char *cpu, libx } } while (cpuida <= cpuidb) { - rmcpu == 0 ? libxl_cpumap_set(cpumap, cpuida) : - libxl_cpumap_set(&exclude_cpumap, cpuida); + rmcpu == 0 ? libxl_bitmap_set(cpumap, cpuida) : + libxl_bitmap_set(&exclude_cpumap, cpuida); cpuida++; } } /* Clear all the cpus from the removal list */ - libxl_for_each_set_cpu(i, exclude_cpumap) { - libxl_cpumap_reset(cpumap, i); + libxl_for_each_set_bit(i, exclude_cpumap) { + libxl_bitmap_reset(cpumap, i); } vcpp_out: - libxl_cpumap_dispose(&exclude_cpumap); + libxl_bitmap_dispose(&exclude_cpumap); return rc; } @@ -649,13 +649,13 @@ static void parse_config_data(const char if (!xlu_cfg_get_long (config, "vcpus", &l, 0)) { b_info->max_vcpus = l; - if (libxl_cpumap_alloc(ctx, &b_info->avail_vcpus, l)) { + if (libxl_cpu_bitmap_alloc(ctx, &b_info->avail_vcpus, l)) { fprintf(stderr, "Unable to allocate cpumap\n"); exit(1); } - libxl_cpumap_set_none(&b_info->avail_vcpus); + libxl_bitmap_set_none(&b_info->avail_vcpus); while (l-- > 0) - libxl_cpumap_set((&b_info->avail_vcpus), l); + libxl_bitmap_set((&b_info->avail_vcpus), l); } if (!xlu_cfg_get_long (config, "maxvcpus", &l, 0)) @@ -664,7 +664,7 @@ static void parse_config_data(const char if (!xlu_cfg_get_list (config, "cpus", &cpus, 0, 1)) { int i, n_cpus = 0; - if (libxl_cpumap_alloc(ctx, &b_info->cpumap, 0)) { + if (libxl_cpu_bitmap_alloc(ctx, &b_info->cpumap, 0)) { fprintf(stderr, "Unable to allocate cpumap\n"); exit(1); } @@ -684,14 +684,14 @@ static void parse_config_data(const char * the cpumap derived from the list ensures memory is being * allocated on the proper nodes anyway. 
*/ - libxl_cpumap_set_none(&b_info->cpumap); + libxl_bitmap_set_none(&b_info->cpumap); while ((buf = xlu_cfg_get_listitem(cpus, n_cpus)) != NULL) { i = atoi(buf); - if (!libxl_cpumap_cpu_valid(&b_info->cpumap, i)) { + if (!libxl_bitmap_cpu_valid(&b_info->cpumap, i)) { fprintf(stderr, "cpu %d illegal\n", i); exit(1); } - libxl_cpumap_set(&b_info->cpumap, i); + libxl_bitmap_set(&b_info->cpumap, i); if (n_cpus < b_info->max_vcpus) vcpu_to_pcpu[n_cpus] = i; n_cpus++; @@ -700,12 +700,12 @@ static void parse_config_data(const char else if (!xlu_cfg_get_string (config, "cpus", &buf, 0)) { char *buf2 = strdup(buf); - if (libxl_cpumap_alloc(ctx, &b_info->cpumap, 0)) { + if (libxl_cpu_bitmap_alloc(ctx, &b_info->cpumap, 0)) { fprintf(stderr, "Unable to allocate cpumap\n"); exit(1); } - libxl_cpumap_set_none(&b_info->cpumap); + libxl_bitmap_set_none(&b_info->cpumap); if (vcpupin_parse(buf2, &b_info->cpumap)) exit(1); free(buf2); @@ -1800,28 +1800,28 @@ start: /* If single vcpu to pcpu mapping was requested, honour it */ if (vcpu_to_pcpu) { - libxl_cpumap vcpu_cpumap; - - ret = libxl_cpumap_alloc(ctx, &vcpu_cpumap, 0); + libxl_bitmap vcpu_cpumap; + + ret = libxl_cpu_bitmap_alloc(ctx, &vcpu_cpumap, 0); if (ret) goto error_out; for (i = 0; i < d_config.b_info.max_vcpus; i++) { if (vcpu_to_pcpu[i] != -1) { - libxl_cpumap_set_none(&vcpu_cpumap); - libxl_cpumap_set(&vcpu_cpumap, vcpu_to_pcpu[i]); + libxl_bitmap_set_none(&vcpu_cpumap); + libxl_bitmap_set(&vcpu_cpumap, vcpu_to_pcpu[i]); } else { - libxl_cpumap_set_any(&vcpu_cpumap); + libxl_bitmap_set_any(&vcpu_cpumap); } if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap)) { fprintf(stderr, "setting affinity failed on vcpu `%d''.\n", i); - libxl_cpumap_dispose(&vcpu_cpumap); + libxl_bitmap_dispose(&vcpu_cpumap); free(vcpu_to_pcpu); ret = ERROR_FAIL; goto error_out; } } - libxl_cpumap_dispose(&vcpu_cpumap); + libxl_bitmap_dispose(&vcpu_cpumap); free(vcpu_to_pcpu); vcpu_to_pcpu = NULL; } @@ -4058,7 +4058,7 @@ int main_vcpulist(int argc, char **argv) static void vcpupin(const char *d, const char *vcpu, char *cpu) { libxl_vcpuinfo *vcpuinfo; - libxl_cpumap cpumap; + libxl_bitmap cpumap; uint32_t vcpuid; char *endptr; @@ -4075,7 +4075,7 @@ static void vcpupin(const char *d, const find_domain(d); - if (libxl_cpumap_alloc(ctx, &cpumap, 0)) { + if (libxl_cpu_bitmap_alloc(ctx, &cpumap, 0)) { goto vcpupin_out; } @@ -4102,7 +4102,7 @@ static void vcpupin(const char *d, const libxl_vcpuinfo_list_free(vcpuinfo, nb_vcpu); } vcpupin_out1: - libxl_cpumap_dispose(&cpumap); + libxl_bitmap_dispose(&cpumap); vcpupin_out: ; } @@ -4122,7 +4122,7 @@ static void vcpuset(const char *d, const { char *endptr; unsigned int max_vcpus, i; - libxl_cpumap cpumap; + libxl_bitmap cpumap; max_vcpus = strtoul(nr_vcpus, &endptr, 10); if (nr_vcpus == endptr) { @@ -4132,17 +4132,17 @@ static void vcpuset(const char *d, const find_domain(d); - if (libxl_cpumap_alloc(ctx, &cpumap, 0)) { - fprintf(stderr, "libxl_cpumap_alloc failed\n"); + if (libxl_cpu_bitmap_alloc(ctx, &cpumap, 0)) { + fprintf(stderr, "libxl_cpu_bitmap_alloc failed\n"); return; } for (i = 0; i < max_vcpus; i++) - libxl_cpumap_set(&cpumap, i); + libxl_bitmap_set(&cpumap, i); if (libxl_set_vcpuonline(ctx, domid, &cpumap) < 0) fprintf(stderr, "libxl_set_vcpuonline failed domid=%d max_vcpus=%d\n", domid, max_vcpus); - libxl_cpumap_dispose(&cpumap); + libxl_bitmap_dispose(&cpumap); } int main_vcpuset(int argc, char **argv) @@ -4206,7 +4206,7 @@ static void output_physinfo(void) libxl_physinfo info; const libxl_version_info *vinfo; 
unsigned int i; - libxl_cpumap cpumap; + libxl_bitmap cpumap; int n = 0; if (libxl_get_physinfo(ctx, &info) != 0) { @@ -4238,8 +4238,8 @@ static void output_physinfo(void) printf("sharing_used_memory : %"PRIu64"\n", info.sharing_used_frames / i); } if (!libxl_get_freecpus(ctx, &cpumap)) { - libxl_for_each_cpu(i, cpumap) - if (libxl_cpumap_test(&cpumap, i)) + libxl_for_each_bit(i, cpumap) + if (libxl_bitmap_test(&cpumap, i)) n++; printf("free_cpus : %d\n", n); free(cpumap.map); @@ -5861,8 +5861,8 @@ int main_cpupoolcreate(int argc, char ** XLU_ConfigList *cpus; XLU_ConfigList *nodes; int n_cpus, n_nodes, i, n; - libxl_cpumap freemap; - libxl_cpumap cpumap; + libxl_bitmap freemap; + libxl_bitmap cpumap; libxl_uuid uuid; libxl_cputopology *topology; int rc = -ERROR_FAIL; @@ -5975,7 +5975,7 @@ int main_cpupoolcreate(int argc, char ** fprintf(stderr, "libxl_get_freecpus failed\n"); goto out_cfg; } - if (libxl_cpumap_alloc(ctx, &cpumap, 0)) { + if (libxl_cpu_bitmap_alloc(ctx, &cpumap, 0)) { fprintf(stderr, "Failed to allocate cpumap\n"); goto out_cfg; } @@ -5992,8 +5992,8 @@ int main_cpupoolcreate(int argc, char ** n = atoi(buf); for (i = 0; i < nr; i++) { if ((topology[i].node == n) && - libxl_cpumap_test(&freemap, i)) { - libxl_cpumap_set(&cpumap, i); + libxl_bitmap_test(&freemap, i)) { + libxl_bitmap_set(&cpumap, i); n_cpus++; } } @@ -6011,11 +6011,11 @@ int main_cpupoolcreate(int argc, char ** while ((buf = xlu_cfg_get_listitem(cpus, n_cpus)) != NULL) { i = atoi(buf); if ((i < 0) || (i >= freemap.size * 8) || - !libxl_cpumap_test(&freemap, i)) { + !libxl_bitmap_test(&freemap, i)) { fprintf(stderr, "cpu %d illegal or not free\n", i); goto out_cfg; } - libxl_cpumap_set(&cpumap, i); + libxl_bitmap_set(&cpumap, i); n_cpus++; } } else @@ -6113,8 +6113,8 @@ int main_cpupoollist(int argc, char **ar printf("%-19s", name); free(name); n = 0; - libxl_for_each_cpu(c, poolinfo[p].cpumap) - if (libxl_cpumap_test(&poolinfo[p].cpumap, c)) { + libxl_for_each_bit(c, poolinfo[p].cpumap) + if (libxl_bitmap_test(&poolinfo[p].cpumap, c)) { if (n && opt_cpus) printf(","); if (opt_cpus) printf("%d", c); n++; @@ -6313,7 +6313,7 @@ int main_cpupoolnumasplit(int argc, char int n_cpus; char name[16]; libxl_uuid uuid; - libxl_cpumap cpumap; + libxl_bitmap cpumap; libxl_cpupoolinfo *poolinfo; libxl_cputopology *topology; libxl_dominfo info; @@ -6343,7 +6343,7 @@ int main_cpupoolnumasplit(int argc, char return -ERROR_FAIL; } - if (libxl_cpumap_alloc(ctx, &cpumap, 0)) { + if (libxl_cpu_bitmap_alloc(ctx, &cpumap, 0)) { fprintf(stderr, "Failed to allocate cpumap\n"); libxl_cputopology_list_free(topology, n_cpus); return -ERROR_FAIL; @@ -6369,7 +6369,7 @@ int main_cpupoolnumasplit(int argc, char for (c = 0; c < n_cpus; c++) { if (topology[c].node == node) { topology[c].node = LIBXL_CPUTOPOLOGY_INVALID_ENTRY; - libxl_cpumap_set(&cpumap, n); + libxl_bitmap_set(&cpumap, n); n++; } } @@ -6391,7 +6391,7 @@ int main_cpupoolnumasplit(int argc, char fprintf(stderr, "failed to offline vcpus\n"); goto out; } - libxl_cpumap_set_none(&cpumap); + libxl_bitmap_set_none(&cpumap); for (c = 0; c < n_cpus; c++) { if (topology[c].node == LIBXL_CPUTOPOLOGY_INVALID_ENTRY) { @@ -6429,7 +6429,7 @@ int main_cpupoolnumasplit(int argc, char out: libxl_cputopology_list_free(topology, n_cpus); - libxl_cpumap_dispose(&cpumap); + libxl_bitmap_dispose(&cpumap); return ret; } diff --git a/tools/python/xen/lowlevel/xl/xl.c b/tools/python/xen/lowlevel/xl/xl.c --- a/tools/python/xen/lowlevel/xl/xl.c +++ b/tools/python/xen/lowlevel/xl/xl.c @@ -231,14 +231,14 
@@ int attrib__libxl_cpuid_policy_list_set( return -1; } -int attrib__libxl_cpumap_set(PyObject *v, libxl_cpumap *pptr) +int attrib__libxl_bitmap_set(PyObject *v, libxl_bitmap *pptr) { int i; long cpu; for (i = 0; i < PyList_Size(v); i++) { cpu = PyInt_AsLong(PyList_GetItem(v, i)); - libxl_cpumap_set(pptr, cpu); + libxl_bitmap_set(pptr, cpu); } return 0; } @@ -293,14 +293,14 @@ PyObject *attrib__libxl_cpuid_policy_lis return NULL; } -PyObject *attrib__libxl_cpumap_get(libxl_cpumap *pptr) +PyObject *attrib__libxl_bitmap_get(libxl_bitmap *pptr) { PyObject *cpulist = NULL; int i; cpulist = PyList_New(0); - libxl_for_each_cpu(i, *pptr) { - if ( libxl_cpumap_test(pptr, i) ) { + libxl_for_each_bit(i, *pptr) { + if ( libxl_bitmap_test(pptr, i) ) { PyObject* pyint = PyInt_FromLong(i); PyList_Append(cpulist, pyint);
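For readers following the renaming, here is a minimal usage sketch of the bitmap API as it looks after this patch. It is not part of the series: the helper name is made up, and it assumes a valid libxl_ctx plus the usual libxl and stdio headers.

/* Hypothetical example, not part of the series: mark a couple of cpus in a
 * host-sized bitmap and walk the set bits. */
static int example_mark_cpus(libxl_ctx *ctx)
{
    libxl_bitmap cpumap;
    int i, rc;

    libxl_bitmap_init(&cpumap);                   /* zeroed, so dispose is safe on any path */
    rc = libxl_cpu_bitmap_alloc(ctx, &cpumap, 0); /* 0 == size it for all host cpus */
    if (rc)
        return rc;

    libxl_bitmap_set_none(&cpumap);
    libxl_bitmap_set(&cpumap, 2);
    libxl_bitmap_set(&cpumap, 5);

    libxl_for_each_set_bit(i, cpumap)             /* visits bits #2 and #5 */
        printf("bit %d is set\n", i);

    libxl_bitmap_dispose(&cpumap);
    return 0;
}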
Dario Faggioli
2012-Jul-04 16:18 UTC
[PATCH 05 of 10 v3] libxl: expand the libxl_bitmap API a bit
# HG changeset patch # User Dario Faggioli <raistlin@linux.it> # Date 1341416323 -7200 # Node ID 516eb90ec5599c8d44a5a5c109db9d8bbdb6ed09 # Parent cfdd6d53f3dd3c6aa325fe6d8a17e4089daafae5 libxl: expand the libxl_bitmap API a bit By adding copying and *_is_full/*_is_empty facilities. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- Changes from v2: * added an assert for equal sizes in libxl_bitmap_copy(). Changes from v1: * now libxl_is_full/empty return 1 if true and 0 if false, as logic (and as requested during review). diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c --- a/tools/libxl/libxl_utils.c +++ b/tools/libxl/libxl_utils.c @@ -510,6 +510,36 @@ void libxl_bitmap_dispose(libxl_bitmap * free(map->map); } +void libxl_bitmap_copy(libxl_ctx *ctx, libxl_bitmap *dptr, + const libxl_bitmap *sptr) +{ + int sz; + + assert(dptr->size == sptr->size); + sz = dptr->size = sptr->size; + memcpy(dptr->map, sptr->map, sz * sizeof(*dptr->map)); +} + +int libxl_bitmap_is_full(const libxl_bitmap *bitmap) +{ + int i; + + for (i = 0; i < bitmap->size; i++) + if (bitmap->map[i] != (uint8_t)-1) + return 0; + return 1; +} + +int libxl_bitmap_is_empty(const libxl_bitmap *bitmap) +{ + int i; + + for (i = 0; i < bitmap->size; i++) + if (bitmap->map[i]) + return 0; + return 1; +} + int libxl_bitmap_test(const libxl_bitmap *bitmap, int bit) { if (bit >= bitmap->size * 8) diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h --- a/tools/libxl/libxl_utils.h +++ b/tools/libxl/libxl_utils.h @@ -66,6 +66,10 @@ int libxl_vdev_to_device_disk(libxl_ctx int libxl_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *bitmap, int n_bits); /* Allocated bimap is from malloc, libxl_bitmap_dispose() to be * called by the application when done. */ +void libxl_bitmap_copy(libxl_ctx *ctx, libxl_bitmap *dptr, + const libxl_bitmap *sptr); +int libxl_bitmap_is_full(const libxl_bitmap *bitmap); +int libxl_bitmap_is_empty(const libxl_bitmap *bitmap); int libxl_bitmap_test(const libxl_bitmap *bitmap, int bit); void libxl_bitmap_set(libxl_bitmap *bitmap, int bit); void libxl_bitmap_reset(libxl_bitmap *bitmap, int bit);
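As a quick (hypothetical, not part of the series) illustration of the new calls: both maps must already be allocated with the same size, as the assert in libxl_bitmap_copy() demands.

/* Sketch only: copy src into dst and report whether the result is empty,
 * full, or somewhere in between. */
static void example_copy_and_check(libxl_ctx *ctx, libxl_bitmap *dst,
                                   const libxl_bitmap *src)
{
    libxl_bitmap_copy(ctx, dst, src);      /* sizes must match */

    if (libxl_bitmap_is_empty(dst))
        printf("no bit is set\n");
    else if (libxl_bitmap_is_full(dst))
        printf("all bits are set\n");
    else
        printf("some bits are set\n");
}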
Dario Faggioli
2012-Jul-04 16:18 UTC
[PATCH 06 of 10 v3] libxl: introduce some node map helpers
# HG changeset patch # User Dario Faggioli <raistlin@linux.it> # Date 1341416323 -7200 # Node ID 3b65112bedc0656512312e29b89652f1c4ca0083 # Parent 516eb90ec5599c8d44a5a5c109db9d8bbdb6ed09 libxl: introduce some node map helpers To allow for allocating a node specific libxl_bitmap (as it is for cpu number and maps). Helper unctions to convert a node map it its coresponding cpu map and vice versa are also implemented. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> --- Changes from v2: * add a max_nodes parameter to libxl_node_bitmap_alloc() to allow allocating bitmap of different sizes (as it is not possible for cpu bitmaps). * Comments in libxl_utils.h moved above the function prototypes. Changes from v1: * This patch replaces "libxl: introduce libxl_nodemap", as the libxl_bitmap type introduced by the previous patch is now used for both cpu and node maps, as requested during v1 review. diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c --- a/tools/libxl/libxl_utils.c +++ b/tools/libxl/libxl_utils.c @@ -588,6 +588,50 @@ char *libxl_bitmap_to_hex_string(libxl_c return q; } +int libxl_nodemap_to_cpumap(libxl_ctx *ctx, + const libxl_bitmap *nodemap, + libxl_bitmap *cpumap) +{ + libxl_cputopology *tinfo = NULL; + int nr_cpus, i, rc = 0; + + tinfo = libxl_get_cpu_topology(ctx, &nr_cpus); + if (tinfo == NULL) { + rc = ERROR_FAIL; + goto out; + } + + libxl_bitmap_set_none(cpumap); + for (i = 0; i < nr_cpus; i++) { + if (libxl_bitmap_test(nodemap, tinfo[i].node)) + libxl_bitmap_set(cpumap, i); + } + out: + libxl_cputopology_list_free(tinfo, nr_cpus); + return rc; +} + +int libxl_cpumap_to_nodemap(libxl_ctx *ctx, + const libxl_bitmap *cpumap, + libxl_bitmap *nodemap) +{ + libxl_cputopology *tinfo = NULL; + int nr_cpus, i, rc = 0; + + tinfo = libxl_get_cpu_topology(ctx, &nr_cpus); + if (tinfo == NULL) { + rc = ERROR_FAIL; + goto out; + } + + libxl_bitmap_set_none(nodemap); + libxl_for_each_set_bit(i, *cpumap) + libxl_bitmap_set(nodemap, tinfo[i].node); + out: + libxl_cputopology_list_free(tinfo, nr_cpus); + return rc; +} + int libxl_get_max_cpus(libxl_ctx *ctx) { return xc_get_max_cpus(ctx->xch); diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h --- a/tools/libxl/libxl_utils.h +++ b/tools/libxl/libxl_utils.h @@ -104,6 +104,29 @@ static inline int libxl_cpu_bitmap_alloc return libxl_bitmap_alloc(ctx, cpumap, max_cpus); } +static inline int libxl_node_bitmap_alloc(libxl_ctx *ctx, + libxl_bitmap *nodemap, + int max_nodes) +{ + if (max_nodes < 0) + return ERROR_INVAL; + if (max_nodes == 0) + max_nodes = libxl_get_max_nodes(ctx); + if (max_nodes == 0) + return ERROR_FAIL; + + return libxl_bitmap_alloc(ctx, nodemap, max_nodes); +} + +/* Populate cpumap with the cpus spanned by the nodes in nodemap */ +int libxl_nodemap_to_cpumap(libxl_ctx *ctx, + const libxl_bitmap *nodemap, + libxl_bitmap *cpumap); +/* Populate nodemap with the nodes of the cpus in cpumap */ +int libxl_cpumap_to_nodemap(libxl_ctx *ctx, + const libxl_bitmap *cpuemap, + libxl_bitmap *nodemap); + static inline uint32_t libxl__sizekb_to_mb(uint32_t s) { return (s + 1023) / 1024; }
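A small (again hypothetical) sketch of how the new helpers can be combined: build the map of the cpus spanned by node #0. The caller-provided cpumap is assumed to be already allocated (e.g., with libxl_cpu_bitmap_alloc()); the helper name is invented and not part of the series.

/* Sketch only, not part of the series. */
static int example_cpus_of_node0(libxl_ctx *ctx, libxl_bitmap *cpumap)
{
    libxl_bitmap nodemap;
    int rc;

    libxl_bitmap_init(&nodemap);
    rc = libxl_node_bitmap_alloc(ctx, &nodemap, 0);  /* 0 == all host nodes */
    if (rc)
        return rc;

    libxl_bitmap_set_none(&nodemap);
    libxl_bitmap_set(&nodemap, 0);                   /* only node #0 */

    rc = libxl_nodemap_to_cpumap(ctx, &nodemap, cpumap);

    libxl_bitmap_dispose(&nodemap);
    return rc;
}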
Dario Faggioli
2012-Jul-04 16:18 UTC
[PATCH 07 of 10 v3] libxl: explicitly check for libmath in autoconf
# HG changeset patch # User Dario Faggioli <raistlin@linux.it> # Date 1341416324 -7200 # Node ID 6fd693e7f3bc8b4d9bd20befff2c13de5591a7c5 # Parent 3b65112bedc0656512312e29b89652f1c4ca0083 libxl: explicitly check for libmath in autoconf As well as explicitly add -lm to libxl''s Makefile. This is because next patch uses floating point arithmetic, and it is better to state it clearly that we need libmath (just in case we find a libc that wants that to be explicitly enforced). Notice that autoconf should be rerun after applying this change. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> diff --git a/tools/configure.ac b/tools/configure.ac --- a/tools/configure.ac +++ b/tools/configure.ac @@ -133,6 +133,7 @@ AC_CHECK_LIB([lzo2], [lzo1x_decompress], AC_SUBST(zlib) AC_CHECK_LIB([aio], [io_setup], [system_aio="y"], [system_aio="n"]) AC_SUBST(system_aio) +AC_CHECK_LIB([m], [isnan], [], [AC_MSG_ERROR([Could not find libmath])]) AC_CHECK_LIB([crypto], [MD5], [], [AC_MSG_ERROR([Could not find libcrypto])]) AC_CHECK_LIB([ext2fs], [ext2fs_open2], [libext2fs="y"], [libext2fs="n"]) AC_SUBST(libext2fs) diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile --- a/tools/libxl/Makefile +++ b/tools/libxl/Makefile @@ -61,7 +61,7 @@ ifeq ($(BISON),) scanners, please install it an rerun configure) endif -LIBXL_LIBS += -lyajl +LIBXL_LIBS += -lyajl -lm LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \ libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \
Dario Faggioli
2012-Jul-04 16:18 UTC
[PATCH 08 of 10 v3] libxl: enable automatic placement of guests on NUMA nodes
# HG changeset patch # User Dario Faggioli <raistlin@linux.it> # Date 1341416324 -7200 # Node ID 7087d3622ee2051654c9e78fe4829da10c2d46f1 # Parent 6fd693e7f3bc8b4d9bd20befff2c13de5591a7c5 libxl: enable automatic placement of guests on NUMA nodes If a domain does not have a VCPU affinity, try to pin it automatically to some PCPUs. This is done taking into account the NUMA characteristics of the host. In fact, we look for a combination of host''s NUMA nodes with enough free memory and number of PCPUs for the new domain, and pin it to the VCPUs of those nodes. Once we know which ones, among all the possible combinations, represents valid placement candidates for a domain, use some heuistics for deciding which is the best. For instance, smaller candidates are considered to be better, both from the domain''s point of view (fewer memory spreading among nodes) and from the system as a whole point of view (fewer memoy fragmentation). In case of candidates of equal sizes (i.e., with the same number of nodes), the amount of free memory and the number of domain already assigned to their nodes are considered. Very often, candidates with greater amount of memory are the one we wants, as this is also good for keeping memory fragmentation under control. However, if the difference in how much free memory two candidates have, the number of assigned domains might be what decides which candidate wins. This all happens internally to libxl, and no API for driving the mechanism is provided for now. This matches what xend already does. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> --- Changes from v2: * lots of typos. * Clayfied some comments, as requested during (ijc''s) review. * Added some more information/reference for the combination generation algorithm. * nodemap_to_nodes_cpus() function renamed to nodemap_to_nr_cpus(). * libxl_bitmap_init() used to make sure we do not try to free random memory on failure paths of functions that allocates a libxl_bitmap. * Always invoke libxl__sort_numa_candidates(), even if we get there with just 1 candidate, as requested during review. * Simplified the if-s that check for input parameter consistency in libxl__get_numa_candidates() as requested during (gwd''s) review. * Comparison function for candidates changed so that it now provides total ordering, as requested during review. It is still using FP arithmetic, though. Also I think that just putting the difference between the amount of free memory and between the number of assigned domains of two candidates in a single formula (after normalizing and weighting them) is both clear and effective enough. * Function definitions moved to a numa specific source file (libxl_numa.c), as suggested during review. Changes from v1: * This patches incorporates the changes from both "libxl, xl: enable automatic placement of guests on NUMA nodes" and "libxl, xl: heuristics for reordering NUMA placement candidates" from v1. * The logic of the algorithm is basically the same as in v1, but the splitting of it in the various functions has been completely redesigned from scratch. * No public API for placement or candidate generation is now exposed, everything happens within libxl, as agreed during v1 review. * The relevant documentation have been moved near the actual functions and features. Also, the amount and (hopefully!) the quality of the documentation has been improved a lot, as requested. 
* All the comments about using the proper libxl facilities and helpers for allocations, etc., have been considered and applied. * This patch still bails out from NUMA optimizations if it find out cpupools are being utilized. It is next patch that makes the two things interact properly, as suggested during review. diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -111,8 +111,8 @@ created online and the remainder will be =item B<cpus="CPU-LIST"> -List of which cpus the guest is allowed to use. Default behavior is -`all cpus`. A C<CPU-LIST> may be specified as follows: +List of which cpus the guest is allowed to use. By default xl will (via +libxl) pick some cpus (see below). A C<CPU-LIST> may be specified as follows: =over 4 @@ -132,6 +132,12 @@ run on cpu #3 of the host. =back +If this option is not specified, libxl automatically tries to place the new +domain on the host''s NUMA nodes (provided the host has more than one NUMA +node) by pinning it to the cpus of those nodes. A heuristic approach is +utilized with the goals of maximizing performance for the domain and, at +the same time, achieving efficient utilization of the host''s CPUs and RAM. + =item B<cpu_weight=WEIGHT> A domain with a weight of 512 will get twice as much CPU as a domain diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile --- a/tools/libxl/Makefile +++ b/tools/libxl/Makefile @@ -66,7 +66,7 @@ LIBXL_LIBS += -lyajl -lm LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \ libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \ libxl_internal.o libxl_utils.o libxl_uuid.o \ - libxl_json.o libxl_aoutils.o \ + libxl_json.o libxl_aoutils.o libxl_numa.o \ libxl_save_callout.o _libxl_save_msgs_callout.o \ libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y) LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -98,6 +98,106 @@ out: return sched; } +/* Subtract two values and translate the result in [0, 1] */ +static double normalized_diff(double a, double b) +{ +#define max(a, b) (a > b ? a : b) + if (!a && a == b) + return 0.0; + return (a - b) / max(a, b); +} + +/* + * The NUMA placement candidates are reordered according to the following + * heuristics: + * - candidates involving fewer nodes come first. In case two (or + * more) candidates span the same number of nodes, + * - the amount of free memory and the number of domains assigned to the + * candidates are considered. In doing that, candidates with greater + * amount of free memory and fewer domains assigned to them are preferred, + * with free memory "weighting" three times as much as number of domains. + */ +static int numa_cmpf(const void *v1, const void *v2) +{ + const libxl__numa_candidate *c1 = v1; + const libxl__numa_candidate *c2 = v2; +#define sign(a) a > 0 ? 1 : a < 0 ? 
-1 : 0 + double freememkb_diff = normalized_diff(c2->free_memkb, c1->free_memkb); + double nrdomains_diff = normalized_diff(c1->nr_domains, c2->nr_domains); + + if (c1->nr_nodes != c2->nr_nodes) + return c1->nr_nodes - c2->nr_nodes; + + return sign(3*freememkb_diff + nrdomains_diff); +} + +/* The actual automatic NUMA placement routine */ +static int numa_place_domain(libxl__gc *gc, libxl_domain_build_info *info) +{ + int nr_candidates = 0; + libxl__numa_candidate *candidates = NULL; + libxl_bitmap candidate_nodemap; + libxl_cpupoolinfo *pinfo; + int nr_pools, rc = 0; + uint32_t memkb; + + libxl_bitmap_init(&candidate_nodemap); + + /* First of all, if cpupools are in use, better not to mess with them */ + pinfo = libxl_list_cpupool(CTX, &nr_pools); + if (!pinfo) + return ERROR_FAIL; + if (nr_pools > 1) { + LOG(NOTICE, "skipping NUMA placement as cpupools are in use"); + goto out; + } + + rc = libxl_domain_need_memory(CTX, info, &memkb); + if (rc) + goto out; + if (libxl_node_bitmap_alloc(CTX, &candidate_nodemap, 0)) { + rc = ERROR_FAIL; + goto out; + } + + /* Find all the candidates with enough free memory and at least + * as much pcpus as the domain has vcpus. */ + rc = libxl__get_numa_candidates(gc, memkb, info->max_vcpus, 0, 0, + &candidates, &nr_candidates); + if (rc) + goto out; + + LOG(DETAIL, "%d NUMA placement candidates found", nr_candidates); + + /* No suitable placement candidates. We just return without touching the + * domain''s info->cpumap. It will have affinity with all nodes/cpus. */ + if (nr_candidates == 0) { + LOG(NOTICE, "NUMA placement failed, performance might be affected"); + goto out; + } + + /* Bring the best candidate in front of the list --> candidates[0] */ + libxl__sort_numa_candidates(candidates, nr_candidates, numa_cmpf); + + /* + * At this point, the first candidate in the array is the one we want. + * Go for it by mapping its node map to the domain''s info->cpumap. + */ + libxl__numa_candidate_get_nodemap(gc, &candidates[0], &candidate_nodemap); + rc = libxl_nodemap_to_cpumap(CTX, &candidate_nodemap, &info->cpumap); + if (rc) + goto out; + + LOG(DETAIL, "NUMA placement candidate with %d nodes, %d cpus and " + "%"PRIu32" KB free selected", candidates[0].nr_nodes, + candidates[0].nr_cpus, candidates[0].free_memkb / 1024); + + out: + libxl_bitmap_dispose(&candidate_nodemap); + libxl_cpupoolinfo_list_free(pinfo, nr_pools); + return rc; +} + int libxl__build_pre(libxl__gc *gc, uint32_t domid, libxl_domain_build_info *info, libxl__domain_build_state *state) { @@ -107,7 +207,22 @@ int libxl__build_pre(libxl__gc *gc, uint uint32_t rtc_timeoffset; xc_domain_max_vcpus(ctx->xch, domid, info->max_vcpus); + + /* + * Check if the domain has any CPU affinity. If not, try to build up one. + * In case numa_place_domain() find at least a suitable candidate, it will + * affect info->cpumap accordingly; if it does not, it just leaves it + * as it is. This means (unless some weird error manifests) the subsequent + * call to libxl_set_vcpuaffinity_all() will do the actual placement, + * whatever that turns out to be. 
+ */ + if (libxl_bitmap_is_full(&info->cpumap)) { + int rc = numa_place_domain(gc, info); + if (rc) + return rc; + } libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap); + xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT); if (info->type == LIBXL_DOMAIN_TYPE_PV) xc_domain_set_memmap_limit(ctx->xch, domid, diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -2216,6 +2216,140 @@ static inline void libxl__ctx_unlock(lib #define CTX_LOCK (libxl__ctx_lock(CTX)) #define CTX_UNLOCK (libxl__ctx_unlock(CTX)) +/* + * Automatic NUMA placement + * + * These functions and data structures deal with the initial placement of a + * domain onto the host NUMA nodes. + * + * The key concept here is the one of "NUMA placement candidate", which is + * basically a set of nodes whose characteristics have been successfully + * checked against some specific requirements. More precisely, a candidate is + * the nodemap associated with one of the possible subset of the host NUMA + * nodes providing a certain amount of free memory, or a given number of cpus, + * or even both (depending in what the caller wants). For convenience of use, + * some of this information are stored within the candidate itself, instead of + * always being dynamically computed. A single node is a valid placement + * candidate, but it is also possible for a candidate to contain all the nodes + * of the host. The fewer nodes there are in a candidate, the better + * performance a domain placed onto it should get. For instance, looking for a + * numa candidates with 2GB of free memory means we want all the possible + * subsets of the host NUMA nodes with, cumulatively, at least 2GB of free + * memory. That could be possible by just using one particular node, or may + * require more nodes, depending on the characteristics of the host, on how + * many domains have been created already, on how big they are, etc. + * + * The intended usage is as follows: + * 1. by, fist of all, calling libxl__get_numa_candidates(), and specifying + * the proper constraints to it (e.g., the amount of memory a domain need + * as the minimum amount of free memory for the candidates) one can build + * up a whole set of suitable placing alternatives for a domain; + * 2. after that, one specific candidate should be chosen. That can happen + * by looking at their various characteristics; + * 3. the chosen candidate''s nodemap should be utilized for computing the + * actual affinity of the domain which, given the current NUMA support + * in the hypervisor, is what determines the placement of the domain''s + * vcpus and memory. + * + * To make phase 2 even easier, a sorting helper function for the list of + * candidates is provided in the form of libxl__sort_numa_candidates(). The + * only that is needed is defining a comparison function, containing the + * criteria for deciding, given two candidates, which one is ''better''. + * Depending on how the comparison function is defined, the best candidate + * (where, of course, best is defined with respect to the heuristics + * implemented in the comparison function itself, libxl__numa_candidate_cmpf()) + * could become the first or the last element of the list. + * + * Summarizing, achieving automatic NUMA placement is just a matter of + * obtaining the list of suitable placement candidates, perhaps asking for each + * of them to provide at least the amount of memory the domain needs. 
After + * that just implement a comparison function by means of the various helpers + * retrieving the relevant information about the candidates themselves. + * Finally, call the sorting helper function and use the candidate that became + * (typically) the first element of the list for determining the domain''s + * affinity. + */ + +typedef struct { + int nr_cpus, nr_nodes; + int nr_domains; + uint32_t free_memkb; + libxl_bitmap nodemap; +} libxl__numa_candidate; + +/* + * This generates the list of NUMA placement candidates satisfying some + * specific conditions. If min_nodes and/or max_nodes are not 0, their value is + * used to determine the minimum and maximum number of nodes that are allow to + * be present in each candidate. If min_nodes and/or max_nodes are 0, the + * minimum and maximum number of nodes to be used are automatically selected by + * the implementation (and that will likely be just 1 node for the minimum and + * the total number of existent nodes for the maximum). Re min_free_memkb and + * min_cpu, if not 0, it means the caller only wants candidates with at + * least that amount of free memory and that number of cpus, respectively. If + * min_free_memkb and/or min_cpus are 0, the candidates'' free memory and number + * of cpus won''t be checked at all, which means a candidate will always be + * considered suitable wrt the specific constraint. cndts is where the list of + * exactly nr_cndts candidates is returned. Note that, in case no candidates + * are found at all, the function returns successfully, but with nr_cndts equal + * to zero. + */ +_hidden int libxl__get_numa_candidates(libxl__gc *gc, + uint32_t min_free_memkb, int min_cpus, + int min_nodes, int max_nodes, + libxl__numa_candidate *cndts[], int *nr_cndts); + +/* Initialization, allocation and deallocation for placement candidates */ +static inline void libxl__numa_candidate_init(libxl__numa_candidate *cndt) +{ + cndt->free_memkb = 0; + cndt->nr_cpus = cndt->nr_nodes = cndt->nr_domains = 0; + libxl_bitmap_init(&cndt->nodemap); +} + +static inline int libxl__numa_candidate_alloc(libxl__gc *gc, + libxl__numa_candidate *cndt) +{ + return libxl_node_bitmap_alloc(CTX, &cndt->nodemap, 0); +} +static inline void libxl__numa_candidate_dispose(libxl__numa_candidate *cndt) +{ + libxl_bitmap_dispose(&cndt->nodemap); +} +static inline void libxl__numacandidate_list_free(libxl__numa_candidate *cndts, + int nr_cndts) +{ + int i; + + for (i = 0; i < nr_cndts; i++) + libxl__numa_candidate_dispose(&cndts[i]); + free(cndts); +} + +/* Retrieve (in nodemap) the node map associated to placement candidate cndt */ +static inline +void libxl__numa_candidate_get_nodemap(libxl__gc *gc, + const libxl__numa_candidate *cndt, + libxl_bitmap *nodemap) +{ + libxl_bitmap_copy(CTX, nodemap, &cndt->nodemap); +} +/* Set the node map of placement candidate cndt to match nodemap */ +static inline +void libxl__numa_candidate_put_nodemap(libxl__gc *gc, + libxl__numa_candidate *cndt, + const libxl_bitmap *nodemap) +{ + libxl_bitmap_copy(CTX, &cndt->nodemap, nodemap); +} + +/* Signature for the comparison function between two candidates c1 and c2 */ +typedef int (*libxl__numa_candidate_cmpf)(const void *v1, const void *v2); +/* Sort the list of candidates in cndts (an array with nr_cndts elements in + * it) using cmpf for comparing two candidates. Uses libc''s qsort(). */ +_hidden void libxl__sort_numa_candidates(libxl__numa_candidate cndts[], + int nr_cndts, + libxl__numa_candidate_cmpf cmpf); /* * Inserts "elm_new" into the sorted list "head". 
diff --git a/tools/libxl/libxl_numa.c b/tools/libxl/libxl_numa.c new file mode 100644 --- /dev/null +++ b/tools/libxl/libxl_numa.c @@ -0,0 +1,382 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * Author Dario Faggioli <dario.faggioli@citrix.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as published + * by the Free Software Foundation; version 2.1 only. with the special + * exception on linking described in file LICENSE. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + */ + +#include "libxl_osdeps.h" /* must come before any other headers */ + +#include <glob.h> + +#include "libxl_internal.h" + +/* + * What follows are helpers for generating all the k-combinations + * without repetitions of a set S with n elements in it. Formally + * speaking, they are subsets of k distinct elements of S and, if + * S is n elements big, the number of k-combinations is equal to + * the binomial coefficient (n k), which means we get exactly + * n!/(k! * (n - k)!) subsets, all of them with k elements. + * + * The various subset are generated one after the other by calling + * comb_init() first, and, after that, comb_next() + * (n k)-1 times. An iterator is used to store the current status + * of the whole generation operation (i.e., basically, the last + * combination that has been generated). As soon as all + * combinations have been generated, comb_next() will + * start returning 0 instead of 1. It is of course important that + * the same instance of the iterator and the same values for + * n and k are used for each call. If that doesn''t happen, the + * result is unspecified. + * + * The algorithm is a well known one (see, for example, D. Knuth''s "The + * Art of Computer Programming - Volume 4, Fascicle 3" and it produces + * the combinations in such a way that they (well, more precisely, + * their indexes it the array/map representing the set) come with + * lexicographic ordering. + * + * For example, with n = 5 and k = 3, calling comb_init() + * will generate { 0, 1, 2 }, while subsequent valid calls to + * comb_next() will produce the following: + * { { 0, 1, 3 }, { 0, 1, 4 }, + * { 0, 2, 3 }, { 0, 2, 4 }, { 0, 3, 4 }, + * { 1, 2, 3 }, { 1, 2, 4 }, { 1, 3, 4 }, + * { 2, 3, 4 } } + * + * This is used by the automatic NUMA placement logic below. + */ +typedef int* comb_iter_t; + +static int comb_init(libxl__gc *gc, comb_iter_t *it, int n, int k) +{ + comb_iter_t new_iter; + int i; + + if (n < k) + return 0; + + /* First set is always { 0, 1, 2, ..., k-1 } */ + GCNEW_ARRAY(new_iter, k); + for (i = 0; i < k; i++) + new_iter[i] = i; + + *it = new_iter; + return 1; +} + +static int comb_next(comb_iter_t it, int n, int k) +{ + int i; + + /* + * The idea here is to find the leftmost element from where + * we should start incrementing the indexes of the iterator. + * This means looking for the highest index that can be increased + * while still producing value smaller than n-1. In the example + * above, when dealing with { 0, 1, 4 }, such an element is the + * second one, as the third is already equal to 4 (which actually + * is n-1). 
+ * Once we found from where to start, we increment that element + * and override the right-hand rest of the iterator with its + * successors, thus achieving lexicographic ordering. + * + * Regarding the termination of the generation process, when we + * manage in bringing n-k at the very first position of the iterator, + * we know that is the last valid combination ( { 2, 3, 4 }, with + * n - k = 5 - 2 = 2, in the example above), and thus we start + * returning 0 as soon as we cross that border. + */ + for (i = k - 1; it[i] == n - k + i; i--) { + if (i <= 0) + return 0; + } + for (it[i]++, i++; i < k; i++) + it[i] = it[i - 1] + 1; + return 1; +} + +/* NUMA automatic placement (see libxl_internal.h for details) */ + +/* + * This function turns a k-combination iterator into a node map. + * This means the bits in the node map corresponding to the indexes + * of the given combination are the ones that will be set. + * For example, if the iterator represents the combination { 0, 2, 4}, + * the node map will have bits #0, #2 and #4 set. + */ +static void comb_get_nodemap(comb_iter_t it, libxl_bitmap *nodemap, int k) +{ + int i; + + libxl_bitmap_set_none(nodemap); + for (i = 0; i < k; i++) + libxl_bitmap_set(nodemap, it[i]); +} + +/* Retrieve the number of cpus that the nodes that are part of the nodemap + * span. */ +static int nodemap_to_nr_cpus(libxl_cputopology *tinfo, int nr_cpus, + const libxl_bitmap *nodemap) +{ + int i, nodes_cpus = 0; + + for (i = 0; i < nr_cpus; i++) { + if (libxl_bitmap_test(nodemap, tinfo[i].node)) + nodes_cpus++; + } + return nodes_cpus; +} + +/* Retrieve the amount of free memory within the nodemap */ +static uint32_t nodemap_to_free_memkb(libxl_numainfo *ninfo, + libxl_bitmap *nodemap) +{ + uint32_t free_memkb = 0; + int i; + + libxl_for_each_set_bit(i, *nodemap) + free_memkb += ninfo[i].free / 1024; + + return free_memkb; +} + +/* Retrieve the number of domains that can potentially run on the cpus + * the nodes that are part of the nodemap. */ +static int nodemap_to_nr_domains(libxl__gc *gc, libxl_cputopology *tinfo, + const libxl_bitmap *nodemap) +{ + libxl_dominfo *dinfo = NULL; + libxl_bitmap dom_nodemap; + int nr_doms, nr_cpus; + int nr_domains = 0; + int i, j, k; + + dinfo = libxl_list_domain(CTX, &nr_doms); + if (dinfo == NULL) + return ERROR_FAIL; + + if (libxl_node_bitmap_alloc(CTX, &dom_nodemap, 0) < 0) { + libxl_dominfo_list_free(dinfo, nr_doms); + return ERROR_FAIL; + } + + for (i = 0; i < nr_doms; i++) { + libxl_vcpuinfo *vinfo; + int nr_vcpus; + + vinfo = libxl_list_vcpu(CTX, dinfo[i].domid, &nr_vcpus, &nr_cpus); + if (vinfo == NULL) + continue; + + libxl_bitmap_set_none(&dom_nodemap); + for (j = 0; j < nr_vcpus; j++) { + libxl_for_each_set_bit(k, vinfo[j].cpumap) + libxl_bitmap_set(&dom_nodemap, tinfo[k].node); + } + + libxl_for_each_set_bit(j, dom_nodemap) { + if (libxl_bitmap_test(nodemap, j)) { + nr_domains++; + break; + } + } + + libxl_vcpuinfo_list_free(vinfo, nr_vcpus); + } + + libxl_bitmap_dispose(&dom_nodemap); + libxl_dominfo_list_free(dinfo, nr_doms); + return nr_domains; +} + +/* + * This function tries to figure out if the host has a consistent number + * of cpus along all its NUMA nodes. In fact, if that is the case, we can + * calculate the minimum number of nodes needed for a domain by just + * dividing its total number of vcpus by this value computed here. + * However, we are not allowed to assume that all the nodes have the + * same number of cpus. 
Therefore, in case discrepancies among different + * nodes are found, this function just returns 0, for the caller to know + * it shouldn''t rely on this ''optimization'', and sort out things in some + * other way (most likely, just start trying with candidates with just + * one node). + */ +static int cpus_per_node_count(libxl_cputopology *tinfo, int nr_cpus, + libxl_numainfo *ninfo, int nr_nodes) +{ + int cpus_per_node = 0; + int j, i; + + /* This makes sense iff # of PCPUs is the same for all nodes */ + for (j = 0; j < nr_nodes; j++) { + int curr_cpus = 0; + + for (i = 0; i < nr_cpus; i++) { + if (tinfo[i].node == j) + curr_cpus++; + } + /* So, if the above does not hold, turn the whole thing off! */ + cpus_per_node = cpus_per_node == 0 ? curr_cpus : cpus_per_node; + if (cpus_per_node != curr_cpus) + return 0; + } + return cpus_per_node; +} + +/* Get all the placement candidates satisfying some specific conditions */ +int libxl__get_numa_candidates(libxl__gc *gc, + uint32_t min_free_memkb, int min_cpus, + int min_nodes, int max_nodes, + libxl__numa_candidate *cndts[], int *nr_cndts) +{ + libxl__numa_candidate *new_cndts = NULL; + libxl_cputopology *tinfo = NULL; + libxl_numainfo *ninfo = NULL; + int nr_nodes = 0, nr_cpus = 0; + libxl_bitmap nodemap; + int array_size, rc; + + libxl_bitmap_init(&nodemap); + + /* Get platform info and prepare the map for testing the combinations */ + ninfo = libxl_get_numainfo(CTX, &nr_nodes); + if (ninfo == NULL) + return ERROR_FAIL; + /* If we don''t have at least 2 nodes, it is useless to proceed */ + if (nr_nodes < 2) { + rc = 0; + goto out; + } + + tinfo = libxl_get_cpu_topology(CTX, &nr_cpus); + if (tinfo == NULL) { + rc = ERROR_FAIL; + goto out; + } + + rc = libxl_node_bitmap_alloc(CTX, &nodemap, 0); + if (rc) + goto out; + + /* + * If the minimum number of NUMA nodes is not explicitly specified + * (i.e., min_nodes == 0), we try to figure out a sensible number of nodes + * from where to start generating candidates, if possible (or just start + * from 1 otherwise). The maximum number of nodes should not exceed the + * number of existent NUMA nodes on the host, or the candidate generation + * won''t work properly. + */ + if (!min_nodes) { + int cpus_per_node; + + cpus_per_node = cpus_per_node_count(tinfo, nr_cpus, ninfo, nr_nodes); + if (cpus_per_node == 0) + min_nodes = 1; + else + min_nodes = (min_cpus + cpus_per_node - 1) / cpus_per_node; + } + if (min_nodes > nr_nodes) + min_nodes = nr_nodes; + if (!max_nodes || max_nodes > nr_nodes) + max_nodes = nr_nodes; + if (min_nodes > max_nodes) { + rc = ERROR_INVAL; + goto out; + } + + /* Initialize the local storage for the combinations */ + *nr_cndts = 0; + array_size = nr_nodes; + GCNEW_ARRAY(new_cndts, array_size); + + /* Generate all the combinations of any size from min_nodes to + * max_nodes (see comb_init() and comb_next()). */ + while (min_nodes <= max_nodes) { + comb_iter_t comb_iter; + int comb_ok; + + /* + * And here it is. Each step of this cycle generates a combination of + * nodes as big as min_nodes mandates. Each of these combinations is + * checked against the constraints provided by the caller (namely, + * amount of free memory and number of cpus) and it becomes an actual + * placement candidate iff it passes the check. 
+ */ + for (comb_ok = comb_init(gc, &comb_iter, nr_nodes, min_nodes); comb_ok; + comb_ok = comb_next(comb_iter, nr_nodes, min_nodes)) { + uint32_t nodes_free_memkb; + int nodes_cpus; + + comb_get_nodemap(comb_iter, &nodemap, min_nodes); + + /* If there is not enough memory in this combination, skip it + * and go generating the next one... */ + nodes_free_memkb = nodemap_to_free_memkb(ninfo, &nodemap); + if (min_free_memkb && nodes_free_memkb < min_free_memkb) + continue; + + /* And the same applies if this combination is short in cpus */ + nodes_cpus = nodemap_to_nr_cpus(tinfo, nr_cpus, &nodemap); + if (min_cpus && nodes_cpus < min_cpus) + continue; + + /* + * Conditions are met, we can add this combination to the + * NUMA placement candidates list. We first make sure there + * is enough space in there, and then we initialize the new + * candidate element with the node map corresponding to the + * combination we are dealing with. Memory allocation for + * expanding the array that hosts the list happens in chunks + * equal to the number of NUMA nodes in the system (to + * avoid allocating memory each and every time we find a + * new candidate). + */ + if (*nr_cndts == array_size) + array_size += nr_nodes; + GCREALLOC_ARRAY(new_cndts, array_size); + + libxl__numa_candidate_alloc(gc, &new_cndts[*nr_cndts]); + libxl__numa_candidate_put_nodemap(gc, &new_cndts[*nr_cndts], + &nodemap); + new_cndts[*nr_cndts].nr_domains + nodemap_to_nr_domains(gc, tinfo, &nodemap); + new_cndts[*nr_cndts].free_memkb = nodes_free_memkb; + new_cndts[*nr_cndts].nr_nodes = min_nodes; + new_cndts[*nr_cndts].nr_cpus = nodes_cpus; + + LOG(DEBUG, "NUMA placement candidate #%d found: nr_nodes=%d, " + "nr_cpus=%d, free_memkb=%"PRIu32"", *nr_cndts, + min_nodes, new_cndts[*nr_cndts].nr_cpus, + new_cndts[*nr_cndts].free_memkb / 1024); + + (*nr_cndts)++; + } + min_nodes++; + } + + *cndts = new_cndts; + out: + libxl_bitmap_dispose(&nodemap); + libxl_cputopology_list_free(tinfo, nr_cpus); + libxl_numainfo_list_free(ninfo, nr_nodes); + return rc; +} + +void libxl__sort_numa_candidates(libxl__numa_candidate cndts[], int nr_cndts, + libxl__numa_candidate_cmpf cmpf) +{ + /* Reorder candidates (see the comparison function for + * the details on the heuristics) */ + qsort(cndts, nr_cndts, sizeof(cndts[0]), cmpf); +} + +
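For anyone wanting to experiment with the combination walk outside of libxl, here is a rough standalone rendition of the same algorithm (the toy_ names are made up and plain malloc() replaces the gc; it mirrors, but is not, the code above). Compiled and run as-is, it prints the ten 3-combinations of { 0, ..., 4 } in the lexicographic order described in the comment.

#include <stdio.h>
#include <stdlib.h>

/* Standalone toy version of comb_init()/comb_next(), for experimenting. */
static int toy_comb_init(int **it, int n, int k)
{
    int i, *new_iter;

    if (n < k)
        return 0;
    new_iter = malloc(k * sizeof(*new_iter));
    if (!new_iter)
        return 0;
    for (i = 0; i < k; i++)       /* first set is { 0, 1, ..., k-1 } */
        new_iter[i] = i;
    *it = new_iter;
    return 1;
}

static int toy_comb_next(int *it, int n, int k)
{
    int i;

    /* find the rightmost index that can still grow, then reset its tail */
    for (i = k - 1; it[i] == n - k + i; i--)
        if (i <= 0)
            return 0;
    for (it[i]++, i++; i < k; i++)
        it[i] = it[i - 1] + 1;
    return 1;
}

int main(void)
{
    int *it = NULL, ok, i, n = 5, k = 3;

    for (ok = toy_comb_init(&it, n, k); ok; ok = toy_comb_next(it, n, k)) {
        for (i = 0; i < k; i++)
            printf("%d ", it[i]);
        printf("\n");
    }
    free(it);
    return 0;
}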
Dario Faggioli
2012-Jul-04 16:18 UTC
[PATCH 09 of 10 v3] libxl: have NUMA placement deal with cpupools
# HG changeset patch # User Dario Faggioli <raistlin@linux.it> # Date 1341416324 -7200 # Node ID 885e2f385601d66179058bfb6bd3960f17d5e068 # Parent 7087d3622ee2051654c9e78fe4829da10c2d46f1 libxl: have NUMA placement deal with cpupools In such a way that only the cpus belonging to the cpupool of the domain being placed are considered for the placement itself. This happens by filtering out all the nodes in which the cpupool has not any cpu from the placement candidates. After that -- as a cpu pooling not necessarily happens at NUMA nodes boundaries -- we also make sure only the actual cpus that are part of the pool are considered when counting how much processors a placement candidate is able to provide. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> --- Changes from v2: * fixed typos in comments. diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -132,25 +132,29 @@ static int numa_cmpf(const void *v1, con } /* The actual automatic NUMA placement routine */ -static int numa_place_domain(libxl__gc *gc, libxl_domain_build_info *info) +static int numa_place_domain(libxl__gc *gc, uint32_t domid, + libxl_domain_build_info *info) { int nr_candidates = 0; libxl__numa_candidate *candidates = NULL; libxl_bitmap candidate_nodemap; - libxl_cpupoolinfo *pinfo; - int nr_pools, rc = 0; + libxl_cpupoolinfo cpupool_info; + int i, cpupool, rc = 0; uint32_t memkb; libxl_bitmap_init(&candidate_nodemap); - /* First of all, if cpupools are in use, better not to mess with them */ - pinfo = libxl_list_cpupool(CTX, &nr_pools); - if (!pinfo) - return ERROR_FAIL; - if (nr_pools > 1) { - LOG(NOTICE, "skipping NUMA placement as cpupools are in use"); - goto out; - } + /* + * Extract the cpumap from the cpupool the domain belong to. In fact, + * it only makes sense to consider the cpus/nodes that are in there + * for placement. + */ + rc = cpupool = libxl__domain_cpupool(gc, domid); + if (rc < 0) + return rc; + rc = libxl_cpupool_info(CTX, &cpupool_info, cpupool); + if (rc) + return rc; rc = libxl_domain_need_memory(CTX, info, &memkb); if (rc) @@ -162,7 +166,8 @@ static int numa_place_domain(libxl__gc * /* Find all the candidates with enough free memory and at least * as much pcpus as the domain has vcpus. */ - rc = libxl__get_numa_candidates(gc, memkb, info->max_vcpus, 0, 0, + rc = libxl__get_numa_candidates(gc, memkb, info->max_vcpus, + 0, 0, &cpupool_info.cpumap, &candidates, &nr_candidates); if (rc) goto out; @@ -188,13 +193,20 @@ static int numa_place_domain(libxl__gc * if (rc) goto out; + /* Avoid trying to set the affinity to cpus that might be in the + * nodemap but not in our cpupool. */ + libxl_for_each_set_bit(i, info->cpumap) { + if (!libxl_bitmap_test(&cpupool_info.cpumap, i)) + libxl_bitmap_reset(&info->cpumap, i); + } + LOG(DETAIL, "NUMA placement candidate with %d nodes, %d cpus and " "%"PRIu32" KB free selected", candidates[0].nr_nodes, candidates[0].nr_cpus, candidates[0].free_memkb / 1024); out: libxl_bitmap_dispose(&candidate_nodemap); - libxl_cpupoolinfo_list_free(pinfo, nr_pools); + libxl_cpupoolinfo_dispose(&cpupool_info); return rc; } @@ -217,7 +229,7 @@ int libxl__build_pre(libxl__gc *gc, uint * whatever that turns out to be. 
*/ if (libxl_bitmap_is_full(&info->cpumap)) { - int rc = numa_place_domain(gc, info); + int rc = numa_place_domain(gc, domid, info); if (rc) return rc; } diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -2289,14 +2289,17 @@ typedef struct { * least that amount of free memory and that number of cpus, respectively. If * min_free_memkb and/or min_cpus are 0, the candidates'' free memory and number * of cpus won''t be checked at all, which means a candidate will always be - * considered suitable wrt the specific constraint. cndts is where the list of - * exactly nr_cndts candidates is returned. Note that, in case no candidates - * are found at all, the function returns successfully, but with nr_cndts equal - * to zero. + * considered suitable wrt the specific constraint. suitable_cpumap is useful + * for specifying we want only the cpus in that mask to be considered while + * generating placement candidates (for example because of cpupools). cndts is + * where the list of exactly nr_cndts candidates is returned. Note that, in + * case no candidates are found at all, the function returns successfully, but + * with nr_cndts equal to zero. */ _hidden int libxl__get_numa_candidates(libxl__gc *gc, uint32_t min_free_memkb, int min_cpus, int min_nodes, int max_nodes, + const libxl_bitmap *suitable_cpumap, libxl__numa_candidate *cndts[], int *nr_cndts); /* Initialization, allocation and deallocation for placement candidates */ diff --git a/tools/libxl/libxl_numa.c b/tools/libxl/libxl_numa.c --- a/tools/libxl/libxl_numa.c +++ b/tools/libxl/libxl_numa.c @@ -122,15 +122,27 @@ static void comb_get_nodemap(comb_iter_t libxl_bitmap_set(nodemap, it[i]); } +/* Retrieve how many nodes a nodemap spans */ +static int nodemap_to_nr_nodes(const libxl_bitmap *nodemap) +{ + int i, nr_nodes = 0; + + libxl_for_each_set_bit(i, *nodemap) + nr_nodes++; + return nr_nodes; +} + /* Retrieve the number of cpus that the nodes that are part of the nodemap - * span. */ + * span and are also set in suitable_cpumap. */ static int nodemap_to_nr_cpus(libxl_cputopology *tinfo, int nr_cpus, + const libxl_bitmap *suitable_cpumap, const libxl_bitmap *nodemap) { int i, nodes_cpus = 0; for (i = 0; i < nr_cpus; i++) { - if (libxl_bitmap_test(nodemap, tinfo[i].node)) + if (libxl_bitmap_test(suitable_cpumap, i) && + libxl_bitmap_test(nodemap, tinfo[i].node)) nodes_cpus++; } return nodes_cpus; @@ -236,13 +248,14 @@ static int cpus_per_node_count(libxl_cpu int libxl__get_numa_candidates(libxl__gc *gc, uint32_t min_free_memkb, int min_cpus, int min_nodes, int max_nodes, + const libxl_bitmap *suitable_cpumap, libxl__numa_candidate *cndts[], int *nr_cndts) { libxl__numa_candidate *new_cndts = NULL; libxl_cputopology *tinfo = NULL; libxl_numainfo *ninfo = NULL; int nr_nodes = 0, nr_cpus = 0; - libxl_bitmap nodemap; + libxl_bitmap suitable_nodemap, nodemap; int array_size, rc; libxl_bitmap_init(&nodemap); @@ -267,6 +280,15 @@ int libxl__get_numa_candidates(libxl__gc if (rc) goto out; + /* Allocate and prepare the map of the node that can be utilized for + * placement, basing on the map of suitable cpus. 
*/ + rc = libxl_node_bitmap_alloc(CTX, &suitable_nodemap, 0); + if (rc) + goto out; + rc = libxl_cpumap_to_nodemap(CTX, suitable_cpumap, &suitable_nodemap); + if (rc) + goto out; + /* * If the minimum number of NUMA nodes is not explicitly specified * (i.e., min_nodes == 0), we try to figure out a sensible number of nodes @@ -314,9 +336,14 @@ int libxl__get_numa_candidates(libxl__gc for (comb_ok = comb_init(gc, &comb_iter, nr_nodes, min_nodes); comb_ok; comb_ok = comb_next(comb_iter, nr_nodes, min_nodes)) { uint32_t nodes_free_memkb; - int nodes_cpus; + int i, nodes_cpus; + /* Get the nodemap for the combination and filter unwanted nodes */ comb_get_nodemap(comb_iter, &nodemap, min_nodes); + libxl_for_each_set_bit(i, nodemap) { + if (!libxl_bitmap_test(&suitable_nodemap, i)) + libxl_bitmap_reset(&nodemap, i); + } /* If there is not enough memory in this combination, skip it * and go generating the next one... */ @@ -325,7 +352,8 @@ int libxl__get_numa_candidates(libxl__gc continue; /* And the same applies if this combination is short in cpus */ - nodes_cpus = nodemap_to_nr_cpus(tinfo, nr_cpus, &nodemap); + nodes_cpus = nodemap_to_nr_cpus(tinfo, nr_cpus, suitable_cpumap, + &nodemap); if (min_cpus && nodes_cpus < min_cpus) continue; @@ -350,12 +378,13 @@ int libxl__get_numa_candidates(libxl__gc new_cndts[*nr_cndts].nr_domains nodemap_to_nr_domains(gc, tinfo, &nodemap); new_cndts[*nr_cndts].free_memkb = nodes_free_memkb; - new_cndts[*nr_cndts].nr_nodes = min_nodes; + new_cndts[*nr_cndts].nr_nodes = nodemap_to_nr_nodes(&nodemap); new_cndts[*nr_cndts].nr_cpus = nodes_cpus; LOG(DEBUG, "NUMA placement candidate #%d found: nr_nodes=%d, " "nr_cpus=%d, free_memkb=%"PRIu32"", *nr_cndts, - min_nodes, new_cndts[*nr_cndts].nr_cpus, + new_cndts[*nr_cndts].nr_nodes, + new_cndts[*nr_cndts].nr_cpus, new_cndts[*nr_cndts].free_memkb / 1024); (*nr_cndts)++; @@ -365,6 +394,7 @@ int libxl__get_numa_candidates(libxl__gc *cndts = new_cndts; out: + libxl_bitmap_dispose(&suitable_nodemap); libxl_bitmap_dispose(&nodemap); libxl_cputopology_list_free(tinfo, nr_cpus); libxl_numainfo_list_free(ninfo, nr_nodes);
Dario Faggioli
2012-Jul-04 16:18 UTC
[PATCH 10 of 10 v3] Some automatic NUMA placement documentation
# HG changeset patch # User Dario Faggioli <raistlin@linux.it> # Date 1341416324 -7200 # Node ID f1523c3dc63746e07b11fada5be3d461c3807256 # Parent 885e2f385601d66179058bfb6bd3960f17d5e068 Some automatic NUMA placement documentation About rationale, usage and (some small bits of) API. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Changes from v1: * API documentation moved close to the actual functions. diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown new file mode 100644 --- /dev/null +++ b/docs/misc/xl-numa-placement.markdown @@ -0,0 +1,91 @@ +# Guest Automatic NUMA Placement in libxl and xl # + +## Rationale ## + +NUMA means the memory accessing times of a program running on a CPU depends on +the relative distance between that CPU and that memory. In fact, most of the +NUMA systems are built in such a way that each processor has its local memory, +on which it can operate very fast. On the other hand, getting and storing data +from and on remote memory (that is, memory local to some other processor) is +quite more complex and slow. On these machines, a NUMA node is usually defined +as a set of processor cores (typically a physical CPU package) and the memory +directly attached to the set of cores. + +The Xen hypervisor deals with Non-Uniform Memory Access (NUMA]) machines by +assigning to its domain a "node affinity", i.e., a set of NUMA nodes of the +host from which it gets its memory allocated. + +NUMA awareness becomes very important as soon as many domains start running +memory-intensive workloads on a shared host. In fact, the cost of accessing non +node-local memory locations is very high, and the performance degradation is +likely to be noticeable. + +## Guest Placement in xl ## + +If using xl for creating and managing guests, it is very easy to ask for both +manual or automatic placement of them across the host''s NUMA nodes. + +Note that xm/xend does the very same thing, the only differences residing in +the details of the heuristics adopted for the placement (see below). + +### Manual Guest Placement with xl ### + +Thanks to the "cpus=" option, it is possible to specify where a domain should +be created and scheduled on, directly in its config file. This affects NUMA +placement and memory accesses as the hypervisor constructs the node affinity of +a VM basing right on its CPU affinity when it is created. + +This is very simple and effective, but requires the user/system administrator +to explicitly specify affinities for each and every domain, or Xen won''t be +able to guarantee the locality for their memory accesses. + +It is also possible to deal with NUMA by partitioning the system using cpupools +(available in the upcoming release of Xen, 4.2). Again, this could be "The +Right Answer" for many needs and occasions, but has to to be carefully +considered and manually setup by hand. + +### Automatic Guest Placement with xl ### + +In case no "cpus=" option is specified in the config file, libxl tries to +figure out on its own on which node(s) the domain could fit best. It is +worthwhile noting that optimally fitting a set of VMs on the NUMA nodes of an +host host is an incarnation of the Bin Packing Problem. In fact, the various +VMs with different memory sizes are the items to be packed, and the host nodes +are the bins. That is known to be NP-hard, thus, it is probably better to +tackle the problem with some sort of hauristics, as we do not have any oracle +available! 
+ +The first thing to do is to find a node, or even a set of nodes, that have +enough free memory and enough physical CPUs for accommodating the new +domain. The idea is to find a spot for the domain with at least as much free +memory as it has configured, and as many pCPUs as it has vCPUs. After that, +the actual decision on which solution to go for happens according to the +following heuristics: + + * candidates involving fewer nodes come first. In case two (or more) + candidates span the same number of nodes, + * the amount of free memory and the number of domains assigned to the + candidates are considered. In doing that, candidates with a greater amount + of free memory and fewer assigned domains are preferred, with free memory + "weighting" three times as much as number of domains. + +Giving preference to small candidates ensures better performance for the guest, +as it avoids spreading its memory among different nodes. Favouring the nodes +that have the biggest amounts of free memory helps keep the memory +fragmentation small, from a system-wide perspective. However, in case more +candidates fulfil these criteria to roughly the same extent, considering the number +of domains the candidates are "hosting" helps balance the load on the various +nodes. + +## Guest Placement within libxl ## + +xl achieves automatic NUMA placement just because libxl does it internally. +No API is provided (yet) for interacting with this feature and modifying +the library behaviour regarding automatic placement; it just happens +by default if no affinity is specified (as it is with xm/xend). + +For actually looking at (and maybe tweaking) the mechanism and the algorithms it +uses, everything is implemented as a set of libxl internal interfaces and facilities. +Look at the comment "Automatic NUMA placement" in libxl\_internal.h. + +Note this may change in future versions of Xen/libxl.
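For illustration only, here is a minimal, self-contained sketch of the ordering the two bullets above describe. The struct and field names are hypothetical; the real implementation lives behind libxl internal interfaces (see libxl_internal.h):

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical, simplified candidate descriptor: just the three
     * quantities the document says the heuristics look at. */
    struct candidate {
        int nr_nodes;       /* how many NUMA nodes the candidate spans */
        double free_memkb;  /* free memory on those nodes */
        double nr_domains;  /* domains already assigned to those nodes */
    };

    /* (x - y) rescaled into [-1, 1], so that memory sizes and domain
     * counts can be summed together. */
    static double norm_diff(double x, double y)
    {
        double m = x > y ? x : y;
        return m ? (x - y) / m : 0.0;
    }

    /* qsort()-style comparison implementing the ordering described above:
     * fewer nodes first; ties broken by free memory (weight 3) against the
     * number of already-assigned domains (weight 1). */
    static int candidate_cmp(const void *v1, const void *v2)
    {
        const struct candidate *a = v1, *b = v2;
        double score;

        if (a->nr_nodes != b->nr_nodes)
            return a->nr_nodes - b->nr_nodes;

        score = 3 * norm_diff(b->free_memkb, a->free_memkb) +
                norm_diff(a->nr_domains, b->nr_domains);
        return score > 0 ? 1 : score < 0 ? -1 : 0;
    }

    int main(void)
    {
        struct candidate c[] = {
            { 2, 16 << 20, 1 },  /* spans 2 nodes                        */
            { 1,  4 << 20, 3 },  /* 1 node, less memory, more domains    */
            { 1,  8 << 20, 2 },  /* 1 node, most memory for a single node */
        };

        qsort(c, 3, sizeof(c[0]), candidate_cmp);
        printf("best: %d node(s), %.0f kB free, %.0f domains\n",
               c[0].nr_nodes, c[0].free_memkb, c[0].nr_domains);
        return 0;
    }

Sorting an array of such candidates with this comparison function leaves the preferred candidate in the first slot.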
Dario Faggioli
2012-Jul-04 16:21 UTC
Re: [PATCH 00 of 10 v3] Automatic NUMA placement for xl
On Wed, 2012-07-04 at 18:02 +0200, Dario Faggioli wrote:> Hello, >Sorry for this! I don't have the slightest idea of why `hg email' hung after sending only these 3 mails. :-( Anyway, my second attempt seems to have succeeded, so please, consider that for the full series (i.e., the thread where the msg-id of the introductory mail is <patchbomb.1341418679@Solace>), and sorry again for the noise. Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Dario Faggioli
2012-Jul-04 16:41 UTC
Re: [PATCH 08 of 10 v3] libxl: enable automatic placement of guests on NUMA nodes
On Wed, 2012-07-04 at 18:18 +0200, Dario Faggioli wrote:> # HG changeset patch > # User Dario Faggioli <raistlin@linux.it> > # Date 1341416324 -7200 > # Node ID 7087d3622ee2051654c9e78fe4829da10c2d46f1 > # Parent 6fd693e7f3bc8b4d9bd20befff2c13de5591a7c5 > libxl: enable automatic placement of guests on NUMA nodes >George, Ian (Jackson),> Once we know which ones, among all the possible combinations, represents valid > placement candidates for a domain, use some heuistics for deciding which is the > best. For instance, smaller candidates are considered to be better, both from > the domain's point of view (fewer memory spreading among nodes) and from the > system as a whole point of view (fewer memoy fragmentation). In case of > candidates of equal sizes (i.e., with the same number of nodes), the amount of > free memory and the number of domain already assigned to their nodes are > considered. Very often, candidates with greater amount of memory are the one > we wants, as this is also good for keeping memory fragmentation under control. > However, if the difference in how much free memory two candidates have, the > number of assigned domains might be what decides which candidate wins. > > [...] > > --- > Changes from v2: > > [...] > > * Comparison function for candidates changed so that it now provides > total ordering, as requested during review. It is still using FP > arithmetic, though. Also I think that just putting the difference > between the amount of free memory and between the number of assigned > domains of two candidates in a single formula (after normalizing and > weighting them) is both clear and effective enough. > > [...] >I thought about what a sensible comparison function should look like, and I also plotted some graphs while randomly generating both the amount of free memory and the number of domains. The outcome of all this is that I managed to convince myself the solution below is as clear and understandable as it is effective (confirmed by the tests I was able to perform up to now). Basically, the idea is to consider both things (freemem and nr_domains), but with freemem being 3 times more "important" than nr_domains. That is very similar to one of the log()-based solutions proposed by Ian, but I really think just normalizing and weighting is easier to understand, even when quickly looking at the formula/code. Regarding the percent penalty per domain proposed by George on IRC, I liked the idea a lot, but then figured out that there will always be a number of domains (e.g., 20, if the penalty is set to 5%) starting from which nr_domains counts more than freemem, which is the opposite of what I want to achieve. So, now, comparison between placement candidates happens like this: return a.nr_nodes < b.nr_nodes ? a : b.nr_nodes < a.nr_nodes ? b : 3*norm_diff(b.freemem, a.freemem) - norm_diff(a.nr_domains, b.nr_domains) where: norm_diff(x, y) := (x - y)/max(x, y) Which removes the nasty effects of having that 10% range as in v2 of the series. If that '3' looks like too much of a magic number, I can of course enum/#define it, or even make it configurable (although, the latter, not for 4.2, I guess :-) ).> +/* Subtract two values and translate the result in [0, 1] */ > +static double normalized_diff(double a, double b) > +{ > +#define max(a, b) (a > b ? a : b) > + if (!a && a == b) > + return 0.0; > + return (a - b) / max(a, b); > +} > + > +/* > + * The NUMA placement candidates are reordered according to the following > + * heuristics: > + * - candidates involving fewer nodes come first.
In case two (or > + * more) candidates span the same number of nodes, > + * - the amount of free memory and the number of domains assigned to the > + * candidates are considered. In doing that, candidates with greater > + * amount of free memory and fewer domains assigned to them are preferred, > + * with free memory "weighting" three times as much as number of domains. > + */ > +static int numa_cmpf(const void *v1, const void *v2) > +{ > + const libxl__numa_candidate *c1 = v1; > + const libxl__numa_candidate *c2 = v2; > +#define sign(a) a > 0 ? 1 : a < 0 ? -1 : 0 > + double freememkb_diff = normalized_diff(c2->free_memkb, c1->free_memkb); > + double nrdomains_diff = normalized_diff(c1->nr_domains, c2->nr_domains); > + > + if (c1->nr_nodes != c2->nr_nodes) > + return c1->nr_nodes - c2->nr_nodes; > + > + return sign(3*freememkb_diff + nrdomains_diff); > +} > +What do you think? Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
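A purely illustrative worked example of how that weighting behaves (made-up numbers, plugged into the normalized_diff()/numa_cmpf() quoted above): take two candidates spanning the same number of nodes, c1 with 8 GB free and 3 assigned domains, c2 with 4 GB free and 1 assigned domain. Then freememkb_diff = (4 - 8)/8 = -0.5 and nrdomains_diff = (3 - 1)/3 = 0.67, so 3*(-0.5) + 0.67 = -0.83 < 0 and c1 is preferred: a large free-memory advantage outweighs hosting a couple more domains. If instead c2 had 7.6 GB free, freememkb_diff would be (7.6 - 8)/8 = -0.05 and 3*(-0.05) + 0.67 = 0.52 > 0, so c2 would win: once free memory is nearly equal, the candidate with fewer assigned domains is preferred.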
Roger Pau Monne
2012-Jul-04 16:44 UTC
Re: [PATCH 07 of 10 v3] libxl: explicitly check for libmath in autoconf
Dario Faggioli wrote:> # HG changeset patch > # User Dario Faggioli<raistlin@linux.it> > # Date 1341416324 -7200 > # Node ID 6fd693e7f3bc8b4d9bd20befff2c13de5591a7c5 > # Parent 3b65112bedc0656512312e29b89652f1c4ca0083 > libxl: explicitly check for libmath in autoconf > > As well as explicitly add -lm to libxl''s Makefile. > > This is because next patch uses floating point arithmetic, and > it is better to state it clearly that we need libmath (just in > case we find a libc that wants that to be explicitly enforced). > > Notice that autoconf should be rerun after applying this change. > > Signed-off-by: Dario Faggioli<dario.faggioli@citrix.com>Acked-by: Roger Pau Monne <roger.pau@citrix.com>
Ian Campbell
2012-Jul-06 10:35 UTC
Re: [PATCH 02 of 10 v3] libxl, libxc: introduce libxl_get_numainfo()
On Wed, 2012-07-04 at 17:18 +0100, Dario Faggioli wrote:> # HG changeset patch > # User Dario Faggioli <raistlin@linux.it> > # Date 1341416323 -7200 > # Node ID 0ca91a203fc95d3d18bb436ecdc7106b0b2ff22f > # Parent 8e367818e194c212cd1470aad663f3243ff53bdb > libxl,libxc: introduce libxl_get_numainfo() > > Make some NUMA node information available to the toolstack. Achieve > this by means of xc_numainfo(), which exposes memory size and amount > of free memory of each node, as well as the relative distances of > each node to all the others. > > For properly exposing distances we need the IDL to support arrays. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>Acked-by: Ian Campbell <ian.campbell@citrix.com>> > --- > Changes from v2: > * converted libxl__zalloc(NULL, ...) to libxl_calloc(NOGC, ...). > * Fixed the comment about memory ownership of libxl_get_numainfo(). > * Added a comment for libxl_numainfo in libxl_types.idl. > > Changes from v1: > * malloc converted to libxl__zalloc(NOGC, ...). > * The patch also accommodates some bits of what was in "libxc, > libxl: introduce xc_nodemap_t and libxl_nodemap" which was > removed as well, as full support for node maps at libxc > level is not needed (yet!). > > diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c > --- a/tools/libxc/xc_misc.c > +++ b/tools/libxc/xc_misc.c > @@ -35,6 +35,20 @@ int xc_get_max_cpus(xc_interface *xch) > return max_cpus; > } > > +int xc_get_max_nodes(xc_interface *xch) > +{ > + static int max_nodes = 0; > + xc_physinfo_t physinfo; > + > + if ( max_nodes ) > + return max_nodes; > + > + if ( !xc_physinfo(xch, &physinfo) ) > + max_nodes = physinfo.max_node_id + 1; > + > + return max_nodes; > +} > + > int xc_get_cpumap_size(xc_interface *xch) > { > return (xc_get_max_cpus(xch) + 7) / 8; > diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h > --- a/tools/libxc/xenctrl.h > +++ b/tools/libxc/xenctrl.h > @@ -329,6 +329,12 @@ int xc_get_cpumap_size(xc_interface *xch > /* allocate a cpumap */ > xc_cpumap_t xc_cpumap_alloc(xc_interface *xch); > > + /* > + * NODEMAP handling > + */ > +/* return maximum number of NUMA nodes the hypervisor supports */ > +int xc_get_max_nodes(xc_interface *xch); > + > /* > * DOMAIN DEBUGGING FUNCTIONS > */ > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c > --- a/tools/libxl/libxl.c > +++ b/tools/libxl/libxl.c > @@ -3298,6 +3298,75 @@ fail: > return ret; > } > > +libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr) > +{ > + GC_INIT(ctx); > + xc_numainfo_t ninfo; > + DECLARE_HYPERCALL_BUFFER(xc_node_to_memsize_t, memsize); > + DECLARE_HYPERCALL_BUFFER(xc_node_to_memfree_t, memfree); > + DECLARE_HYPERCALL_BUFFER(uint32_t, node_dists); > + libxl_numainfo *ret = NULL; > + int i, j, max_nodes; > + > + max_nodes = libxl_get_max_nodes(ctx); > + if (max_nodes == 0) > + { > + LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of NODES"); > + ret = NULL; > + goto out; > + } > + > + memsize = xc_hypercall_buffer_alloc > + (ctx->xch, memsize, sizeof(*memsize) * max_nodes); > + memfree = xc_hypercall_buffer_alloc > + (ctx->xch, memfree, sizeof(*memfree) * max_nodes); > + node_dists = xc_hypercall_buffer_alloc > + (ctx->xch, node_dists, sizeof(*node_dists) * max_nodes * max_nodes); > + if ((memsize == NULL) || (memfree == NULL) || (node_dists == NULL)) { > + LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM, > + "Unable to allocate hypercall arguments"); > + goto fail; > + } > + > + set_xen_guest_handle(ninfo.node_to_memsize, memsize); > + 
set_xen_guest_handle(ninfo.node_to_memfree, memfree); > + set_xen_guest_handle(ninfo.node_to_node_distance, node_dists); > + ninfo.max_node_index = max_nodes - 1; > + if (xc_numainfo(ctx->xch, &ninfo) != 0) { > + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting numainfo"); > + goto fail; > + } > + > + if (ninfo.max_node_index < max_nodes - 1) > + max_nodes = ninfo.max_node_index + 1; > + > + *nr = max_nodes; > + > + ret = libxl__zalloc(NOGC, sizeof(libxl_numainfo) * max_nodes); > + for (i = 0; i < max_nodes; i++) > + ret[i].dists = libxl__calloc(NOGC, max_nodes, sizeof(*node_dists)); > + > + for (i = 0; i < max_nodes; i++) { > +#define V(mem, i) (mem[i] == INVALID_NUMAINFO_ID) ? \ > + LIBXL_NUMAINFO_INVALID_ENTRY : mem[i] > + ret[i].size = V(memsize, i); > + ret[i].free = V(memfree, i); > + ret[i].num_dists = max_nodes; > + for (j = 0; j < ret[i].num_dists; j++) > + ret[i].dists[j] = V(node_dists, i * max_nodes + j); > +#undef V > + } > + > + fail: > + xc_hypercall_buffer_free(ctx->xch, memsize); > + xc_hypercall_buffer_free(ctx->xch, memfree); > + xc_hypercall_buffer_free(ctx->xch, node_dists); > + > + out: > + GC_FREE; > + return ret; > +} > + > const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx) > { > union { > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > --- a/tools/libxl/libxl.h > +++ b/tools/libxl/libxl.h > @@ -532,6 +532,9 @@ int libxl_domain_preserve(libxl_ctx *ctx > /* get max. number of cpus supported by hypervisor */ > int libxl_get_max_cpus(libxl_ctx *ctx); > > +/* get max. number of NUMA nodes supported by hypervisor */ > +int libxl_get_max_nodes(libxl_ctx *ctx); > + > int libxl_domain_rename(libxl_ctx *ctx, uint32_t domid, > const char *old_name, const char *new_name); > > @@ -604,6 +607,10 @@ void libxl_vminfo_list_free(libxl_vminfo > libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out); > void libxl_cputopology_list_free(libxl_cputopology *, int nb_cpu); > > +#define LIBXL_NUMAINFO_INVALID_ENTRY (~(uint32_t)0) > +libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr); > +void libxl_numainfo_list_free(libxl_numainfo *, int nr); > + > libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, > int *nb_vcpu, int *nr_vcpus_out); > void libxl_vcpuinfo_list_free(libxl_vcpuinfo *, int nr_vcpus); > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl > --- a/tools/libxl/libxl_types.idl > +++ b/tools/libxl/libxl_types.idl > @@ -433,6 +433,15 @@ libxl_physinfo = Struct("physinfo", [ > ("cap_hvm_directio", bool), > ], dir=DIR_OUT) > > +# NUMA node characteristics: size and free are how much memory it has, and how > +# much of it is free, respectively. dists is an array of distances from this > +# node to each other node. 
> +libxl_numainfo = Struct("numainfo", [ > + ("size", uint64), > + ("free", uint64), > + ("dists", Array(uint32, "num_dists")), > + ], dir=DIR_OUT) > + > libxl_cputopology = Struct("cputopology", [ > ("core", uint32), > ("socket", uint32), > diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c > --- a/tools/libxl/libxl_utils.c > +++ b/tools/libxl/libxl_utils.c > @@ -572,6 +572,11 @@ int libxl_get_max_cpus(libxl_ctx *ctx) > return xc_get_max_cpus(ctx->xch); > } > > +int libxl_get_max_nodes(libxl_ctx *ctx) > +{ > + return xc_get_max_nodes(ctx->xch); > +} > + > int libxl__enum_from_string(const libxl_enum_string_table *t, > const char *s, int *e) > { > @@ -594,6 +599,14 @@ void libxl_cputopology_list_free(libxl_c > free(list); > } > > +void libxl_numainfo_list_free(libxl_numainfo *list, int nr) > +{ > + int i; > + for (i = 0; i < nr; i++) > + libxl_numainfo_dispose(&list[i]); > + free(list); > +} > + > void libxl_vcpuinfo_list_free(libxl_vcpuinfo *list, int nr) > { > int i; > diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h > --- a/xen/include/public/sysctl.h > +++ b/xen/include/public/sysctl.h > @@ -484,6 +484,7 @@ typedef struct xen_sysctl_topologyinfo x > DEFINE_XEN_GUEST_HANDLE(xen_sysctl_topologyinfo_t); > > /* XEN_SYSCTL_numainfo */ > +#define INVALID_NUMAINFO_ID (~0U) > struct xen_sysctl_numainfo { > /* > * IN: maximum addressable entry in the caller-provided arrays.
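For context, a sketch of how an application could consume the new call. This is not part of the patch; it assumes an already-initialized libxl_ctx and uses only the functions and fields added above, with error handling kept minimal:

    #include <stdio.h>
    #include <inttypes.h>
    #include <libxl.h>

    /* Print the NUMA topology exposed by libxl_get_numainfo(). */
    static void print_numainfo(libxl_ctx *ctx)
    {
        int i, j, nr = 0;
        libxl_numainfo *info = libxl_get_numainfo(ctx, &nr);

        if (!info)
            return;

        for (i = 0; i < nr; i++) {
            if (info[i].size == LIBXL_NUMAINFO_INVALID_ENTRY)
                continue;   /* node not present */
            printf("node %d: size=%"PRIu64" free=%"PRIu64"\n",
                   i, info[i].size, info[i].free);
            for (j = 0; j < info[i].num_dists; j++)
                printf("  distance to node %d: %"PRIu32"\n",
                       j, info[i].dists[j]);
        }

        /* the returned array is owned by the caller */
        libxl_numainfo_list_free(info, nr);
    }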
Ian Campbell
2012-Jul-06 10:39 UTC
Re: [PATCH 04 of 10 v3] libxl: rename libxl_cpumap to libxl_bitmap
On Wed, 2012-07-04 at 17:18 +0100, Dario Faggioli wrote:> # HG changeset patch > # User Dario Faggioli <raistlin@linux.it> > # Date 1341416323 -7200 > # Node ID cfdd6d53f3dd3c6aa325fe6d8a17e4089daafae5 > # Parent f1227d5a82e56d10e302aec4c3717d281718a349 > libxl: rename libxl_cpumap to libxl_bitmap > > And leave to the caller the burden of knowing and remembering what kind > of bitmap each instance of libxl_bitmap is. > > This is basically just some s/libxl_cpumap/libxl_bitmap/ (and some other > related interface name substitution, e.g., libxl_for_each_cpu) in a bunch > of files, with no real functional change involved. > > A specific allocation helper is introduced, besides libxl_bitmap_alloc(). > It is called libxl_cpu_bitmap_alloc() and is meant at substituting the old > libxl_cpumap_alloc(). It is just something easier to use in cases where one > wants to allocate a libxl_bitmap that is going to serve as a cpu map. > > This is because we want to be able to deal with both cpu and NUMA node > maps, but we don''t want to duplicate all the various helpers and wrappers. > > While at it, add the usual initialization function, common to all libxl > data structures. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.eu.com>Acked-by: Ian Campbell <ian.campbell@citrix.com>> > --- > Changes from v2: > * rebased on top of 51d2daabd428 (libxl: allow setting more than 31 vcpus). > * Fixed one missing rename of cpumap into bitmap. > * Added libxl_bitmap_init(). > > Changes from v1: > * this patch replaces "libxl: abstract libxl_cpumap to just libxl_map" > as it directly change the name of the old type instead of adding one > more abstraction layer. > > diff --git a/tools/libxl/gentest.py b/tools/libxl/gentest.py > --- a/tools/libxl/gentest.py > +++ b/tools/libxl/gentest.py > @@ -20,7 +20,7 @@ def randomize_case(s): > def randomize_enum(e): > return random.choice([v.name for v in e.values]) > > -handcoded = ["libxl_cpumap", "libxl_key_value_list", > +handcoded = ["libxl_bitmap", "libxl_key_value_list", > "libxl_cpuid_policy_list", "libxl_string_list"] > > def gen_rand_init(ty, v, indent = " ", parent = None): > @@ -117,16 +117,16 @@ static void rand_bytes(uint8_t *p, size_ > p[i] = rand() % 256; > } > > -static void libxl_cpumap_rand_init(libxl_cpumap *cpumap) > +static void libxl_bitmap_rand_init(libxl_bitmap *bitmap) > { > int i; > - cpumap->size = rand() % 16; > - cpumap->map = calloc(cpumap->size, sizeof(*cpumap->map)); > - libxl_for_each_cpu(i, *cpumap) { > + bitmap->size = rand() % 16; > + bitmap->map = calloc(bitmap->size, sizeof(*bitmap->map)); > + libxl_for_each_bit(i, *bitmap) { > if (rand() % 2) > - libxl_cpumap_set(cpumap, i); > + libxl_bitmap_set(bitmap, i); > else > - libxl_cpumap_reset(cpumap, i); > + libxl_bitmap_reset(bitmap, i); > } > } > > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c > --- a/tools/libxl/libxl.c > +++ b/tools/libxl/libxl.c > @@ -586,7 +586,7 @@ static int cpupool_info(libxl__gc *gc, > info->poolid = xcinfo->cpupool_id; > info->sched = xcinfo->sched_id; > info->n_dom = xcinfo->n_dom; > - rc = libxl_cpumap_alloc(CTX, &info->cpumap, 0); > + rc = libxl_cpu_bitmap_alloc(CTX, &info->cpumap, 0); > if (rc) > { > LOG(ERROR, "unable to allocate cpumap %d\n", rc); > @@ -3431,7 +3431,7 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ct > } > > for (*nb_vcpu = 0; *nb_vcpu <= domaininfo.max_vcpu_id; ++*nb_vcpu, ++ptr) { > - if (libxl_cpumap_alloc(ctx, &ptr->cpumap, 0)) { > + if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0)) { > LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, 
"allocating cpumap"); > return NULL; > } > @@ -3454,7 +3454,7 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ct > } > > int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, > - libxl_cpumap *cpumap) > + libxl_bitmap *cpumap) > { > if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map)) { > LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting vcpu affinity"); > @@ -3464,7 +3464,7 @@ int libxl_set_vcpuaffinity(libxl_ctx *ct > } > > int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, > - unsigned int max_vcpus, libxl_cpumap *cpumap) > + unsigned int max_vcpus, libxl_bitmap *cpumap) > { > int i, rc = 0; > > @@ -3478,7 +3478,7 @@ int libxl_set_vcpuaffinity_all(libxl_ctx > return rc; > } > > -int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_cpumap *cpumap) > +int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_bitmap *cpumap) > { > GC_INIT(ctx); > libxl_dominfo info; > @@ -3498,7 +3498,7 @@ retry_transaction: > for (i = 0; i <= info.vcpu_max_id; i++) > libxl__xs_write(gc, t, > libxl__sprintf(gc, "%s/cpu/%u/availability", dompath, i), > - "%s", libxl_cpumap_test(cpumap, i) ? "online" : "offline"); > + "%s", libxl_bitmap_test(cpumap, i) ? "online" : "offline"); > if (!xs_transaction_end(ctx->xsh, t, 0)) { > if (errno == EAGAIN) > goto retry_transaction; > @@ -4094,7 +4094,7 @@ int libxl_tmem_freeable(libxl_ctx *ctx) > return rc; > } > > -int libxl_get_freecpus(libxl_ctx *ctx, libxl_cpumap *cpumap) > +int libxl_get_freecpus(libxl_ctx *ctx, libxl_bitmap *cpumap) > { > int ncpus; > > @@ -4113,7 +4113,7 @@ int libxl_get_freecpus(libxl_ctx *ctx, l > > int libxl_cpupool_create(libxl_ctx *ctx, const char *name, > libxl_scheduler sched, > - libxl_cpumap cpumap, libxl_uuid *uuid, > + libxl_bitmap cpumap, libxl_uuid *uuid, > uint32_t *poolid) > { > GC_INIT(ctx); > @@ -4136,8 +4136,8 @@ int libxl_cpupool_create(libxl_ctx *ctx, > return ERROR_FAIL; > } > > - libxl_for_each_cpu(i, cpumap) > - if (libxl_cpumap_test(&cpumap, i)) { > + libxl_for_each_bit(i, cpumap) > + if (libxl_bitmap_test(&cpumap, i)) { > rc = xc_cpupool_addcpu(ctx->xch, *poolid, i); > if (rc) { > LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, > @@ -4172,7 +4172,7 @@ int libxl_cpupool_destroy(libxl_ctx *ctx > int rc, i; > xc_cpupoolinfo_t *info; > xs_transaction_t t; > - libxl_cpumap cpumap; > + libxl_bitmap cpumap; > > info = xc_cpupool_getinfo(ctx->xch, poolid); > if (info == NULL) { > @@ -4184,13 +4184,13 @@ int libxl_cpupool_destroy(libxl_ctx *ctx > if ((info->cpupool_id != poolid) || (info->n_dom)) > goto out; > > - rc = libxl_cpumap_alloc(ctx, &cpumap, 0); > + rc = libxl_cpu_bitmap_alloc(ctx, &cpumap, 0); > if (rc) > goto out; > > memcpy(cpumap.map, info->cpumap, cpumap.size); > - libxl_for_each_cpu(i, cpumap) > - if (libxl_cpumap_test(&cpumap, i)) { > + libxl_for_each_bit(i, cpumap) > + if (libxl_bitmap_test(&cpumap, i)) { > rc = xc_cpupool_removecpu(ctx->xch, poolid, i); > if (rc) { > LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, > @@ -4219,7 +4219,7 @@ int libxl_cpupool_destroy(libxl_ctx *ctx > rc = 0; > > out1: > - libxl_cpumap_dispose(&cpumap); > + libxl_bitmap_dispose(&cpumap); > out: > xc_cpupool_infofree(ctx->xch, info); > GC_FREE; > @@ -4287,7 +4287,7 @@ int libxl_cpupool_cpuadd_node(libxl_ctx > { > int rc = 0; > int cpu, nr; > - libxl_cpumap freemap; > + libxl_bitmap freemap; > libxl_cputopology *topology; > > if (libxl_get_freecpus(ctx, &freemap)) { > @@ -4302,7 +4302,7 @@ int libxl_cpupool_cpuadd_node(libxl_ctx > > *cpus = 0; > for (cpu = 0; cpu < nr; cpu++) { > - if 
(libxl_cpumap_test(&freemap, cpu) && (topology[cpu].node == node) && > + if (libxl_bitmap_test(&freemap, cpu) && (topology[cpu].node == node) && > !libxl_cpupool_cpuadd(ctx, poolid, cpu)) { > (*cpus)++; > } > @@ -4311,7 +4311,7 @@ int libxl_cpupool_cpuadd_node(libxl_ctx > > free(topology); > out: > - libxl_cpumap_dispose(&freemap); > + libxl_bitmap_dispose(&freemap); > return rc; > } > > @@ -4353,7 +4353,7 @@ int libxl_cpupool_cpuremove_node(libxl_c > if (poolinfo[p].poolid == poolid) { > for (cpu = 0; cpu < nr_cpus; cpu++) { > if ((topology[cpu].node == node) && > - libxl_cpumap_test(&poolinfo[p].cpumap, cpu) && > + libxl_bitmap_test(&poolinfo[p].cpumap, cpu) && > !libxl_cpupool_cpuremove(ctx, poolid, cpu)) { > (*cpus)++; > } > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > --- a/tools/libxl/libxl.h > +++ b/tools/libxl/libxl.h > @@ -285,8 +285,9 @@ typedef uint64_t libxl_ev_user; > typedef struct { > uint32_t size; /* number of bytes in map */ > uint8_t *map; > -} libxl_cpumap; > -void libxl_cpumap_dispose(libxl_cpumap *map); > +} libxl_bitmap; > +void libxl_bitmap_init(libxl_bitmap *map); > +void libxl_bitmap_dispose(libxl_bitmap *map); > > /* libxl_cpuid_policy_list is a dynamic array storing CPUID policies > * for multiple leafs. It is terminated with an entry holding > @@ -790,10 +791,10 @@ int libxl_userdata_retrieve(libxl_ctx *c > > int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo); > int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, > - libxl_cpumap *cpumap); > + libxl_bitmap *cpumap); > int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, > - unsigned int max_vcpus, libxl_cpumap *cpumap); > -int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_cpumap *cpumap); > + unsigned int max_vcpus, libxl_bitmap *cpumap); > +int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_bitmap *cpumap); > > libxl_scheduler libxl_get_scheduler(libxl_ctx *ctx); > > @@ -843,10 +844,10 @@ int libxl_tmem_shared_auth(libxl_ctx *ct > int auth); > int libxl_tmem_freeable(libxl_ctx *ctx); > > -int libxl_get_freecpus(libxl_ctx *ctx, libxl_cpumap *cpumap); > +int libxl_get_freecpus(libxl_ctx *ctx, libxl_bitmap *cpumap); > int libxl_cpupool_create(libxl_ctx *ctx, const char *name, > libxl_scheduler sched, > - libxl_cpumap cpumap, libxl_uuid *uuid, > + libxl_bitmap cpumap, libxl_uuid *uuid, > uint32_t *poolid); > int libxl_cpupool_destroy(libxl_ctx *ctx, uint32_t poolid); > int libxl_cpupool_rename(libxl_ctx *ctx, const char *name, uint32_t poolid); > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > --- a/tools/libxl/libxl_create.c > +++ b/tools/libxl/libxl_create.c > @@ -203,16 +203,16 @@ int libxl__domain_build_info_setdefault( > if (!b_info->max_vcpus) > b_info->max_vcpus = 1; > if (!b_info->avail_vcpus.size) { > - if (libxl_cpumap_alloc(CTX, &b_info->avail_vcpus, 1)) > + if (libxl_cpu_bitmap_alloc(CTX, &b_info->avail_vcpus, 1)) > return ERROR_FAIL; > - libxl_cpumap_set(&b_info->avail_vcpus, 0); > + libxl_bitmap_set(&b_info->avail_vcpus, 0); > } else if (b_info->avail_vcpus.size > HVM_MAX_VCPUS) > return ERROR_FAIL; > > if (!b_info->cpumap.size) { > - if (libxl_cpumap_alloc(CTX, &b_info->cpumap, 0)) > + if (libxl_cpu_bitmap_alloc(CTX, &b_info->cpumap, 0)) > return ERROR_FAIL; > - libxl_cpumap_set_any(&b_info->cpumap); > + libxl_bitmap_set_any(&b_info->cpumap); > } > > if (b_info->max_memkb == LIBXL_MEMKB_DEFAULT) > diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c > --- a/tools/libxl/libxl_dm.c > +++ 
b/tools/libxl/libxl_dm.c > @@ -208,8 +208,8 @@ static char ** libxl__build_device_model > NULL); > } > > - nr_set_cpus = libxl_cpumap_count_set(&b_info->avail_vcpus); > - s = libxl_cpumap_to_hex_string(CTX, &b_info->avail_vcpus); > + nr_set_cpus = libxl_bitmap_count_set(&b_info->avail_vcpus); > + s = libxl_bitmap_to_hex_string(CTX, &b_info->avail_vcpus); > flexarray_vappend(dm_args, "-vcpu_avail", > libxl__sprintf(gc, "%s", s), NULL); > free(s); > @@ -459,7 +459,7 @@ static char ** libxl__build_device_model > flexarray_append(dm_args, "-smp"); > if (b_info->avail_vcpus.size) { > int nr_set_cpus = 0; > - nr_set_cpus = libxl_cpumap_count_set(&b_info->avail_vcpus); > + nr_set_cpus = libxl_bitmap_count_set(&b_info->avail_vcpus); > > flexarray_append(dm_args, libxl__sprintf(gc, "%d,maxcpus=%d", > b_info->max_vcpus, > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c > --- a/tools/libxl/libxl_dom.c > +++ b/tools/libxl/libxl_dom.c > @@ -202,7 +202,7 @@ int libxl__build_post(libxl__gc *gc, uin > ents[11] = libxl__sprintf(gc, "%lu", state->store_mfn); > for (i = 0; i < info->max_vcpus; i++) { > ents[12+(i*2)] = libxl__sprintf(gc, "cpu/%d/availability", i); > - ents[12+(i*2)+1] = libxl_cpumap_test(&info->avail_vcpus, i) > + ents[12+(i*2)+1] = libxl_bitmap_test(&info->avail_vcpus, i) > ? "online" : "offline"; > } > > diff --git a/tools/libxl/libxl_json.c b/tools/libxl/libxl_json.c > --- a/tools/libxl/libxl_json.c > +++ b/tools/libxl/libxl_json.c > @@ -99,8 +99,8 @@ yajl_gen_status libxl_uuid_gen_json(yajl > return yajl_gen_string(hand, (const unsigned char *)buf, LIBXL_UUID_FMTLEN); > } > > -yajl_gen_status libxl_cpumap_gen_json(yajl_gen hand, > - libxl_cpumap *cpumap) > +yajl_gen_status libxl_bitmap_gen_json(yajl_gen hand, > + libxl_bitmap *bitmap) > { > yajl_gen_status s; > int i; > @@ -108,8 +108,8 @@ yajl_gen_status libxl_cpumap_gen_json(ya > s = yajl_gen_array_open(hand); > if (s != yajl_gen_status_ok) goto out; > > - libxl_for_each_cpu(i, *cpumap) { > - if (libxl_cpumap_test(cpumap, i)) { > + libxl_for_each_bit(i, *bitmap) { > + if (libxl_bitmap_test(bitmap, i)) { > s = yajl_gen_integer(hand, i); > if (s != yajl_gen_status_ok) goto out; > } > diff --git a/tools/libxl/libxl_json.h b/tools/libxl/libxl_json.h > --- a/tools/libxl/libxl_json.h > +++ b/tools/libxl/libxl_json.h > @@ -26,7 +26,7 @@ yajl_gen_status libxl_defbool_gen_json(y > yajl_gen_status libxl_domid_gen_json(yajl_gen hand, libxl_domid *p); > yajl_gen_status libxl_uuid_gen_json(yajl_gen hand, libxl_uuid *p); > yajl_gen_status libxl_mac_gen_json(yajl_gen hand, libxl_mac *p); > -yajl_gen_status libxl_cpumap_gen_json(yajl_gen hand, libxl_cpumap *p); > +yajl_gen_status libxl_bitmap_gen_json(yajl_gen hand, libxl_bitmap *p); > yajl_gen_status libxl_cpuid_policy_list_gen_json(yajl_gen hand, > libxl_cpuid_policy_list *p); > yajl_gen_status libxl_string_list_gen_json(yajl_gen hand, libxl_string_list *p); > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl > --- a/tools/libxl/libxl_types.idl > +++ b/tools/libxl/libxl_types.idl > @@ -10,7 +10,7 @@ libxl_defbool = Builtin("defbool", passb > libxl_domid = Builtin("domid", json_fn = "yajl_gen_integer", autogenerate_json = False) > libxl_uuid = Builtin("uuid", passby=PASS_BY_REFERENCE) > libxl_mac = Builtin("mac", passby=PASS_BY_REFERENCE) > -libxl_cpumap = Builtin("cpumap", dispose_fn="libxl_cpumap_dispose", passby=PASS_BY_REFERENCE) > +libxl_bitmap = Builtin("bitmap", dispose_fn="libxl_bitmap_dispose", passby=PASS_BY_REFERENCE) > libxl_cpuid_policy_list = 
Builtin("cpuid_policy_list", dispose_fn="libxl_cpuid_dispose", passby=PASS_BY_REFERENCE) > > libxl_string_list = Builtin("string_list", dispose_fn="libxl_string_list_dispose", passby=PASS_BY_REFERENCE) > @@ -198,7 +198,7 @@ libxl_cpupoolinfo = Struct("cpupoolinfo" > ("poolid", uint32), > ("sched", libxl_scheduler), > ("n_dom", uint32), > - ("cpumap", libxl_cpumap) > + ("cpumap", libxl_bitmap) > ], dir=DIR_OUT) > > libxl_vminfo = Struct("vminfo", [ > @@ -247,8 +247,8 @@ libxl_domain_sched_params = Struct("doma > > libxl_domain_build_info = Struct("domain_build_info",[ > ("max_vcpus", integer), > - ("avail_vcpus", libxl_cpumap), > - ("cpumap", libxl_cpumap), > + ("avail_vcpus", libxl_bitmap), > + ("cpumap", libxl_bitmap), > ("tsc_mode", libxl_tsc_mode), > ("max_memkb", MemKB), > ("target_memkb", MemKB), > @@ -409,7 +409,7 @@ libxl_vcpuinfo = Struct("vcpuinfo", [ > ("blocked", bool), > ("running", bool), > ("vcpu_time", uint64), # total vcpu time ran (ns) > - ("cpumap", libxl_cpumap), # current cpu''s affinities > + ("cpumap", libxl_bitmap), # current cpu''s affinities > ], dir=DIR_OUT) > > libxl_physinfo = Struct("physinfo", [ > diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c > --- a/tools/libxl/libxl_utils.c > +++ b/tools/libxl/libxl_utils.c > @@ -487,79 +487,70 @@ int libxl_mac_to_device_nic(libxl_ctx *c > return rc; > } > > -int libxl_cpumap_alloc(libxl_ctx *ctx, libxl_cpumap *cpumap, int max_cpus) > +int libxl_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *bitmap, int n_bits) > { > GC_INIT(ctx); > int sz; > - int rc; > > - if (max_cpus < 0) { > - rc = ERROR_INVAL; > - goto out; > - } > - if (max_cpus == 0) > - max_cpus = libxl_get_max_cpus(ctx); > - if (max_cpus == 0) { > - rc = ERROR_FAIL; > - goto out; > - } > + sz = (n_bits + 7) / 8; > + bitmap->map = libxl__calloc(NOGC, sizeof(*bitmap->map), sz); > + bitmap->size = sz; > > - sz = (max_cpus + 7) / 8; > - cpumap->map = libxl__calloc(NOGC, sizeof(*cpumap->map), sz); > - cpumap->size = sz; > - > - rc = 0; > - out: > GC_FREE; > - return rc; > + return 0; > } > > -void libxl_cpumap_dispose(libxl_cpumap *map) > +void libxl_bitmap_init(libxl_bitmap *map) > +{ > + memset(map, ''\0'', sizeof(*map)); > +} > + > +void libxl_bitmap_dispose(libxl_bitmap *map) > { > free(map->map); > } > > -int libxl_cpumap_test(const libxl_cpumap *cpumap, int cpu) > +int libxl_bitmap_test(const libxl_bitmap *bitmap, int bit) > { > - if (cpu >= cpumap->size * 8) > + if (bit >= bitmap->size * 8) > return 0; > - return (cpumap->map[cpu / 8] & (1 << (cpu & 7))) ? 1 : 0; > + return (bitmap->map[bit / 8] & (1 << (bit & 7))) ? 
1 : 0; > } > > -void libxl_cpumap_set(libxl_cpumap *cpumap, int cpu) > +void libxl_bitmap_set(libxl_bitmap *bitmap, int bit) > { > - if (cpu >= cpumap->size * 8) > + if (bit >= bitmap->size * 8) > return; > - cpumap->map[cpu / 8] |= 1 << (cpu & 7); > + bitmap->map[bit / 8] |= 1 << (bit & 7); > } > > -void libxl_cpumap_reset(libxl_cpumap *cpumap, int cpu) > +void libxl_bitmap_reset(libxl_bitmap *bitmap, int bit) > { > - if (cpu >= cpumap->size * 8) > + if (bit >= bitmap->size * 8) > return; > - cpumap->map[cpu / 8] &= ~(1 << (cpu & 7)); > + bitmap->map[bit / 8] &= ~(1 << (bit & 7)); > } > > -int libxl_cpumap_count_set(const libxl_cpumap *cpumap) > +int libxl_bitmap_count_set(const libxl_bitmap *bitmap) > { > - int i, nr_set_cpus = 0; > - libxl_for_each_set_cpu(i, *cpumap) > - nr_set_cpus++; > + int i, nr_set_bits = 0; > + libxl_for_each_set_bit(i, *bitmap) > + nr_set_bits++; > > - return nr_set_cpus; > + return nr_set_bits; > } > > /* NB. caller is responsible for freeing the memory */ > -char *libxl_cpumap_to_hex_string(libxl_ctx *ctx, const libxl_cpumap *cpumap) > +char *libxl_bitmap_to_hex_string(libxl_ctx *ctx, const libxl_bitmap *bitmap) > { > GC_INIT(ctx); > - int i = cpumap->size; > - char *p = libxl__zalloc(NOGC, cpumap->size * 2 + 3); > + int i = bitmap->size; > + char *p = libxl__zalloc(NOGC, bitmap->size * 2 + 3); > char *q = p; > strncpy(p, "0x", 2); > p += 2; > while(--i >= 0) { > - sprintf(p, "%02x", cpumap->map[i]); > + sprintf(p, "%02x", bitmap->map[i]); > p += 2; > } > *p = ''\0''; > diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h > --- a/tools/libxl/libxl_utils.h > +++ b/tools/libxl/libxl_utils.h > @@ -63,29 +63,44 @@ int libxl_devid_to_device_nic(libxl_ctx > int libxl_vdev_to_device_disk(libxl_ctx *ctx, uint32_t domid, const char *vdev, > libxl_device_disk *disk); > > -int libxl_cpumap_alloc(libxl_ctx *ctx, libxl_cpumap *cpumap, int max_cpus); > -int libxl_cpumap_test(const libxl_cpumap *cpumap, int cpu); > -void libxl_cpumap_set(libxl_cpumap *cpumap, int cpu); > -void libxl_cpumap_reset(libxl_cpumap *cpumap, int cpu); > -int libxl_cpumap_count_set(const libxl_cpumap *cpumap); > -char *libxl_cpumap_to_hex_string(libxl_ctx *ctx, const libxl_cpumap *cpumap); > -static inline void libxl_cpumap_set_any(libxl_cpumap *cpumap) > +int libxl_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *bitmap, int n_bits); > + /* Allocated bimap is from malloc, libxl_bitmap_dispose() to be > + * called by the application when done. 
*/ > +int libxl_bitmap_test(const libxl_bitmap *bitmap, int bit); > +void libxl_bitmap_set(libxl_bitmap *bitmap, int bit); > +void libxl_bitmap_reset(libxl_bitmap *bitmap, int bit); > +int libxl_bitmap_count_set(const libxl_bitmap *cpumap); > +char *libxl_bitmap_to_hex_string(libxl_ctx *ctx, const libxl_bitmap *cpumap); > +static inline void libxl_bitmap_set_any(libxl_bitmap *bitmap) > { > - memset(cpumap->map, -1, cpumap->size); > + memset(bitmap->map, -1, bitmap->size); > } > -static inline void libxl_cpumap_set_none(libxl_cpumap *cpumap) > +static inline void libxl_bitmap_set_none(libxl_bitmap *bitmap) > { > - memset(cpumap->map, 0, cpumap->size); > + memset(bitmap->map, 0, bitmap->size); > } > -static inline int libxl_cpumap_cpu_valid(libxl_cpumap *cpumap, int cpu) > +static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit) > { > - return cpu >= 0 && cpu < (cpumap->size * 8); > + return bit >= 0 && bit < (bitmap->size * 8); > } > -#define libxl_for_each_cpu(var, map) for (var = 0; var < (map).size * 8; var++) > -#define libxl_for_each_set_cpu(v, m) for (v = 0; v < (m).size * 8; v++) \ > - if (libxl_cpumap_test(&(m), v)) > +#define libxl_for_each_bit(var, map) for (var = 0; var < (map).size * 8; var++) > +#define libxl_for_each_set_bit(v, m) for (v = 0; v < (m).size * 8; v++) \ > + if (libxl_bitmap_test(&(m), v)) > > -static inline uint32_t libxl__sizekb_to_mb(uint32_t s) { > +static inline int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap, > + int max_cpus) > +{ > + if (max_cpus < 0) > + return ERROR_INVAL; > + if (max_cpus == 0) > + max_cpus = libxl_get_max_cpus(ctx); > + if (max_cpus == 0) > + return ERROR_FAIL; > + > + return libxl_bitmap_alloc(ctx, cpumap, max_cpus); > +} > + > + static inline uint32_t libxl__sizekb_to_mb(uint32_t s) { > return (s + 1023) / 1024; > } > > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c > --- a/tools/libxl/xl_cmdimpl.c > +++ b/tools/libxl/xl_cmdimpl.c > @@ -492,19 +492,19 @@ static void split_string_into_string_lis > free(s); > } > > -static int vcpupin_parse(char *cpu, libxl_cpumap *cpumap) > -{ > - libxl_cpumap exclude_cpumap; > +static int vcpupin_parse(char *cpu, libxl_bitmap *cpumap) > +{ > + libxl_bitmap exclude_cpumap; > uint32_t cpuida, cpuidb; > char *endptr, *toka, *tokb, *saveptr = NULL; > int i, rc = 0, rmcpu; > > if (!strcmp(cpu, "all")) { > - libxl_cpumap_set_any(cpumap); > + libxl_bitmap_set_any(cpumap); > return 0; > } > > - if (libxl_cpumap_alloc(ctx, &exclude_cpumap, 0)) { > + if (libxl_cpu_bitmap_alloc(ctx, &exclude_cpumap, 0)) { > fprintf(stderr, "Error: Failed to allocate cpumap.\n"); > return ENOMEM; > } > @@ -534,19 +534,19 @@ static int vcpupin_parse(char *cpu, libx > } > } > while (cpuida <= cpuidb) { > - rmcpu == 0 ? libxl_cpumap_set(cpumap, cpuida) : > - libxl_cpumap_set(&exclude_cpumap, cpuida); > + rmcpu == 0 ? 
libxl_bitmap_set(cpumap, cpuida) : > + libxl_bitmap_set(&exclude_cpumap, cpuida); > cpuida++; > } > } > > /* Clear all the cpus from the removal list */ > - libxl_for_each_set_cpu(i, exclude_cpumap) { > - libxl_cpumap_reset(cpumap, i); > + libxl_for_each_set_bit(i, exclude_cpumap) { > + libxl_bitmap_reset(cpumap, i); > } > > vcpp_out: > - libxl_cpumap_dispose(&exclude_cpumap); > + libxl_bitmap_dispose(&exclude_cpumap); > > return rc; > } > @@ -649,13 +649,13 @@ static void parse_config_data(const char > if (!xlu_cfg_get_long (config, "vcpus", &l, 0)) { > b_info->max_vcpus = l; > > - if (libxl_cpumap_alloc(ctx, &b_info->avail_vcpus, l)) { > + if (libxl_cpu_bitmap_alloc(ctx, &b_info->avail_vcpus, l)) { > fprintf(stderr, "Unable to allocate cpumap\n"); > exit(1); > } > - libxl_cpumap_set_none(&b_info->avail_vcpus); > + libxl_bitmap_set_none(&b_info->avail_vcpus); > while (l-- > 0) > - libxl_cpumap_set((&b_info->avail_vcpus), l); > + libxl_bitmap_set((&b_info->avail_vcpus), l); > } > > if (!xlu_cfg_get_long (config, "maxvcpus", &l, 0)) > @@ -664,7 +664,7 @@ static void parse_config_data(const char > if (!xlu_cfg_get_list (config, "cpus", &cpus, 0, 1)) { > int i, n_cpus = 0; > > - if (libxl_cpumap_alloc(ctx, &b_info->cpumap, 0)) { > + if (libxl_cpu_bitmap_alloc(ctx, &b_info->cpumap, 0)) { > fprintf(stderr, "Unable to allocate cpumap\n"); > exit(1); > } > @@ -684,14 +684,14 @@ static void parse_config_data(const char > * the cpumap derived from the list ensures memory is being > * allocated on the proper nodes anyway. > */ > - libxl_cpumap_set_none(&b_info->cpumap); > + libxl_bitmap_set_none(&b_info->cpumap); > while ((buf = xlu_cfg_get_listitem(cpus, n_cpus)) != NULL) { > i = atoi(buf); > - if (!libxl_cpumap_cpu_valid(&b_info->cpumap, i)) { > + if (!libxl_bitmap_cpu_valid(&b_info->cpumap, i)) { > fprintf(stderr, "cpu %d illegal\n", i); > exit(1); > } > - libxl_cpumap_set(&b_info->cpumap, i); > + libxl_bitmap_set(&b_info->cpumap, i); > if (n_cpus < b_info->max_vcpus) > vcpu_to_pcpu[n_cpus] = i; > n_cpus++; > @@ -700,12 +700,12 @@ static void parse_config_data(const char > else if (!xlu_cfg_get_string (config, "cpus", &buf, 0)) { > char *buf2 = strdup(buf); > > - if (libxl_cpumap_alloc(ctx, &b_info->cpumap, 0)) { > + if (libxl_cpu_bitmap_alloc(ctx, &b_info->cpumap, 0)) { > fprintf(stderr, "Unable to allocate cpumap\n"); > exit(1); > } > > - libxl_cpumap_set_none(&b_info->cpumap); > + libxl_bitmap_set_none(&b_info->cpumap); > if (vcpupin_parse(buf2, &b_info->cpumap)) > exit(1); > free(buf2); > @@ -1800,28 +1800,28 @@ start: > > /* If single vcpu to pcpu mapping was requested, honour it */ > if (vcpu_to_pcpu) { > - libxl_cpumap vcpu_cpumap; > - > - ret = libxl_cpumap_alloc(ctx, &vcpu_cpumap, 0); > + libxl_bitmap vcpu_cpumap; > + > + ret = libxl_cpu_bitmap_alloc(ctx, &vcpu_cpumap, 0); > if (ret) > goto error_out; > for (i = 0; i < d_config.b_info.max_vcpus; i++) { > > if (vcpu_to_pcpu[i] != -1) { > - libxl_cpumap_set_none(&vcpu_cpumap); > - libxl_cpumap_set(&vcpu_cpumap, vcpu_to_pcpu[i]); > + libxl_bitmap_set_none(&vcpu_cpumap); > + libxl_bitmap_set(&vcpu_cpumap, vcpu_to_pcpu[i]); > } else { > - libxl_cpumap_set_any(&vcpu_cpumap); > + libxl_bitmap_set_any(&vcpu_cpumap); > } > if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap)) { > fprintf(stderr, "setting affinity failed on vcpu `%d''.\n", i); > - libxl_cpumap_dispose(&vcpu_cpumap); > + libxl_bitmap_dispose(&vcpu_cpumap); > free(vcpu_to_pcpu); > ret = ERROR_FAIL; > goto error_out; > } > } > - libxl_cpumap_dispose(&vcpu_cpumap); > + 
libxl_bitmap_dispose(&vcpu_cpumap); > free(vcpu_to_pcpu); vcpu_to_pcpu = NULL; > } > > @@ -4058,7 +4058,7 @@ int main_vcpulist(int argc, char **argv) > static void vcpupin(const char *d, const char *vcpu, char *cpu) > { > libxl_vcpuinfo *vcpuinfo; > - libxl_cpumap cpumap; > + libxl_bitmap cpumap; > > uint32_t vcpuid; > char *endptr; > @@ -4075,7 +4075,7 @@ static void vcpupin(const char *d, const > > find_domain(d); > > - if (libxl_cpumap_alloc(ctx, &cpumap, 0)) { > + if (libxl_cpu_bitmap_alloc(ctx, &cpumap, 0)) { > goto vcpupin_out; > } > > @@ -4102,7 +4102,7 @@ static void vcpupin(const char *d, const > libxl_vcpuinfo_list_free(vcpuinfo, nb_vcpu); > } > vcpupin_out1: > - libxl_cpumap_dispose(&cpumap); > + libxl_bitmap_dispose(&cpumap); > vcpupin_out: > ; > } > @@ -4122,7 +4122,7 @@ static void vcpuset(const char *d, const > { > char *endptr; > unsigned int max_vcpus, i; > - libxl_cpumap cpumap; > + libxl_bitmap cpumap; > > max_vcpus = strtoul(nr_vcpus, &endptr, 10); > if (nr_vcpus == endptr) { > @@ -4132,17 +4132,17 @@ static void vcpuset(const char *d, const > > find_domain(d); > > - if (libxl_cpumap_alloc(ctx, &cpumap, 0)) { > - fprintf(stderr, "libxl_cpumap_alloc failed\n"); > + if (libxl_cpu_bitmap_alloc(ctx, &cpumap, 0)) { > + fprintf(stderr, "libxl_cpu_bitmap_alloc failed\n"); > return; > } > for (i = 0; i < max_vcpus; i++) > - libxl_cpumap_set(&cpumap, i); > + libxl_bitmap_set(&cpumap, i); > > if (libxl_set_vcpuonline(ctx, domid, &cpumap) < 0) > fprintf(stderr, "libxl_set_vcpuonline failed domid=%d max_vcpus=%d\n", domid, max_vcpus); > > - libxl_cpumap_dispose(&cpumap); > + libxl_bitmap_dispose(&cpumap); > } > > int main_vcpuset(int argc, char **argv) > @@ -4206,7 +4206,7 @@ static void output_physinfo(void) > libxl_physinfo info; > const libxl_version_info *vinfo; > unsigned int i; > - libxl_cpumap cpumap; > + libxl_bitmap cpumap; > int n = 0; > > if (libxl_get_physinfo(ctx, &info) != 0) { > @@ -4238,8 +4238,8 @@ static void output_physinfo(void) > printf("sharing_used_memory : %"PRIu64"\n", info.sharing_used_frames / i); > } > if (!libxl_get_freecpus(ctx, &cpumap)) { > - libxl_for_each_cpu(i, cpumap) > - if (libxl_cpumap_test(&cpumap, i)) > + libxl_for_each_bit(i, cpumap) > + if (libxl_bitmap_test(&cpumap, i)) > n++; > printf("free_cpus : %d\n", n); > free(cpumap.map); > @@ -5861,8 +5861,8 @@ int main_cpupoolcreate(int argc, char ** > XLU_ConfigList *cpus; > XLU_ConfigList *nodes; > int n_cpus, n_nodes, i, n; > - libxl_cpumap freemap; > - libxl_cpumap cpumap; > + libxl_bitmap freemap; > + libxl_bitmap cpumap; > libxl_uuid uuid; > libxl_cputopology *topology; > int rc = -ERROR_FAIL; > @@ -5975,7 +5975,7 @@ int main_cpupoolcreate(int argc, char ** > fprintf(stderr, "libxl_get_freecpus failed\n"); > goto out_cfg; > } > - if (libxl_cpumap_alloc(ctx, &cpumap, 0)) { > + if (libxl_cpu_bitmap_alloc(ctx, &cpumap, 0)) { > fprintf(stderr, "Failed to allocate cpumap\n"); > goto out_cfg; > } > @@ -5992,8 +5992,8 @@ int main_cpupoolcreate(int argc, char ** > n = atoi(buf); > for (i = 0; i < nr; i++) { > if ((topology[i].node == n) && > - libxl_cpumap_test(&freemap, i)) { > - libxl_cpumap_set(&cpumap, i); > + libxl_bitmap_test(&freemap, i)) { > + libxl_bitmap_set(&cpumap, i); > n_cpus++; > } > } > @@ -6011,11 +6011,11 @@ int main_cpupoolcreate(int argc, char ** > while ((buf = xlu_cfg_get_listitem(cpus, n_cpus)) != NULL) { > i = atoi(buf); > if ((i < 0) || (i >= freemap.size * 8) || > - !libxl_cpumap_test(&freemap, i)) { > + !libxl_bitmap_test(&freemap, i)) { > fprintf(stderr, "cpu %d illegal 
or not free\n", i); > goto out_cfg; > } > - libxl_cpumap_set(&cpumap, i); > + libxl_bitmap_set(&cpumap, i); > n_cpus++; > } > } else > @@ -6113,8 +6113,8 @@ int main_cpupoollist(int argc, char **ar > printf("%-19s", name); > free(name); > n = 0; > - libxl_for_each_cpu(c, poolinfo[p].cpumap) > - if (libxl_cpumap_test(&poolinfo[p].cpumap, c)) { > + libxl_for_each_bit(c, poolinfo[p].cpumap) > + if (libxl_bitmap_test(&poolinfo[p].cpumap, c)) { > if (n && opt_cpus) printf(","); > if (opt_cpus) printf("%d", c); > n++; > @@ -6313,7 +6313,7 @@ int main_cpupoolnumasplit(int argc, char > int n_cpus; > char name[16]; > libxl_uuid uuid; > - libxl_cpumap cpumap; > + libxl_bitmap cpumap; > libxl_cpupoolinfo *poolinfo; > libxl_cputopology *topology; > libxl_dominfo info; > @@ -6343,7 +6343,7 @@ int main_cpupoolnumasplit(int argc, char > return -ERROR_FAIL; > } > > - if (libxl_cpumap_alloc(ctx, &cpumap, 0)) { > + if (libxl_cpu_bitmap_alloc(ctx, &cpumap, 0)) { > fprintf(stderr, "Failed to allocate cpumap\n"); > libxl_cputopology_list_free(topology, n_cpus); > return -ERROR_FAIL; > @@ -6369,7 +6369,7 @@ int main_cpupoolnumasplit(int argc, char > for (c = 0; c < n_cpus; c++) { > if (topology[c].node == node) { > topology[c].node = LIBXL_CPUTOPOLOGY_INVALID_ENTRY; > - libxl_cpumap_set(&cpumap, n); > + libxl_bitmap_set(&cpumap, n); > n++; > } > } > @@ -6391,7 +6391,7 @@ int main_cpupoolnumasplit(int argc, char > fprintf(stderr, "failed to offline vcpus\n"); > goto out; > } > - libxl_cpumap_set_none(&cpumap); > + libxl_bitmap_set_none(&cpumap); > > for (c = 0; c < n_cpus; c++) { > if (topology[c].node == LIBXL_CPUTOPOLOGY_INVALID_ENTRY) { > @@ -6429,7 +6429,7 @@ int main_cpupoolnumasplit(int argc, char > > out: > libxl_cputopology_list_free(topology, n_cpus); > - libxl_cpumap_dispose(&cpumap); > + libxl_bitmap_dispose(&cpumap); > > return ret; > } > diff --git a/tools/python/xen/lowlevel/xl/xl.c b/tools/python/xen/lowlevel/xl/xl.c > --- a/tools/python/xen/lowlevel/xl/xl.c > +++ b/tools/python/xen/lowlevel/xl/xl.c > @@ -231,14 +231,14 @@ int attrib__libxl_cpuid_policy_list_set( > return -1; > } > > -int attrib__libxl_cpumap_set(PyObject *v, libxl_cpumap *pptr) > +int attrib__libxl_bitmap_set(PyObject *v, libxl_bitmap *pptr) > { > int i; > long cpu; > > for (i = 0; i < PyList_Size(v); i++) { > cpu = PyInt_AsLong(PyList_GetItem(v, i)); > - libxl_cpumap_set(pptr, cpu); > + libxl_bitmap_set(pptr, cpu); > } > return 0; > } > @@ -293,14 +293,14 @@ PyObject *attrib__libxl_cpuid_policy_lis > return NULL; > } > > -PyObject *attrib__libxl_cpumap_get(libxl_cpumap *pptr) > +PyObject *attrib__libxl_bitmap_get(libxl_bitmap *pptr) > { > PyObject *cpulist = NULL; > int i; > > cpulist = PyList_New(0); > - libxl_for_each_cpu(i, *pptr) { > - if ( libxl_cpumap_test(pptr, i) ) { > + libxl_for_each_bit(i, *pptr) { > + if ( libxl_bitmap_test(pptr, i) ) { > PyObject* pyint = PyInt_FromLong(i); > > PyList_Append(cpulist, pyint);
Ian Campbell
2012-Jul-06 10:40 UTC
Re: [PATCH 05 of 10 v3] libxl: expand the libxl_bitmap API a bit
On Wed, 2012-07-04 at 17:18 +0100, Dario Faggioli wrote:> # HG changeset patch > # User Dario Faggioli <raistlin@linux.it> > # Date 1341416323 -7200 > # Node ID 516eb90ec5599c8d44a5a5c109db9d8bbdb6ed09 > # Parent cfdd6d53f3dd3c6aa325fe6d8a17e4089daafae5 > libxl: expand the libxl_bitmap API a bit > > By adding copying and *_is_full/*_is_empty facilities. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>Acked-by: Ian Campbell <ian.campbell@citrix.com>> > --- > Changes from v2: > * added an assert for equal sizes in libxl_bitmap_copy(). > > Changes from v1: > * now libxl_is_full/empty return 1 if true and 0 if false, > as logic (and as requested during review). > > diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c > --- a/tools/libxl/libxl_utils.c > +++ b/tools/libxl/libxl_utils.c > @@ -510,6 +510,36 @@ void libxl_bitmap_dispose(libxl_bitmap * > free(map->map); > } > > +void libxl_bitmap_copy(libxl_ctx *ctx, libxl_bitmap *dptr, > + const libxl_bitmap *sptr) > +{ > + int sz; > + > + assert(dptr->size == sptr->size); > + sz = dptr->size = sptr->size; > + memcpy(dptr->map, sptr->map, sz * sizeof(*dptr->map)); > +} > + > +int libxl_bitmap_is_full(const libxl_bitmap *bitmap) > +{ > + int i; > + > + for (i = 0; i < bitmap->size; i++) > + if (bitmap->map[i] != (uint8_t)-1) > + return 0; > + return 1; > +} > + > +int libxl_bitmap_is_empty(const libxl_bitmap *bitmap) > +{ > + int i; > + > + for (i = 0; i < bitmap->size; i++) > + if (bitmap->map[i]) > + return 0; > + return 1; > +} > + > int libxl_bitmap_test(const libxl_bitmap *bitmap, int bit) > { > if (bit >= bitmap->size * 8) > diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h > --- a/tools/libxl/libxl_utils.h > +++ b/tools/libxl/libxl_utils.h > @@ -66,6 +66,10 @@ int libxl_vdev_to_device_disk(libxl_ctx > int libxl_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *bitmap, int n_bits); > /* Allocated bimap is from malloc, libxl_bitmap_dispose() to be > * called by the application when done. */ > +void libxl_bitmap_copy(libxl_ctx *ctx, libxl_bitmap *dptr, > + const libxl_bitmap *sptr); > +int libxl_bitmap_is_full(const libxl_bitmap *bitmap); > +int libxl_bitmap_is_empty(const libxl_bitmap *bitmap); > int libxl_bitmap_test(const libxl_bitmap *bitmap, int bit); > void libxl_bitmap_set(libxl_bitmap *bitmap, int bit); > void libxl_bitmap_reset(libxl_bitmap *bitmap, int bit);
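A small sketch of how the new helpers compose with the rest of the bitmap API (not part of the patch; it assumes an already-initialized libxl_ctx, and both maps are allocated with the same size, as the assert in libxl_bitmap_copy() requires):

    #include <assert.h>
    #include <libxl.h>
    #include <libxl_utils.h>

    static void bitmap_helpers_demo(libxl_ctx *ctx)
    {
        libxl_bitmap a, b;

        libxl_bitmap_init(&a);
        libxl_bitmap_init(&b);
        if (libxl_cpu_bitmap_alloc(ctx, &a, 0) ||
            libxl_cpu_bitmap_alloc(ctx, &b, 0))
            goto out;

        libxl_bitmap_set_none(&a);
        assert(libxl_bitmap_is_empty(&a));   /* 1: no bit is set */

        libxl_bitmap_set_any(&a);
        assert(libxl_bitmap_is_full(&a));    /* 1: every bit is set */

        libxl_bitmap_copy(ctx, &b, &a);      /* same size, so this is legal */
        assert(libxl_bitmap_is_full(&b));

    out:
        libxl_bitmap_dispose(&a);
        libxl_bitmap_dispose(&b);
    }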
Ian Campbell
2012-Jul-06 10:55 UTC
Re: [PATCH 08 of 10 v3] libxl: enable automatic placement of guests on NUMA nodes
On Wed, 2012-07-04 at 17:18 +0100, Dario Faggioli wrote:> # HG changeset patch > # User Dario Faggioli <raistlin@linux.it> > # Date 1341416324 -7200 > # Node ID 7087d3622ee2051654c9e78fe4829da10c2d46f1 > # Parent 6fd693e7f3bc8b4d9bd20befff2c13de5591a7c5 > libxl: enable automatic placement of guests on NUMA nodes > > If a domain does not have a VCPU affinity, try to pin it automatically to some > PCPUs. This is done taking into account the NUMA characteristics of the host. > In fact, we look for a combination of host''s NUMA nodes with enough free memory > and number of PCPUs for the new domain, and pin it to the VCPUs of those nodes. > > Once we know which ones, among all the possible combinations, represents valid > placement candidates for a domain, use some heuistics for deciding which is the > best. For instance, smaller candidates are considered to be better, both from > the domain''s point of view (fewer memory spreading among nodes) and from the > system as a whole point of view (fewer memoy fragmentation). In case of > candidates of equal sizes (i.e., with the same number of nodes), the amount of > free memory and the number of domain already assigned to their nodes are > considered. Very often, candidates with greater amount of memory are the one > we wants, as this is also good for keeping memory fragmentation under control. > However, if the difference in how much free memory two candidates have, the > number of assigned domains might be what decides which candidate wins.I can''t parse this last sentence. Are there some words missing after "how much free memory two candidates have"? If you want to post the corrected text I think we can fold it in while applying rather than reposting (assuming no other reason to repost crops up).> This all happens internally to libxl, and no API for driving the mechanism is > provided for now. This matches what xend already does. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > Acked-by: George Dunlap <george.dunlap@eu.citrix.com> > > --- > Changes from v2: > * lots of typos. > * Clayfied some comments, as requested during (ijc''s) review. > * Added some more information/reference for the combination generation > algorithm. > * nodemap_to_nodes_cpus() function renamed to nodemap_to_nr_cpus(). > * libxl_bitmap_init() used to make sure we do not try to free random > memory on failure paths of functions that allocates a libxl_bitmap. > * Always invoke libxl__sort_numa_candidates(), even if we get there > with just 1 candidate, as requested during review. > * Simplified the if-s that check for input parameter consistency in > libxl__get_numa_candidates() as requested during (gwd''s) review. > * Comparison function for candidates changed so that it now provides > total ordering, as requested during review. It is still using FP > arithmetic, though. Also I think that just putting the difference > between the amount of free memory and between the number of assigned > domains of two candidates in a single formula (after normalizing and > weighting them) is both clear and effective enough. > * Function definitions moved to a numa specific source file (libxl_numa.c), > as suggested during review. > > > Changes from v1: > * This patches incorporates the changes from both "libxl, xl: enable automatic > placement of guests on NUMA nodes" and "libxl, xl: heuristics for reordering > NUMA placement candidates" from v1. 
> * The logic of the algorithm is basically the same as in v1, but the splitting > of it in the various functions has been completely redesigned from scratch. > * No public API for placement or candidate generation is now exposed, > everything happens within libxl, as agreed during v1 review. > * The relevant documentation have been moved near the actual functions and > features. Also, the amount and (hopefully!) the quality of the documentation > has been improved a lot, as requested. > * All the comments about using the proper libxl facilities and helpers for > allocations, etc., have been considered and applied. > * This patch still bails out from NUMA optimizations if it find out cpupools > are being utilized. It is next patch that makes the two things interact > properly, as suggested during review. > > diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 > --- a/docs/man/xl.cfg.pod.5 > +++ b/docs/man/xl.cfg.pod.5 > @@ -111,8 +111,8 @@ created online and the remainder will be > > =item B<cpus="CPU-LIST"> > > -List of which cpus the guest is allowed to use. Default behavior is > -`all cpus`. A C<CPU-LIST> may be specified as follows: > +List of which cpus the guest is allowed to use. By default xl will (via > +libxl) pick some cpus (see below). A C<CPU-LIST> may be specified as follows: > > =over 4 > > @@ -132,6 +132,12 @@ run on cpu #3 of the host. > > =back > > +If this option is not specified, libxl automatically tries to place the new > +domain on the host''s NUMA nodes (provided the host has more than one NUMA > +node) by pinning it to the cpus of those nodes. A heuristic approach is > +utilized with the goals of maximizing performance for the domain and, at > +the same time, achieving efficient utilization of the host''s CPUs and RAM. > + > =item B<cpu_weight=WEIGHT> > > A domain with a weight of 512 will get twice as much CPU as a domain > diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile > --- a/tools/libxl/Makefile > +++ b/tools/libxl/Makefile > @@ -66,7 +66,7 @@ LIBXL_LIBS += -lyajl -lm > LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \ > libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \ > libxl_internal.o libxl_utils.o libxl_uuid.o \ > - libxl_json.o libxl_aoutils.o \ > + libxl_json.o libxl_aoutils.o libxl_numa.o \ > libxl_save_callout.o _libxl_save_msgs_callout.o \ > libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y) > LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c > --- a/tools/libxl/libxl_dom.c > +++ b/tools/libxl/libxl_dom.c > @@ -98,6 +98,106 @@ out: > return sched; > } > > +/* Subtract two values and translate the result in [0, 1] */ > +static double normalized_diff(double a, double b) > +{ > +#define max(a, b) (a > b ? a : b) > + if (!a && a == b) > + return 0.0; > + return (a - b) / max(a, b); > +}I think this actually returns a result in [-1,1] rather than [0,1] like the comment says. Considering the a==1 b==2 case -> ( 1 - 2 ) / max(1, 2) => -1 / 2 => -0.5 Is it the comment or the code which is wrong? (From the following numa_cmpf I think the comment)> + > +/* > + * The NUMA placement candidates are reordered according to the following > + * heuristics: > + * - candidates involving fewer nodes come first. In case two (or > + * more) candidates span the same number of nodes, > + * - the amount of free memory and the number of domains assigned to the > + * candidates are considered. 
In doing that, candidates with greater > + * amount of free memory and fewer domains assigned to them are preferred, > + * with free memory "weighting" three times as much as number of domains. > + */ > +static int numa_cmpf(const void *v1, const void *v2) > +{ > + const libxl__numa_candidate *c1 = v1; > + const libxl__numa_candidate *c2 = v2; > +#define sign(a) a > 0 ? 1 : a < 0 ? -1 : 0Does the caller of numa_cmpf rely on the result being specifically -1, 0 or +1? Usually such functions only care about -ve, 0 or +ve. Or maybe this is a double->int conversion thing? I guess (int)0.1 == 0 rather than +ve like you''d want? Yes, I suppose that''s it, nevermind. I might have written the floating point zeroes as 0.0 to make it clearer rather than relying on automatic promotion.> + double freememkb_diff = normalized_diff(c2->free_memkb, c1->free_memkb); > + double nrdomains_diff = normalized_diff(c1->nr_domains, c2->nr_domains); > + > + if (c1->nr_nodes != c2->nr_nodes) > + return c1->nr_nodes - c2->nr_nodes; > + > + return sign(3*freememkb_diff + nrdomains_diff); > +} > + > +/* The actual automatic NUMA placement routine */ > +static int numa_place_domain(libxl__gc *gc, libxl_domain_build_info *info)I didn''t review the rest -- George already acked it. Ian.
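For reference, both points above are easy to check with a few standalone lines of C. The helpers below are adapted from the quoted patch (with parentheses added to the macros); the values in the comments are what the expressions evaluate to, not output taken from an actual libxl run.

#include <stdio.h>

#define max(a, b) ((a) > (b) ? (a) : (b))
#define sign(a)   ((a) > 0 ? 1 : (a) < 0 ? -1 : 0)

/* Same body as the quoted helper */
static double normalized_diff(double a, double b)
{
    if (!a && a == b)
        return 0.0;
    return (a - b) / max(a, b);
}

int main(void)
{
    /* a=1, b=2 gives -0.5, so the range really is [-1, 1], not [0, 1] */
    printf("%f\n", normalized_diff(1, 2));

    /* why sign() matters: a small positive double truncates to 0 ... */
    printf("%d\n", (int)0.1);      /* 0  */
    /* ... while sign() keeps the ordering information */
    printf("%d\n", sign(0.1));     /* +1 */
    return 0;
}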
Ian Campbell
2012-Jul-06 11:16 UTC
Re: [PATCH 00 of 10 v3] Automatic NUMA placement for xl
On Wed, 2012-07-04 at 17:17 +0100, Dario Faggioli wrote:> Hello, > > Third version of the NUMA placement series Xen 4.2.I''m afraid I get a segfault with this: quartz:~# xl dest d32-1 ; gdb --args xl cr /etc/xen/debian-x86_32p-1 [...] (gdb) r Starting program: /usr/sbin/xl cr /etc/xen/debian-x86_32p-1 [Thread debugging using libthread_db enabled] Parsing config from /etc/xen/debian-x86_32p-1 Program received signal SIGSEGV, Segmentation fault. *__GI___libc_free (mem=0x1) at malloc.c:3710 3710 malloc.c: No such file or directory. in malloc.c (gdb) bt #0 *__GI___libc_free (mem=0x1) at malloc.c:3710 #1 0xb7fa8b78 in libxl_bitmap_dispose (map=0xbffff170) at libxl_utils.c:510 #2 0xb7fadbde in libxl__get_numa_candidates (gc=0x806b7a0, min_free_memkb=141312, min_cpus=4, min_nodes=0, max_nodes=0, suitable_cpumap=0xbffff1f4, cndts=0xbffff208, nr_cndts=0xbffff20c) at libxl_numa.c:397 #3 0xb7fa45ec in numa_place_domain (gc=0x806b7a0, domid=8, info=0xbffff5dc, state=0x806a954) at libxl_dom.c:169 #4 libxl__build_pre (gc=0x806b7a0, domid=8, info=0xbffff5dc, state=0x806a954) at libxl_dom.c:232 #5 0xb7f98234 in libxl__domain_build (gc=0x806b7a0, info=0xbffff5dc, domid=8, state=0x806a954) at libxl_create.c:320 #6 0xb7f9859f in domcreate_bootloader_done (egc=0xbffff43c, bl=0x806a998, rc=0) at libxl_create.c:695 #7 0xb7fb5e50 in bootloader_callback (egc=<value optimized out>, bl=0x806a998, rc=0) at libxl_bootloader.c:256 #8 0xb7fb7482 in libxl__bootloader_run (egc=0xbffff43c, bl=0x806a998) at libxl_bootloader.c:394 #9 0xb7f99535 in initiate_domain_create (ctx=<value optimized out>, d_config=<value optimized out>, domid=0x8068354, restore_fd=-1, ao_how=0x0, aop_console_how=0x0) at libxl_create.c:635 #10 do_domain_create (ctx=<value optimized out>, d_config=<value optimized out>, domid=0x8068354, restore_fd=-1, ao_how=0x0, aop_console_how=0x0) at libxl_create.c:1039 #11 0xb7f9966f in libxl_domain_create_new (ctx=0x8069030, d_config=0xbffff5ac, domid=0x8068354, ao_how=0x0, aop_console_how=0x0) at libxl_create.c:1062 #12 0x0805c479 in create_domain (dom_info=<value optimized out>) at xl_cmdimpl.c:1809 #13 0x0805dd13 in main_create (argc=2, argv=0xbffffd28) at xl_cmdimpl.c:3774 #14 0x0804d1d6 in main (argc=3, argv=0xbffffd24) at xl.c:263 (gdb) frame 1 #1 0xb7fa8b78 in libxl_bitmap_dispose (map=0xbffff170) at libxl_utils.c:510 510 libxl_utils.c: No such file or directory. in libxl_utils.c (gdb) print *map $2 = {size = 3221221764, map = 0x1 <Address 0x1 out of bounds>} (gdb) frame 2 #2 0xb7fadbde in libxl__get_numa_candidates (gc=0x806b7a0, min_free_memkb=141312, min_cpus=4, min_nodes=0, max_nodes=0, suitable_cpumap=0xbffff1f4, cndts=0xbffff208, nr_cndts=0xbffff20c) at libxl_numa.c:397 397 libxl_numa.c: No such file or directory. in libxl_numa.c (gdb) print suitable_nodemap $3 = {size = 3221221764, map = 0x1 <Address 0x1 out of bounds>} (gdb) print nodemap $4 = {size = 0, map = 0x0} So it looks like suitable_nodemap wasn''t initialised? There are a few "goto out"s before initialising that variable, but none of them log (really they should) and I didn''t investigate which one it was yet. Ian.
Ian Campbell
2012-Jul-06 11:20 UTC
Re: [PATCH 00 of 10 v3] Automatic NUMA placement for xl
On Fri, 2012-07-06 at 12:16 +0100, Ian Campbell wrote:
> There are a few "goto out"s before initialising that variable, but none
> of them log (really they should) and I didn't investigate which one it
> was yet.

It seems to be:

    /* If we don't have at least 2 nodes, it is useless to proceed */
    if (nr_nodes < 2) {
        LOG(DEBUG, "only %d node. no placement required", nr_nodes);
        rc = 0;
        goto out;
    }

(LOG is mine...). The other exit paths look like they log further down the stack but not this one, so it is worth adding.

You probably want another libxl_bitmap_init near the top of the function.

Ian.
Ian Campbell
2012-Jul-06 11:22 UTC
Re: [PATCH 00 of 10 v3] Automatic NUMA placement for xl
On Fri, 2012-07-06 at 12:20 +0100, Ian Campbell wrote:
> You probably want another libxl_bitmap_init near the top of the
> function.

Works for me:

# HG changeset patch
# User Ian Campbell <ian.campbell@citrix.com>
# Date 1341573735 -3600
# Node ID 4f964e2446c935838f54b9cae48a6c62fd8de3d0
# Parent 124ddd91c8de38204e94d3125013a40aaa326774
[mq]: libxl-numa-place-segfault.patch

diff -r 124ddd91c8de -r 4f964e2446c9 tools/libxl/libxl_numa.c
--- a/tools/libxl/libxl_numa.c  Wed Jul 04 17:38:44 2012 +0200
+++ b/tools/libxl/libxl_numa.c  Fri Jul 06 12:22:15 2012 +0100
@@ -258,6 +258,7 @@ int libxl__get_numa_candidates(libxl__gc
     libxl_bitmap suitable_nodemap, nodemap;
     int array_size, rc;
 
+    libxl_bitmap_init(&suitable_nodemap);
     libxl_bitmap_init(&nodemap);
 
     /* Get platform info and prepare the map for testing the combinations */
@@ -266,6 +267,7 @@ int libxl__get_numa_candidates(libxl__gc
         return ERROR_FAIL;
     /* If we don't have at least 2 nodes, it is useless to proceed */
     if (nr_nodes < 2) {
+        LOG(DEBUG, "only %d node. no placement required", nr_nodes);
         rc = 0;
         goto out;
     }
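The fix works because libxl_bitmap_init() leaves the structure in a state that libxl_bitmap_dispose() can handle safely, so the usual idiom is to initialise every disposable local before the first path that can reach the common exit label. A minimal sketch of that shape (the helper some_early_check() is made up purely for illustration):

int some_numa_helper(libxl__gc *gc)
{
    libxl_bitmap suitable_nodemap, nodemap;
    int rc;

    /* init first, so an early "goto out" cannot dispose garbage */
    libxl_bitmap_init(&suitable_nodemap);
    libxl_bitmap_init(&nodemap);

    rc = some_early_check(gc);        /* hypothetical early-exit path */
    if (rc)
        goto out;

    /* ... allocate the bitmaps and do the real work ... */
    rc = 0;

 out:
    libxl_bitmap_dispose(&suitable_nodemap);
    libxl_bitmap_dispose(&nodemap);
    return rc;
}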
George Dunlap
2012-Jul-06 11:30 UTC
Re: [PATCH 08 of 10 v3] libxl: enable automatic placement of guests on NUMA nodes
On 04/07/12 17:18, Dario Faggioli wrote:> # HG changeset patch > # User Dario Faggioli<raistlin@linux.it> > # Date 1341416324 -7200 > # Node ID 7087d3622ee2051654c9e78fe4829da10c2d46f1 > # Parent 6fd693e7f3bc8b4d9bd20befff2c13de5591a7c5 > libxl: enable automatic placement of guests on NUMA nodes > > If a domain does not have a VCPU affinity, try to pin it automatically to some > PCPUs. This is done taking into account the NUMA characteristics of the host. > In fact, we look for a combination of host''s NUMA nodes with enough free memory > and number of PCPUs for the new domain, and pin it to the VCPUs of those nodes. > > Once we know which ones, among all the possible combinations, represents valid > placement candidates for a domain, use some heuistics for deciding which is the > best. For instance, smaller candidates are considered to be better, both from > the domain''s point of view (fewer memory spreading among nodes) and from the > system as a whole point of view (fewer memoy fragmentation). In case of > candidates of equal sizes (i.e., with the same number of nodes), the amount of > free memory and the number of domain already assigned to their nodes are > considered. Very often, candidates with greater amount of memory are the one > we wants, as this is also good for keeping memory fragmentation under control. > However, if the difference in how much free memory two candidates have, the > number of assigned domains might be what decides which candidate wins. > > This all happens internally to libxl, and no API for driving the mechanism is > provided for now. This matches what xend already does. > > Signed-off-by: Dario Faggioli<dario.faggioli@citrix.com> > Acked-by: George Dunlap<george.dunlap@eu.citrix.com>One question I have: Is there any particular reason to sort the whole list, rather than just finding the maximum based on the comparison function? But I think it''s been a long time and it looks good enough to me: Acked-by: George Dunlap <george.dunlap@eu.citrix.com>> > --- > Changes from v2: > * lots of typos. > * Clayfied some comments, as requested during (ijc''s) review. > * Added some more information/reference for the combination generation > algorithm. > * nodemap_to_nodes_cpus() function renamed to nodemap_to_nr_cpus(). > * libxl_bitmap_init() used to make sure we do not try to free random > memory on failure paths of functions that allocates a libxl_bitmap. > * Always invoke libxl__sort_numa_candidates(), even if we get there > with just 1 candidate, as requested during review. > * Simplified the if-s that check for input parameter consistency in > libxl__get_numa_candidates() as requested during (gwd''s) review. > * Comparison function for candidates changed so that it now provides > total ordering, as requested during review. It is still using FP > arithmetic, though. Also I think that just putting the difference > between the amount of free memory and between the number of assigned > domains of two candidates in a single formula (after normalizing and > weighting them) is both clear and effective enough. > * Function definitions moved to a numa specific source file (libxl_numa.c), > as suggested during review. > > > Changes from v1: > * This patches incorporates the changes from both "libxl, xl: enable automatic > placement of guests on NUMA nodes" and "libxl, xl: heuristics for reordering > NUMA placement candidates" from v1. 
> * The logic of the algorithm is basically the same as in v1, but the splitting > of it in the various functions has been completely redesigned from scratch. > * No public API for placement or candidate generation is now exposed, > everything happens within libxl, as agreed during v1 review. > * The relevant documentation have been moved near the actual functions and > features. Also, the amount and (hopefully!) the quality of the documentation > has been improved a lot, as requested. > * All the comments about using the proper libxl facilities and helpers for > allocations, etc., have been considered and applied. > * This patch still bails out from NUMA optimizations if it find out cpupools > are being utilized. It is next patch that makes the two things interact > properly, as suggested during review. > > diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 > --- a/docs/man/xl.cfg.pod.5 > +++ b/docs/man/xl.cfg.pod.5 > @@ -111,8 +111,8 @@ created online and the remainder will be > > =item B<cpus="CPU-LIST"> > > -List of which cpus the guest is allowed to use. Default behavior is > -`all cpus`. A C<CPU-LIST> may be specified as follows: > +List of which cpus the guest is allowed to use. By default xl will (via > +libxl) pick some cpus (see below). A C<CPU-LIST> may be specified as follows: > > =over 4 > > @@ -132,6 +132,12 @@ run on cpu #3 of the host. > > =back > > +If this option is not specified, libxl automatically tries to place the new > +domain on the host''s NUMA nodes (provided the host has more than one NUMA > +node) by pinning it to the cpus of those nodes. A heuristic approach is > +utilized with the goals of maximizing performance for the domain and, at > +the same time, achieving efficient utilization of the host''s CPUs and RAM. > + > =item B<cpu_weight=WEIGHT> > > A domain with a weight of 512 will get twice as much CPU as a domain > diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile > --- a/tools/libxl/Makefile > +++ b/tools/libxl/Makefile > @@ -66,7 +66,7 @@ LIBXL_LIBS += -lyajl -lm > LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \ > libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \ > libxl_internal.o libxl_utils.o libxl_uuid.o \ > - libxl_json.o libxl_aoutils.o \ > + libxl_json.o libxl_aoutils.o libxl_numa.o \ > libxl_save_callout.o _libxl_save_msgs_callout.o \ > libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y) > LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c > --- a/tools/libxl/libxl_dom.c > +++ b/tools/libxl/libxl_dom.c > @@ -98,6 +98,106 @@ out: > return sched; > } > > +/* Subtract two values and translate the result in [0, 1] */ > +static double normalized_diff(double a, double b) > +{ > +#define max(a, b) (a> b ? a : b) > + if (!a&& a == b) > + return 0.0; > + return (a - b) / max(a, b); > +} > + > +/* > + * The NUMA placement candidates are reordered according to the following > + * heuristics: > + * - candidates involving fewer nodes come first. In case two (or > + * more) candidates span the same number of nodes, > + * - the amount of free memory and the number of domains assigned to the > + * candidates are considered. In doing that, candidates with greater > + * amount of free memory and fewer domains assigned to them are preferred, > + * with free memory "weighting" three times as much as number of domains. 
> + */ > +static int numa_cmpf(const void *v1, const void *v2) > +{ > + const libxl__numa_candidate *c1 = v1; > + const libxl__numa_candidate *c2 = v2; > +#define sign(a) a> 0 ? 1 : a< 0 ? -1 : 0 > + double freememkb_diff = normalized_diff(c2->free_memkb, c1->free_memkb); > + double nrdomains_diff = normalized_diff(c1->nr_domains, c2->nr_domains); > + > + if (c1->nr_nodes != c2->nr_nodes) > + return c1->nr_nodes - c2->nr_nodes; > + > + return sign(3*freememkb_diff + nrdomains_diff); > +} > + > +/* The actual automatic NUMA placement routine */ > +static int numa_place_domain(libxl__gc *gc, libxl_domain_build_info *info) > +{ > + int nr_candidates = 0; > + libxl__numa_candidate *candidates = NULL; > + libxl_bitmap candidate_nodemap; > + libxl_cpupoolinfo *pinfo; > + int nr_pools, rc = 0; > + uint32_t memkb; > + > + libxl_bitmap_init(&candidate_nodemap); > + > + /* First of all, if cpupools are in use, better not to mess with them */ > + pinfo = libxl_list_cpupool(CTX,&nr_pools); > + if (!pinfo) > + return ERROR_FAIL; > + if (nr_pools> 1) { > + LOG(NOTICE, "skipping NUMA placement as cpupools are in use"); > + goto out; > + } > + > + rc = libxl_domain_need_memory(CTX, info,&memkb); > + if (rc) > + goto out; > + if (libxl_node_bitmap_alloc(CTX,&candidate_nodemap, 0)) { > + rc = ERROR_FAIL; > + goto out; > + } > + > + /* Find all the candidates with enough free memory and at least > + * as much pcpus as the domain has vcpus. */ > + rc = libxl__get_numa_candidates(gc, memkb, info->max_vcpus, 0, 0, > +&candidates,&nr_candidates); > + if (rc) > + goto out; > + > + LOG(DETAIL, "%d NUMA placement candidates found", nr_candidates); > + > + /* No suitable placement candidates. We just return without touching the > + * domain''s info->cpumap. It will have affinity with all nodes/cpus. */ > + if (nr_candidates == 0) { > + LOG(NOTICE, "NUMA placement failed, performance might be affected"); > + goto out; > + } > + > + /* Bring the best candidate in front of the list --> candidates[0] */ > + libxl__sort_numa_candidates(candidates, nr_candidates, numa_cmpf); > + > + /* > + * At this point, the first candidate in the array is the one we want. > + * Go for it by mapping its node map to the domain''s info->cpumap. > + */ > + libxl__numa_candidate_get_nodemap(gc,&candidates[0],&candidate_nodemap); > + rc = libxl_nodemap_to_cpumap(CTX,&candidate_nodemap,&info->cpumap); > + if (rc) > + goto out; > + > + LOG(DETAIL, "NUMA placement candidate with %d nodes, %d cpus and " > + "%"PRIu32" KB free selected", candidates[0].nr_nodes, > + candidates[0].nr_cpus, candidates[0].free_memkb / 1024); > + > + out: > + libxl_bitmap_dispose(&candidate_nodemap); > + libxl_cpupoolinfo_list_free(pinfo, nr_pools); > + return rc; > +} > + > int libxl__build_pre(libxl__gc *gc, uint32_t domid, > libxl_domain_build_info *info, libxl__domain_build_state *state) > { > @@ -107,7 +207,22 @@ int libxl__build_pre(libxl__gc *gc, uint > uint32_t rtc_timeoffset; > > xc_domain_max_vcpus(ctx->xch, domid, info->max_vcpus); > + > + /* > + * Check if the domain has any CPU affinity. If not, try to build up one. > + * In case numa_place_domain() find at least a suitable candidate, it will > + * affect info->cpumap accordingly; if it does not, it just leaves it > + * as it is. This means (unless some weird error manifests) the subsequent > + * call to libxl_set_vcpuaffinity_all() will do the actual placement, > + * whatever that turns out to be. 
> + */ > + if (libxl_bitmap_is_full(&info->cpumap)) { > + int rc = numa_place_domain(gc, info); > + if (rc) > + return rc; > + } > libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus,&info->cpumap); > + > xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT); > if (info->type == LIBXL_DOMAIN_TYPE_PV) > xc_domain_set_memmap_limit(ctx->xch, domid, > diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h > --- a/tools/libxl/libxl_internal.h > +++ b/tools/libxl/libxl_internal.h > @@ -2216,6 +2216,140 @@ static inline void libxl__ctx_unlock(lib > #define CTX_LOCK (libxl__ctx_lock(CTX)) > #define CTX_UNLOCK (libxl__ctx_unlock(CTX)) > > +/* > + * Automatic NUMA placement > + * > + * These functions and data structures deal with the initial placement of a > + * domain onto the host NUMA nodes. > + * > + * The key concept here is the one of "NUMA placement candidate", which is > + * basically a set of nodes whose characteristics have been successfully > + * checked against some specific requirements. More precisely, a candidate is > + * the nodemap associated with one of the possible subset of the host NUMA > + * nodes providing a certain amount of free memory, or a given number of cpus, > + * or even both (depending in what the caller wants). For convenience of use, > + * some of this information are stored within the candidate itself, instead of > + * always being dynamically computed. A single node is a valid placement > + * candidate, but it is also possible for a candidate to contain all the nodes > + * of the host. The fewer nodes there are in a candidate, the better > + * performance a domain placed onto it should get. For instance, looking for a > + * numa candidates with 2GB of free memory means we want all the possible > + * subsets of the host NUMA nodes with, cumulatively, at least 2GB of free > + * memory. That could be possible by just using one particular node, or may > + * require more nodes, depending on the characteristics of the host, on how > + * many domains have been created already, on how big they are, etc. > + * > + * The intended usage is as follows: > + * 1. by, fist of all, calling libxl__get_numa_candidates(), and specifying > + * the proper constraints to it (e.g., the amount of memory a domain need > + * as the minimum amount of free memory for the candidates) one can build > + * up a whole set of suitable placing alternatives for a domain; > + * 2. after that, one specific candidate should be chosen. That can happen > + * by looking at their various characteristics; > + * 3. the chosen candidate''s nodemap should be utilized for computing the > + * actual affinity of the domain which, given the current NUMA support > + * in the hypervisor, is what determines the placement of the domain''s > + * vcpus and memory. > + * > + * To make phase 2 even easier, a sorting helper function for the list of > + * candidates is provided in the form of libxl__sort_numa_candidates(). The > + * only that is needed is defining a comparison function, containing the > + * criteria for deciding, given two candidates, which one is ''better''. > + * Depending on how the comparison function is defined, the best candidate > + * (where, of course, best is defined with respect to the heuristics > + * implemented in the comparison function itself, libxl__numa_candidate_cmpf()) > + * could become the first or the last element of the list. 
> + * > + * Summarizing, achieving automatic NUMA placement is just a matter of > + * obtaining the list of suitable placement candidates, perhaps asking for each > + * of them to provide at least the amount of memory the domain needs. After > + * that just implement a comparison function by means of the various helpers > + * retrieving the relevant information about the candidates themselves. > + * Finally, call the sorting helper function and use the candidate that became > + * (typically) the first element of the list for determining the domain''s > + * affinity. > + */ > + > +typedef struct { > + int nr_cpus, nr_nodes; > + int nr_domains; > + uint32_t free_memkb; > + libxl_bitmap nodemap; > +} libxl__numa_candidate; > + > +/* > + * This generates the list of NUMA placement candidates satisfying some > + * specific conditions. If min_nodes and/or max_nodes are not 0, their value is > + * used to determine the minimum and maximum number of nodes that are allow to > + * be present in each candidate. If min_nodes and/or max_nodes are 0, the > + * minimum and maximum number of nodes to be used are automatically selected by > + * the implementation (and that will likely be just 1 node for the minimum and > + * the total number of existent nodes for the maximum). Re min_free_memkb and > + * min_cpu, if not 0, it means the caller only wants candidates with at > + * least that amount of free memory and that number of cpus, respectively. If > + * min_free_memkb and/or min_cpus are 0, the candidates'' free memory and number > + * of cpus won''t be checked at all, which means a candidate will always be > + * considered suitable wrt the specific constraint. cndts is where the list of > + * exactly nr_cndts candidates is returned. Note that, in case no candidates > + * are found at all, the function returns successfully, but with nr_cndts equal > + * to zero. 
> + */ > +_hidden int libxl__get_numa_candidates(libxl__gc *gc, > + uint32_t min_free_memkb, int min_cpus, > + int min_nodes, int max_nodes, > + libxl__numa_candidate *cndts[], int *nr_cndts); > + > +/* Initialization, allocation and deallocation for placement candidates */ > +static inline void libxl__numa_candidate_init(libxl__numa_candidate *cndt) > +{ > + cndt->free_memkb = 0; > + cndt->nr_cpus = cndt->nr_nodes = cndt->nr_domains = 0; > + libxl_bitmap_init(&cndt->nodemap); > +} > + > +static inline int libxl__numa_candidate_alloc(libxl__gc *gc, > + libxl__numa_candidate *cndt) > +{ > + return libxl_node_bitmap_alloc(CTX,&cndt->nodemap, 0); > +} > +static inline void libxl__numa_candidate_dispose(libxl__numa_candidate *cndt) > +{ > + libxl_bitmap_dispose(&cndt->nodemap); > +} > +static inline void libxl__numacandidate_list_free(libxl__numa_candidate *cndts, > + int nr_cndts) > +{ > + int i; > + > + for (i = 0; i< nr_cndts; i++) > + libxl__numa_candidate_dispose(&cndts[i]); > + free(cndts); > +} > + > +/* Retrieve (in nodemap) the node map associated to placement candidate cndt */ > +static inline > +void libxl__numa_candidate_get_nodemap(libxl__gc *gc, > + const libxl__numa_candidate *cndt, > + libxl_bitmap *nodemap) > +{ > + libxl_bitmap_copy(CTX, nodemap,&cndt->nodemap); > +} > +/* Set the node map of placement candidate cndt to match nodemap */ > +static inline > +void libxl__numa_candidate_put_nodemap(libxl__gc *gc, > + libxl__numa_candidate *cndt, > + const libxl_bitmap *nodemap) > +{ > + libxl_bitmap_copy(CTX,&cndt->nodemap, nodemap); > +} > + > +/* Signature for the comparison function between two candidates c1 and c2 */ > +typedef int (*libxl__numa_candidate_cmpf)(const void *v1, const void *v2); > +/* Sort the list of candidates in cndts (an array with nr_cndts elements in > + * it) using cmpf for comparing two candidates. Uses libc''s qsort(). */ > +_hidden void libxl__sort_numa_candidates(libxl__numa_candidate cndts[], > + int nr_cndts, > + libxl__numa_candidate_cmpf cmpf); > > /* > * Inserts "elm_new" into the sorted list "head". > diff --git a/tools/libxl/libxl_numa.c b/tools/libxl/libxl_numa.c > new file mode 100644 > --- /dev/null > +++ b/tools/libxl/libxl_numa.c > @@ -0,0 +1,382 @@ > +/* > + * Copyright (C) 2012 Citrix Ltd. > + * Author Dario Faggioli<dario.faggioli@citrix.com> > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU Lesser General Public License as published > + * by the Free Software Foundation; version 2.1 only. with the special > + * exception on linking described in file LICENSE. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU Lesser General Public License for more details. > + */ > + > +#include "libxl_osdeps.h" /* must come before any other headers */ > + > +#include<glob.h> > + > +#include "libxl_internal.h" > + > +/* > + * What follows are helpers for generating all the k-combinations > + * without repetitions of a set S with n elements in it. Formally > + * speaking, they are subsets of k distinct elements of S and, if > + * S is n elements big, the number of k-combinations is equal to > + * the binomial coefficient (n k), which means we get exactly > + * n!/(k! * (n - k)!) subsets, all of them with k elements. 
> + * > + * The various subset are generated one after the other by calling > + * comb_init() first, and, after that, comb_next() > + * (n k)-1 times. An iterator is used to store the current status > + * of the whole generation operation (i.e., basically, the last > + * combination that has been generated). As soon as all > + * combinations have been generated, comb_next() will > + * start returning 0 instead of 1. It is of course important that > + * the same instance of the iterator and the same values for > + * n and k are used for each call. If that doesn''t happen, the > + * result is unspecified. > + * > + * The algorithm is a well known one (see, for example, D. Knuth''s "The > + * Art of Computer Programming - Volume 4, Fascicle 3" and it produces > + * the combinations in such a way that they (well, more precisely, > + * their indexes it the array/map representing the set) come with > + * lexicographic ordering. > + * > + * For example, with n = 5 and k = 3, calling comb_init() > + * will generate { 0, 1, 2 }, while subsequent valid calls to > + * comb_next() will produce the following: > + * { { 0, 1, 3 }, { 0, 1, 4 }, > + * { 0, 2, 3 }, { 0, 2, 4 }, { 0, 3, 4 }, > + * { 1, 2, 3 }, { 1, 2, 4 }, { 1, 3, 4 }, > + * { 2, 3, 4 } } > + * > + * This is used by the automatic NUMA placement logic below. > + */ > +typedef int* comb_iter_t; > + > +static int comb_init(libxl__gc *gc, comb_iter_t *it, int n, int k) > +{ > + comb_iter_t new_iter; > + int i; > + > + if (n< k) > + return 0; > + > + /* First set is always { 0, 1, 2, ..., k-1 } */ > + GCNEW_ARRAY(new_iter, k); > + for (i = 0; i< k; i++) > + new_iter[i] = i; > + > + *it = new_iter; > + return 1; > +} > + > +static int comb_next(comb_iter_t it, int n, int k) > +{ > + int i; > + > + /* > + * The idea here is to find the leftmost element from where > + * we should start incrementing the indexes of the iterator. > + * This means looking for the highest index that can be increased > + * while still producing value smaller than n-1. In the example > + * above, when dealing with { 0, 1, 4 }, such an element is the > + * second one, as the third is already equal to 4 (which actually > + * is n-1). > + * Once we found from where to start, we increment that element > + * and override the right-hand rest of the iterator with its > + * successors, thus achieving lexicographic ordering. > + * > + * Regarding the termination of the generation process, when we > + * manage in bringing n-k at the very first position of the iterator, > + * we know that is the last valid combination ( { 2, 3, 4 }, with > + * n - k = 5 - 2 = 2, in the example above), and thus we start > + * returning 0 as soon as we cross that border. > + */ > + for (i = k - 1; it[i] == n - k + i; i--) { > + if (i<= 0) > + return 0; > + } > + for (it[i]++, i++; i< k; i++) > + it[i] = it[i - 1] + 1; > + return 1; > +} > + > +/* NUMA automatic placement (see libxl_internal.h for details) */ > + > +/* > + * This function turns a k-combination iterator into a node map. > + * This means the bits in the node map corresponding to the indexes > + * of the given combination are the ones that will be set. > + * For example, if the iterator represents the combination { 0, 2, 4}, > + * the node map will have bits #0, #2 and #4 set. 
> + */ > +static void comb_get_nodemap(comb_iter_t it, libxl_bitmap *nodemap, int k) > +{ > + int i; > + > + libxl_bitmap_set_none(nodemap); > + for (i = 0; i< k; i++) > + libxl_bitmap_set(nodemap, it[i]); > +} > + > +/* Retrieve the number of cpus that the nodes that are part of the nodemap > + * span. */ > +static int nodemap_to_nr_cpus(libxl_cputopology *tinfo, int nr_cpus, > + const libxl_bitmap *nodemap) > +{ > + int i, nodes_cpus = 0; > + > + for (i = 0; i< nr_cpus; i++) { > + if (libxl_bitmap_test(nodemap, tinfo[i].node)) > + nodes_cpus++; > + } > + return nodes_cpus; > +} > + > +/* Retrieve the amount of free memory within the nodemap */ > +static uint32_t nodemap_to_free_memkb(libxl_numainfo *ninfo, > + libxl_bitmap *nodemap) > +{ > + uint32_t free_memkb = 0; > + int i; > + > + libxl_for_each_set_bit(i, *nodemap) > + free_memkb += ninfo[i].free / 1024; > + > + return free_memkb; > +} > + > +/* Retrieve the number of domains that can potentially run on the cpus > + * the nodes that are part of the nodemap. */ > +static int nodemap_to_nr_domains(libxl__gc *gc, libxl_cputopology *tinfo, > + const libxl_bitmap *nodemap) > +{ > + libxl_dominfo *dinfo = NULL; > + libxl_bitmap dom_nodemap; > + int nr_doms, nr_cpus; > + int nr_domains = 0; > + int i, j, k; > + > + dinfo = libxl_list_domain(CTX,&nr_doms); > + if (dinfo == NULL) > + return ERROR_FAIL; > + > + if (libxl_node_bitmap_alloc(CTX,&dom_nodemap, 0)< 0) { > + libxl_dominfo_list_free(dinfo, nr_doms); > + return ERROR_FAIL; > + } > + > + for (i = 0; i< nr_doms; i++) { > + libxl_vcpuinfo *vinfo; > + int nr_vcpus; > + > + vinfo = libxl_list_vcpu(CTX, dinfo[i].domid,&nr_vcpus,&nr_cpus); > + if (vinfo == NULL) > + continue; > + > + libxl_bitmap_set_none(&dom_nodemap); > + for (j = 0; j< nr_vcpus; j++) { > + libxl_for_each_set_bit(k, vinfo[j].cpumap) > + libxl_bitmap_set(&dom_nodemap, tinfo[k].node); > + } > + > + libxl_for_each_set_bit(j, dom_nodemap) { > + if (libxl_bitmap_test(nodemap, j)) { > + nr_domains++; > + break; > + } > + } > + > + libxl_vcpuinfo_list_free(vinfo, nr_vcpus); > + } > + > + libxl_bitmap_dispose(&dom_nodemap); > + libxl_dominfo_list_free(dinfo, nr_doms); > + return nr_domains; > +} > + > +/* > + * This function tries to figure out if the host has a consistent number > + * of cpus along all its NUMA nodes. In fact, if that is the case, we can > + * calculate the minimum number of nodes needed for a domain by just > + * dividing its total number of vcpus by this value computed here. > + * However, we are not allowed to assume that all the nodes have the > + * same number of cpus. Therefore, in case discrepancies among different > + * nodes are found, this function just returns 0, for the caller to know > + * it shouldn''t rely on this ''optimization'', and sort out things in some > + * other way (most likely, just start trying with candidates with just > + * one node). > + */ > +static int cpus_per_node_count(libxl_cputopology *tinfo, int nr_cpus, > + libxl_numainfo *ninfo, int nr_nodes) > +{ > + int cpus_per_node = 0; > + int j, i; > + > + /* This makes sense iff # of PCPUs is the same for all nodes */ > + for (j = 0; j< nr_nodes; j++) { > + int curr_cpus = 0; > + > + for (i = 0; i< nr_cpus; i++) { > + if (tinfo[i].node == j) > + curr_cpus++; > + } > + /* So, if the above does not hold, turn the whole thing off! */ > + cpus_per_node = cpus_per_node == 0 ? 
curr_cpus : cpus_per_node; > + if (cpus_per_node != curr_cpus) > + return 0; > + } > + return cpus_per_node; > +} > + > +/* Get all the placement candidates satisfying some specific conditions */ > +int libxl__get_numa_candidates(libxl__gc *gc, > + uint32_t min_free_memkb, int min_cpus, > + int min_nodes, int max_nodes, > + libxl__numa_candidate *cndts[], int *nr_cndts) > +{ > + libxl__numa_candidate *new_cndts = NULL; > + libxl_cputopology *tinfo = NULL; > + libxl_numainfo *ninfo = NULL; > + int nr_nodes = 0, nr_cpus = 0; > + libxl_bitmap nodemap; > + int array_size, rc; > + > + libxl_bitmap_init(&nodemap); > + > + /* Get platform info and prepare the map for testing the combinations */ > + ninfo = libxl_get_numainfo(CTX,&nr_nodes); > + if (ninfo == NULL) > + return ERROR_FAIL; > + /* If we don''t have at least 2 nodes, it is useless to proceed */ > + if (nr_nodes< 2) { > + rc = 0; > + goto out; > + } > + > + tinfo = libxl_get_cpu_topology(CTX,&nr_cpus); > + if (tinfo == NULL) { > + rc = ERROR_FAIL; > + goto out; > + } > + > + rc = libxl_node_bitmap_alloc(CTX,&nodemap, 0); > + if (rc) > + goto out; > + > + /* > + * If the minimum number of NUMA nodes is not explicitly specified > + * (i.e., min_nodes == 0), we try to figure out a sensible number of nodes > + * from where to start generating candidates, if possible (or just start > + * from 1 otherwise). The maximum number of nodes should not exceed the > + * number of existent NUMA nodes on the host, or the candidate generation > + * won''t work properly. > + */ > + if (!min_nodes) { > + int cpus_per_node; > + > + cpus_per_node = cpus_per_node_count(tinfo, nr_cpus, ninfo, nr_nodes); > + if (cpus_per_node == 0) > + min_nodes = 1; > + else > + min_nodes = (min_cpus + cpus_per_node - 1) / cpus_per_node; > + } > + if (min_nodes> nr_nodes) > + min_nodes = nr_nodes; > + if (!max_nodes || max_nodes> nr_nodes) > + max_nodes = nr_nodes; > + if (min_nodes> max_nodes) { > + rc = ERROR_INVAL; > + goto out; > + } > + > + /* Initialize the local storage for the combinations */ > + *nr_cndts = 0; > + array_size = nr_nodes; > + GCNEW_ARRAY(new_cndts, array_size); > + > + /* Generate all the combinations of any size from min_nodes to > + * max_nodes (see comb_init() and comb_next()). */ > + while (min_nodes<= max_nodes) { > + comb_iter_t comb_iter; > + int comb_ok; > + > + /* > + * And here it is. Each step of this cycle generates a combination of > + * nodes as big as min_nodes mandates. Each of these combinations is > + * checked against the constraints provided by the caller (namely, > + * amount of free memory and number of cpus) and it becomes an actual > + * placement candidate iff it passes the check. > + */ > + for (comb_ok = comb_init(gc,&comb_iter, nr_nodes, min_nodes); comb_ok; > + comb_ok = comb_next(comb_iter, nr_nodes, min_nodes)) { > + uint32_t nodes_free_memkb; > + int nodes_cpus; > + > + comb_get_nodemap(comb_iter,&nodemap, min_nodes); > + > + /* If there is not enough memory in this combination, skip it > + * and go generating the next one... */ > + nodes_free_memkb = nodemap_to_free_memkb(ninfo,&nodemap); > + if (min_free_memkb&& nodes_free_memkb< min_free_memkb) > + continue; > + > + /* And the same applies if this combination is short in cpus */ > + nodes_cpus = nodemap_to_nr_cpus(tinfo, nr_cpus,&nodemap); > + if (min_cpus&& nodes_cpus< min_cpus) > + continue; > + > + /* > + * Conditions are met, we can add this combination to the > + * NUMA placement candidates list. 
We first make sure there > + * is enough space in there, and then we initialize the new > + * candidate element with the node map corresponding to the > + * combination we are dealing with. Memory allocation for > + * expanding the array that hosts the list happens in chunks > + * equal to the number of NUMA nodes in the system (to > + * avoid allocating memory each and every time we find a > + * new candidate). > + */ > + if (*nr_cndts == array_size) > + array_size += nr_nodes; > + GCREALLOC_ARRAY(new_cndts, array_size); > + > + libxl__numa_candidate_alloc(gc,&new_cndts[*nr_cndts]); > + libxl__numa_candidate_put_nodemap(gc,&new_cndts[*nr_cndts], > +&nodemap); > + new_cndts[*nr_cndts].nr_domains > + nodemap_to_nr_domains(gc, tinfo,&nodemap); > + new_cndts[*nr_cndts].free_memkb = nodes_free_memkb; > + new_cndts[*nr_cndts].nr_nodes = min_nodes; > + new_cndts[*nr_cndts].nr_cpus = nodes_cpus; > + > + LOG(DEBUG, "NUMA placement candidate #%d found: nr_nodes=%d, " > + "nr_cpus=%d, free_memkb=%"PRIu32"", *nr_cndts, > + min_nodes, new_cndts[*nr_cndts].nr_cpus, > + new_cndts[*nr_cndts].free_memkb / 1024); > + > + (*nr_cndts)++; > + } > + min_nodes++; > + } > + > + *cndts = new_cndts; > + out: > + libxl_bitmap_dispose(&nodemap); > + libxl_cputopology_list_free(tinfo, nr_cpus); > + libxl_numainfo_list_free(ninfo, nr_nodes); > + return rc; > +} > + > +void libxl__sort_numa_candidates(libxl__numa_candidate cndts[], int nr_cndts, > + libxl__numa_candidate_cmpf cmpf) > +{ > + /* Reorder candidates (see the comparison function for > + * the details on the heuristics) */ > + qsort(cndts, nr_cndts, sizeof(cndts[0]), cmpf); > +} > + > +
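As an aside, the k-combination generator at the heart of the candidate enumeration is easy to experiment with outside libxl. Below is a self-contained sketch that mirrors the quoted comb_init()/comb_next() logic, using plain malloc() instead of the gc helpers; it is only meant for playing with the algorithm, not as replacement code.

#include <stdio.h>
#include <stdlib.h>

/* Enumerate all k-combinations of {0, ..., n-1} in lexicographic order. */
static int comb_init(int **it, int n, int k)
{
    int i;

    if (n < k)
        return 0;
    *it = malloc(k * sizeof(**it));
    if (!*it)
        return 0;
    /* First set is always { 0, 1, 2, ..., k-1 } */
    for (i = 0; i < k; i++)
        (*it)[i] = i;
    return 1;
}

static int comb_next(int *it, int n, int k)
{
    int i;

    /* Find the rightmost index that can still be incremented ... */
    for (i = k - 1; it[i] == n - k + i; i--) {
        if (i <= 0)
            return 0;           /* last combination already produced */
    }
    /* ... bump it and rewrite the tail with its successors */
    for (it[i]++, i++; i < k; i++)
        it[i] = it[i - 1] + 1;
    return 1;
}

int main(void)
{
    int n = 5, k = 3, *it = NULL, ok, i;

    /* prints 0 1 2, 0 1 3, 0 1 4, ..., 2 3 4 */
    for (ok = comb_init(&it, n, k); ok; ok = comb_next(it, n, k)) {
        for (i = 0; i < k; i++)
            printf("%d ", it[i]);
        printf("\n");
    }
    free(it);
    return 0;
}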
Ian Campbell
2012-Jul-06 11:37 UTC
Re: [PATCH 03 of 10 v3] xl: add more NUMA information to `xl info -n'
On Wed, 2012-07-04 at 17:18 +0100, Dario Faggioli wrote:> # HG changeset patch > # User Dario Faggioli <raistlin@linux.it> > # Date 1341416323 -7200 > # Node ID f1227d5a82e56d10e302aec4c3717d281718a349 > # Parent 0ca91a203fc95d3d18bb436ecdc7106b0b2ff22f > xl: add more NUMA information to `xl info -n'' > > So that the user knows how much memory there is on each node and > how far they are from each others. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > Acked-by: Ian Campbell <ian.campbell@citrix.com>On my single node system this produces: cpu_topology : cpu: core socket node 0: 0 0 0 1: 1 0 0 2: 2 0 0 3: 3 0 0 numa_info : node: memsize memfree distances 0: 4608 3083 10 Is "distances" here right? I''d have expected only a single 0 (distance from self)?> > --- > Changes from v1: > * integer division replaced by right shift. > > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c > --- a/tools/libxl/xl_cmdimpl.c > +++ b/tools/libxl/xl_cmdimpl.c > @@ -4249,6 +4249,36 @@ static void output_physinfo(void) > return; > } > > +static void output_numainfo(void) > +{ > + libxl_numainfo *info; > + int i, j, nr; > + > + info = libxl_get_numainfo(ctx, &nr); > + if (info == NULL) { > + fprintf(stderr, "libxl_get_numainfo failed.\n"); > + return; > + } > + > + printf("numa_info :\n"); > + printf("node: memsize memfree distances\n"); > + > + for (i = 0; i < nr; i++) { > + if (info[i].size != LIBXL_NUMAINFO_INVALID_ENTRY) { > + printf("%4d: %6"PRIu64" %6"PRIu64" %d", i, > + info[i].size >> 20, info[i].free >> 20, > + info[i].dists[0]); > + for (j = 1; j < info[i].num_dists; j++) > + printf(",%d", info[i].dists[j]); > + printf("\n"); > + } > + } > + > + libxl_numainfo_list_free(info, nr); > + > + return; > +} > + > static void output_topologyinfo(void) > { > libxl_cputopology *info; > @@ -4271,8 +4301,6 @@ static void output_topologyinfo(void) > > libxl_cputopology_list_free(info, nr); > > - printf("numa_info : none\n"); > - > return; > } > > @@ -4282,8 +4310,10 @@ static void info(int numa) > > output_physinfo(); > > - if (numa) > + if (numa) { > output_topologyinfo(); > + output_numainfo(); > + } > > output_xeninfo(); >
Ian Campbell
2012-Jul-06 11:42 UTC
Re: [PATCH 07 of 10 v3] libxl: explicitly check for libmath in autoconf
On Wed, 2012-07-04 at 17:18 +0100, Dario Faggioli wrote:> # HG changeset patch > # User Dario Faggioli <raistlin@linux.it> > # Date 1341416324 -7200 > # Node ID 6fd693e7f3bc8b4d9bd20befff2c13de5591a7c5 > # Parent 3b65112bedc0656512312e29b89652f1c4ca0083 > libxl: explicitly check for libmath in autoconf > > As well as explicitly add -lm to libxl''s Makefile. > > This is because next patch uses floating point arithmetic, and > it is better to state it clearly that we need libmath (just in > case we find a libc that wants that to be explicitly enforced). > > Notice that autoconf should be rerun after applying this change. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > > diff --git a/tools/configure.ac b/tools/configure.ac > --- a/tools/configure.ac > +++ b/tools/configure.ac > @@ -133,6 +133,7 @@ AC_CHECK_LIB([lzo2], [lzo1x_decompress], > AC_SUBST(zlib) > AC_CHECK_LIB([aio], [io_setup], [system_aio="y"], [system_aio="n"]) > AC_SUBST(system_aio) > +AC_CHECK_LIB([m], [isnan], [], [AC_MSG_ERROR([Could not find libmath])])Should this be s/libmath/libm/ to avoid confusion? I will do this as I commit if necessary. Ian.> AC_CHECK_LIB([crypto], [MD5], [], [AC_MSG_ERROR([Could not find libcrypto])]) > AC_CHECK_LIB([ext2fs], [ext2fs_open2], [libext2fs="y"], [libext2fs="n"]) > AC_SUBST(libext2fs) > diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile > --- a/tools/libxl/Makefile > +++ b/tools/libxl/Makefile > @@ -61,7 +61,7 @@ ifeq ($(BISON),) > scanners, please install it an rerun configure) > endif > > -LIBXL_LIBS += -lyajl > +LIBXL_LIBS += -lyajl -lm > > LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \ > libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \
Dario Faggioli
2012-Jul-06 11:54 UTC
Re: [PATCH 07 of 10 v3] libxl: explicitly check for libmath in autoconf
On Fri, 2012-07-06 at 12:42 +0100, Ian Campbell wrote:> > diff --git a/tools/configure.ac b/tools/configure.ac > > --- a/tools/configure.ac > > +++ b/tools/configure.ac > > @@ -133,6 +133,7 @@ AC_CHECK_LIB([lzo2], [lzo1x_decompress], > > AC_SUBST(zlib) > > AC_CHECK_LIB([aio], [io_setup], [system_aio="y"], [system_aio="n"]) > > AC_SUBST(system_aio) > > +AC_CHECK_LIB([m], [isnan], [], [AC_MSG_ERROR([Could not find libmath])]) > > Should this be s/libmath/libm/ to avoid confusion? >Yes, I think that would be better.> I will do this as I > commit if necessary. >Well, that would be great, thanks. Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Dario Faggioli
2012-Jul-06 12:00 UTC
Re: [PATCH 03 of 10 v3] xl: add more NUMA information to `xl info -n'
On Fri, 2012-07-06 at 12:37 +0100, Ian Campbell wrote:> > xl: add more NUMA information to `xl info -n'' > > > > So that the user knows how much memory there is on each node and > > how far they are from each others. > > > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > > Acked-by: Ian Campbell <ian.campbell@citrix.com> > > On my single node system this produces: > cpu_topology : > cpu: core socket node > 0: 0 0 0 > 1: 1 0 0 > 2: 2 0 0 > 3: 3 0 0 > numa_info : > node: memsize memfree distances > 0: 4608 3083 10 > > Is "distances" here right? >It looks the same here. While with two nodes it gives something like the below: 0: 1: 0: 10, 20 1: 20, 10 More important, forgetting about the ''10'' do you want me to suppress that part of the output if there''s only one node? I did not think at it before, but it might make sense. Of course, I can also do this with a follow-up patch, just let me know.> I''d have expected only a single 0 (distance > from self)? >Honestly, I''ve not looked at where it comes from... I think I''ve seen something similar in Linux as well (although I might be wrong). Do you want me to investigate? Thanks, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Campbell
2012-Jul-06 12:15 UTC
Re: [PATCH 03 of 10 v3] xl: add more NUMA information to `xl info -n'
On Fri, 2012-07-06 at 13:00 +0100, Dario Faggioli wrote:> On Fri, 2012-07-06 at 12:37 +0100, Ian Campbell wrote: > > > xl: add more NUMA information to `xl info -n'' > > > > > > So that the user knows how much memory there is on each node and > > > how far they are from each others. > > > > > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > > > Acked-by: Ian Campbell <ian.campbell@citrix.com> > > > > On my single node system this produces: > > cpu_topology : > > cpu: core socket node > > 0: 0 0 0 > > 1: 1 0 0 > > 2: 2 0 0 > > 3: 3 0 0 > > numa_info : > > node: memsize memfree distances > > 0: 4608 3083 10 > > > > Is "distances" here right? > > > It looks the same here. While with two nodes it gives something like the > below: > > 0: 1: > 0: 10, 20 > 1: 20, 10 > > More important, forgetting about the ''10'' do you want me to suppress > that part of the output if there''s only one node? I did not think at it > before, but it might make sense. Of course, I can also do this with a > follow-up patch, just let me know.Oh, it''s a decimal number, for some reason I was reading it in a bitmap or a "distance per column" way (i.e. as dist[0]=0 and dist[1]=1), which is non-sensical now I think about it> > > I''d have expected only a single 0 (distance > > from self)? > > > Honestly, I''ve not looked at where it comes from... I think I''ve seen > something similar in Linux as well (although I might be wrong). > > Do you want me to investigate?It''s ok, was just me being dumb ;-) I''ll commit this shortly, along with the patches up until (and including #7) Ian
Ian Campbell
2012-Jul-06 12:19 UTC
Re: [PATCH 00 of 10 v3] Automatic NUMA placement for xl
On Wed, 2012-07-04 at 17:17 +0100, Dario Faggioli wrote:> Hello, > > Third version of the NUMA placement series Xen 4.2. > > All the comments received during v2''s review have been addressed (more details > in single changelogs). > > The most notable changes are the following: > - the libxl_cpumap --> libxl_bitmap renaming has been rebased on top of the > recent patches that allows us to allocate bitmaps of different sizes; > - the heuristics for deciding which NUMA placement is the best one has been > redesigned, so that it now provides total ordering. > > Here it is what this posting contains (* = acked during previous round): > > * [PATCH 01 of 10 v3] libxl: add a new Array type to the IDL > [PATCH 02 of 10 v3] libxl,libxc: introduce libxl_get_numainfo() > * [PATCH 03 of 10 v3] xl: add more NUMA information to `xl info -n'' > [PATCH 04 of 10 v3] libxl: rename libxl_cpumap to libxl_bitmap > [PATCH 05 of 10 v3] libxl: expand the libxl_bitmap API a bit > * [PATCH 06 of 10 v3] libxl: introduce some node map helpers > [PATCH 07 of 10 v3] libxl: explicitly check for libmath in autoconfThese are now sufficiently acked that I have committed them.> Is where data structures, utility functions and infrastructure are introduced. > > * [PATCH 08 of 10 v3] libxl: enable automatic placement of guests on NUMA nodesAs discussed this one has a few issues so I stopped before committing this one. Ian.> * [PATCH 09 of 10 v3] libxl: have NUMA placement deal with cpupools > > Host the core of the mechanism. > > * [PATCH 10 of 10 v3] Some automatic NUMA placement documentation > > For some more documentation. > > Thanks a lot and Regards, > Dario >
George Dunlap
2012-Jul-06 12:42 UTC
Re: [PATCH 09 of 10 v3] libxl: have NUMA placement deal with cpupools
On 04/07/12 17:18, Dario Faggioli wrote:> # HG changeset patch > # User Dario Faggioli<raistlin@linux.it> > # Date 1341416324 -7200 > # Node ID 885e2f385601d66179058bfb6bd3960f17d5e068 > # Parent 7087d3622ee2051654c9e78fe4829da10c2d46f1 > libxl: have NUMA placement deal with cpupools > > In such a way that only the cpus belonging to the cpupool of the > domain being placed are considered for the placement itself. > > This happens by filtering out all the nodes in which the cpupool has > not any cpu from the placement candidates. After that -- as a cpu pooling > not necessarily happens at NUMA nodes boundaries -- we also make sure > only the actual cpus that are part of the pool are considered when > counting how much processors a placement candidate is able to provide. > > Signed-off-by: Dario Faggioli<dario.faggioli@citrix.com> > Acked-by: Ian Campbell<ian.campbell@citrix.com>If I''m reading this right, the filtering won''t prevent duplicate entries returned from get_numa_candidates; is that right? I.e., suppose you have a 4-node system and you run "xl cpupool-numa-split" to get one pool per node. Before this patch, your generator might return the following sets containing node 0: {0} {0,1} {0,2} {0,3} {0,1,2} {0,1,3} {0,1,2,3} {0,2,3} But now, if the domain is placed in a cpupool that has only numa node 0, it will return 8 copies of {0}. Is that correct? -George> > --- > Changes from v2: > * fixed typos in comments. > > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c > --- a/tools/libxl/libxl_dom.c > +++ b/tools/libxl/libxl_dom.c > @@ -132,25 +132,29 @@ static int numa_cmpf(const void *v1, con > } > > /* The actual automatic NUMA placement routine */ > -static int numa_place_domain(libxl__gc *gc, libxl_domain_build_info *info) > +static int numa_place_domain(libxl__gc *gc, uint32_t domid, > + libxl_domain_build_info *info) > { > int nr_candidates = 0; > libxl__numa_candidate *candidates = NULL; > libxl_bitmap candidate_nodemap; > - libxl_cpupoolinfo *pinfo; > - int nr_pools, rc = 0; > + libxl_cpupoolinfo cpupool_info; > + int i, cpupool, rc = 0; > uint32_t memkb; > > libxl_bitmap_init(&candidate_nodemap); > > - /* First of all, if cpupools are in use, better not to mess with them */ > - pinfo = libxl_list_cpupool(CTX,&nr_pools); > - if (!pinfo) > - return ERROR_FAIL; > - if (nr_pools> 1) { > - LOG(NOTICE, "skipping NUMA placement as cpupools are in use"); > - goto out; > - } > + /* > + * Extract the cpumap from the cpupool the domain belong to. In fact, > + * it only makes sense to consider the cpus/nodes that are in there > + * for placement. > + */ > + rc = cpupool = libxl__domain_cpupool(gc, domid); > + if (rc< 0) > + return rc; > + rc = libxl_cpupool_info(CTX,&cpupool_info, cpupool); > + if (rc) > + return rc; > > rc = libxl_domain_need_memory(CTX, info,&memkb); > if (rc) > @@ -162,7 +166,8 @@ static int numa_place_domain(libxl__gc * > > /* Find all the candidates with enough free memory and at least > * as much pcpus as the domain has vcpus. */ > - rc = libxl__get_numa_candidates(gc, memkb, info->max_vcpus, 0, 0, > + rc = libxl__get_numa_candidates(gc, memkb, info->max_vcpus, > + 0, 0,&cpupool_info.cpumap, > &candidates,&nr_candidates); > if (rc) > goto out; > @@ -188,13 +193,20 @@ static int numa_place_domain(libxl__gc * > if (rc) > goto out; > > + /* Avoid trying to set the affinity to cpus that might be in the > + * nodemap but not in our cpupool. 
*/ > + libxl_for_each_set_bit(i, info->cpumap) { > + if (!libxl_bitmap_test(&cpupool_info.cpumap, i)) > + libxl_bitmap_reset(&info->cpumap, i); > + } > + > LOG(DETAIL, "NUMA placement candidate with %d nodes, %d cpus and " > "%"PRIu32" KB free selected", candidates[0].nr_nodes, > candidates[0].nr_cpus, candidates[0].free_memkb / 1024); > > out: > libxl_bitmap_dispose(&candidate_nodemap); > - libxl_cpupoolinfo_list_free(pinfo, nr_pools); > + libxl_cpupoolinfo_dispose(&cpupool_info); > return rc; > } > > @@ -217,7 +229,7 @@ int libxl__build_pre(libxl__gc *gc, uint > * whatever that turns out to be. > */ > if (libxl_bitmap_is_full(&info->cpumap)) { > - int rc = numa_place_domain(gc, info); > + int rc = numa_place_domain(gc, domid, info); > if (rc) > return rc; > } > diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h > --- a/tools/libxl/libxl_internal.h > +++ b/tools/libxl/libxl_internal.h > @@ -2289,14 +2289,17 @@ typedef struct { > * least that amount of free memory and that number of cpus, respectively. If > * min_free_memkb and/or min_cpus are 0, the candidates'' free memory and number > * of cpus won''t be checked at all, which means a candidate will always be > - * considered suitable wrt the specific constraint. cndts is where the list of > - * exactly nr_cndts candidates is returned. Note that, in case no candidates > - * are found at all, the function returns successfully, but with nr_cndts equal > - * to zero. > + * considered suitable wrt the specific constraint. suitable_cpumap is useful > + * for specifying we want only the cpus in that mask to be considered while > + * generating placement candidates (for example because of cpupools). cndts is > + * where the list of exactly nr_cndts candidates is returned. Note that, in > + * case no candidates are found at all, the function returns successfully, but > + * with nr_cndts equal to zero. > */ > _hidden int libxl__get_numa_candidates(libxl__gc *gc, > uint32_t min_free_memkb, int min_cpus, > int min_nodes, int max_nodes, > + const libxl_bitmap *suitable_cpumap, > libxl__numa_candidate *cndts[], int *nr_cndts); > > /* Initialization, allocation and deallocation for placement candidates */ > diff --git a/tools/libxl/libxl_numa.c b/tools/libxl/libxl_numa.c > --- a/tools/libxl/libxl_numa.c > +++ b/tools/libxl/libxl_numa.c > @@ -122,15 +122,27 @@ static void comb_get_nodemap(comb_iter_t > libxl_bitmap_set(nodemap, it[i]); > } > > +/* Retrieve how many nodes a nodemap spans */ > +static int nodemap_to_nr_nodes(const libxl_bitmap *nodemap) > +{ > + int i, nr_nodes = 0; > + > + libxl_for_each_set_bit(i, *nodemap) > + nr_nodes++; > + return nr_nodes; > +} > + > /* Retrieve the number of cpus that the nodes that are part of the nodemap > - * span. */ > + * span and are also set in suitable_cpumap. 
*/ > static int nodemap_to_nr_cpus(libxl_cputopology *tinfo, int nr_cpus, > + const libxl_bitmap *suitable_cpumap, > const libxl_bitmap *nodemap) > { > int i, nodes_cpus = 0; > > for (i = 0; i< nr_cpus; i++) { > - if (libxl_bitmap_test(nodemap, tinfo[i].node)) > + if (libxl_bitmap_test(suitable_cpumap, i)&& > + libxl_bitmap_test(nodemap, tinfo[i].node)) > nodes_cpus++; > } > return nodes_cpus; > @@ -236,13 +248,14 @@ static int cpus_per_node_count(libxl_cpu > int libxl__get_numa_candidates(libxl__gc *gc, > uint32_t min_free_memkb, int min_cpus, > int min_nodes, int max_nodes, > + const libxl_bitmap *suitable_cpumap, > libxl__numa_candidate *cndts[], int *nr_cndts) > { > libxl__numa_candidate *new_cndts = NULL; > libxl_cputopology *tinfo = NULL; > libxl_numainfo *ninfo = NULL; > int nr_nodes = 0, nr_cpus = 0;/tmp/extdiff.HJ7jEN/xen-upstream.hg.2019315297ee/tools/libxl/libxl_numa.c > - libxl_bitmap nodemap; > + libxl_bitmap suitable_nodemap, nodemap; > int array_size, rc; > > libxl_bitmap_init(&nodemap); > @@ -267,6 +280,15 @@ int libxl__get_numa_candidates(libxl__gc > if (rc) > goto out; > > + /* Allocate and prepare the map of the node that can be utilized for > + * placement, basing on the map of suitable cpus. */ > + rc = libxl_node_bitmap_alloc(CTX,&suitable_nodemap, 0); > + if (rc) > + goto out; > + rc = libxl_cpumap_to_nodemap(CTX, suitable_cpumap,&suitable_nodemap); > + if (rc) > + goto out; > + > /* > * If the minimum number of NUMA nodes is not explicitly specified > * (i.e., min_nodes == 0), we try to figure out a sensible number of nodes > @@ -314,9 +336,14 @@ int libxl__get_numa_candidates(libxl__gc > for (comb_ok = comb_init(gc,&comb_iter, nr_nodes, min_nodes); comb_ok; > comb_ok = comb_next(comb_iter, nr_nodes, min_nodes)) { > uint32_t nodes_free_memkb; > - int nodes_cpus; > + int i, nodes_cpus; > > + /* Get the nodemap for the combination and filter unwanted nodes */ > comb_get_nodemap(comb_iter,&nodemap, min_nodes); > + libxl_for_each_set_bit(i, nodemap) { > + if (!libxl_bitmap_test(&suitable_nodemap, i)) > + libxl_bitmap_reset(&nodemap, i); > + } > > /* If there is not enough memory in this combination, skip it > * and go generating the next one... */ > @@ -325,7 +352,8 @@ int libxl__get_numa_candidates(libxl__gc > continue; > > /* And the same applies if this combination is short in cpus */ > - nodes_cpus = nodemap_to_nr_cpus(tinfo, nr_cpus,&nodemap); > + nodes_cpus = nodemap_to_nr_cpus(tinfo, nr_cpus, suitable_cpumap, > +&nodemap); > if (min_cpus&& nodes_cpus< min_cpus) > continue; > > @@ -350,12 +378,13 @@ int libxl__get_numa_candidates(libxl__gc > new_cndts[*nr_cndts].nr_domains > nodemap_to_nr_domains(gc, tinfo,&nodemap); > new_cndts[*nr_cndts].free_memkb = nodes_free_memkb; > - new_cndts[*nr_cndts].nr_nodes = min_nodes; > + new_cndts[*nr_cndts].nr_nodes = nodemap_to_nr_nodes(&nodemap); > new_cndts[*nr_cndts].nr_cpus = nodes_cpus; > > LOG(DEBUG, "NUMA placement candidate #%d found: nr_nodes=%d, " > "nr_cpus=%d, free_memkb=%"PRIu32"", *nr_cndts, > - min_nodes, new_cndts[*nr_cndts].nr_cpus, > + new_cndts[*nr_cndts].nr_nodes, > + new_cndts[*nr_cndts].nr_cpus, > new_cndts[*nr_cndts].free_memkb / 1024); > > (*nr_cndts)++; > @@ -365,6 +394,7 @@ int libxl__get_numa_candidates(libxl__gc > > *cndts = new_cndts; > out: > + libxl_bitmap_dispose(&suitable_nodemap); > libxl_bitmap_dispose(&nodemap); > libxl_cputopology_list_free(tinfo, nr_cpus); > libxl_numainfo_list_free(ninfo, nr_nodes);
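If the duplicates George describes do show up, one conceivable way of spotting (or filtering) them would be to compare each freshly filtered nodemap against the candidates already collected before appending it. The sketch below is purely illustrative: bitmap_equal() is a hypothetical helper, not an existing libxl function.

/* Hypothetical: return 1 iff the two maps have exactly the same bits set */
static int bitmap_equal(const libxl_bitmap *a, const libxl_bitmap *b);

/* Illustrative check: has an identical candidate nodemap been recorded? */
static int nodemap_already_seen(const libxl__numa_candidate *cndts,
                                int nr_cndts, const libxl_bitmap *nodemap)
{
    int i;

    for (i = 0; i < nr_cndts; i++)
        if (bitmap_equal(&cndts[i].nodemap, nodemap))
            return 1;
    return 0;
}

The generator could then skip any combination whose filtered nodemap has already been recorded, so each distinct set of nodes would appear in the candidate list only once.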
Dario Faggioli
2012-Jul-06 12:52 UTC
Re: [PATCH 03 of 10 v3] xl: add more NUMA information to `xl info -n''
On Fri, 2012-07-06 at 13:15 +0100, Ian Campbell wrote:
> > It looks the same here. While with two nodes it gives something like the
> > below:
> >
> >        0:    1:
> >   0:   10,   20
> >   1:   20,   10
> >
> > More important, forgetting about the '10', do you want me to suppress
> > that part of the output if there's only one node? I did not think of it
> > before, but it might make sense. Of course, I can also do this with a
> > follow-up patch, just let me know.
>
> Oh, it's a decimal number,
>
Indeed.

> for some reason I was reading it in a bitmap
> or a "distance per column" way (i.e. as dist[0]=0 and dist[1]=1), which
> is non-sensical now I think about it
>
Well, I did not realize you were thinking of it that way myself. :-)

> > Do you want me to investigate?
>
> It's ok, was just me being dumb ;-)
>
> I'll commit this shortly, along with the patches up until (and including
> #7)
>
Cool. Thanks.

Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
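For reference, the numbers in the exchange above are the usual node-to-node
distance table (one row and one column per node), where entry [i][j] is the
relative cost of node i accessing memory attached to node j, with 10 being the
conventional "local" value. On a hypothetical two-node host that is:

              node 0   node 1
    node 0:     10       20
    node 1:     20       10

so the "1: 20, 10" row reads as "node 1 is at distance 20 from node 0 and 10
from itself" -- a plain decimal cost per column, not a bitmap (the exact column
layout printed by `xl info -n' may of course differ).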
Dario Faggioli
2012-Jul-06 13:00 UTC
Re: [PATCH 08 of 10 v3] libxl: enable automatic placement of guests on NUMA nodes
On Fri, 2012-07-06 at 12:30 +0100, George Dunlap wrote:> One question I have: Is there any particular reason to sort the whole > list, rather than just finding the maximum based on the comparison function? >That is basically a leftover from my previous "let''s make things a bit more general" approach. Don''t get me wrong, I still think that and like sorting more than anything else for the interface (although it''s an "internal interface") but yes, right now looking for the max won''t be anything different from sorting descending and taking the #0 element.> But I think it''s been a long time and it looks good enough to me: > > Acked-by: George Dunlap <george.dunlap@eu.citrix.com> >Ok, I get this like I can leave it as it is... Or you want me to kill the sorting? Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
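For the record, "just finding the maximum" would look roughly like the sketch
below (untested, with made-up names rather than the actual libxl internals);
the point is only that a single linear pass with the same comparison function
replaces the qsort() of the whole candidate array:

    /* Return the index of the best candidate according to cmp(), which is
     * assumed to return > 0 when its first argument is the better one. */
    static int best_candidate(const libxl__numa_candidate *cndts, int nr_cndts,
                              int (*cmp)(const libxl__numa_candidate *,
                                         const libxl__numa_candidate *))
    {
        int i, best = 0;

        for (i = 1; i < nr_cndts; i++)
            if (cmp(&cndts[i], &cndts[best]) > 0)
                best = i;

        return best;  /* O(n), vs. O(n log n) for sorting the whole array */
    }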
Dario Faggioli
2012-Jul-06 13:03 UTC
Re: [PATCH 08 of 10 v3] libxl: enable automatic placement of guests on NUMA nodes
On Fri, 2012-07-06 at 11:55 +0100, Ian Campbell wrote:> > Once we know which ones, among all the possible combinations, represents valid > > placement candidates for a domain, use some heuistics for deciding which is the > > best. For instance, smaller candidates are considered to be better, both from > > the domain''s point of view (fewer memory spreading among nodes) and from the > > system as a whole point of view (fewer memoy fragmentation). In case of > > candidates of equal sizes (i.e., with the same number of nodes), the amount of > > free memory and the number of domain already assigned to their nodes are > > considered. Very often, candidates with greater amount of memory are the one > > we wants, as this is also good for keeping memory fragmentation under control. > > However, if the difference in how much free memory two candidates have, the > > number of assigned domains might be what decides which candidate wins. > > I can''t parse this last sentence. Are there some words missing after > "how much free memory two candidates have"? >Ok, I see. What about something like the below: "However, if the amount of free memory of two candidates is very similar, we look at how many domains are assigned to each candidate, and take the one that has fewer of them."> If you want to post the corrected text I think we can fold it in while > applying rather than reposting (assuming no other reason to repost crops > up). >Let''s see... I''m fine either way... Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Dario Faggioli
2012-Jul-06 13:05 UTC
Re: [PATCH 00 of 10 v3] Automatic NUMA placement for xl
On Fri, 2012-07-06 at 12:22 +0100, Ian Campbell wrote:> On Fri, 2012-07-06 at 12:20 +0100, Ian Campbell wrote: > > > You probably want another libxl_bitmap_init near the top of the > > function. >Damn. I tested with two, with (fake) 4 and (fake) 8, but I guess I forgot testing with just one node this time. Sorry for that. :-(> Works for me: >Good. Thank you, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
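The crash being discussed is the usual init-before-possible-dispose problem;
the shape of the fix is simply the following (sketch only, using the variable
names from the patch):

    libxl_bitmap_init(&nodemap);
    libxl_bitmap_init(&suitable_nodemap);    /* init both up front, so that... */

    /* ... */

 out:
    libxl_bitmap_dispose(&suitable_nodemap); /* ...disposing is safe even on an
                                              * error path taken before the
                                              * bitmap was actually allocated */
    libxl_bitmap_dispose(&nodemap);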
George Dunlap
2012-Jul-06 13:05 UTC
Re: [PATCH 08 of 10 v3] libxl: enable automatic placement of guests on NUMA nodes
On 06/07/12 14:00, Dario Faggioli wrote:> On Fri, 2012-07-06 at 12:30 +0100, George Dunlap wrote: >> One question I have: Is there any particular reason to sort the whole >> list, rather than just finding the maximum based on the comparison function? >> > That is basically a leftover from my previous "let''s make things a bit > more general" approach. Don''t get me wrong, I still think that and like > sorting more than anything else for the interface (although it''s an > "internal interface") but yes, right now looking for the max won''t be > anything different from sorting descending and taking the #0 element. > >> But I think it''s been a long time and it looks good enough to me: >> >> Acked-by: George Dunlap<george.dunlap@eu.citrix.com> >> > Ok, I get this like I can leave it as it is... Or you want me to kill > the sorting?I can''t really foresee a time when anyone would want to use anything other than the best option. Just choosing the best makes a slightly simpler interface, and simplified the code somewhat. At the moment, sorting shouldn''t take too long, but suppose we get systems with 128 nodes at some point in the future -- then the number of possible combinations might be pretty large, and sorting that even at n log n might take a noticeable amount of time. So I think it''s up to you: If you thinking sorting will be useful in the future, then I think keep it. But if you also think it''s not going to be very useful, I think it would make more sense to take it out. -George
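To put rough (purely illustrative) numbers on the 128-node example: even
capping candidates at four nodes there are already C(128,1) + C(128,2) +
C(128,3) + C(128,4), i.e. about 1.1e7 combinations to generate, so the extra
log factor of a full sort on top of that would not be entirely academic --
although today's hosts have far fewer nodes than that.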
Dario Faggioli
2012-Jul-06 13:10 UTC
Re: [PATCH 09 of 10 v3] libxl: have NUMA placement deal with cpupools
On Fri, 2012-07-06 at 13:42 +0100, George Dunlap wrote:> > libxl: have NUMA placement deal with cpupools > > > > In such a way that only the cpus belonging to the cpupool of the > > domain being placed are considered for the placement itself. > > > > This happens by filtering out all the nodes in which the cpupool has > > not any cpu from the placement candidates. After that -- as a cpu pooling > > not necessarily happens at NUMA nodes boundaries -- we also make sure > > only the actual cpus that are part of the pool are considered when > > counting how much processors a placement candidate is able to provide. > > > > Signed-off-by: Dario Faggioli<dario.faggioli@citrix.com> > > Acked-by: Ian Campbell<ian.campbell@citrix.com> > If I''m reading this right, the filtering won''t prevent duplicate entries > returned from get_numa_candidates; is that right? I.e., suppose you > have a 4-node system and you run "xl cpupool-numa-split" to get one pool > per node. Before this patch, your generator might return the following > sets containing node 0: > {0} > {0,1} > {0,2} > {0,3} > {0,1,2} > {0,1,3} > {0,1,2,3} > {0,2,3} > > But now, if the domain is placed in a cpupool that has only numa node 0, > it will return 8 copies of {0}. Is that correct? >It is. As the generation happens before cpupool are being considered at all. Point is, while the number of cores could be quite high (and continue to grow), the number of NUMA nodes in existing machines that such a case won''t hurt that much. Anyway, you''re definitely right, it would have been possible to do much better. Maybe, if we''re cool with patch 8, we can jut skip this for now, and I''ll resubmit a separate patch (where I''ll deal with duplicates) like later or on Monday? Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Campbell
2012-Jul-06 13:21 UTC
Re: [PATCH 08 of 10 v3] libxl: enable automatic placement of guests on NUMA nodes
On Fri, 2012-07-06 at 14:03 +0100, Dario Faggioli wrote:> On Fri, 2012-07-06 at 11:55 +0100, Ian Campbell wrote: > > > Once we know which ones, among all the possible combinations, represents valid > > > placement candidates for a domain, use some heuistics for deciding which is the > > > best. For instance, smaller candidates are considered to be better, both from > > > the domain''s point of view (fewer memory spreading among nodes) and from the > > > system as a whole point of view (fewer memoy fragmentation). In case of > > > candidates of equal sizes (i.e., with the same number of nodes), the amount of > > > free memory and the number of domain already assigned to their nodes are > > > considered. Very often, candidates with greater amount of memory are the one > > > we wants, as this is also good for keeping memory fragmentation under control. > > > However, if the difference in how much free memory two candidates have, the > > > number of assigned domains might be what decides which candidate wins. > > > > I can''t parse this last sentence. Are there some words missing after > > "how much free memory two candidates have"? > > > Ok, I see. What about something like the below: > > "However, if the amount of free memory of two candidates is very > similar, we look at how many domains are assigned to each candidate, and > take the one that has fewer of them."Looks good, thanks. Is it still "very similar" now that it is a total order or is it actually a strict equality now? (or an equality down to the precision of a float...)> > If you want to post the corrected text I think we can fold it in while > > applying rather than reposting (assuming no other reason to repost crops > > up). > > > Let''s see... I''m fine either way...I think there''s a few other fixes to be made so it''ll have to be a repost. Ian/.
George Dunlap
2012-Jul-06 13:27 UTC
Re: [PATCH 09 of 10 v3] libxl: have NUMA placement deal with cpupools
On 06/07/12 14:10, Dario Faggioli wrote:> On Fri, 2012-07-06 at 13:42 +0100, George Dunlap wrote: >>> libxl: have NUMA placement deal with cpupools >>> >>> In such a way that only the cpus belonging to the cpupool of the >>> domain being placed are considered for the placement itself. >>> >>> This happens by filtering out all the nodes in which the cpupool has >>> not any cpu from the placement candidates. After that -- as a cpu pooling >>> not necessarily happens at NUMA nodes boundaries -- we also make sure >>> only the actual cpus that are part of the pool are considered when >>> counting how much processors a placement candidate is able to provide. >>> >>> Signed-off-by: Dario Faggioli<dario.faggioli@citrix.com> >>> Acked-by: Ian Campbell<ian.campbell@citrix.com> >> If I''m reading this right, the filtering won''t prevent duplicate entries >> returned from get_numa_candidates; is that right? I.e., suppose you >> have a 4-node system and you run "xl cpupool-numa-split" to get one pool >> per node. Before this patch, your generator might return the following >> sets containing node 0: >> {0} >> {0,1} >> {0,2} >> {0,3} >> {0,1,2} >> {0,1,3} >> {0,1,2,3} >> {0,2,3} >> >> But now, if the domain is placed in a cpupool that has only numa node 0, >> it will return 8 copies of {0}. Is that correct? >> > It is. As the generation happens before cpupool are being considered at > all. Point is, while the number of cores could be quite high (and > continue to grow), the number of NUMA nodes in existing machines that > such a case won''t hurt that much. Anyway, you''re definitely right, it > would have been possible to do much better. > > Maybe, if we''re cool with patch 8, we can jut skip this for now, and > I''ll resubmit a separate patch (where I''ll deal with duplicates) like > later or on Monday?Well, before discussing acking or nacking, I just wanted to establish that this is what the code did. The thing is, apart from re-writing your generator only to use nodes in the cpupool, the most efficient thing to get rid of duplicates is probably to do a sort anyway; so the end-to-end number of operations may end up similar either way. Why don''t we do this: let''s check in this version, so we can start getting the cpu placement stuff tested. Then if there''s time, you can post patches to do the filtering at the node generation stage rather than the filtering stage. Does that make sense? -George
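The "do a sort anyway" route mentioned above would look roughly like this
(hypothetical sketch; cndt_cmp() and nodemaps_equal() are assumed helpers,
and .nodemap is the candidate's node map -- none of this is the actual libxl
code):

    int i, j;

    /* Sort the candidates, then compact runs of candidates whose (filtered)
     * nodemaps are identical, keeping only the first of each run
     * (assumes nr_cndts > 0). */
    qsort(cndts, nr_cndts, sizeof(*cndts), cndt_cmp);

    for (i = 1, j = 0; i < nr_cndts; i++)
        if (!nodemaps_equal(&cndts[j].nodemap, &cndts[i].nodemap))
            cndts[++j] = cndts[i];
    nr_cndts = j + 1;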
Ian Campbell
2012-Jul-06 13:32 UTC
Re: [PATCH 09 of 10 v3] libxl: have NUMA placement deal with cpupools
On Fri, 2012-07-06 at 14:27 +0100, George Dunlap wrote:> Why don''t we do this: let''s check in this version, so we can start > getting the cpu placement stuff tested. Then if there''s time, you can > post patches to do the filtering at the node generation stage rather > than the filtering stage. Does that make sense?FWIW I''d be happy with this approach. Keen to get the basic functionality in ;-) Ian.
Dario Faggioli
2012-Jul-06 13:42 UTC
Re: [PATCH 09 of 10 v3] libxl: have NUMA placement deal with cpupools
On Fri, 2012-07-06 at 14:27 +0100, George Dunlap wrote:> > Maybe, if we''re cool with patch 8, we can jut skip this for now, and > > I''ll resubmit a separate patch (where I''ll deal with duplicates) like > > later or on Monday? > Well, before discussing acking or nacking, I just wanted to establish > that this is what the code did. >Sure.> Why don''t we do this: let''s check in this version, so we can start > getting the cpu placement stuff tested. Then if there''s time, you can > post patches to do the filtering at the node generation stage rather > than the filtering stage. Does that make sense? >It does to me, and I also think it''s important to start seeing how this deals with some more thorough (automated or not) testing. Especially considering that changing the generator (and this applies also to the max-VS-sort thing) ex-post won''t imply any change in the algorithm, so the test results we get with this version will still be valid (at least conceptually :-D). Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Dario Faggioli
2012-Jul-06 13:52 UTC
Re: [PATCH 08 of 10 v3] libxl: enable automatic placement of guests on NUMA nodes
On Fri, 2012-07-06 at 14:21 +0100, Ian Campbell wrote:
> > Ok, I see. What about something like the below:
> >
> > "However, if the amount of free memory of two candidates is very
> > similar, we look at how many domains are assigned to each candidate, and
> > take the one that has fewer of them."
>
> Looks good, thanks.
>
Ok.

> Is it still "very similar" now that it is a total order or is it
> actually a strict equality now? (or an equality down to the precision of
> a float...)
>
It is, at least according to me. In fact, if the normalized difference
between the amount of free memory of two candidates is 0.1 (i.e.,
candidates are "very similar") and the difference in number of domains
is -0.5 (one has half the domains the other does) we get
3*0.1-0.5=-0.2 ==> the number of domains is important. OTOH, a
normalized difference of 0.2 in the amount of free memory (i.e.,
candidates are no longer "very similar") is enough to prevent the same
difference in number of domains from subverting the sign of the
comparison.

It of course depends on how much detail we want to go into in this kind
of documentation. Given there's no way either an application developer
or a user could change/affect the parameters controlling the
heuristics, I think what we have there is just fine.

Did I persuade you? :-P

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
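In code, the comparison being described boils down to something like the
sketch below. It is illustrative only: the helper and the exact form of the
real libxl comparison function may differ, but the fields (free_memkb,
nr_domains) and the 3x weight are the ones from the patch and the arithmetic
above:

    /* normalized_diff() maps the difference between two non-negative
     * quantities into [-1, 1]. */
    static double normalized_diff(double a, double b)
    {
        double m = a > b ? a : b;

        return m ? (a - b) / m : 0.0;
    }

    static int candidate_is_better(const libxl__numa_candidate *c1,
                                   const libxl__numa_candidate *c2)
    {
        double freemem_diff = normalized_diff(c1->free_memkb, c2->free_memkb);
        double nrdoms_diff  = normalized_diff(c2->nr_domains, c1->nr_domains);

        /* Free memory weighs 3x the number of assigned domains, so e.g.
         * 3*0.1 - 0.5 = -0.2: a small memory advantage can be outweighed by
         * hosting more domains, while a 0.2 advantage cannot. */
        return 3*freemem_diff + nrdoms_diff > 0;
    }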
Ian Campbell
2012-Jul-06 13:54 UTC
Re: [PATCH 08 of 10 v3] libxl: enable automatic placement of guests on NUMA nodes
On Fri, 2012-07-06 at 14:52 +0100, Dario Faggioli wrote:> On Fri, 2012-07-06 at 14:21 +0100, Ian Campbell wrote: > > > Ok, I see. What about something like the below: > > > > > > "However, if the amount of free memory of two candidates is very > > > similar, we look at how many domains are assigned to each candidate, and > > > take the one that has fewer of them." > > > > Looks good, thanks. > > > Ok. > > > Is it still "very similar" now that it is a total order or is it > > actually a strict equality now? (or an equality down to the precision of > > a float...) > > > It is, at least according to me. In fact, if the normalized difference > between the amount of free memory of two candidates is 0.1 (i.e., > candidates are "very similar") and the difference in number of domains > is -0.5 (one has half the domains the other does) we get > 3*0.1-0.5=-0.2 ==> the number of domains is important. OTOH, a > normalized difference of 0.2 in the amount of free memory (i.e., > candidates are no longer "very similar") is enough to prevent the same > difference in number of domains from subverting the sign of the > comparison. > > It of course depends on how much details we want to go in in this kind > of documentation. Given there''s no way neither an application developer > nor an user could change/affect the parameters controlling the > heuristics, I think what we have there is just fine. > > Did I persuade you? :-PYup, wasn''t asking for docs just making sure I understood ;-)> > Thanks and Regards, > Dario >
George Dunlap
2012-Jul-06 14:08 UTC
Re: [PATCH 10 of 10 v3] Some automatic NUMA placement documentation
On 04/07/12 17:18, Dario Faggioli wrote:> # HG changeset patch > # User Dario Faggioli<raistlin@linux.it> > # Date 1341416324 -7200 > # Node ID f1523c3dc63746e07b11fada5be3d461c3807256 > # Parent 885e2f385601d66179058bfb6bd3960f17d5e068 > Some automatic NUMA placement documentation > > About rationale, usage and (some small bits of) API. > > Signed-off-by: Dario Faggioli<dario.faggioli@citrix.com> > Acked-by: Ian Campbell<ian.campbell@citrix.com> > > Changes from v1: > * API documentation moved close to the actual functions. > > diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown > new file mode 100644 > --- /dev/null > +++ b/docs/misc/xl-numa-placement.markdown > @@ -0,0 +1,91 @@ > +# Guest Automatic NUMA Placement in libxl and xl # > + > +## Rationale ## > + > +NUMA means the memory accessing times of a program running on a CPU depends on > +the relative distance between that CPU and that memory. In fact, most of the > +NUMA systems are built in such a way that each processor has its local memory, > +on which it can operate very fast. On the other hand, getting and storing data > +from and on remote memory (that is, memory local to some other processor) is > +quite more complex and slow. On these machines, a NUMA node is usually defined > +as a set of processor cores (typically a physical CPU package) and the memory > +directly attached to the set of cores. > + > +The Xen hypervisor deals with Non-Uniform Memory Access (NUMA]) machines by > +assigning to its domain a "node affinity", i.e., a set of NUMA nodes of the > +host from which it gets its memory allocated. > + > +NUMA awareness becomes very important as soon as many domains start running > +memory-intensive workloads on a shared host. In fact, the cost of accessing non > +node-local memory locations is very high, and the performance degradation is > +likely to be noticeable. > + > +## Guest Placement in xl ## > + > +If using xl for creating and managing guests, it is very easy to ask for both > +manual or automatic placement of them across the host''s NUMA nodes. > + > +Note that xm/xend does the very same thing, the only differences residing in > +the details of the heuristics adopted for the placement (see below). > + > +### Manual Guest Placement with xl ### > + > +Thanks to the "cpus=" option, it is possible to specify where a domain should > +be created and scheduled on, directly in its config file. This affects NUMA > +placement and memory accesses as the hypervisor constructs the node affinity of > +a VM basing right on its CPU affinity when it is created. > + > +This is very simple and effective, but requires the user/system administrator > +to explicitly specify affinities for each and every domain, or Xen won''t be > +able to guarantee the locality for their memory accesses. > + > +It is also possible to deal with NUMA by partitioning the system using cpupools > +(available in the upcoming release of Xen, 4.2). Again, this could be "The > +Right Answer" for many needs and occasions, but has to to be carefully > +considered and manually setup by hand. > + > +### Automatic Guest Placement with xl ### > + > +In case no "cpus=" option is specified in the config file, libxl tries toI think "If no ''cpus='' option..." is better here.> +figure out on its own on which node(s) the domain could fit best. It is > +worthwhile noting that optimally fitting a set of VMs on the NUMA nodes of an > +host host is an incarnation of the Bin Packing Problem. 
In fact, the varioushost host> +VMs with different memory sizes are the items to be packed, and the host nodes > +are the bins. That is known to be NP-hard, thus, it is probably better to > +tackle the problem with some sort of hauristics, as we do not have any oracle > +available!I think you can just say "...is an incarnation of the Bin Packing Problem, which is known to be NP-hard." We will therefore be using some heuristics." (nb the spelling of "heuristics" as well.)> + > +The first thing to do is finding a node, or even a set of nodes, that have > +enough free memory and enough physical CPUs for accommodating the one new > +domain. The idea is to find a spot for the domain with at least as much free > +memory as it has configured, and as much pCPUs as it has vCPUs. After that, > +the actual decision on which solution to go for happens accordingly to the > +following heuristics: > + > + * candidates involving fewer nodes come first. In case two (or more) > + candidates span the same number of nodes, > + * the amount of free memory and the number of domains assigned to the > + candidates are considered. In doing that, candidates with greater amount > + of free memory and fewer assigned domains are preferred, with free memory > + "weighting" three times as much as number of domains. > + > +Giving preference to small candidates ensures better performance for the guest,I think I would say "candidates with fewer nodes" here; "small candidates" doesn''t convey "fewer nodes" to me.> +as it avoid spreading its memory among different nodes. Favouring the nodes > +that have the biggest amounts of free memory helps keeping the memoryWe normally don''t say "big amount", but "large amount" (don''t ask me why -- just sounds a bit funny to me). So this would be "largest amount".> +fragmentation small, from a system wide perspective. However, in case moreAgain, s/in case/if/; Other than that, looks good to me. -George> +candidates fulfil these criteria by roughly the same extent, having the number > +of domains the candidates are "hosting" helps balancing the load on the various > +nodes. > + > +## Guest Placement within libxl ## > + > +xl achieves automatic NUMA just because libxl does it interrnally. > +No API is provided (yet) for interacting with this feature and modify > +the library behaviour regarding automatic placement, it just happens > +by default if no affinity is specified (as it is with xm/xend). > + > +For actually looking and maybe tweaking the mechanism and the algorithms it > +uses, all is implemented as a set of libxl internal interfaces and facilities. > +Look at the comment "Automatic NUMA placement" in libxl\_internal.h. > + > +Note this may change in future versions of Xen/libxl.
George Dunlap
2012-Jul-06 14:26 UTC
Re: [PATCH 10 of 10 v3] Some automatic NUMA placement documentation
On 06/07/12 15:08, George Dunlap wrote:> On 04/07/12 17:18, Dario Faggioli wrote: >> +that have the biggest amounts of free memory helps keeping the memory > We normally don''t say "big amount", but "large amount" (don''t ask me > why -- just sounds a bit funny to me). So this would be "largest amount".BTW, the rule as far as I can tell is this: * "big" can apply to countable things, physical or not; but can''t apply to quantities. So "big cat", "big house", "big idea", and "big ego" are OK, but "big volume" or "big amount" are wrong. * "large" can apply to quantities, and to physical things, but not to non-physical things. So "large cat", "large house", "large volume", and "large amount" are all OK; but "large idea" and "large ego" are wrong. -George
Dario Faggioli
2012-Jul-06 14:35 UTC
Re: [PATCH 08 of 10 v3] libxl: enable automatic placement of guests on NUMA nodes
On Fri, 2012-07-06 at 14:05 +0100, George Dunlap wrote:> > Ok, I get this like I can leave it as it is... Or you want me to kill > > the sorting? > I can''t really foresee a time when anyone would want to use anything > other than the best option. >Well, perhaps being able to do some more fiddling, like <<now that I have all the ones that meet _THESE_ characteristics, and I have them in _THAT_ precise ordering, let''s pick up the first that meets _THIS_OTHER_ requirement>>. Anyway, it might well be over-thinking the whole thing. In my first RFC, when I was introducing more placement policies (and the respective user interfaces and configuration bits), I was exploiting the fact that, like this, I never loose access to the full list of candidates, so, maybe when/if that will be the case again (during 4.3 dev cycle) everything will be more clear. As soon as we''ll have a better picture of what feature and what interface we want this whole placement thing to have, what kind of users and behaviour we want to support (e.g., what libvirt does and what does it require wrt NUMA placement), we could better decide what to do. That''s the benefit of having all this internally and not exported in any means yet (a benefit for which I give you and Ian all the credits :-P).> Just choosing the best makes a slightly > simpler interface, and simplified the code somewhat. >I can''t and am not going to argue against that, as I think that too.> At the moment, > sorting shouldn''t take too long, but suppose we get systems with 128 > nodes at some point in the future -- then the number of possible > combinations might be pretty large, and sorting that even at n log n > might take a noticeable amount of time. >Ditto.> So I think it''s up to you: If you thinking sorting will be useful in the > future, then I think keep it. But if you also think it''s not going to > be very useful, I think it would make more sense to take it out. >As we agreed elsewhere in the thread, let''s keep it like this for now, and see how it behaves. I''ll keep an eye at it, and won''t push for keeping sort() instead of max() if shows to not provide any benefit. Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Dario Faggioli
2012-Jul-06 14:37 UTC
Re: [PATCH 10 of 10 v3] Some automatic NUMA placement documentation
On Fri, 2012-07-06 at 15:26 +0100, George Dunlap wrote:> BTW, the rule as far as I can tell is this: > * "big" can apply to countable things, physical or not; but can''t apply > to quantities. So "big cat", "big house", "big idea", and "big ego" are > OK, but "big volume" or "big amount" are wrong. > * "large" can apply to quantities, and to physical things, but not to > non-physical things. So "large cat", "large house", "large volume", and > "large amount" are all OK; but "large idea" and "large ego" are wrong. >Wow. Very interesting (/me taking notes for the next time :-D) Thanks, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
George Dunlap
2012-Jul-06 14:40 UTC
Re: [PATCH 08 of 10 v3] libxl: enable automatic placement of guests on NUMA nodes
On 06/07/12 15:35, Dario Faggioli wrote:> On Fri, 2012-07-06 at 14:05 +0100, George Dunlap wrote: >>> Ok, I get this like I can leave it as it is... Or you want me to kill >>> the sorting? >> I can''t really foresee a time when anyone would want to use anything >> other than the best option. >> > Well, perhaps being able to do some more fiddling, like<<now that I > have all the ones that meet _THESE_ characteristics, and I have them in > _THAT_ precise ordering, let''s pick up the first that meets _THIS_OTHER_ > requirement>>. > > Anyway, it might well be over-thinking the whole thing. In my first RFC, > when I was introducing more placement policies (and the respective user > interfaces and configuration bits), I was exploiting the fact that, like > this, I never loose access to the full list of candidates, so, maybe > when/if that will be the case again (during 4.3 dev cycle) everything > will be more clear. > > As soon as we''ll have a better picture of what feature and what > interface we want this whole placement thing to have, what kind of users > and behaviour we want to support (e.g., what libvirt does and what does > it require wrt NUMA placement), we could better decide what to do. > That''s the benefit of having all this internally and not exported in any > means yet (a benefit for which I give you and Ian all the credits :-P). > >> Just choosing the best makes a slightly >> simpler interface, and simplified the code somewhat. >> > I can''t and am not going to argue against that, as I think that too. > >> At the moment, >> sorting shouldn''t take too long, but suppose we get systems with 128 >> nodes at some point in the future -- then the number of possible >> combinations might be pretty large, and sorting that even at n log n >> might take a noticeable amount of time. >> > Ditto. > >> So I think it''s up to you: If you thinking sorting will be useful in the >> future, then I think keep it. But if you also think it''s not going to >> be very useful, I think it would make more sense to take it out. >> > As we agreed elsewhere in the thread, let''s keep it like this for now, > and see how it behaves. I''ll keep an eye at it, and won''t push for > keeping sort() instead of max() if shows to not provide any benefit.OK -- then I think we have a go-ahead to check in patches 8 and 9 (with perhaps some changes to wording of some comments?). -George
Ian Campbell
2012-Jul-06 16:27 UTC
Re: [PATCH 08 of 10 v3] libxl: enable automatic placement of guests on NUMA nodes
On Fri, 2012-07-06 at 15:40 +0100, George Dunlap wrote:> On 06/07/12 15:35, Dario Faggioli wrote: > > On Fri, 2012-07-06 at 14:05 +0100, George Dunlap wrote: > >>> Ok, I get this like I can leave it as it is... Or you want me to kill > >>> the sorting? > >> I can''t really foresee a time when anyone would want to use anything > >> other than the best option. > >> > > Well, perhaps being able to do some more fiddling, like<<now that I > > have all the ones that meet _THESE_ characteristics, and I have them in > > _THAT_ precise ordering, let''s pick up the first that meets _THIS_OTHER_ > > requirement>>. > > > > Anyway, it might well be over-thinking the whole thing. In my first RFC, > > when I was introducing more placement policies (and the respective user > > interfaces and configuration bits), I was exploiting the fact that, like > > this, I never loose access to the full list of candidates, so, maybe > > when/if that will be the case again (during 4.3 dev cycle) everything > > will be more clear. > > > > As soon as we''ll have a better picture of what feature and what > > interface we want this whole placement thing to have, what kind of users > > and behaviour we want to support (e.g., what libvirt does and what does > > it require wrt NUMA placement), we could better decide what to do. > > That''s the benefit of having all this internally and not exported in any > > means yet (a benefit for which I give you and Ian all the credits :-P). > > > >> Just choosing the best makes a slightly > >> simpler interface, and simplified the code somewhat. > >> > > I can''t and am not going to argue against that, as I think that too. > > > >> At the moment, > >> sorting shouldn''t take too long, but suppose we get systems with 128 > >> nodes at some point in the future -- then the number of possible > >> combinations might be pretty large, and sorting that even at n log n > >> might take a noticeable amount of time. > >> > > Ditto. > > > >> So I think it''s up to you: If you thinking sorting will be useful in the > >> future, then I think keep it. But if you also think it''s not going to > >> be very useful, I think it would make more sense to take it out. > >> > > As we agreed elsewhere in the thread, let''s keep it like this for now, > > and see how it behaves. I''ll keep an eye at it, and won''t push for > > keeping sort() instead of max() if shows to not provide any benefit. > OK -- then I think we have a go-ahead to check in patches 8 and 9 (with > perhaps some changes to wording of some comments?).Patch 8 had a crash whcih needs resolving first. I''m expecting Dario will repost. Ian.
Ian Campbell
2012-Jul-08 18:32 UTC
Re: [PATCH 00 of 10 v3] Automatic NUMA placement for xl
(resend as I don''t think I had SMTP setup properly on my laptop, sorry if you get this twice!) On Wed, 2012-07-04 at 12:17 -0400, Dario Faggioli wrote:> Hello, > > Third version of the NUMA placement series Xen 4.2.Starting an HVM guest (although I don''t suppose it is HVM specific) on a none NUMA system I get this: libxl: debug: libxl_numa.c:270:libxl__get_numa_candidates: only 1 node. no placement required libxl: detail: libxl_dom.c:175:numa_place_domain: 0 NUMA placement candidates found libxl: notice: libxl_dom.c:180:numa_place_domain: NUMA placement failed, performance might be affected this being a non-NUMA system I suppose it is not the end of the world. It''d be nice to avoid the warning though -- perhaps libxl__get_numa_candidates should either not special case single node systems or it should manually return the trivial candidate set?> All the comments received during v2''s review have been addressed (more details > in single changelogs). > > The most notable changes are the following: > - the libxl_cpumap --> libxl_bitmap renaming has been rebased on top of the > recent patches that allows us to allocate bitmaps of different sizes; > - the heuristics for deciding which NUMA placement is the best one has been > redesigned, so that it now provides total ordering. > > Here it is what this posting contains (* = acked during previous round): > > * [PATCH 01 of 10 v3] libxl: add a new Array type to the IDL > [PATCH 02 of 10 v3] libxl,libxc: introduce libxl_get_numainfo() > * [PATCH 03 of 10 v3] xl: add more NUMA information to `xl info -n'' > [PATCH 04 of 10 v3] libxl: rename libxl_cpumap to libxl_bitmap > [PATCH 05 of 10 v3] libxl: expand the libxl_bitmap API a bit > * [PATCH 06 of 10 v3] libxl: introduce some node map helpers > [PATCH 07 of 10 v3] libxl: explicitly check for libmath in autoconf > > Is where data structures, utility functions and infrastructure are introduced. > > * [PATCH 08 of 10 v3] libxl: enable automatic placement of guests on NUMA nodes > * [PATCH 09 of 10 v3] libxl: have NUMA placement deal with cpupools > > Host the core of the mechanism. > > * [PATCH 10 of 10 v3] Some automatic NUMA placement documentation > > For some more documentation. > > Thanks a lot and Regards, > Dario >
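The "trivial candidate" / "don't special-case one node" suggestion above is a
small change either way; on the caller side it could be as simple as the
following (a rough sketch with assumed variable names, not the fix that was
eventually committed):

    /* If no candidate was found because the host has a single node, there is
     * simply nothing to place: return success quietly instead of escalating
     * to the "NUMA placement failed" notice. */
    if (nr_candidates == 0) {
        LOG(DETAIL, "no placement candidates: nothing to do");
        rc = 0;
        goto out;
    }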
Dario Faggioli
2012-Jul-09 14:32 UTC
Re: [PATCH 00 of 10 v3] Automatic NUMA placement for xl
On Sun, 2012-07-08 at 12:32 -0600, Ian Campbell wrote:> On Wed, 2012-07-04 at 12:17 -0400, Dario Faggioli wrote: > > Hello, > > > > Third version of the NUMA placement series Xen 4.2. > > Starting an HVM guest (although I don''t suppose it is HVM specific) on a > none NUMA system I get this: > > libxl: debug: libxl_numa.c:270:libxl__get_numa_candidates: only 1 node. no placement required > libxl: detail: libxl_dom.c:175:numa_place_domain: 0 NUMA placement candidates found > libxl: notice: libxl_dom.c:180:numa_place_domain: NUMA placement failed, performance might be affected >Mmm... I see. No, it''s not HVM specific and you''re right, it does not make much sense.> this being a non-NUMA system I suppose it is not the end of the world. > It''d be nice to avoid the warning though -- perhaps > libxl__get_numa_candidates should either not special case single node > systems or it should manually return the trivial candidate set? >Yep, I can definitely do that, along with the other fix you suggested and the wording of the doc patch and resend the last three. Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Dario Faggioli
2012-Jul-10 15:16 UTC
Re: [PATCH 09 of 10 v3] libxl: have NUMA placement deal with cpupools
On Fri, 2012-07-06 at 14:27 +0100, George Dunlap wrote:
> Why don't we do this: let's check in this version, so we can start
> getting the cpu placement stuff tested. Then if there's time, you can
> post patches to do the filtering at the node generation stage rather
> than the filtering stage. Does that make sense?
>
FYI, while at it, I modified the filtering, basically moving it to
nodemap generation time, so that it now only generates and considers
the candidates that make actual sense, taking the domain's cpupool into
account, without all those duplicates.

Thanks for pointing that out, I knew it was an issue, but did not
realize how ugly it was. :-)

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
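The reworked generation described here amounts to enumerating combinations
over the suitable nodes only, along these lines (hypothetical sketch; the
actual follow-up patch differs in detail):

    /* Collect the ids of the nodes that have at least one suitable cpu, and
     * let comb_init()/comb_next() iterate over indexes into this array
     * (0 .. nr_suitable-1) instead of over 0 .. nr_nodes-1. A cpupool whose
     * cpus all sit on one node then yields exactly one candidate, with no
     * duplicates to filter out afterwards. */
    int suitable_nodes[nr_nodes], nr_suitable = 0, i;

    libxl_for_each_set_bit(i, suitable_nodemap)
        suitable_nodes[nr_suitable++] = i;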