Dario Faggioli
2013-Nov-22 18:56 UTC
[PATCH v4 00/15] Implement vcpu soft affinity for credit1
Take 4 for soft affinities. Very briefly, what this series does is allow each vcpu to have:

 - a hard affinity, which vcpus already have, and which we usually call pinning. This is the list of pcpus where a vcpu is allowed to run;
 - a soft affinity, which this series introduces. This is the list of pcpus where a vcpu *prefers* to run.

Once that is in place, per-vcpu NUMA-aware scheduling is easily implemented on top of it, just by instructing libxl to issue the proper call to set up the soft affinity of the domain's vcpus to be equal to its node-affinity.

In this version, I addressed all the comments I got during v3 [*], mostly at the libxl and xl level. For xl, I re-organized and refactored the code, as was requested. For libxl, I changed the interface as IanC suggested, so now there is only one function dealing with both hard and soft affinities, via 2 cpumaps, either of which (but not both!) can be NULL. To honour libxl API stability, LIBXL_API_VERSION #if-deffery is exploited, as also suggested during v3 review.
Release wise, George expressed himself in favour of a freeze exception here:
http://bugs.xenproject.org/xen/mid/<528B8B25.6000608@eu.citrix.com>

The patch series is available in the following git branch:

  git://xenbits.xen.org/people/dariof/xen.git numa/per-vcpu-affinity-v4

Thanks and Regards,
Dario

[*] http://bugs.xenproject.org/xen/mid/<20131118175544.31002.79574.stgit@Solace>
---
Dario Faggioli (15):
 a    xl: match output of vcpu-list with pinning syntax
      libxl: sanitize error handling in libxl_get_max_{cpus,nodes}
 n    libxl: introduce libxl_get_nr_cpus()
 ra   xl: allow for node-wise specification of vcpu pinning
 a    xl: implement and enable dryrun mode for `xl vcpu-pin'
 a    xl: test script for the cpumap parser (for vCPU pinning)
 r    xen: sched: rename v->cpu_affinity into v->cpu_hard_affinity
 r    xen: sched: introduce soft-affinity and use it instead of d->node-affinity
 r    xen: derive NUMA node affinity from hard and soft CPU affinity
 r    xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
 a    libxc: get and set soft and hard affinity
      libxl: get and set soft affinity
      xl: enable getting and setting soft affinities
 *    xl: enable for specifying node-affinity in the config file
 a    libxl: automatic NUMA placement affects soft affinity

n = new in v4
r = has been 'Reviewed-by'
a = has been 'Acked-by'
* = has been 'Acked-by' but the implementation changed a bit (IanJ explicitly said to point this out)

 docs/man/xl.cfg.pod.5                           |   64 ++-
 docs/man/xl.pod.1                               |   24 +
 docs/misc/xl-numa-placement.markdown            |  162 +++---
 tools/libxc/xc_domain.c                         |   47 +-
 tools/libxc/xenctrl.h                           |   44 ++
 tools/libxl/Makefile                            |    2
 tools/libxl/check-xl-vcpupin-parse              |  294 ++++++++++++++
 tools/libxl/check-xl-vcpupin-parse.data-example |   53 +++
 tools/libxl/libxl.c                             |   96 ++++-
 tools/libxl/libxl.h                             |   41 ++
 tools/libxl/libxl_create.c                      |    6
 tools/libxl/libxl_dom.c                         |   23 +
 tools/libxl/libxl_types.idl                     |    4
 tools/libxl/libxl_utils.c                       |   76 ++++
 tools/libxl/libxl_utils.h                       |   47 +-
 tools/libxl/xl_cmdimpl.c                        |  482 +++++++++++++++--------
 tools/libxl/xl_cmdtable.c                       |    5
 tools/ocaml/libs/xc/xenctrl_stubs.c             |    8
 tools/python/xen/lowlevel/xc/xc.c               |    6
 xen/arch/x86/traps.c                            |   13 -
 xen/common/domain.c                             |   86 ++--
 xen/common/domctl.c                             |   92 ++++
 xen/common/keyhandler.c                         |    4
 xen/common/sched_credit.c                       |  161 +++---
 xen/common/sched_sedf.c                         |    2
 xen/common/schedule.c                           |   57 ++-
 xen/common/wait.c                               |   10
 xen/include/public/domctl.h                     |   15 +
 xen/include/xen/sched.h                         |   14 -
 29 files changed, 1452 insertions(+), 486 deletions(-)
 create mode 100755 tools/libxl/check-xl-vcpupin-parse
 create mode 100644 tools/libxl/check-xl-vcpupin-parse.data-example

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
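To make the hard/soft distinction described in the cover letter concrete, here is a toy model of the intended semantics (an illustrative sketch only, not the credit1 scheduler code; pick_pcpu() and the single-word bitmaps are made up for this example): a pcpu is eligible only if it is in the hard-affinity mask, and among the eligible ones, pcpus also in the soft-affinity mask are preferred, falling back to the plain hard mask when the two do not intersect.

```c
#include <stdint.h>

/* Toy model: affinity masks as 64-bit bitmaps, bit i == pcpu i.
 * pick_pcpu() is a made-up helper, NOT the actual Xen code. */
static int pick_pcpu(uint64_t hard, uint64_t soft)
{
    uint64_t preferred = hard & soft;                 /* allowed AND preferred */
    uint64_t eligible = preferred ? preferred : hard; /* fall back to hard */
    int cpu;

    if (!eligible)
        return -1;                                    /* empty hard affinity */
    for (cpu = 0; !(eligible & ((uint64_t)1 << cpu)); cpu++)
        ;                                             /* lowest eligible pcpu */
    return cpu;
}
```

The point of the fallback is that soft affinity is only a preference: an empty intersection must never prevent the vcpu from running inside its hard mask.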
Dario Faggioli
2013-Nov-22 18:56 UTC
[PATCH v4 01/15] xl: match output of vcpu-list with pinning syntax
in fact, pinning to all the pcpus happens by specifying "all" (either on the command line or in the config file), while `xl vcpu-list' reports it as "any cpu". Change this into something more consistent, by using "all" everywhere.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes since v1:
 * this patch was not there in v1. It is there now as using the same syntax for both input and output was requested during review.
---
 tools/libxl/xl_cmdimpl.c | 27 +++++++-------------------
 1 file changed, 7 insertions(+), 20 deletions(-)

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 341863e..26d7712 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -3118,8 +3118,7 @@ out:
     }
 }

-/* If map is not full, prints it and returns 0. Returns 1 otherwise. */
-static int print_bitmap(uint8_t *map, int maplen, FILE *stream)
+static void print_bitmap(uint8_t *map, int maplen, FILE *stream)
 {
     int i;
     uint8_t pmap = 0, bitmask = 0;
@@ -3157,28 +3156,16 @@ static int print_bitmap(uint8_t *map, int maplen, FILE *stream)
     case 2:
         break;
     case 1:
-        if (firstset == 0)
-            return 1;
+        if (firstset == 0) {
+            fprintf(stream, "all");
+            break;
+        }
     case 3:
         fprintf(stream, "%s%d", state > 1 ? "," : "", firstset);
         if (i - 1 > firstset)
             fprintf(stream, "-%d", i - 1);
         break;
     }
-
-    return 0;
-}
-
-static void print_cpumap(uint8_t *map, int maplen, FILE *stream)
-{
-    if (print_bitmap(map, maplen, stream))
-        fprintf(stream, "any cpu");
-}
-
-static void print_nodemap(uint8_t *map, int maplen, FILE *stream)
-{
-    if (print_bitmap(map, maplen, stream))
-        fprintf(stream, "any node");
 }

 static void list_domains(int verbose, int context, int claim, int numa,
@@ -3251,7 +3238,7 @@ static void list_domains(int verbose, int context, int claim, int numa,
             libxl_domain_get_nodeaffinity(ctx, info[i].domid, &nodemap);
             putchar(' ');
-            print_nodemap(nodemap.map, physinfo.nr_nodes, stdout);
+            print_bitmap(nodemap.map, physinfo.nr_nodes, stdout);
         }
         putchar('\n');
     }
@@ -4463,7 +4450,7 @@ static void print_vcpuinfo(uint32_t tdomid,
     /* TIM */
     printf("%9.1f ", ((float)vcpuinfo->vcpu_time / 1e9));
     /* CPU AFFINITY */
-    print_cpumap(vcpuinfo->cpumap.map, nr_cpus, stdout);
+    print_bitmap(vcpuinfo->cpumap.map, nr_cpus, stdout);
     printf("\n");
 }
Dario Faggioli
2013-Nov-22 18:56 UTC
[PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus,nodes}
as well as both error handling and logging in libxl_cpu_bitmap_alloc and libxl_node_bitmap_alloc. Now libxl_get_max_{cpus,nodes} either return a positive number, or a libxl error code. Thanks to that, it is possible to fix logging for the two bitmap allocation functions, which now happens _inside_ the functions themselves and reports what happens more accurately.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Changes from v3:
 * switch the functions to LOG() / LOGE().
 * take care of the callers of libxl_get_max_{cpus,nodes}() too.

Changes from v2:
 * this wasn't there in v2, but fixing this for v3 was requested during v2 review.
---
 tools/libxl/libxl.c       | 14 ++++-------
 tools/libxl/libxl_utils.c | 58 +++++++++++++++++++++++++++++++++++++++++++--
 tools/libxl/libxl_utils.h | 32 +++++--------------
 3 files changed, 67 insertions(+), 37 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 9b93262..61d8a76 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -611,10 +611,8 @@ static int cpupool_info(libxl__gc *gc,
     info->n_dom = xcinfo->n_dom;
     rc = libxl_cpu_bitmap_alloc(CTX, &info->cpumap, 0);
     if (rc)
-    {
-        LOG(ERROR, "unable to allocate cpumap %d\n", rc);
         goto out;
-    }
+
     memcpy(info->cpumap.map, xcinfo->cpumap, info->cpumap.size);

     rc = 0;
@@ -4351,7 +4349,7 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
     int max_cpus;

     max_cpus = libxl_get_max_cpus(ctx);
-    if (max_cpus == 0)
+    if (max_cpus <= 0)
     {
         LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of CPUS");
         ret = NULL;
@@ -4416,7 +4414,7 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
     int i, j, max_nodes;

     max_nodes = libxl_get_max_nodes(ctx);
-    if (max_nodes == 0)
+    if (max_nodes <= 0)
     {
         LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of NODES");
         ret = NULL;
@@ -4538,10 +4536,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid,
     }
     for (*nb_vcpu = 0; *nb_vcpu <= domaininfo.max_vcpu_id; ++*nb_vcpu, ++ptr) {
-        if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0)) {
-            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "allocating cpumap");
+        if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0))
             return NULL;
-        }
         if (xc_vcpu_getinfo(ctx->xch, domid, *nb_vcpu, &vcpuinfo) == -1) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu info");
             return NULL;
@@ -5304,7 +5300,7 @@ int libxl_get_freecpus(libxl_ctx *ctx, libxl_bitmap *cpumap)
     int ncpus;

     ncpus = libxl_get_max_cpus(ctx);
-    if (ncpus == 0)
+    if (ncpus <= 0)
         return ERROR_FAIL;

     cpumap->map = xc_cpupool_freeinfo(ctx->xch);
diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
index 9f5f589..1815422 100644
--- a/tools/libxl/libxl_utils.c
+++ b/tools/libxl/libxl_utils.c
@@ -651,6 +651,56 @@ char *libxl_bitmap_to_hex_string(libxl_ctx *ctx, const libxl_bitmap *bitmap)
     return q;
 }

+int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap, int max_cpus)
+{
+    GC_INIT(ctx);
+    int rc = 0;
+
+    if (max_cpus < 0) {
+        rc = ERROR_INVAL;
+        goto out;
+    }
+    if (max_cpus == 0)
+        max_cpus = libxl_get_max_cpus(ctx);
+    if (max_cpus <= 0) {
+        LOG(ERROR, "failed to retrieve the maximum number of cpus");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+    /* This can't fail: no need to check and log */
+    libxl_bitmap_alloc(ctx, cpumap, max_cpus);
+
+ out:
+    GC_FREE;
+    return rc;
+}
+
+int libxl_node_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *nodemap,
+                            int max_nodes)
+{
+    GC_INIT(ctx);
+    int rc = 0;
+
+    if (max_nodes < 0) {
+        rc = ERROR_INVAL;
+        goto out;
+    }
+
+    if (max_nodes == 0)
+        max_nodes = libxl_get_max_nodes(ctx);
+    if (max_nodes <= 0) {
+        LOG(ERROR, "failed to retrieve the maximum number of nodes");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+    /* This can't fail: no need to check and log */
+    libxl_bitmap_alloc(ctx, nodemap, max_nodes);
+
+ out:
+    GC_FREE;
+    return rc;
+}
+
 int libxl_nodemap_to_cpumap(libxl_ctx *ctx,
                             const libxl_bitmap *nodemap,
                             libxl_bitmap *cpumap)
@@ -719,12 +769,16 @@ int libxl_cpumap_to_nodemap(libxl_ctx *ctx,

 int libxl_get_max_cpus(libxl_ctx *ctx)
 {
-    return xc_get_max_cpus(ctx->xch);
+    int max_cpus = xc_get_max_cpus(ctx->xch);
+
+    return max_cpus <= 0 ? ERROR_FAIL : max_cpus;
 }

 int libxl_get_max_nodes(libxl_ctx *ctx)
 {
-    return xc_get_max_nodes(ctx->xch);
+    int max_nodes = xc_get_max_nodes(ctx->xch);
+
+    return max_nodes <= 0 ? ERROR_FAIL : max_nodes;
 }

 int libxl__enum_from_string(const libxl_enum_string_table *t,
diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h
index 7b84e6a..b11cf28 100644
--- a/tools/libxl/libxl_utils.h
+++ b/tools/libxl/libxl_utils.h
@@ -98,32 +98,12 @@ static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit)
 #define libxl_for_each_set_bit(v, m) for (v = 0; v < (m).size * 8; v++) \
         if (libxl_bitmap_test(&(m), v))

-static inline int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap,
-                                         int max_cpus)
-{
-    if (max_cpus < 0)
-        return ERROR_INVAL;
-    if (max_cpus == 0)
-        max_cpus = libxl_get_max_cpus(ctx);
-    if (max_cpus == 0)
-        return ERROR_FAIL;
-
-    return libxl_bitmap_alloc(ctx, cpumap, max_cpus);
-}
-
-static inline int libxl_node_bitmap_alloc(libxl_ctx *ctx,
-                                          libxl_bitmap *nodemap,
-                                          int max_nodes)
-{
-    if (max_nodes < 0)
-        return ERROR_INVAL;
-    if (max_nodes == 0)
-        max_nodes = libxl_get_max_nodes(ctx);
-    if (max_nodes == 0)
-        return ERROR_FAIL;
-
-    return libxl_bitmap_alloc(ctx, nodemap, max_nodes);
-}
+int libxl_cpu_bitmap_alloc(libxl_ctx *ctx,
+                           libxl_bitmap *cpumap,
+                           int max_cpus);
+int libxl_node_bitmap_alloc(libxl_ctx *ctx,
+                            libxl_bitmap *nodemap,
+                            int max_nodes);

 /* Populate cpumap with the cpus spanned by the nodes in nodemap */
 int libxl_nodemap_to_cpumap(libxl_ctx *ctx,
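The return convention this patch establishes is worth spelling out: callers may no longer test for == 0, since libxl_get_max_cpus() now yields either a positive count or a negative libxl error code, so the check becomes <= 0. A minimal sketch of the pattern (the stub function, its names and ERROR_FAIL's numeric value here are assumptions for illustration, not libxl itself):

```c
/* Sketch of the "positive count or negative error code" convention.
 * ERROR_FAIL's value is assumed for illustration only. */
enum { SKETCH_ERROR_FAIL = -3 };

/* Stands in for the new libxl_get_max_cpus(): backend_answer plays the
 * role of xc_get_max_cpus()'s result; 0 or negative means failure. */
static int get_max_cpus_sketch(int backend_answer)
{
    return backend_answer <= 0 ? SKETCH_ERROR_FAIL : backend_answer;
}

/* Caller side: check <= 0, exactly as the patch changes libxl callers
 * to do, instead of the old (and now wrong) == 0 test. */
static int count_or_minus_one(int backend_answer)
{
    int max_cpus = get_max_cpus_sketch(backend_answer);

    return max_cpus <= 0 ? -1 : max_cpus;
}
```

Collapsing "0 means failure" and "negative means failure" into one <= 0 test is what lets every caller treat the error code uniformly without inspecting which failure occurred.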
Dario Faggioli
2013-Nov-22 18:56 UTC
[PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
to retrieve the actual number of pCPUs on the host. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- tools/libxl/libxl.h | 3 +++ tools/libxl/libxl_utils.c | 18 ++++++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index a9663e4..2fab5ba 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -652,6 +652,9 @@ int libxl_domain_preserve(libxl_ctx *ctx, uint32_t domid, libxl_domain_create_in /* get max. number of cpus supported by hypervisor */ int libxl_get_max_cpus(libxl_ctx *ctx); +/* get the actual number of online cpus on the host */ +int libxl_get_nr_cpus(libxl_ctx *ctx); + /* get max. number of NUMA nodes supported by hypervisor */ int libxl_get_max_nodes(libxl_ctx *ctx); diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c index 1815422..8763070 100644 --- a/tools/libxl/libxl_utils.c +++ b/tools/libxl/libxl_utils.c @@ -774,6 +774,24 @@ int libxl_get_max_cpus(libxl_ctx *ctx) return max_cpus <= 0 ? ERROR_FAIL : max_cpus; } +int libxl_get_nr_cpus(libxl_ctx *ctx) +{ + GC_INIT(ctx); + xc_physinfo_t physinfo = { 0 }; + int rc = 0; + + if (xc_physinfo(ctx->xch, &physinfo)) { + LOGE(ERROR, "xc_physinfo failed."); + rc = ERROR_FAIL; + goto out; + } + rc = physinfo.nr_cpus; + + out: + GC_FREE; + return rc; +} + int libxl_get_max_nodes(libxl_ctx *ctx) { int max_nodes = xc_get_max_nodes(ctx->xch);
Dario Faggioli
2013-Nov-22 18:57 UTC
[PATCH v4 04/15] xl: allow for node-wise specification of vcpu pinning
Making it possible to use something like the following: * "nodes:0-3": all pCPUs of nodes 0,1,2,3; * "nodes:0-3,^node:2": all pCPUS of nodes 0,1,3; * "1,nodes:1-2,^6": pCPU 1 plus all pCPUs of nodes 1,2 but not pCPU 6; * ... In both domain config file and `xl vcpu-pin''. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> --- Changes from v1 (of this very series): * actually checking for both "nodes:" and "node:" as per the doc says; * using strcmp() (rather than strncmp()) when matching "all", to avoid returning success on any longer string that just begins with "all"; * fixing the handling (well, the rejection, actually) of "^all" and "^nodes:all"; * make some string pointers const. Changes from v2 (of original series): * turned a ''return'' into ''goto out'', consistently with the most of exit patterns; * harmonized error handling: now parse_range() return a libxl error code, as requested during review; * dealing with "all" moved inside update_cpumap_range(). It''s tricky to move it in parse_range() (as requested during review), since we need the cpumap being modified handy when dealing with it. However, having it in update_cpumap_range() simplifies the code just as much as that; * explicitly checking for junk after a valid value or range in parse_range(), as requested during review; * xl exits on parsing failing, so no need to reset the cpumap to something sensible in vcpupin_parse(), as suggested during review; Changes from v1 (of original series): * code rearranged in order to look more simple to follow and understand, as requested during review; * improved docs in xl.cfg.pod.5, as requested during review; * strtoul() now returns into unsigned long, and the case where it returns ULONG_MAX is now taken into account, as requested during review; * stuff like "all,^7" now works, as requested during review. 
Specifying just "^7" does not work either before or after this change * killed some magic (i.e., `ptr += 5 + (ptr[4] == ''s''`) by introducing STR_SKIP_PREFIX() macro, as requested during review. --- docs/man/xl.cfg.pod.5 | 20 ++++++ tools/libxl/xl_cmdimpl.c | 153 +++++++++++++++++++++++++++++++++------------- 2 files changed, 128 insertions(+), 45 deletions(-) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index 3b227b7..5e17b5d 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -115,7 +115,25 @@ To allow all the vcpus of the guest to run on all the cpus on the host. =item "0-3,5,^1" -To allow all the vcpus of the guest to run on cpus 0,2,3,5. +To allow all the vcpus of the guest to run on cpus 0,2,3,5. Combining +this with "all" is possible, meaning "all,^7" results in all the vcpus +of the guest running on all the cpus on the host except cpu 7. + +=item "nodes:0-3,node:^2" + +To allow all the vcpus of the guest to run on the cpus from NUMA nodes +0,1,3 of the host. So, if cpus 0-3 belongs to node 0, cpus 4-7 belongs +to node 1 and cpus 8-11 to node 3, the above would mean all the vcpus +of the guest will run on cpus 0-3,8-11. + +Combining this notation with the one above is possible. For instance, +"1,node:2,^6", means all the vcpus of the guest will run on cpu 1 and +on all the cpus of NUMA node 2, but not on cpu 6. Following the same +example as above, that would be cpus 1,4,5,7. + +Combining this with "all" is also possible, meaning "all,^nodes:1" +results in all the vcpus of the guest running on all the cpus on the +host, except for the cpus belonging to the host NUMA node 1. =item ["2", "3"] (or [2, 3]) diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 26d7712..8138cfe 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -59,6 +59,11 @@ } \ }) +#define STR_HAS_PREFIX( a, b ) \ + ( strncmp(a, b, strlen(b)) == 0 ) +#define STR_SKIP_PREFIX( a, b ) \ + ( STR_HAS_PREFIX(a, b) ? 
((a) += strlen(b), 1) : 0 ) + int logfile = 2; @@ -563,61 +568,121 @@ static void split_string_into_string_list(const char *str, free(s); } -static int vcpupin_parse(char *cpu, libxl_bitmap *cpumap) +static int parse_range(const char *str, unsigned long *a, unsigned long *b) { - libxl_bitmap exclude_cpumap; - uint32_t cpuida, cpuidb; - char *endptr, *toka, *tokb, *saveptr = NULL; - int i, rc = 0, rmcpu; + const char *nstr; + char *endptr; - if (!strcmp(cpu, "all")) { - libxl_bitmap_set_any(cpumap); - return 0; + *a = *b = strtoul(str, &endptr, 10); + if (endptr == str || *a == ULONG_MAX) + return ERROR_INVAL; + + if (*endptr == ''-'') { + nstr = endptr + 1; + + *b = strtoul(nstr, &endptr, 10); + if (endptr == nstr || *b == ULONG_MAX || *b < *a) + return ERROR_INVAL; + } + + /* Valid value or range so far, but we also don''t want junk after that */ + if (*endptr != ''\0'') + return ERROR_INVAL; + + return 0; +} + +/* + * Add or removes a specific set of cpus (specified in str, either as + * single cpus or as entire NUMA nodes) to/from cpumap. + */ +static int update_cpumap_range(const char *str, libxl_bitmap *cpumap) +{ + unsigned long ida, idb; + libxl_bitmap node_cpumap; + bool is_not = false, is_nodes = false; + int rc = 0; + + libxl_bitmap_init(&node_cpumap); + + rc = libxl_node_bitmap_alloc(ctx, &node_cpumap, 0); + if (rc) { + fprintf(stderr, "libxl_node_bitmap_alloc failed.\n"); + goto out; } - if (libxl_cpu_bitmap_alloc(ctx, &exclude_cpumap, 0)) { - fprintf(stderr, "Error: Failed to allocate cpumap.\n"); - return ENOMEM; + /* Are we adding or removing cpus/nodes? 
*/ + if (STR_SKIP_PREFIX(str, "^")) { + is_not = true; } - for (toka = strtok_r(cpu, ",", &saveptr); toka; - toka = strtok_r(NULL, ",", &saveptr)) { - rmcpu = 0; - if (*toka == ''^'') { - /* This (These) Cpu(s) will be removed from the map */ - toka++; - rmcpu = 1; - } - /* Extract a valid (range of) cpu(s) */ - cpuida = cpuidb = strtoul(toka, &endptr, 10); - if (endptr == toka) { - fprintf(stderr, "Error: Invalid argument.\n"); - rc = EINVAL; - goto vcpp_out; - } - if (*endptr == ''-'') { - tokb = endptr + 1; - cpuidb = strtoul(tokb, &endptr, 10); - if (endptr == tokb || cpuida > cpuidb) { - fprintf(stderr, "Error: Invalid argument.\n"); - rc = EINVAL; - goto vcpp_out; + /* Are we dealing with cpus or full nodes? */ + if (STR_SKIP_PREFIX(str, "node:") || STR_SKIP_PREFIX(str, "nodes:")) { + is_nodes = true; + } + + if (strcmp(str, "all") == 0) { + /* We do not accept "^all" or "^nodes:all" */ + if (is_not) { + fprintf(stderr, "Can''t combine \"^\" and \"all\".\n"); + rc = ERROR_INVAL; + } else + libxl_bitmap_set_any(cpumap); + goto out; + } + + rc = parse_range(str, &ida, &idb); + if (rc) { + fprintf(stderr, "Invalid pcpu range: %s.\n", str); + goto out; + } + + /* Add or remove the specified cpus in the range */ + while (ida <= idb) { + if (is_nodes) { + /* Add/Remove all the cpus of a NUMA node */ + int i; + + rc = libxl_node_to_cpumap(ctx, ida, &node_cpumap); + if (rc) { + fprintf(stderr, "libxl_node_to_cpumap failed.\n"); + goto out; } + + /* Add/Remove all the cpus in the node cpumap */ + libxl_for_each_set_bit(i, node_cpumap) { + is_not ? libxl_bitmap_reset(cpumap, i) : + libxl_bitmap_set(cpumap, i); + } + } else { + /* Add/Remove this cpu */ + is_not ? libxl_bitmap_reset(cpumap, ida) : + libxl_bitmap_set(cpumap, ida); } - while (cpuida <= cpuidb) { - rmcpu == 0 ? 
libxl_bitmap_set(cpumap, cpuida) : - libxl_bitmap_set(&exclude_cpumap, cpuida); - cpuida++; - } + ida++; } - /* Clear all the cpus from the removal list */ - libxl_for_each_set_bit(i, exclude_cpumap) { - libxl_bitmap_reset(cpumap, i); - } + out: + libxl_bitmap_dispose(&node_cpumap); + return rc; +} -vcpp_out: - libxl_bitmap_dispose(&exclude_cpumap); +/* + * Takes a string representing a set of cpus (specified either as + * single cpus or as eintire NUMA nodes) and turns it into the + * corresponding libxl_bitmap (in cpumap). + */ +static int vcpupin_parse(char *cpu, libxl_bitmap *cpumap) +{ + char *ptr, *saveptr = NULL; + int rc = 0; + + for (ptr = strtok_r(cpu, ",", &saveptr); ptr; + ptr = strtok_r(NULL, ",", &saveptr)) { + rc = update_cpumap_range(ptr, cpumap); + if (rc) + break; + } return rc; }
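The semantics of the new syntax are easiest to see worked through on a concrete string. The following toy model (NOT the xl parser; node_mask(), the byte-sized cpumap and the 2-nodes-by-4-cpus topology are all assumptions for this example) evaluates "1,node:1,^6" step by step, the way vcpupin_parse() feeds each comma-separated token to update_cpumap_range():

```c
#include <stdint.h>

/* Assumed toy topology: 2 NUMA nodes with 4 pcpus each, so
 * node 0 = cpus 0-3 and node 1 = cpus 4-7; cpumap fits in a byte. */
static uint8_t node_mask(int node)
{
    return (uint8_t)(0x0F << (4 * node));
}

/* Evaluate "1,node:1,^6": each token adds or removes cpus in order. */
static uint8_t eval_example(void)
{
    uint8_t map = 0;

    map |= (uint8_t)(1u << 1);  /* "1":      add cpu 1       */
    map |= node_mask(1);        /* "node:1": add cpus 4-7    */
    map &= (uint8_t)~(1u << 6); /* "^6":     remove cpu 6    */

    return map;                 /* cpus 1, 4, 5 and 7 remain */
}
```

The result, cpus 1,4,5,7, matches what the xl.cfg man-page addition documents for its analogous example. Order matters: a later "^" token can remove cpus that an earlier token (or "all") added.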
Dario Faggioli
2013-Nov-22 18:57 UTC
[PATCH v4 05/15] xl: implement and enable dryrun mode for `xl vcpu-pin''
As it can be useful to see if the outcome of some complex vCPU pinning bitmap specification looks as expected. This also allow for the introduction of some automatic testing and verification for the bitmap parsing code, as it happens already in check-xl-disk-parse and check-xl-vif-parse. In particular, to make the above possible, this commit also changes the implementation of the vcpu-pin command so that, instead of always returning 0, it returns an error if the parsing fails. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> --- Changes since v2 (of original series): * fixed a typo in the changelog --- tools/libxl/xl_cmdimpl.c | 48 +++++++++++++++++++++++++++++++++------------ tools/libxl/xl_cmdtable.c | 2 +- 2 files changed, 36 insertions(+), 14 deletions(-) diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 8138cfe..7b4d058 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -4583,40 +4583,62 @@ int main_vcpulist(int argc, char **argv) return 0; } -static void vcpupin(uint32_t domid, const char *vcpu, char *cpu) +static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) { libxl_vcpuinfo *vcpuinfo; libxl_bitmap cpumap; uint32_t vcpuid; char *endptr; - int i, nb_vcpu; + int i, nb_cpu, nb_vcpu, rc = -1; + + libxl_bitmap_init(&cpumap); vcpuid = strtoul(vcpu, &endptr, 10); if (vcpu == endptr) { if (strcmp(vcpu, "all")) { fprintf(stderr, "Error: Invalid argument.\n"); - return; + goto out; } vcpuid = -1; } - if (libxl_cpu_bitmap_alloc(ctx, &cpumap, 0)) { - goto vcpupin_out; - } + if (libxl_cpu_bitmap_alloc(ctx, &cpumap, 0)) + goto out; if (vcpupin_parse(cpu, &cpumap)) - goto vcpupin_out1; + goto out; + + if (dryrun_only) { + nb_cpu = libxl_get_nr_cpus(ctx); + if (nb_cpu <= 0) { + fprintf(stderr, "libxl_get_nr_cpus failed.\n"); + goto out; + } + + fprintf(stdout, "cpumap: "); + print_bitmap(cpumap.map, nb_cpu, stdout); + fprintf(stdout, "\n"); + + if (ferror(stdout) 
|| fflush(stdout)) { + perror("stdout"); + exit(-1); + } + + rc = 0; + goto out; + } if (vcpuid != -1) { if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap) == -1) { fprintf(stderr, "Could not set affinity for vcpu `%u''.\n", vcpuid); + goto out; } } else { if (!(vcpuinfo = libxl_list_vcpu(ctx, domid, &nb_vcpu, &i))) { fprintf(stderr, "libxl_list_vcpu failed.\n"); - goto vcpupin_out1; + goto out; } for (i = 0; i < nb_vcpu; i++) { if (libxl_set_vcpuaffinity(ctx, domid, vcpuinfo[i].vcpuid, @@ -4627,10 +4649,11 @@ static void vcpupin(uint32_t domid, const char *vcpu, char *cpu) } libxl_vcpuinfo_list_free(vcpuinfo, nb_vcpu); } - vcpupin_out1: + + rc = 0; + out: libxl_bitmap_dispose(&cpumap); - vcpupin_out: - ; + return rc; } int main_vcpupin(int argc, char **argv) @@ -4641,8 +4664,7 @@ int main_vcpupin(int argc, char **argv) /* No options */ } - vcpupin(find_domain(argv[optind]), argv[optind+1] , argv[optind+2]); - return 0; + return vcpupin(find_domain(argv[optind]), argv[optind+1] , argv[optind+2]); } static void vcpuset(uint32_t domid, const char* nr_vcpus, int check_host) diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c index 7709206..ebe0220 100644 --- a/tools/libxl/xl_cmdtable.c +++ b/tools/libxl/xl_cmdtable.c @@ -211,7 +211,7 @@ struct cmd_spec cmd_table[] = { "[Domain, ...]", }, { "vcpu-pin", - &main_vcpupin, 0, 1, + &main_vcpupin, 1, 1, "Set which CPUs a VCPU can use", "<Domain> <VCPU|all> <CPUs|all>", },
Dario Faggioli
2013-Nov-22 18:57 UTC
[PATCH v4 06/15] xl: test script for the cpumap parser (for vCPU pinning)
This commit introduces "check-xl-vcpupin-parse" for helping verify and debug the (v)CPU bitmap parsing code in xl. The script runs "xl -N vcpu-pin 0 all <some strings>" repeatedly, with various input strings, and checks that the output is as expected.

This is what the script can do:

 # ./check-xl-vcpupin-parse -h
 usage: ./check-xl-vcpupin-parse [options]

 Tests various vcpu-pinning strings. If run without arguments acts as follows:
  - generates some test data and saves them in check-xl-vcpupin-parse.data;
  - tests all the generated configurations (reading them back from
    check-xl-vcpupin-parse.data).

 An example of a test vector file is provided in check-xl-vcpupin-parse.data-example.

 Options:
  -h         prints this message
  -r seed    uses seed for initializing the random number generator
             (default: the script PID)
  -s string  tries using string as a vcpu pinning configuration and
             reports whether that succeeds or not
  -o ofile   save the test data in ofile
             (default: check-xl-vcpupin-parse.data)
  -i ifile   read test data from ifile

An example test data file (generated on a host with 2 NUMA nodes and 16 CPUs) is being provided in check-xl-vcpupin-parse.data-example.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
Changes from v2 (of original series):
 * killed the `sleep 1', as requested during review;
 * allow for passing a custom random seed, and report what the actual random seed used is, as requested during review;
 * allow for testing specific pinning configuration strings, as suggested during review;
 * store the test data in a file after generating them, and read them back from there for the actual testing, as suggested during review;
 * allow for reading the test data from an existing test file instead of always generating new ones.

Changes from v1 (of original series):
 * this was not there in v1, and adding it has been requested during review.
--- tools/libxl/check-xl-vcpupin-parse | 294 +++++++++++++++++++++++ tools/libxl/check-xl-vcpupin-parse.data-example | 53 ++++ 2 files changed, 347 insertions(+) create mode 100755 tools/libxl/check-xl-vcpupin-parse create mode 100644 tools/libxl/check-xl-vcpupin-parse.data-example diff --git a/tools/libxl/check-xl-vcpupin-parse b/tools/libxl/check-xl-vcpupin-parse new file mode 100755 index 0000000..21f8421 --- /dev/null +++ b/tools/libxl/check-xl-vcpupin-parse @@ -0,0 +1,294 @@ +#!/bin/bash + +set -e + +if [ -x ./xl ] ; then + export LD_LIBRARY_PATH=.:../libxc:../xenstore: + XL=./xl +else + XL=xl +fi + +fprefix=tmp.check-xl-vcpupin-parse +outfile=check-xl-vcpupin-parse.data + +usage () { +cat <<END +usage: $0 [options] + +Tests various vcpu-pinning strings. If run without arguments acts +as follows: + - generates some test data and saves them in $outfile; + - tests all the generated configurations (reading them back from + $outfile). + +An example of a test vector file is provided in ${outfile}-example. + +Options: + -h prints this message + -r seed uses seed for initializing the rundom number generator + (default: the script PID) + -s string tries using string as a vcpu pinning configuration and + reports whether that succeeds or not + -o ofile save the test data in ofile (default: $outfile) + -i ifile read test data from ifile +END +} + +expected () { + cat >$fprefix.expected +} + +# by default, re-seed with our PID +seed=$$ +failures=0 + +# Execute one test and check the result against the provided +# rc value and output +one () { + expected_rc=$1; shift + printf "test case %s...\n" "$*" + set +e + ${XL} -N vcpu-pin 0 all "$@" </dev/null >$fprefix.actual 2>/dev/null + actual_rc=$? + if [ $actual_rc != $expected_rc ]; then + diff -u $fprefix.expected $fprefix.actual + echo >&2 "test case \`$*'' failed ($actual_rc $diff_rc)" + failures=$(( $failures + 1 )) + fi + set -e +} + +# Write an entry in the test vector file. 
Format is as follows: +# test-string*expected-rc*expected-output +write () { + printf "$1*$2*$3\n" >> $outfile +} + +complete () { + if [ "$failures" = 0 ]; then + echo all ok.; exit 0 + else + echo "$failures tests failed."; exit 1 + fi +} + +# Test a specific pinning string +string () { + expected_rc=$1; shift + printf "test case %s...\n" "$*" + set +e + ${XL} -N vcpu-pin 0 all "$@" &> /dev/null + actual_rc=$? + set -e + + if [ $actual_rc != $expected_rc ]; then + echo >&2 "test case \`$*'' failed ($actual_rc)" + else + echo >&2 "test case \`$*'' succeeded" + fi + + exit 0 +} + +# Read a test vector file (provided as $1) line by line and +# test all the entries it contains +run () +{ + while read line + do + if [ ${line:0:1} != ''#'' ]; then + test_string="`echo $line | cut -f1 -d''*''`" + exp_rc="`echo $line | cut -f2 -d''*''`" + exp_output="`echo $line | cut -f3 -d''*''`" + + expected <<END +$exp_output +END + one $exp_rc "$test_string" + fi + done < $1 + + complete + + exit 0 +} + +while getopts "hr:s:o:i:" option +do + case $option in + h) + usage + exit 0 + ;; + r) + seed=$OPTARG + ;; + s) + string 0 "$OPTARG" + ;; + o) + outfile=$OPTARG + ;; + i) + run $OPTARG + ;; + esac +done + +#---------- test data ---------- +# +nr_cpus=`xl info | grep nr_cpus | cut -f2 -d'':''` +nr_nodes=`xl info | grep nr_nodes | cut -f2 -d'':''` +nr_cpus_per_node=`xl info -n | sed ''/cpu:/,/numa_info/!d'' | head -n -1 | \ + awk ''{print $4}'' | uniq -c | tail -1 | awk ''{print $1}''` +cat >$outfile <<END +# WARNING: some of these tests are topology based tests. +# Expect failures if the topology is not detected correctly +# detected topology: $nr_cpus CPUs, $nr_nodes nodes, $nr_cpus_per_node CPUs per node. +# +# seed used for random number generation: seed=${seed}. 
+# +# Format is as follows: +# test-string*expected-return-code*expected-output +# +END + +# Re-seed the random number generator +RANDOM=$seed + +echo "# Testing a wrong configuration" >> $outfile +write foo 255 "" + +echo "# Testing the ''all'' syntax" >> $outfile +write "all" 0 "cpumap: all" +write "nodes:all" 0 "cpumap: all" +write "all,nodes:all" 0 "cpumap: all" +write "all,^nodes:0,all" 0 "cpumap: all" + +echo "# Testing the empty cpumap case" >> $outfile +write "^0" 0 "cpumap: none" + +echo "# A few attempts of pinning to just one random cpu" >> $outfile +if [ $nr_cpus -gt 1 ]; then + for i in `seq 0 3`; do + cpu=$(($RANDOM % nr_cpus)) + write "$cpu" 0 "cpumap: $cpu" + done +fi + +echo "# A few attempts of pinning to all but one random cpu" >> $outfile +if [ $nr_cpus -gt 2 ]; then + for i in `seq 0 3`; do + cpu=$(($RANDOM % nr_cpus)) + if [ $cpu -eq 0 ]; then + expected_range="1-$((nr_cpus - 1))" + elif [ $cpu -eq 1 ]; then + expected_range="0,2-$((nr_cpus - 1))" + elif [ $cpu -eq $((nr_cpus - 2)) ]; then + expected_range="0-$((cpu - 1)),$((nr_cpus - 1))" + elif [ $cpu -eq $((nr_cpus - 1)) ]; then + expected_range="0-$((nr_cpus - 2))" + else + expected_range="0-$((cpu - 1)),$((cpu + 1))-$((nr_cpus - 1))" + fi + write "all,^$cpu" 0 "cpumap: $expected_range" + done +fi + +echo "# A few attempts of pinning to a random range of cpus" >> $outfile +if [ $nr_cpus -gt 2 ]; then + for i in `seq 0 3`; do + cpua=$(($RANDOM % nr_cpus)) + range=$((nr_cpus - cpua)) + cpub=$(($RANDOM % range)) + cpubb=$((cpua + cpub)) + if [ $cpua -eq $cpubb ]; then + expected_range="$cpua" + else + expected_range="$cpua-$cpubb" + fi + write "$expected_range" 0 "cpumap: $expected_range" + done +fi + +echo "# A few attempts of pinning to just one random node" >> $outfile +if [ $nr_nodes -gt 1 ]; then + for i in `seq 0 3`; do + node=$(($RANDOM % nr_nodes)) + # this assumes that the first $nr_cpus_per_node (from cpu + # 0 to cpu $nr_cpus_per_node-1) are assigned to the first node + # (node 0), 
the second $nr_cpus_per_node (from $nr_cpus_per_node + # to 2*$nr_cpus_per_node-1) are assigned to the second node (node + # 1), etc. Expect failures if that is not the case. + write "nodes:$node" 0 "cpumap: $((nr_cpus_per_node*node))-$((nr_cpus_per_node*(node+1)-1))" + done +fi + +echo "# A few attempts of pinning to all but one random node" >> $outfile +if [ $nr_nodes -gt 1 ]; then + for i in `seq 0 3`; do + node=$(($RANDOM % nr_nodes)) + # this assumes that the first $nr_cpus_per_node (from cpu + # 0 to cpu $nr_cpus_per_node-1) are assigned to the first node + # (node 0), the second $nr_cpus_per_node (from $nr_cpus_per_node + # to 2*$nr_cpus_per_node-1) are assigned to the second node (node + # 1), etc. Expect failures if that is not the case. + if [ $node -eq 0 ]; then + expected_range="$nr_cpus_per_node-$((nr_cpus - 1))" + elif [ $node -eq $((nr_nodes - 1)) ]; then + expected_range="0-$((nr_cpus - nr_cpus_per_node - 1))" + else + expected_range="0-$((nr_cpus_per_node*node-1)),$((nr_cpus_per_node*(node+1)))-$((nr_cpus - 1))" + fi + write "all,^nodes:$node" 0 "cpumap: $expected_range" + done +fi + +echo "# A few attempts of pinning to a random range of nodes" >> $outfile +if [ $nr_nodes -gt 1 ]; then + for i in `seq 0 3`; do + nodea=$(($RANDOM % nr_nodes)) + range=$((nr_nodes - nodea)) + nodeb=$(($RANDOM % range)) + nodebb=$((nodea + nodeb)) + # this assumes that the first $nr_cpus_per_node (from cpu + # 0 to cpu $nr_cpus_per_node-1) are assigned to the first node + # (node 0), the second $nr_cpus_per_node (from $nr_cpus_per_node + # to 2*$nr_cpus_per_node-1) are assigned to the second node (node + # 1), etc. Expect failures if that is not the case.
+ if [ $nodea -eq 0 ] && [ $nodebb -eq $((nr_nodes - 1)) ]; then + expected_range="all" + else + expected_range="$((nr_cpus_per_node*nodea))-$((nr_cpus_per_node*(nodebb+1) - 1))" + fi + write "nodes:$nodea-$nodebb" 0 "cpumap: $expected_range" + done +fi + +echo "# A few attempts of pinning to a node but excluding one random cpu" >> $outfile +if [ $nr_nodes -gt 1 ]; then + for i in `seq 0 3`; do + node=$(($RANDOM % nr_nodes)) + # this assumes that the first $nr_cpus_per_node (from cpu + # 0 to cpu $nr_cpus_per_node-1) are assigned to the first node + # (node 0), the second $nr_cpus_per_node (from $nr_cpus_per_node + # to 2*$nr_cpus_per_node-1) are assigned to the second node (node + # 1), etc. Expect failures if that is not the case. + cpu=$(($RANDOM % nr_cpus_per_node + nr_cpus_per_node*node)) + if [ $cpu -eq $((nr_cpus_per_node*node)) ]; then + expected_range="$((nr_cpus_per_node*node + 1))-$((nr_cpus_per_node*(node+1) - 1))" + elif [ $cpu -eq $((nr_cpus_per_node*node + 1)) ]; then + expected_range="$((nr_cpus_per_node*node)),$((nr_cpus_per_node*node + 2))-$((nr_cpus_per_node*(node+1) - 1))" + elif [ $cpu -eq $((nr_cpus_per_node*(node+1) - 2)) ]; then + expected_range="$((nr_cpus_per_node*node))-$((nr_cpus_per_node*(node+1) - 3)),$((nr_cpus_per_node*(node+1) - 1))" + elif [ $cpu -eq $((nr_cpus_per_node*(node+1) - 1)) ]; then + expected_range="$((nr_cpus_per_node*node))-$((nr_cpus_per_node*(node+1) - 2))" + else + expected_range="$((nr_cpus_per_node*node))-$((cpu - 1)),$((cpu + 1))-$((nr_cpus_per_node*(node+1) - 1))" + fi + write "nodes:$node,^$cpu" 0 "cpumap: $expected_range" + done +fi + +run $outfile diff --git a/tools/libxl/check-xl-vcpupin-parse.data-example b/tools/libxl/check-xl-vcpupin-parse.data-example new file mode 100644 index 0000000..4bbd5de --- /dev/null +++ b/tools/libxl/check-xl-vcpupin-parse.data-example @@ -0,0 +1,53 @@ +# WARNING: some of these tests are topology based tests. 
+# Expect failures if the topology is not detected correctly +# detected topology: 16 CPUs, 2 nodes, 8 CPUs per node. +# +# seed used for random number generation: seed=13328. +# +# Format is as follows: +# test-string*expected-return-code*expected-output +# +# Testing a wrong configuration +foo*255* +# Testing the ''all'' syntax +all*0*cpumap: all +nodes:all*0*cpumap: all +all,nodes:all*0*cpumap: all +all,^nodes:0,all*0*cpumap: all +# Testing the empty cpumap case +^0*0*cpumap: none +# A few attempts of pinning to just one random cpu +0*0*cpumap: 0 +9*0*cpumap: 9 +6*0*cpumap: 6 +0*0*cpumap: 0 +# A few attempts of pinning to all but one random cpu +all,^12*0*cpumap: 0-11,13-15 +all,^6*0*cpumap: 0-5,7-15 +all,^3*0*cpumap: 0-2,4-15 +all,^7*0*cpumap: 0-6,8-15 +# A few attempts of pinning to a random range of cpus +13-15*0*cpumap: 13-15 +7*0*cpumap: 7 +3-5*0*cpumap: 3-5 +8-11*0*cpumap: 8-11 +# A few attempts of pinning to just one random node +nodes:1*0*cpumap: 8-15 +nodes:0*0*cpumap: 0-7 +nodes:0*0*cpumap: 0-7 +nodes:0*0*cpumap: 0-7 +# A few attempts of pinning to all but one random node +all,^nodes:0*0*cpumap: 8-15 +all,^nodes:1*0*cpumap: 0-7 +all,^nodes:1*0*cpumap: 0-7 +all,^nodes:0*0*cpumap: 8-15 +# A few attempts of pinning to a random range of nodes +nodes:1-1*0*cpumap: 8-15 +nodes:1-1*0*cpumap: 8-15 +nodes:0-1*0*cpumap: all +nodes:0-0*0*cpumap: 0-7 +# A few attempts of pinning to a node but excluding one random cpu +nodes:1,^8*0*cpumap: 9-15 +nodes:0,^6*0*cpumap: 0-5,7 +nodes:1,^9*0*cpumap: 8,10-15 +nodes:0,^5*0*cpumap: 0-4,6-7
Dario Faggioli
2013-Nov-22 18:57 UTC
[PATCH v4 07/15] xen: sched: rename v->cpu_affinity into v->cpu_hard_affinity
in order to distinguish it from the cpu_soft_affinity which will be introduced in a later commit ("xen: sched: introduce soft-affinity and use it instead d->node-affinity"). This patch does not imply any functional change, it is basically the result of something like the following: s/cpu_affinity/cpu_hard_affinity/g s/cpu_affinity_tmp/cpu_hard_affinity_tmp/g s/cpu_affinity_saved/cpu_hard_affinity_saved/g Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> --- Changes from v2: * patch has been moved one step up in the series. --- xen/arch/x86/traps.c | 11 ++++++----- xen/common/domain.c | 22 +++++++++++----------- xen/common/domctl.c | 2 +- xen/common/keyhandler.c | 2 +- xen/common/sched_credit.c | 12 ++++++------ xen/common/sched_sedf.c | 2 +- xen/common/schedule.c | 21 +++++++++++---------- xen/common/wait.c | 4 ++-- xen/include/xen/sched.h | 8 ++++---- 9 files changed, 43 insertions(+), 41 deletions(-) diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index d8b3eac..157031e 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -3093,7 +3093,8 @@ static void nmi_mce_softirq(void) /* Set the tmp value unconditionally, so that * the check in the iret hypercall works. */ - cpumask_copy(st->vcpu->cpu_affinity_tmp, st->vcpu->cpu_affinity); + cpumask_copy(st->vcpu->cpu_hard_affinity_tmp, + st->vcpu->cpu_hard_affinity); if ((cpu != st->processor) || (st->processor != st->vcpu->processor)) @@ -3128,11 +3129,11 @@ void async_exception_cleanup(struct vcpu *curr) return; /* Restore affinity.
*/ - if ( !cpumask_empty(curr->cpu_affinity_tmp) && - !cpumask_equal(curr->cpu_affinity_tmp, curr->cpu_affinity) ) + if ( !cpumask_empty(curr->cpu_hard_affinity_tmp) && + !cpumask_equal(curr->cpu_hard_affinity_tmp, curr->cpu_hard_affinity) ) { - vcpu_set_affinity(curr, curr->cpu_affinity_tmp); - cpumask_clear(curr->cpu_affinity_tmp); + vcpu_set_affinity(curr, curr->cpu_hard_affinity_tmp); + cpumask_clear(curr->cpu_hard_affinity_tmp); } if ( !(curr->async_exception_mask & (curr->async_exception_mask - 1)) ) diff --git a/xen/common/domain.c b/xen/common/domain.c index 2cbc489..d8116c7 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -125,9 +125,9 @@ struct vcpu *alloc_vcpu( tasklet_init(&v->continue_hypercall_tasklet, NULL, 0); - if ( !zalloc_cpumask_var(&v->cpu_affinity) || - !zalloc_cpumask_var(&v->cpu_affinity_tmp) || - !zalloc_cpumask_var(&v->cpu_affinity_saved) || + if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) || + !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) || + !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) || !zalloc_cpumask_var(&v->vcpu_dirty_cpumask) ) goto fail_free; @@ -156,9 +156,9 @@ struct vcpu *alloc_vcpu( fail_wq: destroy_waitqueue_vcpu(v); fail_free: - free_cpumask_var(v->cpu_affinity); - free_cpumask_var(v->cpu_affinity_tmp); - free_cpumask_var(v->cpu_affinity_saved); + free_cpumask_var(v->cpu_hard_affinity); + free_cpumask_var(v->cpu_hard_affinity_tmp); + free_cpumask_var(v->cpu_hard_affinity_saved); free_cpumask_var(v->vcpu_dirty_cpumask); free_vcpu_struct(v); return NULL; @@ -371,7 +371,7 @@ void domain_update_node_affinity(struct domain *d) for_each_vcpu ( d, v ) { - cpumask_and(online_affinity, v->cpu_affinity, online); + cpumask_and(online_affinity, v->cpu_hard_affinity, online); cpumask_or(cpumask, cpumask, online_affinity); } @@ -734,9 +734,9 @@ static void complete_domain_destroy(struct rcu_head *head) for ( i = d->max_vcpus - 1; i >= 0; i-- ) if ( (v = d->vcpu[i]) != NULL ) { - free_cpumask_var(v->cpu_affinity); - 
free_cpumask_var(v->cpu_affinity_tmp); - free_cpumask_var(v->cpu_affinity_saved); + free_cpumask_var(v->cpu_hard_affinity); + free_cpumask_var(v->cpu_hard_affinity_tmp); + free_cpumask_var(v->cpu_hard_affinity_saved); free_cpumask_var(v->vcpu_dirty_cpumask); free_vcpu_struct(v); } @@ -875,7 +875,7 @@ int vcpu_reset(struct vcpu *v) v->async_exception_mask = 0; memset(v->async_exception_state, 0, sizeof(v->async_exception_state)); #endif - cpumask_clear(v->cpu_affinity_tmp); + cpumask_clear(v->cpu_hard_affinity_tmp); clear_bit(_VPF_blocked, &v->pause_flags); clear_bit(_VPF_in_reset, &v->pause_flags); diff --git a/xen/common/domctl.c b/xen/common/domctl.c index 904d27b..5e0ac5c 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -629,7 +629,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) else { ret = cpumask_to_xenctl_bitmap( - &op->u.vcpuaffinity.cpumap, v->cpu_affinity); + &op->u.vcpuaffinity.cpumap, v->cpu_hard_affinity); } } break; diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c index 8e4b3f8..c11f577 100644 --- a/xen/common/keyhandler.c +++ b/xen/common/keyhandler.c @@ -296,7 +296,7 @@ static void dump_domains(unsigned char key) !vcpu_event_delivery_is_enabled(v)); cpuset_print(tmpstr, sizeof(tmpstr), v->vcpu_dirty_cpumask); printk("dirty_cpus=%s ", tmpstr); - cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_affinity); + cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_hard_affinity); printk("cpu_affinity=%s\n", tmpstr); printk(" pause_count=%d pause_flags=%lx\n", atomic_read(&v->pause_count), v->pause_flags); diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index db5512e..c6a2560 100644 --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -332,13 +332,13 @@ csched_balance_cpumask(const struct vcpu *vc, int step, cpumask_t *mask) if ( step == CSCHED_BALANCE_NODE_AFFINITY ) { cpumask_and(mask, CSCHED_DOM(vc->domain)->node_affinity_cpumask, - vc->cpu_affinity); + vc->cpu_hard_affinity); if ( 
unlikely(cpumask_empty(mask)) ) - cpumask_copy(mask, vc->cpu_affinity); + cpumask_copy(mask, vc->cpu_hard_affinity); } else /* step == CSCHED_BALANCE_CPU_AFFINITY */ - cpumask_copy(mask, vc->cpu_affinity); + cpumask_copy(mask, vc->cpu_hard_affinity); } static void burn_credits(struct csched_vcpu *svc, s_time_t now) @@ -407,7 +407,7 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new) if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY && !__vcpu_has_node_affinity(new->vcpu, - new->vcpu->cpu_affinity) ) + new->vcpu->cpu_hard_affinity) ) continue; /* Are there idlers suitable for new (for this balance step)? */ @@ -642,7 +642,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit) /* Store in cpus the mask of online cpus on which the domain can run */ online = cpupool_scheduler_cpumask(vc->domain->cpupool); - cpumask_and(&cpus, vc->cpu_affinity, online); + cpumask_and(&cpus, vc->cpu_hard_affinity, online); for_each_csched_balance_step( balance_step ) { @@ -1498,7 +1498,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step) * or counter. 
*/ if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY - && !__vcpu_has_node_affinity(vc, vc->cpu_affinity) ) + && !__vcpu_has_node_affinity(vc, vc->cpu_hard_affinity) ) continue; csched_balance_cpumask(vc, balance_step, csched_balance_mask); diff --git a/xen/common/sched_sedf.c b/xen/common/sched_sedf.c index 7c24171..c219aed 100644 --- a/xen/common/sched_sedf.c +++ b/xen/common/sched_sedf.c @@ -396,7 +396,7 @@ static int sedf_pick_cpu(const struct scheduler *ops, struct vcpu *v) cpumask_t *online; online = cpupool_scheduler_cpumask(v->domain->cpupool); - cpumask_and(&online_affinity, v->cpu_affinity, online); + cpumask_and(&online_affinity, v->cpu_hard_affinity, online); return cpumask_cycle(v->vcpu_id % cpumask_weight(&online_affinity) - 1, &online_affinity); } diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 0f45f07..c4236c5 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -194,9 +194,9 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor) */ v->processor = processor; if ( is_idle_domain(d) || d->is_pinned ) - cpumask_copy(v->cpu_affinity, cpumask_of(processor)); + cpumask_copy(v->cpu_hard_affinity, cpumask_of(processor)); else - cpumask_setall(v->cpu_affinity); + cpumask_setall(v->cpu_hard_affinity); /* Initialise the per-vcpu timers. 
*/ init_timer(&v->periodic_timer, vcpu_periodic_timer_fn, @@ -285,7 +285,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c) migrate_timer(&v->singleshot_timer, new_p); migrate_timer(&v->poll_timer, new_p); - cpumask_setall(v->cpu_affinity); + cpumask_setall(v->cpu_hard_affinity); lock = vcpu_schedule_lock_irq(v); v->processor = new_p; @@ -457,7 +457,7 @@ static void vcpu_migrate(struct vcpu *v) */ if ( pick_called && (new_lock == per_cpu(schedule_data, new_cpu).schedule_lock) && - cpumask_test_cpu(new_cpu, v->cpu_affinity) && + cpumask_test_cpu(new_cpu, v->cpu_hard_affinity) && cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) ) break; @@ -561,7 +561,7 @@ void restore_vcpu_affinity(struct domain *d) { printk(XENLOG_DEBUG "Restoring affinity for d%dv%d\n", d->domain_id, v->vcpu_id); - cpumask_copy(v->cpu_affinity, v->cpu_affinity_saved); + cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved); v->affinity_broken = 0; } @@ -604,20 +604,21 @@ int cpu_disable_scheduler(unsigned int cpu) unsigned long flags; spinlock_t *lock = vcpu_schedule_lock_irqsave(v, &flags); - cpumask_and(&online_affinity, v->cpu_affinity, c->cpu_valid); + cpumask_and(&online_affinity, v->cpu_hard_affinity, c->cpu_valid); if ( cpumask_empty(&online_affinity) && - cpumask_test_cpu(cpu, v->cpu_affinity) ) + cpumask_test_cpu(cpu, v->cpu_hard_affinity) ) { printk(XENLOG_DEBUG "Breaking affinity for d%dv%d\n", d->domain_id, v->vcpu_id); if (system_state == SYS_STATE_suspend) { - cpumask_copy(v->cpu_affinity_saved, v->cpu_affinity); + cpumask_copy(v->cpu_hard_affinity_saved, + v->cpu_hard_affinity); v->affinity_broken = 1; } - cpumask_setall(v->cpu_affinity); + cpumask_setall(v->cpu_hard_affinity); } if ( v->processor == cpu ) @@ -665,7 +666,7 @@ int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity) lock = vcpu_schedule_lock_irq(v); - cpumask_copy(v->cpu_affinity, affinity); + cpumask_copy(v->cpu_hard_affinity, affinity); /* Always ask the scheduler to 
re-evaluate placement * when changing the affinity */ diff --git a/xen/common/wait.c b/xen/common/wait.c index 3c9366c..3f6ff41 100644 --- a/xen/common/wait.c +++ b/xen/common/wait.c @@ -134,7 +134,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv) /* Save current VCPU affinity; force wakeup on *this* CPU only. */ wqv->wakeup_cpu = smp_processor_id(); - cpumask_copy(&wqv->saved_affinity, curr->cpu_affinity); + cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) { gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); @@ -183,7 +183,7 @@ void check_wakeup_from_wait(void) { /* Re-set VCPU affinity and re-enter the scheduler. */ struct vcpu *curr = current; - cpumask_copy(&wqv->saved_affinity, curr->cpu_affinity); + cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) { gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index cbdf377..40e5927 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -192,11 +192,11 @@ struct vcpu spinlock_t virq_lock; /* Bitmask of CPUs on which this VCPU may run. */ - cpumask_var_t cpu_affinity; + cpumask_var_t cpu_hard_affinity; /* Used to change affinity temporarily. */ - cpumask_var_t cpu_affinity_tmp; + cpumask_var_t cpu_hard_affinity_tmp; /* Used to restore affinity across S3. */ - cpumask_var_t cpu_affinity_saved; + cpumask_var_t cpu_hard_affinity_saved; /* Bitmask of CPUs which are holding onto this VCPU''s state. 
*/ cpumask_var_t vcpu_dirty_cpumask; @@ -792,7 +792,7 @@ void watchdog_domain_destroy(struct domain *d); #define has_hvm_container_domain(d) ((d)->guest_type != guest_type_pv) #define has_hvm_container_vcpu(v) (has_hvm_container_domain((v)->domain)) #define is_pinned_vcpu(v) ((v)->domain->is_pinned || \ - cpumask_weight((v)->cpu_affinity) == 1) + cpumask_weight((v)->cpu_hard_affinity) == 1) #ifdef HAS_PASSTHROUGH #define need_iommu(d) ((d)->need_iommu) #else
Dario Faggioli
2013-Nov-22 18:57 UTC
[PATCH v4 08/15] xen: sched: introduce soft-affinity and use it instead d->node-affinity
Before this change, each vcpu had its own vcpu-affinity (in v->cpu_affinity), representing the set of pcpus where the vcpu is allowed to run. Since NUMA-aware scheduling was introduced, the (credit1 only, for now) scheduler also tries as much as it can to run all the vcpus of a domain on one of the nodes that constitute the domain's node-affinity.

The idea here is making the mechanism more general by:
 * allowing for this 'preference' for some pcpus/nodes to be expressed on a per-vcpu basis, rather than for the domain as a whole. That is to say, each vcpu should have its own set of preferred pcpus/nodes, instead of it being the very same for all the vcpus of the domain;
 * generalizing the idea of 'preferred pcpus' beyond NUMA awareness and support. That is to say, independently of whether or not it is (mostly) useful on NUMA systems, it should be possible to specify, for each vcpu, a set of pcpus where it prefers to run (in addition to, and possibly unrelated to, the set of pcpus where it is allowed to run).

We will be calling this set of *preferred* pcpus the vcpu's soft affinity, and this change introduces it and starts using it for scheduling, replacing the indirect use of the domain's NUMA node-affinity. This is more general, as soft affinity does not have to be related to NUMA. Nevertheless, it allows achieving the same results as NUMA-aware scheduling, just by making soft affinity equal to the domain's node-affinity, for all the vCPUs (e.g., from the toolstack).

This also means renaming most of the NUMA-aware scheduling related functions, in credit1, to something more generic, hinting toward the concept of soft affinity rather than directly to NUMA awareness. As a side effect, this simplifies the code quite a bit. In fact, prior to this change, we needed to cache the translation of d->node_affinity (which is a nodemask_t) to a cpumask_t, since that is what scheduling decisions require (we used to keep it in node_affinity_cpumask).
This, and all the complicated logic required to keep it updated, is not necessary any longer. The high level description of NUMA placement and scheduling in docs/misc/xl-numa-placement.markdown is being updated too, to match the new architecture. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> --- Changes from v2: * this patch folds patches 6 ("xen: sched: make space for cpu_soft_affinity") and 10 ("xen: sched: use soft-affinity instead of domain''s node-affinity"), as suggested during review. ''Reviewed-by'' from George is there since both patch 6 and 10 had it, and I didn''t do anything else than squashing them. Changes from v1: * in v1, "7/12 xen: numa-sched: use per-vcpu node-affinity for actual scheduling" was doing something very similar to this patch. --- docs/misc/xl-numa-placement.markdown | 148 ++++++++++++++++++++++----------- xen/common/domain.c | 5 + xen/common/keyhandler.c | 2 xen/common/sched_credit.c | 153 +++++++++++++--------------------- xen/common/schedule.c | 3 + xen/include/xen/sched.h | 3 + 6 files changed, 168 insertions(+), 146 deletions(-) diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown index caa3fec..b1ed361 100644 --- a/docs/misc/xl-numa-placement.markdown +++ b/docs/misc/xl-numa-placement.markdown @@ -12,13 +12,6 @@ is quite more complex and slow. On these machines, a NUMA node is usually defined as a set of processor cores (typically a physical CPU package) and the memory directly attached to the set of cores. -The Xen hypervisor deals with NUMA machines by assigning to each domain -a "node affinity", i.e., a set of NUMA nodes of the host from which they -get their memory allocated. Also, even if the node affinity of a domain -is allowed to change on-line, it is very important to "place" the domain -correctly when it is fist created, as the most of its memory is allocated -at that time and can not (for now) be moved easily. 
- NUMA awareness becomes very important as soon as many domains start running memory-intensive workloads on a shared host. In fact, the cost of accessing non node-local memory locations is very high, and the @@ -27,14 +20,37 @@ performance degradation is likely to be noticeable. For more information, have a look at the [Xen NUMA Introduction][numa_intro] page on the Wiki. +## Xen and NUMA machines: the concept of _node-affinity_ ## + +The Xen hypervisor deals with NUMA machines through the concept of +_node-affinity_. The node-affinity of a domain is the set of NUMA nodes +of the host where the memory for the domain is being allocated (mostly, +at domain creation time). This is, at least in principle, different from +and unrelated to the vCPU (hard and soft, see below) scheduling affinity, +which instead is the set of pCPUs where the vCPU is allowed (or prefers) +to run. + +Of course, despite the fact that they belong to and affect different +subsystems, the domain node-affinity and the vCPUs' affinity are not +completely independent. +In fact, if the domain node-affinity is not explicitly specified by the +user, via the proper libxl calls or xl config item, it will be computed +based on the vCPUs' scheduling affinity. + +Notice that, even if the node affinity of a domain may change on-line, +it is very important to "place" the domain correctly when it is first +created, as most of its memory is allocated at that time and can +not (for now) be moved easily. + ### Placing via pinning and cpupools ### -The simplest way of placing a domain on a NUMA node is statically pinning -the domain's vCPUs to the pCPUs of the node. This goes under the name of -CPU affinity and can be set through the "cpus=" option in the config file -(more about this below). Another option is to pool together the pCPUs -spanning the node and put the domain in such a cpupool with the "pool=" -config option (as documented in our [Wiki][cpupools_howto]).
+The simplest way of placing a domain on a NUMA node is setting the hard +scheduling affinity of the domain''s vCPUs to the pCPUs of the node. This +also goes under the name of vCPU pinning, and can be done through the +"cpus=" option in the config file (more about this below). Another option +is to pool together the pCPUs spanning the node and put the domain in +such a _cpupool_ with the "pool=" config option (as documented in our +[Wiki][cpupools_howto]). In both the above cases, the domain will not be able to execute outside the specified set of pCPUs for any reasons, even if all those pCPUs are @@ -45,24 +61,45 @@ may come at he cost of some load imbalances. ### NUMA aware scheduling ### -If the credit scheduler is in use, the concept of node affinity defined -above does not only apply to memory. In fact, starting from Xen 4.3, the -scheduler always tries to run the domain''s vCPUs on one of the nodes in -its node affinity. Only if that turns out to be impossible, it will just -pick any free pCPU. - -This is, therefore, something more flexible than CPU affinity, as a domain -can still run everywhere, it just prefers some nodes rather than others. -Locality of access is less guaranteed than in the pinning case, but that -comes along with better chances to exploit all the host resources (e.g., -the pCPUs). - -In fact, if all the pCPUs in a domain''s node affinity are busy, it is -possible for the domain to run outside of there, but it is very likely that -slower execution (due to remote memory accesses) is still better than no -execution at all, as it would happen with pinning. For this reason, NUMA -aware scheduling has the potential of bringing substantial performances -benefits, although this will depend on the workload. +If using the credit1 scheduler, and starting from Xen 4.3, the scheduler +itself always tries to run the domain''s vCPUs on one of the nodes in +its node-affinity. Only if that turns out to be impossible, it will just +pick any free pCPU. 
Locality of access is less guaranteed than in the +pinning case, but that comes along with better chances to exploit all +the host resources (e.g., the pCPUs). + +Starting from Xen 4.4, credit1 supports two forms of affinity: hard and +soft, both on a per-vCPU basis. This means each vCPU can have its own +soft affinity, stating where such a vCPU prefers to execute. This is +less strict than what is (also starting from 4.4) called hard affinity, +as the vCPU can potentially run everywhere, it just prefers some pCPUs +rather than others. +In Xen 4.4, therefore, NUMA-aware scheduling is achieved by matching the +soft affinity of the vCPUs of a domain with its node-affinity. + +In fact, as it was for 4.3, if all the pCPUs in a vCPU's soft affinity +are busy, it is possible for the domain to run outside of there. The +idea is that slower execution (due to remote memory accesses) is still +better than no execution at all (as it would happen with pinning). For +this reason, NUMA aware scheduling has the potential of bringing +substantial performance benefits, although this will depend on the +workload. + +Notice that, for each vCPU, the following three scenarios are possible: + + * a vCPU *is pinned* to some pCPUs and *does not have* any soft affinity. + In this case, the vCPU is always scheduled on one of the pCPUs to which + it is pinned, without any specific preference among them; + * a vCPU *has* its own soft affinity and *is not* pinned to any particular + pCPU. In this case, the vCPU can run on every pCPU. Nevertheless, the + scheduler will try to have it running on one of the pCPUs in its soft + affinity; + * a vCPU *has* its own soft affinity and *is also* pinned to some + pCPUs. In this case, the vCPU is always scheduled on one of the pCPUs + onto which it is pinned, with, among them, a preference for the ones + that also form its soft affinity.
In case pinning and soft affinity + form two disjoint sets of pCPUs, pinning "wins", and the soft affinity + is just ignored. ## Guest placement in xl ## @@ -71,25 +108,23 @@ both manual or automatic placement of them across the host's NUMA nodes. Note that xm/xend does a very similar thing, the only differences being the details of the heuristics adopted for automatic placement (see below), -and the lack of support (in both xm/xend and the Xen versions where that\ +and the lack of support (in both xm/xend and the Xen versions where that was the default toolstack) for NUMA aware scheduling. ### Placing the guest manually ### Thanks to the "cpus=" option, it is possible to specify where a domain should be created and scheduled on, directly in its config file. This -affects NUMA placement and memory accesses as the hypervisor constructs -the node affinity of a VM basing right on its CPU affinity when it is -created. +affects NUMA placement and memory accesses as, in this case, the +hypervisor constructs the node-affinity of a VM based right on its +vCPU pinning when it is created. This is very simple and effective, but requires the user/system -administrator to explicitly specify affinities for each and every domain, +administrator to explicitly specify the pinning for each and every domain, or Xen won't be able to guarantee the locality for their memory accesses. -Notice that this also pins the domain's vCPUs to the specified set of -pCPUs, so it not only sets the domain's node affinity (its memory will -come from the nodes to which the pCPUs belong), but at the same time -forces the vCPUs of the domain to be scheduled on those same pCPUs. +That, of course, also means the vCPUs of the domain will only be able to +execute on those same pCPUs. ### Placing the guest automatically ### @@ -97,7 +132,9 @@ If no "cpus=" option is specified in the config file, libxl tries to figure out on its own on which node(s) the domain could fit best.
If it finds one (some), the domain''s node affinity get set to there, and both memory allocations and NUMA aware scheduling (for the credit -scheduler and starting from Xen 4.3) will comply with it. +scheduler and starting from Xen 4.3) will comply with it. Starting from +Xen 4.4, this also means that the mask resulting from this "fitting" +procedure will become the soft affinity of all the vCPUs of the domain. It is worthwhile noting that optimally fitting a set of VMs on the NUMA nodes of an host is an incarnation of the Bin Packing Problem. In fact, @@ -142,34 +179,43 @@ any placement from happening: libxl_defbool_set(&domain_build_info->numa_placement, false); -Also, if `numa_placement` is set to `true`, the domain must not -have any CPU affinity (i.e., `domain_build_info->cpumap` must -have all its bits set, as it is by default), or domain creation -will fail returning `ERROR_INVAL`. +Also, if `numa_placement` is set to `true`, the domain''s vCPUs must +not be pinned (i.e., `domain_build_info->cpumap` must have all its +bits set, as it is by default), or domain creation will fail with +`ERROR_INVAL`. Starting from Xen 4.3, in case automatic placement happens (and is -successful), it will affect the domain''s node affinity and _not_ its -CPU affinity. Namely, the domain''s vCPUs will not be pinned to any +successful), it will affect the domain''s node-affinity and _not_ its +vCPU pinning. Namely, the domain''s vCPUs will not be pinned to any pCPU on the host, but the memory from the domain will come from the selected node(s) and the NUMA aware scheduling (if the credit scheduler -is in use) will try to keep the domain there as much as possible. +is in use) will try to keep the domain''s vCPUs there as much as possible. Besides than that, looking and/or tweaking the placement algorithm search "Automatic NUMA placement" in libxl\_internal.h. Note this may change in future versions of Xen/libxl. 
+## Xen < 4.4 ## + +The concept of vCPU soft affinity has been introduced for the first time +in Xen 4.4. In 4.3, it is the domain''s node-affinity that drives the +NUMA-aware scheduler. The main difference is soft affinity is per-vCPU, +and so each vCPU can have its own mask of pCPUs, while node-affinity is +per-domain, that is the equivalent of having all the vCPUs with the same +soft affinity. + ## Xen < 4.3 ## As NUMA aware scheduling is a new feature of Xen 4.3, things are a little bit different for earlier version of Xen. If no "cpus=" option is specified and Xen 4.2 is in use, the automatic placement algorithm still runs, but the results is used to _pin_ the vCPUs of the domain to the output node(s). -This is consistent with what was happening with xm/xend, which were also -affecting the domain''s CPU affinity. +This is consistent with what was happening with xm/xend. On a version of Xen earlier than 4.2, there is not automatic placement at -all in xl or libxl, and hence no node or CPU affinity being affected. +all in xl or libxl, and hence no node-affinity, vCPU affinity or pinning +being introduced/modified. 
## Limitations ## diff --git a/xen/common/domain.c b/xen/common/domain.c index d8116c7..d6ac4d1 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -128,6 +128,7 @@ struct vcpu *alloc_vcpu( if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) || !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) || !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) || + !zalloc_cpumask_var(&v->cpu_soft_affinity) || !zalloc_cpumask_var(&v->vcpu_dirty_cpumask) ) goto fail_free; @@ -159,6 +160,7 @@ struct vcpu *alloc_vcpu( free_cpumask_var(v->cpu_hard_affinity); free_cpumask_var(v->cpu_hard_affinity_tmp); free_cpumask_var(v->cpu_hard_affinity_saved); + free_cpumask_var(v->cpu_soft_affinity); free_cpumask_var(v->vcpu_dirty_cpumask); free_vcpu_struct(v); return NULL; @@ -390,8 +392,6 @@ void domain_update_node_affinity(struct domain *d) node_set(node, d->node_affinity); } - sched_set_node_affinity(d, &d->node_affinity); - spin_unlock(&d->node_affinity_lock); free_cpumask_var(online_affinity); @@ -737,6 +737,7 @@ static void complete_domain_destroy(struct rcu_head *head) free_cpumask_var(v->cpu_hard_affinity); free_cpumask_var(v->cpu_hard_affinity_tmp); free_cpumask_var(v->cpu_hard_affinity_saved); + free_cpumask_var(v->cpu_soft_affinity); free_cpumask_var(v->vcpu_dirty_cpumask); free_vcpu_struct(v); } diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c index c11f577..42fb418 100644 --- a/xen/common/keyhandler.c +++ b/xen/common/keyhandler.c @@ -298,6 +298,8 @@ static void dump_domains(unsigned char key) printk("dirty_cpus=%s ", tmpstr); cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_hard_affinity); printk("cpu_affinity=%s\n", tmpstr); + cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_soft_affinity); + printk("cpu_soft_affinity=%s\n", tmpstr); printk(" pause_count=%d pause_flags=%lx\n", atomic_read(&v->pause_count), v->pause_flags); arch_dump_vcpu_info(v); diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index c6a2560..8b02b7b 100644 --- a/xen/common/sched_credit.c 
+++ b/xen/common/sched_credit.c @@ -112,10 +112,24 @@ /* - * Node Balancing + * Hard and soft affinity load balancing. + * + * Idea is each vcpu has some pcpus that it prefers, some that it does not + * prefer but is OK with, and some that it cannot run on at all. The first + * set of pcpus are the ones that are both in the soft affinity *and* in the + * hard affinity; the second set of pcpus are the ones that are in the hard + * affinity but *not* in the soft affinity; the third set of pcpus are the + * ones that are not in the hard affinity. + * + * We implement a two step balancing logic. Basically, every time there is + * the need to decide where to run a vcpu, we first check the soft affinity + * (well, actually, the && between soft and hard affinity), to see if we can + * send it where it prefers to (and can) run on. However, if the first step + * does not find any suitable and free pcpu, we fall back checking the hard + * affinity. */ -#define CSCHED_BALANCE_NODE_AFFINITY 0 -#define CSCHED_BALANCE_CPU_AFFINITY 1 +#define CSCHED_BALANCE_SOFT_AFFINITY 0 +#define CSCHED_BALANCE_HARD_AFFINITY 1 /* * Boot parameters @@ -138,7 +152,7 @@ struct csched_pcpu { /* * Convenience macro for accessing the per-PCPU cpumask we need for - * implementing the two steps (vcpu and node affinity) balancing logic. + * implementing the two steps (soft and hard affinity) balancing logic. * It is stored in csched_pcpu so that serialization is not an issue, * as there is a csched_pcpu for each PCPU and we always hold the * runqueue spin-lock when using this. @@ -178,9 +192,6 @@ struct csched_dom { struct list_head active_vcpu; struct list_head active_sdom_elem; struct domain *dom; - /* cpumask translated from the domain''s node-affinity. - * Basically, the CPUs we prefer to be scheduled on. 
*/ - cpumask_var_t node_affinity_cpumask; uint16_t active_vcpu_count; uint16_t weight; uint16_t cap; @@ -261,59 +272,28 @@ __runq_remove(struct csched_vcpu *svc) list_del_init(&svc->runq_elem); } -/* - * Translates node-affinity mask into a cpumask, so that we can use it during - * actual scheduling. That of course will contain all the cpus from all the - * set nodes in the original node-affinity mask. - * - * Note that any serialization needed to access mask safely is complete - * responsibility of the caller of this function/hook. - */ -static void csched_set_node_affinity( - const struct scheduler *ops, - struct domain *d, - nodemask_t *mask) -{ - struct csched_dom *sdom; - int node; - - /* Skip idle domain since it doesn''t even have a node_affinity_cpumask */ - if ( unlikely(is_idle_domain(d)) ) - return; - - sdom = CSCHED_DOM(d); - cpumask_clear(sdom->node_affinity_cpumask); - for_each_node_mask( node, *mask ) - cpumask_or(sdom->node_affinity_cpumask, sdom->node_affinity_cpumask, - &node_to_cpumask(node)); -} #define for_each_csched_balance_step(step) \ - for ( (step) = 0; (step) <= CSCHED_BALANCE_CPU_AFFINITY; (step)++ ) + for ( (step) = 0; (step) <= CSCHED_BALANCE_HARD_AFFINITY; (step)++ ) /* - * vcpu-affinity balancing is always necessary and must never be skipped. - * OTOH, if a domain''s node-affinity is said to be automatically computed - * (or if it just spans all the nodes), we can safely avoid dealing with - * node-affinity entirely. + * Hard affinity balancing is always necessary and must never be skipped. + * OTOH, if the vcpu''s soft affinity is full (it spans all the possible + * pcpus) we can safely avoid dealing with it entirely. 
* - * Node-affinity is also deemed meaningless in case it has empty - * intersection with mask, to cover the cases where using the node-affinity + * A vcpu''s soft affinity is also deemed meaningless in case it has empty + * intersection with mask, to cover the cases where using the soft affinity * mask seems legit, but would instead led to trying to schedule the vcpu * on _no_ pcpu! Typical use cases are for mask to be equal to the vcpu''s - * vcpu-affinity, or to the && of vcpu-affinity and the set of online cpus + * hard affinity, or to the && of hard affinity and the set of online cpus * in the domain''s cpupool. */ -static inline int __vcpu_has_node_affinity(const struct vcpu *vc, +static inline int __vcpu_has_soft_affinity(const struct vcpu *vc, const cpumask_t *mask) { - const struct domain *d = vc->domain; - const struct csched_dom *sdom = CSCHED_DOM(d); - - if ( d->auto_node_affinity - || cpumask_full(sdom->node_affinity_cpumask) - || !cpumask_intersects(sdom->node_affinity_cpumask, mask) ) + if ( cpumask_full(vc->cpu_soft_affinity) + || !cpumask_intersects(vc->cpu_soft_affinity, mask) ) return 0; return 1; @@ -321,23 +301,22 @@ static inline int __vcpu_has_node_affinity(const struct vcpu *vc, /* * Each csched-balance step uses its own cpumask. This function determines - * which one (given the step) and copies it in mask. For the node-affinity - * balancing step, the pcpus that are not part of vc''s vcpu-affinity are + * which one (given the step) and copies it in mask. For the soft affinity + * balancing step, the pcpus that are not part of vc''s hard affinity are * filtered out from the result, to avoid running a vcpu where it would * like, but is not allowed to! 
*/ static void csched_balance_cpumask(const struct vcpu *vc, int step, cpumask_t *mask) { - if ( step == CSCHED_BALANCE_NODE_AFFINITY ) + if ( step == CSCHED_BALANCE_SOFT_AFFINITY ) { - cpumask_and(mask, CSCHED_DOM(vc->domain)->node_affinity_cpumask, - vc->cpu_hard_affinity); + cpumask_and(mask, vc->cpu_soft_affinity, vc->cpu_hard_affinity); if ( unlikely(cpumask_empty(mask)) ) cpumask_copy(mask, vc->cpu_hard_affinity); } - else /* step == CSCHED_BALANCE_CPU_AFFINITY */ + else /* step == CSCHED_BALANCE_HARD_AFFINITY */ cpumask_copy(mask, vc->cpu_hard_affinity); } @@ -398,15 +377,15 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new) else if ( !idlers_empty ) { /* - * Node and vcpu-affinity balancing loop. For vcpus without - * a useful node-affinity, consider vcpu-affinity only. + * Soft and hard affinity balancing loop. For vcpus without + * a useful soft affinity, consider hard affinity only. */ for_each_csched_balance_step( balance_step ) { int new_idlers_empty; - if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY - && !__vcpu_has_node_affinity(new->vcpu, + if ( balance_step == CSCHED_BALANCE_SOFT_AFFINITY + && !__vcpu_has_soft_affinity(new->vcpu, new->vcpu->cpu_hard_affinity) ) continue; @@ -418,11 +397,11 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new) /* * Let''s not be too harsh! If there aren''t idlers suitable - * for new in its node-affinity mask, make sure we check its - * vcpu-affinity as well, before taking final decisions. + * for new in its soft affinity mask, make sure we check its + * hard affinity as well, before taking final decisions. */ if ( new_idlers_empty - && balance_step == CSCHED_BALANCE_NODE_AFFINITY ) + && balance_step == CSCHED_BALANCE_SOFT_AFFINITY ) continue; /* @@ -649,23 +628,23 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit) /* * We want to pick up a pcpu among the ones that are online and * can accommodate vc, which is basically what we computed above - * and stored in cpus. 
As far as vcpu-affinity is concerned, + * and stored in cpus. As far as hard affinity is concerned, * there always will be at least one of these pcpus, hence cpus * is never empty and the calls to cpumask_cycle() and * cpumask_test_cpu() below are ok. * - * On the other hand, when considering node-affinity too, it + * On the other hand, when considering soft affinity too, it * is possible for the mask to become empty (for instance, if the * domain has been put in a cpupool that does not contain any of the - * nodes in its node-affinity), which would result in the ASSERT()-s + * pcpus in its soft affinity), which would result in the ASSERT()-s * inside cpumask_*() operations triggering (in debug builds). * - * Therefore, in this case, we filter the node-affinity mask against - * cpus and, if the result is empty, we just skip the node-affinity + * Therefore, in this case, we filter the soft affinity mask against + * cpus and, if the result is empty, we just skip the soft affinity * balancing step all together. 
*/ - if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY - && !__vcpu_has_node_affinity(vc, &cpus) ) + if ( balance_step == CSCHED_BALANCE_SOFT_AFFINITY + && !__vcpu_has_soft_affinity(vc, &cpus) ) continue; /* Pick an online CPU from the proper affinity mask */ @@ -1122,13 +1101,6 @@ csched_alloc_domdata(const struct scheduler *ops, struct domain *dom) if ( sdom == NULL ) return NULL; - if ( !alloc_cpumask_var(&sdom->node_affinity_cpumask) ) - { - xfree(sdom); - return NULL; - } - cpumask_setall(sdom->node_affinity_cpumask); - /* Initialize credit and weight */ INIT_LIST_HEAD(&sdom->active_vcpu); INIT_LIST_HEAD(&sdom->active_sdom_elem); @@ -1158,9 +1130,6 @@ csched_dom_init(const struct scheduler *ops, struct domain *dom) static void csched_free_domdata(const struct scheduler *ops, void *data) { - struct csched_dom *sdom = data; - - free_cpumask_var(sdom->node_affinity_cpumask); xfree(data); } @@ -1486,19 +1455,19 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step) BUG_ON( is_idle_vcpu(vc) ); /* - * If the vcpu has no useful node-affinity, skip this vcpu. - * In fact, what we want is to check if we have any node-affine - * work to steal, before starting to look at vcpu-affine work. + * If the vcpu has no useful soft affinity, skip this vcpu. + * In fact, what we want is to check if we have any "soft-affine + * work" to steal, before starting to look at "hard-affine work". * * Notice that, if not even one vCPU on this runq has a useful - * node-affinity, we could have avoid considering this runq for - * a node balancing step in the first place. This, for instance, + * soft affinity, we could have avoid considering this runq for + * a soft balancing step in the first place. This, for instance, * can be implemented by taking note of on what runq there are - * vCPUs with useful node-affinities in some sort of bitmap + * vCPUs with useful soft affinities in some sort of bitmap * or counter. 
*/ - if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY - && !__vcpu_has_node_affinity(vc, vc->cpu_hard_affinity) ) + if ( balance_step == CSCHED_BALANCE_SOFT_AFFINITY + && !__vcpu_has_soft_affinity(vc, vc->cpu_hard_affinity) ) continue; csched_balance_cpumask(vc, balance_step, csched_balance_mask); @@ -1546,17 +1515,17 @@ csched_load_balance(struct csched_private *prv, int cpu, SCHED_STAT_CRANK(load_balance_other); /* - * Let''s look around for work to steal, taking both vcpu-affinity - * and node-affinity into account. More specifically, we check all + * Let''s look around for work to steal, taking both hard affinity + * and soft affinity into account. More specifically, we check all * the non-idle CPUs'' runq, looking for: - * 1. any node-affine work to steal first, - * 2. if not finding anything, any vcpu-affine work to steal. + * 1. any "soft-affine work" to steal first, + * 2. if not finding anything, any "hard-affine work" to steal. */ for_each_csched_balance_step( bstep ) { /* * We peek at the non-idling CPUs in a node-wise fashion. In fact, - * it is more likely that we find some node-affine work on our same + * it is more likely that we find some affine work on our same * node, not to mention that migrating vcpus within the same node * could well expected to be cheaper than across-nodes (memory * stays local, there might be some node-wide cache[s], etc.). @@ -1982,8 +1951,6 @@ const struct scheduler sched_credit_def = { .adjust = csched_dom_cntl, .adjust_global = csched_sys_cntl, - .set_node_affinity = csched_set_node_affinity, - .pick_cpu = csched_cpu_pick, .do_schedule = csched_schedule, diff --git a/xen/common/schedule.c b/xen/common/schedule.c index c4236c5..c9ae521 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -198,6 +198,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor) else cpumask_setall(v->cpu_hard_affinity); + cpumask_setall(v->cpu_soft_affinity); + /* Initialise the per-vcpu timers. 
*/ init_timer(&v->periodic_timer, vcpu_periodic_timer_fn, v, v->processor); @@ -286,6 +288,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c) migrate_timer(&v->poll_timer, new_p); cpumask_setall(v->cpu_hard_affinity); + cpumask_setall(v->cpu_soft_affinity); lock = vcpu_schedule_lock_irq(v); v->processor = new_p; diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 40e5927..3575312 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -198,6 +198,9 @@ struct vcpu /* Used to restore affinity across S3. */ cpumask_var_t cpu_hard_affinity_saved; + /* Bitmask of CPUs on which this VCPU prefers to run. */ + cpumask_var_t cpu_soft_affinity; + /* Bitmask of CPUs which are holding onto this VCPU''s state. */ cpumask_var_t vcpu_dirty_cpumask;
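The two-step balancing logic that this patch introduces in sched_credit.c can be sketched with plain 64-bit masks standing in for cpumask_t (a toy model: `balance_mask()` and the `BALANCE_*` names are illustrative, not the Xen API):

```c
#include <assert.h>
#include <stdint.h>

#define BALANCE_SOFT 0  /* step 0: soft && hard affinity */
#define BALANCE_HARD 1  /* step 1: hard affinity only */

/* Toy stand-in for csched_balance_cpumask(): each bit is a pcpu. */
static uint64_t balance_mask(uint64_t hard, uint64_t soft, int step)
{
    if (step == BALANCE_SOFT) {
        uint64_t m = soft & hard;   /* where the vcpu prefers *and* may run */
        return m ? m : hard;        /* empty intersection: fall back to hard */
    }
    return hard;                    /* BALANCE_HARD: anywhere it may run */
}
```

As in the patch, a soft affinity whose intersection with the hard affinity is empty is deemed useless, and the soft step degenerates to the hard-affinity mask.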
Dario Faggioli
2013-Nov-22 18:57 UTC
[PATCH v4 09/15] xen: derive NUMA node affinity from hard and soft CPU affinity
if a domain's NUMA node-affinity (which is what controls memory allocations) is provided by the user/toolstack, it is simply not touched. However, if the user does not say anything, leaving it all to Xen, let's compute it in the following way:

 1. cpupool's cpus & hard-affinity & soft-affinity
 2. if (1) is empty: cpupool's cpus & hard-affinity

This guarantees that memory is allocated from the narrowest possible set of NUMA nodes, and makes it relatively easy to set up NUMA-aware scheduling on top of soft affinity.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
---
Changes from v3:
 * avoid pointless calls to cpumask_clear(), as requested during review;
 * ASSERT() non-emptiness of cpupool & hard affinity, as suggested during review.

Changes from v2:
 * the loop computing the mask is now only executed when it is really useful, as suggested during review;
 * the loop, and all the cpumask handling, is optimized in a way similar to what was suggested during review.
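The two rules above can be sketched as follows, with 64-bit masks in place of cpumask_t (an illustrative model, not the hypervisor code, which then maps the resulting pcpus to nodes):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy sketch of the node-affinity fallback rule:
 * use (cpupool cpus & hard & soft) if non-empty, else (cpupool cpus & hard).
 * Bits stand for pcpus.
 */
static uint64_t numa_affinity_cpus(uint64_t pool, uint64_t hard, uint64_t soft)
{
    uint64_t dom = pool & hard;         /* rule 2: where the domain can run */
    uint64_t dom_soft = dom & soft;     /* rule 1: ...and prefers to run */
    return dom_soft ? dom_soft : dom;   /* fall back when rule 1 is empty */
}
```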
--- xen/common/domain.c | 61 +++++++++++++++++++++++++++++++++------------------ 1 file changed, 39 insertions(+), 22 deletions(-) diff --git a/xen/common/domain.c b/xen/common/domain.c index d6ac4d1..4adc123 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -353,17 +353,17 @@ struct domain *domain_create( void domain_update_node_affinity(struct domain *d) { - cpumask_var_t cpumask; - cpumask_var_t online_affinity; + cpumask_var_t dom_cpumask, dom_cpumask_soft; + cpumask_t *dom_affinity; const cpumask_t *online; struct vcpu *v; - unsigned int node; + unsigned int cpu; - if ( !zalloc_cpumask_var(&cpumask) ) + if ( !zalloc_cpumask_var(&dom_cpumask) ) return; - if ( !alloc_cpumask_var(&online_affinity) ) + if ( !zalloc_cpumask_var(&dom_cpumask_soft) ) { - free_cpumask_var(cpumask); + free_cpumask_var(dom_cpumask); return; } @@ -371,31 +371,48 @@ void domain_update_node_affinity(struct domain *d) spin_lock(&d->node_affinity_lock); - for_each_vcpu ( d, v ) - { - cpumask_and(online_affinity, v->cpu_hard_affinity, online); - cpumask_or(cpumask, cpumask, online_affinity); - } - /* - * If d->auto_node_affinity is true, the domain''s node-affinity mask - * (d->node_affinity) is automaically computed from all the domain''s - * vcpus'' vcpu-affinity masks (the union of which we have just built - * above in cpumask). OTOH, if d->auto_node_affinity is false, we - * must leave the node-affinity of the domain alone. + * If d->auto_node_affinity is true, let''s compute the domain''s + * node-affinity and update d->node_affinity accordingly. if false, + * just leave d->auto_node_affinity alone. */ if ( d->auto_node_affinity ) { + /* + * We want the narrowest possible set of pcpus (to get the narowest + * possible set of nodes). What we need is the cpumask of where the + * domain can run (the union of the hard affinity of all its vcpus), + * and the full mask of where it would prefer to run (the union of + * the soft affinity of all its various vcpus). Let''s build them. 
+ */ + for_each_vcpu ( d, v ) + { + cpumask_or(dom_cpumask, dom_cpumask, v->cpu_hard_affinity); + cpumask_or(dom_cpumask_soft, dom_cpumask_soft, + v->cpu_soft_affinity); + } + /* Filter out non-online cpus */ + cpumask_and(dom_cpumask, dom_cpumask, online); + ASSERT(!cpumask_empty(dom_cpumask)); + /* And compute the intersection between hard, online and soft */ + cpumask_and(dom_cpumask_soft, dom_cpumask_soft, dom_cpumask); + + /* + * If not empty, the intersection of hard, soft and online is the + * narrowest set we want. If empty, we fall back to hard&online. + */ + dom_affinity = cpumask_empty(dom_cpumask_soft) ? + dom_cpumask : dom_cpumask_soft; + nodes_clear(d->node_affinity); - for_each_online_node ( node ) - if ( cpumask_intersects(&node_to_cpumask(node), cpumask) ) - node_set(node, d->node_affinity); + for_each_cpu( cpu, dom_affinity ) + node_set(cpu_to_node(cpu), d->node_affinity); } spin_unlock(&d->node_affinity_lock); - free_cpumask_var(online_affinity); - free_cpumask_var(cpumask); + free_cpumask_var(dom_cpumask_soft); + free_cpumask_var(dom_cpumask); }
Dario Faggioli
2013-Nov-22 18:57 UTC
[PATCH v4 10/15] xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
by adding a flag with which the caller specifies which one it cares about. Also add another cpumap there, so that, in case of DOMCTL_setvcpuaffinity, Xen can return to the caller the "effective affinity" of the vcpu. We call effective affinity the intersection of the cpupool's cpus, the (new?) hard affinity and the (new?) soft affinity. The purpose of this is to allow the toolstack to figure out whether or not the requested change produced sensible results, when combined with the other settings that are already in place.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
---
Changes since v3:
 * no longer discarding possible errors. Also, roll back setting hard affinity if setting soft affinity fails afterwards, so that the caller really sees no changes when the call fails, as requested during review;
 * fixed -EFAULT --> -ENOMEM in case of a failed memory allocation, as requested during review;
 * removed unnecessary use of a pointer to pointer, as requested during review.

Changes from v2:
 * in DOMCTL_[sg]etvcpuaffinity, flags is really a flag field now, i.e., we accept requests for setting and getting: (1) only hard affinity; (2) only soft affinity; (3) both; as suggested during review.
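The "effective affinity" reported back to the caller can be modelled like this (a sketch with 64-bit masks and illustrative names; the real code uses cpumask_t and the XEN_VCPUAFFINITY_* flags):

```c
#include <assert.h>
#include <stdint.h>

#define AFF_HARD (1U << 0)  /* mirrors XEN_VCPUAFFINITY_HARD */
#define AFF_SOFT (1U << 1)  /* mirrors XEN_VCPUAFFINITY_SOFT */

/* Effective affinity: cpupool cpus & hard affinity, further narrowed
 * by soft affinity when the soft affinity was (also) being set. */
static uint64_t effective_affinity(uint64_t online, uint64_t hard,
                                   uint64_t soft, uint32_t flags)
{
    uint64_t eff = online & hard;
    if (flags & AFF_SOFT)
        eff &= soft;
    return eff;
}
```

Comparing this result against the requested cpumap is how the toolstack can tell whether its request was narrowed down by the other settings in place.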
--- tools/libxc/xc_domain.c | 4 +- xen/arch/x86/traps.c | 4 +- xen/common/domctl.c | 92 +++++++++++++++++++++++++++++++++++++++++-- xen/common/schedule.c | 35 +++++++++++----- xen/common/wait.c | 6 +-- xen/include/public/domctl.h | 15 ++++++- xen/include/xen/sched.h | 3 + 7 files changed, 134 insertions(+), 25 deletions(-) diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index 1ccafc5..f9ae4bf 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -215,7 +215,9 @@ int xc_vcpu_setaffinity(xc_interface *xch, domctl.cmd = XEN_DOMCTL_setvcpuaffinity; domctl.domain = (domid_t)domid; - domctl.u.vcpuaffinity.vcpu = vcpu; + domctl.u.vcpuaffinity.vcpu = vcpu; + /* Soft affinity is there, but not used anywhere for now, so... */ + domctl.u.vcpuaffinity.flags = XEN_VCPUAFFINITY_HARD; memcpy(local, cpumap, cpusize); diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 157031e..ff4523b 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -3103,7 +3103,7 @@ static void nmi_mce_softirq(void) * Make sure to wakeup the vcpu on the * specified processor. */ - vcpu_set_affinity(st->vcpu, cpumask_of(st->processor)); + vcpu_set_hard_affinity(st->vcpu, cpumask_of(st->processor)); /* Affinity is restored in the iret hypercall. 
*/ } @@ -3132,7 +3132,7 @@ void async_exception_cleanup(struct vcpu *curr) if ( !cpumask_empty(curr->cpu_hard_affinity_tmp) && !cpumask_equal(curr->cpu_hard_affinity_tmp, curr->cpu_hard_affinity) ) { - vcpu_set_affinity(curr, curr->cpu_hard_affinity_tmp); + vcpu_set_hard_affinity(curr, curr->cpu_hard_affinity_tmp); cpumask_clear(curr->cpu_hard_affinity_tmp); } diff --git a/xen/common/domctl.c b/xen/common/domctl.c index 5e0ac5c..6081d22 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -616,20 +616,102 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) if ( op->cmd == XEN_DOMCTL_setvcpuaffinity ) { - cpumask_var_t new_affinity; + cpumask_var_t new_affinity, old_affinity; + cpumask_t *online; + + /* We need this to restore hard affinity if setting soft fails */ + if ( !alloc_cpumask_var(&old_affinity) ) + { + ret = -ENOMEM; + break; + } + cpumask_copy(old_affinity, v->cpu_hard_affinity); ret = xenctl_bitmap_to_cpumask( &new_affinity, &op->u.vcpuaffinity.cpumap); - if ( !ret ) + if ( ret ) + { + free_cpumask_var(old_affinity); + break; + } + + ret = -EINVAL; + if ( op->u.vcpuaffinity.flags == 0 ) + goto setvcpuaffinity_out; + + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_HARD ) + { + ret = vcpu_set_hard_affinity(v, new_affinity); + if ( ret ) + goto setvcpuaffinity_out; + } + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT ) + { + ret = vcpu_set_soft_affinity(v, new_affinity); + if ( ret ) + { + /* + * Since we''re returning error, the caller expects nothing + * happened, so we rollback the changes to hard affinity + * (if any). + */ + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_HARD ) + vcpu_set_hard_affinity(v, old_affinity); + goto setvcpuaffinity_out; + } + } + + /* + * Report back to the caller what the "effective affinity", that + * is the intersection of cpupool''s pcpus, the (new?) hard + * affinity and the (new?) soft-affinity. 
+ */ + if ( !guest_handle_is_null(op->u.vcpuaffinity.eff_cpumap.bitmap) ) { - ret = vcpu_set_affinity(v, new_affinity); - free_cpumask_var(new_affinity); + online = cpupool_online_cpumask(v->domain->cpupool); + cpumask_and(new_affinity, online, v->cpu_hard_affinity); + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT) + cpumask_and(new_affinity, new_affinity, + v->cpu_soft_affinity); + + ret = cpumask_to_xenctl_bitmap( + &op->u.vcpuaffinity.eff_cpumap, new_affinity); } + + setvcpuaffinity_out: + free_cpumask_var(new_affinity); + free_cpumask_var(old_affinity); } else { + cpumask_var_t affinity; + + /* + * If the caller asks for both _HARD and _SOFT, what we return + * is the intersection of hard and soft affinity for the vcpu. + */ + if ( !alloc_cpumask_var(&affinity) ) + { + ret = -ENOMEM; + break; + } + cpumask_setall(affinity); + + if ( op->u.vcpuaffinity.flags == 0 ) + { + ret = -EINVAL; + free_cpumask_var(affinity); + break; + } + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_HARD ) + cpumask_copy(affinity, v->cpu_hard_affinity); + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT ) + cpumask_and(affinity, affinity, v->cpu_soft_affinity); + ret = cpumask_to_xenctl_bitmap( - &op->u.vcpuaffinity.cpumap, v->cpu_hard_affinity); + &op->u.vcpuaffinity.cpumap, affinity); + + free_cpumask_var(affinity); } } break; diff --git a/xen/common/schedule.c b/xen/common/schedule.c index c9ae521..b1e9b08 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -654,22 +654,14 @@ void sched_set_node_affinity(struct domain *d, nodemask_t *mask) SCHED_OP(DOM2OP(d), set_node_affinity, d, mask); } -int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity) +static int vcpu_set_affinity( + struct vcpu *v, const cpumask_t *affinity, cpumask_t *which) { - cpumask_t online_affinity; - cpumask_t *online; spinlock_t *lock; - if ( v->domain->is_pinned ) - return -EINVAL; - online = VCPU2ONLINE(v); - cpumask_and(&online_affinity, affinity, online); - if ( 
cpumask_empty(&online_affinity) ) - return -EINVAL; - lock = vcpu_schedule_lock_irq(v); - cpumask_copy(v->cpu_hard_affinity, affinity); + cpumask_copy(which, affinity); /* Always ask the scheduler to re-evaluate placement * when changing the affinity */ @@ -688,6 +680,27 @@ int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity) return 0; } +int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity) +{ + cpumask_t online_affinity; + cpumask_t *online; + + if ( v->domain->is_pinned ) + return -EINVAL; + + online = VCPU2ONLINE(v); + cpumask_and(&online_affinity, affinity, online); + if ( cpumask_empty(&online_affinity) ) + return -EINVAL; + + return vcpu_set_affinity(v, affinity, v->cpu_hard_affinity); +} + +int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity) +{ + return vcpu_set_affinity(v, affinity, v->cpu_soft_affinity); +} + /* Block the currently-executing domain until a pertinent event occurs. */ void vcpu_block(void) { diff --git a/xen/common/wait.c b/xen/common/wait.c index 3f6ff41..1f6b597 100644 --- a/xen/common/wait.c +++ b/xen/common/wait.c @@ -135,7 +135,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv) /* Save current VCPU affinity; force wakeup on *this* CPU only. */ wqv->wakeup_cpu = smp_processor_id(); cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); - if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) + if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) { gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); domain_crash_synchronous(); @@ -166,7 +166,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv) static void __finish_wait(struct waitqueue_vcpu *wqv) { wqv->esp = NULL; - (void)vcpu_set_affinity(current, &wqv->saved_affinity); + (void)vcpu_set_hard_affinity(current, &wqv->saved_affinity); } void check_wakeup_from_wait(void) @@ -184,7 +184,7 @@ void check_wakeup_from_wait(void) /* Re-set VCPU affinity and re-enter the scheduler. 
*/ struct vcpu *curr = current; cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); - if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) + if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) { gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); domain_crash_synchronous(); diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h index 01a3652..4f71450 100644 --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -300,8 +300,19 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_nodeaffinity_t); /* XEN_DOMCTL_setvcpuaffinity */ /* XEN_DOMCTL_getvcpuaffinity */ struct xen_domctl_vcpuaffinity { - uint32_t vcpu; /* IN */ - struct xenctl_bitmap cpumap; /* IN/OUT */ + /* IN variables. */ + uint32_t vcpu; + /* Set/get the hard affinity for vcpu */ +#define _XEN_VCPUAFFINITY_HARD 0 +#define XEN_VCPUAFFINITY_HARD (1U<<_XEN_VCPUAFFINITY_HARD) + /* Set/get the soft affinity for vcpu */ +#define _XEN_VCPUAFFINITY_SOFT 1 +#define XEN_VCPUAFFINITY_SOFT (1U<<_XEN_VCPUAFFINITY_SOFT) + uint32_t flags; + /* IN/OUT variables. */ + struct xenctl_bitmap cpumap; + /* OUT variables. */ + struct xenctl_bitmap eff_cpumap; }; typedef struct xen_domctl_vcpuaffinity xen_domctl_vcpuaffinity_t; DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpuaffinity_t); diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 3575312..0f728b3 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -755,7 +755,8 @@ void scheduler_free(struct scheduler *sched); int schedule_cpu_switch(unsigned int cpu, struct cpupool *c); void vcpu_force_reschedule(struct vcpu *v); int cpu_disable_scheduler(unsigned int cpu); -int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity); +int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity); +int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity); void restore_vcpu_affinity(struct domain *d); void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate);
Dario Faggioli
2013-Nov-22 18:58 UTC
[PATCH v4 11/15] libxc: get and set soft and hard affinity
by using the new flag introduced in the parameters of the DOMCTL_{get,set}_vcpuaffinity hypercall. This happens by adding a new parameter (flags) to xc_vcpu_setaffinity() and xc_vcpu_getaffinity(), so that the caller can decide to set either the soft or hard affinity, or even both. In case of setting both hard and soft, they are set to the same cpumap. xc_vcpu_setaffinity() also takes another new param, for reporting back to the caller what affinity the scheduler will actually use after a successful call. In case of asking to get both hard and soft, what the caller gets is the intersection of the two. In-tree callers are also fixed to cope with the new interface.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
Changes from v2:
 * better cleanup logic in _vcpu_setaffinity() (regarding xc_hypercall_buffer_{alloc,free}()), as suggested during review;
 * make it more evident that DOMCTL_setvcpuaffinity has an out parameter, by calling it ecpumap_out, and improving the comment wrt that;
 * change the interface and have xc_vcpu_[sg]etaffinity() take the new parameters (flags and ecpumap_out) and fix the in-tree callers.
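The get side's flag semantics (hard only, soft only, or their intersection when both flags are set) can be sketched like so (toy masks and illustrative names; the real xc_vcpu_getaffinity() fills an xc_cpumap_t via the domctl):

```c
#include <assert.h>
#include <stdint.h>

#define GET_HARD (1U << 0)  /* mirrors XEN_VCPUAFFINITY_HARD */
#define GET_SOFT (1U << 1)  /* mirrors XEN_VCPUAFFINITY_SOFT */

/* What a getaffinity-style call returns for each flags combination;
 * flags == 0 is invalid (the real hypercall fails with -EINVAL). */
static uint64_t get_affinity(uint64_t hard, uint64_t soft, uint32_t flags)
{
    uint64_t m = ~UINT64_C(0);  /* start from "all pcpus" */
    if (!flags)
        return 0;
    if (flags & GET_HARD)
        m &= hard;
    if (flags & GET_SOFT)
        m &= soft;
    return m;
}
```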
--- tools/libxc/xc_domain.c | 47 +++++++++++++++++++++-------------- tools/libxc/xenctrl.h | 44 ++++++++++++++++++++++++++++++++- tools/libxl/libxl.c | 7 ++++- tools/ocaml/libs/xc/xenctrl_stubs.c | 8 ++++-- tools/python/xen/lowlevel/xc/xc.c | 6 +++- 5 files changed, 86 insertions(+), 26 deletions(-) diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index f9ae4bf..c137351 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -192,44 +192,52 @@ int xc_domain_node_getaffinity(xc_interface *xch, int xc_vcpu_setaffinity(xc_interface *xch, uint32_t domid, int vcpu, - xc_cpumap_t cpumap) + xc_cpumap_t cpumap, + uint32_t flags, + xc_cpumap_t ecpumap_out) { DECLARE_DOMCTL; - DECLARE_HYPERCALL_BUFFER(uint8_t, local); + DECLARE_HYPERCALL_BUFFER(uint8_t, cpumap_local); + DECLARE_HYPERCALL_BUFFER(uint8_t, ecpumap_local); int ret = -1; int cpusize; cpusize = xc_get_cpumap_size(xch); - if (!cpusize) + if ( !cpusize ) { PERROR("Could not get number of cpus"); - goto out; + return -1; } - local = xc_hypercall_buffer_alloc(xch, local, cpusize); - if ( local == NULL ) + cpumap_local = xc_hypercall_buffer_alloc(xch, cpumap_local, cpusize); + ecpumap_local = xc_hypercall_buffer_alloc(xch, ecpumap_local, cpusize); + if ( cpumap_local == NULL || cpumap_local == NULL) { - PERROR("Could not allocate memory for setvcpuaffinity domctl hypercall"); + PERROR("Could not allocate hcall buffers for DOMCTL_setvcpuaffinity"); goto out; } domctl.cmd = XEN_DOMCTL_setvcpuaffinity; domctl.domain = (domid_t)domid; domctl.u.vcpuaffinity.vcpu = vcpu; - /* Soft affinity is there, but not used anywhere for now, so... 
*/ - domctl.u.vcpuaffinity.flags = XEN_VCPUAFFINITY_HARD; - - memcpy(local, cpumap, cpusize); - - set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, local); + domctl.u.vcpuaffinity.flags = flags; + memcpy(cpumap_local, cpumap, cpusize); + set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, cpumap_local); domctl.u.vcpuaffinity.cpumap.nr_bits = cpusize * 8; + set_xen_guest_handle(domctl.u.vcpuaffinity.eff_cpumap.bitmap, + ecpumap_local); + domctl.u.vcpuaffinity.eff_cpumap.nr_bits = cpusize * 8; + ret = do_domctl(xch, &domctl); - xc_hypercall_buffer_free(xch, local); + if ( ecpumap_out != NULL ) + memcpy(ecpumap_out, ecpumap_local, cpusize); out: + xc_hypercall_buffer_free(xch, cpumap_local); + xc_hypercall_buffer_free(xch, ecpumap_local); return ret; } @@ -237,6 +245,7 @@ int xc_vcpu_setaffinity(xc_interface *xch, int xc_vcpu_getaffinity(xc_interface *xch, uint32_t domid, int vcpu, + uint32_t flags, xc_cpumap_t cpumap) { DECLARE_DOMCTL; @@ -245,22 +254,23 @@ int xc_vcpu_getaffinity(xc_interface *xch, int cpusize; cpusize = xc_get_cpumap_size(xch); - if (!cpusize) + if ( !cpusize ) { PERROR("Could not get number of cpus"); - goto out; + return -1; } local = xc_hypercall_buffer_alloc(xch, local, cpusize); - if (local == NULL) + if ( local == NULL ) { PERROR("Could not allocate memory for getvcpuaffinity domctl hypercall"); - goto out; + return -1; } domctl.cmd = XEN_DOMCTL_getvcpuaffinity; domctl.domain = (domid_t)domid; domctl.u.vcpuaffinity.vcpu = vcpu; + domctl.u.vcpuaffinity.flags = flags; set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, local); domctl.u.vcpuaffinity.cpumap.nr_bits = cpusize * 8; @@ -270,7 +280,6 @@ int xc_vcpu_getaffinity(xc_interface *xch, memcpy(cpumap, local, cpusize); xc_hypercall_buffer_free(xch, local); -out: return ret; } diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h index 4ac6b8a..a97ed67 100644 --- a/tools/libxc/xenctrl.h +++ b/tools/libxc/xenctrl.h @@ -579,13 +579,55 @@ int 
xc_domain_node_getaffinity(xc_interface *xch, uint32_t domind, xc_nodemap_t nodemap); +/** + * This function specifies the CPU affinity for a vcpu. + * + * There are two kinds of affinity. Soft affinity is the set of pcpus on + * which a vcpu prefers to run. Hard affinity is the set of pcpus on which + * a vcpu is allowed to run. + * If flags contains *only* XEN_VCPUAFFINITY_SOFT, it is the soft affinity + * that is set. If flags contains *only* XEN_VCPUAFFINITY_HARD, it is the + * hard affinity that is set. If flags contains *both*, both are set to the + * same value, provided in cpumap. + * + * The function also returns the effective affinity, via the ecpumap_out + * parameter. Effective affinity is the intersection of soft affinity, hard + * affinity and the set of the cpus of the cpupool the domain belongs to. + * It basically is what the Xen scheduler will actually use. Reporting it + * back to the caller allows the caller to check whether that matches, or + * at least is good enough for, the intended purposes. + * + * @param xch a handle to an open hypervisor interface. + * @param domid the id of the domain to which the vcpu belongs + * @param vcpu the vcpu id within the domain + * @param cpumap the (hard, soft, both) new affinity map one wants to set + * @param flags what we want to set + * @param ecpumap_out where the effective affinity for the vcpu is returned + */ int xc_vcpu_setaffinity(xc_interface *xch, uint32_t domid, int vcpu, - xc_cpumap_t cpumap); + xc_cpumap_t cpumap, + uint32_t flags, + xc_cpumap_t ecpumap_out); + +/** + * This function retrieves hard or soft CPU affinity (or their intersection) + * for a vcpu, depending on flags. + * + * Soft affinity is returned if *only* XEN_VCPUAFFINITY_SOFT is set in flags. + * Hard affinity is returned if *only* XEN_VCPUAFFINITY_HARD is set in flags. + * If both are set, what is returned is the intersection of the two. + * + * @param xch a handle to an open hypervisor interface.
+ * @param domid the id of the domain to which the vcpu belongs + * @param vcpu the vcpu id within the domain + * @param flags what we want to get + * @param cpumap is where the desired affinity is returned + */ int xc_vcpu_getaffinity(xc_interface *xch, uint32_t domid, int vcpu, + uint32_t flags, xc_cpumap_t cpumap); diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 61d8a76..961d55e 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -4542,7 +4542,9 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu info"); return NULL; } - if (xc_vcpu_getaffinity(ctx->xch, domid, *nb_vcpu, ptr->cpumap.map) == -1) { + if (xc_vcpu_getaffinity(ctx->xch, domid, *nb_vcpu, + XEN_VCPUAFFINITY_HARD, + ptr->cpumap.map) == -1) { LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity"); return NULL; } @@ -4559,7 +4561,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, libxl_bitmap *cpumap) { - if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map)) { + if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map, + XEN_VCPUAFFINITY_HARD, NULL)) { LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting vcpu affinity"); return ERROR_FAIL; } diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c index f5cf0ed..30327d4 100644 --- a/tools/ocaml/libs/xc/xenctrl_stubs.c +++ b/tools/ocaml/libs/xc/xenctrl_stubs.c @@ -438,7 +438,9 @@ CAMLprim value stub_xc_vcpu_setaffinity(value xch, value domid, c_cpumap[i/8] |= 1 << (i&7); } retval = xc_vcpu_setaffinity(_H(xch), _D(domid), - Int_val(vcpu), c_cpumap); + Int_val(vcpu), c_cpumap, + XEN_VCPUAFFINITY_HARD, + NULL); free(c_cpumap); if (retval < 0) @@ -460,7 +462,9 @@ CAMLprim value stub_xc_vcpu_getaffinity(value xch, value domid, failwith_xc(_H(xch)); retval = xc_vcpu_getaffinity(_H(xch), _D(domid), -
Int_val(vcpu), + XEN_VCPUAFFINITY_HARD, + c_cpumap); if (retval < 0) { free(c_cpumap); failwith_xc(_H(xch)); diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c index 2625fc4..9348ce6 100644 --- a/tools/python/xen/lowlevel/xc/xc.c +++ b/tools/python/xen/lowlevel/xc/xc.c @@ -256,7 +256,8 @@ static PyObject *pyxc_vcpu_setaffinity(XcObject *self, } } - if ( xc_vcpu_setaffinity(self->xc_handle, dom, vcpu, cpumap) != 0 ) + if ( xc_vcpu_setaffinity(self->xc_handle, dom, vcpu, cpumap, + XEN_VCPUAFFINITY_HARD, NULL) != 0 ) { free(cpumap); return pyxc_error_to_exception(self->xc_handle); @@ -403,7 +404,8 @@ static PyObject *pyxc_vcpu_getinfo(XcObject *self, if(cpumap == NULL) return pyxc_error_to_exception(self->xc_handle); - rc = xc_vcpu_getaffinity(self->xc_handle, dom, vcpu, cpumap); + rc = xc_vcpu_getaffinity(self->xc_handle, dom, vcpu, + XEN_VCPUAFFINITY_HARD, cpumap); if ( rc < 0 ) { free(cpumap);
Make space for two new cpumaps, one in vcpu_info (for getting soft affinity) and one in build_info (for setting it), and amend the API for setting vCPU affinity. libxl_set_vcpuaffinity() now takes two cpumaps, one for hard and one for soft affinity (LIBXL_API_VERSION is exploited to retain source level backward compatibility). Either of the two cpumaps can be NULL, in which case only the affinity corresponding to the non-NULL cpumap will be affected. Getting soft affinity happens indirectly, via `xl vcpu-list'' (as it is already for hard affinity). This commit also introduces some logic to check whether the affinity which will be used by Xen to schedule the vCPU(s) actually matches the cpumaps provided. In fact, we want to allow every possible combination of hard and soft affinity to be set, but we warn the user upon particularly weird situations (e.g., hard and soft being disjoint sets of pCPUs). This is the first change breaking the libxl ABI, so it bumps the MAJOR. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- Changes from v3: * only introduce one LIBXL_HAVE_ symbol for soft affinity, as requested during review; * use LIBXL_API_VERSION instead of having multiple version of the same function, as suggested during review; * use libxl_get_nr_cpus() rather than libxl_get_cputopology(), as suggested during review; * use LOGE() instead of LIBXL__LOG_ERRNO(), as requested during review; * kill the flags and use just one _set_vcpuaffinity() function with two cpumaps, allowing either of them to be NULL, as suggested during review; * avoid overflowing the bitmaps in libxl_bitmap_equal(), as suggested during review. Changes from v2: * interface completely redesigned, as discussed during review.
--- tools/libxl/Makefile | 2 + tools/libxl/libxl.c | 79 ++++++++++++++++++++++++++++++++++++++----- tools/libxl/libxl.h | 38 ++++++++++++++++++++- tools/libxl/libxl_create.c | 6 +++ tools/libxl/libxl_dom.c | 3 +- tools/libxl/libxl_types.idl | 4 ++ tools/libxl/libxl_utils.h | 25 +++++++++++++- tools/libxl/xl_cmdimpl.c | 6 ++- 8 files changed, 145 insertions(+), 18 deletions(-) diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile index d8495bb..1410c44 100644 --- a/tools/libxl/Makefile +++ b/tools/libxl/Makefile @@ -5,7 +5,7 @@ XEN_ROOT = $(CURDIR)/../.. include $(XEN_ROOT)/tools/Rules.mk -MAJOR = 4.3 +MAJOR = 4.4 MINOR = 0 XLUMAJOR = 4.3 diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 961d55e..4b871d7 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -4538,6 +4538,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, for (*nb_vcpu = 0; *nb_vcpu <= domaininfo.max_vcpu_id; ++*nb_vcpu, ++ptr) { if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0)) return NULL; + if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap_soft, 0)) + return NULL; if (xc_vcpu_getinfo(ctx->xch, domid, *nb_vcpu, &vcpuinfo) == -1) { LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu info"); return NULL; @@ -4548,6 +4550,12 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity"); return NULL; } + if (xc_vcpu_getaffinity(ctx->xch, domid, *nb_vcpu, + XEN_VCPUAFFINITY_SOFT, + ptr->cpumap_soft.map) == -1) { + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity"); + return NULL; + } ptr->vcpuid = *nb_vcpu; ptr->cpu = vcpuinfo.cpu; ptr->online = !!vcpuinfo.online; @@ -4559,28 +4567,81 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, } int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, - libxl_bitmap *cpumap) + const libxl_bitmap *cpumap_hard, + const libxl_bitmap *cpumap_soft) { - if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map, 
- XEN_VCPUAFFINITY_HARD, NULL)) { - LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting vcpu affinity"); - return ERROR_FAIL; + GC_INIT(ctx); + libxl_bitmap ecpumap; + int nr_cpus = 0, rc; + + if (!cpumap_hard && !cpumap_soft) + return ERROR_INVAL; + + rc = libxl_cpu_bitmap_alloc(ctx, &ecpumap, 0); + if (rc) + return rc; + + nr_cpus = libxl_get_nr_cpus(ctx); + if (nr_cpus <= 0) { + rc = ERROR_FAIL; + goto out; } + + if (cpumap_hard && xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, + cpumap_hard->map, + XEN_VCPUAFFINITY_HARD, + ecpumap.map)) { + LOGE(ERROR, "setting vcpu hard affinity"); + rc = ERROR_FAIL; + goto out; + } + if (cpumap_hard && !libxl_bitmap_equal(cpumap_hard, &ecpumap, nr_cpus)) + LOG(DEBUG, "New hard affinity for vcpu %d contains unreachable cpus", + vcpuid); + + if (cpumap_soft && xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, + cpumap_soft->map, + XEN_VCPUAFFINITY_SOFT, + ecpumap.map)) { + LOGE(ERROR, "setting vcpu soft affinity"); + rc = ERROR_FAIL; + goto out; + } + if (cpumap_soft && !libxl_bitmap_equal(cpumap_soft, &ecpumap, nr_cpus)) + LOG(DEBUG, "New soft affinity for vcpu %d contains unreachable cpus", + vcpuid); + + /* + * When setting hard affinity, it is guaranteed that the result will not + * be empty, so we need to check for that only if soft. + */ + if (cpumap_soft && libxl_bitmap_is_empty(&ecpumap)) + LOG(WARN, "New soft affinity for vcpu %d has only unreachable cpus." 
+ " Only hard affinity will be considered for scheduling", vcpuid); + + rc = 0; + out: + libxl_bitmap_dispose(&ecpumap); + GC_FREE; - return 0; + return rc; } int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, - unsigned int max_vcpus, libxl_bitmap *cpumap) + unsigned int max_vcpus, + const libxl_bitmap *cpumap_hard, + const libxl_bitmap *cpumap_soft) { + GC_INIT(ctx); int i, rc = 0; for (i = 0; i < max_vcpus; i++) { - if (libxl_set_vcpuaffinity(ctx, domid, i, cpumap)) { - LIBXL__LOG(ctx, LIBXL__LOG_WARNING, - "failed to set affinity for %d", i); + if (libxl_set_vcpuaffinity(ctx, domid, i, cpumap_hard, cpumap_soft)) { + LOG(WARN, "failed to set affinity for %d", i); rc = ERROR_FAIL; } } + + GC_FREE; return rc; } diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 2fab5ba..9864927 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -82,6 +82,14 @@ #define LIBXL_HAVE_DOMAIN_NODEAFFINITY 1 /* + * LIBXL_HAVE_SOFT_AFFINITY indicates that a ''cpumap_soft'' + * field (of libxl_bitmap type) is present in both + * libxl_domain_build_info and libxl_vcpuinfo, containing + * the soft affinity for the vcpu. + */ +#define LIBXL_HAVE_SOFT_AFFINITY 1 + +/* * LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE indicates that the * libxl_vendor_device field is present in the hvm sections of * libxl_domain_build_info.
This field tells libxl which @@ -994,9 +1002,35 @@ int libxl_userdata_retrieve(libxl_ctx *ctx, uint32_t domid, int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo); int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, - libxl_bitmap *cpumap); + const libxl_bitmap *cpumap_hard, + const libxl_bitmap *cpumap_soft); int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, - unsigned int max_vcpus, libxl_bitmap *cpumap); + unsigned int max_vcpus, + const libxl_bitmap *cpumap_hard, + const libxl_bitmap *cpumap_soft); + +#if defined (LIBXL_API_VERSION) && LIBXL_API_VERSION < 0x040400 + +static inline +int libxl_set_vcpuaffinity_0x040200(libxl_ctx *ctx, uint32_t domid, + uint32_t vcpuid, libxl_bitmap *cpumap) +{ + return libxl_set_vcpuaffinity(ctx, domid, vcpuid, cpumap, NULL); +} + +static inline +int libxl_set_vcpuaffinity_all_0x040200(libxl_ctx *ctx, uint32_t domid, + unsigned int max_vcpus, + libxl_bitmap *cpumap) +{ + return libxl_set_vcpuaffinity_all(ctx, domid, max_vcpus, cpumap, NULL); +} + +#define libxl_set_vcpuaffinity libxl_set_vcpuaffinity_0x040200 +#define libxl_set_vcpuaffinity_all libxl_set_vcpuaffinity_all_0x040200 + +#endif + int libxl_domain_set_nodeaffinity(libxl_ctx *ctx, uint32_t domid, libxl_bitmap *nodemap); int libxl_domain_get_nodeaffinity(libxl_ctx *ctx, uint32_t domid, diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index fe7ba0d..58f80df 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -193,6 +193,12 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, libxl_bitmap_set_any(&b_info->cpumap); } + if (!b_info->cpumap_soft.size) { + if (libxl_cpu_bitmap_alloc(CTX, &b_info->cpumap_soft, 0)) + return ERROR_FAIL; + libxl_bitmap_set_any(&b_info->cpumap_soft); + } + libxl_defbool_setdefault(&b_info->numa_placement, true); if (!b_info->nodemap.size) { diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 72489f8..4bfed60 100644 --- 
a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -236,7 +236,8 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid, return rc; } libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap); - libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap); + libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap, + &info->cpumap_soft); xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT); xs_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL); diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index cba8eff..48699c9 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -298,6 +298,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ ("max_vcpus", integer), ("avail_vcpus", libxl_bitmap), ("cpumap", libxl_bitmap), + ("cpumap_soft", libxl_bitmap), ("nodemap", libxl_bitmap), ("numa_placement", libxl_defbool), ("tsc_mode", libxl_tsc_mode), @@ -510,7 +511,8 @@ libxl_vcpuinfo = Struct("vcpuinfo", [ ("blocked", bool), ("running", bool), ("vcpu_time", uint64), # total vcpu time ran (ns) - ("cpumap", libxl_bitmap), # current cpu''s affinities + ("cpumap", libxl_bitmap), # current hard cpu affinity + ("cpumap_soft", libxl_bitmap), # current soft cpu affinity ], dir=DIR_OUT) libxl_physinfo = Struct("physinfo", [ diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h index b11cf28..c205dd7 100644 --- a/tools/libxl/libxl_utils.h +++ b/tools/libxl/libxl_utils.h @@ -90,7 +90,7 @@ static inline void libxl_bitmap_set_none(libxl_bitmap *bitmap) { memset(bitmap->map, 0, bitmap->size); } -static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit) +static inline int libxl_bitmap_valid(const libxl_bitmap *bitmap, int bit) { return bit >= 0 && bit < (bitmap->size * 8); } @@ -98,6 +98,29 @@ static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit) #define libxl_for_each_set_bit(v, m) for (v = 0; v < (m).size * 8; v++) \ if 
(libxl_bitmap_test(&(m), v)) +static inline int libxl_bitmap_equal(const libxl_bitmap *ba, + const libxl_bitmap *bb, + int nr_bits) +{ + int i; + + /* Only check nr_bits (all bits if <= 0) */ + nr_bits = (nr_bits <= 0) ? ba->size * 8 : nr_bits; + for (i = 0; i < nr_bits; i++) { + /* If overflowing one of the bitmaps, we call them different */ + if (!libxl_bitmap_valid(ba, i) || !libxl_bitmap_valid(bb, i)) + return 0; + if (libxl_bitmap_test(ba, i) != libxl_bitmap_test(bb, i)) + return 0; + } + return 1; +} + +static inline int libxl_bitmap_cpu_valid(libxl_bitmap *cpumap, int cpu) +{ + return libxl_bitmap_valid(cpumap, cpu); +} + int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap, int max_cpus); diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 7b4d058..17fffdd 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -2222,7 +2222,7 @@ start: } else { libxl_bitmap_set_any(&vcpu_cpumap); } - if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap)) { + if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap, NULL)) { fprintf(stderr, "setting affinity failed on vcpu `%d''.\n", i); libxl_bitmap_dispose(&vcpu_cpumap); free(vcpu_to_pcpu); @@ -4630,7 +4630,7 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) } if (vcpuid != -1) { - if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap) == -1) { + if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) { fprintf(stderr, "Could not set affinity for vcpu `%u''.\n", vcpuid); goto out; } @@ -4642,7 +4642,7 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) } for (i = 0; i < nb_vcpu; i++) { if (libxl_set_vcpuaffinity(ctx, domid, vcpuinfo[i].vcpuid, - &cpumap) == -1) { + &cpumap, NULL)) { fprintf(stderr, "libxl_set_vcpuaffinity failed" " on vcpu `%u''.\n", vcpuinfo[i].vcpuid); }
Getting happens via `xl vcpu-list'', which now looks like this: # xl vcpu-list Name ID VCPU CPU State Time(s) Affinity (Hard / Soft) Domain-0 0 0 11 -b- 5.4 8-15 / all Domain-0 0 1 11 -b- 1.0 8-15 / all Domain-0 0 14 13 -b- 1.4 8-15 / all Domain-0 0 15 8 -b- 1.6 8-15 / all vm-test 3 0 4 -b- 2.5 0-12 / 0-7 vm-test 3 1 0 -b- 3.2 0-12 / 0-7 Setting happens by adding a ''-s''/''--soft'' switch to `xl vcpu-pin''. xl manual page is updated accordingly. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- Changes from v3: * fix typos in doc, rephrased the help message and changed the title of the column for hard/soft affinity, as suggested during review. Changes from v2: * this patch folds what in v2 were patches 13 and 14; * `xl vcpu-list'' always shows both hard and soft affinity, without the need of passing ''-s''. --- docs/man/xl.pod.1 | 24 +++++++++++++++ tools/libxl/xl_cmdimpl.c | 70 +++++++++++++++++++++++++++------------------ tools/libxl/xl_cmdtable.c | 3 +- 3 files changed, 67 insertions(+), 30 deletions(-) diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1 index e7b9de2..b738ca2 100644 --- a/docs/man/xl.pod.1 +++ b/docs/man/xl.pod.1 @@ -619,7 +619,7 @@ after B<vcpu-set>, go to B<SEE ALSO> section for information. Lists VCPU information for a specific domain. If no domain is specified, VCPU information for all domains will be provided. -=item B<vcpu-pin> I<domain-id> I<vcpu> I<cpus> +=item B<vcpu-pin> [I<OPTIONS>] I<domain-id> I<vcpu> I<cpus> Pins the VCPU to only run on the specific CPUs. The keyword B<all> can be used to apply the I<cpus> list to all VCPUs in the @@ -630,6 +630,28 @@ different run state is appropriate. Pinning can be used to restrict this, by ensuring certain VCPUs can only run on certain physical CPUs. +B<OPTIONS> + +=over 4 + +=item B<-s>, B<--soft> + +The same as above, but affects I<soft affinity> rather than pinning +(also called I<hard affinity>).
+ +Normally, a VCPU just wanders among the CPUs where it is allowed to +run (either all the CPUs or the ones to which it is pinned, as said +for B<vcpu-list>). Soft affinity offers a means to specify one or +more I<preferred> CPUs. Basically, among the ones where it can run, +the VCPU will greatly prefer to execute on one of these +CPUs, whenever that is possible. + +Notice that, in order for soft affinity to actually work, it needs +special support in the scheduler. Right now, only credit1 provides +that. + +=back + =item B<vm-list> Prints information about guests. This list excludes information about diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 17fffdd..d6fda26 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -4514,8 +4514,10 @@ static void print_vcpuinfo(uint32_t tdomid, } /* TIM */ printf("%9.1f ", ((float)vcpuinfo->vcpu_time / 1e9)); - /* CPU AFFINITY */ + /* CPU HARD AND SOFT AFFINITY */ print_bitmap(vcpuinfo->cpumap.map, nr_cpus, stdout); + printf(" / "); + print_bitmap(vcpuinfo->cpumap_soft.map, nr_cpus, stdout); printf("\n"); } @@ -4550,7 +4552,8 @@ static void vcpulist(int argc, char **argv) } printf("%-32s %5s %5s %5s %5s %9s %s\n", - "Name", "ID", "VCPU", "CPU", "State", "Time(s)", "CPU Affinity"); + "Name", "ID", "VCPU", "CPU", "State", "Time(s)", + "Affinity (Hard / Soft)"); if (!argc) { if (!(dominfo = libxl_list_domain(ctx, &nb_domain))) { fprintf(stderr, "libxl_list_domain failed.\n"); @@ -4583,17 +4586,33 @@ int main_vcpulist(int argc, char **argv) return 0; } -static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) +int main_vcpupin(int argc, char **argv) { libxl_vcpuinfo *vcpuinfo; libxl_bitmap cpumap; - - uint32_t vcpuid; - char *endptr; - int i, nb_cpu, nb_vcpu, rc = -1; + uint32_t vcpuid, domid; + const char *vcpu; + char *endptr, *cpu; + int nb_cpu, nb_vcpu, rc = -1; + int opt, soft_affinity = 0; + static struct option opts[] = { + {"soft", 0, 0, ''s''}, + COMMON_LONG_OPTS, +
{0, 0, 0, 0} + }; libxl_bitmap_init(&cpumap); + SWITCH_FOREACH_OPT(opt, "s", opts, "vcpu-pin", 3) { + case ''s'': + soft_affinity = 1; + break; + } + + domid = find_domain(argv[optind]); + vcpu = argv[optind+1]; + cpu = argv[optind+2]; + vcpuid = strtoul(vcpu, &endptr, 10); if (vcpu == endptr) { if (strcmp(vcpu, "all")) { @@ -4630,23 +4649,29 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) } if (vcpuid != -1) { - if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) { - fprintf(stderr, "Could not set affinity for vcpu `%u''.\n", vcpuid); + if (!soft_affinity && + libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) { + fprintf(stderr, "Could not set hard affinity for vcpu `%u''.\n", + vcpuid); + goto out; + } else if (soft_affinity && + libxl_set_vcpuaffinity(ctx, domid, vcpuid, NULL, &cpumap)) { + fprintf(stderr, "Could not set soft affinity for vcpu `%u''.\n", + vcpuid); goto out; } } else { - if (!(vcpuinfo = libxl_list_vcpu(ctx, domid, &nb_vcpu, &i))) { + if (!(vcpuinfo = libxl_list_vcpu(ctx, domid, &nb_vcpu, &nb_cpu))) { fprintf(stderr, "libxl_list_vcpu failed.\n"); goto out; } - for (i = 0; i < nb_vcpu; i++) { - if (libxl_set_vcpuaffinity(ctx, domid, vcpuinfo[i].vcpuid, - &cpumap, NULL)) { - fprintf(stderr, "libxl_set_vcpuaffinity failed" - " on vcpu `%u''.\n", vcpuinfo[i].vcpuid); - } - } + if (!soft_affinity && + libxl_set_vcpuaffinity_all(ctx, domid, nb_vcpu, &cpumap, NULL)) + fprintf(stderr, "Could not set hard affinity.\n"); + else if (soft_affinity && + libxl_set_vcpuaffinity_all(ctx, domid, nb_vcpu, NULL, &cpumap)) + fprintf(stderr, "Could not set soft affinity.\n"); libxl_vcpuinfo_list_free(vcpuinfo, nb_vcpu); } @@ -4656,17 +4681,6 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) return rc; } -int main_vcpupin(int argc, char **argv) -{ - int opt; - - SWITCH_FOREACH_OPT(opt, "", NULL, "vcpu-pin", 3) { - /* No options */ - } - - return vcpupin(find_domain(argv[optind]), argv[optind+1] , 
argv[optind+2]); -} - static void vcpuset(uint32_t domid, const char* nr_vcpus, int check_host) { char *endptr; diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c index ebe0220..8aee343 100644 --- a/tools/libxl/xl_cmdtable.c +++ b/tools/libxl/xl_cmdtable.c @@ -213,7 +213,8 @@ struct cmd_spec cmd_table[] = { { "vcpu-pin", &main_vcpupin, 1, 1, "Set which CPUs a VCPU can use", - "<Domain> <VCPU|all> <CPUs|all>", + "[option] <Domain> <VCPU|all> <CPUs|all>", + "-s, --soft Set soft affinity", }, { "vcpu-set", &main_vcpuset, 0, 1,
Dario Faggioli
2013-Nov-22 18:58 UTC
[PATCH v4 14/15] xl: enable for specifying node-affinity in the config file
in a similar way to how it is possible to specify vcpu-affinity. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- Changes from v3: * fix typos and language issues in docs and comments, as suggested during review; * common code to soft and hard affinity parsing factored together, as requested during review. Changes from v2: * use the new libxl API. Although the implementation changed only a little bit, I removed IanJ''s Acked-by, although I am here saying that he did provide it, as requested. --- docs/man/xl.cfg.pod.5 | 25 +++++- tools/libxl/xl_cmdimpl.c | 192 ++++++++++++++++++++++++++++++---------------- 2 files changed, 146 insertions(+), 71 deletions(-) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index 5e17b5d..9566393 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -144,19 +144,38 @@ run on cpu #3 of the host. =back If this option is not specified, no vcpu to cpu pinning is established, -and the vcpus of the guest can run on all the cpus of the host. +and the vcpus of the guest can run on all the cpus of the host. If this +option is specified, the intersection of the vcpu pinning mask, provided +here, and the soft affinity mask, provided via B<cpus\_soft=> (if any), +is utilized to compute the domain node-affinity, for driving memory +allocations. If we are on a NUMA machine (i.e., if the host has more than one NUMA node) and this option is not specified, libxl automatically tries to place the guest on the least possible number of nodes. That, however, will not affect vcpu pinning, so the guest will still be able to run on -all the cpus, it will just prefer the ones from the node it has been -placed on. A heuristic approach is used for choosing the best node (or +all the cpus. A heuristic approach is used for choosing the best node (or set of nodes), with the goals of maximizing performance for the guest and, at the same time, achieving efficient utilization of host cpus and memory.
See F<docs/misc/xl-numa-placement.markdown> for more details. +=item B<cpus_soft="CPU-LIST"> + +Exactly as B<cpus=>, but specifies soft affinity, rather than pinning +(also called hard affinity). When using the credit scheduler, this +means the vcpus of the domain prefer to run on these pcpus. + +A C<CPU-LIST> is specified exactly as above, for B<cpus=>. + +If this option is not specified, the vcpus of the guest will not have +any preference regarding what cpus to run on, and the scheduler will +treat all the cpus where a vcpu can execute (if B<cpus=> is specified), +or all the host cpus (if not), the same. If this option is specified, +the intersection of the soft affinity mask, provided here, and the vcpu +pinning, provided via B<cpus=> (if any), is utilized to compute the +domain node-affinity, for driving memory allocations. + =back =head3 CPU Scheduling diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index d6fda26..abd3e42 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -76,8 +76,9 @@ xlchild children[child_max]; static const char *common_domname; static int fd_lock = -1; -/* Stash for specific vcpu to pcpu mappping */ +/* Stash for specific vcpu to pcpu hard and soft mapping */ static int *vcpu_to_pcpu; +static int *vcpu_to_pcpu_soft; static const char savefileheader_magic[32] = "Xen saved domain, xl format\n \0 \r"; @@ -687,6 +688,64 @@ static int vcpupin_parse(char *cpu, libxl_bitmap *cpumap) return rc; } +static int *parse_config_cpumap_list(XLU_ConfigList *cpus, + libxl_bitmap *cpumap, + int max_vcpus) +{ + int i, n_cpus = 0; + int *to_pcpu; + const char *buf; + + if (libxl_cpu_bitmap_alloc(ctx, cpumap, 0)) { + fprintf(stderr, "Unable to allocate cpumap\n"); + exit(1); + } + + /* Prepare the array for single vcpu to pcpu mappings */ + to_pcpu = xmalloc(sizeof(int) * max_vcpus); + memset(to_pcpu, -1, sizeof(int) * max_vcpus); + + /* + * Idea here is to let libxl think all the domain''s vcpus + * have cpu affinity
with all the pcpus on the list. Doing + * that ensures memory is allocated on the proper NUMA nodes. + * It is then us, here in xl, that matches each single vcpu + * to its pcpu (and that''s why we need to stash such info in + * the to_pcpu array now) after the domain has been created. + * This way, we avoid having to pass to libxl some big array + * hosting the single mappings. + */ + libxl_bitmap_set_none(cpumap); + while ((buf = xlu_cfg_get_listitem(cpus, n_cpus)) != NULL) { + i = atoi(buf); + if (!libxl_bitmap_cpu_valid(cpumap, i)) { + fprintf(stderr, "cpu %d illegal\n", i); + exit(1); + } + libxl_bitmap_set(cpumap, i); + if (n_cpus < max_vcpus) + to_pcpu[n_cpus] = i; + n_cpus++; + } + + return to_pcpu; +} + +static void parse_config_cpumap_string(const char *buf, libxl_bitmap *cpumap) +{ + char *buf2 = strdup(buf); + + if (libxl_cpu_bitmap_alloc(ctx, cpumap, 0)) { + fprintf(stderr, "Unable to allocate cpumap\n"); + exit(1); + } + + libxl_bitmap_set_none(cpumap); + if (vcpupin_parse(buf2, cpumap)) + exit(1); + free(buf2); +} + static void parse_config_data(const char *config_source, const char *config_data, int config_len, @@ -697,7 +756,8 @@ static void parse_config_data(const char *config_source, const char *buf; long l; XLU_Config *config; - XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms; + XLU_ConfigList *cpus, *cpus_soft, *vbds, *nics, *pcis; + XLU_ConfigList *cvfbs, *cpuids, *vtpms; XLU_ConfigList *ioports, *irqs, *iomem; int num_ioports, num_irqs, num_iomem; int pci_power_mgmt = 0; @@ -820,57 +880,24 @@ static void parse_config_data(const char *config_source, b_info->max_vcpus = l; if (!xlu_cfg_get_list (config, "cpus", &cpus, 0, 1)) { - int n_cpus = 0; - - if (libxl_cpu_bitmap_alloc(ctx, &b_info->cpumap, 0)) { - fprintf(stderr, "Unable to allocate cpumap\n"); - exit(1); - } - - /* Prepare the array for single vcpu to pcpu mappings */ - vcpu_to_pcpu = xmalloc(sizeof(int) * b_info->max_vcpus); - memset(vcpu_to_pcpu, -1, sizeof(int) * 
b_info->max_vcpus); - - /* - * Idea here is to let libxl think all the domain''s vcpus - * have cpu affinity with all the pcpus on the list. - * It is then us, here in xl, that matches each single vcpu - * to its pcpu (and that''s why we need to stash such info in - * the vcpu_to_pcpu array now) after the domain has been created. - * Doing it like this saves the burden of passing to libxl - * some big array hosting the single mappings. Also, using - * the cpumap derived from the list ensures memory is being - * allocated on the proper nodes anyway. - */ - libxl_bitmap_set_none(&b_info->cpumap); - while ((buf = xlu_cfg_get_listitem(cpus, n_cpus)) != NULL) { - i = atoi(buf); - if (!libxl_bitmap_cpu_valid(&b_info->cpumap, i)) { - fprintf(stderr, "cpu %d illegal\n", i); - exit(1); - } - libxl_bitmap_set(&b_info->cpumap, i); - if (n_cpus < b_info->max_vcpus) - vcpu_to_pcpu[n_cpus] = i; - n_cpus++; - } - + vcpu_to_pcpu = parse_config_cpumap_list(cpus, &b_info->cpumap, + b_info->max_vcpus); /* We have a cpumap, disable automatic placement */ libxl_defbool_set(&b_info->numa_placement, false); } else if (!xlu_cfg_get_string (config, "cpus", &buf, 0)) { - char *buf2 = strdup(buf); - - if (libxl_cpu_bitmap_alloc(ctx, &b_info->cpumap, 0)) { - fprintf(stderr, "Unable to allocate cpumap\n"); - exit(1); - } - - libxl_bitmap_set_none(&b_info->cpumap); - if (vcpupin_parse(buf2, &b_info->cpumap)) - exit(1); - free(buf2); + parse_config_cpumap_string(buf, &b_info->cpumap); + libxl_defbool_set(&b_info->numa_placement, false); + } + if (!xlu_cfg_get_list (config, "cpus_soft", &cpus_soft, 0, 1)) { + vcpu_to_pcpu_soft = + parse_config_cpumap_list(cpus_soft, &b_info->cpumap_soft, + b_info->max_vcpus); + libxl_defbool_set(&b_info->numa_placement, false); + } + else if (!xlu_cfg_get_string (config, "cpus_soft", &buf, 0)) { + parse_config_cpumap_string(buf, &b_info->cpumap_soft); libxl_defbool_set(&b_info->numa_placement, false); } @@ -1991,6 +2018,40 @@ static void
evdisable_disk_ejects(libxl_evgen_disk_eject **diskws, } } +static inline int set_vcpu_to_pcpu_affinity(uint32_t domid, int *to_pcpu, + int max_vcpus, int soft) +{ + libxl_bitmap vcpu_cpumap; + int i, ret = 0; + + ret = libxl_cpu_bitmap_alloc(ctx, &vcpu_cpumap, 0); + if (ret) + return -1; + + for (i = 0; i < max_vcpus; i++) { + if (to_pcpu[i] != -1) { + libxl_bitmap_set_none(&vcpu_cpumap); + libxl_bitmap_set(&vcpu_cpumap, to_pcpu[i]); + } else { + libxl_bitmap_set_any(&vcpu_cpumap); + } + if (!soft && + libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap, NULL)) { + fprintf(stderr, "setting hard affinity failed on vcpu `%d'.\n", i); + ret = -1; + break; + } else if (soft && + libxl_set_vcpuaffinity(ctx, domid, i, NULL, &vcpu_cpumap)) { + fprintf(stderr, "setting soft affinity failed on vcpu `%d'.\n", i); + ret = -1; + break; + } + } + libxl_bitmap_dispose(&vcpu_cpumap); + + return ret; +} + static uint32_t create_domain(struct domain_create *dom_info) { uint32_t domid = INVALID_DOMID; @@ -2207,31 +2268,26 @@ start: if ( ret ) goto error_out; - /* If single vcpu to pcpu mapping was requested, honour it */ + /* If single vcpu pinning or soft affinity was requested, honour it */ if (vcpu_to_pcpu) { - libxl_bitmap vcpu_cpumap; + ret = set_vcpu_to_pcpu_affinity(domid, vcpu_to_pcpu, + d_config.b_info.max_vcpus, 0); + free(vcpu_to_pcpu); - ret = libxl_cpu_bitmap_alloc(ctx, &vcpu_cpumap, 0); if (ret) goto error_out; - for (i = 0; i < d_config.b_info.max_vcpus; i++) { - if (vcpu_to_pcpu[i] != -1) { - libxl_bitmap_set_none(&vcpu_cpumap); - libxl_bitmap_set(&vcpu_cpumap, vcpu_to_pcpu[i]); - } else { - libxl_bitmap_set_any(&vcpu_cpumap); - } - if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap, NULL)) { - fprintf(stderr, "setting affinity failed on vcpu `%d'.\n", i); - libxl_bitmap_dispose(&vcpu_cpumap); - free(vcpu_to_pcpu); - ret = ERROR_FAIL; - goto error_out; - } - } - libxl_bitmap_dispose(&vcpu_cpumap); - free(vcpu_to_pcpu); vcpu_to_pcpu = NULL; + vcpu_to_pcpu =
NULL; + } + if (vcpu_to_pcpu_soft) { + ret = set_vcpu_to_pcpu_affinity(domid, vcpu_to_pcpu_soft, + d_config.b_info.max_vcpus, 1); + free(vcpu_to_pcpu_soft); + + if (ret) + goto error_out; + + vcpu_to_pcpu_soft = NULL; } ret = libxl_userdata_store(ctx, domid, "xl",
Dario Faggioli
2013-Nov-22 18:58 UTC
[PATCH v4 15/15] libxl: automatic NUMA placement affects soft affinity
vCPU soft affinity and NUMA-aware scheduling do not have to be related. However, soft affinity is how NUMA-aware scheduling is actually implemented, and therefore, by default, the results of automatic NUMA placement (at VM creation time) are also used to set the soft affinity of all the vCPUs of the domain. Of course, this only happens if automatic NUMA placement is enabled and actually takes place (for instance, if the user does not specify any hard or soft affinity in the xl config file). This also takes care of the vice versa, i.e., don't trigger automatic placement if the config file specifies either a hard (the check for which was already there) or a soft (the check for which is introduced by this commit) affinity. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> --- Changes from v3: * rephrase comments and docs, as suggested during review. --- docs/man/xl.cfg.pod.5 | 21 +++++++++++---------- docs/misc/xl-numa-placement.markdown | 14 ++++++++++++-- tools/libxl/libxl_dom.c | 20 ++++++++++++++++++-- 3 files changed, 41 insertions(+), 14 deletions(-) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index 9566393..f210201 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -150,16 +150,6 @@ here, and the soft affinity mask, provided via B<cpus\_soft=> (if any), is utilized to compute the domain node-affinity, for driving memory allocations. -If we are on a NUMA machine (i.e., if the host has more than one NUMA -node) and this option is not specified, libxl automatically tries to -place the guest on the least possible number of nodes. That, however, -will not affect vcpu pinning, so the guest will still be able to run on -all the cpus. A heuristic approach is used for choosing the best node (or -set of nodes), with the goals of maximizing performance for the guest -and, at the same time, achieving efficient utilization of host cpus -and memory.
See F<docs/misc/xl-numa-placement.markdown> for more -details. - =item B<cpus_soft="CPU-LIST"> Exactly as B<cpus=>, but specifies soft affinity, rather than pinning @@ -176,6 +166,17 @@ the intersection of the soft affinity mask, provided here, and the vcpu pinning, provided via B<cpus=> (if any), is utilized to compute the domain node-affinity, for driving memory allocations. +If this option is not specified (and B<cpus=> is not specified either), +libxl automatically tries to place the guest on the least possible +number of nodes. A heuristic approach is used for choosing the best +node (or set of nodes), with the goal of maximizing performance for +the guest and, at the same time, achieving efficient utilization of +host cpus and memory. In that case, the soft affinity of all the vcpus +of the domain will be set to the pcpus belonging to the NUMA nodes +chosen during placement. + +For more details, see F<docs/misc/xl-numa-placement.markdown>. + =back =head3 CPU Scheduling diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown index b1ed361..09ae95e 100644 --- a/docs/misc/xl-numa-placement.markdown +++ b/docs/misc/xl-numa-placement.markdown @@ -126,10 +126,20 @@ or Xen won't be able to guarantee the locality for their memory accesses. That, of course, also means the vCPUs of the domain will only be able to execute on those same pCPUs. +It is also possible to have a "cpus\_soft=" option in the xl config file, +to specify the soft affinity for all the vCPUs of the domain. This affects +the NUMA placement in the following way: + + * if only "cpus\_soft=" is present, the VM's node-affinity will be equal + to the nodes to which the pCPUs in the soft affinity mask belong; + * if both "cpus\_soft=" and "cpus=" are present, the VM's node-affinity + will be equal to the nodes to which the pCPUs present both in hard and + soft affinity belong.
+ ### Placing the guest automatically ### -If no "cpus=" option is specified in the config file, libxl tries -to figure out on its own on which node(s) the domain could fit best. +If neither "cpus=" nor "cpus\_soft=" is present in the config file, libxl +tries to figure out on its own on which node(s) the domain could fit best. If it finds one (some), the domain's node affinity get set to there, and both memory allocations and NUMA aware scheduling (for the credit scheduler and starting from Xen 4.3) will comply with it. Starting from diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 4bfed60..ceb8643 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -222,18 +222,34 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid, * some weird error manifests) the subsequent call to * libxl_domain_set_nodeaffinity() will do the actual placement, * whatever that turns out to be. + * + * As far as scheduling is concerned, we achieve NUMA-aware scheduling + * by having the results of placement affect the soft affinity of all + * the vcpus of the domain. Of course, we want that iff placement is + * enabled and actually happens, so we only change info->cpumap_soft to + * reflect the placement result if that is the case. */ if (libxl_defbool_val(info->numa_placement)) { - if (!libxl_bitmap_is_full(&info->cpumap)) { + /* We require both hard and soft affinity not to be set */ + if (!libxl_bitmap_is_full(&info->cpumap) || + !libxl_bitmap_is_full(&info->cpumap_soft)) { LOG(ERROR, "Can run NUMA placement only if no vcpu " - "affinity is specified"); + "(hard or soft) affinity is specified"); return ERROR_INVAL; } rc = numa_place_domain(gc, domid, info); if (rc) return rc; + + /* + * We change the soft affinity in domain_build_info here, of course + * after converting the result of placement from nodes to cpus. The + * following call to libxl_set_vcpuaffinity_all() will do the + * actual updating of the domain's vcpus' soft affinity.
+ */ + libxl_nodemap_to_cpumap(ctx, &info->nodemap, &info->cpumap_soft); } libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap); libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap,
George Dunlap
2013-Nov-25 17:26 UTC
Re: [PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
On 11/22/2013 06:56 PM, Dario Faggioli wrote:> as well as both error handling and logging in libxl_cpu_bitmap_alloc > and libxl_node_bitmap_alloc. > > Now libxl_get_max_{cpus,nodes} either return a positive number, or > a libxl error code. Thanks to that, it is possible to fix logging for > the two bitmap allocation functions, which now happens _inside_ the > functions themselves, and report what happens more accurately. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
On 11/22/2013 06:58 PM, Dario Faggioli wrote:> Make space for two new cpumap-s, one in vcpu_info (for getting > soft affinity) and build_info (for setting it) and amend the > API for setting vCPU affinity. > > libxl_set_vcpuaffinity() now takes two cpumaps, one for hard > and one for soft affinity (LIBXL_API_VERSION is exploited to > retain source level backward compatibility). Either of the > two cpumaps can be NULL, in which case, only the affinity > corresponding to the non-NULL cpumap will be affected. > > Getting soft affinity happens indirectly, via `xl vcpu-list' > (as it is already for hard affinity). > > This commit also introduces some logic to check whether the > affinity which will be used by Xen to schedule the vCPU(s) > does actually match with the cpumaps provided. In fact, we > want to allow every possible combination of hard and soft > affinity to be set, but we warn the user upon particularly > weird situations (e.g., hard and soft being disjoint sets > of pCPUs). > > This is the first change breaking the libxl ABI, so it bumps > the MAJOR. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Looks good: Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>> --- > Changes from v3: > * only introduce one LIBXL_HAVE_ symbol for soft affinity, > as requested during review; > * use LIBXL_API_VERSION instead of having multiple versions > of the same function, as suggested during review; > * use libxl_get_nr_cpus() rather than libxl_get_cputopology(), > as suggested during review; > * use LOGE() instead of LIBXL__LOG_ERRNO(), as requested > during review; > * kill the flags and use just one _set_vcpuaffinity() > function with two cpumaps, allowing either of them to > be NULL, as suggested during review; > * avoid overflowing the bitmaps in libxl_bitmap_equal(), > as suggested during review. > > Changes from v2: > * interface completely redesigned, as discussed during > review.
> --- > tools/libxl/Makefile | 2 + > tools/libxl/libxl.c | 79 ++++++++++++++++++++++++++++++++++++++----- > tools/libxl/libxl.h | 38 ++++++++++++++++++++- > tools/libxl/libxl_create.c | 6 +++ > tools/libxl/libxl_dom.c | 3 +- > tools/libxl/libxl_types.idl | 4 ++ > tools/libxl/libxl_utils.h | 25 +++++++++++++- > tools/libxl/xl_cmdimpl.c | 6 ++- > 8 files changed, 145 insertions(+), 18 deletions(-) > > diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile > index d8495bb..1410c44 100644 > --- a/tools/libxl/Makefile > +++ b/tools/libxl/Makefile > @@ -5,7 +5,7 @@ > XEN_ROOT = $(CURDIR)/../.. > include $(XEN_ROOT)/tools/Rules.mk > > -MAJOR = 4.3 > +MAJOR = 4.4 > MINOR = 0 > > XLUMAJOR = 4.3 > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c > index 961d55e..4b871d7 100644 > --- a/tools/libxl/libxl.c > +++ b/tools/libxl/libxl.c > @@ -4538,6 +4538,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, > for (*nb_vcpu = 0; *nb_vcpu <= domaininfo.max_vcpu_id; ++*nb_vcpu, ++ptr) { > if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0)) > return NULL; > + if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap_soft, 0)) > + return NULL; > if (xc_vcpu_getinfo(ctx->xch, domid, *nb_vcpu, &vcpuinfo) == -1) { > LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu info"); > return NULL; > @@ -4548,6 +4550,12 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, > LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity"); > return NULL; > } > + if (xc_vcpu_getaffinity(ctx->xch, domid, *nb_vcpu, > + XEN_VCPUAFFINITY_SOFT, > + ptr->cpumap_soft.map) == -1) { > + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity"); > + return NULL; > + } > ptr->vcpuid = *nb_vcpu; > ptr->cpu = vcpuinfo.cpu; > ptr->online = !!vcpuinfo.online; > @@ -4559,28 +4567,81 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, > } > > int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, > - libxl_bitmap *cpumap) > + const libxl_bitmap 
*cpumap_hard, > + const libxl_bitmap *cpumap_soft) > { > - if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map, > - XEN_VCPUAFFINITY_HARD, NULL)) { > - LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting vcpu affinity"); > - return ERROR_FAIL; > + GC_INIT(ctx); > + libxl_bitmap ecpumap; > + int nr_cpus = 0, rc; > + > + if (!cpumap_hard && !cpumap_soft) > + return ERROR_INVAL; > + > + rc = libxl_cpu_bitmap_alloc(ctx, &ecpumap, 0); > + if (rc) > + return rc; > + > + nr_cpus = libxl_get_nr_cpus(ctx); > + if (nr_cpus <= 0) { > + rc = ERROR_FAIL; > + goto out; > } > + > + if (cpumap_hard && xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, > + cpumap_hard->map, > + XEN_VCPUAFFINITY_HARD, > + ecpumap.map)) { > + LOGE(ERROR, "setting vcpu hard affinity"); > + rc = ERROR_FAIL; > + goto out; > + } > + if (cpumap_hard && !libxl_bitmap_equal(cpumap_hard, &ecpumap, nr_cpus)) > + LOG(DEBUG, "New hard affinity for vcpu %d contains unreachable cpus", > + vcpuid); > + > + if (cpumap_soft && xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, > + cpumap_soft->map, > + XEN_VCPUAFFINITY_SOFT, > + ecpumap.map)) { > + LOGE(ERROR, "setting vcpu soft affinity"); > + rc = ERROR_FAIL; > + goto out; > + } > + if (cpumap_soft && !libxl_bitmap_equal(cpumap_soft, &ecpumap, nr_cpus)) > + LOG(DEBUG, "New soft affinity for vcpu %d contains unreachable cpus", > + vcpuid); > + > + /* > + * When setting hard affinity, it is guaranteed that the result will not > + * be empty, so we need to check for that only if soft. > + */ > + if (cpumap_soft && libxl_bitmap_is_empty(&ecpumap)) > + LOG(WARN, "New soft affinity for vcpu %d has only unreachable cpus." 
> + " Only hard affinity will be considered for scheduling", vcpuid); > + > + rc = 0; > + out: > + libxl_bitmap_dispose(&ecpumap); > + GC_FREE; > + return rc; > } > > int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, > - unsigned int max_vcpus, libxl_bitmap *cpumap) > + unsigned int max_vcpus, > + const libxl_bitmap *cpumap_hard, > + const libxl_bitmap *cpumap_soft) > { > + GC_INIT(ctx); > int i, rc = 0; > > for (i = 0; i < max_vcpus; i++) { > - if (libxl_set_vcpuaffinity(ctx, domid, i, cpumap)) { > - LIBXL__LOG(ctx, LIBXL__LOG_WARNING, > - "failed to set affinity for %d", i); > + if (libxl_set_vcpuaffinity(ctx, domid, i, cpumap_hard, cpumap_soft)) { > + LOG(WARN, "failed to set affinity for %d", i); > rc = ERROR_FAIL; > } > } > + > + GC_FREE; > return rc; > } > > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > index 2fab5ba..9864927 100644 > --- a/tools/libxl/libxl.h > +++ b/tools/libxl/libxl.h > @@ -82,6 +82,14 @@ > #define LIBXL_HAVE_DOMAIN_NODEAFFINITY 1 > > /* > + * LIBXL_HAVE_SOFT_AFFINITY indicates that a 'cpumap_soft' > + * field (of libxl_bitmap type) is present in both > + * libxl_domain_build_info and libxl_vcpuinfo, containing > + * the soft affinity for the vcpu. > + */ > +#define LIBXL_HAVE_SOFT_AFFINITY 1 > + > +/* > * LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE indicates that the > * libxl_vendor_device field is present in the hvm sections of > * libxl_domain_build_info.
This field tells libxl which > @@ -994,9 +1002,35 @@ int libxl_userdata_retrieve(libxl_ctx *ctx, uint32_t domid, > > int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo); > int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, > - libxl_bitmap *cpumap); > + const libxl_bitmap *cpumap_hard, > + const libxl_bitmap *cpumap_soft); > int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, > - unsigned int max_vcpus, libxl_bitmap *cpumap); > + unsigned int max_vcpus, > + const libxl_bitmap *cpumap_hard, > + const libxl_bitmap *cpumap_soft); > + > +#if defined (LIBXL_API_VERSION) && LIBXL_API_VERSION < 0x040400 > + > +static inline > +int libxl_set_vcpuaffinity_0x040200(libxl_ctx *ctx, uint32_t domid, > + uint32_t vcpuid, libxl_bitmap *cpumap) > +{ > + return libxl_set_vcpuaffinity(ctx, domid, vcpuid, cpumap, NULL); > +} > + > +static inline > +int libxl_set_vcpuaffinity_all_0x040200(libxl_ctx *ctx, uint32_t domid, > + unsigned int max_vcpus, > + libxl_bitmap *cpumap) > +{ > + return libxl_set_vcpuaffinity_all(ctx, domid, max_vcpus, cpumap, NULL); > +} > + > +#define libxl_set_vcpuaffinity libxl_set_vcpuaffinity_0x040200 > +#define libxl_set_vcpuaffinity_all libxl_set_vcpuaffinity_all_0x040200 > + > +#endif > + > int libxl_domain_set_nodeaffinity(libxl_ctx *ctx, uint32_t domid, > libxl_bitmap *nodemap); > int libxl_domain_get_nodeaffinity(libxl_ctx *ctx, uint32_t domid, > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > index fe7ba0d..58f80df 100644 > --- a/tools/libxl/libxl_create.c > +++ b/tools/libxl/libxl_create.c > @@ -193,6 +193,12 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, > libxl_bitmap_set_any(&b_info->cpumap); > } > > + if (!b_info->cpumap_soft.size) { > + if (libxl_cpu_bitmap_alloc(CTX, &b_info->cpumap_soft, 0)) > + return ERROR_FAIL; > + libxl_bitmap_set_any(&b_info->cpumap_soft); > + } > + > libxl_defbool_setdefault(&b_info->numa_placement, true); > > if (!b_info->nodemap.size) { > 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c > index 72489f8..4bfed60 100644 > --- a/tools/libxl/libxl_dom.c > +++ b/tools/libxl/libxl_dom.c > @@ -236,7 +236,8 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid, > return rc; > } > libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap); > - libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap); > + libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap, > + &info->cpumap_soft); > > xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT); > xs_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL); > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl > index cba8eff..48699c9 100644 > --- a/tools/libxl/libxl_types.idl > +++ b/tools/libxl/libxl_types.idl > @@ -298,6 +298,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ > ("max_vcpus", integer), > ("avail_vcpus", libxl_bitmap), > ("cpumap", libxl_bitmap), > + ("cpumap_soft", libxl_bitmap), > ("nodemap", libxl_bitmap), > ("numa_placement", libxl_defbool), > ("tsc_mode", libxl_tsc_mode), > @@ -510,7 +511,8 @@ libxl_vcpuinfo = Struct("vcpuinfo", [ > ("blocked", bool), > ("running", bool), > ("vcpu_time", uint64), # total vcpu time ran (ns) > - ("cpumap", libxl_bitmap), # current cpu''s affinities > + ("cpumap", libxl_bitmap), # current hard cpu affinity > + ("cpumap_soft", libxl_bitmap), # current soft cpu affinity > ], dir=DIR_OUT) > > libxl_physinfo = Struct("physinfo", [ > diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h > index b11cf28..c205dd7 100644 > --- a/tools/libxl/libxl_utils.h > +++ b/tools/libxl/libxl_utils.h > @@ -90,7 +90,7 @@ static inline void libxl_bitmap_set_none(libxl_bitmap *bitmap) > { > memset(bitmap->map, 0, bitmap->size); > } > -static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit) > +static inline int libxl_bitmap_valid(const libxl_bitmap *bitmap, int bit) > { > return bit >= 0 && bit < 
(bitmap->size * 8); > } > @@ -98,6 +98,29 @@ static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit) > #define libxl_for_each_set_bit(v, m) for (v = 0; v < (m).size * 8; v++) \ > if (libxl_bitmap_test(&(m), v)) > > +static inline int libxl_bitmap_equal(const libxl_bitmap *ba, > + const libxl_bitmap *bb, > + int nr_bits) > +{ > + int i; > + > + /* Only check nr_bits (all bits if <= 0) */ > + nr_bits = (nr_bits <= 0) ? ba->size * 8 : nr_bits; > + for (i = 0; i < nr_bits; i++) { > + /* If overflowing one of the bitmaps, we call them different */ > + if (!libxl_bitmap_valid(ba, i) || !libxl_bitmap_valid(bb, i)) > + return 0; > + if (libxl_bitmap_test(ba, i) != libxl_bitmap_test(bb, i)) > + return 0; > + } > + return 1; > +} > + > +static inline int libxl_bitmap_cpu_valid(libxl_bitmap *cpumap, int cpu) > +{ > + return libxl_bitmap_valid(cpumap, cpu); > +} > + > int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, > libxl_bitmap *cpumap, > int max_cpus); > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c > index 7b4d058..17fffdd 100644 > --- a/tools/libxl/xl_cmdimpl.c > +++ b/tools/libxl/xl_cmdimpl.c > @@ -2222,7 +2222,7 @@ start: > } else { > libxl_bitmap_set_any(&vcpu_cpumap); > } > - if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap)) { > + if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap, NULL)) { > fprintf(stderr, "setting affinity failed on vcpu `%d''.\n", i); > libxl_bitmap_dispose(&vcpu_cpumap); > free(vcpu_to_pcpu); > @@ -4630,7 +4630,7 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) > } > > if (vcpuid != -1) { > - if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap) == -1) { > + if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) { > fprintf(stderr, "Could not set affinity for vcpu `%u''.\n", vcpuid); > goto out; > } > @@ -4642,7 +4642,7 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) > } > for (i = 0; i < nb_vcpu; i++) { > if (libxl_set_vcpuaffinity(ctx, domid, 
vcpuinfo[i].vcpuid, > - &cpumap) == -1) { > + &cpumap, NULL)) { > fprintf(stderr, "libxl_set_vcpuaffinity failed" > " on vcpu `%u''.\n", vcpuinfo[i].vcpuid); > } >
Jan Beulich
2013-Nov-27 13:11 UTC
Re: [PATCH v4 10/15] xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
>>> On 22.11.13 at 19:57, Dario Faggioli <dario.faggioli@citrix.com> wrote: > + /* > + * Report back to the caller what the "effective affinity", that > + * is the intersection of cpupool''s pcpus, the (new?) hard > + * affinity and the (new?) soft-affinity. > + */ > + if ( !guest_handle_is_null(op->u.vcpuaffinity.eff_cpumap.bitmap) ) > { > - ret = vcpu_set_affinity(v, new_affinity); > - free_cpumask_var(new_affinity); > + online = cpupool_online_cpumask(v->domain->cpupool); > + cpumask_and(new_affinity, online, v->cpu_hard_affinity); > + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT) > + cpumask_and(new_affinity, new_affinity, > + v->cpu_soft_affinity); > + > + ret = cpumask_to_xenctl_bitmap( > + &op->u.vcpuaffinity.eff_cpumap, new_affinity);So with both flags set, how is the caller supposed to know what hard affinity is now in effect? I said on the previous version already that with you _having_ two CPU masks, you should return both.> else > { > + cpumask_var_t affinity; > + > + /* > + * If the caller asks for both _HARD and _SOFT, what we return > + * is the intersection of hard and soft affinity for the vcpu. > + */ > + if ( !alloc_cpumask_var(&affinity) ) > + { > + ret = -ENOMEM; > + break; > + } > + cpumask_setall(affinity); > + > + if ( op->u.vcpuaffinity.flags == 0 ) > + { > + ret = -EINVAL; > + free_cpumask_var(affinity); > + break; > + } > + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_HARD ) > + cpumask_copy(affinity, v->cpu_hard_affinity); > + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT ) > + cpumask_and(affinity, affinity, v->cpu_soft_affinity); > + > ret = cpumask_to_xenctl_bitmap( > - &op->u.vcpuaffinity.cpumap, v->cpu_hard_affinity); > + &op->u.vcpuaffinity.cpumap, affinity);Similarly here. 
Of course, the name "eff_cpumap" then is likely not really suitable.> --- a/xen/include/public/domctl.h > +++ b/xen/include/public/domctl.h > @@ -300,8 +300,19 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_nodeaffinity_t); > /* XEN_DOMCTL_setvcpuaffinity */ > /* XEN_DOMCTL_getvcpuaffinity */ > struct xen_domctl_vcpuaffinity { > - uint32_t vcpu; /* IN */ > - struct xenctl_bitmap cpumap; /* IN/OUT */ > + /* IN variables. */ > + uint32_t vcpu; > + /* Set/get the hard affinity for vcpu */ > +#define _XEN_VCPUAFFINITY_HARD 0 > +#define XEN_VCPUAFFINITY_HARD (1U<<_XEN_VCPUAFFINITY_HARD) > + /* Set/get the soft affinity for vcpu */ > +#define _XEN_VCPUAFFINITY_SOFT 1 > +#define XEN_VCPUAFFINITY_SOFT (1U<<_XEN_VCPUAFFINITY_SOFT) > + uint32_t flags; > + /* IN/OUT variables. */In further revisions, please make these annotations correctly reflect what your patch does: In its current shape, this is an IN for set and an OUT for get, ...> + struct xenctl_bitmap cpumap; > + /* OUT variables. */... while this is an OUT for set, and unused for get (and I also said earlier that _if_ this remains unused for one of the two operations, then sharing the interface structure wouldn't be appropriate anymore). Jan
Ian Campbell
2013-Nov-27 13:45 UTC
Re: [PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
On Fri, 2013-11-22 at 19:56 +0100, Dario Faggioli wrote:> @@ -4351,7 +4349,7 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out) > int max_cpus; > > max_cpus = libxl_get_max_cpus(ctx); > - if (max_cpus == 0) > + if (max_cpus <= 0)Doesn't this slightly odd error condition (which includes 0 as an error) actually stem from xc_get_max_cpus's broken handling of the case where xc_physinfo failed? It'd be better IMHO to make sure that this call chain ultimately returns either a negative error or a positive number of cpus and never 0. I'd be inclined to do that in xc_get_max_cpus. I suppose then xc_get_{cpu,node}map_size would need error handling too. Handling it in libxl_get_max_cpus might be reasonable too. That function really ought to be returning libxl error codes, not -1 and errno as it currently does. If xc_physinfo is failing it seems unlikely that anything subsequent to that is going to make much useful progress.> @@ -4538,10 +4536,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, > } > > for (*nb_vcpu = 0; *nb_vcpu <= domaininfo.max_vcpu_id; ++*nb_vcpu, ++ptr) { > - if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0)) { > - LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "allocating cpumap"); > + if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0)) > return NULL;Isn't this leaking ret == ptr here and everywhere else this returns?> - } > if (xc_vcpu_getinfo(ctx->xch, domid, *nb_vcpu, &vcpuinfo) == -1) { > LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu info"); > return NULL; > @@ -5304,7 +5300,7 @@ int libxl_get_freecpus(libxl_ctx *ctx, libxl_bitmap *cpumap) > int ncpus; > > ncpus = libxl_get_max_cpus(ctx); > - if (ncpus == 0) > + if (ncpus <= 0) > return ERROR_FAIL;If libxl_get_max_cpus returned a proper rc you could propagate it.> > cpumap->map = xc_cpupool_freeinfo(ctx->xch); > diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c > index 9f5f589..1815422 100644 > --- a/tools/libxl/libxl_utils.c > +++
b/tools/libxl/libxl_utils.c > @@ -651,6 +651,56 @@ char *libxl_bitmap_to_hex_string(libxl_ctx *ctx, const libxl_bitmap *bitmap) > return q; > } > > +int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap, int max_cpus)You seem to have combined code motion with actual changes here. Please don't do that.> +{ > + GC_INIT(ctx); > + int rc = 0; > + > + if (max_cpus < 0) { > + rc = ERROR_INVAL;No need to log here?> + goto out; > + } > + if (max_cpus == 0) > + max_cpus = libxl_get_max_cpus(ctx); > + if (max_cpus <= 0) { > + LOG(ERROR, "failed to retrieve the maximum number of cpus"); > + rc = ERROR_FAIL; > + goto out; > + } > + /* This can't fail: no need to check and log */ > + libxl_bitmap_alloc(ctx, cpumap, max_cpus); > + > + out: > + GC_FREE; > + return rc; > +} > + > +int libxl_node_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *nodemap, > + int max_nodes)This is identical to cpu_bitmap alloc other than the parameter name and the log message, isn't it? A common internal function, perhaps taking a const char *what would eliminate that.> +{ > + GC_INIT(ctx); > + int rc = 0; > + > + if (max_nodes < 0) { > + rc = ERROR_INVAL; > + goto out; > + } > + > + if (max_nodes == 0) > + max_nodes = libxl_get_max_nodes(ctx); > + if (max_nodes <= 0) { > + LOG(ERROR, "failed to retrieve the maximum number of nodes"); > + rc = ERROR_FAIL; > + goto out; > + } > + /* This can't fail: no need to check and log */ > + libxl_bitmap_alloc(ctx, nodemap, max_nodes); > + > + out: > + GC_FREE; > + return rc; > +} > + > int libxl_nodemap_to_cpumap(libxl_ctx *ctx, > const libxl_bitmap *nodemap, > libxl_bitmap *cpumap) > @@ -719,12 +769,16 @@ int libxl_cpumap_to_nodemap(libxl_ctx *ctx, > > int libxl_get_max_cpus(libxl_ctx *ctx) > { > - return xc_get_max_cpus(ctx->xch); > + int max_cpus = xc_get_max_cpus(ctx->xch); > + > + return max_cpus <= 0 ? ERROR_FAIL : max_cpus;Aha, so there is indeed no need for all the <= changes in the callers. Good! Ian.
Ian Campbell
2013-Nov-27 13:49 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
On Fri, 2013-11-22 at 19:56 +0100, Dario Faggioli wrote:> to retrieve the actual number of pCPUs on the host. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > --- > tools/libxl/libxl.h | 3 +++ > tools/libxl/libxl_utils.c | 18 ++++++++++++++++++ > 2 files changed, 21 insertions(+) > > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > index a9663e4..2fab5ba 100644 > --- a/tools/libxl/libxl.h > +++ b/tools/libxl/libxl.h > @@ -652,6 +652,9 @@ int libxl_domain_preserve(libxl_ctx *ctx, uint32_t domid, libxl_domain_create_in > /* get max. number of cpus supported by hypervisor */ > int libxl_get_max_cpus(libxl_ctx *ctx); > > +/* get the actual number of online cpus on the host */ > +int libxl_get_nr_cpus(libxl_ctx *ctx);I suggest get_online_cpus if that is what it returns. "nr" could be counting various different types of cpu.> + > /* get max. number of NUMA nodes supported by hypervisor */ > int libxl_get_max_nodes(libxl_ctx *ctx); > > diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c > index 1815422..8763070 100644 > --- a/tools/libxl/libxl_utils.c > +++ b/tools/libxl/libxl_utils.c > @@ -774,6 +774,24 @@ int libxl_get_max_cpus(libxl_ctx *ctx) > return max_cpus <= 0 ? 
ERROR_FAIL : max_cpus; > } > > +int libxl_get_nr_cpus(libxl_ctx *ctx)Arguably for consistency with the existing interfaces this should be a wrapper around a (new) libxc function, but let's not go there today.> +{ > + GC_INIT(ctx); > + xc_physinfo_t physinfo = { 0 };I don't think this initialiser is actually needed (nowhere in libxc does it).> + int rc = 0; > + > + if (xc_physinfo(ctx->xch, &physinfo)) { > + LOGE(ERROR, "xc_physinfo failed."); > + rc = ERROR_FAIL; > + goto out; > + } > + rc = physinfo.nr_cpus; > + > + out: > + GC_FREE; > + return rc; > +} > + > int libxl_get_max_nodes(libxl_ctx *ctx) > { > int max_nodes = xc_get_max_nodes(ctx->xch); > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
George Dunlap
2013-Nov-27 14:17 UTC
Re: [PATCH v4 10/15] xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
On 11/27/2013 01:11 PM, Jan Beulich wrote:>>>> On 22.11.13 at 19:57, Dario Faggioli <dario.faggioli@citrix.com> wrote: >> + /* >> + * Report back to the caller what the "effective affinity", that >> + * is the intersection of cpupool''s pcpus, the (new?) hard >> + * affinity and the (new?) soft-affinity. >> + */ >> + if ( !guest_handle_is_null(op->u.vcpuaffinity.eff_cpumap.bitmap) ) >> { >> - ret = vcpu_set_affinity(v, new_affinity); >> - free_cpumask_var(new_affinity); >> + online = cpupool_online_cpumask(v->domain->cpupool); >> + cpumask_and(new_affinity, online, v->cpu_hard_affinity); >> + if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT) >> + cpumask_and(new_affinity, new_affinity, >> + v->cpu_soft_affinity); >> + >> + ret = cpumask_to_xenctl_bitmap( >> + &op->u.vcpuaffinity.eff_cpumap, new_affinity); > So with both flags set, how is the caller supposed to know what > hard affinity is now in effect? I said on the previous version already > that with you _having_ two CPU masks, you should return both.If I recall the timing correctly, I think this series was sent out before you guys had come to that conclusion on the other thread. -George
Dario Faggioli
2013-Nov-27 14:31 UTC
Re: [PATCH v4 10/15] xen: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
On mer, 2013-11-27 at 14:17 +0000, George Dunlap wrote:
> On 11/27/2013 01:11 PM, Jan Beulich wrote:
> >>>> On 22.11.13 at 19:57, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> >> +    /*
> >> +     * Report back to the caller what the "effective affinity", that
> >> +     * is the intersection of cpupool's pcpus, the (new?) hard
> >> +     * affinity and the (new?) soft-affinity.
> >> +     */
> >> +    if ( !guest_handle_is_null(op->u.vcpuaffinity.eff_cpumap.bitmap) )
> >>      {
> >> -        ret = vcpu_set_affinity(v, new_affinity);
> >> -        free_cpumask_var(new_affinity);
> >> +        online = cpupool_online_cpumask(v->domain->cpupool);
> >> +        cpumask_and(new_affinity, online, v->cpu_hard_affinity);
> >> +        if ( op->u.vcpuaffinity.flags & XEN_VCPUAFFINITY_SOFT )
> >> +            cpumask_and(new_affinity, new_affinity,
> >> +                        v->cpu_soft_affinity);
> >> +
> >> +        ret = cpumask_to_xenctl_bitmap(
> >> +            &op->u.vcpuaffinity.eff_cpumap, new_affinity);
> >
> > So with both flags set, how is the caller supposed to know what
> > hard affinity is now in effect? I said on the previous version already
> > that, with you _having_ two CPU masks, you should return both.
>
> If I recall the timing correctly, I think this series was sent out
> before you guys had come to that conclusion on the other thread.

Indeed. Anyway, a new version implementing exactly what we agreed upon
there is coming shortly (for real, I just got sidetracked by a couple
of other things, but it's almost ready).

Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
On Fri, 2013-11-22 at 19:58 +0100, Dario Faggioli wrote:> Make space for two new cpumap-s, one in vcpu_info (for getting > soft affinity) and build_info (for setting it) and amend the > API for setting vCPU affinity. > > libxl_set_vcpuaffinity() now takes two cpumaps, one for hard > and one for soft affinity (LIBXL_API_VERSION is exploited to > retain source level backword compatibility). Either of the > two cpumap can be NULL, in which case, only the affinity > corresponding to the non-NULL cpumap will be affected. > > Getting soft affinity happens indirectly, via `xl vcpu-list'' > (as it is already for hard affinity). > > This commit also introduces some logic to check whether the > affinity which will be used by Xen to schedule the vCPU(s) > does actually match with the cpumaps provided. In fact, we > want to allow every possible combination of hard and soft > affinity to be set, but we warn the user upon particularly > weird situations (e.g., hard and soft being disjoint sets > of pCPUs). > > This is the first change breaking the libxl ABI, so it bumps > the MAJOR. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > --- > Changes from v3: > * only introduce one LIBXL_HAVE_ symbol for soft affinity, > as requested during review; > * use LIBXL_API_VERSION instead of having multiple version > of the same function, as suggested during review; > * use libxl_get_nr_cpus() rather than libxl_get_cputopology(), > as suggested during review; > * use LOGE() instead of LIBXL__LOG_ERRNO(), as requested > during review; > * kill the flags and use just one _set_vcpuaffinity() > function with two cpumaps, allowing either of them to > be NULL, as suggested during review; > * avoid overflowing the bitmaps in libxl_bitmap_equal(), > as suggested during review. > > Changes from v2: > * interface completely redesigned, as discussed during > review. 
> --- > tools/libxl/Makefile | 2 + > tools/libxl/libxl.c | 79 ++++++++++++++++++++++++++++++++++++++----- > tools/libxl/libxl.h | 38 ++++++++++++++++++++- > tools/libxl/libxl_create.c | 6 +++ > tools/libxl/libxl_dom.c | 3 +- > tools/libxl/libxl_types.idl | 4 ++ > tools/libxl/libxl_utils.h | 25 +++++++++++++- > tools/libxl/xl_cmdimpl.c | 6 ++- > 8 files changed, 145 insertions(+), 18 deletions(-) > > diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile > index d8495bb..1410c44 100644 > --- a/tools/libxl/Makefile > +++ b/tools/libxl/Makefile > @@ -5,7 +5,7 @@ > XEN_ROOT = $(CURDIR)/../.. > include $(XEN_ROOT)/tools/Rules.mk > > -MAJOR = 4.3 > +MAJOR = 4.4 > MINOR = 0 > > XLUMAJOR = 4.3 > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c > index 961d55e..4b871d7 100644 > --- a/tools/libxl/libxl.c > +++ b/tools/libxl/libxl.c > @@ -4538,6 +4538,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, > for (*nb_vcpu = 0; *nb_vcpu <= domaininfo.max_vcpu_id; ++*nb_vcpu, ++ptr) { > if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0)) > return NULL; > + if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap_soft, 0)) > + return NULL;Leaks ptr and ptr->cpumap...> if (xc_vcpu_getinfo(ctx->xch, domid, *nb_vcpu, &vcpuinfo) == -1) { > LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu info"); > return NULL; > @@ -4548,6 +4550,12 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, > LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity"); > return NULL; > } > + if (xc_vcpu_getaffinity(ctx->xch, domid, *nb_vcpu, > + XEN_VCPUAFFINITY_SOFT, > + ptr->cpumap_soft.map) == -1) { > + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting vcpu affinity"); > + return NULL; > + } > ptr->vcpuid = *nb_vcpu; > ptr->cpu = vcpuinfo.cpu; > ptr->online = !!vcpuinfo.online; > @@ -4559,28 +4567,81 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, > } > > int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, > - libxl_bitmap *cpumap) 
> + const libxl_bitmap *cpumap_hard, > + const libxl_bitmap *cpumap_soft) > { > - if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map, > - XEN_VCPUAFFINITY_HARD, NULL)) { > - LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting vcpu affinity"); > - return ERROR_FAIL; > + GC_INIT(ctx); > + libxl_bitmap ecpumap; > + int nr_cpus = 0, rc; > + > + if (!cpumap_hard && !cpumap_soft) > + return ERROR_INVAL; > + > + rc = libxl_cpu_bitmap_alloc(ctx, &ecpumap, 0); > + if (rc) > + return rc; > + > + nr_cpus = libxl_get_nr_cpus(ctx); > + if (nr_cpus <= 0) { > + rc = ERROR_FAIL; > + goto out; > } > + > + if (cpumap_hard && xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, > + cpumap_hard->map, > + XEN_VCPUAFFINITY_HARD, > + ecpumap.map)) { > + LOGE(ERROR, "setting vcpu hard affinity"); > + rc = ERROR_FAIL; > + goto out; > + } > + if (cpumap_hard && !libxl_bitmap_equal(cpumap_hard, &ecpumap, nr_cpus)) > + LOG(DEBUG, "New hard affinity for vcpu %d contains unreachable cpus", > + vcpuid); > + > + if (cpumap_soft && xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, > + cpumap_soft->map, > + XEN_VCPUAFFINITY_SOFT, > + ecpumap.map)) { > + LOGE(ERROR, "setting vcpu soft affinity"); > + rc = ERROR_FAIL; > + goto out; > + } > + if (cpumap_soft && !libxl_bitmap_equal(cpumap_soft, &ecpumap, nr_cpus)) > + LOG(DEBUG, "New soft affinity for vcpu %d contains unreachable cpus", > + vcpuid); > + > + /* > + * When setting hard affinity, it is guaranteed that the result will not > + * be empty, so we need to check for that only if soft. > + */ > + if (cpumap_soft && libxl_bitmap_is_empty(&ecpumap)) > + LOG(WARN, "New soft affinity for vcpu %d has only unreachable cpus." 
> + " Only hard affinity will be considered for scheduling", vcpuid); > + > + rc = 0; > + out: > + libxl_bitmap_dispose(&ecpumap); > + GC_FREE; > return 0;ITYM rreturn rc.> } > > int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, > - unsigned int max_vcpus, libxl_bitmap *cpumap) > + unsigned int max_vcpus, > + const libxl_bitmap *cpumap_hard, > + const libxl_bitmap *cpumap_soft) > { > + GC_INIT(ctx); > int i, rc = 0; > > for (i = 0; i < max_vcpus; i++) { > - if (libxl_set_vcpuaffinity(ctx, domid, i, cpumap)) { > - LIBXL__LOG(ctx, LIBXL__LOG_WARNING, > - "failed to set affinity for %d", i); > + if (libxl_set_vcpuaffinity(ctx, domid, i, cpumap_hard, cpumap_soft)) { > + LOG(WARN, "failed to set affinity for %d", i); > rc = ERROR_FAIL; > } > } > + > + GC_FREE; > return rc; > } > > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > index 2fab5ba..9864927 100644 > --- a/tools/libxl/libxl.h > +++ b/tools/libxl/libxl.h > @@ -82,6 +82,14 @@ > #define LIBXL_HAVE_DOMAIN_NODEAFFINITY 1 > > /* > + * LIBXL_HAVE_SOFT_AFFINITY indicates that a ''cpumap_soft'' > + * field (of libxl_bitmap type) is present in both > + * libxl_domain_build_info and libxl_vcpuinfo, containing > + * the soft affinity for the vcpu."the vcpu", in the build_info case I guess it applies to all vcpus?> + */ > +#define LIBXL_HAVE_SOFT_AFFINITY 1 > + > +/* > * LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE indicates that the > * libxl_vendor_device field is present in the hvm sections of > * libxl_domain_build_info. 
This field tells libxl which > @@ -994,9 +1002,35 @@ int libxl_userdata_retrieve(libxl_ctx *ctx, uint32_t domid, > > int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo); > int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, > - libxl_bitmap *cpumap); > + const libxl_bitmap *cpumap_hard, > + const libxl_bitmap *cpumap_soft); > int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, > - unsigned int max_vcpus, libxl_bitmap *cpumap); > + unsigned int max_vcpus, > + const libxl_bitmap *cpumap_hard, > + const libxl_bitmap *cpumap_soft); > + > +#if defined (LIBXL_API_VERSION) && LIBXL_API_VERSION < 0x040400 > + > +static inline > +int libxl_set_vcpuaffinity_0x040200(libxl_ctx *ctx, uint32_t domid, > + uint32_t vcpuid, libxl_bitmap *cpumap) > +{ > + return libxl_set_vcpuaffinity(ctx, domid, vcpuid, cpumap, NULL); > +} > + > +static inline > +int libxl_set_vcpuaffinity_all_0x040200(libxl_ctx *ctx, uint32_t domid, > + unsigned int max_vcpus, > + libxl_bitmap *cpumap) > +{ > + return libxl_set_vcpuaffinity_all(ctx, domid, max_vcpus, cpumap, NULL); > +} > + > +#define libxl_set_vcpuaffinity libxl_set_vcpuaffinity_0x040200 > +#define libxl_set_vcpuaffinity_all libxl_set_vcpuaffinity_all_0x040200This looks correct, thanks. 
It could also have jsut been #defined since the inline is only used in the libxl_domain_create_restore_0x040200 case because there is a local variable.> + > +#endif > + > int libxl_domain_set_nodeaffinity(libxl_ctx *ctx, uint32_t domid, > libxl_bitmap *nodemap); > int libxl_domain_get_nodeaffinity(libxl_ctx *ctx, uint32_t domid, > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > index fe7ba0d..58f80df 100644 > --- a/tools/libxl/libxl_create.c > +++ b/tools/libxl/libxl_create.c > @@ -193,6 +193,12 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, > libxl_bitmap_set_any(&b_info->cpumap); > } > > + if (!b_info->cpumap_soft.size) { > + if (libxl_cpu_bitmap_alloc(CTX, &b_info->cpumap_soft, 0)) > + return ERROR_FAIL; > + libxl_bitmap_set_any(&b_info->cpumap_soft); > + } > + > libxl_defbool_setdefault(&b_info->numa_placement, true); > > if (!b_info->nodemap.size) { > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c > index 72489f8..4bfed60 100644 > --- a/tools/libxl/libxl_dom.c > +++ b/tools/libxl/libxl_dom.c > @@ -236,7 +236,8 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid, > return rc; > } > libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap); > - libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap); > + libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap, > + &info->cpumap_soft); > > xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT); > xs_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL); > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl > index cba8eff..48699c9 100644 > --- a/tools/libxl/libxl_types.idl > +++ b/tools/libxl/libxl_types.idl > @@ -298,6 +298,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ > ("max_vcpus", integer), > ("avail_vcpus", libxl_bitmap), > ("cpumap", libxl_bitmap), > + ("cpumap_soft", libxl_bitmap), > ("nodemap", libxl_bitmap), > ("numa_placement", libxl_defbool), > 
("tsc_mode", libxl_tsc_mode), > @@ -510,7 +511,8 @@ libxl_vcpuinfo = Struct("vcpuinfo", [ > ("blocked", bool), > ("running", bool), > ("vcpu_time", uint64), # total vcpu time ran (ns) > - ("cpumap", libxl_bitmap), # current cpu''s affinities > + ("cpumap", libxl_bitmap), # current hard cpu affinity > + ("cpumap_soft", libxl_bitmap), # current soft cpu affinity > ], dir=DIR_OUT) > > libxl_physinfo = Struct("physinfo", [ > diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h > index b11cf28..c205dd7 100644 > --- a/tools/libxl/libxl_utils.h > +++ b/tools/libxl/libxl_utils.h > @@ -90,7 +90,7 @@ static inline void libxl_bitmap_set_none(libxl_bitmap *bitmap) > { > memset(bitmap->map, 0, bitmap->size); > } > -static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit) > +static inline int libxl_bitmap_valid(const libxl_bitmap *bitmap, int bit)This name suggests to me that it checks whether the entire bitmap is somehow valid. ..._bit_valid? (or valid_bit). An alternative might be for the places which use this to assume that any bit past the end of the mask must be zero (or 1, but I think 0 makes more sense).> { > return bit >= 0 && bit < (bitmap->size * 8); > } > @@ -98,6 +98,29 @@ static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit) > #define libxl_for_each_set_bit(v, m) for (v = 0; v < (m).size * 8; v++) \ > if (libxl_bitmap_test(&(m), v)) > > +static inline int libxl_bitmap_equal(const libxl_bitmap *ba, > + const libxl_bitmap *bb, > + int nr_bits) > +{ > + int i; > + > + /* Only check nr_bits (all bits if <= 0) */ > + nr_bits = (nr_bits <= 0) ? 
ba->size * 8 : nr_bits; > + for (i = 0; i < nr_bits; i++) { > + /* If overflowing one of the bitmaps, we call them different */ > + if (!libxl_bitmap_valid(ba, i) || !libxl_bitmap_valid(bb, i)) > + return 0; > + if (libxl_bitmap_test(ba, i) != libxl_bitmap_test(bb, i)) > + return 0; > + } > + return 1; > +} > + > +static inline int libxl_bitmap_cpu_valid(libxl_bitmap *cpumap, int cpu) > +{ > + return libxl_bitmap_valid(cpumap, cpu); > +} > + > int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, > libxl_bitmap *cpumap, > int max_cpus); > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c > index 7b4d058..17fffdd 100644 > --- a/tools/libxl/xl_cmdimpl.c > +++ b/tools/libxl/xl_cmdimpl.c > @@ -2222,7 +2222,7 @@ start: > } else { > libxl_bitmap_set_any(&vcpu_cpumap); > } > - if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap)) { > + if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap, NULL)) { > fprintf(stderr, "setting affinity failed on vcpu `%d''.\n", i); > libxl_bitmap_dispose(&vcpu_cpumap); > free(vcpu_to_pcpu); > @@ -4630,7 +4630,7 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) > } > > if (vcpuid != -1) { > - if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap) == -1) { > + if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) { > fprintf(stderr, "Could not set affinity for vcpu `%u''.\n", vcpuid); > goto out; > } > @@ -4642,7 +4642,7 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) > } > for (i = 0; i < nb_vcpu; i++) { > if (libxl_set_vcpuaffinity(ctx, domid, vcpuinfo[i].vcpuid, > - &cpumap) == -1) { > + &cpumap, NULL)) { > fprintf(stderr, "libxl_set_vcpuaffinity failed" > " on vcpu `%u''.\n", vcpuinfo[i].vcpuid); > } >
Ian Campbell
2013-Nov-27 14:57 UTC
Re: [PATCH v4 13/15] xl: enable getting and setting soft
On Fri, 2013-11-22 at 19:58 +0100, Dario Faggioli wrote:> Getting happens via `xl vcpu-list'', which now looks like this: > > # xl vcpu-list -s > Name ID VCPU CPU State Time(s) Affinity (Hard / Soft) > Domain-0 0 0 11 -b- 5.4 8-15 / all > Domain-0 0 1 11 -b- 1.0 8-15 / all > Domain-0 0 14 13 -b- 1.4 8-15 / all > Domain-0 0 15 8 -b- 1.6 8-15 / all > vm-test 3 0 4 -b- 2.5 0-12 / 0-7 > vm-test 3 1 0 -b- 3.2 0-12 / 0-7 > > Setting happens by adding a ''-s''/''--soft'' switch to `xl vcpu-pin''. > > xl manual page is updated accordingly. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > --- > Changes from v3: > * fix typos in doc, rephrased the help message and changed > the title of the column for hard/soft affinity, as suggested > during review. > > Changes from v2: > * this patch folds what in v2 were patches 13 and 14; > * `xl vcpu-pin'' always shows both had and soft affinity, > without the need of passing ''-s''. > --- > docs/man/xl.pod.1 | 24 +++++++++++++++ > tools/libxl/xl_cmdimpl.c | 70 +++++++++++++++++++++++++++------------------ > tools/libxl/xl_cmdtable.c | 3 +- > 3 files changed, 67 insertions(+), 30 deletions(-) > > diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1 > index e7b9de2..b738ca2 100644 > --- a/docs/man/xl.pod.1 > +++ b/docs/man/xl.pod.1 > @@ -619,7 +619,7 @@ after B<vcpu-set>, go to B<SEE ALSO> section for information. > Lists VCPU information for a specific domain. If no domain is > specified, VCPU information for all domains will be provided. > > -=item B<vcpu-pin> I<domain-id> I<vcpu> I<cpus> > +=item B<vcpu-pin> [I<OPTIONS>] I<domain-id> I<vcpu> I<cpus> > > Pins the VCPU to only run on the specific CPUs. The keyword > B<all> can be used to apply the I<cpus> list to all VCPUs in the > @@ -630,6 +630,28 @@ different run state is appropriate. Pinning can be used to restrict > this, by ensuring certain VCPUs can only run on certain physical > CPUs. 
>
> +B<OPTIONS>
> +
> +=over 4
> +
> +=item B<-s>, B<--soft>
> +
> +The same as above, but affect I<soft affinity> rather than pinning
> +(also called I<hard affinity>).
> +
> +Normally, VCPUs just wander among the CPUs where it is allowed to

s/it is/they are/

I'd usually avoid terms such as "wander" in formal writing. Perhaps: "A
VCPU is normally scheduled among a set of PCPUs."

> +run (either all the CPUs or the ones to which it is pinned, as said
> +for B<vcpu-list>). Soft affinity offers a means to specify one or

s/, as said for B<vcpu-list>// unless you think that adds something? I
can't see why vcpu-list is relevant though. Or just "(subject to
pinning)" would be a less wordy way to get the message across.

> +more I<preferred> CPUs. Basically, among the ones where it can run,
> +the VCPU the VCPU will greatly prefer to execute on one of these
> +CPUs, whenever that is possible.

"...one or more I<preferred> CPUs where the VCPU will prefer to run
whenever possible".

> +
> +Notice that, in order for soft affinity to actually work, it needs
> +special support in the scheduler. Right now, only credit1 provides
> +that.
> +
> +=back
> +
> =item B<vm-list>
>
> Prints information about guests. This list excludes information about
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 17fffdd..d6fda26 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -4630,23 +4649,29 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu)
>      }
>
>      if (vcpuid != -1) {
> -        if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) {
> -            fprintf(stderr, "Could not set affinity for vcpu `%u'.\n", vcpuid);
> +        if (!soft_affinity &&
> +            libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) {
> +            fprintf(stderr, "Could not set hard affinity for vcpu `%u'.\n",
> +                    vcpuid);
> +            goto out;
> +        } else if (soft_affinity &&
> +                   libxl_set_vcpuaffinity(ctx, domid, vcpuid, NULL, &cpumap)) {
> +            fprintf(stderr, "Could not set soft affinity for vcpu `%u'.\n",
> +                    vcpuid);

I think I would have written this as:

    libxl_bitmap *soft = NULL, *hard = NULL;
    if (soft_affinity)
        soft = &cpumap;
    else
        hard = &cpumap;

Then, in both of the branches of "if (vcpuid != -1)":

    libxl_set_vcpuaffinity(..., hard, soft)
    error handling

I think you can omit the soft/hard from the log message; other logging,
and/or the fact that callers know what they asked for, will cover the
distinction. If you really wanted, there's const char *what =
soft_affinity ? "soft" : "hard" to use.

Is there no option to set both?
Ian Campbell
2013-Nov-27 15:53 UTC
Re: [PATCH v4 14/15] xl: enable for specifying node-affinity in the config file
On Fri, 2013-11-22 at 19:58 +0100, Dario Faggioli wrote:> in a similar way to how it is possible to specify vcpu-affinity. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > --- > Changes from v3: > * fix typos and language issues in docs and comments, as > suggested during review; > * common code to soft and hard affinity parsing factored > together, as requested uring review. > > Changes from v2: > * use the new libxl API. Although the implementation changed > only a little bit, I removed IanJ''s Acked-by, although I am > here saying that he did provided it, as requested. > --- > docs/man/xl.cfg.pod.5 | 25 +++++- > tools/libxl/xl_cmdimpl.c | 192 ++++++++++++++++++++++++++++++---------------- > 2 files changed, 146 insertions(+), 71 deletions(-) > > diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 > index 5e17b5d..9566393 100644 > --- a/docs/man/xl.cfg.pod.5 > +++ b/docs/man/xl.cfg.pod.5 > +=item B<cpus_soft="CPU-LIST"> > + > +Exactly as B<cpus=>, but specifies soft affinity, rather than pinning > +(also called hard affinity). When using the credit scheduler, this > +means the vcpus of the domain prefers to run on these pcpus.Just "prefer".> + > +A C<CPU-LIST> is specified exactly as above, for B<cpus=>. > + > +If this option is not specified, the vcpus of the guest will not have > +any preference regarding on what cpu to run, and the scheduler will > +treat all the cpus where a vcpu can execute (if B<cpus=> is specified), > +or all the host cpus (if not), the same.You can end this sentence after "on what cpu to run" without losing information IMHO.> If this option is specified, > +the intersection of the soft affinity mask, provided here, and the vcpu > +pinning, provided via B<cpus=> (if any), is utilized to compute the > +domain node-affinity, for driving memory allocations. 
> + > =back > > =head3 CPU Scheduling > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c > index d6fda26..abd3e42 100644 > --- a/tools/libxl/xl_cmdimpl.c > +++ b/tools/libxl/xl_cmdimpl.c > @@ -76,8 +76,9 @@ xlchild children[child_max]; > static const char *common_domname; > static int fd_lock = -1; > > -/* Stash for specific vcpu to pcpu mappping */ > +/* Stash for specific vcpu to pcpu hard and soft mappping */You could fix the spelling of mapping while here.> static int *vcpu_to_pcpu; > +static int *vcpu_to_pcpu_soft; > > static const char savefileheader_magic[32]> "Xen saved domain, xl format\n \0 \r"; > @@ -687,6 +688,64 @@ static int vcpupin_parse(char *cpu, libxl_bitmap *cpumap) > return rc; > } > > +static int *parse_config_cpumap_list(XLU_ConfigList *cpus, > + libxl_bitmap *cpumap, > + int max_vcpus)I assume this is pure motion/refactoring? Or do I need to read it?> +{ > + int i, n_cpus = 0; > + int *to_pcpu; > + const char *buf; > + > + if (libxl_cpu_bitmap_alloc(ctx, cpumap, 0)) { > + fprintf(stderr, "Unable to allocate cpumap\n"); > + exit(1); > + } > + > + /* Prepare the array for single vcpu to pcpu mappings */ > + to_pcpu = xmalloc(sizeof(int) * max_vcpus); > + memset(to_pcpu, -1, sizeof(int) * max_vcpus); > + > + /* > + * Idea here is to let libxl think all the domain''s vcpus > + * have cpu affinity with all the pcpus on the list. Doing > + * that ensures memory is allocated on the proper NUMA nodes. > + * It is then us, here in xl, that matches each single vcpu > + * to its pcpu (and that''s why we need to stash such info in > + * the to_pcpu array now) after the domain has been created. > + * This way, we avoid having to pass to libxl some big array > + * hosting the single mappings. 
> + */ > + libxl_bitmap_set_none(cpumap); > + while ((buf = xlu_cfg_get_listitem(cpus, n_cpus)) != NULL) { > + i = atoi(buf); > + if (!libxl_bitmap_cpu_valid(cpumap, i)) { > + fprintf(stderr, "cpu %d illegal\n", i); > + exit(1); > + } > + libxl_bitmap_set(cpumap, i); > + if (n_cpus < max_vcpus) > + to_pcpu[n_cpus] = i; > + n_cpus++; > + } > + > + return to_pcpu; > +} > + > +static void parse_config_cpumap_string(const char *buf, libxl_bitmap *cpumap)Likewise this is also motion/refactoring, correct?> +{ > + char *buf2 = strdup(buf); > + > + if (libxl_cpu_bitmap_alloc(ctx, cpumap, 0)) { > + fprintf(stderr, "Unable to allocate cpumap\n"); > + exit(1); > + } > + > + libxl_bitmap_set_none(cpumap); > + if (vcpupin_parse(buf2, cpumap)) > + exit(1); > + free(buf2); > +} > + > static void parse_config_data(const char *config_source, > const char *config_data, > int config_len, > @@ -697,7 +756,8 @@ static void parse_config_data(const char *config_source, > const char *buf; > long l; > XLU_Config *config; > - XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms; > + XLU_ConfigList *cpus, *cpus_soft, *vbds, *nics, *pcis; > + XLU_ConfigList *cvfbs, *cpuids, *vtpms; > XLU_ConfigList *ioports, *irqs, *iomem; > int num_ioports, num_irqs, num_iomem; > int pci_power_mgmt = 0; > @@ -820,57 +880,24 @@ static void parse_config_data(const char *config_source, > b_info->max_vcpus = l; > > if (!xlu_cfg_get_list (config, "cpus", &cpus, 0, 1)) { > - int n_cpus = 0; > - > - if (libxl_cpu_bitmap_alloc(ctx, &b_info->cpumap, 0)) { > - fprintf(stderr, "Unable to allocate cpumap\n"); > - exit(1); > - } > - > - /* Prepare the array for single vcpu to pcpu mappings */ > - vcpu_to_pcpu = xmalloc(sizeof(int) * b_info->max_vcpus); > - memset(vcpu_to_pcpu, -1, sizeof(int) * b_info->max_vcpus); > - > - /* > - * Idea here is to let libxl think all the domain''s vcpus > - * have cpu affinity with all the pcpus on the list. 
> - * It is then us, here in xl, that matches each single vcpu > - * to its pcpu (and that''s why we need to stash such info in > - * the vcpu_to_pcpu array now) after the domain has been created. > - * Doing it like this saves the burden of passing to libxl > - * some big array hosting the single mappings. Also, using > - * the cpumap derived from the list ensures memory is being > - * allocated on the proper nodes anyway. > - */ > - libxl_bitmap_set_none(&b_info->cpumap); > - while ((buf = xlu_cfg_get_listitem(cpus, n_cpus)) != NULL) { > - i = atoi(buf); > - if (!libxl_bitmap_cpu_valid(&b_info->cpumap, i)) { > - fprintf(stderr, "cpu %d illegal\n", i); > - exit(1); > - } > - libxl_bitmap_set(&b_info->cpumap, i); > - if (n_cpus < b_info->max_vcpus) > - vcpu_to_pcpu[n_cpus] = i; > - n_cpus++; > - } > - > + vcpu_to_pcpu = parse_config_cpumap_list(cpus, &b_info->cpumap, > + b_info->max_vcpus); > /* We have a cpumap, disable automatic placement */ > libxl_defbool_set(&b_info->numa_placement, false); > } > else if (!xlu_cfg_get_string (config, "cpus", &buf, 0)) { > - char *buf2 = strdup(buf); > - > - if (libxl_cpu_bitmap_alloc(ctx, &b_info->cpumap, 0)) { > - fprintf(stderr, "Unable to allocate cpumap\n"); > - exit(1); > - } > - > - libxl_bitmap_set_none(&b_info->cpumap); > - if (vcpupin_parse(buf2, &b_info->cpumap)) > - exit(1); > - free(buf2); > + parse_config_cpumap_string(buf, &b_info->cpumap); > + libxl_defbool_set(&b_info->numa_placement, false); > + } > > + if (!xlu_cfg_get_list (config, "cpus_soft", &cpus_soft, 0, 1)) { > + vcpu_to_pcpu_soft > + parse_config_cpumap_list(cpus_soft, &b_info->cpumap_soft, > + b_info->max_vcpus); > + libxl_defbool_set(&b_info->numa_placement, false); > + } > + else if (!xlu_cfg_get_string (config, "cpus_soft", &buf, 0)) { > + parse_config_cpumap_string(buf, &b_info->cpumap_soft); > libxl_defbool_set(&b_info->numa_placement, false); > }This is / else if pattern is the same for both types of bitmap, right? 
Can you make it a function which takes config, "cpus_soft"/"cpus" and a pointer to the map to fill in etc?> > @@ -1991,6 +2018,40 @@ static void evdisable_disk_ejects(libxl_evgen_disk_eject **diskws, > } > } > > +static inline int set_vcpu_to_pcpu_affinity(uint32_t domid, int *to_pcpu, > + int max_vcpus, int soft) > +{ > + libxl_bitmap vcpu_cpumap; > + int i, ret = 0; > + > + ret = libxl_cpu_bitmap_alloc(ctx, &vcpu_cpumap, 0); > + if (ret) > + return -1; > + > + for (i = 0; i < max_vcpus; i++) { > + if (to_pcpu[i] != -1) { > + libxl_bitmap_set_none(&vcpu_cpumap); > + libxl_bitmap_set(&vcpu_cpumap, to_pcpu[i]); > + } else { > + libxl_bitmap_set_any(&vcpu_cpumap); > + } > + if (!soft && > + libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap, NULL)) { > + fprintf(stderr, "setting hard affinity failed on vcpu `%d''.\n", i); > + ret = -1; > + break; > + } else if (soft && > + libxl_set_vcpuaffinity(ctx, domid, i, NULL, &vcpu_cpumap)) {Another place where using termporary hard and soft pointers initialised from the soft boolean would work I think. Ian.
Ian Campbell
2013-Nov-27 15:55 UTC
Re: [PATCH v4 15/15] libxl: automatic NUMA placement affects soft affinity
On Fri, 2013-11-22 at 19:58 +0100, Dario Faggioli wrote:
> vCPU soft affinity and NUMA-aware scheduling do not have
> to be related. However, soft affinity is how NUMA-aware
> scheduling is actually implemented, and therefore, by default,
> the results of automatic NUMA placement (at VM creation time)
> are also used to set the soft affinity of all the vCPUs of
> the domain.
>
> Of course, this only happens if automatic NUMA placement is
> enabled and actually takes place (for instance, if the user
> does not specify any hard or soft affinity in the xl config
> file).
>
> This also takes care of the vice-versa, i.e., it doesn't trigger
> automatic placement if the config file specifies either a
> hard (the check for which was already there) or a soft (the
> check for which is introduced by this commit) affinity.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>
Dario Faggioli
2013-Dec-02 18:10 UTC
Re: [PATCH v4 13/15] xl: enable getting and setting soft
On mer, 2013-11-27 at 14:57 +0000, Ian Campbell wrote:> On Fri, 2013-11-22 at 19:58 +0100, Dario Faggioli wrote: > > --- a/tools/libxl/xl_cmdimpl.c > > +++ b/tools/libxl/xl_cmdimpl.c > > @@ -4630,23 +4649,29 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu) > > } > > > > if (vcpuid != -1) { > > - if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) { > > - fprintf(stderr, "Could not set affinity for vcpu `%u''.\n", vcpuid); > > + if (!soft_affinity && > > + libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) { > > + fprintf(stderr, "Could not set hard affinity for vcpu `%u''.\n", > > + vcpuid); > > + goto out; > > + } else if (soft_affinity && > > + libxl_set_vcpuaffinity(ctx, domid, vcpuid, NULL, &cpumap)) { > > + fprintf(stderr, "Could not set soft affinity for vcpu `%u''.\n", > > + vcpuid); > > I think I would have written this as: > libxl_bitmap *soft = NULL, *hard = NULL; > if (soft_affinity) > soft = &cpumap > else > hard = &cpumap > > The in both of the branches "if (vcpu != -1)": > libxl_set_vcpuaffinity(..., hard, soft) > error handling > > I think you can omit the soft/hard from the log message other logging > and/or the fact that they know what they asked for will cover the > distinction. If you really wanted the const char *what = soft_affinity ? > "soft" : "hard" to use. >Done this, and all the that was suggested in this e-mail.> Is there no option to set both? >No, not for now. Do you think there should be one? Personally, I think we should keep `xl vcpu-pin <dom> <vcpu> <affinity>'' working, and doing the thing that people are most used with, i.e., setting the pinning (hard affinity). At that point, the most natural thing to do seemed to add a switch to have this deal with soft affinity, which is what happens in this patch. 
I can of course come up with pretty much any sort of behavior, e.g.,
setting both hard and soft affinity to the same cpumap if no switch is
provided, and then introducing both a '-h' and a '-s' for specifying
different cpumaps at the same time, etc. Of course, that could well
happen during 4.5, I guess. :-P

So, what do we want?

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Dario Faggioli
2013-Dec-02 18:17 UTC
Re: [PATCH v4 12/15] libxl: get and set soft affinity
On mer, 2013-11-27 at 14:45 +0000, Ian Campbell wrote:
> On Fri, 2013-11-22 at 19:58 +0100, Dario Faggioli wrote:
> > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> > index 961d55e..4b871d7 100644
> > --- a/tools/libxl/libxl.c
> > +++ b/tools/libxl/libxl.c
> > @@ -4538,6 +4538,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid,
> >      for (*nb_vcpu = 0; *nb_vcpu <= domaininfo.max_vcpu_id; ++*nb_vcpu, ++ptr) {
> >          if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0))
> >              return NULL;
> > +        if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap_soft, 0))
> > +            return NULL;
>
> Leaks ptr and ptr->cpumap...
>
Addressed in a separate patch (still part of the new version, v5, of the series).

> > + */
> > +#define LIBXL_HAVE_SOFT_AFFINITY 1
> > +
> > +/*
> >  * LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE indicates that the
> >  * libxl_vendor_device field is present in the hvm sections of
> >  * libxl_domain_build_info. This field tells libxl which
> > @@ -994,9 +1002,35 @@ int libxl_userdata_retrieve(libxl_ctx *ctx, uint32_t domid,
> >
> >  int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo);
> >  int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid,
> > -                           libxl_bitmap *cpumap);
> > +                           const libxl_bitmap *cpumap_hard,
> > +                           const libxl_bitmap *cpumap_soft);
> >  int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid,
> > -                               unsigned int max_vcpus, libxl_bitmap *cpumap);
> > +                               unsigned int max_vcpus,
> > +                               const libxl_bitmap *cpumap_hard,
> > +                               const libxl_bitmap *cpumap_soft);
> > +
> > +#if defined (LIBXL_API_VERSION) && LIBXL_API_VERSION < 0x040400
> > +
> > +static inline
> > +int libxl_set_vcpuaffinity_0x040200(libxl_ctx *ctx, uint32_t domid,
> > +                                    uint32_t vcpuid, libxl_bitmap *cpumap)
> > +{
> > +    return libxl_set_vcpuaffinity(ctx, domid, vcpuid, cpumap, NULL);
> > +}
> > +
> > +static inline
> > +int libxl_set_vcpuaffinity_all_0x040200(libxl_ctx *ctx, uint32_t domid,
> > +                                        unsigned int max_vcpus,
> > +                                        libxl_bitmap *cpumap)
> > +{
> > +    return libxl_set_vcpuaffinity_all(ctx, domid, max_vcpus, cpumap, NULL);
> > +}
> > +
> > +#define libxl_set_vcpuaffinity libxl_set_vcpuaffinity_0x040200
> > +#define libxl_set_vcpuaffinity_all libxl_set_vcpuaffinity_all_0x040200
>
> This looks correct, thanks. It could also have just been #defined, since
> the inline is only used in the libxl_domain_create_restore_0x040200 case
> because there is a local variable.
>
Yes, I saw that.

You mean something like this:

  #define libxl_set_vcpuaffinity(_ctx, _domid, _vcpuid, _map) \
      libxl_set_vcpuaffinity((_ctx), (_domid), (_vcpuid), (_map), NULL)

?

Well, I guess I can, but don't I get more protection for the arguments / less risk of clashing with the app namespace, if I do? That's the main reason why I used the inline variant, and that is true for v5 as well. If you prefer to go with the `#define' only, I can respin (either the series or the patch).

Thanks and Regards,
Dario
Dario Faggioli
2013-Dec-02 18:21 UTC
Re: [PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
On mer, 2013-11-27 at 13:45 +0000, Ian Campbell wrote:
> On Fri, 2013-11-22 at 19:56 +0100, Dario Faggioli wrote:
> >
> >      cpumap->map = xc_cpupool_freeinfo(ctx->xch);
> > diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
> > index 9f5f589..1815422 100644
> > --- a/tools/libxl/libxl_utils.c
> > +++ b/tools/libxl/libxl_utils.c
> > @@ -651,6 +651,56 @@ char *libxl_bitmap_to_hex_string(libxl_ctx *ctx, const libxl_bitmap *bitmap)
> >      return q;
> >  }
> >
> > +int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap, int max_cpus)
>
> You seem to have combined code motion with actual changes here. Please
> don't do that.
>
As I explained in the commit message for this patch in v5, I am indeed both moving and (slightly) changing the functions, mainly because the changes I'm doing require the move (e.g., GC_INIT/GC_FREE not being available in the original place).

I feel it's more natural to do it like this, rather than having a pre-patch just moving the code for no apparent reason... Isn't it? Or was it something else that you were pointing out?

(Anyway, feel free to look at v5 and reply there.)

Thanks and Regards,
Dario
Dario Faggioli
2013-Dec-02 18:22 UTC
Re: [PATCH v4 14/15] xl: enable for specifying node-affinity in the config file
On mer, 2013-11-27 at 15:53 +0000, Ian Campbell wrote:
> On Fri, 2013-11-22 at 19:58 +0100, Dario Faggioli wrote:
> > +static int *parse_config_cpumap_list(XLU_ConfigList *cpus,
> > +                                     libxl_bitmap *cpumap,
> > +                                     int max_vcpus)
>
> I assume this is pure motion/refactoring? Or do I need to read it?
>
It is indeed motion/refactoring, and so it is in v5.

> > +{
> > +    int i, n_cpus = 0;
> > +    int *to_pcpu;
> > +    const char *buf;
> > +
> > +    if (libxl_cpu_bitmap_alloc(ctx, cpumap, 0)) {
> > +        fprintf(stderr, "Unable to allocate cpumap\n");
> > +        exit(1);
> > +    }
> > +
> > +    /* Prepare the array for single vcpu to pcpu mappings */
> > +    to_pcpu = xmalloc(sizeof(int) * max_vcpus);
> > +    memset(to_pcpu, -1, sizeof(int) * max_vcpus);
> > +
> > +    /*
> > +     * The idea here is to let libxl think all the domain's vcpus
> > +     * have cpu affinity with all the pcpus on the list. Doing
> > +     * that ensures memory is allocated on the proper NUMA nodes.
> > +     * It is then us, here in xl, that match each single vcpu
> > +     * to its pcpu (and that's why we need to stash such info in
> > +     * the to_pcpu array now) after the domain has been created.
> > +     * This way, we avoid having to pass to libxl some big array
> > +     * hosting the single mappings.
> > +     */
> > +    libxl_bitmap_set_none(cpumap);
> > +    while ((buf = xlu_cfg_get_listitem(cpus, n_cpus)) != NULL) {
> > +        i = atoi(buf);
> > +        if (!libxl_bitmap_cpu_valid(cpumap, i)) {
> > +            fprintf(stderr, "cpu %d illegal\n", i);
> > +            exit(1);
> > +        }
> > +        libxl_bitmap_set(cpumap, i);
> > +        if (n_cpus < max_vcpus)
> > +            to_pcpu[n_cpus] = i;
> > +        n_cpus++;
> > +    }
> > +
> > +    return to_pcpu;
> > +}
> > +
> > +static void parse_config_cpumap_string(const char *buf, libxl_bitmap *cpumap)
>
> Likewise this is also motion/refactoring, correct?
>
Same here.

Thanks and Regards,
Dario
Ian Campbell
2013-Dec-03 09:32 UTC
Re: [PATCH v4 13/15] xl: enable getting and setting soft
On Mon, 2013-12-02 at 19:10 +0100, Dario Faggioli wrote:
> > Is there no option to set both?
> >
> No, not for now. Do you think there should be one?

Is it never necessary to set them "atomically"? Or is it always possible to set one arbitrarily before the other?

> Personally, I think we should keep `xl vcpu-pin <dom> <vcpu> <affinity>'
> working, and doing the thing that people are most used to, i.e.,
> setting the pinning (hard affinity).

Agreed.

> At that point, the most natural
> thing to do seemed to add a switch to have this deal with soft affinity,
> which is what happens in this patch.
>
> I of course can come up with pretty much all sorts of behavior, e.g.,
> setting both hard and soft affinity to the same cpumap, if no switch is
> provided, and then introduce both a '-h' and a '-s', for specifying
> different cpumaps, at the same time, etc.

Is setting them both the same pretty much identical in practice to setting the hard one and leaving the soft alone?

I was actually thinking of -b/--both, but -s as you suggest could work too, except -h is help so it would need a different letter.

Another option would be to take two affinities on the command line, the first being hard and the second soft, with some special syntax ("-"?) for "leave it alone".

> Of course, that could well happen during 4.5, I guess. :-P

Sure.

> So, what do we want?
>
> Regards,
> Dario
On Mon, 2013-12-02 at 19:17 +0100, Dario Faggioli wrote:
> On mer, 2013-11-27 at 14:45 +0000, Ian Campbell wrote:
> > > > + */
> > > > +#define LIBXL_HAVE_SOFT_AFFINITY 1
> > > > +
> > > > +/*
> > > >  * LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE indicates that the
> > > >  * libxl_vendor_device field is present in the hvm sections of
> > > >  * libxl_domain_build_info. This field tells libxl which
> > > > @@ -994,9 +1002,35 @@ int libxl_userdata_retrieve(libxl_ctx *ctx, uint32_t domid,
> > > >
> > > >  int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo);
> > > >  int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid,
> > > > -                           libxl_bitmap *cpumap);
> > > > +                           const libxl_bitmap *cpumap_hard,
> > > > +                           const libxl_bitmap *cpumap_soft);
> > > >  int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid,
> > > > -                               unsigned int max_vcpus, libxl_bitmap *cpumap);
> > > > +                               unsigned int max_vcpus,
> > > > +                               const libxl_bitmap *cpumap_hard,
> > > > +                               const libxl_bitmap *cpumap_soft);
> > > > +
> > > > +#if defined (LIBXL_API_VERSION) && LIBXL_API_VERSION < 0x040400
> > > > +
> > > > +static inline
> > > > +int libxl_set_vcpuaffinity_0x040200(libxl_ctx *ctx, uint32_t domid,
> > > > +                                    uint32_t vcpuid, libxl_bitmap *cpumap)
> > > > +{
> > > > +    return libxl_set_vcpuaffinity(ctx, domid, vcpuid, cpumap, NULL);
> > > > +}
> > > > +
> > > > +static inline
> > > > +int libxl_set_vcpuaffinity_all_0x040200(libxl_ctx *ctx, uint32_t domid,
> > > > +                                        unsigned int max_vcpus,
> > > > +                                        libxl_bitmap *cpumap)
> > > > +{
> > > > +    return libxl_set_vcpuaffinity_all(ctx, domid, max_vcpus, cpumap, NULL);
> > > > +}
> > > > +
> > > > +#define libxl_set_vcpuaffinity libxl_set_vcpuaffinity_0x040200
> > > > +#define libxl_set_vcpuaffinity_all libxl_set_vcpuaffinity_all_0x040200
> > >
> > > This looks correct, thanks. It could also have just been #defined, since
> > > the inline is only used in the libxl_domain_create_restore_0x040200 case
> > > because there is a local variable.
> > >
> Yes, I saw that.
>
> You mean something like this:
>
>   #define libxl_set_vcpuaffinity(_ctx, _domid, _vcpuid, _map) \
>       libxl_set_vcpuaffinity((_ctx), (_domid), (_vcpuid), (_map), NULL)
>
> ?

Something like that, yes.

> Well, I guess I can, but don't I get more protection for the arguments /
> less risk of clashing with the app namespace, if I do?

Since the arguments of the inner function which are common to the outer one have the same types, I don't think you gain any extra type checking, and I don't think the namespace clashes in practice. (BTW, there is no need for a leading _ in macro arguments, IIRC.)

> That's the main reason why I used the inline variant, and that is true
> for v5 as well. If you prefer to go with the `#define' only, I can
> respin (either the series or the patch).

I don't have a strong preference, I was just observing in case you thought it nicer.

Ian.
Ian Campbell
2013-Dec-03 09:41 UTC
Re: [PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
On Mon, 2013-12-02 at 19:21 +0100, Dario Faggioli wrote:
> On mer, 2013-11-27 at 13:45 +0000, Ian Campbell wrote:
> > On Fri, 2013-11-22 at 19:56 +0100, Dario Faggioli wrote:
> > >
> > >      cpumap->map = xc_cpupool_freeinfo(ctx->xch);
> > > diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
> > > index 9f5f589..1815422 100644
> > > --- a/tools/libxl/libxl_utils.c
> > > +++ b/tools/libxl/libxl_utils.c
> > > @@ -651,6 +651,56 @@ char *libxl_bitmap_to_hex_string(libxl_ctx *ctx, const libxl_bitmap *bitmap)
> > >      return q;
> > >  }
> > >
> > > +int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap, int max_cpus)
> >
> > You seem to have combined code motion with actual changes here. Please
> > don't do that.
> >
> As I explained in the commit message for this patch in v5, I am indeed
> both moving and (slightly) changing the functions, mainly because the
> changes I'm doing require the move (e.g., GC_INIT/GC_FREE not being
> available in the original place).
>
> I feel it's more natural to do it like this, rather than having a
> pre-patch just moving the code for no apparent reason... Isn't it?

Certainly, making changes which are necessarily required by the code motion is fine to do in a single step, although even then, if you can arrange to make the changes either before or after the move, it is better.

But is that what is happening here?

-static inline int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap,
-                                         int max_cpus)
-{
-    if (max_cpus < 0)
-        return ERROR_INVAL;
-    if (max_cpus == 0)
-        max_cpus = libxl_get_max_cpus(ctx);
-    if (max_cpus == 0)
-        return ERROR_FAIL;
-
-    return libxl_bitmap_alloc(ctx, cpumap, max_cpus);
-}

is becoming:

+int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap, int max_cpus)
+{
+    GC_INIT(ctx);
+    int rc = 0;
+
+    if (max_cpus < 0) {
+        rc = ERROR_INVAL;
+        goto out;
+    }
+    if (max_cpus == 0)
+        max_cpus = libxl_get_max_cpus(ctx);
+    if (max_cpus <= 0) {
+        LOG(ERROR, "failed to retrieve the maximum number of cpus");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+    /* This can't fail: no need to check and log */
+    libxl_bitmap_alloc(ctx, cpumap, max_cpus);
+
+ out:
+    GC_FREE;
+    return rc;
+}

which involves a lot more reworking than simply adding the GC. In any case, I don't see why the original function couldn't be moved as is and then have the GC-ification added in the new location; I don't think the move requires the change to using the GC in any way.

> Or was it something else that you were pointing out?
>
> (Anyway, feel free to look at v5 and reply there).

In general it would be better to reply to vN either up front or as you do the rework, so we can resolve any misunderstanding, rather than waiting until after you've posted vN+1 and perhaps requiring a vN+2.

Ian.
Dario Faggioli
2013-Dec-03 10:27 UTC
Re: [PATCH v4 13/15] xl: enable getting and setting soft
On mar, 2013-12-03 at 09:32 +0000, Ian Campbell wrote:
> On Mon, 2013-12-02 at 19:10 +0100, Dario Faggioli wrote:
> > > Is there no option to set both?
> > >
> > No, not for now. Do you think there should be one?
>
> Is it never necessary to set them "atomically"? Or is it always possible
> to set one arbitrarily before the other?
>
From the user perspective (as we're discussing the `xl' interface), there certainly are cases where one wants both to have the same or different values, but it is possible to achieve that by setting them independently (it just takes more time, since you need two `xl vcpu-pin' commands).

Setting soft affinity *never* fails, while setting hard affinity may fail, in the same exact circumstances in which it is failing right now (e.g., setting hard affinity completely outside of your cpupool). Moreover, setting either one does not interfere with the other (apart from the possibility of getting a warning, if the system goes through some funny transient state).

Therefore, I don't think there is a hard requirement for this, although I'm certainly up for trying to make this more comfortable to use, once in 4.5.

> > At that point, the most natural
> > thing to do seemed to add a switch to have this deal with soft affinity,
> > which is what happens in this patch.
> >
> > I of course can come up with pretty much all sorts of behavior, e.g.,
> > setting both hard and soft affinity to the same cpumap, if no switch is
> > provided, and then introduce both a '-h' and a '-s', for specifying
> > different cpumaps, at the same time, etc.
>
> Is setting them both the same pretty much identical in practice to
> setting the hard one and leaving the soft alone?
>
Yes it is. That's why I'd rather provide a mechanism for setting both, but each one to its own value (which could be hard=3-5, soft=all). I think it's more useful.

> I was actually thinking of -b/--both, but -s as you suggest could work too,
> except -h is help so would need a different letter.
>
Well, sure, -h was pretty stupid, sorry. :-)

> Another option would be to take two affinities on the command line, the
> first being hard and the second soft with some special syntax ("-"?) for
> "leave it alone".
>
Mmm... I think I like this better. So, if we take the approach implemented in this patch right now, and introduce something like that later, here's what we'd get:

 1. xl vcpu-pin 3 all 4-6      : hard affinity for all dom3 vcpus --> 4-6
 2. xl vcpu-pin -s 3 all 3,5   : soft affinity for all dom3 vcpus --> 3,5
 3. xl vcpu-pin 3 all 4-6 3,5  : combination of 1 and 2
 4. xl vcpu-pin 3 all - 3,5    : same as 2

Would that be ok?

Thanks and Regards,
Dario
Ian Campbell
2013-Dec-03 10:59 UTC
Re: [PATCH v4 13/15] xl: enable getting and setting soft
On Tue, 2013-12-03 at 11:27 +0100, Dario Faggioli wrote:
> On mar, 2013-12-03 at 09:32 +0000, Ian Campbell wrote:
> > Another option would be to take two affinities on the command line, the
> > first being hard and the second soft with some special syntax ("-"?) for
> > "leave it alone".
> >
> Mmm... I think I like this better. So, if we take the approach
> implemented in this patch right now, and introduce something like that
> later, here's what we'd get:
>
> 1. xl vcpu-pin 3 all 4-6 : hard affinity for all dom3 vcpus --> 4-6

You mean soft, don't you? And also hard -> all, not left alone.

> 2. xl vcpu-pin -s 3 all 3,5 : soft affinity for all dom3 vcpus --> 3,5

I think if you give -s then you should expect only one cpumap, which would be the soft affinity map.

> 3. xl vcpu-pin 3 all 4-6 3,5 : combination of 1 and 2

Modulo the correction above, yes.

> 4. xl vcpu-pin 3 all - 3,5 : same as 2

Yes.

> Would that be ok?

I like it.

Ian.
Dario Faggioli
2013-Dec-03 11:14 UTC
Re: [PATCH v4 13/15] xl: enable getting and setting soft
On mar, 2013-12-03 at 10:59 +0000, Ian Campbell wrote:
> On Tue, 2013-12-03 at 11:27 +0100, Dario Faggioli wrote:
> > On mar, 2013-12-03 at 09:32 +0000, Ian Campbell wrote:
> > > Another option would be to take two affinities on the command line, the
> > > first being hard and the second soft with some special syntax ("-"?) for
> > > "leave it alone".
> > >
> > Mmm... I think I like this better. So, if we take the approach
> > implemented in this patch right now, and introduce something like that
> > later, here's what we'd get:
> >
> > 1. xl vcpu-pin 3 all 4-6 : hard affinity for all dom3 vcpus --> 4-6
>
> You mean soft, don't you? And also hard -> all, not left alone.
>
Aha! :-) Actually, the 'all' here (and below) refers to "which of dom3's vcpu(s)?":

  xl vcpu-pin <OPTIONS> <domain-id> <vcpu> <cpus>

So, no, I did mean hard and, more specifically, I included that example because that is what is being used right now (and I mean _right_now_, so before this series), and that we both agreed should continue working.

Now that you made me think about this, going this way we risk having something like:

  xl vcpu-pin 3 all all all

which may not exactly look crystal-clear...

> > 2. xl vcpu-pin -s 3 all 3,5 : soft affinity for all dom3 vcpus --> 3,5
>
> I think if you give -s then you should expect only one cpumap, which
> would be the soft affinity map.
>
Same here. Another example could be:

  xl vcpu-pin -s 3 0 3,5

which has indeed only one cpumap, "3,5", while the "0" means vcpu #0 of domain 3.

> > Would that be ok?
>
> I like it.
>
So, is this still true? :-D

Dario
Ian Campbell
2013-Dec-03 11:18 UTC
Re: [PATCH v4 13/15] xl: enable getting and setting soft
On Tue, 2013-12-03 at 12:14 +0100, Dario Faggioli wrote:
> On mar, 2013-12-03 at 10:59 +0000, Ian Campbell wrote:
> > On Tue, 2013-12-03 at 11:27 +0100, Dario Faggioli wrote:
> > > On mar, 2013-12-03 at 09:32 +0000, Ian Campbell wrote:
> > > > Another option would be to take two affinities on the command line, the
> > > > first being hard and the second soft with some special syntax ("-"?) for
> > > > "leave it alone".
> > > >
> > > Mmm... I think I like this better. So, if we take the approach
> > > implemented in this patch right now, and introduce something like that
> > > later, here's what we'd get:
> > >
> > > 1. xl vcpu-pin 3 all 4-6 : hard affinity for all dom3 vcpus --> 4-6
> >
> > You mean soft, don't you? And also hard -> all, not left alone.
> >
> Aha! :-) Actually, the 'all' here (and below) refers to "which of dom3's
> vcpu(s)?":
>
>   xl vcpu-pin <OPTIONS> <domain-id> <vcpu> <cpus>
>
> So, no, I did mean hard and, more specifically, I included that example
> because that is what is being used right now (and I mean _right_now_, so
> before this series), and that we both agreed should continue working.

Oh, yes, I was confused by the "all" for the vcpu field.

> Now that you made me think about this, going this way we risk having
> something like:
>
>   xl vcpu-pin 3 all all all
>
> which may not exactly look crystal-clear...

It's a bit odd, isn't it, but I think it is livable with.

> > > 2. xl vcpu-pin -s 3 all 3,5 : soft affinity for all dom3 vcpus --> 3,5
> >
> > I think if you give -s then you should expect only one cpumap, which
> > would be the soft affinity map.
> >
> Same here. Another example could be:
>
>   xl vcpu-pin -s 3 0 3,5
>
> which has indeed only one cpumap, "3,5", while the "0" means vcpu #0
> of domain 3.
>
> > > Would that be ok?
> >
> > I like it.
> >
> So, is this still true? :-D

Yes. All IMHO of course -- would be nice to hear what others think.

Ian.
Dario Faggioli
2013-Dec-03 11:40 UTC
Re: [PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
On mar, 2013-12-03 at 09:41 +0000, Ian Campbell wrote:
> On Mon, 2013-12-02 at 19:21 +0100, Dario Faggioli wrote:
> > I feel it's more natural to do it like this, rather than having a
> > pre-patch just moving the code for no apparent reason... Isn't it?
>
> Certainly, making changes which are necessarily required by the code
> motion is fine to do in a single step, although even then, if you can
> arrange to make the changes either before or after the move, it is
> better.
>
> But is that what is happening here?
>
> -static inline int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap,
> -                                         int max_cpus)
> -{
> -    if (max_cpus < 0)
> -        return ERROR_INVAL;
> -    if (max_cpus == 0)
> -        max_cpus = libxl_get_max_cpus(ctx);
> -    if (max_cpus == 0)
> -        return ERROR_FAIL;
> -
> -    return libxl_bitmap_alloc(ctx, cpumap, max_cpus);
> -}
>
> is becoming:
>
> +int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap, int max_cpus)
> +{
> +    GC_INIT(ctx);
> +    int rc = 0;
> +
> +    if (max_cpus < 0) {
> +        rc = ERROR_INVAL;
> +        goto out;
> +    }
> +    if (max_cpus == 0)
> +        max_cpus = libxl_get_max_cpus(ctx);
> +    if (max_cpus <= 0) {
> +        LOG(ERROR, "failed to retrieve the maximum number of cpus");
> +        rc = ERROR_FAIL;
> +        goto out;
> +    }
> +    /* This can't fail: no need to check and log */
> +    libxl_bitmap_alloc(ctx, cpumap, max_cpus);
> +
> + out:
> +    GC_FREE;
> +    return rc;
> +}
>
> which involves a lot more reworking than simply adding the GC.
>
Well, is it? All I'm doing is adding the GC and one LOG() (in v5 it's 2 LOG()s). The rest of the rework is indeed related to the GC-ification, since I can't return without calling GC_FREE anymore...

> In any
> case, I don't see why the original function couldn't be moved as is and
> then have the GC-ification added in the new location; I don't think the
> move requires the change to using the GC in any way.
>
Mmm... I'm not sure I'm fully understanding what you're trying to say.

That function is in libxl_utils.h right now (IIRC, I put it there myself :-)) because it's a trivial wrapper of libxl_bitmap_alloc(). What is happening is that, as a result of changing the error handling in libxl_get_max_cpus(), and deciding to move the logging for when it fails closer to it --which is what happens in this very patch-- I just don't think this is a trivial enough wrapper any longer.

So, my view here is: if I modify the function, adding the GC and the LOG()s, then it should also be moved; if not, it can stay where it is. What you seem to be suggesting is that I (either in this or a previous patch) move the function as is, for no apparent reason, and then (either in a following patch or this one) introduce the GC and the LOG()s... Is that the case, or am I missing something?

I see the point of not combining functional and non-functional changes, but that really looks odd to me, in this particular case. Anyway, if that's what you want, I certainly can do that... I'm not being graded against a maximum number of patches in a series, am I? ;-P

> > (Anyway, feel free to look at v5 and reply there).
>
> In general it would be better to reply to vN either up front or as you
> do the rework, so we can resolve any misunderstanding, rather than
> waiting until after you've posted vN+1 and perhaps requiring a vN+2.
>
Definitely, and I always do that. This round suffered from a combination of weird circumstances on my side. Sorry for that.

Thanks and Regards,
Dario
Ian Campbell
2013-Dec-03 11:45 UTC
Re: [PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
On Tue, 2013-12-03 at 12:40 +0100, Dario Faggioli wrote:
> So, my view here is: if I modify the function, adding the GC and the LOG()s,
> then it should also be moved; if not, it can stay where it is. What you
> seem to be suggesting is that I (either in this or a previous patch)
> move the function as is, for no apparent reason,

It's not for "no apparent reason", it is to separate code motion from reworking the function, which you would explain in your commit message. If that were considered no apparent reason, then it wouldn't be possible to split the code motion from the functional changes in the majority of cases.

Ian.
Dario Faggioli
2013-Dec-03 12:06 UTC
Re: [PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
On mar, 2013-12-03 at 11:45 +0000, Ian Campbell wrote:
> On Tue, 2013-12-03 at 12:40 +0100, Dario Faggioli wrote:
> > So, my view here is: if I modify the function, adding the GC and the LOG()s,
> > then it should also be moved; if not, it can stay where it is. What you
> > seem to be suggesting is that I (either in this or a previous patch)
> > move the function as is, for no apparent reason,
>
> It's not for "no apparent reason", it is to separate code motion from
> reworking the function, which you would explain in your commit message.
>
Ok then.

> If that were considered no apparent reason, then it wouldn't be possible
> to split the code motion from the functional changes in the majority of
> cases.
>
Right. Thanks for the explanation, and sorry for not getting what you mean/want sooner. It just differs from the way I've done things up to now, so I really wasn't sure I was understanding, rather than not wanting to do it. :-)

Will do.

Thanks again and Regards,
Dario
Ian Jackson
2013-Dec-03 17:40 UTC
Re: [PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}
Dario Faggioli writes ("Re: [Xen-devel] [PATCH v4 02/15] libxl: sanitize error handling in libxl_get_max_{cpus, nodes}"):
> Thanks for the explanation, and sorry for not getting what you mean/want
> sooner. It just differs from the way I've done things up to now, so I
> really wasn't sure I was understanding, rather than not wanting to do
> it. :-)

I think it would be helpful for me to expand on and motivate what Ian C has said, which I agree with.

In general, code motion makes it very hard to review substantive changes. It makes it difficult to spot even very small substantive changes.

On the other hand, it is much easier to review a patch that says in its commit message that it contains _only_ code motion and a specifically enumerated set of absolutely required changes (new function beginnings and endings, GC_INIT and GC_FREE, etc. etc. (but don't write "etc. etc." in the commit message - spell it out)).

Firstly, if you know a patch is supposed to be just code motion, it is fairly easy to check this with an ad hoc approach. And secondly, if we trust that the person who wrote the commit message is scrupulous about not introducing other stuff into it, we can take more of it on trust.

And then the non-code-motion patch is much easier to review too, because it just says "add missing LOG calls" and we can verify that that's what it is doing.

In general, IMO, splitting patches up should be done as much as possible, so long as the changes themselves are not interrelated. The reason for this is that reviewing involves understanding the commit message and then reviewing the code changes against the claimed changes from the commit message. This task is much harder when the commit message is a rag-bag of small changes.

Ian.
Ian Jackson
2013-Dec-03 17:48 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
Dario Faggioli writes ("[PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()"):
> to retrieve the actual number of pCPUs on the host.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
>  tools/libxl/libxl.h       |  3 +++
>  tools/libxl/libxl_utils.c | 18 ++++++++++++++++++
>  2 files changed, 21 insertions(+)
>
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index a9663e4..2fab5ba 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -652,6 +652,9 @@ int libxl_domain_preserve(libxl_ctx *ctx, uint32_t domid, libxl_domain_create_in
>  /* get max. number of cpus supported by hypervisor */
>  int libxl_get_max_cpus(libxl_ctx *ctx);
>
> +/* get the actual number of online cpus on the host */
> +int libxl_get_nr_cpus(libxl_ctx *ctx);

This number might be out of date as soon as it is read, won't it?

Ian.
Dario Faggioli
2013-Dec-03 17:52 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
On mar, 2013-12-03 at 17:48 +0000, Ian Jackson wrote:
> Dario Faggioli writes ("[PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()"):
> > to retrieve the actual number of pCPUs on the host.
> >
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> > ---
> >  tools/libxl/libxl.h       |  3 +++
> >  tools/libxl/libxl_utils.c | 18 ++++++++++++++++++
> >  2 files changed, 21 insertions(+)
> >
> > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> > index a9663e4..2fab5ba 100644
> > --- a/tools/libxl/libxl.h
> > +++ b/tools/libxl/libxl.h
> > @@ -652,6 +652,9 @@ int libxl_domain_preserve(libxl_ctx *ctx, uint32_t domid, libxl_domain_create_in
> >  /* get max. number of cpus supported by hypervisor */
> >  int libxl_get_max_cpus(libxl_ctx *ctx);
> >
> > +/* get the actual number of online cpus on the host */
> > +int libxl_get_nr_cpus(libxl_ctx *ctx);
>
> This number might be out of date as soon as it is read, won't it?
>
Quite possible, yes.

So, are you suggesting that we shouldn't even allow the user to read it? Or that I should mention that in the comment? (Or something else?)

Dario
Ian Jackson
2013-Dec-03 17:54 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
Dario Faggioli writes ("Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()"):
> On mar, 2013-12-03 at 17:48 +0000, Ian Jackson wrote:
> > This number might be out of date as soon as it is read, won't it ?
>
> Quite possible, yes.
>
> So, are you suggesting that we shouldn't even allow the user to read it?
> Or that I should mention that in the comment? (Or something else?)

Perhaps I didn't explain my concerns clearly enough.

I wonder what is it for ? Isn't it difficult to use correctly ?

Ian.
George Dunlap
2013-Dec-03 18:09 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
On 12/03/2013 05:54 PM, Ian Jackson wrote:
> Dario Faggioli writes ("Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()"):
> > On mar, 2013-12-03 at 17:48 +0000, Ian Jackson wrote:
> > > This number might be out of date as soon as it is read, won't it ?
> >
> > Quite possible, yes.
> >
> > So, are you suggesting that we shouldn't even allow the user to read it?
> > Or that I should mention that in the comment? (Or something else?)
>
> Perhaps I didn't explain my concerns clearly enough.
>
> I wonder what is it for ? Isn't it difficult to use correctly ?

Dario uses it in the new version of libxl_vcpu_set_affinity() to limit
what was considered an "unreachable cpu" in its warning (v5 14/17).
That is, if you set the affinity to 1111111111111111111111..., and
there are only 4 pcpus, it will return 111100000..... These don't
match, and yet there are no unreachable cpus. So it asks nr_cpus first,
then only compares bits 1..[nr_cpus-1].

I'm not sure how often that would change, but if there was a race, I
think it would just be a spurious error message (or lack thereof).

 -George
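The comparison George describes can be sketched as self-contained C. The helpers below are illustrative, not the actual libxl code; the point is that only the first nr_cpus bits of the byte-array cpumap are examined, so a map that is merely wider than the number of online pcpus never triggers the warning:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper (not the actual libxl code): test one bit of a
 * little-endian byte-array cpumap. */
static int map_test_bit(const uint8_t *map, int bit)
{
    return (map[bit / 8] >> (bit % 8)) & 1;
}

/* Compare only bits 0 .. nr_cpus-1: a cpu is "unreachable" only when it
 * was requested, exists (i.e. its index is below nr_cpus), and did not
 * end up in the effective affinity. */
static int any_unreachable_cpu(const uint8_t *requested,
                               const uint8_t *effective, int nr_cpus)
{
    for (int i = 0; i < nr_cpus; i++)
        if (map_test_bit(requested, i) && !map_test_bit(effective, i))
            return 1;   /* cpu i was asked for but not granted */
    return 0;
}
```

With George's example (a 16-bit-wide "all cpus" request on a 4-pcpu host), comparing all 16 bits would flag 12 phantom cpus, while comparing only the first nr_cpus bits correctly reports none unreachable.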
Dario Faggioli
2013-Dec-03 18:15 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
On mar, 2013-12-03 at 17:54 +0000, Ian Jackson wrote:
> Dario Faggioli writes ("Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()"):
> > On mar, 2013-12-03 at 17:48 +0000, Ian Jackson wrote:
> > > This number might be out of date as soon as it is read, won't it ?
> >
> > Quite possible, yes.
> >
> > So, are you suggesting that we shouldn't even allow the user to read it?
> > Or that I should mention that in the comment? (Or something else?)
>
> Perhaps I didn't explain my concerns clearly enough.
>
> I wonder what is it for ? Isn't it difficult to use correctly ?
>
It is indeed. In fact, it is for now used only in the dryrun_only path
of the (newly introduced by this series) `xl vcpu-pin' command and,
later in the series, for the sake of providing some LOG_DEBUG or
LOG_WARN info.

Unfortunately, there are not many alternatives for figuring out which
of the bits in a libxl_bitmap (a cpumap, in this case) are set to 0
because the pcpu in question is up and running but just not part of
the mask, and which are 0 only because, say, the map is 64 bits wide
while we only have 8 pcpus. :-/

I really don't think there are better ways to solve this at the libxl
layer and, if we want to improve the situation, we should probably
amend the handling (not the API, hopefully) of libxl_bitmap in a more
fundamental way. I'm certainly up for it, but perhaps not at this
stage in the code freeze :-/

However, again, all this drives is a dryrun_only mode and some
debug/warn output, so I really think there is no real harm. On an IRC
conversation, IanJ seems to be convinced about this too:

 Diziet: Eg if you run it while also adding a cpu ?
 dariof: Diziet: it's in the dry_run path
 dariof: Diziet: so, when it prints the cpumap that would have been
         used (if not in dryrun_only mode), like this:
         print_bitmap(cpumap.map, nb_cpu, stdout);
 dariof: Diziet: it's possible for the output to not be consistent
         with the new situation
 dariof: Diziet: basically, if it asks for the number of online cpus,
         it gets 3, and then the mask is <0111>, it will print "all"
 Diziet: I think you have convinced me that this is too intractable a
         problem to solve at this level of the API and that therefore
         the call you propose to add is fine.
 Diziet: Shall I c&p this scroll into an email with my ack ?

I cut-&-pasted this with Ian's permission, but I'd rather avoid
providing a formal "Acked-by: Ian Jackson" line myself! :-P

Ian, please :-)

Thanks and Regards,
Dario
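The print_bitmap() decision discussed in the IRC excerpt can be sketched as follows. This is an illustrative stand-in, not the real xl code: if every one of the nr_cpus online pcpus is set in the map, the whole map collapses to "all"; and if nr_cpus was read just before a cpu came online, the summary can be stale, which is exactly the race under discussion:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch (not the real xl print_bitmap()): summarise a
 * cpumap given the number of online pcpus.  Bits at index >= nr_cpus
 * are ignored, so a <0111> mask on a 3-pcpu host reads as "all". */
static const char *cpumap_summary(const uint8_t *map, int nr_cpus)
{
    for (int i = 0; i < nr_cpus; i++)
        if (!((map[i / 8] >> (i % 8)) & 1))
            return "partial";   /* at least one online pcpu not in map */
    return "all";
}
```

Note how the answer flips from "all" to "partial" for the very same map if a fourth pcpu comes online between reading nr_cpus and printing, which is why the worst outcome of the race is misleading (dry-run or debug) output rather than wrong behaviour.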
Ian Jackson
2013-Dec-03 18:16 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
Dario Faggioli writes ("Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()"):
> On an IRC conversation, IanJ seems to be convinced about this too:
...
> Diziet: Eg if you run it while also adding a cpu ?
> dariof: Diziet: it's in the dry_run path
> dariof: Diziet: so, when it prints the cpumap that would have been used (if not in dryrun_only mode), like this: print_bitmap(cpumap.map, nb_cpu, stdout);
> dariof: Diziet: it's possible for the output to not be consistent with the new situation
> dariof: Diziet: basically, if it asks for the number of online cpus, it gets 3, and then the mask is <0111>, it will print "all"
> Diziet: I think you have convinced me that this is too intractable a problem to solve at this level of the API and that therefore the call you propose to add is fine.
> Diziet: Shall I c&p this scroll into an email with my ack ?
>
> I cut-&-pasted this with Ian's permission, but I'd rather avoid
> providing a formal "Acked-by: Ian Jackson" line myself! :-P

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Thanks,
Ian.
Konrad Rzeszutek Wilk
2013-Dec-03 18:17 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
On Tue, Dec 03, 2013 at 06:09:10PM +0000, George Dunlap wrote:
> On 12/03/2013 05:54 PM, Ian Jackson wrote:
> > Perhaps I didn't explain my concerns clearly enough.
> >
> > I wonder what is it for ? Isn't it difficult to use correctly ?
>
> Dario uses it in the new version of libxl_vcpu_set_affinity() to
> limit what was considered an "unreachable cpu" in its warning. (v5
> 14/17) That is, if you set the affinity to
> 1111111111111111111111..., and there are only 4 pcpus, it will
> return 111100000..... These don't match, and yet there are no
> unreachable cpus. So it asks nr_cpus first, then only compares bits
> 1..[nr_cpus-1].

What about the inverse? 2 PCPUs and 16 VCPUs? Won't that still throw
this calculation off?

> I'm not sure how often that would change, but if there was a race, I
> think it would just be a spurious error message (or lack thereof).
>
>  -George
Dario Faggioli
2013-Dec-03 18:19 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
On mar, 2013-12-03 at 18:09 +0000, George Dunlap wrote:
> Dario uses it in the new version of libxl_vcpu_set_affinity() to limit
> what was considered an "unreachable cpu" in its warning. (v5 14/17)
> That is, if you set the affinity to 1111111111111111111111..., and
> there are only 4 pcpus, it will return 111100000..... These don't
> match, and yet there are no unreachable cpus. So it asks nr_cpus
> first, then only compares bits 1..[nr_cpus-1].
>
Yes, and also in the new dryrun_only mode I'm introducing for
`xl vcpu-pin', as print_bitmap() in xl wants to know the total number
of pcpus so that, if the mask is <11110000> but we only have 4 pcpus,
it can print "all".

> I'm not sure how often that would change, but if there was a race, I
> think it would just be a spurious error message (or lack thereof).
>
Exactly. Also, as mentioned in the other e-mail I just sent, I'm up for
trying to figure out a better way to handle this as soon as 4.5 begins.

Thanks and Regards,
Dario
George Dunlap
2013-Dec-03 18:22 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
On Tue, Dec 3, 2013 at 6:17 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Tue, Dec 03, 2013 at 06:09:10PM +0000, George Dunlap wrote:
> > Dario uses it in the new version of libxl_vcpu_set_affinity() to
> > limit what was considered an "unreachable cpu" in its warning. (v5
> > 14/17) That is, if you set the affinity to
> > 1111111111111111111111..., and there are only 4 pcpus, it will
> > return 111100000..... These don't match, and yet there are no
> > unreachable cpus. So it asks nr_cpus first, then only compares bits
> > 1..[nr_cpus-1].
>
> What about the inverse? 2 PCPUs and 16 VCPUs? Won't that still throw
> this calculation off?

Both things being compared are pcpus.

 -George
Dario Faggioli
2013-Dec-03 18:26 UTC
Re: [PATCH v4 03/15] libxl: introduce libxl_get_nr_cpus()
On mar, 2013-12-03 at 13:17 -0500, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 03, 2013 at 06:09:10PM +0000, George Dunlap wrote:
> > Dario uses it in the new version of libxl_vcpu_set_affinity() to
> > limit what was considered an "unreachable cpu" in its warning. (v5
> > 14/17) That is, if you set the affinity to
> > 1111111111111111111111..., and there are only 4 pcpus, it will
> > return 111100000..... These don't match, and yet there are no
> > unreachable cpus. So it asks nr_cpus first, then only compares bits
> > 1..[nr_cpus-1].
>
> What about the inverse? 2 PCPUs and 16 VCPUs? Won't that still throw
> this calculation off?
>
That's not how it works: it is not the number of vcpus against the
number of pcpus that matters. As long as your 16 vcpus have affinity
with some _real_, _existing_ and _online_ pcpus, that is fine, and the
situation you mention won't cause any problem.

OTOH, if you offline any of the pcpus in the process, then no matter
the pcpus-vs-vcpus ratio, you risk getting a spurious warning (and
only that).

Regards,
Dario