This series of patches introduces a vNUMA topology implementation and provides the interfaces and data structures that expose a virtual topology to PV guests, enabling the guest OS to use its own NUMA placement mechanisms. vNUMA topology support for Linux PV guests comes in a separate patch. Please review and send your comments.

Introduction
-------------
vNUMA topology is exposed to the PV guest to improve performance when running workloads on NUMA machines. The Xen vNUMA implementation provides a way to create vNUMA-enabled guests on NUMA/UMA machines and to map the vNUMA topology onto physical NUMA in an optimal way.

Xen vNUMA support
The current set of patches introduces a subop hypercall that is available to enlightened PV guests with the vNUMA patches applied. The domain structure was modified to hold the per-domain vNUMA topology for use by other vNUMA-aware subsystems (e.g. ballooning).

libxc
libxc provides interfaces to build PV guests with vNUMA support and, on NUMA machines, performs the initial memory allocation on physical NUMA nodes. This is implemented by trying to allocate all vnodes on one NUMA node; in case of insufficient memory, vnodes are allocated on other physical nodes with enough memory. If no physical node has enough memory, the allocation is done using the default mechanism and vnodes may have pages allocated on different nodes.

libxl
libxl provides a way to predefine the vNUMA topology in the VM config: number of vnodes, memory arrangement, vcpu-to-vnode assignment, and the distance map.

PV guest
As of now, only PV guests can take advantage of the vNUMA functionality. The vNUMA Linux patches should be applied and NUMA support should be compiled into the kernel.

Patchset v. 0.1
---------------

[PATCH RFC 1/7] xen/vnuma: subop hypercall and vnuma topology structures
Defines the XENMEM subop hypercall for PV vNUMA-enabled guests and provides vNUMA topology information from the per-domain vnuma topology build info.

[PATCH RFC 2/7] xen/vnuma: domctl subop for vnuma setup
Defines the domctl subop hypercall for the per-domain vNUMA topology construct.

[PATCH RFC 3/7] libxc/vnuma: per-domain vnuma structures
Makes use of the domctl vnuma subop and initializes the per-domain vnuma topology.

[PATCH RFC 4/7] libxl/vnuma: vnuma domain config and pre-build
Defines VM config options for vNUMA PV domain creation (vnodes, vnumamem, vnuma_distance, vcpu_to_vnode).

[PATCH RFC 5/7] libxl/vnuma: vnuma enabler
Enables the libxl vnuma ABI via LIBXL_HAVE_BUILDINFO_VNUMA.

[PATCH RFC 6/7] libxc/vnuma: vnuma per phys NUMA allocation
Allows vNUMA-enabled domains to allocate vnodes on physical NUMA nodes. Tries to allocate all vnodes on one NUMA node, or on the next one if not all vnodes fit. If no suitable physical NUMA node is found, lets Xen decide.

[PATCH RFC 7/7] xen/vnuma: basic vnuma debug info
Prints basic vnuma info per domain on 'debug-keys u'.

TODO:
---------------
- initial boot memory allocation algorithm for vNUMA nodes on physical nodes;
- Linux vnuma memblocks and e820 holes need testing;
- move the XENMEM subop hypercall in Xen to a sysctl subop;
- some kind of statistics for vnuma-enabled guests, xl info/list;
- take into account cpu pinning if defined in the VM config;
- take into account the automatic NUMA placement mechanism;
- tests for arch-dependent pieces;
- help files;

Elena Ufimtseva (7):
  xen/vnuma: subop hypercall and vnuma topology structures.
  xen/vnuma: domctl subop for vnuma setup.
  libxc/vnuma: per-domain vnuma structures.
  libxl/vnuma: vnuma domain config
  libxl/vnuma: vnuma enabler.
  libxc/vnuma: vnuma per phys NUMA allocation.
  xen/vnuma: basic vnuma debug info

 tools/libxc/xc_dom.h         |   10 +++
 tools/libxc/xc_dom_x86.c     |   79 +++++++++++++++--
 tools/libxc/xc_domain.c      |   63 ++++++++++++++
 tools/libxc/xenctrl.h        |   17 ++++
 tools/libxc/xg_private.h     |    4 +
 tools/libxl/libxl.c          |   28 ++++++
 tools/libxl/libxl.h          |   23 +++++
 tools/libxl/libxl_arch.h     |    6 ++
 tools/libxl/libxl_dom.c      |  115 ++++++++++++++++++++++--
 tools/libxl/libxl_internal.h |    3 +
 tools/libxl/libxl_types.idl  |    6 +-
 tools/libxl/libxl_x86.c      |   91 +++++++++++++++++++
 tools/libxl/xl_cmdimpl.c     |  197 +++++++++++++++++++++++++++++++++++++++++-
 xen/arch/x86/numa.c          |   16 +++-
 xen/common/domain.c          |    6 ++
 xen/common/domctl.c          |   72 ++++++++++++++-
 xen/common/memory.c          |   90 ++++++++++++++++++-
 xen/include/public/domctl.h  |   15 +++-
 xen/include/public/memory.h  |    1 +
 xen/include/public/vnuma.h   |   12 +++
 xen/include/xen/domain.h     |    9 ++
 xen/include/xen/sched.h      |    1 +
 xen/include/xen/vnuma.h      |   27 ++++++
 23 files changed, 871 insertions(+), 20 deletions(-)
 create mode 100644 xen/include/public/vnuma.h
 create mode 100644 xen/include/xen/vnuma.h

-- 
1.7.10.4
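For readers skimming the series, the initial placement policy described in the cover letter (all vnodes on one pnode if it has enough free memory, otherwise spread vnodes across pnodes that have room, otherwise fall back to Xen's default allocation) boils down to something like the sketch below. It is illustrative only; the function name and parameters are made up and do not appear in the patches.

#include <stdint.h>

/* Illustrative only: rough shape of the vnode -> pnode placement policy
 * described in the cover letter. NUMA_NO_NODE means "let Xen decide". */
#define NUMA_NO_NODE 0xFF

static void place_vnodes(unsigned int nr_vnodes, const uint64_t *vmem_sz,
                         unsigned int nr_pnodes, uint64_t *pnode_free,
                         int *vnode_to_pnode)
{
    unsigned int i, n;
    uint64_t total = 0;

    for ( i = 0; i < nr_vnodes; i++ )
    {
        vnode_to_pnode[i] = NUMA_NO_NODE;
        total += vmem_sz[i];
    }

    /* 1. Prefer a single pnode that can hold the whole guest. */
    for ( n = 0; n < nr_pnodes; n++ )
        if ( pnode_free[n] >= total )
        {
            for ( i = 0; i < nr_vnodes; i++ )
                vnode_to_pnode[i] = n;
            return;
        }

    /* 2. Otherwise, first-fit each vnode on any pnode with room.
     *    Unplaced vnodes keep NUMA_NO_NODE and fall back to the
     *    default allocator. */
    for ( i = 0; i < nr_vnodes; i++ )
        for ( n = 0; n < nr_pnodes; n++ )
            if ( pnode_free[n] >= vmem_sz[i] )
            {
                vnode_to_pnode[i] = n;
                pnode_free[n] -= vmem_sz[i];
                break;
            }
}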
Elena Ufimtseva
2013-Aug-27 07:54 UTC
[PATCH RFC 1/7] xen/vnuma: subop hypercall and vnuma topology structures.
Defines XENMEM subop hypercall for PV vNUMA enabled guests and provides vNUMA topology information from per-domain vnuma topology build info. TODO: subop XENMEM hypercall is subject to change to sysctl subop. Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com> --- xen/common/memory.c | 90 ++++++++++++++++++++++++++++++++++++++++++- xen/include/public/memory.h | 1 + xen/include/public/vnuma.h | 12 ++++++ xen/include/xen/domain.h | 9 +++++ xen/include/xen/sched.h | 1 + xen/include/xen/vnuma.h | 27 +++++++++++++ 6 files changed, 139 insertions(+), 1 deletion(-) create mode 100644 xen/include/public/vnuma.h create mode 100644 xen/include/xen/vnuma.h diff --git a/xen/common/memory.c b/xen/common/memory.c index 50b740f..c7fbe11 100644 --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -28,6 +28,7 @@ #include <public/memory.h> #include <xsm/xsm.h> #include <xen/trace.h> +#include <xen/vnuma.h> struct memop_args { /* INPUT */ @@ -732,7 +733,94 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg) rcu_unlock_domain(d); break; - + case XENMEM_get_vnuma_info: + { + int i; + struct vnuma_topology_info mtopology; + struct vnuma_topology_info touser_topo; + struct domain *d; + unsigned int max_pages; + vnuma_memblk_t *vblks; + XEN_GUEST_HANDLE(int) vdistance; + XEN_GUEST_HANDLE_PARAM(int) vdist_param; + XEN_GUEST_HANDLE(vnuma_memblk_t) buf; + XEN_GUEST_HANDLE_PARAM(vnuma_memblk_t) buf_param; + XEN_GUEST_HANDLE(int) vcpu_to_vnode; + XEN_GUEST_HANDLE_PARAM(int) vmap_param; + + rc = -1; + if ( guest_handle_is_null(arg) ) + return rc; + if( copy_from_guest(&mtopology, arg, 1) ) + { + gdprintk(XENLOG_INFO, "Cannot get copy_from_guest..\n"); + return -EFAULT; + } + gdprintk(XENLOG_INFO, "Domain id is %d\n",mtopology.domid); + if ( (d = rcu_lock_domain_by_any_id(mtopology.domid)) == NULL ) + { + gdprintk(XENLOG_INFO, "Numa: Could not get domain id.\n"); + return -ESRCH; + } + rcu_unlock_domain(d); + touser_topo.nr_vnodes = d->vnuma.nr_vnodes; + rc = copy_to_guest(arg, &touser_topo, 1); + if ( rc ) + { + gdprintk(XENLOG_INFO, "Bad news, could not copy to guest NUMA info\n"); + return -EFAULT; + } + max_pages = d->max_pages; + if ( touser_topo.nr_vnodes == 0 || touser_topo.nr_vnodes > d->max_vcpus ) + { + gdprintk(XENLOG_INFO, "vNUMA: Error in block creation - vnodes %d, vcpus %d \n", touser_topo.nr_vnodes, d->max_vcpus); + return -EFAULT; + } + vblks = (vnuma_memblk_t *)xmalloc_array(struct vnuma_memblk, touser_topo.nr_vnodes); + if ( vblks == NULL ) + { + gdprintk(XENLOG_INFO, "vNUMA: Could not get memory for memblocks\n"); + return -1; + } + buf_param = guest_handle_cast(mtopology.vnuma_memblks, vnuma_memblk_t); + buf = guest_handle_from_param(buf_param, vnuma_memblk_t); + for ( i = 0; i < touser_topo.nr_vnodes; i++ ) + { + gdprintk(XENLOG_INFO, "vmemblk[%d] start %#lx end %#lx\n", i, d->vnuma.vnuma_memblks[i].start, d->vnuma.vnuma_memblks[i].end); + if ( copy_to_guest_offset(buf, i, &d->vnuma.vnuma_memblks[i], 1) ) + { + gdprintk(XENLOG_INFO, "Failed to copy to guest vmemblk[%d]\n", i); + goto out; + } + } + vdist_param = guest_handle_cast(mtopology.vdistance, int); + vdistance = guest_handle_from_param(vdist_param, int); + for ( i = 0; i < touser_topo.nr_vnodes * touser_topo.nr_vnodes; i++ ) + { + if ( copy_to_guest_offset(vdistance, i, &d->vnuma.vdistance[i], 1) ) + { + gdprintk(XENLOG_INFO, "Failed to copy to guest vdistance[%d]\n", i); + goto out; + } + } + vmap_param = guest_handle_cast(mtopology.vcpu_to_vnode, int); + vcpu_to_vnode = guest_handle_from_param(vmap_param, int); + 
for ( i = 0; i < d->max_vcpus ; i++ ) + { + if ( copy_to_guest_offset(vcpu_to_vnode, i, &d->vnuma.vcpu_to_vnode[i], 1) ) + { + gdprintk(XENLOG_INFO, "Failed to copy to guest vcputonode[%d]\n", i); + goto out; + } + else + gdprintk(XENLOG_INFO, "Copied map [%d] = %x\n", i, d->vnuma.vcpu_to_vnode[i]); + } + return rc; +out: + if ( vblks ) xfree(vblks); + return rc; + break; + } default: rc = arch_memory_op(op, arg); break; diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h index 7a26dee..30cb8af 100644 --- a/xen/include/public/memory.h +++ b/xen/include/public/memory.h @@ -453,6 +453,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t); * Caller must be privileged or the hypercall fails. */ #define XENMEM_claim_pages 24 +#define XENMEM_get_vnuma_info 25 /* * XENMEM_claim_pages flags - the are no flags at this time. diff --git a/xen/include/public/vnuma.h b/xen/include/public/vnuma.h new file mode 100644 index 0000000..a88dfe2 --- /dev/null +++ b/xen/include/public/vnuma.h @@ -0,0 +1,12 @@ +#ifndef __XEN_PUBLIC_VNUMA_H +#define __XEN_PUBLIC_VNUMA_H + +#include "xen.h" + +struct vnuma_memblk { + uint64_t start, end; +}; +typedef struct vnuma_memblk vnuma_memblk_t; +DEFINE_XEN_GUEST_HANDLE(vnuma_memblk_t); + +#endif diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h index a057069..3d39218 100644 --- a/xen/include/xen/domain.h +++ b/xen/include/xen/domain.h @@ -4,6 +4,7 @@ #include <public/xen.h> #include <asm/domain.h> +#include <public/vnuma.h> typedef union { struct vcpu_guest_context *nat; @@ -89,4 +90,12 @@ extern unsigned int xen_processor_pmbits; extern bool_t opt_dom0_vcpus_pin; +struct domain_vnuma_info { + uint16_t nr_vnodes; + int *vdistance; + vnuma_memblk_t *vnuma_memblks; + int *vcpu_to_vnode; + int *vnode_to_pnode; +}; + #endif /* __XEN_DOMAIN_H__ */ diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index ae6a3b8..cb023cf 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -377,6 +377,7 @@ struct domain nodemask_t node_affinity; unsigned int last_alloc_node; spinlock_t node_affinity_lock; + struct domain_vnuma_info vnuma; }; struct domain_setup_info diff --git a/xen/include/xen/vnuma.h b/xen/include/xen/vnuma.h new file mode 100644 index 0000000..f1ab531 --- /dev/null +++ b/xen/include/xen/vnuma.h @@ -0,0 +1,27 @@ +#ifndef _VNUMA_H +#define _VNUMA_H +#include <public/vnuma.h> + +/* DEFINE_XEN_GUEST_HANDLE(vnuma_memblk_t); */ + +struct vnuma_topology_info { + domid_t domid; + uint16_t nr_vnodes; + XEN_GUEST_HANDLE_64(vnuma_memblk_t) vnuma_memblks; + XEN_GUEST_HANDLE_64(int) vdistance; + XEN_GUEST_HANDLE_64(int) vcpu_to_vnode; + XEN_GUEST_HANDLE_64(int) vnode_to_pnode; +}; +typedef struct vnuma_topology_info vnuma_topology_info_t; +DEFINE_XEN_GUEST_HANDLE(vnuma_topology_info_t); + +#define __vnode_distance_offset(_dom, _i, _j) \ + ( ((_j)*((_dom)->vnuma.nr_vnodes)) + (_i) ) + +#define __vnode_distance(_dom, _i, _j) \ + ( (_dom)->vnuma.vdistance[__vnode_distance_offset((_dom), (_i), (_j))] ) + +#define __vnode_distance_set(_dom, _i, _j, _v) \ + do { __vnode_distance((_dom), (_i), (_j)) = (_v); } while(0) + +#endif -- 1.7.10.4
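For context, a PV guest consumer of this subop (posted separately as the Linux vNUMA patches) would look roughly like the sketch below. It assumes the guest has imported the new vnuma_topology_info/vnuma_memblk definitions and it glosses over the exact guest-handle macros available in the kernel headers; the function name and buffer sizing are illustrative, and error unwinding is elided.

/* Rough guest-side sketch (not part of this series): fetch the vNUMA
 * topology via XENMEM_get_vnuma_info. Buffers are assumed to be sized
 * from known bounds (e.g. possible cpus / a node limit). */
static int fetch_vnuma_topology(unsigned int max_nodes, unsigned int max_cpus)
{
    struct vnuma_topology_info topo = { .domid = DOMID_SELF };
    struct vnuma_memblk *memblks;
    int *vdistance, *vcpu_to_vnode;
    int rc;

    memblks = kcalloc(max_nodes, sizeof(*memblks), GFP_KERNEL);
    vdistance = kcalloc(max_nodes * max_nodes, sizeof(*vdistance), GFP_KERNEL);
    vcpu_to_vnode = kcalloc(max_cpus, sizeof(*vcpu_to_vnode), GFP_KERNEL);
    if ( !memblks || !vdistance || !vcpu_to_vnode )
        return -ENOMEM;

    set_xen_guest_handle(topo.vnuma_memblks, memblks);
    set_xen_guest_handle(topo.vdistance, vdistance);
    set_xen_guest_handle(topo.vcpu_to_vnode, vcpu_to_vnode);

    rc = HYPERVISOR_memory_op(XENMEM_get_vnuma_info, &topo);
    if ( rc == 0 )
        pr_info("vNUMA: %u virtual nodes\n", topo.nr_vnodes);
    /* ... register nodes, memblocks and cpu map with the NUMA code ... */
    return rc;
}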
Elena Ufimtseva
2013-Aug-27 07:54 UTC
[PATCH RFC 2/7] xen/vnuma: domctl subop for vnuma setup.
Defines domctl subop hypercall for per-domain vNUMA topology construct. Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com> --- xen/common/domain.c | 6 ++++ xen/common/domctl.c | 72 ++++++++++++++++++++++++++++++++++++++++++- xen/include/public/domctl.h | 15 ++++++++- 3 files changed, 91 insertions(+), 2 deletions(-) diff --git a/xen/common/domain.c b/xen/common/domain.c index 9390a22..f0c0a79 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -227,6 +227,11 @@ struct domain *domain_create( spin_lock_init(&d->node_affinity_lock); d->node_affinity = NODE_MASK_ALL; d->auto_node_affinity = 1; + d->vnuma.vnuma_memblks = NULL; + d->vnuma.vnode_to_pnode = NULL; + d->vnuma.vcpu_to_vnode = NULL; + d->vnuma.vdistance = NULL; + d->vnuma.nr_vnodes = 0; spin_lock_init(&d->shutdown_lock); d->shutdown_code = -1; @@ -532,6 +537,7 @@ int domain_kill(struct domain *d) tmem_destroy(d->tmem); domain_set_outstanding_pages(d, 0); d->tmem = NULL; + /* TODO: vnuma_destroy(d->vnuma); */ /* fallthrough */ case DOMDYING_dying: rc = domain_relinquish_resources(d); diff --git a/xen/common/domctl.c b/xen/common/domctl.c index 9760d50..b552e60 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -29,6 +29,7 @@ #include <asm/page.h> #include <public/domctl.h> #include <xsm/xsm.h> +#include <xen/vnuma.h> static DEFINE_SPINLOCK(domctl_lock); DEFINE_SPINLOCK(vcpu_alloc_lock); @@ -862,7 +863,76 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) ret = set_global_virq_handler(d, virq); } break; - + case XEN_DOMCTL_setvnumainfo: + { + int i, j; + int dist_size; + int dist, vmap, vntop; + vnuma_memblk_t vmemblk; + + ret = -EFAULT; + dist = i = j = 0; + if (op->u.vnuma.nr_vnodes <= 0 || op->u.vnuma.nr_vnodes > NR_CPUS) + break; + d->vnuma.nr_vnodes = op->u.vnuma.nr_vnodes; + dist_size = d->vnuma.nr_vnodes * d->vnuma.nr_vnodes; + if ( (d->vnuma.vdistance = xmalloc_bytes(sizeof(*d->vnuma.vdistance) * dist_size) ) == NULL) + break; + for ( i = 0; i < d->vnuma.nr_vnodes; i++ ) + for ( j = 0; j < d->vnuma.nr_vnodes; j++ ) + { + if ( unlikely(__copy_from_guest_offset(&dist, op->u.vnuma.vdistance, __vnode_distance_offset(d, i, j), 1)) ) + { + gdprintk(XENLOG_INFO, "vNUMA: Copy distance table error\n"); + goto err_dom; + } + __vnode_distance_set(d, i, j, dist); + } + if ( (d->vnuma.vnuma_memblks = xmalloc_bytes(sizeof(*d->vnuma.vnuma_memblks) * d->vnuma.nr_vnodes)) == NULL ) + goto err_dom; + for ( i = 0; i < d->vnuma.nr_vnodes; i++ ) + { + if ( unlikely(__copy_from_guest_offset(&vmemblk, op->u.vnuma.vnuma_memblks, i, 1)) ) + { + gdprintk(XENLOG_INFO, "vNUMA: memory size error\n"); + goto err_dom; + } + d->vnuma.vnuma_memblks[i].start = vmemblk.start; + d->vnuma.vnuma_memblks[i].end = vmemblk.end; + } + if ( (d->vnuma.vcpu_to_vnode = xmalloc_bytes(sizeof(*d->vnuma.vcpu_to_vnode) * d->max_vcpus)) == NULL ) + goto err_dom; + for ( i = 0; i < d->max_vcpus; i++ ) + { + if ( unlikely(__copy_from_guest_offset(&vmap, op->u.vnuma.vcpu_to_vnode, i, 1)) ) + { + gdprintk(XENLOG_INFO, "vNUMA: vcputovnode map error\n"); + goto err_dom; + } + d->vnuma.vcpu_to_vnode[i] = vmap; + } + if ( !guest_handle_is_null(op->u.vnuma.vnode_to_pnode) ) + { + if ( (d->vnuma.vnode_to_pnode = xmalloc_bytes(sizeof(*d->vnuma.vnode_to_pnode) * d->vnuma.nr_vnodes)) == NULL ) + goto err_dom; + for ( i = 0; i < d->vnuma.nr_vnodes; i++ ) + { + if ( unlikely(__copy_from_guest_offset(&vntop, op->u.vnuma.vnode_to_pnode, i, 1)) ) + { + gdprintk(XENLOG_INFO, "vNUMA: vnode_t_pnode map error\n"); + goto err_dom; + } + d->vnuma.vnode_to_pnode[i] = 
vntop; + } + } + else + d->vnuma.vnode_to_pnode = NULL; + ret = 0; + break; +err_dom: + ret = -EINVAL; + } + break; default: ret = arch_do_domctl(op, d, u_domctl); break; diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h index 4c5b2bb..a034688 100644 --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -35,6 +35,7 @@ #include "xen.h" #include "grant_table.h" #include "hvm/save.h" +#include "xen/vnuma.h" #define XEN_DOMCTL_INTERFACE_VERSION 0x00000009 @@ -852,6 +853,17 @@ struct xen_domctl_set_broken_page_p2m { typedef struct xen_domctl_set_broken_page_p2m xen_domctl_set_broken_page_p2m_t; DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_broken_page_p2m_t); +struct xen_domctl_vnuma { + uint16_t nr_vnodes; + XEN_GUEST_HANDLE_64(int) vdistance; + XEN_GUEST_HANDLE_64(vnuma_memblk_t) vnuma_memblks; + XEN_GUEST_HANDLE_64(int) vcpu_to_vnode; + XEN_GUEST_HANDLE_64(int) vnode_to_pnode; +}; + +typedef struct xen_domctl_vnuma xen_domctl_vnuma_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_vnuma_t); + struct xen_domctl { uint32_t cmd; #define XEN_DOMCTL_createdomain 1 @@ -920,6 +932,7 @@ struct xen_domctl { #define XEN_DOMCTL_set_broken_page_p2m 67 #define XEN_DOMCTL_setnodeaffinity 68 #define XEN_DOMCTL_getnodeaffinity 69 +#define XEN_DOMCTL_setvnumainfo 70 #define XEN_DOMCTL_gdbsx_guestmemio 1000 #define XEN_DOMCTL_gdbsx_pausevcpu 1001 #define XEN_DOMCTL_gdbsx_unpausevcpu 1002 @@ -979,6 +992,7 @@ struct xen_domctl { struct xen_domctl_set_broken_page_p2m set_broken_page_p2m; struct xen_domctl_gdbsx_pauseunp_vcpu gdbsx_pauseunp_vcpu; struct xen_domctl_gdbsx_domstatus gdbsx_domstatus; + struct xen_domctl_vnuma vnuma; uint8_t pad[128]; } u; }; @@ -986,7 +1000,6 @@ typedef struct xen_domctl xen_domctl_t; DEFINE_XEN_GUEST_HANDLE(xen_domctl_t); #endif /* __XEN_PUBLIC_DOMCTL_H__ */ - /* * Local variables: * mode: C -- 1.7.10.4
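The domain_kill() hunk above leaves vnuma_destroy() as a TODO; given the allocations made by XEN_DOMCTL_setvnumainfo, a minimal sketch of that helper might be:

/* Minimal sketch of the vnuma_destroy() helper referenced in the
 * domain_kill() TODO: free what XEN_DOMCTL_setvnumainfo allocated.
 * xfree() tolerates NULL, so partially built topologies are fine. */
static void vnuma_destroy(struct domain_vnuma_info *v)
{
    v->nr_vnodes = 0;
    xfree(v->vdistance);
    v->vdistance = NULL;
    xfree(v->vnuma_memblks);
    v->vnuma_memblks = NULL;
    xfree(v->vcpu_to_vnode);
    v->vcpu_to_vnode = NULL;
    xfree(v->vnode_to_pnode);
    v->vnode_to_pnode = NULL;
}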
Elena Ufimtseva
2013-Aug-27 07:54 UTC
[PATCH RFC 3/7] libxc/vnuma: per-domain vnuma structures.
Makes use of domctl vnuma subop and initializes per-domain vnuma topology. Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com> --- tools/libxc/xc_dom.h | 9 +++++++ tools/libxc/xc_domain.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++ tools/libxc/xenctrl.h | 17 +++++++++++++ 3 files changed, 89 insertions(+) diff --git a/tools/libxc/xc_dom.h b/tools/libxc/xc_dom.h index 86e23ee..4375f25 100644 --- a/tools/libxc/xc_dom.h +++ b/tools/libxc/xc_dom.h @@ -114,6 +114,15 @@ struct xc_dom_image { struct xc_dom_phys *phys_pages; int realmodearea_log; + /* vNUMA topology and memory allocation structure + * Defines the way to allocate XEN + * memory from phys NUMA nodes by providing mask + * vnuma_to_pnuma */ + int nr_vnodes; + struct vnuma_memblk *vnumablocks; + uint64_t *vmemsizes; + int *vnode_to_pnode; + /* malloc memory pool */ struct xc_dom_mem *memblocks; diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index 3257e2a..98445e3 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -24,6 +24,7 @@ #include "xg_save_restore.h" #include <xen/memory.h> #include <xen/hvm/hvm_op.h> +#include "xg_private.h" int xc_domain_create(xc_interface *xch, uint32_t ssidref, @@ -1629,6 +1630,68 @@ int xc_domain_set_virq_handler(xc_interface *xch, uint32_t domid, int virq) return do_domctl(xch, &domctl); } +/* Informs XEN that domain is vNUMA aware */ +int xc_domain_setvnodes(xc_interface *xch, + uint32_t domid, + uint16_t nr_vnodes, + uint16_t nr_vcpus, + vnuma_memblk_t *vmemblks, + int *vdistance, + int *vcpu_to_vnode, + int *vnode_to_pnode) +{ + int rc; + DECLARE_DOMCTL; + DECLARE_HYPERCALL_BUFFER(int, distbuf); + DECLARE_HYPERCALL_BUFFER(vnuma_memblk_t, membuf); + DECLARE_HYPERCALL_BUFFER(int, vcpumapbuf); + DECLARE_HYPERCALL_BUFFER(int, vntopbuf); + + rc = -EINVAL; + memset(&domctl, 0, sizeof(domctl)); + if ( vdistance == NULL || vcpu_to_vnode == NULL || vmemblks == NULL ) + /* vnode_to_pnode can be null on non-NUMA machines */ + { + PERROR("Parameters are wrong XEN_DOMCTL_setvnumainfo\n"); + return -EINVAL; + } + distbuf = xc_hypercall_buffer_alloc + (xch, distbuf, sizeof(*vdistance) * nr_vnodes * nr_vnodes); + membuf = xc_hypercall_buffer_alloc + (xch, membuf, sizeof(*membuf) * nr_vnodes); + vcpumapbuf = xc_hypercall_buffer_alloc + (xch, vcpumapbuf, sizeof(*vcpu_to_vnode) * nr_vcpus); + vntopbuf = xc_hypercall_buffer_alloc + (xch, vntopbuf, sizeof(*vnode_to_pnode) * nr_vnodes); + + if (distbuf == NULL || membuf == NULL || vcpumapbuf == NULL || vntopbuf == NULL ) + { + PERROR("Could not allocate memory for xc hypercall XEN_DOMCTL_setvnumainfo\n"); + goto fail; + } + memcpy(distbuf, vdistance, sizeof(*vdistance) * nr_vnodes * nr_vnodes); + memcpy(vntopbuf, vnode_to_pnode, sizeof(*vnode_to_pnode) * nr_vnodes); + memcpy(vcpumapbuf, vcpu_to_vnode, sizeof(*vcpu_to_vnode) * nr_vcpus); + memcpy(membuf, vmemblks, sizeof(*vmemblks) * nr_vnodes); + + set_xen_guest_handle(domctl.u.vnuma.vdistance, distbuf); + set_xen_guest_handle(domctl.u.vnuma.vnuma_memblks, membuf); + set_xen_guest_handle(domctl.u.vnuma.vcpu_to_vnode, vcpumapbuf); + set_xen_guest_handle(domctl.u.vnuma.vnode_to_pnode, vntopbuf); + + domctl.cmd = XEN_DOMCTL_setvnumainfo; + domctl.domain = (domid_t)domid; + domctl.u.vnuma.nr_vnodes = nr_vnodes; + rc = do_domctl(xch, &domctl); +fail: + xc_hypercall_buffer_free(xch, distbuf); + xc_hypercall_buffer_free(xch, membuf); + xc_hypercall_buffer_free(xch, vcpumapbuf); + xc_hypercall_buffer_free(xch, vntopbuf); + + return rc; +} + /* * Local variables: * mode: C diff --git 
a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h index f2cebaf..fb66cfa 100644 --- a/tools/libxc/xenctrl.h +++ b/tools/libxc/xenctrl.h @@ -1083,6 +1083,23 @@ int xc_domain_set_memmap_limit(xc_interface *xch, uint32_t domid, unsigned long map_limitkb); +/*unsigned long xc_get_memory_hole_size(unsigned long start, unsigned long end); + +int xc_domain_align_vnodes(xc_interface *xch, + uint32_t domid, + uint64_t *vmemareas, + vnuma_memblk_t *vnuma_memblks, + uint16_t nr_vnodes); +*/ +int xc_domain_setvnodes(xc_interface *xch, + uint32_t domid, + uint16_t nr_vnodes, + uint16_t nr_vcpus, + vnuma_memblk_t *vmemareas, + int *vdistance, + int *vcpu_to_vnode, + int *vnode_to_pnode); + #if defined(__i386__) || defined(__x86_64__) /* * PC BIOS standard E820 types and structure. -- 1.7.10.4
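As a usage illustration (not part of the patch), a toolstack caller describing a two-vnode, four-vcpu guest to Xen could look like this; the array contents and the function name example_set_vnuma are made up:

#include <xenctrl.h>

/* Illustrative caller: describe a 2-vnode guest with 4 vcpus to Xen. */
int example_set_vnuma(xc_interface *xch, uint32_t domid)
{
    vnuma_memblk_t blks[2] = {
        { .start = 0,          .end = 1ULL << 30 },   /* vnode 0: 1G */
        { .start = 1ULL << 30, .end = 2ULL << 30 },   /* vnode 1: 1G */
    };
    int vdistance[4]      = { 10, 20, 20, 10 };       /* row-major 2x2 */
    int vcpu_to_vnode[4]  = { 0, 0, 1, 1 };
    int vnode_to_pnode[2] = { 0, 0 };                 /* both on pnode 0 */

    return xc_domain_setvnodes(xch, domid, 2 /* vnodes */, 4 /* vcpus */,
                               blks, vdistance, vcpu_to_vnode,
                               vnode_to_pnode);
}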
Defines VM config options for vNUMA PV domain creation as follows: vnodes - number of nodes and enables vnuma vnumamem - vnuma nodes memory sizes vnuma_distance - vnuma distance table (may be omitted) vcpu_to_vnode - vcpu to vnode mask (may be omitted) sum of all numamem should be equal to memory option. Number of vcpus should not be less that number of vnodes. VM config Examples: memory = 16384 vcpus = 8 name = "rc" vnodes = 8 vnumamem = "2g, 2g, 2g, 2g, 2g, 2g, 2g, 2g" vcpu_to_vnode ="5 6 7 4 3 2 1 0" memory = 2048 vcpus = 4 name = "rc9" vnodes = 2 vnumamem = "1g, 1g" vnuma_distance = "10 20, 10 20" vcpu_to_vnode ="1, 3, 2, 0" Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com> --- tools/libxl/libxl.c | 28 ++++++ tools/libxl/libxl.h | 15 ++++ tools/libxl/libxl_arch.h | 6 ++ tools/libxl/libxl_dom.c | 115 ++++++++++++++++++++++-- tools/libxl/libxl_internal.h | 3 + tools/libxl/libxl_types.idl | 6 +- tools/libxl/libxl_x86.c | 91 +++++++++++++++++++ tools/libxl/xl_cmdimpl.c | 197 +++++++++++++++++++++++++++++++++++++++++- 8 files changed, 454 insertions(+), 7 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 81785df..cd25474 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -4293,6 +4293,34 @@ static int libxl__set_vcpuonline_qmp(libxl__gc *gc, uint32_t domid, } return 0; } +#ifdef LIBXL_HAVE_BUILDINFO_VNUMA +int libxl_domain_setvnodes(libxl_ctx *ctx, + uint32_t domid, + uint16_t nr_vnodes, + uint16_t nr_vcpus, + vnuma_memblk_t *vnuma_memblks, + int *vdistance, + int *vcpu_to_vnode, + int *vnode_to_pnode) +{ + GC_INIT(ctx); + int ret; + ret = xc_domain_setvnodes(ctx->xch, domid, nr_vnodes, + nr_vcpus, vnuma_memblks, + vdistance, vcpu_to_vnode, + vnode_to_pnode); + GC_FREE; + return ret; +} + +int libxl_default_vcpu_to_vnuma(libxl_domain_build_info *info) +{ + int i; + for(i = 0; i < info->max_vcpus; i++) + info->vcpu_to_vnode[i] = i % info->nr_vnodes; + return 0; +} +#endif int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_bitmap *cpumap) { diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index be19bf5..a1a5e33 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -706,6 +706,21 @@ void libxl_vcpuinfo_list_free(libxl_vcpuinfo *, int nr_vcpus); void libxl_device_vtpm_list_free(libxl_device_vtpm*, int nr_vtpms); void libxl_vtpminfo_list_free(libxl_vtpminfo *, int nr_vtpms); +/* vNUMA topology */ + +#ifdef LIBXL_HAVE_BUILDINFO_VNUMA +#include <xen/vnuma.h> +int libxl_domain_setvnodes(libxl_ctx *ctx, + uint32_t domid, + uint16_t nr_vnodes, + uint16_t nr_vcpus, + vnuma_memblk_t *vnuma_memblks, + int *vdistance, + int *vcpu_to_vnode, + int *vnode_to_pnode); + +int libxl_default_vcpu_to_vnuma(libxl_domain_build_info *info); +#endif /* * Devices * ======diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h index abe6685..76c1975 100644 --- a/tools/libxl/libxl_arch.h +++ b/tools/libxl/libxl_arch.h @@ -18,5 +18,11 @@ /* arch specific internal domain creation function */ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config, uint32_t domid); +int libxl_vnuma_align_mem(libxl__gc *gc, + uint32_t domid, + struct libxl_domain_build_info *b_info, + vnuma_memblk_t *memblks); /* linux specific memory blocks: out */ + +unsigned long e820_memory_hole_size(unsigned long start, unsigned long end, struct e820entry e820[], int nr); #endif diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 6e2252a..8bbbd18 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -200,6 +200,63 @@ static int 
numa_place_domain(libxl__gc *gc, uint32_t domid, libxl_cpupoolinfo_dispose(&cpupool_info); return rc; } +#define set_all_vnodes(n) for(i=0; i< info->nr_vnodes; i++) \ + info->vnode_to_pnode[i] = n + +int libxl_init_vnodemap(libxl__gc *gc, uint32_t domid, + libxl_domain_build_info *info) +{ + int i, n, start, nr_nodes; + uint64_t *mems; + unsigned long long claim[16]; + libxl_numainfo *ninfo = NULL; + + if (info->vnode_to_pnode == NULL) + info->vnode_to_pnode = calloc(info->nr_vnodes, sizeof(*info->vnode_to_pnode)); + + set_all_vnodes(NUMA_NO_NODE); + mems = info->vnuma_memszs; + ninfo = libxl_get_numainfo(CTX, &nr_nodes); + if (ninfo == NULL) { + LOG(INFO, "No HW NUMA found\n"); + return -EINVAL; + } + /* lets check if all vnodes will fit in one node */ + for(n = 0; n < nr_nodes; n++) { + if(ninfo[n].free/1024 >= info->max_memkb) { + /* all fit on one node, fill the mask */ + set_all_vnodes(n); + LOG(INFO, "Setting all vnodes to node %d, free = %lu, need =%lu Kb\n", n, ninfo[n].free/1024, info->max_memkb); + return 0; + } + } + /* TODO: change algorithm. The current just fits the nodes + * Will be nice to have them also sorted by size */ + /* If no p-node found, will be set to NUMA_NO_NODE and allocation will fail */ + LOG(INFO, "Found %d physical NUMA nodes\n", nr_nodes); + memset(claim, 0, sizeof(*claim) * 16); + start = 0; + for ( n = 0; n < nr_nodes; n++ ) + { + for ( i = start; i < info->nr_vnodes; i++ ) + { + LOG(INFO, "Compare %Lx for vnode[%d] size %lx with free space on pnode[%d], free %lx\n", + claim[n] + mems[i], i, mems[i], n, ninfo[n].free); + if ( ((claim[n] + mems[i]) <= ninfo[n].free) && (info->vnode_to_pnode[i] == NUMA_NO_NODE) ) + { + info->vnode_to_pnode[i] = n; + LOG(INFO, "Set vnode[%d] to pnode [%d]\n", i, n); + claim[n] += mems[i]; + } + else { + /* Will have another chance at other pnode */ + start = i; + continue; + } + } + } + return 0; +} int libxl__build_pre(libxl__gc *gc, uint32_t domid, libxl_domain_config *d_config, libxl__domain_build_state *state) @@ -232,9 +289,36 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid, if (rc) return rc; } +#ifdef LIBXL_HAVE_BUILDINFO_VNUMA + if (info->nr_vnodes <= info->max_vcpus && info->nr_vnodes != 0) { + vnuma_memblk_t *memblks = libxl__calloc(gc, info->nr_vnodes, sizeof(*memblks)); + libxl_vnuma_align_mem(gc, domid, info, memblks); + if (libxl_init_vnodemap(gc, domid, info) != 0) { + LOG(INFO, "Failed to call init_vnodemap\n"); + rc = libxl_domain_setvnodes(ctx, domid, info->nr_vnodes, + info->max_vcpus, memblks, + info->vdistance, info->vcpu_to_vnode, + NULL); + } + else + rc = libxl_domain_setvnodes(ctx, domid, info->nr_vnodes, + info->max_vcpus, memblks, + info->vdistance, info->vcpu_to_vnode, + info->vnode_to_pnode); + if (rc < 0 ) LOG(INFO, "Failed to call xc_domain_setvnodes\n"); + for(int i=0; i<info->nr_vnodes; i++) + LOG(INFO, "Mapping vnode %d to pnode %d\n", i, info->vnode_to_pnode[i]); + libxl_bitmap_set_none(&info->nodemap); + libxl_bitmap_set(&info->nodemap, 0); + } + else { + LOG(INFO, "NOT Calling vNUMA construct with nr_nodes = %d\n", info->nr_vnodes); + info->nr_vnodes = 0; + } +#endif libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap); libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap); - + xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT); xs_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL); state->store_domid = xs_domid ? 
atoi(xs_domid) : 0; @@ -368,7 +452,20 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid, } } } - +#ifdef LIBXL_HAVE_BUILDINFO_VNUMA + if (info->nr_vnodes != 0 && info->vnuma_memszs != NULL && info->vnode_to_pnode != NULL) { + dom->nr_vnodes = info->nr_vnodes; + dom->vnumablocks = malloc(info->nr_vnodes * sizeof(*dom->vnumablocks)); + dom->vnode_to_pnode = (int *)malloc(info->nr_vnodes * sizeof(*info->vnode_to_pnode)); + dom->vmemsizes = malloc(info->nr_vnodes * sizeof(*info->vnuma_memszs)); + if (dom->vmemsizes == NULL || dom->vnode_to_pnode == NULL) { + LOGE(ERROR, "%s:Failed to allocate memory for memory sizes.\n",__FUNCTION__); + goto out; + } + memcpy(dom->vmemsizes, info->vnuma_memszs, sizeof(*info->vnuma_memszs) * info->nr_vnodes); + memcpy(dom->vnode_to_pnode, info->vnode_to_pnode, sizeof(*info->vnode_to_pnode) * info->nr_vnodes); + } +#endif dom->flags = flags; dom->console_evtchn = state->console_port; dom->console_domid = state->console_domid; @@ -388,9 +485,17 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid, LOGE(ERROR, "xc_dom_mem_init failed"); goto out; } - if ( (ret = xc_dom_boot_mem_init(dom)) != 0 ) { - LOGE(ERROR, "xc_dom_boot_mem_init failed"); - goto out; + if (info->nr_vnodes != 0 && info->vnuma_memszs != NULL) { + if ( (ret = xc_dom_boot_mem_init(dom)) != 0 ) { + LOGE(ERROR, "xc_dom_boot_mem_init_node failed"); + goto out; + } + } + else { + if ( (ret = xc_dom_boot_mem_init(dom)) != 0 ) { + LOGE(ERROR, "xc_dom_boot_mem_init failed"); + goto out; + } } if ( (ret = xc_dom_build_image(dom)) != 0 ) { LOGE(ERROR, "xc_dom_build_image failed"); diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index f051d91..4a501c4 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -2709,6 +2709,7 @@ static inline void libxl__ctx_unlock(libxl_ctx *ctx) { #define CTX_LOCK (libxl__ctx_lock(CTX)) #define CTX_UNLOCK (libxl__ctx_unlock(CTX)) +#define NUMA_NO_NODE 0xFF /* * Automatic NUMA placement * @@ -2832,6 +2833,8 @@ void libxl__numa_candidate_put_nodemap(libxl__gc *gc, libxl_bitmap_copy(CTX, &cndt->nodemap, nodemap); } +int libxl_init_vnodemap(libxl__gc *gc, uint32_t domid, + libxl_domain_build_info *info); /* * Inserts "elm_new" into the sorted list "head". 
* diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index 85341a0..c3a4d95 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -208,6 +208,7 @@ libxl_dominfo = Struct("dominfo",[ ("vcpu_max_id", uint32), ("vcpu_online", uint32), ("cpupool", uint32), + ("nr_vnodes", uint16), ], dir=DIR_OUT) libxl_cpupoolinfo = Struct("cpupoolinfo", [ @@ -279,7 +280,10 @@ libxl_domain_build_info = Struct("domain_build_info",[ ("disable_migrate", libxl_defbool), ("cpuid", libxl_cpuid_policy_list), ("blkdev_start", string), - + ("vnuma_memszs", Array(uint64, "nr_vnodes")), + ("vcpu_to_vnode", Array(integer, "nr_vnodemap")), + ("vdistance", Array(integer, "nr_vdist")), + ("vnode_to_pnode", Array(integer, "nr_vnode_to_pnode")), ("device_model_version", libxl_device_model_version), ("device_model_stubdomain", libxl_defbool), # if you set device_model you must set device_model_version too diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c index a78c91d..35da3a8 100644 --- a/tools/libxl/libxl_x86.c +++ b/tools/libxl/libxl_x86.c @@ -308,3 +308,94 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config, return ret; } + +unsigned long e820_memory_hole_size(unsigned long start, unsigned long end, struct e820entry e820[], int nr) +{ +#define clamp(val, min, max) ({ \ + typeof(val) __val = (val); \ + typeof(min) __min = (min); \ + typeof(max) __max = (max); \ + (void) (&__val == &__min); \ + (void) (&__val == &__max); \ + __val = __val < __min ? __min: __val; \ + __val > __max ? __max: __val; }) + int i; + unsigned long absent, start_pfn, end_pfn; + absent = start - end; + for(i = 0; i < nr; i++) { + if(e820[i].type == E820_RAM) { + start_pfn = clamp(e820[i].addr, start, end); + end_pfn = clamp(e820[i].addr + e820[i].size, start, end); + absent -= end_pfn - start_pfn; + } + } + return absent; +} + +/* Align memory blocks for linux NUMA build image */ +int libxl_vnuma_align_mem(libxl__gc *gc, + uint32_t domid, + libxl_domain_build_info *b_info, + vnuma_memblk_t *memblks) /* linux specific memory blocks: out */ +{ +#ifndef roundup +#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y)) +#endif + /* + This function transforms mem block sizes in bytes + into aligned PV Linux guest NUMA nodes. + XEN will provide this memory layout to PV Linux guest upon boot for + PV Linux guests. 
+ */ + int i, rc; + unsigned long shift = 0, size, node_min_size = 1, limit; + unsigned long end_max; + uint32_t nr; + struct e820entry map[E820MAX]; + + libxl_ctx *ctx = libxl__gc_owner(gc); + rc = xc_get_machine_memory_map(ctx->xch, map, E820MAX); + if (rc < 0) { + errno = rc; + return -EINVAL; + } + nr = rc; + rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb, + (b_info->max_memkb - b_info->target_memkb) + + b_info->u.pv.slack_memkb); + if (rc) + return ERROR_FAIL; + + end_max = map[nr-1].addr + map[nr-1].size; + + shift = 0; + for(i = 0; i < b_info->nr_vnodes; i++) { + printf("block [%d] start inside align = %#lx\n", i, b_info->vnuma_memszs[i]); + } + memset(memblks, 0, sizeof(*memblks)*b_info->nr_vnodes); + memblks[0].start = 0; + for(i = 0; i < b_info->nr_vnodes; i++) { + memblks[i].start += shift; + memblks[i].end += shift + b_info->vnuma_memszs[i]; + limit = size = memblks[i].end - memblks[i].start; + while (memblks[i].end - memblks[i].start - e820_memory_hole_size(memblks[i].start, memblks[i].end, map, nr) < size) { + memblks[i].end += node_min_size; + shift += node_min_size; + if (memblks[i].end - memblks[i].start >= limit) { + memblks[i].end = memblks[i].start + limit; + break; + } + if (memblks[i].end == end_max) { + memblks[i].end = end_max; + break; + } + } + shift = memblks[i].end; + memblks[i].start = roundup(memblks[i].start, 4*1024); + + printf("start = %#010lx, end = %#010lx\n", memblks[i].start, memblks[i].end); + } + if(memblks[i-1].end > end_max) + memblks[i-1].end = end_max; + return 0; +} diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 884f050..36a8275 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -539,7 +539,121 @@ vcpp_out: return rc; } +#ifdef LIBXL_HAVE_BUILDINFO_VNUMA +static int vdistance_parse(char *vdistcfg, int *vdistance, int nr_vnodes) +{ + char *endptr, *toka, *tokb, *saveptra = NULL, *saveptrb = NULL; + int *vdist_tmp = NULL; + int rc = 0; + int i, j, dist, parsed = 0; + rc = -EINVAL; + if(vdistance == NULL) { + return rc; + } + vdist_tmp = (int *)malloc(nr_vnodes * nr_vnodes * sizeof(*vdistance)); + if (vdist_tmp == NULL) + return rc; + i =0; j = 0; + for (toka = strtok_r(vdistcfg, ",", &saveptra); toka; + toka = strtok_r(NULL, ",", &saveptra)) { + if ( i >= nr_vnodes ) + goto vdist_parse_err; + for (tokb = strtok_r(toka, " ", &saveptrb); tokb; + tokb = strtok_r(NULL, " ", &saveptrb)) { + if (j >= nr_vnodes) + goto vdist_parse_err; + dist = strtol(tokb, &endptr, 10); + if (tokb == endptr) + goto vdist_parse_err; + *(vdist_tmp + j*nr_vnodes + i) = dist; + parsed++; + j++; + } + i++; + j = 0; + } + rc = parsed; + memcpy(vdistance, vdist_tmp, nr_vnodes * nr_vnodes * sizeof(*vdistance)); +vdist_parse_err: + if (vdist_tmp !=NULL ) free(vdist_tmp); + return rc; +} +static int vcputovnode_parse(char *cfg, int *vmap, int nr_vnodes, int nr_vcpus) +{ + char *toka, *endptr, *saveptra = NULL; + int *vmap_tmp = NULL; + int rc = 0; + int i; + rc = -EINVAL; + i = 0; + if(vmap == NULL) { + return rc; + } + vmap_tmp = (int *)malloc(sizeof(*vmap) * nr_vcpus); + memset(vmap_tmp, 0, sizeof(*vmap) * nr_vcpus); + for (toka = strtok_r(cfg, " ", &saveptra); toka; + toka = strtok_r(NULL, " ", &saveptra)) { + if (i >= nr_vcpus) goto vmap_parse_out; + vmap_tmp[i] = strtoul(toka, &endptr, 10); + if( endptr == toka) + goto vmap_parse_out; + fprintf(stderr, "Parsed vcpu_to_vnode[%d] = %d.\n", i, vmap_tmp[i]); + i++; + } + memcpy(vmap, vmap_tmp, sizeof(*vmap) * nr_vcpus); + rc = i; +vmap_parse_out: + if (vmap_tmp != NULL) 
free(vmap_tmp); + return rc; +} + +static int vnumamem_parse(char *vmemsizes, uint64_t *vmemregions, int nr_vnodes) +{ + uint64_t memsize; + char *endptr, *toka, *saveptr = NULL; + int rc = 0; + int j; + rc = -EINVAL; + if(vmemregions == NULL) { + goto vmem_parse_out; + } + memsize = 0; + j = 0; + for (toka = strtok_r(vmemsizes, ",", &saveptr); toka; + toka = strtok_r(NULL, ",", &saveptr)) { + if ( j >= nr_vnodes ) + goto vmem_parse_out; + memsize = strtoul(toka, &endptr, 10); + if (endptr == toka) + goto vmem_parse_out; + switch (*endptr) { + case ''G'': + case ''g'': + memsize = memsize * 1024 * 1024 * 1024; + break; + case ''M'': + case ''m'': + memsize = memsize * 1024 * 1024; + break; + case ''K'': + case ''k'': + memsize = memsize * 1024 ; + break; + default: + continue; + break; + } + if (memsize > 0) { + vmemregions[j] = memsize; + j++; + } + } + rc = j; +vmem_parse_out: + return rc; +} +#endif static void parse_config_data(const char *config_source, const char *config_data, int config_len, @@ -871,7 +985,13 @@ static void parse_config_data(const char *config_source, { char *cmdline = NULL; const char *root = NULL, *extra = ""; - +#ifdef LIBXL_HAVE_BUILDINFO_VNUMA + const char *vnumamemcfg = NULL; + int nr_vnuma_regions; + long unsigned int vnuma_memparsed = 0; + const char *vmapcfg = NULL; + const char *vdistcfg = NULL; +#endif xlu_cfg_replace_string (config, "kernel", &b_info->u.pv.kernel, 0); xlu_cfg_get_string (config, "root", &root, 0); @@ -888,7 +1008,82 @@ static void parse_config_data(const char *config_source, fprintf(stderr, "Failed to allocate memory for cmdline\n"); exit(1); } +#ifdef LIBXL_HAVE_BUILDINFO_VNUMA + if (!xlu_cfg_get_long (config, "vnodes", &l, 0)) { + b_info->nr_vnodes = l; + if (b_info->nr_vnodes <= 0) + exit(1); + if(!xlu_cfg_get_string (config, "vnumamem", &vnumamemcfg, 0)) { + b_info->vnuma_memszs = calloc(b_info->nr_vnodes, + sizeof(*b_info->vnuma_memszs)); + if (b_info->vnuma_memszs == NULL) { + fprintf(stderr, "WARNING: Could not allocate vNUMA node memory sizes.\n"); + exit(1); + } + char *buf2 = strdup(vnumamemcfg); + nr_vnuma_regions = vnumamem_parse(buf2, b_info->vnuma_memszs, + b_info->nr_vnodes); + for(i = 0; i < b_info->nr_vnodes; i++) + vnuma_memparsed = vnuma_memparsed + (b_info->vnuma_memszs[i] >> 10); + + if(vnuma_memparsed != b_info->max_memkb || + nr_vnuma_regions != b_info->nr_vnodes ) + { + fprintf(stderr, "WARNING: Incorrect vNUMA config. Parsed memory = %lu, parsed nodes = %d, max = %lx\n", + vnuma_memparsed, nr_vnuma_regions, b_info->max_memkb); + if(buf2) free(buf2); + exit(1); + } + if (buf2) free(buf2); + } + else + b_info->nr_vnodes=0; + if(!xlu_cfg_get_string(config, "vnuma_distance", &vdistcfg, 0)) { + b_info->vdistance = (int *)calloc(b_info->nr_vnodes * b_info->nr_vnodes, + sizeof(*b_info->vdistance)); + if (b_info->vdistance == NULL) + exit(1); + char *buf2 = strdup(vdistcfg); + if(vdistance_parse(buf2, b_info->vdistance, b_info->nr_vnodes) != b_info->nr_vnodes * b_info->nr_vnodes) { + if (buf2) free(buf2); + free(b_info->vdistance); + exit(1); + } + if(buf2) free(buf2); + } + else + { + /* default distance */ + b_info->vdistance = (int *)calloc(b_info->nr_vnodes * b_info->nr_vnodes, sizeof(*b_info->vdistance)); + if (b_info->vdistance == NULL) + exit(1); + for(i = 0; i < b_info->nr_vnodes; i++) + for(int j = 0; j < b_info->nr_vnodes; j++) + *(b_info->vdistance + j*b_info->nr_vnodes + i) = (i == j ? 
10 : 20); + } + if(!xlu_cfg_get_string(config, "vcpu_to_vnode", &vmapcfg, 0)) + { + b_info->vcpu_to_vnode = (int *)calloc(b_info->max_vcpus, sizeof(*b_info->vcpu_to_vnode)); + if (b_info->vcpu_to_vnode == NULL) + exit(-1); + char *buf2 = strdup(vmapcfg); + if (vcputovnode_parse(buf2, b_info->vcpu_to_vnode, b_info->nr_vnodes, b_info->max_vcpus) < 0) { + if (buf2) free(buf2); + fprintf(stderr, "Error parsing vcpu to vnode mask\n"); + exit(1); + } + if(buf2) free(buf2); + } + else + { + b_info->vcpu_to_vnode = (int *)calloc(b_info->max_vcpus, sizeof(*b_info->vcpu_to_vnode)); + if (b_info->vcpu_to_vnode != NULL) + libxl_default_vcpu_to_vnuma(b_info); + } + } +#endif + xlu_cfg_replace_string (config, "bootloader", &b_info->u.pv.bootloader, 0); switch (xlu_cfg_get_list_as_string_list(config, "bootloader_args", &b_info->u.pv.bootloader_args, 1)) -- 1.7.10.4
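For reference, the defaults applied above when vnuma_distance or vcpu_to_vnode are omitted from the config (10 on the distance-matrix diagonal, 20 elsewhere, and a round-robin vcpu-to-vnode assignment) amount to the following standalone sketch:

/* Sketch of the config defaults from the hunk above: symmetric distance
 * table and round-robin vcpu placement. Arrays are caller-allocated. */
static void vnuma_apply_defaults(int *vdistance, int *vcpu_to_vnode,
                                 int nr_vnodes, int nr_vcpus)
{
    int i, j;

    for ( i = 0; i < nr_vnodes; i++ )
        for ( j = 0; j < nr_vnodes; j++ )
            vdistance[i * nr_vnodes + j] = (i == j) ? 10 : 20;

    for ( i = 0; i < nr_vcpus; i++ )
        vcpu_to_vnode[i] = i % nr_vnodes;
}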
Enables libxl vnuma ABI by LIBXL_HAVE_BUILDINFO_VNUMA.

Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
---
 tools/libxl/libxl.h |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index a1a5e33..ad0d0d8 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -90,6 +90,14 @@
 #define LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE 1
 
 /*
+ * LIBXL_HAVE_BUILDINFO_VNUMA indicates that vnuma topology will be
+ * build for the guest upon request and with VM configuration.
+ * It will try to define best allocation for vNUMA
+ * nodes on real NUMA nodes.
+ */
+#define LIBXL_HAVE_BUILDINFO_VNUMA 1
+
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
-- 
1.7.10.4
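An out-of-tree libxl consumer would use the new guard roughly as sketched below; the helper name is made up:

#include <libxl.h>

/* Sketch: compile the vNUMA setup only when the library provides it. */
static void maybe_setup_vnuma(libxl_domain_build_info *b_info, int nr_vnodes)
{
#ifdef LIBXL_HAVE_BUILDINFO_VNUMA
    b_info->nr_vnodes = nr_vnodes;
    /* ... fill vnuma_memszs / vdistance / vcpu_to_vnode here ... */
#else
    /* Older libxl: no vNUMA fields in libxl_domain_build_info. */
    (void)b_info;
    (void)nr_vnodes;
#endif
}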
Elena Ufimtseva
2013-Aug-27 07:54 UTC
[PATCH RFC 6/7] libxc/vnuma: vnuma per phys NUMA allocation.
Allows for vNUMA enabled domains to allocate vnodes on physical NUMA nodes. Tries to allocate all vnodes on one NUMA node, or on next one if not all vnodes fit. If no physical numa node found, will let xen decide. TODO: take into account cpu pinning if defined in VM config; take into account automatic NUMA placement mechanism; Allows for vNUMA enabled domains to allocate vnodes on physical NUMA nodes. Adds some arch bits. Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com> --- tools/libxc/xc_dom.h | 1 + tools/libxc/xc_dom_x86.c | 79 ++++++++++++++++++++++++++++++++++++++++------ tools/libxc/xg_private.h | 4 +++ 3 files changed, 75 insertions(+), 9 deletions(-) diff --git a/tools/libxc/xc_dom.h b/tools/libxc/xc_dom.h index 4375f25..7037614 100644 --- a/tools/libxc/xc_dom.h +++ b/tools/libxc/xc_dom.h @@ -371,6 +371,7 @@ static inline xen_pfn_t xc_dom_p2m_guest(struct xc_dom_image *dom, int arch_setup_meminit(struct xc_dom_image *dom); int arch_setup_bootearly(struct xc_dom_image *dom); int arch_setup_bootlate(struct xc_dom_image *dom); +int arch_boot_numa_alloc(struct xc_dom_image *dom); /* * Local variables: diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c index 126c0f8..99f7444 100644 --- a/tools/libxc/xc_dom_x86.c +++ b/tools/libxc/xc_dom_x86.c @@ -789,27 +789,47 @@ int arch_setup_meminit(struct xc_dom_image *dom) else { /* try to claim pages for early warning of insufficient memory avail */ + rc = 0; if ( dom->claim_enabled ) { rc = xc_domain_claim_pages(dom->xch, dom->guest_domid, dom->total_pages); if ( rc ) + { + xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, + "%s: Failed to claim mem for dom\n", + __FUNCTION__); return rc; + } } /* setup initial p2m */ for ( pfn = 0; pfn < dom->total_pages; pfn++ ) dom->p2m_host[pfn] = pfn; /* allocate guest memory */ - for ( i = rc = allocsz = 0; - (i < dom->total_pages) && !rc; - i += allocsz ) + if (dom->nr_vnodes > 0) { - allocsz = dom->total_pages - i; - if ( allocsz > 1024*1024 ) - allocsz = 1024*1024; - rc = xc_domain_populate_physmap_exact( - dom->xch, dom->guest_domid, allocsz, - 0, 0, &dom->p2m_host[i]); + rc = arch_boot_numa_alloc(dom); + if ( rc ) + { + xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, + "%s: Failed to allocate memory on NUMA nodes\n", + __FUNCTION__); + return rc; + } + } + else + { + for ( i = rc = allocsz = 0; + (i < dom->total_pages) && !rc; + i += allocsz ) + { + allocsz = dom->total_pages - i; + if ( allocsz > 1024*1024 ) + allocsz = 1024*1024; + rc = xc_domain_populate_physmap_exact( + dom->xch, dom->guest_domid, allocsz, + 0, 0, &dom->p2m_host[i]); + } } /* Ensure no unclaimed pages are left unused. 
@@ -817,7 +837,48 @@ int arch_setup_meminit(struct xc_dom_image *dom) (void)xc_domain_claim_pages(dom->xch, dom->guest_domid, 0 /* cancels the claim */); } + return rc; +} + +int arch_boot_numa_alloc(struct xc_dom_image *dom) +{ + int rc, n; + uint64_t guest_pages; + unsigned long allocsz, i, k; + unsigned long memflags; + rc = allocsz = k = 0; + for(n = 0; n < dom->nr_vnodes; n++) + { + memflags = 0; + if ( dom->vnode_to_pnode[n] != NUMA_NO_NODE ) + { + memflags |= XENMEMF_exact_node(dom->vnode_to_pnode[n]); + memflags |= XENMEMF_exact_node_request; + } + guest_pages = dom->vmemsizes[n] >> PAGE_SHIFT_X86; + for ( i = 0; + (i < guest_pages) && !rc; + i += allocsz ) + { + allocsz = guest_pages - i; + if ( allocsz > 1024*1024 ) + allocsz = 1024*1024; + rc = xc_domain_populate_physmap_exact( + dom->xch, dom->guest_domid, allocsz, + 0, memflags, &dom->p2m_host[i + k]); + k += allocsz; + } + if (rc == 0) printf("%s: allocated %lx pages for vnode %d on pnode %d out of %lx\n", + __FUNCTION__,i, n, dom->vnode_to_pnode[n], dom->total_pages); + else + { + xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, + "%s: Failed allocation of %lx pages for vnode %d on pnode %d out of %lx\n", + __FUNCTION__,i, n, dom->vnode_to_pnode[n], dom->total_pages); + return -EINVAL; + } + } return rc; } diff --git a/tools/libxc/xg_private.h b/tools/libxc/xg_private.h index db02ccf..538d185 100644 --- a/tools/libxc/xg_private.h +++ b/tools/libxc/xg_private.h @@ -127,6 +127,10 @@ typedef uint64_t l4_pgentry_64_t; #define ROUNDUP(_x,_w) (((unsigned long)(_x)+(1UL<<(_w))-1) & ~((1UL<<(_w))-1)) #define NRPAGES(x) (ROUNDUP(x, PAGE_SHIFT) >> PAGE_SHIFT) +#define MAX_ORDER_X86 11 +#define NODE_MIN_SIZE_X86 1024*1024*4 +#define ZONE_ALIGN_X86 (1UL << (MAX_ORDER_X86 + PAGE_SHIFT_X86)) +#define NUMA_NO_NODE 0xFF /* XXX SMH: following skanky macros rely on variable p2m_size being set */ /* XXX TJD: also, "guest_width" should be the guest''s sizeof(unsigned long) */ -- 1.7.10.4
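Condensed, the per-vnode allocation step introduced above reduces to the sketch below; the 1024*1024-page chunking loop from the patch is omitted for brevity, and the helper name populate_vnode is made up:

/* Condensed view of the per-vnode allocation step: request pages from
 * the exact pnode when a mapping exists, otherwise let Xen choose.
 * Relies on xc_dom.h types and the NUMA_NO_NODE/XENMEMF_* definitions
 * used in the patch. */
static int populate_vnode(struct xc_dom_image *dom, int vnode,
                          xen_pfn_t first_pfn, unsigned long nr_pages)
{
    unsigned long memflags = 0;

    if ( dom->vnode_to_pnode[vnode] != NUMA_NO_NODE )
    {
        memflags |= XENMEMF_exact_node(dom->vnode_to_pnode[vnode]);
        memflags |= XENMEMF_exact_node_request;
    }

    return xc_domain_populate_physmap_exact(dom->xch, dom->guest_domid,
                                            nr_pages, 0 /* order */,
                                            memflags,
                                            &dom->p2m_host[first_pfn]);
}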
Prints basic vnuma info per domain on 'debug-keys u'.

Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
---
 xen/arch/x86/numa.c |   16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index b141877..71bfd31 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -347,7 +347,7 @@ EXPORT_SYMBOL(node_data);
 static void dump_numa(unsigned char key)
 {
     s_time_t now = NOW();
-    int i;
+    int i, j;
     struct domain *d;
     struct page_info *page;
     unsigned int page_num_node[MAX_NUMNODES];
@@ -389,6 +389,20 @@ static void dump_numa(unsigned char key)
         for_each_online_node(i)
             printk("    Node %u: %u\n", i, page_num_node[i]);
+        if(d->vnuma.nr_vnodes > 0)
+        {
+            printk("    Domain has %d vnodes\n", d->vnuma.nr_vnodes);
+            for(j = 0; j < d->vnuma.nr_vnodes; j++) {
+                printk("        vnode %d ranges %#010lx - %#010lx pnode %d",
+                       j, d->vnuma.vnuma_memblks[j].start,
+                       d->vnuma.vnuma_memblks[j].end,
+                       d->vnuma.vnode_to_pnode[j]);
+            }
+            printk("    Domain vcpu to vnode: ");
+            for(j = 0; j < d->max_vcpus; j++)
+                printk("%d ", d->vnuma.vcpu_to_vnode[j]);
+            printk("\n");
+        }
     }
 
     rcu_read_unlock(&domlist_read_lock);
-- 
1.7.10.4
Jan Beulich
2013-Aug-27 08:53 UTC
Re: [PATCH RFC 1/7] xen/vnuma: subop hypercall and vnuma topology structures.
>>> On 27.08.13 at 09:54, Elena Ufimtseva <ufimtseva@gmail.com> wrote: > Defines XENMEM subop hypercall for PV vNUMA enabled guests and provides > vNUMA topology information from per-domain vnuma topology build info. > TODO: > subop XENMEM hypercall is subject to change to sysctl subop.That would mean it''s intended to be used by the tool stack only. I thought that the balloon driver (and perhaps other code) are also intended to be consumers.> @@ -732,7 +733,94 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg) > rcu_unlock_domain(d); > > break; > - > + case XENMEM_get_vnuma_info: > + { > + int i; > + struct vnuma_topology_info mtopology; > + struct vnuma_topology_info touser_topo; > + struct domain *d; > + unsigned int max_pages; > + vnuma_memblk_t *vblks; > + XEN_GUEST_HANDLE(int) vdistance; > + XEN_GUEST_HANDLE_PARAM(int) vdist_param; > + XEN_GUEST_HANDLE(vnuma_memblk_t) buf; > + XEN_GUEST_HANDLE_PARAM(vnuma_memblk_t) buf_param; > + XEN_GUEST_HANDLE(int) vcpu_to_vnode; > + XEN_GUEST_HANDLE_PARAM(int) vmap_param; > + > + rc = -1;You absolutely need to use proper -E... values when returnin hypercall status.> + if ( guest_handle_is_null(arg) ) > + return rc; > + if( copy_from_guest(&mtopology, arg, 1) ) > + { > + gdprintk(XENLOG_INFO, "Cannot get copy_from_guest..\n"); > + return -EFAULT; > + } > + gdprintk(XENLOG_INFO, "Domain id is %d\n",mtopology.domid);I appreciate that you need such for debugging, but this should be removed before posting patches.> + if ( (d = rcu_lock_domain_by_any_id(mtopology.domid)) == NULL ) > + { > + gdprintk(XENLOG_INFO, "Numa: Could not get domain id.\n"); > + return -ESRCH; > + } > + rcu_unlock_domain(d); > + touser_topo.nr_vnodes = d->vnuma.nr_vnodes;Mis-ordered: First you want to use d, then rcu-unlock it.> + rc = copy_to_guest(arg, &touser_topo, 1); > + if ( rc ) > + { > + gdprintk(XENLOG_INFO, "Bad news, could not copy to guest NUMA info\n"); > + return -EFAULT; > + } > + max_pages = d->max_pages; > + if ( touser_topo.nr_vnodes == 0 || touser_topo.nr_vnodes > d->max_vcpus ) > + { > + gdprintk(XENLOG_INFO, "vNUMA: Error in block creation - vnodes %d, vcpus %d \n", touser_topo.nr_vnodes, d->max_vcpus); > + return -EFAULT; > + } > + vblks = (vnuma_memblk_t *)xmalloc_array(struct vnuma_memblk, touser_topo.nr_vnodes); > + if ( vblks == NULL ) > + { > + gdprintk(XENLOG_INFO, "vNUMA: Could not get memory for memblocks\n"); > + return -1; > + } > + buf_param = guest_handle_cast(mtopology.vnuma_memblks, vnuma_memblk_t);By giving the structure field a proper type you should be able to avoid the use of guest_handle_cast() here and below.> + buf = guest_handle_from_param(buf_param, vnuma_memblk_t); > + for ( i = 0; i < touser_topo.nr_vnodes; i++ ) > + { > + gdprintk(XENLOG_INFO, "vmemblk[%d] start %#lx end %#lx\n", i, d->vnuma.vnuma_memblks[i].start, d->vnuma.vnuma_memblks[i].end);Actually, I''m going to give up here (for this file) - the not cleaned up code is obfuscating the real meat of the code too much for reasonable reviewing.> --- a/xen/include/public/memory.h > +++ b/xen/include/public/memory.h > @@ -453,6 +453,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t); > * Caller must be privileged or the hypercall fails. 
> */ > #define XENMEM_claim_pages 24 > +#define XENMEM_get_vnuma_info 25 > > /* > * XENMEM_claim_pages flags - the are no flags at this time.Misplaced - don''t put this in the middle of another logical section.> --- /dev/null > +++ b/xen/include/public/vnuma.h > @@ -0,0 +1,12 @@ > +#ifndef __XEN_PUBLIC_VNUMA_H > +#define __XEN_PUBLIC_VNUMA_H > + > +#include "xen.h" > + > +struct vnuma_memblk { > + uint64_t start, end; > +}; > +typedef struct vnuma_memblk vnuma_memblk_t; > +DEFINE_XEN_GUEST_HANDLE(vnuma_memblk_t); > + > +#endifUnmotivated new file. Plus the type isn''t used elsewhere in the public interface.> @@ -89,4 +90,12 @@ extern unsigned int xen_processor_pmbits; > > extern bool_t opt_dom0_vcpus_pin; > > +struct domain_vnuma_info { > + uint16_t nr_vnodes; > + int *vdistance; > + vnuma_memblk_t *vnuma_memblks; > + int *vcpu_to_vnode; > + int *vnode_to_pnode;Can any of the "int" fields reasonably be negative? If not, they ought to be "unsigned int".> --- /dev/null > +++ b/xen/include/xen/vnuma.h > @@ -0,0 +1,27 @@ > +#ifndef _VNUMA_H > +#define _VNUMA_H > +#include <public/vnuma.h> > + > +/* DEFINE_XEN_GUEST_HANDLE(vnuma_memblk_t); */ > + > +struct vnuma_topology_info { > + domid_t domid; > + uint16_t nr_vnodes; > + XEN_GUEST_HANDLE_64(vnuma_memblk_t) vnuma_memblks; > + XEN_GUEST_HANDLE_64(int) vdistance; > + XEN_GUEST_HANDLE_64(int) vcpu_to_vnode; > + XEN_GUEST_HANDLE_64(int) vnode_to_pnode; > +}; > +typedef struct vnuma_topology_info vnuma_topology_info_t; > +DEFINE_XEN_GUEST_HANDLE(vnuma_topology_info_t);At least up to here it seems like this is part of the intended public interface, and hence ought to go into public/memory.h.> +#define __vnode_distance_offset(_dom, _i, _j) \ > + ( ((_j)*((_dom)->vnuma.nr_vnodes)) + (_i) )Missing blanks around *.> + > +#define __vnode_distance(_dom, _i, _j) \ > + ( (_dom)->vnuma.vdistance[__vnode_distance_offset((_dom), (_i), (_j))] ) > + > +#define __vnode_distance_set(_dom, _i, _j, _v) \ > + do { __vnode_distance((_dom), (_i), (_j)) = (_v); } while(0)Proper parenthesization is clearly necessary, but there are clearly some that are reducing legibility of the code without having any useful purpose. Jan
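Taken together, these points suggest a reworked handler along the lines of the sketch below. This is illustrative only, not a posted revision; it keeps the surrounding do_memory_op() switch implicit and assumes the topology structure fields are given proper handle types so no guest_handle_cast() is needed.

    /* Illustrative skeleton: proper -E... return values, use the domain
     * before rcu-unlocking it, and no debug printks. */
    case XENMEM_get_vnuma_info:
    {
        struct vnuma_topology_info topo;
        struct domain *d;

        if ( copy_from_guest(&topo, arg, 1) )
            return -EFAULT;

        if ( (d = rcu_lock_domain_by_any_id(topo.domid)) == NULL )
            return -ESRCH;

        rc = -EOPNOTSUPP;
        if ( d->vnuma.nr_vnodes == 0 )
            goto vnuma_out;

        rc = -EFAULT;
        topo.nr_vnodes = d->vnuma.nr_vnodes;
        if ( copy_to_guest(arg, &topo, 1) ||
             copy_to_guest(topo.vnuma_memblks, d->vnuma.vnuma_memblks,
                           d->vnuma.nr_vnodes) ||
             copy_to_guest(topo.vdistance, d->vnuma.vdistance,
                           d->vnuma.nr_vnodes * d->vnuma.nr_vnodes) ||
             copy_to_guest(topo.vcpu_to_vnode, d->vnuma.vcpu_to_vnode,
                           d->max_vcpus) )
            goto vnuma_out;

        rc = 0;
     vnuma_out:
        rcu_unlock_domain(d);
        break;
    }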
Jan Beulich
2013-Aug-27 08:59 UTC
Re: [PATCH RFC 2/7] xen/vnuma: domctl subop for vnuma setup.
>>> On 27.08.13 at 09:54, Elena Ufimtseva <ufimtseva@gmail.com> wrote: > --- a/xen/common/domain.c > +++ b/xen/common/domain.c > @@ -227,6 +227,11 @@ struct domain *domain_create( > spin_lock_init(&d->node_affinity_lock); > d->node_affinity = NODE_MASK_ALL; > d->auto_node_affinity = 1; > + d->vnuma.vnuma_memblks = NULL; > + d->vnuma.vnode_to_pnode = NULL; > + d->vnuma.vcpu_to_vnode = NULL; > + d->vnuma.vdistance = NULL; > + d->vnuma.nr_vnodes = 0;Pretty pointless considering that struct domain starts out from a zeroed page.> @@ -532,6 +537,7 @@ int domain_kill(struct domain *d) > tmem_destroy(d->tmem); > domain_set_outstanding_pages(d, 0); > d->tmem = NULL; > + /* TODO: vnuma_destroy(d->vnuma); */That''s intended to go away by the time the RFC tag gets dropped?> @@ -862,7 +863,76 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) > ret = set_global_virq_handler(d, virq); > } > break; > - > + case XEN_DOMCTL_setvnumainfo: > + { > + int i, j; > + int dist_size; > + int dist, vmap, vntop;unsigned, unsigned, unsigned.> + vnuma_memblk_t vmemblk; > + > + ret = -EFAULT; > + dist = i = j = 0; > + if (op->u.vnuma.nr_vnodes <= 0 || op->u.vnuma.nr_vnodes > NR_CPUS) > + break;-EFAULT seems inappropriate here.> + d->vnuma.nr_vnodes = op->u.vnuma.nr_vnodes; > + dist_size = d->vnuma.nr_vnodes * d->vnuma.nr_vnodes; > + if ( (d->vnuma.vdistance = xmalloc_bytes(sizeof(*d->vnuma.vdistance) * dist_size) ) == NULL) > + break; > + for ( i = 0; i < d->vnuma.nr_vnodes; i++ ) > + for ( j = 0; j < d->vnuma.nr_vnodes; j++ ) > + { > + if ( unlikely(__copy_from_guest_offset(&dist, op->u.vnuma.vdistance, __vnode_distance_offset(d, i, j), 1)) )Long line.> + { > + gdprintk(XENLOG_INFO, "vNUMA: Copy distance table error\n"); > + goto err_dom; > + } > + __vnode_distance_set(d, i, j, dist); > + } > + if ( (d->vnuma.vnuma_memblks = xmalloc_bytes(sizeof(*d->vnuma.vnuma_memblks) * d->vnuma.nr_vnodes)) == NULL )Again.> + goto err_dom; > + for ( i = 0; i < d->vnuma.nr_vnodes; i++ ) > + { > + if ( unlikely(__copy_from_guest_offset(&vmemblk, op->u.vnuma.vnuma_memblks, i, 1)) ) > + { > + gdprintk(XENLOG_INFO, "vNUMA: memory size error\n");Just like for the earlier patch - the many formal problems make it quite hard to review the actual code.> @@ -852,6 +853,17 @@ struct xen_domctl_set_broken_page_p2m { > typedef struct xen_domctl_set_broken_page_p2m xen_domctl_set_broken_page_p2m_t; > DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_broken_page_p2m_t); > > +struct xen_domctl_vnuma { > + uint16_t nr_vnodes; > + XEN_GUEST_HANDLE_64(int) vdistance; > + XEN_GUEST_HANDLE_64(vnuma_memblk_t) vnuma_memblks; > + XEN_GUEST_HANDLE_64(int) vcpu_to_vnode; > + XEN_GUEST_HANDLE_64(int) vnode_to_pnode;uint, uint, uint. Jan
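Following the type comments above, the interface structure would presumably end up looking something like this (padding shown explicitly; illustrative only, not a posted revision):

/* Possible revised layout: unsigned handle element types, explicit
 * padding before the 8-byte-aligned handles. */
struct xen_domctl_vnuma {
    uint16_t nr_vnodes;
    uint16_t _pad[3];
    XEN_GUEST_HANDLE_64(uint) vdistance;
    XEN_GUEST_HANDLE_64(vnuma_memblk_t) vnuma_memblks;
    XEN_GUEST_HANDLE_64(uint) vcpu_to_vnode;
    XEN_GUEST_HANDLE_64(uint) vnode_to_pnode;
};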
Ian Campbell
2013-Aug-27 09:10 UTC
Re: [PATCH RFC 3/7] libxc/vnuma: per-domain vnuma structures.
On Tue, 2013-08-27 at 03:54 -0400, Elena Ufimtseva wrote:
> Makes use of domctl vnuma subop and initializes per-domain
> vnuma topology.
>
> Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
> ---
>  tools/libxc/xc_dom.h    |    9 +++++++
>  tools/libxc/xc_domain.c |   63 +++++++++++++++++++++++++++++++++++++++++++++++
>  tools/libxc/xenctrl.h   |   17 +++++++++++++
>  3 files changed, 89 insertions(+)
>
> diff --git a/tools/libxc/xc_dom.h b/tools/libxc/xc_dom.h
> index 86e23ee..4375f25 100644
> --- a/tools/libxc/xc_dom.h
> +++ b/tools/libxc/xc_dom.h
> @@ -114,6 +114,15 @@ struct xc_dom_image {
>      struct xc_dom_phys *phys_pages;
>      int realmodearea_log;
>
> +    /* vNUMA topology and memory allocation structure
> +     * Defines the way to allocate XEN
> +     * memory from phys NUMA nodes by providing mask
> +     * vnuma_to_pnuma */
> +    int nr_vnodes;
> +    struct vnuma_memblk *vnumablocks;
> +    uint64_t *vmemsizes;
> +    int *vnode_to_pnode;
> +
>      /* malloc memory pool */
>      struct xc_dom_mem *memblocks;
>
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 3257e2a..98445e3 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -24,6 +24,7 @@
>  #include "xg_save_restore.h"
>  #include <xen/memory.h>
>  #include <xen/hvm/hvm_op.h>
> +#include "xg_private.h"
>
>  int xc_domain_create(xc_interface *xch,
>                       uint32_t ssidref,
> @@ -1629,6 +1630,68 @@ int xc_domain_set_virq_handler(xc_interface *xch, uint32_t domid, int virq)
>      return do_domctl(xch, &domctl);
>  }
>
> +/* Informs XEN that domain is vNUMA aware */

"Xen" ;-)

> +int xc_domain_setvnodes(xc_interface *xch,
> +                        uint32_t domid,
> +                        uint16_t nr_vnodes,
> +                        uint16_t nr_vcpus,
> +                        vnuma_memblk_t *vmemblks,
> +                        int *vdistance,
> +                        int *vcpu_to_vnode,
> +                        int *vnode_to_pnode)

Can some of these be const?

> +{
> +    int rc;
> +    DECLARE_DOMCTL;
> +    DECLARE_HYPERCALL_BUFFER(int, distbuf);
> +    DECLARE_HYPERCALL_BUFFER(vnuma_memblk_t, membuf);
> +    DECLARE_HYPERCALL_BUFFER(int, vcpumapbuf);
> +    DECLARE_HYPERCALL_BUFFER(int, vntopbuf);
> +
> +    rc = -EINVAL;

After the comment below about ENOMEM I think the value set here is
unused.

> +    memset(&domctl, 0, sizeof(domctl));

DECLARE_DOMCTL will initialise domctl iff valgrind is enabled, which is
all that is required I think.

> +    if ( vdistance == NULL || vcpu_to_vnode == NULL || vmemblks == NULL )
> +    /* vnode_to_pnode can be null on non-NUMA machines */
> +    {
> +        PERROR("Parameters are wrong XEN_DOMCTL_setvnumainfo\n");
> +        return -EINVAL;
> +    }
> +    distbuf = xc_hypercall_buffer_alloc
> +              (xch, distbuf, sizeof(*vdistance) * nr_vnodes * nr_vnodes);
> +    membuf = xc_hypercall_buffer_alloc
> +             (xch, membuf, sizeof(*membuf) * nr_vnodes);
> +    vcpumapbuf = xc_hypercall_buffer_alloc
> +                 (xch, vcpumapbuf, sizeof(*vcpu_to_vnode) * nr_vcpus);
> +    vntopbuf = xc_hypercall_buffer_alloc
> +               (xch, vntopbuf, sizeof(*vnode_to_pnode) * nr_vnodes);
> +
> +    if (distbuf == NULL || membuf == NULL || vcpumapbuf == NULL || vntopbuf == NULL )
> +    {
> +        PERROR("Could not allocate memory for xc hypercall XEN_DOMCTL_setvnumainfo\n");

rc = -ENOMEM?

> +        goto fail;
> +    }
> +    memcpy(distbuf, vdistance, sizeof(*vdistance) * nr_vnodes * nr_vnodes);
> +    memcpy(vntopbuf, vnode_to_pnode, sizeof(*vnode_to_pnode) * nr_vnodes);
> +    memcpy(vcpumapbuf, vcpu_to_vnode, sizeof(*vcpu_to_vnode) * nr_vcpus);
> +    memcpy(membuf, vmemblks, sizeof(*vmemblks) * nr_vnodes);

You can use DECLARE_HYPERCALL_BOUNCE and xc__hypercall_bounce_pre/post
which takes care of the alloc and copying stuff internally.

> +
> +    set_xen_guest_handle(domctl.u.vnuma.vdistance, distbuf);
> +    set_xen_guest_handle(domctl.u.vnuma.vnuma_memblks, membuf);
> +    set_xen_guest_handle(domctl.u.vnuma.vcpu_to_vnode, vcpumapbuf);
> +    set_xen_guest_handle(domctl.u.vnuma.vnode_to_pnode, vntopbuf);
> +
> +    domctl.cmd = XEN_DOMCTL_setvnumainfo;
> +    domctl.domain = (domid_t)domid;
> +    domctl.u.vnuma.nr_vnodes = nr_vnodes;
> +    rc = do_domctl(xch, &domctl);
> +fail:
> +    xc_hypercall_buffer_free(xch, distbuf);
> +    xc_hypercall_buffer_free(xch, membuf);
> +    xc_hypercall_buffer_free(xch, vcpumapbuf);
> +    xc_hypercall_buffer_free(xch, vntopbuf);
> +
> +    return rc;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> index f2cebaf..fb66cfa 100644
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -1083,6 +1083,23 @@ int xc_domain_set_memmap_limit(xc_interface *xch,
>                                 uint32_t domid,
>                                 unsigned long map_limitkb);
>
> +/*unsigned long xc_get_memory_hole_size(unsigned long start, unsigned long end);

What is this?

> +
> +int xc_domain_align_vnodes(xc_interface *xch,
> +                           uint32_t domid,
> +                           uint64_t *vmemareas,
> +                           vnuma_memblk_t *vnuma_memblks,
> +                           uint16_t nr_vnodes);
> +*/
> +int xc_domain_setvnodes(xc_interface *xch,
> +                        uint32_t domid,
> +                        uint16_t nr_vnodes,
> +                        uint16_t nr_vcpus,
> +                        vnuma_memblk_t *vmemareas,
> +                        int *vdistance,
> +                        int *vcpu_to_vnode,
> +                        int *vnode_to_pnode);
> +
>  #if defined(__i386__) || defined(__x86_64__)
>  /*
>   * PC BIOS standard E820 types and structure.
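
A minimal sketch of the bounce-buffer variant suggested above might look like
the following. It keeps the domctl field names from the patch
(domctl.u.vnuma.*) and, for brevity, requires all four arrays to be non-NULL,
whereas the original allows vnode_to_pnode to be NULL on non-NUMA hosts:

    int xc_domain_setvnodes(xc_interface *xch, uint32_t domid,
                            uint16_t nr_vnodes, uint16_t nr_vcpus,
                            vnuma_memblk_t *vmemblks, int *vdistance,
                            int *vcpu_to_vnode, int *vnode_to_pnode)
    {
        int rc = -1;
        DECLARE_DOMCTL;
        /* Bounce the caller's arrays into hypercall-safe memory (IN only,
         * Xen does not write back through these handles). */
        DECLARE_HYPERCALL_BOUNCE(vdistance,
                                 sizeof(*vdistance) * nr_vnodes * nr_vnodes,
                                 XC_HYPERCALL_BUFFER_BOUNCE_IN);
        DECLARE_HYPERCALL_BOUNCE(vmemblks, sizeof(*vmemblks) * nr_vnodes,
                                 XC_HYPERCALL_BUFFER_BOUNCE_IN);
        DECLARE_HYPERCALL_BOUNCE(vcpu_to_vnode,
                                 sizeof(*vcpu_to_vnode) * nr_vcpus,
                                 XC_HYPERCALL_BUFFER_BOUNCE_IN);
        DECLARE_HYPERCALL_BOUNCE(vnode_to_pnode,
                                 sizeof(*vnode_to_pnode) * nr_vnodes,
                                 XC_HYPERCALL_BUFFER_BOUNCE_IN);

        if ( !vmemblks || !vdistance || !vcpu_to_vnode || !vnode_to_pnode )
        {
            errno = EINVAL;
            return -1;
        }

        if ( xc_hypercall_bounce_pre(xch, vdistance) ||
             xc_hypercall_bounce_pre(xch, vmemblks) ||
             xc_hypercall_bounce_pre(xch, vcpu_to_vnode) ||
             xc_hypercall_bounce_pre(xch, vnode_to_pnode) )
        {
            PERROR("Could not bounce buffers for XEN_DOMCTL_setvnumainfo");
            goto out;
        }

        set_xen_guest_handle(domctl.u.vnuma.vdistance, vdistance);
        set_xen_guest_handle(domctl.u.vnuma.vnuma_memblks, vmemblks);
        set_xen_guest_handle(domctl.u.vnuma.vcpu_to_vnode, vcpu_to_vnode);
        set_xen_guest_handle(domctl.u.vnuma.vnode_to_pnode, vnode_to_pnode);

        domctl.cmd = XEN_DOMCTL_setvnumainfo;
        domctl.domain = (domid_t)domid;
        domctl.u.vnuma.nr_vnodes = nr_vnodes;

        rc = do_domctl(xch, &domctl);

    out:
        xc_hypercall_bounce_post(xch, vdistance);
        xc_hypercall_bounce_post(xch, vmemblks);
        xc_hypercall_bounce_post(xch, vcpu_to_vnode);
        xc_hypercall_bounce_post(xch, vnode_to_pnode);
        return rc;
    }

The bounce macros allocate the hypercall-safe buffer and copy the caller's
data in xc_hypercall_bounce_pre(), then free it in xc_hypercall_bounce_post(),
so the explicit xc_hypercall_buffer_alloc()/memcpy()/free sequence goes away.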
On Tue, 2013-08-27 at 03:54 -0400, Elena Ufimtseva wrote:> Defines VM config options for vNUMA PV domain creation as follows: > vnodes - number of nodes and enables vnuma > vnumamem - vnuma nodes memory sizes > vnuma_distance - vnuma distance table (may be omitted) > vcpu_to_vnode - vcpu to vnode mask (may be omitted) > > sum of all numamem should be equal to memory option. > Number of vcpus should not be less that number of vnodes. > > VM config Examples:Please patch docs/ as necessary (e.g. the manpages) at the same time.> > memory = 16384 > vcpus = 8 > name = "rc" > vnodes = 8 > vnumamem = "2g, 2g, 2g, 2g, 2g, 2g, 2g, 2g" > vcpu_to_vnode ="5 6 7 4 3 2 1 0"xl cfg supports arrays, is there any reason not to use them? Hopefully (lib)xl will also implement some sort of sane default in the case where people don''t want to spell all this out? Is it actually useful to be able to arbitrarily map vcpus to nodes? I''d have thought dividing the vcpus among the nodes evenly would be sufficient for almost everyone. What happens if the total of vnumamem does not == memory? Would it be useful to be able to specify this as ratios? e.g. "1:1:1:1" etc? Or maybe we should simply extend the memory syntax to take a list and memory becomes the total? What happens if length(vnumamem) != vnodes? Likewise vcpu_to_vnode vs vcspus. How is maxmem handled/reconciled? Is there a vnumamaxmem? Likewise maxvcpus.> memory = 2048 > vcpus = 4 > name = "rc9" > vnodes = 2 > vnumamem = "1g, 1g" > vnuma_distance = "10 20, 10 20" > vcpu_to_vnode ="1, 3, 2, 0" > > Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com> > --- > tools/libxl/libxl.c | 28 ++++++ > tools/libxl/libxl.h | 15 ++++ > tools/libxl/libxl_arch.h | 6 ++ > tools/libxl/libxl_dom.c | 115 ++++++++++++++++++++++-- > tools/libxl/libxl_internal.h | 3 + > tools/libxl/libxl_types.idl | 6 +- > tools/libxl/libxl_x86.c | 91 +++++++++++++++++++ > tools/libxl/xl_cmdimpl.c | 197 +++++++++++++++++++++++++++++++++++++++++- > 8 files changed, 454 insertions(+), 7 deletions(-) > > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c > index 81785df..cd25474 100644 > --- a/tools/libxl/libxl.c > +++ b/tools/libxl/libxl.c > @@ -4293,6 +4293,34 @@ static int libxl__set_vcpuonline_qmp(libxl__gc *gc, uint32_t domid, > } > return 0; > } > +#ifdef LIBXL_HAVE_BUILDINFO_VNUMAlibxl itself doesn''t need to use the ifdef, just provide it for external callers.> +int libxl_domain_setvnodes(libxl_ctx *ctx, > + uint32_t domid, > + uint16_t nr_vnodes, > + uint16_t nr_vcpus, > + vnuma_memblk_t *vnuma_memblks, > + int *vdistance, > + int *vcpu_to_vnode, > + int *vnode_to_pnode) > +{ > + GC_INIT(ctx); > + int ret; > + ret = xc_domain_setvnodes(ctx->xch, domid, nr_vnodes, > + nr_vcpus, vnuma_memblks, > + vdistance, vcpu_to_vnode, > + vnode_to_pnode); > + GC_FREE; > + return ret; > +} > + > +int libxl_default_vcpu_to_vnuma(libxl_domain_build_info *info) > +{ > + int i; > + for(i = 0; i < info->max_vcpus; i++) > + info->vcpu_to_vnode[i] = i % info->nr_vnodes; > + return 0; > +} > +#endif > > int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_bitmap *cpumap) > { > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > index be19bf5..a1a5e33 100644 > --- a/tools/libxl/libxl.h > +++ b/tools/libxl/libxl.h > @@ -706,6 +706,21 @@ void libxl_vcpuinfo_list_free(libxl_vcpuinfo *, int nr_vcpus); > void libxl_device_vtpm_list_free(libxl_device_vtpm*, int nr_vtpms); > void libxl_vtpminfo_list_free(libxl_vtpminfo *, int nr_vtpms); > > +/* vNUMA topology */ > + > +#ifdef 
LIBXL_HAVE_BUILDINFO_VNUMAUnneeded, but you do need to add the #define (which seems missing, how does this stuff get built?)> +#include <xen/vnuma.h>Includes should go at the top unless there is a good reason otherwise. However we try and avoid exposing Xen interfaces in the libxl interface. This means you need to define a libxl equivalent, which should be done via the libxl IDL.> +int libxl_domain_setvnodes(libxl_ctx *ctx, > + uint32_t domid, > + uint16_t nr_vnodes, > + uint16_t nr_vcpus, > + vnuma_memblk_t *vnuma_memblks, > + int *vdistance, > + int *vcpu_to_vnode, > + int *vnode_to_pnode); > + > +int libxl_default_vcpu_to_vnuma(libxl_domain_build_info *info); > +#endif > /* > * Devices > * ======> diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h > index abe6685..76c1975 100644 > --- a/tools/libxl/libxl_arch.h > +++ b/tools/libxl/libxl_arch.h > @@ -18,5 +18,11 @@ > /* arch specific internal domain creation function */ > int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config, > uint32_t domid); > +int libxl_vnuma_align_mem(libxl__gc *gc,libxl__foo (double underscores) for internal function please.> + uint32_t domid, > + struct libxl_domain_build_info *b_info, > + vnuma_memblk_t *memblks); /* linux specific memory blocks: out */Why/how is this Linux specific? This is a hypercall parameter, isn''t it?> > + > +unsigned long e820_memory_hole_size(unsigned long start, unsigned long end, struct e820entry e820[], int nr); > #endif > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c > index 6e2252a..8bbbd18 100644 > --- a/tools/libxl/libxl_dom.c > +++ b/tools/libxl/libxl_dom.c > @@ -200,6 +200,63 @@ static int numa_place_domain(libxl__gc *gc, uint32_t domid, > libxl_cpupoolinfo_dispose(&cpupool_info); > return rc; > } > +#define set_all_vnodes(n) for(i=0; i< info->nr_vnodes; i++) \ > + info->vnode_to_pnode[i] = n > + > +int libxl_init_vnodemap(libxl__gc *gc, uint32_t domid,Double underscore please.> + libxl_domain_build_info *info) > +{ > + int i, n, start, nr_nodes; > + uint64_t *mems; > + unsigned long long claim[16];Where does 16 come from?> + libxl_numainfo *ninfo = NULL; > + > + if (info->vnode_to_pnode == NULL) > + info->vnode_to_pnode = calloc(info->nr_vnodes, sizeof(*info->vnode_to_pnode)); > + > + set_all_vnodes(NUMA_NO_NODE); > + mems = info->vnuma_memszs; > + ninfo = libxl_get_numainfo(CTX, &nr_nodes); > + if (ninfo == NULL) { > + LOG(INFO, "No HW NUMA found\n"); > + return -EINVAL; > + } > + /* lets check if all vnodes will fit in one node */ > + for(n = 0; n < nr_nodes; n++) { > + if(ninfo[n].free/1024 >= info->max_memkb) { > + /* all fit on one node, fill the mask */ > + set_all_vnodes(n); > + LOG(INFO, "Setting all vnodes to node %d, free = %lu, need =%lu Kb\n", n, ninfo[n].free/1024, info->max_memkb); > + return 0; > + } > + } > + /* TODO: change algorithm. The current just fits the nodes > + * Will be nice to have them also sorted by size */ > + /* If no p-node found, will be set to NUMA_NO_NODE and allocation will fail */ > + LOG(INFO, "Found %d physical NUMA nodes\n", nr_nodes); > + memset(claim, 0, sizeof(*claim) * 16); > + start = 0; > + for ( n = 0; n < nr_nodes; n++ ) > + {If nr_nodes > 16 this will overflow claim[n].> + for ( i = start; i < info->nr_vnodes; i++ ) > + { > + LOG(INFO, "Compare %Lx for vnode[%d] size %lx with free space on pnode[%d], free %lx\n", > + claim[n] + mems[i], i, mems[i], n, ninfo[n].free);These should be at best LOG(DEBUG, ...). Perhaps a LOG_(INFO, ...) 
summary at the end would be suitable?> + if ( ((claim[n] + mems[i]) <= ninfo[n].free) && (info->vnode_to_pnode[i] == NUMA_NO_NODE) ) > + { > + info->vnode_to_pnode[i] = n; > + LOG(INFO, "Set vnode[%d] to pnode [%d]\n", i, n); > + claim[n] += mems[i]; > + } > + else { > + /* Will have another chance at other pnode */ > + start = i; > + continue; > + } > + } > + } > + return 0; > +} > > int libxl__build_pre(libxl__gc *gc, uint32_t domid, > libxl_domain_config *d_config, libxl__domain_build_state *state) > @@ -232,9 +289,36 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid, > if (rc) > return rc; > } > +#ifdef LIBXL_HAVE_BUILDINFO_VNUMANot needed.> + if (info->nr_vnodes <= info->max_vcpus && info->nr_vnodes != 0) { > + vnuma_memblk_t *memblks = libxl__calloc(gc, info->nr_vnodes, sizeof(*memblks)); > + libxl_vnuma_align_mem(gc, domid, info, memblks); > + if (libxl_init_vnodemap(gc, domid, info) != 0) { > + LOG(INFO, "Failed to call init_vnodemap\n"); > + rc = libxl_domain_setvnodes(ctx, domid, info->nr_vnodes, > + info->max_vcpus, memblks, > + info->vdistance, info->vcpu_to_vnode, > + NULL); > + } > + else > + rc = libxl_domain_setvnodes(ctx, domid, info->nr_vnodes, > + info->max_vcpus, memblks, > + info->vdistance, info->vcpu_to_vnode, > + info->vnode_to_pnode); > + if (rc < 0 ) LOG(INFO, "Failed to call xc_domain_setvnodes\n"); > + for(int i=0; i<info->nr_vnodes; i++) > + LOG(INFO, "Mapping vnode %d to pnode %d\n", i, info->vnode_to_pnode[i]); > + libxl_bitmap_set_none(&info->nodemap); > + libxl_bitmap_set(&info->nodemap, 0); > + } > + else { > + LOG(INFO, "NOT Calling vNUMA construct with nr_nodes = %d\n", info->nr_vnodes); > + info->nr_vnodes = 0; > + } > +#endif > libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap); > libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap); > - > + > xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT); > xs_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL); > state->store_domid = xs_domid ? 
atoi(xs_domid) : 0; > @@ -368,7 +452,20 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid, > } > } > } > - > +#ifdef LIBXL_HAVE_BUILDINFO_VNUMAand again.> + if (info->nr_vnodes != 0 && info->vnuma_memszs != NULL && info->vnode_to_pnode != NULL) { > + dom->nr_vnodes = info->nr_vnodes; > + dom->vnumablocks = malloc(info->nr_vnodes * sizeof(*dom->vnumablocks)); > + dom->vnode_to_pnode = (int *)malloc(info->nr_vnodes * sizeof(*info->vnode_to_pnode)); > + dom->vmemsizes = malloc(info->nr_vnodes * sizeof(*info->vnuma_memszs)); > + if (dom->vmemsizes == NULL || dom->vnode_to_pnode == NULL) { > + LOGE(ERROR, "%s:Failed to allocate memory for memory sizes.\n",__FUNCTION__);I thought LOG* already included file/function stuff.> + goto out; > + } > + memcpy(dom->vmemsizes, info->vnuma_memszs, sizeof(*info->vnuma_memszs) * info->nr_vnodes); > + memcpy(dom->vnode_to_pnode, info->vnode_to_pnode, sizeof(*info->vnode_to_pnode) * info->nr_vnodes); > + } > +#endif > dom->flags = flags; > dom->console_evtchn = state->console_port; > dom->console_domid = state->console_domid; > @@ -388,9 +485,17 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid, > LOGE(ERROR, "xc_dom_mem_init failed"); > goto out; > } > - if ( (ret = xc_dom_boot_mem_init(dom)) != 0 ) { > - LOGE(ERROR, "xc_dom_boot_mem_init failed"); > - goto out; > + if (info->nr_vnodes != 0 && info->vnuma_memszs != NULL) { > + if ( (ret = xc_dom_boot_mem_init(dom)) != 0 ) { > + LOGE(ERROR, "xc_dom_boot_mem_init_node failed");No _node on the actual call here, I can''t see how it differes from the following call in fact.> + goto out; > + } > + } > + else { > + if ( (ret = xc_dom_boot_mem_init(dom)) != 0 ) { > + LOGE(ERROR, "xc_dom_boot_mem_init failed"); > + goto out; > + } > } > if ( (ret = xc_dom_build_image(dom)) != 0 ) { > LOGE(ERROR, "xc_dom_build_image failed"); > diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h > index f051d91..4a501c4 100644 > --- a/tools/libxl/libxl_internal.h > +++ b/tools/libxl/libxl_internal.h > @@ -2709,6 +2709,7 @@ static inline void libxl__ctx_unlock(libxl_ctx *ctx) { > #define CTX_LOCK (libxl__ctx_lock(CTX)) > #define CTX_UNLOCK (libxl__ctx_unlock(CTX)) > > +#define NUMA_NO_NODE 0xFF256 nodes isn''t completely implausible. Looks like nr_vnodes is a uint16_t so 0xffff or ~((uint16_t)0) would be better I think.> /* > * Automatic NUMA placement > * > @@ -2832,6 +2833,8 @@ void libxl__numa_candidate_put_nodemap(libxl__gc *gc, > libxl_bitmap_copy(CTX, &cndt->nodemap, nodemap); > } > > +int libxl_init_vnodemap(libxl__gc *gc, uint32_t domid, > + libxl_domain_build_info *info); > /* > * Inserts "elm_new" into the sorted list "head". 
> * > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl > index 85341a0..c3a4d95 100644 > --- a/tools/libxl/libxl_types.idl > +++ b/tools/libxl/libxl_types.idl > @@ -208,6 +208,7 @@ libxl_dominfo = Struct("dominfo",[ > ("vcpu_max_id", uint32), > ("vcpu_online", uint32), > ("cpupool", uint32), > + ("nr_vnodes", uint16), > ], dir=DIR_OUT) > > libxl_cpupoolinfo = Struct("cpupoolinfo", [ > @@ -279,7 +280,10 @@ libxl_domain_build_info = Struct("domain_build_info",[ > ("disable_migrate", libxl_defbool), > ("cpuid", libxl_cpuid_policy_list), > ("blkdev_start", string), > - > + ("vnuma_memszs", Array(uint64, "nr_vnodes")), > + ("vcpu_to_vnode", Array(integer, "nr_vnodemap")), > + ("vdistance", Array(integer, "nr_vdist")), > + ("vnode_to_pnode", Array(integer, "nr_vnode_to_pnode")), > ("device_model_version", libxl_device_model_version), > ("device_model_stubdomain", libxl_defbool), > # if you set device_model you must set device_model_version too > diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c > index a78c91d..35da3a8 100644 > --- a/tools/libxl/libxl_x86.c > +++ b/tools/libxl/libxl_x86.c > @@ -308,3 +308,94 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config, > > return ret; > } > + > +unsigned long e820_memory_hole_size(unsigned long start, unsigned long end, struct e820entry e820[], int nr) > +{ > +#define clamp(val, min, max) ({ \ > + typeof(val) __val = (val); \ > + typeof(min) __min = (min); \ > + typeof(max) __max = (max); \ > + (void) (&__val == &__min); \ > + (void) (&__val == &__max); \ > + __val = __val < __min ? __min: __val; \ > + __val > __max ? __max: __val; }) > + int i; > + unsigned long absent, start_pfn, end_pfn; > + absent = start - end; > + for(i = 0; i < nr; i++) { > + if(e820[i].type == E820_RAM) { > + start_pfn = clamp(e820[i].addr, start, end); > + end_pfn = clamp(e820[i].addr + e820[i].size, start, end); > + absent -= end_pfn - start_pfn; > + } > + } > + return absent; > +} > + > +/* Align memory blocks for linux NUMA build image */ > +int libxl_vnuma_align_mem(libxl__gc *gc, > + uint32_t domid, > + libxl_domain_build_info *b_info, > + vnuma_memblk_t *memblks) /* linux specific memory blocks: out */ > +{ > +#ifndef roundup > +#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y)) > +#endif > + /* > + This function transforms mem block sizes in bytes > + into aligned PV Linux guest NUMA nodes. 
> + XEN will provide this memory layout to PV Linux guest upon boot for > + PV Linux guests.You say PV Linux guest three times here but I don''t think any of this is specific to PV Linux as opposed to PV guests generally (whether or not Linux is the only current implementation of this interface doesn''t really matter)> + */ > + int i, rc; > + unsigned long shift = 0, size, node_min_size = 1, limit; > + unsigned long end_max; > + uint32_t nr; > + struct e820entry map[E820MAX]; > + > + libxl_ctx *ctx = libxl__gc_owner(gc); > + rc = xc_get_machine_memory_map(ctx->xch, map, E820MAX); > + if (rc < 0) { > + errno = rc; > + return -EINVAL; > + } > + nr = rc; > + rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb, > + (b_info->max_memkb - b_info->target_memkb) + > + b_info->u.pv.slack_memkb); > + if (rc) > + return ERROR_FAIL; > + > + end_max = map[nr-1].addr + map[nr-1].size; > + > + shift = 0; > + for(i = 0; i < b_info->nr_vnodes; i++) { > + printf("block [%d] start inside align = %#lx\n", i, b_info->vnuma_memszs[i]);No printf in libxl please.> + } > + memset(memblks, 0, sizeof(*memblks)*b_info->nr_vnodes); > + memblks[0].start = 0; > + for(i = 0; i < b_info->nr_vnodes; i++) { > + memblks[i].start += shift; > + memblks[i].end += shift + b_info->vnuma_memszs[i]; > + limit = size = memblks[i].end - memblks[i].start; > + while (memblks[i].end - memblks[i].start - e820_memory_hole_size(memblks[i].start, memblks[i].end, map, nr) < size) {Please see if you can shorten this line.> + memblks[i].end += node_min_size; > + shift += node_min_size; > + if (memblks[i].end - memblks[i].start >= limit) { > + memblks[i].end = memblks[i].start + limit; > + break; > + } > + if (memblks[i].end == end_max) { > + memblks[i].end = end_max; > + break; > + } > + } > + shift = memblks[i].end; > + memblks[i].start = roundup(memblks[i].start, 4*1024); > + > + printf("start = %#010lx, end = %#010lx\n", memblks[i].start, memblks[i].end); > + } > + if(memblks[i-1].end > end_max) > + memblks[i-1].end = end_max; > + return 0; > +} > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c > index 884f050..36a8275 100644 > --- a/tools/libxl/xl_cmdimpl.c > +++ b/tools/libxl/xl_cmdimpl.c > @@ -539,7 +539,121 @@ vcpp_out: > > return rc; > } > +#ifdef LIBXL_HAVE_BUILDINFO_VNUMAThis isn''t strictly needed in xl either. Although some people are keen to have xl build against newer and older libxl in order to test the compatibility guarentees made by the library.> +static int vdistance_parse(char *vdistcfg, int *vdistance, int nr_vnodes) > +{Please can you use some line breaks to separate logical paragraphs and make things more readable. e.g. 
after the local variable declaration and between related blocks of code.> + char *endptr, *toka, *tokb, *saveptra = NULL, *saveptrb = NULL; > + int *vdist_tmp = NULL; > + int rc = 0; > + int i, j, dist, parsed = 0; > + rc = -EINVAL;Here you have: int rc = 0; rc = -EINVAL One of them is redundant.> + if(vdistance == NULL) { > + return rc; > + } > + vdist_tmp = (int *)malloc(nr_vnodes * nr_vnodes * sizeof(*vdistance)); > + if (vdist_tmp == NULL) > + return rc; > + i =0; j = 0; > + for (toka = strtok_r(vdistcfg, ",", &saveptra); toka; > + toka = strtok_r(NULL, ",", &saveptra)) { > + if ( i >= nr_vnodes ) > + goto vdist_parse_err; > + for (tokb = strtok_r(toka, " ", &saveptrb); tokb; > + tokb = strtok_r(NULL, " ", &saveptrb)) { > + if (j >= nr_vnodes) > + goto vdist_parse_err; > + dist = strtol(tokb, &endptr, 10); > + if (tokb == endptr) > + goto vdist_parse_err; > + *(vdist_tmp + j*nr_vnodes + i) = dist; > + parsed++; > + j++; > + } > + i++; > + j = 0;This would all be easier if it was an xlcfg list.> + } > + rc = parsed; > + memcpy(vdistance, vdist_tmp, nr_vnodes * nr_vnodes * sizeof(*vdistance)); > +vdist_parse_err: > + if (vdist_tmp !=NULL ) free(vdist_tmp); > + return rc; > +} > > +static int vcputovnode_parse(char *cfg, int *vmap, int nr_vnodes, int nr_vcpus) > +{ > + char *toka, *endptr, *saveptra = NULL; > + int *vmap_tmp = NULL; > + int rc = 0; > + int i; > + rc = -EINVAL; > + i = 0; > + if(vmap == NULL) { > + return rc; > + } > + vmap_tmp = (int *)malloc(sizeof(*vmap) * nr_vcpus); > + memset(vmap_tmp, 0, sizeof(*vmap) * nr_vcpus); > + for (toka = strtok_r(cfg, " ", &saveptra); toka; > + toka = strtok_r(NULL, " ", &saveptra)) { > + if (i >= nr_vcpus) goto vmap_parse_out; > + vmap_tmp[i] = strtoul(toka, &endptr, 10); > + if( endptr == toka) > + goto vmap_parse_out; > + fprintf(stderr, "Parsed vcpu_to_vnode[%d] = %d.\n", i, vmap_tmp[i]); > + i++; > + } > + memcpy(vmap, vmap_tmp, sizeof(*vmap) * nr_vcpus); > + rc = i; > +vmap_parse_out: > + if (vmap_tmp != NULL) free(vmap_tmp); > + return rc; > +} > + > +static int vnumamem_parse(char *vmemsizes, uint64_t *vmemregions, int nr_vnodes) > +{ > + uint64_t memsize; > + char *endptr, *toka, *saveptr = NULL; > + int rc = 0; > + int j; > + rc = -EINVAL; > + if(vmemregions == NULL) { > + goto vmem_parse_out; > + } > + memsize = 0; > + j = 0; > + for (toka = strtok_r(vmemsizes, ",", &saveptr); toka; > + toka = strtok_r(NULL, ",", &saveptr)) { > + if ( j >= nr_vnodes ) > + goto vmem_parse_out; > + memsize = strtoul(toka, &endptr, 10); > + if (endptr == toka) > + goto vmem_parse_out; > + switch (*endptr) { > + case ''G'': > + case ''g'': > + memsize = memsize * 1024 * 1024 * 1024; > + break; > + case ''M'': > + case ''m'': > + memsize = memsize * 1024 * 1024; > + break; > + case ''K'': > + case ''k'': > + memsize = memsize * 1024 ; > + break; > + default: > + continue; > + break; > + } > + if (memsize > 0) { > + vmemregions[j] = memsize; > + j++; > + } > + } > + rc = j; > +vmem_parse_out: > + return rc; > +} > +#endif > static void parse_config_data(const char *config_source, > const char *config_data, > int config_len, > @@ -871,7 +985,13 @@ static void parse_config_data(const char *config_source, > { > char *cmdline = NULL; > const char *root = NULL, *extra = ""; > - > +#ifdef LIBXL_HAVE_BUILDINFO_VNUMA > + const char *vnumamemcfg = NULL; > + int nr_vnuma_regions; > + long unsigned int vnuma_memparsed = 0; > + const char *vmapcfg = NULL; > + const char *vdistcfg = NULL; > +#endif > xlu_cfg_replace_string (config, "kernel", &b_info->u.pv.kernel, 
0); > > xlu_cfg_get_string (config, "root", &root, 0); > @@ -888,7 +1008,82 @@ static void parse_config_data(const char *config_source, > fprintf(stderr, "Failed to allocate memory for cmdline\n"); > exit(1); > } > +#ifdef LIBXL_HAVE_BUILDINFO_VNUMA > + if (!xlu_cfg_get_long (config, "vnodes", &l, 0)) { > + b_info->nr_vnodes = l; > + if (b_info->nr_vnodes <= 0) > + exit(1); > + if(!xlu_cfg_get_string (config, "vnumamem", &vnumamemcfg, 0)) { > + b_info->vnuma_memszs = calloc(b_info->nr_vnodes, > + sizeof(*b_info->vnuma_memszs)); > + if (b_info->vnuma_memszs == NULL) { > + fprintf(stderr, "WARNING: Could not allocate vNUMA node memory sizes.\n"); > + exit(1); > + } > + char *buf2 = strdup(vnumamemcfg); > + nr_vnuma_regions = vnumamem_parse(buf2, b_info->vnuma_memszs, > + b_info->nr_vnodes); > + for(i = 0; i < b_info->nr_vnodes; i++) > + vnuma_memparsed = vnuma_memparsed + (b_info->vnuma_memszs[i] >> 10); > + > + if(vnuma_memparsed != b_info->max_memkb || > + nr_vnuma_regions != b_info->nr_vnodes ) > + { > + fprintf(stderr, "WARNING: Incorrect vNUMA config. Parsed memory = %lu, parsed nodes = %d, max = %lx\n", > + vnuma_memparsed, nr_vnuma_regions, b_info->max_memkb); > + if(buf2) free(buf2); > + exit(1); > + } > + if (buf2) free(buf2); > + } > + else > + b_info->nr_vnodes=0; > + if(!xlu_cfg_get_string(config, "vnuma_distance", &vdistcfg, 0)) { > + b_info->vdistance = (int *)calloc(b_info->nr_vnodes * b_info->nr_vnodes, > + sizeof(*b_info->vdistance)); > + if (b_info->vdistance == NULL) > + exit(1); > + char *buf2 = strdup(vdistcfg); > + if(vdistance_parse(buf2, b_info->vdistance, b_info->nr_vnodes) != b_info->nr_vnodes * b_info->nr_vnodes) { > + if (buf2) free(buf2); > + free(b_info->vdistance); > + exit(1); > + } > + if(buf2) free(buf2); > + } > + else > + { > + /* default distance */ > + b_info->vdistance = (int *)calloc(b_info->nr_vnodes * b_info->nr_vnodes, sizeof(*b_info->vdistance)); > + if (b_info->vdistance == NULL) > + exit(1); > + for(i = 0; i < b_info->nr_vnodes; i++) > + for(int j = 0; j < b_info->nr_vnodes; j++) > + *(b_info->vdistance + j*b_info->nr_vnodes + i) = (i == j ? 10 : 20); > > + } > + if(!xlu_cfg_get_string(config, "vcpu_to_vnode", &vmapcfg, 0)) > + { > + b_info->vcpu_to_vnode = (int *)calloc(b_info->max_vcpus, sizeof(*b_info->vcpu_to_vnode)); > + if (b_info->vcpu_to_vnode == NULL) > + exit(-1); > + char *buf2 = strdup(vmapcfg); > + if (vcputovnode_parse(buf2, b_info->vcpu_to_vnode, b_info->nr_vnodes, b_info->max_vcpus) < 0) { > + if (buf2) free(buf2); > + fprintf(stderr, "Error parsing vcpu to vnode mask\n"); > + exit(1); > + } > + if(buf2) free(buf2); > + } > + else > + { > + b_info->vcpu_to_vnode = (int *)calloc(b_info->max_vcpus, sizeof(*b_info->vcpu_to_vnode)); > + if (b_info->vcpu_to_vnode != NULL) > + libxl_default_vcpu_to_vnuma(b_info); > + } > + } > +#endif > + > xlu_cfg_replace_string (config, "bootloader", &b_info->u.pv.bootloader, 0); > switch (xlu_cfg_get_list_as_string_list(config, "bootloader_args", > &b_info->u.pv.bootloader_args, 1))
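
On the parsing side, using xl config lists as suggested would simplify both
the config syntax and the code. A possible fragment (the vnumamem and
vcpu_to_vnode names and b_info fields are the ones from this patch, not an
agreed-upon syntax):

    vnodes        = 2
    vnumamem      = [ "1g", "1g" ]
    vcpu_to_vnode = [ "1", "3", "2", "0" ]

and a rough sketch of the corresponding fragment in parse_config_data(),
using the existing libxlu list accessors, with error handling abbreviated:

    XLU_ConfigList *vnumamem_list;
    int nr_regions, i;

    if (!xlu_cfg_get_list(config, "vnumamem", &vnumamem_list, &nr_regions, 0)) {
        if (nr_regions != b_info->nr_vnodes) {
            fprintf(stderr, "vnumamem has %d entries, expected %d vnodes\n",
                    nr_regions, b_info->nr_vnodes);
            exit(1);
        }
        b_info->vnuma_memszs = calloc(nr_regions, sizeof(*b_info->vnuma_memszs));
        for (i = 0; i < nr_regions; i++) {
            const char *item = xlu_cfg_get_listitem(vnumamem_list, i);
            char *endptr;
            uint64_t sz = strtoull(item, &endptr, 10);

            /* Accept the same g/m/k suffixes as the string-based parser. */
            switch (*endptr) {
            case 'G': case 'g': sz <<= 30; break;
            case 'M': case 'm': sz <<= 20; break;
            case 'K': case 'k': sz <<= 10; break;
            case '\0': break;               /* plain bytes */
            default:
                fprintf(stderr, "bad size suffix in vnumamem[%d]: %s\n", i, item);
                exit(1);
            }
            b_info->vnuma_memszs[i] = sz;
        }
    }

This removes the strtok_r()-based splitting entirely and gives length checking
against nr_vnodes for free; the same pattern would work for vcpu_to_vnode and
vnuma_distance.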
On Tue, 2013-08-27 at 03:54 -0400, Elena Ufimtseva wrote:
> Enables libxl vnuma ABI by LIBXL_HAVE_BUILDINFO_VNUMA.

This should be in the patch which introduces the libxl interface and,
as I mentioned, it doesn't need to be ifdef'd in the library.

A more natural way to structure this series would be to add the libxl
stuff in one patch and the xl stuff in a second.

>
> Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
> ---
>  tools/libxl/libxl.h |    8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index a1a5e33..ad0d0d8 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -90,6 +90,14 @@
>  #define LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE 1
>
>  /*
> + * LIBXL_HAVE_BUILDINFO_VNUMA indicates that vnuma topology will be
> + * build for the guest upon request and with VM configuration.
> + * It will try to define best allocation for vNUMA
> + * nodes on real NUMA nodes.
> + */
> +#define LIBXL_HAVE_BUILDINFO_VNUMA 1
> +
> +/*
>  * libxl ABI compatibility
>  *
>  * The only guarantee which libxl makes regarding ABI compatibility
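
For an out-of-tree libxl consumer the guard then works the usual way, while
libxl and xl themselves can rely on the declarations unconditionally. A
hypothetical caller would feature-test roughly like this:

    #include <libxl.h>

    #ifdef LIBXL_HAVE_BUILDINFO_VNUMA
        /* new enough libxl: populate b_info->vnuma_memszs, vdistance,
         * vcpu_to_vnode and vnode_to_pnode before creating the domain */
    #else
        /* older libxl: no vNUMA fields in libxl_domain_build_info */
    #endif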
On Tue, Aug 27, 2013 at 8:54 AM, Elena Ufimtseva <ufimtseva@gmail.com> wrote:
> This series of patches introduces vNUMA topology implementation and
> provides interfaces and data structures, exposing to PV guest virtual topology
> and enabling guest OS to use its own NUMA placement mechanisms.
>
> vNUMA topology support for Linux PV guest comes in a separate patch.
>
> Please review and send your comments.

Elena, do you have a public git tree to pull from, for the lazy?

 -George
George Dunlap
2013-Aug-27 14:06 UTC
Re: [PATCH RFC 1/7] xen/vnuma: subop hypercall and vnuma topology structures.
On Tue, Aug 27, 2013 at 9:53 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 27.08.13 at 09:54, Elena Ufimtseva <ufimtseva@gmail.com> wrote:
>> Defines XENMEM subop hypercall for PV vNUMA enabled guests and provides
>> vNUMA topology information from per-domain vnuma topology build info.
>> TODO:
>> subop XENMEM hypercall is subject to change to sysctl subop.
>
> That would mean it's intended to be used by the tool stack only. I
> thought that the balloon driver (and perhaps other code) are also
> intended to be consumers.

Can Elena take it from your detailed review that you're OK with the
general approach here?

 -George
Jan Beulich
2013-Aug-27 14:14 UTC
Re: [PATCH RFC 1/7] xen/vnuma: subop hypercall and vnuma topology structures.
>>> On 27.08.13 at 16:06, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> On Tue, Aug 27, 2013 at 9:53 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 27.08.13 at 09:54, Elena Ufimtseva <ufimtseva@gmail.com> wrote:
>>> Defines XENMEM subop hypercall for PV vNUMA enabled guests and provides
>>> vNUMA topology information from per-domain vnuma topology build info.
>>> TODO:
>>> subop XENMEM hypercall is subject to change to sysctl subop.
>>
>> That would mean it's intended to be used by the tool stack only. I
>> thought that the balloon driver (and perhaps other code) are also
>> intended to be consumers.
>
> Can Elena take it from your detailed review that you're OK with the
> general approach here?

Not yet - as said in the middle of both review replies, the enormous
amount of formal issues in the patches makes it very hard to review
them, and hence I gave up at those points. Thus I can only say that it
looks okay at a first glance.

Jan
Matt Wilson
2013-Aug-28 16:42 UTC
Re: [PATCH RFC 1/7] xen/vnuma: subop hypercall and vnuma topology structures.
On Tue, Aug 27, 2013 at 03:54:20AM -0400, Elena Ufimtseva wrote:
> Defines XENMEM subop hypercall for PV vNUMA enabled guests and provides
> vNUMA topology information from per-domain vnuma topology build info.
> TODO:
> subop XENMEM hypercall is subject to change to sysctl subop.
>
> Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>

[...]

> diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
> index a057069..3d39218 100644
> --- a/xen/include/xen/domain.h
> +++ b/xen/include/xen/domain.h
> @@ -4,6 +4,7 @@
>
>  #include <public/xen.h>
>  #include <asm/domain.h>
> +#include <public/vnuma.h>
>
>  typedef union {
>      struct vcpu_guest_context *nat;
> @@ -89,4 +90,12 @@ extern unsigned int xen_processor_pmbits;
>
>  extern bool_t opt_dom0_vcpus_pin;
>
> +struct domain_vnuma_info {
> +    uint16_t nr_vnodes;
> +    int *vdistance;
> +    vnuma_memblk_t *vnuma_memblks;
> +    int *vcpu_to_vnode;
> +    int *vnode_to_pnode;

What's the purpose of providing a vNode to pNode mapping to the guest?
Or am I misunderstanding that this wouldn't be provided to the guest?

> +};
> +
>  #endif /* __XEN_DOMAIN_H__ */

--msw
Elena Ufimtseva
2013-Aug-28 17:01 UTC
Re: [PATCH RFC 1/7] xen/vnuma: subop hypercall and vnuma topology structures.
Matt

This is for Xen only, to set the vNUMA topology. The guest will not have
this, but will have some other interface to retrieve this info (as we
were discussing in NUMA-aware ballooning).

Elena

On Wed, Aug 28, 2013 at 12:42 PM, Matt Wilson <msw@amazon.com> wrote:
> On Tue, Aug 27, 2013 at 03:54:20AM -0400, Elena Ufimtseva wrote:
>> Defines XENMEM subop hypercall for PV vNUMA enabled guests and provides
>> vNUMA topology information from per-domain vnuma topology build info.
>> TODO:
>> subop XENMEM hypercall is subject to change to sysctl subop.
>>
>> Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
>
> [...]
>
>> diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
>> index a057069..3d39218 100644
>> --- a/xen/include/xen/domain.h
>> +++ b/xen/include/xen/domain.h
>> @@ -4,6 +4,7 @@
>>
>>  #include <public/xen.h>
>>  #include <asm/domain.h>
>> +#include <public/vnuma.h>
>>
>>  typedef union {
>>      struct vcpu_guest_context *nat;
>> @@ -89,4 +90,12 @@ extern unsigned int xen_processor_pmbits;
>>
>>  extern bool_t opt_dom0_vcpus_pin;
>>
>> +struct domain_vnuma_info {
>> +    uint16_t nr_vnodes;
>> +    int *vdistance;
>> +    vnuma_memblk_t *vnuma_memblks;
>> +    int *vcpu_to_vnode;
>> +    int *vnode_to_pnode;
>
> What's the purpose of providing a vNode to pNode mapping to the guest?
> Or am I misunderstanding that this wouldn't be provided to the guest?
>
>> +};
>> +
>>  #endif /* __XEN_DOMAIN_H__ */
>
> --msw

--
Elena
Hi George, all

I will have it ready along with the next version of the patches.

Elena

On Tue, Aug 27, 2013 at 9:44 AM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> On Tue, Aug 27, 2013 at 8:54 AM, Elena Ufimtseva <ufimtseva@gmail.com> wrote:
>> This series of patches introduces vNUMA topology implementation and
>> provides interfaces and data structures, exposing to PV guest virtual topology
>> and enabling guest OS to use its own NUMA placement mechanisms.
>>
>> vNUMA topology support for Linux PV guest comes in a separate patch.
>>
>> Please review and send your comments.
>
> Elena, do you have a public git tree to pull from, for the lazy?
>
> -George

--
Elena