Hello everyone,

This posting follows the RFC titled "Automatically place guest on host's NUMA nodes with xl" (patchbomb.1334150267@Solace, 4/11/2012); however, the key bits have been pretty much rewritten, aiming at 4.2 inclusion. In fact, automatic NUMA placement is something xm/xend does, and xl should include something like that for the sake of feature parity. For the same reason, many of the functionalities that were part of the RFC are being kept out (e.g., NUMA-aware scheduling in the hypervisor). Of course, while at it, all the review comments the RFC got that involve patches which are also part of this series have been considered and applied.

So, here it is, in some more detail, what this is all about:

 1/11 libxc: abstract xenctl_cpumap to just xenctl_map
 2/11 libxl: abstract libxl_cpumap to just libxl_map
 3/11 libxc, libxl: introduce xc_nodemap_t and libxl_nodemap
 4/11 libxl: expand the libxl_{cpu,node}map API a bit
 5/11 libxl: add a new Array type to the IDL
 6/11 libxl: introduce libxl_get_numainfo()

Introduce the data structures, calls and infrastructure needed for retrieving all the information about the NUMA-ness of the system, and for dealing with it at the toolstack level.

 7/11 xen: enhance dump_numa output
 8/11 xl: add more NUMA information to `xl info -n'

Are just output and debugging enhancements.

 9/11 libxl, xl: enable automatic placement of guests on NUMA nodes
10/11 libxl, xl: heuristics for reordering NUMA placement candidates

Are where the automatic placement is really implemented. Searching for suitable placement candidates, and manipulating those candidates a bit, is done within libxl, by adding a couple of API calls. This was not the case in the previous version, but I really think other toolstacks wanting to build on top of libxl could benefit from this, instead of re-implementing everything from scratch. On the other hand, the key bits of the algorithm (i.e., which candidate to choose, how, whether or not to mangle it a bit, etc.) are implemented at the xl level.

11/11 Some automatic NUMA placement documentation

Puts some more details about the implementation and the usage of the new feature directly in the tree.

Thanks a lot and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Dario Faggioli
2012-May-31 12:11 UTC
[PATCH 01 of 11] libxc: abstract xenctl_cpumap to just xenctl_map
More specifically: 1. replaces xenctl_cpumap with xenctl_map 2. provides bitmap_to_xenctl_map and the reverse; 3. re-implement cpumask_to_xenctl_map with bitmap_to_xenctl_map and the reverse; Other than #3, no functional changes. Interface only slightly afected. This is in preparation of introducing NUMA nodes maps. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.eu.com> diff --git a/tools/libxc/xc_cpupool.c b/tools/libxc/xc_cpupool.c --- a/tools/libxc/xc_cpupool.c +++ b/tools/libxc/xc_cpupool.c @@ -90,7 +90,7 @@ xc_cpupoolinfo_t *xc_cpupool_getinfo(xc_ sysctl.u.cpupool_op.op = XEN_SYSCTL_CPUPOOL_OP_INFO; sysctl.u.cpupool_op.cpupool_id = poolid; set_xen_guest_handle(sysctl.u.cpupool_op.cpumap.bitmap, local); - sysctl.u.cpupool_op.cpumap.nr_cpus = local_size * 8; + sysctl.u.cpupool_op.cpumap.nr_elems = local_size * 8; err = do_sysctl_save(xch, &sysctl); @@ -184,7 +184,7 @@ xc_cpumap_t xc_cpupool_freeinfo(xc_inter sysctl.cmd = XEN_SYSCTL_cpupool_op; sysctl.u.cpupool_op.op = XEN_SYSCTL_CPUPOOL_OP_FREEINFO; set_xen_guest_handle(sysctl.u.cpupool_op.cpumap.bitmap, local); - sysctl.u.cpupool_op.cpumap.nr_cpus = mapsize * 8; + sysctl.u.cpupool_op.cpumap.nr_elems = mapsize * 8; err = do_sysctl_save(xch, &sysctl); diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -142,7 +142,7 @@ int xc_vcpu_setaffinity(xc_interface *xc set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, local); - domctl.u.vcpuaffinity.cpumap.nr_cpus = cpusize * 8; + domctl.u.vcpuaffinity.cpumap.nr_elems = cpusize * 8; ret = do_domctl(xch, &domctl); @@ -182,7 +182,7 @@ int xc_vcpu_getaffinity(xc_interface *xc domctl.u.vcpuaffinity.vcpu = vcpu; set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, local); - domctl.u.vcpuaffinity.cpumap.nr_cpus = cpusize * 8; + domctl.u.vcpuaffinity.cpumap.nr_elems = cpusize * 8; ret = do_domctl(xch, &domctl); diff --git a/tools/libxc/xc_tbuf.c b/tools/libxc/xc_tbuf.c --- a/tools/libxc/xc_tbuf.c +++ b/tools/libxc/xc_tbuf.c @@ -134,7 +134,7 @@ int xc_tbuf_set_cpu_mask(xc_interface *x bitmap_64_to_byte(bytemap, &mask64, sizeof (mask64) * 8); set_xen_guest_handle(sysctl.u.tbuf_op.cpu_mask.bitmap, bytemap); - sysctl.u.tbuf_op.cpu_mask.nr_cpus = sizeof(bytemap) * 8; + sysctl.u.tbuf_op.cpu_mask.nr_elems = sizeof(bytemap) * 8; ret = do_sysctl(xch, &sysctl); diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c --- a/xen/arch/x86/cpu/mcheck/mce.c +++ b/xen/arch/x86/cpu/mcheck/mce.c @@ -1545,8 +1545,7 @@ long do_mca(XEN_GUEST_HANDLE(xen_mc_t) u cpumap = &cpu_online_map; else { - ret = xenctl_cpumap_to_cpumask(&cmv, - &op->u.mc_inject_v2.cpumap); + ret = xenctl_map_to_cpumask(&cmv, &op->u.mc_inject_v2.cpumap); if ( ret ) break; cpumap = cmv; diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c --- a/xen/arch/x86/platform_hypercall.c +++ b/xen/arch/x86/platform_hypercall.c @@ -365,7 +365,7 @@ ret_t do_platform_op(XEN_GUEST_HANDLE(xe { uint32_t cpu; uint64_t idletime, now = NOW(); - struct xenctl_cpumap ctlmap; + struct xenctl_map ctlmap; cpumask_var_t cpumap; XEN_GUEST_HANDLE(uint8) cpumap_bitmap; XEN_GUEST_HANDLE(uint64) idletimes; @@ -378,11 +378,11 @@ ret_t do_platform_op(XEN_GUEST_HANDLE(xe if ( cpufreq_controller != FREQCTL_dom0_kernel ) break; - ctlmap.nr_cpus = op->u.getidletime.cpumap_nr_cpus; + ctlmap.nr_elems = op->u.getidletime.cpumap_nr_cpus; guest_from_compat_handle(cpumap_bitmap, op->u.getidletime.cpumap_bitmap); ctlmap.bitmap.p = cpumap_bitmap.p; /* handle 
-> handle_64 conversion */ - if ( (ret = xenctl_cpumap_to_cpumask(&cpumap, &ctlmap)) != 0 ) + if ( (ret = xenctl_map_to_cpumask(&cpumap, &ctlmap)) != 0 ) goto out; guest_from_compat_handle(idletimes, op->u.getidletime.idletime); @@ -401,7 +401,7 @@ ret_t do_platform_op(XEN_GUEST_HANDLE(xe op->u.getidletime.now = now; if ( ret == 0 ) - ret = cpumask_to_xenctl_cpumap(&ctlmap, cpumap); + ret = cpumask_to_xenctl_map(&ctlmap, cpumap); free_cpumask_var(cpumap); if ( ret == 0 && copy_to_guest(u_xenpf_op, op, 1) ) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -489,7 +489,7 @@ int cpupool_do_sysctl(struct xen_sysctl_ op->cpupool_id = c->cpupool_id; op->sched_id = c->sched->sched_id; op->n_dom = c->n_dom; - ret = cpumask_to_xenctl_cpumap(&op->cpumap, c->cpu_valid); + ret = cpumask_to_xenctl_map(&op->cpumap, c->cpu_valid); cpupool_put(c); } break; @@ -584,7 +584,7 @@ int cpupool_do_sysctl(struct xen_sysctl_ case XEN_SYSCTL_CPUPOOL_OP_FREEINFO: { - ret = cpumask_to_xenctl_cpumap( + ret = cpumask_to_xenctl_map( &op->cpumap, &cpupool_free_cpus); } break; diff --git a/xen/common/domctl.c b/xen/common/domctl.c --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -31,28 +31,29 @@ static DEFINE_SPINLOCK(domctl_lock); DEFINE_SPINLOCK(vcpu_alloc_lock); -int cpumask_to_xenctl_cpumap( - struct xenctl_cpumap *xenctl_cpumap, const cpumask_t *cpumask) +int bitmap_to_xenctl_map(struct xenctl_map *xenctl_map, + const unsigned long *bitmap, + unsigned int nbits) { unsigned int guest_bytes, copy_bytes, i; uint8_t zero = 0; int err = 0; - uint8_t *bytemap = xmalloc_array(uint8_t, (nr_cpu_ids + 7) / 8); + uint8_t *bytemap = xmalloc_array(uint8_t, (nbits + 7) / 8); if ( !bytemap ) return -ENOMEM; - guest_bytes = (xenctl_cpumap->nr_cpus + 7) / 8; - copy_bytes = min_t(unsigned int, guest_bytes, (nr_cpu_ids + 7) / 8); + guest_bytes = (xenctl_map->nr_elems + 7) / 8; + copy_bytes = min_t(unsigned int, guest_bytes, (nbits + 7) / 8); - bitmap_long_to_byte(bytemap, cpumask_bits(cpumask), nr_cpu_ids); + bitmap_long_to_byte(bytemap, bitmap, nbits); if ( copy_bytes != 0 ) - if ( copy_to_guest(xenctl_cpumap->bitmap, bytemap, copy_bytes) ) + if ( copy_to_guest(xenctl_map->bitmap, bytemap, copy_bytes) ) err = -EFAULT; for ( i = copy_bytes; !err && i < guest_bytes; i++ ) - if ( copy_to_guest_offset(xenctl_cpumap->bitmap, i, &zero, 1) ) + if ( copy_to_guest_offset(xenctl_map->bitmap, i, &zero, 1) ) err = -EFAULT; xfree(bytemap); @@ -60,36 +61,58 @@ int cpumask_to_xenctl_cpumap( return err; } -int xenctl_cpumap_to_cpumask( - cpumask_var_t *cpumask, const struct xenctl_cpumap *xenctl_cpumap) +int xenctl_map_to_bitmap(unsigned long *bitmap, + const struct xenctl_map *xenctl_map, + unsigned int nbits) { unsigned int guest_bytes, copy_bytes; int err = 0; - uint8_t *bytemap = xzalloc_array(uint8_t, (nr_cpu_ids + 7) / 8); + uint8_t *bytemap = xzalloc_array(uint8_t, (nbits + 7) / 8); if ( !bytemap ) return -ENOMEM; - guest_bytes = (xenctl_cpumap->nr_cpus + 7) / 8; - copy_bytes = min_t(unsigned int, guest_bytes, (nr_cpu_ids + 7) / 8); + guest_bytes = (xenctl_map->nr_elems + 7) / 8; + copy_bytes = min_t(unsigned int, guest_bytes, (nbits + 7) / 8); if ( copy_bytes != 0 ) { - if ( copy_from_guest(bytemap, xenctl_cpumap->bitmap, copy_bytes) ) + if ( copy_from_guest(bytemap, xenctl_map->bitmap, copy_bytes) ) err = -EFAULT; - if ( (xenctl_cpumap->nr_cpus & 7) && (guest_bytes <= sizeof(bytemap)) ) - bytemap[guest_bytes-1] &= ~(0xff << (xenctl_cpumap->nr_cpus & 7)); + if ( 
(xenctl_map->nr_elems & 7) && (guest_bytes <= sizeof(bytemap)) ) + bytemap[guest_bytes-1] &= ~(0xff << (xenctl_map->nr_elems & 7)); } - if ( err ) - /* nothing */; - else if ( alloc_cpumask_var(cpumask) ) - bitmap_byte_to_long(cpumask_bits(*cpumask), bytemap, nr_cpu_ids); + if ( !err ) + bitmap_byte_to_long(bitmap, bytemap, nbits); + + xfree(bytemap); + + return err; +} + +int cpumask_to_xenctl_map(struct xenctl_map *xenctl_cpumap, + const cpumask_t *cpumask) +{ + return bitmap_to_xenctl_map(xenctl_cpumap, cpumask_bits(cpumask), + nr_cpu_ids); +} + +int xenctl_map_to_cpumask(cpumask_var_t *cpumask, + const struct xenctl_map *xenctl_cpumap) +{ + int err = 0; + + if ( alloc_cpumask_var(cpumask) ) { + err = xenctl_map_to_bitmap(cpumask_bits(*cpumask), xenctl_cpumap, + nr_cpu_ids); + /* In case of error, cleanup is up to us, as the caller won''t care! */ + if ( err ) + free_cpumask_var(*cpumask); + } else err = -ENOMEM; - xfree(bytemap); - return err; } @@ -617,7 +640,7 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc { cpumask_var_t new_affinity; - ret = xenctl_cpumap_to_cpumask( + ret = xenctl_map_to_cpumask( &new_affinity, &op->u.vcpuaffinity.cpumap); if ( !ret ) { @@ -627,7 +650,7 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc } else { - ret = cpumask_to_xenctl_cpumap( + ret = cpumask_to_xenctl_map( &op->u.vcpuaffinity.cpumap, v->cpu_affinity); } diff --git a/xen/common/trace.c b/xen/common/trace.c --- a/xen/common/trace.c +++ b/xen/common/trace.c @@ -384,7 +384,7 @@ int tb_control(xen_sysctl_tbuf_op_t *tbc { cpumask_var_t mask; - rc = xenctl_cpumap_to_cpumask(&mask, &tbc->cpu_mask); + rc = xenctl_map_to_cpumask(&mask, &tbc->cpu_mask); if ( !rc ) { cpumask_copy(&tb_cpu_mask, mask); diff --git a/xen/include/public/arch-x86/xen-mca.h b/xen/include/public/arch-x86/xen-mca.h --- a/xen/include/public/arch-x86/xen-mca.h +++ b/xen/include/public/arch-x86/xen-mca.h @@ -414,7 +414,7 @@ struct xen_mc_mceinject { struct xen_mc_inject_v2 { uint32_t flags; - struct xenctl_cpumap cpumap; + struct xenctl_map cpumap; }; #endif diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -283,7 +283,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_getvc /* XEN_DOMCTL_getvcpuaffinity */ struct xen_domctl_vcpuaffinity { uint32_t vcpu; /* IN */ - struct xenctl_cpumap cpumap; /* IN/OUT */ + struct xenctl_map cpumap; /* IN/OUT */ }; typedef struct xen_domctl_vcpuaffinity xen_domctl_vcpuaffinity_t; DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpuaffinity_t); diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -71,8 +71,8 @@ struct xen_sysctl_tbuf_op { #define XEN_SYSCTL_TBUFOP_disable 5 uint32_t cmd; /* IN/OUT variables */ - struct xenctl_cpumap cpu_mask; - uint32_t evt_mask; + struct xenctl_map cpu_mask; + uint32_t evt_mask; /* OUT variables */ uint64_aligned_t buffer_mfn; uint32_t size; /* Also an IN variable! 
*/ @@ -531,7 +531,7 @@ struct xen_sysctl_cpupool_op { uint32_t domid; /* IN: M */ uint32_t cpu; /* IN: AR */ uint32_t n_dom; /* OUT: I */ - struct xenctl_cpumap cpumap; /* OUT: IF */ + struct xenctl_map cpumap; /* OUT: IF */ }; typedef struct xen_sysctl_cpupool_op xen_sysctl_cpupool_op_t; DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cpupool_op_t); diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -822,9 +822,9 @@ typedef uint8_t xen_domain_handle_t[16]; #endif #ifndef __ASSEMBLY__ -struct xenctl_cpumap { +struct xenctl_map { XEN_GUEST_HANDLE_64(uint8) bitmap; - uint32_t nr_cpus; + uint32_t nr_elems; }; #endif diff --git a/xen/include/xen/cpumask.h b/xen/include/xen/cpumask.h --- a/xen/include/xen/cpumask.h +++ b/xen/include/xen/cpumask.h @@ -424,8 +424,8 @@ extern cpumask_t cpu_present_map; #define for_each_present_cpu(cpu) for_each_cpu(cpu, &cpu_present_map) /* Copy to/from cpumap provided by control tools. */ -struct xenctl_cpumap; -int cpumask_to_xenctl_cpumap(struct xenctl_cpumap *, const cpumask_t *); -int xenctl_cpumap_to_cpumask(cpumask_var_t *, const struct xenctl_cpumap *); +struct xenctl_map; +int cpumask_to_xenctl_map(struct xenctl_map *, const cpumask_t *); +int xenctl_map_to_cpumask(cpumask_var_t *, const struct xenctl_map *); #endif /* __XEN_CPUMASK_H */ diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst --- a/xen/include/xlat.lst +++ b/xen/include/xlat.lst @@ -2,7 +2,7 @@ # ! - needs translation # ? - needs checking ? dom0_vga_console_info xen.h -? xenctl_cpumap xen.h +? xenctl_map xen.h ? mmu_update xen.h ! mmuext_op xen.h ! start_info xen.h
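For reference, a caller-side sketch (not part of the patch) of how the generic conversion pair is meant to be used inside the hypervisor; the cpumask wrappers are now thin layers over bitmap_to_xenctl_map()/xenctl_map_to_bitmap():

    /* Sketch: ctlmap's .bitmap guest handle and .nr_elems are assumed to have
     * been filled in by the caller, e.g. from a domctl/sysctl argument. */
    struct xenctl_map ctlmap;
    cpumask_var_t mask;
    int err;

    err = xenctl_map_to_cpumask(&mask, &ctlmap);    /* guest bitmap -> cpumask */
    if ( !err )
    {
        /* ... operate on mask ... */
        err = cpumask_to_xenctl_map(&ctlmap, mask); /* cpumask -> guest bitmap */
        free_cpumask_var(mask);
    }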
Dario Faggioli
2012-May-31 12:11 UTC
[PATCH 02 of 11] libxl: abstract libxl_cpumap to just libxl_map
More specifically: 1. introduces struct libxl_map; 2. re-implement libxl_cpumap_* on top of struct libxl_map_*; No functional nor interface changes at all. This is in preparation of the introduction of NUMA nodes maps. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.eu.com> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -282,11 +282,17 @@ typedef uint32_t libxl_hwcap[8]; typedef uint64_t libxl_ev_user; -typedef struct { +struct libxl_map { uint32_t size; /* number of bytes in map */ uint8_t *map; -} libxl_cpumap; -void libxl_cpumap_dispose(libxl_cpumap *map); +}; +void libxl_map_dispose(struct libxl_map *map); + +typedef struct libxl_map libxl_cpumap; +static inline void libxl_cpumap_dispose(libxl_cpumap *cpumap) +{ + return libxl_map_dispose(cpumap); +} typedef struct { /* diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c --- a/tools/libxl/libxl_utils.c +++ b/tools/libxl/libxl_utils.c @@ -488,47 +488,53 @@ int libxl_mac_to_device_nic(libxl_ctx *c return rc; } +void libxl_map_dispose(struct libxl_map *map) +{ + free(map->map); +} + +static int libxl_map_alloc(libxl_ctx *ctx, struct libxl_map *map, int n_elems) +{ + int sz; + + sz = (n_elems + 7) / 8; + map->map = calloc(sz, sizeof(*map->map)); + if (!map->map) + return ERROR_NOMEM; + map->size = sz; + return 0; +} + +int libxl_map_test(struct libxl_map *map, int elem) +{ + if (elem >= map->size * 8) + return 0; + return (map->map[elem / 8] & (1 << (elem & 7))) ? 1 : 0; +} + +void libxl_map_set(struct libxl_map *map, int elem) +{ + if (elem >= map->size * 8) + return; + map->map[elem / 8] |= 1 << (elem & 7); +} + +void libxl_map_reset(struct libxl_map *map, int elem) +{ + if (elem >= map->size * 8) + return; + map->map[elem / 8] &= ~(1 << (elem & 7)); +} + int libxl_cpumap_alloc(libxl_ctx *ctx, libxl_cpumap *cpumap) { int max_cpus; - int sz; max_cpus = libxl_get_max_cpus(ctx); if (max_cpus == 0) return ERROR_FAIL; - sz = (max_cpus + 7) / 8; - cpumap->map = calloc(sz, sizeof(*cpumap->map)); - if (!cpumap->map) - return ERROR_NOMEM; - cpumap->size = sz; - return 0; -} - -void libxl_cpumap_dispose(libxl_cpumap *map) -{ - free(map->map); -} - -int libxl_cpumap_test(libxl_cpumap *cpumap, int cpu) -{ - if (cpu >= cpumap->size * 8) - return 0; - return (cpumap->map[cpu / 8] & (1 << (cpu & 7))) ? 
1 : 0; -} - -void libxl_cpumap_set(libxl_cpumap *cpumap, int cpu) -{ - if (cpu >= cpumap->size * 8) - return; - cpumap->map[cpu / 8] |= 1 << (cpu & 7); -} - -void libxl_cpumap_reset(libxl_cpumap *cpumap, int cpu) -{ - if (cpu >= cpumap->size * 8) - return; - cpumap->map[cpu / 8] &= ~(1 << (cpu & 7)); + return libxl_map_alloc(ctx, cpumap, max_cpus); } int libxl_get_max_cpus(libxl_ctx *ctx) diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h --- a/tools/libxl/libxl_utils.h +++ b/tools/libxl/libxl_utils.h @@ -63,21 +63,46 @@ int libxl_devid_to_device_nic(libxl_ctx int libxl_vdev_to_device_disk(libxl_ctx *ctx, uint32_t domid, const char *vdev, libxl_device_disk *disk); +int libxl_map_test(struct libxl_map *map, int elem); +void libxl_map_set(struct libxl_map *map, int elem); +void libxl_map_reset(struct libxl_map *map, int elem); +static inline void libxl_map_set_any(struct libxl_map *map) +{ + memset(map->map, -1, map->size); +} +static inline void libxl_map_set_none(struct libxl_map *map) +{ + memset(map->map, 0, map->size); +} +static inline int libxl_map_elem_valid(struct libxl_map *map, int elem) +{ + return elem >= 0 && elem < (map->size * 8); +} + int libxl_cpumap_alloc(libxl_ctx *ctx, libxl_cpumap *cpumap); -int libxl_cpumap_test(libxl_cpumap *cpumap, int cpu); -void libxl_cpumap_set(libxl_cpumap *cpumap, int cpu); -void libxl_cpumap_reset(libxl_cpumap *cpumap, int cpu); +static inline int libxl_cpumap_test(libxl_cpumap *cpumap, int cpu) +{ + return libxl_map_test(cpumap, cpu); +} +static inline void libxl_cpumap_set(libxl_cpumap *cpumap, int cpu) +{ + libxl_map_set(cpumap, cpu); +} +static inline void libxl_cpumap_reset(libxl_cpumap *cpumap, int cpu) +{ + libxl_map_reset(cpumap, cpu); +} static inline void libxl_cpumap_set_any(libxl_cpumap *cpumap) { - memset(cpumap->map, -1, cpumap->size); + libxl_map_set_any(cpumap); } static inline void libxl_cpumap_set_none(libxl_cpumap *cpumap) { - memset(cpumap->map, 0, cpumap->size); + libxl_map_set_none(cpumap); } static inline int libxl_cpumap_cpu_valid(libxl_cpumap *cpumap, int cpu) { - return cpu >= 0 && cpu < (cpumap->size * 8); + return libxl_map_elem_valid(cpumap, cpu); } #define libxl_for_each_cpu(var, map) for (var = 0; var < (map).size * 8; var++) #define libxl_for_each_set_cpu(v, m) for (v = 0; v < (m).size * 8; v++) \
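A minimal usage sketch (hypothetical caller, assuming an initialized libxl_ctx *ctx and that libxl.h/libxl_utils.h are included); the libxl_cpumap_* calls behave exactly as before, they just forward to the new generic libxl_map_* operations:

    libxl_cpumap cpumap;

    if (libxl_cpumap_alloc(ctx, &cpumap) == 0) {
        libxl_cpumap_set(&cpumap, 2);           /* forwards to libxl_map_set()   */
        if (libxl_cpumap_test(&cpumap, 2))      /* forwards to libxl_map_test()  */
            libxl_cpumap_reset(&cpumap, 2);     /* forwards to libxl_map_reset() */
        libxl_cpumap_dispose(&cpumap);          /* forwards to libxl_map_dispose() */
    }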
Dario Faggioli
2012-May-31 12:11 UTC
[PATCH 03 of 11] libxc, libxl: introduce xc_nodemap_t and libxl_nodemap
As NUMA node-related counterparts of xc_cpumap_t and libxl_cpumap. This is in preparation of making it possible to manipulate NUMA nodes from the toolstack(s). Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c --- a/tools/libxc/xc_misc.c +++ b/tools/libxc/xc_misc.c @@ -35,11 +35,30 @@ int xc_get_max_cpus(xc_interface *xch) return max_cpus; } +int xc_get_max_nodes(xc_interface *xch) +{ + static int max_nodes = 0; + xc_physinfo_t physinfo; + + if ( max_nodes ) + return max_nodes; + + if ( !xc_physinfo(xch, &physinfo) ) + max_nodes = physinfo.max_node_id + 1; + + return max_nodes; +} + int xc_get_cpumap_size(xc_interface *xch) { return (xc_get_max_cpus(xch) + 7) / 8; } +int xc_get_nodemap_size(xc_interface *xch) +{ + return (xc_get_max_nodes(xch) + 7) / 8; +} + xc_cpumap_t xc_cpumap_alloc(xc_interface *xch) { int sz; @@ -50,6 +69,16 @@ xc_cpumap_t xc_cpumap_alloc(xc_interface return calloc(1, sz); } +xc_nodemap_t xc_nodemap_alloc(xc_interface *xch) +{ + int sz; + + sz = xc_get_nodemap_size(xch); + if (sz == 0) + return NULL; + return calloc(1, sz); +} + int xc_readconsolering(xc_interface *xch, char *buffer, unsigned int *pnr_chars, diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h --- a/tools/libxc/xenctrl.h +++ b/tools/libxc/xenctrl.h @@ -330,6 +330,20 @@ int xc_get_cpumap_size(xc_interface *xch xc_cpumap_t xc_cpumap_alloc(xc_interface *xch); /* + * NODEMAP handling + */ +typedef uint8_t *xc_nodemap_t; + +/* return maximum number of NUMA nodes the hypervisor supports */ +int xc_get_max_nodes(xc_interface *xch); + +/* return array size for nodemap */ +int xc_get_nodemap_size(xc_interface *xch); + +/* allocate a nodemap */ +xc_nodemap_t xc_nodemap_alloc(xc_interface *xch); + +/* * DOMAIN DEBUGGING FUNCTIONS */ diff --git a/tools/libxl/gentest.py b/tools/libxl/gentest.py --- a/tools/libxl/gentest.py +++ b/tools/libxl/gentest.py @@ -20,7 +20,7 @@ def randomize_case(s): def randomize_enum(e): return random.choice([v.name for v in e.values]) -handcoded = ["libxl_cpumap", "libxl_key_value_list", +handcoded = ["libxl_cpumap", "libxl_nodemap", "libxl_key_value_list", "libxl_cpuid_policy_list", "libxl_file_reference", "libxl_string_list"] diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -294,6 +294,12 @@ static inline void libxl_cpumap_dispose( return libxl_map_dispose(cpumap); } +typedef struct libxl_map libxl_nodemap; +static inline void libxl_nodemap_dispose(libxl_nodemap *nodemap) +{ + return libxl_map_dispose(nodemap); +} + typedef struct { /* * Path is always set if the file reference is valid. However if @@ -549,6 +555,9 @@ int libxl_domain_preserve(libxl_ctx *ctx /* get max. number of cpus supported by hypervisor */ int libxl_get_max_cpus(libxl_ctx *ctx); +/* get max. 
number of NUMA nodes supported by hypervisor */ +int libxl_get_max_nodes(libxl_ctx *ctx); + /* * Run the configured bootloader for a PV domain and update * info->kernel, info->u.pv.ramdisk and info->u.pv.cmdline as diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -11,6 +11,7 @@ libxl_domid = Builtin("domid", json_fn libxl_uuid = Builtin("uuid", passby=PASS_BY_REFERENCE) libxl_mac = Builtin("mac", passby=PASS_BY_REFERENCE) libxl_cpumap = Builtin("cpumap", dispose_fn="libxl_cpumap_dispose", passby=PASS_BY_REFERENCE) +libxl_nodemap = Builtin("nodemap", dispose_fn="libxl_nodemap_dispose", passby=PASS_BY_REFERENCE) libxl_cpuid_policy_list = Builtin("cpuid_policy_list", dispose_fn="libxl_cpuid_dispose", passby=PASS_BY_REFERENCE) libxl_string_list = Builtin("string_list", dispose_fn="libxl_string_list_dispose", passby=PASS_BY_REFERENCE) diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c --- a/tools/libxl/libxl_utils.c +++ b/tools/libxl/libxl_utils.c @@ -537,11 +537,27 @@ int libxl_cpumap_alloc(libxl_ctx *ctx, l return libxl_map_alloc(ctx, cpumap, max_cpus); } +int libxl_nodemap_alloc(libxl_ctx *ctx, libxl_nodemap *nodemap) +{ + int max_nodes; + + max_nodes = libxl_get_max_nodes(ctx); + if (max_nodes == 0) + return ERROR_FAIL; + + return libxl_map_alloc(ctx, nodemap, max_nodes); +} + int libxl_get_max_cpus(libxl_ctx *ctx) { return xc_get_max_cpus(ctx->xch); } +int libxl_get_max_nodes(libxl_ctx *ctx) +{ + return xc_get_max_nodes(ctx->xch); +} + int libxl__enum_from_string(const libxl_enum_string_table *t, const char *s, int *e) { diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h --- a/tools/libxl/libxl_utils.h +++ b/tools/libxl/libxl_utils.h @@ -108,6 +108,35 @@ static inline int libxl_cpumap_cpu_valid #define libxl_for_each_set_cpu(v, m) for (v = 0; v < (m).size * 8; v++) \ if (libxl_cpumap_test(&(m), v)) +int libxl_nodemap_alloc(libxl_ctx *ctx, libxl_nodemap *nodemap); +static inline int libxl_nodemap_test(libxl_nodemap *nodemap, int node) +{ + return libxl_map_test(nodemap, node); +} +static inline void libxl_nodemap_set(libxl_nodemap *nodemap, int node) +{ + libxl_map_set(nodemap, node); +} +static inline void libxl_nodemap_reset(libxl_nodemap *nodemap, int node) +{ + libxl_map_reset(nodemap, node); +} +static inline void libxl_nodemap_set_any(libxl_nodemap *nodemap) +{ + libxl_map_set_any(nodemap); +} +static inline void libxl_nodemap_set_none(libxl_nodemap *nodemap) +{ + libxl_map_set_none(nodemap); +} +static inline int libxl_nodemap_node_valid(libxl_nodemap *nodemap, int node) +{ + return libxl_map_elem_valid(nodemap, node); +} +#define libxl_for_each_node(var, map) for (var = 0; var < (map).size * 8; var++) +#define libxl_for_each_set_node(v, m) for (v = 0; v < (m).size * 8; v++) \ + if (libxl_nodemap_test(&(m), v)) + static inline uint32_t libxl__sizekb_to_mb(uint32_t s) { return (s + 1023) / 1024; } diff --git a/xen/common/domctl.c b/xen/common/domctl.c --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -116,6 +116,21 @@ int xenctl_map_to_cpumask(cpumask_var_t return err; } +int nodemask_to_xenctl_map(struct xenctl_map *xenctl_nodemap, + const nodemask_t *nodemask) +{ + return bitmap_to_xenctl_map(xenctl_nodemap, cpumask_bits(nodemask), + MAX_NUMNODES); +} + +int xenctl_map_to_nodemask(nodemask_t *nodemask, + const struct xenctl_map *xenctl_nodemap) +{ + return xenctl_map_to_bitmap(nodes_addr(*nodemask), + xenctl_nodemap, + MAX_NUMNODES); +} + static inline 
int is_free_domid(domid_t dom) { struct domain *d;
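A small sketch of how the new nodemap mirrors the existing cpumap API (hypothetical caller; assumes an initialized libxl_ctx *ctx and <stdio.h>):

    libxl_nodemap nodemap;
    int node;

    if (libxl_nodemap_alloc(ctx, &nodemap) == 0) {
        libxl_nodemap_set_none(&nodemap);
        libxl_nodemap_set(&nodemap, 0);               /* mark node 0 */
        libxl_for_each_set_node(node, nodemap)
            printf("node %d is in the map\n", node);
        libxl_nodemap_dispose(&nodemap);
    }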
Dario Faggioli
2012-May-31 12:11 UTC
[PATCH 04 of 11] libxl: expand the libxl_{cpu, node}map API a bit
By adding copying and *_is_full/*_is_empty facilities. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c --- a/tools/libxl/libxl_utils.c +++ b/tools/libxl/libxl_utils.c @@ -505,6 +505,35 @@ static int libxl_map_alloc(libxl_ctx *ct return 0; } +static void libxl_map_copy(struct libxl_map *dptr, + const struct libxl_map *sptr) +{ + int sz; + + sz = dptr->size = sptr->size; + memcpy(dptr->map, sptr->map, sz * sizeof(*dptr->map)); +} + +int libxl_map_is_full(struct libxl_map *map) +{ + int i; + + for (i = 0; i < map->size; i++) + if (map->map[i] != (uint8_t)-1) + return -1; + return 0; +} + +int libxl_map_is_empty(struct libxl_map *map) +{ + int i; + + for (i = 0; i < map->size; i++) + if (map->map[i]) + return -1; + return 0; +} + int libxl_map_test(struct libxl_map *map, int elem) { if (elem >= map->size * 8) @@ -548,6 +577,18 @@ int libxl_nodemap_alloc(libxl_ctx *ctx, return libxl_map_alloc(ctx, nodemap, max_nodes); } +void libxl_cpumap_copy(/*XXX libxl_ctx *ctx,*/ libxl_cpumap *dst, + const libxl_cpumap *src) +{ + libxl_map_copy(dst, src); +} + +void libxl_nodemap_copy(/*XXX libxl_ctx *ctx,*/ libxl_nodemap *dst, + const libxl_nodemap *src) +{ + libxl_map_copy(dst, src); +} + int libxl_get_max_cpus(libxl_ctx *ctx) { return xc_get_max_cpus(ctx->xch); diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h --- a/tools/libxl/libxl_utils.h +++ b/tools/libxl/libxl_utils.h @@ -63,6 +63,8 @@ int libxl_devid_to_device_nic(libxl_ctx int libxl_vdev_to_device_disk(libxl_ctx *ctx, uint32_t domid, const char *vdev, libxl_device_disk *disk); +int libxl_map_is_full(struct libxl_map *map); +int libxl_map_is_empty(struct libxl_map *map); int libxl_map_test(struct libxl_map *map, int elem); void libxl_map_set(struct libxl_map *map, int elem); void libxl_map_reset(struct libxl_map *map, int elem); @@ -80,6 +82,16 @@ static inline int libxl_map_elem_valid(s } int libxl_cpumap_alloc(libxl_ctx *ctx, libxl_cpumap *cpumap); +void libxl_cpumap_copy(/*XXX libxl_ctx *ctx, */ libxl_cpumap *dst, + const libxl_cpumap *src); +static inline int libxl_cpumap_is_full(libxl_cpumap *cpumap) +{ + return libxl_map_is_full(cpumap); +} +static inline int libxl_cpumap_is_empty(libxl_cpumap *cpumap) +{ + return libxl_map_is_empty(cpumap); +} static inline int libxl_cpumap_test(libxl_cpumap *cpumap, int cpu) { return libxl_map_test(cpumap, cpu); @@ -109,6 +121,16 @@ static inline int libxl_cpumap_cpu_valid if (libxl_cpumap_test(&(m), v)) int libxl_nodemap_alloc(libxl_ctx *ctx, libxl_nodemap *nodemap); +void libxl_nodemap_copy(/*XXX libxl_ctx *ctx, */ libxl_nodemap *dst, + const libxl_nodemap *src); +static inline int libxl_nodemap_is_full(libxl_nodemap *nodemap) +{ + return libxl_map_is_full(nodemap); +} +static inline int libxl_nodemap_is_empty(libxl_nodemap *nodemap) +{ + return libxl_map_is_empty(nodemap); +} static inline int libxl_nodemap_test(libxl_nodemap *nodemap, int node) { return libxl_map_test(nodemap, node);
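A short sketch of the new helpers (hypothetical caller, assuming ctx and <assert.h>); note that, as implemented in this patch, the *_is_full()/*_is_empty() helpers return 0 when the answer is "yes" and -1 otherwise:

    libxl_cpumap a, b;

    /* error handling omitted for brevity */
    libxl_cpumap_alloc(ctx, &a);
    libxl_cpumap_alloc(ctx, &b);

    libxl_cpumap_set_any(&a);                /* a: all bits set */
    libxl_cpumap_copy(&b, &a);               /* b becomes a full map too */

    assert(libxl_cpumap_is_full(&b) == 0);   /* 0 here means "yes, it is full" */
    assert(libxl_cpumap_is_empty(&b) == -1); /* -1 here means "no, not empty"  */

    libxl_cpumap_dispose(&a);
    libxl_cpumap_dispose(&b);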
Dario Faggioli
2012-May-31 12:11 UTC
[PATCH 05 of 11] libxl: add a new Array type to the IDL
And make all the required infrastructure updates to enable this. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Dario Faggioli <dario.faggioli@citrix.com> diff --git a/tools/libxl/gentest.py b/tools/libxl/gentest.py --- a/tools/libxl/gentest.py +++ b/tools/libxl/gentest.py @@ -28,6 +28,18 @@ def gen_rand_init(ty, v, indent = " " s = "" if isinstance(ty, idl.Enumeration): s += "%s = %s;\n" % (ty.pass_arg(v, parent is None), randomize_enum(ty)) + elif isinstance(ty, idl.Array): + if parent is None: + raise Exception("Array type must have a parent") + s += "%s = rand()%%8;\n" % (parent + ty.lenvar.name) + s += "%s = calloc(%s, sizeof(*%s));\n" % \ + (v, parent + ty.lenvar.name, v) + s += "{\n" + s += " int i;\n" + s += " for (i=0; i<%s; i++)\n" % (parent + ty.lenvar.name) + s += gen_rand_init(ty.elem_type, v+"[i]", + indent + " ", parent) + s += "}\n" elif isinstance(ty, idl.KeyedUnion): if parent is None: raise Exception("KeyedUnion type must have a parent") diff --git a/tools/libxl/gentypes.py b/tools/libxl/gentypes.py --- a/tools/libxl/gentypes.py +++ b/tools/libxl/gentypes.py @@ -11,8 +11,12 @@ def libxl_C_instance_of(ty, instancename return libxl_C_type_define(ty) else: return libxl_C_type_define(ty) + " " + instancename - else: - return ty.typename + " " + instancename + + s = "" + if isinstance(ty, idl.Array): + s += libxl_C_instance_of(ty.lenvar.type, ty.lenvar.name) + ";\n" + + return s + ty.typename + " " + instancename def libxl_C_type_define(ty, indent = ""): s = "" @@ -66,6 +70,18 @@ def libxl_C_type_dispose(ty, v, indent s += libxl_C_type_dispose(f.type, fexpr, indent + " ", nparent) s += " break;\n" s += "}\n" + elif isinstance(ty, idl.Array): + if parent is None: + raise Exception("Array type must have a parent") + s += "{\n" + if ty.elem_type.dispose_fn is not None: + s += " int i;\n" + s += " for (i=0; i<%s; i++)\n" % (parent + ty.lenvar.name) + s += libxl_C_type_dispose(ty.elem_type, v+"[i]", + indent + " ", parent) + if ty.dispose_fn is not None: + s += " %s(%s);\n" % (ty.dispose_fn, ty.pass_arg(v, parent is None)) + s += "}\n" elif isinstance(ty, idl.Struct) and (parent is None or ty.dispose_fn is None): for f in [f for f in ty.fields if not f.const]: (nparent,fexpr) = ty.member(v, f, parent is None) @@ -164,7 +180,24 @@ def libxl_C_type_gen_json(ty, v, indent s = "" if parent is None: s += "yajl_gen_status s;\n" - if isinstance(ty, idl.Enumeration): + + if isinstance(ty, idl.Array): + if parent is None: + raise Exception("Array type must have a parent") + s += "{\n" + s += " int i;\n" + s += " s = yajl_gen_array_open(hand);\n" + s += " if (s != yajl_gen_status_ok)\n" + s += " goto out;\n" + s += " for (i=0; i<%s; i++) {\n" % (parent + ty.lenvar.name) + s += libxl_C_type_gen_json(ty.elem_type, v+"[i]", + indent + " ", parent) + s += " }\n" + s += " s = yajl_gen_array_close(hand);\n" + s += " if (s != yajl_gen_status_ok)\n" + s += " goto out;\n" + s += "}\n" + elif isinstance(ty, idl.Enumeration): s += "s = libxl__yajl_gen_enum(hand, %s_to_string(%s));\n" % (ty.typename, ty.pass_arg(v, parent is None)) s += "if (s != yajl_gen_status_ok)\n" s += " goto out;\n" diff --git a/tools/libxl/idl.py b/tools/libxl/idl.py --- a/tools/libxl/idl.py +++ b/tools/libxl/idl.py @@ -266,6 +266,17 @@ string = Builtin("char *", namespace = N json_fn = "libxl__string_gen_json", autogenerate_json = False) +class Array(Type): + """An array of the same type""" + def __init__(self, elem_type, lenvar_name, **kwargs): + kwargs.setdefault(''dispose_fn'', ''free'') + 
Type.__init__(self, namespace=elem_type.namespace, typename=elem_type.rawname + " *", **kwargs) + + lv_kwargs = dict([(x.lstrip(''lenvar_''),y) for (x,y) in kwargs.items() if x.startswith(''lenvar_'')]) + + self.lenvar = Field(integer, lenvar_name, **lv_kwargs) + self.elem_type = elem_type + class OrderedDict(dict): """A dictionary which remembers insertion order. diff --git a/tools/libxl/idl.txt b/tools/libxl/idl.txt --- a/tools/libxl/idl.txt +++ b/tools/libxl/idl.txt @@ -145,12 +145,24 @@ idl.KeyedUnion A subclass of idl.Aggregate which represents the C union type where the currently valid member of the union can be determined based - upon another member in the containing type. + upon another member in the containing type. An idl.KeyedUnion must + always be a member of a containing idl.Aggregate type. - The KeyedUnion.keyvar contains an idl.type the member of the + The KeyedUnion.keyvar contains an idl.Type the member of the containing type which determines the valid member of the union. The must be an instance of the Enumeration type. +idl.Array + + A class representing an array or similar elements. An idl.Array must + always be an idl.Field of a containing idl.Aggregate. + + idl.Array.elem_type contains an idl.Type which is the type of all + elements of the array. + + idl.Array.len_var contains an idl.Field which is added to the parent + idl.Aggregate and will contain the length of the array. + Standard Types -------------- diff --git a/tools/ocaml/libs/xl/genwrap.py b/tools/ocaml/libs/xl/genwrap.py --- a/tools/ocaml/libs/xl/genwrap.py +++ b/tools/ocaml/libs/xl/genwrap.py @@ -54,7 +54,8 @@ def ocaml_type_of(ty): return "int%d" % ty.width else: return "int" - + elif isinstance(ty,idl.Array): + return "%s list" % ocaml_type_of(ty.elem_type) elif isinstance(ty,idl.Builtin): if not builtins.has_key(ty.typename): raise NotImplementedError("Unknown Builtin %s (%s)" % (ty.typename, type(ty))) diff --git a/tools/python/genwrap.py b/tools/python/genwrap.py --- a/tools/python/genwrap.py +++ b/tools/python/genwrap.py @@ -4,7 +4,7 @@ import sys,os import idl -(TYPE_DEFBOOL, TYPE_BOOL, TYPE_INT, TYPE_UINT, TYPE_STRING, TYPE_AGGREGATE) = range(6) +(TYPE_DEFBOOL, TYPE_BOOL, TYPE_INT, TYPE_UINT, TYPE_STRING, TYPE_ARRAY, TYPE_AGGREGATE) = range(7) def py_type(ty): if ty == idl.bool: @@ -18,6 +18,8 @@ def py_type(ty): return TYPE_INT else: return TYPE_UINT + if isinstance(ty, idl.Array): + return TYPE_ARRAY if isinstance(ty, idl.Aggregate): return TYPE_AGGREGATE if ty == idl.string: @@ -74,7 +76,7 @@ def py_attrib_get(ty, f): l.append('' return genwrap__ull_get(self->obj.%s);''%f.name) elif t == TYPE_STRING: l.append('' return genwrap__string_get(&self->obj.%s);''%f.name) - elif t == TYPE_AGGREGATE: + elif t == TYPE_AGGREGATE or t == TYPE_ARRAY: l.append('' PyErr_SetString(PyExc_NotImplementedError, "Getting %s");''%ty.typename) l.append('' return NULL;'') else: @@ -105,7 +107,7 @@ def py_attrib_set(ty, f): l.append('' return ret;'') elif t == TYPE_STRING: l.append('' return genwrap__string_set(v, &self->obj.%s);''%f.name) - elif t == TYPE_AGGREGATE: + elif t == TYPE_AGGREGATE or t == TYPE_ARRAY: l.append('' PyErr_SetString(PyExc_NotImplementedError, "Setting %s");''%ty.typename) l.append('' return -1;'') else:
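To make the IDL-to-C mapping concrete, here is roughly what the generator is expected to emit for a hypothetical field declared as ("distances", Array(uint32, "num_distances")) inside a struct; the length variable named by lenvar is hoisted into the containing C struct next to the pointer to the elements (this is only a sketch of the intended output, per libxl_C_instance_of() above, not generated code from the patch):

    /* Hypothetical IDL fragment:
     *   libxl_foo = Struct("foo", [ ("distances", Array(uint32, "num_distances")) ])
     * Expected generated C (sketch): */
    typedef struct libxl_foo {
        int num_distances;
        uint32_t *distances;
    } libxl_foo;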
Dario Faggioli
2012-May-31 12:11 UTC
[PATCH 06 of 11] libxl: introduce libxl_get_numainfo()
Make some NUMA node information available to the toolstack. Achieve this by means of xc_numainfo(), which exposes memory size and amount of free memory of each node, as well as the relative distances of each node to all the others. For properly exposing distances we need the IDL to support arrays. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -3014,6 +3014,83 @@ fail: return ret; } +libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr) +{ + xc_numainfo_t ninfo; + DECLARE_HYPERCALL_BUFFER(xc_node_to_memsize_t, memsize); + DECLARE_HYPERCALL_BUFFER(xc_node_to_memfree_t, memfree); + DECLARE_HYPERCALL_BUFFER(uint32_t, node_distances); + libxl_numainfo *ret = NULL; + int i, j, max_nodes; + + max_nodes = libxl_get_max_nodes(ctx); + if (max_nodes == 0) + { + LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of NODES"); + return NULL; + } + + memsize = xc_hypercall_buffer_alloc + (ctx->xch, memsize, sizeof(*memsize) * max_nodes); + memfree = xc_hypercall_buffer_alloc + (ctx->xch, memfree, sizeof(*memfree) * max_nodes); + node_distances = xc_hypercall_buffer_alloc + (ctx->xch, node_distances, sizeof(*node_distances) * max_nodes * max_nodes); + if ((memsize == NULL) || (memfree == NULL) || (node_distances == NULL)) { + LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM, + "Unable to allocate hypercall arguments"); + goto fail; + } + + set_xen_guest_handle(ninfo.node_to_memsize, memsize); + set_xen_guest_handle(ninfo.node_to_memfree, memfree); + set_xen_guest_handle(ninfo.node_to_node_distance, node_distances); + ninfo.max_node_index = max_nodes - 1; + if (xc_numainfo(ctx->xch, &ninfo) != 0) { + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting numainfo"); + goto fail; + } + + if (ninfo.max_node_index < max_nodes - 1) + max_nodes = ninfo.max_node_index + 1; + + ret = malloc(sizeof(libxl_numainfo) * max_nodes); + if (ret == NULL) { + LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM, + "Unable to allocate return value"); + goto fail; + } + for (i = 0; i < max_nodes; i++) { + ret[i].dists = malloc(sizeof(*node_distances) * max_nodes); + if (ret == NULL) { + for (j = i; j >= 0; j--) + free(ret[i].dists); + free(ret); + goto fail; + } + } + + for (i = 0; i < max_nodes; i++) { +#define V(mem, i) (mem[i] == INVALID_NUMAINFO_ID) ? 
\ + LIBXL_NUMAINFO_INVALID_ENTRY : mem[i] + ret[i].size = V(memsize, i); + ret[i].free = V(memfree, i); + ret[i].num_dists = max_nodes; + for (j = 0; j < ret[i].num_dists; j++) + ret[i].dists[j] = V(node_distances, i * max_nodes + j); +#undef V + } + +fail: + xc_hypercall_buffer_free(ctx->xch, memsize); + xc_hypercall_buffer_free(ctx->xch, memfree); + xc_hypercall_buffer_free(ctx->xch, node_distances); + + if (ret) + *nr = max_nodes; + return ret; +} + const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx) { union { diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -810,6 +810,9 @@ int libxl_get_physinfo(libxl_ctx *ctx, l #define LIBXL_CPUTOPOLOGY_INVALID_ENTRY (~(uint32_t)0) libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nr); void libxl_cputopology_list_free(libxl_cputopology *, int nr); +#define LIBXL_NUMAINFO_INVALID_ENTRY (~(uint32_t)0) +libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr); +void libxl_numainfo_list_free(libxl_numainfo *, int nr); libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid, int *nb_vcpu, int *nrcpus); int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid, diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -427,6 +427,12 @@ libxl_physinfo = Struct("physinfo", [ ("cap_hvm_directio", bool), ], dir=DIR_OUT) +libxl_numainfo = Struct("numainfo", [ + ("size", uint64), + ("free", uint64), + ("dists", Array(uint32, "num_dists")), + ], dir=DIR_OUT) + libxl_cputopology = Struct("cputopology", [ ("core", uint32), ("socket", uint32), diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c --- a/tools/libxl/libxl_utils.c +++ b/tools/libxl/libxl_utils.c @@ -613,6 +613,14 @@ int libxl__enum_from_string(const libxl_ return ERROR_FAIL; } +void libxl_numainfo_list_free(libxl_numainfo *list, int nr) +{ + int i; + for (i = 0; i < nr; i++) + libxl_numainfo_dispose(&list[i]); + free(list); +} + void libxl_cputopology_list_free(libxl_cputopology *list, int nr) { int i; diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -484,6 +484,7 @@ typedef struct xen_sysctl_topologyinfo x DEFINE_XEN_GUEST_HANDLE(xen_sysctl_topologyinfo_t); /* XEN_SYSCTL_numainfo */ +#define INVALID_NUMAINFO_ID (~0U) struct xen_sysctl_numainfo { /* * IN: maximum addressable entry in the caller-provided arrays.
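A sketch of a hypothetical caller of the new API (assumes an initialized libxl_ctx *ctx, plus <stdio.h> and <inttypes.h>); entries the hypervisor reports as invalid come back as LIBXL_NUMAINFO_INVALID_ENTRY and should be skipped:

    int i, j, nr;
    libxl_numainfo *info = libxl_get_numainfo(ctx, &nr);

    if (info != NULL) {
        for (i = 0; i < nr; i++) {
            if (info[i].size == LIBXL_NUMAINFO_INVALID_ENTRY)
                continue;
            printf("node %d: size=%"PRIu64" free=%"PRIu64" dists:", i,
                   info[i].size, info[i].free);
            for (j = 0; j < info[i].num_dists; j++)
                printf(" %"PRIu32, info[i].dists[j]);
            printf("\n");
        }
        libxl_numainfo_list_free(info, nr);
    }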
Dario Faggioli
2012-May-31 12:11 UTC
[PATCH 07 of 11] xen: enhance dump_numa output
By showing the number of free pages on each node. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c --- a/xen/arch/x86/numa.c +++ b/xen/arch/x86/numa.c @@ -365,10 +365,11 @@ static void dump_numa(unsigned char key) for_each_online_node(i) { paddr_t pa = (paddr_t)(NODE_DATA(i)->node_start_pfn + 1)<< PAGE_SHIFT; - printk("idx%d -> NODE%d start->%lu size->%lu\n", + printk("idx%d -> NODE%d start->%lu size->%lu free->%lu\n", i, NODE_DATA(i)->node_id, NODE_DATA(i)->node_start_pfn, - NODE_DATA(i)->node_spanned_pages); + NODE_DATA(i)->node_spanned_pages, + avail_node_heap_pages(i)); /* sanity check phys_to_nid() */ printk("phys_to_nid(%"PRIpaddr") -> %d should be %d\n", pa, phys_to_nid(pa), NODE_DATA(i)->node_id);
Dario Faggioli
2012-May-31 12:11 UTC
[PATCH 08 of 11] xl: add more NUMA information to `xl info -n'
So that the user knows how much memory there is on each node and how far they are from each others. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -4217,6 +4217,36 @@ static void output_physinfo(void) return; } +static void output_numainfo(void) +{ + libxl_numainfo *info; + int i, j, nr; + + info = libxl_get_numainfo(ctx, &nr); + if (info == NULL) { + fprintf(stderr, "libxl_get_numainfo failed.\n"); + return; + } + + printf("numa_info :\n"); + printf("node: memsize memfree distances\n"); + + for (i = 0; i < nr; i++) { + if (info[i].size != LIBXL_NUMAINFO_INVALID_ENTRY) { + printf("%4d: %6"PRIu64" %6"PRIu64" %d", i, + info[i].size / (1 << 20), info[i].free / (1 << 20), + info[i].dists[0]); + for (j = 1; j < info[i].num_dists; j++) + printf(",%d", info[i].dists[j]); + printf("\n"); + } + } + + libxl_numainfo_list_free(info, nr); + + return; +} + static void output_topologyinfo(void) { libxl_cputopology *info; @@ -4239,8 +4269,6 @@ static void output_topologyinfo(void) libxl_cputopology_list_free(info, nr); - printf("numa_info : none\n"); - return; } @@ -4250,8 +4278,10 @@ static void info(int numa) output_physinfo(); - if (numa) + if (numa) { output_topologyinfo(); + output_numainfo(); + } output_xeninfo();
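For illustration only, on a hypothetical two-node host the new section of `xl info -n' would look something like the snippet below (values are made up; memsize/memfree are printed in MB as computed by the patch, distances are the raw per-node values):

    numa_info :
    node: memsize memfree distances
       0:   4096   3289 10,20
       1:   4096   3983 20,10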
Dario Faggioli
2012-May-31 12:11 UTC
[PATCH 09 of 11] libxl, xl: enable automatic placement of guests on NUMA nodes
If a domain does not have a VCPU affinity, try to pin it automatically to some PCPUs. This is done taking into account the NUMA characteristics of the host: we look for a combination of host''s NUMA nodes that has enough free memoy for the new domain, and pin it to the VCPUs of those nodes. Smaller combinations are considered first, to avoid spreading the domain''s memory among too many nodes. After that, we also ensure that the chosen combination comprises at least the same number of PCPUs than the number of VCPUs of the domain, increasing the number of nodes it is made up of if that is not the case. When adding new nodes to a combination, node distances are taken into account, trying to minimize the peformance impact for the domain. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -111,8 +111,8 @@ created online and the remainder will be =item B<cpus="CPU-LIST"> -List of which cpus the guest is allowed to use. Default behavior is -`all cpus`. A C<CPU-LIST> may be specified as follows: +List of which cpus the guest is allowed to use. By default xl will +pick some cpus (see below). A C<CPU-LIST> may be specified as follows: =over 4 @@ -132,6 +132,12 @@ run on cpu #3 of the host. =back +If this option is not specified, xl automatically try to place the new +domain on the host''s NUMA nodes (provided the host has more than one NUMA +node) by pinning it to the cpus of those nodes. The employed heuristics +tries to maximize performance for the domain itself, while also achieving +efficient resource utilization. + =item B<cpu_weight=WEIGHT> A domain with a weight of 512 will get twice as much CPU as a domain diff --git a/tools/libxl/gentest.py b/tools/libxl/gentest.py --- a/tools/libxl/gentest.py +++ b/tools/libxl/gentest.py @@ -131,6 +131,19 @@ static void libxl_cpumap_rand_init(libxl } } +static void libxl_nodemap_rand_init(libxl_nodemap *nodemap) +{ + int i; + nodemap->size = rand() % 16; + nodemap->map = calloc(nodemap->size, sizeof(*nodemap->map)); + libxl_for_each_node(i, *nodemap) { + if (rand() % 2) + libxl_nodemap_set(nodemap, i); + else + libxl_nodemap_reset(nodemap, i); + } +} + static void libxl_key_value_list_rand_init(libxl_key_value_list *pkvl) { int i, nr_kvp = rand() % 16; diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -626,6 +626,15 @@ libxl_cpupoolinfo * libxl_list_cpupool(l libxl_vminfo * libxl_list_vm(libxl_ctx *ctx, int *nb_vm); void libxl_vminfo_list_free(libxl_vminfo *list, int nr); +/* Automatic NUMA placement */ +libxl_numa_candidate *libxl_domain_numa_candidates(libxl_ctx *ctx, + libxl_domain_build_info *b_info, + int min_nodes, int *nr_cndts); +int libxl_numa_candidate_add_cpus(libxl_ctx *ctx, + int min_cpus, int max_nodes, + libxl_numa_candidate *candidate); +void libxl_numa_candidates_list_free(libxl_numa_candidate *list, int nr); + /* * Devices * ======diff --git a/tools/libxl/libxl_json.c b/tools/libxl/libxl_json.c --- a/tools/libxl/libxl_json.c +++ b/tools/libxl/libxl_json.c @@ -119,6 +119,26 @@ out: return s; } +yajl_gen_status libxl_nodemap_gen_json(yajl_gen hand, + libxl_nodemap *nodemap) +{ + yajl_gen_status s; + int i; + + s = yajl_gen_array_open(hand); + if (s != yajl_gen_status_ok) goto out; + + libxl_for_each_node(i, *nodemap) { + if (libxl_nodemap_test(nodemap, i)) { + s = yajl_gen_integer(hand, i); + if (s != yajl_gen_status_ok) goto out; + } + } + s = 
yajl_gen_array_close(hand); +out: + return s; +} + yajl_gen_status libxl_key_value_list_gen_json(yajl_gen hand, libxl_key_value_list *pkvl) { diff --git a/tools/libxl/libxl_json.h b/tools/libxl/libxl_json.h --- a/tools/libxl/libxl_json.h +++ b/tools/libxl/libxl_json.h @@ -27,6 +27,7 @@ yajl_gen_status libxl_domid_gen_json(yaj yajl_gen_status libxl_uuid_gen_json(yajl_gen hand, libxl_uuid *p); yajl_gen_status libxl_mac_gen_json(yajl_gen hand, libxl_mac *p); yajl_gen_status libxl_cpumap_gen_json(yajl_gen hand, libxl_cpumap *p); +yajl_gen_status libxl_nodemap_gen_json(yajl_gen hand, libxl_nodemap *p); yajl_gen_status libxl_cpuid_policy_list_gen_json(yajl_gen hand, libxl_cpuid_policy_list *p); yajl_gen_status libxl_string_list_gen_json(yajl_gen hand, libxl_string_list *p); diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -439,6 +439,12 @@ libxl_cputopology = Struct("cputopology" ("node", uint32), ], dir=DIR_OUT) +libxl_numa_candidate = Struct("numa_candidate", [ + ("nr_nodes", integer), + ("free_memkb", uint32), + ("nodemap", libxl_nodemap), + ], dir=DIR_OUT) + libxl_sched_credit_domain = Struct("sched_credit_domain", [ ("weight", integer), ("cap", integer), diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c --- a/tools/libxl/libxl_utils.c +++ b/tools/libxl/libxl_utils.c @@ -613,6 +613,250 @@ int libxl__enum_from_string(const libxl_ return ERROR_FAIL; } +static uint64_t node_to_nodemap_distance(const libxl_numainfo *ninfo, + libxl_nodemap *nodemap, + int node) +{ + int i; + uint64_t dist = 0; + + /* Sum of the distances of a node from all the nodes in the map */ + libxl_for_each_set_node(i, *nodemap) + dist += labs(ninfo[node].dists[i]); + + return dist; +} + +static int add_node_to_nodemap(const libxl_numainfo *ninfo, int nr_nodes, + libxl_nodemap *nodemap) +{ + int i, node = -1; + uint64_t min_dist = UINT64_MAX; + + /* Find the node closest to our nodemap */ + for (i = 0; i < nr_nodes && !libxl_nodemap_test(nodemap, i); i++) { + if (node_to_nodemap_distance(ninfo, nodemap, i) < min_dist) { + min_dist = node_to_nodemap_distance(ninfo, nodemap, i); + node = i; + } + } + if (node >= 0) + libxl_nodemap_set(nodemap, node); + + return node; +} + +/* + * The following two uility functions compute the node maps + * coresponding to the [ n! / (k! * (n - k)!) ] combinations + * of @size nodes within a set that is @nr_nodes big, without + * repetition. The algorithm used for generating them uses + * a vector containing the indexes in the set of the elements + * of the i-eth combination to generate the (i+1)-eth, and + * ensures they come in lexicographic order. 
+ * + * For example, with n = 5 and k = 3, calling numa_node_set_init() + * will generate "0, 1, 2", while subsequent calls to + * numa_node_set_next() yield the following: + * "0, 1, 2 + * 0, 1, 3 + * 0, 1, 4 + * 0, 2, 3 + * 0, 2, 4 + * 0, 3, 4 + * 1, 2, 3 + * 1, 2, 4 + * 1, 3, 4 + * 2, 3, 4" + */ +static void numa_node_set_init(int nr_nodes, int size, int *idxs, + libxl_nodemap *nodemap) +{ + int i; + + /* First set always is "0, 1, 2, ..., size-1" */ + libxl_nodemap_set_none(nodemap); + for (i = 0; i < size; i++) { + libxl_nodemap_set(nodemap, i); + idxs[i] = i; + } +} + +static int numa_node_set_next(int nr_nodes, int size, int *idxs, + libxl_nodemap *nodemap) +{ + int i; + + /* Check whether we just need to increase the rightmost index */ + if (idxs[size - 1] < nr_nodes - 1) { + idxs[size - 1]++; + goto build_nodemap; + } + + /* Find the rightmost element from where to start the increasing */ + for (i = size - 1; idxs[i] == nr_nodes - size + i; i--) { + if (i <= 0) { + /* No more valid combinations! */ + return -1; + } + } + for (idxs[i]++, i += 1; i < size; i++) + idxs[i] = idxs[i - 1] + 1; + + build_nodemap: + libxl_nodemap_set_none(nodemap); + for (i = 0; i < size; i++) + libxl_nodemap_set(nodemap, idxs[i]); + + return 0; +} + +libxl_numa_candidate *libxl_domain_numa_candidates(libxl_ctx *ctx, + libxl_domain_build_info *b_info, + int min_nodes, int *nr_cndts) +{ + uint32_t dom_memkb, nodes_memkb; + libxl_numa_candidate *cndts = NULL; + libxl_nodemap suitable_nodes; + int *suitable_nodes_idx; + libxl_numainfo *ninfo; + int nr_nodes, i, next_comb_ok; + + ninfo = libxl_get_numainfo(ctx, &nr_nodes); + if (ninfo == NULL) { + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "libxl_get_numainfo failed.\n"); + goto out; + } + + /* Prepare the nodemap and the index array for the combinations */ + if (libxl_nodemap_alloc(ctx, &suitable_nodes) < 0) { + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "libxl_nodemap_alloc failed\n"); + goto out_numainfo; + } + suitable_nodes_idx = malloc(sizeof(int) * nr_nodes); + if (suitable_nodes_idx == NULL) { + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "malloc failed\n"); + goto out_nodemap; + } + + /* Memoy needed by the domain */ + if (libxl_domain_need_memory(ctx, b_info, &dom_memkb)) { + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "libxl_domain_need_memory failed\n"); + goto out_nodemap_idx; + } + + *nr_cndts = 0; + while (1) { + /* Avoid considering a silly number of nodes */ + if (min_nodes > nr_nodes) + break; + + /* Generate the combinations and check their cumulative free memory */ + numa_node_set_init(nr_nodes, min_nodes, suitable_nodes_idx, + &suitable_nodes); + do { + nodes_memkb = 0; + libxl_for_each_set_node(i, suitable_nodes) + nodes_memkb += ninfo[i].free / 1024; + + /* If free memory is enough, this is a valid candidate */ + if (nodes_memkb >= dom_memkb) { + cndts = realloc(cndts, sizeof(cndts[0]) * (*nr_cndts+1)); + if (cndts == NULL) { + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "realloc failed\n"); + goto out_nodemap_idx; + } + + libxl_nodemap_alloc(ctx, &cndts[*nr_cndts].nodemap); + libxl_nodemap_copy(&cndts[*nr_cndts].nodemap, &suitable_nodes); + cndts[*nr_cndts].free_memkb = nodes_memkb; + cndts[*nr_cndts].nr_nodes = min_nodes; + (*nr_cndts)++; + } + + next_comb_ok = numa_node_set_next(nr_nodes, min_nodes, + suitable_nodes_idx, + &suitable_nodes) == 0; + } while (next_comb_ok); + + min_nodes++; + } + + out_nodemap_idx: + free(suitable_nodes_idx); + out_nodemap: + libxl_nodemap_dispose(&suitable_nodes); + out_numainfo: + libxl_numainfo_list_free(ninfo, nr_nodes); + out: + return cndts; +} 
+ +int libxl_numa_candidate_add_cpus(libxl_ctx *ctx, + int min_cpus, int max_nodes, + libxl_numa_candidate *candidate) +{ + int dom_nodes, nodes_cpus; + libxl_cputopology *tinfo; + libxl_numainfo *ninfo; + int nr_nodes, nr_cpus; + int i, rc; + + tinfo = libxl_get_cpu_topology(ctx, &nr_cpus); + if (tinfo == NULL) { + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "libxl_get_topologyinfo failed\n"); + rc = ERROR_NOMEM; + goto out; + } + + ninfo = libxl_get_numainfo(ctx, &nr_nodes); + if (ninfo == NULL) { + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "libxl_get_numainfo failed\n"); + rc = ERROR_NOMEM; + goto out_topologyinfo; + } + + if (max_nodes == -1 || max_nodes > nr_nodes) + max_nodes = nr_nodes; + + /* If a candidate is short in PCPUs, just add more nodes */ + dom_nodes = candidate->nr_nodes; + while (1) { + rc = ERROR_FAIL; + + if (candidate->nr_nodes > max_nodes) + break; + + for (i = 0, nodes_cpus = 0; i < nr_cpus; i++) { + if (libxl_nodemap_test(&candidate->nodemap, tinfo[i].node)) + nodes_cpus++; + } + + /* Either we have enough PCPUs, and thus we''re done ... */ + if (nodes_cpus >= min_cpus) { + rc = 0; + break; + } + /* ... Or we need more, and we want to try adding the closest node */ + add_node_to_nodemap(ninfo, nr_nodes, &candidate->nodemap); + candidate->nr_nodes++; + } + + libxl_numainfo_list_free(ninfo, nr_nodes); + out_topologyinfo: + libxl_cputopology_list_free(tinfo, nr_cpus); + out: + return rc; +} + +void libxl_numa_candidates_list_free(libxl_numa_candidate *list, int nr) +{ + int i; + for (i = 0; i < nr; i++) + libxl_numa_candidate_dispose(&list[i]); + free(list); +} + void libxl_numainfo_list_free(libxl_numainfo *list, int nr) { int i; diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -490,6 +490,122 @@ static void split_string_into_string_lis free(s); } +/* How many PCPUs are there on each node? */ +static int cpus_per_node(libxl_cputopology *tinfo, int nr_cpus) +{ + libxl_numainfo *ninfo; + int nr_nodes, j, i; + int cpus_nodes = 0; + + ninfo = libxl_get_numainfo(ctx, &nr_nodes); + if (ninfo == NULL) + return 0; + + /* This makes sense iff # of PCPUs is the same for all nodes */ + for (j = 0; j < nr_nodes; j++) { + int curr_cpus = 0; + + for (i = 0; i < nr_cpus; i++) { + if (tinfo[i].node == j) + curr_cpus++; + } + cpus_nodes = cpus_nodes == 0 ? curr_cpus : cpus_nodes; + + /* Make sure the above holds, or turn the whole thing off! 
*/ + if (cpus_nodes != curr_cpus) { + cpus_nodes = 0; + break; + } + } + + libxl_numainfo_dispose(ninfo); + return cpus_nodes; +} + +/* Try to achieve "optimal" NUMA placement */ +static int place_domain(libxl_domain_build_info *b_info) +{ + int dom_min_nodes, dom_max_nodes; + int dom_needs_cpus, cpus_nodes; + libxl_cputopology *tinfo; + libxl_cpupoolinfo *pinfo; + int nr_cpus, nr_pools; + libxl_numa_candidate *candidates; + int candidate, nr_candidates; + int i, err; + + /* First of all, if the domain has its cpu-affinity, just don''t bother */ + if (libxl_cpumap_is_full(&b_info->cpumap) != 0) + return 0; + + /* If cpupools are in use, better not to mess with them */ + pinfo = libxl_list_cpupool(ctx, &nr_pools); + if (!pinfo) { + fprintf(stderr, "error getting cpupool info\n"); + err = ENOMEM; + goto out; + } + if (nr_pools > 1) { + fprintf(stderr, "skip numa placement as cpupools are being used\n"); + err = 0; + goto out_poolinfo; + } + + tinfo = libxl_get_cpu_topology(ctx, &nr_cpus); + if (tinfo == NULL) { + fprintf(stderr, "libxl_get_topologyinfo failed.\n"); + err = ENOMEM; + goto out_poolinfo; + } + + /* If possible, start with a sensible minimum number of nodes */ + cpus_nodes = cpus_per_node(tinfo, nr_cpus); + dom_needs_cpus = b_info->max_vcpus; + + if (cpus_nodes != 0) + dom_min_nodes = (dom_needs_cpus + cpus_nodes - 1) / cpus_nodes; + else + dom_min_nodes = 1; + + /* Let''s fit the domain memory-wise (ENOSPC if we don''t manage) */ + candidates = libxl_domain_numa_candidates(ctx, b_info, dom_min_nodes, + &nr_candidates); + if (!candidates) { + err = ENOSPC; + goto out_topologyinfo; + } + + /* Pick a candidate and ensure it gives us enough PCPUs */ + dom_max_nodes = -1; err = ERROR_FAIL; + for (candidate = 0; err && candidate < nr_candidates; candidate++) { + /* We try adding nodes up to all of them (dom_max_nodes = -1) */ + err = libxl_numa_candidate_add_cpus(ctx, dom_needs_cpus, dom_max_nodes, + &candidates[candidate]); + } + + if (err == ERROR_FAIL) { + /* Report back we didn''t find a candidate with enough cpus */ + err = ESRCH; + } else if (!err) { + /* Things went fine. 
Commit the map into domain''s build info */ + libxl_cpumap_set_none(&b_info->cpumap); + for (i = 0; i < nr_cpus; i++) { + if (libxl_nodemap_test(&candidates[candidate-1].nodemap, + tinfo[i].node)) + libxl_cpumap_set(&b_info->cpumap, i); + } + } + + libxl_numa_candidates_list_free(candidates, nr_candidates); +out_topologyinfo: + libxl_cputopology_list_free(tinfo, nr_cpus); +out_poolinfo: + for (i = 0; i < nr_pools; i++) + libxl_cpupoolinfo_dispose(&pinfo[i]); +out: + return err; +} + static int vcpupin_parse(char *cpu, libxl_cpumap *cpumap) { libxl_cpumap exclude_cpumap; @@ -1738,6 +1854,16 @@ start: goto error_out; } + ret = place_domain(&d_config.b_info); + if (ret == ESRCH || ret == ENOSPC) { + fprintf(stderr, "Failed to place the domain, fallback to all nodes\n"); + /* Resetting domain''s cpu-affinity is enough for that */ + libxl_cpumap_set_any(&d_config.b_info.cpumap); + } else if (ret != 0) { + ret = ERROR_FAIL; + goto error_out; + } + libxl_asyncprogress_how autoconnect_console_how_buf; if ( dom_info->console_autoconnect ) { autoconnect_console_how_buf.callback = autoconnect_console; diff --git a/tools/python/xen/lowlevel/xl/xl.c b/tools/python/xen/lowlevel/xl/xl.c --- a/tools/python/xen/lowlevel/xl/xl.c +++ b/tools/python/xen/lowlevel/xl/xl.c @@ -320,6 +320,23 @@ PyObject *attrib__libxl_file_reference_g return genwrap__string_get(&pptr->path); } +PyObject *attrib__libxl_nodemap_get(libxl_nodemap *pptr) +{ + PyObject *nodelist = NULL; + int i; + + nodelist = PyList_New(0); + libxl_for_each_node(i, *pptr) { + if ( libxl_nodemap_test(pptr, i) ) { + PyObject* pyint = PyInt_FromLong(i); + + PyList_Append(nodelist, pyint); + Py_DECREF(pyint); + } + } + return nodelist; +} + PyObject *attrib__libxl_hwcap_get(libxl_hwcap *pptr) { PyErr_SetString(PyExc_NotImplementedError, "Getting hwcap");
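To put place_domain() in more concrete terms (the topology below is invented): the function only fills b_info->cpumap, so its net effect is the same as if the user had pinned the guest by hand. On a host where node 0 owned PCPUs 0-7 and node 2 owned PCPUs 16-23, an automatic placement on those two nodes would amount to writing something like the following in the domain config file:

    cpus = "0-7,16-23"

with the difference that the nodes are chosen from free memory and VCPU count rather than by the administrator.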
Dario Faggioli
2012-May-31 12:11 UTC
[PATCH 10 of 11] libxl, xl: heuristics for reordering NUMA placement candidates
Once we know which ones of all the possible combinations represents valid placement candidates for a domain, use some heuistics for deciding which one to pick (instead to just taking the first one). First of all, the smaller candidates are better both from the domain''s point of view (fewer memory spreading among nodes) and fom the system point of view (fewer memoy fragmentation). In case of candidates of equal sizes, the one that has the greater amount of memory by at least 10% wins, as this is (again) good for keeping the fragmentation small. Finally, the number of domains running on the nodes involved in the combinations is checked, and the "least populated" candidate is the one that is considered. This makes the wholle automatic NUMA placement mechanism very similar to what xm/xend does, although no memory considerations are present there. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -633,6 +633,8 @@ libxl_numa_candidate *libxl_domain_numa_ int libxl_numa_candidate_add_cpus(libxl_ctx *ctx, int min_cpus, int max_nodes, libxl_numa_candidate *candidate); +int libxl_numa_candidate_count_domains(libxl_ctx *ctx, + libxl_numa_candidate *candidate); void libxl_numa_candidates_list_free(libxl_numa_candidate *list, int nr); /* diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -441,6 +441,7 @@ libxl_cputopology = Struct("cputopology" libxl_numa_candidate = Struct("numa_candidate", [ ("nr_nodes", integer), + ("nr_domains", integer), ("free_memkb", uint32), ("nodemap", libxl_nodemap), ], dir=DIR_OUT) diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c --- a/tools/libxl/libxl_utils.c +++ b/tools/libxl/libxl_utils.c @@ -849,6 +849,70 @@ int libxl_numa_candidate_add_cpus(libxl_ return rc; } +int libxl_numa_candidate_count_domains(libxl_ctx *ctx, + libxl_numa_candidate *candidate) +{ + libxl_nodemap dom_nodemap; + libxl_cputopology *tinfo; + int nr_doms, nr_cpus, rc = 0; + libxl_dominfo *dinfo; + int i, j, k; + + dinfo = libxl_list_domain(ctx, &nr_doms); + if (dinfo == NULL) { + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "libxl_list_domain failed\n"); + rc = ERROR_NOMEM; + goto out; + } + + if (libxl_nodemap_alloc(ctx, &dom_nodemap) < 0) { + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "libxl_nodemap_alloc failed\n"); + rc = ERROR_NOMEM; + goto out_dominfo; + } + + tinfo = libxl_get_cpu_topology(ctx, &nr_cpus); + if (tinfo == NULL) { + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "libxl_get_topologyinfo failed\n"); + rc = ERROR_NOMEM; + goto out_nodemap; + } + + candidate->nr_domains = 0; + for (i = 0; i < nr_doms; i++) { + libxl_vcpuinfo *vinfo; + int nr_vcpus, nr_cpus; + + vinfo = libxl_list_vcpu(ctx, dinfo[i].domid, &nr_vcpus, &nr_cpus); + if (vinfo == NULL) + continue; + + libxl_nodemap_set_none(&dom_nodemap); + for (j = 0; j < nr_vcpus; j++) { + libxl_for_each_set_cpu(k, vinfo[j].cpumap) + libxl_nodemap_set(&dom_nodemap, tinfo[k].node); + } + + libxl_for_each_set_node(j, dom_nodemap) { + if (libxl_nodemap_test(&candidate->nodemap, j)) { + candidate->nr_domains++; + goto found; + } + } + found: + libxl_vcpuinfo_list_free(vinfo, nr_vcpus); + } + + + libxl_cputopology_list_free(tinfo, nr_cpus); + out_nodemap: + libxl_nodemap_dispose(&dom_nodemap); + out_dominfo: + libxl_dominfo_list_free(dinfo, nr_doms); + out: + return rc; +} + void libxl_numa_candidates_list_free(libxl_numa_candidate *list, int nr) { 
int i; diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -522,6 +522,34 @@ static int cpus_per_node(libxl_cputopolo return cpus_nodes; } +/* + * The NUMA placement candidates are reordered according to the following + * heuristics: + * - candidates involving fewer nodes come first. In case two (or + * more) candidates span the same number of nodes, + * - candidates with greater amount of free memory come first. In + * case two (or more) candidates differ in their amount of free + * memory by less than 10%, + * - candidates with fewer domains insisting on them at the time of + * this call come first. + */ +static int candidates_cmpf(const void *v1, const void *v2) +{ + const libxl_numa_candidate *c1 = (const libxl_numa_candidate*) v1; + const libxl_numa_candidate *c2 = (const libxl_numa_candidate*) v2; + double mem_diff = labs(c1->free_memkb - c2->free_memkb); + double mem_avg = (c1->free_memkb + c2->free_memkb) / 2.0; + + if (c1->nr_nodes != c2->nr_nodes) + return c1->nr_nodes - c2->nr_nodes; + + if ((mem_diff / mem_avg) * 100.0 < 10.0 && + c1->nr_domains != c2->nr_domains) + return c1->nr_domains - c2->nr_domains; + + return c2->free_memkb - c1->free_memkb; +} + /* Try to achieve "optimal" NUMA placement */ static int place_domain(libxl_domain_build_info *b_info) { @@ -575,6 +603,18 @@ static int place_domain(libxl_domain_bui goto out_topologyinfo; } + /* Account for the number of domains insisting on a candidate placement */ + for (i = 0; i < nr_candidates; i++) { + if (libxl_numa_candidate_count_domains(ctx, &candidates[i])) { + fprintf(stderr, "libxl_numa_candidate_count_domains failed\n"); + err = ENOMEM; + goto out_cndtslist; + } + } + + /* Reorder candidates (see @candidates_cmpf for the heuristics) */ + qsort(candidates, nr_candidates, sizeof(candidates[0]), candidates_cmpf); + /* Pick a candidate and ensure it gives us enough PCPUs */ dom_max_nodes = -1; err = ERROR_FAIL; for (candidate = 0; err && candidate < nr_candidates; candidate++) { @@ -596,6 +636,7 @@ static int place_domain(libxl_domain_bui } } +out_cndtslist: libxl_numa_candidates_list_free(candidates, nr_candidates); out_topologyinfo: libxl_cputopology_list_free(tinfo, nr_cpus);
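The three-level ordering that candidates_cmpf above implements is easier to see on concrete numbers. The standalone program below is only an illustration, not part of the patch: the struct mirrors just the three fields the heuristics look at, the figures are invented, and the memory difference is computed in floating point, which also sidesteps the wrap-around that subtracting two unsigned free_memkb values could cause.

    #include <stdio.h>
    #include <stdlib.h>

    /* Mirrors only the fields the heuristics look at. */
    struct cndt {
        int nr_nodes;
        int nr_domains;
        unsigned long free_memkb;
    };

    static int cmpf(const void *v1, const void *v2)
    {
        const struct cndt *c1 = v1, *c2 = v2;
        double diff = (double)c1->free_memkb - (double)c2->free_memkb;
        double avg = (c1->free_memkb + c2->free_memkb) / 2.0;

        if (c1->nr_nodes != c2->nr_nodes)      /* fewer nodes first */
            return c1->nr_nodes - c2->nr_nodes;
        if ((diff < 0 ? -diff : diff) / avg < 0.10 &&
            c1->nr_domains != c2->nr_domains)  /* ~same memory: less loaded first */
            return c1->nr_domains - c2->nr_domains;
        return diff < 0 ? 1 : (diff > 0 ? -1 : 0); /* else: more free memory first */
    }

    int main(void)
    {
        struct cndt c[] = {
            { 2, 0, 8UL << 20 },  /* 2 nodes, 8 GiB free, no domains  */
            { 1, 3, 4UL << 20 },  /* 1 node,  4 GiB free, 3 domains   */
            { 1, 1, 4UL << 20 },  /* 1 node,  4 GiB free, 1 domain    */
        };
        int i;

        qsort(c, 3, sizeof(c[0]), cmpf);
        for (i = 0; i < 3; i++)
            printf("%d node(s), %d domain(s), %lu kB free\n",
                   c[i].nr_nodes, c[i].nr_domains, c[i].free_memkb);
        /* Prints the two 1-node candidates first (the one hosting a single
         * domain ahead of the one hosting three), then the 2-node one. */
        return 0;
    }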
Dario Faggioli
2012-May-31 12:11 UTC
[PATCH 11 of 11] Some automatic NUMA placement documentation
About rationale, usage and API.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown
new file mode 100644
--- /dev/null
+++ b/docs/misc/xl-numa-placement.markdown
@@ -0,0 +1,113 @@
+# Guest Automatic NUMA Placement in libxl and xl #
+
+## Rationale ##
+
+The Xen hypervisor deals with Non-Uniform Memory Access (NUMA)
+machines by assigning to each of its domains a "node affinity", i.e.,
+a set of NUMA nodes of the host from which it gets its memory allocated.
+
+NUMA awareness becomes very important as soon as many domains start running
+memory-intensive workloads on a shared host. In fact, the cost of accessing
+non node-local memory locations is very high, and the performance degradation
+is likely to be noticeable.
+
+## Guest Placement in xl ##
+
+If using xl for creating and managing guests, it is very easy to ask for
+either manual or automatic placement of them across the host''s NUMA nodes.
+
+Note that xm/xend does the very same thing, the only differences residing
+in the details of the heuristics adopted for the placement (see below).
+
+### Manual Guest Placement with xl ###
+
+Thanks to the "cpus=" option, it is possible to specify where a domain
+should be created and scheduled on, directly in its config file. This
+affects NUMA placement and memory accesses, as the hypervisor constructs
+the node affinity of a VM basing it on its CPU affinity when it is
+created.
+
+This is very simple and effective, but requires the user/system
+administrator to explicitly specify affinities for each and every domain,
+or Xen won''t be able to guarantee the locality of their memory
+accesses.
+
+### Automatic Guest Placement with xl ###
+
+In case no "cpus=" option is specified in the config file, xl tries
+to figure out on its own on which node(s) the domain could fit best.
+
+First of all, it needs to find a node (or a set of nodes) with enough
+free memory for accommodating the domain. After that, the actual
+decision on where to put the new guest happens by generating all the
+possible combinations of nodes that satisfy the above and choosing
+among them according to the following heuristics:
+
+  * candidates involving fewer nodes come first. In case two (or more)
+    candidates span the same number of nodes,
+  * candidates with a greater amount of free memory come first. In case
+    two (or more) candidates differ in their amount of free memory by
+    less than 10%,
+  * candidates with fewer domains already placed on them come first.
+
+Giving preference to small candidates ensures better performance for
+the guest, as it avoids spreading its memory among different nodes.
+Using the nodes that have the biggest amounts of free memory helps
+keeping the memory fragmentation small, from a system-wide perspective.
+Finally, in case more candidates fulfil these criteria to the same
+extent, choosing the candidate that is hosting fewer domains helps
+balancing the load on the various nodes.
+
+The last step is figuring out whether the selected candidate contains
+at least as many CPUs as the number of VCPUs of the VM. If it does not,
+the current solution is simply to add more nodes until the condition
+becomes true. When doing this, the nodes with the least possible
+distance from the ones already in the nodemap are considered.
+
+## Guest Placement in libxl ##
+
+xl achieves automatic NUMA placement by means of the following API
+calls, provided by libxl.
+
+    libxl_numa_candidate *libxl_domain_numa_candidates(libxl_ctx *ctx,
+                                        libxl_domain_build_info *b_info,
+                                        int min_nodes, int *nr_cndts);
+
+This is what should be used to generate the full set of placement
+candidates. In fact, the function returns an array containing nr_cndts
+libxl_numa_candidate elements (see below). Each candidate is basically a
+set of nodes that has been checked against the memory requirement derived
+from the provided libxl_domain_build_info.
+
+    int libxl_numa_candidate_add_cpus(libxl_ctx *ctx,
+                                      int min_cpus, int max_nodes,
+                                      libxl_numa_candidate *candidate);
+
+This is what should be used to ensure a placement candidate has at least
+min_cpus CPUs. In case it does not, the function also takes care of
+adding more nodes to the candidate itself (until the limit specified
+in max_nodes is reached). When adding new nodes, the one that has the
+smallest "distance" from the current node map is selected at each step.
+
+    int libxl_numa_candidate_count_domains(libxl_ctx *ctx,
+                                           libxl_numa_candidate *candidate);
+
+This is what counts the number of domains that are currently pinned
+to the CPUs of the nodes of a given candidate.
+
+Finally, a placement candidate is represented by the following data
+structure:
+
+    typedef struct libxl_numa_candidate {
+        int nr_nodes;
+        int nr_domains;
+        uint32_t free_memkb;
+        libxl_nodemap nodemap;
+    } libxl_numa_candidate;
+
+It basically tells which nodes the candidate spans (in the nodemap),
+how many of them there are (nr_nodes), how much free memory they can
+provide all together (free_memkb) and how many domains are running pinned
+to their CPUs (nr_domains).
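To make the calling sequence described above concrete, here is a minimal caller-side sketch of how a toolstack other than xl could drive these calls. It is not part of the series: it can only compile against the API this series proposes, the comparator, the helper name and the choice of min_nodes = 1 are simplifications, and (unlike xl) it does not fall back to other candidates on failure.

    #include <stdlib.h>
    #include "libxl.h"
    #include "libxl_utils.h"

    /* Toy heuristics: prefer candidates spanning fewer nodes. */
    static int fewer_nodes_first(const void *a, const void *b)
    {
        const libxl_numa_candidate *c1 = a, *c2 = b;
        return c1->nr_nodes - c2->nr_nodes;
    }

    /* Pick a node set for b_info and commit it as a VCPU affinity. */
    static int pick_nodes(libxl_ctx *ctx, libxl_domain_build_info *b_info)
    {
        libxl_numa_candidate *cndts;
        libxl_cputopology *tinfo;
        int nr_cndts, nr_cpus, i, rc;

        /* 1. all node sets (one node upward) with enough free memory */
        cndts = libxl_domain_numa_candidates(ctx, b_info, 1, &nr_cndts);
        if (!cndts || !nr_cndts)
            return ERROR_FAIL;

        /* 2. see how loaded each candidate already is (return value
         *    ignored in this sketch), then sort them */
        for (i = 0; i < nr_cndts; i++)
            libxl_numa_candidate_count_domains(ctx, &cndts[i]);
        qsort(cndts, nr_cndts, sizeof(cndts[0]), fewer_nodes_first);

        /* 3. grow the best candidate until it offers enough PCPUs */
        rc = libxl_numa_candidate_add_cpus(ctx, b_info->max_vcpus, -1,
                                           &cndts[0]);

        /* 4. turn its nodemap into the cpumap the domain gets pinned to */
        tinfo = libxl_get_cpu_topology(ctx, &nr_cpus);
        if (!tinfo)
            rc = ERROR_FAIL;
        if (!rc) {
            libxl_cpumap_set_none(&b_info->cpumap);
            for (i = 0; i < nr_cpus; i++)
                if (libxl_nodemap_test(&cndts[0].nodemap, tinfo[i].node))
                    libxl_cpumap_set(&b_info->cpumap, i);
        }

        if (tinfo)
            libxl_cputopology_list_free(tinfo, nr_cpus);
        libxl_numa_candidates_list_free(cndts, nr_cndts);
        return rc;
    }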
Ian Jackson
2012-May-31 14:10 UTC
Re: [PATCH 01 of 11] libxc: abstract xenctl_cpumap to just xenctl_map
Dario Faggioli writes ("[PATCH 01 of 11] libxc: abstract xenctl_cpumap to just xenctl_map"):
> More specifically: > 1. replaces xenctl_cpumap with xenctl_map > 2. provides bitmap_to_xenctl_map and the reverse; > 3. re-implement cpumask_to_xenctl_map with bitmap_to_xenctl_map > and the reverse; > > Other than #3, no functional changes. Interface only slightly > afected.

This looks plausible to me but it needs a review from a hypervisor maintainer too.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson
2012-May-31 14:11 UTC
Re: [PATCH 02 of 11] libxl: abstract libxl_cpumap to just libxl_map
Dario Faggioli writes ("[PATCH 02 of 11] libxl: abstract libxl_cpumap to just libxl_map"):
> More specifically: > 1. introduces struct libxl_map; > 2. re-implement libxl_cpumap_* on top of struct libxl_map_*; > > No functional nor interface changes at all.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson
2012-May-31 14:12 UTC
Re: [PATCH 03 of 11] libxc, libxl: introduce xc_nodemap_t and libxl_nodemap
Dario Faggioli writes ("[PATCH 03 of 11] libxc, libxl: introduce xc_nodemap_t and libxl_nodemap"):
> As NUMA node-related counterparts of xc_cpumap_t and libxl_cpumap.

Wouldn''t it be possible to just use `libxl_map'' type directly, with the caller knowing whether it contains node or cpu numbers ? As it is this has a whole lot of duplicated wrapper code.

Ian.
Ian Jackson
2012-May-31 14:13 UTC
Re: [PATCH 04 of 11] libxl: expand the libxl_{cpu, node}map API a bit
Dario Faggioli writes ("[PATCH 04 of 11] libxl: expand the libxl_{cpu,node}map API a bit"):
> +void libxl_cpumap_copy(/*XXX libxl_ctx *ctx,*/ libxl_cpumap *dst,

XXX ?

Ian.
Ian Jackson
2012-May-31 14:22 UTC
Re: [PATCH 06 of 11] libxl: introduce libxl_get_numainfo()
Dario Faggioli writes ("[PATCH 06 of 11] libxl: introduce libxl_get_numainfo()"):> Make some NUMA node information available to the toolstack. Achieve > this by means of xc_numainfo(), which exposes memory size and amount > of free memory of each node, as well as the relative distances of > each node to all the others....> +#define LIBXL_NUMAINFO_INVALID_ENTRY (~(uint32_t)0)Is there some reason we can''t just make sure we use the same value for this in both places ? That would avoid the need for this:> +#define V(mem, i) (mem[i] == INVALID_NUMAINFO_ID) ? \ > + LIBXL_NUMAINFO_INVALID_ENTRY : mem[i]I appreciate that the types aren''t the same. In libxc it''s an unsigned long. But shouldn''t they be the same ?> +libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr) > +{ > + xc_numainfo_t ninfo; > + DECLARE_HYPERCALL_BUFFER(xc_node_to_memsize_t, memsize); > + DECLARE_HYPERCALL_BUFFER(xc_node_to_memfree_t, memfree); > + DECLARE_HYPERCALL_BUFFER(uint32_t, node_distances); > + libxl_numainfo *ret = NULL; > + int i, j, max_nodes; > + > + max_nodes = libxl_get_max_nodes(ctx); > + if (max_nodes == 0) > + { > + LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of NODES"); > + return NULL; > + } > + > + memsize = xc_hypercall_buffer_alloc > + (ctx->xch, memsize, sizeof(*memsize) * max_nodes); > + memfree = xc_hypercall_buffer_alloc > + (ctx->xch, memfree, sizeof(*memfree) * max_nodes); > + node_distances = xc_hypercall_buffer_alloc > + (ctx->xch, node_distances, sizeof(*node_distances) * max_nodes * max_nodes);This kind of stuff normally lives in libxc. I appreciate that we have a bit of it in libxl already, but do we want to perpetuate that ?> + ret = malloc(sizeof(libxl_numainfo) * max_nodes); > + if (ret == NULL) {In libxl you can use libxl__zalloc(NULL,...) (and don''t have to check for errors because it can''t fail). But perhaps this is going into libxc ? I''d like to hear what other people say about putting this in libxl vs. libxc.> + ret[i].dists = malloc(sizeof(*node_distances) * max_nodes);Again. And the lack of error exits will simplify this a lot. Ian.
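Concretely, the allocation style being suggested would look roughly like this. This is only a sketch against the types this series introduces (max_nodes and the dists field come from the quoted patch, and numainfo_array_alloc is a made-up name): libxl__zalloc(NULL, ...) aborts on allocation failure instead of returning NULL, so the corresponding error exits simply go away.

    static libxl_numainfo *numainfo_array_alloc(int max_nodes)
    {
        /* no NULL checks: libxl__zalloc() aborts if allocation fails */
        libxl_numainfo *ret = libxl__zalloc(NULL, sizeof(*ret) * max_nodes);
        int i;

        /* one distance vector per node, one entry per (other) node */
        for (i = 0; i < max_nodes; i++)
            ret[i].dists = libxl__zalloc(NULL,
                                         sizeof(*ret[i].dists) * max_nodes);
        return ret;
    }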
Ian Jackson
2012-May-31 14:23 UTC
Re: [PATCH 07 of 11] xen: enhance dump_numa output
Dario Faggioli writes ("[PATCH 07 of 11] xen: enhance dump_numa output"):
> By showing the number of free pages on each node....> - printk("idx%d -> NODE%d start->%lu size->%lu\n", > + printk("idx%d -> NODE%d start->%lu size->%lu free->%lu\n", > i, NODE_DATA(i)->node_id, > NODE_DATA(i)->node_start_pfn, > - NODE_DATA(i)->node_spanned_pages); > + NODE_DATA(i)->node_spanned_pages, > + avail_node_heap_pages(i));

This seems to have an indentation error. Otherwise,

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Ian.
Ian Jackson
2012-May-31 14:24 UTC
Re: [PATCH 08 of 11] xl: add more NUMA information to `xl info -n''
Dario Faggioli writes ("[PATCH 08 of 11] xl: add more NUMA information to `xl info -n''"):
> So that the user knows how much memory there is on each node and > how far they are from each others.

Perhaps the json output machinery can do this for us ? If that''s sufficiently legible then it would be an improvement on this open-coded printer.

Also people might try to parse the output from the open-coded printer, which would be annoying when we try to extend things.

Ian.
Dario Faggioli
2012-May-31 14:30 UTC
Re: [PATCH 04 of 11] libxl: expand the libxl_{cpu, node}map API a bit
On Thu, 2012-05-31 at 15:13 +0100, Ian Jackson wrote:> Dario Faggioli writes ("[PATCH 04 of 11] libxl: expand the libxl_{cpu,node}map API a bit"): > > +void libxl_cpumap_copy(/*XXX libxl_ctx *ctx,*/ libxl_cpumap *dst, > > XXX ? >Yep, I left it on purpose but I also wanted to mention that in the changelog, which I apparently missed. Sorry. However, now that we are here, the whole point is, does that function need a libxl_ctx argument? It doesn''t use it, but I was concerned about consistency with the other libxl_cpumap_* calls, and maybe future needs... Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Dario Faggioli
2012-May-31 14:32 UTC
Re: [PATCH 03 of 11] libxc, libxl: introduce xc_nodemap_t and libxl_nodemap
On Thu, 2012-05-31 at 15:12 +0100, Ian Jackson wrote:> Dario Faggioli writes ("[PATCH 03 of 11] libxc, libxl: introduce xc_nodemap_t and libxl_nodemap"): > > As NUMA node-related counterparts of xc_cpumap_t and libxl_cpumap. > > Wouldn''t it be possible to just use `libxl_map'' type directly, with > the caller knowing whether it contains node or cpu numbers ? >Definitely possible.> As it is this has a whole lot of duplicated wrapper code. >I agree, and I will do that. Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On Thu, 2012-05-31 at 15:23 +0100, Ian Jackson wrote:> Dario Faggioli writes ("[PATCH 07 of 11] xen: enhance dump_numa output"): > > By showing the number of free pages on each node. > ... > > - printk("idx%d -> NODE%d start->%lu size->%lu\n", > > + printk("idx%d -> NODE%d start->%lu size->%lu free->%lu\n", > > i, NODE_DATA(i)->node_id, > > NODE_DATA(i)->node_start_pfn, > > - NODE_DATA(i)->node_spanned_pages); > > + NODE_DATA(i)->node_spanned_pages, > > + avail_node_heap_pages(i)); > > This seems to have an indentation error. >Yep, that file has tabs... :-/ Will fix it. Thanks, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Dario Faggioli
2012-May-31 14:40 UTC
Re: [PATCH 08 of 11] xl: add more NUMA information to `xl info -n''
On Thu, 2012-05-31 at 15:24 +0100, Ian Jackson wrote:> Dario Faggioli writes ("[PATCH 08 of 11] xl: add more NUMA information to `xl info -n''"): > > So that the user knows how much memory there is on each node and > > how far they are from each others. > > Perhaps the json output machinery can do this for us ? If that''s > sufficiently legible then it would be an improvement on this > open-coded printer. > > Also people might try to parse the output from open-coded printer > which would be annoying when we try to extend things. >Not sure I got 100% of what you mean. I put something together on the line of what ''void output_topologyinfo(void)'' does, but I can do it some other way as soon as I understand how you''re suggesting to do it :-P Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
George Dunlap
2012-May-31 14:47 UTC
Re: [PATCH 01 of 11] libxc: abstract xenctl_cpumap to just xenctl_map
On 31/05/12 15:10, Ian Jackson wrote:
> Dario Faggioli writes ("[PATCH 01 of 11] libxc: abstract xenctl_cpumap to just xenctl_map"): >> More specifically: >> 1. replaces xenctl_cpumap with xenctl_map >> 2. provides bitmap_to_xenctl_map and the reverse; >> 3. re-implement cpumask_to_xenctl_map with bitmap_to_xenctl_map >> and the reverse; >> >> Other than #3, no functional changes. Interface only slightly >> afected. > This looks plausible to me but it needs a review from a hypervisor > maintainer too.

What exactly are you looking for from a hypervisor maintainer?

I was going to say it looks plausible to me but needs a review from a tools maintainer. :-)

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
George Dunlap
2012-May-31 14:54 UTC
Re: [PATCH 02 of 11] libxl: abstract libxl_cpumap to just libxl_map
On 31/05/12 13:11, Dario Faggioli wrote:> More specifically: > 1. introduces struct libxl_map; > 2. re-implement libxl_cpumap_* on top of struct libxl_map_*; > > No functional nor interface changes at all. > > This is in preparation of the introduction of NUMA nodes maps. > > Signed-off-by: Dario Faggioli<dario.faggioli@citrix.eu.com>Acked-by: George Dunlap <george.dunlap@eu.citrix.com>> > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > --- a/tools/libxl/libxl.h > +++ b/tools/libxl/libxl.h > @@ -282,11 +282,17 @@ typedef uint32_t libxl_hwcap[8]; > > typedef uint64_t libxl_ev_user; > > -typedef struct { > +struct libxl_map { > uint32_t size; /* number of bytes in map */ > uint8_t *map; > -} libxl_cpumap; > -void libxl_cpumap_dispose(libxl_cpumap *map); > +}; > +void libxl_map_dispose(struct libxl_map *map); > + > +typedef struct libxl_map libxl_cpumap; > +static inline void libxl_cpumap_dispose(libxl_cpumap *cpumap) > +{ > + return libxl_map_dispose(cpumap); > +} > > typedef struct { > /* > diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c > --- a/tools/libxl/libxl_utils.c > +++ b/tools/libxl/libxl_utils.c > @@ -488,47 +488,53 @@ int libxl_mac_to_device_nic(libxl_ctx *c > return rc; > } > > +void libxl_map_dispose(struct libxl_map *map) > +{ > + free(map->map); > +} > + > +static int libxl_map_alloc(libxl_ctx *ctx, struct libxl_map *map, int n_elems) > +{ > + int sz; > + > + sz = (n_elems + 7) / 8; > + map->map = calloc(sz, sizeof(*map->map)); > + if (!map->map) > + return ERROR_NOMEM; > + map->size = sz; > + return 0; > +} > + > +int libxl_map_test(struct libxl_map *map, int elem) > +{ > + if (elem>= map->size * 8) > + return 0; > + return (map->map[elem / 8]& (1<< (elem& 7))) ? 1 : 0; > +} > + > +void libxl_map_set(struct libxl_map *map, int elem) > +{ > + if (elem>= map->size * 8) > + return; > + map->map[elem / 8] |= 1<< (elem& 7); > +} > + > +void libxl_map_reset(struct libxl_map *map, int elem) > +{ > + if (elem>= map->size * 8) > + return; > + map->map[elem / 8]&= ~(1<< (elem& 7)); > +} > + > int libxl_cpumap_alloc(libxl_ctx *ctx, libxl_cpumap *cpumap) > { > int max_cpus; > - int sz; > > max_cpus = libxl_get_max_cpus(ctx); > if (max_cpus == 0) > return ERROR_FAIL; > > - sz = (max_cpus + 7) / 8; > - cpumap->map = calloc(sz, sizeof(*cpumap->map)); > - if (!cpumap->map) > - return ERROR_NOMEM; > - cpumap->size = sz; > - return 0; > -} > - > -void libxl_cpumap_dispose(libxl_cpumap *map) > -{ > - free(map->map); > -} > - > -int libxl_cpumap_test(libxl_cpumap *cpumap, int cpu) > -{ > - if (cpu>= cpumap->size * 8) > - return 0; > - return (cpumap->map[cpu / 8]& (1<< (cpu& 7))) ? 
1 : 0; > -} > - > -void libxl_cpumap_set(libxl_cpumap *cpumap, int cpu) > -{ > - if (cpu>= cpumap->size * 8) > - return; > - cpumap->map[cpu / 8] |= 1<< (cpu& 7); > -} > - > -void libxl_cpumap_reset(libxl_cpumap *cpumap, int cpu) > -{ > - if (cpu>= cpumap->size * 8) > - return; > - cpumap->map[cpu / 8]&= ~(1<< (cpu& 7)); > + return libxl_map_alloc(ctx, cpumap, max_cpus); > } > > int libxl_get_max_cpus(libxl_ctx *ctx) > diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h > --- a/tools/libxl/libxl_utils.h > +++ b/tools/libxl/libxl_utils.h > @@ -63,21 +63,46 @@ int libxl_devid_to_device_nic(libxl_ctx > int libxl_vdev_to_device_disk(libxl_ctx *ctx, uint32_t domid, const char *vdev, > libxl_device_disk *disk); > > +int libxl_map_test(struct libxl_map *map, int elem); > +void libxl_map_set(struct libxl_map *map, int elem); > +void libxl_map_reset(struct libxl_map *map, int elem); > +static inline void libxl_map_set_any(struct libxl_map *map) > +{ > + memset(map->map, -1, map->size); > +} > +static inline void libxl_map_set_none(struct libxl_map *map) > +{ > + memset(map->map, 0, map->size); > +} > +static inline int libxl_map_elem_valid(struct libxl_map *map, int elem) > +{ > + return elem>= 0&& elem< (map->size * 8); > +} > + > int libxl_cpumap_alloc(libxl_ctx *ctx, libxl_cpumap *cpumap); > -int libxl_cpumap_test(libxl_cpumap *cpumap, int cpu); > -void libxl_cpumap_set(libxl_cpumap *cpumap, int cpu); > -void libxl_cpumap_reset(libxl_cpumap *cpumap, int cpu); > +static inline int libxl_cpumap_test(libxl_cpumap *cpumap, int cpu) > +{ > + return libxl_map_test(cpumap, cpu); > +} > +static inline void libxl_cpumap_set(libxl_cpumap *cpumap, int cpu) > +{ > + libxl_map_set(cpumap, cpu); > { > - memset(cpumap->map, -1, cpumap->size); > + libxl_map_set_any(cpumap); > } > static inline void libxl_cpumap_set_none(libxl_cpumap *cpumap) > { > - memset(cpumap->map, 0, cpumap->size); > + libxl_map_set_none(cpumap); > } > static inline int libxl_cpumap_cpu_valid(libxl_cpumap *cpumap, int cpu) > { > - return cpu>= 0&& cpu< (cpumap->size * 8); > + return libxl_map_elem_valid(cpumap, cpu); > } > #define libxl_for_each_cpu(var, map) for (var = 0; var< (map).size * 8; var++) > #define libxl_for_each_set_cpu(v, m) for (v = 0; v< (m).size * 8; v++) \
George Dunlap
2012-May-31 14:55 UTC
Re: [PATCH 01 of 11] libxc: abstract xenctl_cpumap to just xenctl_map
On Thu, May 31, 2012 at 3:47 PM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> This looks plausible to me but it needs a review from a hypervisor >> maintainer too. > > What exactly are you looking for from a hypervisor maintianer? > > I was going to say it looks plausible to me but needs a review from a tools > maintianer. :-) > > Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

Er, sorry -- seems I counted something wrong and reviewed the wrong patch. This ack should go to patch #2. Let me come back to this one.

 -George
Dario Faggioli
2012-May-31 14:57 UTC
Re: [PATCH 06 of 11] libxl: introduce libxl_get_numainfo()
On Thu, 2012-05-31 at 15:22 +0100, Ian Jackson wrote:> Dario Faggioli writes ("[PATCH 06 of 11] libxl: introduce libxl_get_numainfo()"): > > Make some NUMA node information available to the toolstack. Achieve > > this by means of xc_numainfo(), which exposes memory size and amount > > of free memory of each node, as well as the relative distances of > > each node to all the others. > ... > > +#define LIBXL_NUMAINFO_INVALID_ENTRY (~(uint32_t)0) > > Is there some reason we can''t just make sure we use the same value for > this in both places ? That would avoid the need for this: >Sorry, with "both places" being? Maybe you''re talking about reusing LIBXL_CPUTOPOLOGY_INVALID_ENTRY here as well? If yes, I think it is possible and I also thought doing it like that... Or was it something different you were saying?> > +#define V(mem, i) (mem[i] == INVALID_NUMAINFO_ID) ? \ > > + LIBXL_NUMAINFO_INVALID_ENTRY : mem[i] > > I appreciate that the types aren''t the same. In libxc it''s an > unsigned long. But shouldn''t they be the same ? >I again am not sure about what types you are talking about here I''m afraid... :-(> > +libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr) > > +{ > > + xc_numainfo_t ninfo; > > + DECLARE_HYPERCALL_BUFFER(xc_node_to_memsize_t, memsize); > > + DECLARE_HYPERCALL_BUFFER(xc_node_to_memfree_t, memfree); > > + DECLARE_HYPERCALL_BUFFER(uint32_t, node_distances); > > + libxl_numainfo *ret = NULL; > > + int i, j, max_nodes; > > + > > + max_nodes = libxl_get_max_nodes(ctx); > > + if (max_nodes == 0) > > + { > > + LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of NODES"); > > + return NULL; > > + } > > + > > + memsize = xc_hypercall_buffer_alloc > > + (ctx->xch, memsize, sizeof(*memsize) * max_nodes); > > + memfree = xc_hypercall_buffer_alloc > > + (ctx->xch, memfree, sizeof(*memfree) * max_nodes); > > + node_distances = xc_hypercall_buffer_alloc > > + (ctx->xch, node_distances, sizeof(*node_distances) * max_nodes * max_nodes); > > This kind of stuff normally lives in libxc. I appreciate that we have > a bit of it in libxl already, but do we want to perpetuate that ? >Yes, I did this like it is mainly because it is exactly what libxl_get_cpu_topology() does and thus I thought it to be the way to go. :-)> > + ret = malloc(sizeof(libxl_numainfo) * max_nodes); > > + if (ret == NULL) { > > In libxl you can use libxl__zalloc(NULL,...) (and don''t have to check > for errors because it can''t fail). But perhaps this is going into > libxc ? >About libxl__zalloc(), noted. Thanks. About moving this all, I can of couse look into doing that, if wanted.> I''d like to hear what other people say about putting this in libxl > vs. libxc. >Sure, just let me know what people prefer... Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
George Dunlap
2012-May-31 15:01 UTC
Re: [PATCH 01 of 11] libxc: abstract xenctl_cpumap to just xenctl_map
On Thu, May 31, 2012 at 1:11 PM, Dario Faggioli <raistlin@linux.it> wrote:> More specifically: >  1. replaces xenctl_cpumap with xenctl_map >  2. provides bitmap_to_xenctl_map and the reverse; >  3. re-implement cpumask_to_xenctl_map with bitmap_to_xenctl_map >   and the reverse; > > Other than #3, no functional changes. Interface only slightly > afected. > > This is in preparation of introducing NUMA nodes maps.Dario, What changes are there in this since the last time you posted this? Anything other than updating the patch description? -George> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.eu.com> > > diff --git a/tools/libxc/xc_cpupool.c b/tools/libxc/xc_cpupool.c > --- a/tools/libxc/xc_cpupool.c > +++ b/tools/libxc/xc_cpupool.c > @@ -90,7 +90,7 @@ xc_cpupoolinfo_t *xc_cpupool_getinfo(xc_ >   sysctl.u.cpupool_op.op = XEN_SYSCTL_CPUPOOL_OP_INFO; >   sysctl.u.cpupool_op.cpupool_id = poolid; >   set_xen_guest_handle(sysctl.u.cpupool_op.cpumap.bitmap, local); > -   sysctl.u.cpupool_op.cpumap.nr_cpus = local_size * 8; > +   sysctl.u.cpupool_op.cpumap.nr_elems = local_size * 8; > >   err = do_sysctl_save(xch, &sysctl); > > @@ -184,7 +184,7 @@ xc_cpumap_t xc_cpupool_freeinfo(xc_inter >   sysctl.cmd = XEN_SYSCTL_cpupool_op; >   sysctl.u.cpupool_op.op = XEN_SYSCTL_CPUPOOL_OP_FREEINFO; >   set_xen_guest_handle(sysctl.u.cpupool_op.cpumap.bitmap, local); > -   sysctl.u.cpupool_op.cpumap.nr_cpus = mapsize * 8; > +   sysctl.u.cpupool_op.cpumap.nr_elems = mapsize * 8; > >   err = do_sysctl_save(xch, &sysctl); > > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c > --- a/tools/libxc/xc_domain.c > +++ b/tools/libxc/xc_domain.c > @@ -142,7 +142,7 @@ int xc_vcpu_setaffinity(xc_interface *xc > >   set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, local); > > -   domctl.u.vcpuaffinity.cpumap.nr_cpus = cpusize * 8; > +   domctl.u.vcpuaffinity.cpumap.nr_elems = cpusize * 8; > >   ret = do_domctl(xch, &domctl); > > @@ -182,7 +182,7 @@ int xc_vcpu_getaffinity(xc_interface *xc >   domctl.u.vcpuaffinity.vcpu = vcpu; > >   set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, local); > -   domctl.u.vcpuaffinity.cpumap.nr_cpus = cpusize * 8; > +   domctl.u.vcpuaffinity.cpumap.nr_elems = cpusize * 8; > >   ret = do_domctl(xch, &domctl); > > diff --git a/tools/libxc/xc_tbuf.c b/tools/libxc/xc_tbuf.c > --- a/tools/libxc/xc_tbuf.c > +++ b/tools/libxc/xc_tbuf.c > @@ -134,7 +134,7 @@ int xc_tbuf_set_cpu_mask(xc_interface *x >   bitmap_64_to_byte(bytemap, &mask64, sizeof (mask64) * 8); > >   set_xen_guest_handle(sysctl.u.tbuf_op.cpu_mask.bitmap, bytemap); > -   sysctl.u.tbuf_op.cpu_mask.nr_cpus = sizeof(bytemap) * 8; > +   sysctl.u.tbuf_op.cpu_mask.nr_elems = sizeof(bytemap) * 8; > >   ret = do_sysctl(xch, &sysctl); > > diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c > --- a/xen/arch/x86/cpu/mcheck/mce.c > +++ b/xen/arch/x86/cpu/mcheck/mce.c > @@ -1545,8 +1545,7 @@ long do_mca(XEN_GUEST_HANDLE(xen_mc_t) u >       cpumap = &cpu_online_map; >     else >     { > -       ret = xenctl_cpumap_to_cpumask(&cmv, > -                      &op->u.mc_inject_v2.cpumap); > +       ret = xenctl_map_to_cpumask(&cmv, &op->u.mc_inject_v2.cpumap); >       if ( ret ) >         break; >       cpumap = cmv; > diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c > --- a/xen/arch/x86/platform_hypercall.c > +++ b/xen/arch/x86/platform_hypercall.c > @@ -365,7 +365,7 @@ ret_t do_platform_op(XEN_GUEST_HANDLE(xe >   { >     uint32_t cpu; >     uint64_t idletime, 
now = NOW(); > -     struct xenctl_cpumap ctlmap; > +     struct xenctl_map ctlmap; >     cpumask_var_t cpumap; >     XEN_GUEST_HANDLE(uint8) cpumap_bitmap; >     XEN_GUEST_HANDLE(uint64) idletimes; > @@ -378,11 +378,11 @@ ret_t do_platform_op(XEN_GUEST_HANDLE(xe >     if ( cpufreq_controller != FREQCTL_dom0_kernel ) >       break; > > -     ctlmap.nr_cpus  = op->u.getidletime.cpumap_nr_cpus; > +     ctlmap.nr_elems  = op->u.getidletime.cpumap_nr_cpus; >     guest_from_compat_handle(cpumap_bitmap, >                  op->u.getidletime.cpumap_bitmap); >     ctlmap.bitmap.p = cpumap_bitmap.p; /* handle -> handle_64 conversion */ > -     if ( (ret = xenctl_cpumap_to_cpumask(&cpumap, &ctlmap)) != 0 ) > +     if ( (ret = xenctl_map_to_cpumask(&cpumap, &ctlmap)) != 0 ) >       goto out; >     guest_from_compat_handle(idletimes, op->u.getidletime.idletime); > > @@ -401,7 +401,7 @@ ret_t do_platform_op(XEN_GUEST_HANDLE(xe > >     op->u.getidletime.now = now; >     if ( ret == 0 ) > -       ret = cpumask_to_xenctl_cpumap(&ctlmap, cpumap); > +       ret = cpumask_to_xenctl_map(&ctlmap, cpumap); >     free_cpumask_var(cpumap); > >     if ( ret == 0 && copy_to_guest(u_xenpf_op, op, 1) ) > diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c > --- a/xen/common/cpupool.c > +++ b/xen/common/cpupool.c > @@ -489,7 +489,7 @@ int cpupool_do_sysctl(struct xen_sysctl_ >     op->cpupool_id = c->cpupool_id; >     op->sched_id = c->sched->sched_id; >     op->n_dom = c->n_dom; > -     ret = cpumask_to_xenctl_cpumap(&op->cpumap, c->cpu_valid); > +     ret = cpumask_to_xenctl_map(&op->cpumap, c->cpu_valid); >     cpupool_put(c); >   } >   break; > @@ -584,7 +584,7 @@ int cpupool_do_sysctl(struct xen_sysctl_ > >   case XEN_SYSCTL_CPUPOOL_OP_FREEINFO: >   { > -     ret = cpumask_to_xenctl_cpumap( > +     ret = cpumask_to_xenctl_map( >       &op->cpumap, &cpupool_free_cpus); >   } >   break; > diff --git a/xen/common/domctl.c b/xen/common/domctl.c > --- a/xen/common/domctl.c > +++ b/xen/common/domctl.c > @@ -31,28 +31,29 @@ >  static DEFINE_SPINLOCK(domctl_lock); >  DEFINE_SPINLOCK(vcpu_alloc_lock); > > -int cpumask_to_xenctl_cpumap( > -   struct xenctl_cpumap *xenctl_cpumap, const cpumask_t *cpumask) > +int bitmap_to_xenctl_map(struct xenctl_map *xenctl_map, > +             const unsigned long *bitmap, > +             unsigned int nbits) >  { >   unsigned int guest_bytes, copy_bytes, i; >   uint8_t zero = 0; >   int err = 0; > -   uint8_t *bytemap = xmalloc_array(uint8_t, (nr_cpu_ids + 7) / 8); > +   uint8_t *bytemap = xmalloc_array(uint8_t, (nbits + 7) / 8); > >   if ( !bytemap ) >     return -ENOMEM; > > -   guest_bytes = (xenctl_cpumap->nr_cpus + 7) / 8; > -   copy_bytes  = min_t(unsigned int, guest_bytes, (nr_cpu_ids + 7) / 8); > +   guest_bytes = (xenctl_map->nr_elems + 7) / 8; > +   copy_bytes  = min_t(unsigned int, guest_bytes, (nbits + 7) / 8); > > -   bitmap_long_to_byte(bytemap, cpumask_bits(cpumask), nr_cpu_ids); > +   bitmap_long_to_byte(bytemap, bitmap, nbits); > >   if ( copy_bytes != 0 ) > -     if ( copy_to_guest(xenctl_cpumap->bitmap, bytemap, copy_bytes) ) > +     if ( copy_to_guest(xenctl_map->bitmap, bytemap, copy_bytes) ) >       err = -EFAULT; > >   for ( i = copy_bytes; !err && i < guest_bytes; i++ ) > -     if ( copy_to_guest_offset(xenctl_cpumap->bitmap, i, &zero, 1) ) > +     if ( copy_to_guest_offset(xenctl_map->bitmap, i, &zero, 1) ) >       err = -EFAULT; > >   xfree(bytemap); > @@ -60,36 +61,58 @@ int cpumask_to_xenctl_cpumap( >   return err; >  } > > -int xenctl_cpumap_to_cpumask( > - 
  cpumask_var_t *cpumask, const struct xenctl_cpumap *xenctl_cpumap) > +int xenctl_map_to_bitmap(unsigned long *bitmap, > +             const struct xenctl_map *xenctl_map, > +             unsigned int nbits) >  { >   unsigned int guest_bytes, copy_bytes; >   int err = 0; > -   uint8_t *bytemap = xzalloc_array(uint8_t, (nr_cpu_ids + 7) / 8); > +   uint8_t *bytemap = xzalloc_array(uint8_t, (nbits + 7) / 8); > >   if ( !bytemap ) >     return -ENOMEM; > > -   guest_bytes = (xenctl_cpumap->nr_cpus + 7) / 8; > -   copy_bytes  = min_t(unsigned int, guest_bytes, (nr_cpu_ids + 7) / 8); > +   guest_bytes = (xenctl_map->nr_elems + 7) / 8; > +   copy_bytes  = min_t(unsigned int, guest_bytes, (nbits + 7) / 8); > >   if ( copy_bytes != 0 ) >   { > -     if ( copy_from_guest(bytemap, xenctl_cpumap->bitmap, copy_bytes) ) > +     if ( copy_from_guest(bytemap, xenctl_map->bitmap, copy_bytes) ) >       err = -EFAULT; > -     if ( (xenctl_cpumap->nr_cpus & 7) && (guest_bytes <= sizeof(bytemap)) ) > -       bytemap[guest_bytes-1] &= ~(0xff << (xenctl_cpumap->nr_cpus & 7)); > +     if ( (xenctl_map->nr_elems & 7) && (guest_bytes <= sizeof(bytemap)) ) > +       bytemap[guest_bytes-1] &= ~(0xff << (xenctl_map->nr_elems & 7)); >   } > > -   if ( err ) > -     /* nothing */; > -   else if ( alloc_cpumask_var(cpumask) ) > -     bitmap_byte_to_long(cpumask_bits(*cpumask), bytemap, nr_cpu_ids); > +   if ( !err ) > +     bitmap_byte_to_long(bitmap, bytemap, nbits); > + > +   xfree(bytemap); > + > +   return err; > +} > + > +int cpumask_to_xenctl_map(struct xenctl_map *xenctl_cpumap, > +              const cpumask_t *cpumask) > +{ > +   return bitmap_to_xenctl_map(xenctl_cpumap, cpumask_bits(cpumask), > +                 nr_cpu_ids); > +} > + > +int xenctl_map_to_cpumask(cpumask_var_t *cpumask, > +              const struct xenctl_map *xenctl_cpumap) > +{ > +   int err = 0; > + > +   if ( alloc_cpumask_var(cpumask) ) { > +     err = xenctl_map_to_bitmap(cpumask_bits(*cpumask), xenctl_cpumap, > +                  nr_cpu_ids); > +     /* In case of error, cleanup is up to us, as the caller won''t care! 
*/ > +     if ( err ) > +       free_cpumask_var(*cpumask); > +   } >   else >     err = -ENOMEM; > > -   xfree(bytemap); > - >   return err; >  } > > @@ -617,7 +640,7 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc >     { >       cpumask_var_t new_affinity; > > -       ret = xenctl_cpumap_to_cpumask( > +       ret = xenctl_map_to_cpumask( >         &new_affinity, &op->u.vcpuaffinity.cpumap); >       if ( !ret ) >       { > @@ -627,7 +650,7 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc >     } >     else >     { > -       ret = cpumask_to_xenctl_cpumap( > +       ret = cpumask_to_xenctl_map( >         &op->u.vcpuaffinity.cpumap, v->cpu_affinity); >     } > > diff --git a/xen/common/trace.c b/xen/common/trace.c > --- a/xen/common/trace.c > +++ b/xen/common/trace.c > @@ -384,7 +384,7 @@ int tb_control(xen_sysctl_tbuf_op_t *tbc >   { >     cpumask_var_t mask; > > -     rc = xenctl_cpumap_to_cpumask(&mask, &tbc->cpu_mask); > +     rc = xenctl_map_to_cpumask(&mask, &tbc->cpu_mask); >     if ( !rc ) >     { >       cpumask_copy(&tb_cpu_mask, mask); > diff --git a/xen/include/public/arch-x86/xen-mca.h b/xen/include/public/arch-x86/xen-mca.h > --- a/xen/include/public/arch-x86/xen-mca.h > +++ b/xen/include/public/arch-x86/xen-mca.h > @@ -414,7 +414,7 @@ struct xen_mc_mceinject { > >  struct xen_mc_inject_v2 { >     uint32_t flags; > -    struct xenctl_cpumap cpumap; > +    struct xenctl_map cpumap; >  }; >  #endif > > diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h > --- a/xen/include/public/domctl.h > +++ b/xen/include/public/domctl.h > @@ -283,7 +283,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_getvc >  /* XEN_DOMCTL_getvcpuaffinity */ >  struct xen_domctl_vcpuaffinity { >   uint32_t  vcpu;        /* IN */ > -   struct xenctl_cpumap cpumap; /* IN/OUT */ > +   struct xenctl_map cpumap;   /* IN/OUT */ >  }; >  typedef struct xen_domctl_vcpuaffinity xen_domctl_vcpuaffinity_t; >  DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpuaffinity_t); > diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h > --- a/xen/include/public/sysctl.h > +++ b/xen/include/public/sysctl.h > @@ -71,8 +71,8 @@ struct xen_sysctl_tbuf_op { >  #define XEN_SYSCTL_TBUFOP_disable    5 >   uint32_t cmd; >   /* IN/OUT variables */ > -   struct xenctl_cpumap cpu_mask; > -   uint32_t       evt_mask; > +   struct xenctl_map cpu_mask; > +   uint32_t      evt_mask; >   /* OUT variables */ >   uint64_aligned_t buffer_mfn; >   uint32_t size;  /* Also an IN variable! 
*/ > @@ -531,7 +531,7 @@ struct xen_sysctl_cpupool_op { >   uint32_t domid;    /* IN: M        */ >   uint32_t cpu;     /* IN: AR       */ >   uint32_t n_dom;    /*       OUT: I  */ > -   struct xenctl_cpumap cpumap; /*   OUT: IF */ > +   struct xenctl_map cpumap;   /*   OUT: IF */ >  }; >  typedef struct xen_sysctl_cpupool_op xen_sysctl_cpupool_op_t; >  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cpupool_op_t); > diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h > --- a/xen/include/public/xen.h > +++ b/xen/include/public/xen.h > @@ -822,9 +822,9 @@ typedef uint8_t xen_domain_handle_t[16]; >  #endif > >  #ifndef __ASSEMBLY__ > -struct xenctl_cpumap { > +struct xenctl_map { >   XEN_GUEST_HANDLE_64(uint8) bitmap; > -   uint32_t nr_cpus; > +   uint32_t nr_elems; >  }; >  #endif > > diff --git a/xen/include/xen/cpumask.h b/xen/include/xen/cpumask.h > --- a/xen/include/xen/cpumask.h > +++ b/xen/include/xen/cpumask.h > @@ -424,8 +424,8 @@ extern cpumask_t cpu_present_map; >  #define for_each_present_cpu(cpu)  for_each_cpu(cpu, &cpu_present_map) > >  /* Copy to/from cpumap provided by control tools. */ > -struct xenctl_cpumap; > -int cpumask_to_xenctl_cpumap(struct xenctl_cpumap *, const cpumask_t *); > -int xenctl_cpumap_to_cpumask(cpumask_var_t *, const struct xenctl_cpumap *); > +struct xenctl_map; > +int cpumask_to_xenctl_map(struct xenctl_map *, const cpumask_t *); > +int xenctl_map_to_cpumask(cpumask_var_t *, const struct xenctl_map *); > >  #endif /* __XEN_CPUMASK_H */ > diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst > --- a/xen/include/xlat.lst > +++ b/xen/include/xlat.lst > @@ -2,7 +2,7 @@ >  # ! - needs translation >  # ? - needs checking >  ?    dom0_vga_console_info      xen.h > -?    xenctl_cpumap          xen.h > +?    xenctl_map            xen.h >  ?    mmu_update            xen.h >  !    mmuext_op            xen.h >  !    start_info            xen.h > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Ian Jackson
2012-May-31 15:02 UTC
Re: [PATCH 09 of 11] libxl, xl: enable automatic placement of guests on NUMA nodes
Dario Faggioli writes ("[PATCH 09 of 11] libxl, xl: enable automatic placement of guests on NUMA nodes"):> If a domain does not have a VCPU affinity, try to pin it automatically > to some PCPUs. This is done taking into account the NUMA characteristics > of the host: we look for a combination of host''s NUMA nodes that has enough > free memoy for the new domain, and pin it to the VCPUs of those nodes. > Smaller combinations are considered first, to avoid spreading the > domain''s memory among too many nodes.Thanks for this. Here are my comments:> +static void libxl_nodemap_rand_init(libxl_nodemap *nodemap) > +{ > + int i; > + nodemap->size = rand() % 16; > + nodemap->map = calloc(nodemap->size, sizeof(*nodemap->map)); > + libxl_for_each_node(i, *nodemap) { > + if (rand() % 2) > + libxl_nodemap_set(nodemap, i); > + else > + libxl_nodemap_reset(nodemap, i); > + } > +}For your random number generation, please use nrand48, with a seed in the libxl ctx. (This means you''ll need to take out the ctx lock.) And provide a way to set the seed.> +/* Automatic NUMA placement */ > +libxl_numa_candidate *libxl_domain_numa_candidates(libxl_ctx *ctx, > + libxl_domain_build_info *b_info, > + int min_nodes, int *nr_cndts); > +int libxl_numa_candidate_add_cpus(libxl_ctx *ctx, > + int min_cpus, int max_nodes, > + libxl_numa_candidate *candidate); > +void libxl_numa_candidates_list_free(libxl_numa_candidate *list, int nr);This interface documentation is deficient, I''m afraid. Please explain how these functions are supposed to be used.> diff --git a/tools/libxl/libxl_json.c b/tools/libxl/libxl_json.c > --- a/tools/libxl/libxl_json.c > +++ b/tools/libxl/libxl_json.c > @@ -119,6 +119,26 @@ out: > return s; > } > > +yajl_gen_status libxl_nodemap_gen_json(yajl_gen hand, > + libxl_nodemap *nodemap) > +{ > + yajl_gen_status s; > + int i; > + > + s = yajl_gen_array_open(hand); > + if (s != yajl_gen_status_ok) goto out; > + > + libxl_for_each_node(i, *nodemap) { > + if (libxl_nodemap_test(nodemap, i)) { > + s = yajl_gen_integer(hand, i); > + if (s != yajl_gen_status_ok) goto out; > + } > + } > + s = yajl_gen_array_close(hand); > +out: > + return s; > +}This is a copy of _cpumap_gen_json. Bad enough to have one of these, let along two.> +/* > + * The following two uility functions compute the node maps > + * coresponding to the [ n! / (k! * (n - k)!) ] combinations > + * of @size nodes within a set that is @nr_nodes big, without > + * repetition. The algorithm used for generating them uses > + * a vector containing the indexes in the set of the elements > + * of the i-eth combination to generate the (i+1)-eth, and > + * ensures they come in lexicographic order.Can you please avoid this `@'' ? We aren''t using doxygen (and never will I hope) and it''s just noise. Also I think the "..." here (and later) are wrong since these aren''t strings. If you want to talk about them as arrays and want something to indicate the grouping you could use { }.> +static int numa_node_set_next(int nr_nodes, int size, int *idxs, > + libxl_nodemap *nodemap)You should at least write int idxs[] and explain that the caller is expected to provide an array of size `size'' which is private to the implementation of numa_node_set_...> +{ > + int i; > + > + /* Check whether we just need to increase the rightmost index */ > + if (idxs[size - 1] < nr_nodes - 1) { > + idxs[size - 1]++; > + goto build_nodemap;Is there something wrong with `else'' ? Or maybe you mean `goto out'' ? 
But, I think in fact this special case is unnecessary, isn''t it ?> + /* Find the rightmost element from where to start the increasing */ > + for (i = size - 1; idxs[i] == nr_nodes - size + i; i--) {Since if idxs[size-1] == nr_nodes-1 this loop''s first test of the condition with i == size-1 becomes idxs[size-1]==nr_nodes-1 and we therefore execute this> + for (idxs[i]++, i += 1; i < size; i++) > + idxs[i] = idxs[i - 1] + 1;with i==size-1 and thus we increment idxs[size-1] and quit the loop right away. Furthermore, that last loop is really rather confusing. The function would benefit from a comment describing the algorith. `i += 1'' is un-idiomatic, too.> +libxl_numa_candidate *libxl_domain_numa_candidates(libxl_ctx *ctx, > + libxl_domain_build_info *b_info, > + int min_nodes, int *nr_cndts)I think it would be better if this function returned a libxl error code. That way in would be possible to distinguish the various error cases with different error codes. You should define new error codes if you need them. For the libxl internal subcalls, do this: rc = libxl_get_some_information(CTX, ...); if (rc) goto out;> + suitable_nodes_idx = malloc(sizeof(int) * nr_nodes);Use GCNEW_ARRAY and ditch the error handling. You will need to GC_INIT and GC_FREE. This will give you a gc and then you should write `CTX'' rather than `ctx'' so that your code can easily be moved into an inner function.> + /* Generate the combinations and check their cumulative free memory */ > + numa_node_set_init(nr_nodes, min_nodes, suitable_nodes_idx, > + &suitable_nodes); > + do { > + nodes_memkb = 0; > + libxl_for_each_set_node(i, suitable_nodes) > + nodes_memkb += ninfo[i].free / 1024;This should be a for loop. If necessary, numa_node_set_* should be changed so that they can easily be used in a for loop. This would be acceptable, for example: for (rc = numa_node_set_init(nr_nodes, min_nodes, suitable_nodes_idx, &suitable_nodes); !rc; rc = numa_node_set_next(nr_nodes, min_nodes, suitable_nodes_idx, &suitable_nodes)) { Perhaps numa_node_set_ should take a struct for all these arguments. Then you could write: for (rc = numa_node_set_init(&numa_node_set_iter, nr_nodes, min_nodes); !rc; rc = numa_node_set_next(&numa_node_set_iter) { Or even better if you changed the interface a bit: for (numa_node_set_init(gc, &numa_node_set_iter, nr_nodes, min_nodes); numa_node_set_get(&numa_node_set_iter, &suitable_nodes); numa_node_set_next(&numa_node_set_iter) { which would allow you to make numa_node_set_iter entirely opaque and move the allocation of the idx array into the numa_node_set_* functions.> + cndts = realloc(cndts, sizeof(cndts[0]) * (*nr_cndts+1)); > + if (cndts == NULL) {Firstly, use libxl__zrealloc(0, ...) and ditch the error handling. Secondly, this reallocs the array every time we add a node. It would be better to keep a separate note of the array size and reallocate in bigger chunks.> + out_nodemap_idx: > + free(suitable_nodes_idx); > + out_nodemap: > + libxl_nodemap_dispose(&suitable_nodes); > + out_numainfo: > + libxl_numainfo_list_free(ninfo, nr_nodes); > + out: > + return cndts; > +}Please don''t use this error handling style. Instead, initialise all the resource variables to 0 right at the top of the function, and unconditionally free them. 
For the output variable cndts, you can unconditionally free it at the end if you do this on the success exit path:> +int libxl_numa_candidate_add_cpus(libxl_ctx *ctx, > + int min_cpus, int max_nodes, > + libxl_numa_candidate *candidate) > +{ > + int dom_nodes, nodes_cpus; > + libxl_cputopology *tinfo; > + libxl_numainfo *ninfo; > + int nr_nodes, nr_cpus; > + int i, rc; > + > + tinfo = libxl_get_cpu_topology(ctx, &nr_cpus); > + if (tinfo == NULL) { > + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "libxl_get_topologyinfo failed\n"); > + rc = ERROR_NOMEM;We aren''t supposed to be returning ERROR_NOMEM any more because all our memory allocation is now assumed to be infallible. Shouldn''t this be ERROR_FAIL ? (And in the next stanza.)> + if (max_nodes == -1 || max_nodes > nr_nodes) > + max_nodes = nr_nodes;I''m afraid I can''t really make much sense of this algorithm without thinking very hard. Maybe if the interface doc comment had explained what it was supposed to do it would be obvious...> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c > --- a/tools/libxl/xl_cmdimpl.c > +++ b/tools/libxl/xl_cmdimpl.c > @@ -490,6 +490,122 @@ static void split_string_into_string_lis...> +/* How many PCPUs are there on each node? */ > +static int cpus_per_node(libxl_cputopology *tinfo, int nr_cpus) > +{ > + libxl_numainfo *ninfo; > + int nr_nodes, j, i; > + int cpus_nodes = 0; > + > + ninfo = libxl_get_numainfo(ctx, &nr_nodes); > + if (ninfo == NULL) > + return 0; > + > + /* This makes sense iff # of PCPUs is the same for all nodes */And what if it isn''t ? Later I see:> + if (cpus_nodes != 0) > + dom_min_nodes = (dom_needs_cpus + cpus_nodes - 1) / cpus_nodes; > + else > + dom_min_nodes = 1;which seems to suggest we''ll pile the whole thing onto one pcpu.> +/* Try to achieve "optimal" NUMA placement */ > +static int place_domain(libxl_domain_build_info *b_info) > +{... + if (nr_pools > 1) {> + fprintf(stderr, "skip numa placement as cpupools are being used\n"); > + err = 0; > + goto out_poolinfo; > + }Perhaps in the case of cpupools the right answer is to consider only those pcpus in the intended pool ? Or is that likely to do more harm than good, with people who are currently doing their numa placement with cpupools ?> + tinfo = libxl_get_cpu_topology(ctx, &nr_cpus); > + if (tinfo == NULL) { > + fprintf(stderr, "libxl_get_topologyinfo failed.\n"); > + err = ENOMEM;libxl functions should return libxl error codes, not errno values. You will probably need to introduce some.> + if (err == ERROR_FAIL) { > + /* Report back we didn''t find a candidate with enough cpus */ > + err = ESRCH;I don''t think we should ever turn ERROR_FAIL into something else. ERROR_FAIL means `something went wrong in a way that shouldn''t happen''.> +out_topologyinfo: > + libxl_cputopology_list_free(tinfo, nr_cpus); > +out_poolinfo: > + for (i = 0; i < nr_pools; i++) > + libxl_cpupoolinfo_dispose(&pinfo[i]); > +out: > + return err; > +}Again, please use an idempotent error handling style. Ian.
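For what it is worth, the error-handling shape being asked for would look something like the skeleton below. This is not the revised patch, only an illustration with the candidate-generation steps elided: every resource starts out empty, there is a single exit path, cleanup is unconditional, and on success the array is handed over to a separate variable so that the cleanup at the end stays a harmless no-op.

    libxl_numa_candidate *libxl_domain_numa_candidates(libxl_ctx *ctx,
                                        libxl_domain_build_info *b_info,
                                        int min_nodes, int *nr_cndts)
    {
        libxl_numa_candidate *cndts = NULL, *ret = NULL;
        libxl_numainfo *ninfo = NULL;
        int nr_nodes = 0;

        *nr_cndts = 0;

        ninfo = libxl_get_numainfo(ctx, &nr_nodes);
        if (!ninfo) goto out;

        /* ... generate the combinations, growing cndts and *nr_cndts,
         *     jumping to out on any failure ... */

        ret = cndts;       /* success: hand the array over to the caller */
        cndts = NULL;

     out:
        /* unconditional cleanup: both calls are safe no-ops when nothing
         * was allocated or when ownership has been transferred */
        libxl_numa_candidates_list_free(cndts, cndts ? *nr_cndts : 0);
        libxl_numainfo_list_free(ninfo, nr_nodes);
        return ret;
    }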
Dario Faggioli
2012-May-31 15:08 UTC
Re: [PATCH 01 of 11] libxc: abstract xenctl_cpumap to just xenctl_map
On Thu, 2012-05-31 at 16:01 +0100, George Dunlap wrote:> Dario, > > What changes are there in this since the last time you posted this? > Anything other than updating the patch description? >Nope. Patches 1, 2 and 3 are just what I sent with your comments applied about code and changelogs. Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Jackson
2012-May-31 15:08 UTC
Re: [PATCH 11 of 11] Some automatic NUMA placement documentation
Dario Faggioli writes ("[PATCH 11 of 11] Some automatic NUMA placement documentation"):> About rationale, usage and API....> +## Guest Placement in libxl ##Oh here''s the API documentation! In general I would prefer to see docs come in the same patch but I guess I can read it here:> +xl achieves automatic NUMA placement by means of the following API > +calls, provided by libxl.Can you try to write some more general comment about what order these functions should be called in ? Or to put it another way:> + libxl_numa_candidate *libxl_domain_numa_candidates(libxl_ctx *ctx, > + libxl_domain_build_info *b_info, > + int min_nodes, int *nr_cndts); > + > +This is what should be used to generate the full set of placement > +candidates. In fact, the function returns an array of containing nr_cndts > +libxl_numa_candidate (see below). Each candidate is basically a set of nodes > +that has been checked against the memory requirement derived from the > +provided libxl_domain_build_info.That tells me what the function does. But my starting point is that I have no idea when or why I might want to `generate the full set of placement candidates''. If this is something I need to do the docs need to explain that. This goes double for this function:> + int libxl_numa_candidate_add_cpus(libxl_ctx *ctx, > + int min_cpus, int max_nodes, > + libxl_numa_candidate *candidate); > + > +This is what should be used to ensure a placement candidate has at least > +min_cpus CPUs. In case it does not, the function also take care of > +adding more nodes to the candidate itself (up to when the value specified > +in max_nodes is reached). When adding new nodes, the one that has the > +smallest "distance" from the current node map is selected at each step.`add_cpus'' doesn''t seem the same as `ensure a candidate has at least min_cpus CPUs''. In what sense are the CPUs added ? And, as before, why might I want to call this ? And when would I call it ? Why does the interface to libxl expose this rather than just offering a single function libxl_do_automatic_numa_placement_according_to_heuristics ?> + libxl_numa_candidate_count_domains(libxl_ctx *ctx, > + libxl_numa_candidate *candidate); > + > +This is what counts the number of domains that are currently pinned > +to the CPUs of the nodes of a given candidate.Why is that useful ? It is used by some of your other code so I guess this is a facility which is useful to the implementors of other placement algorithms ? Ian.
Dario Faggioli
2012-May-31 15:41 UTC
Re: [PATCH 11 of 11] Some automatic NUMA placement documentation
On Thu, 2012-05-31 at 16:08 +0100, Ian Jackson wrote:> Dario Faggioli writes ("[PATCH 11 of 11] Some automatic NUMA placement documentation"): > > About rationale, usage and API. > ... > > +## Guest Placement in libxl ## > > Oh here''s the API documentation! >Yep. :-)> In general I would prefer to see docs come in the same patch but I > guess I can read it here: >I can put it there. Would you recommend _moving_ it in the source file o also leaving a copy, o maybe just some pointers, here?> > +xl achieves automatic NUMA placement by means of the following API > > +calls, provided by libxl. > > Can you try to write some more general comment about what order these > functions should be called in ? >Yes I can, good point. <snip>> > + int libxl_numa_candidate_add_cpus(libxl_ctx *ctx, > > + int min_cpus, int max_nodes, > > + libxl_numa_candidate *candidate); > > + > > +This is what should be used to ensure a placement candidate has at least > > +min_cpus CPUs. In case it does not, the function also take care of > > +adding more nodes to the candidate itself (up to when the value specified > > +in max_nodes is reached). When adding new nodes, the one that has the > > +smallest "distance" from the current node map is selected at each step. > > `add_cpus'' doesn''t seem the same as `ensure a candidate has at least > min_cpus CPUs''. In what sense are the CPUs added ? >In the sense that, if you have a candidate placement which comprises, say, 2 nodes, summing up to 12 CPUs, but you want your VM to have access to a minimum of 16 CPUs the this checks that and add more nodes to the candidate placement until it contains enough of them for giving you the number of CPUs you asked for.> And, as before, why might I want to call this ? And when would I call > it ? Why does the interface to libxl expose this rather than just > offering a single function > > libxl_do_automatic_numa_placement_according_to_heuristics > > ? >Right, that is of couse an option, and I thought about going that way for quite a while. However, what I think is best for libxl is to provide a set of building blocks for its user to be able to implement the actual heuristics. That''s why I tried to offer a ''generate the placement candidates'' and ''manipulate the placement candidate'' kind of API, to make it possible for lower levels to use it when putting their own placing algorithm together. It is also kind of extendable, I mean, we can always add another ''manipulate the candidates in some other way'' function if needed or wanted by us or whatever other libxl user. Doing it the other way, i.e., one big function doing everything, would mean that as soon as we want to change or improve the placement heuristics, we need to modify the behaviour of that API call, which I think it is suboptimal. Also, if some other user of libxl does not like the heuristics I came out with, they have to reimplement the whole placement. So, like it is right now, the actual heuristics is implemented in xl, on top of these placement candidate generation and manipulation facilities, which I finally decided it was the way I liked this whole thing most. :-) Again, you''re right in asking this reasoning to be part of the documentation, and to be put in the proper place, and I will do that. 
However, now that I''ve put it here, do you think it makes some sense?> > + libxl_numa_candidate_count_domains(libxl_ctx *ctx, > > + libxl_numa_candidate *candidate); > > + > > +This is what counts the number of domains that are currently pinned > > +to the CPUs of the nodes of a given candidate. > > Why is that useful ? It is used by some of your other code so I guess > this is a facility which is useful to the implementors of other > placement algorithms ? >As I tried to explain above, yes, exactly like that. Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
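To make the intended calling order concrete, here is a rough sketch of how a
toolstack might combine the calls being discussed. The three libxl
declarations are the ones quoted in this thread; the nodemap field of
libxl_numa_candidate, the libxl_nodemap_copy helper, the 0-on-success
convention and the tie-breaking rule are assumptions made for illustration,
not the actual xl code.

    /* Illustrative helper: generate the candidates, make sure each offers
     * enough CPUs, then keep the least loaded one (load = pinned domains). */
    int sketch_pick_candidate(libxl_ctx *ctx, libxl_domain_build_info *b_info,
                              int min_nodes, int min_cpus, int max_nodes,
                              libxl_nodemap *result)
    {
        libxl_numa_candidate *cndts;
        int nr_cndts = 0, i, doms, best = -1, best_doms = 0, rc = ERROR_FAIL;

        /* 1. All node sets with enough free memory (assumed NULL on error). */
        cndts = libxl_domain_numa_candidates(ctx, b_info, min_nodes, &nr_cndts);
        if (cndts == NULL)
            return ERROR_FAIL;

        for (i = 0; i < nr_cndts; i++) {
            /* 2. Grow this candidate until it has at least min_cpus CPUs
             *    (assuming 0 means success). */
            if (libxl_numa_candidate_add_cpus(ctx, min_cpus, max_nodes, &cndts[i]))
                continue;

            /* 3. Crude load metric: domains already pinned to these nodes. */
            doms = libxl_numa_candidate_count_domains(ctx, &cndts[i]);
            if (best < 0 || doms < best_doms) {
                best = i;
                best_doms = doms;
            }
        }

        if (best >= 0) {
            /* Assumed field and helper names. */
            libxl_nodemap_copy(result, &cndts[best].nodemap);
            rc = 0;
        }

        libxl_numa_candidates_list_free(cndts, nr_cndts);
        return rc;
    }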
George Dunlap
2012-May-31 15:41 UTC
Re: [PATCH 03 of 11] libxc, libxl: introduce xc_nodemap_t and libxl_nodemap
On 31/05/12 13:11, Dario Faggioli wrote:> As NUMA node-related counterparts of xc_cpumap_t and libxl_cpumap. > > This is in preparation of making it possible to manipulate > NUMA nodes from the toolstack(s). > > Signed-off-by: Dario Faggioli<dario.faggioli@citrix.com> > > diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c > --- a/tools/libxc/xc_misc.c > +++ b/tools/libxc/xc_misc.c > @@ -35,11 +35,30 @@ int xc_get_max_cpus(xc_interface *xch) > return max_cpus; > } > > +int xc_get_max_nodes(xc_interface *xch) > +{ > + static int max_nodes = 0; > + xc_physinfo_t physinfo; > + > + if ( max_nodes ) > + return max_nodes; > + > + if ( !xc_physinfo(xch,&physinfo) ) > + max_nodes = physinfo.max_node_id + 1; > + > + return max_nodes;What does xc_physinfo() return? Should this return the error message rather than 0 if it fails? -George
George Dunlap
2012-May-31 15:51 UTC
Re: [PATCH 04 of 11] libxl: expand the libxl_{cpu, node}map API a bit
On 31/05/12 13:11, Dario Faggioli wrote:
> By adding copying and *_is_full/*_is_empty facilities.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>
> diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
> --- a/tools/libxl/libxl_utils.c
> +++ b/tools/libxl/libxl_utils.c
> @@ -505,6 +505,35 @@ static int libxl_map_alloc(libxl_ctx *ct
>      return 0;
>  }
>
> +static void libxl_map_copy(struct libxl_map *dptr,
> +                           const struct libxl_map *sptr)
> +{
> +    int sz;
> +
> +    sz = dptr->size = sptr->size;
> +    memcpy(dptr->map, sptr->map, sz * sizeof(*dptr->map));
> +}
> +
> +int libxl_map_is_full(struct libxl_map *map)
> +{
> +    int i;
> +
> +    for (i = 0; i < map->size; i++)
> +        if (map->map[i] != (uint8_t)-1)
> +            return -1;
> +    return 0;
> +}

Why are you returning -1 and 0 for false and true, rather than 0 and 1?
None of the other libxl "_is_" functions (e.g., libxl_is_stubdom()) do that.

 -George
George Dunlap
2012-May-31 15:54 UTC
Re: [PATCH 05 of 11] libxl: add a new Array type to the IDL
On 31/05/12 13:11, Dario Faggioli wrote:> And make all the required infrastructure updates to enable this. > > Signed-off-by: Ian Campbell<ian.campbell@citrix.com> > Tested-by: Dario Faggioli<dario.faggioli@citrix.com>This is beyond my ken, so I''m going to skip this one. :-) -George
Dario Faggioli
2012-May-31 16:01 UTC
Re: [PATCH 04 of 11] libxl: expand the libxl_{cpu, node}map API a bit
On Thu, 2012-05-31 at 16:51 +0100, George Dunlap wrote:
> > +int libxl_map_is_full(struct libxl_map *map)
> > +{
> > +    int i;
> > +
> > +    for (i = 0; i < map->size; i++)
> > +        if (map->map[i] != (uint8_t)-1)
> > +            return -1;
> > +    return 0;
> > +}
> Why are you returning -1 and 0 for false and true, rather than 0 and 1?
> None of the other libxl "_is_" functions (e.g., libxl_is_stubdom()) do that.
>
Oh, well, it seemed natural to me as well for true to be 1, but that
would have meant giving 0 a different meaning than ''everything is ok'',
and I didn''t find relevant examples of it being done... Thanks for
pointing this out, I''ll change it.

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
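For comparison, a version with the conventional boolean return that George is
asking for could look like the sketch below. It is based on the code quoted
above, not on the final patch, and the _is_empty counterpart is presumed to be
symmetrical.

    int libxl_map_is_full(struct libxl_map *map)
    {
        int i;

        /* 1 = every bit set, 0 = at least one bit clear */
        for (i = 0; i < map->size; i++)
            if (map->map[i] != (uint8_t)-1)
                return 0;
        return 1;
    }

    int libxl_map_is_empty(struct libxl_map *map)
    {
        int i;

        /* 1 = no bit set, 0 = at least one bit set */
        for (i = 0; i < map->size; i++)
            if (map->map[i])
                return 0;
        return 1;
    }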
Dario Faggioli
2012-May-31 16:09 UTC
Re: [PATCH 03 of 11] libxc, libxl: introduce xc_nodemap_t and libxl_nodemap
On Thu, 2012-05-31 at 16:41 +0100, George Dunlap wrote:
> > +int xc_get_max_nodes(xc_interface *xch)
> > +{
> > +    static int max_nodes = 0;
> > +    xc_physinfo_t physinfo;
> > +
> > +    if ( max_nodes )
> > +        return max_nodes;
> > +
> > +    if ( !xc_physinfo(xch,&physinfo) )
> > +        max_nodes = physinfo.max_node_id + 1;
> > +
> > +    return max_nodes;
> What does xc_physinfo() return? Should this return the error message
> rather than 0 if it fails?
>
    int xc_get_max_cpus(xc_interface *xch)
    {
        static int max_cpus = 0;
        xc_physinfo_t physinfo;

        if ( max_cpus )
            return max_cpus;

        if ( !xc_physinfo(xch, &physinfo) )
            max_cpus = physinfo.max_cpu_id + 1;

        return max_cpus;
    }

And I guess the reason for this is that it is used (both in libxc and
libxl) in a few places to determine the size of the array that will
host the cpu map. Thus, getting a 0 ensures we do not allocate a random
amount of memory.

Of course it should be more than possible to change it, but going
through all the callers is needed. If you want me to do that, just say
it, and I will perhaps deal with both this "original" and my "variant",
right?

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
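As an illustration of the sizing pattern described above, a hypothetical
caller (not from the series, and assuming xc_nodemap_t is a plain byte array
like xc_cpumap_t) might do:

    /* Hypothetical caller; needs <stdlib.h> for calloc(). */
    xc_nodemap_t alloc_nodemap(xc_interface *xch)
    {
        int max_nodes = xc_get_max_nodes(xch);

        if (max_nodes <= 0)
            return NULL;    /* xc_physinfo() failed underneath */

        /* One bit per node, rounded up to whole bytes: a 0 from
         * xc_get_max_nodes() can never turn into a bogus allocation
         * size here, it just makes the caller fail cleanly. */
        return calloc(1, (max_nodes + 7) / 8);
    }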
Dario Faggioli
2012-May-31 16:27 UTC
Re: [PATCH 09 of 11] libxl, xl: enable automatic placement of guests on NUMA nodes
On Thu, 2012-05-31 at 16:02 +0100, Ian Jackson wrote:> Dario Faggioli writes ("[PATCH 09 of 11] libxl, xl: enable automatic placement of guests on NUMA nodes"): > > If a domain does not have a VCPU affinity, try to pin it automatically > > to some PCPUs. This is done taking into account the NUMA characteristics > > of the host: we look for a combination of host''s NUMA nodes that has enough > > free memoy for the new domain, and pin it to the VCPUs of those nodes. > > Smaller combinations are considered first, to avoid spreading the > > domain''s memory among too many nodes. > > Thanks for this. Here are my comments: >Thanks to you. :-)> > +static void libxl_nodemap_rand_init(libxl_nodemap *nodemap) > > +{ > > + int i; > > + nodemap->size = rand() % 16; > > + nodemap->map = calloc(nodemap->size, sizeof(*nodemap->map)); > > + libxl_for_each_node(i, *nodemap) { > > + if (rand() % 2) > > + libxl_nodemap_set(nodemap, i); > > + else > > + libxl_nodemap_reset(nodemap, i); > > + } > > +} > > For your random number generation, please use nrand48, with a seed in > the libxl ctx. (This means you''ll need to take out the ctx lock.) > And provide a way to set the seed. >Sounds reasonable, I''ll look into this. As a side note, as I deliberately took inspiration from libxl_cpumap_rand_init() for this, should I apply what you just said to that function too?> > +/* Automatic NUMA placement */ > > +libxl_numa_candidate *libxl_domain_numa_candidates(libxl_ctx *ctx, > > + libxl_domain_build_info *b_info, > > + int min_nodes, int *nr_cndts); > > +int libxl_numa_candidate_add_cpus(libxl_ctx *ctx, > > + int min_cpus, int max_nodes, > > + libxl_numa_candidate *candidate); > > +void libxl_numa_candidates_list_free(libxl_numa_candidate *list, int nr); > > This interface documentation is deficient, I''m afraid. Please explain > how these functions are supposed to be used. >Sure, as per the other mail, I''ll put the API documentation here.> > +yajl_gen_status libxl_nodemap_gen_json(yajl_gen hand, > > + libxl_nodemap *nodemap) > > +{ > > + yajl_gen_status s; > > + int i; > > + > > + s = yajl_gen_array_open(hand); > > + if (s != yajl_gen_status_ok) goto out; > > + > > + libxl_for_each_node(i, *nodemap) { > > + if (libxl_nodemap_test(nodemap, i)) { > > + s = yajl_gen_integer(hand, i); > > + if (s != yajl_gen_status_ok) goto out; > > + } > > + } > > + s = yajl_gen_array_close(hand); > > +out: > > + return s; > > +} > > This is a copy of _cpumap_gen_json. Bad enough to have one of these, > let along two. >So, perhaps you''re suggesting merging the two of them?> > +static int numa_node_set_next(int nr_nodes, int size, int *idxs, > > + libxl_nodemap *nodemap) > > You should at least write int idxs[] and explain that the > caller is expected to provide an array of size `size'' which is private > to the implementation of numa_node_set_... > > <snip> >I like this and the other suggestions about restructuring the combination generation and iteration, and I''ll also try to improve the comments and the documentation.> > +libxl_numa_candidate *libxl_domain_numa_candidates(libxl_ctx *ctx, > > + libxl_domain_build_info *b_info, > > + int min_nodes, int *nr_cndts) > > I think it would be better if this function returned a libxl error > code. That way in would be possible to distinguish the various error > cases with different error codes. You should define new error codes > if you need them. 
>
> For the libxl internal subcalls, do this:
>     rc = libxl_get_some_information(CTX, ...);
>     if (rc) goto out;
>
Ok, I will move the array of candidates among the parameters...

> > +        cndts = realloc(cndts, sizeof(cndts[0]) * (*nr_cndts+1));
> > +        if (cndts == NULL) {
>
> Firstly, use libxl__zrealloc(0, ...) and ditch the error handling.
> Secondly, this reallocs the array every time we add a node. It would
> be better to keep a separate note of the array size and reallocate in
> bigger chunks.
>
Yes, that''s very bad, I agree.

> > + out_nodemap_idx:
> > +    free(suitable_nodes_idx);
> > + out_nodemap:
> > +    libxl_nodemap_dispose(&suitable_nodes);
> > + out_numainfo:
> > +    libxl_numainfo_list_free(ninfo, nr_nodes);
> > + out:
> > +    return cndts;
> > +}
>
> Please don''t use this error handling style. Instead, initialise all
> the resource variables to 0 right at the top of the function, and
> unconditionally free them.
>
Ok.

> For the output variable cndts, you can unconditionally free it at the
> end if you do this on the success exit path:
>
No, that is up to the caller to free...

> > +int libxl_numa_candidate_add_cpus(libxl_ctx *ctx,
> > +                                  int min_cpus, int max_nodes,
> > +                                  libxl_numa_candidate *candidate)
> > +{
> > +    int dom_nodes, nodes_cpus;
> > +    libxl_cputopology *tinfo;
> > +    libxl_numainfo *ninfo;
> > +    int nr_nodes, nr_cpus;
> > +    int i, rc;
> > +
> > +    tinfo = libxl_get_cpu_topology(ctx, &nr_cpus);
> > +    if (tinfo == NULL) {
> > +        LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "libxl_get_topologyinfo failed\n");
> > +        rc = ERROR_NOMEM;
>
> We aren''t supposed to be returning ERROR_NOMEM any more because all
> our memory allocation is now assumed to be infallible. Shouldn''t this
> be ERROR_FAIL ? (And in the next stanza.)
>
Right. What I need is to distinguish, in the caller, whether this failed
because it did not find and couldn''t add enough CPUs --which would mean
"bad, but we just reset the affinity and avoid placing the domain"-- or
for some other reason --which would mean "bad, we must bail out". But I
guess I can manage doing this with the ERROR_s (or adding my own ones,
if that is the case).

> > +/* How many PCPUs are there on each node? */
> > +static int cpus_per_node(libxl_cputopology *tinfo, int nr_cpus)
> > +{
> > +    libxl_numainfo *ninfo;
> > +    int nr_nodes, j, i;
> > +    int cpus_nodes = 0;
> > +
> > +    ninfo = libxl_get_numainfo(ctx, &nr_nodes);
> > +    if (ninfo == NULL)
> > +        return 0;
> > +
> > +    /* This makes sense iff # of PCPUs is the same for all nodes */
>
> And what if it isn''t ? Later I see:
>
> > +    if (cpus_nodes != 0)
> > +        dom_min_nodes = (dom_needs_cpus + cpus_nodes - 1) / cpus_nodes;
> > +    else
> > +        dom_min_nodes = 1;
>
> which seems to suggest we''ll pile the whole thing onto one pcpu.
>
You''re right again that I should probably have spent a few more words
on this... It''s just that I fear being too verbose (I really tend to
talk and write too much!). This all comes from a hint from George
during his review of my RFC, where he said we should check how many
CPUs we have on each node so that, if we know we have 6 CPUs on each
node and the domain wants 8 CPUs, we can avoid looking for solutions
comprising fewer than 2 nodes. This is all true, but it depends on the
fact that each node has the same number of CPUs in it; otherwise, the
math becomes way more complex than just that division.

Therefore, what this is trying to do is check whether or not we are in
that situation and, if not, just turn this kind of optimization off and
tell the rest of the algorithm to give us _all_ the candidates,
starting from a minimum number of nodes equal to one (which is the
meaning of dom_min_nodes = 1 at this stage).

> > +/* Try to achieve "optimal" NUMA placement */
> > +static int place_domain(libxl_domain_build_info *b_info)
> > +{
> ...
> > +    if (nr_pools > 1) {
> > +        fprintf(stderr, "skip numa placement as cpupools are being used\n");
> > +        err = 0;
> > +        goto out_poolinfo;
> > +    }
>
> Perhaps in the case of cpupools the right answer is to consider only
> those pcpus in the intended pool ? Or is that likely to do more harm
> than good, with people who are currently doing their numa placement
> with cpupools ?
>
I''m not sure. What you are saying is what I wanted to have, but then I
thought that, as it is for VCPU affinity, if the user is asking for
something specific about CPU usage, it may be better to leave the whole
thing to him and turn all the automation off. However, I think I can
make the algorithm work while taking cpupools into account, it''s just a
matter of deciding. Maybe hearing something from George and Juergen
could be useful...

Thanks a lot for the quick and detailed review. :-)
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
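In other words, the logic being discussed amounts to the following standalone
restatement (the same arithmetic as the quoted fragment, just isolated for
clarity):

    /*
     * If every node has the same number of PCPUs (cpus_nodes > 0), the
     * smallest set of nodes worth considering is the ceiling of
     * dom_needs_cpus / cpus_nodes: e.g. 8 wanted CPUs on 6-CPU nodes
     * gives 2. On an asymmetric host the shortcut is disabled and the
     * search starts from a single node, i.e. no candidate is skipped.
     */
    static int min_nodes_for_cpus(int dom_needs_cpus, int cpus_nodes)
    {
        if (cpus_nodes > 0)
            return (dom_needs_cpus + cpus_nodes - 1) / cpus_nodes;
        return 1;
    }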
Ian Campbell
2012-May-31 16:42 UTC
Re: [PATCH 09 of 11] libxl, xl: enable automatic placement of guests on NUMA nodes
On Thu, 2012-05-31 at 17:27 +0100, Dario Faggioli wrote:> On Thu, 2012-05-31 at 16:02 +0100, Ian Jackson wrote: > > Dario Faggioli writes ("[PATCH 09 of 11] libxl, xl: enable automatic placement of guests on NUMA nodes"): > > > If a domain does not have a VCPU affinity, try to pin it automatically > > > to some PCPUs. This is done taking into account the NUMA characteristics > > > of the host: we look for a combination of host''s NUMA nodes that has enough > > > free memoy for the new domain, and pin it to the VCPUs of those nodes. > > > Smaller combinations are considered first, to avoid spreading the > > > domain''s memory among too many nodes. > > > > Thanks for this. Here are my comments: > > > Thanks to you. :-) > > > > +static void libxl_nodemap_rand_init(libxl_nodemap *nodemap) > > > +{ > > > + int i; > > > + nodemap->size = rand() % 16; > > > + nodemap->map = calloc(nodemap->size, sizeof(*nodemap->map)); > > > + libxl_for_each_node(i, *nodemap) { > > > + if (rand() % 2) > > > + libxl_nodemap_set(nodemap, i); > > > + else > > > + libxl_nodemap_reset(nodemap, i); > > > + } > > > +} > > > > For your random number generation, please use nrand48, with a seed in > > the libxl ctx. (This means you''ll need to take out the ctx lock.) > > And provide a way to set the seed. > > > Sounds reasonable, I''ll look into this. As a side note, as I > deliberately took inspiration from libxl_cpumap_rand_init() for this, > should I apply what you just said to that function too?This code is in gentest.py not libxl proper, it doesn''t need a seed in the ctx nor to use good random numbers, Ian J was mistakenly think this was actual libxl functionality I think. Ian.
Dario Faggioli
2012-May-31 16:56 UTC
Re: [PATCH 09 of 11] libxl, xl: enable automatic placement of guests on NUMA nodes
On Thu, 2012-05-31 at 17:42 +0100, Ian Campbell wrote:> > > For your random number generation, please use nrand48, with a seed in > > > the libxl ctx. (This means you''ll need to take out the ctx lock.) > > > And provide a way to set the seed. > > > > > Sounds reasonable, I''ll look into this. As a side note, as I > > deliberately took inspiration from libxl_cpumap_rand_init() for this, > > should I apply what you just said to that function too? > > This code is in gentest.py not libxl proper, >It is, indeed, as it is its _cpumap_ counterpart from which I copied it, as soon as I started getting build errors because of an obscure libxl_nodemap_rand_init() (which I didn''t knew anything of) being missing! :-P Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
George Dunlap
2012-Jun-01 16:44 UTC
Re: [PATCH 06 of 11] libxl: introduce libxl_get_numainfo()
On 31/05/12 15:22, Ian Jackson wrote:> Dario Faggioli writes ("[PATCH 06 of 11] libxl: introduce libxl_get_numainfo()"): >> Make some NUMA node information available to the toolstack. Achieve >> this by means of xc_numainfo(), which exposes memory size and amount >> of free memory of each node, as well as the relative distances of >> each node to all the others. > ... >> +#define LIBXL_NUMAINFO_INVALID_ENTRY (~(uint32_t)0) > Is there some reason we can''t just make sure we use the same value for > this in both places ? That would avoid the need for this: > >> +#define V(mem, i) (mem[i] == INVALID_NUMAINFO_ID) ? \ >> + LIBXL_NUMAINFO_INVALID_ENTRY : mem[i] > I appreciate that the types aren''t the same. In libxc it''s an > unsigned long. But shouldn''t they be the same ?It looks like Dario was just following suit from the topology interface: xen/include/syctl.h: #define INVALID_TOPOLOGY_ID (~0U) tools/libxl/libxl.h: #define LIBXL_CPUTOPOLOGY_INVALID_ENTRY (~(uint32_t)0) It might be worth seeing who wrote those and asking that person why they thought it was important. It might be something to do with making the libxl type a fixed size...?> In libxl you can use libxl__zalloc(NULL,...) (and don''t have to check > for errors because it can''t fail). But perhaps this is going into > libxc ? I''d like to hear what other people say about putting this in > libxl vs. libxc.I dunno, whom do we expect to be calling libxc? If the answer is "One day, only libxl", then it''s just a matter of taste. Otherwise, we need to ask whether someone might want to call such a function w/o having to link to libxl. -George
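The V(mem, i) translation quoted above is needed because an all-ones sentinel
only keeps the same value when the field widths match; here is a small
standalone illustration of the mismatch on an LP64 build (not code from the
patch):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned long from_xen = ~0UL;         /* all-ones in a wider field  */
        uint32_t in_libxl = ~(uint32_t)0;      /* all-ones in a uint32 field */

        /* Prints 0 on LP64: 0xffffffffffffffff != 0x00000000ffffffff once
         * both are widened, which is why the values have to be translated
         * rather than copied when the field widths differ. */
        printf("%d\n", from_xen == (unsigned long)in_libxl);
        return 0;
    }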
George Dunlap
2012-Jun-01 16:56 UTC
Re: [PATCH 08 of 11] xl: add more NUMA information to `xl info -n''
On 31/05/12 15:24, Ian Jackson wrote:
> Dario Faggioli writes ("[PATCH 08 of 11] xl: add more NUMA information to `xl info -n''"):
>> So that the user knows how much memory there is on each node and
>> how far they are from each others.
> Perhaps the json output machinery can do this for us ? If that''s
> sufficiently legible then it would be an improvement on this
> open-coded printer.

That sounds like a different patch series -- after all, the topology
info is also "open-coded" at the moment. I think there''s enough work to
do with the NUMA placement stuff for now. :-)

My only nit (somewhat pedantic) is that "x / (1 << n)" will come up with
exactly the same answer as "x >> n", but with a very slow integer
division rather than a very fast bit-shift. Not that it matters for
this purpose; but it does make the expression more complicated-looking
than it needs to be.

But other than that, I think this patch is probably good as it is.

 -George
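A trivial illustration of the nit, with made-up values: for unsigned
quantities the two spellings give the same result, so the choice is purely
about readability and about not spelling a division where a shift is meant.

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t node_mem = (uint64_t)3 << 30;   /* say, 3 GiB in bytes */

        /* Same value either way; the shift form reads more simply and is
         * a plain shift rather than a division in the source. */
        assert(node_mem / (1 << 20) == node_mem >> 20);
        return 0;
    }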
Ian Jackson
2012-Jun-01 16:58 UTC
Re: [PATCH 06 of 11] libxl: introduce libxl_get_numainfo()
George Dunlap writes ("Re: [PATCH 06 of 11] libxl: introduce libxl_get_numainfo()"):> I dunno, whom do we expect to be calling libxc? If the answer is "One > day, only libxl", then it''s just a matter of taste. Otherwise, we need > to ask whether someone might want to call such a function w/o having to > link to libxl.Perhaps the right thing is to cross that bridge when we come to it. Ian.
Ian Jackson
2012-Jun-08 13:54 UTC
Re: [PATCH 04 of 11] libxl: expand the libxl_{cpu, node}map API a bit
Dario Faggioli writes ("Re: [Xen-devel] [PATCH 04 of 11] libxl: expand the libxl_{cpu, node}map API a bit"):> On Thu, 2012-05-31 at 15:13 +0100, Ian Jackson wrote: > > Dario Faggioli writes ("[PATCH 04 of 11] libxl: expand the libxl_{cpu,node}map API a bit"): > > > +void libxl_cpumap_copy(/*XXX libxl_ctx *ctx,*/ libxl_cpumap *dst, > > > > XXX ? > > Yep, I left it on purpose but I also wanted to mention that in the > changelog, which I apparently missed. Sorry. > > However, now that we are here, the whole point is, does that function > need a libxl_ctx argument? It doesn''t use it, but I was concerned about > consistency with the other libxl_cpumap_* calls, and maybe future > needs...Yes. It should take a ctx. Ian.
Ian Jackson
2012-Jun-08 14:01 UTC
Re: [PATCH 11 of 11] Some automatic NUMA placement documentation
Dario Faggioli writes ("Re: [Xen-devel] [PATCH 11 of 11] Some automatic NUMA placement documentation"):> On Thu, 2012-05-31 at 16:08 +0100, Ian Jackson wrote: > > In general I would prefer to see docs come in the same patch but I > > guess I can read it here: > > I can put it there. Would you recommend _moving_ it in the source file o > also leaving a copy, o maybe just some pointers, here?In general I don''t mind whether API docs are in a separate file or in the source headers. What we have here is sufficiently complex that it seems to merit its own file. A reference to it from libxl.h would be appropriate.> Right, that is of couse an option, and I thought about going that way > for quite a while. However, what I think is best for libxl is to provide > a set of building blocks for its user to be able to implement the actual > heuristics.The difficulty with this is that this is supposedly becoming a stable API. If we change the approach later, we may want to change the API. Do we know clearly enough what building blocks are needed and what they should be like ? I think it would be easier, at this stage for 4.2, simply to offer a single `do something sensible'' heuristic, and then think about more complicated control by the caller for 4.3.> Doing it the other way, i.e., one big function doing everything, would > mean that as soon as we want to change or improve the placement > heuristics, we need to modify the behaviour of that API call, which I > think it is suboptimal.If we have a function whose documented behaviour is `try to do a roughly optimal thing'' then improving its optimisation is not a change to the API semantics. Specifically, existing code then will, when upgraded to a newer libxl, behaviour differently - better, we hope. Is that not the intent ?> Also, if some other user of libxl does not like > the heuristics I came out with, they have to reimplement the whole > placement.I think users of 4.2 won''t be doing that. That is, we can tell them to please wait for 4.3 when we will have a richer API for this kind of thing.> So, like it is right now, the actual heuristics is implemented in xl, on > top of these placement candidate generation and manipulation facilities, > which I finally decided it was the way I liked this whole thing > most. :-)So how much of the code currently in libxl should be reproduced in (say) libvirt ?> Again, you''re right in asking this reasoning to be part of the > documentation, and to be put in the proper place, and I will do that. > However, now that I''ve put it here, do you think it makes some sense?I think it''s fine for it to be in a separate file. But I think it should ideally be in the same patch as the implementation.> > > + libxl_numa_candidate_count_domains(libxl_ctx *ctx, > > > + libxl_numa_candidate *candidate); > > > + > > > +This is what counts the number of domains that are currently pinned > > > +to the CPUs of the nodes of a given candidate. > > > > Why is that useful ? It is used by some of your other code so I guess > > this is a facility which is useful to the implementors of other > > placement algorithms ? > > As I tried to explain above, yes, exactly like that.Are we really sure that we want to support this API for a long time ? Ian.
George Dunlap
2012-Jun-08 14:03 UTC
Re: [PATCH 11 of 11] Some automatic NUMA placement documentation
On 08/06/12 15:01, Ian Jackson wrote:> Dario Faggioli writes ("Re: [Xen-devel] [PATCH 11 of 11] Some automatic NUMA placement documentation"): >> On Thu, 2012-05-31 at 16:08 +0100, Ian Jackson wrote: >>> In general I would prefer to see docs come in the same patch but I >>> guess I can read it here: >> I can put it there. Would you recommend _moving_ it in the source file o >> also leaving a copy, o maybe just some pointers, here? > In general I don''t mind whether API docs are in a separate file or in > the source headers. What we have here is sufficiently complex that it > seems to merit its own file. A reference to it from libxl.h would be > appropriate. > >> Right, that is of couse an option, and I thought about going that way >> for quite a while. However, what I think is best for libxl is to provide >> a set of building blocks for its user to be able to implement the actual >> heuristics. > The difficulty with this is that this is supposedly becoming a stable > API. If we change the approach later, we may want to change the API. > Do we know clearly enough what building blocks are needed and what > they should be like ? > > I think it would be easier, at this stage for 4.2, simply to offer a > single `do something sensible'' heuristic, and then think about more > complicated control by the caller for 4.3.In fact, I think it would probably be better to take Ian Campbell''s suggestion and either just do it in xl completely (with no libxl changes) or do it entirely with libxl, with no external interface. Then we''ll have something better than nothing, and the whole 4.3 dev cycle to come up with a good interface we can commit to. -George
Ian Jackson
2012-Jun-08 14:03 UTC
Re: [PATCH 05 of 11] libxl: add a new Array type to the IDL
Dario Faggioli writes ("[Xen-devel] [PATCH 05 of 11] libxl: add a new Array type to the IDL"):> And make all the required infrastructure updates to enable this. > > Signed-off-by: Ian Campbell <ian.campbell@citrix.com> > Tested-by: Dario Faggioli <dario.faggioli@citrix.com>You write `From: Dario'' in the headers but it has a S-o-b from Ian. I think you should write From: Ian Campbell <ian.campbell@citrix.com> if he actually wrote it. git-commit --author will do this for you and I think git-send-email will then generate appropriate output. Can you point me at your git tree so I can easily clone it and see what the output from this new IDL generator looks like ? Thanks, Ian.
Ian Jackson
2012-Jun-08 14:06 UTC
Re: [PATCH 11 of 11] Some automatic NUMA placement documentation
George Dunlap writes ("Re: [Xen-devel] [PATCH 11 of 11] Some automatic NUMA placement documentation"):> On 08/06/12 15:01, Ian Jackson wrote: > > I think it would be easier, at this stage for 4.2, simply to offer a > > single `do something sensible'' heuristic, and then think about more > > complicated control by the caller for 4.3. > > In fact, I think it would probably be better to take Ian Campbell''s > suggestion and either just do it in xl completely (with no libxl > changes) or do it entirely with libxl, with no external interface. Then > we''ll have something better than nothing, and the whole 4.3 dev cycle to > come up with a good interface we can commit to.Yes. Ian.
Dario Faggioli
2012-Jun-08 14:35 UTC
Re: [PATCH 11 of 11] Some automatic NUMA placement documentation
On Fri, 2012-06-08 at 15:06 +0100, Ian Jackson wrote:> George Dunlap writes ("Re: [Xen-devel] [PATCH 11 of 11] Some automatic NUMA placement documentation"): > > On 08/06/12 15:01, Ian Jackson wrote: > > > I think it would be easier, at this stage for 4.2, simply to offer a > > > single `do something sensible'' heuristic, and then think about more > > > complicated control by the caller for 4.3. > > > > In fact, I think it would probably be better to take Ian Campbell''s > > suggestion and either just do it in xl completely (with no libxl > > changes) or do it entirely with libxl, with no external interface. Then > > we''ll have something better than nothing, and the whole 4.3 dev cycle to > > come up with a good interface we can commit to. > > Yes. >That''s fine. As also discussed on IRC with Ian Campbell and George, I''m putting everything inside libxl, but with _no_ public API exposure of it at all (as per analogy to it being in xend). Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Dario Faggioli
2012-Jun-08 15:14 UTC
Re: [PATCH 05 of 11] libxl: add a new Array type to the IDL
On Fri, 2012-06-08 at 15:03 +0100, Ian Jackson wrote:> Dario Faggioli writes ("[Xen-devel] [PATCH 05 of 11] libxl: add a new Array type to the IDL"): > > And make all the required infrastructure updates to enable this. > > > > Signed-off-by: Ian Campbell <ian.campbell@citrix.com> > > Tested-by: Dario Faggioli <dario.faggioli@citrix.com> > > You write `From: Dario'' in the headers but it has a S-o-b from Ian. >Right.> I think you should write > From: Ian Campbell <ian.campbell@citrix.com> > if he actually wrote it. >Yes, this is by Ian...> git-commit --author will do this for you and I think git-send-email > will then generate appropriate output. >Oh, thanks... Do you happen to know what a mercurial equivalent for this could be? :-P> Can you point me at your git tree so I can easily clone it and see > what the output from this new IDL generator looks like ? >Well, it''s an hg tree + patchqueue that I don''t have anywhere public. I just put a fresh clone of xen-unstable plus this patch here (on a Citrix NIS machine): ~dariof/Repositories/xen-unstable-array.hg is that fine for now? Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Jackson
2012-Jun-08 15:17 UTC
Re: [PATCH 05 of 11] libxl: add a new Array type to the IDL
Dario Faggioli writes ("Re: [Xen-devel] [PATCH 05 of 11] libxl: add a new Array type to the IDL"):> On Fri, 2012-06-08 at 15:03 +0100, Ian Jackson wrote: > > git-commit --author will do this for you and I think git-send-email > > will then generate appropriate output. > > Oh, thanks... Do you happen to know what a mercurial equivalent for this > could be? :-Phg ci -u> > Can you point me at your git tree so I can easily clone it and see > > what the output from this new IDL generator looks like ? > > Well, it''s an hg tree + patchqueue that I don''t have anywhere public. I > just put a fresh clone of xen-unstable plus this patch here (on a Citrix > NIS machine): > > ~dariof/Repositories/xen-unstable-array.hg > > is that fine for now?Thanks, that''s great. I''ll take a look. Ian.
Dario Faggioli
2012-Jun-08 15:19 UTC
Re: [PATCH 11 of 11] Some automatic NUMA placement documentation
On Fri, 2012-06-08 at 15:01 +0100, Ian Jackson wrote:> > Right, that is of couse an option, and I thought about going that way > > for quite a while. However, what I think is best for libxl is to provide > > a set of building blocks for its user to be able to implement the actual > > heuristics. > > The difficulty with this is that this is supposedly becoming a stable > API. If we change the approach later, we may want to change the API. > Do we know clearly enough what building blocks are needed and what > they should be like ? >No, you all are right, not yet... More investigation is needed here and it is not now the moment to do it.> > Doing it the other way, i.e., one big function doing everything, would > > mean that as soon as we want to change or improve the placement > > heuristics, we need to modify the behaviour of that API call, which I > > think it is suboptimal. > > If we have a function whose documented behaviour is `try to do a > roughly optimal thing'' then improving its optimisation is not a change > to the API semantics. > > Specifically, existing code then will, when upgraded to a newer libxl, > behaviour differently - better, we hope. Is that not the intent ? >It is. I''m changing this all into something like "when you do not say anything, libxl will place the domain `sensibly`, +/- as xend was doing". We''ll then add all the necessary interface and API in place during 4.3.> > So, like it is right now, the actual heuristics is implemented in xl, on > > top of these placement candidate generation and manipulation facilities, > > which I finally decided it was the way I liked this whole thing > > most. :-) > > So how much of the code currently in libxl should be reproduced in > (say) libvirt ? >Again, I agree. Let''s leave this for now, but I''ll definitely investigate this later. Thanks a lot for looking into this, :-) Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Jackson
2012-Jun-08 15:37 UTC
Re: [PATCH 05 of 11] libxl: add a new Array type to the IDL
Dario Faggioli writes ("Re: [Xen-devel] [PATCH 05 of 11] libxl: add a new Array type to the IDL"):> On Fri, 2012-06-08 at 15:03 +0100, Ian Jackson wrote: > > Can you point me at your git tree so I can easily clone it and see > > what the output from this new IDL generator looks like ? > > Well, it''s an hg tree + patchqueue that I don''t have anywhere public. I > just put a fresh clone of xen-unstable plus this patch here (on a Citrix > NIS machine): > > ~dariof/Repositories/xen-unstable-array.hg > > is that fine for now?Uh, can you please push your whole queue there ? In order to review the IDL generator output I naturally need a user of the new array type. Ian.
Dario Faggioli
2012-Jun-08 15:52 UTC
Re: [PATCH 05 of 11] libxl: add a new Array type to the IDL
On Fri, 2012-06-08 at 16:37 +0100, Ian Jackson wrote:> > Well, it''s an hg tree + patchqueue that I don''t have anywhere public. I > > just put a fresh clone of xen-unstable plus this patch here (on a Citrix > > NIS machine): > > > > ~dariof/Repositories/xen-unstable-array.hg > > > > is that fine for now? > > Uh, can you please push your whole queue there ? In order to review > the IDL generator output I naturally need a user of the new array > type. >Gah, right! The whole queue is under deep restructuring, but I should have pushed the patch that constitutes the direct user of the new type... Thanks, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Jackson
2012-Jun-08 15:57 UTC
Re: [PATCH 05 of 11] libxl: add a new Array type to the IDL
Dario Faggioli writes ("Re: [Xen-devel] [PATCH 05 of 11] libxl: add a new Array type to the IDL"):> Gah, right! The whole queue is under deep restructuring, but I should > have pushed the patch that constitutes the direct user of the new > type...I''m happy to wait. Ian.
Ian Campbell
2012-Jun-12 09:02 UTC
Re: [PATCH 05 of 11] libxl: add a new Array type to the IDL
On Fri, 2012-06-08 at 16:14 +0100, Dario Faggioli wrote:> On Fri, 2012-06-08 at 15:03 +0100, Ian Jackson wrote: > > git-commit --author will do this for you and I think git-send-email > > will then generate appropriate output. > > > Oh, thanks... Do you happen to know what a mercurial equivalent for this > could be? :-PIan J answered for the general commit case but in case you are interested for the mq case you can put a comment at the top of the patch (before the changelog) with the same form as what "hg export" gives you. e.g.: # HG changeset patch # User Ian Campbell <ian.campbell@citrix.com> This makes qpush do the right thing (IIRC, I didn''t try it just now...). Ian.
Dario Faggioli
2012-Jun-13 06:59 UTC
Re: [PATCH 05 of 11] libxl: add a new Array type to the IDL
On Tue, 2012-06-12 at 10:02 +0100, Ian Campbell wrote:> On Fri, 2012-06-08 at 16:14 +0100, Dario Faggioli wrote: > > On Fri, 2012-06-08 at 15:03 +0100, Ian Jackson wrote: > > > git-commit --author will do this for you and I think git-send-email > > > will then generate appropriate output. > > > > > Oh, thanks... Do you happen to know what a mercurial equivalent for this > > could be? :-P > > Ian J answered for the general commit case but in case you are > interested for the mq case you can put a comment at the top of the patch > (before the changelog) with the same form as what "hg export" gives you. > e.g.: > # HG changeset patch > # User Ian Campbell <ian.campbell@citrix.com> >I am, indeed, interested in patchqueues... So thanks to both of you. :-)> This makes qpush do the right thing (IIRC, I didn''t try it just now...). >I see. It seems `hg qrefresh'' also takes a -u argument which is doing the trick. Anyway, thanks again, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Dario Faggioli
2012-Jun-18 12:06 UTC
Re: [PATCH 05 of 11] libxl: add a new Array type to the IDL
On Wed, 2012-06-13 at 08:59 +0200, Dario Faggioli wrote:> > Ian J answered for the general commit case but in case you are > > interested for the mq case you can put a comment at the top of the patch > > (before the changelog) with the same form as what "hg export" gives you. > > e.g.: > > # HG changeset patch > > # User Ian Campbell <ian.campbell@citrix.com> > > > I am, indeed, interested in patchqueues... So thanks to both of you. :-) > > > This makes qpush do the right thing (IIRC, I didn''t try it just now...). > > > I see. It seems `hg qrefresh'' also takes a -u argument which is doing > the trick. >For the record, I tried the above and it seemed to work on my local copy, but not while sending via patchbomb... :-( I''ll try IanC''s suggestion for next time... Sorry. :-( Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Dario Faggioli
2012-Jun-21 14:32 UTC
Re: [PATCH 05 of 11] libxl: add a new Array type to the IDL
On Wed, 2012-06-13 at 08:59 +0200, Dario Faggioli wrote:> > Ian J answered for the general commit case but in case you are > > interested for the mq case you can put a comment at the top of the patch > > (before the changelog) with the same form as what "hg export" gives you. > > e.g.: > > # HG changeset patch > > # User Ian Campbell <ian.campbell@citrix.com> > >So, I have this in place, as I did when sent v2. However, as I suppose this is not doing its job, as I see this: From: Dario Faggioli <raistlinxxx> To: xen-develxxx Cc: ... Subject: [PATCH 02 of 10 v2] libxl: add a new Array type to the IDL Maybe I should remove the --plain option from my patchbomb line, so that those comments make it to the list? Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Ian Campbell
2012-Jun-21 14:35 UTC
Re: [PATCH 05 of 11] libxl: add a new Array type to the IDL
On Thu, 2012-06-21 at 15:32 +0100, Dario Faggioli wrote:> On Wed, 2012-06-13 at 08:59 +0200, Dario Faggioli wrote: > > > Ian J answered for the general commit case but in case you are > > > interested for the mq case you can put a comment at the top of the patch > > > (before the changelog) with the same form as what "hg export" gives you. > > > e.g.: > > > # HG changeset patch > > > # User Ian Campbell <ian.campbell@citrix.com> > > > > So, I have this in place, as I did when sent v2. However, as I suppose > this is not doing its job, as I see this: > > From: Dario Faggioli <raistlinxxx> > To: xen-develxxx > Cc: ... > Subject: [PATCH 02 of 10 v2] libxl: add a new Array type to the IDL > > Maybe I should remove the --plain option from my patchbomb line, so that > those comments make it to the list?That would work, I don''t normally use --plain. Ian.
Ian Jackson
2012-Jun-26 16:28 UTC
Re: [PATCH 05 of 11] libxl: add a new Array type to the IDL
Dario Faggioli writes ("[Xen-devel] [PATCH 05 of 11] libxl: add a new Array type to the IDL"):> And make all the required infrastructure updates to enable this.> diff --git a/tools/libxl/idl.txt b/tools/libxl/idl.txt > --- a/tools/libxl/idl.txt > +++ b/tools/libxl/idl.txt > @@ -145,12 +145,24 @@ idl.KeyedUnion > > A subclass of idl.Aggregate which represents the C union type > where the currently valid member of the union can be determined based > - upon another member in the containing type. > + upon another member in the containing type. An idl.KeyedUnion must > + always be a member of a containing idl.Aggregate type. > > - The KeyedUnion.keyvar contains an idl.type the member of the > + The KeyedUnion.keyvar contains an idl.Type the member of the > containing type which determines the valid member of the union. The > must be an instance of the Enumeration type.Something seems to have been mangled here in this sentence (although all your patch does is change the case of the name).> +idl.Array > + > + A class representing an array or similar elements. An idl.Array must^^ of.> + always be an idl.Field of a containing idl.Aggregate. > + > + idl.Array.elem_type contains an idl.Type which is the type of all > + elements of the array.More natural in English would be, I think, "... the type of each element of the array".> + idl.Array.len_var contains an idl.Field which is added to the parent > + idl.Aggregate and will contain the length of the array.I think you should explicitly say that len_var MUST be a field named "num_XXX" where the original field is named XXX, as previously stated. In the future this will probably be enforced by the idl compiler. Aside from that, Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Thanks, Ian.
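For readers following the IDL discussion, this is roughly the C that an Array
field is expected to turn into under the num_XXX convention being discussed;
the exact field names below are an illustration of the pattern rather than the
actual generated header.

    /* An IDL field along the lines of
     *     ("dists", Array(uint32, "num_dists"))
     * inside a Struct is expected to come out as a pointer plus its
     * length in the generated C type, e.g.: */
    typedef struct libxl_numainfo {
        uint64_t size;       /* total memory on the node     */
        uint64_t free;       /* free memory on the node      */
        uint32_t *dists;     /* distances from this node     */
        int num_dists;       /* number of entries in dists[] */
    } libxl_numainfo;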
Ian Campbell
2012-Jun-26 16:30 UTC
Re: [PATCH 05 of 11] libxl: add a new Array type to the IDL
On Tue, 2012-06-26 at 17:28 +0100, Ian Jackson wrote:> Dario Faggioli writes ("[Xen-devel] [PATCH 05 of 11] libxl: add a new Array type to the IDL"): > > And make all the required infrastructure updates to enable this. > > > diff --git a/tools/libxl/idl.txt b/tools/libxl/idl.txt > > --- a/tools/libxl/idl.txt > > +++ b/tools/libxl/idl.txt > > @@ -145,12 +145,24 @@ idl.KeyedUnion > > > > A subclass of idl.Aggregate which represents the C union type > > where the currently valid member of the union can be determined based > > - upon another member in the containing type. > > + upon another member in the containing type. An idl.KeyedUnion must > > + always be a member of a containing idl.Aggregate type. > > > > - The KeyedUnion.keyvar contains an idl.type the member of the > > + The KeyedUnion.keyvar contains an idl.Type the member of the > > containing type which determines the valid member of the union. The > > must be an instance of the Enumeration type. > > Something seems to have been mangled here in this sentence (although > all your patch does is change the case of the name). > > > +idl.Array > > + > > + A class representing an array or similar elements. An idl.Array must > ^^ > of. > > > + always be an idl.Field of a containing idl.Aggregate. > > + > > + idl.Array.elem_type contains an idl.Type which is the type of all > > + elements of the array. > > More natural in English would be, I think, "... the type of each > element of the array". > > > + idl.Array.len_var contains an idl.Field which is added to the parent > > + idl.Aggregate and will contain the length of the array. > > I think you should explicitly say that len_var MUST be a field named > "num_XXX" where the original field is named XXX, as previously stated. > > In the future this will probably be enforced by the idl compiler. > > > Aside from that, > > Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>Thanks. Dario, since this is my patch do you want me to address the above and resend?> > > Thanks, > Ian.
Dario Faggioli
2012-Jun-26 16:58 UTC
Re: [PATCH 05 of 11] libxl: add a new Array type to the IDL
On Tue, 2012-06-26 at 17:30 +0100, Ian Campbell wrote:> > Aside from that, > > > > Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> > > Thanks. > > Dario, since this is my patch do you want me to address the above and > resend? >As you wish. If that doesn''t bother you too much, I''m fine with that. Otherwise, I can do it when resending the whole series... Let''s say that, if I don''t see the patch on the list when I come to resend, I''ll apply these changes myself, ok? Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel