Dario Faggioli
2012-Oct-05 14:08 UTC
[PATCH 0 of 8] NUMA Awareness for the Credit Scheduler
Hi Everyone,

Here comes a patch series instilling some NUMA awareness into the Credit scheduler. These patches teach Xen's scheduler how to try to maximize performance on a NUMA host, taking advantage of the information coming from the automatic NUMA placement we have in libxl.

Right now, the placement algorithm runs and selects a node (or a set of nodes) where it is best to put a new domain. Then, all the memory for the new domain is allocated from those node(s) and all the vCPUs of the new domain are pinned to the pCPUs of those node(s). What we do here is, instead of statically pinning the domain's vCPUs to the nodes' pCPUs, have the (Credit) scheduler _prefer_ running them there. That enables most of the performance benefit of "real" pinning, but without its intrinsic lack of flexibility.

The above happens by extending to the scheduler the knowledge of a domain's node-affinity. We then ask it to first try to run the domain's vCPUs on one of the nodes the domain has affinity with. Of course, if that turns out to be impossible, it falls back to the old behaviour (i.e., considering vcpu-affinity only).

Allow me to mention that NUMA-aware scheduling is not only one of the items of the NUMA roadmap I'm trying to maintain at http://wiki.xen.org/wiki/Xen_NUMA_Roadmap. It is also one of the features we decided we want for Xen 4.3 (and thus it is part of the list of such features that George is maintaining).

Up to now, I have been able to thoroughly test this only on my 2-NUMA-node testbox, by running the SpecJBB2005 benchmark concurrently in multiple VMs, and the results look really nice.
A full set of what I got can be found in my presentation from the last XenSummit, which is available here:

  http://www.slideshare.net/xen_com_mgr/numa-and-virtualization-the-case-of-xen?ref=http://www.xen.org/xensummit/xs12na_talks/T9.html

However, I re-ran some of the tests in these last days (since I changed some bits of the implementation), and here is what I got:

  -------------------------------------------------------
   SpecJBB2005 Total Aggregate Throughput
  -------------------------------------------------------
   #VMs   No NUMA affinity   NUMA affinity &    +/- %
                               scheduling
  -------------------------------------------------------
     2       34653.273         40243.015       +16.13%
     4       29883.057         35526.807       +18.88%
     6       23512.926         27015.786       +14.89%
     8       19120.243         21825.818       +14.15%
    10       15676.675         17701.472       +12.91%
  -------------------------------------------------------

Basically, the results are consistent with what is shown in the super-nice graphs I have in the slides above! :-)

As said, this looks nice to me, especially considering that my test machine is quite small, i.e., its 2 nodes are very close to each other from a latency point of view. I really expect more improvement on bigger hardware, where a much greater NUMA effect is to be expected. Of course, I myself will continue benchmarking (hopefully on systems with more than 2 nodes too), but should anyone want to run their own tests, that would be great, so feel free to do so and report the results to me and/or to the list!

A little bit more about the series:

  1/8 xen, libxc: rename xenctl_cpumap to xenctl_bitmap
  2/8 xen, libxc: introduce node maps and masks

Do some preparation work.

  3/8 xen: let the (credit) scheduler know about `node affinity`

Is where the vcpu load balancing logic of the credit scheduler is modified to support node-affinity.
  4/8 xen: allow for explicitly specifying node-affinity
  5/8 libxc: allow for explicitly specifying node-affinity
  6/8 libxl: allow for explicitly specifying node-affinity
  7/8 libxl: automatic placement deals with node-affinity

Are what wires the in-scheduler node-affinity support to the external world. Please note that patch 4 touches XSM and Flask, which is the area with which I have the least experience and the least chance to test properly. So, if Daniel and/or anyone interested in that could take a look and comment, that would be awesome.

  8/8 xl: report node-affinity for domains

Is just a small output enhancement.

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Dario Faggioli
2012-Oct-05 14:08 UTC
[PATCH 1 of 8] xen, libxc: rename xenctl_cpumap to xenctl_bitmap
More specifically, this patch:

 1. replaces xenctl_cpumap with xenctl_bitmap;
 2. provides bitmap_to_xenctl_bitmap and the reverse;
 3. re-implements cpumask_to_xenctl_bitmap with bitmap_to_xenctl_bitmap, and the same for the reverse.

Other than #3, there are no functional changes, and the interface is only slightly affected. This is in preparation for introducing NUMA node-affinity maps.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff --git a/tools/libxc/xc_cpupool.c b/tools/libxc/xc_cpupool.c
--- a/tools/libxc/xc_cpupool.c
+++ b/tools/libxc/xc_cpupool.c
@@ -90,7 +90,7 @@ xc_cpupoolinfo_t *xc_cpupool_getinfo(xc_
     sysctl.u.cpupool_op.op = XEN_SYSCTL_CPUPOOL_OP_INFO;
     sysctl.u.cpupool_op.cpupool_id = poolid;
     set_xen_guest_handle(sysctl.u.cpupool_op.cpumap.bitmap, local);
-    sysctl.u.cpupool_op.cpumap.nr_cpus = local_size * 8;
+    sysctl.u.cpupool_op.cpumap.nr_elems = local_size * 8;
 
     err = do_sysctl_save(xch, &sysctl);
 
@@ -184,7 +184,7 @@ xc_cpumap_t xc_cpupool_freeinfo(xc_inter
     sysctl.cmd = XEN_SYSCTL_cpupool_op;
     sysctl.u.cpupool_op.op = XEN_SYSCTL_CPUPOOL_OP_FREEINFO;
     set_xen_guest_handle(sysctl.u.cpupool_op.cpumap.bitmap, local);
-    sysctl.u.cpupool_op.cpumap.nr_cpus = mapsize * 8;
+    sysctl.u.cpupool_op.cpumap.nr_elems = mapsize * 8;
 
     err = do_sysctl_save(xch, &sysctl);
 
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -142,7 +142,7 @@ int xc_vcpu_setaffinity(xc_interface *xc
     set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, local);
 
-    domctl.u.vcpuaffinity.cpumap.nr_cpus = cpusize * 8;
+    domctl.u.vcpuaffinity.cpumap.nr_elems = cpusize * 8;
 
     ret = do_domctl(xch, &domctl);
 
@@ -182,7 +182,7 @@ int xc_vcpu_getaffinity(xc_interface *xc
     domctl.u.vcpuaffinity.vcpu = vcpu;
     set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, local);
 
-    domctl.u.vcpuaffinity.cpumap.nr_cpus = cpusize * 8;
+    domctl.u.vcpuaffinity.cpumap.nr_elems = cpusize * 8;
 
     ret = do_domctl(xch, &domctl);
 
diff --git a/tools/libxc/xc_tbuf.c b/tools/libxc/xc_tbuf.c
--- a/tools/libxc/xc_tbuf.c +++ b/tools/libxc/xc_tbuf.c @@ -134,7 +134,7 @@ int xc_tbuf_set_cpu_mask(xc_interface *x bitmap_64_to_byte(bytemap, &mask64, sizeof (mask64) * 8); set_xen_guest_handle(sysctl.u.tbuf_op.cpu_mask.bitmap, bytemap); - sysctl.u.tbuf_op.cpu_mask.nr_cpus = sizeof(bytemap) * 8; + sysctl.u.tbuf_op.cpu_mask.nr_elems = sizeof(bytemap) * 8; ret = do_sysctl(xch, &sysctl); diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c --- a/xen/arch/x86/cpu/mcheck/mce.c +++ b/xen/arch/x86/cpu/mcheck/mce.c @@ -1474,8 +1474,7 @@ long do_mca(XEN_GUEST_HANDLE(xen_mc_t) u cpumap = &cpu_online_map; else { - ret = xenctl_cpumap_to_cpumask(&cmv, - &op->u.mc_inject_v2.cpumap); + ret = xenctl_bitmap_to_cpumask(&cmv, &op->u.mc_inject_v2.cpumap); if ( ret ) break; cpumap = cmv; diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c --- a/xen/arch/x86/platform_hypercall.c +++ b/xen/arch/x86/platform_hypercall.c @@ -371,7 +371,7 @@ ret_t do_platform_op(XEN_GUEST_HANDLE(xe { uint32_t cpu; uint64_t idletime, now = NOW(); - struct xenctl_cpumap ctlmap; + struct xenctl_bitmap ctlmap; cpumask_var_t cpumap; XEN_GUEST_HANDLE(uint8) cpumap_bitmap; XEN_GUEST_HANDLE(uint64) idletimes; @@ -384,11 +384,11 @@ ret_t do_platform_op(XEN_GUEST_HANDLE(xe if ( cpufreq_controller != FREQCTL_dom0_kernel ) break; - ctlmap.nr_cpus = op->u.getidletime.cpumap_nr_cpus; + ctlmap.nr_elems = op->u.getidletime.cpumap_nr_cpus; guest_from_compat_handle(cpumap_bitmap, op->u.getidletime.cpumap_bitmap); ctlmap.bitmap.p = cpumap_bitmap.p; /* handle -> handle_64 conversion */ - if ( (ret = xenctl_cpumap_to_cpumask(&cpumap, &ctlmap)) != 0 ) + if ( (ret = xenctl_bitmap_to_cpumask(&cpumap, &ctlmap)) != 0 ) goto out; guest_from_compat_handle(idletimes, op->u.getidletime.idletime); @@ -407,7 +407,7 @@ ret_t do_platform_op(XEN_GUEST_HANDLE(xe op->u.getidletime.now = now; if ( ret == 0 ) - ret = cpumask_to_xenctl_cpumap(&ctlmap, cpumap); + ret = 
cpumask_to_xenctl_bitmap(&ctlmap, cpumap); free_cpumask_var(cpumap); if ( ret == 0 && copy_to_guest(u_xenpf_op, op, 1) ) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -493,7 +493,7 @@ int cpupool_do_sysctl(struct xen_sysctl_ op->cpupool_id = c->cpupool_id; op->sched_id = c->sched->sched_id; op->n_dom = c->n_dom; - ret = cpumask_to_xenctl_cpumap(&op->cpumap, c->cpu_valid); + ret = cpumask_to_xenctl_bitmap(&op->cpumap, c->cpu_valid); cpupool_put(c); } break; @@ -588,7 +588,7 @@ int cpupool_do_sysctl(struct xen_sysctl_ case XEN_SYSCTL_CPUPOOL_OP_FREEINFO: { - ret = cpumask_to_xenctl_cpumap( + ret = cpumask_to_xenctl_bitmap( &op->cpumap, &cpupool_free_cpus); } break; diff --git a/xen/common/domctl.c b/xen/common/domctl.c --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -32,28 +32,29 @@ static DEFINE_SPINLOCK(domctl_lock); DEFINE_SPINLOCK(vcpu_alloc_lock); -int cpumask_to_xenctl_cpumap( - struct xenctl_cpumap *xenctl_cpumap, const cpumask_t *cpumask) +int bitmap_to_xenctl_bitmap(struct xenctl_bitmap *xenctl_bitmap, + const unsigned long *bitmap, + unsigned int nbits) { unsigned int guest_bytes, copy_bytes, i; uint8_t zero = 0; int err = 0; - uint8_t *bytemap = xmalloc_array(uint8_t, (nr_cpu_ids + 7) / 8); + uint8_t *bytemap = xmalloc_array(uint8_t, (nbits + 7) / 8); if ( !bytemap ) return -ENOMEM; - guest_bytes = (xenctl_cpumap->nr_cpus + 7) / 8; - copy_bytes = min_t(unsigned int, guest_bytes, (nr_cpu_ids + 7) / 8); + guest_bytes = (xenctl_bitmap->nr_elems + 7) / 8; + copy_bytes = min_t(unsigned int, guest_bytes, (nbits + 7) / 8); - bitmap_long_to_byte(bytemap, cpumask_bits(cpumask), nr_cpu_ids); + bitmap_long_to_byte(bytemap, bitmap, nbits); if ( copy_bytes != 0 ) - if ( copy_to_guest(xenctl_cpumap->bitmap, bytemap, copy_bytes) ) + if ( copy_to_guest(xenctl_bitmap->bitmap, bytemap, copy_bytes) ) err = -EFAULT; for ( i = copy_bytes; !err && i < guest_bytes; i++ ) - if ( 
copy_to_guest_offset(xenctl_cpumap->bitmap, i, &zero, 1) ) + if ( copy_to_guest_offset(xenctl_bitmap->bitmap, i, &zero, 1) ) err = -EFAULT; xfree(bytemap); @@ -61,36 +62,59 @@ int cpumask_to_xenctl_cpumap( return err; } -int xenctl_cpumap_to_cpumask( - cpumask_var_t *cpumask, const struct xenctl_cpumap *xenctl_cpumap) +int xenctl_bitmap_to_bitmap(unsigned long *bitmap, + const struct xenctl_bitmap *xenctl_bitmap, + unsigned int nbits) { unsigned int guest_bytes, copy_bytes; int err = 0; - uint8_t *bytemap = xzalloc_array(uint8_t, (nr_cpu_ids + 7) / 8); + uint8_t *bytemap = xzalloc_array(uint8_t, (nbits + 7) / 8); if ( !bytemap ) return -ENOMEM; - guest_bytes = (xenctl_cpumap->nr_cpus + 7) / 8; - copy_bytes = min_t(unsigned int, guest_bytes, (nr_cpu_ids + 7) / 8); + guest_bytes = (xenctl_bitmap->nr_elems + 7) / 8; + copy_bytes = min_t(unsigned int, guest_bytes, (nbits + 7) / 8); if ( copy_bytes != 0 ) { - if ( copy_from_guest(bytemap, xenctl_cpumap->bitmap, copy_bytes) ) + if ( copy_from_guest(bytemap, xenctl_bitmap->bitmap, copy_bytes) ) err = -EFAULT; - if ( (xenctl_cpumap->nr_cpus & 7) && (guest_bytes <= sizeof(bytemap)) ) - bytemap[guest_bytes-1] &= ~(0xff << (xenctl_cpumap->nr_cpus & 7)); + if ( (xenctl_bitmap->nr_elems & 7) && + (guest_bytes <= sizeof(bytemap)) ) + bytemap[guest_bytes-1] &= ~(0xff << (xenctl_bitmap->nr_elems & 7)); } - if ( err ) - /* nothing */; - else if ( alloc_cpumask_var(cpumask) ) - bitmap_byte_to_long(cpumask_bits(*cpumask), bytemap, nr_cpu_ids); + if ( !err ) + bitmap_byte_to_long(bitmap, bytemap, nbits); + + xfree(bytemap); + + return err; +} + +int cpumask_to_xenctl_bitmap(struct xenctl_bitmap *xenctl_cpumap, + const cpumask_t *cpumask) +{ + return bitmap_to_xenctl_bitmap(xenctl_cpumap, cpumask_bits(cpumask), + nr_cpu_ids); +} + +int xenctl_bitmap_to_cpumask(cpumask_var_t *cpumask, + const struct xenctl_bitmap *xenctl_cpumap) +{ + int err = 0; + + if ( alloc_cpumask_var(cpumask) ) { + err = 
xenctl_bitmap_to_bitmap(cpumask_bits(*cpumask), xenctl_cpumap, + nr_cpu_ids); + /* In case of error, cleanup is up to us, as the caller won''t care! */ + if ( err ) + free_cpumask_var(*cpumask); + } else err = -ENOMEM; - xfree(bytemap); - return err; } @@ -621,7 +645,7 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc { cpumask_var_t new_affinity; - ret = xenctl_cpumap_to_cpumask( + ret = xenctl_bitmap_to_cpumask( &new_affinity, &op->u.vcpuaffinity.cpumap); if ( !ret ) { @@ -631,7 +655,7 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc } else { - ret = cpumask_to_xenctl_cpumap( + ret = cpumask_to_xenctl_bitmap( &op->u.vcpuaffinity.cpumap, v->cpu_affinity); } diff --git a/xen/common/trace.c b/xen/common/trace.c --- a/xen/common/trace.c +++ b/xen/common/trace.c @@ -384,7 +384,7 @@ int tb_control(xen_sysctl_tbuf_op_t *tbc { cpumask_var_t mask; - rc = xenctl_cpumap_to_cpumask(&mask, &tbc->cpu_mask); + rc = xenctl_bitmap_to_cpumask(&mask, &tbc->cpu_mask); if ( !rc ) { cpumask_copy(&tb_cpu_mask, mask); diff --git a/xen/include/public/arch-x86/xen-mca.h b/xen/include/public/arch-x86/xen-mca.h --- a/xen/include/public/arch-x86/xen-mca.h +++ b/xen/include/public/arch-x86/xen-mca.h @@ -414,7 +414,7 @@ struct xen_mc_mceinject { struct xen_mc_inject_v2 { uint32_t flags; - struct xenctl_cpumap cpumap; + struct xenctl_bitmap cpumap; }; #endif diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -284,7 +284,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_getvc /* XEN_DOMCTL_getvcpuaffinity */ struct xen_domctl_vcpuaffinity { uint32_t vcpu; /* IN */ - struct xenctl_cpumap cpumap; /* IN/OUT */ + struct xenctl_bitmap cpumap; /* IN/OUT */ }; typedef struct xen_domctl_vcpuaffinity xen_domctl_vcpuaffinity_t; DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpuaffinity_t); diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -71,7 +71,7 @@ 
struct xen_sysctl_tbuf_op { #define XEN_SYSCTL_TBUFOP_disable 5 uint32_t cmd; /* IN/OUT variables */ - struct xenctl_cpumap cpu_mask; + struct xenctl_bitmap cpu_mask; uint32_t evt_mask; /* OUT variables */ uint64_aligned_t buffer_mfn; @@ -532,7 +532,7 @@ struct xen_sysctl_cpupool_op { uint32_t domid; /* IN: M */ uint32_t cpu; /* IN: AR */ uint32_t n_dom; /* OUT: I */ - struct xenctl_cpumap cpumap; /* OUT: IF */ + struct xenctl_bitmap cpumap; /* OUT: IF */ }; typedef struct xen_sysctl_cpupool_op xen_sysctl_cpupool_op_t; DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cpupool_op_t); diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -820,9 +820,9 @@ typedef uint8_t xen_domain_handle_t[16]; #endif #ifndef __ASSEMBLY__ -struct xenctl_cpumap { +struct xenctl_bitmap { XEN_GUEST_HANDLE_64(uint8) bitmap; - uint32_t nr_cpus; + uint32_t nr_elems; }; #endif diff --git a/xen/include/xen/cpumask.h b/xen/include/xen/cpumask.h --- a/xen/include/xen/cpumask.h +++ b/xen/include/xen/cpumask.h @@ -424,8 +424,8 @@ extern cpumask_t cpu_present_map; #define for_each_present_cpu(cpu) for_each_cpu(cpu, &cpu_present_map) /* Copy to/from cpumap provided by control tools. */ -struct xenctl_cpumap; -int cpumask_to_xenctl_cpumap(struct xenctl_cpumap *, const cpumask_t *); -int xenctl_cpumap_to_cpumask(cpumask_var_t *, const struct xenctl_cpumap *); +struct xenctl_bitmap; +int cpumask_to_xenctl_bitmap(struct xenctl_bitmap *, const cpumask_t *); +int xenctl_bitmap_to_cpumask(cpumask_var_t *, const struct xenctl_bitmap *); #endif /* __XEN_CPUMASK_H */ diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst --- a/xen/include/xlat.lst +++ b/xen/include/xlat.lst @@ -2,7 +2,7 @@ # ! - needs translation # ? - needs checking ? dom0_vga_console_info xen.h -? xenctl_cpumap xen.h +? xenctl_bitmap xen.h ? mmu_update xen.h ! mmuext_op xen.h ! start_info xen.h
Dario Faggioli
2012-Oct-05 14:08 UTC
[PATCH 2 of 8] xen, libxc: introduce node maps and masks
Following suit from the cpumap and cpumask implementations.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -54,6 +54,11 @@ int xc_get_cpumap_size(xc_interface *xch
     return (xc_get_max_cpus(xch) + 7) / 8;
 }
 
+int xc_get_nodemap_size(xc_interface *xch)
+{
+    return (xc_get_max_nodes(xch) + 7) / 8;
+}
+
 xc_cpumap_t xc_cpumap_alloc(xc_interface *xch)
 {
     int sz;
@@ -64,6 +69,16 @@ xc_cpumap_t xc_cpumap_alloc(xc_interface
     return calloc(1, sz);
 }
 
+xc_nodemap_t xc_nodemap_alloc(xc_interface *xch)
+{
+    int sz;
+
+    sz = xc_get_nodemap_size(xch);
+    if (sz == 0)
+        return NULL;
+    return calloc(1, sz);
+}
+
 int xc_readconsolering(xc_interface *xch,
                        char *buffer,
                        unsigned int *pnr_chars,
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -330,12 +330,20 @@ int xc_get_cpumap_size(xc_interface *xch
 /* allocate a cpumap */
 xc_cpumap_t xc_cpumap_alloc(xc_interface *xch);
 
- /*
+/*
  * NODEMAP handling
  */
+typedef uint8_t *xc_nodemap_t;
+
 /* return maximum number of NUMA nodes the hypervisor supports */
 int xc_get_max_nodes(xc_interface *xch);
 
+/* return array size for nodemap */
+int xc_get_nodemap_size(xc_interface *xch);
+
+/* allocate a nodemap */
+xc_nodemap_t xc_nodemap_alloc(xc_interface *xch);
+
 /*
  * DOMAIN DEBUGGING FUNCTIONS
  */
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -118,6 +118,30 @@ int xenctl_bitmap_to_cpumask(cpumask_var
     return err;
 }
 
+int nodemask_to_xenctl_bitmap(struct xenctl_bitmap *xenctl_nodemap,
+                              const nodemask_t *nodemask)
+{
+    return bitmap_to_xenctl_bitmap(xenctl_nodemap, cpumask_bits(nodemask),
+                                   MAX_NUMNODES);
+}
+
+int xenctl_bitmap_to_nodemask(nodemask_t *nodemask,
+                              const struct xenctl_bitmap *xenctl_nodemap)
+{
+    int err = 0;
+
+    if ( alloc_nodemask_var(nodemask) ) {
+        err = xenctl_bitmap_to_bitmap(nodes_addr(*nodemask), xenctl_nodemap,
+                                      MAX_NUMNODES);
+        if ( err )
+            free_nodemask_var(*nodemask);
+    }
+    else
+        err = -ENOMEM;
+
+    return err;
+}
+
 static inline int is_free_domid(domid_t dom)
 {
     struct domain *d;
diff --git a/xen/include/xen/nodemask.h b/xen/include/xen/nodemask.h
--- a/xen/include/xen/nodemask.h
+++ b/xen/include/xen/nodemask.h
@@ -298,6 +298,53 @@ static inline int __nodemask_parse(const
 }
 #endif
 
+/*
+ * nodemask_var_t: struct nodemask for stack usage.
+ *
+ * See definition of cpumask_var_t in include/xen/cpumask.h.
+ */
+#if MAX_NUMNODES > 2 * BITS_PER_LONG
+#include <xen/xmalloc.h>
+
+typedef nodemask_t *nodemask_var_t;
+
+#define nr_nodemask_bits (BITS_TO_LONGS(MAX_NUMNODES) * BITS_PER_LONG)
+
+static inline bool_t alloc_nodemask_var(nodemask_var_t *mask)
+{
+    *(void **)mask = _xmalloc(nr_nodemask_bits / 8, sizeof(long));
+    return *mask != NULL;
+}
+
+static inline bool_t zalloc_nodemask_var(nodemask_var_t *mask)
+{
+    *(void **)mask = _xzalloc(nr_nodemask_bits / 8, sizeof(long));
+    return *mask != NULL;
+}
+
+static inline void free_nodemask_var(nodemask_var_t mask)
+{
+    xfree(mask);
+}
+#else
+typedef nodemask_t nodemask_var_t;
+
+static inline bool_t alloc_nodemask_var(nodemask_var_t *mask)
+{
+    return 1;
+}
+
+static inline bool_t zalloc_nodemask_var(nodemask_var_t *mask)
+{
+    nodes_clear(*mask);
+    return 1;
+}
+
+static inline void free_nodemask_var(nodemask_var_t mask)
+{
+}
+#endif
+
 #if MAX_NUMNODES > 1
 #define for_each_node_mask(node, mask) \
     for ((node) = first_node(mask); \
Dario Faggioli
2012-Oct-05 14:08 UTC
[PATCH 3 of 8] xen: let the (credit) scheduler know about `node affinity`
As vcpu-affinity tells where vcpus can run, node-affinity tells where a domain's vcpus prefer to run. Respecting vcpu-affinity is the primary concern, but honouring node-affinity will likely result in some performance benefit.

This change modifies the vcpu load balancing algorithm (for the credit scheduler only), introducing a two-step logic. During the first step, we use the node-affinity mask. The aim is to give precedence to the CPUs where it is known to be preferable for the domain to run. If that fails to find a valid CPU, node-affinity is just ignored and, in the second step, we fall back to using cpu-affinity only.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -101,6 +101,13 @@
 /*
+ * Node Balancing
+ */
+#define CSCHED_BALANCE_CPU_AFFINITY 0
+#define CSCHED_BALANCE_NODE_AFFINITY 1
+#define CSCHED_BALANCE_LAST CSCHED_BALANCE_NODE_AFFINITY
+
+/*
  * Boot parameters
  */
 static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS;
@@ -148,6 +155,9 @@ struct csched_dom {
     struct list_head active_vcpu;
     struct list_head active_sdom_elem;
     struct domain *dom;
+    /* cpumask translated from the domain's node-affinity
+     * mask. Basically, the CPUs we prefer to be scheduled on. */
+    cpumask_var_t node_affinity_cpumask;
     uint16_t active_vcpu_count;
     uint16_t weight;
     uint16_t cap;
@@ -228,6 +238,39 @@ static inline void
     list_del_init(&svc->runq_elem);
 }
 
+#define for_each_csched_balance_step(__step) \
+    for ( (__step) = CSCHED_BALANCE_LAST; (__step) >= 0; (__step)-- )
+
+/*
+ * Each csched-balance step should use its own cpumask. This function
+ * determines which one, given the step, and copies it in mask.
Notice + * that, in the case of a node balancing step, it also filters out from + * the node-affinity mask the cpus that are not part of vc''s cpu-affinity, + * as we do not want to end up running a vcpu where it is not allowed to! + * + * As an optimization, if a domain does not have any specific node-affinity + * (namely, its node affinity is automatically computed), we inform the + * caller that he can skip the first step by returning -1. + */ +static int +csched_balance_cpumask(const struct vcpu *vc, int step, cpumask_t *mask) +{ + if ( step == CSCHED_BALANCE_NODE_AFFINITY ) + { + struct domain *d = vc->domain; + struct csched_dom *sdom = CSCHED_DOM(d); + + if ( cpumask_full(sdom->node_affinity_cpumask) ) + return -1; + + cpumask_and(mask, sdom->node_affinity_cpumask, vc->cpu_affinity); + } + else /* step == CSCHED_BALANACE_CPU_AFFINITY */ + cpumask_copy(mask, vc->cpu_affinity); + + return 0; +} + static void burn_credits(struct csched_vcpu *svc, s_time_t now) { s_time_t delta; @@ -250,6 +293,20 @@ boolean_param("tickle_one_idle_cpu", opt DEFINE_PER_CPU(unsigned int, last_tickle_cpu); static inline void +__cpumask_tickle(cpumask_t *mask, const cpumask_t *idle_mask) +{ + CSCHED_STAT_CRANK(tickle_idlers_some); + if ( opt_tickle_one_idle ) + { + this_cpu(last_tickle_cpu) + cpumask_cycle(this_cpu(last_tickle_cpu), idle_mask); + cpumask_set_cpu(this_cpu(last_tickle_cpu), mask); + } + else + cpumask_or(mask, mask, idle_mask); +} + +static inline void __runq_tickle(unsigned int cpu, struct csched_vcpu *new) { struct csched_vcpu * const cur @@ -287,22 +344,26 @@ static inline void } else { - cpumask_t idle_mask; + cpumask_t idle_mask, balance_mask; + int balance_step; - cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity); - if ( !cpumask_empty(&idle_mask) ) + for_each_csched_balance_step(balance_step) { - CSCHED_STAT_CRANK(tickle_idlers_some); - if ( opt_tickle_one_idle ) - { - this_cpu(last_tickle_cpu) = - cpumask_cycle(this_cpu(last_tickle_cpu), 
&idle_mask); - cpumask_set_cpu(this_cpu(last_tickle_cpu), &mask); - } - else - cpumask_or(&mask, &mask, &idle_mask); + if ( csched_balance_cpumask(new->vcpu, balance_step, + &balance_mask) ) + continue; + + /* Look for idlers in the step''s cpumask */ + cpumask_and(&idle_mask, prv->idlers, &balance_mask); + if ( !cpumask_empty(&idle_mask) ) + __cpumask_tickle(&mask, &idle_mask); + + cpumask_and(&mask, &mask, &balance_mask); + + /* We can quit balancing if we found someone to tickle */ + if ( !cpumask_empty(&mask) ) + break; } - cpumask_and(&mask, &mask, new->vcpu->cpu_affinity); } } @@ -443,35 +504,42 @@ static inline int } static inline int -__csched_vcpu_is_migrateable(struct vcpu *vc, int dest_cpu) +__csched_vcpu_is_migrateable(struct vcpu *vc, int dest_cpu, cpumask_t *mask) { /* * Don''t pick up work that''s in the peer''s scheduling tail or hot on - * peer PCPU. Only pick up work that''s allowed to run on our CPU. + * peer PCPU. Only pick up work that prefers and/or is allowed to run + * on our CPU. */ return !vc->is_running && !__csched_vcpu_is_cache_hot(vc) && - cpumask_test_cpu(dest_cpu, vc->cpu_affinity); + cpumask_test_cpu(dest_cpu, mask); } static int _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit) { - cpumask_t cpus; + cpumask_t cpus, start_cpus; cpumask_t idlers; cpumask_t *online; + struct csched_dom *sdom = CSCHED_DOM(vc->domain); struct csched_pcpu *spc = NULL; int cpu; /* - * Pick from online CPUs in VCPU''s affinity mask, giving a - * preference to its current processor if it''s in there. + * Pick an online CPU from the && of vcpu-affinity and node-affinity + * masks (if not empty, in which case only the vcpu-affinity mask is + * used). Also, try to give a preference to its current processor if + * it''s in there. 
*/ online = cpupool_scheduler_cpumask(vc->domain->cpupool); cpumask_and(&cpus, online, vc->cpu_affinity); - cpu = cpumask_test_cpu(vc->processor, &cpus) + cpumask_and(&start_cpus, &cpus, sdom->node_affinity_cpumask); + if ( unlikely(cpumask_empty(&start_cpus)) ) + cpumask_copy(&start_cpus, &cpus); + cpu = cpumask_test_cpu(vc->processor, &start_cpus) ? vc->processor - : cpumask_cycle(vc->processor, &cpus); + : cpumask_cycle(vc->processor, &start_cpus); ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) ); /* @@ -867,6 +935,13 @@ csched_alloc_domdata(const struct schedu if ( sdom == NULL ) return NULL; + if ( !alloc_cpumask_var(&sdom->node_affinity_cpumask) ) + { + xfree(sdom); + return NULL; + } + cpumask_setall(sdom->node_affinity_cpumask); + /* Initialize credit and weight */ INIT_LIST_HEAD(&sdom->active_vcpu); sdom->active_vcpu_count = 0; @@ -900,6 +975,9 @@ csched_dom_init(const struct scheduler * static void csched_free_domdata(const struct scheduler *ops, void *data) { + struct csched_dom *sdom = data; + + free_cpumask_var(sdom->node_affinity_cpumask); xfree(data); } @@ -1211,30 +1289,48 @@ csched_runq_steal(int peer_cpu, int cpu, */ if ( peer_pcpu != NULL && !is_idle_vcpu(peer_vcpu) ) { - list_for_each( iter, &peer_pcpu->runq ) + int balance_step; + + /* + * Take node-affinity into account. That means, for all the vcpus + * in peer_pcpu''s runq, check _first_ if their node-affinity allows + * them to run on cpu. If not, retry the loop considering plain + * vcpu-affinity. Also, notice that as soon as one vcpu is found, + * balancing is considered done, and the vcpu is returned to the + * caller. + */ + for_each_csched_balance_step(balance_step) { - speer = __runq_elem(iter); + list_for_each( iter, &peer_pcpu->runq ) + { + cpumask_t balance_mask; - /* - * If next available VCPU here is not of strictly higher - * priority than ours, this PCPU is useless to us. 
- */ - if ( speer->pri <= pri ) - break; + speer = __runq_elem(iter); - /* Is this VCPU is runnable on our PCPU? */ - vc = speer->vcpu; - BUG_ON( is_idle_vcpu(vc) ); + /* + * If next available VCPU here is not of strictly higher + * priority than ours, this PCPU is useless to us. + */ + if ( speer->pri <= pri ) + break; - if (__csched_vcpu_is_migrateable(vc, cpu)) - { - /* We got a candidate. Grab it! */ - CSCHED_VCPU_STAT_CRANK(speer, migrate_q); - CSCHED_STAT_CRANK(migrate_queued); - WARN_ON(vc->is_urgent); - __runq_remove(speer); - vc->processor = cpu; - return speer; + /* Is this VCPU runnable on our PCPU? */ + vc = speer->vcpu; + BUG_ON( is_idle_vcpu(vc) ); + + if ( csched_balance_cpumask(vc, balance_step, &balance_mask) ) + continue; + + if (__csched_vcpu_is_migrateable(vc, cpu, &balance_mask)) + { + /* We got a candidate. Grab it! */ + CSCHED_VCPU_STAT_CRANK(speer, migrate_q); + CSCHED_STAT_CRANK(migrate_queued); + WARN_ON(vc->is_urgent); + __runq_remove(speer); + vc->processor = cpu; + return speer; + } } } }
Dario Faggioli
2012-Oct-05 14:08 UTC
[PATCH 4 of 8] xen: allow for explicitly specifying node-affinity
Make it possible to pass the node-affinity of a domain to the hypervisor from the upper layers, instead of having it always computed automatically.

Note that this also required generalizing the Flask hooks for setting and getting the affinity, so that they now deal with both vcpu and node affinity.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff --git a/xen/common/domain.c b/xen/common/domain.c
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -222,6 +222,7 @@ struct domain *domain_create(
     spin_lock_init(&d->node_affinity_lock);
     d->node_affinity = NODE_MASK_ALL;
+    d->auto_node_affinity = 1;
 
     spin_lock_init(&d->shutdown_lock);
     d->shutdown_code = -1;
@@ -362,11 +363,26 @@ void domain_update_node_affinity(struct
         cpumask_or(cpumask, cpumask, online_affinity);
     }
 
-    for_each_online_node ( node )
-        if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
-            node_set(node, nodemask);
+    if ( d->auto_node_affinity )
+    {
+        /* Node-affinity is automatically computed from all vcpu-affinities */
+        for_each_online_node ( node )
+            if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
+                node_set(node, nodemask);
 
-    d->node_affinity = nodemask;
+        d->node_affinity = nodemask;
+    }
+    else
+    {
+        /* Node-affinity is provided by someone else, just filter out cpus
+         * that are either offline or not in the affinity of any vcpus. */
+        for_each_node_mask ( node, d->node_affinity )
+            if ( !cpumask_intersects(&node_to_cpumask(node), cpumask) )
+                node_clear(node, d->node_affinity);
+    }
+
+    sched_set_node_affinity(d, &d->node_affinity);
+
     spin_unlock(&d->node_affinity_lock);
 
     free_cpumask_var(online_affinity);
@@ -374,6 +390,36 @@ void domain_update_node_affinity(struct
 }
 
+int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity)
+{
+    /* Being affine with no nodes is just wrong */
+    if ( nodes_empty(*affinity) )
+        return -EINVAL;
+
+    spin_lock(&d->node_affinity_lock);
+
+    /*
+     * Being/becoming explicitly affine to all nodes is not particularly
+     * useful.
Let''s take it as the `reset node affinity` command. + */ + if ( nodes_full(*affinity) ) + { + d->auto_node_affinity = 1; + goto out; + } + + d->auto_node_affinity = 0; + d->node_affinity = *affinity; + +out: + spin_unlock(&d->node_affinity_lock); + + domain_update_node_affinity(d); + + return 0; +} + + struct domain *get_domain_by_id(domid_t dom) { struct domain *d; diff --git a/xen/common/domctl.c b/xen/common/domctl.c --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -642,6 +642,40 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc } break; + case XEN_DOMCTL_setnodeaffinity: + case XEN_DOMCTL_getnodeaffinity: + { + domid_t dom = op->domain; + struct domain *d = rcu_lock_domain_by_id(dom); + + ret = -ESRCH; + if ( d == NULL ) + break; + + ret = xsm_nodeaffinity(op->cmd, d); + if ( ret ) + goto nodeaffinity_out; + + if ( op->cmd == XEN_DOMCTL_setnodeaffinity ) + { + nodemask_t new_affinity; + + ret = xenctl_bitmap_to_nodemask(&new_affinity, + &op->u.nodeaffinity.nodemap); + if ( !ret ) + ret = domain_set_node_affinity(d, &new_affinity); + } + else + { + ret = nodemask_to_xenctl_bitmap(&op->u.nodeaffinity.nodemap, + &d->node_affinity); + } + + nodeaffinity_out: + rcu_unlock_domain(d); + } + break; + case XEN_DOMCTL_setvcpuaffinity: case XEN_DOMCTL_getvcpuaffinity: { diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c --- a/xen/common/keyhandler.c +++ b/xen/common/keyhandler.c @@ -217,6 +217,14 @@ static void cpuset_print(char *set, int *set++ = ''\0''; } +static void nodeset_print(char *set, int size, const nodemask_t *mask) +{ + *set++ = ''[''; + set += nodelist_scnprintf(set, size-2, mask); + *set++ = '']''; + *set++ = ''\0''; +} + static void periodic_timer_print(char *str, int size, uint64_t period) { if ( period == 0 ) @@ -272,6 +280,9 @@ static void dump_domains(unsigned char k dump_pageframe_info(d); + nodeset_print(tmpstr, sizeof(tmpstr), &d->node_affinity); + printk("NODE affinity for domain %d: %s\n", d->domain_id, tmpstr); + printk("VCPU 
information and callbacks for domain %u:\n", d->domain_id); for_each_vcpu ( d, v ) diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -238,6 +238,33 @@ static inline void list_del_init(&svc->runq_elem); } +/* + * Translates node-affinity mask into a cpumask, so that we can use it during + * actual scheduling. That of course will contain all the cpus from all the + * set nodes in the original node-affinity mask. + * + * Note that any serialization needed to access mask safely is complete + * responsibility of the caller of this function/hook. + */ +static void csched_set_node_affinity( + const struct scheduler *ops, + struct domain *d, + nodemask_t *mask) +{ + struct csched_dom *sdom; + int node; + + /* Skip idle domain since it doesn''t even have a node_affinity_cpumask */ + if ( unlikely(is_idle_domain(d)) ) + return; + + sdom = CSCHED_DOM(d); + cpumask_clear(sdom->node_affinity_cpumask); + for_each_node_mask( node, *mask ) + cpumask_or(sdom->node_affinity_cpumask, sdom->node_affinity_cpumask, + &node_to_cpumask(node)); +} + #define for_each_csched_balance_step(__step) \ for ( (__step) = CSCHED_BALANCE_LAST; (__step) >= 0; (__step)-- ) @@ -260,7 +287,8 @@ csched_balance_cpumask(const struct vcpu struct domain *d = vc->domain; struct csched_dom *sdom = CSCHED_DOM(d); - if ( cpumask_full(sdom->node_affinity_cpumask) ) + if ( cpumask_full(sdom->node_affinity_cpumask) || + d->auto_node_affinity == 1 ) return -1; cpumask_and(mask, sdom->node_affinity_cpumask, vc->cpu_affinity); @@ -1786,6 +1814,8 @@ const struct scheduler sched_credit_def .adjust = csched_dom_cntl, .adjust_global = csched_sys_cntl, + .set_node_affinity = csched_set_node_affinity, + .pick_cpu = csched_cpu_pick, .do_schedule = csched_schedule, diff --git a/xen/common/schedule.c b/xen/common/schedule.c --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -588,6 +588,11 @@ int cpu_disable_scheduler(unsigned int c 
return ret; } +void sched_set_node_affinity(struct domain *d, nodemask_t *mask) +{ + SCHED_OP(DOM2OP(d), set_node_affinity, d, mask); +} + int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity) { cpumask_t online_affinity; diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -279,6 +279,16 @@ typedef struct xen_domctl_getvcpuinfo xe DEFINE_XEN_GUEST_HANDLE(xen_domctl_getvcpuinfo_t); +/* Get/set the NUMA node(s) with which the guest has affinity with. */ +/* XEN_DOMCTL_setnodeaffinity */ +/* XEN_DOMCTL_getnodeaffinity */ +struct xen_domctl_nodeaffinity { + struct xenctl_bitmap nodemap;/* IN */ +}; +typedef struct xen_domctl_nodeaffinity xen_domctl_nodeaffinity_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_nodeaffinity_t); + + /* Get/set which physical cpus a vcpu can execute on. */ /* XEN_DOMCTL_setvcpuaffinity */ /* XEN_DOMCTL_getvcpuaffinity */ @@ -900,6 +910,8 @@ struct xen_domctl { #define XEN_DOMCTL_set_access_required 64 #define XEN_DOMCTL_audit_p2m 65 #define XEN_DOMCTL_set_virq_handler 66 +#define XEN_DOMCTL_setnodeaffinity 67 +#define XEN_DOMCTL_getnodeaffinity 68 #define XEN_DOMCTL_gdbsx_guestmemio 1000 #define XEN_DOMCTL_gdbsx_pausevcpu 1001 #define XEN_DOMCTL_gdbsx_unpausevcpu 1002 @@ -913,6 +925,7 @@ struct xen_domctl { struct xen_domctl_getpageframeinfo getpageframeinfo; struct xen_domctl_getpageframeinfo2 getpageframeinfo2; struct xen_domctl_getpageframeinfo3 getpageframeinfo3; + struct xen_domctl_nodeaffinity nodeaffinity; struct xen_domctl_vcpuaffinity vcpuaffinity; struct xen_domctl_shadow_op shadow_op; struct xen_domctl_max_mem max_mem; diff --git a/xen/include/xen/nodemask.h b/xen/include/xen/nodemask.h --- a/xen/include/xen/nodemask.h +++ b/xen/include/xen/nodemask.h @@ -8,8 +8,9 @@ * See detailed comments in the file linux/bitmap.h describing the * data type on which these nodemasks are based. 
* - * For details of nodemask_scnprintf() and nodemask_parse(), - * see bitmap_scnprintf() and bitmap_parse() in lib/bitmap.c. + * For details of nodemask_scnprintf(), nodelist_scnpintf() and + * nodemask_parse(), see bitmap_scnprintf() and bitmap_parse() + * in lib/bitmap.c. * * The available nodemask operations are: * @@ -48,6 +49,7 @@ * unsigned long *nodes_addr(mask) Array of unsigned long''s in mask * * int nodemask_scnprintf(buf, len, mask) Format nodemask for printing + * int nodelist_scnprintf(buf, len, mask) Format nodemask as a list for printing * int nodemask_parse(ubuf, ulen, mask) Parse ascii string as nodemask * * for_each_node_mask(node, mask) for-loop node over mask @@ -280,6 +282,14 @@ static inline int __first_unset_node(con #define nodes_addr(src) ((src).bits) +#define nodelist_scnprintf(buf, len, src) \ + __nodelist_scnprintf((buf), (len), (src), MAX_NUMNODES) +static inline int __nodelist_scnprintf(char *buf, int len, + const nodemask_t *srcp, int nbits) +{ + return bitmap_scnlistprintf(buf, len, srcp->bits, nbits); +} + #if 0 #define nodemask_scnprintf(buf, len, src) \ __nodemask_scnprintf((buf), (len), &(src), MAX_NUMNODES) diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -182,6 +182,8 @@ struct scheduler { struct xen_domctl_scheduler_op *); int (*adjust_global) (const struct scheduler *, struct xen_sysctl_scheduler_op *); + void (*set_node_affinity) (const struct scheduler *, + struct domain *, nodemask_t *); void (*dump_settings) (const struct scheduler *); void (*dump_cpu_state) (const struct scheduler *, int); diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -346,8 +346,12 @@ struct domain /* Various mem_events */ struct mem_event_per_domain *mem_event; - /* Currently computed from union of all vcpu cpu-affinity masks. */ + /* + * Can be specified by the user. 
If that is not the case, it is + * computed from the union of all the vcpu cpu-affinity masks. + */ nodemask_t node_affinity; + int auto_node_affinity; unsigned int last_alloc_node; spinlock_t node_affinity_lock; }; @@ -416,6 +420,7 @@ static inline void get_knownalive_domain ASSERT(!(atomic_read(&d->refcnt) & DOMAIN_DESTROYED)); } +int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity); void domain_update_node_affinity(struct domain *d); struct domain *domain_create( @@ -519,6 +524,7 @@ void sched_destroy_domain(struct domain int sched_move_domain(struct domain *d, struct cpupool *c); long sched_adjust(struct domain *, struct xen_domctl_scheduler_op *); long sched_adjust_global(struct xen_sysctl_scheduler_op *); +void sched_set_node_affinity(struct domain *, nodemask_t *); int sched_id(void); void sched_tick_suspend(void); void sched_tick_resume(void); diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h --- a/xen/include/xsm/xsm.h +++ b/xen/include/xsm/xsm.h @@ -56,6 +56,7 @@ struct xsm_operations { int (*domain_create) (struct domain *d, u32 ssidref); int (*max_vcpus) (struct domain *d); int (*destroydomain) (struct domain *d); + int (*nodeaffinity) (int cmd, struct domain *d); int (*vcpuaffinity) (int cmd, struct domain *d); int (*scheduler) (struct domain *d); int (*getdomaininfo) (struct domain *d); @@ -229,6 +230,11 @@ static inline int xsm_destroydomain (str return xsm_call(destroydomain(d)); } +static inline int xsm_nodeaffinity (int cmd, struct domain *d) +{ + return xsm_call(nodeaffinity(cmd, d)); +} + static inline int xsm_vcpuaffinity (int cmd, struct domain *d) { return xsm_call(vcpuaffinity(cmd, d)); diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c --- a/xen/xsm/dummy.c +++ b/xen/xsm/dummy.c @@ -634,6 +634,7 @@ void xsm_fixup_ops (struct xsm_operation set_to_dummy_if_null(ops, domain_create); set_to_dummy_if_null(ops, max_vcpus); set_to_dummy_if_null(ops, destroydomain); + set_to_dummy_if_null(ops, nodeaffinity); 
set_to_dummy_if_null(ops, vcpuaffinity); set_to_dummy_if_null(ops, scheduler); set_to_dummy_if_null(ops, getdomaininfo); diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c --- a/xen/xsm/flask/hooks.c +++ b/xen/xsm/flask/hooks.c @@ -521,17 +521,19 @@ static int flask_destroydomain(struct do DOMAIN__DESTROY); } -static int flask_vcpuaffinity(int cmd, struct domain *d) +static int flask_affinity(int cmd, struct domain *d) { u32 perm; switch ( cmd ) { case XEN_DOMCTL_setvcpuaffinity: - perm = DOMAIN__SETVCPUAFFINITY; + case XEN_DOMCTL_setnodeaffinity: + perm = DOMAIN__SETAFFINITY; break; case XEN_DOMCTL_getvcpuaffinity: - perm = DOMAIN__GETVCPUAFFINITY; + case XEN_DOMCTL_getnodeaffinity: + perm = DOMAIN__GETAFFINITY; break; default: return -EPERM; @@ -1473,7 +1475,8 @@ static struct xsm_operations flask_ops .domain_create = flask_domain_create, .max_vcpus = flask_max_vcpus, .destroydomain = flask_destroydomain, - .vcpuaffinity = flask_vcpuaffinity, + .nodeaffinity = flask_affinity, + .vcpuaffinity = flask_affinity, .scheduler = flask_scheduler, .getdomaininfo = flask_getdomaininfo, .getvcpucontext = flask_getvcpucontext, diff --git a/xen/xsm/flask/include/av_perm_to_string.h b/xen/xsm/flask/include/av_perm_to_string.h --- a/xen/xsm/flask/include/av_perm_to_string.h +++ b/xen/xsm/flask/include/av_perm_to_string.h @@ -37,8 +37,8 @@ S_(SECCLASS_DOMAIN, DOMAIN__TRANSITION, "transition") S_(SECCLASS_DOMAIN, DOMAIN__MAX_VCPUS, "max_vcpus") S_(SECCLASS_DOMAIN, DOMAIN__DESTROY, "destroy") - S_(SECCLASS_DOMAIN, DOMAIN__SETVCPUAFFINITY, "setvcpuaffinity") - S_(SECCLASS_DOMAIN, DOMAIN__GETVCPUAFFINITY, "getvcpuaffinity") + S_(SECCLASS_DOMAIN, DOMAIN__SETAFFINITY, "setaffinity") + S_(SECCLASS_DOMAIN, DOMAIN__GETAFFINITY, "getaffinity") S_(SECCLASS_DOMAIN, DOMAIN__SCHEDULER, "scheduler") S_(SECCLASS_DOMAIN, DOMAIN__GETDOMAININFO, "getdomaininfo") S_(SECCLASS_DOMAIN, DOMAIN__GETVCPUINFO, "getvcpuinfo") diff --git a/xen/xsm/flask/include/av_permissions.h 
b/xen/xsm/flask/include/av_permissions.h --- a/xen/xsm/flask/include/av_permissions.h +++ b/xen/xsm/flask/include/av_permissions.h @@ -38,8 +38,8 @@ #define DOMAIN__TRANSITION 0x00000020UL #define DOMAIN__MAX_VCPUS 0x00000040UL #define DOMAIN__DESTROY 0x00000080UL -#define DOMAIN__SETVCPUAFFINITY 0x00000100UL -#define DOMAIN__GETVCPUAFFINITY 0x00000200UL +#define DOMAIN__SETAFFINITY 0x00000100UL +#define DOMAIN__GETAFFINITY 0x00000200UL #define DOMAIN__SCHEDULER 0x00000400UL #define DOMAIN__GETDOMAININFO 0x00000800UL #define DOMAIN__GETVCPUINFO 0x00001000UL
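The auto/explicit split that this patch introduces in `domain_update_node_affinity()` can be modelled with plain bitmasks: in auto mode the node mask is recomputed from the union of vcpu cpu-affinities, while in manual mode the user-provided mask is only filtered against it. A toy 8-node, 4-cpus-per-node topology — `update_node_affinity()` is an illustrative sketch, not the Xen function:

```c
#include <assert.h>
#include <stdint.h>

#define CPUS_PER_NODE 4

/* Toy stand-in for node_to_cpumask(): node n owns cpus [4n, 4n+3]. */
static uint32_t node_to_cpus(int node)
{
    return 0xFu << (node * CPUS_PER_NODE);
}

/* Mirrors the patched domain_update_node_affinity() logic:
 * - auto mode: set every node whose cpus intersect the vcpu cpumask;
 * - manual mode: keep the user's mask, clearing only nodes with no
 *   intersecting cpus. */
static uint8_t update_node_affinity(uint8_t node_affinity, int auto_mode,
                                    uint32_t vcpu_cpumask)
{
    uint8_t result = auto_mode ? 0 : node_affinity;
    int node;

    for (node = 0; node < 8; node++) {
        if (auto_mode) {
            if (node_to_cpus(node) & vcpu_cpumask)
                result |= 1u << node;
        } else if ((result & (1u << node)) &&
                   !(node_to_cpus(node) & vcpu_cpumask)) {
            result &= ~(unsigned)(1u << node);
        }
    }
    return result;
}
```

With vcpus runnable on cpus 0, 1 and 4 (mask `0x13`), auto mode yields nodes {0,1}, and a manual mask of nodes {0..3} is filtered down to the same set.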
Dario Faggioli
2012-Oct-05 14:08 UTC
[PATCH 5 of 8] libxc: allow for explicitly specifying node-affinity
By providing the proper get/set interface and wiring them to the new domctl-s from the previous commit. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -110,6 +110,83 @@ int xc_domain_shutdown(xc_interface *xch } +int xc_domain_node_setaffinity(xc_interface *xch, + uint32_t domid, + xc_nodemap_t nodemap) +{ + DECLARE_DOMCTL; + DECLARE_HYPERCALL_BUFFER(uint8_t, local); + int ret = -1; + int nodesize; + + nodesize = xc_get_nodemap_size(xch); + if (!nodesize) + { + PERROR("Could not get number of nodes"); + goto out; + } + + local = xc_hypercall_buffer_alloc(xch, local, nodesize); + if ( local == NULL ) + { + PERROR("Could not allocate memory for setnodeaffinity domctl hypercall"); + goto out; + } + + domctl.cmd = XEN_DOMCTL_setnodeaffinity; + domctl.domain = (domid_t)domid; + + memcpy(local, nodemap, nodesize); + set_xen_guest_handle(domctl.u.nodeaffinity.nodemap.bitmap, local); + domctl.u.nodeaffinity.nodemap.nr_elems = nodesize * 8; + + ret = do_domctl(xch, &domctl); + + xc_hypercall_buffer_free(xch, local); + + out: + return ret; +} + +int xc_domain_node_getaffinity(xc_interface *xch, + uint32_t domid, + xc_nodemap_t nodemap) +{ + DECLARE_DOMCTL; + DECLARE_HYPERCALL_BUFFER(uint8_t, local); + int ret = -1; + int nodesize; + + nodesize = xc_get_nodemap_size(xch); + if (!nodesize) + { + PERROR("Could not get number of nodes"); + goto out; + } + + local = xc_hypercall_buffer_alloc(xch, local, nodesize); + if ( local == NULL ) + { + PERROR("Could not allocate memory for getnodeaffinity domctl hypercall"); + goto out; + } + + domctl.cmd = XEN_DOMCTL_getnodeaffinity; + domctl.domain = (domid_t)domid; + + set_xen_guest_handle(domctl.u.nodeaffinity.nodemap.bitmap, local); + domctl.u.nodeaffinity.nodemap.nr_elems = nodesize * 8; + + ret = do_domctl(xch, &domctl); + + memcpy(nodemap, local, nodesize); + + xc_hypercall_buffer_free(xch, 
local); + + out: + return ret; +} + int xc_vcpu_setaffinity(xc_interface *xch, uint32_t domid, int vcpu, diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h --- a/tools/libxc/xenctrl.h +++ b/tools/libxc/xenctrl.h @@ -521,6 +521,32 @@ int xc_watchdog(xc_interface *xch, uint32_t id, uint32_t timeout); +/** + * This function explicitly sets the host NUMA nodes the domain will + * have affinity with. + * + * @parm xch a handle to an open hypervisor interface. + * @parm domid the domain id one wants to set the affinity of. + * @parm nodemap the map of the affine nodes. + * @return 0 on success, -1 on failure. + */ +int xc_domain_node_setaffinity(xc_interface *xch, + uint32_t domind, + xc_nodemap_t nodemap); + +/** + * This function retrieves the host NUMA nodes the domain has + * affinity with. + * + * @parm xch a handle to an open hypervisor interface. + * @parm domid the domain id one wants to get the node affinity of. + * @parm nodemap the map of the affine nodes. + * @return 0 on success, -1 on failure. + */ +int xc_domain_node_getaffinity(xc_interface *xch, + uint32_t domind, + xc_nodemap_t nodemap); + int xc_vcpu_setaffinity(xc_interface *xch, uint32_t domid, int vcpu,
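Both libxc calls above report the map width to Xen as `nodesize * 8` bits, i.e. the node bitmap is byte-packed. A minimal sketch of that packing, assuming (as the code suggests) that `xc_get_nodemap_size()` rounds the node count up to whole bytes — `nodemap_bytes`/`nodemap_set`/`nodemap_test` are illustrative names, not libxc API:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for a byte-granular node bitmap (what the nodemap size
 * query presumably boils down to). */
static int nodemap_bytes(int max_nodes)
{
    return (max_nodes + 7) / 8;
}

/* Bit n of the map lives in byte n/8, at bit position n%8. */
static void nodemap_set(uint8_t *map, int node)
{
    map[node / 8] |= (uint8_t)(1u << (node % 8));
}

static int nodemap_test(const uint8_t *map, int node)
{
    return (map[node / 8] >> (node % 8)) & 1;
}
```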
Dario Faggioli
2012-Oct-05 14:08 UTC
[PATCH 6 of 8] libxl: allow for explicitly specifying node-affinity
By introducing a nodemap in libxl_domain_build_info and providing the get/set methods to deal with it. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -3926,6 +3926,26 @@ int libxl_set_vcpuaffinity_all(libxl_ctx return rc; } +int libxl_domain_set_nodeaffinity(libxl_ctx *ctx, uint32_t domid, + libxl_bitmap *nodemap) +{ + if (xc_domain_node_setaffinity(ctx->xch, domid, nodemap->map)) { + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting node affinity"); + return ERROR_FAIL; + } + return 0; +} + +int libxl_domain_get_nodeaffinity(libxl_ctx *ctx, uint32_t domid, + libxl_bitmap *nodemap) +{ + if (xc_domain_node_getaffinity(ctx->xch, domid, nodemap->map)) { + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting node affinity"); + return ERROR_FAIL; + } + return 0; +} + int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_bitmap *cpumap) { GC_INIT(ctx); diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -859,6 +859,10 @@ int libxl_set_vcpuaffinity(libxl_ctx *ct libxl_bitmap *cpumap); int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, unsigned int max_vcpus, libxl_bitmap *cpumap); +int libxl_domain_set_nodeaffinity(libxl_ctx *ctx, uint32_t domid, + libxl_bitmap *nodemap); +int libxl_domain_get_nodeaffinity(libxl_ctx *ctx, uint32_t domid, + libxl_bitmap *nodemap); int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_bitmap *cpumap); libxl_scheduler libxl_get_scheduler(libxl_ctx *ctx); diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -219,6 +219,12 @@ int libxl__domain_build_info_setdefault( libxl_defbool_setdefault(&b_info->numa_placement, true); + if (!b_info->nodemap.size) { + if (libxl_node_bitmap_alloc(CTX, &b_info->nodemap, 0)) + return ERROR_FAIL; + 
libxl_bitmap_set_any(&b_info->nodemap); + } + if (b_info->max_memkb == LIBXL_MEMKB_DEFAULT) b_info->max_memkb = 32 * 1024; if (b_info->target_memkb == LIBXL_MEMKB_DEFAULT) diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -230,6 +230,7 @@ int libxl__build_pre(libxl__gc *gc, uint if (rc) return rc; } + libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap); libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap); xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT); diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -255,6 +255,7 @@ libxl_domain_build_info = Struct("domain ("max_vcpus", integer), ("avail_vcpus", libxl_bitmap), ("cpumap", libxl_bitmap), + ("nodemap", libxl_bitmap), ("numa_placement", libxl_defbool), ("tsc_mode", libxl_tsc_mode), ("max_memkb", MemKB),
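Note how this default ties in with the hypervisor side of the series: libxl allocates the nodemap with every bit set (`libxl_bitmap_set_any`), and `domain_set_node_affinity()` takes a full mask as the "reset to automatic" command, while an empty one is rejected. A toy 8-node model of that convention (`toy_dom` and this `set_node_affinity()` are illustrative, not the Xen code):

```c
#include <assert.h>
#include <stdint.h>

struct toy_dom {
    uint8_t node_affinity;
    int auto_node_affinity;
};

static int set_node_affinity(struct toy_dom *d, uint8_t mask)
{
    if (mask == 0)        /* nodes_empty(): affine to nothing is an error */
        return -1;
    if (mask == 0xFF) {   /* nodes_full(): treated as "back to automatic" */
        d->auto_node_affinity = 1;
        return 0;
    }
    d->auto_node_affinity = 0;
    d->node_affinity = mask;
    return 0;
}
```

So a domain built with the default ("any node") nodemap keeps the old auto-computed behaviour unless the user narrows the map.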
Dario Faggioli
2012-Oct-05 14:08 UTC
[PATCH 7 of 8] libxl: automatic placement deals with node-affinity
Which basically means the following two things: 1) during domain creation, it is the node-affinity of the domain --rather than the vcpu-affinities of its vcpus-- that is affected by automatic placement; 2) during automatic placement, when counting how many vcpus are already "bound" to a placement candidate (as part of the process of choosing the best candidate), node-affinity is also considered, together with vcpu-affinity. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -133,13 +133,13 @@ static int numa_place_domain(libxl__gc * { int found; libxl__numa_candidate candidate; - libxl_bitmap candidate_nodemap; + libxl_bitmap cpupool_nodemap; libxl_cpupoolinfo cpupool_info; int i, cpupool, rc = 0; uint32_t memkb; libxl__numa_candidate_init(&candidate); - libxl_bitmap_init(&candidate_nodemap); + libxl_bitmap_init(&cpupool_nodemap); /* * Extract the cpumap from the cpupool the domain belong to. In fact, @@ -156,7 +156,7 @@ static int numa_place_domain(libxl__gc * rc = libxl_domain_need_memory(CTX, info, &memkb); if (rc) goto out; - if (libxl_node_bitmap_alloc(CTX, &candidate_nodemap, 0)) { + if (libxl_node_bitmap_alloc(CTX, &cpupool_nodemap, 0)) { rc = ERROR_FAIL; goto out; } @@ -174,17 +174,19 @@ static int numa_place_domain(libxl__gc * if (found == 0) goto out; - /* Map the candidate''s node map to the domain''s info->cpumap */ - libxl__numa_candidate_get_nodemap(gc, &candidate, &candidate_nodemap); - rc = libxl_nodemap_to_cpumap(CTX, &candidate_nodemap, &info->cpumap); + /* Map the candidate''s node map to the domain''s info->nodemap */ + libxl__numa_candidate_get_nodemap(gc, &candidate, &info->nodemap); + + /* Avoid trying to set the affinity to nodes that might be in the + * candidate''s nodemap but out of our cpupool. 
*/ + rc = libxl_cpumap_to_nodemap(CTX, &cpupool_info.cpumap, + &cpupool_nodemap); if (rc) goto out; - /* Avoid trying to set the affinity to cpus that might be in the - * nodemap but not in our cpupool. */ - libxl_for_each_set_bit(i, info->cpumap) { - if (!libxl_bitmap_test(&cpupool_info.cpumap, i)) - libxl_bitmap_reset(&info->cpumap, i); + libxl_for_each_set_bit(i, info->nodemap) { + if (!libxl_bitmap_test(&cpupool_nodemap, i)) + libxl_bitmap_reset(&info->nodemap, i); } LOG(DETAIL, "NUMA placement candidate with %d nodes, %d cpus and " @@ -193,7 +195,7 @@ static int numa_place_domain(libxl__gc * out: libxl__numa_candidate_dispose(&candidate); - libxl_bitmap_dispose(&candidate_nodemap); + libxl_bitmap_dispose(&cpupool_nodemap); libxl_cpupoolinfo_dispose(&cpupool_info); return rc; } @@ -211,10 +213,10 @@ int libxl__build_pre(libxl__gc *gc, uint /* * Check if the domain has any CPU affinity. If not, try to build * up one. In case numa_place_domain() find at least a suitable - * candidate, it will affect info->cpumap accordingly; if it + * candidate, it will affect info->nodemap accordingly; if it * does not, it just leaves it as it is. This means (unless * some weird error manifests) the subsequent call to - * libxl_set_vcpuaffinity_all() will do the actual placement, + * libxl_domain_set_nodeaffinity() will do the actual placement, * whatever that turns out to be. 
*/ if (libxl_defbool_val(info->numa_placement)) { diff --git a/tools/libxl/libxl_numa.c b/tools/libxl/libxl_numa.c --- a/tools/libxl/libxl_numa.c +++ b/tools/libxl/libxl_numa.c @@ -171,7 +171,7 @@ static int nodemap_to_nr_vcpus(libxl__gc const libxl_bitmap *nodemap) { libxl_dominfo *dinfo = NULL; - libxl_bitmap vcpu_nodemap; + libxl_bitmap vcpu_nodemap, dom_nodemap; int nr_doms, nr_cpus; int nr_vcpus = 0; int i, j, k; @@ -185,6 +185,12 @@ static int nodemap_to_nr_vcpus(libxl__gc return ERROR_FAIL; } + if (libxl_node_bitmap_alloc(CTX, &dom_nodemap, 0) < 0) { + libxl_dominfo_list_free(dinfo, nr_doms); + libxl_bitmap_dispose(&vcpu_nodemap); + return ERROR_FAIL; + } + for (i = 0; i < nr_doms; i++) { libxl_vcpuinfo *vinfo; int nr_dom_vcpus; @@ -193,6 +199,9 @@ static int nodemap_to_nr_vcpus(libxl__gc if (vinfo == NULL) continue; + /* Retrieve the domain''s node-affinity map (see below) */ + libxl_domain_get_nodeaffinity(CTX, dinfo[i].domid, &dom_nodemap); + /* For each vcpu of each domain ... */ for (j = 0; j < nr_dom_vcpus; j++) { @@ -201,9 +210,17 @@ static int nodemap_to_nr_vcpus(libxl__gc libxl_for_each_set_bit(k, vinfo[j].cpumap) libxl_bitmap_set(&vcpu_nodemap, tinfo[k].node); - /* And check if that map has any intersection with our nodemap */ + /* + * We now check whether the && of the vcpu''s nodemap and the + * domain''s nodemap has any intersection with the nodemap of our + * canidate. + * Using both (vcpu''s and domain''s) nodemaps allows us to take + * both vcpu-affinity and node-affinity into account when counting + * the number of vcpus bound to the candidate. 
+ */ libxl_for_each_set_bit(k, vcpu_nodemap) { - if (libxl_bitmap_test(nodemap, k)) { + if (libxl_bitmap_test(&dom_nodemap, k) && + libxl_bitmap_test(nodemap, k)) { nr_vcpus++; break; } @@ -213,6 +230,7 @@ static int nodemap_to_nr_vcpus(libxl__gc libxl_vcpuinfo_list_free(vinfo, nr_dom_vcpus); } + libxl_bitmap_dispose(&dom_nodemap); libxl_bitmap_dispose(&vcpu_nodemap); libxl_dominfo_list_free(dinfo, nr_doms); return nr_vcpus;
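The counting rule this hunk implements — a vcpu is "bound" to a candidate only if the intersection of its own nodemap, its domain's node-affinity and the candidate's nodemap is non-empty — reduces to a triple AND per vcpu. An illustrative bitmask sketch, not the libxl implementation:

```c
#include <assert.h>
#include <stdint.h>

/* For each vcpu, test (vcpu's nodemap & domain's node-affinity &
 * candidate's nodemap) != 0, so both vcpu-affinity and node-affinity
 * weigh in when ranking placement candidates. */
static int count_bound_vcpus(const uint8_t *vcpu_nodemaps, int nr_vcpus,
                             uint8_t dom_nodemap, uint8_t candidate)
{
    int j, n = 0;

    for (j = 0; j < nr_vcpus; j++)
        if (vcpu_nodemaps[j] & dom_nodemap & candidate)
            n++;
    return n;
}
```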
Dario Faggioli
2012-Oct-05 14:08 UTC
[PATCH 8 of 8] xl: add node-affinity to the output of `xl list`
Node-affinity is now something that is under (some) control of the user, so show it upon request as part of the output of `xl list''. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -2834,14 +2834,82 @@ out: } } -static void list_domains(int verbose, int context, const libxl_dominfo *info, int nb_domain) +static void print_bitmap(uint8_t *map, int maplen, FILE *stream, int cpu_node) +{ + int i; + uint8_t pmap = 0, bitmask = 0; + int firstset = 0, state = 0; + + for (i = 0; i < maplen; i++) { + if (i % 8 == 0) { + pmap = *map++; + bitmask = 1; + } else bitmask <<= 1; + + switch (state) { + case 0: + case 2: + if ((pmap & bitmask) != 0) { + firstset = i; + state++; + } + continue; + case 1: + case 3: + if ((pmap & bitmask) == 0) { + fprintf(stream, "%s%d", state > 1 ? "," : "", firstset); + if (i - 1 > firstset) + fprintf(stream, "-%d", i - 1); + state = 2; + } + continue; + } + } + switch (state) { + case 0: + fprintf(stream, "none"); + break; + case 2: + break; + case 1: + if (firstset == 0) { + fprintf(stream, cpu_node ? "any cpu" : "any node"); + break; + } + case 3: + fprintf(stream, "%s%d", state > 1 ? 
"," : "", firstset); + if (i - 1 > firstset) + fprintf(stream, "-%d", i - 1); + break; + } +} + +static void list_domains(int verbose, int context, int numa, const libxl_dominfo *info, int nb_domain) { int i; static const char shutdown_reason_letters[]= "-rscw"; + libxl_bitmap nodemap; + libxl_physinfo physinfo; + + libxl_bitmap_init(&nodemap); + libxl_physinfo_init(&physinfo); printf("Name ID Mem VCPUs\tState\tTime(s)"); if (verbose) printf(" UUID Reason-Code\tSecurity Label"); if (context && !verbose) printf(" Security Label"); + if (numa) { + if (libxl_node_bitmap_alloc(ctx, &nodemap, 0)) { + fprintf(stderr, "libxl_node_bitmap_alloc_failed.\n"); + exit(1); + } + if (libxl_get_physinfo(ctx, &physinfo) != 0) { + fprintf(stderr, "libxl_physinfo failed.\n"); + libxl_bitmap_dispose(&nodemap); + exit(1); + } + + printf(" NODE Affinity"); + } printf("\n"); for (i = 0; i < nb_domain; i++) { char *domname; @@ -2875,14 +2943,23 @@ static void list_domains(int verbose, in rc = libxl_flask_sid_to_context(ctx, info[i].ssidref, &buf, &size); if (rc < 0) - printf(" -"); + printf(" -"); else { - printf(" %s", buf); + printf(" %16s", buf); free(buf); } } + if (numa) { + libxl_domain_get_nodeaffinity(ctx, info[i].domid, &nodemap); + + putchar('' ''); + print_bitmap(nodemap.map, physinfo.nr_nodes, stdout, 0); + } putchar(''\n''); } + + libxl_bitmap_dispose(&nodemap); + libxl_physinfo_dispose(&physinfo); } static void list_vm(void) @@ -3724,12 +3801,14 @@ int main_list(int argc, char **argv) int opt, verbose = 0; int context = 0; int details = 0; + int numa = 0; int option_index = 0; static struct option long_options[] = { {"long", 0, 0, ''l''}, {"help", 0, 0, ''h''}, {"verbose", 0, 0, ''v''}, {"context", 0, 0, ''Z''}, + {"numa", 0, 0, ''n''}, {0, 0, 0, 0} }; @@ -3738,7 +3817,7 @@ int main_list(int argc, char **argv) int nb_domain, rc; while (1) { - opt = getopt_long(argc, argv, "lvhZ", long_options, &option_index); + opt = getopt_long(argc, argv, "lvhZn", long_options, 
&option_index); if (opt == -1) break; @@ -3755,6 +3834,9 @@ int main_list(int argc, char **argv) case ''Z'': context = 1; break; + case ''n'': + numa = 1; + break; default: fprintf(stderr, "option `%c'' not supported.\n", optopt); break; @@ -3790,7 +3872,7 @@ int main_list(int argc, char **argv) if (details) list_domains_details(info, nb_domain); else - list_domains(verbose, context, info, nb_domain); + list_domains(verbose, context, numa, info, nb_domain); if (info_free) libxl_dominfo_list_free(info, nb_domain); @@ -4062,56 +4144,6 @@ int main_button_press(int argc, char **a return 0; } -static void print_bitmap(uint8_t *map, int maplen, FILE *stream) -{ - int i; - uint8_t pmap = 0, bitmask = 0; - int firstset = 0, state = 0; - - for (i = 0; i < maplen; i++) { - if (i % 8 == 0) { - pmap = *map++; - bitmask = 1; - } else bitmask <<= 1; - - switch (state) { - case 0: - case 2: - if ((pmap & bitmask) != 0) { - firstset = i; - state++; - } - continue; - case 1: - case 3: - if ((pmap & bitmask) == 0) { - fprintf(stream, "%s%d", state > 1 ? "," : "", firstset); - if (i - 1 > firstset) - fprintf(stream, "-%d", i - 1); - state = 2; - } - continue; - } - } - switch (state) { - case 0: - fprintf(stream, "none"); - break; - case 2: - break; - case 1: - if (firstset == 0) { - fprintf(stream, "any cpu"); - break; - } - case 3: - fprintf(stream, "%s%d", state > 1 ? 
"," : "", firstset); - if (i - 1 > firstset) - fprintf(stream, "-%d", i - 1); - break; - } -} - static void print_vcpuinfo(uint32_t tdomid, const libxl_vcpuinfo *vcpuinfo, uint32_t nr_cpus) @@ -4135,7 +4167,7 @@ static void print_vcpuinfo(uint32_t tdom /* TIM */ printf("%9.1f ", ((float)vcpuinfo->vcpu_time / 1e9)); /* CPU AFFINITY */ - print_bitmap(vcpuinfo->cpumap.map, nr_cpus, stdout); + print_bitmap(vcpuinfo->cpumap.map, nr_cpus, stdout, 1); printf("\n"); } diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c --- a/tools/libxl/xl_cmdtable.c +++ b/tools/libxl/xl_cmdtable.c @@ -50,7 +50,8 @@ struct cmd_spec cmd_table[] = { "[options] [Domain]\n", "-l, --long Output all VM details\n" "-v, --verbose Prints out UUIDs and security context\n" - "-Z, --context Prints out security context" + "-Z, --context Prints out security context\n" + "-n, --numa Prints out NUMA node affinity" }, { "destroy", &main_destroy, 0, 1,
Jan Beulich
2012-Oct-05 14:25 UTC
Re: [PATCH 3 of 8] xen: let the (credit) scheduler know about `node affinity`
>>> On 05.10.12 at 16:08, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> @@ -287,22 +344,26 @@ static inline void
>      }
>      else
>      {
> -        cpumask_t idle_mask;
> +        cpumask_t idle_mask, balance_mask;

Be _very_ careful about adding on-stack CPU mask variables (also further
below): each one of them grows the stack frame by 512 bytes (when building
for the current maximum of 4095 CPUs), which is generally too much; you
may want to consider pre-allocated scratch space instead.

Jan
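Jan's 512-byte figure comes straight from the mask sizing: a bitmap wide enough for 4095 CPUs needs 64 64-bit words. A sketch of the arithmetic — these macros model, rather than reuse, Xen's `BITS_TO_LONGS`, and assume 64-bit longs:

```c
#include <assert.h>

#define NR_CPUS       4095   /* current Xen build-time maximum Jan cites */
#define BITS_PER_LONG 64

/* Round a bit count up to whole longs, as the cpumask allocation does. */
#define BITS_TO_LONGS(bits) \
    (((bits) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Bytes each on-stack cpumask_t-like variable would cost. */
enum { CPUMASK_BYTES = BITS_TO_LONGS(NR_CPUS) * (BITS_PER_LONG / 8) };
```

Hence every extra on-stack mask adds half a kilobyte of stack, which is why pre-allocated (e.g. per-CPU) scratch space is the usual alternative.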
Ian Jackson
2012-Oct-05 16:36 UTC
Re: [PATCH 8 of 8] xl: add node-affinity to the output of `xl list`
Dario Faggioli writes ("[PATCH 8 of 8] xl: add node-affinity to the output of `xl list`"):
> Node-affinity is now something that is under (some) control of the
> user, so show it upon request as part of the output of `xl list'.
...
> -static void list_domains(int verbose, int context, const libxl_dominfo *info, int nb_domain)
> +static void print_bitmap(uint8_t *map, int maplen, FILE *stream, int cpu_node)
> +{
> +    int i;
> +    uint8_t pmap = 0, bitmask = 0;
> +    int firstset = 0, state = 0;
> +
> +    for (i = 0; i < maplen; i++) {
> +        if (i % 8 == 0) {
> +            pmap = *map++;
> +            bitmask = 1;
> +        } else bitmask <<= 1;
> +
> +        switch (state) {
> +        case 0:
> +        case 2:
> +            if ((pmap & bitmask) != 0) {
> +                firstset = i;
> +                state++;
> +            }
> +            continue;
> +        case 1:
> +        case 3:
> +            if ((pmap & bitmask) == 0) {
> +                fprintf(stream, "%s%d", state > 1 ? "," : "", firstset);
> +                if (i - 1 > firstset)
> +                    fprintf(stream, "-%d", i - 1);
> +                state = 2;
> +            }
> +            continue;
> +        }
> +    }

Is this business with a state variable really the least opaque way of
writing this? Oh, I see you're just moving it about. Oh well..

Ian.
Dan Magenheimer
2012-Oct-08 19:43 UTC
Re: [PATCH 0 of 8] NUMA Awareness for the Credit Scheduler
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Friday, October 05, 2012 8:08 AM
> To: xen-devel@lists.xen.org
> Cc: Andre Przywara; Ian Campbell; Anil Madhavapeddy; George Dunlap;
> Andrew Cooper; Juergen Gross; Ian Jackson; Jan Beulich; Marcus Granado;
> Daniel De Graaf; Matt Wilson
> Subject: [Xen-devel] [PATCH 0 of 8] NUMA Awareness for the Credit Scheduler
>
> Hi Everyone,
>
> Here it comes a patch series instilling some NUMA awareness in the Credit
> scheduler.

Hi Dario --

Just wondering... is the NUMA information preserved on live migration?
I'm not saying that it necessarily should, but it may just work due to
the implementation (since migration is a form of domain creation).

In either case, it might be good to comment about live migration on your
wiki.

Thanks,
Dan
Juergen Gross
2012-Oct-09 09:53 UTC
Re: [PATCH 3 of 8] xen: let the (credit) scheduler know about `node affinity`
On 05.10.2012 16:08, Dario Faggioli wrote:
> As vcpu-affinity tells where vcpus can run, node-affinity tells
> where a domain's vcpus prefer to run. Respecting vcpu-affinity is
> the primary concern, but honouring node-affinity will likely
> result in some performance benefit.
>
> This change modifies the vcpu load balancing algorithm (for the
> credit scheduler only), introducing a two-step logic.
> During the first step, we use the node-affinity mask. The aim is
> giving precedence to the CPUs where it is known to be preferable
> for the domain to run. If that fails to find a valid CPU, the
> node-affinity is just ignored and, in the second step, we fall
> back to using cpu-affinity only.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
...
>  static int
>  _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
>  {
> -    cpumask_t cpus;
> +    cpumask_t cpus, start_cpus;
>      cpumask_t idlers;
>      cpumask_t *online;
> +    struct csched_dom *sdom = CSCHED_DOM(vc->domain);
>      struct csched_pcpu *spc = NULL;
>      int cpu;
>
>      /*
> -     * Pick from online CPUs in VCPU's affinity mask, giving a
> -     * preference to its current processor if it's in there.
> +     * Pick an online CPU from the && of vcpu-affinity and node-affinity
> +     * masks (if not empty, in which case only the vcpu-affinity mask is
> +     * used). Also, try to give a preference to its current processor if
> +     * it's in there.
>       */
>      online = cpupool_scheduler_cpumask(vc->domain->cpupool);
>      cpumask_and(&cpus, online, vc->cpu_affinity);
> -    cpu = cpumask_test_cpu(vc->processor, &cpus)
> +    cpumask_and(&start_cpus, &cpus, sdom->node_affinity_cpumask);
> +    if ( unlikely(cpumask_empty(&start_cpus)) )
> +        cpumask_copy(&start_cpus, &cpus);
> +    cpu = cpumask_test_cpu(vc->processor, &start_cpus)
>          ? vc->processor
> -        : cpumask_cycle(vc->processor, &cpus);
> +        : cpumask_cycle(vc->processor, &start_cpus);
>      ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) );

Shouldn't the ASSERT be changed to start_cpus, too?

Juergen

--
Juergen Gross                    Principal Developer Operating Systems
PBG PDG ES&S SWE OS6                 Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions         e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                        Internet: ts.fujitsu.com
D-80807 Muenchen                     Company details: ts.fujitsu.com/imprint.html
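The two-step mask narrowing quoted above can be sketched with plain 64-bit masks. This is a simplified stand-in for Xen's cpumask_t: pick_cpu() and its first-set-bit fallback are illustrative, not the scheduler's actual API.

```c
#include <stdint.h>

/* Simplified model: each bit of a uint64_t stands for one pCPU. */
static int pick_cpu(uint64_t online, uint64_t cpu_affinity,
                    uint64_t node_affinity, int current_cpu)
{
    uint64_t cpus = online & cpu_affinity;   /* hard constraint           */
    uint64_t start = cpus & node_affinity;   /* preferred (soft) subset   */

    /* Empty intersection: ignore node-affinity, use vcpu-affinity only. */
    if (start == 0)
        start = cpus;

    /* Prefer the current processor if it is in the narrowed mask. */
    if (start & (1ULL << current_cpu))
        return current_cpu;

    /* Otherwise take the first set bit (stand-in for cpumask_cycle()). */
    for (int c = 0; c < 64; c++)
        if (start & (1ULL << c))
            return c;
    return -1;  /* cannot happen in the real scheduler: cpus is non-empty */
}
```

Note that Juergen's point holds in this model too: the final sanity check should test membership in the narrowed mask (`start`), not just the vcpu-affinity mask.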
Juergen Gross
2012-Oct-09 10:02 UTC
Re: [PATCH 0 of 8] NUMA Awareness for the Credit Scheduler
On 05.10.2012 16:08, Dario Faggioli wrote:
> Hi Everyone,
>
> Here it comes a patch series instilling some NUMA awareness in the Credit
> scheduler.
> [...]
> A little bit more about the series:
>
>  1/8 xen, libxc: rename xenctl_cpumap to xenctl_bitmap
>  2/8 xen, libxc: introduce node maps and masks
>
> Is some preparation work.
>
>  3/8 xen: let the (credit) scheduler know about `node affinity`
>
> Is where the vcpu load balancing logic of the credit scheduler is modified to
> support node-affinity.
>
>  4/8 xen: allow for explicitly specifying node-affinity
>  5/8 libxc: allow for explicitly specifying node-affinity
>  6/8 libxl: allow for explicitly specifying node-affinity
>  7/8 libxl: automatic placement deals with node-affinity
>
> Is what wires the in-scheduler node-affinity support with the external world.
> Please note that patch 4 touches XSM and Flask, which is the area with which I
> have less experience and less chance to test properly. So, if Daniel and/or
> anyone interested in that could take a look and comment, that would be awesome.
>
>  8/8 xl: report node-affinity for domains
>
> Is just some small output enhancement.

Apart from the minor comment to Patch 3:

Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
Dario Faggioli
2012-Oct-09 10:21 UTC
Re: [PATCH 3 of 8] xen: let the (credit) scheduler know about `node affinity`
On Tue, 2012-10-09 at 11:53 +0200, Juergen Gross wrote:
> [...]
> >      online = cpupool_scheduler_cpumask(vc->domain->cpupool);
> >      cpumask_and(&cpus, online, vc->cpu_affinity);
> > -    cpu = cpumask_test_cpu(vc->processor, &cpus)
> > +    cpumask_and(&start_cpus, &cpus, sdom->node_affinity_cpumask);
> > +    if ( unlikely(cpumask_empty(&start_cpus)) )
> > +        cpumask_copy(&start_cpus, &cpus);
> > +    cpu = cpumask_test_cpu(vc->processor, &start_cpus)
> >          ? vc->processor
> > -        : cpumask_cycle(vc->processor, &cpus);
> > +        : cpumask_cycle(vc->processor, &start_cpus);
> >      ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) );
>
> Shouldn't the ASSERT be changed to start_cpus, too?

Well, it seems it definitely should, and I seem to have missed that!

Thanks a lot,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Dario Faggioli
2012-Oct-09 10:29 UTC
Re: [PATCH 3 of 8] xen: let the (credit) scheduler know about `node affinity`
On Fri, 2012-10-05 at 15:25 +0100, Jan Beulich wrote:
> >>> On 05.10.12 at 16:08, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> > @@ -287,22 +344,26 @@ static inline void
> >      }
> >      else
> >      {
> > -        cpumask_t idle_mask;
> > +        cpumask_t idle_mask, balance_mask;
>
> Be _very_ careful about adding on-stack CPU mask variables
> (also further below): each one of them grows the stack frame
> by 512 bytes (when building for the current maximum of 4095
> CPUs), which is generally too much; you may want to consider
> pre-allocated scratch space instead.

I see your point, and I think you're right... I wasn't "thinking that
big". :-)

I'll look into all of these situations and see if I can move the masks
off the stack. Any preference between global variables and members of
one of the scheduler's data structures?

Thanks and Regards,
Dario
Dario Faggioli
2012-Oct-09 10:45 UTC
Re: [PATCH 0 of 8] NUMA Awareness for the Credit Scheduler
On Mon, 2012-10-08 at 12:43 -0700, Dan Magenheimer wrote:
> Just wondering... is the NUMA information preserved on live migration?
> I'm not saying that it necessarily should, but it may just work
> due to the implementation (since migration is a form of domain creation).

What could I say... yes, but "preserved" is not the right word. :-)

In fact, something happens when you migrate a VM. As you said, migration
is a special case of domain creation, so the placement algorithm will
trigger as part of the process of creating the target VM (unless you
override the relevant options in the config file during migration
itself). That means the target VM will be placed on one (or some) node(s)
of the target host, and its node-affinity will be set accordingly.

_However_, there is right now no guarantee that the final decision of the
placement algorithm on the target machine will be "compatible" with the
one made on the source machine at initial VM creation time. For instance,
if your VM fits in just one node and is placed there on machine A, it
could well end up being split across two or more nodes when migrated to
machine B (and, of course, vice versa).

Whether that is acceptable or not is of course debatable, and we had a
bit of this discussion already (although no real conclusion has been
reached yet). My take is that, right now, since we do not yet expose any
virtual NUMA topology to the VM itself, the behaviour described above is
fine. As soon as we have some guest NUMA awareness, it might be
worthwhile to try to preserve it, at least to some extent.

Oh, and BTW, I'm of course talking about migration with xl and libxl. If
you use other toolstacks, the hypervisor will default to its current
(_without_ this series) behaviour, and it all will depend on who calls
xc_domain_node_setaffinity() — or, perhaps, XEN_DOMCTL_setnodeaffinity
directly — and when.

> In either case, it might be good to comment about live migration
> on your wiki.

That is definitely a good point; I will put something there about
migration and the behaviour described above.

Thanks and Regards,
Dario
Dario Faggioli
2012-Oct-09 11:07 UTC
Re: [PATCH 8 of 8] xl: add node-affinity to the output of `xl list`
On Fri, 2012-10-05 at 17:36 +0100, Ian Jackson wrote:
> > +static void print_bitmap(uint8_t *map, int maplen, FILE *stream, int cpu_node)
> > +{
> > +    int i;
> > +    uint8_t pmap = 0, bitmask = 0;
> > +    int firstset = 0, state = 0;
> > +
> ...
>
> Is this business with a state variable really the least opaque way of
> writing this ?  Oh I see you're just moving it about.  Oh well..

I don't think it's particularly opaque and, yes, I'm mostly moving that
print_bitmap function up in the file, but the new state variable is
mine (guilty as charged :-D).

Honestly, despite the fact that the function is called print_bitmap(),
it contains the following code:

    case 1:
        if (firstset == 0) {
            fprintf(stream, "any cpu");
            break;
        }
    case 3:

Which is what made me think that opacity was not its first concern in
the first place, and that making it less opaque was none of this
change's business. :-)

However, I see your point... Perhaps I can add two functions (something
like print_{cpumap,nodemap}()), both calling the original
print_bitmap(), and deal with the "any {cpu,node}" case within them...

Do you like that better?

Thanks and Regards,
Dario
Keir Fraser
2012-Oct-09 11:10 UTC
Re: [PATCH 3 of 8] xen: let the (credit) scheduler know about `node affinity`
On 09/10/2012 11:29, "Dario Faggioli" <dario.faggioli@citrix.com> wrote:
> On Fri, 2012-10-05 at 15:25 +0100, Jan Beulich wrote:
> > Be _very_ careful about adding on-stack CPU mask variables
> > (also further below): each one of them grows the stack frame
> > by 512 bytes (when building for the current maximum of 4095
> > CPUs), which is generally too much; you may want to consider
> > pre-allocated scratch space instead.
>
> I see your point, and I think you're right... I wasn't "thinking that
> big". :-)
>
> I'll look into all of these situations and see if I can move the masks
> off the stack. Any preference between global variables and members of
> one of the scheduler's data structures?

Since multiple instances of the scheduler can be active, across multiple
cpu pools, surely they have to be allocated in the per-scheduler-instance
structures? Or dynamically xmalloc'ed just in the scope they are needed.

 -- Keir
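Keir's suggestion — per-scheduler-instance scratch space instead of large on-stack masks — can be sketched as follows. The struct and field names here are made up for illustration; real Xen code would also need to consider locking (or per-CPU scratch areas), since several CPUs can run the balancing code concurrently.

```c
#include <stdlib.h>

/* At NR_CPUS = 4095, a cpumask is 512 bytes: too large for the stack. */
#define MASK_BYTES 512

/* Hypothetical per-scheduler-instance private data: one scheduler
 * instance exists per cpupool, so the scratch mask lives with it. */
struct sched_private {
    unsigned char *scratch_mask;  /* pre-allocated, reused every balance pass */
};

static int sched_init(struct sched_private *prv)
{
    prv->scratch_mask = calloc(1, MASK_BYTES);
    return prv->scratch_mask ? 0 : -1;
}

/* Illustrative balancing step: intersect idlers with an affinity mask
 * without declaring a 512-byte local variable. */
static void balance_step(struct sched_private *prv,
                         const unsigned char *idlers,
                         const unsigned char *affinity)
{
    for (int i = 0; i < MASK_BYTES; i++)
        prv->scratch_mask[i] = idlers[i] & affinity[i];
}

static void sched_deinit(struct sched_private *prv)
{
    free(prv->scratch_mask);
    prv->scratch_mask = NULL;
}
```

The alternative Keir mentions — xmalloc'ing the mask in the scope where it is needed — trades the per-instance memory for an allocation on every balancing pass.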
Ian Jackson
2012-Oct-09 15:03 UTC
Re: [PATCH 8 of 8] xl: add node-affinity to the output of `xl list`
Dario Faggioli writes ("Re: [Xen-devel] [PATCH 8 of 8] xl: add node-affinity to the output of `xl list`"):
> Honestly, despite the fact that the function is called print_bitmap(),
> it contains the following code:
>
>     case 1:
>         if (firstset == 0) {
>             fprintf(stream, "any cpu");
>             break;
>         }
>     case 3:

Uh, yes, I see what you mean.

> Which is what made me think that opacity was not its first concern in
> the first place, and that making it less opaque was none of this
> change's business. :-)

You are right that since you're just moving the code, it's not a
problem for this patch.

> However, I see your point... Perhaps I can add two functions (something
> like print_{cpumap,nodemap}()), both calling the original
> print_bitmap(), and deal with the "any {cpu,node}" case within them...
>
> Do you like that better?

That would indeed be an improvement.

Ian.
George Dunlap
2012-Oct-09 15:59 UTC
Re: [PATCH 1 of 8] xen, libxc: rename xenctl_cpumap to xenctl_bitmap
On Fri, Oct 5, 2012 at 3:08 PM, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> More specifically:
>  1. replaces xenctl_cpumap with xenctl_bitmap
>  2. provides bitmap_to_xenctl_bitmap and the reverse;
>  3. re-implement cpumask_to_xenctl_bitmap with
>     bitmap_to_xenctl_bitmap and the reverse;
>
> Other than #3, no functional changes. Interface only slightly
> affected.
>
> This is in preparation of introducing NUMA node-affinity maps.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

> [...]
> diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
> --- a/xen/include/public/xen.h
> +++ b/xen/include/public/xen.h
> @@ -820,9 +820,9 @@ typedef uint8_t xen_domain_handle_t[16];
>  #endif
>
>  #ifndef __ASSEMBLY__
> -struct xenctl_cpumap {
> +struct xenctl_bitmap {
>      XEN_GUEST_HANDLE_64(uint8) bitmap;
> -    uint32_t nr_cpus;
> +    uint32_t nr_elems;
>  };
>  #endif
>
> diff --git a/xen/include/xen/cpumask.h b/xen/include/xen/cpumask.h
> --- a/xen/include/xen/cpumask.h
> +++ b/xen/include/xen/cpumask.h
> @@ -424,8 +424,8 @@ extern cpumask_t cpu_present_map;
>  #define for_each_present_cpu(cpu)  for_each_cpu(cpu, &cpu_present_map)
>
>  /* Copy to/from cpumap provided by control tools. */
> -struct xenctl_cpumap;
> -int cpumask_to_xenctl_cpumap(struct xenctl_cpumap *, const cpumask_t *);
> -int xenctl_cpumap_to_cpumask(cpumask_var_t *, const struct xenctl_cpumap *);
> +struct xenctl_bitmap;
> +int cpumask_to_xenctl_bitmap(struct xenctl_bitmap *, const cpumask_t *);
> +int xenctl_bitmap_to_cpumask(cpumask_var_t *, const struct xenctl_bitmap *);
>
>  #endif /* __XEN_CPUMASK_H */
>
> [rest of the quoted patch — the mechanical nr_cpus -> nr_elems and
> xenctl_cpumap -> xenctl_bitmap renames across libxc and xen/common —
> trimmed]
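The conversion helpers in this patch compute the guest-side byte count as `(nr_elems + 7) / 8` and clear the unused high bits of the last byte when the element count is not a multiple of 8. That arithmetic can be isolated in a small standalone sketch (these are free-standing helpers written for illustration, not Xen's actual functions):

```c
#include <stdint.h>

/* Bytes needed to hold nr_elems bits, rounding up. */
static unsigned int bytes_for(unsigned int nr_elems)
{
    return (nr_elems + 7) / 8;
}

/* If nr_elems is not a multiple of 8, the last byte of the bytemap has
 * (8 - nr_elems % 8) bits that do not correspond to any element; clear
 * them so stale guest data cannot leak into the kernel-side bitmap. */
static void clamp_last_byte(uint8_t *bytemap, unsigned int nr_elems)
{
    if (nr_elems & 7)
        bytemap[bytes_for(nr_elems) - 1] &= ~(0xff << (nr_elems & 7));
}
```

For example, a 10-element bitmap occupies 2 bytes, and only the low 2 bits of the second byte are meaningful.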
George Dunlap
2012-Oct-09 15:59 UTC
Re: [PATCH 2 of 8] xen, libxc: introduce node maps and masks
On Fri, Oct 5, 2012 at 3:08 PM, Dario Faggioli <dario.faggioli@citrix.com> wrote:> Following suit from cpumap and cpumask implementations. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>Acked-by: George Dunlap <george.dunlap@eu.citrix.com>> > diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c > --- a/tools/libxc/xc_misc.c > +++ b/tools/libxc/xc_misc.c > @@ -54,6 +54,11 @@ int xc_get_cpumap_size(xc_interface *xch > return (xc_get_max_cpus(xch) + 7) / 8; > } > > +int xc_get_nodemap_size(xc_interface *xch) > +{ > + return (xc_get_max_nodes(xch) + 7) / 8; > +} > + > xc_cpumap_t xc_cpumap_alloc(xc_interface *xch) > { > int sz; > @@ -64,6 +69,16 @@ xc_cpumap_t xc_cpumap_alloc(xc_interface > return calloc(1, sz); > } > > +xc_nodemap_t xc_nodemap_alloc(xc_interface *xch) > +{ > + int sz; > + > + sz = xc_get_nodemap_size(xch); > + if (sz == 0) > + return NULL; > + return calloc(1, sz); > +} > + > int xc_readconsolering(xc_interface *xch, > char *buffer, > unsigned int *pnr_chars, > diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h > --- a/tools/libxc/xenctrl.h > +++ b/tools/libxc/xenctrl.h > @@ -330,12 +330,20 @@ int xc_get_cpumap_size(xc_interface *xch > /* allocate a cpumap */ > xc_cpumap_t xc_cpumap_alloc(xc_interface *xch); > > - /* > +/* > * NODEMAP handling > */ > +typedef uint8_t *xc_nodemap_t; > + > /* return maximum number of NUMA nodes the hypervisor supports */ > int xc_get_max_nodes(xc_interface *xch); > > +/* return array size for nodemap */ > +int xc_get_nodemap_size(xc_interface *xch); > + > +/* allocate a nodemap */ > +xc_nodemap_t xc_nodemap_alloc(xc_interface *xch); > + > /* > * DOMAIN DEBUGGING FUNCTIONS > */ > diff --git a/xen/common/domctl.c b/xen/common/domctl.c > --- a/xen/common/domctl.c > +++ b/xen/common/domctl.c > @@ -118,6 +118,30 @@ int xenctl_bitmap_to_cpumask(cpumask_var > return err; > } > > +int nodemask_to_xenctl_bitmap(struct xenctl_bitmap *xenctl_nodemap, > + const nodemask_t *nodemask) > +{ > + return 
bitmap_to_xenctl_bitmap(xenctl_nodemap, cpumask_bits(nodemask), > + MAX_NUMNODES); > +} > + > +int xenctl_bitmap_to_nodemask(nodemask_t *nodemask, > + const struct xenctl_bitmap *xenctl_nodemap) > +{ > + int err = 0; > + > + if ( alloc_nodemask_var(nodemask) ) { > + err = xenctl_bitmap_to_bitmap(nodes_addr(*nodemask), xenctl_nodemap, > + MAX_NUMNODES); > + if ( err ) > + free_nodemask_var(*nodemask); > + } > + else > + err = -ENOMEM; > + > + return err; > +} > + > static inline int is_free_domid(domid_t dom) > { > struct domain *d; > diff --git a/xen/include/xen/nodemask.h b/xen/include/xen/nodemask.h > --- a/xen/include/xen/nodemask.h > +++ b/xen/include/xen/nodemask.h > @@ -298,6 +298,53 @@ static inline int __nodemask_parse(const > } > #endif > > +/* > + * nodemask_var_t: struct nodemask for stack usage. > + * > + * See definition of cpumask_var_t in include/xen//cpumask.h. > + */ > +#if MAX_NUMNODES > 2 * BITS_PER_LONG > +#include <xen/xmalloc.h> > + > +typedef nodemask_t *nodemask_var_t; > + > +#define nr_nodemask_bits (BITS_TO_LONGS(MAX_NUMNODES) * BITS_PER_LONG) > + > +static inline bool_t alloc_nodemask_var(nodemask_var_t *mask) > +{ > + *(void **)mask = _xmalloc(nr_nodemask_bits / 8, sizeof(long)); > + return *mask != NULL; > +} > + > +static inline bool_t zalloc_nodemask_var(nodemask_var_t *mask) > +{ > + *(void **)mask = _xzalloc(nr_nodemask_bits / 8, sizeof(long)); > + return *mask != NULL; > +} > + > +static inline void free_nodemask_var(nodemask_var_t mask) > +{ > + xfree(mask); > +} > +#else > +typedef nodemask_t nodemask_var_t; > + > +static inline bool_t alloc_nodemask_var(nodemask_var_t *mask) > +{ > + return 1; > +} > + > +static inline bool_t zalloc_nodemask_var(nodemask_var_t *mask) > +{ > + nodes_clear(*mask); > + return 1; > +} > + > +static inline void free_nodemask_var(nodemask_var_t mask) > +{ > +} > +#endif > + > #if MAX_NUMNODES > 1 > #define for_each_node_mask(node, mask) \ > for ((node) = first_node(mask); \ > > 
_______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
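The `(xc_get_max_cpus(xch) + 7) / 8` and `(xc_get_max_nodes(xch) + 7) / 8` expressions in the hunk above are the usual bits-to-bytes rounding: a bitmap of N bits needs ceil(N/8) bytes. A minimal standalone sketch of that arithmetic (the helper name is illustrative, not the libxc function itself):

```c
#include <assert.h>

/* Bytes needed to hold a bitmap of nbits bits, rounded up to whole
 * bytes -- the same computation as xc_get_cpumap_size() and the new
 * xc_get_nodemap_size() in the patch above. */
static int bitmap_size_bytes(int nbits)
{
    return (nbits + 7) / 8;
}
```

Note that a result of 0 (when the hypervisor reports no cpus/nodes, e.g. on error) is what makes the `sz == 0` check in `xc_nodemap_alloc()` return NULL instead of calling `calloc(1, 0)`.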
George Dunlap
2012-Oct-09 16:29 UTC
Re: [PATCH 3 of 8] xen: let the (credit) scheduler know about `node affinity`
On Fri, Oct 5, 2012 at 3:08 PM, Dario Faggioli <dario.faggioli@citrix.com> wrote:> As vcpu-affinity tells where vcpus can run, node-affinity tells > where a domain''s vcpus prefer to run. Respecting vcpu-affinity is > the primary concern, but honouring node-affinity will likely > result in some performances benefit. > > This change modifies the vcpu load balancing algorithm (for the > credit scheduler only), introducing a two steps logic. > During the first step, we use the node-affinity mask. The aim is > giving precedence to the CPUs where it is known to be preferrable > for the domain to run. If that fails in finding a valid CPU, the > node-affinity is just ignored and, in the second step, we fall > back to using cpu-affinity only. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>Looking at the load-balancing code, it makes me think that there is probably some interesting work to do there in the future; but I think this patch can go in as it is for now. So (Once others'' comments are addressed) Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Do scroll down to read my comments on load balancing...> @@ -1211,30 +1289,48 @@ csched_runq_steal(int peer_cpu, int cpu, > */ > if ( peer_pcpu != NULL && !is_idle_vcpu(peer_vcpu) ) > { > - list_for_each( iter, &peer_pcpu->runq ) > + int balance_step; > + > + /* > + * Take node-affinity into account. That means, for all the vcpus > + * in peer_pcpu''s runq, check _first_ if their node-affinity allows > + * them to run on cpu. If not, retry the loop considering plain > + * vcpu-affinity. Also, notice that as soon as one vcpu is found, > + * balancing is considered done, and the vcpu is returned to the > + * caller. > + */ > + for_each_csched_balance_step(balance_step) > { > - speer = __runq_elem(iter); > + list_for_each( iter, &peer_pcpu->runq ) > + { > + cpumask_t balance_mask; > > - /* > - * If next available VCPU here is not of strictly higher > - * priority than ours, this PCPU is useless to us. 
> - */ > - if ( speer->pri <= pri ) > - break; > + speer = __runq_elem(iter); > > - /* Is this VCPU is runnable on our PCPU? */ > - vc = speer->vcpu; > - BUG_ON( is_idle_vcpu(vc) ); > + /* > + * If next available VCPU here is not of strictly higher > + * priority than ours, this PCPU is useless to us. > + */ > + if ( speer->pri <= pri ) > + break; > > - if (__csched_vcpu_is_migrateable(vc, cpu)) > - { > - /* We got a candidate. Grab it! */ > - CSCHED_VCPU_STAT_CRANK(speer, migrate_q); > - CSCHED_STAT_CRANK(migrate_queued); > - WARN_ON(vc->is_urgent); > - __runq_remove(speer); > - vc->processor = cpu; > - return speer; > + /* Is this VCPU runnable on our PCPU? */ > + vc = speer->vcpu; > + BUG_ON( is_idle_vcpu(vc) ); > + > + if ( csched_balance_cpumask(vc, balance_step, &balance_mask) ) > + continue;This will have the effect that a vcpu with any node affinity at all will be stolen before a vcpu with no node affinity: i.e., if you have a system with 4 nodes, and one vcpu has an affinity to nodes 1-2-3, another has affinity with only 1, and another has an affinity to all 4, the one which has an affinity to all 4 will be passed over the first round, while either of the first ones might be nabbed (depending on what pcpu they''re on). Furthermore, the effect of this whole thing (if I''m reading it right) will be to go through *each runqueue* twice, rather than checking all cpus for vcpus with good node affinity, and then all cpus for vcpus with good cpumasks. It seems like it would be better to check: * This node for node-affine work to steal * Other nodes for node-affine work to steal * All nodes for cpu-affine work to steal. Ideally, the search would terminate fairly quickly with the first set, meaning that in the common case we never even check other nodes. Going through the cpu list twice means trying to grab the scheduler lock for each cpu twice; but hopefully that would be made up for by having a shorter list. Thoughts? 
Like I said, I think this is something to put on our to-do list; this patch should go in so we can start testing it as soon as possible. -George> + > + if (__csched_vcpu_is_migrateable(vc, cpu, &balance_mask)) > + { > + /* We got a candidate. Grab it! */ > + CSCHED_VCPU_STAT_CRANK(speer, migrate_q); > + CSCHED_STAT_CRANK(migrate_queued); > + WARN_ON(vc->is_urgent); > + __runq_remove(speer); > + vc->processor = cpu; > + return speer; > + } > } > } > } > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
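The two-step logic George is discussing can be modeled in miniature: pass 0 only considers vcpus whose node-affinity admits the stealing cpu, pass 1 falls back to plain vcpu-affinity. This is a toy sketch under simplified assumptions (8-bit masks standing in for cpumasks, a flat array standing in for the runqueue; none of these names are the Xen types):

```c
#include <assert.h>
#include <stdint.h>

struct toy_vcpu {
    uint8_t cpu_affinity;   /* bit i set => vcpu may run on cpu i      */
    uint8_t node_affinity;  /* bit i set => vcpu prefers to run there  */
};

/* Return the index of the first stealable vcpu in runq q for 'cpu',
 * or -1 if none.  Step 0 restricts the search to node-affine vcpus;
 * step 1 retries with vcpu-affinity only, mirroring the
 * for_each_csched_balance_step() loop in the patch. */
static int pick_vcpu(const struct toy_vcpu *q, int n, int cpu)
{
    for (int step = 0; step < 2; step++) {
        for (int i = 0; i < n; i++) {
            uint8_t mask = q[i].cpu_affinity;
            if (step == 0)
                mask &= q[i].node_affinity;  /* node-affinity pass */
            if (mask & (1u << cpu))
                return i;                    /* first hit wins */
        }
    }
    return -1;
}
```

This also makes George's observation concrete: pass 0 returns the first vcpu with *any* node-affinity match on this cpu, so a vcpu with a narrow affinity can be grabbed ahead of a possibly better candidate later in the queue, and each runqueue is walked twice in the worst case.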
George Dunlap
2012-Oct-09 16:47 UTC
Re: [PATCH 4 of 8] xen: allow for explicitly specifying node-affinity
On Fri, Oct 5, 2012 at 3:08 PM, Dario Faggioli <dario.faggioli@citrix.com> wrote:> Make it possible to pass the node-affinity of a domain to the hypervisor > from the upper layers, instead of always being computed automatically. > > Note that this also required generalizing the Flask hooks for setting > and getting the affinity, so that they now deal with both vcpu and > node affinity. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > > diff --git a/xen/common/domain.c b/xen/common/domain.c > --- a/xen/common/domain.c > +++ b/xen/common/domain.c > @@ -222,6 +222,7 @@ struct domain *domain_create( > > spin_lock_init(&d->node_affinity_lock); > d->node_affinity = NODE_MASK_ALL; > + d->auto_node_affinity = 1; > > spin_lock_init(&d->shutdown_lock); > d->shutdown_code = -1; > @@ -362,11 +363,26 @@ void domain_update_node_affinity(struct > cpumask_or(cpumask, cpumask, online_affinity); > } > > - for_each_online_node ( node ) > - if ( cpumask_intersects(&node_to_cpumask(node), cpumask) ) > - node_set(node, nodemask); > + if ( d->auto_node_affinity ) > + { > + /* Node-affinity is automaically computed from all vcpu-affinities */ > + for_each_online_node ( node ) > + if ( cpumask_intersects(&node_to_cpumask(node), cpumask) ) > + node_set(node, nodemask); > > - d->node_affinity = nodemask; > + d->node_affinity = nodemask; > + } > + else > + { > + /* Node-affinity is provided by someone else, just filter out cpus > + * that are either offline or not in the affinity of any vcpus. 
*/ > + for_each_node_mask ( node, d->node_affinity ) > + if ( !cpumask_intersects(&node_to_cpumask(node), cpumask) ) > + node_clear(node, d->node_affinity); > + } > + > + sched_set_node_affinity(d, &d->node_affinity); > + > spin_unlock(&d->node_affinity_lock); > > free_cpumask_var(online_affinity); > @@ -374,6 +390,36 @@ void domain_update_node_affinity(struct > } > > > +int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity) > +{ > + /* Being affine with no nodes is just wrong */ > + if ( nodes_empty(*affinity) ) > + return -EINVAL; > + > + spin_lock(&d->node_affinity_lock); > + > + /* > + * Being/becoming explicitly affine to all nodes is not particularly > + * useful. Let''s take it as the `reset node affinity` command. > + */ > + if ( nodes_full(*affinity) ) > + { > + d->auto_node_affinity = 1; > + goto out; > + } > + > + d->auto_node_affinity = 0; > + d->node_affinity = *affinity; > + > +out: > + spin_unlock(&d->node_affinity_lock); > + > + domain_update_node_affinity(d); > + > + return 0; > +} > + > + > struct domain *get_domain_by_id(domid_t dom) > { > struct domain *d; > diff --git a/xen/common/domctl.c b/xen/common/domctl.c > --- a/xen/common/domctl.c > +++ b/xen/common/domctl.c > @@ -642,6 +642,40 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc > } > break; > > + case XEN_DOMCTL_setnodeaffinity: > + case XEN_DOMCTL_getnodeaffinity: > + { > + domid_t dom = op->domain; > + struct domain *d = rcu_lock_domain_by_id(dom); > + > + ret = -ESRCH; > + if ( d == NULL ) > + break; > + > + ret = xsm_nodeaffinity(op->cmd, d); > + if ( ret ) > + goto nodeaffinity_out; > + > + if ( op->cmd == XEN_DOMCTL_setnodeaffinity ) > + { > + nodemask_t new_affinity; > + > + ret = xenctl_bitmap_to_nodemask(&new_affinity, > + &op->u.nodeaffinity.nodemap); > + if ( !ret ) > + ret = domain_set_node_affinity(d, &new_affinity); > + } > + else > + { > + ret = nodemask_to_xenctl_bitmap(&op->u.nodeaffinity.nodemap, > + &d->node_affinity); > + } > + > + nodeaffinity_out: > + 
rcu_unlock_domain(d); > + } > + break; > + > case XEN_DOMCTL_setvcpuaffinity: > case XEN_DOMCTL_getvcpuaffinity: > { > diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c > --- a/xen/common/keyhandler.c > +++ b/xen/common/keyhandler.c > @@ -217,6 +217,14 @@ static void cpuset_print(char *set, int > *set++ = ''\0''; > } > > +static void nodeset_print(char *set, int size, const nodemask_t *mask) > +{ > + *set++ = ''[''; > + set += nodelist_scnprintf(set, size-2, mask); > + *set++ = '']''; > + *set++ = ''\0''; > +} > + > static void periodic_timer_print(char *str, int size, uint64_t period) > { > if ( period == 0 ) > @@ -272,6 +280,9 @@ static void dump_domains(unsigned char k > > dump_pageframe_info(d); > > + nodeset_print(tmpstr, sizeof(tmpstr), &d->node_affinity); > + printk("NODE affinity for domain %d: %s\n", d->domain_id, tmpstr); > + > printk("VCPU information and callbacks for domain %u:\n", > d->domain_id); > for_each_vcpu ( d, v ) > diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c > --- a/xen/common/sched_credit.c > +++ b/xen/common/sched_credit.c > @@ -238,6 +238,33 @@ static inline void > list_del_init(&svc->runq_elem); > } > > +/* > + * Translates node-affinity mask into a cpumask, so that we can use it during > + * actual scheduling. That of course will contain all the cpus from all the > + * set nodes in the original node-affinity mask. > + * > + * Note that any serialization needed to access mask safely is complete > + * responsibility of the caller of this function/hook. 
> + */ > +static void csched_set_node_affinity( > + const struct scheduler *ops, > + struct domain *d, > + nodemask_t *mask) > +{ > + struct csched_dom *sdom; > + int node; > + > + /* Skip idle domain since it doesn''t even have a node_affinity_cpumask */ > + if ( unlikely(is_idle_domain(d)) ) > + return; > + > + sdom = CSCHED_DOM(d); > + cpumask_clear(sdom->node_affinity_cpumask); > + for_each_node_mask( node, *mask ) > + cpumask_or(sdom->node_affinity_cpumask, sdom->node_affinity_cpumask, > + &node_to_cpumask(node)); > +} > + > #define for_each_csched_balance_step(__step) \ > for ( (__step) = CSCHED_BALANCE_LAST; (__step) >= 0; (__step)-- ) > > @@ -260,7 +287,8 @@ csched_balance_cpumask(const struct vcpu > struct domain *d = vc->domain; > struct csched_dom *sdom = CSCHED_DOM(d); > > - if ( cpumask_full(sdom->node_affinity_cpumask) ) > + if ( cpumask_full(sdom->node_affinity_cpumask) || > + d->auto_node_affinity == 1 ) > return -1; > > cpumask_and(mask, sdom->node_affinity_cpumask, vc->cpu_affinity); > @@ -1786,6 +1814,8 @@ const struct scheduler sched_credit_def > .adjust = csched_dom_cntl, > .adjust_global = csched_sys_cntl, > > + .set_node_affinity = csched_set_node_affinity, > + > .pick_cpu = csched_cpu_pick, > .do_schedule = csched_schedule, > > diff --git a/xen/common/schedule.c b/xen/common/schedule.c > --- a/xen/common/schedule.c > +++ b/xen/common/schedule.c > @@ -588,6 +588,11 @@ int cpu_disable_scheduler(unsigned int c > return ret; > } > > +void sched_set_node_affinity(struct domain *d, nodemask_t *mask) > +{ > + SCHED_OP(DOM2OP(d), set_node_affinity, d, mask); > +} > + > int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity) > { > cpumask_t online_affinity; > diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h > --- a/xen/include/public/domctl.h > +++ b/xen/include/public/domctl.h > @@ -279,6 +279,16 @@ typedef struct xen_domctl_getvcpuinfo xe > DEFINE_XEN_GUEST_HANDLE(xen_domctl_getvcpuinfo_t); > > > +/* Get/set the NUMA 
node(s) with which the guest has affinity with. */ > +/* XEN_DOMCTL_setnodeaffinity */ > +/* XEN_DOMCTL_getnodeaffinity */ > +struct xen_domctl_nodeaffinity { > + struct xenctl_bitmap nodemap;/* IN */ > +}; > +typedef struct xen_domctl_nodeaffinity xen_domctl_nodeaffinity_t; > +DEFINE_XEN_GUEST_HANDLE(xen_domctl_nodeaffinity_t); > + > + > /* Get/set which physical cpus a vcpu can execute on. */ > /* XEN_DOMCTL_setvcpuaffinity */ > /* XEN_DOMCTL_getvcpuaffinity */ > @@ -900,6 +910,8 @@ struct xen_domctl { > #define XEN_DOMCTL_set_access_required 64 > #define XEN_DOMCTL_audit_p2m 65 > #define XEN_DOMCTL_set_virq_handler 66 > +#define XEN_DOMCTL_setnodeaffinity 67 > +#define XEN_DOMCTL_getnodeaffinity 68 > #define XEN_DOMCTL_gdbsx_guestmemio 1000 > #define XEN_DOMCTL_gdbsx_pausevcpu 1001 > #define XEN_DOMCTL_gdbsx_unpausevcpu 1002 > @@ -913,6 +925,7 @@ struct xen_domctl { > struct xen_domctl_getpageframeinfo getpageframeinfo; > struct xen_domctl_getpageframeinfo2 getpageframeinfo2; > struct xen_domctl_getpageframeinfo3 getpageframeinfo3; > + struct xen_domctl_nodeaffinity nodeaffinity; > struct xen_domctl_vcpuaffinity vcpuaffinity; > struct xen_domctl_shadow_op shadow_op; > struct xen_domctl_max_mem max_mem; > diff --git a/xen/include/xen/nodemask.h b/xen/include/xen/nodemask.h > --- a/xen/include/xen/nodemask.h > +++ b/xen/include/xen/nodemask.h > @@ -8,8 +8,9 @@ > * See detailed comments in the file linux/bitmap.h describing the > * data type on which these nodemasks are based. > * > - * For details of nodemask_scnprintf() and nodemask_parse(), > - * see bitmap_scnprintf() and bitmap_parse() in lib/bitmap.c. > + * For details of nodemask_scnprintf(), nodelist_scnpintf() and > + * nodemask_parse(), see bitmap_scnprintf() and bitmap_parse() > + * in lib/bitmap.c. 
> * > * The available nodemask operations are: > * > @@ -48,6 +49,7 @@ > * unsigned long *nodes_addr(mask) Array of unsigned long''s in mask > * > * int nodemask_scnprintf(buf, len, mask) Format nodemask for printing > + * int nodelist_scnprintf(buf, len, mask) Format nodemask as a list for printing > * int nodemask_parse(ubuf, ulen, mask) Parse ascii string as nodemask > * > * for_each_node_mask(node, mask) for-loop node over mask > @@ -280,6 +282,14 @@ static inline int __first_unset_node(con > > #define nodes_addr(src) ((src).bits) > > +#define nodelist_scnprintf(buf, len, src) \ > + __nodelist_scnprintf((buf), (len), (src), MAX_NUMNODES) > +static inline int __nodelist_scnprintf(char *buf, int len, > + const nodemask_t *srcp, int nbits) > +{ > + return bitmap_scnlistprintf(buf, len, srcp->bits, nbits); > +} > + > #if 0 > #define nodemask_scnprintf(buf, len, src) \ > __nodemask_scnprintf((buf), (len), &(src), MAX_NUMNODES) > diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h > --- a/xen/include/xen/sched-if.h > +++ b/xen/include/xen/sched-if.h > @@ -182,6 +182,8 @@ struct scheduler { > struct xen_domctl_scheduler_op *); > int (*adjust_global) (const struct scheduler *, > struct xen_sysctl_scheduler_op *); > + void (*set_node_affinity) (const struct scheduler *, > + struct domain *, nodemask_t *); > void (*dump_settings) (const struct scheduler *); > void (*dump_cpu_state) (const struct scheduler *, int); > > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h > --- a/xen/include/xen/sched.h > +++ b/xen/include/xen/sched.h > @@ -346,8 +346,12 @@ struct domain > /* Various mem_events */ > struct mem_event_per_domain *mem_event; > > - /* Currently computed from union of all vcpu cpu-affinity masks. */ > + /* > + * Can be specified by the user. If that is not the case, it is > + * computed from the union of all the vcpu cpu-affinity masks. 
> + */ > nodemask_t node_affinity; > + int auto_node_affinity; > unsigned int last_alloc_node; > spinlock_t node_affinity_lock; > }; > @@ -416,6 +420,7 @@ static inline void get_knownalive_domain > ASSERT(!(atomic_read(&d->refcnt) & DOMAIN_DESTROYED)); > } > > +int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity); > void domain_update_node_affinity(struct domain *d); > > struct domain *domain_create( > @@ -519,6 +524,7 @@ void sched_destroy_domain(struct domain > int sched_move_domain(struct domain *d, struct cpupool *c); > long sched_adjust(struct domain *, struct xen_domctl_scheduler_op *); > long sched_adjust_global(struct xen_sysctl_scheduler_op *); > +void sched_set_node_affinity(struct domain *, nodemask_t *); > int sched_id(void); > void sched_tick_suspend(void); > void sched_tick_resume(void); > diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h > --- a/xen/include/xsm/xsm.h > +++ b/xen/include/xsm/xsm.h > @@ -56,6 +56,7 @@ struct xsm_operations { > int (*domain_create) (struct domain *d, u32 ssidref); > int (*max_vcpus) (struct domain *d); > int (*destroydomain) (struct domain *d); > + int (*nodeaffinity) (int cmd, struct domain *d); > int (*vcpuaffinity) (int cmd, struct domain *d); > int (*scheduler) (struct domain *d); > int (*getdomaininfo) (struct domain *d); > @@ -229,6 +230,11 @@ static inline int xsm_destroydomain (str > return xsm_call(destroydomain(d)); > } > > +static inline int xsm_nodeaffinity (int cmd, struct domain *d) > +{ > + return xsm_call(nodeaffinity(cmd, d)); > +} > + > static inline int xsm_vcpuaffinity (int cmd, struct domain *d) > { > return xsm_call(vcpuaffinity(cmd, d)); > diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c > --- a/xen/xsm/dummy.c > +++ b/xen/xsm/dummy.c > @@ -634,6 +634,7 @@ void xsm_fixup_ops (struct xsm_operation > set_to_dummy_if_null(ops, domain_create); > set_to_dummy_if_null(ops, max_vcpus); > set_to_dummy_if_null(ops, destroydomain); > + set_to_dummy_if_null(ops, nodeaffinity); > 
set_to_dummy_if_null(ops, vcpuaffinity); > set_to_dummy_if_null(ops, scheduler); > set_to_dummy_if_null(ops, getdomaininfo); > diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c > --- a/xen/xsm/flask/hooks.c > +++ b/xen/xsm/flask/hooks.c > @@ -521,17 +521,19 @@ static int flask_destroydomain(struct do > DOMAIN__DESTROY); > } > > -static int flask_vcpuaffinity(int cmd, struct domain *d) > +static int flask_affinity(int cmd, struct domain *d) > { > u32 perm; > > switch ( cmd ) > { > case XEN_DOMCTL_setvcpuaffinity: > - perm = DOMAIN__SETVCPUAFFINITY; > + case XEN_DOMCTL_setnodeaffinity: > + perm = DOMAIN__SETAFFINITY; > break; > case XEN_DOMCTL_getvcpuaffinity: > - perm = DOMAIN__GETVCPUAFFINITY; > + case XEN_DOMCTL_getnodeaffinity: > + perm = DOMAIN__GETAFFINITY; > break; > default: > return -EPERM; > @@ -1473,7 +1475,8 @@ static struct xsm_operations flask_ops > .domain_create = flask_domain_create, > .max_vcpus = flask_max_vcpus, > .destroydomain = flask_destroydomain, > - .vcpuaffinity = flask_vcpuaffinity, > + .nodeaffinity = flask_affinity, > + .vcpuaffinity = flask_affinity, > .scheduler = flask_scheduler, > .getdomaininfo = flask_getdomaininfo, > .getvcpucontext = flask_getvcpucontext, > diff --git a/xen/xsm/flask/include/av_perm_to_string.h b/xen/xsm/flask/include/av_perm_to_string.h > --- a/xen/xsm/flask/include/av_perm_to_string.h > +++ b/xen/xsm/flask/include/av_perm_to_string.h > @@ -37,8 +37,8 @@ > S_(SECCLASS_DOMAIN, DOMAIN__TRANSITION, "transition") > S_(SECCLASS_DOMAIN, DOMAIN__MAX_VCPUS, "max_vcpus") > S_(SECCLASS_DOMAIN, DOMAIN__DESTROY, "destroy") > - S_(SECCLASS_DOMAIN, DOMAIN__SETVCPUAFFINITY, "setvcpuaffinity") > - S_(SECCLASS_DOMAIN, DOMAIN__GETVCPUAFFINITY, "getvcpuaffinity") > + S_(SECCLASS_DOMAIN, DOMAIN__SETAFFINITY, "setaffinity") > + S_(SECCLASS_DOMAIN, DOMAIN__GETAFFINITY, "getaffinity")The top of this file says, "This file is automatically generated. Do not edit." 
I didn't see any files that might have been modified to effect these changes -- did I miss them? Or is the comment a lie? Or should you find that file and edit it instead? :-)
> S_(SECCLASS_DOMAIN, DOMAIN__SCHEDULER, "scheduler") > S_(SECCLASS_DOMAIN, DOMAIN__GETDOMAININFO, "getdomaininfo") > S_(SECCLASS_DOMAIN, DOMAIN__GETVCPUINFO, "getvcpuinfo") > diff --git a/xen/xsm/flask/include/av_permissions.h b/xen/xsm/flask/include/av_permissions.h > --- a/xen/xsm/flask/include/av_permissions.h > +++ b/xen/xsm/flask/include/av_permissions.h > @@ -38,8 +38,8 @@ > #define DOMAIN__TRANSITION 0x00000020UL > #define DOMAIN__MAX_VCPUS 0x00000040UL > #define DOMAIN__DESTROY 0x00000080UL > -#define DOMAIN__SETVCPUAFFINITY 0x00000100UL > -#define DOMAIN__GETVCPUAFFINITY 0x00000200UL > +#define DOMAIN__SETAFFINITY 0x00000100UL > +#define DOMAIN__GETAFFINITY 0x00000200UL
Same thing here. Other than that, looks good! -George
> #define DOMAIN__SCHEDULER 0x00000400UL > #define DOMAIN__GETDOMAININFO 0x00000800UL > #define DOMAIN__GETVCPUINFO 0x00001000UL > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
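The `domain_set_node_affinity()` semantics reviewed above have three cases: an empty mask is rejected, a full mask is taken as "reset to automatic", and anything else installs an explicit affinity. A toy model of just that decision logic (8-bit mask and struct names are illustrative, not the Xen types; locking and the subsequent `domain_update_node_affinity()` call are omitted):

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NODES_FULL 0xFFu   /* all 8 toy nodes set */
#define TOY_EINVAL     22

struct toy_domain {
    uint8_t node_affinity;
    int     auto_node_affinity;
};

static int toy_set_node_affinity(struct toy_domain *d, uint8_t affinity)
{
    if (affinity == 0)
        return -TOY_EINVAL;             /* affine to no node: wrong   */

    if (affinity == TOY_NODES_FULL) {   /* full mask: back to "auto"; */
        d->auto_node_affinity = 1;      /* the mask itself will be    */
        return 0;                       /* recomputed from vcpus      */
    }

    d->auto_node_affinity = 0;          /* explicit affinity          */
    d->node_affinity = affinity;
    return 0;
}
```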
Ian Campbell
2012-Oct-09 16:52 UTC
Re: [PATCH 4 of 8] xen: allow for explicitly specifying node-affinity
Could you trim your quotes please?
> > diff --git a/xen/xsm/flask/include/av_perm_to_string.h b/xen/xsm/flask/include/av_perm_to_string.h > > --- a/xen/xsm/flask/include/av_perm_to_string.h > > +++ b/xen/xsm/flask/include/av_perm_to_string.h > > @@ -37,8 +37,8 @@ > > S_(SECCLASS_DOMAIN, DOMAIN__TRANSITION, "transition") > > S_(SECCLASS_DOMAIN, DOMAIN__MAX_VCPUS, "max_vcpus") > > S_(SECCLASS_DOMAIN, DOMAIN__DESTROY, "destroy") > > - S_(SECCLASS_DOMAIN, DOMAIN__SETVCPUAFFINITY, "setvcpuaffinity") > > - S_(SECCLASS_DOMAIN, DOMAIN__GETVCPUAFFINITY, "getvcpuaffinity") > > + S_(SECCLASS_DOMAIN, DOMAIN__SETAFFINITY, "setaffinity") > > + S_(SECCLASS_DOMAIN, DOMAIN__GETAFFINITY, "getaffinity") > > The top of this file says, "This file is automatically generated. Do > not edit." I didn't see any files that might have been modified to > effect these changes -- did I miss them? Or is the comment a lie? Or > should you find that file and edit it instead? :-)
Also, in that case why is this file checked in? Usually the reason is if the generating tool is not widely available, but in this case it seems to be tools/flask/policy/policy/flask/mkflask.sh which depends on awk and not a lot else -- so I think we can rely on it being available. Ian.
Dario Faggioli
2012-Oct-09 17:17 UTC
Re: [PATCH 4 of 8] xen: allow for explicitly specifying node-affinity
On Tue, 2012-10-09 at 17:47 +0100, George Dunlap wrote:
> > diff --git a/xen/xsm/flask/include/av_perm_to_string.h b/xen/xsm/flask/include/av_perm_to_string.h > > --- a/xen/xsm/flask/include/av_perm_to_string.h > > +++ b/xen/xsm/flask/include/av_perm_to_string.h > > @@ -37,8 +37,8 @@ > > S_(SECCLASS_DOMAIN, DOMAIN__TRANSITION, "transition") > > S_(SECCLASS_DOMAIN, DOMAIN__MAX_VCPUS, "max_vcpus") > > S_(SECCLASS_DOMAIN, DOMAIN__DESTROY, "destroy") > > - S_(SECCLASS_DOMAIN, DOMAIN__SETVCPUAFFINITY, "setvcpuaffinity") > > - S_(SECCLASS_DOMAIN, DOMAIN__GETVCPUAFFINITY, "getvcpuaffinity") > > + S_(SECCLASS_DOMAIN, DOMAIN__SETAFFINITY, "setaffinity") > > + S_(SECCLASS_DOMAIN, DOMAIN__GETAFFINITY, "getaffinity") > > The top of this file says, "This file is automatically generated. Do > not edit." I didn't see any files that might have been modified to > effect these changes -- did I miss them? Or is the comment a lie? Or > should you find that file and edit it instead? :-) >
Wow! I said I have very poor knowledge of this security hook thing, but that is something quite big that I appear to have missed!! :-P Thanks for pointing that out, and sorry for that. I'll definitely have to take a look at the generator, and will do while resending. Thanks again and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Daniel De Graaf
2012-Oct-09 18:31 UTC
[PATCH RFC] flask: move policy header sources into hypervisor
Ian Campbell wrote: [...]>>> +++ b/xen/xsm/flask/include/av_perm_to_string.h > Also, in that case why is this file checked in?This patch fixes the autogenerated files, but doesn''t fully wire them in to things like "make clean" or .{git,hg}ignore. I don''t see an obvious way to clean generated header files in Xen''s build system; perhaps someone who knows the build system better can point out the right way to wire this up. --------------------------------------->8---------------------------- Rather than keeping around headers that are autogenerated in order to avoid adding build dependencies from xen/ to files in tools/, move the relevant parts of the FLASK policy into the hypervisor tree and generate the headers as part of the hypervisor''s build. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- tools/flask/policy/Makefile | 2 +- tools/flask/policy/policy/flask/Makefile | 41 ------ xen/xsm/flask/Makefile | 21 +++ xen/xsm/flask/include/av_perm_to_string.h | 147 ------------------- xen/xsm/flask/include/av_permissions.h | 157 --------------------- xen/xsm/flask/include/class_to_string.h | 15 -- xen/xsm/flask/include/flask.h | 35 ----- xen/xsm/flask/include/initial_sid_to_string.h | 16 --- .../flask => xen/xsm/flask/policy}/access_vectors | 0 .../flask => xen/xsm/flask/policy}/initial_sids | 0 .../xsm/flask/policy}/mkaccess_vector.sh | 4 +- .../flask => xen/xsm/flask/policy}/mkflask.sh | 6 +- .../xsm/flask/policy}/security_classes | 0 13 files changed, 27 insertions(+), 417 deletions(-) delete mode 100644 tools/flask/policy/policy/flask/Makefile delete mode 100644 xen/xsm/flask/include/av_perm_to_string.h delete mode 100644 xen/xsm/flask/include/av_permissions.h delete mode 100644 xen/xsm/flask/include/class_to_string.h delete mode 100644 xen/xsm/flask/include/flask.h delete mode 100644 xen/xsm/flask/include/initial_sid_to_string.h rename {tools/flask/policy/policy/flask => xen/xsm/flask/policy}/access_vectors (100%) rename {tools/flask/policy/policy/flask 
=> xen/xsm/flask/policy}/initial_sids (100%) rename {tools/flask/policy/policy/flask => xen/xsm/flask/policy}/mkaccess_vector.sh (97%) rename {tools/flask/policy/policy/flask => xen/xsm/flask/policy}/mkflask.sh (95%) rename {tools/flask/policy/policy/flask => xen/xsm/flask/policy}/security_classes (100%) diff --git a/tools/flask/policy/Makefile b/tools/flask/policy/Makefile index 5c25cbe..3f5aa38 100644 --- a/tools/flask/policy/Makefile +++ b/tools/flask/policy/Makefile @@ -61,7 +61,7 @@ LOADPOLICY := $(SBINDIR)/flask-loadpolicy # policy source layout POLDIR := policy MODDIR := $(POLDIR)/modules -FLASKDIR := $(POLDIR)/flask +FLASKDIR := ../../../xen/xsm/flask/policy SECCLASS := $(FLASKDIR)/security_classes ISIDS := $(FLASKDIR)/initial_sids AVS := $(FLASKDIR)/access_vectors diff --git a/tools/flask/policy/policy/flask/Makefile b/tools/flask/policy/policy/flask/Makefile deleted file mode 100644 index 5f57e88..0000000 --- a/tools/flask/policy/policy/flask/Makefile +++ /dev/null @@ -1,41 +0,0 @@ -# flask needs to know where to export the libselinux headers. -LIBSEL ?= ../../libselinux - -# flask needs to know where to export the kernel headers. 
-LINUXDIR ?= ../../../linux-2.6 - -AWK = awk - -CONFIG_SHELL := $(shell if [ -x "$$BASH" ]; then echo $$BASH; \ - else if [ -x /bin/bash ]; then echo /bin/bash; \ - else echo sh; fi ; fi) - -FLASK_H_DEPEND = security_classes initial_sids -AV_H_DEPEND = access_vectors - -FLASK_H_FILES = class_to_string.h flask.h initial_sid_to_string.h -AV_H_FILES = av_perm_to_string.h av_permissions.h -ALL_H_FILES = $(FLASK_H_FILES) $(AV_H_FILES) - -all: $(ALL_H_FILES) - -$(FLASK_H_FILES): $(FLASK_H_DEPEND) - $(CONFIG_SHELL) mkflask.sh $(AWK) $(FLASK_H_DEPEND) - -$(AV_H_FILES): $(AV_H_DEPEND) - $(CONFIG_SHELL) mkaccess_vector.sh $(AWK) $(AV_H_DEPEND) - -tolib: all - install -m 644 flask.h av_permissions.h $(LIBSEL)/include/selinux - install -m 644 class_to_string.h av_inherit.h common_perm_to_string.h av_perm_to_string.h $(LIBSEL)/src - -tokern: all - install -m 644 $(ALL_H_FILES) $(LINUXDIR)/security/selinux/include - -install: all - -relabel: - -clean: - rm -f $(FLASK_H_FILES) - rm -f $(AV_H_FILES) diff --git a/xen/xsm/flask/Makefile b/xen/xsm/flask/Makefile index 92fb410..238495a 100644 --- a/xen/xsm/flask/Makefile +++ b/xen/xsm/flask/Makefile @@ -5,3 +5,24 @@ obj-y += flask_op.o subdir-y += ss CFLAGS += -I./include + +AWK = awk + +CONFIG_SHELL := $(shell if [ -x "$$BASH" ]; then echo $$BASH; \ + else if [ -x /bin/bash ]; then echo /bin/bash; \ + else echo sh; fi ; fi) + +FLASK_H_DEPEND = policy/security_classes policy/initial_sids +AV_H_DEPEND = policy/access_vectors + +FLASK_H_FILES = include/flask.h include/class_to_string.h include/initial_sid_to_string.h +AV_H_FILES = include/av_perm_to_string.h include/av_permissions.h +ALL_H_FILES = $(FLASK_H_FILES) $(AV_H_FILES) + +$(obj-y) ss/built_in.o: $(ALL_H_FILES) + +$(FLASK_H_FILES): $(FLASK_H_DEPEND) + $(CONFIG_SHELL) policy/mkflask.sh $(AWK) $(FLASK_H_DEPEND) + +$(AV_H_FILES): $(AV_H_DEPEND) + $(CONFIG_SHELL) policy/mkaccess_vector.sh $(AWK) $(AV_H_DEPEND) diff --git a/xen/xsm/flask/include/av_perm_to_string.h 
b/xen/xsm/flask/include/av_perm_to_string.h deleted file mode 100644 index c3f2370..0000000 --- a/xen/xsm/flask/include/av_perm_to_string.h +++ /dev/null @@ -1,147 +0,0 @@ -/* This file is automatically generated. Do not edit. */ - S_(SECCLASS_XEN, XEN__SCHEDULER, "scheduler") - S_(SECCLASS_XEN, XEN__SETTIME, "settime") - S_(SECCLASS_XEN, XEN__TBUFCONTROL, "tbufcontrol") - S_(SECCLASS_XEN, XEN__READCONSOLE, "readconsole") - S_(SECCLASS_XEN, XEN__CLEARCONSOLE, "clearconsole") - S_(SECCLASS_XEN, XEN__PERFCONTROL, "perfcontrol") - S_(SECCLASS_XEN, XEN__MTRR_ADD, "mtrr_add") - S_(SECCLASS_XEN, XEN__MTRR_DEL, "mtrr_del") - S_(SECCLASS_XEN, XEN__MTRR_READ, "mtrr_read") - S_(SECCLASS_XEN, XEN__MICROCODE, "microcode") - S_(SECCLASS_XEN, XEN__PHYSINFO, "physinfo") - S_(SECCLASS_XEN, XEN__QUIRK, "quirk") - S_(SECCLASS_XEN, XEN__WRITECONSOLE, "writeconsole") - S_(SECCLASS_XEN, XEN__READAPIC, "readapic") - S_(SECCLASS_XEN, XEN__WRITEAPIC, "writeapic") - S_(SECCLASS_XEN, XEN__PRIVPROFILE, "privprofile") - S_(SECCLASS_XEN, XEN__NONPRIVPROFILE, "nonprivprofile") - S_(SECCLASS_XEN, XEN__KEXEC, "kexec") - S_(SECCLASS_XEN, XEN__FIRMWARE, "firmware") - S_(SECCLASS_XEN, XEN__SLEEP, "sleep") - S_(SECCLASS_XEN, XEN__FREQUENCY, "frequency") - S_(SECCLASS_XEN, XEN__GETIDLE, "getidle") - S_(SECCLASS_XEN, XEN__DEBUG, "debug") - S_(SECCLASS_XEN, XEN__GETCPUINFO, "getcpuinfo") - S_(SECCLASS_XEN, XEN__HEAP, "heap") - S_(SECCLASS_XEN, XEN__PM_OP, "pm_op") - S_(SECCLASS_XEN, XEN__MCA_OP, "mca_op") - S_(SECCLASS_XEN, XEN__LOCKPROF, "lockprof") - S_(SECCLASS_XEN, XEN__CPUPOOL_OP, "cpupool_op") - S_(SECCLASS_XEN, XEN__SCHED_OP, "sched_op") - S_(SECCLASS_XEN, XEN__TMEM_OP, "tmem_op") - S_(SECCLASS_XEN, XEN__TMEM_CONTROL, "tmem_control") - S_(SECCLASS_DOMAIN, DOMAIN__SETVCPUCONTEXT, "setvcpucontext") - S_(SECCLASS_DOMAIN, DOMAIN__PAUSE, "pause") - S_(SECCLASS_DOMAIN, DOMAIN__UNPAUSE, "unpause") - S_(SECCLASS_DOMAIN, DOMAIN__RESUME, "resume") - S_(SECCLASS_DOMAIN, DOMAIN__CREATE, "create") - 
S_(SECCLASS_DOMAIN, DOMAIN__TRANSITION, "transition") - S_(SECCLASS_DOMAIN, DOMAIN__MAX_VCPUS, "max_vcpus") - S_(SECCLASS_DOMAIN, DOMAIN__DESTROY, "destroy") - S_(SECCLASS_DOMAIN, DOMAIN__SETVCPUAFFINITY, "setvcpuaffinity") - S_(SECCLASS_DOMAIN, DOMAIN__GETVCPUAFFINITY, "getvcpuaffinity") - S_(SECCLASS_DOMAIN, DOMAIN__SCHEDULER, "scheduler") - S_(SECCLASS_DOMAIN, DOMAIN__GETDOMAININFO, "getdomaininfo") - S_(SECCLASS_DOMAIN, DOMAIN__GETVCPUINFO, "getvcpuinfo") - S_(SECCLASS_DOMAIN, DOMAIN__GETVCPUCONTEXT, "getvcpucontext") - S_(SECCLASS_DOMAIN, DOMAIN__SETDOMAINMAXMEM, "setdomainmaxmem") - S_(SECCLASS_DOMAIN, DOMAIN__SETDOMAINHANDLE, "setdomainhandle") - S_(SECCLASS_DOMAIN, DOMAIN__SETDEBUGGING, "setdebugging") - S_(SECCLASS_DOMAIN, DOMAIN__HYPERCALL, "hypercall") - S_(SECCLASS_DOMAIN, DOMAIN__SETTIME, "settime") - S_(SECCLASS_DOMAIN, DOMAIN__SET_TARGET, "set_target") - S_(SECCLASS_DOMAIN, DOMAIN__SHUTDOWN, "shutdown") - S_(SECCLASS_DOMAIN, DOMAIN__SETADDRSIZE, "setaddrsize") - S_(SECCLASS_DOMAIN, DOMAIN__GETADDRSIZE, "getaddrsize") - S_(SECCLASS_DOMAIN, DOMAIN__TRIGGER, "trigger") - S_(SECCLASS_DOMAIN, DOMAIN__GETEXTVCPUCONTEXT, "getextvcpucontext") - S_(SECCLASS_DOMAIN, DOMAIN__SETEXTVCPUCONTEXT, "setextvcpucontext") - S_(SECCLASS_DOMAIN, DOMAIN__GETVCPUEXTSTATE, "getvcpuextstate") - S_(SECCLASS_DOMAIN, DOMAIN__SETVCPUEXTSTATE, "setvcpuextstate") - S_(SECCLASS_DOMAIN, DOMAIN__GETPODTARGET, "getpodtarget") - S_(SECCLASS_DOMAIN, DOMAIN__SETPODTARGET, "setpodtarget") - S_(SECCLASS_DOMAIN, DOMAIN__SET_MISC_INFO, "set_misc_info") - S_(SECCLASS_DOMAIN, DOMAIN__SET_VIRQ_HANDLER, "set_virq_handler") - S_(SECCLASS_DOMAIN2, DOMAIN2__RELABELFROM, "relabelfrom") - S_(SECCLASS_DOMAIN2, DOMAIN2__RELABELTO, "relabelto") - S_(SECCLASS_DOMAIN2, DOMAIN2__RELABELSELF, "relabelself") - S_(SECCLASS_DOMAIN2, DOMAIN2__MAKE_PRIV_FOR, "make_priv_for") - S_(SECCLASS_DOMAIN2, DOMAIN2__SET_AS_TARGET, "set_as_target") - S_(SECCLASS_DOMAIN2, DOMAIN2__SET_CPUID, "set_cpuid") - 
S_(SECCLASS_DOMAIN2, DOMAIN2__GETTSC, "gettsc") - S_(SECCLASS_DOMAIN2, DOMAIN2__SETTSC, "settsc") - S_(SECCLASS_HVM, HVM__SETHVMC, "sethvmc") - S_(SECCLASS_HVM, HVM__GETHVMC, "gethvmc") - S_(SECCLASS_HVM, HVM__SETPARAM, "setparam") - S_(SECCLASS_HVM, HVM__GETPARAM, "getparam") - S_(SECCLASS_HVM, HVM__PCILEVEL, "pcilevel") - S_(SECCLASS_HVM, HVM__IRQLEVEL, "irqlevel") - S_(SECCLASS_HVM, HVM__PCIROUTE, "pciroute") - S_(SECCLASS_HVM, HVM__BIND_IRQ, "bind_irq") - S_(SECCLASS_HVM, HVM__CACHEATTR, "cacheattr") - S_(SECCLASS_HVM, HVM__TRACKDIRTYVRAM, "trackdirtyvram") - S_(SECCLASS_HVM, HVM__HVMCTL, "hvmctl") - S_(SECCLASS_HVM, HVM__MEM_EVENT, "mem_event") - S_(SECCLASS_HVM, HVM__MEM_SHARING, "mem_sharing") - S_(SECCLASS_HVM, HVM__AUDIT_P2M, "audit_p2m") - S_(SECCLASS_HVM, HVM__SEND_IRQ, "send_irq") - S_(SECCLASS_HVM, HVM__SHARE_MEM, "share_mem") - S_(SECCLASS_EVENT, EVENT__BIND, "bind") - S_(SECCLASS_EVENT, EVENT__SEND, "send") - S_(SECCLASS_EVENT, EVENT__STATUS, "status") - S_(SECCLASS_EVENT, EVENT__NOTIFY, "notify") - S_(SECCLASS_EVENT, EVENT__CREATE, "create") - S_(SECCLASS_EVENT, EVENT__RESET, "reset") - S_(SECCLASS_GRANT, GRANT__MAP_READ, "map_read") - S_(SECCLASS_GRANT, GRANT__MAP_WRITE, "map_write") - S_(SECCLASS_GRANT, GRANT__UNMAP, "unmap") - S_(SECCLASS_GRANT, GRANT__TRANSFER, "transfer") - S_(SECCLASS_GRANT, GRANT__SETUP, "setup") - S_(SECCLASS_GRANT, GRANT__COPY, "copy") - S_(SECCLASS_GRANT, GRANT__QUERY, "query") - S_(SECCLASS_MMU, MMU__MAP_READ, "map_read") - S_(SECCLASS_MMU, MMU__MAP_WRITE, "map_write") - S_(SECCLASS_MMU, MMU__PAGEINFO, "pageinfo") - S_(SECCLASS_MMU, MMU__PAGELIST, "pagelist") - S_(SECCLASS_MMU, MMU__ADJUST, "adjust") - S_(SECCLASS_MMU, MMU__STAT, "stat") - S_(SECCLASS_MMU, MMU__TRANSLATEGP, "translategp") - S_(SECCLASS_MMU, MMU__UPDATEMP, "updatemp") - S_(SECCLASS_MMU, MMU__PHYSMAP, "physmap") - S_(SECCLASS_MMU, MMU__PINPAGE, "pinpage") - S_(SECCLASS_MMU, MMU__MFNLIST, "mfnlist") - S_(SECCLASS_MMU, MMU__MEMORYMAP, "memorymap") - 
S_(SECCLASS_MMU, MMU__REMOTE_REMAP, "remote_remap") - S_(SECCLASS_MMU, MMU__MMUEXT_OP, "mmuext_op") - S_(SECCLASS_MMU, MMU__EXCHANGE, "exchange") - S_(SECCLASS_SHADOW, SHADOW__DISABLE, "disable") - S_(SECCLASS_SHADOW, SHADOW__ENABLE, "enable") - S_(SECCLASS_SHADOW, SHADOW__LOGDIRTY, "logdirty") - S_(SECCLASS_RESOURCE, RESOURCE__ADD, "add") - S_(SECCLASS_RESOURCE, RESOURCE__REMOVE, "remove") - S_(SECCLASS_RESOURCE, RESOURCE__USE, "use") - S_(SECCLASS_RESOURCE, RESOURCE__ADD_IRQ, "add_irq") - S_(SECCLASS_RESOURCE, RESOURCE__REMOVE_IRQ, "remove_irq") - S_(SECCLASS_RESOURCE, RESOURCE__ADD_IOPORT, "add_ioport") - S_(SECCLASS_RESOURCE, RESOURCE__REMOVE_IOPORT, "remove_ioport") - S_(SECCLASS_RESOURCE, RESOURCE__ADD_IOMEM, "add_iomem") - S_(SECCLASS_RESOURCE, RESOURCE__REMOVE_IOMEM, "remove_iomem") - S_(SECCLASS_RESOURCE, RESOURCE__STAT_DEVICE, "stat_device") - S_(SECCLASS_RESOURCE, RESOURCE__ADD_DEVICE, "add_device") - S_(SECCLASS_RESOURCE, RESOURCE__REMOVE_DEVICE, "remove_device") - S_(SECCLASS_RESOURCE, RESOURCE__PLUG, "plug") - S_(SECCLASS_RESOURCE, RESOURCE__UNPLUG, "unplug") - S_(SECCLASS_RESOURCE, RESOURCE__SETUP, "setup") - S_(SECCLASS_SECURITY, SECURITY__COMPUTE_AV, "compute_av") - S_(SECCLASS_SECURITY, SECURITY__COMPUTE_CREATE, "compute_create") - S_(SECCLASS_SECURITY, SECURITY__COMPUTE_MEMBER, "compute_member") - S_(SECCLASS_SECURITY, SECURITY__CHECK_CONTEXT, "check_context") - S_(SECCLASS_SECURITY, SECURITY__LOAD_POLICY, "load_policy") - S_(SECCLASS_SECURITY, SECURITY__COMPUTE_RELABEL, "compute_relabel") - S_(SECCLASS_SECURITY, SECURITY__COMPUTE_USER, "compute_user") - S_(SECCLASS_SECURITY, SECURITY__SETENFORCE, "setenforce") - S_(SECCLASS_SECURITY, SECURITY__SETBOOL, "setbool") - S_(SECCLASS_SECURITY, SECURITY__SETSECPARAM, "setsecparam") - S_(SECCLASS_SECURITY, SECURITY__ADD_OCONTEXT, "add_ocontext") - S_(SECCLASS_SECURITY, SECURITY__DEL_OCONTEXT, "del_ocontext") diff --git a/xen/xsm/flask/include/av_permissions.h b/xen/xsm/flask/include/av_permissions.h 
deleted file mode 100644 index 65302e8..0000000 --- a/xen/xsm/flask/include/av_permissions.h +++ /dev/null @@ -1,157 +0,0 @@ -/* This file is automatically generated. Do not edit. */ -#define XEN__SCHEDULER 0x00000001UL -#define XEN__SETTIME 0x00000002UL -#define XEN__TBUFCONTROL 0x00000004UL -#define XEN__READCONSOLE 0x00000008UL -#define XEN__CLEARCONSOLE 0x00000010UL -#define XEN__PERFCONTROL 0x00000020UL -#define XEN__MTRR_ADD 0x00000040UL -#define XEN__MTRR_DEL 0x00000080UL -#define XEN__MTRR_READ 0x00000100UL -#define XEN__MICROCODE 0x00000200UL -#define XEN__PHYSINFO 0x00000400UL -#define XEN__QUIRK 0x00000800UL -#define XEN__WRITECONSOLE 0x00001000UL -#define XEN__READAPIC 0x00002000UL -#define XEN__WRITEAPIC 0x00004000UL -#define XEN__PRIVPROFILE 0x00008000UL -#define XEN__NONPRIVPROFILE 0x00010000UL -#define XEN__KEXEC 0x00020000UL -#define XEN__FIRMWARE 0x00040000UL -#define XEN__SLEEP 0x00080000UL -#define XEN__FREQUENCY 0x00100000UL -#define XEN__GETIDLE 0x00200000UL -#define XEN__DEBUG 0x00400000UL -#define XEN__GETCPUINFO 0x00800000UL -#define XEN__HEAP 0x01000000UL -#define XEN__PM_OP 0x02000000UL -#define XEN__MCA_OP 0x04000000UL -#define XEN__LOCKPROF 0x08000000UL -#define XEN__CPUPOOL_OP 0x10000000UL -#define XEN__SCHED_OP 0x20000000UL -#define XEN__TMEM_OP 0x40000000UL -#define XEN__TMEM_CONTROL 0x80000000UL - -#define DOMAIN__SETVCPUCONTEXT 0x00000001UL -#define DOMAIN__PAUSE 0x00000002UL -#define DOMAIN__UNPAUSE 0x00000004UL -#define DOMAIN__RESUME 0x00000008UL -#define DOMAIN__CREATE 0x00000010UL -#define DOMAIN__TRANSITION 0x00000020UL -#define DOMAIN__MAX_VCPUS 0x00000040UL -#define DOMAIN__DESTROY 0x00000080UL -#define DOMAIN__SETVCPUAFFINITY 0x00000100UL -#define DOMAIN__GETVCPUAFFINITY 0x00000200UL -#define DOMAIN__SCHEDULER 0x00000400UL -#define DOMAIN__GETDOMAININFO 0x00000800UL -#define DOMAIN__GETVCPUINFO 0x00001000UL -#define DOMAIN__GETVCPUCONTEXT 0x00002000UL -#define DOMAIN__SETDOMAINMAXMEM 0x00004000UL -#define 
DOMAIN__SETDOMAINHANDLE 0x00008000UL -#define DOMAIN__SETDEBUGGING 0x00010000UL -#define DOMAIN__HYPERCALL 0x00020000UL -#define DOMAIN__SETTIME 0x00040000UL -#define DOMAIN__SET_TARGET 0x00080000UL -#define DOMAIN__SHUTDOWN 0x00100000UL -#define DOMAIN__SETADDRSIZE 0x00200000UL -#define DOMAIN__GETADDRSIZE 0x00400000UL -#define DOMAIN__TRIGGER 0x00800000UL -#define DOMAIN__GETEXTVCPUCONTEXT 0x01000000UL -#define DOMAIN__SETEXTVCPUCONTEXT 0x02000000UL -#define DOMAIN__GETVCPUEXTSTATE 0x04000000UL -#define DOMAIN__SETVCPUEXTSTATE 0x08000000UL -#define DOMAIN__GETPODTARGET 0x10000000UL -#define DOMAIN__SETPODTARGET 0x20000000UL -#define DOMAIN__SET_MISC_INFO 0x40000000UL -#define DOMAIN__SET_VIRQ_HANDLER 0x80000000UL - -#define DOMAIN2__RELABELFROM 0x00000001UL -#define DOMAIN2__RELABELTO 0x00000002UL -#define DOMAIN2__RELABELSELF 0x00000004UL -#define DOMAIN2__MAKE_PRIV_FOR 0x00000008UL -#define DOMAIN2__SET_AS_TARGET 0x00000010UL -#define DOMAIN2__SET_CPUID 0x00000020UL -#define DOMAIN2__GETTSC 0x00000040UL -#define DOMAIN2__SETTSC 0x00000080UL - -#define HVM__SETHVMC 0x00000001UL -#define HVM__GETHVMC 0x00000002UL -#define HVM__SETPARAM 0x00000004UL -#define HVM__GETPARAM 0x00000008UL -#define HVM__PCILEVEL 0x00000010UL -#define HVM__IRQLEVEL 0x00000020UL -#define HVM__PCIROUTE 0x00000040UL -#define HVM__BIND_IRQ 0x00000080UL -#define HVM__CACHEATTR 0x00000100UL -#define HVM__TRACKDIRTYVRAM 0x00000200UL -#define HVM__HVMCTL 0x00000400UL -#define HVM__MEM_EVENT 0x00000800UL -#define HVM__MEM_SHARING 0x00001000UL -#define HVM__AUDIT_P2M 0x00002000UL -#define HVM__SEND_IRQ 0x00004000UL -#define HVM__SHARE_MEM 0x00008000UL - -#define EVENT__BIND 0x00000001UL -#define EVENT__SEND 0x00000002UL -#define EVENT__STATUS 0x00000004UL -#define EVENT__NOTIFY 0x00000008UL -#define EVENT__CREATE 0x00000010UL -#define EVENT__RESET 0x00000020UL - -#define GRANT__MAP_READ 0x00000001UL -#define GRANT__MAP_WRITE 0x00000002UL -#define GRANT__UNMAP 0x00000004UL -#define GRANT__TRANSFER 
0x00000008UL -#define GRANT__SETUP 0x00000010UL -#define GRANT__COPY 0x00000020UL -#define GRANT__QUERY 0x00000040UL - -#define MMU__MAP_READ 0x00000001UL -#define MMU__MAP_WRITE 0x00000002UL -#define MMU__PAGEINFO 0x00000004UL -#define MMU__PAGELIST 0x00000008UL -#define MMU__ADJUST 0x00000010UL -#define MMU__STAT 0x00000020UL -#define MMU__TRANSLATEGP 0x00000040UL -#define MMU__UPDATEMP 0x00000080UL -#define MMU__PHYSMAP 0x00000100UL -#define MMU__PINPAGE 0x00000200UL -#define MMU__MFNLIST 0x00000400UL -#define MMU__MEMORYMAP 0x00000800UL -#define MMU__REMOTE_REMAP 0x00001000UL -#define MMU__MMUEXT_OP 0x00002000UL -#define MMU__EXCHANGE 0x00004000UL - -#define SHADOW__DISABLE 0x00000001UL -#define SHADOW__ENABLE 0x00000002UL -#define SHADOW__LOGDIRTY 0x00000004UL - -#define RESOURCE__ADD 0x00000001UL -#define RESOURCE__REMOVE 0x00000002UL -#define RESOURCE__USE 0x00000004UL -#define RESOURCE__ADD_IRQ 0x00000008UL -#define RESOURCE__REMOVE_IRQ 0x00000010UL -#define RESOURCE__ADD_IOPORT 0x00000020UL -#define RESOURCE__REMOVE_IOPORT 0x00000040UL -#define RESOURCE__ADD_IOMEM 0x00000080UL -#define RESOURCE__REMOVE_IOMEM 0x00000100UL -#define RESOURCE__STAT_DEVICE 0x00000200UL -#define RESOURCE__ADD_DEVICE 0x00000400UL -#define RESOURCE__REMOVE_DEVICE 0x00000800UL -#define RESOURCE__PLUG 0x00001000UL -#define RESOURCE__UNPLUG 0x00002000UL -#define RESOURCE__SETUP 0x00004000UL - -#define SECURITY__COMPUTE_AV 0x00000001UL -#define SECURITY__COMPUTE_CREATE 0x00000002UL -#define SECURITY__COMPUTE_MEMBER 0x00000004UL -#define SECURITY__CHECK_CONTEXT 0x00000008UL -#define SECURITY__LOAD_POLICY 0x00000010UL -#define SECURITY__COMPUTE_RELABEL 0x00000020UL -#define SECURITY__COMPUTE_USER 0x00000040UL -#define SECURITY__SETENFORCE 0x00000080UL -#define SECURITY__SETBOOL 0x00000100UL -#define SECURITY__SETSECPARAM 0x00000200UL -#define SECURITY__ADD_OCONTEXT 0x00000400UL -#define SECURITY__DEL_OCONTEXT 0x00000800UL - diff --git a/xen/xsm/flask/include/class_to_string.h 
b/xen/xsm/flask/include/class_to_string.h deleted file mode 100644 index 7716645..0000000 --- a/xen/xsm/flask/include/class_to_string.h +++ /dev/null @@ -1,15 +0,0 @@ -/* This file is automatically generated. Do not edit. */ -/* - * Security object class definitions - */ - S_("null") - S_("xen") - S_("domain") - S_("domain2") - S_("hvm") - S_("mmu") - S_("resource") - S_("shadow") - S_("event") - S_("grant") - S_("security") diff --git a/xen/xsm/flask/include/flask.h b/xen/xsm/flask/include/flask.h deleted file mode 100644 index 3bff998..0000000 --- a/xen/xsm/flask/include/flask.h +++ /dev/null @@ -1,35 +0,0 @@ -/* This file is automatically generated. Do not edit. */ -#ifndef _SELINUX_FLASK_H_ -#define _SELINUX_FLASK_H_ - -/* - * Security object class definitions - */ -#define SECCLASS_XEN 1 -#define SECCLASS_DOMAIN 2 -#define SECCLASS_DOMAIN2 3 -#define SECCLASS_HVM 4 -#define SECCLASS_MMU 5 -#define SECCLASS_RESOURCE 6 -#define SECCLASS_SHADOW 7 -#define SECCLASS_EVENT 8 -#define SECCLASS_GRANT 9 -#define SECCLASS_SECURITY 10 - -/* - * Security identifier indices for initial entities - */ -#define SECINITSID_XEN 1 -#define SECINITSID_DOM0 2 -#define SECINITSID_DOMIO 3 -#define SECINITSID_DOMXEN 4 -#define SECINITSID_UNLABELED 5 -#define SECINITSID_SECURITY 6 -#define SECINITSID_IOPORT 7 -#define SECINITSID_IOMEM 8 -#define SECINITSID_IRQ 9 -#define SECINITSID_DEVICE 10 - -#define SECINITSID_NUM 10 - -#endif diff --git a/xen/xsm/flask/include/initial_sid_to_string.h b/xen/xsm/flask/include/initial_sid_to_string.h deleted file mode 100644 index 814f4bf..0000000 --- a/xen/xsm/flask/include/initial_sid_to_string.h +++ /dev/null @@ -1,16 +0,0 @@ -/* This file is automatically generated. Do not edit. 
*/ -static char *initial_sid_to_string[] -{ - "null", - "xen", - "dom0", - "domio", - "domxen", - "unlabeled", - "security", - "ioport", - "iomem", - "irq", - "device", -}; - diff --git a/tools/flask/policy/policy/flask/access_vectors b/xen/xsm/flask/policy/access_vectors similarity index 100% rename from tools/flask/policy/policy/flask/access_vectors rename to xen/xsm/flask/policy/access_vectors diff --git a/tools/flask/policy/policy/flask/initial_sids b/xen/xsm/flask/policy/initial_sids similarity index 100% rename from tools/flask/policy/policy/flask/initial_sids rename to xen/xsm/flask/policy/initial_sids diff --git a/tools/flask/policy/policy/flask/mkaccess_vector.sh b/xen/xsm/flask/policy/mkaccess_vector.sh similarity index 97% rename from tools/flask/policy/policy/flask/mkaccess_vector.sh rename to xen/xsm/flask/policy/mkaccess_vector.sh index 43a60a7..8ec87f7 100644 --- a/tools/flask/policy/policy/flask/mkaccess_vector.sh +++ b/xen/xsm/flask/policy/mkaccess_vector.sh @@ -9,8 +9,8 @@ awk=$1 shift # output files -av_permissions="av_permissions.h" -av_perm_to_string="av_perm_to_string.h" +av_permissions="include/av_permissions.h" +av_perm_to_string="include/av_perm_to_string.h" cat $* | $awk " BEGIN { diff --git a/tools/flask/policy/policy/flask/mkflask.sh b/xen/xsm/flask/policy/mkflask.sh similarity index 95% rename from tools/flask/policy/policy/flask/mkflask.sh rename to xen/xsm/flask/policy/mkflask.sh index 9c84754..e8d8fb5 100644 --- a/tools/flask/policy/policy/flask/mkflask.sh +++ b/xen/xsm/flask/policy/mkflask.sh @@ -9,9 +9,9 @@ awk=$1 shift 1 # output file -output_file="flask.h" -debug_file="class_to_string.h" -debug_file2="initial_sid_to_string.h" +output_file="include/flask.h" +debug_file="include/class_to_string.h" +debug_file2="include/initial_sid_to_string.h" cat $* | $awk " BEGIN { diff --git a/tools/flask/policy/policy/flask/security_classes b/xen/xsm/flask/policy/security_classes similarity index 100% rename from 
tools/flask/policy/policy/flask/security_classes rename to xen/xsm/flask/policy/security_classes -- 1.7.11.4
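The mkaccess_vector.sh script moved by this patch generates include/av_permissions.h from the policy's access_vectors list, assigning each permission within a class a doubling bit value (0x00000001UL, 0x00000002UL, ...), as can be seen in the deleted header above. A toy sketch of that bit assignment in plain shell (not the actual awk script; emit_av_defines is a hypothetical helper name used only for illustration):

```shell
#!/bin/sh
# Toy re-implementation of the bit assignment mkaccess_vector.sh performs:
# each permission in a class gets the next power-of-two mask.
emit_av_defines() {
    class=$1; shift
    bit=1
    for p in "$@"; do
        # Upper-case the permission name, e.g. scheduler -> SCHEDULER
        name=$(printf '%s' "$p" | tr '[:lower:]' '[:upper:]')
        printf '#define %s__%s 0x%08xUL\n' "$class" "$name" "$bit"
        bit=$((bit * 2))
    done
}

emit_av_defines XEN scheduler settime tbufcontrol
# -> #define XEN__SCHEDULER 0x00000001UL
#    #define XEN__SETTIME 0x00000002UL
#    #define XEN__TBUFCONTROL 0x00000004UL
```

The output matches the values in the (generated) av_permissions.h shown above, which is why regenerating the headers at build time, rather than checking them in, is safe.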
Matt Wilson
2012-Oct-09 20:20 UTC
Re: [PATCH 0 of 8] NUMA Awareness for the Credit Scheduler
On Tue, Oct 09, 2012 at 11:45:49AM +0100, Dario Faggioli wrote:
> Whether that is acceptable or not is of course debatable, and we had a
> bit of this discussion already (although no real conclusion has been
> reached yet).
> My take is that, right now, since we do not yet expose any virtual NUMA
> topology to the VM itself, the behaviour described above is fine. As
> soon as we'll have some guest NUMA awareness, then it might be
> worthwhile to try to preserve it, at least to some extent.

For what it's worth, under VMware all bets are off if a vNUMA-enabled guest is migrated via vMotion. See "Performance Best Practices for VMware vSphere 5.0" [1], page 40. There is also a good deal of information in a paper published by VMware Labs on HPC workloads [2] and a blog post on NUMA load balancing [3].

Matt

[1] http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf
[2] http://labs.vmware.com/publications/performance-evaluation-of-hpc-benchmarks-on-vmwares-esxi-server
[3] http://blogs.vmware.com/vsphere/2012/02/vspherenuma-loadbalancing.html
Ian Campbell
2012-Oct-10 08:38 UTC
Re: [PATCH RFC] flask: move policy header sources into hypervisor
On Tue, 2012-10-09 at 19:31 +0100, Daniel De Graaf wrote:
> Ian Campbell wrote:
> [...]
> >>> +++ b/xen/xsm/flask/include/av_perm_to_string.h
> > Also, in that case why is this file checked in?
>
> This patch fixes the autogenerated files, but doesn't fully wire them in
> to things like "make clean" or .{git,hg}ignore. I don't see an obvious
> way to clean generated header files in Xen's build system; perhaps
> someone who knows the build system better can point out the right way to
> wire this up.

xen/arch/x86/Makefile has a clean:: rule which removes autogenerated stuff like the asm-offsets files. Probably the right model to follow.

Ian.

> --------------------------------------->8----------------------------
>
> Rather than keeping around headers that are autogenerated in order to
> avoid adding build dependencies from xen/ to files in tools/, move the
> relevant parts of the FLASK policy into the hypervisor tree and generate
> the headers as part of the hypervisor's build.
>
> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> ---
>  tools/flask/policy/Makefile                        |   2 +-
>  tools/flask/policy/policy/flask/Makefile           |  41 ------
>  xen/xsm/flask/Makefile                             |  21 +++
>  xen/xsm/flask/include/av_perm_to_string.h          | 147 -------------------
>  xen/xsm/flask/include/av_permissions.h             | 157 ---------------------
>  xen/xsm/flask/include/class_to_string.h            |  15 --
>  xen/xsm/flask/include/flask.h                      |  35 -----
>  xen/xsm/flask/include/initial_sid_to_string.h      |  16 ---
>  .../flask => xen/xsm/flask/policy}/access_vectors  |   0
>  .../flask => xen/xsm/flask/policy}/initial_sids    |   0
>  .../xsm/flask/policy}/mkaccess_vector.sh           |   4 +-
>  .../flask => xen/xsm/flask/policy}/mkflask.sh      |   6 +-
>  .../xsm/flask/policy}/security_classes             |   0
>  13 files changed, 27 insertions(+), 417 deletions(-)
>  delete mode 100644 tools/flask/policy/policy/flask/Makefile
>  delete mode 100644 xen/xsm/flask/include/av_perm_to_string.h
>  delete mode 100644 xen/xsm/flask/include/av_permissions.h
>  delete mode 100644
xen/xsm/flask/include/class_to_string.h > delete mode 100644 xen/xsm/flask/include/flask.h > delete mode 100644 xen/xsm/flask/include/initial_sid_to_string.h > rename {tools/flask/policy/policy/flask => xen/xsm/flask/policy}/access_vectors (100%) > rename {tools/flask/policy/policy/flask => xen/xsm/flask/policy}/initial_sids (100%) > rename {tools/flask/policy/policy/flask => xen/xsm/flask/policy}/mkaccess_vector.sh (97%) > rename {tools/flask/policy/policy/flask => xen/xsm/flask/policy}/mkflask.sh (95%) > rename {tools/flask/policy/policy/flask => xen/xsm/flask/policy}/security_classes (100%) > > diff --git a/tools/flask/policy/Makefile b/tools/flask/policy/Makefile > index 5c25cbe..3f5aa38 100644 > --- a/tools/flask/policy/Makefile > +++ b/tools/flask/policy/Makefile > @@ -61,7 +61,7 @@ LOADPOLICY := $(SBINDIR)/flask-loadpolicy > # policy source layout > POLDIR := policy > MODDIR := $(POLDIR)/modules > -FLASKDIR := $(POLDIR)/flask > +FLASKDIR := ../../../xen/xsm/flask/policy > SECCLASS := $(FLASKDIR)/security_classes > ISIDS := $(FLASKDIR)/initial_sids > AVS := $(FLASKDIR)/access_vectors > diff --git a/tools/flask/policy/policy/flask/Makefile b/tools/flask/policy/policy/flask/Makefile > deleted file mode 100644 > index 5f57e88..0000000 > --- a/tools/flask/policy/policy/flask/Makefile > +++ /dev/null > @@ -1,41 +0,0 @@ > -# flask needs to know where to export the libselinux headers. > -LIBSEL ?= ../../libselinux > - > -# flask needs to know where to export the kernel headers. 
> -LINUXDIR ?= ../../../linux-2.6 > - > -AWK = awk > - > -CONFIG_SHELL := $(shell if [ -x "$$BASH" ]; then echo $$BASH; \ > - else if [ -x /bin/bash ]; then echo /bin/bash; \ > - else echo sh; fi ; fi) > - > -FLASK_H_DEPEND = security_classes initial_sids > -AV_H_DEPEND = access_vectors > - > -FLASK_H_FILES = class_to_string.h flask.h initial_sid_to_string.h > -AV_H_FILES = av_perm_to_string.h av_permissions.h > -ALL_H_FILES = $(FLASK_H_FILES) $(AV_H_FILES) > - > -all: $(ALL_H_FILES) > - > -$(FLASK_H_FILES): $(FLASK_H_DEPEND) > - $(CONFIG_SHELL) mkflask.sh $(AWK) $(FLASK_H_DEPEND) > - > -$(AV_H_FILES): $(AV_H_DEPEND) > - $(CONFIG_SHELL) mkaccess_vector.sh $(AWK) $(AV_H_DEPEND) > - > -tolib: all > - install -m 644 flask.h av_permissions.h $(LIBSEL)/include/selinux > - install -m 644 class_to_string.h av_inherit.h common_perm_to_string.h av_perm_to_string.h $(LIBSEL)/src > - > -tokern: all > - install -m 644 $(ALL_H_FILES) $(LINUXDIR)/security/selinux/include > - > -install: all > - > -relabel: > - > -clean: > - rm -f $(FLASK_H_FILES) > - rm -f $(AV_H_FILES) > diff --git a/xen/xsm/flask/Makefile b/xen/xsm/flask/Makefile > index 92fb410..238495a 100644 > --- a/xen/xsm/flask/Makefile > +++ b/xen/xsm/flask/Makefile > @@ -5,3 +5,24 @@ obj-y += flask_op.o > subdir-y += ss > > CFLAGS += -I./include > + > +AWK = awk > + > +CONFIG_SHELL := $(shell if [ -x "$$BASH" ]; then echo $$BASH; \ > + else if [ -x /bin/bash ]; then echo /bin/bash; \ > + else echo sh; fi ; fi) > + > +FLASK_H_DEPEND = policy/security_classes policy/initial_sids > +AV_H_DEPEND = policy/access_vectors > + > +FLASK_H_FILES = include/flask.h include/class_to_string.h include/initial_sid_to_string.h > +AV_H_FILES = include/av_perm_to_string.h include/av_permissions.h > +ALL_H_FILES = $(FLASK_H_FILES) $(AV_H_FILES) > + > +$(obj-y) ss/built_in.o: $(ALL_H_FILES) > + > +$(FLASK_H_FILES): $(FLASK_H_DEPEND) > + $(CONFIG_SHELL) policy/mkflask.sh $(AWK) $(FLASK_H_DEPEND) > + > +$(AV_H_FILES): $(AV_H_DEPEND) > + 
$(CONFIG_SHELL) policy/mkaccess_vector.sh $(AWK) $(AV_H_DEPEND) > diff --git a/xen/xsm/flask/include/av_perm_to_string.h b/xen/xsm/flask/include/av_perm_to_string.h > deleted file mode 100644 > index c3f2370..0000000 > --- a/xen/xsm/flask/include/av_perm_to_string.h > +++ /dev/null > @@ -1,147 +0,0 @@ > -/* This file is automatically generated. Do not edit. */ > - S_(SECCLASS_XEN, XEN__SCHEDULER, "scheduler") > - S_(SECCLASS_XEN, XEN__SETTIME, "settime") > - S_(SECCLASS_XEN, XEN__TBUFCONTROL, "tbufcontrol") > - S_(SECCLASS_XEN, XEN__READCONSOLE, "readconsole") > - S_(SECCLASS_XEN, XEN__CLEARCONSOLE, "clearconsole") > - S_(SECCLASS_XEN, XEN__PERFCONTROL, "perfcontrol") > - S_(SECCLASS_XEN, XEN__MTRR_ADD, "mtrr_add") > - S_(SECCLASS_XEN, XEN__MTRR_DEL, "mtrr_del") > - S_(SECCLASS_XEN, XEN__MTRR_READ, "mtrr_read") > - S_(SECCLASS_XEN, XEN__MICROCODE, "microcode") > - S_(SECCLASS_XEN, XEN__PHYSINFO, "physinfo") > - S_(SECCLASS_XEN, XEN__QUIRK, "quirk") > - S_(SECCLASS_XEN, XEN__WRITECONSOLE, "writeconsole") > - S_(SECCLASS_XEN, XEN__READAPIC, "readapic") > - S_(SECCLASS_XEN, XEN__WRITEAPIC, "writeapic") > - S_(SECCLASS_XEN, XEN__PRIVPROFILE, "privprofile") > - S_(SECCLASS_XEN, XEN__NONPRIVPROFILE, "nonprivprofile") > - S_(SECCLASS_XEN, XEN__KEXEC, "kexec") > - S_(SECCLASS_XEN, XEN__FIRMWARE, "firmware") > - S_(SECCLASS_XEN, XEN__SLEEP, "sleep") > - S_(SECCLASS_XEN, XEN__FREQUENCY, "frequency") > - S_(SECCLASS_XEN, XEN__GETIDLE, "getidle") > - S_(SECCLASS_XEN, XEN__DEBUG, "debug") > - S_(SECCLASS_XEN, XEN__GETCPUINFO, "getcpuinfo") > - S_(SECCLASS_XEN, XEN__HEAP, "heap") > - S_(SECCLASS_XEN, XEN__PM_OP, "pm_op") > - S_(SECCLASS_XEN, XEN__MCA_OP, "mca_op") > - S_(SECCLASS_XEN, XEN__LOCKPROF, "lockprof") > - S_(SECCLASS_XEN, XEN__CPUPOOL_OP, "cpupool_op") > - S_(SECCLASS_XEN, XEN__SCHED_OP, "sched_op") > - S_(SECCLASS_XEN, XEN__TMEM_OP, "tmem_op") > - S_(SECCLASS_XEN, XEN__TMEM_CONTROL, "tmem_control") > - S_(SECCLASS_DOMAIN, DOMAIN__SETVCPUCONTEXT, "setvcpucontext") > - 
S_(SECCLASS_DOMAIN, DOMAIN__PAUSE, "pause") > - S_(SECCLASS_DOMAIN, DOMAIN__UNPAUSE, "unpause") > - S_(SECCLASS_DOMAIN, DOMAIN__RESUME, "resume") > - S_(SECCLASS_DOMAIN, DOMAIN__CREATE, "create") > - S_(SECCLASS_DOMAIN, DOMAIN__TRANSITION, "transition") > - S_(SECCLASS_DOMAIN, DOMAIN__MAX_VCPUS, "max_vcpus") > - S_(SECCLASS_DOMAIN, DOMAIN__DESTROY, "destroy") > - S_(SECCLASS_DOMAIN, DOMAIN__SETVCPUAFFINITY, "setvcpuaffinity") > - S_(SECCLASS_DOMAIN, DOMAIN__GETVCPUAFFINITY, "getvcpuaffinity") > - S_(SECCLASS_DOMAIN, DOMAIN__SCHEDULER, "scheduler") > - S_(SECCLASS_DOMAIN, DOMAIN__GETDOMAININFO, "getdomaininfo") > - S_(SECCLASS_DOMAIN, DOMAIN__GETVCPUINFO, "getvcpuinfo") > - S_(SECCLASS_DOMAIN, DOMAIN__GETVCPUCONTEXT, "getvcpucontext") > - S_(SECCLASS_DOMAIN, DOMAIN__SETDOMAINMAXMEM, "setdomainmaxmem") > - S_(SECCLASS_DOMAIN, DOMAIN__SETDOMAINHANDLE, "setdomainhandle") > - S_(SECCLASS_DOMAIN, DOMAIN__SETDEBUGGING, "setdebugging") > - S_(SECCLASS_DOMAIN, DOMAIN__HYPERCALL, "hypercall") > - S_(SECCLASS_DOMAIN, DOMAIN__SETTIME, "settime") > - S_(SECCLASS_DOMAIN, DOMAIN__SET_TARGET, "set_target") > - S_(SECCLASS_DOMAIN, DOMAIN__SHUTDOWN, "shutdown") > - S_(SECCLASS_DOMAIN, DOMAIN__SETADDRSIZE, "setaddrsize") > - S_(SECCLASS_DOMAIN, DOMAIN__GETADDRSIZE, "getaddrsize") > - S_(SECCLASS_DOMAIN, DOMAIN__TRIGGER, "trigger") > - S_(SECCLASS_DOMAIN, DOMAIN__GETEXTVCPUCONTEXT, "getextvcpucontext") > - S_(SECCLASS_DOMAIN, DOMAIN__SETEXTVCPUCONTEXT, "setextvcpucontext") > - S_(SECCLASS_DOMAIN, DOMAIN__GETVCPUEXTSTATE, "getvcpuextstate") > - S_(SECCLASS_DOMAIN, DOMAIN__SETVCPUEXTSTATE, "setvcpuextstate") > - S_(SECCLASS_DOMAIN, DOMAIN__GETPODTARGET, "getpodtarget") > - S_(SECCLASS_DOMAIN, DOMAIN__SETPODTARGET, "setpodtarget") > - S_(SECCLASS_DOMAIN, DOMAIN__SET_MISC_INFO, "set_misc_info") > - S_(SECCLASS_DOMAIN, DOMAIN__SET_VIRQ_HANDLER, "set_virq_handler") > - S_(SECCLASS_DOMAIN2, DOMAIN2__RELABELFROM, "relabelfrom") > - S_(SECCLASS_DOMAIN2, DOMAIN2__RELABELTO, "relabelto") > - 
S_(SECCLASS_DOMAIN2, DOMAIN2__RELABELSELF, "relabelself") > - S_(SECCLASS_DOMAIN2, DOMAIN2__MAKE_PRIV_FOR, "make_priv_for") > - S_(SECCLASS_DOMAIN2, DOMAIN2__SET_AS_TARGET, "set_as_target") > - S_(SECCLASS_DOMAIN2, DOMAIN2__SET_CPUID, "set_cpuid") > - S_(SECCLASS_DOMAIN2, DOMAIN2__GETTSC, "gettsc") > - S_(SECCLASS_DOMAIN2, DOMAIN2__SETTSC, "settsc") > - S_(SECCLASS_HVM, HVM__SETHVMC, "sethvmc") > - S_(SECCLASS_HVM, HVM__GETHVMC, "gethvmc") > - S_(SECCLASS_HVM, HVM__SETPARAM, "setparam") > - S_(SECCLASS_HVM, HVM__GETPARAM, "getparam") > - S_(SECCLASS_HVM, HVM__PCILEVEL, "pcilevel") > - S_(SECCLASS_HVM, HVM__IRQLEVEL, "irqlevel") > - S_(SECCLASS_HVM, HVM__PCIROUTE, "pciroute") > - S_(SECCLASS_HVM, HVM__BIND_IRQ, "bind_irq") > - S_(SECCLASS_HVM, HVM__CACHEATTR, "cacheattr") > - S_(SECCLASS_HVM, HVM__TRACKDIRTYVRAM, "trackdirtyvram") > - S_(SECCLASS_HVM, HVM__HVMCTL, "hvmctl") > - S_(SECCLASS_HVM, HVM__MEM_EVENT, "mem_event") > - S_(SECCLASS_HVM, HVM__MEM_SHARING, "mem_sharing") > - S_(SECCLASS_HVM, HVM__AUDIT_P2M, "audit_p2m") > - S_(SECCLASS_HVM, HVM__SEND_IRQ, "send_irq") > - S_(SECCLASS_HVM, HVM__SHARE_MEM, "share_mem") > - S_(SECCLASS_EVENT, EVENT__BIND, "bind") > - S_(SECCLASS_EVENT, EVENT__SEND, "send") > - S_(SECCLASS_EVENT, EVENT__STATUS, "status") > - S_(SECCLASS_EVENT, EVENT__NOTIFY, "notify") > - S_(SECCLASS_EVENT, EVENT__CREATE, "create") > - S_(SECCLASS_EVENT, EVENT__RESET, "reset") > - S_(SECCLASS_GRANT, GRANT__MAP_READ, "map_read") > - S_(SECCLASS_GRANT, GRANT__MAP_WRITE, "map_write") > - S_(SECCLASS_GRANT, GRANT__UNMAP, "unmap") > - S_(SECCLASS_GRANT, GRANT__TRANSFER, "transfer") > - S_(SECCLASS_GRANT, GRANT__SETUP, "setup") > - S_(SECCLASS_GRANT, GRANT__COPY, "copy") > - S_(SECCLASS_GRANT, GRANT__QUERY, "query") > - S_(SECCLASS_MMU, MMU__MAP_READ, "map_read") > - S_(SECCLASS_MMU, MMU__MAP_WRITE, "map_write") > - S_(SECCLASS_MMU, MMU__PAGEINFO, "pageinfo") > - S_(SECCLASS_MMU, MMU__PAGELIST, "pagelist") > - S_(SECCLASS_MMU, MMU__ADJUST, "adjust") > - 
S_(SECCLASS_MMU, MMU__STAT, "stat") > - S_(SECCLASS_MMU, MMU__TRANSLATEGP, "translategp") > - S_(SECCLASS_MMU, MMU__UPDATEMP, "updatemp") > - S_(SECCLASS_MMU, MMU__PHYSMAP, "physmap") > - S_(SECCLASS_MMU, MMU__PINPAGE, "pinpage") > - S_(SECCLASS_MMU, MMU__MFNLIST, "mfnlist") > - S_(SECCLASS_MMU, MMU__MEMORYMAP, "memorymap") > - S_(SECCLASS_MMU, MMU__REMOTE_REMAP, "remote_remap") > - S_(SECCLASS_MMU, MMU__MMUEXT_OP, "mmuext_op") > - S_(SECCLASS_MMU, MMU__EXCHANGE, "exchange") > - S_(SECCLASS_SHADOW, SHADOW__DISABLE, "disable") > - S_(SECCLASS_SHADOW, SHADOW__ENABLE, "enable") > - S_(SECCLASS_SHADOW, SHADOW__LOGDIRTY, "logdirty") > - S_(SECCLASS_RESOURCE, RESOURCE__ADD, "add") > - S_(SECCLASS_RESOURCE, RESOURCE__REMOVE, "remove") > - S_(SECCLASS_RESOURCE, RESOURCE__USE, "use") > - S_(SECCLASS_RESOURCE, RESOURCE__ADD_IRQ, "add_irq") > - S_(SECCLASS_RESOURCE, RESOURCE__REMOVE_IRQ, "remove_irq") > - S_(SECCLASS_RESOURCE, RESOURCE__ADD_IOPORT, "add_ioport") > - S_(SECCLASS_RESOURCE, RESOURCE__REMOVE_IOPORT, "remove_ioport") > - S_(SECCLASS_RESOURCE, RESOURCE__ADD_IOMEM, "add_iomem") > - S_(SECCLASS_RESOURCE, RESOURCE__REMOVE_IOMEM, "remove_iomem") > - S_(SECCLASS_RESOURCE, RESOURCE__STAT_DEVICE, "stat_device") > - S_(SECCLASS_RESOURCE, RESOURCE__ADD_DEVICE, "add_device") > - S_(SECCLASS_RESOURCE, RESOURCE__REMOVE_DEVICE, "remove_device") > - S_(SECCLASS_RESOURCE, RESOURCE__PLUG, "plug") > - S_(SECCLASS_RESOURCE, RESOURCE__UNPLUG, "unplug") > - S_(SECCLASS_RESOURCE, RESOURCE__SETUP, "setup") > - S_(SECCLASS_SECURITY, SECURITY__COMPUTE_AV, "compute_av") > - S_(SECCLASS_SECURITY, SECURITY__COMPUTE_CREATE, "compute_create") > - S_(SECCLASS_SECURITY, SECURITY__COMPUTE_MEMBER, "compute_member") > - S_(SECCLASS_SECURITY, SECURITY__CHECK_CONTEXT, "check_context") > - S_(SECCLASS_SECURITY, SECURITY__LOAD_POLICY, "load_policy") > - S_(SECCLASS_SECURITY, SECURITY__COMPUTE_RELABEL, "compute_relabel") > - S_(SECCLASS_SECURITY, SECURITY__COMPUTE_USER, "compute_user") > - 
S_(SECCLASS_SECURITY, SECURITY__SETENFORCE, "setenforce") > - S_(SECCLASS_SECURITY, SECURITY__SETBOOL, "setbool") > - S_(SECCLASS_SECURITY, SECURITY__SETSECPARAM, "setsecparam") > - S_(SECCLASS_SECURITY, SECURITY__ADD_OCONTEXT, "add_ocontext") > - S_(SECCLASS_SECURITY, SECURITY__DEL_OCONTEXT, "del_ocontext") > diff --git a/xen/xsm/flask/include/av_permissions.h b/xen/xsm/flask/include/av_permissions.h > deleted file mode 100644 > index 65302e8..0000000 > --- a/xen/xsm/flask/include/av_permissions.h > +++ /dev/null > @@ -1,157 +0,0 @@ > -/* This file is automatically generated. Do not edit. */ > -#define XEN__SCHEDULER 0x00000001UL > -#define XEN__SETTIME 0x00000002UL > -#define XEN__TBUFCONTROL 0x00000004UL > -#define XEN__READCONSOLE 0x00000008UL > -#define XEN__CLEARCONSOLE 0x00000010UL > -#define XEN__PERFCONTROL 0x00000020UL > -#define XEN__MTRR_ADD 0x00000040UL > -#define XEN__MTRR_DEL 0x00000080UL > -#define XEN__MTRR_READ 0x00000100UL > -#define XEN__MICROCODE 0x00000200UL > -#define XEN__PHYSINFO 0x00000400UL > -#define XEN__QUIRK 0x00000800UL > -#define XEN__WRITECONSOLE 0x00001000UL > -#define XEN__READAPIC 0x00002000UL > -#define XEN__WRITEAPIC 0x00004000UL > -#define XEN__PRIVPROFILE 0x00008000UL > -#define XEN__NONPRIVPROFILE 0x00010000UL > -#define XEN__KEXEC 0x00020000UL > -#define XEN__FIRMWARE 0x00040000UL > -#define XEN__SLEEP 0x00080000UL > -#define XEN__FREQUENCY 0x00100000UL > -#define XEN__GETIDLE 0x00200000UL > -#define XEN__DEBUG 0x00400000UL > -#define XEN__GETCPUINFO 0x00800000UL > -#define XEN__HEAP 0x01000000UL > -#define XEN__PM_OP 0x02000000UL > -#define XEN__MCA_OP 0x04000000UL > -#define XEN__LOCKPROF 0x08000000UL > -#define XEN__CPUPOOL_OP 0x10000000UL > -#define XEN__SCHED_OP 0x20000000UL > -#define XEN__TMEM_OP 0x40000000UL > -#define XEN__TMEM_CONTROL 0x80000000UL > - > -#define DOMAIN__SETVCPUCONTEXT 0x00000001UL > -#define DOMAIN__PAUSE 0x00000002UL > -#define DOMAIN__UNPAUSE 0x00000004UL > -#define DOMAIN__RESUME 0x00000008UL 
> -#define DOMAIN__CREATE              0x00000010UL
> -#define DOMAIN__TRANSITION          0x00000020UL
> -#define DOMAIN__MAX_VCPUS           0x00000040UL
> -#define DOMAIN__DESTROY             0x00000080UL
> -#define DOMAIN__SETVCPUAFFINITY     0x00000100UL
> -#define DOMAIN__GETVCPUAFFINITY     0x00000200UL
> -#define DOMAIN__SCHEDULER           0x00000400UL
> -#define DOMAIN__GETDOMAININFO       0x00000800UL
> -#define DOMAIN__GETVCPUINFO         0x00001000UL
> -#define DOMAIN__GETVCPUCONTEXT      0x00002000UL
> -#define DOMAIN__SETDOMAINMAXMEM     0x00004000UL
> -#define DOMAIN__SETDOMAINHANDLE     0x00008000UL
> -#define DOMAIN__SETDEBUGGING        0x00010000UL
> -#define DOMAIN__HYPERCALL           0x00020000UL
> -#define DOMAIN__SETTIME             0x00040000UL
> -#define DOMAIN__SET_TARGET          0x00080000UL
> -#define DOMAIN__SHUTDOWN            0x00100000UL
> -#define DOMAIN__SETADDRSIZE         0x00200000UL
> -#define DOMAIN__GETADDRSIZE         0x00400000UL
> -#define DOMAIN__TRIGGER             0x00800000UL
> -#define DOMAIN__GETEXTVCPUCONTEXT   0x01000000UL
> -#define DOMAIN__SETEXTVCPUCONTEXT   0x02000000UL
> -#define DOMAIN__GETVCPUEXTSTATE     0x04000000UL
> -#define DOMAIN__SETVCPUEXTSTATE     0x08000000UL
> -#define DOMAIN__GETPODTARGET        0x10000000UL
> -#define DOMAIN__SETPODTARGET        0x20000000UL
> -#define DOMAIN__SET_MISC_INFO       0x40000000UL
> -#define DOMAIN__SET_VIRQ_HANDLER    0x80000000UL
> -
> -#define DOMAIN2__RELABELFROM        0x00000001UL
> -#define DOMAIN2__RELABELTO          0x00000002UL
> -#define DOMAIN2__RELABELSELF        0x00000004UL
> -#define DOMAIN2__MAKE_PRIV_FOR      0x00000008UL
> -#define DOMAIN2__SET_AS_TARGET      0x00000010UL
> -#define DOMAIN2__SET_CPUID          0x00000020UL
> -#define DOMAIN2__GETTSC             0x00000040UL
> -#define DOMAIN2__SETTSC             0x00000080UL
> -
> -#define HVM__SETHVMC                0x00000001UL
> -#define HVM__GETHVMC                0x00000002UL
> -#define HVM__SETPARAM               0x00000004UL
> -#define HVM__GETPARAM               0x00000008UL
> -#define HVM__PCILEVEL               0x00000010UL
> -#define HVM__IRQLEVEL               0x00000020UL
> -#define HVM__PCIROUTE               0x00000040UL
> -#define HVM__BIND_IRQ               0x00000080UL
> -#define HVM__CACHEATTR              0x00000100UL
> -#define HVM__TRACKDIRTYVRAM         0x00000200UL
> -#define HVM__HVMCTL                 0x00000400UL
> -#define HVM__MEM_EVENT              0x00000800UL
> -#define HVM__MEM_SHARING            0x00001000UL
> -#define HVM__AUDIT_P2M              0x00002000UL
> -#define HVM__SEND_IRQ               0x00004000UL
> -#define HVM__SHARE_MEM              0x00008000UL
> -
> -#define EVENT__BIND                 0x00000001UL
> -#define EVENT__SEND                 0x00000002UL
> -#define EVENT__STATUS               0x00000004UL
> -#define EVENT__NOTIFY               0x00000008UL
> -#define EVENT__CREATE               0x00000010UL
> -#define EVENT__RESET                0x00000020UL
> -
> -#define GRANT__MAP_READ             0x00000001UL
> -#define GRANT__MAP_WRITE            0x00000002UL
> -#define GRANT__UNMAP                0x00000004UL
> -#define GRANT__TRANSFER             0x00000008UL
> -#define GRANT__SETUP                0x00000010UL
> -#define GRANT__COPY                 0x00000020UL
> -#define GRANT__QUERY                0x00000040UL
> -
> -#define MMU__MAP_READ               0x00000001UL
> -#define MMU__MAP_WRITE              0x00000002UL
> -#define MMU__PAGEINFO               0x00000004UL
> -#define MMU__PAGELIST               0x00000008UL
> -#define MMU__ADJUST                 0x00000010UL
> -#define MMU__STAT                   0x00000020UL
> -#define MMU__TRANSLATEGP            0x00000040UL
> -#define MMU__UPDATEMP               0x00000080UL
> -#define MMU__PHYSMAP                0x00000100UL
> -#define MMU__PINPAGE                0x00000200UL
> -#define MMU__MFNLIST                0x00000400UL
> -#define MMU__MEMORYMAP              0x00000800UL
> -#define MMU__REMOTE_REMAP           0x00001000UL
> -#define MMU__MMUEXT_OP              0x00002000UL
> -#define MMU__EXCHANGE               0x00004000UL
> -
> -#define SHADOW__DISABLE             0x00000001UL
> -#define SHADOW__ENABLE              0x00000002UL
> -#define SHADOW__LOGDIRTY            0x00000004UL
> -
> -#define RESOURCE__ADD               0x00000001UL
> -#define RESOURCE__REMOVE            0x00000002UL
> -#define RESOURCE__USE               0x00000004UL
> -#define RESOURCE__ADD_IRQ           0x00000008UL
> -#define RESOURCE__REMOVE_IRQ        0x00000010UL
> -#define RESOURCE__ADD_IOPORT        0x00000020UL
> -#define RESOURCE__REMOVE_IOPORT     0x00000040UL
> -#define RESOURCE__ADD_IOMEM         0x00000080UL
> -#define RESOURCE__REMOVE_IOMEM      0x00000100UL
> -#define RESOURCE__STAT_DEVICE       0x00000200UL
> -#define RESOURCE__ADD_DEVICE        0x00000400UL
> -#define RESOURCE__REMOVE_DEVICE     0x00000800UL
> -#define RESOURCE__PLUG              0x00001000UL
> -#define RESOURCE__UNPLUG            0x00002000UL
> -#define RESOURCE__SETUP             0x00004000UL
> -
> -#define SECURITY__COMPUTE_AV        0x00000001UL
> -#define SECURITY__COMPUTE_CREATE    0x00000002UL
> -#define SECURITY__COMPUTE_MEMBER    0x00000004UL
> -#define SECURITY__CHECK_CONTEXT     0x00000008UL
> -#define SECURITY__LOAD_POLICY       0x00000010UL
> -#define SECURITY__COMPUTE_RELABEL   0x00000020UL
> -#define SECURITY__COMPUTE_USER      0x00000040UL
> -#define SECURITY__SETENFORCE        0x00000080UL
> -#define SECURITY__SETBOOL           0x00000100UL
> -#define SECURITY__SETSECPARAM       0x00000200UL
> -#define SECURITY__ADD_OCONTEXT      0x00000400UL
> -#define SECURITY__DEL_OCONTEXT      0x00000800UL
> -
> diff --git a/xen/xsm/flask/include/class_to_string.h b/xen/xsm/flask/include/class_to_string.h
> deleted file mode 100644
> index 7716645..0000000
> --- a/xen/xsm/flask/include/class_to_string.h
> +++ /dev/null
> @@ -1,15 +0,0 @@
> -/* This file is automatically generated.  Do not edit. */
> -/*
> - * Security object class definitions
> - */
> -    S_("null")
> -    S_("xen")
> -    S_("domain")
> -    S_("domain2")
> -    S_("hvm")
> -    S_("mmu")
> -    S_("resource")
> -    S_("shadow")
> -    S_("event")
> -    S_("grant")
> -    S_("security")
> diff --git a/xen/xsm/flask/include/flask.h b/xen/xsm/flask/include/flask.h
> deleted file mode 100644
> index 3bff998..0000000
> --- a/xen/xsm/flask/include/flask.h
> +++ /dev/null
> @@ -1,35 +0,0 @@
> -/* This file is automatically generated.  Do not edit. */
> -#ifndef _SELINUX_FLASK_H_
> -#define _SELINUX_FLASK_H_
> -
> -/*
> - * Security object class definitions
> - */
> -#define SECCLASS_XEN         1
> -#define SECCLASS_DOMAIN      2
> -#define SECCLASS_DOMAIN2     3
> -#define SECCLASS_HVM         4
> -#define SECCLASS_MMU         5
> -#define SECCLASS_RESOURCE    6
> -#define SECCLASS_SHADOW      7
> -#define SECCLASS_EVENT       8
> -#define SECCLASS_GRANT       9
> -#define SECCLASS_SECURITY    10
> -
> -/*
> - * Security identifier indices for initial entities
> - */
> -#define SECINITSID_XEN       1
> -#define SECINITSID_DOM0      2
> -#define SECINITSID_DOMIO     3
> -#define SECINITSID_DOMXEN    4
> -#define SECINITSID_UNLABELED 5
> -#define SECINITSID_SECURITY  6
> -#define SECINITSID_IOPORT    7
> -#define SECINITSID_IOMEM     8
> -#define SECINITSID_IRQ       9
> -#define SECINITSID_DEVICE    10
> -
> -#define SECINITSID_NUM       10
> -
> -#endif
> diff --git a/xen/xsm/flask/include/initial_sid_to_string.h b/xen/xsm/flask/include/initial_sid_to_string.h
> deleted file mode 100644
> index 814f4bf..0000000
> --- a/xen/xsm/flask/include/initial_sid_to_string.h
> +++ /dev/null
> @@ -1,16 +0,0 @@
> -/* This file is automatically generated.  Do not edit. */
> -static char *initial_sid_to_string[] =
> -{
> -    "null",
> -    "xen",
> -    "dom0",
> -    "domio",
> -    "domxen",
> -    "unlabeled",
> -    "security",
> -    "ioport",
> -    "iomem",
> -    "irq",
> -    "device",
> -};
> -
> diff --git a/tools/flask/policy/policy/flask/access_vectors b/xen/xsm/flask/policy/access_vectors
> similarity index 100%
> rename from tools/flask/policy/policy/flask/access_vectors
> rename to xen/xsm/flask/policy/access_vectors
> diff --git a/tools/flask/policy/policy/flask/initial_sids b/xen/xsm/flask/policy/initial_sids
> similarity index 100%
> rename from tools/flask/policy/policy/flask/initial_sids
> rename to xen/xsm/flask/policy/initial_sids
> diff --git a/tools/flask/policy/policy/flask/mkaccess_vector.sh b/xen/xsm/flask/policy/mkaccess_vector.sh
> similarity index 97%
> rename from tools/flask/policy/policy/flask/mkaccess_vector.sh
> rename to xen/xsm/flask/policy/mkaccess_vector.sh
> index 43a60a7..8ec87f7 100644
> --- a/tools/flask/policy/policy/flask/mkaccess_vector.sh
> +++ b/xen/xsm/flask/policy/mkaccess_vector.sh
> @@ -9,8 +9,8 @@ awk=$1
>  shift
>  
>  # output files
> -av_permissions="av_permissions.h"
> -av_perm_to_string="av_perm_to_string.h"
> +av_permissions="include/av_permissions.h"
> +av_perm_to_string="include/av_perm_to_string.h"
>  
>  cat $* | $awk "
>  BEGIN {
> diff --git a/tools/flask/policy/policy/flask/mkflask.sh b/xen/xsm/flask/policy/mkflask.sh
> similarity index 95%
> rename from tools/flask/policy/policy/flask/mkflask.sh
> rename to xen/xsm/flask/policy/mkflask.sh
> index 9c84754..e8d8fb5 100644
> --- a/tools/flask/policy/policy/flask/mkflask.sh
> +++ b/xen/xsm/flask/policy/mkflask.sh
> @@ -9,9 +9,9 @@ awk=$1
>  shift 1
>  
>  # output file
> -output_file="flask.h"
> -debug_file="class_to_string.h"
> -debug_file2="initial_sid_to_string.h"
> +output_file="include/flask.h"
> +debug_file="include/class_to_string.h"
> +debug_file2="include/initial_sid_to_string.h"
>  
>  cat $* | $awk "
>  BEGIN {
> diff --git a/tools/flask/policy/policy/flask/security_classes b/xen/xsm/flask/policy/security_classes
> similarity index 100%
> rename from tools/flask/policy/policy/flask/security_classes
> rename to xen/xsm/flask/policy/security_classes
> -- 
> 1.7.11.4
>
Dario Faggioli
2012-Oct-10 08:44 UTC
Re: [PATCH RFC] flask: move policy header sources into hypervisor
Hello Daniel,

On Tue, 2012-10-09 at 14:31 -0400, Daniel De Graaf wrote:
> Ian Campbell wrote:
> [...]
> >>> +++ b/xen/xsm/flask/include/av_perm_to_string.h
> > Also, in that case why is this file checked in?
>
> This patch fixes the autogenerated files, but doesn't fully wire them in
> to things like "make clean" or .{git,hg}ignore.
>
Forgive me for pushing but, while you're here, do you mind taking a look
and sharing your thoughts about the hunks of the patch touching XSM and
FLASK? As I said, I've very little experience with that part of Xen, and
it would be wonderful to know whether what I did looks sane, or whether
I messed something up! :-P

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Dario Faggioli
2012-Oct-10 08:46 UTC
Re: [PATCH 8 of 8] xl: add node-affinity to the output of `xl list`
On Tue, 2012-10-09 at 16:03 +0100, Ian Jackson wrote:
> > Which is what made me thinking that opacity was not its first concern in
> > the first place, and that turning it into being opaque was none of this
> > change's business. :-)
>
> You are right that since you're just moving the code, it's not a
> problem for this patch.
>
Ok.

> > However, I see your point... Perhaps I can add two functions (something
> > like print_{cpumap,nodemap}()), both calling the original
> > print_bitmap(), and deal with the "any {cpu,node}" case within them...
> >
> > Do you like that better?
>
> That would indeed be an improvement.
>
I think I'll go for it then. It's a small effort, and I think the final
results would be better too.

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
George Dunlap
2012-Oct-10 10:55 UTC
Re: [PATCH 7 of 8] libxl: automatic placement deals with node-affinity
On Fri, Oct 5, 2012 at 3:08 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> Which basically means the following two things:
>  1) during domain creation, it is the node-affinity of
>     the domain --rather than the vcpu-affinities of its
>     vcpus-- that is affected by automatic placement;
>  2) during automatic placement, when counting how many
>     vcpus are already "bound" to a placement candidate
>     (as part of the process of choosing the best
>     candidate), node-affinity is also considered,
>     together with vcpu-affinity.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

>
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -133,13 +133,13 @@ static int numa_place_domain(libxl__gc *
>  {
>      int found;
>      libxl__numa_candidate candidate;
> -    libxl_bitmap candidate_nodemap;
> +    libxl_bitmap cpupool_nodemap;
>      libxl_cpupoolinfo cpupool_info;
>      int i, cpupool, rc = 0;
>      uint32_t memkb;
> 
>      libxl__numa_candidate_init(&candidate);
> -    libxl_bitmap_init(&candidate_nodemap);
> +    libxl_bitmap_init(&cpupool_nodemap);
> 
>      /*
>       * Extract the cpumap from the cpupool the domain belong to. In fact,
> @@ -156,7 +156,7 @@ static int numa_place_domain(libxl__gc *
>      rc = libxl_domain_need_memory(CTX, info, &memkb);
>      if (rc)
>          goto out;
> -    if (libxl_node_bitmap_alloc(CTX, &candidate_nodemap, 0)) {
> +    if (libxl_node_bitmap_alloc(CTX, &cpupool_nodemap, 0)) {
>          rc = ERROR_FAIL;
>          goto out;
>      }
> @@ -174,17 +174,19 @@ static int numa_place_domain(libxl__gc *
>      if (found == 0)
>          goto out;
> 
> -    /* Map the candidate's node map to the domain's info->cpumap */
> -    libxl__numa_candidate_get_nodemap(gc, &candidate, &candidate_nodemap);
> -    rc = libxl_nodemap_to_cpumap(CTX, &candidate_nodemap, &info->cpumap);
> +    /* Map the candidate's node map to the domain's info->nodemap */
> +    libxl__numa_candidate_get_nodemap(gc, &candidate, &info->nodemap);
> +
> +    /* Avoid trying to set the affinity to nodes that might be in the
> +     * candidate's nodemap but out of our cpupool. */
> +    rc = libxl_cpumap_to_nodemap(CTX, &cpupool_info.cpumap,
> +                                 &cpupool_nodemap);
>      if (rc)
>          goto out;
> 
> -    /* Avoid trying to set the affinity to cpus that might be in the
> -     * nodemap but not in our cpupool. */
> -    libxl_for_each_set_bit(i, info->cpumap) {
> -        if (!libxl_bitmap_test(&cpupool_info.cpumap, i))
> -            libxl_bitmap_reset(&info->cpumap, i);
> +    libxl_for_each_set_bit(i, info->nodemap) {
> +        if (!libxl_bitmap_test(&cpupool_nodemap, i))
> +            libxl_bitmap_reset(&info->nodemap, i);
>      }
> 
>      LOG(DETAIL, "NUMA placement candidate with %d nodes, %d cpus and "
> @@ -193,7 +195,7 @@ static int numa_place_domain(libxl__gc *
> 
>  out:
>      libxl__numa_candidate_dispose(&candidate);
> -    libxl_bitmap_dispose(&candidate_nodemap);
> +    libxl_bitmap_dispose(&cpupool_nodemap);
>      libxl_cpupoolinfo_dispose(&cpupool_info);
>      return rc;
>  }
> @@ -211,10 +213,10 @@ int libxl__build_pre(libxl__gc *gc, uint
>      /*
>       * Check if the domain has any CPU affinity. If not, try to build
>       * up one. In case numa_place_domain() find at least a suitable
> -     * candidate, it will affect info->cpumap accordingly; if it
> +     * candidate, it will affect info->nodemap accordingly; if it
>       * does not, it just leaves it as it is. This means (unless
>       * some weird error manifests) the subsequent call to
> -     * libxl_set_vcpuaffinity_all() will do the actual placement,
> +     * libxl_domain_set_nodeaffinity() will do the actual placement,
>       * whatever that turns out to be.
>       */
>      if (libxl_defbool_val(info->numa_placement)) {
> diff --git a/tools/libxl/libxl_numa.c b/tools/libxl/libxl_numa.c
> --- a/tools/libxl/libxl_numa.c
> +++ b/tools/libxl/libxl_numa.c
> @@ -171,7 +171,7 @@ static int nodemap_to_nr_vcpus(libxl__gc
>                                 const libxl_bitmap *nodemap)
>  {
>      libxl_dominfo *dinfo = NULL;
> -    libxl_bitmap vcpu_nodemap;
> +    libxl_bitmap vcpu_nodemap, dom_nodemap;
>      int nr_doms, nr_cpus;
>      int nr_vcpus = 0;
>      int i, j, k;
> @@ -185,6 +185,12 @@ static int nodemap_to_nr_vcpus(libxl__gc
>          return ERROR_FAIL;
>      }
> 
> +    if (libxl_node_bitmap_alloc(CTX, &dom_nodemap, 0) < 0) {
> +        libxl_dominfo_list_free(dinfo, nr_doms);
> +        libxl_bitmap_dispose(&vcpu_nodemap);
> +        return ERROR_FAIL;
> +    }
> +
>      for (i = 0; i < nr_doms; i++) {
>          libxl_vcpuinfo *vinfo;
>          int nr_dom_vcpus;
> @@ -193,6 +199,9 @@ static int nodemap_to_nr_vcpus(libxl__gc
>          if (vinfo == NULL)
>              continue;
> 
> +        /* Retrieve the domain's node-affinity map (see below) */
> +        libxl_domain_get_nodeaffinity(CTX, dinfo[i].domid, &dom_nodemap);
> +
>          /* For each vcpu of each domain ... */
>          for (j = 0; j < nr_dom_vcpus; j++) {
> 
> @@ -201,9 +210,17 @@ static int nodemap_to_nr_vcpus(libxl__gc
>              libxl_for_each_set_bit(k, vinfo[j].cpumap)
>                  libxl_bitmap_set(&vcpu_nodemap, tinfo[k].node);
> 
> -            /* And check if that map has any intersection with our nodemap */
> +            /*
> +             * We now check whether the && of the vcpu's nodemap and the
> +             * domain's nodemap has any intersection with the nodemap of our
> +             * candidate.
> +             * Using both (vcpu's and domain's) nodemaps allows us to take
> +             * both vcpu-affinity and node-affinity into account when counting
> +             * the number of vcpus bound to the candidate.
> +             */
>              libxl_for_each_set_bit(k, vcpu_nodemap) {
> -                if (libxl_bitmap_test(nodemap, k)) {
> +                if (libxl_bitmap_test(&dom_nodemap, k) &&
> +                    libxl_bitmap_test(nodemap, k)) {
>                      nr_vcpus++;
>                      break;
>                  }
> @@ -213,6 +230,7 @@ static int nodemap_to_nr_vcpus(libxl__gc
>          libxl_vcpuinfo_list_free(vinfo, nr_dom_vcpus);
>      }
> 
> +    libxl_bitmap_dispose(&dom_nodemap);
>      libxl_bitmap_dispose(&vcpu_nodemap);
>      libxl_dominfo_list_free(dinfo, nr_doms);
>      return nr_vcpus;
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
George Dunlap
2012-Oct-10 11:00 UTC
Re: [PATCH 0 of 8] NUMA Awareness for the Credit Scheduler
On Fri, Oct 5, 2012 at 3:08 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> Hi Everyone,
>
> Here it comes a patch series instilling some NUMA awareness in the Credit
> scheduler.

Hey Dario -- I've looked through everything and acked everything I
felt I understood well enough / had the authority to ack. Thanks for
the good work!

 -George

>
> What the patches do is teaching the Xen's scheduler how to try maximizing
> performances on a NUMA host, taking advantage of the information coming from
> the automatic NUMA placement we have in libxl. Right now, the
> placement algorithm runs and selects a node (or a set of nodes) where it is best
> to put a new domain on. Then, all the memory for the new domain is allocated
> from those node(s) and all the vCPUs of the new domain are pinned to the pCPUs
> of those node(s). What we do here is, instead of statically pinning the domain's
> vCPUs to the nodes' pCPUs, have the (Credit) scheduler _prefer_ running them
> there. That enables most of the performances benefits of "real" pinning, but
> without its intrinsic lack of flexibility.
>
> The above happens by extending to the scheduler the knowledge of a domain's
> node-affinity. We then ask it to first try to run the domain's vCPUs on one of
> the nodes the domain has affinity with. Of course, if that turns out to be
> impossible, it falls back on the old behaviour (i.e., considering vcpu-affinity
> only).
>
> Just allow me to mention that NUMA aware scheduling not only is one of the item
> of the NUMA roadmap I'm trying to maintain here
> http://wiki.xen.org/wiki/Xen_NUMA_Roadmap. It is also one of the features we
> decided we want for Xen 4.3 (and thus it is part of the list of such features
> that George is maintaining).
>
> Up to now, I've been able to thoroughly test this only on my 2 NUMA nodes
> testbox, by running the SpecJBB2005 benchmark concurrently on multiple VMs, and
> the results looks really nice. A full set of what I got can be found inside my
> presentation from last XenSummit, which is available here:
>
> http://www.slideshare.net/xen_com_mgr/numa-and-virtualization-the-case-of-xen?ref=http://www.xen.org/xensummit/xs12na_talks/T9.html
>
> However, I rerun some of the tests in these last days (since I changed some
> bits of the implementation) and here's what I got:
>
> -------------------------------------------------------
>  SpecJBB2005 Total Aggregate Throughput
> -------------------------------------------------------
> #VMs   No NUMA affinity    NUMA affinity &     +/- %
>                            scheduling
> -------------------------------------------------------
>    2      34653.273          40243.015        +16.13%
>    4      29883.057          35526.807        +18.88%
>    6      23512.926          27015.786        +14.89%
>    8      19120.243          21825.818        +14.15%
>   10      15676.675          17701.472        +12.91%
>
> Basically, results are consistent with what is shown in the super-nice graphs I
> have in the slides above! :-) As said, this looks nice to me, especially
> considering that my test machine is quite small, i.e., its 2 nodes are very
> close to each others from a latency point of view. I really expect more
> improvement on bigger hardware, where much greater NUMA effect is to be
> expected. Of course, I myself will continue benchmarking (hopefully, on
> systems with more than 2 nodes too), but should anyone want to run its own
> testing, that would be great, so feel free to do that and report results to me
> and/or to the list!
>
> A little bit more about the series:
>
>  1/8 xen, libxc: rename xenctl_cpumap to xenctl_bitmap
>  2/8 xen, libxc: introduce node maps and masks
>
> Is some preparation work.
>
>  3/8 xen: let the (credit) scheduler know about `node affinity`
>
> Is where the vcpu load balancing logic of the credit scheduler is modified to
> support node-affinity.
>
>  4/8 xen: allow for explicitly specifying node-affinity
>  5/8 libxc: allow for explicitly specifying node-affinity
>  6/8 libxl: allow for explicitly specifying node-affinity
>  7/8 libxl: automatic placement deals with node-affinity
>
> Is what wires the in-scheduler node-affinity support with the external world.
> Please, note that patch 4 touches XSM and Flask, which is the area with which I
> have less experience and less chance to test properly. So, if Daniel and/or
> anyone interested in that could take a look and comment, that would be awesome.
>
>  8/8 xl: report node-affinity for domains
>
> Is just some small output enhancement.
>
> Thanks and Regards,
> Dario
>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
Dario Faggioli
2012-Oct-10 12:28 UTC
Re: [PATCH 0 of 8] NUMA Awareness for the Credit Scheduler
On Wed, 2012-10-10 at 12:00 +0100, George Dunlap wrote:
> On Fri, Oct 5, 2012 at 3:08 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> > Hi Everyone,
> >
> > Here it comes a patch series instilling some NUMA awareness in the Credit
> > scheduler.
>
> Hey Dario --
>
Hi!

> I've looked through everything and acked everything I
> felt I understood well enough / had the authority to ack.
>
Yep, I've seen that. Thanks.

> Thanks for
> the good work!
>
Well, thanks to you for the good comments... And be prepared for next
round! :-P

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Daniel De Graaf
2012-Oct-10 14:03 UTC
Re: [PATCH RFC] flask: move policy header sources into hypervisor
On 10/10/2012 04:44 AM, Dario Faggioli wrote:
> Hello Daniel,
>
> On Tue, 2012-10-09 at 14:31 -0400, Daniel De Graaf wrote:
>> Ian Campbell wrote:
>> [...]
>>>>> +++ b/xen/xsm/flask/include/av_perm_to_string.h
>>> Also, in that case why is this file checked in?
>>
>> This patch fixes the autogenerated files, but doesn't fully wire them in
>> to things like "make clean" or .{git,hg}ignore.
>>
> Forgive me for pushing but, while you're here, do you mind taking a look
> and sharing your thoughts about the hunks of the patch touching XSM and
> FLASK? As I said, I've very little experience with that part of Xen, and
> it would be wonderful to know whether what I did looks sane, or whether
> I messed something up! :-P
>
> Thanks and Regards,
> Dario
>

Ah, in my distraction with fixing the autogeneration I neglected to
finish looking at the original patch. The XSM changes look good except
for a missing implementation of the dummy_nodeaffinity() function in
xen/xsm/dummy.c. However, since the implementations of xsm_nodeaffinity
and xsm_vcpuaffinity are identical, it may be simpler to just merge them
into a common xsm_affinity_domctl hook (as is implemented in
xsm/flask/hooks.c) - in that case, just renaming the existing dummy hook
will suffice.

A more general note on the topic of what XSM permissions to use:
normally, each domctl has its own permission, and so adding new domctls
would be done by adding a new permission to the access_vectors file
(which is the source of av_perm_to_string.h). However, for this case, it
seems rather unlikely that one would want to allow access to vcpu
affinity and deny node affinity, so using the same permission for both
accesses is the best solution.

When renaming a permission (such as getvcpuaffinity => getaffinity), the
FLASK policy also needs to be changed - you can normally just grep for
the permission being changed.

The dummy hook would be caught in a compilation with XSM enabled, but I
notice that current xen-unstable will not build due to a patch being
applied out of order (xsm/flask: add domain relabel support requires
rcu_lock_domain_by_any_id, which was added in the prior patch). Adding
Keir to CC since he applied the patch.

-- 
Daniel De Graaf
National Security Agency
Dario Faggioli
2012-Oct-10 14:39 UTC
Re: [PATCH RFC] flask: move policy header sources into hypervisor
On Wed, 2012-10-10 at 15:03 +0100, Daniel De Graaf wrote:
> Ah, in my distraction with fixing the autogeneration I neglected to
> finish looking at the original patch.
>
:-)

> The XSM changes look good except
> for a missing implementation of the dummy_nodeaffinity() function in
> xen/xsm/dummy.c. However, since the implementation of xsm_nodeaffinity
> and xsm_vcpuaffinity are identical, it may be simpler to just merge them
> into a common xsm_affinity_domctl hook (as is implemented in
> xsm/flask/hooks.c) - in that case, just renaming the existing dummy hook
> will suffice.
>
Ok, thanks. I will do that.

> A more general note on the topic of what XSM permissions to use:
> normally, each domctl has its own permission, and so adding new domctls
> would be done by adding a new permission to the access_vectors file
> (which is the source of av_perm_to_string.h). However, for this case, it
> seems rather unlikely that one would want to allow access to vcpu
> affinity and deny node affinity, so using the same permission for both
> accesses is the best solution.
>
Yes, exactly.

Moreover, looking at xen/xsm/flask/include/av_permissions.h where
DOMAIN__{GET,SET}VCPUAFFINITY are, I got the impression that there is
no more space left for DOMAIN__* permissions, as they already go from
0x00000001UL to 0x80000000UL... Is that so?

> When renaming a permission (such as getvcpuaffinity => getaffinity), the
> FLASK policy also needs to be changed - you can normally just grep for
> the permission being changed.
>
Ok and thanks again. I will do that too...

> The dummy hook would be caught in a compilation with XSM enabled, but I
> notice that current xen-unstable will not build due to a patch being
> applied out of order (xsm/flask: add domain relabel support requires
> rcu_lock_domain_by_any_id which was added in the prior patch). Adding
> Keir to CC since he applied the patch.
>
... As well as I will try to check for this for next round (hoping that
by that time the issue you're describing here would be fixed :-)).

Thanks a lot and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Daniel De Graaf
2012-Oct-10 15:32 UTC
Re: [PATCH RFC] flask: move policy header sources into hypervisor
On 10/10/2012 10:39 AM, Dario Faggioli wrote:
[...]
>> A more general note on the topic of what XSM permissions to use:
>> normally, each domctl has its own permission, and so adding new domctls
>> would be done by adding a new permission to the access_vectors file
>> (which is the source of av_perm_to_string.h). However, for this case, it
>> seems rather unlikely that one would want to allow access to vcpu
>> affinity and deny node affinity, so using the same permission for both
>> accesses is the best solution.
>>
> Yes, exactly.
>
> Moreover, looking at xen/xsm/flask/include/av_permissions.h where
> DOMAIN__{GET,SET}VCPUAFFINITY are, I got the impression that there is
> no more space left for DOMAIN__* permissions, as they already go from
> 0x00000001UL to 0x80000000UL... Is that so?

Yes. My XSM patch series expands this by adding SECCLASS_DOMAIN2 to
address this (and that part is already in 4.3). This solution can be
applied to any XSM classes needing more than 32 permission bits.

-- 
Daniel De Graaf
National Security Agency
Dario Faggioli
2012-Oct-10 16:18 UTC
Re: [PATCH 0 of 8] NUMA Awareness for the Credit Scheduler
On Mon, 2012-10-08 at 12:43 -0700, Dan Magenheimer wrote:
> Just wondering... is the NUMA information preserved on live migration?
> I'm not saying that it necessarily should, but it may just work
> due to the implementation (since migration is a form of domain creation).
> In either case, it might be good to comment about live migration
> on your wiki.
>
FYI:

http://wiki.xen.org/wiki/Xen_NUMA_Introduction
http://wiki.xen.org/wiki?title=Xen_NUMA_Introduction&diff=5327&oldid=4598

As per the NUMA roadmap ( http://wiki.xen.org/wiki/Xen_NUMA_Roadmap ),
you can see here [1] that it was already there. :-)

Regards,
Dario

[1] http://wiki.xen.org/wiki/Xen_NUMA_Roadmap#Virtual_NUMA_topology_exposure_to_guests

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel