This series of patches introduces vNUMA topology awareness and
provides interfaces and data structures to enable vNUMA for
PV domU guests.
vNUMA topology support must also be present in the PV guest kernel;
the corresponding Linux patches should be applied.
Introduction
-------------
vNUMA topology is exposed to the PV guest to improve performance when running
workloads on NUMA machines.
The Xen vNUMA implementation provides a way to create vNUMA-enabled guests
on NUMA/UMA machines and to map the vNUMA topology onto the physical NUMA
topology in an optimal way.
Xen vNUMA support
-----------------
The current set of patches introduces a subop hypercall that is available
to enlightened PV guests with the vNUMA patches applied.
The domain structure was modified to reflect the per-domain vNUMA topology
for use by other vNUMA-aware subsystems (e.g. ballooning).
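
For illustration, the per-domain topology returned by the new subop could
look roughly like the sketch below. The names here are hypothetical, not the
exact interface; see patch #1 for the real definitions:

/* Illustrative sketch only; field names are hypothetical. */
struct vnuma_topology_info {
    domid_t domid;                            /* domain being queried */
    uint16_t nr_vnodes;                       /* number of virtual nodes */
    XEN_GUEST_HANDLE_64(uint) vdistance;      /* nr_vnodes x nr_vnodes matrix */
    XEN_GUEST_HANDLE_64(uint) vcpu_to_vnode;  /* one entry per vcpu */
    XEN_GUEST_HANDLE_64(uint64_t) vmemrange;  /* per-vnode memory ranges */
};

An enlightened guest would issue the subop with its guest handles filled in,
and the hypervisor would copy back the topology that the toolstack configured
at domain creation time.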
libxc
-----
libxc provides interfaces to build PV guests with vNUMA support and, in the
case of NUMA machines, performs the initial memory allocation on physical
NUMA nodes. This is implemented by utilizing the node map formed by automatic
NUMA placement. Details are in patch #3.
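
Conceptually, the allocation loop could look like the hedged sketch below. It
is not the exact code from patch #3, but it uses only existing libxc/Xen
interfaces (xc_domain_populate_physmap_exact() and XENMEMF_exact_node());
vnode_pages[] and vnode_to_pnode[] are assumed to come from the vNUMA config
and the placement node map:

#include <xenctrl.h>

/* Sketch: allocate each vnode's pages on its assigned physical node. */
static int populate_vnuma_memory(xc_interface *xch, uint32_t domid,
                                 unsigned int nr_vnodes,
                                 const unsigned long *vnode_pages,
                                 const unsigned int *vnode_to_pnode,
                                 xen_pfn_t *p2m)
{
    unsigned long pfn = 0;
    unsigned int i;

    for ( i = 0; i < nr_vnodes; i++ )
    {
        /* Pin this vnode's pages to its physical node, or fail. */
        int rc = xc_domain_populate_physmap_exact(xch, domid,
                         vnode_pages[i], 0 /* order */,
                         XENMEMF_exact_node(vnode_to_pnode[i]),
                         &p2m[pfn]);
        if ( rc != 0 )
            return rc;
        pfn += vnode_pages[i];
    }
    return 0;
}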
libxl
-----
libxl provides a way to predefine the vNUMA topology in the VM config: the
number of vnodes, the memory arrangement, the vcpu-to-vnode assignment, and
the distance map.
PV guest
--------
As of now, only PV guests can take advantage of the vNUMA functionality. The
vNUMA Linux patches should be applied and NUMA support should be compiled
into the kernel.
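
Once the guest kernel has retrieved the topology, it can hand it to the stock
x86 NUMA code. Below is a hedged sketch of the guest side (the actual Linux
patches differ in detail; numa_add_memblk() and numa_set_distance() are
existing arch/x86 NUMA interfaces):

/* Sketch: register retrieved vNUMA info with the kernel's NUMA code. */
static int __init xen_vnuma_register(unsigned int nr_vnodes,
                                     const u64 *start, const u64 *end,
                                     const unsigned int *distance)
{
    unsigned int i, j;

    /* These produce the "vNUMA: memblk[i]" ranges in the boot log below. */
    for (i = 0; i < nr_vnodes; i++)
        numa_add_memblk(i, start[i], end[i]);

    /* Fills the distance table ("NUMA: Initialized distance table"). */
    for (i = 0; i < nr_vnodes; i++)
        for (j = 0; j < nr_vnodes; j++)
            numa_set_distance(i, j, distance[i * nr_vnodes + j]);

    return 0;
}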
Example of booting a vNUMA-enabled PV domU:
NUMA machine:
cpu_topology :
cpu: core socket node
0: 0 0 0
1: 1 0 0
2: 2 0 0
3: 3 0 0
4: 0 1 1
5: 1 1 1
6: 2 1 1
7: 3 1 1
numa_info :
node: memsize memfree distances
0: 17664 12243 10,20
1: 16384 11929 20,10
VM config:
memory = 16384
vcpus = 8
name = "rcbig"
vnodes = 8
vnumamem = "2g, 2g, 2g, 2g, 2g, 2g, 2g, 2g"
vcpu_to_vnode ="5 6 7 4 3 2 1 0"
root@superpipe:~# xl list -n
Name                                    ID   Mem VCPUs      State   Time(s)  NODE Affinity
Domain-0                                 0  4096     1     r-----     581.5  any node
r9                                       1  2048     1     -b----      19.9  0
rc9k1                                    2  2048     6     -b----      21.1  1
rcbig                                    6 16384     8     -b----       4.9  any node
xl debug-keys u:
(XEN) Memory location of each domain:
(XEN) Domain 0 (total: 1048576):
(XEN) Node 0: 510411
(XEN) Node 1: 538165
(XEN) Domain 2 (total: 524288):
(XEN) Node 0: 0
(XEN) Node 1: 524288
(XEN) Domain 3 (total: 4194304):
(XEN) Node 0: 2621440
(XEN) Node 1: 1572864
(XEN) Domain has 8 vnodes
(XEN) pnode 0: vnodes: 0 (2048), 1 (2048), 2 (2048), 3 (2048), 4 (2048),
(XEN) pnode 1: vnodes: 5 (2048), 6 (2048), 7 (2048),
(XEN) Domain vcpu to vnode: 5 6 7 4 3 2 1 0
PV Linux boot (domain 3):
[ 0.000000] init_memory_mapping: [mem 0x00100000-0x37fffffff]
[ 0.000000] [mem 0x00100000-0x37fffffff] page 4k
[ 0.000000] RAMDISK: [mem 0x01dd6000-0x0347dfff]
[ 0.000000] vNUMA: memblk[0] - 0x0 0x80000000
[ 0.000000] vNUMA: memblk[1] - 0x80000000 0x100000000
[ 0.000000] vNUMA: memblk[2] - 0x100000000 0x180000000
[ 0.000000] vNUMA: memblk[3] - 0x180000000 0x200000000
[ 0.000000] vNUMA: memblk[4] - 0x200000000 0x280000000
[ 0.000000] vNUMA: memblk[5] - 0x280000000 0x300000000
[ 0.000000] vNUMA: memblk[6] - 0x300000000 0x380000000
[ 0.000000] vNUMA: memblk[7] - 0x380000000 0x400000000
[ 0.000000] NUMA: Initialized distance table, cnt=8
[ 0.000000] Initmem setup node 0 [mem 0x00000000-0x7fffffff]
[ 0.000000] NODE_DATA [mem 0x7ffd9000-0x7fffffff]
[ 0.000000] Initmem setup node 1 [mem 0x80000000-0xffffffff]
[ 0.000000] NODE_DATA [mem 0xfffd9000-0xffffffff]
[ 0.000000] Initmem setup node 2 [mem 0x100000000-0x17fffffff]
[ 0.000000] NODE_DATA [mem 0x17ffd9000-0x17fffffff]
[ 0.000000] Initmem setup node 3 [mem 0x180000000-0x1ffffffff]
[ 0.000000] NODE_DATA [mem 0x1fffd9000-0x1ffffffff]
[ 0.000000] Initmem setup node 4 [mem 0x200000000-0x27fffffff]
[ 0.000000] NODE_DATA [mem 0x27ffd9000-0x27fffffff]
[ 0.000000] Initmem setup node 5 [mem 0x280000000-0x2ffffffff]
[ 0.000000] NODE_DATA [mem 0x2fffd9000-0x2ffffffff]
[ 0.000000] Initmem setup node 6 [mem 0x300000000-0x37fffffff]
[ 0.000000] NODE_DATA [mem 0x37ffd9000-0x37fffffff]
[ 0.000000] Initmem setup node 7 [mem 0x380000000-0x3ffffffff]
[ 0.000000] NODE_DATA [mem 0x3fdff7000-0x3fe01dfff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x00001000-0x00ffffff]
[ 0.000000] DMA32 [mem 0x01000000-0xffffffff]
[ 0.000000] Normal [mem 0x100000000-0x3ffffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x00001000-0x0009ffff]
[ 0.000000] node 0: [mem 0x00100000-0x7fffffff]
[ 0.000000] node 1: [mem 0x80000000-0xffffffff]
[ 0.000000] node 2: [mem 0x100000000-0x17fffffff]
[ 0.000000] node 3: [mem 0x180000000-0x1ffffffff]
[ 0.000000] node 4: [mem 0x200000000-0x27fffffff]
[ 0.000000] node 5: [mem 0x280000000-0x2ffffffff]
[ 0.000000] node 6: [mem 0x300000000-0x37fffffff]
[ 0.000000] node 7: [mem 0x380000000-0x3ffffffff]
[ 0.000000] On node 0 totalpages: 524191
[ 0.000000] DMA zone: 56 pages used for memmap
[ 0.000000] DMA zone: 21 pages reserved
[ 0.000000] DMA zone: 3999 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 7112 pages used for memmap
[ 0.000000] DMA32 zone: 520192 pages, LIFO batch:31
[ 0.000000] On node 1 totalpages: 524288
[ 0.000000] DMA32 zone: 7168 pages used for memmap
[ 0.000000] DMA32 zone: 524288 pages, LIFO batch:31
[ 0.000000] On node 2 totalpages: 524288
[ 0.000000] Normal zone: 7168 pages used for memmap
[ 0.000000] Normal zone: 524288 pages, LIFO batch:31
[ 0.000000] On node 3 totalpages: 524288
[ 0.000000] Normal zone: 7168 pages used for memmap
[ 0.000000] Normal zone: 524288 pages, LIFO batch:31
[ 0.000000] On node 4 totalpages: 524288
[ 0.000000] Normal zone: 7168 pages used for memmap
[ 0.000000] Normal zone: 524288 pages, LIFO batch:31
[ 0.000000] On node 5 totalpages: 524288
[ 0.000000] Normal zone: 7168 pages used for memmap
[ 0.000000] Normal zone: 524288 pages, LIFO batch:31
[ 0.000000] On node 6 totalpages: 524288
[ 0.000000] Normal zone: 7168 pages used for memmap
[ 0.000000] Normal zone: 524288 pages, LIFO batch:31
[ 0.000000] On node 7 totalpages: 524288
[ 0.000000] Normal zone: 7168 pages used for memmap
[ 0.000000] Normal zone: 524288 pages, LIFO batch:31
[ 0.000000] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org
[ 0.000000] smpboot: Allowing 8 CPUs, 0 hotplug CPUs
[ 0.000000] No local APIC present
[ 0.000000] APIC: disable apic facility
[ 0.000000] APIC: switched to apic NOOP
[ 0.000000] nr_irqs_gsi: 16
[ 0.000000] Booting paravirtualized kernel on Xen
[ 0.000000] Xen version: 4.4-unstable (preserve-AD)
[ 0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:8 nr_node_ids:8
[ 0.000000] PERCPU: Embedded 28 pages/cpu @ffff88007fc00000 s85120 r8192 d21376 u2097152
[ 0.000000] pcpu-alloc: s85120 r8192 d21376 u2097152 alloc=1*2097152
[ 0.000000] pcpu-alloc: [0] 0 [1] 1 [2] 2 [3] 3 [4] 4 [5] 5 [6] 6 [7] 7
[ 0.000000] Built 8 zonelists in Node order, mobility grouping on. Total pages: 4136842
numactl within the running guest:
root@heatpipe:~# numactl --ha
available: 8 nodes (0-7)
node 0 cpus: 7
node 0 size: 2047 MB
node 0 free: 2001 MB
node 1 cpus: 6
node 1 size: 2048 MB
node 1 free: 2008 MB
node 2 cpus: 5
node 2 size: 2048 MB
node 2 free: 2010 MB
node 3 cpus: 4
node 3 size: 2048 MB
node 3 free: 2009 MB
node 4 cpus: 3
node 4 size: 2048 MB
node 4 free: 2009 MB
node 5 cpus: 0
node 5 size: 2048 MB
node 5 free: 1982 MB
node 6 cpus: 1
node 6 size: 2048 MB
node 6 free: 2008 MB
node 7 cpus: 2
node 7 size: 2048 MB
node 7 free: 1944 MB
node distances:
node 0 1 2 3 4 5 6 7
0: 10 20 20 20 20 20 20 20
1: 20 10 20 20 20 20 20 20
2: 20 20 10 20 20 20 20 20
3: 20 20 20 10 20 20 20 20
4: 20 20 20 20 10 20 20 20
5: 20 20 20 20 20 10 20 20
6: 20 20 20 20 20 20 10 20
7: 20 20 20 20 20 20 20 10
root@heatpipe:~# numastat -c
Per-node numastat info (in MBs):
Node 0 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7 Total
------ ------ ------ ------ ------ ------ ------ ------ -----
Numa_Hit 37 43 35 42 43 97 45 58 401
Numa_Miss 0 0 0 0 0 0 0 0 0
Numa_Foreign 0 0 0 0 0 0 0 0 0
Interleave_Hit 7 7 7 7 7 7 7 7 56
Local_Node 28 34 26 33 34 97 36 49 336
Other_Node 9 9 9 9 9 0 9 9 65
The patchset applies to the latest Xen tree,
commit e008e9119d03852020b93e1d4da9a80ec1af9c75,
and is available at http://git.gitorious.org/xenvnuma/xenvnuma.git
Elena Ufimtseva (7):
Xen vNUMA for PV guests.
Per-domain vNUMA initialization.
vNUMA nodes allocation on NUMA nodes.
vNUMA libxl supporting functionality.
vNUMA VM config parsing functions
xl.cfg documentation update for vNUMA.
NUMA debug-key additional output for vNUMA.
docs/man/xl.cfg.pod.5 | 50 +++++++++++
tools/libxc/xc_dom.h | 9 ++
tools/libxc/xc_dom_x86.c | 77 ++++++++++++++--
tools/libxc/xc_domain.c | 57 ++++++++++++
tools/libxc/xenctrl.h | 9 ++
tools/libxc/xg_private.h | 1 +
tools/libxl/libxl.c | 19 ++++
tools/libxl/libxl.h | 20 ++++-
tools/libxl/libxl_arch.h | 5 ++
tools/libxl/libxl_dom.c | 105 +++++++++++++++++++++-
tools/libxl/libxl_internal.h | 3 +
tools/libxl/libxl_types.idl | 5 +-
tools/libxl/libxl_x86.c | 86 ++++++++++++++++++
tools/libxl/xl_cmdimpl.c | 205 ++++++++++++++++++++++++++++++++++++++++++
xen/arch/x86/numa.c | 23 ++++-
xen/common/domain.c | 25 +++++-
xen/common/domctl.c | 68 +++++++++++++-
xen/common/memory.c | 56 ++++++++++++
xen/include/public/domctl.h | 15 +++-
xen/include/public/memory.h | 9 +-
xen/include/xen/domain.h | 11 +++
xen/include/xen/sched.h | 1 +
xen/include/xen/vnuma.h | 27 ++++++
23 files changed, 869 insertions(+), 17 deletions(-)
create mode 100644 xen/include/xen/vnuma.h
--
1.7.10.4