Michal Privoznik
2015-Feb-10 09:14 UTC
Re: [libvirt-users] HugePages - can't start guest that requires them
On 09.02.2015 18:19, G. Richard Bellamy wrote:
> First I'll quickly summarize my understanding of how to configure numa...
>
> In "//memoryBacking/hugepages/page[@nodeset]" I am telling libvirt to
> use hugepages for the guest, and to get those hugepages from a
> particular host NUMA node.

No, @nodeset refers to guest NUMA nodes.

>
> In "//numatune/memory[@nodeset]" I am telling libvirt to pin the
> memory allocation to the guest from a particular host numa node.

The <memory/> element tells libvirt what to do with guest NUMA nodes
that are not explicitly pinned.

> In "//numatune/memnode[@nodeset]" I am telling libvirt which guest
> NUMA node (cellid) should come from which host NUMA node (nodeset).

Correct. This way you can explicitly pin guest NUMA nodes onto host
NUMA nodes.

>
> In "//cpu/numa/cell[@id]" I am telling libvirt how much memory to
> allocate to each guest NUMA node (cell).

Yes. Each <cell/> creates a guest NUMA node. It interconnects vCPUs and
guest memory - which vCPUs should lie in which guest NUMA node, and how
much memory should be available for that particular guest NUMA node.

>
> Basically, I thought "nodeset", regardless of where it existed in the
> domain xml, referred to the host's NUMA node, and "cell" (<cell id=/>
> or @cellid) refers to the guest's NUMA node.
>
> However....
>
> Atlas [1] starts without issue, prometheus [2] fails with "libvirtd[]:
> hugepages: node 2 not found". I found a patch that contains the code
> responsible for throwing this error [3],
>
> +    if (def->cpu && def->cpu->ncells) {
> +        /* Fortunately, we allow only guest NUMA nodes to be continuous
> +         * starting from zero. */
> +        pos = def->cpu->ncells - 1;
> +    }
> +
> +    next_bit = virBitmapNextSetBit(page->nodemask, pos);
> +    if (next_bit >= 0) {
> +        virReportError(VIR_ERR_XML_DETAIL,
> +                       _("hugepages: node %zd not found"),
> +                       next_bit);
> +        return -1;
> +    }
>
> Without digging too deeply into the actual code, and just inferring
> from the above, it looks like we are reading the number of cells set
> in "//cpu/numa" with def->cpu->ncells, and comparing it to the number
> of nodesets in "//memoryBacking//hugepages". I think this means that I
> misunderstand what the nodeset is for in that element...
>
> Of note is the fact that my host has non-contiguous NUMA node numbers:
> 2015-02-09 08:53:06
> root@eanna i ~ # numastat
>                            node0           node2
> numa_hit               216225024       440311113
> numa_miss                      0          795018
> numa_foreign              795018               0
> interleave_hit             15835           15783
> local_node             214029815       221903122
> other_node               2195209       219203009
>
> Thanks again for any help.
>

Libvirt should be perfectly able to cope with noncontinuous host NUMA
nodes. However, noncontinuous guest NUMA nodes are not supported yet -
but it shouldn't matter since users have full control over creating
guest NUMA nodes.

Anyway, if you find the documentation incomplete in any sense, any part,
or you feel that rewording some paragraphs may help, feel free to
propose a patch and I'll review it.

Michal
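The rule discussed in this message (a <page nodeset> names guest NUMA nodes, so every node it lists must exist as a <cell>) can be sketched in Python. This is only an illustration of the idea, not libvirt's implementation: the domain XML is a hypothetical example rather than either config from the thread, and parse_nodeset and first_unknown_hugepage_node are made-up helper names. It assumes at least one <cell> is defined and handles only plain comma/range nodeset syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain XML: two guest cells (0 and 1), pinned to host
# nodes 0 and 2. The hugepages nodeset names GUEST nodes, so '0-1' is valid.
DOMAIN_XML = """
<domain type='kvm'>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='0-1'/>
    </hugepages>
  </memoryBacking>
  <numatune>
    <memory mode='strict' nodeset='0,2'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
    <memnode cellid='1' mode='strict' nodeset='2'/>
  </numatune>
  <cpu>
    <numa>
      <cell id='0' cpus='0-3' memory='4194304'/>
      <cell id='1' cpus='4-7' memory='4194304'/>
    </numa>
  </cpu>
</domain>
"""


def parse_nodeset(text):
    """Expand a simple nodeset such as '0-1,3' into a set of ints.

    Exclusions ('^N') are not handled in this sketch.
    """
    nodes = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-")
            nodes.update(range(int(lo), int(hi) + 1))
        else:
            nodes.add(int(part))
    return nodes


def first_unknown_hugepage_node(domain_xml):
    """Return the lowest guest node named by a <page nodeset> that has no
    matching <cell>, or None if every referenced node exists."""
    root = ET.fromstring(domain_xml)
    # Guest cells are assumed to be numbered 0 .. ncells-1.
    ncells = len(root.findall("./cpu/numa/cell"))
    for page in root.findall("./memoryBacking/hugepages/page"):
        nodeset = page.get("nodeset")
        if nodeset is None:
            continue
        unknown = sorted(n for n in parse_nodeset(nodeset) if n >= ncells)
        if unknown:
            return unknown[0]
    return None
```

With nodeset='0,2' on the <page> element and only cells 0 and 1 defined, the helper returns 2, which corresponds to the "hugepages: node 2 not found" message quoted in the message.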
G. Richard Bellamy
2015-Feb-20 20:32 UTC
Re: [libvirt-users] HugePages - can't start guest that requires them
On Tue, Feb 10, 2015 at 1:14 AM, Michal Privoznik <mprivozn@redhat.com> wrote:
> <snip/>
>
> Anyway, if you find the documentation incomplete in any sense, any part,
> or you feel that rewording some paragraphs may help, feel free to
> propose a patch and I'll review it.

Thanks again Michal, I'm slowly zeroing in on a good resolution here.

I think the documentation is clear enough - it's the fact that a guest
NUMA node can be referred to as either cell(id) or nodeset, depending on
element context - that's what threw me.

I've modified my config [1] based on my understanding, and am running
into a new error. Basically I'm hitting the oom-killer [2] even though
the hard_limit [3] of memtune is below the total number of hugepages
set for that NUMA nodeset.

[1] http://sprunge.us/BadI
[2] http://sprunge.us/eELZ
[3] http://sprunge.us/GYXM
Michal Privoznik
2015-Feb-23 07:01 UTC
Re: [libvirt-users] HugePages - can't start guest that requires them
On 20.02.2015 21:32, G. Richard Bellamy wrote:
> <snip/>
>
> I've modified my config [1] based on my understanding, and am running
> into a new error. Basically I'm hitting the oom-killer [2] even though
> the hard_limit [3] of memtune is below the total number of hugepages
> set for that NUMA nodeset.
>

Just drop the hard_limit. It's a black box we should never have
introduced. In Linux, from the kernel's POV, there's no difference
between guest RAM and the memory the hypervisor uses to store its
internal state. It's all one big chunk of memory. And even if you know
the first part (how much memory you're letting the guest have), you
don't know anything about the other part - how much memory the
hypervisor needs to store its internal state (which may even change
over time). Therefore you can't tell the sum of both parts.

Also, in the config of your VM, you're not using hugepages. Or have you
just posted the wrong XML?

Then again, the kernel's approach to hugepages is not as awesome as its
approach to regular system pages. Either at boot (1GB) or at runtime
(2MB), one must cut a slice of memory off to be used by hugepages and
nothing else. So even if you have ~17GB RAM free on both nodes, it is
reserved for hugepages, hence the OOM.

Michal
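The point about memory being set aside for hugepages can be checked against the counters in /proc/meminfo. The following is a minimal sketch, assuming the standard HugePages_Total, HugePages_Free and Hugepagesize fields (which describe the default huge page size only). The sample text is made up for illustration and hugepage_pool_kib is a hypothetical helper name.

```python
# Made-up excerpt in /proc/meminfo format; not taken from the thread.
SAMPLE_MEMINFO = """\
MemTotal:       65843072 kB
MemFree:         1204836 kB
HugePages_Total:   16384
HugePages_Free:    16384
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""


def hugepage_pool_kib(meminfo_text):
    """Return (pool_size_kib, unused_pool_kib) for the default huge page size.

    Memory counted here is held by the hugepage pool and is not available
    for ordinary page allocations, even while the pages are unused.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            # The first token after the colon is the numeric value.
            fields[key.strip()] = int(value.split()[0])
    page_kib = fields["Hugepagesize"]
    return (fields["HugePages_Total"] * page_kib,
            fields["HugePages_Free"] * page_kib)
```

On a Linux host the same function can be fed the real file, for example hugepage_pool_kib(open("/proc/meminfo").read()). Per-node counts live under /sys/devices/system/node/ and are not covered by this sketch.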