Dulloor
2009-Nov-09 07:40 UTC
[Xen-devel] Xen-devel [XEN PATCH] [Linux-PVOPS] ballooning on numa domains
Problems:
* With ballooning, the mfns could change underneath and the numa mappings could get distorted (even for domains with the actual physical memory map).
* xen/interface/memory.h included with pvops is not in sync with the latest xen version. Added the definitions needed for the patch.

-dulloor
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jan Beulich
2009-Nov-09 09:03 UTC
Re: [Xen-devel] Xen-devel [XEN PATCH] [Linux-PVOPS] ballooning on numa domains
>>> Dulloor <dulloor@gmail.com> 09.11.09 08:40 >>>
>Problems :
>* With ballooning, the mfns could change underneath and the numa
>mappings could get distorted (even for domains with the actual
>physical memory map).

Besides being unsure about the idea as such (I don't think Xen kernels have known much about the NUMA characteristics of the underlying machine), you're assuming that node IDs in kernel and hypervisor are identical, which I don't think is generally valid; XENMEMF_node(), afaict, is presently meant to be used by the Xen tools (and hypervisor internal users) only.

Jan
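For readers unfamiliar with the flag under discussion: XENMEMF_node() packs a node number into the flags word of a memory reservation request. The sketch below shows the encoding as I recall it from Xen's public memory.h; the macro bodies are an assumption and should be checked against the header in your tree, and `balloon_mem_flags` is a hypothetical helper, not part of the patch. The relevant point for this thread is that the encoded number is interpreted by Xen as a Xen node ID.

```c
#include <assert.h>

/* Assumed to mirror Xen's public memory.h; verify against your tree. */
#define XENMEMF_address_bits(x)     (x)
#define XENMEMF_get_address_bits(x) ((x) & 0xffu)
#define XENMEMF_node(x)             (((x) + 1) << 8)
#define XENMEMF_get_node(x)         ((((x) >> 8) - 1) & 0xffu)

/*
 * Hypothetical helper: build the flags word for a reservation request
 * that asks Xen to prefer memory from xen_node.
 */
static unsigned int balloon_mem_flags(unsigned int xen_node,
                                      unsigned int address_bits)
{
    assert(xen_node < 0xff);      /* node + 1 must fit in 8 bits */
    assert(address_bits <= 0xff); /* address width field is 8 bits */
    return XENMEMF_address_bits(address_bits) | XENMEMF_node(xen_node);
}
```

Storing node + 1 means a flags word of zero decodes to "no node preference", which is why the helper never has to special-case the absence of a hint.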
Dulloor
2009-Nov-09 10:35 UTC
Re: [Xen-devel] Xen-devel [XEN PATCH] [Linux-PVOPS] ballooning on numa domains
- Xen guest domains can't read the numa acpi tables and this patch (in the current state) is a no-op for them.
- dom0 can read the numa tables (same as xen). Also, the memory map for dom0 is (currently) set in a way that the numa ranges are consistent. I don't see that changing, so I feel the assumption is valid.
- XENMEMF flags are indeed meant for xen tools. But ballooning is completely xen specific too ... it is a xen tool, except that it resides in the domain's kernel/tree.

Please let me know your take on this.

-dulloor

On Mon, Nov 9, 2009 at 4:03 AM, Jan Beulich <JBeulich@novell.com> wrote:
>>>> Dulloor <dulloor@gmail.com> 09.11.09 08:40 >>>
>>Problems :
>>* With ballooning, the mfns could change underneath and the numa
>>mappings could get distorted (even for domains with the actual
>>physical memory map).
>
> Besides being unsure about the idea as such (I don't think Xen kernels
> have been knowing much about NUMA characteristics of the underlying
> machine), you're assuming that node IDs in kernel and hypervisor are
> identical, which I don't think is generally valid; XENMEMF_node(), afaict,
> is presently meant to be used by the Xen tools (and hypervisor internal
> users) only.
>
> Jan
>
Jan Beulich
2009-Nov-09 12:57 UTC
Re: [Xen-devel] Xen-devel [XEN PATCH] [Linux-PVOPS] ballooning on numa domains
>>> Dulloor <dulloor@gmail.com> 09.11.09 11:35 >>>
>- dom0 can read the numa tables (same as xen). Also, the memory map
>for dom0 is (currently) set in a way that the numa ranges are
>consistent. I don't see that changing, so I feel the assumption is
>valid.

Pseudo-consistent at best - there's no reason to believe that the node a physical page appears to live on (by looking up its address in the SRAT) has any relationship to the node it really lives on.

And even if that was the case, you could easily end up with many (up to all but one) nodes appearing unpopulated (due to dom0_mem=).

>- XENMEMF flags are indeed meant for xen tools. But, ballooning is
>completely xen specific too ... it is a xen tool, except that it
>resides in domain's kernel/tree.

That doesn't help you with the node ID issue: The tools can make meaningful use of Xen node IDs; if you want to do this in the kernel you'll have to establish a kernel<->Xen translation of node IDs.

Jan
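The kernel<->Xen translation Jan asks for could, in its simplest form, be a lookup table consulted before any node hint is passed to the hypervisor. The following is a minimal sketch under stated assumptions: `struct node_xlate`, its functions, `MAX_NODES`, and `NODE_NONE` are all hypothetical names, and how the table would be populated (for example from information exported by Xen) is left open because the thread does not say.

```c
#include <assert.h>

#define MAX_NODES 8     /* hypothetical limit for this sketch */
#define NODE_NONE (-1)  /* "no known mapping" */

/* Hypothetical table: index is the kernel node ID, value is the Xen node ID. */
struct node_xlate {
    int xen_node[MAX_NODES];
};

static void node_xlate_init(struct node_xlate *t)
{
    for (int i = 0; i < MAX_NODES; i++)
        t->xen_node[i] = NODE_NONE;
}

static void node_xlate_set(struct node_xlate *t, int kernel_node, int xen_node)
{
    assert(kernel_node >= 0 && kernel_node < MAX_NODES);
    t->xen_node[kernel_node] = xen_node;
}

/*
 * Returns the Xen node ID for a kernel node, or NODE_NONE when no
 * mapping is known; in that case the caller should send no node hint.
 */
static int node_xlate_lookup(const struct node_xlate *t, int kernel_node)
{
    if (kernel_node < 0 || kernel_node >= MAX_NODES)
        return NODE_NONE;
    return t->xen_node[kernel_node];
}
```

Falling back to "no hint" when a mapping is missing keeps the scheme safe for guests that have no NUMA information at all.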
Dulloor
2009-Nov-09 14:18 UTC
Re: [Xen-devel] Xen-devel [XEN PATCH] [Linux-PVOPS] ballooning on numa domains
On Mon, Nov 9, 2009 at 7:57 AM, Jan Beulich <JBeulich@novell.com> wrote:
>>>> Dulloor <dulloor@gmail.com> 09.11.09 11:35 >>>
>>- dom0 can read the numa tables (same as xen). Also, the memory map
>>for dom0 is (currently) set in a way that the numa ranges are
>>consistent. I don't see that changing, so I feel the assumption is
>>valid.
>
> Pseudo-consistent at best - there's no reason to believe that the node
> a physical page appears to live on (by looking up its address in the SRAT)
> has any relationship to the node it really lives on.
>
> And even if that was the case, you could easily end up with many (up to
> all but one) nodes appearing unpopulated (due to dom0_mem=).

Agreed, pseudo-consistent (offset by alloc_spfn). But even with dom0_mem set, the numa ranges are silently clipped, so the mappings are still (almost) consistent.

>
>>- XENMEMF flags are indeed meant for xen tools. But, ballooning is
>>completely xen specific too ... it is a xen tool, except that it
>>resides in domain's kernel/tree.
>
> That doesn't help you with the node ID issue: The tools can make
> meaningful use of Xen node IDs; if you want to do this in the kernel
> you'll have to establish a kernel<->Xen translation of node IDs.
>

For other guest domains, we will need translation (part of my next patches). But for dom0, translation is implicit due to the shared acpi tables. I could work on a patch to make mappings fully consistent (by rigging the slit/srat values as seen by dom0), inertia being an interface acceptable to Linux folks. Do we need that?

> Jan
>
Jan Beulich
2009-Nov-09 15:13 UTC
Re: [Xen-devel] Xen-devel [XEN PATCH] [Linux-PVOPS] ballooning on numa domains
>>> Dulloor <dulloor@gmail.com> 09.11.09 15:18 >>>
>On Mon, Nov 9, 2009 at 7:57 AM, Jan Beulich <JBeulich@novell.com> wrote:
>>>>> Dulloor <dulloor@gmail.com> 09.11.09 11:35 >>>
>>>- dom0 can read the numa tables (same as xen). Also, the memory map
>>>for dom0 is (currently) set in a way that the numa ranges are
>>>consistent. I don't see that changing, so I feel the assumption is
>>>valid.
>>
>> Pseudo-consistent at best - there's no reason to believe that the node
>> a physical page appears to live on (by looking up its address in the SRAT)
>> has any relationship to the node it really lives on.
>>
>> And even if that was the case, you could easily end up with many (up to
>> all but one) nodes appearing unpopulated (due to dom0_mem=).
>
>Agreed pseudo-consistent (offseted by alloc_spfn). But, even with the

alloc_spfn (or really the only instance I'm aware of that would matter here) is relevant only for the single big blob that contains kernel, initial page tables, and such; all the rest of Dom0's memory can be distributed randomly across the address space.

>dom0_mem set,
>the numa ranges are silently clipped, so the mappings are still
>(almost)consistent.

Correct - but, as previously said, with certain (possibly all but one) nodes having no memory at all (possibly until ballooning). (Have you checked that a previously unpopulated node suddenly becoming populated is being handled properly in all respects in the kernel's memory management subsystem, and can you guarantee this will always be the case in the future?)

>>
>>>- XENMEMF flags are indeed meant for xen tools. But, ballooning is
>>>completely xen specific too ... it is a xen tool, except that it
>>>resides in domain's kernel/tree.
>>
>> That doesn't help you with the node ID issue: The tools can make
>> meaningful use of Xen node IDs; if you want to do this in the kernel
>> you'll have to establish a kernel<->Xen translation of node IDs.
>>
>For other guest domains, we will need translation (part of my next patches).
>But, for dom0, translation is implicit due to shared acpi tables.

Not really - just check setup_node() in Xen: The node ID is software assigned, what comes from SRAT is the pxm value.

>I could work on a patch to make mappings fully consistent (by rigging the
>slit/srat values as seen by dom0), inertia being an interface acceptable to
>Linux folks. Do we need that ?

Jan
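To illustrate why a software-assigned node ID need not match between two parsers of the same SRAT, here is a small sketch in which node IDs are handed out in order of first appearance of each proximity domain (PXM). It is loosely modelled on the behaviour Jan describes and is not the actual setup_node() code from Xen or Linux; `struct pxm_map`, `assign_node`, and the limits are hypothetical. Whether the two really diverge on a given machine depends on their parsing code.

```c
#include <assert.h>

#define MAX_PXM   256   /* hypothetical limits for this sketch */
#define MAX_NODES 8
#define NODE_NONE (-1)

struct pxm_map {
    int pxm_to_node[MAX_PXM];
    int nodes_used;
};

static void pxm_map_init(struct pxm_map *m)
{
    for (int i = 0; i < MAX_PXM; i++)
        m->pxm_to_node[i] = NODE_NONE;
    m->nodes_used = 0;
}

/*
 * Hand out node IDs in order of first appearance. The PXM comes from
 * firmware; the node ID is purely a product of the order of calls.
 */
static int assign_node(struct pxm_map *m, int pxm)
{
    assert(pxm >= 0 && pxm < MAX_PXM);
    if (m->pxm_to_node[pxm] == NODE_NONE) {
        if (m->nodes_used >= MAX_NODES)
            return NODE_NONE;   /* out of node IDs */
        m->pxm_to_node[pxm] = m->nodes_used++;
    }
    return m->pxm_to_node[pxm];
}
```

If one parser encounters PXM 16 before PXM 0 and the other sees them in the opposite order, the same PXM ends up with different node IDs, so only the PXM is a safe common key.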
Dulloor
2009-Nov-10 06:37 UTC
Re: [Xen-devel] Xen-devel [XEN PATCH] [Linux-PVOPS] ballooning on numa domains
On Mon, Nov 9, 2009 at 10:13 AM, Jan Beulich <JBeulich@novell.com> wrote:
>>>> Dulloor <dulloor@gmail.com> 09.11.09 15:18 >>>
>>On Mon, Nov 9, 2009 at 7:57 AM, Jan Beulich <JBeulich@novell.com> wrote:
>>>>>> Dulloor <dulloor@gmail.com> 09.11.09 11:35 >>>
>>>>- dom0 can read the numa tables (same as xen). Also, the memory map
>>>>for dom0 is (currently) set in a way that the numa ranges are
>>>>consistent. I don't see that changing, so I feel the assumption is
>>>>valid.
>>>
>>> Pseudo-consistent at best - there's no reason to believe that the node
>>> a physical page appears to live on (by looking up its address in the SRAT)
>>> has any relationship to the node it really lives on.
>>>
>>> And even if that was the case, you could easily end up with many (up to
>>> all but one) nodes appearing unpopulated (due to dom0_mem=).
>>
>>Agreed pseudo-consistent (offseted by alloc_spfn). But, even with the
>
> alloc_spfn (or really the only instance I'm aware of that would matter
> here) is relevant only for the single big blob that contains kernel,
> initial page tables, and such; all other of Dom0's memory can be
> distributed randomly across the address space.

Offset by alloc_spfn (mfn = pfn + alloc_spfn) while setting the vphysmap. Did you mean when dom0_mem is set?

>>dom0_mem set,
>>the numa ranges are silently clipped, so the mappings are still
>>(almost)consistent.
>
> Correct - but, as previously said, with certain (possibly all but one)
> nodes having no memory at all (possibly until ballooning). (Have you
> checked that a previously unpopulated node suddenly becoming
> populated is being handled properly in all respects in the kernel's
> memory management subsystem, and can you guarantee this will
> always be the case in the future?)

You mean that dom0 starts with low memory (a few nodes unpopulated) and then ballooning adds more? But isn't the memory map (for dom0) set up to dom0-max-mem? And ballooning can only increase/decrease reservations in dom0's address space. Maybe I didn't understand your point.

>
>>>
>>>>- XENMEMF flags are indeed meant for xen tools. But, ballooning is
>>>>completely xen specific too ... it is a xen tool, except that it
>>>>resides in domain's kernel/tree.
>>>
>>> That doesn't help you with the node ID issue: The tools can make
>>> meaningful use of Xen node IDs; if you want to do this in the kernel
>>> you'll have to establish a kernel<->Xen translation of node IDs.
>>>
>>For other guest domains, we will need translation (part of my next patches).
>>But, for dom0, translation is implicit due to shared acpi tables.
>
> Not really - just check setup_node() in Xen: The node ID is software
> assigned, what comes from SRAT is the pxm value.

But it is done the same way in Dom0 and xen, although I do agree that this is not guaranteed in the future.

>
>>I could work on a patch to make mappings fully consistent (by rigging the
>>slit/srat values as seen by dom0), inertia being an interface acceptable to
>>Linux folks. Do we need that ?
>
> Jan
>

In general, I agree there is work to be done (planned for later patches). Please do let me know any ideas you have. But as far as this patch is concerned, it tries to do only one thing: keep the node distribution of memory the same across ballooning, acknowledging that mappings can change underneath and making no other assumptions. It might help in some cases and is a no-op in others. Whether the initial distribution is consistent or pseudo-consistent is a matter of more work. Moreover, this is just best effort, since even if XENMEMF_node(n) is set, the allocation inside xen could still be from other nodes' heaps.

If you/Jeremy don't find this (incremental) patch useful, we can drop it for now and that's fine with me! :)

-dulloor
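The stated goal, keeping the per-node distribution of memory unchanged across ballooning, can be pictured as simple per-node bookkeeping. The sketch below is not the patch itself: `struct balloon_stats`, `balloon_out`, and `balloon_in_pick_node` are hypothetical names, and the "pick the node with the most outstanding pages" policy is an assumption chosen for illustration. The returned node would only ever be a hint, since, as noted above, Xen may still satisfy the request from another node's heap.

```c
#include <assert.h>

#define MAX_NODES 4     /* hypothetical limit for this sketch */
#define NODE_NONE (-1)

/* Hypothetical bookkeeping: pages currently ballooned out, per node. */
struct balloon_stats {
    unsigned long ballooned_out[MAX_NODES];
};

/* Record that one page belonging to `node` was given back to Xen. */
static void balloon_out(struct balloon_stats *s, int node)
{
    assert(node >= 0 && node < MAX_NODES);
    s->ballooned_out[node]++;
}

/*
 * Choose the node to ask for when ballooning one page back in: the
 * node with the most outstanding pages (lowest ID wins ties).
 * Returns NODE_NONE if nothing is ballooned out.
 */
static int balloon_in_pick_node(struct balloon_stats *s)
{
    int best = NODE_NONE;
    unsigned long most = 0;

    for (int n = 0; n < MAX_NODES; n++) {
        if (s->ballooned_out[n] > most) {
            most = s->ballooned_out[n];
            best = n;
        }
    }
    if (best != NODE_NONE)
        s->ballooned_out[best]--;
    return best;
}
```

Refilling the most depleted node first drives the domain back toward the distribution it had before ballooning out, without assuming anything about whether that initial distribution was itself consistent.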
Jan Beulich
2009-Nov-10 08:57 UTC
Re: [Xen-devel] Xen-devel [XEN PATCH] [Linux-PVOPS] ballooning on numa domains
>>> Dulloor <dulloor@gmail.com> 10.11.09 07:37 >>>
>On Mon, Nov 9, 2009 at 10:13 AM, Jan Beulich <JBeulich@novell.com> wrote:
>> alloc_spfn (or really the only instance I'm aware of that would matter
>> here) is relevant only for the single big blob that contains kernel,
>> initial page tables, and such; all other of Dom0's memory can be
>> distributed randomly across the address space.
>
>Offseted by alloc_spfn. (mfn = pfn+alloc_spfn) while setting the vphysmap.

Again, you're only looking at the first chunk allocated for Dom0. Just look at the following 50 or so lines of code in domain_build.c.

>Did you mean when dom0_mem is set ?

That's independent.

Jan
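Jan's distinction between the first chunk and the rest of Dom0's memory can be sketched as follows. This is an illustration only, not the code in domain_build.c: `build_p2m`, `chunk`, and `extra_mfns` are hypothetical, and the real allocation of the remaining pages is more involved. The sketch just shows that the relation mfn = pfn + alloc_spfn holds for the initial contiguous chunk and need not hold beyond it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill a pfn->mfn table for a domain whose first `chunk` pages are one
 * contiguous machine range starting at alloc_spfn, and whose remaining
 * pages come from an arbitrary list of machine frames.
 */
static void build_p2m(unsigned long *p2m, size_t nr_pages,
                      unsigned long alloc_spfn, size_t chunk,
                      const unsigned long *extra_mfns)
{
    assert(chunk <= nr_pages);

    /* Initial chunk: fixed offset, mfn = pfn + alloc_spfn. */
    for (size_t pfn = 0; pfn < chunk; pfn++)
        p2m[pfn] = alloc_spfn + pfn;

    /* Everything else: whatever frames the allocator handed out. */
    for (size_t pfn = chunk; pfn < nr_pages; pfn++)
        p2m[pfn] = extra_mfns[pfn - chunk];
}
```

Any NUMA inference drawn from the fixed offset therefore covers only the first `chunk` pages; for the rest, a pseudo-physical address says nothing about the node the backing frame lives on.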