Keir & Harper,

As you know, alloc_domheap_pages currently tries to allocate memory from the node where the current LP sits, which is not always right. For example, some hypercalls issued from dom0's LP allocate memory for other domains.

This patch intends to resolve this issue on NUMA systems by locating the node from the served domain.

A new interface, alloc_domheap_pages_on_node, is introduced rather than changing the current implementation, which would be invasive.

There are still many places left that could use the new interface, but we can change them in an incremental way if needed.

I'd appreciate your comments.

--
best rgds,
edwin
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2008-Mar-31 10:47 UTC
Re: [Xen-devel] [RFC][PATCH] domheap optimization for NUMA
On 31/3/08 11:40, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:

> A new interface alloc_domheap_pages_on_node is introduced instead of changing
> current implementation, which is invasive.
>
> There are still many places left for the new interface, but we can change them
> in a incremental way if needed.

A function called alloc_domheap_pages_on_node() should take a *node* argument, not a vcpu. Perhaps have a helper domain_default_node() which returns the node for d->vcpu[0] if d->vcpu[0] is non-NULL, else returns some value meaning 'any'.

This will probably require some changes to page_alloc.c. That file currently likes to pass cpu ids around rather than node ids, but it would really be cleaner to pass around the latter.

 -- Keir
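The helper suggested above could look roughly like the sketch below. It is only an illustration, not the code from any posted patch: the struct layouts, the cpu-to-node table, MAX_VCPUS and NR_CPUS are stand-ins invented so the example compiles on its own, the extra check for a NULL domain is an assumption, and NUMA_NO_NODE (0xff) is the 'any' value that comes up later in this thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Stand-in definitions so the sketch compiles on its own. They are
 * assumptions for illustration, not the real Xen declarations.
 */
#define NUMA_NO_NODE 0xffu  /* value meaning "any node" */
#define MAX_VCPUS    4      /* hypothetical */
#define NR_CPUS      8      /* hypothetical */

struct vcpu   { unsigned int processor; };  /* physical cpu the vcpu runs on */
struct domain { struct vcpu *vcpu[MAX_VCPUS]; };

/* Hypothetical topology: cpus 0-3 on node 0, cpus 4-7 on node 1. */
static const unsigned int cpu_node_map[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

static unsigned int cpu_to_node(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    return cpu_node_map[cpu];
}

/* Node for d->vcpu[0] if it exists, else the "any node" value. */
static unsigned int domain_default_node(const struct domain *d)
{
    /* The NULL checks live in one place instead of at every call site. */
    if (d == NULL || d->vcpu[0] == NULL)
        return NUMA_NO_NODE;
    return cpu_to_node(d->vcpu[0]->processor);
}
```

A caller would then pass domain_default_node(d) wherever a node id is expected, and the allocator decides what to do with the 'any' value.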
Zhai, Edwin
2008-Apr-02 13:06 UTC
Re: [Xen-devel] [RFC][PATCH] domheap optimization for NUMA
On Mon, Mar 31, 2008 at 11:47:36AM +0100, Keir Fraser wrote:
> On 31/3/08 11:40, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:
>
> > A new interface alloc_domheap_pages_on_node is introduced instead of changing
> > current implementation, which is invasive.
> >
> > There are still many places left for the new interface, but we can change them
> > in a incremental way if needed.
>
> A function called alloc_domheap_pages_on_node() should take a *node*
> argument not a vcpu. Perhaps have a helper domain_default_node() which
> returns node for d->vcpu[0] if d->vcpu[0] is non-NULL, else returns some
> value meaning 'any'.
>
> This will probably require some changes to page_alloc.c. That file currently
> likes to pass cpu ids around rather than node ids, but it would really be
> cleaner to pass around the latter.

The issue is that alloc_domheap_pages takes a domain* parameter to indicate whether the pages need to be accounted to the domain, and sometimes it's NULL. In that case we can't deduce the node from the domain. I believe that's why cpu is used here, as getting the cpuid is easier.

Another option: always use domain* to locate the node (NULL not allowed) and add a new flag _MEMF_assign to indicate the assignment, which changes the interface and is invasive.

--
best rgds,
edwin
Keir Fraser
2008-Apr-02 13:27 UTC
Re: [Xen-devel] [RFC][PATCH] domheap optimization for NUMA
On 2/4/08 14:06, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:

> The issue is alloc_domheap_pages take domain* as parameter to indicate if need
> account pages for the domain, sometimes it's NULL. In this case, we can't
> deduct the node from domain. I believe it's why use cpu here as getting cpuid
> is easier.

Yes, but it's a bad interface, particularly when the function is called alloc_domheap_pages_on_node(). Pass in a nodeid. Write a helper function to work out the nodeid from the domain*.

> Another option, always use domain* to locate node(not allowed NULL) and add a
> new flag _MEMF_assign to indicate the assignment, which changes the interface
> and is invasive.

Yes, that's a bad idea.

 -- Keir
Andre Przywara
2008-Apr-02 22:49 UTC
Re: [Xen-devel] [RFC][PATCH] domheap optimization for NUMA
Keir Fraser wrote:
> On 2/4/08 14:06, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:
>
>> The issue is alloc_domheap_pages take domain* as parameter to indicate if need
>> account pages for the domain, sometimes it's NULL. In this case, we can't
>> deduct the node from domain. I believe it's why use cpu here as getting cpuid
>> is easier.
>
> Yes, but it's a bad interface, particularly when the function is called
> alloc_domheap_pages_on_node(). Pass in a nodeid. Write a helper function to
> work out the nodeid from the domain*.

I was just looking at this code, too, so I fixed this. Eventually alloc_heap_pages is called, which deals with nodes only, so I replaced cpu with node everywhere else, too. Now __alloc_domheap_pages and alloc_domheap_pages_on_node are almost the same (except for parameter ordering), so I removed the first one, since the naming of the latter is better. Passing node numbers instead of cpu numbers needs cpu_to_node and asm/numa.h; if you think there is a better way, I am all ears.

>> Another option, always use domain* to locate node(not allowed NULL) and add a
>> new flag _MEMF_assign to indicate the assignment, which changes the interface
>> and is invasive.
>
> Yes, that's a bad idea.
>
> -- Keir

The first diff is against Edwin's patch, the second includes it.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
----to satisfy European Law for business letters:
AMD Saxony Limited Liability Company & Co. KG,
Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register Court Dresden: HRA 4896, General Partner authorized to represent:
AMD Saxony LLC (Wilmington, Delaware, US)
General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
Keir Fraser
2008-Apr-02 23:21 UTC
Re: [Xen-devel] [RFC][PATCH] domheap optimization for NUMA
On 2/4/08 23:49, "Andre Przywara" <andre.przywara@amd.com> wrote:

>> Yes, but it's a bad interface, particularly when the function is called
>> alloc_domheap_pages_on_node(). Pass in a nodeid. Write a helper function to
>> work out the nodeid from the domain*.
> I was just looking at this code, too, so I fixed this. Eventually
> alloc_heap_pages is called, which deals with nodes only, so I replaced
> cpu with node everywhere else, too. Now __alloc_domheap_pages and
> alloc_domheap_pages_on_node are almost the same (except parameter
> ordering), so I removed the first one, since the naming of the latter is
> better. Passing node numbers instead of cpu numbers needs cpu_to_node
> and asm/numa.h, if you think there is a better way, I am all ears.

That's fine. If you reference numa stuff then you need numa.h.

But vcpu_to_node and domain_to_node as well as cpu_to_node, please. There's no need to be open-coding v->processor everywhere. Also in future we might care to pick node based on v's affinity map rather than just current processor value. And usage of d->vcpu[0] without checking for != NULL is asking to introduce edge-case bugs. We can easily do that NULL check in one place if we implement domain_to_node().

And, while I'm thinking about the interfaces, let's just stick to alloc_domheap_page() and alloc_domheap_pages(). Let's add a flags parameter to the former (so it matches the latter in that respect) and let's add a MEMF_node() flag subtype (similar to MEMF_bits). Semantics will be that if MEMF_node(node) is provided then we try to allocate memory from node; else we try to allocate memory from a node local to specified domain; else if domain is NULL then we ignore locality.

Since zero is probably a valid numa nodeid we can define MEMF_node() as something like ((((node)+1)&0xff)<<8). Then since NUMA_NO_NODE==0xff everything works nicely: MEMF_node(NUMA_NO_NODE) is equivalent to not specifying MEMF_node() at all, which is what we would logically expect.

NUMA_NO_NODE probably needs to be pulled out of asm-x86/numa.h and made the official arch-neutral way to specify 'don't care' for numa nodes.

 -- Keir
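The MEMF_node() encoding and the fallback order described above can be tried out in isolation. The following is a minimal sketch under stated assumptions: the MEMF_node() expression and NUMA_NO_NODE == 0xff are taken from the message, while memf_get_node(), pick_node() and memf_node_selfcheck() are hypothetical helper names invented here, not functions from the patch.

```c
#include <assert.h>

#define NUMA_NO_NODE 0xffu  /* "don't care" value, as quoted in the thread */

/* Encoding suggested in the thread: node+1 is stored in bits 8-15. */
#define MEMF_node(n) ((((n) + 1u) & 0xffu) << 8)

/* Hypothetical decoder: recover the node, or NUMA_NO_NODE if none was given. */
static unsigned int memf_get_node(unsigned int memflags)
{
    /* A zero field becomes (0 - 1) & 0xff == 0xff, i.e. NUMA_NO_NODE. */
    return ((memflags >> 8) - 1u) & 0xffu;
}

/*
 * Hypothetical selection order: an explicit MEMF_node() wins, otherwise the
 * node of the domain is used. domain_node may itself be NUMA_NO_NODE (for
 * example when the domain is NULL), in which case locality is ignored.
 */
static unsigned int pick_node(unsigned int memflags, unsigned int domain_node)
{
    unsigned int node = memf_get_node(memflags);

    return (node != NUMA_NO_NODE) ? node : domain_node;
}

/* The scheme only works because NUMA_NO_NODE is 0xff; make that explicit. */
static void memf_node_selfcheck(void)
{
    assert(MEMF_node(NUMA_NO_NODE) == 0);
    assert(memf_get_node(0) == NUMA_NO_NODE);
}
```

Storing node+1 rather than node is what lets a memflags value of 0 keep meaning "no node requested" even though node 0 is a valid node id.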
Andre Przywara
2008-Apr-03 10:39 UTC
Re: [Xen-devel] [RFC][PATCH] domheap optimization for NUMA
Keir,

>>> Yes, but it's a bad interface, particularly when the function is called
>>> alloc_domheap_pages_on_node(). Pass in a nodeid. Write a helper function to
>>> work out the nodeid from the domain*.
>> I was just looking at this code, too, so I fixed this. Eventually
>> alloc_heap_pages is called, which deals with nodes only, so I replaced
>> cpu with node everywhere else, too. Now __alloc_domheap_pages and
>> alloc_domheap_pages_on_node are almost the same (except parameter
>> ordering), so I removed the first one, since the naming of the latter is
>> better. Passing node numbers instead of cpu numbers needs cpu_to_node
>> and asm/numa.h, if you think there is a better way, I am all ears.
>
> That's fine. If you reference numa stuff then you need numa.h.
>
> But vcpu_to_node and domain_to_node as well as cpu_to_node, please. There's
> no need to be open-coding v->processor everywhere. Also in future we might
> care to pick node based on v's affinity map rather than just current
> processor value. And usage of d->vcpu[0] without checking for != NULL is
> asking to introduce edge-case bugs. We can easily do that NULL check in one
> place if we implement domain_to_node().

OK, I did this. I return NUMA_NO_NODE when d->vcpu[0] is NULL; this is resolved to the current node in alloc_heap_pages (at least for now).

By the way, can we solve the DMA_BITSIZE problem (your mail from 28th Feb) with this? If no node is specified, use the current behaviour of preferring non-DMA zones, else stick to the given node. If you agree, I will implement this.

> And, while I'm thinking about the interfaces, let's just stick to
> alloc_domheap_page() and alloc_domheap_pages(). Let's add a flags parameter
> to the former (so it matches the latter in that respect) and let's add a
> MEMF_node() flag subtype (similar to MEMF_bits). Semantics will be that if
> MEMF_node(node) is provided then we try to allocate memory from node; else
> we try to allocate memory from a node local to specified domain; else if
> domain is NULL then we ignore locality.

Sounds reasonable. I changed this, too. If domain is NULL, domain_to_node will return NUMA_NO_NODE, which will eventually ignore locality (in alloc_heap_pages).

> Since zero is probably a valid numa nodeid we can define MEMF_node() as
> something like ((((node)+1)&0xff)<<8). Then since NUMA_NO_NODE==0xff
> everything works nicely: MEMF_node(NUMA_NO_NODE) is equivalent to not
> specifying MEMF_node() at all, which is what we would logically expect.

Good idea.

> NUMA_NO_NODE probably needs to be pulled out of asm-x86/numa.h and made the
> official arch-neutral way to specify 'don't care' for numa nodes.

Is this really needed? I pass memflags=0 in all don't-care cases; this should work and is more compatible. But beware that this silently assumes in page_alloc.c#alloc_domheap_pages that NUMA_NO_NODE is 0xFF, otherwise this trick will not work.

Attached again are a diff against my last version and the full patch (for some reason a missing bracket slipped through my last one, sorry for that).

This is only quick-tested (booted and created a guest on each node).

Signed-off-by: Andre Przywara <andre.przywara@amd.com>

Regards,
Andre.
Keir Fraser
2008-Apr-03 10:58 UTC
Re: [Xen-devel] [RFC][PATCH] domheap optimization for NUMA
On 3/4/08 11:39, "Andre Przywara" <andre.przywara@amd.com> wrote:

> By the way, can we solve the DMA_BITSIZE problem (your mail from 28th
> Feb) with this? If no node is specified, use the current behaviour of
> preferring non DMA zones, else stick to the given node.
> If you agree, I will implement this.

I don't think that gets us what we want. The fact is we specify NUMA node on nearly 100% of allocations (either explicitly via MEMF_node() or via passing a non-NULL domain pointer). So you would *always* prefer local DMA pages over remote non-DMA pages. That's not necessarily better than the current policy.

My point in my email of Feb 28th was that we should set dma_bitsize 'appropriately' (well, according to a slightly arbitrary policy :-) so that *some* DMA memory is set aside and only used to satisfy allocations which cannot be satisfied by a remote, while *some* memory is always made available on every node for local allocations. Does that make sense?

>> NUMA_NO_NODE probably needs to be pulled out of asm-x86/numa.h and made the
>> official arch-neutral way to specify 'don't care' for numa nodes.
> Is this really needed? I provided memflags=0 is all don't care cases,
> this should work and is more compatible. But beware that this silently
> assumes in page_alloc.c#alloc_domheap_pages that NUMA_NO_NODE is 0xFF,
> otherwise this trick will not work.

Yes it is needed if your patch is to work across all architectures, not just x86! Your current patch is broken in this respect because you quite unnecessarily define domain_to_node() and vcpu_to_node() in asm/numa.h rather than xen/numa.h.

Please address architectural portability and re-send the patch. Apart from that I think it's just about ready to go in.

 Thanks,
 Keir

> Attached again a diff against my last version and the full patch (for
> some reason a missing bracket slipped through my last one, sorry for that).
>
> This is only quick-tested (booted and created a guest on each node).
Andre Przywara
2008-Apr-03 13:57 UTC
Re: [Xen-devel] [RFC][PATCH] domheap optimization for NUMA
Keir Fraser wrote:
>>> NUMA_NO_NODE probably needs to be pulled out of asm-x86/numa.h and made the
>>> official arch-neutral way to specify 'don't care' for numa nodes.
>> Is this really needed? I provided memflags=0 is all don't care cases,
>> this should work and is more compatible. But beware that this silently
>> assumes in page_alloc.c#alloc_domheap_pages that NUMA_NO_NODE is 0xFF,
>> otherwise this trick will not work.
>
> Yes it is needed if your patch is to work across all architectures, not just
> x86! Your current patch is broken in this respect because you quite
> unnecessarily define domain_to_node() and vcpu_to_node() in asm/numa.h
> rather than xen/numa.h.

Right you are. I fixed this below. While playing around with the headers, I realized that numa.h is not needed at all in common/page_alloc.c and common/sysctl.c. Can you confirm this? Or is this needed for the other architectures?

Signed-off-by: Andre Przywara <andre.przywara@amd.com>

> Please address architectural portability and re-send the patch. Apart from
> that I think it's just about ready to go in.

I am just trying to install an IA64 cross compiler, but it seems this will take some time...

Regards,
Andre.
Keir Fraser
2008-Apr-03 14:49 UTC
Re: [Xen-devel] [RFC][PATCH] domheap optimization for NUMA
On 3/4/08 14:57, "Andre Przywara" <andre.przywara@amd.com> wrote:

>> Yes it is needed if your patch is to work across all architectures, not just
>> x86! Your current patch is broken in this respect because you quite
>> unnecessarily define domain_to_node() and vcpu_to_node() in asm/numa.h
>> rather than xen/numa.h.
> Right you are. I fixed this below. While playing around with the
> headers, I realized that numa.h is not needed at all in
> common/page_alloc.c and common/sysctl.c. Can you confirm this? Or is
> this needed for the other architectures?

It's polite to directly include stuff you use and not depend on transitive header inclusion, as that can be fragile.

> Signed-off-by: Andre Przywara <andre.przywara@amd.com>
>
>> Please address architectural portability and re-send the patch. Apart from
>> that I think it's just about ready to go in.
> I am just trying to install an IA64 cross compiler, but it seems this
> will take some time...

The ia64 guys will thank you for it, even if I don't require it!

 -- Keir
Isaku Yamahata
2008-Apr-04 08:37 UTC
Re: [Xen-devel] [RFC][PATCH] domheap optimization for NUMA
On Thu, Apr 03, 2008 at 03:57:08PM +0200, Andre Przywara wrote:

> I am just trying to install an IA64 cross compiler, but it seems this
> will take some time...

The following page should be helpful (prebuilt packages for Fedora 8 are also available):
http://wiki.xensource.com/xenwiki/CrossCompiling

If your system is Debian, packages are available from
http://www.emdebian.org/index.html

--
yamahata
Aron Griffis
2008-Apr-04 13:22 UTC
Re: [Xen-devel] [RFC][PATCH] domheap optimization for NUMA
Hi Andre,

Andre Przywara wrote: [Thu Apr 03 2008, 09:57:08AM EDT]
> I am just trying to install an IA64 cross compiler, but it
> seems this will take some time...

Thank you! It would help ia64 a lot if x86 committers would compile-test for ia64. It's actually very easy. As Isaku mentioned, there are instructions at http://wiki.xensource.com/xenwiki/CrossCompiling

If you have any trouble, please let me know...

Thanks,
Aron
Zhai, Edwin
2008-Apr-07 13:25 UTC
Re: [Xen-devel] [RFC][PATCH] domheap optimization for NUMA
On Thu, Apr 03, 2008 at 12:21:26AM +0100, Keir Fraser wrote:
>
> And, while I'm thinking about the interfaces, let's just stick to
> alloc_domheap_page() and alloc_domheap_pages(). Let's add a flags parameter
> to the former (so it matches the latter in that respect) and let's add a
> MEMF_node() flag subtype (similar to MEMF_bits). Semantics will be that if
> MEMF_node(node) is provided then we try to allocate memory from node; else
> we try to allocate memory from a node local to specified domain; else if
> domain is NULL then we ignore locality.
>
> Since zero is probably a valid numa nodeid we can define MEMF_node() as
> something like ((((node)+1)&0xff)<<8). Then since NUMA_NO_NODE==0xff
> everything works nicely: MEMF_node(NUMA_NO_NODE) is equivalent to not
> specifying MEMF_node() at all, which is what we would logically expect.
> NUMA_NO_NODE probably needs to be pulled out of asm-x86/numa.h and made the
> official arch-neutral way to specify 'don't care' for numa nodes.

Keir,

It's a really good idea.

BTW, do you think xenheap allocation also needs to be made NUMA-aware in some cases, e.g. for the VMCS?

Thanks,

--
best rgds,
edwin
Keir Fraser
2008-Apr-07 13:51 UTC
Re: [Xen-devel] [RFC][PATCH] domheap optimization for NUMA
On 7/4/08 14:25, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:

> It's really good idea.

And now done and checked in.

> BTW,
> Do you think if need make xenheap allocation also NUMA aware in some case, e.g
> VMCS?

We could get rid of the xenheap allocator altogether, at least on x86/64. Or, just for this case, alloc_domheap_pages() things like VMCS and then map_domain_page_global() them.

 -- Keir