Magenheimer, Dan (HP Labs Fort Collins)
2005-Mar-17 23:08 UTC
[Xen-devel] A tale of three memory allocators
This should be a good meaty developers' discussion, full of opinions. I'm far from an expert in this area and hereby solicit help.

Once upon a time there was the Xen simplified memory allocator, written by Keir. When Dan (the ia64 man) looked at this allocator for Xen/ia64, he was very displeased. "Ia64 has a much more complex physical memory architecture than x86," he said. "For example, physical memory space is often not contiguous, and what about the cool NUMA stuff that is in Linux 2.6?" So Dan took the Linux memory allocation code, hacked on it mightily, adding numerous ugly ifdefs, and used the result in place of Keir's code for Xen/ia64. The result works, but is truly an abomination.

Some time later, Rusty looked at Keir's code and he too was displeased. He rewrote it to be much cleaner and more simplified, much to Keir's delight. And this code was placed in common, still to be ignored by Xen/ia64.

Now Arun gazed upon the ugly Xen/ia64 memory allocator full of ifdefs and was much dismayed. He preferred Rusty's code and, even better, sharing more code with Xen/x86, so he busily generated a patch for Xen/ia64 to use the common code.

Said patch is still pending because coincidentally Greg is currently looking at porting Xen/ia64 to one of those newfangled ia64 NUMA machines. "I would like to turn on CONFIG_NUMA, CONFIG_DISCONTIGMEM, and CONFIG_VIRTUAL_MEM_MAP," said Greg.

What should Dan (the ia64 man) do??? How complex is too complex? How ugly is too ugly?

This concludes (for now) the tale of three memory allocators... (please respond/discuss :-)
> Said patch is still pending because coincidentally Greg
> is currently looking at porting Xen/ia64 to one of those newfangled
> ia64 NUMA machines. "I would like to turn on CONFIG_NUMA,
> CONFIG_DISCONTIGMEM, and CONFIG_VIRTUAL_MEM_MAP," said Greg.

I'd vote strongly for: stick with Rusty's allocator and just have different instances for different memory banks. Wrap the allocation function to prioritise which pool to allocate from. This handles discontig memory and NUMA nicely.

Ian
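A minimal sketch of the wrapper Ian describes, with one allocator instance per memory bank, tried in priority order. All of the names below (struct mem_bank, heap_alloc(), alloc_pages_prioritised()) are illustrative assumptions, not existing Xen interfaces:

/* One allocator instance per memory bank; the wrapper tries
 * pools in priority order, preferring the requesting node. */
#define MAX_BANKS 8

struct mem_bank {
    struct heap *pool;   /* one instance of the simple allocator */
    int          node;   /* NUMA node this bank belongs to */
};

static struct mem_bank banks[MAX_BANKS];
static int nr_banks;

/* Allocate 2^order pages, preferring the requesting node's banks,
 * then falling back to any other bank with free memory. */
struct pfn_info *alloc_pages_prioritised(int preferred_node,
                                         unsigned int order)
{
    struct pfn_info *pg;
    int i;

    /* First pass: banks on the preferred node. */
    for ( i = 0; i < nr_banks; i++ )
        if ( (banks[i].node == preferred_node) &&
             ((pg = heap_alloc(banks[i].pool, order)) != NULL) )
            return pg;

    /* Second pass: any other bank that can satisfy the request. */
    for ( i = 0; i < nr_banks; i++ )
        if ( (banks[i].node != preferred_node) &&
             ((pg = heap_alloc(banks[i].pool, order)) != NULL) )
            return pg;

    return NULL;   /* out of memory everywhere */
}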
On Thu, 17 Mar 2005, Magenheimer, Dan (HP Labs Fort Collins) wrote:

> Said patch is still pending because coincidentally Greg
> is currently looking at porting Xen/ia64 to one of those newfangled
> ia64 NUMA machines. "I would like to turn on CONFIG_NUMA,
> CONFIG_DISCONTIGMEM, and CONFIG_VIRTUAL_MEM_MAP," said Greg.

Two out of three is enough. I don't see the need for both CONFIG_DISCONTIGMEM and CONFIG_VIRTUAL_MEM_MAP.

I guess that having a NUMA-aware allocator could come in handy though, so guest domains get their memory from the right node wrt. where they get CPU time scheduled. To me, that would suggest that Ian's hunch is right and the proper thing to do would be different instances of Rusty's simple allocator.

--
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan
Rik van Riel <riel@redhat.com> writes:

> Two out of three is enough. I don't see the need for
> both CONFIG_DISCONTIGMEM and CONFIG_VIRTUAL_MEM_MAP.

Well, we need it for sn2. We have very large memory holes within a node, so we use the virtual memmap to make the mem_map array within a node virtually contiguous.

> I guess that having a NUMA-aware allocator could come
> in handy though, so guest domains get their memory from
> the right node wrt. where they get CPU time scheduled.

Yep, it would probably be a mistake to overoptimize Xen on NUMA at this point, but doing basic things like this makes sense.

Jesse
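To make the virtual memmap idea concrete, here is a toy, self-contained C demonstration using POSIX mmap(): reserve one big virtually contiguous page array, then commit backing only behind the slices that correspond to real memory. This is purely illustrative; the real sn2 code maps kernel page tables, not user mappings.

#include <stdio.h>
#include <sys/mman.h>

struct page { unsigned long flags; };   /* stand-in for struct page */

int main(void)
{
    unsigned long max_pfn = 1UL << 24;   /* covers 64GB of 4K pages */
    size_t map_size = max_pfn * sizeof(struct page);

    /* Reserve virtual space for the whole array, no backing yet. */
    struct page *vmem_map = mmap(NULL, map_size, PROT_NONE,
                                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                                 -1, 0);
    if ( vmem_map == MAP_FAILED )
        return 1;

    /* Commit backing behind pfns 0..0x40000 only (a "real" range);
     * the hole above it stays unmapped and costs nothing. */
    mprotect(vmem_map, 0x40000 * sizeof(struct page),
             PROT_READ | PROT_WRITE);

    vmem_map[0x3ffff].flags = 1;    /* fine: backed */
    /* vmem_map[0x80000].flags = 1;   would fault: inside a hole */

    printf("entry 0x3ffff flags=%lu\n", vmem_map[0x3ffff].flags);
    munmap(vmem_map, map_size);
    return 0;
}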
On Friday, March 18, 2005 8:56 am, Jesse Barnes wrote:

> Rik van Riel <riel@redhat.com> writes:
> > Two out of three is enough. I don't see the need for
> > both CONFIG_DISCONTIGMEM and CONFIG_VIRTUAL_MEM_MAP.
>
> Well, we need it for sn2. We have very large memory holes within a node,
> so we use the virtual memmap to make the mem_map array within a node
> virtually contiguous.

Of course I mean that the hypervisor probably has to support this stuff to work correctly on multi-node ia64 machines. The guests can probably get away with being special-cased, since they can be presented with a contiguous memory model (Dan, is this what you're doing now?)...

Thanks,
Jesse
Hi, Dan,

This is really good writing. :)

My feeling is that we could still try to merge with the XEN common memory system first, and then, based on that simplified model, carefully investigate the best and cleanest approach to supporting add-on features, like NUMA here.

Yes, it would be great if XEN could run on so many different machine models soon, and reusing existing Linux code is really the quickest way to support the NUMA stuff. However, sometimes a re-design based on a new usage model may be more valuable than simply copying code written for a largely different one. :) A VM hypervisor differs from a normal OS to a large extent. Linux memory management is efficient, but it contains too many redundancies for XEN. For example, Linux has to distinguish allocation requests for normal memory from DMA-capable memory; XEN instead grants Dom0 (the service OS) charge of physical devices, so it has no need to know about DMA-related details. Another example: a large part of the Linux memory management code handles user processes, which look quite different from domains. That brings many uncertainties for later development. So adopting the XEN common code lets us sync quickly with new fixes/updates/designs that benefit the virtual machine concept; borrowing Linux code instead brings us the complexity of maintaining it and catching up with Linux changes. It's very likely that effort spent there would turn out to be unrelated to the XEN usage model.

Actually, our patch replaces the Linux code with the XEN common memory code, including the boot-time allocator, the buddy system, and Rusty's simple slab allocator. If this can be merged early, we then have a better base from which to consider how to support a NUMA model in the XEN environment. IMO, XEN already leaves room for such an enhancement: the buddy system is based on the concept of a zone, which you can also think of as a node. A quick way might be (as Ian points out):

1. Define more zone IDs, like:

#define MEMZONE_XEN   0
#define MEMZONE_DOM   1
#define MEMZONE_NODE  4                 /* say, 4 nodes */
#define NR_ZONES      (MEMZONE_NODE + 2)

2. Define new wrapping interfaces:

struct pfn_info *alloc_node_pages(struct domain *d, unsigned int order)
{
    /* scan node_list, then: */
    alloc_heap_pages(node_id, order);
    ...
}

Maybe later it can also be enhanced to use a hierarchical domain structure, if really required. Who knows? Anyway, I just throw the above out as an example that it's not so difficult if we merge with the XEN common code first. :)

Thanks,
Kevin

>-----Original Message-----
>From: xen-devel-admin@lists.sourceforge.net
>[mailto:xen-devel-admin@lists.sourceforge.net] On Behalf Of Magenheimer, Dan (HP Labs Fort Collins)
>Sent: Thursday, March 17, 2005 3:09 PM
>To: xen-devel@lists.sourceforge.net
>Subject: [Xen-devel] A tale of three memory allocators
>-----Original Message-----
>From: xen-devel-admin@lists.sourceforge.net
>[mailto:xen-devel-admin@lists.sourceforge.net] On Behalf Of Ian Pratt
>Sent: Thursday, March 17, 2005 4:08 PM
>
>> Said patch is still pending because coincidentally Greg
>> is currently looking at porting Xen/ia64 to one of those newfangled
>> ia64 NUMA machines. "I would like to turn on CONFIG_NUMA,
>> CONFIG_DISCONTIGMEM, and CONFIG_VIRTUAL_MEM_MAP," said Greg.
>
>I'd vote strongly for:
>Stick with Rusty's allocator and just have different instances for
>different memory banks. Wrap the allocation function to prioritise which
>pool to allocate from. This handles discontig memory and NUMA nicely.
>
>Ian

I assume we need no change to Rusty's allocator at all, except some wrapper interfaces to the buddy system, right? The slab allocator should only deal with memory in the xenheap, and the xenheap seems to be node-agnostic... :)

Thanks,
Kevin
Magenheimer, Dan (HP Labs Fort Collins)
2005-Mar-18 20:07 UTC
RE: [Xen-devel] A tale of three memory allocators
>Of course I mean that the hypervisor probably has to support this stuff to
>work correctly on multi-node ia64 machines. The guests can probably get away
>with being special-cased, since they can be presented with a contiguous memory
>model (Dan, is this what you're doing now?)...

On Xen/ia64, domain0 is given the real EFI memory map and has the capability to map any part of memory not owned by the Xen hypervisor. All other domains are presented with contiguous memory starting at zero.
Magenheimer, Dan (HP Labs Fort Collins)
2005-Mar-18 20:11 UTC
RE: [Xen-devel] A tale of three memory allocators
I'm less concerned about NUMA configurations... I agree that NUMA support could be added later. My first concern would be discontiguous memory as, on ia64, it is not uncommon for a machine to have a physical memory map of something like:

0GB-1GB
2GB-4GB
64GB-69GB
Total 8GB

How does the Rusty Russell allocator handle a map like this?

Dan
On Friday 18 March 2005 14:07, Magenheimer, Dan (HP Labs Fort Collins) wrote:

> >Of course I mean that the hypervisor probably has to support this stuff to
> >work correctly on multi-node ia64 machines. The guests can probably get away
> >with being special-cased, since they can be presented with a contiguous memory
> >model (Dan, is this what you're doing now?)...
>
> On Xen/ia64, domain0 is given the real EFI memory map and has the capability
> to map any part of memory not owned by the Xen hypervisor. All other
> domains are presented with contiguous memory starting at zero.

Even though the memory presented to the domUs is contiguous and starts at zero, do you think it would be advantageous to still provide some sort of cpu<->memory topology information to the domU, so that domUs which span more than one node can take advantage of the NUMA code in the Linux kernel? I would think eventually this is something many of the platforms would want, no?

-Andrew Theurer
On Friday 18 March 2005 14:11, Magenheimer, Dan (HP Labs Fort Collins) wrote:

> I'm less concerned about NUMA configurations... I agree that
> NUMA support could be added later. My first concern would
> be discontiguous memory as, on ia64, it is not uncommon
> for a machine to have a physical memory map of something like:
>
> 0GB-1GB
> 2GB-4GB
> 64GB-69GB
> Total 8GB
>
> How does the Rusty Russell allocator handle a map like this?

xmalloc.c calls alloc_xenheap_pages() as needed. So call page_alloc.c's init_boot_pages() repeatedly (as is done already for x86), passing it the memory ranges you do have, and xmalloc will use only that memory.

--
Hollis Blanchard
IBM Linux Technology Center
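For Dan's example map, that would look something like the sketch below. init_boot_pages() is the real interface in xen/common/page_alloc.c; the exact argument types, the GB() helper, and the wrapper function name are assumptions for illustration:

/* Register only the physical ranges that actually exist; the boot
 * allocator (and the buddy system seeded from it) then never hands
 * out pages from the holes. */
#define GB(n) ((unsigned long)(n) << 30)

void ia64_register_memory(void)   /* hypothetical helper */
{
    init_boot_pages(GB(0),  GB(1));    /* 0GB-1GB:   1GB */
    init_boot_pages(GB(2),  GB(4));    /* 2GB-4GB:   2GB */
    init_boot_pages(GB(64), GB(69));   /* 64GB-69GB: 5GB */
}                                      /* total:     8GB */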
>-----Original Message-----
>From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@hp.com]
>Sent: Friday, March 18, 2005 12:12 PM
>
>I'm less concerned about NUMA configurations... I agree that
>NUMA support could be added later. My first concern would
>be discontiguous memory as, on ia64, it is not uncommon
>for a machine to have a physical memory map of something like:
>
>0GB-1GB
>2GB-4GB
>64GB-69GB
>Total 8GB
>
>How does the Rusty Russell allocator handle a map like this?
>
>Dan

Rusty's allocator only handles the slab layer, which sits above the buddy system. What the holes actually affect is the memmap, which is the foundation of the buddy system. It's easier to add a virtual memmap when initializing the frame_table, without touching the slab part at all.

Thanks,
Kevin
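Concretely, frame_table initialization might walk the real memory ranges and map backing only where memory exists. This is a sketch under assumptions: map_frame_table_range() and the hard-coded range array stand in for whatever the arch code actually provides (ARRAY_SIZE is the usual Xen macro):

struct mem_range { unsigned long start_pfn, end_pfn; };

/* Dan's example map, in pfns (4K pages assumed). */
static struct mem_range ranges[] = {
    { 0x0,       0x40000   },   /* 0GB-1GB   */
    { 0x80000,   0x100000  },   /* 2GB-4GB   */
    { 0x1000000, 0x1140000 },   /* 64GB-69GB */
};

void init_virtual_frame_table(void)   /* hypothetical */
{
    int i;

    for ( i = 0; i < ARRAY_SIZE(ranges); i++ )
        /* Map physical backing behind the slice of frame_table
         * covering this range; entries for holes stay unmapped. */
        map_frame_table_range(ranges[i].start_pfn, ranges[i].end_pfn);
}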