Hi all,

1. Do desktop computers, such as Intel dual-core machines, really benefit from NUMA?
2. Does it have a real effect on the performance of Xen?
3. Can't we let the guest OS manage NUMA instead of Xen? What is the difference, and why is it implemented in Xen?

Thanks,
David.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
* David Pilger <pilger.david@gmail.com> [2007-01-14 06:04]:
> Hi all,
>
> 1. Do desktop computers, such as Intel dual core, really benefit from NUMA?

Desktop computers with AMD chips, which include a memory controller on the CPU,
have NUMA characteristics that can benefit from keeping memory close to the CPU.

> 2. Does it have a real effect on the performance of Xen?

I've [1]posted previously to the list on the performance benefit for NUMA
systems, and shown that there is no regression for non-NUMA systems.

> 3. Can't we let the guest OS manage NUMA instead of Xen? what is the
> difference? and why is it implemented in Xen?

Xen owns all of the system memory and also controls the allocation of that
memory, and therefore determines which memory and which processors are in use
for a guest. If we are to be able to create a guest with memory close to the
physical processors in use, then we must understand the topology of the system
when we allocate memory for the guest.

I'm not sure I understand entirely what you mean by letting the guest OS manage
NUMA instead. However, the current Xen NUMA implementation does not export the
domain's NUMA-ness to the guest kernel, but that is the next logical step: not
only allocate memory to the guest in a NUMA-aware fashion, but, in the case
that we are required to give memory to a guest from multiple NUMA nodes, export
the guest topology so that a NUMA-aware guest OS can make NUMA-aware decisions.

1. http://lists.xensource.com/archives/html/xen-devel/2006-09/msg00958.html

--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
David Pilger wrote:
> Hi all,
>
> 1. Do desktop computers, such as Intel dual core, really benefit from
> NUMA?

No. NUMA stands for Non-Uniform Memory Access. It's basically a system where
you have nodes (which are essentially independent computers) that are
connected via a high-speed bus. Each node has its own memory, but through the
magic of NUMA, every node can access the other nodes' memory as if it were
its own. Most NUMA systems (if not all) are very high-end servers.

> 2. Does it have a real effect on the performance of Xen?

On a NUMA system, absolutely. If you have a domain running on a particular
node, you want to make sure that it's using memory that's in its node if at
all possible. Accessing memory on the local node is considerably faster than
accessing memory on other nodes. Prior to Ryan's NUMA work, Xen would just
blindly allocate memory to a domain without taking memory locality into
account.

> 3. Can't we let the guest OS manage NUMA instead of Xen? what is the
> difference? and why is it implemented in Xen?

If a guest OS spans multiple nodes, then you would want it to be NUMA-aware.
However, you always want Xen to at least be NUMA-aware so that it allocates
memory appropriately.

Regards,

Anthony Liguori

> Thanks,
> David.
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Anthony Liguori
> Sent: 15 January 2007 17:22
> To: David Pilger
> Cc: xen-devel; Ryan Harper
> Subject: [Xen-devel] Re: NUMA and SMP
>
> David Pilger wrote:
> > 1. Do desktop computers, such as Intel dual core, really benefit from
> > NUMA?
>
> No. NUMA stands for Non-Uniform Memory Access. It's basically a
> system where you have nodes (which are essentially independent
> computers) that are connected via a high speed bus. Each node has its
> own memory but through the magic of NUMA, every node can access the
> other nodes' memory as if it were its own. Most NUMA systems (if not
> all) are very high end servers.

Good description, but you have to agree that AMD has a NUMA-style
architecture in the Opteron class of systems. However, this is sometimes
also called "SUMO" (Sufficiently Uniform Memory Organization), which means
that non-NUMA-aware software will operate correctly on the system, although
not optimally (because the software will allocate memory without regard to
its locality, and thus potentially incur penalties that aren't necessary).

It's "sufficiently uniform" because the penalty (compared with "true NUMA")
for "bad" memory allocation is of the same order as a normal memory fetch
(but of course, that means about 2x to 3x a local memory fetch). On other
NUMA systems, the penalty for accessing out-of-node memory can be 10-100x
the local memory access time, which is obviously a much more noticeable
effect.

> > 2. Does it have a real effect on the performance of Xen?
>
> On a NUMA system, absolutely. If you have a domain running on a
> particular node, you want to make sure that it's using memory that's in
> its node if at all possible. Accessing memory on a local node is
> considerably faster than accessing memory on other nodes. Prior to
> Ryan's NUMA work, Xen would just blindly allocate memory to a domain
> without taking into account memory locality.

Absolutely, there's a noticeable benefit.

> > 3. Can't we let the guest OS manage NUMA instead of Xen? what is the
> > difference? and why is it implemented in Xen?
>
> If a guest OS spans multiple nodes, then you would want it to be NUMA
> aware. However, you always want Xen to, at least, be NUMA aware so that
> it allocates memory appropriately.

Ideally, we'd want the NUMA information exported to the guest, but at least
if Xen knows that memory allocated for a particular guest is local to the
same (group of) processor(s), there's a benefit. You can't "just" leave it
to the guest OS though, because the guest has no control over which bits of
memory it actually gets: Xen doles that out, and if the OS is NUMA-aware but
gets memory from node 1 and a processor on node 0, there's not much the OS
can do to make things better, right?

--
Mats

> Regards,
>
> Anthony Liguori
On the topic of NUMA:

I'd like to dispute the assumption that a NUMA-aware OS can actually make
good decisions about the initial placement of memory in a reasonable
hardware ccNUMA system.

How does the OS know on which node a particular chunk of memory will be most
accessed? The truth is that unless the application or the person running the
application is herself NUMA-aware and can provide placement hints or
directives, the OS will seldom beat a round-robin/interleave or random
placement strategy.

To illustrate, consider an app which lays out a bunch of data in memory in a
single thread and then spawns worker threads to process it. Is the OS to
place memory close to the initial thread? How can it possibly know how many
threads will eventually process the data?

Even if the OS knew how many threads will eventually crunch the data, it
cannot possibly know at placement time whether each thread will work on an
assigned data subset (and if so, which one) or whether it will act as a
pipeline stage with all the data being passed from one thread to the next.

If you go beyond initial memory placement or start considering memory
migration, then it's even harder to win, because you have to pay copy and
stall penalties during migrations. So you have to be really smart about
predicting the future to do better than the ~10-40% memory bandwidth and
latency hit associated with doing simple memory interleaving on a modern
hardware-ccNUMA system.

And it gets worse for you when your app is successfully taking advantage of
the memory cache hierarchy, because its performance is then less impacted by
raw memory latency and bandwidth.

Things also get more difficult on a time-sharing host with competing apps.

There is a strong argument for making hypervisors and OSes NUMA-aware in the
sense that:
1- They know about system topology.
2- They can export this information up the stack to applications and users.
3- They can take in directives from users and applications to partition the
   host and place some threads and memory in specific partitions.
4- They use an interleaved (or random) initial memory placement strategy by
   default.

The argument that the OS on its own -- without user or application
directives -- can make better placement decisions than round-robin or random
placement is -- in my opinion -- flawed.

I also am skeptical that the complexity associated with page migration
strategies would be worthwhile: if you got it wrong the first time, what
makes you think you'll do better this time?

Emmanuel.
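[Editorial note: the interleaved default placement argued for in point 4 above is trivially simple, which is part of its appeal. A minimal sketch of round-robin page-to-node assignment:]

```python
# Sketch of round-robin ("interleaved") initial placement: with no hints
# about future access patterns, spread pages evenly across nodes so that
# no single node's memory bandwidth becomes the bottleneck.
def interleave(num_pages, num_nodes):
    """Assign page i to node i mod num_nodes."""
    return [page % num_nodes for page in range(num_pages)]

placement = interleave(8, 4)
print(placement)  # [0, 1, 2, 3, 0, 1, 2, 3]
```

The scheme needs no prediction of the future at all, which is exactly Emmanuel's point: any smarter policy must beat this baseline to justify its complexity.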
> -----Original Message-----
> From: Emmanuel Ackaouy [mailto:ack@xensource.com]
> Sent: 16 January 2007 13:56
> To: Petersson, Mats
> Cc: xen-devel; Anthony Liguori; David Pilger; Ryan Harper
> Subject: Re: [Xen-devel] Re: NUMA and SMP
>
> On the topic of NUMA:
>
> I'd like to dispute the assumption that a NUMA-aware OS can actually
> make good decisions about the initial placement of memory in a
> reasonable hardware ccNUMA system.

I'm not saying that it can ALWAYS make good decisions, but it's got a better
chance than software that just places things in a "first available" way.

> How does the OS know on which node a particular chunk of memory
> will be most accessed? The truth is that unless the application or
> person running the application is herself NUMA-aware and can provide
> placement hints or directives, the OS will seldom beat a round-robin /
> interleave or random placement strategy.

I don't disagree with that.

> To illustrate, consider an app which lays out a bunch of data in memory
> in a single thread and then spawns worker threads to process it.

That's a good example of a hard nut to crack. Not easily solved in the OS,
that's for sure.

> Is the OS to place memory close to the initial thread? How can it
> possibly know how many threads will eventually process the data?
>
> If you go beyond initial memory placement or start considering memory
> migration, then it's even harder to win because you have to pay copy
> and stall penalties during migrations. So you have to be real smart
> about predicting the future to do better than your ~10-40% memory
> bandwidth and latency hit associated with doing simple memory
> interleaving on a modern hardware-ccNUMA system.

Sure, I certainly wasn't suggesting memory migration. However, there is a
case where NUMA information COULD be helpful, and that is when the system is
paging in: it could try to allocate the page on the local node rather than a
"random" one [although without knowing what the future holds, this could be
wrong -- as any non-future-knowing strategy would be]. Of course, I wouldn't
disagree if you said "the system probably has too little memory if it's
paging"!

> And it gets worse for you when your app is successfully taking
> advantage of the memory cache hierarchy because its performance is
> less impacted by raw memory latency and bandwidth.

Indeed.

> Things also get more difficult on a time-sharing host with competing
> apps.

Agreed.

> There is a strong argument for making hypervisors and OSes NUMA
> aware in the sense that:
> 1- They know about system topology
> 2- They can export this information up the stack to applications and
> users
> 3- They can take in directives from users and applications to
> partition the host and place some threads and memory in specific
> partitions.
> 4- They use an interleaved (or random) initial memory placement
> strategy by default.
>
> The argument that the OS on its own -- without user or application
> directives -- can make better placement decisions than round-robin or
> random placement is -- in my opinion -- flawed.

Debatable -- it depends a lot on WHAT applications you expect to run and how
they behave. If you consider an application that frequently allocates and
de-allocates memory dynamically in a single-threaded process (say, a
compiler), then allocating memory on the local node should be the "first
choice".

Multithreaded apps can use a similar approach: if a thread is allocating
memory, there's often a good chance that the memory will be used by that
thread too [although this doesn't work for message passing between threads,
obviously; this is again a case where "knowledge from the app" will be the
only solution better than "random"].

This approach is by far not perfect, but if you consider that applications
often do short-term allocations, it makes sense to allocate on the local
node if possible.

> I also am skeptical that the complexity associated with page migration
> strategies would be worthwhile: If you got it wrong the first time,
> what makes you think you'll do better this time?

I'm not advocating any page migration, with the possible exception that page
faults resolved by paging in should have a first choice of the local node.

However, supporting NUMA in the hypervisor and forwarding arch-info to the
guest would make sense. At the least, the very basic principle of: if the
guest is to run on a limited set of processors (nodes), allocate memory from
that (those) node(s) for the guest, would make a lot of sense.

[Note that I'm by no means a NUMA expert -- I just happen to work for AMD,
which happens to have a ccNUMA architecture.]

--
Mats

> Emmanuel.
On 1/15/07, Anthony Liguori <aliguori@linux.vnet.ibm.com> wrote:
> No. NUMA stands for Non-Uniform Memory Access. It's basically a
> system where you have nodes (which are essentially independent
> computers) that are connected via a high speed bus. Each node has its
> own memory but through the magic of NUMA, every node can access the
> other nodes' memory as if it were its own. Most NUMA systems (if not
> all) are very high end servers.

No, at this point most NUMA systems are probably Alienware desktops for
gamers :-) Especially now that Dell is selling them. Opteron brought NUMA
into the mainstream in a big way, and desktop unit sales trump all
supercomputer sales :-) Us poor supercomputer types are, once again, in the
noise where dollar volume is concerned.

And Linux has known for some time how to exploit the NUMA-ness of these
Opteron systems. There is even an ACPI table, the SRAT, to describe the
NUMA-ness of a machine.

> > 2. Does it have a real effect on the performance of Xen?
>
> On a NUMA system, absolutely. If you have a domain running on a
> particular node, you want to make sure that it's using memory that's in
> its node if at all possible. Accessing memory on a local node is
> considerably faster than accessing memory on other nodes.

Right, but it's not even close to a factor of two on a desktop machine like
a dual Opteron. It's still worth being NUMA-aware, however.

thanks

ron
On Jan 16, 2007, at 15:19, Petersson, Mats wrote:
>> There is a strong argument for making hypervisors and OSes NUMA
>> aware in the sense that:
>> 1- They know about system topology
>> 2- They can export this information up the stack to applications and
>> users
>> 3- They can take in directives from users and applications to
>> partition the host and place some threads and memory in specific
>> partitions.
>> 4- They use an interleaved (or random) initial memory placement
>> strategy by default.
>>
>> The argument that the OS on its own -- without user or application
>> directives -- can make better placement decisions than round-robin or
>> random placement is -- in my opinion -- flawed.
>
> Debatable -- it depends a lot on WHAT applications you expect to run
> and how they behave. If you consider an application that frequently
> allocates and de-allocates memory dynamically in a single-threaded
> process (say, a compiler), then allocating memory on the local node
> should be the "first choice".
>
> Multithreaded apps can use a similar approach: if a thread is
> allocating memory, there's often a good chance that the memory will be
> used by that thread too [although this doesn't work for message
> passing between threads, obviously; this is again a case where
> "knowledge from the app" will be the only solution better than
> "random"].
>
> This approach is by far not perfect, but if you consider that
> applications often do short-term allocations, it makes sense to
> allocate on the local node if possible.

I do not agree.

Just because a thread happens to run on processor X when it first faults in
a page off the process's heap doesn't give you a good indication that the
memory will be used mostly by this thread, or that the thread will continue
running on the same processor. There are at least as many cases where this
assumption is invalid as cases where it is valid. Without any solid
indication that something else will work better, round-robin allocation has
to be the default strategy.

Also, if you allow one process to consume a large percentage of one node's
memory, you are indirectly hurting all competing multi-threaded apps which
benefit from higher total memory bandwidth when they spread their data
across nodes.

I understand your point that if a single-threaded process quickly shrinks
its heap after growing it, it is less likely to migrate to a different
processor while it is using this memory. I'm not sure how you predict at
allocation time that memory will be quickly released, though. Even if you
could, I maintain you would still need safeguards in place to balance that
process's needs with those of competing multi-threaded apps benefiting from
memory bandwidth that scales with the number of hosting nodes.

You could try to compromise and allocate round-robin starting locally,
perhaps with diminishing strides as the total allocation grows (i.e.
allocate locally and progressively move towards a per-page round-robin
scheme as more memory is requested). I'm not sure this would do any better
than plain old dumb round-robin in the average case, but it's worth a
thought.

> However, supporting NUMA in the Hypervisor and forwarding arch-info
> to the guest would make sense. At the least the very basic principle
> of: If the guest is to run on a limited set of processors (nodes),
> allocate memory from that (those) node(s) for the guest would make a
> lot of sense.

I suspect there is widespread agreement on this point.
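[Editorial note: the "start locally, fall back to round-robin" compromise floated above can be sketched as a simplified two-phase policy. The quota threshold is invented for illustration; Emmanuel's actual suggestion of progressively diminishing strides would smooth the transition rather than switch at a single cutoff.]

```python
# Two-phase placement sketch: the first few pages go to the allocating
# CPU's local node (good for short-lived, single-threaded allocations);
# once the quota is exceeded, later pages are interleaved across all
# nodes (good for large working sets shared by many threads).
def place_page(page_index, local_node, num_nodes, local_quota=16):
    """First `local_quota` pages go local; later pages are interleaved."""
    if page_index < local_quota:
        return local_node
    return (page_index - local_quota) % num_nodes

nodes = [place_page(i, local_node=2, num_nodes=4, local_quota=4) for i in range(8)]
print(nodes)  # [2, 2, 2, 2, 0, 1, 2, 3]
```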
> -----Original Message-----
> From: Emmanuel Ackaouy [mailto:ack@xensource.com]
> Sent: 16 January 2007 16:14
> To: Petersson, Mats
> Cc: xen-devel; Anthony Liguori; David Pilger; Ryan Harper
> Subject: Re: [Xen-devel] Re: NUMA and SMP
>
> I do not agree.
>
> Just because a thread happens to run on processor X when it first
> faults in a page off the process's heap doesn't give you a good
> indication that the memory will be used mostly by this thread or that
> the thread will continue running on the same processor. There are at
> least as many cases when this assumption is invalid as when it is
> valid. Without any solid indication that something else will work
> better, round robin allocation has to be the default strategy.

My guess would be that noticeably more than 50% of all (user-mode) memory
allocations are released within a shorter time than the time quantum used by
the scheduler -- which in itself means that the thread is most likely not
going to move from one processor to another in that interval (although an
interrupt may of course reschedule and move the thread to another
processor). These memory allocations are also usually small, but there may
be many of them in any second of runtime of the machine.

Note that I haven't made any effort to verify this guess, so if there's some
other data that contradicts my view, then by all means disregard my
thoughts!

> Also, if you allow one process to consume a large percentage
> of one node's memory, you are indirectly hurting all competing
> multi-threaded apps which benefit from higher total memory
> bandwidth when they spread their data across nodes.

Yes, that's definitely one of the drawbacks of this method.

> I understand your point that if a single threaded process quickly
> shrinks its heap after growing it, it makes it less likely that it
> will migrate to a different processor while it is using this memory.
> I'm not sure how you predict that memory will be quickly released at
> allocation time though. Even if you could, I maintain you would
> still need safeguards in place to balance that process's needs
> with that of competing multi-threaded apps benefiting from the
> memory bandwidth scaling with number of hosting nodes.

See the "guesswork" above.

> You could try and compromise and allocate round robin starting
> locally and perhaps with diminishing strides as the total allocation
> grows (ie allocate local and progressively move towards a page
> round robin scheme as more memory is requested). I'm not sure
> this would do any better than plain old dumb round robin in the
> average case but it's worth a thought.

That's definitely not a bad idea. Also, it's probably not a bad idea to have
at least two choices: "allocate on closest processor" and "round robin" (or
"random" -- apparently random is a better approach than LRU for cache-line
replacement, where LRU tends to work very badly in some cases, so it may be
a better approach than round robin for the same reason).

> > However, supporting NUMA in the Hypervisor and forwarding arch-info
> > to the guest would make sense. At the least the very basic principle
> > of: If the guest is to run on a limited set of processors (nodes),
> > allocate memory from that (those) node(s) for the guest would make a
> > lot of sense.
>
> I suspect there is widespread agreement on this point.
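[Editorial note: the one point everyone in the thread agrees on -- allocate a guest's memory from the nodes its processors are pinned to -- can be sketched as below. The CPU-to-node map is a made-up topology for a 2-node, 4-CPU box, not any real machine's.]

```python
# Sketch: given a guest's CPU pinning, restrict its memory to the nodes
# those CPUs belong to, interleaving among them if it spans several.
cpu_to_node = {0: 0, 1: 0, 2: 1, 3: 1}  # assumed 2-node, 4-CPU topology

def candidate_nodes(pinned_cpus):
    """Nodes the guest's memory should come from, given its CPU pinning."""
    return sorted({cpu_to_node[cpu] for cpu in pinned_cpus})

def place_guest_pages(num_pages, pinned_cpus):
    """Interleave the guest's pages across its own nodes only."""
    nodes = candidate_nodes(pinned_cpus)
    return [nodes[i % len(nodes)] for i in range(num_pages)]

print(place_guest_pages(4, pinned_cpus=[2, 3]))  # [1, 1, 1, 1]
```

A guest pinned within one node gets purely local memory; a guest spanning both nodes falls back to interleaving across exactly those nodes, matching the round-robin default argued for earlier.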
I am puzzled: what is page migration?

Thank you in advance.

Emmanuel Ackaouy wrote:
> On the topic of NUMA:
>
> I'd like to dispute the assumption that a NUMA-aware OS can actually
> make good decisions about the initial placement of memory in a
> reasonable hardware ccNUMA system.
> [...]
> If you go beyond initial memory placement or start considering memory
> migration, then it's even harder to win because you have to pay copy
> and stall penalties during migrations. So you have to be real smart
> about predicting the future to do better than your ~10-40% memory
> bandwidth and latency hit associated with doing simple memory
> interleaving on a modern hardware-ccNUMA system.
> [...]
> I also am skeptical that the complexity associated with page migration
> strategies would be worthwhile: If you got it wrong the first time,
> what makes you think you'll do better this time?
>
> Emmanuel.
> -----Original Message-----
> From: tgh [mailto:tianguanhua@ncic.ac.cn]
> Sent: 20 March 2007 13:10
> To: Emmanuel Ackaouy
> Cc: Petersson, Mats; Anthony Liguori; xen-devel; David Pilger; Ryan
> Harper
> Subject: Re: [Xen-devel] Re: NUMA and SMP
>
> I am puzzled: what is page migration?
> Thank you in advance

I'm not entirely sure it's the correct term, but I used it to mean that if
you allocate some memory local to processor X, and later on the page is used
by processor Y, then one could consider "moving" the page from the memory
region of X to the memory region of Y. So you "migrate" the page from one
processor to another. This is of course not a "free" operation, and it's
only really helpful if the memory is accessed many times afterwards (and not
cached each time it's accessed).

A case where this can be done "almost for free" is when a page is swapped
out: on its return, allocate the page on the node of the processor that made
the access. But of course, if you're looking for ultimate performance,
swapping is a terrible idea -- so making small optimizations in memory
management when you're losing tons of cycles by swapping is meaningless as
an overall performance gain.

--
Mats
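[Editorial note: the "not a free operation" trade-off Mats describes reduces to a simple break-even calculation. All numbers below are invented for illustration.]

```python
# Toy cost model for the page-migration trade-off: moving a page only
# pays off if the remote-access penalty saved over the page's remaining
# lifetime exceeds the one-off copy/stall cost of the migration itself.
COPY_COST_NS = 20_000        # assumed cost of migrating one page
REMOTE_PENALTY_NS = 150      # assumed extra latency per remote access

def should_migrate(expected_future_accesses):
    """Migrate only if the saved penalty outweighs the copy cost."""
    saved = expected_future_accesses * REMOTE_PENALTY_NS
    return saved > COPY_COST_NS

print(should_migrate(100))    # False: 15,000 ns saved < 20,000 ns copy
print(should_migrate(1000))   # True
```

The catch, as Emmanuel points out, is that `expected_future_accesses` is exactly the quantity the OS cannot know in advance, which is why both sides of this thread are wary of migration.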
Thank you for your reply. I see.

Does Xen support a NUMA-aware guest Linux now, or will it in the
future?

Another question, which maybe should be another topic: what is the
function of xc_map_foreign_range() in /tools/libxc/xc_linux.c? Does
xc_map_foreign_range() mmap memory shared with another domain, or with
domain0, or something else? Could you help me?

Thanks in advance.

Petersson, Mats wrote:
>> -----Original Message-----
>> From: tgh [mailto:tianguanhua@ncic.ac.cn]
>> Sent: 20 March 2007 13:10
>> To: Emmanuel Ackaouy
>> Cc: Petersson, Mats; Anthony Liguori; xen-devel; David Pilger; Ryan Harper
>> Subject: Re: [Xen-devel] Re: NUMA and SMP
>>
>> I am puzzled: what is page migration?
>> Thank you in advance
>
> I'm not entirely sure it's the correct term, but I used it to
> indicate that if you allocate some memory local to processor X, and
> later on the page is used by processor Y, then one could consider
> "moving" the page from the memory region of X to the memory region
> of Y. So you "migrate" the page from one processor to another. This
> is of course not a "free" operation, and it's only really helpful if
> the memory is accessed many times (and not cached each time it's
> accessed).
>
> A case where this can be done "almost for free" is when a page is
> swapped out: on return, allocate the page on the node of the
> processor that made the access. But of course, if you're looking for
> ultimate performance, swapping is a terrible idea - so making small
> optimizations in memory management when you're losing tons of cycles
> by swapping is meaningless as an overall performance gain.
>
> --
> Mats
>
>> Emmanuel Ackaouy wrote:
>>
>>> On the topic of NUMA:
>>>
>>> I'd like to dispute the assumption that a NUMA-aware OS can
>>> actually make good decisions about the initial placement of memory
>>> in a reasonable hardware ccNUMA system.
>>>
>>> How does the OS know on which node a particular chunk of memory
>>> will be most accessed? The truth is that unless the application or
>>> person running the application is herself NUMA-aware and can
>>> provide placement hints or directives, the OS will seldom beat a
>>> round-robin / interleave or random placement strategy.
>>>
>>> To illustrate, consider an app which lays out a bunch of data in
>>> memory in a single thread and then spawns worker threads to
>>> process it.
>>>
>>> Is the OS to place memory close to the initial thread? How can it
>>> possibly know how many threads will eventually process the data?
>>>
>>> [...]
>>>
>>> I also am skeptical that the complexity associated with page
>>> migration strategies would be worthwhile: if you got it wrong the
>>> first time, what makes you think you'll do better this time?
>>>
>>> Emmanuel.
On Tue, 2007-03-20 at 21:10 +0800, tgh wrote:
> I am puzzled: what is page migration?
> Thank you in advance

Is NUMA clear? NUMA distributes main memory across multiple memory
interfaces.

This used to be a feature reserved for high-end multiprocessor
architectures, but in servers it is becoming sort of a commodity these
days, in part due to AMD multiprocessor systems being NUMA systems.
AMD64 processors carry an integrated memory controller. So, if you buy
an SMP machine with AMD processors today, you'd find each slice of the
total memory connected to a different processor inside.

Note that this doesn't break the 'symmetric' in 'SMP': it still
remains a global, flat physical address space. The processors have
interconnects by which memory can be read from remote processors as
well, and will do so transparently to system and application software.

[The alternative is the 'classic' model: multiple processors
interconnected to make an SMP, but with a single memory interface in a
single northbridge (Intel would call it the "MCH") on the front-side
bus, connecting all processors to main memory. Obviously, that single
memory interface will easily become a bottleneck if all processors try
to access memory simultaneously.]

NUMA *may* help here: accessing local memory is very fast. Accessing
remote memory is still pretty fast, but not as fast as it could be:
hence 'NUMA' - non-uniform memory access.

So, in order to take advantage of such a memory topology, memory data
would ideally always be at the CPU where the processing happens. But
processes (or domains, regarding Xen) may migrate between different
processors. Whether this happens depends on scheduling decisions.
There's a cost involved in migration itself, so schedulers will
ideally do it only if it really-makes-sense(TM).

In order to keep a NUMA system happy, pages once allocated could be
moved as well, to where the current CPU is. This is page migration.
As you may imagine, it is even more costly, and unfortunately
completely useless if CPU migration needs to happen on a regular
basis. Therefore it's difficult to get right. Getting it right depends
on how much the scheduler and memory management know about where the
memory asked for will be needed -- in advance. This is the hardest
part: most software won't tell, because the programming models
employed today do not even recognize the fact that it may matter. Even
if they did, in many cases it would be difficult to predict at all.

regards,
daniel

--
Daniel Stodden
LRR - Lehrstuhl für Rechnertechnik und Rechnerorganisation
Institut für Informatik der TU München
D-85748 Garching
http://www.lrr.in.tum.de/~stodden   mailto:stodden@cs.tum.edu
PGP Fingerprint: F5A4 1575 4C56 E26A 0B33 3D80 457E 82AE B0D8 735B
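The point that migration only pays off under the right access pattern can be made concrete with a toy break-even model (all numbers hypothetical, purely illustrative):

```python
def migration_pays_off(local_ns, remote_ns, copy_cost_ns, future_accesses):
    """A page migration is worthwhile only if the latency saved on
    future local accesses exceeds the one-time cost of copying the
    page (and stalling while doing so)."""
    saved = (remote_ns - local_ns) * future_accesses
    return saved > copy_cost_ns

# Hypothetical numbers: 60ns local vs 100ns remote access, ~2000ns to
# copy a page. Break-even is at 50 further accesses from the new node.
print(migration_pays_off(60, 100, 2000, 10))    # -> False: not worth it
print(migration_pays_off(60, 100, 2000, 1000))  # -> True: clearly worth it
```

And if the vcpu itself migrates again soon afterwards, `future_accesses` from the new node drops toward zero, which is why the post calls page migration useless when CPU migration happens on a regular basis.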
> -----Original Message-----
> From: tgh [mailto:tianguanhua@ncic.ac.cn]
> Sent: 20 March 2007 13:50
> To: Petersson, Mats
> Cc: Emmanuel Ackaouy; Anthony Liguori; xen-devel; David Pilger; Ryan Harper
> Subject: Re: [Xen-devel] Re: NUMA and SMP
>
> Thank you for your reply. I see.
>
> Does Xen support a NUMA-aware guest Linux now or in the future?

There is no support in current Xen for NUMA-awareness, and for the
guest to understand the NUMA-ness of the system, Xen must have
sufficient understanding to forward the relevant information to the
guest.

> Another question, which maybe should be another topic: what is the
> function of xc_map_foreign_range() in /tools/libxc/xc_linux.c? Does
> xc_map_foreign_range() mmap memory shared with another domain, or
> with domain0, or something else?

It maps a shared memory region with the domain specified by "domid".

--
Mats

> Could you help me?
> Thanks in advance
>
> Petersson, Mats wrote:
> [...]
* Petersson, Mats <Mats.Petersson@amd.com> [2007-03-20 11:33]:
> > Does Xen support a NUMA-aware guest Linux now or in the future?
>
> There is no support in current Xen for NUMA-awareness, and for the
> guest to understand the NUMA-ness of the system, Xen must have
> sufficient understanding to forward the relevant information to the
> guest.

As of Xen 3.0.4, Xen has support for detecting NUMA systems, for
parsing the SRAT tables which indicate how memory and cpus are split
up between the system's NUMA nodes, and for allocating memory local to
a particular cpu. To use NUMA, one must pass numa=on on the Xen
command line.

Xen still lacks a NUMA-aware scheduler, so one must be sure to pin
vcpus and keep the guest within a NUMA node. This is done using the
cpus="" parameter in the guest config file.

Xen doesn't export any of the topology information it gleans from the
SRAT table at the moment.

--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
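For example, the pinning described above might look like this in a guest config file (the cpu range is hypothetical; it must match the cpus of one node on the actual machine):

```
# guest config fragment: keep all vcpus on node 0's cpus
# (assumed here to be 0-3) so the scheduler never migrates
# them off-node, and the NUMA-local allocation stays local.
vcpus = 4
cpus = "0-3"
```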
> -----Original Message-----
> From: Ryan Harper [mailto:ryanh@us.ibm.com]
> Sent: 20 March 2007 16:46
> To: Petersson, Mats
> Cc: tgh; xen-devel
> Subject: Re: [Xen-devel] Re: NUMA and SMP
>
> As of Xen 3.0.4, Xen has support for detecting NUMA systems, for
> parsing the SRAT tables which indicate how memory and cpus are split
> up between the system's NUMA nodes, and for allocating memory local
> to a particular cpu. To use NUMA, one must pass numa=on on the Xen
> command line.
>
> Xen still lacks a NUMA-aware scheduler, so one must be sure to pin
> vcpus and keep the guest within a NUMA node. This is done using the
> cpus="" parameter in the guest config file.
>
> Xen doesn't export any of the topology information it gleans from
> the SRAT table at the moment.

Thanks for the update - I must have missed that it went in.

--
Mats
Thank you for your reply.

Daniel Stodden wrote:
> On Tue, 2007-03-20 at 21:10 +0800, tgh wrote:
>> I am puzzled: what is page migration?
>
> NUMA is clear? NUMA distributes main memory across multiple memory
> interfaces.
> [...]
> Note that this doesn't break the 'symmetric' in 'SMP': it still
> remains a global, flat physical address space. The processors have
> interconnects by which memory can be read from remote processors as
> well, and will do so transparently to system and application
> software.

That is, in an SMP with AMD64 it is NUMA in the hardware architecture,
while it is SMP in the system software - is that right?

Thanks in advance.
On Wed, 2007-03-21 at 09:08 +0800, tgh wrote:
> That is, in an SMP with AMD64 it is NUMA in the hardware
> architecture, while it is SMP in the system software - is that
> right?

%}
I believe you mean the right thing. It remains a regular SMP
architecture; system software remains SMP.

regards,
daniel
Thank you for the reply.

>> That is, in an SMP with AMD64 it is NUMA in the hardware
>> architecture, while it is SMP in the system software - is that
>> right?
>
> %}
> I believe you mean the right thing. It remains a regular SMP
> architecture; system software remains SMP.

In Linux, one node (struct pglist_data) has many zones (struct
zone_struct), and a zone has many pages (struct page) - is that right?

In an SMP of AMD64 machines with NUMA hardware, Linux is an SMP OS. Is
there then only one node (struct pglist_data) in the running OS, or
are there as many nodes as cpus in the system? Does SMP Linux support
two or more nodes at runtime, or in this case does Linux support the
NUMA feature? I am confused; could you help me?

Thanks in advance.
On Thu, 2007-03-22 at 09:16 +0800, tgh wrote:
> In Linux, one node (struct pglist_data) has many zones (struct
> zone_struct), and a zone has many pages (struct page) - is that
> right?

Right.

> In an SMP of AMD64 machines with NUMA hardware [...] does SMP Linux
> support two or more nodes at runtime, or in this case does Linux
> support the NUMA feature?

The number of nodes corresponds to the number of memory areas which
the allocators need to distinguish. In the case of integrated memory
controllers like AMD64, expect to find as many nodes as there are
processors.

Yes, Linux has NUMA support. See linux/Documentation/vm/.

daniel
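The hierarchy being confirmed here can be sketched as a toy model (Python for brevity; the real structures are C: struct pglist_data, zone, page):

```python
# Toy model of the Linux memory-management hierarchy discussed above:
# node (struct pglist_data) -> zones -> pages. On AMD64 with
# integrated memory controllers, expect one node per processor.

class Zone:                      # stands in for struct zone_struct
    def __init__(self, name, num_pages):
        self.name = name
        self.pages = [object() for _ in range(num_pages)]

class Node:                      # stands in for struct pglist_data
    def __init__(self, node_id):
        self.node_id = node_id
        self.zones = [Zone("DMA", 16), Zone("Normal", 64)]

def build_nodes(num_processors):
    """One pglist_data per memory controller, i.e. per AMD64 socket."""
    return [Node(i) for i in range(num_processors)]

nodes = build_nodes(2)           # a 2-socket AMD64 box: 2 nodes
print(len(nodes), len(nodes[0].zones), len(nodes[0].zones[1].pages))
# -> 2 2 64
```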
Thank you for your reply.

> The number of nodes corresponds to the number of memory areas which
> the allocators need to distinguish. In the case of integrated memory
> controllers like AMD64, expect to find as many nodes as there are
> processors.
>
> Yes, Linux has NUMA support. See linux/Documentation/vm/.

Linux has NUMA support, and in the case of integrated memory
controllers like AMD64, CONFIG_NUMA should be chosen and Linux then
supports NUMA - is that right? If CONFIG_NUMA is not chosen, then
Linux cannot work well with integrated memory controllers, even on
AMD64 - is that right?

And the paravirt Xen guest Linux does not support NUMA-awareness now -
or does it support it if CONFIG_NUMA is chosen?

Thanks in advance.
On Thu, 2007-03-22 at 20:13 +0800, tgh wrote:
> Linux has NUMA support, and in the case of integrated memory
> controllers like AMD64, CONFIG_NUMA should be chosen and Linux then
> supports NUMA - is that right? If CONFIG_NUMA is not chosen, then
> Linux cannot work well with integrated memory controllers, even on
> AMD64 - is that right?
>
> And the paravirt Xen guest Linux does not support NUMA-awareness now
> - or does it support it if CONFIG_NUMA is chosen?

No. There is NUMA support in Xen, as far as inspection of the memory
topology and inclusion in the memory management is concerned. So,
basically, you can add the desired node number to get_free_pages().

There is NUMA support in Linux to a somewhat larger degree. But...

    config NUMA
            bool "Non Uniform Memory Access (NUMA) Support"
            depends on SMP && !X86_64_XEN

...there is no NUMA support for paravirtual kernels at this point in
time.

See, you can't just switch it on and expect anything to improve. The
VM may typically see a subset of the cpus/nodes physically available,
with no reflection of their mapping to physical nodes. Page migration
between logical cpus is pointless if logical cpus migrate across
physical ones, right?

regards,
daniel
* Daniel Stodden <stodden@cs.tum.edu> [2007-03-22 07:29]:
> No. There is NUMA support in Xen, as far as inspection of the memory
> topology and inclusion in the memory management is concerned. So,
> basically, you can add the desired node number to get_free_pages().

There is NUMA support in the Xen hypervisor since 3.0.4, and we have
the capability to ensure a guest's memory is local to the processors
being used. The topology of the system is not exported to the guest,
so CONFIG_NUMA in the guest kernel config will be of no value.
On Thu, 2007-03-22 at 08:02 -0500, Ryan Harper wrote:
> There is NUMA support in the Xen hypervisor since 3.0.4, and we have
> the capability to ensure a guest's memory is local to the processors
> being used. The topology of the system is not exported to the guest,
> so CONFIG_NUMA in the guest kernel config will be of no value.

Oops, that's more than I'd noticed. Thanks for the correction. So now
it seems up to me to ask questions. :} I don't see that path taken
along vcpu_migrate. Where is it happening?

cheers,
daniel
* Daniel Stodden <stodden@cs.tum.edu> [2007-03-22 10:03]:
> Oops, that's more than I'd noticed. Thanks for the correction. So
> now it seems up to me to ask questions. :} I don't see that path
> taken along vcpu_migrate. Where is it happening?

The credit scheduler is not NUMA-aware. So, to ensure that the initial
allocation for the guest remains local, the domain uses a cpumask
(generated from the cpus="" config file option) to keep the scheduler
from migrating vcpus to off-node cpus.
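The effect of that cpumask on scheduling decisions can be sketched like so (illustrative Python; the real logic lives in Xen's C scheduler, and the function names below are made up):

```python
def parse_cpus(spec):
    """Parse a cpus="" style range spec like "0-3" or "0,2" into a
    set. (Simplified; the real Xen parser accepts more forms.)"""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def migration_candidates(cpumask, online_cpus):
    """The scheduler may only move a vcpu to cpus in its mask, so a
    mask covering one node keeps the vcpu (and its memory) node-local."""
    return sorted(c for c in online_cpus if c in cpumask)

mask = parse_cpus("0-3")                     # say, the cpus of node 0
print(migration_candidates(mask, range(8)))  # -> [0, 1, 2, 3]
```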
On Thu, 2007-03-22 at 10:12 -0500, Ryan Harper wrote:
> The credit scheduler is not NUMA-aware. So, to ensure that the
> initial allocation for the guest remains local, the domain uses a
> cpumask (generated from the cpus="" config file option) to keep the
> scheduler from migrating vcpus to off-node cpus.

I see. Not like I'm too deep in the SRAT, but methinks there may be
sane default values to be generated from it. Any work happening, or
having happened, on that front?

regards,
daniel
* Daniel Stodden <stodden@cs.tum.edu> [2007-03-22 10:41]:
> i see. not like i'm too deep in the srat, but methinks there may be sane
> default values to be generated from the srat. any work happening or
> having happened on that front?

Xen understands the topology but does not make any direct use of the
information, either in the initial placement of VCPUs for the guest or
in the scheduler when making migration decisions. I'm not aware of any
work to address that at the moment.
On Thu, 2007-03-22 at 11:01 -0500, Ryan Harper wrote:
> Xen understands the topology but does not make any direct use of the
> information, either in the initial placement of VCPUs for the guest or
> in the scheduler when making migration decisions. I'm not aware of any
> work to address that at the moment.

thanks. do you continue work on xen and numa, or proceed elsewhere?

regards,
daniel
* Daniel Stodden <stodden@cs.tum.edu> [2007-03-22 11:25]:
> thanks. do you continue work on xen and numa, or proceed elsewhere?

I continue to work on Xen and keep an eye on the NUMA support.
hi

how many nodes in a NUMA system with amd64 does xen support at present?

Thank you

Ryan Harper wrote:
> I continue to work on Xen and keep an eye on the NUMA support.
* tgh <tianguanhua@ncic.ac.cn> [2007-03-23 00:48]:
> hi
> how many nodes in a NUMA system with amd64 does xen support at present?

In xen/include/asm-x86/numa.h:

    #define NODE_SHIFT 6

and in xen/include/xen/numa.h:

    #define MAX_NUMNODES (1 << NODE_SHIFT)

which works out to 64 nodes. I don't know if anyone has tested more
than an 8-node system.
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Ryan Harper
> Sent: 23 March 2007 14:43
> To: tgh
> Cc: Xen Developers; Daniel Stodden
> Subject: Re: [Xen-devel] Re: NUMA and SMP
>
> which works out to 64 nodes. I don't know if anyone has tested more
> than an 8-node system.

Of course, if we're talking AMD64 systems, and a node is a socket, the
currently available architecture supports 8 nodes, so there's plenty of
room to grow such a system. I think there are plans to grow this, but I
doubt that the limit above will be reached anytime soon.

Even if a node is a core within a CPU, the current limit of 8 sockets
caps the number of cores in a system at 32 when the quad-core processors
become available. So it's still sufficient to support any current
architecture.

-- 
Mats
hi

xen does not support numa-aware guest linux, is that right?

and there are memory-hotplug.c and migration.c in linux 2.6.20; does
that mean linux can support memory hotplug or not? if it can, does
linux have to be numa-aware to support memory hotplug, or can an smp
linux support memory hotplug?

I am confused about it. could you help me?

Thanks in advance

Petersson, Mats wrote:
> Even if a node is a core within a CPU, the current limit of 8 sockets
> will limit the number of cores in a system to 32 cores when the
> quad-core processors become available. So still sufficient to support
> any current architecture.
* tgh <tianguanhua@ncic.ac.cn> [2007-03-27 20:51]:
> hi
> xen does not support numa-aware guest linux, is it right?

You can have NUMA enabled in your guest, but Xen does not export
something like a virtual SRAT table that your NUMA-aware guest could use
to determine whether its memory and cpus were in two different nodes.

> and there are memory-hotplug.c and migration.c in linux 2.6.20; does
> that mean linux can support memory hotplug or not?

I don't know the current state of memory hotplug in Linux.

> if it can, does linux have to be numa-aware to support memory hotplug

I don't believe supporting memory hotplug is related to NUMA.

> or can an smp linux support memory hotplug?

SMP linux isn't related to memory hotplug either.
Liang Yang
2007-Mar-28 21:25 UTC
[Xen-devel] The context switch overhead comparison between vmexit/vmentry and hypercall.
Hi,

Considering just the pure context-switch overhead, which is larger:
using HW vmexit/vmentry to switch between root and non-root mode by
programming the VT-x vector, or using a SW hypercall to inject an
interrupt and switch from ring 1 to ring 0 (or ring 3 to ring 0 for a
64-bit OS)?

Does the switch between ring 1 and ring 0 have the same overhead as the
switch between ring 3 and ring 0?

BTW, both root and non-root mode have four rings. If ring 0 and ring 3
in non-root mode are used for the guest OS kernel and user applications,
which ring level in root mode will be used when doing a vmexit?

Thanks,
Liang