Keir (and community),

Any thoughts on Juergen Gross's patch on cpu pools?

As a reminder, the idea is to allow "pools" of cpus that would have separate schedulers. Physical cpus and domains can be moved from one pool to another only by an explicit command. The main purpose Fujitsu seems to have is to allow a simple machine "partitioning" that is more robust than using simple affinity masks. Another potential advantage would be the ability to use different schedulers for different purposes.

For my part, cpu pools seem like they should be OK. The main thing I don't like is the ugliness related to continue_hypercall_on_cpu(), described below.

Juergen, could you remind us what the advantages of pools in the hypervisor were, versus just having affinity masks (with maybe sugar in the toolstack)?

Re the ugly part of the patch, relating to continue_hypercall_on_cpu():

Domains are assigned to a pool, so if continue_hypercall_on_cpu() is called for a cpu not in the domain's pool, you can't just run it normally. Juergen's solution (IIRC) was to pause all domains in the other pool, temporarily move the cpu in question to the calling domain's pool, finish the hypercall, then move the cpu in question back to the other pool.

Since there are a lot of antecedents in that, let's take an example:

Two pools; pool A has cpus 0 and 1, pool B has cpus 2 and 3.

Domain 0 is running in pool A, domain 1 is running in pool B.

Domain 0 calls "continue_hypercall_on_cpu()" for cpu 2.

Cpu 2 is in pool B, so Juergen's patch:
* Pauses domain 1
* Moves cpu 2 to pool A
* Finishes the hypercall
* Moves cpu 2 back to pool B
* Unpauses domain 1

That seemed a bit ugly to me, but I'm not familiar enough with the use cases or the code to know if there's a cleaner solution.

 -George
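[To make the sequence above concrete, here is a small standalone sketch. All names and types are invented for illustration; this is not the patch's code, only a toy model of the ordering of operations.]

    /* Toy model of the borrow/return sequence described above.
     * All names and types are invented; this is not the patch's code. */
    #include <stdio.h>

    struct pool { const char *name; };

    static void pause_all_domains_in(struct pool *p)      { printf("pause all domains in %s\n", p->name); }
    static void unpause_all_domains_in(struct pool *p)    { printf("unpause all domains in %s\n", p->name); }
    static void move_cpu_to_pool(int cpu, struct pool *p) { printf("move cpu %d to %s\n", cpu, p->name); }

    int main(void)
    {
        struct pool A = { "pool A" }, B = { "pool B" };
        int cpu = 2;                          /* cpu 2 lives in pool B            */

        pause_all_domains_in(&B);             /* 1. quiesce the other pool        */
        move_cpu_to_pool(cpu, &A);            /* 2. "borrow" cpu 2 into pool A    */
        printf("run hypercall continuation on cpu %d\n", cpu);
        move_cpu_to_pool(cpu, &B);            /* 3. give cpu 2 back to pool B     */
        unpause_all_domains_in(&B);           /* 4. let domain 1 run again        */
        return 0;
    }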
On 27/07/2009 16:20, "George Dunlap" <dunlapg@umich.edu> wrote:

> Keir (and community),
>
> Any thoughts on Juergen Gross's patch on cpu pools?
>
> As a reminder, the idea is to allow "pools" of cpus that would have separate schedulers. Physical cpus and domains can be moved from one pool to another only by an explicit command. The main purpose Fujitsu seems to have is to allow a simple machine "partitioning" that is more robust than using simple affinity masks. Another potential advantage would be the ability to use different schedulers for different purposes.

My own opinion, if it was not clear before, is that I'm not personally super excited about this feature. I'd like to know how interested users are in it before we spend developer effort on polishing it for inclusion.

 -- Keir
George Dunlap wrote:
> Keir (and community),
>
> Any thoughts on Juergen Gross's patch on cpu pools?
>
> As a reminder, the idea is to allow "pools" of cpus that would have separate schedulers. Physical cpus and domains can be moved from one pool to another only by an explicit command. The main purpose Fujitsu seems to have is to allow a simple machine "partitioning" that is more robust than using simple affinity masks. Another potential advantage would be the ability to use different schedulers for different purposes.
>
> For my part, cpu pools seem like they should be OK. The main thing I don't like is the ugliness related to continue_hypercall_on_cpu(), described below.
>
> Juergen, could you remind us what the advantages of pools in the hypervisor were, versus just having affinity masks (with maybe sugar in the toolstack)?
>
> Re the ugly part of the patch, relating to continue_hypercall_on_cpu():
>
> Domains are assigned to a pool, so if continue_hypercall_on_cpu() is called for a cpu not in the domain's pool, you can't just run it normally. Juergen's solution (IIRC) was to pause all domains in the other pool, temporarily move the cpu in question to the calling domain's pool, finish the hypercall, then move the cpu in question back to the other pool.
>
> Since there are a lot of antecedents in that, let's take an example:
>
> Two pools; pool A has cpus 0 and 1, pool B has cpus 2 and 3.
>
> Domain 0 is running in pool A, domain 1 is running in pool B.
>
> Domain 0 calls "continue_hypercall_on_cpu()" for cpu 2.
>
> Cpu 2 is in pool B, so Juergen's patch:
> * Pauses domain 1
> * Moves cpu 2 to pool A
> * Finishes the hypercall
> * Moves cpu 2 back to pool B
> * Unpauses domain 1
>
> That seemed a bit ugly to me, but I'm not familiar enough with the use cases or the code to know if there's a cleaner solution.

A use case from me: I want a pool that passes pcpus through to the mission-critical domains. A scheduling algorithm will map vcpus to pcpus one by one in this pool. That will implement a reliable hard partitioning, although it will lose some benefit of virtualization. And we still want a pool using the credit scheduler for common domains.

thanks,

zhigang

> -George
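[A minimal sketch of the one-to-one vcpu-to-pcpu mapping Zhigang describes. It is purely illustrative; the names are invented and this is not an actual Xen scheduler.]

    /* Toy illustration of a 1:1 vcpu -> pcpu mapping inside a pool.
     * Names are invented; this is not Xen code. */
    #include <stdio.h>

    int main(void)
    {
        int pool_pcpus[] = { 2, 3 };       /* pcpus assigned to the pool        */
        int nr = sizeof(pool_pcpus) / sizeof(pool_pcpus[0]);

        /* vcpu i of the critical domain always runs on the i-th pcpu of the pool */
        for ( int i = 0; i < nr; i++ )
            printf("vcpu %d -> pcpu %d\n", i, pool_pcpus[i]);
        return 0;
    }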
George Dunlap wrote:
> Keir (and community),
>
> Any thoughts on Juergen Gross's patch on cpu pools?
>
> As a reminder, the idea is to allow "pools" of cpus that would have separate schedulers. Physical cpus and domains can be moved from one pool to another only by an explicit command. The main purpose Fujitsu seems to have is to allow a simple machine "partitioning" that is more robust than using simple affinity masks. Another potential advantage would be the ability to use different schedulers for different purposes.
>
> For my part, cpu pools seem like they should be OK. The main thing I don't like is the ugliness related to continue_hypercall_on_cpu(), described below.
>
> Juergen, could you remind us what the advantages of pools in the hypervisor were, versus just having affinity masks (with maybe sugar in the toolstack)?

Sure. Our main reason for introducing pools was the inability of the current scheduler(s) to schedule domains according to their weights while restricting the domains to a subset of the physical processors using pinning. I think it is virtually impossible to find a general solution for this problem without some sort of pooling (if somebody proves me wrong here, I'm completely glad to take this "perfect" scheduler instead of pools :-) ).

So while the reason for the pools was a lack of functionality in the first place, there are some more benefits:
+ the possibility to use different schedulers for different domains on the same machine (do you remember the discussion with bcredit?). Zhigang has posted a request for this feature already.
+ fewer lock conflicts on huge machines with many processors
+ pools could be a good base for NUMA-aware scheduling policies

> Re the ugly part of the patch, relating to continue_hypercall_on_cpu():
>
> Domains are assigned to a pool, so if continue_hypercall_on_cpu() is called for a cpu not in the domain's pool, you can't just run it normally. Juergen's solution (IIRC) was to pause all domains in the other pool, temporarily move the cpu in question to the calling domain's pool, finish the hypercall, then move the cpu in question back to the other pool.
>
> Since there are a lot of antecedents in that, let's take an example:
>
> Two pools; pool A has cpus 0 and 1, pool B has cpus 2 and 3.
>
> Domain 0 is running in pool A, domain 1 is running in pool B.
>
> Domain 0 calls "continue_hypercall_on_cpu()" for cpu 2.
>
> Cpu 2 is in pool B, so Juergen's patch:
> * Pauses domain 1
> * Moves cpu 2 to pool A
> * Finishes the hypercall
> * Moves cpu 2 back to pool B
> * Unpauses domain 1
>
> That seemed a bit ugly to me, but I'm not familiar enough with the use cases or the code to know if there's a cleaner solution.

Some thoughts on this topic:

The continue_hypercall_on_cpu() function is needed on x86 for loading new microcode into the processor. The source buffer of the new microcode is located in dom0 memory, so dom0 has to run on the physical processor the new code is loaded into (otherwise it wouldn't be accessible). We could avoid the whole continue_hypercall_on_cpu() mechanism if the microcode were copied into a hypervisor buffer and applied with on_selected_cpus() instead. Other users (cpu hotplug and acpi_enter_sleep) would have to switch to other solutions as well.
BTW: continue_hypercall_on_cpu() exists on x86 only, and it isn't really much better than my usage of it:
- remember the old pinning state of the current vcpu
- pin it temporarily to the cpu it should continue on
- continue the hypercall
- remove the temporary pinning
- re-establish the old pinning (if any)

Pretty much the same as my solution above ;-)

So I would suggest eliminating continue_hypercall_on_cpu() completely if you are feeling uneasy with my solution.

Juergen
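[A rough sketch of the on_selected_cpus() alternative Juergen mentions for the microcode case. This is an assumption about how it could look, not code from the patch or the Xen tree; microcode_write() is a placeholder, and the copy_from_guest()/on_selected_cpus() signatures are abbreviated and may not match the current tree exactly.]

    /* Sketch only: copy the microcode blob out of dom0 memory into a
     * Xen-owned buffer first, then apply it on each cpu from IPI context,
     * so no vcpu has to be moved or borrowed. */
    static void apply_ucode(void *info)
    {
        /* Runs on each selected cpu; 'info' points at the Xen-owned copy. */
        microcode_write(info);                  /* hypothetical per-cpu apply */
    }

    static long microcode_update_all(XEN_GUEST_HANDLE(void) buf, unsigned long len)
    {
        void *ucode = xmalloc_bytes(len);       /* hypervisor-owned buffer */

        if ( ucode == NULL )
            return -ENOMEM;
        if ( copy_from_guest(ucode, buf, len) ) /* copy out of dom0 memory */
        {
            xfree(ucode);
            return -EFAULT;
        }
        on_selected_cpus(&cpu_online_map, apply_ucode, ucode, 1 /* wait */);
        xfree(ucode);
        return 0;
    }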
On 28/07/2009 06:40, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

> BTW: continue_hypercall_on_cpu() exists on x86 only, and it isn't really much better than my usage of it:
> - remember the old pinning state of the current vcpu
> - pin it temporarily to the cpu it should continue on
> - continue the hypercall
> - remove the temporary pinning
> - re-establish the old pinning (if any)
> Pretty much the same as my solution above ;-)

If your solution locks the pinning, as we do already, so that it cannot be changed while the continue_hypercall_on_cpu() is running, then that is fine. If it's not locked then it's not safe.

 -- Keir
At 01:41 +0100 on 28 Jul (1248745277), Zhigang Wang wrote:
> A use case from me: I want a pool that passes pcpus through to the mission-critical domains. A scheduling algorithm will map vcpus to pcpus one by one in this pool. That will implement a reliable hard partitioning, although it will lose some benefit of virtualization.

That's easily done by setting affinity masks in the tools, without needing any mechanism in Xen.

Tim.
Tim Deegan wrote:
> At 01:41 +0100 on 28 Jul (1248745277), Zhigang Wang wrote:
>> A use case from me: I want a pool that passes pcpus through to the mission-critical domains. A scheduling algorithm will map vcpus to pcpus one by one in this pool. That will implement a reliable hard partitioning, although it will lose some benefit of virtualization.
>
> That's easily done by setting affinity masks in the tools, without needing any mechanism in Xen.

More or less.
You have to set the affinity masks for ALL domains to avoid scheduling on the "special" cpus.
You won't have reliable scheduling weights any more.

Juergen
Keir Fraser wrote:
> On 28/07/2009 06:40, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>> BTW: continue_hypercall_on_cpu() exists on x86 only, and it isn't really much better than my usage of it:
>> - remember the old pinning state of the current vcpu
>> - pin it temporarily to the cpu it should continue on
>> - continue the hypercall
>> - remove the temporary pinning
>> - re-establish the old pinning (if any)
>> Pretty much the same as my solution above ;-)
>
> If your solution locks the pinning, as we do already, so that it cannot be changed while the continue_hypercall_on_cpu() is running, then that is fine. If it's not locked then it's not safe.

Locking in my solution should be okay.

Juergen
On Tue, Jul 28, 2009 at 11:15 AM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
> Tim Deegan wrote:
>> That's easily done by setting affinity masks in the tools, without needing any mechanism in Xen.
>
> More or less.
> You have to set the affinity masks for ALL domains to avoid scheduling on the "special" cpus.
> You won't have reliable scheduling weights any more.

I think this is the key thing: scheduling algorithms normally ignore pinning. Both credit1 and (atm) credit2 assume that a VM with higher credit will be able to run before a VM of lower credit. But if there are some VMs pinned to a subset of pcpus, and other VMs pinned to another subset (or not pinned at all), this breaks that assumption.

The algorithms might be extended to account for pinning, but it would make things a lot more complicated. That means the algorithm is harder to understand and modify, which means it's likely to break in the future as people try to extend it. It also means any future updates or rewrites have to take this pin-to-partition case into account and do the "right thing".

Given that people want to partition a machine, I think cpu pools make the most sense:
* From a user perspective it's easier; no need to pin every VM, simply assign which pool it starts in.
* From a scheduler perspective, it makes thinking about the algorithms easier. It's OK to build in the assumption that each VM can run anywhere. Other than partitioning, there's no real need to adjust the scheduling algorithm to do it.

 -George
At 13:50 +0100 on 28 Jul (1248789008), George Dunlap wrote:
> On Tue, Jul 28, 2009 at 11:15 AM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
>> Tim Deegan wrote:
>>> That's easily done by setting affinity masks in the tools, without needing any mechanism in Xen.
>> More or less.
>> You have to set the affinity masks for ALL domains to avoid scheduling on the "special" cpus.

Bah. You have to set the CPU pool of all domains to achieve the same thing; in any case this kind of thing is what toolstacks are good at. :)

>> You won't have reliable scheduling weights any more.

That's a much more interesting argument. It seems to me that in this simple case the scheduling weights will work out OK, but I can see that in the general case it gets entertaining.

> Given that people want to partition a machine, I think cpu pools make the most sense:
> * From a user perspective it's easier; no need to pin every VM, simply assign which pool it starts in.

I'll say it again because I think it's important: policy belongs in the tools. User-friendly abstractions don't have to extend into the hypervisor interfaces unless...

> * From a scheduler perspective, it makes thinking about the algorithms easier. It's OK to build in the assumption that each VM can run anywhere. Other than partitioning, there's no real need to adjust the scheduling algorithm to do it.

...unless there's a benefit to keeping the hypervisor simple. Which this certainly looks like.

Does strict partitioning of CPUs like this satisfy everyone's requirements? Bearing in mind that

- It's not work-conserving, i.e. it doesn't allow best-effort scheduling of pool A's vCPUs on the idle CPUs of pool B.

- It restricts the maximum useful number of vCPUs per guest to the size of a pool rather than the size of the machine.

- dom0 would be restricted to a subset of CPUs. That seems OK to me but occasionally people talk about having dom0's vCPUs pinned 1-1 on the physical CPUs.

Cheers,

Tim.
Tim Deegan wrote:
> At 13:50 +0100 on 28 Jul (1248789008), George Dunlap wrote:
>> On Tue, Jul 28, 2009 at 11:15 AM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
>>> Tim Deegan wrote:
>>>> That's easily done by setting affinity masks in the tools, without needing any mechanism in Xen.
>>> More or less.
>>> You have to set the affinity masks for ALL domains to avoid scheduling on the "special" cpus.
>
> Bah. You have to set the CPU pool of all domains to achieve the same thing; in any case this kind of thing is what toolstacks are good at. :)

No. If I have a dedicated pool for my "special domain" and all other domains are running in the default pool 0, I only have to set the pool of my special domain. Nothing else.

>>> You won't have reliable scheduling weights any more.
>
> That's a much more interesting argument. It seems to me that in this simple case the scheduling weights will work out OK, but I can see that in the general case it gets entertaining.

Even in the relatively simple case of 2 disjoint subsets of domains/cpus (e.g. 2 domains on cpus 0+1 and 2 domains on cpus 2+3) the consumed time of the domains does not reflect their weights correctly.

>> Given that people want to partition a machine, I think cpu pools make the most sense:
>> * From a user perspective it's easier; no need to pin every VM, simply assign which pool it starts in.
>
> I'll say it again because I think it's important: policy belongs in the tools. User-friendly abstractions don't have to extend into the hypervisor interfaces unless...
>
>> * From a scheduler perspective, it makes thinking about the algorithms easier. It's OK to build in the assumption that each VM can run anywhere. Other than partitioning, there's no real need to adjust the scheduling algorithm to do it.
>
> ...unless there's a benefit to keeping the hypervisor simple. Which this certainly looks like.
>
> Does strict partitioning of CPUs like this satisfy everyone's requirements? Bearing in mind that
>
> - It's not work-conserving, i.e. it doesn't allow best-effort scheduling of pool A's vCPUs on the idle CPUs of pool B.
>
> - It restricts the maximum useful number of vCPUs per guest to the size of a pool rather than the size of the machine.
>
> - dom0 would be restricted to a subset of CPUs. That seems OK to me but occasionally people talk about having dom0's vCPUs pinned 1-1 on the physical CPUs.

You don't have to define other pools. You can just live with the default pool extended to all cpus and everything is as today. Pinning still works in each pool as today.

If a user has domains with different scheduling requirements (e.g. sedf and credit are to be used), he can use one partitioned machine instead of two dedicated machines. And he can shift resources between the domains (e.g. devices, memory, single cores or even threads). He can't do that without pools today.

With pools you have more possibilities without losing any function you have today. The only restriction is that you might not be able to use ALL features together with pools (e.g. complete load balancing), but the alternative would be either to lose some other functionality (scheduling weights) or to use different machines, which won't give you load balancing either.

Juergen
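[A small worked example of the weight problem being discussed; the numbers are invented for illustration. Take two pcpus and three domains of equal weight 256: domain A pinned to pcpu 0, domains B and C pinned to pcpu 1. By weight, each domain should receive a third of the machine (about 67% of one pcpu), but with the masks in place A gets all of pcpu 0 (100%) while B and C share pcpu 1 (50% each), so the equal weights are not honoured. A scheduler that accounts per pool instead of across the whole machine would at least give B and C their correct share relative to each other.]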
At 14:24 +0100 on 28 Jul (1248791073), Juergen Gross wrote:
>> Does strict partitioning of CPUs like this satisfy everyone's requirements? Bearing in mind that
>>
>> - It's not work-conserving, i.e. it doesn't allow best-effort scheduling of pool A's vCPUs on the idle CPUs of pool B.
>>
>> - It restricts the maximum useful number of vCPUs per guest to the size of a pool rather than the size of the machine.
>>
>> - dom0 would be restricted to a subset of CPUs. That seems OK to me but occasionally people talk about having dom0's vCPUs pinned 1-1 on the physical CPUs.
>
> You don't have to define other pools. You can just live with the default pool extended to all cpus and everything is as today.

Yep, all I'm saying is you can't do both. If the people who want this feature (so far I count two of you) want to do both, then this solution's not good enough, and we should think about that before going ahead with it.

Cheers,

Tim.
Tim Deegan wrote:
> At 14:24 +0100 on 28 Jul (1248791073), Juergen Gross wrote:
>>> Does strict partitioning of CPUs like this satisfy everyone's requirements? Bearing in mind that
>>>
>>> - It's not work-conserving, i.e. it doesn't allow best-effort scheduling of pool A's vCPUs on the idle CPUs of pool B.
>>>
>>> - It restricts the maximum useful number of vCPUs per guest to the size of a pool rather than the size of the machine.
>>>
>>> - dom0 would be restricted to a subset of CPUs. That seems OK to me but occasionally people talk about having dom0's vCPUs pinned 1-1 on the physical CPUs.
>> You don't have to define other pools. You can just live with the default pool extended to all cpus and everything is as today.
>
> Yep, all I'm saying is you can't do both. If the people who want this feature (so far I count two of you) want to do both, then this solution's not good enough, and we should think about that before going ahead with it.

Okay.

I think your first point is the most important one.
It might be possible to build a load balancing scheme to shift cpus between pools dynamically, but this should be step 2, I think :-)
But it would be a nice project :-)

Juergen
On Tue, Jul 28, 2009 at 2:31 PM, Tim Deegan <Tim.Deegan@citrix.com> wrote:
> At 14:24 +0100 on 28 Jul (1248791073), Juergen Gross wrote:
>>> Does strict partitioning of CPUs like this satisfy everyone's requirements? Bearing in mind that
>>>
>>> - It's not work-conserving, i.e. it doesn't allow best-effort scheduling of pool A's vCPUs on the idle CPUs of pool B.
>>>
>>> - It restricts the maximum useful number of vCPUs per guest to the size of a pool rather than the size of the machine.
>>>
>>> - dom0 would be restricted to a subset of CPUs. That seems OK to me but occasionally people talk about having dom0's vCPUs pinned 1-1 on the physical CPUs.
>>
>> You don't have to define other pools. You can just live with the default pool extended to all cpus and everything is as today.
>
> Yep, all I'm saying is you can't do both. If the people who want this feature (so far I count two of you) want to do both, then this solution's not good enough, and we should think about that before going ahead with it.

Yes, if you have more than one pool, then dom0 can't run on all cpus; but it can still run with dom0's vcpus pinned 1-1 on the physical cpus in its pool. I'm not sure why someone who wants to partition a machine would simultaneously want dom0 to run across all cpus...

As Juergen says, for people who don't use the feature, it shouldn't have any real effect. The patch is pretty straightforward, except for the "continue_hypercall_on_cpu()" bit.

 -George
On Tue, Jul 28, 2009 at 2:39 PM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
> I think your first point is the most important one.
> It might be possible to build a load balancing scheme to shift cpus between pools dynamically, but this should be step 2, I think :-)
> But it would be a nice project :-)

If I recall your use case, Juergen, I thought the whole point was to keep some set of VMs limited to just a subset of CPUs? So the first point is a feature for you, not a bug. :-)

If we ever do find someone who wants cpu pools, perhaps to use different schedulers, but wants to be able to dynamically adjust pool size, then they can work on such a project. Until then, no point spending time on something no one's going to use.

 -George
On 28/07/2009 14:41, "George Dunlap" <dunlapg@umich.edu> wrote:

> As Juergen says, for people who don't use the feature, it shouldn't have any real effect. The patch is pretty straightforward, except for the "continue_hypercall_on_cpu()" bit.

Just pulled up the patch. Actually cpupool_borrow_cpu() does not seem to lock down the cpu-pool-vcpu relationship while continue_hypercall_on_cpu() is running. In particular, it is clear that it does nothing if the vcpu is already part of the pool that the domain is running in. But then what if the cpu is removed from the pool during the borrow_cpu()/return_cpu() critical region? It hardly inspires confidence.

Another thing I noted is that sched_tick_suspend/resume are pointlessly changed to take a cpu parameter, which is smp_processor_id(). I swear at the screen whenever I see people trying to slip that kind of nonsense in. It makes it look like the functions can operate on an arbitrary cpu when in fact I'll wager they cannot (and I doubt the author of such changes has checked). It's a nasty, nasty interface change.

 -- Keir
George Dunlap wrote:
> On Tue, Jul 28, 2009 at 2:39 PM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
>> I think your first point is the most important one.
>> It might be possible to build a load balancing scheme to shift cpus between pools dynamically, but this should be step 2, I think :-)
>> But it would be a nice project :-)
>
> If I recall your use case, Juergen, I thought the whole point was to keep some set of VMs limited to just a subset of CPUs? So the first point is a feature for you, not a bug. :-)

Indeed.
I just like to think about further enhancements, even if my company isn't requiring them...

> If we ever do find someone who wants cpu pools, perhaps to use different schedulers, but wants to be able to dynamically adjust pool size, then they can work on such a project. Until then, no point spending time on something no one's going to use.

Absolutely true.
OTOH I see pools as an interesting way to support large NUMA systems in an effective way. And for this usage you would need such a project :-)
I think it is very important to check the possible future enhancements, as they might influence decisions today.

Juergen
Sorry for the late join...

I wonder if cpu pools helps with the following problem:

Some large software company that shall remain nameless continues to license their high-value applications on a per-pcpu basis rather than on a per-vcpu basis. As a result, VMs running these applications must be restricted to specific pcpus which are "licensed" to run the software.

Currently this is done with pinning, but pinning does restrict the flexibility of a multi-vcpu VM. Affinity seems like it should help, but affinity doesn't restrict the VM from running on a non-affinitive pcpu (does it?)

For example, assume you have an 8-vcpu VM and it must be restricted to a 2-pcpu license on a 4-pcpu server. Ideally, you'd like any of the 8 vcpus to be assigned to either pcpu at any time, so you don't want to pin, for example, even vcpus to pcpu#0 and odd vcpus to pcpu#1. And, if all vcpus are idle, you'd like pcpu#0 and pcpu#1 to be free to run other VMs.

Can this be done with cpu pools (easier than / more flexibly than / or not at all) with current pinning and affinity?

Also in a data center, do cpu pools make it possible / easier for tools to pre-assign a subset of processors on ALL servers in the data center to serve a certain licensed class of VMs? For example, perhaps one would like to upgrade some of the machines in one's virtual data center from dual-core to quad-core but not pay for additional per-pcpu app licenses (i.e. the additional pcpus will be used for other non-licensed VMs). Tools could assign two pcpus on each server to be part of the "DB pool", thus restricting execution (and license fees) but still allowing easy migration.

Can this be done with cpu pools (easier than / more flexibly than / or not at all) with current pinning and affinity?

If the answer to these questions is yes, then I suspect one large software company might be very interested in cpu pools.

> -----Original Message-----
> From: Juergen Gross [mailto:juergen.gross@ts.fujitsu.com]
> Sent: Tuesday, July 28, 2009 7:57 AM
> To: George Dunlap
> Cc: xen-devel@lists.xensource.com; Zhigang Wang; Tim Deegan; Keir Fraser
> Subject: Re: [Xen-devel] Cpu pools discussion
>
> George Dunlap wrote:
> > On Tue, Jul 28, 2009 at 2:39 PM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
> >> I think your first point is the most important one.
> >> It might be possible to build a load balancing scheme to shift cpus between pools dynamically, but this should be step 2, I think :-)
> >> But it would be a nice project :-)
> >
> > If I recall your use case, Juergen, I thought the whole point was to keep some set of VMs limited to just a subset of CPUs? So the first point is a feature for you, not a bug. :-)
>
> Indeed.
> I just like to think about further enhancements, even if my company isn't requiring them...
>
> > If we ever do find someone who wants cpu pools, perhaps to use different schedulers, but wants to be able to dynamically adjust pool size, then they can work on such a project. Until then, no point spending time on something no one's going to use.
>
> Absolutely true.
> OTOH I see pools as an interesting way to support large NUMA systems in an effective way. And for this usage you would need such a project :-)
> I think it is very important to check the possible future enhancements, as they might influence decisions today.
>
> Juergen
On 28/07/2009 16:29, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> Currently this is done with pinning, but pinning does restrict the flexibility of a multi-vcpu VM. Affinity seems like it should help, but affinity doesn't restrict the VM from running on a non-affinitive pcpu (does it?)

Yes it does. VCPUs only run on PCPUs in their affinity masks.

 -- Keir
Or to put it another way, "pinning" is shorthand for an affinity that contains only one cpu.

 -George

On Tue, Jul 28, 2009 at 4:49 PM, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 28/07/2009 16:29, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>
>> Currently this is done with pinning, but pinning does restrict the flexibility of a multi-vcpu VM. Affinity seems like it should help, but affinity doesn't restrict the VM from running on a non-affinitive pcpu (does it?)
>
> Yes it does. VCPUs only run on PCPUs in their affinity masks.
>
> -- Keir
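[In plain bitmask terms, an illustration only, using a generic C bitmask rather than Xen's cpumask type:]

    /* Affinity as a bitmask of allowed pcpus; "pinning" is just the
     * degenerate case of a mask with a single bit set.  Illustrative only. */
    unsigned long pinned_to_pcpu2   = 1UL << 2;                /* may run only on pcpu 2 */
    unsigned long affine_to_2_and_3 = (1UL << 2) | (1UL << 3); /* may run on pcpu 2 or 3 */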
Keir Fraser wrote:
> On 28/07/2009 16:29, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>
>> Currently this is done with pinning, but pinning does restrict the flexibility of a multi-vcpu VM. Affinity seems like it should help, but affinity doesn't restrict the VM from running on a non-affinitive pcpu (does it?)
>
> Yes it does. VCPUs only run on PCPUs in their affinity masks.
>
> -- Keir

I'm wondering whether there is some performance difference between these two scenarios:

1) vcpu0 pinned to pcpu0, vcpu1 pinned to pcpu1.
2) vcpu0 and vcpu1 affined to pcpu0 and pcpu1 but not pinned.

Currently we have to explicitly pin *every* vcpu to get true hard partitioning. We are seeking a better solution, whether it is in the hypervisor or just user-space tools. But it seems the cpu pool concept is attractive.

thanks,

zhigang
Dan Magenheimer wrote:
> Sorry for the late join...
>
> I wonder if cpu pools helps with the following problem:
>
> Some large software company that shall remain nameless continues to license their high-value applications on a per-pcpu basis rather than on a per-vcpu basis. As a result, VMs running these applications must be restricted to specific pcpus which are "licensed" to run the software.
>
> Currently this is done with pinning, but pinning does restrict the flexibility of a multi-vcpu VM. Affinity seems like it should help, but affinity doesn't restrict the VM from running on a non-affinitive pcpu (does it?)
>
> For example, assume you have an 8-vcpu VM and it must be restricted to a 2-pcpu license on a 4-pcpu server. Ideally, you'd like any of the 8 vcpus to be assigned to either pcpu at any time, so you don't want to pin, for example, even vcpus to pcpu#0 and odd vcpus to pcpu#1. And, if all vcpus are idle, you'd like pcpu#0 and pcpu#1 to be free to run other VMs.
>
> Can this be done with cpu pools (easier than / more flexibly than / or not at all) with current pinning and affinity?

Pools will restrict the assigned domains to the assigned pcpus. This can be done by affinity masks as well. But pools won't allow domains of pool B to run on idle pcpus of pool A.

> Also in a data center, do cpu pools make it possible / easier for tools to pre-assign a subset of processors on ALL servers in the data center to serve a certain licensed class of VMs? For example, perhaps one would like to upgrade some of the machines in one's virtual data center from dual-core to quad-core but not pay for additional per-pcpu app licenses (i.e. the additional pcpus will be used for other non-licensed VMs). Tools could assign two pcpus on each server to be part of the "DB pool", thus restricting execution (and license fees) but still allowing easy migration.
>
> Can this be done with cpu pools (easier than / more flexibly than / or not at all) with current pinning and affinity?

This is easily doable with pools. We are doing this for our BS2000 system.

> If the answer to these questions is yes, then I suspect one large software company might be very interested in cpu pools.

Is one "yes" enough? :-)

Juergen
Keir Fraser wrote:
> On 28/07/2009 14:41, "George Dunlap" <dunlapg@umich.edu> wrote:
>
>> As Juergen says, for people who don't use the feature, it shouldn't have any real effect. The patch is pretty straightforward, except for the "continue_hypercall_on_cpu()" bit.
>
> Just pulled up the patch. Actually cpupool_borrow_cpu() does not seem to lock down the cpu-pool-vcpu relationship while continue_hypercall_on_cpu() is running. In particular, it is clear that it does nothing if the vcpu is already part of the pool that the domain is running in. But then what if the cpu is removed from the pool during the borrow_cpu()/return_cpu() critical region? It hardly inspires confidence.

I checked the use cases.
All calls leading to cpupool_borrow_cpu() are done under the domctl lock. The same applies to all cpupool operations.
I can add an explicit check not to unassign borrowed cpus, if you like.

> Another thing I noted is that sched_tick_suspend/resume are pointlessly changed to take a cpu parameter, which is smp_processor_id(). I swear at the screen whenever I see people trying to slip that kind of nonsense in. It

Sorry, this seems to be an artefact of an earlier version of my changes. I'll remove this one...

> makes it look like the functions can operate on an arbitrary cpu when in fact I'll wager they cannot (and I doubt the author of such changes has checked). It's a nasty, nasty interface change.

I'm pretty sure they could indeed work on any cpu. At least I tried to use them on other cpus, but I ran into other problems, leading to the current solution not requiring the cpu parameter any more.

Juergen
On 29/07/2009 07:14, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>> Just pulled up the patch. Actually cpupool_borrow_cpu() does not seem to lock down the cpu-pool-vcpu relationship while continue_hypercall_on_cpu() is running. In particular, it is clear that it does nothing if the vcpu is already part of the pool that the domain is running in. But then what if the cpu is removed from the pool during the borrow_cpu()/return_cpu() critical region? It hardly inspires confidence.
>
> I checked the use cases.
> All calls leading to cpupool_borrow_cpu() are done under the domctl lock. The same applies to all cpupool operations.

Uhhh... How did you figure that one out? I don't think one single caller of continue_hypercall_on_cpu() holds the domctl_lock. The callers are all sysctls and platform_ops.

> I can add an explicit check not to unassign borrowed cpus, if you like.

Your new interface ought to be responsible for its own synchronisation needs. And if it's not, you should implement the appropriate assertions regarding e.g. spin_is_locked(), plus a code comment. It's simple negligence to do neither.

This is all not to say that I've been convinced we should accept the feature at all...

 -- Keir
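[For illustration, the kind of assertion Keir is asking for might look like the sketch below. The lock name and the surrounding function signature are assumptions, not the patch's actual code; ASSERT() and spin_is_locked() are existing Xen primitives.]

    /* Sketch only: document and assert the locking requirement instead of
     * relying on callers to get it right silently.  'cpupool_lock' is an
     * assumed name, not necessarily what the patch uses. */
    static void cpupool_borrow_cpu(struct domain *d, unsigned int cpu)
    {
        /* The caller must hold the cpupool lock so that the cpu cannot be
         * moved to another pool while it is borrowed. */
        ASSERT(spin_is_locked(&cpupool_lock));

        /* ... borrow logic as in the patch ... */
    }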
Keir Fraser wrote:
> On 29/07/2009 07:14, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>> Just pulled up the patch. Actually cpupool_borrow_cpu() does not seem to lock down the cpu-pool-vcpu relationship while continue_hypercall_on_cpu() is running. In particular, it is clear that it does nothing if the vcpu is already part of the pool that the domain is running in. But then what if the cpu is removed from the pool during the borrow_cpu()/return_cpu() critical region? It hardly inspires confidence.
>> I checked the use cases.
>> All calls leading to cpupool_borrow_cpu() are done under the domctl lock. The same applies to all cpupool operations.
>
> Uhhh... How did you figure that one out? I don't think one single caller of continue_hypercall_on_cpu() holds the domctl_lock. The callers are all sysctls and platform_ops.

Sigh. I just recalled it from memory. Seems I was wrong.

>> I can add an explicit check not to unassign borrowed cpus, if you like.
>
> Your new interface ought to be responsible for its own synchronisation needs. And if it's not, you should implement the appropriate assertions regarding e.g. spin_is_locked(), plus a code comment. It's simple negligence to do neither.

You are right.
I will add a check to ensure borrowed cpus are not allowed to change the pool.

Juergen
On 29/07/2009 09:52, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>>> I can add an explicit check not to unassign borrowed cpus, if you like.
>>
>> Your new interface ought to be responsible for its own synchronisation needs. And if it's not, you should implement the appropriate assertions regarding e.g. spin_is_locked(), plus a code comment. It's simple negligence to do neither.
>
> You are right.
> I will add a check to ensure borrowed cpus are not allowed to change the pool.

A couple more comments.

It is not safe to domain_pause() while you hold locks. It can deadlock, as domain_pause() waits for the domain to be descheduled, but it could be spinning on a lock you hold. Also it looks like a domain can be moved away from a pool while the pool is paused, and then you would leak a pause refcount.

Secondly, I think that the cpupool_borrow/return calls should be embedded within vcpu_{lock,unlock,locked_change}_affinity(); also I see no need to have cpupool_return_cpu() return anything, as you should be able to make a decision to move onto another CPU on the next scheduling round anyway (which can always be forced by setting SCHEDULE_SOFTIRQ).

Really I dislike this patch greatly, as you can tell. ;-) The patchset as a whole is *ginormous*, the Xen patch by itself is pretty big and complicated, and I believe full of races and deadlocks. I've just picked up on a few obvious ones from a very brief read.

 -- Keir
Keir Fraser wrote:
> On 29/07/2009 09:52, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>>> I can add an explicit check not to unassign borrowed cpus, if you like.
>>> Your new interface ought to be responsible for its own synchronisation needs. And if it's not, you should implement the appropriate assertions regarding e.g. spin_is_locked(), plus a code comment. It's simple negligence to do neither.
>> You are right.
>> I will add a check to ensure borrowed cpus are not allowed to change the pool.
>
> A couple more comments.
>
> It is not safe to domain_pause() while you hold locks. It can deadlock, as domain_pause() waits for the domain to be descheduled, but it could be spinning on a lock you hold. Also it looks like a domain can be moved away from a pool while the pool is paused, and then you would leak a pause refcount.
>
> Secondly, I think that the cpupool_borrow/return calls should be embedded within vcpu_{lock,unlock,locked_change}_affinity(); also I see no need to have cpupool_return_cpu() return anything, as you should be able to make a decision to move onto another CPU on the next scheduling round anyway (which can always be forced by setting SCHEDULE_SOFTIRQ).
>
> Really I dislike this patch greatly, as you can tell. ;-) The patchset as a whole is *ginormous*, the Xen patch by itself is pretty big and complicated, and I believe full of races and deadlocks. I've just picked up on a few obvious ones from a very brief read.

The main problems you mention here are related to the cpupool_borrow stuff, which is George's main objection, too (it's not my favourite part of the patch, either).

Would you feel better if I tried to eliminate the reason for cpupool_borrow? This function is needed only for continue_hypercall_on_cpu outside of the current pool. I think it should be possible to replace those calls by on_selected_cpus with less impact on the whole system.

I tried not to change any interfaces which are not directly related to the pools in the first run. If the result of this approach forces you to reject the patch, I would be happy to change it. I agree with you that it would be better not to need that borrow stuff, but I don't know whether you would like the continue_hypercall_on_cpu elimination more (or which solution would cause less pain).

The next step after that would be to split up the xen patch into logical pieces. I would suggest changing the scheduler internals in a separate patch (mainly the elimination of the local variables) to make the functional changes required for the pools more obvious. This should reduce the pure pool-related patch by a factor of 2.

Regarding races: I tested the "normal" pool interfaces (cpu add/remove, domain create/destroy/move) rather intensively (multiple concurrent scripts running for several hours). The cpu borrow stuff was NOT tested very much. There are 3 use cases for this interface:
- cpu microcode loading is run at system boot (this was my favourite test case)
- entering deep sleep only continues on cpu 0, which I removed only occasionally from pool 0
- I don't think I could test cpu hotplug...

Juergen
On 29/07/2009 12:06, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

> Would you feel better if I tried to eliminate the reason for cpupool_borrow? This function is needed only for continue_hypercall_on_cpu outside of the current pool. I think it should be possible to replace those calls by on_selected_cpus with less impact on the whole system.

Some of the stuff in the continuation handlers cannot be executed in irq context. 'Fixing' that would make many of the users ugly and less maintainable, so getting borrow/return right is the better answer I think.

 -- Keir
Keir Fraser wrote:
> On 29/07/2009 12:06, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>> Would you feel better if I tried to eliminate the reason for cpupool_borrow? This function is needed only for continue_hypercall_on_cpu outside of the current pool. I think it should be possible to replace those calls by on_selected_cpus with less impact on the whole system.
>
> Some of the stuff in the continuation handlers cannot be executed in irq context. 'Fixing' that would make many of the users ugly and less maintainable, so getting borrow/return right is the better answer I think.

The alternative would be a tasklet set up in irq context. And we are speaking of 3 users.
I could try a patch and then we could compare the two solutions. What do you think?

Juergen
On 29/07/2009 13:33, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>>> Would you feel better if I tried to eliminate the reason for cpupool_borrow? This function is needed only for continue_hypercall_on_cpu outside of the current pool. I think it should be possible to replace those calls by on_selected_cpus with less impact on the whole system.
>>
>> Some of the stuff in the continuation handlers cannot be executed in irq context. 'Fixing' that would make many of the users ugly and less maintainable, so getting borrow/return right is the better answer I think.
>
> The alternative would be a tasklet set up in irq context. And we are speaking of 3 users.
> I could try a patch and then we could compare the two solutions. What do you think?

This would work for a couple of callers, but some really need to be running in dom0 context. Or, more precisely, not the context of some other domain (softirqs synchronously preempt execution of a vcpu context). This can lead to subtle deadlocks, for example in freeze_domains() and in __cpu_die(), because we may need the vcpu we have synchronously preempted to make some progress for ourselves to be able to get past a spin loop.

Another alternative might be to create a 'hypervisor thread', either dynamically, or a per-cpu worker thread, and do the work in that. Of course that has its own complexities, and these threads would also have their own interactions with cpu pools to keep them pinned on the appropriate physical cpu. I don't know whether this would really work out simpler.

Thanks,
Keir
Keir Fraser wrote:
> On 29/07/2009 13:33, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>>> Would you feel better if I tried to eliminate the reason for cpupool_borrow? This function is needed only for continue_hypercall_on_cpu outside of the current pool. I think it should be possible to replace those calls by on_selected_cpus with less impact on the whole system.
>>> Some of the stuff in the continuation handlers cannot be executed in irq context. 'Fixing' that would make many of the users ugly and less maintainable, so getting borrow/return right is the better answer I think.
>> The alternative would be a tasklet set up in irq context. And we are speaking of 3 users.
>> I could try a patch and then we could compare the two solutions. What do you think?
>
> This would work for a couple of callers, but some really need to be running in dom0 context. Or, more precisely, not the context of some other domain (softirqs synchronously preempt execution of a vcpu context). This can lead to subtle deadlocks, for example in freeze_domains() and in __cpu_die(), because we may need the vcpu we have synchronously preempted to make some progress for ourselves to be able to get past a spin loop.

Okay.

> Another alternative might be to create a 'hypervisor thread', either dynamically, or a per-cpu worker thread, and do the work in that. Of course that has its own complexities, and these threads would also have their own interactions with cpu pools to keep them pinned on the appropriate physical cpu. I don't know whether this would really work out simpler.

There should be an easy solution for this: what you are suggesting here sounds like a "hypervisor domain" similar to the idle domain, but with high priority and normally all vcpus blocked.

The interactions of this domain with cpupools would be the same as for the idle domain.

I think this approach could be attractive, but the question is whether the pros outweigh the cons. OTOH such a domain could open interesting opportunities.

Juergen
On 30/07/2009 06:46, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>> Another alternative might be to create a 'hypervisor thread', either dynamically, or a per-cpu worker thread, and do the work in that. Of course that has its own complexities, and these threads would also have their own interactions with cpu pools to keep them pinned on the appropriate physical cpu. I don't know whether this would really work out simpler.
>
> There should be an easy solution for this: what you are suggesting here sounds like a "hypervisor domain" similar to the idle domain, but with high priority and normally all vcpus blocked.
>
> The interactions of this domain with cpupools would be the same as for the idle domain.
>
> I think this approach could be attractive, but the question is whether the pros outweigh the cons. OTOH such a domain could open interesting opportunities.

I think especially if cpupools are added into the mix then this becomes more attractive than the current approach. The other alternative is to modify the two existing problematic callers to work okay from softirq context (or not need continue_hypercall_on_cpu() at all, which might be possible at least in the case of CPU hotplug). I would be undecided between these two just now -- it depends on how easily those two callers can be fixed up.

CPU hotplug raises a question in relation to cpupools, by the way. What pool does a cpu get added to when it is brought online? And what do you do when someone offlines a CPU (e.g., especially when it is the last in its pool)? In that latter case, have you not considered it, or do you refuse the offline, or do you somehow break the pool affinity so that domains belonging to it can run elsewhere?

 -- Keir
Keir Fraser wrote:
> CPU hotplug raises a question in relation to cpupools, by the way. What pool does a cpu get added to when it is brought online? And what do you do when someone offlines a CPU (e.g., especially when it is the last in its pool)? In that latter case, have you not considered it, or do you refuse the offline, or do you somehow break the pool affinity so that domains belonging to it can run elsewhere?

These cases are already covered by my patch.

A new cpu is always added to the "free pool". It can then be assigned to any pool. Perhaps it would be better to add it to pool 0, but that's a minor detail, I think.

Offlining the last cpu of a pool with active domains is refused.

Juergen
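[A sketch of the refusal policy Juergen describes; the structure and field names here are assumptions for illustration and are not taken from the patch.]

    /* Illustrative only: refuse to offline a cpu if it is the last one in a
     * pool that still has domains assigned.  Names are invented. */
    static int cpupool_cpu_offline_allowed(struct cpupool *c)
    {
        if ( c->n_cpus == 1 && c->n_domains > 0 )
            return -EBUSY;      /* last cpu of a pool with active domains */
        return 0;
    }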
Keir Fraser wrote:
> On 30/07/2009 06:46, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>> Another alternative might be to create a 'hypervisor thread', either dynamically, or a per-cpu worker thread, and do the work in that. Of course that has its own complexities, and these threads would also have their own interactions with cpu pools to keep them pinned on the appropriate physical cpu. I don't know whether this would really work out simpler.
>> There should be an easy solution for this: what you are suggesting here sounds like a "hypervisor domain" similar to the idle domain, but with high priority and normally all vcpus blocked.
>>
>> The interactions of this domain with cpupools would be the same as for the idle domain.
>>
>> I think this approach could be attractive, but the question is whether the pros outweigh the cons. OTOH such a domain could open interesting opportunities.
>
> I think especially if cpupools are added into the mix then this becomes more attractive than the current approach. The other alternative is to modify the two existing problematic callers to work okay from softirq context (or not need continue_hypercall_on_cpu() at all, which might be possible at least in the case of CPU hotplug). I would be undecided between these two just now -- it depends on how easily those two callers can be fixed up.

I'll try to set up a patch to add a hypervisor domain. Given all the problems I had with switching cpus between pools (avoiding running on the cpu to be switched, etc.), this solution could make life much easier.

And George would be happy to see all the borrow-cpu stuff vanish :-)

Juergen
On 30/07/2009 13:51, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>> I think especially if cpupools are added into the mix then this becomes more attractive than the current approach. The other alternative is to modify the two existing problematic callers to work okay from softirq context (or not need continue_hypercall_on_cpu() at all, which might be possible at least in the case of CPU hotplug). I would be undecided between these two just now -- it depends on how easily those two callers can be fixed up.
>
> I'll try to set up a patch to add a hypervisor domain. Given all the problems I had with switching cpus between pools (avoiding running on the cpu to be switched, etc.), this solution could make life much easier.

I'm inclined actually to think a hypervisor domain is not necessary, and we can get by with softirqs. I actually think cpu offline can be reimplemented without softirqs or continue_hypercall_on_cpu(), and I would imagine cpupool changes then could use a similar technique. I will take a look at that, and you can take your cues from it if I find an elegant solution along those lines.

> And George would be happy to see all the borrow-cpu stuff vanish :-)

Yes, well I think we can get rid of that, regardless of a decision regarding hypervisor domains. And we get rid of vcpu_lock_affinity too, which is nice.

 -- Keir
Keir Fraser wrote:
> On 30/07/2009 13:51, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>> I think especially if cpupools are added into the mix then this becomes more attractive than the current approach. The other alternative is to modify the two existing problematic callers to work okay from softirq context (or not need continue_hypercall_on_cpu() at all, which might be possible at least in the case of CPU hotplug). I would be undecided between these two just now -- it depends on how easily those two callers can be fixed up.
>> I'll try to set up a patch to add a hypervisor domain. Given all the problems I had with switching cpus between pools (avoiding running on the cpu to be switched, etc.), this solution could make life much easier.
>
> I'm inclined actually to think a hypervisor domain is not necessary, and we can get by with softirqs. I actually think cpu offline can be reimplemented without softirqs or continue_hypercall_on_cpu(), and I would imagine cpupool changes then could use a similar technique. I will take a look at that, and you can take your cues from it if I find an elegant solution along those lines.

Thanks, that's great!

Juergen