Ian Pratt
2005-Jun-08 18:25 UTC
RE: [Xen-devel] [PATCH] Yield to VCPU hcall, spinlock yielding
> The key point is that with kernel-level preemption notification,
> VCPUs are always in kernel mode when suspended, never in user mode.
> Application state is always saved in Linux, not in Xen, and is
> available to be resumed on another VCPU if Linux so chooses.

In principle, but...

Do you believe this is going to interact well with Linux's work stealing CPU migration? I haven't looked closely at the current code, but from Linux's scheduler's POV the de-scheduled (yielded) CPU looks like a perfectly healthy CPU, so there's no particular reason that another CPU would steal work from it (without hacking the algorithm, which I suppose we could do).

Also, do you have to do something special in your yield routine to ensure that no real process is currently running on the yielded processor so that all processes on the run queue are available for stealing?

Ian
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Bryan S Rosenburg
2005-Jun-08 18:40 UTC
RE: [Xen-devel] [PATCH] Yield to VCPU hcall, spinlock yielding
"Ian Pratt" <m+Ian.Pratt@cl.cam.ac.uk> wrote on 06/08/2005 02:25:56 PM:> > The key point is that with > > kernel-level preemption notification, VCPUs are always in > > kernel mode when suspended, never in user mode. Application > > state is always saved in Linux, not in Xen, and is available > > to be resumed on another VCPU if Linux so chooses. > > In principle, but... > > Do you believe this is going to interact well with Linux''s work stealing > CPU migration? I haven''t looked closely at the current code, but from > Linux''s scheduler''s POV the de-scheduled (yielded) CPU looks like a > perfectly healthy CPU, so there''s no particular reason that another CPU > would steal work from it (without hacking the algorithm, which I suppose > we could do). Also, do you have to do something special in your yield > routine to ensure that no real process is currently running on the > yielded processor so that all processes on the run queue are available > for stealing? > > IanIn our original posting, we proposed that the Linux interrupt handler for preemption notifications would create (or unblock) a high-priority kernel thread which would then yield back to the hypervisor. To Linux on other CPUs, the de-scheduled CPU would appear to be busy running the high-priority thread, and all real work that that CPU had been doing would be eligible for stealing. I don''t think that Ryan has yet implemented the high-priority thread part of the proposal, but that''s always been part of the plan. - Bryan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andrew Theurer
2005-Jun-08 19:11 UTC
Re: [Xen-devel] [PATCH] Yield to VCPU hcall, spinlock yielding
On Wednesday 08 June 2005 13:40, Bryan S Rosenburg wrote:

> "Ian Pratt" <m+Ian.Pratt@cl.cam.ac.uk> wrote on 06/08/2005 02:25:56 PM:
> > > The key point is that with kernel-level preemption notification,
> > > VCPUs are always in kernel mode when suspended, never in user mode.
> > > Application state is always saved in Linux, not in Xen, and is
> > > available to be resumed on another VCPU if Linux so chooses.
> >
> > In principle, but...
> >
> > Do you believe this is going to interact well with Linux's work
> > stealing CPU migration? I haven't looked closely at the current
> > code, but from Linux's scheduler's POV the de-scheduled (yielded)
> > CPU looks like a perfectly healthy CPU, so there's no particular
> > reason that another CPU would steal work from it (without hacking
> > the algorithm, which I suppose we could do). Also, do you have to
> > do something special in your yield routine to ensure that no real
> > process is currently running on the yielded processor so that all
> > processes on the run queue are available for stealing?
> >
> > Ian
>
> In our original posting, we proposed that the Linux interrupt handler
> for preemption notifications would create (or unblock) a
> high-priority kernel thread which would then yield back to the
> hypervisor. To Linux on other CPUs, the de-scheduled CPU would
> appear to be busy running the high-priority thread, and all real work
> that that CPU had been doing would be eligible for stealing.

IMO, I don't think this alone is enough to encourage task migration. The primary motivator to steal is a 25% or more load imbalance, and one extra fake kernel thread will probably not be enough to trigger this.

To solve this and other issues, I believe we need an extra modifier to the Linux kernel cpus' load value, which Xen could modify to hint the kernel what cpus' relative processing power is. The Linux kernel scheduler's per cpu load values would be something like (max_cpu_power / cpu_power * nr_running). Xen could update cpu_power for a number of situations, a "long" preemption, a much faster alternative to a vcpu hot-unplug (don't unplug, just set cpu_power to 0), and to normalize load values for vcpus which have different time-slice lengths on the physical cpus.

I would hope something like this could also be used without Xen on Linux so it has wider appeal. One thing that comes to mind is normalizing cpus' load when some cpus may be "speed stepped" down for power management or heat issues.

-Andrew
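The load formula suggested above, (max_cpu_power / cpu_power * nr_running), can be tried out in a few lines. This is a rough sketch, not the Linux scheduler's code: the function names, the power scale of 1024, and the treatment of cpu_power == 0 as an infinite load are assumptions made for illustration, and the 25% test is only one plausible reading of "a 25% or more load imbalance".

```python
import math


def weighted_load(nr_running: int, cpu_power: float, max_cpu_power: float) -> float:
    """Per-CPU load from the message: max_cpu_power / cpu_power * nr_running."""
    if nr_running == 0:
        return 0.0
    if cpu_power <= 0:
        # Assumption: cpu_power == 0 ("don't unplug, just set cpu_power to 0")
        # is modelled as an infinite load so that all work looks worth moving.
        return math.inf
    return max_cpu_power / cpu_power * nr_running


def should_steal(local_load: float, remote_load: float, threshold: float = 0.25) -> bool:
    """True when the remote CPU carries at least `threshold` more load than the local one."""
    if remote_load == 0:
        return False
    return remote_load >= local_load * (1.0 + threshold)


MAX_POWER = 1024  # hypothetical scale for a VCPU that is never preempted

local = weighted_load(nr_running=4, cpu_power=1024, max_cpu_power=MAX_POWER)
same = weighted_load(nr_running=4, cpu_power=1024, max_cpu_power=MAX_POWER)
halved = weighted_load(nr_running=4, cpu_power=512, max_cpu_power=MAX_POWER)
parked = weighted_load(nr_running=4, cpu_power=0, max_cpu_power=MAX_POWER)

print(should_steal(local, same))     # False: equal task counts, equal power
print(should_steal(local, halved))   # True: same task count, half the power
print(should_steal(local, parked))   # True: cpu_power of 0 sheds all work
```

With equal task counts, a run-queue length comparison alone sees no imbalance; scaling by cpu_power is what makes the preempted or parked VCPU look overloaded to its peers.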
Ryan Harper
2005-Jun-08 19:17 UTC
Re: [Xen-devel] [PATCH] Yield to VCPU hcall, spinlock yielding
* Bryan Rosenburg <rosnbrg@us.ibm.com> [2005-06-08 13:40]:

> "Ian Pratt" <m+Ian.Pratt@cl.cam.ac.uk> wrote on 06/08/2005 02:25:56 PM:
>
> > > The key point is that with kernel-level preemption notification,
> > > VCPUs are always in kernel mode when suspended, never in user mode.
> > > Application state is always saved in Linux, not in Xen, and is
> > > available to be resumed on another VCPU if Linux so chooses.
> >
> > In principle, but...
> >
> > Do you believe this is going to interact well with Linux's work stealing
> > CPU migration? I haven't looked closely at the current code, but from
> > Linux's scheduler's POV the de-scheduled (yielded) CPU looks like a
> > perfectly healthy CPU, so there's no particular reason that another CPU
> > would steal work from it (without hacking the algorithm, which I suppose
> > we could do). Also, do you have to do something special in your yield
> > routine to ensure that no real process is currently running on the
> > yielded processor so that all processes on the run queue are available
> > for stealing?
> >
> > Ian
>
> In our original posting, we proposed that the Linux interrupt handler for
> preemption notifications would create (or unblock) a high-priority kernel
> thread which would then yield back to the hypervisor. To Linux on other
> CPUs, the de-scheduled CPU would appear to be busy running the
> high-priority thread, and all real work that that CPU had been doing would
> be eligible for stealing.
>
> I don't think that Ryan has yet implemented the high-priority thread part
> of the proposal, but that's always been part of the plan.

That is correct. I attempted to schedule some work on a particular cpu in the interrupt handler but I couldn't quite get it working. Currently, we are yielding in the interrupt handler which isn't what we proposed, but I had hoped that it was close enough to see the general effectiveness of the approach.

--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
Bryan S Rosenburg
2005-Jun-08 20:49 UTC
Re: [Xen-devel] [PATCH] Yield to VCPU hcall, spinlock yielding
habanero@us.ltcfwd.linux.ibm.com wrote on 06/08/2005 03:11:06 PM:

> > In our original posting, we proposed that the Linux interrupt handler
> > for preemption notifications would create (or unblock) a
> > high-priority kernel thread which would then yield back to the
> > hypervisor. To Linux on other CPUs, the de-scheduled CPU would
> > appear to be busy running the high-priority thread, and all real work
> > that that CPU had been doing would be eligible for stealing.
>
> IMO, I don't think this alone is enough to encourage task migration.
> The primary motivator to steal is a 25% or more load imbalance, and one
> extra fake kernel thread will probably not be enough to trigger this.

The kernel thread is needed at the very least to ensure that all user programs on the de-scheduled CPU are available for migration. In an important case, a program on the de-scheduled CPU holds a futex, and another CPU goes idle because its program blocks on the futex. We'd want the idle CPU to pick up the futex holder, and I'm assuming (with very little actual knowledge) that the Linux scheduler would make that happen.

> To solve this and other issues, I believe we need an extra modifier to
> the Linux kernel cpus' load value, which Xen could modify to hint the
> kernel what cpus' relative processing power is. The Linux kernel
> scheduler's per cpu load values would be something like (max_cpu_power
> / cpu_power * nr_running). Xen could update cpu_power for a number of
> situations, a "long" preemption, a much faster alternative to a vcpu
> hot-unplug (don't unplug, just set cpu_power to 0), and to normalize
> load values for vcpus which have different time-slice lengths on the
> physical cpus.
>
> I would hope something like this could also be used without Xen on Linux
> so it has wider appeal. One thing that comes to mind is normalizing
> cpus' load when some cpus may be "speed stepped" down for power
> management or heat issues.
>
> -Andrew

I'd view your "cpu_power" proposal as orthogonal to (or perhaps complementary to) our ideas on preemption notification. It's aimed more at load-balancing and fair scheduling than specifically at the problems that arise with the preemption of lock holders.

On the apparent CPU speed issue, does Linux account in any way for different interrupt loads on different processors? Is a program just out of luck if it happens to get scheduled on a processor with heavy interrupt traffic, or will Linux notice that it's not making the same progress as its peers and shuffle things around? It seems that your cpu_power proposal might have something to contribute here.

- Bryan