Attached is a proposal authored by Bryan Rosenburg, Orran Krieger and
Ryan Harper. Comments, questions, and criticism requested.

Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
> Attached is a proposal authored by Bryan Rosenburg, Orran
> Krieger and Ryan Harper. Comments, questions, and criticism
> requested.

Ryan,

Much of what you're proposing closely matches our own plans: it's always
better that a domain have the minimum number of VCPUs active that are
required to meet its CPU load, and gang scheduling is clearly preferred
where possible.

However, I'm convinced that pre-emption notifications are not the way to
go: kernels typically have no way to back out of holding a lock early,
so giving them an active call-back is not very useful.

I think it's better to have a counter that the VCPU increments whenever
it grabs a lock and decrements when it releases a lock. When the
pre-emption timer goes off, the hypervisor can check the counter. If it's
non-zero, the hypervisor can choose to hold off the preemption for e.g.
50us. It can also set a bit in another word indicating that a
pre-emption is pending. Whenever the '#locks held' counter is
decremented to zero, the pre-emption pending bit can be checked, and the
VCPU should immediately yield if it is set.

An alternative/complementary scheme would be to have each lock able to
store the number of the VCPU that's holding it. If a VCPU finds that a
lock is already taken, it can look in the shared info page to see if the
VCPU that's holding the lock is actually running. If it's not, it can
issue a hypervisor_yield_to_VCPU X hypercall and avoid further spinning,
passing its time slice to the VCPU holding the lock.

Anyhow, good stuff to discuss next week.

Best,
Ian
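[Editor's note: for concreteness, a minimal sketch of the counter scheme
Ian describes, assuming hypothetical per-VCPU shared-info fields
(lock_depth, preempt_pending) and a hypervisor_yield() hypercall wrapper;
none of these names come from the actual Xen interface.]

    /* Hypothetical per-VCPU fields in the shared info page. */
    struct vcpu_preempt_info {
        volatile unsigned int lock_depth;      /* # of spinlocks currently held */
        volatile unsigned int preempt_pending; /* set by hypervisor at timer expiry */
    };

    extern struct vcpu_preempt_info *this_vcpu;  /* assumed per-VCPU mapping */
    extern void hypervisor_yield(void);          /* assumed yield hypercall wrapper */

    /* Guest side: bracket lock acquire/release with the counter updates.
     * The fields are per-VCPU, so plain increments suffice. */
    static inline void preempt_aware_lock_acquired(void)
    {
        this_vcpu->lock_depth++;
    }

    static inline void preempt_aware_lock_released(void)
    {
        if (--this_vcpu->lock_depth == 0 && this_vcpu->preempt_pending) {
            this_vcpu->preempt_pending = 0;
            hypervisor_yield();      /* the deferred preemption happens here */
        }
    }

    /* Hypervisor side, as pseudocode: at the preemption timer, either
     * preempt immediately or defer for ~50us and flag the pending preemption.
     *
     *     if (vcpu->lock_depth == 0)
     *         preempt_now(vcpu);
     *     else {
     *         vcpu->preempt_pending = 1;
     *         extend_slice(vcpu, 50us);
     *     }
     */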
* Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> [2005-04-01 18:55]:
> > Attached is a proposal authored by Bryan Rosenburg, Orran
> > Krieger and Ryan Harper. Comments, questions, and criticism
> > requested.
>
> Ryan,
>
> Much of what you're proposing closely matches our own plans: it's always
> better that a domain have the minimum number of VCPUs active that are
> required to meet its CPU load, and gang scheduling is clearly preferred
> where possible.

That sounds good.

> However, I'm convinced that pre-emption notifications are not the way to
> go: kernels typically have no way to back out of holding a lock early,
> so giving them an active call-back is not very useful.

With a notification method using interrupts, the kernel is informing the
hypervisor when it is safe to preempt. That is, the interrupt is serviced
only when no locks are being held, which is ideal for avoiding preemption
of a lock-holder. If the kernel does not yield in time, then we are no
worse off than preemption with no notification w.r.t. preempting
lock-holders. The notification also allows the kernel to prepare for
preemption, for example by migrating applications to other cpus that are
not being preempted.

> I think it's better to have a counter that the VCPU increments whenever
> it grabs a lock and decrements when it releases a lock. When the
> pre-emption timer goes off, the hypervisor can check the counter. If it's
> non-zero, the hypervisor can choose to hold off the preemption for e.g.
> 50us. It can also set a bit in another word indicating that a
> pre-emption is pending. Whenever the '#locks held' counter is
> decremented to zero, the pre-emption pending bit can be checked, and the
> VCPU should immediately yield if it is set.

One of our concerns was the accounting overhead incurred during each
spinlock acquisition and release. Linux acquires and releases spinlocks
at an incredible rate. Rather than affect the fast path of the spinlock
code, in our proposal we only pay when we need to preempt.

> An alternative/complementary scheme would be to have each lock able to
> store the number of the VCPU that's holding it. If a VCPU finds that a
> lock is already taken, it can look in the shared info page to see if the
> VCPU that's holding the lock is actually running. If it's not, it can
> issue a hypervisor_yield_to_VCPU X hypercall and avoid further spinning,
> passing its time slice to the VCPU holding the lock.

The directed yield is complementary to any of the schemes discussed here,
as it helps out when lock-holder preemption actually occurs. This is the
current method employed by the IBM production hypervisor. You can see the
Linux/Power implementation in arch/ppc64/lib/locks.h.

Thanks for the comments. I look forward to further discussion.

Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
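[Editor's note: the directed-yield slow path Ryan refers to can be
sketched roughly as below. This is an illustration of the idea, not the
actual arch/ppc64/lib/locks.h code; the lock layout, the
vcpu_is_running() shared-info lookup, and the hypervisor_yield_to_vcpu()
hypercall wrapper are all assumed names taken from the discussion above.]

    /* Hypothetical lock layout: the holder records its VCPU id. */
    struct vlock {
        volatile int locked;        /* 0 = free, 1 = held */
        volatile int holder_vcpu;   /* VCPU id of the current holder, -1 if none */
    };

    extern int this_vcpu_id(void);                   /* assumed helper */
    extern int vcpu_is_running(int vcpu);            /* assumed shared-info lookup */
    extern void hypervisor_yield_to_vcpu(int vcpu);  /* proposed directed-yield hypercall */

    static void vlock_acquire(struct vlock *lk)
    {
        while (__sync_lock_test_and_set(&lk->locked, 1)) {
            int holder = lk->holder_vcpu;
            /* If the holder has been preempted, donate our time slice to
             * it instead of spinning uselessly. */
            if (holder >= 0 && !vcpu_is_running(holder))
                hypervisor_yield_to_vcpu(holder);
            /* otherwise keep spinning briefly */
        }
        lk->holder_vcpu = this_vcpu_id();
    }

    static void vlock_release(struct vlock *lk)
    {
        lk->holder_vcpu = -1;
        __sync_lock_release(&lk->locked);
    }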
Ian Pratt writes:
> However, I'm convinced that pre-emption notifications are not the way to
> go: kernels typically have no way to back out of holding a lock early,
> so giving them an active call-back is not very useful.

No one is proposing that kernels back out of holding locks early. On
receiving the notification, the kernel is expected to yield the processor
when it next reaches a lock-free state. Scheduling a thread to do the
yield accomplishes that in a very clean manner.

> I think it's better to have a counter that the VCPU increments whenever
> it grabs a lock and decrements when it releases a lock. When the
> pre-emption timer goes off, the hypervisor can check the counter. If it's
> non-zero, the hypervisor can choose to hold off the preemption for e.g.
> 50us. It can also set a bit in another word indicating that a
> pre-emption is pending. Whenever the '#locks held' counter is
> decremented to zero, the pre-emption pending bit can be checked, and the
> VCPU should immediately yield if it is set.

This is in fact the mechanism described in Uhlig et al. Its main drawback
is that it does nothing to address the problem of user-level lock-holder
preemption. The proposed notification scheme is a single, clean mechanism
that lets a kernel avoid untimely preemption of both user- and
kernel-level lock holders.

- Bryan
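[Editor's note: one possible reading of "scheduling a thread to do the
yield", sketched with hypothetical names in 2.6-era Linux style: the
notification interrupt only wakes a kernel thread, and the yield happens
when the scheduler runs that thread, which is by construction a point
where no spinlocks are held on that CPU. This is a sketch under those
assumptions, not code from the proposal.]

    #include <linux/interrupt.h>
    #include <linux/kthread.h>
    #include <linux/sched.h>
    #include <linux/wait.h>

    static DECLARE_WAIT_QUEUE_HEAD(preempt_wq);
    static int preempt_requested;            /* set by the notification IRQ */

    extern void hypervisor_yield(void);      /* assumed yield hypercall wrapper */

    /* Handler for the (proposed) preemption-notification interrupt. */
    static irqreturn_t preempt_notify_irq(int irq, void *dev_id,
                                          struct pt_regs *regs)
    {
        preempt_requested = 1;
        wake_up(&preempt_wq);
        return IRQ_HANDLED;
    }

    /* Per-CPU kernel thread that performs the deferred yield. */
    static int preempt_yield_thread(void *unused)
    {
        while (!kthread_should_stop()) {
            wait_event_interruptible(preempt_wq, preempt_requested);
            preempt_requested = 0;
            hypervisor_yield();   /* safe: no spinlocks held when scheduled */
        }
        return 0;
    }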
On Fri, 1 Apr 2005, Bryan S Rosenburg wrote:
> This is in fact the mechanism described in Uhlig et al. Its main drawback
> is that it does nothing to address the problem of user-level lock-holder
> preemption.

I'm not sure we need that, futexes seem to take care of the
user-level problem pretty well.

Hmmmm, that makes me wonder if we should use something like
futexes (but simpler, since a virtual machine is one address
space) for xenolinux spinlocks...

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
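[Editor's note: a very rough sketch of what a futex-like xenolinux
spinlock might look like: spin briefly, then ask the hypervisor to block
this VCPU on the lock word, and have the releaser kick waiters. The
hypervisor_block_on()/hypervisor_kick() calls are purely hypothetical (no
such hypercalls exist); the locking pattern is the standard simplified
futex mutex.]

    /* Futex-style guest spinlock sketch, hypothetical hypervisor interface. */
    struct vfutex_lock {
        volatile int state;   /* 0 = free, 1 = held, 2 = held with waiters */
    };

    extern void hypervisor_block_on(volatile int *addr, int expected); /* hypothetical */
    extern void hypervisor_kick(volatile int *addr);                   /* hypothetical */

    static void vfutex_lock_acquire(struct vfutex_lock *lk)
    {
        int i;

        /* Short spin first: most kernel critical sections are tiny. */
        for (i = 0; i < 100; i++)
            if (__sync_bool_compare_and_swap(&lk->state, 0, 1))
                return;

        /* Contended: mark "has waiters" and block in the hypervisor until
         * the lock word is kicked, analogous to FUTEX_WAIT. */
        while (__sync_lock_test_and_set(&lk->state, 2) != 0)
            hypervisor_block_on(&lk->state, 2);
    }

    static void vfutex_lock_release(struct vfutex_lock *lk)
    {
        /* Fast path: nobody waiting. */
        if (__sync_val_compare_and_swap(&lk->state, 1, 0) == 1)
            return;
        /* Slow path: waiters present; clear and wake one, like FUTEX_WAKE. */
        lk->state = 0;
        __sync_synchronize();
        hypervisor_kick(&lk->state);
    }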
On Sat, 2 Apr 2005, Rik van Riel wrote:
> On Fri, 1 Apr 2005, Bryan S Rosenburg wrote:
>
> > This is in fact the mechanism described in Uhlig et al. Its main
> > drawback is that it does nothing to address the problem of user-level
> > lock-holder preemption.
>
> I'm not sure we need that, futexes seem to take care of the
> user-level problem pretty well.

I understand how futexes help with the problem of preempting user-level
lock holders when Linux is running natively, but virtualization
complicates the story. If a user-level thread owns a lock when the
hypervisor preempts the virtual processor on which it is running, the
state of that thread is buried in the hypervisor and is not available to
the Linux kernel. Even if other threads that try to acquire the lock drop
into the kernel, there's nothing that the kernel running on other
processors can do to get the lock-holder running.

One motivation for the preemption notification mechanism we proposed is
that for the most part it avoids suspending virtual processors running
user-level code. User-level threads are always suspended in Linux rather
than in the hypervisor, so they're available to be run on other virtual
processors. I would say that preemption notification is needed in order
to keep futexes working well as a solution to the user-level lock
problem.

- Bryan