Attached is a proposal authored by Bryan Rosenburg, Orran Krieger and
Ryan Harper. Comments, questions, and criticism requested.

Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
> Attached is a proposal authored by Bryan Rosenburg, Orran
> Krieger and Ryan Harper. Comments, questions, and criticism
> requested.

Ryan,

Much of what you're proposing closely matches our own plans: it's always
better that a domain have the minimum number of VCPUs active that are
required to meet its CPU load, and gang scheduling is clearly preferred
where possible.

However, I'm convinced that pre-emption notifications are not the way to
go: kernels typically have no way to back out of holding a lock early,
so giving them an active call-back is not very useful.

I think it's better to have a counter that the VCPU increments whenever
it grabs a lock and decrements when it releases a lock. When the
pre-emption timer goes off, the hypervisor can check the counter. If it's
non-zero, the hypervisor can choose to hold off the preemption for e.g.
50us. It can also set a bit in another word indicating that a
pre-emption is pending. Whenever the '#locks held' counter is
decremented to zero, the pre-emption pending bit can be checked, and the
VCPU should immediately yield if it is set.

An alternative/complementary scheme would be to have each lock able to
store the number of the VCPU that's holding it. If a VCPU finds that a
lock is already taken, it can look in the shared info page to see if the
VCPU that's holding the lock is actually running. If it's not, it can
issue a hypervisor_yield_to_VCPU X hypercall and avoid further spinning,
passing its time slice to the VCPU holding the lock.

Anyhow, good stuff to discuss next week.

Best,
Ian
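[Editor's note: for concreteness, a minimal sketch of the counter scheme
Ian describes, assuming hypothetical per-VCPU shared-info fields
(lock_depth, preempt_pending) and a hypervisor_yield() hypercall wrapper;
none of these names come from the actual Xen interface.]

    /* Hypothetical per-VCPU fields in the shared info page. */
    struct vcpu_preempt_info {
        volatile unsigned int lock_depth;      /* # of spinlocks currently held */
        volatile unsigned int preempt_pending; /* set by hypervisor at timer expiry */
    };

    extern struct vcpu_preempt_info *this_vcpu;  /* assumed per-VCPU mapping */
    extern void hypervisor_yield(void);          /* assumed yield hypercall wrapper */

    /* Guest side: bracket lock acquire/release with the counter updates.
     * The fields are per-VCPU, so plain increments suffice. */
    static inline void preempt_aware_lock_acquired(void)
    {
        this_vcpu->lock_depth++;
    }

    static inline void preempt_aware_lock_released(void)
    {
        if (--this_vcpu->lock_depth == 0 && this_vcpu->preempt_pending) {
            this_vcpu->preempt_pending = 0;
            hypervisor_yield();      /* the deferred preemption happens here */
        }
    }

    /* Hypervisor side, as pseudocode: at the preemption timer, either
     * preempt immediately or defer for ~50us and flag the pending preemption.
     *
     *     if (vcpu->lock_depth == 0)
     *         preempt_now(vcpu);
     *     else {
     *         vcpu->preempt_pending = 1;
     *         extend_slice(vcpu, 50us);
     *     }
     */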
* Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> [2005-04-01 18:55]:
> > Attached is a proposal authored by Bryan Rosenburg, Orran
> > Krieger and Ryan Harper. Comments, questions, and criticism
> > requested.
>
> Ryan,
>
> Much of what you're proposing closely matches our own plans: it's always
> better that a domain have the minimum number of VCPUs active that are
> required to meet its CPU load, and gang scheduling is clearly preferred
> where possible.

That sounds good.

> However, I'm convinced that pre-emption notifications are not the way to
> go: kernels typically have no way to back out of holding a lock early,
> so giving them an active call-back is not very useful.

With a notification method using interrupts, the kernel is informing the
hypervisor when it is safe to preempt. That is, the interrupt is serviced
only when no locks are being held, which is ideal for avoiding preemption
of a lock-holder. If the kernel does not yield in time, then we are no
worse off than preemption with no notification w.r.t. preempting
lock-holders. The notification also allows the kernel to prepare for
preemption, for example by migrating applications to other cpus that are
not being preempted.

> I think it's better to have a counter that the VCPU increments whenever
> it grabs a lock and decrements when it releases a lock. When the
> pre-emption timer goes off, the hypervisor can check the counter. If it's
> non-zero, the hypervisor can choose to hold off the preemption for e.g.
> 50us. It can also set a bit in another word indicating that a
> pre-emption is pending. Whenever the '#locks held' counter is
> decremented to zero, the pre-emption pending bit can be checked, and the
> VCPU should immediately yield if it is set.

One of our concerns was the accounting overhead incurred during each
spinlock acquisition and release. Linux acquires and releases spinlocks
at an incredible rate. Rather than affect the fast path of the spinlock
code, in our proposal we only pay when we need to preempt.

> An alternative/complementary scheme would be to have each lock able to
> store the number of the VCPU that's holding it. If a VCPU finds that a
> lock is already taken, it can look in the shared info page to see if the
> VCPU that's holding the lock is actually running. If it's not, it can
> issue a hypervisor_yield_to_VCPU X hypercall and avoid further spinning,
> passing its time slice to the VCPU holding the lock.

The directed yield is complementary to any of the schemes discussed here,
as it helps out when lock-holder preemption actually occurs. This is the
current method employed by the IBM production hypervisor. You can see the
Linux/Power implementation in arch/ppc64/lib/locks.h.

Thanks for the comments. I look forward to further discussion.

Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
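[Editor's note: the directed-yield slow path Ryan refers to can be
sketched roughly as below. This is an illustration of the idea, not the
actual arch/ppc64/lib/locks.h code; the lock layout, the
vcpu_is_running() shared-info lookup, and the hypervisor_yield_to_vcpu()
hypercall wrapper are all assumed names taken from the discussion above.]

    /* Hypothetical lock layout: the holder records its VCPU id. */
    struct vlock {
        volatile int locked;        /* 0 = free, 1 = held */
        volatile int holder_vcpu;   /* VCPU id of the current holder, -1 if none */
    };

    extern int this_vcpu_id(void);                   /* assumed helper */
    extern int vcpu_is_running(int vcpu);            /* assumed shared-info lookup */
    extern void hypervisor_yield_to_vcpu(int vcpu);  /* proposed directed-yield hypercall */

    static void vlock_acquire(struct vlock *lk)
    {
        while (__sync_lock_test_and_set(&lk->locked, 1)) {
            int holder = lk->holder_vcpu;
            /* If the holder has been preempted, donate our time slice to
             * it instead of spinning uselessly. */
            if (holder >= 0 && !vcpu_is_running(holder))
                hypervisor_yield_to_vcpu(holder);
            /* otherwise keep spinning briefly */
        }
        lk->holder_vcpu = this_vcpu_id();
    }

    static void vlock_release(struct vlock *lk)
    {
        lk->holder_vcpu = -1;
        __sync_lock_release(&lk->locked);
    }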
Ian Pratt writes:
> However, I'm convinced that pre-emption notifications are not the way to
> go: kernels typically have no way to back out of holding a lock early,
> so giving them an active call-back is not very useful.

No one is proposing that kernels back out of holding locks early. On
receiving the notification, the kernel is expected to yield the processor
when it next reaches a lock-free state. Scheduling a thread to do the
yield accomplishes that in a very clean manner.

> I think it's better to have a counter that the VCPU increments whenever
> it grabs a lock and decrements when it releases a lock. When the
> pre-emption timer goes off, the hypervisor can check the counter. If it's
> non-zero, the hypervisor can choose to hold off the preemption for e.g.
> 50us. It can also set a bit in another word indicating that a
> pre-emption is pending. Whenever the '#locks held' counter is
> decremented to zero, the pre-emption pending bit can be checked, and the
> VCPU should immediately yield if it is set.

This is in fact the mechanism described in Uhlig et al. Its main drawback
is that it does nothing to address the problem of user-level lock-holder
preemption. The proposed notification scheme is a single, clean mechanism
that lets a kernel avoid untimely preemption of both user- and
kernel-level lock holders.

- Bryan
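[Editor's note: one possible reading of "scheduling a thread to do the
yield", sketched with hypothetical names in 2.6-era Linux style: the
notification interrupt only wakes a kernel thread, and the yield happens
when the scheduler runs that thread, which is by construction a point
where no spinlocks are held on that CPU. This is a sketch under those
assumptions, not code from the proposal.]

    #include <linux/interrupt.h>
    #include <linux/kthread.h>
    #include <linux/sched.h>
    #include <linux/wait.h>

    static DECLARE_WAIT_QUEUE_HEAD(preempt_wq);
    static int preempt_requested;            /* set by the notification IRQ */

    extern void hypervisor_yield(void);      /* assumed yield hypercall wrapper */

    /* Handler for the (proposed) preemption-notification interrupt. */
    static irqreturn_t preempt_notify_irq(int irq, void *dev_id,
                                          struct pt_regs *regs)
    {
        preempt_requested = 1;
        wake_up(&preempt_wq);
        return IRQ_HANDLED;
    }

    /* Per-CPU kernel thread that performs the deferred yield. */
    static int preempt_yield_thread(void *unused)
    {
        while (!kthread_should_stop()) {
            wait_event_interruptible(preempt_wq, preempt_requested);
            preempt_requested = 0;
            hypervisor_yield();   /* safe: no spinlocks held when scheduled */
        }
        return 0;
    }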
On Fri, 1 Apr 2005, Bryan S Rosenburg wrote:
> This is in fact the mechanism described in Uhlig et al. Its main drawback
> is that it does nothing to address the problem of user-level lock-holder
> preemption.

I'm not sure we need that, futexes seem to take care of the
user-level problem pretty well.

Hmmmm, that makes me wonder if we should use something like
futexes (but simpler, since a virtual machine is one address
space) for xenolinux spinlocks...

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
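[Editor's note: a very rough sketch of what a futex-like xenolinux
spinlock might look like: spin briefly, then ask the hypervisor to block
this VCPU on the lock word, and have the releaser kick waiters. The
hypervisor_block_on()/hypervisor_kick() calls are purely hypothetical (no
such hypercalls exist); the locking pattern is the standard simplified
futex mutex.]

    /* Futex-style guest spinlock sketch, hypothetical hypervisor interface. */
    struct vfutex_lock {
        volatile int state;   /* 0 = free, 1 = held, 2 = held with waiters */
    };

    extern void hypervisor_block_on(volatile int *addr, int expected); /* hypothetical */
    extern void hypervisor_kick(volatile int *addr);                   /* hypothetical */

    static void vfutex_lock_acquire(struct vfutex_lock *lk)
    {
        int i;

        /* Short spin first: most kernel critical sections are tiny. */
        for (i = 0; i < 100; i++)
            if (__sync_bool_compare_and_swap(&lk->state, 0, 1))
                return;

        /* Contended: mark "has waiters" and block in the hypervisor until
         * the lock word is kicked, analogous to FUTEX_WAIT. */
        while (__sync_lock_test_and_set(&lk->state, 2) != 0)
            hypervisor_block_on(&lk->state, 2);
    }

    static void vfutex_lock_release(struct vfutex_lock *lk)
    {
        /* Fast path: nobody waiting. */
        if (__sync_val_compare_and_swap(&lk->state, 1, 0) == 1)
            return;
        /* Slow path: waiters present; clear and wake one, like FUTEX_WAKE. */
        lk->state = 0;
        __sync_synchronize();
        hypervisor_kick(&lk->state);
    }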
On Sat, 2 Apr 2005, Rik van Riel wrote:
> On Fri, 1 Apr 2005, Bryan S Rosenburg wrote:
>
> > This is in fact the mechanism described in Uhlig et al. Its main
> > drawback is that it does nothing to address the problem of user-level
> > lock-holder preemption.
>
> I'm not sure we need that, futexes seem to take care of the
> user-level problem pretty well.

I understand how futexes help with the problem of preempting user-level
lock holders when Linux is running natively, but virtualization
complicates the story. If a user-level thread owns a lock when the
hypervisor preempts the virtual processor on which it is running, the
state of that thread is buried in the hypervisor and is not available to
the Linux kernel. Even if other threads that try to acquire the lock drop
into the kernel, there's nothing that the kernel running on other
processors can do to get the lock-holder running.

One motivation for the preemption notification mechanism we proposed is
that for the most part it avoids suspending virtual processors running
user-level code. User-level threads are always suspended in Linux rather
than in the hypervisor, so they're available to be run on other virtual
processors. I would say that preemption notification is needed in order
to keep futexes working well as a solution to the user-level lock
problem.

- Bryan