thr3ads.net - Xen devel - [Xen-devel] lazy context switching [Aug 2005]


Hollis Blanchard

2005-Aug-25 21:55 UTC

[Xen-devel] lazy context switching

Hi Keir, I noticed changeset 027812e4a63c, in which you split off 
context_switch_finalise() from context_switch(). I really appreciate the 
comments you added!

/*
 * Called by the scheduler to switch to another VCPU. On entry, although
 * VCPUF_running is no longer asserted for @prev, its context is still running
 * on the local CPU and is not committed to memory. The local scheduler lock
 * is therefore still held, and interrupts are disabled, because the local CPU
 * is in an inconsistent state.
 *
 * The callee must ensure that the local CPU is no longer running in @prev's
 * context, and that the context is saved to memory, before returning.
 * Alternatively, if implementing lazy context switching, it suffices to ensure
 * that invoking __sync_lazy_execstate() will switch and commit @prev's state.
 */
extern void context_switch(
    struct vcpu *prev,
    struct vcpu *next);
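To make the "lazy" alternative in that comment concrete, here is a toy, single-CPU model in C. Everything in it is hypothetical and simplified: `struct vcpu`, `state_owner`, and the helper names are illustrative stand-ins rather than Xen's real definitions, and `sync_lazy_execstate()` merely plays the role of `__sync_lazy_execstate()`. The idea sketched is that switching to an idle VCPU leaves @prev's registers live on the CPU, and the sync call commits them only when someone actually needs them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_REGS 32

/* Toy model, not Xen's structures: one CPU and a "live" register file. */
struct vcpu {
    unsigned long saved_regs[NR_REGS]; /* context committed to memory */
    bool is_idle;                      /* the idle VCPU has no guest state */
};

static unsigned long cpu_regs[NR_REGS]; /* registers currently on the CPU */
static struct vcpu *state_owner;        /* VCPU whose context is live there */

/* Commit the live owner's context to memory, then load next's. */
static void commit_and_load(struct vcpu *next)
{
    assert(state_owner != NULL);
    memcpy(state_owner->saved_regs, cpu_regs, sizeof(cpu_regs));
    memcpy(cpu_regs, next->saved_regs, sizeof(cpu_regs));
    state_owner = next;
}

/* Lazy variant: a switch to the idle VCPU leaves prev's context live. */
void context_switch(struct vcpu *prev, struct vcpu *next)
{
    (void)prev;               /* the live owner is tracked in state_owner */
    if (next->is_idle)
        return;               /* defer: nothing is committed yet */
    if (state_owner != next)  /* switching back to the owner costs nothing */
        commit_and_load(next);
}

/* Stand-in for __sync_lazy_execstate(): force any deferred switch so the
 * previous owner's context is in memory. Returns true if work was done. */
bool sync_lazy_execstate(struct vcpu *current)
{
    if (state_owner == current)
        return false;
    commit_and_load(current);
    return true;
}
```

For example, after context_switch(&a, &idle) the CPU still holds a's registers; a later sync_lazy_execstate(&idle) copies them into a.saved_regs. Switching from idle straight back to a would skip both copies, which is the point of being lazy.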

PowerPC has a relatively large set of (general-purpose) registers; roughly half
are volatile and the rest are not. When we take an exception, we do not save the
nonvolatiles in the exception handler, since we may be returning to the same
domain anyway, and in that case the C calling convention ensures that the
nonvolatiles are restored before we get back to the handler.

Later on, if it turns out we are switching domains, we save/restore all the 
state we can, then return to the exception handler which saves the old set of 
nonvolatiles and loads the new one. Until that point, some domain state is 
spread arbitrarily across our stack.

That means that context_switch() cannot actually save all of @prev's state to
memory (and neither can __sync_lazy_execstate()) -- only by returning all the
way to assembly can we accomplish that.

Thoughts?

-- 
Hollis Blanchard
IBM Linux Technology Center

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2005-Aug-26 09:37 UTC


[Xen-devel] Re: lazy context switching

On 25 Aug 2005, at 22:55, Hollis Blanchard wrote:
> Later on, if it turns out we are switching domains, we save/restore all
> the state we can, then return to the exception handler which saves the
> old set of nonvolatiles and loads the new one. Until that point, some
> domain state is spread arbitrarily across our stack.
>
> That means that context_switch() cannot actually save all of @prev's
> state to memory (and neither can __sync_lazy_execstate()) -- only by
> returning all the way to assembly can we accomplish that.
>
> Thoughts?
What you need is a synchronisation point, visible to other CPUs, beyond 
which things like DOM0_GETVCPUCONTEXT can be sure to read consistent 
current state for the descheduled vcpu. See domain_sleep_sync() for the 
current way we ensure that state is committed to memory.

If you have a lot of register state, have you considered maintaining a 
Xen stack per VCPU? The context-switch interface already supports this, 
for ia64.

  -- Keir



Hollis Blanchard

2005-Aug-26 16:38 UTC


[Xen-devel] Re: lazy context switching

On Aug 26, 2005, at 4:37 AM, Keir Fraser wrote:
> On 25 Aug 2005, at 22:55, Hollis Blanchard wrote:
>
>> Later on, if it turns out we are switching domains, we save/restore all
>> the state we can, then return to the exception handler which saves the
>> old set of nonvolatiles and loads the new one. Until that point, some
>> domain state is spread arbitrarily across our stack.
>>
>> That means that context_switch() cannot actually save all of @prev's
>> state to memory (and neither can __sync_lazy_execstate()) -- only by
>> returning all the way to assembly can we accomplish that.
>>
>> Thoughts?
>
> What you need is a synchronisation point, visible to other CPUs, 
> beyond which things like DOM0_GETVCPUCONTEXT can be sure to read 
> consistent current state for the descheduled vcpu. See 
> domain_sleep_sync() for the current way we ensure that state is 
> committed to memory.
Hmmmmm. I think the basic problem is that in the exception handler we
don't usually know we will need this state. The one exception is the
debug exception, where we know we will need it for the GDB stub.

However, we also have a hypervisor-dedicated timer, HDEC (hypervisor
decrementer). Rather than using it as a plain tick which may or may not
cause a scheduler exception, we can use it to *always* mean a context
switch. In that case, we would always save the full state on HDEC
entry, because we know it will always cause a context switch. Judging
by set_ac_timer() callers, it seems that only the scheduler really uses
the Xen timer tick. If non-scheduler components start using
Xen-internal ticks, this approach wouldn't hold up (or rather, it would
start becoming less efficient).
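The entry-side policy proposed here could be sketched as follows. This is an assumption-laden illustration, not code from the thread: the exception kinds, `struct exc_frame`, and `needs_full_save()` are invented, and the register split is the r0-r13/r14-r31 one discussed in this thread. HDEC (always a switch, under this proposal) and the debug exception (needed by the GDB stub) pay for a full save; everything else relies on the C ABI to preserve r14-r31.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_GPRS      32
#define FIRST_NONVOL 14  /* assumption: r14-r31 are callee-saved */

enum exc_kind { EXC_HDEC, EXC_HCALL, EXC_DEBUG };

struct exc_frame {
    unsigned long gpr[NR_GPRS];
    bool full;                  /* were r14-r31 saved as well? */
};

static unsigned long live[NR_GPRS]; /* simulated register file */

/* Hypothetical policy: HDEC always means a context switch, and the GDB
 * stub needs full state on a debug exception. */
static bool needs_full_save(enum exc_kind kind)
{
    return kind == EXC_HDEC || kind == EXC_DEBUG;
}

void exception_entry(enum exc_kind kind, struct exc_frame *f)
{
    assert(f != NULL);
    f->full = needs_full_save(kind);
    size_t n = f->full ? NR_GPRS : FIRST_NONVOL;
    memcpy(f->gpr, live, n * sizeof(live[0]));
}
```

The cost of this policy shows up in the EXC_HCALL case: a hypercall that ends up switching (a "yield", say) arrives with only r0-r13 saved.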

Would that also work for DOM0_GETVCPUCONTEXT? Let's assume the dom0
vcpu and the target vcpu are running on separate dedicated processors.
In that case, dom0 could wait for the target vcpu to take an HDEC at
some point in the future, but if it really is a dedicated vcpu then we
would want the scheduling interval to be the maximum, so that could be a
long time. Another option is to have vcpu_pause() reset the HDEC of the
target vcpu's processor via an IPI, which would cause a fake scheduler
HDEC to go off, synchronizing the target vcpu's state.

What do you think?
> If you have a lot of register state, have you considered maintaining a 
> Xen stack per VCPU? The context-switch interface already supports 
> this, for ia64.
We have plenty of space on the per-CPU stack for the register state (we
use it anyway on a debug exception for the GDB stub). And even if we
had one stack per VCPU, we would still want to avoid unnecessarily
saving/restoring the nonvolatiles...

-- 
Hollis Blanchard
IBM Linux Technology Center


Keir Fraser

2005-Aug-26 17:14 UTC


[Xen-devel] Re: lazy context switching

On 26 Aug 2005, at 17:38, Hollis Blanchard wrote:
> Hmmmmm. I think the basic problem is that in the exception handler we
> don't usually know we will need this state. The one exception is the
> debug exception, where we know we will need it for the GDB stub.
>
> However, we also have a hypervisor-dedicated timer, HDEC (hypervisor
> decrementer). Rather than using it as a plain tick which may or may
> not cause a scheduler exception, we can use it to *always* mean a
> context switch. In that case, we would always save the full state on
> HDEC entry, because we know it will always cause a context switch.
> Judging by set_ac_timer() callers, it seems that only the scheduler
> really uses the Xen timer tick. If non-scheduler components start
> using Xen-internal ticks, this approach wouldn't hold up (or rather,
> it would start becoming less efficient).
Why not move the non-volatile save/restore into your context switch 
routine, rather than deferring it until you exit the hypervisor?

  -- Keir



Hollis Blanchard

2005-Aug-26 17:40 UTC


[Xen-devel] Re: lazy context switching

On Aug 26, 2005, at 12:14 PM, Keir Fraser wrote:
> On 26 Aug 2005, at 17:38, Hollis Blanchard wrote:
>
>> Hmmmmm. I think the basic problem is that in the exception handler we
>> don't usually know we will need this state. The one exception is the
>> debug exception, where we know we will need it for the GDB stub.
>>
>> However, we also have a hypervisor-dedicated timer, HDEC (hypervisor
>> decrementer). Rather than using it as a plain tick which may or may
>> not cause a scheduler exception, we can use it to *always* mean a
>> context switch. In that case, we would always save the full state on
>> HDEC entry, because we know it will always cause a context switch.
>> Judging by set_ac_timer() callers, it seems that only the scheduler
>> really uses the Xen timer tick. If non-scheduler components start
>> using Xen-internal ticks, this approach wouldn't hold up (or rather,
>> it would start becoming less efficient).
>
> Why not move the non-volatile save/restore into your context switch 
> routine, rather than deferring it until you exit the hypervisor?
This is a key point. r14-r31 are nonvolatile in our C ABI, which means 
that callees must preserve their contents for callers. At a high level, 
our exception handlers look like this:

exception:
	save r0-r13 to cpu_user_regs
	call c_handler
	restore r0-r13
	return

We know that c_handler() will use r14-r31, but we also know that when 
it returns, their contents will have been restored. So saving and 
restoring them in assembly would be a waste of time.

context_switch() will be called from somewhere beneath c_handler(). At
that point, the original nonvolatiles will have been saved across many
stack frames (starting with c_handler()'s), so we really are unable to
access them at this point. However, we trust that by the time we get
back to the exception handler, the original nonvolatiles will have been
restored off all those stack frames.
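The two paragraphs above can be illustrated with a simulated calling convention. Real C code does not expose registers this way, so this is a model under stated assumptions: `gpr[]` stands in for the register file, the hypothetical `c_handler_sim()` spills four nonvolatiles (r14-r17) in its prologue and restores them in its epilogue, and the hypothetical `context_switch_sim()` records what it could see in r14.

```c
#include <assert.h>
#include <string.h>

#define NR_GPRS 32

static unsigned long gpr[NR_GPRS];   /* simulated register file */
static unsigned long seen_at_switch; /* what context_switch could read in r14 */

/* Stack frame of the simulated handler: spill slots for r14-r17. */
struct frame { unsigned long spill[4]; };

void context_switch_sim(void)
{
    /* r14 now holds a C temporary; the guest's r14 sits in c_handler's
     * frame, which this function has no portable way to find. */
    seen_at_switch = gpr[14];
}

void c_handler_sim(void)
{
    struct frame fr;

    memcpy(fr.spill, &gpr[14], sizeof(fr.spill)); /* prologue: save r14-r17 */
    for (int i = 14; i < 18; i++)
        gpr[i] = 0xdead;                          /* body reuses them */
    context_switch_sim();
    memcpy(&gpr[14], fr.spill, sizeof(fr.spill)); /* epilogue: restore */
    assert(gpr[14] == fr.spill[0]);
}
```

With gpr[14] set to 1014 beforehand, a call to c_handler_sim() leaves seen_at_switch at 0xdead but gpr[14] back at 1014: returning to the same domain needs no assembly-level save, while switch code running at the inner point cannot see the guest's r14.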

-- 
Hollis Blanchard
IBM Linux Technology Center



Keir Fraser

2005-Aug-26 18:06 UTC


[Xen-devel] Re: lazy context switching

On 26 Aug 2005, at 18:40, Hollis Blanchard wrote:
> context_switch() will be called from somewhere beneath c_handler(). At
> that point, the original nonvolatiles will have been saved across many
> stack frames (starting with c_handler()'s), so we really are unable to
> access them at this point. However, we trust that by the time we get
> back to the exception handler, the original nonvolatiles will have
> been restored off all those stack frames.
Hmmmm... Anyone interested in the state of a paused vcpu will call 
sync_vcpu_execstate() after descheduling the vcpu. That is an entirely 
arch-specific function that you can define to wait until the 
non-volatile registers are safely saved to memory.

Maybe you could add a flag to the arch-specific portion of the vcpu 
structure that gets set after the non-volatile registers are saved to 
memory and cleared when they are restored to active use. Then 
sync_vcpu_execstate() can spin on that flag waiting for it to be 
non-zero.
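Keir's flag suggestion might look roughly like the following C11 sketch. The structure and helper names are hypothetical (only `sync_vcpu_execstate()` is named in the thread, and this is not its real implementation), and a pthread stands in for the remote CPU reaching its exit path. The release store after the registers are written pairs with the acquire load in the spin loop, so a reader that sees the flag set also sees the saved registers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_NONVOL 18 /* r14-r31 */

struct arch_vcpu {
    unsigned long nonvol[NR_NONVOL];
    atomic_bool nonvol_saved; /* set once nonvol[] is valid in memory */
};

/* Exit path on the CPU that descheduled the VCPU. */
void save_nonvolatiles(struct arch_vcpu *v, const unsigned long *live)
{
    for (int i = 0; i < NR_NONVOL; i++)
        v->nonvol[i] = live[i];
    atomic_store_explicit(&v->nonvol_saved, true, memory_order_release);
}

/* When the VCPU's registers go back into active use. */
void load_nonvolatiles(struct arch_vcpu *v, unsigned long *live)
{
    assert(atomic_load_explicit(&v->nonvol_saved, memory_order_acquire));
    atomic_store_explicit(&v->nonvol_saved, false, memory_order_relaxed);
    for (int i = 0; i < NR_NONVOL; i++)
        live[i] = v->nonvol[i];
}

/* Arch-specific sync_vcpu_execstate() (sketch): spin until saved. */
void sync_vcpu_execstate(struct arch_vcpu *v)
{
    while (!atomic_load_explicit(&v->nonvol_saved, memory_order_acquire))
        ; /* a real hypervisor would insert a CPU pause/relax hint here */
}

/* Demo helper: plays the remote CPU reaching its exception exit path. */
struct remote_cpu { struct arch_vcpu *v; unsigned long live[NR_NONVOL]; };

void *remote_exit_path(void *arg)
{
    struct remote_cpu *c = arg;
    save_nonvolatiles(c->v, c->live);
    return NULL;
}
```

Here sync_vcpu_execstate() returns only after the other thread's save_nonvolatiles() has published the registers.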

  -- Keir



Hollis Blanchard

2005-Aug-26 19:24 UTC


[Xen-devel] Re: lazy context switching

On Friday 26 August 2005 13:06, Keir Fraser wrote:
> On 26 Aug 2005, at 18:40, Hollis Blanchard wrote:
> > context_switch() will be called from somewhere beneath c_handler(). At
> > that point, the original nonvolatiles will have been saved across many
> > stack frames (starting with c_handler()'s), so we really are unable to
> > access them at this point. However, we trust that by the time we get
> > back to the exception handler, the original nonvolatiles will have
> > been restored off all those stack frames.
>
> Hmmmm... Anyone interested in the state of a paused vcpu will call
> sync_vcpu_execstate() after descheduling the vcpu. That is an entirely
> arch-specific function that you can define to wait until the
> non-volatile registers are safely saved to memory.
>
> Maybe you could add a flag to the arch-specific portion of the vcpu
> structure that gets set after the non-volatile registers are saved to
> memory and cleared when they are restored to active use. Then
> sync_vcpu_execstate() can spin on that flag waiting for it to be
> non-zero.
I think this could work, in conjunction with using the HDEC timer solely as a 
context-switch interrupt (so saving all nonvolatiles on HDEC).

However, if we ever want to context switch for other reasons (e.g. a "yield"
hypercall), we're back to the same problem: the hcall exception handler won't
save the nonvolatiles...

-- 
Hollis Blanchard
IBM Linux Technology Center


Keir Fraser

2005-Aug-27 08:30 UTC


[Xen-devel] Re: lazy context switching

On 26 Aug 2005, at 20:24, Hollis Blanchard wrote:
> However, if we ever want to context switch for other reasons (e.g. a
> "yield" hypercall), we're back to the same problem: the hcall exception
> handler won't save the nonvolatiles...
Instead of save/restore in just one exception handler, you could check 
a flag on the exit path of all exception handlers (maybe the same flag 
that sync_vcpu_execstate spins on). If that flag is clear you know you 
need to take a slower path that saves the non-volatiles into the old 
vcpu struct and loads from the new vcpu struct.
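A common exit path along these lines might be sketched as below. It is a hypothetical model rather than code from the thread: `struct cpu`, `nonvol_owner`, and `exception_exit()` are invented, and the slow path is selected by comparing a per-CPU owner pointer against the VCPU being resumed, instead of by testing a single flag. On the slow path it saves r14-r31 into the outgoing VCPU, sets the same kind of `nonvol_saved` flag a `sync_vcpu_execstate()` could spin on, and loads the incoming VCPU's values.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define NR_NONVOL 18 /* r14-r31 */

struct vcpu {
    unsigned long nonvol[NR_NONVOL];
    atomic_bool nonvol_saved; /* true while nonvol[] is the valid copy */
};

struct cpu {
    unsigned long live_nonvol[NR_NONVOL]; /* r14-r31 on the exit path */
    struct vcpu *nonvol_owner;            /* whose r14-r31 those are */
    struct vcpu *current;                 /* VCPU we are returning to */
};

/* Run at the end of every exception handler (sketch). */
void exception_exit(struct cpu *c)
{
    if (c->nonvol_owner == c->current)
        return; /* fast path: no switch happened below c_handler() */

    struct vcpu *prev = c->nonvol_owner, *next = c->current;

    /* Slow path: commit the outgoing VCPU's nonvolatiles... */
    assert(!atomic_load_explicit(&prev->nonvol_saved, memory_order_relaxed));
    memcpy(prev->nonvol, c->live_nonvol, sizeof(prev->nonvol));
    atomic_store_explicit(&prev->nonvol_saved, true, memory_order_release);

    /* ...and load the incoming VCPU's, which become live again. */
    assert(atomic_load_explicit(&next->nonvol_saved, memory_order_acquire));
    memcpy(c->live_nonvol, next->nonvol, sizeof(c->live_nonvol));
    atomic_store_explicit(&next->nonvol_saved, false, memory_order_relaxed);
    c->nonvol_owner = next;
}
```

Because the check sits on every handler's exit path, a switch triggered by HDEC, a "yield" hypercall, or anything else is handled the same way, at the cost of one comparison when no switch happened.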

  -- Keir



Dan Magenheimer

2005-Aug-30 15:25 UTC


[Xen-devel] Re: lazy context switching

Hi Hollis --

Sorry for the late reply... keeping up with xen-devel
is getting tough!

How does Linux do this on Power?  Xen/ia64 heavily
leverages the equivalent Linux/ia64 code.  As you
may know, Linux/ia64 scatters state all over
memory and uses "unwind descriptors" so that it
can recover all the state.  I'd imagine Power does
something similar...

Dan




Hollis Blanchard

2005-Aug-30 16:22 UTC


Re: [Xen-devel] Re: lazy context switching

On Tuesday 30 August 2005 10:25, Dan Magenheimer wrote:
> Sorry for the late reply... keeping up with xen-devel
> is getting tough!
Yes, I just occasionally skim...
> How does Linux do this on Power?  Xen/ia64 heavily
> leverages the equivalent Linux/ia64 code.  As you
> may know, Linux/ia64 scatters state all over
> memory and uses "unwind descriptors" so that it
> can recover all the state.  I'd imagine Power does
> something similar...
Linux has a per-thread kernel stack, so '_switch' saves the current (i.e.
kernel, not original usermode) nonvolatiles to the previous task structure.

For Xen/PPC, we use a per-cpu stack, so we need to save the original (domain) 
nonvolatiles to the previous domain structure.

-- 
Hollis Blanchard
IBM Linux Technology Center
