Michael Abd-El-Malek
2009-Apr-20 16:32 UTC
[Xen-devel] Fast inter-VM signaling using monitor/mwait
I've implemented a fast inter-VM signaling mechanism using the x86 monitor/mwait instructions. One-way event notification takes ~0.5us, compared to ~8us when using Xen's event channels. If there's interest in this code, I'm willing to clean it up and/or share it with others.

A little bit of background... For my dissertation work, I'm enabling portable file system implementations by running a file system in a VM. Small file-system-agnostic modules in the kernel pass all VFS operations from the user OS (running user applications) to the file system VM (running the preferred OS for the file system). In contrast to user-level file systems, my approach leverages unmodified file system implementations and provides better isolation for the FS from the myriad OSs that a user may be running. I've implemented a unified buffer caching mechanism between VMs that requires very few changes to the OSs: fewer than a dozen lines. Additionally, we've modified Xen's migration mechanism to support atomic migration of two VMs. We currently have NetBSD and Linux (2.6.18 and 2.6.28) ports.

I've implemented an IPC layer that's very similar to the one in the block and network PV drivers (i.e., it uses shared memory for data transfer and event channels for signaling). Unfortunately, Xen's event channels were too slow for my purposes. For the remainder of this email, assume that each VM has a dedicated core -- I'm trying to optimize latency for this case. The culprit is the overhead of context switching to the guest OS interrupt handler (~3.5us for x86_64 2.6.28) and another context switch to a worker thread (~3us). In addition, there's a ~2us cost for making a "send event" hypercall; this includes the cost of a hypercall and of sending an x86 inter-processor interrupt (IPI). Thus, a one-way event notification costs ~8us, and an IPC takes ~16us for a request and a response notification. This cost hasn't been problematic for the block and network drivers, primarily because the hardware access cost for the underlying operations is typically in the millisecond range; an extra 16us is noise.

Our design goal of preserving file system semantics without modifying the file system necessitates that all VFS operations are sent to the file system VM. In other words, there is no client caching. Thus, there is a high frequency of IPCs among the VMs. For example, we pass along all in-cache data and metadata accesses, permission checks, and directory entry validation callbacks. These VFS operations can often cost less than 1us. Adding a 16us signaling cost is thus a big overhead, slowing macrobenchmarks by ~20%.

I implemented a polling mechanism that spins on a shared memory location to check for requests/responses. Its performance overhead was minimal (<1us), but it had an adverse effect on power consumption during idle time. Fortunately, since the Pentium 4 (Prescott), x86 has included two instructions for efficiently (power-wise) implementing this type of inter-processor polling. A processor executes a monitor instruction with a memory address to be monitored, then executes an mwait instruction. The mwait instruction returns when a write occurs to that memory location, or when an interrupt occurs.

The mwait instruction is privileged, so I added a new hypercall that wraps access to the mwait instruction. Thus, my code has a Xen component (the new hypercall) and a guest kernel component (code for executing the hypercall and for turning off/on the timer interrupts around the hypercall).
For this code to be merged into Xen, it would need security checks and a check that the processor supports the feature. Are any folks interested in this code? Would it make sense to integrate this into Xen? I've implemented the guest code in Linux 2.6.28, but I can easily port it to 2.6.30 or 2.6.18. I'm also happy to provide my benchmarking code.

Cheers,
Mike
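To make the mechanism concrete, here is a minimal sketch (plain C with GCC inline assembly) of the hypervisor-side wait primitive that such a hypercall could wrap. The names (cpu_monitor, cpu_mwait, do_mwait_wait) and the old-value interface are assumptions for illustration -- this is not the posted patch -- and the code must run at ring 0, since MONITOR/MWAIT are privileged; a real implementation would also handle pending interrupts, preemption, and continuations as discussed later in the thread.

#include <stdint.h>

/* MONITOR: EAX = linear address to arm, ECX = extensions, EDX = hints. */
static inline void cpu_monitor(const volatile void *addr)
{
    asm volatile(".byte 0x0f, 0x01, 0xc8"      /* monitor */
                 :: "a" (addr), "c" (0UL), "d" (0UL));
}

/* MWAIT: EAX = hints, ECX = extensions.  Returns on a write to the
 * monitored line or on an interrupt. */
static inline void cpu_mwait(void)
{
    asm volatile(".byte 0x0f, 0x01, 0xc9"      /* mwait */
                 :: "a" (0UL), "c" (0UL));
}

/* Wait until *addr no longer holds old_val.  Hypothetical hypercall body. */
long do_mwait_wait(const volatile uint32_t *addr, uint32_t old_val)
{
    while (*addr == old_val) {
        cpu_monitor(addr);
        /* Re-check after arming the monitor: a write that raced with the
         * load above must not be missed. */
        if (*addr != old_val)
            break;
        cpu_mwait();          /* sleeps until a write or an interrupt */
    }
    return 0;
}

The monitor-then-recheck-then-mwait ordering is the standard idiom for avoiding a lost wakeup between the initial check and arming the monitor.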
Ian Pratt
2009-Apr-21 03:19 UTC
RE: [Xen-devel] Fast inter-VM signaling using monitor/mwait
> The mwait instruction is privileged, so I added a new hypercall that wraps access to the mwait instruction. Thus, my code has a Xen component (the new hypercall) and a guest kernel component (code for executing the hypercall and for turning off/on the timer interrupts around the hypercall). For this code to be merged into Xen, it would need security checks and a check that the processor supports the feature.

I seem to recall that some newer CPUs have an mwait instruction accessible from ring 3, using a different opcode -- you might want to check this out.

How do you deal with atomicity of the monitor and mwait? I.e., how do you stop the hypervisor pre-empting the VM and using monitor for its own purposes, or letting another guest use it?

Have you thought about HVM guests as well as PV?

Best,
Ian
Tian, Kevin
2009-Apr-21 09:01 UTC
RE: [Xen-devel] Fast inter-VM signaling using monitor/mwait
> From: Ian Pratt
> Sent: April 21, 2009 11:19
>
>> The mwait instruction is privileged, so I added a new hypercall that wraps access to the mwait instruction. Thus, my code has a Xen component (the new hypercall) and a guest kernel component (code for executing the hypercall and for turning off/on the timer interrupts around the hypercall). For this code to be merged into Xen, it would need security checks and a check that the processor supports the feature.
>
> I seem to recall that some newer CPUs have an mwait instruction accessible from ring 3, using a different opcode -- you might want to check this out.
>
> How do you deal with atomicity of the monitor and mwait? I.e., how do you stop the hypervisor pre-empting the VM and using monitor for its own purposes, or letting another guest use it?

That's a real concern. To use monitor/mwait sanely, software must not introduce a voluntary context switch between them; but if that atomicity is ensured at the hypercall level, I'm not sure about the overall efficiency when multiple VMs are all active...

> Have you thought about HVM guests as well as PV?

For HVM guests, both vmexit and vmentry clear any address-range monitoring in effect, and thus that won't work.

Thanks,
Kevin
Michael Abd-El-Malek
2009-Apr-23 21:42 UTC
Re: [Xen-devel] Fast inter-VM signaling using monitor/mwait
On Apr 20, 2009, at 11:19 PM, Ian Pratt wrote:

>> The mwait instruction is privileged, so I added a new hypercall that wraps access to the mwait instruction. Thus, my code has a Xen component (the new hypercall) and a guest kernel component (code for executing the hypercall and for turning off/on the timer interrupts around the hypercall). For this code to be merged into Xen, it would need security checks and a check that the processor supports the feature.
>
> I seem to recall that some newer CPUs have an mwait instruction accessible from ring 3, using a different opcode -- you might want to check this out.

Thanks for the pointer. I'm not aware of these new instructions. A quick Google search didn't turn up anything. Can any of the Intel/AMD folks shed more light?

> How do you deal with atomicity of the monitor and mwait? I.e., how do you stop the hypervisor pre-empting the VM and using monitor for its own purposes, or letting another guest use it?

The monitor and mwait instructions are _both_ executed in the hypervisor. I should've been clearer: my new "mwait" hypercall executes both the monitor and mwait instructions.

> Have you thought about HVM guests as well as PV?

No, I haven't thought about HVM. (I'm about to reply to Kevin Tian's response and hopefully we can figure this out.)

> Best,
> Ian

Thanks for the feedback!
Mike
Michael Abd-El-Malek
2009-Apr-23 21:48 UTC
Re: [Xen-devel] Fast inter-VM signaling using monitor/mwait
On Apr 21, 2009, at 5:01 AM, Tian, Kevin wrote:

>> From: Ian Pratt
>> Sent: April 21, 2009 11:19
>>
>>> The mwait instruction is privileged, so I added a new hypercall that wraps access to the mwait instruction. Thus, my code has a Xen component (the new hypercall) and a guest kernel component (code for executing the hypercall and for turning off/on the timer interrupts around the hypercall). For this code to be merged into Xen, it would need security checks and a check that the processor supports the feature.
>>
>> I seem to recall that some newer CPUs have an mwait instruction accessible from ring 3, using a different opcode -- you might want to check this out.
>>
>> How do you deal with atomicity of the monitor and mwait? I.e., how do you stop the hypervisor pre-empting the VM and using monitor for its own purposes, or letting another guest use it?
>
> That's a real concern. To use monitor/mwait sanely, software must not introduce a voluntary context switch between them; but if that atomicity is ensured at the hypercall level, I'm not sure about the overall efficiency when multiple VMs are all active...

I'm executing the monitor and mwait instructions together in the hypercall. The hypercall also takes an argument specifying the old value of the memory location. When the mwait instruction returns, the hypervisor can check and handle any interrupts. I currently return a continuation so that the mwait hypercall is re-executed at the end of handling interrupts. I haven't really thought about what happens if the VM gets scheduled out. These are the kinds of issues that I'd like to fix if the community wants to add this hypercall. For my benchmarking purposes, I'm not worrying about this :)

>> Have you thought about HVM guests as well as PV?
>
> For HVM guests, both vmexit and vmentry clear any address-range monitoring in effect, and thus that won't work.

I imagine this would cause the mwait instruction to return before a write occurs to the memory address? If so, the guest OS can check this (by comparing the memory address's value to the previously saved value) and re-execute the mwait hypercall. Users of mwait already have to check whether their terminating condition has occurred, since interrupts cause mwait to return.

Thanks for the feedback,
Mike
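On the guest side, the re-check-and-retry discipline described above might look like the following minimal sketch. HYPERVISOR_mwait_wait is the hypothetical hypercall wrapper from the earlier sketch, and req_prod is a request counter in a page shared with the peer VM; these are illustrative names, not the actual code.

#include <stdint.h>

/* Hypothetical guest wrapper for the mwait hypercall sketched earlier:
 * blocks in the hypervisor until *addr differs from old_val, but may
 * return early on interrupts or (on HVM) after a vmexit has cleared the
 * armed monitor -- callers must re-check and retry. */
extern long HYPERVISOR_mwait_wait(const volatile uint32_t *addr,
                                  uint32_t old_val);

/* Wait for the peer VM to publish a new request. */
static void wait_for_request(const volatile uint32_t *req_prod,
                             uint32_t *last_seen)
{
    while (*req_prod == *last_seen)
        HYPERVISOR_mwait_wait(req_prod, *last_seen);  /* spurious wakeups OK */

    *last_seen = *req_prod;   /* caller now processes the new request(s) */
}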
Tian, Kevin
2009-Apr-26 13:04 UTC
RE: [Xen-devel] Fast inter-VM signaling using monitor/mwait
> From: Michael Abd-El-Malek [mailto:mabdelmalek@cmu.edu]
> Sent: April 24, 2009 5:48
>
> On Apr 21, 2009, at 5:01 AM, Tian, Kevin wrote:
>
>>> From: Ian Pratt
>>> Sent: April 21, 2009 11:19
>>>
>>>> The mwait instruction is privileged, so I added a new hypercall that wraps access to the mwait instruction. Thus, my code has a Xen component (the new hypercall) and a guest kernel component (code for executing the hypercall and for turning off/on the timer interrupts around the hypercall). For this code to be merged into Xen, it would need security checks and a check that the processor supports the feature.
>>>
>>> I seem to recall that some newer CPUs have an mwait instruction accessible from ring 3, using a different opcode -- you might want to check this out.
>>>
>>> How do you deal with atomicity of the monitor and mwait? I.e., how do you stop the hypervisor pre-empting the VM and using monitor for its own purposes, or letting another guest use it?
>>
>> That's a real concern. To use monitor/mwait sanely, software must not introduce a voluntary context switch between them; but if that atomicity is ensured at the hypercall level, I'm not sure about the overall efficiency when multiple VMs are all active...
>
> I'm executing the monitor and mwait instructions together in the hypercall. The hypercall also takes an argument specifying the old value of the memory location. When the mwait instruction returns, the hypervisor can check and handle any interrupts. I currently return a continuation so that the mwait hypercall is re-executed at the end of handling interrupts. I haven't really thought about what happens if the VM gets scheduled out. These are the kinds of issues that I'd like to fix if the community wants to add this hypercall. For my benchmarking purposes, I'm not worrying about this :)

Maybe it's the reverse: you need to consider those issues to persuade the community, or else it will have very limited usage in the real world. This holds the CPU exclusively for an unknown time, unless you also ensure that the producer, which writes to the monitored address, is not scheduled out either -- which then further limits the actual benefit.

>>> Have you thought about HVM guests as well as PV?
>>
>> For HVM guests, both vmexit and vmentry clear any address-range monitoring in effect, and thus that won't work.
>
> I imagine this would cause the mwait instruction to return before a write occurs to the memory address? If so, the guest OS can check this (by comparing the memory address's value to the previously saved value) and re-execute the mwait hypercall. Users of mwait already have to check whether their terminating condition has occurred, since interrupts cause mwait to return.

Yes -- then why do you need monitor/mwait, compared to a simple loop checking the data directly? :-)

Thanks,
Kevin
Michael Abd-El-Malek
2009-May-05 14:28 UTC
Re: [Xen-devel] Fast inter-VM signaling using monitor/mwait
On Apr 26, 2009, at 9:04 AM, Tian, Kevin wrote:

>> I'm executing the monitor and mwait instructions together in the hypercall. The hypercall also takes an argument specifying the old value of the memory location. When the mwait instruction returns, the hypervisor can check and handle any interrupts. I currently return a continuation so that the mwait hypercall is re-executed at the end of handling interrupts. I haven't really thought about what happens if the VM gets scheduled out. These are the kinds of issues that I'd like to fix if the community wants to add this hypercall. For my benchmarking purposes, I'm not worrying about this :)
>
> Maybe it's the reverse: you need to consider those issues to persuade the community, or else it will have very limited usage in the real world. This holds the CPU exclusively for an unknown time, unless you also ensure that the producer, which writes to the monitored address, is not scheduled out either -- which then further limits the actual benefit.

Interrupts will cause the mwait instruction to return. So the same periodic timer interrupts that are used for VM scheduling will continue to be useful. The CPU is not held exclusively for unbounded time.

>>>> Have you thought about HVM guests as well as PV?
>>>
>>> For HVM guests, both vmexit and vmentry clear any address-range monitoring in effect, and thus that won't work.
>>
>> I imagine this would cause the mwait instruction to return before a write occurs to the memory address? If so, the guest OS can check this (by comparing the memory address's value to the previously saved value) and re-execute the mwait hypercall. Users of mwait already have to check whether their terminating condition has occurred, since interrupts cause mwait to return.
>
> Yes -- then why do you need monitor/mwait, compared to a simple loop checking the data directly? :-)

The simple spin-poll loop prevents the core from going into a low-energy mode. My motivation in using monitor/mwait is to get the latency of spin-poll but with the energy efficiency of Xen events (i.e., the CPU can go to sleep if the VM is waiting for a signal).

Cheers,
Mike
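For contrast, the plain spin-poll receive path being compared against might look like the sketch below (illustrative only): it achieves the sub-microsecond wakeup mentioned earlier in the thread, but the PAUSE hint only throttles the pipeline; the core never enters a low-power C-state while waiting.

#include <stdint.h>

static inline void cpu_relax(void)
{
    asm volatile("pause" ::: "memory");   /* spin-wait hint */
}

/* Busy-wait until the peer bumps the shared request counter. */
static void spin_wait_for_request(const volatile uint32_t *req_prod,
                                  uint32_t last_seen)
{
    while (*req_prod == last_seen)
        cpu_relax();
}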
Tian, Kevin
2009-May-06 06:22 UTC
RE: [Xen-devel] Fast inter-VM signaling using monitor/mwait
> From: Michael Abd-El-Malek [mailto:mabdelmalek@cmu.edu]
> Sent: May 5, 2009 22:29
>
>> Maybe it's the reverse: you need to consider those issues to persuade the community, or else it will have very limited usage in the real world. This holds the CPU exclusively for an unknown time, unless you also ensure that the producer, which writes to the monitored address, is not scheduled out either -- which then further limits the actual benefit.
>
> Interrupts will cause the mwait instruction to return. So the same periodic timer interrupts that are used for VM scheduling will continue to be useful. The CPU is not held exclusively for unbounded time.

In Xen, actual vcpu scheduling happens at the point before resuming back to the VM, not in the timer interrupt ISR. So as long as your monitor/mwait loop in the hypercall doesn't exit before the update is observed, scheduling won't happen.

>> Yes -- then why do you need monitor/mwait, compared to a simple loop checking the data directly? :-)
>
> The simple spin-poll loop prevents the core from going into a low-energy mode. My motivation in using monitor/mwait is to get the latency of spin-poll but with the energy efficiency of Xen events (i.e., the CPU can go to sleep if the VM is waiting for a signal).

That's obviously the wrong model. There could be other runnable threads within the VM. Here it's not "if the VM is waiting for a signal"; it's just "if one thread in the VM is waiting for a signal".

Thanks,
Kevin
Michael Abd-El-Malek
2009-May-06 14:38 UTC
Re: [Xen-devel] Fast inter-VM signaling using monitor/mwait
On May 6, 2009, at 2:22 AM, Tian, Kevin wrote:

>> Interrupts will cause the mwait instruction to return. So the same periodic timer interrupts that are used for VM scheduling will continue to be useful. The CPU is not held exclusively for unbounded time.
>
> In Xen, actual vcpu scheduling happens at the point before resuming back to the VM, not in the timer interrupt ISR. So as long as your monitor/mwait loop in the hypercall doesn't exit before the update is observed, scheduling won't happen.

I'm not an expert on Xen scheduling, so please correct my following understanding. For the credit scheduler, csched_tick sets the next timer interrupt. So after the mwait hypercall executes the mwait instruction and is waiting for a memory write, I observe the timer interrupt eventually causing the mwait instruction to return. The mwait hypercall can then run the scheduler.

>> The simple spin-poll loop prevents the core from going into a low-energy mode. My motivation in using monitor/mwait is to get the latency of spin-poll but with the energy efficiency of Xen events (i.e., the CPU can go to sleep if the VM is waiting for a signal).
>
> That's obviously the wrong model. There could be other runnable threads within the VM. Here it's not "if the VM is waiting for a signal"; it's just "if one thread in the VM is waiting for a signal".

Yes, the model is "if the VM's CPU is idle". In other words, if there are runnable threads, I don't need to interrupt the CPU. The reason is that I'm treating the VM as a "server VM" -- so if it's serving other requests, there's no need to interrupt it; it will check for new requests after finishing with the current request. I only want to signal the VM in case it's idle.

Cheers,
Mike
Keir Fraser
2009-May-06 15:05 UTC
Re: [Xen-devel] Fast inter-VM signaling using monitor/mwait
On 06/05/2009 15:38, "Michael Abd-El-Malek" <mabdelmalek@cmu.edu> wrote:

>> In Xen, actual vcpu scheduling happens at the point before resuming back to the VM, not in the timer interrupt ISR. So as long as your monitor/mwait loop in the hypercall doesn't exit before the update is observed, scheduling won't happen.
>
> I'm not an expert on Xen scheduling, so please correct my following understanding. For the credit scheduler, csched_tick sets the next timer interrupt. So after the mwait hypercall executes the mwait instruction and is waiting for a memory write, I observe the timer interrupt eventually causing the mwait instruction to return. The mwait hypercall can then run the scheduler.

The issue is, what if another VM's VCPUs are runnable? Xen should prefer to run those rather than pause the CPU on an MWAIT, right? But if it does that, it will lose the memory-access wakeup property of the MWAIT and cannot schedule the 'MWAIT'ing VM back in promptly when the relevant memory access occurs. I don't see that MWAIT can be used effectively in a guest idle loop unless you are happy to bin the work-conserving property of the Xen scheduler (e.g., by dedicating a physical CPU to the VM).

-- Keir
Michael Abd-El-Malek
2009-May-07 16:25 UTC
Re: [Xen-devel] Fast inter-VM signaling using monitor/mwait
On May 6, 2009, at 11:05 AM, Keir Fraser wrote:

> The issue is, what if another VM's VCPUs are runnable? Xen should prefer to run those rather than pause the CPU on an MWAIT, right? But if it does that, it will lose the memory-access wakeup property of the MWAIT and cannot schedule the 'MWAIT'ing VM back in promptly when the relevant memory access occurs. I don't see that MWAIT can be used effectively in a guest idle loop unless you are happy to bin the work-conserving property of the Xen scheduler (e.g., by dedicating a physical CPU to the VM).

I see the issue -- thanks for explaining this. Yes, I have dedicated a physical CPU to my VM, which does throw away the work-conserving property.

As a solution, what if the source VM continues to use the "send event" hypercall to signal the destination VM, but we modify the "send event" hypercall as follows: if the destination VM is currently executing _and_ is blocked in an mwait hypercall, then we simply write to its monitored memory address; otherwise, we fall back to the existing event channel mechanism. How does that sound?

Thanks,
Mike
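A sketch of what that hybrid notification could look like on the hypervisor's send path, with hypothetical per-destination state (blocked_in_mwait, mwait_addr) and an assumed existing event-channel send routine standing in for the real slow path; none of this is actual Xen interface code.

#include <stdint.h>
#include <stdbool.h>

/* Per-destination notification state, tracked by the hypervisor. */
struct notify_target {
    volatile uint32_t *mwait_addr;   /* word the destination is MWAITing on */
    bool blocked_in_mwait;           /* set/cleared by the mwait hypercall */
    int evtchn_port;                 /* fallback event channel */
};

extern void event_channel_send(int port);   /* assumed existing slow path */

static void notify_destination(struct notify_target *dst)
{
    if (dst->blocked_in_mwait) {
        /* Fast path: the store itself wakes the destination's MWAIT. */
        (*dst->mwait_addr)++;
    } else {
        /* Slow path: deliver a normal event-channel notification. */
        event_channel_send(dst->evtchn_port);
    }
}

The attraction of this split is that the sender keeps a single interface (the "send event" hypercall), while the hypervisor transparently picks the cheap wakeup whenever the receiver is parked in the mwait hypercall and falls back to the work-conserving event-channel path otherwise.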