Michael Abd-El-Malek
2009-Apr-20 16:32 UTC
[Xen-devel] Fast inter-VM signaling using monitor/mwait
I've implemented a fast inter-VM signaling mechanism using the x86 monitor/mwait instructions. One-way event notification takes ~0.5us, compared to ~8us when using Xen's event channels. If there's interest in this code, I'm willing to clean it up and/or share it with others.

A little bit of background... For my dissertation work, I'm enabling portable file system implementations by running a file system in a VM. Small file-system-agnostic modules in the kernel pass all VFS operations from the user OS (running user applications) to the file system VM (running the preferred OS for the file system). In contrast to user-level file systems, my approach leverages unmodified file system implementations and provides better isolation for the FS from the myriad OSs that a user may be running. I've implemented a unified buffer caching mechanism between VMs that requires very few changes to the OSs: fewer than a dozen lines. Additionally, we've modified Xen's migration mechanism to support atomic migration of two VMs. We currently have NetBSD and Linux (2.6.18 and 2.6.28) ports.

I've implemented an IPC layer that's very similar to the one in the block and network PV drivers (i.e., it uses shared memory for data transfer and event channels for signaling). Unfortunately, Xen's event channels were too slow for my purposes. For the remainder of this email, assume that each VM has a dedicated core -- I'm trying to optimize latency for this case. The culprit is the overhead of context switching to the guest OS interrupt handler (~3.5us for x86_64 2.6.28) and another context switch to a worker thread (~3us). In addition, there's a ~2us cost for making a "send event" hypercall; this includes the cost of a hypercall and of sending an x86 inter-processor interrupt (IPI). Thus, a one-way event notification costs ~8us, and an IPC takes ~16us for a request and a response notification. This cost hasn't been problematic for the block and network drivers, primarily because the hardware access cost for the underlying operations is typically in the millisecond range; an extra 16us is noise.

Our design goal of preserving file system semantics without modifying the file system necessitates that all VFS operations are sent to the file system VM. In other words, there is no client caching. Thus, there is a high frequency of IPCs among the VMs. For example, we pass along all in-cache data and metadata accesses, permission checks, and directory entry validation callbacks. These VFS operations can often cost less than 1us. Adding a 16us signaling cost is thus a big overhead, slowing macrobenchmarks by ~20%.

I implemented a polling mechanism that spins on a shared memory location to check for requests/responses. Its performance overhead was minimal (<1us), but it had an adverse effect on power consumption during idle time. Fortunately, since the Pentium 4 (Prescott), x86 has included two instructions for efficiently (power-wise) implementing this type of inter-processor polling. A processor executes a monitor instruction with a memory address to be monitored, then executes an mwait instruction. The mwait instruction returns when a write occurs to that memory location, or when an interrupt occurs.

The mwait instruction is privileged, so I added a new hypercall that wraps access to the mwait instruction. Thus, my code has a Xen component (the new hypercall) and a guest kernel component (code for executing the hypercall and for turning off/on the timer interrupts around the hypercall).
For this code to be merged into Xen, it would need security checks and a check that the processor supports the feature. Are any folks interested in this code? Would it make sense to integrate this into Xen? I've implemented the guest code in Linux 2.6.28, but I can easily port it to 2.6.30 or 2.6.18. I'm also happy to provide my benchmarking code.

Cheers,
Mike
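To make the mechanism concrete, here is a minimal sketch (plain C with GCC inline assembly) of the hypervisor-side wait primitive that such a hypercall could wrap. The names (cpu_monitor, cpu_mwait, do_mwait_wait) and the old-value interface are assumptions for illustration -- this is not the posted patch -- and the code must run at ring 0, since MONITOR/MWAIT are privileged; a real implementation would also handle pending interrupts, preemption, and continuations as discussed later in the thread.

#include <stdint.h>

/* MONITOR: EAX = linear address to arm, ECX = extensions, EDX = hints. */
static inline void cpu_monitor(const volatile void *addr)
{
    asm volatile(".byte 0x0f, 0x01, 0xc8"      /* monitor */
                 :: "a" (addr), "c" (0UL), "d" (0UL));
}

/* MWAIT: EAX = hints, ECX = extensions.  Returns on a write to the
 * monitored line or on an interrupt. */
static inline void cpu_mwait(void)
{
    asm volatile(".byte 0x0f, 0x01, 0xc9"      /* mwait */
                 :: "a" (0UL), "c" (0UL));
}

/* Wait until *addr no longer holds old_val.  Hypothetical hypercall body. */
long do_mwait_wait(const volatile uint32_t *addr, uint32_t old_val)
{
    while (*addr == old_val) {
        cpu_monitor(addr);
        /* Re-check after arming the monitor: a write that raced with the
         * load above must not be missed. */
        if (*addr != old_val)
            break;
        cpu_mwait();          /* sleeps until a write or an interrupt */
    }
    return 0;
}

The monitor-then-recheck-then-mwait ordering is the standard idiom for avoiding a lost wakeup between the initial check and arming the monitor.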
Ian Pratt
2009-Apr-21 03:19 UTC
RE: [Xen-devel] Fast inter-VM signaling using monitor/mwait
> The mwait instruction is privileged, so I added a new hypercall that wraps access to the mwait instruction. Thus, my code has a Xen component (the new hypercall) and a guest kernel component (code for executing the hypercall and for turning off/on the timer interrupts around the hypercall). For this code to be merged into Xen, it would need security checks and a check that the processor supports the feature.

I seem to recall that some newer CPUs have an mwait instruction accessible from ring 3, using a different opcode -- you might want to check this out.

How do you deal with atomicity of the monitor and mwait? I.e., how do you stop the hypervisor pre-empting the VM and using monitor for its own purposes, or letting another guest use it?

Have you thought about HVM guests as well as PV?

Best,
Ian
Tian, Kevin
2009-Apr-21 09:01 UTC
RE: [Xen-devel] Fast inter-VM signaling using monitor/mwait
> From: Ian Pratt
> Sent: April 21, 2009 11:19
>
>> The mwait instruction is privileged, so I added a new hypercall that wraps access to the mwait instruction. Thus, my code has a Xen component (the new hypercall) and a guest kernel component (code for executing the hypercall and for turning off/on the timer interrupts around the hypercall). For this code to be merged into Xen, it would need security checks and a check that the processor supports the feature.
>
> I seem to recall that some newer CPUs have an mwait instruction accessible from ring 3, using a different opcode -- you might want to check this out.
>
> How do you deal with atomicity of the monitor and mwait? I.e., how do you stop the hypervisor pre-empting the VM and using monitor for its own purposes, or letting another guest use it?

That's a real concern. To use monitor/mwait sanely, software must not introduce a voluntary context switch between them; but if that atomicity is ensured at the hypercall level, I'm not sure about the overall efficiency when multiple VMs are all active...

> Have you thought about HVM guests as well as PV?

For HVM guests, both vmexit and vmentry clear any address-range monitoring in effect, and thus that won't work.

Thanks,
Kevin
Michael Abd-El-Malek
2009-Apr-23 21:42 UTC
Re: [Xen-devel] Fast inter-VM signaling using monitor/mwait
On Apr 20, 2009, at 11:19 PM, Ian Pratt wrote:

>> The mwait instruction is privileged, so I added a new hypercall that wraps access to the mwait instruction. Thus, my code has a Xen component (the new hypercall) and a guest kernel component (code for executing the hypercall and for turning off/on the timer interrupts around the hypercall). For this code to be merged into Xen, it would need security checks and a check that the processor supports the feature.
>
> I seem to recall that some newer CPUs have an mwait instruction accessible from ring 3, using a different opcode -- you might want to check this out.

Thanks for the pointer. I'm not aware of these new instructions. A quick Google search didn't turn up anything. Can any of the Intel/AMD folks shed more light?

> How do you deal with atomicity of the monitor and mwait? I.e., how do you stop the hypervisor pre-empting the VM and using monitor for its own purposes, or letting another guest use it?

The monitor and mwait instructions are _both_ executed in the hypervisor. I should've been clearer: my new "mwait" hypercall executes both the monitor and mwait instructions.

> Have you thought about HVM guests as well as PV?

No, I haven't thought about HVM. (I'm about to reply to Kevin Tian's response and hopefully we can figure this out.)

> Best,
> Ian

Thanks for the feedback!
Mike
Michael Abd-El-Malek
2009-Apr-23 21:48 UTC
Re: [Xen-devel] Fast inter-VM signaling using monitor/mwait
On Apr 21, 2009, at 5:01 AM, Tian, Kevin wrote:

>> From: Ian Pratt
>> Sent: April 21, 2009 11:19
>>
>>> The mwait instruction is privileged, so I added a new hypercall that wraps access to the mwait instruction. Thus, my code has a Xen component (the new hypercall) and a guest kernel component (code for executing the hypercall and for turning off/on the timer interrupts around the hypercall). For this code to be merged into Xen, it would need security checks and a check that the processor supports the feature.
>>
>> I seem to recall that some newer CPUs have an mwait instruction accessible from ring 3, using a different opcode -- you might want to check this out.
>>
>> How do you deal with atomicity of the monitor and mwait? I.e., how do you stop the hypervisor pre-empting the VM and using monitor for its own purposes, or letting another guest use it?
>
> That's a real concern. To use monitor/mwait sanely, software must not introduce a voluntary context switch between them; but if that atomicity is ensured at the hypercall level, I'm not sure about the overall efficiency when multiple VMs are all active...

I'm executing the monitor and mwait instructions together in the hypercall. The hypercall also takes an argument specifying the old value of the memory location. When the mwait instruction returns, the hypervisor can check and handle any interrupts. I currently return a continuation so that the mwait hypercall is re-executed at the end of handling interrupts. I haven't really thought about what happens if the VM gets scheduled out. These are the kinds of issues that I'd like to fix if the community wants to add this hypercall. For my benchmarking purposes, I'm not worrying about this :)

>> Have you thought about HVM guests as well as PV?
>
> For HVM guests, both vmexit and vmentry clear any address-range monitoring in effect, and thus that won't work.

I imagine this would cause the mwait instruction to return before a write occurs to the memory address? If so, the guest OS can check this (by comparing the memory address's value to the previously saved value) and re-execute the mwait hypercall. Users of mwait already have to check whether their terminating condition has occurred, since interrupts cause mwait to return.

Thanks for the feedback,
Mike
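On the guest side, the re-check-and-retry discipline described above might look like the following minimal sketch. HYPERVISOR_mwait_wait is the hypothetical hypercall wrapper from the earlier sketch, and req_prod is a request counter in a page shared with the peer VM; these are illustrative names, not the actual code.

#include <stdint.h>

/* Hypothetical guest wrapper for the mwait hypercall sketched earlier:
 * blocks in the hypervisor until *addr differs from old_val, but may
 * return early on interrupts or (on HVM) after a vmexit has cleared the
 * armed monitor -- callers must re-check and retry. */
extern long HYPERVISOR_mwait_wait(const volatile uint32_t *addr,
                                  uint32_t old_val);

/* Wait for the peer VM to publish a new request. */
static void wait_for_request(const volatile uint32_t *req_prod,
                             uint32_t *last_seen)
{
    while (*req_prod == *last_seen)
        HYPERVISOR_mwait_wait(req_prod, *last_seen);  /* spurious wakeups OK */

    *last_seen = *req_prod;   /* caller now processes the new request(s) */
}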
Tian, Kevin
2009-Apr-26 13:04 UTC
RE: [Xen-devel] Fast inter-VM signaling using monitor/mwait
> From: Michael Abd-El-Malek [mailto:mabdelmalek@cmu.edu]
> Sent: April 24, 2009 5:48
>
> On Apr 21, 2009, at 5:01 AM, Tian, Kevin wrote:
>
>>> From: Ian Pratt
>>> Sent: April 21, 2009 11:19
>>>
>>>> The mwait instruction is privileged, so I added a new hypercall that wraps access to the mwait instruction. Thus, my code has a Xen component (the new hypercall) and a guest kernel component (code for executing the hypercall and for turning off/on the timer interrupts around the hypercall). For this code to be merged into Xen, it would need security checks and a check that the processor supports the feature.
>>>
>>> I seem to recall that some newer CPUs have an mwait instruction accessible from ring 3, using a different opcode -- you might want to check this out.
>>>
>>> How do you deal with atomicity of the monitor and mwait? I.e., how do you stop the hypervisor pre-empting the VM and using monitor for its own purposes, or letting another guest use it?
>>
>> That's a real concern. To use monitor/mwait sanely, software must not introduce a voluntary context switch between them; but if that atomicity is ensured at the hypercall level, I'm not sure about the overall efficiency when multiple VMs are all active...
>
> I'm executing the monitor and mwait instructions together in the hypercall. The hypercall also takes an argument specifying the old value of the memory location. When the mwait instruction returns, the hypervisor can check and handle any interrupts. I currently return a continuation so that the mwait hypercall is re-executed at the end of handling interrupts. I haven't really thought about what happens if the VM gets scheduled out. These are the kinds of issues that I'd like to fix if the community wants to add this hypercall. For my benchmarking purposes, I'm not worrying about this :)

Maybe it's the reverse: you need to consider those issues to persuade the community, or else it will have very limited usage in the real world. This holds the CPU exclusively for an unknown time, unless you also ensure that the producer, which writes to the monitored address, is not scheduled out either -- which then further limits the actual benefit.

>>> Have you thought about HVM guests as well as PV?
>>
>> For HVM guests, both vmexit and vmentry clear any address-range monitoring in effect, and thus that won't work.
>
> I imagine this would cause the mwait instruction to return before a write occurs to the memory address? If so, the guest OS can check this (by comparing the memory address's value to the previously saved value) and re-execute the mwait hypercall. Users of mwait already have to check whether their terminating condition has occurred, since interrupts cause mwait to return.

Yes -- then why do you need monitor/mwait, compared to a simple loop checking the data directly? :-)

Thanks,
Kevin
Michael Abd-El-Malek
2009-May-05 14:28 UTC
Re: [Xen-devel] Fast inter-VM signaling using monitor/mwait
On Apr 26, 2009, at 9:04 AM, Tian, Kevin wrote:

>> I'm executing the monitor and mwait instructions together in the hypercall. The hypercall also takes an argument specifying the old value of the memory location. When the mwait instruction returns, the hypervisor can check and handle any interrupts. I currently return a continuation so that the mwait hypercall is re-executed at the end of handling interrupts. I haven't really thought about what happens if the VM gets scheduled out. These are the kinds of issues that I'd like to fix if the community wants to add this hypercall. For my benchmarking purposes, I'm not worrying about this :)
>
> Maybe it's the reverse: you need to consider those issues to persuade the community, or else it will have very limited usage in the real world. This holds the CPU exclusively for an unknown time, unless you also ensure that the producer, which writes to the monitored address, is not scheduled out either -- which then further limits the actual benefit.

Interrupts will cause the mwait instruction to return. So the same periodic timer interrupts that are used for VM scheduling will continue to be useful. The CPU is not held exclusively for unbounded time.

>>>> Have you thought about HVM guests as well as PV?
>>>
>>> For HVM guests, both vmexit and vmentry clear any address-range monitoring in effect, and thus that won't work.
>>
>> I imagine this would cause the mwait instruction to return before a write occurs to the memory address? If so, the guest OS can check this (by comparing the memory address's value to the previously saved value) and re-execute the mwait hypercall. Users of mwait already have to check whether their terminating condition has occurred, since interrupts cause mwait to return.
>
> Yes -- then why do you need monitor/mwait, compared to a simple loop checking the data directly? :-)

The simple spin-poll loop prevents the core from going into a low-energy mode. My motivation in using monitor/mwait is to get the latency of spin-poll but with the energy efficiency of Xen events (i.e., the CPU can go to sleep if the VM is waiting for a signal).

Cheers,
Mike
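For contrast, the plain spin-poll receive path being compared against might look like the sketch below (illustrative only): it achieves the sub-microsecond wakeup mentioned earlier in the thread, but the PAUSE hint only throttles the pipeline; the core never enters a low-power C-state while waiting.

#include <stdint.h>

static inline void cpu_relax(void)
{
    asm volatile("pause" ::: "memory");   /* spin-wait hint */
}

/* Busy-wait until the peer bumps the shared request counter. */
static void spin_wait_for_request(const volatile uint32_t *req_prod,
                                  uint32_t last_seen)
{
    while (*req_prod == last_seen)
        cpu_relax();
}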
Tian, Kevin
2009-May-06 06:22 UTC
RE: [Xen-devel] Fast inter-VM signaling using monitor/mwait
> From: Michael Abd-El-Malek [mailto:mabdelmalek@cmu.edu]
> Sent: May 5, 2009 22:29
>
>> Maybe it's the reverse: you need to consider those issues to persuade the community, or else it will have very limited usage in the real world. This holds the CPU exclusively for an unknown time, unless you also ensure that the producer, which writes to the monitored address, is not scheduled out either -- which then further limits the actual benefit.
>
> Interrupts will cause the mwait instruction to return. So the same periodic timer interrupts that are used for VM scheduling will continue to be useful. The CPU is not held exclusively for unbounded time.

In Xen, actual vcpu scheduling happens at the point before resuming back to the VM, not in the timer interrupt ISR. So as long as your monitor/mwait loop in the hypercall doesn't exit before the update is observed, scheduling won't happen.

>> Yes -- then why do you need monitor/mwait, compared to a simple loop checking the data directly? :-)
>
> The simple spin-poll loop prevents the core from going into a low-energy mode. My motivation in using monitor/mwait is to get the latency of spin-poll but with the energy efficiency of Xen events (i.e., the CPU can go to sleep if the VM is waiting for a signal).

That's obviously the wrong model. There could be other runnable threads within the VM. Here it's not "if the VM is waiting for a signal"; it's just "if one thread in the VM is waiting for a signal".

Thanks,
Kevin
Michael Abd-El-Malek
2009-May-06 14:38 UTC
Re: [Xen-devel] Fast inter-VM signaling using monitor/mwait
On May 6, 2009, at 2:22 AM, Tian, Kevin wrote:

>> Interrupts will cause the mwait instruction to return. So the same periodic timer interrupts that are used for VM scheduling will continue to be useful. The CPU is not held exclusively for unbounded time.
>
> In Xen, actual vcpu scheduling happens at the point before resuming back to the VM, not in the timer interrupt ISR. So as long as your monitor/mwait loop in the hypercall doesn't exit before the update is observed, scheduling won't happen.

I'm not an expert on Xen scheduling, so please correct my following understanding. For the credit scheduler, csched_tick sets the next timer interrupt. So after the mwait hypercall executes the mwait instruction and is waiting for a memory write, I observe the timer interrupt eventually causing the mwait instruction to return. The mwait hypercall can then run the scheduler.

>> The simple spin-poll loop prevents the core from going into a low-energy mode. My motivation in using monitor/mwait is to get the latency of spin-poll but with the energy efficiency of Xen events (i.e., the CPU can go to sleep if the VM is waiting for a signal).
>
> That's obviously the wrong model. There could be other runnable threads within the VM. Here it's not "if the VM is waiting for a signal"; it's just "if one thread in the VM is waiting for a signal".

Yes, the model is "if the VM's CPU is idle". In other words, if there are runnable threads, I don't need to interrupt the CPU. The reason is that I'm treating the VM as a "server VM" -- so if it's serving other requests, there's no need to interrupt it; it will check for new requests after finishing with the current request. I only want to signal the VM in case it's idle.

Cheers,
Mike
Keir Fraser
2009-May-06 15:05 UTC
Re: [Xen-devel] Fast inter-VM signaling using monitor/mwait
On 06/05/2009 15:38, "Michael Abd-El-Malek" <mabdelmalek@cmu.edu> wrote:

>> In Xen, actual vcpu scheduling happens at the point before resuming back to the VM, not in the timer interrupt ISR. So as long as your monitor/mwait loop in the hypercall doesn't exit before the update is observed, scheduling won't happen.
>
> I'm not an expert on Xen scheduling, so please correct my following understanding. For the credit scheduler, csched_tick sets the next timer interrupt. So after the mwait hypercall executes the mwait instruction and is waiting for a memory write, I observe the timer interrupt eventually causing the mwait instruction to return. The mwait hypercall can then run the scheduler.

The issue is, what if another VM's VCPUs are runnable? Xen should prefer to run those rather than pause the CPU on an MWAIT, right? But if it does that, it will lose the memory-access wakeup property of the MWAIT and cannot schedule the 'MWAIT'ing VM back in promptly when the relevant memory access occurs. I don't see that MWAIT can be used effectively in a guest idle loop unless you are happy to bin the work-conserving property of the Xen scheduler (e.g., by dedicating a physical CPU to the VM).

-- Keir
Michael Abd-El-Malek
2009-May-07 16:25 UTC
Re: [Xen-devel] Fast inter-VM signaling using monitor/mwait
On May 6, 2009, at 11:05 AM, Keir Fraser wrote:

> The issue is, what if another VM's VCPUs are runnable? Xen should prefer to run those rather than pause the CPU on an MWAIT, right? But if it does that, it will lose the memory-access wakeup property of the MWAIT and cannot schedule the 'MWAIT'ing VM back in promptly when the relevant memory access occurs. I don't see that MWAIT can be used effectively in a guest idle loop unless you are happy to bin the work-conserving property of the Xen scheduler (e.g., by dedicating a physical CPU to the VM).

I see the issue -- thanks for explaining this. Yes, I have dedicated a physical CPU to my VM, which does throw away the work-conserving property.

As a solution, what if the source VM continues to use the "send event" hypercall to signal the destination VM, but we modify the "send event" hypercall as follows: if the destination VM is currently executing _and_ is blocked in an mwait hypercall, then we simply write to its monitored memory address; otherwise, we fall back to the existing event channel mechanism. How does that sound?

Thanks,
Mike
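A sketch of what that hybrid notification could look like on the hypervisor's send path, with hypothetical per-destination state (blocked_in_mwait, mwait_addr) and an assumed existing event-channel send routine standing in for the real slow path; none of this is actual Xen interface code.

#include <stdint.h>
#include <stdbool.h>

/* Per-destination notification state, tracked by the hypervisor. */
struct notify_target {
    volatile uint32_t *mwait_addr;   /* word the destination is MWAITing on */
    bool blocked_in_mwait;           /* set/cleared by the mwait hypercall */
    int evtchn_port;                 /* fallback event channel */
};

extern void event_channel_send(int port);   /* assumed existing slow path */

static void notify_destination(struct notify_target *dst)
{
    if (dst->blocked_in_mwait) {
        /* Fast path: the store itself wakes the destination's MWAIT. */
        (*dst->mwait_addr)++;
    } else {
        /* Slow path: deliver a normal event-channel notification. */
        event_channel_send(dst->evtchn_port);
    }
}

The attraction of this split is that the sender keeps a single interface (the "send event" hypercall), while the hypervisor transparently picks the cheap wakeup whenever the receiver is parked in the mwait hypercall and falls back to the work-conserving event-channel path otherwise.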