Tristan Gingold
2007-Feb-22 05:23 UTC
[Xen-devel] Improving hvm IO performance by using self IO emulator (YA io-emu?)
Summary: I am proposing a new method to improve hvm IO emulation: the IO requests are reflected to the domain firmware, which emulates the IO using PV drivers. The pros of this method are minor hypervisor modifications, a smooth transition, performance improvement and convergence with the PV model.

Discussion:

The current IO emulator (the ioemu process in dom-0) is a well-known bottleneck for hvm performance because the IO request's path is long and crosses many rings.

Many ideas to improve the emulation have been proposed. None of them have been adopted because their approaches are too disruptive.

Based on my recent firmware experience I'd like to propose a new method.

The principle is rather simple: the hvm domain does all the work. IO requests are simply reflected to the domain. When the hypervisor decodes an IO request it sends it to the domain using an SMI(x86)/PMI(ia64)-like interruption. This reflection saves some registers, puts the parameters (the IO req) into registers and calls the firmware at a defined address in a defined mode (physical mode should be the best). The firmware handles the IO request like ioemu does, but uses PV drivers (net, blk, fb...) to access external resources. It then resumes the domain execution through a hypercall which restores registers and mode.

I think there are many pros to this approach:

* the changes in the hypervisor are rather small: only the code to do the reflection has to be added. This is a well-known and light mechanism.

* the transition can be smooth: this new method can co-exist in several ways with the current method. First it can be used only when enabled. Then, once the reflection code is added to the hypervisor, the firmware can simply forward the IO request to ioemu as the hypervisor already does. The in-domain IO emulation can then be added driver by driver (eg: IDE disk first, then network, then fb). This smooth transition is a major advantage for evaluating this new method early.

* Because all the emulation work is done in the domain, the work is accounted to this domain and not to another domain (dom0 today). This is good for management and for security.

* From the hypervisor point of view such an hvm domain looks like a PV domain: only the creation differs. This IO emulation method unifies the domains. This will simplify save & restore and Xen in general.

* Performance should be improved compared to the current io emulation method: the IO request travel is shorter. If we want to work on performance we could later handle some IO requests directly in the hypervisor (I think of ports or iomem which don't have side-effects).

I don't see a lot of cons; the major one is 'porting' ioemu code to firmware code. This is the challenge. But qemu seems to be well structured. Most of the files might be ported without changes; the core has of course to be rewritten. The PV drivers should also be ported.

SMP can first be handled with a global lock and later concurrent accesses may be allowed. This may improve performance compared to ioemu, which is almost single threaded.

I don't know yet how to use the PV-on-HVM drivers. There is currently only one page to communicate with xenstore. We can try to share this page between the firmware and the PV-on-HVM drivers, or we may create a second page.

I have thought of this new IO emulation method during my work on EFI gfw for ia64. Recently I have looked more deeply into the sources. I can't see any stopper yet. Unless someone has a strong point against this method I hope I will be able to work on it shortly (ia64 first - sorry!)
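To make the reflection interface more concrete, here is a minimal sketch of the entry point I have in mind. Everything below (structure layout, names, the resume call) is purely illustrative and is not existing Xen code:

    #include <stdint.h>

    /* Hypothetical IO request handed to the in-guest firmware by the
     * hypervisor when an IO access traps. */
    struct fw_ioreq {
        uint64_t addr;     /* port number or iomem address       */
        uint64_t data;     /* value for writes, result for reads */
        uint32_t size;     /* access size in bytes               */
        uint32_t dir;      /* 0 = read, 1 = write                */
        uint32_t is_mmio;  /* 0 = port IO, 1 = memory-mapped IO  */
    };

    /* Device models and the resume hypercall are assumed helpers. */
    void handle_portio(struct fw_ioreq *req);  /* IDE, RTC, serial, ... */
    void handle_mmio(struct fw_ioreq *req);    /* lapic, fb, ...        */
    void fw_resume(struct fw_ioreq *req);      /* restore registers and
                                                  mode, re-enter guest  */

    /* Called by the hypervisor, in physical mode, after it has saved
     * the interrupted guest context and filled in *req. */
    void fw_io_entry(struct fw_ioreq *req)
    {
        if (req->is_mmio)
            handle_mmio(req);
        else
            handle_portio(req);
        fw_resume(req);
    }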
Comments are *very* welcome.

Tristan.
Keir Fraser
2007-Feb-22 07:59 UTC
Re: [Xen-devel] Improving hvm IO performance by using self IO emulator (YA io-emu?)
On 22/2/07 05:23, "Tristan Gingold" <tgingold@free.fr> wrote:

> The current IO emulator (the ioemu process in dom-0) is a well-known bottleneck
> for hvm performance because the IO request's path is long and crosses many rings.
>
> Many ideas to improve the emulation have been proposed. None of them have
> been adopted because their approaches are too disruptive.
>
> Based on my recent firmware experience I'd like to propose a new method.

This sounds plausible. It probably depends on what kind of 'firmware' environment you plan to drop the ioemu code into? The general idea of emulated devices looking to the control stack like PV I/O is one that we want for x86 as well. So any xend changes to that effect are welcome.

> * From the hypervisor point of view such an hvm domain looks like a PV domain:
> only the creation differs. This IO emulation method unifies the domains. This
> will simplify save & restore and Xen in general.

I don't know the specifics of ia64 VTi, but I'd expect that Xen will still need to be aware of VTi? I'd be surprised if the differences can be hidden safely and efficiently. The model you propose sounds much more to me like a VTi (non-PV) domain with PV extensions in an extended firmware module.

 -- Keir
tgingold@free.fr
2007-Feb-22 09:33 UTC
Re: [Xen-devel] Improving hvm IO performance by using self IO emulator (YA io-emu?)
Quoting Keir Fraser <Keir.Fraser@cl.cam.ac.uk>:

> On 22/2/07 05:23, "Tristan Gingold" <tgingold@free.fr> wrote:
>
> > The current IO emulator (the ioemu process in dom-0) is a well-known bottleneck
> > for hvm performance because the IO request's path is long and crosses many rings.
> >
> > Many ideas to improve the emulation have been proposed. None of them have
> > been adopted because their approaches are too disruptive.
> >
> > Based on my recent firmware experience I'd like to propose a new method.
>
> This sounds plausible. It probably depends on what kind of 'firmware'
> environment you plan to drop the ioemu code into? The general idea of
> emulated devices looking to the control stack like PV I/O is one that we
> want for x86 as well.

Yes, that's the idea.

> So any xend changes to that effect are welcome.

> > * From the hypervisor point of view such an hvm domain looks like a PV domain:
> > only the creation differs. This IO emulation method unifies the domains. This
> > will simplify save & restore and Xen in general.
>
> I don't know the specifics of ia64 VTi, but I'd expect that Xen will still
> need to be aware of VTi?

Sure.

> I'd be surprised if the differences can be hidden
> safely and efficiently.

If we can get rid of the ioemu process, the differences between hvm and PV will be small, won't they?

> The model you propose sounds much more to me like a
> VTi (non-PV) domain with PV extensions in an extended firmware module.

Yes, but this model should work with the ioemu process.

Tristan.
Keir Fraser
2007-Feb-22 10:23 UTC
Re: [Xen-devel] Improving hvm IO performance by using self IO emulator (YA io-emu?)
On 22/2/07 09:33, "tgingold@free.fr" <tgingold@free.fr> wrote:

>> I don't know the specifics of ia64 VTi, but I'd expect that Xen will still
>> need to be aware of VTi?
>
> Sure.
>
>> I'd be surprised if the differences can be hidden
>> safely and efficiently.
>
> If we can get rid of the ioemu process, the differences between hvm and PV
> will be small, won't they?

From the perspective of dom0, yes. From the perspective of Xen, maybe not so much. :-)

 -- Keir
Guy Zana
2007-Feb-22 10:34 UTC
RE: [Xen-devel] Improving hvm IO performance by using self IO emulator (YA io-emu?)
Are you suggesting writing EFI PV drivers? This method sounds very promising, but there are some limitations: Windows Vista 32-bit does not support EFI (so I've read). It's like PV-on-HVM, but it also eliminates the need to install the regular PV-on-HVM drivers.

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Tristan Gingold
> Sent: Thursday, February 22, 2007 7:23 AM
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] Improving hvm IO performance by using self IO emulator (YA io-emu?)
>
> [...]
Anthony Liguori
2007-Feb-22 16:06 UTC
[Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
Hi Tristan,

Thanks for posting this.

Tristan Gingold wrote:
> Summary: I am proposing a new method to improve hvm IO emulation: the IO
> requests are reflected to the domain firmware, which emulates the IO using PV
> drivers. The pros of this method are minor hypervisor modifications, a smooth
> transition, performance improvement and convergence with the PV model.
>
> Discussion:
>
> The current IO emulator (the ioemu process in dom-0) is a well-known bottleneck
> for hvm performance because the IO request's path is long and crosses many rings.

I'm not quite sure that I agree this is the bottleneck. If IO latency were the problem, then a major reduction in IO latency ought to significantly improve performance, right?

KVM has a pretty much optimal path from the kernel to userspace. The overhead of going to userspace is roughly two syscalls (and we've measured this overhead). Yet it makes almost no difference in IO throughput.

The big problem with disk emulation isn't IO latency, but the fact that the IDE emulation can only have one outstanding request at a time. The SCSI emulation helps this a lot.

I don't know what the bottleneck is in network emulation, but I suspect the number of copies we have in the path has a great deal to do with it.

[...]

> I don't see a lot of cons; the major one is 'porting' ioemu code to
> firmware code. This is the challenge. But qemu seems to be well structured.
> Most of the files might be ported without changes; the core has of course to
> be rewritten. The PV drivers should also be ported.
>
> SMP can first be handled with a global lock and later concurrent accesses may
> be allowed. This may improve performance compared to ioemu, which is almost
> single threaded.

There's a lot to like about this sort of approach. It's not a silver bullet wrt performance but I think the model is elegant in many ways. An interesting place to start would be lapic/pit emulation. Removing this code from the hypervisor would be pretty useful and there is no need to address PV-on-HVM issues.

Can you provide more details on how the reflecting works? Have you measured the cost of reflection? Do you just set up a page table that maps physical memory 1-1 and then reenter the guest?

Does the firmware get loaded as an option ROM or is it a special portion of guest memory that isn't normally reachable?

Regards,

Anthony Liguori
tgingold@free.fr
2007-Feb-22 20:58 UTC
[Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
Selon Anthony Liguori <aliguori@us.ibm.com>:

> Hi Tristan,
>
> Thanks for posting this.

[...]

> I'm not quite sure that I agree this is the bottleneck. If IO latency
> were the problem, then a major reduction in IO latency ought to
> significantly improve performance, right?

Sure. It is interesting to note you don't agree; this appeared so obvious to me. Maybe I should do measurements first and think only afterwards :-)

> KVM has a pretty much optimal path from the kernel to userspace. The
> overhead of going to userspace is roughly two syscalls (and we've
> measured this overhead). Yet it makes almost no difference in IO
> throughput.

The path can be split into 2 parts: from trap to ioemu and from ioemu to real hardware (the return is the same). ioemu to hardware should be roughly the same with KVM and Xen. Is trap to ioemu that different between Xen and KVM?

Honestly I don't know. Does anyone have figures?

It would be interesting to compare disk (or net) performance between:
* linux
* dom0
* driver domain
* PV-on-HVM drivers
* ioemu

Does such a comparison exist?

> The big problem with disk emulation isn't IO latency, but the fact that
> the IDE emulation can only have one outstanding request at a time. The
> SCSI emulation helps this a lot.

IIRC, a real IDE can only have one outstanding request too (this may have changed with AHCI). This is really IIRC :-(

BTW, on ia64 there is no REP IN/OUT. When Windows uses IDE in PIO mode (during install and crash dump), performance is horrible. There is a patch which adds special handling for PIO mode and really improves the data rate.

> I don't know what the bottleneck is in network emulation, but I suspect
> the number of copies we have in the path has a great deal to do with it.

This reason seems obvious.

[...]

> There's a lot to like about this sort of approach. It's not a silver
> bullet wrt performance but I think the model is elegant in many ways.
> An interesting place to start would be lapic/pit emulation. Removing
> this code from the hypervisor would be pretty useful and there is no
> need to address PV-on-HVM issues.

Indeed this is the simplest code to move. But why would it be useful?

> Can you provide more details on how the reflecting works? Have you
> measured the cost of reflection? Do you just set up a page table that
> maps physical memory 1-1 and then reenter the guest?

Yes: disable PG, set up flat mode and reenter the guest.
Cost not yet measured!

> Does the firmware get loaded as an option ROM or is it a special portion
> of guest memory that isn't normally reachable?

IMHO it should come with hvmloader. No need to make it unreachable.

Tristan.
Anthony Liguori
2007-Feb-22 21:23 UTC
[Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
tgingold@free.fr wrote:

>> KVM has a pretty much optimal path from the kernel to userspace. The
>> overhead of going to userspace is roughly two syscalls (and we've
>> measured this overhead). Yet it makes almost no difference in IO
>> throughput.
>
> The path can be split into 2 parts: from trap to ioemu and from ioemu to
> real hardware (the return is the same). ioemu to hardware should be roughly
> the same with KVM and Xen. Is trap to ioemu that different between Xen and
> KVM?

Yup. With KVM, there is no scheduler involvement. qemu does a blocking ioctl to the Linux kernel, and the Linux kernel does a vmrun. Provided the time slice hasn't been exhausted, Linux returns directly to qemu after a vmexit.

Xen uses event channels, which involve domain switches and select()'ing. A lot of the time, the path is pretty optimal. However, quite a bit of the time, you run into worst case scenarios with the various schedulers and the latency skyrockets.

> Honestly I don't know. Does anyone have figures?

Yeah, it varies a lot on different hardware. For reference:

if a round trip to a null int80 syscall is 150 nsec, a round trip vmexit to userspace in KVM may be 2500 nsec. On bare metal, it may cost 1700 nsec to do a PIO operation to an IDE port, so 2500 really isn't that bad.

Xen is usually around there too, but every so often it spikes to something awful (100ks of nsecs) and that skews the average cost.

> It would be interesting to compare disk (or net) performance between:
> * linux
> * dom0
> * driver domain
> * PV-on-HVM drivers
> * ioemu
>
> Does such a comparison exist?

Not that I know of. I've done a lot of benchmarking, but not of PV-on-HVM.

Xen can typically get pretty close to native for disk IO.

>> The big problem with disk emulation isn't IO latency, but the fact that
>> the IDE emulation can only have one outstanding request at a time. The
>> SCSI emulation helps this a lot.
>
> IIRC, a real IDE can only have one outstanding request too (this may have
> changed with AHCI). This is really IIRC :-(

You recall correctly. IDE can only have one type of outstanding DMA request.

> BTW, on ia64 there is no REP IN/OUT. When Windows uses IDE in PIO mode (during
> install and crash dump), performance is horrible. There is a patch which
> adds special handling for PIO mode and really improves the data rate.

Ouch :-( Fortunately, OSes won't use PIO very often.

>> I don't know what the bottleneck is in network emulation, but I suspect
>> the number of copies we have in the path has a great deal to do with it.
>
> This reason seems obvious.
>
> [...]
>
>> There's a lot to like about this sort of approach. It's not a silver
>> bullet wrt performance but I think the model is elegant in many ways.
>> An interesting place to start would be lapic/pit emulation. Removing
>> this code from the hypervisor would be pretty useful and there is no
>> need to address PV-on-HVM issues.
>
> Indeed this is the simplest code to move. But why would it be useful?

Removing code from the hypervisor reduces the TCB, so it's a win. Having it in firmware within the HVM domain is even better than having it in dom0, wrt the TCB.

>> Can you provide more details on how the reflecting works? Have you
>> measured the cost of reflection? Do you just set up a page table that
>> maps physical memory 1-1 and then reenter the guest?
>
> Yes: disable PG, set up flat mode and reenter the guest.
> Cost not yet measured!

That would be very useful to measure. My chief concern would be that disabling PG would be considerably more costly than entering with paging enabled. That may not be the case on VT today since there are no ASIDs, so it would be useful to test on SVM too.

>> Does the firmware get loaded as an option ROM or is it a special portion
>> of guest memory that isn't normally reachable?
>
> IMHO it should come with hvmloader. No need to make it unreachable.

It would be nice to get rid of hvmloader in the long term IMHO. Any initialization should be done in the BIOS.

Regards,

Anthony Liguori
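P.S. For reference, the kind of harness one would use to get a number like the 150 nsec null-syscall figure is just a rdtsc loop around a cheap syscall. A minimal sketch (the iteration count and the choice of getpid are arbitrary, and cycles still need converting to nsecs with the TSC frequency):

    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    /* Read the x86 timestamp counter. */
    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    int main(void)
    {
        const int iters = 1000000;
        int i;
        uint64_t start, end;

        start = rdtsc();
        for (i = 0; i < iters; i++)
            syscall(SYS_getpid);          /* cheap kernel round trip */
        end = rdtsc();

        printf("%.1f cycles per syscall round trip\n",
               (double)(end - start) / iters);
        return 0;
    }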
Mark Williamson
2007-Feb-22 21:24 UTC
Re: [Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
> > The big problem with disk emulation isn't IO latency, but the fact that
> > the IDE emulation can only have one outstanding request at a time. The
> > SCSI emulation helps this a lot.
>
> IIRC, a real IDE can only have one outstanding request too (this may have
> changed with AHCI). This is really IIRC :-(

Can SATA drives queue multiple outstanding requests? Thought some newer rev could, but I may well be misremembering - in any case we'd want something that was well supported.

> > I don't know what the bottleneck is in network emulation, but I suspect
> > the number of copies we have in the path has a great deal to do with it.
>
> This reason seems obvious.

Latency may matter more to the network performance than it did to block, actually (especially given our current setup is fairly pessimal wrt latency!). It would be interesting to see how much difference this makes.

In any case, copies are bad too :-) Presumably, hooking directly into the paravirt network channel would improve this situation too.

Perhaps the network device ought to be the first to move?

> > There's a lot to like about this sort of approach. It's not a silver
> > bullet wrt performance but I think the model is elegant in many ways.
> > An interesting place to start would be lapic/pit emulation. Removing
> > this code from the hypervisor would be pretty useful and there is no
> > need to address PV-on-HVM issues.
>
> Indeed this is the simplest code to move. But why would it be useful?

It might be a good proof of concept, and it simplifies the hypervisor (and the migration / suspend process) at the same time.

> > Does the firmware get loaded as an option ROM or is it a special portion
> > of guest memory that isn't normally reachable?
>
> IMHO it should come with hvmloader. No need to make it unreachable.

Mmmm. It's not like the guest can break security if it tampers with the device models in its own memory space.

Question: how does this compare with using a "stub domain" to run the device models? The previous proposed approach was to automatically switch to the stub domain on trapping an IO by the HVM guest, and have that stub domain run the device models, etc.

You seem to be actually proposing running the code within the HVM guest itself. The two approaches aren't actually that different, IMO, since the guest still effectively has two different execution contexts. It does seem to me that running within the HVM guest itself might be more flexible.

A cool little trick that this strategy could enable is to run a full Qemu instruction emulator within the device model - I'd imagine this could be useful on IA64, for instance, in order to provide support for running legacy OSes (e.g. for x86, or *cough* PPC ;-))

Cheers,
Mark

--
Dave: Just a question. What use is a unicyle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!
Anthony Liguori
2007-Feb-22 21:33 UTC
Re: [Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
Mark Williamson wrote:

[...]

> Can SATA drives queue multiple outstanding requests? Thought some newer rev
> could, but I may well be misremembering - in any case we'd want something
> that was well supported.

SATA can, yes. However, as you mention, SATA is very poorly supported.

The LSI scsi adapter seems to work quite nicely with Windows and Linux. And it supports TCQ. And it's already implemented :-) Can't really beat that :-)

> Perhaps the network device ought to be the first to move?

Can't say. I haven't done much research on network performance.

[...]

> Mmmm. It's not like the guest can break security if it tampers with the
> device models in its own memory space.
>
> Question: how does this compare with using a "stub domain" to run the device
> models? The previous proposed approach was to automatically switch to the
> stub domain on trapping an IO by the HVM guest, and have that stub domain run
> the device models, etc.

Reflecting is a bit more expensive than doing a stub domain. There is no way to wire up the VMEXITs to go directly into the guest, so you're always going to have to pay the cost of going from guest => host => guest => host => guest for every PIO. The guest is incapable of reenabling PG on its own, hence the extra host => guest transition.

Compare to a stub domain where, if done correctly, you can go from guest => host/0 => host/3 => host/0 => guest. The question would be: is host/0 => host/3 => host/0 fundamentally faster than host => guest => host?

I know that guest => host => guest typically costs *at least* 1000 nsecs on SVM. A null sysenter syscall (that's host/3 => host/0 => host/3) is roughly 75 nsecs.

So my expectation is that stub domain can actually be made to be faster than reflecting.

Regards,

Anthony Liguori
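P.S. Back-of-the-envelope, treating the 1000 nsec and 75 nsec figures above as representative (which is of course an assumption):

    reflection:   guest => host => guest => host => guest
                  = 2 full VM round trips             ~ 2 x 1000 = 2000 nsec per PIO

    stub domain:  guest => host/0 => host/3 => host/0 => guest
                  = 1 VM round trip + 1 ring 0/3 trip ~ 1000 + 75 = 1075 nsec per PIO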
Mark Williamson
2007-Feb-22 21:41 UTC
Re: [Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
While I'm thinking about it, I wonder how returning to the guest from the emulator would work...

We'd want to hypercall to transfer back to it... do we need specific Xen support for this or could (for instance) Gerd's work on domU kexec be leveraged here?

Perhaps it would be worth evaluating some kind of "send these events and then switch back to guest code" hypercall so that the emulator doesn't have to bounce in and out of Xen so much. Remains to be seen whether this makes much difference to overall performance, but it seems somehow civilised ;-)

Cheers,
Mark
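P.S. The sort of shape I mean, purely as an illustration - none of these names exist in Xen today:

    /* Hypothetical batched "notify and resume" hypercall: one trap into
     * Xen delivers the queued event-channel notifications and then
     * switches straight back to the interrupted guest context. */
    struct resume_with_events {
        unsigned int nr_ports;    /* number of event channels to notify */
        unsigned int ports[8];    /* ports to signal before resuming    */
    };

    long HYPERVISOR_resume_with_events(struct resume_with_events *args);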
Anthony Liguori
2007-Feb-23 00:12 UTC
Re: [Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
Alan wrote:

>> SATA can, yes. However, as you mention, SATA is very poorly supported.
>
> By what - it works very nicely in current Linux kernels,

But it isn't supported by older kernels and most versions of Windows. A major use of virtualization is running older operating systems, so depending on newer kernels is not really an option (if we have a new kernel, we'd prefer to use a paravirtual driver anyway).

> including AHCI
> with NCQ and multiple outstanding commands. The fact Xen isn't merged
> and is living in prehistory is I'm afraid a Xen problem.

This discussion is independent of Xen. It's equally applicable to KVM and QEMU, so please don't assume this has anything to do with Xen's merge status.

Regards,

Anthony Liguori
Mark Williamson
2007-Feb-23 00:15 UTC
Re: [Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
> > Can SATA drives queue multiple outstanding requests? Thought some newer
> > rev could, but I may well be misremembering - in any case we'd want
> > something that was well supported.
>
> SATA can, yes. However, as you mention, SATA is very poorly supported.
>
> The LSI scsi adapter seems to work quite nicely with Windows and Linux.
> And it supports TCQ. And it's already implemented :-) Can't really
> beat that :-)

LSI wins :-) Supporting TCQ is cool too (but can we actually leverage that through the PV interface?)

> > Perhaps the network device ought to be the first to move?
>
> Can't say. I haven't done much research on network performance.

Network was the hard device to virtualise anyway, so I suspect efficiency may matter more here... although we'd have to test whether it was significant compared to other factors (is the device we're emulating at least well suited to efficient batching behaviour, or should we be looking at that too?)

> Reflecting is a bit more expensive than doing a stub domain. There is
> no way to wire up the VMEXITs to go directly into the guest, so you're
> always going to have to pay the cost of going from guest => host =>
> guest => host => guest for every PIO. The guest is incapable of
> reenabling PG on its own, hence the extra host => guest transition.

VMEXITs still go to ring 0 though, right? So you still need the ring transition into the guest and back? What you wouldn't need if leveraging HVM is the pagetable switch - although I don't know if this is the case for VT-i, which is somewhat different to VT-x in design.

> I know that guest => host => guest typically costs *at least* 1000 nsecs
> on SVM. A null sysenter syscall (that's host/3 => host/0 => host/3) is
> roughly 75 nsecs.
>
> So my expectation is that stub domain can actually be made to be faster
> than reflecting.

Interesting. The code should be fairly common to both though, so maybe we can do a bakeoff!

Cheers,
Mark
Alan
2007-Feb-23 00:26 UTC
Re: [Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
> SATA can, yes. However, as you mention, SATA is very poorly supported.

By what - it works very nicely in current Linux kernels, including AHCI with NCQ and multiple outstanding commands. The fact Xen isn't merged and is living in prehistory is, I'm afraid, a Xen problem.
Alan
2007-Feb-23 00:32 UTC
Re: [Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
> Can SATA drives queue multiple outstanding requests? Thought some newer rev
> could, but I may well be misremembering - in any case we'd want something
> that was well supported.

Most SATA drives support NCQ, which is sensible queuing and all the rest of it. Some early ones get it badly wrong. You need a controller interface that handles it and a device that handles it. Devices pretending to be PATA controllers lack the brains, but AHCI is a current Intel standard used by many vendors for this role and intended to replace SFF-style IDE.

Alan
Alan
2007-Feb-23 12:57 UTC
Re: [Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
> But it isn't supported by older kernels and most versions of Windows.

Linux 2.4.x AHCI drivers exist. Windows 95/98 are lacking them, as is NT, that much is true, but Win2K and later support AHCI. AHCI is also very nice from a virtualisation point of view, as you get commands in queues and you can batch them up sensibly.

For older Windows there is the ADMA interface, which is saner to emulate than SFF but not very sane.

> This discussion is independent of Xen. It's equally applicable to KVM
> and QEMU so please don't assume this has anything to do with Xen's merge
> status.

Don't even get me started on qemu. The qemu "emulation" of ATAPI is a good reason to use anything else as an interface.

Alan
Anthony Liguori
2007-Feb-23 18:56 UTC
Re: [Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
Alan wrote:

>> But it isn't supported by older kernels and most versions of Windows.
>
> Linux 2.4.x AHCI drivers exist. Windows 95/98 are lacking them, as is NT,
> that much is true, but Win2K and later support AHCI. AHCI is also very
> nice from a virtualisation point of view, as you get commands in queues
> and you can batch them up sensibly.
>
> For older Windows there is the ADMA interface, which is saner to emulate
> than SFF but not very sane.
>
>> This discussion is independent of Xen. It's equally applicable to KVM
>> and QEMU so please don't assume this has anything to do with Xen's merge
>> status.
>
> Don't even get me started on qemu. The qemu "emulation" of ATAPI is a good
> reason to use anything else as an interface.

Feel free to submit patches.

Regards,

Anthony Liguori
Tristan Gingold
2007-Feb-24 06:07 UTC
[Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
On Thu, Feb 22, 2007 at 03:23:03PM -0600, Anthony Liguori wrote:
> tgingold@free.fr wrote:

[... overhead ...]

> Yup. With KVM, there is no scheduler involvement. qemu does a blocking
> ioctl to the Linux kernel, and the Linux kernel does a vmrun. Provided
> the time slice hasn't been exhausted, Linux returns directly to qemu
> after a vmexit.

Ok, thank you for the details.

> Xen uses event channels, which involve domain switches and
> select()'ing. A lot of the time, the path is pretty optimal. However,
> quite a bit of the time, you run into worst case scenarios with the
> various schedulers and the latency skyrockets.
>
> > Honestly I don't know. Does anyone have figures?
>
> Yeah, it varies a lot on different hardware. For reference:
>
> if a round trip to a null int80 syscall is 150 nsec, a round trip vmexit
> to userspace in KVM may be 2500 nsec. On bare metal, it may cost 1700
> nsec to do a PIO operation to an IDE port, so 2500 really isn't that bad.
>
> Xen is usually around there too, but every so often it spikes to
> something awful (100ks of nsecs) and that skews the average cost.

That explains the latency.

[...]

> >> The big problem with disk emulation isn't IO latency, but the fact that
> >> the IDE emulation can only have one outstanding request at a time. The
> >> SCSI emulation helps this a lot.
> >
> > IIRC, a real IDE can only have one outstanding request too (this may have
> > changed with AHCI). This is really IIRC :-(
>
> You recall correctly. IDE can only have one type of outstanding DMA
> request.

So there is something I do not understand: KVM IDE accesses are almost as fast as bare metal (2500 ns vs 1700 ns). Is KVM IO performance awful compared to bare metal? If so, why?

[...]

> Removing code from the hypervisor reduces the TCB, so it's a win. Having
> it in firmware within the HVM domain is even better than having it in
> dom0, wrt the TCB.

Ok.

> >> Can you provide more details on how the reflecting works? Have you
> >> measured the cost of reflection? Do you just set up a page table that
> >> maps physical memory 1-1 and then reenter the guest?
> >
> > Yes: disable PG, set up flat mode and reenter the guest.
> > Cost not yet measured!
>
> That would be very useful to measure. My chief concern would be that
> disabling PG would be considerably more costly than entering with paging
> enabled. That may not be the case on VT today since there are no ASIDs,
> so it would be useful to test on SVM too.

Switching to physical mode shouldn't be slow on ia64 (sorry, I am more familiar with Xen/ia64). Anyway, this is a detail.

> >> Does the firmware get loaded as an option ROM or is it a special portion
> >> of guest memory that isn't normally reachable?
> >
> > IMHO it should come with hvmloader. No need to make it unreachable.
>
> It would be nice to get rid of hvmloader in the long term IMHO. Any
> initialization should be done in the BIOS.

Again, I am not very familiar with hvmloader, and these are implementation details IMHO.

Tristan.
Tristan Gingold
2007-Feb-24 06:12 UTC
Re: [Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
On Thu, Feb 22, 2007 at 09:24:15PM +0000, Mark Williamson wrote:

[...]

> Perhaps the network device ought to be the first to move?

I think I will start with the most simple device :-)

> > > Does the firmware get loaded as an option ROM or is it a special portion
> > > of guest memory that isn't normally reachable?
> >
> > IMHO it should come with hvmloader. No need to make it unreachable.
>
> Mmmm. It's not like the guest can break security if it tampers with the
> device models in its own memory space.

[Maybe I don't catch all the english here]
How can the guest break security WRT a usual PV domain?

> Question: how does this compare with using a "stub domain" to run the device
> models? The previous proposed approach was to automatically switch to the
> stub domain on trapping an IO by the HVM guest, and have that stub domain run
> the device models, etc.

Is there a partial/full implementation of stub domains?
The pro of the firmware approach compared to a stub domain is the easy way to do it: it doesn't require a lot of modifications in the HV.

> You seem to be actually proposing running the code within the HVM guest
> itself. The two approaches aren't actually that different, IMO, since the
> guest still effectively has two different execution contexts. It does seem
> to me that running within the HVM guest itself might be more flexible.

I fully agree.

> A cool little trick that this strategy could enable is to run a full Qemu
> instruction emulator within the device model - I'd imagine this could be
> useful on IA64, for instance, in order to provide support for running legacy
> OSes (e.g. for x86, or *cough* PPC ;-))

That's something I'd like to have too.

Tristan.
Tristan Gingold
2007-Feb-24 06:17 UTC
Re: [Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
On Thu, Feb 22, 2007 at 03:33:21PM -0600, Anthony Liguori wrote:
> Mark Williamson wrote:

[...]

> > Mmmm. It's not like the guest can break security if it tampers with the
> > device models in its own memory space.
> >
> > Question: how does this compare with using a "stub domain" to run the
> > device models? The previous proposed approach was to automatically switch
> > to the stub domain on trapping an IO by the HVM guest, and have that stub
> > domain run the device models, etc.
>
> Reflecting is a bit more expensive than doing a stub domain. There is
> no way to wire up the VMEXITs to go directly into the guest, so you're
> always going to have to pay the cost of going from guest => host =>
> guest => host => guest for every PIO. The guest is incapable of
> reenabling PG on its own, hence the extra host => guest transition.
>
> Compare to a stub domain where, if done correctly, you can go from guest
> => host/0 => host/3 => host/0 => guest. The question would be: is
> host/0 => host/3 => host/0 fundamentally faster than host => guest => host?
>
> I know that guest => host => guest typically costs *at least* 1000 nsecs
> on SVM. A null sysenter syscall (that's host/3 => host/0 => host/3) is
> roughly 75 nsecs.
>
> So my expectation is that stub domain can actually be made to be faster
> than reflecting.

Ok. Unfortunately I don't have the figures for ia64.

With the firmware approach, strictly speaking, we don't need to re-enter guest mode during the reflection. That would be very like a stub domain. [I really have to look at the stub-domain implementation, if there is one.]

Tristan.
Tristan Gingold
2007-Feb-24 06:19 UTC
Re: [Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
On Thu, Feb 22, 2007 at 09:41:20PM +0000, Mark Williamson wrote:
> While I'm thinking about it, I wonder how returning to the guest from the
> emulator would work...
>
> We'd want to hypercall to transfer back to it... do we need specific Xen
> support for this or could (for instance) Gerd's work on domU kexec be
> leveraged here?
>
> Perhaps it would be worth evaluating some kind of "send these events and then
> switch back to guest code" hypercall so that the emulator doesn't have to
> bounce in and out of Xen so much. Remains to be seen whether this makes much
> difference to overall performance, but it seems somehow civilised ;-)

For sure, there are a lot of possible minor optimization points...

Tristan.
Mark Williamson
2007-Feb-27 12:14 UTC
Re: [Xen-devel] Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
> > Perhaps the network device ought to be the first to move?
>
> I think I will start with the most simple device :-)

Good plan :-)

> > > IMHO it should come with hvmloader. No need to make it unreachable.
> >
> > Mmmm. It's not like the guest can break security if it tampers with the
> > device models in its own memory space.
>
> [Maybe I don't catch all the english here]
> How can the guest break security WRT a usual PV domain?

It can't - I just meant that it's no worse having the emulators in the domain itself than having a paravirtualised domain. It doesn't imply an increase in trust, so there's no particular reason not to put emulators in the guest.

> > Question: how does this compare with using a "stub domain" to run the
> > device models? The previous proposed approach was to automatically
> > switch to the stub domain on trapping an IO by the HVM guest, and have
> > that stub domain run the device models, etc.
>
> Is there a partial/full implementation of stub domains?
> The pro of the firmware approach compared to a stub domain is the easy way to
> do it: it doesn't require a lot of modifications in the HV.

I believe some folks are working on this, but I'm not sure there's a "proper" stub domain with emulators linked to mini-os (as per the original plan) yet.

It's nice that modification isn't required - I suspect it also means fewer changes in the tools, etc, are necessary.

> > A cool little trick that this strategy could enable is to run a full Qemu
> > instruction emulator within the device model - I'd imagine this could be
> > useful on IA64, for instance, in order to provide support for running
> > legacy OSes (e.g. for x86, or *cough* PPC ;-))
>
> That's something I'd like to have too.

This is probably another discussion, but there are some interesting design questions here regarding how the CPU emulation would fit in. For instance, whether it would need to be done in such a way that the guest could be saved/restored (or even live migrated!) between x86 and IA64 machines... Being able to do this would be cool, but I'm not sure how useful it would be in the real world!

This also has implications for whether the PV drivers would use the host architecture protocol or the guest architecture protocol... This stuff could get fun :-)

In any case, getting it working in any form would be an advance.

Cheers,
Mark