Khoa Huynh
2006-Apr-28 01:52 UTC
[Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
On VT-x systems, according to the Intel VMX specification, the instruction-length information in the VMCS on VM exits is not always valid. The instruction-length field in the VMCS is ONLY valid in the following cases: when the VM exit is caused by the execution of an instruction that causes a VM exit unconditionally or based on the execution-control bitmap, by a software exception (INT3 or INTO), or by a task switch.

For VM exits caused by data faults (hardware exceptions), the instruction-length field in the VMCS is undefined. In these cases, the hypervisor can derive the correct instruction length by fetching bytes at the guest instruction pointer and decoding them. A function that does this already exists in the SVM sub-directory. This function should be moved up one level to the HVM sub-directory so that both VMX and SVM can use it.

Note that VMX only uses this instrlen function when the hypervisor needs the instruction-length information and that information is undefined in the VMCS, e.g., for MMIO instructions. In other cases, where the instruction-length field in the VMCS is valid, the hypervisor continues to read it from the VMCS (via a vmread operation).

I came across this problem while trying to get Windows NT booting on Xen.

There are TWO patches attached below:

* instrlen1.patch effectively moves the instrlen.c file from the xen/arch/x86/hvm/svm sub-directory up one level to the xen/arch/x86/hvm sub-directory and makes minor changes to instrlen.c so that it works at its new location.

* instrlen2.patch makes additional changes to the VMX code so that the hypervisor uses the instrlen function correctly, in all modes, where the instruction-length field is undefined, and reads the field from the VMCS where it is defined.

I must acknowledge that most of the code in the first patch (instrlen1.patch) is not mine, since the primary purpose of this patch is to move the instrlen.c file from one location in the tree to another (it also makes some minor changes).
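The VMX rule described above can be sketched in C. This is a minimal illustration under stated assumptions, not the patch itself: the exit categories are hypothetical (real VMX exit reasons are numeric codes), and `toy_decode_len` understands only prefix-free 32-bit MOV (opcodes 0x88 to 0x8B) with ModRM addressing, whereas instrlen.c handles far more. It shows why a data-fault exit needs decoding: the length depends on the ModRM, SIB and displacement bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical exit categories; real VMX exit reasons are numeric codes. */
enum exit_kind {
    EXIT_INSN,           /* instruction exit (unconditional or via execution controls) */
    EXIT_SOFT_EXCEPTION, /* INT3 or INTO */
    EXIT_TASK_SWITCH,
    EXIT_HW_EXCEPTION    /* data fault, e.g. a page fault on an MMIO access */
};

/* Toy length decoder: prefix-free, 32-bit-mode MOV 0x88-0x8B only.
   Returns the instruction length, or -1 if unsupported or truncated. */
static int toy_decode_len(const uint8_t *b, int n)
{
    if (n < 2 || b[0] < 0x88 || b[0] > 0x8B)
        return -1;
    int mod = b[1] >> 6, rm = b[1] & 7;
    int len = 2;                         /* opcode + ModRM */
    if (mod == 3)
        return len;                      /* register operand: no memory bytes */
    if (rm == 4) {                       /* a SIB byte follows */
        if (n < 3)
            return -1;
        len += 1;
        if (mod == 0 && (b[2] & 7) == 5)
            len += 4;                    /* SIB with no base register: disp32 */
    } else if (mod == 0 && rm == 5) {
        len += 4;                        /* absolute disp32 */
    }
    if (mod == 1)
        len += 1;                        /* disp8 */
    else if (mod == 2)
        len += 4;                        /* disp32 */
    return len <= n ? len : -1;
}

/* Use the VMCS field when the exit defines it; otherwise decode the bytes. */
static int get_insn_len(enum exit_kind kind, int vmcs_len,
                        const uint8_t *bytes, int nbytes)
{
    assert(bytes != NULL && nbytes >= 0);
    if (kind != EXIT_HW_EXCEPTION)
        return vmcs_len;                 /* the field is valid for these exits */
    return toy_decode_len(bytes, nbytes);
}
```

For example, `89 43 10` (`mov [ebx+0x10], eax`) decodes to 3 bytes, and `89 05` followed by a 4-byte address decodes to 6, which is exactly what the VMCS does not report on a data fault.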
The second patch (instrlen2.patch) is more meaty :-)

These two patches should apply cleanly to the latest xen-unstable tree (hg tip = 9866).

I have tested these patches successfully on two systems using a variety of guest OSes (e.g. WinXP, Win2003 Server).

Signed-off-by: Khoa Huynh <khoa@us.ibm.com>

(See attached file: instrlen1.patch)(See attached file: instrlen2.patch)

Regards,
Khoa
_________________________________________
Khoa Huynh, Ph.D.
IBM Linux Technology Center
(512) 838-4903; T/L 678-4903; khoa@us.ibm.com
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Anthony Liguori
2006-Apr-28 02:41 UTC
Re: [Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
Please don't submit patches with mailers that attach them as binary attachments. You'll have to resubmit anyway, as your copyright line is wrong (unless you really did write this code a thousand years ago ;-))

Regards,

Anthony Liguori
Keir Fraser
2006-Apr-28 06:03 UTC
Re: [Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
On 28 Apr 2006, at 02:52, Khoa Huynh wrote:

> It should be noted that VMX only uses this instrlen
> function when the hypervisor needs the instruction-length
> info and that info is undefined in VMCS, e.g., for MMIO
> instructions. In other cases where the instruction-length
> field is valid in VMCS, the hypervisor continues to get
> that info from VMCS (via vmread operation).

I don't believe we need the instruction length at all, and I suspect that the decoder could be removed from hvm/svm entirely. There are two broad categories of instruction I'm thinking of:

1. Instructions with their own VMEXIT reason code tend to be really simple, so we know their length anyway and, if not, the instr-length field should be valid.
2. For mmio instructions, the emulator can work out the length for itself and increment eip appropriately. There's no need to know the instruction length in advance of invoking the emulator.

I guess there may be one or two instructions, particularly on AMD, where we aren't feeding the instruction to the mmio emulator and the instruction isn't fixed length, so perhaps we'll need a small decoder in hvm/svm for those. But even if so, it could be much simpler than what is there right now.

 -- Keir
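Keir's second point can be illustrated with a toy emulator: decoding is what tells it the length, so it can advance eip itself and the caller never needs the length up front. This is a sketch only; `struct regs`, `toy_emulate` and the write callback are hypothetical, handle a single form of MOV, and are not the real x86_emulate interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file; gpr[] uses x86 encoding order
   (eax, ecx, edx, ebx, esp, ebp, esi, edi). */
struct regs {
    uint32_t eip;
    uint32_t gpr[8];
};

typedef void (*mmio_write_fn)(uint32_t addr, uint32_t val);

/* Emulates MOV [reg], reg (opcode 0x89, mod=00, rm not 4 or 5) only.
   Returns 0 and advances eip on success, -1 if the form is not handled. */
static int toy_emulate(struct regs *r, const uint8_t *insn, int n,
                       mmio_write_fn write)
{
    assert(r != NULL && insn != NULL && write != NULL);
    if (n < 2 || insn[0] != 0x89)
        return -1;
    int mod = insn[1] >> 6, reg = (insn[1] >> 3) & 7, rm = insn[1] & 7;
    if (mod != 0 || rm == 4 || rm == 5)
        return -1;                   /* SIB and disp32 forms not handled here */
    write(r->gpr[rm], r->gpr[reg]);  /* perform the MMIO store */
    r->eip += 2;                     /* the length is a by-product of decoding */
    return 0;
}

/* Test double that records the last MMIO write. */
static uint32_t last_addr, last_val;
static void record_write(uint32_t addr, uint32_t val)
{
    last_addr = addr;
    last_val = val;
}
```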
Petersson, Mats
2006-Apr-28 09:02 UTC
RE: [Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of
> Keir Fraser
> Sent: 28 April 2006 07:03
> To: Khoa Huynh
> Cc: xen-devel
> Subject: Re: [Xen-devel] [PATCH] Calculate correct
> instruction length for data-fault VM exits on VT-x systems
>
> On 28 Apr 2006, at 02:52, Khoa Huynh wrote:
>
> > It should be noted that VMX only uses this instrlen function when the
> > hypervisor needs the instruction-length info and that info is
> > undefined in VMCS, e.g., for MMIO instructions. In other cases where
> > the instruction-length field is valid in VMCS, the hypervisor
> > continues to get that info from VMCS (via vmread operation).
>
> I don't believe we need the instruction-length at all, and I
> suspect that the decoder could be removed from hvm/svm
> entirely. There are two broad categories of instruction I'm
> thinking of:
> 1. Instructions with their own VMEXIT reason code tend to
> be really simple so we know their length anyway and, if not,
> the instr-length field should be valid
> 2. For mmio instructions, the emulator can work out the
> length for itself and increment eip appropriately. There's no
> need to know the instruction length in advance of invoking
> the emulator.
>
> I guess there may be one or two instructions, particularly on
> AMD, where we aren't feeding the instruction to the mmio
> emulator and the instruction isn't fixed length, so perhaps
> we'll need a small decoder in hvm/svm for those. But even if
> so, it could be much simpler than what is there right now.

Yes, this is correct. There is a specific routine for this purpose that takes as an argument the instruction(s) we're looking for and calculates its length [since we do know which instructions we are looking for].

I'll look at your previous suggestion of merging the MMIO emulation into x86_emulate later on today.
We probably do need to sum up the length and pass it back to the caller, as that code doesn't know how to update the correct field for the different processor architectures (VMCB vs. VMCS vs. stack frame for a para-virtual machine). But it shouldn't be particularly hard to achieve this.

--
Mats
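A rough sketch of the kind of routine Mats describes, assuming a hypothetical table of expected opcodes (this is not the actual SVM code). Because the caller already knows which instructions can cause the exit, it only has to skip legacy prefixes and match a short opcode pattern rather than run a general decoder. The sketch ignores REX prefixes and instructions with ModRM operands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table entry: an opcode pattern the caller expects. */
struct known_insn {
    const char *name;
    uint8_t opcode[3];
    size_t opcode_len;
};

static const struct known_insn svm_insns[] = {
    { "CPUID",   {0x0F, 0xA2},       2 },
    { "RDMSR",   {0x0F, 0x32},       2 },
    { "HLT",     {0xF4},             1 },
    { "VMMCALL", {0x0F, 0x01, 0xD9}, 3 },
};

static int is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0x26: case 0x2E: case 0x36: case 0x3E: case 0x64: case 0x65: /* segment */
    case 0x66: case 0x67:                                               /* size */
    case 0xF0: case 0xF2: case 0xF3:                                    /* lock/rep */
        return 1;
    default:
        return 0;
    }
}

/* Skip legacy prefixes, then match against the expected instructions.
   Returns the total length (prefixes included) and sets *which,
   or returns -1 if nothing matches. */
static int match_insn_len(const uint8_t *buf, size_t n,
                          const struct known_insn *list, size_t count,
                          size_t *which)
{
    assert(buf != NULL && list != NULL && which != NULL);
    size_t p = 0;
    while (p < n && is_legacy_prefix(buf[p]))
        p++;
    for (size_t i = 0; i < count; i++) {
        if (p + list[i].opcode_len <= n &&
            memcmp(buf + p, list[i].opcode, list[i].opcode_len) == 0) {
            *which = i;
            return (int)(p + list[i].opcode_len);
        }
    }
    return -1;
}
```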
Keir Fraser
2006-Apr-28 09:14 UTC
Re: [Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
On 28 Apr 2006, at 10:02, Petersson, Mats wrote:

> I'll look at your previous suggestion of merging the MMIO emulation
> into x86_emulate later on today. We probably do need to sum up the length
> and pass it back to the caller - as that code doesn't know how to update
> the correct field of the different processor architectures (vmcb vs. vmcs
> vs. stack-frame for Para-virtual machine). But it shouldn't be
> particularly hard to achieve this.

The emulator uses and updates the eip field of the passed-in regs structure. We may want to change this interface in future by having the caller explicitly pass in a buffer containing the instruction, and the number of valid bytes in the buffer. Or add a 'fetch_insn_byte' callback hook to the emulator interface.

 -- Keir
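The buffer-based interface Keir describes can be sketched as follows, assuming a hypothetical `struct insn_buf` (not the real x86_emulate interface). The caller records how many bytes it copied, and every fetch checks that count, so a truncated instruction becomes an error rather than a read past the valid bytes. A callback hook could enforce the same check inside the caller's callback instead.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_INSN_LEN 15

/* Hypothetical buffer handed to the emulator by its caller. */
struct insn_buf {
    uint8_t bytes[MAX_INSN_LEN];
    int valid;   /* bytes the caller actually managed to copy */
    int pos;     /* decoder cursor */
};

/* Returns 0 and the next byte, or -1 if decoding runs past the valid bytes. */
static int buf_fetch_byte(struct insn_buf *b, uint8_t *out)
{
    assert(b->valid >= 0 && b->valid <= MAX_INSN_LEN);
    if (b->pos >= b->valid)
        return -1;
    *out = b->bytes[b->pos++];
    return 0;
}

/* Example consumer: skips operand-size and segment-override prefixes and
   reads the opcode byte. Returns bytes consumed, or -1 if truncated. */
static int read_prefixes_and_opcode(struct insn_buf *b, uint8_t *opcode)
{
    uint8_t c;
    do {
        if (buf_fetch_byte(b, &c) != 0)
            return -1;           /* ran off the end of the copied bytes */
    } while (c == 0x66 || c == 0x26 || c == 0x2E || c == 0x36 ||
             c == 0x3E || c == 0x64 || c == 0x65);
    *opcode = c;
    return b->pos;
}
```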
Petersson, Mats
2006-Apr-28 09:19 UTC
RE: [Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of
> Keir Fraser
> Sent: 28 April 2006 10:15
> To: Petersson, Mats
> Cc: Khoa Huynh; xen-devel
> Subject: Re: [Xen-devel] [PATCH] Calculate correct
> instruction length for data-fault VM exits on VT-x systems
>
> On 28 Apr 2006, at 10:02, Petersson, Mats wrote:
>
> > I'll look at your previous suggestion of merging the MMIO emulation
> > into x86_emulate later on today. We probably do need to sum up the
> > length and pass it back to the caller - as that code doesn't know how
> > to update the correct field of the different processor architectures
> > (vmcb vs. vmcs vs. stack-frame for Para-virtual machine). But it
> > shouldn't be particularly hard to achieve this.
>
> The emulator uses and updates the eip field of the passed-in
> regs structure. We may want to change this interface in
> future by having the caller explicitly pass in a buffer
> containing the instruction, and the number of valid bytes in
> the buffer. Or add a 'fetch_insn_byte'
> callback hook to the emulator interface.

I think passing a buffer is the best choice here. And I suppose we can always stuff vmc[bs]->rip into regs->eip and pull it back out again when we get back; using a wrapper function may be the easiest way to achieve this (at least in the short term).

We will of course also need to get the communication with QEMU done in some way. I haven't spent any time looking at it so far...

--
Mats
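A minimal sketch of such a wrapper, assuming hypothetical, heavily simplified `struct vmcb` and `struct regs` layouts (the real structures hold many more fields): it copies the guest's rip into regs->eip, runs the generic emulator, and copies the results back only if emulation succeeded. `fake_emulate` and `fake_fail` are stand-ins for the emulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified state layouts. */
struct vmcb { uint64_t rip; uint64_t rax; };
struct regs { uint64_t eip; uint64_t eax; };

typedef int (*emulate_fn)(struct regs *r);

/* Copy guest state in, run the generic emulator, copy results back out.
   Nothing is written back if emulation fails. */
static int hvm_emulate_wrapper(struct vmcb *v, emulate_fn emulate)
{
    assert(v != NULL && emulate != NULL);
    struct regs r = { .eip = v->rip, .eax = v->rax };
    int rc = emulate(&r);
    if (rc == 0) {
        v->rip = r.eip;   /* the emulator has already added the length */
        v->rax = r.eax;
    }
    return rc;
}

/* Stand-ins: a 3-byte instruction that loads 7 into eax, and a failure. */
static int fake_emulate(struct regs *r) { r->eax = 7; r->eip += 3; return 0; }
static int fake_fail(struct regs *r) { r->eip += 3; return -1; }
```

Keeping the copy-in/copy-out in one place means the emulator itself never needs to know whether it is serving a VMCS, a VMCB or a para-virtual stack frame.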
Keir Fraser
2006-Apr-28 09:24 UTC
Re: [Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
On 28 Apr 2006, at 10:19, Petersson, Mats wrote:

> I think passing a buffer is the best choice here. And I suppose we can
> always stuff vmc[bs]->rip into regs->eip and pull it back out again when
> we get back - using a wrapper function may be the easiest way to achieve
> this (at least short term).

Yes, I expect HVM users will want some kind of helpful wrapper around the core emulator. I'm trying to keep the emulator itself very generic, so it makes sense that some tailoring will be required in some usages.

> We will of course also need to get the communication with QEMU done in
> some way.

Yes, I don't imagine that is hard as long as you're prepared to copy the guest's register state to and from qemu-dm. Even on x86_64 it's only a couple hundred bytes, so it's unlikely to be a significant overhead. Then you can simply have a copy of the emulator inside qemu-dm.

 -- Keir

> I haven't spent any time looking at it so far...
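To put the "couple hundred bytes" in perspective, here is a hypothetical x86_64 register-state layout (not Xen's actual structure) with the general-purpose registers, rip, rflags, four control registers and six segment registers. It comes to 272 bytes on common ABIs; it leaves out FPU/SSE state entirely. The shared page it is copied through is also an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical image of one segment register. */
struct seg_state {
    uint16_t sel;
    uint16_t attr;
    uint32_t limit;
    uint64_t base;
};                                 /* 16 bytes */

/* Hypothetical x86_64 guest CPU state handed to qemu-dm. */
struct guest_cpu_state {
    uint64_t gpr[16];              /* rax..r15: 128 bytes */
    uint64_t rip, rflags;          /* 16 bytes */
    uint64_t cr0, cr2, cr3, cr4;   /* 32 bytes */
    struct seg_state seg[6];       /* cs, ds, es, fs, gs, ss: 96 bytes */
};                                 /* 272 bytes, no padding on common ABIs */

/* Copy to and from a (hypothetical) page shared with qemu-dm. */
static void export_state(void *shared, const struct guest_cpu_state *s)
{
    assert(shared != NULL && s != NULL);
    memcpy(shared, s, sizeof(*s));
}

static void import_state(struct guest_cpu_state *s, const void *shared)
{
    assert(shared != NULL && s != NULL);
    memcpy(s, shared, sizeof(*s));
}
```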
Khoa Huynh
2006-Apr-28 18:10 UTC
Re: [Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote on 04/28/2006 01:03:02 AM:

>
> On 28 Apr 2006, at 02:52, Khoa Huynh wrote:
>
> > It should be noted that VMX only uses this instrlen
> > function when the hypervisor needs the instruction-length
> > info and that info is undefined in VMCS, e.g., for MMIO
> > instructions. In other cases where the instruction-length
> > field is valid in VMCS, the hypervisor continues to get
> > that info from VMCS (via vmread operation).
>
> I don't believe we need the instruction-length at all, and I suspect
> that the decoder could be removed from hvm/svm entirely. There are two
> broad categories of instruction I'm thinking of:
> 1. Instructions with their own VMEXIT reason code tend to be really
> simple so we know their length anyway and, if not, the instr-length
> field should be valid

For these instructions, on Intel VT-x, the instruction length is valid in the VMCS. On AMD, there is a simple look-up function that determines the length of the instruction passed in as a parameter. We are good here.

> 2. For mmio instructions, the emulator can work out the length for
> itself and increment eip appropriately. There's no need to know the
> instruction length in advance of invoking the emulator.

Yeah, MMIO instructions are problematic, and I was trying to address this area by using the stripped-down emulator for SVM. But you are suggesting that we get rid of that stripped-down emulator in SVM, get rid of the MMIO decoder/emulator in the HVM directory (platform.c), and use the generic x86 emulator in xen/arch/x86 for MMIO instructions instead. This would certainly be much cleaner than having different versions of the decoder/emulator lying around in different places. I wonder if there would be any noticeable impact on path lengths for MMIO instructions?

> > On 28 Apr 2006, at 10:02, Petersson, Mats wrote:
> >
> > > I'll look at your previous suggestion of merging the MMIO emulation
> > > into x86_emulate later on today. We probably do need to sum up the
> > > length and pass it back to the caller - as that code
> > > doesn't know how to update the correct field of the different
> > > processor architectures (vmcb vs. vmcs vs. stack-frame for
> > > Para-virtual machine). But it shouldn't be particularly hard to
> > > achieve this.
> >
> > The emulator uses and updates the eip field of the passed-in
> > regs structure. We may want to change this interface in
> > future by having the caller explicitly pass in a buffer
> > containing the instruction, and the number of valid bytes in
> > the buffer. Or add a 'fetch_insn_byte'
> > callback hook to the emulator interface.
>
> I think passing a buffer is the best choice here. And I suppose we can
> always stuff vmc[bs]->rip into regs->eip and pull it back out again when
> we get back - using a wrapper function may be the easiest way to achieve
> this (at least short term).

I guess we can have a wrapper that takes as input the guest instruction pointer (rip), fetches the whole MAX_INST_LEN (15-byte) buffer starting at rip (making sure that we don't cross a page boundary), and then passes that to the emulator. The emulator would decode and emulate the instruction, and would include in its return the updated guest instruction pointer (rip) and the instruction length. This information would then be stuffed back into the vmcs/vmcb/stack as appropriate. Is this more or less what you have in mind?

Thanks.

Regards,
Khoa
Anthony Liguori
2006-Apr-28 20:02 UTC
Re: [Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
Leendert van Doorn wrote:
> On Fri, 2006-04-28 at 10:24 +0100, Keir Fraser wrote:
>
>> Yes, I expect HVM users will want some kind of helpful wrapper around
>> the core emulator. I'm trying to keep the emulator itself very generic,
>> so it makes sense that some tailoring will be required in some usages.
>>
>>> We will of course also need to get the communication with QEMU done in
>>> some way.
>>>
>> Yes, I don't imagine that is hard as long as you're prepared to copy
>> the guest's register state to and from qemu-dm. Even on x86_64 it's only
>> a couple hundred bytes so unlikely to be a significant overhead. Then
>> you can simply have a copy of the emulator inside qemu-dm.
>>
>> -- Keir
>>
>>> I haven't spent any time looking at it so far...
>
> Here is something I've been toying with lately, inspired by the work
> Steve's students presented at Eurosys.
>
> The way we are currently doing real-mode emulation for VT-x is a royal
> pain and getting the semantics right for all big real mode uses (Solaris
> 9, SLES's gfxboot) is next to impossible in the current framework. What
> I was thinking about was to switch back and forth between a VT-x
> partition and a full emulator. The obvious choice for this would be to
> put back the qemu instruction emulator into qemu-dm and handle all
> real-mode instructions there. As soon as CR0.PE is set to 0, we do an
> upcall to the emulator. Once CR0.PE=1 and we have emulated some
> threshold number of instructions (1000?) we switch back to the VMX
> partition. This would allow us to amortize the cost of doing a full
> context save and restore. Obviously, this is only a concern for VT-x,
> but SVM could benefit in the following scenario:

This could be extended to support systems without VT/SVM. Instead of dropping back when CR0.PE=1 (after 1000 instructions), if VT/SVM isn't available, you wait until a switch to ring 3. This would essentially do what QEmu + qvm86 does today.
I'd be really happy to see this in Xen since it would let you use unmodified guests (even when VT/SVM is not present).

Regards,

Anthony Liguori
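The two switching policies discussed here (Leendert's "return after some threshold once CR0.PE=1" and Anthony's "without VT/SVM, return on the switch to ring 3") can be sketched as a small per-vCPU state machine. Everything below is hypothetical: the structure, the hook names, and the 1000 threshold, which Leendert himself marks with a question mark.

```c
#include <assert.h>
#include <stdbool.h>

#define RETURN_THRESHOLD 1000   /* Leendert's tentative "1000?" */

/* Hypothetical per-vCPU state for the emulate/execute switch. */
struct vcpu_mode {
    bool in_emulator;                  /* running in qemu-dm's emulator */
    bool hw_virt;                      /* VT-x or SVM available */
    unsigned long emulated_since_pe;   /* instructions emulated since CR0.PE became 1 */
};

/* Hardware-assisted execution trapped a CR0 write. */
static void on_cr0_write(struct vcpu_mode *m, bool new_pe)
{
    if (!new_pe) {                     /* back to real mode: upcall to the emulator */
        m->in_emulator = true;
        m->emulated_since_pe = 0;
    }
}

/* Called by the emulator after each instruction; cr0_pe and cpl describe the
   guest after that instruction. Returns true when the vCPU should leave the
   emulator and resume native execution. */
static bool emu_step_done(struct vcpu_mode *m, bool cr0_pe, unsigned int cpl)
{
    assert(m->in_emulator);
    if (!cr0_pe) {
        m->emulated_since_pe = 0;      /* still in real mode */
        return false;
    }
    m->emulated_since_pe++;
    bool leave = m->hw_virt
        ? m->emulated_since_pe >= RETURN_THRESHOLD  /* amortize the switch cost */
        : cpl == 3;                                 /* no VT/SVM: run ring 3 natively */
    if (leave)
        m->in_emulator = false;
    return leave;
}
```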
Leendert van Doorn
2006-Apr-29 01:20 UTC
Re: [Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
On Fri, 2006-04-28 at 10:24 +0100, Keir Fraser wrote:
> Yes, I expect HVM users will want some kind of helpful wrapper around
> the core emulator. I'm trying to keep the emulator itself very generic,
> so it makes sense that some tailoring will be required in some usages.
>
> > We will of course also need to get the communication with QEMU done in
> > some way.
>
> Yes, I don't imagine that is hard as long as you're prepared to copy
> the guest's register state to and from qemu-dm. Even on x86_64 it's only
> a couple hundred bytes so unlikely to be a significant overhead. Then
> you can simply have a copy of the emulator inside qemu-dm.
>
>  -- Keir
>
> > I haven't spent any time looking at it so far...

Here is something I've been toying with lately, inspired by the work Steve's students presented at Eurosys.

The way we currently do real-mode emulation for VT-x is a royal pain, and getting the semantics right for all big real mode uses (Solaris 9, SLES's gfxboot) is next to impossible in the current framework. What I was thinking about was to switch back and forth between a VT-x partition and a full emulator. The obvious choice would be to put the qemu instruction emulator back into qemu-dm and handle all real-mode instructions there. As soon as CR0.PE is set to 0, we do an upcall to the emulator. Once CR0.PE=1 and we have emulated some threshold number of instructions (1000?), we switch back to the VMX partition. This would allow us to amortize the cost of doing a full context save and restore. Obviously, this is only a concern for VT-x, but SVM could benefit in the following scenario:

We could do a similar thing for I/O operations. Basically, generate an upcall into qemu-dm on an MMIO or PIO exit and let qemu-dm deal with it. It can do the same trick and emulate a number of instructions (1000?) before returning to the HVM partition.
This will eliminate expensive VMCS/VMCB exits on subsequent I/O operations (just consider doing a block write on an IDE device in PIO mode; this is common behavior). It will also eliminate the need for the MMIO instruction emulator in the hypervisor.

The only difficulty is that the hypervisor keeps some of the device state (vpit and *pics) and shortcuts operations to them. This state needs to be exposed to qemu-dm so that it is saved and restored on every qemu-dm invocation. I need to verify this, but as far as I'm aware, all the accesses to the devices emulated in the hypervisor are PIO operations. These are easy to decode with the exit information that is provided by VT-x and SVM, so they don't need a full instruction decoder.

Comments?

Leendert
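The IDE example can be made concrete with a small count. A 512-byte sector moved through the 16-bit IDE data port takes 256 OUT operations. The sketch assumes a hypothetical four-instruction guest loop per word with the OUT first, and compares one exit per OUT with the proposed "keep emulating for a batch of instructions" scheme. It counts exits only and does not model the cost of each upcall.

```c
#include <assert.h>

/* Count exits to the device model for a PIO loop of `words` iterations.
   Each iteration is `insns_per_word` instructions and starts with an OUT.
   batch == 0: every OUT exits and the guest resumes right after it.
   batch > 0:  qemu-dm keeps emulating the next `batch` instructions
               after the exiting OUT (the proposal above). */
static unsigned long count_exits(unsigned long words,
                                 unsigned long insns_per_word,
                                 unsigned long batch)
{
    assert(insns_per_word >= 1);
    unsigned long exits = 0, emulate_left = 0;
    for (unsigned long w = 0; w < words; w++) {
        for (unsigned long i = 0; i < insns_per_word; i++) {
            if (emulate_left > 0) {    /* still inside a qemu-dm batch */
                emulate_left--;
                continue;
            }
            if (i == 0) {              /* the OUT traps to the device model */
                exits++;
                emulate_left = batch;
            }
        }
    }
    return exits;
}
```

With these assumptions one sector costs 256 exits today, 2 with a 1000-instruction batch, and 1 with a batch of 2000.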
Keir Fraser
2006-Apr-29 07:21 UTC
Re: [Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
On 28 Apr 2006, at 19:10, Khoa Huynh wrote:

> For these instructions, on Intel VT-x, the instruction length is valid
> in VMCS. On AMD, there is a simple look-up function which determines
> the length of the instruction which is passed in as a parameter.
> We are good here.

The Intel code only uses the instr-len field because it happens to be handy. Going to a look-up function in a separate file when you *know* at compile time what the instruction length must be is stupid, imo. We should only have to do that if the instruction needs some decoding for us to know its length (perhaps because of prefix bytes or effective-address suffixes) and we are not otherwise going to be decoding the instruction as part of emulation.

> I guess we can have a wrapper that takes as input the guest instruction
> pointer (rip), fetches the whole MAX_INST_LEN (15-byte) buffer starting
> at rip (make sure that we don't cross a page boundary), and then passes
> that to the emulator. The emulator would decode, emulate, and would
> include in its return the updated guest instruction pointer (rip) and
> instruction length. This info will be stuffed back into vmcs/vmcb/stack
> as appropriate. Is this more or less what you have in mind ?

Yes, exactly. It gets a bit trickier, though: we'll have to fill the buffer with up to 15 bytes. If we fail to get all 15 bytes (perhaps because the instruction straddles a page boundary and the second page has been evicted since the instruction faulted), then we ought to tell the emulator how many bytes we actually copied, and it should return an error if it ends up going off the end of the instruction buffer. Alternatively, we could re-enter the guest immediately if we cannot read 15 bytes from the EIP, but that will cause an infinite loop if the instruction itself doesn't straddle the page boundary, as it won't trigger paging in the guest. But I/O instructions are unlikely to be in paged memory.
 -- Keir
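Keir's compile-time point can be illustrated with a table of fixed encodings. This is a sketch assuming hypothetical exit-reason names and prefix-free encodings: a guest that adds redundant prefixes (say, `66 0F A2` for CPUID) would break the constant, which is exactly the case where Keir says some decoding is still needed.

```c
#include <assert.h>

/* Hypothetical exit reasons for instructions with a fixed encoding. */
enum fixed_exit {
    FX_HLT, FX_CPUID, FX_RDMSR, FX_WRMSR, FX_RDTSC, FX_INVD, FX_WBINVD, FX_VMCALL
};

/* Lengths of the prefix-free encodings, known at compile time. */
static const unsigned char fixed_len[] = {
    [FX_HLT]    = 1,   /* F4 */
    [FX_CPUID]  = 2,   /* 0F A2 */
    [FX_RDMSR]  = 2,   /* 0F 32 */
    [FX_WRMSR]  = 2,   /* 0F 30 */
    [FX_RDTSC]  = 2,   /* 0F 31 */
    [FX_INVD]   = 2,   /* 0F 08 */
    [FX_WBINVD] = 2,   /* 0F 09 */
    [FX_VMCALL] = 3,   /* 0F 01 C1 */
};

/* Advance rip past an instruction whose length is fixed by its exit reason. */
static unsigned long long skip_fixed_insn(unsigned long long rip, enum fixed_exit why)
{
    assert((unsigned int)why < sizeof(fixed_len));
    return rip + fixed_len[why];
}
```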
Keir Fraser
2006-Apr-29 08:00 UTC
Re: [Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
On 29 Apr 2006, at 02:20, Leendert van Doorn wrote:

> The only difficulty is that the hypervisor keeps some of the device
> state vpit and *pics and shortcuts operations to them. This state needs
> to be exposed to qemu-dm so that it is saved and restored on every
> qemu-dm invocation. I need to verify this, but as far as I'm aware, all
> the accesses to the devices emulated in the hypervisor are PIO
> operations. These are easy to decode with the exit information that is
> provided by VT-x and SVM, so they don't need a full instruction
> decoder.

The APIC and IO-APIC are accessed via mmio. The former is written fairly frequently with singleton updates (to the TPR and EOI registers), so we'd want to carry on dealing with those directly in Xen, I should think. Still, you'd have to deal with the case where one of the Xen-emulated devices is accessed while emulating in qemu-dm; as you say, you'd probably have to pull their state vectors out of Xen when starting emulation. We'll need that for save/restore anyway, though.

I don't know if this will make sense for emulated I/O, but it does sound like a very sane alternative to vmxassist for dealing with real mode.

 -- Keir
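One way to read Keir's suggestion is as a routing rule for MMIO writes during qemu-dm emulation. The sketch below is a hypothetical policy, not Xen code: writes to the local APIC's TPR (offset 0x080) and EOI (offset 0x0B0) registers stay in the hypervisor, and everything else goes to the full emulator, which would first need the Xen-emulated devices' state pulled out as Keir describes. The 0xFEE00000 base is the architectural default.

```c
#include <assert.h>
#include <stdint.h>

#define APIC_DEFAULT_BASE 0xFEE00000u /* architectural default local APIC base */
#define APIC_TPR          0x080u      /* Task Priority Register offset */
#define APIC_EOI          0x0B0u      /* End Of Interrupt register offset */

enum mmio_route { HANDLE_IN_XEN, FORWARD_TO_QEMU_DM };

/* Hypothetical policy: keep the frequent single-register APIC writes
   (TPR, EOI) in the hypervisor; send every other MMIO write to qemu-dm. */
static enum mmio_route route_mmio_write(uint64_t gpa, uint64_t apic_base)
{
    assert((apic_base & 0xFFF) == 0);   /* the APIC page is 4 KiB aligned */
    if (gpa >= apic_base && gpa < apic_base + 0x1000) {
        uint64_t off = gpa - apic_base;
        if (off == APIC_TPR || off == APIC_EOI)
            return HANDLE_IN_XEN;
    }
    return FORWARD_TO_QEMU_DM;
}
```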
Keir Fraser
2006-Apr-29 10:39 UTC
Re: [Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
On 29 Apr 2006, at 15:54, Leendert van Doorn wrote:

> The state is already partially exposed to qemu-dm through the shared
> global I/O data page (include/public/hvm/ioreq.h). This is easy to
> extend so that a context switch doesn't involve copying device state.
> This is also the place where we should store the vmx_assist_context
> information that is required by the emulator.

Xen-emulated devices could have their state fetched on demand; it ought to be rare that qemu-dm ends up emulating an access to such a device. Or have qemu call down into Xen to emulate those accesses. Having a device model duplicated in both Xen and qemu, and synchronising between the two, sounds a bit sketchy.

SSE state is another consideration, but maybe that's not too expensive to save/restore/copy on every qemu-dm transition.

 -- Keir
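A minimal sketch of "fetch on demand", assuming a hypothetical `fetch_from_xen` stand-in for whatever hypercall would return a device's authoritative state: qemu-dm's copy is invalidated when an emulation session starts and is filled only if an emulated instruction actually touches the device. Writing modified state back to Xen is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of a Xen-emulated device such as the virtual PIT. */
struct dev_state {
    uint16_t counter;
    uint8_t mode;
};

/* Stand-in for a hypercall that reads the authoritative copy from Xen. */
static unsigned int fetch_calls;
static const struct dev_state xen_copy = { 0x1234, 2 };
static struct dev_state fetch_from_xen(void)
{
    fetch_calls++;
    return xen_copy;
}

/* qemu-dm's lazily filled copy. */
struct lazy_dev {
    bool cached;            /* valid for the current emulation session? */
    struct dev_state s;
};

/* Entering the emulator: anything fetched earlier may be stale. */
static void session_begin(struct lazy_dev *d)
{
    d->cached = false;
}

/* Fetch only devices that are actually touched, at most once per session. */
static struct dev_state *dev_get(struct lazy_dev *d)
{
    assert(d != NULL);
    if (!d->cached) {
        d->s = fetch_from_xen();
        d->cached = true;
    }
    return &d->s;
}
```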
Leendert van Doorn
2006-Apr-29 14:54 UTC
Re: [Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
On Sat, 2006-04-29 at 09:00 +0100, Keir Fraser wrote:
> The APIC and IO-APIC are accessed via mmio. The former is written
> fairly frequently with singleton updates (to the TPR and EOI registers)
> so we'd want to carry on dealing with those directly in Xen I should
> think. Still you'd have to deal with the case that one of the
> Xen-emulated devices is accessed while emulating in qemu-dm -- as you
> say you'd probably have to pull their state vectors out of Xen when
> starting emulating. We'll need that for save/restore anyway though.

The state is already partially exposed to qemu-dm through the shared global I/O data page (include/public/hvm/ioreq.h). This is easy to extend so that a context switch doesn't involve copying device state. This is also the place where we should store the vmx_assist_context information that is required by the emulator. The mmio *pic operations could just be handled by x86_emulate.

> I don't know if this will make sense for emulated I/O but it does sound
> like a very sane alternative to vmxassist for dealing with real mode.

The big advantages I see for I/O are that 1) we don't need the instruction decode support anymore, so it cleans up the hypervisor, and 2) it has the potential to greatly reduce the number of exits caused by I/O, by emulating subsequent I/O operations before returning to the HVM partition. Especially for the older devices that we are currently emulating this could be a major win, but even for modern devices, where you are manipulating ring buffers that reside in I/O space, it would be a win.

I don't think that moving the I/O decoding from the hypervisor to the device model is going to be a major performance bottleneck. This cost is dwarfed by the upcall into qemu-dm.

Leendert
Keir Fraser
2006-Apr-29 18:54 UTC
Re: [Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
On 30 Apr 2006, at 00:24, Leendert van Doorn wrote:

>> Xen-emulated devices could have their state fetched on demand -- it
>> ought to be rare that qemu-dm ends up emulating an access to such a
>> device. Or have qemu call down into Xen to emulate those accesses.
>> Having a device model duplicated in both Xen and qemu, and
>> synchronising between the two, sounds a bit sketchy.
>
> In the BIOS (realmode) these devices are initialized and the subsequent
> 32-bit code depends on the proper initialization. For the common I/O
> emulation case this should hopefully be rare.

How does this work now? Do we really have two copies of each device model? I doubt that's implemented safely.

 -- Keir
Keir Fraser
2006-Apr-29 19:46 UTC
Re: [Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
On 30 Apr 2006, at 02:37, Leendert van Doorn wrote:

>> How does this work now? Do we really have two copies of each device
>> model? I doubt that's implemented safely.
>
> Right now the realmode code runs inside the VMX partition where it is
> partially emulated by vmxassist. So all accesses to the emulated devices
> go through the hypervisor first before they (potentially) end up in
> qemu-dm. When a transition is made to 32/64-bit code all the initialized
> device state is still there.

Ah yes, I forgot that the mmio decoder stuff in Xen handles real mode. So that means that currently each device model is either implemented in Xen or in qemu-dm, but not both (now that the heinous split PIT device model is gone). That's a nice state of affairs.

> The problem of keeping the hypervisor state and the qemu-dm state in
> sync is introduced when we alternate between emulation and real
> execution. This becomes more interesting when we consider MP guests
> where one CPU is running inside the emulator and another on the real
> hardware.

It'd obviously be better avoided altogether, unless we have to perform horrible contortions to do so, or doing so would hurt the performance of operations that we care about.

Don't get me wrong, by the way: I do think that leveraging qemu's full emulator, at least to get us out of the stickiest situations, is a very good idea. I'm only concerned about some of the finer details.

 -- Keir
Leendert van Doorn
2006-Apr-29 23:24 UTC
Re: [Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
On Sat, 2006-04-29 at 11:39 +0100, Keir Fraser wrote:
>
> Xen-emulated devices could have their state fetched on demand -- it
> ought to be rare that qemu-dm ends up emulating an access to such a
> device. Or have qemu call down into Xen to emulate those accesses.
> Having a device model duplicated in both Xen and qemu, and
> synchronising between the two, sounds a bit sketchy.

In the BIOS (realmode) these devices are initialized and the subsequent 32-bit code depends on the proper initialization. For the common I/O emulation case this should hopefully be rare.

Leendert
Leendert van Doorn
2006-Apr-30 01:37 UTC
Re: [Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
On Sat, 2006-04-29 at 19:54 +0100, Keir Fraser wrote:
> > In the BIOS (realmode) these devices are initialized and the subsequent
> > 32-bit code depends on the proper initialization. For the common I/O
> > emulation case this should hopefully be rare.
>
> How does this work now? Do we really have two copies of each device
> model? I doubt that's implemented safely.

Right now the realmode code runs inside the VMX partition where it is partially emulated by vmxassist. So all accesses to the emulated devices go through the hypervisor first before they (potentially) end up in qemu-dm. When a transition is made to 32/64-bit code all the initialized device state is still there.

The problem of keeping the hypervisor state and the qemu-dm state in sync is introduced when we alternate between emulation and real execution. This becomes more interesting when we consider MP guests where one CPU is running inside the emulator and another on the real hardware.

Leendert
Petersson, Mats
2006-May-02 12:36 UTC
RE: [Xen-devel] [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
> -----Original Message-----
> From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
> Sent: 29 April 2006 08:22
> To: Khoa Huynh
> Cc: Petersson, Mats; xen-devel
> Subject: Re: [Xen-devel] [PATCH] Calculate correct
> instruction length for data-fault VM exits on VT-x systems
>
>
> On 28 Apr 2006, at 19:10, Khoa Huynh wrote:
>
> > For these instructions, on Intel VT-x, the instruction length is valid
> > in VMCS. On AMD, there is a simple look-up function which determines
> > the length of the instruction which is passed in as a parameter.
> > We are good here.
>
> The Intel code only uses the instr-len field because it
> happens to be handy. Going to a look-up function in a
> separate file when you *know* at compile time what the
> instruction length must be is stupid, imo. We should only
> have to do that if the instruction needs some decoding for us
> to know its length (perhaps because of prefix bytes or
> effective address suffixes) and we are not otherwise going to
> be decoding the instruction as part of emulation.

We do have to forward the RIP to the next instruction, and we don't know about prefixes and other things, so I don't think we can improve on the current setup. That said, I noticed some time ago that you replaced the call that calculates instrlen on HLT with a constant, which I suppose is fine: IF there is a prefix (unlikely) to HLT, then we just re-execute it without the prefix, which doesn't make a whole lot of difference. Technically speaking, though, we MAY end up waiting for another interrupt, which would probably not be a good thing. Of course, I've never seen any code with a prefix to HLT, but it's perfectly allowed to put redundant and useless prefixes on any instruction, for example to change the alignment of the next instruction.

>
> > I guess we can have a wrapper that takes as input the guest
> > instruction pointer (rip), fetches the whole MAX_INST_LEN (15-byte)
> > buffer starting at rip (make sure that we don't cross a page
> > boundary), and then passes that to the emulator. The emulator would
> > decode, emulate, and would include in its return the updated guest
> > instruction pointer (rip) and instruction length. This info will be
> > stuffed back into vmcs/vmcb/stack as appropriate. Is this more or
> > less what you have in mind?
>
> Yes, exactly. It gets a bit trickier though -- we'll have to
> fill the buffer with up to 15 bytes. If we fail to get all of
> 15 bytes (perhaps because the instruction straddles a page
> boundary and the second page has been evicted since the
> instruction faulted) then we ought to tell the emulator how
> many bytes we actually copied and it should return an error if
> it ends up going off the end of the instruction buffer.
> Alternatively we could re-enter the guest immediately if we
> cannot read 15 bytes from the EIP -- but that'll cause an infinite
> loop if the instruction itself doesn't straddle the page boundary,
> as it won't trigger paging in the guest. But I/O instructions
> are unlikely to be in paged memory.

Yes - at the moment, we do not cope well with instructions that straddle a page boundary if the next page isn't present in memory.

On the other hand, I've been looking at just using the existing hooks (in the ops structure) to read instruction bytes, and I think that would work just fine.

However, I haven't looked through this entire thread to see what the conclusion is, so this may all be moot...

--
Mats

>
> -- Keir
>
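The buffer-filling scheme in Keir's quoted reply, combined with Mats's point about redundant prefixes on HLT, can be sketched roughly as below. This is a minimal C sketch under stated assumptions, not the actual Xen code: `struct guest_mem`, `fetch_insn_bytes()`, `hlt_insn_len()` and `GUEST_PAGE_SIZE` are hypothetical names; guest memory is modelled as a flat buffer whose first `mapped_pages` pages are present (real code would walk the guest page tables); and the decoder only understands legacy prefixes followed by HLT (0xF4), ignoring REX prefixes in 64-bit mode. The fetch stops short at the first non-present page and reports how many bytes it copied; the decoder works only on those bytes and returns -1 rather than reading past them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_INST_LEN    15
#define GUEST_PAGE_SIZE 4096UL

/* Hypothetical model of guest memory: a flat buffer in which only the
 * first `mapped_pages` pages are present. */
struct guest_mem {
    const uint8_t *base;
    size_t mapped_pages;
};

/* Copy up to MAX_INST_LEN bytes starting at rip, one page-bounded chunk
 * at a time. Copying stops at the first page that is not present, so the
 * result may be short. Returns the number of bytes actually copied. */
static size_t fetch_insn_bytes(const struct guest_mem *mem, unsigned long rip,
                               uint8_t buf[MAX_INST_LEN])
{
    size_t copied = 0;
    while (copied < MAX_INST_LEN) {
        unsigned long addr = rip + copied;
        if (addr / GUEST_PAGE_SIZE >= mem->mapped_pages)
            break;                          /* page not present: stop short */
        size_t chunk = GUEST_PAGE_SIZE - (addr % GUEST_PAGE_SIZE);
        if (chunk > MAX_INST_LEN - copied)
            chunk = MAX_INST_LEN - copied;  /* never copy past 15 bytes */
        memcpy(buf + copied, mem->base + addr, chunk);
        copied += chunk;
    }
    return copied;
}

static int is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0x66: case 0x67:                   /* operand/address size */
    case 0xF0: case 0xF2: case 0xF3:        /* LOCK, REPNE, REP */
    case 0x26: case 0x2E: case 0x36:        /* ES, CS, SS overrides */
    case 0x3E: case 0x64: case 0x65:        /* DS, FS, GS overrides */
        return 1;
    default:
        return 0;
    }
}

/* Length of a (possibly prefixed) HLT, looking only at the `len` bytes
 * that were really fetched. Returns -1 if decoding would run off the end
 * of the buffer or the opcode after the prefixes is not HLT (0xF4). */
static int hlt_insn_len(const uint8_t *buf, size_t len)
{
    size_t i = 0;
    while (i < len && is_legacy_prefix(buf[i]))
        i++;
    if (i >= len)
        return -1;                          /* ran off the fetched bytes */
    if (buf[i] != 0xF4)
        return -1;                          /* not a HLT */
    return (int)(i + 1);
}
```

A short fetch is not an error by itself: a prefixed HLT that ends before the page boundary decodes fine even when the following page is gone. Only a decode that actually needs the missing bytes fails, and the caller then has to decide how to recover, which is the open question in the thread.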