thr3ads.net - Xen devel - [Xen-devel] Real-mode bug with AMD, gPXE, and 32-bit rep movs [Mar 2009]

George Dunlap

2009-Mar-26 12:25 UTC

[Xen-devel] Real-mode bug with AMD, gPXE, and 32-bit rep movs

With recent builds of -unstable, I've been unable to get an HVM domain
to boot on an AMD box with the virtual network card enabled.  The exact
same binaries work on an Intel box just fine.

The problem turns out to be with the handling of a 32-bit rep MOVS
instruction in the 16-bit gPXE initialization code.  The offending
code is here:

f3 67 a4 66 89 3e 7b 02

nasm interprets it this way:

00000000  F3                db 0xF3
00000001  67A4              a32 movsb
00000003  66893E7B02        a32 mov [0x27b],edi

In other words, a 32-bit rep movs in 16-bit mode.  (gPXE appears to be
copying itself from where it was installed at 0xc9000 to somewhere
higher in memory, 0x200000.  Not clear why it wants to do that.)
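
As an illustration of how the bytes above break down, here is a minimal
Python sketch that strips the legacy prefixes from the first instruction.
It is not a disassembler; the table and the function name split_prefixes
are hypothetical and cover only the three prefix bytes that appear in the
quoted sequence.

```python
# Legacy x86 prefix bytes that appear in the quoted sequence.
# (Illustrative table only; real decoders know many more prefixes.)
PREFIXES = {
    0xF3: "rep",
    0x66: "operand-size override",
    0x67: "address-size override",
}


def split_prefixes(code: bytes):
    """Return (prefix names, remaining bytes starting at the opcode)."""
    names = []
    i = 0
    while i < len(code) and code[i] in PREFIXES:
        names.append(PREFIXES[code[i]])
        i += 1
    return names, code[i:]


# f3 67 a4: REP + address-size override + MOVSB (opcode 0xA4).
insn = bytes.fromhex("f367a4")
prefixes, rest = split_prefixes(insn)
print(prefixes, rest.hex())
```

In 16-bit code the 0x67 prefix makes MOVSB use ESI/EDI/ECX instead of
SI/DI/CX, which is what lets the destination offset reach 0x200000.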

On Intel boxes, the code causes a #GP (not surprisingly), and the
emulator handles it successfully.

On AMD boxes (at least two of them), this causes a #GP (surprisingly).
That invokes the BIOS "null trap handler", which simply does an iret;
since a #GP is a fault, the iret returns to the same instruction, which
faults again, causing a busy loop.

There are three possibilities I came up with:
1) The same thing would happen outside of SVM; in which case it's
(sort of) a gPXE bug for using an instruction that won't work on AMD
boxes.
2) Xen is subtly screwing up the VM state, causing the AMD hardware
not to recognize that this shouldn't cause a #GP.
3) AMD hardware (at least some of it) doesn't handle 32-bit rep movs
instructions in 16-bit mode.

If it's #1, we should try to build gPXE without the 32-bit instructions.

If it's #2, we need to track down what state is being corrupted by Xen.

If it's #3, the simplest solution is probably to take vmexits on GP
faults and attempt to emulate the instruction if we're in real mode,
as we do for vmx.

Wei, Christoph: any ideas?

The cpuid output of the two boxes I've tried this on is below.

Thanks,
 -George Dunlap

[elite]
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: Quad-Core AMD Opteron(tm) Processor 2352
stepping	: 3
cpu MHz		: 2094.850
cache size	: 512 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu de tsc msr pae cx8 apic mtrr cmov pat clflush mmx fxsr
sse sse2 ht syscall nx mmxext fxsr_opt 3dnowext 3dnow constant_tsc pni
cmp_legacy cr8legacy ts ttp tm stc
bogomips	: 4190.72


[dakota]
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 15
model		: 65
model name	: Dual-Core AMD Opteron(tm) Processor 2218
stepping	: 2
cpu MHz		: 2593.560
cache size	: 1024 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu de tsc msr pae mce cx8 apic mtrr mca cmov pat clflush mmx
fxsr sse sse2 ht nx mmxext fxsr_opt 3dnowext 3dnow pni cmp_legacy
cr8legacy ts fid vid ttp tm stc
bogomips	: 5188.45

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2009-Mar-26 14:43 UTC

Re: [Xen-devel] Real-mode bug with AMD, gPXE, and 32-bit rep movs

On 26/03/2009 12:25, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
> There are three possibilities I came up with:
> 1) The same thing would happen outside of SVM; in which case it's
> (sort of) a gPXE bug for using an instruction that won't work on AMD
> boxes.
> 2) Xen is subtly screwing up the VM state, causing the AMD hardware
> not to recognize that this shouldn't cause a #GP
> 3) AMD hardware (at least some of it) doesn't handle 32-bit rep movs
> instructions in 16-bit mode.

It must surely be a Xen bug. Doing 32-bit ops in 16-bit mode is a completely
standard thing that all processors will support. The other alternative is
perhaps we have somehow managed to build ourselves a bogus gpxe image.

Your assertion that it causes GP on Intel is weird. We should be running in
the emulator already since for the movs to 0x200000 to work we must be
running in big real mode (i.e., one of the segment registers has a limit
greater than 0xffff) and so we cannot be emulating that by running the guest
in vm86 mode.

I can give some help tracking this down when I'm back next week, if it's
not resolved by then. It's also the sort of thing which may interest Tim
Deegan, who has also worked on real mode support on the Intel side in the
past.

 -- Keir

Tim Deegan

2009-Mar-26 14:54 UTC

Re: [Xen-devel] Real-mode bug with AMD, gPXE, and 32-bit rep movs

At 14:43 +0000 on 26 Mar (1238078627), Keir Fraser wrote:
> Your assertion that it causes GP on Intel is weird. We should be running in
> the emulator already since for the movs to 0x200000 to work we must be
> running in big real mode (i.e., one of the segment registers has a limit
> greater than 0xffff) and so we cannot be emulating that by running the guest
> in vm86 mode.

We do use vm86 mode for big-real-mode; we just clip the segment limits
to 16 bits and carry on, since almost all instructions don't use the big
segments.  Then when we take a fault for the A32 REP MOVS with the >16-bit
offset we go into the emulator and it does the right thing.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]

George Dunlap

2009-Mar-26 15:15 UTC

Re: [Xen-devel] Real-mode bug with AMD, gPXE, and 32-bit rep movs

Keir Fraser wrote:
> On 26/03/2009 12:25, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
>
>> There are three possibilities I came up with:
>> 1) The same thing would happen outside of SVM; in which case it's
>> (sort of) a gPXE bug for using an instruction that won't work on AMD
>> boxes.
>> 2) Xen is subtly screwing up the VM state, causing the AMD hardware
>> not to recognize that this shouldn't cause a #GP
>> 3) AMD hardware (at least some of it) doesn't handle 32-bit rep movs
>> instructions in 16-bit mode.
>
> It must surely be a Xen bug. Doing 32-bit ops in 16-bit mode is a completely
> standard thing that all processors will support. The other alternative is
> perhaps we have somehow managed to build ourselves a bogus gpxe image.

For #3, I meant that perhaps the AMD hardware didn't handle it properly
in non-root mode (as opposed to #1, which suggested it may not work on
AMD hardware at all).  I'm not that familiar with this level of the x86
architecture at all, so I'll take your word for it. :-)

> Your assertion that it causes GP on Intel is weird. We should be running in
> the emulator already since for the movs to 0x200000 to work we must be
> running in big real mode (i.e., one of the segment registers has a limit
> greater than 0xffff) and so we cannot be emulating that by running the guest
> in vm86 mode.

Maybe I wasn't clear, or didn't use the technical terms properly; in any
case, here's a trace from an Intel box about the code in question.  I
added some extra tracing to gather information about what happened in
the emulation.  You see:
* An io port write (the last thing before the instruction).
* An EXCEPTION_NMI exit at the code in question (cs=c900 eip=1cb, linear 
address = c91cb) caused by a trap 13 (GP fault)
* The emulator copies 1 page from c9000 to 200000
* Repeats for ca000 -> 201000

!  4.110129337 -x  vmentry
]  4.110130683 -x  vmexit exit_reason IO_INSTRUCTION eip 7b16
   4.110130683 -x io write port 981 val 40
   4.110133785 -x  runstate_change d2v0 running->offline
 [dom0 handles the io write]
   4.110142327 -x  runstate_change d2v0 runnable->running
!  4.110144371 -x  vmentry
]  4.110145950 -x  vmexit exit_reason EXCEPTION_NMI eip 1cb
   4.110145950 -x realmode (trap 13)
   4.110145950 -x rep_mov sseg 2 soff 0 dseg 3 doff 200000
   4.110145950 -x rep_mov2 saddr c9000 sgpa c9000 daddr 200000 dgpa 200000
]  4.110156960 -x  vmentry cycles 26424 !
]  4.110158295 -x  vmexit exit_reason EXCEPTION_NMI eip 1cb
   4.110158295 -x realmode (trap 13)
   4.110158295 -x rep_mov sseg 2 soff 1000 dseg 3 doff 201000
   4.110158295 -x rep_mov2 saddr ca000 sgpa ca000 daddr 201000 dgpa 201000
]  4.110162836 -x  vmentry cycles 10899 !
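
The trace shows the copy being handled one page per exit (c9000 -> 200000,
then ca000 -> 201000). As a rough sketch of that pattern only, and not of
Xen's actual emulator code, the following Python splits a copy so that no
chunk crosses a 4 KiB page boundary on either side. The name page_chunks
and the total length of 0x2000 are assumptions made for the example.

```python
PAGE_SIZE = 0x1000  # 4 KiB pages


def page_chunks(src: int, dst: int, length: int):
    """Yield (src, dst, nbytes) pieces that stay within one page each."""
    while length > 0:
        room_src = PAGE_SIZE - (src % PAGE_SIZE)
        room_dst = PAGE_SIZE - (dst % PAGE_SIZE)
        n = min(length, room_src, room_dst)
        yield src, dst, n
        src += n
        dst += n
        length -= n


# Two pages, starting at the addresses seen in the trace.
chunks = list(page_chunks(0xC9000, 0x200000, 0x2000))
for s, d, n in chunks:
    print(f"copy {n:#x} bytes {s:#x} -> {d:#x}")
```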

So it seems clear that:
* it was not in all-emulation mode
* it took a GP fault at that instruction
* it emulated it successfully. 
Is this not what's expected?

> I can give some help tracking this down when I'm back next week, if it's not
> resolved by then. It's also the sort of thing which may interest Tim Deegan,
> who has also worked on real mode support on the Intel side in the past.

Tim gave me a hand to get this far.  I'm going to try to get the rep
movs instruction into Gianluca's "xentest" framework when he comes back
next week, so we can isolate different variables better.

 -George

Christoph Egger

2009-Mar-26 16:24 UTC

Re: [Xen-devel] Real-mode bug with AMD, gPXE, and 32-bit rep movs

On Thursday 26 March 2009 16:15:06 George Dunlap wrote:
> Keir Fraser wrote:
> > On 26/03/2009 12:25, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
> >> There are three possibilities I came up with:
> >> 1) The same thing would happen outside of SVM; in which case it's
> >> (sort of) a gPXE bug for using an instruction that won't work on AMD
> >> boxes.
> >> 2) Xen is subtly screwing up the VM state, causing the AMD hardware
> >> not to recognize that this shouldn't cause a #GP

I think it's #2. Look at the #GP causes in APM
Volume 2 for MOVSx: the only one in real mode is if the address
exceeded a data segment limit.  And the comment from Deegan about
clipping segment limits to 16 bits makes me think that the clipping is
happening on AMD machines and it shouldn't be.

So probably, VMCB.DS.LIMIT is smaller than it should be. Note that
AMD requires the segment limit to be the effective limit and
the granularity segment attribute is ignored.
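
To make "effective limit" concrete: in a segment descriptor the limit is a
20-bit field, and the granularity (G) bit says whether it counts bytes or
4 KiB pages. A minimal Python sketch of the expansion, with a hypothetical
function name; this illustrates the general x86 rule and is not code from
Xen or from the VMCB handling.

```python
def effective_limit(raw_limit: int, granularity: bool) -> int:
    """Expand a 20-bit descriptor limit into a byte-granular limit."""
    raw_limit &= 0xFFFFF  # the descriptor field is 20 bits wide
    if granularity:
        # Page granular: scale by 4 KiB and fill the low 12 bits with ones.
        return (raw_limit << 12) | 0xFFF
    return raw_limit


print(hex(effective_limit(0xFFFFF, True)))   # 0xffffffff (4 GB)
print(hex(effective_limit(0x0FFFF, False)))  # 0xffff (64 KB)
```

A limit field that is stored already expanded has to hold the first kind
of value directly for a 4 GB segment.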

Christoph


-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632


Tim Deegan

2009-Mar-26 16:31 UTC

Re: [Xen-devel] Real-mode bug with AMD, gPXE, and 32-bit rep movs

At 16:24 +0000 on 26 Mar (1238084646), Christoph Egger wrote:
> I think it's #2. Look at the #GP causes in APM
> Volume 2 for MOVSx: the only one in real mode is if the address
> exceeded a data segment limit.  And the comment from Deegan about
> clipping segment limits to 16 bits makes me think that the clipping is
> happening on AMD machines and it shouldn't be.

That particular clipping happens in vmx.c so I certainly hope it doesn't
get called on AMD machines. :)  But yes, it's likely that some
big-real-mode segment state has got lost somewhere.

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]

George Dunlap

2009-Mar-30 15:02 UTC

Re: [Xen-devel] Real-mode bug with AMD, gPXE, and 32-bit rep movs

So it turns out this is actually a bug in gPXE, combined with a
limit-check bug in HVM emulation.  The segment limit was only set to
16 bits, but the code was clearly trying to use a 32-bit address, so
the AMD hardware was doing exactly what it should have done.

The reason it worked on an Intel box was that the hvm address checking
logic doesn't do *any* limit checking when in real mode.  If you add a
real-mode segment limit check in hvm.c:hvm_virtual_to_linear_addr(),
then the VM has the exact same issue.
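
A simplified model of the difference, in Python. This is only a sketch
under stated assumptions: the function virtual_to_linear and the exception
class are hypothetical, it is not the signature or logic of
hvm_virtual_to_linear_addr(), and expand-down segments are ignored.

```python
class GeneralProtectionFault(Exception):
    """Stand-in for a #GP raised on a segment-limit violation."""


def virtual_to_linear(base: int, limit: int, offset: int, size: int,
                      check_limit: bool) -> int:
    """Translate seg:offset to a linear address, optionally checking the limit."""
    last_byte = offset + size - 1
    if check_limit and last_byte > limit:
        raise GeneralProtectionFault(
            f"offset {last_byte:#x} exceeds limit {limit:#x}")
    return (base + offset) & 0xFFFFFFFF


# No limit check (the emulation path described above): the access goes through.
print(hex(virtual_to_linear(0x0, 0xFFFF, 0x200000, 1, check_limit=False)))

# With a 64 KB limit enforced (what the AMD hardware did): #GP.
try:
    virtual_to_linear(0x0, 0xFFFF, 0x200000, 1, check_limit=True)
except GeneralProtectionFault as exc:
    print("#GP:", exc)

# With a 4 GB limit the same access passes the check.
print(hex(virtual_to_linear(0x0, 0xFFFFFFFF, 0x200000, 1, check_limit=True)))
```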

So one thing is clear, we need to re-compile the gPXE "binary" that
comes in Xen so that it doesn't violate segment limits.

We might think about checking some limits in real mode as well, just
for good measure.

 -George

On Thu, Mar 26, 2009 at 5:31 PM, Tim Deegan <Tim.Deegan@citrix.com> wrote:
> At 16:24 +0000 on 26 Mar (1238084646), Christoph Egger wrote:
>> I think it's #2. Look at the #GP causes in APM
>> Volume 2 for MOVSx: the only one in real mode is if the address
>> exceeded a data segment limit.  And the comment from Deegan about
>> clipping segment limits to 16 bits makes me think that the clipping is
>> happening on AMD machines and it shouldn't be.
>
> That particular clipping happens in vmx.c so I certainly hope it doesn't
> get called on AMD machines. :)  But yes, it's likely that some
> big-real-mode segment state has got lost somewhere.

Keir Fraser

2009-Mar-31 07:21 UTC

[Xen-devel] Re: Real-mode bug with AMD, gPXE, and 32-bit rep movs

On 31/03/2009 04:56, "Michael Brown" <mcb30@etherboot.org> wrote:
> It's a bug in the BIOS; it should either not advertise PMM, or it should
> follow the PMM spec and call the initialisation entry point in flat real
> mode.

Thanks Michael. I'll get this fixed.

 -- Keir



Keir Fraser

2009-Mar-31 13:06 UTC

[Xen-devel] Re: Real-mode bug with AMD, gPXE, and 32-bit rep movs

On 31/03/2009 04:56, "Michael Brown" <mcb30@etherboot.org> wrote:
> It's a bug in the BIOS; it should either not advertise PMM, or it should
> follow the PMM spec and call the initialisation entry point in flat real
> mode.

George, can you please test with xen-unstable:19477. This should give all
our real-mode segments 4GB limits, and thus fix our PMM support.

 -- Keir



George Dunlap

2009-Mar-31 13:21 UTC

Re: [Xen-devel] Re: Real-mode bug with AMD, gPXE, and 32-bit rep movs

That seems to have done it.  Now I can go back to what I was going to
do in the first place, which is test my new scheduler on an 8-core
system... :-)

 -George

On Tue, Mar 31, 2009 at 2:06 PM, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 31/03/2009 04:56, "Michael Brown" <mcb30@etherboot.org> wrote:
>
>> It's a bug in the BIOS; it should either not advertise PMM, or it should
>> follow the PMM spec and call the initialisation entry point in flat real
>> mode.
>
> George, can you please test with xen-unstable:19477. This should give all
> our real-mode segments 4GB limits, and thus fix our PMM support.
>
>  -- Keir