Hi,

While changing our Xen 3.2.x-based HVM BIOS ROM to use gPXE instead of Etherboot, I ran into an interesting behavior. The gPXE code, which runs in real mode, contains the following sequence:

    wait_for_tick:
            pushl   %eax
            pushw   %fs
            movw    $0x40, %ax          /* BIOS data area segment */
            movw    %ax, %fs
            movl    %fs:(0x6c), %eax    /* current timer tick count (0040:006C) */
    1:      pushf                       /* save FLAGS, including IF */
            sti                         /* set IF; interrupts recognized after the next insn */
            hlt                         /* sleep until an interrupt arrives */
            popf                        /* restore the caller's IF */
            cmpl    %fs:(0x6c), %eax    /* has the tick count changed? */
            je      1b                  /* no: wait again */
            popw    %fs
            popl    %eax
            ret

It uses this to time out while waiting for a key press. The expected interrupt is from the BIOS timer implemented in rombios. In practice, though, the loop hangs. However, if I insert a nop instruction between the sti and hlt, things work as expected.

Is there something wrong with this sequence? This happens on AMD, so it's not a quirk of the real-mode emulation on Intel.

I notice that in the gPXE code currently in xen-unstable, the path that uses this code is patched out.

/gary
--
Gary Grebus
Virtual Iron Software, Inc.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Gary,

Which CPU family are you using? 0xF? There is an erratum which seems to be related. See page 50:
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/33610.pdf

-Wei

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Gary Grebus
Sent: Tuesday, November 18, 2008 12:50 PM
To: xen-devel
Subject: [Xen-devel] Problem with BIOS timer interrupts
On 18/11/08 18:50, "Gary Grebus" <ggrebus@virtualiron.com> wrote:

> It uses this to timeout waiting for a key press. The expected interrupt
> is from the BIOS timer implemented in rombios. But in fact, the loop
> hangs. However, if I insert a nop instruction between the sti and hlt,
> then things work as expected.
>
> Is there something wrong with this sequence? This happens on AMD, so
> it's not a quirk of the real mode emulations on Intel.
>
> I notice that in the gPXE code currently in xen-unstable, the path that
> uses this code is patched out.

As a data point, I commented it out because the delay's annoying rather than because it caused a boot hang for me. I was testing on Intel, though.

Inserting the nop is obviously bogus (I expect you're aware of that :-), since it opens a window for a wakeup-waiting race. That it fixes this issue is very weird. I expect we have some issue to do with leaving an interrupt shadow during HLT emulation; why this would only trigger in real mode I cannot guess.

Wei's erratum is not applicable, for three reasons:
 1. We disable C1 clock ramping.
 2. We always intercept HLT.
 3. STI; HLT is a standard x86 idiom used in all OSes, and this is the only place we're seeing a problem. Also, the erratum would lead to rare, non-deterministic hangs, not a hang every time (which is what you're seeing?).

I would say it's a good idea to see if you can repro this on xen-unstable.

 -- Keir
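Keir's objection to the nop rests on the STI interrupt shadow: when STI sets IF from 0, the CPU does not recognize maskable interrupts until the instruction after STI has executed. In STI; HLT, a pending interrupt can therefore only be taken once HLT has begun, so it wakes the HLT. The following is a minimal toy model in C (all names are hypothetical; this is not Xen, gPXE, or rombios code). It assumes one timer interrupt is already pending when the sequence starts and that no further interrupt ever arrives, which makes a lost wakeup visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical; not Xen, gPXE, or rombios code) of the x86
 * STI interrupt shadow. One timer interrupt is latched before the
 * sequence starts, and no further interrupt ever arrives. */

enum op { OP_STI, OP_NOP, OP_HLT };

struct cpu {
    bool if_flag;     /* EFLAGS.IF */
    bool shadow;      /* set by STI: blocks delivery for one instruction */
    bool irq_pending; /* the single latched timer interrupt */
    unsigned ticks;   /* stands in for the BDA tick count at 0040:006C */
};

/* Deliver the pending interrupt if this instruction boundary allows it. */
static bool try_deliver(struct cpu *c)
{
    if (c->irq_pending && c->if_flag && !c->shadow) {
        c->irq_pending = false;
        c->ticks++;   /* the timer handler bumps the tick count */
        return true;
    }
    return false;
}

/* Returns true if HLT is woken by the interrupt; false if the interrupt
 * was consumed before HLT (leaving the CPU halted with nothing to wake
 * it in this model) or if the sequence contains no HLT. */
static bool hlt_wakes(const enum op *prog, int n)
{
    struct cpu c = { .if_flag = false, .shadow = false,
                     .irq_pending = true, .ticks = 0 };

    assert(n > 0);
    for (int i = 0; i < n; i++) {
        try_deliver(&c);               /* boundary before instruction i */
        switch (prog[i]) {
        case OP_STI:
            c.shadow = !c.if_flag;     /* shadow only if IF was clear */
            c.if_flag = true;
            break;
        case OP_NOP:
            c.shadow = false;          /* shadow expires after one insn */
            break;
        case OP_HLT:
            c.shadow = false;
            return try_deliver(&c);    /* wake only on a deliverable irq */
        }
    }
    return false;
}
```

With { OP_STI, OP_HLT } the interrupt is held off until HLT begins and wakes it; with { OP_STI, OP_NOP, OP_HLT } it is taken during the nop, and HLT then waits for another interrupt. In the real wait_for_tick loop the next periodic tick (roughly every 55 ms on a standard PC BIOS) would still wake the CPU, so the race shows up as an occasional extra tick of delay rather than a hang. The model says nothing about why the version without the nop hung under emulation.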
On Tue, 2008-11-18 at 20:02 +0000, Keir Fraser wrote:
> On 18/11/08 18:50, "Gary Grebus" <ggrebus@virtualiron.com> wrote:
>
> > It uses this to timeout waiting for a key press. The expected interrupt
> > is from the BIOS timer implemented in rombios. But in fact, the loop
> > hangs. However, if I insert a nop instruction between the sti and hlt,
> > then things work as expected.
> >
> > Is there something wrong with this sequence? This happens on AMD, so
> > it's not a quirk of the real mode emulations on Intel.

Interestingly, the same problem and "fix" apply on Intel under vmxassist. But likely nobody cares about that anymore (and gPXE has other problems with vmxassist).

> >
> > I notice that in the gPXE code currently in xen-unstable, the path that
> > uses this code is patched out.
>
> As a data point, I commented it out because the delay's annoying rather than
> because it caused a boot hang for me. I was testing on Intel though.
>
> Inserting the nop is obviously bogus (I expect you're aware of that :-),
> since it raises the opportunity of a wakeup-waiting race. That it fixes this
> issue is very weird. I expect we have some issue to do with leaving an
> interrupt shadow during HLT emulation -- why this would only trigger in real
> mode I cannot guess.
>
> Wei's erratum is not applicable, for three reasons:
> 1. We disable C1 clock ramping
> 2. We always intercept HLT
> 3. STI; HLT is a standard x86 idiom used in all OSes, and this is the only
> place we're seeing a problem. Also the erratum would lead to rare
> non-deterministic hangs, not a hang every time (which is what you're
> seeing?).

OK. It does appear to be a family 0xF processor. dom0 says:

    processor   : 0
    vendor_id   : AuthenticAMD
    cpu family  : 15
    model       : 65
    model name  : Dual-Core AMD Opteron(tm) Processor 2216
    stepping    : 2
    cpu MHz     : 2394.000
    cache size  : 1024 KB

>
> I would say it's a good idea to see if you can repro this on xen-unstable.

OK...
My usual xen-unstable setup is out of commission at the moment, but I will try to reproduce it.

/gary
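Wei's "family 0xF?" and the cpuinfo output above are two views of the same CPUID data: /proc/cpuinfo reports, in decimal, the family and model decoded from the EAX value of CPUID leaf 1. As a minimal sketch, assuming AMD's documented rule that the extended family and extended model fields are folded in only when the base family is 0xF, the decoding looks like this (the function and struct names are hypothetical).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode family/model/stepping from CPUID leaf 1
 * EAX using AMD's rule that the extended fields apply only when the
 * base family is 0xF. */
struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

static struct cpu_sig decode_amd_sig(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xFu;   /* bits 11:8  */
    unsigned base_model  = (eax >> 4) & 0xFu;   /* bits 7:4   */
    unsigned ext_family  = (eax >> 20) & 0xFFu; /* bits 27:20 */
    unsigned ext_model   = (eax >> 16) & 0xFu;  /* bits 19:16 */
    struct cpu_sig s;

    s.stepping = eax & 0xFu;                    /* bits 3:0   */
    if (base_family == 0xF) {
        s.family = base_family + ext_family;
        s.model  = (ext_model << 4) | base_model;
    } else {
        s.family = base_family;
        s.model  = base_model;
    }
    return s;
}
```

For example, decode_amd_sig(0x00040F12) yields family 15 (0xF), model 65 (0x41), and stepping 2, matching the dom0 output. That EAX value is reconstructed from the reported fields rather than read from the machine in question.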
On 18/11/08 20:28, "Gary Grebus" <ggrebus@virtualiron.com> wrote:

>> I would say it's a good idea to see if you can repro this on xen-unstable.
>
> OK... My usual xen-unstable setup is out of commission at the moment,
> but I will try to reproduce it.

Another approach, if this really is happening every time, would be to trace the hell out of HLT emulation and interrupt delivery with printk. Since this happens so early, you shouldn't end up overwhelmed with trace data. Perhaps you can narrow down the problem that way...

 -- Keir
On Tue, 2008-11-18 at 22:13 +0000, Keir Fraser wrote:
> On 18/11/08 20:28, "Gary Grebus" <ggrebus@virtualiron.com> wrote:
>
> >> I would say it's a good idea to see if you can repro this on xen-unstable.
> >
> > OK... My usual xen-unstable setup is out of commission at the moment,
> > but I will try to reproduce it.
>
> Another approach, if this really is happening every time, would be to trace
> the hell out of HLT emulation and interrupt delivery with printk. Since this
> happens so early, you shouldn't end up overwhelmed with trace data. Perhaps
> you can narrow down the problem that way...

Well, I can't reproduce this even in my own setup on AMD. All the failures must have really been on Intel with vmxassist (or in my imagination). Sorry for the noise.

/gary
--
Gary Grebus
Virtual Iron Software, Inc.