thr3ads.net - Xen devel - guest crash in wrmsr_hypervisor_regs if hypercall page is paged out [Apr 2013]


Olaf Hering

2013-Apr-30 18:19 UTC

guest crash in wrmsr_hypervisor_regs if hypercall page is paged out

With current xen-unstable I see this guest crash if the gfn 169ff is
paged out. The xenpaging -v output shows that 169ff is populated, but
apparently wrmsr_hypervisor_regs does not like the resulting mfn?!

...
(XEN) HVM10: HVM Loader
(XEN) HVM10: Detected Xen v4.3.26939-20130430
(XEN) HVM10: Xenbus rings @0xfeffc000, event channel 6
(XEN) HVM10: System requested SeaBIOS
(XEN) HVM10: CPU speed is 2926 MHz
(XEN) irq.c:270: Dom10 PCI link 0 changed 0 -> 5
(XEN) HVM10: PCI-ISA link 0 routed to IRQ5
(XEN) irq.c:270: Dom10 PCI link 1 changed 0 -> 10
(XEN) HVM10: PCI-ISA link 1 routed to IRQ10
(XEN) irq.c:270: Dom10 PCI link 2 changed 0 -> 11
(XEN) HVM10: PCI-ISA link 2 routed to IRQ11
(XEN) irq.c:270: Dom10 PCI link 3 changed 0 -> 5
(XEN) HVM10: PCI-ISA link 3 routed to IRQ5
(XEN) HVM10: pci dev 01:2 INTD->IRQ5
(XEN) HVM10: pci dev 01:3 INTA->IRQ10
(XEN) HVM10: pci dev 03:0 INTA->IRQ5
(XEN) HVM10: pci dev 02:0 bar 10 size lx: 02000000
(XEN) HVM10: pci dev 03:0 bar 14 size lx: 01000000
(XEN) HVM10: pci dev 02:0 bar 30 size lx: 00010000
(XEN) HVM10: pci dev 02:0 bar 14 size lx: 00001000
(XEN) HVM10: pci dev 03:0 bar 10 size lx: 00000100
(XEN) HVM10: pci dev 01:2 bar 20 size lx: 00000020
(XEN) HVM10: pci dev 01:1 bar 20 size lx: 00000010
(XEN) HVM10: Multiprocessor initialisation:
(XEN) HVM10:  - CPU0 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM10:  - CPU1 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM10:  - CPU2 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM10:  - CPU3 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM10: Testing HVM environment:
(XEN) HVM10:  - REP INSB across page boundaries ... passed
(XEN) HVM10:  - GS base MSRs and SWAPGS ... passed
(XEN) HVM10: Passed 2 of 2 tests
(XEN) HVM10: Writing SMBIOS tables ...
(XEN) HVM10: Loading SeaBIOS ...
(XEN) HVM10: Creating MP tables ...
(XEN) HVM10: Loading ACPI ...
(XEN) HVM10: vm86 TSS at fc00a100
(XEN) HVM10: BIOS map:
(XEN) HVM10:  10000-100d3: Scratch space
(XEN) HVM10:  e0000-fffff: Main BIOS
(XEN) HVM10: E820 table:
(XEN) HVM10:  [00]: 00000000:00000000 - 00000000:000a0000: RAM
(XEN) HVM10:  HOLE: 00000000:000a0000 - 00000000:000e0000
(XEN) HVM10:  [01]: 00000000:000e0000 - 00000000:00100000: RESERVED
(XEN) HVM10:  [02]: 00000000:00100000 - 00000000:16a00000: RAM
(XEN) HVM10:  HOLE: 00000000:16a00000 - 00000000:fc000000
(XEN) HVM10:  [03]: 00000000:fc000000 - 00000001:00000000: RESERVED
(XEN) HVM10: Invoking SeaBIOS ...
(XEN) HVM10: SeaBIOS (version ?-20130430_174224-bax)
(XEN) HVM10:
(XEN) HVM10: Found Xen hypervisor signature at 40000000
(XEN) HVM10: xen: copy e820...
(XEN) HVM10: Ram Size=0x16a00000 (0x0000000000000000 high)
(XEN) HVM10: Relocating low data from 0x000e2490 to 0x000ef790 (size 2156)
(XEN) HVM10: Relocating init from 0x000e2cfc to 0x169e20f0 (size 56804)
(XEN) HVM10: CPU Mhz=2928
(XEN) HVM10: Found 7 PCI devices (max PCI bus is 00)
(XEN) HVM10: Allocated Xen hypercall page at 169ff000
(XEN) traps.c:654:d10 Bad GMFN 169ff (MFN 3e900000000) to MSR 40000000
(XEN) HVM10: Detected Xen v4.3
(XEN) io.c:201:d10 MMIO emulation failed @ 0008:c2c2c2c2: 18 7c 55 6d 03 83 ff ff 10 7c
(XEN) hvm.c:1253:d10 Triple fault on VCPU0 - invoking HVM shutdown action 1.
(XEN) HVM11: HVM Loader
(XEN) HVM11: Detected Xen v4.3.26939-20130430
...


The .cfg file looks like this:

name="12.3_full_default2"
uuid="3c8c0937-cb46-4fe9-a871-8e4c60ab8dfe"
memory=370
vcpus=4
serial="pty"
builder="hvm"
boot="dc"
disk=[
        'file:/some.vdisk,hda,w',
        'file:/some.iso,hdc:cdrom,r',
]
vif=[
        'bridge=br0,type=netfront'
]
vfb = [
        'type=vnc,vncunused=1,keymap=de'
]
usb=1
usbdevice='tablet'

I'm using "xl -v -v create -d -p -f domU.cfg" to start it, then run
xenpaging manually, then unpause the guest.


Olaf

Tim Deegan

2013-May-02 11:20 UTC

Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out

At 20:19 +0200 on 30 Apr (1367353157), Olaf Hering wrote:
> With current xen-unstable I see this guest crash if the gfn 169ff is
> paged out. The xenpaging -v output shows that 169ff is populated, but
> apparently wrmsr_hypervisor_regs does not like the resulting mfn?!

Looks that way:

> (XEN) HVM10: Allocated Xen hypercall page at 169ff000
> (XEN) traps.c:654:d10 Bad GMFN 169ff (MFN 3e900000000) to MSR 40000000

That MFN looks like garbage, so I'm guessing that 'page' was null, i.e.
get_page_from_gfn() returned NULL.  I guess you'll need to instrument it
up to figure out why.  At least the GFN is a predictable constant which
should make it easier to add debugging printout for just this case.

Tim.
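
The "garbage" MFN is consistent with a NULL page pointer being converted to a frame number by pointer arithmetic against the frame table. A minimal sketch of that arithmetic follows. The frame table base address and the 32-byte page_info size are assumptions, chosen here because they reproduce the logged value; they are not stated anywhere in this thread, and the helper name page_ptr_to_mfn is hypothetical. Real Xen's page_to_mfn() also applies PDX translation, which this sketch ignores.

```c
#include <assert.h>   /* only needed if you add the checks from the note below */
#include <stdint.h>

/* ASSUMPTION: virtual address where the frame table starts. */
#define FRAMETABLE_BASE 0xffff82e000000000ULL
/* ASSUMPTION: size in bytes of one frame table entry. */
#define PAGE_INFO_SIZE  32ULL

/*
 * Hypothetical stand-in for a page-pointer-to-MFN conversion:
 * the index of the entry within the frame table.  Unsigned 64-bit
 * arithmetic is used so that a NULL (zero) pointer wraps around
 * instead of invoking undefined behaviour.
 */
static uint64_t page_ptr_to_mfn(uint64_t page_ptr)
{
    return (page_ptr - FRAMETABLE_BASE) / PAGE_INFO_SIZE;
}
```

Under these assumptions, page_ptr_to_mfn(0) evaluates to 0x3e900000000, which matches the "MFN 3e900000000" in the log line, while a pointer to the fifth entry after the base yields 5 as expected.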

Olaf Hering

2013-May-02 14:43 UTC

Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out

On Thu, May 02, Tim Deegan wrote:
> At 20:19 +0200 on 30 Apr (1367353157), Olaf Hering wrote:
> > 
> > With current xen-unstable I see this guest crash if the gfn 169ff is
> > paged out. The xenpaging -v output shows that 169ff is populated, but
> > apparently wrmsr_hypervisor_regs does not like the resulting mfn?!
> 
> Looks that way:
> 
> > (XEN) HVM10: Allocated Xen hypercall page at 169ff000
> > (XEN) traps.c:654:d10 Bad GMFN 169ff (MFN 3e900000000) to MSR 40000000
> 
> That MFN looks like garbage, so I'm guessing that 'page' was null, i.e.
> get_page_from_gfn() returned NULL.  I guess you'll need to instrument it
> up to figure out why.  At least the GFN is a predictable constant which
> should make it easier to add debugging printout for just this case.

The GMFN has p2m type p2m_ram_paged, so the mfn is -1.

It's not clear to me: how should wrmsr_hypervisor_regs handle a paged
gfn? I was under the impression that get_page_from_gfn would wait until
the gfn is paged-in again.

Olaf

Tim Deegan

2013-May-02 14:52 UTC

Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out

At 16:43 +0200 on 02 May (1367512981), Olaf Hering wrote:
> On Thu, May 02, Tim Deegan wrote:
> 
> > At 20:19 +0200 on 30 Apr (1367353157), Olaf Hering wrote:
> > > 
> > > With current xen-unstable I see this guest crash if the gfn 169ff is
> > > paged out. The xenpaging -v output shows that 169ff is populated, but
> > > apparently wrmsr_hypervisor_regs does not like the resulting mfn?!
> > 
> > Looks that way:
> > 
> > > (XEN) HVM10: Allocated Xen hypercall page at 169ff000
> > > (XEN) traps.c:654:d10 Bad GMFN 169ff (MFN 3e900000000) to MSR 40000000
> > 
> > That MFN looks like garbage, so I'm guessing that 'page' was null, i.e.
> > get_page_from_gfn() returned NULL.  I guess you'll need to instrument it
> > up to figure out why.  At least the GFN is a predictable constant which
> > should make it easier to add debugging printout for just this case.
> 
> The GMFN has p2m_t p2m_ram_paged, so the mfn is -1.
> 
> Its not clear to me, how should wrmsr_hypervisor_regs handle a paged
> gfn? I was under the impression that get_page_from_gfn would wait until
> the gfn is paged-in again.

Ah, it doesn't seem to be that way.  Other callers of the p2m functions
handle this in the caller. :(

So you'll need to add something like:
    if ( paged )
        p2m_mem_paging_populate(d, gmfn);

here (and anywhere else).

It would be much better if this could happen inside the p2m lookup
function, but ISTR it currently can't because you can't sleep with any
locks held.

Tim.

Jan Beulich

2013-May-02 14:58 UTC

Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out

>>> On 02.05.13 at 16:43, Olaf Hering <olaf@aepfle.de> wrote:
> On Thu, May 02, Tim Deegan wrote:
> 
>> At 20:19 +0200 on 30 Apr (1367353157), Olaf Hering wrote:
>> > 
>> > With current xen-unstable I see this guest crash if the gfn 169ff is
>> > paged out. The xenpaging -v output shows that 169ff is populated, but
>> > apparently wrmsr_hypervisor_regs does not like the resulting mfn?!
>> 
>> Looks that way:
>> 
>> > (XEN) HVM10: Allocated Xen hypercall page at 169ff000
>> > (XEN) traps.c:654:d10 Bad GMFN 169ff (MFN 3e900000000) to MSR 40000000
>> 
>> That MFN looks like garbage, so I'm guessing that 'page' was null, i.e.
>> get_page_from_gfn() returned NULL.  I guess you'll need to instrument it
>> up to figure out why.  At least the GFN is a predictable constant which
>> should make it easier to add debugging printout for just this case.
> 
> The GMFN has p2m_t p2m_ram_paged, so the mfn is -1.
> 
> Its not clear to me, how should wrmsr_hypervisor_regs handle a paged
> gfn? I was under the impression that get_page_from_gfn would wait until
> the gfn is paged-in again.

We can't put a vCPU to sleep at arbitrary points yet, which means
that right now the caller of the function is responsible for the
wait-and-retry - normally that would be in hypercall handlers, but
obviously you need this here too.

Jan
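
The caller-side pattern described above (lookup fails because the gfn is paged out, the caller asks for it to be paged back in, then reports "retry" instead of failing) can be sketched as a small self-contained model. This is not Xen code: every toy_ and TOY_ name below is hypothetical and only stands in for get_page_from_gfn(), p2m_mem_paging_populate(), p2m_ram_paged and the X86EMUL_OKAY/X86EMUL_RETRY return values mentioned in the thread, and the pager is simulated by a plain function call.

```c
#include <assert.h>   /* only needed if you add checks like those in the note below */
#include <stddef.h>

/* Toy stand-ins for the p2m type and the emulation return codes. */
enum toy_p2m_type { TOY_RAM_RW, TOY_RAM_PAGED };
enum toy_rc { TOY_OKAY, TOY_RETRY, TOY_FAIL };

/* State of a single guest frame in this model. */
struct toy_gfn_state {
    enum toy_p2m_type type;
    int populate_requested;      /* set when page-in has been requested */
    unsigned char frame[16];     /* backing memory once paged in */
};

/* Lookup: returns NULL while the frame is paged out, and never sleeps. */
static unsigned char *toy_get_page(struct toy_gfn_state *g)
{
    return g->type == TOY_RAM_RW ? g->frame : NULL;
}

/* Ask the (simulated) pager to bring the frame back. */
static void toy_paging_populate(struct toy_gfn_state *g)
{
    g->populate_requested = 1;
}

/* Simulated pager: completes any outstanding page-in request. */
static void toy_pager_run(struct toy_gfn_state *g)
{
    if (g->populate_requested) {
        g->type = TOY_RAM_RW;
        g->populate_requested = 0;
    }
}

/* Caller of the lookup: it, not the lookup, handles the paged-out case. */
static enum toy_rc toy_write_hypercall_page(struct toy_gfn_state *g)
{
    unsigned char *page = toy_get_page(g);

    if (page == NULL) {
        if (g->type == TOY_RAM_PAGED) {
            toy_paging_populate(g);
            return TOY_RETRY;    /* caller re-executes the write later */
        }
        return TOY_FAIL;         /* genuinely bad frame */
    }

    page[0] = 0xc3;              /* placeholder for filling in the page */
    return TOY_OKAY;
}
```

In this model the first call on a paged-out frame returns TOY_RETRY and records the populate request; once toy_pager_run() has completed it, a second call returns TOY_OKAY and the frame has been written.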

Olaf Hering

2013-May-02 15:20 UTC

Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out

On Thu, May 02, Jan Beulich wrote:
> We can't put a vCPU to sleep at arbitrary points yet, which means
> that right now the caller of the function is responsible for the
> wait-and-retry - normally that would be in hypercall handlers, but
> obviously you need this here too.

Yes, that's the issue.

vmx_msr_write_intercept and svm_msr_write_intercept could just return
X86EMUL_RETRY to their callers.

How should emulate_privileged_op handle the wrmsr_hypervisor_regs
failure due to a paged page?

Olaf

Jan Beulich

2013-May-02 15:29 UTC

Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out

>>> On 02.05.13 at 17:20, Olaf Hering <olaf@aepfle.de> wrote:
> On Thu, May 02, Jan Beulich wrote:
> 
>> We can't put a vCPU to sleep at arbitrary points yet, which means
>> that right now the caller of the function is responsible for the
>> wait-and-retry - normally that would be in hypercall handlers, but
>> obviously you need this here too.
> 
> Yes, thats the issue.
> 
> vmx_msr_write_intercept and svm_msr_write_intercept could just return
> X86EMUL_RETRY to their callers.
> 
> How should emulate_privileged_op handle the wrmsr_hypervisor_regs
> failure due to a paged page?

That's a PV-only path, hence no need to consider paging. Just
assert that the return value is X86EMUL_OKAY.

Jan

Olaf Hering

2013-May-02 17:46 UTC

Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out

On Thu, May 02, Jan Beulich wrote:
> >>> On 02.05.13 at 17:20, Olaf Hering <olaf@aepfle.de> wrote:
> > On Thu, May 02, Jan Beulich wrote:
> > 
> >> We can't put a vCPU to sleep at arbitrary points yet, which means
> >> that right now the caller of the function is responsible for the
> >> wait-and-retry - normally that would be in hypercall handlers, but
> >> obviously you need this here too.
> > 
> > Yes, thats the issue.
> > 
> > vmx_msr_write_intercept and svm_msr_write_intercept could just return
> > X86EMUL_RETRY to their callers.
> > 
> > How should emulate_privileged_op handle the wrmsr_hypervisor_regs
> > failure due to a paged page?
> 
> That's a PV only path, hence no need to consider paging. Just
> assert that the return value of X86EMUL_OKAY.

I sent a patch which fixes this issue for me. The 4.2 branch
apparently has the same issue.

Olaf
