Olaf Hering
2013-Apr-30 18:19 UTC
guest crash in wrmsr_hypervisor_regs if hypercall page is paged out
With current xen-unstable I see this guest crash if the gfn 169ff is
paged out. The xenpaging -v output shows that 169ff is populated, but
apparently wrmsr_hypervisor_regs does not like the resulting mfn?!

...
(XEN) HVM10: HVM Loader
(XEN) HVM10: Detected Xen v4.3.26939-20130430
(XEN) HVM10: Xenbus rings @0xfeffc000, event channel 6
(XEN) HVM10: System requested SeaBIOS
(XEN) HVM10: CPU speed is 2926 MHz
(XEN) irq.c:270: Dom10 PCI link 0 changed 0 -> 5
(XEN) HVM10: PCI-ISA link 0 routed to IRQ5
(XEN) irq.c:270: Dom10 PCI link 1 changed 0 -> 10
(XEN) HVM10: PCI-ISA link 1 routed to IRQ10
(XEN) irq.c:270: Dom10 PCI link 2 changed 0 -> 11
(XEN) HVM10: PCI-ISA link 2 routed to IRQ11
(XEN) irq.c:270: Dom10 PCI link 3 changed 0 -> 5
(XEN) HVM10: PCI-ISA link 3 routed to IRQ5
(XEN) HVM10: pci dev 01:2 INTD->IRQ5
(XEN) HVM10: pci dev 01:3 INTA->IRQ10
(XEN) HVM10: pci dev 03:0 INTA->IRQ5
(XEN) HVM10: pci dev 02:0 bar 10 size lx: 02000000
(XEN) HVM10: pci dev 03:0 bar 14 size lx: 01000000
(XEN) HVM10: pci dev 02:0 bar 30 size lx: 00010000
(XEN) HVM10: pci dev 02:0 bar 14 size lx: 00001000
(XEN) HVM10: pci dev 03:0 bar 10 size lx: 00000100
(XEN) HVM10: pci dev 01:2 bar 20 size lx: 00000020
(XEN) HVM10: pci dev 01:1 bar 20 size lx: 00000010
(XEN) HVM10: Multiprocessor initialisation:
(XEN) HVM10: - CPU0 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM10: - CPU1 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM10: - CPU2 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM10: - CPU3 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM10: Testing HVM environment:
(XEN) HVM10: - REP INSB across page boundaries ... passed
(XEN) HVM10: - GS base MSRs and SWAPGS ... passed
(XEN) HVM10: Passed 2 of 2 tests
(XEN) HVM10: Writing SMBIOS tables ...
(XEN) HVM10: Loading SeaBIOS ...
(XEN) HVM10: Creating MP tables ...
(XEN) HVM10: Loading ACPI ...
(XEN) HVM10: vm86 TSS at fc00a100
(XEN) HVM10: BIOS map:
(XEN) HVM10: 10000-100d3: Scratch space
(XEN) HVM10: e0000-fffff: Main BIOS
(XEN) HVM10: E820 table:
(XEN) HVM10: [00]: 00000000:00000000 - 00000000:000a0000: RAM
(XEN) HVM10: HOLE: 00000000:000a0000 - 00000000:000e0000
(XEN) HVM10: [01]: 00000000:000e0000 - 00000000:00100000: RESERVED
(XEN) HVM10: [02]: 00000000:00100000 - 00000000:16a00000: RAM
(XEN) HVM10: HOLE: 00000000:16a00000 - 00000000:fc000000
(XEN) HVM10: [03]: 00000000:fc000000 - 00000001:00000000: RESERVED
(XEN) HVM10: Invoking SeaBIOS ...
(XEN) HVM10: SeaBIOS (version ?-20130430_174224-bax)
(XEN) HVM10:
(XEN) HVM10: Found Xen hypervisor signature at 40000000
(XEN) HVM10: xen: copy e820...
(XEN) HVM10: Ram Size=0x16a00000 (0x0000000000000000 high)
(XEN) HVM10: Relocating low data from 0x000e2490 to 0x000ef790 (size 2156)
(XEN) HVM10: Relocating init from 0x000e2cfc to 0x169e20f0 (size 56804)
(XEN) HVM10: CPU Mhz=2928
(XEN) HVM10: Found 7 PCI devices (max PCI bus is 00)
(XEN) HVM10: Allocated Xen hypercall page at 169ff000
(XEN) traps.c:654:d10 Bad GMFN 169ff (MFN 3e900000000) to MSR 40000000
(XEN) HVM10: Detected Xen v4.3
(XEN) io.c:201:d10 MMIO emulation failed @ 0008:c2c2c2c2: 18 7c 55 6d 03 83 ff ff 10 7c
(XEN) hvm.c:1253:d10 Triple fault on VCPU0 - invoking HVM shutdown action 1.
(XEN) HVM11: HVM Loader
(XEN) HVM11: Detected Xen v4.3.26939-20130430
...

The .cfg file looks like this:

name="12.3_full_default2"
uuid="3c8c0937-cb46-4fe9-a871-8e4c60ab8dfe"
memory=370
vcpus=4
serial="pty"
builder="hvm"
boot="dc"
disk=[
    'file:/some.vdisk,hda,w',
    'file:/some.iso,hdc:cdrom,r',
]
vif=[ 'bridge=br0,type=netfront' ]
vfb = [ 'type=vnc,vncunused=1,keymap=de' ]
usb=1
usbdevice='tablet'

I'm using "xl -v -v create -d -p -f domU.cfg" to start it, then run
xenpaging manually, then unpause the guest.

Olaf
Tim Deegan
2013-May-02 11:20 UTC
Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out
At 20:19 +0200 on 30 Apr (1367353157), Olaf Hering wrote:
>
> With current xen-unstable I see this guest crash if the gfn 169ff is
> paged out. The xenpaging -v output shows that 169ff is populated, but
> apparently wrmsr_hypervisor_regs does not like the resulting mfn?!

Looks that way:

> (XEN) HVM10: Allocated Xen hypercall page at 169ff000
> (XEN) traps.c:654:d10 Bad GMFN 169ff (MFN 3e900000000) to MSR 40000000

That MFN looks like garbage, so I'm guessing that 'page' was null, i.e.
get_page_from_gfn() returned NULL. I guess you'll need to instrument it
up to figure out why. At least the GFN is a predictable constant which
should make it easier to add debugging printout for just this case.

Tim.
Olaf Hering
2013-May-02 14:43 UTC
Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out
On Thu, May 02, Tim Deegan wrote:

> At 20:19 +0200 on 30 Apr (1367353157), Olaf Hering wrote:
> >
> > With current xen-unstable I see this guest crash if the gfn 169ff is
> > paged out. The xenpaging -v output shows that 169ff is populated, but
> > apparently wrmsr_hypervisor_regs does not like the resulting mfn?!
>
> Looks that way:
>
> > (XEN) HVM10: Allocated Xen hypercall page at 169ff000
> > (XEN) traps.c:654:d10 Bad GMFN 169ff (MFN 3e900000000) to MSR 40000000
>
> That MFN looks like garbage, so I'm guessing that 'page' was null, i.e.
> get_page_from_gfn() returned NULL. I guess you'll need to instrument it
> up to figure out why. At least the GFN is a predictable constant which
> should make it easier to add debugging printout for just this case.

The GMFN has p2m_t p2m_ram_paged, so the mfn is -1.

It's not clear to me: how should wrmsr_hypervisor_regs handle a paged
gfn? I was under the impression that get_page_from_gfn would wait until
the gfn is paged in again.

Olaf
Tim Deegan
2013-May-02 14:52 UTC
Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out
At 16:43 +0200 on 02 May (1367512981), Olaf Hering wrote:
> On Thu, May 02, Tim Deegan wrote:
>
> > At 20:19 +0200 on 30 Apr (1367353157), Olaf Hering wrote:
> > >
> > > With current xen-unstable I see this guest crash if the gfn 169ff is
> > > paged out. The xenpaging -v output shows that 169ff is populated, but
> > > apparently wrmsr_hypervisor_regs does not like the resulting mfn?!
> >
> > Looks that way:
> >
> > > (XEN) HVM10: Allocated Xen hypercall page at 169ff000
> > > (XEN) traps.c:654:d10 Bad GMFN 169ff (MFN 3e900000000) to MSR 40000000
> >
> > That MFN looks like garbage, so I'm guessing that 'page' was null, i.e.
> > get_page_from_gfn() returned NULL. I guess you'll need to instrument it
> > up to figure out why. At least the GFN is a predictable constant which
> > should make it easier to add debugging printout for just this case.
>
> The GMFN has p2m_t p2m_ram_paged, so the mfn is -1.
>
> It's not clear to me: how should wrmsr_hypervisor_regs handle a paged
> gfn? I was under the impression that get_page_from_gfn would wait until
> the gfn is paged in again.

Ah, it doesn't seem to be that way. Other callers of the p2m functions
handle this in the caller. :( So you'll need to add something like:

    if ( paged )
        p2m_mem_paging_populate(d, gmfn);

here (and anywhere else). It would be much better if this could happen
inside the p2m lookup function, but ISTR it currently can't because you
can't sleep with any locks held.

Tim.
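
The caller-side handling Tim describes can be modelled in a few lines of standalone C. This is only a sketch and not Xen code: every `toy_` name, the struct layout, and the return codes are assumptions made for illustration. The one idea taken from the thread is that the lookup does not wait for a paged-out gfn, so the caller has to ask the pager to populate it and report that the operation must be retried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for Xen's p2m bookkeeping. */
#define TOY_INVALID_MFN (~0UL)            /* the "-1" mfn of a paged-out gfn */

enum toy_p2m_type { TOY_RAM_RW, TOY_RAM_PAGED };
enum toy_rc { TOY_OK, TOY_RETRY, TOY_BAD_GFN };

struct toy_p2m_entry {
    enum toy_p2m_type type;
    unsigned long mfn;
    bool populate_requested;              /* set once the pager was asked */
};

/* Lookup that never waits: a paged-out or invalid entry yields NULL. */
const struct toy_p2m_entry *toy_get_page(const struct toy_p2m_entry *e)
{
    if (e->type != TOY_RAM_RW || e->mfn == TOY_INVALID_MFN)
        return NULL;
    return e;
}

/* Ask the (imaginary) pager to bring the page back; completes later. */
void toy_paging_populate(struct toy_p2m_entry *e)
{
    e->populate_requested = true;
}

/* The pager finishing its work, called here by the test harness. */
void toy_pager_page_in(struct toy_p2m_entry *e, unsigned long mfn)
{
    assert(mfn != TOY_INVALID_MFN);
    e->type = TOY_RAM_RW;
    e->mfn = mfn;
    e->populate_requested = false;
}

/* Caller-side handling: populate on a paged gfn and tell the caller to retry. */
enum toy_rc toy_write_hypercall_page(struct toy_p2m_entry *e)
{
    if (toy_get_page(e) == NULL) {
        if (e->type == TOY_RAM_PAGED) {
            toy_paging_populate(e);
            return TOY_RETRY;             /* not an error, just not yet */
        }
        return TOY_BAD_GFN;               /* genuinely unusable gfn */
    }
    /* Real code would fill in the hypercall page here. */
    return TOY_OK;
}
```

In this model the first call on a paged-out entry returns `TOY_RETRY` and leaves a populate request behind; once the pager has run, the same call succeeds. Distinguishing "retry" from "bad gfn" is the design point: the crash in the log comes from treating the first case like the second.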
Jan Beulich
2013-May-02 14:58 UTC
Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out
>>> On 02.05.13 at 16:43, Olaf Hering <olaf@aepfle.de> wrote:
> On Thu, May 02, Tim Deegan wrote:
>
>> At 20:19 +0200 on 30 Apr (1367353157), Olaf Hering wrote:
>> >
>> > With current xen-unstable I see this guest crash if the gfn 169ff is
>> > paged out. The xenpaging -v output shows that 169ff is populated, but
>> > apparently wrmsr_hypervisor_regs does not like the resulting mfn?!
>>
>> Looks that way:
>>
>> > (XEN) HVM10: Allocated Xen hypercall page at 169ff000
>> > (XEN) traps.c:654:d10 Bad GMFN 169ff (MFN 3e900000000) to MSR 40000000
>>
>> That MFN looks like garbage, so I'm guessing that 'page' was null, i.e.
>> get_page_from_gfn() returned NULL. I guess you'll need to instrument it
>> up to figure out why. At least the GFN is a predictable constant which
>> should make it easier to add debugging printout for just this case.
>
> The GMFN has p2m_t p2m_ram_paged, so the mfn is -1.
>
> It's not clear to me: how should wrmsr_hypervisor_regs handle a paged
> gfn? I was under the impression that get_page_from_gfn would wait until
> the gfn is paged in again.

We can't put a vCPU to sleep at arbitrary points yet, which means
that right now the caller of the function is responsible for the
wait-and-retry - normally that would be in hypercall handlers, but
obviously you need this here too.

Jan
Olaf Hering
2013-May-02 15:20 UTC
Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out
On Thu, May 02, Jan Beulich wrote:

> We can't put a vCPU to sleep at arbitrary points yet, which means
> that right now the caller of the function is responsible for the
> wait-and-retry - normally that would be in hypercall handlers, but
> obviously you need this here too.

Yes, that's the issue.

vmx_msr_write_intercept and svm_msr_write_intercept could just return
X86EMUL_RETRY to their callers.

How should emulate_privileged_op handle the wrmsr_hypervisor_regs
failure due to a paged page?

Olaf
Jan Beulich
2013-May-02 15:29 UTC
Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out
>>> On 02.05.13 at 17:20, Olaf Hering <olaf@aepfle.de> wrote:
> On Thu, May 02, Jan Beulich wrote:
>
>> We can't put a vCPU to sleep at arbitrary points yet, which means
>> that right now the caller of the function is responsible for the
>> wait-and-retry - normally that would be in hypercall handlers, but
>> obviously you need this here too.
>
> Yes, that's the issue.
>
> vmx_msr_write_intercept and svm_msr_write_intercept could just return
> X86EMUL_RETRY to their callers.
>
> How should emulate_privileged_op handle the wrmsr_hypervisor_regs
> failure due to a paged page?

That's a PV only path, hence no need to consider paging. Just
assert that the return value is X86EMUL_OKAY.

Jan
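
The split discussed in the last two messages (HVM intercepts propagate a retry, the PV path only asserts) might look roughly like the following standalone model. It is a sketch under stated assumptions, not the patch that was eventually sent: all `toy_` identifiers, the enum values, and the single `hypercall_gfn_paged_out` flag are hypothetical. Only the MSR index 0x40000000 is taken from the log above.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_HYPERCALL_MSR 0x40000000u     /* "MSR 40000000" in the log */

/* Hypothetical result codes; the numeric values are not Xen's. */
enum toy_emul_rc  { TOY_EMUL_OKAY, TOY_EMUL_RETRY, TOY_EMUL_UNHANDLEABLE };
enum toy_wrmsr_rc { TOY_WRMSR_UNKNOWN, TOY_WRMSR_DONE, TOY_WRMSR_PAGED };

struct toy_domain {
    bool hypercall_gfn_paged_out;         /* only ever true for HVM guests */
};

/* Shared helper: reports a paged-out target instead of failing hard. */
enum toy_wrmsr_rc toy_wrmsr_hypervisor_regs(const struct toy_domain *d,
                                            unsigned int msr)
{
    if (msr != TOY_HYPERCALL_MSR)
        return TOY_WRMSR_UNKNOWN;
    if (d->hypercall_gfn_paged_out)
        return TOY_WRMSR_PAGED;
    return TOY_WRMSR_DONE;
}

/* HVM path: a paged-out page becomes "re-execute the instruction later". */
enum toy_emul_rc toy_hvm_msr_write_intercept(const struct toy_domain *d,
                                             unsigned int msr)
{
    switch (toy_wrmsr_hypervisor_regs(d, msr)) {
    case TOY_WRMSR_DONE:
        return TOY_EMUL_OKAY;
    case TOY_WRMSR_PAGED:
        return TOY_EMUL_RETRY;
    default:
        return TOY_EMUL_UNHANDLEABLE;
    }
}

/* PV path: paging does not apply, so the paged result is simply asserted away. */
enum toy_emul_rc toy_pv_wrmsr(const struct toy_domain *d, unsigned int msr)
{
    enum toy_wrmsr_rc rc = toy_wrmsr_hypervisor_regs(d, msr);

    assert(rc != TOY_WRMSR_PAGED);
    return rc == TOY_WRMSR_DONE ? TOY_EMUL_OKAY : TOY_EMUL_UNHANDLEABLE;
}
```

Keeping the paged-out case as a distinct result of the shared helper lets each caller choose its own policy: the HVM intercept converts it into a retry, while the PV caller documents through the assertion that it never expects to see it.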
Olaf Hering
2013-May-02 17:46 UTC
Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out
On Thu, May 02, Jan Beulich wrote:

> >>> On 02.05.13 at 17:20, Olaf Hering <olaf@aepfle.de> wrote:
> > On Thu, May 02, Jan Beulich wrote:
> >
> >> We can't put a vCPU to sleep at arbitrary points yet, which means
> >> that right now the caller of the function is responsible for the
> >> wait-and-retry - normally that would be in hypercall handlers, but
> >> obviously you need this here too.
> >
> > Yes, that's the issue.
> >
> > vmx_msr_write_intercept and svm_msr_write_intercept could just return
> > X86EMUL_RETRY to their callers.
> >
> > How should emulate_privileged_op handle the wrmsr_hypervisor_regs
> > failure due to a paged page?
>
> That's a PV only path, hence no need to consider paging. Just
> assert that the return value is X86EMUL_OKAY.

I sent a patch which fixes this issue for me.
The 4.2 branch apparently has the same issue.

Olaf