In a problem report for a non-Linux HVM guest with PV drivers that
was brought to our attention, an issue in the PV driver code caused
a XENMAPSPACE_shared_info operation to be issued in a way that
races with ongoing MMIO emulations on other vCPU-s targeting the
gPFN being changed.

The way MMIO emulation requiring a callout into qemu currently
works, handle_mmio() gets called twice for each such operation.
This clearly assumes that both invocations take consistent paths,
which is easily violated when the p2m type of the page being
operated on changes between the two invocations. In the case at
hand, the transition was from MMIO (not handled by any device) to
RAM, i.e. the second run through the emulator didn't even reach
the MMIO related code anymore, leaving the vCPU's io_state as
HVMIO_completed instead of HVMIO_none, confusing the _next_
invocation of the emulator and obfuscating the problem quite
significantly.

While I realize that the guest side problem makes it debatable
whether the situation really needs a hypervisor improvement, I'd
like cases like hot-added memory or devices to be considered here
too. In particular I wonder whether emulator state shouldn't be
preserved across the invocation of qemu, so that the second run
through it would neither risk looking at a different instruction
nor risk having the emulated instruction access other (guest)
physical memory - after all, real hardware wouldn't decode
instructions and evaluate operands more than once either.

Thanks for any thoughts on this,
Jan
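To make the failure mode concrete, here is a minimal compilable
sketch of the two-phase pattern. The io_state names mirror those
above, but the structure and the p2m type checks are simplified and
hypothetical, not the actual Xen code:

enum hvm_io_state { HVMIO_none, HVMIO_awaiting_completion, HVMIO_completed };
enum p2m_type { p2m_type_mmio_dm, p2m_type_ram };

struct vcpu_io { enum hvm_io_state io_state; };

/* Phase 1 decodes the instruction, finds the gPFN is qemu-handled
 * MMIO, and sends an ioreq; phase 2 re-runs the same instruction
 * through the emulator to consume qemu's response. */
static void handle_mmio_sketch(struct vcpu_io *io, enum p2m_type t)
{
    if (io->io_state == HVMIO_none) {
        if (t == p2m_type_mmio_dm)
            io->io_state = HVMIO_awaiting_completion;  /* ioreq sent */
        /* RAM accesses never reach the MMIO path */
    } else if (io->io_state == HVMIO_completed) {
        if (t == p2m_type_mmio_dm)
            io->io_state = HVMIO_none;  /* response consumed, clean */
        /* else: the gPFN became RAM in between, so this reset is
         * never reached; io_state stays HVMIO_completed and the
         * next emulation starts out confused */
    }
}

int main(void)
{
    struct vcpu_io io = { HVMIO_none };

    handle_mmio_sketch(&io, p2m_type_mmio_dm);  /* phase 1: ioreq sent */
    io.io_state = HVMIO_completed;              /* qemu responded      */
    /* meanwhile another vCPU issued XENMAPSPACE_shared_info... */
    handle_mmio_sketch(&io, p2m_type_ram);      /* phase 2: wrong path */
    /* io.io_state is now stuck at HVMIO_completed */
    return 0;
}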
On 02/07/2012 15:42, "Jan Beulich" <JBeulich@suse.com> wrote:
> In a problem report for a non-Linux HVM guest with PV drivers that
> was brought to our attention, an issue in the PV driver code caused
> a XENMAPSPACE_shared_info operation to be issued in a way that
> races with ongoing MMIO emulations on other vCPU-s targeting the
> gPFN being changed.
> [...]
> In particular I wonder whether emulator state shouldn't be
> preserved across the invocation of qemu, so that the second run
> through it would neither risk looking at a different instruction
> nor risk having the emulated instruction access other (guest)
> physical memory - after all, real hardware wouldn't decode
> instructions and evaluate operands more than once either.

I think changes for 4.3 will go some way to solving this: improving
waitqueue support for x86, and then using it for HVM MMIO emulation.
Then we will keep emulator state on the per-vcpu hypervisor stack.

 -- Keir
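For illustration only, a hypothetical sketch of that approach; the
helper names are invented and this is ordinary userspace C, not
Xen's waitqueue API. The point is that decode and operand evaluation
happen once, and the resulting state survives in the blocked
function's stack frame:

#include <stdio.h>

struct mmio_completion {
    int done;             /* set once qemu's response has arrived */
    unsigned long data;   /* the value read back for a load       */
};

/* Stub standing in for queuing an ioreq to qemu. */
static void send_ioreq_to_qemu(unsigned long gpa, struct mmio_completion *c)
{
    (void)gpa;
    c->data = 0xabcdUL;   /* pretend qemu answered immediately */
    c->done = 1;
}

/* Stub standing in for sleeping on a waitqueue; in the hypervisor
 * this would deschedule the vCPU while keeping its stack intact. */
static void wait_for_completion(struct mmio_completion *c)
{
    while (!c->done)
        ;   /* real code sleeps; the key point is the frame survives */
}

static unsigned long emulate_mmio_read(unsigned long gpa)
{
    struct mmio_completion c = { 0, 0 };

    /* The instruction is decoded and its operands evaluated exactly
     * once; all of that lives in this stack frame across the wait,
     * so there is no second handle_mmio() pass that could diverge
     * if the p2m type changes in the meantime. */
    send_ioreq_to_qemu(gpa, &c);
    wait_for_completion(&c);
    return c.data;
}

int main(void)
{
    printf("emulated MMIO read returned %#lx\n",
           emulate_mmio_read(0xfee00000UL));
    return 0;
}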
On 07/02/12 16:42, Jan Beulich wrote:
> In a problem report for a non-Linux HVM guest with PV drivers that
> was brought to our attention, an issue in the PV driver code caused
> a XENMAPSPACE_shared_info operation to be issued in a way that
> races with ongoing MMIO emulations on other vCPU-s targeting the
> gPFN being changed.
> [...]

This race also affects nested virtualization where the l1 hypervisor
"forwards" its PCI devices to a guest (Hyper-V). The race caused MMIO
handling to fail because a vcpu could switch between the l2 and l1
guest between running handle_mmio() and running the softirq. The fix
was to run softirqs before switching the guest state of the vcpu.

Christoph
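The shape of that fix, as a hypothetical sketch with invented names
(the actual change was in Xen's nested-virt context-switch path):

#include <stdio.h>

struct vcpu { int in_l2; };

/* Stub: in Xen this would drain pending softirq work, including the
 * deferred completion half of an in-flight MMIO emulation. */
static void do_pending_softirqs(void)
{
    printf("draining softirqs while still in the l2 context\n");
}

/* Stub: switch the vCPU's guest state back to the l1 guest. */
static void restore_l1_guest_state(struct vcpu *v)
{
    v->in_l2 = 0;
}

/* The fix: complete pending softirq work *before* switching guest
 * state, so the deferred half of the emulation runs against the same
 * guest context in which handle_mmio() started. */
static void nested_switch_to_l1(struct vcpu *v)
{
    do_pending_softirqs();
    restore_l1_guest_state(v);
}

int main(void)
{
    struct vcpu v = { 1 };   /* currently running the l2 guest */
    nested_switch_to_l1(&v);
    return 0;
}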