Kouya Shimura
2009-Nov-26 08:17 UTC
[Xen-devel] [PATCH] x86 shadow: fix race when domain is dying
There are some cases that shadow_write_p2m_entry() is called after
the domain is killed. It causes Xen to crash.

- The race between xc_map_foreign_batch from qemu-dm and "xm destroy" command.
  The actual console log:

(XEN) Xen call trace:
(XEN)    [<ffff82c4801c012e>] hash_foreach+0x87/0x17e
(XEN)    [<ffff82c4801c0362>] sh_remove_all_mappings+0x13d/0x22f
(XEN)    [<ffff82c4801c913d>] shadow_write_p2m_entry+0x14f/0x390
(XEN)    [<ffff82c4801bc73a>] p2m_set_entry+0x23f/0x472
(XEN)    [<ffff82c4801ba213>] set_p2m_entry+0x7d/0xb1
(XEN)    [<ffff82c4801ba3c9>] p2m_remove_page+0x158/0x167
(XEN)    [<ffff82c4801ba5d8>] guest_physmap_remove_page+0xd9/0x13c
(XEN)    [<ffff82c48015e0e4>] arch_memory_op+0x608/0xb3c
(XEN)    [<ffff82c4801138f3>] do_memory_op+0x1944/0x19a1
(XEN)    [<ffff82c480113b98>] do_multicall+0x248/0x390
(XEN)    [<ffff82c4801ec1bf>] syscall_enter+0xef/0x149

- The hypervisor calls domain_crash when PoD fails.
  The actual console log:

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory! tot_pages 65751 pod_entries 197408
(XEN) domain_crash called from p2m.c:1062
(XEN) Domain 1 reported crashed by domain 0 on cpu#3:
(XEN) ----[ Xen-3.5-unstable  x86_64  debug=y  Tainted:    C ]----
...[snip]
(XEN) Xen call trace:
(XEN)    [<ffff82c4801c152e>] hash_foreach+0x87/0x17e
(XEN)    [<ffff82c4801c1762>] sh_remove_all_mappings+0x13d/0x22f
(XEN)    [<ffff82c4801ca491>] shadow_write_p2m_entry+0x14f/0x390
(XEN)    [<ffff82c4801bdaf6>] p2m_set_entry+0x23f/0x472
(XEN)    [<ffff82c4801bb5b3>] set_p2m_entry+0x7d/0xb1
(XEN)    [<ffff82c4801bdf9f>] p2m_pod_zero_check+0x276/0x3d8
(XEN)    [<ffff82c4801be71f>] p2m_pod_demand_populate+0x61e/0x8dc
(XEN)    [<ffff82c4801beb5c>] p2m_pod_check_and_populate+0x17f/0x1fa
(XEN)    [<ffff82c4801bf228>] p2m_gfn_to_mfn+0x34a/0x3f3
(XEN)    [<ffff82c480166528>] mod_l1_entry+0x1aa/0x7ee
(XEN)    [<ffff82c48016774f>] do_mmu_update+0x56a/0x144b
(XEN)    [<ffff82c4801ed1bf>] syscall_enter+0xef/0x149
(XEN)
(XEN) Pagetable walk from 0000000000000000:
(XEN)  L4[0x000] = 000000011e7c4067 00000000000d9933
(XEN)  L3[0x000] = 000000011e7c3067 00000000000d9934
(XEN)  L2[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000000
(XEN) ****************************************

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Tim Deegan
2009-Nov-26 11:07 UTC
Re: [Xen-devel] [PATCH] x86 shadow: fix race when domain is dying
Hi,

At 08:17 +0000 on 26 Nov (1259223466), Kouya Shimura wrote:
> There are some cases that shadow_write_p2m_entry() is called after
> the domain is killed. It causes Xen to crash.

Thanks for catching this! I'm afraid your fix opens a different race
window, though: any p2m operation that happens after d->is_dying is set
but before p2m_teardown() will corrupt the p2m (because the entry
wouldn't actually get written). If it also happens before
shadow_teardown() it could break the invariants of the shadow
pagetables, possibly causing a crash when shadow_teardown() is reached.

The right fix is to test for whether shadow_teardown() has been called,
and if so, call safe_write_entry() without trying to fix up the shadows.
I've attached a patch.

Cheers,

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
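
[Editorial sketch, not part of the thread and not the attached patch.] The difference between the two approaches discussed above can be illustrated with a toy model. Everything here is an assumption made for illustration: struct toy_domain, the shadow_torn_down flag, and all toy_* helpers are hypothetical stand-ins, not real Xen structures or functions, and the real check in Xen may look quite different. The model only shows the ordering argument: skipping the write when the domain is dying loses the p2m update, while skipping only the shadow fix-up after teardown still records it.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_P2M_SIZE 4

/* Hypothetical stand-in for a domain; not the real struct domain. */
struct toy_domain {
    bool is_dying;                    /* set when the domain is killed */
    bool shadow_torn_down;            /* set by toy_shadow_teardown() */
    unsigned long p2m[TOY_P2M_SIZE];  /* toy p2m table: gfn -> mfn */
    int shadow_fixups;                /* number of shadow fix-ups performed */
};

/* Models shadow_teardown(): afterwards the shadows must not be touched. */
static void toy_shadow_teardown(struct toy_domain *d)
{
    d->shadow_torn_down = true;
}

/* Models the plain entry write (the role safe_write_entry() plays above). */
static void toy_safe_write_entry(unsigned long *entry, unsigned long new_mfn)
{
    *entry = new_mfn;
}

/* Models the shadow fix-up that crashed once the shadows were gone. */
static void toy_fix_up_shadows(struct toy_domain *d)
{
    assert(!d->shadow_torn_down);  /* touching shadows after teardown is the bug */
    d->shadow_fixups++;
}

/* Rejected approach: bail out when the domain is dying.
 * The entry is never written, so the toy p2m goes stale. */
static void write_entry_skip_if_dying(struct toy_domain *d, unsigned int gfn,
                                      unsigned long new_mfn)
{
    assert(gfn < TOY_P2M_SIZE);
    if (d->is_dying)
        return;
    toy_fix_up_shadows(d);
    toy_safe_write_entry(&d->p2m[gfn], new_mfn);
}

/* Approach described in the reply: always write the entry, and skip
 * only the shadow fix-up once teardown has happened. */
static void write_entry_check_teardown(struct toy_domain *d, unsigned int gfn,
                                       unsigned long new_mfn)
{
    assert(gfn < TOY_P2M_SIZE);
    if (!d->shadow_torn_down)
        toy_fix_up_shadows(d);
    toy_safe_write_entry(&d->p2m[gfn], new_mfn);
}
```

In this model, a write issued after is_dying is set but before teardown is dropped by the first variant, whereas the second variant both fixes up the shadows and writes the entry; after toy_shadow_teardown() the second variant writes the entry without touching the shadows.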
Kouya Shimura
2009-Nov-27 01:22 UTC
Re: [Xen-devel] [PATCH] x86 shadow: fix race when domain is dying
Hi Tim,

Thanks for correcting this. Indeed my patch is unsafe.

Keir,

This is serious. I think c/s 20508 should be applied to xen-3.4 too.
Actually I met this for the first time in xen-3.4.

Thanks,
Kouya

Tim Deegan writes:
> Hi,
>
> At 08:17 +0000 on 26 Nov (1259223466), Kouya Shimura wrote:
> > There are some cases that shadow_write_p2m_entry() is called after
> > the domain is killed. It causes Xen to crash.
>
> Thanks for catching this! I'm afraid your fix opens a different race
> window, though: any p2m operation that happens after d->is_dying is set
> but before p2m_teardown() will corrupt the p2m (because the entry
> wouldn't actually get written). If it also happens before
> shadow_teardown() it could break the invariants of the shadow
> pagetables, possibly causing a crash when shadow_teardown() is reached.
>
> The right fix is to test for whether shadow_teardown() has been called,
> and if so, call safe_write_entry() without trying to fix up the shadows.
> I've attached a patch.
>
> Cheers,
>
> Tim.
>
> --
> Tim Deegan <Tim.Deegan@citrix.com>
> Principal Software Engineer, Citrix Systems (R&D) Ltd.
> [Company #02300071, SL9 0DZ, UK.]