Sengul Thomas
2013-Apr-18 06:30 UTC
Page table sync. in Xen arm (Arndale) with SMP enabled (Bug fix included)
Hello, I''m running Xen on Arndale (Exynos5250) board with the following source. Xen repo: //xenbits.xen.org/people/julieng/xen-unstable.git branch: arndale And when I generate heavy network traffic at DomU (with SMP enabled) the following error occasionally occurs: (XEN) Assertion ''map[slot].pt.avail != 0'' failed, line 219, file mm.c (XEN) Xen BUG at mm.c:219 (XEN) CPU1: Unexpected Trap: Undefined Instruction (XEN) ----[ Xen-4.3-unstable arm32 debug=y Tainted: C ]---- (XEN) CPU: 1 (XEN) PC: 0023d1bc __bug+0x2c/0x44 (XEN) CPSR: 200001da MODE:Hypervisor (XEN) R0: 0025c70c R1: 00000003 R2: 3fd4fd80 R3: 00000fff (XEN) R4: 00259170 R5: 000000db R6: 0025a4ec R7: 7ffd7180 (XEN) R8: bfcf8000 R9: 7ffd516c R10:7ffd7000 R11:7ffe7d2c R12:00000004 (XEN) HYP: SP: 7ffe7d24 LR: 0023d1bc ... skip (XEN) [<0023d1bc>] __bug+0x2c/0x44 (XEN) [<00243a18>] unmap_domain_page+0x7c/0xac (XEN) [<002452a8>] p2m_lookup+0x13c/0x170 (XEN) [<0024562c>] gmfn_to_mfn+0x14/0x20 (XEN) [<00209e04>] __get_paged_frame+0x24/0x9c (XEN) [<0020a2f4>] __acquire_grant_for_copy+0x478/0x6b0 (XEN) [<0020cd30>] do_grant_table_op+0x1938/0x2cb8 (XEN) [<00247d00>] do_trap_hypervisor+0x70c/0xa7c (XEN) [<0024a030>] return_from_trap+0/0x4 I have digged a little bit and figured out that map_domain_page and unmap_domain_page functions access page table without synchronization. Here goes a simple fix and I hope you guys make it xen-compatible :) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index ba378d0..7821db0 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -70,6 +70,8 @@ unsigned long total_pages; extern char __init_begin[], __init_end[]; +spinlock_t avail_lock; + void dump_pt_walk(lpae_t *first, paddr_t addr) { lpae_t *second = NULL, *third = NULL; @@ -151,6 +153,7 @@ void *map_domain_page(unsigned long mfn) int i, slot; local_irq_save(flags); + spin_lock(&avail_lock); /* The map is laid out as an open-addressed hash table where each * entry is a 2MB superpage pte. We use the available bits of each @@ -191,6 +194,7 @@ void *map_domain_page(unsigned long mfn) } #endif + spin_unlock(&avail_lock); local_irq_restore(flags); va = (DOMHEAP_VIRT_START @@ -214,12 +218,14 @@ void unmap_domain_page(const void *va) int slot = ((unsigned long) va - DOMHEAP_VIRT_START) >> SECOND_SHIFT; local_irq_save(flags); + spin_lock(&avail_lock); ASSERT(slot >= 0 && slot < DOMHEAP_ENTRIES); ASSERT(map[slot].pt.avail != 0); map[slot].pt.avail--; + spin_unlock(&avail_lock); local_irq_restore(flags); } @@ -368,6 +374,8 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr) WRITE_SYSREG32(READ_SYSREG32(SCTLR_EL2) | SCTLR_WXN, SCTLR_EL2); /* Flush everything after setting WXN bit. */ flush_xen_text_tlb(); + + spin_lock_init(&avail_lock); } Sincerely, Thomas
Ian Campbell
2013-Apr-18 08:02 UTC
Re: Page table sync. in Xen arm (Arndale) with SMP enabled (Bug fix included)
On Thu, 2013-04-18 at 07:30 +0100, Sengul Thomas wrote:> Hello, > > I''m running Xen on Arndale (Exynos5250) board with the following source. > > Xen repo: //xenbits.xen.org/people/julieng/xen-unstable.git > branch: arndale > > And when I generate heavy network traffic at DomU (with SMP enabled) > the following error occasionally occurs: > > (XEN) Assertion ''map[slot].pt.avail != 0'' failed, line 219, file mm.c > (XEN) Xen BUG at mm.c:219 > (XEN) CPU1: Unexpected Trap: Undefined Instruction > (XEN) ----[ Xen-4.3-unstable arm32 debug=y Tainted: C ]---- > (XEN) CPU: 1 > (XEN) PC: 0023d1bc __bug+0x2c/0x44 > (XEN) CPSR: 200001da MODE:Hypervisor > (XEN) R0: 0025c70c R1: 00000003 R2: 3fd4fd80 R3: 00000fff > (XEN) R4: 00259170 R5: 000000db R6: 0025a4ec R7: 7ffd7180 > (XEN) R8: bfcf8000 R9: 7ffd516c R10:7ffd7000 R11:7ffe7d2c R12:00000004 > (XEN) HYP: SP: 7ffe7d24 LR: 0023d1bc > ... skip > (XEN) [<0023d1bc>] __bug+0x2c/0x44 > (XEN) [<00243a18>] unmap_domain_page+0x7c/0xac > (XEN) [<002452a8>] p2m_lookup+0x13c/0x170 > (XEN) [<0024562c>] gmfn_to_mfn+0x14/0x20 > (XEN) [<00209e04>] __get_paged_frame+0x24/0x9c > (XEN) [<0020a2f4>] __acquire_grant_for_copy+0x478/0x6b0 > (XEN) [<0020cd30>] do_grant_table_op+0x1938/0x2cb8 > (XEN) [<00247d00>] do_trap_hypervisor+0x70c/0xa7c > (XEN) [<0024a030>] return_from_trap+0/0x4 > > I have digged a little bit and figured out that map_domain_page and > unmap_domain_page functions access page table without synchronization.Hrm, I thought map_domain_page was supposed to create a PCPU local mapping, at least I thought that was Tim''s intent. By its very nature a per-PCPU mapping shouldn''t require locking except against local interrupts (which it correctly disables). I expect the real bug is that we have failed to remember to setup per-PCPU mappings at DOMHEAP_VIRT_START when we implemented SMP! Oops... We don''t appear to have even managed per-PCPU pagetables, so this may not be a completely trivial fix... This also explains why Stefano was able to implement map_doman_page_global in terms of map_domain_page: http://marc.info/?l=xen-devel&m=136551673224671> Here goes a simple fix and I hope you guys make it xen-compatible :)Thanks, as I say I don''t think the fix is right but the analysis/identification of the issue was certainly valuable! WRT future submissions the most important thing is to include a Signed-off-by line to indicate that you certify the patch under the Developer''s Certificate of Origin, which is include in: http://wiki.xen.org/wiki/Submitting_Xen_Patches Ian.
Stefano Stabellini
2013-Apr-19 13:08 UTC
Re: Page table sync. in Xen arm (Arndale) with SMP enabled (Bug fix included)
On Thu, 18 Apr 2013, Ian Campbell wrote:> On Thu, 2013-04-18 at 07:30 +0100, Sengul Thomas wrote: > > Hello, > > > > I''m running Xen on Arndale (Exynos5250) board with the following source. > > > > Xen repo: //xenbits.xen.org/people/julieng/xen-unstable.git > > branch: arndale > > > > And when I generate heavy network traffic at DomU (with SMP enabled) > > the following error occasionally occurs: > > > > (XEN) Assertion ''map[slot].pt.avail != 0'' failed, line 219, file mm.c > > (XEN) Xen BUG at mm.c:219 > > (XEN) CPU1: Unexpected Trap: Undefined Instruction > > (XEN) ----[ Xen-4.3-unstable arm32 debug=y Tainted: C ]---- > > (XEN) CPU: 1 > > (XEN) PC: 0023d1bc __bug+0x2c/0x44 > > (XEN) CPSR: 200001da MODE:Hypervisor > > (XEN) R0: 0025c70c R1: 00000003 R2: 3fd4fd80 R3: 00000fff > > (XEN) R4: 00259170 R5: 000000db R6: 0025a4ec R7: 7ffd7180 > > (XEN) R8: bfcf8000 R9: 7ffd516c R10:7ffd7000 R11:7ffe7d2c R12:00000004 > > (XEN) HYP: SP: 7ffe7d24 LR: 0023d1bc > > ... skip > > (XEN) [<0023d1bc>] __bug+0x2c/0x44 > > (XEN) [<00243a18>] unmap_domain_page+0x7c/0xac > > (XEN) [<002452a8>] p2m_lookup+0x13c/0x170 > > (XEN) [<0024562c>] gmfn_to_mfn+0x14/0x20 > > (XEN) [<00209e04>] __get_paged_frame+0x24/0x9c > > (XEN) [<0020a2f4>] __acquire_grant_for_copy+0x478/0x6b0 > > (XEN) [<0020cd30>] do_grant_table_op+0x1938/0x2cb8 > > (XEN) [<00247d00>] do_trap_hypervisor+0x70c/0xa7c > > (XEN) [<0024a030>] return_from_trap+0/0x4 > > > > I have digged a little bit and figured out that map_domain_page and > > unmap_domain_page functions access page table without synchronization. > > Hrm, I thought map_domain_page was supposed to create a PCPU local > mapping, at least I thought that was Tim''s intent. By its very nature a > per-PCPU mapping shouldn''t require locking except against local > interrupts (which it correctly disables). > > I expect the real bug is that we have failed to remember to setup > per-PCPU mappings at DOMHEAP_VIRT_START when we implemented SMP! Oops... > We don''t appear to have even managed per-PCPU pagetables, so this may > not be a completely trivial fix... > > This also explains why Stefano was able to implement > map_doman_page_global in terms of map_domain_page: > http://marc.info/?l=xen-devel&m=136551673224671Regarding that patch, should I re-implement map_doman_page_global using the current map_domain_page code plus Sengul''s patch? Should I wait now and rebase later?> > Here goes a simple fix and I hope you guys make it xen-compatible :) > > Thanks, as I say I don''t think the fix is right but the > analysis/identification of the issue was certainly valuable!that''s right, thanks again!
Ian Campbell
2013-Apr-19 13:24 UTC
Re: Page table sync. in Xen arm (Arndale) with SMP enabled (Bug fix included)
On Fri, 2013-04-19 at 14:08 +0100, Stefano Stabellini wrote:> On Thu, 18 Apr 2013, Ian Campbell wrote:> > This also explains why Stefano was able to implement > > map_doman_page_global in terms of map_domain_page: > > http://marc.info/?l=xen-devel&m=136551673224671 > > Regarding that patch, should I re-implement map_doman_page_global using > the current map_domain_page code plus Sengul''s patch?I don''t think we want a lock in map_domain_page for performance reasons. I''m looking at making the domheap per-pcpu right now and although it looks a bit fiddly I don''t think it should take too long.> Should I wait now and rebase later?Ultimately we want to implement map_domain_page_global in terms of vmap. That ought to be pretty easy. What I think needs to happen is that: * early_ioremap needs to flip around and start allocating from the top of the region (for vmap internal reasons, see below) * We need to initialise vmap and implement arch_vmap_virt_end() to return the address right below the used early_ioremap space (this is why it needs to allocate from the end) * Implement domain_map/unmap_page_global in terms of vmap/vunmap, which looks to be more of less trivial API impedance matching I don''t think that is blocked by the per-pcpu domheap stuff, is it?