Sengul Thomas
2013-Apr-18  06:30 UTC
Page table sync. in Xen arm (Arndale) with SMP enabled (Bug fix included)
Hello,
I''m running Xen on Arndale (Exynos5250) board with the following
source.
Xen repo: //xenbits.xen.org/people/julieng/xen-unstable.git
       branch: arndale
And when I generate heavy network traffic at DomU (with SMP enabled)
the following error occasionally occurs:
(XEN) Assertion ''map[slot].pt.avail != 0'' failed, line 219,
file mm.c
(XEN) Xen BUG at mm.c:219
(XEN) CPU1: Unexpected Trap: Undefined Instruction
(XEN) ----[ Xen-4.3-unstable  arm32  debug=y  Tainted:    C ]----
(XEN) CPU:    1
(XEN) PC:     0023d1bc __bug+0x2c/0x44
(XEN) CPSR:   200001da MODE:Hypervisor
(XEN)      R0: 0025c70c R1: 00000003 R2: 3fd4fd80 R3: 00000fff
(XEN)      R4: 00259170 R5: 000000db R6: 0025a4ec R7: 7ffd7180
(XEN)      R8: bfcf8000 R9: 7ffd516c R10:7ffd7000 R11:7ffe7d2c R12:00000004
(XEN) HYP: SP: 7ffe7d24 LR: 0023d1bc
... skip
(XEN)    [<0023d1bc>] __bug+0x2c/0x44
(XEN)    [<00243a18>] unmap_domain_page+0x7c/0xac
(XEN)    [<002452a8>] p2m_lookup+0x13c/0x170
(XEN)    [<0024562c>] gmfn_to_mfn+0x14/0x20
(XEN)    [<00209e04>] __get_paged_frame+0x24/0x9c
(XEN)    [<0020a2f4>] __acquire_grant_for_copy+0x478/0x6b0
(XEN)    [<0020cd30>] do_grant_table_op+0x1938/0x2cb8
(XEN)    [<00247d00>] do_trap_hypervisor+0x70c/0xa7c
(XEN)    [<0024a030>] return_from_trap+0/0x4
I have digged a little bit and figured out that map_domain_page and
unmap_domain_page functions access page table without synchronization.
Here goes a simple fix and I hope you guys make it xen-compatible :)
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index ba378d0..7821db0 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -70,6 +70,8 @@ unsigned long total_pages;
 extern char __init_begin[], __init_end[];
+spinlock_t avail_lock;
+
 void dump_pt_walk(lpae_t *first, paddr_t addr)
 {
     lpae_t *second = NULL, *third = NULL;
@@ -151,6 +153,7 @@ void *map_domain_page(unsigned long mfn)
     int i, slot;
     local_irq_save(flags);
+    spin_lock(&avail_lock);
     /* The map is laid out as an open-addressed hash table where each
      * entry is a 2MB superpage pte.  We use the available bits of each
@@ -191,6 +194,7 @@ void *map_domain_page(unsigned long mfn)
     }
 #endif
+    spin_unlock(&avail_lock);
     local_irq_restore(flags);
     va = (DOMHEAP_VIRT_START
@@ -214,12 +218,14 @@ void unmap_domain_page(const void *va)
     int slot = ((unsigned long) va - DOMHEAP_VIRT_START) >> SECOND_SHIFT;
     local_irq_save(flags);
+    spin_lock(&avail_lock);
     ASSERT(slot >= 0 && slot < DOMHEAP_ENTRIES);
     ASSERT(map[slot].pt.avail != 0);
     map[slot].pt.avail--;
+    spin_unlock(&avail_lock);
     local_irq_restore(flags);
 }
@@ -368,6 +374,8 @@ void __init setup_pagetables(unsigned long
boot_phys_offset, paddr_t xen_paddr)
     WRITE_SYSREG32(READ_SYSREG32(SCTLR_EL2) | SCTLR_WXN, SCTLR_EL2);
     /* Flush everything after setting WXN bit. */
     flush_xen_text_tlb();
+
+    spin_lock_init(&avail_lock);
 }
Sincerely,
Thomas
Ian Campbell
2013-Apr-18  08:02 UTC
Re: Page table sync. in Xen arm (Arndale) with SMP enabled (Bug fix included)
On Thu, 2013-04-18 at 07:30 +0100, Sengul Thomas wrote:> Hello, > > I''m running Xen on Arndale (Exynos5250) board with the following source. > > Xen repo: //xenbits.xen.org/people/julieng/xen-unstable.git > branch: arndale > > And when I generate heavy network traffic at DomU (with SMP enabled) > the following error occasionally occurs: > > (XEN) Assertion ''map[slot].pt.avail != 0'' failed, line 219, file mm.c > (XEN) Xen BUG at mm.c:219 > (XEN) CPU1: Unexpected Trap: Undefined Instruction > (XEN) ----[ Xen-4.3-unstable arm32 debug=y Tainted: C ]---- > (XEN) CPU: 1 > (XEN) PC: 0023d1bc __bug+0x2c/0x44 > (XEN) CPSR: 200001da MODE:Hypervisor > (XEN) R0: 0025c70c R1: 00000003 R2: 3fd4fd80 R3: 00000fff > (XEN) R4: 00259170 R5: 000000db R6: 0025a4ec R7: 7ffd7180 > (XEN) R8: bfcf8000 R9: 7ffd516c R10:7ffd7000 R11:7ffe7d2c R12:00000004 > (XEN) HYP: SP: 7ffe7d24 LR: 0023d1bc > ... skip > (XEN) [<0023d1bc>] __bug+0x2c/0x44 > (XEN) [<00243a18>] unmap_domain_page+0x7c/0xac > (XEN) [<002452a8>] p2m_lookup+0x13c/0x170 > (XEN) [<0024562c>] gmfn_to_mfn+0x14/0x20 > (XEN) [<00209e04>] __get_paged_frame+0x24/0x9c > (XEN) [<0020a2f4>] __acquire_grant_for_copy+0x478/0x6b0 > (XEN) [<0020cd30>] do_grant_table_op+0x1938/0x2cb8 > (XEN) [<00247d00>] do_trap_hypervisor+0x70c/0xa7c > (XEN) [<0024a030>] return_from_trap+0/0x4 > > I have digged a little bit and figured out that map_domain_page and > unmap_domain_page functions access page table without synchronization.Hrm, I thought map_domain_page was supposed to create a PCPU local mapping, at least I thought that was Tim''s intent. By its very nature a per-PCPU mapping shouldn''t require locking except against local interrupts (which it correctly disables). I expect the real bug is that we have failed to remember to setup per-PCPU mappings at DOMHEAP_VIRT_START when we implemented SMP! Oops... We don''t appear to have even managed per-PCPU pagetables, so this may not be a completely trivial fix... This also explains why Stefano was able to implement map_doman_page_global in terms of map_domain_page: http://marc.info/?l=xen-devel&m=136551673224671> Here goes a simple fix and I hope you guys make it xen-compatible :)Thanks, as I say I don''t think the fix is right but the analysis/identification of the issue was certainly valuable! WRT future submissions the most important thing is to include a Signed-off-by line to indicate that you certify the patch under the Developer''s Certificate of Origin, which is include in: http://wiki.xen.org/wiki/Submitting_Xen_Patches Ian.
Stefano Stabellini
2013-Apr-19  13:08 UTC
Re: Page table sync. in Xen arm (Arndale) with SMP enabled (Bug fix included)
On Thu, 18 Apr 2013, Ian Campbell wrote:> On Thu, 2013-04-18 at 07:30 +0100, Sengul Thomas wrote: > > Hello, > > > > I''m running Xen on Arndale (Exynos5250) board with the following source. > > > > Xen repo: //xenbits.xen.org/people/julieng/xen-unstable.git > > branch: arndale > > > > And when I generate heavy network traffic at DomU (with SMP enabled) > > the following error occasionally occurs: > > > > (XEN) Assertion ''map[slot].pt.avail != 0'' failed, line 219, file mm.c > > (XEN) Xen BUG at mm.c:219 > > (XEN) CPU1: Unexpected Trap: Undefined Instruction > > (XEN) ----[ Xen-4.3-unstable arm32 debug=y Tainted: C ]---- > > (XEN) CPU: 1 > > (XEN) PC: 0023d1bc __bug+0x2c/0x44 > > (XEN) CPSR: 200001da MODE:Hypervisor > > (XEN) R0: 0025c70c R1: 00000003 R2: 3fd4fd80 R3: 00000fff > > (XEN) R4: 00259170 R5: 000000db R6: 0025a4ec R7: 7ffd7180 > > (XEN) R8: bfcf8000 R9: 7ffd516c R10:7ffd7000 R11:7ffe7d2c R12:00000004 > > (XEN) HYP: SP: 7ffe7d24 LR: 0023d1bc > > ... skip > > (XEN) [<0023d1bc>] __bug+0x2c/0x44 > > (XEN) [<00243a18>] unmap_domain_page+0x7c/0xac > > (XEN) [<002452a8>] p2m_lookup+0x13c/0x170 > > (XEN) [<0024562c>] gmfn_to_mfn+0x14/0x20 > > (XEN) [<00209e04>] __get_paged_frame+0x24/0x9c > > (XEN) [<0020a2f4>] __acquire_grant_for_copy+0x478/0x6b0 > > (XEN) [<0020cd30>] do_grant_table_op+0x1938/0x2cb8 > > (XEN) [<00247d00>] do_trap_hypervisor+0x70c/0xa7c > > (XEN) [<0024a030>] return_from_trap+0/0x4 > > > > I have digged a little bit and figured out that map_domain_page and > > unmap_domain_page functions access page table without synchronization. > > Hrm, I thought map_domain_page was supposed to create a PCPU local > mapping, at least I thought that was Tim''s intent. By its very nature a > per-PCPU mapping shouldn''t require locking except against local > interrupts (which it correctly disables). > > I expect the real bug is that we have failed to remember to setup > per-PCPU mappings at DOMHEAP_VIRT_START when we implemented SMP! Oops... > We don''t appear to have even managed per-PCPU pagetables, so this may > not be a completely trivial fix... > > This also explains why Stefano was able to implement > map_doman_page_global in terms of map_domain_page: > http://marc.info/?l=xen-devel&m=136551673224671Regarding that patch, should I re-implement map_doman_page_global using the current map_domain_page code plus Sengul''s patch? Should I wait now and rebase later?> > Here goes a simple fix and I hope you guys make it xen-compatible :) > > Thanks, as I say I don''t think the fix is right but the > analysis/identification of the issue was certainly valuable!that''s right, thanks again!
Ian Campbell
2013-Apr-19  13:24 UTC
Re: Page table sync. in Xen arm (Arndale) with SMP enabled (Bug fix included)
On Fri, 2013-04-19 at 14:08 +0100, Stefano Stabellini wrote:> On Thu, 18 Apr 2013, Ian Campbell wrote:> > This also explains why Stefano was able to implement > > map_doman_page_global in terms of map_domain_page: > > http://marc.info/?l=xen-devel&m=136551673224671 > > Regarding that patch, should I re-implement map_doman_page_global using > the current map_domain_page code plus Sengul''s patch?I don''t think we want a lock in map_domain_page for performance reasons. I''m looking at making the domheap per-pcpu right now and although it looks a bit fiddly I don''t think it should take too long.> Should I wait now and rebase later?Ultimately we want to implement map_domain_page_global in terms of vmap. That ought to be pretty easy. What I think needs to happen is that: * early_ioremap needs to flip around and start allocating from the top of the region (for vmap internal reasons, see below) * We need to initialise vmap and implement arch_vmap_virt_end() to return the address right below the used early_ioremap space (this is why it needs to allocate from the end) * Implement domain_map/unmap_page_global in terms of vmap/vunmap, which looks to be more of less trivial API impedance matching I don''t think that is blocked by the per-pcpu domheap stuff, is it?