We have two drivers that both map a region of IO memory from a PCI device. This works fine on Linux, but in xen dom0 we hit an assertion in arch/i386/mm/ioremap-xen.c:direct_remap_area_pte_fn() when doing the second ioremap(): BUG_ON(!pte_none(*pte)); It''s not clear to me whether this is just an over-eager assertion or if this operation is genuinely not supported on Xen. The assertion doesn''t seem to be related to the operation of that function, and if I remove it everything seems to work fine. Is anyone who is more familiar with this code able to throw any light on the reason for the assertion? Thanks Kieran _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
There''s no TLB flush or INVLPG, so there is a risk of a stale TLB entry if the pte is already in use (we do not consider the case where the pte is actually being set to its existing value -- we don''t expect that should ever happen). If the drivers are using ioremap() then they should be getting different virtual addresses and hence no pte overlap, I would have thought. -- Keir On 17/10/07 14:25, "Kieran Mansley" <kmansley@solarflare.com> wrote:> We have two drivers that both map a region of IO memory from a PCI > device. This works fine on Linux, but in xen dom0 we hit an assertion > in arch/i386/mm/ioremap-xen.c:direct_remap_area_pte_fn() when doing the > second ioremap(): > > BUG_ON(!pte_none(*pte)); > > It''s not clear to me whether this is just an over-eager assertion or if > this operation is genuinely not supported on Xen. The assertion doesn''t > seem to be related to the operation of that function, and if I remove it > everything seems to work fine. > > Is anyone who is more familiar with this code able to throw any light on > the reason for the assertion? > > Thanks > > Kieran > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Wed, 2007-10-17 at 14:43 +0100, Keir Fraser wrote:> There''s no TLB flush or INVLPG, so there is a risk of a stale TLB entry if > the pte is already in use (we do not consider the case where the pte is > actually being set to its existing value -- we don''t expect that should ever > happen). If the drivers are using ioremap() then they should be getting > different virtual addresses and hence no pte overlap, I would have thought.OK, I''ve dug a bit deeper. The problem is not with two mappings of the same IO region, it is with anything involving ioremap() followed by iounmap() followed by another ioremap(). I''ve been able to demonstrate this by mapping in a region, immediately unmapping it using iounmap(), then mapping it in again (in this particular case using ioremap_nocache ()) i.e.: membase = ioremap_nocache ( membase_phys, membase_len ); if ( !membase ) { return -ENOMEM; } iounmap(membase); membase = ioremap_nocache ( membase_phys, membase_len ); The second ioremap calls get_vm_area() and happens to be given a region of virtual memory that at least partly overlaps the area that was used for the first one. As the area was freed by the iounmap() in the middle this seems sensible, but then when it comes to get the PTE for the virtual addresses in the second map, it finds there are entries left over from the first one. Kieran _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 17/10/07 16:58, "Kieran Mansley" <kmansley@solarflare.com> wrote:> The second ioremap calls get_vm_area() and happens to be given a region > of virtual memory that at least partly overlaps the area that was used > for the first one. As the area was freed by the iounmap() in the middle > this seems sensible, but then when it comes to get the PTE for the > virtual addresses in the second map, it finds there are entries left > over from the first one.Well, that''s bogus. iounmap() should zap all PTEs via remove_vm_area(). If it''s not doing so then we have a bug. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Wed, 2007-10-17 at 17:06 +0100, Keir Fraser wrote:> On 17/10/07 16:58, "Kieran Mansley" <kmansley@solarflare.com> wrote: > > > The second ioremap calls get_vm_area() and happens to be given a region > > of virtual memory that at least partly overlaps the area that was used > > for the first one. As the area was freed by the iounmap() in the middle > > this seems sensible, but then when it comes to get the PTE for the > > virtual addresses in the second map, it finds there are entries left > > over from the first one. > > Well, that''s bogus. iounmap() should zap all PTEs via remove_vm_area(). If > it''s not doing so then we have a bug.Yep. When iounmap() calls remove_vm_area() it percolates down to vunmap_pte_range(). This seems to not always succeed: checking that ! pte_none() on the pte at the end of that function occasionally throws up a warning. Unsurprisingly this is on the same addresses that later hit the assertion in direct_remap_area_pte_fn(). The reason that vunmap_pte_range() is failing seems to be due to a problem in xen/arch/x86/mm.c:do_update_va_mapping() (vunmap_pte_range calls ptep_get_and_clear which calls HYPERVISOR_update_va_mapping). This is returning EINVAL because it can''t get the l1e from the guest: pl1e = guest_map_l1e(v, va, &gl1mfn); if ( unlikely(!pl1e || !mod_l1_entry(pl1e, val, gl1mfn)) ) rc = -EINVAL; guest_map_l1e is returning NULL because of incorrect flag settings on the l2e: /* Check flags that it will be safe to read the l1e */ if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT ) return NULL; It looks to me like neither _PAGE_PRESENT nor _PAGE_PSE is set. Any advice about why this might be and how to fix this would be gratefully received. Kieran _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 18/10/07 15:15, "Kieran Mansley" <kmansley@solarflare.com> wrote:> It looks to me like neither _PAGE_PRESENT nor _PAGE_PSE is set. Any > advice about why this might be and how to fix this would be gratefully > received.Vmalloc area pagetables are demand mapped into per-process pgdirs. So it''s invalid for vunmap_pte_range() to be relying solely on update_va_mapping(). Either it should use some other method, or at least have some other method as a fallback. This is easily fixed. I guess I''ll audit other uses of update_va_mapping() at the same time. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 18/10/07 15:40, "Keir Fraser" <Keir.Fraser@cl.cam.ac.uk> wrote:> Vmalloc area pagetables are demand mapped into per-process pgdirs. So it''s > invalid for vunmap_pte_range() to be relying solely on update_va_mapping(). > Either it should use some other method, or at least have some other method > as a fallback. > > This is easily fixed. I guess I''ll audit other uses of update_va_mapping() > at the same time.Fixed by linux-2.6.18-xen.hg changeset 265. I''ll also backport for 3.1 series. This was introduced by xen changeset 14731, back in April. So Jan is to blame (and me, I suppose!). ;-) I expect it''s in a few vendor kernels at this point... -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> Keir Fraser <Keir.Fraser@cl.cam.ac.uk> 18.10.07 17:22 >>> >On 18/10/07 15:40, "Keir Fraser" <Keir.Fraser@cl.cam.ac.uk> wrote: > >> Vmalloc area pagetables are demand mapped into per-process pgdirs. So it''s >> invalid for vunmap_pte_range() to be relying solely on update_va_mapping(). >> Either it should use some other method, or at least have some other method >> as a fallback. >> >> This is easily fixed. I guess I''ll audit other uses of update_va_mapping() >> at the same time. > >Fixed by linux-2.6.18-xen.hg changeset 265. I''ll also backport for 3.1 >series. > >This was introduced by xen changeset 14731, back in April. So Jan is to >blame (and me, I suppose!). ;-) I expect it''s in a few vendor kernels at >this point...Indeed, I didn''t consider this situation. However, excluding the check against current->mm here was on purpose (and hence I''m not certain your adding of it was correct) - when used on user mappings, ptep_get_and_clear() is expected to return the most up-to-date A/D flags, which cannot be achieved through a hypercall. On the contrary, the use(s) with init_mm never have such expectations as far as I recall, they at best check the returned value for validity/presence/null. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 19/10/07 08:13, "Jan Beulich" <jbeulich@novell.com> wrote:>> This was introduced by xen changeset 14731, back in April. So Jan is to >> blame (and me, I suppose!). ;-) I expect it''s in a few vendor kernels at >> this point... > > Indeed, I didn''t consider this situation. However, excluding the check > against current->mm here was on purpose (and hence I''m not certain > your adding of it was correct) - when used on user mappings, > ptep_get_and_clear() is expected to return the most up-to-date A/D > flags, which cannot be achieved through a hypercall. On the contrary, > the use(s) with init_mm never have such expectations as far as I > recall, they at best check the returned value for validity/presence/null.Ah, that''s true... I''ll fix the fix. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel