These mappings are in fact special and require special handling in
privcmd, which already exists. Failure to mark the PTE as special on
arm64 causes all sorts of bad PTE fun.

x86 already gets this correct.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: xen-devel@lists.xenproject.org
---
 arch/arm/xen/enlighten.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 83e4f95..932cc50 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -96,7 +96,7 @@ static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
 	struct remap_data *info = data;
 	struct page *page = info->pages[info->index++];
 	unsigned long pfn = page_to_pfn(page);
-	pte_t pte = pfn_pte(pfn, info->prot);
+	pte_t pte = pte_mkspecial(pfn_pte(pfn, info->prot));

 	if (map_foreign_page(pfn, info->fgmfn, info->domid))
 		return -EFAULT;
--
1.7.10.4
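For context, the hunk above lands in remap_pte_fn(); the whole helper with
the change applied looks roughly as follows (the lines outside the diff
context, i.e. the set_pte_at() call and the final return, are reconstructed
and may differ slightly from the actual tree):

static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
			void *data)
{
	struct remap_data *info = data;
	struct page *page = info->pages[info->index++];
	unsigned long pfn = page_to_pfn(page);
	/* Mark the pte special so the generic mm treats it as "not normal". */
	pte_t pte = pte_mkspecial(pfn_pte(pfn, info->prot));

	/* Ask Xen to back this pfn with the foreign domain's gmfn. */
	if (map_foreign_page(pfn, info->fgmfn, info->domid))
		return -EFAULT;

	/* Reconstructed: install the stage-1 mapping for this userspace address. */
	set_pte_at(info->vma->vm_mm, addr, ptep, pte);

	return 0;
}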
Stefano Stabellini
2013-Dec-06 17:48 UTC
Re: [PATCH] arm: xen: foreign mapping PTEs are special.
On Wed, 4 Dec 2013, Ian Campbell wrote:
> These mappings are in fact special and require special handling in
> privcmd, which already exists. Failure to mark the PTE as special on
> arm64 causes all sorts of bad PTE fun.
>
> x86 already gets this correct.

Yes, but x86 does that for PV guests, not for autotranslate guests (for
which the function returns -EINVAL).

Given that in the ARM case we are changing the p2m underneath, why do we
also need to mark them special?

> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: xen-devel@lists.xenproject.org
> ---
>  arch/arm/xen/enlighten.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 83e4f95..932cc50 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -96,7 +96,7 @@ static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
>  	struct remap_data *info = data;
>  	struct page *page = info->pages[info->index++];
>  	unsigned long pfn = page_to_pfn(page);
> -	pte_t pte = pfn_pte(pfn, info->prot);
> +	pte_t pte = pte_mkspecial(pfn_pte(pfn, info->prot));
>
>  	if (map_foreign_page(pfn, info->fgmfn, info->domid))
>  		return -EFAULT;
> --
> 1.7.10.4
Ian Campbell
2013-Dec-06 17:54 UTC
Re: [PATCH] arm: xen: foreign mapping PTEs are special.
On Fri, 2013-12-06 at 17:48 +0000, Stefano Stabellini wrote:
> On Wed, 4 Dec 2013, Ian Campbell wrote:
> > These mappings are in fact special and require special handling in
> > privcmd, which already exists. Failure to mark the PTE as special on
> > arm64 causes all sorts of bad PTE fun.
> >
> > x86 already gets this correct.
>
> Yes, but x86 does that for PV guests, not for autotranslate guests (for
> which the function returns -EINVAL).
>
> Given that in the ARM case we are changing the p2m underneath, why do we
> also need to mark them special?

It's not about the p2m, it's about the handling in privcmd wrt setup and
teardown of the stage one mapping which goes along with the p2m
manipulations.

Without this the normal rmap counting kicks in and complains about the
mapcount being -1.

It's possible that we could handle this by implementing vma fault handling
for the privcmd driver, but that would mean a lot more bookkeeping (e.g. to
know whether the ioctl has been made for a particular address or not) and
would mean we diverged more from the PV paths.

Ian.
Stefano Stabellini
2013-Dec-08 17:32 UTC
Re: [PATCH] arm: xen: foreign mapping PTEs are special.
On Fri, 6 Dec 2013, Ian Campbell wrote:
> On Fri, 2013-12-06 at 17:48 +0000, Stefano Stabellini wrote:
> > On Wed, 4 Dec 2013, Ian Campbell wrote:
> > > These mappings are in fact special and require special handling in
> > > privcmd, which already exists. Failure to mark the PTE as special on
> > > arm64 causes all sorts of bad PTE fun.
> > >
> > > x86 already gets this correct.
> >
> > Yes, but x86 does that for PV guests, not for autotranslate guests (for
> > which the function returns -EINVAL).
> >
> > Given that in the ARM case we are changing the p2m underneath, why do we
> > also need to mark them special?
>
> It's not about the p2m, it's about the handling in privcmd wrt setup and
> teardown of the stage one mapping which goes along with the p2m
> manipulations.
>
> Without this the normal rmap counting kicks in and complains about the
> mapcount being -1.

In the case of PV guests we need to mark the pte special because there
is no struct page associated with it: the userspace virtual address is
mapped to a foreign mfn. No dom0 pages are involved.

In this case we do have a corresponding physical address and a
corresponding struct page in dom0. In fact the dom0 page is a
xenballooned page allocated by alloc_empty_pages.

What exactly is the kernel complaining about? Didn't we allocate the
pages correctly? Didn't we call get_page appropriately? There must be a
way to make the mapcount happy without marking the pages as special.

Also consider that on 3.13-rc2 pte_mkspecial on arm is implemented as:

static inline pte_t pte_mkspecial(pte_t pte) { return pte; }
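For contrast, arm64 in the same timeframe does have a spare software bit for
this, so its pte_mkspecial() is not a no-op. Paraphrasing the
arch/arm64/include/asm/pgtable.h definitions of that era (the exact macro
form may differ):

#define PTE_SPECIAL	(_AT(pteval_t, 1) << 56)	/* software-defined pte bit */

static inline pte_t pte_mkspecial(pte_t pte)
{
	/* Net effect on arm64: the pte is flagged as "not normal". */
	pte_val(pte) |= PTE_SPECIAL;
	return pte;
}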
Ian Campbell
2013-Dec-09 10:19 UTC
Re: [PATCH] arm: xen: foreign mapping PTEs are special.
On Sun, 2013-12-08 at 17:32 +0000, Stefano Stabellini wrote:
> On Fri, 6 Dec 2013, Ian Campbell wrote:
> > On Fri, 2013-12-06 at 17:48 +0000, Stefano Stabellini wrote:
> > > On Wed, 4 Dec 2013, Ian Campbell wrote:
> > > > These mappings are in fact special and require special handling in
> > > > privcmd, which already exists. Failure to mark the PTE as special on
> > > > arm64 causes all sorts of bad PTE fun.
> > > >
> > > > x86 already gets this correct.
> > >
> > > Yes, but x86 does that for PV guests, not for autotranslate guests (for
> > > which the function returns -EINVAL).
> > >
> > > Given that in the ARM case we are changing the p2m underneath, why do we
> > > also need to mark them special?
> >
> > It's not about the p2m, it's about the handling in privcmd wrt setup and
> > teardown of the stage one mapping which goes along with the p2m
> > manipulations.
> >
> > Without this the normal rmap counting kicks in and complains about the
> > mapcount being -1.
>
> In the case of PV guests we need to mark the pte special because there
> is no struct page associated with it: the userspace virtual address is
> mapped to a foreign mfn. No dom0 pages are involved.
>
> In this case we do have a corresponding physical address and a
> corresponding struct page in dom0. In fact the dom0 page is a
> xenballooned page allocated by alloc_empty_pages.

Special is not 100% the same as "no struct page". It means "not normal",
where normal also has other requirements, such as being normal user
process RAM established in the normal way (e.g. get_user_pages or
similar), which privcmd does not do.

> What exactly is the kernel complaining about?

There are a few different symptoms, but frequently it is hitting a bad
pte path on teardown. This is caused by vm_normal_page returning true
when it should be false (where true and false are actually non-NULL and
NULL).
e.g.:

BUG: Bad page map in process xl  pte:e0004077b33f53 pmd:4079575003
page:ffffffbce1a2f328 count:1 mapcount:-1 mapping: (null) index:0x0
page flags: 0x4000000000000014(referenced|dirty)
addr:0000007fb5259000 vm_flags:040644fa anon_vma: (null) mapping:ffffffc03a6fda58 index:0
vma->vm_ops->fault: privcmd_fault+0x0/0x38
vma->vm_file->f_op->mmap: privcmd_mmap+0x0/0x2c
CPU: 0 PID: 2657 Comm: xl Not tainted 3.12.0+ #102
Call trace:
[<ffffffc0000880f8>] dump_backtrace+0x0/0x12c
[<ffffffc000088238>] show_stack+0x14/0x1c
[<ffffffc0004b67e0>] dump_stack+0x70/0x90
[<ffffffc000125690>] print_bad_pte+0x12c/0x1bc
[<ffffffc0001268f4>] unmap_single_vma+0x4cc/0x700
[<ffffffc0001273b4>] unmap_vmas+0x68/0xb4
[<ffffffc00012c050>] unmap_region+0xcc/0x1d4
[<ffffffc00012df20>] do_munmap+0x218/0x314
[<ffffffc00012e060>] vm_munmap+0x44/0x64
[<ffffffc00012ed78>] SyS_munmap+0x24/0x34

Or:

BUG: Bad page state in process xl  pfn:4077b4d
page:ffffffbce1a2f8d8 count:0 mapcount:-1 mapping: (null) index:0x0
page flags: 0x4000000000000014(referenced|dirty)
Modules linked in:
CPU: 0 PID: 2657 Comm: xl Tainted: G    B        3.12.0+ #102
Call trace:
[<ffffffc0000880f8>] dump_backtrace+0x0/0x12c
[<ffffffc000088238>] show_stack+0x14/0x1c
[<ffffffc0004b67e0>] dump_stack+0x70/0x90
[<ffffffc00010f798>] bad_page+0xc4/0x110
[<ffffffc00010f8b4>] free_pages_prepare+0xd0/0xd8
[<ffffffc000110e94>] free_hot_cold_page+0x28/0x178
[<ffffffc000111460>] free_hot_cold_page_list+0x38/0x60
[<ffffffc000114cf0>] release_pages+0x190/0x1dc
[<ffffffc00012c0e0>] unmap_region+0x15c/0x1d4
[<ffffffc00012df20>] do_munmap+0x218/0x314
[<ffffffc00012e060>] vm_munmap+0x44/0x64
[<ffffffc00012ed78>] SyS_munmap+0x24/0x34

> Didn't we allocate the pages correctly?

It depends what you mean by correctly: we've done everything right for a
special page, except telling the kernel (via the pte) that the page is
special.

> Didn't we call get_page appropriately?

The count which is wrong is not page->count, it is page->_mapcount, which
is used by the rmap stuff AFAIK.

> There must be a way to make the mapcount happy without marking the pages
> as special.

I'm sure there is, but it would be semantically incorrect; these pages are
not normal process memory, even though they do happen to have a struct
page. I really don't think making these pages appear normal is what we
want. We don't want them being considered page cache, available to be
swapped, or any of that "normal" stuff.

> Also consider that on 3.13-rc2 pte_mkspecial on arm is implemented as:
>
> static inline pte_t pte_mkspecial(pte_t pte) { return pte; }

This is because 32-bit arm does not have a spare PTE bit to use as
PTE_SPECIAL, and therefore does not define __HAVE_ARCH_PTE_SPECIAL. It
takes the slow/fallback path in vm_normal_page, which works fine.

arm64 does have such a spare bit, so vm_normal_page takes the fast path
but reaches the wrong conclusion because we didn't set the special bit
like we should.

Ian.
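The fast/slow path split described above boils down to roughly the following
(a simplified paraphrase of vm_normal_page(), not the verbatim mm/memory.c
code; the real function also handles COW PFN mappings, the zero page and pfn
range checks):

/* Illustrative paraphrase only; see mm/memory.c for the real logic. */
struct page *vm_normal_page_sketch(struct vm_area_struct *vma,
				   unsigned long addr, pte_t pte)
{
#ifdef __HAVE_ARCH_PTE_SPECIAL
	/* Fast path (arm64): trust the special bit in the pte alone. */
	if (pte_special(pte))
		return NULL;	/* "not normal": rmap/teardown leave it alone */
#else
	/* Slow/fallback path (32-bit arm): infer "special" from the vma flags. */
	if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
		return NULL;	/* the path that "works fine" without a special bit */
#endif
	/*
	 * Otherwise the page is treated as normal process RAM and its
	 * _mapcount is expected to have been set up accordingly.
	 */
	return pfn_to_page(pte_pfn(pte));
}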
Stefano Stabellini
2013-Dec-09 12:16 UTC
Re: [PATCH] arm: xen: foreign mapping PTEs are special.
On Mon, 9 Dec 2013, Ian Campbell wrote:
> On Sun, 2013-12-08 at 17:32 +0000, Stefano Stabellini wrote:
> > On Fri, 6 Dec 2013, Ian Campbell wrote:
> > > On Fri, 2013-12-06 at 17:48 +0000, Stefano Stabellini wrote:
> > > > On Wed, 4 Dec 2013, Ian Campbell wrote:
> > > > > These mappings are in fact special and require special handling in
> > > > > privcmd, which already exists. Failure to mark the PTE as special on
> > > > > arm64 causes all sorts of bad PTE fun.
> > > > >
> > > > > x86 already gets this correct.
> > > >
> > > > Yes, but x86 does that for PV guests, not for autotranslate guests (for
> > > > which the function returns -EINVAL).
> > > >
> > > > Given that in the ARM case we are changing the p2m underneath, why do we
> > > > also need to mark them special?
> > >
> > > It's not about the p2m, it's about the handling in privcmd wrt setup and
> > > teardown of the stage one mapping which goes along with the p2m
> > > manipulations.
> > >
> > > Without this the normal rmap counting kicks in and complains about the
> > > mapcount being -1.
> >
> > In the case of PV guests we need to mark the pte special because there
> > is no struct page associated with it: the userspace virtual address is
> > mapped to a foreign mfn. No dom0 pages are involved.
> >
> > In this case we do have a corresponding physical address and a
> > corresponding struct page in dom0. In fact the dom0 page is a
> > xenballooned page allocated by alloc_empty_pages.
>
> Special is not 100% the same as "no struct page". It means "not normal",
> where normal also has other requirements, such as being normal user
> process RAM established in the normal way (e.g. get_user_pages or
> similar), which privcmd does not do.
>
> > What exactly is the kernel complaining about?
>
> There are a few different symptoms, but frequently it is hitting a bad
> pte path on teardown. This is caused by vm_normal_page returning true
> when it should be false (where true and false are actually non-NULL and
> NULL).
>
> e.g.:
>
> BUG: Bad page map in process xl  pte:e0004077b33f53 pmd:4079575003
> page:ffffffbce1a2f328 count:1 mapcount:-1 mapping: (null) index:0x0
> page flags: 0x4000000000000014(referenced|dirty)
> addr:0000007fb5259000 vm_flags:040644fa anon_vma: (null) mapping:ffffffc03a6fda58 index:0
> vma->vm_ops->fault: privcmd_fault+0x0/0x38
> vma->vm_file->f_op->mmap: privcmd_mmap+0x0/0x2c
> CPU: 0 PID: 2657 Comm: xl Not tainted 3.12.0+ #102
> Call trace:
> [<ffffffc0000880f8>] dump_backtrace+0x0/0x12c
> [<ffffffc000088238>] show_stack+0x14/0x1c
> [<ffffffc0004b67e0>] dump_stack+0x70/0x90
> [<ffffffc000125690>] print_bad_pte+0x12c/0x1bc
> [<ffffffc0001268f4>] unmap_single_vma+0x4cc/0x700
> [<ffffffc0001273b4>] unmap_vmas+0x68/0xb4
> [<ffffffc00012c050>] unmap_region+0xcc/0x1d4
> [<ffffffc00012df20>] do_munmap+0x218/0x314
> [<ffffffc00012e060>] vm_munmap+0x44/0x64
> [<ffffffc00012ed78>] SyS_munmap+0x24/0x34
>
> Or:
>
> BUG: Bad page state in process xl  pfn:4077b4d
> page:ffffffbce1a2f8d8 count:0 mapcount:-1 mapping: (null) index:0x0
> page flags: 0x4000000000000014(referenced|dirty)
> Modules linked in:
> CPU: 0 PID: 2657 Comm: xl Tainted: G    B        3.12.0+ #102
> Call trace:
> [<ffffffc0000880f8>] dump_backtrace+0x0/0x12c
> [<ffffffc000088238>] show_stack+0x14/0x1c
> [<ffffffc0004b67e0>] dump_stack+0x70/0x90
> [<ffffffc00010f798>] bad_page+0xc4/0x110
> [<ffffffc00010f8b4>] free_pages_prepare+0xd0/0xd8
> [<ffffffc000110e94>] free_hot_cold_page+0x28/0x178
> [<ffffffc000111460>] free_hot_cold_page_list+0x38/0x60
> [<ffffffc000114cf0>] release_pages+0x190/0x1dc
> [<ffffffc00012c0e0>] unmap_region+0x15c/0x1d4
> [<ffffffc00012df20>] do_munmap+0x218/0x314
> [<ffffffc00012e060>] vm_munmap+0x44/0x64
> [<ffffffc00012ed78>] SyS_munmap+0x24/0x34

Could you please resend the patch adding the backtrace to the commit
message? Given that it skips a few steps, maybe it's best to write it
down manually:

unmap_single_vma -> unmap_page_range -> zap_pud_range -> zap_pmd_range ->
zap_pte_range -> print_bad_pte