Dave McCracken
2010-Jun-09 14:02 UTC
[Xen-devel] [Linux PATCH] Fix to hugepages to work around new PWT handling
Recent changes to Linux include code to set new flags in the pte, including _PAGE_PAT and _PAGE_PWT. That change conflicts with hugepage using the pte macros to set up its pmd entries. This patch resolves that problem. An additional fix here is to make sure the _PAGE_PRESENT bit is set before hugepages does a mk_pte(), since Xen depends on that bit to trigger the pfn->mfn translation. Signed-off-by: Dave McCracken <dave.mccracken@oracle.com> -------- --- stable-2.6.32.x//arch/x86/include/asm/hugetlb.h 2010-06-04 12:19:31.000000000 -0500 +++ 2.6.32-hfix//arch/x86/include/asm/hugetlb.h 2010-06-08 12:23:53.000000000 -0500 @@ -44,7 +44,7 @@ static inline pte_t huge_ptep_get(pte_t static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte) { - set_pmd((pmd_t *)ptep, __pmd(pte_val(pte))); + set_pmd((pmd_t *)ptep, native_make_pmd(native_pte_val(pte))); } static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, --- stable-2.6.32.x//mm/hugetlb.c 2010-06-04 12:19:36.000000000 -0500 +++ 2.6.32-hfix//mm/hugetlb.c 2010-06-07 07:11:34.000000000 -0500 @@ -1732,12 +1732,14 @@ static pte_t make_huge_pte(struct vm_are int writable) { pte_t entry; + pgprot_t pgprot; + pgprot = __pgprot(pgprot_val(vma->vm_page_prot) | _PAGE_PRESENT); if (writable) { entry - pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot))); + pte_mkwrite(pte_mkdirty(mk_pte(page, pgprot))); } else { - entry = huge_pte_wrprotect(mk_pte(page, vma->vm_page_prot)); + entry = huge_pte_wrprotect(mk_pte(page, pgprot)); } entry = pte_mkyoung(entry); entry = pte_mkhuge(entry); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Jun-09 18:21 UTC
Re: [Xen-devel] [Linux PATCH] Fix to hugepages to work around new PWT handling
On 06/09/2010 07:02 AM, Dave McCracken wrote:> Recent changes to Linux include code to set new flags in the pte, > including _PAGE_PAT and _PAGE_PWT. That change conflicts with hugepage > using the pte macros to set up its pmd entries. This patch resolves > that problem. >Could you explain this a bit more clearly? Why is using __pmd not working in this case? Is it because the kernel is now setting PAT and PWT on huge pages? But PAT isn''t even the same flag for huge pages...> An additional fix here is to make sure the _PAGE_PRESENT bit is set > before hugepages does a mk_pte(), since Xen depends on that bit to > trigger the pfn->mfn translation. >Why is the kernel creating a non-present mapping? If it isn''t present, why does it matter whether we do the pfn->mfn conversion? J> Signed-off-by: Dave McCracken <dave.mccracken@oracle.com> > > -------- > > > --- stable-2.6.32.x//arch/x86/include/asm/hugetlb.h 2010-06-04 12:19:31.000000000 -0500 > +++ 2.6.32-hfix//arch/x86/include/asm/hugetlb.h 2010-06-08 12:23:53.000000000 -0500 > @@ -44,7 +44,7 @@ static inline pte_t huge_ptep_get(pte_t > static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, > pte_t *ptep, pte_t pte) > { > - set_pmd((pmd_t *)ptep, __pmd(pte_val(pte))); > + set_pmd((pmd_t *)ptep, native_make_pmd(native_pte_val(pte))); > } > > static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, > --- stable-2.6.32.x//mm/hugetlb.c 2010-06-04 12:19:36.000000000 -0500 > +++ 2.6.32-hfix//mm/hugetlb.c 2010-06-07 07:11:34.000000000 -0500 > @@ -1732,12 +1732,14 @@ static pte_t make_huge_pte(struct vm_are > int writable) > { > pte_t entry; > + pgprot_t pgprot; > > + pgprot = __pgprot(pgprot_val(vma->vm_page_prot) | _PAGE_PRESENT); > if (writable) { > entry > - pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot))); > + pte_mkwrite(pte_mkdirty(mk_pte(page, pgprot))); > } else { > - entry = huge_pte_wrprotect(mk_pte(page, vma->vm_page_prot)); > + entry = huge_pte_wrprotect(mk_pte(page, pgprot)); > } > entry = pte_mkyoung(entry); > entry = pte_mkhuge(entry); > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Dave McCracken
2010-Jun-09 18:35 UTC
Re: [Xen-devel] [Linux PATCH] Fix to hugepages to work around new PWT handling
On Wednesday, June 09, 2010, Jeremy Fitzhardinge wrote:> > Recent changes to Linux include code to set new flags in the pte, > > including _PAGE_PAT and _PAGE_PWT. That change conflicts with hugepage > > using the pte macros to set up its pmd entries. This patch resolves > > that problem. > > > > > > Could you explain this a bit more clearly? Why is using __pmd not > working in this case? Is it because the kernel is now setting PAT and > PWT on huge pages? But PAT isn''t even the same flag for huge pages...For some reason the latest xen_make_pte() and xen_pte_val() are attempting to do some magic with PAT, PCD, and PWT. The previous version of the code in set_huge_pte_at() did "__pmd(pte_val(pte))", and the end result was that PSE got turned off and PWT was turned on. Doing the native versions of those calls avoids this issue, since all the code is really trying to do here is a typecast. I realize the more complete fix probably involves something like converting hugepages to use pmd throughout instead of pte, but that''s a much bigger change and this solves the immediate problem.> > An additional fix here is to make sure the _PAGE_PRESENT bit is set > > before hugepages does a mk_pte(), since Xen depends on that bit to > > trigger the pfn->mfn translation. > > > > > > Why is the kernel creating a non-present mapping? If it isn''t present, > why does it matter whether we do the pfn->mfn conversion?The hugepage function make_huge_pte() called mk_pte() to turn a page and a pgprot into a pte before it set PRESENT. The PRESENT flag was set after the pte was made. This meant that the Xen version of the macro did not see PRESENT so did not do the pfn_to_mfn(). My patch sets PRESENT first so the right thing will happen. Dave McCracken Oracle Corp. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Jun-09 18:54 UTC
Re: [Xen-devel] [Linux PATCH] Fix to hugepages to work around new PWT handling
On 06/09/2010 11:35 AM, Dave McCracken wrote:> On Wednesday, June 09, 2010, Jeremy Fitzhardinge wrote: > >>> Recent changes to Linux include code to set new flags in the pte, >>> including _PAGE_PAT and _PAGE_PWT. That change conflicts with hugepage >>> using the pte macros to set up its pmd entries. This patch resolves >>> that problem. >>> >>> >>> >> Could you explain this a bit more clearly? Why is using __pmd not >> working in this case? Is it because the kernel is now setting PAT and >> PWT on huge pages? But PAT isn''t even the same flag for huge pages... >> > For some reason the latest xen_make_pte() and xen_pte_val() are attempting to > do some magic with PAT, PCD, and PWT. The previous version of the code in > set_huge_pte_at() did "__pmd(pte_val(pte))", and the end result was that PSE > got turned off and PWT was turned on. Doing the native versions of those calls > avoids this issue, since all the code is really trying to do here is a > typecast. >Yes, the pte code is converting the Linux PAT encodings to the Xen ones, so that the resulting mappings have the right page properties. Does the Linux hugepage code set PAT_LARGE on huge ptes, and does Xen support huge mappings with PAT properties? If so, you''ll probably need to do the same thing.> I realize the more complete fix probably involves something like converting > hugepages to use pmd throughout instead of pte, but that''s a much bigger > change and this solves the immediate problem. >I think that''s actually pretty straightforward. It shouldn''t be using the plain pte/pmd macros anyway, but the huge_ counterparts. huge_pte should map to pmd. But this portion of the patch looks OK for now.>>> An additional fix here is to make sure the _PAGE_PRESENT bit is set >>> before hugepages does a mk_pte(), since Xen depends on that bit to >>> trigger the pfn->mfn translation. >>> >>> >>> >> Why is the kernel creating a non-present mapping? If it isn''t present, >> why does it matter whether we do the pfn->mfn conversion? >> > The hugepage function make_huge_pte() called mk_pte() to turn a page and a > pgprot into a pte before it set PRESENT. The PRESENT flag was set after the > pte was made. This meant that the Xen version of the macro did not see > PRESENT so did not do the pfn_to_mfn(). My patch sets PRESENT first so the > right thing will happen. >But in general kernel code shouldn''t be just nakedly setting present on the pte without also remaking the whole thing. That doesn''t happen with normal ptes, and it probably shouldn''t happen with huge ptes. Forcing present on a pte at this level seems very bogus. Why not change the upper code to set present if that''s want it wants? I''ll skip this chunk for now. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Dave McCracken
2010-Jun-09 19:26 UTC
Re: [Xen-devel] [Linux PATCH] Fix to hugepages to work around new PWT handling
On Wednesday, June 09, 2010, Jeremy Fitzhardinge wrote:> >>> An additional fix here is to make sure the _PAGE_PRESENT bit is set > >>> before hugepages does a mk_pte(), since Xen depends on that bit to > >>> trigger the pfn->mfn translation. > >>> > >>> > >>> > >>> > >> Why is the kernel creating a non-present mapping? If it isn''t present, > >> why does it matter whether we do the pfn->mfn conversion? > >> > >> > >> > > The hugepage function make_huge_pte() called mk_pte() to turn a page and > > a pgprot into a pte before it set PRESENT. The PRESENT flag was set > > after the pte was made. This meant that the Xen version of the macro > > did not see PRESENT so did not do the pfn_to_mfn(). My patch sets > > PRESENT first so the right thing will happen. > > > > > > But in general kernel code shouldn''t be just nakedly setting present on > the pte without also remaking the whole thing. That doesn''t happen with > normal ptes, and it probably shouldn''t happen with huge ptes. Forcing > present on a pte at this level seems very bogus. Why not change the > upper code to set present if that''s want it wants? > > I''ll skip this chunk for now.Um, this is the upper level code. The entire purpose of make_huge_pte is to construct a present huge pte from page and pgprot. The problem is that the original code makes the pte, then sets the present bit via pte_mkhuge(). This means the Xen-specific macro that triggers on present is misled and doesn''t do the pfn_to_mfn(). Without this patch hugepages is handing pfns to the hypervisor to map instead of mfns. Dave _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Jun-24 10:05 UTC
Re: [Xen-devel] [Linux PATCH] Fix to hugepages to work around new PWT handling
On 06/09/2010 08:26 PM, Dave McCracken wrote:>> But in general kernel code shouldn''t be just nakedly setting present on >> the pte without also remaking the whole thing. That doesn''t happen with >> normal ptes, and it probably shouldn''t happen with huge ptes. Forcing >> present on a pte at this level seems very bogus. Why not change the >> upper code to set present if that''s want it wants? >> >> I''ll skip this chunk for now. >> > Um, this is the upper level code. The entire purpose of make_huge_pte is to > construct a present huge pte from page and pgprot. The problem is that the > original code makes the pte, then sets the present bit via pte_mkhuge(). This > means the Xen-specific macro that triggers on present is misled and doesn''t do > the pfn_to_mfn(). Without this patch hugepages is handing pfns to the > hypervisor to map instead of mfns. >In principle, setting present should cause the pte to be converted from pfn to mfn, but I don''t think that ever happens with normal ptes (since non-present ptes contain swap info). But I don''t see where a huge pte gets present set; pte_mkhuge itself doesn''t do anything except set PSE. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Dave McCracken
2010-Jun-24 22:38 UTC
Re: [Xen-devel] [Linux PATCH] Fix to hugepages to work around new PWT handling
On Thursday, June 24, 2010, Jeremy Fitzhardinge wrote:> > Um, this is the upper level code. The entire purpose of make_huge_pte is > > to construct a present huge pte from page and pgprot. The problem is > > that the original code makes the pte, then sets the present bit via > > pte_mkhuge(). This means the Xen-specific macro that triggers on > > present is misled and doesn''t do the pfn_to_mfn(). Without this patch > > hugepages is handing pfns to the hypervisor to map instead of mfns. > > > > > > In principle, setting present should cause the pte to be converted from > pfn to mfn, but I don''t think that ever happens with normal ptes (since > non-present ptes contain swap info). But I don''t see where a huge pte > gets present set; pte_mkhuge itself doesn''t do anything except set PSE.Wow. I just dug through the code. The landscape has sure changed since the last time I followed this path. It used to be that vma->vm_page_prot only contained the various read/write flags for that vma. At that time pte_mkhuge() did in fact add _PAGE_PRESENT| _PAGE_PSE to the pte. Now it appears that vma->vm_page_prot does include _PAGE_PRESENT in all its various states. So this part of the patch is in fact unnecessary. It''s what I get for not rechecking my facts to be sure they haven''t changed. Sorry. Dave McCracken Oracle Corp. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel