Konrad Rzeszutek Wilk
2011-May-13 15:30 UTC
[Xen-devel] Xen MMU''s requirement to pin pages RO and initial_memory_mapping.
aka, remove the hack added by git commit 609cfda586c7fe3e5d1a02c51edb587506294167 (Merge branch ''stable/bug-fixes-for-rc5'' of git://git.kernel.org/../xen) One idea that is on the table was proposed by Yinghai: "Xen should set RAM for page-table to RO after init_memory mapping." In other words, don''t do the magic ''mask_rw_pte'' when set_pte is called. (and don''t do the calls to make_lowmem_page_readonly when allocating the PTE table, nor PMD, nor PUD) - I think? But instead do: a). when we load the cr3? We could go through the whole pagetable and set the RO as we need? b). when we are finished with the creation of a page table? So similary to the point above - don''t set the RO on the pages until we have completed the full creation of the page. c). when post_allocator_start is called? d). other ideas? But I vaguelly recall that we are using the page table as we are adding in the entries. And that we are pinning them as well. Perhaps the trigger to scan the pagetable and set them to RO should be done .. at what point? When the PUD/PMD allocations are done? And when PUD/PMD are set? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-May-16 10:23 UTC
[Xen-devel] Re: Xen MMU''s requirement to pin pages RO and initial_memory_mapping.
On Fri, 13 May 2011, Konrad Rzeszutek Wilk wrote:> aka, remove the hack added by git commit 609cfda586c7fe3e5d1a02c51edb587506294167 > (Merge branch ''stable/bug-fixes-for-rc5'' of git://git.kernel.org/../xen) > > One idea that is on the table was proposed by Yinghai: > "Xen should set RAM for page-table to RO after init_memory mapping." > > In other words, don''t do the magic ''mask_rw_pte'' when set_pte is called. > (and don''t do the calls to make_lowmem_page_readonly when allocating > the PTE table, nor PMD, nor PUD) - I think? > > But instead do: > a). when we load the cr3? We could go through the whole pagetable and > set the RO as we need? > b). when we are finished with the creation of a page table? So similary > to the point above - don''t set the RO on the pages until we have completed > the full creation of the page. > c). when post_allocator_start is called?I think c) is too late because there might be allocations or at least memblock allocations after init_memory_mapping and before post_allocator_start.> d). other ideas? > > But I vaguelly recall that we are using the page table as we are adding in the > entries. And that we are pinning them as well. Perhaps the trigger to scan the > pagetable and set them to RO should be done .. at what point? When the PUD/PMD > allocations are done? And when PUD/PMD are set? >Setting the pagetable pages RO after init_memory_mapping is difficult because pagetable pages have to be set RO before becoming pagetable pages. They become pagetable pages when: - they are explicitly pinned by pin_pagetable_pfn - they are hooked into the current pagetable Like you wrote, considering that the x86_64 version of kernel_physical_mapping_init hooks the pagetable pages into the currently used pagetable, it wouldn''t be possible to mark the pagetable pages RO after init_memory_mapping. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-May-16 15:41 UTC
[Xen-devel] Re: Xen MMU''s requirement to pin pages RO and initial_memory_mapping.
> They become pagetable pages when: > > - they are explicitly pinned by pin_pagetable_pfn > > - they are hooked into the current pagetableOk, so could we use those two calls to trigger the pagetable walk and mark them RO as appropiate? Which call sites are those? The xen_set_pgd/xen_set_pud/xen_set_pmd ? Presumarily we don''t have to do that for the PTE''s that are already mapped (as xen_setup_kernel_pagetable, and xen_map_identity_early do this already).> Like you wrote, considering that the x86_64 version of > kernel_physical_mapping_init hooks the pagetable pages into the > currently used pagetable, it wouldn''t be possible to mark the pagetable > pages RO after init_memory_mapping.<nods> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
H. Peter Anvin
2011-May-16 21:54 UTC
[Xen-devel] Re: Xen MMU''s requirement to pin pages RO and initial_memory_mapping.
On 05/16/2011 08:41 AM, Konrad Rzeszutek Wilk wrote:>> They become pagetable pages when: >> >> - they are explicitly pinned by pin_pagetable_pfn >> >> - they are hooked into the current pagetable > > Ok, so could we use those two calls to trigger the pagetable walk > and mark them RO as appropiate? Which call sites are those? The > xen_set_pgd/xen_set_pud/xen_set_pmd ? Presumarily we don''t have > to do that for the PTE''s that are already mapped (as > xen_setup_kernel_pagetable, and xen_map_identity_early do this > already). > >> Like you wrote, considering that the x86_64 version of >> kernel_physical_mapping_init hooks the pagetable pages into the >> currently used pagetable, it wouldn''t be possible to mark the pagetable >> pages RO after init_memory_mapping. >Doesn''t Xen have some kind of compatibility mode which could be used during setup? -hpa _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-May-17 17:50 UTC
[Xen-devel] Re: Xen MMU''s requirement to pin pages RO and initial_memory_mapping.
On Mon, 16 May 2011, Konrad Rzeszutek Wilk wrote:> > They become pagetable pages when: > > > > - they are explicitly pinned by pin_pagetable_pfn > > > > - they are hooked into the current pagetable > > Ok, so could we use those two calls to trigger the pagetable walk > and mark them RO as appropiate? Which call sites are those? The > xen_set_pgd/xen_set_pud/xen_set_pmd ?xen_alloc_pte_init and xen_alloc_pmd_init are the ones that mark the pagetable pages RO and pin them, calling make_lowmem_page_readonly and pin_pagetable_pfn. alloc_pte/pmd are called right before hooking them into the pagetable; unfortunately that means that they fail at marking the pagetable pages RO: make_lowmem_page_readonly uses lookup_address to find the pte corresponding to a page, however at this point the pagetable pages are not mapped yet (usually they are not hooked but when they are hooked, the upper level pagetable page is not hooked), so lookup_address fails. In order to catch these errors Xen has a parachute: xen_set_pte_init, the function that takes care of writing a pte to memory and that on xen converts pfns to mfns, also marks pagetable pages RO trying to understand when that is appropriate. This is all very ugly and delicate. I think alloc_pte/pmd were always thought to be used to mark and pin pagetable pages but they currently fail during the initial pagetable setup. If we could fix alloc_pte/pmd most of the problems and the hacks would go away. Ideally we could remove both mask_rw_pte (currently responsible for marking pagetable pages RO, called from xen_set_pte_init) and x86_init.mapping.pagetable_reserve. More thinking (and caffeine) needed...> Presumarily we don''t have > to do that for the PTE''s that are already mapped (as > xen_setup_kernel_pagetable, and xen_map_identity_early do this > already).No, we don''t. We do need to make sure they stay RO on x86_32 where we write the pagetable pages in two steps and we switch pagetable to swapper_pg_dir. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-May-17 17:51 UTC
[Xen-devel] Re: Xen MMU''s requirement to pin pages RO and initial_memory_mapping.
CC''ing Keir in case he knows something I am missing. On Mon, 16 May 2011, H. Peter Anvin wrote:> On 05/16/2011 08:41 AM, Konrad Rzeszutek Wilk wrote: > >> They become pagetable pages when: > >> > >> - they are explicitly pinned by pin_pagetable_pfn > >> > >> - they are hooked into the current pagetable > > > > Ok, so could we use those two calls to trigger the pagetable walk > > and mark them RO as appropiate? Which call sites are those? The > > xen_set_pgd/xen_set_pud/xen_set_pmd ? Presumarily we don''t have > > to do that for the PTE''s that are already mapped (as > > xen_setup_kernel_pagetable, and xen_map_identity_early do this > > already). > > > >> Like you wrote, considering that the x86_64 version of > >> kernel_physical_mapping_init hooks the pagetable pages into the > >> currently used pagetable, it wouldn''t be possible to mark the pagetable > >> pages RO after init_memory_mapping. > > > > Doesn''t Xen have some kind of compatibility mode which could be used > during setup?Unfortunately not that I am aware. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-May-17 18:05 UTC
[Xen-devel] Re: Xen MMU''s requirement to pin pages RO and initial_memory_mapping.
On Tue, May 17, 2011 at 06:50:55PM +0100, Stefano Stabellini wrote:> On Mon, 16 May 2011, Konrad Rzeszutek Wilk wrote: > > > They become pagetable pages when: > > > > > > - they are explicitly pinned by pin_pagetable_pfn > > > > > > - they are hooked into the current pagetable > > > > Ok, so could we use those two calls to trigger the pagetable walk > > and mark them RO as appropiate? Which call sites are those? The > > xen_set_pgd/xen_set_pud/xen_set_pmd ? > > xen_alloc_pte_init and xen_alloc_pmd_init are the ones that mark the > pagetable pages RO and pin them, calling make_lowmem_page_readonly and > pin_pagetable_pfn. > > alloc_pte/pmd are called right before hooking them into the pagetable; > unfortunately that means that they fail at marking the pagetable pages > RO: make_lowmem_page_readonly uses lookup_address to find the pte > corresponding to a page, however at this point the pagetable pages are > not mapped yet (usually they are not hooked but when they are hooked, the > upper level pagetable page is not hooked), so lookup_address fails.Right. We don''t have to walk the hooked pagetable, I think. We are passed in the PMD/PGD of the PFN and we could look at the content of that PFN. Walk each entry in there and for those that are present, determine if the page table it points to (whatever level it is) is RO. If not, mark it RO. And naturally do it recursively to cover all levels.> > In order to catch these errors Xen has a parachute: xen_set_pte_init, > the function that takes care of writing a pte to memory and that on xen > converts pfns to mfns, also marks pagetable pages RO trying to > understand when that is appropriate. > > This is all very ugly and delicate.And complex.> > I think alloc_pte/pmd were always thought to be used to mark and pin > pagetable pages but they currently fail during the initial pagetable > setup. If we could fix alloc_pte/pmd most of the problems and the hacks > would go away.Yeey!> Ideally we could remove both mask_rw_pte (currently responsible for > marking pagetable pages RO, called from xen_set_pte_init) and > x86_init.mapping.pagetable_reserve. > > More thinking (and caffeine) needed... > > > > > > Presumarily we don''t have > > to do that for the PTE''s that are already mapped (as > > xen_setup_kernel_pagetable, and xen_map_identity_early do this > > already). > > No, we don''t. > We do need to make sure they stay RO on x86_32 where we write the > pagetable pages in two steps and we switch pagetable to swapper_pg_dir.And then vice-versa later on during the bootup. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-May-17 18:07 UTC
[Xen-devel] Re: Xen MMU''s requirement to pin pages RO and initial_memory_mapping.
> > > > Doesn''t Xen have some kind of compatibility mode which could be used > > during setup? > > Unfortunately not that I am aware.Peter, were you thinking of XENFEAT_auto_translated_physmap? Which is that the hypervisor does all of the MMU translations? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-May-17 18:17 UTC
[Xen-devel] Re: Xen MMU''s requirement to pin pages RO and initial_memory_mapping.
> > xen_alloc_pte_init and xen_alloc_pmd_init are the ones that mark the > > pagetable pages RO and pin them, calling make_lowmem_page_readonly and > > pin_pagetable_pfn. > > > > alloc_pte/pmd are called right before hooking them into the pagetable; > > unfortunately that means that they fail at marking the pagetable pages > > RO: make_lowmem_page_readonly uses lookup_address to find the pte > > corresponding to a page, however at this point the pagetable pages are > > not mapped yet (usually they are not hooked but when they are hooked, the > > upper level pagetable page is not hooked), so lookup_address fails. > > Right. We don''t have to walk the hooked pagetable, I think. We are passed > in the PMD/PGD of the PFN and we could look at the content of that PFN.err, got that backwards. "passed in the PFN of the PMD/PGD". _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-May-23 15:20 UTC
[Xen-devel] Re: Xen MMU''s requirement to pin pages RO and initial_memory_mapping.
On Tue, 17 May 2011, Konrad Rzeszutek Wilk wrote:> On Tue, May 17, 2011 at 06:50:55PM +0100, Stefano Stabellini wrote: > > On Mon, 16 May 2011, Konrad Rzeszutek Wilk wrote: > > > > They become pagetable pages when: > > > > > > > > - they are explicitly pinned by pin_pagetable_pfn > > > > > > > > - they are hooked into the current pagetable > > > > > > Ok, so could we use those two calls to trigger the pagetable walk > > > and mark them RO as appropiate? Which call sites are those? The > > > xen_set_pgd/xen_set_pud/xen_set_pmd ? > > > > xen_alloc_pte_init and xen_alloc_pmd_init are the ones that mark the > > pagetable pages RO and pin them, calling make_lowmem_page_readonly and > > pin_pagetable_pfn. > > > > alloc_pte/pmd are called right before hooking them into the pagetable; > > unfortunately that means that they fail at marking the pagetable pages > > RO: make_lowmem_page_readonly uses lookup_address to find the pte > > corresponding to a page, however at this point the pagetable pages are > > not mapped yet (usually they are not hooked but when they are hooked, the > > upper level pagetable page is not hooked), so lookup_address fails. > > Right. We don''t have to walk the hooked pagetable, I think. We are passed > in the PMD/PGD of the PFN and we could look at the content of that PFN. > Walk each entry in there and for those that are present, determine > if the page table it points to (whatever level it is) is RO. If not, mark > it RO. And naturally do it recursively to cover all levels.I am not sure what you mean. The interface is the following: void alloc_pte(struct mm_struct *mm, unsigned long pfn); pfn is the pagetable page''s pfn, that has to be marked RO in all his mappings; mm is the mm_struct where this pagetable page is mapped. Except it is not true anymore because the pagetable page is not mapped yet in mm; so we cannot walk anything here, unfortunately. We could remember that we failed to mark this page RO so that the next time we try to write a pte that contains that address we know we have to mark it RO. But this approach is basically equivalent to the one we had before 2.6.39: we consider the range pgt_buf_start-pgt_buf_end a "published" range of pagetable pages that we have to mark RO. In fact it is affected by the same problem: after writing the ptes that map the range pgt_buf_start-pgt_buf_end, if we try to allocate a new pagetable page incrementing pgt_buf_end we fail because the new page is already marked RW in a pinned page. At the same time we cannot modify the pte to change the mapping to RO because lookup_address doesn''t find it (the pagetable page containing the pte in question is not reachable from init_mm yet). Unfortunately I cannot see an easy way to fix alloc_pte without making sure that the pfn passed as an argument is already mapped and the pte is reachable using lookup_address. Alternatively we could come up with a new interface that properly publishes the pgt_buf_start-pgt_buf_top range, but it would still need a "free" function for the pgt_buf_end-pgt_buf_top range to be called after the initial mapping is complete. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-May-24 13:06 UTC
[Xen-devel] Re: Xen MMU''s requirement to pin pages RO and initial_memory_mapping.
On Mon, May 23, 2011 at 04:20:15PM +0100, Stefano Stabellini wrote:> On Tue, 17 May 2011, Konrad Rzeszutek Wilk wrote: > > On Tue, May 17, 2011 at 06:50:55PM +0100, Stefano Stabellini wrote: > > > On Mon, 16 May 2011, Konrad Rzeszutek Wilk wrote: > > > > > They become pagetable pages when: > > > > > > > > > > - they are explicitly pinned by pin_pagetable_pfn > > > > > > > > > > - they are hooked into the current pagetable > > > > > > > > Ok, so could we use those two calls to trigger the pagetable walk > > > > and mark them RO as appropiate? Which call sites are those? The > > > > xen_set_pgd/xen_set_pud/xen_set_pmd ? > > > > > > xen_alloc_pte_init and xen_alloc_pmd_init are the ones that mark the > > > pagetable pages RO and pin them, calling make_lowmem_page_readonly and > > > pin_pagetable_pfn. > > > > > > alloc_pte/pmd are called right before hooking them into the pagetable; > > > unfortunately that means that they fail at marking the pagetable pages > > > RO: make_lowmem_page_readonly uses lookup_address to find the pte > > > corresponding to a page, however at this point the pagetable pages are > > > not mapped yet (usually they are not hooked but when they are hooked, the > > > upper level pagetable page is not hooked), so lookup_address fails. > > > > Right. We don''t have to walk the hooked pagetable, I think. We are passed > > in the PMD/PGD of the PFN and we could look at the content of that PFN. > > Walk each entry in there and for those that are present, determine > > if the page table it points to (whatever level it is) is RO. If not, mark > > it RO. And naturally do it recursively to cover all levels. > > I am not sure what you mean. > The interface is the following: > > void alloc_pte(struct mm_struct *mm, unsigned long pfn); > > pfn is the pagetable page''s pfn, that has to be marked RO in all his mappings; > mm is the mm_struct where this pagetable page is mapped. > Except it is not true anymore because the pagetable page is not mapped > yet in mm; so we cannot walk anything here, unfortunately.I was thinking to "resolve" the pfn, and directly read from the pfn''s the entries. So not walking the mm_struct, but reading the raw data from the PFN page... but I that would not do much as alloc_pte is done _before_ that pagetable is actually populated - so it has nothing in it.> We could remember that we failed to mark this page RO so that the next > time we try to write a pte that contains that address we know we have to > mark it RO. > But this approach is basically equivalent to the one we had before > 2.6.39: we consider the range pgt_buf_start-pgt_buf_end a "published" > range of pagetable pages that we have to mark RO. > In fact it is affected by the same problem: after writing the ptes that > map the range pgt_buf_start-pgt_buf_end, if we try to allocate a new > pagetable page incrementing pgt_buf_end we fail because the new page is > already marked RW in a pinned page. > At the same time we cannot modify the pte to change the mapping to RO > because lookup_address doesn''t find it (the pagetable page containing > the pte in question is not reachable from init_mm yet).So.. why not do the raw walk of the PFN (and within this "raw walk" ioremap the PFNs, and do a depth-first walk on the page-tables do set them to RO) when it is being hooked up to the page-table? Meaning - whatever trigger point is when we try to set a PUD in a PGD, or PTE into a PMD. And naturally we can''t walk the ''init_mm'' as it has not been hooked up yet (and it cannot as the page-tables have not been set to RO).> > > Unfortunately I cannot see an easy way to fix alloc_pte without making > sure that the pfn passed as an argument is already mapped and the pte is > reachable using lookup_address.Lets ignore that for now.> > Alternatively we could come up with a new interface that properly > publishes the pgt_buf_start-pgt_buf_top range, but it would still need a > "free" function for the pgt_buf_end-pgt_buf_top range to be called after > the initial mapping is complete._______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-May-24 16:24 UTC
[Xen-devel] Re: Xen MMU''s requirement to pin pages RO and initial_memory_mapping.
On Tue, 24 May 2011, Konrad Rzeszutek Wilk wrote:> > > Right. We don''t have to walk the hooked pagetable, I think. We are passed > > > in the PMD/PGD of the PFN and we could look at the content of that PFN. > > > Walk each entry in there and for those that are present, determine > > > if the page table it points to (whatever level it is) is RO. If not, mark > > > it RO. And naturally do it recursively to cover all levels. > > > > I am not sure what you mean. > > The interface is the following: > > > > void alloc_pte(struct mm_struct *mm, unsigned long pfn); > > > > pfn is the pagetable page''s pfn, that has to be marked RO in all his mappings; > > mm is the mm_struct where this pagetable page is mapped. > > Except it is not true anymore because the pagetable page is not mapped > > yet in mm; so we cannot walk anything here, unfortunately. > > I was thinking to "resolve" the pfn, and directly read from the pfn''s the > entries. So not walking the mm_struct, but reading the raw data from the > PFN page... but I that would not do much as alloc_pte is done _before_ that > pagetable is actually populated - so it has nothing in it. > > > > We could remember that we failed to mark this page RO so that the next > > time we try to write a pte that contains that address we know we have to > > mark it RO. > > But this approach is basically equivalent to the one we had before > > 2.6.39: we consider the range pgt_buf_start-pgt_buf_end a "published" > > range of pagetable pages that we have to mark RO. > > In fact it is affected by the same problem: after writing the ptes that > > map the range pgt_buf_start-pgt_buf_end, if we try to allocate a new > > pagetable page incrementing pgt_buf_end we fail because the new page is > > already marked RW in a pinned page. > > At the same time we cannot modify the pte to change the mapping to RO > > because lookup_address doesn''t find it (the pagetable page containing > > the pte in question is not reachable from init_mm yet). > > So.. why not do the raw walk of the PFN (and within this > "raw walk" ioremap the PFNs, and do a depth-first walk on the page-tables > do set them to RO) when it is being hooked up to the page-table? > Meaning - whatever trigger point is when we try to set a PUD in a PGD, > or PTE into a PMD. And naturally we can''t walk the ''init_mm'' as it > has not been hooked up yet (and it cannot as the page-tables have not > been set to RO).We cannot use the pvop call when we hook the pagetable page into the upper level because it is right after the alloc_pte call and we still cannot find its mappings through lookup_address. Keep in mind that it is the pagetable page mapping that we have to set read-only, not the content of the pagetable page. The pagetable page mapping might be written before or after the pagetable page is being hooked into the pagetable. BTW I have to say that all the alloc_pte calls are currently wrong because the struct mm that we are passing is meaningless. On the other hand if you are suggesting to use the pvop call when we write pte entries, that is set_pte (xen_set_pte_init), get the pfn from the pte, figure out if it is a pagetable page, and if it is mark it read-only, that is exactly what we are already doing. Unfortunately it has a number of issues: - once the pte entries are written they cannot easily be changed because they still cannot be found using lookup_address. This means that once the pte entries for range pgt_buf_start-pgt_buf_end have been written, we cannot allocate any new pagetable pages, because we don''t have a way to mark them read-only anymore. This is the issue that brought us to the introduction of x86_init.mapping.pagetable_reserve. - we need to distinguish between normal mappings and temporary mappings; among the temporary mappings we need to distinguish between new pagetable pages that are not hooked into the pagetable yet and pagetable already hooked into the pagetable. It is not easy and it is error prone. I think the approach of catching a pte write and trying to understand what the pfn exactly is has proven to be too fragile. We need a simple proper interface, like alloc_pte was supposed to be, otherwise it will keep breaking. At the moment I can only see two possible ways of solving this: 1) Have a way to find the pte that maps a pagetable page when the pagetable page is hooked into the pagetable. Unfortunately I fail to see a way to do it, given the current way of allocating the pagetable pages. Maybe somebody with a deeper knowlegde of kernel_physical_mapping_init could suggest a way. 2) Have a simple way to know if a pfn is a pagetable page. If we had a function like pfn_is_pagetable_page(unsigned long pfn) we could remove the hacks we have in xen_set_pte and mark read-only the entry whenever needed. This implies that we need a way to know all the pfn''s of the pagetable pages in advance. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel