Attilio Rao
2012-Aug-14 12:24 UTC
[PATCH v2 0/2] XEN/X86: Document pagetable_reserve PVOPS and enforce a better semantic
When looking to document the pagetable_reserve PVOPS, I realized that it assumes start == pgt_buf_start. I think this is not semantically right (even if with the current code this should not be a problem in practice) and what we really want is to extend the logic in order to do the RO -> RW conversion also for the range [pgt_buf_start, start). This patch then implements this missing conversion, adds some smaller cleanups and finally provides documentation for the PVOPS. Please look at 2/2 for more details on how the comment is structured. If we get this right we will have a reference to be used later on for other PVOPS.

The difference from v1 is that the sh_start local variable in xen_mapping_pagetable_reserve() is renamed to begin. Also, the commit messages have been tweaked.

Attilio Rao (2):
  XEN, X86: Improve semantic support for pagetable_reserve PVOPS
  Xen: Document the semantic of the pagetable_reserve PVOPS

 arch/x86/include/asm/x86_init.h |   19 +++++++++++++++++--
 arch/x86/mm/init.c              |    4 ++++
 arch/x86/xen/mmu.c              |   22 ++++++++++++++++++++--
 3 files changed, 41 insertions(+), 4 deletions(-)

-- 
1.7.2.5
Attilio Rao
2012-Aug-14 12:24 UTC
[PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
- Allow xen_mapping_pagetable_reserve() to handle a start different from
  pgt_buf_start, but still bigger than it.
- Add checks to xen_mapping_pagetable_reserve() and native_pagetable_reserve()
  for verifying that start and end are contained in the range
  [pgt_buf_start, pgt_buf_top].
- In xen_mapping_pagetable_reserve(), change printk into pr_debug.
- In xen_mapping_pagetable_reserve(), print out a diagnostic only if there is
  an actual need to do that (or, in other words, if there are actually some
  pages going to switch from RO to RW).

Signed-off-by: Attilio Rao <attilio.rao@citrix.com>
---
 arch/x86/mm/init.c |    4 ++++
 arch/x86/xen/mmu.c |   22 ++++++++++++++++++++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index e0e6990..c5849b6 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -92,6 +92,10 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
 
 void __init native_pagetable_reserve(u64 start, u64 end)
 {
+	if (start < PFN_PHYS(pgt_buf_start) || end > PFN_PHYS(pgt_buf_top))
+		panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n",
+		      start, end, PFN_PHYS(pgt_buf_start),
+		      PFN_PHYS(pgt_buf_top));
 	memblock_reserve(start, end - start);
 }
 
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index b65a761..66d73a2 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1180,12 +1180,30 @@ static void __init xen_pagetable_setup_start(pgd_t *base)
 
 static __init void xen_mapping_pagetable_reserve(u64 start, u64 end)
 {
+	u64 begin;
+
+	begin = PFN_PHYS(pgt_buf_start);
+
+	if (start < begin || end > PFN_PHYS(pgt_buf_top))
+		panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n",
+		      start, end, begin, PFN_PHYS(pgt_buf_top));
+
+	/* set RW the initial range */
+	if (start != begin)
+		pr_debug("xen: setting RW the range %llx - %llx\n",
+			 begin, start);
+	while (begin < start) {
+		make_lowmem_page_readwrite(__va(begin));
+		begin += PAGE_SIZE;
+	}
+
 	/* reserve the range used */
 	native_pagetable_reserve(start, end);
 
 	/* set as RW the rest */
-	printk(KERN_DEBUG "xen: setting RW the range %llx - %llx\n", end,
-	       PFN_PHYS(pgt_buf_top));
+	if (end != PFN_PHYS(pgt_buf_top))
+		pr_debug("xen: setting RW the range %llx - %llx\n",
+			 end, PFN_PHYS(pgt_buf_top));
 	while (end < PFN_PHYS(pgt_buf_top)) {
 		make_lowmem_page_readwrite(__va(end));
 		end += PAGE_SIZE;
-- 
1.7.2.5
Attilio Rao
2012-Aug-14 12:24 UTC
[PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
The information added on the hook is:
- Native behaviour
- Xen specific behaviour
- Logic behind the Xen specific behaviour
- PVOPS semantic

Signed-off-by: Attilio Rao <attilio.rao@citrix.com>
---
 arch/x86/include/asm/x86_init.h |   19 +++++++++++++++++--
 1 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 38155f6..b22093c 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -72,8 +72,23 @@ struct x86_init_oem {
  * struct x86_init_mapping - platform specific initial kernel pagetable setup
  * @pagetable_reserve:	reserve a range of addresses for kernel pagetable usage
  *
- * For more details on the purpose of this hook, look in
- * init_memory_mapping and the commit that added it.
+ * It does reserve a range of pages, to be used as pagetable pages.
+ * The start and end parameters are expected to be contained in the
+ * [pgt_buf_start, pgt_buf_top] range.
+ * The native implementation reserves the pages via the memblock_reserve()
+ * interface.
+ * The Xen implementation, besides reserving the range via memblock_reserve(),
+ * also sets RW the remaining pages contained in the ranges
+ * [pgt_buf_start, start) and [end, pgt_buf_top).
+ * This is needed because the range [pgt_buf_start, pgt_buf_top] was
+ * previously mapped read-only by xen_set_pte_init: when running
+ * on Xen all the pagetable pages need to be mapped read-only in order to
+ * avoid protection faults from the hypervisor. However, once the correct
+ * amount of pages is reserved for the pagetables, all the others contained
+ * in the range must be set to RW so that they can be correctly recycled by
+ * the VM subsystem.
+ * This operation is meant to be performed only during init_memory_mapping(),
+ * just after space for the kernel direct mapping tables is found.
  */
 struct x86_init_mapping {
 	void (*pagetable_reserve)(u64 start, u64 end);
-- 
1.7.2.5
David Vrabel
2012-Aug-14 13:57 UTC
Re: [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
On 14/08/12 13:24, Attilio Rao wrote:
> The informations added on the hook are:
> - Native behaviour
> - Xen specific behaviour
> - Logic behind the Xen specific behaviour

These are implementation details and should be documented with the implementations (if necessary).

> - PVOPS semantic

This is the interesting stuff.

This particular pvop seems a little odd really. It might make more sense if it took a third parameter for pgt_buf_top.

"@pagetable_reserve is used to reserve a range of PFNs used for the kernel direct mapping page tables and cleans-up any PFNs that ended up not being used for the tables.

It shall reserve the range (start, end] with memblock_reserve(). It shall prepare PFNs in the range (end, pgt_buf_top] for general (non page table) use.

It shall only be called in init_memory_mapping() after the direct mapping tables have been constructed."

Having said that, I couldn't immediately see where pages in (end, pgt_buf_top] was getting set RO. Can you point me to where it's done?

David

> Signed-off-by: Attilio Rao <attilio.rao@citrix.com>
> ---
>  arch/x86/include/asm/x86_init.h |   19 +++++++++++++++++--
>  1 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index 38155f6..b22093c 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -72,8 +72,23 @@ struct x86_init_oem {
>   * struct x86_init_mapping - platform specific initial kernel pagetable setup
>   * @pagetable_reserve:	reserve a range of addresses for kernel pagetable usage
>   *
> - * For more details on the purpose of this hook, look in
> - * init_memory_mapping and the commit that added it.
> + * It does reserve a range of pages, to be used as pagetable pages.
> + * The start and end parameters are expected to be contained in the
> + * [pgt_buf_start, pgt_buf_top] range.
> + * The native implementation reserves the pages via the memblock_reserve()
> + * interface.
> + * The Xen implementation, besides reserving the range via memblock_reserve(),
> + * also sets RW the remaining pages contained in the ranges
> + * [pgt_buf_start, start) and [end, pgt_buf_top).
> + * This is needed because the range [pgt_buf_start, pgt_buf_top] was
> + * previously mapped read-only by xen_set_pte_init: when running
> + * on Xen all the pagetable pages need to be mapped read-only in order to
> + * avoid protection faults from the hypervisor. However, once the correct
> + * amount of pages is reserved for the pagetables, all the others contained
> + * in the range must be set to RW so that they can be correctly recycled by
> + * the VM subsystem.
> + * This operation is meant to be performed only during init_memory_mapping(),
> + * just after space for the kernel direct mapping tables is found.
>   */
>  struct x86_init_mapping {
>  	void (*pagetable_reserve)(u64 start, u64 end);
Attilio Rao
2012-Aug-14 14:12 UTC
Re: [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
On 14/08/12 14:57, David Vrabel wrote:
> On 14/08/12 13:24, Attilio Rao wrote:
>
>> The informations added on the hook are:
>> - Native behaviour
>> - Xen specific behaviour
>> - Logic behind the Xen specific behaviour
>>
> These are implementation details and should be documented with the
> implementations (if necessary).
>

In this specific case, implementation details are very valuable to understand the semantics of the operation, which is why I added them there. I think, at least for this case, this is the best trade-off.

>> - PVOPS semantic
>>
> This is the interesting stuff.
>
> This particular pvop seems a little odd really. It might make more
> sense if it took a third parameter for pgt_buf_top.
>

The thing is that this work (documenting PVOPS) should help in understanding the logic behind some PVOPS and possibly improve/rework them. At this stage, I agreed with Konrad to keep the changes as small as possible. Once the documentation about the semantics is in place, we can think about ways to improve things more effectively (for example, in some cases we may want to rewrite the PVOP completely).

> "@pagetable_reserve is used to reserve a range of PFNs used for the
> kernel direct mapping page tables and cleans-up any PFNs that ended up
> not being used for the tables.
>
> It shall reserve the range (start, end] with memblock_reserve(). It
> shall prepare PFNs in the range (end, pgt_buf_top] for general (non page
> table) use.
>
> It shall only be called in init_memory_mapping() after the direct
> mapping tables have been constructed."
>
> Having said that, I couldn't immediately see where pages in (end,
> pgt_buf_top] was getting set RO. Can you point me to where it's done?
>

As mentioned in the comment, please look at xen_set_pte_init().

Attilio
David Vrabel
2012-Aug-15 11:19 UTC
Re: [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
On 14/08/12 15:12, Attilio Rao wrote:
> On 14/08/12 14:57, David Vrabel wrote:
>> On 14/08/12 13:24, Attilio Rao wrote:
>>
>>> The informations added on the hook are: - Native behaviour - Xen
>>> specific behaviour - Logic behind the Xen specific behaviour
>>>
>> These are implementation details and should be documented with the
>> implementations (if necessary).
>>
>
> In this specific case, implementation details are very valuable to
> understand the semantic of operations, this is why I added them
> there. I think, at least for this case, this is the best trade-off.

Documenting the implementation details will be useful for reviewing or refactoring the pv-ops but I don't think it useful in the longer term for it to be included in the API documentation upstream.

>>> - PVOPS semantic
>>>
>> This is the interesting stuff.
>>
>> This particular pvop seems a little odd really. It might make more
>> sense if it took a third parameter for pgt_buf_top.
>
> The thing is that this work (documenting PVOPS) should help in
> understanding the logic behind some PVOPS and possibly
> improve/rework them. For this stage, I agreed with Konrad to keep the
> changes as small as possible. Once the documentation about the
> semantic is in place we can think about ways to improve things more
> effectively (for example, in some cases we may want to rewrite the
> PVOP completely).

After looking at it some more, I think this pv-op is unnecessary. How about the following patch to just remove it completely?

I've only smoke-tested 32-bit and 64-bit dom0 but I think the reasoning is sound.

>> "@pagetable_reserve is used to reserve a range of PFNs used for the
>> kernel direct mapping page tables and cleans-up any PFNs that ended
>> up not being used for the tables.
>>
>> It shall reserve the range (start, end] with memblock_reserve(). It
>> shall prepare PFNs in the range (end, pgt_buf_top] for general (non
>> page table) use.
>>
>> It shall only be called in init_memory_mapping() after the direct
>> mapping tables have been constructed."
>>
>> Having said that, I couldn't immediately see where pages in (end,
>> pgt_buf_top] was getting set RO. Can you point me to where it's
>> done?
>>
>
> As mentioned in the comment, please look at xen_set_pte_init().

xen_set_pte_init() only ensures it doesn't set the PTE as writable if it is already present and read-only.

David

8<----------------------
x86: remove x86_init.mapping.pagetable_reserve paravirt op

The x86_init.mapping.pagetable_reserve paravirt op is used for Xen guests to set the writable flag for the mapping of (pgt_buf_end, pgt_buf_top]. This is not necessary as these pages are never set as read-only as they have never contained page tables.

When running as a Xen guest, the initial page tables are provided by Xen (these are reserved with memblock_reserve() in xen_setup_kernel_pagetable()) and constructed in brk space (for 32-bit guests) or in the kernel's .data section (for 64-bit guests, see head_64.S).

Since these are all marked as reserved, (pgt_buf_start, pgt_buf_top] does not overlap with them and the mappings for these PFNs will be read-write.

Since Xen doesn't need to change the mapping, its implementation becomes the same as the native one and we can simply remove this pv-op completely.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/include/asm/pgtable_types.h |    1 -
 arch/x86/include/asm/x86_init.h      |   12 ------------
 arch/x86/kernel/x86_init.c           |    4 ----
 arch/x86/mm/init.c                   |   22 +++-------------------
 arch/x86/xen/mmu.c                   |   19 ++-----------------
 5 files changed, 5 insertions(+), 53 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 013286a..0a11293 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -301,7 +301,6 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
 /* Install a pte for a particular vaddr in kernel space. */
 void set_pte_vaddr(unsigned long vaddr, pte_t pte);
 
-extern void native_pagetable_reserve(u64 start, u64 end);
 #ifdef CONFIG_X86_32
 extern void native_pagetable_setup_start(pgd_t *base);
 extern void native_pagetable_setup_done(pgd_t *base);
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 38155f6..b527dd4 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -69,17 +69,6 @@ struct x86_init_oem {
 };
 
 /**
- * struct x86_init_mapping - platform specific initial kernel pagetable setup
- * @pagetable_reserve:	reserve a range of addresses for kernel pagetable usage
- *
- * For more details on the purpose of this hook, look in
- * init_memory_mapping and the commit that added it.
- */
-struct x86_init_mapping {
-	void (*pagetable_reserve)(u64 start, u64 end);
-};
-
-/**
  * struct x86_init_paging - platform specific paging functions
  * @pagetable_setup_start:	platform specific pre paging_init() call
  * @pagetable_setup_done:	platform specific post paging_init() call
@@ -135,7 +124,6 @@ struct x86_init_ops {
 	struct x86_init_mpparse mpparse;
 	struct x86_init_irqs irqs;
 	struct x86_init_oem oem;
-	struct x86_init_mapping mapping;
 	struct x86_init_paging paging;
 	struct x86_init_timers timers;
 	struct x86_init_iommu iommu;
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 9f3167e..040c05f 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -63,10 +63,6 @@ struct x86_init_ops x86_init __initdata = {
 		.banner = default_banner,
 	},
 
-	.mapping = {
-		.pagetable_reserve = native_pagetable_reserve,
-	},
-
 	.paging = {
 		.pagetable_setup_start = native_pagetable_setup_start,
 		.pagetable_setup_done = native_pagetable_setup_done,
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index e0e6990..c449873 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -90,11 +90,6 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
 		(pgt_buf_top << PAGE_SHIFT) - 1);
 }
 
-void __init native_pagetable_reserve(u64 start, u64 end)
-{
-	memblock_reserve(start, end - start);
-}
-
 #ifdef CONFIG_X86_32
 #define NR_RANGE_MR 3
 #else /* CONFIG_X86_64 */
@@ -283,22 +278,11 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
 
 	/*
 	 * Reserve the kernel pagetable pages we used (pgt_buf_start -
-	 * pgt_buf_end) and free the other ones (pgt_buf_end - pgt_buf_top)
-	 * so that they can be reused for other purposes.
-	 *
-	 * On native it just means calling memblock_reserve, on Xen it also
-	 * means marking RW the pagetable pages that we allocated before
-	 * but that haven't been used.
-	 *
-	 * In fact on xen we mark RO the whole range pgt_buf_start -
-	 * pgt_buf_top, because we have to make sure that when
-	 * init_memory_mapping reaches the pagetable pages area, it maps
-	 * RO all the pagetable pages, including the ones that are beyond
-	 * pgt_buf_end at that time.
+	 * pgt_buf_end).
	 */
 	if (!after_bootmem && pgt_buf_end > pgt_buf_start)
-		x86_init.mapping.pagetable_reserve(PFN_PHYS(pgt_buf_start),
-				PFN_PHYS(pgt_buf_end));
+		memblock_reserve(PFN_PHYS(pgt_buf_start),
+				 PFN_PHYS(pgt_buf_end) - PFN_PHYS(pgt_buf_start));
 
 	if (!after_bootmem)
 		early_memtest(start, end);
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index b65a761..e55dfc0 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1178,20 +1178,6 @@ static void __init xen_pagetable_setup_start(pgd_t *base)
 {
 }
 
-static __init void xen_mapping_pagetable_reserve(u64 start, u64 end)
-{
-	/* reserve the range used */
-	native_pagetable_reserve(start, end);
-
-	/* set as RW the rest */
-	printk(KERN_DEBUG "xen: setting RW the range %llx - %llx\n", end,
-	       PFN_PHYS(pgt_buf_top));
-	while (end < PFN_PHYS(pgt_buf_top)) {
-		make_lowmem_page_readwrite(__va(end));
-		end += PAGE_SIZE;
-	}
-}
-
 static void xen_post_allocator_init(void);
 
 static void __init xen_pagetable_setup_done(pgd_t *base)
@@ -2067,7 +2053,6 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
 
 void __init xen_init_mmu_ops(void)
 {
-	x86_init.mapping.pagetable_reserve = xen_mapping_pagetable_reserve;
 	x86_init.paging.pagetable_setup_start = xen_pagetable_setup_start;
 	x86_init.paging.pagetable_setup_done = xen_pagetable_setup_done;
 	pv_mmu_ops = xen_mmu_ops;
-- 
1.7.2.5
Stefano Stabellini
2012-Aug-15 13:55 UTC
Re: [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
On Wed, 15 Aug 2012, David Vrabel wrote:
> On 14/08/12 15:12, Attilio Rao wrote:
> > On 14/08/12 14:57, David Vrabel wrote:
> >> On 14/08/12 13:24, Attilio Rao wrote:
> >>
> >>> The informations added on the hook are: - Native behaviour - Xen
> >>> specific behaviour - Logic behind the Xen specific behaviour
> >>>
> >> These are implementation details and should be documented with the
> >> implementations (if necessary).
> >>
> >
> > In this specific case, implementation details are very valuable to
> > understand the semantic of operations, this is why I added them
> > there. I think, at least for this case, this is the best trade-off.
>
> Documenting the implementation details will be useful for reviewing or
> refactoring the pv-ops but I don't think it useful in the longer term
> for it to be included in the API documentation upstream.
>
> >>> - PVOPS semantic
> >>>
> >> This is the interesting stuff.
> >>
> >> This particular pvop seems a little odd really. It might make more
> >> sense if it took a third parameter for pgt_buf_top.
> >
> > The thing is that this work (documenting PVOPS) should help in
> > understanding the logic behind some PVOPS and possibly
> > improve/rework them. For this stage, I agreed with Konrad to keep the
> > changes as small as possible. Once the documentation about the
> > semantic is in place we can think about ways to improve things more
> > effectively (for example, in some cases we may want to rewrite the
> > PVOP completely).
>
> After looking at it some more, I think this pv-ops is unnecessary. How
> about the following patch to just remove it completely?
>
> I've only smoke-tested 32-bit and 64-bit dom0 but I think the reasoning
> is sound.

Do you have more than 4G assigned to dom0 on those boxes? It certainly fixed a serious crash at the time it was introduced, see http://marc.info/?l=linux-kernel&m=129901609503574 and http://marc.info/?l=linux-kernel&m=130133909408229.

Unless something big changed in kernel_physical_mapping_init, I think we still need it. Depending on the e820 of your test box, the kernel could crash (or not), possibly in different places.

> >> "@pagetable_reserve is used to reserve a range of PFNs used for the
> >> kernel direct mapping page tables and cleans-up any PFNs that ended
> >> up not being used for the tables.
> >>
> >> It shall reserve the range (start, end] with memblock_reserve(). It
> >> shall prepare PFNs in the range (end, pgt_buf_top] for general (non
> >> page table) use.
> >>
> >> It shall only be called in init_memory_mapping() after the direct
> >> mapping tables have been constructed."
> >>
> >> Having said that, I couldn't immediately see where pages in (end,
> >> pgt_buf_top] was getting set RO. Can you point me to where it's
> >> done?
> >>
> >
> > As mentioned in the comment, please look at xen_set_pte_init().
>
> xen_set_pte_init() only ensures it doesn't set the PTE as writable if it
> is already present and read-only.

Look at mask_rw_pte and read the threads linked above; unfortunately it is not that simple.

> David
>
> 8<----------------------
> x86: remove x86_init.mapping.pagetable_reserve paravirt op
>
> The x86_init.mapping.pagetable_reserve paravirt op is used for Xen
> guests to set the writable flag for the mapping of (pgt_buf_end,
> pgt_buf_top]. This is not necessary as these pages are never set as
> read-only as they have never contained page tables.

Is this actually true? It is possible when pagetable pages are allocated by alloc_low_page.

> When running as a Xen guest, the initial page tables are provided by
> Xen (these are reserved with memblock_reserve() in
> xen_setup_kernel_pagetable()) and constructed in brk space (for 32-bit
> guests) or in the kernel's .data section (for 64-bit guests, see
> head_64.S).
>
> Since these are all marked as reserved, (pgt_buf_start, pgt_buf_top]
> does not overlap with them and the mappings for these PFNs will be
> read-write.

We are talking about pagetable pages built by kernel_physical_mapping_init.

> Since Xen doesn't need to change the mapping its implementation
> becomes the same as a native and we can simply remove this pv-op
> completely.

I don't think so.

> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
>  arch/x86/include/asm/pgtable_types.h |    1 -
>  arch/x86/include/asm/x86_init.h      |   12 ------------
>  arch/x86/kernel/x86_init.c           |    4 ----
>  arch/x86/mm/init.c                   |   22 +++-------------------
>  arch/x86/xen/mmu.c                   |   19 ++-----------------
>  5 files changed, 5 insertions(+), 53 deletions(-)
>
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index 013286a..0a11293 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -301,7 +301,6 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
>  /* Install a pte for a particular vaddr in kernel space. */
>  void set_pte_vaddr(unsigned long vaddr, pte_t pte);
>
> -extern void native_pagetable_reserve(u64 start, u64 end);
>  #ifdef CONFIG_X86_32
>  extern void native_pagetable_setup_start(pgd_t *base);
>  extern void native_pagetable_setup_done(pgd_t *base);
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index 38155f6..b527dd4 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -69,17 +69,6 @@ struct x86_init_oem {
>  };
>
>  /**
> - * struct x86_init_mapping - platform specific initial kernel pagetable setup
> - * @pagetable_reserve:	reserve a range of addresses for kernel pagetable usage
> - *
> - * For more details on the purpose of this hook, look in
> - * init_memory_mapping and the commit that added it.
> - */
> -struct x86_init_mapping {
> -	void (*pagetable_reserve)(u64 start, u64 end);
> -};
> -
> -/**
>   * struct x86_init_paging - platform specific paging functions
>   * @pagetable_setup_start:	platform specific pre paging_init() call
>   * @pagetable_setup_done:	platform specific post paging_init() call
> @@ -135,7 +124,6 @@ struct x86_init_ops {
>  	struct x86_init_mpparse mpparse;
>  	struct x86_init_irqs irqs;
>  	struct x86_init_oem oem;
> -	struct x86_init_mapping mapping;
>  	struct x86_init_paging paging;
>  	struct x86_init_timers timers;
>  	struct x86_init_iommu iommu;
> diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
> index 9f3167e..040c05f 100644
> --- a/arch/x86/kernel/x86_init.c
> +++ b/arch/x86/kernel/x86_init.c
> @@ -63,10 +63,6 @@ struct x86_init_ops x86_init __initdata = {
>  		.banner = default_banner,
>  	},
>
> -	.mapping = {
> -		.pagetable_reserve = native_pagetable_reserve,
> -	},
> -
>  	.paging = {
>  		.pagetable_setup_start = native_pagetable_setup_start,
>  		.pagetable_setup_done = native_pagetable_setup_done,
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index e0e6990..c449873 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -90,11 +90,6 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
>  		(pgt_buf_top << PAGE_SHIFT) - 1);
>  }
>
> -void __init native_pagetable_reserve(u64 start, u64 end)
> -{
> -	memblock_reserve(start, end - start);
> -}
> -
>  #ifdef CONFIG_X86_32
>  #define NR_RANGE_MR 3
>  #else /* CONFIG_X86_64 */
> @@ -283,22 +278,11 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
>
>  	/*
>  	 * Reserve the kernel pagetable pages we used (pgt_buf_start -
> -	 * pgt_buf_end) and free the other ones (pgt_buf_end - pgt_buf_top)
> -	 * so that they can be reused for other purposes.
> -	 *
> -	 * On native it just means calling memblock_reserve, on Xen it also
> -	 * means marking RW the pagetable pages that we allocated before
> -	 * but that haven't been used.
> -	 *
> -	 * In fact on xen we mark RO the whole range pgt_buf_start -
> -	 * pgt_buf_top, because we have to make sure that when
> -	 * init_memory_mapping reaches the pagetable pages area, it maps
> -	 * RO all the pagetable pages, including the ones that are beyond
> -	 * pgt_buf_end at that time.
> +	 * pgt_buf_end).
>  	 */
>  	if (!after_bootmem && pgt_buf_end > pgt_buf_start)
> -		x86_init.mapping.pagetable_reserve(PFN_PHYS(pgt_buf_start),
> -				PFN_PHYS(pgt_buf_end));
> +		memblock_reserve(PFN_PHYS(pgt_buf_start),
> +				 PFN_PHYS(pgt_buf_end) - PFN_PHYS(pgt_buf_start));
>
>  	if (!after_bootmem)
>  		early_memtest(start, end);
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index b65a761..e55dfc0 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1178,20 +1178,6 @@ static void __init xen_pagetable_setup_start(pgd_t *base)
>  {
>  }
>
> -static __init void xen_mapping_pagetable_reserve(u64 start, u64 end)
> -{
> -	/* reserve the range used */
> -	native_pagetable_reserve(start, end);
> -
> -	/* set as RW the rest */
> -	printk(KERN_DEBUG "xen: setting RW the range %llx - %llx\n", end,
> -	       PFN_PHYS(pgt_buf_top));
> -	while (end < PFN_PHYS(pgt_buf_top)) {
> -		make_lowmem_page_readwrite(__va(end));
> -		end += PAGE_SIZE;
> -	}
> -}
> -
>  static void xen_post_allocator_init(void);
>
>  static void __init xen_pagetable_setup_done(pgd_t *base)
> @@ -2067,7 +2053,6 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
>
>  void __init xen_init_mmu_ops(void)
>  {
> -	x86_init.mapping.pagetable_reserve = xen_mapping_pagetable_reserve;
>  	x86_init.paging.pagetable_setup_start = xen_pagetable_setup_start;
>  	x86_init.paging.pagetable_setup_done = xen_pagetable_setup_done;
>  	pv_mmu_ops = xen_mmu_ops;
> -- 
> 1.7.2.5
Attilio Rao
2012-Aug-15 17:15 UTC
Re: [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
On 15/08/12 18:25, Stefano Stabellini wrote:
> On Tue, 14 Aug 2012, Attilio Rao wrote:
>
>> - Allow xen_mapping_pagetable_reserve() to handle a start different from
>> pgt_buf_start, but still bigger than it.
>> - Add checks to xen_mapping_pagetable_reserve() and native_pagetable_reserve()
>> for verifying start and end are contained in the range
>> [pgt_buf_start, pgt_buf_top].
>> - In xen_mapping_pagetable_reserve(), change printk into pr_debug.
>> - In xen_mapping_pagetable_reserve(), print out diagnostic only if there is
>> an actual need to do that (or, in other words, if there are actually some
>> pages going to switch from RO to RW).
>>
>> Signed-off-by: Attilio Rao <attilio.rao@citrix.com>
>> ---
>> arch/x86/mm/init.c | 4 ++++
>> arch/x86/xen/mmu.c | 22 ++++++++++++++++++++--
>> 2 files changed, 24 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
>> index e0e6990..c5849b6 100644
>> --- a/arch/x86/mm/init.c
>> +++ b/arch/x86/mm/init.c
>> @@ -92,6 +92,10 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
>>
>> void __init native_pagetable_reserve(u64 start, u64 end)
>> {
>> +	if (start < PFN_PHYS(pgt_buf_start) || end > PFN_PHYS(pgt_buf_top))
>> +		panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"
>>
> code style (you can check whether your patch breaks the code style with
> scripts/checkpatch.pl)
>

I actually did, before submitting; it reported 0 errors/warnings. Do you have a handy link to a style guide for the Linux kernel? I tried to follow what other parts of the code do.

Attilio
Stefano Stabellini
2012-Aug-15 17:25 UTC
Re: [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
On Tue, 14 Aug 2012, Attilio Rao wrote:
> - Allow xen_mapping_pagetable_reserve() to handle a start different from
> pgt_buf_start, but still bigger than it.
> - Add checks to xen_mapping_pagetable_reserve() and native_pagetable_reserve()
> for verifying start and end are contained in the range
> [pgt_buf_start, pgt_buf_top].
> - In xen_mapping_pagetable_reserve(), change printk into pr_debug.
> - In xen_mapping_pagetable_reserve(), print out diagnostic only if there is
> an actual need to do that (or, in other words, if there are actually some
> pages going to switch from RO to RW).
>
> Signed-off-by: Attilio Rao <attilio.rao@citrix.com>
> ---
> arch/x86/mm/init.c | 4 ++++
> arch/x86/xen/mmu.c | 22 ++++++++++++++++++++--
> 2 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index e0e6990..c5849b6 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -92,6 +92,10 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
>
> void __init native_pagetable_reserve(u64 start, u64 end)
> {
> + if (start < PFN_PHYS(pgt_buf_start) || end > PFN_PHYS(pgt_buf_top))
> + panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"

code style (you can check whether your patch breaks the code style with
scripts/checkpatch.pl)

> + start, end, PFN_PHYS(pgt_buf_start),
> + PFN_PHYS(pgt_buf_top));
> memblock_reserve(start, end - start);
> }
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index b65a761..66d73a2 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1180,12 +1180,30 @@ static void __init xen_pagetable_setup_start(pgd_t *base)
>
> static __init void xen_mapping_pagetable_reserve(u64 start, u64 end)
> {
> + u64 begin;
> +
> + begin = PFN_PHYS(pgt_buf_start);
> +
> + if (start < begin || end > PFN_PHYS(pgt_buf_top))
> + panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"

code style

> + start, end, begin, PFN_PHYS(pgt_buf_top));
> +
> + /* set RW the initial range */
> + if (start != begin)
> + pr_debug("xen: setting RW the range %llx - %llx\n",
> + begin, start);
> + while (begin < start) {
> + make_lowmem_page_readwrite(__va(begin));
> + begin += PAGE_SIZE;
> + }
> +
> /* reserve the range used */
> native_pagetable_reserve(start, end);
>
> /* set as RW the rest */
> - printk(KERN_DEBUG "xen: setting RW the range %llx - %llx\n", end,
> - PFN_PHYS(pgt_buf_top));
> + if (end != PFN_PHYS(pgt_buf_top))
> + pr_debug("xen: setting RW the range %llx - %llx\n",
> + end, PFN_PHYS(pgt_buf_top));
> while (end < PFN_PHYS(pgt_buf_top)) {
> make_lowmem_page_readwrite(__va(end));
> end += PAGE_SIZE;
> --
> 1.7.2.5
>
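[Editor's note] The hook under review splits [pgt_buf_start, pgt_buf_top] into three sub-ranges: the unused head [pgt_buf_start, start) is flipped back to RW, [start, end) is reserved, and the unused tail [end, pgt_buf_top) is flipped back to RW. The range arithmetic can be sketched as a small userspace C model; the PFN values, the `PFN_PHYS` macro, and the `make_page_rw()` counter are simplified stand-ins for the kernel globals and `make_lowmem_page_readwrite()`, not the real implementation.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define PFN_PHYS(pfn) ((uint64_t)(pfn) << PAGE_SHIFT)

/* Illustrative stand-ins for the kernel globals (values are PFNs). */
static const uint64_t pgt_buf_start = 0x100;
static const uint64_t pgt_buf_top   = 0x140;

/* Counts calls standing in for make_lowmem_page_readwrite(). */
static unsigned int rw_pages;

static void make_page_rw(uint64_t phys)
{
    (void)phys;
    rw_pages++;
}

/*
 * Sketch of the hook's range logic: reject ranges outside
 * [pgt_buf_start, pgt_buf_top], flip the head [pgt_buf_start, start)
 * and the tail [end, pgt_buf_top) back to RW, and (in the kernel)
 * reserve [start, end) via native_pagetable_reserve().
 */
static int pagetable_reserve(uint64_t start, uint64_t end)
{
    uint64_t begin = PFN_PHYS(pgt_buf_start);
    uint64_t top = PFN_PHYS(pgt_buf_top);

    if (start < begin || end > top)
        return -1;              /* the patch panics here */

    for (; begin < start; begin += PAGE_SIZE)
        make_page_rw(begin);    /* head: was RO, now unused */

    /* native_pagetable_reserve(start, end) would run here */

    for (; end < top; end += PAGE_SIZE)
        make_page_rw(end);      /* tail: was RO, now unused */

    return 0;
}
```

With pgt_buf spanning PFNs 0x100-0x140, reserving [0x110, 0x120) flips the 16 head pages and the 32 tail pages back to RW, and an out-of-window start is rejected before anything is touched.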
Stefano Stabellini
2012-Aug-15 17:33 UTC
Re: [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
You can add my Acked-by, once you do these changes.

On Wed, 15 Aug 2012, Stefano Stabellini wrote:
> On Tue, 14 Aug 2012, Attilio Rao wrote:
> > - Allow xen_mapping_pagetable_reserve() to handle a start different from
> > pgt_buf_start, but still bigger than it.
> > - Add checks to xen_mapping_pagetable_reserve() and native_pagetable_reserve()
> > for verifying start and end are contained in the range
> > [pgt_buf_start, pgt_buf_top].
> > - In xen_mapping_pagetable_reserve(), change printk into pr_debug.
> > - In xen_mapping_pagetable_reserve(), print out diagnostic only if there is
> > an actual need to do that (or, in other words, if there are actually some
> > pages going to switch from RO to RW).
[...]
> > void __init native_pagetable_reserve(u64 start, u64 end)
> > {
> > + if (start < PFN_PHYS(pgt_buf_start) || end > PFN_PHYS(pgt_buf_top))
> > + panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"
>
> code style (you can check whether your patch breaks the code style with
> scripts/checkpatch.pl)
>
[...]
> > + if (start < begin || end > PFN_PHYS(pgt_buf_top))
> > + panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"
>
> code style
>
[...]
Stefano Stabellini
2012-Aug-15 17:46 UTC
Re: [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
On Wed, 15 Aug 2012, Attilio Rao wrote:
> On 15/08/12 18:25, Stefano Stabellini wrote:
> > On Tue, 14 Aug 2012, Attilio Rao wrote:
> >
[...]
> >> void __init native_pagetable_reserve(u64 start, u64 end)
> >> {
> >> + if (start< PFN_PHYS(pgt_buf_start) || end> PFN_PHYS(pgt_buf_top))
> >> + panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"
> >>
> > code style (you can check whether your patch breaks the code style with
> > scripts/checkpatch.pl)
> >
> I actually did before to submit, it reported 0 errors/warnings.

strange, that really looks like a line over 80 chars

> Do you have a handy link on where I can find a style guide for the Linux
> kernel? I tried to follow what other parts of the code do.

Documentation/CodingStyle
Stefano Stabellini
2012-Aug-15 17:46 UTC
Re: [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
On Tue, 14 Aug 2012, Attilio Rao wrote:
> The information added on the hook is:
> - Native behaviour
> - Xen specific behaviour
> - Logic behind the Xen specific behaviour
> - PVOPS semantic
>
> Signed-off-by: Attilio Rao <attilio.rao@citrix.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

> arch/x86/include/asm/x86_init.h | 19 +++++++++++++++++--
> 1 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index 38155f6..b22093c 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -72,8 +72,23 @@ struct x86_init_oem {
> * struct x86_init_mapping - platform specific initial kernel pagetable setup
> * @pagetable_reserve: reserve a range of addresses for kernel pagetable usage
> *
> - * For more details on the purpose of this hook, look in
> - * init_memory_mapping and the commit that added it.
> + * It does reserve a range of pages, to be used as pagetable pages.
> + * The start and end parameters are expected to be contained in the
> + * [pgt_buf_start, pgt_buf_top] range.
> + * The native implementation reserves the pages via the memblock_reserve()
> + * interface.
> + * The Xen implementation, besides reserving the range via memblock_reserve(),
> + * also sets RW the remaining pages contained in the ranges
> + * [pgt_buf_start, start) and [end, pgt_buf_top).
> + * This is needed because the range [pgt_buf_start, pgt_buf_top] was
> + * previously mapped read-only by xen_set_pte_init: when running
> + * on Xen all the pagetable pages need to be mapped read-only in order to
> + * avoid protection faults from the hypervisor. However, once the correct
> + * amount of pages is reserved for the pagetables, all the others contained
> + * in the range must be set to RW so that they can be correctly recycled by
> + * the VM subsystem.
> + * This operation is meant to be performed only during init_memory_mapping(),
> + * just after space for the kernel direct mapping tables is found.
> */
> struct x86_init_mapping {
> void (*pagetable_reserve)(u64 start, u64 end);
> --
> 1.7.2.5
>
Ian Campbell
2012-Aug-15 18:43 UTC
Re: [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
On Wed, 2012-08-15 at 18:46 +0100, Stefano Stabellini wrote:
> On Wed, 15 Aug 2012, Attilio Rao wrote:
> > On 15/08/12 18:25, Stefano Stabellini wrote:
> > > On Tue, 14 Aug 2012, Attilio Rao wrote:
> > >
[...]
> > >> void __init native_pagetable_reserve(u64 start, u64 end)
> > >> {
> > >> + if (start< PFN_PHYS(pgt_buf_start) || end> PFN_PHYS(pgt_buf_top))
> > >> + panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"
> > >>
> > > code style (you can check whether your patch breaks the code style with
> > > scripts/checkpatch.pl)
> > >
> > I actually did before to submit, it reported 0 errors/warning.
>
> strange, that really looks like a line over 80 chars

Also there should be one space either side of the "<" and ">" in the
conditional.

> > Do you have a handy link on where I can find a style guide for the Linux
> > kernel? I tried to follow what other parts of the code do.
>
> Documentation/CodingStyle
Attilio Rao
2012-Aug-15 18:47 UTC
Re: [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
On 15/08/12 18:46, Stefano Stabellini wrote:
> On Wed, 15 Aug 2012, Attilio Rao wrote:
>
>> On 15/08/12 18:25, Stefano Stabellini wrote:
>>
>>> On Tue, 14 Aug 2012, Attilio Rao wrote:
>>>
[...]
>>>> void __init native_pagetable_reserve(u64 start, u64 end)
>>>> {
>>>> + if (start< PFN_PHYS(pgt_buf_start) || end> PFN_PHYS(pgt_buf_top))
>>>> + panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"
>>>>
>>> code style (you can check whether your patch breaks the code style with
>>> scripts/checkpatch.pl)
>>>
>> I actually did before to submit, it reported 0 errors/warning.
>>
> strange, that really looks like a line over 80 chars
>

Actually the coding style explicitly says not to break strings, because
they want to retain the ability to grep. In FreeBSD this is the same, and
I think this is why checkpatch doesn't whine. I don't think there is a
bug here.

Can I submit the patch as it is, then?

Attilio
Attilio Rao
2012-Aug-15 18:50 UTC
Re: [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
On 15/08/12 19:43, Ian Campbell wrote:
> On Wed, 2012-08-15 at 18:46 +0100, Stefano Stabellini wrote:
>
>> On Wed, 15 Aug 2012, Attilio Rao wrote:
>>
>>> On 15/08/12 18:25, Stefano Stabellini wrote:
>>>
>>>> On Tue, 14 Aug 2012, Attilio Rao wrote:
>>>>
[...]
>>>>> void __init native_pagetable_reserve(u64 start, u64 end)
>>>>> {
>>>>> + if (start< PFN_PHYS(pgt_buf_start) || end> PFN_PHYS(pgt_buf_top))
>>>>> + panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"
>>>>>
>>>> code style (you can check whether your patch breaks the code style with
>>>> scripts/checkpatch.pl)
>>>>
>>> I actually did before to submit, it reported 0 errors/warning.
>>>
>> strange, that really looks like a line over 80 chars
>>
> Also there should be one space either side of the "<" and ">" in the
> conditional.
>

I have no idea why they are reported like that, but in the original patch
the space is fine.

Attilio
David Vrabel
2012-Aug-15 19:15 UTC
Re: [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
On 15/08/12 14:55, Stefano Stabellini wrote:
> On Wed, 15 Aug 2012, David Vrabel wrote:
>> On 14/08/12 15:12, Attilio Rao wrote:
>>> On 14/08/12 14:57, David Vrabel wrote:
>>>> On 14/08/12 13:24, Attilio Rao wrote:
>> After looking at it some more, I think this pv-ops is unnecessary. How
>> about the following patch to just remove it completely?
>>
>> I've only smoke-tested 32-bit and 64-bit dom0 but I think the reasoning
>> is sound.
>
> Do you have more than 4G to dom0 on those boxes?

I've tested with 6G now, both 64-bit and 32-bit with HIGHPTE.

> It certainly fixed a serious crash at the time it was introduced, see
> http://marc.info/?l=linux-kernel&m=129901609503574 and
> http://marc.info/?l=linux-kernel&m=130133909408229. Unless something big
> changed in kernel_physical_mapping_init, I think we still need it.
> Depending on the e820 of your test box, the kernel could crash (or not),
> possibly in different places.
>
>>>> Having said that, I couldn't immediately see where pages in (end,
>>>> pgt_buf_top] were getting set RO. Can you point me to where it's
>>>> done?
>>>
>>> As mentioned in the comment, please look at xen_set_pte_init().
>>
>> xen_set_pte_init() only ensures it doesn't set the PTE as writable if it
>> is already present and read-only.
>
> look at mask_rw_pte and read the threads linked above, unfortunately it
> is not that simple.

Yes, I was remembering what 32-bit did here.

The 64-bit version is a bit confused and it often ends up /not/ clearing
RW for the direct mapping of the pages in the pgt_buf because any
existing RW mappings will be used as-is. See phys_pte_init() which
checks for an existing mapping and only sets the PTE if it is not
already set.

pgd_populate(), pud_populate(), and pmd_populate() take care of clearing
RW for the newly allocated page table pages, so mask_rw_pte() only needs
to consider clearing RW for mappings of the existing page table PFNs,
which all lie outside the range (pgt_buf_start, pgt_buf_top].

This patch makes it more sensible, and makes removal of the pv-op safe
if pgt_buf lies outside the initial mapping.

index 04c1f61..2fd5e86 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1400,14 +1400,13 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
 	unsigned long pfn = pte_pfn(pte);

 	/*
-	 * If the new pfn is within the range of the newly allocated
-	 * kernel pagetable, and it isn't being mapped into an
-	 * early_ioremap fixmap slot as a freshly allocated page, make sure
-	 * it is RO.
+	 * If this is a PTE of an early_ioremap fixmap slot but
+	 * outside the range (pgt_buf_start, pgt_buf_top], then this
+	 * PTE is mapping a PFN in the current page table. Make
+	 * sure it is RO.
 	 */
-	if (((!is_early_ioremap_ptep(ptep) &&
-	      pfn >= pgt_buf_start && pfn < pgt_buf_top)) ||
-	    (is_early_ioremap_ptep(ptep) && pfn != (pgt_buf_end - 1)))
+	if (is_early_ioremap_ptep(ptep)
+	    && (pfn < pgt_buf_start || pfn >= pgt_buf_top))
 		pte = pte_wrprotect(pte);

 	return pte;

>> 8<----------------------
>> x86: remove x86_init.mapping.pagetable_reserve paravirt op
>>
>> The x86_init.mapping.pagetable_reserve paravirt op is used for Xen
>> guests to set the writable flag for the mapping of (pgt_buf_end,
>> pgt_buf_top]. This is not necessary as these pages are never set as
>> read-only as they have never contained page tables.
>
> Is this actually true? It is possible when pagetable pages are
> allocated by alloc_low_page.

These newly allocated page table pages will be set read-only when they
are linked into the page tables with pgd_populate(), pud_populate() and
friends.

>> When running as a Xen guest, the initial page tables are provided by
>> Xen (these are reserved with memblock_reserve() in
>> xen_setup_kernel_pagetable()) and constructed in brk space (for 32-bit
>> guests) or in the kernel's .data section (for 64-bit guests, see
>> head_64.S).
>>
>> Since these are all marked as reserved, (pgt_buf_start, pgt_buf_top]
>> does not overlap with them and the mappings for these PFNs will be
>> read-write.
>
> We are talking about pagetable pages built by
> kernel_physical_mapping_init.
>
>> Since Xen doesn't need to change the mapping its implementation
>> becomes the same as a native and we can simply remove this pv-op
>> completely.
>
> I don't think so.

David
Ian Campbell
2012-Aug-16 08:12 UTC
Re: [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
On Wed, 2012-08-15 at 19:47 +0100, Attilio Rao wrote:
> On 15/08/12 18:46, Stefano Stabellini wrote:
> > On Wed, 15 Aug 2012, Attilio Rao wrote:
> >
> >> On 15/08/12 18:25, Stefano Stabellini wrote:
> >>
> >>> On Tue, 14 Aug 2012, Attilio Rao wrote:
> >>>
[...]
> >>> code style (you can check whether your patch breaks the code style with
> >>> scripts/checkpatch.pl)
> >>>
> >> I actually did before to submit, it reported 0 errors/warning.
> >>
> > strange, that really looks like a line over 80 chars
> >
> Actually the coding style explicitly says not to break strings, because
> they want to retain the ability to grep. In FreeBSD this is the same,
> and I think this is why checkpatch doesn't whine. I don't think there is
> a bug here.

Right, CodingStyle changed a little while ago from a strict 80 column
limit to just strongly preferred 80 columns, with an explicit exception
for user visible strings.

> Can I submit the patch as it is, then?
>
> Attilio
Stefano Stabellini
2012-Aug-16 09:53 UTC
Re: [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
On Wed, 15 Aug 2012, Attilio Rao wrote:
> On 15/08/12 18:46, Stefano Stabellini wrote:
> > On Wed, 15 Aug 2012, Attilio Rao wrote:
> >
> >> On 15/08/12 18:25, Stefano Stabellini wrote:
> >>
> >>> On Tue, 14 Aug 2012, Attilio Rao wrote:
> >>>
[...]
> >>> code style (you can check whether your patch breaks the code style with
> >>> scripts/checkpatch.pl)
> >>>
> >> I actually did before to submit, it reported 0 errors/warning.
> >>
> > strange, that really looks like a line over 80 chars
> >
> Actually the coding style explicitly says not to break strings, because
> they want to retain the ability to grep. In FreeBSD this is the same,
> and I think this is why checkpatch doesn't whine. I don't think there is
> a bug here.
>
> Can I submit the patch as it is, then?

it is ok for me
Stefano Stabellini
2012-Aug-16 11:07 UTC
Re: [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
On Wed, 15 Aug 2012, David Vrabel wrote:
> On 15/08/12 14:55, Stefano Stabellini wrote:
> > On Wed, 15 Aug 2012, David Vrabel wrote:
> >> On 14/08/12 15:12, Attilio Rao wrote:
> >>> On 14/08/12 14:57, David Vrabel wrote:
> >>>> On 14/08/12 13:24, Attilio Rao wrote:
> >> After looking at it some more, I think this pv-ops is unnecessary. How
> >> about the following patch to just remove it completely?
> >>
> >> I've only smoke-tested 32-bit and 64-bit dom0 but I think the reasoning
> >> is sound.
> >
> > Do you have more than 4G to dom0 on those boxes?
>
> I've tested with 6G now, both 64-bit and 32-bit with HIGHPTE.
>
[...]
> Yes, I was remembering what 32-bit did here.
>
> The 64-bit version is a bit confused and it often ends up /not/ clearing
> RW for the direct mapping of the pages in the pgt_buf because any
> existing RW mappings will be used as-is. See phys_pte_init() which
> checks for an existing mapping and only sets the PTE if it is not
> already set.

not all the pagetable pages might be already mapped, even if they are
already hooked into the pagetable

> pgd_populate(), pud_populate(), and pmd_populate() take care of clearing
> RW for the newly allocated page table pages, so mask_rw_pte() only needs
> to consider clearing RW for mappings of the existing page table PFNs,
> which all lie outside the range (pgt_buf_start, pgt_buf_top].
>
> This patch makes it more sensible, and makes removal of the pv-op safe
> if pgt_buf lies outside the initial mapping.
>
> index 04c1f61..2fd5e86 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1400,14 +1400,13 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
>  	unsigned long pfn = pte_pfn(pte);
>
>  	/*
> -	 * If the new pfn is within the range of the newly allocated
> -	 * kernel pagetable, and it isn't being mapped into an
> -	 * early_ioremap fixmap slot as a freshly allocated page, make sure
> -	 * it is RO.
> +	 * If this is a PTE of an early_ioremap fixmap slot but
> +	 * outside the range (pgt_buf_start, pgt_buf_top], then this
> +	 * PTE is mapping a PFN in the current page table. Make
> +	 * sure it is RO.
>  	 */
> -	if (((!is_early_ioremap_ptep(ptep) &&
> -	      pfn >= pgt_buf_start && pfn < pgt_buf_top)) ||
> -	    (is_early_ioremap_ptep(ptep) && pfn != (pgt_buf_end - 1)))
> +	if (is_early_ioremap_ptep(ptep)
> +	    && (pfn < pgt_buf_start || pfn >= pgt_buf_top))
>  		pte = pte_wrprotect(pte);
>
>  	return pte;

That's a mistake, you just inverted the condition! What if map_low_page
is used to map a page already hooked into the pagetable? It should be RO,
while with your change it would be RW.
Also you don't handle the case when map_low_page is used to map the very
latest page, allocated and mapped by alloc_low_page, but still not hooked
into the pagetable: that page needs to be RW.
I believe you need more RAM than 6G to see all these issues.

> >> 8<----------------------
> >> x86: remove x86_init.mapping.pagetable_reserve paravirt op
> >>
> >> The x86_init.mapping.pagetable_reserve paravirt op is used for Xen
> >> guests to set the writable flag for the mapping of (pgt_buf_end,
> >> pgt_buf_top]. This is not necessary as these pages are never set as
> >> read-only as they have never contained page tables.
> >
> > Is this actually true? It is possible when pagetable pages are
> > allocated by alloc_low_page.
>
> These newly allocated page table pages will be set read-only when they
> are linked into the page tables with pgd_populate(), pud_populate() and
> friends.
>
> >> When running as a Xen guest, the initial page tables are provided by
> >> Xen (these are reserved with memblock_reserve() in
> >> xen_setup_kernel_pagetable()) and constructed in brk space (for 32-bit
> >> guests) or in the kernel's .data section (for 64-bit guests, see
> >> head_64.S).
> >>
> >> Since these are all marked as reserved, (pgt_buf_start, pgt_buf_top]
> >> does not overlap with them and the mappings for these PFNs will be
> >> read-write.
> >
> > We are talking about pagetable pages built by
> > kernel_physical_mapping_init.
> >
> >> Since Xen doesn't need to change the mapping its implementation
> >> becomes the same as a native and we can simply remove this pv-op
> >> completely.
> >
> > I don't think so.
>
> David
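[Editor's note] Stefano's objection, that the proposed predicate inverts the old one for pgt_buf PFNs remapped through early_ioremap, can be checked by modeling both conditions as pure boolean functions. The PFN values below are illustrative and the early_ioremap test is reduced to a boolean flag; this is a truth-table sketch, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative PFN window of the early pagetable allocator. */
static const uint64_t pgt_buf_start = 0x100;
static const uint64_t pgt_buf_end   = 0x110;  /* first free PFN */
static const uint64_t pgt_buf_top   = 0x140;

/*
 * Current mask_rw_pte() predicate: wrprotect pgt_buf PFNs mapped
 * outside early_ioremap, and early_ioremap PFNs other than the page
 * just handed out by alloc_low_page().
 */
static bool old_wrprotect(bool early_ioremap, uint64_t pfn)
{
    return (!early_ioremap && pfn >= pgt_buf_start && pfn < pgt_buf_top) ||
           (early_ioremap && pfn != pgt_buf_end - 1);
}

/*
 * David's proposed predicate: wrprotect early_ioremap PFNs that lie
 * outside the pgt_buf window.
 */
static bool new_wrprotect(bool early_ioremap, uint64_t pfn)
{
    return early_ioremap && (pfn < pgt_buf_start || pfn >= pgt_buf_top);
}
```

For a pgt_buf PFN already hooked into the pagetable and remapped via early_ioremap (the map_low_page case), the old predicate forces RO while the proposed one leaves it RW, which is exactly the inversion being pointed out; the freshly allocated page at pgt_buf_end - 1 is the one the old code deliberately leaves RW.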
David Vrabel
2012-Aug-16 14:33 UTC
Re: [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
On 16/08/12 12:07, Stefano Stabellini wrote:
> On Wed, 15 Aug 2012, David Vrabel wrote:
>> On 15/08/12 14:55, Stefano Stabellini wrote:
>>> On Wed, 15 Aug 2012, David Vrabel wrote:
>>>> On 14/08/12 15:12, Attilio Rao wrote:
>>>>> On 14/08/12 14:57, David Vrabel wrote:
>>>>>> On 14/08/12 13:24, Attilio Rao wrote:
>>>> After looking at it some more, I think this pv-ops is unnecessary. How
>>>> about the following patch to just remove it completely?
>>>>
>>>> I've only smoke-tested 32-bit and 64-bit dom0 but I think the reasoning
>>>> is sound.
>>>
>>> Do you have more than 4G to dom0 on those boxes?
>>
>> I've tested with 6G now, both 64-bit and 32-bit with HIGHPTE.
>>
>>> It certainly fixed a serious crash at the time it was introduced, see
>>> http://marc.info/?l=linux-kernel&m=129901609503574 and
>>> http://marc.info/?l=linux-kernel&m=130133909408229. Unless something big
>>> changed in kernel_physical_mapping_init, I think we still need it.
>>> Depending on the e820 of your test box, the kernel could crash (or not),
>>> possibly in different places.

FYI, it looks like pgt_buf is now located low down, which is why these
changes worked for me. Possibly this changed as part of a memblock
refactor.

>>>>>> Having said that, I couldn't immediately see where pages in (end,
>>>>>> pgt_buf_top] were getting set RO. Can you point me to where it's
>>>>>> done?
>>>>>
>>>>> As mentioned in the comment, please look at xen_set_pte_init().
>>>>
>>>> xen_set_pte_init() only ensures it doesn't set the PTE as writable if it
>>>> is already present and read-only.
>>>
>>> look at mask_rw_pte and read the threads linked above, unfortunately it
>>> is not that simple.
>>
>> Yes, I was remembering what 32-bit did here.
>>
>> The 64-bit version is a bit confused and it often ends up /not/ clearing
>> RW for the direct mapping of the pages in the pgt_buf because any
>> existing RW mappings will be used as-is. See phys_pte_init() which
>> checks for an existing mapping and only sets the PTE if it is not
>> already set.
>
> not all the pagetable pages might be already mapped, even if they are
> already hooked into the pagetable

Yes, I think this is easy to handle though.

	/*
	 * Make sure that active page table pages are not mapped RW.
	 */
	if (is_early_ioremap_ptep(ptep)) {
		/*
		 * If we're updating an early ioremap PTE, then this PFN
		 * may already be in the linear mapping. If it is, use
		 * the existing RW bit.
		 */
		unsigned int level;
		pte_t *linear_pte;

		linear_pte = lookup_address(__va(PFN_PHYS(pfn)), &level);
		if (linear_pte && !pte_write(*linear_pte))
			pte = pte_wrprotect(pte);
	} else if (pfn >= pgt_buf_start && pfn < pgt_buf_end) {
		/*
		 * The PFN may not be mapped but may be hooked into the
		 * page tables. Make sure this new mapping is read-only.
		 */
		pte = pte_wrprotect(pte);
	}

However, the real subtlety is page tables that are mapped as they
themselves are hooked in.

As an example, let's say pgt_buf_start (s) is on a 1 GiB boundary and
pgt_buf_top (t) is below the next 2 MiB boundary. We're mapping memory
with 4 KiB pages from s to s + 4 MiB. This requires a new PUD and two
new PMDs. To map this region:

1. Allocate a new PUD. (@ e = pgt_buf_end)

2. Allocate a new PMD. (@ e + 1)

3. Fill in this PMD's PTEs. This covers pgt_buf so sets (s, e + 1) as RO.

4. Call pmd_populate() to hook in this PMD. This does not set e + 1 as
   RO (but this is ok as it already is).

5. Allocate a new PMD. (@ e + 2)

6. Fill in this PMD's PTEs.

7. Call pmd_populate() to hook in this PMD. This does not set e + 2 as
   RO as the region isn't mapped yet.

8. Call pud_populate() to hook in this PUD. This sets e as RO but e + 2
   is still RW.

9. Boom!

It may be possible to fixup the permissions as the pages are hooked in,
i.e., if this page's entries cover (pgt_buf_start, pgt_buf_end], walk the
entries and any child tables and fixup the permissions of the leaf
entries. This would walk the tables a few times unless we were careful
to only walk them when hooking a page into an active table.

It was fun trying to understand this, but I think I'll give up now...

David
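[Editor's note] The ordering hazard in the nine steps above can be replayed as a toy simulation: a pagetable page hooked in before its own direct mapping exists never gets write-protected, while pages whose mappings already exist do. This is an abstract model under the simplifying assumption that hooking a page (pmd_populate()/pud_populate()) can only wrprotect an already-present mapping; the PFN values and helpers are illustrative, not the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { NPAGES = 16 };

static bool mapped[NPAGES];  /* direct mapping exists for this PFN */
static bool ro[NPAGES];      /* ...and is read-only */

/*
 * Hooking a pagetable page can only wrprotect a direct mapping that
 * already exists; an unmapped page is left alone.
 */
static void hook(int pfn)
{
    if (mapped[pfn])
        ro[pfn] = true;
}

/*
 * Replay the nine steps with the PUD at PFN e and PMDs at e+1, e+2.
 * Returns how many hooked pagetable pages end up still writable.
 */
static int replay(void)
{
    const int e = 4, pud = e, pmd1 = e + 1, pmd2 = e + 2;
    const int hooked[] = { pud, pmd1, pmd2 };
    int leaked = 0;

    memset(mapped, 0, sizeof(mapped));
    memset(ro, 0, sizeof(ro));

    /* Steps 1-3: filling the first PMD's PTEs maps the low part of the
     * region, which covers the pgt_buf pages allocated so far. */
    for (int pfn = 0; pfn <= pmd1; pfn++)
        mapped[pfn] = true;
    hook(pmd1);                 /* step 4: mapping exists, goes RO */

    /* Steps 5-7: the second PMD is hooked in before the PTEs mapping
     * its own page exist, so nothing can wrprotect it. */
    hook(pmd2);

    hook(pud);                  /* step 8: mapping exists, goes RO */
    mapped[pmd2] = true;        /* its mapping appears later -- as RW */

    for (unsigned int i = 0; i < sizeof(hooked) / sizeof(hooked[0]); i++)
        if (!ro[hooked[i]])
            leaked++;           /* step 9: live pagetable page left RW */
    return leaked;
}
```

Running the replay leaves exactly one hooked pagetable page (the second PMD at e + 2) writable, which is the "Boom" of step 9.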