The frametable check in xen_in_range() incorrectly compares virtual addresses
against its physical address parameters.

Unfortunately, the frametable is only contiguous in the virtual address space,
so one can't simply take __pa() of its start and end.  And since it is quite
large, iterating through each page to get its phys addr adds a perceptible
delay when that check has to be done for each page of physical memory (as is
the case in the only caller, the VT-d routine that maps memory for dom0).  But
it also appears that we can't convert the phys addr arguments into their virt
addrs to compare with the contiguous frametable range, because they will
convert to the DIRECTMAP VAs instead.

So while I would prefer to find a way to keep the check against the
frametable, I don't see any obvious way to do so.  So this RFC is to see if
there are any ideas for how the test could be re-written.  If not, then the
patch below should be applied to remove the frametable check altogether.

Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>

diff -r 655dc3bc1d8e xen/arch/x86/setup.c
--- a/xen/arch/x86/setup.c	Thu Apr 16 11:54:06 2009 +0100
+++ b/xen/arch/x86/setup.c	Wed Apr 22 15:23:46 2009 -0700
@@ -1119,7 +1119,7 @@ int xen_in_range(paddr_t start, paddr_t
     int i;
     static struct {
         paddr_t s, e;
-    } xen_regions[5];
+    } xen_regions[4];
 
     /* initialize first time */
     if ( !xen_regions[0].s )
@@ -1140,10 +1140,6 @@ int xen_in_range(paddr_t start, paddr_t
         /* bss + boot allocator bitmap */
         xen_regions[3].s = __pa(&__bss_start);
         xen_regions[3].e = allocator_bitmap_end;
-        /* frametable */
-        xen_regions[4].s = (unsigned long)frame_table;
-        xen_regions[4].e = (unsigned long)frame_table +
-            PFN_UP(max_page * sizeof(*frame_table));
     }
 
     for ( i = 0; i < ARRAY_SIZE(xen_regions); i++ )
On 23/04/2009 00:53, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:

> Unfortunately, the frametable is only contiguous in the virtual address space,
> so one can't simply take __pa() of its start and end.  And since it is quite
> large, iterating through each page to get its phys addr adds a perceptible
> delay when that check has to be done for each page of physical memory (as is
> the case in the only caller, the VT-d routine that maps memory for dom0).  But
> it also appears that we can't convert the phys addr arguments into their virt
> addrs to compare with the contiguous frametable range because they will
> convert to the DIRECTMAP VAs instead.

The frametable is allocated in aligned 2MB chunks.  So you can check at that
granularity rather than 4kB.

 -- Keir
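For illustration, a minimal, untested sketch of what a 2MB-granularity check
could look like.  frame_table, max_page and paddr_t are the existing Xen
symbols, but frametable_chunk_maddr() is a hypothetical helper that would have
to resolve a frametable virtual address to the machine address of its backing
2MB chunk (e.g. via a page-table walk), since __pa() is only valid for the
direct map:

    /* Sketch only: overlap test between [start, end) and the frametable,
     * done per 2MB chunk because the frametable is allocated and mapped
     * in aligned 2MB pieces that are not physically contiguous. */
    static int range_hits_frametable(paddr_t start, paddr_t end)
    {
        unsigned long va = (unsigned long)frame_table;
        unsigned long va_end = va + max_page * sizeof(*frame_table);

        for ( ; va < va_end; va += (1UL << 21) /* 2MB */ )
        {
            /* Hypothetical helper: machine address backing this chunk. */
            paddr_t chunk_s = frametable_chunk_maddr(va);
            paddr_t chunk_e = chunk_s + (1UL << 21);

            if ( (start < chunk_e) && (end > chunk_s) )
                return 1;
        }

        return 0;
    }

With a 16MB frametable (2GB of RAM at 32 bytes per struct page_info) that is
eight iterations per call rather than thousands.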
>>> Keir Fraser <keir.fraser@eu.citrix.com> 23.04.09 09:25 >>>
>On 23/04/2009 00:53, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
>
>> Unfortunately, the frametable is only contiguous in the virtual address space,
>> so one can't simply take __pa() of its start and end.  And since it is quite
>> large, iterating through each page to get its phys addr adds a perceptible
>> delay when that check has to be done for each page of physical memory (as is
>> the case in the only caller, the VT-d routine that maps memory for dom0).  But
>> it also appears that we can't convert the phys addr arguments into their virt
>> addrs to compare with the contiguous frametable range because they will
>> convert to the DIRECTMAP VAs instead.
>
>The frametable is allocated in aligned 2MB chunks.  So you can check at that
>granularity rather than 4kB.

... and perhaps allocation should be attempted in 1Gb chunks when the table
size is getting close to or exceeding 1Gb (and 1Gb-pages are supported).  Or,
since the space mapped is larger than the space allocated anyway, the
condition might be just that of 1Gb-pages being supported (provided a
1Gb-aligned chunk can be allocated).

Jan
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Thursday, April 23, 2009 12:25 AM
>
> On 23/04/2009 00:53, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
>
> > Unfortunately, the frametable is only contiguous in the virtual address space,
> > so one can't simply take __pa() of its start and end.  And since it is quite
> > large, iterating through each page to get its phys addr adds a perceptible
> > delay when that check has to be done for each page of physical memory (as is
> > the case in the only caller, the VT-d routine that maps memory for dom0).  But
> > it also appears that we can't convert the phys addr arguments into their virt
> > addrs to compare with the contiguous frametable range because they will
> > convert to the DIRECTMAP VAs instead.
>
> The frametable is allocated in aligned 2MB chunks.  So you can check at that
> granularity rather than 4kB.

That made it just a single iteration on a 2GB system, but what fn should be
used to convert the va to pa?  __pa() isn't converting this correctly.

Joe
>>> "Cihula, Joseph" <joseph.cihula@intel.com> 24.04.09 03:26 >>> >> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] >> Sent: Thursday, April 23, 2009 12:25 AM >> >> On 23/04/2009 00:53, "Cihula, Joseph" <joseph.cihula@intel.com> wrote: >> >> > Unfortunately, the frametable is only contiguous in the virtual address space, >> > so one can''t simply take __pa() of its start and end. And since it is quite >> > large, iterating through each page to gets its phys addr adds a perceptible >> > delay when that check has to be done for each page of physical memory (as is >> > the case in the only caller, the VT-d routine that maps memory for dom0). But >> > it also appears that we can''t convert the phys addr arguments into their virt >> > addrs to compare with the contiguous frametable range because they will >> > convert to the DIRECTMAP va''s instead. >> >> The frametable is allocated in aligned 2MB chunks. So you can check at that >> granularity rather than 4kB. > >That made it just a single iteration on a 2GB system, but what fn should be used >to convert the va to pa? __pa() isn''t converting this correctly.I''m afraid you''ll have to either create one, or you''ll have to consult the page tables. Also, how can this be a single iteration on a 2Gb system? struct page_info is 32 bytes, so I''d expect the frame table to be 16Mb in size (i.e. eight iterations). Also, after suggesting to use gb-pages when possible here I realized that it''s probably a latent bug to map more space than was allocated - if the non-allocated-but-mapped pages happen to later get allocated to a domain, that domain may change the cacheability attributes of any of these pages, resulting in aliasing issues. I''ll put together a patch for this, but it''ll be a couple of days until I''ll be able to do so. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 24/04/2009 08:04, "Jan Beulich" <jbeulich@novell.com> wrote:

> Also, after suggesting to use gb-pages when possible here I realized that
> it's probably a latent bug to map more space than was allocated - if the
> non-allocated-but-mapped pages happen to later get allocated to a domain,
> that domain may change the cacheability attributes of any of these pages,
> resulting in aliasing issues.  I'll put together a patch for this, but it'll
> be a couple of days until I'll be able to do so.

I think we should shatter the superpage on demand.  This would also be
required for superpage mappings of Xen itself: when we free initmem, that
memory can now be allocated to a domain (now xenheap and domheap are merged
on x86/64).

An alternative might be to mark such partially-freed superpages as
Xenheap-only, and allocate them preferentially for Xenheap callers (i.e.,
alloc those pages first, then from the general heap).

 -- Keir
On Fri, 2009-04-24 at 15:16 +0800, Keir Fraser wrote:
> On 24/04/2009 08:04, "Jan Beulich" <jbeulich@novell.com> wrote:
>
> > Also, after suggesting to use gb-pages when possible here I realized that
> > it's probably a latent bug to map more space than was allocated - if the
> > non-allocated-but-mapped pages happen to later get allocated to a domain,
> > that domain may change the cacheability attributes of any of these pages,
> > resulting in aliasing issues.  I'll put together a patch for this, but it'll
> > be a couple of days until I'll be able to do so.
>
> I think we should shatter the superpage on demand.  This would also be
> required for superpage mappings of Xen itself: when we free initmem, that
> memory can now be allocated to a domain (now xenheap and domheap are merged
> on x86/64).

Actually, we just found an example that problems can arise here.  It's just
not the superpage shattering but the unused __per_cpu area: dom0 actually
gets these pages and tries to do DMA on them, which causes a VT-d page fault
and a dom0 boot failure.  I have a patch for it that adds this region to the
IOMMU page tables.

Thanks,
Qing

> An alternative might be to mark such partially-freed superpages as
> Xenheap-only, and allocate them preferentially for Xenheap callers (i.e.,
> alloc those pages first, then from the general heap).
>
> -- Keir
[Xen-devel] [PATCH] iommu: fix unused percpu in xen_in_range()
From: Qing He
Date: 2009-Apr-24 23:20 UTC
On Fri, 2009-04-24 at 15:16 +0800, Keir Fraser wrote:
> On 24/04/2009 08:04, "Jan Beulich" <jbeulich@novell.com> wrote:
>
> > Also, after suggesting to use gb-pages when possible here I realized that
> > it's probably a latent bug to map more space than was allocated - if the
> > non-allocated-but-mapped pages happen to later get allocated to a domain,
> > that domain may change the cacheability attributes of any of these pages,
> > resulting in aliasing issues.  I'll put together a patch for this, but it'll
> > be a couple of days until I'll be able to do so.
>
> I think we should shatter the superpage on demand.  This would also be
> required for superpage mappings of Xen itself: when we free initmem, that
> memory can now be allocated to a domain (now xenheap and domheap are merged
> on x86/64).
>
> An alternative might be to mark such partially-freed superpages as
> Xenheap-only, and allocate them preferentially for Xenheap callers (i.e.,
> alloc those pages first, then from the general heap).

Here is the patch I mentioned above; it fixes dom0 booting on my box:

---
The unused percpu area is reclaimed as xenheap, but since xenheap and domheap
are shared on x86_64, it's possible dom0 can get these pages and perform DMA
on them.  This patch removes this area from xen_in_range(), so the iommu 1:1
mapping for this area can be added.

Signed-off-by: Qing He <qing.he@intel.com>
---

diff -r 8b152638adaa xen/arch/x86/setup.c
--- a/xen/arch/x86/setup.c	Thu Apr 23 16:22:48 2009 +0100
+++ b/xen/arch/x86/setup.c	Fri Apr 24 15:24:18 2009 +0800
@@ -98,6 +98,7 @@ cpumask_t cpu_present_map;
 
 unsigned long xen_phys_start;
 unsigned long allocator_bitmap_end;
+unsigned long per_cpu_used_end;
 
 #ifdef CONFIG_X86_32
 /* Limits of Xen heap, used to initialise the allocator. */
@@ -223,6 +224,8 @@ static void __init percpu_init_areas(voi
                          (first_unused << PERCPU_SHIFT),
                          (NR_CPUS - first_unused) << PERCPU_SHIFT);
 #endif
+
+    per_cpu_used_end = __pa(__per_cpu_start) + (first_unused << PERCPU_SHIFT);
 }
 
 static void __init init_idle_domain(void)
@@ -1124,9 +1127,9 @@ int xen_in_range(paddr_t start, paddr_t
     /* initialize first time */
     if ( !xen_regions[0].s )
     {
-        extern char __init_begin[], __per_cpu_start[], __per_cpu_end[],
-            __bss_start[];
+        extern char __init_begin[], __per_cpu_start[], __bss_start[];
         extern unsigned long allocator_bitmap_end;
+        extern unsigned long per_cpu_used_end;
 
         /* S3 resume code (and other real mode trampoline code) */
         xen_regions[0].s = bootsym_phys(trampoline_start);
@@ -1136,7 +1139,7 @@ int xen_in_range(paddr_t start, paddr_t
         xen_regions[1].e = __pa(&__init_begin);
         /* per-cpu data */
         xen_regions[2].s = __pa(&__per_cpu_start);
-        xen_regions[2].e = __pa(&__per_cpu_end);
+        xen_regions[2].e = per_cpu_used_end;
         /* bss + boot allocator bitmap */
         xen_regions[3].s = __pa(&__bss_start);
         xen_regions[3].e = allocator_bitmap_end;
diff -r 8b152638adaa xen/arch/x86/tboot.c
--- a/xen/arch/x86/tboot.c	Thu Apr 23 16:22:48 2009 +0100
+++ b/xen/arch/x86/tboot.c	Fri Apr 24 15:24:18 2009 +0800
@@ -48,6 +48,7 @@ static uint64_t sinit_base, sinit_size;
 extern char __init_begin[], __per_cpu_start[], __per_cpu_end[], __bss_start[];
 extern unsigned long allocator_bitmap_end;
+extern unsigned long per_cpu_used_end;
 
 #define SHA1_SIZE 20
 typedef uint8_t sha1_hash_t[SHA1_SIZE];
@@ -310,7 +311,7 @@ void tboot_shutdown(uint32_t shutdown_ty
                                               __pa(&_stext);
     /* per-cpu data */
     g_tboot_shared->mac_regions[2].start = (uint64_t)__pa(&__per_cpu_start);
-    g_tboot_shared->mac_regions[2].size = __pa(&__per_cpu_end) -
+    g_tboot_shared->mac_regions[2].size = per_cpu_used_end -
                                           __pa(&__per_cpu_start);
     /* bss */
     g_tboot_shared->mac_regions[3].start = (uint64_t)__pa(&__bss_start);
> From: Jan Beulich [mailto:jbeulich@novell.com]
> Sent: Friday, April 24, 2009 12:05 AM
>
> >>> "Cihula, Joseph" <joseph.cihula@intel.com> 24.04.09 03:26 >>>
> >> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> >> Sent: Thursday, April 23, 2009 12:25 AM
> >>
> >> On 23/04/2009 00:53, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
> >>
> >> > Unfortunately, the frametable is only contiguous in the virtual address space,
> >> > so one can't simply take __pa() of its start and end.  And since it is quite
> >> > large, iterating through each page to get its phys addr adds a perceptible
> >> > delay when that check has to be done for each page of physical memory (as is
> >> > the case in the only caller, the VT-d routine that maps memory for dom0).  But
> >> > it also appears that we can't convert the phys addr arguments into their virt
> >> > addrs to compare with the contiguous frametable range because they will
> >> > convert to the DIRECTMAP VAs instead.
> >>
> >> The frametable is allocated in aligned 2MB chunks.  So you can check at that
> >> granularity rather than 4kB.
> >
> >That made it just a single iteration on a 2GB system, but what fn should be
> >used to convert the va to pa?  __pa() isn't converting this correctly.
>
> I'm afraid you'll have to either create one, or you'll have to consult the
> page tables.  Also, how can this be a single iteration on a 2Gb system?
> struct page_info is 32 bytes, so I'd expect the frame table to be 16Mb in
> size (i.e. eight iterations).

It's actually 8 (I cut off my trace early before).

I don't think that it is worth the effort of walking the page tables just to
exclude the frametable from the VT-d tables, since there are other memory
regions (xenheap, etc.) that we're not excluding.  But if there is a bin list
of things to do when someone gets time, I'd suggest adding this to it.

Joe

> Also, after suggesting to use gb-pages when possible here I realized that
> it's probably a latent bug to map more space than was allocated - if the
> non-allocated-but-mapped pages happen to later get allocated to a domain,
> that domain may change the cacheability attributes of any of these pages,
> resulting in aliasing issues.  I'll put together a patch for this, but it'll
> be a couple of days until I'll be able to do so.
>
> Jan