The frametable check in xen_in_range() incorrectly compares virtual addresses
against its physical address parameters.

Unfortunately, the frametable is only contiguous in the virtual address space,
so one can't simply take __pa() of its start and end.  And since it is quite
large, iterating through each page to get its phys addr adds a perceptible
delay when that check has to be done for each page of physical memory (as is
the case in the only caller, the VT-d routine that maps memory for dom0).  But
it also appears that we can't convert the phys addr arguments into their virt
addrs to compare with the contiguous frametable range, because they will
convert to the DIRECTMAP VAs instead.

So while I would prefer to find a way to keep the check against the
frametable, I don't see any obvious way to do so.  So this RFC is to see if
there are any ideas for how the test could be re-written.  If not, then the
patch below should be applied to remove the frametable check altogether.

Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>

diff -r 655dc3bc1d8e xen/arch/x86/setup.c
--- a/xen/arch/x86/setup.c	Thu Apr 16 11:54:06 2009 +0100
+++ b/xen/arch/x86/setup.c	Wed Apr 22 15:23:46 2009 -0700
@@ -1119,7 +1119,7 @@ int xen_in_range(paddr_t start, paddr_t
     int i;
     static struct {
         paddr_t s, e;
-    } xen_regions[5];
+    } xen_regions[4];
 
     /* initialize first time */
     if ( !xen_regions[0].s )
@@ -1140,10 +1140,6 @@ int xen_in_range(paddr_t start, paddr_t
         /* bss + boot allocator bitmap */
         xen_regions[3].s = __pa(&__bss_start);
         xen_regions[3].e = allocator_bitmap_end;
-        /* frametable */
-        xen_regions[4].s = (unsigned long)frame_table;
-        xen_regions[4].e = (unsigned long)frame_table +
-            PFN_UP(max_page * sizeof(*frame_table));
     }
 
     for ( i = 0; i < ARRAY_SIZE(xen_regions); i++ )
On 23/04/2009 00:53, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:

> Unfortunately, the frametable is only contiguous in the virtual address space,
> so one can't simply take __pa() of its start and end.  And since it is quite
> large, iterating through each page to get its phys addr adds a perceptible
> delay when that check has to be done for each page of physical memory (as is
> the case in the only caller, the VT-d routine that maps memory for dom0).  But
> it also appears that we can't convert the phys addr arguments into their virt
> addrs to compare with the contiguous frametable range because they will
> convert to the DIRECTMAP VAs instead.

The frametable is allocated in aligned 2MB chunks.  So you can check at that
granularity rather than 4kB.

 -- Keir
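For illustration, a minimal, untested sketch of what a 2MB-granularity check
could look like.  frame_table, max_page and paddr_t are the existing Xen
symbols, but frametable_chunk_maddr() is a hypothetical helper that would have
to resolve a frametable virtual address to the machine address of its backing
2MB chunk (e.g. via a page-table walk), since __pa() is only valid for the
direct map:

    /* Sketch only: overlap test between [start, end) and the frametable,
     * done per 2MB chunk because the frametable is allocated and mapped
     * in aligned 2MB pieces that are not physically contiguous. */
    static int range_hits_frametable(paddr_t start, paddr_t end)
    {
        unsigned long va = (unsigned long)frame_table;
        unsigned long va_end = va + max_page * sizeof(*frame_table);

        for ( ; va < va_end; va += (1UL << 21) /* 2MB */ )
        {
            /* Hypothetical helper: machine address backing this chunk. */
            paddr_t chunk_s = frametable_chunk_maddr(va);
            paddr_t chunk_e = chunk_s + (1UL << 21);

            if ( (start < chunk_e) && (end > chunk_s) )
                return 1;
        }

        return 0;
    }

With a 16MB frametable (2GB of RAM at 32 bytes per struct page_info) that is
eight iterations per call rather than thousands.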
>>> Keir Fraser <keir.fraser@eu.citrix.com> 23.04.09 09:25 >>>
>On 23/04/2009 00:53, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
>
>> Unfortunately, the frametable is only contiguous in the virtual address space,
>> so one can't simply take __pa() of its start and end.  And since it is quite
>> large, iterating through each page to get its phys addr adds a perceptible
>> delay when that check has to be done for each page of physical memory (as is
>> the case in the only caller, the VT-d routine that maps memory for dom0).  But
>> it also appears that we can't convert the phys addr arguments into their virt
>> addrs to compare with the contiguous frametable range because they will
>> convert to the DIRECTMAP VAs instead.
>
>The frametable is allocated in aligned 2MB chunks.  So you can check at that
>granularity rather than 4kB.

... and perhaps allocation should be attempted in 1Gb chunks when the table
size is getting close to or exceeding 1Gb (and 1Gb-pages are supported).  Or,
since the space mapped is larger than the space allocated anyway, the
condition might be just that of 1Gb-pages being supported (provided a
1Gb-aligned chunk can be allocated).

Jan
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Thursday, April 23, 2009 12:25 AM
>
> On 23/04/2009 00:53, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
>
> > Unfortunately, the frametable is only contiguous in the virtual address space,
> > so one can't simply take __pa() of its start and end.  And since it is quite
> > large, iterating through each page to get its phys addr adds a perceptible
> > delay when that check has to be done for each page of physical memory (as is
> > the case in the only caller, the VT-d routine that maps memory for dom0).  But
> > it also appears that we can't convert the phys addr arguments into their virt
> > addrs to compare with the contiguous frametable range because they will
> > convert to the DIRECTMAP VAs instead.
>
> The frametable is allocated in aligned 2MB chunks.  So you can check at that
> granularity rather than 4kB.

That made it just a single iteration on a 2GB system, but what fn should be
used to convert the va to pa?  __pa() isn't converting this correctly.

Joe
>>> "Cihula, Joseph" <joseph.cihula@intel.com> 24.04.09 03:26 >>> >> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] >> Sent: Thursday, April 23, 2009 12:25 AM >> >> On 23/04/2009 00:53, "Cihula, Joseph" <joseph.cihula@intel.com> wrote: >> >> > Unfortunately, the frametable is only contiguous in the virtual address space, >> > so one can''t simply take __pa() of its start and end. And since it is quite >> > large, iterating through each page to gets its phys addr adds a perceptible >> > delay when that check has to be done for each page of physical memory (as is >> > the case in the only caller, the VT-d routine that maps memory for dom0). But >> > it also appears that we can''t convert the phys addr arguments into their virt >> > addrs to compare with the contiguous frametable range because they will >> > convert to the DIRECTMAP va''s instead. >> >> The frametable is allocated in aligned 2MB chunks. So you can check at that >> granularity rather than 4kB. > >That made it just a single iteration on a 2GB system, but what fn should be used >to convert the va to pa? __pa() isn''t converting this correctly.I''m afraid you''ll have to either create one, or you''ll have to consult the page tables. Also, how can this be a single iteration on a 2Gb system? struct page_info is 32 bytes, so I''d expect the frame table to be 16Mb in size (i.e. eight iterations). Also, after suggesting to use gb-pages when possible here I realized that it''s probably a latent bug to map more space than was allocated - if the non-allocated-but-mapped pages happen to later get allocated to a domain, that domain may change the cacheability attributes of any of these pages, resulting in aliasing issues. I''ll put together a patch for this, but it''ll be a couple of days until I''ll be able to do so. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 24/04/2009 08:04, "Jan Beulich" <jbeulich@novell.com> wrote:

> Also, after suggesting to use gb-pages when possible here I realized that
> it's probably a latent bug to map more space than was allocated - if the
> non-allocated-but-mapped pages happen to later get allocated to a domain,
> that domain may change the cacheability attributes of any of these pages,
> resulting in aliasing issues.  I'll put together a patch for this, but it'll
> be a couple of days until I'll be able to do so.

I think we should shatter the superpage on demand.  This would also be
required for superpage mappings of Xen itself: when we free initmem, that
memory can now be allocated to a domain (now xenheap and domheap are merged
on x86/64).

An alternative might be to mark such partially-freed superpages as
Xenheap-only, and allocate them preferentially for Xenheap callers (i.e.,
alloc those pages first, then from the general heap).

 -- Keir
On Fri, 2009-04-24 at 15:16 +0800, Keir Fraser wrote:
> On 24/04/2009 08:04, "Jan Beulich" <jbeulich@novell.com> wrote:
>
> > Also, after suggesting to use gb-pages when possible here I realized that
> > it's probably a latent bug to map more space than was allocated - if the
> > non-allocated-but-mapped pages happen to later get allocated to a domain,
> > that domain may change the cacheability attributes of any of these pages,
> > resulting in aliasing issues.  I'll put together a patch for this, but it'll
> > be a couple of days until I'll be able to do so.
>
> I think we should shatter the superpage on demand.  This would also be
> required for superpage mappings of Xen itself: when we free initmem, that
> memory can now be allocated to a domain (now xenheap and domheap are merged
> on x86/64).

Actually, we just found an example that problems can arise here.  It's just
not the superpage shattering but the unused __per_cpu area: dom0 actually
gets these pages and tries to do DMA on them, which causes a VT-d page fault
and a dom0 boot failure.  I have a patch for it that adds this region to the
IOMMU page tables.

Thanks,
Qing

> An alternative might be to mark such partially-freed superpages as
> Xenheap-only, and allocate them preferentially for Xenheap callers (i.e.,
> alloc those pages first, then from the general heap).
>
> -- Keir
[Xen-devel] [PATCH] iommu: fix unused percpu in xen_in_range()
From: Qing He
Date: 2009-Apr-24 23:20 UTC
On Fri, 2009-04-24 at 15:16 +0800, Keir Fraser wrote:
> On 24/04/2009 08:04, "Jan Beulich" <jbeulich@novell.com> wrote:
>
> > Also, after suggesting to use gb-pages when possible here I realized that
> > it's probably a latent bug to map more space than was allocated - if the
> > non-allocated-but-mapped pages happen to later get allocated to a domain,
> > that domain may change the cacheability attributes of any of these pages,
> > resulting in aliasing issues.  I'll put together a patch for this, but it'll
> > be a couple of days until I'll be able to do so.
>
> I think we should shatter the superpage on demand.  This would also be
> required for superpage mappings of Xen itself: when we free initmem, that
> memory can now be allocated to a domain (now xenheap and domheap are merged
> on x86/64).
>
> An alternative might be to mark such partially-freed superpages as
> Xenheap-only, and allocate them preferentially for Xenheap callers (i.e.,
> alloc those pages first, then from the general heap).

Here is the patch I mentioned above; it fixes dom0 booting on my box:

---
The unused percpu area is reclaimed as xenheap, but since xenheap and domheap
are shared on x86_64, it's possible dom0 can get these pages and perform DMA
on them.  This patch removes this area from xen_in_range(), so the iommu 1:1
mapping for this area can be added.

Signed-off-by: Qing He <qing.he@intel.com>
---

diff -r 8b152638adaa xen/arch/x86/setup.c
--- a/xen/arch/x86/setup.c	Thu Apr 23 16:22:48 2009 +0100
+++ b/xen/arch/x86/setup.c	Fri Apr 24 15:24:18 2009 +0800
@@ -98,6 +98,7 @@ cpumask_t cpu_present_map;
 
 unsigned long xen_phys_start;
 unsigned long allocator_bitmap_end;
+unsigned long per_cpu_used_end;
 
 #ifdef CONFIG_X86_32
 /* Limits of Xen heap, used to initialise the allocator. */
@@ -223,6 +224,8 @@ static void __init percpu_init_areas(voi
                          (first_unused << PERCPU_SHIFT),
                          (NR_CPUS - first_unused) << PERCPU_SHIFT);
 #endif
+
+    per_cpu_used_end = __pa(__per_cpu_start) + (first_unused << PERCPU_SHIFT);
 }
 
 static void __init init_idle_domain(void)
@@ -1124,9 +1127,9 @@ int xen_in_range(paddr_t start, paddr_t
     /* initialize first time */
     if ( !xen_regions[0].s )
     {
-        extern char __init_begin[], __per_cpu_start[], __per_cpu_end[],
-            __bss_start[];
+        extern char __init_begin[], __per_cpu_start[], __bss_start[];
         extern unsigned long allocator_bitmap_end;
+        extern unsigned long per_cpu_used_end;
 
         /* S3 resume code (and other real mode trampoline code) */
         xen_regions[0].s = bootsym_phys(trampoline_start);
@@ -1136,7 +1139,7 @@ int xen_in_range(paddr_t start, paddr_t
         xen_regions[1].e = __pa(&__init_begin);
         /* per-cpu data */
         xen_regions[2].s = __pa(&__per_cpu_start);
-        xen_regions[2].e = __pa(&__per_cpu_end);
+        xen_regions[2].e = per_cpu_used_end;
         /* bss + boot allocator bitmap */
         xen_regions[3].s = __pa(&__bss_start);
         xen_regions[3].e = allocator_bitmap_end;
diff -r 8b152638adaa xen/arch/x86/tboot.c
--- a/xen/arch/x86/tboot.c	Thu Apr 23 16:22:48 2009 +0100
+++ b/xen/arch/x86/tboot.c	Fri Apr 24 15:24:18 2009 +0800
@@ -48,6 +48,7 @@ static uint64_t sinit_base, sinit_size;
 extern char __init_begin[], __per_cpu_start[], __per_cpu_end[], __bss_start[];
 extern unsigned long allocator_bitmap_end;
+extern unsigned long per_cpu_used_end;
 
 #define SHA1_SIZE 20
 typedef uint8_t sha1_hash_t[SHA1_SIZE];
@@ -310,7 +311,7 @@ void tboot_shutdown(uint32_t shutdown_ty
                                               __pa(&_stext);
     /* per-cpu data */
     g_tboot_shared->mac_regions[2].start = (uint64_t)__pa(&__per_cpu_start);
-    g_tboot_shared->mac_regions[2].size = __pa(&__per_cpu_end) -
+    g_tboot_shared->mac_regions[2].size = per_cpu_used_end -
                                           __pa(&__per_cpu_start);
     /* bss */
     g_tboot_shared->mac_regions[3].start = (uint64_t)__pa(&__bss_start);
> From: Jan Beulich [mailto:jbeulich@novell.com]
> Sent: Friday, April 24, 2009 12:05 AM
>
> >>> "Cihula, Joseph" <joseph.cihula@intel.com> 24.04.09 03:26 >>>
> >> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> >> Sent: Thursday, April 23, 2009 12:25 AM
> >>
> >> On 23/04/2009 00:53, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
> >>
> >> > Unfortunately, the frametable is only contiguous in the virtual address space,
> >> > so one can't simply take __pa() of its start and end.  And since it is quite
> >> > large, iterating through each page to get its phys addr adds a perceptible
> >> > delay when that check has to be done for each page of physical memory (as is
> >> > the case in the only caller, the VT-d routine that maps memory for dom0).  But
> >> > it also appears that we can't convert the phys addr arguments into their virt
> >> > addrs to compare with the contiguous frametable range because they will
> >> > convert to the DIRECTMAP VAs instead.
> >>
> >> The frametable is allocated in aligned 2MB chunks.  So you can check at that
> >> granularity rather than 4kB.
> >
> >That made it just a single iteration on a 2GB system, but what fn should be
> >used to convert the va to pa?  __pa() isn't converting this correctly.
>
> I'm afraid you'll have to either create one, or you'll have to consult the
> page tables.  Also, how can this be a single iteration on a 2Gb system?
> struct page_info is 32 bytes, so I'd expect the frame table to be 16Mb in
> size (i.e. eight iterations).

It's actually 8 (I cut off my trace early before).

I don't think that it is worth the effort of walking the page tables just to
exclude the frametable from the VT-d tables, since there are other memory
regions (xenheap, etc.) that we're not excluding.  But if there is a bin list
of things to do when someone gets time, I'd suggest adding this to it.

Joe

> Also, after suggesting to use gb-pages when possible here I realized that
> it's probably a latent bug to map more space than was allocated - if the
> non-allocated-but-mapped pages happen to later get allocated to a domain,
> that domain may change the cacheability attributes of any of these pages,
> resulting in aliasing issues.  I'll put together a patch for this, but it'll
> be a couple of days until I'll be able to do so.
>
> Jan