thr3ads.net - Xen devel - [Xen-devel] [PATCH 5/5] just realized that it''s broken [Jan 2009]

If this information is useful, please help other people find it:
Share via:

Jan Beulich

2009-Jan-30 10:48 UTC

[Xen-devel] [PATCH 5/5] just realized that it''s broken

This patch was moving the cpumask out of struct page_info into the page
itself, but I just realized that this it not valid, since the purpose of the
mask
is to avoid a later TLB flush, hence the base assumption is that the page
might still be in the TLB of some CPU, thus making it possible for a mis-
behaved guest to still write to that page.

Sorry for the mis-numbering,
Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2009-Jan-30 11:35 UTC

head link

Re: [Xen-devel] [PATCH 5/5] just realized that it''s broken

Fortunately I think it can be got rid of and probably at little or no
performance cost. I will take a look maybe next week.

 -- Keir

On 30/01/2009 10:48, "Jan Beulich" <jbeulich@novell.com> wrote:
> This patch was moving the cpumask out of struct page_info into the page
> itself, but I just realized that this it not valid, since the purpose of
the
> mask
> is to avoid a later TLB flush, hence the base assumption is that the page
> might still be in the TLB of some CPU, thus making it possible for a mis-
> behaved guest to still write to that page.
> 
> Sorry for the mis-numbering,
> Jan
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Jan Beulich

2009-Feb-04 08:27 UTC

head link

Re: [Xen-devel] [PATCH 5/5] just realized that it''s broken

>>> "Jan Beulich" <jbeulich@novell.com> 30.01.09 11:48
>>>
>This patch was moving the cpumask out of struct page_info into the page
>itself, but I just realized that this it not valid, since the purpose of the
mask
>is to avoid a later TLB flush, hence the base assumption is that the page
>might still be in the TLB of some CPU, thus making it possible for a mis-
>behaved guest to still write to that page.
>
>Sorry for the mis-numbering,
>Jan
Actually, after some more consideration I think the patch is actually correct
(minus an issue that already has been in place for a while anyway): All
access a domain may continue have to a page it''s freeing (through stale
TLB entries) is read/execute. This is because when the type count of a
page (i.e. in particular a writeable one) a (domain-)global TLB flush is
performed anyway. Allowing the freeing domain (as well as the one
subsequently allocating) to read the cpumask does not present a problem
in my opinion.

The issue mentioned above that I think current code has even without
that patch (though I may easily have missed some aspect here) is that
there is no enforcement of an immediate TLB flush when a writeable
page''s type count drops to zero - instead the flush is deferred until
the
end of the current operation or batch. With the removal of the use of
the per-domain lock in these code paths it is no longer valid to defer
the flush this much - it must be carried out before the page in question
gets unlocked.

And I would think that invalidate_shadow_ldt() has a similar issue, plus
it seems questionable that it does a local flush when acting on the current
vCPU, but a global one for all others.

Jan

*******************************************************

Move the (potentially large) cpumask field of free pages into the page
itself, thus making sizeof(struct page_info) independent of the
configured number of CPUs, which avoids penalizing systems with few
CPUs just because the hypervisor is able to support many.

Signed-off-by: Jan Beulich <jbeulich@novell.com>

--- 2009-01-27.orig/xen/common/page_alloc.c	2009-01-29 15:27:38.000000000 +0100
+++ 2009-01-27/xen/common/page_alloc.c	2009-01-29 15:33:09.000000000 +0100
@@ -376,7 +376,7 @@ static struct page_info *alloc_heap_page
         BUG_ON(pg[i].count_info != 0);
 
         /* Add in any extra CPUs that need flushing because of this page. */
-        cpus_andnot(extra_cpus_mask, pg[i].u.free.cpumask, mask);
+        cpus_andnot(extra_cpus_mask, PFN_CPUMASK(&pg[i]), mask);
         tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp);
         cpus_or(mask, mask, extra_cpus_mask);
 
@@ -425,11 +425,11 @@ static void free_heap_pages(
         if ( (d = page_get_owner(&pg[i])) != NULL )
         {
             pg[i].tlbflush_timestamp = tlbflush_current_time();
-            pg[i].u.free.cpumask     = d->domain_dirty_cpumask;
+            PFN_CPUMASK(&pg[i])      = d->domain_dirty_cpumask;
         }
         else
         {
-            cpus_clear(pg[i].u.free.cpumask);
+            cpus_clear(PFN_CPUMASK(&pg[i]));
         }
     }
 
--- 2009-01-27.orig/xen/include/asm-ia64/mm.h	2009-01-30 08:26:33.000000000
+0100
+++ 2009-01-27/xen/include/asm-ia64/mm.h	2009-01-29 14:24:42.000000000 +0100
@@ -35,8 +35,11 @@ typedef unsigned long page_flags_t;
  * Every architecture must ensure the following:
  *  1. ''struct page_info'' contains a ''struct
list_head list''.
  *  2. Provide a PFN_ORDER() macro for accessing the order of a free page.
+ *  3. Provide a PFN_CPUMASK() macro for accessing the mask of possibly-tainted
+ *     TLBs.
  */
-#define PFN_ORDER(_pfn)	((_pfn)->u.free.order)
+#define PFN_ORDER(_pfn)   ((_pfn)->u.free.order)
+#define PFN_CPUMASK(_pfn) ((_pfn)->u.free.cpumask)
 
 #define PRtype_info "016lx"
 
--- 2009-01-27.orig/xen/include/asm-x86/mm.h	2009-01-29 14:03:37.000000000 +0100
+++ 2009-01-27/xen/include/asm-x86/mm.h	2009-01-30 08:28:27.000000000 +0100
@@ -14,8 +14,18 @@
  * Every architecture must ensure the following:
  *  1. ''struct page_info'' contains a ''struct
page_list_entry list''.
  *  2. Provide a PFN_ORDER() macro for accessing the order of a free page.
+ *  3. Provide a PFN_CPUMASK() macro for accessing the mask of possibly-tainted
+ *     TLBs.
  */
 #define PFN_ORDER(_pfn) ((_pfn)->v.free.order)
+#ifdef __i386__
+# define PFN_CPUMASK(_pfn) ((_pfn)->u.free.cpumask)
+#else
+# define PFN_CPUMASK(_pfn) (*(cpumask_t *)page_to_virt(_pfn))
+# if NR_CPUS > PAGE_SIZE * 8
+#  error Cannot handle this many CPUs.
+# endif
+#endif
 
 /*
  * This definition is solely for the use in struct page_info (and
@@ -72,8 +82,10 @@ struct page_info
 
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
+#ifdef __i386__
             /* Mask of possibly-tainted TLBs. */
             cpumask_t cpumask;
+#endif
         } free;
 
     } u;



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2009-Feb-04 08:49 UTC

head link

Re: [Xen-devel] [PATCH 5/5] just realized that it''s broken

On 04/02/2009 08:27, "Jan Beulich" <jbeulich@novell.com> wrote:
> The issue mentioned above that I think current code has even without
> that patch (though I may easily have missed some aspect here) is that
> there is no enforcement of an immediate TLB flush when a writeable
> page''s type count drops to zero - instead the flush is deferred
until the
> end of the current operation or batch. With the removal of the use of
> the per-domain lock in these code paths it is no longer valid to defer
> the flush this much - it must be carried out before the page in question
> gets unlocked.
Oh, good point. We don''t need the synchronous flush in free_page_type()
any
more as we do not rely on correctness of the linear map any more (we use it,
but then verify sufficient correctness via get_page()/lock_page(), so now a
guest can only shoot itself in the foot).

I''ll revert back to the lazy scheme we used to use. Which of course
means
cpumask-in-the-page will not be safe. I will kill it off by other means.

For invalidate_shadow_ldt(), the LDT mappings are per-VCPU so only the local
CPU needs flushing. The reason for flush_tlb_mask() is I think a
conservative attempt at flushing the correct TLB if we are not the VCPU in
question. I would think it is better to use v->vcpu_dirty_cpumask, which
maybe didn''t exist when i_s_l() was last looked at. I shall take a
look.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Xen devel - Jan 2009 - [PATCH 5/5] just realized that it''s broken

[Xen-devel] [PATCH 5/5] just realized that it''s broken

Re: [Xen-devel] [PATCH 5/5] just realized that it''s broken

Re: [Xen-devel] [PATCH 5/5] just realized that it''s broken

Re: [Xen-devel] [PATCH 5/5] just realized that it''s broken