This patch was moving the cpumask out of struct page_info into the page itself, but I just realized that this it not valid, since the purpose of the mask is to avoid a later TLB flush, hence the base assumption is that the page might still be in the TLB of some CPU, thus making it possible for a mis- behaved guest to still write to that page. Sorry for the mis-numbering, Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Jan-30 11:35 UTC
Re: [Xen-devel] [PATCH 5/5] just realized that it''s broken
Fortunately I think it can be got rid of and probably at little or no performance cost. I will take a look maybe next week. -- Keir On 30/01/2009 10:48, "Jan Beulich" <jbeulich@novell.com> wrote:> This patch was moving the cpumask out of struct page_info into the page > itself, but I just realized that this it not valid, since the purpose of the > mask > is to avoid a later TLB flush, hence the base assumption is that the page > might still be in the TLB of some CPU, thus making it possible for a mis- > behaved guest to still write to that page. > > Sorry for the mis-numbering, > Jan > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2009-Feb-04 08:27 UTC
Re: [Xen-devel] [PATCH 5/5] just realized that it''s broken
>>> "Jan Beulich" <jbeulich@novell.com> 30.01.09 11:48 >>> >This patch was moving the cpumask out of struct page_info into the page >itself, but I just realized that this it not valid, since the purpose of the mask >is to avoid a later TLB flush, hence the base assumption is that the page >might still be in the TLB of some CPU, thus making it possible for a mis- >behaved guest to still write to that page. > >Sorry for the mis-numbering, >JanActually, after some more consideration I think the patch is actually correct (minus an issue that already has been in place for a while anyway): All access a domain may continue have to a page it''s freeing (through stale TLB entries) is read/execute. This is because when the type count of a page (i.e. in particular a writeable one) a (domain-)global TLB flush is performed anyway. Allowing the freeing domain (as well as the one subsequently allocating) to read the cpumask does not present a problem in my opinion. The issue mentioned above that I think current code has even without that patch (though I may easily have missed some aspect here) is that there is no enforcement of an immediate TLB flush when a writeable page''s type count drops to zero - instead the flush is deferred until the end of the current operation or batch. With the removal of the use of the per-domain lock in these code paths it is no longer valid to defer the flush this much - it must be carried out before the page in question gets unlocked. And I would think that invalidate_shadow_ldt() has a similar issue, plus it seems questionable that it does a local flush when acting on the current vCPU, but a global one for all others. Jan ******************************************************* Move the (potentially large) cpumask field of free pages into the page itself, thus making sizeof(struct page_info) independent of the configured number of CPUs, which avoids penalizing systems with few CPUs just because the hypervisor is able to support many. Signed-off-by: Jan Beulich <jbeulich@novell.com> --- 2009-01-27.orig/xen/common/page_alloc.c 2009-01-29 15:27:38.000000000 +0100 +++ 2009-01-27/xen/common/page_alloc.c 2009-01-29 15:33:09.000000000 +0100 @@ -376,7 +376,7 @@ static struct page_info *alloc_heap_page BUG_ON(pg[i].count_info != 0); /* Add in any extra CPUs that need flushing because of this page. */ - cpus_andnot(extra_cpus_mask, pg[i].u.free.cpumask, mask); + cpus_andnot(extra_cpus_mask, PFN_CPUMASK(&pg[i]), mask); tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp); cpus_or(mask, mask, extra_cpus_mask); @@ -425,11 +425,11 @@ static void free_heap_pages( if ( (d = page_get_owner(&pg[i])) != NULL ) { pg[i].tlbflush_timestamp = tlbflush_current_time(); - pg[i].u.free.cpumask = d->domain_dirty_cpumask; + PFN_CPUMASK(&pg[i]) = d->domain_dirty_cpumask; } else { - cpus_clear(pg[i].u.free.cpumask); + cpus_clear(PFN_CPUMASK(&pg[i])); } } --- 2009-01-27.orig/xen/include/asm-ia64/mm.h 2009-01-30 08:26:33.000000000 +0100 +++ 2009-01-27/xen/include/asm-ia64/mm.h 2009-01-29 14:24:42.000000000 +0100 @@ -35,8 +35,11 @@ typedef unsigned long page_flags_t; * Every architecture must ensure the following: * 1. ''struct page_info'' contains a ''struct list_head list''. * 2. Provide a PFN_ORDER() macro for accessing the order of a free page. + * 3. Provide a PFN_CPUMASK() macro for accessing the mask of possibly-tainted + * TLBs. */ -#define PFN_ORDER(_pfn) ((_pfn)->u.free.order) +#define PFN_ORDER(_pfn) ((_pfn)->u.free.order) +#define PFN_CPUMASK(_pfn) ((_pfn)->u.free.cpumask) #define PRtype_info "016lx" --- 2009-01-27.orig/xen/include/asm-x86/mm.h 2009-01-29 14:03:37.000000000 +0100 +++ 2009-01-27/xen/include/asm-x86/mm.h 2009-01-30 08:28:27.000000000 +0100 @@ -14,8 +14,18 @@ * Every architecture must ensure the following: * 1. ''struct page_info'' contains a ''struct page_list_entry list''. * 2. Provide a PFN_ORDER() macro for accessing the order of a free page. + * 3. Provide a PFN_CPUMASK() macro for accessing the mask of possibly-tainted + * TLBs. */ #define PFN_ORDER(_pfn) ((_pfn)->v.free.order) +#ifdef __i386__ +# define PFN_CPUMASK(_pfn) ((_pfn)->u.free.cpumask) +#else +# define PFN_CPUMASK(_pfn) (*(cpumask_t *)page_to_virt(_pfn)) +# if NR_CPUS > PAGE_SIZE * 8 +# error Cannot handle this many CPUs. +# endif +#endif /* * This definition is solely for the use in struct page_info (and @@ -72,8 +82,10 @@ struct page_info /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */ struct { +#ifdef __i386__ /* Mask of possibly-tainted TLBs. */ cpumask_t cpumask; +#endif } free; } u; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Feb-04 08:49 UTC
Re: [Xen-devel] [PATCH 5/5] just realized that it''s broken
On 04/02/2009 08:27, "Jan Beulich" <jbeulich@novell.com> wrote:> The issue mentioned above that I think current code has even without > that patch (though I may easily have missed some aspect here) is that > there is no enforcement of an immediate TLB flush when a writeable > page''s type count drops to zero - instead the flush is deferred until the > end of the current operation or batch. With the removal of the use of > the per-domain lock in these code paths it is no longer valid to defer > the flush this much - it must be carried out before the page in question > gets unlocked.Oh, good point. We don''t need the synchronous flush in free_page_type() any more as we do not rely on correctness of the linear map any more (we use it, but then verify sufficient correctness via get_page()/lock_page(), so now a guest can only shoot itself in the foot). I''ll revert back to the lazy scheme we used to use. Which of course means cpumask-in-the-page will not be safe. I will kill it off by other means. For invalidate_shadow_ldt(), the LDT mappings are per-VCPU so only the local CPU needs flushing. The reason for flush_tlb_mask() is I think a conservative attempt at flushing the correct TLB if we are not the VCPU in question. I would think it is better to use v->vcpu_dirty_cpumask, which maybe didn''t exist when i_s_l() was last looked at. I shall take a look. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel