David Vrabel
2012-Jul-09  10:39 UTC
[PATCH 0/4] x86/xen: page fault and context switch performance (v2)
This series improves the performance of Xen PV guests when doing page faults (32-bit guests only) and context switches. Can an x86 maintainer ack patch 3 (x86: add desc_equal() to compare GDT descriptors) so this series can go via Konrad''s tree? The other patches are Xen-specific. Changes since v1: - fix (very!) slow boot on x86_64 by avoiding hypercalls when building the initial page tables. These updates were mostly a) done to unpinned pages and b) often had invalid MFNs that Xen warns about. - When building the initial page tables, zero PTEs for pages without an MFN yet. - add desc_equal() instead of open coding it. David
David Vrabel
2012-Jul-09  10:39 UTC
[PATCH 1/4] xen/mm: do direct hypercall in xen_set_pte() if batching is unavailable
From: David Vrabel <david.vrabel@citrix.com>
In xen_set_pte() if batching is unavailable (because the caller is in
an interrupt context such as handling a page fault) it would fall back
to using native_set_pte() and trapping and emulating the PTE write.
On 32-bit guests this requires two traps for each PTE write (one for
each dword of the PTE).  Instead, do one mmu_update hypercall
directly.
During construction of the initial page tables, continue to use
native_set_pte() because most of the PTEs being set are in writable
and unpinned pages (see phys_pmd_init() in arch/x86/mm/init_64.c) and
using a hypercall for this is very expensive.
This significantly improves page fault performance in 32-bit PV
guests.
lmbench3 test  Before    After     Improvement
----------------------------------------------
lat_pagefault  3.18 us   2.32 us   27%
lat_proc fork  356 us    313.3 us  11%
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/mmu.c |   30 +++++++++++++++++++++++++-----
 1 files changed, 25 insertions(+), 5 deletions(-)
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 3a73785..3f1783a 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -308,8 +308,20 @@ static bool xen_batched_set_pte(pte_t *ptep, pte_t pteval)
 
 static inline void __xen_set_pte(pte_t *ptep, pte_t pteval)
 {
-	if (!xen_batched_set_pte(ptep, pteval))
-		native_set_pte(ptep, pteval);
+	if (!xen_batched_set_pte(ptep, pteval)) {
+		/*
+		 * Could call native_set_pte() here and trap and
+		 * emulate the PTE write but with 32-bit guests this
+		 * needs two traps (one for each of the two 32-bit
+		 * words in the PTE) so do one hypercall directly
+		 * instead.
+		 */
+		struct mmu_update u;
+
+		u.ptr = virt_to_machine(ptep).maddr | MMU_NORMAL_PT_UPDATE;
+		u.val = pte_val_ma(pteval);
+		HYPERVISOR_mmu_update(&u, 1, NULL, DOMID_SELF);
+	}
 }
 
 static void xen_set_pte(pte_t *ptep, pte_t pteval)
@@ -1416,13 +1428,21 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
 }
 #endif /* CONFIG_X86_64 */
 
-/* Init-time set_pte while constructing initial pagetables, which
-   doesn''t allow RO pagetable pages to be remapped RW */
+/*
+ * Init-time set_pte while constructing initial pagetables, which
+ * doesn''t allow RO page table pages to be remapped RW.
+ *
+ * Many of these PTE updates are done on unpinned and writable pages
+ * and doing a hypercall for these is unnecessary and expensive.  At
+ * this point it is not possible to tell if a page is pinned or not,
+ * so always write the PTE directly and rely on Xen trapping and
+ * emulating any updates as necessary.
+ */
 static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
 {
 	pte = mask_rw_pte(ptep, pte);
 
-	xen_set_pte(ptep, pte);
+	native_set_pte(ptep, pte);
 }
 
 static void pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
-- 
1.7.2.5
David Vrabel
2012-Jul-09  10:39 UTC
[PATCH 2/4] xen/mm: zero PTEs for non-present MFNs in the initial page table
From: David Vrabel <david.vrabel@citrix.com>
When constructing the initial page tables, if the MFN for a usable PFN
is missing in the p2m then that frame is initially ballooned out.  In
this case, zero the PTE (as in decrease_reservation() in
drivers/xen/balloon.c).
This is obviously safe instead of having an valid PTE with an MFN of
INVALID_P2M_ENTRY (~0).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/mmu.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 3f1783a..27336df 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1432,6 +1432,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
  * Init-time set_pte while constructing initial pagetables, which
  * doesn''t allow RO page table pages to be remapped RW.
  *
+ * If there is no MFN for this PFN then this page is initially
+ * ballooned out so clear the PTE (as in decrease_reservation() in
+ * drivers/xen/balloon.c).
+ *
  * Many of these PTE updates are done on unpinned and writable pages
  * and doing a hypercall for these is unnecessary and expensive.  At
  * this point it is not possible to tell if a page is pinned or not,
@@ -1440,7 +1444,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
  */
 static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
 {
-	pte = mask_rw_pte(ptep, pte);
+	if (pte_mfn(pte) != INVALID_P2M_ENTRY)
+		pte = mask_rw_pte(ptep, pte);
+	else
+		pte = __pte_ma(0);
 
 	native_set_pte(ptep, pte);
 }
-- 
1.7.2.5
David Vrabel
2012-Jul-09  10:39 UTC
[PATCH 3/4] x86: add desc_equal() to compare GDT descriptors
From: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/include/asm/desc.h |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 8bf1c06..d26f059 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -85,6 +85,12 @@ static inline int desc_empty(const void *ptr)
 	return !(desc[0] | desc[1]);
 }
 
+static inline bool desc_equal(const struct desc_struct *d1,
+			      const struct desc_struct *d2)
+{
+	return d1->a == d2->a && d1->b == d2->b;
+}
+
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
 #else
-- 
1.7.2.5
David Vrabel
2012-Jul-09  10:39 UTC
[PATCH 4/4] x86/xen: avoid updating TLS descriptors if they haven''t changed
From: David Vrabel <david.vrabel@citrix.com>
When switching tasks in a Xen PV guest, avoid updating the TLS
descriptors if they haven''t changed.  This improves the speed of
context switches by almost 10% as much of the time the descriptors are
the same or only one is different.
The descriptors written into the GDT by Xen are modified from the
values passed in the update_descriptor hypercall so we keep shadow
copies of the three TLS descriptors to compare against.
lmbench3 test     Before  After  Improvement
--------------------------------------------
lat_ctx -s 32 24   7.19    6.52  9%
lat_pipe          12.56   11.66  7%
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/enlighten.c |   29 ++++++++++++++++++++++++++---
 1 files changed, 26 insertions(+), 3 deletions(-)
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ff962d4..33d53a1 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -124,6 +124,19 @@ struct shared_info *HYPERVISOR_shared_info = (void
*)&xen_dummy_shared_info;
  */
 static int have_vcpu_info_placement = 1;
 
+struct tls_descs {
+	struct desc_struct desc[3];
+};
+
+/*
+ * Updating the 3 TLS descriptors in the GDT on every task switch is
+ * surprisingly expensive so we avoid updating them if they haven''t
+ * changed.  Since Xen writes different descriptors than the one
+ * passed in the update_descriptor hypercall we keep shadow copies to
+ * compare against.
+ */
+static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
+
 static void clamp_max_cpus(void)
 {
 #ifdef CONFIG_SMP
@@ -543,9 +556,19 @@ static void __init xen_load_gdt_boot(const struct desc_ptr
*dtr)
 static void load_TLS_descriptor(struct thread_struct *t,
 				unsigned int cpu, unsigned int i)
 {
-	struct desc_struct *gdt = get_cpu_gdt_table(cpu);
-	xmaddr_t maddr = arbitrary_virt_to_machine(&gdt[GDT_ENTRY_TLS_MIN+i]);
-	struct multicall_space mc = __xen_mc_entry(0);
+	struct desc_struct *shadow = &per_cpu(shadow_tls_desc, cpu).desc[i];
+	struct desc_struct *gdt;
+	xmaddr_t maddr;
+	struct multicall_space mc;
+
+	if (desc_equal(shadow, &t->tls_array[i]))
+		return;
+
+	*shadow = t->tls_array[i];
+
+	gdt = get_cpu_gdt_table(cpu);
+	maddr = arbitrary_virt_to_machine(&gdt[GDT_ENTRY_TLS_MIN+i]);
+	mc = __xen_mc_entry(0);
 
 	MULTI_update_descriptor(mc.mc, maddr.maddr, t->tls_array[i]);
 }
-- 
1.7.2.5
Ian Campbell
2012-Jul-09  14:38 UTC
Re: [Xen-devel] [PATCH 0/4] x86/xen: page fault and context switch performance (v2)
On Mon, 2012-07-09 at 06:39 -0400, David Vrabel wrote:> This series improves the performance of Xen PV guests when doing page > faults (32-bit guests only) and context switches.These all look good to me, thanks David. Acked-by: Ian Campbell <ian.campbell@citrix.com>
Konrad Rzeszutek Wilk
2012-Jul-11  19:21 UTC
Re: [PATCH 3/4] x86: add desc_equal() to compare GDT descriptors
On Mon, Jul 09, 2012 at 11:39:07AM +0100, David Vrabel wrote:> From: David Vrabel <david.vrabel@citrix.com> > > Signed-off-by: David Vrabel <david.vrabel@citrix.com>Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>> --- > arch/x86/include/asm/desc.h | 6 ++++++ > 1 files changed, 6 insertions(+), 0 deletions(-) > > diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h > index 8bf1c06..d26f059 100644 > --- a/arch/x86/include/asm/desc.h > +++ b/arch/x86/include/asm/desc.h > @@ -85,6 +85,12 @@ static inline int desc_empty(const void *ptr) > return !(desc[0] | desc[1]); > } > > +static inline bool desc_equal(const struct desc_struct *d1, > + const struct desc_struct *d2) > +{ > + return d1->a == d2->a && d1->b == d2->b; > +} > + > #ifdef CONFIG_PARAVIRT > #include <asm/paravirt.h> > #else > -- > 1.7.2.5
David Vrabel
2012-Jul-17  15:22 UTC
Re: [PATCH 3/4] x86: add desc_equal() to compare GDT descriptors
On 09/07/12 11:39, David Vrabel wrote:> From: David Vrabel <david.vrabel@citrix.com> > > Signed-off-by: David Vrabel <david.vrabel@citrix.com>ping? Can we get an x86 an x86 maintainer for this patch, please? David> --- > arch/x86/include/asm/desc.h | 6 ++++++ > 1 files changed, 6 insertions(+), 0 deletions(-) > > diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h > index 8bf1c06..d26f059 100644 > --- a/arch/x86/include/asm/desc.h > +++ b/arch/x86/include/asm/desc.h > @@ -85,6 +85,12 @@ static inline int desc_empty(const void *ptr) > return !(desc[0] | desc[1]); > } > > +static inline bool desc_equal(const struct desc_struct *d1, > + const struct desc_struct *d2) > +{ > + return d1->a == d2->a && d1->b == d2->b; > +} > + > #ifdef CONFIG_PARAVIRT > #include <asm/paravirt.h> > #else
H. Peter Anvin
2012-Jul-17  15:57 UTC
Re: [PATCH 3/4] x86: add desc_equal() to compare GDT descriptors
Sorry we''re all travelling at the moment David Vrabel <david.vrabel@citrix.com> wrote:>On 09/07/12 11:39, David Vrabel wrote: >> From: David Vrabel <david.vrabel@citrix.com> >> >> Signed-off-by: David Vrabel <david.vrabel@citrix.com> > >ping? Can we get an x86 an x86 maintainer for this patch, please? > >David > >> --- >> arch/x86/include/asm/desc.h | 6 ++++++ >> 1 files changed, 6 insertions(+), 0 deletions(-) >> >> diff --git a/arch/x86/include/asm/desc.h >b/arch/x86/include/asm/desc.h >> index 8bf1c06..d26f059 100644 >> --- a/arch/x86/include/asm/desc.h >> +++ b/arch/x86/include/asm/desc.h >> @@ -85,6 +85,12 @@ static inline int desc_empty(const void *ptr) >> return !(desc[0] | desc[1]); >> } >> >> +static inline bool desc_equal(const struct desc_struct *d1, >> + const struct desc_struct *d2) >> +{ >> + return d1->a == d2->a && d1->b == d2->b; >> +} >> + >> #ifdef CONFIG_PARAVIRT >> #include <asm/paravirt.h> >> #else-- Sent from my mobile phone. Please excuse brevity and lack of formatting.