thr3ads.net - Xen devel - [Xen-devel] [PATCH] x86/acpi fixes for 3.2 (v1) impacting distributions. [Nov 2011]

If this information is useful, please help other people find it:
Share via:

Konrad Rzeszutek Wilk

2011-Nov-08 21:15 UTC

[Xen-devel] [PATCH] x86/acpi fixes for 3.2 (v1) impacting distributions.

I am posting three patches that are impacting distributions (both Ubuntu
and Fedora Core 16) when running the Linux v3.1 (or later) under Xen.

The first one is a regression:
 [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to

In 3.1 we set pm_idle to something else besides the default_idle which
is not good. We want to use the default_halt as it does a yield hypercall, while
the other pm_idle do not. Worst yet, when we would migrate a guest we could
be using the wrong pm_idle code (on AMD boxes).

The two other ones are more controversial and I am not sure if the path
I had choosen is the "best" to fix the corruption problem. The
"Right Way"
would be to wrap pte_flags with a pvops call, but that has serious performance
drawback implications. Ad nauseum details are in the patch:

 [PATCH 2/3] x86/cpa: Use pte_attrs instead of pte_flags on

and the last one is not that important, but nonethless if somebody is running
CONFIG_CPA_DEBUG and with a radeon or nouveau card they might get sporadic:
"CPA (x) bad PTE" messages. This patch fixes that.

 [PATCH 3/3] x86/paravirt: Use pte_val instead of pte_flags on CPA

The patches are also located in 

git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-x86

 arch/x86/include/asm/pgtable.h |    5 +++++
 arch/x86/kernel/process.c      |    5 +++++
 arch/x86/mm/pageattr-test.c    |    6 +++++-
 arch/x86/mm/pageattr.c         |    2 +-
 include/linux/cpuidle.h        |    2 ++
 5 files changed, 18 insertions(+), 2 deletions(-)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Konrad Rzeszutek Wilk

2011-Nov-08 21:15 UTC

head link

[Xen-devel] [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

. however we ignore the disable_cpuidle() directive and
in select_idle_routine() set it to type preferred by the CPU.

This is a regression introduced by

 commit a0bfa1373859e9d11dc92561a8667588803e42d8
 Author: Len Brown <len.brown@intel.com>
 Date:   Fri Apr 1 19:34:59 2011 -0400

     cpuidle: stop depending on pm_idle

specifically this patch:

 commit d91ee5863b71e8c90eaf6035bff3078a85e2e7b5
 Author: Len Brown <len.brown@intel.com>
 Date:   Fri Apr 1 18:28:35 2011 -0400

     cpuidle: replace xen access to x86 pm_idle and default_idle

     ..scribble on pm_idle and access default_idle,
    have it simply disable_cpuidle() so acpi_idle will not load and
    architecture default HLT will be used.

.. but the default HLT does not get used. Instead we end up in the
situation that select_idle_routine() is called from arch/x86/kernel/cpu/common.c
and if we called cpuidle_disable(), then the pm_idle is not set, and
we end up setting pm_idle to mwait_idle or amd_e400_idle.

This patch uses the cpuidle_disabled() functionality and
makes select_idle_routine() adhere to that.

Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: x86@kernel.org
CC: Len Brown <len.brown@intel.com>
CC: Borislav Petkov <bp@alien8.de>
CC: Tejun Heo <tj@kernel.org>
CC: Thomas Renninger <trenn@suse.de>
CC: stable@kernel.org
---
 arch/x86/kernel/process.c |    5 +++++
 include/linux/cpuidle.h   |    2 ++
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index e7e3b01..1f7f8c8 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -14,6 +14,7 @@
 #include <linux/utsname.h>
 #include <trace/events/power.h>
 #include <linux/hw_breakpoint.h>
+#include <linux/cpuidle.h>
 #include <asm/cpu.h>
 #include <asm/system.h>
 #include <asm/apic.h>
@@ -587,6 +588,10 @@ void __cpuinit select_idle_routine(const struct cpuinfo_x86
*c)
 	if (pm_idle)
 		return;
 
+	if (cpuidle_disabled()) {
+		pm_idle = default_idle;
+		return;
+	}
 	if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
 		/*
 		 * One CPU supports mwait => All CPUs supports mwait
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index b51629e..123fe9e 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -122,6 +122,7 @@ struct cpuidle_driver {
 };
 
 #ifdef CONFIG_CPU_IDLE
+extern int cpuidle_disabled(void);
 extern void disable_cpuidle(void);
 extern int cpuidle_idle_call(void);
 
@@ -137,6 +138,7 @@ extern int cpuidle_enable_device(struct cpuidle_device
*dev);
 extern void cpuidle_disable_device(struct cpuidle_device *dev);
 
 #else
+static inline int cpuidle_disabled(void) { return 1; }
 static inline void disable_cpuidle(void) { }
 static inline int cpuidle_idle_call(void) { return -ENODEV; }
 
-- 
1.7.7.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Konrad Rzeszutek Wilk

2011-Nov-08 21:15 UTC

head link

[Xen-devel] [PATCH 2/3] x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations.

When using the paravirt interface, most of the page operations are wrapped
in the pvops interface. The one that is not is the pte_flags. The reason
being that for most cases, the "raw" PTE flag values for baremetal and
whatever
pvops platform is running (in this case) - share the same bit meaning.

Except for PAT. Under Linux, the PAT MSR is written to be:

          PAT4                 PAT0
+---+----+----+----+-----+----+----+
 WC | WC | WB | UC | UC- | WC | WB |  <= Linux
+---+----+----+----+-----+----+----+
 WC | WT | WB | UC | UC- | WT | WB |  <= BIOS
+---+----+----+----+-----+----+----+
 WC | WP | WC | UC | UC- | WT | WB |  <= Xen
+---+----+----+----+-----+----+----+

The lookup of this index table translates to looking up
Bit 7, Bit 4, and Bit 3 of PTE:

 PAT/PSE (bit 7) ... PCD (bit 4) .. PWT (bit 3).

If all bits are off, then we are using PAT0. If bit 3 turned on,
then we are using PAT1, if bit 3 and bit 4, then PAT2..

Back to the PAT MSR table:

As you can see, the PAT1 translates to PAT4 under Xen. Under Linux
we only use PAT0, PAT1, and PAT2 for the caching as:

 WB = none (so PAT0)
 WC = PWT (bit 3 on)
 UC = PWT | PCD (bit 3 and 4 are on).

But to make it work with Xen, we end up doing for WC a translation:

 PWT (so bit 3 on) --> PAT (so bit 7 is on) and clear bit 3

And to translate back (when the paravirt pte_val is used) we would:

 PAT (bit 7 on) --> PWT (bit 3 on) and clear bit 7.

This works quite well, except if code uses the pte_flags, as pte_flags
reads the raw value and does not go through the paravirt. Which means
that if (when running under Xen):

 1) we allocate some pages.
 2) call set_pages_array_wc, which ends up calling:
     __page_change_att_set_clr(.., __pgprot(__PAGE_WC),  /* set */
                                 , __pgprot(__PAGE_MASK), /* clear */
    which ends up reading the _raw_ PTE flags and _only_ look at the
    _PTE_FLAG_MASK contents with __PAGE_MASK cleared (0x18) and
    __PAGE_WC (0x8) set.

     read raw *pte -> 0x67
     *pte = 0x67 & ^0x18 | 0x8
     *pte = 0x67 & 0xfffffe7 | 0x8
     *pte = 0x6f

   [now set_pte_atomic is called, and 0x6f is written in, but under
    xen_make_pte, the bit 3 is translated to bit 7, so it ends up
    writting 0xa7, which is correct]

 3) do something to them.
 4) call set_pages_array_wb
     __page_change_att_set_clr(.., __pgprot(__PAGE_WB),  /* set */
                                 , __pgprot(__PAGE_MASK), /* clear */
    which ends up reading the _raw_ PTE and _only_ look at the
    _PTE_FLAG_MASK contents with _PAGE_MASK cleared (0x18) and
    __PAGE_WB (0x0) set:

     read raw *pte -> 0xa7
     *pte = 0xa7 & &0x18 | 0
     *pte = 0xa7 & 0xfffffe7 | 0
     *pte = 0xa7

   [we check whether the old PTE is different from the new one

    if (pte_val(old_pte) != pte_val(new_pte)) {
        set_pte_atomic(kpte, new_pte);
        ...

   and find out that 0xA7 == 0xA7 so we do not write the new PTE value in]

   End result is that we failed at removing the WC caching bit!

 5) free them.
   [and have pages with PAT4 (bit 7) set, so other subsystems end up using
    the pages that have the write combined bit set resulting in crashes.
Yikes!].

The fix, which this patch proposes, is to wrap the pte_pgprot in the CPA
code with newly introduced pte_attrs which can go through the pvops interface
to get the "emulated" value instead of the raw. Naturally if
CONFIG_PARAVIRT is
not set, it would end calling native_pte_val.

The other way to fix this is by wrapping pte_flags and go through the pvops
interface and it really is the Right Thing to do.  The problem is, that past
experience with mprotect stuff demonstrates that it be really expensive in inner
loops, and pte_flags() is used in some very perf-critical areas.

Example code to run this and see the various mysterious subsystems/applications
crashing

MODULE_AUTHOR("Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>");
MODULE_DESCRIPTION("wb_to_wc_and_back");
MODULE_LICENSE("GPL");
MODULE_VERSION(WB_TO_WC);

static int thread(void *arg)
{
	struct page *a[MAX_PAGES];
	unsigned int i, j;
	do {
		for (j = 0, i = 0;i < MAX_PAGES; i++, j++) {
			a[i] = alloc_page(GFP_KERNEL);
			if (!a[i])
				break;
		}
		set_pages_array_wc(a, j);
		set_current_state(TASK_INTERRUPTIBLE);
		schedule_timeout_interruptible(HZ);
		for (i = 0; i < j; i++) {
			unsigned long *addr = page_address(a[i]);
			if (addr) {
				memset(addr, 0xc2, PAGE_SIZE);
			}
		}
		set_pages_array_wb(a, j);
		for (i = 0; i< MAX_PAGES; i++) {
			if (a[i])
				__free_page(a[i]);
			a[i] = NULL;
		}
	} while (!kthread_should_stop());
	return 0;
}
static struct task_struct *t;
static int __init wb_to_wc_init(void)
{
	t = kthread_run(thread, NULL, "wb_to_wc_and_back");
	return 0;
}
static void __exit wb_to_wc_exit(void)
{
	if (t)
		kthread_stop(t);
}
module_init(wb_to_wc_init);
module_exit(wb_to_wc_exit);

This fixes RH BZ #742032
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: stable@kernel.org
---
 arch/x86/include/asm/pgtable.h |    5 +++++
 arch/x86/mm/pageattr.c         |    2 +-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 18601c8..34027b0 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -349,6 +349,11 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot,
pgprot_t newprot)
 	return __pgprot(preservebits | addbits);
 }
 
+static inline pgprot_t pte_attrs(pte_t pte)
+{
+	return __pgprot(pte_val(pte) & PTE_FLAGS_MASK);
+}
+
 #define pte_pgprot(x) __pgprot(pte_flags(x) & PTE_FLAGS_MASK)
 
 #define canon_pgprot(p) __pgprot(massage_pgprot(p))
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index f9e5267..efa8040 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -651,7 +651,7 @@ repeat:
 
 	if (level == PG_LEVEL_4K) {
 		pte_t new_pte;
-		pgprot_t new_prot = pte_pgprot(old_pte);
+		pgprot_t new_prot = pte_attrs(old_pte);
 		unsigned long pfn = pte_pfn(old_pte);
 
 		pgprot_val(new_prot) &= ~pgprot_val(cpa->mask_clr);
-- 
1.7.7.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Konrad Rzeszutek Wilk

2011-Nov-08 21:15 UTC

head link

[Xen-devel] [PATCH 3/3] x86/paravirt: Use pte_val instead of pte_flags on CPA pageattr_test

For details refer to patch "x86/paravirt: Use pte_attrs instead of
pte_flags on CPA/set_p.._wb/wc operations." which explains that
some pages have the _PAGE_PWT bit set in the _PAGE_PSE field
when running under Xen.

When pageattr_test is running it uses pte_flags to check whether
it succedded in setting _PAGE_UNUSED1 bit, but also whether the
page had _PAGE_PSE. This can happen when one of the randomly selected
pages to be tested is a page that has been set to be _PAGE_WC
as under Xen, that field is under _PAGE_PSE. Since the
''pte_huge''
call is using the pte_flags(x) macro, which extracts the "raw"
contents
of the PTE, the translation of _PAGE_PSE -> _PAGE_PWT does not happen
and we incorrectly identify the PTE as bad.

Using the ''pte_val'' instead of ''pte_flags''
fixes the problem and
this patch does that.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: stable@kernel.org
---
 arch/x86/mm/pageattr-test.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/mm/pageattr-test.c b/arch/x86/mm/pageattr-test.c
index b008656..da853e3 100644
--- a/arch/x86/mm/pageattr-test.c
+++ b/arch/x86/mm/pageattr-test.c
@@ -38,6 +38,10 @@ static int pte_testbit(pte_t pte)
 {
 	return pte_flags(pte) & _PAGE_UNUSED1;
 }
+static int pte_testhuge(pte_t pte)
+{
+	return pte_val(pte) & _PAGE_PSE;
+}
 
 struct split_state {
 	long lpg, gpg, spg, exec;
@@ -180,7 +184,7 @@ static int pageattr_test(void)
 		}
 
 		pte = lookup_address(addr[i], &level);
-		if (!pte || !pte_testbit(*pte) || pte_huge(*pte)) {
+		if (!pte || !pte_testbit(*pte) || pte_testhuge(*pte)) {
 			printk(KERN_ERR "CPA %lx: bad pte %Lx\n", addr[i],
 				pte ? (u64)pte_val(*pte) : 0ULL);
 			failed++;
-- 
1.7.7.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Deepthi Dharwar

2011-Nov-09 10:19 UTC

head link

[Xen-devel] Re: [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

On Wednesday 09 November 2011 02:45 AM, Konrad Rzeszutek Wilk
wrote:> . however we ignore the disable_cpuidle() directive and
> in select_idle_routine() set it to type preferred by the CPU.
> 
> This is a regression introduced by
> 
>  commit a0bfa1373859e9d11dc92561a8667588803e42d8
>  Author: Len Brown <len.brown@intel.com>
>  Date:   Fri Apr 1 19:34:59 2011 -0400
> 
>      cpuidle: stop depending on pm_idle
> 
> specifically this patch:
> 
>  commit d91ee5863b71e8c90eaf6035bff3078a85e2e7b5
>  Author: Len Brown <len.brown@intel.com>
>  Date:   Fri Apr 1 18:28:35 2011 -0400
> 
>      cpuidle: replace xen access to x86 pm_idle and default_idle
> 
>      ..scribble on pm_idle and access default_idle,
>     have it simply disable_cpuidle() so acpi_idle will not load and
>     architecture default HLT will be used.
> 
> .. but the default HLT does not get used. Instead we end up in the
> situation that select_idle_routine() is called from
arch/x86/kernel/cpu/common.c
> and if we called cpuidle_disable(), then the pm_idle is not set, and
> we end up setting pm_idle to mwait_idle or amd_e400_idle.
> 
> This patch uses the cpuidle_disabled() functionality and
> makes select_idle_routine() adhere to that.
> 
> Reported-by: Stefan Bader <stefan.bader@canonical.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: x86@kernel.org
> CC: Len Brown <len.brown@intel.com>
> CC: Borislav Petkov <bp@alien8.de>
> CC: Tejun Heo <tj@kernel.org>
> CC: Thomas Renninger <trenn@suse.de>
> CC: stable@kernel.org
> ---
>  arch/x86/kernel/process.c |    5 +++++
>  include/linux/cpuidle.h   |    2 ++
>  2 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index e7e3b01..1f7f8c8 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -14,6 +14,7 @@
>  #include <linux/utsname.h>
>  #include <trace/events/power.h>
>  #include <linux/hw_breakpoint.h>
> +#include <linux/cpuidle.h>
>  #include <asm/cpu.h>
>  #include <asm/system.h>
>  #include <asm/apic.h>
> @@ -587,6 +588,10 @@ void __cpuinit select_idle_routine(const struct
cpuinfo_x86 *c)
>  	if (pm_idle)
>  		return;
> 
> +	if (cpuidle_disabled()) {
> +		pm_idle = default_idle;
> +		return;
> +	}

The above check is needed to initialise pm_idle to default_idle if
cpuidle is disabled but this requirement here is Zen specific. 
On other architectures, if cpuidle is disabled on boot then we 
rather would want mwait_idle or amd_e400_idle to be enabled than 
default_idle. Can we make this check Zen specific ? 

...  if(ZEN_ARCH && cpuidle_disabled()) ... 



>  	if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
>  		/*
>  		 * One CPU supports mwait => All CPUs supports mwait
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index b51629e..123fe9e 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -122,6 +122,7 @@ struct cpuidle_driver {
>  };
> 
>  #ifdef CONFIG_CPU_IDLE
> +extern int cpuidle_disabled(void);
>  extern void disable_cpuidle(void);
>  extern int cpuidle_idle_call(void);
> 
> @@ -137,6 +138,7 @@ extern int cpuidle_enable_device(struct cpuidle_device
*dev);
>  extern void cpuidle_disable_device(struct cpuidle_device *dev);
> 
>  #else
> +static inline int cpuidle_disabled(void) { return 1; }
>  static inline void disable_cpuidle(void) { }
>  static inline int cpuidle_idle_call(void) { return -ENODEV; }
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Deepthi Dharwar

2011-Nov-09 10:51 UTC

head link

[Xen-devel] Re: [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

Just correcting the typo s/zen/xen/g .

On Wednesday 09 November 2011 03:49 PM, Deepthi Dharwar
wrote:> On Wednesday 09 November 2011 02:45 AM, Konrad Rzeszutek Wilk wrote:
>> . however we ignore the disable_cpuidle() directive and
>> in select_idle_routine() set it to type preferred by the CPU.
>>
>> This is a regression introduced by
>>
>>  commit a0bfa1373859e9d11dc92561a8667588803e42d8
>>  Author: Len Brown <len.brown@intel.com>
>>  Date:   Fri Apr 1 19:34:59 2011 -0400
>>
>>      cpuidle: stop depending on pm_idle
>>
>> specifically this patch:
>>
>>  commit d91ee5863b71e8c90eaf6035bff3078a85e2e7b5
>>  Author: Len Brown <len.brown@intel.com>
>>  Date:   Fri Apr 1 18:28:35 2011 -0400
>>
>>      cpuidle: replace xen access to x86 pm_idle and default_idle
>>
>>      ..scribble on pm_idle and access default_idle,
>>     have it simply disable_cpuidle() so acpi_idle will not load and
>>     architecture default HLT will be used.
>>
>> .. but the default HLT does not get used. Instead we end up in the
>> situation that select_idle_routine() is called from
arch/x86/kernel/cpu/common.c
>> and if we called cpuidle_disable(), then the pm_idle is not set, and
>> we end up setting pm_idle to mwait_idle or amd_e400_idle.
>>
>> This patch uses the cpuidle_disabled() functionality and
>> makes select_idle_routine() adhere to that.
>>
>> Reported-by: Stefan Bader <stefan.bader@canonical.com>
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>
>> CC: Thomas Gleixner <tglx@linutronix.de>
>> CC: Ingo Molnar <mingo@redhat.com>
>> CC: "H. Peter Anvin" <hpa@zytor.com>
>> CC: x86@kernel.org
>> CC: Len Brown <len.brown@intel.com>
>> CC: Borislav Petkov <bp@alien8.de>
>> CC: Tejun Heo <tj@kernel.org>
>> CC: Thomas Renninger <trenn@suse.de>
>> CC: stable@kernel.org
>> ---
>>  arch/x86/kernel/process.c |    5 +++++
>>  include/linux/cpuidle.h   |    2 ++
>>  2 files changed, 7 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>> index e7e3b01..1f7f8c8 100644
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>> @@ -14,6 +14,7 @@
>>  #include <linux/utsname.h>
>>  #include <trace/events/power.h>
>>  #include <linux/hw_breakpoint.h>
>> +#include <linux/cpuidle.h>
>>  #include <asm/cpu.h>
>>  #include <asm/system.h>
>>  #include <asm/apic.h>
>> @@ -587,6 +588,10 @@ void __cpuinit select_idle_routine(const struct
cpuinfo_x86 *c)
>>  	if (pm_idle)
>>  		return;
>>
>> +	if (cpuidle_disabled()) {
>> +		pm_idle = default_idle;
>> +		return;
>> +	}
> 
> 
> The above check is needed to initialise pm_idle to default_idle if
> cpuidle is disabled but this requirement here is xen specific. 
> On other architectures, if cpuidle is disabled on boot then we 
> rather would want mwait_idle or amd_e400_idle to be enabled than 
> default_idle. Can we make this check xen specific ? 
> 
> ...  if(XEN_ARCH && cpuidle_disabled()) ... 
> 
> 
> 
> 
>>  	if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
>>  		/*
>>  		 * One CPU supports mwait => All CPUs supports mwait
>> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
>> index b51629e..123fe9e 100644
>> --- a/include/linux/cpuidle.h
>> +++ b/include/linux/cpuidle.h
>> @@ -122,6 +122,7 @@ struct cpuidle_driver {
>>  };
>>
>>  #ifdef CONFIG_CPU_IDLE
>> +extern int cpuidle_disabled(void);
>>  extern void disable_cpuidle(void);
>>  extern int cpuidle_idle_call(void);
>>
>> @@ -137,6 +138,7 @@ extern int cpuidle_enable_device(struct
cpuidle_device *dev);
>>  extern void cpuidle_disable_device(struct cpuidle_device *dev);
>>
>>  #else
>> +static inline int cpuidle_disabled(void) { return 1; }
>>  static inline void disable_cpuidle(void) { }
>>  static inline int cpuidle_idle_call(void) { return -ENODEV; }
>>
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Konrad Rzeszutek Wilk

2011-Nov-09 14:44 UTC

head link

[Xen-devel] Re: [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

. snip..> > 
> >      ..scribble on pm_idle and access default_idle,
> >     have it simply disable_cpuidle() so acpi_idle will not load and
> >     architecture default HLT will be used.
> > 
> > .. but the default HLT does not get used. Instead we end up in the
Hey Deepthi,
> > +	if (cpuidle_disabled()) {
> > +		pm_idle = default_idle;
> > +		return;
> > +	}
> 
> 
> The above check is needed to initialise pm_idle to default_idle if
> cpuidle is disabled but this requirement here is Zen specific. 
> On other architectures, if cpuidle is disabled on boot then we 
> rather would want mwait_idle or amd_e400_idle to be enabled than 
> default_idle. Can we make this check Zen specific ? 
We do? Why? I would have thought that if you want to disable the cpuidle
you would want the default_idle. 

Perhaps there is another way do this is:

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index becd6d9..04b10a4 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -38,6 +38,7 @@ int cpuidle_disabled(void)
 void disable_cpuidle(void)
 {
 	off = 1;
+	pm_idle = default_idle;
 }

which would do almost the same thing as this patch (well, except
that if you run cpuidle.off=1 you still end up with amd_e400_idle
instead of default_idle, so it is not the complete solution).
> 
> ...  if(ZEN_ARCH && cpuidle_disabled()) ... 
That would not work very well. For one thing you would need to call
''xen_domain()'', and pull in a lots of header files. Second of,
it
looks quite ugly and kernel folks like pretty code.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Konrad Rzeszutek Wilk

2011-Nov-10 04:06 UTC

head link

Re: [Xen-devel] [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

On Tue, Nov 08, 2011 at 04:15:12PM -0500, Konrad Rzeszutek Wilk
wrote:> . however we ignore the disable_cpuidle() directive and
> in select_idle_routine() set it to type preferred by the CPU.
> 
> This is a regression introduced by
> 
>  commit a0bfa1373859e9d11dc92561a8667588803e42d8
>  Author: Len Brown <len.brown@intel.com>
>  Date:   Fri Apr 1 19:34:59 2011 -0400
> 
>      cpuidle: stop depending on pm_idle
> 
> specifically this patch:
> 
>  commit d91ee5863b71e8c90eaf6035bff3078a85e2e7b5
>  Author: Len Brown <len.brown@intel.com>
>  Date:   Fri Apr 1 18:28:35 2011 -0400
> 
>      cpuidle: replace xen access to x86 pm_idle and default_idle
> 
>      ..scribble on pm_idle and access default_idle,
>     have it simply disable_cpuidle() so acpi_idle will not load and
>     architecture default HLT will be used.
> 
> .. but the default HLT does not get used. Instead we end up in the
> situation that select_idle_routine() is called from
arch/x86/kernel/cpu/common.c
> and if we called cpuidle_disable(), then the pm_idle is not set, and
> we end up setting pm_idle to mwait_idle or amd_e400_idle.
> 
> This patch uses the cpuidle_disabled() functionality and
> makes select_idle_routine() adhere to that.
> 
> Reported-by: Stefan Bader <stefan.bader@canonical.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
And just found that there is a BZ for this as well:
https://bugzilla.redhat.com/show_bug.cgi?id=739499

And this patch fixes the Linux kernel to boot under Amazon EC2.

Another option which I played with today was to do this instead:

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 46d6d21..45136f2 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -448,6 +448,6 @@ void __init xen_arch_setup(void)
 #endif
        disable_cpuidle();
        boot_option_idle_override = IDLE_HALT;
-
+       pm_idle = default_idle;
        fiddle_vdso();
 }


But that defeats the whole purpose of the patch series that Len developed
where we would _not_ scribble on the pm_idle. <sigh>

Thoughts?> 
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: x86@kernel.org
> CC: Len Brown <len.brown@intel.com>
> CC: Borislav Petkov <bp@alien8.de>
> CC: Tejun Heo <tj@kernel.org>
> CC: Thomas Renninger <trenn@suse.de>
> CC: stable@kernel.org
> ---
>  arch/x86/kernel/process.c |    5 +++++
>  include/linux/cpuidle.h   |    2 ++
>  2 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index e7e3b01..1f7f8c8 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -14,6 +14,7 @@
>  #include <linux/utsname.h>
>  #include <trace/events/power.h>
>  #include <linux/hw_breakpoint.h>
> +#include <linux/cpuidle.h>
>  #include <asm/cpu.h>
>  #include <asm/system.h>
>  #include <asm/apic.h>
> @@ -587,6 +588,10 @@ void __cpuinit select_idle_routine(const struct
cpuinfo_x86 *c)
>  	if (pm_idle)
>  		return;
>  
> +	if (cpuidle_disabled()) {
> +		pm_idle = default_idle;
> +		return;
> +	}
>  	if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
>  		/*
>  		 * One CPU supports mwait => All CPUs supports mwait
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index b51629e..123fe9e 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -122,6 +122,7 @@ struct cpuidle_driver {
>  };
>  
>  #ifdef CONFIG_CPU_IDLE
> +extern int cpuidle_disabled(void);
>  extern void disable_cpuidle(void);
>  extern int cpuidle_idle_call(void);
>  
> @@ -137,6 +138,7 @@ extern int cpuidle_enable_device(struct cpuidle_device
*dev);
>  extern void cpuidle_disable_device(struct cpuidle_device *dev);
>  
>  #else
> +static inline int cpuidle_disabled(void) { return 1; }
>  static inline void disable_cpuidle(void) { }
>  static inline int cpuidle_idle_call(void) { return -ENODEV; }
>  
> -- 
> 1.7.7.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Deepthi Dharwar

2011-Nov-11 11:41 UTC

head link

[Xen-devel] Re: [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

On Wednesday 09 November 2011 08:14 PM, Konrad Rzeszutek Wilk
wrote:> . snip..
>>>
>>>      ..scribble on pm_idle and access default_idle,
>>>     have it simply disable_cpuidle() so acpi_idle will not load and
>>>     architecture default HLT will be used.
>>>
>>> .. but the default HLT does not get used. Instead we end up in the
> 
> Hey Deepthi,
> 
>>> +	if (cpuidle_disabled()) {
>>> +		pm_idle = default_idle;
>>> +		return;
>>> +	}
>>
>>
>> The above check is needed to initialise pm_idle to default_idle if
>> cpuidle is disabled but this requirement here is Zen specific. 
>> On other architectures, if cpuidle is disabled on boot then we 
>> rather would want mwait_idle or amd_e400_idle to be enabled than 
>> default_idle. Can we make this check Zen specific ? 
> 
> We do? Why? I would have thought that if you want to disable the cpuidle
> you would want the default_idle. 
Well I was with a view that if cpuidle is disabled, mwait and arm_e400
would take precedence over default_idle. But if is ok to have default_idle
instead,
then go ahead. 
> Perhaps there is another way do this is:
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index becd6d9..04b10a4 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -38,6 +38,7 @@ int cpuidle_disabled(void)
>  void disable_cpuidle(void)
>  {
>  	off = 1;
> +	pm_idle = default_idle;
>  }
> Brining pm_idle pointer back to cpuidle code is going a step back coz
we wanted to complete remove using pm_idle going forward. As having 
a pointer exported into various files is not a good thing. That''s the 
reason we started the clean up in the first place :) 
> which would do almost the same thing as this patch (well, except
> that if you run cpuidle.off=1 you still end up with amd_e400_idle
> instead of default_idle, so it is not the complete solution).
> 
>>
>> ...  if(ZEN_ARCH && cpuidle_disabled()) ... 
> 
> That would not work very well. For one thing you would need to call
> ''xen_domain()'', and pull in a lots of header files.
Second of, it
> looks quite ugly and kernel folks like pretty code.
> Yes, I agree to this. 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Len Brown

2011-Nov-13 06:00 UTC

head link

Re: [Xen-devel] [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

> And just found that there is a BZ for this as well:
> https://bugzilla.redhat.com/show_bug.cgi?id=739499
>
> And this patch fixes the Linux kernel to boot under Amazon EC2.
doesn''t matter.

Working around an Amazon EC2 bug in a newly compiled upstream kernel
isn''t going to help with any of the kernels that don''t include
that workaround.

Amazon EC2 should not advertise MWAIT support that it does not have,
and all kernels should run on it w/o any workaround.

Len Brown, Intel Open Source Technology Center

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Konrad Rzeszutek Wilk

2011-Nov-14 14:37 UTC

head link

Re: [Xen-devel] [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

On Sun, Nov 13, 2011 at 01:00:15AM -0500, Len Brown
wrote:> > And just found that there is a BZ for this as well:
> > https://bugzilla.redhat.com/show_bug.cgi?id=739499
> >
> > And this patch fixes the Linux kernel to boot under Amazon EC2.
> 
> doesn''t matter.
> 
> Working around an Amazon EC2 bug in a newly compiled upstream kernel
> isn''t going to help with any of the kernels that don''t
include that workaround.
Prior to 3.1 we did have a solution for this - we would use the default_idle
and not pick and/choose one based on the CPUID flags.

It was not really a choice made by "hypervisor wrongly advertises the
MWAIT flag", but rather that we want to use the safe_halt, which ends up
calling the yield hypercall.

That is the optimal solution irregardless of what version of hypervisor
is being used.
> 
> Amazon EC2 should not advertise MWAIT support that it does not have,
> and all kernels should run on it w/o any workaround.
> 
> Len Brown, Intel Open Source Technology Center
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Konrad Rzeszutek Wilk

2011-Nov-15 14:40 UTC

head link

Re: [Xen-devel] Re: [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

> Well I was with a view that if cpuidle is disabled, mwait and arm_e400
> would take precedence over default_idle. But if is ok to have default_idle
instead,
> then go ahead. 
> 
> > Perhaps there is another way do this is:
> > 
> > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> > index becd6d9..04b10a4 100644
> > --- a/drivers/cpuidle/cpuidle.c
> > +++ b/drivers/cpuidle/cpuidle.c
> > @@ -38,6 +38,7 @@ int cpuidle_disabled(void)
> >  void disable_cpuidle(void)
> >  {
> >  	off = 1;
> > +	pm_idle = default_idle;
> >  }
> > 
> Brining pm_idle pointer back to cpuidle code is going a step back coz
> we wanted to complete remove using pm_idle going forward. As having 
> a pointer exported into various files is not a good thing. That''s
the
> reason we started the clean up in the first place :) 
Perhaps then something along these lines, where we still set default_idle
but don''t fiddle with the pm_idle (and can still rip out the
EXPORT_SYMBOL_GPL(default_idle) in 2012):

(Hadn''t tested this yet).

diff --git a/arch/x86/include/asm/system.h b/arch/x86/include/asm/system.h
index c2ff2a1..2d2f01c 100644
--- a/arch/x86/include/asm/system.h
+++ b/arch/x86/include/asm/system.h
@@ -401,6 +401,7 @@ extern unsigned long arch_align_stack(unsigned long sp);
 extern void free_init_pages(char *what, unsigned long begin, unsigned long
end);
 
 void default_idle(void);
+bool set_pm_idle_to_default(void);
 
 void stop_this_cpu(void *dummy);
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index e7e3b01..bfb113f 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -403,6 +403,14 @@ void default_idle(void)
 EXPORT_SYMBOL(default_idle);
 #endif
 
+bool set_pm_idle_to_default()
+{
+	if (!pm_idle) {
+		pm_idle = default_idle;
+		return true;
+	}
+	return false;
+}
 void stop_this_cpu(void *dummy)
 {
 	local_irq_disable();
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 46d6d21..7506181 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -448,6 +448,6 @@ void __init xen_arch_setup(void)
 #endif
 	disable_cpuidle();
 	boot_option_idle_override = IDLE_HALT;
-
+	WARN_ON(!set_pm_idle_to_default());
 	fiddle_vdso();
 }

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Konrad Rzeszutek Wilk

2011-Nov-22 13:46 UTC

head link

Re: [Xen-devel] Re: [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

On Tue, Nov 15, 2011 at 09:40:04AM -0500, Konrad Rzeszutek Wilk
wrote:> > Well I was with a view that if cpuidle is disabled, mwait and arm_e400
> > would take precedence over default_idle. But if is ok to have
default_idle instead,
> > then go ahead. 
> > 
> > > Perhaps there is another way do this is:
> > > 
> > > diff --git a/drivers/cpuidle/cpuidle.c
b/drivers/cpuidle/cpuidle.c
> > > index becd6d9..04b10a4 100644
> > > --- a/drivers/cpuidle/cpuidle.c
> > > +++ b/drivers/cpuidle/cpuidle.c
> > > @@ -38,6 +38,7 @@ int cpuidle_disabled(void)
> > >  void disable_cpuidle(void)
> > >  {
> > >  	off = 1;
> > > +	pm_idle = default_idle;
> > >  }
> > > 
> > Brining pm_idle pointer back to cpuidle code is going a step back coz
> > we wanted to complete remove using pm_idle going forward. As having 
> > a pointer exported into various files is not a good thing.
That''s the
> > reason we started the clean up in the first place :) 
> 
> Perhaps then something along these lines, where we still set default_idle
> but don''t fiddle with the pm_idle (and can still rip out the
> EXPORT_SYMBOL_GPL(default_idle) in 2012):
> 
> (Hadn''t tested this yet).
Now have tested it.
> 
> diff --git a/arch/x86/include/asm/system.h b/arch/x86/include/asm/system.h
> index c2ff2a1..2d2f01c 100644
> --- a/arch/x86/include/asm/system.h
> +++ b/arch/x86/include/asm/system.h
> @@ -401,6 +401,7 @@ extern unsigned long arch_align_stack(unsigned long
sp);
>  extern void free_init_pages(char *what, unsigned long begin, unsigned long
end);
>  
>  void default_idle(void);
> +bool set_pm_idle_to_default(void);
>  
>  void stop_this_cpu(void *dummy);
>  
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index e7e3b01..bfb113f 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -403,6 +403,14 @@ void default_idle(void)
>  EXPORT_SYMBOL(default_idle);
>  #endif
>  
> +bool set_pm_idle_to_default()
> +{
> +	if (!pm_idle) {
> +		pm_idle = default_idle;
> +		return true;
> +	}
> +	return false;
> +}
>  void stop_this_cpu(void *dummy)
>  {
>  	local_irq_disable();
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 46d6d21..7506181 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -448,6 +448,6 @@ void __init xen_arch_setup(void)
>  #endif
>  	disable_cpuidle();
>  	boot_option_idle_override = IDLE_HALT;
> -
> +	WARN_ON(!set_pm_idle_to_default());
>  	fiddle_vdso();
>  }
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

Deepthi Dharwar

2011-Nov-23 11:06 UTC

head link

Re: [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

Hi Konrad,

On 11/22/2011 07:16 PM, Konrad Rzeszutek Wilk wrote:
> On Tue, Nov 15, 2011 at 09:40:04AM -0500, Konrad Rzeszutek Wilk wrote:
>>> Well I was with a view that if cpuidle is disabled, mwait and
arm_e400
>>> would take precedence over default_idle. But if is ok to have
default_idle instead,
>>> then go ahead. 
>>>
>>>> Perhaps there is another way do this is:
>>>>
>>>> diff --git a/drivers/cpuidle/cpuidle.c
b/drivers/cpuidle/cpuidle.c
>>>> index becd6d9..04b10a4 100644
>>>> --- a/drivers/cpuidle/cpuidle.c
>>>> +++ b/drivers/cpuidle/cpuidle.c
>>>> @@ -38,6 +38,7 @@ int cpuidle_disabled(void)
>>>>  void disable_cpuidle(void)
>>>>  {
>>>>  	off = 1;
>>>> +	pm_idle = default_idle;
>>>>  }
>>>>
>>> Brining pm_idle pointer back to cpuidle code is going a step back
coz
>>> we wanted to complete remove using pm_idle going forward. As having
>>> a pointer exported into various files is not a good thing.
That''s the
>>> reason we started the clean up in the first place :) 
>>
>> Perhaps then something along these lines, where we still set
default_idle
>> but don''t fiddle with the pm_idle (and can still rip out the
>> EXPORT_SYMBOL_GPL(default_idle) in 2012):
>>
>> (Hadn''t tested this yet).
>
> Now have tested it.
>


As long as you dont bring back exporting pm_idle and using it in cpuidle
code
it should be ok.  So one should not use pm_idle in disable_cpuidle function.
>>
>> diff --git a/arch/x86/include/asm/system.h
b/arch/x86/include/asm/system.h
>> index c2ff2a1..2d2f01c 100644
>> --- a/arch/x86/include/asm/system.h
>> +++ b/arch/x86/include/asm/system.h
>> @@ -401,6 +401,7 @@ extern unsigned long arch_align_stack(unsigned long
sp);
>>  extern void free_init_pages(char *what, unsigned long begin, unsigned
long end);
>>  
>>  void default_idle(void);
>> +bool set_pm_idle_to_default(void);
>>  
>>  void stop_this_cpu(void *dummy);
>>  
>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>> index e7e3b01..bfb113f 100644
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>> @@ -403,6 +403,14 @@ void default_idle(void)
>>  EXPORT_SYMBOL(default_idle);
>>  #endif
>>  
>> +bool set_pm_idle_to_default()
>> +{
>> +	if (!pm_idle) {
>> +		pm_idle = default_idle;
>> +		return true;
>> +	}
>> +	return false;
>> +}
>>  void stop_this_cpu(void *dummy)
>>  {
>>  	local_irq_disable();
>> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
>> index 46d6d21..7506181 100644
>> --- a/arch/x86/xen/setup.c
>> +++ b/arch/x86/xen/setup.c
>> @@ -448,6 +448,6 @@ void __init xen_arch_setup(void)
>>  #endif
>>  	disable_cpuidle();
>>  	boot_option_idle_override = IDLE_HALT;
>> -
>> +	WARN_ON(!set_pm_idle_to_default());
>>  	fiddle_vdso();
>>  }
>>


This looks good, where we see setting of default_idle only  in
Xen setup code and not the in the generic cpuidle code path.
This would allow other architectures to use mwait or arm_e400
to be enabled when cpuidle is disabled.
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel
>


Cheers,
Deepthi



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Konrad Rzeszutek Wilk

2011-Dec-02 23:31 UTC

head link

Re: [PATCH 2/3] x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations.

> The fix, which this patch proposes, is to wrap the pte_pgprot in the CPA
> code with newly introduced pte_attrs which can go through the pvops
interface
> to get the "emulated" value instead of the raw. Naturally if
CONFIG_PARAVIRT is
> not set, it would end calling native_pte_val.
> 
> The other way to fix this is by wrapping pte_flags and go through the pvops
> interface and it really is the Right Thing to do.  The problem is, that
past
> experience with mprotect stuff demonstrates that it be really expensive in
inner
> loops, and pte_flags() is used in some very perf-critical areas.
I did not get to verify the mprotect stuff as I need to chase down the details
of it,
but I did run some benchmarks using kernbench on three different boxes:

 AMD A8-3850 (8GB) - tst005
 Intel i3-2100 (8GB) - tst007
 Nehelem EX (32logical cpus) (32GB) - tst010

I''ve put all the kernebench results in
https://www.dumpdata.com/results/baseline_pte_flags_pte_attrs/
(and the chart for the AMD is attached).

The boxes have a fresh install of F16, with a 3.2-rc3 variant kernel using the
.config that F16 came with. I just hit Enter when oldconfig asked me to choose.

The baseline is virgin v3.2-rc3. The pte_attrs is the patch that this email is
replaying too (on top of v3.2-rc3). The pte_flags are two patches that wrap
pte_flags
in paravirt and use alternative_asm to patch the code (on top of v3.2-rc3).

The patches are in the URL mentioned or in my git branch as
devel/pte_attrs.v1 (
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git).
I am also attaching them in this email.

The summary is that I could only get the numbers to show some difference when
the maximum load was run - and _only_ on the AMD machine. The small SandyBridge
and the big SandyBridge had no trouble with. The AMD machine the difference was
13% worst if pte_flags (so alternative_asm) was used instead of pte_attrs.

The way I did these tests is to bootup with ''init=/bin/bash'',
remount / as rw, activate
swap disk and run kernbench on the v3.2-rc3 linux tree. Then unplug the machine
for a tea
break and then repeat the cycle with a different kernel.

Xen devel - Nov 2011 - [PATCH] x86/acpi fixes for 3.2 (v1) impacting distributions.

[Xen-devel] [PATCH] x86/acpi fixes for 3.2 (v1) impacting distributions.

[Xen-devel] [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

[Xen-devel] [PATCH 2/3] x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations.

[Xen-devel] [PATCH 3/3] x86/paravirt: Use pte_val instead of pte_flags on CPA pageattr_test

[Xen-devel] Re: [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

[Xen-devel] Re: [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

[Xen-devel] Re: [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

Re: [Xen-devel] [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

[Xen-devel] Re: [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

Re: [Xen-devel] [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

Re: [Xen-devel] [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

Re: [Xen-devel] Re: [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

Re: [Xen-devel] Re: [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

Re: [PATCH 1/3] cpuidle: If disable_cpuidle() is called, set pm_idle to default_idle.

Re: [PATCH 2/3] x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations.