Stefan Bader
2011-Oct-26 10:24 UTC
[Xen-devel] Regression in 3.1 causes Xen to use wrong idle routine
The following commit changes calls to pm_idle into first trying cpuidle_call_idle() and if that returns non-zero to fall back to call pm_idle(). commit a0bfa1373859e9d11dc92561a8667588803e42d8 Author: Len Brown <len.brown@intel.com> Date: Fri Apr 1 19:34:59 2011 -0400 cpuidle: stop depending on pm_idle However cpuidle_call_idle() will return -ENODEV if it is supposed to be disabled by cpuidle.off. Which then causes pm_idle() to be called. This has some bad interaction with the following change that tries to make use of disabling cpuidle in Xen to fall back to hlt. commit d91ee5863b71e8c90eaf6035bff3078a85e2e7b5 Author: Len Brown <len.brown@intel.com> Date: Fri Apr 1 18:28:35 2011 -0400 cpuidle: replace xen access to x86 pm_idle and default_idle The problem I see is that select_idle_routine() is called from arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle anymore, it can cause mwait_idle or amd_e400_idle functions to be selected. In testing it seem amd_e400_idle in PVM domU at least does not immediately cause problems, but mwait_idle just causes crashes. From the reports I have this may be related to older hypervisors (3.1 and older) not clearing the mwait capability. But overall there seems something wrong in the interaction. I am not really sure whether the logic of calling pm_idle() on all errors from cpuidle_call_idle() is already flawed or the assumption in the Xen patch about being able to prevent the wrong idle function by turning cpuidle off is incorrect. One quick fix could be to add some Xen case into select_idle_routine() which picks default_idle... -Stefan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Oct-26 13:30 UTC
[Xen-devel] Re: Regression in 3.1 causes Xen to use wrong idle routine
On Wed, Oct 26, 2011 at 12:24:17PM +0200, Stefan Bader wrote:> The following commit changes calls to pm_idle into first trying > cpuidle_call_idle() and if that returns non-zero to fall back to > call pm_idle(). > > commit a0bfa1373859e9d11dc92561a8667588803e42d8 > Author: Len Brown <len.brown@intel.com> > Date: Fri Apr 1 19:34:59 2011 -0400 > > cpuidle: stop depending on pm_idle > > However cpuidle_call_idle() will return -ENODEV if it is supposed to be disabled > by cpuidle.off. Which then causes pm_idle() to be called. > > This has some bad interaction with the following change that tries to > make use of disabling cpuidle in Xen to fall back to hlt. > > commit d91ee5863b71e8c90eaf6035bff3078a85e2e7b5 > Author: Len Brown <len.brown@intel.com> > Date: Fri Apr 1 18:28:35 2011 -0400 > > cpuidle: replace xen access to x86 pm_idle and default_idle > > The problem I see is that select_idle_routine() is called from > arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle > anymore, it can cause mwait_idle or amd_e400_idle functions to be selected.Right, b/c that is what d91ee5863b71e8c90eaf6035bff3078a85e2e7b5 was suppose to do - " xen scribble on pm_idle and access default_idle, have it simply disable_cpuidle() so acpi_idle will not load and architecture default HLT will be used." But it seems that select_idle_routine() was not thought off.> In testing it seem amd_e400_idle in PVM domU at least does not immediately cause > problems, but mwait_idle just causes crashes. From the reports I have > this may be related to older hypervisors (3.1 and older) not clearing the mwait > capability. But overall there seems something wrong in the interaction. > > I am not really sure whether the logic of calling pm_idle() on all errors from > cpuidle_call_idle() is already flawed or the assumption in the Xen patch about > being able to prevent the wrong idle function by turning cpuidle off is incorrect. > One quick fix could be to add some Xen case into select_idle_routine() which > picks default_idle...What about using the cpuidle_disabled() functionality and adhere to that? As so: diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index e7e3b01..1f7f8c8 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -14,6 +14,7 @@ #include <linux/utsname.h> #include <trace/events/power.h> #include <linux/hw_breakpoint.h> +#include <linux/cpuidle.h> #include <asm/cpu.h> #include <asm/system.h> #include <asm/apic.h> @@ -587,6 +588,10 @@ void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c) if (pm_idle) return; + if (cpuidle_disabled()) { + pm_idle = default_idle; + return; + } if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) { /* * One CPU supports mwait => All CPUs supports mwait diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h index b51629e..123fe9e 100644 --- a/include/linux/cpuidle.h +++ b/include/linux/cpuidle.h @@ -122,6 +122,7 @@ struct cpuidle_driver { }; #ifdef CONFIG_CPU_IDLE +extern int cpuidle_disabled(void); extern void disable_cpuidle(void); extern int cpuidle_idle_call(void); @@ -137,6 +138,7 @@ extern int cpuidle_enable_device(struct cpuidle_device *dev); extern void cpuidle_disable_device(struct cpuidle_device *dev); #else +static inline int cpuidle_disabled(void) { return 1; } static inline void disable_cpuidle(void) { } static inline int cpuidle_idle_call(void) { return -ENODEV; } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefan Bader
2011-Oct-26 13:36 UTC
[Xen-devel] Re: Regression in 3.1 causes Xen to use wrong idle routine
On 26.10.2011 15:30, Konrad Rzeszutek Wilk wrote:> On Wed, Oct 26, 2011 at 12:24:17PM +0200, Stefan Bader wrote: >> The following commit changes calls to pm_idle into first trying >> cpuidle_call_idle() and if that returns non-zero to fall back to >> call pm_idle(). >> >> commit a0bfa1373859e9d11dc92561a8667588803e42d8 >> Author: Len Brown <len.brown@intel.com> >> Date: Fri Apr 1 19:34:59 2011 -0400 >> >> cpuidle: stop depending on pm_idle >> >> However cpuidle_call_idle() will return -ENODEV if it is supposed to be disabled >> by cpuidle.off. Which then causes pm_idle() to be called. >> >> This has some bad interaction with the following change that tries to >> make use of disabling cpuidle in Xen to fall back to hlt. >> >> commit d91ee5863b71e8c90eaf6035bff3078a85e2e7b5 >> Author: Len Brown <len.brown@intel.com> >> Date: Fri Apr 1 18:28:35 2011 -0400 >> >> cpuidle: replace xen access to x86 pm_idle and default_idle >> >> The problem I see is that select_idle_routine() is called from >> arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle >> anymore, it can cause mwait_idle or amd_e400_idle functions to be selected. > > Right, b/c that is what d91ee5863b71e8c90eaf6035bff3078a85e2e7b5 was suppose > to do - " xen scribble on pm_idle and access default_idle, > have it simply disable_cpuidle() so acpi_idle will not load and > architecture default HLT will be used." > > But it seems that select_idle_routine() was not thought off. > >> In testing it seem amd_e400_idle in PVM domU at least does not immediately cause >> problems, but mwait_idle just causes crashes. From the reports I have >> this may be related to older hypervisors (3.1 and older) not clearing the mwait >> capability. But overall there seems something wrong in the interaction. >> >> I am not really sure whether the logic of calling pm_idle() on all errors from >> cpuidle_call_idle() is already flawed or the assumption in the Xen patch about >> being able to prevent the wrong idle function by turning cpuidle off is incorrect. >> One quick fix could be to add some Xen case into select_idle_routine() which >> picks default_idle... > > What about using the cpuidle_disabled() functionality and adhere to that? > As so: > > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c > index e7e3b01..1f7f8c8 100644 > --- a/arch/x86/kernel/process.c > +++ b/arch/x86/kernel/process.c > @@ -14,6 +14,7 @@ > #include <linux/utsname.h> > #include <trace/events/power.h> > #include <linux/hw_breakpoint.h> > +#include <linux/cpuidle.h> > #include <asm/cpu.h> > #include <asm/system.h> > #include <asm/apic.h> > @@ -587,6 +588,10 @@ void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c) > if (pm_idle) > return; > > + if (cpuidle_disabled()) { > + pm_idle = default_idle; > + return; > + } > if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) { > /* > * One CPU supports mwait => All CPUs supports mwait > diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h > index b51629e..123fe9e 100644 > --- a/include/linux/cpuidle.h > +++ b/include/linux/cpuidle.h > @@ -122,6 +122,7 @@ struct cpuidle_driver { > }; > > #ifdef CONFIG_CPU_IDLE > +extern int cpuidle_disabled(void); > extern void disable_cpuidle(void); > extern int cpuidle_idle_call(void); > > @@ -137,6 +138,7 @@ extern int cpuidle_enable_device(struct cpuidle_device *dev); > extern void cpuidle_disable_device(struct cpuidle_device *dev); > > #else > +static inline int cpuidle_disabled(void) { return 1; } > static inline void disable_cpuidle(void) { } > static inline int cpuidle_idle_call(void) { return -ENODEV; } > > -- > To unsubscribe from this list: send the line "unsubscribe linux-acpi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html>From reading over it, this should work. Though I would be interested to hearfrom the linux-acpi folks. Also to double check that calling pm_idle when cpuidle.off was specified really is what is intended. -Stefan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Oct-26 13:48 UTC
[Xen-devel] Re: Regression in 3.1 causes Xen to use wrong idle routine
> > What about using the cpuidle_disabled() functionality and adhere to that? > > As so: > >.. snip..> > >From reading over it, this should work. Though I would be interested to hear > from the linux-acpi folks. Also to double check that calling pm_idle when > cpuidle.off was specified really is what is intended.Oh yeah, definitly need the input from linux-acpi folks. And also to be actually tested :-) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefan Bader
2011-Oct-26 13:57 UTC
[Xen-devel] Re: Regression in 3.1 causes Xen to use wrong idle routine
On 26.10.2011 15:48, Konrad Rzeszutek Wilk wrote:>>> What about using the cpuidle_disabled() functionality and adhere to that? >>> As so: >>> > .. snip.. >> >> >From reading over it, this should work. Though I would be interested to hear >> from the linux-acpi folks. Also to double check that calling pm_idle when >> cpuidle.off was specified really is what is intended. > > Oh yeah, definitly need the input from linux-acpi folks. And also to be actually > tested :-)I can volunteer to do the testing. But I am lazy enough to hold back a bit as someone may tell us this is completely the wrong way to fix it. :) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Nov-09 17:58 UTC
Re: [Xen-devel] Re: Regression in 3.1 causes Xen to use wrong idle routine
On Wed, Oct 26, 2011 at 03:57:15PM +0200, Stefan Bader wrote:> On 26.10.2011 15:48, Konrad Rzeszutek Wilk wrote: > >>> What about using the cpuidle_disabled() functionality and adhere to that? > >>> As so: > >>> > > .. snip.. > >> > >> >From reading over it, this should work. Though I would be interested to hear > >> from the linux-acpi folks. Also to double check that calling pm_idle when > >> cpuidle.off was specified really is what is intended. > > > > Oh yeah, definitly need the input from linux-acpi folks. And also to be actually > > tested :-) > > I can volunteer to do the testing. But I am lazy enough to hold back a bit as > someone may tell us this is completely the wrong way to fix it. :)So the other option is to use ''idle=halt'' on the Linux command line. That should provide the workaround for the folks reporting this (is there a BZ for it?). At least until a good solution is hammered out. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefan Bader
2011-Nov-10 08:33 UTC
Re: [Xen-devel] Re: Regression in 3.1 causes Xen to use wrong idle routine
On 09.11.2011 18:58, Konrad Rzeszutek Wilk wrote:> On Wed, Oct 26, 2011 at 03:57:15PM +0200, Stefan Bader wrote: >> On 26.10.2011 15:48, Konrad Rzeszutek Wilk wrote: >>>>> What about using the cpuidle_disabled() functionality and adhere to that? >>>>> As so: >>>>> >>> .. snip.. >>>> >>>> >From reading over it, this should work. Though I would be interested to hear >>>> from the linux-acpi folks. Also to double check that calling pm_idle when >>>> cpuidle.off was specified really is what is intended. >>> >>> Oh yeah, definitly need the input from linux-acpi folks. And also to be actually >>> tested :-) >> >> I can volunteer to do the testing. But I am lazy enough to hold back a bit as >> someone may tell us this is completely the wrong way to fix it. :) > > So the other option is to use ''idle=halt'' on the Linux command line. That should > provide the workaround for the folks reporting this (is there a BZ for it?). >Believe upstream bz is still down. Got the issue reported here, though: http://bugs.launchpad.net/bugs/881076 Hm, as a workaround probably. I guess it could be added when the test images are done. Unfortunately, when running your image in the cloud it is kind of hard to recover. Even more as you could have tested on an AMD based host. Wonder whether it would be an option to automatically mask the mwait capability off when running in paravirt mode... -Stefan> At least until a good solution is hammered out. > > -- > To unsubscribe from this list: send the line "unsubscribe linux-acpi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Nov-10 14:55 UTC
Re: [Xen-devel] Re: Regression in 3.1 causes Xen to use wrong idle routine
On Thu, Nov 10, 2011 at 09:33:06AM +0100, Stefan Bader wrote:> On 09.11.2011 18:58, Konrad Rzeszutek Wilk wrote: > > On Wed, Oct 26, 2011 at 03:57:15PM +0200, Stefan Bader wrote: > >> On 26.10.2011 15:48, Konrad Rzeszutek Wilk wrote: > >>>>> What about using the cpuidle_disabled() functionality and adhere to that? > >>>>> As so: > >>>>> > >>> .. snip.. > >>>> > >>>> >From reading over it, this should work. Though I would be interested to hear > >>>> from the linux-acpi folks. Also to double check that calling pm_idle when > >>>> cpuidle.off was specified really is what is intended. > >>> > >>> Oh yeah, definitly need the input from linux-acpi folks. And also to be actually > >>> tested :-) > >> > >> I can volunteer to do the testing. But I am lazy enough to hold back a bit as > >> someone may tell us this is completely the wrong way to fix it. :) > > > > So the other option is to use ''idle=halt'' on the Linux command line. That should > > provide the workaround for the folks reporting this (is there a BZ for it?). > > > Believe upstream bz is still down. Got the issue reported here, though: > > http://bugs.launchpad.net/bugs/881076Ok, added that to the patch description.> > Hm, as a workaround probably. I guess it could be added when the test images are > done. Unfortunately, when running your image in the cloud it is kind of hard to > recover. Even more as you could have tested on an AMD based host.Sadly my AMD boxes did not explode when I tested Len''s patches :-(.> > Wonder whether it would be an option to automatically mask the mwait capability > off when running in paravirt mode...We could, but why not strive to do it correctly first. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefan Bader
2011-Nov-10 15:35 UTC
Re: [Xen-devel] Re: Regression in 3.1 causes Xen to use wrong idle routine
On 10.11.2011 15:55, Konrad Rzeszutek Wilk wrote:> On Thu, Nov 10, 2011 at 09:33:06AM +0100, Stefan Bader wrote: >> On 09.11.2011 18:58, Konrad Rzeszutek Wilk wrote: >>> On Wed, Oct 26, 2011 at 03:57:15PM +0200, Stefan Bader wrote: >>>> On 26.10.2011 15:48, Konrad Rzeszutek Wilk wrote: >>>>>>> What about using the cpuidle_disabled() functionality and adhere to that? >>>>>>> As so: >>>>>>> >>>>> .. snip.. >>>>>> >>>>>> >From reading over it, this should work. Though I would be interested to hear >>>>>> from the linux-acpi folks. Also to double check that calling pm_idle when >>>>>> cpuidle.off was specified really is what is intended. >>>>> >>>>> Oh yeah, definitly need the input from linux-acpi folks. And also to be actually >>>>> tested :-) >>>> >>>> I can volunteer to do the testing. But I am lazy enough to hold back a bit as >>>> someone may tell us this is completely the wrong way to fix it. :) >>> >>> So the other option is to use ''idle=halt'' on the Linux command line. That should >>> provide the workaround for the folks reporting this (is there a BZ for it?). >>> >> Believe upstream bz is still down. Got the issue reported here, though: >> >> http://bugs.launchpad.net/bugs/881076 > > Ok, added that to the patch description. >> >> Hm, as a workaround probably. I guess it could be added when the test images are >> done. Unfortunately, when running your image in the cloud it is kind of hard to >> recover. Even more as you could have tested on an AMD based host. >> Sadly my AMD boxes did not explode when I tested Len''s patches :-(.No, to my knowledge AMD CPUs do not have mwait. So the only alternate idle to happen would be the e400 aware one (which is basically default_idle with conditionally programming a broadcast timer interrupt to get out of c1e). I saw this happen on my box, too. And there was no visible badness caused by it.>> >> Wonder whether it would be an option to automatically mask the mwait capability >> off when running in paravirt mode... >> We could, but why not strive to do it correctly first.Just seemed (from looking at those replies about wanting mwait idle even when cpuidle is disabled) that there is a slight controversy about what correctly is. I personally would also think that a state of cpuidle disabled means no special idle function. But that does not have to be the correct interpretation. So I was reasoning that if the mwait function gets selected because the caps say its available, but it does not work in paravirt, then maybe claiming its availability is wrong as well. Of course there is also the initial question about calling a generic function and if that fails call the idle function hook directly. Which, because having cpuidle disabled is returned as an error, ended up calling the hook directly again. I really have trouble understanding the reasoning there... _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Len Brown
2011-Nov-13 03:46 UTC
[Xen-devel] Re: Regression in 3.1 causes Xen to use wrong idle routine
On Wed, Oct 26, 2011 at 6:24 AM, Stefan Bader <stefan.bader@canonical.com> wrote:> The following commit changes calls to pm_idle into first trying > cpuidle_call_idle() and if that returns non-zero to fall back to > call pm_idle(). > > commit a0bfa1373859e9d11dc92561a8667588803e42d8 > Author: Len Brown <len.brown@intel.com> > Date: Fri Apr 1 19:34:59 2011 -0400 > > cpuidle: stop depending on pm_idle > > However cpuidle_call_idle() will return -ENODEV if it is supposed to be disabled > by cpuidle.off. Which then causes pm_idle() to be called. > > This has some bad interaction with the following change that tries to > make use of disabling cpuidle in Xen to fall back to hlt. > > commit d91ee5863b71e8c90eaf6035bff3078a85e2e7b5 > Author: Len Brown <len.brown@intel.com> > Date: Fri Apr 1 18:28:35 2011 -0400 > > cpuidle: replace xen access to x86 pm_idle and default_idle > > The problem I see is that select_idle_routine() is called from > arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle > anymore, it can cause mwait_idle or amd_e400_idle functions to be selected. > In testing it seem amd_e400_idle in PVM domU at least does not immediately cause > problems, but mwait_idle just causes crashes. From the reports I have > this may be related to older hypervisors (3.1 and older) not clearing the mwait > capability. But overall there seems something wrong in the interaction.Why is Xen advertising X86_FEATURE_MWAIT and then crashing when the dom0 (or other guests) use what it advertises? What versions of Xen have this bug?> I am not really sure whether the logic of calling pm_idle() on all errors from > cpuidle_call_idle() is already flawed or the assumption in the Xen patch about > being able to prevent the wrong idle function by turning cpuidle off is incorrect.The patches above appear to be operating as intended. What wasn''t expected, was that some version of Xen is deployed that advertises the MWAIT feature, but crashes when it is used.> One quick fix could be to add some Xen case into select_idle_routine() which > picks default_idle...No. Working around this Xen bug for a newly compiled Dom0 is insufficient. All guests that also look for MWAIT support w/o asking ACPI (ie. all versions of Linux that use intel_idle, such as the last few Fedora''s, RHEL, SLES etc.) will trip over the same Xen bug, even if Dom0 doesn''t. Xen must not advertises MWAIT support if it doesn''t have MWAIT support. thanks, Len Brown, Intel Open Source Technology Center _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2011-Nov-13 16:59 UTC
Re: [Xen-devel] Re: Regression in 3.1 causes Xen to use wrong idle routine
On 13/11/2011 03:46, "Len Brown" <lenb@kernel.org> wrote:>> The problem I see is that select_idle_routine() is called from >> arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle >> anymore, it can cause mwait_idle or amd_e400_idle functions to be selected. >> In testing it seem amd_e400_idle in PVM domU at least does not immediately >> cause >> problems, but mwait_idle just causes crashes. From the reports I have >> this may be related to older hypervisors (3.1 and older) not clearing the >> mwait >> capability. But overall there seems something wrong in the interaction. > > Why is Xen advertising X86_FEATURE_MWAIT and then crashing > when the dom0 (or other guests) use what it advertises? > > What versions of Xen have this bug?Xen doesn''t advertise MWAIT. Possibly Xen-pv_ops is lying to the rest of the kernel via the cpuid pv_ops hook. This would probably be because Xen is relying on the OSPM in dom0 kernel to parse out Cx/Px info which Xen itself *can* use. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Nov-14 14:31 UTC
[Xen-devel] Re: Regression in 3.1 causes Xen to use wrong idle routine
Hey Len,> > The problem I see is that select_idle_routine() is called from > > arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle > > anymore, it can cause mwait_idle or amd_e400_idle functions to be selected. > > In testing it seem amd_e400_idle in PVM domU at least does not immediately cause > > problems, but mwait_idle just causes crashes. From the reports I have > > this may be related to older hypervisors (3.1 and older) not clearing the mwait > > capability. But overall there seems something wrong in the interaction. > > Why is Xen advertising X86_FEATURE_MWAIT and then crashing > when the dom0 (or other guests) use what it advertises?The only case where I''ve seen this is with Amazon EC2. The other newer hypervisors (4.1.1 and such) do not trigger this.> > What versions of Xen have this bug?Whatever Amazon is using. I think they are RHEL5 based hypervisor.> > > I am not really sure whether the logic of calling pm_idle() on all errors from > > cpuidle_call_idle() is already flawed or the assumption in the Xen patch about > > being able to prevent the wrong idle function by turning cpuidle off is incorrect. > > The patches above appear to be operating as intended. > What wasn''t expected, was that some version of Xen is deployed that > advertises the MWAIT feature, but crashes when it is used.How does that work with AMD? On those machines it ends up calling amd_e400_idle instead of the default_idle. Granted it does not "BUG" out but it does lead to extra trap-n-emulate (the MSR operation) in to the hypervisor which is not good.> > > One quick fix could be to add some Xen case into select_idle_routine() which > > picks default_idle... > > No. > > Working around this Xen bug for a newly compiled Dom0 is insufficient. > > All guests that also look for MWAIT support w/o asking ACPI > (ie. all versions of Linux that use intel_idle, such as the last few > Fedora''s, RHEL, SLES etc.) > will trip over the same Xen bug, even if Dom0 doesn''t. > > Xen must not advertises MWAIT support if it doesn''t have MWAIT support.How does work out when we figure MWAIT support from the CPUID? Or are you saying that it is correct - if the CPU advertises it, then yes advertise it to the Linux kernel? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Nov-14 18:19 UTC
Re: [Xen-devel] Re: Regression in 3.1 causes Xen to use wrong idle routine
On Sun, Nov 13, 2011 at 04:59:10PM +0000, Keir Fraser wrote:> On 13/11/2011 03:46, "Len Brown" <lenb@kernel.org> wrote: > > >> The problem I see is that select_idle_routine() is called from > >> arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle > >> anymore, it can cause mwait_idle or amd_e400_idle functions to be selected. > >> In testing it seem amd_e400_idle in PVM domU at least does not immediately > >> cause > >> problems, but mwait_idle just causes crashes. From the reports I have > >> this may be related to older hypervisors (3.1 and older) not clearing the > >> mwait > >> capability. But overall there seems something wrong in the interaction. > > > > Why is Xen advertising X86_FEATURE_MWAIT and then crashing > > when the dom0 (or other guests) use what it advertises? > > > > What versions of Xen have this bug? > > Xen doesn''t advertise MWAIT. Possibly Xen-pv_ops is lying to the rest of the > kernel via the cpuid pv_ops hook. This would probably be because Xen isI can''t seem to find anything in there advertising the MWAIT feature.> relying on the OSPM in dom0 kernel to parse out Cx/Px info which Xen itself > *can* use.The Cx/Px patches that would parse the Cx/Px and then percolate those up to the hypervisor are in mainline. So that is not it. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Nov-14 18:26 UTC
Re: [Xen-devel] Re: Regression in 3.1 causes Xen to use wrong idle routine
On Mon, Nov 14, 2011 at 09:31:24AM -0500, Konrad Rzeszutek Wilk wrote:> Hey Len, > > > > The problem I see is that select_idle_routine() is called from > > > arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle > > > anymore, it can cause mwait_idle or amd_e400_idle functions to be selected. > > > In testing it seem amd_e400_idle in PVM domU at least does not immediately cause > > > problems, but mwait_idle just causes crashes. From the reports I have > > > this may be related to older hypervisors (3.1 and older) not clearing the mwait > > > capability. But overall there seems something wrong in the interaction. > > > > Why is Xen advertising X86_FEATURE_MWAIT and then crashing > > when the dom0 (or other guests) use what it advertises? > > The only case where I''ve seen this is with Amazon EC2. The other > newer hypervisors (4.1.1 and such) do not trigger this. > > > > > What versions of Xen have this bug? > > Whatever Amazon is using. I think they are RHEL5 based hypervisor.Vincent, Would you have ideas of what might be happening? It looks as if some Amazon instances are advertisting the MWAIT flag but not really supporting it. Any ideas of what might be going on?> > > > > > I am not really sure whether the logic of calling pm_idle() on all errors from > > > cpuidle_call_idle() is already flawed or the assumption in the Xen patch about > > > being able to prevent the wrong idle function by turning cpuidle off is incorrect. > > > > The patches above appear to be operating as intended. > > What wasn''t expected, was that some version of Xen is deployed that > > advertises the MWAIT feature, but crashes when it is used. > > How does that work with AMD? On those machines it ends up calling > amd_e400_idle instead of the default_idle. Granted it does not "BUG" out > but it does lead to extra trap-n-emulate (the MSR operation) in to the hypervisor > which is not good... snip..> > Working around this Xen bug for a newly compiled Dom0 is insufficient.It looks to affect PV guests as well.> > > > All guests that also look for MWAIT support w/o asking ACPI > > (ie. all versions of Linux that use intel_idle, such as the last few > > Fedora''s, RHEL, SLES etc.) > > will trip over the same Xen bug, even if Dom0 doesn''t.Don''t think I answered this one: The intel_idle has not been used in those cases b/c the default_idle had been used instead.> > > > Xen must not advertises MWAIT support if it doesn''t have MWAIT support. > > How does work out when we figure MWAIT support from the CPUID? > Or are you saying that it is correct - if the CPU advertises it, then yes > advertise it to the Linux kernel? > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Nov-14 20:11 UTC
Re: [Xen-devel] Re: Regression in 3.1 causes Xen to use wrong idle routine
On Mon, Nov 14, 2011 at 01:19:01PM -0500, Konrad Rzeszutek Wilk wrote:> On Sun, Nov 13, 2011 at 04:59:10PM +0000, Keir Fraser wrote: > > On 13/11/2011 03:46, "Len Brown" <lenb@kernel.org> wrote: > > > > >> The problem I see is that select_idle_routine() is called from > > >> arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle > > >> anymore, it can cause mwait_idle or amd_e400_idle functions to be selected. > > >> In testing it seem amd_e400_idle in PVM domU at least does not immediately > > >> cause > > >> problems, but mwait_idle just causes crashes. From the reports I have > > >> this may be related to older hypervisors (3.1 and older) not clearing the > > >> mwait > > >> capability. But overall there seems something wrong in the interaction. > > > > > > Why is Xen advertising X86_FEATURE_MWAIT and then crashing > > > when the dom0 (or other guests) use what it advertises? > > > > > > What versions of Xen have this bug? > > > > Xen doesn''t advertise MWAIT. Possibly Xen-pv_ops is lying to the rest of the > > kernel via the cpuid pv_ops hook. This would probably be because Xen is > > I can''t seem to find anything in there advertising the MWAIT feature. > > > relying on the OSPM in dom0 kernel to parse out Cx/Px info which Xen itself > > *can* use. > > The Cx/Px patches that would parse the Cx/Px and then percolate those up > to the hypervisor are in mainline. So that is not it.<sigh> I meant to say "are _not_ in mainline". Sorry for the confusion. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel