There''re basically two methods to enter a given C-state: legacy (hlt + I/O read), and native(using mwait). MWAIT is always preferred when both underlying CPU and OS support, which is a more efficient way to conduct C-state transition. Xen PM relies on Dom0 to parse ACPI Cx/Px information, which involves one step to notify BIOS about a set of capabilities supported by OSPM. One capability is about mwait support, which if true ACPI Cx entry contains entry parameters for mwait, or else I/O port information is provided. Xen PM later decides entry method (i/o or mwait) based on parsed ACPI information from dom0. However Xen doesn''t expose MWAIT capability to dom0 due to changeset 17573: # HG changeset patch # User Keir Fraser <keir.fraser@citrix.com> # Date 1210065934 -3600 # Node ID 777f294e3be81a4d0825e3a9b633a8d81c37f613 # Parent d5589865bfce91bf65c34bd56e466bf26ca4176f x86, Intel: Make only EST feature visible to dom0 to enable Cx-state logic. There should be no need to make MWAIT visible. Signed-off-by: Keir Fraser <keir.fraser@citrix.com> diff -r d5589865bfce -r 777f294e3be8 xen/arch/x86/traps.c --- a/xen/arch/x86/traps.c Tue May 06 10:19:10 2008 +0100 +++ b/xen/arch/x86/traps.c Tue May 06 10:25:34 2008 +0100 @@ -713,8 +713,7 @@ static int emulate_forced_invalid_op(str __clear_bit(X86_FEATURE_PBE, &d); __clear_bit(X86_FEATURE_DTES64 % 32, &c); - if ( !IS_PRIV(current->domain) ) - __clear_bit(X86_FEATURE_MWAIT % 32, &c); + __clear_bit(X86_FEATURE_MWAIT % 32, &c); __clear_bit(X86_FEATURE_DSCPL % 32, &c); __clear_bit(X86_FEATURE_VMXE % 32, &c); __clear_bit(X86_FEATURE_SMXE % 32, &c); [...] This then brings a problem to Dom0 which thinks underlying CPU doesn''t report mwait, and thus notify BIOS to use old I/O based method. Later a trick is integrated in Jeremy''s pvops tree: commit 4151815a222723002d727ef0a89f0756d4e7d3d2 Author: Wei Gang <gang.wei@intel.com> Date: Mon Apr 5 14:21:18 2010 -0700 xen/acpi: re-enable mwait for xen cpuidle Xen hypervisor doesn''t export mwait feature to dom0, but latest Linux kernel start to check this feature while initializing _PDC object. Bypass such check in pv-ops case to re-enable mwait for xen cpuidle. Signed-off-by: Wei Gang <gang.wei@intel.com> Signed-off-by: Yu Ke <ke.yu@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> diff --git a/arch/x86/kernel/acpi/processor.c b/arch/x86/kernel/acpi/processor.c index 8c9526d..d88866c 100644 --- a/arch/x86/kernel/acpi/processor.c +++ b/arch/x86/kernel/acpi/processor.c @@ -60,7 +60,7 @@ static void init_intel_pdc(struct acpi_processor *pr, struct cpuinfo_x86 *c) /* * If mwait/monitor is unsupported, C2/C3_FFH will be disabled */ - if (!cpu_has(c, X86_FEATURE_MWAIT)) + if (!cpu_has(c, X86_FEATURE_MWAIT) && !xen_initial_domain()) buf[2] &= ~(ACPI_PDC_C_C2C3_FFH); obj->type = ACPI_TYPE_BUFFER; Above trick is ugly and error-prone, since it always enable mwait regardless of actual CPU capability. It''s unlikely to make into upstream, and also get lost in into some distro such as SLES11. Instead of enhancing current approach (e.g. add a separate channel to reveal mwait capability instead of using cpuid), I''m curious why we don''t expose mwait in cpuid to dom0 directly. At least original comment in 17573 looks obscure, since EIST has nothing to do with Cx... Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 15/08/2011 06:35, "Tian, Kevin" <kevin.tian@intel.com> wrote:> Above trick is ugly and error-prone, since it always enable mwait regardless > of > actual CPU capability. It''s unlikely to make into upstream, and also get lost > in > into some distro such as SLES11. > > Instead of enhancing current approach (e.g. add a separate channel to reveal > mwait capability instead of using cpuid), I''m curious why we don''t expose > mwait > in cpuid to dom0 directly. At least original comment in 17573 looks obscure, > since EIST has nothing to do with Cx...Well the problem is some older kernels will try to use MWAIT if they see the feature in CPUID. Of course that doesn''t work outside ring 0. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> From: Keir Fraser [mailto:keir.xen@gmail.com] > Sent: Monday, August 15, 2011 2:55 PM > > On 15/08/2011 06:35, "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > Above trick is ugly and error-prone, since it always enable mwait regardless > > of > > actual CPU capability. It''s unlikely to make into upstream, and also get lost > > in > > into some distro such as SLES11. > > > > Instead of enhancing current approach (e.g. add a separate channel to reveal > > mwait capability instead of using cpuid), I''m curious why we don''t expose > > mwait > > in cpuid to dom0 directly. At least original comment in 17573 looks obscure, > > since EIST has nothing to do with Cx... > > Well the problem is some older kernels will try to use MWAIT if they see the > feature in CPUID. Of course that doesn''t work outside ring 0. >if those old kernels are still of interest, then possibly a boot option in Xen would be applausive. Or can we just allow selectively exposing MWAIT when xen cpuidle is enabled? Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 15/08/2011 08:13, "Tian, Kevin" <kevin.tian@intel.com> wrote:>> Well the problem is some older kernels will try to use MWAIT if they see the >> feature in CPUID. Of course that doesn''t work outside ring 0. >> > > if those old kernels are still of interest, then possibly a boot option in Xen > would > be applausive. Or can we just allow selectively exposing MWAIT when xen > cpuidle is enabled?The kernel could unconditionally advertise MWAIT from its cpuid pv_ops hook? If all that''s doing is causing relevant parts of BIOS tables to be parsed, would that be safe when MWAIT is not in fact available? Else the kernel could get the flag from the real non paravirtualised CPUID instruction. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> From: Keir Fraser [mailto:keir.xen@gmail.com] > Sent: Monday, August 15, 2011 3:45 PM > > On 15/08/2011 08:13, "Tian, Kevin" <kevin.tian@intel.com> wrote: > > >> Well the problem is some older kernels will try to use MWAIT if they see the > >> feature in CPUID. Of course that doesn''t work outside ring 0. > >> > > > > if those old kernels are still of interest, then possibly a boot option in Xen > > would > > be applausive. Or can we just allow selectively exposing MWAIT when xen > > cpuidle is enabled? > > The kernel could unconditionally advertise MWAIT from its cpuid pv_ops hook? > If all that''s doing is causing relevant parts of BIOS tables to be parsed, > would that be safe when MWAIT is not in fact available?It''s not safe to unconditionally advertise MWAIT, since if underlying CPU doesn''t support it you''ll get invalid op in Xen when attempting to use it. This is why I want to see MWAIT expose to guest base on real cpu capability, even when I say selectively expose it under xen cpuidle. :-)> > Else the kernel could get the flag from the real non paravirtualised CPUID > instruction.linux uses cpu_has to extract mwait capability. To use real cpuid instruction, then we need change Linux code which is not worthy though, like below: if (!cpu_has(c, X86_FEATURE_MWAIT)) buf[2] &= ~(ACPI_PDC_C_C2C3_FFH); If we make it into cpu_has bits, then it lacks of original guarding effect. So how about the change like below? emulate_forced_invalid_op: - __clear_bit(X86_FEATURE_MWAIT % 32, &c); + if ( !IS_PRIV(current->domain) || !xen_cpuidle ) + __clear_bit(X86_FEATURE_MWAIT % 32, &c); Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 15.08.11 at 07:35, "Tian, Kevin" <kevin.tian@intel.com> wrote: > There''re basically two methods to enter a given C-state: legacy (hlt + I/O > read), > and native(using mwait). MWAIT is always preferred when both underlying CPU > and OS support, which is a more efficient way to conduct C-state transition. > > Xen PM relies on Dom0 to parse ACPI Cx/Px information, which involves one > step to notify BIOS about a set of capabilities supported by OSPM. One > capability > is about mwait support, which if true ACPI Cx entry contains entry > parameters > for mwait, or else I/O port information is provided. Xen PM later decides > entry > method (i/o or mwait) based on parsed ACPI information from dom0. > > However Xen doesn''t expose MWAIT capability to dom0 due to changeset 17573: > > This then brings a problem to Dom0 which thinks underlying CPU > doesn''t report mwait, and thus notify BIOS to use old I/O based method. > > Later a trick is integrated in Jeremy''s pvops tree: > > --- a/arch/x86/kernel/acpi/processor.c > +++ b/arch/x86/kernel/acpi/processor.c > @@ -60,7 +60,7 @@ static void init_intel_pdc(struct acpi_processor *pr, > struct cpuinfo_x86 *c) > /* > * If mwait/monitor is unsupported, C2/C3_FFH will be disabled > */ > - if (!cpu_has(c, X86_FEATURE_MWAIT)) > + if (!cpu_has(c, X86_FEATURE_MWAIT) && !xen_initial_domain()) > buf[2] &= ~(ACPI_PDC_C_C2C3_FFH); > > obj->type = ACPI_TYPE_BUFFER; > > Above trick is ugly and error-prone, since it always enable mwait regardless > of actual CPU capability.3.x (and later 2.6.3x) don''t look at the CPUID flag anymore, they just check boot_option_idle_override, which is being controlled from the command line or enforced for some particular systems based on DMI data.> It''s unlikely to make into upstream, and also get lost in > into some distro such as SLES11.We can certainly fix it there. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> From: Jan Beulich [mailto:JBeulich@novell.com] > Sent: Monday, August 15, 2011 4:02 PM > > >>> On 15.08.11 at 07:35, "Tian, Kevin" <kevin.tian@intel.com> wrote: > > There''re basically two methods to enter a given C-state: legacy (hlt + I/O > > read), > > and native(using mwait). MWAIT is always preferred when both underlying > CPU > > and OS support, which is a more efficient way to conduct C-state transition. > > > > Xen PM relies on Dom0 to parse ACPI Cx/Px information, which involves one > > step to notify BIOS about a set of capabilities supported by OSPM. One > > capability > > is about mwait support, which if true ACPI Cx entry contains entry > > parameters > > for mwait, or else I/O port information is provided. Xen PM later decides > > entry > > method (i/o or mwait) based on parsed ACPI information from dom0. > > > > However Xen doesn''t expose MWAIT capability to dom0 due to changeset > 17573: > > > > This then brings a problem to Dom0 which thinks underlying CPU > > doesn''t report mwait, and thus notify BIOS to use old I/O based method. > > > > Later a trick is integrated in Jeremy''s pvops tree: > > > > --- a/arch/x86/kernel/acpi/processor.c > > +++ b/arch/x86/kernel/acpi/processor.c > > @@ -60,7 +60,7 @@ static void init_intel_pdc(struct acpi_processor *pr, > > struct cpuinfo_x86 *c) > > /* > > * If mwait/monitor is unsupported, C2/C3_FFH will be disabled > > */ > > - if (!cpu_has(c, X86_FEATURE_MWAIT)) > > + if (!cpu_has(c, X86_FEATURE_MWAIT) && !xen_initial_domain()) > > buf[2] &= ~(ACPI_PDC_C_C2C3_FFH); > > > > obj->type = ACPI_TYPE_BUFFER; > > > > Above trick is ugly and error-prone, since it always enable mwait regardless > > of actual CPU capability. > > 3.x (and later 2.6.3x) don''t look at the CPUID flag anymore, they just > check boot_option_idle_override, which is being controlled from the > command line or enforced for some particular systems based on DMI > data.I don''t think so. "boot_option_idle_override" controls the way how idle loop is implemented, which has the side effect to disable MWAIT if cpuid says it but "boot=nomwait" is specified. But it has no effect to enable MWAIT if Xen doesn''t tell dom0 about it. Check arch_acpi_set_pdc_bits under x86: static inline void arch_acpi_set_pdc_bits(u32 *buf) { struct cpuinfo_x86 *c = &cpu_data(0); buf[2] |= ACPI_PDC_C_CAPABILITY_SMP; if (cpu_has(c, X86_FEATURE_EST)) buf[2] |= ACPI_PDC_EST_CAPABILITY_SWSMP; if (cpu_has(c, X86_FEATURE_ACPI)) buf[2] |= ACPI_PDC_T_FFH; /* * If mwait/monitor is unsupported, C2/C3_FFH will be disabled */ if (!cpu_has(c, X86_FEATURE_MWAIT)) buf[2] &= ~(ACPI_PDC_C_C2C3_FFH); }> > > It''s unlikely to make into upstream, and also get lost in > > into some distro such as SLES11. > > We can certainly fix it there. >that''d be great. I/O method has observable impact on power efficiency, and the fix would be very welcomed. :-) Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 15/08/2011 08:57, "Tian, Kevin" <kevin.tian@intel.com> wrote:>> Else the kernel could get the flag from the real non paravirtualised CPUID >> instruction. > > linux uses cpu_has to extract mwait capability. To use real cpuid instruction, > then > we need change Linux code which is not worthy though, like below: > if (!cpu_has(c, X86_FEATURE_MWAIT)) > buf[2] &= ~(ACPI_PDC_C_C2C3_FFH); > If we make it into cpu_has bits, then it lacks of original guarding effect.cpu_has() accesses a pre-filled capabilities bitmask, filled from cpuid, right? And cpuid goes through a pv_ops hook? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 15.08.11 at 09:57, "Tian, Kevin" <kevin.tian@intel.com> wrote: > So how about the change like below? > > emulate_forced_invalid_op: > - __clear_bit(X86_FEATURE_MWAIT % 32, &c); > + if ( !IS_PRIV(current->domain) || !xen_cpuidle ) > + __clear_bit(X86_FEATURE_MWAIT % 32, &c);You''d break any Dom0 kernel that uses the flag to determine whether it can actually use the mwait/monitor instructions. I agree with Keir that for this particular purpose the code ought to look at the real CPUID flag. It should be possible to do this without changing non-Xen code, since the use of the instructions is additionally gated on CPUID leaf 5 producing non-zero output. But again, the kernel has to do this for itself, the hypervisor shouldn''t expose the feature flag. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> From: Keir Fraser [mailto:keir.xen@gmail.com] > Sent: Monday, August 15, 2011 4:11 PM > > On 15/08/2011 08:57, "Tian, Kevin" <kevin.tian@intel.com> wrote: > > >> Else the kernel could get the flag from the real non paravirtualised CPUID > >> instruction. > > > > linux uses cpu_has to extract mwait capability. To use real cpuid instruction, > > then > > we need change Linux code which is not worthy though, like below: > > if (!cpu_has(c, X86_FEATURE_MWAIT)) > > buf[2] &= ~(ACPI_PDC_C_C2C3_FFH); > > If we make it into cpu_has bits, then it lacks of original guarding effect. > > cpu_has() accesses a pre-filled capabilities bitmask, filled from cpuid, > right? And cpuid goes through a pv_ops hook? >I''m not quite catching you here. Do you want to prefill mwait bit from pv_ops hook unconditionally even in current situation where Xen doesn''t expose mwait, or selectively under some dom0''s boot option even when Xen is changed to expose mwait? The first case is not sane, while for the 2nd case I don''t think any pv_ops hook is required as long as Xen can expose it, isn''t it? Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 15.08.11 at 10:09, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> From: Jan Beulich [mailto:JBeulich@novell.com] >> Sent: Monday, August 15, 2011 4:02 PM >> >> >>> On 15.08.11 at 07:35, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> > There''re basically two methods to enter a given C-state: legacy (hlt + I/O >> > read), >> > and native(using mwait). MWAIT is always preferred when both underlying >> CPU >> > and OS support, which is a more efficient way to conduct C-state transition. >> > >> > Xen PM relies on Dom0 to parse ACPI Cx/Px information, which involves one >> > step to notify BIOS about a set of capabilities supported by OSPM. One >> > capability >> > is about mwait support, which if true ACPI Cx entry contains entry >> > parameters >> > for mwait, or else I/O port information is provided. Xen PM later decides >> > entry >> > method (i/o or mwait) based on parsed ACPI information from dom0. >> > >> > However Xen doesn''t expose MWAIT capability to dom0 due to changeset >> 17573: >> > >> > This then brings a problem to Dom0 which thinks underlying CPU >> > doesn''t report mwait, and thus notify BIOS to use old I/O based method. >> > >> > Later a trick is integrated in Jeremy''s pvops tree: >> > >> > --- a/arch/x86/kernel/acpi/processor.c >> > +++ b/arch/x86/kernel/acpi/processor.c >> > @@ -60,7 +60,7 @@ static void init_intel_pdc(struct acpi_processor *pr, >> > struct cpuinfo_x86 *c) >> > /* >> > * If mwait/monitor is unsupported, C2/C3_FFH will be disabled >> > */ >> > - if (!cpu_has(c, X86_FEATURE_MWAIT)) >> > + if (!cpu_has(c, X86_FEATURE_MWAIT) && !xen_initial_domain()) >> > buf[2] &= ~(ACPI_PDC_C_C2C3_FFH); >> > >> > obj->type = ACPI_TYPE_BUFFER; >> > >> > Above trick is ugly and error-prone, since it always enable mwait regardless >> > of actual CPU capability. >> >> 3.x (and later 2.6.3x) don''t look at the CPUID flag anymore, they just >> check boot_option_idle_override, which is being controlled from the >> command line or enforced for some particular systems based on DMI >> data. > > I don''t think so. "boot_option_idle_override" controls the way how idle loop > is implemented, which has the side effect to disable MWAIT if cpuid says it > but "boot=nomwait" is specified. But it has no effect to enable MWAIT if > Xen doesn''t tell dom0 about it. > > Check arch_acpi_set_pdc_bits under x86:Indeed - I didn''t look at the header files. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> From: Jan Beulich [mailto:JBeulich@novell.com] > Sent: Monday, August 15, 2011 4:12 PM > > >>> On 15.08.11 at 09:57, "Tian, Kevin" <kevin.tian@intel.com> wrote: > > So how about the change like below? > > > > emulate_forced_invalid_op: > > - __clear_bit(X86_FEATURE_MWAIT % 32, &c); > > + if ( !IS_PRIV(current->domain) || !xen_cpuidle ) > > + __clear_bit(X86_FEATURE_MWAIT % 32, &c); > > You''d break any Dom0 kernel that uses the flag to determine whether it > can actually use the mwait/monitor instructions. I agree with Keir thatFor those old kernels, then disable cpuidle to revert the effect.> for this particular purpose the code ought to look at the real CPUID > flag. > > It should be possible to do this without changing non-Xen code, since > the use of the instructions is additionally gated on CPUID leaf 5 > producing non-zero output. But again, the kernel has to do this for > itself, the hypervisor shouldn''t expose the feature flag. >Even by using real CPUID, the point is that we either need make it into cpu capability bits to avoid further change arch_acpi_set_pdc_bits, or provide Xen''s own arch_acpi_set_pdc_bits by looking at real CPUID result directly. The former has same effect as having Xen expose mwait, while the latter simply adds more changes to Linux code. Somehow imho it''s easier to switch to a newer Xen release than switching to a newer dom0 kernel. The latter requires more validations. If we can make changes in Xen, it''s easier for existing deployment to workaround this issue. Or else only next releases from various OSVs may have the fix which has long days to go... Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 15.08.11 at 10:14, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> From: Keir Fraser [mailto:keir.xen@gmail.com] >> Sent: Monday, August 15, 2011 4:11 PM >> >> On 15/08/2011 08:57, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> >> >> Else the kernel could get the flag from the real non paravirtualised CPUID >> >> instruction. >> > >> > linux uses cpu_has to extract mwait capability. To use real cpuid > instruction, >> > then >> > we need change Linux code which is not worthy though, like below: >> > if (!cpu_has(c, X86_FEATURE_MWAIT)) >> > buf[2] &= ~(ACPI_PDC_C_C2C3_FFH); >> > If we make it into cpu_has bits, then it lacks of original guarding effect. >> >> cpu_has() accesses a pre-filled capabilities bitmask, filled from cpuid, >> right? And cpuid goes through a pv_ops hook? >> > > I''m not quite catching you here. Do you want to prefill mwait bit from > pv_ops > hook unconditionally even in current situation where Xen doesn''t expose > mwait, or selectively under some dom0''s boot option even when Xen is > changed to expose mwait? The first case is not sane, while for the 2nd case > I don''t think any pv_ops hook is required as long as Xen can expose it, > isn''t it?Didn''t you want to make Xen''s exposing of the pv bit conditional upon xen_cpuidle? Dom0 can indirectly see this variable already through XEN_PROCESSOR_PM_CX being set in SIF_PM_MASK, so it could use it for its own decision. Whether that''s really safe is another matter, as xen_cpuidle set doesn''t mean mwait is actually being used. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 15/08/2011 09:14, "Tian, Kevin" <kevin.tian@intel.com> wrote:>> cpu_has() accesses a pre-filled capabilities bitmask, filled from cpuid, >> right? And cpuid goes through a pv_ops hook? >> > > I''m not quite catching you here. Do you want to prefill mwait bit from pv_ops > hook unconditionally even in current situation where Xen doesn''t expose > mwait, or selectively under some dom0''s boot option even when Xen is > changed to expose mwait? The first case is not sane, while for the 2nd case > I don''t think any pv_ops hook is required as long as Xen can expose it, isn''t > it?But there is a pv_ops hook, and Xen isn''t going to expose it because it breaks old dom0 kernels. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 15.08.11 at 10:09, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> From: Jan Beulich [mailto:JBeulich@novell.com] >> >>> On 15.08.11 at 07:35, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> > It''s unlikely to make into upstream, and also get lost in >> > into some distro such as SLES11. >> >> We can certainly fix it there. >> > > that''d be great. I/O method has observable impact on power efficiency, > and the fix would be very welcomed. :-)While the change is simple to do and works, I''m somewhat concerned that while improving the situation on CPUs that support the break-on- interrupt extension to mwait, it would result in C2/C3 not being usable at all on CPUs that don''t (but support mwait in its simpler form and have ACPI tables specifying FFH as address space id). Is that only a theoretical concern (i.e. is there an implicit guarantee that for other than C1 FFH wouldn''t be specified without that extension being available)? I thinks it''s a practical one, or otherwise there wouldn''t be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC evaluation. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Mon, Aug 15, 2011 at 09:02:29AM +0100, Jan Beulich wrote:> >>> On 15.08.11 at 07:35, "Tian, Kevin" <kevin.tian@intel.com> wrote: > > There''re basically two methods to enter a given C-state: legacy (hlt + I/O > > read), > > and native(using mwait). MWAIT is always preferred when both underlying CPU > > and OS support, which is a more efficient way to conduct C-state transition. > > > > Xen PM relies on Dom0 to parse ACPI Cx/Px information, which involves one > > step to notify BIOS about a set of capabilities supported by OSPM. One > > capability > > is about mwait support, which if true ACPI Cx entry contains entry > > parameters > > for mwait, or else I/O port information is provided. Xen PM later decides > > entry > > method (i/o or mwait) based on parsed ACPI information from dom0. > > > > However Xen doesn''t expose MWAIT capability to dom0 due to changeset 17573: > > > > This then brings a problem to Dom0 which thinks underlying CPU > > doesn''t report mwait, and thus notify BIOS to use old I/O based method. > > > > Later a trick is integrated in Jeremy''s pvops tree: > > > > --- a/arch/x86/kernel/acpi/processor.c > > +++ b/arch/x86/kernel/acpi/processor.c > > @@ -60,7 +60,7 @@ static void init_intel_pdc(struct acpi_processor *pr, > > struct cpuinfo_x86 *c) > > /* > > * If mwait/monitor is unsupported, C2/C3_FFH will be disabled > > */ > > - if (!cpu_has(c, X86_FEATURE_MWAIT)) > > + if (!cpu_has(c, X86_FEATURE_MWAIT) && !xen_initial_domain()) > > buf[2] &= ~(ACPI_PDC_C_C2C3_FFH); > > > > obj->type = ACPI_TYPE_BUFFER; > > > > Above trick is ugly and error-prone, since it always enable mwait regardless > > of actual CPU capability. > > 3.x (and later 2.6.3x) don''t look at the CPUID flag anymore, they just > check boot_option_idle_override, which is being controlled from the > command line or enforced for some particular systems based on DMI > data.Or in case when running under Xen (dom0): (arch/x86/xen/setup.c) 428 disable_cpuidle(); 429 boot_option_idle_override = IDLE_HALT; Which ends up calling the xen_safe_halt which does the yield hypercall. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> From: Keir Fraser [mailto:keir.xen@gmail.com] > Sent: Monday, August 15, 2011 5:02 PM > > On 15/08/2011 09:14, "Tian, Kevin" <kevin.tian@intel.com> wrote: > > >> cpu_has() accesses a pre-filled capabilities bitmask, filled from cpuid, > >> right? And cpuid goes through a pv_ops hook? > >> > > > > I''m not quite catching you here. Do you want to prefill mwait bit from pv_ops > > hook unconditionally even in current situation where Xen doesn''t expose > > mwait, or selectively under some dom0''s boot option even when Xen is > > changed to expose mwait? The first case is not sane, while for the 2nd case > > I don''t think any pv_ops hook is required as long as Xen can expose it, isn''t > > it? > > But there is a pv_ops hook, and Xen isn''t going to expose it because it > breaks old dom0 kernels. >yes, now I understand your suggestion. Basically we discussed two approaches: a) in current pv_ops hook (xen_cpuid): - use unconditional cpuid to query mwait capability on physical cpu - if the bit is set, and SIF_PM_MASK indicates xen pm is enabled: then set the bit in the ECX when leaf 1 is queried this should effectively has X86_FEATURE_MWAIT set, and then _PDC method will notify BIOS using mwait entry method. This method doesn''t require Xen change, but it would only work for future releases which incorporates this change b) in Xen we selectively clear MWAIT capability in pv_cpuid, based on whether xen_cpuidle is enabled. If there''s no MWAIT available on physical CPU, or xen_cpuidle is disabled, MWAIT is not exposed to the guest. This approach doesn''t break old dom0 kernel, as it''s controlled by xen_cpuidle cmdline option. Basically it''s not suggested to activate Cx transition by using legacy I/O method, so it''s fine to disable xen cpuidle on those old dom0 kernel. approach a) is better from the angle that we don''t want non-ring0 code to execute MWAIT which causes invalid opcode exception, while for PM purpose MWAIT is purely informational. Approach b) is better if we want existing Xen deployments to use efficient C-state benefit, since Xen is backward compatible and thus upgrade to a newer Xen release is lighter than upgrading to a newer dom0 kernel. What''s your opinion about this? If b) is not a valid concern from your side, we will go to a) as you suggested. Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> From: Jan Beulich [mailto:JBeulich@novell.com] > Sent: Monday, August 15, 2011 6:29 PM > > >>> On 15.08.11 at 10:09, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> From: Jan Beulich [mailto:JBeulich@novell.com] > >> >>> On 15.08.11 at 07:35, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> > It''s unlikely to make into upstream, and also get lost in > >> > into some distro such as SLES11. > >> > >> We can certainly fix it there. > >> > > > > that''d be great. I/O method has observable impact on power efficiency, > > and the fix would be very welcomed. :-) > > While the change is simple to do and works, I''m somewhat concernedCurrent change mentioned above is not safe, which always enables mwait support w/o knowledge about underlying presence. But new processors all have mwait support, so this may not be big problem.> that while improving the situation on CPUs that support the break-on- > interrupt extension to mwait, it would result in C2/C3 not being usable > at all on CPUs that don''t (but support mwait in its simpler form and > have ACPI tables specifying FFH as address space id). Is that only a > theoretical concern (i.e. is there an implicit guarantee that for other > than C1 FFH wouldn''t be specified without that extension being > available)? I thinks it''s a practical one, or otherwise there wouldn''t > be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC > evaluation.Yes, this is a practical one, though I don''t know any box doing that. In all the boxes I''ve been using so far, all the extensions are available. But since BIOS vendor may also impact the availability of CPUID bits, I think we should do it right by strictly conforming to theSDM. I.e. we need check CPUID leafs and then verify all Cx states propagated from dom0, instead of blindly following its info. Will work a patch for that. Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 16/08/2011 06:34, "Tian, Kevin" <kevin.tian@intel.com> wrote:>> From: Keir Fraser [mailto:keir.xen@gmail.com] >> Sent: Monday, August 15, 2011 5:02 PM >> >> On 15/08/2011 09:14, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> >>>> cpu_has() accesses a pre-filled capabilities bitmask, filled from cpuid, >>>> right? And cpuid goes through a pv_ops hook? >>>> >>> >>> I''m not quite catching you here. Do you want to prefill mwait bit from >>> pv_ops >>> hook unconditionally even in current situation where Xen doesn''t expose >>> mwait, or selectively under some dom0''s boot option even when Xen is >>> changed to expose mwait? The first case is not sane, while for the 2nd case >>> I don''t think any pv_ops hook is required as long as Xen can expose it, >>> isn''t >>> it? >> >> But there is a pv_ops hook, and Xen isn''t going to expose it because it >> breaks old dom0 kernels. >> > > yes, now I understand your suggestion. Basically we discussed two approaches: > > a) in current pv_ops hook (xen_cpuid): > - use unconditional cpuid to query mwait capability on physical cpu > - if the bit is set, and SIF_PM_MASK indicates xen pm is enabled: > then set the bit in the ECX when leaf 1 is queried > this should effectively has X86_FEATURE_MWAIT set, and then _PDC method > will notify BIOS using mwait entry method. > This method doesn''t require Xen change, but it would only work for future > releases which incorporates this change > > b) in Xen we selectively clear MWAIT capability in pv_cpuid, based on whether > xen_cpuidle is enabled. If there''s no MWAIT available on physical CPU, or > xen_cpuidle is disabled, MWAIT is not exposed to the guest. This approach > doesn''t break old dom0 kernel, as it''s controlled by xen_cpuidle cmdline > option.It does require xen_cpuidle=0 to be added to the Xen command line. That''s not great.> Basically it''s not suggested to activate Cx transition by using legacy I/O > method, > so it''s fine to disable xen cpuidle on those old dom0 kernel. > > approach a) is better from the angle that we don''t want non-ring0 code to > execute MWAIT which causes invalid opcode exception, while for PM purpose > MWAIT is purely informational. > > Approach b) is better if we want existing Xen deployments to use efficient > C-state benefit, since Xen is backward compatible and thus upgrade to a > newer Xen release is lighter than upgrading to a newer dom0 kernel. > > What''s your opinion about this? If b) is not a valid concern from your side, > we > will go to a) as you suggested.I think (a) is the right way to go. -- Keir> Thanks > Kevin_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 16.08.11 at 08:03, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> From: Jan Beulich [mailto:JBeulich@novell.com] >> Sent: Monday, August 15, 2011 6:29 PM >> >> >>> On 15.08.11 at 10:09, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> >> From: Jan Beulich [mailto:JBeulich@novell.com] >> >> >>> On 15.08.11 at 07:35, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> >> > It''s unlikely to make into upstream, and also get lost in >> >> > into some distro such as SLES11. >> >> >> >> We can certainly fix it there. >> >> >> > >> > that''d be great. I/O method has observable impact on power efficiency, >> > and the fix would be very welcomed. :-) >> >> While the change is simple to do and works, I''m somewhat concerned > > Current change mentioned above is not safe, which always enables mwait > support w/o knowledge about underlying presence. But new processorsNot following you here: If I execute a "real" cpuid instruction, I will know whether mwait is "present".> all have mwait support, so this may not be big problem. > >> that while improving the situation on CPUs that support the break-on- >> interrupt extension to mwait, it would result in C2/C3 not being usable >> at all on CPUs that don''t (but support mwait in its simpler form and >> have ACPI tables specifying FFH as address space id). Is that only a >> theoretical concern (i.e. is there an implicit guarantee that for other >> than C1 FFH wouldn''t be specified without that extension being >> available)? I thinks it''s a practical one, or otherwise there wouldn''t >> be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC >> evaluation. > > Yes, this is a practical one, though I don''t know any box doing that. In all > the boxes I''ve been using so far, all the extensions are available. But > since > BIOS vendor may also impact the availability of CPUID bits, I think we > should do it right by strictly conforming to theSDM. I.e. we need check > CPUID leafs and then verify all Cx states propagated from dom0, instead > of blindly following its info. Will work a patch for that.You''re getting it sort of wrong way round: What I don''t want to do (but seemingly being necessary) is mimic the decision logic the hypervisor uses (i.e. require the break-on-interrupt extension for C2/C3 entering through MWAIT) in Dom0 when deciding about the bits to pass to _PDC. That ought to be an implementation detail (subject to change) in the hypervisor alone. The hypervisor itself, otoh, already properly checks CPUID leaf 5 (and that''s what might cause it to not use mwait despite the bit in CPUID leaf 1 being set, which should be all Dom0 ought to look at for deciding whether to clear ACPI_PDC_C_C2C3_FFH). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> From: Jan Beulich [mailto:JBeulich@novell.com] > Sent: Tuesday, August 16, 2011 2:42 PM > > >>> On 16.08.11 at 08:03, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> From: Jan Beulich [mailto:JBeulich@novell.com] > >> Sent: Monday, August 15, 2011 6:29 PM > >> > >> >>> On 15.08.11 at 10:09, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> >> From: Jan Beulich [mailto:JBeulich@novell.com] > >> >> >>> On 15.08.11 at 07:35, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> >> > It''s unlikely to make into upstream, and also get lost in > >> >> > into some distro such as SLES11. > >> >> > >> >> We can certainly fix it there. > >> >> > >> > > >> > that''d be great. I/O method has observable impact on power efficiency, > >> > and the fix would be very welcomed. :-) > >> > >> While the change is simple to do and works, I''m somewhat concerned > > > > Current change mentioned above is not safe, which always enables mwait > > support w/o knowledge about underlying presence. But new processors > > Not following you here: If I execute a "real" cpuid instruction, I will know > whether mwait is "present".Sorry. I thought you want to integrate current patch in 2.6.32 pvops tree: diff --git a/arch/x86/kernel/acpi/processor.c b/arch/x86/kernel/acpi/processor.c index 8c9526d..d88866c 100644 --- a/arch/x86/kernel/acpi/processor.c +++ b/arch/x86/kernel/acpi/processor.c @@ -60,7 +60,7 @@ static void init_intel_pdc(struct acpi_processor *pr, struct cpuinfo_x86 *c) /* * If mwait/monitor is unsupported, C2/C3_FFH will be disabled */ - if (!cpu_has(c, X86_FEATURE_MWAIT)) + if (!cpu_has(c, X86_FEATURE_MWAIT) && !xen_initial_domain()) buf[2] &= ~(ACPI_PDC_C_C2C3_FFH); obj->type = ACPI_TYPE_BUFFER; that''s definitely wrong.> > > all have mwait support, so this may not be big problem. > > > >> that while improving the situation on CPUs that support the break-on- > >> interrupt extension to mwait, it would result in C2/C3 not being usable > >> at all on CPUs that don''t (but support mwait in its simpler form and > >> have ACPI tables specifying FFH as address space id). Is that only a > >> theoretical concern (i.e. is there an implicit guarantee that for other > >> than C1 FFH wouldn''t be specified without that extension being > >> available)? I thinks it''s a practical one, or otherwise there wouldn''t > >> be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC > >> evaluation. > > > > Yes, this is a practical one, though I don''t know any box doing that. In all > > the boxes I''ve been using so far, all the extensions are available. But > > since > > BIOS vendor may also impact the availability of CPUID bits, I think we > > should do it right by strictly conforming to theSDM. I.e. we need check > > CPUID leafs and then verify all Cx states propagated from dom0, instead > > of blindly following its info. Will work a patch for that. > > You''re getting it sort of wrong way round: What I don''t want to do (but > seemingly being necessary) is mimic the decision logic the hypervisor > uses (i.e. require the break-on-interrupt extension for C2/C3 entering > through MWAIT) in Dom0 when deciding about the bits to pass tobreak-on-interrupt is not a hard requirement to use MWAIT. Even when that extension is not available, MWAIT can be still used to enter C2/C3, just with interrupt enabled.> _PDC. That ought to be an implementation detail (subject to change) > in the hypervisor alone. The hypervisor itself, otoh, already properly > checks CPUID leaf 5 (and that''s what might cause it to not use mwait > despite the bit in CPUID leaf 1 being set, which should be all Dom0 > ought to look at for deciding whether to clear ACPI_PDC_C_C2C3_FFH). >I made a mistake, that currently CPUID leaf 5 is already checked in check_cx in hypervisor, so it should be sane. However I still fail to catch your real concern here. :/ Thanks, Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 16.08.11 at 08:53, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> From: Jan Beulich [mailto:JBeulich@novell.com] >> Sent: Tuesday, August 16, 2011 2:42 PM >> >> >>> On 16.08.11 at 08:03, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> >> From: Jan Beulich [mailto:JBeulich@novell.com] >> >> Sent: Monday, August 15, 2011 6:29 PM >> >> >> >> that while improving the situation on CPUs that support the break-on- >> >> interrupt extension to mwait, it would result in C2/C3 not being usable >> >> at all on CPUs that don''t (but support mwait in its simpler form and >> >> have ACPI tables specifying FFH as address space id). Is that only a >> >> theoretical concern (i.e. is there an implicit guarantee that for other >> >> than C1 FFH wouldn''t be specified without that extension being >> >> available)? I thinks it''s a practical one, or otherwise there wouldn''t >> >> be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC >> >> evaluation. >> > >> > Yes, this is a practical one, though I don''t know any box doing that. In > all >> > the boxes I''ve been using so far, all the extensions are available. But >> > since >> > BIOS vendor may also impact the availability of CPUID bits, I think we >> > should do it right by strictly conforming to theSDM. I.e. we need check >> > CPUID leafs and then verify all Cx states propagated from dom0, instead >> > of blindly following its info. Will work a patch for that. >> >> You''re getting it sort of wrong way round: What I don''t want to do (but >> seemingly being necessary) is mimic the decision logic the hypervisor >> uses (i.e. require the break-on-interrupt extension for C2/C3 entering >> through MWAIT) in Dom0 when deciding about the bits to pass to > > break-on-interrupt is not a hard requirement to use MWAIT. Even when > that extension is not available, MWAIT can be still used to enter C2/C3, > just with interrupt enabled.And that''s why this implementation detail should be confined to the hypervisor - Dom0 should not care about this if at all possible.>> _PDC. That ought to be an implementation detail (subject to change) >> in the hypervisor alone. The hypervisor itself, otoh, already properly >> checks CPUID leaf 5 (and that''s what might cause it to not use mwait >> despite the bit in CPUID leaf 1 being set, which should be all Dom0 >> ought to look at for deciding whether to clear ACPI_PDC_C_C2C3_FFH). >> > > I made a mistake, that currently CPUID leaf 5 is already checked in > check_cx in hypervisor, so it should be sane. However I still fail to catch > your real concern here. :/If Dom0 finds (real) CPUID leaf 1 report MWAIT to be available, it will (with the logic outlined above) call _PDC without clearing ACPI_PDC_C_C2C3_FFH. If now the break-on-interrupt extension is not present, but the address space ID for C2 or C3 is set to FFH, then Xen (in acpi_processor_ffh_cstate_probe()) will reject the Cx entry (and hence refrain from using the respective C-state), whereas if Dom0 cleared ACPI_PDC_C_C2C3_FFH in that case, firmware would (normally) have converted the address space ID to SYSTEM_IO, and hence Xen would have decided to use C2/C3 with the SYSIO entry method. So this is only acceptable if there are *no* production CPUs of any vendor that would support MWAIT without the break-on-interrupt extension. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> From: Jan Beulich [mailto:JBeulich@novell.com] > Sent: Tuesday, August 16, 2011 4:13 PM > > >>> On 16.08.11 at 08:53, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> From: Jan Beulich [mailto:JBeulich@novell.com] > >> Sent: Tuesday, August 16, 2011 2:42 PM > >> > >> >>> On 16.08.11 at 08:03, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> >> From: Jan Beulich [mailto:JBeulich@novell.com] > >> >> Sent: Monday, August 15, 2011 6:29 PM > >> >> > >> >> that while improving the situation on CPUs that support the break-on- > >> >> interrupt extension to mwait, it would result in C2/C3 not being usable > >> >> at all on CPUs that don''t (but support mwait in its simpler form and > >> >> have ACPI tables specifying FFH as address space id). Is that only a > >> >> theoretical concern (i.e. is there an implicit guarantee that for other > >> >> than C1 FFH wouldn''t be specified without that extension being > >> >> available)? I thinks it''s a practical one, or otherwise there wouldn''t > >> >> be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC > >> >> evaluation. > >> > > >> > Yes, this is a practical one, though I don''t know any box doing that. In > > all > >> > the boxes I''ve been using so far, all the extensions are available. But > >> > since > >> > BIOS vendor may also impact the availability of CPUID bits, I think we > >> > should do it right by strictly conforming to theSDM. I.e. we need check > >> > CPUID leafs and then verify all Cx states propagated from dom0, instead > >> > of blindly following its info. Will work a patch for that. > >> > >> You''re getting it sort of wrong way round: What I don''t want to do (but > >> seemingly being necessary) is mimic the decision logic the hypervisor > >> uses (i.e. require the break-on-interrupt extension for C2/C3 entering > >> through MWAIT) in Dom0 when deciding about the bits to pass to > > > > break-on-interrupt is not a hard requirement to use MWAIT. Even when > > that extension is not available, MWAIT can be still used to enter C2/C3, > > just with interrupt enabled. > > And that''s why this implementation detail should be confined to the > hypervisor - Dom0 should not care about this if at all possible. > > >> _PDC. That ought to be an implementation detail (subject to change) > >> in the hypervisor alone. The hypervisor itself, otoh, already properly > >> checks CPUID leaf 5 (and that''s what might cause it to not use mwait > >> despite the bit in CPUID leaf 1 being set, which should be all Dom0 > >> ought to look at for deciding whether to clear ACPI_PDC_C_C2C3_FFH). > >> > > > > I made a mistake, that currently CPUID leaf 5 is already checked in > > check_cx in hypervisor, so it should be sane. However I still fail to catch > > your real concern here. :/ > > If Dom0 finds (real) CPUID leaf 1 report MWAIT to be available, it > will (with the logic outlined above) call _PDC without clearing > ACPI_PDC_C_C2C3_FFH. If now the break-on-interrupt extension > is not present, but the address space ID for C2 or C3 is set to FFH, > then Xen (in acpi_processor_ffh_cstate_probe()) will reject the > Cx entry (and hence refrain from using the respective C-state), > whereas if Dom0 cleared ACPI_PDC_C_C2C3_FFH in that case, > firmware would (normally) have converted the address space ID to > SYSTEM_IO, and hence Xen would have decided to use C2/C3 with > the SYSIO entry method. > > So this is only acceptable if there are *no* production CPUs of any > vendor that would support MWAIT without the break-on-interrupt > extension. >yes, that''s also the way that native Linux code currently uses: - notify BIOS ACPI_PDC_C_C2C3_FFH if cpu has mwait - reject Cx entry if break-on-interrupt extension is not present later in acpi processor driver when parsing Cx entries.>From BIOS ACPI p.o.v, OSPM can notify BIOS about FFH style if followingconditions are true: a) cpu supports mwait b) OSPM itself supports mwait a) is architectural, but b) is implementation specific regarding to what can be called "support". Obviously both Xen and Linux here use an inconsistent way between the place notifying BIOS and the point parsing ACPI Cx entry. So your conclusion is correct that C2/C3 would be rejected on the CPU which doesn''t support MWAIT with break-on-interrupt extension. But it should be fine in the real world, and we may consider whether to do something when a real case is encountered in the future. On the other hand, you can think it as the decision from Xen that it doesn''t want to use legacy I/O method for C2/C3 when such situation exists. :-) Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 16.08.11 at 10:29, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> From: Jan Beulich [mailto:JBeulich@novell.com] >> Sent: Tuesday, August 16, 2011 4:13 PM >> >> >>> On 16.08.11 at 08:53, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> >> From: Jan Beulich [mailto:JBeulich@novell.com] >> >> Sent: Tuesday, August 16, 2011 2:42 PM >> >> >> >> >>> On 16.08.11 at 08:03, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> >> >> From: Jan Beulich [mailto:JBeulich@novell.com] >> >> >> Sent: Monday, August 15, 2011 6:29 PM >> >> >> >> >> >> that while improving the situation on CPUs that support the break-on- >> >> >> interrupt extension to mwait, it would result in C2/C3 not being usable >> >> >> at all on CPUs that don''t (but support mwait in its simpler form and >> >> >> have ACPI tables specifying FFH as address space id). Is that only a >> >> >> theoretical concern (i.e. is there an implicit guarantee that for other >> >> >> than C1 FFH wouldn''t be specified without that extension being >> >> >> available)? I thinks it''s a practical one, or otherwise there wouldn''t >> >> >> be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC >> >> >> evaluation. >> >> > >> >> > Yes, this is a practical one, though I don''t know any box doing that. In >> > all >> >> > the boxes I''ve been using so far, all the extensions are available. But >> >> > since >> >> > BIOS vendor may also impact the availability of CPUID bits, I think we >> >> > should do it right by strictly conforming to theSDM. I.e. we need check >> >> > CPUID leafs and then verify all Cx states propagated from dom0, instead >> >> > of blindly following its info. Will work a patch for that. >> >> >> >> You''re getting it sort of wrong way round: What I don''t want to do (but >> >> seemingly being necessary) is mimic the decision logic the hypervisor >> >> uses (i.e. require the break-on-interrupt extension for C2/C3 entering >> >> through MWAIT) in Dom0 when deciding about the bits to pass to >> > >> > break-on-interrupt is not a hard requirement to use MWAIT. Even when >> > that extension is not available, MWAIT can be still used to enter C2/C3, >> > just with interrupt enabled. >> >> And that''s why this implementation detail should be confined to the >> hypervisor - Dom0 should not care about this if at all possible. >> >> >> _PDC. That ought to be an implementation detail (subject to change) >> >> in the hypervisor alone. The hypervisor itself, otoh, already properly >> >> checks CPUID leaf 5 (and that''s what might cause it to not use mwait >> >> despite the bit in CPUID leaf 1 being set, which should be all Dom0 >> >> ought to look at for deciding whether to clear ACPI_PDC_C_C2C3_FFH). >> >> >> > >> > I made a mistake, that currently CPUID leaf 5 is already checked in >> > check_cx in hypervisor, so it should be sane. However I still fail to catch >> > your real concern here. :/ >> >> If Dom0 finds (real) CPUID leaf 1 report MWAIT to be available, it >> will (with the logic outlined above) call _PDC without clearing >> ACPI_PDC_C_C2C3_FFH. If now the break-on-interrupt extension >> is not present, but the address space ID for C2 or C3 is set to FFH, >> then Xen (in acpi_processor_ffh_cstate_probe()) will reject the >> Cx entry (and hence refrain from using the respective C-state), >> whereas if Dom0 cleared ACPI_PDC_C_C2C3_FFH in that case, >> firmware would (normally) have converted the address space ID to >> SYSTEM_IO, and hence Xen would have decided to use C2/C3 with >> the SYSIO entry method. >> >> So this is only acceptable if there are *no* production CPUs of any >> vendor that would support MWAIT without the break-on-interrupt >> extension. >> > > yes, that''s also the way that native Linux code currently uses: > - notify BIOS ACPI_PDC_C_C2C3_FFH if cpu has mwait > - reject Cx entry if break-on-interrupt extension is not present later > in acpi processor driver when parsing Cx entries. > > From BIOS ACPI p.o.v, OSPM can notify BIOS about FFH style if following > conditions are true: > a) cpu supports mwait > b) OSPM itself supports mwait > > a) is architectural, but b) is implementation specific regarding to what > can be called "support". Obviously both Xen and Linux here use an > inconsistent way between the place notifying BIOS and the point parsing > ACPI Cx entry. So your conclusion is correct that C2/C3 would be rejected > on the CPU which doesn''t support MWAIT with break-on-interrupt > extension. But it should be fine in the real world, and we may consider > whether to do something when a real case is encountered in the future. > > On the other hand, you can think it as the decision from Xen that it > doesn''t want to use legacy I/O method for C2/C3 when such situation exists. > :-)Yeah, but customers could validly view this as regression (because on such a system Xen would use C2/C3 currently). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> From: Jan Beulich [mailto:JBeulich@novell.com] > Sent: Tuesday, August 16, 2011 4:45 PM > > >>> On 16.08.11 at 10:29, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> From: Jan Beulich [mailto:JBeulich@novell.com] > >> Sent: Tuesday, August 16, 2011 4:13 PM > >> > >> >>> On 16.08.11 at 08:53, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> >> From: Jan Beulich [mailto:JBeulich@novell.com] > >> >> Sent: Tuesday, August 16, 2011 2:42 PM > >> >> > >> >> >>> On 16.08.11 at 08:03, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> >> >> From: Jan Beulich [mailto:JBeulich@novell.com] > >> >> >> Sent: Monday, August 15, 2011 6:29 PM > >> >> >> > >> >> >> that while improving the situation on CPUs that support the break-on- > >> >> >> interrupt extension to mwait, it would result in C2/C3 not being usable > >> >> >> at all on CPUs that don''t (but support mwait in its simpler form and > >> >> >> have ACPI tables specifying FFH as address space id). Is that only a > >> >> >> theoretical concern (i.e. is there an implicit guarantee that for other > >> >> >> than C1 FFH wouldn''t be specified without that extension being > >> >> >> available)? I thinks it''s a practical one, or otherwise there wouldn''t > >> >> >> be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC > >> >> >> evaluation. > >> >> > > >> >> > Yes, this is a practical one, though I don''t know any box doing that. In > >> > all > >> >> > the boxes I''ve been using so far, all the extensions are available. But > >> >> > since > >> >> > BIOS vendor may also impact the availability of CPUID bits, I think we > >> >> > should do it right by strictly conforming to theSDM. I.e. we need check > >> >> > CPUID leafs and then verify all Cx states propagated from dom0, > instead > >> >> > of blindly following its info. Will work a patch for that. > >> >> > >> >> You''re getting it sort of wrong way round: What I don''t want to do (but > >> >> seemingly being necessary) is mimic the decision logic the hypervisor > >> >> uses (i.e. require the break-on-interrupt extension for C2/C3 entering > >> >> through MWAIT) in Dom0 when deciding about the bits to pass to > >> > > >> > break-on-interrupt is not a hard requirement to use MWAIT. Even when > >> > that extension is not available, MWAIT can be still used to enter C2/C3, > >> > just with interrupt enabled. > >> > >> And that''s why this implementation detail should be confined to the > >> hypervisor - Dom0 should not care about this if at all possible. > >> > >> >> _PDC. That ought to be an implementation detail (subject to change) > >> >> in the hypervisor alone. The hypervisor itself, otoh, already properly > >> >> checks CPUID leaf 5 (and that''s what might cause it to not use mwait > >> >> despite the bit in CPUID leaf 1 being set, which should be all Dom0 > >> >> ought to look at for deciding whether to clear ACPI_PDC_C_C2C3_FFH). > >> >> > >> > > >> > I made a mistake, that currently CPUID leaf 5 is already checked in > >> > check_cx in hypervisor, so it should be sane. However I still fail to catch > >> > your real concern here. :/ > >> > >> If Dom0 finds (real) CPUID leaf 1 report MWAIT to be available, it > >> will (with the logic outlined above) call _PDC without clearing > >> ACPI_PDC_C_C2C3_FFH. If now the break-on-interrupt extension > >> is not present, but the address space ID for C2 or C3 is set to FFH, > >> then Xen (in acpi_processor_ffh_cstate_probe()) will reject the > >> Cx entry (and hence refrain from using the respective C-state), > >> whereas if Dom0 cleared ACPI_PDC_C_C2C3_FFH in that case, > >> firmware would (normally) have converted the address space ID to > >> SYSTEM_IO, and hence Xen would have decided to use C2/C3 with > >> the SYSIO entry method. > >> > >> So this is only acceptable if there are *no* production CPUs of any > >> vendor that would support MWAIT without the break-on-interrupt > >> extension. > >> > > > > yes, that''s also the way that native Linux code currently uses: > > - notify BIOS ACPI_PDC_C_C2C3_FFH if cpu has mwait > > - reject Cx entry if break-on-interrupt extension is not present later > > in acpi processor driver when parsing Cx entries. > > > > From BIOS ACPI p.o.v, OSPM can notify BIOS about FFH style if following > > conditions are true: > > a) cpu supports mwait > > b) OSPM itself supports mwait > > > > a) is architectural, but b) is implementation specific regarding to what > > can be called "support". Obviously both Xen and Linux here use an > > inconsistent way between the place notifying BIOS and the point parsing > > ACPI Cx entry. So your conclusion is correct that C2/C3 would be rejected > > on the CPU which doesn''t support MWAIT with break-on-interrupt > > extension. But it should be fine in the real world, and we may consider > > whether to do something when a real case is encountered in the future. > > > > On the other hand, you can think it as the decision from Xen that it > > doesn''t want to use legacy I/O method for C2/C3 when such situation exists. > > :-) > > Yeah, but customers could validly view this as regression (because on > such a system Xen would use C2/C3 currently).Well, if such system -does- exists and such customers -do-exist. Anyway legacy I/O method should really be avoided. If some customers happen to do some power business on such system, I''d suggest moving to a typical system since that environment won''t ensure a good power efficiency. It''s not a good base for power tuning. Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 16.08.11 at 10:29, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> From: Jan Beulich [mailto:JBeulich@novell.com] >> Sent: Tuesday, August 16, 2011 4:13 PM >> >> >>> On 16.08.11 at 08:53, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> >> From: Jan Beulich [mailto:JBeulich@novell.com] >> >> Sent: Tuesday, August 16, 2011 2:42 PM >> >> >> >> >>> On 16.08.11 at 08:03, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> >> >> From: Jan Beulich [mailto:JBeulich@novell.com] >> >> >> Sent: Monday, August 15, 2011 6:29 PM >> >> >> >> >> >> that while improving the situation on CPUs that support the break-on- >> >> >> interrupt extension to mwait, it would result in C2/C3 not being usable >> >> >> at all on CPUs that don''t (but support mwait in its simpler form and >> >> >> have ACPI tables specifying FFH as address space id). Is that only a >> >> >> theoretical concern (i.e. is there an implicit guarantee that for other >> >> >> than C1 FFH wouldn''t be specified without that extension being >> >> >> available)? I thinks it''s a practical one, or otherwise there wouldn''t >> >> >> be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC >> >> >> evaluation. >> >> > >> >> > Yes, this is a practical one, though I don''t know any box doing that. In >> > all >> >> > the boxes I''ve been using so far, all the extensions are available. But >> >> > since >> >> > BIOS vendor may also impact the availability of CPUID bits, I think we >> >> > should do it right by strictly conforming to theSDM. I.e. we need check >> >> > CPUID leafs and then verify all Cx states propagated from dom0, instead >> >> > of blindly following its info. Will work a patch for that. >> >> >> >> You''re getting it sort of wrong way round: What I don''t want to do (but >> >> seemingly being necessary) is mimic the decision logic the hypervisor >> >> uses (i.e. require the break-on-interrupt extension for C2/C3 entering >> >> through MWAIT) in Dom0 when deciding about the bits to pass to >> > >> > break-on-interrupt is not a hard requirement to use MWAIT. Even when >> > that extension is not available, MWAIT can be still used to enter C2/C3, >> > just with interrupt enabled. >> >> And that''s why this implementation detail should be confined to the >> hypervisor - Dom0 should not care about this if at all possible. >> >> >> _PDC. That ought to be an implementation detail (subject to change) >> >> in the hypervisor alone. The hypervisor itself, otoh, already properly >> >> checks CPUID leaf 5 (and that''s what might cause it to not use mwait >> >> despite the bit in CPUID leaf 1 being set, which should be all Dom0 >> >> ought to look at for deciding whether to clear ACPI_PDC_C_C2C3_FFH). >> >> >> > >> > I made a mistake, that currently CPUID leaf 5 is already checked in >> > check_cx in hypervisor, so it should be sane. However I still fail to catch >> > your real concern here. :/ >> >> If Dom0 finds (real) CPUID leaf 1 report MWAIT to be available, it >> will (with the logic outlined above) call _PDC without clearing >> ACPI_PDC_C_C2C3_FFH. If now the break-on-interrupt extension >> is not present, but the address space ID for C2 or C3 is set to FFH, >> then Xen (in acpi_processor_ffh_cstate_probe()) will reject the >> Cx entry (and hence refrain from using the respective C-state), >> whereas if Dom0 cleared ACPI_PDC_C_C2C3_FFH in that case, >> firmware would (normally) have converted the address space ID to >> SYSTEM_IO, and hence Xen would have decided to use C2/C3 with >> the SYSIO entry method. >> >> So this is only acceptable if there are *no* production CPUs of any >> vendor that would support MWAIT without the break-on-interrupt >> extension. >> > > yes, that''s also the way that native Linux code currently uses: > - notify BIOS ACPI_PDC_C_C2C3_FFH if cpu has mwait > - reject Cx entry if break-on-interrupt extension is not present later > in acpi processor driver when parsing Cx entries. > > From BIOS ACPI p.o.v, OSPM can notify BIOS about FFH style if following > conditions are true: > a) cpu supports mwait > b) OSPM itself supports mwait > > a) is architectural, but b) is implementation specific regarding to what > can be called "support". Obviously both Xen and Linux here use an > inconsistent way between the place notifying BIOS and the point parsing > ACPI Cx entry. So your conclusion is correct that C2/C3 would be rejected > on the CPU which doesn''t support MWAIT with break-on-interrupt > extension. But it should be fine in the real world, and we may consider > whether to do something when a real case is encountered in the future.Actually, I just checked, and I have two systems that have MWAIT but no extensions to it. While one is an old SDV of yours, the other is a production Dell system (which I only run Windows on, so I can''t really check how Xen would behave on it prior to and with the discussed adjustment). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> From: Jan Beulich [mailto:JBeulich@novell.com] > Sent: Thursday, August 18, 2011 8:36 PM > > >>> On 16.08.11 at 10:29, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> From: Jan Beulich [mailto:JBeulich@novell.com] > >> Sent: Tuesday, August 16, 2011 4:13 PM > >> > >> >>> On 16.08.11 at 08:53, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> >> From: Jan Beulich [mailto:JBeulich@novell.com] > >> >> Sent: Tuesday, August 16, 2011 2:42 PM > >> >> > >> >> >>> On 16.08.11 at 08:03, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> >> >> From: Jan Beulich [mailto:JBeulich@novell.com] > >> >> >> Sent: Monday, August 15, 2011 6:29 PM > >> >> >> > >> >> >> that while improving the situation on CPUs that support the break-on- > >> >> >> interrupt extension to mwait, it would result in C2/C3 not being usable > >> >> >> at all on CPUs that don''t (but support mwait in its simpler form and > >> >> >> have ACPI tables specifying FFH as address space id). Is that only a > >> >> >> theoretical concern (i.e. is there an implicit guarantee that for other > >> >> >> than C1 FFH wouldn''t be specified without that extension being > >> >> >> available)? I thinks it''s a practical one, or otherwise there wouldn''t > >> >> >> be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC > >> >> >> evaluation. > >> >> > > >> >> > Yes, this is a practical one, though I don''t know any box doing that. In > >> > all > >> >> > the boxes I''ve been using so far, all the extensions are available. But > >> >> > since > >> >> > BIOS vendor may also impact the availability of CPUID bits, I think we > >> >> > should do it right by strictly conforming to theSDM. I.e. we need check > >> >> > CPUID leafs and then verify all Cx states propagated from dom0, > instead > >> >> > of blindly following its info. Will work a patch for that. > >> >> > >> >> You''re getting it sort of wrong way round: What I don''t want to do (but > >> >> seemingly being necessary) is mimic the decision logic the hypervisor > >> >> uses (i.e. require the break-on-interrupt extension for C2/C3 entering > >> >> through MWAIT) in Dom0 when deciding about the bits to pass to > >> > > >> > break-on-interrupt is not a hard requirement to use MWAIT. Even when > >> > that extension is not available, MWAIT can be still used to enter C2/C3, > >> > just with interrupt enabled. > >> > >> And that''s why this implementation detail should be confined to the > >> hypervisor - Dom0 should not care about this if at all possible. > >> > >> >> _PDC. That ought to be an implementation detail (subject to change) > >> >> in the hypervisor alone. The hypervisor itself, otoh, already properly > >> >> checks CPUID leaf 5 (and that''s what might cause it to not use mwait > >> >> despite the bit in CPUID leaf 1 being set, which should be all Dom0 > >> >> ought to look at for deciding whether to clear ACPI_PDC_C_C2C3_FFH). > >> >> > >> > > >> > I made a mistake, that currently CPUID leaf 5 is already checked in > >> > check_cx in hypervisor, so it should be sane. However I still fail to catch > >> > your real concern here. :/ > >> > >> If Dom0 finds (real) CPUID leaf 1 report MWAIT to be available, it > >> will (with the logic outlined above) call _PDC without clearing > >> ACPI_PDC_C_C2C3_FFH. If now the break-on-interrupt extension > >> is not present, but the address space ID for C2 or C3 is set to FFH, > >> then Xen (in acpi_processor_ffh_cstate_probe()) will reject the > >> Cx entry (and hence refrain from using the respective C-state), > >> whereas if Dom0 cleared ACPI_PDC_C_C2C3_FFH in that case, > >> firmware would (normally) have converted the address space ID to > >> SYSTEM_IO, and hence Xen would have decided to use C2/C3 with > >> the SYSIO entry method. > >> > >> So this is only acceptable if there are *no* production CPUs of any > >> vendor that would support MWAIT without the break-on-interrupt > >> extension. > >> > > > > yes, that''s also the way that native Linux code currently uses: > > - notify BIOS ACPI_PDC_C_C2C3_FFH if cpu has mwait > > - reject Cx entry if break-on-interrupt extension is not present later > > in acpi processor driver when parsing Cx entries. > > > > From BIOS ACPI p.o.v, OSPM can notify BIOS about FFH style if following > > conditions are true: > > a) cpu supports mwait > > b) OSPM itself supports mwait > > > > a) is architectural, but b) is implementation specific regarding to what > > can be called "support". Obviously both Xen and Linux here use an > > inconsistent way between the place notifying BIOS and the point parsing > > ACPI Cx entry. So your conclusion is correct that C2/C3 would be rejected > > on the CPU which doesn''t support MWAIT with break-on-interrupt > > extension. But it should be fine in the real world, and we may consider > > whether to do something when a real case is encountered in the future. > > Actually, I just checked, and I have two systems that have MWAIT but > no extensions to it. While one is an old SDV of yours, the other is a > production Dell system (which I only run Windows on, so I can''t really > check how Xen would behave on it prior to and with the discussed > adjustment).OK, to avoid the regression, if it''s really cared, then we may change Xen to support entering C-state by mwait with interrupt enabled. But the next question is whether it''s really worthy of enabling Xen PM such way: - I think native Linux only supports mwait with break-on-interrupt extension too. You may confirm on such machines which I think should have no C2/C3 available. It''s less likely for a customer to try PM on a platform where native Linux fails to do that - using mwait with interrupt enabled lacks the trace capability, while w/o trace I don''t think any customer would enable Xen PM w/o a verification process. Another approach, if we really want to keep original I/O style, is to reveal Xen''s mwait related conditions in shared info page, say a simple flag to indicate whether mwait bit should be set by pvops cpuid hook. Xen will check mwait extension earlier before dom0 is launched, instead of current point where dom0 registers cx info. This way there''s no Xen implementation detail encoded in dom0, while concerned regression could be removed. You''re OSV guys. So let''s draw a conclusion from your expertise. If you think such regression is important and should be avoided, and the 2nd approach about share info is reasonable, I''ll go for it. :-) Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 19.08.11 at 03:31, "Tian, Kevin" <kevin.tian@intel.com> wrote: > OK, to avoid the regression, if it''s really cared, then we may change Xen > to support entering C-state by mwait with interrupt enabled.I don''t think that''s worth it.> But the next question is whether it''s really worthy of enabling Xen PM > such way: > - I think native Linux only supports mwait with break-on-interrupt > extension too. You may confirm on such machines which I think should > have no C2/C3 available. It''s less likely for a customer to try PM on a > platform where native Linux fails to do thatLooking at the code, I can''t see why Linux wouldn''t use the I/O method in this case instead.> - using mwait with interrupt enabled lacks the trace capability, > while w/o trace I don''t think any customer would enable Xen PM w/o a > verification process. > > Another approach, if we really want to keep original I/O style, is to > reveal Xen''s mwait related conditions in shared info page, say a simple > flag to indicate whether mwait bit should be set by pvops cpuid hook. > Xen will check mwait extension earlier before dom0 is launched, instead > of current point where dom0 registers cx info. This way there''s no Xen > implementation detail encoded in dom0, while concerned regression > could be removed.The concept sounds reasonable, just that the shared info page probably isn''t the right mechanism (after all this is Dom0-only information that we want to expose). A new platform sub-hypercall would probably be the better route. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> From: Jan Beulich [mailto:JBeulich@novell.com] > Sent: Friday, August 19, 2011 4:11 PM > > >>> On 19.08.11 at 03:31, "Tian, Kevin" <kevin.tian@intel.com> wrote: > > OK, to avoid the regression, if it''s really cared, then we may change Xen > > to support entering C-state by mwait with interrupt enabled. > > I don''t think that''s worth it. > > > But the next question is whether it''s really worthy of enabling Xen PM > > such way: > > - I think native Linux only supports mwait with break-on-interrupt > > extension too. You may confirm on such machines which I think should > > have no C2/C3 available. It''s less likely for a customer to try PM on a > > platform where native Linux fails to do that > > Looking at the code, I can''t see why Linux wouldn''t use the I/O method > in this case instead.acpi_processor_ffh_cstate_probe_cpu: /* mwait ecx extensions INTERRUPT_BREAK should be supported for C2/C3 */ if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) || !(ecx & CPUID5_ECX_INTERRUPT_BREAK)) { retval = -1; goto out; } arch_acpi_set_pdc_bits: /* * If mwait/monitor is unsupported, C2/C3_FFH will be disabled */ if (!cpu_has(c, X86_FEATURE_MWAIT)) buf[2] &= ~(ACPI_PDC_C_C2C3_FFH);> > > - using mwait with interrupt enabled lacks the trace capability, > > while w/o trace I don''t think any customer would enable Xen PM w/o a > > verification process. > > > > Another approach, if we really want to keep original I/O style, is to > > reveal Xen''s mwait related conditions in shared info page, say a simple > > flag to indicate whether mwait bit should be set by pvops cpuid hook. > > Xen will check mwait extension earlier before dom0 is launched, instead > > of current point where dom0 registers cx info. This way there''s no Xen > > implementation detail encoded in dom0, while concerned regression > > could be removed. > > The concept sounds reasonable, just that the shared info page probably > isn''t the right mechanism (after all this is Dom0-only information that > we want to expose). A new platform sub-hypercall would probably be > the better route. >yes, that''s a better choice. Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 19.08.11 at 10:34, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> From: Jan Beulich [mailto:JBeulich@novell.com] >> Sent: Friday, August 19, 2011 4:11 PM >> >> >>> On 19.08.11 at 03:31, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> > OK, to avoid the regression, if it''s really cared, then we may change Xen >> > to support entering C-state by mwait with interrupt enabled. >> >> I don''t think that''s worth it. >> >> > But the next question is whether it''s really worthy of enabling Xen PM >> > such way: >> > - I think native Linux only supports mwait with break-on-interrupt >> > extension too. You may confirm on such machines which I think should >> > have no C2/C3 available. It''s less likely for a customer to try PM on a >> > platform where native Linux fails to do that >> >> Looking at the code, I can''t see why Linux wouldn''t use the I/O method >> in this case instead. > > acpi_processor_ffh_cstate_probe_cpu: > /* mwait ecx extensions INTERRUPT_BREAK should be supported for > C2/C3 */ > if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) || > !(ecx & CPUID5_ECX_INTERRUPT_BREAK)) { > retval = -1; > goto out; > } > > arch_acpi_set_pdc_bits: > /* > * If mwait/monitor is unsupported, C2/C3_FFH will be disabled > */ > if (!cpu_has(c, X86_FEATURE_MWAIT)) > buf[2] &= ~(ACPI_PDC_C_C2C3_FFH); >Right, this precludes the use of MWAIT, but it doesn''t preclude using the I/O method.>> >> > - using mwait with interrupt enabled lacks the trace capability, >> > while w/o trace I don''t think any customer would enable Xen PM w/o a >> > verification process. >> > >> > Another approach, if we really want to keep original I/O style, is to >> > reveal Xen''s mwait related conditions in shared info page, say a simple >> > flag to indicate whether mwait bit should be set by pvops cpuid hook. >> > Xen will check mwait extension earlier before dom0 is launched, instead >> > of current point where dom0 registers cx info. This way there''s no Xen >> > implementation detail encoded in dom0, while concerned regression >> > could be removed. >> >> The concept sounds reasonable, just that the shared info page probably >> isn''t the right mechanism (after all this is Dom0-only information that >> we want to expose). A new platform sub-hypercall would probably be >> the better route. >> > > yes, that''s a better choice.Yet another idea - why don''t we simply pass the buffer passed to arch_acpi_set_pdc_bits() down to Xen, rather than fiddling with the bits in Dom0? That would at once allow to not set ACPI_PDC_T_FFH (which I don''t think Xen really supports at present). Or really, depending on who controls what, the P, C, and T bits should be set by either Dom0 or Xen (so e.g. let Dom0 do what it currently does, and then let Xen override the bits it ought to control). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> From: Jan Beulich [mailto:JBeulich@novell.com] > Sent: Friday, August 19, 2011 5:32 PM > > >>> On 19.08.11 at 10:34, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> From: Jan Beulich [mailto:JBeulich@novell.com] > >> Sent: Friday, August 19, 2011 4:11 PM > >> > >> >>> On 19.08.11 at 03:31, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> > OK, to avoid the regression, if it''s really cared, then we may change Xen > >> > to support entering C-state by mwait with interrupt enabled. > >> > >> I don''t think that''s worth it. > >> > >> > But the next question is whether it''s really worthy of enabling Xen PM > >> > such way: > >> > - I think native Linux only supports mwait with break-on-interrupt > >> > extension too. You may confirm on such machines which I think should > >> > have no C2/C3 available. It''s less likely for a customer to try PM on a > >> > platform where native Linux fails to do that > >> > >> Looking at the code, I can''t see why Linux wouldn''t use the I/O method > >> in this case instead. > > > > acpi_processor_ffh_cstate_probe_cpu: > > /* mwait ecx extensions INTERRUPT_BREAK should be supported > for > > C2/C3 */ > > if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) || > > !(ecx & CPUID5_ECX_INTERRUPT_BREAK)) { > > retval = -1; > > goto out; > > } > > > > arch_acpi_set_pdc_bits: > > /* > > * If mwait/monitor is unsupported, C2/C3_FFH will be disabled > > */ > > if (!cpu_has(c, X86_FEATURE_MWAIT)) > > buf[2] &= ~(ACPI_PDC_C_C2C3_FFH); > > > > Right, this precludes the use of MWAIT, but it doesn''t preclude using > the I/O method.In your special machine which has mwait but no break-on-interrupt extension: arch_acpi_set_pdc_bits tells BIOS that mwait should be used. BIOS then report _CST table with each entry filled with mwait style info later when Linux verifies _CST entry by acpi_processor_ffh_cstate_probe_cpu, it simply fails. No fallback to i/o method as BIOS is only notified in ACPI init time.> > >> > >> > - using mwait with interrupt enabled lacks the trace capability, > >> > while w/o trace I don''t think any customer would enable Xen PM w/o a > >> > verification process. > >> > > >> > Another approach, if we really want to keep original I/O style, is to > >> > reveal Xen''s mwait related conditions in shared info page, say a simple > >> > flag to indicate whether mwait bit should be set by pvops cpuid hook. > >> > Xen will check mwait extension earlier before dom0 is launched, instead > >> > of current point where dom0 registers cx info. This way there''s no Xen > >> > implementation detail encoded in dom0, while concerned regression > >> > could be removed. > >> > >> The concept sounds reasonable, just that the shared info page probably > >> isn''t the right mechanism (after all this is Dom0-only information that > >> we want to expose). A new platform sub-hypercall would probably be > >> the better route. > >> > > > > yes, that''s a better choice. > > Yet another idea - why don''t we simply pass the buffer passed to > arch_acpi_set_pdc_bits() down to Xen, rather than fiddling with the > bits in Dom0? That would at once allow to not set ACPI_PDC_T_FFH > (which I don''t think Xen really supports at present). > > Or really, depending on who controls what, the P, C, and T bits should > be set by either Dom0 or Xen (so e.g. let Dom0 do what it currently > does, and then let Xen override the bits it ought to control)._PDC is encoded in AML language, and requires an ACPI parser which is one thing we avoid in Xen. If Xen want to override those bits, then whole ACPI component needs move down to Xen too. Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 19.08.11 at 11:39, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> From: Jan Beulich [mailto:JBeulich@novell.com] >> Sent: Friday, August 19, 2011 5:32 PM >> >> >>> On 19.08.11 at 10:34, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> >> From: Jan Beulich [mailto:JBeulich@novell.com] >> >> Sent: Friday, August 19, 2011 4:11 PM >> >> >> >> >>> On 19.08.11 at 03:31, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> >> > OK, to avoid the regression, if it''s really cared, then we may change Xen >> >> > to support entering C-state by mwait with interrupt enabled. >> >> >> >> I don''t think that''s worth it. >> >> >> >> > But the next question is whether it''s really worthy of enabling Xen PM >> >> > such way: >> >> > - I think native Linux only supports mwait with break-on-interrupt >> >> > extension too. You may confirm on such machines which I think should >> >> > have no C2/C3 available. It''s less likely for a customer to try PM on a >> >> > platform where native Linux fails to do that >> >> >> >> Looking at the code, I can''t see why Linux wouldn''t use the I/O method >> >> in this case instead. >> > >> > acpi_processor_ffh_cstate_probe_cpu: >> > /* mwait ecx extensions INTERRUPT_BREAK should be supported >> for >> > C2/C3 */ >> > if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) || >> > !(ecx & CPUID5_ECX_INTERRUPT_BREAK)) { >> > retval = -1; >> > goto out; >> > } >> > >> > arch_acpi_set_pdc_bits: >> > /* >> > * If mwait/monitor is unsupported, C2/C3_FFH will be disabled >> > */ >> > if (!cpu_has(c, X86_FEATURE_MWAIT)) >> > buf[2] &= ~(ACPI_PDC_C_C2C3_FFH); >> > >> >> Right, this precludes the use of MWAIT, but it doesn''t preclude using >> the I/O method. > > In your special machine which has mwait but no break-on-interrupt extension: > > arch_acpi_set_pdc_bits tells BIOS that mwait should be used. > > BIOS then report _CST table with each entry filled with mwait style info > > later when Linux verifies _CST entry by acpi_processor_ffh_cstate_probe_cpu, > it simply fails. No fallback to i/o method as BIOS is only notified in ACPI > init time.Oh, right, I just stumbled across that too (and didn''t pay close enough attention before). The function really should be checking for the extension bits.>> >> >> >> >> > - using mwait with interrupt enabled lacks the trace capability, >> >> > while w/o trace I don''t think any customer would enable Xen PM w/o a >> >> > verification process. >> >> > >> >> > Another approach, if we really want to keep original I/O style, is to >> >> > reveal Xen''s mwait related conditions in shared info page, say a simple >> >> > flag to indicate whether mwait bit should be set by pvops cpuid hook. >> >> > Xen will check mwait extension earlier before dom0 is launched, instead >> >> > of current point where dom0 registers cx info. This way there''s no Xen >> >> > implementation detail encoded in dom0, while concerned regression >> >> > could be removed. >> >> >> >> The concept sounds reasonable, just that the shared info page probably >> >> isn''t the right mechanism (after all this is Dom0-only information that >> >> we want to expose). A new platform sub-hypercall would probably be >> >> the better route. >> >> >> > >> > yes, that''s a better choice. >> >> Yet another idea - why don''t we simply pass the buffer passed to >> arch_acpi_set_pdc_bits() down to Xen, rather than fiddling with the >> bits in Dom0? That would at once allow to not set ACPI_PDC_T_FFH >> (which I don''t think Xen really supports at present). >> >> Or really, depending on who controls what, the P, C, and T bits should >> be set by either Dom0 or Xen (so e.g. let Dom0 do what it currently >> does, and then let Xen override the bits it ought to control). > > _PDC is encoded in AML language, and requires an ACPI parser which > is one thing we avoid in Xen. If Xen want to override those bits, then > whole ACPI component needs move down to Xen too.No, I''m not saying the evaluation should be happening there. Below is a draft hypervisor patch (only compile tested so far). Jan --- a/xen/arch/ia64/linux-xen/acpi.c +++ b/xen/arch/ia64/linux-xen/acpi.c @@ -241,6 +241,12 @@ int get_cpu_id(u32 acpi_id) return -1; } + +int arch_acpi_set_pdc_bits(u32 *pdc, u32 mask) +{ + pdc[2] |= ACPI_PDC_EST_CAPABILITY_SMP & mask; +} + #endif static int __init --- a/xen/arch/x86/acpi/cpu_idle.c +++ b/xen/arch/x86/acpi/cpu_idle.c @@ -643,12 +643,6 @@ static int cpuidle_init_cpu(int cpu) return 0; } -#define CPUID_MWAIT_LEAF (5) -#define CPUID5_ECX_EXTENSIONS_SUPPORTED (0x1) -#define CPUID5_ECX_INTERRUPT_BREAK (0x2) - -#define MWAIT_ECX_INTERRUPT_BREAK (0x1) - #define MWAIT_SUBSTATE_MASK (0xf) #define MWAIT_SUBSTATE_SIZE (4) --- a/xen/arch/x86/acpi/lib.c +++ b/xen/arch/x86/acpi/lib.c @@ -81,3 +81,31 @@ unsigned int acpi_get_processor_id(unsig return INVALID_ACPIID; } + +int arch_acpi_set_pdc_bits(u32 *pdc, u32 mask) +{ + u32 ecx; + + pdc[2] |= ACPI_PDC_C_CAPABILITY_SMP & mask; + + if (boot_cpu_has(X86_FEATURE_EST)) + pdc[2] |= ACPI_PDC_EST_CAPABILITY_SWSMP & mask; + + if (boot_cpu_has(X86_FEATURE_ACPI)) + pdc[2] |= ACPI_PDC_T_FFH & mask; + + /* + * If mwait/monitor or its break-on-interrupt extension are + * unsupported, Cx_FFH will be disabled. + */ + if (boot_cpu_has(X86_FEATURE_MWAIT) && + boot_cpu_data.cpuid_level >= CPUID_MWAIT_LEAF) + ecx = cpuid_ecx(CPUID_MWAIT_LEAF); + else + ecx = 0; + if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) || + !(ecx & CPUID5_ECX_INTERRUPT_BREAK)) + pdc[2] &= ~(ACPI_PDC_C_C1_FFH | ACPI_PDC_C_C2C3_FFH); + + return 0; +} --- a/xen/arch/x86/platform_hypercall.c +++ b/xen/arch/x86/platform_hypercall.c @@ -416,6 +416,18 @@ ret_t do_platform_op(XEN_GUEST_HANDLE(xe ret = -EINVAL; break; + case XEN_PM_PDC: + if ( op->u.set_pminfo.id + 1 ) + ret = -EINVAL; + else + { + XEN_GUEST_HANDLE(uint32) pdc; + + guest_from_compat_handle(pdc, op->u.set_pminfo.u.pdc); + ret = acpi_set_pdc_bits(pdc); + } + break; + default: ret = -EINVAL; break; --- a/xen/drivers/acpi/pmstat.c +++ b/xen/drivers/acpi/pmstat.c @@ -517,3 +517,37 @@ int do_pm_op(struct xen_sysctl_pm_op *op return ret; } + +int acpi_set_pdc_bits(XEN_GUEST_HANDLE(uint32) pdc) +{ + u32 bits[3]; + int ret; + + if ( copy_from_guest(bits, pdc, 2) ) + ret = -EFAULT; + else if ( bits[0] != ACPI_PDC_REVISION_ID || !bits[1] ) + ret = -EINVAL; + else if ( copy_from_guest_offset(bits + 2, pdc, 2, 1) ) + ret = -EFAULT; + else + { + u32 mask = 0; + + if ( xen_processor_pmbits & XEN_PROCESSOR_PM_CX ) + mask |= ACPI_PDC_C_BITS | ACPI_PDC_SMP_C1PT; + if ( xen_processor_pmbits & XEN_PROCESSOR_PM_PX ) + mask |= ACPI_PDC_P_BITS | ACPI_PDC_SMP_C1PT; + if ( xen_processor_pmbits & XEN_PROCESSOR_PM_TX ) + mask |= ACPI_PDC_T_BITS | ACPI_PDC_SMP_C1PT; +printk("PDC: %04x (mask %04x)", bits[2], mask);//temp + bits[2] &= ACPI_PDC_C_BITS & ACPI_PDC_P_BITS & ACPI_PDC_T_BITS & + ACPI_PDC_SMP_C1PT & ~mask; +printk(" -> %04x", bits[2]);//temp + ret = arch_acpi_set_pdc_bits(bits, mask); +printk(" -> %04x (%d)\n", bits[2], ret);//temp + } + if ( !ret ) + ret = copy_to_guest_offset(pdc, 2, bits + 2, 1); + + return ret; +} --- a/xen/include/acpi/pdc_intel.h +++ b/xen/include/acpi/pdc_intel.h @@ -4,6 +4,8 @@ #ifndef __PDC_INTEL_H__ #define __PDC_INTEL_H__ +#define ACPI_PDC_REVISION_ID 1 + #define ACPI_PDC_P_FFH (0x0001) #define ACPI_PDC_C_C1_HALT (0x0002) #define ACPI_PDC_T_FFH (0x0004) @@ -14,6 +16,7 @@ #define ACPI_PDC_SMP_T_SWCOORD (0x0080) #define ACPI_PDC_C_C1_FFH (0x0100) #define ACPI_PDC_C_C2C3_FFH (0x0200) +#define ACPI_PDC_SMP_P_HWCOORD (0x0800) #define ACPI_PDC_EST_CAPABILITY_SMP (ACPI_PDC_SMP_C1PT | \ ACPI_PDC_C_C1_HALT | \ @@ -22,6 +25,7 @@ #define ACPI_PDC_EST_CAPABILITY_SWSMP (ACPI_PDC_SMP_C1PT | \ ACPI_PDC_C_C1_HALT | \ ACPI_PDC_SMP_P_SWCOORD | \ + ACPI_PDC_SMP_P_HWCOORD | \ ACPI_PDC_P_FFH) #define ACPI_PDC_C_CAPABILITY_SMP (ACPI_PDC_SMP_C2C3 | \ @@ -30,4 +34,17 @@ ACPI_PDC_C_C1_FFH | \ ACPI_PDC_C_C2C3_FFH) +#define ACPI_PDC_C_BITS (ACPI_PDC_C_C1_HALT | \ + ACPI_PDC_C_C1_FFH | \ + ACPI_PDC_SMP_C2C3 | \ + ACPI_PDC_SMP_C_SWCOORD | \ + ACPI_PDC_C_C2C3_FFH) + +#define ACPI_PDC_P_BITS (ACPI_PDC_P_FFH | \ + ACPI_PDC_SMP_P_SWCOORD | \ + ACPI_PDC_SMP_P_HWCOORD) + +#define ACPI_PDC_T_BITS (ACPI_PDC_T_FFH | \ + ACPI_PDC_SMP_T_SWCOORD) + #endif /* __PDC_INTEL_H__ */ --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -152,6 +152,10 @@ #define boot_cpu_has(bit) test_bit(bit, boot_cpu_data.x86_capability) #define cpufeat_mask(idx) (1u << ((idx) & 31)) +#define CPUID_MWAIT_LEAF 5 +#define CPUID5_ECX_EXTENSIONS_SUPPORTED 0x1 +#define CPUID5_ECX_INTERRUPT_BREAK 0x2 + #ifdef __i386__ #define cpu_has_vme boot_cpu_has(X86_FEATURE_VME) #define cpu_has_de boot_cpu_has(X86_FEATURE_DE) --- a/xen/include/public/platform.h +++ b/xen/include/public/platform.h @@ -304,6 +304,7 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_getidletim #define XEN_PM_CX 0 #define XEN_PM_PX 1 #define XEN_PM_TX 2 +#define XEN_PM_PDC 3 /* Px sub info type */ #define XEN_PX_PCT 1 @@ -401,6 +402,7 @@ struct xenpf_set_processor_pminfo { union { struct xen_processor_power power;/* Cx: _CST/_CSD */ struct xen_processor_performance perf; /* Px: _PPC/_PCT/_PSS/_PSD */ + XEN_GUEST_HANDLE(uint32) pdc; /* _PDC */ } u; }; typedef struct xenpf_set_processor_pminfo xenpf_set_processor_pminfo_t; --- a/xen/include/xen/acpi.h +++ b/xen/include/xen/acpi.h @@ -358,6 +358,9 @@ static inline unsigned int acpi_get_csta static inline void acpi_set_cstate_limit(unsigned int new_limit) { return; } #endif +int acpi_set_pdc_bits(XEN_GUEST_HANDLE(uint32)); +int arch_acpi_set_pdc_bits(u32 *, u32 mask); + #ifdef CONFIG_ACPI_NUMA int acpi_get_pxm(acpi_handle handle); #else _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 19.08.11 at 14:55, Jan Beulich wrote: > >>> On 19.08.11 at 11:39, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> From: Jan Beulich [mailto:JBeulich@novell.com] > >> Sent: Friday, August 19, 2011 5:32 PM > >> > >> >>> On 19.08.11 at 10:34, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> >> From: Jan Beulich [mailto:JBeulich@novell.com] > >> >> Sent: Friday, August 19, 2011 4:11 PM > >> >> > >> >> >>> On 19.08.11 at 03:31, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> >> > OK, to avoid the regression, if it''s really cared, then we may change Xen > >> >> > to support entering C-state by mwait with interrupt enabled. > >> >> > >> >> I don''t think that''s worth it. > >> >> > >> >> > But the next question is whether it''s really worthy of enabling Xen PM > >> >> > such way: > >> >> > - I think native Linux only supports mwait with break-on-interrupt > >> >> > extension too. You may confirm on such machines which I think should > >> >> > have no C2/C3 available. It''s less likely for a customer to try PM on a > >> >> > platform where native Linux fails to do that > >> >> > >> >> Looking at the code, I can''t see why Linux wouldn''t use the I/O method > >> >> in this case instead. > >> > > >> > acpi_processor_ffh_cstate_probe_cpu: > >> > /* mwait ecx extensions INTERRUPT_BREAK should be supported > >> for > >> > C2/C3 */ > >> > if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) || > >> > !(ecx & CPUID5_ECX_INTERRUPT_BREAK)) { > >> > retval = -1; > >> > goto out; > >> > } > >> > > >> > arch_acpi_set_pdc_bits: > >> > /* > >> > * If mwait/monitor is unsupported, C2/C3_FFH will be disabled > >> > */ > >> > if (!cpu_has(c, X86_FEATURE_MWAIT)) > >> > buf[2] &= ~(ACPI_PDC_C_C2C3_FFH); > >> > > >> > >> Right, this precludes the use of MWAIT, but it doesn''t preclude using > >> the I/O method. > > > > In your special machine which has mwait but no break-on-interrupt extension: > > > > arch_acpi_set_pdc_bits tells BIOS that mwait should be used. > > > > BIOS then report _CST table with each entry filled with mwait style info > > > > later when Linux verifies _CST entry by acpi_processor_ffh_cstate_probe_cpu, > > it simply fails. No fallback to i/o method as BIOS is only notified in ACPI > > init time. > > Oh, right, I just stumbled across that too (and didn''t pay close enough > attention before). The function really should be checking for the > extension bits. > > >> > >> >> > >> >> > - using mwait with interrupt enabled lacks the trace capability, > >> >> > while w/o trace I don''t think any customer would enable Xen PM w/o a > >> >> > verification process. > >> >> > > >> >> > Another approach, if we really want to keep original I/O style, is to > >> >> > reveal Xen''s mwait related conditions in shared info page, say a simple > >> >> > flag to indicate whether mwait bit should be set by pvops cpuid hook. > >> >> > Xen will check mwait extension earlier before dom0 is launched, instead > >> >> > of current point where dom0 registers cx info. This way there''s no Xen > >> >> > implementation detail encoded in dom0, while concerned regression > >> >> > could be removed. > >> >> > >> >> The concept sounds reasonable, just that the shared info page probably > >> >> isn''t the right mechanism (after all this is Dom0-only information that > >> >> we want to expose). A new platform sub-hypercall would probably be > >> >> the better route. > >> >> > >> > > >> > yes, that''s a better choice. > >> > >> Yet another idea - why don''t we simply pass the buffer passed to > >> arch_acpi_set_pdc_bits() down to Xen, rather than fiddling with the > >> bits in Dom0? That would at once allow to not set ACPI_PDC_T_FFH > >> (which I don''t think Xen really supports at present). > >> > >> Or really, depending on who controls what, the P, C, and T bits should > >> be set by either Dom0 or Xen (so e.g. let Dom0 do what it currently > >> does, and then let Xen override the bits it ought to control). > > > > _PDC is encoded in AML language, and requires an ACPI parser which > > is one thing we avoid in Xen. If Xen want to override those bits, then > > whole ACPI component needs move down to Xen too. > > No, I''m not saying the evaluation should be happening there. Below is > a draft hypervisor patch (only compile tested so far).Attached a patch that actually works (with a minimal Dom0 addition). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> From: Jan Beulich [mailto:JBeulich@novell.com] > Sent: Friday, August 19, 2011 11:02 PM > > >> Yet another idea - why don''t we simply pass the buffer passed to > > >> arch_acpi_set_pdc_bits() down to Xen, rather than fiddling with the > > >> bits in Dom0? That would at once allow to not set ACPI_PDC_T_FFH > > >> (which I don''t think Xen really supports at present). > > >> > > >> Or really, depending on who controls what, the P, C, and T bits should > > >> be set by either Dom0 or Xen (so e.g. let Dom0 do what it currently > > >> does, and then let Xen override the bits it ought to control). > > > > > > _PDC is encoded in AML language, and requires an ACPI parser which > > > is one thing we avoid in Xen. If Xen want to override those bits, then > > > whole ACPI component needs move down to Xen too. > > > > No, I''m not saying the evaluation should be happening there. Below is > > a draft hypervisor patch (only compile tested so far). > > Attached a patch that actually works (with a minimal Dom0 addition). >yes, this change looks more straightforward. :-) Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 21.08.11 at 07:26, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> From: Jan Beulich [mailto:JBeulich@novell.com] >> Sent: Friday, August 19, 2011 11:02 PM >> > >> Yet another idea - why don''t we simply pass the buffer passed to >> > >> arch_acpi_set_pdc_bits() down to Xen, rather than fiddling with the >> > >> bits in Dom0? That would at once allow to not set ACPI_PDC_T_FFH >> > >> (which I don''t think Xen really supports at present). >> > >> >> > >> Or really, depending on who controls what, the P, C, and T bits should >> > >> be set by either Dom0 or Xen (so e.g. let Dom0 do what it currently >> > >> does, and then let Xen override the bits it ought to control). >> > > >> > > _PDC is encoded in AML language, and requires an ACPI parser which >> > > is one thing we avoid in Xen. If Xen want to override those bits, then >> > > whole ACPI component needs move down to Xen too. >> > >> > No, I''m not saying the evaluation should be happening there. Below is >> > a draft hypervisor patch (only compile tested so far). >> >> Attached a patch that actually works (with a minimal Dom0 addition). >> > > yes, this change looks more straightforward. :-)With that in, we still have more deficiencies compared to native Linux. For one, we don''t use mwait when ACPI doesn''t tell us to, while Linux does (in the intel_idle driver for deeper C-states, and for C1 also via mwait_idle()). This is likely a bit more work, but it should be possible to construct C-state information from CPUID leaf 5 (and, if valid, ignore information passed down from Dom0), which would match intel_idle''s taking precedence over acpi_idle in Linux. Second, if only C1 gets announced by ACPI, we end up not using it because Dom0 simply neglects to let the hypervisor know. This is because acpi_processor_get_power_info_cst() (back to at least 2.6.16) returns -EFAULT if less than two C-states were found. Simply prefixing the check with "!processor_pm_external() && " fixes this (but I don''t know whether something similar could be done in Jeremy''s tree). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> From: Jan Beulich [mailto:JBeulich@novell.com] > Sent: Thursday, August 25, 2011 8:37 PM > > >>> On 21.08.11 at 07:26, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> From: Jan Beulich [mailto:JBeulich@novell.com] > >> Sent: Friday, August 19, 2011 11:02 PM > >> > >> Yet another idea - why don''t we simply pass the buffer passed to > >> > >> arch_acpi_set_pdc_bits() down to Xen, rather than fiddling with the > >> > >> bits in Dom0? That would at once allow to not set ACPI_PDC_T_FFH > >> > >> (which I don''t think Xen really supports at present). > >> > >> > >> > >> Or really, depending on who controls what, the P, C, and T bits should > >> > >> be set by either Dom0 or Xen (so e.g. let Dom0 do what it currently > >> > >> does, and then let Xen override the bits it ought to control). > >> > > > >> > > _PDC is encoded in AML language, and requires an ACPI parser which > >> > > is one thing we avoid in Xen. If Xen want to override those bits, then > >> > > whole ACPI component needs move down to Xen too. > >> > > >> > No, I''m not saying the evaluation should be happening there. Below is > >> > a draft hypervisor patch (only compile tested so far). > >> > >> Attached a patch that actually works (with a minimal Dom0 addition). > >> > > > > yes, this change looks more straightforward. :-) > > With that in, we still have more deficiencies compared to native Linux.definitely there''ll be even more than what''s revealed today, due to the way that dom0 ACPI processor driver is tightly bound. there''re lots of factors in dom0 itself which may impact the verification/filtering on Cx entries provide by BIOS, while some of which should be avoided from Xen p.o.v, such as the 2nd example you just found. The more severe is that to work around those factors adds intrusive Xen awareness into generic ACPI processor driver, e.g. @@ -780,7 +780,7 @@ static int acpi_processor_get_power_info current_count)); /* Validate number of power states discovered */ - if (current_count < 2) + if (current_count < 1 + !processor_pm_external()) status = -EFAULT; end: More changes like above are added, less possibilities for Xen PM changes to be accepted into upstream. Also such specific changes made on one dom0 version may be invalid in a new version quickly. Above change is one example which doesn''t hold true in newer kernel. When working with Konrad on rebasing xen PM patches to latest Linux 3.0.0. we tried hard to avoid intrusive changes in generic ACPI processor driver, by trying to invoke existing interfaces in higher level as possible. The end result is that we skip handling those corner cases like above example for now, by at least making Xen PM working on majority boxes. Later after Xen PM is accepted upstream with more Xen awareness in Linux ACPI people, those corner cases handling may be improved gradually. Another option Yang currently is working on is to port native intel-idle driver to Xen, which should avoid nasty dependency on dom0 ACPI bits and immune to various BIOS bugs.> > For one, we don''t use mwait when ACPI doesn''t tell us to, while Linux > does (in the intel_idle driver for deeper C-states, and for C1 also via > mwait_idle()). This is likely a bit more work, but it should be possible to > construct C-state information from CPUID leaf 5 (and, if valid, ignore > information passed down from Dom0), which would match intel_idle''s > taking precedence over acpi_idle in Linux.yes. This should be a desired feature in Xen, with some limitations: - not work with CPU hotplug - not work with old boxes (starting from Nehalem) - not work with Px/Cx state changes (_PPC, _CST e.g. from Node Manager) So this will be a supplemented option to existing acpi_idle, and should work on most cases when above 3 factors are not concerned.> > Second, if only C1 gets announced by ACPI, we end up not using it > because Dom0 simply neglects to let the hypervisor know. This is > because acpi_processor_get_power_info_cst() (back to at least > 2.6.16) returns -EFAULT if less than two C-states were found. Simply > prefixing the check with "!processor_pm_external() && " fixes this > (but I don''t know whether something similar could be done in Jeremy''s > tree).this is a very temporary problem which disappears quickly in subsequent versions. But if just taking 2.6.18-xen, it''s a right fix. Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 26.08.11 at 04:18, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> From: Jan Beulich [mailto:JBeulich@novell.com] >> Sent: Thursday, August 25, 2011 8:37 PM >> >> >>> On 21.08.11 at 07:26, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> >> From: Jan Beulich [mailto:JBeulich@novell.com] >> >> Sent: Friday, August 19, 2011 11:02 PM >> >> > >> Yet another idea - why don''t we simply pass the buffer passed to >> >> > >> arch_acpi_set_pdc_bits() down to Xen, rather than fiddling with the >> >> > >> bits in Dom0? That would at once allow to not set ACPI_PDC_T_FFH >> >> > >> (which I don''t think Xen really supports at present). >> >> > >> >> >> > >> Or really, depending on who controls what, the P, C, and T bits should >> >> > >> be set by either Dom0 or Xen (so e.g. let Dom0 do what it currently >> >> > >> does, and then let Xen override the bits it ought to control). >> >> > > >> >> > > _PDC is encoded in AML language, and requires an ACPI parser which >> >> > > is one thing we avoid in Xen. If Xen want to override those bits, then >> >> > > whole ACPI component needs move down to Xen too. >> >> > >> >> > No, I''m not saying the evaluation should be happening there. Below is >> >> > a draft hypervisor patch (only compile tested so far). >> >> >> >> Attached a patch that actually works (with a minimal Dom0 addition). >> >> >> > >> > yes, this change looks more straightforward. :-) >> >> With that in, we still have more deficiencies compared to native Linux. > > definitely there''ll be even more than what''s revealed today, due to the > way that dom0 ACPI processor driver is tightly bound. there''re lots of > factors in dom0 itself which may impact the verification/filtering on > Cx entries provide by BIOS, while some of which should be avoided from > Xen p.o.v, such as the 2nd example you just found. The more severe is > that to work around those factors adds intrusive Xen awareness into > generic ACPI processor driver, e.g. > > @@ -780,7 +780,7 @@ static int acpi_processor_get_power_info > current_count)); > > /* Validate number of power states discovered */ > - if (current_count < 2) > + if (current_count < 1 + !processor_pm_external()) > status = -EFAULT; > > end: > > More changes like above are added, less possibilities for Xen PM > changes to be accepted into upstream. Also such specific changes > made on one dom0 version may be invalid in a new version quickly. > Above change is one example which doesn''t hold true in newer > kernel.Afaict, the code is unchanged up to at least 3.0, and requires the same adjustment (at least for the non-pvops case; the pvops one clearly can''t be reasonably viewed from any post-2.6.32 perspective).> When working with Konrad on rebasing xen PM patches to latest > Linux 3.0.0. we tried hard to avoid intrusive changes in generic > ACPI processor driver, by trying to invoke existing interfaces in > higher level as possible. The end result is that we skip handling > those corner cases like above example for now, by at least making > Xen PM working on majority boxes. Later after Xen PM is accepted > upstream with more Xen awareness in Linux ACPI people, those > corner cases handling may be improved gradually. > > Another option Yang currently is working on is to port native intel-idle > driver to Xen, which should avoid nasty dependency on dom0 ACPI > bits and immune to various BIOS bugs.That''s good to hear.>> For one, we don''t use mwait when ACPI doesn''t tell us to, while Linux >> does (in the intel_idle driver for deeper C-states, and for C1 also via >> mwait_idle()). This is likely a bit more work, but it should be possible to >> construct C-state information from CPUID leaf 5 (and, if valid, ignore >> information passed down from Dom0), which would match intel_idle''s >> taking precedence over acpi_idle in Linux. > > yes. This should be a desired feature in Xen, with some limitations: > - not work with CPU hotplug > - not work with old boxes (starting from Nehalem) > - not work with Px/Cx state changes (_PPC, _CST e.g. from Node Manager) > > So this will be a supplemented option to existing acpi_idle, and should > work on most cases when above 3 factors are not concerned. > >> >> Second, if only C1 gets announced by ACPI, we end up not using it >> because Dom0 simply neglects to let the hypervisor know. This is >> because acpi_processor_get_power_info_cst() (back to at least >> 2.6.16) returns -EFAULT if less than two C-states were found. Simply >> prefixing the check with "!processor_pm_external() && " fixes this >> (but I don''t know whether something similar could be done in Jeremy''s >> tree). > > this is a very temporary problem which disappears quickly in subsequent > versions. But if just taking 2.6.18-xen, it''s a right fix.Again - when did you see this disappear? Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel