Stefan Bader
2013-Jan-14 15:58 UTC
kernel 3.7+ cpufreq regression on AMD system running as dom0
Starting with kernel v3.7 the following commit added a quirk to obtain the real frequencies of certain AMD systems: commit f594065faf4f9067c2283a34619fc0714e79a98d Author: Matthew Garrett <mjg@redhat.com> Date: Tue Sep 4 08:28:06 2012 +0000 ACPI: Add fixups for AMD P-state figures When running bare-metal, on my Opteron 6128 test box results in the frequencies remaining effectively unchanged: [ 5.475735] P0: MSR(hi,lo): 8000015c-50004004 [ 5.479049] P0: fid=0x4, did=0x0, freq: 2000 -> 2000 [ 5.484001] P1: MSR(hi,lo): 8000014c-50004a4e [ 5.487314] P1: fid=0xe, did=0x1, freq: 1500 -> 1500 [ 5.492272] P2: MSR(hi,lo): 80000141-50005048 [ 5.495584] P2: fid=0x8, did=0x1, freq: 1200 -> 1200 [ 5.500540] P3: MSR(hi,lo): 80000138-50005844 [ 5.503853] P3: fid=0x4, did=0x1, freq: 1000 -> 1000 [ 5.508812] P4: MSR(hi,lo): 80000131-50005c40 [ 5.512125] P4: fid=0x0, did=0x1, freq: 800 -> 800 However running as dom0 under Xen 4.2, reading this MSR returns null: [ 11.613068] P0: MSR(hi,lo): 00000000-00000000 [ 11.613074] P0: fid=0x0, did=0x0, freq: 2000 -> 1600 [ 11.613078] P1: MSR(hi,lo): 00000000-00000000 [ 11.613081] P1: fid=0x0, did=0x0, freq: 1500 -> 1600 [ 11.613085] P2: MSR(hi,lo): 00000000-00000000 [ 11.613088] P2: fid=0x0, did=0x0, freq: 1200 -> 1600 [ 11.613091] P3: MSR(hi,lo): 00000000-00000000 [ 11.613094] P3: fid=0x0, did=0x0, freq: 1000 -> 1600 [ 11.613098] P4: MSR(hi,lo): 00000000-00000000 [ 11.613101] P4: fid=0x0, did=0x0, freq: 800 -> 1600 And this results in Xen failing to change the governor: "(XEN) Fail change to ondemand governor" I suppose this ultimately requires some support in the hypervisor to pass through the real values. But since this is at least on my combination of Xen 4.2 + kernel v3.7+ and AMD family 0x10 CPU a regression compared to older kernels, I wonder whether the following change might be something that should go into mainline: --- a/drivers/acpi/processor_perflib.c +++ b/drivers/acpi/processor_perflib.c @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) || boot_cpu_data.x86 == 0x11) { rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); + /* Bit 63 indicates whether contents are valid */ + if (!(hi & 0x8000000)) + return; fid = lo & 0x3f; did = (lo >> 6) & 7; if (boot_cpu_data.x86 == 0x10) I tested something similar (so hopefully I have not failed on slapping together a cleaned up version), which did resolve the problem. -Stefan Reference: https://bugs.launchpad.net/bugs/1078619
Borislav Petkov
2013-Jan-14 16:34 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote:> Starting with kernel v3.7 the following commit added a quirk > to obtain the real frequencies of certain AMD systems: > > commit f594065faf4f9067c2283a34619fc0714e79a98d > Author: Matthew Garrett <mjg@redhat.com> > Date: Tue Sep 4 08:28:06 2012 +0000 > > ACPI: Add fixups for AMD P-state figures > > When running bare-metal, on my Opteron 6128 test box results > in the frequencies remaining effectively unchanged: > [ 5.475735] P0: MSR(hi,lo): 8000015c-50004004 > [ 5.479049] P0: fid=0x4, did=0x0, freq: 2000 -> 2000 > [ 5.484001] P1: MSR(hi,lo): 8000014c-50004a4e > [ 5.487314] P1: fid=0xe, did=0x1, freq: 1500 -> 1500 > [ 5.492272] P2: MSR(hi,lo): 80000141-50005048 > [ 5.495584] P2: fid=0x8, did=0x1, freq: 1200 -> 1200 > [ 5.500540] P3: MSR(hi,lo): 80000138-50005844 > [ 5.503853] P3: fid=0x4, did=0x1, freq: 1000 -> 1000 > [ 5.508812] P4: MSR(hi,lo): 80000131-50005c40 > [ 5.512125] P4: fid=0x0, did=0x1, freq: 800 -> 800 > > However running as dom0 under Xen 4.2, reading this MSR returns > null: > [ 11.613068] P0: MSR(hi,lo): 00000000-00000000 > [ 11.613074] P0: fid=0x0, did=0x0, freq: 2000 -> 1600 > [ 11.613078] P1: MSR(hi,lo): 00000000-00000000 > [ 11.613081] P1: fid=0x0, did=0x0, freq: 1500 -> 1600 > [ 11.613085] P2: MSR(hi,lo): 00000000-00000000 > [ 11.613088] P2: fid=0x0, did=0x0, freq: 1200 -> 1600 > [ 11.613091] P3: MSR(hi,lo): 00000000-00000000 > [ 11.613094] P3: fid=0x0, did=0x0, freq: 1000 -> 1600 > [ 11.613098] P4: MSR(hi,lo): 00000000-00000000 > [ 11.613101] P4: fid=0x0, did=0x0, freq: 800 -> 1600 > > And this results in Xen failing to change the governor: > "(XEN) Fail change to ondemand governor" > > I suppose this ultimately requires some support in the hypervisor > to pass through the real values. But since this is at least on my > combination of Xen 4.2 + kernel v3.7+ and AMD family 0x10 CPU a > regression compared to older kernels, I wonder whether the following > change might be something that should go into mainline: > > --- a/drivers/acpi/processor_perflib.c > +++ b/drivers/acpi/processor_perflib.c > @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px > if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) > || boot_cpu_data.x86 == 0x11) { > rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); > + /* Bit 63 indicates whether contents are valid */ > + if (!(hi & 0x8000000)) > + return;I don''t think that''s the right change - this is fixing baremetal so that it works on xen. And besides, this code was in powernow-k8 before so I''m wondering why did it work then.> fid = lo & 0x3f; > did = (lo >> 6) & 7; > if (boot_cpu_data.x86 == 0x10) > > I tested something similar (so hopefully I have not failed on slapping > together a cleaned up version), which did resolve the problem. > > -Stefan > > Reference: https://bugs.launchpad.net/bugs/1078619Adding the new Andre. :-) Andre, what did we do for powernow-k8 on xen so that the F10h 50MHz steps quirk would work? Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. --
Jan Beulich
2013-Jan-14 16:55 UTC
Re: [Xen-devel] kernel 3.7+ cpufreq regression on AMD system running as dom0
>>> On 14.01.13 at 17:34, Borislav Petkov <bp@alien8.de> wrote: > On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote: >> --- a/drivers/acpi/processor_perflib.c >> +++ b/drivers/acpi/processor_perflib.c >> @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px > *px >> if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) >> || boot_cpu_data.x86 == 0x11) { >> rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); >> + /* Bit 63 indicates whether contents are valid */ >> + if (!(hi & 0x8000000)) >> + return; > > I don''t think that''s the right change - this is fixing baremetal so that > it works on xen. And besides, this code was in powernow-k8 before so I''m > wondering why did it work then.Because the driver doesn''t get loaded in that case? Jan
Stefan Bader
2013-Jan-14 17:08 UTC
Re: [Xen-devel] kernel 3.7+ cpufreq regression on AMD system running as dom0
On 14.01.2013 17:34, Borislav Petkov wrote:> On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote: >> Starting with kernel v3.7 the following commit added a quirk >> to obtain the real frequencies of certain AMD systems: >> >> commit f594065faf4f9067c2283a34619fc0714e79a98d >> Author: Matthew Garrett <mjg@redhat.com> >> Date: Tue Sep 4 08:28:06 2012 +0000 >> >> ACPI: Add fixups for AMD P-state figures >> >> When running bare-metal, on my Opteron 6128 test box results >> in the frequencies remaining effectively unchanged: >> [ 5.475735] P0: MSR(hi,lo): 8000015c-50004004 >> [ 5.479049] P0: fid=0x4, did=0x0, freq: 2000 -> 2000 >> [ 5.484001] P1: MSR(hi,lo): 8000014c-50004a4e >> [ 5.487314] P1: fid=0xe, did=0x1, freq: 1500 -> 1500 >> [ 5.492272] P2: MSR(hi,lo): 80000141-50005048 >> [ 5.495584] P2: fid=0x8, did=0x1, freq: 1200 -> 1200 >> [ 5.500540] P3: MSR(hi,lo): 80000138-50005844 >> [ 5.503853] P3: fid=0x4, did=0x1, freq: 1000 -> 1000 >> [ 5.508812] P4: MSR(hi,lo): 80000131-50005c40 >> [ 5.512125] P4: fid=0x0, did=0x1, freq: 800 -> 800 >> >> However running as dom0 under Xen 4.2, reading this MSR returns >> null: >> [ 11.613068] P0: MSR(hi,lo): 00000000-00000000 >> [ 11.613074] P0: fid=0x0, did=0x0, freq: 2000 -> 1600 >> [ 11.613078] P1: MSR(hi,lo): 00000000-00000000 >> [ 11.613081] P1: fid=0x0, did=0x0, freq: 1500 -> 1600 >> [ 11.613085] P2: MSR(hi,lo): 00000000-00000000 >> [ 11.613088] P2: fid=0x0, did=0x0, freq: 1200 -> 1600 >> [ 11.613091] P3: MSR(hi,lo): 00000000-00000000 >> [ 11.613094] P3: fid=0x0, did=0x0, freq: 1000 -> 1600 >> [ 11.613098] P4: MSR(hi,lo): 00000000-00000000 >> [ 11.613101] P4: fid=0x0, did=0x0, freq: 800 -> 1600 >> >> And this results in Xen failing to change the governor: >> "(XEN) Fail change to ondemand governor" >> >> I suppose this ultimately requires some support in the hypervisor >> to pass through the real values. But since this is at least on my >> combination of Xen 4.2 + kernel v3.7+ and AMD family 0x10 CPU a >> regression compared to older kernels, I wonder whether the following >> change might be something that should go into mainline: >> >> --- a/drivers/acpi/processor_perflib.c >> +++ b/drivers/acpi/processor_perflib.c >> @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px >> if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) >> || boot_cpu_data.x86 == 0x11) { >> rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); >> + /* Bit 63 indicates whether contents are valid */ >> + if (!(hi & 0x8000000)) >> + return; > > I don''t think that''s the right change - this is fixing baremetal so that > it works on xen. And besides, this code was in powernow-k8 before so I''m > wondering why did it work then.This actually only started to work when the xen-processor module got introduced to provide acpi information to the hypervisor. If I remember correctly powernow-k8 did fail. For the way I did the fix: the AMD BIOS docs seemed to indicate that even for bare metal bit 63 would say whether the values are valid. So I thought this is a nice coincidence that under Xen with all 0 this matches that special case... ;) -Stefan> >> fid = lo & 0x3f; >> did = (lo >> 6) & 7; >> if (boot_cpu_data.x86 == 0x10) >> >> I tested something similar (so hopefully I have not failed on slapping >> together a cleaned up version), which did resolve the problem. >> >> -Stefan >> >> Reference: https://bugs.launchpad.net/bugs/1078619 > > Adding the new Andre. :-) Andre, what did we do for powernow-k8 on xen > so that the F10h 50MHz steps quirk would work? > > Thanks. >
André Przywara
2013-Jan-14 17:40 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
On Mon, 14 Jan 2013 18:08:45 +0100 Stefan Bader <stefan.bader@canonical.com> wrote:> On 14.01.2013 17:34, Borislav Petkov wrote: > > On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote: > >> Starting with kernel v3.7 the following commit added a quirk > >> to obtain the real frequencies of certain AMD systems: > >> > >> commit f594065faf4f9067c2283a34619fc0714e79a98d > >> Author: Matthew Garrett <mjg@redhat.com> > >> Date: Tue Sep 4 08:28:06 2012 +0000 > >> > >> ACPI: Add fixups for AMD P-state figures > >> > >> When running bare-metal, on my Opteron 6128 test box results > >> in the frequencies remaining effectively unchanged: > >> [ 5.475735] P0: MSR(hi,lo): 8000015c-50004004 > >> [ 5.479049] P0: fid=0x4, did=0x0, freq: 2000 -> 2000 > >> [ 5.484001] P1: MSR(hi,lo): 8000014c-50004a4e > >> [ 5.487314] P1: fid=0xe, did=0x1, freq: 1500 -> 1500 > >> [ 5.492272] P2: MSR(hi,lo): 80000141-50005048 > >> [ 5.495584] P2: fid=0x8, did=0x1, freq: 1200 -> 1200 > >> [ 5.500540] P3: MSR(hi,lo): 80000138-50005844 > >> [ 5.503853] P3: fid=0x4, did=0x1, freq: 1000 -> 1000 > >> [ 5.508812] P4: MSR(hi,lo): 80000131-50005c40 > >> [ 5.512125] P4: fid=0x0, did=0x1, freq: 800 -> 800 > >> > >> However running as dom0 under Xen 4.2, reading this MSR returns > >> null: > >> [ 11.613068] P0: MSR(hi,lo): 00000000-00000000 > >> [ 11.613074] P0: fid=0x0, did=0x0, freq: 2000 -> 1600 > >> [ 11.613078] P1: MSR(hi,lo): 00000000-00000000 > >> [ 11.613081] P1: fid=0x0, did=0x0, freq: 1500 -> 1600 > >> [ 11.613085] P2: MSR(hi,lo): 00000000-00000000 > >> [ 11.613088] P2: fid=0x0, did=0x0, freq: 1200 -> 1600 > >> [ 11.613091] P3: MSR(hi,lo): 00000000-00000000 > >> [ 11.613094] P3: fid=0x0, did=0x0, freq: 1000 -> 1600 > >> [ 11.613098] P4: MSR(hi,lo): 00000000-00000000 > >> [ 11.613101] P4: fid=0x0, did=0x0, freq: 800 -> 1600 > >> > >> And this results in Xen failing to change the governor: > >> "(XEN) Fail change to ondemand governor" > >> > >> I suppose this ultimately requires some support in the hypervisor > >> to pass through the real values. But since this is at least on my > >> combination of Xen 4.2 + kernel v3.7+ and AMD family 0x10 CPU a > >> regression compared to older kernels, I wonder whether the > >> following change might be something that should go into mainline: > >> > >> --- a/drivers/acpi/processor_perflib.c > >> +++ b/drivers/acpi/processor_perflib.c > >> @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct > >> acpi_processor_px *px if ((boot_cpu_data.x86 == 0x10 && > >> boot_cpu_data.x86_model < 10) || boot_cpu_data.x86 == 0x11) { > >> rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); > >> + /* Bit 63 indicates whether contents are valid */ > >> + if (!(hi & 0x8000000)) > >> + return; > > > > I don''t think that''s the right change - this is fixing baremetal so > > that it works on xen. And besides, this code was in powernow-k8 > > before so I''m wondering why did it work then. > > This actually only started to work when the xen-processor module got > introduced to provide acpi information to the hypervisor. If I > remember correctly powernow-k8 did fail. > For the way I did the fix: the AMD BIOS docs seemed to indicate that > even for bare metal bit 63 would say whether the values are valid. So > I thought this is a nice coincidence that under Xen with all 0 this > matches that special case... ;)From a first glance I think this fix is a valid approach. There are BIOSes which disable P-states via this bit, so we have to observe this for bare-metal, too. Let me think a bit more about this, however, and see whether there is a better solution to do the right thing (tm) under Xen. Getting back to you then. Thanks, Andre.
Matt Wilson
2013-Jan-15 13:04 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote:> Starting with kernel v3.7 the following commit added a quirk > to obtain the real frequencies of certain AMD systems: > > commit f594065faf4f9067c2283a34619fc0714e79a98d > Author: Matthew Garrett <mjg@redhat.com> > Date: Tue Sep 4 08:28:06 2012 +0000 > > ACPI: Add fixups for AMD P-state figures > > When running bare-metal, on my Opteron 6128 test box results > in the frequencies remaining effectively unchanged: > [ 5.475735] P0: MSR(hi,lo): 8000015c-50004004 > [ 5.479049] P0: fid=0x4, did=0x0, freq: 2000 -> 2000 > [ 5.484001] P1: MSR(hi,lo): 8000014c-50004a4e > [ 5.487314] P1: fid=0xe, did=0x1, freq: 1500 -> 1500 > [ 5.492272] P2: MSR(hi,lo): 80000141-50005048 > [ 5.495584] P2: fid=0x8, did=0x1, freq: 1200 -> 1200 > [ 5.500540] P3: MSR(hi,lo): 80000138-50005844 > [ 5.503853] P3: fid=0x4, did=0x1, freq: 1000 -> 1000 > [ 5.508812] P4: MSR(hi,lo): 80000131-50005c40 > [ 5.512125] P4: fid=0x0, did=0x1, freq: 800 -> 800 > > However running as dom0 under Xen 4.2, reading this MSR returns > null: > [ 11.613068] P0: MSR(hi,lo): 00000000-00000000 > [ 11.613074] P0: fid=0x0, did=0x0, freq: 2000 -> 1600 > [ 11.613078] P1: MSR(hi,lo): 00000000-00000000 > [ 11.613081] P1: fid=0x0, did=0x0, freq: 1500 -> 1600 > [ 11.613085] P2: MSR(hi,lo): 00000000-00000000 > [ 11.613088] P2: fid=0x0, did=0x0, freq: 1200 -> 1600 > [ 11.613091] P3: MSR(hi,lo): 00000000-00000000 > [ 11.613094] P3: fid=0x0, did=0x0, freq: 1000 -> 1600 > [ 11.613098] P4: MSR(hi,lo): 00000000-00000000 > [ 11.613101] P4: fid=0x0, did=0x0, freq: 800 -> 1600 > > And this results in Xen failing to change the governor: > "(XEN) Fail change to ondemand governor" > > I suppose this ultimately requires some support in the hypervisor > to pass through the real values. But since this is at least on my > combination of Xen 4.2 + kernel v3.7+ and AMD family 0x10 CPU a > regression compared to older kernels, I wonder whether the following > change might be something that should go into mainline:The correct values should be returned already via rdmsr if "cpureq=dom0-kernel" is specified on the Xen command line. Looking at the LP report, it doesn''t seem that this option was used. Likely you will also need to use "dom0_vcpu_pin=1" if you want dom0 to be the cpufreq controller on AMD. Matt> --- a/drivers/acpi/processor_perflib.c > +++ b/drivers/acpi/processor_perflib.c > @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px > if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) > || boot_cpu_data.x86 == 0x11) { > rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); > + /* Bit 63 indicates whether contents are valid */ > + if (!(hi & 0x8000000)) > + return; > fid = lo & 0x3f; > did = (lo >> 6) & 7; > if (boot_cpu_data.x86 == 0x10) > > I tested something similar (so hopefully I have not failed on slapping > together a cleaned up version), which did resolve the problem. > > -Stefan > > Reference: https://bugs.launchpad.net/bugs/1078619
Konrad Rzeszutek Wilk
2013-Jan-15 17:53 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
On Mon, Jan 14, 2013 at 05:34:45PM +0100, Borislav Petkov wrote:> On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote: > > Starting with kernel v3.7 the following commit added a quirk > > to obtain the real frequencies of certain AMD systems: > > > > commit f594065faf4f9067c2283a34619fc0714e79a98d > > Author: Matthew Garrett <mjg@redhat.com> > > Date: Tue Sep 4 08:28:06 2012 +0000 > > > > ACPI: Add fixups for AMD P-state figures > > > > When running bare-metal, on my Opteron 6128 test box results > > in the frequencies remaining effectively unchanged: > > [ 5.475735] P0: MSR(hi,lo): 8000015c-50004004 > > [ 5.479049] P0: fid=0x4, did=0x0, freq: 2000 -> 2000 > > [ 5.484001] P1: MSR(hi,lo): 8000014c-50004a4e > > [ 5.487314] P1: fid=0xe, did=0x1, freq: 1500 -> 1500 > > [ 5.492272] P2: MSR(hi,lo): 80000141-50005048 > > [ 5.495584] P2: fid=0x8, did=0x1, freq: 1200 -> 1200 > > [ 5.500540] P3: MSR(hi,lo): 80000138-50005844 > > [ 5.503853] P3: fid=0x4, did=0x1, freq: 1000 -> 1000 > > [ 5.508812] P4: MSR(hi,lo): 80000131-50005c40 > > [ 5.512125] P4: fid=0x0, did=0x1, freq: 800 -> 800 > > > > However running as dom0 under Xen 4.2, reading this MSR returns > > null: > > [ 11.613068] P0: MSR(hi,lo): 00000000-00000000 > > [ 11.613074] P0: fid=0x0, did=0x0, freq: 2000 -> 1600 > > [ 11.613078] P1: MSR(hi,lo): 00000000-00000000 > > [ 11.613081] P1: fid=0x0, did=0x0, freq: 1500 -> 1600 > > [ 11.613085] P2: MSR(hi,lo): 00000000-00000000 > > [ 11.613088] P2: fid=0x0, did=0x0, freq: 1200 -> 1600 > > [ 11.613091] P3: MSR(hi,lo): 00000000-00000000 > > [ 11.613094] P3: fid=0x0, did=0x0, freq: 1000 -> 1600 > > [ 11.613098] P4: MSR(hi,lo): 00000000-00000000 > > [ 11.613101] P4: fid=0x0, did=0x0, freq: 800 -> 1600 > > > > And this results in Xen failing to change the governor: > > "(XEN) Fail change to ondemand governor" > > > > I suppose this ultimately requires some support in the hypervisor > > to pass through the real values. But since this is at least on my > > combination of Xen 4.2 + kernel v3.7+ and AMD family 0x10 CPU a > > regression compared to older kernels, I wonder whether the following > > change might be something that should go into mainline: > > > > --- a/drivers/acpi/processor_perflib.c > > +++ b/drivers/acpi/processor_perflib.c > > @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px > > if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) > > || boot_cpu_data.x86 == 0x11) { > > rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); > > + /* Bit 63 indicates whether contents are valid */ > > + if (!(hi & 0x8000000)) > > + return; > > I don''t think that''s the right change - this is fixing baremetal so that > it works on xen. And besides, this code was in powernow-k8 before so I''m > wondering why did it work then.Powernow-k8 only populated the cpufreq policy information. This library (processor_perflib) is the generic library used for ACPI P-states parsing. This specific function (acpi_processor_get_performance_states) is just used to fetch and parse the P-states. Xen-acpi-processor (which we use to upload the P and C-states to the hypervisor) ends up calling this library to parse the P-states and this unfortunate quirk clamps the P-states based on the MSRS. It is odd that this CPU specific quirk got added in this generic library. Is there no ACPI quirk system similar to how DMI quirks are handled? Anyhow, I think this patch makes sense - it makes sure that the MSR value is sane. Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>> > > fid = lo & 0x3f; > > did = (lo >> 6) & 7; > > if (boot_cpu_data.x86 == 0x10) > > > > I tested something similar (so hopefully I have not failed on slapping > > together a cleaned up version), which did resolve the problem. > > > > -Stefan > > > > Reference: https://bugs.launchpad.net/bugs/1078619 > > Adding the new Andre. :-) Andre, what did we do for powernow-k8 on xen > so that the F10h 50MHz steps quirk would work? > > Thanks. > > -- > Regards/Gruss, > Boris. > > Sent from a fat crate under my desk. Formatting is fine. > --
Matt Wilson
2013-Jan-15 17:59 UTC
Re: [Xen-devel] kernel 3.7+ cpufreq regression on AMD system running as dom0
On Tue, Jan 15, 2013 at 02:04:30PM +0100, Matt Wilson wrote:> On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote: > > Starting with kernel v3.7 the following commit added a quirk > > to obtain the real frequencies of certain AMD systems: > > > > commit f594065faf4f9067c2283a34619fc0714e79a98d > > Author: Matthew Garrett <mjg@redhat.com> > > Date: Tue Sep 4 08:28:06 2012 +0000 > > > > ACPI: Add fixups for AMD P-state figures > > > > When running bare-metal, on my Opteron 6128 test box results > > in the frequencies remaining effectively unchanged: > > [ 5.475735] P0: MSR(hi,lo): 8000015c-50004004 > > [ 5.479049] P0: fid=0x4, did=0x0, freq: 2000 -> 2000 > > [ 5.484001] P1: MSR(hi,lo): 8000014c-50004a4e > > [ 5.487314] P1: fid=0xe, did=0x1, freq: 1500 -> 1500 > > [ 5.492272] P2: MSR(hi,lo): 80000141-50005048 > > [ 5.495584] P2: fid=0x8, did=0x1, freq: 1200 -> 1200 > > [ 5.500540] P3: MSR(hi,lo): 80000138-50005844 > > [ 5.503853] P3: fid=0x4, did=0x1, freq: 1000 -> 1000 > > [ 5.508812] P4: MSR(hi,lo): 80000131-50005c40 > > [ 5.512125] P4: fid=0x0, did=0x1, freq: 800 -> 800 > > > > However running as dom0 under Xen 4.2, reading this MSR returns > > null: > > [ 11.613068] P0: MSR(hi,lo): 00000000-00000000 > > [ 11.613074] P0: fid=0x0, did=0x0, freq: 2000 -> 1600 > > [ 11.613078] P1: MSR(hi,lo): 00000000-00000000 > > [ 11.613081] P1: fid=0x0, did=0x0, freq: 1500 -> 1600 > > [ 11.613085] P2: MSR(hi,lo): 00000000-00000000 > > [ 11.613088] P2: fid=0x0, did=0x0, freq: 1200 -> 1600 > > [ 11.613091] P3: MSR(hi,lo): 00000000-00000000 > > [ 11.613094] P3: fid=0x0, did=0x0, freq: 1000 -> 1600 > > [ 11.613098] P4: MSR(hi,lo): 00000000-00000000 > > [ 11.613101] P4: fid=0x0, did=0x0, freq: 800 -> 1600 > > > > And this results in Xen failing to change the governor: > > "(XEN) Fail change to ondemand governor" > > > > I suppose this ultimately requires some support in the hypervisor > > to pass through the real values. But since this is at least on my > > combination of Xen 4.2 + kernel v3.7+ and AMD family 0x10 CPU a > > regression compared to older kernels, I wonder whether the following > > change might be something that should go into mainline: > > The correct values should be returned already via rdmsr if > "cpureq=dom0-kernel" is specified on the Xen command line. Looking at > the LP report, it doesn''t seem that this option was used. > > Likely you will also need to use "dom0_vcpu_pin=1" if you want dom0 to > be the cpufreq controller on AMD.Aah, after chatting with Konrad I understand the problem better. This broke using Xen as the cpufreq controller - sorry for failing to notice that. :-) Checking for a valid result seems to make sense. Matt> > --- a/drivers/acpi/processor_perflib.c > > +++ b/drivers/acpi/processor_perflib.c > > @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px > > if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) > > || boot_cpu_data.x86 == 0x11) { > > rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); > > + /* Bit 63 indicates whether contents are valid */ > > + if (!(hi & 0x8000000)) > > + return; > > fid = lo & 0x3f; > > did = (lo >> 6) & 7; > > if (boot_cpu_data.x86 == 0x10) > > > > I tested something similar (so hopefully I have not failed on slapping > > together a cleaned up version), which did resolve the problem. > > > > -Stefan > > > > Reference: https://bugs.launchpad.net/bugs/1078619 > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Borislav Petkov
2013-Jan-15 18:18 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
On Tue, Jan 15, 2013 at 12:53:05PM -0500, Konrad Rzeszutek Wilk wrote:> > I don''t think that''s the right change - this is fixing baremetal so that > > it works on xen. And besides, this code was in powernow-k8 before so I''m > > wondering why did it work then. > > Powernow-k8 only populated the cpufreq policy information. This library > (processor_perflib) is the generic library used for ACPI P-states parsing. > This specific function (acpi_processor_get_performance_states) is just > used to fetch and parse the P-states. > > Xen-acpi-processor (which we use to upload the P and C-states to the > hypervisor) ends up calling this library to parse the P-states > and this unfortunate quirk clamps the P-states based on the MSRS.Huh? This is a fix for _PSS frequency values which are rounded and thus imprecise. The _PSS objects are the unfortunate ones, as most of the other crap BIOS produces.> It is odd that this CPU specific quirk got added in this generic > library. Is there no ACPI quirk system similar to how DMI quirks are > handled?Even if there were, do you know all the boards and BIOS revisions which have those rounded values? The fix addresses the hardware which has those 50MHz multiples and simply ignores the _PSS data but reads out the P-states directly from the hardware.> Anyhow, I think this patch makes sense - it makes sure that the MSR > value is sane.I agree to a certain degree. Testing the Valid bit is something we should do for P-state MSRs - and for all MSRs containing a Valid bit, for that matter - and the original code didn''t do it. However, you need to push down the *correct* frequencies *after* the quirk to the hypervisor (I''m looking at push_pxx_to_hypervisor()) so that it is aware of the exact P-state frequencies this CPU supports and not some rounded values. AFAICT for the xen part, of course. But the baseline stands: you need to tell the thing that switches P-states the exact P-state frequencies of the CPU. :-). -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. --
Jan Beulich
2013-Jan-16 10:26 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
>>> On 15.01.13 at 18:53, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > On Mon, Jan 14, 2013 at 05:34:45PM +0100, Borislav Petkov wrote: >> On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote: >> > --- a/drivers/acpi/processor_perflib.c >> > +++ b/drivers/acpi/processor_perflib.c >> > @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px >> > if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) >> > || boot_cpu_data.x86 == 0x11) { >> > rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); >> > + /* Bit 63 indicates whether contents are valid */ >> > + if (!(hi & 0x8000000)) >> > + return; >> >> I don''t think that''s the right change - this is fixing baremetal so that >> it works on xen. And besides, this code was in powernow-k8 before so I''m >> wondering why did it work then. > > Powernow-k8 only populated the cpufreq policy information. This library > (processor_perflib) is the generic library used for ACPI P-states parsing. > This specific function (acpi_processor_get_performance_states) is just > used to fetch and parse the P-states. > > Xen-acpi-processor (which we use to upload the P and C-states to the > hypervisor) ends up calling this library to parse the P-states > and this unfortunate quirk clamps the P-states based on the MSRS. > > It is odd that this CPU specific quirk got added in this generic > library. Is there no ACPI quirk system similar to how DMI quirks > are handled? > > Anyhow, I think this patch makes sense - it makes sure that the > MSR value is sane. > > Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>Did someone actually _test_ that patch? I ask because the mask used (0x8000000) doesn''t check bit 63 as the comment says, but bit 59 instead... Jan
Stefan Bader
2013-Jan-16 14:34 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
On 01/16/2013 11:26 AM, Jan Beulich wrote:>>>> On 15.01.13 at 18:53, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: >> On Mon, Jan 14, 2013 at 05:34:45PM +0100, Borislav Petkov wrote: >>> On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote: >>>> --- a/drivers/acpi/processor_perflib.c >>>> +++ b/drivers/acpi/processor_perflib.c >>>> @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px >>>> if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) >>>> || boot_cpu_data.x86 == 0x11) { >>>> rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); >>>> + /* Bit 63 indicates whether contents are valid */ >>>> + if (!(hi & 0x8000000)) >>>> + return; >>> >>> I don''t think that''s the right change - this is fixing baremetal so that >>> it works on xen. And besides, this code was in powernow-k8 before so I''m >>> wondering why did it work then. >> >> Powernow-k8 only populated the cpufreq policy information. This library >> (processor_perflib) is the generic library used for ACPI P-states parsing. >> This specific function (acpi_processor_get_performance_states) is just >> used to fetch and parse the P-states. >> >> Xen-acpi-processor (which we use to upload the P and C-states to the >> hypervisor) ends up calling this library to parse the P-states >> and this unfortunate quirk clamps the P-states based on the MSRS. >> >> It is odd that this CPU specific quirk got added in this generic >> library. Is there no ACPI quirk system similar to how DMI quirks >> are handled? >> >> Anyhow, I think this patch makes sense - it makes sure that the >> MSR value is sane. >> >> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > > Did someone actually _test_ that patch? I ask because the mask > used (0x8000000) doesn''t check bit 63 as the comment says, but > bit 59 instead... > > Jan >Not this version which I did at the end of a day to have a cleaned up version to discuss. And obviously I managed to get the number of zeros wrong. :( The comment is right and it should be 0x80000000
Konrad Rzeszutek Wilk
2013-Jan-18 19:00 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
On Tue, Jan 15, 2013 at 07:18:39PM +0100, Borislav Petkov wrote:> On Tue, Jan 15, 2013 at 12:53:05PM -0500, Konrad Rzeszutek Wilk wrote: > > > I don''t think that''s the right change - this is fixing baremetal so that > > > it works on xen. And besides, this code was in powernow-k8 before so I''m > > > wondering why did it work then. > > > > Powernow-k8 only populated the cpufreq policy information. This library > > (processor_perflib) is the generic library used for ACPI P-states parsing. > > This specific function (acpi_processor_get_performance_states) is just > > used to fetch and parse the P-states. > > > > Xen-acpi-processor (which we use to upload the P and C-states to the > > hypervisor) ends up calling this library to parse the P-states > > and this unfortunate quirk clamps the P-states based on the MSRS. > > Huh? This is a fix for _PSS frequency values which are rounded and thus > imprecise. The _PSS objects are the unfortunate ones, as most of the > other crap BIOS produces.I did not explain myself well. The fix is OK - it just that the hypervisor causes the quirk to not work correctly. Hmm, I wonder if there BIOSes that do the same thing (cause the MSR to return 0). Per you estimation of BIOS quality, it seems that this could happen.> > > It is odd that this CPU specific quirk got added in this generic > > library. Is there no ACPI quirk system similar to how DMI quirks are > > handled? > > Even if there were, do you know all the boards and BIOS revisions which > have those rounded values? The fix addresses the hardware which has > those 50MHz multiples and simply ignores the _PSS data but reads out the > P-states directly from the hardware.Oh, I was not thinking DMI per-say. I was thinking something similar to DMI-quirk API. But for the ACPI subsystem, so it would be: if (ARM) ... these quirks neccessary if (AMD) .. these quirks and then the ACPI code can make the calls to this ACPI-quirk API to figure out whether it needs to modulate values. But this is all hand-waving at this point.> > > Anyhow, I think this patch makes sense - it makes sure that the MSR > > value is sane. > > I agree to a certain degree. Testing the Valid bit is something we > should do for P-state MSRs - and for all MSRs containing a Valid bit, > for that matter - and the original code didn''t do it.OK.> > However, you need to push down the *correct* frequencies *after* the > quirk to the hypervisor (I''m looking at push_pxx_to_hypervisor()) so > that it is aware of the exact P-state frequencies this CPU supports and > not some rounded values.Yes! It could be done in the hypervisor (it does the MSRs and figures out that the P-states need tweaking).> > AFAICT for the xen part, of course. But the baseline stands: you need to > tell the thing that switches P-states the exact P-state frequencies of > the CPU. :-).Right, that information is gathered from the MSRs. I think the Xen would need to do this since it can do the MSRs correctly and modify the P-states. So something like this in the hypervisor maybe (not even tested): diff --git a/xen/arch/x86/acpi/cpufreq/powernow.c b/xen/arch/x86/acpi/cpufreq/powernow.c index a9b7792..54e7808 100644 --- a/xen/arch/x86/acpi/cpufreq/powernow.c +++ b/xen/arch/x86/acpi/cpufreq/powernow.c @@ -146,7 +146,40 @@ static int powernow_cpufreq_target(struct cpufreq_policy *policy, return 0; } +#define MSR_AMD_PSTATE_DEF_BASE 0xc0010064 +static void amd_fixup_frequency(struct xen_processor_px *px, int i) +{ + u32 hi, lo, fid, did; + int index = px->control & 0x00000007; + + if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) + return; + + if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) + || boot_cpu_data.x86 == 0x11) { + rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); + /* Bit 63 indicates whether contents are valid */ + if (!(hi & 0x80000000)) + return; + + fid = lo & 0x3f; + did = (lo >> 6) & 7; + if (boot_cpu_data.x86 == 0x10) + px->core_frequency = (100 * (fid + 0x10)) >> did; + else + px->core_frequency = (100 * (fid + 8)) >> did; + } +} + +static void amd_fixup_freq(struct processor_performance *perf) +{ + int i; + + for (i = 0; i < perf->state_count; i++) + amd_fixup_frequency(perf->states, i); + +} static int powernow_cpufreq_verify(struct cpufreq_policy *policy) { struct acpi_cpufreq_data *data; @@ -158,6 +191,8 @@ static int powernow_cpufreq_verify(struct cpufreq_policy *policy) perf = &processor_pminfo[policy->cpu]->perf; + amd_fixup_freq(perf); + cpufreq_verify_within_limits(policy, 0, perf->states[perf->platform_limit].core_frequency * 1000);
Boris Ostrovsky
2013-Jan-18 19:38 UTC
Re: [Xen-devel] kernel 3.7+ cpufreq regression on AMD system running as dom0
On 01/18/2013 02:00 PM, Konrad Rzeszutek Wilk wrote:> > Right, that information is gathered from the MSRs. I think the Xen would > need to do this since it can do the MSRs correctly and modify the P-states. > > So something like this in the hypervisor maybe (not even tested):Is there any harm in allowing dom0 read P-state registers? Something along these lines: diff -r 40881d58e991 xen/arch/x86/traps.c --- a/xen/arch/x86/traps.c Thu Jan 17 14:47:04 2013 -0500 +++ b/xen/arch/x86/traps.c Fri Jan 18 09:32:51 2013 -0500 @@ -2535,7 +2535,7 @@ static int emulate_privileged_op(struct case MSR_K8_PSTATE7: if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD ) goto fail; - if ( !is_cpufreq_controller(v->domain) ) + if ( d->domain_id != 0 ) { regs->eax = regs->edx = 0; break; (It does seem to fix the bug too) -boris> > diff --git a/xen/arch/x86/acpi/cpufreq/powernow.c b/xen/arch/x86/acpi/cpufreq/powernow.c > index a9b7792..54e7808 100644 > --- a/xen/arch/x86/acpi/cpufreq/powernow.c > +++ b/xen/arch/x86/acpi/cpufreq/powernow.c > @@ -146,7 +146,40 @@ static int powernow_cpufreq_target(struct cpufreq_policy *policy, > > return 0; > } > +#define MSR_AMD_PSTATE_DEF_BASE 0xc0010064 > +static void amd_fixup_frequency(struct xen_processor_px *px, int i) > +{ > + u32 hi, lo, fid, did; > + int index = px->control & 0x00000007; > + > + if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) > + return; > + > + if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) > + || boot_cpu_data.x86 == 0x11) { > + rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); > + /* Bit 63 indicates whether contents are valid */ > + if (!(hi & 0x80000000)) > + return; > + > + fid = lo & 0x3f; > + did = (lo >> 6) & 7; > + if (boot_cpu_data.x86 == 0x10) > + px->core_frequency = (100 * (fid + 0x10)) >> did; > + else > + px->core_frequency = (100 * (fid + 8)) >> did; > + } > +} > + > +static void amd_fixup_freq(struct processor_performance *perf) > +{ > > + int i; > + > + for (i = 0; i < perf->state_count; i++) > + amd_fixup_frequency(perf->states, i); > + > +} > static int powernow_cpufreq_verify(struct cpufreq_policy *policy) > { > struct acpi_cpufreq_data *data; > @@ -158,6 +191,8 @@ static int powernow_cpufreq_verify(struct cpufreq_policy *policy) > > perf = &processor_pminfo[policy->cpu]->perf; > > + amd_fixup_freq(perf); > + > cpufreq_verify_within_limits(policy, 0, > perf->states[perf->platform_limit].core_frequency * 1000); > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >
Andrew Cooper
2013-Jan-18 19:44 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
On 18/01/13 19:38, Boris Ostrovsky wrote:> On 01/18/2013 02:00 PM, Konrad Rzeszutek Wilk wrote: >> Right, that information is gathered from the MSRs. I think the Xen would >> need to do this since it can do the MSRs correctly and modify the P-states. >> >> So something like this in the hypervisor maybe (not even tested): > Is there any harm in allowing dom0 read P-state registers?Yes - the dom0 vcpu could be moved across pcpus between MSR accesses. There is currently some hacky code for pinning the dom0 cpus right at boot time, after which dom0 is permitted to access a few more MSRs, which appear to be power related. ~Andrew> > Something along these lines: > > diff -r 40881d58e991 xen/arch/x86/traps.c > --- a/xen/arch/x86/traps.c Thu Jan 17 14:47:04 2013 -0500 > +++ b/xen/arch/x86/traps.c Fri Jan 18 09:32:51 2013 -0500 > @@ -2535,7 +2535,7 @@ static int emulate_privileged_op(struct > case MSR_K8_PSTATE7: > if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD ) > goto fail; > - if ( !is_cpufreq_controller(v->domain) ) > + if ( d->domain_id != 0 ) > { > regs->eax = regs->edx = 0; > break; > > > (It does seem to fix the bug too) > > -boris > > >> diff --git a/xen/arch/x86/acpi/cpufreq/powernow.c b/xen/arch/x86/acpi/cpufreq/powernow.c >> index a9b7792..54e7808 100644 >> --- a/xen/arch/x86/acpi/cpufreq/powernow.c >> +++ b/xen/arch/x86/acpi/cpufreq/powernow.c >> @@ -146,7 +146,40 @@ static int powernow_cpufreq_target(struct cpufreq_policy *policy, >> >> return 0; >> } >> +#define MSR_AMD_PSTATE_DEF_BASE 0xc0010064 >> +static void amd_fixup_frequency(struct xen_processor_px *px, int i) >> +{ >> + u32 hi, lo, fid, did; >> + int index = px->control & 0x00000007; >> + >> + if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) >> + return; >> + >> + if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) >> + || boot_cpu_data.x86 == 0x11) { >> + rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); >> + /* Bit 63 indicates whether contents are valid */ >> + if (!(hi & 0x80000000)) >> + return; >> + >> + fid = lo & 0x3f; >> + did = (lo >> 6) & 7; >> + if (boot_cpu_data.x86 == 0x10) >> + px->core_frequency = (100 * (fid + 0x10)) >> did; >> + else >> + px->core_frequency = (100 * (fid + 8)) >> did; >> + } >> +} >> + >> +static void amd_fixup_freq(struct processor_performance *perf) >> +{ >> >> + int i; >> + >> + for (i = 0; i < perf->state_count; i++) >> + amd_fixup_frequency(perf->states, i); >> + >> +} >> static int powernow_cpufreq_verify(struct cpufreq_policy *policy) >> { >> struct acpi_cpufreq_data *data; >> @@ -158,6 +191,8 @@ static int powernow_cpufreq_verify(struct cpufreq_policy *policy) >> >> perf = &processor_pminfo[policy->cpu]->perf; >> >> + amd_fixup_freq(perf); >> + >> cpufreq_verify_within_limits(policy, 0, >> perf->states[perf->platform_limit].core_frequency * 1000); >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xen.org >> http://lists.xen.org/xen-devel >> > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Borislav Petkov
2013-Jan-18 20:03 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
On Fri, Jan 18, 2013 at 02:00:15PM -0500, Konrad Rzeszutek Wilk wrote:> I did not explain myself well. The fix is OK - it just that the > hypervisor causes the quirk to not work correctly. Hmm, I wonder if > there BIOSes that do the same thing (cause the MSR to return 0). Per > you estimation of BIOS quality, it seems that this could happen.Yeah, I don''t think there''s a limit to the amount of SNAFU a BIOS can cause :-).> Oh, I was not thinking DMI per-say. I was thinking something similar to > DMI-quirk API. But for the ACPI subsystem, so it would be: > > if (ARM) > ... these quirks neccessary > if (AMD) > .. these quirks > > and then the ACPI code can make the calls to this ACPI-quirk API to > figure out whether it needs to modulate values. But this is all > hand-waving at this point.Yeah, those CPUs are just a very small set to even warrant a quirk API. [ … ]> Right, that information is gathered from the MSRs. I think the Xen would > need to do this since it can do the MSRs correctly and modify the P-states. > > So something like this in the hypervisor maybe (not even tested):Yeah, something like that. Basically you can copy the quirk down to the hypervisor. But, Andre was explaining to me the other day that those P-states frequencies are not that important. Let me explain: the ondemand governor, for example, computes idle time and each time it needs to increase, it switches straight up to the highest frequency. When it decreases the freq. though, it goes down in a staircase manner, going over all P-states, AFAICT. So we use them but not for all decisions. The question is, what does the xen governor(s) do? If it only uses the frequencies for reporting, then it is not that big of a deal. If it uses their values for switching decisions, then it probably needs the correct ones.> diff --git a/xen/arch/x86/acpi/cpufreq/powernow.c b/xen/arch/x86/acpi/cpufreq/powernow.c > index a9b7792..54e7808 100644 > --- a/xen/arch/x86/acpi/cpufreq/powernow.c > +++ b/xen/arch/x86/acpi/cpufreq/powernow.c > @@ -146,7 +146,40 @@ static int powernow_cpufreq_target(struct cpufreq_policy *policy, > > return 0; > } > +#define MSR_AMD_PSTATE_DEF_BASE 0xc0010064 > +static void amd_fixup_frequency(struct xen_processor_px *px, int i) > +{ > + u32 hi, lo, fid, did; > + int index = px->control & 0x00000007; > + > + if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) > + return; > + > + if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) > + || boot_cpu_data.x86 == 0x11) { > + rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); > + /* Bit 63 indicates whether contents are valid */ > + if (!(hi & 0x80000000)) > + return;Something''s funny with this indentation.> + > + fid = lo & 0x3f; > + did = (lo >> 6) & 7; > + if (boot_cpu_data.x86 == 0x10) > + px->core_frequency = (100 * (fid + 0x10)) >> did; > + else > + px->core_frequency = (100 * (fid + 8)) >> did; > + } > +} > + > +static void amd_fixup_freq(struct processor_performance *perf) > +{ > > + int i; > + > + for (i = 0; i < perf->state_count; i++) > + amd_fixup_frequency(perf->states, i); > + > +} > static int powernow_cpufreq_verify(struct cpufreq_policy *policy) > { > struct acpi_cpufreq_data *data; > @@ -158,6 +191,8 @@ static int powernow_cpufreq_verify(struct cpufreq_policy *policy) > > perf = &processor_pminfo[policy->cpu]->perf; > > + amd_fixup_freq(perf); > + > cpufreq_verify_within_limits(policy, 0, > perf->states[perf->platform_limit].core_frequency * 1000);Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. --
Konrad Rzeszutek Wilk
2013-Jan-18 22:00 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
> > > Right, that information is gathered from the MSRs. I think the Xen would > > need to do this since it can do the MSRs correctly and modify the P-states. > > > > So something like this in the hypervisor maybe (not even tested): > > Yeah, something like that. Basically you can copy the quirk down to the > hypervisor.<nods> Need also to include the comments from Matthew in it.> > But, Andre was explaining to me the other day that those P-states > frequencies are not that important. > > Let me explain: the ondemand governor, for example, computes idle time > and each time it needs to increase, it switches straight up to the > highest frequency. When it decreases the freq. though, it goes down in a > staircase manner, going over all P-states, AFAICT. > > So we use them but not for all decisions. The question is, what does the > xen governor(s) do?should look in the code. I know it borrowed from the Linux code - but I don''t know from which era - 2.6.18 maybe?> > If it only uses the frequencies for reporting, then it is not that big > of a deal. If it uses their values for switching decisions, then it > probably needs the correct ones.OK.> > > diff --git a/xen/arch/x86/acpi/cpufreq/powernow.c b/xen/arch/x86/acpi/cpufreq/powernow.c > > index a9b7792..54e7808 100644 > > --- a/xen/arch/x86/acpi/cpufreq/powernow.c > > +++ b/xen/arch/x86/acpi/cpufreq/powernow.c > > @@ -146,7 +146,40 @@ static int powernow_cpufreq_target(struct cpufreq_policy *policy, > > > > return 0; > > } > > +#define MSR_AMD_PSTATE_DEF_BASE 0xc0010064 > > +static void amd_fixup_frequency(struct xen_processor_px *px, int i) > > +{ > > + u32 hi, lo, fid, did; > > + int index = px->control & 0x00000007; > > + > > + if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) > > + return; > > + > > + if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) > > + || boot_cpu_data.x86 == 0x11) { > > + rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); > > + /* Bit 63 indicates whether contents are valid */ > > + if (!(hi & 0x80000000)) > > + return; > > Something''s funny with this indentation.That was copy-n-paste, so I must have done something incorrectly. Anyhow I still need to actually test this code.> > > + > > + fid = lo & 0x3f; > > + did = (lo >> 6) & 7; > > + if (boot_cpu_data.x86 == 0x10) > > + px->core_frequency = (100 * (fid + 0x10)) >> did; > > + else > > + px->core_frequency = (100 * (fid + 8)) >> did; > > + } > > +} > > + > > +static void amd_fixup_freq(struct processor_performance *perf) > > +{ > > > > + int i; > > + > > + for (i = 0; i < perf->state_count; i++) > > + amd_fixup_frequency(perf->states, i); > > + > > +} > > static int powernow_cpufreq_verify(struct cpufreq_policy *policy) > > { > > struct acpi_cpufreq_data *data; > > @@ -158,6 +191,8 @@ static int powernow_cpufreq_verify(struct cpufreq_policy *policy) > > > > perf = &processor_pminfo[policy->cpu]->perf; > > > > + amd_fixup_freq(perf); > > + > > cpufreq_verify_within_limits(policy, 0, > > perf->states[perf->platform_limit].core_frequency * 1000); > > Thanks. > > -- > Regards/Gruss, > Boris. > > Sent from a fat crate under my desk. Formatting is fine. > --
Stefan Bader
2013-Jan-21 12:22 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
On 01/18/2013 08:03 PM, Borislav Petkov wrote:> On Fri, Jan 18, 2013 at 02:00:15PM -0500, Konrad Rzeszutek Wilk wrote: >> I did not explain myself well. The fix is OK - it just that the >> hypervisor causes the quirk to not work correctly. Hmm, I wonder if >> there BIOSes that do the same thing (cause the MSR to return 0). Per >> you estimation of BIOS quality, it seems that this could happen. > > Yeah, I don''t think there''s a limit to the amount of SNAFU a BIOS can > cause :-). >So for having the "check for sensible BIOS" in mainline I refreshed the patch (fixed the bit test, and actually tested it this time) and also added some hopefully sensible explanation to it (attached below). Should I send it to acpi lists or would that have to go via an Andre? -Stefan From 6e2fc8291c91339123a37162382d8b08b50867ba Mon Sep 17 00:00:00 2001 From: Stefan Bader <stefan.bader@canonical.com> Date: Mon, 14 Jan 2013 16:17:00 +0100 Subject: [PATCH] ACPI: Check MSR valid bit before using P-state frequencies To fix incorrect P-state frequencies which can happen on some AMD systems f594065faf4f9067c2283a34619fc0714e79a98d "ACPI: Add fixups for AMD P-state figures" introduced a quirk to obtain the correct values by reading from AMD specific MSRs. This did cause a regression when running a kernel using that quirk under Xen which does (currently) not pass on the contents of the HW but 0. And this seems to cause a failure to initialize the ondemand governour (hard to say for sure as all P-states appear to run at the same frequency). While this should also be fixed in the hypervisor (to allow a guest to read that MSR), this patch is intended to work around the issue in the meantime. In discussion it turned out that indeed real HW/BIOSes may choose to not set the valid bit and thus mark the P-state as invalid. So this could be considered a fix for broken BIOSes that also works around the issue on Xen. Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Cc: stable@vger.kernel.org # v3.7.. --- drivers/acpi/processor_perflib.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c index 836bfe0..41f4bdac 100644 --- a/drivers/acpi/processor_perflib.c +++ b/drivers/acpi/processor_perflib.c @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px, int i) if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) || boot_cpu_data.x86 == 0x11) { rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); + /* Bit 63 indicates whether contents are valid */ + if (!(hi & 0x80000000)) + return; fid = lo & 0x3f; did = (lo >> 6) & 7; if (boot_cpu_data.x86 == 0x10) -- 1.8.0
Borislav Petkov
2013-Jan-21 12:42 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
On Mon, Jan 21, 2013 at 12:22:18PM +0000, Stefan Bader wrote:> So for having the "check for sensible BIOS" in mainline I refreshed > the patch (fixed the bit test, and actually tested it this time) and > also added some hopefully sensible explanation to it (attached > below). > > Should I send it to acpi lists or would that have to go via an Andre?Maybe Rafael could pick it up?> > -Stefan > > From 6e2fc8291c91339123a37162382d8b08b50867ba Mon Sep 17 00:00:00 2001 > From: Stefan Bader <stefan.bader@canonical.com> > Date: Mon, 14 Jan 2013 16:17:00 +0100 > Subject: [PATCH] ACPI: Check MSR valid bit before using P-state frequencies > > To fix incorrect P-state frequencies which can happen on > some AMD systems f594065faf4f9067c2283a34619fc0714e79a98d > "ACPI: Add fixups for AMD P-state figures" > introduced a quirk to obtain the correct values by reading > from AMD specific MSRs. > > This did cause a regression when running a kernel using that > quirk under Xen which does (currently) not pass on the contents > of the HW but 0.Actually this should say "does not currently pass through MSR accesses to baremetal" or similar. And this bit you mean is actually bit 63: "63: PstateEn. Read-write. 1=The P-state specified by this MSR is valid. 0=The P-state specified by this MSR is not valid. The purpose of this register is to indicate if the rest of the P-state information in the register is valid after a reset; it controls no hardware." in the MSRC001_00[68:64] P-State [4:0] Registers.> And this seems to cause a failure to initialize > the ondemand governour (hard to say for sure as all P-states > appear to run at the same frequency). > > While this should also be fixed in the hypervisor (to allow > a guest to read that MSR), this patch is intended to work > around the issue in the meantime. In discussion it turned out > that indeed real HW/BIOSes may choose to not set the valid bit > and thus mark the P-state as invalid. So this could be considered > a fix for broken BIOSes that also works around the issue on Xen. > > Signed-off-by: Stefan Bader <stefan.bader@canonical.com> > Cc: stable@vger.kernel.org # v3.7.. > --- > drivers/acpi/processor_perflib.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c > index 836bfe0..41f4bdac 100644 > --- a/drivers/acpi/processor_perflib.c > +++ b/drivers/acpi/processor_perflib.c > @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct > acpi_processor_px *px, int i) > if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) > || boot_cpu_data.x86 == 0x11) { > rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); > + /* Bit 63 indicates whether contents are valid */ > + if (!(hi & 0x80000000))You can make this a lot more explicit: if (!(hi & BIT(31))) return; This way 1) you''re sure you''re testing the correct bit and 2) any reviewer can know on the spot which bit it is about.> + return; > fid = lo & 0x3f; > did = (lo >> 6) & 7; > if (boot_cpu_data.x86 == 0x10)Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. --
Rafael J. Wysocki
2013-Jan-21 12:53 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
On Monday, January 21, 2013 01:42:55 PM Borislav Petkov wrote:> On Mon, Jan 21, 2013 at 12:22:18PM +0000, Stefan Bader wrote: > > So for having the "check for sensible BIOS" in mainline I refreshed > > the patch (fixed the bit test, and actually tested it this time) and > > also added some hopefully sensible explanation to it (attached > > below). > > > > Should I send it to acpi lists or would that have to go via an Andre? > > Maybe Rafael could pick it up?I can, if you ACK it for me. :-) Thanks, Rafael> > From 6e2fc8291c91339123a37162382d8b08b50867ba Mon Sep 17 00:00:00 2001 > > From: Stefan Bader <stefan.bader@canonical.com> > > Date: Mon, 14 Jan 2013 16:17:00 +0100 > > Subject: [PATCH] ACPI: Check MSR valid bit before using P-state frequencies > > > > To fix incorrect P-state frequencies which can happen on > > some AMD systems f594065faf4f9067c2283a34619fc0714e79a98d > > "ACPI: Add fixups for AMD P-state figures" > > introduced a quirk to obtain the correct values by reading > > from AMD specific MSRs. > > > > This did cause a regression when running a kernel using that > > quirk under Xen which does (currently) not pass on the contents > > of the HW but 0. > > Actually this should say "does not currently pass through MSR accesses > to baremetal" or similar. > > And this bit you mean is actually bit 63: > > "63: PstateEn. Read-write. 1=The P-state specified by this MSR is valid. > 0=The P-state specified by this MSR is not valid. The purpose of this > register is to indicate if the rest of the P-state information in the > register is valid after a reset; it controls no hardware." > > in the MSRC001_00[68:64] P-State [4:0] Registers. > > > And this seems to cause a failure to initialize > > the ondemand governour (hard to say for sure as all P-states > > appear to run at the same frequency). > > > > While this should also be fixed in the hypervisor (to allow > > a guest to read that MSR), this patch is intended to work > > around the issue in the meantime. In discussion it turned out > > that indeed real HW/BIOSes may choose to not set the valid bit > > and thus mark the P-state as invalid. So this could be considered > > a fix for broken BIOSes that also works around the issue on Xen. > > > > Signed-off-by: Stefan Bader <stefan.bader@canonical.com> > > Cc: stable@vger.kernel.org # v3.7.. > > --- > > drivers/acpi/processor_perflib.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c > > index 836bfe0..41f4bdac 100644 > > --- a/drivers/acpi/processor_perflib.c > > +++ b/drivers/acpi/processor_perflib.c > > @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct > > acpi_processor_px *px, int i) > > if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) > > || boot_cpu_data.x86 == 0x11) { > > rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); > > + /* Bit 63 indicates whether contents are valid */ > > + if (!(hi & 0x80000000)) > > You can make this a lot more explicit: > > if (!(hi & BIT(31))) > return; > > This way > > 1) you''re sure you''re testing the correct bit and > 2) any reviewer can know on the spot which bit it is about. > > > + return; > > fid = lo & 0x3f; > > did = (lo >> 6) & 7; > > if (boot_cpu_data.x86 == 0x10) > > Thanks. > >-- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center.
Borislav Petkov
2013-Jan-21 13:08 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
On Mon, Jan 21, 2013 at 01:53:38PM +0100, Rafael J. Wysocki wrote:> I can, if you ACK it for me. :-)Will do, after Stefan corrects some cosmetic issues :-) Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. --
Stefan Bader
2013-Jan-21 13:11 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
On 01/21/2013 12:42 PM, Borislav Petkov wrote:> On Mon, Jan 21, 2013 at 12:22:18PM +0000, Stefan Bader wrote: >> So for having the "check for sensible BIOS" in mainline I refreshed >> the patch (fixed the bit test, and actually tested it this time) and >> also added some hopefully sensible explanation to it (attached >> below). >> >> Should I send it to acpi lists or would that have to go via an Andre? > > Maybe Rafael could pick it up? > >> >> -Stefan >> >> From 6e2fc8291c91339123a37162382d8b08b50867ba Mon Sep 17 00:00:00 2001 >> From: Stefan Bader <stefan.bader@canonical.com> >> Date: Mon, 14 Jan 2013 16:17:00 +0100 >> Subject: [PATCH] ACPI: Check MSR valid bit before using P-state frequencies >> >> To fix incorrect P-state frequencies which can happen on >> some AMD systems f594065faf4f9067c2283a34619fc0714e79a98d >> "ACPI: Add fixups for AMD P-state figures" >> introduced a quirk to obtain the correct values by reading >> from AMD specific MSRs. >> >> This did cause a regression when running a kernel using that >> quirk under Xen which does (currently) not pass on the contents >> of the HW but 0. > > Actually this should say "does not currently pass through MSR accesses > to baremetal" or similar.Ok, that sounds much better.> > And this bit you mean is actually bit 63: > > "63: PstateEn. Read-write. 1=The P-state specified by this MSR is valid. > 0=The P-state specified by this MSR is not valid. The purpose of this > register is to indicate if the rest of the P-state information in the > register is valid after a reset; it controls no hardware." > > in the MSRC001_00[68:64] P-State [4:0] Registers.Darn, yes.> >> And this seems to cause a failure to initialize >> the ondemand governour (hard to say for sure as all P-states >> appear to run at the same frequency). >> >> While this should also be fixed in the hypervisor (to allow >> a guest to read that MSR), this patch is intended to work >> around the issue in the meantime. In discussion it turned out >> that indeed real HW/BIOSes may choose to not set the valid bit >> and thus mark the P-state as invalid. So this could be considered >> a fix for broken BIOSes that also works around the issue on Xen. >> >> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> >> Cc: stable@vger.kernel.org # v3.7.. >> --- >> drivers/acpi/processor_perflib.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c >> index 836bfe0..41f4bdac 100644 >> --- a/drivers/acpi/processor_perflib.c >> +++ b/drivers/acpi/processor_perflib.c >> @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct >> acpi_processor_px *px, int i) >> if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) >> || boot_cpu_data.x86 == 0x11) { >> rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); >> + /* Bit 63 indicates whether contents are valid */ >> + if (!(hi & 0x80000000)) > > You can make this a lot more explicit: > > if (!(hi & BIT(31))) > return; >True, ok, so let me respin the whole thing and re-send it. -Stefan> This way > > 1) you''re sure you''re testing the correct bit and > 2) any reviewer can know on the spot which bit it is about. > >> + return; >> fid = lo & 0x3f; >> did = (lo >> 6) & 7; >> if (boot_cpu_data.x86 == 0x10) > > Thanks. >
Stefan Bader
2013-Jan-21 15:03 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
Ok, so how about this? -Stefan From 9870926d4a847e36c0f61921762fd50f1c92f75d Mon Sep 17 00:00:00 2001 From: Stefan Bader <stefan.bader@canonical.com> Date: Mon, 14 Jan 2013 16:17:00 +0100 Subject: [PATCH] ACPI: Check MSR valid bit before using P-state frequencies To fix incorrect P-state frequencies which can happen on some AMD systems f594065faf4f9067c2283a34619fc0714e79a98d "ACPI: Add fixups for AMD P-state figures" introduced a quirk to obtain the correct values by reading from AMD specific MSRs. This did cause a regression when running a kernel using that quirk under Xen which does (currently) not pass through MSR reads to the HW. Instead the guest gets a 0 in return. And this seems to cause a failure to initialize the ondemand governour (hard to say for sure as all P-states appear to run at the same frequency). While this should also be fixed in the hypervisor (to allow a guest to read that MSR), this patch is intended to work around the issue in the meantime. In discussion it turned out that indeed real HW/BIOSes may choose to not set the valid bit and thus mark the P-state as invalid. So this could be considered a fix for broken BIOSes that also works around the issue on Xen. [v2] Reword description text and use helper for bit index. Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Cc: stable@vger.kernel.org # v3.7.. --- drivers/acpi/processor_perflib.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c index 836bfe0..caa042e 100644 --- a/drivers/acpi/processor_perflib.c +++ b/drivers/acpi/processor_perflib.c @@ -340,6 +340,17 @@ static void amd_fixup_frequency(struct acpi_processor_px *px, int i) if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) || boot_cpu_data.x86 == 0x11) { rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); + /* + * MSR C001_0064+: + * Bit 63: PstateEn. Read-write. 1=The P-state specified by + * this MSR is valid. 0=The P-state specified by this MSR is + * not valid. The purpose of this register is to indicate if + * the rest of the P-state information in the register is + * valid after a reset; it controls no hardware. + */ + if (!(hi & BIT(31))) + return; + fid = lo & 0x3f; did = (lo >> 6) & 7; if (boot_cpu_data.x86 == 0x10) -- 1.8.0
Borislav Petkov
2013-Jan-21 15:31 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
On Mon, Jan 21, 2013 at 03:03:43PM +0000, Stefan Bader wrote:> From 9870926d4a847e36c0f61921762fd50f1c92f75d Mon Sep 17 00:00:00 2001 > From: Stefan Bader <stefan.bader@canonical.com> > Date: Mon, 14 Jan 2013 16:17:00 +0100 > Subject: [PATCH] ACPI: Check MSR valid bit before using P-state frequencies > > To fix incorrect P-state frequencies which can happen on > some AMD systems f594065faf4f9067c2283a34619fc0714e79a98d > "ACPI: Add fixups for AMD P-state figures" > introduced a quirk to obtain the correct values by reading > from AMD specific MSRs. > > This did cause a regression when running a kernel using that > quirk under Xen which does (currently) not pass through MSR > reads to the HW. Instead the guest gets a 0 in return. > And this seems to cause a failure to initialize the ondemand > governour (hard to say for sure as all P-states appear to run > at the same frequency). > > While this should also be fixed in the hypervisor (to allow > a guest to read that MSR), this patch is intended to work > around the issue in the meantime. In discussion it turned out > that indeed real HW/BIOSes may choose to not set the valid bit > and thus mark the P-state as invalid. So this could be considered > a fix for broken BIOSes that also works around the issue on Xen. > > [v2] Reword description text and use helper for bit index. > Signed-off-by: Stefan Bader <stefan.bader@canonical.com> > Cc: stable@vger.kernel.org # v3.7.. > --- > drivers/acpi/processor_perflib.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c > index 836bfe0..caa042e 100644 > --- a/drivers/acpi/processor_perflib.c > +++ b/drivers/acpi/processor_perflib.c > @@ -340,6 +340,17 @@ static void amd_fixup_frequency(struct > acpi_processor_px *px, int i) > if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) > || boot_cpu_data.x86 == 0x11) { > rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); > + /* > + * MSR C001_0064+: > + * Bit 63: PstateEn. Read-write. 1=The P-state specified by > + * this MSR is valid. 0=The P-state specified by this MSR is > + * not valid. The purpose of this register is to indicate if > + * the rest of the P-state information in the register is > + * valid after a reset; it controls no hardware. > + */Maybe this comment is a but too long and it contains that idiotic processor manual speak :-). It should''ve contained only the first sentence: "PstateEn. If set, the P-state is valid." But maybe Rafael could correct it while committing, no reason to resend for that only. Acked-by: Borislav Petkov <bp@suse.de>> + if (!(hi & BIT(31))) > + return; > + > fid = lo & 0x3f; > did = (lo >> 6) & 7; > if (boot_cpu_data.x86 == 0x10) > -- > 1.8.0Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. --
Boris Ostrovsky
2013-Jan-22 00:01 UTC
Re: [Xen-devel] kernel 3.7+ cpufreq regression on AMD system running as dom0
On 01/18/2013 02:00 PM, Konrad Rzeszutek Wilk wrote:> > So something like this in the hypervisor maybe (not even tested): > > diff --git a/xen/arch/x86/acpi/cpufreq/powernow.c b/xen/arch/x86/acpi/cpufreq/powernow.c > index a9b7792..54e7808 100644 > --- a/xen/arch/x86/acpi/cpufreq/powernow.c > +++ b/xen/arch/x86/acpi/cpufreq/powernow.c > @@ -146,7 +146,40 @@ static int powernow_cpufreq_target(struct cpufreq_policy *policy, > > return 0; > } > +#define MSR_AMD_PSTATE_DEF_BASE 0xc0010064 > +static void amd_fixup_frequency(struct xen_processor_px *px, int i) > +{ > + u32 hi, lo, fid, did; > + int index = px->control & 0x00000007; > + > + if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) > + return; > + > + if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) > + || boot_cpu_data.x86 == 0x11) { > + rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); > + /* Bit 63 indicates whether contents are valid */ > + if (!(hi & 0x80000000)) > + return; > + > + fid = lo & 0x3f; > + did = (lo >> 6) & 7; > + if (boot_cpu_data.x86 == 0x10) > + px->core_frequency = (100 * (fid + 0x10)) >> did; > + else > + px->core_frequency = (100 * (fid + 8)) >> did; > + } > +} > + > +static void amd_fixup_freq(struct processor_performance *perf) > +{ > > + int i; > + > + for (i = 0; i < perf->state_count; i++) > + amd_fixup_frequency(perf->states, i);amd_fixup_frequency(&perf->states[i]) will walk P-states.> + > +} > static int powernow_cpufreq_verify(struct cpufreq_policy *policy) > { > struct acpi_cpufreq_data *data; > @@ -158,6 +191,8 @@ static int powernow_cpufreq_verify(struct cpufreq_policy *policy) > > perf = &processor_pminfo[policy->cpu]->perf; > > + amd_fixup_freq(perf);Maybe move to powernow_cpufreq_cpu_init(), although this works too. -boris> + > cpufreq_verify_within_limits(policy, 0, > perf->states[perf->platform_limit].core_frequency * 1000); > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >
Rafael J. Wysocki
2013-Jan-22 13:54 UTC
Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
On Monday, January 21, 2013 04:31:05 PM Borislav Petkov wrote:> On Mon, Jan 21, 2013 at 03:03:43PM +0000, Stefan Bader wrote: > > From 9870926d4a847e36c0f61921762fd50f1c92f75d Mon Sep 17 00:00:00 2001 > > From: Stefan Bader <stefan.bader@canonical.com> > > Date: Mon, 14 Jan 2013 16:17:00 +0100 > > Subject: [PATCH] ACPI: Check MSR valid bit before using P-state frequencies > > > > To fix incorrect P-state frequencies which can happen on > > some AMD systems f594065faf4f9067c2283a34619fc0714e79a98d > > "ACPI: Add fixups for AMD P-state figures" > > introduced a quirk to obtain the correct values by reading > > from AMD specific MSRs. > > > > This did cause a regression when running a kernel using that > > quirk under Xen which does (currently) not pass through MSR > > reads to the HW. Instead the guest gets a 0 in return. > > And this seems to cause a failure to initialize the ondemand > > governour (hard to say for sure as all P-states appear to run > > at the same frequency). > > > > While this should also be fixed in the hypervisor (to allow > > a guest to read that MSR), this patch is intended to work > > around the issue in the meantime. In discussion it turned out > > that indeed real HW/BIOSes may choose to not set the valid bit > > and thus mark the P-state as invalid. So this could be considered > > a fix for broken BIOSes that also works around the issue on Xen. > > > > [v2] Reword description text and use helper for bit index. > > Signed-off-by: Stefan Bader <stefan.bader@canonical.com> > > Cc: stable@vger.kernel.org # v3.7.. > > --- > > drivers/acpi/processor_perflib.c | 11 +++++++++++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c > > index 836bfe0..caa042e 100644 > > --- a/drivers/acpi/processor_perflib.c > > +++ b/drivers/acpi/processor_perflib.c > > @@ -340,6 +340,17 @@ static void amd_fixup_frequency(struct > > acpi_processor_px *px, int i) > > if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10) > > || boot_cpu_data.x86 == 0x11) { > > rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi); > > + /* > > + * MSR C001_0064+: > > + * Bit 63: PstateEn. Read-write. 1=The P-state specified by > > + * this MSR is valid. 0=The P-state specified by this MSR is > > + * not valid. The purpose of this register is to indicate if > > + * the rest of the P-state information in the register is > > + * valid after a reset; it controls no hardware. > > + */ > > Maybe this comment is a but too long and it contains that idiotic > processor manual speak :-). It should''ve contained only the first > sentence: "PstateEn. If set, the P-state is valid." > > But maybe Rafael could correct it while committing, no reason to resend > for that only. > > Acked-by: Borislav Petkov <bp@suse.de>Applied, the comment fixed up. It''s in my bleeding-edge branch for now and I''ll move it to the linux-next branch after build testing. Thanks, Rafael -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center.