Stefan Bader
2012-Apr-25 13:00 UTC
Workings/effectiveness of the xen-acpi-processor driver
Since there have been requests about that driver to get backported into 3.2, I was interested to find out what or how much would be gained by that. The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). Which was not very successful as the drivers bail out of the init function because the first call to acpi_processor_register_performance() returns -ENODEV. There is some frequency scaling when running without Xen, so I need to do some more debugging there. The second system was an Intel one (4 core i7 920@2.67GHz) which was successfully loading the driver. Via xenpm I can see the various frequencies and also see them being changed. However the cpuidle data out of xenpm looks a bit odd: #> xenpm get-cpuidle-states 0 Max C-state: C7 cpu id : 0 total C-states : 2 idle time(ms) : 10819311 C0 : transition [00000000000000000001] residency [00000000000000005398 ms] C1 : transition [00000000000000000001] residency [00000000000010819311 ms] pc3 : [00000000000000000000 ms] pc6 : [00000000000000000000 ms] pc7 : [00000000000000000000 ms] cc3 : [00000000000000000000 ms] cc6 : [00000000000000000000 ms] Also gathering samples over 30s does look like only C0 and C1 are used. This might be because C1E support is enabled in BIOS but when looking at the intel_idle data in sysfs when running without a hypervisor will show C3 and C6 for the cores. That could have been just a wrong output, so I plugged in a power meter and compared a kernel running natively and running as dom0 (with and without the acpi-processor driver). Native: 175W dom0: 183W (with only marginal difference between with or without the processor driver) [yes, the system has a somewhat high base consumption which I attribute to a ridiculously dimensioned graphics subsystem to be running a text console] This I would take as C3 and C6 really not being used and the frequency scaling having no impact on the idle system is not that much surprising. But if that was true it would also limit the usefulness of the turbo mode which I understand would also be limited by the c-state of the other cores. Do I misread the data I see? Or maybe its a known limitation? In case it is worth doing more research I''ll gladly try things and gather more data. Thanks, Stefan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Konrad Rzeszutek Wilk
2012-Apr-26 15:50 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote:> Since there have been requests about that driver to get backported into 3.2, I > was interested to find out what or how much would be gained by that. > > The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). Which > was not very successful as the drivers bail out of the init function because the > first call to acpi_processor_register_performance() returns -ENODEV. There is > some frequency scaling when running without Xen, so I need to do some more > debugging there.Did you back-port the other components - the ones that turn off the native frequency scalling? provide disable_cpufreq() function to disable the API. xen/acpi-processor: Do not depend on CPU frequency scaling drivers. xen/cpufreq: Disable the cpu frequency scaling drivers from loading> > The second system was an Intel one (4 core i7 920@2.67GHz) which was > successfully loading the driver. Via xenpm I can see the various frequencies and > also see them being changed. However the cpuidle data out of xenpm looks a bit odd: > > #> xenpm get-cpuidle-states 0 > Max C-state: C7 > > cpu id : 0 > total C-states : 2 > idle time(ms) : 10819311 > C0 : transition [00000000000000000001] > residency [00000000000000005398 ms] > C1 : transition [00000000000000000001] > residency [00000000000010819311 ms] > pc3 : [00000000000000000000 ms] > pc6 : [00000000000000000000 ms] > pc7 : [00000000000000000000 ms] > cc3 : [00000000000000000000 ms] > cc6 : [00000000000000000000 ms] > > Also gathering samples over 30s does look like only C0 and C1 are used. ThisYes.> might be because C1E support is enabled in BIOS but when looking at the > intel_idle data in sysfs when running without a hypervisor will show C3 and C6 > for the cores. That could have been just a wrong output, so I plugged in a power > meter and compared a kernel running natively and running as dom0 (with and > without the acpi-processor driver). > > Native: 175W > dom0: 183W (with only marginal difference between with or without the > processor driver) > [yes, the system has a somewhat high base consumption which I attribute to a > ridiculously dimensioned graphics subsystem to be running a text console] > > This I would take as C3 and C6 really not being used and the frequency scalingTo go in deeper modes there is also a need to backport a Xen unstable hypercall which will allow the kernel to detect the other states besides C0-C2. "XEN_SET_PDC query was implemented in c/s 23783: "ACPI: add _PDC input override mechanism".> having no impact on the idle system is not that much surprising. But if that was > true it would also limit the usefulness of the turbo mode which I understand > would also be limited by the c-state of the other cores.Hm, I should double-check that - but somehow I thought that Xen independetly checks for TurboMode and if the P-states are in, then they are activated.> > Do I misread the data I see? Or maybe its a known limitation? In case it is > worth doing more research I''ll gladly try things and gather more data.Just missing some patches. Oh, and this one: xen/acpi: Fix Kconfig dependency on CPU_FREQ Hmm.. I think a patch disappeared somewhere.> > Thanks, > Stefan >
Stefan Bader
2012-Apr-26 16:25 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote:> On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote: >> Since there have been requests about that driver to get backported into 3.2, I >> was interested to find out what or how much would be gained by that. >> >> The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). Which >> was not very successful as the drivers bail out of the init function because the >> first call to acpi_processor_register_performance() returns -ENODEV. There is >> some frequency scaling when running without Xen, so I need to do some more >> debugging there. > > Did you back-port the other components - the ones that turn off the native > frequency scalling? > > provide disable_cpufreq() function to disable the API. > xen/acpi-processor: Do not depend on CPU frequency scaling drivers. > xen/cpufreq: Disable the cpu frequency scaling drivers from loading >>Yes, here is the full set for reference: * xen/cpufreq: Disable the cpu frequency scaling drivers from loading. * xen/acpi: Remove the WARN''s as they just create noise. * xen/acpi: Fix Kconfig dependency on CPU_FREQ * xen/acpi-processor: Do not depend on CPU frequency scaling drivers. * xen/acpi-processor: C and P-state driver that uploads said data to hyper * provide disable_cpufreq() function to disable the API.>> The second system was an Intel one (4 core i7 920@2.67GHz) which was >> successfully loading the driver. Via xenpm I can see the various frequencies and >> also see them being changed. However the cpuidle data out of xenpm looks a bit odd: >> >> #> xenpm get-cpuidle-states 0 >> Max C-state: C7 >> >> cpu id : 0 >> total C-states : 2 >> idle time(ms) : 10819311 >> C0 : transition [00000000000000000001] >> residency [00000000000000005398 ms] >> C1 : transition [00000000000000000001] >> residency [00000000000010819311 ms] >> pc3 : [00000000000000000000 ms] >> pc6 : [00000000000000000000 ms] >> pc7 : [00000000000000000000 ms] >> cc3 : [00000000000000000000 ms] >> cc6 : [00000000000000000000 ms] >> >> Also gathering samples over 30s does look like only C0 and C1 are used. This > > Yes. >> might be because C1E support is enabled in BIOS but when looking at the >> intel_idle data in sysfs when running without a hypervisor will show C3 and C6 >> for the cores. That could have been just a wrong output, so I plugged in a power >> meter and compared a kernel running natively and running as dom0 (with and >> without the acpi-processor driver). >> >> Native: 175W >> dom0: 183W (with only marginal difference between with or without the >> processor driver) >> [yes, the system has a somewhat high base consumption which I attribute to a >> ridiculously dimensioned graphics subsystem to be running a text console] >> >> This I would take as C3 and C6 really not being used and the frequency scaling > > To go in deeper modes there is also a need to backport a Xen unstable > hypercall which will allow the kernel to detect the other states besides > C0-C2. > > "XEN_SET_PDC query was implemented in c/s 23783: > "ACPI: add _PDC input override mechanism". >I see. There is a kernel patch about enabling MWAIT that refers to that...> >> having no impact on the idle system is not that much surprising. But if that was >> true it would also limit the usefulness of the turbo mode which I understand >> would also be limited by the c-state of the other cores. > > Hm, I should double-check that - but somehow I thought that Xen independetly > checks for TurboMode and if the P-states are in, then they are activated. >Turbo mode should be enabled. I had been only looking at a generic overview about it on Intel site which sounded like it would make more of a difference on how much one core could get overclocked related to how many cores are active (and I translated active or not into deeper c-states or not). Looking at the verbose output of turbostat it seems not to make that much difference whether 2-4 cores are running. A single core alone could get one more increment in clock stepping. That does not immediately sound a lot. And of course how much or long the higher clock is used depends on other factors as well and is not under OS control. In the end it is probably quite dynamic and hard to come up with hard facts to prove its value. Though if I can lower the idle power usage by reaching a bit further, that would greatly help to justify the effort and potential risk of backporting...>> >> Do I misread the data I see? Or maybe its a known limitation? In case it is >> worth doing more research I''ll gladly try things and gather more data. > > Just missing some patches. > > Oh, and this one: > xen/acpi: Fix Kconfig dependency on CPU_FREQ > > Hmm.. I think a patch disappeared somewhere. > >> >> Thanks, >> Stefan >> > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Konrad Rzeszutek Wilk
2012-Apr-26 17:04 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On Thu, Apr 26, 2012 at 06:25:28PM +0200, Stefan Bader wrote:> On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote: > > On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote: > >> Since there have been requests about that driver to get backported into 3.2, I > >> was interested to find out what or how much would be gained by that. > >> > >> The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). Which > >> was not very successful as the drivers bail out of the init function because the > >> first call to acpi_processor_register_performance() returns -ENODEV. There is > >> some frequency scaling when running without Xen, so I need to do some more > >> debugging there. > > > > Did you back-port the other components - the ones that turn off the native > > frequency scalling? > > > > provide disable_cpufreq() function to disable the API. > > xen/acpi-processor: Do not depend on CPU frequency scaling drivers. > > xen/cpufreq: Disable the cpu frequency scaling drivers from loading > >> > > Yes, here is the full set for reference: > > * xen/cpufreq: Disable the cpu frequency scaling drivers from loading. > * xen/acpi: Remove the WARN''s as they just create noise. > * xen/acpi: Fix Kconfig dependency on CPU_FREQ > * xen/acpi-processor: Do not depend on CPU frequency scaling drivers. > * xen/acpi-processor: C and P-state driver that uploads said data to hyper > * provide disable_cpufreq() function to disable the API.The one thing that I forgot to mention about the: xen/cpufreq: Disable the cpu frequency scaling drivers from loading. is that it also fixes a bug - which is that without this, the acpi-cpufreq would run and change the frequency states based on what dom0 thought was good. Which meant that Xen hypervisor would run with somebody behind its back changing the P-states - and would lower the performance in some case a lot (especially if one used the dom0_max_vcpus=X). .. snip..> >> Native: 175W > >> dom0: 183W (with only marginal difference between with or without the > >> processor driver)What happens if you run it xen-acpi-processor.off=1 ? (That should inhibit the driver from uploading the power management data). Is the value ~200w?> >> [yes, the system has a somewhat high base consumption which I attribute to a > >> ridiculously dimensioned graphics subsystem to be running a text console]Heh.> >> > >> This I would take as C3 and C6 really not being used and the frequency scaling > > > > To go in deeper modes there is also a need to backport a Xen unstable > > hypercall which will allow the kernel to detect the other states besides > > C0-C2. > > > > "XEN_SET_PDC query was implemented in c/s 23783: > > "ACPI: add _PDC input override mechanism". > > > > I see. There is a kernel patch about enabling MWAIT that refers to that...Yeah, I should see about back-porting it in Xen 4.1..> > > > >> having no impact on the idle system is not that much surprising. But if that was > >> true it would also limit the usefulness of the turbo mode which I understand > >> would also be limited by the c-state of the other cores. > > > > Hm, I should double-check that - but somehow I thought that Xen independetly > > checks for TurboMode and if the P-states are in, then they are activated. > > > Turbo mode should be enabled. I had been only looking at a generic overview > about it on Intel site which sounded like it would make more of a difference on > how much one core could get overclocked related to how many cores are active > (and I translated active or not into deeper c-states or not). > Looking at the verbose output of turbostat it seems not to make that much > difference whether 2-4 cores are running. A single core alone could get one more > increment in clock stepping. That does not immediately sound a lot. And of > course how much or long the higher clock is used depends on other factors as > well and is not under OS control. > > In the end it is probably quite dynamic and hard to come up with hard facts to > prove its value. Though if I can lower the idle power usage by reaching a bit > further, that would greatly help to justify the effort and potential risk of > backporting...Did you see the P-states dipping? I don''t really know how much using C-states beyond C-2 matters on baremetal in terms of saving power?> > >> > >> Do I misread the data I see? Or maybe its a known limitation? In case it is > >> worth doing more research I''ll gladly try things and gather more data.This patchset was driven more by the performance needs. Somehow (and I am not sure how), using the PM, makes the whole system work much faster. That was what Ian found out when he ran the Phorenix benchmark. But looking back, perhaps the reason for this was that the native cpufreq scaling drivers were running in the back changing the machine''s power states without consulting the Xen hypervisor.> > > > Just missing some patches. > > > > Oh, and this one: > > xen/acpi: Fix Kconfig dependency on CPU_FREQ > > > > Hmm.. I think a patch disappeared somewhere. > > > >> > >> Thanks, > >> Stefan > >> > > > > > > > > _______________________________________________ > > Xen-devel mailing list > > Xen-devel@lists.xen.org > > http://lists.xen.org/xen-devel > >
Konrad Rzeszutek Wilk
2012-May-01 20:02 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On Thu, Apr 26, 2012 at 06:25:28PM +0200, Stefan Bader wrote:> On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote: > > On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote: > >> Since there have been requests about that driver to get backported into 3.2, I > >> was interested to find out what or how much would be gained by that. > >> > >> The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). Which > >> was not very successful as the drivers bail out of the init function because the > >> first call to acpi_processor_register_performance() returns -ENODEV. There is > >> some frequency scaling when running without Xen, so I need to do some more > >> debugging there. > > > > Did you back-port the other components - the ones that turn off the native > > frequency scalling? > > > > provide disable_cpufreq() function to disable the API. > > xen/acpi-processor: Do not depend on CPU frequency scaling drivers. > > xen/cpufreq: Disable the cpu frequency scaling drivers from loading > >> > > Yes, here is the full set for reference: > > * xen/cpufreq: Disable the cpu frequency scaling drivers from loading. > * xen/acpi: Remove the WARN''s as they just create noise. > * xen/acpi: Fix Kconfig dependency on CPU_FREQ > * xen/acpi-processor: Do not depend on CPU frequency scaling drivers. > * xen/acpi-processor: C and P-state driver that uploads said data to hyper > * provide disable_cpufreq() function to disable the API.And (Linus just pulled it), you also need this one: df88b2d96e36d9a9e325bfcd12eb45671cbbc937 (xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won''t be loaded.)> > >> The second system was an Intel one (4 core i7 920@2.67GHz) which was > >> successfully loading the driver. Via xenpm I can see the various frequencies and > >> also see them being changed. However the cpuidle data out of xenpm looks a bit odd: > >> > >> #> xenpm get-cpuidle-states 0 > >> Max C-state: C7 > >> > >> cpu id : 0 > >> total C-states : 2 > >> idle time(ms) : 10819311 > >> C0 : transition [00000000000000000001] > >> residency [00000000000000005398 ms] > >> C1 : transition [00000000000000000001] > >> residency [00000000000010819311 ms] > >> pc3 : [00000000000000000000 ms] > >> pc6 : [00000000000000000000 ms] > >> pc7 : [00000000000000000000 ms] > >> cc3 : [00000000000000000000 ms] > >> cc6 : [00000000000000000000 ms] > >> > >> Also gathering samples over 30s does look like only C0 and C1 are used. This > > > > Yes. > >> might be because C1E support is enabled in BIOS but when looking at the > >> intel_idle data in sysfs when running without a hypervisor will show C3 and C6 > >> for the cores. That could have been just a wrong output, so I plugged in a power > >> meter and compared a kernel running natively and running as dom0 (with and > >> without the acpi-processor driver). > >> > >> Native: 175W > >> dom0: 183W (with only marginal difference between with or without the > >> processor driver) > >> [yes, the system has a somewhat high base consumption which I attribute to a > >> ridiculously dimensioned graphics subsystem to be running a text console] > >> > >> This I would take as C3 and C6 really not being used and the frequency scalingSo the other thing I forgot to note is that C3->C6 have a detrimental effect on some Intel boxes with Xen. We haven''t figured out exactly which ones and the bug is definitly in the hypervisor. The bug is that when the CPU goes in those states the NIC ends up being unresponsive. Its like the interrupts stopped being ACKed. If I run ''xenpm set-max-cstate 2'' the issue disappears.> > > > To go in deeper modes there is also a need to backport a Xen unstable > > hypercall which will allow the kernel to detect the other states besides > > C0-C2. > > > > "XEN_SET_PDC query was implemented in c/s 23783: > > "ACPI: add _PDC input override mechanism". > > > > I see. There is a kernel patch about enabling MWAIT that refers to that...Were there any special things you ran when checking the output? Just plugging and looking at the results?> > > > >> having no impact on the idle system is not that much surprising. But if that was > >> true it would also limit the usefulness of the turbo mode which I understand > >> would also be limited by the c-state of the other cores. > > > > Hm, I should double-check that - but somehow I thought that Xen independetly > > checks for TurboMode and if the P-states are in, then they are activated.I did a bit of checking around and it does seem that is the case. From what I have gathered the TurboMode kicks in when the CPU is C0 mode (which should be obvious), and when the other cores are in anything but C0 mode. And sure enough that seems to be the case. But I can''t get the concrete details whether the "but C0 mode" means that TurboMode will work better if the C mode is legacy C1, C2, C3 or the CPU C-states (so MWAIT enabled). Trying to find out from Len Brown more details..> > > Turbo mode should be enabled. I had been only looking at a generic overview > about it on Intel site which sounded like it would make more of a difference on > how much one core could get overclocked related to how many cores are active > (and I translated active or not into deeper c-states or not). > Looking at the verbose output of turbostat it seems not to make that much > difference whether 2-4 cores are running. A single core alone could get one more > increment in clock stepping. That does not immediately sound a lot. And of > course how much or long the higher clock is used depends on other factors as > well and is not under OS control. > > In the end it is probably quite dynamic and hard to come up with hard facts to > prove its value. Though if I can lower the idle power usage by reaching a bit > further, that would greatly help to justify the effort and potential risk of > backporting...I understand. I wish I could give you the exact percentage points by which the power usage will drop. But I think the more substantial reason benefit of these patches is performance gains. The ones that Ian Campbell ran and were posted on Phorenix site paint that they are beneficial.> > >> > >> Do I misread the data I see? Or maybe its a known limitation? In case it is > >> worth doing more research I''ll gladly try things and gather more data. > > > > Just missing some patches. > > > > Oh, and this one: > > xen/acpi: Fix Kconfig dependency on CPU_FREQ > > > > Hmm.. I think a patch disappeared somewhere.That was the one I referenced at the beginning of this email.
Boris Ostrovsky
2012-May-01 22:35 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On 05/01/2012 04:02 PM, Konrad Rzeszutek Wilk wrote:> On Thu, Apr 26, 2012 at 06:25:28PM +0200, Stefan Bader wrote: >> On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote: >>> On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote: >>>> Since there have been requests about that driver to get backported into 3.2, I >>>> was interested to find out what or how much would be gained by that. >>>> >>>> The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). Which >>>> was not very successful as the drivers bail out of the init function because the >>>> first call to acpi_processor_register_performance() returns -ENODEV. There is >>>> some frequency scaling when running without Xen, so I need to do some more >>>> debugging there.I believe this is caused by the somewhat under-enlightened xen_apic_read(): static u32 xen_apic_read(u32 reg) { return 0; } This results in some data, most importantly boot_cpu_physical_apicid, not being set correctly and, in turn, causes x86_cpu_to_apicid to be broken. On larger AMD systems boot processor is typically APICID=0x20 (I don''t have Intel system handy to see how it looks there). As a quick and dirty test you can try: diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index edc2448..1f78998 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -1781,6 +1781,7 @@ void __init register_lapic_address(unsigned long address) } if (boot_cpu_physical_apicid == -1U) { boot_cpu_physical_apicid = read_apic_id(); + boot_cpu_physical_apicid = 32; apic_version[boot_cpu_physical_apicid] GET_APIC_VERSION(apic_read(APIC_LVR)); } (Set it to whatever APICID on core0 is, I suspect it won''t be zero). -boris>>> >>> Did you back-port the other components - the ones that turn off the native >>> frequency scalling? >>> >>> provide disable_cpufreq() function to disable the API. >>> xen/acpi-processor: Do not depend on CPU frequency scaling drivers. >>> xen/cpufreq: Disable the cpu frequency scaling drivers from loading >>>> >> >> Yes, here is the full set for reference: >> >> * xen/cpufreq: Disable the cpu frequency scaling drivers from loading. >> * xen/acpi: Remove the WARN''s as they just create noise. >> * xen/acpi: Fix Kconfig dependency on CPU_FREQ >> * xen/acpi-processor: Do not depend on CPU frequency scaling drivers. >> * xen/acpi-processor: C and P-state driver that uploads said data to hyper >> * provide disable_cpufreq() function to disable the API. > > And (Linus just pulled it), you also need this one: > df88b2d96e36d9a9e325bfcd12eb45671cbbc937 (xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won''t be loaded.) > >> >>>> The second system was an Intel one (4 core i7 920@2.67GHz) which was >>>> successfully loading the driver. Via xenpm I can see the various frequencies and >>>> also see them being changed. However the cpuidle data out of xenpm looks a bit odd: >>>> >>>> #> xenpm get-cpuidle-states 0 >>>> Max C-state: C7 >>>> >>>> cpu id : 0 >>>> total C-states : 2 >>>> idle time(ms) : 10819311 >>>> C0 : transition [00000000000000000001] >>>> residency [00000000000000005398 ms] >>>> C1 : transition [00000000000000000001] >>>> residency [00000000000010819311 ms] >>>> pc3 : [00000000000000000000 ms] >>>> pc6 : [00000000000000000000 ms] >>>> pc7 : [00000000000000000000 ms] >>>> cc3 : [00000000000000000000 ms] >>>> cc6 : [00000000000000000000 ms] >>>> >>>> Also gathering samples over 30s does look like only C0 and C1 are used. This >>> >>> Yes. >>>> might be because C1E support is enabled in BIOS but when looking at the >>>> intel_idle data in sysfs when running without a hypervisor will show C3 and C6 >>>> for the cores. That could have been just a wrong output, so I plugged in a power >>>> meter and compared a kernel running natively and running as dom0 (with and >>>> without the acpi-processor driver). >>>> >>>> Native: 175W >>>> dom0: 183W (with only marginal difference between with or without the >>>> processor driver) >>>> [yes, the system has a somewhat high base consumption which I attribute to a >>>> ridiculously dimensioned graphics subsystem to be running a text console] >>>> >>>> This I would take as C3 and C6 really not being used and the frequency scaling > > So the other thing I forgot to note is that C3->C6 have a detrimental > effect on some Intel boxes with Xen. We haven''t figured out exactly which ones > and the bug is definitly in the hypervisor. The bug is that when the CPU goes in > those states the NIC ends up being unresponsive. Its like the interrupts stopped > being ACKed. If I run ''xenpm set-max-cstate 2'' the issue disappears. > >>> >>> To go in deeper modes there is also a need to backport a Xen unstable >>> hypercall which will allow the kernel to detect the other states besides >>> C0-C2. >>> >>> "XEN_SET_PDC query was implemented in c/s 23783: >>> "ACPI: add _PDC input override mechanism". >>> >> >> I see. There is a kernel patch about enabling MWAIT that refers to that... > > Were there any special things you ran when checking the output? Just plugging > and looking at the results? >> >>> >>>> having no impact on the idle system is not that much surprising. But if that was >>>> true it would also limit the usefulness of the turbo mode which I understand >>>> would also be limited by the c-state of the other cores. >>> >>> Hm, I should double-check that - but somehow I thought that Xen independetly >>> checks for TurboMode and if the P-states are in, then they are activated. > > I did a bit of checking around and it does seem that is the case. From what > I have gathered the TurboMode kicks in when the CPU is C0 mode (which should > be obvious), and when the other cores are in anything but C0 mode. And sure > enough that seems to be the case. But I can''t get the concrete details whether > the "but C0 mode" means that TurboMode will work better if the C mode is legacy > C1, C2, C3 or the CPU C-states (so MWAIT enabled). Trying to find out from > Len Brown more details.. >>> >> Turbo mode should be enabled. I had been only looking at a generic overview >> about it on Intel site which sounded like it would make more of a difference on >> how much one core could get overclocked related to how many cores are active >> (and I translated active or not into deeper c-states or not). >> Looking at the verbose output of turbostat it seems not to make that much >> difference whether 2-4 cores are running. A single core alone could get one more >> increment in clock stepping. That does not immediately sound a lot. And of >> course how much or long the higher clock is used depends on other factors as >> well and is not under OS control. >> >> In the end it is probably quite dynamic and hard to come up with hard facts to >> prove its value. Though if I can lower the idle power usage by reaching a bit >> further, that would greatly help to justify the effort and potential risk of >> backporting... > > I understand. I wish I could give you the exact percentage points by which > the power usage will drop. But I think the more substantial reason benefit of > these patches is performance gains. The ones that Ian Campbell ran and were > posted on Phorenix site paint that they are beneficial. > >> >>>> >>>> Do I misread the data I see? Or maybe its a known limitation? In case it is >>>> worth doing more research I''ll gladly try things and gather more data. >>> >>> Just missing some patches. >>> >>> Oh, and this one: >>> xen/acpi: Fix Kconfig dependency on CPU_FREQ >>> >>> Hmm.. I think a patch disappeared somewhere. > > That was the one I referenced at the beginning of this email. > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >
Konrad Rzeszutek Wilk
2012-May-01 22:54 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On Tue, May 01, 2012 at 06:35:45PM -0400, Boris Ostrovsky wrote:> On 05/01/2012 04:02 PM, Konrad Rzeszutek Wilk wrote: > >On Thu, Apr 26, 2012 at 06:25:28PM +0200, Stefan Bader wrote: > >>On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote: > >>>On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote: > >>>>Since there have been requests about that driver to get backported into 3.2, I > >>>>was interested to find out what or how much would be gained by that. > >>>> > >>>>The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). Which > >>>>was not very successful as the drivers bail out of the init function because the > >>>>first call to acpi_processor_register_performance() returns -ENODEV. There is > >>>>some frequency scaling when running without Xen, so I need to do some more > >>>>debugging there. > > I believe this is caused by the somewhat under-enlightened xen_apic_read(): > > static u32 xen_apic_read(u32 reg) > { > return 0; > } > > This results in some data, most importantly > boot_cpu_physical_apicid, not being set correctly and, in turn, > causes x86_cpu_to_apicid to be broken.What is the involvment of x86_cpu_to_apicid to acpi_processor_register_performance? Or is this more of a stab in the dark? Stefan, one way to debug this is to make the driver be a module and then configure the /sys/../acpi/debug_level and debug_layer to be 0xffffffff and try loading the module. It should print out tons of data (And the reason it returned -Exxx).> > On larger AMD systems boot processor is typically APICID=0x20 (I > don''t have Intel system handy to see how it looks there). > > As a quick and dirty test you can try: > > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c > index edc2448..1f78998 100644 > --- a/arch/x86/kernel/apic/apic.c > +++ b/arch/x86/kernel/apic/apic.c > @@ -1781,6 +1781,7 @@ void __init register_lapic_address(unsigned > long address) > } > if (boot_cpu_physical_apicid == -1U) { > boot_cpu_physical_apicid = read_apic_id(); > + boot_cpu_physical_apicid = 32; > apic_version[boot_cpu_physical_apicid] > GET_APIC_VERSION(apic_read(APIC_LVR)); > } > > > (Set it to whatever APICID on core0 is, I suspect it won''t be zero). > > -boris > > > >>> > >>>Did you back-port the other components - the ones that turn off the native > >>>frequency scalling? > >>> > >>> provide disable_cpufreq() function to disable the API. > >>> xen/acpi-processor: Do not depend on CPU frequency scaling drivers. > >>> xen/cpufreq: Disable the cpu frequency scaling drivers from loading > >>>> > >> > >>Yes, here is the full set for reference: > >> > >>* xen/cpufreq: Disable the cpu frequency scaling drivers from loading. > >>* xen/acpi: Remove the WARN''s as they just create noise. > >>* xen/acpi: Fix Kconfig dependency on CPU_FREQ > >>* xen/acpi-processor: Do not depend on CPU frequency scaling drivers. > >>* xen/acpi-processor: C and P-state driver that uploads said data to hyper > >>* provide disable_cpufreq() function to disable the API. > > > >And (Linus just pulled it), you also need this one: > > df88b2d96e36d9a9e325bfcd12eb45671cbbc937 (xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won''t be loaded.) > > > >> > >>>>The second system was an Intel one (4 core i7 920@2.67GHz) which was > >>>>successfully loading the driver. Via xenpm I can see the various frequencies and > >>>>also see them being changed. However the cpuidle data out of xenpm looks a bit odd: > >>>> > >>>>#> xenpm get-cpuidle-states 0 > >>>>Max C-state: C7 > >>>> > >>>>cpu id : 0 > >>>>total C-states : 2 > >>>>idle time(ms) : 10819311 > >>>>C0 : transition [00000000000000000001] > >>>> residency [00000000000000005398 ms] > >>>>C1 : transition [00000000000000000001] > >>>> residency [00000000000010819311 ms] > >>>>pc3 : [00000000000000000000 ms] > >>>>pc6 : [00000000000000000000 ms] > >>>>pc7 : [00000000000000000000 ms] > >>>>cc3 : [00000000000000000000 ms] > >>>>cc6 : [00000000000000000000 ms] > >>>> > >>>>Also gathering samples over 30s does look like only C0 and C1 are used. This > >>> > >>>Yes. > >>>>might be because C1E support is enabled in BIOS but when looking at the > >>>>intel_idle data in sysfs when running without a hypervisor will show C3 and C6 > >>>>for the cores. That could have been just a wrong output, so I plugged in a power > >>>>meter and compared a kernel running natively and running as dom0 (with and > >>>>without the acpi-processor driver). > >>>> > >>>>Native: 175W > >>>>dom0: 183W (with only marginal difference between with or without the > >>>> processor driver) > >>>>[yes, the system has a somewhat high base consumption which I attribute to a > >>>>ridiculously dimensioned graphics subsystem to be running a text console] > >>>> > >>>>This I would take as C3 and C6 really not being used and the frequency scaling > > > >So the other thing I forgot to note is that C3->C6 have a detrimental > >effect on some Intel boxes with Xen. We haven''t figured out exactly which ones > >and the bug is definitly in the hypervisor. The bug is that when the CPU goes in > >those states the NIC ends up being unresponsive. Its like the interrupts stopped > >being ACKed. If I run ''xenpm set-max-cstate 2'' the issue disappears. > > > >>> > >>>To go in deeper modes there is also a need to backport a Xen unstable > >>>hypercall which will allow the kernel to detect the other states besides > >>>C0-C2. > >>> > >>>"XEN_SET_PDC query was implemented in c/s 23783: > >>> "ACPI: add _PDC input override mechanism". > >>> > >> > >>I see. There is a kernel patch about enabling MWAIT that refers to that... > > > >Were there any special things you ran when checking the output? Just plugging > >and looking at the results? > >> > >>> > >>>>having no impact on the idle system is not that much surprising. But if that was > >>>>true it would also limit the usefulness of the turbo mode which I understand > >>>>would also be limited by the c-state of the other cores. > >>> > >>>Hm, I should double-check that - but somehow I thought that Xen independetly > >>>checks for TurboMode and if the P-states are in, then they are activated. > > > >I did a bit of checking around and it does seem that is the case. From what > >I have gathered the TurboMode kicks in when the CPU is C0 mode (which should > >be obvious), and when the other cores are in anything but C0 mode. And sure > >enough that seems to be the case. But I can''t get the concrete details whether > >the "but C0 mode" means that TurboMode will work better if the C mode is legacy > >C1, C2, C3 or the CPU C-states (so MWAIT enabled). Trying to find out from > >Len Brown more details.. > >>> > >>Turbo mode should be enabled. I had been only looking at a generic overview > >>about it on Intel site which sounded like it would make more of a difference on > >>how much one core could get overclocked related to how many cores are active > >>(and I translated active or not into deeper c-states or not). > >>Looking at the verbose output of turbostat it seems not to make that much > >>difference whether 2-4 cores are running. A single core alone could get one more > >>increment in clock stepping. That does not immediately sound a lot. And of > >>course how much or long the higher clock is used depends on other factors as > >>well and is not under OS control. > >> > >>In the end it is probably quite dynamic and hard to come up with hard facts to > >>prove its value. Though if I can lower the idle power usage by reaching a bit > >>further, that would greatly help to justify the effort and potential risk of > >>backporting... > > > >I understand. I wish I could give you the exact percentage points by which > >the power usage will drop. But I think the more substantial reason benefit of > >these patches is performance gains. The ones that Ian Campbell ran and were > >posted on Phorenix site paint that they are beneficial. > > > >> > >>>> > >>>>Do I misread the data I see? Or maybe its a known limitation? In case it is > >>>>worth doing more research I''ll gladly try things and gather more data. > >>> > >>>Just missing some patches. > >>> > >>>Oh, and this one: > >>> xen/acpi: Fix Kconfig dependency on CPU_FREQ > >>> > >>>Hmm.. I think a patch disappeared somewhere. > > > >That was the one I referenced at the beginning of this email. > > > >_______________________________________________ > >Xen-devel mailing list > >Xen-devel@lists.xen.org > >http://lists.xen.org/xen-devel > > >
Konrad Rzeszutek Wilk
2012-May-02 00:47 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On Tue, May 01, 2012 at 06:54:56PM -0400, Konrad Rzeszutek Wilk wrote:> On Tue, May 01, 2012 at 06:35:45PM -0400, Boris Ostrovsky wrote: > > On 05/01/2012 04:02 PM, Konrad Rzeszutek Wilk wrote: > > >On Thu, Apr 26, 2012 at 06:25:28PM +0200, Stefan Bader wrote: > > >>On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote: > > >>>On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote: > > >>>>Since there have been requests about that driver to get backported into 3.2, I > > >>>>was interested to find out what or how much would be gained by that. > > >>>> > > >>>>The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). Which > > >>>>was not very successful as the drivers bail out of the init function because the > > >>>>first call to acpi_processor_register_performance() returns -ENODEV. There is > > >>>>some frequency scaling when running without Xen, so I need to do some more > > >>>>debugging there. > > > > I believe this is caused by the somewhat under-enlightened xen_apic_read(): > > > > static u32 xen_apic_read(u32 reg) > > { > > return 0; > > } > > > > This results in some data, most importantly > > boot_cpu_physical_apicid, not being set correctly and, in turn, > > causes x86_cpu_to_apicid to be broken. > > What is the involvment of x86_cpu_to_apicid to acpi_processor_register_performance? > Or is this more of a stab in the dark?Ah, it is the acpi_get_cpuid that gets called by acpi_processor_add->acpi_processor_get_info. And this one: 201 #ifdef CONFIG_SMP 202 for_each_possible_cpu(i) { 203 if (cpu_physical_id(i) == apic_id) 204 return i; 205 } 206 #else where the cpu_physical_id(i) is per_cpu(x86_cpu_to_apicid, i). But it is curious that it has been working for me on AMD and Intel machines. Granted the only server boxes I''ve are Intel - don''t have AMD server boxes at all. Stefan, can you send the full dmesg output too please?
Boris Ostrovsky
2012-May-02 01:11 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On 05/01/2012 08:47 PM, Konrad Rzeszutek Wilk wrote:> On Tue, May 01, 2012 at 06:54:56PM -0400, Konrad Rzeszutek Wilk wrote: >> On Tue, May 01, 2012 at 06:35:45PM -0400, Boris Ostrovsky wrote: >>> On 05/01/2012 04:02 PM, Konrad Rzeszutek Wilk wrote: >>>> On Thu, Apr 26, 2012 at 06:25:28PM +0200, Stefan Bader wrote: >>>>> On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote: >>>>>> On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote: >>>>>>> Since there have been requests about that driver to get backported into 3.2, I >>>>>>> was interested to find out what or how much would be gained by that. >>>>>>> >>>>>>> The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). Which >>>>>>> was not very successful as the drivers bail out of the init function because the >>>>>>> first call to acpi_processor_register_performance() returns -ENODEV. There is >>>>>>> some frequency scaling when running without Xen, so I need to do some more >>>>>>> debugging there. >>> >>> I believe this is caused by the somewhat under-enlightened xen_apic_read(): >>> >>> static u32 xen_apic_read(u32 reg) >>> { >>> return 0; >>> } >>> >>> This results in some data, most importantly >>> boot_cpu_physical_apicid, not being set correctly and, in turn, >>> causes x86_cpu_to_apicid to be broken. >> >> What is the involvment of x86_cpu_to_apicid to acpi_processor_register_performance? >> Or is this more of a stab in the dark? > > Ah, it is the acpi_get_cpuid that gets called by acpi_processor_add->acpi_processor_get_info. > > And this one: > 201 #ifdef CONFIG_SMP > 202 for_each_possible_cpu(i) { > 203 if (cpu_physical_id(i) == apic_id) > 204 return i; > 205 } > 206 #else > > where the cpu_physical_id(i) is per_cpu(x86_cpu_to_apicid, i).Right. This and the fact that ''if (apicid == boot_cpu_physical_apicid)'' in generic_processor_info() is never true. In the end, ''processors'' per-cpu variable is not initialized for cpu 0 and that''s what causes acpi_processor_register_performance() to fail.> > But it is curious that it has been working for me on AMD and Intel machines. > Granted the only server boxes I''ve are Intel - don''t have AMD server boxes at all.I am also surprised that aside from some power inefficiencies and "BIOS bug" warnings the system appeared reasonably OK. I''d think that with APIC IDs messed up it would not run. If Intel processors number cores starting with APICID=0 then you wouldn''t see any issues. -boris> > Stefan, can you send the full dmesg output too please? >
Stefan Bader
2012-May-02 08:22 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
Sorry for not responding sooner, there was a good opportunity for a long week end here I could not miss. Also I had been moving the power meter over to the other machine. On 01.05.2012 22:02, Konrad Rzeszutek Wilk wrote:> On Thu, Apr 26, 2012 at 06:25:28PM +0200, Stefan Bader wrote: >> On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote: >>> On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote: >>>> Since there have been requests about that driver to get backported into 3.2, I >>>> was interested to find out what or how much would be gained by that. >>>> >>>> The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). Which >>>> was not very successful as the drivers bail out of the init function because the >>>> first call to acpi_processor_register_performance() returns -ENODEV. There is >>>> some frequency scaling when running without Xen, so I need to do some more >>>> debugging there. >>> >>> Did you back-port the other components - the ones that turn off the native >>> frequency scalling? >>> >>> provide disable_cpufreq() function to disable the API. >>> xen/acpi-processor: Do not depend on CPU frequency scaling drivers. >>> xen/cpufreq: Disable the cpu frequency scaling drivers from loading >>>> >> >> Yes, here is the full set for reference: >> >> * xen/cpufreq: Disable the cpu frequency scaling drivers from loading. >> * xen/acpi: Remove the WARN''s as they just create noise. >> * xen/acpi: Fix Kconfig dependency on CPU_FREQ >> * xen/acpi-processor: Do not depend on CPU frequency scaling drivers. >> * xen/acpi-processor: C and P-state driver that uploads said data to hyper >> * provide disable_cpufreq() function to disable the API. > > And (Linus just pulled it), you also need this one: > df88b2d96e36d9a9e325bfcd12eb45671cbbc937 (xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won''t be loaded.)I suspect that one only if I would have commit 73c154c60be106b47f15d1111fc2d75cc7a436f2 xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it. and the change in the hypervisor to implement XEN_SET_PDC.> >> >>>> The second system was an Intel one (4 core i7 920@2.67GHz) which was >>>> successfully loading the driver. Via xenpm I can see the various frequencies and >>>> also see them being changed. However the cpuidle data out of xenpm looks a bit odd: >>>> >>>> #> xenpm get-cpuidle-states 0 >>>> Max C-state: C7 >>>> >>>> cpu id : 0 >>>> total C-states : 2 >>>> idle time(ms) : 10819311 >>>> C0 : transition [00000000000000000001] >>>> residency [00000000000000005398 ms] >>>> C1 : transition [00000000000000000001] >>>> residency [00000000000010819311 ms] >>>> pc3 : [00000000000000000000 ms] >>>> pc6 : [00000000000000000000 ms] >>>> pc7 : [00000000000000000000 ms] >>>> cc3 : [00000000000000000000 ms] >>>> cc6 : [00000000000000000000 ms] >>>> >>>> Also gathering samples over 30s does look like only C0 and C1 are used. This >>> >>> Yes. >>>> might be because C1E support is enabled in BIOS but when looking at the >>>> intel_idle data in sysfs when running without a hypervisor will show C3 and C6 >>>> for the cores. That could have been just a wrong output, so I plugged in a power >>>> meter and compared a kernel running natively and running as dom0 (with and >>>> without the acpi-processor driver). >>>> >>>> Native: 175W >>>> dom0: 183W (with only marginal difference between with or without the >>>> processor driver) >>>> [yes, the system has a somewhat high base consumption which I attribute to a >>>> ridiculously dimensioned graphics subsystem to be running a text console] >>>> >>>> This I would take as C3 and C6 really not being used and the frequency scaling > > So the other thing I forgot to note is that C3->C6 have a detrimental > effect on some Intel boxes with Xen. We haven''t figured out exactly which ones > and the bug is definitly in the hypervisor. The bug is that when the CPU goes in > those states the NIC ends up being unresponsive. Its like the interrupts stopped > being ACKed. If I run ''xenpm set-max-cstate 2'' the issue disappears. >I think I saw weird behavior not only under Xen with that. One netbook I got has the nasty tendency to go into undetermined sleeps and you had to hit some keys to keep it running. In that case I found out that the intel gfx card was one which does not cause interrupts on vsync and other sources seemed not able to wake up from deeper c-states when being msi. So the wonder cure there is to force interrupts not being msi. That aside, I was also comparing power usage in idle on the AMD box. And that shows an even bigger difference between running natively and running dom0 under the hypervisor (103W against 127W). As that is AMD there never are other c-states than c1(e) which xenpm shows as being used (got the feeling that with AMD CPUs it is quite hard to find out the c1 residency nowadays (only intel_idle keeps statistics). I think I got some data now but still need to parse it (if I am not wrong one needs to use power trace now). But regardless it would hint that having the deeper c-states may not be the real solution.>>> >>> To go in deeper modes there is also a need to backport a Xen unstable >>> hypercall which will allow the kernel to detect the other states besides >>> C0-C2. >>> >>> "XEN_SET_PDC query was implemented in c/s 23783: >>> "ACPI: add _PDC input override mechanism". >>> >> >> I see. There is a kernel patch about enabling MWAIT that refers to that... > > Were there any special things you ran when checking the output? Just plugging > and looking at the results?In the first step it only was the idle power usage. But as you say the real target for the patches were a better overall performance under use. So I need to think of something to quantify the difference.>> >>> >>>> having no impact on the idle system is not that much surprising. But if that was >>>> true it would also limit the usefulness of the turbo mode which I understand >>>> would also be limited by the c-state of the other cores. >>> >>> Hm, I should double-check that - but somehow I thought that Xen independetly >>> checks for TurboMode and if the P-states are in, then they are activated. > > I did a bit of checking around and it does seem that is the case. From what > I have gathered the TurboMode kicks in when the CPU is C0 mode (which should > be obvious), and when the other cores are in anything but C0 mode. And sure > enough that seems to be the case. But I can''t get the concrete details whether > the "but C0 mode" means that TurboMode will work better if the C mode is legacy > C1, C2, C3 or the CPU C-states (so MWAIT enabled). Trying to find out from > Len Brown more details.. >>>Right, that is how I understood things, too. Depending on how many cores are "active" (which does not really say what active means) the frequency can go up in steps of 133Mhz (or was that 200) up to a certain upper limit. The upper limit depends on how many cores are active but for my CPU seems just one more step between 2-4 cores active and 1 core active. The rest is not under OS control beside the CPU to go up in frequency need to be in P0 mode. From the looks other cores can be in P0 as well and can go up a bit as well.>> Turbo mode should be enabled. I had been only looking at a generic overview >> about it on Intel site which sounded like it would make more of a difference on >> how much one core could get overclocked related to how many cores are active >> (and I translated active or not into deeper c-states or not). >> Looking at the verbose output of turbostat it seems not to make that much >> difference whether 2-4 cores are running. A single core alone could get one more >> increment in clock stepping. That does not immediately sound a lot. And of >> course how much or long the higher clock is used depends on other factors as >> well and is not under OS control. >> >> In the end it is probably quite dynamic and hard to come up with hard facts to >> prove its value. Though if I can lower the idle power usage by reaching a bit >> further, that would greatly help to justify the effort and potential risk of >> backporting... > > I understand. I wish I could give you the exact percentage points by which > the power usage will drop. But I think the more substantial reason benefit of > these patches is performance gains. The ones that Ian Campbell ran and were > posted on Phorenix site paint that they are beneficial.I probably should ask Ian (or check on that post) what tests he was using to measure the difference.> >> >>>> >>>> Do I misread the data I see? Or maybe its a known limitation? In case it is >>>> worth doing more research I''ll gladly try things and gather more data. >>> >>> Just missing some patches. >>> >>> Oh, and this one: >>> xen/acpi: Fix Kconfig dependency on CPU_FREQ >>> >>> Hmm.. I think a patch disappeared somewhere. > > That was the one I referenced at the beginning of this email. > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Stefan Bader
2012-May-02 08:36 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On 02.05.2012 00:35, Boris Ostrovsky wrote:> On 05/01/2012 04:02 PM, Konrad Rzeszutek Wilk wrote: >> On Thu, Apr 26, 2012 at 06:25:28PM +0200, Stefan Bader wrote: >>> On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote: >>>> On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote: >>>>> Since there have been requests about that driver to get backported into 3.2, I >>>>> was interested to find out what or how much would be gained by that. >>>>> >>>>> The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). >>>>> Which >>>>> was not very successful as the drivers bail out of the init function >>>>> because the >>>>> first call to acpi_processor_register_performance() returns -ENODEV. There is >>>>> some frequency scaling when running without Xen, so I need to do some more >>>>> debugging there. > > I believe this is caused by the somewhat under-enlightened xen_apic_read(): > > static u32 xen_apic_read(u32 reg) > { > return 0; > } > > This results in some data, most importantly boot_cpu_physical_apicid, not being > set correctly and, in turn, causes x86_cpu_to_apicid to be broken.Ah ok. I check what my box say and try the change below and gathering more data as suggested in the follow-ups (including to turn on the acpi debugging and debugging in the xen acpi processor driver). The latter I had done but that only would print "max acpi id: 16" (or so) before the failure. No wonder missing the acpi debugging.> > On larger AMD systems boot processor is typically APICID=0x20 (I don''t have > Intel system handy to see how it looks there). > > As a quick and dirty test you can try: > > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c > index edc2448..1f78998 100644 > --- a/arch/x86/kernel/apic/apic.c > +++ b/arch/x86/kernel/apic/apic.c > @@ -1781,6 +1781,7 @@ void __init register_lapic_address(unsigned long address) > } > if (boot_cpu_physical_apicid == -1U) { > boot_cpu_physical_apicid = read_apic_id(); > + boot_cpu_physical_apicid = 32; > apic_version[boot_cpu_physical_apicid] > GET_APIC_VERSION(apic_read(APIC_LVR)); > } > > > (Set it to whatever APICID on core0 is, I suspect it won''t be zero). > > -boris > > >>>> >>>> Did you back-port the other components - the ones that turn off the native >>>> frequency scalling? >>>> >>>> provide disable_cpufreq() function to disable the API. >>>> xen/acpi-processor: Do not depend on CPU frequency scaling drivers. >>>> xen/cpufreq: Disable the cpu frequency scaling drivers from loading >>>>> >>> >>> Yes, here is the full set for reference: >>> >>> * xen/cpufreq: Disable the cpu frequency scaling drivers from loading. >>> * xen/acpi: Remove the WARN''s as they just create noise. >>> * xen/acpi: Fix Kconfig dependency on CPU_FREQ >>> * xen/acpi-processor: Do not depend on CPU frequency scaling drivers. >>> * xen/acpi-processor: C and P-state driver that uploads said data to hyper >>> * provide disable_cpufreq() function to disable the API. >> >> And (Linus just pulled it), you also need this one: >> df88b2d96e36d9a9e325bfcd12eb45671cbbc937 (xen/enlighten: Disable MWAIT_LEAF >> so that acpi-pad won''t be loaded.) >> >>> >>>>> The second system was an Intel one (4 core i7 920@2.67GHz) which was >>>>> successfully loading the driver. Via xenpm I can see the various >>>>> frequencies and >>>>> also see them being changed. However the cpuidle data out of xenpm looks a >>>>> bit odd: >>>>> >>>>> #> xenpm get-cpuidle-states 0 >>>>> Max C-state: C7 >>>>> >>>>> cpu id : 0 >>>>> total C-states : 2 >>>>> idle time(ms) : 10819311 >>>>> C0 : transition [00000000000000000001] >>>>> residency [00000000000000005398 ms] >>>>> C1 : transition [00000000000000000001] >>>>> residency [00000000000010819311 ms] >>>>> pc3 : [00000000000000000000 ms] >>>>> pc6 : [00000000000000000000 ms] >>>>> pc7 : [00000000000000000000 ms] >>>>> cc3 : [00000000000000000000 ms] >>>>> cc6 : [00000000000000000000 ms] >>>>> >>>>> Also gathering samples over 30s does look like only C0 and C1 are used. This >>>> >>>> Yes. >>>>> might be because C1E support is enabled in BIOS but when looking at the >>>>> intel_idle data in sysfs when running without a hypervisor will show C3 and C6 >>>>> for the cores. That could have been just a wrong output, so I plugged in a >>>>> power >>>>> meter and compared a kernel running natively and running as dom0 (with and >>>>> without the acpi-processor driver). >>>>> >>>>> Native: 175W >>>>> dom0: 183W (with only marginal difference between with or without the >>>>> processor driver) >>>>> [yes, the system has a somewhat high base consumption which I attribute to a >>>>> ridiculously dimensioned graphics subsystem to be running a text console] >>>>> >>>>> This I would take as C3 and C6 really not being used and the frequency scaling >> >> So the other thing I forgot to note is that C3->C6 have a detrimental >> effect on some Intel boxes with Xen. We haven''t figured out exactly which ones >> and the bug is definitly in the hypervisor. The bug is that when the CPU goes in >> those states the NIC ends up being unresponsive. Its like the interrupts stopped >> being ACKed. If I run ''xenpm set-max-cstate 2'' the issue disappears. >> >>>> >>>> To go in deeper modes there is also a need to backport a Xen unstable >>>> hypercall which will allow the kernel to detect the other states besides >>>> C0-C2. >>>> >>>> "XEN_SET_PDC query was implemented in c/s 23783: >>>> "ACPI: add _PDC input override mechanism". >>>> >>> >>> I see. There is a kernel patch about enabling MWAIT that refers to that... >> >> Were there any special things you ran when checking the output? Just plugging >> and looking at the results? >>> >>>> >>>>> having no impact on the idle system is not that much surprising. But if >>>>> that was >>>>> true it would also limit the usefulness of the turbo mode which I understand >>>>> would also be limited by the c-state of the other cores. >>>> >>>> Hm, I should double-check that - but somehow I thought that Xen independetly >>>> checks for TurboMode and if the P-states are in, then they are activated. >> >> I did a bit of checking around and it does seem that is the case. From what >> I have gathered the TurboMode kicks in when the CPU is C0 mode (which should >> be obvious), and when the other cores are in anything but C0 mode. And sure >> enough that seems to be the case. But I can''t get the concrete details whether >> the "but C0 mode" means that TurboMode will work better if the C mode is legacy >> C1, C2, C3 or the CPU C-states (so MWAIT enabled). Trying to find out from >> Len Brown more details.. >>>> >>> Turbo mode should be enabled. I had been only looking at a generic overview >>> about it on Intel site which sounded like it would make more of a difference on >>> how much one core could get overclocked related to how many cores are active >>> (and I translated active or not into deeper c-states or not). >>> Looking at the verbose output of turbostat it seems not to make that much >>> difference whether 2-4 cores are running. A single core alone could get one more >>> increment in clock stepping. That does not immediately sound a lot. And of >>> course how much or long the higher clock is used depends on other factors as >>> well and is not under OS control. >>> >>> In the end it is probably quite dynamic and hard to come up with hard facts to >>> prove its value. Though if I can lower the idle power usage by reaching a bit >>> further, that would greatly help to justify the effort and potential risk of >>> backporting... >> >> I understand. I wish I could give you the exact percentage points by which >> the power usage will drop. But I think the more substantial reason benefit of >> these patches is performance gains. The ones that Ian Campbell ran and were >> posted on Phorenix site paint that they are beneficial. >> >>> >>>>> >>>>> Do I misread the data I see? Or maybe its a known limitation? In case it is >>>>> worth doing more research I''ll gladly try things and gather more data. >>>> >>>> Just missing some patches. >>>> >>>> Oh, and this one: >>>> xen/acpi: Fix Kconfig dependency on CPU_FREQ >>>> >>>> Hmm.. I think a patch disappeared somewhere. >> >> That was the one I referenced at the beginning of this email. >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xen.org >> http://lists.xen.org/xen-devel >> > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Jan Beulich
2012-May-02 09:19 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
>>> On 02.05.12 at 03:11, Boris Ostrovsky <boris.ostrovsky@amd.com> wrote: > On 05/01/2012 08:47 PM, Konrad Rzeszutek Wilk wrote: >> But it is curious that it has been working for me on AMD and Intel machines. >> Granted the only server boxes I''ve are Intel - don''t have AMD server boxes at > all. > > I am also surprised that aside from some power inefficiencies and "BIOS > bug" warnings the system appeared reasonably OK. I''d think that with > APIC IDs messed up it would not run.That''s not suprising at all to me, given that the entire LAPIC and interrupt management happens in Xen and not in Dom0. Hence P- and C-state management code is likely the only consumer of this (always going to be bogus under Xen) data. (In our kernels, I specifically removed all traces of these, as any use of them can only have bad effects. In pv-ops this not being an option, the only alternative appears to be to find and adjust each one individually, now and forever.) Jan
Stefan Bader
2012-May-02 14:56 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On 02.05.2012 00:54, Konrad Rzeszutek Wilk wrote:> What is the involvment of x86_cpu_to_apicid to acpi_processor_register_performance? > Or is this more of a stab in the dark? > > Stefan, one way to debug this is to make the driver be a module and then > configure the /sys/../acpi/debug_level and debug_layer to be 0xffffffff > and try loading the module. It should print out tons of data (And the reason > it returned -Exxx).For completeness here the run without the modification Boris had suggested. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Stefan Bader
2012-May-02 15:01 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On 02.05.2012 00:35, Boris Ostrovsky wrote:> On 05/01/2012 04:02 PM, Konrad Rzeszutek Wilk wrote: >> On Thu, Apr 26, 2012 at 06:25:28PM +0200, Stefan Bader wrote: >>> On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote: >>>> On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote: >>>>> Since there have been requests about that driver to get backported into 3.2, I >>>>> was interested to find out what or how much would be gained by that. >>>>> >>>>> The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). >>>>> Which >>>>> was not very successful as the drivers bail out of the init function >>>>> because the >>>>> first call to acpi_processor_register_performance() returns -ENODEV. There is >>>>> some frequency scaling when running without Xen, so I need to do some more >>>>> debugging there. > > I believe this is caused by the somewhat under-enlightened xen_apic_read(): > > static u32 xen_apic_read(u32 reg) > { > return 0; > } > > This results in some data, most importantly boot_cpu_physical_apicid, not being > set correctly and, in turn, causes x86_cpu_to_apicid to be broken. > > On larger AMD systems boot processor is typically APICID=0x20 (I don''t have > Intel system handy to see how it looks there). > > As a quick and dirty test you can try: > > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c > index edc2448..1f78998 100644 > --- a/arch/x86/kernel/apic/apic.c > +++ b/arch/x86/kernel/apic/apic.c > @@ -1781,6 +1781,7 @@ void __init register_lapic_address(unsigned long address) > } > if (boot_cpu_physical_apicid == -1U) { > boot_cpu_physical_apicid = read_apic_id(); > + boot_cpu_physical_apicid = 32; > apic_version[boot_cpu_physical_apicid] > GET_APIC_VERSION(apic_read(APIC_LVR)); > } > > > (Set it to whatever APICID on core0 is, I suspect it won''t be zero). >Indeed, when I hack the above id to be 0x10 (as my dmesg tells) most of the BIOS bug messages are gone and the xen-acpi-processor driver successfully loads and seems to be switching frequencies ok (just a quick tight loop which made one cpu go from P4 to P0). -Stefan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Konrad Rzeszutek Wilk
2012-May-02 16:08 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On Wed, May 02, 2012 at 05:01:00PM +0200, Stefan Bader wrote:> On 02.05.2012 00:35, Boris Ostrovsky wrote: > > On 05/01/2012 04:02 PM, Konrad Rzeszutek Wilk wrote: > >> On Thu, Apr 26, 2012 at 06:25:28PM +0200, Stefan Bader wrote: > >>> On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote: > >>>> On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote: > >>>>> Since there have been requests about that driver to get backported into 3.2, I > >>>>> was interested to find out what or how much would be gained by that. > >>>>> > >>>>> The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). > >>>>> Which > >>>>> was not very successful as the drivers bail out of the init function > >>>>> because the > >>>>> first call to acpi_processor_register_performance() returns -ENODEV. There is > >>>>> some frequency scaling when running without Xen, so I need to do some more > >>>>> debugging there. > > > > I believe this is caused by the somewhat under-enlightened xen_apic_read(): > > > > static u32 xen_apic_read(u32 reg) > > { > > return 0; > > } > > > > This results in some data, most importantly boot_cpu_physical_apicid, not being > > set correctly and, in turn, causes x86_cpu_to_apicid to be broken. > > > > On larger AMD systems boot processor is typically APICID=0x20 (I don''t have > > Intel system handy to see how it looks there). > > > > As a quick and dirty test you can try: > > > > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c > > index edc2448..1f78998 100644 > > --- a/arch/x86/kernel/apic/apic.c > > +++ b/arch/x86/kernel/apic/apic.c > > @@ -1781,6 +1781,7 @@ void __init register_lapic_address(unsigned long address) > > } > > if (boot_cpu_physical_apicid == -1U) { > > boot_cpu_physical_apicid = read_apic_id(); > > + boot_cpu_physical_apicid = 32; > > apic_version[boot_cpu_physical_apicid] > > GET_APIC_VERSION(apic_read(APIC_LVR)); > > } > > > > > > (Set it to whatever APICID on core0 is, I suspect it won''t be zero). > > > > Indeed, when I hack the above id to be 0x10 (as my dmesg tells) most of the BIOS > bug messages are gone and the xen-acpi-processor driver successfully loads and > seems to be switching frequencies ok (just a quick tight loop which made one cpu > go from P4 to P0).OK. Can you try the attached patch pls? It should do the same thing as what the debug patch does - get the _real_ APIC ID from the hypervisor. (And as bonus it also removes the annoying: BIOS bug: APIC version is 0 for CPU 0/0x0, fixing up to 0x10 diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index a8f8844..d816448 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -811,7 +811,29 @@ static void xen_io_delay(void) #ifdef CONFIG_X86_LOCAL_APIC static u32 xen_apic_read(u32 reg) { - return 0; + struct xen_platform_op op = { + .cmd = XENPF_get_cpuinfo, + .interface_version = XENPF_INTERFACE_VERSION, + .u.pcpu_info.xen_cpuid = 0, + }; + int ret = 0; + + /* Shouldn''t need this as APIC is turned off for PV, and we only + * get called on the bootup processor. But just in case. */ + if (!xen_initial_domain() || smp_processor_id()) + return 0; + + if (reg == APIC_LVR) + return 0x10; + + if (reg != APIC_ID) + return 0; + + ret = HYPERVISOR_dom0_op(&op); + if (ret) + return 0; + + return op.u.pcpu_info.apic_id; } static void xen_apic_write(u32 reg, u32 val)
Boris Ostrovsky
2012-May-02 17:06 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On 05/02/2012 12:08 PM, Konrad Rzeszutek Wilk wrote:> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > index a8f8844..d816448 100644 > --- a/arch/x86/xen/enlighten.c > +++ b/arch/x86/xen/enlighten.c > @@ -811,7 +811,29 @@ static void xen_io_delay(void) > #ifdef CONFIG_X86_LOCAL_APIC > static u32 xen_apic_read(u32 reg) > { > - return 0; > + struct xen_platform_op op = { > + .cmd = XENPF_get_cpuinfo, > + .interface_version = XENPF_INTERFACE_VERSION, > + .u.pcpu_info.xen_cpuid = 0,Is this always zero? This will probably solve the current problem but I am wondering whether in the future we might hit another bug because this routine will return the same APICID for all VCPUs. -boris> + }; > + int ret = 0; > + > + /* Shouldn''t need this as APIC is turned off for PV, and we only > + * get called on the bootup processor. But just in case. */ > + if (!xen_initial_domain() || smp_processor_id()) > + return 0; > + > + if (reg == APIC_LVR) > + return 0x10; > + > + if (reg != APIC_ID) > + return 0; > + > + ret = HYPERVISOR_dom0_op(&op); > + if (ret) > + return 0; > + > + return op.u.pcpu_info.apic_id; > } > > static void xen_apic_write(u32 reg, u32 val) >
Konrad Rzeszutek Wilk
2012-May-02 17:14 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On Wed, May 02, 2012 at 01:06:34PM -0400, Boris Ostrovsky wrote:> On 05/02/2012 12:08 PM, Konrad Rzeszutek Wilk wrote: > >diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > >index a8f8844..d816448 100644 > >--- a/arch/x86/xen/enlighten.c > >+++ b/arch/x86/xen/enlighten.c > >@@ -811,7 +811,29 @@ static void xen_io_delay(void) > > #ifdef CONFIG_X86_LOCAL_APIC > > static u32 xen_apic_read(u32 reg) > > { > >- return 0; > >+ struct xen_platform_op op = { > >+ .cmd = XENPF_get_cpuinfo, > >+ .interface_version = XENPF_INTERFACE_VERSION, > >+ .u.pcpu_info.xen_cpuid = 0, > > > Is this always zero? This will probably solve the current problemIts a CPU number (not tied in to APIC or ACPI IDs).> but I am wondering whether in the future we might hit another bug > because this routine will return the same APICID for all VCPUs.Later on it does a check for ''smp_processor_id()'' - and if that is anything but zero it will bail out. So this shoudl solve the problem for the bootup processor.> > -boris > > > >+ }; > >+ int ret = 0; > >+ > >+ /* Shouldn''t need this as APIC is turned off for PV, and we only > >+ * get called on the bootup processor. But just in case. */ > >+ if (!xen_initial_domain() || smp_processor_id()) > >+ return 0; > >+ > >+ if (reg == APIC_LVR) > >+ return 0x10; > >+ > >+ if (reg != APIC_ID) > >+ return 0; > >+ > >+ ret = HYPERVISOR_dom0_op(&op); > >+ if (ret) > >+ return 0; > >+ > >+ return op.u.pcpu_info.apic_id; > > } > > > > static void xen_apic_write(u32 reg, u32 val) > > >
Stefan Bader
2012-May-02 21:29 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On 02.05.2012 18:08, Konrad Rzeszutek Wilk wrote:> On Wed, May 02, 2012 at 05:01:00PM +0200, Stefan Bader wrote: >> On 02.05.2012 00:35, Boris Ostrovsky wrote: >>> On 05/01/2012 04:02 PM, Konrad Rzeszutek Wilk wrote: >>>> On Thu, Apr 26, 2012 at 06:25:28PM +0200, Stefan Bader wrote: >>>>> On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote: >>>>>> On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote: >>>>>>> Since there have been requests about that driver to get backported into 3.2, I >>>>>>> was interested to find out what or how much would be gained by that. >>>>>>> >>>>>>> The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). >>>>>>> Which >>>>>>> was not very successful as the drivers bail out of the init function >>>>>>> because the >>>>>>> first call to acpi_processor_register_performance() returns -ENODEV. There is >>>>>>> some frequency scaling when running without Xen, so I need to do some more >>>>>>> debugging there. >>> >>> I believe this is caused by the somewhat under-enlightened xen_apic_read(): >>> >>> static u32 xen_apic_read(u32 reg) >>> { >>> return 0; >>> } >>> >>> This results in some data, most importantly boot_cpu_physical_apicid, not being >>> set correctly and, in turn, causes x86_cpu_to_apicid to be broken. >>> >>> On larger AMD systems boot processor is typically APICID=0x20 (I don''t have >>> Intel system handy to see how it looks there). >>> >>> As a quick and dirty test you can try: >>> >>> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c >>> index edc2448..1f78998 100644 >>> --- a/arch/x86/kernel/apic/apic.c >>> +++ b/arch/x86/kernel/apic/apic.c >>> @@ -1781,6 +1781,7 @@ void __init register_lapic_address(unsigned long address) >>> } >>> if (boot_cpu_physical_apicid == -1U) { >>> boot_cpu_physical_apicid = read_apic_id(); >>> + boot_cpu_physical_apicid = 32; >>> apic_version[boot_cpu_physical_apicid] >>> GET_APIC_VERSION(apic_read(APIC_LVR)); >>> } >>> >>> >>> (Set it to whatever APICID on core0 is, I suspect it won''t be zero). >>> >> >> Indeed, when I hack the above id to be 0x10 (as my dmesg tells) most of the BIOS >> bug messages are gone and the xen-acpi-processor driver successfully loads and >> seems to be switching frequencies ok (just a quick tight loop which made one cpu >> go from P4 to P0). > > OK. Can you try the attached patch pls? It should do the same thing > as what the debug patch does - get the _real_ APIC ID from the hypervisor. > (And as bonus it also removes the annoying: > > BIOS bug: APIC version is 0 for CPU 0/0x0, fixing up to 0x10All the BIOS bug messages are gone from the Intel and the AMD box. Just the AMD box again fails to load the xen-acpi-processor driver. Though it must be a different exit. I had enabled acpi debugging for the modprobe call but this time there is no trace of any acpi messages.> > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > index a8f8844..d816448 100644 > --- a/arch/x86/xen/enlighten.c > +++ b/arch/x86/xen/enlighten.c > @@ -811,7 +811,29 @@ static void xen_io_delay(void) > #ifdef CONFIG_X86_LOCAL_APIC > static u32 xen_apic_read(u32 reg) > { > - return 0; > + struct xen_platform_op op = { > + .cmd = XENPF_get_cpuinfo, > + .interface_version = XENPF_INTERFACE_VERSION, > + .u.pcpu_info.xen_cpuid = 0, > + }; > + int ret = 0; > + > + /* Shouldn''t need this as APIC is turned off for PV, and we only > + * get called on the bootup processor. But just in case. */ > + if (!xen_initial_domain() || smp_processor_id()) > + return 0; > + > + if (reg == APIC_LVR) > + return 0x10; > + > + if (reg != APIC_ID) > + return 0; > + > + ret = HYPERVISOR_dom0_op(&op); > + if (ret) > + return 0; > + > + return op.u.pcpu_info.apic_id; > } > > static void xen_apic_write(u32 reg, u32 val) > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Boris Ostrovsky
2012-May-02 21:31 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On 05/02/2012 01:14 PM, Konrad Rzeszutek Wilk wrote:> On Wed, May 02, 2012 at 01:06:34PM -0400, Boris Ostrovsky wrote: >> On 05/02/2012 12:08 PM, Konrad Rzeszutek Wilk wrote: >>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c >>> index a8f8844..d816448 100644 >>> --- a/arch/x86/xen/enlighten.c >>> +++ b/arch/x86/xen/enlighten.c >>> @@ -811,7 +811,29 @@ static void xen_io_delay(void) >>> #ifdef CONFIG_X86_LOCAL_APIC >>> static u32 xen_apic_read(u32 reg) >>> { >>> - return 0; >>> + struct xen_platform_op op = { >>> + .cmd = XENPF_get_cpuinfo, >>> + .interface_version = XENPF_INTERFACE_VERSION, >>> + .u.pcpu_info.xen_cpuid = 0, >> >> >> Is this always zero? This will probably solve the current problem > > Its a CPU number (not tied in to APIC or ACPI IDs).Why not use CPU number instead of zero here?> >> but I am wondering whether in the future we might hit another bug >> because this routine will return the same APICID for all VCPUs. > > Later on it does a check for ''smp_processor_id()'' - and if > that is anything but zero it will bail out.Can you point me to the check you are referring to? -boris> > So this shoudl solve the problem for the bootup processor. >> >> -boris >> >> >>> + }; >>> + int ret = 0; >>> + >>> + /* Shouldn''t need this as APIC is turned off for PV, and we only >>> + * get called on the bootup processor. But just in case. */ >>> + if (!xen_initial_domain() || smp_processor_id()) >>> + return 0; >>> + >>> + if (reg == APIC_LVR) >>> + return 0x10; >>> + >>> + if (reg != APIC_ID) >>> + return 0; >>> + >>> + ret = HYPERVISOR_dom0_op(&op); >>> + if (ret) >>> + return 0; >>> + >>> + return op.u.pcpu_info.apic_id; >>> } >>> >>> static void xen_apic_write(u32 reg, u32 val) >>> >> >
Konrad Rzeszutek Wilk
2012-May-02 21:41 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On Wed, May 02, 2012 at 02:31:07PM -0700, Boris Ostrovsky wrote:> On 05/02/2012 01:14 PM, Konrad Rzeszutek Wilk wrote: > >On Wed, May 02, 2012 at 01:06:34PM -0400, Boris Ostrovsky wrote: > >>On 05/02/2012 12:08 PM, Konrad Rzeszutek Wilk wrote: > >>>diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > >>>index a8f8844..d816448 100644 > >>>--- a/arch/x86/xen/enlighten.c > >>>+++ b/arch/x86/xen/enlighten.c > >>>@@ -811,7 +811,29 @@ static void xen_io_delay(void) > >>> #ifdef CONFIG_X86_LOCAL_APIC > >>> static u32 xen_apic_read(u32 reg) > >>> { > >>>- return 0; > >>>+ struct xen_platform_op op = { > >>>+ .cmd = XENPF_get_cpuinfo, > >>>+ .interface_version = XENPF_INTERFACE_VERSION, > >>>+ .u.pcpu_info.xen_cpuid = 0, > >> > >> > >>Is this always zero? This will probably solve the current problem > > > >Its a CPU number (not tied in to APIC or ACPI IDs). > > Why not use CPU number instead of zero here?The issue was only with the bootup CPU - so was using the Xen''s bootup CPU number - which is zero (as is Linux''s).> > > > >>but I am wondering whether in the future we might hit another bug > >>because this routine will return the same APICID for all VCPUs. > > > > Later on it does a check for ''smp_processor_id()'' - and if > >that is anything but zero it will bail out. > > Can you point me to the check you are referring to?if (!xen_initial_domain() || smp_processor_id())> > -boris > > > > > >So this shoudl solve the problem for the bootup processor. > >> > >>-boris > >> > >> > >>>+ }; > >>>+ int ret = 0; > >>>+ > >>>+ /* Shouldn''t need this as APIC is turned off for PV, and we only > >>>+ * get called on the bootup processor. But just in case. */ > >>>+ if (!xen_initial_domain() || smp_processor_id()) > >>>+ return 0; > >>>+ > >>>+ if (reg == APIC_LVR) > >>>+ return 0x10; > >>>+ > >>>+ if (reg != APIC_ID) > >>>+ return 0; > >>>+ > >>>+ ret = HYPERVISOR_dom0_op(&op); > >>>+ if (ret) > >>>+ return 0; > >>>+ > >>>+ return op.u.pcpu_info.apic_id; > >>> } > >>> > >>> static void xen_apic_write(u32 reg, u32 val) > >>> > >> > > > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Boris Ostrovsky
2012-May-02 22:09 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On 05/02/2012 05:41 PM, Konrad Rzeszutek Wilk wrote:> On Wed, May 02, 2012 at 02:31:07PM -0700, Boris Ostrovsky wrote: >> On 05/02/2012 01:14 PM, Konrad Rzeszutek Wilk wrote: >>> On Wed, May 02, 2012 at 01:06:34PM -0400, Boris Ostrovsky wrote: >>>> On 05/02/2012 12:08 PM, Konrad Rzeszutek Wilk wrote: >>>>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c >>>>> index a8f8844..d816448 100644 >>>>> --- a/arch/x86/xen/enlighten.c >>>>> +++ b/arch/x86/xen/enlighten.c >>>>> @@ -811,7 +811,29 @@ static void xen_io_delay(void) >>>>> #ifdef CONFIG_X86_LOCAL_APIC >>>>> static u32 xen_apic_read(u32 reg) >>>>> { >>>>> - return 0; >>>>> + struct xen_platform_op op = { >>>>> + .cmd = XENPF_get_cpuinfo, >>>>> + .interface_version = XENPF_INTERFACE_VERSION, >>>>> + .u.pcpu_info.xen_cpuid = 0, >>>> >>>> >>>> Is this always zero? This will probably solve the current problem >>> >>> Its a CPU number (not tied in to APIC or ACPI IDs). >> >> Why not use CPU number instead of zero here? > > The issue was only with the bootup CPU - so was using the Xen''s > bootup CPU number - which is zero (as is Linux''s).I agree that for this particular problem this may be sufficient. My concern is that in the future someone may decide to use apic_read(APIC_ID) or read_apic_id() for some other purpose and they won''t get expected result (i.e. on all CPUs they will get the same answer).> >> >>> >>>> but I am wondering whether in the future we might hit another bug >>>> because this routine will return the same APICID for all VCPUs. >>> >>> Later on it does a check for ''smp_processor_id()'' - and if >>> that is anything but zero it will bail out. >> >> Can you point me to the check you are referring to? > > if (!xen_initial_domain() || smp_processor_id())I don''t see this line --- neither in the mainline nor in your kernel. Which kernel and which routine is this in? BTW, this patch doesn''t quite work, xen-acpi-processor driver fails to load with the same error as before. I''ll look at this tomorrow more carefully. -boris> > >> >> -boris >> >> >>> >>> So this shoudl solve the problem for the bootup processor. >>>> >>>> -boris >>>> >>>> >>>>> + }; >>>>> + int ret = 0; >>>>> + >>>>> + /* Shouldn''t need this as APIC is turned off for PV, and we only >>>>> + * get called on the bootup processor. But just in case. */ >>>>> + if (!xen_initial_domain() || smp_processor_id()) >>>>> + return 0; >>>>> + >>>>> + if (reg == APIC_LVR) >>>>> + return 0x10; >>>>> + >>>>> + if (reg != APIC_ID) >>>>> + return 0; >>>>> + >>>>> + ret = HYPERVISOR_dom0_op(&op); >>>>> + if (ret) >>>>> + return 0; >>>>> + >>>>> + return op.u.pcpu_info.apic_id; >>>>> } >>>>> >>>>> static void xen_apic_write(u32 reg, u32 val) >>>>> >>>> >>> >> >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xen.org >> http://lists.xen.org/xen-devel >
Stefan Bader
2012-May-03 06:55 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On 03.05.2012 00:09, Boris Ostrovsky wrote:> On 05/02/2012 05:41 PM, Konrad Rzeszutek Wilk wrote: >> On Wed, May 02, 2012 at 02:31:07PM -0700, Boris Ostrovsky wrote: >>> On 05/02/2012 01:14 PM, Konrad Rzeszutek Wilk wrote: >>>> On Wed, May 02, 2012 at 01:06:34PM -0400, Boris Ostrovsky wrote: >>>>> On 05/02/2012 12:08 PM, Konrad Rzeszutek Wilk wrote: >>>>>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c >>>>>> index a8f8844..d816448 100644 >>>>>> --- a/arch/x86/xen/enlighten.c >>>>>> +++ b/arch/x86/xen/enlighten.c >>>>>> @@ -811,7 +811,29 @@ static void xen_io_delay(void) >>>>>> #ifdef CONFIG_X86_LOCAL_APIC >>>>>> static u32 xen_apic_read(u32 reg) >>>>>> { >>>>>> - return 0; >>>>>> + struct xen_platform_op op = { >>>>>> + .cmd = XENPF_get_cpuinfo, >>>>>> + .interface_version = XENPF_INTERFACE_VERSION, >>>>>> + .u.pcpu_info.xen_cpuid = 0, >>>>> >>>>> >>>>> Is this always zero? This will probably solve the current problem >>>> >>>> Its a CPU number (not tied in to APIC or ACPI IDs). >>> >>> Why not use CPU number instead of zero here? >> >> The issue was only with the bootup CPU - so was using the Xen''s >> bootup CPU number - which is zero (as is Linux''s). > > I agree that for this particular problem this may be sufficient. > > My concern is that in the future someone may decide to use apic_read(APIC_ID) or > read_apic_id() for some other purpose and they won''t get expected result (i.e. > on all CPUs they will get the same answer). > >> >>> >>>> >>>>> but I am wondering whether in the future we might hit another bug >>>>> because this routine will return the same APICID for all VCPUs. >>>> >>>> Later on it does a check for ''smp_processor_id()'' - and if >>>> that is anything but zero it will bail out. >>> >>> Can you point me to the check you are referring to? >> >> if (!xen_initial_domain() || smp_processor_id()) > > I don''t see this line --- neither in the mainline nor in your kernel. Which > kernel and which routine is this in?This is in the inline patch Konrad asked me to check.> > BTW, this patch doesn''t quite work, xen-acpi-processor driver fails to load with > the same error as before. I''ll look at this tomorrow more carefully.It does fail but at least for me it seems slightly different. I do the modprobe after turning on all acpi debugging levels and layers. And without any change there are queries visible. With that patch the driver just fails but there are no queries. I plan to have another go with messages sprinkled to all exit paths today, too (was just a bit too late and two pints later yesterday). -Stefan> > > -boris > >> >> >>> >>> -boris >>> >>> >>>> >>>> So this shoudl solve the problem for the bootup processor. >>>>> >>>>> -boris >>>>> >>>>> >>>>>> + }; >>>>>> + int ret = 0; >>>>>> + >>>>>> + /* Shouldn''t need this as APIC is turned off for PV, and we only >>>>>> + * get called on the bootup processor. But just in case. */ >>>>>> + if (!xen_initial_domain() || smp_processor_id()) >>>>>> + return 0; >>>>>> + >>>>>> + if (reg == APIC_LVR) >>>>>> + return 0x10; >>>>>> + >>>>>> + if (reg != APIC_ID) >>>>>> + return 0; >>>>>> + >>>>>> + ret = HYPERVISOR_dom0_op(&op); >>>>>> + if (ret) >>>>>> + return 0; >>>>>> + >>>>>> + return op.u.pcpu_info.apic_id; >>>>>> } >>>>>> >>>>>> static void xen_apic_write(u32 reg, u32 val) >>>>>> >>>>> >>>> >>> >>> >>> >>> _______________________________________________ >>> Xen-devel mailing list >>> Xen-devel@lists.xen.org >>> http://lists.xen.org/xen-devel >> > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Stefan Bader
2012-May-03 10:00 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On 03.05.2012 08:55, Stefan Bader wrote:> On 03.05.2012 00:09, Boris Ostrovsky wrote: >> On 05/02/2012 05:41 PM, Konrad Rzeszutek Wilk wrote: >>> On Wed, May 02, 2012 at 02:31:07PM -0700, Boris Ostrovsky wrote: >>>> On 05/02/2012 01:14 PM, Konrad Rzeszutek Wilk wrote: >>>>> On Wed, May 02, 2012 at 01:06:34PM -0400, Boris Ostrovsky wrote: >>>>>> On 05/02/2012 12:08 PM, Konrad Rzeszutek Wilk wrote: >>>>>>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c >>>>>>> index a8f8844..d816448 100644 >>>>>>> --- a/arch/x86/xen/enlighten.c >>>>>>> +++ b/arch/x86/xen/enlighten.c >>>>>>> @@ -811,7 +811,29 @@ static void xen_io_delay(void) >>>>>>> #ifdef CONFIG_X86_LOCAL_APIC >>>>>>> static u32 xen_apic_read(u32 reg) >>>>>>> { >>>>>>> - return 0; >>>>>>> + struct xen_platform_op op = { >>>>>>> + .cmd = XENPF_get_cpuinfo, >>>>>>> + .interface_version = XENPF_INTERFACE_VERSION, >>>>>>> + .u.pcpu_info.xen_cpuid = 0, >>>>>> >>>>>> >>>>>> Is this always zero? This will probably solve the current problem >>>>> >>>>> Its a CPU number (not tied in to APIC or ACPI IDs). >>>> >>>> Why not use CPU number instead of zero here? >>> >>> The issue was only with the bootup CPU - so was using the Xen''s >>> bootup CPU number - which is zero (as is Linux''s). >> >> I agree that for this particular problem this may be sufficient. >> >> My concern is that in the future someone may decide to use apic_read(APIC_ID) or >> read_apic_id() for some other purpose and they won''t get expected result (i.e. >> on all CPUs they will get the same answer). >> >>> >>>> >>>>> >>>>>> but I am wondering whether in the future we might hit another bug >>>>>> because this routine will return the same APICID for all VCPUs. >>>>> >>>>> Later on it does a check for ''smp_processor_id()'' - and if >>>>> that is anything but zero it will bail out. >>>> >>>> Can you point me to the check you are referring to? >>> >>> if (!xen_initial_domain() || smp_processor_id()) >> >> I don''t see this line --- neither in the mainline nor in your kernel. Which >> kernel and which routine is this in? > > This is in the inline patch Konrad asked me to check. >> >> BTW, this patch doesn''t quite work, xen-acpi-processor driver fails to load with >> the same error as before. I''ll look at this tomorrow more carefully. > > It does fail but at least for me it seems slightly different. I do the modprobe > after turning on all acpi debugging levels and layers. And without any change > there are queries visible. With that patch the driver just fails but there are > no queries. I plan to have another go with messages sprinkled to all exit paths > today, too (was just a bit too late and two pints later yesterday).Gah! Once you do it right... It *does* fail the exactly same way and there are some acpi debug messages...> > > -Stefan >> >> >> -boris >> >>> >>> >>>> >>>> -boris >>>> >>>> >>>>> >>>>> So this shoudl solve the problem for the bootup processor. >>>>>> >>>>>> -boris >>>>>> >>>>>> >>>>>>> + }; >>>>>>> + int ret = 0; >>>>>>> + >>>>>>> + /* Shouldn''t need this as APIC is turned off for PV, and we only >>>>>>> + * get called on the bootup processor. But just in case. */ >>>>>>> + if (!xen_initial_domain() || smp_processor_id()) >>>>>>> + return 0; >>>>>>> + >>>>>>> + if (reg == APIC_LVR) >>>>>>> + return 0x10; >>>>>>> + >>>>>>> + if (reg != APIC_ID) >>>>>>> + return 0; >>>>>>> + >>>>>>> + ret = HYPERVISOR_dom0_op(&op); >>>>>>> + if (ret) >>>>>>> + return 0; >>>>>>> + >>>>>>> + return op.u.pcpu_info.apic_id; >>>>>>> } >>>>>>> >>>>>>> static void xen_apic_write(u32 reg, u32 val) >>>>>>> >>>>>> >>>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Xen-devel mailing list >>>> Xen-devel@lists.xen.org >>>> http://lists.xen.org/xen-devel >>> >> >> > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Stefan Bader
2012-May-03 12:58 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On 03.05.2012 00:09, Boris Ostrovsky wrote:> On 05/02/2012 05:41 PM, Konrad Rzeszutek Wilk wrote: >> On Wed, May 02, 2012 at 02:31:07PM -0700, Boris Ostrovsky wrote: >>> On 05/02/2012 01:14 PM, Konrad Rzeszutek Wilk wrote: >>>> On Wed, May 02, 2012 at 01:06:34PM -0400, Boris Ostrovsky wrote: >>>>> On 05/02/2012 12:08 PM, Konrad Rzeszutek Wilk wrote: >>>>>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c >>>>>> index a8f8844..d816448 100644 >>>>>> --- a/arch/x86/xen/enlighten.c >>>>>> +++ b/arch/x86/xen/enlighten.c >>>>>> @@ -811,7 +811,29 @@ static void xen_io_delay(void) >>>>>> #ifdef CONFIG_X86_LOCAL_APIC >>>>>> static u32 xen_apic_read(u32 reg) >>>>>> { >>>>>> - return 0; >>>>>> + struct xen_platform_op op = { >>>>>> + .cmd = XENPF_get_cpuinfo, >>>>>> + .interface_version = XENPF_INTERFACE_VERSION, >>>>>> + .u.pcpu_info.xen_cpuid = 0, >>>>> >>>>> >>>>> Is this always zero? This will probably solve the current problem >>>> >>>> Its a CPU number (not tied in to APIC or ACPI IDs). >>> >>> Why not use CPU number instead of zero here? >> >> The issue was only with the bootup CPU - so was using the Xen''s >> bootup CPU number - which is zero (as is Linux''s). > > I agree that for this particular problem this may be sufficient. > > My concern is that in the future someone may decide to use apic_read(APIC_ID) or > read_apic_id() for some other purpose and they won''t get expected result (i.e. > on all CPUs they will get the same answer). > >> >>> >>>> >>>>> but I am wondering whether in the future we might hit another bug >>>>> because this routine will return the same APICID for all VCPUs. >>>> >>>> Later on it does a check for ''smp_processor_id()'' - and if >>>> that is anything but zero it will bail out. >>> >>> Can you point me to the check you are referring to? >> >> if (!xen_initial_domain() || smp_processor_id()) > > I don''t see this line --- neither in the mainline nor in your kernel. Which > kernel and which routine is this in? > > BTW, this patch doesn''t quite work, xen-acpi-processor driver fails to load with > the same error as before. I''ll look at this tomorrow more carefully. > > > -boris > >> >> >>> >>> -boris >>> >>> >>>> >>>> So this shoudl solve the problem for the bootup processor. >>>>> >>>>> -boris >>>>> >>>>> >>>>>> + }; >>>>>> + int ret = 0; >>>>>> + >>>>>> + /* Shouldn''t need this as APIC is turned off for PV, and we only >>>>>> + * get called on the bootup processor. But just in case. */ >>>>>> + if (!xen_initial_domain() || smp_processor_id()) >>>>>> + return 0; >>>>>> + >>>>>> + if (reg == APIC_LVR) >>>>>> + return 0x10; >>>>>> + >>>>>> + if (reg != APIC_ID) >>>>>> + return 0; >>>>>> + >>>>>> + ret = HYPERVISOR_dom0_op(&op); >>>>>> + if (ret) >>>>>> + return 0; >>>>>> + >>>>>> + return op.u.pcpu_info.apic_id; >>>>>> } >>>>>> >>>>>> static void xen_apic_write(u32 reg, u32 val)I added debugging to all exit paths that could return 0 (which is what the boot_cpu_physical_apicid is set to with that patch. Which would only leave the case of the HV call returning the wrong value somehow...>>>>>> >>>>> >>>> >>> >>> >>> >>> _______________________________________________ >>> Xen-devel mailing list >>> Xen-devel@lists.xen.org >>> http://lists.xen.org/xen-devel >> > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Stefan Bader
2012-May-03 14:47 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On 03.05.2012 14:58, Stefan Bader wrote:>>>>> So this shoudl solve the problem for the bootup processor. >>>>>> >>>>>> -boris >>>>>> >>>>>> >>>>>>> + }; >>>>>>> + int ret = 0; >>>>>>> + >>>>>>> + /* Shouldn''t need this as APIC is turned off for PV, and we only >>>>>>> + * get called on the bootup processor. But just in case. */ >>>>>>> + if (!xen_initial_domain() || smp_processor_id()) >>>>>>> + return 0; >>>>>>> + >>>>>>> + if (reg == APIC_LVR) >>>>>>> + return 0x10; >>>>>>> + >>>>>>> + if (reg != APIC_ID) >>>>>>> + return 0; >>>>>>> + >>>>>>> + ret = HYPERVISOR_dom0_op(&op); >>>>>>> + if (ret) >>>>>>> + return 0; >>>>>>> + >>>>>>> + return op.u.pcpu_info.apic_id; >>>>>>> } >>>>>>> >>>>>>> static void xen_apic_write(u32 reg, u32 val) > > I added debugging to all exit paths that could return 0 (which is what the > boot_cpu_physical_apicid is set to with that patch. Which would only leave the > case of the HV call returning the wrong value somehow... >Hmmm, so xen_apic_read is still correct... [ 0.000000] ACPI: Local APIC address 0xfee00000 [ 0.000000] xxx xen_apic_read(20) [ 0.000000] xxx xen_apic_read -> 10 [ 0.000000] boot_cpu_physical_apicid = 0 [ 0.000000] xxx xen_apic_read(30) [ 0.000000] +- apic version = 10 there seems to be a slightly strange tweak (at least for me) in read_apic_id... static inline unsigned int read_apic_id(void) { unsigned int reg; reg = apic_read(APIC_ID); // calls apic->read(reg) return apic->get_apic_id(reg); } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Konrad Rzeszutek Wilk
2012-May-03 15:46 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On Thu, May 03, 2012 at 04:47:46PM +0200, Stefan Bader wrote:> On 03.05.2012 14:58, Stefan Bader wrote: > > >>>>> So this shoudl solve the problem for the bootup processor. > >>>>>> > >>>>>> -boris > >>>>>> > >>>>>> > >>>>>>> + }; > >>>>>>> + int ret = 0; > >>>>>>> + > >>>>>>> + /* Shouldn''t need this as APIC is turned off for PV, and we only > >>>>>>> + * get called on the bootup processor. But just in case. */ > >>>>>>> + if (!xen_initial_domain() || smp_processor_id()) > >>>>>>> + return 0; > >>>>>>> + > >>>>>>> + if (reg == APIC_LVR) > >>>>>>> + return 0x10; > >>>>>>> + > >>>>>>> + if (reg != APIC_ID) > >>>>>>> + return 0; > >>>>>>> + > >>>>>>> + ret = HYPERVISOR_dom0_op(&op); > >>>>>>> + if (ret) > >>>>>>> + return 0; > >>>>>>> + > >>>>>>> + return op.u.pcpu_info.apic_id; > >>>>>>> } > >>>>>>> > >>>>>>> static void xen_apic_write(u32 reg, u32 val) > > > > I added debugging to all exit paths that could return 0 (which is what the > > boot_cpu_physical_apicid is set to with that patch. Which would only leave the > > case of the HV call returning the wrong value somehow... > > > Hmmm, so xen_apic_read is still correct... > > [ 0.000000] ACPI: Local APIC address 0xfee00000 > [ 0.000000] xxx xen_apic_read(20) > [ 0.000000] xxx xen_apic_read -> 10 > [ 0.000000] boot_cpu_physical_apicid = 0 > [ 0.000000] xxx xen_apic_read(30) > [ 0.000000] +- apic version = 10 > > there seems to be a slightly strange tweak (at least for me) in read_apic_id... > > static inline unsigned int read_apic_id(void) > { > unsigned int reg; > > reg = apic_read(APIC_ID); // calls apic->read(reg) > > return apic->get_apic_id(reg);Duh!! Let me spin out a new patch that will do this.> } > >> _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Konrad Rzeszutek Wilk
2012-May-03 16:14 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On Wed, May 02, 2012 at 06:09:26PM -0400, Boris Ostrovsky wrote:> On 05/02/2012 05:41 PM, Konrad Rzeszutek Wilk wrote: > >On Wed, May 02, 2012 at 02:31:07PM -0700, Boris Ostrovsky wrote: > >>On 05/02/2012 01:14 PM, Konrad Rzeszutek Wilk wrote: > >>>On Wed, May 02, 2012 at 01:06:34PM -0400, Boris Ostrovsky wrote: > >>>>On 05/02/2012 12:08 PM, Konrad Rzeszutek Wilk wrote: > >>>>>diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > >>>>>index a8f8844..d816448 100644 > >>>>>--- a/arch/x86/xen/enlighten.c > >>>>>+++ b/arch/x86/xen/enlighten.c > >>>>>@@ -811,7 +811,29 @@ static void xen_io_delay(void) > >>>>> #ifdef CONFIG_X86_LOCAL_APIC > >>>>> static u32 xen_apic_read(u32 reg) > >>>>> { > >>>>>- return 0; > >>>>>+ struct xen_platform_op op = { > >>>>>+ .cmd = XENPF_get_cpuinfo, > >>>>>+ .interface_version = XENPF_INTERFACE_VERSION, > >>>>>+ .u.pcpu_info.xen_cpuid = 0, > >>>> > >>>> > >>>>Is this always zero? This will probably solve the current problem > >>> > >>>Its a CPU number (not tied in to APIC or ACPI IDs). > >> > >>Why not use CPU number instead of zero here? > > > >The issue was only with the bootup CPU - so was using the Xen''s > >bootup CPU number - which is zero (as is Linux''s). > > I agree that for this particular problem this may be sufficient. > > My concern is that in the future someone may decide to use > apic_read(APIC_ID) or read_apic_id() for some other purpose and they > won''t get expected result (i.e. on all CPUs they will get the same > answer).Good point. Let''s get this particular bug fixed for v3.5, and then will do a more comprehensive fix for v3.6.
Boris Ostrovsky
2012-May-03 17:02 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On 05/03/2012 11:46 AM, Konrad Rzeszutek Wilk wrote:> On Thu, May 03, 2012 at 04:47:46PM +0200, Stefan Bader wrote: >> On 03.05.2012 14:58, Stefan Bader wrote: >> >>>>>>> So this shoudl solve the problem for the bootup processor. >>>>>>>> >>>>>>>> -boris >>>>>>>> >>>>>>>> >>>>>>>>> + }; >>>>>>>>> + int ret = 0; >>>>>>>>> + >>>>>>>>> + /* Shouldn''t need this as APIC is turned off for PV, and we only >>>>>>>>> + * get called on the bootup processor. But just in case. */ >>>>>>>>> + if (!xen_initial_domain() || smp_processor_id()) >>>>>>>>> + return 0; >>>>>>>>> + >>>>>>>>> + if (reg == APIC_LVR) >>>>>>>>> + return 0x10; >>>>>>>>> + >>>>>>>>> + if (reg != APIC_ID) >>>>>>>>> + return 0; >>>>>>>>> + >>>>>>>>> + ret = HYPERVISOR_dom0_op(&op); >>>>>>>>> + if (ret) >>>>>>>>> + return 0; >>>>>>>>> + >>>>>>>>> + return op.u.pcpu_info.apic_id;return (op.u.pcpu_info.apic_id << 24); indeed fixes the problem. -boris>>>>>>>>> } >>>>>>>>> >>>>>>>>> static void xen_apic_write(u32 reg, u32 val) >>> >>> I added debugging to all exit paths that could return 0 (which is what the >>> boot_cpu_physical_apicid is set to with that patch. Which would only leave the >>> case of the HV call returning the wrong value somehow... >>> >> Hmmm, so xen_apic_read is still correct... >> >> [ 0.000000] ACPI: Local APIC address 0xfee00000 >> [ 0.000000] xxx xen_apic_read(20) >> [ 0.000000] xxx xen_apic_read -> 10 >> [ 0.000000] boot_cpu_physical_apicid = 0 >> [ 0.000000] xxx xen_apic_read(30) >> [ 0.000000] +- apic version = 10 >> >> there seems to be a slightly strange tweak (at least for me) in read_apic_id... >> >> static inline unsigned int read_apic_id(void) >> { >> unsigned int reg; >> >> reg = apic_read(APIC_ID); // calls apic->read(reg) >> >> return apic->get_apic_id(reg); > > Duh!! Let me spin out a new patch that will do this. >> } >> >> > > > >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xen.org >> http://lists.xen.org/xen-devel > >
Konrad Rzeszutek Wilk
2012-May-03 17:08 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
> > Hmmm, so xen_apic_read is still correct... > > > > [ 0.000000] ACPI: Local APIC address 0xfee00000 > > [ 0.000000] xxx xen_apic_read(20) > > [ 0.000000] xxx xen_apic_read -> 10 > > [ 0.000000] boot_cpu_physical_apicid = 0 > > [ 0.000000] xxx xen_apic_read(30) > > [ 0.000000] +- apic version = 10 > > > > there seems to be a slightly strange tweak (at least for me) in read_apic_id... > > > > static inline unsigned int read_apic_id(void) > > { > > unsigned int reg; > > > > reg = apic_read(APIC_ID); // calls apic->read(reg) > > > > return apic->get_apic_id(reg); > > Duh!! Let me spin out a new patch that will do this.Meaning bit-shift it. We ended up doing 10 >> 24 (get_apic_id does that) which results in zero. So lets be a bit more cautious and over-write the get_apic_id and set_apic_id and as well do the proper bit-shifting. commit 4bb450ea9dca1b8d845f1b53ab6476615a32badf Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Date: Wed May 2 15:04:51 2012 -0400 xen/apic: Return the APIC ID (and version) for CPU 0. On x86_64 on AMD machines where the first APIC_ID is not zero, we get: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x10] enabled) BIOS bug: APIC version is 0 for CPU 1/0x10, fixing up to 0x10 BIOS bug: APIC version mismatch, boot CPU: 0, CPU 1: version 10 which means that when the ACPI processor driver loads and tries to parse the _Pxx states it fails to do as, as it ends up calling acpi_get_cpuid which does this: for_each_possible_cpu(i) { if (cpu_physical_id(i) == apic_id) return i; } And the bootup CPU, has not been found so it fails and returns -1 for the first CPU - which then subsequently in the loop that "acpi_processor_get_info" does results in returning an error, which means that "acpi_processor_add" failing and per_cpu(processor) is never set (and is NULL). That means that when xen-acpi-processor tries to load (much much later on) and parse the P-states it gets -ENODEV from acpi_processor_register_performance() (which tries to read the per_cpu(processor)) and fails to parse the data. Reported-by: Stefan Bader <stefan.bader@canonical.com> Suggested-by: Boris Ostrovsky <boris.ostrovsky@amd.com> [v2: Bit-shift APIC ID by 24 bits] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index a8f8844..63d6c22 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -53,6 +53,9 @@ #include <asm/processor.h> #include <asm/proto.h> #include <asm/msr-index.h> +#ifdef CONFIG_NUMA +#include <asm/numa.h> +#endif #include <asm/traps.h> #include <asm/setup.h> #include <asm/desc.h> @@ -809,9 +812,40 @@ static void xen_io_delay(void) } #ifdef CONFIG_X86_LOCAL_APIC +static unsigned long xen_set_apic_id(unsigned int x) +{ + WARN_ON(1); + return x; +} +static unsigned int xen_get_apic_id(unsigned long x) +{ + return (((x)>>24) & 0xFFu); +} static u32 xen_apic_read(u32 reg) { - return 0; + struct xen_platform_op op = { + .cmd = XENPF_get_cpuinfo, + .interface_version = XENPF_INTERFACE_VERSION, + .u.pcpu_info.xen_cpuid = 0, + }; + int ret = 0; + + /* Shouldn''t need this as APIC is turned off for PV, and we only + * get called on the bootup processor. But just in case. */ + if (!xen_initial_domain() || smp_processor_id()) + return 0; + + if (reg == APIC_LVR) + return 0x10; + + if (reg != APIC_ID) + return 0; + + ret = HYPERVISOR_dom0_op(&op); + if (ret) + return 0; + + return op.u.pcpu_info.apic_id << 24; } static void xen_apic_write(u32 reg, u32 val) @@ -849,6 +883,8 @@ static void set_xen_basic_apic_ops(void) apic->icr_write = xen_apic_icr_write; apic->wait_icr_idle = xen_apic_wait_icr_idle; apic->safe_wait_icr_idle = xen_safe_apic_wait_icr_idle; + apic->set_apic_id = xen_set_apic_id; + apic->get_apic_id = xen_get_apic_id; } #endif @@ -1294,6 +1330,9 @@ asmlinkage void __init xen_start_kernel(void) */ acpi_numa = -1; #endif +#if defined(CONFIG_NUMA) && defined(CONFIG_X86_32) + numa_off = 1; +#endif pgd = (pgd_t *)xen_start_info->pt_base;
Stefan Bader
2012-May-04 08:00 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On 03.05.2012 19:08, Konrad Rzeszutek Wilk wrote:>>> Hmmm, so xen_apic_read is still correct... >>> >>> [ 0.000000] ACPI: Local APIC address 0xfee00000 >>> [ 0.000000] xxx xen_apic_read(20) >>> [ 0.000000] xxx xen_apic_read -> 10 >>> [ 0.000000] boot_cpu_physical_apicid = 0 >>> [ 0.000000] xxx xen_apic_read(30) >>> [ 0.000000] +- apic version = 10 >>> >>> there seems to be a slightly strange tweak (at least for me) in read_apic_id... >>> >>> static inline unsigned int read_apic_id(void) >>> { >>> unsigned int reg; >>> >>> reg = apic_read(APIC_ID); // calls apic->read(reg) >>> >>> return apic->get_apic_id(reg); >> >> Duh!! Let me spin out a new patch that will do this. > > Meaning bit-shift it. We ended up doing 10 >> 24 (get_apic_id does that) > which results in zero. So lets be a bit more cautious and over-write > the get_apic_id and set_apic_id and as well do the proper bit-shifting. >I can confirm that with this patch applied I can load the xen-acpi-processor driver and see P-states changing in xenpm. No BIOS bug messages. Also tested on the i7 and it does still work. I am attaching the dmesg output of both runs. On the i7 the xen apic functions are called in two places and for cpu#0 only. On AMD, I see additional call later and for all cpus while enabling them. apic->read bails out, but apic->get_apic_id returns 0 (though I could see no immediate problem from that). Maybe that info is helpful for the next stage. -Stefan> commit 4bb450ea9dca1b8d845f1b53ab6476615a32badf > Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > Date: Wed May 2 15:04:51 2012 -0400 > > xen/apic: Return the APIC ID (and version) for CPU 0. > > On x86_64 on AMD machines where the first APIC_ID is not zero, we get: > > ACPI: LAPIC (acpi_id[0x01] lapic_id[0x10] enabled) > BIOS bug: APIC version is 0 for CPU 1/0x10, fixing up to 0x10 > BIOS bug: APIC version mismatch, boot CPU: 0, CPU 1: version 10 > > which means that when the ACPI processor driver loads and > tries to parse the _Pxx states it fails to do as, as it > ends up calling acpi_get_cpuid which does this: > > for_each_possible_cpu(i) { > if (cpu_physical_id(i) == apic_id) > return i; > } > > And the bootup CPU, has not been found so it fails and returns -1 > for the first CPU - which then subsequently in the loop that > "acpi_processor_get_info" does results in returning an error, which > means that "acpi_processor_add" failing and per_cpu(processor) > is never set (and is NULL). > > That means that when xen-acpi-processor tries to load (much much > later on) and parse the P-states it gets -ENODEV from > acpi_processor_register_performance() (which tries to read > the per_cpu(processor)) and fails to parse the data. > > Reported-by: Stefan Bader <stefan.bader@canonical.com> > Suggested-by: Boris Ostrovsky <boris.ostrovsky@amd.com> > [v2: Bit-shift APIC ID by 24 bits] > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>Tested-by: Stefan Bader <stefan.bader@canonical.com>> > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > index a8f8844..63d6c22 100644 > --- a/arch/x86/xen/enlighten.c > +++ b/arch/x86/xen/enlighten.c > @@ -53,6 +53,9 @@ > #include <asm/processor.h> > #include <asm/proto.h> > #include <asm/msr-index.h> > +#ifdef CONFIG_NUMA > +#include <asm/numa.h> > +#endif > #include <asm/traps.h> > #include <asm/setup.h> > #include <asm/desc.h> > @@ -809,9 +812,40 @@ static void xen_io_delay(void) > } > > #ifdef CONFIG_X86_LOCAL_APIC > +static unsigned long xen_set_apic_id(unsigned int x) > +{ > + WARN_ON(1); > + return x; > +} > +static unsigned int xen_get_apic_id(unsigned long x) > +{ > + return (((x)>>24) & 0xFFu); > +} > static u32 xen_apic_read(u32 reg) > { > - return 0; > + struct xen_platform_op op = { > + .cmd = XENPF_get_cpuinfo, > + .interface_version = XENPF_INTERFACE_VERSION, > + .u.pcpu_info.xen_cpuid = 0, > + }; > + int ret = 0; > + > + /* Shouldn''t need this as APIC is turned off for PV, and we only > + * get called on the bootup processor. But just in case. */ > + if (!xen_initial_domain() || smp_processor_id()) > + return 0; > + > + if (reg == APIC_LVR) > + return 0x10; > + > + if (reg != APIC_ID) > + return 0; > + > + ret = HYPERVISOR_dom0_op(&op); > + if (ret) > + return 0; > + > + return op.u.pcpu_info.apic_id << 24; > } > > static void xen_apic_write(u32 reg, u32 val) > @@ -849,6 +883,8 @@ static void set_xen_basic_apic_ops(void) > apic->icr_write = xen_apic_icr_write; > apic->wait_icr_idle = xen_apic_wait_icr_idle; > apic->safe_wait_icr_idle = xen_safe_apic_wait_icr_idle; > + apic->set_apic_id = xen_set_apic_id; > + apic->get_apic_id = xen_get_apic_id; > } > > #endif > @@ -1294,6 +1330,9 @@ asmlinkage void __init xen_start_kernel(void) > */ > acpi_numa = -1; > #endif > +#if defined(CONFIG_NUMA) && defined(CONFIG_X86_32) > + numa_off = 1; > +#endif > > pgd = (pgd_t *)xen_start_info->pt_base; > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Pasi Kärkkäinen
2012-May-06 15:23 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On Thu, Apr 26, 2012 at 01:04:26PM -0400, Konrad Rzeszutek Wilk wrote:> > >> > > >> This I would take as C3 and C6 really not being used and the frequency scaling > > > > > > To go in deeper modes there is also a need to backport a Xen unstable > > > hypercall which will allow the kernel to detect the other states besides > > > C0-C2. > > > > > > "XEN_SET_PDC query was implemented in c/s 23783: > > > "ACPI: add _PDC input override mechanism". > > > > > > > I see. There is a kernel patch about enabling MWAIT that refers to that... > > Yeah, I should see about back-porting it in Xen 4.1.. >Should this patch be backported and merged to xen-4.1-testing.hg for Xen 4.1.3 release ? Thanks, -- Pasi
Konrad Rzeszutek Wilk
2012-May-07 17:33 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On Sun, May 06, 2012 at 06:23:41PM +0300, Pasi Kärkkäinen wrote:> On Thu, Apr 26, 2012 at 01:04:26PM -0400, Konrad Rzeszutek Wilk wrote: > > > >> > > > >> This I would take as C3 and C6 really not being used and the frequency scaling > > > > > > > > To go in deeper modes there is also a need to backport a Xen unstable > > > > hypercall which will allow the kernel to detect the other states besides > > > > C0-C2. > > > > > > > > "XEN_SET_PDC query was implemented in c/s 23783: > > > > "ACPI: add _PDC input override mechanism". > > > > > > > > > > I see. There is a kernel patch about enabling MWAIT that refers to that... > > > > Yeah, I should see about back-porting it in Xen 4.1.. > > > > Should this patch be backported and merged to xen-4.1-testing.hg for Xen 4.1.3 release ?It is a new feature so no. Which does mean that the MWAIT states won''t be uploaded to the hypervisor. But the legacy ones (so the ones that are in the ACPI _CST) are still uploaded. And TurboMode kicks in without the MWAIT states.
Pasi Kärkkäinen
2012-May-07 17:44 UTC
Re: Workings/effectiveness of the xen-acpi-processor driver
On Mon, May 07, 2012 at 01:33:06PM -0400, Konrad Rzeszutek Wilk wrote:> On Sun, May 06, 2012 at 06:23:41PM +0300, Pasi Kärkkäinen wrote: > > On Thu, Apr 26, 2012 at 01:04:26PM -0400, Konrad Rzeszutek Wilk wrote: > > > > >> > > > > >> This I would take as C3 and C6 really not being used and the frequency scaling > > > > > > > > > > To go in deeper modes there is also a need to backport a Xen unstable > > > > > hypercall which will allow the kernel to detect the other states besides > > > > > C0-C2. > > > > > > > > > > "XEN_SET_PDC query was implemented in c/s 23783: > > > > > "ACPI: add _PDC input override mechanism". > > > > > > > > > > > > > I see. There is a kernel patch about enabling MWAIT that refers to that... > > > > > > Yeah, I should see about back-porting it in Xen 4.1.. > > > > > > > Should this patch be backported and merged to xen-4.1-testing.hg for Xen 4.1.3 release ? > > It is a new feature so no. Which does mean that the MWAIT states won''t be uploaded > to the hypervisor. But the legacy ones (so the ones that are in the ACPI _CST) > are still uploaded. And TurboMode kicks in without the MWAIT states. >Ah, thanks for clearing that up. -- Pasi