Konrad Rzeszutek Wilk
2012-Mar-06 17:40 UTC
Is: drivers/cpufreq/cpufreq-xen.c Was:Re: [PATCH 2 of 2] linux-xencommons: Load processor-passthru
.. snip..
>> Both of them (acpi-cpufreq.c and powernow-k8.c) have a symbol
>> dependency on drivers/acpi/processor.c
>
> But them being 'm' or 'y' shouldn't matter in the end.

I thought you were saying it matters - as it should be done around the
same time as cpufreq drivers were loaded?

.. snip..
>> For a), this would mean some form of unregistering the existing
>> cpufreq scaling drivers. The reason
>
> Or loading before them (and not depending on them), thus
> preventing them from loading successfully.

I think what you are suggesting is to write a driver in drivers/cpufreq/
that either gets started before the other ones (if built-in) or, if built as
a module, gets loaded from xencommons. That driver would then make the calls
to acpi_processor_preregister_performance(),
acpi_processor_register_performance() and acpi_processor_notify_smm().
It would function as a cpufreq scaling driver, but
in reality all calls to it from the cpufreq gov-* drivers would end up as nops.

Dave, would you be OK with a driver like that in your tree?

>
>> for that is we want to use the generic ones (acpi-cpufreq and
>> powernow-k8) b/c they do all the filtering and parsing of the ACPI
>> data instead of re-implementing it in our own cpufreq-xen-scaling.

I don't know what I was reading, but the filtering/parsing looks to be done
via those acpi_processor_* calls. So it sounds like it could be done that way.

>> Though one other option is to export both powernow-k8 and
>> acpi-cpufreq functions that do this and use them within the
>> cpufreq-xen-scaling-driver, but that sounds icky.
>
> Indeed.
>
>> 2). Upload the power management information up to the hypervisor.
>
> Which doesn't require cpufreq drivers at all (in non-pv-ops we simply
> suppress the CPU_FREQ config option when XEN is set).

Heh.

> Jan
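The "load first" idea relies on the cpufreq core accepting only one scaling driver at a time: cpufreq_register_driver() fails with -EBUSY once a driver is already registered. The sketch below is a minimal userspace model of that rule, not kernel code, and every name in it (struct model_driver, model_register_driver(), xen_nop, demo_load_first() and so on) is hypothetical. It shows a nop driver that registers first, so acpi-cpufreq is refused and governor requests are swallowed instead of reaching the hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cpufreq_driver: just a name and a
 * target callback, which is all this model needs. */
struct model_driver {
	const char *name;
	int (*target)(unsigned int cpu, unsigned int khz);
};

/* One registration slot, modelling the cpufreq core's one-driver rule. */
static struct model_driver *registered;

int model_register_driver(struct model_driver *drv)
{
	if (drv == NULL || drv->target == NULL)
		return -EINVAL;
	if (registered != NULL)
		return -EBUSY;		/* another driver got there first */
	registered = drv;
	return 0;
}

void model_unregister_driver(struct model_driver *drv)
{
	if (registered == drv)
		registered = NULL;
}

/* Nop target: dom0 accepts governor requests but never touches the
 * hardware, leaving P-state control to the hypervisor. */
static int xen_nop_target(unsigned int cpu, unsigned int khz)
{
	(void)cpu;
	(void)khz;
	return 0;
}

/* Stand-in for acpi-cpufreq: it only records the requested frequency. */
static unsigned int acpi_last_khz;
static int acpi_target(unsigned int cpu, unsigned int khz)
{
	(void)cpu;
	acpi_last_khz = khz;
	return 0;
}

struct model_driver xen_nop = { "xen-nop", xen_nop_target };
struct model_driver acpi_cpufreq = { "acpi-cpufreq", acpi_target };

/* Usage: the nop driver registers first, so acpi-cpufreq is refused and
 * a governor request never reaches acpi_target(). */
void demo_load_first(void)
{
	assert(model_register_driver(&xen_nop) == 0);
	assert(model_register_driver(&acpi_cpufreq) == -EBUSY);
	assert(registered->target(0, 1600000) == 0);
	assert(acpi_last_khz == 0);
	model_unregister_driver(&xen_nop);
}
```

Registering in the opposite order, which is what happens when the Xen driver is a module loaded after acpi-cpufreq, leaves acpi-cpufreq in the slot. That is the ordering weakness of this approach when the drivers are 'm'.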
Dave Jones
2012-Mar-06 17:59 UTC
Re: Is: drivers/cpufreq/cpufreq-xen.c Was:Re: [PATCH 2 of 2] linux-xencommons: Load processor-passthru
On Tue, Mar 06, 2012 at 12:40:08PM -0500, Konrad Rzeszutek Wilk wrote:
 > I think what you are suggesting is to write a driver in drivers/cpufreq/
 > that either gets started before the other ones (if built-in) or, if built as
 > a module, gets loaded from xencommons. That driver would then make the calls
 > to acpi_processor_preregister_performance(),
 > acpi_processor_register_performance() and acpi_processor_notify_smm().
 > It would function as a cpufreq scaling driver, but
 > in reality all calls to it from the cpufreq gov-* drivers would end up as nops.
 >
 > Dave, would you be OK with a driver like that in your tree?

I joined this thread half-way through, so I'm not sure what the original problem was.
How is a driver that does nothing better than just masking out the cpufreq capabilities to guests?

	Dave
Jan Beulich
2012-Mar-07 08:18 UTC
Is: drivers/cpufreq/cpufreq-xen.c Was:Re: [PATCH 2 of 2] linux-xencommons: Load processor-passthru
>>> On 06.03.12 at 18:40, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:
>>> Both of them (acpi-cpufreq.c and powernow-k8.c) have a symbol
>>> dependency on drivers/acpi/processor.c
>>
>> But them being 'm' or 'y' shouldn't matter in the end.
>
> I thought you were saying it matters - as it should be done around the
> same time as cpufreq drivers were loaded?

Depends on how you interpret "matters": you should not introduce a
requirement for these to be set to 'y'. And I was hinting at the fact that
when they're 'm', a distro may load them much later than the processor
driver (SuSE, for instance, generally loads the processor driver from
initrd, but cpufreq only post-boot, when the selected run level's scripts
get executed).

>>> For a), this would mean some form of unregistering the existing
>>> cpufreq scaling drivers. The reason
>>
>> Or loading before them (and not depending on them), thus
>> preventing them from loading successfully.
>
> I think what you are suggesting is to write a driver in drivers/cpufreq/
> that either gets started before the other ones (if built-in) or, if built as
> a module, gets loaded from xencommons. That driver would then make the calls
> to acpi_processor_preregister_performance(),
> acpi_processor_register_performance() and acpi_processor_notify_smm().
> It would function as a cpufreq scaling driver, but
> in reality all calls to it from the cpufreq gov-* drivers would end up as nops.

Yes, that's the option I would expect to be the least cumbersome one if you
want to go the cpufreq driver route. I'd personally prefer the
processor-extcntl logic to be ported over into ACPI's processor driver, with
the loading of cpufreq drivers suppressed altogether (unless "cpufreq=dom0"
was given on the Xen command line, an option whose introduction I always
considered bogus).

Jan
Konrad Rzeszutek Wilk
2012-Mar-08 02:12 UTC
Re: Is: drivers/cpufreq/cpufreq-xen.c Was:Re: [PATCH 2 of 2] linux-xencommons: Load processor-passthru
> > I think what you are suggesting is to write a driver in drivers/cpufreq/
> > that either gets started before the other ones (if built-in) or, if built as
> > a module, gets loaded from xencommons. That driver would then make the calls
> > to acpi_processor_preregister_performance(),
> > acpi_processor_register_performance() and acpi_processor_notify_smm().
> > It would function as a cpufreq scaling driver, but
> > in reality all calls to it from the cpufreq gov-* drivers would end up as nops.
> >
> > Dave, would you be OK with a driver like that in your tree?
>
> I joined this thread half-way through, so I'm not sure what the original problem was.
> How is a driver that does nothing better than just masking out the cpufreq capabilities to guests?

Hey Dave,

The problem statement is three-fold:
 1). Parse and upload ACPI0007 (or PROCESSOR_TYPE) information to the
     hypervisor - aka P-states.
 2). Upload the Cx state information.
 3). Inhibit CPU frequency scaling drivers from loading.

The reason for wanting to solve 1) and 2) is that the Xen hypervisor is the
only one that knows the CPU usage of the different guests and can make the
proper decision about when to put CPUs and packages into the appropriate
states. Unfortunately the hypervisor has no support for parsing ACPI DSDT
tables, hence it needs help from the initial domain to provide this
information.

The reason for 3) is that we do not want the initial domain to change
P-states while the hypervisor is doing it as well - it causes some rather
funny P-state transitions.

So in the past (old classic XenOLinux patches) there were patches added in
drivers/acpi/processor_* to make the appropriate hypercalls, and the CPUFREQ
drivers were not built for the Xen kernels. Neither one of those is an
option for the upstream kernel.

I've been looking at how to leverage the existing wealth of functionality
that the drivers/acpi/processor_* libs provide and trying to use that.

The first couple of versions would harvest the data after the cpufreq
scaling drivers had used it and upload it. But that would not solve the 3)
case. So then I went off and made a cpufreq governor that would be a nop and
do 1) and 2).

The last incarnation [see attached] instead uses the drivers/acpi/processor_*
libs to fetch the ACPI information, and calls "acpi_processor_notify_smm" to
inhibit the cpufreq scaling drivers from being able to load. It actually
works pretty well when it is built-in, but I am not sure how to make it
bullet-proof when CONFIG_X86_ACPI_CPUFREQ=m.

So my big question is whether there could be a 'cpufreq.off=1' API, similar
to the disable_cpuidle() call that inhibits the cpuidle drivers?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
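The three steps in the problem statement can be pictured as a hand-off: dom0 passes the parsed P-state and C-state data to the hypervisor and then guarantees it never changes P-states itself. The sketch below is a userspace model of that hand-off under loud assumptions: struct pstate, struct hv_pm_info, dom0_pm_handoff(), dom0_cpufreq_inhibited and demo_handoff() are all hypothetical, the "upload" is a plain memcpy rather than a hypercall, and the example values are made up. Neither the drivers/acpi/processor_* parsing nor the real Xen interface is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical P-state record (frequency, power, control/status values). */
struct pstate {
	unsigned int core_mhz;
	unsigned int power_mw;
	unsigned int control;
	unsigned int status;
};

#define MAX_PX 16
#define MAX_CX 8

/* Hypothetical hypervisor-side view: Xen cannot parse the DSDT, so it
 * knows only what the initial domain hands over. */
struct hv_pm_info {
	size_t npx;
	struct pstate px[MAX_PX];
	size_t ncx;
	unsigned int cx_latency_us[MAX_CX];
};

/* Hypothetical flag standing in for step 3, "inhibit the cpufreq drivers". */
static bool dom0_cpufreq_inhibited;

int dom0_pm_handoff(struct hv_pm_info *hv,
		    const struct pstate *px, size_t npx,
		    const unsigned int *cx_latency_us, size_t ncx)
{
	if (npx > MAX_PX || ncx > MAX_CX)
		return -EINVAL;
	memcpy(hv->px, px, npx * sizeof(*px));		/* 1) P-states */
	hv->npx = npx;
	memcpy(hv->cx_latency_us, cx_latency_us,
	       ncx * sizeof(*cx_latency_us));		/* 2) Cx states */
	hv->ncx = ncx;
	dom0_cpufreq_inhibited = true;			/* 3) hands off */
	return 0;
}

/* Usage with made-up example values. */
void demo_handoff(void)
{
	const struct pstate px[] = { { 2400, 35000, 0x10, 0x10 },
				     { 1600, 20000, 0x20, 0x20 } };
	const unsigned int cx[] = { 1, 80 };
	struct hv_pm_info hv = { 0 };

	assert(dom0_pm_handoff(&hv, px, 2, cx, 2) == 0);
	assert(hv.npx == 2 && hv.px[1].core_mhz == 1600);
	assert(hv.ncx == 2 && hv.cx_latency_us[1] == 80);
	assert(dom0_cpufreq_inhibited);
}
```

The open question in the thread is step 3: how dom0 can make that "hands off" guarantee hold even when the scaling drivers are modules that load later.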
Konrad Rzeszutek Wilk
2012-Mar-13 16:46 UTC
Re: Is: drivers/cpufreq/cpufreq-xen.c Was:Re: [PATCH 2 of 2] linux-xencommons: Load processor-passthru
On Wed, Mar 07, 2012 at 09:12:00PM -0500, Konrad Rzeszutek Wilk wrote:
> > > I think what you are suggesting is to write a driver in drivers/cpufreq/
> > > that either gets started before the other ones (if built-in) or, if built as
> > > a module, gets loaded from xencommons. That driver would then make the calls
> > > to acpi_processor_preregister_performance(),
> > > acpi_processor_register_performance() and acpi_processor_notify_smm().
> > > It would function as a cpufreq scaling driver, but
> > > in reality all calls to it from the cpufreq gov-* drivers would end up as nops.
> > >
> > > Dave, would you be OK with a driver like that in your tree?
> >
> > I joined this thread half-way through, so I'm not sure what the original problem was.
> > How is a driver that does nothing better than just masking out the cpufreq capabilities to guests?
>
> Hey Dave,
>
> The problem statement is three-fold:
>  1). Parse and upload ACPI0007 (or PROCESSOR_TYPE) information to the
>      hypervisor - aka P-states.
>  2). Upload the Cx state information.
>  3). Inhibit CPU frequency scaling drivers from loading.

.. snip..

> So my big question is whether there could be a 'cpufreq.off=1' API, similar
> to the disable_cpuidle() call that inhibits the cpuidle drivers?

Which would look like this (compile tested, but not extensively):

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 1236623..1ba8dff 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -10,6 +10,7 @@
 #include <linux/pm.h>
 #include <linux/memblock.h>
 #include <linux/cpuidle.h>
+#include <linux/cpufreq.h>
 
 #include <asm/elf.h>
 #include <asm/vdso.h>
@@ -420,6 +421,7 @@ void __init xen_arch_setup(void)
 	boot_cpu_data.hlt_works_ok = 1;
 #endif
 	disable_cpuidle();
+	disable_cpufreq();
 	WARN_ON(set_pm_idle_to_default());
 	fiddle_vdso();
 }
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 622013f..7f2f149 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -126,6 +126,15 @@ static int __init init_cpufreq_transition_notifier_list(void)
 }
 pure_initcall(init_cpufreq_transition_notifier_list);
 
+static int off __read_mostly;
+int cpufreq_disabled(void)
+{
+	return off;
+}
+void disable_cpufreq(void)
+{
+	off = 1;
+}
 static LIST_HEAD(cpufreq_governor_list);
 static DEFINE_MUTEX(cpufreq_governor_mutex);
 
@@ -1441,6 +1450,9 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
 {
 	int retval = -EINVAL;
 
+	if (cpufreq_disabled())
+		return -ENODEV;
+
 	pr_debug("target for CPU %u: %u kHz, relation %u\n", policy->cpu,
 			target_freq, relation);
 	if (cpu_online(policy->cpu) && cpufreq_driver->target)
@@ -1549,6 +1561,9 @@ int cpufreq_register_governor(struct cpufreq_governor *governor)
 	if (!governor)
 		return -EINVAL;
 
+	if (cpufreq_disabled())
+		return -ENODEV;
+
 	mutex_lock(&cpufreq_governor_mutex);
 
 	err = -EBUSY;
@@ -1572,6 +1587,9 @@ void cpufreq_unregister_governor(struct cpufreq_governor *governor)
 	if (!governor)
 		return;
 
+	if (cpufreq_disabled())
+		return;
+
 #ifdef CONFIG_HOTPLUG_CPU
 	for_each_present_cpu(cpu) {
 		if (cpu_online(cpu))
@@ -1814,6 +1832,9 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data)
 	unsigned long flags;
 	int ret;
 
+	if (cpufreq_disabled())
+		return -ENODEV;
+
 	if (!driver_data || !driver_data->verify || !driver_data->init ||
 			((!driver_data->setpolicy) && (!driver_data->target)))
 		return -EINVAL;
@@ -1901,6 +1922,9 @@ static int __init cpufreq_core_init(void)
 {
 	int cpu;
 
+	if (cpufreq_disabled())
+		return -ENODEV;
+
 	for_each_possible_cpu(cpu) {
 		per_cpu(cpufreq_policy_cpu, cpu) = -1;
 		init_rwsem(&per_cpu(cpu_policy_rwsem, cpu));
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 6216115..8ff4427 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -35,6 +35,7 @@
 #ifdef CONFIG_CPU_FREQ
 int cpufreq_register_notifier(struct notifier_block *nb, unsigned int list);
 int cpufreq_unregister_notifier(struct notifier_block *nb, unsigned int list);
+extern void disable_cpufreq(void);
 #else		/* CONFIG_CPU_FREQ */
 static inline int cpufreq_register_notifier(struct notifier_block *nb,
 						unsigned int list)
@@ -46,6 +47,7 @@ static inline int cpufreq_unregister_notifier(struct notifier_block *nb,
 {
 	return 0;
 }
+static inline void disable_cpufreq(void) { }
 #endif		/* CONFIG_CPU_FREQ */
 
 /* if (cpufreq_driver->target) exists, the ->governor decides what frequency
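The behaviour the patch adds can be checked outside the kernel with a small userspace model. cpufreq_disabled() and disable_cpufreq() mirror the patch (minus __read_mostly). gated_register_driver(), gated_register_governor() and demo_xen_dom0() are hypothetical stand-ins reduced to the new early exits, keeping the same check order as the patch: the driver path tests the flag before validating its argument, while the governor path validates first. Because the flag is set from xen_arch_setup() during early boot, it should already be in place before any cpufreq driver tries to register, whether that driver is built-in or a module loaded later.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mirrors the flag and helpers added to drivers/cpufreq/cpufreq.c. */
static int off;

int cpufreq_disabled(void)
{
	return off;
}

void disable_cpufreq(void)
{
	off = 1;
}

/* Hypothetical stand-ins for cpufreq_register_driver() and
 * cpufreq_register_governor(): the new early exit plus a one-driver slot. */
static const char *active_driver;
static int governors;

int gated_register_driver(const char *name)
{
	if (cpufreq_disabled())
		return -ENODEV;		/* checked first, as in the patch */
	if (name == NULL)
		return -EINVAL;
	if (active_driver != NULL)
		return -EBUSY;
	active_driver = name;
	return 0;
}

int gated_register_governor(const char *name)
{
	if (name == NULL)
		return -EINVAL;		/* argument check comes first here */
	if (cpufreq_disabled())
		return -ENODEV;
	governors++;
	return 0;
}

/* Usage: dom0 on Xen calls disable_cpufreq() early, so every later
 * registration attempt is refused. */
void demo_xen_dom0(void)
{
	disable_cpufreq();
	assert(gated_register_driver("acpi-cpufreq") == -ENODEV);
	assert(gated_register_governor("ondemand") == -ENODEV);
	assert(active_driver == NULL && governors == 0);
}
```

On a kernel that never calls disable_cpufreq(), the flag stays 0 and both paths behave as before, which is why the header only needs an empty inline stub for !CONFIG_CPU_FREQ.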