Mark Langsdorf
2007-Aug-29 22:02 UTC
[Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
Enable cpufreq support in Xen for AMD Opteron processors by:

1) Allowing the PowerNow! driver in dom0 to write to the PowerNow! MSRs.
2) Adding the cpufreq notifier chain to time-xen.c in dom0. On a frequency change, a platform hypercall is performed to scale the frequency multiplier in the hypervisor.
3) Adding a platform hypercall to the hypervisor to scale the frequency multiplier and reset the time stamps so that the next calibration remains reasonably correct.

Patch 1 covers the frequency scaling platform call. Patch 2 covers the changes necessary to the PowerNow! driver to make it correctly associate shared cores under Xen and to write to MSRs.

This code can be readily expanded to cover Intel or other non-AMD processors by modifying xen/arch/x86/traps.c to allow the appropriate MSR accesses.

Caveat: currently, this code does not support the in-kernel ondemand cpufreq governor. Dom0 must run a userspace daemon to monitor the utilization of the physical cpus with the getcpuinfo sysctl hypercall.

Caveat 2: on SMP systems, dom0_vcpus_pin is strongly advised.

Caveat 3: Even though the clock multipliers are being scaled and recorded correctly in both dom0 and the hypervisor, time errors appear immediately after a frequency change. They are no more likely than before while the frequency stays constant.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>

diff -r 05c22f282023 arch/i386/kernel/time-xen.c
--- a/arch/i386/kernel/time-xen.c	Tue Aug 14 16:20:55 2007 +0100
+++ b/arch/i386/kernel/time-xen.c	Tue Aug 28 14:55:24 2007 -0500
@@ -50,6 +50,7 @@
 #include <linux/percpu.h>
 #include <linux/kernel_stat.h>
 #include <linux/posix-timers.h>
+#include <linux/cpufreq.h>
 
 #include <asm/io.h>
 #include <asm/smp.h>
@@ -1118,6 +1119,65 @@ void local_teardown_timer(unsigned int c
 	BUG_ON(cpu == 0);
 	unbind_from_irqhandler(per_cpu(timer_irq, cpu), NULL);
 }
+#endif
+
+#if CONFIG_CPU_FREQ
+/*
+ * cpufreq scaling handling
+ */
+static int time_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
+				void *data)
+{
+	struct cpufreq_freqs *freq = data;
+	struct vcpu_time_info *info = &vcpu_info(freq->cpu)->time;
+	struct xen_platform_op op;
+	cpumask_t oldmask;
+	unsigned int cpu;
+
+	if (cpu_has(&cpu_data[freq->cpu], X86_FEATURE_CONSTANT_TSC))
+		return 0;
+
+	if (val == CPUFREQ_PRECHANGE)
+		return 0;
+
+	/* change the frequency inside the hypervisor */
+	oldmask = current->cpus_allowed;
+	set_cpus_allowed(current, cpumask_of_cpu(freq->cpu));
+	schedule();
+	op.cmd = XENPF_change_freq;
+	op.u.change_freq.info = info;
+	op.u.change_freq.old = freq->old;
+	op.u.change_freq.new = freq->new;
+	op.u.change_freq.cpu_num = freq->cpu;
+	HYPERVISOR_platform_op(&op);
+
+	for_each_online_cpu(cpu) {
+		get_time_values_from_xen(cpu);
+		per_cpu(processed_system_time, cpu) =
+			per_cpu(shadow_time, cpu).system_timestamp;
+	}
+
+	set_cpus_allowed(current, oldmask);
+	schedule();
+
+	return 0;
+}
+
+static struct notifier_block time_cpufreq_notifier_block = {
+	.notifier_call = time_cpufreq_notifier
+};
+
+static int __init cpufreq_time_setup(void)
+{
+	if (!cpufreq_register_notifier(&time_cpufreq_notifier_block,
+			CPUFREQ_TRANSITION_NOTIFIER)) {
+		printk(KERN_ERR "failed to set up cpufreq notifier\n");
+		return -ENODEV;
+	}
+	return 0;
+}
+
+core_initcall(cpufreq_time_setup);
 #endif
 
 /*
diff -r 05c22f282023 include/xen/interface/platform.h
--- a/include/xen/interface/platform.h	Tue Aug 14 16:20:55 2007 +0100
+++ b/include/xen/interface/platform.h	Tue Aug 28 14:55:24 2007 -0500
@@ -153,6 +153,17 @@ typedef struct xenpf_firmware_info xenpf
 typedef struct xenpf_firmware_info xenpf_firmware_info_t;
 DEFINE_XEN_GUEST_HANDLE(xenpf_firmware_info_t);
 
+#define XENPF_change_freq 52
+struct xenpf_change_freq {
+    /* IN variables */
+    struct vcpu_time_info *info; /* vcpu time info for changing vcpu */
+    uint32_t old; /* original frequency */
+    uint32_t new; /* new frequency */
+    uint32_t cpu_num;
+};
+typedef struct xenpf_change_freq xenpf_change_freq_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_change_freq_t);
+
 #define XENPF_enter_acpi_sleep 51
 struct xenpf_enter_acpi_sleep {
     /* IN variables */
@@ -175,6 +186,7 @@ struct xen_platform_op {
         struct xenpf_microcode_update microcode;
         struct xenpf_platform_quirk platform_quirk;
         struct xenpf_firmware_info firmware_info;
+        struct xenpf_change_freq change_freq;
         struct xenpf_enter_acpi_sleep enter_acpi_sleep;
         uint8_t pad[128];
     } u;
diff -r 256160ff19b7 xen/include/xen/time.h
--- a/xen/include/xen/time.h	Thu Aug 16 13:27:59 2007 +0100
+++ b/xen/include/xen/time.h	Wed Aug 29 17:10:06 2007 -0500
@@ -74,6 +74,8 @@ extern void do_settime(
 extern void do_settime(
     unsigned long secs, unsigned long nsecs, u64 system_time_base);
 
+extern void do_change_freq(struct vcpu_time_info *info, unsigned int old, unsigned int new, int cpu_num);
+
 extern void send_timer_event(struct vcpu *v);
 
 #endif /* __XEN_TIME_H__ */

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
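Editorial sketch: the hypervisor side of XENPF_change_freq is not included in this posting, so the following is only a guess at what "scaling the frequency multiplier" could look like, not the patch's do_change_freq(). It assumes the usual Xen convention that elapsed nanoseconds are computed as ((tsc_delta << shift) * mul) >> 32, and the helper names rescale_tsc_mul and tsc_delta_to_ns are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, NOT the patch's do_change_freq(): derive the
 * multiplier for the new frequency from the old one. Halving the clock
 * doubles the nanoseconds per tick, so mul scales by old/new.
 */
static uint32_t rescale_tsc_mul(uint32_t mul, uint32_t old_khz, uint32_t new_khz)
{
    assert(old_khz != 0 && new_khz != 0);
    uint64_t scaled = ((uint64_t)mul * old_khz) / new_khz;
    /* A real implementation would adjust the shift instead of clamping. */
    return scaled > UINT32_MAX ? UINT32_MAX : (uint32_t)scaled;
}

/*
 * Convert a TSC delta to nanoseconds with a 32.32 fixed-point multiplier.
 * Assumes the delta is small enough that the 64-bit product cannot overflow.
 */
static uint64_t tsc_delta_to_ns(uint64_t delta, uint32_t mul, int shift)
{
    if (shift >= 0)
        delta <<= shift;
    else
        delta >>= -shift;
    return (delta * mul) >> 32;
}
```

With mul = 0x40000000 and shift = 1 (0.5 ns per tick, i.e. 2 GHz), 2000 ticks convert to 1000 ns. After rescaling for a drop to 1 GHz, 1000 ticks also convert to 1000 ns, which is the property the notifier is trying to preserve across a transition.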
Tian, Kevin
2007-Aug-30 06:41 UTC
RE: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
Hi, Mark,

Some comments here:

a) Current approach is simple to let Dom0 conduct frequency change. That should be OK in the start, but at the same time we should also consider the on-demand governor within Xen itself. Xen can always get first-hand data about domain status, while dom0 (either user-level or in-kernel) can't achieve in time. Fine-grained frequency change is more likely to be achieved within Xen directly.

b) Did you miss some part of patch? I didn't see place within Xen to handle new platform hypercall. Also please don't mix Linux and Xen change altogether in one patch.

c) I took a look at your previous version. It seemed that you need do some change to Xen's calibration code. The calibration happens once per second on local processor. Say [start,end] of calibration period is [t0, t2], and frequency change happens at [t1] and Xen is notified with that event at [t1']. Here we get several problematic window:
	t1 < t < t1': dom0 still uses old scale while TSC frequency already changes
	t1' < t < t2: dom0 uses right scale matching TSC change
	t2: Xen runs its calibration timer while this period is with mixed frequency and Xen will get a new frequency [new'] something between [old, new]. Such mismatch may make dom0 misinterpret elapsed TSC offset.

So I think one thing you can try is to stop calibration timer at t1', change scale, and then restart calibration timer again. But the mismatch between [t1, t1'] is difficult to be solved unless in-xen governor is used. :-)

d) How about adding a 'cpufreq' boot option? Once it's on, dom0_vcpus_pin is forced to on too. Or else it really doesn't make sense to let dom0 conduct frequency change.

Thanks,
Kevin

>From: Mark Langsdorf
>Sent: August 30, 2007 6:03
>
>Enable cpufreq support in Xen for AMD Opteron processors by:
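Editorial sketch of point c): if the clock runs at the old frequency from t0 to t1 and at the new frequency from t1 to t2, a calibration pass over [t0, t2] measures a time-weighted average of the two, which is the [new'] value "between [old, new]" described above. The function name and the millisecond units are assumptions made for illustration; this is not Xen's calibration code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: average TSC frequency (kHz) observed over the
 * calibration period [t0, t2] when the frequency switches from old_khz
 * to new_khz at t1. All times are in milliseconds.
 */
static uint32_t mixed_calibration_khz(uint32_t old_khz, uint32_t new_khz,
                                      uint32_t t0_ms, uint32_t t1_ms,
                                      uint32_t t2_ms)
{
    assert(t0_ms <= t1_ms && t1_ms <= t2_ms && t0_ms < t2_ms);
    /* kHz * ms = number of TSC ticks in each sub-interval */
    uint64_t ticks = (uint64_t)old_khz * (t1_ms - t0_ms)
                   + (uint64_t)new_khz * (t2_ms - t1_ms);
    return (uint32_t)(ticks / (t2_ms - t0_ms));
}
```

For a one-second period with a switch from 2 GHz to 1 GHz at the midpoint, the result is 1500000 kHz, which matches neither the old nor the new frequency. Restarting the calibration timer at t1' shrinks the mixed window to [t1, t1'] but does not remove it.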
Keir Fraser
2007-Aug-30 09:30 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
On 30/8/07 07:41, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> a) Current approach is simple to let Dom0 conduct frequency
> change. That should be OK in the start, but at the same time we
> should also consider the on-demand governor within Xen itself.
> Xen can always get first-hand data about domain status, while
> dom0 (either user-level or in-kernel) can't achieve in time.
> Fine-grained frequency change is more likely to be achieved within
> Xen directly.

Personally I'm a fan of doing it in dom0 userspace, although doing it within Xen can also be argued for. Doing it in dom0 kernel doesn't seem very attractive apart from the obvious pragmatic advantage that all the code is already in the Linux kernel. :-)

If we're doing it in the Linux kernel, I don't see much point in hacking up the defunct powernow (or equivalent Intel) code. Why not fix the generic acpi-cpufreq.c? That is supposed to work on any modern CPU. I'm not sure the 2.6.18 version is new enough, but I'd rather see a backported and fixed version of that file, rather than bother to maintain modified versions of obsolete source files.

 -- Keir
Tian, Kevin
2007-Aug-30 09:45 UTC
RE: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
>From: Keir Fraser [mailto:keir@xensource.com]
>Sent: August 30, 2007 17:31
>
>On 30/8/07 07:41, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> a) Current approach is simple to let Dom0 conduct frequency
>> change. That should be OK in the start, but at the same time we
>> should also consider the on-demand governor within Xen itself.
>> Xen can always get first-hand data about domain status, while
>> dom0 (either user-level or in-kernel) can't achieve in time.
>> Fine-grained frequency change is more likely to be achieved within
>> Xen directly.
>
>Personally I'm a fan of doing it in dom0 userspace, although doing it
>within Xen can also be argued for. Doing it in dom0 kernel doesn't seem
>very attractive apart from the obvious pragmatic advantage that all the
>code is already in the Linux kernel. :-)

Sure, some experiment can be done later to compare dom0 userspace and in-xen governor. Agree that in-kernel dom0 approach is not charming because anyway it needs help from either user-level or xen for global view.

Actually finally we may take both. Xen takes simple heuristic policy with user-level governor to adjust with more complex and flexible policies based on domain behaviors. :-)

>If we're doing it in the Linux kernel, I don't see much point in hacking
>up the defunct powernow (or equivalent Intel) code. Why not fix the
>generic acpi-cpufreq.c? That is supposed to work on any modern CPU. I'm
>not sure the 2.6.18 version is new enough, but I'd rather see a
>backported and fixed version of that file, rather than bother to
>maintain modified versions of obsolete source files.

Yep.

Thanks,
Kevin
Keir Fraser
2007-Aug-30 10:12 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
On 30/8/07 10:45, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> Sure, some experiment can be done later to compare dom0 userspace
> and in-xen governor. Agree that in-kernel dom0 approach is not
> charming because anyway it needs help from either user-level or xen
> for global view.
>
> Actually finally we may take both. Xen takes simple heuristic policy
> with user-level governor to adjust with more complex and flexible
> policies based on domain behaviors. :-)

There is the problem, though, that most modern CPUs will need cpufreq info to be parsed out of the ACPI DSDT. And Xen can't do that itself unaided. Pushing down some form of cpufreq info table to Xen *is* an option though, but we'd need more custom dom0 kernel code to do that. Or we'd need to do it from a userspace program.

 -- Keir
Langsdorf, Mark
2007-Aug-30 14:45 UTC
RE: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
> > a) Current approach is simple to let Dom0 conduct frequency
> > change. That should be OK in the start, but at the same time we
> > should also consider the on-demand governor within Xen itself.
> > Xen can always get first-hand data about domain status, while
> > dom0 (either user-level or in-kernel) can't achieve in time.
> > Fine-grained frequency change is more likely to be achieved within
> > Xen directly.
>
> Personally I'm a fan of doing it in dom0 userspace, although
> doing it within Xen can also be argued for. Doing it in dom0
> kernel doesn't seem very attractive apart from the obvious
> pragmatic advantage that all the code is
> already in the Linux kernel. :-)

The advantage to doing it in the dom0 kernel is that the distributions have just switched from doing it in userspace, and thus have all their tools set up to do it in the kernel.

To me, it makes more sense to simplify the user interface, so that a native mode machine and a virtual machine use the same tools. The end user shouldn't need to learn cpuspeed when running power management on a virtual machine host if the same computer uses ondemand when running a native mode kernel.

> If we're doing it in the Linux kernel, I don't see much point
> in hacking up the defunct powernow (or equivalent Intel) code.
> Why not fix the generic acpi-cpufreq.c? That is supposed to
> work on any modern CPU. I'm not sure the 2.6.18 version is
> new enough, but I'd rather see a backported and fixed
> version of that file, rather than bother to maintain modified
> versions of obsolete source files.

powernow-k8 and the Intel SpeedStep equivalents are being maintained in preference to acpi-cpufreq. I don't think the code is obsolete or defunct.

-Mark Langsdorf
Operating System Research Center
AMD
Langsdorf, Mark
2007-Aug-30 14:57 UTC
RE: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
> a) Current approach is simple to let Dom0 conduct frequency
> change. That should be OK in the start, but at the same time we
> should also consider the on-demand governor within Xen itself.

Supporting cpufreq in Xen requires the ability to parse ACPI or a new dom0 driver to pass ACPI data to Xen, as well as a mechanism for setting policy. Since dom0 already has both of those, I really think making it work in dom0 is simplest.

> b) Did you miss some part of patch? I didn't see place within Xen
> to handle new platform hypercall. Also please don't mix Linux and
> Xen change altogether in one patch.

I thought I had everything, but I'll repost as four patches to be sure.

> c) I took a look at your previous version. It seemed that you need do
> some change to Xen's calibration code. The calibration happens once
> per second on local processor. Say [start,end] of calibration period is
> [t0, t2], and frequency change happens at [t1] and Xen is notified with
> that event at [t1']. Here we get several problematic window:
>	t1 < t < t1': dom0 still uses old scale while TSC frequency already
> changes
>	t1' < t < t2: dom0 uses right scale matching TSC change
>	t2: Xen runs its calibration timer while this period is with mixed
> frequency and Xen will get a new frequency [new'] something between
> [old, new]. Such mismatch may make dom0 misinterpret elapsed TSC
> offset.
>
> So I think one thing you can try is to stop calibration timer at t1',
> change scale, and then restart calibration timer again. But
> the mismatch between [t1, t1'] is difficult to be solved unless in-xen
> governor is used. :-)

Xen accepts some error in the time-keeping code. Changing at t' seems to reduce the error to acceptable margins.

> d) How about adding a 'cpufreq' boot option? Once it's on,
> dom0_vcpus_pin is forced to on too. Or else it really doesn't make
> sense to let dom0 conduct frequency change.

Good idea. I'll look into adding that, but it's a minor fix compared to the main code.

-Mark Langsdorf
Operating System Research Center
AMD
Rik van Riel
2007-Aug-30 14:59 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
Keir Fraser wrote:

> Personally I'm a fan of doing it in dom0 userspace, although doing it within
> Xen can also be argued for. Doing it in dom0 kernel doesn't seem very
> attractive apart from the obvious pragmatic advantage that all the code is
> already in the Linux kernel. :-)

Code duplication is bad. It is the reason why Xen will (hopefully) go away in the long run. Please do not propagate this horrible idea that all code should be copied around and have obsolete versions maintained forever.

The dom0 kernel is where the code already lives, so that code should be used.

-- 
Politics is the struggle between those who want to make their country the best in the world, and those who believe it already is. Each group calls the other unpatriotic.
Keir Fraser
2007-Aug-30 15:04 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
On 30/8/07 15:45, "Langsdorf, Mark" <mark.langsdorf@amd.com> wrote:

> The advantage to doing it in the dom0 kernel is that the
> distributions have just switched from doing it in userspace,
> and thus have all their tools set up to do it in the kernel.
>
> To me, it makes more sense to simplify the user interface,
> so that a native mode machine and a virtual machine use the
> same tools. The end user shouldn't need to learn cpuspeed
> when running power management on a virtual machine host if
> the same computer uses ondemand when running a native mode
> kernel.

It's a misleading simplification. For example, the ondemand governor will build and run in a dom0 kernel but it's not actually going to do the right thing, as it doesn't observe whole-machine load. So, in fact, it's probably going to close down all CPUs thinking they are idle and hence shaft system performance. Furthermore it is very common to run a dom0 with fewer VCPUs than PCPUs: using dom0 kernel as the control point for cpufreq disallows this.

> powernow-k8 and the Intel SpeedStep equivalents are being
> maintained in preference to acpi-cpufreq. I don't think
> the code is obsolete or defunct.

I'm sure I've seen lkml posts to the contrary but I haven't been able to dig any up. You are the powernow-k8 maintainer so I guess it's your code anyhow. :-) But I've seen acpi-cpufreq getting beefed up with MSR support quite recently, and most boxes support ACPI P states, so I'm surprised there wouldn't be convergence on a single driver.

For your Xen changes, the MSR whitelisting should be conditional on actually being on an AMD box, and also should be conditional on opt_dom0_vcpus_pin. Actually a new config option 'cpufreq=dom0-kernel' wouldn't be a bad idea, and that could then imply dom0_vcpus_pin. Absolutely no good can come of letting dom0 mess with cpufreq MSRs while its VCPUs can be migrated across cores. Also I'm not sure why you make access to the MSRs conditional on the guest not being compat (32-on-64)?

 -- Keir
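Editorial sketch of the safety catches requested above. Everything here is hypothetical: the struct, the function names, and the 'cpufreq=dom0-kernel' option do not exist in the posted patch. Only the two MSR numbers are taken from the K8 FID/VID register definitions used by Linux's powernow-k8 driver. The sketch shows the option implying dom0_vcpus_pin and the MSR whitelist applying only to a pinned dom0 on an AMD box.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* K8 PowerNow! FID/VID registers, as named in Linux's powernow-k8 driver. */
#define MSR_FIDVID_CTL    0xc0010041u
#define MSR_FIDVID_STATUS 0xc0010042u

/* Hypothetical view of the relevant boot-time state. */
struct cpufreq_boot_state {
    bool cpufreq_dom0_kernel; /* assumed 'cpufreq=dom0-kernel' boot option */
    bool dom0_vcpus_pin;      /* dom0 VCPUs pinned to physical CPUs */
    bool vendor_is_amd;
};

/* The cpufreq option implies pinning, as suggested in the thread. */
static void apply_cpufreq_option(struct cpufreq_boot_state *s)
{
    assert(s);
    if (s->cpufreq_dom0_kernel)
        s->dom0_vcpus_pin = true;
}

/* Whitelist check: only a pinned dom0 on AMD may touch the FID/VID MSRs. */
static bool dom0_may_access_msr(const struct cpufreq_boot_state *s,
                                bool is_dom0, uint32_t msr)
{
    assert(s);
    if (!is_dom0 || !s->cpufreq_dom0_kernel)
        return false;
    if (!s->vendor_is_amd || !s->dom0_vcpus_pin)
        return false;
    return msr == MSR_FIDVID_CTL || msr == MSR_FIDVID_STATUS;
}
```

Keeping the decision in one predicate means the trap handler would have a single place to consult, and extending the whitelist to another vendor would mean adding a vendor-specific MSR list rather than scattering checks.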
Keir Fraser
2007-Aug-30 15:08 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
On 30/8/07 15:57, "Langsdorf, Mark" <mark.langsdorf@amd.com> wrote:

>> d) How about adding a 'cpufreq' boot option? Once it's on,
>> dom0_vcpus_pin is forced to on too. Or else it really doesn't make
>> sense to let dom0 conduct frequency change.
>
> Good idea. I'll look into adding that, but it's a minor fix
> compared to the main code.

'Minor' in that dom0 will likely end up changing frequency of a different CPU than it expects? You have the core code done, but it's not really usable without some safety catches.

 -- Keir
Rik van Riel
2007-Aug-30 18:23 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
Keir Fraser wrote:

> It's a misleading simplification. For example, the ondemand governor will
> build and run in a dom0 kernel but it's not actually going to do the right
> thing, as it doesn't observe whole-machine load.

Here is the missing piece of the puzzle. A platform hypercall operation to get system wide idle time.

I believe Mark's changes, together with this little patch, are the way we can get cpufreq working on Xen with the minimal amount of code duplication.

Duplicating code anywhere, whether it be inside the hypervisor or in some Xen-only userland package, will only lead to bit rot and make Xen maintenance more painful.

Signed-off-by: Rik van Riel <riel@redhat.com>
Rik van Riel
2007-Aug-30 20:56 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
Langsdorf, Mark wrote:

>> Here is the missing piece of the puzzle. A platform
>> hypercall operation to get system wide idle time.
>>
>> I believe Mark's changes, together with this little
>> patch, are the way we can get cpufreq working on
>> Xen with the minimal amount of code duplication.
>>
>> Duplicating code anywhere, whether it be inside the
>> hypervisor or in some Xen-only userland package, will
>> only lead to bit rot and make Xen maintenance more
>> painful.
>
> This code looks like it returns the amount of time
> spent in the running runstate, not the idle time.
> Am I completely missing something?

It is the time that the CPU's _idle domain_ has spent in RUNSTATE_running. This corresponds to the idle time on each physical CPU.

> For reference, the ondemand governor calculates
> idle time as the sum of cpustat.idle and cpustat.iowait.
> I'd think the equivalent would be the sum of
> RUNSTATE_runnable and RUNSTATE_blocked.

The hypervisor has no concept of iowait on a physical CPU basis. It only tracks domain VCPUs and the idle domain VCPUs. Since the idle domain VCPUs never migrate between CPUs, they reflect physical CPU idle time.
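Editorial sketch: Rik's hypercall patch is not reproduced in this archive, so the following only illustrates how a governor might turn the idle VCPU's cumulative RUNSTATE_running time into a load figure for one physical CPU. The function name and the nanosecond units are assumptions; the caller is assumed to have subtracted two successive snapshots already.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: load of one physical CPU in percent (0..100),
 * given the wall-clock time between two samples and the growth of the
 * idle VCPU's running time over the same interval.
 */
static unsigned int pcpu_load_percent(uint64_t wall_delta_ns,
                                      uint64_t idle_delta_ns)
{
    assert(wall_delta_ns > 0);
    /* A stale or racing snapshot can report more idle time than wall time. */
    if (idle_delta_ns > wall_delta_ns)
        idle_delta_ns = wall_delta_ns;
    return (unsigned int)(100 * (wall_delta_ns - idle_delta_ns) / wall_delta_ns);
}
```

A CPU whose idle VCPU ran for 250 us of a 1 ms sampling window comes out at 75 percent load. The clamp matters in practice if the snapshot for a remote CPU is only refreshed on context switch, since the idle and wall deltas can then be taken at slightly different instants.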
Tian, Kevin
2007-Aug-31 01:20 UTC
RE: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
>From: Keir Fraser [mailto:keir@xensource.com]
>Sent: August 30, 2007 18:12
>
>On 30/8/07 10:45, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> Sure, some experiment can be done later to compare dom0 userspace
>> and in-xen governor. Agree that in-kernel dom0 approach is not
>> charming because anyway it needs help from either user-level or xen
>> for global view.
>>
>> Actually finally we may take both. Xen takes simple heuristic policy
>> with user-level governor to adjust with more complex and flexible
>> policies based on domain behaviors. :-)
>
>There is the problem, though, that most modern CPUs will need cpufreq
>info to be parsed out of the ACPI DSDT. And Xen can't do that itself
>unaided. Pushing down some form of cpufreq info table to Xen *is* an
>option though, but we'd need more custom dom0 kernel code to do that.
>Or we'd need to do it from a userspace program.
>
> -- Keir

Yes, that information has to be parsed and notified to Xen. Some PV logic needs to be hooked into cpufreq as you said. Userspace program is a good choice too, like the user level ACPI interpreter. However the major block is to parse dynamically changed information. For example, cpufreq info may change (due to some hardware events) and OSPM is required to re-evaluate P-state info. That re-evaluation may get different info after checking some hardware bits. To do it in user level, firstly we need emulate an event and then need virtualize those bits. Not very easy to track. So I more agree with you that an user-level stuff is the way to go, at least for the start.

Thanks,
Kevin
Tian, Kevin
2007-Aug-31 02:42 UTC
RE: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
>From: Rik van Riel [mailto:riel@redhat.com]
>Sent: August 30, 2007 22:59
>
>Keir Fraser wrote:
>
>> Personally I'm a fan of doing it in dom0 userspace, although doing it
>> within Xen can also be argued for. Doing it in dom0 kernel doesn't seem
>> very attractive apart from the obvious pragmatic advantage that all the
>> code is already in the Linux kernel. :-)
>
>Code duplication is bad. It is the reason why Xen
>will (hopefully) go away in the long run. Please do
>not propagate this horrible idea that all code should
>be copied around and have obsolete versions maintained
>forever.
>
>The dom0 kernel is where the code already lives, so
>that code should be used.

My several points on this:

a) We shouldn't eliminate all possibilities in the start, and all experiments need to be done to see whether effort is worth for best power saving.

b) Code duplication is definitely bad. But if finally xen-based governor is proved to be with best power saving cap, why not? Actually it's not that horrible as you said to copy all code. Xen just needs generic freq info and method to conduct freq change. All the parse and compatibility issue are still taken by dom0 with a new PV interface to report result to Xen.

c) Your patch in another mail is a good one to support on-demand driver within dom0. But there're several variants compared to native environment:
	* Idle time is not accurate since:
		- idle vcpu may still run on other processors and then run_state is not updated
		- the idle snapshot may change before returning to instruction after hypercall
	* latency information is inaccurate. On native, the latency basically reflects the time consumed on MSR write and de-facto freq change accurately. However on this case, the time between (dom0 issues WRMSR hypercall) and (Xen finally WRMSR) is even likely larger than sample ratio of on-demand driver, if vcpu switch happens in the middle. On-demand driver is fine-grained one which only works with transition latency <= 10ms. I believe that's a tuned value and it can't be always satisfied in virtualization environment. What does such variable 'latency' make effect to final power saving of on-demand governor? Need data to see.

d) I guess final power saving of cpufreq (either approach) is not obvious, since average CPU utilization should be higher than native which is the goal of virtualization. C-state may be more interesting.

Thanks,
Kevin
Tian, Kevin
2007-Aug-31 02:43 UTC
RE: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
>From: Rik van Riel [mailto:riel@redhat.com]
>Sent: August 31, 2007 2:23
>
>Keir Fraser wrote:
>
>> It's a misleading simplification. For example, the ondemand governor
>> will build and run in a dom0 kernel but it's not actually going to do
>> the right thing, as it doesn't observe whole-machine load.
>
>Here is the missing piece of the puzzle. A platform
>hypercall operation to get system wide idle time.
>
>I believe Mark's changes, together with this little
>patch, are the way we can get cpufreq working on
>Xen with the minimal amount of code duplication.
>
>Duplicating code anywhere, whether it be inside the
>hypervisor or in some Xen-only userland package, will
>only lead to bit rot and make Xen maintenance more
>painful.
>
>Signed-off-by: Rik van Riel <riel@redhat.com>

The run_state info is not accurate for other vcpus not on current processor, since it's only updated when switching out.

Thanks,
Kevin
Jan Beulich
2007-Aug-31 08:41 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
>Here is the missing piece of the puzzle. A platform
>hypercall operation to get system wide idle time.

Two things I'm not clear about here:
- How is the caller going to be able to associate the logical CPU numbers returned with vCPU numbers?
- How is the caller supposed to deal with logical CPUs it has no vCPU for?

I continue to think that it is not reasonable to expect vCPUs to be pinned just for the purpose of doing frequency control (and even then I don't think there's a guaranteed association between vCPU and logical CPU numbers), as much as I don't think it is a good idea to require dom0 to have as many vCPUs as there are logical CPUs (which isn't even possible on systems with >32 logical CPUs).

Jan
Keir Fraser
2007-Aug-31 09:23 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
On 31/8/07 03:42, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> d) I guess final power saving of cpufreq (either approach) is not obvious,
> since average CPU utilization should be higher than native which is the
> goal of virtualization. C-state may be more interesting.

Yes! I would love to see some C-state support in Xen, both for normal idle-loop execution and, as further work, deeper sleeps for hot-unplugged CPUs (which can be under control of management/performance tools in dom0). In the now prevalent multi-core environments, I'll be surprised if it's not better to deep-sleep whole cores rather than run them all at continually varying half speeds. And, simultaneously with making C-states a viable power-saving model, I think multi-core makes it harder to decide what the 'right' per-cpu cpu frequency changes should be.

 -- Keir
Keir Fraser
2007-Aug-31 10:04 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
On 31/8/07 02:20, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> Yes, that information has to be parsed and notified to Xen. Some PV
> logic needs to be hooked into cpufreq as you said. Userspace program
> is a good choice too, like the user level ACPI interpreter. However the
> major block is to parse dynamically changed information. For example,
> cpufreq info may change (due to some hardware events) and OSPM
> is required to re-evaluate P-state info. That re-evaluation may get
> different info after checking some hardware bits. To do it in user level,
> firstly we need emulate an event and then need virtualize those bits.
> Not very easy to track. So I more agree with you that an user-level
> stuff is the way to go, at least for the start.

Not sure what you're saying here. It seems to be 'getting async acpi events in user space would be hard, so doing this stuff in user space is the way to go'. The argument and conclusion seem at odds with one another. Unfortunately I have to reluctantly agree with the argument: at least, it would be some work to provide an interface to allow user processes to access the ACPI event mechanism. I'm not sure how much. :-)

Personally I'd concentrate on picking out C-state control from ACPI, pass that info down to Xen, and hook that into the idle-loop and cpu-hotplug paths. Much easier, no async acpi events to worry about, and probably as good or better power saving where you have more than a couple of cores.

 -- Keir
Rik van Riel
2007-Aug-31 13:50 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
Tian, Kevin wrote:

> b) Code duplication is definitely bad. But if finally xen-based governor is
> proved to be with best power saving cap, why not?

Because the larger the hypervisor is, the less practical it becomes to maintain. The current Xen hypervisor already has bugs in its copied-from-Linux code that were fixed in Linux after the code was copied. A small hypervisor is nice, but Xen is painfully large to maintain.

> d) I guess final power saving of cpufreq (either approach) is not obvious,
> since average CPU utilization should be higher than native which is the
> goal of virtualization. C-state may be more interesting.

This makes a lot of sense. C-state makes a big impact on power usage and can be implemented inside the idle loop relatively easily.

--
Politics is the struggle between those who want to make their country the best in the world, and those who believe it already is. Each group calls the other unpatriotic.
Tian, Kevin
2007-Aug-31 15:09 UTC
RE: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
>From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
>Sent: 31 August 2007 18:05
>
>On 31/8/07 02:20, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> Yes, that information has to be parsed and notified to Xen. Some PV
>> logic needs to be hooked into cpufreq as you said. Userspace program
>> is a good choice too, like the user level ACPI interpreter. However the
>> major block is to parse dynamically changed information. For example,
>> cpufreq info may change (due to some hardware events) and OSPM
>> is required to re-evaluate P-state info. That re-evaluation may get
>> different info after checking some hardware bits. To do it in user level,
>> firstly we need emulate an event and then need virtualize those bits.
>> Not very easy to track. So I more agree with you that an user-level
>> stuff is the way to go, at least for the start.
>
>Not sure what you're saying here. It seems to be 'getting async acpi events
>in user space would be hard, so doing this stuff in user space is the way to
>go'. The argument and conclusion seem at odds with one another.
>Unfortunately I have to reluctantly agree with the argument: at least, it
>would be some work to provide an interface to allow user processes to access
>the ACPI event mechanism. I'm not sure how much. :-)

Oh, my fault here. The argument is, as you said, that it's a bit hard, and the conclusion is really that a user-level 'governor' is easier. :-P

>Personally I'd concentrate on picking out C-state control from ACPI, pass
>that info down to Xen, and hook that into the idle-loop and cpu-hotplug
>paths. Much easier, no async acpi events to worry about, and probably as
>good or better power saving where you have more than a couple of cores.

Agree, and some of my colleagues currently works on that part (More specifically, Winston Wang and Ke Yu). But one caveat is that C-state is still likely to be changed dynamically due to hardware event, like AC power change on mobile or some monitor logic on server. But for the first step we can just tweak the kernel parser and pass the result to Xen. They should get something experimental out once they see a measurable power saving, like:

* C-state policy in idle loop
* Tick-less Xen at idle, so that deep C-states are not interrupted too early
* TSC/APIC freeze at deeper C-states

Also the cpu-hotplug path like you mentioned here.

Thanks,
Kevin
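To make the 'C-state policy in idle loop' item a little more concrete, here is a minimal sketch in C. Everything in it is an assumption for illustration: the struct cstate layout, the latency figures in the usage note, and the rule itself (take the deepest state whose exit latency fits inside the predicted idle period). It is not the Xen implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one C-state, e.g. as dom0 might pass it down
 * to Xen after parsing _CST. Field names are illustrative only. */
struct cstate {
    unsigned int type;         /* 1 = C1, 2 = C2, ... */
    uint32_t exit_latency_us;  /* worst-case wake-up latency */
};

/*
 * Pick the deepest state whose exit latency is shorter than the predicted
 * idle time. 'states' must hold at least one entry and is assumed to be
 * sorted from shallowest to deepest. Falls back to the first entry (C1)
 * when no deeper state fits.
 */
const struct cstate *pick_cstate(const struct cstate *states, size_t n,
                                 uint32_t predicted_idle_us)
{
    const struct cstate *best = &states[0];
    for (size_t i = 1; i < n; i++) {
        if (states[i].exit_latency_us < predicted_idle_us)
            best = &states[i];
    }
    return best;
}
```

With a made-up table of C1 (1 us), C2 (20 us) and C3 (100 us), a predicted idle period of 50 us would select C2. This is also why a tickless idle matters: a periodic tick caps the predicted idle period and keeps the policy out of the deeper states.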
Keir Fraser
2007-Aug-31 15:25 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
On 31/8/07 16:09, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> Agree, and some of my colleagues currently works on that part (More
> specifically, Winston Wang and Ke Yu). But one caveat is that C-state
> is still likely to be changed dynamically due to hardware event, like AC
> power change on mobile or some monitor logic on server.

Does this affect the available C states, or the current C state of a CPU? I suppose it's the former that matters, since an attempt to deep-sleep may start to fail, which would lead to increased power consumption?

Xen should be very easy indeed to make tickless when idle. In fact the PIT handler can be disabled on most systems - we just haven't bothered to implement that simple bit of logic yet.

-- Keir
Tian, Kevin
2007-Sep-01 00:23 UTC
RE: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
>From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
>Sent: 31 August 2007 23:26
>
>On 31/8/07 16:09, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> Agree, and some of my colleagues currently works on that part (More
>> specifically, Winston Wang and Ke Yu). But one caveat is that C-state
>> is still likely to be changed dynamically due to hardware event, like AC
>> power change on mobile or some monitor logic on server.
>
>Does this affect the available C states, or the current C state of a CPU? I
>suppose it's the former that matters, since an attempt to deep-sleep may
>start to fail, which would lead to increased power consumption?

Just affect the available C-states and Xen re-evaluates to get new list and thus new decision. Actually that event brings Xen back to C0 out of any C1...Cn. When Xen starts to handle that event, the CPU is always in C0 as a running state. :-)

>Xen should be very easy indeed to make tickless when idle. In fact the PIT
>handler can be disabled on most systems - we just haven't bothered to
>implement that simple bit of logic yet.

Yes, it's much simpler than on the Linux side and will bring an obvious power saving due to the longer C-state residency period.

Thanks,
Kevin
Keir Fraser
2007-Sep-01 11:07 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
On 1/9/07 01:23, "Tian, Kevin" <kevin.tian@intel.com> wrote:

>> Does this affect the available C states, or the current C state of a CPU? I
>> suppose it's the former that matters, since an attempt to deep-sleep may
>> start to fail, which would lead to increased power consumption?
>
> Just affect the available C-states and Xen re-evaluates to get new list
> and thus new decision. Actually that event brings Xen back to C0 out
> of any C1...Cn. When Xen starts to handle that event, the CPU is always
> in C0 as a running state. :-)

There's only one 'ACPI interrupt line' though, and presumably the re-eval happens in dom0 somewhere, sometime later, and possibly on a different CPU from the one that took the interrupt. What happens if you try and enter a C state that is no longer available?

-- Keir
Tian, Kevin
2007-Sep-01 13:31 UTC
RE: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
>From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
>Sent: 1 September 2007 19:07
>
>On 1/9/07 01:23, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>>> Does this affect the available C states, or the current C state of a CPU? I
>>> suppose it's the former that matters, since an attempt to deep-sleep may
>>> start to fail, which would lead to increased power consumption?
>>
>> Just affect the available C-states and Xen re-evaluates to get new list
>> and thus new decision. Actually that event brings Xen back to C0 out
>> of any C1...Cn. When Xen starts to handle that event, the CPU is always
>> in C0 as a running state. :-)
>
>There's only one 'ACPI interrupt line' though, and presumably the re-eval
>happens in dom0 somewhere, sometime later, and possibly on a different CPU
>from the one that took the interrupt. What happens if you try and enter a C
>state that is no longer available?
>
> -- Keir

Yes, this is even true on native. SCI happens on one CPU with another CPU is decided to enter some unavailable C-state at the point. I guess hardware should tolerate such invalid request like taking it as a no-op or choosing a closest one. Software can anyway issue an invalid request to break report from hardware...

Thanks,
Kevin
Keir Fraser
2007-Sep-01 13:57 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
On 1/9/07 14:31, "Tian, Kevin" <kevin.tian@intel.com> wrote:

>> There's only one 'ACPI interrupt line' though, and presumably the re-eval
>> happens in dom0 somewhere, sometime later, and possibly on a different CPU
>> from the one that took the interrupt. What happens if you try and enter a C
>> state that is no longer available?
>>
>> -- Keir
>
> Yes, this is even true on native. SCI happens on one CPU with another
> CPU is decided to enter some unavailable C-state at the point. I guess
> hardware should tolerate such invalid request like taking it as a no-op
> or choosing a closest one. Software can anyway issue an invalid request
> to break report from hardware...

While I'm hassling you about C states: are there any examples in real system of available C states changing dynamically? I mean, it makes sense to me that available power/frequency combinations may change in response to something like AC power being removed (this may make higher power/freq options unavailable). But why would available C states change? I thought these states were implemented internally on the CPU, and so are either universally available or not, and I don't see any physical explanation for why available deep-sleep options would be affected by e.g., battery vs. AC operation.

-- Keir
Keir Fraser
2007-Sep-01 14:12 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
On 1/9/07 14:31, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> Yes, this is even true on native. SCI happens on one CPU with another
> CPU is decided to enter some unavailable C-state at the point. I guess
> hardware should tolerate such invalid request like taking it as a no-op
> or choosing a closest one. Software can anyway issue an invalid request
> to break report from hardware...

If this is the case, that nothing entirely terrible happens when you try to enter an invalid C state, then it might not be critical to rely on ACPI's event-reporting mechanisms to collect new C-state information. A dom0 user daemon could re-evaluate the ACPI objects every 5-10s for example, which would have negligible cost. I reckon such a simple scheme would work well, unless updated objects can be supplied as part of the ACPI event mechanism, and you have to find and evaluate those? In that case I suppose some ACPI event info would need to be propagated for it to see the modified C-state info.

-- Keir
Tian, Kevin
2007-Sep-01 14:14 UTC
RE: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
>From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
>Sent: 1 September 2007 21:58
>
>While I'm hassling you about C states: are there any examples in real system
>of available C states changing dynamically? I mean, it makes sense to me
>that available power/frequency combinations may change in response to
>something like AC power being removed (this may make higher power/freq
>options unavailable). But why would available C states change? I thought
>these states were implemented internally on the CPU, and so are either
>universally available or not, and I don't see any physical explanation for
>why available deep-sleep options would be affected by e.g., battery vs. AC
>operation.
>
> -- Keir

I have to say that I didn't think about it carefully before. Following is the description of _CST from the ACPI spec (8.4.2.1):

    The platform may change the number or type of C States available for
    OSPM use dynamically by issuing a Notify events on the processor object
    with a notification value of 0x81. This will cause OSPM to re-evaluate
    any _CST object residing under the processor object notified. For
    example, the platform might notify OSPM that the number of supported
    C States has changed as a result of an asynchronous AC insertion /
    removal event.

Maybe some hardware people know the tricks inside. :-(

Thanks,
Kevin
Tian, Kevin
2007-Sep-01 14:18 UTC
RE: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
>From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
>Sent: 1 September 2007 22:13
>
>On 1/9/07 14:31, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> Yes, this is even true on native. SCI happens on one CPU with another
>> CPU is decided to enter some unavailable C-state at the point. I guess
>> hardware should tolerate such invalid request like taking it as a no-op
>> or choosing a closest one. Software can anyway issue an invalid request
>> to break report from hardware...
>
>If this is the case, that nothing entirely terrible happens when you try to
>enter an invalid C state, then it might not be critical to rely on ACPI's
>event-reporting mechanisms to collect new C-state information. A dom0 user
>daemon could re-evaluate the ACPI objects every 5-10s for example, which
>would have negligible cost. I reckon such a simple scheme would work
>well, unless updated objects can be supplied as part of the ACPI event
>mechanism, and you have to find and evaluate those? In that case I suppose
>some ACPI event info would need to be propagated for it to see the modified
>C-state info.
>
> -- Keir

Yes, ACPI defines standard method _CST to export C-state information and a notification to that object is forced (as part of ACPI event like a GPE) at some hardware status change. Then OSPM can re-evaluate _CST. Such event should be rare, and thus a periodical poll is not necessary.

Thanks
Kevin
Tian, Kevin
2007-Sep-01 14:22 UTC
RE: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
>From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
>Sent: 1 September 2007 21:58
>
>options unavailable). But why would available C states change? I thought
>these states were implemented internally on the CPU, and so are either
>universally available or not, and I don't see any physical explanation for

Normally C1 (halt) is implemented internally on the CPU, while the rest go to the chipset, which may then stop the CPU clock or do other heavy power-saving work...

Thanks,
Kevin
Keir Fraser
2007-Sep-01 15:26 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
On 1/9/07 15:18, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> Yes, ACPI defines standard method _CST to export C-state information
> and a notification to that object is forced (as part of ACPI event like a
> GPE) at some hardware status change. Then OSPM can re-evaluate
> _CST. Such event should be rare, and thus a periodical poll is not
> necessary.

I was just wondering about doing it from dom0 userland. I've dug into it a bit now, and it looks like the right answer is to hook into the 'acpid' daemon. This can be configured to give notifications for power-management changes.

This would work fine unless the \_GPE.xxx methods redefined the _CST objects, causing userspace's value for the object (taken from static parsing of DSDT/SSDT) to be out of date. That seems not very likely though, to say the least; I'm just thinking through the scope here for 'self-modifying' AML. :-)

-- Keir
Tian, Kevin
2007-Sep-01 15:45 UTC
RE: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
>From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
>Sent: 1 September 2007 23:27
>
>On 1/9/07 15:18, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> Yes, ACPI defines standard method _CST to export C-state information
>> and a notification to that object is forced (as part of ACPI event like a
>> GPE) at some hardware status change. Then OSPM can re-evaluate
>> _CST. Such event should be rare, and thus a periodical poll is not
>> necessary.
>
>I was just wondering about doing it from dom0 userland. I've dug into it a
>bit now, and it looks like the right answer is to hook into the 'acpid'
>daemon. This can be configured to give notifications for power-management
>changes.

Event notification is OK, but the problem is how to retrieve the information from user space. Firstly the user application should be allowed to map the ACPI-related memory area (the interface exists today). Then that user application may set up the ACPI namespace and retrieve C-state information. However, whether it can work depends on low-level implementations. For example, one implementation can be:

- AC power change (let's still take this example though the relationship isn't clear yet. :-P)
- SCI is triggered with a GPE xxx
- Related _Lxxx method firstly sets a local flag from '0' to '1' and then notifies the Processor node
- OSPM then re-evaluates the _CST method
- _CST method may be defined as something like:
      If local_flag equals to 0
          Return _CST1
      Else
          Return _CST2

In this case, the local flag change happens on the stack of dom0's kernel (local variables in ACPI are not allocated in the global ACPI data area). Then even when the user application receives the notification, it still observes _CST1 instead of _CST2. To make it work, we have to emulate a similar flow to force the local variable on the stack of the user application to change properly. However that flow is implementation specific, and some hardware bits may be touched in the middle, which means more emulation of the underlying platform.

Maybe I'm too nervous and actual implementation may be very simple. But ACPI does allow above condition.

>This would work fine unless the \_GPE.xxx methods redefined the _CST
>objects, causing userspace's value for the object (taken from static parsing
>of DSDT/SSDT) to be out of date. That seems not very likely though, to say
>the least; I'm just thinking through the scope here for 'self-modifying'
>AML. :-)

But that's possible, right? Just like one change you did before, to 'self-modify' processor information from the guest BIOS. If the dynamically changed _CST requirement does exist, it's easy to provide two versions of _CST and then copy one to some address dynamically. I do see one implementation that specifies the _CST address in some reserved region, instead of including it in the ACPI table explicitly. I think that gives room for a variable _CST implementation.

Thanks,
Kevin
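The stale-_CST scenario described above can be modelled in a few lines of C. This is only a toy model under stated assumptions: struct aml_ctx, gpe_handler() and eval_cst() are hypothetical names standing in for one AML interpreter instance, the _Lxxx method and the _CST method, and the integer return value stands in for the _CST1/_CST2 packages. No real ACPI interpreter API is used.

```c
#include <stdbool.h>

/* Hypothetical private state of one AML interpreter instance. The dom0
 * kernel has one; a user-space interpreter that builds its own namespace
 * from the static tables would have a separate one. */
struct aml_ctx {
    bool flag; /* set by the GPE method, read by _CST */
};

/* Models the _Lxxx method: record that the hardware event happened. */
void gpe_handler(struct aml_ctx *ctx)
{
    ctx->flag = true;
}

/* Models _CST: returns 1 for the _CST1 package, 2 for the _CST2 package. */
int eval_cst(const struct aml_ctx *ctx)
{
    return ctx->flag ? 2 : 1;
}
```

If only the kernel's context runs gpe_handler(), eval_cst() on that context yields 2, while a freshly initialised user-space context still yields 1. That is the mismatch in question: forwarding the notification alone does not carry the interpreter state that the GPE method changed.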
Keir Fraser
2007-Sep-01 16:41 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
On 1/9/07 16:45, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> Maybe I'm too nervous and actual implementation may be very simple.
> But ACPI does allow above condition.

Yes, I hadn't thought of the possibility of local variables being modified in the \_GPE.xxx method. Yuk.

There are other problems relying on the kernel to notify us. For a start, an unpinned dom0 kernel is likely to get very confused about CPU APIC IDs and hence map processor objects incorrectly onto physical cpus, and quite possibly fail to register some processor objects entirely. That whole area is a mess without vcpu pinning (not something we want to rely on). This also means that having C-states controlled from dom0 kernel (no userland program at all) has similar limitation.

Can we rely on from-scratch evaluation of DSDT and SSDTs to get us up-to-date C-state info? Perhaps we could trigger that via infrequent polling or some suitable low-level event (e.g., SCI count going up in /proc/interrupts)? Or could we be confident that evaluating the appropriate \_GPE.xxx object would be idempotent and hence safe to do from dom0 userspace? Then we could always evaluate it before _CST.

It's also possible that we should implement something simple, and then complicate it just as much as we need to based on testing on real systems. :-) Otherwise we tie ourselves in knots for cases that may not exist. So I'd be happy to start with one-shot static C-state determination from dom0 userspace. That can always be disabled if it causes trouble on some systems, and be incrementally improved.

-- Keir
Tian, Kevin
2007-Sep-03 04:25 UTC
RE: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
>From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
>Sent: 2 September 2007 0:42
>
>On 1/9/07 16:45, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> Maybe I'm too nervous and actual implementation may be very simple.
>> But ACPI does allow above condition.
>
>Yes, I hadn't thought of the possibility of local variables being modified
>in the \_GPE.xxx method. Yuk.
>
>There are other problems relying on the kernel to notify us. For a start, an
>unpinned dom0 kernel is likely to get very confused about CPU APIC IDs and
>hence map processor objects incorrectly onto physical cpus, and quite
>possibly fail to register some processor objects entirely. That whole area
>is a mess without vcpu pinning (not something we want to rely on). This also
>means that having C-states controlled from dom0 kernel (no userland program
>at all) has similar limitation.

Yes, dom0 may not derive the same information as on native if it lacks correct knowledge of the environment, unless we change the dom0 logic.

>Can we rely on from-scratch evaluation of DSDT and SSDTs to get us
>up-to-date C-state info? Perhaps we could trigger that via infrequent
>polling or some suitable low-level event (e.g., SCI count going up in
>/proc/interrupts)? Or could we be confident that evaluating the appropriate
>\_GPE.xxx object would be idempotent and hence safe to do from dom0
>userspace? Then we could always evaluate it before _CST.

I don't think so. ACPI content is BIOS/OEM specific, and we can't make assumptions here unless we analyze the tables and list only supported BIOSes/platforms (definitely not what we want).

>It's also possible that we should implement something simple, and then
>complicate it just as much as we need to based on testing on real systems.
>:-) Otherwise we tie ourselves in knots for cases that may not exist. So I'd
>be happy to start with one-shot static C-state determination from dom0
>userspace. That can always be disabled if it causes trouble on some systems,
>and be incrementally improved.

Agree with this suggestion. Actually I really can't come up with a reason why the available C-states would change, especially on a server platform. :-)

Thanks,
Kevin
Rik van Riel
2007-Sep-04 17:23 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
Keir Fraser wrote:

> I was just wondering about doing it from dom0 userland. I've dug into it a
> bit now, and it looks like the right answer is to hook into the 'acpid'
> daemon.

Acpid is on its way out. Rumor has it that acpid no longer even works with some (very) recent kernels.
xeb
2007-Oct-01 08:30 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
Hello!

I have done some tests and it works well. Frequencies switch successfully from userspace. But there are clock glitches on both dom0 and domU.

# while true; do date; done;
....
12:10:01
12:10:01
12:10:01
12:10:02
12:10:02
12:10:02
12:10:02
<here freq was changed>
12:10:01
12:10:02
12:10:03
12:10:03
12:10:03
.....

--
View this message in context: http://www.nabble.com/-PATCH--1-2%3A-cpufreq-PowerNow%21-in-Xen%3A-Time-and-platform-changes-tf4350705.html#a12974843
Sent from the Xen - Dev mailing list archive at Nabble.com.
Keir Fraser
2007-Oct-01 08:33 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
On 1/10/07 09:30, "xeb" <xeb@mail.ru> wrote:

> Hello!
> I have did some tests and it works well. Frequencies successfully switches
> from userspace.
> But there is clock glitches as on dom0 as on domU.

Should be fixed by xen-unstable changeset 15982:b3814860d170 (currently in the staging tree).

-- Keir
xeb
2007-Oct-02 12:56 UTC
Re: [Xen-devel] [PATCH] 1/2: cpufreq/PowerNow! in Xen: Time and platform changes
Keir Fraser wrote:

> On 1/10/07 09:30, "xeb" <xeb@mail.ru> wrote:
>
>> Hello!
>> I have did some tests and it works well. Frequencies successfully
>> switches from userspace.
>> But there is clock glitches as on dom0 as on domU.
>
> Should be fixed by xen-unstable changeset 15982:b3814860d170 (currently in
> the staging tree).
>
> -- Keir

Clock glitches still occur. The kernel reports the following messages:

Timer ISR/1: Time went backwards: delta=-312990340 delta_cpu=863676209 shadow=55606869012 off=866802928 processed=56786660988 cpu_processed=55609994439
 0: 56783327655
 1: 55609994439
Timer ISR/1: Time went backwards: delta=-316323654 delta_cpu=3676314 shadow=55606869012 off=870136739 processed=56793327654 cpu_processed=56473327686
 0: 56789994321
 1: 56473327686
Timer ISR/1: Time went backwards: delta=-319657223 delta_cpu=3676078 shadow=55606869012 off=873469685 processed=56799994320 cpu_processed=56476661019
 0: 56799994320
 1: 56476661019
Timer ISR/1: Time went backwards: delta=-321045750 delta_cpu=5620884 shadow=55606869012 off=878747355 processed=56806660986 cpu_processed=56479994352
 0: 56806660986
 1: 56479994352
Timer ISR/1: Time went backwards: delta=-321049645 delta_cpu=5616989 shadow=55606869012 off=882077351 processed=56809994319 cpu_processed=56483327685
 0: 56809994319
 1: 56483327685

--
View this message in context: http://www.nabble.com/-PATCH--1-2%3A-cpufreq-PowerNow%21-in-Xen%3A-Time-and-platform-changes-tf4350705.html#a12998267
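For readers trying to interpret the log fields, the following C sketch recomputes the two deltas from the other values on the first line. The relationship used (current time = shadow + off, then subtract the already-processed time) is an assumed reading of the message, and struct timer_sample with its helper functions are hypothetical names, not the code in time-xen.c.

```c
#include <stdint.h>

/* Fields named after the log line. All values are in nanoseconds. */
struct timer_sample {
    int64_t shadow;        /* system timestamp of the last Xen time snapshot */
    int64_t off;           /* time elapsed since that snapshot */
    int64_t processed;     /* system-wide time already accounted for */
    int64_t cpu_processed; /* time already accounted for on this CPU */
};

/* Assumed: time not yet accounted for system-wide. Negative means the
 * CPU's view of "now" lies behind what has already been processed. */
int64_t delta_ns(const struct timer_sample *s)
{
    return s->shadow + s->off - s->processed;
}

/* Assumed: time not yet accounted for on this CPU. */
int64_t delta_cpu_ns(const struct timer_sample *s)
{
    return s->shadow + s->off - s->cpu_processed;
}
```

Feeding in the first log line (shadow=55606869012, off=866802928, processed=56786660988, cpu_processed=55609994439) gives -312989048 and 863677501. Both are 1292 ns away from the logged delta and delta_cpu, which would fit `off` being sampled a second time, slightly later, for the printout; that explanation is a guess. On this reading, CPU 1 computes a current time roughly 0.31 s behind the time the system has already processed.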