Hi,

The following patchset adds processor P state power management support to Xen x86.

Basic Description:
=============
Cpufreq is a fundamental feature of processor power management. On x86 it is defined as processor P states in the ACPI spec, and it is supported by most x86 processors. Linux supports cpufreq-related features through the ACPI P state lib, cpufreq drivers, and cpufreq policies. This patchset intends to add this feature to Xen.

Basically, the following things need to be done to support P states:
- get P state info from the ACPI table;
- set up the cpufreq infrastructure in the hypervisor;
- set up the cpufreq ondemand policy in the hypervisor.

Since the dom0 kernel already provides ACPI CA and perflib, the first task is done in the dom0 kernel, which transfers the Px-related info to the hypervisor via hypercall. The second task, setting up the cpufreq infrastructure, is done in the hypervisor and includes the cpufreq drivers and the policy data structure. The third task, setting up the cpufreq policy, is also done in the hypervisor on top of that infrastructure. Its goal is to manage processor P states according to processor domain dependency and workload, keeping a balance between processor performance and power consumption.

Currently the Px patch lets the user choose how processor P states are controlled: by the new hypervisor control model, by adding the "cpufreq=xen-cpufreq" option to the Xen grub cmdline, or by the original dom0 kernel control model, by adding the "cpufreq=dom0-kernel" option.

In this version, the Px algorithm is ported from the Linux cpufreq ondemand policy. Other features, such as a user governor and power-aware scheduling, are planned next.
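As an illustration of the ondemand idea described above, here is a minimal sketch in Python. It is not the code from this patchset: the function name, the UP_THRESHOLD value, and the frequency table are assumptions made for the example, and the logic is only loosely modelled on the Linux ondemand policy (jump to the top frequency under high load, otherwise drop to the lowest frequency that still covers the load).

```python
# Illustrative sketch only, not the patchset's implementation.
# UP_THRESHOLD and all names here are assumptions for the example.

UP_THRESHOLD = 80  # percent busy above which the top frequency is selected


def ondemand_target(freqs_khz, cur_khz, busy_ticks, total_ticks):
    """Pick the next frequency from the load seen in the last sampling window."""
    if total_ticks <= 0:
        return cur_khz  # no sample yet, keep the current P state
    load = 100 * busy_ticks // total_ticks
    if load > UP_THRESHOLD:
        return max(freqs_khz)  # busy: go straight to the highest frequency
    # Frequency at which the same work would sit exactly at the threshold.
    needed_khz = cur_khz * load // UP_THRESHOLD
    # Lowest available frequency at or above that value.
    candidates = [f for f in freqs_khz if f >= needed_khz]
    return min(candidates) if candidates else max(freqs_khz)
```

With a hypothetical table of 1.6, 2.0 and 2.4 GHz and the CPU currently at 2.4 GHz, a 90% busy window keeps the top frequency, while a 20% busy window selects 1.6 GHz.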
Patch Description:
============
This patchset is based on cset xen-linux-535/xen-staging-17602

[Patch 1/5] [dom0] Fix a bug related to parse named objects
[Patch 2/5] [dom0] Basic framework of getting and notifying Px info
[Patch 3/5] [xen] Get ACPI Px from dom0 and choose Px controller
[Patch 4/5] [xen] Setup cpufreq infrastructure, driver and tools
[Patch 5/5] [xen] Implement cpufreq ondemand policy

Special notes on get_measured_perf()
===========================
get_measured_perf(unsigned int cpu) measures the processor's average frequency over a period of time using the IA32_MPERF and IA32_APERF MSRs. Currently this function only measures the average frequency of the processor it is running on. We plan to extend it to handle queries for non-current CPUs later.

-------------------------------------------------------
Thanks,
Jinsong

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
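The APERF/MPERF measurement mentioned in the notes on get_measured_perf() can be sketched as follows. This is a Python model of the arithmetic only, under stated assumptions: the function names are hypothetical, the counter values are passed in rather than read from hardware, and the reference frequency is taken to be the maximum frequency. IA32_MPERF counts at a fixed reference rate and IA32_APERF counts at the actual rate, so the ratio of their deltas scales the reference frequency to the average actual frequency. Both MSRs are per-CPU and must be read on the CPU being measured, which is why querying a non-current CPU needs extra work.

```python
# Arithmetic model only; names are hypothetical and no MSR is actually read.

MASK64 = (1 << 64) - 1  # both counters are 64 bits wide


def counter_delta(start, end):
    """Difference between two 64-bit counter samples, tolerating one wraparound."""
    return (end - start) & MASK64


def measured_freq_khz(max_khz, aperf_start, aperf_end, mperf_start, mperf_end):
    """Average frequency over the sampling window, in kHz."""
    d_aperf = counter_delta(aperf_start, aperf_end)
    d_mperf = counter_delta(mperf_start, mperf_end)
    if d_mperf == 0:
        return 0  # no reference ticks elapsed, nothing to report
    return max_khz * d_aperf // d_mperf
```

For example, if APERF advanced by half as much as MPERF during the window, a 2.4 GHz part averaged 1.2 GHz.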
>>> "Liu, Jinsong" <jinsong.liu@intel.com> 13.05.08 05:09 >>>
> The following patchset is to add processor P state power management
> support to Xen X86.

Is it really necessary to further expand the code duplication in Xen (instead of re-using the already existing code in dom0)? I was under the impression that the latter had been the plan for as much as possible of both Cx and Px handling...

Jan
>From: Jan Beulich [mailto:jbeulich@novell.com]
>Sent: May 13, 2008 15:57
>
>Is it really necessary to further expand the code duplication in Xen (instead
>of re-using the already existing code in dom0)? I was under the impression
>that the latter had been the plan for as much as possible of both Cx and Px
>handling...
>
>Jan

We see at least two reasons why Xen needs to control frequency changes directly, as discussed before:

a) Dom0 is itself a guest, and may even be scheduled out between the point where it makes a decision and the point where Xen traps the request to actually update the related MSRs. Some special measures are also required to ensure that its ondemand governor is triggered within the expected period. As the whole system scales up, this is likely to become more inaccurate. Doing it in Xen allows fine-grained checks on the real processors, so that hardware fast-switch technology can be fully utilized. It also allows more governor experiments later, such as incorporating virtualization-specific information.

b) It removes the requirement that the dom0 vcpu topology has to match the physical one with affinity pinned.

Thanks,
Kevin
Langsdorf, Mark
2008-May-15 21:55 UTC
RE: [Xen-devel][PATCH 0/5] Add cpufreq pwr mgmt to Xen
> The following patchset is to add processor P state power management
> support to Xen X86.

Is there any way to determine what frequencies the CPU is running at now? I think I have support for the AMD Architectural P-state driver working now, but it would be nice if there was a better way to test that it's working.

Thanks.

-Mark Langsdorf
Operating System Research Center
AMD
>From: Langsdorf, Mark [mailto:mark.langsdorf@amd.com]
>Sent: May 16, 2008 5:55
>
>> The following patchset is to add processor P state power management
>> support to Xen X86.
>
>Is there any way to determine what frequencies the CPU is running at now?
>I think I have support for the AMD Architectural P-state driver working now,
>but it would be nice if there was a better way to test that it's working.

Mark, the efficient way to check the current frequency is CPU specific. ACPI defines a way to read PERF_STATUS and compare it with the _PSS status field to confirm whether the transition succeeded. The corresponding MSR on Intel is IA32_PERF_STATUS, which the cpufreq driver reads and matches against the frequency table to get the actual value. When hardware coordination is active, however, that value doesn't reflect the actual frequency. That is why there is a new MSR pair (MPERF/APERF) on Intel processors to report the actual frequency. You can check get_cur_freq_on_cpu for all possible ways. I can't speak for AMD CPUs, but I guess similar MSRs may also exist in your case.

Thanks,
Kevin
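The PERF_STATUS check described in the reply above can be sketched like this. It is a Python model under stated assumptions, not the driver code: the function name, the dictionary layout, and the sample status values are hypothetical, and the raw MSR value is passed in instead of being read from hardware. On Intel the current performance state value sits in the low 16 bits of IA32_PERF_STATUS, and each _PSS entry carries a core frequency in MHz together with the status value expected for that state.

```python
# Lookup model only; names and sample values are hypothetical.

PERF_STATUS_MASK = 0xFFFF  # low 16 bits hold the current performance state value


def freq_from_perf_status(pss_table, perf_status_raw):
    """Return the _PSS core frequency (MHz) matching the status, or None."""
    status = perf_status_raw & PERF_STATUS_MASK
    for entry in pss_table:
        if entry["status"] == status:
            return entry["core_freq_mhz"]
    return None  # status not in the table, so the transition cannot be confirmed


# Hypothetical two-entry _PSS table for demonstration.
EXAMPLE_PSS = [
    {"core_freq_mhz": 2400, "status": 0x0920},
    {"core_freq_mhz": 1600, "status": 0x0613},
]
```

A driver comparing against the state it just requested would treat a None result, or a frequency other than the requested one, as a failed or still pending transition. As noted above, under hardware coordination this value may still differ from the frequency the core is really running at.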