Trying to understand whether CPU frequency scaling is actually working on a system currently requires (afaics) source patches, as there is no way to get the current state of a CPU. Even if this is intentional, this doesn't seem very helpful when considering making this functionality available to customers: I'm certain quite a few will ask how they can tell whether this is actually working.

Now, apart from the simple job of adding a sub-hypercall to retrieve the necessary bits, I'm wondering whether this wouldn't be just one more element that would much better be surfaced to the guest via the vCPU info structure (or, as that's size constrained, a new construct to make guest-read-only information available via a shared page). Other (potential) items to make available this same way would e.g. be guest-accessible last-exception-from/-to MSR values (as the values read would be meaningless if read through rdmsr).

So I'm basically considering adding a generic mechanism first, and then making cpufreq the first user of it. The question just is: use a completely new (guest-ro) per-vCPU page, perhaps with chained descriptors rather than a fixed layout, or extend the vCPU info structure, but e.g. require the guest to use VCPUOP_register_vcpu_info to gain access to all structure fields?

Thanks, Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

>From: Jan Beulich
>Sent: 8 September 2008 21:12
>
>Trying to understand whether CPU frequency scaling is actually working on a system currently requires (afaics) source patches, as there is no way to get the current state of a CPU. Even if this is intentional,

What do you mean by the current state of a CPU? If cpufreq is enabled, the user should be able to retrieve statistics through the sysctl path.

>this doesn't seem very helpful when considering making this functionality available to customers: I'm certain quite a few will ask how they can tell whether this is actually working.
>
>Now, apart from the simple job of adding a sub-hypercall to retrieve the necessary bits, I'm wondering whether this wouldn't be just one more element that would much better be surfaced to the guest via the vCPU info structure (or, as that's size constrained, a new construct to make guest-read-only information available via a shared page). Other (potential) items to make available this same way would e.g. be guest-accessible last-exception-from/-to MSR values (as the values read would be meaningless if read through rdmsr).

I don't quite understand. Cpufreq is physical CPU stuff; do you aim to expose physical information through a vCPU-specific shared page? That would impose a fixed relationship between the number of dom0 vCPUs and the physical CPUs, which is intentionally avoided in the current design. I may have got your intent wrong, though.

Thanks, Kevin

>>> "Tian, Kevin" <kevin.tian@intel.com> 08.09.08 15:22 >>>
>>From: Jan Beulich
>>Sent: 8 September 2008 21:12
>>
>>Trying to understand whether CPU frequency scaling is actually working on a system currently requires (afaics) source patches, as there is no way to get the current state of a CPU. Even if this is intentional,
>
>What do you mean by the current state of a CPU? If cpufreq is enabled, the user should be able to retrieve statistics through the sysctl path.

How? I can't see where the current frequency a CPU is running at is being exposed.

>>this doesn't seem very helpful when considering making this functionality available to customers: I'm certain quite a few will ask how they can tell whether this is actually working.
>>
>>Now, apart from the simple job of adding a sub-hypercall to retrieve the necessary bits, I'm wondering whether this wouldn't be just one more element that would much better be surfaced to the guest via the vCPU info structure (or, as that's size constrained, a new construct to make guest-read-only information available via a shared page). Other (potential) items to make available this same way would e.g. be guest-accessible last-exception-from/-to MSR values (as the values read would be meaningless if read through rdmsr).
>
>I don't quite understand. Cpufreq is physical CPU stuff; do you aim to expose physical information through a vCPU-specific shared page? That would impose a fixed relationship between the number of dom0 vCPUs and the physical CPUs, which is intentionally avoided in the current design.

The intent is to expose the frequency of the pCPU the particular vCPU is currently running on, perhaps only in Dom0.

Jan

On 8/9/08 14:12, "Jan Beulich" <jbeulich@novell.com> wrote:

> Trying to understand whether CPU frequency scaling is actually working on a system currently requires (afaics) source patches, as there is no way to get the current state of a CPU. Even if this is intentional, this doesn't seem very helpful when considering making this functionality available to customers: I'm certain quite a few will ask how they can tell whether this is actually working.

Well, they might notice that their battery lasts longer than an hour. :-)

> So I'm basically considering adding a generic mechanism first, and then making cpufreq the first user of it. The question just is: use a completely new (guest-ro) per-vCPU page, perhaps with chained descriptors rather than a fixed layout, or extend the vCPU info structure, but e.g. require the guest to use VCPUOP_register_vcpu_info to gain access to all structure fields?

I'm skeptical about throwing in more shared-memory structures between Xen and guests anyway. It doesn't even seem a good fit here, since this info is not of great interest to most guests in my opinion. I presume most users' curiosity will be satisfied by being able to poll CPU frequencies/voltages from dom0 via a hypercall.

 -- Keir

>From: Jan Beulich [mailto:jbeulich@novell.com]
>Sent: 8 September 2008 21:33
>>>
>>>Trying to understand whether CPU frequency scaling is actually working on a system currently requires (afaics) source patches, as there is no way to get the current state of a CPU. Even if this is intentional,
>>
>>What do you mean by the current state of a CPU? If cpufreq is enabled, the user should be able to retrieve statistics through the sysctl path.
>
>How? I can't see where the current frequency a CPU is running at is being exposed.

common/sysctl.c: XEN_SYSCTL_get_pmstat

>>>this doesn't seem very helpful when considering making this functionality available to customers: I'm certain quite a few will ask how they can tell whether this is actually working.
>>>
>>>Now, apart from the simple job of adding a sub-hypercall to retrieve the necessary bits, I'm wondering whether this wouldn't be just one more element that would much better be surfaced to the guest via the vCPU info structure (or, as that's size constrained, a new construct to make guest-read-only information available via a shared page). Other (potential) items to make available this same way would e.g. be guest-accessible last-exception-from/-to MSR values (as the values read would be meaningless if read through rdmsr).
>>
>>I don't quite understand. Cpufreq is physical CPU stuff; do you aim to expose physical information through a vCPU-specific shared page? That would impose a fixed relationship between the number of dom0 vCPUs and the physical CPUs, which is intentionally avoided in the current design.
>
>The intent is to expose the frequency of the pCPU the particular vCPU is currently running on, perhaps only in Dom0.

Then you have to pin each dom0 vCPU to its corresponding pCPU, and give dom0 as many vCPUs as there are pCPUs. I don't think such a limitation is necessary just for retrieving some pCPU information.

Or, if you still allow vCPU migration, you have to fake a virtual frequency-change notification within dom0 at each vCPU migration, as each pCPU may scale its own frequency individually.

Thanks, Kevin

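
Kevin's objection can be illustrated with a toy model. The sketch below is not Xen code: the dictionaries, the function name and the frequencies are all made up for illustration. It only shows that, without pinning, the value a vCPU reads can change at migration even though no pCPU changed its frequency.

```python
# Toy model only: these structures are hypothetical, not Xen data structures.

def observed_frequency(vcpu_to_pcpu, pcpu_freq_khz, vcpu):
    """Return the frequency of the pCPU that `vcpu` is currently placed on."""
    return pcpu_freq_khz[vcpu_to_pcpu[vcpu]]


# Two pCPUs that scale independently (made-up values, in kHz).
pcpu_freq_khz = {0: 2400000, 1: 1600000}

# vCPU 0 starts on pCPU 0.
placement = {0: 0}
before = observed_frequency(placement, pcpu_freq_khz, 0)

# The scheduler migrates vCPU 0 to pCPU 1; neither pCPU changed frequency.
placement[0] = 1
after = observed_frequency(placement, pcpu_freq_khz, 0)

print(before, after)
```

From inside the guest the jump from `before` to `after` looks like a frequency change, which is why either pinning or a faked notification would be needed to keep the guest's view consistent.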
On 8/9/08 14:50, "Tian, Kevin" <kevin.tian@intel.com> wrote:

>> The intent is to expose the frequency of the pCPU the particular vCPU is currently running on, perhaps only in Dom0.
>
> Then you have to pin each dom0 vCPU to its corresponding pCPU, and give dom0 as many vCPUs as there are pCPUs. I don't think such a limitation is necessary just for retrieving some pCPU information.
>
> Or, if you still allow vCPU migration, you have to fake a virtual frequency-change notification within dom0 at each vCPU migration, as each pCPU may scale its own frequency individually.

Greater virtualisation of cpufreq info (making it available to arbitrary guests, with pcpu->vcpu remapping) is just not very useful in my opinion. It's one of those features that more hacker-ish users might think they want to decorate their desktop.

After all, guest performance is at least as affected by CPU (and other resource) contention from other guests as it is by power-management governors in the hypervisor. Indeed, if the governors are doing their job right then guest performance should not be considerably impacted by them even in absolute terms.

 -- Keir

>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: 8 September 2008 22:01
>
>On 8/9/08 14:50, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>>> The intent is to expose the frequency of the pCPU the particular vCPU is currently running on, perhaps only in Dom0.
>>
>> Then you have to pin each dom0 vCPU to its corresponding pCPU, and give dom0 as many vCPUs as there are pCPUs. I don't think such a limitation is necessary just for retrieving some pCPU information.
>>
>> Or, if you still allow vCPU migration, you have to fake a virtual frequency-change notification within dom0 at each vCPU migration, as each pCPU may scale its own frequency individually.
>
>Greater virtualisation of cpufreq info (making it available to arbitrary guests, with pcpu->vcpu remapping) is just not very useful in my opinion. It's one of those features that more hacker-ish users might think they want to decorate their desktop.

Agreed.

>After all, guest performance is at least as affected by CPU (and other resource) contention from other guests as it is by power-management governors in the hypervisor. Indeed, if the governors are doing their job right then guest performance should not be considerably impacted by them even in absolute terms.

Exactly. :-)

Thanks, Kevin

>>> "Tian, Kevin" <kevin.tian@intel.com> 08.09.08 15:50 >>>
>>From: Jan Beulich [mailto:jbeulich@novell.com]
>>How? I can't see where the current frequency a CPU is running at is being exposed.
>
>common/sysctl.c: XEN_SYSCTL_get_pmstat

Ah, okay, I missed that. But I can't use this from the kernel anyway, and tools that track the frequency (e.g. KDE sysguard) would need to be modified in order to make use of it. I'd really prefer /proc/cpuinfo to correctly reflect this, at least in Dom0. And even beyond that, I can't seem to find any users of the APIs in tools/libxc/xc_pm.c, so these really appear to be dead stubs.

>Then you have to pin each dom0 vCPU to its corresponding pCPU, and give dom0 as many vCPUs as there are pCPUs. I don't think such a limitation is necessary just for retrieving some pCPU information.
>
>Or, if you still allow vCPU migration, you have to fake a virtual frequency-change notification within dom0 at each vCPU migration, as each pCPU may scale its own frequency individually.

Why? All I care about is a snapshot value. It doesn't matter whether it's stale by the time I get it; nothing is going to be calculated from it, apart from statistics.

Jan

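
For reference, a frequency-tracking tool of the kind Jan mentions could read the `cpu MHz` field that x86 Linux prints in /proc/cpuinfo. A minimal Python sketch follows; the function name and the sample input are hypothetical, and whether the value seen in Dom0 reflects the physical CPU is exactly what is being debated in this thread.

```python
def parse_cpu_mhz(cpuinfo_text):
    """Map each 'processor' index to its 'cpu MHz' value (x86 Linux layout)."""
    result = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processor blocks
        key = key.strip()
        value = value.strip()
        if key == "processor":
            current = int(value)
        elif key == "cpu MHz" and current is not None:
            result[current] = float(value)
    return result


# Made-up sample in the usual /proc/cpuinfo shape; not captured from a real box.
SAMPLE = (
    "processor\t: 0\n"
    "model name\t: Example CPU\n"
    "cpu MHz\t\t: 1600.000\n"
    "\n"
    "processor\t: 1\n"
    "model name\t: Example CPU\n"
    "cpu MHz\t\t: 2400.000\n"
)

frequencies = parse_cpu_mhz(SAMPLE)
for cpu, mhz in sorted(frequencies.items()):
    print(f"cpu{cpu}: {mhz:.0f} MHz")
```

On a live system the same function could be fed the contents of /proc/cpuinfo instead of `SAMPLE`.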
>>> Keir Fraser <keir.fraser@eu.citrix.com> 08.09.08 16:00 >>>
>After all, guest performance is at least as affected by CPU (and other resource) contention from other guests as it is by power-management governors in the hypervisor. Indeed, if the governors are doing their job right then guest performance should not be considerably impacted by them even in absolute terms.

Right, you say 'if' - what if not? How do I tell, especially when I can't touch the system and easily put a patched hypervisor and/or kernel on it? If we get a complaint from a customer that he thinks frequency scaling doesn't do what it's expected to, we'll need to have a simple mechanism at hand to determine what's going on with his box.

And even for development purposes I think a one-look proof that things work as expected is quite useful (I'm doing the same for a few other basic things, the interrupt rate being one of those that helped spot problems that otherwise would have gone unnoticed for a much longer period of time).

Jan

On 8/9/08 15:30, "Jan Beulich" <jbeulich@novell.com> wrote:

>>>> Keir Fraser <keir.fraser@eu.citrix.com> 08.09.08 16:00 >>>
>> After all, guest performance is at least as affected by CPU (and other resource) contention from other guests as it is by power-management governors in the hypervisor. Indeed, if the governors are doing their job right then guest performance should not be considerably impacted by them even in absolute terms.
>
> Right, you say 'if' - what if not? How do I tell, especially when I can't touch the system and easily put a patched hypervisor and/or kernel on it? If we get a complaint from a customer that he thinks frequency scaling doesn't do what it's expected to, we'll need to have a simple mechanism at hand to determine what's going on with his box.
>
> And even for development purposes I think a one-look proof that things work as expected is quite useful (I'm doing the same for a few other basic things, the interrupt rate being one of those that helped spot problems that otherwise would have gone unnoticed for a much longer period of time).

Okay, I'll grant you that this is a useful scenario. For this purpose the existing sysctl, plus a simple lashed-up dom0 userspace utility to dump the statistics, would be perfectly sufficient.

Another possibility would be to generate xentrace events when CPU frequencies change. Then, if you already buy into using xentrace to get accurate logs about what's happening and when (which we do for our own product development and maintenance), the CPU frequency events would be nicely integrated into that.

The advantage of the latter is that the frequency changes are interleaved with other stuff you care about, like what's being scheduled, what's on run queues etc., since obviously frequency information all by itself is not so interesting.

 -- Keir

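
The interleaving Keir describes amounts to merging time-ordered event streams into a single timeline. The Python sketch below works under stated assumptions: the tuple layout `(timestamp, source, description)` and the event texts are hypothetical and do not reflect the real xentrace record format.

```python
import heapq

# Hypothetical records: (timestamp, source, description). Real xentrace
# records use a different, binary layout; this only illustrates interleaving.
sched_events = [
    (100, "sched", "vcpu0 -> pcpu1"),
    (250, "sched", "vcpu1 -> pcpu0"),
    (400, "sched", "vcpu0 -> pcpu0"),
]
freq_events = [
    (180, "cpufreq", "pcpu1 2400 MHz -> 1600 MHz"),
    (390, "cpufreq", "pcpu0 1600 MHz -> 2400 MHz"),
]


def interleave(*streams):
    """Merge already time-sorted event streams into one timeline."""
    return list(heapq.merge(*streams, key=lambda event: event[0]))


timeline = interleave(sched_events, freq_events)
for timestamp, source, text in timeline:
    print(f"{timestamp:>6} {source:<8} {text}")
```

Reading the merged timeline shows which frequency change happened between which scheduling decisions, which is the context that a bare frequency value lacks.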
>From: Jan Beulich [mailto:jbeulich@novell.com]
>Sent: 8 September 2008 22:24
>
>>>> "Tian, Kevin" <kevin.tian@intel.com> 08.09.08 15:50 >>>
>>>From: Jan Beulich [mailto:jbeulich@novell.com]
>>>How? I can't see where the current frequency a CPU is running at is being exposed.
>>
>>common/sysctl.c: XEN_SYSCTL_get_pmstat
>
>Ah, okay, I missed that. But I can't use this from the kernel anyway, and tools that track the frequency (e.g. KDE sysguard) would need to be modified in order to make use of it. I'd really prefer /proc/cpuinfo to correctly reflect this, at least in Dom0. And even beyond that, I can't seem to find any users of the APIs in tools/libxc/xc_pm.c, so these really appear to be dead stubs.

They're not dead stubs: we have internal tools such as a modified Xen version of PowerTop, and we can expect more in future.

This is not a cpufreq-only design choice; a similar question arises for other things like MCA and CPU hotplug: either we reuse all existing dom0 Linux interfaces (which, however, imposes a requirement on vCPU placement within dom0 for each pCPU), or we use a side-band hypercall, with existing tools supporting the Xen hypercall interface, to retrieve pCPU-related information in a batch. So far we have chosen the latter for cpufreq.

Actually, even when keeping the same interface, existing dom0 tools may have to be modified to some degree for specific physical information, as those tools have no knowledge of the whole system. For example, PowerTop uses /proc/interrupts to derive break events for Cx residency, which however only reflects virtual interrupts within dom0. Even for /proc/cpuinfo, would you then end up requiring a mangled version with mixed physical and virtual bits?

>>Then you have to pin each dom0 vCPU to its corresponding pCPU, and give dom0 as many vCPUs as there are pCPUs. I don't think such a limitation is necessary just for retrieving some pCPU information.
>>
>>Or, if you still allow vCPU migration, you have to fake a virtual frequency-change notification within dom0 at each vCPU migration, as each pCPU may scale its own frequency individually.
>
>Why? All I care about is a snapshot value. It doesn't matter whether it's stale by the time I get it; nothing is going to be calculated from it, apart from statistics.

Then such physical info may conflict with other code in the guest, if any exists, that simply gets frequency info from some self-calibrated logic. If we can't ensure a consistent view of frequency within the guest, what's the point?

Thanks, Kevin

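
The side-band approach Kevin describes comes down to fetching per-pCPU statistics in one batch and presenting them from a dom0 tool. The Python sketch below covers only the presentation step. The record layout (`current`, `freq_mhz`, `residency_ms`) is an assumption made for illustration; it is not the structure returned by XEN_SYSCTL_get_pmstat, and no hypercall is issued here.

```python
def format_pstate_report(stats):
    """Render hypothetical per-pCPU P-state statistics as text lines."""
    lines = []
    for cpu in sorted(stats):
        entry = stats[cpu]
        current = entry["current"]
        total = sum(entry["residency_ms"])
        lines.append(
            f"pCPU {cpu}: current P{current} ({entry['freq_mhz'][current]} MHz)"
        )
        pairs = zip(entry["freq_mhz"], entry["residency_ms"])
        for state, (mhz, residency) in enumerate(pairs):
            # Guard against an all-zero record to avoid dividing by zero.
            share = 100.0 * residency / total if total else 0.0
            lines.append(f"  P{state}: {mhz} MHz, {share:.1f}% residency")
    return lines


# Made-up batch covering two pCPUs; the numbers are not measurements.
sample_stats = {
    0: {"current": 1, "freq_mhz": [2400, 1600], "residency_ms": [250, 750]},
    1: {"current": 0, "freq_mhz": [2400, 1600], "residency_ms": [900, 100]},
}

report = format_pstate_report(sample_stats)
print("\n".join(report))
```

Because the report is keyed by pCPU rather than by dom0 vCPU, it needs no pinning and no particular vCPU count in dom0.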
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: 8 September 2008 22:35
>>
>> Right, you say 'if' - what if not? How do I tell, especially when I can't touch the system and easily put a patched hypervisor and/or kernel on it? If we get a complaint from a customer that he thinks frequency scaling doesn't do what it's expected to, we'll need to have a simple mechanism at hand to determine what's going on with his box.
>>
>> And even for development purposes I think a one-look proof that things work as expected is quite useful (I'm doing the same for a few other basic things, the interrupt rate being one of those that helped spot problems that otherwise would have gone unnoticed for a much longer period of time).
>
>Okay, I'll grant you that this is a useful scenario. For this purpose the existing sysctl, plus a simple lashed-up dom0 userspace utility to dump the statistics, would be perfectly sufficient.
>
>Another possibility would be to generate xentrace events when CPU frequencies change. Then, if you already buy into using xentrace to get accurate logs about what's happening and when (which we do for our own product development and maintenance), the CPU frequency events would be nicely integrated into that.
>
>The advantage of the latter is that the frequency changes are interleaved with other stuff you care about, like what's being scheduled, what's on run queues etc., since obviously frequency information all by itself is not so interesting.

Yes, we already have many examples within dom0 of retrieving Xen-specific information while bypassing dom0. By the way, we plan to add xentrace events for all processor PM stuff, including frequency changes and CPU idle state transitions (break causes, etc.).

Thanks, Kevin

>From: Jan Beulich [mailto:jbeulich@novell.com]
>Sent: 8 September 2008 22:31
>
>>>> Keir Fraser <keir.fraser@eu.citrix.com> 08.09.08 16:00 >>>
>>After all, guest performance is at least as affected by CPU (and other resource) contention from other guests as it is by power-management governors in the hypervisor. Indeed, if the governors are doing their job right then guest performance should not be considerably impacted by them even in absolute terms.
>
>Right, you say 'if' - what if not? How do I tell, especially when I can't touch the system and easily put a patched hypervisor and/or kernel on it? If we get a complaint from a customer that he thinks frequency scaling doesn't do what it's expected to, we'll need to have a simple mechanism at hand to determine what's going on with his box.

Extend this complaint to the case where a customer complains that he thinks frequency scaling on (v)CPU 0 doesn't do what it's expected to: how do you check what's going on if there's no fixed mapping between vCPU and pCPU, and no tool talking to Xen directly?

Thanks, Kevin

On 8/9/08 15:49, "Tian, Kevin" <kevin.tian@intel.com> wrote:

>> The advantage of the latter is that the frequency changes are interleaved with other stuff you care about, like what's being scheduled, what's on run queues etc., since obviously frequency information all by itself is not so interesting.
>
> Yes, we already have many examples within dom0 of retrieving Xen-specific information while bypassing dom0. By the way, we plan to add xentrace events for all processor PM stuff, including frequency changes and CPU idle state transitions (break causes, etc.).

It's good to know there are other users and developers of xentrace!

 -- Keir