I see serious discrepancies between Cpu usage as reported by /proc/stat on Xen3 virts and Cpu usage as reported by the hypervisor via the "xm" tool (cpu_time). The problem exists on Intel and AMD platforms, with 1 Vcpu and multiple Vcpu slots, and on hosts with 1 physical CPU and multiple physical CPUs.

The skew is pronounced with workloads that "sleep-wake-sleep-wake" at a high frequency, while workloads that hog the CPU don't exhibit this problem as much.

Anybody seen this? Any insights?

http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=882 has all the details.

- Pradeep Vincent

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
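For readers who want to reproduce the comparison, here is a minimal sketch of the guest-side half of the measurement. It assumes the usual column order of the aggregate "cpu" line in /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal) and a USER_HZ of 100; the sample counter values and the xm cpu_time delta are invented for illustration and are not taken from the bug report.

```python
# Column order of the aggregate "cpu" line in /proc/stat (assumed layout).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text, in ticks."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            # Older kernels expose fewer columns; treat missing ones as zero.
            values += [0] * (len(FIELDS) - len(values))
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def guest_busy_seconds(before, after, user_hz=100):
    """CPU seconds the guest believes it used between two samples."""
    busy = ("user", "nice", "system", "irq", "softirq")
    ticks = sum(after[k] - before[k] for k in busy)
    return ticks / user_hz


def skew_ratio(xm_cpu_time_delta, guest_busy):
    """How many times larger the hypervisor's figure is than the guest's."""
    return xm_cpu_time_delta / guest_busy


# Hypothetical samples taken inside the guest at the start and end of a run.
SAMPLE_BEFORE = "cpu  1000 0 500 8000 100 10 20 30\n"
SAMPLE_AFTER = "cpu  1300 0 700 9500 150 10 20 60\n"

guest = guest_busy_seconds(parse_cpu_line(SAMPLE_BEFORE),
                           parse_cpu_line(SAMPLE_AFTER))
# Hypothetical change in the domain's cpu_time over the same interval.
ratio = skew_ratio(10.0, guest)
```

On a real system the two text samples would come from reading /proc/stat in the guest, and the cpu_time delta from querying xm in dom0 at the same two moments.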
Does your /proc/stat analysis include time spent in the kernel?

Another possibility here is that, if your guest blocks a lot, you will see that Linux counts the guest as 'running' for less of the context-switch path than Xen does. This will cause Linux's estimate of time used to be less than Xen's. There's not much to be done about that: in general Xen has more knowledge of what is actually going on, including precisely when a switch of control happens, and the numbers from xentop will be more accurate than numbers generated by the guest itself (particularly with frequently-blocking workloads).

That said, it depends on what you're interested in measuring: if you care about the amount of time spent doing useful application work (as opposed to context switching), then you might be more interested in the Linux stats, because Xen will include more time spent in the Linux and Xen context switch paths.

 -- Keir

On 2/3/07 23:42, "Pradeep Vincent" <pradeep.vincent@gmail.com> wrote:
> I see serious discrepancies between Cpu usage as reported by /proc/stat on
> Xen3
> virts and Cpu usage as reported by the hypervisor via "xm" tool
> (cpu_time).
>
> The skew is pronounced with workloads that "sleep-wake-sleep-wake" at
> a high frequency while workloads that hog the CPU don't exhibit this
> problem as much.
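The effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions: every block/wake cycle is assumed to include a fixed slice of context-switch time that Xen charges to the guest but that Linux does not see as 'running'. The function names and all numbers are hypothetical; nothing here is measured from Xen.

```python
def accounted_times(work_ns_per_cycle, cycles, uncounted_switch_ns):
    """Return (guest_view_ns, xen_view_ns) for a run of block/wake cycles.

    Each cycle does work_ns_per_cycle of work that both sides count, plus
    uncounted_switch_ns of switch-path time that only Xen counts.
    """
    guest_view = work_ns_per_cycle * cycles
    xen_view = (work_ns_per_cycle + uncounted_switch_ns) * cycles
    return guest_view, xen_view


def relative_skew(guest_view, xen_view):
    """Excess of Xen's figure over the guest's, as a fraction of the guest's."""
    return (xen_view - guest_view) / guest_view


# Same total work (1 second), same hypothetical 5 us of uncounted switch time.
hog = accounted_times(1_000_000_000, 1, 5_000)        # one long run
sleepy = accounted_times(10_000, 100_000, 5_000)      # many short runs

hog_skew = relative_skew(*hog)        # negligible
sleepy_skew = relative_skew(*sleepy)  # 0.5, i.e. Xen reports 50% more
```

In this model the skew depends only on the ratio of uncounted switch time to useful work per cycle, which is why a CPU hog barely shows it while a workload that blocks frequently does.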
With a trivial workload like "ls -R /" I see as much as a 30% difference, and with other workloads I see that xm reports twice what /proc/stat reports. That sounds too high to me.

Linux counts all the nanosecs that the hypervisor does not account towards "stolen" or "blocked" as its own usage. This should include all the time spent in the hypervisor in the context of a particular Vcpu: the hypervisor counts nsecs as "stolen" or "blocked" only after the Vcpu's state is changed (from running to something else). So most of the hypervisor's CPU usage should be accounted for the same way by xm and by /proc/stat on guests, as they both use the same "stolen" and "blocked" nsecs as accounted for and maintained by the hypervisor.

Like you said, context switch overhead isn't accounted for accurately, but the hypervisor's CPU usage accounting suffers from the same problem and to the same extent. Even if this isn't the case, context switch CPU usage can't account for this big a difference.

- Pradeep Vincent

On 3/4/07, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:
> Does your /proc/stat analysis include time spent in the kernel?
>
> Another possibility here is that, if your guest blocks a lot, you will see
> that Linux counts the guest as 'running' for less of the context-switch path
> than Xen does. This will cause Linux's estimate of time used to be less than
> Xen's.
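The accounting argument in this message can be written down as a small sketch. It assumes the hypervisor keeps per-Vcpu time counters for four states (running, runnable, blocked, offline) and that the guest treats runnable plus offline time as "stolen"; the class and function names below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Runstate:
    """Nanoseconds a Vcpu has spent in each state (hypothetical snapshot)."""
    running: int
    runnable: int
    blocked: int
    offline: int

    def elapsed(self):
        # Every nanosecond of wall time falls into exactly one state.
        return self.running + self.runnable + self.blocked + self.offline


def guest_usage_ns(rs):
    """Guest's view: whatever is neither stolen nor blocked is its own usage."""
    stolen = rs.runnable + rs.offline  # assumption: how "stolen" is derived
    return rs.elapsed() - stolen - rs.blocked


def hypervisor_usage_ns(rs):
    """Hypervisor's view: time the Vcpu was in the running state."""
    return rs.running


# Hypothetical snapshot covering 1000 ns of wall time.
snapshot = Runstate(running=400, runnable=100, blocked=450, offline=50)
```

Under these assumptions the two views are equal by construction, which is the basis for expecting xm and /proc/stat to agree. A large gap would then have to come from somewhere this model leaves out, for example how the guest converts the counters into the values it reports.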
> > Does your /proc/stat analysis include time spent in the kernel?

Yes, it does.

- Pradeep Vincent

On 3/6/07, Pradeep Vincent <pradeep.vincent@gmail.com> wrote:
> With a trivial workload like "ls -R /" I see as much as 30% diff and
> with other workloads I see that xm reports twice what /proc/stat
> reports. Sounds too high to me.
On 7/3/07 01:11, "Pradeep Vincent" <pradeep.vincent@gmail.com> wrote:
> Linux counts all the nanosecs not accounted by the hypervisor towards
> "stolen" or "blocked" as its own usage. This should include all the
> time spent in the hypervisor in the context of a particular Vcpu - The
> hypervisor counts nsecs as "stolen" or "blocked" only after the Vcpu's
> state is changed (from running to something else) So most part of the
> hypervisor's CPU usage should be accounted for the same way by xm and
> by /proc/stat on guests as they both use the same "stolen" and
> "blocked" nsecs as accounted for and maintained by the hypervisor.
>
> Like you said context switch overhead isn't accounted for accurately
> but hypervisor's cpu usage accounting suffers from the same problem
> and to the same extent. Even if this isn't the case, context switch
> cpu usage can't account for this big a difference.

It sounds like you could track this one down yourself and post a patch if you find a bug? :-)

 -- Keir