thr3ads.net - Xen devel - [Xen-devel] CPU Usage Discrepancies [Mar 2007]

Pradeep Vincent

2007-Mar-02 23:42 UTC

[Xen-devel] CPU Usage Discrepancies

I see serious discrepancies between CPU usage as reported by /proc/stat on
Xen 3 guests and CPU usage as reported by the hypervisor via the "xm" tool
(cpu_time). The problem occurs on both Intel and AMD platforms, on guests
with one VCPU and with multiple VCPUs, and on hosts with one physical CPU
and with multiple physical CPUs.
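
For reference, a minimal sketch of how the guest-side figure can be derived
from two /proc/stat samples. The function names are illustrative and not
from this thread, and the column order assumed is the one documented for
the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq,
steal). The hypervisor-side figure would be the change in cpu_time over the
same wall-clock interval.

```python
# Column order of the aggregate "cpu" line in /proc/stat (values in jiffies).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return a dict of counters taken from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        parts = line.split()
        # Match only the aggregate line, not per-CPU lines such as "cpu0".
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            # Older kernels expose fewer columns; treat missing ones as zero.
            values += [0] * (len(FIELDS) - len(values))
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def busy_fraction(before, after):
    """Fraction of the interval the guest believes it spent running."""
    delta = {name: after[name] - before[name] for name in FIELDS}
    total = sum(delta.values())
    if total == 0:
        return 0.0
    # Idle, I/O wait and steal are time the guest was not executing.
    not_running = delta["idle"] + delta["iowait"] + delta["steal"]
    return (total - not_running) / total
```

In practice the two samples would come from reading /proc/stat a few seconds
apart while the workload runs.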

The skew is pronounced with workloads that "sleep-wake-sleep-wake" at
a high frequency, while workloads that hog the CPU don't exhibit this
problem as much.
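
A minimal sketch of the kind of "sleep-wake-sleep-wake" workload meant here,
assuming a short sleep followed by a short burst of work is representative.
The function name and the default values are hypothetical and are not the
test case attached to the bug report.

```python
import time


def sleep_wake(cycles, sleep_s=0.001, spin_iterations=1000):
    """Alternate a short sleep with a short burst of CPU work."""
    completed = 0
    for _ in range(cycles):
        # Block: the guest gives up the CPU and its VCPU can be descheduled.
        time.sleep(sleep_s)
        # Wake: do a brief burst of work before blocking again.
        acc = 0
        for i in range(spin_iterations):
            acc += i
        completed += 1
    return completed
```

Raising spin_iterations while keeping the number of cycles low moves the
workload toward the CPU-hog case, which makes the two cases easy to compare.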

Has anybody seen this? Any insights?

http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=882 has all the details.

- Pradeep Vincent

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2007-Mar-04 12:12 UTC

Re: [Xen-devel] CPU Usage Discrepancies

Does your /proc/stat analysis include time spent in the kernel?

Another possibility here is that, if your guest blocks a lot, you will see
that Linux counts the guest as 'running' for less of the context-switch
path than Xen does. This will cause Linux's estimate of time used to be
less than Xen's. There's not much to be done about that: in general Xen has
more knowledge of what is actually going on, including precisely when a
switch of control happens, and the numbers from xentop will be more accurate
than numbers generated by the guest itself (particularly with
frequently-blocking workloads). Although it depends on what you're
interested in measuring -- if you care about the amount of time spent doing
useful application work (as opposed to context switching) then you might be
more interested in the Linux stats because Xen will include more time spent
in the Linux and Xen context switch paths.
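
A toy model of the effect described above, as a sketch only. It assumes each
block/wake cycle consists of some work that both Linux and Xen count, plus a
fixed slice of context-switch path that only Xen attributes to the guest.
The function name and all numbers are made up for illustration; nothing here
was measured.

```python
def usage_estimates(cycles_per_sec, work_us, unseen_overhead_us):
    """Return (linux_fraction, xen_fraction) of one CPU under the toy model.

    work_us            -- per-cycle time that both Linux and Xen count
    unseen_overhead_us -- per-cycle switch-path time that only Xen counts
    """
    linux = cycles_per_sec * work_us / 1e6
    xen = cycles_per_sec * (work_us + unseen_overhead_us) / 1e6
    return linux, xen


# Frequently blocking workload: short bursts, many switches per second.
linux_blocky, xen_blocky = usage_estimates(1000, 50, 50)

# CPU hog: long bursts, few switches per second.
linux_hog, xen_hog = usage_estimates(10, 99000, 50)
```

With these assumed inputs the blocking workload shows Xen reporting about
twice what Linux reports, while the CPU hog differs by well under one
percent. The gap grows with switch frequency, not with total work.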

 -- Keir



Pradeep Vincent

2007-Mar-07 01:11 UTC

Re: [Xen-devel] CPU Usage Discrepancies

With a trivial workload like "ls -R /" I see as much as a 30% difference,
and with other workloads xm reports twice what /proc/stat reports. That
sounds too high to me.

Linux counts all the nanoseconds that the hypervisor has not accounted as
"stolen" or "blocked" as its own usage. This should include all the time
spent in the hypervisor in the context of a particular VCPU, because the
hypervisor counts nanoseconds as "stolen" or "blocked" only after the
VCPU's state has changed (from running to something else). So most of the
hypervisor's CPU usage should be accounted for the same way by xm and by
/proc/stat on guests, as both use the same "stolen" and "blocked"
nanosecond counts accounted for and maintained by the hypervisor.

As you said, context-switch overhead isn't accounted for accurately, but
the hypervisor's CPU usage accounting suffers from the same problem and to
the same extent. Even if this isn't the case, context-switch CPU usage
can't account for this big a difference.
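
The accounting argument above (guest usage is whatever the hypervisor has
not marked as stolen or blocked) can be sketched as follows. This only
restates the claim made in this message; the function and parameter names
are hypothetical and do not come from the Linux or Xen sources.

```python
def guest_reported_usage_ns(elapsed_ns, stolen_ns, blocked_ns):
    """Time Linux would count as its own, per the argument in this message."""
    used = elapsed_ns - stolen_ns - blocked_ns
    if used < 0:
        raise ValueError("stolen + blocked exceeds elapsed time")
    return used


def discrepancy_ns(hypervisor_cpu_time_ns, elapsed_ns, stolen_ns, blocked_ns):
    """Hypervisor-reported running time minus the guest's own estimate."""
    guest = guest_reported_usage_ns(elapsed_ns, stolen_ns, blocked_ns)
    return hypervisor_cpu_time_ns - guest
```

If both sides really consume the same stolen and blocked counters, the
discrepancy should stay near zero; a persistently large value would point to
the two sides reading those counters differently.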

- Pradeep Vincent


Pradeep Vincent

2007-Mar-07 01:14 UTC

Re: [Xen-devel] CPU Usage Discrepancies

> > Does your /proc/stat analysis include time spent in the kernel?
Yes, it does.

- Pradeep Vincent


Keir Fraser

2007-Mar-07 08:32 UTC

Re: [Xen-devel] CPU Usage Discrepancies

On 7/3/07 01:11, "Pradeep Vincent" <pradeep.vincent@gmail.com> wrote:

> Linux counts all the nanosecs not accounted by the hypervisor towards
> "stolen" or "blocked" as its own usage. This should include all the
> time spent in the hypervisor in the context of a particular Vcpu - The
> hypervisor counts nsecs as "stolen" or "blocked" only after the Vcpu's
> state is changed (from running to something else)  So most part of the
> hypervisor's CPU usage should be accounted for the same way by xm and
> by /proc/stat on guests as they both use the same "stolen" and
> "blocked" nsecs as accounted for and maintained by the hypervisor.
>
> Like you said context switch overhead isn't accounted for accurately
> but hypervisor's cpu usage accounting suffers from the same problem
> and to the same extent. Even if this isn't the case,  context switch
> cpu usage can't account for this big a difference.

It sounds like you could track this one down yourself and post a patch if
you find a bug? :-)

 -- Keir


