Allow guest domains to get information from the hypervisor on how much
cpu time their virtual cpus have used. This is needed to estimate the
cpu steal time.

Signed-off-by: Rik van Riel <riel@redhat.com>

--- xen/include/public/vcpu.h.steal	2006-02-07 18:01:41.000000000 -0500
+++ xen/include/public/vcpu.h	2006-02-17 13:51:45.000000000 -0500
@@ -51,6 +51,14 @@
 /* Returns 1 if the given VCPU is up. */
 #define VCPUOP_is_up                3
 
+/*
+ * Get information on how much CPU time this VCPU has used, etc...
+ *
+ * @extra_arg == pointer to an empty dom0_getvcpuinfo_t, the "OUT" variables
+ * of which are filled in with scheduler info.
+ */
+#define VCPUOP_cpu_info             4
+
 #endif /* __XEN_PUBLIC_VCPU_H__ */
 
 /*
--- xen/common/domain.c.steal	2006-02-07 18:01:40.000000000 -0500
+++ xen/common/domain.c	2006-02-17 13:52:44.000000000 -0500
@@ -451,8 +451,24 @@
     case VCPUOP_is_up:
         rc = !test_bit(_VCPUF_down, &v->vcpu_flags);
         break;
+
+    case VCPUOP_cpu_info:
+    {
+        struct dom0_getvcpuinfo vi = { 0, };
+        vi.online   = !test_bit(_VCPUF_down, &v->vcpu_flags);
+        vi.blocked  = test_bit(_VCPUF_blocked, &v->vcpu_flags);
+        vi.running  = test_bit(_VCPUF_running, &v->vcpu_flags);
+        vi.cpu_time = v->cpu_time;
+        vi.cpu      = v->processor;
+        rc = 0;
+
+        if ( copy_to_user(arg, &vi, sizeof(dom0_getvcpuinfo_t)) )
+            rc = -EFAULT;
+        break;
+    }
     }
 
+
     return rc;
 }
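For orientation, a guest kernel would call the new operation roughly as
below. This is a minimal sketch: HYPERVISOR_vcpu_op() and the
dom0_getvcpuinfo structure are taken from the patch above, while
read_vcpu_time() is just an illustrative wrapper name, not part of the
patch.

    #include <xen/interface/vcpu.h>

    /* Ask Xen how many ns of CPU time this VCPU has actually consumed. */
    static u64 read_vcpu_time(int cpu)
    {
        struct dom0_getvcpuinfo vi = { 0, };

        /* VCPUOP_cpu_info fills in the "OUT" fields of vi. */
        if (HYPERVISOR_vcpu_op(VCPUOP_cpu_info, cpu, &vi) != 0)
            return 0;   /* hypercall failed; treat as no information */

        return vi.cpu_time;
    }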
Why do you need to add a new VCPUOP while doing the same thing as
DOM0_GETVCPUINFO?

Thanks,
Kevin

>-----Original Message-----
>From: xen-devel-bounces@lists.xensource.com
>[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Rik van Riel
>Sent: 21 February 2006 8:51
>To: xen-devel@lists.xensource.com
>Subject: [Xen-devel] [PATCH 1/2] cpu steal time accounting
>
>Allow guest domains to get information from the hypervisor on how much
>cpu time their virtual cpus have used. This is needed to estimate the
>cpu steal time.
>
>Signed-off-by: Rik van Riel <riel@redhat.com>
>
>[...]
On Tue, 21 Feb 2006, Tian, Kevin wrote:

> Why do you need to add a new VCPUOP while doing the same thing as
> DOM0_GETVCPUINFO?

Because the dom0_ops only work for dom0, and reworking that function to
allow non-privileged domains to get info just on themselves would end up
being a way uglier patch.

--
All Rights Reversed
>From: Rik van Riel [mailto:riel@redhat.com]
>Sent: 21 February 2006 20:56
>
>On Tue, 21 Feb 2006, Tian, Kevin wrote:
>
>> Why do you need to add a new VCPUOP while doing the same thing as
>> DOM0_GETVCPUINFO?
>
>Because the dom0_ops only work for dom0, and reworking that
>function to allow non-privileged domains to get info just
>on themselves would end up being a way uglier patch.
>

See your point now. Since the physical processor id is also exported in
your patch, will it start a trend of allowing a non-privileged domain to
query more physical context information about itself, like
GETDOMAININFO, GETVCPUCONTEXT, etc.?

For example, a guest may use the gap between max_pages and tot_pages to
decide whether to eagerly add free pages as caches. It can also take
that info as an indicator of some type of tight resource contention. If
that is the usage model, maybe we can consider moving these out of the
dom0_ prefix and making them common.

Thanks,
Kevin
On 21 Feb 2006, at 00:51, Rik van Riel wrote:

> Allow guest domains to get information from the hypervisor on how much
> cpu time their virtual cpus have used. This is needed to estimate the
> cpu steal time.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>

Probably we'll kill off the dom0_op instead (or at least rename the info
structure and leave the old dom0_op just as a legacy placeholder for a
while).

Looking at the other patch, I think I'm missing higher-level context
regarding what this patch is about. I grepped around for
account_steal_time() -- it looks like it's currently used by s390, but
as part of a rather bigger patch that also calls
account_user_time/account_system_time. Is there an lkml thread I should
read to get up to speed on this? Should your patch be using those other
functions in this account_foo_time API? What functionality do we
currently miss by not targeting that API?

thanks,
Keir
On Tue, 21 Feb 2006, Keir Fraser wrote:

> Looking at the other patch, I think I'm missing higher-level context
> regarding what this patch is about. I grepped around for
> account_steal_time() -- it looks like it's currently used by s390, but
> as part of a rather bigger patch that also calls
> account_user_time/account_system_time.

Basically, steal time is the amount of time when:
1) we had a runnable task, but
2) it was not running, because our vcpu was scheduled
   away by the hypervisor

Thus, steal time measures how much a particular workload inside a
virtual machine is impacted by contention on the cpu between different
virtual machines.

It also makes sure that time the vcpu itself was not running is not
erroneously accounted to the currently running process, which matters
when a user is trying to determine how much CPU time a particular task
needs.

--
All Rights Reversed
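For readers without the kernel tree at hand: the account_steal_time()
routine that the patch feeds is, in the 2.6.15-era scheduler, roughly
the following (paraphrased and slightly simplified here; the real
function also charges the idle task's stime). This is the heuristic
discussed further down the thread:

    void account_steal_time(struct task_struct *p, cputime_t steal)
    {
        struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
        cputime64_t tmp = cputime_to_cputime64(steal);
        struct runqueue *rq = this_rq();

        if (p == rq->idle) {
            /* Nothing was runnable: charge iowait or idle, not steal. */
            if (atomic_read(&rq->nr_iowait) > 0)
                cpustat->iowait = cputime64_add(cpustat->iowait, tmp);
            else
                cpustat->idle = cputime64_add(cpustat->idle, tmp);
        } else
            /* A task wanted the CPU but the vcpu was scheduled away. */
            cpustat->steal = cputime64_add(cpustat->steal, tmp);
    }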
> Basically, steal time is the amount of time when:
> 1) we had a runnable task, but
> 2) it was not running, because our vcpu was scheduled
>    away by the hypervisor

Just FYI: XenMon provides a similar metric which we call Waiting Time --
the time a domain was runnable but not running. Of course, yours is a
different use case, since you want to query the stolen time from within
the guest.

Diwaker

--
Web/Blog/Gallery: http://floatingsun.net/blog
On 21 Feb 2006, at 19:32, Rik van Riel wrote:

> Basically, steal time is the amount of time when:
> 1) we had a runnable task, but
> 2) it was not running, because our vcpu was scheduled
>    away by the hypervisor
>
> Thus, steal time measures how much a particular workload
> inside a virtual machine is impacted by contention on the
> cpu between different virtual machines.
>
> It also makes sure that time the vcpu itself was not running
> is not erroneously accounted to the currently running process,
> which matters when a user is trying to determine how much CPU
> time a particular task needs.

Is accounting user/system time an unnecessary extra? I guess we already
do it by sampling at tick granularity anyway?

Should 'steal time' include blocked time when the guest had no work to
execute?

Also, given that the logic currently only triggers when the guest
detects it 'missed a tick', would it be good enough simply to account
#missed_ticks as steal time? It would certainly be a lot simpler to
implement, and you end up dividing everything down to tick granularity
anyway. :-)

-- Keir
On Wed, 22 Feb 2006, Keir Fraser wrote:

> Is accounting user/system time an unnecessary extra? I guess we already
> do it by sampling at tick granularity anyway?
>
> Should 'steal time' include blocked time when the guest had no work to
> execute?

No, this is idle time. If the guest had no work to do, it wasn't
suffering from contention for the CPU.

> Also, given that the logic currently only triggers when the guest
> detects it 'missed a tick', would it be good enough simply to account
> #missed_ticks as steal time?

Not good enough if the hypervisor ends up scheduling guests at a
granularity finer than the guest's own timer ticks.

The reason for only checking steal time when we miss a tick is that I
don't want to run the (expensive?) steal time logic on every timer
interrupt.

--
All Rights Reversed
On 22 Feb 2006, at 14:27, Rik van Riel wrote:

>> Is accounting user/system time an unnecessary extra? I guess we
>> already do it by sampling at tick granularity anyway?
>>
>> Should 'steal time' include blocked time when the guest had no work to
>> execute?
>
> No, this is idle time. If the guest had no work to do,
> it wasn't suffering from contention for the CPU.

But the 'vcpu_time' you read out of Xen excludes time spent
blocked/unrunnable. Won't you end up accounting that as if it were
involuntary preemption? Also:

1. What if a guest gets preempted for lots of short time periods (less
than a jiffy), and then some arbitrary time in the future is preempted
for long enough to activate your stolen-time logic? Won't you end up
incorrectly accounting the accumulated short time periods?

2. Is the Xen-provided 'vcpu_time', divided down into jiffies, even
comparable with the kstats that you sum? What about accumulated rounding
errors in 'vcpu_time' and the kstats causing relative drift between them
over time?

-- Keir
On Wed, 22 Feb 2006, Keir Fraser wrote:

> But the 'vcpu_time' you read out of Xen excludes time spent
> blocked/unrunnable. Won't you end up accounting that as if it were
> involuntary preemption?

If the domain is unrunnable, surely there won't be a process on the
virtual cpu that is runnable? Or am I overlooking something here?

> 1. What if a guest gets preempted for lots of short time periods (less
> than a jiffy), and then some arbitrary time in the future is preempted
> for long enough to activate your stolen-time logic? Won't you end up
> incorrectly accounting the accumulated short time periods?

This is true. I'm not sure we'd want to get the vcpu info at every timer
interrupt, though; that could end up being too expensive...

> 2. Is the Xen-provided 'vcpu_time', divided down into jiffies, even
> comparable with the kstats that you sum? What about accumulated
> rounding errors in 'vcpu_time' and the kstats causing relative drift
> between them over time?

In the tests I ran, the steal time seemed to work out quite well with
what I expected it to be, watching /proc/stat from inside the guest and
xentop from dom0 simultaneously.

The rounding errors happen occasionally (I added printks to the if
statements catching them), but not all that often...

--
All Rights Reversed
On 22 Feb 2006, at 17:11, Rik van Riel wrote:

> If the domain is unrunnable, surely there won't be a
> process on the virtual cpu that is runnable? Or am
> I overlooking something here?

Oh, I see, this is dealt with inside account_steal_time(). No problem
then.

>> 1. What if a guest gets preempted for lots of short time periods (less
>> than a jiffy), and then some arbitrary time in the future is preempted
>> for long enough to activate your stolen-time logic? Won't you end up
>> incorrectly accounting the accumulated short time periods?
>
> This is true. I'm not sure we'd want to get the vcpu info
> at every timer interrupt, though; that could end up being
> too expensive...

Having to call down to Xen to get that information is unfortunate.
Perhaps we can export it in shared_info, or have the guest register a
virtual address it would like the info written to.

> In the tests I ran, the steal time seemed to work out quite
> well with what I expected it to be, watching /proc/stat from
> inside the guest and xentop from dom0 simultaneously.
>
> The rounding errors happen occasionally (I added printks to
> the if statements catching them), but not all that often...

I think the calculation of delta stolen time would be clearer as:

    ((system_time - prev_system_time) - (vcpu_time - prev_vcpu_time))
        / NS_PER_TICK

where system_time/vcpu_time become prev_system_time/prev_vcpu_time the
next time your logic is triggered. It has the further advantage that it
does not subtract quantities that can slowly drift relative to each
other over days/weeks.

-- Keir
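Worked through with hypothetical numbers (HZ=100, so NS_PER_TICK =
10,000,000 ns):

    prev_system_time = 100,000,000 ns     system_time = 130,000,000 ns
    prev_vcpu_time   =  40,000,000 ns     vcpu_time   =  50,000,000 ns

    stolen = (130,000,000 - 100,000,000) - (50,000,000 - 40,000,000)
           = 20,000,000 ns  =  2 ticks accounted as steal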
> On 22 Feb 2006, at 17:11, Rik van Riel wrote:
>
>> If the domain is unrunnable, surely there won't be a
>> process on the virtual cpu that is runnable? Or am
>> I overlooking something here?
>
> Oh, I see, this is dealt with inside account_steal_time(). No problem
> then.

But the guest will not be able to properly partition this time between
the idle time stat and the steal time stat.

When the vcpu enters its idle loop and blocks, at some point it will
become ready to run again. But it will not necessarily run right away.
Instead, it will sit in the hypervisor's runqueue for some amount of
time. While in the runqueue, this time is "stolen", since the vcpu wants
to run but isn't (at this point it is involuntarily waiting). Under a
heavily overcommitted system, the amount of time in the runqueue
following the block may be nontrivial.

But the guest (the code in account_steal_time()) cannot determine the
time at which the vcpu was requeued onto the hypervisor's runqueue. It
will account all of this time toward the "idle time" stat, rather than
partitioning it between "idle time" and "steal time".

To solve this, it may be best to have the hypervisor interface expose
per-vcpu stolen time directly, rather than vcpu_time. Then the guest
does not need to try to guess whether to charge (system_time -
vcpu_time) against idle or steal.

>>> 1. What if a guest gets preempted for lots of short time periods
>>> (less than a jiffy), and then some arbitrary time in the future is
>>> preempted for long enough to activate your stolen-time logic? Won't
>>> you end up incorrectly accounting the accumulated short time periods?
>>
>> This is true. I'm not sure we'd want to get the vcpu info
>> at every timer interrupt, though; that could end up being
>> too expensive...
>
> Having to call down to Xen to get that information is unfortunate.
> Perhaps we can export it in shared_info, or have the guest register a
> virtual address it would like the info written to.

If you do change how this information is passed through the interface,
then maybe this would be a good time to define the interface to export
"stolen time" computed by the hypervisor, rather than "vcpu time".

On a topic related to these patches: the Linux scheduler also uses the
routine sched_clock() to calculate the run-time (and sleep-time) of
processes. With the current code, these calculations will include stolen
time in the total run-time of a process. Perhaps sched_clock() should be
a clock that does not advance while time is stolen by the hypervisor?

We've been thinking about these issues too. I've attached a document
that describes our current thoughts. It covers the portion of the VMI
that deals with time and describes the changes to Linux needed to
accommodate VMI Time (Rik, this is an updated draft of the document I
sent you before). Comments are welcome.

Thanks,
Dan
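A hypothetical timeline makes the mis-partitioning Dan describes
concrete (numbers are illustrative):

    t = 0 ms   vcpu blocks in its idle loop (nothing runnable)
    t = 5 ms   interrupt arrives; vcpu is runnable but only sits
               on the hypervisor's runqueue
    t = 9 ms   hypervisor finally runs the vcpu again

    Correct accounting:          idle = 5 ms, stolen = 4 ms
    What the guest can deduce:   idle = 9 ms, stolen = 0 ms
    (all it sees is that vcpu_time did not advance for 9 ms)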
On 22 Feb 2006, at 23:58, Dan Hecht wrote:

> To solve this, it may be best to have the hypervisor interface expose
> per-vcpu stolen time directly, rather than vcpu_time. Then the guest
> does not need to try to guess whether to charge (system_time -
> vcpu_time) against idle or steal.

Yes, the distinction between stolen and available time does make sense
(although I'm not sure 'available' is a great name); otherwise you can't
account for wakeup latencies. account_steal_time() would need to be
modified in Linux, though, as we would not need its dodgy heuristic for
deciding whether to account to stolen time or iowait/idle.

-- Keir
On 22 Feb 2006, at 23:58, Dan Hecht wrote:

> The interface does not provide a way for a vcpu to atomically read the
> entire set of time values { wallclock time, real time counter,
> available time counter, stolen time counter }. While it seems that such
> a mechanism might be "nice" to have in theory, it seems unnecessary in
> practice. Indeed, real hardware rarely provides this functionality.
>
> One nice side effect of having this feature is that the explicit stolen
> time counter (or available time counter) can be dropped entirely from
> the interface, since its value can be inferred from the real time
> counter and available time counter (or stolen time counter).

I don't understand the last paragraph here. It's not true that, for
example,

    available_time = real_time - stolen_time

right?

-- Keir
On 23 Feb 2006, at 08:48, Keir Fraser wrote:

>> One nice side effect of having this feature is that the explicit
>> stolen time counter (or available time counter) can be dropped
>> entirely from the interface, since its value can be inferred from the
>> real time counter and available time counter (or stolen time counter).
>
> I don't understand the last paragraph here. It's not true that, for
> example,
>     available_time = real_time - stolen_time
> right?

Ah, okay, I see that in fact it is. :-)

Why not just have a halted_time instead? I think that's what we'd go for
in Xen.

-- Keir
On Thu, 23 Feb 2006, Keir Fraser wrote:

> I don't understand the last paragraph here. It's not true that, for
> example,
>     available_time = real_time - stolen_time
> right?

I think that is true. Not sure why it wouldn't be...

--
All Rights Reversed
Keir Fraser wrote:
> On 22 Feb 2006, at 23:58, Dan Hecht wrote:
>
>> To solve this, it may be best to have the hypervisor interface expose
>> per-vcpu stolen time directly, rather than vcpu_time. Then the guest
>> does not need to try to guess whether to charge (system_time -
>> vcpu_time) against idle or steal.
>
> Yes, the distinction between stolen and available time does make sense
> (although I'm not sure 'available' is a great name)

The term "available" came from looking at it from the perspective of the
vcpu, rather than the hypervisor. To the vcpu, the time that it's
running or halted is, in a sense, "available" to it (even though, as an
optimization, the hypervisor might use the pcpu to do something else
while the vcpu is halted). But any time the hypervisor forces the vcpu
to wait involuntarily, the time is no longer "available" to it, but
stolen.

Said another way: on native hardware, stolen time is zero. All time is
"available" to the OS. Though it might choose to halt for some of this
time, the time is still "available".

> otherwise you can't
> account for wakeup latencies. account_steal_time() would need to be
> modified in Linux, though, as we would not need its dodgy heuristic for
> deciding whether to account to stolen time or iowait/idle.

Exactly. We slightly refactor the account_steal_time() interface so that
there is a variant that bypasses the heuristic.

Dan
Keir Fraser wrote:
> On 23 Feb 2006, at 08:48, Keir Fraser wrote:
>
>>> One nice side effect of having this feature is that the explicit
>>> stolen time counter (or available time counter) can be dropped
>>> entirely from the interface, since its value can be inferred from the
>>> real time counter and available time counter (or stolen time
>>> counter).
>>
>> I don't understand the last paragraph here. It's not true that, for
>> example,
>>     available_time = real_time - stolen_time
>> right?
>
> Ah, okay, I see that in fact it is. :-)

Yeah, by definition, available_time is (real_time - stolen_time).

> Why not just have a halted_time instead? I think that's what we'd go
> for in Xen.

I assume you mean have halted_time in addition to vcpu_time, since you'd
still need vcpu_time to determine stolen_time for the case where a
running vcpu is made to wait involuntarily. Essentially, by adding
halted_time, the Xen and VMI interfaces would be very similar in this
regard. We'd have:

    xen_system_time                 <==> vmi_real_time
    xen_vcpu_time + xen_halted_time <==> vmi_available_time
    xen_stolen_time                 <==> vmi_stolen_time

The reason the VMI does not further partition vmi_available_time into
vcpu_time and halted_time is that the guest is able to do this
partitioning correctly itself, if it chooses to. It can be done with:

    halt_start = vmi_available_time counter;
    Halt;
    /* ...when the vcpu starts running again: */
    halt_end = vmi_available_time counter;

We know that during this interval vcpu_time == 0, so halted_time ==
(halt_end - halt_start). And, when executing outside this region, any
vmi_available_time that passes is vcpu_time.

So, rather than potentially complicating the interface, the VMI leaves
the partitioning of vmi_available_time into vcpu_time and halted_time up
to the guest. Besides, perhaps there are other ways the guest may want
to partition vmi_available_time than into vcpu_time/halted_time, so why
not leave this up to the guest OS?

Also, unless halted_time/vcpu_time is defined very carefully and
precisely, having it as part of the interface can become confusing when
the hypervisor wants to implement "halt" using a busy wait, or when the
paravirtualized kernel is run on native hardware. In these cases, the
vcpu is still hogging a pcpu, so it might be unclear whether to consider
that time vcpu_time or halted_time.

If vcpu_time is defined as time in which a pcpu is dedicated to the vcpu
(even if the vcpu executed the "halt" interface and is busy waiting),
then halted_time would be defined as time in which no pcpu is dedicated
to the vcpu but the vcpu is not involuntarily waiting (i.e. the
remaining time that is not stolen). But why expose this hypervisor
implementation detail through the interface?

On the other hand, if vcpu_time is defined as the time in which a pcpu
is dedicated to the vcpu *and* the vcpu is not halted, then halted_time
is the time the vcpu is halted (no matter how the hypervisor implements
the halt -- a pcpu may still be dedicated to the vcpu). But in this
case, why not leave the partitioning of available_time into
vcpu_time/halted_time up to the guest OS?

I'm just trying to say that partitioning available_time into vcpu_time
and halted_time may add confusion and make the interface more
complicated without making it any more powerful.

Dan
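As a concrete sketch of that guest-side partitioning (the function names
here are illustrative, not an existing VMI or Xen API):

    static u64 halted_ns;   /* accumulated halted time, in ns */

    static void idle_halt(void)
    {
        /* Read the available time counter just before giving up the cpu. */
        u64 halt_start = vmi_read_available_time();

        halt();   /* vcpu sleeps until the next interrupt */

        /*
         * No vcpu_time passes while halted, so the entire delta in
         * available time across the halt is halted_time; available
         * time that passes outside such regions is vcpu_time.
         */
        halted_ns += vmi_read_available_time() - halt_start;
    }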
On Wed, 22 Feb 2006, Keir Fraser wrote:

> I think the calculation of delta stolen time would be clearer as:
>
>     ((system_time - prev_system_time) - (vcpu_time - prev_vcpu_time))
>         / NS_PER_TICK

The "(system_time - prev_system_time)" above is equivalent to the
delta_cpu variable. I agree on saving prev_vcpu_time, though; it makes
the code quite a bit nicer.

I hope you like this patch ;)

Signed-off-by: Rik van Riel <riel@redhat.com>

--- linux-2.6.15.i686/arch/i386/kernel/time-xen.c.steal	2006-02-17 16:44:40.000000000 -0500
+++ linux-2.6.15.i686/arch/i386/kernel/time-xen.c	2006-02-24 13:53:08.000000000 -0500
@@ -48,6 +48,7 @@
 #include <linux/mca.h>
 #include <linux/sysctl.h>
 #include <linux/percpu.h>
+#include <linux/kernel_stat.h>
 
 #include <asm/io.h>
 #include <asm/smp.h>
@@ -77,6 +78,7 @@
 #include <asm/arch_hooks.h>
 
 #include <xen/evtchn.h>
+#include <xen/interface/vcpu.h>
 
 #if defined (__i386__)
 #include <asm/i8259.h>
@@ -125,6 +127,9 @@
 static u32 shadow_tv_version;
 
 static u64 processed_system_time;   /* System time (ns) at last processing. */
 static DEFINE_PER_CPU(u64, processed_system_time);
 
+/* Keep track of how much time our vcpu used, for steal time calculation. */
+static DEFINE_PER_CPU(u64, prev_vcpu_time);
+
 /* Must be signed, as it's compared with s64 quantities which can be -ve. */
 #define NS_PER_TICK (1000000000LL/HZ)
@@ -624,7 +629,32 @@ irqreturn_t timer_interrupt(int irq, voi
 	/*
 	 * Local CPU jiffy work. No need to hold xtime_lock, and I'm not sure
 	 * if there is risk of deadlock if we do (since update_process_times
 	 * may do scheduler rebalancing work and thus acquire runqueue locks).
+	 *
+	 * If we have not run for a while, chances are this vcpu got scheduled
+	 * away. Try to estimate how much time was stolen.
 	 */
+	if (delta_cpu > (s64)(2 * NS_PER_TICK)) {
+		dom0_getvcpuinfo_t vcpu = { 0, };
+		s64 steal;
+		u64 dvcpu;
+
+		if (HYPERVISOR_vcpu_op(VCPUOP_cpu_info, cpu, &vcpu) == 0) {
+			dvcpu = vcpu.cpu_time - per_cpu(prev_vcpu_time, cpu);
+			per_cpu(prev_vcpu_time, cpu) = vcpu.cpu_time;
+			steal = delta_cpu - (s64)dvcpu;
+
+			if (steal > 0) {
+				/* do_div modifies the variable in place. */
+				do_div(steal, NS_PER_TICK);
+
+				delta_cpu -= steal * NS_PER_TICK;
+				per_cpu(processed_system_time, cpu) +=
+					steal * NS_PER_TICK;
+				account_steal_time(current, (cputime_t)steal);
+			}
+		}
+	}
+
 	while (delta_cpu >= NS_PER_TICK) {
 		delta_cpu -= NS_PER_TICK;
 		per_cpu(processed_system_time, cpu) += NS_PER_TICK;
--- linux-2.6.15.i686/include/xen/interface/vcpu.h.steal	2006-02-17 16:14:17.000000000 -0500
+++ linux-2.6.15.i686/include/xen/interface/vcpu.h	2006-02-17 16:14:52.000000000 -0500
@@ -51,6 +51,14 @@
 /* Returns 1 if the given VCPU is up. */
 #define VCPUOP_is_up                3
 
+/*
+ * Get information on how much CPU time this VCPU has used, etc...
+ *
+ * @extra_arg == pointer to an empty dom0_getvcpuinfo_t, the "OUT" variables
+ * of which are filled in with scheduler info.
+ */
+#define VCPUOP_cpu_info             4
+
 #endif /* __XEN_PUBLIC_VCPU_H__ */
 
 /*