Jerone Young
2005-Feb-23 23:29 UTC
[Xen-devel] Calculating real cpu usage of Xen domains correctly!
Hi all,

With the new vm-tools we are trying to get top-like capabilities working correctly. Currently we have a program, vm-list, that has some of this capability but depends on the CPU time given by the libxc calls (xc_get_dom_info & xc_domain_get_cpu_usage). These two functions give you how much time (in nanoseconds; why is this not documented?) the domain has been actively used.

Approaches:

1) CPU time % measured per domain
(The differential of CPU usage time / some differential time) x 100

    (new_cpu_time - old_cpu_time) / (new_time - old_time) x 100

This provides us with the % of time the domain had activity... but it does not give us absolute real CPU usage. Another problem here is that sometimes you will get percentages like 103% usage, because the CPU usage returned by these functions appears to be measured over slightly more than a second. So this one leads to the next:

2) Relative usage: how much of the total of the CPU times is going toward a particular domain.
(Differential of domain CPU time / total differential of all domains' CPU times)

    (new_cpu_time - old_cpu_time) / (total_new_cpu_times - total_old_cpu_times)

This one gives you a good idea of what percentage of the total active CPU usage time a domain took. Not sure this is the totally correct way of going about doing this. Anthony Liguori and I are a bit perplexed as to whether there is a better way. Does anyone know if there is a better way, and is there a way to get the real CPU usage?

--
Jerone Young
Open Virtualization
IBM Linux Technology Center
jyoung5@us.ibm.com
512-838-1157 (T/L: 678-1157)

-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
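The two formulas above can be sketched in C. This is a minimal illustration, not vm-list's code: it assumes each domain's cumulative CPU time has already been read (in nanoseconds, as xc_domain_get_cpu_usage is described as returning) together with a wall-clock timestamp taken at the same moment. The `dom_sample` struct and both helpers are hypothetical and make no libxc calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one domain: cumulative CPU time in ns. */
struct dom_sample {
    uint64_t cpu_ns;
};

/* Approach 1: percentage of the wall-clock interval the domain spent running. */
double pct_of_interval(struct dom_sample old_s, struct dom_sample new_s,
                       uint64_t old_wall_ns, uint64_t new_wall_ns)
{
    assert(new_wall_ns > old_wall_ns);
    return 100.0 * (double)(new_s.cpu_ns - old_s.cpu_ns)
           / (double)(new_wall_ns - old_wall_ns);
}

/* Approach 2: percentage of the CPU time consumed by all listed domains. */
double pct_of_total(const struct dom_sample *old_s,
                    const struct dom_sample *new_s,
                    size_t ndoms, size_t which)
{
    uint64_t total = 0;
    assert(which < ndoms);
    for (size_t i = 0; i < ndoms; i++)
        total += new_s[i].cpu_ns - old_s[i].cpu_ns;
    if (total == 0)
        return 0.0; /* nobody ran: avoid dividing by zero */
    return 100.0 * (double)(new_s[which].cpu_ns - old_s[which].cpu_ns)
           / (double)total;
}
```

For two domains that each ran 250 ms during a one-second window, pct_of_interval() gives 25% for each, while pct_of_total() gives 50% for each. If the wall-clock delta comes from a different clock, or is taken at a different moment, than the CPU-time samples, approach 1 can plausibly drift past 100%. That may be one source of the 103% readings.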
Anthony Liguori
2005-Feb-23 23:44 UTC
Re: [Xen-devel] Calculating real cpu usage of Xen domains correctly!
Jerone Young wrote:
> 1) CPU time % measured per domain
> (The differential of cpu usage time / some differential time) x 100
> (new_cpu_time - old_cpu_time) / (new_time - old_time)

So, for instance, measured over the course of a second, you might get:

Domain-0: 98%
Domain-1: 99%
Domain-2: 0%

This implies that both Domain-0 and Domain-1 are actively running, and Domain-2 is probably blocked on I/O. The typical expectation is that the sum of all the usages is going to be 100%.

> 2) Relative usage .. How much of the total of the cpu times is going
> toward a particular domain.

The idea here is to have a relative measure. So if Domain-0 used 98% of its time, and Domain-1 used 99%, then the result is something like:

Domain-0: 49%
Domain-1: 50%
Domain-2: 0%

That looks more right. However, we think that means making assumptions about the underlying scheduling algorithm (namely, that if the difference in cpu_time over a period of time is the same for two domains, those two domains have gotten the same amount of CPU time).

Any advice or recommendations would be greatly appreciated.

Regards,
Anthony Liguori
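The relative measure amounts to rescaling the per-domain percentages so they sum to 100. Here is a minimal sketch with a hypothetical renormalize() helper. Integer division truncates, which reproduces the 49%/50% figures (98*100/197 and 99*100/197). The sketch also inherits the scheduling assumption raised above: it says nothing about how much of the physical CPU each domain actually received.

```c
#include <assert.h>
#include <stddef.h>

/* Rescale per-domain "percent of interval" figures so they sum to 100.
 * Integer arithmetic truncates each result toward zero. */
void renormalize(const unsigned *pct_in, unsigned *pct_out, size_t n)
{
    unsigned sum = 0;
    assert(n == 0 || (pct_in != NULL && pct_out != NULL));
    for (size_t i = 0; i < n; i++)
        sum += pct_in[i];
    for (size_t i = 0; i < n; i++)
        pct_out[i] = sum ? pct_in[i] * 100u / sum : 0u; /* all-idle case -> 0 */
}
```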
Ian Pratt
2005-Feb-24 00:48 UTC
RE: [Xen-devel] Calculating real cpu usage of Xen domains correctly!
Rather than trying to come up with metrics like this, I'd rather have vm-top show the percentage of a physical CPU that each domain used, independent of its allocation, i.e. the total including the idle domain will be #CPUS x 100%.

Ian
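Ian's suggested view can be sketched as follows. The sketch assumes per-domain CPU-time deltas (in ns, including the idle domain) sampled over the same wall-clock interval, and both helper names are hypothetical. Each domain is expressed as a percentage of one physical CPU, so the grand total should come out close to #CPUS x 100%.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Percentage of ONE physical CPU used by a domain over an interval.
 * A domain running flat out on two CPUs would show 200%. */
double pct_of_physical_cpu(uint64_t delta_cpu_ns, uint64_t delta_wall_ns)
{
    assert(delta_wall_ns > 0);
    return 100.0 * (double)delta_cpu_ns / (double)delta_wall_ns;
}

/* Sum over all domains, including the idle domain; with complete
 * accounting this should come out at ncpus * 100. */
double total_pct(const uint64_t *delta_cpu_ns, size_t ndoms,
                 uint64_t delta_wall_ns)
{
    double sum = 0.0;
    for (size_t i = 0; i < ndoms; i++)
        sum += pct_of_physical_cpu(delta_cpu_ns[i], delta_wall_ns);
    return sum;
}
```

With two CPUs over one second, Domain-0 at 0.3 s, a busy domU at 1.0 s, and idle at 0.7 s add up to 200%. A total well above that would suggest double-counted time.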
Anthony Liguori
2005-Feb-24 01:02 UTC
Re: [Xen-devel] Calculating real cpu usage of Xen domains correctly!
Ian Pratt wrote:
> Rather than trying to come up with metrics like this, I'd rather have
> vm-top show the percentage of a physical CPU that each domain used,
> independent of its allocation.
> i.e. total including the idle domain will be #CPUS x 100%.

Right, that's exactly what we want to do. How does one get that information? We thought that xc_domain_get_cpu_usage() would provide that information, but it doesn't appear to.

Thanks,
Anthony Liguori
Andrew Theurer
2005-Feb-24 02:10 UTC
Re: [Xen-devel] Calculating real cpu usage of Xen domains correctly!
Anthony Liguori wrote:
> Ian Pratt wrote:
>> Rather than trying to come up with metrics like this, I'd rather have
>> vm-top show the percentage of a physical CPU that each domain used,
>> independent of its allocation.
>> i.e. total including the idle domain will be #CPUS x 100%.
>
> Right, that's exactly what we want to do. How does one get that
> information? We thought that xc_domain_get_cpu_usage() would provide
> that information but it doesn't appear to.

(Caution: I have not even looked at this code and am ready to insert foot in mouth.) Can you derive this from a delta of timestamps, where the timestamps correspond to when the Xen scheduler granted a domain's virtual CPU a physical CPU and when it removed it from that physical CPU?

-Andrew Theurer
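Andrew's suggestion is essentially per-switch bookkeeping. The sketch below assumes the scheduler calls one hook when a virtual CPU is placed on a physical CPU and another when it is removed. The struct and hook names are hypothetical, although the field names mirror the cpu_time and lastschd fields that come up later in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-vCPU accounting fields, modelled on the
 * cpu_time/lastschd pair discussed later in this thread. */
struct vcpu_acct {
    uint64_t cpu_time; /* cumulative ns spent on a physical CPU */
    uint64_t lastschd; /* timestamp (ns) when last put on a CPU */
};

/* Called when the scheduler gives the vCPU a physical CPU. */
void acct_switch_in(struct vcpu_acct *v, uint64_t now)
{
    v->lastschd = now;
}

/* Called when the vCPU is taken off the physical CPU: charge the slice. */
void acct_switch_out(struct vcpu_acct *v, uint64_t now)
{
    assert(now >= v->lastschd);
    v->cpu_time += now - v->lastschd;
}
```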
Ian Pratt
2005-Feb-24 02:37 UTC
RE: [Xen-devel] Calculating real cpu usage of Xen domains correctly!
> Right, that's exactly what we want to do. How does one get that
> information? We thought that xc_domain_get_cpu_usage() would provide
> that information but it doesn't appear to.

I would expect it to return the cumulative total CPU time in ns that the domain has received. If it doesn't, I'd regard it as a bug.

Separately, we should make sure we store the total cumulative time that each CPU has spent executing domains (other than idle).

Ian
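Here is a sketch of the per-CPU counter Ian describes. It assumes a fixed CPU count and a scheduler that knows whether the outgoing domain was the idle domain, and all names are hypothetical. Sampling the counter twice and dividing by the elapsed wall-clock time gives the busy percentage of that CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4 /* hypothetical fixed CPU count */

/* Hypothetical per-CPU counter: ns spent running non-idle domains. */
static uint64_t cpu_busy_ns[NR_CPUS_SKETCH];

/* Charge the slice [start, now) on 'cpu' unless the idle domain ran it. */
void charge_cpu(unsigned cpu, bool prev_is_idle, uint64_t start, uint64_t now)
{
    assert(cpu < NR_CPUS_SKETCH && now >= start);
    if (!prev_is_idle)
        cpu_busy_ns[cpu] += now - start;
}

/* Busy percentage of one CPU between two samples of its counter. */
double cpu_busy_pct(uint64_t old_busy, uint64_t new_busy,
                    uint64_t delta_wall_ns)
{
    assert(delta_wall_ns > 0);
    return 100.0 * (double)(new_busy - old_busy) / (double)delta_wall_ns;
}
```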
Anthony Liguori
2005-Feb-24 03:10 UTC
Re: [Xen-devel] Calculating real cpu usage of Xen domains correctly!
Ian Pratt wrote:
> I would expect it to return the cumulative total CPU time in ns that the
> domain has received.

I think what's happening is that cpu_time is being updated even if the domain is blocked. Looking in xen/common/schedule.c:__enter_scheduler(), cpu_time is still updated even if SCHED_OP(do_block, prev) is invoked. Honestly, I don't understand this code that well, but I made the following change:

--- schedule.c~ 2005-02-21 22:19:31.000000000 -0600
+++ schedule.c 2005-02-23 19:44:52.000000000 -0600
@@ -389,13 +389,15 @@
     {
         /* This check is needed to avoid a race condition. */
         if ( event_pending(prev) )
+        {
             clear_bit(EDF_BLOCKED, &prev->ed_flags);
-        else
+            prev->cpu_time += now - prev->lastschd;
+        } else
             SCHED_OP(do_block, prev);
+    } else {
+        prev->cpu_time += now - prev->lastschd;
     }
 
-    prev->cpu_time += now - prev->lastschd;
-
     /* get policy-specific decision on scheduling... */
     next_slice = ops.do_schedule(now);

With that change, cpu_time contained what it should. Before, if I ran a simple infinite loop in a domU (while true; do true; done), it would report dom0 and domU both using 99% of the CPU. Now it shows domU using 99% of the CPU and dom0 using 1%. This is probably the wrong way to fix this problem, but hopefully it makes the right solution obvious to someone who knows this code better.

Regards,
Anthony Liguori

Signed-off-by: Anthony Liguori
John L Griffin
2005-Feb-24 13:56 UTC
Re: [Xen-devel] Calculating real cpu usage of Xen domains correctly!
> I think what's happening is that cpu_time is being updated even if the
> domain is blocked. Looking in xen/common/schedule.c:__enter_scheduler(),
> cpu_time is still updated even if SCHED_OP(do_block, prev) is invoked.

My quick perusal of the current xeno-unstable code suggests that this SCHED_OP call is a null call. The SCHED_OP macro attempts to call the "do_block" function pointed to in the "struct scheduler sched_bvt_def" array, but this function pointer is never initialized, so it just does a NOP.

It appears that what your patch does is limit when cpu_time gets updated, so that the time only gets updated when the exec_domain is in the BLOCKED state:

    if ( test_bit(EDF_BLOCKED, &prev->ed_flags) )
        prev->cpu_time += now - prev->lastschd;
    }

But what does the BLOCKED state mean? This isn't well documented: "Block the currently-executing domain until a pertinent event occurs." The BLOCKED flag appears to be set by the "SCHEDOP_block" hypervisor call, which is triggered from the guest domains inside the xen_idle() call. Here is some code from xen_idle in process.c:

    } else if (set_timeout_timer() == 0) {
        /* NB. Blocking reenable events in a race-free manner. */
        HYPERVISOR_block();
    } else {
        local_irq_enable();
        HYPERVISOR_yield();
    }

What this seems to be saying (given that your patch works) is that cpu_time is updated when the domain relinquishes the CPU by block()ing, but cpu_time doesn't get updated if it relinquishes the CPU by yield()ing. I wonder why this works. Is anyone familiar enough with block() vs yield() to lend some insight into what's going on?

JLG
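The NULL-hook behaviour John describes can be illustrated with a cut-down ops table. This sketch assumes the macro tests the function pointer before calling it. SCHED_HOOK and struct sched_ops are hypothetical stand-ins, and Xen's actual SCHED_OP definition may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

struct dom; /* opaque for this sketch */

/* Hypothetical, cut-down scheduler ops table. */
struct sched_ops {
    void (*do_block)(struct dom *d);
    void (*wake)(struct dom *d);
};

/* Call an optional hook only if the scheduler provides it; a NULL
 * hook silently does nothing, which is the "NOP" behaviour above. */
#define SCHED_HOOK(ops, fn, arg)        \
    do {                                \
        if ((ops)->fn != NULL)          \
            (ops)->fn(arg);             \
    } while (0)

static int wake_calls;

static void count_wake(struct dom *d)
{
    (void)d;
    wake_calls++;
}

/* A scheduler that, like the one described above, leaves do_block unset. */
const struct sched_ops sketch_ops = { .do_block = NULL, .wake = count_wake };
```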
Anthony Liguori
2005-Feb-24 20:43 UTC
Re: [Xen-devel] Calculating real cpu usage of Xen domains correctly!
Hi John,

John L Griffin wrote:
> My quick perusal of the current xeno-unstable code suggests that this
> SCHED_OP call is a null call. The SCHED_OP macro attempts to call the
> "do_block" function pointed to in the "struct scheduler sched_bvt_def"
> array, but this function pointer is never initialized, so it just does a
> NOP.

Yes. I agree.

> It appears that what your patch does is limit when cpu_time gets updated,
> such that the time only gets updated when the exec_domain is in the
> BLOCKED state:

No. Currently, cpu_time always gets updated for the prev domain when __enter_scheduler() is called. My patch modifies that behavior so that it only gets updated if prev->blocked is false, or if prev->blocked is true but there are events pending for the domain. The last part may not be right.

>     if ( test_bit(EDF_BLOCKED, &prev->ed_flags) )
>         prev->cpu_time += now - prev->lastschd;
>     }

Sorry, this is my fault; my mail client badly munged the patch when I copy-pasted it. Let me show you the code:

    if ( test_bit(EDF_BLOCKED, &prev->ed_flags) )
    {
        /* This check is needed to avoid a race condition. */
        if ( event_pending(prev) )
            clear_bit(EDF_BLOCKED, &prev->ed_flags);
        else
            SCHED_OP(do_block, prev);
    }

    prev->cpu_time += now - prev->lastschd;

Was changed to:

    if ( test_bit(EDF_BLOCKED, &prev->ed_flags) )
    {
        /* This check is needed to avoid a race condition. */
        if ( event_pending(prev) )
        {
            clear_bit(EDF_BLOCKED, &prev->ed_flags);
            prev->cpu_time += now - prev->lastschd;
        } else
            SCHED_OP(do_block, prev);
    } else {
        prev->cpu_time += now - prev->lastschd;
    }

It's not pretty at all. I'm not sure whether the update after clear_bit() is necessary either.

> What this seems to be saying (in regard to your patch working) is that the
> cpu_time is updated when the domain relinquishes the CPU by block()ing,
> but cpu_time doesn't get updated if it relinquishes the CPU by yield()ing.

No, it's the opposite. Sorry, I think the whitespace munging made the patch confusing.

> I wonder why this works. Is anyone familiar with block() vs yield(), that
> could lend some insight into what's going on?

I'm quite confident this is the cause of the problem. I'm just not sure about that one cpu_time update.

Regards,
Anthony Liguori
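Anthony's description of the two versions can be written as predicates over the two flags. This is only a model of the fragments quoted above, and the function names are hypothetical. It makes explicit the one case where the patch stops charging time: prev is blocked and no event is pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Does __enter_scheduler() charge now - lastschd to prev?
 * Modelled on the "before" and "after" fragments quoted above. */
bool charges_original(bool blocked, bool event_pending)
{
    (void)blocked;
    (void)event_pending;
    return true; /* charged unconditionally */
}

bool charges_patched(bool blocked, bool event_pending)
{
    /* Skipped only when prev blocked with no event waiting. */
    return !blocked || event_pending;
}
```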
John L Griffin
2005-Feb-24 21:46 UTC
Re: [Xen-devel] Calculating real cpu usage of Xen domains correctly!
Oops, I did misread your patch; my mistake.

However, I'm concerned that we're missing something bigger. This is my understanding of what the BLOCKED flag (and the surrounding code) means:

1. The guest OS calls HYPERVISOR_block() (thus setting the BLOCKED flag) whenever it wants to yield the processor because it's waiting for an event.
2. This blocking can happen at any time, including after the guest OS has been running for quite some time.
3. The only purpose of the "event_pending(prev)" check in __enter_scheduler() is to say "whoops, an event arrived between when the guest OS blocked and right now, so I should clear the BLOCKED flag." This is so the domain can be rescheduled at the scheduler's earliest discretion (possibly immediately).

If these are true, then the original code was correct: "prev->cpu_time" should be updated during any call to the __enter_scheduler() function, regardless of the state of the BLOCKED flag.

This makes me wonder whether something is seriously misbehaving to cause the weird CPU usage totals you're seeing, such as a yield()ed or block()ed domain improperly getting rescheduled immediately, an improper modification of the prev->lastschd counter, or the "if (prev == next)" optimization [later in __enter_scheduler()] leaving out some crucial accounting, or...?

JLG
Keir Fraser
2005-Feb-24 22:02 UTC
Re: [Xen-devel] Calculating real cpu usage of Xen domains correctly!
On 24 Feb 2005, at 21:46, John L Griffin wrote:

> 3. All the "event_pending(prev)" check in __enter_scheduler() is for is to
> say "whoops, an event arrived in the time between when the guest OS
> blocked & right now, so I should clear the BLOCKED flag." This is so the
> domain can be rescheduled at the scheduler's earliest discretion (possibly
> immediately).

It's to avoid a wakeup-waiting race that could cause the domain to be descheduled for an unbounded time even though it now has work to do (event processing). But yes, we'd expect that check not to trigger very often in normal circumstances.

 -- Keir
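The wakeup-waiting race Keir mentions can be modelled, in a very simplified single-threaded form, with two flags. The names are hypothetical. The real code involves concurrent CPUs and bit operations such as test_bit() and clear_bit(). The model shows only that the scheduler re-checks for a pending event after the guest has set BLOCKED, so an event that arrived in between keeps the domain runnable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-threaded model of the two flags involved. */
struct dom_flags {
    bool blocked;       /* set when the guest calls the block hypercall */
    bool event_pending; /* set when an event is delivered to the guest */
};

/* Guest side: ask to sleep until an event arrives. */
void guest_block(struct dom_flags *d)
{
    d->blocked = true;
}

/* Event delivered after the guest blocked but before the scheduler ran. */
void deliver_event(struct dom_flags *d)
{
    d->event_pending = true;
}

/* Scheduler side, mirroring the check in __enter_scheduler():
 * returns true if the domain should really be put to sleep. */
bool should_sleep(struct dom_flags *d)
{
    if (d->blocked && d->event_pending) {
        d->blocked = false; /* work arrived in the window: stay runnable */
        return false;
    }
    return d->blocked;
}
```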
Anthony Liguori
2005-Feb-24 23:01 UTC
Re: [Xen-devel] Calculating real cpu usage of Xen domains correctly!
John L Griffin wrote:
> However, I'm concerned that we're missing something bigger. This is my
> understanding of what the BLOCKED flag (and the surrounding code) means:

You may be correct here. What leads me to believe that is the following. When I first start up domain-0, with no domain-Us running, the numbers for domain-0 seem right. Domain-0's usage jumps up to 100% after I create a domain-U. Once I've reached this point, it pretty much stays that way.

It makes me think something is triggering this behavior.

> Which makes me wonder if something is seriously misbehaving to cause the
> weird CPU usage totals you're seeing, like a yield()ed or block()ed

Do you (or anyone else, for that matter) have any ideas on how to approach this? I'm afraid the impact of putting printk's in there would be too great. How does one typically debug scheduler issues?

I'm willing to spend some cycles looking into this.

Regards,
Anthony Liguori
Matt Ayres
2005-Feb-25 02:25 UTC
Re: [Xen-devel] Calculating real cpu usage of Xen domains correctly!
On Wed, 2005-02-23 at 19:02 -0600, Anthony Liguori wrote:
> Ian Pratt wrote:
> > Rather than trying to come up with metrics like this, I'd rather have
> > vm-top show the percentage of a physical CPU that each domain used,
> > independent of its allocation.
> > i.e. total including the idle domain will be #CPUS x 100%.
>
> Right, that's exactly what we want to do. How does one get that
> information? We thought that xc_domain_get_cpu_usage() would provide
> that information but it doesn't appear to.

I lost my xen-devel mailbox recently, so I don't have a history to search, but I remember asking about this before, and someone with an hp.com e-mail address said they had already completed tools like this and eventually wanted to push them into the Xen tree. It might behoove you to search the archives for who this person was.
Anthony Liguori
2005-Feb-25 02:38 UTC
Re: [Xen-devel] Calculating real cpu usage of Xen domains correctly!
Matt Ayres wrote:
> It might behoove you to search the archives for who this person was.

It was Rob Gardner. Rob hasn't made the code available yet. Perhaps he wants to now :-)

This is a very simple utility. The CPU usage reported by libxc definitely seems to be wrong, though. That needs to be fixed. It may have broken in xen-unstable and not in xen-2.0.x, assuming Rob did his work based on 2.0.x.

Regards,

--
Anthony Liguori
anthony@codemonkey.ws
John L Griffin
2005-Feb-25 03:53 UTC
Re: [Xen-devel] Calculating real cpu usage of Xen domains correctly!
(Augh, sourceforge mail is driving me nuts; it lost an email exchange between Anthony and me earlier and seems to have lost the included message below; who knows what other valuable messages were lost today? XenSource: get the new mail reflector going!)

Anthony, here are some ideas:

- Change the default scheduler in schedule.c to rrobin and/or atropos, to see if they exhibit the same problem.
- Create a new global variable "cputime_total" and keep track of the sum of all time intervals to make sure the value is sane:

      prev->cpu_time += now - prev->lastschd;
      cputime_total += now - prev->lastschd;

- Lie about the value of "now - prev->lastschd": make it a constant 1ms per invocation of __enter_scheduler(), and use that to count the number of times each domain gets scheduled.
- Make "lastschd" a global variable, to test whether the "prev" structure is getting overwritten somehow.
- Make the return type of get_s_time() and its child calls [in time.c] volatile, to make sure the return value of NOW() isn't getting unnecessarily cached.
- In lieu of printk()s on each scheduler entry, you could allocate a few pages of memory, use a signal to fill them up with timestamps and the results of the ops.do_schedule(now) call during your experiment, then printk() the pages out postmortem.

JLG

Anthony Liguori wrote:
> John L Griffin wrote:
> > However, I'm concerned that we're missing something bigger. This is my
> > understanding of what the BLOCKED flag (and the surrounding code) means:
>
> You may be correct here. The thing that leads me to believe that is the
> following. When I first start up domain-0, with no domain-U's running,
> the numbers for domain-0 seem right. Domain-0's usage jumps up to 100%
> after I create a domain-U. Once I've reached this point, it pretty much
> stays that way.
>
> It makes me think something is triggering this behavior.
>
> > Which makes me wonder if something is seriously misbehaving to cause the
> > weird CPU usage totals you're seeing, like a yield()ed or block()ed
>
> Do you have any ideas (or anyone else for that matter) on how to
> approach this? I'm afraid the impact of putting printk's in there would
> be too great. How does one typically debug scheduler issues?
>
> I'm willing to spend some cycles looking into this.
>
> Regards,
> Anthony Liguori
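John's suggestions of a running cputime_total and an in-memory log dumped post-mortem can be combined. Here is a user-space sketch that assumes one record per scheduler entry. fprintf() stands in for printk(), and every name and the buffer size are hypothetical. The hot path only stores into a fixed ring, which should cost far less than printing on every entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TRACE_SLOTS 1024 /* hypothetical buffer size */

/* One record per scheduler entry. */
struct sched_trace {
    uint64_t now;     /* timestamp passed to the scheduler */
    int prev_id;      /* domain being switched away from */
    int next_id;      /* domain chosen by do_schedule */
    uint64_t charged; /* ns added to prev's cpu_time */
};

static struct sched_trace trace_buf[TRACE_SLOTS];
static size_t trace_count;     /* total records ever written */
static uint64_t cputime_total; /* sum of all charged intervals */

/* Hot path: stores only, no printing. Old records are overwritten. */
void trace_record(uint64_t now, int prev_id, int next_id, uint64_t charged)
{
    struct sched_trace *t = &trace_buf[trace_count % TRACE_SLOTS];
    t->now = now;
    t->prev_id = prev_id;
    t->next_id = next_id;
    t->charged = charged;
    trace_count++;
    cputime_total += charged;
}

/* Post-mortem dump of the most recent records, oldest first. */
void trace_dump(FILE *out)
{
    assert(out != NULL);
    size_t n = trace_count < TRACE_SLOTS ? trace_count : TRACE_SLOTS;
    for (size_t i = trace_count - n; i < trace_count; i++) {
        const struct sched_trace *t = &trace_buf[i % TRACE_SLOTS];
        fprintf(out, "%llu: %d -> %d (+%llu ns)\n",
                (unsigned long long)t->now, t->prev_id, t->next_id,
                (unsigned long long)t->charged);
    }
}
```

Over any window, the growth of cputime_total should not exceed the number of CPUs times the elapsed time. A larger value would suggest that some intervals are being charged twice.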
Stephan Diestelhorst
2005-Feb-25 10:31 UTC
Re: [Xen-devel] Calculating real cpu usage of Xen domains correctly!
> 1. The guest OS calls HYPERVISOR_block() (thus setting the BLOCKED flag)
> whenever it wants to yield the processor because it's waiting for an
> event.
> 2. This blocking can happen anytime, including after the guest OS has
> been running for quite some time.

Both correct!

> 3. All the "event_pending(prev)" check in __enter_scheduler() is for is to
> say "whoops, an event arrived in the time between when the guest OS
> blocked & right now, so I should clear the BLOCKED flag."

This is true as well!

> This is so the
> domain can be rescheduled at the scheduler's earliest discretion (possibly
> immediately).

There is a subtle point here: when we do that check, the domain is actually still running! It will (probably) get descheduled in the scheduler's "do_schedule" function, which is invoked via ops.do_schedule a few lines later in this function.

> If these are true, then the original code was correct: "prev->cpu_time"
> should be updated during any call to the __enter_scheduler() function,
> regardless of the state of the BLOCKED flag.

That's what I think too. The domain stays scheduled until the call to do_schedule, whatever else is happening, so it should have that time accounted to it!

> Which makes me wonder if something is seriously misbehaving to cause the
> weird CPU usage totals you're seeing, like a yield()ed or block()ed
> domain improperly getting rescheduled immediately, or an improper
> modification of the prev->lastschd counter, or the "if (prev == next)"
> optimization [later in __enter_scheduler()] leaves out some crucial
> accounting, or...?

Indeed, those weird results should never occur. That is, the sum of the relative usage of the domains on one CPU (you don't have those two domains spread across two CPUs, do you?) should be <= 100%. What I mean by that is:

    delta(cpu_time_0)/delta(real_time) + ... + delta(cpu_time_n)/delta(real_time) <= 100%

assuming that all the measurements of delta(cpu_time_i) take place at the same points in time t1, t2.

BTW: Which scheduler are you using?

Cheers,
Stephan
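Stephan's inequality translates directly into a check. This sketch follows his stated assumptions: all domains are on one CPU, and all deltas are taken between the same instants t1 and t2. The slack_pct tolerance and the function name are hypothetical additions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Check sum_i delta(cpu_time_i)/delta(real_time) <= 100% for domains on a
 * single CPU, with every delta taken between the same instants t1 and t2.
 * 'slack_pct' tolerates small sampling skew (an assumption, not from Xen). */
bool single_cpu_sum_ok(const uint64_t *delta_cpu_ns, size_t ndoms,
                       uint64_t delta_real_ns, double slack_pct)
{
    assert(delta_real_ns > 0);
    double sum = 0.0;
    for (size_t i = 0; i < ndoms; i++)
        sum += 100.0 * (double)delta_cpu_ns[i] / (double)delta_real_ns;
    return sum <= 100.0 + slack_pct;
}
```

Readings like the ones reported earlier in the thread (dom0 and domU both near 99% over the same second on one CPU) fail this check, which is consistent with time being over-counted somewhere.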
Rob Gardner
2005-Feb-25 16:44 UTC
Re: [Xen-devel] Calculating real cpu usage of Xen domains correctly!
Anthony Liguori wrote:
> Matt Ayres wrote:
>> It might behoove you to search the archives for who this person was.
>
> It was Rob Gardner. Rob hasn't made the code available yet. Perhaps he
> wants to now :-)
>
> This is a very simple utility. The CPU usage reported by libxc
> definitely seems to be wrong though. That needs to be fixed. It may
> have broken in xen-unstable and not xen-2.0.x assuming Rob did his work
> based on 2.0.x.

I do want to make the code available. It's just been suffering from creeping featurism... I've kept putting off sending it out until after adding one more feature, then another... so it's constantly in a state of change. Anyway, I can send out what I've got now, which is based on xen-unstable. Is that the base most people are interested in?

Rob
Anthony Liguori
2005-Feb-25 17:10 UTC
Re: [Xen-devel] Calculating real cpu usage of Xen domains correctly!
Rob Gardner wrote:
> Anyway, I can send out what I've got now, which is based on
> xen-unstable. Is that the base most people are interested in?

Absolutely! I'd personally love to see it.

--
Anthony Liguori
anthony@codemonkey.ws
Rob Gardner
2005-Feb-25 21:35 UTC
Re: [Xen-devel] Calculating real cpu usage of Xen domains correctly! (PATCH)
Anthony Liguori wrote:
> Rob Gardner wrote:
>> Anyway, I can send out what I've got now, which is based on
>> xen-unstable. Is that the base most people are interested in?
>
> Absolutely! I'd personally love to see it.

OK, here is a patch that provides fine-grained CPU utilization reporting. Some notes:

- Part of the code runs in the hypervisor to collect data, and another part runs in dom0 userland to process and display the data.
- The code contains vestiges of old features and partially implemented new features; it is a work in progress.
- This is the first time I'm sending a patch to this list, so please be gentle with me ;-)

Feedback appreciated!

Rob Gardner