Mike D. Day
2007-Apr-12 15:16 UTC
[Xen-devel] credit scheduler error rates as reported by HP and UCSD
I've been looking at the credit scheduler in light of the paper "Resource Allocation Challenges in Virtual Machine Based IT Environments" (http://www.hpl.hp.com/techreports/2007/HPL-2007-25.pdf). I've got an observation and three questions.

My first observation is that the credit scheduler will select a vcpu that has exceeded its credit when there is no other work to be done on any of the other physical cpus in the system. You can verify this by looking at the last couple of lines of the function csched_load_balance in xen/common/sched_credit.c:

    /* Failed to find more important work elsewhere... */
    __runq_remove(snext);
    return snext;

where snext is the vcpu that is over its credit for the current time slice. So now a question: is this the expected or desired behaviour of the credit scheduler? I would assume so. Why idle a vcpu when there is no contention for resources and that vcpu has work to do?

In light of the paper, with very low allocation targets for vcpus, it is not surprising that the positive allocation errors can be quite large. It is also not surprising that the errors (and the error distribution) decrease with larger allocation targets.

None of this explains the negative allocation errors, where the vcpus received less than their pcpu allotments. I speculate that a couple of circumstances may contribute to them:

Very low weights attached to domains will cause the credit scheduler to attempt to pause vcpus almost every accounting cycle, so vcpus may not get as many opportunities to run as they otherwise would. If the ALERT measurement method is different, or has a different interval, from the credit scheduler's 10ms tick and 30ms accounting cycle, negative errors may appear from ALERT's point of view.

I/O activity: if ALERT performs I/O, the test, even though it is "cpu intensive", may cause the domU to block on dom0 frequently, meaning it will idle more, especially if dom0 has a low credit allocation.
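The tail of that function can be sketched in miniature. This is a hedged paraphrase, not the literal Xen source: the struct, field, and parameter names below are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified paraphrase of the csched_load_balance() fallback described
 * above -- names and types are invented, not copied from Xen. */
struct vcpu_entry {
    int credit;   /* remaining credit; negative once the vcpu is "over" */
};

/* If no peer pcpu offered more important work, run snext anyway,
 * even though it may already have exceeded its credit. */
static struct vcpu_entry *
load_balance_fallback(struct vcpu_entry *snext, int found_work_elsewhere)
{
    if (found_work_elsewhere)
        return NULL;  /* the caller would pull the remote work instead */

    /* Failed to find more important work elsewhere... */
    return snext;     /* over-credit or not, snext runs */
}
```

The point is simply that nothing on the fallback path inspects snext's credit, so on an otherwise idle system an over-credit vcpu still runs, which is consistent with the positive allocation errors in the paper.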
Questions: how does ALERT measure actual cpu allocation? Using XenMon? How does ALERT exercise the domain? The paper didn't mention the actual system calls and hypercalls the domains are making when running ALERT.

thanks,
Mike

--
Mike D. Day
Virtualization Architect and Sr. Technical Staff Member, IBM LTC
Cell: 919 412-3900
ST: mdday@us.ibm.com | AIM: ncmikeday | Yahoo IM: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Diwaker Gupta
2007-Apr-12 16:09 UTC
Re: [Xen-devel] credit scheduler error rates as reported by HP and UCSD
> Questions: how does ALERT measure actual cpu allocation? Using XenMon?
> How does ALERT exercise the domain? The paper didn't mention the
> actual system calls and hypercalls the domains are making when running
> ALERT.

Yes, we use XenMon. We also have xentop output for correlation and validation.

I think the paper describes the workloads we use. When the VMs are running webservers, we use httperf to generate the workload. The iperf benchmark uses the iperf bandwidth measurement tools, and I believe the VMs run as iperf servers. For disk I/O, we just use a regular dd command to read a file into /dev/null.

I'm not sure how one would accurately generate the set of system calls and hypercalls generated by a given workload. And how would you use this information if you had it?

Diwaker

--
Web/Blog/Gallery: http://floatingsun.net/blog
Mike D. Day
2007-Apr-12 16:15 UTC
[Xen-devel] Re: credit scheduler error rates as reported by HP and UCSD
On 12/04/07 09:09 -0700, Diwaker Gupta wrote:

> Yes, we use XenMon. We also have xentop output for correlation and
> validation.
>
> I think the paper describes the workloads we use. When the VMs are
> running webservers, we use httperf to generate the workload. The iperf
> benchmark uses the iperf bandwidth measurement tools, and I believe the
> VMs run as iperf servers. For disk I/O, we just use a regular dd
> command to read a file into /dev/null.

For the credit scheduler ALERT tests, was some of the workload generated by httperf? Or was the domU generating a lot of I/O requests to dom0?

thanks,
Mike

--
Mike D. Day
IBM LTC
Cell: 919 412-3900
Sametime: ncmike@us.ibm.com AIM: ncmikeday Yahoo: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc
Lucy Cherkasova
2007-Apr-12 16:22 UTC
[Xen-devel] Re: credit scheduler error rates as reported by HP and UCSD
Hi Mike,

> My first observation is that the credit scheduler will select a vcpu
> that has exceeded its credit when there is no other work to be done on
> any of the other physical cpus in the system.

In the version of the paper that you read and refer to, we consciously ran the three-scheduler comparison on a 1-CPU machine: the goal was to compare the "basic" scheduler functionality. I will present a few more results for the 2-CPU case during the Xen Summit.

> In light of the paper, with very low allocation targets for vcpus, it
> is not surprising that the positive allocation errors can be quite
> large. It is also not surprising that the errors (and error
> distribution) decrease with larger allocation targets.

Because of the 1-CPU machine, the explanation of this phenomenon is different (it is not related to load balancing of vcpus), and the Credit scheduler can and should be made more precise. What our paper does not show is the original error distribution for Credit (original meaning as first released). The results that you see in the paper are with the next, significantly improved version by Emmanuel. I believe that there is still significant room for improvement.

> None of this explains the negative allocation errors, where the vcpus
> received less than their pcpu allotments. I speculate that a couple of
> circumstances may contribute to negative allocation errors:
>
> very low weights attached to domains will cause the credit scheduler
> to attempt to pause vcpus almost every accounting cycle. vcpus may
> therefore not have as many opportunities to run as frequently as
> possible.
> If the ALERT measurement method is different, or has a
> different interval, than the credit scheduler's 10ms tick and 30ms
> accounting cycle, negative errors may result in the view of ALERT.

The ALERT benchmark sets the allocation of a SINGLE domain (on a 1-CPU machine, with no other competing domains while running the benchmark) to a chosen target CPU allocation, e.g. 20%, in the non-work-conserving mode. It means that the CPU allocation is CAPPED at 20%. This single domain runs "slurp" (a tight CPU loop, 1 process) to consume the allocated CPU share.

The monitoring part of ALERT just collects the measurements from the system using both XenMon and xentop with 1-second reporting granularity. Since 1 sec is so much larger than the 30 ms slices, it should be possible to get a very accurate CPU allocation for larger CPU allocation targets. However, for a 1% CPU allocation you have an immediate error, because Credit will allocate a 30ms slice (that is 3% of 1 sec). If Credit used 10 ms slices, then the error would be (theoretically) bounded by 1%.

The expectation is that each 1 sec measurement should show 20% CPU utilization for this domain. We run ALERT for different CPU allocation targets from 1% to 90%. The reported error is the error between the targeted CPU allocation and the measured CPU allocation at 1 sec granularity.

> I/O activity: if ALERT performs I/O, the test, even though
> it is "cpu intensive", may cause the domU to block on dom0 frequently,
> meaning it will idle more, especially if dom0 has a low credit
> allocation.

There are no I/O activities; ALERT's functionality is very specific, as described above: nothing else is happening in the system.

> Questions: how does ALERT measure actual cpu allocation?
> Using XenMon?

As I've mentioned above, we have measurements from both XenMon and xentop; they are very close for these experiments.

> How does ALERT exercise the domain?

ALERT runs "slurp", a cpu-hungry loop, which will "eat" as much CPU as you allocate to it. It is a single-process application.

> The paper didn't mention the
> actual system calls and hypercalls the domains are making when running
> ALERT.

There are none of those: it is a pure user-space benchmark.

Best regards, Lucy
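Lucy's slice-quantization point above can be checked with quick arithmetic. This is a back-of-the-envelope model, not ALERT or the scheduler itself; the function name and units are invented for illustration.

```c
/* Back-of-the-envelope model of the quantization error described above:
 * over a 1000 ms measurement window, a capped domain that runs at all
 * receives at least one whole scheduler slice, so tiny targets are
 * rounded up to the slice size. Returns milliseconds granted per second. */
static int granted_ms_per_sec(int target_pct, int slice_ms)
{
    int granted_ms = target_pct * 10;   /* target% of a 1000 ms window */
    if (granted_ms < slice_ms)          /* can't grant less than one slice */
        granted_ms = slice_ms;
    return granted_ms;
}
```

With 30 ms slices, a 1% target yields 30 ms/sec, i.e. a measured 3% and an immediate +2-point error; with 10 ms slices the same target is bounded at 1%, and a 20% target (200 ms/sec) is unaffected by quantization at either granularity.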
Mike D. Day
2007-Apr-13 13:06 UTC
[Xen-devel] Re: credit scheduler error rates as reported by HP and UCSD
On 12/04/07 09:22 -0700, Lucy Cherkasova wrote:

> Because of the 1-CPU machine, the explanation of this phenomenon is different
> (it is not related to load balancing of vcpus), and the Credit scheduler
> can and should be made more precise.

No, the opposite is true. Running on a 1-cpu machine will exaggerate the over-allocation, because load balancing has no effect. Hence a vcpu that has already exceeded its allocation will be selected to run. See csched_schedule and csched_load_balance in the source file sched_credit.c.

> The ALERT benchmark sets the allocation of a SINGLE domain (on a 1-CPU
> machine, with no other competing domains while running the benchmark) to
> a chosen target CPU allocation, e.g. 20%, in the non-work-conserving
> mode. It means that the CPU allocation is CAPPED at 20%. This single
> domain runs "slurp" (a tight CPU loop, 1 process) to consume the
> allocated CPU share.

Yes, again, this will cause the credit scheduler to pause the domU very frequently, which might explain some of the under-allocation errors.

> The monitoring part of ALERT just collects the measurements from the
> system using both XenMon and xentop with 1-second reporting granularity.
> Since 1 sec is so much larger than the 30 ms slices, it should be
> possible to get a very accurate CPU allocation for larger CPU allocation
> targets.
> However, for a 1% CPU allocation you have an immediate error, because
> Credit will allocate a 30ms slice (that is 3% of 1 sec).
>
> The expectation is that each 1 sec measurement should show 20% CPU
> utilization for this domain.

It may be the case that the credit scheduler believes 1 sec measurements should show *at least* 20% CPU utilization for this domain. That's the way it is coded, afaict. A simple patch to sched_credit may be able to confirm this, if you can run your tests with a patched scheduler.

> There are none of those: it is a pure user-space benchmark.

That's impossible; there must be hypercalls to load and start the benchmark, if nothing else. In addition, there are many hypercalls to the scheduler. What you mean is that the benchmark itself does not make hypercalls.
The only reason I asked is that I wanted to know if block or net I/O was influencing the ALERT test, and it sounds as if this is not the case.

Mike

--
Mike D. Day
IBM LTC
Cell: 919 412-3900
Sametime: ncmike@us.ibm.com AIM: ncmikeday Yahoo: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc
Emmanuel Ackaouy
2007-Apr-24 15:18 UTC
[Xen-devel] Re: credit scheduler error rates as reported by HP and UCSD
I've been away from my computer for a while on holiday, and now I'm in the middle of moving, so I've not had a chance to comment on this thread until now. I apologize for reviving an old topic. I should also say I've not read Lucy and Diwaker's paper, nor have I attended the latest Xen Summit, but based on prior email conversations with them I suspect I may know enough to insert one comment here:

The credit scheduler's cap mechanism is not a reservation or allocation system. If I remember correctly, I threw it in there because Ian thought some host administrators might want to prevent customers paying for SLAs from consuming idle CPU resources when they were available. You wouldn't want them to get used to freebies and complain when they were "only" getting what they were actually paying for.

The design principle for the caps was to add as few new lines of code as possible, as it was deemed to be a secondary feature. I'm not surprised it's not very precise with small caps. I wasn't even surprised when Lucy and Diwaker found bugs in it after the scheduler was released.

Caps are enforced more or less at the accounting period of the credit scheduler, which is 30 milliseconds. Resource consumption is also calculated by looking at which VCPU is running in the 10ms clock handler. That's not a serious way to do soft real-time scheduling.

I think adding actual allocations or some other form of soft real-time guarantees to work side by side with the credit scheduler would be a neat idea. Personally I don't see the point of running experiments or writing papers to understand or show this, but if it convinces someone to do the work, then I'm certainly for it.

Emmanuel.
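The tick-sampling accounting Emmanuel describes can be sketched as follows. The schedule and helper names below are invented purely to illustrate why sampling at the 10 ms tick is imprecise.

```c
/* Sketch of tick-sampling accounting: at every 10 ms tick, the whole
 * tick's worth of consumption is charged to whichever vcpu happens to
 * be on the cpu at that instant. A vcpu that runs in short bursts can
 * therefore be charged far more (or less) than it actually used. */

#define TICK_MS 10

/* runs[t] = id of the vcpu on the cpu during millisecond t */
static int charged_ms(const int *runs, int window_ms, int vcpu)
{
    int charged = 0;
    for (int t = 0; t < window_ms; t += TICK_MS)
        if (runs[t] == vcpu)      /* sampled only at the tick... */
            charged += TICK_MS;   /* ...but billed for the whole tick */
    return charged;
}

/* ground truth: how long the vcpu actually ran */
static int actual_ms(const int *runs, int window_ms, int vcpu)
{
    int used = 0;
    for (int t = 0; t < window_ms; t++)
        used += (runs[t] == vcpu);
    return used;
}
```

If vcpu 0 runs only during the first 2 ms of each 10 ms tick, it consumes 20% of the cpu but is charged 100% of it, since it is always the one sampled at the tick boundary. Precise caps would need consumption measured at context-switch time rather than by sampling.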