NISHIGUCHI Naoki
2008-Dec-03 08:54 UTC
[Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
Hi all,

This patch implements the improvement to the credit scheduler that I spoke about at Xen Summit Tokyo. My presentation is now available at http://www.xen.org/xensummit/xensummit_fall_2008.html.

When using the Xen hypervisor in a client virtualization environment, especially with VT-d enabled and some devices passed through to a domain, I think it is necessary to reduce the time that the domain's vcpu waits for its turn on a run queue. My approach is to keep the vcpu's priority at BOOST and to switch to another vcpu at short intervals when there are several vcpus at BOOST priority.

The changes to the credit scheduler are the following:

- Improve the precision of credit
  There are three changes. The first is to subtract credit for consumed cpu time accurately. The second is to preserve the value of credit when a vcpu's credit exceeds the upper bound (currently 300). The third is to shorten the cpu time per credit (experimentally, 30000 credits per 30ms).
- Shorten the time allocated to a vcpu in BOOST priority
  The allocated time is experimentally changed from 30ms to 2ms.
- Balance credits across the vcpus of a domain
- Introduce boost credit
  Boost credit is a new kind of credit used to keep a vcpu's priority at BOOST. While the value of boost credit is 1 or more, the vcpu's priority is set to BOOST. Moreover, to avoid a fall in priority due to abrupt cpu consumption by the vcpu, an upper bound on boost credit can be set.

How to use:

This patch adds the bcredit scheduler (boost credit scheduler) as a third scheduler. To use it, add the "sched=bcredit" option to xen.gz in grub.conf. Then, to boost a domain, enable boost credit for that domain. There are two methods:

1. Using the xm command, set the upper bound of the domain's boost credit. It is specified not as a credit value but in milliseconds, and is called the max boost period.
   e.g. domain 0, max boost period 100ms:
   xm sched-bcredit -d 0 -m 100

2. Using the xm command, set the upper bound of the domain's boost credit and also set a boost ratio. The boost ratio is the ratio of one CPU that is used for distributing boost credit; boost credit corresponding to the boost ratio is distributed in place of ordinary credit. Because it is a ratio of one CPU, it is not affected by other domains.
   e.g. domain 0, max boost period 500ms, boost ratio 80 (80% of one CPU):
   xm sched-bcredit -d 0 -m 500 -r 80

Please review this patch. Any comments are appreciated.

Best regards,
Naoki Nishiguchi
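[For illustration, a rough sketch of the boost credit mechanism described above. It is not code from the patch; the priority names mirror the credit scheduler's BOOST/UNDER/OVER, but the struct fields, constants and helper functions are assumptions made for the example.]

/* Illustrative sketch only, not code from the bcredit patch. */

enum bcredit_pri { BPRI_OVER, BPRI_UNDER, BPRI_BOOST };

#define BOOST_TSLICE_MS   2    /* shortened slice while at BOOST (from the mail) */
#define NORMAL_TSLICE_MS 30    /* ordinary credit scheduler slice */

struct bcredit_vcpu {
    int credit;            /* ordinary credit */
    int boost_credit;      /* new: credit that keeps the vcpu at BOOST */
    int boost_credit_max;  /* new: upper bound, set via "max boost period" */
    enum bcredit_pri pri;
};

/* Accounting: charge boost credit for the cpu time actually consumed and
 * clamp it to the configured upper bound; while any boost credit remains,
 * the vcpu keeps (or regains) BOOST priority. */
static void bcredit_account(struct bcredit_vcpu *v, int consumed)
{
    v->boost_credit -= consumed;
    if (v->boost_credit > v->boost_credit_max)
        v->boost_credit = v->boost_credit_max;
    if (v->boost_credit >= 1)
        v->pri = BPRI_BOOST;
}

/* Time slice selection: 2ms while at BOOST, 30ms otherwise. */
static int bcredit_tslice_ms(const struct bcredit_vcpu *v)
{
    return (v->pri == BPRI_BOOST) ? BOOST_TSLICE_MS : NORMAL_TSLICE_MS;
}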
Keir Fraser
2008-Dec-03 09:16 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
On 03/12/2008 08:54, "NISHIGUCHI Naoki" <nisiguti@jp.fujitsu.com> wrote:

> Please review this patch.
> Any comments are appreciated.

Don't hack it into the existing sched_credit.c unless you are really sharing significant amounts of stuff (which it looks like you aren't?). sched_bcredit.c would be a cleaner name if there's no sharing. Is a new scheduler necessary -- could the existing credit scheduler be generalised with your boost mechanism to be suitable for both client and server?

The issue with multiple schedulers is that it's most likely the non-default will not be tested, used or maintained. The default credit scheduler gets little enough love as it is, and it's really the only sensible scheduler to choose now (SEDF is not great -- good example of a rotten non-default scheduler).

 -- Keir
George Dunlap
2008-Dec-03 12:46 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
On Wed, Dec 3, 2008 at 9:16 AM, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> Don't hack it into the existing sched_credit.c unless you are really sharing
> significant amounts of stuff (which it looks like you aren't?).
> sched_bcredit.c would be a cleaner name if there's no sharing. Is a new
> scheduler necessary -- could the existing credit scheduler be generalised
> with your boost mechanism to be suitable for both client and server?

I think we ought to be able to work this out; the functionality doesn't sound that different, and as you say, keeping two schedulers around is only an invitation to bitrot.

The more accurate credit scheduling and vcpu credit "balancing" seem like good ideas. For the other changes, it's probably worth measuring on a battery of tests to see what kinds of effects we get, especially on network throughput.

Nishiguchi-san, (I hope that's right!) as I understood from your presentation, you haven't tested this on a server workload, but you predict that the "boost" scheduling of 2ms will cause unnecessary overhead for server workloads. Is that correct?

Couldn't we avoid the overhead this way: if a vcpu has 5 or more "boost" credits, we simply set the next timer to 10ms. If the vcpu yields before then, we subtract the amount of "boost" credits actually used. If not, we subtract 5. That way we're not interrupting any more frequently than we were before.

Come to think of it: won't the effect of setting the "boost" time to 2ms be basically counteracted by giving domains boost credits? I thought the purpose of reducing the boost time was to allow other domains to run more quickly? But if a domain has more than 5 "boost" credits, it will run for a full 10ms anyway. Is that not so?

Could you test your video latency measurement with all the other optimizations, but with the "boost" time set to 10ms instead of 2ms? If it works well, it's probably worth simply merging the bulk of your changes in and testing with server workloads.

 -George
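[A minimal sketch of the accounting George proposes above, assuming one "boost" credit corresponds to roughly 2ms of cpu time. All names and constants are illustrative assumptions, not identifiers from any patch.]

#define MS_PER_BOOST_CREDIT 2

/* Decide the next-timer value when a boosted vcpu is scheduled in:
 * with 5 or more credits, let it run a normal 10ms slice instead of 2ms. */
static int boosted_timer_ms(int boost_credits)
{
    return (boost_credits >= 5) ? 10 : boost_credits * MS_PER_BOOST_CREDIT;
}

/* Charge boost credits when the vcpu is descheduled. ran_ms is how long it
 * actually ran before yielding or being preempted; never charge more than
 * the 5 credits that back the 10ms slice. */
static int charge_boost_credits(int boost_credits, int ran_ms)
{
    int used = ran_ms / MS_PER_BOOST_CREDIT;
    if (used > 5)
        used = 5;
    return boost_credits - used;
}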
NISHIGUCHI Naoki
2008-Dec-04 07:45 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
Thank you for your comment.
I'll try to make it suitable for both server and client.

Regards,
Naoki Nishiguchi
NISHIGUCHI Naoki
2008-Dec-04 07:51 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
Thank you for your suggestions.

George Dunlap wrote:
> On Wed, Dec 3, 2008 at 9:16 AM, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
>> Don't hack it into the existing sched_credit.c unless you are really sharing
>> significant amounts of stuff (which it looks like you aren't?).
>> sched_bcredit.c would be a cleaner name if there's no sharing. Is a new
>> scheduler necessary -- could the existing credit scheduler be generalised
>> with your boost mechanism to be suitable for both client and server?
>
> I think we ought to be able to work this out; the functionality
> doesn't sound that different, and as you say, keeping two schedulers
> around is only an invitation to bitrot.

I had thought that a separate scheduler for the client would be needed, because this modification could influence server workloads. In order to minimize the changes, the bcredit scheduler was implemented by wrapping the current credit scheduler, adding only the differences between the original and bcredit. But as a result, almost all of the functions ended up being written anew. Now I agree that a single scheduler is best.

> The more accurate credit scheduling and vcpu credit "balancing" seem
> like good ideas. For the other changes, it's probably worth measuring
> on a battery of tests to see what kinds of effects we get, especially
> on network throughput.

I didn't think about the battery and the performance.

> Nishiguchi-san, (I hope that's right!) as I understood from your
> presentation, you haven't tested this on a server workload, but you
> predict that the "boost" scheduling of 2ms will cause unnecessary
> overhead for server workloads. Is that correct?

Yes, you are correct. I answered that in the Q&A.

> Couldn't we avoid the overhead this way: if a vcpu has 5 or more
> "boost" credits, we simply set the next timer to 10ms. If the vcpu
> yields before then, we subtract the amount of "boost" credits actually
> used. If not, we subtract 5. That way we're not interrupting any
> more frequently than we were before.

I set the next-timer to 2ms for any vcpu holding "boost" credits, since every vcpu holding "boost" credits needs to be run equally at short intervals. If there are vcpus holding "boost" credits and the next-timer of a vcpu is set to 10ms, the other vcpus will have to wait for up to 10ms. At present, I am thinking that if the other vcpus don't have "boost" credits, we may set the next-timer to 30ms.

> Come to think of it: won't the effect of setting the "boost" time to
> 2ms be basically counteracted by giving domains boost credits? I
> thought the purpose of reducing the boost time was to allow other domains
> to run more quickly? But if a domain has more than 5 "boost" credits,
> it will run for a full 10ms anyway. Is that not so?

Suppose there are two domains given "boost" credits. One domain runs for 2ms, then the other domain runs for 2ms, then the first runs for 2ms, then the other runs for 2ms, and so on. I think this is needed so that the waiting time of both domains is the same.

> Could you test your video latency measurement with all the other
> optimizations, but with the "boost" time set to 10ms instead of 2ms? If
> it works well, it's probably worth simply merging the bulk of your
> changes in and testing with server workloads.

I tested the video latency measurement with the "boost" time set to 10ms, but regrettably it did not work well. As I mentioned above, the vcpu occasionally had to wait for 10ms.

On my patch, the "boost" time is tunable. How about making the default "boost" time 30ms and setting a shorter "boost" time only when necessary? Is that acceptable?

In order to lengthen the "boost" time as much as possible, I will think about computing the length of the next-timer for a vcpu holding "boost" credits.

I'll try to revise the patch. And thanks again.

Best regards,
Naoki Nishiguchi
George Dunlap
2008-Dec-04 12:21 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
On Thu, Dec 4, 2008 at 7:51 AM, NISHIGUCHI Naoki <nisiguti@jp.fujitsu.com> wrote:
>> The more accurate credit scheduling and vcpu credit "balancing" seem
>> like good ideas. For the other changes, it's probably worth measuring
>> on a battery of tests to see what kinds of effects we get, especially
>> on network throughput.
>
> I didn't think about the battery and the performance.

I'm sorry, I used an uncommon definition of the word "battery"; I should have been more careful. :-)

In this context, "a battery of tests" means "a combination of several different kinds of tests." I meant some disk-intensive tests, some network-intensive tests, some cpu-intensive tests, and some combination of all three. I can run some of these, and you can make sure that the "client" tests still work well. It would probably be helpful to have other people volunteer to do some testing as well, just to make sure we have our bases covered.

> I set the next-timer to 2ms for any vcpu holding "boost" credits, since every
> vcpu holding "boost" credits needs to be run equally at short intervals. If
> there are vcpus holding "boost" credits and the next-timer of a vcpu is set
> to 10ms, the other vcpus will have to wait for up to 10ms.
>
> At present, I am thinking that if the other vcpus don't have "boost" credits,
> we may set the next-timer to 30ms.

I see -- the current setup is good if there's only one "boosted" VM (per cpu) at a time; but if there are two "boosted" VMs, they're back to taking turns at 30ms. Your 2ms patch allows several latency-sensitive VMs to share the "low latency" boost. That makes sense. I agree with your suggestion: we can set the timer to 2ms only if the next waiting vcpu on the queue is also at BOOST.

> I tested the video latency measurement with the "boost" time set to 10ms,
> but regrettably it did not work well. As I mentioned above, the vcpu
> occasionally had to wait for 10ms.

OK, good to know.

> On my patch, the "boost" time is tunable. How about making the default
> "boost" time 30ms and setting a shorter "boost" time only when necessary?
> Is that acceptable?

I suspect that latency-sensitive workloads such as network, especially network servers that do very little computation, may also benefit from short boost times.

> In order to lengthen the "boost" time as much as possible, I will think
> about computing the length of the next-timer for a vcpu holding "boost"
> credits.

If it makes things simpler, we could just stick with 10ms timeslices when there are no waiting vcpus with BOOST priority, and 2ms if there is a vcpu at BOOST priority. I don't think there's a particular need to give a VM only (say) 8ms instead of 10ms if there are no latency-sensitive VMs waiting.

> I'll try to revise the patch.

I suggest:
 * Modify the credit scheduler directly, rather than having an extra scheduler
 * Break down your changes into patches that make individual changes, i.e. (from your first post):
   + A patch to subtract consumed credit accurately
   + A patch to preserve the value of cpu credit when the vcpu is over the upper bound
   + A patch to shorten cpu time per credit
   + A patch to balance credits across the vcpus of a domain
   + A patch to introduce BOOST credit (both Xen and tool components)
   + A patch to shorten the allocated time in BOOST priority if the next vcpu on the runqueue is also at BOOST

Then we can evaluate each change individually.

Thanks for your work!

 -George
George Dunlap
2008-Dec-04 12:37 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
On Thu, Dec 4, 2008 at 12:21 PM, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> I see -- the current setup is good if there's only one "boosted" VM
> (per cpu) at a time; but if there are two "boosted" VMs, they're back
> to taking turns at 30ms. Your 2ms patch allows several
> latency-sensitive VMs to share the "low latency" boost. That makes
> sense. I agree with your suggestion: we can set the timer to 2ms only
> if the next waiting vcpu on the queue is also at BOOST.

There was a paper earlier this year about scheduling and I/O performance:
http://www.cs.rice.edu/CS/Architecture/docs/ongaro-vee08.pdf

One of the things he noted was that if a driver domain is accepting network packets for multiple VMs, we sometimes get the following pattern:
 * The driver domain wakes up, starts processing packets. Because it's in "over", it doesn't get boosted.
 * It passes a packet to VM 1, waking it up. VM 1 runs in "boost", preempting the (now lower-priority) driver domain.
 * Other packets (possibly even for VM 1) sit in the driver domain's queue, waiting for it to get cpu time.

Their tests, for 3 networking guests and 3 cpu-intensive guests, showed a 40% degradation in performance due to this problem. While we're thinking about the scheduler, it might be worth seeing if we can solve this.

 -George
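[For reference, the pattern described above comes from the credit scheduler's wake-up path: as the mail says, a waking vcpu is only promoted to BOOST if it still has credit (is at "under"), so a driver domain that has burned its credit sits at "over" and never gets the boost. A simplified paraphrase, with names and values chosen for illustration rather than taken from sched_credit.c:]

enum pri { PRI_OVER = -1, PRI_UNDER = 0, PRI_BOOST = 1 };

struct vcpu_sched {
    enum pri pri;
};

static void credit_vcpu_wake(struct vcpu_sched *svc)
{
    /* Only vcpus that still have credit (UNDER) are promoted to BOOST on
     * wake-up. A busy driver domain that has fallen to OVER keeps its low
     * priority, so freshly-boosted guests preempt it and its packet queue
     * backs up -- the pattern described in the paper above. */
    if (svc->pri == PRI_UNDER)
        svc->pri = PRI_BOOST;

    /* ...then insert on the runqueue and tickle an idle/lower-priority cpu. */
}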
NISHIGUCHI Naoki
2008-Dec-05 02:47 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
Hi,

Thank you for your comments and suggestions.

George Dunlap wrote:
>> I didn't think about the battery and the performance.
>
> I'm sorry, I used an uncommon definition of the word "battery"; I
> should have been more careful. :-)
>
> In this context, "a battery of tests" means "a combination of several
> different kinds of tests." I meant some disk-intensive tests, some
> network-intensive tests, some cpu-intensive tests, and some
> combination of all three. I can run some of these, and you can make
> sure that the "client" tests still work well. It would probably be
> helpful to have other people volunteer to do some testing as well,
> just to make sure we have our bases covered.

Oh, I misread the word "battery". I understand now what "a battery of tests" means.

By the way, what tests do you actually run? I am not familiar with these tests.

>> I set the next-timer to 2ms for any vcpu holding "boost" credits, since every
>> vcpu holding "boost" credits needs to be run equally at short intervals. If
>> there are vcpus holding "boost" credits and the next-timer of a vcpu is set
>> to 10ms, the other vcpus will have to wait for up to 10ms.
>
>> At present, I am thinking that if the other vcpus don't have "boost" credits,
>> we may set the next-timer to 30ms.
>
> I see -- the current setup is good if there's only one "boosted" VM
> (per cpu) at a time; but if there are two "boosted" VMs, they're back
> to taking turns at 30ms. Your 2ms patch allows several
> latency-sensitive VMs to share the "low latency" boost. That makes
> sense. I agree with your suggestion: we can set the timer to 2ms only
> if the next waiting vcpu on the queue is also at BOOST.

OK. We must also consider sleeping vcpus. A sleeping vcpu is added to the queue when it wakes up. So we should set the timer to 2ms only if the next waiting vcpu on the queue, or a sleeping vcpu, is also at BOOST.

My thinking about 2ms is this: the period within which a boosted vcpu gets to run again should be 2ms. Therefore the time slice of each vcpu changes according to the number of existing boosted vcpus; in other words, we may set the timer to 2ms or less. But I think the number of such vcpus will not be very large. Is this supposition wrong? And how about a time slice of 2ms or less?

>> On my patch, the "boost" time is tunable. How about making the default
>> "boost" time 30ms and setting a shorter "boost" time only when necessary?
>> Is that acceptable?
>
> I suspect that latency-sensitive workloads such as network, especially
> network servers that do very little computation, may also benefit from
> short boost times.

I think so, too.

>> In order to lengthen the "boost" time as much as possible, I will think
>> about computing the length of the next-timer for a vcpu holding "boost"
>> credits.
>
> If it makes things simpler, we could just stick with 10ms timeslices
> when there are no waiting vcpus with BOOST priority, and 2ms if there
> is a vcpu at BOOST priority. I don't think there's a particular need to
> give a VM only (say) 8ms instead of 10ms if there are no
> latency-sensitive VMs waiting.

I agree.

>> I'll try to revise the patch.
>
> I suggest:
>  * Modify the credit scheduler directly, rather than having an extra scheduler
>  * Break down your changes into patches that make individual changes, i.e. (from your first post):
>    + A patch to subtract consumed credit accurately
>    + A patch to preserve the value of cpu credit when the vcpu is over the upper bound
>    + A patch to shorten cpu time per credit
>    + A patch to balance credits across the vcpus of a domain
>    + A patch to introduce BOOST credit (both Xen and tool components)
>    + A patch to shorten the allocated time in BOOST priority if the next vcpu on the runqueue is also at BOOST
>
> Then we can evaluate each change individually.

OK. I'll separate the individual changes out of the current patch and post each one.

Best regards,
Naoki Nishiguchi
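[A rough illustration of the "2ms period" idea above: if the goal is that every boosted vcpu runs again within about a 2ms period, the per-vcpu slice shrinks as more boosted vcpus queue up. The constants, the minimum-slice clamp and the function name are assumptions for illustration, not part of any patch.]

#define BOOST_PERIOD_US 2000   /* target re-execution period (assumed) */
#define MIN_SLICE_US     500   /* don't slice arbitrarily thin (assumed) */

static unsigned int boost_slice_us(unsigned int nr_boosted_vcpus)
{
    unsigned int slice;

    if (nr_boosted_vcpus == 0)
        return BOOST_PERIOD_US;

    /* Share the 2ms period among all vcpus currently at BOOST. */
    slice = BOOST_PERIOD_US / nr_boosted_vcpus;
    return (slice < MIN_SLICE_US) ? MIN_SLICE_US : slice;
}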
NISHIGUCHI Naoki
2008-Dec-05 03:17 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
Thanks for the information.

George Dunlap wrote:
> There was a paper earlier this year about scheduling and I/O performance:
> http://www.cs.rice.edu/CS/Architecture/docs/ongaro-vee08.pdf
>
> One of the things he noted was that if a driver domain is accepting
> network packets for multiple VMs, we sometimes get the following
> pattern:
>  * The driver domain wakes up, starts processing packets. Because it's in
>    "over", it doesn't get boosted.
>  * It passes a packet to VM 1, waking it up. VM 1 runs in "boost",
>    preempting the (now lower-priority) driver domain.
>  * Other packets (possibly even for VM 1) sit in the driver domain's
>    queue, waiting for it to get cpu time.

I haven't read the paper yet, but I think our approach is effective against this problem. However, if the driver domain consumes too much cpu time, we cannot prevent it from falling to "over" priority. Otherwise, we can keep it at "under" or "boost" priority.

> Their tests, for 3 networking guests and 3 cpu-intensive guests,
> showed a 40% degradation in performance due to this problem. While
> we're thinking about the scheduler, it might be worth seeing if we can
> solve this.

First, I'd like to read the paper.

Regards,
Naoki Nishiguchi
George Dunlap
2008-Dec-05 11:37 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
On Fri, Dec 5, 2008 at 2:47 AM, NISHIGUCHI Naoki <nisiguti@jp.fujitsu.com> wrote:
> Oh, I misread the word "battery". I understand now what "a battery of tests"
> means.
> By the way, what tests do you actually run? I am not familiar with these
> tests.

For basic workload tests, a couple are pretty handy. vConsolidate is a good test, but pretty hard to set up; I should be able to manage it with our infrastructure here, though. Other tests include:
 * kernel-build (i.e., time how long it takes to build the Linux kernel) and/or ddk-build (Windows equivalent)
 * specjbb (a cpu-intensive workload)
 * netperf (for networks)

For testing its effect on the network, the paper I mentioned has three workloads that it combines in different ways:
 * cpu (just busy spinning)
 * sustained network (netbench): throughput
 * network ping: latency

> OK. We must also consider sleeping vcpus. A sleeping vcpu is added to the
> queue when it wakes up. So we should set the timer to 2ms only if the next
> waiting vcpu on the queue, or a sleeping vcpu, is also at BOOST.
>
> My thinking about 2ms is this: the period within which a boosted vcpu gets to
> run again should be 2ms. Therefore the time slice of each vcpu changes
> according to the number of existing boosted vcpus; in other words, we may set
> the timer to 2ms or less. But I think the number of such vcpus will not be
> very large. Is this supposition wrong? And how about a time slice of 2ms or
> less?

I think I understand you to mean: if we set the timer for 10ms, and in the meantime another vcpu wakes up and is set at BOOST, then it won't get a chance to run for another 10ms. And you're suggesting that we run the scheduler at 2ms if there are any vcpus that *may* wake up and be at BOOST, just in case; and you don't think this situation will happen very often. Is that correct?

Unfortunately, in consolidated server workloads you're pretty likely to have more vcpus than physical cpus, so I think this case would come up pretty often. Furthermore, 2ms is really too short a scheduling quantum for normal use, especially for HVM domains, which have to take a vmexit/vmenter cycle to handle every interrupt. (I did some tests back when we were using the SEDF scheduler, and the scheduling alone was a 4-5% overhead for HVM domains.)

But I don't think we actually have a problem here: if a vcpu wakes up and is promoted to BOOST, won't it "tickle" the runqueues to find somewhere for it to run? At the very least the current cpu should be able to run it, or if it's already running a vcpu at BOOST, it can set its own timer to 2ms. In any case, I think handling this corner case with some extra code is preferable to running a 2ms timer any time it *might* happen.

> OK. I'll separate the individual changes out of the current patch and post
> each one.

Thanks! I'll take them for a spin today.

 -George
NISHIGUCHI Naoki
2008-Dec-08 08:37 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
George Dunlap wrote:
> For basic workload tests, a couple are pretty handy. vConsolidate is
> a good test, but pretty hard to set up; I should be able to manage it
> with our infrastructure here, though. Other tests include:
>  * kernel-build (i.e., time how long it takes to build the Linux
>    kernel) and/or ddk-build (Windows equivalent)
>  * specjbb (a cpu-intensive workload)
>  * netperf (for networks)
>
> For testing its effect on the network, the paper I mentioned has three
> workloads that it combines in different ways:
>  * cpu (just busy spinning)
>  * sustained network (netbench): throughput
>  * network ping: latency

Thanks! I'll try to prepare them.

> I think I understand you to mean: if we set the timer for 10ms, and in
> the meantime another vcpu wakes up and is set at BOOST, then it won't
> get a chance to run for another 10ms. And you're suggesting that we
> run the scheduler at 2ms if there are any vcpus that *may* wake up and
> be at BOOST, just in case; and you don't think this situation will
> happen very often. Is that correct?

That is almost correct. I had thought that we would run the scheduler at 2ms only if there are vcpus that have boost credit and are already at BOOST. But I don't think so any more.

> Unfortunately, in consolidated server workloads you're pretty likely
> to have more vcpus than physical cpus, so I think this case would come
> up pretty often. Furthermore, 2ms is really too short a scheduling
> quantum for normal use, especially for HVM domains, which have to take
> a vmexit/vmenter cycle to handle every interrupt. (I did some tests
> back when we were using the SEDF scheduler, and the scheduling alone
> was a 4-5% overhead for HVM domains.)

I see.

> But I don't think we actually have a problem here: if a vcpu wakes up
> and is promoted to BOOST, won't it "tickle" the runqueues to find
> somewhere for it to run? At the very least the current cpu should be
> able to run it, or if it's already running a vcpu at BOOST, it can set
> its own timer to 2ms. In any case, I think handling this corner case
> with some extra code is preferable to running a 2ms timer any time it
> *might* happen.

OK. I implemented it as follows:

- If the next running vcpu is at BOOST and the first vcpu on the run queue is at BOOST, set the timer to 2ms.
- If the next running vcpu is at BOOST and the first vcpu on the run queue is not at BOOST, set the timer to 10ms.
- If the next running vcpu is not at BOOST, set the timer to 30ms.
- When a vcpu wakes up, if the vcpu has boost credit, send scheduler interrupts to at least one CPU.

In my test environment, it works well. I'll post the latest patch today.

Thanks,
Naoki
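[A minimal sketch of the four rules above. The names (peek at the run-queue head, boost_credit field, the *_TSLICE constants) are illustrative assumptions, not identifiers from the actual patch.]

#include <stdbool.h>

#define TSLICE_BOOST_CONTEND_MS  2   /* boosted vcpu, boosted vcpu waiting    */
#define TSLICE_BOOST_ALONE_MS   10   /* boosted vcpu, no boosted vcpu waiting */
#define TSLICE_NORMAL_MS        30   /* not boosted                           */

enum pri { PRI_OVER, PRI_UNDER, PRI_BOOST };

struct sched_vcpu {
    enum pri pri;
    int boost_credit;
};

/* Rules 1-3: pick the next-timer value when 'next' is scheduled in, given
 * the vcpu currently at the head of the run queue (may be NULL if empty). */
static int next_timer_ms(const struct sched_vcpu *next,
                         const struct sched_vcpu *runq_head)
{
    if (next->pri != PRI_BOOST)
        return TSLICE_NORMAL_MS;
    if (runq_head != NULL && runq_head->pri == PRI_BOOST)
        return TSLICE_BOOST_CONTEND_MS;
    return TSLICE_BOOST_ALONE_MS;
}

/* Rule 4: on wake-up, a vcpu holding boost credit kicks the scheduler
 * (in Xen this would mean raising a scheduler interrupt on at least one cpu). */
static bool should_tickle_on_wake(const struct sched_vcpu *woken)
{
    return woken->boost_credit > 0;
}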
NISHIGUCHI Naoki
2008-Dec-18 02:49 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
Hi all,

In almost the same environment as the paper, I experimented with the credit scheduler (original and modified versions). The results are described below.

Unfortunately, good results were not obtained with my previous patches. I found that there were some problems in them, so I revised the patches and ran the experiments again. With the revised patches, good results were obtained. In particular, please look at the result of ex7: with the revised version, I/O bandwidth per guest grows correctly according to dom0's weight.

I'll post the revised patches later.

Thanks,
Naoki Nishiguchi

---------- results ----------
Experimental environment:
  HP dc7800 US/CT (Core2 Duo E6550 2.33GHz)
  Multi-processor: disabled
  Xen: xen 3.3.0 release
  dom0: CentOS 5.2

I used the following experiments from among the paper's experiments:
  ex3: burn x7, ping x1
  ex5: stream x7, ping x1
  ex7: stream x3, burn x3, ping x1
  ex8: stream x3, ping+burn x1, burn x3

Original credit scheduler
  ex3
    burn(%): 14 14 14 14 14 14 14
    ping(ms): 19.7 (average), 0.1 - 359
  ex5
    stream(Mbps): 144.05 141.19 137.81 137.01 137.30 138.76 142.21
    ping(ms): 8.2 (average), 7.84 - 8.63
  ex7
    stream(Mbps): 33.74 27.74 34.70
    burn(%): 28 28 28 (by guess)
    ping(ms): 238 (average), 1.78 - 485
  ex7 (xm sched-credit -d 0 -w 512)
    There was no change in the result.
  ex8
    stream(Mbps): 9.98 11.32 10.61
    ping+burn: 264.9ms (average), 20.3 - 547; 24%
    burn(%): 24 24 24

Modified version (previous patches)
  ex3
    burn(%): 14 14 14 14 14 14 14
    ping(ms): 0.17 (average), 0.136 - 0.202
  ex5
    stream(Mbps): 143.90 141.79 137.15 138.43 138.37 130.33 143.36
    ping(ms): 7.2 (average), 4.85 - 8.95
  ex7
    stream(Mbps): 2.33 2.18 1.87
    burn(%): 32 32 32 (by guess)
    ping(ms): 373.7 (average), 68.0 - 589
  ex7 (xm sched-credit -d 0 -w 512)
    There was no change in the result.
  ex7 (xm sched-credit -d 0 -m 100 -r 20)
    stream(Mbps): 114.49 117.59 115.76
    burn(%): 24 24 24
    ping(ms): 1.2 (average), 0.158 - 65.1
  ex8
    stream(Mbps): 1.31 1.09 1.92
    ping+burn: 387.7ms (average), 92.6 - 676; 24% (by guess)
    burn(%): 24 24 24 (by guess)

Revised version
  ex3
    burn(%): 14 14 14 14 14 14 14
    ping(ms): 0.18 (average), 0.140 - 0.238
  ex5
    stream(Mbps): 142.57 139.03 137.50 136.77 137.61 138.95 142.63
    ping(ms): 8.2 (average), 7.86 - 8.71
  ex7
    stream(Mbps): 143.63 132.13 131.77
    burn(%): 24 24 24
    ping(ms): 32.2 (average), 1.73 - 173
  ex7 (xm sched-credit -d 0 -w 512)
    stream(Mbps): 240.06 204.85 229.23
    burn(%): 18 18 18
    ping(ms): 7.0 (average), 0.412 - 73.9
  ex7 (xm sched-credit -d 0 -m 100 -r 20)
    stream(Mbps): 139.74 134.95 135.18
    burn(%): 23 23 23
    ping(ms): 15.1 (average), 1.87 - 95.4
  ex8
    stream(Mbps): 118.15 106.71 116.37
    ping+burn: 68.8ms (average), 1.86 - 319; 19%
    burn(%): 19 19 19
----------
George Dunlap
2008-Dec-18 10:21 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
Naoki,

Thank you for your work! The results look really good.

Overall, I think the scheduler as a whole needs some design work before these can go in. No one at this point fully understands the principles on which it's supposed to run. I've been taking a close look at the unmodified scheduler (notably trying to understand the anomalies pointed out by Atsushi), and I think it's clear that there are some flaws in the logic. Before making a large change like this, I think we should do several things:
 * Try to describe exactly what the scheduler is currently doing, and why
 * If there are some inconsistencies, change them
 * Modify the description to include your proposed changes to the boost scheduler

Your changes, although proven effective, make the scheduler much more complicated. If no one understands it now, it will be even harder to understand with your changes, unless we set down some very clear documentation of how the algorithm is supposed to work. Namely, we need to document:
 * What factors different workloads need, i.e.:
   + Long enough timeslices for cpu-bound workloads to warm up the cache effectively
   + Fast responsiveness for "latency-sensitive" workloads, especially in the face of multiple latency-sensitive workloads
   + Fairness with respect to weight
 * At a high level, what we'd like to see happen
 * How individual mechanisms work:
   + Credits: when they are added / subtracted
   + Priorities: when they are changed and why
   + Preemption: when a cpu-bound process gets preempted
   + Active / passive status: when and why a vcpu is switched from one to the other

I've been intending to do this for a couple of weeks now, but I've got some other patches I need to get cleaned up and submitted first. Hopefully those will be finished by the end of the week. This is my very next priority.

Once I have the "design" document, I can describe your changes in reference to it, and we can discuss them at a design level.

I have a couple of specific comments on your patches that I'll put inline in other e-mails.

Thank you for your work, and your patience.

 -George
George Dunlap
2009-Jan-21 10:35 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
Naoki,

I'm working on revising the scheduler right now, so it's probably best if you hold off on patches for a little while.

I'm also trying to understand the minimum that your client workloads actually need to run well. There were components of the "boost" patch series that helped your workload:
 (a) minimum cpu time
 (b) shortened time slices (2ms)
 (c) "boosted" priority for multimedia domains

Is it possible that having (a) and (b), possibly with some other combinations, could work well without adding (c)?

At any rate, I'm going to start with a revised system that has a minimum cpu time, but no "high priority", and see if we can get things to work OK without it.

Thanks for your work, BTW -- the scheduler has needed some attention for a long time, but I don't think it would have gotten it if you hadn't introduced these patches.

Peace,
 -George

On Wed, Jan 21, 2009 at 3:00 AM, NISHIGUCHI Naoki <nisiguti@jp.fujitsu.com> wrote:
> Hi George,
>
> George Dunlap wrote:
>>
>> Sorry, didn't finish my thoughts before sending...
>>
>>> The original meaning of the "boost" priority was a priority given to
>>> domains when waking up, so that latency-sensitive workloads could
>>> achieve low latency when competing with cpu-intensive workloads, while
>>> maintaining weight. I think this meaning of "boost" (and the
>>> mechanism) is still important, especially for server-style workloads.
>>
>> ...so, I think we need to maintain the old "boost" mechanism (or
>> something like it), and come up with a new name for this "priority cpu
>> time" feature.
>
> I believe that the old "boost" mechanism remains after applying my patches.
> But now I think that the "priority cpu time" feature needs a new name, as
> you said.
>
> Because I wanted to avoid changing the existing functionality of the credit
> scheduler while achieving continuous high priority for a domain, I decided
> to use the "boost" mechanism, especially the boost priority. In my rev2
> patches, the old "boost" mechanism and the "boost credit" I introduced were
> integrated strongly, and good results were obtained. But, as you said and
> as I wrote above, I think that the "boost" mechanism and "boost credit"
> should be separated. I'll try to achieve this by introducing a new priority
> for the "priority cpu time" feature.
>
> Regards,
> Naoki
NISHIGUCHI Naoki
2009-Jan-22 06:15 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
Hi George,

George Dunlap wrote:
> I'm working on revising the scheduler right now, so it's probably best
> if you hold off on patches for a little while.

OK. I'll wait for your work to finish.

> I'm also trying to understand the minimum that your client workloads
> actually need to run well. There were components of the "boost"
> patch series that helped your workload:
>  (a) minimum cpu time
>  (b) shortened time slices (2ms)
>  (c) "boosted" priority for multimedia domains
>
> Is it possible that having (a) and (b), possibly with some other
> combinations, could work well without adding (c)?

Yes, it is possible. I divided the rev2 "boost" patch as follows, without (c):

(1) minimum cpu time (a): boost_1.patch + boost_1_tools.patch
(2) shortened time slices (b): boost_2.patch
(3) alternative "boost" mechanism using boost credit: boost_3.patch

These patches work in the following combinations: (1), (1)+(2), (1)+(2)+(3). Please apply them in numerical order. Note that without (3), the problem described in the paper you showed me is not solved.

Are these what you want?

> At any rate, I'm going to start with a revised system that has a
> minimum cpu time, but no "high priority", and see if we can get things
> to work OK without it.
>
> Thanks for your work, BTW -- the scheduler has needed some attention
> for a long time, but I don't think it would have gotten it if you
> hadn't introduced these patches.

Thanks.

Best regards,
Naoki