Su, Disheng
2009-Mar-20 09:18 UTC
[Xen-devel] [RFC] Add static priority into credit scheduler
Hi all,

Attached patches add static priority to the credit scheduler.

Currently, the credit scheduler has 4 kinds of priority: BOOST, UNDER, OVER and IDLE. The priority of a VM changes dynamically according to the VM's credit or I/O events, and the highest-priority VM is chosen to be scheduled in for each scheduling period. Because the priority is not fixed, which VM will be scheduled in is not known in advance. The I/O latency caused by the scheduler is well analyzed in [1] and [2]. They provide ways to reduce I/O latency while also retaining CPU and I/O fairness between VMs to some extent.

There are some cases where reducing latency is much preferable to CPU or I/O fairness, such as an RTOS guest or a VM with an assigned device (e.g. audio). The straightforward way is to set a static (fixed) highest priority for this VM, to make sure it is scheduled each time. The attached patches implement this kind of mechanism, like SCHED_RR/SCHED_FIFO in Linux.

How it works:
-- Users can set an RT priority (between 1~100) for domains. The larger the number, the higher the priority. Users can also change an RT domain into a non-RT domain by setting its priority to a value outside 1~100.
-- The scheduler always chooses the highest-priority domain to run among RT domains; nothing changes for non-RT domains. If RT domains have the same priority, the scheduler round-robins between these domains every 30ms. 30ms is the default scheduling period; it can be changed to 2ms or another value if needed.
-- The currently running non-RT vcpu is still accounted every 10ms, and all non-RT domains every 30ms, as the credit scheduler did before.

Implementation details:
-- In order to minimize the modifications to the credit scheduler, one additional rt runqueue per pcpu is added, and one rt active domain list is added in csched_private. RT vcpus are added to the rt runqueue of the pcpu they run on, and RT domains are added to the rt active domain list.
-- The scheduler first chooses the highest-priority vcpu in the rt runqueue if it is not empty, and otherwise chooses from the normal runqueue.
-- __runq_insert/__runq_remove are changed to work based on the priority of the vcpu.
-- Vcpu accounting only takes effect on the non-RT vcpus, as before. Non-RT vcpus proportionally share the rest of the cpu based on their weight. The total weight changes when RT domains are added or removed, e.g. when a non-RT domain is promoted to an RT domain, the weight of that domain is subtracted from the total weight.

How to use it:
Set the priority (y) of a VM (x) with: "xm sched-credit -d x -p y"

Test results:
I did some tests with these patches using the following configuration:
CPU: Intel Core 2 Duo E6850, Xen(1881), 7 VMs created on one physical machine A; the VMs ping each other in pairs, and the remaining VM has RT priority. Another physical machine B is connected to it directly through a 1G network card. The tests are run from B to A, e.g. pinging A from B.
Some test results are uploaded to http://wiki.xensource.com/xenwiki/DishengSu, FYI.

Summary:
These patches minimize the scheduling latency at the cost of CPU or I/O fairness. They can be used as a scheduler for RT guests in some cases (such as when an RT guest and non-RT guests co-exist). There are still a lot of areas to improve for real-time response, such as interrupt latency and the Xen I/O model [3].
Any comments are appreciated. Thanks!

---------------------
[1] Scheduling I/O in Virtual Machine Monitors
[2] Evaluation and Consideration of the Credit Scheduler for Client Virtualization
[3] A step to support real-time in virtual machine

Best Regards,
Disheng, Su
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
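The selection rule described in this message (RT runqueue first, highest priority wins, round robin among equal priorities, normal runqueue only when no RT vcpu is runnable) can be sketched as a small model. This is a minimal Python illustration, not the attached patch: the names `Pcpu`, `insert` and `pick_next` are hypothetical, a priority of 0 is assumed to stand for "non-RT", and the ordering of the normal runqueue (BOOST/UNDER/OVER) is left out entirely.

```python
from collections import deque

RT_MIN, RT_MAX = 1, 100  # RT priority range stated in the message


def is_rt(priority):
    """A domain is RT only if its priority lies in 1~100."""
    return RT_MIN <= priority <= RT_MAX


class Pcpu:
    """Hypothetical model of one physical CPU with two runqueues."""

    def __init__(self):
        self.rt_runq = []    # kept sorted by descending priority
        self.runq = deque()  # stand-in for the normal credit runqueue

    def insert(self, name, priority):
        if is_rt(priority):
            # Place the vcpu behind every entry of equal or higher priority.
            # Re-inserting a vcpu after its time slice therefore gives
            # round robin among vcpus of the same priority.
            idx = 0
            while idx < len(self.rt_runq) and self.rt_runq[idx][1] >= priority:
                idx += 1
            self.rt_runq.insert(idx, (name, priority))
        else:
            self.runq.append((name, priority))

    def pick_next(self):
        # RT vcpus always win over the normal runqueue.
        if self.rt_runq:
            return self.rt_runq.pop(0)
        if self.runq:
            return self.runq.popleft()
        return None
```

In this model a lower-priority RT vcpu, and every non-RT vcpu, runs only while no higher-priority RT vcpu is runnable, which is the starvation risk discussed later in the thread.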
George Dunlap
2009-Mar-20 12:42 UTC
Re: [Xen-devel] [RFC] Add static priority into credit scheduler
So, just to be clear: you're proposing that this mechanism *might* be useful for a VM with real-time scheduling requirements? Or are you actually working on / developing real-time operating systems, and are suggesting this in order to support real-time VMs?

I'm not an expert in real-time scheduling, but it doesn't seem to me like this will really be what a real-time system would want. (Feel free to contradict me if you know better.) It might work OK if there were only a single real-time PV guest, but in the face of competition, you'd have trouble. It seems like an actual real-time Xen scheduler would want the PV guests to submit deadlines to Xen, and then Xen could try to make a decision as to which deadlines to drop if it needs to (based on some mechanism).

The only test you've measured is networking; but networking isn't a "real-time" workload, it's a latency-sensitive workload. And you haven't measured:
* The effect on network traffic if you have several high-priority VMs competing
* The effect on network traffic of non-prioritized VMs if a high-priority VM is receiving traffic, or is misbehaving

You also haven't compared how raising a VM's priority within the current credit framework, such as giving it a very high weight, affects the numbers. Can you get similar results if you were to give the "latency-sensitive" VMs a weight of, say, 10000, and leave the other ones at 256?

Overall, I don't think fixed priorities like this are a good solution: I think it will create more problems than it solves, and I think it's actually harder to predict how a complex system will actually behave (and thus harder to configure properly).
I think the proper solution (and I'm working on a "credit2" scheduler that has these properties) is:
* Fix the credit assignment, so that VMs don't spend very much time in "over"
* Give VMs that wake up and are under their credits a fixed "boost" period (e.g., 1ms)
* Allow users to specify a cpu "reservation"; so that no matter how much work there is on the system, a VM can be guaranteed to get a minimum fixed amount of the cpu if it wants it; e.g., dom0 always gets 50% of one core if it wants it, no matter how many other VMs are on the system.

#1 and #2 have resulted in significant improvements in TCP throughput in the face of competition. I hope to publish a draft here on the list sometime soon, but I'm still working out some of the details.

-George Dunlap
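The weight experiment suggested in this reply can be put into numbers. The sketch below assumes the 7-VM setup from the original test, with every VM CPU-bound and competing for the same CPUs, and assumes that under contention the credit scheduler gives each VM a share proportional to its weight; the VM names are made up for illustration.

```python
def cpu_shares(weights):
    """Return each VM's proportional share of CPU time under full contention."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# One "latency-sensitive" VM at weight 10000, six others at the default 256.
weights = {"latency-vm": 10000}
weights.update({f"vm{i}": 256 for i in range(1, 7)})

shares = cpu_shares(weights)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

With these numbers the high-weight VM is entitled to 10000 / 11536 of the CPU, a little under 87%. A weight only fixes this long-run share; it does not by itself say how quickly the VM is scheduled in after it wakes up.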
Su, Disheng
2009-Mar-23 07:33 UTC
RE: [Xen-devel] [RFC] Add static priority into credit scheduler
George Dunlap wrote:
> So, just to be clear: you're proposing that this mechanism *might* be useful for a VM with real-time scheduling requirements? Or are actually working on / developing real-time operating systems, and are suggesting this in order to support real-time VMs?

The first one. I think it will be useful for consolidating VMs with real-time requirements:
1. Enterprise real-time: such as SUSE Linux Enterprise Real Time (http://www.novell.com/products/realtime/), Red Hat Enterprise MRG (http://www.redhat.com/mrg/) and some apps/middleware (http://www-03.ibm.com/linux/realtime.html). They are used for mission-critical applications such as trading systems, VOIP servers, etc.
2. Embedded real-time: the normal usage model is to consolidate one embedded RTOS (QNX, VxWorks etc.) and a general purpose OS (Linux/Windows) on one cpu core.

> I'm not an expert in real-time scheduling, but it doesn't seem to me like this will really be what a real-time system would want. (Feel free to contradict me if you know better.) It might work OK if there were only a single real-time PV guest, but in the face of competition, you'd have trouble. It seems like an actual real-time Xen scheduler would want the PV guests to submit deadlines to Xen, and then Xen could try to make a decision as to which deadlines to drop if it needs to (based on some mechanism).

I agree this is one way to go. But it's not suitable for all real-time OSes, e.g. enterprise real-time Linux (no period/deadline at all), whose real-time scheduling mechanism is SCHED_RR/SCHED_FIFO (based on static priority). It's quite different from a traditional embedded real-time OS. On the other hand, there are so many embedded real-time COTS OSes, and how about an unmodified RTOS:)? If you just consolidate one embedded real-time OS and one general purpose OS (I guess this is the normal usage model currently in cellphones/industrial control), how about just setting the RTOS to the highest priority?

It's true that static priority has a problem if two or more RT VMs compete with each other on one physical cpu. This case can be addressed by a real-time PV guest as you said, or in other ways. Currently I make the assumption that there is only one RT VM plus several non-RT VMs on one physical cpu core/thread. That's reasonable, especially with quad/many-core machines.

> The only test you've measured is networking; but networking isn't a "real-time" workload, it's a latency-sensitive workload.

Yes, the normal/simple "real-time" workload is "sleep_for_some_ns-and-wake_up", such as Cyclictest (http://rt.wiki.kernel.org/index.php/Cyclictest), but it depends on the hrtimer in dom0. In order to minimize the dependence on dom0, I use the assigned network card instead: send a packet from the remote machine, then measure the latency of the response from the RT VM. The improvement in the scheduler is obvious...

> And you haven't measured:
> * The effect on network traffic if you have several high-priority VMs competing

Currently it's not in my scope. And I think it's very hard to schedule multiple RT VMs on one CPU in practice.

> * The effect on network traffic of non-prioritized VMs if a high-priority VM is receiving traffic, or is misbehaving

The RT VM is dealing with critical events, so we trust it...

> You also haven't compared how raising a VM's priority within the current credit framework, such as giving it a very high weight, affects the numbers. Can you get similar results if you were to give the "latency-sensitive" VMs a weight of, say, 10000, and leave the other ones at 256?

"Weight" isn't helpful here. I tested one VM with an assigned audio device, and the noise is obvious if another VM is busy. The current credit framework has the following issues AFAIK:
-- A VM in the OVER state can't be BOOSTed
-- With multiple VMs in the BOOST state, there is no preemption
So there is no guarantee which VM is scheduled in.

> Overall, I don't think fixed priorities like this is a good solution: I think it will create more problems than it solves, and I think it's actually harder to predict how a complex system will actually behave (and thus harder to configure properly).

Static priority is useful in the simple case (one RT VM and multiple non-RT VMs on one cpu core/thread). If we trust the RT VM, then it's easy to configure. I mean an RT VM usually responds to external events in a timely manner and then sleeps. I know you may be concerned that a misbehaving RT VM may monopolise the whole cpu. Static priority is just a scheduling mechanism. It is up to the user whether to use it. Linux supports this kind of static priority also...

> I think the proper solution (and I'm working on a "credit2" scheduler that has these properites) is:
> * Fix the credit assignment, so that VMs don't spend very much time in "over"
> * Give VMs that wake up and are under their credits a fixed "boost" period (e.g., 1ms)
> * Allow users to specify a cpu "reservation"; so that no matter how much work there is on the system, a VM can be guaranteed to get a minimum fixed amount of the cpu if it wants it; e.g., dom0 always gets 50% of one core if it wants it, no matter how many other VMs are on the system.
>
> #1 and #2 have resulted in significant improvements in TCP throughput in the face of competition. I hope to publish a draft here on the list sometime soon, but I'm still working out some of the details.

Glad to know that the credit scheduler is being improved.
If it can improve the latency/real-time capability with minimal enhancement, that will be much better:)

Best Regards,
Disheng, Su
George Dunlap
2009-Mar-25 10:35 UTC
Re: [Xen-devel] [RFC] Add static priority into credit scheduler
2009/3/23 Su, Disheng <disheng.su@intel.com>:
> Glad to know that credit scheduler is being improved.
> If it can improve the latency/real time capability with minimal enhancement that will be much better:)

I'm still a bit skeptical, but I guess not as much as before. I don't think it's really the best solution, and I definitely think that people shouldn't think about this as a *good* solution to latency-sensitive workloads like video and audio. But it might be handy to have around as a "quick-fix". If nothing else, client virtualization (e.g., VMs with audio pass-through) is important, and since credit2 isn't going to make it into 3.4, something like this might be a necessary stand-in. I'd be interested to hear others' opinions.

Regarding the "sleep-for-some-time-and-wake-up" test, I hacked minios to simulate this kind of "periodic deadline work" as a part of my development. (Patch attached.) It will set a timer to go off every period, and then spin for a given number of cycles. If it isn't scheduled for a period, it "drops" work. Every second it reports the percentage of work completed. Credit1 does absolutely terribly -- completely unfair and unpredictable. A few changes in credit2 allowed the number of missed deadlines to degrade gracefully, equally across all VMs, correlated at least with the VM's weight, in a predictable manner.

If we did include something like this, we would need to make sure that we couldn't get into a state where misbehaving RT guests locked out dom0 and any driver domains necessary for dom0 network access. (We may trust the VMs not to purposely misbehave, but between bugs and operator error, there's still plenty of room for misbehavior.) I was looking through the Linux scheduler code, and they seem to have some limits on RT processes as well, presumably for the same reason.

-George
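The "periodic deadline work" measurement described in this message can be modelled in a few lines. The sketch below is not the minios patch (which is not reproduced in the thread); it is a simplified Python model in which the function name, its parameters and the idea of feeding in a per-period list of available CPU time are all assumptions made for illustration.

```python
def work_completed_percent(period_ms, work_ms, cpu_ms_per_period, report_ms=1000):
    """Model a guest that must finish `work_ms` of work in every period.

    `cpu_ms_per_period` lists how much CPU time the scheduler actually gave
    the guest in each period. A period whose work does not fit is counted
    as dropped. One percentage is reported per `report_ms` of simulated time.
    """
    periods_per_report = report_ms // period_ms
    reports = []
    completed = 0
    for index, cpu_ms in enumerate(cpu_ms_per_period, start=1):
        if cpu_ms >= work_ms:
            completed += 1          # deadline met in this period
        if index % periods_per_report == 0:
            reports.append(100.0 * completed / periods_per_report)
            completed = 0           # start counting the next second
    return reports


# One second of 10 ms periods needing 2 ms each; the guest is starved
# of CPU for the last quarter of the second.
print(work_completed_percent(10, 2, [2] * 75 + [0] * 25))
```

Running the same model for several guests with different CPU allocations gives the kind of per-VM completion percentages that can be compared for fairness and predictability.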
NISHIGUCHI Naoki
2009-Mar-27 02:39 UTC
Re: [Xen-devel] [RFC] Add static priority into credit scheduler
Hi Disheng and George,

Disheng, I'm glad to see your work.

George Dunlap wrote:
> I'd be interested to hear others' opinions.

I think that static priority is useful under some conditions. But, as George said, I also think it is harder to configure properly. And I'm anxious that it makes credits on non-RT vcpus meaningless.

I tested your patch in the following environment.

CPU: Intel Core2 Quad Q9450
Chipset: Intel 82Q35
VM: dom0 (4 vcpus), HVM (4 vcpus)
Xen: c/s 19426
HVM:
  RT priority (1)
  pass-through devices
    PCI graphic board
    Integrated devices (audio, USB controller)
  playing video

With this configuration, HVM does not work well. When HVM does not have RT priority, HVM works well.

I think we would need to consider the relationship between static priority and credit, handling of dom0 and driver domain, and so on.

Best regards,
Naoki Nishiguchi
Su, Disheng
2009-Mar-27 03:29 UTC
RE: [Xen-devel] [RFC] Add static priority into credit scheduler
NISHIGUCHI Naoki wrote:
> Hi Disheng and George,
>
> Disheng, I'm glad to see your work.
>
> George Dunlap wrote:
>> I'd be interested to hear others' opinions.
>
> I think that static priority is useful under some conditions.
> But, as George said, I also think it is harder to configure properly.
> And I'm anxious that it makes credits on non-RT vcpu meaningless.

Not exactly. Credits still make sense for non-RT guests, but these guests only proportionally share the rest of the cpu (the part not used by RT guests) based on their weight/credit. If a non-RT guest is scheduled in and out, its credit is subtracted as usual. There is the potential for RT guests to monopolise the whole cpu if we don't have other mechanisms to prevent that.

> I tested your patch in following environment.
>
> CPU: Intel Core2 Quad Q9450
> Chipset: Intel 82Q35
> VM: dom0 (4 vcpus), HVM (4 vcpus)
> Xen: c/s 19426
> HVM:
>   RT priority (1)
>   pass-through devices
>     PCI graphic board
>     Integrated devices (audio, USB controller)
>   playing video
>
> With this configuration, HVM does not work well.
> When HVM does not have RT priority, HVM works well.
>
> I think we would need to consider the relationship between static priority and credit, handling of dom0 and driver domain, and so on.

Thanks for testing the patch!
In client virtualization, with static priority, the simplest way is to set the primary guest and dom0 to the highest priority (they can have different priorities), and the other auxiliary guests as non-RT guests. I think it *should* solve the audio/video glitch, though I haven't tested it. One issue with this approach is that the other non-RT guests may not get enough CPU/throughput when the user is busy in the primary guest (e.g. playing video and copying files at the same time).
I remember hearing some audio glitches in some cases with your Bcredit before. Maybe static priority can be helpful, though in a somewhat heavy-handed way...
Could you kindly run a test with this configuration?

> Best regards,
> Naoki Nishiguchi

Best Regards,
Disheng, Su
Su, Disheng
2009-Mar-27 04:29 UTC
RE: [Xen-devel] [RFC] Add static priority into credit scheduler
> I'd be interested to hear others' opinions.

I just found that Daniel is working on EmbeddedXen to support a hard real-time OS (Xenomai on top of Xen), from the thread http://markmail.org/message/o2vyzy7ngf7oluw4
Adding Daniel in; I hope he can give more opinions on it...

Hi Daniel, we are talking about adding static priority to Xen's credit scheduler to support real-time guests. You can see the thread at http://markmail.org/message/vn62u7qdbmswms5a, in case you missed it.
Could you give us your concerns/opinions about Xen supporting a real-time OS, such as the scheduler, interrupt latency, the overhead introduced by Xen, etc.? Thanks!

Best Regards,
Disheng, Su
NISHIGUCHI Naoki
2009-Mar-27 07:03 UTC
Re: [Xen-devel] [RFC] Add static priority into credit scheduler
Su, Disheng wrote:
> Not exactly, credits still makes sense for non-RT guests, but these guests only proportionally share the rest of cpu(not used by RT guest) based on their weight/credit. If non-RT guest is scheduled in and out, its credit is subtracted as usual. It has the potential that RT guests monopolise the whole cpu, if we don't have other mechanisms to prevent that.

I understand what you mean.
I doubt whether the rest of the cpu not used by the RT guest is reflected in the credit of non-RT guests. If an RT guest might monopolize the whole cpu, I think the rest of the cpu is nothing, and therefore non-RT guests have no credit.

> Thanks for your testing with the patch!
> In client virtualization, with static priority, the simplest way is to set the primary guest and dom0 as the highest priority(can be different priority), other auxiliary guests as non-RT guest. I think it *should* solve the audio/video glitch, I don't test it though. One of issues in this way is that other non-RT guests may not have enough CPU/throughput when user is busy in primary guest(e.g playing video, copying files at the same time).
> I remembered that I heard some audio glitches in some cases with your Bcredit before. Maybe static priority can be helpful but with a somewhat heavy way...
> Could you kindly have a test with this configuration?

As you said, I set dom0 and HVM to RT priority (1) and tested. Regretfully, HVM does not work well.

The attached file is the output of "xm debug-keys r".

Best regards,
Naoki Nishiguchi
Su, Disheng
2009-Mar-27 08:05 UTC
RE: [Xen-devel] [RFC] Add static priority into credit scheduler
NISHIGUCHI Naoki wrote:
> I understand what you mean.
> I doubt whether the rest of cpu not used by RT guest is reflected to credit of non-RT guests. If RT guest might monopolize the whole cpu, I think the rest of cpu is nothing, therefore non-RT guests have no credit.

Yes, it's an issue that needs to be addressed for the client virtualization case, because the primary guest (e.g. Windows) is not a trusted guest.
When one RT guest is detected monopolizing the cpu for a while (e.g. 1-2 minutes), one can:
1. kill the RT guest...
2. lower its priority for a while to give other guests the opportunity to run, then restore its previous priority
Any ideas?

> As you sad, I set dom0 and HVM to RT priority(1) and tested.
> Regretfully, HVM does not work well.
>
> Attached file is output of "xm debug-keys r".

Oh, I forgot to mention that you need to pin the RT guest, or try the attached patch.
If you set one guest to high priority, it increases the chance that its vcpus are migrated back and forth, because the priority is fixed and higher than OVER.
In practice, don't migrate the RT guest. It's the same with Bcredit from my previous experience, isn't it?

> Best regards,
> Naoki Nishiguchi

Best Regards,
Disheng, Su
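Option 2 in this message (temporarily lowering the priority of an RT guest that monopolizes the cpu, then restoring it) can be sketched as a small state machine. This is only an illustration of the idea, not part of the attached patches: the class name `RtWatchdog`, its parameters and the use of priority 0 for "non-RT" are hypothetical, and the thresholds are placeholders.

```python
class RtWatchdog:
    """Demote an RT guest that runs without pause for too long."""

    NON_RT = 0  # any value outside 1~100 means "not an RT domain"

    def __init__(self, rt_priority, limit_s=60.0, penalty_s=5.0):
        self.rt_priority = rt_priority
        self.limit_s = limit_s      # continuous run time that triggers demotion
        self.penalty_s = penalty_s  # how long the guest stays demoted
        self.busy_s = 0.0
        self.penalty_left_s = 0.0

    def tick(self, dt_s, ran_whole_tick):
        """Advance time by dt_s and return the priority for the next tick."""
        if self.penalty_left_s > 0:
            # Guest is currently demoted: count down, then restore.
            self.penalty_left_s = max(0.0, self.penalty_left_s - dt_s)
            return self.NON_RT if self.penalty_left_s > 0 else self.rt_priority
        # Track continuous busy time; any idle tick resets the counter.
        self.busy_s = self.busy_s + dt_s if ran_whole_tick else 0.0
        if self.busy_s >= self.limit_s:
            self.busy_s = 0.0
            self.penalty_left_s = self.penalty_s
            return self.NON_RT
        return self.rt_priority
```

A well-behaved RT guest that handles an event and then sleeps never accumulates enough continuous run time to be demoted; only a guest that spins for the whole limit loses its priority, and only for the penalty interval.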
NISHIGUCHI Naoki
2009-Mar-27 10:13 UTC
Re: [Xen-devel] [RFC] Add static priority into credit scheduler
Su, Disheng wrote:
> NISHIGUCHI Naoki wrote:
>> I understand what you mean.
>> I doubt whether the rest of cpu not used by RT guest is reflected to credit of non-RT guests. If RT guest might monopolize the whole cpu, I think the rest of cpu is nothing, therefore non-RT guests have no credit.
>
> Yes, it's an issue need to be addressed for client virtualization case, due to the primary guest(e,g Windows) is not a trusted guest.
> When detecting one RT guest is monopolize cpu for a while(e.g. 1-2minute), one can:
> 1. kill the RT guest...
> 2. lower its priority for a while, give other guests the opportunity to run, then restore its previous priority
> Any ideas?

What I mean is the credit_total given to non-RT guests in the scheduler. If a PC has a 4-core cpu and an RT guest has 4 vcpus, I think credit_total would be 0, because we cannot predict the behavior of the RT guest.

> Oh, forgot to mention you need to pin the RT guest, or try the attached patch.
> If you set one guest as high priority, it increases the chance that its vcpus are migrated back and forth, because the priority is fixed and higher than OVER.

I tried your patch. The result was the same.
I also pinned the RT guest and dom0 as follows, but HVM did not work well.

        vcpu  cpu
dom0    0     0
        1     1
        2     2
        3     3
HVM     0     0
        1     1
        2     2
        3     3

It seems to me that idle cpus are not effectively used.

> Don't migrate the RT guest in practice. It's the same with Bcredit from my previous experience, isn't it?

I think that the scheduler should not migrate a vcpu needlessly, but necessary migration should be done.

Best regards,
Naoki Nishiguchi
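The credit_total concern raised here is a worst-case budget calculation, and it can be written down directly. The sketch below assumes that each physical cpu contributes a fixed number of credits per accounting period and that an RT vcpu may consume its whole pcpu; the function name and the figure of 300 credits per cpu are illustrative assumptions, not values taken from the patch.

```python
def non_rt_credit_total(ncpus, rt_vcpus, credits_per_cpu=300):
    """Worst-case credits left for non-RT guests per accounting period.

    Each RT vcpu is assumed to be able to occupy one physical cpu fully,
    so only the remaining cpus contribute credits to non-RT guests.
    """
    free_cpus = max(0, ncpus - rt_vcpus)
    return free_cpus * credits_per_cpu


# The case from the message: 4 cores and an RT guest with 4 vcpus.
print(non_rt_credit_total(ncpus=4, rt_vcpus=4))
# A single-vcpu RT guest on the same machine leaves three cores' worth.
print(non_rt_credit_total(ncpus=4, rt_vcpus=1))
```

Under this pessimistic assumption the 4-vcpu RT guest leaves nothing to distribute, which is the situation described above; how much of that budget non-RT guests get in practice depends on how often the RT guest actually sleeps.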