Hi All,

I am using Xen 3.3.2 for some of my experiments, and have been consistently observing sub-optimal results when latency-sensitive, I/O-intensive VMs and compute-bound VMs run together on a physical machine. We see latency issues even in cases with enough CPU resources for both VMs to co-exist well, and even if we give a much higher weight to the latency-sensitive VM. I suspect it is due to the way the Xen credit scheduler works. In this context, I have some questions regarding the scheduler:

1) In the sched_acct function, the credit cap is set to 300, enough to survive one time slice. But if some VCPU crosses that cap, its credit is set to 0, and it is marked inactive. Why is there no concept of a ceiling (like the floor for VCPUs going over the credit line), i.e. why is the credit not left at 300? Is there some fundamental reason for setting it to 0? I believe this results in many cases where our latency-sensitive VCPUs have to wait for perhaps a full time slice when they could run immediately. This might happen if they run with BOOST priority and get interrupted by a timer tick, which takes that priority away.

2) Why is the runq sorted by just priority (which is very coarse grained: BOOST, UNDER and OVER), and not by credit? This can result in VCPUs with higher credit getting starved for CPU if we have batch and latency-sensitive VCPUs in the system.

3) Is there some patch that makes the current credit scheduler fairer to latency-sensitive VCPUs? I see that the sched_credit2 scheduler addresses these issues, but right now it has just one global runq and no load-balancing features.

Any advice/inputs here will be extremely valuable!

Thanks in advance,
-Gaurav
George Dunlap
2010-Jul-09 12:13 UTC
Re: [Xen-devel] Questions regarding Xen Credit Scheduler
Gaurav,

I've identified a lot of the problems you mention here (you may want to see my paper and talk from XenSummit Asia 2009 [1]), but I haven't done anything to address them in credit1 because I thought it really just needed to be scrapped and started over. However, that process is taking a lot longer than I'd hoped, so I think it may make sense to do some work to patch up the current scheduler to keep it running until the new one can replace it. Are you willing to help out with some investigation / testing in this process?

Regarding specific things:

One thing you didn't catch is that credits before 4.0 are debited probabilistically, a full 10ms at a time, by the very timer tick that moves a vcpu from "inactive" to "active"; so when you make the switch from "active" to "inactive", you don't start out at 0, but at -10ms. It turns out that's not only bad for latency-sensitive processes, but it's also a security bug; so there's a patch in 4.0 (not sure whether it's been backported to 3.4) to do accurate accounting based on RDTSC reads instead of probabilistic accounting based on timer ticks.

#1: Setting the credits to 0 is part of the "reset condition" I mention in my paper. The basic idea is that accumulated credit needs to be discarded somehow. I have a patch that, instead of setting it to 0, will divide it by 2. This should balance between discarding credits and not starting too far "behind".

#2: AFAICT, the reason for choosing to sort by priority was that it allowed a simple O(n) sorting algorithm. However, the effect is that within a given priority, scheduling is round-robin. Round-robin scheduling is known to discriminate against processes that voluntarily block in favor of those that use up their entire timeslice. Diego et al [2] did some experiments with sorting by credit and found that it helped latency-sensitive workloads.

So the answer to #3 is:
* The "accurate credit" patch is in 4.0, maybe 3.4. That should help somewhat.
* I have a patch that will change the "reset condition"; I'm considering submitting it. I'd appreciate testing / feedback. (I'll send this in a separate e-mail.)
* There is no patch yet that will fix the sort-by-priority, but it should be simple and straightforward to implement. I'll support putting it in once I'm reasonably convinced that it helps and doesn't hurt too much. If you were to help out with the implementation and testing, that will happen a lot faster. :-)

Peace,
-George

Refs:
[1] http://www.xen.org/xensummit/xensummit_fall_2009.html -- search for my name under "Topics"
[2] Diego Ongaro, Alan L. Cox, Scott Rixner. "Scheduling I/O in virtual machine monitors", Proceedings of the fourth ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, March 05-07, 2008, Seattle, WA, USA.
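(For illustration only, a minimal pseudo-code sketch of the two "reset condition" policies being discussed -- the constants and helpers below are placeholders, not the actual csched_acct() code in sched_credit.c:)

    /* credit1 reset condition, roughly as described above: */
    if ( credit > CSCHED_CREDITS_PER_TSLICE )   /* the "cap" of 300 */
    {
        credit = 0;                  /* throw away everything earned */
        mark_inactive(vcpu);         /* vcpu stops earning credit */
    }

    /* proposed change: discard only half of the accumulated credit,
     * so a latency-sensitive vcpu does not restart so far "behind": */
    if ( credit > CSCHED_CREDITS_PER_TSLICE )
        credit /= 2;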
Gaurav Dhiman
2010-Jul-10 09:21 UTC
Re: [Xen-devel] Questions regarding Xen Credit Scheduler
Hi George,

Thanks for your reply. I am in the process of fixing some of these issues. This is what I have in mind:

1. __runq_insert: Insert according to credit as well as priority. The current code just looks at the priority, which is very coarse.
2. __runq_tickle: Tickle the CPU even if the new VCPU has the same priority but a higher amount of credit left. The current code just looks at the priority.
3. csched_runq_sort: Sort according to credit.
4. csched_acct: If the credit of a VCPU crosses 300, then set it to 300, not 0. I am still not sure why the VCPU is being marked as inactive. Can't I just update the credit and let it stay active?
5. csched_schedule: Always call csched_load_balance. In the csched_load_balance and csched_runq_steal functions, change the logic to grab a VCPU with higher credit. The current code just works on priority.

Do you think these ideas make sense? Am I missing something?

> Regarding specific things:
>
> One thing you didn't catch is that credits before 4.0 are debited
> probabilistically, a full 10ms at a time, by the very timer tick that
> moves a vcpu from "inactive" to "active"; so when you make the switch
> from "active" to "inactive", you don't start out at 0, but at -10ms.

Yes, I noticed this; point 4 above tries to address this. As I mentioned above, I am not sure why it is being marked inactive in the first place.

> It turns out that's not only bad for latency-sensitive processes, but
> it's also a security bug; so there's a patch in 4.0 (not sure whether
> it's been backported to 3.4) to do accurate accounting based on RDTSC
> reads instead of probabilistic accounting based on timer ticks.

Yes, I have seen the Xen 4.0 code; it does deterministic accounting by recording the amount of time a VCPU spends on the CPU.

> So the answer to #3 is:
> * The "accurate credit" patch is in 4.0, maybe 3.4. That should help somewhat.
> * I have a patch that will change the "reset condition"; I'm
> considering submitting it. I'd appreciate testing / feedback. (I'll
> send this in a separate e-mail.)

Please do send this.

> * There is no patch yet that will fix the sort-by-priority, but it
> should be simple and straightforward to implement. I'll support
> putting it in once I'm reasonably convinced that it helps and doesn't
> hurt too much. If you were to help out with the implementation and
> testing, that will happen a lot faster. :-)

I am trying to implement the ideas I mentioned above. Your feedback would be very helpful.

Thanks,
-Gaurav
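(For illustration, a rough sketch of what the credit-aware insertion proposed in point 1 above might look like -- hypothetical code, not the existing sched_credit.c implementation; type and field names only approximate it:)

    /* Keep the runq ordered by priority first and, within the same
     * priority, by remaining credit, instead of plain round-robin
     * within a priority. */
    static void runq_insert_by_credit(struct list_head *runq,
                                      struct csched_vcpu *svc)
    {
        struct list_head *iter;

        list_for_each( iter, runq )
        {
            const struct csched_vcpu *cur =
                list_entry(iter, struct csched_vcpu, runq_elem);

            if ( svc->pri > cur->pri ||
                 (svc->pri == cur->pri &&
                  atomic_read(&svc->credit) > atomic_read(&cur->credit)) )
                break;
        }

        /* list_add_tail() inserts svc just before iter, i.e. behind
         * every vcpu that should still run ahead of it. */
        list_add_tail(&svc->runq_elem, iter);
    }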
George Dunlap
2010-Jul-12 11:05 UTC
Re: [Xen-devel] Questions regarding Xen Credit Scheduler
On Sat, Jul 10, 2010 at 10:21 AM, Gaurav Dhiman <dimanuec@gmail.com> wrote:
> 1. __runq_insert: Insert according to credit as well as priority.
> Current code just looks at the priority, which is very coarse.
[snip]
> 3. csched_runq_sort: Sort according to credit.

I think these (both related to a sorted runqueue) are probably good ideas. The main thing to pay attention to is overhead: run some tests with long runqueues, and see if there's any performance degradation.

> 2. __runq_tickle: Tickle the CPU even if the new VCPU has same
> priority but higher amount of credits left. Current code just looks at
> the priority.
[snip]
> 5. csched_schedule: Always call csched_load_balance. In the
> csched_load_balance and csched_runq_steal functions, change the logic
> to grab a VCPU with higher credit. Current code just works on
> priority.

I'm much more wary of these ideas. The problem here is that doing runqueue tickling and load balancing isn't free -- IPIs can be expensive, especially if your VMs are running with hardware virtualization. In fact, with the current scheduler, you get a sort of n^2 effect, where the time the system spends doing IPIs due to load balancing grows with the square of the number of schedulable entities. In addition, frequent migration will reduce cache effectiveness and increase congestion on the memory bus.

I presume you want to do this to decrease the latency? Lee et al [1] actually found that *decreasing* the cpu migrations of their soft real-time workload led to an overall improvement in quality. The paper doesn't delve deeply into why, but it seems reasonable to conclude that although the vcpus may have been able to start their task sooner (and even that's not guaranteed -- it may have taken longer to migrate than to get to the front of the runqueue), they ended their task later, presumably due to cpu stalls on cacheline misses and so on.

I think a much better approach would be:
* To have long-term effective placement, if possible: i.e., distribute latency-sensitive vcpus
* If two latency-sensitive vcpus are sharing a cpu, do shorter time-slices.

But I think those need more research; it would be better to put that effort into the new scheduler.

> 4. csched_acct: If credit of a VCPU crosses 300, then set it to 300,
> not 0. I am still not sure why the VCPU is being marked as inactive?
> Can't I just update the credit and let it be active?

Did you read the whitepaper that I linked to, and/or watch my presentation? It has a lot of information about the logic behind the algorithm: specifically, the tendency of this "credit-like" class of algorithms toward credit divergence. Please read it and let me know if you have any questions.

The active / inactive distinction has to do with who gets credits. If you just divided credits equally among everyone, then eventually VMs that weren't using credits would gain a lot (or be capped out, as you're suggesting). Because we allow VMs to use "extra" cpu time if it's not being used, those VMs will by definition burn more credits than they earn, and will tend to go off to negative.

So what credit1 does is assume that all workloads fall into two categories: "active" VMs, which consume as much cpu as they can, and "inactive" (or "I/O-bound") VMs, which use almost no cpu. "Inactive" VMs essentially run at BOOST priority, and run whenever they want to. Then the credit for each timeslice is divided among the "active" VMs. This way the ones that are consuming cpu don't get too far behind.

The problem, of course, is that most server workloads fall in the middle: they spend a significant time processing, but also a significant time waiting for more network packets.

I looked at the idea of "capping" credit, as you say; but the steady state when I worked out the algorithms by hand was that all the VMs were at their cap all the time, which screwed up other aspects of the algorithm. Credits need to be thrown away; my proposal was to divide the credits by 2, rather than setting them to 0. This should be a good mid-way.

These things are actually really subtle. I've spent hours and hours with pencil and paper, working out different algorithms by hand, to see exactly what effect the different changes would have. I even wrote a discrete event simulator, to make the process a bit faster. (But of course, to understand why things look the way they do, you still have to trace through the algorithm manually.) If you're really keen, I can tar it up and send it to you. :-)

So in summary:
* Please do post a sort-runq-by-credit patch, preferably along with some benchmarks showing a lack of performance impact.
* Don't think increased load balancing is the right approach. Won't scale, and probably won't even make throughput faster. I wouldn't approve these without significant large-scale testing.
* Think the "reset condition" could use revising. My sense is that leaving credit at the cap isn't the best approach. If you can convince me that it works OK (including test results as well as posting graphs of consumed credit, &c), I'll consider it; but I think dividing in half will be better. I'll post a patch later today.
* Let me know if you want my hacked-up scheduler simulator to play with. :-)

Thanks again for your help,
-George

[1] Min Lee, A. S. Krishnakumar, P. Krishnan, Navjot Singh, Shalini Yajnik. "Supporting Soft Real-Time Tasks in the Xen Hypervisor", VEE 2010, Pittsburgh, PA, March 17-19, 2010.
Gaurav Dhiman
2010-Jul-16 00:41 UTC
Re: [Xen-devel] Questions regarding Xen Credit Scheduler
On Mon, Jul 12, 2010 at 4:05 AM, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>> 2. __runq_tickle: Tickle the CPU even if the new VCPU has same
>> priority but higher amount of credits left. Current code just looks at
>> the priority.
> [snip]
>> 5. csched_schedule: Always call csched_load_balance. In the
>> csched_load_balance and csched_runq_steal functions, change the logic
>> to grab a VCPU with higher credit. Current code just works on
>> priority.
>
> I'm much more wary of these ideas. The problem here is that doing
> runqueue tickling and load balancing isn't free -- IPIs can be
> expensive, especially if your VMs are running with hardware
> virtualization. [snip]
>
> I presume you want to do this to decrease the latency? Lee et al [1]
> actually found that *decreasing* the cpu migrations of their soft
> real-time workload led to an overall improvement in quality. [snip]

Thanks for this paper. It gives a very interesting analysis of what can go wrong with applications that fall in the middle (need CPU, but are latency sensitive as well). In my experiments, I see some servers like MySQL database servers fall into this category. And as expected, they do not do well with CPU-intensive jobs in the background, even if I give them the highest possible weight (65535). I guess very aggressive migration might not be a good idea, but there needs to be some way to guarantee that such apps get their fair share at the right time.

> I think a much better approach would be:
> * To have long-term effective placement, if possible: i.e., distribute
> latency-sensitive vcpus
> * If two latency-sensitive vcpus are sharing a cpu, do shorter time-slices.

These are very interesting ideas indeed.

> So what credit1 does is assume that all workloads fall into two
> categories: "active" VMs, which consume as much cpu as they can, and
> "inactive" (or "I/O-bound") VMs, which use almost no cpu. [snip]
>
> The problem, of course, is that most server workloads fall in the
> middle: they spend a significant time processing, but also a
> significant time waiting for more network packets.

This is precisely the problem we are facing.

> I looked at the idea of "capping" credit, as you say; but the
> steady state when I worked out the algorithms by hand was that all the
> VMs were at their cap all the time, which screwed up other aspects of
> the algorithm. Credits need to be thrown away; my proposal was to
> divide the credits by 2, rather than setting them to 0. This should be
> a good mid-way.

Sure, dividing by 2 could be a good middle ground. We can additionally not mark them inactive as well?

> These things are actually really subtle. I've spent hours and hours
> with pencil and paper, working out different algorithms by hand, to
> see exactly what effect the different changes would have. I even
> wrote a discrete event simulator, to make the process a bit faster.
> [snip] If you're really keen, I can tar it up and send it to you. :-)

I am just figuring out how non-trivial these apparently small problems are :-) It would be great if you could share your simulator!

I will keep you posted on my changes and tests.

Thanks,
-Gaurav
George Dunlap
2010-Jul-16 09:13 UTC
Re: [Xen-devel] Questions regarding Xen Credit Scheduler
On Fri, Jul 16, 2010 at 1:41 AM, Gaurav Dhiman <dimanuec@gmail.com> wrote:
> Sure, dividing by 2 could be a good middle ground. We can additionally
> not mark them inactive as well?

Think through the implications of your policy if we have the following situation:
* 2 "burn" VMs, one with weight 100, one with weight 200
* 10 mostly idle VMs, using 1% of the cpu each, with a weight of 100.

Think about what the ideal scheduler would do in this situation. You want the idle VMs to run whenever they want; that leaves 90% for the two "burn" VMs. We want one "burn" VM to run 30% of the time, and the other to run 60% of the time (because of the weights).

Now consider what would happen if we use the algorithm you describe. Credit1 divides all credits by weight among "active" VMs. With your modification, we're not marking any VMs "inactive", so we're dividing it among all VMs. That means each accounting period, the "idle" VMs are each getting about 7.7% of the credit (1/13), the 100-weight "burn" VM is getting 7.7% of the credit, and the 200-weight "burn" VM is getting 15.4% of the credit (2/13).

Now what happens? The "burn" VMs are guaranteed to burn more than their credits, so they're continually negative. The 200-weight VM only has 7.7% of cpu time more credit added per accounting period than the 100-weight VM, so even if we sort by credits, it's likely that the split will be 10% idle VMs / 49% 200-weight / 41% 100-weight (i.e., the 200-weight VM gets 7.7% of total cpu time more, rather than twice as much). If we don't set a "floor" for credits, then the credit of the "burn" VMs will continue to go negative into oblivion; if we do set a floor, the steady state will be for all VMs to be either at the ceiling (if they're not using their "fair share") or at the floor (if they are).

(I encourage you to work out your algorithm by hand, or set up a simulator and go over the results with a fine-tooth comb, to understand why this is the case. It's a real grind, but it will give you a really solid foundation for understanding scheduling problems. I've spent hours and hours doing just that.)

Credit1 solves this by using the "active / inactive" designation. The 100-weight VM gets 33% of the credits, the 200-weight VM gets 66% of the credits, and the idle VMs are usually in the "inactive" state, running at BOOST priority; only occasionally flipping into "active" for a short time, before flipping back to "inactive".

It's far from ideal, as you've seen, but it usually works not too badly. Changing the credits to divide by 2 (but still marking them "active") is a patch-up. But a more fundamental change in the algorithm needs to be made to avoid this; and that's what credit2 is for.

BTW, what are you using to do your analysis of the live scheduler? Xen has a tracing mechanism that I've found indispensable for understanding what the algorithm was actually doing. I've got the basic tool I use to analyze the output here:

http://xenbits.xensource.com/ext/xenalyze.hg

I don't have the patches used to analyze the scheduler stuff public (since they normally go through a lot of churn, and are interesting almost exclusively to developers), but I'll see if I can dig some of them up for you.

-George
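(To make the arithmetic in that example easy to reproduce, here is a toy, self-contained back-of-the-envelope model -- not the real csched_acct() accounting; the 41% / 49% consumption figures are simply taken from the discussion above:)

    #include <stdio.h>

    int main(void)
    {
        /* 10 mostly idle VMs (weight 100) + two "burn" VMs (100, 200). */
        double total_weight = 10 * 100 + 100 + 200;   /* = 1300 */

        double idle_share = 100.0 / total_weight;     /* ~7.7% of credit */
        double burn100    = 100.0 / total_weight;     /* ~7.7% */
        double burn200    = 200.0 / total_weight;     /* ~15.4% */

        printf("credit per period: idle %.1f%%, burn100 %.1f%%, burn200 %.1f%%\n",
               idle_share * 100, burn100 * 100, burn200 * 100);

        /* The burn VMs consume roughly the 41% / 49% of cpu predicted
         * above, so both burn more than they earn and drift further
         * negative every accounting period -- and at almost the same
         * rate, despite the 2:1 weights. */
        double c100 = 0.0, c200 = 0.0;
        for (int period = 1; period <= 5; period++) {
            c100 += burn100 - 0.41;
            c200 += burn200 - 0.49;
            printf("period %d: burn100 credit %+.2f, burn200 credit %+.2f\n",
                   period, c100, c200);
        }
        return 0;
    }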
George Dunlap
2010-Jul-16 11:04 UTC
Re: [Xen-devel] Questions regarding Xen Credit Scheduler
I've uploaded my scheduler simulator here:

http://xenbits.xensource.com/people/gdunlap/sched-sim.hg

It's a little hacky still, but I think you should be able to adapt it to your needs. There are "workloads" defined in workloads.c, and a generic scheduling interface. The hg repository includes an example "round-robin" scheduler, and versions of credit2 with different features added (to be able to compare their effects on different simulated workloads). There's a script, run.sh, which will run a series of simulations and visualize them with "ygraph".

It needs some way to specify scheduler-specific things in the workload definitions; at the moment versions 01 and 02 don't use weights, and 03 has a hard-coded list. :-) If you want to use it to experiment with tweaks to credit1, you'll have to implement that yourself.

Feel free to send patches for improvements / fixes.

-George
Zhiyuan Shao
2010-Dec-17 14:23 UTC
[Xen-devel] Does Xen-4.0 + pvops kernel still supports PV guest?
Hi all,

I installed Xen 4.0 on my Ubuntu desktop (Maverick Meerkat, 10.10) according to https://help.ubuntu.com/community/Xen, although I had to replace GRUB 2 with the old version to make everything work.

After rebooting, I wanted to create a PV guest with the old method I had used with Xen 3.4.2. However, I got the following messages, and the creation failed:

zhiyuan@zhiyuan-ThinkPad-R400:~/xen_work/images/rhel51-pv$ sudo xm cr -c 1-pv.cfg
[sudo] password for zhiyuan:
Using config file "./1-pv.cfg".
Error: (4, 'Out of memory', 'xc_dom_alloc_segment: segment ramdisk too large (0x11575 > 0x4000 - 0x1b8f pages)\n')
zhiyuan@zhiyuan-ThinkPad-R400:~/xen_work/images/rhel51-pv$

The configuration file is attached with this email. BTW, I booted the system with xen_commandline: dom0_mem=1024M sched=credit, and my box has 2GB memory and an Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz processor.

Thanks in advance!

Zhiyuan
Zhiyuan Shao
2010-Dec-17 14:25 UTC
Re: [Xen-devel] Questions regarding Xen Credit Scheduler
In your case, I think you can try Boost-Credit. Your I/O-intensive domain will benefit from that.

Best,
Zhiyuan

On 07/09/2010 05:14 AM, Gaurav Dhiman wrote:
> [snip]
Ian Campbell
2010-Dec-17 14:36 UTC
Re: [Xen-devel] Does Xen-4.0 + pvops kernel still supports PV guest?
On Fri, 2010-12-17 at 14:23 +0000, Zhiyuan Shao wrote:
> Hi all,
>
> I installed Xen 4.0 on my Ubuntu desktop (Maverick Meerkat, 10.10)
> according to https://help.ubuntu.com/community/Xen
> [snip]
> zhiyuan@zhiyuan-ThinkPad-R400:~/xen_work/images/rhel51-pv$ sudo xm cr -c 1-pv.cfg
> [sudo] password for zhiyuan:
> Using config file "./1-pv.cfg".
> Error: (4, 'Out of memory', 'xc_dom_alloc_segment: segment ramdisk too
> large (0x11575 > 0x4000 - 0x1b8f pages)\n')

0x11575 pages is a 277M ramdisk, which, as the message says, is really quite large. Especially compared with the 64M of RAM which you have configured the guest with.

Ian.
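(For reference, the numbers in that error message can be unpacked assuming the usual 4 KiB x86 page size -- a small stand-alone calculation, with the interpretation of 0x1b8f as already-used pages being an assumption:)

    #include <stdio.h>

    int main(void)
    {
        const unsigned long page_size = 4096;      /* 4 KiB x86 pages */

        unsigned long ramdisk_pages = 0x11575;     /* segment being loaded */
        unsigned long guest_pages   = 0x4000;      /* 64M guest allocation */
        unsigned long used_pages    = 0x1b8f;      /* presumably already used
                                                      (kernel image etc.) */

        printf("ramdisk:   %lu pages = %lu MiB\n",
               ramdisk_pages, ramdisk_pages * page_size >> 20);   /* 277 MiB */
        printf("guest RAM: %lu pages = %lu MiB\n",
               guest_pages, guest_pages * page_size >> 20);       /* 64 MiB */
        printf("free:      %lu pages = %lu MiB\n",
               guest_pages - used_pages,
               (guest_pages - used_pages) * page_size >> 20);     /* ~36 MiB */
        return 0;
    }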
Zhiyuan Shao
2010-Dec-17 14:51 UTC
Re: [Xen-devel] Does Xen-4.0 + pvops kernel still supports PV guest?
On 12/17/2010 10:36 PM, Ian Campbell wrote:
> On Fri, 2010-12-17 at 14:23 +0000, Zhiyuan Shao wrote:
> [snip]
> 0x11575 pages is a 277M ramdisk, which, as the message says, is really
> quite large. Especially compared with the 64M of RAM which you have
> configured the guest with.

OK, I changed the "memory" line to 800, but it still does not work:

zhiyuan@zhiyuan-ThinkPad-R400:~/xen_work/images/rhel51-pv$ sudo xm cr -c 1-pv.cfg
Using config file "./1-pv.cfg".
zhiyuan@zhiyuan-ThinkPad-R400:~/xen_work/images/rhel51-pv$ Error: Device 2049 (vbd) could not be connected. Path closed or removed during hotplug add: backend/vbd/1/2049 state: 1

and it fails quite silently.

But I am very sure the disk image path is correct. I also tried changing "sda" to "hda"; the result is the same.

I also attach the .config file for the pvops kernel with this email. Should I compile another kernel for the PV domUs?

Thanks!

Zhiyuan
Ian Campbell
2010-Dec-17 15:07 UTC
Re: [Xen-devel] Does Xen-4.0 + pvops kernel still supports PV guest?
On Fri, 2010-12-17 at 14:51 +0000, Zhiyuan Shao wrote:
> OK, I changed the "memory" line to 800, but it still does not work:
> [snip]
> zhiyuan@zhiyuan-ThinkPad-R400:~/xen_work/images/rhel51-pv$ Error: Device
> 2049 (vbd) could not be connected. Path closed or removed during hotplug
> add: backend/vbd/1/2049 state: 1
>
> But I am very sure the disk image path is correct. I also tried changing
> "sda" to "hda"; the result is the same.

pvops kernels require you to use xvda, not sd* or hd*.

Ian.
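(For illustration, the disk line in the guest config would then look something like the following -- the image filename and root device below are only placeholders, since the actual contents of 1-pv.cfg are not shown here:)

    # 1-pv.cfg (fragment) -- hypothetical paths, adjust to the real image
    disk = [ 'file:/home/zhiyuan/xen_work/images/rhel51-pv/disk.img,xvda,w' ]
    root = '/dev/xvda1 ro'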