At the moment, the scheduler treats dom0 the same as any other VM for the
purposes of accounting. Since dom0 is really a critical piece of
infrastructure, and a part of the hypervisor system as a whole, it would make
more sense to give dom0 special rights wrt CPU time.

This patch will cause the hypervisor to automatically adjust the weight of
dom0 such that it should be able to get either enough CPU for each of its
vcpus, or half of the CPUs on the system, whichever is less. (A worked
example of the resulting arithmetic follows the patch.)

This patch has been in XenServer for several releases now. I'm mostly posting
to see what the interest in this kind of approach is. If there is interest,
I'll do the work to make it more configurable.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>

diff -r a91aa7a582fb -r 0aa2ccb56224 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Tue Jan 31 16:34:39 2012 +0000
+++ b/xen/common/sched_credit.c Thu Feb 09 12:31:00 2012 +0000
@@ -175,6 +175,7 @@ struct csched_private {
     /* Period of master and tick in milliseconds */
     unsigned tslice_ms, tick_period_us, ticks_per_tslice;
     unsigned credits_per_tslice;
+    struct csched_dom *privdom;
 };
 
 static void csched_tick(void *_cpu);
@@ -551,6 +552,61 @@ csched_cpu_pick(const struct scheduler *
     return _csched_cpu_pick(ops, vc, 1);
 }
 
+/* Make sure that the privileged domain always has enough weight for its
+ * active vcpus to get one full pcpu each. */
+static inline void __csched_adj_privdom_weight(struct csched_private *prv) {
+    struct csched_dom *sdom = prv->privdom;
+    int other_cpus, new_weight;
+#ifndef NDEBUG
+    int initial_weight;
+#endif
+
+    /* If privdom isn't being accounted for, or is the only active
+     * domain, we're done. */
+    if ( sdom == NULL
+         || list_empty(&sdom->active_sdom_elem)
+         || unlikely(prv->ncpus < 2)
+         || prv->weight == sdom->weight * sdom->active_vcpu_count )
+        return;
+
+    BUG_ON(prv->weight < sdom->weight * sdom->active_vcpu_count);
+
+#ifndef NDEBUG
+    initial_weight = sdom->weight;
+#endif
+
+    /* First, subtract the current privdom weight from the system weight. */
+    prv->weight -= sdom->weight * sdom->active_vcpu_count;
+
+    /* Calculate how many cores to leave to others. */
+    other_cpus = prv->ncpus - sdom->active_vcpu_count;
+
+    /* Don't let privdom have more than half the available cores. */
+    if ( sdom->active_vcpu_count > other_cpus )
+    {
+        /* Privdom's total weight will be equal to the weight of all
+         * others, giving it 50% of the available processing power. */
+        new_weight = prv->weight / sdom->active_vcpu_count;
+    }
+    else
+    {
+        /* New privdom per-vcpu weight: "other" weight / "other" pcpus. */
+        new_weight = prv->weight / other_cpus;
+    }
+
+    if ( new_weight > 0 )
+        sdom->weight = new_weight;
+
+    /* Update the system weight to reflect the new dom0 weight. */
+    prv->weight += sdom->weight * sdom->active_vcpu_count;
+
+#ifndef NDEBUG
+    if ( 0 && initial_weight != sdom->weight )
+        printk("%s: d%d weight %d -> %d\n",
+               __func__, sdom->dom->domain_id, initial_weight, sdom->weight);
+#endif
+}
+
 static inline void
 __csched_vcpu_acct_start(struct csched_private *prv, struct csched_vcpu *svc)
 {
@@ -572,6 +628,16 @@ __csched_vcpu_acct_start(struct csched_p
         {
             list_add(&sdom->active_sdom_elem, &prv->active_sdom);
         }
+
+        /* is_privileged isn't set when dom0 is created, so check it here. */
+        if ( unlikely(prv->privdom == NULL)
+             && IS_PRIV(sdom->dom) ) {
+            printk("%s: setting dom %d as the privileged domain\n",
+                   __func__, sdom->dom->domain_id);
+            prv->privdom = sdom;
+        }
+
+        __csched_adj_privdom_weight(prv);
     }
 
     spin_unlock_irqrestore(&prv->lock, flags);
@@ -591,11 +657,15 @@ __csched_vcpu_acct_stop_locked(struct cs
     BUG_ON( prv->weight < sdom->weight );
     sdom->active_vcpu_count--;
     list_del_init(&svc->active_vcpu_elem);
+
     prv->weight -= sdom->weight;
+
     if ( list_empty(&sdom->active_vcpu) )
     {
         list_del_init(&sdom->active_sdom_elem);
     }
+
+    __csched_adj_privdom_weight(prv);
 }
 
 static void
@@ -821,6 +891,8 @@ csched_dom_cntl(
                 prv->weight += op->u.credit.weight * sdom->active_vcpu_count;
             }
             sdom->weight = op->u.credit.weight;
+
+            __csched_adj_privdom_weight(prv);
         }
 
         if ( op->u.credit.cap != (uint16_t)~0U )
@@ -856,6 +928,7 @@ csched_alloc_domdata(const struct schedu
 static int
 csched_dom_init(const struct scheduler *ops, struct domain *dom)
 {
+    struct csched_private *prv = CSCHED_PRIV(ops);
    struct csched_dom *sdom;
 
     CSCHED_STAT_CRANK(dom_init);
@@ -869,6 +942,12 @@ csched_dom_init(const struct scheduler *
 
     dom->sched_priv = sdom;
 
+    if ( IS_PRIV(dom) ) {
+        printk("%s: setting dom %d as the privileged domain\n",
+               __func__, dom->domain_id);
+        prv->privdom = sdom;
+    }
+
     return 0;
 }
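To make the weight arithmetic of __csched_adj_privdom_weight() concrete, here
is a standalone userspace sketch of the same calculation with hypothetical
numbers: 8 pcpus, dom0 with 4 active vcpus, and the other active vcpus
totalling a weight of 768 (e.g., three 1-vcpu guests at the default weight of
256). The variable names and values are illustrative, not the hypervisor's:

/* Standalone sketch of the dom0 weight calculation above.
 * Hypothetical setup: 8 pcpus; dom0 has 4 active vcpus; the remaining
 * active vcpus have a combined weight of 768. */
#include <stdio.h>

int main(void)
{
    int ncpus = 8;           /* pcpus in the system */
    int dom0_vcpus = 4;      /* dom0's active vcpus */
    int others_weight = 768; /* total weight of all other active vcpus */
    int other_cpus, new_weight;

    /* Cores left over for the other domains. */
    other_cpus = ncpus - dom0_vcpus;

    if ( dom0_vcpus > other_cpus )
        /* Cap dom0 at half the machine: its total weight ends up equal
         * to everyone else's total weight. */
        new_weight = others_weight / dom0_vcpus;
    else
        /* One full pcpu per dom0 vcpu: "other" weight / "other" pcpus. */
        new_weight = others_weight / other_cpus;

    /* Here 4 <= 4, so new_weight = 768 / 4 = 192.  dom0's four vcpus
     * then hold 768 of the 1536 total weight, and each vcpu's share is
     * 192/1536 of 8 pcpus, i.e., exactly one pcpu.  With 6 dom0 vcpus
     * the first branch would instead cap dom0 at 50% of the machine. */
    printf("per-vcpu dom0 weight: %d\n", new_weight);
    return 0;
}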
On Thu, 2012-02-09 at 12:31 +0000, George Dunlap wrote:
> At the moment, the scheduler treats dom0 the same as any other VM
> for the purposes of accounting. Since dom0 is really a critical
> piece of infrastructure, and a part of the hypervisor system as a
> whole, it would make more sense to give dom0 special rights wrt
> CPU time.
>
To me, this seems more a sensible configuration than a feature to be
embedded in the scheduler. I agree that dom0 always getting some CPU
time is critical in many ways, but I think the scheduler should provide
effective means for achieving that goal, and then it is up to the
toolstack, sysadmin, etc., to use them appropriately. I may obviously
be wrong, but I see this as a clear example of implementing a policy,
while we should just provide mechanisms. :-)

> This patch will cause the hypervisor to automatically adjust the
> weight of dom0 such that it should be able to get either enough
> CPU for each of its vcpus, or half of the CPUs on the system,
> whichever is less.
>
That's the point. In fact, I think credit has the proper mechanisms to
affect the CPU time a domain receives, but not in such a way that the
above is achieved, and that's why this needs to be done by modifying
the scheduler (am I wrong?). It's sort of a replacement for a "minimum
guaranteed CPU bandwidth for a domain" mechanism, which is maybe too
complex, or not worthwhile, to put in place in a generalized fashion.

Just to clarify with an example: in sedf you wouldn't need anything
like this, as you specify exactly the _guaranteed_ CPU bandwidth[*]
each vcpu should be entitled to, i.e., sedf implements the correct
mechanism for enforcing a policy like that one, established by someone
else up in the stack... Then obviously sedf has a whole bunch of
limitations, and we all know it. :-P

Another way of looking at it could be: would users expect this? I
mean, seeing dom0's weight dynamically changing? But here I really
don't have the proper backup for providing an answer... :-)

> I'm mostly posting to see what the interest in this kind of approach
> is. If there is interest, I'll do the work to make it more
> configurable.
>
Yes, if we want this, I think it should at least be possible to turn
it on and off (although then the question becomes what the default
should be).

Summarizing: if something not equal, but maybe equally effective,
could be achieved without this (e.g.,
http://wiki.xen.org/wiki/Xen_Best_Practices), I'd be happy not having
it. However, if we think it could be useful and it is configurable
enough, I can live with it. :-)

Regards,
Dario

[*] For the current implementation of sedf, that is true. If that
scheduler ever properly supports SMP, it will need some more tweaking
to stay true, or it will just become "a bit less true" :-D

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)
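To make the "policy in the toolstack" alternative concrete: the Best
Practices page linked above suggests, among other things, statically raising
dom0's credit-scheduler weight (the default is 256). A toolstack or boot
script could apply such a policy through libxc's xc_sched_credit_domain_get/
set calls rather than patching the hypervisor. Below is a minimal sketch
along those lines; the weight of 512 is an illustrative value, and unlike
the patch above this is a one-shot setting, not one recomputed as vcpus
become active or idle:

/* Sketch: set dom0's credit-scheduler weight from userspace via libxc,
 * instead of having the hypervisor adjust it automatically.
 * Illustrative only; build along the lines of:
 *     gcc -o dom0-weight dom0-weight.c -lxenctrl */
#include <stdio.h>
#include <xenctrl.h>

int main(void)
{
    struct xen_domctl_sched_credit sdom;
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    int rc = 1;

    if ( xch == NULL )
        return 1;

    /* Fetch dom0's current parameters so that only the weight changes. */
    if ( xc_sched_credit_domain_get(xch, 0, &sdom) )
        goto out;

    sdom.weight = 512;  /* illustrative; the default weight is 256 */

    if ( xc_sched_credit_domain_set(xch, 0, &sdom) )
        goto out;

    printf("dom0 weight set to %d\n", sdom.weight);
    rc = 0;
out:
    xc_interface_close(xch);
    return rc;
}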