At the moment, the scheduler treats dom0 the same as any other VM for the
purposes of accounting. Since dom0 is really a critical piece of
infrastructure, and a part of the hypervisor system as a whole, it would make
more sense to give dom0 special rights wrt CPU time.

This patch will cause the hypervisor to automatically adjust the weight of
dom0 such that it should be able to get either enough CPU for each of its
vcpus, or half of the CPUs on the system, whichever is less. (A worked
example of the resulting arithmetic follows the patch.)

This patch has been in XenServer for several releases now. I'm mostly posting
to see what the interest in this kind of approach is. If there is interest,
I'll do the work to make it more configurable.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>

diff -r a91aa7a582fb -r 0aa2ccb56224 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Tue Jan 31 16:34:39 2012 +0000
+++ b/xen/common/sched_credit.c Thu Feb 09 12:31:00 2012 +0000
@@ -175,6 +175,7 @@ struct csched_private {
     /* Period of master and tick in milliseconds */
     unsigned tslice_ms, tick_period_us, ticks_per_tslice;
     unsigned credits_per_tslice;
+    struct csched_dom *privdom;
 };
 
 static void csched_tick(void *_cpu);
@@ -551,6 +552,61 @@ csched_cpu_pick(const struct scheduler *
     return _csched_cpu_pick(ops, vc, 1);
 }
 
+/* Make sure that the privileged domain always has enough weight for its
+ * active vcpus to get one full pcpu each. */
+static inline void __csched_adj_privdom_weight(struct csched_private *prv) {
+    struct csched_dom *sdom = prv->privdom;
+    int other_cpus, new_weight;
+#ifndef NDEBUG
+    int initial_weight;
+#endif
+
+    /* If privdom isn't being accounted for, or is the only active
+     * domain, we're done. */
+    if ( sdom == NULL
+         || list_empty(&sdom->active_sdom_elem)
+         || unlikely(prv->ncpus < 2)
+         || prv->weight == sdom->weight * sdom->active_vcpu_count )
+        return;
+
+    BUG_ON(prv->weight < sdom->weight * sdom->active_vcpu_count);
+
+#ifndef NDEBUG
+    initial_weight = sdom->weight;
+#endif
+
+    /* First, subtract the current privdom weight from the system weight. */
+    prv->weight -= sdom->weight * sdom->active_vcpu_count;
+
+    /* Calculate how many cores to leave to others. */
+    other_cpus = prv->ncpus - sdom->active_vcpu_count;
+
+    /* Don't let privdom have more than half the available cores. */
+    if ( sdom->active_vcpu_count > other_cpus )
+    {
+        /* Privdom's total weight will be equal to the weight of all
+         * others, giving it 50% of the available processing power. */
+        new_weight = prv->weight / sdom->active_vcpu_count;
+    }
+    else
+    {
+        /* New privdom per-vcpu weight: "other" weight / "other" pcpus. */
+        new_weight = prv->weight / other_cpus;
+    }
+
+    if ( new_weight > 0 )
+        sdom->weight = new_weight;
+
+    /* Update the system weight to reflect the new dom0 weight. */
+    prv->weight += sdom->weight * sdom->active_vcpu_count;
+
+#ifndef NDEBUG
+    if ( 0 && initial_weight != sdom->weight )
+        printk("%s: d%d weight %d -> %d\n",
+               __func__, sdom->dom->domain_id, initial_weight, sdom->weight);
+#endif
+}
+
 static inline void
 __csched_vcpu_acct_start(struct csched_private *prv, struct csched_vcpu *svc)
 {
@@ -572,6 +628,16 @@ __csched_vcpu_acct_start(struct csched_p
         {
             list_add(&sdom->active_sdom_elem, &prv->active_sdom);
         }
+
+        /* is_privileged isn't set when dom0 is created, so check it here. */
+        if ( unlikely(prv->privdom == NULL)
+             && IS_PRIV(sdom->dom) ) {
+            printk("%s: setting dom %d as the privileged domain\n",
+                   __func__, sdom->dom->domain_id);
+            prv->privdom = sdom;
+        }
+
+        __csched_adj_privdom_weight(prv);
     }
 
     spin_unlock_irqrestore(&prv->lock, flags);
@@ -591,11 +657,15 @@ __csched_vcpu_acct_stop_locked(struct cs
     BUG_ON( prv->weight < sdom->weight );
     sdom->active_vcpu_count--;
     list_del_init(&svc->active_vcpu_elem);
+
     prv->weight -= sdom->weight;
+
     if ( list_empty(&sdom->active_vcpu) )
     {
         list_del_init(&sdom->active_sdom_elem);
     }
+
+    __csched_adj_privdom_weight(prv);
 }
 
 static void
@@ -821,6 +891,8 @@ csched_dom_cntl(
                 prv->weight += op->u.credit.weight * sdom->active_vcpu_count;
             }
             sdom->weight = op->u.credit.weight;
+
+            __csched_adj_privdom_weight(prv);
         }
 
         if ( op->u.credit.cap != (uint16_t)~0U )
@@ -856,6 +928,7 @@ csched_alloc_domdata(const struct schedu
 static int
 csched_dom_init(const struct scheduler *ops, struct domain *dom)
 {
+    struct csched_private *prv = CSCHED_PRIV(ops);
    struct csched_dom *sdom;
 
     CSCHED_STAT_CRANK(dom_init);
@@ -869,6 +942,12 @@ csched_dom_init(const struct scheduler *
 
     dom->sched_priv = sdom;
 
+    if ( IS_PRIV(dom) ) {
+        printk("%s: setting dom %d as the privileged domain\n",
+               __func__, dom->domain_id);
+        prv->privdom = sdom;
+    }
+
     return 0;
 }
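To make the weight arithmetic of __csched_adj_privdom_weight() concrete, here
is a standalone userspace sketch of the same calculation with hypothetical
numbers: 8 pcpus, dom0 with 4 active vcpus, and the other active vcpus
totalling a weight of 768 (e.g., three 1-vcpu guests at the default weight of
256). The variable names and values are illustrative, not the hypervisor's:

/* Standalone sketch of the dom0 weight calculation above.
 * Hypothetical setup: 8 pcpus; dom0 has 4 active vcpus; the remaining
 * active vcpus have a combined weight of 768. */
#include <stdio.h>

int main(void)
{
    int ncpus = 8;           /* pcpus in the system */
    int dom0_vcpus = 4;      /* dom0's active vcpus */
    int others_weight = 768; /* total weight of all other active vcpus */
    int other_cpus, new_weight;

    /* Cores left over for the other domains. */
    other_cpus = ncpus - dom0_vcpus;

    if ( dom0_vcpus > other_cpus )
        /* Cap dom0 at half the machine: its total weight ends up equal
         * to everyone else's total weight. */
        new_weight = others_weight / dom0_vcpus;
    else
        /* One full pcpu per dom0 vcpu: "other" weight / "other" pcpus. */
        new_weight = others_weight / other_cpus;

    /* Here 4 <= 4, so new_weight = 768 / 4 = 192.  dom0's four vcpus
     * then hold 768 of the 1536 total weight, and each vcpu's share is
     * 192/1536 of 8 pcpus, i.e., exactly one pcpu.  With 6 dom0 vcpus
     * the first branch would instead cap dom0 at 50% of the machine. */
    printf("per-vcpu dom0 weight: %d\n", new_weight);
    return 0;
}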
On Thu, 2012-02-09 at 12:31 +0000, George Dunlap wrote:
> At the moment, the scheduler treats dom0 the same as any other VM
> for the purposes of accounting. Since dom0 is really a critical
> piece of infrastructure, and a part of the hypervisor system as a
> whole, it would make more sense to give dom0 special rights wrt
> CPU time.
>
To me, this seems more a sensible configuration than a feature to be
embedded in the scheduler. I agree that dom0 always getting some CPU
time is critical in many ways, but I think the scheduler should provide
effective means for achieving that goal, and then it is up to the
toolstack, sysadmin, etc., to use them appropriately. I may obviously
be wrong, but I see this as a clear example of implementing a policy,
while we should just provide mechanisms. :-)

> This patch will cause the hypervisor to automatically adjust the
> weight of dom0 such that it should be able to get either enough
> CPU for each of its vcpus, or half of the CPUs on the system,
> whichever is less.
>
That's the point. In fact, I think credit has the proper mechanisms to
affect the CPU time a domain receives, but not in such a way that the
above is achieved, and that's why this needs to be done by modifying
the scheduler (am I wrong?). It's sort of a replacement for a "minimum
guaranteed CPU bandwidth for a domain" mechanism, which is maybe too
complex, or not worthwhile, to put in place in a generalized fashion.

Just to clarify with an example: in sedf you wouldn't need anything
like this, as you specify exactly the _guaranteed_ CPU bandwidth[*]
each vcpu should be entitled to, i.e., sedf implements the correct
mechanism for enforcing a policy like that one, established by someone
else up in the stack... Then obviously sedf has a whole bunch of
limitations, and we all know it. :-P

Another way of looking at it could be: would users expect this? I
mean, seeing dom0's weight dynamically changing? But here I really
don't have the proper backup for providing an answer... :-)

> I'm mostly posting to see what the interest in this kind of approach
> is. If there is interest, I'll do the work to make it more
> configurable.
>
Yes, if we want this, I think it should at least be possible to turn
it on and off (although then the question becomes what the default
should be).

Summarizing: if something not equal, but maybe equally effective,
could be achieved without this (e.g.,
http://wiki.xen.org/wiki/Xen_Best_Practices), I'd be happy not having
it. However, if we think it could be useful and it is configurable
enough, I can live with it. :-)

Regards,
Dario

[*] For the current implementation of sedf, that is true. If that
scheduler ever properly supports SMP, it will need some more tweaking
to stay true, or it will just become "a bit less true" :-D

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)
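To make the "policy in the toolstack" alternative concrete: the Best
Practices page linked above suggests, among other things, statically raising
dom0's credit-scheduler weight (the default is 256). A toolstack or boot
script could apply such a policy through libxc's xc_sched_credit_domain_get/
set calls rather than patching the hypervisor. Below is a minimal sketch
along those lines; the weight of 512 is an illustrative value, and unlike
the patch above this is a one-shot setting, not one recomputed as vcpus
become active or idle:

/* Sketch: set dom0's credit-scheduler weight from userspace via libxc,
 * instead of having the hypervisor adjust it automatically.
 * Illustrative only; build along the lines of:
 *     gcc -o dom0-weight dom0-weight.c -lxenctrl */
#include <stdio.h>
#include <xenctrl.h>

int main(void)
{
    struct xen_domctl_sched_credit sdom;
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    int rc = 1;

    if ( xch == NULL )
        return 1;

    /* Fetch dom0's current parameters so that only the weight changes. */
    if ( xc_sched_credit_domain_get(xch, 0, &sdom) )
        goto out;

    sdom.weight = 512;  /* illustrative; the default weight is 256 */

    if ( xc_sched_credit_domain_set(xch, 0, &sdom) )
        goto out;

    printf("dom0 weight set to %d\n", sdom.weight);
    rc = 0;
out:
    xc_interface_close(xch);
    return rc;
}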