Mike D. Day
2007-Nov-29  20:19 UTC
[Xen-devel] [PATCH] Scheduling groups, credit scheduler support
The credit implementation is limited to sharing time slices among
group members. All members of a group must share the master's time
slices with the master. If the group master is capped at 20%, the
cumulative total of the master and all members will be 20%. This is
specifically to support stub domains, which will host the device model
for hvm domains. The proper way to schedule a stub domain is for it to
share the time slices allocated to the stub's hvm domain.
The credit scheduler is driven by its accounting function. Each domain
is given a credit amount sufficient to run each of its vcpus for one
scheduling cycle. The scheduler divides the total domain credit by the
number of vcpus and allocates each vcpu its share of the domain's
credits. A domain with two vcpus has each of its vcpus given 1/2 of
the domain's credits.
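As a small illustration of the split described above, here is a minimal
sketch in C. The struct toy_dom type and per_vcpu_credit() are
hypothetical names made up for this example, not part of
sched_credit.c, and plain integer division is assumed; the real
accounting code may round differently.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the scheduler's per-domain state. */
struct toy_dom {
    int credit;    /* credit granted to the domain for one cycle */
    int nr_vcpus;  /* number of vcpus sharing that credit */
};

/* Each vcpu receives an equal share of its domain's credit. */
static int per_vcpu_credit(const struct toy_dom *d)
{
    assert(d->nr_vcpus > 0);
    return d->credit / d->nr_vcpus;
}
```

With a domain credit of 300 and two vcpus, each vcpu would be handed
150 under these assumptions.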
The credit scheduler subtracts credits from each vcpu for every time
slice that vcpu runs. When a vcpu has consumed its credit or exceeded
its cap the credit scheduler puts that vcpu to sleep. At the beginning
of each new scheduling cycle sleeping vcpus that have work are
awakened and given a new share of credits.
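The debit-and-refill cycle can be sketched roughly as follows. This is
a toy model only: struct toy_vcpu, charge_slice() and new_cycle() are
hypothetical names, caps are ignored, and the "has work" check is left
out. The refill adds to the remaining credit, mirroring the
atomic_add() visible in the csched_acct() context of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-vcpu state for this sketch. */
struct toy_vcpu {
    int credit;   /* remaining credit this cycle */
    bool asleep;  /* parked until the next cycle */
};

/* Charge one time slice; park the vcpu once its credit is used up. */
static void charge_slice(struct toy_vcpu *v, int cost)
{
    v->credit -= cost;
    if (v->credit <= 0)
        v->asleep = true;
}

/* Start of a new cycle: add a fresh share and wake the vcpu. */
static void new_cycle(struct toy_vcpu *v, int share)
{
    v->credit += share;
    v->asleep = false;
}
```

Because the refill is additive in this sketch, a vcpu that overran its
credit starts the next cycle with correspondingly less.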
The credit scheduler runs vcpus, not domains. However, a domain's
vcpus are given time slices according to the credits available to the
domain and any caps placed on the domain. Therefore, the simplest way
to group domains together in the credit scheduler is to assign the
member domain's vcpus to the master domain. Each vcpu assigned to the
master domain receives a credit equal to the master domain's total
credit divided by the number of assigned vcpus. This forces all the
member domains to share the master domain's credits with the master,
which achieves the desired behavior.
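To make the group arithmetic concrete, the following sketch computes
the per-vcpu credit once member vcpus are counted against the master.
group_vcpu_credit() and its parameters are hypothetical and exist only
for this example; integer division is assumed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sum the master's own vcpus and every member's vcpus, then split the
 * master's credit evenly across all of them.
 */
static int group_vcpu_credit(int master_credit, int master_vcpus,
                             const int *member_vcpus, size_t nr_members)
{
    int assigned = master_vcpus;

    for (size_t i = 0; i < nr_members; i++)
        assigned += member_vcpus[i];

    assert(assigned > 0);
    return master_credit / assigned;
}
```

For example, a master with credit 300 and two vcpus, grouped with one
single-vcpu member, would give each of the three vcpus 100 in this
model. The group as a whole never receives more than the master's
credit.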
The primary accounting function in the credit scheduler is unmodified,
save for the removal of one debugging line. All of the group
processing is handled off the fast path. There are no additional locks
and the only new locked section is the grouping/ungrouping of domains,
which happens infrequently. Although I have yet to run any micro
benchmarks I anticipate no difference in the performance of the credit
scheduler with these patches applied.
Each struct csched_dom receives five new members: a list_head to hold
grouped domains (for the master), another list_head to place member
domains on the master's list, a pointer to the master domain (for
members), and two bool_t members to hold the domain's grouping
state.
Domains are added to a group by the function
add_member_to_master. This routine moves the member domain's vcpus to
the master by calling delegate_active_vcpus.
delegate_active_vcpus migrates all the member domain's active vcpus to
the new master. If necessary it then removes the member domain from
the credit scheduler's list of active domains.
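The list handling behind this step can be sketched in standalone C.
The tiny list helpers below are stand-ins written for this example
(they are not Xen's list.h), and struct toy_dom and delegate_all() are
hypothetical names; only the move-and-recount pattern is meant to
resemble the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, standing in for list_head. */
struct node { struct node *prev, *next; };

static void list_init(struct node *h) { h->prev = h->next = h; }
static int list_is_empty(const struct node *h) { return h->next == h; }

static void list_unlink(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

static void list_push(struct node *n, struct node *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

/* Hypothetical per-domain state: a list of active vcpus and its length. */
struct toy_dom {
    struct node active_vcpu;
    int active_vcpu_count;
};

/* Move every active vcpu from member to master, keeping counts in step. */
static void delegate_all(struct toy_dom *member, struct toy_dom *master)
{
    while (!list_is_empty(&member->active_vcpu)) {
        struct node *n = member->active_vcpu.next;

        list_unlink(n);
        list_push(n, &master->active_vcpu);
        member->active_vcpu_count--;
        master->active_vcpu_count++;
    }
}
```

The sketch leaves out the weight bookkeeping and the active-domain
list, and it assumes the caller already holds whatever lock protects
both lists.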
When a new vcpu is made active by __csched_vcpu_acct_start, that vcpu
is always added to the group master if the vcpu belongs to a member
domain. This and an equivalent line in __csched_vcpu_acct_stop_locked
comprise the only new code that executes on the fast path:
static inline struct csched_dom *master_dom(struct csched_dom *d)
{
    if ( d->is_member )
        return d->master;
    return d;
}
When a domain is removed from a group, the inverse occurs. First the
former member domain's vcpus are returned by a call to
reclaim_active_vcpus. In addition to reclaiming the vcpus, the
(former) member domain is removed from the master's list. If it has
any active vcpus, the former member is placed on the credit
scheduler's list of active domains.
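The selective nature of the reclaim (only the vcpus owned by the
departing member move back) can be shown with a simplified model. The
array-based layout and the names struct toy_vcpu, accounted_to and
reclaim() are assumptions made for this sketch; the patch itself walks
the master's active_vcpu list instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-domain state: only the active vcpu count. */
struct toy_dom { int active_vcpu_count; };

struct toy_vcpu {
    struct toy_dom *owner;         /* domain the vcpu belongs to */
    struct toy_dom *accounted_to;  /* domain currently charged for it */
};

/* Return to member only the vcpus it owns; the master keeps the rest. */
static void reclaim(struct toy_vcpu *vcpus, size_t n,
                    struct toy_dom *master, struct toy_dom *member)
{
    for (size_t i = 0; i < n; i++) {
        if (vcpus[i].owner == member && vcpus[i].accounted_to == master) {
            vcpus[i].accounted_to = member;
            master->active_vcpu_count--;
            member->active_vcpu_count++;
        }
    }
}
```

Vcpus that belong to the master, or to other members of the same
group, are left where they are.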
The remainder of the code handles the sched-group sub op and ensures
that a destroyed domain's grouping properties are properly handled and
that vcpus end up in the right place: either destroyed with their
domain or moved back to the (former) group member which owns them.
Signed-off-by: Mike D. Day <ncmike@us.ibm.com>
--
sched_credit.c |  267 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 259 insertions(+), 8 deletions(-)
-- 
diff -r 0bff1fad920a xen/common/sched_credit.c
--- a/xen/common/sched_credit.c	Wed May 09 16:41:28 2007 -0400
+++ b/xen/common/sched_credit.c	Thu May 10 16:45:21 2007 -0400
@@ -219,10 +219,15 @@ struct csched_dom {
 struct csched_dom {
     struct list_head active_vcpu;
     struct list_head active_sdom_elem;
+    struct list_head group;
+    struct list_head group_elem;
+    struct csched_dom *master;
     struct domain *dom;
     uint16_t active_vcpu_count;
     uint16_t weight;
     uint16_t cap;
+    bool_t is_master;
+    bool_t is_member;
 };
 
 /*
@@ -344,6 +349,118 @@ __runq_tickle(unsigned int cpu, struct c
         cpumask_raise_softirq(mask, SCHEDULE_SOFTIRQ);
 }
 
+static inline struct csched_dom *csched_dom(struct domain *d)
+{
+    return (struct csched_dom *)d->sched_priv;
+}
+
+static inline struct csched_dom *get_master_dom(struct csched_dom *d)
+{
+    if ( d->is_member )
+    {
+        if ( get_domain(d->master->dom) )
+            return d->master;
+        BUG();
+    }
+    return NULL;
+}
+
+static inline struct csched_dom *master_dom(struct csched_dom *d)
+{
+    if ( d->is_member )
+        return d->master;
+    return d;
+}
+
+static inline void delegate_active_vcpus(struct csched_dom *member,
+                                         struct csched_dom *master)
+{
+    BUG_ON( ! ( member->is_member ) );
+    BUG_ON( member->master != master );
+    if ( member->is_member && member->master == master )
+    {
+        struct list_head *elem;
+
+        while ( !list_empty(&member->active_vcpu) )
+        {
+            elem = member->active_vcpu.next;
+            list_del(elem);
+            list_add(elem, &master->active_vcpu);
+            member->active_vcpu_count--;
+            master->active_vcpu_count++;
+        }
+
+        if ( !list_empty(&member->active_sdom_elem) )
+        {
+            list_del_init(&member->active_sdom_elem);
+            csched_priv.weight -= member->weight;
+        }
+
+        if ( list_empty(&master->active_sdom_elem) )
+        {
+            list_add(&master->active_sdom_elem, &csched_priv.active_sdom);
+            csched_priv.weight += master->weight;
+        }
+    }
+}
+
+static inline void reclaim_active_vcpus(struct csched_dom *master,
+                                        struct csched_dom *member)
+{
+    BUG_ON( !master->is_master );
+    BUG_ON( member->master != master );
+    if ( master->is_master && member->master == master )
+    {
+        struct csched_vcpu *iter, *n;
+
+        list_for_each_entry_safe( iter, n, &master->active_vcpu,
+                                  active_vcpu_elem )
+        {
+            if ( iter->sdom == member )
+            {
+                list_del(&iter->active_vcpu_elem);
+                list_add(&iter->active_vcpu_elem, &member->active_vcpu);
+                master->active_vcpu_count--;
+                member->active_vcpu_count++;
+            }
+        }
+
+        if ( list_empty(&master->active_vcpu) &&
+            !list_empty(&master->active_sdom_elem) )
+        {
+            list_del_init(&master->active_sdom_elem);
+            csched_priv.weight -= master->weight;
+        }
+        if ( !list_empty(&member->active_vcpu) &&
+            list_empty(&member->active_sdom_elem) )
+        {
+            list_add(&member->active_sdom_elem, &csched_priv.active_sdom);
+            csched_priv.weight += member->weight;
+        }
+    }
+}
+
+static inline void add_member_to_master(struct csched_dom *member,
+                                        struct csched_dom *master)
+{
+    list_add(&member->group_elem, &master->group);
+    member->master = master;
+    member->is_member = 1;
+    master->is_master = 1;
+    delegate_active_vcpus(member, master);
+}
+
+static inline void rem_member_from_master(struct csched_dom *member,
+                                          struct csched_dom *master)
+{
+    reclaim_active_vcpus(master, member);
+    member->is_member = 0;
+    member->master = NULL;
+    list_del(&member->group_elem);
+    if (list_empty(&master->group))
+        master->is_master = 0;
+}
+
 static int
 csched_pcpu_init(int cpu)
 {
@@ -395,6 +512,17 @@ __csched_vcpu_check(struct vcpu *vc)
     else
     {
         BUG_ON( !is_idle_vcpu(vc) );
+    }
+
+    if ( sdom->is_master )
+    {
+        BUG_ON( list_empty(&sdom->group) );
+        BUG_ON( sdom->is_member );
+    }
+    if ( sdom->is_member )
+    {
+        BUG_ON( list_empty(&sdom->group_elem) );
+        BUG_ON( sdom->is_master );
     }
 
     CSCHED_STAT_CRANK(vcpu_check);
@@ -486,11 +614,11 @@ static inline void
 static inline void
 __csched_vcpu_acct_start(struct csched_vcpu *svc)
 {
-    struct csched_dom * const sdom = svc->sdom;
     unsigned long flags;
-
+    struct csched_dom * sdom;
     spin_lock_irqsave(&csched_priv.lock, flags);
 
+    sdom = master_dom(svc->sdom);
     if ( list_empty(&svc->active_vcpu_elem) )
     {
         CSCHED_VCPU_STAT_CRANK(svc, state_active);
@@ -504,14 +632,13 @@ __csched_vcpu_acct_start(struct csched_v
             csched_priv.weight += sdom->weight;
         }
     }
-
     spin_unlock_irqrestore(&csched_priv.lock, flags);
 }
 
 static inline void
 __csched_vcpu_acct_stop_locked(struct csched_vcpu *svc)
 {
-    struct csched_dom * const sdom = svc->sdom;
+    struct csched_dom * const sdom = master_dom(svc->sdom);
 
     BUG_ON( list_empty(&svc->active_vcpu_elem) );
 
@@ -605,20 +732,34 @@ csched_vcpu_init(struct vcpu *vc)
     return 0;
 }
 
+static void group_cleanup(struct csched_vcpu *svc)
+{
+    if ( svc->sdom->is_member )
+        rem_member_from_master(svc->sdom, master_dom(svc->sdom));
+    if ( svc->sdom->is_master )
+    {
+        struct csched_dom *iter, *n;
+        list_for_each_entry_safe( iter, n, &svc->sdom->group, group_elem )
+        {
+            rem_member_from_master(iter, svc->sdom);
+        }
+    }
+}
+
+
 static void
 csched_vcpu_destroy(struct vcpu *vc)
 {
     struct csched_vcpu * const svc = CSCHED_VCPU(vc);
-    struct csched_dom * const sdom = svc->sdom;
     unsigned long flags;
 
     CSCHED_STAT_CRANK(vcpu_destroy);
 
-    BUG_ON( sdom == NULL );
     BUG_ON( !list_empty(&svc->runq_elem) );
 
     spin_lock_irqsave(&csched_priv.lock, flags);
 
+    group_cleanup(svc);
     if ( !list_empty(&svc->active_vcpu_elem) )
         __csched_vcpu_acct_stop_locked(svc);
 
@@ -697,6 +838,112 @@ csched_vcpu_wake(struct vcpu *vc)
     __runq_tickle(cpu, svc);
 }
 
+static inline int
+_sanity_check(struct csched_dom *member, struct csched_dom *master)
+{
+    if ( member->dom->domain_id == master->dom->domain_id )
+        return SGRP_err_same_id;
+    if ( member->is_master )
+        return SGRP_err_already_master;
+    if ( master->is_member )
+        return SGRP_err_already_member;
+    return 0;
+}
+
+static inline int
+add_sanity_check(struct csched_dom *member, struct csched_dom *master)
+{
+    if ( member->master )
+        return SGRP_err_inval;
+    return _sanity_check(member, master);
+}
+
+static inline int
+rem_sanity_check(struct csched_dom *member, struct csched_dom *master)
+{
+    if ( member->is_member && member->master && member->master == master )
+        return _sanity_check(member, master);
+    return SGRP_err_inval;
+}
+
+static int csched_group_op(struct xen_domctl_group * op)
+{
+    int ret = -EINVAL;
+
+    switch(op->op)
+    {
+    case SGRP_get_status:
+    case SGRP_get_master:
+    {
+        struct domain *dom = get_domain_by_id(op->id_member);
+        if ( dom )
+        {
+            struct csched_dom *cdom = csched_dom(dom);
+            if ( op->op == SGRP_get_status )
+            {
+                op->is_master = cdom->is_master;
+                op->is_member = cdom->is_member;
+            }
+            else
+            {
+                struct csched_dom *master = get_master_dom(cdom);
+                if ( master )
+                {
+                    op->id_master = master->dom->domain_id;
+                    put_domain(master->dom);
+                }
+                else
+                    op->reason = SGRP_err_not_member;
+            }
+            put_domain(dom);
+            ret = 0;
+        }
+        break;
+    }
+
+    case SGRP_add_member:
+    case SGRP_del_member:
+    {
+        struct domain *member, *master;
+        unsigned long flags;
+
+        master  = get_domain_by_id(op->id_master);
+        if ( !master )
+            break;
+        member = get_domain_by_id(op->id_member);
+        if ( !member )
+            goto release_master;
+        ret = 0;
+        if ( op->op == SGRP_add_member )
+            op->reason =
+                add_sanity_check(csched_dom(member), csched_dom(master));
+        else
+            op->reason =
+                rem_sanity_check(csched_dom(member), csched_dom(master));
+        if ( op->reason )
+            goto release_member;
+
+        spin_lock_irqsave(&csched_priv.lock, flags);
+        if ( op->op == SGRP_add_member )
+            add_member_to_master(csched_dom(member), csched_dom(master));
+        else
+            rem_member_from_master(csched_dom(member), csched_dom(master));
+        spin_unlock_irqrestore(&csched_priv.lock, flags);
+
+release_member:
+        put_domain(member);
+release_master:
+        put_domain(master);
+
+        break;
+    }
+    default:
+        break;
+    }
+
+    return ret;
+}
+
 static int
 csched_dom_cntl(
     struct domain *d,
@@ -754,10 +1001,14 @@ csched_dom_init(struct domain *dom)
     sdom->active_vcpu_count = 0;
     INIT_LIST_HEAD(&sdom->active_sdom_elem);
     sdom->dom = dom;
+    sdom->master = NULL;
     sdom->weight = CSCHED_DEFAULT_WEIGHT;
     sdom->cap = 0U;
     dom->sched_priv = sdom;
-
+    INIT_LIST_HEAD(&sdom->group);
+    INIT_LIST_HEAD(&sdom->group_elem);
+    sdom->is_master = 0;
+    sdom->is_member = 0;
     return 0;
 }
 
@@ -942,7 +1193,6 @@ csched_acct(void)
         list_for_each_safe( iter_vcpu, next_vcpu, &sdom->active_vcpu )
         {
             svc = list_entry(iter_vcpu, struct csched_vcpu, active_vcpu_elem);
-            BUG_ON( sdom != svc->sdom );
 
             /* Increment credit */
             atomic_add(credit_fair, &svc->credit);
@@ -1384,6 +1634,7 @@ struct scheduler sched_credit_def = {
     .sleep          = csched_vcpu_sleep,
     .wake           = csched_vcpu_wake,
 
+    .group_op        = csched_group_op,
     .adjust         = csched_dom_cntl,
 
     .pick_cpu       = csched_cpu_pick,
-- 
Mike D. Day
Virtualization Architect and Sr. Technical Staff Member, IBM LTC
Cell: 919 412-3900
ST: mdday@us.ibm.com | AIM: ncmikeday | Yahoo IM: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Chris B
2007-Nov-29  22:36 UTC
Re: [Xen-devel] [PATCH] Scheduling groups, credit scheduler support
My interpretation of your implementation is that it causes hierarchical
relationships (master/slave) between domains. Every group has one
master and the rest are slaves. The fixed relationship puts implicit
limits on the organization of domains. (See
http://www.hpl.hp.com/techreports/94/HPL-94-104.html, for example.)

Since groups are applicable to more than just scheduling (domain
disaggregation, convenient migration, efficient security policy, etc.),
a more general mechanism would be preferable. I submitted something
similar in the past (Feb 20, 2007). If anyone is interested, I'd be
glad to submit a fresh cut of my patches as an RFC.

-Chris
Chris
2007-Dec-03  15:37 UTC
Re: [Xen-devel] [PATCH] Scheduling groups, credit scheduler support
Mike,

My reading of your scheduling groups implementation is that it induces
hierarchical relationships (master/slave) between domains. That is,
every group has one master and the rest are slaves. Although that
implementation has the advantage of being small, the fixed relationship
puts implicit limits on the organization of domains and the operations
that can be applied to them.

In addition to scheduling, I believe domain groups are applicable to
other areas such as domain disaggregation, convenient migration,
efficient security policy, etc. As such, a non-hierarchical group
mechanism is desirable.

I submitted a related domain grouping patch in the past (Feb 20, 2007).
If anyone is interested, I'd be glad to submit a fresh cut of my patches
against the tip as an RFC.

Cheers,
Chris
Mike D. Day
2007-Dec-03  17:59 UTC
[Xen-devel] Re: Scheduling groups, credit scheduler support
On 03/12/07 10:37 -0500, Chris wrote:
> Mike,
>
> My reading of your scheduling groups implementation is that it induces
> hierarchical relationships (master/slave) between domains. That is, every
> group has one master and the rest are slaves. Although that implementation
> has the advantage of being small, the fixed relationship puts implicit
> limits on the organization of domains and the operations that can be
> applied to them.

Hi Chris,

The patches use the terms master and member, but it is more of a peer
relationship only affecting scheduler accounting. The significance of
the "master" is that the "member" domains inherit the master's cpu
weight and credits are charged to the master. The master doesn't exert
any explicit control over its group members (although there may be a
use case for doing so). This is the specific functionality we need for
stub domains, where the credits consumed by a stub domain need to be
charged to the HVM guest domain.

> In addition to scheduling, I believe domain groups are applicable to other
> areas such as domain disaggregation, convenient migration, efficient
> security policy, etc. As such, a non-hierarchical group mechanism is
> desirable.

The scheduling group is only visible to the credit scheduler, and there
is no meaning outside of the scheduler's process accounting. I didn't
want to modify the Domain structures, and it wasn't necessary to get the
desired scheduling behavior.

I think other types of groups may be useful, and it would be great to
see your patches again.

Mike
Keir Fraser
2007-Dec-04  10:38 UTC
Re: [Xen-devel] [PATCH] Scheduling groups, credit scheduler support
On 29/11/07 20:19, "Mike D. Day" <ncmike@us.ibm.com> wrote:
> The credit implementation is limited to sharing time slices among
> group members. All members of a group must share the master's time
> slices with the master. If the group master is capped at 20%, the
> cumulative total of the master and all members will be 20%. This is
> specifically to support stub domains, which will host the device model
> for hvm domains. The proper way to schedule a stub domain is for it to
> share the time slices allocated to the stub's hvm domain.

It's good to see these patches get aired again. I hope we can get some
integration with the ongoing stub domain work and get some numbers out
to prove better scalability and QoS. This would ease the passage of
scheduler group support into xen-unstable.

> The credit scheduler runs vcpus, not domains. However, a domain's
> vcpus are given time slices according to the credits available to the
> domain and any caps placed on the domain. Therefore, the simplest way
> to group domains together in the credit scheduler is to assign the
> member domain's vcpus to the master domain. Each vcpu assigned to the
> master domain receives a credit equal to the master domain's total
> credit divided by the number of assigned vcpus. This forces all the
> member domains to share the master domain's credits with the master,
> which achieves the desired behavior.

Is this the right thing to do? I would think that the desired behaviour
is for each 'master vcpu' to freely share its credit allocation with its
'buddy vcpus'. The static N-way split of credits across vcpus within a
domain makes some kind of sense, since the vcpus are each equally
important and each independent of each other. Statically splitting
credits between e.g., HVM guest domain and its stub domain makes less
sense. One is subordinate to the other, and a model where the stub can
'steal' credits dynamically from the HVM domain seems to make more
sense.

Otherwise, wouldn't a uniprocessor HVM guest get half its credit stolen
by the uniprocessor stub domain, even if the HVM guest is doing no I/O?
Perhaps I misunderstand. :-)

 -- Keir
Mike D. Day
2007-Dec-04  13:50 UTC
[Xen-devel] Re: Scheduling groups, credit scheduler support
On 04/12/07 10:38 +0000, Keir Fraser wrote:
> On 29/11/07 20:19, "Mike D. Day" <ncmike@us.ibm.com> wrote:
>
>> The credit implementation is limited to sharing time slices among
>> group members. All members of a group must share the master's time
>> slices with the master. If the group master is capped at 20%, the
>> cumulative total of the master and all members will be 20%. This is
>> specifically to support stub domains, which will host the device model
>> for hvm domains. The proper way to schedule a stub domain is for it to
>> share the time slices allocated to the stub's hvm domain.
>
> It's good to see these patches get aired again. I hope we can get some
> integration with the ongoing stub domain work and get some numbers out to
> prove better scalability and QoS. This would ease the passage of scheduler
> group support into xen-unstable.

I previously published some benchmarks:

http://article.gmane.org/gmane.comp.emulators.xen.devel/39818/

>> The credit scheduler runs vcpus, not domains. However, a domain's
>> vcpus are given time slices according to the credits available to the
>> domain and any caps placed on the domain. Therefore, the simplest way
>> to group domains together in the credit scheduler is to assign the
>> member domain's vcpus to the master domain. Each vcpu assigned to the
>> master domain receives a credit equal to the master domain's total
>> credit divided by the number of assigned vcpus. This forces all the
>> member domains to share the master domain's credits with the master,
>> which achieves the desired behavior.
>
> Is this the right thing to do? I would think that the desired behaviour is
> for each 'master vcpu' to freely share its credit allocation with its 'buddy
> vcpus'. The static N-way split of credits across vcpus within a domain makes
> some kind of sense, since the vcpus are each equally important and each
> independent of each other.

This is what happens with the patch today. In fact, the code that
allocates credits is untouched by the patch. The difference is that the
active vcpus of the stub domain are transferred for accounting purposes
to the hvm domain. So whenever the stub domain runs, those credits are
decremented from the hvm domain. If the stub domain doesn't run, no hvm
credits are decremented.

> Statically splitting credits between e.g., HVM
> guest domain and its stub domain makes less sense. One is subordinate to the
> other, and a model where the stub can 'steal' credits dynamically from the
> HVM domain seems to make more sense. Otherwise, wouldn't a uniprocessor HVM
> guest get half its credit stolen by the uniprocessor stub domain, even if
> the HVM guest is doing no I/O?

Credits are "stolen" only when the stub domain runs, so if the hvm
domain is doing no I/O then none of its credits go to the stub domain.

Mike
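The charge-on-run behaviour described in the reply above can be modelled
with a few lines of C. This is a deliberately simplified sketch: it
keeps one credit pool per domain, whereas the scheduler tracks credits
per vcpu, and struct toy_dom, payer() and run_slice() are hypothetical
names. payer() is written in the spirit of master_dom() from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-domain state for this sketch. */
struct toy_dom {
    int credit;              /* remaining credit in this toy model */
    struct toy_dom *master;  /* NULL unless this domain is a group member */
};

/* Resolve the domain that pays for d's run time. */
static struct toy_dom *payer(struct toy_dom *d)
{
    return d->master ? d->master : d;
}

/* Debit one slice from whichever domain pays for d. */
static void run_slice(struct toy_dom *d, int cost)
{
    payer(d)->credit -= cost;
}
```

In this model an idle stub domain costs the HVM domain nothing; the HVM
domain's credit only drops on behalf of the stub when run_slice() is
called for the stub.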
Chris
2007-Dec-04  19:32 UTC
Re: [Xen-devel] Re: Scheduling groups, credit scheduler support
First let me restate that this is a great use for groups.

> The patches use the terms master and member, but it is more of a peer
> relationship only affecting scheduler accounting. The significance of
> the "master" is that the "member" domains inherit the master's cpu
> weight and credits are charged to the master. The master doesn't exert
> any explicit control over its group members (although there may be a
> use case for doing so). This is the specific functionality we need for
> stub domains, where the credits consumed by a stub domain need to be
> charged to the HVM guest domain.

A primary concern is that the approach potentially precludes the use of
VMM-based grouping information in other situations where groups are
useful, because the group abstraction exists only in the scheduler.

Also, even though it's currently just used for accounting, group
membership information is effectively attached to a single domain.
Assuming I've read correctly, when the master domain goes away, so does
the membership information. That's probably OK for HVM stub domains, but
what if the domains are peers as in the dom0 disaggregation case or
(thinking even further ahead) in the general domain decomposition case?

One way to avoid both concerns is to create and manage group-tracking
objects independently of domain-tracking objects. In other words, make
groups a first-class object. They could be referenced by schedulers as
well as any other parts of the VMM that want to make use of group
information.

Regards,
Chris

P.S. - A refresh of my previous group implementation is coming RSN. I'm
testing to make sure it still works as intended.
Keir Fraser
2007-Dec-04  23:04 UTC
Re: [Xen-devel] Re: Scheduling groups, credit scheduler support
On 4/12/07 19:32, "Chris" <hap10@tycho.ncsc.mil> wrote:
> Also, even though it's currently just used for accounting, group
> membership information is effectively attached to a single domain.
> Assuming I've read correctly, when the master domain goes away, so
> does the membership information. That's probably OK for HVM stub
> domains, but what if the domains are peers as in the dom0
> disaggregation case or (thinking even further ahead) in the general
> domain decomposition case?

How would the disaggregated-dom0 domains be peers in anything other than
a conceptual boxes-on-a-powerpoint-slide sense?

 -- Keir
Keir Fraser
2007-Dec-04  23:06 UTC
[Xen-devel] Re: Scheduling groups, credit scheduler support
On 4/12/07 13:50, "Mike D. Day" <ncmike@us.ibm.com> wrote:
>> It's good to see these patches get aired again. I hope we can get some
>> integration with the ongoing stub domain work and get some numbers out to
>> prove better scalability and QoS. This would ease the passage of scheduler
>> group support into xen-unstable.
>
> I previously published some benchmarks:
>
> http://article.gmane.org/gmane.comp.emulators.xen.devel/39818/

Not slowing down microbenchmarks is the least we should expect for
acceptance. The feature needs to earn its keep in the tree by
demonstrating superior performance or scalability in a situation we care
about. Like for HVM stub domains. :-)

 -- Keir
tgh
2007-Dec-05  01:34 UTC
[Xen-devel] does xen-linux for PV support the Linux Standards base , or not ?
Hi,

Does xen-linux for PV support the Linux Standards Base or not? If it
does, which version does it support? And what about POSIX: does it
support POSIX, and which version?

Thanks
Mark Williamson
2007-Dec-05  03:49 UTC
Re: [Xen-devel] does xen-linux for PV support the Linux Standards base , or not ?
> does xen-linux for PV support the Linux Standards Base, or not? if it
> does, which version does it support? and what about POSIX? does it
> support POSIX? and which version does it support?

Whether LSB and/or POSIX support is provided is mostly (or maybe
entirely?) up to the distribution. I'm not familiar with the details of
either standard but I'm not aware of anything in PV Linux that breaks
them.

Much of LSB is about what software is available on the system, how the
filesystem is laid out, etc, which is not changed under PV. If it places
any restrictions on kernel version, I guess that could be a problem.

Much of POSIX is focused on the interface to userspace apps and to the
users themselves. Again, the use of the PV kernel shouldn't affect these
significantly.

In summary: I wouldn't expect PV to make much difference to either but I
can't rule out it breaking some rule in those standards.

Cheers,
Mark

-- 
Dave: Just a question. What use is a unicycle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!
Chris
2007-Dec-06  16:42 UTC
Re: [Xen-devel] Re: Scheduling groups, credit scheduler support
Hi Keir,

> How would the disaggregated-dom0 domains be peers in anything other
> than a conceptual boxes-on-a-powerpoint-slide sense?

At the most recent Xen Summit, Derek announced work to remove the domain
builder from dom0. Other projects removed Xenstore from dom0. Of course
we can also remove device drivers from dom0 (although today they still
require some attention from dom0). In certain combinations, I consider
all of these decomposed components of dom0 to be peers, both in the
scheduling sense and in terms of security policy and management
operations.

In some cases (xenstore, device driver domains, domain builder, etc.),
service domains don't have to be slave to a single master and can
operate in a more generic client/server model. Further, I expect there
to be less of a distinction between dom0 and domU components as
decomposition progresses, so it's not fair to limit the discussion to
existing dom0 components.

To rephrase, one part of my perspective is that a
one-master/many-slaves model doesn't capture all of the possible
relationships between domains. The other part is that group information
can be useful to parts of the VMM in addition to the scheduler.

Cheers,
Chris
Mike D. Day
2007-Dec-14  13:35 UTC
[Xen-devel] Re: Scheduling groups, credit scheduler support
On 04/12/07 23:06 +0000, Keir Fraser wrote:
> On 4/12/07 13:50, "Mike D. Day" <ncmike@us.ibm.com> wrote:
>
> >> It's good to see these patches get aired again. I hope we can get some
> >> integration with the ongoing stub domain work and get some numbers out to
> >> prove better scalability and QoS. This would ease the passage of scheduler
> >> group support into xen-unstable.
> >
> > I previously published some benchmarks:
> >
> > http://article.gmane.org/gmane.comp.emulators.xen.devel/39818/
>
> Not slowing down microbenchmarks is the least we should expect for
> acceptance. The feature needs to earn its keep in the tree by demonstrating
> superior performance or scalability in a situation we care about. Like for
> HVM stub domains. :-)

Yes of course. But it also must not slow normal scheduling, which is
the point of these benchmarks. As soon as the stub domain is ready for
testing I'll start performance work on hvm domains.

Mike
Keir Fraser
2007-Dec-14  13:50 UTC
[Xen-devel] Re: Scheduling groups, credit scheduler support
On 14/12/07 13:35, "Mike D. Day" <ncmike@us.ibm.com> wrote:

>> Not slowing down microbenchmarks is the least we should expect for
>> acceptance. The feature needs to earn its keep in the tree by demonstrating
>> superior performance or scalability in a situation we care about. Like for
>> HVM stub domains. :-)
>
> Yes of course. But it also must not slow normal scheduling, which is
> the point of these benchmarks. As soon as the stub domain is ready for
> testing I'll start performance work on hvm domains.

Sounds good!

K.
Samuel Thibault
2007-Dec-14  16:26 UTC
Re: [Xen-devel] Re: Scheduling groups, credit scheduler support
Mike D. Day, on Fri 14 Dec 2007 08:35:36 -0500, wrote:
> On 04/12/07 23:06 +0000, Keir Fraser wrote:
> > On 4/12/07 13:50, "Mike D. Day" <ncmike@us.ibm.com> wrote:
> >
> > >> It's good to see these patches get aired again. I hope we can get some
> > >> integration with the ongoing stub domain work and get some numbers out to
> > >> prove better scalability and QoS. This would ease the passage of scheduler
> > >> group support into xen-unstable.
> > >
> > > I previously published some benchmarks:
> > >
> > > http://article.gmane.org/gmane.comp.emulators.xen.devel/39818/
> >
> > Not slowing down microbenchmarks is the least we should expect for
> > acceptance. The feature needs to earn its keep in the tree by demonstrating
> > superior performance or scalability in a situation we care about. Like for
> > HVM stub domains. :-)
>
> Yes of course. But it also must not slow normal scheduling, which is
> the point of these benchmarks. As soon as the stub domain is ready for
> testing I'll start performance work on hvm domains.

It is available at http://xenbits.xensource.com/ext/xen-minios-stubdom.hg

I tested groups a bit (the merge went fine) but couldn't see a
difference, probably because my test case is very limited (just a CPU
burner in dom0) compared to the scheduling boost of the credit
scheduler.

Samuel
Samuel Thibault
2007-Dec-14  16:49 UTC
Re: [Xen-devel] Re: Scheduling groups, credit scheduler support
Hi,

Mike D. Day, on Tue 04 Dec 2007 08:50:20 -0500, wrote:
> Credits are "stolen" only when the stub domain runs, so if the hvm
> domain is doing no I/O then none of its credits go to the stub domain.

Ok, but take for instance the case where vcpu1 of the HVM does a lot of
I/O while vcpu2 of the HVM prefers to burn CPU. Here we would probably
like to see the stubdomain get as much CPU time for performing the I/O
as vcpu2 has for burning. I'm not sure that your allocation permits
this.

Samuel
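For reference, the per-vcpu split given in the original patch description can be written out as below. This is only a sketch of that arithmetic: `credit_per_vcpu` and its parameters are hypothetical names, not code from the patch, and it ignores blocking, boosting and caps, which are what actually decide the outcome in the mixed I/O and CPU-burning case raised above.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the patch): equal split of a master
 * domain's credit across every vcpu charged to it, i.e. its own vcpus
 * plus the vcpus of its member domains.
 */
static int credit_per_vcpu(int master_credit, int master_vcpus,
                           int member_vcpus)
{
    int total_vcpus = master_vcpus + member_vcpus;

    assert(total_vcpus > 0);
    return master_credit / total_vcpus;
}
```

With a 300-credit master, a 2-vcpu HVM domain and a 1-vcpu stub domain, each of the three vcpus is nominally given 100 credits per accounting period.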
Samuel Thibault
2007-Dec-14  17:01 UTC
Re: [Xen-devel] [PATCH] Scheduling groups, credit scheduler support
Mike D. Day, on Thu 29 Nov 2007 15:19:59 -0500, wrote:
> +static inline struct csched_dom *get_master_dom(struct csched_dom *d)
> +{
> +    if ( d->is_member )
> +    {
> +        if ( get_domain(d->master->dom) )
> +            return d->master;
> +        BUG();
> +    }
> +    return NULL;
> +}
> +
> +static inline struct csched_dom *master_dom(struct csched_dom *d)
> +{
> +    if ( d->is_member )
> +        return d->master;
> +    return d;
> +}
> +

> +static inline void rem_member_from_master(struct csched_dom *member,
> +                                          struct csched_dom *master)
> +{
> +    reclaim_active_vcpus(master, member);
> +    member->is_member = 0;
> +    member->master = NULL;
> +    list_del(&member->group_elem);
> +    if (list_empty(&master->group))
> +        master->is_master = 0;
> +}

Mmm, isn't there a race condition between these, if somebody removes a
member while somebody else is in the middle of calling master_dom() or
get_master_dom()?

Samuel
Samuel Thibault
2007-Dec-14  17:20 UTC
Re: [Xen-devel] Re: Scheduling groups, credit scheduler support
Chris, on Tue 04 Dec 2007 14:32:11 -0500, wrote:
> One way to avoid both concerns is to create and manage group-tracking
> objects independently of domain-tracking objects. In other words,
> make groups a first-class object. They could be referenced by
> schedulers as well as any other parts of the VMM that want to make
> use of group information.

Yes, some kind of non-schedulable entity which is just here to do what
Mike's masters do: concentrate scheduling credits.

About the userland interface, I can see two approaches:
- have people explicitly create groups and put domains in them. That
  can be hierarchical (putting groups into other groups)
- have groups created and destroyed implicitly, for instance
  join(d1,d2) will make d1 and d2 part of the same group, which is
  created if there weren't any previously, or the union of both groups
  if both existed.

The second approach seems fun, but I'm not sure it would ever actually
be useful :)

Also, there is the question: can a domain belong to several groups?
Depending on the point of view, that may be useful or just not make any
sense. One problem with belonging to several groups is that you end up
with a graph of domains, which may be tedious and potentially
non-polynomial to walk.

Samuel
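The implicit join(d1,d2) semantics described above (create a group if neither domain has one, merge the two groups otherwise) match what a disjoint-set (union-find) structure provides. The following is a minimal user-space sketch under that assumption. `MAX_DOMS`, `groups_init`, `group_of` and `group_join` are hypothetical names, domains are identified by small integers, and none of this is taken from either patch series.

```c
#include <assert.h>

#define MAX_DOMS 16              /* hypothetical limit for the sketch */

static int parent[MAX_DOMS];     /* parent[d] == d means d is a group root */

/* Every domain starts out alone in its own implicit group. */
static void groups_init(void)
{
    for (int i = 0; i < MAX_DOMS; i++)
        parent[i] = i;
}

/* Return the representative of the group that domain d belongs to. */
static int group_of(int d)
{
    assert(d >= 0 && d < MAX_DOMS);
    while (parent[d] != d) {
        parent[d] = parent[parent[d]];   /* path halving */
        d = parent[d];
    }
    return d;
}

/* Put d1 and d2 in the same group, merging their groups if they differ. */
static void group_join(int d1, int d2)
{
    int g1 = group_of(d1);
    int g2 = group_of(d2);

    if (g1 != g2)
        parent[g2] = g1;
}
```

Because each domain has exactly one representative, this models the single-group case only. Letting a domain belong to several groups would need a different structure, which is the graph problem mentioned above.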
Samuel Thibault
2007-Dec-14  17:36 UTC
Re: [Xen-devel] Re: Scheduling groups, credit scheduler support
Samuel Thibault, on Fri 14 Dec 2007 16:26:13 +0000, wrote:
> I tested groups a bit (the merge goes very fine) but couldn't see a
> difference,

Oops, scratch that: group operations were not working.

Samuel
Samuel Thibault
2007-Dec-17  16:57 UTC
Re: [Xen-devel] [PATCH] Scheduling groups, credit scheduler support
Samuel Thibault, on Fri 14 Dec 2007 17:01:28 +0000, wrote:
> Mike D. Day, on Thu 29 Nov 2007 15:19:59 -0500, wrote:
> > +static inline struct csched_dom *get_master_dom(struct csched_dom *d)
> > +{
> > +    if ( d->is_member )
> > +    {
> > +        if ( get_domain(d->master->dom) )
> > +            return d->master;
> > +        BUG();
> > +    }
> > +    return NULL;
> > +}
> > +
> > +static inline void rem_member_from_master(struct csched_dom *member,
> > +                                          struct csched_dom *master)
> > +{
> > +    reclaim_active_vcpus(master, member);
> > +    member->is_member = 0;
> > +    member->master = NULL;
> > +    list_del(&member->group_elem);
> > +    if (list_empty(&master->group))
> > +        master->is_master = 0;
> > +}
>
> Mmm, isn't there a race condition between these, if somebody removes a
> member in the middle of somebody else calling master_dom() or
> get_master_dom()?

More precisely, there is one with SGRP_get_master, which doesn't take
the global scheduler lock before calling get_master_dom().

Samuel
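The window in question is between testing `is_member` and dereferencing `master`: a concurrent removal can clear `master` in between. A minimal user-space sketch of the locked variant follows. It assumes a single global lock, modelled here with a C11 `atomic_flag` as a stand-in for the scheduler's spinlock. `struct dom`, `refcnt` and all function names are hypothetical simplifications, not the patch's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Simplified stand-in for struct csched_dom (hypothetical layout). */
struct dom {
    int is_member;
    int refcnt;            /* stands in for the get_domain() reference */
    struct dom *master;
};

/* Stand-in for the global scheduler lock. */
static atomic_flag sched_lock = ATOMIC_FLAG_INIT;

static void sched_lock_acquire(void)
{
    while (atomic_flag_test_and_set(&sched_lock))
        ;                  /* spin until the flag was previously clear */
}

static void sched_lock_release(void)
{
    atomic_flag_clear(&sched_lock);
}

/* Test is_member and take a reference on the master under the lock. */
static struct dom *get_master_locked(struct dom *d)
{
    struct dom *m = NULL;

    assert(d != NULL);
    sched_lock_acquire();
    if (d->is_member) {
        m = d->master;
        m->refcnt++;       /* reference taken before the lock is dropped */
    }
    sched_lock_release();
    return m;
}

/* Removal takes the same lock, so it cannot interleave with the lookup. */
static void rem_member_locked(struct dom *member)
{
    assert(member != NULL);
    sched_lock_acquire();
    member->is_member = 0;
    member->master = NULL;
    sched_lock_release();
}
```

Taking the reference while still holding the lock matters: it keeps the returned master alive for the caller even if the member is removed immediately afterwards.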
Samuel Thibault
2007-Dec-18  11:50 UTC
Re: [Xen-devel] [PATCH] Scheduling groups, credit scheduler support
Hi,
Here is a bunch of other fixes I had to use
Samuel
diff -r b968ee4f6b4f -r d8ed81d5dc55 tools/libxc/xc_domain.c
--- a/tools/libxc/xc_domain.c	Mon Dec 17 12:05:18 2007 +0000
+++ b/tools/libxc/xc_domain.c	Tue Dec 18 11:45:05 2007 +0000
@@ -938,6 +938,7 @@ int xc_group_get_status(int handle, stru
     domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_group;
     domctl.u.scheduler_op.u.group.op = SGRP_get_status;
     domctl.u.scheduler_op.u.group.id_master = group->id_master;
+    domctl.u.scheduler_op.u.group.id_member = group->id_master;
     ret = do_domctl(handle, &domctl);
 
     if ( ret == 0 )
In Xen's code, id_member is used for both get_status and get_master.
The code should probably be reworked there so that the fix above is not
needed.
diff -r b968ee4f6b4f -r d8ed81d5dc55 xen/common/domctl.c
--- a/xen/common/domctl.c	Mon Dec 17 12:05:18 2007 +0000
+++ b/xen/common/domctl.c	Tue Dec 18 11:45:05 2007 +0000
@@ -491,6 +491,8 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc
         if ( op->u.scheduler_op.cmd == XEN_DOMCTL_SCHEDOP_group ) {
             rcu_unlock_domain(d);
             ret = sched_group_op(&op->u.scheduler_op.u.group);
+            if ( copy_to_guest(u_domctl, op, 1) )
+                ret = -EFAULT;
             break;
         }
         
Otherwise userland doesn't get any value back :)
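The reason for the extra copy is that the handler works on its own copy of the request, so any result fields it fills in are lost unless they are written back to the caller's buffer. The sketch below models this in user space with `memcpy` standing in for the guest copy routines. `struct group_op`, `handle_op` and the `copy_back` switch are hypothetical and only illustrate the in/out behaviour.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, reduced version of the group operation structure. */
struct group_op {
    int id_master;
    int is_member;         /* output field filled in by the handler */
};

/*
 * Stand-in for the hypervisor side: the request is copied in, processed
 * on a private copy, and only reaches the caller if it is copied back.
 */
static int handle_op(struct group_op *guest_buf, int copy_back)
{
    struct group_op local;

    assert(guest_buf != NULL);
    memcpy(&local, guest_buf, sizeof(local));      /* copy in */
    local.is_member = 1;                           /* result of the op */
    if (copy_back)
        memcpy(guest_buf, &local, sizeof(local));  /* copy out */
    return 0;
}
```

Without the copy out, the call still returns 0 but the caller's `is_member` keeps whatever value it had before the call.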
diff -r b968ee4f6b4f -r d8ed81d5dc55 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c	Mon Dec 17 12:05:18 2007 +0000
+++ b/xen/common/sched_credit.c	Tue Dec 18 11:45:05 2007 +0000
@@ -393,10 +393,12 @@ static inline void delegate_active_vcpus
         if ( !list_empty(&member->active_sdom_elem) )
         {
             list_del_init(&member->active_sdom_elem);
+            BUG_ON( csched_priv.weight < member->weight );
             csched_priv.weight -= member->weight;
         }
 
-        if ( list_empty(&master->active_sdom_elem) )
+        if ( !list_empty(&master->active_vcpu) &&
+             list_empty(&master->active_sdom_elem) )
         {
             list_add(&master->active_sdom_elem, &csched_priv.active_sdom);
             csched_priv.weight += master->weight;
@@ -429,6 +431,7 @@ static inline void reclaim_active_vcpus(
             !list_empty(&master->active_sdom_elem) )
         {
             list_del_init(&master->active_sdom_elem);
+            BUG_ON( csched_priv.weight < master->weight );
             csched_priv.weight -= master->weight;
         }
         if ( !list_empty(&member->active_vcpu) &&
@@ -913,7 +916,6 @@ static int csched_group_op(struct xen_do
         member = get_domain_by_id(op->id_member);
         if ( !member )
             goto release_master;
-        ret = 0;
         if ( op->op == SGRP_add_member )
             op->reason =
                 add_sanity_check(csched_dom(member), csched_dom(master));
@@ -922,6 +924,7 @@ static int csched_group_op(struct xen_do
                 rem_sanity_check(csched_dom(member), csched_dom(master));
         if ( op->reason )
             goto release_member;
+        ret = 0;
 
         spin_lock_irqsave(&csched_priv.lock, flags);
         if ( op->op == SGRP_add_member )
@@ -1193,6 +1196,7 @@ csched_acct(void)
         list_for_each_safe( iter_vcpu, next_vcpu, &sdom->active_vcpu )
         {
             svc = list_entry(iter_vcpu, struct csched_vcpu, active_vcpu_elem);
+            BUG_ON( sdom != master_dom(svc->sdom) );
 
             /* Increment credit */
             atomic_add(credit_fair, &svc->credit);
More safety.
diff -r b968ee4f6b4f -r d8ed81d5dc55 xen/include/public/domctl.h
--- a/xen/include/public/domctl.h	Mon Dec 17 12:05:18 2007 +0000
+++ b/xen/include/public/domctl.h	Tue Dec 18 11:45:05 2007 +0000
@@ -333,7 +333,7 @@ struct xen_domctl_scheduler_op {
             uint8_t is_member;
             domid_t id_master;
             domid_t id_member;
-        } group __attribute__ (( aligned ));
+        } group;
     } u;
 };
 typedef struct xen_domctl_scheduler_op xen_domctl_scheduler_op_t;
The aligned attribute is of no use here.
Chris
2007-Dec-18  16:04 UTC
Re: [Xen-devel] Re: Scheduling groups, credit scheduler support
On Dec 14, 2007, at 12:20 PM, Samuel Thibault wrote:

> Yes, some kind of non-schedulable entity which is just here to do what
> Mike's masters do: concentrate scheduling credits.

Yes, precisely. More generally, a place in the VMM to consolidate
information about group-related resources.

> About the userland interface, I can see two approaches:
> - have people explicitly create groups and put domains in them. That
>   can be hierarchical (putting groups into other groups)
> - have groups created and destroyed implicitly, for instance
>   join(d1,d2) will make d1 and d2 part of the same group, which is
>   created if there weren't any previously, or the union of both groups
>   if both existed.

I prefer the former, only without implicit hierarchy. Policy (in the
scheduler, XSM, etc.) can dictate relationships.

> Also, there is the question: can a domain belong to several groups?

My current implementation doesn't allow a domain to be in more than one
group simultaneously, although it's been discussed internally.
Personally, I think it has potential to cause more harm than good, but
this is open to debate.

Cheers,
Chris
Samuel Thibault
2007-Dec-19  16:08 UTC
Re: [Xen-devel] Re: Scheduling groups, credit scheduler support
Samuel Thibault, on Fri 14 Dec 2007 17:36:27 +0000, wrote:
> Samuel Thibault, on Fri 14 Dec 2007 16:26:13 +0000, wrote:
> > I tested groups a bit (the merge goes very fine) but couldn't see a
> > difference,
>
> Oops, scratch that: groups operations were not working.

This time I could really test it. There is no real performance
difference even when dom0 is busy looping. It hence looks like the
boosting feature of the credit scheduler already does a good job.

However, when using the "cap" feature of the credit scheduler, there is
indeed a noticeable difference: the stubdomain CPU time is properly
accounted to the HVM domain's CPU time, and the cap applies to the two
combined.

Now I guess this has to be somehow merged with the other, more
intrusive, group support that was submitted to xen-devel.

Samuel
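The observed cap behaviour amounts to checking the sum of the HVM domain's and the stub domain's CPU usage against the master's cap, rather than each one separately. A minimal sketch of that check, with percentages of one physical CPU as units, is shown below. `group_over_cap` is a hypothetical name, and treating a cap of 0 as "uncapped" follows the credit scheduler's usual convention rather than anything stated in this thread.

```c
#include <assert.h>

/*
 * Hypothetical check: is a group (HVM domain plus its stub domain) over
 * the cap set on the master? All values are percentages of one CPU;
 * cap_pct == 0 is taken to mean "no cap".
 */
static int group_over_cap(unsigned int hvm_pct, unsigned int stub_pct,
                          unsigned int cap_pct)
{
    assert(hvm_pct <= 100 && stub_pct <= 100);
    if (cap_pct == 0)
        return 0;
    return hvm_pct + stub_pct > cap_pct;
}
```

With a 20% cap, an HVM domain at 15% and its stub domain at 10% are together over the cap, even though each is individually under it.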
Mike D. Day
2007-Dec-19  20:18 UTC
[Xen-devel] Re: Scheduling groups, credit scheduler support
On 14/12/07 16:49 +0000, Samuel Thibault wrote:
> Hi,
>
> Mike D. Day, on Tue 04 Dec 2007 08:50:20 -0500, wrote:
> > Credits are "stolen" only when the stub domain runs, so if the hvm
> > domain is doing no I/O then none of its credits go to the stub domain.
>
> Ok, but take for instance the case where vcpu1 of the HVM does a lot of
> I/O while vcpu2 of the HVM prefers to burn CPU. Here we would probably
> like to see the stubdomain take as much CPU time for performing the I/O
> as the vcpu2 has for burning. I'm not sure that your allocation permits
> this.

One simple way to handle this is to increase the weight of the hvm
domain, which gives both the hvm and stub domains more credits. Of
course, that does nothing to favor the stub domain.

The right way to fix this is to allow the hvm domain to defer its
credits to the stub domain. I'll work up a patch.

Mike
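The effect of raising the weight can be illustrated with the proportional-share rule the credit scheduler is built on: a domain's credit per accounting period is its weight's fraction of the total active weight. The sketch below assumes exactly that rule and nothing more. `credit_share` is a hypothetical helper, and the real accounting function also handles caps, idle domains and rounding in ways not shown here.

```c
#include <assert.h>

/*
 * Hypothetical helper: credit given to a domain (or group master) whose
 * weight is `weight`, out of `credit_total`, when the weights of all
 * active domains sum to `weight_total`.
 */
static unsigned int credit_share(unsigned int credit_total,
                                 unsigned int weight,
                                 unsigned int weight_total)
{
    assert(weight_total > 0 && weight <= weight_total);
    return (unsigned int)(((unsigned long long)credit_total * weight)
                          / weight_total);
}
```

With one competing domain at weight 256, raising the HVM master's weight from 256 to 512 moves its share of 300 credits from 150 to 200. That larger amount is then shared by the HVM and stub vcpus alike, which is why it does not specifically favor the stub domain.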