Mike D. Day
2007-Nov-29 20:19 UTC
[Xen-devel] [PATCH] Scheduling groups, credit scheduler support
The credit implementation is limited to sharing time slices among group
members. All members of a group must share the master's time slices with the
master. If the group master is capped at 20%, the cumulative total of the
master and all members will be 20%. This is specifically to support stub
domains, which will host the device model for hvm domains. The proper way to
schedule a stub domain is for it to share the time slices allocated to the
stub's hvm domain.

The credit scheduler is driven by its accounting function. Each domain is
given a credit amount sufficient to run each of its vcpus for one scheduling
cycle. The scheduler divides the total domain credit by the number of vcpus
and allocates each vcpu its share of the domain's credits. A domain with two
vcpus, for example, gives each vcpu half of the domain's credits. The credit
scheduler subtracts credits from each vcpu for every time slice that vcpu
runs. When a vcpu has consumed its credit or exceeded its cap, the credit
scheduler puts that vcpu to sleep. At the beginning of each new scheduling
cycle, sleeping vcpus that have work are awakened and given a new share of
credits.

The credit scheduler runs vcpus, not domains. However, a domain's vcpus are
given time slices according to the credits available to the domain and any
caps placed on the domain. Therefore, the simplest way to group domains
together in the credit scheduler is to assign the member domain's vcpus to
the master domain. Each vcpu assigned to the master domain receives a credit
equal to the master domain's total credit divided by the number of assigned
vcpus. This forces all the member domains to share the master domain's
credits with the master, which achieves the desired behavior.

The primary accounting function in the credit scheduler is unmodified, save
for the removal of one debugging line. All of the group processing is handled
off the fast path. There are no additional locks, and the only new locked
section is the grouping/ungrouping of domains, which happens infrequently.
Although I have yet to run any micro-benchmarks, I anticipate no difference
in the performance of the credit scheduler with these patches applied.

Each struct csched_dom receives five new members: a list_head to hold grouped
domains (for the master), another list_head to place member domains on the
master's list, a pointer to the master domain (for members), and two bool_t
members to hold the domain's grouping state.

Domains are added to a group by the function add_member_to_master. This
routine moves the member domain's vcpus to the master by calling
delegate_active_vcpus. delegate_active_vcpus migrates all the member domain's
active vcpus to the new master. If necessary it then removes the member
domain from the credit scheduler's list of active domains.

When a new vcpu is made active by __csched_vcpu_acct_start, that vcpu is
always added to the domain master if the vcpu belongs to a domain member.
This and an equivalent line in __csched_vcpu_acct_stop_locked comprise the
only new code that executes on the fast path:

static inline struct csched_dom *master_dom(struct csched_dom *d)
{
    if ( d->is_member )
        return d->master;
    return d;
}

When a domain is removed from a group, the inverse occurs. First the former
member domain's vcpus are returned by a call to reclaim_active_vcpus. In
addition to reclaiming the vcpus, the (former) member domain is removed from
the master's list. If it has any active vcpus, the former member is placed on
the credit scheduler's list of active domains.
The remainder of the code handles the sched-group sub op and ensures that a
destroyed domain's grouping properties are properly handled and that vcpus
end up in the right place: either destroyed with their domain or moved back
to the (former) group member which owns them.

Signed-off-by: Mike D. Day <ncmike@us.ibm.com>

--
 sched_credit.c |  267 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 259 insertions(+), 8 deletions(-)

--

diff -r 0bff1fad920a xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Wed May 09 16:41:28 2007 -0400
+++ b/xen/common/sched_credit.c Thu May 10 16:45:21 2007 -0400
@@ -219,10 +219,15 @@ struct csched_dom {
 struct csched_dom {
     struct list_head active_vcpu;
     struct list_head active_sdom_elem;
+    struct list_head group;
+    struct list_head group_elem;
+    struct csched_dom *master;
     struct domain *dom;
     uint16_t active_vcpu_count;
     uint16_t weight;
     uint16_t cap;
+    bool_t is_master;
+    bool_t is_member;
 };
 
 /*
@@ -344,6 +349,118 @@ __runq_tickle(unsigned int cpu, struct c
         cpumask_raise_softirq(mask, SCHEDULE_SOFTIRQ);
 }
 
+static inline struct csched_dom *csched_dom(struct domain *d)
+{
+    return (struct csched_dom *)d->sched_priv;
+}
+
+static inline struct csched_dom *get_master_dom(struct csched_dom *d)
+{
+    if ( d->is_member )
+    {
+        if ( get_domain(d->master->dom) )
+            return d->master;
+        BUG();
+    }
+    return NULL;
+}
+
+static inline struct csched_dom *master_dom(struct csched_dom *d)
+{
+    if ( d->is_member )
+        return d->master;
+    return d;
+}
+
+static inline void delegate_active_vcpus(struct csched_dom *member,
+                                         struct csched_dom *master)
+{
+    BUG_ON( ! ( member->is_member ) );
+    BUG_ON( member->master != master );
+    if ( member->is_member && member->master == master )
+    {
+        struct list_head *elem;
+
+        while ( !list_empty(&member->active_vcpu) )
+        {
+            elem = member->active_vcpu.next;
+            list_del(elem);
+            list_add(elem, &master->active_vcpu);
+            member->active_vcpu_count--;
+            master->active_vcpu_count++;
+        }
+
+        if ( !list_empty(&member->active_sdom_elem) )
+        {
+            list_del_init(&member->active_sdom_elem);
+            csched_priv.weight -= member->weight;
+        }
+
+        if ( list_empty(&master->active_sdom_elem) )
+        {
+            list_add(&master->active_sdom_elem, &csched_priv.active_sdom);
+            csched_priv.weight += master->weight;
+        }
+    }
+}
+
+static inline void reclaim_active_vcpus(struct csched_dom *master,
+                                        struct csched_dom *member)
+{
+    BUG_ON( !master->is_master );
+    BUG_ON( member->master != master );
+    if ( master->is_master && member->master == master )
+    {
+        struct csched_vcpu *iter, *n;
+
+        list_for_each_entry_safe( iter, n, &master->active_vcpu,
+                                  active_vcpu_elem )
+        {
+            if ( iter->sdom == member )
+            {
+                list_del(&iter->active_vcpu_elem);
+                list_add(&iter->active_vcpu_elem, &member->active_vcpu);
+                master->active_vcpu_count--;
+                member->active_vcpu_count++;
+            }
+        }
+
+        if ( list_empty(&master->active_vcpu) &&
+             !list_empty(&master->active_sdom_elem) )
+        {
+            list_del_init(&master->active_sdom_elem);
+            csched_priv.weight -= master->weight;
+        }
+        if ( !list_empty(&member->active_vcpu) &&
+             list_empty(&member->active_sdom_elem) )
+        {
+            list_add(&member->active_sdom_elem, &csched_priv.active_sdom);
+            csched_priv.weight += member->weight;
+        }
+    }
+}
+
+static inline void add_member_to_master(struct csched_dom *member,
+                                        struct csched_dom *master)
+{
+    list_add(&member->group_elem, &master->group);
+    member->master = master;
+    member->is_member = 1;
+    master->is_master = 1;
+    delegate_active_vcpus(member, master);
+}
+
+static inline void rem_member_from_master(struct csched_dom *member,
+                                          struct csched_dom *master)
+{
+    reclaim_active_vcpus(master, member);
+    member->is_member = 0;
+    member->master = NULL;
+    list_del(&member->group_elem);
+    if (list_empty(&master->group))
+        master->is_master = 0;
+}
+
 static int
 csched_pcpu_init(int cpu)
 {
@@ -395,6 +512,17 @@ __csched_vcpu_check(struct vcpu *vc)
     else
     {
         BUG_ON( !is_idle_vcpu(vc) );
+    }
+
+    if ( sdom->is_master )
+    {
+        BUG_ON( list_empty(&sdom->group) );
+        BUG_ON( sdom->is_member );
+    }
+    if ( sdom->is_member )
+    {
+        BUG_ON( list_empty(&sdom->group_elem) );
+        BUG_ON( sdom->is_master );
     }
 
     CSCHED_STAT_CRANK(vcpu_check);
@@ -486,11 +614,11 @@ static inline void
 static inline void
 __csched_vcpu_acct_start(struct csched_vcpu *svc)
 {
-    struct csched_dom * const sdom = svc->sdom;
     unsigned long flags;
-
+    struct csched_dom * sdom;
     spin_lock_irqsave(&csched_priv.lock, flags);
 
+    sdom = master_dom(svc->sdom);
     if ( list_empty(&svc->active_vcpu_elem) )
     {
         CSCHED_VCPU_STAT_CRANK(svc, state_active);
@@ -504,14 +632,13 @@ __csched_vcpu_acct_start(struct csched_v
             csched_priv.weight += sdom->weight;
         }
     }
-
     spin_unlock_irqrestore(&csched_priv.lock, flags);
 }
 
 static inline void
 __csched_vcpu_acct_stop_locked(struct csched_vcpu *svc)
 {
-    struct csched_dom * const sdom = svc->sdom;
+    struct csched_dom * const sdom = master_dom(svc->sdom);
 
     BUG_ON( list_empty(&svc->active_vcpu_elem) );
 
@@ -605,20 +732,34 @@ csched_vcpu_init(struct vcpu *vc)
     return 0;
 }
 
+static void group_cleanup(struct csched_vcpu *svc)
+{
+    if ( svc->sdom->is_member )
+        rem_member_from_master(svc->sdom, master_dom(svc->sdom));
+    if ( svc->sdom->is_master )
+    {
+        struct csched_dom *iter, *n;
+        list_for_each_entry_safe( iter, n, &svc->sdom->group, group_elem )
+        {
+            rem_member_from_master(iter, svc->sdom);
+        }
+    }
+}
+
+
 static void
 csched_vcpu_destroy(struct vcpu *vc)
 {
     struct csched_vcpu * const svc = CSCHED_VCPU(vc);
-    struct csched_dom * const sdom = svc->sdom;
     unsigned long flags;
 
     CSCHED_STAT_CRANK(vcpu_destroy);
 
-    BUG_ON( sdom == NULL );
     BUG_ON( !list_empty(&svc->runq_elem) );
 
     spin_lock_irqsave(&csched_priv.lock, flags);
 
+    group_cleanup(svc);
     if ( !list_empty(&svc->active_vcpu_elem) )
         __csched_vcpu_acct_stop_locked(svc);
 
@@ -697,6 +838,112 @@ csched_vcpu_wake(struct vcpu *vc)
     __runq_tickle(cpu, svc);
 }
 
+static inline int
+_sanity_check(struct csched_dom *member, struct csched_dom *master)
+{
+    if ( member->dom->domain_id == master->dom->domain_id )
+        return SGRP_err_same_id;
+    if ( member->is_master )
+        return SGRP_err_already_master;
+    if ( master->is_member )
+        return SGRP_err_already_member;
+    return 0;
+}
+
+static inline int
+add_sanity_check(struct csched_dom *member, struct csched_dom *master)
+{
+    if ( member->master )
+        return SGRP_err_inval;
+    return _sanity_check(member, master);
+}
+
+static inline int
+rem_sanity_check(struct csched_dom *member, struct csched_dom *master)
+{
+    if ( member->is_member && member->master && member->master == master )
+        return _sanity_check(member, master);
+    return SGRP_err_inval;
+}
+
+static int csched_group_op(struct xen_domctl_group * op)
+{
+    int ret = -EINVAL;
+
+    switch(op->op)
+    {
+    case SGRP_get_status:
+    case SGRP_get_master:
+    {
+        struct domain *dom = get_domain_by_id(op->id_member);
+        if ( dom )
+        {
+            struct csched_dom *cdom = csched_dom(dom);
+            if ( op->op == SGRP_get_status )
+            {
+                op->is_master = cdom->is_master;
+                op->is_member = cdom->is_member;
+            }
+            else
+            {
+                struct csched_dom *master = get_master_dom(cdom);
+                if ( master )
+                {
+                    op->id_master = master->dom->domain_id;
+                    put_domain(master->dom);
+                }
+                else
+                    op->reason = SGRP_err_not_member;
+            }
+            put_domain(dom);
+            ret = 0;
+        }
+        break;
+    }
+
+    case SGRP_add_member:
+    case SGRP_del_member:
+    {
+        struct domain *member, *master;
+        unsigned long flags;
+
+        master = get_domain_by_id(op->id_master);
+        if ( !master )
+            break;
+        member = get_domain_by_id(op->id_member);
+        if ( !member )
+            goto release_master;
+        ret = 0;
+        if ( op->op == SGRP_add_member )
+            op->reason =
+                add_sanity_check(csched_dom(member), csched_dom(master));
+        else
+            op->reason =
+                rem_sanity_check(csched_dom(member), csched_dom(master));
+        if ( op->reason )
+            goto release_member;
+
+        spin_lock_irqsave(&csched_priv.lock, flags);
+        if ( op->op == SGRP_add_member )
+            add_member_to_master(csched_dom(member), csched_dom(master));
+        else
+            rem_member_from_master(csched_dom(member), csched_dom(master));
+        spin_unlock_irqrestore(&csched_priv.lock, flags);
+
+release_member:
+        put_domain(member);
+release_master:
+        put_domain(master);
+
+        break;
+    }
+    default:
+        break;
+    }
+
+    return ret;
+}
+
 static int
 csched_dom_cntl(
     struct domain *d,
@@ -754,10 +1001,14 @@ csched_dom_init(struct domain *dom)
     sdom->active_vcpu_count = 0;
     INIT_LIST_HEAD(&sdom->active_sdom_elem);
     sdom->dom = dom;
+    sdom->master = NULL;
     sdom->weight = CSCHED_DEFAULT_WEIGHT;
     sdom->cap = 0U;
     dom->sched_priv = sdom;
-
+    INIT_LIST_HEAD(&sdom->group);
+    INIT_LIST_HEAD(&sdom->group_elem);
+    sdom->is_master = 0;
+    sdom->is_member = 0;
     return 0;
 }
 
@@ -942,7 +1193,6 @@ csched_acct(void)
         list_for_each_safe( iter_vcpu, next_vcpu, &sdom->active_vcpu )
         {
             svc = list_entry(iter_vcpu, struct csched_vcpu, active_vcpu_elem);
-            BUG_ON( sdom != svc->sdom );
 
             /* Increment credit */
             atomic_add(credit_fair, &svc->credit);
@@ -1384,6 +1634,7 @@ struct scheduler sched_credit_def = {
     .sleep          = csched_vcpu_sleep,
     .wake           = csched_vcpu_wake,
 
+    .group_op       = csched_group_op,
     .adjust         = csched_dom_cntl,
     .pick_cpu       = csched_cpu_pick,

--
Mike D. Day
Virtualization Architect and Sr. Technical Staff Member, IBM LTC
Cell: 919 412-3900
ST: mdday@us.ibm.com | AIM: ncmikeday | Yahoo IM: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc
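To make the split described in the patch note concrete: once a member's vcpus
are delegated, the master's per-cycle credit is divided by the combined vcpu
count. The following stand-alone sketch is purely illustrative (it is not
scheduler code, and the numbers are invented):

/* Illustrative only: the static credit split once member vcpus are
 * counted against the master. Values are made up for the example. */
#include <stdio.h>

int main(void)
{
    int master_credit = 300;  /* credits the master earns per accounting cycle */
    int own_vcpus     = 1;    /* the hvm guest's own vcpu                       */
    int member_vcpus  = 1;    /* vcpu delegated from the stub domain            */

    int total_vcpus = own_vcpus + member_vcpus;
    int credit_fair = master_credit / total_vcpus;

    printf("each active vcpu gets %d credits per cycle\n", credit_fair);
    printf("group total remains %d credits, the master's allocation\n",
           credit_fair * total_vcpus);
    return 0;
}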
Chris B
2007-Nov-29 22:36 UTC
Re: [Xen-devel] [PATCH] Scheduling groups, credit scheduler support
My interpretation of your implementation is that it causes hierarchical
relationships (master/slave) between domains. Every group has one master and
the rest are slaves. The fixed relationship puts implicit limits on the
organization of domains. (See
http://www.hpl.hp.com/techreports/94/HPL-94-104.html, for example.)

Since groups are applicable to more than just scheduling (domain
disaggregation, convenient migration, efficient security policy, etc.), a
more general mechanism would be preferable. I submitted something similar in
the past (Feb 20, 2007). If anyone is interested, I'd be glad to submit a
fresh cut of my patches as an RFC.

-Chris
Chris
2007-Dec-03 15:37 UTC
Re: [Xen-devel] [PATCH] Scheduling groups, credit scheduler support
Mike,

My reading of your scheduling groups implementation is that it induces
hierarchical relationships (master/slave) between domains. That is, every
group has one master and the rest are slaves. Although that implementation
has the advantage of being small, the fixed relationship puts implicit limits
on the organization of domains and the operations that can be applied to
them.

In addition to scheduling, I believe domain groups are applicable to other
areas such as domain disaggregation, convenient migration, efficient security
policy, etc. As such, a non-hierarchical group mechanism is desirable.

I submitted a related domain grouping patch in the past (Feb 20, 2007). If
anyone is interested, I'd be glad to submit a fresh cut of my patches against
the tip as an RFC.

Cheers,
Chris
Mike D. Day
2007-Dec-03 17:59 UTC
[Xen-devel] Re: Scheduling groups, credit scheduler support
On 03/12/07 10:37 -0500, Chris wrote:
> Mike,
>
> My reading of your scheduling groups implementation is that it induces
> hierarchical relationships (master/slave) between domains. That is, every
> group has one master and the rest are slaves. Although that implementation
> has the advantage of being small, the fixed relationship puts implicit
> limits on the organization of domains and the operations that can be
> applied to them.

Hi Chris,

The patches use the term master and member, but it is more of a peer
relationship only affecting scheduler accounting. The significance of the
"master" is that the "member" domains inherit the master's cpu weight and
credits are charged to the master. The master doesn't exert any explicit
control over its group members (although there may be a use case for doing
so). This is the specific functionality we need for stub domains, where the
credits consumed by a stub domain need to be charged to the HVM guest domain.

> In addition to scheduling, I believe domain groups are applicable to other
> areas such as domain disaggregation, convenient migration, efficient
> security policy, etc. As such, a non-hierarchical group mechanism is
> desirable.

The scheduling group is only visible to the credit scheduler, and there is no
meaning outside of the scheduler's process accounting. I didn't want to
modify the Domain structures, and it wasn't necessary to get the desired
scheduling behavior. I think other types of groups may be useful, and it
would be great to see your patches again.

Mike

--
Mike D. Day
IBM LTC
Cell: 919 412-3900
Sametime: ncmike@us.ibm.com
AIM: ncmikeday
Yahoo: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc
Keir Fraser
2007-Dec-04 10:38 UTC
Re: [Xen-devel] [PATCH] Scheduling groups, credit scheduler support
On 29/11/07 20:19, "Mike D. Day" <ncmike@us.ibm.com> wrote:

> The credit implementation is limited to sharing time slices among
> group members. All members of a group must share the master's time
> slices with the master. If the group master is capped at 20%, the
> cumulative total of the master and all members will be 20%. This is
> specifically to support stub domains, which will host the device model
> for hvm domains. The proper way to schedule a stub domain is for it to
> share the time slices allocated to the stub's hvm domain.

It's good to see these patches get aired again. I hope we can get some
integration with the ongoing stub domain work and get some numbers out to
prove better scalability and QoS. This would ease the passage of scheduler
group support into xen-unstable.

> The credit scheduler runs vcpus, not domains. However, a domain's
> vcpus are given time slices according to the credits available to the
> domain and any caps placed on the domain. Therefore, the simplest way
> to group domains together in the credit scheduler is to assign the
> member domain's vcpus to the master domain. Each vcpu assigned to the
> master domain receives a credit equal to the master domain's total
> credit divided by the number of assigned vcpus. This forces all the
> member domains to share the master domain's credits with the master,
> which achieves the desired behavior.

Is this the right thing to do? I would think that the desired behaviour is
for each 'master vcpu' to freely share its credit allocation with its 'buddy
vcpus'. The static N-way split of credits across vcpus within a domain makes
some kind of sense, since the vcpus are each equally important and each
independent of each other. Statically splitting credits between e.g., HVM
guest domain and its stub domain makes less sense. One is subordinate to the
other, and a model where the stub can 'steal' credits dynamically from the
HVM domain seems to make more sense. Otherwise, wouldn't a uniprocessor HVM
guest get half its credit stolen by the uniprocessor stub domain, even if
the HVM guest is doing no I/O?

Perhaps I misunderstand. :-)

 -- Keir
Mike D. Day
2007-Dec-04 13:50 UTC
[Xen-devel] Re: Scheduling groups, credit scheduler support
On 04/12/07 10:38 +0000, Keir Fraser wrote:
> On 29/11/07 20:19, "Mike D. Day" <ncmike@us.ibm.com> wrote:
>
>> The credit implementation is limited to sharing time slices among
>> group members. All members of a group must share the master's time
>> slices with the master. If the group master is capped at 20%, the
>> cumulative total of the master and all members will be 20%. This is
>> specifically to support stub domains, which will host the device model
>> for hvm domains. The proper way to schedule a stub domain is for it to
>> share the time slices allocated to the stub's hvm domain.
>
> It's good to see these patches get aired again. I hope we can get some
> integration with the ongoing stub domain work and get some numbers out to
> prove better scalability and QoS. This would ease the passage of scheduler
> group support into xen-unstable.

I previously published some benchmarks:

http://article.gmane.org/gmane.comp.emulators.xen.devel/39818/

>> The credit scheduler runs vcpus, not domains. However, a domain's
>> vcpus are given time slices according to the credits available to the
>> domain and any caps placed on the domain. Therefore, the simplest way
>> to group domains together in the credit scheduler is to assign the
>> member domain's vcpus to the master domain. Each vcpu assigned to the
>> master domain receives a credit equal to the master domain's total
>> credit divided by the number of assigned vcpus. This forces all the
>> member domains to share the master domain's credits with the master,
>> which achieves the desired behavior.
>
> Is this the right thing to do? I would think that the desired behaviour is
> for each 'master vcpu' to freely share its credit allocation with its 'buddy
> vcpus'. The static N-way split of credits across vcpus within a domain makes
> some kind of sense, since the vcpus are each equally important and each
> independent of each other.

This is what happens with the patch today. In fact, the code that allocates
credits is untouched by the patch. The difference is that the active vcpus of
the stub domain are transferred for accounting purposes to the hvm domain. So
whenever the stub domain runs, those credits are decremented from the hvm
domain. If the stub domain doesn't run, no hvm credits are decremented.

> Statically splitting credits between e.g., HVM
> guest domain and its stub domain makes less sense. One is subordinate to the
> other, and a model where the stub can 'steal' credits dynamically from the
> HVM domain seems to make more sense. Otherwise, wouldn't a uniprocessor HVM
> guest get half its credit stolen by the uniprocessor stub domain, even if
> the HVM guest is doing no I/O?

Credits are "stolen" only when the stub domain runs, so if the hvm domain is
doing no I/O then none of its credits go to the stub domain.

Mike

--
Mike D. Day
IBM LTC
Cell: 919 412-3900
Sametime: ncmike@us.ibm.com
AIM: ncmikeday
Yahoo: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc
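A minimal sketch of the charge-on-run behaviour Mike describes; the types,
constants and master_of() helper below are invented for the illustration and
only mirror the role master_dom() plays in the patch:

/* Sketch only: each tick is debited from the accounting (master) domain of
 * whichever vcpu actually ran, so an idle stub costs the hvm guest nothing. */
#include <stdio.h>

#define CREDITS_PER_TICK 10

struct dom { const char *name; int credit; struct dom *master; };

static struct dom *master_of(struct dom *d)
{
    return d->master ? d->master : d;   /* members charge their master */
}

static void burn_tick(struct dom *running)
{
    master_of(running)->credit -= CREDITS_PER_TICK;
}

int main(void)
{
    struct dom hvm  = { "hvm",  300, NULL };
    struct dom stub = { "stub",   0, &hvm };    /* grouped under the hvm guest */

    for (int t = 0; t < 10; t++) burn_tick(&hvm);   /* guest computes          */
    for (int t = 0; t <  2; t++) burn_tick(&stub);  /* guest does I/O via stub */

    printf("hvm credits left: %d\n", hvm.credit);   /* 300 - 12*10 = 180       */
    return 0;
}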
Chris
2007-Dec-04 19:32 UTC
Re: [Xen-devel] Re: Scheduling groups, credit scheduler support
First let me restate that this is a great use for groups.

> The patches use the term master and member, but it is more of a peer
> relationship only affecting scheduler accounting. The significance of the
> "master" is that the "member" domains inherit the master's cpu weight and
> credits are charged to the master. The master doesn't exert any explicit
> control over its group members (although there may be a use case for doing
> so). This is the specific functionality we need for stub domains, where
> the credits consumed by a stub domain need to be charged to the HVM guest
> domain.

A primary concern is that the approach potentially precludes the use of
VMM-based grouping information in other situations where groups are useful,
because the group abstraction exists only in the scheduler.

Also, even though it's currently just used for accounting, group membership
information is effectively attached to a single domain. Assuming I've read
correctly, when the master domain goes away, so does the membership
information. That's probably OK for HVM stub domains, but what if the domains
are peers as in the dom0 disaggregation case or (thinking even further ahead)
in the general domain decomposition case?

One way to avoid both concerns is to create and manage group-tracking objects
independently of domain-tracking objects. In other words, make groups a
first-class object. They could be referenced by schedulers as well as any
other parts of the VMM that want to make use of group information.

Regards,
Chris

P.S. - A refresh of my previous group implementation is coming RSN. I'm
testing to make sure it still works as intended.
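As an illustration of what such a first-class group object could look like,
here is a rough sketch; every field and name below is an assumption made for
discussion and is not taken from either patch series:

/* Illustrative sketch only -- not code from any posted patch. */
struct domain_group {
    uint16_t         group_id;      /* hypothetical group identifier      */
    struct list_head member_list;   /* domains linked onto the group      */
    uint16_t         member_count;
    spinlock_t       lock;          /* protects membership changes        */
    void            *sched_priv;    /* per-scheduler accounting state     */
    void            *ssid;          /* hook for security (XSM) policy     */
};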
Keir Fraser
2007-Dec-04 23:04 UTC
Re: [Xen-devel] Re: Scheduling groups, credit scheduler support
On 4/12/07 19:32, "Chris" <hap10@tycho.ncsc.mil> wrote:

> Also, even though it's currently just used for accounting, group
> membership information is effectively attached to a single domain.
> Assuming I've read correctly, when the master domain goes away, so
> does the membership information. That's probably OK for HVM stub
> domains, but what if the domains are peers as in the dom0
> disaggregation case or (thinking even further ahead) in the general
> domain decomposition case?

How would the disaggregated-dom0 domains be peers in anything other than a
conceptual boxes-on-a-powerpoint-slide sense?

 -- Keir
Keir Fraser
2007-Dec-04 23:06 UTC
[Xen-devel] Re: Scheduling groups, credit scheduler support
On 4/12/07 13:50, "Mike D. Day" <ncmike@us.ibm.com> wrote:

>> It's good to see these patches get aired again. I hope we can get some
>> integration with the ongoing stub domain work and get some numbers out to
>> prove better scalability and QoS. This would ease the passage of scheduler
>> group support into xen-unstable.
>
> I previously published some benchmarks:
>
> http://article.gmane.org/gmane.comp.emulators.xen.devel/39818/

Not slowing down microbenchmarks is the least we should expect for
acceptance. The feature needs to earn its keep in the tree by demonstrating
superior performance or scalability in a situation we care about. Like for
HVM stub domains. :-)

 -- Keir
tgh
2007-Dec-05 01:34 UTC
[Xen-devel] does xen-linux for PV support the Linux Standards base , or not ?
hi

Does xen-linux for PV support the Linux Standard Base, or not? If it does,
which version does it support? And what about POSIX? Does it support POSIX,
and which version does it support?

Thanks
Mark Williamson
2007-Dec-05 03:49 UTC
Re: [Xen-devel] does xen-linux for PV support the Linux Standards base , or not ?
> Does xen-linux for PV support the Linux Standard Base, or not? If it does,
> which version does it support? And what about POSIX? Does it support
> POSIX, and which version does it support?

Whether LSB and/or POSIX support is provided is mostly (or maybe entirely?)
up to the distribution. I'm not familiar with the details of either standard,
but I'm not aware of anything in PV Linux that breaks them.

Much of LSB is about what software is available on the system, how the
filesystem is laid out, etc, which is not changed under PV. If it places any
restrictions on kernel version, I guess that could be a problem.

Much of POSIX is focused on the interface to userspace apps and to the users
themselves - again, the use of the PV kernel shouldn't affect these
significantly.

In summary: I wouldn't expect PV to make much difference to either, but I
can't rule out it breaking some rule in those standards.

Cheers,
Mark

--
Dave: Just a question. What use is a unicycle with no seat?  And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!
Chris
2007-Dec-06 16:42 UTC
Re: [Xen-devel] Re: Scheduling groups, credit scheduler support
Hi Keir,

> How would the disaggregated-dom0 domains be peers in anything other than a
> conceptual boxes-on-a-powerpoint-slide sense?

At the most recent Xen Summit, Derek announced work to remove domain builder
from dom0. Other projects removed Xenstore from dom0. Of course we can also
remove device drivers from dom0 (although today they still require some
attention from dom0). In certain combinations, I consider all of these
decomposed components of dom0 to be peers -- in the scheduling sense and in
terms of security policy and management operations.

In some cases (xenstore, device driver domains, domain builder, etc.),
service domains don't have to be slave to a single master and can operate in
a more generic client/server model. Further, I expect there to be less of a
distinction between dom0 and domU components as decomposition progresses, so
it's not fair to limit the discussion to existing dom0 components.

To rephrase, one part of my perspective is that a one-master/many-slaves
model doesn't capture all of the possible relationships between domains. The
other part is that group information can be useful to parts of the VMM in
addition to the scheduler.

Cheers,
Chris
Mike D. Day
2007-Dec-14 13:35 UTC
[Xen-devel] Re: Scheduling groups, credit scheduler support
On 04/12/07 23:06 +0000, Keir Fraser wrote:
> On 4/12/07 13:50, "Mike D. Day" <ncmike@us.ibm.com> wrote:
>
> >> It's good to see these patches get aired again. I hope we can get some
> >> integration with the ongoing stub domain work and get some numbers out to
> >> prove better scalability and QoS. This would ease the passage of scheduler
> >> group support into xen-unstable.
> >
> > I previously published some benchmarks:
> >
> > http://article.gmane.org/gmane.comp.emulators.xen.devel/39818/
>
> Not slowing down microbenchmarks is the least we should expect for
> acceptance. The feature needs to earn its keep in the tree by demonstrating
> superior performance or scalability in a situation we care about. Like for
> HVM stub domains. :-)

Yes of course. But it also must not slow normal scheduling, which is the
point of these benchmarks. As soon as the stub domain is ready for testing
I'll start performance work on hvm domains.

Mike

--
Mike D. Day
IBM LTC
Cell: 919 412-3900
Sametime: ncmike@us.ibm.com
AIM: ncmikeday
Yahoo: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc
Keir Fraser
2007-Dec-14 13:50 UTC
[Xen-devel] Re: Scheduling groups, credit scheduler support
On 14/12/07 13:35, "Mike D. Day" <ncmike@us.ibm.com> wrote:

>> Not slowing down microbenchmarks is the least we should expect for
>> acceptance. The feature needs to earn its keep in the tree by demonstrating
>> superior performance or scalability in a situation we care about. Like for
>> HVM stub domains. :-)
>
> Yes of course. But it also must not slow normal scheduling, which is
> the point of these benchmarks. As soon as the stub domain is ready for
> testing I'll start performance work on hvm domains.

Sounds good!

 K.
Samuel Thibault
2007-Dec-14 16:26 UTC
Re: [Xen-devel] Re: Scheduling groups, credit scheduler support
Mike D. Day, on Fri 14 Dec 2007 08:35:36 -0500, wrote:
> On 04/12/07 23:06 +0000, Keir Fraser wrote:
> > On 4/12/07 13:50, "Mike D. Day" <ncmike@us.ibm.com> wrote:
> >
> > >> It's good to see these patches get aired again. I hope we can get some
> > >> integration with the ongoing stub domain work and get some numbers out to
> > >> prove better scalability and QoS. This would ease the passage of scheduler
> > >> group support into xen-unstable.
> > >
> > > I previously published some benchmarks:
> > >
> > > http://article.gmane.org/gmane.comp.emulators.xen.devel/39818/
> >
> > Not slowing down microbenchmarks is the least we should expect for
> > acceptance. The feature needs to earn its keep in the tree by demonstrating
> > superior performance or scalability in a situation we care about. Like for
> > HVM stub domains. :-)
>
> Yes of course. But it also must not slow normal scheduling, which is
> the point of these benchmarks. As soon as the stub domain is ready for
> testing I'll start performance work on hvm domains.

It is available on http://xenbits.xensource.com/ext/xen-minios-stubdom.hg

I tested groups a bit (the merge goes very fine) but couldn't see a
difference, probably because my test case is very limited (just a CPU burner
in dom0), compared to the scheduling boost of the credit scheduler.

Samuel
Samuel Thibault
2007-Dec-14 16:49 UTC
Re: [Xen-devel] Re: Scheduling groups, credit scheduler support
Hi,

Mike D. Day, on Tue 04 Dec 2007 08:50:20 -0500, wrote:
> Credits are "stolen" only when the stub domain runs, so if the hvm
> domain is doing no I/O then none of its credits go to the stub domain.

Ok, but take for instance the case where vcpu1 of the HVM does a lot of I/O
while vcpu2 of the HVM prefers to burn CPU. Here we would probably like to
see the stub domain take as much CPU time for performing the I/O as vcpu2
has for burning. I'm not sure that your allocation permits this.

Samuel
Samuel Thibault
2007-Dec-14 17:01 UTC
Re: [Xen-devel] [PATCH] Scheduling groups, credit scheduler support
Mike D. Day, on Thu 29 Nov 2007 15:19:59 -0500, wrote:
> +static inline struct csched_dom *get_master_dom(struct csched_dom *d)
> +{
> +    if ( d->is_member )
> +    {
> +        if ( get_domain(d->master->dom) )
> +            return d->master;
> +        BUG();
> +    }
> +    return NULL;
> +}
> +
> +static inline struct csched_dom *master_dom(struct csched_dom *d)
> +{
> +    if ( d->is_member )
> +        return d->master;
> +    return d;
> +}
> +
> +static inline void rem_member_from_master(struct csched_dom *member,
> +                                          struct csched_dom *master)
> +{
> +    reclaim_active_vcpus(master, member);
> +    member->is_member = 0;
> +    member->master = NULL;
> +    list_del(&member->group_elem);
> +    if (list_empty(&master->group))
> +        master->is_master = 0;
> +}

Mmm, isn't there a race condition between these, if somebody removes a member
in the middle of somebody else calling master_dom() or get_master_dom()?

Samuel
Samuel Thibault
2007-Dec-14 17:20 UTC
Re: [Xen-devel] Re: Scheduling groups, credit scheduler support
Chris, on Tue 04 Dec 2007 14:32:11 -0500, wrote:
> One way to avoid both concerns is to create and manage group-tracking
> objects independently of domain-tracking objects. In other words,
> make groups a first-class object. They could be referenced by
> schedulers as well as any other parts of the VMM that want to make
> use of group information.

Yes, some kind of non-schedulable entity which is just here to do what Mike's
masters do: concentrate scheduling credits.

About the userland interface, I can see two approaches (a rough sketch of
both follows below):
- have people explicitly create groups and put domains in them. That can
  be hierarchical (putting groups into other groups)
- have groups created and destroyed implicitly, for instance join(d1,d2)
  will make d1 and d2 part of the same group, which is created if there
  wasn't any previously, or the union of both groups if both existed.
The second approach seems fun, but I'm not sure it might ever be useful
actually :)

Also, there is the question: can a domain belong to several groups? Depending
on the point of view, that may be useful or just not make any sense. One
problem of belonging to several groups is that you end up with a graph of
domains, which may be tedious and potentially non-polynomial to walk.

Samuel
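Purely for illustration, the two userland styles above might surface as
interfaces roughly like the following; none of these names exist in libxc or
in any posted patch:

/* Hypothetical interface sketch for the two approaches described above. */
typedef unsigned short hyp_domid_t;   /* invented for the sketch */
typedef unsigned short hyp_grpid_t;   /* invented for the sketch */

/* 1. Explicit groups: create a group object, then add or remove domains. */
int grp_create(hyp_grpid_t *new_group);
int grp_add_domain(hyp_grpid_t group, hyp_domid_t dom);
int grp_del_domain(hyp_grpid_t group, hyp_domid_t dom);

/* 2. Implicit groups: joining two domains creates or merges their groups. */
int grp_join(hyp_domid_t d1, hyp_domid_t d2);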
Samuel Thibault
2007-Dec-14 17:36 UTC
Re: [Xen-devel] Re: Scheduling groups, credit scheduler support
Samuel Thibault, on Fri 14 Dec 2007 16:26:13 +0000, wrote:
> I tested groups a bit (the merge goes very fine) but couldn't see a
> difference,

Oops, scratch that: group operations were not working.

Samuel
Samuel Thibault
2007-Dec-17 16:57 UTC
Re: [Xen-devel] [PATCH] Scheduling groups, credit scheduler support
Samuel Thibault, on Fri 14 Dec 2007 17:01:28 +0000, wrote:
> Mike D. Day, on Thu 29 Nov 2007 15:19:59 -0500, wrote:
> > +static inline struct csched_dom *get_master_dom(struct csched_dom *d)
> > +{
> > +    if ( d->is_member )
> > +    {
> > +        if ( get_domain(d->master->dom) )
> > +            return d->master;
> > +        BUG();
> > +    }
> > +    return NULL;
> > +}
> > +
> > +static inline void rem_member_from_master(struct csched_dom *member,
> > +                                          struct csched_dom *master)
> > +{
> > +    reclaim_active_vcpus(master, member);
> > +    member->is_member = 0;
> > +    member->master = NULL;
> > +    list_del(&member->group_elem);
> > +    if (list_empty(&master->group))
> > +        master->is_master = 0;
> > +}
>
> Mmm, isn't there a race condition between these, if somebody removes a
> member in the middle of somebody else calling master_dom() or
> get_master_dom()?

More precisely, there is one with SGRP_get_master, which doesn't take the
global scheduler lock before calling get_master_dom().

Samuel
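One conceivable shape for a fix, sketched here only as an illustration (no
such patch was posted in this thread): take csched_priv.lock around the
membership lookup so a concurrent rem_member_from_master() cannot clear
d->master between the is_member test and the dereference.

/* Illustrative sketch, not a posted patch: serialize the SGRP_get_master
 * lookup against grouping/ungrouping via the scheduler's global lock. */
    case SGRP_get_master:
    {
        struct domain *dom = get_domain_by_id(op->id_member);
        if ( dom )
        {
            struct csched_dom *master;
            unsigned long flags;

            spin_lock_irqsave(&csched_priv.lock, flags);
            master = get_master_dom(csched_dom(dom));  /* takes a reference */
            spin_unlock_irqrestore(&csched_priv.lock, flags);

            if ( master )
            {
                op->id_master = master->dom->domain_id;
                put_domain(master->dom);
            }
            else
                op->reason = SGRP_err_not_member;

            put_domain(dom);
            ret = 0;
        }
        break;
    }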
Samuel Thibault
2007-Dec-18 11:50 UTC
Re: [Xen-devel] [PATCH] Scheduling groups, credit scheduler support
Hi,

Here is a bunch of other fixes I had to use.

Samuel

diff -r b968ee4f6b4f -r d8ed81d5dc55 tools/libxc/xc_domain.c
--- a/tools/libxc/xc_domain.c Mon Dec 17 12:05:18 2007 +0000
+++ b/tools/libxc/xc_domain.c Tue Dec 18 11:45:05 2007 +0000
@@ -938,6 +938,7 @@ int xc_group_get_status(int handle, stru
     domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_group;
     domctl.u.scheduler_op.u.group.op = SGRP_get_status;
     domctl.u.scheduler_op.u.group.id_master = group->id_master;
+    domctl.u.scheduler_op.u.group.id_member = group->id_master;
 
     ret = do_domctl(handle, &domctl);
     if ( ret == 0 )

In xen's code, id_member is used for both get_status and get_master. The code
should probably be reworked there so that the fix above is not needed.

diff -r b968ee4f6b4f -r d8ed81d5dc55 xen/common/domctl.c
--- a/xen/common/domctl.c Mon Dec 17 12:05:18 2007 +0000
+++ b/xen/common/domctl.c Tue Dec 18 11:45:05 2007 +0000
@@ -491,6 +491,8 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc
         if ( op->u.scheduler_op.cmd == XEN_DOMCTL_SCHEDOP_group ) {
             rcu_unlock_domain(d);
             ret = sched_group_op(&op->u.scheduler_op.u.group);
+            if ( copy_to_guest(u_domctl, op, 1) )
+                ret = -EFAULT;
             break;
         }

Else the userland doesn't get any value :)

diff -r b968ee4f6b4f -r d8ed81d5dc55 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Mon Dec 17 12:05:18 2007 +0000
+++ b/xen/common/sched_credit.c Tue Dec 18 11:45:05 2007 +0000
@@ -393,10 +393,12 @@ static inline void delegate_active_vcpus
         if ( !list_empty(&member->active_sdom_elem) )
         {
             list_del_init(&member->active_sdom_elem);
+            BUG_ON( csched_priv.weight < member->weight );
             csched_priv.weight -= member->weight;
         }
 
-        if ( list_empty(&master->active_sdom_elem) )
+        if ( !list_empty(&master->active_vcpu) &&
+             list_empty(&master->active_sdom_elem) )
         {
             list_add(&master->active_sdom_elem, &csched_priv.active_sdom);
             csched_priv.weight += master->weight;
@@ -429,6 +431,7 @@ static inline void reclaim_active_vcpus(
              !list_empty(&master->active_sdom_elem) )
         {
             list_del_init(&master->active_sdom_elem);
+            BUG_ON( csched_priv.weight < master->weight );
             csched_priv.weight -= master->weight;
         }
         if ( !list_empty(&member->active_vcpu) &&
@@ -913,7 +916,6 @@ static int csched_group_op(struct xen_do
         member = get_domain_by_id(op->id_member);
         if ( !member )
             goto release_master;
-        ret = 0;
         if ( op->op == SGRP_add_member )
             op->reason =
                 add_sanity_check(csched_dom(member), csched_dom(master));
@@ -922,6 +924,7 @@ static int csched_group_op(struct xen_do
                 rem_sanity_check(csched_dom(member), csched_dom(master));
         if ( op->reason )
             goto release_member;
+        ret = 0;
 
         spin_lock_irqsave(&csched_priv.lock, flags);
         if ( op->op == SGRP_add_member )
@@ -1193,6 +1196,7 @@ csched_acct(void)
         list_for_each_safe( iter_vcpu, next_vcpu, &sdom->active_vcpu )
         {
             svc = list_entry(iter_vcpu, struct csched_vcpu, active_vcpu_elem);
+            BUG_ON( sdom != master_dom(svc->sdom) );
 
             /* Increment credit */
             atomic_add(credit_fair, &svc->credit);

More safety.

diff -r b968ee4f6b4f -r d8ed81d5dc55 xen/include/public/domctl.h
--- a/xen/include/public/domctl.h Mon Dec 17 12:05:18 2007 +0000
+++ b/xen/include/public/domctl.h Tue Dec 18 11:45:05 2007 +0000
@@ -333,7 +333,7 @@ struct xen_domctl_scheduler_op {
             uint8_t is_member;
             domid_t id_master;
             domid_t id_member;
-        } group __attribute__ (( aligned ));
+        } group;
     } u;
 };
 typedef struct xen_domctl_scheduler_op xen_domctl_scheduler_op_t;

Aligned is of no use.
Chris
2007-Dec-18 16:04 UTC
Re: [Xen-devel] Re: Scheduling groups, credit scheduler support
On Dec 14, 2007, at 12:20 PM, Samuel Thibault wrote:
> Yes, some kind of non-schedulable entity which is just here to do what
> Mike's masters do: concentrate scheduling credits.

Yes, precisely. More generally, a place in the VMM to consolidate information
about group-related resources.

> About the userland interface, I can see two approaches:
> - have people explicitly create groups and put domains in them. That can
>   be hierarchical (putting groups into other groups)
> - have groups created and destroyed implicitly, for instance join(d1,d2)
>   will make d1 and d2 part of the same group, which is created if there
>   wasn't any previously, or the union of both groups if both existed.

I prefer the former, only without implicit hierarchy. Policy (in the
scheduler, XSM, etc.) can dictate relationships.

> Also, there is the question: can a domain belong to several groups?

My current implementation doesn't allow a domain to be in more than one group
simultaneously, although it's been discussed internally. Personally, I think
it has potential to cause more harm than good, but this is open to debate.

Cheers,
Chris
Samuel Thibault
2007-Dec-19 16:08 UTC
Re: [Xen-devel] Re: Scheduling groups, credit scheduler support
Samuel Thibault, on Fri 14 Dec 2007 17:36:27 +0000, wrote:
> Samuel Thibault, on Fri 14 Dec 2007 16:26:13 +0000, wrote:
> > I tested groups a bit (the merge goes very fine) but couldn't see a
> > difference,
>
> Oops, scratch that: group operations were not working.

This time I could really test it. There is no real performance difference
even when dom0 is busy looping. It hence looks like the boosting feature of
the credit scheduler already does a good job.

However, when using the "cap" feature of the credit scheduler, there is
indeed a noticeable difference: the stubdomain cpu time properly gets
accounted in the HVM cpu time, and the cap does have an effect on the whole
of the two.

Now I guess this has to be somehow merged with the other, more intrusive,
group support that got submitted to xen-devel.

Samuel
Mike D. Day
2007-Dec-19 20:18 UTC
[Xen-devel] Re: Scheduling groups, credit scheduler support
On 14/12/07 16:49 +0000, Samuel Thibault wrote:
> Hi,
>
> Mike D. Day, on Tue 04 Dec 2007 08:50:20 -0500, wrote:
> > Credits are "stolen" only when the stub domain runs, so if the hvm
> > domain is doing no I/O then none of its credits go to the stub domain.
>
> Ok, but take for instance the case where vcpu1 of the HVM does a lot of
> I/O while vcpu2 of the HVM prefers to burn CPU. Here we would probably
> like to see the stub domain take as much CPU time for performing the I/O
> as vcpu2 has for burning. I'm not sure that your allocation permits this.

One simple way to handle this is to increase the weight of the hvm domain,
which will allow both hvm and stub more credits. Of course, that does nothing
to favor the stub domain. The right way to fix this is to allow the hvm
domain to defer its credits to the stub domain. I'll work up a patch.

Mike

--
Mike D. Day
IBM LTC
Cell: 919 412-3900
Sametime: ncmike@us.ibm.com
AIM: ncmikeday
Yahoo: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc
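No credit-deferral patch appears later in this thread; purely as a sketch of
the idea, deferral could mean letting a member vcpu top itself up from a pool
the master leaves undistributed, instead of being limited to a fixed 1/N
share. Everything below is invented for the illustration:

/* Invented sketch of "credit deferral": a member vcpu draws from a pool the
 * master leaves undistributed, so it only consumes what it actually needs. */
struct credit_pool { int shared_credit; };

/* Grant up to 'want' credits from the master's pool; the caller spends them. */
static int take_credit(struct credit_pool *master_pool, int want)
{
    int grant = (want <= master_pool->shared_credit)
                    ? want : master_pool->shared_credit;
    master_pool->shared_credit -= grant;
    return grant;
}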