Ian Campbell
2011-Nov-23 16:24 UTC
Re: [RFC/RFT][PATCH 1 of 3] Move locking into pluggable schedulers.
On Wed, 2011-11-23 at 15:07 +0000, Dario Faggioli wrote:
> Pluggable schedulers' code knows what (and when) to lock better than
> generic code, let's delegate to them all the concurrency related issues.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>
> diff -r 84b3e46aa7d2 xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c Wed Nov 23 12:03:37 2011 +0000
> +++ b/xen/common/sched_credit.c Wed Nov 23 15:09:14 2011 +0100
> @@ -161,6 +161,7 @@ struct csched_dom {
>   * System-wide private data
>   */
>  struct csched_private {
> +    /* lock for the whole pluggable scheduler, nests inside cpupool_lock */
>      spinlock_t lock;
>      struct list_head active_sdom;
>      uint32_t ncpus;

Given that every scheduler plugin is going to need this lock, perhaps it
could be provided globally (but still have the responsibility for using
it fall on the plugin)?

I was mainly thinking of the sedf case, where you add and maintain a
whole structure for just that lock. Perhaps you have future plans which
involve having to do that anyway, in which case maybe my suggestion
doesn't make sense.

Ian.
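(To make the suggestion concrete: a minimal sketch of what "provided
globally but used by the plugin" could look like. The names
scheduler_instance and example_dom_cntl are made up for illustration;
this is not code from any posted patch.)

    /* Hypothetical: generic code would own the lock next to the per-pool
     * scheduler instance; it would still nest inside cpupool_lock. */
    struct scheduler_instance {
        const struct scheduler *ops;   /* the plugin bound to this pool */
        spinlock_t lock;               /* allocated by generic code */
    };

    /* Each plugin would still decide when to take it, e.g. in its
     * adjust/dom_cntl hook: */
    static int example_dom_cntl(struct scheduler_instance *si)
    {
        unsigned long flags;

        spin_lock_irqsave(&si->lock, flags);
        /* ... read or update per-domain scheduling parameters ... */
        spin_unlock_irqrestore(&si->lock, flags);

        return 0;
    }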
Dario Faggioli
2011-Nov-23 17:09 UTC
Re: [RFC/RFT][PATCH 1 of 3] Move locking into pluggable schedulers.
On Wed, 2011-11-23 at 16:24 +0000, Ian Campbell wrote:
> >  struct csched_private {
> > +    /* lock for the whole pluggable scheduler, nests inside cpupool_lock */
> >      spinlock_t lock;
> >      struct list_head active_sdom;
> >      uint32_t ncpus;
>
> Given that every scheduler plugin is going to need this lock, perhaps it
> could be provided globally (but still have the responsibility for using
> it fall on the plugin)?
>
Makes sense to me, and it should be something pretty easy to do, if you
were thinking of just moving the lock to general code. I'm saying this
because both credit and credit2 have much more payload in their `struct
csched_private', and if we also wanted to get rid of the struct for them,
that would be a different story! :-)

> I was mainly thinking of the sedf case, where you add and maintain a
> whole structure for just that lock. Perhaps you have future plans which
> involve having to do that anyway, in which case maybe my suggestion
> doesn't make sense.
>
I know, and I agree: that 1-field struct is as ugly as hell! The reasons
I went for it are homogeneity with the current code of all the schedulers
and, yes, also what you're saying above, i.e., it might turn out useful
in future to have some scheduler-wide repository in sedf, as there is now
for credit*. But no, I don't have _specific_ plans yet, so your comment
does make sense.

Anyway, even if we stay with what's in this patch, I think this at least
needs some commenting...

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)
Ian Campbell
2011-Nov-23 17:30 UTC
Re: [RFC/RFT][PATCH 1 of 3] Move locking into pluggable schedulers.
On Wed, 2011-11-23 at 17:09 +0000, Dario Faggioli wrote:
> On Wed, 2011-11-23 at 16:24 +0000, Ian Campbell wrote:
> > >  struct csched_private {
> > > +    /* lock for the whole pluggable scheduler, nests inside cpupool_lock */
> > >      spinlock_t lock;
> > >      struct list_head active_sdom;
> > >      uint32_t ncpus;
> >
> > Given that every scheduler plugin is going to need this lock, perhaps it
> > could be provided globally (but still have the responsibility for using
> > it fall on the plugin)?
> >
> Makes sense to me, and it should be something pretty easy to do, if you
> were thinking of just moving the lock to general code. I'm saying this
> because both credit and credit2 have much more payload in their `struct
> csched_private', and if we also wanted to get rid of the struct for them,
> that would be a different story! :-)

No, I just meant the lock.

> > I was mainly thinking of the sedf case, where you add and maintain a
> > whole structure for just that lock. Perhaps you have future plans which
> > involve having to do that anyway, in which case maybe my suggestion
> > doesn't make sense.
> >
> I know, and I agree: that 1-field struct is as ugly as hell! The reasons
> I went for it are homogeneity with the current code of all the schedulers
> and, yes, also what you're saying above, i.e., it might turn out useful
> in future to have some scheduler-wide repository in sedf, as there is now
> for credit*. But no, I don't have _specific_ plans yet, so your comment
> does make sense.
>
> Anyway, even if we stay with what's in this patch, I think this at least
> needs some commenting...
>
> Thanks and Regards,
> Dario
George Dunlap
2011-Dec-06 10:34 UTC
Re: [RFC/RFT][PATCH 1 of 3] Move locking into pluggable schedulers.
On Wed, 2011-11-23 at 16:24 +0000, Ian Campbell wrote:
> On Wed, 2011-11-23 at 15:07 +0000, Dario Faggioli wrote:
> > Pluggable schedulers' code knows what (and when) to lock better than
> > generic code, let's delegate to them all the concurrency related issues.
> >
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> >
> > diff -r 84b3e46aa7d2 xen/common/sched_credit.c
> > --- a/xen/common/sched_credit.c Wed Nov 23 12:03:37 2011 +0000
> > +++ b/xen/common/sched_credit.c Wed Nov 23 15:09:14 2011 +0100
> > @@ -161,6 +161,7 @@ struct csched_dom {
> >   * System-wide private data
> >   */
> >  struct csched_private {
> > +    /* lock for the whole pluggable scheduler, nests inside cpupool_lock */
> >      spinlock_t lock;
> >      struct list_head active_sdom;
> >      uint32_t ncpus;
>
> Given that every scheduler plugin is going to need this lock, perhaps it
> could be provided globally (but still have the responsibility for using
> it fall on the plugin)?

Sorry for the long delay in responding... I don't really like this idea.

For one thing, it would make the generic scheduler code responsible for
creating per-cpupool locks, which could look messy and adds to code
complexity. There's already code to allocate per-pool data structures
for the schedulers -- it seems like just leveraging that would be
easiest.

I also think that, logically, having each scheduler in charge of its own
locking makes more sense; having the generic scheduling code do stuff on
behalf of the pluggable scheduler is what caused this problem in the
first place.

If we think having a one-item struct is ugly, could we just use
spinlock_t as the type we allocate / cast to in the sedf scheduler?

 -George
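(For concreteness, George's spinlock_t idea could look roughly like the
sketch below in sedf. This is an illustrative assumption, not a posted
patch: it takes for granted that nothing but the lock ever lives in
sched_data, and SEDF_LOCK is a made-up name.)

    #define SEDF_LOCK(_ops) ((spinlock_t *)(_ops)->sched_data)

    static int sedf_init(struct scheduler *ops)
    {
        spinlock_t *lock = xmalloc(spinlock_t);

        if ( lock == NULL )
            return -ENOMEM;

        spin_lock_init(lock);
        ops->sched_data = lock;

        return 0;
    }

    static void sedf_deinit(const struct scheduler *ops)
    {
        /* xfree() copes with NULL, so no check is needed */
        xfree(SEDF_LOCK(ops));
    }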
Dario Faggioli
2011-Dec-06 16:35 UTC
Re: [RFC/RFT][PATCH 1 of 3] Move locking into pluggable schedulers.
On Tue, Dec 6, 2011 at 11:34 AM, George Dunlap <george.dunlap@citrix.com> wrote:
>> Given that every scheduler plugin is going to need this lock, perhaps it
>> could be provided globally (but still have the responsibility for using
>> it fall on the plugin)?
>
> Sorry for the long delay in responding... I don't really like this idea.
> For one thing, it would make the generic scheduler code responsible for
> creating per-cpupool locks, which could look messy and adds to code
> complexity.
>
Right. I was looking into this and facing exactly that issue, i.e., how
to make that lock a per-cpupool thing, and wasn't quite succeeding in
doing it in a clean way.

> I also think that, logically, having each scheduler in charge
> of its own locking makes more sense; having the generic scheduling code
> do stuff on behalf of the pluggable scheduler is what caused this
> problem in the first place.
>
Indeed.

> If we think having a one-item struct is ugly, could we just use
> spinlock_t as the type we allocate / cast to in the sedf scheduler?
>
Well, I put that struct there in the first place, so I can definitely
live with it. It's certainly not the most beautiful piece of code I've
ever written but, considering I'd have to dynamically allocate the lock
anyway (for the reasons above) and access it through ops->sched_data,
having that point directly to the lock is not much better than the
struct, to me.

Therefore, I'll let you decide here, and will keep or kill the struct
based on what you think is nicer! :-)

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)
Dario Faggioli
2011-Dec-07 14:49 UTC
[PATCHv2 0 of 1] Rework locking for sched_adjust.
Hi everyone,

Here is v2 of the lock reworking around and within sched-adjust. With
respect to the first posting [1]:
 - I did _not_ move the per-pluggable-scheduler lock toward schedule.c,
   as agreed with George during review;
 - I fixed the bug in sedf spotted by Juergen the way he suggested;
 - I've finally been able to test it under all three schedulers, and it
   is doing its job, at least here.

Notice that the series "collapsed" into one single patch, as it was
proving hard to find a breakdown of it that does not introduce
regressions and/or transient deadlock situations worse than the ones
it's trying to cure... I hope it's still readable and comfortable to
review. :-)

Thanks to everyone who provided feedback on the first version of this.

Regards,
Dario

[1] http://xen.1045712.n5.nabble.com/RFC-RFT-PATCH-0-of-3-rework-locking-in-sched-adjust-td5016899.html

--
 xen/common/sched_credit.c  |   10 ++++--
 xen/common/sched_credit2.c |   21 +++++-----
 xen/common/sched_sedf.c    |  156 ++++++++++++++++++++++++++-----------
 xen/common/schedule.c      |   34 +--------
 4 files changed, 140 insertions(+), 81 deletions(-)

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)
Dario Faggioli
2011-Dec-07 15:02 UTC
[PATCHv2 1 of 1] Rework locking for sched_adjust.
The main idea is to move (as much as possible) locking logic
from generic code to the various pluggable schedulers.

While at it, the following is also accomplished:
 - pausing all the non-current VCPUs of a domain while changing its
   scheduling parameters is not effective in avoiding races and is
   prone to deadlock, so that is removed;
 - sedf needs a global lock for preventing races while adjusting
   domains' scheduling parameters (as it is for credit and credit2),
   so that is added.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff -r 38eb74c01d9d xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Tue Dec 06 21:16:56 2011 +0000
+++ b/xen/common/sched_credit.c Wed Dec 07 15:45:55 2011 +0100
@@ -161,6 +161,7 @@ struct csched_dom {
  * System-wide private data
  */
 struct csched_private {
+    /* lock for the whole pluggable scheduler, nests inside cpupool_lock */
     spinlock_t lock;
     struct list_head active_sdom;
     uint32_t ncpus;
@@ -800,6 +801,10 @@ csched_dom_cntl(
     struct csched_private *prv = CSCHED_PRIV(ops);
     unsigned long flags;

+    /* Protect both get and put branches with the pluggable scheduler
+     * lock. Runq lock not needed anywhere in here. */
+    spin_lock_irqsave(&prv->lock, flags);
+
     if ( op->cmd == XEN_DOMCTL_SCHEDOP_getinfo )
     {
         op->u.credit.weight = sdom->weight;
@@ -809,8 +814,6 @@ csched_dom_cntl(
     {
         ASSERT(op->cmd == XEN_DOMCTL_SCHEDOP_putinfo);

-        spin_lock_irqsave(&prv->lock, flags);
-
         if ( op->u.credit.weight != 0 )
         {
             if ( !list_empty(&sdom->active_sdom_elem) )
@@ -824,9 +827,10 @@ csched_dom_cntl(
         if ( op->u.credit.cap != (uint16_t)~0U )
             sdom->cap = op->u.credit.cap;

-        spin_unlock_irqrestore(&prv->lock, flags);
     }

+    spin_unlock_irqrestore(&prv->lock, flags);
+
     return 0;
 }

diff -r 38eb74c01d9d xen/common/sched_credit2.c
--- a/xen/common/sched_credit2.c Tue Dec 06 21:16:56 2011 +0000
+++ b/xen/common/sched_credit2.c Wed Dec 07 15:45:55 2011 +0100
@@ -1384,6 +1384,10 @@ csched_dom_cntl(
     struct csched_private *prv = CSCHED_PRIV(ops);
     unsigned long flags;

+    /* Must hold csched_priv lock to read and update sdom,
+     * runq lock to update csvcs. */
+    spin_lock_irqsave(&prv->lock, flags);
+
     if ( op->cmd == XEN_DOMCTL_SCHEDOP_getinfo )
     {
         op->u.credit2.weight = sdom->weight;
@@ -1397,10 +1401,6 @@ csched_dom_cntl(
             struct list_head *iter;
             int old_weight;

-            /* Must hold csched_priv lock to update sdom, runq lock to
-             * update csvcs. */
-            spin_lock_irqsave(&prv->lock, flags);
-
             old_weight = sdom->weight;

             sdom->weight = op->u.credit2.weight;
@@ -1411,22 +1411,23 @@ csched_dom_cntl(
                 struct csched_vcpu *svc = list_entry(iter, struct csched_vcpu, sdom_elem);

                 /* NB: Locking order is important here. Because we grab this lock here, we
-                 * must never lock csched_priv.lock if we're holding a runqueue
-                 * lock. */
-                vcpu_schedule_lock_irq(svc->vcpu);
+                 * must never lock csched_priv.lock if we're holding a runqueue lock.
+                 * Also, calling vcpu_schedule_lock() is enough, since IRQs have already
+                 * been disabled.
+                 */
+                vcpu_schedule_lock(svc->vcpu);

                 BUG_ON(svc->rqd != RQD(ops, svc->vcpu->processor));

                 svc->weight = sdom->weight;
                 update_max_weight(svc->rqd, svc->weight, old_weight);

-                vcpu_schedule_unlock_irq(svc->vcpu);
+                vcpu_schedule_unlock(svc->vcpu);
             }
-
-            spin_unlock_irqrestore(&prv->lock, flags);
         }
     }

+    spin_unlock_irqrestore(&prv->lock, flags);
+
     return 0;
 }

diff -r 38eb74c01d9d xen/common/sched_sedf.c
--- a/xen/common/sched_sedf.c Tue Dec 06 21:16:56 2011 +0000
+++ b/xen/common/sched_sedf.c Wed Dec 07 15:45:55 2011 +0100
@@ -61,6 +61,11 @@ struct sedf_dom_info {
     struct domain *domain;
 };

+struct sedf_priv_info {
+    /* lock for the whole pluggable scheduler, nests inside cpupool_lock */
+    spinlock_t lock;
+};
+
 struct sedf_vcpu_info {
     struct vcpu *vcpu;
     struct list_head list;
@@ -115,6 +120,8 @@ struct sedf_cpu_info {
     s_time_t current_slice_expires;
 };

+#define SEDF_PRIV(_ops) \
+    ((struct sedf_priv_info *)((_ops)->sched_data))
 #define EDOM_INFO(d)   ((struct sedf_vcpu_info *)((d)->sched_priv))
 #define CPU_INFO(cpu)  \
     ((struct sedf_cpu_info *)per_cpu(schedule_data, cpu).sched_priv)
@@ -772,6 +779,31 @@ static struct task_slice sedf_do_extra_s
 }


+static int sedf_init(struct scheduler *ops)
+{
+    struct sedf_priv_info *prv;
+
+    prv = xzalloc(struct sedf_priv_info);
+    if ( prv == NULL )
+        return -ENOMEM;
+
+    ops->sched_data = prv;
+    spin_lock_init(&prv->lock);
+
+    return 0;
+}
+
+
+static void sedf_deinit(const struct scheduler *ops)
+{
+    struct sedf_priv_info *prv;
+
+    prv = SEDF_PRIV(ops);
+    if ( prv != NULL )
+        xfree(prv);
+}
+
+
 /* Main scheduling function
    Reasons for calling this function are:
    -timeslice for the current period used up
@@ -1320,22 +1352,15 @@ static void sedf_dump_cpu_state(const st


 /* Adjusts periods and slices of the domains accordingly to their weights. */
-static int sedf_adjust_weights(struct cpupool *c, struct xen_domctl_scheduler_op *cmd)
+static int sedf_adjust_weights(struct cpupool *c, int nr_cpus, int *sumw, s_time_t *sumt)
 {
     struct vcpu *p;
     struct domain *d;
-    unsigned int cpu, nr_cpus = cpumask_last(&cpu_online_map) + 1;
-    int *sumw = xzalloc_array(int, nr_cpus);
-    s_time_t *sumt = xzalloc_array(s_time_t, nr_cpus);
+    unsigned int cpu;

-    if ( !sumw || !sumt )
-    {
-        xfree(sumt);
-        xfree(sumw);
-        return -ENOMEM;
-    }
-
-    /* Sum across all weights. */
+    /* Sum across all weights. Notice that no runq locking is needed
+     * here: the caller holds sedf_priv_info.lock and we're not changing
+     * anything that is accessed during scheduling. */
     rcu_read_lock(&domlist_read_lock);
     for_each_domain_in_cpupool( d, c )
     {
@@ -1365,7 +1390,9 @@ static int sedf_adjust_weights(struct cp
     }
     rcu_read_unlock(&domlist_read_lock);

-    /* Adjust all slices (and periods) to the new weight. */
+    /* Adjust all slices (and periods) to the new weight. Unlike above, we
+     * need to take the runq lock for the various VCPUs: we're modifying
+     * slice and period which are referenced during scheduling.
+     */
     rcu_read_lock(&domlist_read_lock);
     for_each_domain_in_cpupool( d, c )
     {
@@ -1375,20 +1402,20 @@ static int sedf_adjust_weights(struct cp
                 continue;
             if ( EDOM_INFO(p)->weight )
             {
+                /* Interrupts already off */
+                vcpu_schedule_lock(p);
                 EDOM_INFO(p)->period_orig =
                     EDOM_INFO(p)->period = WEIGHT_PERIOD;
                 EDOM_INFO(p)->slice_orig =
                     EDOM_INFO(p)->slice =
                     (EDOM_INFO(p)->weight *
                      (WEIGHT_PERIOD - WEIGHT_SAFETY - sumt[cpu])) / sumw[cpu];
+                vcpu_schedule_unlock(p);
             }
         }
     }
     rcu_read_unlock(&domlist_read_lock);

-    xfree(sumt);
-    xfree(sumw);
-
     return 0;
 }

@@ -1396,19 +1423,45 @@ static int sedf_adjust_weights(struct cp
 /* set or fetch domain scheduling parameters */
 static int sedf_adjust(const struct scheduler *ops, struct domain *p, struct xen_domctl_scheduler_op *op)
 {
+    struct sedf_priv_info *prv = SEDF_PRIV(ops);
+    unsigned long flags;
+    unsigned int nr_cpus = cpumask_last(&cpu_online_map) + 1;
+    int *sumw = xzalloc_array(int, nr_cpus);
+    s_time_t *sumt = xzalloc_array(s_time_t, nr_cpus);
     struct vcpu *v;
-    int rc;
+    int rc = 0;

     PRINT(2,"sedf_adjust was called, domain-id %i new period %"PRIu64" "
           "new slice %"PRIu64"\nlatency %"PRIu64" extra:%s\n",
           p->domain_id, op->u.sedf.period, op->u.sedf.slice,
           op->u.sedf.latency, (op->u.sedf.extratime)?"yes":"no");

+    /* Serialize against the pluggable scheduler lock to protect from
+     * concurrent updates. We need to take the runq lock for the VCPUs
+     * as well, since we are touching extraweight, weight, slice and
+     * period. As in sched_credit2.c, runq locks nest inside the
+     * pluggable scheduler lock. */
+    spin_lock_irqsave(&prv->lock, flags);
+
     if ( op->cmd == XEN_DOMCTL_SCHEDOP_putinfo )
     {
+        /* These are used in sedf_adjust_weights() but have to be allocated in
+         * this function, as we need to avoid nesting xmem_pool_alloc's lock
+         * within our prv->lock. */
+        if ( !sumw || !sumt )
+        {
+            /* Check for errors here, the _getinfo branch doesn't care */
+            rc = -ENOMEM;
+            goto out;
+        }
+
         /* Check for sane parameters. */
         if ( !op->u.sedf.period && !op->u.sedf.weight )
-            return -EINVAL;
+        {
+            rc = -EINVAL;
+            goto out;
+        }
+
         if ( op->u.sedf.weight )
         {
             if ( (op->u.sedf.extratime & EXTRA_AWARE) &&
@@ -1417,59 +1470,78 @@ static int sedf_adjust(const struct sche
                 /* Weight-driven domains with extratime only. */
                 for_each_vcpu ( p, v )
                 {
+                    /* (Here and everywhere in the following) IRQs are already off,
+                     * hence vcpu_schedule_lock() is the one to use. */
+                    vcpu_schedule_lock(v);
                     EDOM_INFO(v)->extraweight = op->u.sedf.weight;
                     EDOM_INFO(v)->weight = 0;
                     EDOM_INFO(v)->slice = 0;
                     EDOM_INFO(v)->period = WEIGHT_PERIOD;
+                    vcpu_schedule_unlock(v);
                 }
             }
             else
             {
                 /* Weight-driven domains with real-time execution. */
-                for_each_vcpu ( p, v )
+                for_each_vcpu ( p, v ) {
+                    vcpu_schedule_lock(v);
                     EDOM_INFO(v)->weight = op->u.sedf.weight;
+                    vcpu_schedule_unlock(v);
+                }
             }
         }
         else
         {
+            /*
+             * Sanity checking: note that disabling extra weight requires
+             * that we set a non-zero slice.
+             */
+            if ( (op->u.sedf.period > PERIOD_MAX) ||
+                 (op->u.sedf.period < PERIOD_MIN) ||
+                 (op->u.sedf.slice > op->u.sedf.period) ||
+                 (op->u.sedf.slice < SLICE_MIN) )
+            {
+                rc = -EINVAL;
+                goto out;
+            }
+
             /* Time-driven domains. */
             for_each_vcpu ( p, v )
             {
-                /*
-                 * Sanity checking: note that disabling extra weight requires
-                 * that we set a non-zero slice.
-                 */
-                if ( (op->u.sedf.period > PERIOD_MAX) ||
-                     (op->u.sedf.period < PERIOD_MIN) ||
-                     (op->u.sedf.slice > op->u.sedf.period) ||
-                     (op->u.sedf.slice < SLICE_MIN) )
-                    return -EINVAL;
+                vcpu_schedule_lock(v);
                 EDOM_INFO(v)->weight = 0;
                 EDOM_INFO(v)->extraweight = 0;
                 EDOM_INFO(v)->period_orig =
                     EDOM_INFO(v)->period = op->u.sedf.period;
                 EDOM_INFO(v)->slice_orig =
                     EDOM_INFO(v)->slice = op->u.sedf.slice;
+                vcpu_schedule_unlock(v);
             }
         }

-        rc = sedf_adjust_weights(p->cpupool, op);
+        rc = sedf_adjust_weights(p->cpupool, nr_cpus, sumw, sumt);
         if ( rc )
-            return rc;
+            goto out;

         for_each_vcpu ( p, v )
         {
+            vcpu_schedule_lock(v);
             EDOM_INFO(v)->status =
                 (EDOM_INFO(v)->status &
                  ~EXTRA_AWARE) | (op->u.sedf.extratime & EXTRA_AWARE);
             EDOM_INFO(v)->latency = op->u.sedf.latency;
             extraq_check(v);
+            vcpu_schedule_unlock(v);
         }
     }
     else if ( op->cmd == XEN_DOMCTL_SCHEDOP_getinfo )
     {
         if ( p->vcpu[0] == NULL )
-            return -EINVAL;
+        {
+            rc = -EINVAL;
+            goto out;
+        }
+
         op->u.sedf.period    = EDOM_INFO(p->vcpu[0])->period;
         op->u.sedf.slice     = EDOM_INFO(p->vcpu[0])->slice;
         op->u.sedf.extratime = EDOM_INFO(p->vcpu[0])->status & EXTRA_AWARE;
@@ -1477,14 +1549,23 @@ static int sedf_adjust(const struct sche
         op->u.sedf.weight    = EDOM_INFO(p->vcpu[0])->weight;
     }

-    PRINT(2,"sedf_adjust_finished\n");
-    return 0;
+out:
+    spin_unlock_irqrestore(&prv->lock, flags);
+
+    xfree(sumt);
+    xfree(sumw);
+
+    PRINT(2,"sedf_adjust_finished with return code %d\n", rc);
+    return rc;
 }

+static struct sedf_priv_info _sedf_priv;
+
 const struct scheduler sched_sedf_def = {
-    .name     = "Simple EDF Scheduler",
-    .opt_name = "sedf",
-    .sched_id = XEN_SCHEDULER_SEDF,
+    .name           = "Simple EDF Scheduler",
+    .opt_name       = "sedf",
+    .sched_id       = XEN_SCHEDULER_SEDF,
+    .sched_data     = &_sedf_priv,

     .init_domain    = sedf_init_domain,
     .destroy_domain = sedf_destroy_domain,
@@ -1498,6 +1579,9 @@ const struct scheduler sched_sedf_def =
     .alloc_domdata  = sedf_alloc_domdata,
     .free_domdata   = sedf_free_domdata,

+    .init           = sedf_init,
+    .deinit         = sedf_deinit,
+
     .do_schedule    = sedf_do_schedule,
     .pick_cpu       = sedf_pick_cpu,
     .dump_cpu_state = sedf_dump_cpu_state,

diff -r 38eb74c01d9d xen/common/schedule.c
--- a/xen/common/schedule.c Tue Dec 06 21:16:56 2011 +0000
+++ b/xen/common/schedule.c Wed Dec 07 15:45:55 2011 +0100
@@ -1005,7 +1005,6 @@ int sched_id(void)
 /* Adjust scheduling parameter for a given domain. */
 long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op)
 {
-    struct vcpu *v;
     long ret;

     if ( (op->sched_id != DOM2OP(d)->sched_id) ||
@@ -1013,40 +1012,11 @@ long sched_adjust(struct domain *d, stru
           (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
         return -EINVAL;

-    /*
-     * Most VCPUs we can simply pause. If we are adjusting this VCPU then
-     * we acquire the local schedule_lock to guard against concurrent updates.
-     *
-     * We only acquire the local schedule lock after we have paused all other
-     * VCPUs in this domain. There are two reasons for this:
-     * 1- We don't want to hold up interrupts as pausing a VCPU can
-     *    trigger a tlb shootdown.
-     * 2- Pausing other VCPUs involves briefly locking the schedule
-     *    lock of the CPU they are running on. This CPU could be the
-     *    same as ours.
-     */
-
-    for_each_vcpu ( d, v )
-    {
-        if ( v != current )
-            vcpu_pause(v);
-    }
-
-    if ( d == current->domain )
-        vcpu_schedule_lock_irq(current);
-
+    /* NB: the pluggable scheduler code needs to take care
+     * of locking by itself. */
     if ( (ret = SCHED_OP(DOM2OP(d), adjust, d, op)) == 0 )
         TRACE_1D(TRC_SCHED_ADJDOM, d->domain_id);

-    if ( d == current->domain )
-        vcpu_schedule_unlock_irq(current);
-
-    for_each_vcpu ( d, v )
-    {
-        if ( v != current )
-            vcpu_unpause(v);
-    }
-
     return ret;
 }

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)
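(The locking discipline the patch's comments describe, condensed into a
few illustrative lines; here prv and v stand for a scheduler's private
data and one of the affected VCPUs. Taking the two locks in the opposite
order could deadlock against the adjust/dom_cntl paths above.)

    spin_lock_irqsave(&prv->lock, flags);       /* outer lock: scheduler-wide,
                                                 * IRQs off from here on */
    vcpu_schedule_lock(v);                      /* inner lock: runqueue; plain
                                                 * variant, IRQs already off */
    /* ... touch weight/slice/period fields read by the hot path ... */
    vcpu_schedule_unlock(v);
    spin_unlock_irqrestore(&prv->lock, flags);  /* release in reverse order */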
Dario Faggioli
2011-Dec-07 15:04 UTC
Re: [PATCHv2 0 of 1] Rework locking for sched_adjust.
On Wed, 2011-12-07 at 15:49 +0100, Dario Faggioli wrote:
> Hi everyone,
>
Ok, apparently Evolution (with maybe a little help from me too! :-P)
messed up the threading... Next time I'll take a serious look at that
thing called patchbomb, I promise! :-P

Sorry for this,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)
Dario, this patch won't apply due to wordwrap damage. Try using
"hg email", or sending it as an attachment.

Thanks,
 -George

On Wed, 2011-12-07 at 15:02 +0000, Dario Faggioli wrote:
> The main idea is to move (as much as possible) locking logic
> from generic code to the various pluggable schedulers.
>
> While at it, the following is also accomplished:
>  - pausing all the non-current VCPUs of a domain while changing its
>    scheduling parameters is not effective in avoiding races and is
>    prone to deadlock, so that is removed;
>  - sedf needs a global lock for preventing races while adjusting
>    domains' scheduling parameters (as it is for credit and credit2),
>    so that is added.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> [...]