In order to avoid high-frequency cpu migration, vcpus may in fact be scheduled slightly out-of-order. Account for this situation properly. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> --- xen/common/sched_credit2.c | 40 ++++++++++++++++------------------------ 1 file changed, 16 insertions(+), 24 deletions(-) diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index a7bd2ee..5bf5ebc 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -1544,31 +1544,23 @@ csched_runtime(const struct scheduler *ops, int cpu, struct csched_vcpu *snext) } } - /* - * snext is about to be scheduled; so: - * - * 1. if snext->credit were less than 0 when it was taken off the - * runqueue, then csched_schedule() should have called - * reset_credit(). So at this point snext->credit must be greater - * than 0. - * - * 2. snext''s credit must be greater than or equal to anyone else - * in the queue, so snext->credit - swait->credit must be greater - * than or equal to 0. - */ - ASSERT(rt_credit >= 0); - - /* FIXME: See if we can eliminate this conversion if we know time - * will be outside (MIN,MAX). Probably requires pre-calculating - * credit values of MIN,MAX per vcpu, since each vcpu burns credit - * at a different rate. */ - time = c2t(rqd, rt_credit, snext); - - /* Check limits */ - if ( time < CSCHED_MIN_TIMER ) + /* The next guy may actually have a higher credit, if we''ve */ + if ( rt_credit < 0 ) time = CSCHED_MIN_TIMER; - else if ( time > CSCHED_MAX_TIMER ) - time = CSCHED_MAX_TIMER; + else + { + /* FIXME: See if we can eliminate this conversion if we know time + * will be outside (MIN,MAX). Probably requires pre-calculating + * credit values of MIN,MAX per vcpu, since each vcpu burns credit + * at a different rate. */ + time = c2t(rqd, rt_credit, snext); + + /* Check limits */ + if ( time < CSCHED_MIN_TIMER ) + time = CSCHED_MIN_TIMER; + else if ( time > CSCHED_MAX_TIMER ) + time = CSCHED_MAX_TIMER; + } return time; } -- 1.7.9.5
George Dunlap
2013-Mar-08 14:14 UTC
[PATCH 2/2] credit2: Reset until the front of the runqueue is positive
Under normal circumstances, snext->credit should never be less than -CSCHED_MIN_TIMER. However, under some circumstances, a vcpu with low credits may be allowed to run long enough that its credits are actually less than -CSCHED_CREDIT_INIT. (Instances have been observed, for example, where a vcpu with 200us of credit was allowed to run for 11ms, giving it -10.8ms of credit. Thus it was still negative even after the reset.) If this is the case for snext, we simply want to keep moving everyone up until it is in the black again. This fair because none of the other vcpus want to run at the moment. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> --- xen/common/sched_credit2.c | 81 +++++++++++++++++++++++++++----------------- 1 file changed, 49 insertions(+), 32 deletions(-) diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index 5bf5ebc..7265d5b 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -588,41 +588,58 @@ no_tickle: /* * Credit-related code */ -static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now) +static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now, + struct csched_vcpu *snext) { struct csched_runqueue_data *rqd = RQD(ops, cpu); struct list_head *iter; - list_for_each( iter, &rqd->svc ) - { - struct csched_vcpu * svc = list_entry(iter, struct csched_vcpu, rqd_elem); - - int start_credit; - - BUG_ON( is_idle_vcpu(svc->vcpu) ); - BUG_ON( svc->rqd != rqd ); - - start_credit = svc->credit; - - /* "Clip" credits to max carryover */ - if ( svc->credit > CSCHED_CARRYOVER_MAX ) - svc->credit = CSCHED_CARRYOVER_MAX; - /* And add INIT */ - svc->credit += CSCHED_CREDIT_INIT; - svc->start_time = now; - - /* TRACE */ { - struct { - unsigned dom:16,vcpu:16; - unsigned credit_start, credit_end; - } d; - d.dom = svc->vcpu->domain->domain_id; - d.vcpu = svc->vcpu->vcpu_id; - d.credit_start = start_credit; - d.credit_end = svc->credit; - trace_var(TRC_CSCHED2_CREDIT_RESET, 1, - sizeof(d), - (unsigned char *)&d); + /* + * Under normal circumstances, snext->credit should never be less + * than -CSCHED_MIN_TIMER. However, under some circumstances, a + * vcpu with low credits may be allowed to run long enough that + * its credits are actually less than -CSCHED_CREDIT_INIT. + * (Instances have been observed, for example, where a vcpu with + * 200us of credit was allowed to run for 11ms, giving it -10.8ms + * of credit. Thus it was still negative even after the reset.) + * + * If this is the case for snext, we simply want to keep moving + * everyone up until it is in the black again. This fair because + * none of the other vcpus want to run at the moment. + */ + while (snext->credit <= CSCHED_CREDIT_RESET ) { + list_for_each( iter, &rqd->svc ) + { + struct csched_vcpu * svc; + int start_credit; + + svc = list_entry(iter, struct csched_vcpu, rqd_elem); + + BUG_ON( is_idle_vcpu(svc->vcpu) ); + BUG_ON( svc->rqd != rqd ); + + start_credit = svc->credit; + + /* "Clip" credits to max carryover */ + if ( svc->credit > CSCHED_CARRYOVER_MAX ) + svc->credit = CSCHED_CARRYOVER_MAX; + /* And add INIT */ + svc->credit += CSCHED_CREDIT_INIT; + svc->start_time = now; + + /* TRACE */ { + struct { + unsigned dom:16,vcpu:16; + unsigned credit_start, credit_end; + } d; + d.dom = svc->vcpu->domain->domain_id; + d.vcpu = svc->vcpu->vcpu_id; + d.credit_start = start_credit; + d.credit_end = svc->credit; + trace_var(TRC_CSCHED2_CREDIT_RESET, 1, + sizeof(d), + (unsigned char *)&d); + } } } @@ -1731,7 +1748,7 @@ csched_schedule( /* Check for the reset condition */ if ( snext->credit <= CSCHED_CREDIT_RESET ) { - reset_credit(ops, cpu, now); + reset_credit(ops, cpu, now, snext); balance_load(ops, cpu, now); } -- 1.7.9.5
>>> On 08.03.13 at 15:14, George Dunlap <george.dunlap@eu.citrix.com> wrote: > In order to avoid high-frequency cpu migration, vcpus may in fact be > scheduled slightly out-of-order. Account for this situation properly. > > Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> > --- > xen/common/sched_credit2.c | 40 ++++++++++++++++------------------------ > 1 file changed, 16 insertions(+), 24 deletions(-) > > diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c > index a7bd2ee..5bf5ebc 100644 > --- a/xen/common/sched_credit2.c > +++ b/xen/common/sched_credit2.c > @@ -1544,31 +1544,23 @@ csched_runtime(const struct scheduler *ops, int cpu, > struct csched_vcpu *snext) > } > } > > - /* > - * snext is about to be scheduled; so: > - * > - * 1. if snext->credit were less than 0 when it was taken off the > - * runqueue, then csched_schedule() should have called > - * reset_credit(). So at this point snext->credit must be greater > - * than 0. > - * > - * 2. snext''s credit must be greater than or equal to anyone else > - * in the queue, so snext->credit - swait->credit must be greater > - * than or equal to 0. > - */ > - ASSERT(rt_credit >= 0); > - > - /* FIXME: See if we can eliminate this conversion if we know time > - * will be outside (MIN,MAX). Probably requires pre-calculating > - * credit values of MIN,MAX per vcpu, since each vcpu burns credit > - * at a different rate. */ > - time = c2t(rqd, rt_credit, snext); > - > - /* Check limits */ > - if ( time < CSCHED_MIN_TIMER ) > + /* The next guy may actually have a higher credit, if we''ve */This comment appears to end prematurely.> + if ( rt_credit < 0 )Perhaps <= ? Jan> time = CSCHED_MIN_TIMER; > - else if ( time > CSCHED_MAX_TIMER ) > - time = CSCHED_MAX_TIMER; > + else > + { > + /* FIXME: See if we can eliminate this conversion if we know time > + * will be outside (MIN,MAX). Probably requires pre-calculating > + * credit values of MIN,MAX per vcpu, since each vcpu burns credit > + * at a different rate. */ > + time = c2t(rqd, rt_credit, snext); > + > + /* Check limits */ > + if ( time < CSCHED_MIN_TIMER ) > + time = CSCHED_MIN_TIMER; > + else if ( time > CSCHED_MAX_TIMER ) > + time = CSCHED_MAX_TIMER; > + } > > return time; > } > -- > 1.7.9.5
Jan Beulich
2013-Mar-08 14:35 UTC
Re: [PATCH 2/2] credit2: Reset until the front of the runqueue is positive
>>> On 08.03.13 at 15:14, George Dunlap <george.dunlap@eu.citrix.com> wrote: > Under normal circumstances, snext->credit should never be less than > -CSCHED_MIN_TIMER. However, under some circumstances, a vcpu with low > credits may be allowed to run long enough that its credits are > actually less than -CSCHED_CREDIT_INIT. > > (Instances have been observed, for example, where a vcpu with 200us of > credit was allowed to run for 11ms, giving it -10.8ms of credit. Thus > it was still negative even after the reset.) > > If this is the case for snext, we simply want to keep moving everyone > up until it is in the black again. This fair because none of the > other vcpus want to run at the moment. > > Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> > --- > xen/common/sched_credit2.c | 81 +++++++++++++++++++++++++++----------------- > 1 file changed, 49 insertions(+), 32 deletions(-) > > diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c > index 5bf5ebc..7265d5b 100644 > --- a/xen/common/sched_credit2.c > +++ b/xen/common/sched_credit2.c > @@ -588,41 +588,58 @@ no_tickle: > /* > * Credit-related code > */ > -static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now) > +static void reset_credit(const struct scheduler *ops, int cpu, s_time_t > now, > + struct csched_vcpu *snext) > { > struct csched_runqueue_data *rqd = RQD(ops, cpu); > struct list_head *iter; > > - list_for_each( iter, &rqd->svc ) > - { > - struct csched_vcpu * svc = list_entry(iter, struct csched_vcpu, > rqd_elem); > - > - int start_credit; > - > - BUG_ON( is_idle_vcpu(svc->vcpu) ); > - BUG_ON( svc->rqd != rqd ); > - > - start_credit = svc->credit; > - > - /* "Clip" credits to max carryover */ > - if ( svc->credit > CSCHED_CARRYOVER_MAX ) > - svc->credit = CSCHED_CARRYOVER_MAX; > - /* And add INIT */ > - svc->credit += CSCHED_CREDIT_INIT; > - svc->start_time = now; > - > - /* TRACE */ { > - struct { > - unsigned dom:16,vcpu:16; > - unsigned credit_start, credit_end; > - } d; > - d.dom = svc->vcpu->domain->domain_id; > - d.vcpu = svc->vcpu->vcpu_id; > - d.credit_start = start_credit; > - d.credit_end = svc->credit; > - trace_var(TRC_CSCHED2_CREDIT_RESET, 1, > - sizeof(d), > - (unsigned char *)&d); > + /* > + * Under normal circumstances, snext->credit should never be less > + * than -CSCHED_MIN_TIMER. However, under some circumstances, a > + * vcpu with low credits may be allowed to run long enough that > + * its credits are actually less than -CSCHED_CREDIT_INIT. > + * (Instances have been observed, for example, where a vcpu with > + * 200us of credit was allowed to run for 11ms, giving it -10.8ms > + * of credit. Thus it was still negative even after the reset.) > + * > + * If this is the case for snext, we simply want to keep moving > + * everyone up until it is in the black again. This fair because > + * none of the other vcpus want to run at the moment. > + */ > + while (snext->credit <= CSCHED_CREDIT_RESET ) {So how long can this loop last? Can''t you get away with a loop altogether, considering that you only add CSCHED_CREDIT_INIT inside the loop? Also, I hope there is some sort of guarantee that snext gets updated by the loop in the first place. Jan> + list_for_each( iter, &rqd->svc ) > + { > + struct csched_vcpu * svc; > + int start_credit; > + > + svc = list_entry(iter, struct csched_vcpu, rqd_elem); > + > + BUG_ON( is_idle_vcpu(svc->vcpu) ); > + BUG_ON( svc->rqd != rqd ); > + > + start_credit = svc->credit; > + > + /* "Clip" credits to max carryover */ > + if ( svc->credit > CSCHED_CARRYOVER_MAX ) > + svc->credit = CSCHED_CARRYOVER_MAX; > + /* And add INIT */ > + svc->credit += CSCHED_CREDIT_INIT; > + svc->start_time = now; > + > + /* TRACE */ { > + struct { > + unsigned dom:16,vcpu:16; > + unsigned credit_start, credit_end; > + } d; > + d.dom = svc->vcpu->domain->domain_id; > + d.vcpu = svc->vcpu->vcpu_id; > + d.credit_start = start_credit; > + d.credit_end = svc->credit; > + trace_var(TRC_CSCHED2_CREDIT_RESET, 1, > + sizeof(d), > + (unsigned char *)&d); > + } > } > } > > @@ -1731,7 +1748,7 @@ csched_schedule( > /* Check for the reset condition */ > if ( snext->credit <= CSCHED_CREDIT_RESET ) > { > - reset_credit(ops, cpu, now); > + reset_credit(ops, cpu, now, snext); > balance_load(ops, cpu, now); > } > > -- > 1.7.9.5
George Dunlap
2013-Mar-08 15:04 UTC
Re: [PATCH 2/2] credit2: Reset until the front of the runqueue is positive
On 08/03/13 14:35, Jan Beulich wrote:>>>> On 08.03.13 at 15:14, George Dunlap <george.dunlap@eu.citrix.com> wrote: >> Under normal circumstances, snext->credit should never be less than >> -CSCHED_MIN_TIMER. However, under some circumstances, a vcpu with low >> credits may be allowed to run long enough that its credits are >> actually less than -CSCHED_CREDIT_INIT. >> >> (Instances have been observed, for example, where a vcpu with 200us of >> credit was allowed to run for 11ms, giving it -10.8ms of credit. Thus >> it was still negative even after the reset.) >> >> If this is the case for snext, we simply want to keep moving everyone >> up until it is in the black again. This fair because none of the >> other vcpus want to run at the moment. >> >> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> >> --- >> xen/common/sched_credit2.c | 81 +++++++++++++++++++++++++++----------------- >> 1 file changed, 49 insertions(+), 32 deletions(-) >> >> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c >> index 5bf5ebc..7265d5b 100644 >> --- a/xen/common/sched_credit2.c >> +++ b/xen/common/sched_credit2.c >> @@ -588,41 +588,58 @@ no_tickle: >> /* >> * Credit-related code >> */ >> -static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now) >> +static void reset_credit(const struct scheduler *ops, int cpu, s_time_t >> now, >> + struct csched_vcpu *snext) >> { >> struct csched_runqueue_data *rqd = RQD(ops, cpu); >> struct list_head *iter; >> >> - list_for_each( iter, &rqd->svc ) >> - { >> - struct csched_vcpu * svc = list_entry(iter, struct csched_vcpu, >> rqd_elem); >> - >> - int start_credit; >> - >> - BUG_ON( is_idle_vcpu(svc->vcpu) ); >> - BUG_ON( svc->rqd != rqd ); >> - >> - start_credit = svc->credit; >> - >> - /* "Clip" credits to max carryover */ >> - if ( svc->credit > CSCHED_CARRYOVER_MAX ) >> - svc->credit = CSCHED_CARRYOVER_MAX; >> - /* And add INIT */ >> - svc->credit += CSCHED_CREDIT_INIT; >> - svc->start_time = now; >> - >> - /* TRACE */ { >> - struct { >> - unsigned dom:16,vcpu:16; >> - unsigned credit_start, credit_end; >> - } d; >> - d.dom = svc->vcpu->domain->domain_id; >> - d.vcpu = svc->vcpu->vcpu_id; >> - d.credit_start = start_credit; >> - d.credit_end = svc->credit; >> - trace_var(TRC_CSCHED2_CREDIT_RESET, 1, >> - sizeof(d), >> - (unsigned char *)&d); >> + /* >> + * Under normal circumstances, snext->credit should never be less >> + * than -CSCHED_MIN_TIMER. However, under some circumstances, a >> + * vcpu with low credits may be allowed to run long enough that >> + * its credits are actually less than -CSCHED_CREDIT_INIT. >> + * (Instances have been observed, for example, where a vcpu with >> + * 200us of credit was allowed to run for 11ms, giving it -10.8ms >> + * of credit. Thus it was still negative even after the reset.) >> + * >> + * If this is the case for snext, we simply want to keep moving >> + * everyone up until it is in the black again. This fair because >> + * none of the other vcpus want to run at the moment. >> + */ >> + while (snext->credit <= CSCHED_CREDIT_RESET ) { > So how long can this loop last? Can''t you get away with a loop > altogether, considering that you only add CSCHED_CREDIT_INIT > inside the loop?I''m not sure what you mean? The point of doing the whole loop over is to make sure that everyone gets the same number of CSCHED_CREDIT_INITs added. While testing this I saw a couple of instances where it did the loop 20 times; the vast majority of the times it went around mutliple times it only went twice. I suppose we could do something like this: if(snext->credit < -CSCHED_CREDIT_INIT) { x = (-snext->credit)/CSCHED_CREDIT_INIT; } else { x = 1; } Then add (x * CSCHED_CREDIT_INIT) to each one. Then in the common case we''re not doing any integer division, but in the uncommon case we have a bounded algorithm. Does that sound better?> > Also, I hope there is some sort of guarantee that snext gets > updated by the loop in the first place.The inner loop iterates over the list of all vcpus assigned to this runqueue, whether or not they are actually on the current "queue" (i.e., runnable and waiting for the cpu) or not. snext was just taken from the top the queue, so it should be on that list. Unfortunately there''s not a simple ASSERT() we can add to make sure that snext is actually in that list; we can only ASSERT that the runqueue of snext->cpu is this runqueue. But if we''re trying to check whether the invariants are being upheld, there''s no sense in assuming one of the invariants we''re trying to check. But if we go the "multiplier" route this whole question disappears. -George
Jan Beulich
2013-Mar-08 15:13 UTC
Re: [PATCH 2/2] credit2: Reset until the front of the runqueue is positive
>>> On 08.03.13 at 16:04, George Dunlap <george.dunlap@eu.citrix.com> wrote: > On 08/03/13 14:35, Jan Beulich wrote: >>>>> On 08.03.13 at 15:14, George Dunlap <george.dunlap@eu.citrix.com> wrote: >>> Under normal circumstances, snext->credit should never be less than >>> -CSCHED_MIN_TIMER. However, under some circumstances, a vcpu with low >>> credits may be allowed to run long enough that its credits are >>> actually less than -CSCHED_CREDIT_INIT. >>> >>> (Instances have been observed, for example, where a vcpu with 200us of >>> credit was allowed to run for 11ms, giving it -10.8ms of credit. Thus >>> it was still negative even after the reset.) >>> >>> If this is the case for snext, we simply want to keep moving everyone >>> up until it is in the black again. This fair because none of the >>> other vcpus want to run at the moment. >>> >>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> >>> --- >>> xen/common/sched_credit2.c | 81 +++++++++++++++++++++++++++----------------- >>> 1 file changed, 49 insertions(+), 32 deletions(-) >>> >>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c >>> index 5bf5ebc..7265d5b 100644 >>> --- a/xen/common/sched_credit2.c >>> +++ b/xen/common/sched_credit2.c >>> @@ -588,41 +588,58 @@ no_tickle: >>> /* >>> * Credit-related code >>> */ >>> -static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now) >>> +static void reset_credit(const struct scheduler *ops, int cpu, s_time_t >>> now, >>> + struct csched_vcpu *snext) >>> { >>> struct csched_runqueue_data *rqd = RQD(ops, cpu); >>> struct list_head *iter; >>> >>> - list_for_each( iter, &rqd->svc ) >>> - { >>> - struct csched_vcpu * svc = list_entry(iter, struct csched_vcpu, >>> rqd_elem); >>> - >>> - int start_credit; >>> - >>> - BUG_ON( is_idle_vcpu(svc->vcpu) ); >>> - BUG_ON( svc->rqd != rqd ); >>> - >>> - start_credit = svc->credit; >>> - >>> - /* "Clip" credits to max carryover */ >>> - if ( svc->credit > CSCHED_CARRYOVER_MAX ) >>> - svc->credit = CSCHED_CARRYOVER_MAX; >>> - /* And add INIT */ >>> - svc->credit += CSCHED_CREDIT_INIT; >>> - svc->start_time = now; >>> - >>> - /* TRACE */ { >>> - struct { >>> - unsigned dom:16,vcpu:16; >>> - unsigned credit_start, credit_end; >>> - } d; >>> - d.dom = svc->vcpu->domain->domain_id; >>> - d.vcpu = svc->vcpu->vcpu_id; >>> - d.credit_start = start_credit; >>> - d.credit_end = svc->credit; >>> - trace_var(TRC_CSCHED2_CREDIT_RESET, 1, >>> - sizeof(d), >>> - (unsigned char *)&d); >>> + /* >>> + * Under normal circumstances, snext->credit should never be less >>> + * than -CSCHED_MIN_TIMER. However, under some circumstances, a >>> + * vcpu with low credits may be allowed to run long enough that >>> + * its credits are actually less than -CSCHED_CREDIT_INIT. >>> + * (Instances have been observed, for example, where a vcpu with >>> + * 200us of credit was allowed to run for 11ms, giving it -10.8ms >>> + * of credit. Thus it was still negative even after the reset.) >>> + * >>> + * If this is the case for snext, we simply want to keep moving >>> + * everyone up until it is in the black again. This fair because >>> + * none of the other vcpus want to run at the moment. >>> + */ >>> + while (snext->credit <= CSCHED_CREDIT_RESET ) { >> So how long can this loop last? Can''t you get away with a loop >> altogether, considering that you only add CSCHED_CREDIT_INIT >> inside the loop? > > I''m not sure what you mean? > > The point of doing the whole loop over is to make sure that everyone > gets the same number of CSCHED_CREDIT_INITs added. > > While testing this I saw a couple of instances where it did the loop 20 > times; the vast majority of the times it went around mutliple times it > only went twice. > > I suppose we could do something like this: > > if(snext->credit < -CSCHED_CREDIT_INIT) { > x = (-snext->credit)/CSCHED_CREDIT_INIT; > } else { > x = 1; > } > > Then add (x * CSCHED_CREDIT_INIT) to each one. Then in the common case > we''re not doing any integer division, but in the uncommon case we have a > bounded algorithm. Does that sound better?Yes, it was something along those lines that I was hoping to get. The division still seems better than perhaps quite a few iterations of the loop.>> Also, I hope there is some sort of guarantee that snext gets >> updated by the loop in the first place. > > The inner loop iterates over the list of all vcpus assigned to this > runqueue, whether or not they are actually on the current "queue" (i.e., > runnable and waiting for the cpu) or not. snext was just taken from the > top the queue, so it should be on that list. > > Unfortunately there''s not a simple ASSERT() we can add to make sure that > snext is actually in that list; we can only ASSERT that the runqueue of > snext->cpu is this runqueue. But if we''re trying to check whether the > invariants are being upheld, there''s no sense in assuming one of the > invariants we''re trying to check. > > But if we go the "multiplier" route this whole question disappears.Indeed. Jan