Dario Faggioli
2012-Dec-03 16:34 UTC
[PATCH 0 of 3] xen: sched_credit: fix tickling and add some tracing
Hello, This small series deals with some weirdness in the mechanism with which the credit scheduler choses what PCPU to tickle upon a VCPU wake-up. Details are available in the changelog of the first patch. The new approach has been extensively benchmarked and proved itself either beneficial or harmless. That means it does not introduce any significant amount of overhead and/or performances regressions while, for some workloads, it improves the performances quite sensibly (e.g., `sysbench --test=memory''). Full results in the first changelog too. The rest of the series introduces some macros to enable generating per-scheduler tracing events, retaining the possibility of distinguishing them, even with more than one scheduler running at any given time (via cpupools), and adds some tracing to the credit scheduler. Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Dario Faggioli
2012-Dec-03 16:34 UTC
[PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs
Right now, when a VCPU wakes-up, we check if the it should preempt what is running on the PCPU, and whether or not the waking VCPU can be migrated (by tickling some idlers). However, this can result in suboptimal or even wrong behaviour, as explained here: http://lists.xen.org/archives/html/xen-devel/2012-10/msg01732.html This change, instead, when deciding what PCPUs to tickle upon VCPU wake-up, considers both what it is likely to happen on the PCPU where the wakeup occurs, as well as whether or not there are idle PCPUs where to run the waking VCPU. In fact, if there are idlers where the new VCPU can run, we can avoid interrupting the running VCPU. OTOH, in case there aren''t any of these PCPUs, preemption and migration are the way to go. This has been tested by running the following benchmarks inside 2, 6 and 10 VMs concurrently, on a shared host, each with 2 VCPUs and 960 MB of memory (host has 16 ways and 12 GB RAM). 1) All VMs had ''cpus="all"'' in their config file. $ sysbench --test=cpu ... (time, lower is better) | VMs | w/o this change | w/ this change | | 2 | 50.078467 +/- 1.6676162 | 49.704933 +/- 0.0277184 | | 6 | 63.259472 +/- 0.1137586 | 62.227367 +/- 0.3880619 | | 10 | 91.246797 +/- 0.1154008 | 91.174820 +/- 0.0928781 | $ sysbench --test=memory ... (throughput, higher is better) | VMs | w/o this change | w/ this change | | 2 | 485.56333 +/- 6.0527356 | 525.57833 +/- 25.085826 | | 6 | 401.36278 +/- 1.9745916 | 421.96111 +/- 9.0364048 | | 10 | 294.43933 +/- 0.8064945 | 302.49033 +/- 0.2343978 | $ specjbb2005 ... (throughput, higher is better) | VMs | w/o this change | w/ this change | | 2 | 43150.63 +/- 1359.5616 | 42720.632 +/- 1937.4488 | | 6 | 29274.29 +/- 1024.4042 | 29518.171 +/- 1014.5239 | | 10 | 19061.28 +/- 512.88561 | 19050.141 +/- 458.77327 | 2) All VMs had their VCPUs statically pinned to the host''s PCPUs. $ sysbench --test=cpu ... (time, lower is better) | VMs | w/o this change | w/ this change | | 2 | 47.8211 +/- 0.0215504 | 47.826900 +/- 0.0077872 | | 6 | 62.689122 +/- 0.0877173 | 62.764539 +/- 0.3882493 | | 10 | 90.321097 +/- 1.4803867 | 89.974570 +/- 1.1437566 | $ sysbench --test=memory ... (throughput, higher is better) | VMs | w/o this change | w/ this change | | 2 | 550.97667 +/- 2.3512355 | 550.87000 +/- 0.8140792 | | 6 | 443.15000 +/- 5.7471797 | 454.01056 +/- 8.4373466 | | 10 | 313.89233 +/- 1.3237493 | 321.81167 +/- 0.3528418 | $ specjbb2005 ... (throughput, higher is better) | 2 | 49591.057 +/- 952.93384 | 49610.98 +/- 1242.1675 | | 6 | 33538.247 +/- 1089.2115 | 33682.222 +/- 1216.1078 | | 10 | 21927.870 +/- 831.88742 | 21801.138 +/- 561.97068 | Numbers show how the change has either no or very limited impact (specjbb2005 case) or, when it does have some impact, that is an actual improvement in performances, especially in the sysbench-memory case. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -249,13 +249,25 @@ static inline void struct csched_vcpu * const cur CSCHED_VCPU(per_cpu(schedule_data, cpu).curr); struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu)); - cpumask_t mask; + cpumask_t mask, idle_mask; ASSERT(cur); cpumask_clear(&mask); - /* If strictly higher priority than current VCPU, signal the CPU */ - if ( new->pri > cur->pri ) + /* Check whether or not there are idlers that can run new */ + cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity); + + /* + * Should we ask cpu to reschedule? Well, if new can preempt cur, + * and there isn''t any other place where it can run, we do. OTOH, + * if there are idlers where new can run, we can avoid interrupting + * cur and ask them to come and pick new up. So no, in that case, we + * do not signal cpu, avoiding an unnecessary migration of a running + * VCPU. It is true that we are (probably) migrating new, but as it + * is waking up, it likely is cache-cold anyway. + */ + if ( new->pri > cur->pri && + (cur->pri == CSCHED_PRI_IDLE || cpumask_empty(&idle_mask)) ) { if ( cur->pri == CSCHED_PRI_IDLE ) SCHED_STAT_CRANK(tickle_local_idler); @@ -270,7 +282,7 @@ static inline void } /* - * If this CPU has at least two runnable VCPUs, we tickle any idlers to + * If this CPU has at least two runnable VCPUs, we tickle some idlers to * let them know there is runnable work in the system... */ if ( cur->pri > CSCHED_PRI_IDLE ) @@ -281,9 +293,16 @@ static inline void } else { - cpumask_t idle_mask; - cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity); + /* + * If there aren''t idlers for new, then letting new preempt cur and + * try to migrate cur becomes ineviable. If that is the case, update + * the mask of the to-be-tickled CPUs accordingly (i.e., with cur''s + * idlers instead of new''s). + */ + if ( new->pri > cur->pri && cpumask_empty(&idle_mask) ) + cpumask_and(&idle_mask, prv->idlers, cur->vcpu->cpu_affinity); + if ( !cpumask_empty(&idle_mask) ) { SCHED_STAT_CRANK(tickle_idlers_some); @@ -296,7 +315,6 @@ static inline void else cpumask_or(&mask, &mask, &idle_mask); } - cpumask_and(&mask, &mask, new->vcpu->cpu_affinity); } }
Dario Faggioli
2012-Dec-03 16:34 UTC
[PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs
So that it becomes possible to create specific scheduling event within each scheduler without worrying about the overlapping, and also without giving up being able to recognise them univocally. The latter is deemed as useful, since we can have more than one scheduler running at the same time, thanks to cpupools. The event ID is 12 bits, and this change uses the upper 3 of them for the ''scheduler ID''. This means we''re limited to 8 schedulers and to 512 scheduler specific tracing events. Both seem reasonable limitations as of now. This also converts the existing credit2 tracing (the only scheduler generating tracing events up to now) to the new system. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -29,18 +29,24 @@ #define d2printk(x...) //#define d2printk printk -#define TRC_CSCHED2_TICK TRC_SCHED_CLASS + 1 -#define TRC_CSCHED2_RUNQ_POS TRC_SCHED_CLASS + 2 -#define TRC_CSCHED2_CREDIT_BURN TRC_SCHED_CLASS + 3 -#define TRC_CSCHED2_CREDIT_ADD TRC_SCHED_CLASS + 4 -#define TRC_CSCHED2_TICKLE_CHECK TRC_SCHED_CLASS + 5 -#define TRC_CSCHED2_TICKLE TRC_SCHED_CLASS + 6 -#define TRC_CSCHED2_CREDIT_RESET TRC_SCHED_CLASS + 7 -#define TRC_CSCHED2_SCHED_TASKLET TRC_SCHED_CLASS + 8 -#define TRC_CSCHED2_UPDATE_LOAD TRC_SCHED_CLASS + 9 -#define TRC_CSCHED2_RUNQ_ASSIGN TRC_SCHED_CLASS + 10 -#define TRC_CSCHED2_UPDATE_VCPU_LOAD TRC_SCHED_CLASS + 11 -#define TRC_CSCHED2_UPDATE_RUNQ_LOAD TRC_SCHED_CLASS + 12 +/* + * Credit2 tracing events ("only" 512 available!). Check + * include/public/trace.h for more details. + */ +#define TRC_CSCHED2_EVENT(_e) ((TRC_SCHED_CLASS|TRC_MASK_CSCHED2) + _e) + +#define TRC_CSCHED2_TICK (TRC_CSCHED2_EVENT(1)) +#define TRC_CSCHED2_RUNQ_POS (TRC_CSCHED2_EVENT(2)) +#define TRC_CSCHED2_CREDIT_BURN (TRC_CSCHED2_EVENT(3)) +#define TRC_CSCHED2_CREDIT_ADD (TRC_CSCHED2_EVENT(4)) +#define TRC_CSCHED2_TICKLE_CHECK (TRC_CSCHED2_EVENT(5)) +#define TRC_CSCHED2_TICKLE (TRC_CSCHED2_EVENT(6)) +#define TRC_CSCHED2_CREDIT_RESET (TRC_CSCHED2_EVENT(7)) +#define TRC_CSCHED2_SCHED_TASKLET (TRC_CSCHED2_EVENT(8)) +#define TRC_CSCHED2_UPDATE_LOAD (TRC_CSCHED2_EVENT(9)) +#define TRC_CSCHED2_RUNQ_ASSIGN (TRC_CSCHED2_EVENT(10)) +#define TRC_CSCHED2_UPDATE_VCPU_LOAD (TRC_CSCHED2_EVENT(11)) +#define TRC_CSCHED2_UPDATE_RUNQ_LOAD (TRC_CSCHED2_EVENT(12)) /* * WARNING: This is still in an experimental phase. Status and work can be found at the diff --git a/xen/include/public/trace.h b/xen/include/public/trace.h --- a/xen/include/public/trace.h +++ b/xen/include/public/trace.h @@ -57,6 +57,26 @@ #define TRC_SCHED_CLASS 0x00022000 /* Scheduler-specific */ #define TRC_SCHED_VERBOSE 0x00028000 /* More inclusive scheduling */ +/* + * Per-scheduler masks, to identify scheduler specific events. + * + * The highest 3 bits of the last 12 bits of the above TRC_SCHED_* + * tracing classes are reserved for encoding what scheduler is producing + * the event. The last 9 bits is where the scheduler specific event is + * encoded. + * + * This means we can have at most 8 tracing scheduling masks (which + * means at most 8 schedulers generating tracing events) and, in each + * scheduler, up to 512 different events. + */ +#define TRC_SCHED_ID_BITS 3 +#define TRC_SCHED_MASK_SHIFT (TRC_SUBCLS_SHIFT - TRC_SCHED_ID_BITS) + +#define TRC_MASK_CSCHED (0 << TRC_SCHED_MASK_SHIFT) +#define TRC_MASK_CSCHED2 (1 << TRC_SCHED_MASK_SHIFT) +#define TRC_MASK_SEDF (2 << TRC_SCHED_MASK_SHIFT) +#define TRC_MASK_ARINC653 (3 << TRC_SCHED_MASK_SHIFT) + /* Trace classes for Hardware */ #define TRC_HW_PM 0x00801000 /* Power management traces */ #define TRC_HW_IRQ 0x00802000 /* Traces relating to the handling of IRQs */
About tickling, and PCPU selection. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -21,6 +21,7 @@ #include <asm/atomic.h> #include <xen/errno.h> #include <xen/keyhandler.h> +#include <xen/trace.h> /* @@ -95,6 +96,20 @@ /* + * Credit tracing events ("only" 512 available!). Check + * include/public/trace.h for more details. + */ +#define TRC_CSCHED_EVENT(_e) ((TRC_SCHED_CLASS|TRC_MASK_CSCHED) + _e) + +#define TRC_CSCHED_SCHED_TASKLET TRC_CSCHED_EVENT(1) +#define TRC_CSCHED_ACCOUNT_START TRC_CSCHED_EVENT(2) +#define TRC_CSCHED_ACCOUNT_STOP TRC_CSCHED_EVENT(3) +#define TRC_CSCHED_STOLEN_VCPU TRC_CSCHED_EVENT(4) +#define TRC_CSCHED_PICKED_CPU TRC_CSCHED_EVENT(5) +#define TRC_CSCHED_TICKLE TRC_CSCHED_EVENT(6) + + +/* * Boot parameters */ static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS; @@ -318,9 +333,26 @@ static inline void } } - /* Send scheduler interrupts to designated CPUs */ if ( !cpumask_empty(&mask) ) + { + if ( unlikely(tb_init_done) ) + { + /* Avoid TRACE_* to avoid a lot of useless !tb_init_done checks */ + for_each_cpu(cpu, &mask) + { + struct { + unsigned cpu:8; + } d; + d.cpu = cpu; + trace_var(TRC_CSCHED_TICKLE, 0, + sizeof(d), + (unsigned char*)&d); + } + } + + /* Send scheduler interrupts to designated CPUs */ cpumask_raise_softirq(&mask, SCHEDULE_SOFTIRQ); + } } static void @@ -552,6 +584,8 @@ static int if ( commit && spc ) spc->idle_bias = cpu; + TRACE_3D(TRC_CSCHED_PICKED_CPU, vc->domain->domain_id, vc->vcpu_id, cpu); + return cpu; } @@ -584,6 +618,9 @@ static inline void } } + TRACE_3D(TRC_CSCHED_ACCOUNT_START, sdom->dom->domain_id, + svc->vcpu->vcpu_id, sdom->active_vcpu_count); + spin_unlock_irqrestore(&prv->lock, flags); } @@ -606,6 +643,9 @@ static inline void { list_del_init(&sdom->active_sdom_elem); } + + TRACE_3D(TRC_CSCHED_ACCOUNT_STOP, sdom->dom->domain_id, + svc->vcpu->vcpu_id, sdom->active_vcpu_count); } static void @@ -1238,6 +1278,8 @@ csched_runq_steal(int peer_cpu, int cpu, if (__csched_vcpu_is_migrateable(vc, cpu)) { /* We got a candidate. Grab it! */ + TRACE_3D(TRC_CSCHED_STOLEN_VCPU, peer_cpu, + vc->domain->domain_id, vc->vcpu_id); SCHED_VCPU_STAT_CRANK(speer, migrate_q); SCHED_STAT_CRANK(migrate_queued); WARN_ON(vc->is_urgent); @@ -1398,6 +1440,7 @@ csched_schedule( /* Tasklet work (which runs in idle VCPU context) overrides all else. */ if ( tasklet_work_scheduled ) { + TRACE_0D(TRC_CSCHED_SCHED_TASKLET); snext = CSCHED_VCPU(idle_vcpu[cpu]); snext->pri = CSCHED_PRI_TS_BOOST; }
Ian Campbell
2012-Dec-03 17:12 UTC
Re: [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs
On Mon, 2012-12-03 at 16:34 +0000, Dario Faggioli wrote:> + /* > + * If there aren''t idlers for new, then letting new preempt cur and > + * try to migrate cur becomes ineviable. If that is the case, update"inevitable"> + * the mask of the to-be-tickled CPUs accordingly (i.e., with cur''s > + * idlers instead of new''s). > + */ > + if ( new->pri > cur->pri && cpumask_empty(&idle_mask) ) > + cpumask_and(&idle_mask, prv->idlers, cur->vcpu->cpu_affinity); > +(I have nothing more intelligent to say...)
Dario Faggioli
2012-Dec-03 18:26 UTC
Re: [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs
On Mon, 2012-12-03 at 17:12 +0000, Ian Campbell wrote:> On Mon, 2012-12-03 at 16:34 +0000, Dario Faggioli wrote: > > + /* > > + * If there aren''t idlers for new, then letting new preempt cur and > > + * try to migrate cur becomes ineviable. If that is the case, update > > "inevitable" >Gha! Again! :-( I really need that ''comment spellchecker'' thing! Actually, looking for it a little bit more thoroughly, I ran into this (which I didn''t know): http://vimdoc.sourceforge.net/htmldoc/spell.html As well as into the fact that just doing a '':set spell'' in a recent enough version of Vim, while editing a C file, seems to do the job quite nicely.> (I have nothing more intelligent to say...) >Well, it really was useful to me, as it triggered the above! :-) Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
George Dunlap
2012-Dec-04 18:53 UTC
Re: [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs
On Mon, Dec 3, 2012 at 4:34 PM, Dario Faggioli <raistlin@linux.it> wrote:> So that it becomes possible to create specific scheduling event > within each scheduler without worrying about the overlapping, > and also without giving up being able to recognise them > univocally. The latter is deemed as useful, since we can have > more than one scheduler running at the same time, thanks to > cpupools. > > The event ID is 12 bits, and this change uses the upper 3 of them > for the ''scheduler ID''. This means we''re limited to 8 schedulers > and to 512 scheduler specific tracing events. Both seem reasonable > limitations as of now. > > This also converts the existing credit2 tracing (the only scheduler > generating tracing events up to now) to the new system. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> >> > diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c > --- a/xen/common/sched_credit2.c > +++ b/xen/common/sched_credit2.c > @@ -29,18 +29,24 @@ > #define d2printk(x...) > //#define d2printk printk > > -#define TRC_CSCHED2_TICK TRC_SCHED_CLASS + 1 > -#define TRC_CSCHED2_RUNQ_POS TRC_SCHED_CLASS + 2 > -#define TRC_CSCHED2_CREDIT_BURN TRC_SCHED_CLASS + 3 > -#define TRC_CSCHED2_CREDIT_ADD TRC_SCHED_CLASS + 4 > -#define TRC_CSCHED2_TICKLE_CHECK TRC_SCHED_CLASS + 5 > -#define TRC_CSCHED2_TICKLE TRC_SCHED_CLASS + 6 > -#define TRC_CSCHED2_CREDIT_RESET TRC_SCHED_CLASS + 7 > -#define TRC_CSCHED2_SCHED_TASKLET TRC_SCHED_CLASS + 8 > -#define TRC_CSCHED2_UPDATE_LOAD TRC_SCHED_CLASS + 9 > -#define TRC_CSCHED2_RUNQ_ASSIGN TRC_SCHED_CLASS + 10 > -#define TRC_CSCHED2_UPDATE_VCPU_LOAD TRC_SCHED_CLASS + 11 > -#define TRC_CSCHED2_UPDATE_RUNQ_LOAD TRC_SCHED_CLASS + 12 > +/* > + * Credit2 tracing events ("only" 512 available!). Check > + * include/public/trace.h for more details. > + */ > +#define TRC_CSCHED2_EVENT(_e) ((TRC_SCHED_CLASS|TRC_MASK_CSCHED2) > + _e) >I think I would make this generic, and put it in trace.h -- maybe something like this? (Haven''t run this through a compiler) #define TRC_SCHED_CLASS_EVT(_c, _e) \ ((TRC_SCHED_CLASS|(TRC_SCHED_##_c<<TRC_SCHED_MASK_SHIFT))+(_e&TRC_SCHED_CLASS_MASK)) Then these would look like: #define TRC_CSCHED2_TICK TRC_SCHED_CLASS_EVT(CSCHED2,1)> + > +#define TRC_CSCHED2_TICK (TRC_CSCHED2_EVENT(1)) > +#define TRC_CSCHED2_RUNQ_POS (TRC_CSCHED2_EVENT(2)) > +#define TRC_CSCHED2_CREDIT_BURN (TRC_CSCHED2_EVENT(3)) > +#define TRC_CSCHED2_CREDIT_ADD (TRC_CSCHED2_EVENT(4)) > +#define TRC_CSCHED2_TICKLE_CHECK (TRC_CSCHED2_EVENT(5)) > +#define TRC_CSCHED2_TICKLE (TRC_CSCHED2_EVENT(6)) > +#define TRC_CSCHED2_CREDIT_RESET (TRC_CSCHED2_EVENT(7)) > +#define TRC_CSCHED2_SCHED_TASKLET (TRC_CSCHED2_EVENT(8)) > +#define TRC_CSCHED2_UPDATE_LOAD (TRC_CSCHED2_EVENT(9)) > +#define TRC_CSCHED2_RUNQ_ASSIGN (TRC_CSCHED2_EVENT(10)) > +#define TRC_CSCHED2_UPDATE_VCPU_LOAD (TRC_CSCHED2_EVENT(11)) > +#define TRC_CSCHED2_UPDATE_RUNQ_LOAD (TRC_CSCHED2_EVENT(12)) > > /* > * WARNING: This is still in an experimental phase. Status and work can > be found at the > diff --git a/xen/include/public/trace.h b/xen/include/public/trace.h > --- a/xen/include/public/trace.h > +++ b/xen/include/public/trace.h > @@ -57,6 +57,26 @@ > #define TRC_SCHED_CLASS 0x00022000 /* Scheduler-specific */ > #define TRC_SCHED_VERBOSE 0x00028000 /* More inclusive scheduling */ > > +/* > + * Per-scheduler masks, to identify scheduler specific events. > + * > + * The highest 3 bits of the last 12 bits of the above TRC_SCHED_* > + * tracing classes are reserved for encoding what scheduler is producing > + * the event. The last 9 bits is where the scheduler specific event is > + * encoded. > + * > + * This means we can have at most 8 tracing scheduling masks (which > + * means at most 8 schedulers generating tracing events) and, in each > + * scheduler, up to 512 different events. > + */ > +#define TRC_SCHED_ID_BITS 3 > +#define TRC_SCHED_MASK_SHIFT (TRC_SUBCLS_SHIFT - TRC_SCHED_ID_BITS) > + > +#define TRC_MASK_CSCHED (0 << TRC_SCHED_MASK_SHIFT) > +#define TRC_MASK_CSCHED2 (1 << TRC_SCHED_MASK_SHIFT) > +#define TRC_MASK_SEDF (2 << TRC_SCHED_MASK_SHIFT) > +#define TRC_MASK_ARINC653 (3 << TRC_SCHED_MASK_SHIFT) >I don''t think "mask" is right here -- these aren''t masks, they''re numerical values. :-) If we use something like the #define above, then we can do: #define TRC_SCHED_CSCHED 0 #define TRC_SCHED_CSCHED2 /*...*/ -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
George Dunlap
2012-Dec-04 18:55 UTC
Re: [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs
On Tue, Dec 4, 2012 at 6:53 PM, George Dunlap <George.Dunlap@eu.citrix.com>wrote:> > #define TRC_SCHED_CSCHED 0 > #define TRC_SCHED_CSCHED2 > /*...*/ >No idea what random key combination I pushed to make gmail just sent that e-mail... anyway, that should be a "1" after CSCHED2. Thoughts? -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
George Dunlap
2012-Dec-04 19:10 UTC
Re: [PATCH 3 of 3] xen: sched_credit: add some tracing
On 03/12/12 16:35, Dario Faggioli wrote:> + /* Avoid TRACE_* to avoid a lot of useless !tb_init_done checks */ > + for_each_cpu(cpu, &mask) > + { > + struct { > + unsigned cpu:8; > + } d; > + d.cpu = cpu; > + trace_var(TRC_CSCHED_TICKLE, 0, > + sizeof(d), > + (unsigned char*)&d);Why not just TRC_1D()? The tracing infrastructure can only set the size at a granularity of 32-bit words anyway, and at this point cpu is "unsigned int", which will be a single word. Other than that, everything looks good. -George
George Dunlap
2012-Dec-05 11:51 UTC
Re: [PATCH 3 of 3] xen: sched_credit: add some tracing
On 05/12/12 11:54, Dario Faggioli wrote:> On Tue, 2012-12-04 at 19:10 +0000, George Dunlap wrote: >> On 03/12/12 16:35, Dario Faggioli wrote: >>> + /* Avoid TRACE_* to avoid a lot of useless !tb_init_done checks */ >>> + for_each_cpu(cpu, &mask) >>> + { >>> + struct { >>> + unsigned cpu:8; >>> + } d; >>> + d.cpu = cpu; >>> + trace_var(TRC_CSCHED_TICKLE, 0, >>> + sizeof(d), >>> + (unsigned char*)&d); >> Why not just TRC_1D()? >> > As I tried to explain in the comment, I just wanted to avoid checking > for !tb_init_done more than once, as this happens within a loop and, at > least potentially, there may be more CPUs to tickle (and thus more calls > to TRACE_1D). I take this comment of yours as you not thinking that is > something worthwhile, right? If so, I can definitely turn this into a > "standard" TRACE_1D() call.Oh right -- yeah, no sense in having a duplicate check on tb_init_done; but the struct is still pointless; just passing sizeof(cpu) and &cpu should be prettier (even if the complier will probably optimize it to the same thing). -George
Dario Faggioli
2012-Dec-05 11:54 UTC
Re: [PATCH 3 of 3] xen: sched_credit: add some tracing
On Tue, 2012-12-04 at 19:10 +0000, George Dunlap wrote:> On 03/12/12 16:35, Dario Faggioli wrote: > > + /* Avoid TRACE_* to avoid a lot of useless !tb_init_done checks */ > > + for_each_cpu(cpu, &mask) > > + { > > + struct { > > + unsigned cpu:8; > > + } d; > > + d.cpu = cpu; > > + trace_var(TRC_CSCHED_TICKLE, 0, > > + sizeof(d), > > + (unsigned char*)&d); > > Why not just TRC_1D()? >As I tried to explain in the comment, I just wanted to avoid checking for !tb_init_done more than once, as this happens within a loop and, at least potentially, there may be more CPUs to tickle (and thus more calls to TRACE_1D). I take this comment of yours as you not thinking that is something worthwhile, right? If so, I can definitely turn this into a "standard" TRACE_1D() call.> The tracing infrastructure can only set the size > at a granularity of 32-bit words anyway, and at this point cpu is > "unsigned int", which will be a single word. >I know that. I just followed suit from sched_credit2.c, but I agree it''s quite pointless for just one single field. Even if we decide to leave the direct call to trace_var, I''ll kill the dummy struct.> Other than that, everything looks good. >Ok, thanks. :-) Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Dario Faggioli
2012-Dec-05 11:57 UTC
Re: [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs
On Tue, 2012-12-04 at 18:53 +0000, George Dunlap wrote:> > +/* > + * Credit2 tracing events ("only" 512 available!). Check > + * include/public/trace.h for more details. > + */ > +#define TRC_CSCHED2_EVENT(_e) ((TRC_SCHED_CLASS| > TRC_MASK_CSCHED2) + _e) > > I think I would make this generic, and put it in trace.h >Sounds good. Will do.> -- maybe something like this? (Haven''t run this through a compiler) > > #define TRC_SCHED_CLASS_EVT(_c, _e) \ > > ((TRC_SCHED_CLASS|(TRC_SCHED_##_c<<TRC_SCHED_MASK_SHIFT))+(_e&TRC_SCHED_CLASS_MASK)) >I''ll try it and resend.> +#define TRC_SCHED_ID_BITS 3 > +#define TRC_SCHED_MASK_SHIFT (TRC_SUBCLS_SHIFT - > TRC_SCHED_ID_BITS) > + > +#define TRC_MASK_CSCHED (0 << TRC_SCHED_MASK_SHIFT) > +#define TRC_MASK_CSCHED2 (1 << TRC_SCHED_MASK_SHIFT) > +#define TRC_MASK_SEDF (2 << TRC_SCHED_MASK_SHIFT) > +#define TRC_MASK_ARINC653 (3 << TRC_SCHED_MASK_SHIFT) > > I don''t think "mask" is right here -- these aren''t masks, they''re > numerical values. :-) If we use something like the #define above, > then we can do: > > #define TRC_SCHED_CSCHED 0 > #define TRC_SCHED_CSCHED2 > /*...*/ >I agree, bad name. (re "mask"). :-) Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On Wed, 2012-12-05 at 11:54 +0000, Dario Faggioli wrote:> On Tue, 2012-12-04 at 19:10 +0000, George Dunlap wrote: > > On 03/12/12 16:35, Dario Faggioli wrote: > > > + /* Avoid TRACE_* to avoid a lot of useless !tb_init_done checks */ > > > + for_each_cpu(cpu, &mask) > > > + { > > > + struct { > > > + unsigned cpu:8; > > > + } d; > > > + d.cpu = cpu; > > > + trace_var(TRC_CSCHED_TICKLE, 0, > > > + sizeof(d), > > > + (unsigned char*)&d); > > > > Why not just TRC_1D()? > > > As I tried to explain in the comment, I just wanted to avoid checking > for !tb_init_done more than once, as this happens within a loop and, at > least potentially, there may be more CPUs to tickle (and thus more calls > to TRACE_1D).If tb_init_done isn''t marked volatile or anything like that isn''t the check hoisted out of the loop by the compiler?> I take this comment of yours as you not thinking that is > something worthwhile, right? If so, I can definitely turn this into a > "standard" TRACE_1D() call.Or maybe consider __TRACE_1D and friends which omit the check? Ian.
Dario Faggioli
2012-Dec-05 12:15 UTC
Re: [PATCH 3 of 3] xen: sched_credit: add some tracing
On Wed, 2012-12-05 at 12:01 +0000, Ian Campbell wrote:> > As I tried to explain in the comment, I just wanted to avoid checking > > for !tb_init_done more than once, as this happens within a loop and, at > > least potentially, there may be more CPUs to tickle (and thus more calls > > to TRACE_1D). > > If tb_init_done isn''t marked volatile or anything like that isn''t the > check hoisted out of the loop by the compiler? >Good point. As they''re all macros, yes, I think that is something very likely to happen... Although, I haven''t checked the generated code, I''ll take a look. Thanks.> > I take this comment of yours as you not thinking that is > > something worthwhile, right? If so, I can definitely turn this into a > > "standard" TRACE_1D() call. > > Or maybe consider __TRACE_1D and friends which omit the check? >Mmm... It may well be me, but my $ grep __TRACE xen/* -R does not show any results... What am I missing? Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
George Dunlap
2012-Dec-05 12:16 UTC
Re: [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs
On 03/12/12 16:34, Dario Faggioli wrote:> Right now, when a VCPU wakes-up, we check if the it should preempt > what is running on the PCPU, and whether or not the waking VCPU can > be migrated (by tickling some idlers). However, this can result in > suboptimal or even wrong behaviour, as explained here: > > http://lists.xen.org/archives/html/xen-devel/2012-10/msg01732.html > > This change, instead, when deciding what PCPUs to tickle upon VCPU > wake-up, considers both what it is likely to happen on the PCPU > where the wakeup occurs, as well as whether or not there are idle > PCPUs where to run the waking VCPU. > In fact, if there are idlers where the new VCPU can run, we can > avoid interrupting the running VCPU. OTOH, in case there aren''t > any of these PCPUs, preemption and migration are the way to go. > > This has been tested by running the following benchmarks inside 2, > 6 and 10 VMs concurrently, on a shared host, each with 2 VCPUs and > 960 MB of memory (host has 16 ways and 12 GB RAM). > > 1) All VMs had ''cpus="all"'' in their config file. > > $ sysbench --test=cpu ... (time, lower is better) > | VMs | w/o this change | w/ this change | > | 2 | 50.078467 +/- 1.6676162 | 49.704933 +/- 0.0277184 | > | 6 | 63.259472 +/- 0.1137586 | 62.227367 +/- 0.3880619 | > | 10 | 91.246797 +/- 0.1154008 | 91.174820 +/- 0.0928781 | > $ sysbench --test=memory ... (throughput, higher is better) > | VMs | w/o this change | w/ this change | > | 2 | 485.56333 +/- 6.0527356 | 525.57833 +/- 25.085826 | > | 6 | 401.36278 +/- 1.9745916 | 421.96111 +/- 9.0364048 | > | 10 | 294.43933 +/- 0.8064945 | 302.49033 +/- 0.2343978 | > $ specjbb2005 ... (throughput, higher is better) > | VMs | w/o this change | w/ this change | > | 2 | 43150.63 +/- 1359.5616 | 42720.632 +/- 1937.4488 | > | 6 | 29274.29 +/- 1024.4042 | 29518.171 +/- 1014.5239 | > | 10 | 19061.28 +/- 512.88561 | 19050.141 +/- 458.77327 | > > > 2) All VMs had their VCPUs statically pinned to the host''s PCPUs. > > $ sysbench --test=cpu ... (time, lower is better) > | VMs | w/o this change | w/ this change | > | 2 | 47.8211 +/- 0.0215504 | 47.826900 +/- 0.0077872 | > | 6 | 62.689122 +/- 0.0877173 | 62.764539 +/- 0.3882493 | > | 10 | 90.321097 +/- 1.4803867 | 89.974570 +/- 1.1437566 | > $ sysbench --test=memory ... (throughput, higher is better) > | VMs | w/o this change | w/ this change | > | 2 | 550.97667 +/- 2.3512355 | 550.87000 +/- 0.8140792 | > | 6 | 443.15000 +/- 5.7471797 | 454.01056 +/- 8.4373466 | > | 10 | 313.89233 +/- 1.3237493 | 321.81167 +/- 0.3528418 | > $ specjbb2005 ... (throughput, higher is better) > | 2 | 49591.057 +/- 952.93384 | 49610.98 +/- 1242.1675 | > | 6 | 33538.247 +/- 1089.2115 | 33682.222 +/- 1216.1078 | > | 10 | 21927.870 +/- 831.88742 | 21801.138 +/- 561.97068 | > > > Numbers show how the change has either no or very limited impact > (specjbb2005 case) or, when it does have some impact, that is an > actual improvement in performances, especially in the > sysbench-memory case. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>So I think the principle is good, but the resulting set of "if" statements is hard to figure out what''s going on. What do you think about re-arranging things, something like the attached? This particular version I got rid of the stats, because they require if() statements that break up the flow. If we really think they''re useful, maybe we could have a separate block somewhere for them? We could actually do without the idlers_empty entirely, as if we just remove the condition from the "else" block, the "right thing" will happen; however, it means several unnecessary cpumask operations on a busy system. Thoughts? -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On Wed, 2012-12-05 at 12:15 +0000, Dario Faggioli wrote:> On Wed, 2012-12-05 at 12:01 +0000, Ian Campbell wrote: > > > As I tried to explain in the comment, I just wanted to avoid checking > > > for !tb_init_done more than once, as this happens within a loop and, at > > > least potentially, there may be more CPUs to tickle (and thus more calls > > > to TRACE_1D). > > > > If tb_init_done isn''t marked volatile or anything like that isn''t the > > check hoisted out of the loop by the compiler? > > > Good point. As they''re all macros, yes, I think that is something very > likely to happen... Although, I haven''t checked the generated code, I''ll > take a look. Thanks. > > > > I take this comment of yours as you not thinking that is > > > something worthwhile, right? If so, I can definitely turn this into a > > > "standard" TRACE_1D() call. > > > > Or maybe consider __TRACE_1D and friends which omit the check? > > > Mmm... It may well be me, but my > > $ grep __TRACE xen/* -R > > does not show any results... What am I missing?I meant to define + use those macros. Ian.
George Dunlap
2012-Dec-05 12:25 UTC
Re: [PATCH 3 of 3] xen: sched_credit: add some tracing
On 05/12/12 12:20, Ian Campbell wrote:> On Wed, 2012-12-05 at 12:15 +0000, Dario Faggioli wrote: >> On Wed, 2012-12-05 at 12:01 +0000, Ian Campbell wrote: >>>> As I tried to explain in the comment, I just wanted to avoid checking >>>> for !tb_init_done more than once, as this happens within a loop and, at >>>> least potentially, there may be more CPUs to tickle (and thus more calls >>>> to TRACE_1D). >>> If tb_init_done isn''t marked volatile or anything like that isn''t the >>> check hoisted out of the loop by the compiler? >>> >> Good point. As they''re all macros, yes, I think that is something very >> likely to happen... Although, I haven''t checked the generated code, I''ll >> take a look. Thanks. >> >>>> I take this comment of yours as you not thinking that is >>>> something worthwhile, right? If so, I can definitely turn this into a >>>> "standard" TRACE_1D() call. >>> Or maybe consider __TRACE_1D and friends which omit the check? >>> >> Mmm... It may well be me, but my >> >> $ grep __TRACE xen/* -R >> >> does not show any results... What am I missing? > I meant to define + use those macros.Well ATM there would be only one user -- and "trace_var(..., sizeof(cpu), &cpu);" is probably just as pretty as __TRACE_1D(..., cpu). I wouldn''t oppose such a patch, but I don''t think it should be required until we want to use "__TRACE_(N>2)D". -George
Mats Petersson
2012-Dec-05 12:38 UTC
Re: [PATCH 3 of 3] xen: sched_credit: add some tracing
On 05/12/12 12:15, Dario Faggioli wrote:> On Wed, 2012-12-05 at 12:01 +0000, Ian Campbell wrote: >>> As I tried to explain in the comment, I just wanted to avoid checking >>> for !tb_init_done more than once, as this happens within a loop and, at >>> least potentially, there may be more CPUs to tickle (and thus more calls >>> to TRACE_1D). >> If tb_init_done isn''t marked volatile or anything like that isn''t the >> check hoisted out of the loop by the compiler? >> > Good point. As they''re all macros, yes, I think that is something very > likely to happen... Although, I haven''t checked the generated code, I''ll > take a look. Thanks.But if there is a call to an opaque function (any function the compiler doesn''t "know" what it does in the current compile unit - or one where the compiler can''t determine that the global variable isn''t being modified, e.g. anything that uses pointers the compiler can''t trace, etc) in between, the compiler must NOT move the actual read of a global variable. So it would be worth checking the compile output. -- Mats> >>> I take this comment of yours as you not thinking that is >>> something worthwhile, right? If so, I can definitely turn this into a >>> "standard" TRACE_1D() call. >> Or maybe consider __TRACE_1D and friends which omit the check? >> > Mmm... It may well be me, but my > > $ grep __TRACE xen/* -R > > does not show any results... What am I missing? > > Dario >
Maybe Matching Threads
- [PATCH 0 of 3] credit2 updates
- [PATCH v2] xen: sched_credit: filter node-affinity mask against online cpus
- [PATCH] xen,credit1: Add variable timeslice
- [Patch] credit: Update other parameters when setting tslice_ms
- Re: [RFC/RFT][PATCH 0 of 3] rework locking in sched_adjust