thr3ads.net - Xen devel - [PATCH 0 of 3] xen: sched_credit: fix tickling and add some tracing [Dec 2012]

If this information is useful, please help other people find it:
Share via:

Dario Faggioli

2012-Dec-03 16:34 UTC

[PATCH 0 of 3] xen: sched_credit: fix tickling and add some tracing

Hello,

This small series deals with some weirdness in the mechanism with which the
credit scheduler choses what PCPU to tickle upon a VCPU wake-up.  Details are
available in the changelog of the first patch.

The new approach has been extensively benchmarked and proved itself either
beneficial or harmless. That means it does not introduce any significant amount
of overhead and/or performances regressions while, for some workloads, it
improves the performances quite sensibly (e.g., `sysbench
--test=memory'').

Full results in the first changelog too.

The rest of the series introduces some macros to enable generating
per-scheduler tracing events, retaining the possibility of distinguishing them,
even with more than one scheduler running at any given time (via cpupools), and
adds some tracing to the credit scheduler.

Thanks and Regards, Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Dario Faggioli

2012-Dec-03 16:34 UTC

head link

[PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs

Right now, when a VCPU wakes-up, we check if the it should preempt
what is running on the PCPU, and whether or not the waking VCPU can
be migrated (by tickling some idlers). However, this can result in
suboptimal or even wrong behaviour, as explained here:

 http://lists.xen.org/archives/html/xen-devel/2012-10/msg01732.html

This change, instead, when deciding what PCPUs to tickle upon VCPU
wake-up, considers both what it is likely to happen on the PCPU
where the wakeup occurs, as well as whether or not there are idle
PCPUs where to run the waking VCPU.
In fact, if there are idlers where the new VCPU can run, we can
avoid interrupting the running VCPU. OTOH, in case there aren''t
any of these PCPUs, preemption and migration are the way to go.

This has been tested by running the following benchmarks inside 2,
6 and 10 VMs concurrently, on a shared host, each with 2 VCPUs and
960 MB of memory (host has 16 ways and 12 GB RAM).

1) All VMs had ''cpus="all"'' in their config file.

$ sysbench --test=cpu ... (time, lower is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 50.078467 +/- 1.6676162 | 49.704933 +/- 0.0277184 |
 | 6   | 63.259472 +/- 0.1137586 | 62.227367 +/- 0.3880619 |
 | 10  | 91.246797 +/- 0.1154008 | 91.174820 +/- 0.0928781 |
$ sysbench --test=memory ... (throughput, higher is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 485.56333 +/- 6.0527356 | 525.57833 +/- 25.085826 |
 | 6   | 401.36278 +/- 1.9745916 | 421.96111 +/- 9.0364048 |
 | 10  | 294.43933 +/- 0.8064945 | 302.49033 +/- 0.2343978 |
$ specjbb2005 ... (throughput, higher is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 43150.63 +/- 1359.5616  | 42720.632 +/- 1937.4488 |
 | 6   | 29274.29 +/- 1024.4042  | 29518.171 +/- 1014.5239 |
 | 10  | 19061.28 +/- 512.88561  | 19050.141 +/- 458.77327 |


2) All VMs had their VCPUs statically pinned to the host''s PCPUs.

$ sysbench --test=cpu ... (time, lower is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 47.8211   +/- 0.0215504 | 47.826900 +/- 0.0077872 |
 | 6   | 62.689122 +/- 0.0877173 | 62.764539 +/- 0.3882493 |
 | 10  | 90.321097 +/- 1.4803867 | 89.974570 +/- 1.1437566 |
$ sysbench --test=memory ... (throughput, higher is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 550.97667 +/- 2.3512355 | 550.87000 +/- 0.8140792 |
 | 6   | 443.15000 +/- 5.7471797 | 454.01056 +/- 8.4373466 |
 | 10  | 313.89233 +/- 1.3237493 | 321.81167 +/- 0.3528418 |
$ specjbb2005 ... (throughput, higher is better)
 | 2   | 49591.057 +/- 952.93384 | 49610.98  +/- 1242.1675 |
 | 6   | 33538.247 +/- 1089.2115 | 33682.222 +/- 1216.1078 |
 | 10  | 21927.870 +/- 831.88742 | 21801.138 +/- 561.97068 |


Numbers show how the change has either no or very limited impact
(specjbb2005 case) or, when it does have some impact, that is an
actual improvement in performances, especially in the
sysbench-memory case.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -249,13 +249,25 @@ static inline void
     struct csched_vcpu * const cur          CSCHED_VCPU(per_cpu(schedule_data,
cpu).curr);
     struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
-    cpumask_t mask;
+    cpumask_t mask, idle_mask;
 
     ASSERT(cur);
     cpumask_clear(&mask);
 
-    /* If strictly higher priority than current VCPU, signal the CPU */
-    if ( new->pri > cur->pri )
+    /* Check whether or not there are idlers that can run new */
+    cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
+
+    /*
+     * Should we ask cpu to reschedule? Well, if new can preempt cur,
+     * and there isn''t any other place where it can run, we do. OTOH,
+     * if there are idlers where new can run, we can avoid interrupting
+     * cur and ask them to come and pick new up. So no, in that case, we
+     * do not signal cpu, avoiding an unnecessary migration of a running
+     * VCPU. It is true that we are (probably) migrating new, but as it
+     * is waking up, it likely is cache-cold anyway.
+     */
+    if ( new->pri > cur->pri &&
+         (cur->pri == CSCHED_PRI_IDLE || cpumask_empty(&idle_mask)) )
     {
         if ( cur->pri == CSCHED_PRI_IDLE )
             SCHED_STAT_CRANK(tickle_local_idler);
@@ -270,7 +282,7 @@ static inline void
     }
 
     /*
-     * If this CPU has at least two runnable VCPUs, we tickle any idlers to
+     * If this CPU has at least two runnable VCPUs, we tickle some idlers to
      * let them know there is runnable work in the system...
      */
     if ( cur->pri > CSCHED_PRI_IDLE )
@@ -281,9 +293,16 @@ static inline void
         }
         else
         {
-            cpumask_t idle_mask;
 
-            cpumask_and(&idle_mask, prv->idlers,
new->vcpu->cpu_affinity);
+            /*
+             * If there aren''t idlers for new, then letting new
preempt cur and
+             * try to migrate cur becomes ineviable. If that is the case,
update
+             * the mask of the to-be-tickled CPUs accordingly (i.e., with
cur''s
+             * idlers instead of new''s).
+             */
+            if ( new->pri > cur->pri &&
cpumask_empty(&idle_mask) )
+                cpumask_and(&idle_mask, prv->idlers,
cur->vcpu->cpu_affinity);
+
             if ( !cpumask_empty(&idle_mask) )
             {
                 SCHED_STAT_CRANK(tickle_idlers_some);
@@ -296,7 +315,6 @@ static inline void
                 else
                     cpumask_or(&mask, &mask, &idle_mask);
             }
-            cpumask_and(&mask, &mask, new->vcpu->cpu_affinity);
         }
     }

Dario Faggioli

2012-Dec-03 16:34 UTC

head link

[PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs

So that it becomes possible to create specific scheduling event
within each scheduler without worrying about the overlapping,
and also without giving up being able to recognise them
univocally. The latter is deemed as useful, since we can have
more than one scheduler running at the same time, thanks to
cpupools.

The event ID is 12 bits, and this change uses the upper 3 of them
for the ''scheduler ID''. This means we''re limited to 8
schedulers
and to 512 scheduler specific tracing events. Both seem reasonable
limitations as of now.

This also converts the existing credit2 tracing (the only scheduler
generating tracing events up to now) to the new system.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -29,18 +29,24 @@
 #define d2printk(x...)
 //#define d2printk printk
 
-#define TRC_CSCHED2_TICK        TRC_SCHED_CLASS + 1
-#define TRC_CSCHED2_RUNQ_POS    TRC_SCHED_CLASS + 2
-#define TRC_CSCHED2_CREDIT_BURN TRC_SCHED_CLASS + 3
-#define TRC_CSCHED2_CREDIT_ADD  TRC_SCHED_CLASS + 4
-#define TRC_CSCHED2_TICKLE_CHECK TRC_SCHED_CLASS + 5
-#define TRC_CSCHED2_TICKLE       TRC_SCHED_CLASS + 6
-#define TRC_CSCHED2_CREDIT_RESET TRC_SCHED_CLASS + 7
-#define TRC_CSCHED2_SCHED_TASKLET TRC_SCHED_CLASS + 8
-#define TRC_CSCHED2_UPDATE_LOAD   TRC_SCHED_CLASS + 9
-#define TRC_CSCHED2_RUNQ_ASSIGN   TRC_SCHED_CLASS + 10
-#define TRC_CSCHED2_UPDATE_VCPU_LOAD   TRC_SCHED_CLASS + 11
-#define TRC_CSCHED2_UPDATE_RUNQ_LOAD   TRC_SCHED_CLASS + 12
+/*
+ * Credit2 tracing events ("only" 512 available!). Check
+ * include/public/trace.h for more details.
+ */
+#define TRC_CSCHED2_EVENT(_e)        ((TRC_SCHED_CLASS|TRC_MASK_CSCHED2) + _e)
+
+#define TRC_CSCHED2_TICK             (TRC_CSCHED2_EVENT(1))
+#define TRC_CSCHED2_RUNQ_POS         (TRC_CSCHED2_EVENT(2))
+#define TRC_CSCHED2_CREDIT_BURN      (TRC_CSCHED2_EVENT(3))
+#define TRC_CSCHED2_CREDIT_ADD       (TRC_CSCHED2_EVENT(4))
+#define TRC_CSCHED2_TICKLE_CHECK     (TRC_CSCHED2_EVENT(5))
+#define TRC_CSCHED2_TICKLE           (TRC_CSCHED2_EVENT(6))
+#define TRC_CSCHED2_CREDIT_RESET     (TRC_CSCHED2_EVENT(7))
+#define TRC_CSCHED2_SCHED_TASKLET    (TRC_CSCHED2_EVENT(8))
+#define TRC_CSCHED2_UPDATE_LOAD      (TRC_CSCHED2_EVENT(9))
+#define TRC_CSCHED2_RUNQ_ASSIGN      (TRC_CSCHED2_EVENT(10))
+#define TRC_CSCHED2_UPDATE_VCPU_LOAD (TRC_CSCHED2_EVENT(11))
+#define TRC_CSCHED2_UPDATE_RUNQ_LOAD (TRC_CSCHED2_EVENT(12))
 
 /*
  * WARNING: This is still in an experimental phase.  Status and work can be
found at the
diff --git a/xen/include/public/trace.h b/xen/include/public/trace.h
--- a/xen/include/public/trace.h
+++ b/xen/include/public/trace.h
@@ -57,6 +57,26 @@
 #define TRC_SCHED_CLASS     0x00022000   /* Scheduler-specific    */
 #define TRC_SCHED_VERBOSE   0x00028000   /* More inclusive scheduling */
 
+/*
+ * Per-scheduler masks, to identify scheduler specific events.
+ *
+ * The highest 3 bits of the last 12 bits of the above TRC_SCHED_*
+ * tracing classes are reserved for encoding what scheduler is producing
+ * the event. The last 9 bits is where the scheduler specific event is
+ * encoded.
+ *
+ * This means we can have at most 8 tracing scheduling masks (which
+ * means at most 8 schedulers generating tracing events) and, in each
+ * scheduler, up to 512 different events.
+ */
+#define TRC_SCHED_ID_BITS    3
+#define TRC_SCHED_MASK_SHIFT (TRC_SUBCLS_SHIFT - TRC_SCHED_ID_BITS)
+
+#define TRC_MASK_CSCHED      (0 << TRC_SCHED_MASK_SHIFT)
+#define TRC_MASK_CSCHED2     (1 << TRC_SCHED_MASK_SHIFT)
+#define TRC_MASK_SEDF        (2 << TRC_SCHED_MASK_SHIFT)
+#define TRC_MASK_ARINC653    (3 << TRC_SCHED_MASK_SHIFT)
+
 /* Trace classes for Hardware */
 #define TRC_HW_PM           0x00801000   /* Power management traces */
 #define TRC_HW_IRQ          0x00802000   /* Traces relating to the handling of
IRQs */

Dario Faggioli

2012-Dec-03 16:35 UTC

head link

[PATCH 3 of 3] xen: sched_credit: add some tracing

About tickling, and PCPU selection.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -21,6 +21,7 @@
 #include <asm/atomic.h>
 #include <xen/errno.h>
 #include <xen/keyhandler.h>
+#include <xen/trace.h>
 
 
 /*
@@ -95,6 +96,20 @@
 
 
 /*
+ * Credit tracing events ("only" 512 available!). Check
+ * include/public/trace.h for more details.
+ */
+#define TRC_CSCHED_EVENT(_e)     ((TRC_SCHED_CLASS|TRC_MASK_CSCHED) + _e)
+
+#define TRC_CSCHED_SCHED_TASKLET TRC_CSCHED_EVENT(1)
+#define TRC_CSCHED_ACCOUNT_START TRC_CSCHED_EVENT(2)
+#define TRC_CSCHED_ACCOUNT_STOP  TRC_CSCHED_EVENT(3)
+#define TRC_CSCHED_STOLEN_VCPU   TRC_CSCHED_EVENT(4)
+#define TRC_CSCHED_PICKED_CPU    TRC_CSCHED_EVENT(5)
+#define TRC_CSCHED_TICKLE        TRC_CSCHED_EVENT(6)
+
+
+/*
  * Boot parameters
  */
 static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS;
@@ -318,9 +333,26 @@ static inline void
         }
     }
 
-    /* Send scheduler interrupts to designated CPUs */
     if ( !cpumask_empty(&mask) )
+    {
+        if ( unlikely(tb_init_done) )
+        {
+            /* Avoid TRACE_* to avoid a lot of useless !tb_init_done checks */
+            for_each_cpu(cpu, &mask)
+            {
+                struct {
+                    unsigned cpu:8;
+                } d;
+                d.cpu = cpu;
+                trace_var(TRC_CSCHED_TICKLE, 0,
+                          sizeof(d),
+                          (unsigned char*)&d);
+            }
+        }
+
+        /* Send scheduler interrupts to designated CPUs */
         cpumask_raise_softirq(&mask, SCHEDULE_SOFTIRQ);
+    }
 }
 
 static void
@@ -552,6 +584,8 @@ static int
     if ( commit && spc )
        spc->idle_bias = cpu;
 
+    TRACE_3D(TRC_CSCHED_PICKED_CPU, vc->domain->domain_id,
vc->vcpu_id, cpu);
+
     return cpu;
 }
 
@@ -584,6 +618,9 @@ static inline void
         }
     }
 
+    TRACE_3D(TRC_CSCHED_ACCOUNT_START, sdom->dom->domain_id,
+             svc->vcpu->vcpu_id, sdom->active_vcpu_count);
+
     spin_unlock_irqrestore(&prv->lock, flags);
 }
 
@@ -606,6 +643,9 @@ static inline void
     {
         list_del_init(&sdom->active_sdom_elem);
     }
+
+    TRACE_3D(TRC_CSCHED_ACCOUNT_STOP, sdom->dom->domain_id,
+             svc->vcpu->vcpu_id, sdom->active_vcpu_count);
 }
 
 static void
@@ -1238,6 +1278,8 @@ csched_runq_steal(int peer_cpu, int cpu,
             if (__csched_vcpu_is_migrateable(vc, cpu))
             {
                 /* We got a candidate. Grab it! */
+                TRACE_3D(TRC_CSCHED_STOLEN_VCPU, peer_cpu,
+                         vc->domain->domain_id, vc->vcpu_id);
                 SCHED_VCPU_STAT_CRANK(speer, migrate_q);
                 SCHED_STAT_CRANK(migrate_queued);
                 WARN_ON(vc->is_urgent);
@@ -1398,6 +1440,7 @@ csched_schedule(
     /* Tasklet work (which runs in idle VCPU context) overrides all else. */
     if ( tasklet_work_scheduled )
     {
+        TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
         snext = CSCHED_VCPU(idle_vcpu[cpu]);
         snext->pri = CSCHED_PRI_TS_BOOST;
     }

Ian Campbell

2012-Dec-03 17:12 UTC

head link

Re: [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs

On Mon, 2012-12-03 at 16:34 +0000, Dario Faggioli wrote:> +            /*
> +             * If there aren''t idlers for new, then letting new
preempt cur and
> +             * try to migrate cur becomes ineviable. If that is the case,
update
"inevitable"
> +             * the mask of the to-be-tickled CPUs accordingly (i.e., with
cur''s
> +             * idlers instead of new''s).
> +             */
> +            if ( new->pri > cur->pri &&
cpumask_empty(&idle_mask) )
> +                cpumask_and(&idle_mask, prv->idlers,
cur->vcpu->cpu_affinity);
> +
(I have nothing more intelligent to say...)

Dario Faggioli

2012-Dec-03 18:26 UTC

head link

Re: [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs

On Mon, 2012-12-03 at 17:12 +0000, Ian Campbell wrote:> On Mon, 2012-12-03 at 16:34 +0000, Dario Faggioli wrote:
> > +            /*
> > +             * If there aren''t idlers for new, then letting
new preempt cur and
> > +             * try to migrate cur becomes ineviable. If that is the
case, update
> 
> "inevitable"
> Gha! Again! :-(

I really need that ''comment spellchecker'' thing! Actually,
looking for
it a little bit more thoroughly, I ran into this (which I didn''t know):

http://vimdoc.sourceforge.net/htmldoc/spell.html

As well as into the fact that just doing a '':set spell'' in a
recent
enough version of Vim, while editing a C file, seems to do the job quite
nicely.
> (I have nothing more intelligent to say...)
> Well, it really was useful to me, as it triggered the above! :-)

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

George Dunlap

2012-Dec-04 18:53 UTC

head link

Re: [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs

On Mon, Dec 3, 2012 at 4:34 PM, Dario Faggioli <raistlin@linux.it> wrote:
> So that it becomes possible to create specific scheduling event
> within each scheduler without worrying about the overlapping,
> and also without giving up being able to recognise them
> univocally. The latter is deemed as useful, since we can have
> more than one scheduler running at the same time, thanks to
> cpupools.
>
> The event ID is 12 bits, and this change uses the upper 3 of them
> for the ''scheduler ID''. This means we''re limited
to 8 schedulers
> and to 512 scheduler specific tracing events. Both seem reasonable
> limitations as of now.
>
> This also converts the existing credit2 tracing (the only scheduler
> generating tracing events up to now) to the new system.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>


>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -29,18 +29,24 @@
>  #define d2printk(x...)
>  //#define d2printk printk
>
> -#define TRC_CSCHED2_TICK        TRC_SCHED_CLASS + 1
> -#define TRC_CSCHED2_RUNQ_POS    TRC_SCHED_CLASS + 2
> -#define TRC_CSCHED2_CREDIT_BURN TRC_SCHED_CLASS + 3
> -#define TRC_CSCHED2_CREDIT_ADD  TRC_SCHED_CLASS + 4
> -#define TRC_CSCHED2_TICKLE_CHECK TRC_SCHED_CLASS + 5
> -#define TRC_CSCHED2_TICKLE       TRC_SCHED_CLASS + 6
> -#define TRC_CSCHED2_CREDIT_RESET TRC_SCHED_CLASS + 7
> -#define TRC_CSCHED2_SCHED_TASKLET TRC_SCHED_CLASS + 8
> -#define TRC_CSCHED2_UPDATE_LOAD   TRC_SCHED_CLASS + 9
> -#define TRC_CSCHED2_RUNQ_ASSIGN   TRC_SCHED_CLASS + 10
> -#define TRC_CSCHED2_UPDATE_VCPU_LOAD   TRC_SCHED_CLASS + 11
> -#define TRC_CSCHED2_UPDATE_RUNQ_LOAD   TRC_SCHED_CLASS + 12
> +/*
> + * Credit2 tracing events ("only" 512 available!). Check
> + * include/public/trace.h for more details.
> + */
> +#define TRC_CSCHED2_EVENT(_e)        ((TRC_SCHED_CLASS|TRC_MASK_CSCHED2)
> + _e)
>
I think I would make this generic, and put it in trace.h -- maybe something
like this?  (Haven''t run this through a compiler)

#define TRC_SCHED_CLASS_EVT(_c, _e) \

((TRC_SCHED_CLASS|(TRC_SCHED_##_c<<TRC_SCHED_MASK_SHIFT))+(_e&TRC_SCHED_CLASS_MASK))

Then these would look like:
#define TRC_CSCHED2_TICK TRC_SCHED_CLASS_EVT(CSCHED2,1)

> +
> +#define TRC_CSCHED2_TICK             (TRC_CSCHED2_EVENT(1))
> +#define TRC_CSCHED2_RUNQ_POS         (TRC_CSCHED2_EVENT(2))
> +#define TRC_CSCHED2_CREDIT_BURN      (TRC_CSCHED2_EVENT(3))
> +#define TRC_CSCHED2_CREDIT_ADD       (TRC_CSCHED2_EVENT(4))
> +#define TRC_CSCHED2_TICKLE_CHECK     (TRC_CSCHED2_EVENT(5))
> +#define TRC_CSCHED2_TICKLE           (TRC_CSCHED2_EVENT(6))
> +#define TRC_CSCHED2_CREDIT_RESET     (TRC_CSCHED2_EVENT(7))
> +#define TRC_CSCHED2_SCHED_TASKLET    (TRC_CSCHED2_EVENT(8))
> +#define TRC_CSCHED2_UPDATE_LOAD      (TRC_CSCHED2_EVENT(9))
> +#define TRC_CSCHED2_RUNQ_ASSIGN      (TRC_CSCHED2_EVENT(10))
> +#define TRC_CSCHED2_UPDATE_VCPU_LOAD (TRC_CSCHED2_EVENT(11))
> +#define TRC_CSCHED2_UPDATE_RUNQ_LOAD (TRC_CSCHED2_EVENT(12))
>
>  /*
>   * WARNING: This is still in an experimental phase.  Status and work can
> be found at the
> diff --git a/xen/include/public/trace.h b/xen/include/public/trace.h
> --- a/xen/include/public/trace.h
> +++ b/xen/include/public/trace.h
> @@ -57,6 +57,26 @@
>  #define TRC_SCHED_CLASS     0x00022000   /* Scheduler-specific    */
>  #define TRC_SCHED_VERBOSE   0x00028000   /* More inclusive scheduling */
>
> +/*
> + * Per-scheduler masks, to identify scheduler specific events.
> + *
> + * The highest 3 bits of the last 12 bits of the above TRC_SCHED_*
> + * tracing classes are reserved for encoding what scheduler is producing
> + * the event. The last 9 bits is where the scheduler specific event is
> + * encoded.
> + *
> + * This means we can have at most 8 tracing scheduling masks (which
> + * means at most 8 schedulers generating tracing events) and, in each
> + * scheduler, up to 512 different events.
> + */
> +#define TRC_SCHED_ID_BITS    3
> +#define TRC_SCHED_MASK_SHIFT (TRC_SUBCLS_SHIFT - TRC_SCHED_ID_BITS)
> +
> +#define TRC_MASK_CSCHED      (0 << TRC_SCHED_MASK_SHIFT)
> +#define TRC_MASK_CSCHED2     (1 << TRC_SCHED_MASK_SHIFT)
> +#define TRC_MASK_SEDF        (2 << TRC_SCHED_MASK_SHIFT)
> +#define TRC_MASK_ARINC653    (3 << TRC_SCHED_MASK_SHIFT)
>
I don''t think "mask" is right here -- these aren''t
masks, they''re numerical
values. :-)  If we use something like the #define above, then we can do:

#define TRC_SCHED_CSCHED 0
#define TRC_SCHED_CSCHED2
/*...*/

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

George Dunlap

2012-Dec-04 18:55 UTC

head link

Re: [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs

On Tue, Dec 4, 2012 at 6:53 PM, George Dunlap
<George.Dunlap@eu.citrix.com>wrote:
>
> #define TRC_SCHED_CSCHED 0
> #define TRC_SCHED_CSCHED2
> /*...*/
>
No idea what random key combination I pushed to make gmail just sent that
e-mail... anyway, that should be a "1" after CSCHED2.

Thoughts?

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

George Dunlap

2012-Dec-04 19:10 UTC

head link

Re: [PATCH 3 of 3] xen: sched_credit: add some tracing

On 03/12/12 16:35, Dario Faggioli wrote:> +            /* Avoid TRACE_* to avoid a lot of useless !tb_init_done
checks */
> +            for_each_cpu(cpu, &mask)
> +            {
> +                struct {
> +                    unsigned cpu:8;
> +                } d;
> +                d.cpu = cpu;
> +                trace_var(TRC_CSCHED_TICKLE, 0,
> +                          sizeof(d),
> +                          (unsigned char*)&d);
Why not just TRC_1D()?  The tracing infrastructure can only set the size 
at a granularity of 32-bit words anyway, and at this point cpu is 
"unsigned int", which will be a single word.

Other than that, everything looks good.

  -George

George Dunlap

2012-Dec-05 11:51 UTC

head link

Re: [PATCH 3 of 3] xen: sched_credit: add some tracing

On 05/12/12 11:54, Dario Faggioli wrote:> On Tue, 2012-12-04 at 19:10 +0000, George Dunlap wrote:
>> On 03/12/12 16:35, Dario Faggioli wrote:
>>> +            /* Avoid TRACE_* to avoid a lot of useless
!tb_init_done checks */
>>> +            for_each_cpu(cpu, &mask)
>>> +            {
>>> +                struct {
>>> +                    unsigned cpu:8;
>>> +                } d;
>>> +                d.cpu = cpu;
>>> +                trace_var(TRC_CSCHED_TICKLE, 0,
>>> +                          sizeof(d),
>>> +                          (unsigned char*)&d);
>> Why not just TRC_1D()?
>>
> As I tried to explain in the comment, I just wanted to avoid checking
> for !tb_init_done more than once, as this happens within a loop and, at
> least potentially, there may be more CPUs to tickle (and thus more calls
> to TRACE_1D). I take this comment of yours as you not thinking that is
> something worthwhile, right? If so, I can definitely turn this into a
> "standard" TRACE_1D() call.
Oh right -- yeah, no sense in having a duplicate check on tb_init_done; 
but the struct is still pointless; just passing sizeof(cpu) and &cpu 
should be prettier (even if the complier will probably optimize it to 
the same thing).

  -George

Dario Faggioli

2012-Dec-05 11:54 UTC

head link

Re: [PATCH 3 of 3] xen: sched_credit: add some tracing

On Tue, 2012-12-04 at 19:10 +0000, George Dunlap wrote: > On 03/12/12 16:35, Dario Faggioli wrote:
> > +            /* Avoid TRACE_* to avoid a lot of useless !tb_init_done
checks */
> > +            for_each_cpu(cpu, &mask)
> > +            {
> > +                struct {
> > +                    unsigned cpu:8;
> > +                } d;
> > +                d.cpu = cpu;
> > +                trace_var(TRC_CSCHED_TICKLE, 0,
> > +                          sizeof(d),
> > +                          (unsigned char*)&d);
> 
> Why not just TRC_1D()? 
>As I tried to explain in the comment, I just wanted to avoid checking
for !tb_init_done more than once, as this happens within a loop and, at
least potentially, there may be more CPUs to tickle (and thus more calls
to TRACE_1D). I take this comment of yours as you not thinking that is
something worthwhile, right? If so, I can definitely turn this into a
"standard" TRACE_1D() call.
> The tracing infrastructure can only set the size 
> at a granularity of 32-bit words anyway, and at this point cpu is 
> "unsigned int", which will be a single word.
> I know that. I just followed suit from sched_credit2.c, but I agree
it''s
quite pointless for just one single field. Even if we decide to leave
the direct call to trace_var, I''ll kill the dummy struct.
> Other than that, everything looks good.
> Ok, thanks. :-)
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Dario Faggioli

2012-Dec-05 11:57 UTC

head link

Re: [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs

On Tue, 2012-12-04 at 18:53 +0000, George Dunlap wrote: >         
>         +/*
>         + * Credit2 tracing events ("only" 512 available!). Check
>         + * include/public/trace.h for more details.
>         + */
>         +#define TRC_CSCHED2_EVENT(_e)        ((TRC_SCHED_CLASS|
>         TRC_MASK_CSCHED2) + _e) 
> 
> I think I would make this generic, and put it in trace.h 
>Sounds good. Will do.
> -- maybe something like this?  (Haven''t run this through a
compiler)
> 
> #define TRC_SCHED_CLASS_EVT(_c, _e) \
> 
>
((TRC_SCHED_CLASS|(TRC_SCHED_##_c<<TRC_SCHED_MASK_SHIFT))+(_e&TRC_SCHED_CLASS_MASK))
> 
I''ll try it and resend. >         +#define TRC_SCHED_ID_BITS    3
>         +#define TRC_SCHED_MASK_SHIFT (TRC_SUBCLS_SHIFT -
>         TRC_SCHED_ID_BITS)
>         +
>         +#define TRC_MASK_CSCHED      (0 << TRC_SCHED_MASK_SHIFT)
>         +#define TRC_MASK_CSCHED2     (1 << TRC_SCHED_MASK_SHIFT)
>         +#define TRC_MASK_SEDF        (2 << TRC_SCHED_MASK_SHIFT)
>         +#define TRC_MASK_ARINC653    (3 << TRC_SCHED_MASK_SHIFT) 
> 
> I don''t think "mask" is right here -- these
aren''t masks, they''re
> numerical values. :-)  If we use something like the #define above,
> then we can do:
> 
> #define TRC_SCHED_CSCHED 0
> #define TRC_SCHED_CSCHED2
> /*...*/
> I agree, bad name. (re "mask"). :-)

Thanks and Regards,
Dario 
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Ian Campbell

2012-Dec-05 12:01 UTC

head link

Re: [PATCH 3 of 3] xen: sched_credit: add some tracing

On Wed, 2012-12-05 at 11:54 +0000, Dario Faggioli wrote:> On Tue, 2012-12-04 at 19:10 +0000, George Dunlap wrote: 
> > On 03/12/12 16:35, Dario Faggioli wrote:
> > > +            /* Avoid TRACE_* to avoid a lot of useless
!tb_init_done checks */
> > > +            for_each_cpu(cpu, &mask)
> > > +            {
> > > +                struct {
> > > +                    unsigned cpu:8;
> > > +                } d;
> > > +                d.cpu = cpu;
> > > +                trace_var(TRC_CSCHED_TICKLE, 0,
> > > +                          sizeof(d),
> > > +                          (unsigned char*)&d);
> > 
> > Why not just TRC_1D()? 
> >
> As I tried to explain in the comment, I just wanted to avoid checking
> for !tb_init_done more than once, as this happens within a loop and, at
> least potentially, there may be more CPUs to tickle (and thus more calls
> to TRACE_1D).
If tb_init_done isn''t marked volatile or anything like that
isn''t the
check hoisted out of the loop by the compiler?
> I take this comment of yours as you not thinking that is
> something worthwhile, right? If so, I can definitely turn this into a
> "standard" TRACE_1D() call.
Or maybe consider __TRACE_1D and friends which omit the check?

Ian.

Dario Faggioli

2012-Dec-05 12:15 UTC

head link

Re: [PATCH 3 of 3] xen: sched_credit: add some tracing

On Wed, 2012-12-05 at 12:01 +0000, Ian Campbell wrote: > > As I tried to explain in the comment, I just wanted to avoid checking
> > for !tb_init_done more than once, as this happens within a loop and,
at
> > least potentially, there may be more CPUs to tickle (and thus more
calls
> > to TRACE_1D).
> 
> If tb_init_done isn''t marked volatile or anything like that
isn''t the
> check hoisted out of the loop by the compiler?
> Good point. As they''re all macros, yes, I think that is something very
likely to happen... Although, I haven''t checked the generated code,
I''ll
take a look. Thanks.
> > I take this comment of yours as you not thinking that is
> > something worthwhile, right? If so, I can definitely turn this into a
> > "standard" TRACE_1D() call.
> 
> Or maybe consider __TRACE_1D and friends which omit the check?
> Mmm... It may well be me, but my

$ grep __TRACE xen/* -R

does not show any results... What am I missing?

Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

George Dunlap

2012-Dec-05 12:16 UTC

head link

Re: [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs

On 03/12/12 16:34, Dario Faggioli wrote:> Right now, when a VCPU wakes-up, we check if the it should preempt
> what is running on the PCPU, and whether or not the waking VCPU can
> be migrated (by tickling some idlers). However, this can result in
> suboptimal or even wrong behaviour, as explained here:
>
>   http://lists.xen.org/archives/html/xen-devel/2012-10/msg01732.html
>
> This change, instead, when deciding what PCPUs to tickle upon VCPU
> wake-up, considers both what it is likely to happen on the PCPU
> where the wakeup occurs, as well as whether or not there are idle
> PCPUs where to run the waking VCPU.
> In fact, if there are idlers where the new VCPU can run, we can
> avoid interrupting the running VCPU. OTOH, in case there aren''t
> any of these PCPUs, preemption and migration are the way to go.
>
> This has been tested by running the following benchmarks inside 2,
> 6 and 10 VMs concurrently, on a shared host, each with 2 VCPUs and
> 960 MB of memory (host has 16 ways and 12 GB RAM).
>
> 1) All VMs had ''cpus="all"'' in their config
file.
>
> $ sysbench --test=cpu ... (time, lower is better)
>   | VMs | w/o this change         | w/ this change          |
>   | 2   | 50.078467 +/- 1.6676162 | 49.704933 +/- 0.0277184 |
>   | 6   | 63.259472 +/- 0.1137586 | 62.227367 +/- 0.3880619 |
>   | 10  | 91.246797 +/- 0.1154008 | 91.174820 +/- 0.0928781 |
> $ sysbench --test=memory ... (throughput, higher is better)
>   | VMs | w/o this change         | w/ this change          |
>   | 2   | 485.56333 +/- 6.0527356 | 525.57833 +/- 25.085826 |
>   | 6   | 401.36278 +/- 1.9745916 | 421.96111 +/- 9.0364048 |
>   | 10  | 294.43933 +/- 0.8064945 | 302.49033 +/- 0.2343978 |
> $ specjbb2005 ... (throughput, higher is better)
>   | VMs | w/o this change         | w/ this change          |
>   | 2   | 43150.63 +/- 1359.5616  | 42720.632 +/- 1937.4488 |
>   | 6   | 29274.29 +/- 1024.4042  | 29518.171 +/- 1014.5239 |
>   | 10  | 19061.28 +/- 512.88561  | 19050.141 +/- 458.77327 |
>
>
> 2) All VMs had their VCPUs statically pinned to the host''s PCPUs.
>
> $ sysbench --test=cpu ... (time, lower is better)
>   | VMs | w/o this change         | w/ this change          |
>   | 2   | 47.8211   +/- 0.0215504 | 47.826900 +/- 0.0077872 |
>   | 6   | 62.689122 +/- 0.0877173 | 62.764539 +/- 0.3882493 |
>   | 10  | 90.321097 +/- 1.4803867 | 89.974570 +/- 1.1437566 |
> $ sysbench --test=memory ... (throughput, higher is better)
>   | VMs | w/o this change         | w/ this change          |
>   | 2   | 550.97667 +/- 2.3512355 | 550.87000 +/- 0.8140792 |
>   | 6   | 443.15000 +/- 5.7471797 | 454.01056 +/- 8.4373466 |
>   | 10  | 313.89233 +/- 1.3237493 | 321.81167 +/- 0.3528418 |
> $ specjbb2005 ... (throughput, higher is better)
>   | 2   | 49591.057 +/- 952.93384 | 49610.98  +/- 1242.1675 |
>   | 6   | 33538.247 +/- 1089.2115 | 33682.222 +/- 1216.1078 |
>   | 10  | 21927.870 +/- 831.88742 | 21801.138 +/- 561.97068 |
>
>
> Numbers show how the change has either no or very limited impact
> (specjbb2005 case) or, when it does have some impact, that is an
> actual improvement in performances, especially in the
> sysbench-memory case.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
So I think the principle is good, but the resulting set of "if" 
statements is hard to figure out what''s going on.

What do you think about re-arranging things, something like the attached?

This particular version I got rid of the stats, because they require 
if() statements that break up the flow.  If we really think they''re 
useful, maybe we could have a separate block somewhere for them?

We could actually do without the idlers_empty entirely, as if we just 
remove the condition from the "else" block, the "right
thing" will
happen; however, it means several unnecessary cpumask operations on a 
busy system.

Thoughts?

  -George



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Ian Campbell

2012-Dec-05 12:20 UTC

head link

Re: [PATCH 3 of 3] xen: sched_credit: add some tracing

On Wed, 2012-12-05 at 12:15 +0000, Dario Faggioli wrote:> On Wed, 2012-12-05 at 12:01 +0000, Ian Campbell wrote: 
> > > As I tried to explain in the comment, I just wanted to avoid
checking
> > > for !tb_init_done more than once, as this happens within a loop
and, at
> > > least potentially, there may be more CPUs to tickle (and thus
more calls
> > > to TRACE_1D).
> > 
> > If tb_init_done isn''t marked volatile or anything like that
isn''t the
> > check hoisted out of the loop by the compiler?
> > 
> Good point. As they''re all macros, yes, I think that is something
very
> likely to happen... Although, I haven''t checked the generated
code, I''ll
> take a look. Thanks.
> 
> > > I take this comment of yours as you not thinking that is
> > > something worthwhile, right? If so, I can definitely turn this
into a
> > > "standard" TRACE_1D() call.
> > 
> > Or maybe consider __TRACE_1D and friends which omit the check?
> > 
> Mmm... It may well be me, but my
> 
> $ grep __TRACE xen/* -R
> 
> does not show any results... What am I missing?
I meant to define + use those macros.

Ian.

George Dunlap

2012-Dec-05 12:25 UTC

head link

Re: [PATCH 3 of 3] xen: sched_credit: add some tracing

On 05/12/12 12:20, Ian Campbell wrote:> On Wed, 2012-12-05 at 12:15 +0000, Dario Faggioli wrote:
>> On Wed, 2012-12-05 at 12:01 +0000, Ian Campbell wrote:
>>>> As I tried to explain in the comment, I just wanted to avoid
checking
>>>> for !tb_init_done more than once, as this happens within a loop
and, at
>>>> least potentially, there may be more CPUs to tickle (and thus
more calls
>>>> to TRACE_1D).
>>> If tb_init_done isn''t marked volatile or anything like
that isn''t the
>>> check hoisted out of the loop by the compiler?
>>>
>> Good point. As they''re all macros, yes, I think that is
something very
>> likely to happen... Although, I haven''t checked the generated
code, I''ll
>> take a look. Thanks.
>>
>>>> I take this comment of yours as you not thinking that is
>>>> something worthwhile, right? If so, I can definitely turn this
into a
>>>> "standard" TRACE_1D() call.
>>> Or maybe consider __TRACE_1D and friends which omit the check?
>>>
>> Mmm... It may well be me, but my
>>
>> $ grep __TRACE xen/* -R
>>
>> does not show any results... What am I missing?
> I meant to define + use those macros.
Well ATM there would be only one user -- and "trace_var(..., 
sizeof(cpu), &cpu);" is probably just as pretty as __TRACE_1D(...,
cpu).

I wouldn''t oppose such a patch, but I don''t think it should be
required
until we want to use "__TRACE_(N>2)D".

  -George

Mats Petersson

2012-Dec-05 12:38 UTC

head link

Re: [PATCH 3 of 3] xen: sched_credit: add some tracing

On 05/12/12 12:15, Dario Faggioli wrote:> On Wed, 2012-12-05 at 12:01 +0000, Ian Campbell wrote:
>>> As I tried to explain in the comment, I just wanted to avoid
checking
>>> for !tb_init_done more than once, as this happens within a loop
and, at
>>> least potentially, there may be more CPUs to tickle (and thus more
calls
>>> to TRACE_1D).
>> If tb_init_done isn''t marked volatile or anything like that
isn''t the
>> check hoisted out of the loop by the compiler?
>>
> Good point. As they''re all macros, yes, I think that is something
very
> likely to happen... Although, I haven''t checked the generated
code, I''ll
> take a look. Thanks.But if there is a call to an opaque function (any function the compiler 
doesn''t "know" what it does in the current compile unit - or
one where
the compiler can''t determine that the global variable isn''t
being
modified, e.g. anything that uses pointers the compiler can''t trace, 
etc) in between, the compiler must NOT move the actual read of a global 
variable. So it would be worth checking the compile output.

--
Mats>
>>> I take this comment of yours as you not thinking that is
>>> something worthwhile, right? If so, I can definitely turn this into
a
>>> "standard" TRACE_1D() call.
>> Or maybe consider __TRACE_1D and friends which omit the check?
>>
> Mmm... It may well be me, but my
>
> $ grep __TRACE xen/* -R
>
> does not show any results... What am I missing?
>
> Dario
>

Reasonably Related Threads

Search for more apparently analagous threads

Xen devel - Dec 2012 - [PATCH 0 of 3] xen: sched_credit: fix tickling and add some tracing

[PATCH 0 of 3] xen: sched_credit: fix tickling and add some tracing

[PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs

[PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs

[PATCH 3 of 3] xen: sched_credit: add some tracing

Re: [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs

Re: [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs

Re: [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs

Re: [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs

Re: [PATCH 3 of 3] xen: sched_credit: add some tracing

Re: [PATCH 3 of 3] xen: sched_credit: add some tracing

Re: [PATCH 3 of 3] xen: sched_credit: add some tracing

Re: [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs

Re: [PATCH 3 of 3] xen: sched_credit: add some tracing

Re: [PATCH 3 of 3] xen: sched_credit: add some tracing

Re: [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs

Re: [PATCH 3 of 3] xen: sched_credit: add some tracing

Re: [PATCH 3 of 3] xen: sched_credit: add some tracing

Re: [PATCH 3 of 3] xen: sched_credit: add some tracing

Reasonably Related Threads