The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
incompatible with the SEDF scheduler: Any vCPU using
VCPUOP_stop_periodic_timer (e.g. any vCPU of halfway modern PV Linux
guests) ends up on pCPU0 after that call. Obviously, running all PV
guests' (and namely Dom0's) vCPU-s on pCPU0 causes problems for those
guests rather sooner than later.

So the main thing that was clearly wrong (and bogus from the beginning)
was the use of cpumask_first() in sedf_pick_cpu(). It is being replaced
by a construct that prefers to put back the vCPU on the pCPU that it
got launched on.

However, there's one more glitch: When reducing the affinity of a vCPU
temporarily, and then widening it again to a set that includes the pCPU
that the vCPU was last running on, the generic scheduler code would not
force a migration of that vCPU, and hence it would forever stay on the
pCPU it last ran on. Since that can again create a load imbalance, the
SEDF scheduler wants a migration to happen regardless of it being
apparently unnecessary.

Of course, an alternative to checking for SEDF explicitly in
vcpu_set_affinity() would be to introduce a flags field in struct
scheduler, and have SEDF set an "always-migrate-on-affinity-change"
flag.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/common/sched_sedf.c
+++ b/xen/common/sched_sedf.c
@@ -397,7 +397,8 @@ static int sedf_pick_cpu(const struct sc

     online = cpupool_scheduler_cpumask(v->domain->cpupool);
     cpumask_and(&online_affinity, v->cpu_affinity, online);
-    return cpumask_first(&online_affinity);
+    return cpumask_cycle(v->vcpu_id % cpumask_weight(&online_affinity) - 1,
+                         &online_affinity);
 }

 /*
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -605,7 +605,8 @@ int vcpu_set_affinity(struct vcpu *v, co
     vcpu_schedule_lock_irq(v);

     cpumask_copy(v->cpu_affinity, affinity);
-    if ( !cpumask_test_cpu(v->processor, v->cpu_affinity) )
+    if ( VCPU2OP(v)->sched_id == XEN_SCHEDULER_SEDF ||
+         !cpumask_test_cpu(v->processor, v->cpu_affinity) )
        set_bit(_VPF_migrating, &v->pause_flags);

     vcpu_schedule_unlock_irq(v);
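For illustration of what the new return statement buys: each vCPU is now
steered towards a pCPU derived from its vCPU ID, rather than always towards
the first allowed pCPU. Below is a minimal standalone sketch of that
placement rule; it is only a model, using plain 64-bit bitmaps and
hand-rolled helpers in place of Xen's struct cpumask, cpumask_weight() and
cpumask_cycle(), not the hypervisor code itself:

#include <stdint.h>
#include <stdio.h>

/* Rough stand-in for Xen's cpumask_cycle(): first set bit after
 * position n, wrapping around to the overall first set bit. */
static int next_cpu_cycle(int n, uint64_t mask)
{
    for ( int cpu = n + 1; cpu < 64; cpu++ )
        if ( mask & (1ULL << cpu) )
            return cpu;
    for ( int cpu = 0; cpu < 64; cpu++ )
        if ( mask & (1ULL << cpu) )
            return cpu;
    return -1; /* empty mask */
}

/* Model of the new sedf_pick_cpu() placement:
 * cpumask_cycle(vcpu_id % cpumask_weight(mask) - 1, mask). */
static int pick_cpu(unsigned int vcpu_id, uint64_t online_affinity)
{
    unsigned int weight = __builtin_popcountll(online_affinity);

    return next_cpu_cycle((int)(vcpu_id % weight) - 1, online_affinity);
}

int main(void)
{
    /* Four usable pCPUs (0-3): vCPUs no longer all land on pCPU0. */
    for ( unsigned int v = 0; v < 6; v++ )
        printf("vCPU%u -> pCPU%d\n", v, pick_cpu(v, 0xfULL));
    return 0;
}

With four usable pCPUs this prints vCPU0 -> pCPU0, vCPU1 -> pCPU1,
vCPU2 -> pCPU2, vCPU3 -> pCPU3, vCPU4 -> pCPU0, and so on, i.e. the
bunching on pCPU0 that cpumask_first() produced is gone.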
On 01/03/2013 15:35, "Jan Beulich" <JBeulich@suse.com> wrote:

> The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
> incompatible with the SEDF scheduler: [...]
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Keir Fraser <keir@xen.org>
On Fri, 2013-03-01 at 15:35 +0000, Jan Beulich wrote:
> The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
> incompatible with the SEDF scheduler: [...]
>
> So the main thing that was clearly wrong (and bogus from the beginning)
> was the use of cpumask_first() in sedf_pick_cpu(). It is being replaced
> by a construct that prefers to put back the vCPU on the pCPU that it
> got launched on.

Is this change at all related to the long-standing failure to boot with
sedf in the test harness?
On 01.03.2013 16:35, Jan Beulich wrote:
> [...]
>
> Of course, an alternative to checking for SEDF explicitly in
> vcpu_set_affinity() would be to introduce a flags field in struct
> scheduler, and have SEDF set an "always-migrate-on-affinity-change"
> flag.

Or something like this? I don't like the test for sedf in schedule.c

diff -r 65105a4a8c7a xen/common/sched_sedf.c
--- a/xen/common/sched_sedf.c   Fri Mar 01 16:59:49 2013 +0100
+++ b/xen/common/sched_sedf.c   Mon Mar 04 07:35:53 2013 +0100
@@ -397,7 +397,8 @@ static int sedf_pick_cpu(const struct sc

     online = cpupool_scheduler_cpumask(v->domain->cpupool);
     cpumask_and(&online_affinity, v->cpu_affinity, online);
-    return cpumask_first(&online_affinity);
+    return cpumask_cycle(v->vcpu_id % cpumask_weight(&online_affinity) - 1,
+                         &online_affinity);
 }

 /*
@@ -1503,6 +1504,11 @@ out:
     return rc;
 }

+static void sedf_set_affinity(const struct scheduler *ops, struct vcpu *v)
+{
+    set_bit(_VPF_migrating, &v->pause_flags);
+}
+
 static struct sedf_priv_info _sedf_priv;

 const struct scheduler sched_sedf_def = {
@@ -1532,6 +1538,7 @@ const struct scheduler sched_sedf_def
     .sleep          = sedf_sleep,
     .wake           = sedf_wake,
     .adjust         = sedf_adjust,
+    .set_affinity   = sedf_set_affinity,
 };

 /*
diff -r 65105a4a8c7a xen/common/schedule.c
--- a/xen/common/schedule.c     Fri Mar 01 16:59:49 2013 +0100
+++ b/xen/common/schedule.c     Mon Mar 04 07:35:53 2013 +0100
@@ -615,6 +615,8 @@ int vcpu_set_affinity(struct vcpu *v, co
     cpumask_copy(v->cpu_affinity, affinity);
     if ( !cpumask_test_cpu(v->processor, v->cpu_affinity) )
         set_bit(_VPF_migrating, &v->pause_flags);
+    if ( VCPU2OP(v)->set_affinity )
+        SCHED_OP(VCPU2OP(v), set_affinity, v);

     vcpu_schedule_unlock_irq(v);

diff -r 65105a4a8c7a xen/include/xen/sched-if.h
--- a/xen/include/xen/sched-if.h        Fri Mar 01 16:59:49 2013 +0100
+++ b/xen/include/xen/sched-if.h        Mon Mar 04 07:35:53 2013 +0100
@@ -180,6 +180,7 @@ struct scheduler {
     int          (*pick_cpu)       (const struct scheduler *, struct vcpu *);
     void         (*migrate)        (const struct scheduler *, struct vcpu *,
                                     unsigned int);
+    void         (*set_affinity)   (const struct scheduler *, struct vcpu *);
     int          (*adjust)         (const struct scheduler *, struct domain *,
                                     struct xen_domctl_scheduler_op *);
     int          (*adjust_global)  (const struct scheduler *,

--
Juergen Gross                 Principal Developer Operating Systems
Fujitsu Technology Solutions           juergen.gross@ts.fujitsu.com
>>> On 02.03.13 at 04:44, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Fri, 2013-03-01 at 15:35 +0000, Jan Beulich wrote:
>> [...]
>
> Is this change at all related to the long-standing failure to boot with
> sedf in the test harness?

Yes, very likely. It was looking at the logs (where all Dom0 vCPU-s
were similarly running on pCPU0), plus immediately seeing a hang when
trying it out, that prompted me to finally look into the issue.

Jan
>>> On 04.03.13 at 07:48, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
> On 01.03.2013 16:35, Jan Beulich wrote:
>> [...]
>
> Or something like this? I don't like the test for sedf in schedule.c

As said - to me a flag would seem more appropriate than another
indirect call. But for now I'll commit as is - feel free to submit an
incremental patch.

Jan
On Fri, Mar 1, 2013 at 3:35 PM, Jan Beulich <JBeulich@suse.com> wrote:
> [...]
>
> However, there's one more glitch: When reducing the affinity of a vCPU
> temporarily, and then widening it again to a set that includes the pCPU
> that the vCPU was last running on, the generic scheduler code would not
> force a migration of that vCPU, and hence it would forever stay on the
> pCPU it last ran on. Since that can again create a load imbalance, the
> SEDF scheduler wants a migration to happen regardless of it being
> apparently unnecessary.

I'm not quite understanding what this is about -- why is this
necessary for sedf but not for the other schedulers?

 -George
>>> On 04.03.13 at 12:22, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> On Fri, Mar 1, 2013 at 3:35 PM, Jan Beulich <JBeulich@suse.com> wrote:
>> [...]
>
> I'm not quite understanding what this is about -- why is this
> necessary for sedf but not for the other schedulers?

The problem with sedf is that it doesn't do any balancing, and never
moves a vCPU to a different pCPU unless the affinity mask changes in a
way that makes this necessary (in which case it's the generic scheduler
code that invokes the relocation).

So when a vCPU narrows its affinity (in the extreme to a mask with just
one bit set) and then widens it again, it would nevertheless remain on
the pCPU it was formerly restricted to. When (perhaps much later) a
second and then a third vCPU do the same, they may all end up running
on the same pCPU, i.e. we'd get back to the problem addressed by the
first half of the fix here.

While you might argue that affinity changes ought to be done in a
"sensible" way, I think you still realize that there is behavior here
that one wouldn't be able to control without explicitly adding an
intermediate step when doing this for DomU-s. And you should also keep
in mind that there are certain things that Dom0 needs to change its
vCPU affinities for (pv-ops doesn't do that, which is why certain
things there just can't work); this is how I noticed in the first place
that the second half of the fix is necessary.

Jan
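To make the narrow-then-widen scenario concrete, here is a tiny
self-contained model of the pre-patch vcpu_set_affinity() check; it is an
illustration only, with bitmaps standing in for cpumasks, and its "move to
the lowest allowed pCPU" step is a deliberate simplification of the
interplay between _VPF_migrating and the old cpumask_first()-based
sedf_pick_cpu():

#include <stdio.h>

struct toy_vcpu {
    unsigned int processor;      /* pCPU the vCPU currently runs on */
    unsigned long cpu_affinity;  /* allowed pCPUs, as a bitmap      */
};

/* Model of the pre-patch check in vcpu_set_affinity(): only force a
 * migration when the current pCPU drops out of the new affinity mask. */
static void set_affinity(struct toy_vcpu *v, unsigned long affinity)
{
    v->cpu_affinity = affinity;
    if ( !(affinity & (1UL << v->processor)) )
    {
        /* Simplified stand-in for _VPF_migrating plus the old
         * sedf_pick_cpu(): move to the lowest-numbered allowed pCPU. */
        v->processor = (unsigned int)__builtin_ctzl(affinity);
        printf("migrated to pCPU%u\n", v->processor);
    }
    else
        printf("stays on pCPU%u\n", v->processor);
}

int main(void)
{
    struct toy_vcpu v = { .processor = 2, .cpu_affinity = 0xf };

    set_affinity(&v, 1UL << 0); /* narrow to pCPU0 only   -> forced move  */
    set_affinity(&v, 0xf);      /* widen to pCPU0-3 again -> no move, the */
                                /* vCPU is now stuck on pCPU0             */
    return 0;
}

Narrowing the mask to just pCPU0 forces a move there, but widening it again
does not force a move back, so every vCPU that ever goes through such a
narrow phase can end up accumulated on pCPU0 - which is the imbalance
described above, and why the patch forces a re-evaluation for SEDF on every
affinity change.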
On 04/03/13 11:37, Jan Beulich wrote:
> [...] And you should also keep in mind that there are certain things
> that Dom0 needs to change its vCPU affinities for (pv-ops doesn't do
> that, which is why certain things there just can't work); this is how
> I noticed in the first place that the second half of the fix is
> necessary.

Oh right -- so the problem is that when you *expand* the mask, you want
the scheduler to take another look at where might be a good place to
run the vcpu given the changed constraints.

It's probably not a bad idea to just extend that idea to all the
schedulers really. It's not like we expect people to be changing the
affinity masks hundreds of times a second.

 -George
On 04/03/13 12:23, Jan Beulich wrote:
>>>> On 04.03.13 at 13:11, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> It's probably not a bad idea to just extend that idea to all the
>> schedulers really. It's not like we expect people to be changing the
>> affinity masks hundreds of times a second.
>
> Yeah, why not - would (minimally) reduce code size and remove the
> SEDF-specific check from the generic scheduler code again.

I'll whip up a patch...

 -G
>>> On 04.03.13 at 13:11, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> It's probably not a bad idea to just extend that idea to all the
> schedulers really. It's not like we expect people to be changing the
> affinity masks hundreds of times a second.

Yeah, why not - would (minimally) reduce code size and remove the
SEDF-specific check from the generic scheduler code again.

Jan
Ian Jackson, 2013-Mar-05 15:54 UTC, Re: [PATCH] SEDF: avoid gathering vCPU-s on pCPU0 [and 1 more messages]:
Jan Beulich writes ("Re: [Xen-devel] [PATCH] SEDF: avoid gathering vCPU-s on pCPU0"):
> On 02.03.13 at 04:44, Ian Campbell <ian.campbell@citrix.com> wrote:
> > Is this change at all related to the long-standing failure to boot
> > with sedf in the test harness?
>
> Yes, very likely. It was looking at the logs (where all Dom0 vCPU-s
> were similarly running on pCPU0), plus immediately seeing a hang when
> trying it out, that prompted me to finally look into the issue.

Good, but...

Jan Beulich writes ("Re: [Xen-devel] [PATCH] SEDF: avoid gathering vCPU-s on pCPU0"):
...
> So when a vCPU narrows its affinity (in the extreme to a mask with
> just one bit set) and then widens it again, it would nevertheless
> remain on the pCPU it was formerly restricted to. When (perhaps much
> later) a second and then a third vCPU do the same, they may all end
> up running on the same pCPU, i.e. we'd get back to the problem
> addressed by the first half of the fix here.

I don't understand why having every VCPU in the whole system running
on a single PCPU would result in a complete hang. Surely unless we
have many VCPUs all piling on spinlocks there will not be that many
runnable VCPUs?

Or is this symptom due to the lack of PV spinlocks?

Ian.
Jan Beulich, 2013-Mar-05 16:11 UTC, Re: [PATCH] SEDF: avoid gathering vCPU-s on pCPU0 [and 1 more messages]:
>>> On 05.03.13 at 16:54, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> Jan Beulich writes ("Re: [Xen-devel] [PATCH] SEDF: avoid gathering vCPU-s on pCPU0"):
>> [...]
>
> I don't understand why having every VCPU in the whole system running
> on a single PCPU would result in a complete hang. Surely unless we
> have many VCPUs all piling on spinlocks there will not be that many
> runnable VCPUs?
>
> Or is this symptom due to the lack of PV spinlocks?

The case that I observed here was all CPUs piling up in stop_machine
logic - that's effectively an open-coded spinning operation. But
clearly non-ticket, non-pv spinlocks can result in similar livelock
effects (as soon as the locking rate is higher than what the single
CPU can handle in terms of executing locked-region code).

Jan
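For readers wondering what "open-coded spinning" looks like in this
context, here is a rough user-space illustration (an assumption-laden
sketch, not the actual Linux stop_machine code): every participant
announces its arrival and then busy-waits for all the others, with no
yield and no back-off. Run as ordinary threads this finishes instantly;
the point is that when all the participating vCPUs share one pCPU and do
not yield to the hypervisor while spinning, each waiter burns its whole
scheduling slice before the next participant can even arrive, which is
how the apparent hang above comes about.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NR_CPUS 4

static atomic_int arrived;

/* Stop_machine-style rendezvous shape: arrive, then spin until everyone
 * else has arrived too. No yield, no back-off - an open-coded spin. */
static void *rendezvous(void *arg)
{
    atomic_fetch_add(&arrived, 1);
    while ( atomic_load(&arrived) < NR_CPUS )
        ; /* busy-wait */
    printf("participant %ld made it through the rendezvous\n", (long)arg);
    return NULL;
}

int main(void)
{
    pthread_t t[NR_CPUS];

    for ( long i = 0; i < NR_CPUS; i++ )
        pthread_create(&t[i], NULL, rendezvous, (void *)i);
    for ( long i = 0; i < NR_CPUS; i++ )
        pthread_join(t[i], NULL);
    return 0;
}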