Here are the details on the dom0 hang:

xen:  3.4.0
dom0: 2.6.18-128

dom0.vcpu0: spinning in schedule() on spinlock: spin_lock_irq(&rq->lock);
dom0.vcpu1: eip == ret after __HYPERVISOR_event_channel_op hypercall

Just out of curiosity, I set a breakpoint at the above ret in kdb, and it
never got hit. So I wondered why vcpu1 is not getting scheduled, and noticed
that xen.schedule always schedules vcpu0. There are two cpus on the box; the
other one is mostly idle.

Anyways, I've turned lock debugging on in dom0 and am reproducing it right
now.

thanks,
Mukesh

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Mukesh Rathor wrote:
> Here are the details on the dom0 hang:
>
> xen:  3.4.0
> dom0: 2.6.18-128
>
> dom0.vcpu0: spinning in schedule() on spinlock: spin_lock_irq(&rq->lock);
> dom0.vcpu1: eip == ret after __HYPERVISOR_event_channel_op hypercall
> [...]

Ok, here's what I have found on this:

dom0 hang:
vcpu0 is trying to wake up a task, and in try_to_wake_up() it calls
task_rq_lock(). Since the task has its cpu set to 1, it takes the runq lock
for vcpu1. Next it calls resched_task(), which results in sending an IPI to
vcpu1. For that, vcpu0 enters the HYPERVISOR_event_channel_op hypercall and
is waiting to return. Meanwhile, vcpu1 got running and is spinning on its
runq lock in "schedule(): spin_lock_irq(&rq->lock);" -- the lock that vcpu0
is holding (while waiting to return from the hypercall).

As I had noticed before, vcpu0 never gets scheduled in xen. So, looking
further into xen:

xen:
Both vcpus are on the same runq, in this case cpu1's. But the priority of
vcpu1 has been set to CSCHED_PRI_TS_BOOST. As a result, the scheduler always
picks vcpu1, and vcpu0 is starved. Also, I see in kdb that the scheduler
timer is not set on cpu 0; that would've allowed csched_load_balance() to
kick in on cpu0. [Also, on cpu1 the accounting timer, csched_tick, is not
set. Although csched_tick() is running on cpu0, it only checks the runq for
cpu0.]

Looks like c/s 19500 changed csched_schedule():

-    ret.time = MILLISECS(CSCHED_MSECS_PER_TSLICE);
+    ret.time = (is_idle_vcpu(snext->vcpu) ?
+                -1 : MILLISECS(CSCHED_MSECS_PER_TSLICE));

The quickest fix for us would be to just back that out.

BTW, just a comment on the following (all in sched_credit.c):

    if ( svc->pri == CSCHED_PRI_TS_UNDER &&
         !(svc->flags & CSCHED_FLAG_VCPU_PARKED) )
    {
        svc->pri = CSCHED_PRI_TS_BOOST;
    }

combined with:

    if ( snext->pri > CSCHED_PRI_TS_OVER )
        __runq_remove(snext);

Setting CSCHED_PRI_TS_BOOST as the pri of a vcpu seems dangerous. To me,
since csched_schedule() never checks the time accumulated by a vcpu at pri
CSCHED_PRI_TS_BOOST, it is the same as pinning a vcpu to a pcpu. If that
vcpu never makes progress, the system has essentially lost a physical cpu.
Optionally, csched_schedule() should always check the cpu time accumulated
and reduce the priority over time. I can't tell right off if it already does
that. Or something like that :)... my 2 cents.

thanks,
Mukesh

*** : starting 3 star campaign against overuse of macros!
On 02/07/2009 04:19, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
> Looks like c/s 19500 changed csched_schedule():
>
> -    ret.time = MILLISECS(CSCHED_MSECS_PER_TSLICE);
> +    ret.time = (is_idle_vcpu(snext->vcpu) ?
> +                -1 : MILLISECS(CSCHED_MSECS_PER_TSLICE));
>
> The quickest fix for us would be to just back that out.

The wakeup should come via __runq_tickle(), I think. Anyhow, I don't see how
this can be the main underlying bug. What if you were running on a
uniprocessor system -- you'd still be boned then, right?

I've added George Dunlap to the Cc list, by the way. He's doing a bunch of
scheduler work at the moment.

 -- Keir
On Thu, Jul 2, 2009 at 4:19 AM, Mukesh Rathor<mukesh.rathor@oracle.com> wrote:
> [...]
> Setting CSCHED_PRI_TS_BOOST as pri of vcpu seems dangerous. To me,
> since csched_schedule() never checks for time accumulated by a
> vcpu at pri CSCHED_PRI_TS_BOOST, that is same as pinning a vcpu to a
> pcpu. If that vcpu never makes progress, essentially, the system
> has lost a physical cpu. Optionally, csched_schedule() should always
> check for cpu time accumulated and reduce the priority over time.
> I can't tell right off if it already does that. Or something like
> that :)... my 2 cents.

Hmm... what's supposed to happen is that eventually a timer tick will
interrupt vcpu1. If cpu1 is set to be "active", then it will be debited 10ms
worth of credit. Eventually, it will go into OVER and lose BOOST. If it's
"inactive", then when the tick happens it will be set to "active" and be
debited 10ms right away, putting it directly into OVER (and thus also losing
BOOST).

Can you see if the timer ticks are still happening, and perhaps put some
tracing in to verify that what I described above is happening?

 -George
George Dunlap wrote:
> Hmm... what's supposed to happen is that eventually a timer tick will
> interrupt vcpu1. If cpu1 is set to be "active", then it will be
> debited 10ms worth of credit. Eventually, it will go into OVER, and
> lose BOOST. If it's "inactive", then when the tick happens, it will
> be set to "active" and be debited 10ms again, setting it directly into
> OVER (and thus also losing boost).
>
> Can you see if the timer ticks are still happening, and perhaps put
> some tracing in to verify that what I described above is happening?
>
> -George

George,

Is that in csched_acct()? Looks like that's somehow gotten removed. If true,
then maybe that's the fundamental problem to chase.

Here's what the trq looks like when hung, not in any schedule function:

[0]xkdb> dtrq
CPU[00]: NOW:0x00003f2db9af369e
  1: exp=0x00003ee31cb32200  fn:csched_tick        data:0000000000000000
  2: exp=0x00003ee347ece164  fn:time_calibration   data:0000000000000000
  3: exp=0x00003ee69a28f04b  fn:mce_work_fn        data:0000000000000000
  4: exp=0x00003f055895e25f  fn:plt_overflow       data:0000000000000000
  5: exp=0x00003ee353810216  fn:rtc_update_second  data:ffff83007f0226d8

CPU[01]: NOW:0x00003f2db9af369e
  1: exp=0x00003ee30b847988  fn:s_timer_fn         data:0000000000000000
  2: exp=0x00003f1b309ebd45  fn:pmt_timer_callback data:ffff83007f022a68

thanks,
Mukesh
[Oops, adding back in distro list; also adding Kevin Tian and Yu Ke, who
wrote c/s 19460.]

The functionality I was talking about, subtracting credits and clearing
BOOST, happens in csched_vcpu_acct() (which is different from
csched_acct()). csched_vcpu_acct() is called from csched_tick(), which
should still happen every 10ms on every cpu.

The patch I referred to (c/s 19460) disables and re-enables tickers in
xen/arch/x86/acpi/cpu_idle.c:acpi_processor_idle() every time the processor
idles. I can't see anywhere else that tickers are disabled, so it's probably
something not properly re-enabling them again.

Try applying the attached patch to see if that changes anything. (I'm on the
road, so I can't repro the lockup issue.) If that doesn't work, try
disabling C-states and see if that helps. Then at least we'll know where the
problem lies.

 -George

On Thu, Jul 2, 2009 at 10:10 PM, Mukesh Rathor<mukesh.rathor@oracle.com> wrote:
> That seems to only suspend csched_pcpu.ticker, which is csched_tick,
> which only sorts the local runq.
>
> Again, we are concerned about csched_priv.master_ticker that calls
> csched_acct? Correct, so I can trace that?
>
> thanks,
> mukesh
>
> George Dunlap wrote:
>> Ah, I see that there have been some changes to tick stuff with the
>> C-state work (e.g., c/s 19460). It looks like they're supposed to be
>> going still, but perhaps tick_suspend() and tick_resume() aren't
>> being called properly. Let me take a closer look.
>>
>> -George
>> [...]
Ah, I totally missed csched_tick():

    if ( !is_idle_vcpu(current) )
        csched_vcpu_acct(cpu);

Yeah, looks like that's what is going on. I'm still waiting to reproduce. At
first glance, looking at c/s 19460, it seems like the suspend/resume -- well,
at least the resume -- should happen in csched_schedule()...

thanks,
Mukesh

George Dunlap wrote:
> [Oops, adding back in distro list, also adding Kevin Tian and Yu Ke
> who wrote c/s 19460]
>
> The functionality I was talking about, subtracting credits and
> clearing BOOST, happens in csched_vcpu_acct() (which is different from
> csched_acct()). csched_vcpu_acct() is called from csched_tick(), which
> should still happen every 10ms on every cpu.
>
> The patch I referred to (c/s 19460) disables and re-enables tickers in
> xen/arch/x86/acpi/cpu_idle.c:acpi_processor_idle() every time the
> processor idles. I can't see anywhere else that tickers are disabled,
> so it's probably something not properly re-enabling them again.
>
> Try applying the attached patch to see if that changes anything. (I'm
> on the road, so I can't repro the lockup issue.) If that doesn't
> work, try disabling C-states and see if that helps. Then at least
> we'll know where the problem lies.
>
> -George
> [...]
Hi Kevin/Yu:

acpi_processor_idle()
{
    ...
    sched_tick_suspend();
    /*
     * sched_tick_suspend may raise TIMER_SOFTIRQ by __stop_timer,
     * which will break the later assumption of no softirq pending,
     * so add do_softirq
     */
    if ( softirq_pending(smp_processor_id()) )
        do_softirq();                  <==============

    local_irq_disable();
    if ( softirq_pending(smp_processor_id()) )
    {
        local_irq_enable();
        sched_tick_resume();
        cpufreq_dbs_timer_resume();
        return;
    }

Wouldn't the do_softirq() call the scheduler with the tick suspended, and
the scheduler then context switch to another vcpu (with *_BOOST), which
would result in the stuck vcpu I described?

thanks,
Mukesh

Mukesh Rathor wrote:
> Ah, I totally missed csched_tick():
>     if ( !is_idle_vcpu(current) )
>         csched_vcpu_acct(cpu);
>
> Yeah, looks like that's what is going on. I'm still waiting to
> reproduce. At first glance, looking at c/s 19460, it seems like the
> suspend/resume -- well, at least the resume -- should happen in
> csched_schedule()...
> [...]
Oh yes, do_softirq() is overkill here; we actually only want to handle
TIMER_SOFTIRQ. For any other softirq, it should simply return. I will cook a
patch for this. Thanks for identifying this issue.

Best Regards,
Ke

> -----Original Message-----
> From: Mukesh Rathor [mailto:mukesh.rathor@oracle.com]
> Sent: Friday, July 03, 2009 9:19 AM
> To: mukesh.rathor@oracle.com
> Cc: George Dunlap; Tian, Kevin; xen-devel@lists.xensource.com; Yu, Ke;
> Kurt C. Hackel
> Subject: Re: [Xen-devel] dom0 hang
>
> Hi Kevin/Yu:
>
> acpi_processor_idle()
> {
>     sched_tick_suspend();
>     /*
>      * sched_tick_suspend may raise TIMER_SOFTIRQ by __stop_timer,
>      * which will break the later assumption of no softirq pending,
>      * so add do_softirq
>      */
>     if ( softirq_pending(smp_processor_id()) )
>         do_softirq();                  <==============
>
>     local_irq_disable();
>     if ( softirq_pending(smp_processor_id()) )
>     {
>         local_irq_enable();
>         sched_tick_resume();
>         cpufreq_dbs_timer_resume();
>         return;
>     }
>
> Wouldn't the do_softirq() call the scheduler with the tick suspended,
> and the scheduler then context switch to another vcpu (with *_BOOST),
> which would result in the stuck vcpu I described?
>
> thanks,
> Mukesh
> [...]
>>>>>> >>>>>> Here''s what the trq looks like when hung, not in any schedule >>>>>> function: >>>>>> >>>>>> [0]xkdb> dtrq >>>>>> CPU[00]: NOW:0x00003f2db9af369e >>>>>> 1: exp=0x00003ee31cb32200 fn:csched_tick >data:0000000000000000 >>>>>> 2: exp=0x00003ee347ece164 fn:time_calibration >data:0000000000000000 >>>>>> 3: exp=0x00003ee69a28f04b fn:mce_work_fn >data:0000000000000000 >>>>>> 4: exp=0x00003f055895e25f fn:plt_overflow >data:0000000000000000 >>>>>> 5: exp=0x00003ee353810216 fn:rtc_update_second >data:ffff83007f0226d8 >>>>>> >>>>>> CPU[01]: NOW:0x00003f2db9af369e >>>>>> 1: exp=0x00003ee30b847988 fn:s_timer_fn >data:0000000000000000 >>>>>> 2: exp=0x00003f1b309ebd45 fn:pmt_timer_callback >data:ffff83007f022a68 >>>>>> >>>>>> >>>>>> thanks >>>>>> Mukesh >>>>>> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi Mukesh,

Could you please try the following patch, to see if it resolves the issue you observed? Thanks.

Best Regards
Ke

diff -r d461c4d8af17 xen/arch/x86/acpi/cpu_idle.c
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -228,10 +228,10 @@ static void acpi_processor_idle(void)
     /*
      * sched_tick_suspend may raise TIMER_SOFTIRQ by __stop_timer,
      * which will break the later assumption of no sofirq pending,
-     * so add do_softirq
+     * so process the pending timers
      */
-    if ( softirq_pending(smp_processor_id()) )
-        do_softirq();
+
+    process_pending_timers();
 
     /*
      * Interrupts must be disabled during bus mastering calculations and
Well, the problem takes long to reproduce (and only on certain boxes), and even then it may not always happen. So I want to make sure I understand the fix, as it was pretty hard to debug.

While the fix will still allow softirqs to be pending, I guess it's functionally OK, because after disabling irqs it'll check for pending softirqs and just return. I think the comment about expecting no softirq pending should be fixed.

BTW, why can't the tick be suspended when csched_schedule() concludes it's the idle vcpu, before returning? Wouldn't that make it less intrusive?

thanks,
Mukesh


Yu, Ke wrote:
> Hi Mukesh,
>
> Could you please try the following patch, to see if it can resolve the issue you observed? Thanks.
>
> Best Regards
> Ke
>
> diff -r d461c4d8af17 xen/arch/x86/acpi/cpu_idle.c
> --- a/xen/arch/x86/acpi/cpu_idle.c
> +++ b/xen/arch/x86/acpi/cpu_idle.c
> @@ -228,10 +228,10 @@ static void acpi_processor_idle(void)
>      /*
>       * sched_tick_suspend may raise TIMER_SOFTIRQ by __stop_timer,
>       * which will break the later assumption of no sofirq pending,
> -     * so add do_softirq
> +     * so process the pending timers
>       */
> -    if ( softirq_pending(smp_processor_id()) )
> -        do_softirq();
> +
> +    process_pending_timers();
>
>      /*
>       * Interrupts must be disabled during bus mastering calculations and
>-----Original Message-----
>From: Mukesh Rathor [mailto:mukesh.rathor@oracle.com]
>Sent: Tuesday, July 07, 2009 11:47 AM
>To: Yu, Ke
>Cc: George Dunlap; Tian, Kevin; xen-devel@lists.xensource.com; Kurt C. Hackel
>Subject: Re: [Xen-devel] dom0 hang
>
>
>Well, the problem takes long to reproduce (only on certain boxes). And then it
>may not always happen. So I want to make sure I understand the fix, as it
>was pretty hard to debug.

Ok, looking forward to your update.

>
>While the fix will still allow softirqs pending, I guess, functionally
>it's OK because after irq disable, it'll check for pending softirq, and
>just return. I think the comment about expecting no softirq pending
>should be fixed.

Right, the comment will also be fixed.

>
>BTW, why can't the tick be suspended when csched_schedule() concludes
>it's the idle vcpu before returning? Wouldn't that make it less intrusive?

The tick suspend could be put in csched_schedule(), but the suspend/resume logic is still needed in acpi_processor_idle() anyway, because of the other dbs_timer suspend/resume. The intention here is to make acpi_processor_idle() the central place for timers that are stoppable during the idle period. If another stoppable timer appears in the future, it can easily be added to acpi_processor_idle(). So it is cleaner to keep the current logic, and as long as we are careful not to overdo the softirq handling, it does not look so intrusive. What do you think?

Best Regards
Ke
On 07/07/2009 08:14, "Yu, Ke" <ke.yu@intel.com> wrote:

>> BTW, why can't the tick be suspended when csched_schedule() concludes
>> it's the idle vcpu before returning? Wouldn't that make it less intrusive?
>
> The tick suspend could be put in csched_schedule(), but the suspend/resume logic
> is still needed in acpi_processor_idle() anyway, because of the other dbs_timer
> suspend/resume. The intention here is to make acpi_processor_idle() the central
> place for timers that are stoppable during the idle period. If another stoppable
> timer appears in the future, it can easily be added to acpi_processor_idle().
> So it is cleaner to keep the current logic, and as long as we are careful not to
> overdo the softirq handling, it does not look so intrusive. What do you think?

I think the approach is fine. I have also already applied your patch, since it is obviously a good bug fix even if it doesn't fix Mukesh's bug. I fixed the comment at the same time, and backported it for Xen 3.4.1.

 -- Keir
Keir Fraser wrote:
> On 07/07/2009 08:14, "Yu, Ke" <ke.yu@intel.com> wrote:
>
>>> BTW, why can't the tick be suspended when csched_schedule() concludes
>>> it's the idle vcpu before returning? Wouldn't that make it less intrusive?
>> The tick suspend could be put in csched_schedule(), but the suspend/resume logic
>> is still needed in acpi_processor_idle() anyway, because of the other dbs_timer
>> suspend/resume. The intention here is to make acpi_processor_idle() the central
>> place for timers that are stoppable during the idle period. If another stoppable
>> timer appears in the future, it can easily be added to acpi_processor_idle().
>> So it is cleaner to keep the current logic, and as long as we are careful not to
>> overdo the softirq handling, it does not look so intrusive. What do you think?
>
> I think the approach is fine. I have also already applied your patch, since it is
> obviously a good bug fix even if it doesn't fix Mukesh's bug. I fixed the comment
> at the same time, and backported it for Xen 3.4.1.
>
> -- Keir

It fixes my bug.

Thanks,
Mukesh

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel