Jaeyong Yoo
2013-May-20 00:41 UTC
Bug report and patch about IRQ freezing after gic_restore_state
Hello xen, I''m running xen on Arndale board and if I run both iperf and du command at Dom0, one of IRQ (either SATA or network) suddenly stop occuring anymore. After some investigation, I found out that when context switching at Xen, IRQs in LR (about to be delivered to Doms) could be lost and never occur anymore. Here goes function call sequence that this problem occurs: (in context switching) - schedule_tail - ctxt_switch_from - local_irq_enable - // after this part, some IRQ can occur and could be directly written to LR - ctxt_switch_to - ... (some more functions) - // before the above IRQ is delivered to Dom (and maintenance IRQ not called), // gic_restore_state can be called - gic_restore_state /* when restoring gic state, the above IRQ * (written to LR) is overwritten * to the previous values, and somehow, * the corresponding IRQ never occur again */ I made the following patch (i.e., enable local irq after gic_restore_state) for preventing the above problem. Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> --- xen/arch/arm/domain.c | 4 ++-- xen/arch/arm/gic.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index f71b582..2c3b132 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -141,6 +141,8 @@ static void ctxt_switch_to(struct vcpu *n) /* VGIC */ gic_restore_state(n); + local_irq_enable(); + /* XXX VFP */ /* XXX MPU */ @@ -215,8 +217,6 @@ static void schedule_tail(struct vcpu *prev) { ctxt_switch_from(prev); - local_irq_enable(); - /* TODO update_runstate_area(current); */ diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index d4f0a43..8186ad8 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -81,11 +81,11 @@ void gic_restore_state(struct vcpu *v) if ( is_idle_vcpu(v) ) return; - spin_lock_irq(&gic.lock); + spin_lock(&gic.lock); this_cpu(lr_mask) = v->arch.lr_mask; for ( i=0; i<nr_lrs; i++) GICH[GICH_LR + i] = v->arch.gic_lr[i]; - spin_unlock_irq(&gic.lock); + spin_unlock(&gic.lock); GICH[GICH_APR] = v->arch.gic_apr; GICH[GICH_HCR] = GICH_HCR_EN; isb(); best, Jaeyong
Julien Grall
2013-May-20 13:11 UTC
Re: Bug report and patch about IRQ freezing after gic_restore_state
On 05/20/2013 01:41 AM, Jaeyong Yoo wrote: Hello,> I''m running xen on Arndale board and if I run both iperf and du command at Dom0, > one of IRQ (either SATA or network) suddenly stop occuring anymore. > After some investigation, I found out that when context switching at Xen, > IRQs in LR (about to be delivered to Doms) could be lost and never occur anymore. > Here goes function call sequence that this problem occurs: > (in context switching) > - schedule_tail > - ctxt_switch_from > - local_irq_enable > - // after this part, some IRQ can occur and could be directly written to LR > - ctxt_switch_to > - ... (some more functions) > - // before the above IRQ is delivered to Dom (and maintenance IRQ not called), > // gic_restore_state can be called > - gic_restore_state /* when restoring gic state, the above IRQ > * (written to LR) is overwritten > * to the previous values, and somehow, > * the corresponding IRQ never occur again */ > > I made the following patch (i.e., enable local irq after gic_restore_state) > for preventing the above problem.Thanks for the patch, I was looking with a similar error on the Arndale Board for a couple of day.> Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > --- > xen/arch/arm/domain.c | 4 ++-- > xen/arch/arm/gic.c | 4 ++-- > 2 files changed, 4 insertions(+), 4 deletions(-) > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index f71b582..2c3b132 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -141,6 +141,8 @@ static void ctxt_switch_to(struct vcpu *n) > /* VGIC */ > gic_restore_state(n); > + local_irq_enable(); > +Could you move the local_irq_enable right after ctxt_switch_to?> /* XXX VFP */ > /* XXX MPU */ > @@ -215,8 +217,6 @@ static void schedule_tail(struct vcpu *prev) > { > ctxt_switch_from(prev); > - local_irq_enable(); > - > /* TODO > update_runstate_area(current); > */ > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > index d4f0a43..8186ad8 100644 > --- a/xen/arch/arm/gic.c > +++ b/xen/arch/arm/gic.c > @@ -81,11 +81,11 @@ void gic_restore_state(struct vcpu *v) > if ( is_idle_vcpu(v) ) > return; > - spin_lock_irq(&gic.lock); > + spin_lock(&gic.lock); > this_cpu(lr_mask) = v->arch.lr_mask; > for ( i=0; i<nr_lrs; i++) > GICH[GICH_LR + i] = v->arch.gic_lr[i]; > - spin_unlock_irq(&gic.lock); > + spin_unlock(&gic.lock);As the IRQ is disabled and the GICH registers can only be modified by the current physical CPU, I think you can remove the spin_{,un}lock and replace it by a dsb.> GICH[GICH_APR] = v->arch.gic_apr; > GICH[GICH_HCR] = GICH_HCR_EN; > isb();Cheers, -- Julien
유재용
2013-May-21 11:13 UTC
Re: Bug report and patch about IRQ freezing after gic_restore_state
> > Signed-off-by: Jaeyong Yoo > > --- > > xen/arch/arm/domain.c | 4 ++-- > > xen/arch/arm/gic.c | 4 ++-- > > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > > index f71b582..2c3b132 100644 > > --- a/xen/arch/arm/domain.c > > +++ b/xen/arch/arm/domain.c > > @@ -141,6 +141,8 @@ static void ctxt_switch_to(struct vcpu *n) > > /* VGIC */ > > gic_restore_state(n); > > + local_irq_enable(); > > + > > Could you move the local_irq_enable right after ctxt_switch_to?Of course. Just one small concern: Would it be more efficient (i.e., less scheduling latency) if we enable irq right after gic_restore state similar to lock-breaking effect?> > > /* XXX VFP */ > > /* XXX MPU */ > > @@ -215,8 +217,6 @@ static void schedule_tail(struct vcpu *prev) > > { > > ctxt_switch_from(prev); > > - local_irq_enable(); > > - > > /* TODO > > update_runstate_area(current); > > */ > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > > index d4f0a43..8186ad8 100644 > > --- a/xen/arch/arm/gic.c > > +++ b/xen/arch/arm/gic.c > > @@ -81,11 +81,11 @@ void gic_restore_state(struct vcpu *v) > > if ( is_idle_vcpu(v) ) > > return; > > - spin_lock_irq(&gic.lock); > > + spin_lock(&gic.lock); > > this_cpu(lr_mask) = v->arch.lr_mask; > > for ( i=0; i> GICH[GICH_LR + i] = v->arch.gic_lr[i]; > > - spin_unlock_irq(&gic.lock); > > + spin_unlock(&gic.lock); > > As the IRQ is disabled and the GICH registers can only be modified by > the current physical CPU, I think you can remove the spin_{,un}lock and > replace it by a dsb.dsb sounds ok.> > > GICH[GICH_APR] = v->arch.gic_apr; > > GICH[GICH_HCR] = GICH_HCR_EN; > > isb(); >Best, Jaeyong _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Stefano Stabellini
2013-May-21 12:00 UTC
Re: Bug report and patch about IRQ freezing after gic_restore_state
On Mon, 20 May 2013, Julien Grall wrote:> On 05/20/2013 01:41 AM, Jaeyong Yoo wrote: > > Hello, > > > I''m running xen on Arndale board and if I run both iperf and du command at Dom0, > > one of IRQ (either SATA or network) suddenly stop occuring anymore. > > After some investigation, I found out that when context switching at Xen, > > IRQs in LR (about to be delivered to Doms) could be lost and never occur anymore. > > Here goes function call sequence that this problem occurs: > > (in context switching) > > - schedule_tail > > - ctxt_switch_from > > - local_irq_enable > > - // after this part, some IRQ can occur and could be directly written to LR > > - ctxt_switch_to > > - ... (some more functions) > > - // before the above IRQ is delivered to Dom (and maintenance IRQ not called), > > // gic_restore_state can be called > > - gic_restore_state /* when restoring gic state, the above IRQ > > * (written to LR) is overwritten > > * to the previous values, and somehow, > > * the corresponding IRQ never occur again */ > > > > I made the following patch (i.e., enable local irq after gic_restore_state) > > for preventing the above problem. > > Thanks for the patch, I was looking with a similar error on the Arndale > Board for a couple of day.Indeed, thanks for the analysis of the bug and the patch! It is a particularly difficult bug to track down because it can only happen if an irq arrives after ctxt_switch_from and before ctxt_switch_to, and the irq is for the next vcpu to be scheduled on the pcpu (otherwise the v == current check at the beginning of gic_set_guest_irq would catch that). Rather than extending the check in gic_set_guest_irq, I think it is wise to run ctxt_switch_to with interrupts disabled.> > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > > --- > > xen/arch/arm/domain.c | 4 ++-- > > xen/arch/arm/gic.c | 4 ++-- > > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > > index f71b582..2c3b132 100644 > > --- a/xen/arch/arm/domain.c > > +++ b/xen/arch/arm/domain.c > > @@ -141,6 +141,8 @@ static void ctxt_switch_to(struct vcpu *n) > > /* VGIC */ > > gic_restore_state(n); > > + local_irq_enable(); > > + > > Could you move the local_irq_enable right after ctxt_switch_to?Right, good idea.> > /* XXX VFP */ > > /* XXX MPU */ > > @@ -215,8 +217,6 @@ static void schedule_tail(struct vcpu *prev) > > { > > ctxt_switch_from(prev); > > - local_irq_enable(); > > - > > /* TODO > > update_runstate_area(current); > > */ > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > > index d4f0a43..8186ad8 100644 > > --- a/xen/arch/arm/gic.c > > +++ b/xen/arch/arm/gic.c > > @@ -81,11 +81,11 @@ void gic_restore_state(struct vcpu *v) > > if ( is_idle_vcpu(v) ) > > return; > > - spin_lock_irq(&gic.lock); > > + spin_lock(&gic.lock); > > this_cpu(lr_mask) = v->arch.lr_mask; > > for ( i=0; i<nr_lrs; i++) > > GICH[GICH_LR + i] = v->arch.gic_lr[i]; > > - spin_unlock_irq(&gic.lock); > > + spin_unlock(&gic.lock); > > As the IRQ is disabled and the GICH registers can only be modified by > the current physical CPU, I think you can remove the spin_{,un}lock and > replace it by a dsb.Yes, we can remove the spin_lock but I don''t think we need a dsb there. See the presence of an isb() two lines below.
Ian Campbell
2013-May-21 13:00 UTC
Re: Bug report and patch about IRQ freezing after gic_restore_state
On Tue, 2013-05-21 at 11:13 +0000, 유재용 wrote:> > > Signed-off-by: Jaeyong Yoo > > > --- > > > xen/arch/arm/domain.c | 4 ++-- > > > xen/arch/arm/gic.c | 4 ++-- > > > 2 files changed, 4 insertions(+), 4 deletions(-) > > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > > > index f71b582..2c3b132 100644 > > > --- a/xen/arch/arm/domain.c > > > +++ b/xen/arch/arm/domain.c > > > @@ -141,6 +141,8 @@ static void ctxt_switch_to(struct vcpu *n) > > > /* VGIC */ > > > gic_restore_state(n); > > > + local_irq_enable(); > > > + > > > > Could you move the local_irq_enable right after ctxt_switch_to? > > Of course. > Just one small concern: > Would it be more efficient (i.e., less scheduling latency) if we > enable irq right after gic_restore state similar to lock-breaking > effect?Perhaps, but I think at this stage we should aim for obvious correctness. Optimising things for IRQ/scheduler latency is a complete project in its own right. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Jaeyong Yoo
2013-May-22 02:34 UTC
Re: Bug report and patch about IRQ freezing after gic_restore_state
> On Tue, 2013-05-21 at 11:13 +0000, 유재용 wrote: > > > > Signed-off-by: Jaeyong Yoo > > > > --- > > > > xen/arch/arm/domain.c | 4 ++-- > > > > xen/arch/arm/gic.c | 4 ++-- > > > > 2 files changed, 4 insertions(+), 4 deletions(-) > > > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > > > > index f71b582..2c3b132 100644 > > > > --- a/xen/arch/arm/domain.c > > > > +++ b/xen/arch/arm/domain.c > > > > @@ -141,6 +141,8 @@ static void ctxt_switch_to(struct vcpu *n) > > > > /* VGIC */ > > > > gic_restore_state(n); > > > > + local_irq_enable(); > > > > + > > > > > > Could you move the local_irq_enable right after ctxt_switch_to? > > > > Of course. > > Just one small concern: > > Would it be more efficient (i.e., less scheduling latency) if we > > enable irq right after gic_restore state similar to lock-breaking > > effect? > > Perhaps, but I think at this stage we should aim for obvious > correctness. > > Optimising things for IRQ/scheduler latency is a complete project in its > own right.I got it. Here goes the updated patch: Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> ___ xen/arch/arm/domain.c | 4 ++-- xen/arch/arm/gic.c | 2 -- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 9ca44ea..ee12b5f 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -226,10 +226,10 @@ static void schedule_tail(struct vcpu *prev) { ctxt_switch_from(prev); - local_irq_enable(); - ctxt_switch_to(current); + local_irq_enable(); + if ( prev != current ) update_runstate_area(current); } diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 30bf8d1..d9940ea 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -93,11 +93,9 @@ void gic_restore_state(struct vcpu *v) if ( is_idle_vcpu(v) ) return; - spin_lock_irq(&gic.lock); this_cpu(lr_mask) = v->arch.lr_mask; for ( i=0; i<nr_lrs; i++) GICH[GICH_LR + i] = v->arch.gic_lr[i]; - spin_unlock_irq(&gic.lock); GICH[GICH_APR] = v->arch.gic_apr; GICH[GICH_HCR] = GICH_HCR_EN; isb(); best, Jaeyong _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Stefano Stabellini
2013-May-22 16:55 UTC
Re: Bug report and patch about IRQ freezing after gic_restore_state
On Wed, 22 May 2013, Jaeyong Yoo wrote:> I got it. > Here goes the updated patch: > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>> xen/arch/arm/domain.c | 4 ++-- > xen/arch/arm/gic.c | 2 -- > 2 files changed, 2 insertions(+), 4 deletions(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index 9ca44ea..ee12b5f 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -226,10 +226,10 @@ static void schedule_tail(struct vcpu *prev) > { > ctxt_switch_from(prev); > > - local_irq_enable(); > - > ctxt_switch_to(current); > > + local_irq_enable(); > + > if ( prev != current ) > update_runstate_area(current); > } > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > index 30bf8d1..d9940ea 100644 > --- a/xen/arch/arm/gic.c > +++ b/xen/arch/arm/gic.c > @@ -93,11 +93,9 @@ void gic_restore_state(struct vcpu *v) > if ( is_idle_vcpu(v) ) > return; > > - spin_lock_irq(&gic.lock); > this_cpu(lr_mask) = v->arch.lr_mask; > for ( i=0; i<nr_lrs; i++) > GICH[GICH_LR + i] = v->arch.gic_lr[i]; > - spin_unlock_irq(&gic.lock); > GICH[GICH_APR] = v->arch.gic_apr; > GICH[GICH_HCR] = GICH_HCR_EN; > isb(); > > best, > Jaeyong > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >
Ian Campbell
2013-May-23 13:24 UTC
Re: Bug report and patch about IRQ freezing after gic_restore_state
On Wed, 2013-05-22 at 17:55 +0100, Stefano Stabellini wrote:> On Wed, 22 May 2013, Jaeyong Yoo wrote: > > I got it. > > Here goes the updated patch:Please can you include the (updated as necessary) changelog when reposting variants on patches. I have written one for you this time.> > > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > > Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>I applied this but I had to workaround the fact that the patch had DOS line endings. Jaeyong, please can you try and fix this for next time. The wiki page http://wiki.xen.org/wiki/Submitting_Xen_Patches might help with some advice for tooling, in particular it has advice on using git as well as a link to http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/email-clients.txt;hb=HEAD which has lots of useful advice on convincing your MUA not to mangle things... What I have committed is below, please check that it is correct. Cheers, Ian. commit 9f5179a4ecafd9a15e0a066e0f935ded681bf997 Author: Jaeyong Yoo <jaeyong.yoo@samsung.com> Date: Wed May 22 02:34:18 2013 +0000 xen/arm: Disable interrupts for the entire duration of the context switch Not just while saving state. Otherwise there is a race between interrupts arriving and updating the LR state and gic_restore_state overwriting them with the saved state. With this change we no longer need to disable interrupts in gic_restore_state. Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> [ ijc -- rewrote commit message ] diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 9ca44ea..ee12b5f 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -226,10 +226,10 @@ static void schedule_tail(struct vcpu *prev) { ctxt_switch_from(prev); - local_irq_enable(); - ctxt_switch_to(current); + local_irq_enable(); + if ( prev != current ) update_runstate_area(current); } diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 30bf8d1..d9940ea 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -93,11 +93,9 @@ void gic_restore_state(struct vcpu *v) if ( is_idle_vcpu(v) ) return; - spin_lock_irq(&gic.lock); this_cpu(lr_mask) = v->arch.lr_mask; for ( i=0; i<nr_lrs; i++) GICH[GICH_LR + i] = v->arch.gic_lr[i]; - spin_unlock_irq(&gic.lock); GICH[GICH_APR] = v->arch.gic_apr; GICH[GICH_HCR] = GICH_HCR_EN; isb();
Jaeyong Yoo
2013-May-23 23:57 UTC
Re: Bug report and patch about IRQ freezing after gic_restore_state
Samsung Enterprise Portal mySingle >On Wed, 2013-05-22 at 17:55 +0100, Stefano Stabellini wrote: >> On Wed, 22 May 2013, Jaeyong Yoo wrote: >> > I got it. >> > Here goes the updated patch: > >Please can you include the (updated as necessary) changelog when >reposting variants on patches. I have written one for you this time. > Oops, sorry. >> > >> > Signed-off-by: Jaeyong Yoo >> >> Acked-by: Stefano Stabellini > >I applied this but I had to workaround the fact that the patch had DOS >line endings. > >Jaeyong, please can you try and fix this for next time. > >The wiki page http://wiki.xen.org/wiki/Submitting_Xen_Patches might help >with some advice for tooling, in particular it has advice on using git >as well as a link to >http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/email-clients.txt;hb=HEAD which has lots of useful advice on convincing your MUA not to mangle things... Thanks for the info. I will follow it for next time. > >What I have committed is below, please check that it is correct. It is correct. Thanks again. best, Jaeyong _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel