Juergen Gross
2009-Mar-06 08:46 UTC
[Xen-devel] [Patch] avoid deadlock during console output
Hi,

during my tests for cpupools I've found an issue in console output.
Sometimes the hypervisor hangs due to a deadlock if something is printed to
the console via printk while a per-cpu scheduler lock is held by the
printing processor. Inside printk an event is sent to dom0, which in some
cases leads to a call of vcpu_wake; vcpu_wake then needs the scheduler lock
the printing processor already holds, resulting in the deadlock. The same
problem occurs when BUG is called while the lock is held.

This issue is easily reproducible on a system with multiple cpus under low
load by calling

  xm debug-keys r

to dump the scheduler's run queues. On my 4-core machine I need only about 5
calls to stop the machine.

The attached patch solves the problem by avoiding sending the event in
critical paths.

Juergen

-- 
Juergen Gross                 Principal Developer
IP SW OS6                     Telephone: +49 (0) 89 636 47950
Fujitsu Siemens Computers     e-mail: juergen.gross@fujitsu-siemens.com
Otto-Hahn-Ring 6              Internet: www.fujitsu-siemens.com
D-81739 Muenchen              Company details: www.fujitsu-siemens.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
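A minimal user-space sketch of the lock cycle described above, assuming (as the report implies) that the wakeup triggered by printk needs the same per-cpu scheduler lock that the printing CPU already holds. All names (sched_lock, vcpu_wake_sketch, printk_sketch, dump_runq_sketch) are hypothetical stand-ins, not Xen code, and a POSIX error-checking mutex is swapped in for the hypervisor spinlock so that the self-deadlock shows up as EDEADLK instead of a hang.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for one CPU's scheduler lock. An error-checking
 * mutex reports a re-lock by its owner as EDEADLK; a hypervisor spinlock
 * would simply spin forever at that point. */
static pthread_mutex_t sched_lock;

static void sched_lock_init(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&sched_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Models vcpu_wake(): waking dom0's vCPU needs the scheduler lock. */
static int vcpu_wake_sketch(void)
{
    int rc = pthread_mutex_lock(&sched_lock);

    if (rc == 0)
        pthread_mutex_unlock(&sched_lock);
    return rc;
}

/* Models printk() sending the console event to dom0 synchronously. */
static int printk_sketch(void)
{
    return vcpu_wake_sketch();
}

/* Models the run-queue dump: printing while the scheduler lock is held.
 * Returns the result of the nested lock attempt. */
static int dump_runq_sketch(void)
{
    int rc;

    sched_lock_init();
    pthread_mutex_lock(&sched_lock);
    rc = printk_sketch();              /* re-enters the lock we hold */
    pthread_mutex_unlock(&sched_lock);
    pthread_mutex_destroy(&sched_lock);
    return rc;
}

/* Usage: assert(dump_runq_sketch() == EDEADLK); */
```

In this model dump_runq_sketch() returns EDEADLK: the nested lock attempt in the wakeup path is the point where the real hypervisor would spin forever.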
Keir Fraser
2009-Mar-06 09:11 UTC
Re: [Xen-devel] [Patch] avoid deadlock during console output
On 06/03/2009 08:46, "Juergen Gross" <juergen.gross@fujitsu-siemens.com> wrote:

> to dump the schedulers run-queues. On my 4-core machine I need only about 5
> calls to stop the machine.
>
> The attached patch solves the problem by avoiding sending the event in
> critical paths.

Ugly. Instead we can defer the dom0 notification to a tasklet. I'll make a
patch for that myself.

Thanks,
Keir
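A sketch of the deferral Keir proposes, under the same kind of hypothetical user-space model: the console path only marks a work item pending, and the dom0 wakeup runs later from a softirq-like hook once the scheduler lock has been dropped. tasklet_sketch, printk_deferred_sketch, do_tasklets_sketch and the other names are illustrative assumptions, not Xen's tasklet API; an error-checking POSIX mutex again stands in for the spinlock.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical scheduler lock; error-checking, so a nested lock by the
 * owner fails with EDEADLK instead of hanging. */
static pthread_mutex_t sched_lock;
static int dom0_wakeups;

/* Minimal stand-in for a tasklet: a pending flag plus a callback. */
struct tasklet_sketch {
    bool is_scheduled;
    int (*func)(void);
};

/* Deferred work: wake dom0, which needs the scheduler lock. */
static int notify_dom0_sketch(void)
{
    int rc = pthread_mutex_lock(&sched_lock);

    if (rc == 0) {
        dom0_wakeups++;
        pthread_mutex_unlock(&sched_lock);
    }
    return rc;
}

static struct tasklet_sketch notify_tasklet = { false, notify_dom0_sketch };

/* Console path: only mark the notification pending; no scheduler calls. */
static void printk_deferred_sketch(void)
{
    notify_tasklet.is_scheduled = true;
}

/* Runs later (softirq-like context) with no locks held. */
static int do_tasklets_sketch(void)
{
    if (!notify_tasklet.is_scheduled)
        return 0;
    notify_tasklet.is_scheduled = false;
    return notify_tasklet.func();
}

/* Run-queue dump with deferred notification; returns the wakeup's result. */
static int dump_runq_deferred_sketch(void)
{
    pthread_mutexattr_t attr;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&sched_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&sched_lock);
    printk_deferred_sketch();          /* safe: nothing re-takes the lock */
    pthread_mutex_unlock(&sched_lock);

    rc = do_tasklets_sketch();         /* the lock is free by now */
    pthread_mutex_destroy(&sched_lock);
    return rc;
}
```

Here the wakeup succeeds (return value 0, one dom0 wakeup) because it no longer runs inside the critical section; the cost is that dom0 learns about new output slightly later.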
Juergen Gross
2009-Mar-09 06:18 UTC
Re: [Xen-devel] [Patch] avoid deadlock during console output
Keir Fraser wrote:
> On 06/03/2009 08:46, "Juergen Gross" <juergen.gross@fujitsu-siemens.com>
> wrote:
>
>> to dump the schedulers run-queues. On my 4-core machine I need only about 5
>> calls to stop the machine.
>>
>> The attached patch solves the problem by avoiding sending the event in
>> critical paths.
>
> Ugly. Instead we can defer the dom0 notification to a tasklet. I'll make a
> patch for that myself.

Hmm, do you think your patch is okay? tasklet_schedule takes another lock
(tasklet_lock) and uses BUG_ON while holding it. If that BUG_ON ever fired,
the printk inside BUG would call tasklet_schedule again and deadlock on
tasklet_lock. I would suggest modifying tasklet_schedule:

        if ( !t->is_scheduled && !t->is_running )
        {
            if (!list_empty(&t->list))
            {
                spin_unlock_irqrestore(&tasklet_lock, flags);
                BUG();
            }
            list_add_tail(&t->list, &tasklet_list);
        }

Juergen
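A hypothetical single-threaded model of the proposed error path. A boolean stands in for tasklet_lock, the `queued` field for !list_empty(&t->list), and bug_sketch() models BUG() printing and thereby scheduling the console tasklet again; the recursion is capped so the demo terminates. None of these names are Xen's.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified tasklet; 'queued' stands in for !list_empty(&t->list). */
struct tasklet {
    bool queued, is_scheduled, is_running;
};

static bool tasklet_lock_held;
static int nested_lock_attempts;  /* each one would hang a real spinlock */
static int bug_calls;
static struct tasklet console_tasklet;

static void tasklet_schedule_sketch(struct tasklet *t);

static void lock_sketch(void)
{
    if (tasklet_lock_held)
        nested_lock_attempts++;
    tasklet_lock_held = true;
}

static void unlock_sketch(void)
{
    tasklet_lock_held = false;
}

/* BUG() prints, and printing schedules the console tasklet again.
 * Recursion is capped so the demo terminates; a real BUG() never returns. */
static void bug_sketch(void)
{
    if (++bug_calls > 4)
        return;
    tasklet_schedule_sketch(&console_tasklet);
}

static void tasklet_schedule_sketch(struct tasklet *t)
{
    lock_sketch();
    if (!t->is_scheduled && !t->is_running) {
        if (t->queued) {
            unlock_sketch();   /* proposed: drop the lock before BUG() */
            bug_sketch();
            return;
        }
        t->queued = true;
    }
    t->is_scheduled = true;
    unlock_sketch();
}

/* Inconsistent state: queued but neither scheduled nor running. */
static int demo_unlock_before_bug(void)
{
    console_tasklet.queued = true;
    tasklet_schedule_sketch(&console_tasklet);
    return bug_calls;
}
```

With the lock dropped first there is no nested lock attempt, but the tasklet stays in the inconsistent state, so every BUG() re-enters the schedule function: demo_unlock_before_bug() returns 5 (the cap) rather than 1. That is the recursion raised in both follow-ups.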
Juergen Gross
2009-Mar-09 07:45 UTC
Re: [Xen-devel] [Patch] avoid deadlock during console output
Juergen Gross wrote:
> Keir Fraser wrote:
>> On 06/03/2009 08:46, "Juergen Gross" <juergen.gross@fujitsu-siemens.com>
>> wrote:
>>
>>> to dump the schedulers run-queues. On my 4-core machine I need only about 5
>>> calls to stop the machine.
>>>
>>> The attached patch solves the problem by avoiding sending the event in
>>> critical paths.
>> Ugly. Instead we can defer the dom0 notification to a tasklet. I'll make a
>> patch for that myself.
>
> Hmm, do you think your patch is okay?
> tasklet_schedule is taking another lock and uses BUG_ON then...
> I would suggest to modify tasklet_schedule:
>
>         if ( !t->is_scheduled && !t->is_running )
>         {
>             if (!list_empty(&t->list))
>             {
                  t->is_dead = 1;
>                 spin_unlock_irqrestore(&tasklet_lock, flags);
>                 BUG();
>             }
>             list_add_tail(&t->list, &tasklet_list);
>         }

With t->is_dead set, recursion should be avoided, of course!

Juergen
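The same kind of hypothetical model with the is_dead line added. It assumes that tasklet_schedule() ignores a tasklet whose is_dead flag is set, which is what would make the extra line stop the recursion; the names are illustrative, not Xen's.

```c
#include <assert.h>
#include <stdbool.h>

struct tasklet {
    bool queued, is_scheduled, is_running, is_dead;
};

static bool tasklet_lock_held;
static int nested_lock_attempts;  /* each one would hang a real spinlock */
static int bug_calls;
static struct tasklet console_tasklet;

static void tasklet_schedule_sketch(struct tasklet *t);

/* BUG() prints, which schedules the console tasklet again. */
static void bug_sketch(void)
{
    if (++bug_calls > 4)   /* safety cap; a real BUG() never returns */
        return;
    tasklet_schedule_sketch(&console_tasklet);
}

static void tasklet_schedule_sketch(struct tasklet *t)
{
    if (tasklet_lock_held)
        nested_lock_attempts++;
    tasklet_lock_held = true;

    if (!t->is_dead) {          /* assumed guard: dead tasklets are ignored */
        if (!t->is_scheduled && !t->is_running) {
            if (t->queued) {
                t->is_dead = true;          /* stop BUG()'s printk re-entering */
                tasklet_lock_held = false;  /* drop the lock before BUG() */
                bug_sketch();
                return;
            }
            t->queued = true;
        }
        t->is_scheduled = true;
    }
    tasklet_lock_held = false;
}

/* Inconsistent state: queued but neither scheduled nor running. */
static int demo_is_dead(void)
{
    console_tasklet.queued = true;
    tasklet_schedule_sketch(&console_tasklet);
    return bug_calls;
}
```

Here the nested schedule call from bug_sketch() sees is_dead and returns at once, so demo_is_dead() reports a single BUG() and no nested lock attempt.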
Keir Fraser
2009-Mar-09 08:28 UTC
Re: [Xen-devel] [Patch] avoid deadlock during console output
On 09/03/2009 06:18, "Juergen Gross" <juergen.gross@fujitsu-siemens.com> wrote:

> Hmm, do you think your patch is okay?
> tasklet_schedule is taking another lock and uses BUG_ON then...
> I would suggest to modify tasklet_schedule:

I can live with the error path of BUG_ON() not working in this one case. I
don't think releasing the lock suffices anyway, as we'll just end up in a
recursive loop until the hypervisor stack overflows. The BUG_ON() here is
more for informative code annotation than because it's at all likely to
fire.

It perhaps makes sense to disable the tasklet_schedule() based on a flag set
in console_force_unlock(). That would at least allow the NMI-based watchdog
to print if we did ever hit the BUG_ON() deadlock (or any other crash while
the tasklet lock is held).

 -- Keir
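A sketch of Keir's alternative, assuming a hypothetical flag (console_tasklets_off) that console_force_unlock_sketch() sets and that the console output path checks before calling into the tasklet code. The model simulates a CPU that crashed while holding the tasklet lock; all names are illustrative, and whatever else the real console_force_unlock() does is omitted.

```c
#include <assert.h>
#include <stdbool.h>

static bool tasklet_lock_held;        /* e.g. held by a CPU that crashed */
static bool console_tasklets_off;     /* hypothetical flag */
static int nested_lock_attempts;      /* each would hang a real spinlock */
static int lines_printed;

static void tasklet_schedule_sketch(void)
{
    if (tasklet_lock_held) {
        nested_lock_attempts++;       /* real code would spin here forever */
        return;
    }
    /* ... queue the dom0 notification ... */
}

/* Crash paths call this before printing. Only the flag is modelled. */
static void console_force_unlock_sketch(void)
{
    console_tasklets_off = true;
}

static void printk_sketch(void)
{
    lines_printed++;                  /* stands in for the serial write */
    if (!console_tasklets_off)
        tasklet_schedule_sketch();    /* normal path: notify dom0 later */
}

/* NMI-watchdog-style report while the tasklet lock is stuck. */
static int watchdog_report_sketch(void)
{
    tasklet_lock_held = true;
    console_force_unlock_sketch();
    printk_sketch();
    printk_sketch();
    return lines_printed;
}
```

Once the flag is set, output in this sketch goes straight to the console and never touches the stuck lock, which is what would let a watchdog report such a hang; the trade-off is that dom0 is not notified of those final lines.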