I wonder if anybody uses kdb_stop_cpus with non-default value. If, yes, I am very interested to learn about your usecase for it. I think that the default kdb behavior is the correct one, so it doesn't make sense to have a knob to turn on incorrect behavior. But I may be missing something obvious. The comment in the code doesn't really satisfy me: /* * Flag indicating whether or not to IPI the other CPUs to stop them on * entering the debugger. Sometimes, this will result in a deadlock as * stop_cpus() waits for the other cpus to stop, so we allow it to be * disabled. In order to maximize the chances of success, use a hard * stop for that. */ The hard stop should be sufficiently mighty. Yes, I am aware of supposedly extremely rare situations where a deadlock could happen even when using hard stop. But I'd rather fix that than have this switch. Oh, the commit message (from 2004) explains it:> Add a new sysctl, debug.kdb.stop_cpus, which controls whether or not we > attempt to IPI other cpus when entering the debugger in order to stop > them while in the debugger. The default remains to issue the stop; > however, that can result in a hang if another cpu has interrupts disabled > and is spinning, since the IPI won't be received and the KDB will wait > indefinitely. We probably need to add a timeout, but this is a useful > stopgap in the mean time.But that was before we started using hard stop in this context (in 2009). -- Andriy Gapon
On 06/03/11 10:13, Andriy Gapon wrote:> > I wonder if anybody uses kdb_stop_cpus with non-default value. > If, yes, I am very interested to learn about your usecase for it. > > I think that the default kdb behavior is the correct one, so it doesn't make sense > to have a knob to turn on incorrect behavior. > But I may be missing something obvious. > > The comment in the code doesn't really satisfy me: > /* > * Flag indicating whether or not to IPI the other CPUs to stop them on > * entering the debugger. Sometimes, this will result in a deadlock as > * stop_cpus() waits for the other cpus to stop, so we allow it to be > * disabled. In order to maximize the chances of success, use a hard > * stop for that. > */ > > The hard stop should be sufficiently mighty. > Yes, I am aware of supposedly extremely rare situations where a deadlock could > happen even when using hard stop. But I'd rather fix that than have this switch. > > Oh, the commit message (from 2004) explains it: >> Add a new sysctl, debug.kdb.stop_cpus, which controls whether or not we >> attempt to IPI other cpus when entering the debugger in order to stop >> them while in the debugger. The default remains to issue the stop; >> however, that can result in a hang if another cpu has interrupts disabled >> and is spinning, since the IPI won't be received and the KDB will wait >> indefinitely. We probably need to add a timeout, but this is a useful >> stopgap in the mean time. > > But that was before we started using hard stop in this context (in 2009).Some non-x86 platforms (e.g. PPC) don't support real NMIs, and so this still applies. -Nathan
On 3 Jun 2011, at 16:13, Andriy Gapon wrote:> I wonder if anybody uses kdb_stop_cpus with non-default value. > If, yes, I am very interested to learn about your usecase for it.The issue that prompted the sysctl was non-NMI IPIs being used to enter the debugger or reboot following a core hanging with interrupts disabled. With the switch to NMI IPIs in some of those circumstances, life is better -- at least, on hardware that supports non-maskable IPIs. I seem to recall sparc64 doesn't, however? Not sure about MIPS, etc. Attilio has since significantly improved our shutdown behaviour -- initially, the switch to NMI IPIs broke other things (because certain IPIs then improperly preempted threads holding spinlocks), but that pretty much all seems worked out now. Robert> > I think that the default kdb behavior is the correct one, so it doesn't make sense > to have a knob to turn on incorrect behavior. > But I may be missing something obvious. > > The comment in the code doesn't really satisfy me: > /* > * Flag indicating whether or not to IPI the other CPUs to stop them on > * entering the debugger. Sometimes, this will result in a deadlock as > * stop_cpus() waits for the other cpus to stop, so we allow it to be > * disabled. In order to maximize the chances of success, use a hard > * stop for that. > */ > > The hard stop should be sufficiently mighty. > Yes, I am aware of supposedly extremely rare situations where a deadlock could > happen even when using hard stop. But I'd rather fix that than have this switch. > > Oh, the commit message (from 2004) explains it: >> Add a new sysctl, debug.kdb.stop_cpus, which controls whether or not we >> attempt to IPI other cpus when entering the debugger in order to stop >> them while in the debugger. The default remains to issue the stop; >> however, that can result in a hang if another cpu has interrupts disabled >> and is spinning, since the IPI won't be received and the KDB will wait >> indefinitely. We probably need to add a timeout, but this is a useful >> stopgap in the mean time. > > But that was before we started using hard stop in this context (in 2009). > > -- > Andriy Gapon
on 03/06/2011 18:13 Andriy Gapon said the following:> > I wonder if anybody uses kdb_stop_cpus with non-default value.I would like to go ahead and remove kdb_stop_cpus tunable/sysctl if nobody objects.> If, yes, I am very interested to learn about your usecase for it. > > I think that the default kdb behavior is the correct one, so it doesn't make sense > to have a knob to turn on incorrect behavior. > But I may be missing something obvious. > > The comment in the code doesn't really satisfy me: > /* > * Flag indicating whether or not to IPI the other CPUs to stop them on > * entering the debugger. Sometimes, this will result in a deadlock as > * stop_cpus() waits for the other cpus to stop, so we allow it to be > * disabled. In order to maximize the chances of success, use a hard > * stop for that. > */ > > The hard stop should be sufficiently mighty. > Yes, I am aware of supposedly extremely rare situations where a deadlock could > happen even when using hard stop. But I'd rather fix that than have this switch. > > Oh, the commit message (from 2004) explains it: >> Add a new sysctl, debug.kdb.stop_cpus, which controls whether or not we >> attempt to IPI other cpus when entering the debugger in order to stop >> them while in the debugger. The default remains to issue the stop; >> however, that can result in a hang if another cpu has interrupts disabled >> and is spinning, since the IPI won't be received and the KDB will wait >> indefinitely. We probably need to add a timeout, but this is a useful >> stopgap in the mean time. > > But that was before we started using hard stop in this context (in 2009). >-- Andriy Gapon