On Wed, May 10, 2017 at 03:29:39PM -0700, Olivier Cinquin
wrote:
> Hi,
> I'm getting the following kernel panics on recent 11-STABLE (r317883):
>
> spin lock 0xffffffff81df43d0 (smp rendezvous) held by 0xfffff8019c7a7000 (tid 100845) too long
> timeout stopping cpus
> panic: spin lock held too long
> cpuid = 12
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe3e64caf2a0
> vpanic() at vpanic+0x186/frame 0xfffffe3e64caf320
> panic() at panic+0x43/frame 0xfffffe3e64caf380
> _mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0x311/frame 0xfffffe3e64caf3f0
> smp_targeted_tlb_shootdown() at smp_targeted_tlb_shootdown+0x101/frame 0xfffffe3e64caf470
> smp_masked_invlpg_range() at smp_masked_invlpg_range+0x50/frame 0xfffffe3e64caf4a0
> pmap_invalidate_range() at pmap_invalidate_range+0x196/frame 0xfffffe3e64caf4e0
> vm_thread_stack_dispose() at vm_thread_stack_dispose+0x2f/frame 0xfffffe3e64caf530
> thread_free() at thread_free+0x39/frame 0xfffffe3e64caf550
> thread_reap() at thread_reap+0x10e/frame 0xfffffe3e64caf570
> proc_reap() at proc_reap+0x6bd/frame 0xfffffe3e64caf5b0
> proc_to_reap() at proc_to_reap+0x48c/frame 0xfffffe3e64caf600
> kern_wait6() at kern_wait6+0x49e/frame 0xfffffe3e64caf6b0
> sys_wait4() at sys_wait4+0x78/frame 0xfffffe3e64caf8a0
> amd64_syscall() at amd64_syscall+0x6c4/frame 0xfffffe3e64cafa30
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe3e64cafa30
>
>
> The panics occur sporadically and seemingly randomly. They occur on
> multiple machines with the same configuration, and on older revisions of
> 11-STABLE as well as r317883 (I cannot easily bisect this down to a
> specific commit given that it can take days or sometimes weeks for the
> panic to occur). Note that my kernel is compiled with "options IPSEC", but
> that does not seem to be relevant.
>
> My understanding of the overall problem is that an IPI is sent to all cores
> by smp_rendezvous_cpus, but that some of the cores do not respond to it (or
> at least do not respond to it correctly and in time). More specifically, I
> can find 61 threads that seem to be blocked in cpustop_handler on an atomic
> operation to indicate that they have stopped in response to the IPI. This
> operation is CPU_SET_ATOMIC(cpu, &stopped_cpus); CPU_SET_ATOMIC boils down
> to a call to "atomic_set_long" (not sure where that is itself defined for
> amd64). Either there is a line numbering problem, or this suggests that
> perhaps there is a deadlock at this step (unless there's some sort of
> livelock and CPU_SET_ATOMIC is the place where the threads spend most of
> their time).
>
> Looking at the thread backtraces as a whole, I can see the one that
> triggered the panic, the 61 that are in cpustop_handler, and a lot of
> sleeping threads. I guess each core should have an active thread, and this
> is an architecture with 64 cores, so that leaves 64 - (1 + 61) = 2 cores
> that are unaccounted for. Interestingly, for 2 different panics for which I
> have a vmcore file these numbers are the same. For these two panics, the
> IDs of the cores that are neither in cpustop_handler nor calling the smp
> rendez vous are not the same (#18 and #19 in one instance, #44 and #45 in
> the other instance) but in both instances the IDs are consecutive; perhaps
> that's relevant.
>
> I've uploaded a full set of backtraces at
> https://gist.github.com/cinquin/d63cdf9de01b0b8033c47128868f2d38 and a
> dmesg at https://gist.github.com/cinquin/17c0cf6ac7832fd1b683488d08d3e69b
> (in short, the machine is a 4 processor Opteron 6274 system). I'm pasting
> below an example backtrace of the threads in cpustop_handler.
>
> Any suggestions as to what the problem might be and how to solve it? I've
> saved the core and can provide further information.
>
> Thanks
> Olivier Cinquin
>
> A total of 61 threads are along the lines of
>
> #0 cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1275
> #1 0xffffffff8105a6f5 in ipi_nmi_handler () at /usr/src/sys/x86/x86/mp_x86.c:1231
> #2 0xffffffff80ef740a in trap (frame=0xfffffe3e687f7f30) at /usr/src/sys/amd64/amd64/trap.c:185
> #3 0xffffffff80edbd03 in nmi_calltrap () at /usr/src/sys/amd64/amd64/exception.S:510
> ...
> (kgdb) frame 0
> #0 cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1275
> 1275 CPU_SET_ATOMIC(cpu, &stopped_cpus);
> (kgdb) list
> 1270 cpu = PCPU_GET(cpuid);
> 1271
> 1272 savectx(&stoppcbs[cpu]);
> 1273
> 1274 /* Indicate that we are stopped */
> 1275 CPU_SET_ATOMIC(cpu, &stopped_cpus);
> 1276
> 1277 /* Wait for restart */
> 1278 while (!CPU_ISSET(cpu, &started_cpus))
> 1279 ia32_pause();
> (kgdb) p stopped_cpus
> $1 = {__bits = 0xffffffff81df4368}
> (kgdb) p started_cpus
> $2 = {__bits = 0xffffffff81df4418}
>
It would be interesting to see only the backtraces for the threads which
are on CPUs, i.e. all the threads in cpustop_handler() and the others.
From your description, it looks like there are two problems, both of
which might have the same cause.  The initial thread broadcasts the TLB
shootdown IPI by calling smp_targeted_tlb_shootdown().  The function
locks the smp_ipi_mtx spinlock.  After the IPI is sent, the function
loops waiting for the other cores to acknowledge the IPI.  This is the

	while (*p_cpudone != generation)
		ia32_pause();

loop, which reads from the other CPUs' PCPU areas.
Then another thread decides that it should execute a shootdown as well,
and tries to lock the same smp_ipi_mtx spinlock.  It times out because
the initial thread still holds the lock, and panics.  panic(9) sends a
broadcast NMI to all cores except itself to stop the machine.  This is
what you see in the backtraces of all but one thread.

If you do not see the NMI handler entry for the initial thread, that
indicates the second problem; either way, showing just the backtraces
for each CPU would clear up the possible confusion.
The question of why the shootdown times out is open either way.  You
should examine pc_smp_tlb_done in each PCPU area to see whether the CPUs
properly reacted to the IPI and it is the initiator which does not see
the update, or whether it is some CPU which did not receive the IPI.
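In kgdb that comparison could look like the following; I am assuming the
variable names used by smp_targeted_tlb_shootdown() in your sources, and
the generation counter is static, so kgdb may need the right file scope:

	(kgdb) p smp_tlb_generation
	(kgdb) p cpuid_to_pcpu[18]->pc_smp_tlb_done
	(kgdb) p cpuid_to_pcpu[19]->pc_smp_tlb_done

If every pc_smp_tlb_done already equals the generation, the initiator is
failing to observe the stores; if the slots of the unaccounted-for CPUs
lag behind, those CPUs never handled the IPI.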
This might be a CPU erratum.  I remember much earlier AMD CPUs having an
issue with tight read loops; it could be that this is it.  Did you try
updating the BIOS?  Also install sysutils/devcpu-data and update the
microcode on the processors online.  I skimmed over the AMD document
48063 Rev. 3.24, "Revision Guide for AMD Family 15h Models 00h-0Fh
Processors"; nothing outstanding came out except possibly Erratum 619:
	619 Non-Posted Reads May Block Write Dependent on Probe Responses

	The northbridge may stall indefinitely on non-posted reads when a
	posted write becomes dependent on probe responses.
You might also try to change the loop I cited above to some variant like

	while (*p_cpudone != generation) {
		ia32_pause();
		mfence();
	}

and report whether any of the magic helped.