On 1/27/2018 3:23 AM, Don Lewis wrote:> > I just ran into this for this first time with samba46. I kicked of a > ports build this evening before leaving for several hours. When I > returned, samba46 had failed with a build runaway. I just tried again > and I see python stuck in the usem state. This is what I see with > procstat -k:Hmmm, is this indicative of a processor bug or a FreeBSD bug or its indeterminate at this point ?> > PID TID COMM TDNAME KSTACK > 90692 100801 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common > 90692 100824 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common > 90692 100857 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common > 90692 100956 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common > 90692 100995 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common > 90692 101483 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common > 90692 101538 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common > 90692 101549 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common > 90692 101570 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common > 90692 101572 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common > 90692 101583 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common > 90692 101588 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common > 90692 101593 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common > 90692 101610 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common > 90692 101629 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_lock_umutex __umtx_op_wait_umutex amd64_syscall fast_syscall_common > 90692 101666 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common > 90692 102114 python2.7 - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common > > and procstat -t: > > PID TID COMM TDNAME CPU PRI STATE WCHAN > 90692 100801 python2.7 - -1 124 sleep usem > 90692 100824 python2.7 - -1 124 sleep usem > 90692 100857 python2.7 - -1 124 sleep usem > 90692 100956 python2.7 - -1 125 sleep usem > 90692 100995 python2.7 - -1 124 sleep usem > 90692 101483 python2.7 - -1 124 sleep usem > 90692 101538 python2.7 - -1 125 sleep usem > 90692 101549 python2.7 - -1 124 sleep usem > 90692 101570 python2.7 - -1 124 sleep usem > 90692 101572 python2.7 - -1 124 sleep usem > 90692 101583 python2.7 - -1 125 sleep usem > 90692 101588 python2.7 - -1 124 sleep usem > 90692 101593 python2.7 - -1 123 sleep usem > 90692 101610 python2.7 - -1 124 sleep usem > 90692 101629 python2.7 - -1 125 sleep umtxn > 90692 101666 python2.7 - -1 124 sleep usem > 90692 102114 python2.7 - -1 152 sleep usem > > The machine isn't totally idle. The last pid value in top increases by > about 40 every two seconds. Looks like it might be poudriere polling > something ... > > >-- ------------------- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, mike at sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/
On 27 Jan, Mike Tancsa wrote:> On 1/27/2018 3:23 AM, Don Lewis wrote: >> >> I just ran into this for this first time with samba46. I kicked of a >> ports build this evening before leaving for several hours. When I >> returned, samba46 had failed with a build runaway. I just tried again >> and I see python stuck in the usem state. This is what I see with >> procstat -k: > > Hmmm, is this indicative of a processor bug or a FreeBSD bug or its > indeterminate at this point ?My suspicion is a FreeBSD bug, probably a locking / race issue. I know that we've had to make some tweeks to our code for AMD CPUs, like this: ------------------------------------------------------------------------ r321608 | kib | 2017-07-27 01:37:07 -0700 (Thu, 27 Jul 2017) | 9 lines Use MFENCE to serialize RDTSC on non-Intel CPUs. Kernel already used the stronger barrier instruction for AMDs, correct the userspace fast gettimeofday() implementation as well. I did go back and look at the build runaways that I've occasionally seen on my AMD FX-8320E package builder. I haven't seen the python issue there, but have seen gmake get stuck in a sleeping state with a bunch of zombie offspring.