On 1/28/2018 7:41 PM, Don Lewis wrote:
>
> My suspicion is a FreeBSD bug, probably a locking / race issue.  I know
> that we've had to make some tweaks to our code for AMD CPUs, like this:

OK, I got back the CPUs from AMD (fast turnaround!) And sadly, I am still
able to hang the compile in about the same place. However, if I set

hw.lower_amd64_sharedpage=0

it seems to hang in a different way. CTRL+t shows

load: 0.43  cmd: python2.7 15736 [umtxn] 165.00r 14.46u 6.65s 0% 233600k
make[1]: Working in: /usr/ports/net/samba47
make: Working in: /usr/ports/net/samba47

# procstat -t 15736
  PID    TID COMM       TDNAME  CPU  PRI STATE  WCHAN
15736 100855 python2.7  -        -1  152 sleep  usem
15736 100956 python2.7  -        -1  124 sleep  umtxn
15736 100957 python2.7  -        -1  126 sleep  umtxn
15736 100958 python2.7  -        -1  124 sleep  umtxn
15736 100959 python2.7  -        -1  127 sleep  umtxn
15736 100960 python2.7  -        -1  126 sleep  umtxn
15736 100961 python2.7  -        -1  126 sleep  umtxn
15736 100962 python2.7  -        -1  126 sleep  umtxn
15736 100963 python2.7  -        -1  126 sleep  umtxn
15736 100964 python2.7  -        -1  127 sleep  umtxn
15736 100965 python2.7  -        -1  126 sleep  umtxn
15736 100966 python2.7  -        -1  126 sleep  umtxn
15736 100967 python2.7  -        -1  126 sleep  umtxn

# procstat -kk 15736
  PID    TID COMM       TDNAME  KSTACK
15736 100855 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100956 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100957 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100958 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100959 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100960 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100961 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100962 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100963 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100964 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100965 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100966 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100967 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc

If I kill the make, reboot and just type make, it completes after the
reboot. If after the reboot, I do an rm -R work, it will hang again.
With the default of

hw.lower_amd64_sharedpage: 1

post reboot, CTRL+T shows

load: 2.73  cmd: python2.7 15703 [usem] 40.92r 12.34u 3.45s 0% 233640k
make[1]: Working in: /usr/ports/net/samba47
make: Working in: /usr/ports/net/samba47

root@amdtestr12:/home/mdtancsa # procstat -kk 15703
  PID    TID COMM       TDNAME  KSTACK
15703 100824 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100956 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100957 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100958 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100959 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100960 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100961 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100962 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100963 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100964 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100965 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100966 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100967 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc

root@amdtestr12:/home/mdtancsa # procstat -t 15703
  PID    TID COMM       TDNAME  CPU  PRI STATE  WCHAN
15703 100824 python2.7  -        -1  152 sleep  usem
15703 100956 python2.7  -        -1  125 sleep  usem
15703 100957 python2.7  -        -1  127 sleep  usem
15703 100958 python2.7  -        -1  125 sleep  usem
15703 100959 python2.7  -        -1  125 sleep  usem
15703 100960 python2.7  -        -1  126 sleep  usem
15703 100961 python2.7  -        -1  126 sleep  usem
15703 100962 python2.7  -        -1  126 sleep  usem
15703 100963 python2.7  -        -1  126 sleep  usem
15703 100964 python2.7  -        -1  126 sleep  usem
15703 100965 python2.7  -        -1  126 sleep  umtxn
15703 100966 python2.7  -        -1  126 sleep  usem
15703 100967 python2.7  -        -1  125 sleep  usem
root@amdtestr12:/home/mdtancsa #

	---Mike

>
> ------------------------------------------------------------------------
> r321608 | kib | 2017-07-27 01:37:07 -0700 (Thu, 27 Jul 2017) | 9 lines
>
> Use MFENCE to serialize RDTSC on non-Intel CPUs.
>
> Kernel already used the stronger barrier instruction for AMDs, correct
> the userspace fast gettimeofday() implementation as well.
>
>
>
> I did go back and look at the build runaways that I've occasionally seen
> on my AMD FX-8320E package builder.  I haven't seen the python issue
> there, but have seen gmake get stuck in a sleeping state with a bunch of
> zombie offspring.
>

-- 
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike at sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada
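
For anyone comparing the two hangs above: with hw.lower_amd64_sharedpage=0 the listing shows 1 thread in usem and 12 in umtxn, while with the default of 1 it shows 12 in usem and 1 in umtxn. A rough sketch of how that tally could be scripted from saved `procstat -t` output follows. The function name tally_wchan and the SAMPLE text are made up for illustration, and the parsing assumes WCHAN is the last whitespace-separated column, as it is in the output quoted in this thread.

```python
from collections import Counter


def tally_wchan(procstat_text):
    """Count threads per wait channel in saved `procstat -t` output.

    Assumes each data row starts with a numeric PID and that WCHAN is
    the last column, which matches the listings in this thread.
    """
    counts = Counter()
    for line in procstat_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and shell prompts.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        counts[fields[-1]] += 1
    return counts


# Hypothetical three-row excerpt in the same shape as the output above.
SAMPLE = """\
  PID    TID COMM       TDNAME  CPU  PRI STATE  WCHAN
15736 100855 python2.7  -        -1  152 sleep  usem
15736 100956 python2.7  -        -1  124 sleep  umtxn
15736 100957 python2.7  -        -1  126 sleep  umtxn
"""
```

Calling tally_wchan(SAMPLE) on the excerpt gives one usem thread and two umtxn threads. This only summarizes where threads are sleeping; it says nothing about which lock or semaphore they are waiting on.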
On 1/30/2018 2:51 PM, Mike Tancsa wrote:
>
> And sadly, I am still able to hang the compile in about the same place.
> However, if I set
>
> hw.lower_amd64_sharedpage=0
>
> it seems to hang in a different way.
> [...]

OK, here is a sort of workaround. If I keep the box a little busier, I
can avoid whatever deadlock is going on. In another console I have

cat /dev/urandom | sha256

running while the build runs, and I can compile net/samba47 from scratch
without the compile hanging. The compile worked 4 out of 4 times.

This problem also happens on HEAD from today. Should I start a new
thread on freebsd-current? Or just file a bug report?

	---Mike

-- 
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike at sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada
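
For repeated test runs, the "keep the box busy" workaround could be wrapped in a small script so the load starts and stops with the build. The sketch below is only an illustration: run_with_background_load is a hypothetical helper, and the default load is a Python busy loop standing in for `cat /dev/urandom | sha256`, which may not perturb scheduling in the same way as the real pipeline. It does not fix the underlying hang.

```python
import subprocess
import sys

# Stand-in for `cat /dev/urandom | sha256`: a busy loop that keeps one
# CPU occupied. This substitution is an assumption, not what was run above.
BUSY_LOOP = (sys.executable, "-c", "while True: pass")


def run_with_background_load(build_cmd, load_cmd=BUSY_LOOP):
    """Run build_cmd while load_cmd runs alongside; return build's exit code."""
    load = subprocess.Popen(
        list(load_cmd),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Blocks until the build finishes (or hangs, as before).
        return subprocess.call(list(build_cmd))
    finally:
        # Always stop the load generator, even if the build was interrupted.
        load.terminate()
        load.wait()
```

Something like run_with_background_load(["make", "-C", "/usr/ports/net/samba47"]) would then cover the case described here. To get closer to the original workaround, load_cmd could be ["sh", "-c", "cat /dev/urandom | sha256"], though terminate() would then only signal the shell, so the pipeline may need to be cleaned up separately.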