On 1/22/2018 5:13 PM, Don Lewis wrote:
> On 22 Jan, Mike Tancsa wrote:
>> On 1/22/2018 1:41 PM, Peter Moody wrote:
>>> fwiw, I upgraded to 11-STABLE (11.1-STABLE #6 r328223), applied the
>>> hw.lower_amd64_sharedpage setting to my loader.conf and got a crash
>>> last night following the familiar high load -> idle. this was with SMT
>>> re-enabled. no crashdump, so it was the hard crash that I've been
>>> getting.
>>
>> hw.lower_amd64_sharedpage=1 is the default on AMD boxes no ? I didnt
>> need to set mine to 1
>>
>>>
>>> shrug, I'm at a loss here.
>>
>> I am trying an RMA with AMD.
>
> Something else that you might want to try is 12.0-CURRENT. There might
> be some changes in HEAD that need to be merged back to 11.1-STABLE.

Temp works as expected now. However, a (similar?) hang building Samba47.

ctrl+T shows

load: 1.98 cmd: python2.7 53438 [usem] 54.70r 14.98u 6.04s 0% 230992k
make: Working in: /usr/ports/net/samba47
load: 0.34 cmd: python2.7 53438 [usem] 168.48r 14.98u 6.04s 0% 230992k
make: Working in: /usr/ports/net/samba47
load: 0.31 cmd: python2.7 53438 [usem] 174.12r 14.98u 6.04s 0% 230992k
make: Working in: /usr/ports/net/samba47

Going to try the RMA route and see if the replacement CPU avoids this
problem.

# uname -a
FreeBSD amdtestr12.sentex.ca 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r328282: Tue Jan 23 11:34:18 EST 2018 mdtancsa at amdtestr12.sentex.ca:/usr/obj/usr/src/amd64.amd64/sys/server amd64

dev.amdtemp.0.core0.sensor0: 52.6C
dev.amdtemp.0.sensor_offset: 0
dev.amdtemp.0.%parent: hostb0
dev.amdtemp.0.%pnpinfo:
dev.amdtemp.0.%location:
dev.amdtemp.0.%driver: amdtemp
dev.amdtemp.0.%desc: AMD CPU On-Die Thermal Sensors
dev.amdtemp.%parent:
dev.cpu.11.temperature: 52.6C
dev.cpu.10.temperature: 52.6C
dev.cpu.9.temperature: 52.6C
dev.cpu.8.temperature: 52.6C
dev.cpu.7.temperature: 52.6C
dev.cpu.6.temperature: 52.6C
dev.cpu.5.temperature: 52.6C
dev.cpu.4.temperature: 52.6C
dev.cpu.3.temperature: 52.6C
dev.cpu.2.temperature: 52.6C
dev.cpu.1.temperature: 52.6C
dev.cpu.0.temperature: 52.6C

-- 
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike at sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/

On 23/01/2018 19:15, Mike Tancsa wrote:
> On 1/22/2018 5:13 PM, Don Lewis wrote:
>> On 22 Jan, Mike Tancsa wrote:
>>> On 1/22/2018 1:41 PM, Peter Moody wrote:
>>>> fwiw, I upgraded to 11-STABLE (11.1-STABLE #6 r328223), applied the
>>>> hw.lower_amd64_sharedpage setting to my loader.conf and got a crash
>>>> last night following the familiar high load -> idle. this was with SMT
>>>> re-enabled. no crashdump, so it was the hard crash that I've been
>>>> getting.
>>>
>>> hw.lower_amd64_sharedpage=1 is the default on AMD boxes no ? I didnt
>>> need to set mine to 1
>>>
>>>>
>>>> shrug, I'm at a loss here.
>>>
>>> I am trying an RMA with AMD.
>>
>> Something else that you might want to try is 12.0-CURRENT. There might
>> be some changes in HEAD that need to be merged back to 11.1-STABLE.
>
>
> Temp works as expected now. However, a (similar?) hang building Samba47.
>
> ctrl+T shows

If that works, then maybe you can get procstat -kk -a or a crash dump.
Maybe this is not a hardware problem at all (or maybe it is).

> load: 1.98 cmd: python2.7 53438 [usem] 54.70r 14.98u 6.04s 0% 230992k
> make: Working in: /usr/ports/net/samba47
> load: 0.34 cmd: python2.7 53438 [usem] 168.48r 14.98u 6.04s 0% 230992k
> make: Working in: /usr/ports/net/samba47
> load: 0.31 cmd: python2.7 53438 [usem] 174.12r 14.98u 6.04s 0% 230992k
> make: Working in: /usr/ports/net/samba47

-- 
Andriy Gapon

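The three ctrl+T status lines quoted above already hint that the process is blocked rather than busy: the real (wall-clock) time keeps growing while the user and system CPU times stay at 14.98u and 6.04s. A minimal sketch of that check in Python follows. The regular expression is an assumption fitted to the lines in this thread, not a general parser for SIGINFO output, and the names parse_status and looks_blocked are made up for illustration.

```python
import re

# Status lines copied verbatim from the ctrl+T output quoted in the thread.
STATUS_LINES = [
    "load: 1.98 cmd: python2.7 53438 [usem] 54.70r 14.98u 6.04s 0% 230992k",
    "load: 0.34 cmd: python2.7 53438 [usem] 168.48r 14.98u 6.04s 0% 230992k",
    "load: 0.31 cmd: python2.7 53438 [usem] 174.12r 14.98u 6.04s 0% 230992k",
]

# Assumed layout: load, command, pid, [wait channel], real/user/sys times.
PATTERN = re.compile(
    r"load:\s+(?P<load>[\d.]+)\s+cmd:\s+(?P<cmd>\S+)\s+(?P<pid>\d+)\s+"
    r"\[(?P<wchan>[^\]]+)\]\s+"
    r"(?P<real>[\d.]+)r\s+(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s"
)


def parse_status(line):
    """Turn one status line into a dict; raise if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("unrecognized status line: %r" % line)
    fields = match.groupdict()
    return {
        "cmd": fields["cmd"],
        "pid": int(fields["pid"]),
        "wchan": fields["wchan"],
        "real": float(fields["real"]),
        "user": float(fields["user"]),
        "sys": float(fields["sys"]),
    }


def looks_blocked(samples):
    """True if wall-clock time advanced while CPU time did not change."""
    first, last = samples[0], samples[-1]
    cpu_unchanged = (first["user"], first["sys"]) == (last["user"], last["sys"])
    return last["real"] > first["real"] and cpu_unchanged


if __name__ == "__main__":
    samples = [parse_status(line) for line in STATUS_LINES]
    elapsed = samples[-1]["real"] - samples[0]["real"]
    print("pid %d in [%s], %.2fs elapsed, blocked=%s"
          % (samples[0]["pid"], samples[0]["wchan"], elapsed,
             looks_blocked(samples)))
```

This only confirms that the process accumulated no CPU time between samples. It says nothing about why the threads are waiting, which is what the procstat output requested above would help with.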
On 23 Jan, Mike Tancsa wrote:
> On 1/22/2018 5:13 PM, Don Lewis wrote:
>> On 22 Jan, Mike Tancsa wrote:
>>> On 1/22/2018 1:41 PM, Peter Moody wrote:
>>>> fwiw, I upgraded to 11-STABLE (11.1-STABLE #6 r328223), applied the
>>>> hw.lower_amd64_sharedpage setting to my loader.conf and got a crash
>>>> last night following the familiar high load -> idle. this was with SMT
>>>> re-enabled. no crashdump, so it was the hard crash that I've been
>>>> getting.
>>>
>>> hw.lower_amd64_sharedpage=1 is the default on AMD boxes no ? I didnt
>>> need to set mine to 1
>>>
>>>>
>>>> shrug, I'm at a loss here.
>>>
>>> I am trying an RMA with AMD.
>>
>> Something else that you might want to try is 12.0-CURRENT. There might
>> be some changes in HEAD that need to be merged back to 11.1-STABLE.
>
>
> Temp works as expected now. However, a (similar?) hang building Samba47.
>
> ctrl+T shows
>
>
> load: 1.98 cmd: python2.7 53438 [usem] 54.70r 14.98u 6.04s 0% 230992k
> make: Working in: /usr/ports/net/samba47
> load: 0.34 cmd: python2.7 53438 [usem] 168.48r 14.98u 6.04s 0% 230992k
> make: Working in: /usr/ports/net/samba47
> load: 0.31 cmd: python2.7 53438 [usem] 174.12r 14.98u 6.04s 0% 230992k
> make: Working in: /usr/ports/net/samba47

I just ran into this for the first time with samba46. I kicked off a
ports build this evening before leaving for several hours. When I
returned, samba46 had failed with a build runaway. I just tried again
and I see python stuck in the usem state.

This is what I see with procstat -k:

  PID    TID COMM       TDNAME  KSTACK
90692 100801 python2.7  -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
90692 100824 python2.7  -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
90692 100857 python2.7  -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
90692 100956 python2.7  -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
90692 100995 python2.7  -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
90692 101483 python2.7  -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
90692 101538 python2.7  -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
90692 101549 python2.7  -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
90692 101570 python2.7  -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
90692 101572 python2.7  -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
90692 101583 python2.7  -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
90692 101588 python2.7  -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
90692 101593 python2.7  -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
90692 101610 python2.7  -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
90692 101629 python2.7  -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_lock_umutex __umtx_op_wait_umutex amd64_syscall fast_syscall_common
90692 101666 python2.7  -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common
90692 102114 python2.7  -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_sem2_wait __umtx_op_sem2_wait amd64_syscall fast_syscall_common

and procstat -t:

  PID    TID COMM       TDNAME  CPU  PRI STATE  WCHAN
90692 100801 python2.7  -        -1  124 sleep  usem
90692 100824 python2.7  -        -1  124 sleep  usem
90692 100857 python2.7  -        -1  124 sleep  usem
90692 100956 python2.7  -        -1  125 sleep  usem
90692 100995 python2.7  -        -1  124 sleep  usem
90692 101483 python2.7  -        -1  124 sleep  usem
90692 101538 python2.7  -        -1  125 sleep  usem
90692 101549 python2.7  -        -1  124 sleep  usem
90692 101570 python2.7  -        -1  124 sleep  usem
90692 101572 python2.7  -        -1  124 sleep  usem
90692 101583 python2.7  -        -1  125 sleep  usem
90692 101588 python2.7  -        -1  124 sleep  usem
90692 101593 python2.7  -        -1  123 sleep  usem
90692 101610 python2.7  -        -1  124 sleep  usem
90692 101629 python2.7  -        -1  125 sleep  umtxn
90692 101666 python2.7  -        -1  124 sleep  usem
90692 102114 python2.7  -        -1  152 sleep  usem

The machine isn't totally idle. The last pid value in top increases by
about 40 every two seconds. Looks like it might be poudriere polling
something ...
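
With this many threads it is easy to miss that one of them (TID 101629) is sleeping on umtxn in do_lock_umutex rather than on usem like the rest. As a rough sketch, the procstat -t listing above could be tallied by wait channel with a few lines of Python. The helper names are hypothetical, and the parser assumes exactly eight whitespace-separated columns, which holds here only because every TDNAME is "-". A thread name containing spaces would break it.

```python
from collections import Counter

# procstat -t rows copied from the message above.
PROCSTAT_T = """\
PID TID COMM TDNAME CPU PRI STATE WCHAN
90692 100801 python2.7 - -1 124 sleep usem
90692 100824 python2.7 - -1 124 sleep usem
90692 100857 python2.7 - -1 124 sleep usem
90692 100956 python2.7 - -1 125 sleep usem
90692 100995 python2.7 - -1 124 sleep usem
90692 101483 python2.7 - -1 124 sleep usem
90692 101538 python2.7 - -1 125 sleep usem
90692 101549 python2.7 - -1 124 sleep usem
90692 101570 python2.7 - -1 124 sleep usem
90692 101572 python2.7 - -1 124 sleep usem
90692 101583 python2.7 - -1 125 sleep usem
90692 101588 python2.7 - -1 124 sleep usem
90692 101593 python2.7 - -1 123 sleep usem
90692 101610 python2.7 - -1 124 sleep usem
90692 101629 python2.7 - -1 125 sleep umtxn
90692 101666 python2.7 - -1 124 sleep usem
90692 102114 python2.7 - -1 152 sleep usem
"""


def parse_threads(text):
    """Yield (tid, wchan) pairs, skipping the header and blank lines."""
    for line in text.splitlines():
        fields = line.split()
        # Assumption: PID TID COMM TDNAME CPU PRI STATE WCHAN, no spaces
        # inside any column.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        yield int(fields[1]), fields[7]


def tally_wchans(text):
    """Count how many threads sleep on each wait channel."""
    return Counter(wchan for _tid, wchan in parse_threads(text))


def threads_waiting_on(text, wchan):
    """Return the TIDs sleeping on the given wait channel."""
    return [tid for tid, w in parse_threads(text) if w == wchan]


if __name__ == "__main__":
    for wchan, count in tally_wchans(PROCSTAT_T).most_common():
        print("%-6s %d" % (wchan, count))
    print("on umtxn:", threads_waiting_on(PROCSTAT_T, "umtxn"))
```

The tally only summarizes what the listing already shows. Whether the lone umtxn waiter matters to the hang is not something this output can establish on its own.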