Mike Pumford michaelp at bsquare.com wrote on
Wed Jan 24 12:03:04 UTC 2018 :

> I've run into this on modern Intel systems as well. The RAM is sold as
> 2400 but that's actually an overclock profile. If I actually enabled it
> (despite both board and RAM being qualified for that) the system ends up
> locking up or crashing as soon as you stress it. Go back to the standard
> DDR profile advertised by the RAM and it is totally stable.

The reported failures are during idle time, as I understand it. Things are
working when the CPUs are kept busy, from what I've read in the
various notes. The hang-ups are during idle times.

"the system ends up locking up or crashing as soon as you stress it"
does not sound like a matching context.

That a slower RAM speed might help idle behave correctly is interesting,
given the Zen and Ryzen dependence on RAM speed for the speed of the
internal interconnect fabric's operation.

I'll note that, if one goes through the referenced Linux exchanges about
this, Ryzen Threadripper examples are also reported to have the problem.

===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)
I think this is perhaps a good time to summarize, as a few issues seem to
be going on:

a) Fragile BIOS settings. There seem to be a number of issues around
RAM speeds and disabled C-states that impact stability. Specifically,
lowering the default frequency from 2400 to 2133 seems to help some
users with crashes / lockups under heavy loads.

b) CPUs manufactured prior to week 25 (some say week 33?) have a
hardware defect that manifests itself as segfaults in heavy compiles. I
was able to confirm this on 1 of the CPUs I had, using a Linux setup. It
seems that to confirm this, you need to physically look at the CPU for
the manufacturing date :( Not sure how to trigger it on FreeBSD reliably,
but there is a github project I used to verify on Linux
(https://github.com/suaefar/ryzen-test)

c) The idle lockup bug. This *seems* to be confirmed on Linux as well:
http://blog.programster.org/ubuntu-16-04-compile-custom-kernel-for-ryzen
and
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1690085

d) Compile failures of some ports. For myself and one other user,
compiling net/samba47 reliably hangs in roughly the same place. It's not
clear if this is related to any of the above bugs or not.

Right now I have RMA'd my 3 CPUs back to AMD. Hopefully, I will get
replacements in a week and can get back to testing c) and d).

	---Mike

-- 
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike at sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/
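Item b) above describes a defect that shows up as segfaults under heavy parallel compiles. A minimal sketch of that kind of check, in Python, might look like the following. It does not reproduce the ryzen-test scripts; the function names `run_once` and `stress`, the worker count, and the round count are all hypothetical, and the command to run is whatever build you choose. The one technique it relies on is that, on POSIX systems, `subprocess` reports a child killed by a signal as a negative return code.

```python
import signal
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_once(cmd):
    """Run cmd once; return None on success, else a short failure description."""
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if proc.returncode < 0:
        # On POSIX a negative return code means the child died from a signal,
        # e.g. -11 for SIGSEGV.
        return "killed by " + signal.Signals(-proc.returncode).name
    if proc.returncode != 0:
        return "exit status %d" % proc.returncode
    return None


def stress(cmd, workers=4, rounds=10):
    """Run cmd `rounds` times, up to `workers` at once; return the failures."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_once, [cmd] * rounds))
    return [r for r in results if r is not None]
```

For example, `stress(["make", "-C", "/some/source/tree"], workers=8, rounds=50)` (path hypothetical) would return an empty list if every build succeeded. Note that a compiler segfault inside a larger build usually surfaces as a non-zero exit status of the top-level command rather than as a signal on it, so both kinds of entry are worth inspecting.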
Mike Tancsa
2018-Feb-13 13:41 UTC
Ryzen issues on FreeBSD ? (summary of 4 issues) (seemingly solved!)
OK, this is all mostly solved for me, it seems. Points below, inline.

On 1/24/2018 9:42 AM, Mike Tancsa wrote:
> I think perhaps a good time to summarize as a few issues seem to be going on
>
> a) fragile BIOS settings. There seems to be a number of issues around
> RAM speeds and disabled C-STATES that impact stability. Specifically,
> lowering the default frequency from 2400 to 2133 seems to help some
> users with crashes / lockups under heavy loads.

Also disabling core boost on non-X CPUs (i.e. 1600 vs 1600X) and making
sure the CPU is not overheating. On my ASUS board, using a back-ported
version of amdtemp and amdsmn, I confirmed the temperature does not go
above 50C at full load. Setting the fan speed to turbo seems to help
reduce the maximum temperature the CPU reaches.

> b) CPUs manufactured prior to week 25 (some say week 33?) have a
> hardware defect that manifests itself as segfaults in heavy compiles. I
> was able to confirm this on 1 of the CPUs I had using a Linux setup. It
> seems to confirm this, you need to physically look at the CPU for the
> manufacturing date :( Not sure how to trigger it on FreeBSD reliably,
> but there is a github project I used to verify on Linux
> (https://github.com/suaefar/ryzen-test)

AMD sent me 3 new CPUs without issue. Turnaround was about 1 week from
Canada to the US and back.

>
> c) The idle lockup bug. This *seems* to be confirmed on Linux as well
> http://blog.programster.org/ubuntu-16-04-compile-custom-kernel-for-ryzen
> and
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1690085

The settings in a), together with the most recent BIOS update, seem to
have fixed this issue for me. It sure seemed like a hardware issue, but
then again it could be a side effect of d). However, I was never able to
break into the debugger using a debugging kernel in HEAD, so I suspect it
was more hardware related than anything.

BIOS Information
        Vendor: American Megatrends Inc.
        Version: 3803
        Release Date: 01/22/2018
        Address: 0xF0000

This is on a

        Product Name: PRIME X370-PRO
        Version: Rev X.0x

>
> d) Compile failures of some ports. For myself and one other user,
> compiling net/samba47 reliably hangs in roughly the same place. Its not
> clear if this is related to any of the above bugs or not.

This too seems to be fixed! The patch in
https://docs.freebsd.org/cgi/getmsg.cgi?fetch=417183+0+archive/2018/freebsd-hackers/20180211.freebsd-hackers
seems to stop the deadlock. I did 90 builds on RELENG_11 with this patch
overnight and saw no deadlocks. For half the builds I had 2 guest VMs
also building. For the second half, it was the only thing running on the
box, and it's working as expected.

All this just in time for my Epyc based system to arrive!

	---Mike
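The overnight test in d) (90 consecutive builds, watching for deadlocks) can be approximated with a small harness that repeats a build and treats any run exceeding a time limit as a hang. This is only a sketch under stated assumptions, not the procedure actually used in the thread: the name `repeat_build`, the default timeout, and the build command are hypothetical, and the timeout has to be chosen well above a normal build time.

```python
import subprocess


def repeat_build(cmd, runs=90, timeout_s=3600.0):
    """Run cmd `runs` times; record each run as 'ok', 'failed' or 'hung'."""
    outcomes = []
    for _ in range(runs):
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
            outcomes.append("ok" if proc.returncode == 0 else "failed")
        except subprocess.TimeoutExpired:
            # subprocess.run kills the direct child when the timeout expires;
            # grandchildren of a build tool may still need manual cleanup.
            outcomes.append("hung")
    return outcomes
```

A call such as `repeat_build(["make", "-C", "/usr/ports/net/samba47", "clean", "build"], runs=90)` would return a list of 90 outcomes, and `outcomes.count("hung")` gives the number of suspected deadlocks. A process stuck inside the kernel may not respond to being killed, in which case the harness itself would stall; that too is a useful signal.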