I'm running 11-STABLE from 12/9. amdtemp works for me. It also has the
sysctl indicating that it has the shared page fix. I'm pretty sure I've
seen the lockups since then. I'll update to the latest STABLE and see
what happens.
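For anyone who wants to check the same thing on their own box, here is a
minimal sketch. It reads the hw.lower_amd64_sharedpage sysctl quoted further
down in this thread and prints a one-line summary. The helper name
describe_sharedpage is made up for illustration, and reading 0 as "not
active" is my assumption about how the tunable reports itself.

```shell
#!/bin/sh
# Sketch only: summarize the state of the Ryzen shared page workaround.
# describe_sharedpage is a hypothetical helper, not something FreeBSD ships.
describe_sharedpage() {
    case "$1" in
        1) echo "shared page workaround active" ;;
        0) echo "shared page workaround not active" ;;  # assumed meaning of 0
        *) echo "sysctl not present" ;;                 # kernel likely predates the fix
    esac
}

# "sysctl -n" prints only the value. Errors are discarded, so a missing
# OID (or a missing sysctl binary) leaves $value empty.
value=$(sysctl -n hw.lower_amd64_sharedpage 2>/dev/null || true)
describe_sharedpage "$value"
```

If it reports that the sysctl is not present, the kernel was most likely
built before the fix went in.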
One weird thing about my experience is that if I keep something running
continuously like the distributed.net client on 6 of 12 possible threads,
it keeps the system up for MUCH longer than without. This is a home server
and very lightly loaded (one could argue insanely overpowered for the use
case).
I'm glad to see that there has been some attention on this. I was a little
disappointed by the earlier thread.
I'm happy to help troubleshoot, but I'm not sure what information I can
gather from a hard locked system that doesn't even show anything on the
console.
--
Nimrod
On Wed, Jan 17, 2018 at 4:01 PM Mike Tancsa <mike at sentex.net> wrote:
> On 1/17/2018 3:39 PM, Don Lewis wrote:
> > On 17 Jan, Mike Tancsa wrote:
> >> On 1/17/2018 8:43 AM, Pete French wrote:
> >>>
> >>> Are you running the latest STABLE? There were some patches for Ryzen
> >>> which went in, I believe, and might affect the stability. Specifically
> >>> the changes to stop it locking up when executing code in the top page?
> >>
> >> Hi,
> >> I was testing with RELENG_11 as of 2 days ago. The fix seems to be there
> >>
> >> # sysctl -A hw.lower_amd64_sharedpage
> >> hw.lower_amd64_sharedpage: 1
> >>
> >> Would love to find a class of motherboard that pushes its "You don't need
> >> to dork around with any BIOS settings. It just works. Oh, and we have a
> >> hardware watchdog too".... ipmi would be stellar.
> >
> > The shared page change fixed the random lockup and silent reboot problem
> > for me. I've got a 1700X eight core CPU and a Gigabyte X370 Gaming 5. I
> > did have to RMA my CPU (it was an early one) because it had the problem
> > with random segfaults that seemed to be triggered by process migration
> > between CPU cores. I still haven't switched over to using it for
> > package builds because I see more random fallout than on my older
> > package builder. I'm not blaming the hardware for that at this point
> > because I see a lot of the same issues on my older machine, but less
> > frequently.
> >
> > One thing to watch (though it should be less critical with a six core
> > CPU) is VRM cooling. I removed the stupid plastic shroud over the VRM
> > sink on my motherboard so that it gets some more airflow.
>
> Thanks! I will confirm the cooling. I tried just now looking at the CPU
> FAN control in the BIOS and up'd it to "turbo" from the default. Does
> amdtemp.ko work with your chipset? Nothing on mine unfortunately, so I
> can't tell from the OS if it's running hot.
>
> Is there a way to see if your CPU is old and has that bug? I haven't
> seen any segfaults on the few dozen buildworlds I have done. So far it's
> always been a total lockup and not a crash with RELENG_11.
>
> x86info v1.31pre
> Found 12 identical CPUs
> Extended Family: 8 Extended Model: 0 Family: 15 Model: 1 Stepping: 1
> CPU Model (x86info's best guess): AMD Zen Series Processor (ZP-B1)
> Processor name string (BIOS programmed): AMD Ryzen 5 1600 Six-Core Processor
>
> Monitor/Mwait: min/max line size 64/64, ecx bit 0 support, enumeration
> extension
> SVM: revision 1, 32768 ASIDs, np, lbrVirt, SVMLock, NRIPSave,
> TscRateMsr, VmcbClean, FlushByAsid, DecodeAssists, PauseFilter,
> PauseFilterThreshold
> Address Size: 48 bits virtual, 48 bits physical
> The physical package has 12 of 16 possible cores implemented.
> running at an estimated 3.20GHz
>
>
>
>
> ---Mike
>
>
>
> --
> -------------------
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, mike at sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada http://www.tancsa.com/