Our new server with an AMD EPYC CPU and a Supermicro board reboots randomly. There is no error message in /var/log/messages before the reboot.

We are running 2 servers with VMware Workstation without any problems.

The new server is meant to run KVM.

Our older AMD servers (pre-EPYC) run KVM without any problems.

Any ideas or recommendations?

-- 
Best regards,
Helmut Drodofsky

Internet XS Service GmbH
Heßbrühlstraße 15
70565 Stuttgart

Managing Director: Helmut Drodofsky
HRB 21091 Stuttgart
VAT ID: DE190582774
Phone: 0711 781941 0
Fax: 0711 781941 79
Mail: info at internet-xs.de
www.internet-xs.de
> Our new server with AMD EPYC and a Supermicro board reboots randomly.
> There is no error message before the reboot in /var/log/messages.

Is there anything in the server's hardware logs, such as a memory error? Is any watchdog on the server misbehaving?

We run CentOS 7 and KVM on AMD Opteron and AMD EPYC servers without issues.

Regards,
Simon

_______________________________________________
CentOS mailing list
CentOS at centos.org
https://lists.centos.org/mailman/listinfo/centos
Erick Perez - Quadrian Enterprises
2020-Jan-01 18:02 UTC
[CentOS] KVM Random Reboots AMD EPYC Server
I had issues with Supermicro and EPYC in the past year. The problem was isolated to a faulty 16GB ECC RAM module, and the error showed up only in the log of the Supermicro web-based BMC and nowhere else. The fault was with neither Supermicro nor AMD. The ECC module was made by Samsung, and it failed after 1 year of use. I assume it came from a bad batch, because the other 25 Samsung ECC RAM modules that we use in our other servers have had no issues.

The behavior was that the server would suddenly reboot at random, with no message at all at the CentOS level.

I realize that the BMC error log is far (very, very far) from perfect, but perhaps the error is sitting there in some strange-looking message.

Hope this helps.

On Wed, Jan 1, 2020 at 10:09 AM Simon Matter via CentOS <centos at centos.org> wrote:
> Anything in the hardware logs of the server like memory error or so? Any
> watchdog on the servers acting bad?

-- 
Erick Perez
Quadrian Enterprises S.A. - Panama, Republica de Panama
Skype chat: eaperezh
WhatsApp IM: +507-6675-5083
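
As a complement to looking through the BMC event log, the memory-error counters that the Linux kernel keeps can be read from sysfs. The following is only a minimal sketch, assuming an EDAC driver is loaded for the memory controller so that /sys/devices/system/edac/mc exists; the function name read_edac_counts and its root parameter are made up for illustration. As the report above shows, an error may appear only in the BMC log, so zero counts here do not rule out a bad module.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Collect corrected (ce) and uncorrected (ue) error counts per controller.

    Returns a dict such as {"mc0": {"ce": 0, "ue": 0}}. The result is empty
    when the EDAC directory is missing (for example, no EDAC driver loaded).
    'root' is a parameter only so the function can be pointed at a test tree.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts

    # Each memory controller shows up as a directory named mc0, mc1, ...
    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for key, filename in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((mc / filename).read_text().strip())
            except (OSError, ValueError):
                # Counter file missing or unreadable: record it as unknown.
                entry[key] = None
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    result = read_edac_counts()
    if not result:
        print("No EDAC memory controllers found (is an EDAC driver loaded?)")
    for name, entry in result.items():
        print(f"{name}: corrected={entry['ce']} uncorrected={entry['ue']}")
```

A non-zero or growing corrected count on one controller would be a reason to look more closely at the modules attached to it; the BMC log remains the place to check when the OS shows nothing.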