I have just noticed this message in my kernel logs, reporting the possibility of an error with my memory. This would go a long way towards explaining the problems I've been having. This particular error is occurring when I'm not running Xen, so it is obviously not something brought on by Xen itself.

The strange thing is that the NMI error is always followed by "TLAN: eth0: Adaptor Error = 0x180002", which says to me that either there is something wrong with my network card which is triggering an NMI, or that the NMI triggers an error in that network adapter. The memory itself is ECC memory in a Compaq Proliant 1600; maybe I can access the memory logs...

Either way, what would Xen do upon receiving an NMI? Would it spontaneously reboot?

I'm running memtest now, and will run memtest86 once I am back in the office.

James

eth2: Promiscuous mode enabled.
eth2: Promiscuous mode enabled.
br2: port 1(eth2) entering learning state
br2: port 1(eth2) entering forwarding state
br2: topology change detected, propagating
Uhhuh. NMI received. Dazed and confused, but trying to continue
You probably have a hardware problem with your RAM chips
TLAN: eth0: Adaptor Error = 0x180002
TLAN: eth0: Starting autonegotiation.
TLAN: eth0: Autonegotiation complete.
TLAN: eth0: Link active with AutoNegotiation enabled, at 100Mbps Full-Duplex
TLAN: Partner capability: 10BaseT-HD 10BaseT-FD 100baseTx-HD 100baseTx-FD
TLAN: eth0: Adaptor Error = 0x180002
TLAN: eth0: Starting autonegotiation.
TLAN: eth0: Autonegotiation complete.
TLAN: eth0: Link active with AutoNegotiation enabled, at 100Mbps Full-Duplex
TLAN: Partner capability: 10BaseT-HD 10BaseT-FD 100baseTx-HD 100baseTx-FD
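The observation that the NMI message is always followed by a TLAN adaptor error can be checked mechanically over a longer log. The following is only a rough sketch: the function name, the three-line window and the regular expressions are assumptions made for illustration, and the sample is the start of the log excerpt from this message.

```python
import re

# Patterns are assumptions based on the log excerpt in this thread.
NMI_PATTERN = re.compile(r"NMI received")
TLAN_ERROR_PATTERN = re.compile(r"TLAN: (\w+): Adaptor Error = (0x[0-9a-fA-F]+)")


def nmi_followed_by_tlan_error(log_lines, window=3):
    """For each NMI line, report the first TLAN adaptor error within `window` lines.

    Returns a list of (line_index, (interface, error_code)) tuples, with None
    in place of the inner tuple when no adaptor error follows the NMI.
    """
    results = []
    for index, line in enumerate(log_lines):
        if not NMI_PATTERN.search(line):
            continue
        following = log_lines[index + 1 : index + 1 + window]
        match = next((m for m in map(TLAN_ERROR_PATTERN.search, following) if m), None)
        results.append((index, match.group(1, 2) if match else None))
    return results


SAMPLE_LOG = """\
eth2: Promiscuous mode enabled.
eth2: Promiscuous mode enabled.
br2: port 1(eth2) entering learning state
br2: port 1(eth2) entering forwarding state
br2: topology change detected, propagating
Uhhuh. NMI received. Dazed and confused, but trying to continue
You probably have a hardware problem with your RAM chips
TLAN: eth0: Adaptor Error = 0x180002
TLAN: eth0: Starting autonegotiation.
""".splitlines()

if __name__ == "__main__":
    for index, hit in nmi_followed_by_tlan_error(SAMPLE_LOG):
        print(index, hit)
```

An NMI entry that comes back with None would suggest the two events are not as tightly linked as they look in the short excerpt.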
Further investigation reveals that I'm not the only one having this problem. I just checked another Proliant 1600 I have running Linux and it has reported that error at least once, so it appears to be a fault in the TLAN driver. Doh.

The TLAN driver also has another fault in it to do with promiscuous mode, so I'll remove that driver and put another network adapter in the machine.

I'm guessing then that Xen reboots the system if it gets an NMI. Hopefully this ends my list of errors with Xen!

James

From: James Harper
Sent: Tue 3/08/2004 11:50 AM
To: xen-devel@lists.sourceforge.net
Subject: [Xen-devel] memory error?
Funnily enough, the error goes away again if I disable the third network card module (natsemi). Maybe those two devices don't like sharing the same bus... I think they do share an IRQ. Bah. Too confusing.

James

From: James Harper
Sent: Tue 3/08/2004 1:16 PM
To: xen-devel@lists.sourceforge.net
Subject: RE: [Xen-devel] memory error?
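The suspicion that the two cards share an IRQ can be confirmed from /proc/interrupts, where devices on a shared line appear comma-separated at the end of the row. This is a minimal sketch, not a tested tool: the function name is made up, the sample text is hypothetical (it is not taken from the machine in this thread), and device names containing spaces are not handled.

```python
def shared_irqs(interrupts_text):
    """Return {irq_number: [device, ...]} for IRQ rows listing more than one device."""
    shared = {}
    for line in interrupts_text.splitlines():
        irq, sep, rest = line.partition(":")
        irq = irq.strip()
        # Skip the CPU header row, non-numeric rows such as "NMI:", and unshared IRQs.
        if not sep or not irq.isdigit() or "," not in rest:
            continue
        parts = rest.split(",")
        tokens = parts[0].split()
        if not tokens:
            continue
        # The first device is the last whitespace-separated token before the first comma.
        devices = [tokens[-1]] + [part.strip() for part in parts[1:]]
        shared[int(irq)] = devices
    return shared


# Hypothetical sample in the /proc/interrupts layout; values are invented.
SAMPLE_INTERRUPTS = """\
           CPU0
  0:    1234567          XT-PIC  timer
  5:      40212          XT-PIC  eth0, eth2
 11:       9876          XT-PIC  eth1
NMI:          3
"""

if __name__ == "__main__":
    print(shared_irqs(SAMPLE_INTERRUPTS))
```

On a live Linux system the same function could be fed the contents of /proc/interrupts, for example `shared_irqs(open("/proc/interrupts").read())`.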
> The strange thing is that the NMI error is always followed by the TLAN: eth0: Adaptor Error = 0x180002, which says to me that either there is something wrong with my network card which is triggering an NMI, or that the NMI triggers an error in that network adapter. The memory itself is ECC memory in a Compaq Proliant 1600, maybe i can access the memory logs...
>
> Either way, what would xen do upon receiving an NMI? Would it spontaneously reboot?

Hmm, given that it's not something we've ever been able to test, 'spontaneous reboot' sounds quite possible...

In normal operation, it's relatively hard for Xen to reboot without printing anything. It requires a 'triple fault', which basically means the hypervisor area of the pagetable has to be corrupt. We haven't seen a bug like that for a very long time.

The link between the NMI and the adaptor error is interesting. I wonder if it's a parity error on the PCI bus rather than a memory ECC failure? Try re-seating the PCI card?

Ian

-------------------------------------------------------
This SF.Net email is sponsored by OSTG. Have you noticed the changes on Linux.com, ITManagersJournal and NewsForge in the past few weeks? Now, one more big change to announce. We are now OSTG- Open Source Technology Group. Come see the changes on the new OSTG site. www.ostg.com
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
In one of my follow-up posts I mentioned that I have now seen this behaviour on another similar server, and seen reports of it on the web. Basically, under some circumstances the tlan driver gets or causes PCI parity errors and barfs; most likely it doesn't play well with other cards.

I've since moved the realtek (natsemi) adapter to another slot which is on a different PCI bus to the tlan (that's what I love about servers - separate PCI busses to play with!), and have so far not had any more errors. I'm just building the latest version of Xen and then will boot into it and give it a thorough thrashing.

But so far it looks like the bulk of the problems I have been experiencing lately were of my own making.

James

From: Ian Pratt
Sent: Tue 3/08/2004 4:42 PM
To: James Harper
Cc: xen-devel@lists.sourceforge.net; Ian.Pratt@cl.cam.ac.uk
Subject: Re: [Xen-devel] memory error?
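The PCI parity-error theory discussed in this thread could in principle be checked by looking at the Status register of each device's PCI configuration header, where bit 15 (Detected Parity Error) and bit 8 (Master Data Parity Error) latch such events. The following is a sketch under stated assumptions: the function names are invented for illustration, and the example value is synthetic rather than read from the Proliant in question.

```python
import struct

# The 16-bit Status register sits at offset 0x06 of the standard PCI config header.
PCI_STATUS_OFFSET = 0x06

# Error-related bits of the PCI Status register.
PCI_STATUS_ERROR_BITS = {
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error",
    15: "Detected Parity Error",
}


def decode_pci_status(status):
    """Return the names of the error bits set in a 16-bit PCI Status value."""
    return [name for bit, name in sorted(PCI_STATUS_ERROR_BITS.items()) if status & (1 << bit)]


def status_from_config(config_bytes):
    """Extract the little-endian Status register from raw config-space bytes."""
    (status,) = struct.unpack_from("<H", config_bytes, PCI_STATUS_OFFSET)
    return status


if __name__ == "__main__":
    # Synthetic 64-byte header with bits 15 and 8 set in the Status register.
    fake_config = bytes(6) + b"\x00\x81" + bytes(56)
    print(decode_pci_status(status_from_config(fake_config)))
```

On kernels that expose sysfs, the raw bytes can be read from `/sys/bus/pci/devices/<address>/config`, where `<address>` is whatever bus address the card in question has on the machine.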
> > Either way, what would xen do upon receiving an NMI? Would it spontaneously reboot?
>
> Hmm, given that it's not something we've ever been able to test,
> 'spontaneous reboot' sounds quite possible...

Just tested it and, yes, the NMI error path was broken. I've checked in a fix now, but this would explain the hangs that you were seeing.

 -- Keir