thr3ads.net - Xen devel - Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt

Trenta sis

2013-Sep-08 14:35 UTC

Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash

Hello,

I have the same error: the server reboots automatically during every boot
with the Xen kernel. On an HS20 with Debian Wheezy, Xen hangs and the AMM
management interface shows the same errors described in previous mails.
Debian Wheezy with a non-Xen kernel boots correctly, so the problem seems
to be with the Xen kernel. The same HS20 server with Debian Lenny + Xen 3.2
or Debian Squeeze + Xen 4.0 works perfectly.

I upgraded to Debian testing and unstable, with the same results on Xen 4.1
and 4.2.

If you need more information, just ask.
How can this bug be solved?

Thanks

> On Fri, Feb 08, 2013 at 03:08:08PM +0100, agya naila wrote:
> > Hello all. Today Xen is finally running on the IBM blade server. I
> > added nmi=dom0 and, in the Base Board Management Controller section
> > of the BIOS configuration, disabled the 'reboot system on nmi'
> > attribute. This step does not eliminate the NMI problem, since I
> > still find NMI error interrupts in my blade server log, but Xen now
> > ignores them and keeps running. If anyone finds a better solution,
> > that would be great.
>
> Thanks for the 'workaround' info.
>
> We should still find out what exactly generates/causes that NMI with Xen.
>
> -- Pasi
>
> > Agya
> >
> > On Thu, Feb 7, 2013 at 9:51 PM, Fabian Arrotin <arr...@centos.org> wrote:
> > > On 02/06/2013 02:39 PM, agya naila wrote:
> > > > I configured it by adding nmi=ignore to my /boot/grub/grub.cfg
> > >
> > > Just to add that I also tried the nmi=ignore parameter for Xen, and
> > > it still hard-reboots/resets automatically during the dom0 kernel
> > > boot.
> > >
> > > Fabian


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Konrad Rzeszutek Wilk

2013-Sep-09 19:15 UTC

Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash

On Sun, Sep 08, 2013 at 04:41:02PM +0200, Trenta sis wrote:
> Hello,
>
> I have the same error: the server reboots automatically during every boot
> with the Xen kernel. On an HS20 with Debian Wheezy, Xen hangs and the AMM
> management interface shows the same errors described in previous mails.
> Debian Wheezy with a non-Xen kernel boots correctly, so the problem seems
> to be with the Xen kernel. The same HS20 server with Debian Lenny + Xen 3.2
> or Debian Squeeze + Xen 4.0 works perfectly.
>
> I upgraded to Debian testing and unstable, with the same results on Xen 4.1
> and 4.2.
>
> If you need more information, just ask.
> How can this bug be solved?

Did the workaround help?

As for finding out exactly what causes it: there are logs in the BMC that
can point to the PCI device. Did you check those? Do they say whether any
device has a PCI SERR on it?

Thanks.

Trenta sis

2013-Sep-12 12:47 UTC

Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash

Hello,

We need this server, so we have downgraded it to Debian Squeeze.
I hope to have another HS20 in a few days to run some additional tests.
I'll try to collect all the information you asked for and send it.
Sorry, one question: what is PCI SERR, and where do I find it?

Thanks for all

Konrad Rzeszutek Wilk

2013-Sep-23 14:02 UTC

Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash

On Thu, Sep 12, 2013 at 02:47:39PM +0200, Trenta sis wrote:
> Hello,
>
> We need this server, so we have downgraded it to Debian Squeeze.
> I hope to have another HS20 in a few days to run some additional tests.
> I'll try to collect all the information you asked for and send it.
> Sorry, one question: what is PCI SERR, and where do I find it?

If you log in to the BladeCenter web frontend, you should see logs for
each blade. Some entries are routine, such as 'User XYZ logged in', but
in some cases there are more serious ones, such as an NMI or PCI SERR.
If you could copy and paste them, it could help in figuring out which PCI
device is responsible for causing the NMI.

Trenta sis

2013-Sep-29 10:47 UTC

Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash

Hello,

The BladeCenter web frontend shows:

27 I Blade_09 09/08/13 13:25:17 0x806f0013 Chassis, (NMI State) diagnostic interrupt
28 E Blade_09 09/08/13 13:25:12 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
29 I Blade_09 09/08/13 13:09:14 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
30 I Blade_09 09/08/13 13:09:03 0x806f0013 Chassis, (NMI State) diagnostic interrupt
31 E Blade_09 09/08/13 13:08:58 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
32 I Blade_09 09/08/13 12:46:26 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
33 I Blade_09 09/08/13 12:46:15 0x806f0013 Chassis, (NMI State) diagnostic interrupt
34 E Blade_09 09/08/13 12:46:11 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
35 I Blade_09 09/08/13 12:34:13 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
36 I Blade_09 09/08/13 12:34:03 0x806f0013 Chassis, (NMI State) diagnostic interrupt
37 E Blade_09 09/08/13 12:33:58 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
38 I Blade_09 09/08/13 12:27:25 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
39 I Blade_09 09/08/13 12:27:14 0x806f0013 Chassis, (NMI State) diagnostic interrupt
40 E Blade_09 09/08/13 12:27:10 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
41 I Blade_09 09/08/13 12:20:45 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
42 I Blade_09 09/08/13 12:20:34 0x806f0013 Chassis, (NMI State) diagnostic interrupt
43 E Blade_09 09/08/13 12:20:30 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
44 I Blade_09 09/08/13 12:18:20 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
45 I Blade_09 09/08/13 12:18:10 0x806f0013 Chassis, (NMI State) diagnostic interrupt
46 E Blade_09 09/08/13 12:18:05 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
47 I Blade_09 09/08/13 12:15:47 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
48 I Blade_09 09/08/13 12:15:37 0x806f0013 Chassis, (NMI State) diagnostic interrupt
49 E Blade_09 09/08/13 12:15:32 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
Thanks
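
For anyone who wants to summarise a longer paste of such entries, here is a minimal Python sketch that splits them into fields and counts the raw HI_FERR/NERR values. The field layout (sequence, severity, source, date, time, event ID, message) is only inferred from the entries pasted above, and the names `ENTRY`, `FERR`, `parse_entries`, `ferr_values` and `SAMPLE` are made up for this illustration; it is not an IBM tool or documented format.

```python
import re
from collections import Counter

# Field layout assumed from the pasted entries (not from IBM documentation):
# sequence, one-letter severity, source, date, time, event ID, message.
ENTRY = re.compile(
    r"^(?P<seq>\d+)\s+(?P<sev>[A-Z])\s+(?P<source>\S+)\s+"
    r"(?P<date>\d{2}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<event_id>0x[0-9a-fA-F]+)\s+(?P<message>.*)$"
)
FERR = re.compile(r"HI_FERR/NERR Value=\s*(?P<value>[0-9a-fA-F]+)")


def parse_entries(text):
    """Return one dict per log entry, rejoining lines wrapped by the mailer."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = ENTRY.match(line)
        if match:
            entries.append(match.groupdict())
        elif entries:
            # Not a new entry: treat it as the wrapped tail of the previous one.
            entries[-1]["message"] += " " + line
    return entries


def ferr_values(entries):
    """Count the raw HI_FERR/NERR strings seen in error-severity entries."""
    counts = Counter()
    for entry in entries:
        found = FERR.search(entry["message"])
        if entry["sev"] == "E" and found:
            counts[found.group("value")] += 1
    return counts


# Two entries copied from the log above, with the mailer's line wrapping.
SAMPLE = """\
27 I Blade_09 09/08/13 13:25:17 0x806f0013 Chassis, (NMI State) diagnostic
interrupt
28 E Blade_09 09/08/13 13:25:12 0x10000002 SMI Hdlr: 00151743 HI Fatal
Error, HI_FERR/NERR Value= 0020
"""

if __name__ == "__main__":
    parsed = parse_entries(SAMPLE)
    for entry in parsed:
        print(entry["seq"], entry["sev"], entry["time"], entry["message"])
    print(dict(ferr_values(parsed)))
```

The value is kept as a string on purpose, since the log does not say whether it is decimal or hexadecimal.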


Konrad Rzeszutek Wilk

2013-Sep-30 14:13 UTC

Is: 0xCF8 on extended config space instead of MCONF? Was:Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash

> Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
> 27 I Blade_09 09/08/13 13:25:17 0x806f0013 Chassis, (NMI State) diagnostic
> interrupt
> 28 E Blade_09 09/08/13 13:25:12 0x10000002 SMI Hdlr: 00151743 HI Fatal
> Error, HI_FERR/NERR Value= 0020

A quick Google search on HI_FERR points to the Intel E7525 Memory
Controller Hub datasheet:

http://www.intel.com/content/dam/doc/datasheet/e7525-memory-controller-hub-datasheet.pdf

Section 3.6.14, "HI_FERR – Hub Interface First Error Register (D0:F1)",
describes the register. The value is 0020 (is that decimal or hex?). If it
is decimal, it is 10100 in binary, which is bits 2 and 4:

bit 2:

HI Internal Parity Error Detected. This bit is sticky through reset. System 
software clears this bit by writing a ‘1’ to the location.
0 = No Internal Parity error detected.
1 = MCH HI bridge has detected an Internal Parity error. Non-fatal.

and bit 4:
HI Data Parity Error Detected. This bit is sticky through reset. System software
clears this bit by writing a ‘1’ to the location.
0 = No HI data parity error.
1 = MCH has detected a parity error on the data phase of a HI transaction. 

But that is unlikely, as these are 'non-fatal'. So if this is hex, then it
would be bit 5, which is:

Enhanced Configuration Access Error. This bit is sticky through reset. System 
software clears this bit by writing a ‘1’ to the location.
0 = No Enhanced Configuration Access error
1 = A PCI Express* Enhanced Configuration access was mistakenly targeting 
the legacy interface. Fatal


That sounds more like it. So we touched a PCIe Enhanced Configuration
(MMCONFIG?) using the legacy interface (cf8?).
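
The decimal-versus-hex reading of 0020 can be checked with a few lines of Python. This is only a sketch of the arithmetic: the `HI_FERR_BITS` table holds just the three bit descriptions quoted in this mail (the other bits of the register are not covered), and the names `set_bits` and `describe` are made up for the example.

```python
# Only the three HI_FERR bits quoted in this mail; all other bits are omitted.
HI_FERR_BITS = {
    2: "HI Internal Parity Error Detected",
    4: "HI Data Parity Error Detected",
    5: "Enhanced Configuration Access Error",
}


def set_bits(value):
    """Return the positions of the bits that are set, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def describe(raw, base):
    """Interpret the logged string in the given base and name each set bit."""
    value = int(raw, base)
    return [
        (bit, HI_FERR_BITS.get(bit, "not quoted in this thread"))
        for bit in set_bits(value)
    ]


if __name__ == "__main__":
    for base in (10, 16):
        print("base", base, "->", describe("0020", base))
```

Read as decimal, the string gives bits 2 and 4; read as hex, it gives bit 5 only.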

Jan, any thoughts? Is there a particular bug-fix we are missing in Xen 4.1
or Xen 4.2 for this?  Xen 4.0 seems to work.

Trenta,

When you used Xen 4.0 did you use the same kernel as with Xen 4.1 or Xen 4.2?

Jan Beulich

2013-Sep-30 15:40 UTC

Is: 0xCF8 on extended config space instead of MCONF? Was:Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash

>>> On 30.09.13 at 16:13, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> But that is unlikely as these are 'non-fatal'. So if this is hex, then it
> would be bit 5, which is:
>
> Enhanced Configuration Access Error. This bit is sticky through reset.
> System software clears this bit by writing a ‘1’ to the location.
> 0 = No Enhanced Configuration Access error
> 1 = A PCI Express* Enhanced Configuration access was mistakenly targeting
> the legacy interface. Fatal
>
> That sounds more like it. So we touched a PCIe Enhanced Configuration
> (MMCONFIG?) using the legacy interface (cf8?).
>
> Jan, any thoughts? Is there a particular bug-fix we are missing in Xen 4.1
> or Xen 4.2 for this?  Xen 4.0 seems to work.

Possibly MMCONF just didn't get used on 4.0?

And no, I don't think I recall any possibly relevant change. Even more,
the description above sounds more like an error resulting from device
misbehavior than from software incorrectly doing some access.
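
As background for the cf8 versus MMCONFIG question, here is a small Python sketch of how the two standard PCI configuration mechanisms address a register. It only illustrates the encodings: the legacy 0xCF8 address has 8 bits for the register offset, while an MMCONFIG (ECAM) offset has 12. The function names are invented for the example, the register offset 0x40 is an arbitrary placeholder rather than the real HI_FERR offset, and nothing here claims to show what Xen actually did on the HS20.

```python
def _check_bdf(bus, dev, func):
    """Validate a bus/device/function triple."""
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= func < 8):
        raise ValueError("bus/device/function out of range")


def cf8_address(bus, dev, func, reg):
    """CONFIG_ADDRESS value written to port 0xCF8 (legacy mechanism).

    Standard encoding only: enable bit 31, bus in bits 23:16, device in
    bits 15:11, function in bits 10:8, dword-aligned register in bits 7:2.
    """
    _check_bdf(bus, dev, func)
    if not 0 <= reg < 0x100:
        raise ValueError("legacy 0xCF8 access only reaches registers 0x00-0xFF")
    return 0x80000000 | (bus << 16) | (dev << 11) | (func << 8) | (reg & 0xFC)


def mmconfig_offset(bus, dev, func, reg):
    """Byte offset from the MMCONFIG (ECAM) base: 4 KiB per function."""
    _check_bdf(bus, dev, func)
    if not 0 <= reg < 0x1000:
        raise ValueError("config space is 4096 bytes per function")
    return (bus << 20) | (dev << 15) | (func << 12) | reg


if __name__ == "__main__":
    # D0:F1 is the function named in the datasheet excerpt; 0x40 is a
    # placeholder offset chosen for illustration only.
    print(hex(cf8_address(0, 0, 1, 0x40)))
    print(hex(mmconfig_offset(0, 0, 1, 0x40)))
    # An extended-space register (offset >= 0x100) has no legacy encoding.
    print(hex(mmconfig_offset(0, 0, 1, 0x100)))
```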

Jan

Trenta sis

2013-Oct-04 16:31 UTC

Re: Is: 0xCF8 on extended config space instead of MCONF? Was:Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash

Hi,

With Xen 4.0, the kernel used was 2.6.32, the default kernel in Debian 6 (Squeeze).
Thanks

Konrad Rzeszutek Wilk

2013-Oct-04 16:55 UTC

Re: Is: 0xCF8 on extended config space instead of MCONF? Was:Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash

On Fri, Oct 04, 2013 at 06:31:37PM +0200, Trenta sis wrote:
> Hi,
>
> With Xen 4.0, the kernel used was 2.6.32, the default kernel in Debian 6 (Squeeze).
> Thanks

So if you swap either the kernel or the hypervisor, do you still see this?
That is, if you run Xen 4.2 + 2.6.32, or Xen 4.0 + the current kernel.