Woehrle Hartmut SBB CFF FFS (Extern)
2013-Mar-19 15:32 UTC
[CentOS] Centos 6.3 Network bnx2 Problem on HP DL360
Hello Mailing List

I got a severe network error message on an HP DL360 server.
The kernel log says:

----------------------------------- /var/log/messages -----------------------------------
Mar 19 15:45:06 server kernel: do_IRQ: 2.168 No irq handler for vector (irq -1)
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: intr_sem[0] PCI_CMD[00100446]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: PCI_PM[19002108] PCI_MISC_CFG[92000088]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000006]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: HC_STATS_INTERRUPT_STATUS[017f0080]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: PBA[00000000]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: <--- start MCP states dump --->
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: MCP mode[0000b880] state[80008000] evt_mask[00000500]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: pc[0800adec] pc[0800aeb0] instr[8fb10014]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: shmem states:
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: drv_mb[0103000f] fw_mb[0000000f] link_status[0000006f] drv_pulse_mb[0000432b]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: dev_info_signature[44564903] reset_type[01005254] condition[0003610e]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: 000003cc: 44444444 44444444 44444444 00000a3c
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: 000003dc: 0ffeffff 0000ffff ffffffff ffffffff
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: 000003ec: 00000000 00000000 00000000 00000002
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: 0x3fc[0000ffff]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: <--- end MCP states dump --->
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: NIC Copper Link is Down
Mar 19 15:45:20 server kernel: bnx2 0000:02:00.1: eth1: NIC Copper Link is Up, 1000 Mbps full duplex
-----------------------------------------------------------------------------------------

Does anyone know this problem?

The system is CentOS 6.3, kernel:
Linux server 2.6.32-279.5.2.el6.centos.plus.x86_64 #1 SMP Fri Aug 24 00:25:34 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Thanks
Hartmut
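For comparing a dump like the one above across several machines, it can help to pull the NAME[hex] pairs out of the bnx2 DEBUG lines. The following is only a rough sketch: the names `parse_bnx2_debug` and `FIELD_RE` are made up for illustration, and it assumes the log keeps the `DEBUG: NAME[value]` layout shown in this message. It does not interpret what any register means.

```python
import re

# Matches NAME[hexvalue] pairs such as PCI_CMD[00100446] or 0x3fc[0000ffff].
FIELD_RE = re.compile(r"(\w+)\[([0-9a-fA-F]+)\]")


def parse_bnx2_debug(log_text):
    """Collect NAME[hex] pairs from bnx2 DEBUG lines.

    Returns a dict mapping each field name to a list of integer values,
    because a name can appear more than once (the dump prints pc[...] twice).
    """
    fields = {}
    for line in log_text.splitlines():
        if "bnx2" not in line or "DEBUG:" not in line:
            continue  # skip link up/down and other non-dump lines
        # Only look at the text after "DEBUG:" so the PCI address is ignored.
        payload = line.split("DEBUG:", 1)[1]
        for name, value in FIELD_RE.findall(payload):
            fields.setdefault(name, []).append(int(value, 16))
    return fields
```

Feeding it the contents of /var/log/messages gives a dictionary that can be diffed between two incidents or two servers; the raw hex dump rows (000003cc: ...) carry no brackets and are skipped.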
On Mar 19, 2013, at 9:32 AM, Woehrle Hartmut SBB CFF FFS (Extern) <hartmut.woehrle at sbb.ch> wrote:

> Hello Mailing List
>
> I got a severe network error message at a HP DL360 Server.
> The kernel log says:

If that's a DL360 G7 server, make sure you've applied all of the latest firmware patches from HP on it. The G7 version has been almost notorious for firmware issues with drive controllers, Ethernet interfaces, etc.

Nate
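Before and after a firmware update it is useful to record what the NIC currently reports. `ethtool -i <interface>` prints "key: value" lines including `driver`, `version` and `firmware-version`. The sketch below wraps that call; the function names are illustrative only, it assumes ethtool is installed, and it needs Python 3.7 or newer, which is not what CentOS 6 ships by default.

```python
import subprocess


def parse_ethtool_info(output):
    """Turn `ethtool -i` output ("key: value" lines) into a dict."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only, so values such as the PCI
        # bus address (0000:02:00.1) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def driver_info(interface="eth1"):
    """Run `ethtool -i` for one interface and return the parsed fields."""
    result = subprocess.run(
        ["ethtool", "-i", interface],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ethtool_info(result.stdout)
```

Calling `driver_info("eth1")["firmware-version"]` on each server gives a quick way to spot machines that are still on older NIC firmware.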
What IRQ number do you see for the device? You may have to find the driver development guide to work out what the debug messages mean. The first line by itself says that an interrupt arrived on a vector that has no IRQ handler assigned. You can check the device's IRQ in /proc/interrupts, then find the matching entry under /proc/irq/.

------------
Banyan He
Blog: http://www.rootong.com
Email: banyan at rootong.com

On 3/19/2013 11:32 PM, Woehrle Hartmut SBB CFF FFS (Extern) wrote:

> Hello Mailing List
>
> I got a severe network error message at a HP DL360 Server.
> The kernel log says:
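The /proc/interrupts check suggested above can be sketched as follows. This is a minimal example, not a complete parser: `irqs_for_device` and `read_irqs` are illustrative names, and the code assumes the usual layout of a header row of CPU columns followed by rows of "IRQ: per-CPU counts, controller type, device names". It also assumes per-queue interrupts, if the driver uses them, are named like "eth1-0".

```python
def irqs_for_device(interrupts_text, device="eth1"):
    """Return [(irq_label, total_count)] for rows that name the device."""
    lines = interrupts_text.splitlines()
    if not lines:
        return []
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    matches = []
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        cols = rest.split()
        counts = cols[:cpu_count]
        description = cols[cpu_count:]
        # Shared IRQs list several names separated by ", ".
        names = [name.rstrip(",") for name in description]
        hit = any(n == device or n.startswith(device + "-") for n in names)
        if hit and all(c.isdigit() for c in counts):
            matches.append((label.strip(), sum(int(c) for c in counts)))
    return matches


def read_irqs(device="eth1", path="/proc/interrupts"):
    """Read the live interrupt table and look up one device."""
    with open(path) as handle:
        return irqs_for_device(handle.read(), device)
```

Each label returned by `read_irqs("eth1")` should have a matching directory under /proc/irq/, where files such as smp_affinity can be inspected.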
Svavar Örn Eysteinsson
2013-Mar-20 13:04 UTC
[CentOS] Centos 6.3 Network bnx2 Problem on HP DL360
How often are you getting these crashes?

I had a similar problem on my HP DL380 G7 server. I disabled Active State Power Management (ASPM) on PCI Express.

Try it: add pcie_aspm=off as a kernel boot option.

Best regards,
Svavar O
Reykjavik - Iceland

On 19.3.2013, at 15:32, Woehrle Hartmut SBB CFF FFS (Extern) wrote:

> Hello Mailing List
>
> I got a severe network error message at a HP DL360 Server.
> The kernel log says:
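On CentOS 6 a boot option like this is typically appended to the kernel line in /boot/grub/grub.conf and takes effect after a reboot. To confirm that the running kernel really was started with it, /proc/cmdline can be checked. A small sketch, with illustrative function names:

```python
def has_boot_option(cmdline, option="pcie_aspm=off"):
    """True if the exact option appears as its own word on the command line."""
    return option in cmdline.split()


def running_kernel_has(option="pcie_aspm=off", path="/proc/cmdline"):
    """Check the command line the running kernel was booted with."""
    with open(path) as handle:
        return has_boot_option(handle.read(), option)
```

With 60 servers, running `running_kernel_has()` on each one after the reboot is a cheap way to catch a machine where the grub.conf change was missed.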
Woehrle Hartmut SBB CFF FFS (Extern)
2013-Mar-25 09:06 UTC
[CentOS] Centos 6.3 Network bnx2 Problem on HP DL360
Hello Svavar

This was the first time this problem has occurred, with 60 servers and about half a year on CentOS 6 (CentOS 5 before that). But because the interfaces carry a permanent load, really 24x7, problems with power management would be a disaster. I will try switching it off.

Thanks
Hartmut

> How often are you getting these crashes?
>
> I had a similar problem on my HP DL380 G7 server.
>
> I disabled Active State Power Management on PCI Express.
>
> Try it.
>
> Add pcie_aspm=off as a boot option.
>
> Best regards,
>
> Svavar O
> Reykjavik - Iceland
Svavar Örn Eysteinsson
2013-Mar-25 10:10 UTC
[CentOS] Centos 6.3 Network bnx2 Problem on HP DL360
After you have tried the pcie_aspm boot option, also try:

echo performance > /sys/module/pcie_aspm/parameters/policy

This disables ASPM on the PCIe links and runs them at maximum performance. This is what I use today on the DL380 G7.

On 25.3.2013, at 09:06, Woehrle Hartmut SBB CFF FFS (Extern) wrote:

> Hello Svavar
>
> This was the first time that this problem occurred - with 60 Servers and about half a year of Centos 6 (5 before).
> But because the interfaces have a permanent load - really 24x7 - problems with power management would be a disaster.
> I will try to switch off.
>
> Thanks
> Hartmut
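To verify which ASPM policy is active after the echo command above, the same sysfs file can be read back. It lists the available policies on one line with the active one in square brackets, for example "default [performance] powersave". The sketch below parses that line; the function names are illustrative, and it assumes the file exists, which depends on the kernel having ASPM support built in. Note also that writing to the file needs root, and the kernel may refuse the write when ASPM has already been disabled with pcie_aspm=off.

```python
import re

POLICY_PATH = "/sys/module/pcie_aspm/parameters/policy"


def parse_aspm_policy(text):
    """Return (active_policy, available_policies) from the policy file text."""
    available = [word.strip("[]") for word in text.split()]
    match = re.search(r"\[(\w+)\]", text)  # the active entry is bracketed
    active = match.group(1) if match else None
    return active, available


def read_aspm_policy(path=POLICY_PATH):
    """Read and parse the live ASPM policy file."""
    with open(path) as handle:
        return parse_aspm_policy(handle.read())
```

The echo setting does not survive a reboot, so a check like `read_aspm_policy()[0] == "performance"` is worth running after each restart unless the command is also placed in a startup script.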