Guru Anbalagane
2011-Mar-18 15:53 UTC
Re: [Xen-devel] Broadcom BCM5709 (bnx2) on Dell PowerEdge R610, Issues
This is likely related to Xen losing interrupts while certain CPUs go
into the C6 state.
The patch below addresses an issue around this:
http://xenbits.xen.org/hg/xen-unstable.hg/rev/1087f9a03ab6
An easy workaround would be to turn off C-states in the BIOS or limit
the C-state in Xen.

Hope this helps.
Thanks,
Guru

> Message: 5
> Date: Fri, 18 Mar 2011 11:39:07 -0400
> From: Joshua West <jwest@brandeis.edu>
> Subject: [Xen-devel] Broadcom BCM5709 (bnx2) on Dell PowerEdge R610 Issues
> To: xen-devel@lists.xensource.com
> Message-ID: <4D837C9B.6030107@brandeis.edu>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hey folks,
>
> Unfortunately, ever since we went live with Xen on Dell PowerEdge
> R610's, we've been having some odd and aggravating issues. The NIC's
> tend to drop out when under heavy traffic after 1-7 days of uptime
> (random, difficult to reproduce). But before I get into the issue's
> specifics, here's some information about our setup:
>
> * Dell PowerEdge R610's w/ 4 Onboard Broadcom BCM5709 1-GbE NIC's.
> * RHEL 5.6.
> * Xen 3.4.3 (from xen.org; our own compile)
> * Kernel 2.6.18.8 (http://xenbits.xensource.com/linux-2.6.18-xen.hg)
>   checkout 1073.
> * bnx2 driver 2.0.18c from Broadcom's netxtreme2-6.0.53 package.
>     * bnx2 that ships with 2.6.18.8 doesn't support BCM5709's.
>     * Had to use driver package from broadcom.com in order to get
>       networking.
> * NIC bonding in pairs (eth0 + eth1, etc), with options "mode=4
>   lacp_rate=fast miimon=100 use_carrier=1".
>
> What occurs is suddenly one of the NIC's in the bond stops responding.
> Gets stuck on transmitting from what I understand. Kernel logs show the
> following, which includes extra debug information as the developers from
> Broadcom (Michael Chan and Benjamin Li) were assisting in
> troubleshooting and gave me a version of bnx2 2.0.18c to run, that
> prints out extra debug information upon NIC crash:
>
> Mar 18 01:40:26 xen-san-gb1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2:<--- start FTQ dump on eth0 --->
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_RV2P_PFTQ_CTL 10000
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_RV2P_TFTQ_CTL 20000
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_RV2P_MFTQ_CTL 4000
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_TBDR_FTQ_CTL 4002
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_TDMA_FTQ_CTL 10002
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_TXP_FTQ_CTL 10002
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_TPAT_FTQ_CTL 10000
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_RXP_CFTQ_CTL 8000
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_RXP_FTQ_CTL 100000
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_COM_COMXQ_FTQ_CTL 10000
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_COM_COMTQ_FTQ_CTL 20000
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_COM_COMQ_FTQ_CTL 10000
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_CP_CPQ_FTQ_CTL 4000
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: TXP mode b84c state 80001000 evt_mask 500 pc 8001284 pc 8001284 instr 1440fffc
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: TPAT mode b84c state 80001000 evt_mask 500 pc 8000a50 pc 8000a4c instr 38420001
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: RXP mode b84c state 80001000 evt_mask 500 pc 8004ad0 pc 8004adc instr 14e0005d
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: COM mode b8cc state 80008000 evt_mask 500 pc 8000a98 pc 8000a8c instr 8821
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: CP mode b8cc state 80000000 evt_mask 500 pc 8000c7c pc 8000928 instr 8ce800e8
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2:<--- end FTQ dump on eth0 --->
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0 DEBUG: intr_sem[0]
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0 DEBUG: intr_sem[0] PCI_CMD[00100406]
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0 DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0 RPM_MGMT_PKT_CTRL[40000088]
> Mar 18 01:40:27 xen-san-gb1 kernel: bnx2: eth0 DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e]
> Mar 18 01:40:27 xen-san-gb1 kernel: bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[01fe0001]
> Mar 18 01:40:27 xen-san-gb1 kernel: Ring state for ring 0 napi state 12
> Mar 18 01:40:27 xen-san-gb1 kernel: netdev state 7
> Mar 18 01:40:27 xen-san-gb1 kernel: hw status idx 3267 last status idx 307c irq jiffies 100759890
> Mar 18 01:40:27 xen-san-gb1 kernel: hw tx cons a669 hw rx cons 103c
> Mar 18 01:40:27 xen-san-gb1 kernel: sw tx cons a57c a57c prod a669
> Mar 18 01:40:27 xen-san-gb1 kernel: sw rx cons f3c prod 103c
> Mar 18 01:40:27 xen-san-gb1 kernel: Current jiffies 1008f4741 HZ fa tx 1008f41e2 poll 100759890
> Mar 18 01:40:27 xen-san-gb1 kernel: tx stop jiffies 1008f41e2 tx start jiffies 0
> Mar 18 01:40:27 xen-san-gb1 kernel: irq_event c68c36 napi_event c68c37
> Mar 18 01:40:27 xen-san-gb1 kernel: Ring state for ring 0 napi state 12
> Mar 18 01:40:27 xen-san-gb1 kernel: netdev state 77
> Mar 18 01:40:27 xen-san-gb1 kernel: hw status idx 3267 last status idx 307c irq jiffies 100759890
> Mar 18 01:40:27 xen-san-gb1 kernel: hw tx cons a669 hw rx cons 103c
> Mar 18 01:40:27 xen-san-gb1 kernel: sw tx cons a57c a57c prod a669
> Mar 18 01:40:27 xen-san-gb1 kernel: sw rx cons f3c prod 103c
> Mar 18 01:40:27 xen-san-gb1 kernel: Current jiffies 1008f4741 HZ fa tx 1008f41e2 poll 100759890
> Mar 18 01:40:27 xen-san-gb1 kernel: tx stop jiffies 1008f41e2 tx start jiffies 0
> Mar 18 01:40:27 xen-san-gb1 kernel: irq_event c68c36 napi_event c68c37
> Mar 18 01:40:27 xen-san-gb1 kernel: bnx2: eth0 NIC Copper Link is Down
> Mar 18 01:40:27 xen-san-gb1 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
>
> This was then followed rather quickly by a failure with the second NIC
> (eth1) in the bond:
>
> Mar 18 01:42:26 xen-san-gb1 kernel: NETDEV WATCHDOG: eth1: transmit timed out
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2:<--- start FTQ dump on eth1 --->
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_RV2P_PFTQ_CTL 10000
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_RV2P_TFTQ_CTL 20000
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_RV2P_MFTQ_CTL 4000
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_TBDR_FTQ_CTL 4002
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_TDMA_FTQ_CTL 10000
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_TXP_FTQ_CTL 10002
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_TPAT_FTQ_CTL 10000
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_RXP_CFTQ_CTL 8000
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_RXP_FTQ_CTL 100000
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_COM_COMXQ_FTQ_CTL 10000
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_COM_COMTQ_FTQ_CTL 20000
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_COM_COMQ_FTQ_CTL 10000
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_CP_CPQ_FTQ_CTL 4000
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: TXP mode b84c state 80005000 evt_mask 500 pc 8001294 pc 8001284 instr 38640001
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: TPAT mode b84c state 80001000 evt_mask 500 pc 8000a58 pc 8000a5c instr 8f820014
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: RXP mode b84c state 80001000 evt_mask 500 pc 8004ad0 pc 8004adc instr 14e0005d
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: COM mode b8cc state 80000000 evt_mask 500 pc 8000a9c pc 8000a94 instr 3c028000
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: CP mode b8cc state 80008000 evt_mask 500 pc 8000c58 pc 8000c6c instr 27bdffe8
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2:<--- end FTQ dump on eth1 --->
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1 DEBUG: intr_sem[0]
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1 DEBUG: intr_sem[0] PCI_CMD[00100406]
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1 DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1 DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1 RPM_MGMT_PKT_CTRL[40000088]
> Mar 18 01:42:27 xen-san-gb1 kernel: bnx2: eth1 DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e]
> Mar 18 01:42:27 xen-san-gb1 kernel: bnx2: eth1 DEBUG: HC_STATS_INTERRUPT_STATUS[01fe0001]
> Mar 18 01:42:27 xen-san-gb1 kernel: Ring state for ring 0 napi state 12
> Mar 18 01:42:27 xen-san-gb1 kernel: netdev state 7
> Mar 18 01:42:27 xen-san-gb1 kernel: hw status idx 2bb0 last status idx 29c4 irq jiffies 100759898
> Mar 18 01:42:27 xen-san-gb1 kernel: hw tx cons e421 hw rx cons a8ce
> Mar 18 01:42:27 xen-san-gb1 kernel: sw tx cons e334 e334 prod e421
> Mar 18 01:42:27 xen-san-gb1 kernel: sw rx cons a7ce prod a8ce
> Mar 18 01:42:27 xen-san-gb1 kernel: Current jiffies 1008fbc71 HZ fa tx 1008fb744 poll 100759898
> Mar 18 01:42:27 xen-san-gb1 kernel: tx stop jiffies 1008fb744 tx start jiffies 100239dfd
> Mar 18 01:42:27 xen-san-gb1 kernel: irq_event ab2e13 napi_event ab2e14
> Mar 18 01:42:27 xen-san-gb1 kernel: Ring state for ring 0 napi state 12
> Mar 18 01:42:27 xen-san-gb1 kernel: netdev state 77
> Mar 18 01:42:27 xen-san-gb1 kernel: hw status idx 2bb0 last status idx 29c4 irq jiffies 100759898
> Mar 18 01:42:27 xen-san-gb1 kernel: hw tx cons e421 hw rx cons a8ce
> Mar 18 01:42:27 xen-san-gb1 kernel: sw tx cons e334 e334 prod e421
> Mar 18 01:42:27 xen-san-gb1 kernel: sw rx cons a7ce prod a8ce
> Mar 18 01:42:27 xen-san-gb1 kernel: Current jiffies 1008fbc72 HZ fa tx 1008fb744 poll 100759898
> Mar 18 01:42:27 xen-san-gb1 kernel: tx stop jiffies 1008fb744 tx start jiffies 100239dfd
> Mar 18 01:42:27 xen-san-gb1 kernel: irq_event ab2e13 napi_event ab2e14
> Mar 18 01:42:27 xen-san-gb1 kernel: bnx2: eth1 NIC Copper Link is Down
> Mar 18 01:42:27 xen-san-gb1 kernel: bonding: bond0: link status definitely down for interface eth1, disabling it
> Mar 18 01:42:27 xen-san-gb1 kernel: bonding: bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
>
> Onto more technical details...
>
> The kernel we were running (2.6.18.8 from xenbits) was compiled without
> support for MSI/MSI-X originally. So, we were experiencing these
> problems with plain standard IRQ's. Michael Chan @ Broadcom, the author
> of bnx2 if you modinfo, has told me via email:
>
> * "The logs show that we haven't had an interrupt for a very long
>   time. It's not clear how that interrupt was lost."
> * "So far the logs don't show any inconsistent state in the hardware
>   or software. It is possible that the Xen kernel is missing an interrupt
>   and not delivering to the driver. Normally, in INTA mode, the IRQ is
>   level triggered and should remain asserted until it is seen by the
>   driver and de-asserted by the driver."
>
> But, just in case, I compiled 2.6.18.8 with support for MSI/MSI-X and
> was able to confirm (via dmesg and lspci -vv) that the NIC's began to
> use MSI for interrupts. Unfortunately, the NIC crash happened anyways
> (the above kernel logs are actually from when running with MSI).
>
> Here's what's really bugging me. We have a Dell PowerEdge R610, running
> Xen along with the bnx2 drivers from Broadcom, that's been online for
> ~220 days. Without a failure. The only difference is the system is not
> making use of bonding. It has just one NIC connected to the network
> with no VLAN's trunked down etc.
>
> It looks like I'm not alone out there, as there's a Red Hat bugzilla
> report for this issue:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=520888
>
> ^^ The above has an indication of *Status
> <https://bugzilla.redhat.com/page.cgi?id=fields.html#status>*: CLOSED
> DUPLICATE of bug 511368
> <https://bugzilla.redhat.com/show_bug.cgi?id=511368> , but looks like I
> don't have access to view 511368. Grrr.
>
> Anyways...
>
> 1) Has anybody else experienced this issue?
> 2) Any developers care to comment on possible causes of this problem?
> 3) Anybody know of a solution?
> 4) What can I do to troubleshoot further, and get developers necessary
>    information?
>
> Lastly...
>
> 5) Is anybody running Intel NIC's within Dell PowerEdge R610's, using
>    bonding + Xen 3.4.3 + 2.6.18.8, and can safely report success? I may
>    switch to Intel...
>
> Thanks!
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
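As a rough illustration of Michael Chan's remark that the driver had not seen an interrupt "for a very long time", the jiffies counters in the eth0 dump quoted above can be converted into seconds. The following is only a sketch: it assumes the counters are printed in hexadecimal, that `HZ fa` means 250 ticks per second, and that the `poll` and `tx` fields record the last poll and the last transmit. The thread does not confirm any of these readings, and the helper name `jiffies_delta_seconds` is made up for this example.

```python
def jiffies_delta_seconds(later_hex: str, earlier_hex: str, hz_hex: str) -> float:
    """Return the gap between two hex jiffies counters, in seconds.

    Assumption: all three values are hexadecimal strings as printed by the
    debug build of bnx2 quoted in this thread.
    """
    hz = int(hz_hex, 16)
    return (int(later_hex, 16) - int(earlier_hex, 16)) / hz


# Values copied from the eth0 dump line:
# "Current jiffies 1008f4741 HZ fa tx 1008f41e2 poll 100759890"
current, last_tx, last_poll, hz = "1008f4741", "1008f41e2", "100759890", "fa"

since_poll = jiffies_delta_seconds(current, last_poll, hz)
since_tx = jiffies_delta_seconds(current, last_tx, hz)

print(f"since last poll: {since_poll:.1f} s")
print(f"since last tx:   {since_tx:.1f} s")
```

Under those assumptions the eth0 numbers work out to roughly 6,732 seconds (a little under two hours) since the last poll and 5.5 seconds since the last transmit, which would fit a long-missing interrupt followed by a transmit timeout.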
Joshua West
2011-Mar-18 19:24 UTC
Re: [Xen-devel] Broadcom BCM5709 (bnx2) on Dell PowerEdge R610, Issues
Hi Guru,

Awesome, thanks for the tip. I'll test out disabling C-states in the
BIOS, as I don't believe Xen 3.4.x lets you set max_cstate as an
argument to xen.gz in grub.conf.

The patch in the changeset you mention applies to the Xen 3.4.3 code.
Do you have any experience with that patch functioning/helping/working
with Xen 3.4.x? And if so, do you think it will end up as part of Xen
3.4.4 (if that ever gets tagged/released)?

Assuming disabling C-states in the BIOS alleviates my problem, I'll
probably give that patch a whirl with C-states enabled and see if the
issue comes back. Just wondering if anybody else has used that patch
with Xen 3.4.3 and found success.

Thanks.

On 03/18/11 11:53, Guru Anbalagane wrote:
> This is likely related to Xen losing interrupts while certain CPUs go
> into the C6 state.
> The patch below addresses an issue around this:
> http://xenbits.xen.org/hg/xen-unstable.hg/rev/1087f9a03ab6
> An easy workaround would be to turn off C-states in the BIOS or limit
> the C-state in Xen.
>
> Hope this helps.
> Thanks,
> Guru

-- 
Joshua West
Senior Systems Engineer
Brandeis University
http://www.brandeis.edu
Joshua West
2011-Mar-18 19:45 UTC
Re: [Xen-devel] Broadcom BCM5709 (bnx2) on Dell PowerEdge R610, Issues
Hey Guru,

So I'm doing some investigation into Xen and C-states, and using the
command `xenpm get-cpuidle-states` I see information like:

    Max C-state: C7

    cpu id               : 0
    total C-states       : 3
    idle time(ms)        : 86720627
    C0                   : transition [00000000000056227349]
                           residency  [00000000000002247572 ms]
    C1                   : transition [00000000000016212715]
                           residency  [00000000000003407751 ms]
    C2                   : transition [00000000000040014634]
                           residency  [00000000000082426936 ms]

(along with repeated info for the rest of the CPU's...)

Notice how there's nothing there for C3 - C6... and it states "total
C-states : 3", not 7. Does this mean Xen 3.4.3 doesn't support those
C-states? Or does my system not have them available?

Thanks.

On 03/18/11 11:53, Guru Anbalagane wrote:
> This is likely related to Xen losing interrupts while certain CPUs go
> into the C6 state.
> The patch below addresses an issue around this:
> http://xenbits.xen.org/hg/xen-unstable.hg/rev/1087f9a03ab6
> An easy workaround would be to turn off C-states in the BIOS or limit
> the C-state in Xen.
>
> Hope this helps.
> Thanks,
> Guru
01:42:27 xen-san-gb1 kernel: Current jiffies 1008fbc71 HZ fa tx >> 1008fb744 poll 100759898 >> Mar 18 01:42:27 xen-san-gb1 kernel: tx stop jiffies 1008fb744 tx start >> jiffies 100239dfd >> Mar 18 01:42:27 xen-san-gb1 kernel: irq_event ab2e13 napi_event ab2e14 >> Mar 18 01:42:27 xen-san-gb1 kernel: Ring state for ring 0 napi state 12 >> Mar 18 01:42:27 xen-san-gb1 kernel: netdev state 77 >> Mar 18 01:42:27 xen-san-gb1 kernel: hw status idx 2bb0 last status idx >> 29c4 irq jiffies 100759898 >> Mar 18 01:42:27 xen-san-gb1 kernel: hw tx cons e421 hw rx cons a8ce >> Mar 18 01:42:27 xen-san-gb1 kernel: sw tx cons e334 e334 prod e421 >> Mar 18 01:42:27 xen-san-gb1 kernel: sw rx cons a7ce prod a8ce >> Mar 18 01:42:27 xen-san-gb1 kernel: Current jiffies 1008fbc72 HZ fa tx >> 1008fb744 poll 100759898 >> Mar 18 01:42:27 xen-san-gb1 kernel: tx stop jiffies 1008fb744 tx start >> jiffies 100239dfd >> Mar 18 01:42:27 xen-san-gb1 kernel: irq_event ab2e13 napi_event ab2e14 >> Mar 18 01:42:27 xen-san-gb1 kernel: bnx2: eth1 NIC Copper Link is Down >> Mar 18 01:42:27 xen-san-gb1 kernel: bonding: bond0: link status >> definitely down for interface eth1, disabling it >> Mar 18 01:42:27 xen-san-gb1 kernel: bonding: bond0: Warning: No 802.3ad >> response from the link partner for any adapters in the bond >> >> Onto more technical details... >> >> The kernel we were running (2.6.18.8 from xenbits) was compiled without >> support for MSI/MSI-X originally. So, we were experiencing these >> problems with plain standard IRQ''s. Michael Chan @ Broadcom, the author >> of bnx2 if you modinfo, has told me via email: >> >> * "The logs show that we haven''t had an interrupt for a very long >> time. It''s not clear how that interrupt was lost." >> * "So far the logs don''t show any inconsistent state in the hardware >> or software. It is possible that the Xen kernel is missing an interrupt >> and not delivering to the driver. 
Normally, in INTA mode, the IRQ is >> level triggered and should remain asserted until it is seen by the >> driver and de-asserted by the driver." >> >> But, just in case, I compiled 2.6.18.8 with support for MSI/MSI-X and >> was able to confirm (via dmesg and lspci -vv) that the NIC''s began to >> use MSI for interrupts. Unfortunately, the NIC crash happened anyways >> (the above kernel logs is actually from when running with MSI). >> >> Here''s whats really bugging me. We have a Dell PowerEdge R610, running >> Xen along with the bnx2 drivers from Broadcom, thats been online for >> ~220 days. Without a failure. The only difference is the system is not >> making use of bonding. It has just one NIC connected to the network >> with no VLAN''s trunked down etc. >> >> It looks like I''m not alone out there, as there''s a Red Hat bugzilla >> report for this issue: >> >> https://bugzilla.redhat.com/show_bug.cgi?id=520888 >> >> ^^ The above has an indication of *Status >> <https://bugzilla.redhat.com/page.cgi?id=fields.html#status>*: CLOSED >> DUPLICATE of bug 511368 >> <https://bugzilla.redhat.com/show_bug.cgi?id=511368> , but looks like I >> don''t have access to view 511368. Grrr. >> >> Anyways... >> >> 1) Has anybody else experienced this issue? >> 2) Any developers care to comment on possible causes of this problem? >> 3) Anybody know of a solution? >> 4) What can I do to troubleshoot further, and get developers necessary >> information? >> >> Lastly... >> >> 5) Is anybody running Intel NIC''s within Dell PowerEdge R610''s, using >> bonding + Xen 3.4.3 + 2.6.18.8, and can safely report success? I may >> switch to Intel... >> >> Thanks! 
--
Joshua West
Senior Systems Engineer
Brandeis University
http://www.brandeis.edu
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
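For readers following the quoted dumps, the hexadecimal counters can be turned into elapsed times and ring backlogs. The sketch below is only an illustration under stated assumptions: it treats "irq jiffies" as the jiffies value at the last serviced interrupt and "tx" as the time of the last transmit, and it treats the ring indices as 16-bit wrapping counters. None of that is documented in the thread, since the fields come from a debug build of bnx2. The helper names (`seconds_between`, `ring_lag`, `summarize`) are made up for this example.

```python
# Values copied from the eth0/eth1 debug dumps in the quoted logs (all hex).
# ASSUMPTIONS (not confirmed in the thread): "irq" is the jiffies value at
# the last serviced interrupt, "tx" is the time of the last transmit, and
# the ring consumer indices are 16-bit counters that wrap around.
HZ = 0xFA  # "HZ fa" in the log, i.e. 250 jiffies per second

DUMPS = {
    "eth0": {
        "now": 0x1008F4741, "irq": 0x100759890, "tx": 0x1008F41E2,
        "hw_tx_cons": 0xA669, "sw_tx_cons": 0xA57C,
        "hw_rx_cons": 0x103C, "sw_rx_cons": 0x0F3C,
    },
    "eth1": {
        "now": 0x1008FBC71, "irq": 0x100759898, "tx": 0x1008FB744,
        "hw_tx_cons": 0xE421, "sw_tx_cons": 0xE334,
        "hw_rx_cons": 0xA8CE, "sw_rx_cons": 0xA7CE,
    },
}


def seconds_between(later, earlier, hz=HZ):
    """Convert a difference of two jiffies values into seconds."""
    return (later - earlier) / hz


def ring_lag(hw_index, sw_index, width=16):
    """Entries the hardware index is ahead of the software index,
    computed modulo 2**width so that wrap-around is handled."""
    return (hw_index - sw_index) & ((1 << width) - 1)


def summarize(name):
    """Derive elapsed times and ring backlogs for one interface."""
    d = DUMPS[name]
    return {
        "since_irq_s": seconds_between(d["now"], d["irq"]),
        "since_tx_s": seconds_between(d["now"], d["tx"]),
        "tx_lag": ring_lag(d["hw_tx_cons"], d["sw_tx_cons"]),
        "rx_lag": ring_lag(d["hw_rx_cons"], d["sw_rx_cons"]),
    }


if __name__ == "__main__":
    for nic in DUMPS:
        print(nic, summarize(nic))
```

Under those assumptions, both interfaces show well over an hour and a half between the recorded "irq jiffies" and the dump, roughly five seconds since the last transmit (which is at least consistent with the transmit watchdog message), and hardware consumer indices that are ahead of the software ones. That pattern would fit the "lost interrupt" reading quoted from Michael Chan, but it does not prove it.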
Dan Magenheimer
2011-Mar-18 19:46 UTC
RE: [Xen-devel] Broadcom BCM5709 (bnx2) on Dell PowerEdge R610, Issues
cc'ing Guru...

> -----Original Message-----
> From: Joshua West [mailto:jwest@brandeis.edu]
> Sent: Friday, March 18, 2011 1:25 PM
> To: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Broadcom BCM5709 (bnx2) on Dell PowerEdge
> R610, Issues
>
> Hi Guru,
>
> Awesome, thanks for the tip.
>
> I'll test out disabling cstates in the BIOS as I don't believe Xen 3.4.x
> lets you set max_cstate as an argument to xen.gz in grub.conf.
>
> The patch in the changeset you mention applies to Xen 3.4.3 code. Do
> you have any experience with that patch functioning/helping/working with
> Xen 3.4.x? And if so, do you think it will end up as part of Xen 3.4.4
> (if that ever gets tagged/released)? Assuming disabling cstates in the
> BIOS alleviates my problem, I'll probably give that patch a whirl with
> cstates enabled and see if the issue comes back. Just wondering if
> anybody else has used that patch with Xen 3.4.3 and found success.
>
> Thanks.
>
> On 03/18/11 11:53, Guru Anbalagane wrote:
> > This is likely related to xen losing interrupts while certain cpus go
> > to c6 state.
> > The below patch addresses an issue around this.
> > http://xenbits.xen.org/hg/xen-unstable.hg/rev/1087f9a03ab6
> > Easy workaround would be to turn off cstates in BIOS or limit cstate
> > in xen.
> >
> > Hope this helps.
> > Thanks
> > Guru
Guru Anbalagane
2011-Mar-18 19:57 UTC
Re: [Xen-devel] Broadcom BCM5709 (bnx2) on Dell PowerEdge R610, Issues
On 03/18/2011 12:46 PM, Dan Magenheimer wrote:
> cc'ing Guru...
>
>> -----Original Message-----
>> From: Joshua West [mailto:jwest@brandeis.edu]
>> Sent: Friday, March 18, 2011 1:25 PM
>> To: xen-devel@lists.xensource.com
>> Subject: Re: [Xen-devel] Broadcom BCM5709 (bnx2) on Dell PowerEdge
>> R610, Issues
>>
>> Hi Guru,
>>
>> Awesome, thanks for the tip.
>>
>> I'll test out disabling cstates in the BIOS as I don't believe Xen 3.4.x
>> lets you set max_cstate as an argument to xen.gz in grub.conf.
>>

I believe max_cstate=1 works on the xen command line.
Another way to limit the cstate is:

xenpm set-max-cstate 1

>> The patch in the changeset you mention applies to Xen 3.4.3 code. Do
>> you have any experience with that patch functioning/helping/working with
>> Xen 3.4.x? And if so, do you think it will end up as part of Xen 3.4.4
>> (if that ever gets tagged/released)? Assuming disabling cstates in the
>> BIOS alleviates my problem, I'll probably give that patch a whirl with
>> cstates enabled and see if the issue comes back. Just wondering if
>> anybody else has used that patch with Xen 3.4.3 and found success.
>>

For Oracle VM, the patch applies, but we need to pull in a few more
patches to make it work, and it's not verified yet. So for now, we
recommend disabling cstates or limiting them to 1 or 3.

Thanks
Guru

>> Thanks.
>>
>> On 03/18/11 11:53, Guru Anbalagane wrote:
>>
>>> This is likely related to xen losing interrupts while certain cpus go
>>> to c6 state.
>>> The below patch addresses an issue around this.
>>> http://xenbits.xen.org/hg/xen-unstable.hg/rev/1087f9a03ab6
>>> Easy workaround would be to turn off cstates in BIOS or limit cstate
>>> in xen.
>>>
>>> Hope this helps.
>>> Thanks >>> Guru >>> >>>> Message: 5 >>>> Date: Fri, 18 Mar 2011 11:39:07 -0400 >>>> From: Joshua West<jwest@brandeis.edu> >>>> Subject: [Xen-devel] Broadcom BCM5709 (bnx2) on Dell PowerEdge R610 >>>> Issues >>>> To: xen-devel@lists.xensource.com >>>> Message-ID:<4D837C9B.6030107@brandeis.edu> >>>> Content-Type: text/plain; charset="iso-8859-1" >>>> >>>> Hey folks, >>>> >>>> Unfortunately, ever since we went live with Xen on Dell PowerEdge >>>> R610''s, we''ve been having some odd and aggravating issues. The >>>> >> NIC''s >> >>>> tend to drop out when under heavy traffic after 1-7 days of uptime >>>> (random, difficult to reproduce). But before I get into the issue''s >>>> specifics, here''s some information about our setup: >>>> >>>> * Dell PowerEdge R610''s w/ 4 Onboard Broadcom BCM5709 1-GbE >>>> >> NIC''s. >> >>>> * RHEL 5.6. >>>> * Xen 3.4.3 (from xen.org; our own compile) >>>> * Kernel 2.6.18.18 >>>> (http://xenbits.xensource.com/linux-2.6.18-xen.hg) >>>> checkout 1073. >>>> * bnx2 driver 2.0.18c from Broadcom''s netxtreme2-6.0.53 package. >>>> * bnx2 that ships with 2.6.18.8 doesn''t support BCM5709''s. >>>> * Had to use driver package from broadcom.com in order to get >>>> networking. >>>> * NIC bonding in pairs (eth0 + eth1, etc), with options "mode=4 >>>> lacp_rate=fast miimon=100 use_carrier=1". >>>> >>>> What occurs is suddenly one of the NIC''s in the bond stops >>>> >> responding. >> >>>> Gets stuck on transmitting from what I understand. 
Kernel logs show >>>> >> the >> >>>> following, which includes extra debug information as the developers >>>> >> from >> >>>> Broadcom (Michael Chan and Benjamin Li) were assisting in >>>> troubleshooting and gave me a version of bnx2 2.0.18c to run, that >>>> prints out extra debug information upon NIC crash: >>>> >>>> Mar 18 01:40:26 xen-san-gb1 kernel: NETDEV WATCHDOG: eth0: transmit >>>> timed out >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2:<--- start FTQ dump on eth0 >>>> ---> >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_RV2P_PFTQ_CTL >>>> >> 10000 >> >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_RV2P_TFTQ_CTL >>>> >> 20000 >> >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_RV2P_MFTQ_CTL >>>> >> 4000 >> >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_TBDR_FTQ_CTL >>>> >> 4002 >> >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_TDMA_FTQ_CTL >>>> >> 10002 >> >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_TXP_FTQ_CTL >>>> >> 10002 >> >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_TPAT_FTQ_CTL >>>> >> 10000 >> >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_RXP_CFTQ_CTL >>>> >> 8000 >> >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_RXP_FTQ_CTL >>>> >> 100000 >> >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: >>>> >> BNX2_COM_COMXQ_FTQ_CTL >> >>>> 10000 >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: >>>> >> BNX2_COM_COMTQ_FTQ_CTL >> >>>> 20000 >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: >>>> >> BNX2_COM_COMQ_FTQ_CTL >> >>>> 10000 >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: BNX2_CP_CPQ_FTQ_CTL >>>> >> 4000 >> >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: TXP mode b84c state >>>> 80001000 evt_mask 500 pc 8001284 pc 8001284 instr 1440fffc >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: TPAT mode b84c state >>>> 80001000 evt_mask 500 pc 8000a50 pc 8000a4c instr 38420001 >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: RXP mode 
b84c state >>>> 80001000 evt_mask 500 pc 8004ad0 pc 8004adc instr 14e0005d >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: COM mode b8cc state >>>> 80008000 evt_mask 500 pc 8000a98 pc 8000a8c instr 8821 >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0: CP mode b8cc state >>>> 80000000 evt_mask 500 pc 8000c7c pc 8000928 instr 8ce800e8 >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2:<--- end FTQ dump on eth0 - >>>> >> --> >> >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0 DEBUG: intr_sem[0] >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0 DEBUG: intr_sem[0] >>>> PCI_CMD[00100406] >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0 DEBUG: >>>> >> PCI_PM[19002008] >> >>>> PCI_MISC_CFG[92000088] >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0 DEBUG: >>>> EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000] >>>> Mar 18 01:40:26 xen-san-gb1 kernel: bnx2: eth0 >>>> RPM_MGMT_PKT_CTRL[40000088] >>>> Mar 18 01:40:27 xen-san-gb1 kernel: bnx2: eth0 DEBUG: >>>> MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e] >>>> Mar 18 01:40:27 xen-san-gb1 kernel: bnx2: eth0 DEBUG: >>>> HC_STATS_INTERRUPT_STATUS[01fe0001] >>>> Mar 18 01:40:27 xen-san-gb1 kernel: Ring state for ring 0 napi state >>>> >> 12 >> >>>> Mar 18 01:40:27 xen-san-gb1 kernel: netdev state 7 >>>> Mar 18 01:40:27 xen-san-gb1 kernel: hw status idx 3267 last status >>>> >> idx >> >>>> 307c irq jiffies 100759890 >>>> Mar 18 01:40:27 xen-san-gb1 kernel: hw tx cons a669 hw rx cons 103c >>>> Mar 18 01:40:27 xen-san-gb1 kernel: sw tx cons a57c a57c prod a669 >>>> Mar 18 01:40:27 xen-san-gb1 kernel: sw rx cons f3c prod 103c >>>> Mar 18 01:40:27 xen-san-gb1 kernel: Current jiffies 1008f4741 HZ fa >>>> >> tx >> >>>> 1008f41e2 poll 100759890 >>>> Mar 18 01:40:27 xen-san-gb1 kernel: tx stop jiffies 1008f41e2 tx >>>> >> start >> >>>> jiffies 0 >>>> Mar 18 01:40:27 xen-san-gb1 kernel: irq_event c68c36 napi_event >>>> >> c68c37 >> >>>> Mar 18 01:40:27 xen-san-gb1 kernel: Ring state for ring 0 napi state >>>> >> 12 >> >>>> Mar 
18 01:40:27 xen-san-gb1 kernel: netdev state 77 >>>> Mar 18 01:40:27 xen-san-gb1 kernel: hw status idx 3267 last status >>>> >> idx >> >>>> 307c irq jiffies 100759890 >>>> Mar 18 01:40:27 xen-san-gb1 kernel: hw tx cons a669 hw rx cons 103c >>>> Mar 18 01:40:27 xen-san-gb1 kernel: sw tx cons a57c a57c prod a669 >>>> Mar 18 01:40:27 xen-san-gb1 kernel: sw rx cons f3c prod 103c >>>> Mar 18 01:40:27 xen-san-gb1 kernel: Current jiffies 1008f4741 HZ fa >>>> >> tx >> >>>> 1008f41e2 poll 100759890 >>>> Mar 18 01:40:27 xen-san-gb1 kernel: tx stop jiffies 1008f41e2 tx >>>> >> start >> >>>> jiffies 0 >>>> Mar 18 01:40:27 xen-san-gb1 kernel: irq_event c68c36 napi_event >>>> >> c68c37 >> >>>> Mar 18 01:40:27 xen-san-gb1 kernel: bnx2: eth0 NIC Copper Link is >>>> >> Down >> >>>> Mar 18 01:40:27 xen-san-gb1 kernel: bonding: bond0: link status >>>> definitely down for interface eth0, disabling it >>>> >>>> This was then followed rather quickly by a failure with the second >>>> >> NIC >> >>>> (eth1) in the bond: >>>> >>>> Mar 18 01:42:26 xen-san-gb1 kernel: NETDEV WATCHDOG: eth1: transmit >>>> timed out >>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2:<--- start FTQ dump on eth1 >>>> ---> >>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_RV2P_PFTQ_CTL >>>> >> 10000 >> >>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_RV2P_TFTQ_CTL >>>> >> 20000 >> >>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_RV2P_MFTQ_CTL >>>> >> 4000 >> >>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_TBDR_FTQ_CTL >>>> >> 4002 >> >>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_TDMA_FTQ_CTL >>>> >> 10000 >> >>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_TXP_FTQ_CTL >>>> >> 10002 >> >>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_TPAT_FTQ_CTL >>>> >> 10000 >> >>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_RXP_CFTQ_CTL >>>> >> 8000 >> >>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_RXP_FTQ_CTL >>>> >> 100000 >> >>>> Mar 18 01:42:26 
xen-san-gb1 kernel: bnx2: eth1: BNX2_COM_COMXQ_FTQ_CTL 10000
>>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_COM_COMTQ_FTQ_CTL 20000
>>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_COM_COMQ_FTQ_CTL 10000
>>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: BNX2_CP_CPQ_FTQ_CTL 4000
>>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: TXP mode b84c state 80005000 evt_mask 500 pc 8001294 pc 8001284 instr 38640001
>>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: TPAT mode b84c state 80001000 evt_mask 500 pc 8000a58 pc 8000a5c instr 8f820014
>>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: RXP mode b84c state 80001000 evt_mask 500 pc 8004ad0 pc 8004adc instr 14e0005d
>>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: COM mode b8cc state 80000000 evt_mask 500 pc 8000a9c pc 8000a94 instr 3c028000
>>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1: CP mode b8cc state 80008000 evt_mask 500 pc 8000c58 pc 8000c6c instr 27bdffe8
>>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2:<--- end FTQ dump on eth1 --->
>>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1 DEBUG: intr_sem[0]
>>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1 DEBUG: intr_sem[0] PCI_CMD[00100406]
>>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1 DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
>>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1 DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
>>>> Mar 18 01:42:26 xen-san-gb1 kernel: bnx2: eth1 RPM_MGMT_PKT_CTRL[40000088]
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: bnx2: eth1 DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e]
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: bnx2: eth1 DEBUG: HC_STATS_INTERRUPT_STATUS[01fe0001]
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: Ring state for ring 0 napi state 12
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: netdev state 7
>>>> Mar 18 01:42:27
xen-san-gb1 kernel: hw status idx 2bb0 last status idx 29c4 irq jiffies 100759898
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: hw tx cons e421 hw rx cons a8ce
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: sw tx cons e334 e334 prod e421
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: sw rx cons a7ce prod a8ce
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: Current jiffies 1008fbc71 HZ fa tx 1008fb744 poll 100759898
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: tx stop jiffies 1008fb744 tx start jiffies 100239dfd
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: irq_event ab2e13 napi_event ab2e14
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: Ring state for ring 0 napi state 12
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: netdev state 77
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: hw status idx 2bb0 last status idx 29c4 irq jiffies 100759898
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: hw tx cons e421 hw rx cons a8ce
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: sw tx cons e334 e334 prod e421
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: sw rx cons a7ce prod a8ce
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: Current jiffies 1008fbc72 HZ fa tx 1008fb744 poll 100759898
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: tx stop jiffies 1008fb744 tx start jiffies 100239dfd
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: irq_event ab2e13 napi_event ab2e14
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: bnx2: eth1 NIC Copper Link is Down
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: bonding: bond0: link status definitely down for interface eth1, disabling it
>>>> Mar 18 01:42:27 xen-san-gb1 kernel: bonding: bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
>>>>
>>>> Onto more technical details...
>>>>
>>>> The kernel we were running (2.6.18.8 from xenbits) was compiled without support for MSI/MSI-X originally.
>>>> So, we were experiencing these problems with plain standard IRQs. Michael Chan @ Broadcom, the author of bnx2 if you modinfo, has told me via email:
>>>>
>>>> * "The logs show that we haven't had an interrupt for a very long time. It's not clear how that interrupt was lost."
>>>> * "So far the logs don't show any inconsistent state in the hardware or software. It is possible that the Xen kernel is missing an interrupt and not delivering to the driver. Normally, in INTA mode, the IRQ is level triggered and should remain asserted until it is seen by the driver and de-asserted by the driver."
>>>>
>>>> But, just in case, I compiled 2.6.18.8 with support for MSI/MSI-X and was able to confirm (via dmesg and lspci -vv) that the NICs began to use MSI for interrupts. Unfortunately, the NIC crash happened anyways (the above kernel logs are actually from when running with MSI).
>>>>
>>>> Here's what's really bugging me. We have a Dell PowerEdge R610, running Xen along with the bnx2 drivers from Broadcom, that's been online for ~220 days without a failure. The only difference is the system is not making use of bonding. It has just one NIC connected to the network with no VLANs trunked down etc.
>>>>
>>>> It looks like I'm not alone out there, as there's a Red Hat bugzilla report for this issue:
>>>>
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=520888
>>>>
>>>> ^^ The above has an indication of *Status <https://bugzilla.redhat.com/page.cgi?id=fields.html#status>*: CLOSED DUPLICATE of bug 511368 <https://bugzilla.redhat.com/show_bug.cgi?id=511368>, but looks like I don't have access to view 511368. Grrr.
>>>>
>>>> Anyways...
>>>>
>>>> 1) Has anybody else experienced this issue?
>>>> 2) Any developers care to comment on possible causes of this problem?
>>>> 3) Anybody know of a solution?
>>>> 4) What can I do to troubleshoot further, and get developers the necessary information?
>>>>
>>>> Lastly...
>>>>
>>>> 5) Is anybody running Intel NICs within Dell PowerEdge R610s, using bonding + Xen 3.4.3 + 2.6.18.8, and can safely report success? I may switch to Intel...
>>>>
>>>> Thanks!
>>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xensource.com
>>> http://lists.xensource.com/xen-devel
>>
>> --
>> Joshua West
>> Senior Systems Engineer
>> Brandeis University
>> http://www.brandeis.edu
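
For anyone reading the quoted debug dumps, here is a rough sketch of how the jiffies fields can be turned into wall-clock intervals. It rests on assumptions that the thread does not confirm: that every field is hexadecimal (suggested by "HZ fa", i.e. 0xfa = 250), that "irq jiffies" is the timestamp of the last interrupt the driver saw, and that "tx" is the time of the last transmit. The names DUMPS, elapsed_seconds and summarize are made up for this illustration and are not part of bnx2 or Xen.

```python
# Values copied from the quoted "Current jiffies ..." and "irq jiffies ..."
# log lines. Reading them as hexadecimal is an assumption.
DUMPS = {
    "eth0": {"current": "1008f4741", "irq": "100759890", "tx": "1008f41e2", "hz": "fa"},
    "eth1": {"current": "1008fbc71", "irq": "100759898", "tx": "1008fb744", "hz": "fa"},
}


def elapsed_seconds(current_hex, earlier_hex, hz_hex):
    """Return the seconds between two jiffies counters given as hex strings."""
    ticks = int(current_hex, 16) - int(earlier_hex, 16)
    return ticks / int(hz_hex, 16)


def summarize(dumps):
    """Map each interface to (seconds since last IRQ, seconds since last tx)."""
    return {
        name: (
            elapsed_seconds(d["current"], d["irq"], d["hz"]),
            elapsed_seconds(d["current"], d["tx"], d["hz"]),
        )
        for name, d in dumps.items()
    }


if __name__ == "__main__":
    for name, (since_irq, since_tx) in summarize(DUMPS).items():
        print(f"{name}: {since_irq:.1f} s since irq jiffies, {since_tx:.1f} s since tx")
```

Under those assumptions, the eth0 dump works out to about 6732 seconds (roughly 112 minutes) between "irq jiffies" and the dump, against about 5.5 seconds since the last tx. If the field meanings are as guessed, that would be in line with Michael Chan's remark that no interrupt had arrived for a very long time.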