Hello,

my server has massive problems with my NIC.
I got: "Detected Tx Unit Hang".

At the moment I use 2.6.31 from Jeremy, does anyone know if it's fixed
in 2.6.32 or newer tree?

Regards,
Stefan Kuhne

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-May-20 22:18 UTC
Re: [Xen-devel] [pv_ops] e1000e: "Detected Tx Unit Hang"
On 05/20/2010 02:45 PM, Stefan Kuhne wrote:
> Hello,
>
> my server has massive problems with my NIC.
> I got: "Detected Tx Unit Hang".
>
> At the moment I use 2.6.31 from Jeremy, does anyone know if it's fixed
> in 2.6.32 or newer tree?

e1000e works fine for me. However, I did have problems with my Ibex Peak-based system and the integrated ethernet devices; they would drop off the PCIe bus (lspci -vx would show all 0xff for the config space), which turned out to be some problem with ALPM (PCIe active link power management). Could this be what you're seeing?

    J
Stefan Kuhne
2010-May-20 22:58 UTC
Re: [Xen-devel] [pv_ops] e1000e: "Detected Tx Unit Hang"
On 21.05.2010 00:18, Jeremy Fitzhardinge wrote:

Hello Jeremy,

> e1000e works fine for me. However, I did have problems with my Ibex
> Peak-based system and the integrated ethernet devices; they would drop
> off the PCIe bus (lspci -vx would show all 0xff for the config space),
> which turned out to be some problem with ALPM (PCIe active link power
> management). Could this be what you're seeing?

my "lspci -vx" output:

02:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper)
        Subsystem: FIRST INTERNATIONAL Computer Inc Unknown device 4720
        Flags: bus master, fast devsel, latency 0, IRQ 409
        Memory at d0000000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at 2000 [size=32]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
        Capabilities: [e0] Express Endpoint IRQ 0
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number c6-a9-09-ff-ff-0b-14-00
00: 86 80 8c 10 07 05 10 00 00 00 00 02 10 00 00 00
10: 00 00 00 d0 00 00 00 00 01 20 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 09 15 20 47
30: 00 00 00 00 c8 00 00 00 00 00 00 00 0b 01 00 00

and the complete dmesg output:

[ 9620.997466] 0000:02:00.0: peth0: Detected Tx Unit Hang:
[ 9620.997469]   TDH <fc>
[ 9620.997471]   TDT <1f>
[ 9620.997473]   next_to_use <1f>
[ 9620.997475]   next_to_clean <fc>
[ 9620.997477] buffer_info[next_to_clean]:
[ 9620.997479]   time_stamp <8e2ec3>
[ 9620.997481]   next_to_watch <fc>
[ 9620.997483]   jiffies <8e3a25>
[ 9620.997485]   next_to_watch.status <0>
[ 9622.997490] 0000:02:00.0: peth0: Detected Tx Unit Hang:
[ 9622.997496]   TDH <fc>
[ 9622.997500]   TDT <1f>
[ 9622.997503]   next_to_use <1f>
[ 9622.997507]   next_to_clean <fc>
[ 9622.997511] buffer_info[next_to_clean]:
[ 9622.997515]   time_stamp <8e2ec3>
[ 9622.997519]   next_to_watch <fc>
[ 9622.997522]   jiffies <8e41f5>
[ 9622.997526]   next_to_watch.status <0>
[ 9624.997536] 0000:02:00.0: peth0: Detected Tx Unit Hang:
[ 9624.997541]   TDH <fc>
[ 9624.997545]   TDT <1f>
[ 9624.997549]   next_to_use <1f>
[ 9624.997553]   next_to_clean <fc>
[ 9624.997557] buffer_info[next_to_clean]:
[ 9624.997561]   time_stamp <8e2ec3>
[ 9624.997565]   next_to_watch <fc>
[ 9624.997568]   jiffies <8e49c5>
[ 9624.997572]   next_to_watch.status <0>
[ 9626.065848] eth0: port 1(peth0) entering disabled state
[ 9629.910292] e1000e: peth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 9629.910854] eth0: port 1(peth0) entering forwarding state

Regards,
Stefan Kuhne
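A note on reading the dump: TDH and TDT are the hardware head and the driver tail of the transmit descriptor ring, so the gap between them is the number of descriptors handed to the NIC but not yet completed. The Python sketch below does that arithmetic on the first report. The ring size of 256 is an assumption (the e1000e default; `ethtool -g peth0` would show the real value), and HZ=1000 is inferred from the log itself, where jiffies advance by 0x7d0 (2000) between reports logged 2 seconds apart. The helper names are made up for illustration.

```python
import re

# Fields copied from the first "Detected Tx Unit Hang" report (hex values).
DUMP = """
TDH <fc>
TDT <1f>
next_to_use <1f>
next_to_clean <fc>
time_stamp <8e2ec3>
next_to_watch <fc>
jiffies <8e3a25>
"""

RING_SIZE = 256  # assumption: e1000e default Tx ring size
HZ = 1000        # inferred: 2000 jiffies elapse between reports 2 s apart


def parse_dump(text):
    """Return {field: int} for every 'name <hex>' pair in the dump."""
    return {name: int(value, 16)
            for name, value in re.findall(r"(\w+)\s+<([0-9a-f]+)>", text)}


def summarize(fields, ring_size=RING_SIZE, hz=HZ):
    """Return (descriptors queued but not completed, age of oldest buffer in s)."""
    pending = (fields["TDT"] - fields["TDH"]) % ring_size
    age_s = (fields["jiffies"] - fields["time_stamp"]) / hz
    return pending, age_s


fields = parse_dump(DUMP)
pending, age_s = summarize(fields)
print(f"descriptors queued but not completed: {pending}")
print(f"oldest pending buffer age: {age_s:.3f} s")
```

Under those assumptions the first report works out to 35 queued descriptors with the oldest buffer about 2.9 s old. TDH and time_stamp are identical in all three reports, so the head is not advancing at all between checks.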
Jeremy Fitzhardinge
2010-May-20 23:01 UTC
Re: [Xen-devel] [pv_ops] e1000e: "Detected Tx Unit Hang"
On 05/20/2010 03:58 PM, Stefan Kuhne wrote:
> my "lspci -vx" output:
> [...]
> and the complete dmesg output:
> [...]

OK, definitely a different problem. Does it happen immediately, or after a while? Under load?

Can you provide the full boot output, and cat /proc/interrupts?

Thanks,
    J
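For context on why this rules out the bus-drop failure: a device that has fallen off the PCIe bus reads back 0xff for every config-space byte, whereas the dump Stefan posted still shows a valid header. A small Python sketch of that check, using the hex lines from his "lspci -vx" output (the function names are invented for this example):

```python
# Config-space bytes copied from the "lspci -vx" output posted earlier in the thread.
HEXDUMP = """
00: 86 80 8c 10 07 05 10 00 00 00 00 02 10 00 00 00
10: 00 00 00 d0 00 00 00 00 01 20 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 09 15 20 47
30: 00 00 00 00 c8 00 00 00 00 00 00 00 0b 01 00 00
"""


def config_bytes(hexdump):
    """Turn 'offset: b0 b1 ...' lines into a bytes object."""
    out = bytearray()
    for line in hexdump.strip().splitlines():
        _, _, data = line.partition(":")
        out.extend(int(b, 16) for b in data.split())
    return bytes(out)


def dropped_off_bus(cfg):
    """True if every byte reads 0xff, the symptom Jeremy described."""
    return len(cfg) > 0 and all(b == 0xFF for b in cfg)


cfg = config_bytes(HEXDUMP)
vendor = int.from_bytes(cfg[0:2], "little")  # PCI vendor ID, little-endian
device = int.from_bytes(cfg[2:4], "little")  # PCI device ID, little-endian
print(f"vendor={vendor:04x} device={device:04x} dropped={dropped_off_bus(cfg)}")
```

The first four bytes decode to vendor 8086 (Intel) and device 108c, so the NIC is still answering config reads.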
Heiko Wundram
2010-May-20 23:21 UTC
Re: [Xen-devel] [pv_ops] e1000e: "Detected Tx Unit Hang"
I'm pretty sure the problem you're seeing is caused by broken firmware in the specific chipset used on this Intel network card, not by the Xen/pv_ops kernel. I've had the same problem under high load on the "semi-old" Supermicro boxes I administer.

There's an Intel utility that patches the firmware issue (i.e., in the network controller EEPROM), but it no longer seems to be available online: the last time I looked, I couldn't find it on the Intel site, where it was prominently featured when I first went looking for it. I'll try to retrieve it from the last machine I applied the patch to, but I'll only be able to do that some time during the (European) day tomorrow.

--- Heiko.

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Jeremy Fitzhardinge
Sent: Friday, 21 May 2010 01:01
To: xen-devel@lists.xensource.com
Cc: Stefan Kuhne
Subject: Re: [Xen-devel] [pv_ops] e1000e: "Detected Tx Unit Hang"

OK, definitely different problem. Does it happen immediately, or after a while? Under load? Can you provide the full boot output, and cat /proc/interrupts?

Thanks,
    J
Stefan Kuhne
2010-May-20 23:22 UTC
Re: [Xen-devel] [pv_ops] e1000e: "Detected Tx Unit Hang"
On 21.05.2010 01:01, Jeremy Fitzhardinge wrote:

Hello Jeremy,

> OK, definitely different problem. Does it happen immediately, or after
> a while? Under load? Can you provide the full boot output, and cat
> /proc/interrupts?

It happens when copying from a domU to a physical PC.

Do you mean the boot output from dom0 (dmesg)?

root@Overmind:~# cat /proc/interrupts
           CPU0       CPU1
  1:          8          0   xen-pirq-ioapic-edge   i8042
  8:          0          0   xen-pirq-ioapic-edge   rtc0
  9:     926014          0   xen-pirq-ioapic-level  acpi
 12:         22          0   xen-pirq-ioapic-edge   i8042
 14:        129          0   xen-pirq-ioapic-edge   ide0
 16:        887          0   xen-pirq-ioapic-level  uhci_hcd:usb5, firewire_ohci
 17:          0          0   xen-pirq-ioapic-level  mmc0
 18:          0          0   xen-pirq-ioapic-level  uhci_hcd:usb4
 19:          0          0   xen-pirq-ioapic-level  uhci_hcd:usb3
 22:         72          0   xen-pirq-ioapic-level  HDA Intel
 23:         31          0   xen-pirq-ioapic-level  ehci_hcd:usb1, uhci_hcd:usb2
381:   15770007          0   xen-dyn-event          vif6.0
382:      85431          0   xen-dyn-event          blkif-backend
383:        386          0   xen-dyn-event          evtchn:xenconsoled
384:        139          0   xen-dyn-event          evtchn:xenstored
385:      42592          0   xen-dyn-event          vif5.0
386:      65335          0   xen-dyn-event          blkif-backend
387:        315          0   xen-dyn-event          evtchn:xenconsoled
388:        139          0   xen-dyn-event          evtchn:xenstored
389:         43          0   xen-dyn-event          vif4.0
390:       1306          0   xen-dyn-event          blkif-backend
391:        123          0   xen-dyn-event          evtchn:xenconsoled
392:        135          0   xen-dyn-event          evtchn:xenstored
393:       6588          0   xen-dyn-event          vif3.0
394:       6723          0   xen-dyn-event          blkif-backend
395:        319          0   xen-dyn-event          evtchn:xenconsoled
396:        265          0   xen-dyn-event          evtchn:xenstored
397:     108544          0   xen-dyn-event          vif2.0
398:        315          0   xen-dyn-event          blkif-backend
399:         87          0   xen-dyn-event          evtchn:xenconsoled
400:        128          0   xen-dyn-event          evtchn:xenstored
401:   13477877          0   xen-dyn-event          vif1.0
402:     866835          0   xen-dyn-event          blkif-backend
403:      28802          0   xen-dyn-event          blkif-backend
404:        300          0   xen-dyn-event          evtchn:xenconsoled
405:        220          0   xen-dyn-event          evtchn:xenstored
406:          0          0   xen-dyn-event          evtchn:xenstored
407:       2460          0   xen-dyn-event          evtchn:xenstored
408:    2953808          0   xen-pirq-msi           ahci
409:    8689919          0   xen-pirq-msi           peth0
412:          0          0   xen-dyn-virq           pcpu
413:       4550          0   xen-dyn-event          xenbus
414:          0        403   xen-dyn-ipi            callfuncsingle1
415:          0          0   xen-dyn-virq           debug1
416:          0          0   xen-dyn-ipi            callfunc1
417:          0     104331   xen-dyn-ipi            resched1
418:          0   53769606   xen-dyn-virq           timer1
419:        221          0   xen-dyn-ipi            callfuncsingle0
420:          0          0   xen-dyn-virq           debug0
421:          0          0   xen-dyn-ipi            callfunc0
422:     264761          0   xen-dyn-ipi            resched0
423:   53761166          0   xen-dyn-virq           timer0
NMI:          0          0   Non-maskable interrupts
LOC:          0          0   Local timer interrupts
SPU:          0          0   Spurious interrupts
CNT:          0          0   Performance counter interrupts
PND:          0          0   Performance pending work
RES:     264761     104331   Rescheduling interrupts
CAL:        221        403   Function call interrupts
TLB:          0          0   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        180        180   Machine check polls
ERR:          0
MIS:          0
root@Overmind:~#

I've no PCI device forwarded.

Regards,
Stefan Kuhne
Thomas Goirand
2010-May-22 15:32 UTC
Re: [Xen-devel] [pv_ops] e1000e: "Detected Tx Unit Hang"
Stefan Kuhne wrote:
> Hello,
>
> my server has massive problems with my NIC.
> I got: "Detected Tx Unit Hang".
>
> At the moment I use 2.6.31 from Jeremy, does anyone know if it's fixed
> in 2.6.32 or newer tree?
>
> Regards,
> Stefan Kuhne

We had this issue with many Supermicro servers as well. It seems that Supermicro doesn't often upgrade the BIOS/ROMs/etc. when they sell their hardware. For us, this fixed the issue many times:

    ethtool -K peth0 tso off

You might want to try it as well.

Thomas
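If the workaround is applied from a script, it can be worth reading the setting back with `ethtool -k` (lowercase) afterwards. The Python sketch below is one way to do that, under some assumptions: it accepts both the older "tcp segmentation offload" and the newer "tcp-segmentation-offload" spelling of the output line, the function names are hypothetical, and the sample text is illustrative rather than captured from Stefan's machine.

```python
import subprocess


def tso_enabled(ethtool_k_output):
    """Return True/False for the TSO line of `ethtool -k` output, None if absent."""
    for line in ethtool_k_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        # Normalize the older spelling (spaces) to the newer one (hyphens).
        if key.strip().replace(" ", "-") == "tcp-segmentation-offload":
            state = value.split()
            return state[0] == "on" if state else None
    return None


def disable_tso(iface):
    """Run Thomas's command, then read the setting back. Needs root and ethtool."""
    subprocess.run(["ethtool", "-K", iface, "tso", "off"], check=True)
    result = subprocess.run(["ethtool", "-k", iface], check=True,
                            capture_output=True, text=True)
    return tso_enabled(result.stdout)


# Illustrative sample text only, not real output from the machine in this thread.
SAMPLE = "Offload parameters for peth0:\ntcp-segmentation-offload: off\n"
print(tso_enabled(SAMPLE))
```

A setting made with `ethtool -K` does not survive a reboot or a driver reload, so it would have to be reapplied at boot, for example from the distribution's network scripts.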
Stefan Kuhne
2010-May-23 00:16 UTC
Re: [Xen-devel] [pv_ops] e1000e: "Detected Tx Unit Hang"
Hello Heiko,

thanks for this script. It seems to work fine now.

Thanks,
Stefan Kuhne