Steven Alexson
2007-Nov-09 14:57 UTC
[CentOS] Intel 1000/PRO GT (e1000 driver) and "Detect Tx Unit Hang" error with 4GB RAM
My system configuration:

ASUS M2A-VM motherboard
AMD Athlon 64 X2 4200+ 2.2 GHz
4x A-DATA 1GB DDR2 800 memory
2x Intel 10/100/1000 Pro/1000 GT Desktop Network Adapter
2x Seagate Barracuda 250GB HD (RAID 1, software RAID)
CentOS 5 x86_64; Kernel 2.6.23 (custom built); Version 7.6.9.2 e1000 driver

The symptoms of this problem are outlined at:
http://e1000.sourceforge.net/wiki/index.php/Issues[1]
http://e1000.sourceforge.net/wiki/in...p/Tx_Unit_Hang[2]

Last night I started experiencing the "Detected Tx Unit Hang" problem with the Intel e1000 NIC. This happened after I upgraded my system to 4GB RAM (previously 2GB). I have 2 of these cards in the system. I updated the Linux kernel to 2.6.23, then downloaded from Sourceforge and installed the most recent stable version of the e1000 driver for Linux, version 7.6.9.2. I am still getting the "Detected Tx Unit Hang" message.

I had to recompile the kernel because upgrading to 4GB with the current kernel for CentOS 5 (2.6.18.8-1) causes an error, ata1: softreset failed (1st FIS failed), which results in a kernel panic. Upgrading the kernel to 2.6.23 fixed that problem, but now I have a problem with my network cards.

Searching around, I found posts saying that disabling ACPI with the kernel options "acpi=off noacpi" would fix it, but it did not.

I tried adding explicit modprobe options for the driver in /etc/modprobe.conf (options e1000 XsumRX=0 Speed=1000 Duplex=2 InterruptThrottleRate=0 FlowControl=3 RxDescriptors=4096 TxDescriptors=4096 RxIntDelay=0 TxIntDelay=0). Still no change; I am still experiencing the problem.

I then tried another suggestion I found in a forum discussion, `ethtool -K eth0 tso=off`. It seems to have had no effect on the problem.

This problem occurs immediately when the system is trying to bring the device up. I cannot even get to a point where I can try sending traffic over the network interface, because it never obtains an IP address from DHCP.
If I specify a static IP address, the address is assigned, but I still experience the problem, and I cannot even ping another host.

Now, if I reduce the amount of RAM to 3GB or less, everything works fine! This leads me to believe that my kernel and driver are configured, compiled, and functioning correctly. It also leads me to believe that there are no problems with the network cards.

So, I thought perhaps it was a bad memory module, but no matter which 3 of the 4 modules I leave in, I get the same results. Everything works fine until I add the 4th module.

Then I found an article on Intel's site saying that some older EEPROMs have the power management option turned on, and that this could cause the problem. So, I downloaded the script that fixes the bit in the EEPROM (turning off power management). The script says that it does not apply to my version of the EEPROM. When I run `ethtool -e (eth0|eth1)` I do not see the value "de" at offset 0x0010, so I have to believe that the script is correct in deciding that it does not apply to my NICs.

Next, I thought that my power supply could be the problem. Perhaps the PSU doesn't supply enough power for everything once I add the 4th memory module. It is just a generic 300W PSU that came with the case (I have new 500W PSUs on the way). So, I pulled out one of the NICs and disconnected the DVD drive. That is about all I can eliminate. Reducing the hardware installed made no difference.

I am running the 64-bit kernel, so I should have no trouble supporting the 4GB RAM, correct?

Now I am out of ideas, and I seem to have hit a brick wall. One of the things that disturbs me is that all of the articles I have found concerning this problem are dated 1-2 years ago.

Can anyone offer me any assistance?
Links:
------
[1] http://e1000.sourceforge.net/wiki/index.php/Issues
[2] http://e1000.sourceforge.net/wiki/in...p/Tx_Unit_Hang

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
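When comparing configurations as in the message above (3GB versus 4GB, with and without the modprobe options), it can help to count how often the hang is logged after each change. The following is only a minimal sketch, not part of the original post: the function name `count_tx_hangs` is hypothetical, and it assumes the interface name (eth0, eth1, ...) appears on the same log line as the "Detected Tx Unit Hang" text quoted in this thread. The exact log line layout depends on the driver version and is not assumed here.

```python
import re
from collections import Counter

# Substring taken from the error message quoted in this thread.
HANG_MESSAGE = "Detected Tx Unit Hang"

# Assumption: the interface name appears on the same line, in the form "ethN".
IFACE_PATTERN = re.compile(r"\beth\d+\b")


def count_tx_hangs(log_text):
    """Count log lines containing the hang message, grouped by interface.

    Lines that contain the message but no recognizable interface name
    are counted under the key "unknown".
    """
    counts = Counter()
    for line in log_text.splitlines():
        if HANG_MESSAGE not in line:
            continue  # unrelated log line
        match = IFACE_PATTERN.search(line)
        counts[match.group(0) if match else "unknown"] += 1
    return counts
```

One way to use it is to save the kernel log to a file after each boot, read the file into a string, and pass that string to `count_tx_hangs`. A count of zero with 3GB installed and a non-zero count with 4GB would match the behaviour described above.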
Jim Perrin
2007-Nov-09 15:13 UTC
[CentOS] Intel 1000/PRO GT (e1000 driver) and "Detect Tx Unit Hang" error with 4GB RAM
On Nov 9, 2007 9:57 AM, Steven Alexson <steve at alexson.org> wrote:
> Can anyone offer me any assistance?

Doesn't help fix your immediate issue, but the e1000 driver has some substantial updates coming in the 5.1 kernel. It's entirely possible that this will fix your issue.

-- 
During times of universal deceit, telling the truth becomes a revolutionary act.
George Orwell
Steven Alexson
2007-Nov-09 16:55 UTC
[CentOS] Intel 1000/PRO GT (e1000 driver) and "Detect Tx Unit Hang" error with 4GB RAM
Hmmm...any chance that you can elaborate on what updates will be included with 5.1? I am not sure if it will be of practical interest anymore (I have 4 new network cards...non-Intel...on the way), since I have spent far more time on this problem than I should have. But, from a curiosity perspective, I am curious.

If I could resolve this problem and use the Intel cards, that would be ideal. Then I could return the 4 cards I ordered today and save a bit of money. If it means waiting the couple of weeks (projected) that it takes for 5.1 to roll out, that might be worth it.

Thanks for the insight.