Everyone,

Most of the time I am in over my head when trying to troubleshoot problems. However, after reading manuals and man pages, and getting advice from this list, I have been able to work my way through difficulties, and in the end I usually have a better understanding of what 'is going on'. I can only hope this method will work on this problem too.

I have been chasing a problem with a PCI-e TrendNet (TEG-ECTX) gigabit card. After adding the card to a machine with a new CentOS 6.2 install and naming it 'eth4', it works well for 6 to 12 hours and then fails. The failure is characterized by the connection speed dropping from 1000 to 100 while no data flows in or out. When this happens, a shutdown and reboot does not solve the problem, but shutting down and then removing the power does.

I wrote a Perl script that uses the eth4 interface to ping another machine every 60 seconds, to try to work out how the message log entries relate to the time of failure. I think there is a correlation between eth4 failing and the entry below. Unfortunately, I am way over my head on this one. If any of you can help, I would surely appreciate your thoughts.

Some additional information that may be useful: this TrendNet card is the second TrendNet card I have used. The first card had the same symptoms, so I deduced the card was bad and purchased another one. The symptoms are the same with the second card. Before I purchase a third card from a different manufacturer, I thought I would post this to see what some of you think.

This is the first PCI-e card I have used; are there problems with PCI-e interfaces as opposed to PCI? Do you think the motherboard could be the problem, and would moving eth4 to a different slot on the motherboard be worthwhile?

Any ideas?

Greg Ennis

P.S. Here is the relevant log entry from the /var/log/messages file.

Jun 20 03:08:38 Mail kernel: ------------[ cut here ]------------
Jun 20 03:08:38 Mail kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
Jun 20 03:08:38 Mail kernel: Hardware name: p7-1220
Jun 20 03:08:38 Mail kernel: NETDEV WATCHDOG: eth4 (r8169): transmit queue 0 timed out
Jun 20 03:08:38 Mail kernel: Modules linked in: ipt_REDIRECT ipt_LOG xt_limit ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge autofs4 sunrpc bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc garp stp llc scsi_tgt cpufreq_ondemand powernow_k8 freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun kvm uinput sg btusb bluetooth rfkill microcode snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 r8169 mii ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif usb_storage sdhci_pci sdhci mmc_core ahci radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Jun 20 03:08:38 Mail kernel: Pid: 0, comm: swapper Not tainted 2.6.32-220.23.1.el6.centos.plus.x86_64 #1
Jun 20 03:08:38 Mail kernel: Call Trace:
Jun 20 03:08:38 Mail kernel: <IRQ> [<ffffffff81069c97>] ? warn_slowpath_common+0x87/0xc0
Jun 20 03:08:38 Mail kernel: [<ffffffff81069d86>] ? warn_slowpath_fmt+0x46/0x50
Jun 20 03:08:38 Mail kernel: [<ffffffff81069d86>] ? warn_slowpath_fmt+0x46/0x50
Jun 20 03:08:38 Mail kernel: [<ffffffff81451c0d>] ? dev_watchdog+0x26d/0x280
Jun 20 03:08:38 Mail kernel: [<ffffffff814519a0>] ? dev_watchdog+0x0/0x280
Jun 20 03:08:38 Mail kernel: [<ffffffff810efbf3>] ? trace_nowake_buffer_unlock_commit+0x43/0x60
Jun 20 03:08:38 Mail kernel: [<ffffffff814519a0>] ? dev_watchdog+0x0/0x280
Jun 20 03:08:38 Mail kernel: [<ffffffff8107cab7>] ? run_timer_softirq+0x197/0x340
Jun 20 03:08:38 Mail kernel: [<ffffffff81072291>] ? __do_softirq+0xc1/0x1d0
Jun 20 03:08:38 Mail kernel: [<ffffffff810958b0>] ? hrtimer_interrupt+0x140/0x250
Jun 20 03:08:38 Mail kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
Jun 20 03:08:38 Mail kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
Jun 20 03:08:38 Mail kernel: [<ffffffff81072075>] ? irq_exit+0x85/0x90
Jun 20 03:08:38 Mail kernel: [<ffffffff814fc550>] ? smp_apic_timer_interrupt+0x70/0x9b
Jun 20 03:08:38 Mail kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
Jun 20 03:08:38 Mail kernel: <EOI> [<ffffffff812f5f9c>] ? acpi_idle_enter_simple+0x114/0x14b
Jun 20 03:08:38 Mail kernel: [<ffffffff812f5f98>] ? acpi_idle_enter_simple+0x110/0x14b
Jun 20 03:08:38 Mail kernel: [<ffffffff814014a7>] ? cpuidle_idle_call+0xa7/0x140
Jun 20 03:08:38 Mail kernel: [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
Jun 20 03:08:38 Mail kernel: [<ffffffff814ed686>] ? start_secondary+0x202/0x245
Jun 20 03:08:38 Mail kernel: ---[ end trace 24f15998c117ac8f ]---
Jun 20 03:08:38 Mail kernel: r8169 0000:01:00.0: eth4: link up
Jun 20 03:08:39 Mail abrtd: Directory 'oops-2012-06-20-03:08:39-2420-0' creation detected
Jun 20 03:08:39 Mail abrt-dump-oops: Reported 1 kernel oopses to Abrt
Jun 20 03:08:39 Mail abrtd: Can't open file '/var/spool/abrt/oops-2012-06-20-03:08:39-2420-0/uid': No such file or directory
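
For anyone wanting to line up ping failures with the log, here is a minimal Python sketch of the correlation step. It is not the Perl script mentioned in the post (that script was not included); the names `find_watchdog_timeouts` and `read_link_speed` are hypothetical, and the regular expression assumes the syslog line format shown in the excerpt above. The link speed is read from the Linux sysfs file `/sys/class/net/<iface>/speed`, which reports Mbit/s and may be unreadable or report -1 while the link is down.

```python
import re
from pathlib import Path

# Assumes the syslog format of the excerpt above, e.g.
# "Jun 20 03:08:38 Mail kernel: NETDEV WATCHDOG: eth4 (r8169): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"NETDEV WATCHDOG:\s+(?P<iface>\S+)\s+\((?P<driver>[^)]+)\):\s+"
    r"transmit queue\s+(?P<queue>\d+)\s+timed out"
)


def find_watchdog_timeouts(lines, iface=None):
    """Return (timestamp, interface, driver, queue) for each watchdog timeout line.

    `lines` is any iterable of log lines; `iface` optionally restricts the
    result to one interface name such as "eth4".
    """
    hits = []
    for line in lines:
        match = WATCHDOG_RE.match(line.strip())
        if match and (iface is None or match.group("iface") == iface):
            hits.append(
                (
                    match.group("stamp"),
                    match.group("iface"),
                    match.group("driver"),
                    int(match.group("queue")),
                )
            )
    return hits


def read_link_speed(iface, sysfs_root="/sys/class/net"):
    """Return the negotiated link speed in Mbit/s, or None if it cannot be read."""
    try:
        text = (Path(sysfs_root) / iface / "speed").read_text().strip()
        return int(text)
    except (OSError, ValueError):
        # The file is missing for unknown interfaces and can fail to read
        # when the link is down.
        return None
```

Calling `find_watchdog_timeouts(open("/var/log/messages"), iface="eth4")` would list the timeout timestamps to compare against the times the pings started failing, and logging `read_link_speed("eth4")` alongside each ping would show when the speed drops from 1000 to 100.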

Gregory P. Ennis wrote:
<snip>
> I have been chasing a problem with a pci-e TrendNet(TEG-ECTX) gigabit
> card. After adding the card to a machine with a new Centos 6.2 install
> and naming it 'eth4' it works well for 6 to 12 hours and then fails.
> The failure is characterized by dropping its connection speed from 1000
> to 100 while not allowing any data to flow in or out. When this happens
> a shutdown and reboot does not solve the problem, but shutting down and
> then removing the power does solve the problem.
<snip>
> Some additional information that may be useful. The TrendNet card is
> the second TrendNet card I have used. The first card had the same
> symptoms, and I deduced the card was bad, and purchased another one. The
> symptoms are the same with the second card.
<snip>

Several questions: do you have another machine on the same network? Does *it* show the problem, around the same time? And, finally, did you buy both TrendNet cards from the same vendor? Are their MACs close? If so, it could be the vendor got a bad batch, either OEM's fault, or the gorilla who un/loaded it during shipping.

mark

On 6/20/2012 9:34 AM, Gregory P. Ennis wrote:
>
> I have been chasing a problem with a pci-e TrendNet(TEG-ECTX) gigabit
> card. After adding the card to a machine with a new Centos 6.2 install
> and naming it 'eth4' it works well for 6 to 12 hours and then fails.

Try moving the network card to a new slot, especially if you can swap the network card with another card which is known to work. Also, try swapping the card into a spare server. If the problem follows the network card, then the card is probably bad. If a known-good card misbehaves in the slot where you previously had the network card, then the slot may be bad as well.

--
-Chris

> Date: Wed, 20 Jun 2012 10:54:33 -0700
> From: John R Pierce <pierce at hogranch.com>
> Subject: Re: [CentOS] Failing Network card
> To: centos at centos.org
> Message-ID: <4FE20E59.20907 at hogranch.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 06/20/12 8:44 AM, Gregory P. Ennis wrote:
>> > 01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>> > RTL8111/8168B PCI Express Gigabit Ethernet controller (rev ff)
>> pure unmitigated junk.
>
> -- john r pierce N 37, W 122 santa cruz ca mid-left coast

I agree with John's comment. Realtek chips are junk with unpredictable reliability, especially under heavy load. I have had several problems with various versions of the 81xx chips. When I tossed the cards in the garbage and switched to Intel-based NICs, all the problems went away. Every time I build systems with Realtek network chips on the motherboard, I disable them in the BIOS and add Intel NICs instead. YMMV, but please consider ditching Realtek altogether.

Chuck