Ian B
2016-Feb-19 10:56 UTC
[CentOS] Network hangs after several hours (Centos 6 recently upgraded kernel/glibc)
Hi all,

We have a development server on which we have just tried updating the kernel and glibc, following recent recommendations. It has been stable for a few years, with only scheduled reboots.

It's running:
Centos 6.6(final)
2.6.32-573.18.1.el6.x86_64
GNU libc 2.12

We upgraded via yum and rebooted. All was fine for several hours, and then the network seemed to hang. Not much is happening on it, as it's a dev server we are testing before moving to production.

Googling, I see there is some history of the e1000e driver having issues, and I'm wondering if it could be related.

Does anyone have any thoughts on where to go with it? I'm assuming it will hang again later.

Thanks, Ian

Feb 18 05:04:36 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
Feb 18 05:04:36 kernel: Hardware name: X9SCL/X9SCM
Feb 18 05:04:36 kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Feb 18 05:04:36 kernel: Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ext4 jbd2 e1000e serio_raw i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support shpchp ext3 jbd mbcache raid1 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Feb 18 05:04:36 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-220.4.2.el6.x86_64 #1
Feb 18 05:04:36 kernel: Call Trace:
Feb 18 05:04:36 kernel: <IRQ> [<ffffffff81069a17>] ? warn_slowpath_common+0x87/0xc0
Feb 18 05:04:36 kernel: [<ffffffff81069b06>] ? warn_slowpath_fmt+0x46/0x50
Feb 18 05:04:36 kernel: [<ffffffff8144a4fd>] ? dev_watchdog+0x26d/0x280
Feb 18 05:04:36 kernel: [<ffffffff8108b3fd>] ? insert_work+0x6d/0xb0
Feb 18 05:04:36 kernel: [<ffffffff8144a290>] ? dev_watchdog+0x0/0x280
Feb 18 05:04:36 kernel: [<ffffffff8107c7f7>] ? run_timer_softirq+0x197/0x340
Feb 18 05:04:36 kernel: [<ffffffff810a0a10>] ? tick_sched_timer+0x0/0xc0
Feb 18 05:04:36 kernel: [<ffffffff8102ad6d>] ? lapic_next_event+0x1d/0x30
Feb 18 05:04:36 kernel: [<ffffffff81072001>] ? __do_softirq+0xc1/0x1d0
Feb 18 05:04:36 kernel: [<ffffffff81095610>] ? hrtimer_interrupt+0x140/0x250
Feb 18 05:04:36 kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
Feb 18 05:04:36 kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
Feb 18 05:04:36 kernel: [<ffffffff81071de5>] ? irq_exit+0x85/0x90
Feb 18 05:04:36 kernel: [<ffffffff814f4d70>] ? smp_apic_timer_interrupt+0x70/0x9b
Feb 18 05:04:36 kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
Feb 18 05:04:36 kernel: <EOI> [<ffffffff812c49de>] ? intel_idle+0xde/0x170
Feb 18 05:04:36 kernel: [<ffffffff812c49c1>] ? intel_idle+0xc1/0x170
Feb 18 05:04:36 kernel: [<ffffffff813f9ef7>] ? cpuidle_idle_call+0xa7/0x140
Feb 18 05:04:36 kernel: [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
Feb 18 05:04:36 kernel: [<ffffffff814d40ca>] ? rest_init+0x7a/0x80
Feb 18 05:04:36 kernel: [<ffffffff81c1ff76>] ? start_kernel+0x424/0x430
Feb 18 05:04:36 kernel: [<ffffffff81c1f33a>] ? x86_64_start_reservations+0x125/0x129
Feb 18 05:04:36 kernel: [<ffffffff81c1f438>] ? x86_64_start_kernel+0xfa/0x109
Feb 18 05:04:36 kernel: ---[ end trace 21915186e9d87b29 ]---

modinfo e1000e | grep version
version: 3.2.5-k
srcversion: 8CCA78B3C15DE6229299348
vermagic: 2.6.32-573.18.1.el6.x86_64 SMP mod_unload modversions

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 Processor Family DRAM Controller (rev 09)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation C202 Chipset Family LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
03:03.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200eW WPCM450 (rev 0a)
Ian B
2016-Feb-19 11:08 UTC
[CentOS] Network hangs after several hours (Centos 6 recently upgraded kernel/glibc)
I just noticed that the trace shows an old kernel (2.6.32-220.4.2.el6.x86_64), so I don't think grub was automatically selecting the latest kernel.

What process updates the grub default to the latest kernel? And could the problem be that the update was applied but grub booted an older kernel while the other packages had been updated?
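The mismatch described above can be checked mechanically. The following is only a minimal sketch in Python: it pulls the kernel release string out of the "Pid:" line of the trace and out of the modinfo vermagic line, both copied from the first post, and compares them. The function names and the regular expression are illustrative assumptions, not part of any CentOS tool. Note also that the two lines may have been captured at different times, so a difference is a hint rather than proof of what was running when the network hung.

```python
import re

# Both sample lines are copied verbatim from the first post in this thread.
TRACE_LINE = ("Feb 18 05:04:36 kernel: Pid: 0, comm: swapper Not tainted "
              "2.6.32-220.4.2.el6.x86_64 #1")
VERMAGIC_LINE = "vermagic: 2.6.32-573.18.1.el6.x86_64 SMP mod_unload modversions"

# Assumed pattern for an EL-style kernel release such as
# 2.6.32-573.18.1.el6.x86_64 (version, build, dist tag, architecture).
KERNEL_RE = re.compile(r"\d+\.\d+\.\d+-[\w.]+\.el\d+\.\w+")


def kernel_release(text):
    """Return the first EL-style kernel release found in text, or None."""
    match = KERNEL_RE.search(text)
    return match.group(0) if match else None


def same_kernel(trace_line, vermagic_line):
    """True only if both lines contain a release string and they are equal."""
    traced = kernel_release(trace_line)
    module = kernel_release(vermagic_line)
    return traced is not None and traced == module


if __name__ == "__main__":
    print("trace kernel:  ", kernel_release(TRACE_LINE))
    print("module kernel: ", kernel_release(VERMAGIC_LINE))
    print("match:         ", same_kernel(TRACE_LINE, VERMAGIC_LINE))
```

On a live machine the same comparison could be made against the output of `uname -r` instead of a logged trace line.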
Richard
2016-Feb-19 12:33 UTC
[CentOS] Network hangs after several hours (Centos 6 recently upgraded kernel/glibc)
> Date: Friday, February 19, 2016 11:08:48 +0000
> From: Ian B <ibrierley at gmail.com>
>
> Just noticed that in the trace, it shows an old kernel, so I don't
> think grub was automatically selecting the latest kernel. Just
> wondering what process updates the default to be the latest
> kernel, and if a problem could be an update but grub selecting an
> older kernel, but other packages updated ?

If your machine is "running Centos 6.6(final)" but you have installed the new kernel and glibc, that implies you are selectively applying updates. The 6.7 point release came out last fall.

In addition to the security implications of not fully updating the system, you may have missed packages that affect networking. You may want to do a full update of the system and then see how it behaves. It is hard to debug a system that may have mismatched pieces.

To see which kernel grub is set to load by default, look at grub.conf. The "default=" line (normally "0") indicates which of the listed kernels will be selected. If the "default" value isn't "0", or the newest kernel isn't the first entry, then something is interfering with the boot configuration. Check your /etc/sysconfig/kernel file for starters.
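The grub.conf check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive tool: the sample grub.conf and the sample /etc/sysconfig/kernel contents below are hypothetical (they are not the poster's actual files), the function names are made up for this example, and the parser only handles a numeric "default=" line in a GRUB legacy config. The UPDATEDEFAULT and DEFAULTKERNEL keys are the settings normally found in /etc/sysconfig/kernel on CentOS 6.

```python
import re

# Hypothetical GRUB legacy config: the newest kernel is listed first,
# but default=1 points at the older entry.
SAMPLE_GRUB_CONF = """\
# grub.conf (hypothetical sample)
default=1
timeout=5
title CentOS (2.6.32-573.18.1.el6.x86_64)
    root (hd0,0)
    kernel /vmlinuz-2.6.32-573.18.1.el6.x86_64 ro
title CentOS (2.6.32-220.4.2.el6.x86_64)
    root (hd0,0)
    kernel /vmlinuz-2.6.32-220.4.2.el6.x86_64 ro
"""

# Hypothetical contents of /etc/sysconfig/kernel.
SAMPLE_SYSCONFIG_KERNEL = """\
# UPDATEDEFAULT specifies if new-kernel-pkg should make new kernels the default
UPDATEDEFAULT=yes
DEFAULTKERNEL=kernel
"""


def default_entry(grub_conf_text):
    """Return (default_index, title) for a GRUB legacy config.

    Falls back to index 0 when no numeric default= line is present
    (default=saved is not handled in this sketch). The title is None
    if the index points past the last entry.
    """
    default_index = 0
    titles = []
    for raw in grub_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        default_match = re.match(r"default\s*=\s*(\d+)$", line)
        title_match = re.match(r"title\s+(.*)$", line)
        if default_match:
            default_index = int(default_match.group(1))
        elif title_match:
            titles.append(title_match.group(1).strip())
    if default_index >= len(titles):
        return default_index, None
    return default_index, titles[default_index]


def parse_sysconfig(text):
    """Parse simple KEY=value lines, ignoring comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip().strip('"')
    return settings


if __name__ == "__main__":
    index, title = default_entry(SAMPLE_GRUB_CONF)
    print("grub default entry:", index, title)
    print("sysconfig:", parse_sysconfig(SAMPLE_SYSCONFIG_KERNEL))
```

On a real CentOS 6 system the inputs would be read from /boot/grub/grub.conf and /etc/sysconfig/kernel, and the title of the default entry compared with the output of `uname -r`.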