A previously rock-solid server of mine crashed last night. The server
was still running, but eth0, an Intel 82574L using the e1000e driver,
went down. The server has a Supermicro X8DTE-F (dual Xeon X5650, yada
yada). It is a drbd master, so drbd was the first thing to notice the
network issues. Just a couple of days ago I ran yum update to bring it
up to date, which I do about once a month.
/var/log/messages logged the following (prior to this there was nothing
but the usual smbd complaints about CUPS not being configured):
May 9 22:30:21 sg1 kernel: block drbd0: PingAck did not arrive in time.
May 9 22:30:21 sg1 kernel: block drbd0: peer( Secondary -> Unknown )
conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
May 9 22:30:21 sg1 kernel: block drbd0: asender terminated
May 9 22:30:21 sg1 kernel: block drbd0: Terminating drbd0_asender
May 9 22:30:22 sg1 kernel: block drbd0: new current UUID
BC856D7A6F94F041:237F4033E81B62DF:1E248D699B6793A9:1E238D699B6793A9
May 9 22:30:22 sg1 kernel: block drbd0: Connection closed
May 9 22:30:22 sg1 kernel: block drbd0: conn( NetworkFailure ->
Unconnected )
May 9 22:30:22 sg1 kernel: block drbd0: receiver terminated
May 9 22:30:22 sg1 kernel: block drbd0: Restarting drbd0_receiver
May 9 22:30:22 sg1 kernel: block drbd0: receiver (re)started
May 9 22:30:22 sg1 kernel: block drbd0: conn( Unconnected ->
WFConnection )
May 9 22:30:34 sg1 kernel: ------------[ cut here ]------------
May 9 22:30:34 sg1 kernel: WARNING: at net/sched/sch_generic.c:261
dev_watchdog+0x26b/0x280() (Not tainted)
May 9 22:30:34 sg1 kernel: Hardware name: ISS3500
May 9 22:30:34 sg1 kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit
queue 0 timed out
May 9 22:30:34 sg1 kernel: Modules linked in: drbd(U) nfsd max6650
coretemp adm1021 ipmi_devintf ipmi_si ipmi_msghandler nfs lockd fscache
auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table
mperf ipv6 xfs exportfs microcode iTCO_wdt iTCO_vendor_support joydev
serio_raw i2c_i801 i2c_core lpc_ich mfd_core e1000e(U) ptp pps_core
ioatdma dca i7core_edac edac_core ses enclosure sg ext4 jbd2 mbcache
sd_mod crc_t10dif ahci megaraid_sas mpt2sas scsi_transport_sas
raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
scsi_wait_scan]
May 9 22:30:34 sg1 kernel: Pid: 0, comm: swapper Not tainted
2.6.32-573.22.1.el6.x86_64 #1
May 9 22:30:34 sg1 kernel: Call Trace:
May 9 22:30:34 sg1 kernel: <IRQ> [<ffffffff81077821>] ?
warn_slowpath_common+0x91/0xe0
May 9 22:30:34 sg1 kernel: [<ffffffff81077926>] ?
warn_slowpath_fmt+0x46/0x60
May 9 22:30:34 sg1 kernel: [<ffffffff8148d64b>] ?
dev_watchdog+0x26b/0x280
May 9 22:30:34 sg1 kernel: [<ffffffff8109aded>] ? insert_work+0x6d/0xb0
May 9 22:30:34 sg1 kernel: [<ffffffff81089bd5>] ?
internal_add_timer+0xb5/0x110
May 9 22:30:34 sg1 kernel: [<ffffffff8148d3e0>] ? dev_watchdog+0x0/0x280
May 9 22:30:34 sg1 kernel: [<ffffffff8108a867>] ?
run_timer_softirq+0x197/0x340
May 9 22:30:34 sg1 kernel: [<ffffffff8103579d>] ?
lapic_next_event+0x1d/0x30
May 9 22:30:34 sg1 kernel: [<ffffffff81080361>] ? __do_softirq+0xc1/0x1e0
May 9 22:30:34 sg1 kernel: [<ffffffff810b322f>] ?
tick_program_event+0x2f/0x40
May 9 22:30:34 sg1 kernel: [<ffffffff8100c38c>] ? call_softirq+0x1c/0x30
May 9 22:30:34 sg1 kernel: [<ffffffff8100fc25>] ? do_softirq+0x65/0xa0
May 9 22:30:34 sg1 kernel: [<ffffffff81080215>] ? irq_exit+0x85/0x90
May 9 22:30:34 sg1 kernel: [<ffffffff815435ba>] ?
smp_apic_timer_interrupt+0x4a/0x60
May 9 22:30:34 sg1 kernel: [<ffffffff8100bc13>] ?
apic_timer_interrupt+0x13/0x20
May 9 22:30:34 sg1 kernel: <EOI> [<ffffffff812f1a5e>] ?
intel_idle+0xfe/0x1b0
May 9 22:30:34 sg1 kernel: [<ffffffff812f1a41>] ? intel_idle+0xe1/0x1b0
May 9 22:30:34 sg1 kernel: [<ffffffff8143413a>] ?
cpuidle_idle_call+0x7a/0xe0
May 9 22:30:34 sg1 kernel: [<ffffffff81009fe6>] ? cpu_idle+0xb6/0x110
May 9 22:30:34 sg1 kernel: [<ffffffff81532912>] ?
start_secondary+0x2c0/0x316
May 9 22:30:34 sg1 kernel: ---[ end trace 883800817e091e53 ]---
May 9 22:30:34 sg1 kernel: e1000e 0000:03:00.0: eth0: Reset adapter
unexpectedly
May 9 22:30:35 sg1 abrt-dump-oops: Reported 1 kernel oopses to Abrt
May 9 22:30:35 sg1 abrtd: Directory 'oops-2016-05-09-22:30:35-8763-1'
creation detected
May 9 22:30:38 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 22:30:42 sg1 kernel: Bridge firewalling registered
May 9 22:31:27 sg1 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
May 9 22:31:32 sg1 abrtd: Can't find a meaningful backtrace for hashing
in '.'
May 9 22:31:32 sg1 abrtd: Preserving oops '.' because
DropNotReportableOopses is '(not set)'
May 9 22:31:32 sg1 abrtd: Looking for kernel package
May 9 22:31:32 sg1 abrtd: Kernel package
kernel-2.6.32-573.22.1.el6.x86_64 found
May 9 22:31:33 sg1 abrtd: New problem directory
/var/spool/abrt/oops-2016-05-09-22:30:35-8763-1, processing
May 9 22:31:33 sg1 abrtd: Sending an email...
May 9 22:31:34 sg1 abrtd: Email was sent to: root@localhost
May 9 22:32:25 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 22:32:30 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 22:34:55 sg1 kernel: e1000e 0000:03:00.0: eth0: Reset adapter
unexpectedly
May 9 22:34:59 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 22:37:25 sg1 kernel: e1000e 0000:03:00.0: eth0: Reset adapter
unexpectedly
May 9 22:37:30 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 22:39:50 sg1 kernel: e1000e 0000:03:00.0: eth0: Reset adapter
unexpectedly
May 9 22:39:55 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 22:41:30 sg1 kernel: e1000e 0000:03:00.0: eth0: Reset adapter
unexpectedly
May 9 22:41:35 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 22:44:00 sg1 kernel: e1000e 0000:03:00.0: eth0: Reset adapter
unexpectedly
May 9 22:44:05 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 22:46:28 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 22:46:33 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 22:50:05 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 22:50:09 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 22:52:56 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 22:53:01 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 22:55:30 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 22:55:35 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 22:59:17 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 22:59:22 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:01:45 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:01:50 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:05:02 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:05:07 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:07:19 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:07:23 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:09:34 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:09:38 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:11:47 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:11:52 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:14:27 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:14:31 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:16:38 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:16:42 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:19:08 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:19:12 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:22:18 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:22:22 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:26:52 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:26:57 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:31:24 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:31:29 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:33:43 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:33:47 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:36:30 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:36:35 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:39:45 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:39:50 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:41:58 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:42:03 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:45:04 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:45:08 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:47:19 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:47:24 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:52:06 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:52:11 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:55:05 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:55:09 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
May 9 23:57:31 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:57:36 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full
Duplex, Flow Control: Rx/Tx
(repeating endlessly until I forced a reboot this morning)
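
For anyone who would rather tally these events than eyeball them, here
is a minimal Python sketch that counts the eth0 watchdog, reset, and
link up/down messages and reports the seconds between successive
outages. It assumes the syslog line format shown above. The summarize()
function and the EVENTS table are names made up for illustration, and
the year is a guess that has to be passed in because syslog timestamps
do not carry one.

```python
import re
from collections import Counter
from datetime import datetime

# Message fragments copied from the e1000e lines in the log above.
EVENTS = {
    "watchdog": re.compile(r"NETDEV WATCHDOG: eth0"),
    "reset": re.compile(r"eth0: Reset adapter"),
    "link_down": re.compile(r"eth0 NIC Link is Down"),
    "link_up": re.compile(r"eth0 NIC Link is Up"),
}

# Classic syslog prefix: month name, day of month, HH:MM:SS.
STAMP = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d\d:\d\d:\d\d)\s")


def summarize(lines, year=2016):
    """Return (event counts, seconds between successive outages).

    An "outage" here is either an adapter reset or a link-down message.
    Lines without a timestamp (e.g. wrapped continuations) are skipped.
    """
    counts = Counter()
    outages = []
    for line in lines:
        stamp = STAMP.match(line)
        if not stamp:
            continue
        for name, pattern in EVENTS.items():
            if pattern.search(line):
                counts[name] += 1
                if name in ("reset", "link_down"):
                    month, day, clock = stamp.groups()
                    outages.append(
                        datetime.strptime(
                            f"{year} {month} {day} {clock}",
                            "%Y %b %d %H:%M:%S",
                        )
                    )
                break  # at most one event per line
    gaps = [(b - a).total_seconds() for a, b in zip(outages, outages[1:])]
    return counts, gaps
```

To use it, pass any iterable of log lines, for example an open file
handle on /var/log/messages, and print the two return values. A steady
gap of a few minutes between outages, as in the log above, would show
up directly in the second list.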
--
john r pierce, recycling bits in santa cruz