I have a server that has been running fine for a year or two but has
locked up a few times recently, requiring power cycling.

The /var/log/messages output after a lockup last night is appended to
this message.

Hardware is a pretty typical server: Supermicro X8DTE-F motherboard,
dual Xeon X5650, 48GB ECC memory, LSI SAS 2008 for the boot disks, and
LSI MegaRAID SAS 9261-8i for the data volume. Lots of 3TB disks in a
RAID60. The primary application is BackupPC v3.3.0 (from EPEL); it also
has an NFS export (also used for backup purposes).

It runs CentOS 6.latest (kernel 2.6.32-431.11.2.el6.x86_64). X is not
loaded (inittab level 3). SELinux is permissive, iptables is not
loaded. This server is on a corporate internal network, with one Intel
82574L NIC configured with a static IP; the second one is not in use.

Any clues what to try? I'm hesitant to enable irqpoll as I hear that
it is a real performance sucker.

(quiet for a day+ except for an NFS umount 6 hours prior to this crash)

Apr 1 21:19:23 sg1 kernel: irq 70: nobody cared (try booting with the "irqpoll" option)
Apr 1 21:19:23 sg1 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Apr 1 21:19:23 sg1 kernel: Call Trace:
Apr 1 21:19:23 sg1 kernel: <IRQ> [<ffffffff810e8fdb>] ? __report_bad_irq+0x2b/0xa0
Apr 1 21:19:23 sg1 kernel: [<ffffffff810e91dc>] ? note_interrupt+0x18c/0x1d0
Apr 1 21:19:23 sg1 kernel: [<ffffffff810e9825>] ? handle_edge_irq+0xf5/0x180
Apr 1 21:19:23 sg1 kernel: [<ffffffff8100faf9>] ? handle_irq+0x49/0xa0
Apr 1 21:19:23 sg1 kernel: [<ffffffff815315fc>] ? do_IRQ+0x6c/0xf0
Apr 1 21:19:23 sg1 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Apr 1 21:19:23 sg1 kernel: <EOI> [<ffffffff812e0bee>] ? intel_idle+0xde/0x170
Apr 1 21:19:23 sg1 kernel: [<ffffffff812e0bd1>] ? intel_idle+0xc1/0x170
Apr 1 21:19:23 sg1 kernel: [<ffffffff81426b67>] ? cpuidle_idle_call+0xa7/0x140
Apr 1 21:19:23 sg1 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Apr 1 21:19:23 sg1 kernel: [<ffffffff8152143c>] ?
start_secondary+0x2ac/0x2ef
Apr 1 21:19:23 sg1 kernel: handlers:
Apr 1 21:19:23 sg1 kernel: [<ffffffffa01bd260>] (e1000_msix_other+0x0/0x1f0 [e1000e])
Apr 1 21:19:23 sg1 kernel: Disabling IRQ #70
Apr 1 21:19:24 sg1 abrt-dump-oops: Reported 1 kernel oopses to Abrt
Apr 1 21:19:24 sg1 abrtd: Directory 'oops-2014-04-01-21:19:24-7042-1' creation detected
Apr 1 21:19:25 sg1 abrtd: Can't open file '/var/spool/abrt/oops-2014-04-01-21:19:24-7042-1/uid': No such file or directory
Apr 1 21:19:30 sg1 kernel: Bridge firewalling registered
Apr 1 21:22:58 sg1 kernel: INFO: task crond:11598 blocked for more than 120 seconds.
Apr 1 21:22:58 sg1 kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Apr 1 21:22:58 sg1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 1 21:22:58 sg1 kernel: crond D 0000000000000008 0 11598 7120 0x00000080
Apr 1 21:22:58 sg1 kernel: ffff88011257bd38 0000000000000086 0000000000000000 0000000000000000
Apr 1 21:22:58 sg1 kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Apr 1 21:22:58 sg1 kernel: ffff88063208dab8 ffff88011257bfd8 000000000000fbc8 ffff88063208dab8
Apr 1 21:22:58 sg1 kernel: Call Trace:
Apr 1 21:22:58 sg1 kernel: [<ffffffff81528dd5>] schedule_timeout+0x215/0x2e0
Apr 1 21:22:58 sg1 kernel: [<ffffffff81330968>] ? extract_entropy+0x108/0x1f0
Apr 1 21:22:58 sg1 kernel: [<ffffffff81528a53>] wait_for_common+0x123/0x180
Apr 1 21:22:58 sg1 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
Apr 1 21:22:58 sg1 kernel: [<ffffffff81528b6d>] wait_for_completion+0x1d/0x20
Apr 1 21:22:58 sg1 kernel: [<ffffffff81097108>] synchronize_sched+0x58/0x60
Apr 1 21:22:58 sg1 kernel: [<ffffffff81097090>] ? wakeme_after_rcu+0x0/0x20
Apr 1 21:22:58 sg1 kernel: [<ffffffff812229dc>] install_session_keyring_to_cred+0x6c/0xd0
Apr 1 21:22:58 sg1 kernel: [<ffffffff81222b73>] join_session_keyring+0x133/0x160
Apr 1 21:22:58 sg1 kernel: [<ffffffff810e2057>] ?
audit_syscall_entry+0x1d7/0x200 Apr 1 21:22:58 sg1 kernel: [<ffffffff81221778>] keyctl_join_session_keyring+0x38/0x70 Apr 1 21:22:58 sg1 kernel: [<ffffffff812223a0>] sys_keyctl+0x170/0x190 Apr 1 21:22:58 sg1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Apr 1 21:24:58 sg1 kernel: INFO: task crond:11598 blocked for more than 120 seconds. Apr 1 21:24:58 sg1 kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1 Apr 1 21:24:58 sg1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 1 21:24:58 sg1 kernel: crond D 0000000000000008 0 11598 7120 0x00000080 Apr 1 21:24:58 sg1 kernel: ffff88011257bd38 0000000000000086 0000000000000000 0000000000000000 Apr 1 21:24:58 sg1 kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Apr 1 21:24:58 sg1 kernel: ffff88063208dab8 ffff88011257bfd8 000000000000fbc8 ffff88063208dab8 Apr 1 21:24:58 sg1 kernel: Call Trace: Apr 1 21:24:58 sg1 kernel: [<ffffffff81528dd5>] schedule_timeout+0x215/0x2e0 Apr 1 21:24:58 sg1 kernel: [<ffffffff81330968>] ? extract_entropy+0x108/0x1f0 Apr 1 21:24:58 sg1 kernel: [<ffffffff81528a53>] wait_for_common+0x123/0x180 Apr 1 21:24:58 sg1 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20 Apr 1 21:24:58 sg1 kernel: [<ffffffff81528b6d>] wait_for_completion+0x1d/0x20 Apr 1 21:24:58 sg1 kernel: [<ffffffff81097108>] synchronize_sched+0x58/0x60 Apr 1 21:24:58 sg1 kernel: [<ffffffff81097090>] ? wakeme_after_rcu+0x0/0x20 Apr 1 21:24:58 sg1 kernel: [<ffffffff812229dc>] install_session_keyring_to_cred+0x6c/0xd0 Apr 1 21:24:58 sg1 kernel: [<ffffffff81222b73>] join_session_keyring+0x133/0x160 Apr 1 21:24:58 sg1 kernel: [<ffffffff810e2057>] ? 
audit_syscall_entry+0x1d7/0x200 Apr 1 21:24:58 sg1 kernel: [<ffffffff81221778>] keyctl_join_session_keyring+0x38/0x70 Apr 1 21:24:58 sg1 kernel: [<ffffffff812223a0>] sys_keyctl+0x170/0x190 Apr 1 21:24:58 sg1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Apr 1 21:26:58 sg1 kernel: INFO: task crond:11598 blocked for more than 120 seconds. Apr 1 21:26:58 sg1 kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1 Apr 1 21:26:58 sg1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 1 21:26:58 sg1 kernel: crond D 0000000000000008 0 11598 7120 0x00000080 Apr 1 21:26:58 sg1 kernel: ffff88011257bd38 0000000000000086 0000000000000000 0000000000000000 Apr 1 21:26:58 sg1 kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Apr 1 21:26:58 sg1 kernel: ffff88063208dab8 ffff88011257bfd8 000000000000fbc8 ffff88063208dab8 Apr 1 21:26:58 sg1 kernel: Call Trace: Apr 1 21:26:58 sg1 kernel: [<ffffffff81528dd5>] schedule_timeout+0x215/0x2e0 Apr 1 21:26:58 sg1 kernel: [<ffffffff81330968>] ? extract_entropy+0x108/0x1f0 Apr 1 21:26:58 sg1 kernel: [<ffffffff81528a53>] wait_for_common+0x123/0x180 Apr 1 21:26:58 sg1 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20 Apr 1 21:26:58 sg1 kernel: [<ffffffff81528b6d>] wait_for_completion+0x1d/0x20 Apr 1 21:26:58 sg1 kernel: [<ffffffff81097108>] synchronize_sched+0x58/0x60 Apr 1 21:26:58 sg1 kernel: [<ffffffff81097090>] ? wakeme_after_rcu+0x0/0x20 Apr 1 21:26:58 sg1 kernel: [<ffffffff812229dc>] install_session_keyring_to_cred+0x6c/0xd0 Apr 1 21:26:58 sg1 kernel: [<ffffffff81222b73>] join_session_keyring+0x133/0x160 Apr 1 21:26:58 sg1 kernel: [<ffffffff810e2057>] ? 
audit_syscall_entry+0x1d7/0x200 Apr 1 21:26:58 sg1 kernel: [<ffffffff81221778>] keyctl_join_session_keyring+0x38/0x70 Apr 1 21:26:58 sg1 kernel: [<ffffffff812223a0>] sys_keyctl+0x170/0x190 Apr 1 21:26:58 sg1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Apr 1 21:28:58 sg1 kernel: INFO: task crond:11598 blocked for more than 120 seconds. Apr 1 21:28:58 sg1 kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1 Apr 1 21:28:58 sg1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 1 21:28:58 sg1 kernel: crond D 0000000000000008 0 11598 7120 0x00000080 Apr 1 21:28:58 sg1 kernel: ffff88011257bd38 0000000000000086 0000000000000000 0000000000000000 Apr 1 21:28:58 sg1 kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Apr 1 21:28:58 sg1 kernel: ffff88063208dab8 ffff88011257bfd8 000000000000fbc8 ffff88063208dab8 Apr 1 21:28:58 sg1 kernel: Call Trace: Apr 1 21:28:58 sg1 kernel: [<ffffffff81528dd5>] schedule_timeout+0x215/0x2e0 Apr 1 21:28:58 sg1 kernel: [<ffffffff81330968>] ? extract_entropy+0x108/0x1f0 Apr 1 21:28:58 sg1 kernel: [<ffffffff81528a53>] wait_for_common+0x123/0x180 Apr 1 21:28:58 sg1 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20 Apr 1 21:28:58 sg1 kernel: [<ffffffff81528b6d>] wait_for_completion+0x1d/0x20 Apr 1 21:28:58 sg1 kernel: [<ffffffff81097108>] synchronize_sched+0x58/0x60 Apr 1 21:28:58 sg1 kernel: [<ffffffff81097090>] ? wakeme_after_rcu+0x0/0x20 Apr 1 21:28:58 sg1 kernel: [<ffffffff812229dc>] install_session_keyring_to_cred+0x6c/0xd0 Apr 1 21:28:58 sg1 kernel: [<ffffffff81222b73>] join_session_keyring+0x133/0x160 Apr 1 21:28:58 sg1 kernel: [<ffffffff810e2057>] ? 
audit_syscall_entry+0x1d7/0x200 Apr 1 21:28:58 sg1 kernel: [<ffffffff81221778>] keyctl_join_session_keyring+0x38/0x70 Apr 1 21:28:58 sg1 kernel: [<ffffffff812223a0>] sys_keyctl+0x170/0x190 Apr 1 21:28:58 sg1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Apr 1 21:30:58 sg1 kernel: INFO: task crond:11598 blocked for more than 120 seconds. Apr 1 21:30:58 sg1 kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1 Apr 1 21:30:58 sg1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 1 21:30:58 sg1 kernel: crond D 0000000000000008 0 11598 7120 0x00000080 Apr 1 21:30:58 sg1 kernel: ffff88011257bd38 0000000000000086 0000000000000000 0000000000000000 Apr 1 21:30:58 sg1 kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Apr 1 21:30:58 sg1 kernel: ffff88063208dab8 ffff88011257bfd8 000000000000fbc8 ffff88063208dab8 Apr 1 21:30:58 sg1 kernel: Call Trace: Apr 1 21:30:58 sg1 kernel: [<ffffffff81528dd5>] schedule_timeout+0x215/0x2e0 Apr 1 21:30:58 sg1 kernel: [<ffffffff81330968>] ? extract_entropy+0x108/0x1f0 Apr 1 21:30:58 sg1 kernel: [<ffffffff81528a53>] wait_for_common+0x123/0x180 Apr 1 21:30:58 sg1 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20 Apr 1 21:30:58 sg1 kernel: [<ffffffff81528b6d>] wait_for_completion+0x1d/0x20 Apr 1 21:30:58 sg1 kernel: [<ffffffff81097108>] synchronize_sched+0x58/0x60 Apr 1 21:30:58 sg1 kernel: [<ffffffff81097090>] ? wakeme_after_rcu+0x0/0x20 Apr 1 21:30:58 sg1 kernel: [<ffffffff812229dc>] install_session_keyring_to_cred+0x6c/0xd0 Apr 1 21:30:58 sg1 kernel: [<ffffffff81222b73>] join_session_keyring+0x133/0x160 Apr 1 21:30:58 sg1 kernel: [<ffffffff810e2057>] ? 
audit_syscall_entry+0x1d7/0x200 Apr 1 21:30:58 sg1 kernel: [<ffffffff81221778>] keyctl_join_session_keyring+0x38/0x70 Apr 1 21:30:58 sg1 kernel: [<ffffffff812223a0>] sys_keyctl+0x170/0x190 Apr 1 21:30:58 sg1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Apr 1 21:32:58 sg1 kernel: INFO: task crond:11598 blocked for more than 120 seconds. Apr 1 21:32:58 sg1 kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1 Apr 1 21:32:58 sg1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 1 21:32:58 sg1 kernel: crond D 0000000000000008 0 11598 7120 0x00000080 Apr 1 21:32:58 sg1 kernel: ffff88011257bd38 0000000000000086 0000000000000000 0000000000000000 Apr 1 21:32:58 sg1 kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Apr 1 21:32:58 sg1 kernel: ffff88063208dab8 ffff88011257bfd8 000000000000fbc8 ffff88063208dab8 Apr 1 21:32:58 sg1 kernel: Call Trace: Apr 1 21:32:58 sg1 kernel: [<ffffffff81528dd5>] schedule_timeout+0x215/0x2e0 Apr 1 21:32:58 sg1 kernel: [<ffffffff81330968>] ? extract_entropy+0x108/0x1f0 Apr 1 21:32:58 sg1 kernel: [<ffffffff81528a53>] wait_for_common+0x123/0x180 Apr 1 21:32:58 sg1 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20 Apr 1 21:32:58 sg1 kernel: [<ffffffff81528b6d>] wait_for_completion+0x1d/0x20 Apr 1 21:32:58 sg1 kernel: [<ffffffff81097108>] synchronize_sched+0x58/0x60 Apr 1 21:32:58 sg1 kernel: [<ffffffff81097090>] ? wakeme_after_rcu+0x0/0x20 Apr 1 21:32:58 sg1 kernel: [<ffffffff812229dc>] install_session_keyring_to_cred+0x6c/0xd0 Apr 1 21:32:58 sg1 kernel: [<ffffffff81222b73>] join_session_keyring+0x133/0x160 Apr 1 21:32:58 sg1 kernel: [<ffffffff810e2057>] ? 
audit_syscall_entry+0x1d7/0x200 Apr 1 21:32:58 sg1 kernel: [<ffffffff81221778>] keyctl_join_session_keyring+0x38/0x70 Apr 1 21:32:58 sg1 kernel: [<ffffffff812223a0>] sys_keyctl+0x170/0x190 Apr 1 21:32:58 sg1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Apr 1 21:32:58 sg1 kernel: INFO: task crond:11601 blocked for more than 120 seconds. Apr 1 21:32:58 sg1 kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1 Apr 1 21:32:58 sg1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 1 21:32:58 sg1 kernel: crond D 0000000000000008 0 11601 7120 0x00000080 Apr 1 21:32:58 sg1 kernel: ffff880102599d38 0000000000000082 0000000000000000 0000000000000000 Apr 1 21:32:58 sg1 kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Apr 1 21:32:58 sg1 kernel: ffff880633393058 ffff880102599fd8 000000000000fbc8 ffff880633393058 Apr 1 21:32:58 sg1 kernel: Call Trace: Apr 1 21:32:58 sg1 kernel: [<ffffffff81528dd5>] schedule_timeout+0x215/0x2e0 Apr 1 21:32:58 sg1 kernel: [<ffffffff81330968>] ? extract_entropy+0x108/0x1f0 Apr 1 21:32:58 sg1 kernel: [<ffffffff81528a53>] wait_for_common+0x123/0x180 Apr 1 21:32:58 sg1 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20 Apr 1 21:32:58 sg1 kernel: [<ffffffff81528b6d>] wait_for_completion+0x1d/0x20 Apr 1 21:32:58 sg1 kernel: [<ffffffff81097108>] synchronize_sched+0x58/0x60 Apr 1 21:32:58 sg1 kernel: [<ffffffff81097090>] ? wakeme_after_rcu+0x0/0x20 Apr 1 21:32:58 sg1 kernel: [<ffffffff812229dc>] install_session_keyring_to_cred+0x6c/0xd0 Apr 1 21:32:58 sg1 kernel: [<ffffffff81222b73>] join_session_keyring+0x133/0x160 Apr 1 21:32:58 sg1 kernel: [<ffffffff810e2057>] ? 
audit_syscall_entry+0x1d7/0x200 Apr 1 21:32:58 sg1 kernel: [<ffffffff81221778>] keyctl_join_session_keyring+0x38/0x70 Apr 1 21:32:58 sg1 kernel: [<ffffffff812223a0>] sys_keyctl+0x170/0x190 Apr 1 21:32:58 sg1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Apr 1 21:34:58 sg1 kernel: INFO: task crond:11598 blocked for more than 120 seconds. Apr 1 21:34:58 sg1 kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1 Apr 1 21:34:58 sg1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 1 21:34:58 sg1 kernel: crond D 0000000000000008 0 11598 7120 0x00000080 Apr 1 21:34:58 sg1 kernel: ffff88011257bd38 0000000000000086 0000000000000000 0000000000000000 Apr 1 21:34:58 sg1 kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Apr 1 21:34:58 sg1 kernel: ffff88063208dab8 ffff88011257bfd8 000000000000fbc8 ffff88063208dab8 Apr 1 21:34:58 sg1 kernel: Call Trace: Apr 1 21:34:58 sg1 kernel: [<ffffffff81528dd5>] schedule_timeout+0x215/0x2e0 Apr 1 21:34:58 sg1 kernel: [<ffffffff81330968>] ? extract_entropy+0x108/0x1f0 Apr 1 21:34:58 sg1 kernel: [<ffffffff81528a53>] wait_for_common+0x123/0x180 Apr 1 21:34:58 sg1 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20 Apr 1 21:34:58 sg1 kernel: [<ffffffff81528b6d>] wait_for_completion+0x1d/0x20 Apr 1 21:34:58 sg1 kernel: [<ffffffff81097108>] synchronize_sched+0x58/0x60 Apr 1 21:34:58 sg1 kernel: [<ffffffff81097090>] ? wakeme_after_rcu+0x0/0x20 Apr 1 21:34:58 sg1 kernel: [<ffffffff812229dc>] install_session_keyring_to_cred+0x6c/0xd0 Apr 1 21:34:58 sg1 kernel: [<ffffffff81222b73>] join_session_keyring+0x133/0x160 Apr 1 21:34:58 sg1 kernel: [<ffffffff810e2057>] ? 
audit_syscall_entry+0x1d7/0x200 Apr 1 21:34:58 sg1 kernel: [<ffffffff81221778>] keyctl_join_session_keyring+0x38/0x70 Apr 1 21:34:58 sg1 kernel: [<ffffffff812223a0>] sys_keyctl+0x170/0x190 Apr 1 21:34:58 sg1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Apr 1 21:34:58 sg1 kernel: INFO: task crond:11601 blocked for more than 120 seconds. Apr 1 21:34:58 sg1 kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1 Apr 1 21:34:58 sg1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 1 21:34:58 sg1 kernel: crond D 0000000000000008 0 11601 7120 0x00000080 Apr 1 21:34:58 sg1 kernel: ffff880102599d38 0000000000000082 0000000000000000 0000000000000000 Apr 1 21:34:58 sg1 kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Apr 1 21:34:58 sg1 kernel: ffff880633393058 ffff880102599fd8 000000000000fbc8 ffff880633393058 Apr 1 21:34:58 sg1 kernel: Call Trace: Apr 1 21:34:58 sg1 kernel: [<ffffffff81528dd5>] schedule_timeout+0x215/0x2e0 Apr 1 21:34:58 sg1 kernel: [<ffffffff81330968>] ? extract_entropy+0x108/0x1f0 Apr 1 21:34:58 sg1 kernel: [<ffffffff81528a53>] wait_for_common+0x123/0x180 Apr 1 21:34:58 sg1 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20 Apr 1 21:34:58 sg1 kernel: [<ffffffff81528b6d>] wait_for_completion+0x1d/0x20 Apr 1 21:34:58 sg1 kernel: [<ffffffff81097108>] synchronize_sched+0x58/0x60 Apr 1 21:34:58 sg1 kernel: [<ffffffff81097090>] ? wakeme_after_rcu+0x0/0x20 Apr 1 21:34:58 sg1 kernel: [<ffffffff812229dc>] install_session_keyring_to_cred+0x6c/0xd0 Apr 1 21:34:58 sg1 kernel: [<ffffffff81222b73>] join_session_keyring+0x133/0x160 Apr 1 21:34:58 sg1 kernel: [<ffffffff810e2057>] ? 
audit_syscall_entry+0x1d7/0x200 Apr 1 21:34:58 sg1 kernel: [<ffffffff81221778>] keyctl_join_session_keyring+0x38/0x70 Apr 1 21:34:58 sg1 kernel: [<ffffffff812223a0>] sys_keyctl+0x170/0x190 Apr 1 21:34:58 sg1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Apr 1 21:36:58 sg1 kernel: INFO: task crond:11598 blocked for more than 120 seconds. Apr 1 21:36:58 sg1 kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1 Apr 1 21:36:58 sg1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 1 21:36:58 sg1 kernel: crond D 0000000000000008 0 11598 7120 0x00000080 Apr 1 21:36:58 sg1 kernel: ffff88011257bd38 0000000000000086 0000000000000000 0000000000000000 Apr 1 21:36:58 sg1 kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Apr 1 21:36:58 sg1 kernel: ffff88063208dab8 ffff88011257bfd8 000000000000fbc8 ffff88063208dab8 Apr 1 21:36:58 sg1 kernel: Call Trace: Apr 1 21:36:58 sg1 kernel: [<ffffffff81528dd5>] schedule_timeout+0x215/0x2e0 Apr 1 21:36:58 sg1 kernel: [<ffffffff81330968>] ? extract_entropy+0x108/0x1f0 Apr 1 21:36:58 sg1 kernel: [<ffffffff81528a53>] wait_for_common+0x123/0x180 Apr 1 21:36:58 sg1 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20 Apr 1 21:36:58 sg1 kernel: [<ffffffff81528b6d>] wait_for_completion+0x1d/0x20 Apr 1 21:36:58 sg1 kernel: [<ffffffff81097108>] synchronize_sched+0x58/0x60 Apr 1 21:36:58 sg1 kernel: [<ffffffff81097090>] ? wakeme_after_rcu+0x0/0x20 Apr 1 21:36:58 sg1 kernel: [<ffffffff812229dc>] install_session_keyring_to_cred+0x6c/0xd0 Apr 1 21:36:58 sg1 kernel: [<ffffffff81222b73>] join_session_keyring+0x133/0x160 Apr 1 21:36:58 sg1 kernel: [<ffffffff810e2057>] ? 
audit_syscall_entry+0x1d7/0x200
Apr 1 21:36:58 sg1 kernel: [<ffffffff81221778>] keyctl_join_session_keyring+0x38/0x70
Apr 1 21:36:58 sg1 kernel: [<ffffffff812223a0>] sys_keyctl+0x170/0x190
Apr 1 21:36:58 sg1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

(at 10:57pm, I power cycle it)

Apr 1 22:57:43 sg1 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Apr 1 22:57:43 sg1 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="2232" x-info="http://www.rsyslog.com"] start
Apr 1 22:57:43 sg1 kernel: Initializing cgroup subsys cpuset
Apr 1 22:57:43 sg1 kernel: Initializing cgroup subsys cpu
Apr 1 22:57:43 sg1 kernel: Linux version 2.6.32-431.11.2.el6.x86_64 (mockbuild at c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Tue Mar 25 19:59:55 UTC 2014
Apr 1 22:57:43 sg1 kernel: Command line: ro root=/dev/mapper/vg_sg1-lv_root rd_NO_LUKS rd_LVM_LV=vg_sg1/lv_root rd_LVM_LV=vg_sg1/lv_swap rd_NO_MD quiet SYSFONT=latarcyrheb-sun16 rhgb KEYBOARDTYPE=pc KEYTABLE=us crashkernel=auto rhgb quiet rd_NO_DM LANG=en_US.UTF-8
......

-- 
john r pierce 37N 122W
somewhere on the middle of the left coast
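The two most telling lines in the log above are the "irq 70: nobody cared" report and the "handlers:" entry that names the driver registered on that interrupt. As a rough sketch only, the following Python pulls those pairs out of a messages file. The function name find_unhandled_irqs and both regular expressions are illustrative assumptions based on the lines shown in this post, not a general kernel log parser.

```python
import re

# Patterns modelled on the log lines in this post (an assumption, not a spec).
IRQ_RE = re.compile(r"irq (\d+): nobody cared")
HANDLER_RE = re.compile(r"\((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]\)")


def find_unhandled_irqs(lines):
    """Return (irq, handler_function, module) for each 'nobody cared' report."""
    results = []
    current_irq = None
    in_handlers = False
    for line in lines:
        match = IRQ_RE.search(line)
        if match:
            current_irq = int(match.group(1))
            in_handlers = False
            continue
        if current_irq is None:
            continue
        if line.rstrip().endswith("handlers:"):
            in_handlers = True
            continue
        if in_handlers:
            handler = HANDLER_RE.search(line)
            if handler:
                results.append((current_irq, handler.group(1), handler.group(2)))
            else:
                # First non-handler line ends the handler list for this report.
                current_irq = None
                in_handlers = False
    return results


if __name__ == "__main__":
    sample = [
        'Apr 1 21:19:23 sg1 kernel: irq 70: nobody cared (try booting with the "irqpoll" option)',
        "Apr 1 21:19:23 sg1 kernel: Call Trace:",
        "Apr 1 21:19:23 sg1 kernel: handlers:",
        "Apr 1 21:19:23 sg1 kernel: [<ffffffffa01bd260>] (e1000_msix_other+0x0/0x1f0 [e1000e])",
        "Apr 1 21:19:23 sg1 kernel: Disabling IRQ #70",
    ]
    for irq, func, module in find_unhandled_irqs(sample):
        print(f"IRQ {irq}: handler {func} in module {module}")
```

On the excerpt above this points at the e1000e module, which is the driver for the Intel NIC.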
On Wed, 2 Apr 2014, John R Pierce wrote:

> I have a server thats been running fine for a year or two lock up a few
> times recently, requiring power cycling.
>
> The /var/log/messages after a lockup last night is appended to this
> message.
>
> hardware is a pretty typical server, Supermicro X8DTE-F motherboard,
> dual Xeon X5650, 48GB ECC memory, LSI SAS 2008 for the boot disks,
> and LSI MegaRAID SAS 9261-8i for the data volume. Lots of 3TB disks
> in a raid60. Primary application is BackupPC v3.3.0 (from EPEL), it
> also has an NFS export (also used for backup purposes).
>
> Runs CentOS 6.latest (kernel 2.6.32-431.11.2.el6.x86_64). X is not
> loaded (inittab level 3). selinux is permissive, iptables is not
> loaded. this server is on a corporate internal network, 1 Intel
> 82574L NIC configured with static IP, 2nd one is not in use.

The lovely Supermicro 82574L bug! There was a similar thread about a
month ago:

http://lists.centos.org/pipermail/centos/2014-March/141348.html

The short answer that's worked for me: Add pcie_aspm=off to your
boot-time kernel options. I've also started running the kmod-e1000e
package from elrepo.org.

Also, if MSI-X isn't already turned off in the BIOS, others have
suggested making sure it is.

-- 
Paul Heinlein
heinlein at madboa.com
45°38' N, 122°6' W
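For anyone trying the pcie_aspm=off suggestion: on CentOS 6 a boot option is normally appended to the kernel line in /boot/grub/grub.conf, and after a reboot the command line the kernel actually received can be read from /proc/cmdline. The sketch below only checks for the option. The helper name has_kernel_param is hypothetical, and the sample string is an abbreviated form of the command line posted earlier in the thread.

```python
import os


def has_kernel_param(cmdline, param):
    """True if `param` appears as a whole whitespace-separated word."""
    return param in cmdline.split()


if __name__ == "__main__":
    # Abbreviated from the command line in the original post (illustrative only).
    before = "ro root=/dev/mapper/vg_sg1-lv_root rd_NO_LUKS quiet crashkernel=auto"
    after = before + " pcie_aspm=off"
    print("before:", has_kernel_param(before, "pcie_aspm=off"))
    print("after: ", has_kernel_param(after, "pcie_aspm=off"))

    # On a Linux host, also check the command line of the running kernel.
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as handle:
            running = handle.read()
        print("running kernel:", has_kernel_param(running, "pcie_aspm=off"))
```

Matching whole words rather than substrings avoids a false positive from a differently named option that happens to contain the same text.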
John R Pierce wrote:

> I have a server thats been running fine for a year or two lock up a few
> times recently, requiring power cycling.
>
> The /var/log/messages after a lockup last night is appended to this
> message.
>
> hardware is a pretty typical server, Supermicro X8DTE-F motherboard,
> dual Xeon X5650, 48GB ECC memory, LSI SAS 2008 for the boot disks, and
> LSI MegaRAID SAS 9261-8i for the data volume. Lots of 3TB disks in a
> raid60. Primary application is BackupPC v3.3.0 (from EPEL), it also
> has an NFS export (also used for backup purposes).
>
> Runs CentOS 6.latest (kernel 2.6.32-431.11.2.el6.x86_64). X is not
> loaded (inittab level 3). selinux is permissive, iptables is not
> loaded. this server is on a corporate internal network, 1 Intel 82574L
> NIC configured with static IP, 2nd one is not in use.
>
> any clues what to try? I'm hesitant to enable irqpoll as I hear that
> it is a real performance sucker.
>
>
<SNIP>
> Apr 1 21:34:58 sg1 kernel: [<ffffffff812223a0>] sys_keyctl+0x170/0x190
> Apr 1 21:34:58 sg1 kernel: [<ffffffff8100b072>]
> system_call_fastpath+0x16/0x1b
> Apr 1 21:36:58 sg1 kernel: INFO: task crond:11598 blocked for more than
> 120 seconds.
> Apr 1 21:36:58 sg1 kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
> Apr 1 21:36:58 sg1 kernel: "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Apr 1 21:36:58 sg1 kernel: crond D 0000000000000008 0
> 11598 7120 0x00000080
> Apr 1 21:36:58 sg1 kernel: ffff88011257bd38 0000000000000086
> 0000000000000000 0000000000000000
> Apr 1 21:36:58 sg1 kernel: 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000
> Apr 1 21:36:58 sg1 kernel: ffff88063208dab8 ffff88011257bfd8
> 000000000000fbc8 ffff88063208dab8
> Apr 1 21:36:58 sg1 kernel: Call Trace:
> Apr 1 21:36:58 sg1 kernel: [<ffffffff81528dd5>]
> schedule_timeout+0x215/0x2e0
> Apr 1 21:36:58 sg1 kernel: [<ffffffff81330968>] ?
> extract_entropy+0x108/0x1f0
> Apr 1 21:36:58 sg1 kernel: [<ffffffff81528a53>]
> wait_for_common+0x123/0x180
> Apr 1 21:36:58 sg1 kernel: [<ffffffff81065df0>] ?
> default_wake_function+0x0/0x20
> Apr 1 21:36:58 sg1 kernel: [<ffffffff81528b6d>]
> wait_for_completion+0x1d/0x20
> Apr 1 21:36:58 sg1 kernel: [<ffffffff81097108>]
> synchronize_sched+0x58/0x60
> Apr 1 21:36:58 sg1 kernel: [<ffffffff81097090>] ?
> wakeme_after_rcu+0x0/0x20
> Apr 1 21:36:58 sg1 kernel: [<ffffffff812229dc>]
> install_session_keyring_to_cred+0x6c/0xd0
> Apr 1 21:36:58 sg1 kernel: [<ffffffff81222b73>]
> join_session_keyring+0x133/0x160
> Apr 1 21:36:58 sg1 kernel: [<ffffffff810e2057>] ?
> audit_syscall_entry+0x1d7/0x200
> Apr 1 21:36:58 sg1 kernel: [<ffffffff81221778>]
> keyctl_join_session_keyring+0x38/0x70
> Apr 1 21:36:58 sg1 kernel: [<ffffffff812223a0>] sys_keyctl+0x170/0x190
> Apr 1 21:36:58 sg1 kernel: [<ffffffff8100b072>]
> system_call_fastpath+0x16/0x1b
>
> (at 10:57pm, I power cycle it)
> Apr 1 22:57:43 sg1 kernel: imklog 5.8.10, log source = /proc/kmsg
> started.
> Apr 1 22:57:43 sg1 rsyslogd: [origin software="rsyslogd"
> swVersion="5.8.10" x-pid="2232" x-info="http://www.rsyslog.com"] start
> Apr 1 22:57:43 sg1 kernel: Initializing cgroup subsys cpuset
> Apr 1 22:57:43 sg1 kernel: Initializing cgroup subsys cpu
> Apr 1 22:57:43 sg1 kernel: Linux version 2.6.32-431.11.2.el6.x86_64
> (mockbuild at c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red
> Hat 4.4.7-4) (GCC) ) #1 SMP Tue Mar 25 19:59:55 UTC 2014
> Apr 1 22:57:43 sg1 kernel: Command line: ro
> root=/dev/mapper/vg_sg1-lv_root rd_NO_LUKS rd_LVM_LV=vg_sg1/lv_root
> rd_LVM_LV=vg_sg1/lv_swap r
> d_NO_MD quiet SYSFONT=latarcyrheb-sun16 rhgb KEYBOARDTYPE=pc
> KEYTABLE=us crashkernel=auto rhgb quiet rd_NO_DM LANG=en_US.UTF-8
> ......

I see when it last reported, and I see when you restarted.
Could you give me one more piece of info: run sar for that day. I'm
curious whether the last thing it reported was at 21:30, or whether it
kept reporting later. That might tell us if this is when it crashed, or
if it was something a bit later that left no trail.

        mark
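The same "when did it last report" question can also be asked of /var/log/messages itself. The following is a minimal sketch, not a replacement for the sar data requested above: it finds the last timestamped entry before the logger restarts and reports the gap. The function names are hypothetical, the "imklog" restart marker is taken from the log in this thread, and the year must be supplied because classic syslog timestamps omit it.

```python
from datetime import datetime


def parse_stamp(line, year):
    """Parse a leading 'Mon D HH:MM:SS' syslog timestamp.

    Raises ValueError when the line does not start with one.
    """
    month, day, clock = line.split(None, 3)[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def gap_before_restart(lines, year, marker="imklog"):
    """Return (last_entry, restart, gap) around the first restart marker, or None."""
    last_seen = None
    for line in lines:
        try:
            stamp = parse_stamp(line, year)
        except ValueError:
            continue  # not a log entry, e.g. a hand-written note
        if marker in line:
            if last_seen is None:
                return None
            return last_seen, stamp, stamp - last_seen
        last_seen = stamp
    return None


if __name__ == "__main__":
    sample = [
        "Apr 1 21:36:58 sg1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b",
        "(at 10:57pm, I power cycle it)",
        "Apr 1 22:57:43 sg1 kernel: imklog 5.8.10, log source = /proc/kmsg started.",
    ]
    result = gap_before_restart(sample, 2014)
    if result:
        last_entry, restart, gap = result
        print("last entry:", last_entry)
        print("restart:   ", restart)
        print("silent for:", gap)
```

A long silence before the restart line only shows when syslog stopped writing. The sar records would show whether the machine itself was still collecting data after that point.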