Jason Pyeron
2015-Feb-16 17:02 UTC
[CentOS] Intermittent problem, likely disk IO related - mptscsih: ioc0: attempting task abort!
> -----Original Message-----
> From: Jason Pyeron
> Sent: Sunday, February 08, 2015 0:00
>
> > -----Original Message-----
> > From: Jason Pyeron
> > Sent: Saturday, February 07, 2015 22:54
> >
> > NOTE: this is happening on Centos 6 x86_64,
> > 2.6.32-504.3.3.el6.x86_64 not Centos 5
> >
> > Dell PowerEdge 2970, Seagate SATA drive, non-raid.
> >
> > I have this server which has been dying randomly, with no logs.
> > Here is a console picture.
> > http://i.imgur.com/ZYHlB82.jpg

Thanks to netconsole, I have the panic to post:

Feb 16 06:06:56 BUG: soft lockup - CPU#0 stuck for 67s! [ksmd:88]
Feb 16 06:06:56 Modules linked in: nf_nat mpt3sas mpt2sas raid_class mptctl ipmi_si ipmi_devintf netconsole configfs ebtable_nat ebtables nfs lockd fscache auth_rpcgss nfs_acl sunrpc bridge stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_snapshot dm_bufio dm_zero vhost_net macvtap macvlan tun kvm_amd kvm ipmi_msghandler dcdbas serio_raw bnx2 k10temp amd64_edac_mod edac_core edac_mce_amd sg i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas ata_generic pata_acpi sata_svw radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dell_rbu]
Feb 16 06:06:56 CPU 0
Feb 16 06:06:56 Modules linked in: nf_nat mpt3sas mpt2sas raid_class mptctl ipmi_si ipmi_devintf netconsole configfs ebtable_nat ebtables nfs lockd fscache auth_rpcgss nfs_acl sunrpc bridge stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_snapshot dm_bufio dm_zero vhost_net macvtap macvlan tun kvm_amd kvm ipmi_msghandler dcdbas serio_raw bnx2 k10temp amd64_edac_mod edac_core edac_mce_amd sg i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas ata_generic pata_acpi sata_svw radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dell_rbu]
Feb 16 06:06:56 Pid: 88, comm: ksmd Not tainted 2.6.32-504.8.1.el6.centos.plus.x86_64 #1 Dell Inc. PowerEdge 2970/0JKN8W
Feb 16 06:06:56 RIP: 0010:[<ffffffff812a1411>]  [<ffffffff812a1411>] __bitmap_empty+0x41/0x90
Feb 16 06:06:56 RSP: 0018:ffff88021831dcb0  EFLAGS: 00000202
Feb 16 06:06:56 RAX: 0000000000000000 RBX: ffff88021831dcb0 RCX: 0000000000000010
Feb 16 06:06:56 RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffffffff81e2f198
Feb 16 06:06:56 RBP: ffffffff8100bb8e R08: 0000000000000000 R09: 0000000000000000
Feb 16 06:06:56 R10: ffffea0006679c20 R11: 0000000000000000 R12: 0000000000000000
Feb 16 06:06:56 R13: ffff8801c1b8f650 R14: 0000000198152467 R15: ffffffffa03af44a
Feb 16 06:06:56 FS:  00007fc4756b09a0(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
Feb 16 06:06:56 CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb 16 06:06:56 CR2: 000000c641faeff0 CR3: 0000000001a85000 CR4: 00000000000007f0
Feb 16 06:06:56 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 16 06:06:56 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 16 06:06:56 Process ksmd (pid: 88, threadinfo ffff88021831c000, task ffff880218310040)
Feb 16 06:06:56 Stack:
Feb 16 06:06:56  ffff88021831dd00 ffffffff81052268 00007f30249b8000 ffffffff81e2f180
Feb 16 06:06:56 <d> 8000000198152025 ffff880219ade700 00007f30249b8000 ffff880219ade9c8
Feb 16 06:06:56 <d> ffffea0006679c20 ffff880219e57ed0 ffff88021831dd30 ffffffff810522e6
Feb 16 06:06:56 Call Trace:
Feb 16 06:06:56  [<ffffffff81052268>] ? flush_tlb_others_ipi+0x128/0x130
Feb 16 06:06:56  [<ffffffff810522e6>] ? native_flush_tlb_others+0x76/0x90
Feb 16 06:06:56  [<ffffffff8105240e>] ? flush_tlb_page+0x5e/0xb0
Feb 16 06:06:56  [<ffffffff811721c2>] ? try_to_merge_with_ksm_page+0x532/0x660
Feb 16 06:06:56  [<ffffffff811731a4>] ? ksm_scan_thread+0xeb4/0x1120
Feb 16 06:06:56  [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
Feb 16 06:06:56  [<ffffffff811722f0>] ? ksm_scan_thread+0x0/0x1120
Feb 16 06:06:56  [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
Feb 16 06:06:56  [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Feb 16 06:06:56  [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Feb 16 06:06:56  [<ffffffff8100c200>] ? child_rip+0x0/0x20
Feb 16 06:06:56 Code: c0 7e 24 48 83 3f 00 48 89 f8 74 13 eb 5c 0f 1f 40 00 48 8b 48 08 48 83 c0 08 48 85 c9 75 4b 83 c2 01 41 39 d0 7f eb 40 f6 c6 3f <b8> 01 00 00 00 75 08 c9 c3 66 0f 1f 44 00 00 89 f0 48 63 d2 c1
Feb 16 06:06:56 Call Trace:
Feb 16 06:06:56  [<ffffffff81052268>] ? flush_tlb_others_ipi+0x128/0x130
Feb 16 06:06:56  [<ffffffff810522e6>] ? native_flush_tlb_others+0x76/0x90
Feb 16 06:06:56  [<ffffffff8105240e>] ? flush_tlb_page+0x5e/0xb0
Feb 16 06:06:56  [<ffffffff811721c2>] ? try_to_merge_with_ksm_page+0x532/0x660
Feb 16 06:06:56  [<ffffffff811731a4>] ? ksm_scan_thread+0xeb4/0x1120
Feb 16 06:06:56  [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
Feb 16 06:06:56  [<ffffffff811722f0>] ? ksm_scan_thread+0x0/0x1120
Feb 16 06:06:56  [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
Feb 16 06:06:56  [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Feb 16 06:06:56  [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Feb 16 06:06:56  [<ffffffff8100c200>] ? child_rip+0x0/0x20
Feb 16 06:07:01 Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
Feb 16 06:07:01 Pid: 1950, comm: qemu-kvm Not tainted 2.6.32-504.8.1.el6.centos.plus.x86_64 #1
Feb 16 06:07:01 Call Trace:
Feb 16 06:07:01  <NMI>
Feb 16 06:07:01  [<ffffffff81530bdc>] ? panic+0xa7/0x16f
Feb 16 06:07:01  [<ffffffff81014959>] ? sched_clock+0x9/0x10
Feb 16 06:07:01  [<ffffffff810ea65d>] ? watchdog_overflow_callback+0xcd/0xd0
Feb 16 06:07:01  [<ffffffff81120e07>] ? __perf_event_overflow+0xa7/0x240
Feb 16 06:07:01  [<ffffffff81119e14>] ? perf_event_update_userpage+0x24/0x110
Feb 16 06:07:01  [<ffffffff81121454>] ? perf_event_overflow+0x14/0x20
Feb 16 06:07:01  [<ffffffff8101e3fb>] ? x86_pmu_handle_irq+0x1eb/0x250
Feb 16 06:07:01  [<ffffffff81535ed9>] ? perf_event_nmi_handler+0x39/0xb0
Feb 16 06:07:01  [<ffffffff81537995>] ? notifier_call_chain+0x55/0x80
Feb 16 06:07:01  [<ffffffff815379fa>] ? atomic_notifier_call_chain+0x1a/0x20
Feb 16 06:07:01  [<ffffffff810a4ede>] ? notify_die+0x2e/0x30
Feb 16 06:07:01  [<ffffffff8153565b>] ? do_nmi+0x1bb/0x340
Feb 16 06:07:01  [<ffffffff81534f20>] ? nmi+0x20/0x30
Feb 16 06:07:01  [<ffffffff8153478e>] ? _spin_lock+0x1e/0x30
Feb 16 06:07:01  <<EOE>>
Feb 16 06:07:01  [<ffffffff8114fdd3>] ? handle_pte_fault+0x833/0xb00
Feb 16 06:07:01  [<ffffffffa03987da>] ? kvm_ioapic_update_eoi+0x8a/0xf0 [kvm]
Feb 16 06:07:01  [<ffffffff811502ca>] ? handle_mm_fault+0x22a/0x300
Feb 16 06:07:01  [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480
Feb 16 06:07:01  [<ffffffff8105d7d1>] ? update_curr+0xe1/0x1f0
Feb 16 06:07:01  [<ffffffff81063bf3>] ? perf_event_task_sched_out+0x33/0x70
Feb 16 06:07:01  [<ffffffff8100bc0e>] ? invalidate_interrupt0+0xe/0x20
Feb 16 06:07:01  [<ffffffff81060c0c>] ? finish_task_switch+0x4c/0xf0
Feb 16 06:07:01  [<ffffffff815378de>] ? do_page_fault+0x3e/0xa0
Feb 16 06:07:01  [<ffffffff81534c95>] ? page_fault+0x25/0x30
Feb 16 06:07:01  [<ffffffff8129e862>] ? copy_user_generic_string+0x32/0x40
Feb 16 06:07:01  [<ffffffffa03926ab>] ? kvm_write_guest_cached+0x7b/0xa0 [kvm]
Feb 16 06:07:01  [<ffffffffa03bf61f>] ? kvm_lapic_sync_to_vapic+0xcf/0x220 [kvm]
Feb 16 06:07:01  [<ffffffffa03bdfb8>] ? kvm_apic_has_interrupt+0x48/0xd0 [kvm]
Feb 16 06:07:01  [<ffffffffa03ac24d>] ? kvm_arch_vcpu_ioctl_run+0x93d/0x1010 [kvm]
Feb 16 06:07:01  [<ffffffff810b2b73>] ? futex_wake+0x93/0x150
Feb 16 06:07:01  [<ffffffffa0392b04>] ? kvm_vcpu_ioctl+0x434/0x580 [kvm]
Feb 16 06:07:01  [<ffffffff81063bf3>] ? perf_event_task_sched_out+0x33/0x70
Feb 16 06:07:01  [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
Feb 16 06:07:01  [<ffffffff811a3e92>] ? vfs_ioctl+0x22/0xa0
Feb 16 06:07:01  [<ffffffff811a435a>] ? do_vfs_ioctl+0x3aa/0x580
Feb 16 06:07:01  [<ffffffff811a45b1>] ? sys_ioctl+0x81/0xa0
Feb 16 06:07:01  [<ffffffff810e5afe>] ? __audit_syscall_exit+0x25e/0x290
Feb 16 06:07:01  [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Feb 16 06:07:01 drm_kms_helper: panic occurred, switching back to text console
Feb 16 06:07:01 BUG: scheduling while atomic: qemu-kvm/1950/0x14010000
Feb 16 06:07:01 Modules linked in: nf_nat mpt3sas mpt2sas raid_class mptctl ipmi_si ipmi_devintf netconsole configfs ebtable_nat ebtables nfs lockd fscache auth_rpcgss nfs_acl sunrpc bridge stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_snapshot dm_bufio dm_zero vhost_net macvtap macvlan tun kvm_amd kvm ipmi_msghandler dcdbas serio_raw bnx2 k10temp amd64_edac_mod edac_core edac_mce_amd sg i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas ata_generic pata_acpi sata_svw radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dell_rbu]
Feb 16 06:07:01 Pid: 1950, comm: qemu-kvm Not tainted 2.6.32-504.8.1.el6.centos.plus.x86_64 #1
Feb 16 06:07:01 Call Trace:
Feb 16 06:07:01  <NMI>
Feb 16 06:07:01  [<ffffffff81060bb6>] ? __schedule_bug+0x66/0x70
Feb 16 06:07:01  [<ffffffff8153193c>] ? thread_return+0x6ac/0x7d0
Feb 16 06:07:01  [<ffffffffa002e35d>] ? write_msg+0xfd/0x110 [netconsole]
Feb 16 06:07:01  [<ffffffffa00b2d0e>] ? drm_crtc_helper_set_config+0x1be/0xa60 [drm_kms_helper]
Feb 16 06:07:01  [<ffffffff8106c85a>] ? __cond_resched+0x2a/0x40
Feb 16 06:07:01  [<ffffffff81531d30>] ? _cond_resched+0x30/0x40
Feb 16 06:07:01  [<ffffffff81174e18>] ? __kmalloc+0x138/0x230
Feb 16 06:07:01  [<ffffffff810ba332>] ? __module_text_address+0x12/0x60
Feb 16 06:07:01  [<ffffffffa00b2d0e>] ? drm_crtc_helper_set_config+0x1be/0xa60 [drm_kms_helper]
Feb 16 06:07:01  [<ffffffffa013df27>] ? r100_mm_wreg+0x67/0x90 [radeon]
Feb 16 06:07:01  [<ffffffffa01332d2>] ? radeon_crtc_cursor_set+0x92/0x6e0 [radeon]
Feb 16 06:07:01  [<ffffffffa005e40c>] ? drm_mode_set_config_internal+0x5c/0xe0 [drm]
Feb 16 06:07:01  [<ffffffffa00b0653>] ? drm_fb_helper_restore_fbdev_mode+0xb3/0xe0 [drm_kms_helper]
Feb 16 06:07:01  [<ffffffffa00b0788>] ? drm_fb_helper_panic+0x78/0xa0 [drm_kms_helper]
Feb 16 06:07:01  [<ffffffff81537995>] ? notifier_call_chain+0x55/0x80
Feb 16 06:07:01  [<ffffffff815379fa>] ? atomic_notifier_call_chain+0x1a/0x20
Feb 16 06:07:01  [<ffffffff81530c07>] ? panic+0xd2/0x16f
Feb 16 06:07:01  [<ffffffff81014959>] ? sched_clock+0x9/0x10
Feb 16 06:07:01  [<ffffffff810ea65d>] ? watchdog_overflow_callback+0xcd/0xd0
Feb 16 06:07:01  [<ffffffff81120e07>] ? __perf_event_overflow+0xa7/0x240
Feb 16 06:07:01  [<ffffffff81119e14>] ? perf_event_update_userpage+0x24/0x110
Feb 16 06:07:01  [<ffffffff81121454>] ? perf_event_overflow+0x14/0x20
Feb 16 06:07:01  [<ffffffff8101e3fb>] ? x86_pmu_handle_irq+0x1eb/0x250
Feb 16 06:07:01  [<ffffffff81535ed9>] ? perf_event_nmi_handler+0x39/0xb0
Feb 16 06:07:01  [<ffffffff81537995>] ? notifier_call_chain+0x55/0x80
Feb 16 06:07:01  [<ffffffff815379fa>] ? atomic_notifier_call_chain+0x1a/0x20
Feb 16 06:07:01  [<ffffffff810a4ede>] ? notify_die+0x2e/0x30
Feb 16 06:07:01  [<ffffffff8153565b>] ? do_nmi+0x1bb/0x340
Feb 16 06:07:01  [<ffffffff81534f20>] ? nmi+0x20/0x30
Feb 16 06:07:01  [<ffffffff8153478e>] ? _spin_lock+0x1e/0x30
Feb 16 06:07:01  <<EOE>>
Feb 16 06:07:01  [<ffffffff8114fdd3>] ? handle_pte_fault+0x833/0xb00
Feb 16 06:07:01  [<ffffffffa03987da>] ? kvm_ioapic_update_eoi+0x8a/0xf0 [kvm]
Feb 16 06:07:01  [<ffffffff811502ca>] ? handle_mm_fault+0x22a/0x300
Feb 16 06:07:01  [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480
Feb 16 06:07:01  [<ffffffff8105d7d1>] ? update_curr+0xe1/0x1f0
Feb 16 06:07:01  [<ffffffff81063bf3>] ? perf_event_task_sched_out+0x33/0x70
Feb 16 06:07:01  [<ffffffff8100bc0e>] ? invalidate_interrupt0+0xe/0x20
Feb 16 06:07:01  [<ffffffff81060c0c>] ? finish_task_switch+0x4c/0xf0
Feb 16 06:07:01  [<ffffffff815378de>] ? do_page_fault+0x3e/0xa0
Feb 16 06:07:01  [<ffffffff81534c95>] ? page_fault+0x25/0x30
Feb 16 06:07:01  [<ffffffff8129e862>] ? copy_user_generic_string+0x32/0x40
Feb 16 06:07:01  [<ffffffffa03926ab>] ? kvm_write_guest_cached+0x7b/0xa0 [kvm]
Feb 16 06:07:01  [<ffffffffa03bf61f>] ? kvm_lapic_sync_to_vapic+0xcf/0x220 [kvm]
Feb 16 06:07:01  [<ffffffffa03bdfb8>] ? kvm_apic_has_interrupt+0x48/0xd0 [kvm]
Feb 16 06:07:01  [<ffffffffa03ac24d>] ? kvm_arch_vcpu_ioctl_run+0x93d/0x1010 [kvm]
Feb 16 06:07:01  [<ffffffff810b2b73>] ? futex_wake+0x93/0x150
Feb 16 06:07:01  [<ffffffffa0392b04>] ? kvm_vcpu_ioctl+0x434/0x580 [kvm]
Feb 16 06:07:01  [<ffffffff81063bf3>] ? perf_event_task_sched_out+0x33/0x70
Feb 16 06:07:01  [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
Feb 16 06:07:01  [<ffffffff811a3e92>] ? vfs_ioctl+0x22/0xa0
Feb 16 06:07:01  [<ffffffff811a435a>] ? do_vfs_ioctl+0x3aa/0x580
Feb 16 06:07:01  [<ffffffff811a45b1>] ? sys_ioctl+0x81/0xa0
Feb 16 06:07:01  [<ffffffff810e5afe>] ? __audit_syscall_exit+0x25e/0x290
Feb 16 06:07:01  [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Feb 16 06:07:01 Clocksource tsc unstable (delta = -77309385171 ns).  Enable clocksource failover by adding clocksource_failover kernel parameter.

> >
> > I had a tail -f over ssh for a week, when this just happened.
> >
> > Feb 8 00:10:21 thirteen-230 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff880057a0a080)
> > Feb 8 00:10:21 thirteen-230 kernel: sd 4:0:0:0: [sda] CDB: Write(10): 2a 00 1a 17 a1 6f 00 00 01 00
> > Feb 8 00:10:51 thirteen-230 kernel: mptscsih: ioc0: WARNING - Issuing Reset from mptscsih_IssueTaskMgmt!! doorbell=0x24000000
> > Feb 8 00:10:51 thirteen-230 kernel: mptbase: ioc0: Initiating recovery
> > Feb 8 00:11:13 thirteen-230 kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff880057a0a080)
> > Write failed: Connection reset by peer
> >
> > After reading https://access.redhat.com/solutions/108273, I
> > am increasing the logging (shown below) but I am not
> > confident about this wait and see approach.
> >
> > sysctl -w dev.scsi.logging_level=98367
> >
> > I am also going to check smartctl output once I get onsite to
> > power cycle the system.
>
> # smartctl -a /dev/sda
> smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-504.3.3.el6.x86_64] (local build)
> Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF INFORMATION SECTION ===
> Model Family:     Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
> Device Model:     ST1500DM003-9YN16G
> Serial Number:    W24153R0
> LU WWN Device Id: 5 000c50 05d03cc1d
> Firmware Version: CC82
> User Capacity:    1,500,301,910,016 bytes [1.50 TB]
> Sector Sizes:     512 bytes logical, 4096 bytes physical
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   8
> ATA Standard is:  ATA-8-ACS revision 4
> Local Time is:    Sat Feb 7 23:41:00 2015 EST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x00) Offline data collection activity
>                                         was never started.
>                                         Auto Offline Data Collection: Disabled.
> Self-test execution status:      (   0) The previous self-test routine completed
>                                         without error or no self-test has ever
>                                         been run.
> Total time to complete Offline
> data collection:                 (  600) seconds.
> Offline data collection
> capabilities:                    (0x73) SMART execute Offline immediate.
>                                         Auto Offline data collection on/off support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         No Offline surface scan supported.
>                                         Self-test supported.
>                                         Conveyance Self-test supported.
>                                         Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         General Purpose Logging supported.
> Short self-test routine
> recommended polling time:        (   1) minutes.
> Extended self-test routine
> recommended polling time:        ( 194) minutes.
> Conveyance self-test routine
> recommended polling time:        (   2) minutes.
> SCT capabilities:              (0x3085) SCT Status supported.
>
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       181943016
>   3 Spin_Up_Time            0x0003   092   092   000    Pre-fail  Always       -       0
>   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       17
>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       39599363
>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       821
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       17
> 183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
> 184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
> 188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
> 189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
> 190 Airflow_Temperature_Cel 0x0022   067   062   045    Old_age   Always       -       33 (Min/Max 30/33)
> 191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
> 192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       16
> 193 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       4551
> 194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always       -       33 (0 21 0 0 0)
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
> 240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       267112606073648
> 241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2764453802303
> 242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3442873711291
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> No self-tests have been logged.  [To run self-tests, use: smartctl -t]
>
>
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
> Selective self-test flags (0x0):
>   After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
>
> >
> > Other posts I have read, but I can not act on yet:
> >
> > * http://unix.stackexchange.com/questions/34173/mptscsih-ioc0-task-abort-success-rv-2002-causes-30-seconds-freezing
> > * https://bugzilla.kernel.org/show_bug.cgi?id=18652
> > * https://bugzilla.redhat.com/show_bug.cgi?id=483424
> > * https://bugzilla.kernel.org/show_bug.cgi?id=42765
> > * http://sourceforge.net/p/smartmontools/mailman/message/23849184/
> > * http://kb.softescu.ro/category/hardware/dell/
> >
> > -Jason

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-                                                               -
- Jason Pyeron                      PD Inc. http://www.pdinc.us -
- Principal Consultant              10 West 24th Street #100    -
- +1 (443) 269-1555 x333            Baltimore, Maryland 21218   -
-                                                               -
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
This message is copyright PD Inc, subject to license 20080407P00.
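The value passed to dev.scsi.logging_level above is a packed bitmask rather than a single verbosity number. The sketch below decodes and re-encodes it, assuming the field layout from the kernel header include/scsi/scsi_logging.h (ten SCSI midlayer facilities, each owning a 3-bit level from 0 to 7). The helper names decode_logging_level and encode_logging_level are hypothetical, not part of any existing tool.

```python
# Hypothetical helpers: decode/encode the value written to dev.scsi.logging_level.
# Assumption: the layout of include/scsi/scsi_logging.h, where each SCSI
# logging facility owns a 3-bit verbosity field (0 = off, 7 = most verbose).
SCSI_LOG_FIELDS = [
    ("ERROR", 0),
    ("TIMEOUT", 3),
    ("SCAN", 6),
    ("MLQUEUE", 9),
    ("MLCOMPLETE", 12),
    ("LLQUEUE", 15),
    ("LLCOMPLETE", 18),
    ("HLQUEUE", 21),
    ("HLCOMPLETE", 24),
    ("IOCTL", 27),
]
FIELD_MASK = 0b111  # every field is 3 bits wide


def decode_logging_level(value: int) -> dict:
    """Return {facility: level} for each facility with a non-zero level."""
    levels = {}
    for name, shift in SCSI_LOG_FIELDS:
        level = (value >> shift) & FIELD_MASK
        if level:
            levels[name] = level
    return levels


def encode_logging_level(levels: dict) -> int:
    """Pack {facility: level} back into a single integer for sysctl."""
    shifts = dict(SCSI_LOG_FIELDS)
    value = 0
    for name, level in levels.items():
        if not 0 <= level <= FIELD_MASK:
            raise ValueError(f"{name}: level {level} is outside 0-7")
        value |= level << shifts[name]
    return value


if __name__ == "__main__":
    print(decode_logging_level(98367))
```

Under that assumption, decode_logging_level(98367) returns {'ERROR': 7, 'TIMEOUT': 7, 'LLQUEUE': 3}: maximum error-handling and timeout logging, plus low-level queueing messages at level 3.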
Chris Murphy
2015-Feb-17 08:57 UTC
[CentOS] Intermittent problem, likely disk IO related - mptscsih: ioc0: attempting task abort!
I think the panic is the consequence of the drive write failure, so the actual problem is before the panic call trace. I'd post the entire dmesg somewhere wrap-safe (either your mail agent or the forum is hard wrapping, and it is a pain to read).

What do you get for smartctl -x <dev>?

In the meantime, check or replace cables; usually it's the connectors that are faulty, not the cable itself. Or replace the drive.

Chris Murphy
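While waiting for the full smartctl -x output, the attribute table from the smartctl -a output quoted earlier in the thread can already be checked mechanically. This is a minimal sketch that approximates smartctl's own WHEN_FAILED column: an attribute is failing now when its normalized VALUE is at or below a non-zero THRESH, and failed in the past when its WORST value is. It assumes the default smartctl -a / -A column layout (smartctl -x may print a condensed table this parser does not handle), and parse_attributes and when_failed are hypothetical names.

```python
# Hypothetical helpers: parse the smartctl -a / -A attribute table and flag
# attributes roughly the way smartctl's WHEN_FAILED column does.
# Assumption: the default 10-column layout shown in the quoted output; the
# condensed attribute table that smartctl -x may print is NOT handled.
from typing import List, NamedTuple


class Attr(NamedTuple):
    id: int
    name: str
    value: int   # current normalized value
    worst: int   # worst normalized value recorded
    thresh: int  # vendor failure threshold


def parse_attributes(text: str) -> List[Attr]:
    attrs = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID; RAW_VALUE may add extra tokens.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs.append(Attr(int(fields[0]), fields[1], int(fields[3]),
                          int(fields[4]), int(fields[5])))
    return attrs


def when_failed(attrs: List[Attr]) -> dict:
    """Map attribute name -> "FAILING_NOW" or "In_the_past"."""
    flagged = {}
    for a in attrs:
        if a.thresh == 0:
            continue  # simplification: a zero threshold never trips
        if a.value <= a.thresh:
            flagged[a.name] = "FAILING_NOW"
        elif a.worst <= a.thresh:
            flagged[a.name] = "In_the_past"
    return flagged
```

Fed the attribute table from the first message, when_failed returns an empty dict: every attribute with a non-zero threshold is still above it, which agrees with the PASSED overall-health result.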
Jason Pyeron
2015-Feb-17 14:54 UTC
[CentOS] Intermittent problem, likely disk IO related - mptscsih: ioc0: attempting task abort!
> -----Original Message-----
> From: Chris Murphy
> Sent: Tuesday, February 17, 2015 3:58
>
> I think the panic is the consequence of drive write failure. So the actual
> problem is before the panic call trace.

Most of the time it panics without any warning, but once there was:

> > -----Original Message-----
> > From: Jason Pyeron
> > Sent: Sunday, February 08, 2015 0:00
> >
> > > -----Original Message-----
> > > From: Jason Pyeron
> > > Sent: Saturday, February 07, 2015 22:54
> > >
> > > Feb 8 00:10:21 thirteen-230 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff880057a0a080)
> > > Feb 8 00:10:21 thirteen-230 kernel: sd 4:0:0:0: [sda] CDB: Write(10): 2a 00 1a 17 a1 6f 00 00 01 00
> > > Feb 8 00:10:51 thirteen-230 kernel: mptscsih: ioc0: WARNING - Issuing Reset from mptscsih_IssueTaskMgmt!! doorbell=0x24000000
> > > Feb 8 00:10:51 thirteen-230 kernel: mptbase: ioc0: Initiating recovery
> > > Feb 8 00:11:13 thirteen-230 kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff880057a0a080)

> I'd post the entire dmesg somewhere

http://client.pdinc.us/panic-341e97c30b5a4cb774942bae32d3f163.log

> wrap safe (either your mail agent or the forum is hard wrapping and is a
> pain to read).
>
> What do you get for
> smartctl -x <dev>

http://client.pdinc.us/smartctl-2000e86b62db27169cc9307358ebf10e.log

> In the meantime check or replace cables, usually it's the connectors that

It is a backplane, no "cables". I have reseated the parts.

> are faulty not the cable itself. Or replace the drive.

I have replaced the drive (and reinstalled) already; the panics still happen once every 30-40 hours.

>
> Chris Murphy
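The CDB in the quoted sd message identifies exactly which write timed out. The sketch below decodes it, assuming the standard SCSI WRITE(10) layout from the T10 SBC specification: opcode 0x2A in byte 0, a big-endian 32-bit LBA in bytes 2-5, and a big-endian 16-bit transfer length in bytes 7-8. decode_write10 is a hypothetical helper, not part of sg3_utils or any other tool.

```python
# Hypothetical helper: decode a SCSI WRITE(10) CDB as printed by the sd driver.
# Assumed layout (SBC WRITE(10)): byte 0 opcode 0x2A, byte 1 flags,
# bytes 2-5 LBA (big-endian), byte 6 group number,
# bytes 7-8 transfer length in logical blocks (big-endian), byte 9 control.
import struct

WRITE_10 = 0x2A


def decode_write10(cdb_hex: str) -> dict:
    cdb = bytes.fromhex(cdb_hex)  # whitespace between the hex bytes is ignored
    if len(cdb) != 10 or cdb[0] != WRITE_10:
        raise ValueError("not a WRITE(10) CDB")
    # ">BBIBHB" = big-endian u8, u8, u32, u8, u16, u8: exactly 10 bytes
    opcode, flags, lba, group, blocks, control = struct.unpack(">BBIBHB", cdb)
    return {"opcode": opcode, "flags": flags, "lba": lba,
            "group": group, "blocks": blocks, "control": control}


if __name__ == "__main__":
    print(decode_write10("2a 00 1a 17 a1 6f 00 00 01 00"))
```

For 2a 00 1a 17 a1 6f 00 00 01 00 this gives LBA 0x1a17a16f (437756271) and a transfer length of 1 block. With the 512-byte logical sectors smartctl reports, that is a single 512-byte write at byte offset 224131210752 on sda, roughly 224 GB into the 1.5 TB disk.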