Jason Pyeron
2015-Feb-17 14:54 UTC
[CentOS] Intermittent problem, likely disk IO related - mptscsih: ioc0: attempting task abort!
> -----Original Message-----
> From: Chris Murphy
> Sent: Tuesday, February 17, 2015 3:58
>
> I think the panic is the consequence of drive write failure. So the actual
> problem is before the panic call trace.

Most of the time it panics without any warning, but once there was:

> > -----Original Message-----
> > From: Jason Pyeron
> > Sent: Sunday, February 08, 2015 0:00
> >
> > > -----Original Message-----
> > > From: Jason Pyeron
> > > Sent: Saturday, February 07, 2015 22:54
> > >
> > > Feb 8 00:10:21 thirteen-230 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff880057a0a080)
> > > Feb 8 00:10:21 thirteen-230 kernel: sd 4:0:0:0: [sda] CDB: Write(10): 2a 00 1a 17 a1 6f 00 00 01 00
> > > Feb 8 00:10:51 thirteen-230 kernel: mptscsih: ioc0: WARNING - Issuing Reset from mptscsih_IssueTaskMgmt!! doorbell=0x24000000
> > > Feb 8 00:10:51 thirteen-230 kernel: mptbase: ioc0: Initiating recovery
> > > Feb 8 00:11:13 thirteen-230 kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff880057a0a080)

> I'd post the entire dmesg somewhere

http://client.pdinc.us/panic-341e97c30b5a4cb774942bae32d3f163.log

> wrap safe (either your mail agent or the forum is hard wrapping and is a
> pain to read).
>
> What do you get for
> smartctl -x <dev>

http://client.pdinc.us/smartctl-2000e86b62db27169cc9307358ebf10e.log

> In the meantime check or replace cables, usually it's the connectors that

It is a backplane, no "cables". I have reseated the parts.

> are faulty not the cable itself. Or replace the drive.

I have replaced the drive (and reinstalled) already; the panics still happen once every 30-40 hours.

> Chris Murphy

--
Jason Pyeron, Principal Consultant
PD Inc. http://www.pdinc.us
10 West 24th Street #100, Baltimore, Maryland 21218
+1 (443) 269-1555 x333
This message is copyright PD Inc, subject to license 20080407P00.
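For readers following the Feb 8 excerpt, here is a rough sketch that works out how long the controller was stuck between the abort attempt, the reset, and the reported success. The timestamps are copied from the quoted kernel messages; the year is an assumption (syslog lines do not carry one), and the function names are made up for this illustration.

```python
from datetime import datetime

# Timestamps and events copied from the Feb 8 kernel messages quoted above.
# Syslog lines carry no year; 2015 is assumed from the dates of the emails.
EVENTS = [
    ("Feb 8 00:10:21", "attempting task abort"),
    ("Feb 8 00:10:51", "issuing reset from mptscsih_IssueTaskMgmt"),
    ("Feb 8 00:11:13", "task abort: SUCCESS"),
]


def parse(stamp, year=2015):
    """Parse a syslog-style 'Mon D HH:MM:SS' stamp, supplying the missing year."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def gaps(events):
    """Return (seconds since the previous event, description) for each event."""
    out = []
    prev = None
    for stamp, what in events:
        t = parse(stamp)
        delta = 0 if prev is None else int((t - prev).total_seconds())
        out.append((delta, what))
        prev = t
    return out


if __name__ == "__main__":
    for delta, what in gaps(EVENTS):
        print(f"+{delta:>3d}s  {what}")
```

By this reading the abort sat for 30 seconds before the driver escalated to a reset, and recovery took another 22 seconds, so I/O to sda was stalled for roughly 52 seconds in total.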
Chris Murphy
2015-Feb-18 01:48 UTC
[CentOS] Intermittent problem, likely disk IO related - mptscsih: ioc0: attempting task abort!
On Tue, Feb 17, 2015 at 7:54 AM, Jason Pyeron <jpyeron at pdinc.us> wrote:

>> I'd post the entire dmesg somewhere
>
> http://client.pdinc.us/panic-341e97c30b5a4cb774942bae32d3f163.log

At least part of the problem happens before this log starts.

>> What do you get for
>> smartctl -x <dev>
>
> http://client.pdinc.us/smartctl-2000e86b62db27169cc9307358ebf10e.log

OK, no SMART extended test has been done, but there are also no pending or reallocated sectors, and no phy event errors either. So the Write(10) error seems isolated, but it's still really suspicious, so I'd start replacing hardware.

> I have replaced the drive (and reinstalled) already; the panics still happen once every 30-40 hours.

The only thing that suggests it might not be hardware is all the kvm-related messages in the kernel panic. So if you've changed kernels or VM configuration recently, I'd revert. That's the limit of the most likely software explanation. If there are no recent software changes, then it must be hardware.

--
Chris Murphy
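As a side note on the Write(10) error discussed above, the CDB bytes from the Feb 8 log line can be decoded by hand. The sketch below assumes the standard 10-byte WRITE(10) layout (opcode, flags, 4-byte big-endian LBA, group number, 2-byte big-endian transfer length, control); the function name and the returned dictionary keys are made up for this illustration.

```python
import struct


def decode_write10(cdb_hex):
    """Decode a 10-byte SCSI WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10:
        raise ValueError("WRITE(10) CDB must be exactly 10 bytes")
    # Big-endian fields: opcode, flags, LBA (4 bytes), group, length (2 bytes), control
    opcode, flags, lba, group, blocks, control = struct.unpack(">BBIBHB", cdb)
    if opcode != 0x2A:
        raise ValueError(f"not a WRITE(10) opcode: {opcode:#04x}")
    return {
        "opcode": opcode,
        "flags": flags,
        "lba": lba,
        "group": group,
        "blocks": blocks,
        "control": control,
    }


if __name__ == "__main__":
    # Bytes copied from the "[sda] CDB: Write(10)" line in the Feb 8 log.
    fields = decode_write10("2a 00 1a 17 a1 6f 00 00 01 00")
    print(f"LBA {fields['lba']} ({fields['lba']:#x}), {fields['blocks']} block(s)")
```

So the command that timed out was a single-block write at LBA 0x1a17a16f. If later aborts were to show the same or nearby LBAs, that would point more toward the media; the thread has only this one sample, so nothing can be concluded from it alone.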
Jason Pyeron
2015-Feb-18 02:34 UTC
[CentOS] Intermittent problem, likely disk IO related - mptscsih: ioc0: attempting task abort!
> -----Original Message-----
> From: Chris Murphy
> Sent: Tuesday, February 17, 2015 20:48
>
> On Tue, Feb 17, 2015 at 7:54 AM, Jason Pyeron wrote:
> >> I'd post the entire dmesg somewhere
> >
> > http://client.pdinc.us/panic-341e97c30b5a4cb774942bae32d3f163.log
>
> At least part of the problem happens before this log starts.

Feb 15 23:41:19 thirteen-230 dhclient[1272]: DHCPREQUEST on br0 to 192.168.5.58 port 67 (xid=0x48d081b6)
Feb 15 23:41:19 thirteen-230 dhclient[1272]: DHCPACK from 192.168.5.58 (xid=0x48d081b6)
Feb 15 23:41:21 thirteen-230 dhclient[1272]: bound to 192.168.13.230 -- renewal in 8613 seconds.
Feb 16 02:04:54 thirteen-230 dhclient[1272]: DHCPREQUEST on br0 to 192.168.5.58 port 67 (xid=0x48d081b6)
Feb 16 02:04:54 thirteen-230 dhclient[1272]: DHCPACK from 192.168.5.58 (xid=0x48d081b6)
Feb 16 02:04:55 thirteen-230 dhclient[1272]: bound to 192.168.13.230 -- renewal in 8735 seconds.
Feb 16 02:46:09 thirteen-230 kernel: kvm: 1994: cpu0 unimplemented perfctr wrmsr: 0xc0010004 data 0xffffffffffffd8f0
Feb 16 02:46:09 thirteen-230 kernel: kvm: 1994: cpu0 unimplemented perfctr wrmsr: 0xc0010000 data 0x530076
Feb 16 03:53:39 thirteen-230 kernel: kvm: 2161: cpu0 unimplemented perfctr wrmsr: 0xc0010004 data 0xffffffffffffd8f0
Feb 16 03:53:39 thirteen-230 kernel: kvm: 2161: cpu0 unimplemented perfctr wrmsr: 0xc0010000 data 0x530076
Feb 16 04:30:30 thirteen-230 dhclient[1272]: DHCPREQUEST on br0 to 192.168.5.58 port 67 (xid=0x48d081b6)
Feb 16 04:30:30 thirteen-230 dhclient[1272]: DHCPACK from 192.168.5.58 (xid=0x48d081b6)
Feb 16 04:30:31 thirteen-230 dhclient[1272]: bound to 192.168.13.230 -- renewal in 9224 seconds.

> >> What do you get for
> >> smartctl -x <dev>
> >
> > http://client.pdinc.us/smartctl-2000e86b62db27169cc9307358ebf10e.log
>
> OK, no SMART extended test has been done, but there are also no pending or
> reallocated sectors, and no phy event errors either. So the Write(10)
> error seems isolated, but it's still really suspicious, so I'd start
> replacing hardware.

A Dell tech is en route with a new system board and disk controller.

> > I have replaced the drive (and reinstalled) already; the
> > panics still happen once every 30-40 hours.
>
> The only thing that suggests it might not be hardware is all the kvm-related
> messages in the kernel panic.

How so? Every result I find says these messages can be ignored.

> So if you've changed kernels or VM
> configuration recently, I'd revert. That's the limit of the most

No changes since the out-of-the-box install.

> likely software explanation. If there are no recent software changes,
> then it must be hardware.
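To check whether anything storage-related was logged in the lead-up to a panic, something like the following could be run over the syslog. This is only a sketch: the sample lines are copied from the Feb 16 excerpt above, while the function name and the set of "storage" prefixes (taken from the Feb 8 messages earlier in the thread) are my own choices for illustration.

```python
import re
from collections import Counter

# Kernel message prefixes seen in the Feb 8 abort/reset sequence.
STORAGE = {"mptscsih", "mptbase", "sd"}

# Lines copied from the Feb 16 excerpt (one dhclient line kept on each side).
SAMPLE = """\
Feb 16 02:04:55 thirteen-230 dhclient[1272]: bound to 192.168.13.230 -- renewal in 8735 seconds.
Feb 16 02:46:09 thirteen-230 kernel: kvm: 1994: cpu0 unimplemented perfctr wrmsr: 0xc0010004 data 0xffffffffffffd8f0
Feb 16 02:46:09 thirteen-230 kernel: kvm: 1994: cpu0 unimplemented perfctr wrmsr: 0xc0010000 data 0x530076
Feb 16 03:53:39 thirteen-230 kernel: kvm: 2161: cpu0 unimplemented perfctr wrmsr: 0xc0010004 data 0xffffffffffffd8f0
Feb 16 03:53:39 thirteen-230 kernel: kvm: 2161: cpu0 unimplemented perfctr wrmsr: 0xc0010000 data 0x530076
Feb 16 04:30:31 thirteen-230 dhclient[1272]: bound to 192.168.13.230 -- renewal in 9224 seconds.
"""

# Month, day, time, hostname, then "kernel: " and the first word of the message.
KERNEL_LINE = re.compile(r"^\w{3} +\d+ [\d:]+ \S+ kernel: (\w+)", re.MULTILINE)


def kernel_sources(text):
    """Count kernel messages by the first word after 'kernel: '."""
    return Counter(KERNEL_LINE.findall(text))


if __name__ == "__main__":
    counts = kernel_sources(SAMPLE)
    print(dict(counts))
    print("storage messages present:", bool(STORAGE & set(counts)))
```

On the excerpt above this finds only kvm messages and nothing from mptscsih, mptbase, or sd, which fits the earlier observation that most panics arrive without any warning in the log.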