thr3ads.net - CentOS - [CentOS] Hard drive errors [Sep 2006]

Bowie Bailey

2006-Sep-21 19:56 UTC

[CentOS] Hard drive errors

One of my CentOS boxes has started giving me errors.  The box is
CentOS-4.4 (i386) fully updated.  It has a pair of SATA drives in a
software raid 1 configuration.

The errors I see are:

    ata1: command 0xca timeout, stat 0x50 host_stat 0x24
    ata1: status=0x50 { DriveReady SeekComplete }
    Info fld=0x1e22b8, Current sda: sense key No Sense
    ata2: command 0xca timeout, stat 0x50 host_stat 0x24
    ata2: status=0x50 { DriveReady SeekComplete }
    Info fld=0x1e2598, Current sdb: sense key No Sense

If it was just coming from one drive, I would say it was a bad drive,
but this is happening on both drives.
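For readers decoding lines like these, here is a minimal POSIX shell sketch. It assumes the standard ATA opcode and status-bit assignments. The function names are hypothetical, only opcodes that appear in this thread (plus READ DMA) are listed, and the flag names other than DriveReady and SeekComplete are illustrative labels rather than text taken from any log here.

```shell
#!/bin/sh
# Map an ATA command opcode (hex or decimal) to its name.
# Only the opcodes seen in this thread, plus READ DMA, are covered.
ata_command_name() {
    case $(( $1 )) in
        200) echo "READ DMA" ;;        # 0xc8
        202) echo "WRITE DMA" ;;       # 0xca
        37)  echo "READ DMA EXT" ;;    # 0x25
        53)  echo "WRITE DMA EXT" ;;   # 0x35
        *)   echo "unknown" ;;
    esac
}

# Decode an ATA status register byte into flag names.
# Bits: 0x80 BSY, 0x40 DRDY, 0x20 DF, 0x10 DSC,
#       0x08 DRQ, 0x04 CORR, 0x02 IDX, 0x01 ERR.
decode_ata_status() {
    status=$(( $1 ))
    out=""
    for pair in 0x80:Busy 0x40:DriveReady 0x20:DeviceFault \
                0x10:SeekComplete 0x08:DataRequest \
                0x04:CorrectedError 0x02:Index 0x01:Error; do
        mask=${pair%%:*}
        name=${pair#*:}
        if [ $(( $status & $mask )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}

ata_command_name 0xca      # command from the log lines above
decode_ata_status 0x50     # status from the log lines above
```

For the values in the log this prints `WRITE DMA` and `DriveReady SeekComplete`: a DMA write timed out while the drive reported itself ready with the Error bit clear, which fits the "No Sense" sense key.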

Can anyone give me a starting place to diagnose this?

Thanks.

--
Bowie

Scott Silva

2006-Sep-21 21:06 UTC

[CentOS] Re: Hard drive errors

Bowie Bailey spake the following on 9/21/2006 12:56 PM:
> One of my CentOS boxes has started giving me errors.  The box is
> CentOS-4.4 (i386) fully updated.  It has a pair of SATA drives in a
> software raid 1 configuration.
> 
> The errors I see are:
> 
>     ata1: command 0xca timeout, stat 0x50 host_stat 0x24
>     ata1: status=0x50 { DriveReady SeekComplete }
>     Info fld=0x1e22b8, Current sda: sense key No Sense
>     ata2: command 0xca timeout, stat 0x50 host_stat 0x24
>     ata2: status=0x50 { DriveReady SeekComplete }
>     Info fld=0x1e2598, Current sdb: sense key No Sense
> 
> If it was just coming from one drive, I would say it was a bad drive,
> but this is happening on both drives.
> 
> Can anyone give me a starting place to diagnose this?
> 
> Thanks.
> 
> --
> Bowie

Cables? Connections? Power supply going bad?


-- 

MailScanner is like deodorant...
You hope everybody uses it, and
you notice quickly if they don't!!!!

Jordi Espasa Clofent

2006-Sep-22 07:04 UTC

[CentOS] Hard drive errors

> One of my CentOS boxes has started giving me errors.  The box is
> CentOS-4.4 (i386) fully updated.  It has a pair of SATA drives in a
> software raid 1 configuration.
>
> The errors I see are:
>
>     ata1: command 0xca timeout, stat 0x50 host_stat 0x24
>     ata1: status=0x50 { DriveReady SeekComplete }
>     Info fld=0x1e22b8, Current sda: sense key No Sense
>     ata2: command 0xca timeout, stat 0x50 host_stat 0x24
>     ata2: status=0x50 { DriveReady SeekComplete }
>     Info fld=0x1e2598, Current sdb: sense key No Sense
>
> If it was just coming from one drive, I would say it was a bad drive,
> but this is happening on both drives.
>
> Can anyone give me a starting place to diagnose this?

Ideally you would use the SMART utilities (smartmontools), but, if I'm right,
there are problems with SATA support.

At the least, you can boot a utility live CD (UltimateBootCD, for example) and
check the drives' integrity.

-- 
Jordi Espasa Clofent

PGP id 0xC5ABA76A #http://pgp.mit.edu/
FSF Associate Member id 4281 #http://www.fsf.org/

Karanbir Singh

2006-Sep-22 10:00 UTC

[CentOS] Hard drive errors

Jordi Espasa Clofent wrote:
>
> The ideal should be to use the SmartTools utilities, but, if I'm right,
> there're problems with SATA support.

smartctl on CentOS >= 4.3 supports SATA just fine.

> At least, you can boot with liveCD-util (like UltimateBootCD, for example)
> and check the HD integrity.

Why not use the CentOS-4 LiveCD?



-- 
Karanbir Singh : http://www.karan.org/ : 2522219 at icq

Bowie Bailey

2006-Sep-22 13:06 UTC

[CentOS] Re: Hard drive errors

Scott Silva wrote:
> Bowie Bailey spake the following on 9/21/2006 12:56 PM:
> > One of my CentOS boxes has started giving me errors.  The box is
> > CentOS-4.4 (i386) fully updated.  It has a pair of SATA drives in a
> > software raid 1 configuration.
> > 
> > The errors I see are:
> > 
> >     ata1: command 0xca timeout, stat 0x50 host_stat 0x24
> >     ata1: status=0x50 { DriveReady SeekComplete }
> >     Info fld=0x1e22b8, Current sda: sense key No Sense
> >     ata2: command 0xca timeout, stat 0x50 host_stat 0x24
> >     ata2: status=0x50 { DriveReady SeekComplete }
> >     Info fld=0x1e2598, Current sdb: sense key No Sense
> > 
> > If it was just coming from one drive, I would say it was a bad
> > drive, but this is happening on both drives.
> > 
> > Can anyone give me a starting place to diagnose this?
> 
> Cables? Connections? Power supply going bad?

Already reseated all of the cables.  I even moved the drives to two
different SATA ports on the MB.

Power supply...maybe, but not my first guess.  I may swap it out later if I
can't find any other cause.

-- 
Bowie

Bowie Bailey

2006-Sep-22 13:33 UTC

[CentOS] Re: Hard drive errors

Peter Farrow wrote:
> This can be caused by overheating.  If the drives are mounted close
> together as some chassis configurations permit, then it makes sense
> they may both exhibit the same problem, but not necessarily
> simultaneously.
>
> Are these log entries adjacent to each other?

The drives have an open slot between them and a 120mm fan right in
front of them, so I don't think overheating is an issue.  At one
point, I shut the machine down and let it sit for an hour or so.  The
errors started back up immediately during the boot process when I
turned it back on.

The log entries are adjacent...usually with the exact same timestamp.
There are no other errors mixed in with them.

I was able to get the system to boot yesterday.  It triggered a raid
rebuild which finished last night.  The last batch of errors happened
at 4am and the system appears to be running normally now.
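A rebuild like this can be followed from /proc/mdstat. The sketch below lists arrays that are missing a member. It assumes the usual md status layout, where each array's member state is shown as a bracketed string such as [UU], with an underscore standing for a missing member. The function name and the sample input are made up for illustration.

```shell
#!/bin/sh
# Print the name of every md array whose member map contains "_",
# i.e. an array running degraded. Reads /proc/mdstat-style text on stdin.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # remember the current array
        /\[[U_]*_[U_]*\]/ { print name }     # member map with a missing disk
    '
}

# Made-up sample in /proc/mdstat layout: md0 healthy, md1 missing a member.
degraded_arrays <<'EOF'
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
md1 : active raid1 sda2[0]
      156183808 blocks [2/1] [U_]
unused devices: <none>
EOF
```

On a live system the same function would be fed with `degraded_arrays < /proc/mdstat`, and `mdadm --detail /dev/md1` gives the per-member view of a single array.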

Here's a snip from the logs showing the errors and the rebuild:
(I've cut out a few columns to shorten the lines)

----------------------------------
22:36:21 kernel: ata3: command 0x25 timeout, stat 0x50 host_stat 0x24
22:36:21 kernel: ata3: status=0x50 { DriveReady SeekComplete }
22:36:21 kernel: Current sda: sense key No Sense
22:36:21 kernel: ata4: command 0x35 timeout, stat 0x50 host_stat 0x24
22:36:21 kernel: ata4: status=0x50 { DriveReady SeekComplete }
22:36:21 kernel: Current sdb: sense key No Sense
22:37:09 kernel: ata3: command 0x25 timeout, stat 0x50 host_stat 0x24
22:37:09 kernel: ata3: status=0x50 { DriveReady SeekComplete }
22:37:09 kernel: Current sda: sense key No Sense
22:38:03 kernel: md: md1: sync done.
22:38:03 kernel: RAID1 conf printout:
22:38:03 kernel:  --- wd:2 rd:2
22:38:03 kernel:  disk 0, wo:0, o:1, dev:sda2
22:38:03 kernel:  disk 1, wo:0, o:1, dev:sdb2
23:25:39 kernel: ata3: command 0xca timeout, stat 0x50 host_stat 0x24
23:25:39 kernel: ata3: status=0x50 { DriveReady SeekComplete }
23:25:39 kernel: Info fld=0x662270, Current sda: sense key No Sense
23:25:39 kernel: ata4: command 0xca timeout, stat 0x50 host_stat 0x24
23:25:39 kernel: ata4: status=0x50 { DriveReady SeekComplete }
23:25:39 kernel: Info fld=0x662270, Current sdb: sense key No Sense
----------------------------------

These errors are showing ata3 and ata4 because I switched the drives
to different SATA connections on the MB.  These are the same drives
that I showed previously with errors on ata1 and ata2.
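Whether both ports time out in the same second can be checked mechanically. The following sketch assumes log lines trimmed the way the excerpt in this post is (timestamp in the first field, port in the third); full syslog lines carry extra leading fields and would need different field numbers. The function name is hypothetical.

```shell
#!/bin/sh
# List the timestamps at which more than one ATA port logged a
# command timeout, followed by the ports involved.
simultaneous_timeouts() {
    awk '
        / command 0x[0-9a-f]+ timeout/ {
            ts = $1
            port = $3
            sub(/:$/, "", port)              # "ata3:" -> "ata3"
            if ((ts, port) in seen) next     # count each port once per second
            seen[ts, port] = 1
            if (!(ts in ports)) { order[++n] = ts; ports[ts] = port }
            else ports[ts] = ports[ts] " " port
        }
        END {
            for (i = 1; i <= n; i++)
                if (index(ports[order[i]], " ") > 0)
                    print order[i], ports[order[i]]
        }
    '
}

# The timeout lines from the log excerpt in this post.
simultaneous_timeouts <<'EOF'
22:36:21 kernel: ata3: command 0x25 timeout, stat 0x50 host_stat 0x24
22:36:21 kernel: ata4: command 0x35 timeout, stat 0x50 host_stat 0x24
22:37:09 kernel: ata3: command 0x25 timeout, stat 0x50 host_stat 0x24
23:25:39 kernel: ata3: command 0xca timeout, stat 0x50 host_stat 0x24
23:25:39 kernel: ata4: command 0xca timeout, stat 0x50 host_stat 0x24
EOF
```

For these lines it reports 22:36:21 and 23:25:39 with both ata3 and ata4, and leaves out 22:37:09, where only ata3 timed out.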

-- 
Bowie

Bowie Bailey

2006-Sep-22 13:35 UTC

[CentOS] Hard drive errors

Karanbir Singh wrote:
> Jordi Espasa Clofent wrote:
> >
> > The ideal should be to use the SmartTools utilities, but, if I'm
> > right, there're problems with SATA support.
>
> smartctl on CentOS >= 4.3, supports SATA just fine.

Sounds good.  I'll give it a try.

> > At least, you can boot with liveCD-util (like UltimateBootCD, for
> > example) and check the HD integrity.
>
> why not use the CentOS-4 LiveCD ?

I downloaded the CentOS-4 LiveCD last night.  The system is running
from the hard drives at the moment, but if it dies again, I'll try the
LiveCD.

-- 
Bowie

Bowie Bailey

2006-Sep-22 16:41 UTC

[CentOS] Hard drive errors

Bowie Bailey wrote:
> Karanbir Singh wrote:
> > Jordi Espasa Clofent wrote:
> > >
> > > The ideal should be to use the SmartTools utilities, but, if I'm
> > > right, there're problems with SATA support.
> >
> > smartctl on CentOS >= 4.3, supports SATA just fine.
>
> Sounds good.  I'll give it a try.

I tried smartctl and it ran fine, but found no errors (on the short
test).

FYI - I had to specify the ata device type to make it work.

    smartctl -d ata -t short /dev/sda

I was also able to get the smartd service running by adding '-d ata'
to both lines in the /etc/smartd.conf file.

Everything seems to be running normally now.  Smartd is configured to
send me email on problems now, so I'll keep an eye on it and see what
happens.
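As an illustration of the smartd.conf change described above, entries along these lines would monitor both drives with the ATA device type forced. This is only a sketch: the actual lines in the poster's file are not shown in the thread, and the `-H` (health check) and `-m root` (mail recipient) directives here are assumptions.

```
# /etc/smartd.conf (illustrative entries, not the poster's actual file)
# -d ata : force the ATA device type, as needed for smartctl above
# -H     : check the SMART health status
# -m     : mail warnings to this address
/dev/sda -d ata -H -m root
/dev/sdb -d ata -H -m root
```

After editing the file, smartd has to be restarted for the new entries to take effect.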

-- 
Bowie

Bowie Bailey

2006-Sep-22 17:06 UTC

[CentOS] Hard drive errors

Cian Cullinan wrote:
> Well given the simultaneous errors from both disks, and the all-clear
> from smartctl, it *really* sounds like a problem external to the
> disks, i.e. cables, controller etc.

I'm running the long self-test on the drives now.  That will give me a
more definitive answer.
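A sketch of how the long self-test might be started on both drives and its result read back afterwards. `-t long` and `-l selftest` are standard smartctl options, and `-d ata` is the device type that was needed on this box. The `SMARTCTL` variable and the function names are hypothetical; by default the script only prints the commands, so it is safe to run as a dry run.

```shell
#!/bin/sh
# Dry run by default: prints the smartctl commands instead of running them.
# Set SMARTCTL=smartctl (as root) to execute them for real.
SMARTCTL=${SMARTCTL:-echo smartctl}

# Start a self-test of the given type ("short" or "long") on each device.
start_selftest() {
    test_type=$1
    shift
    for dev in "$@"; do
        $SMARTCTL -d ata -t "$test_type" "$dev"
    done
}

# Show the self-test log of each device once the test has finished.
show_selftest_log() {
    for dev in "$@"; do
        $SMARTCTL -d ata -l selftest "$dev"
    done
}

start_selftest long /dev/sda /dev/sdb
```

The long test runs inside the drive and can take a long time on large disks; `show_selftest_log /dev/sda /dev/sdb` is meant to be called after it completes.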

The problem does sound external to the disks.  Someone else suggested
the power supply.  I wouldn't suspect both cables to have problems, so
that's probably not an issue.  The controller is on the MB and I
REALLY don't want to have to tear the machine apart to replace it.

The errors have stopped for now.  No more errors in the past four
hours.  At the moment all I can do is keep monitoring it.

-- 
Bowie

Bowie Bailey

2006-Sep-22 17:38 UTC

[CentOS] Hard drive errors

William L. Maltby wrote:
> On Fri, 2006-09-22 at 13:06 -0400, Bowie Bailey wrote:
> > Cian Cullinan wrote:
> > > <snip>
>
> > The problem does sound external to the disks.  Someone else
> > suggested the power supply.  I wouldn't suspect both cables to
> > have problems, so that's probably not an issue.
>
> Have you looked inside the PS? What you think of as separate cables
> may be joined at the base in the PS. And then if one cable has a high
> resistance short, it affects voltage on both legs. And if the output
> tap feeding those two wires (even separate wires on the same "bus" or
> "tap" are "joined at the base" electrically speaking) has a problem
> ... it appears on both wires.
>
> That said, *usually* problems in the wires are "opens" and you just
> lose that one "leg".

I was actually referring to the SATA cables.  I guess my reply got a
bit garbled there.  One topic at a time... :)

I take your point regarding the power supply cables.  I'll put the
drives on separate legs and see what happens.

-- 
Bowie

Bowie Bailey

2006-Sep-26 15:35 UTC

[CentOS] Re: Hard drive errors

Scott Silva wrote:
> Les Mikesell spake the following on 9/25/2006 2:16 PM:
> > On Mon, 2006-09-25 at 09:08 -0400, Lamar Owen wrote:
> >
> > > Power supply issues can do real damage to the drives, as well.  I
> > > had a pair of 250GB drives that have hard errored sectors thanks
> > > to a power supply 'glitching' on me.  The drives still work fine
> > > otherwise, but they have several dozen sectors each that will
> > > never be usable.
> >
> > Most drive manufacturers have a diagnostic/repair utility that has
> > a fair chance of fixing that sort of problem.  Unfortunately they
> > tend to only run under windows.
> >
> Look for the Ultimate boot disk;
> http://www.ultimatebootcd.com/
>
> It has many manufacturer hard drive utilities, and you just need to
> be able to boot from CD.

I burned the CD and tried it out.  There are two utilities on the CD
for Seagate drives, but unless I am missing something, neither of them
have any diagnostic capability.

-- 
Bowie

Bowie Bailey

2006-Sep-26 15:54 UTC

[CentOS] Re: Hard drive errors

chrism at imntv.com wrote:
> Bowie Bailey wrote:
> > I burned the CD and tried it out.  There are two utilities on the CD
> > for Seagate drives, but unless I am missing something, neither of
> > them have any diagnostic capability.
> >
>
> Have you tried here?
>
> http://www.seagate.com/support/disc/utils.html

Not yet, but my comment was directed toward the Ultimate Boot CD.  If
the CD is designed for doing diagnostics, it seems odd to me that
there are two Seagate utilities included, but neither of them is
capable of doing diagnostics or testing.  

(I know this is offtopic, just posting here since someone here
suggested using the CD)

-- 
Bowie

Bowie Bailey

2006-Sep-26 16:49 UTC

[CentOS] Re: Hard drive errors

Scott Silva wrote:
> Bowie Bailey spake the following on 9/26/2006 8:35 AM:
> > Scott Silva wrote:
> > > Look for the Ultimate boot disk;
> > > http://www.ultimatebootcd.com/
> > >
> > > It has many manufacturer hard drive utilities, and you just need
> > > to be able to boot from CD.
> >
> > I burned the CD and tried it out.  There are two utilities on the CD
> > for Seagate drives, but unless I am missing something, neither of
> > them have any diagnostic capability.
> >
> When you press F2 to get to the disk utilities, you have to use the
> left/right arrows to see more menus.

I knew I was missing something.  Thanks!

-- 
Bowie
