One of my CentOS boxes has started giving me errors. The box is CentOS-4.4 (i386) fully updated. It has a pair of SATA drives in a software raid 1 configuration.

The errors I see are:

ata1: command 0xca timeout, stat 0x50 host_stat 0x24
ata1: status=0x50 { DriveReady SeekComplete }
Info fld=0x1e22b8, Current sda: sense key No Sense
ata2: command 0xca timeout, stat 0x50 host_stat 0x24
ata2: status=0x50 { DriveReady SeekComplete }
Info fld=0x1e2598, Current sdb: sense key No Sense

If it was just coming from one drive, I would say it was a bad drive, but this is happening on both drives.

Can anyone give me a starting place to diagnose this?

Thanks.

--
Bowie
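As a reading aid for the log lines above, here is a minimal Python sketch that decodes the `stat` byte and the command opcode. Only `DriveReady` (0x40) and `SeekComplete` (0x10) are confirmed by the log itself; the other bit names are assumed to follow the kernel's usual status dump, and the opcode names are taken from the ATA command set. The function names are hypothetical.

```python
# Status-register bit names. DriveReady and SeekComplete appear in the log;
# the remaining names are an assumption about how the kernel labels them.
STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# A few ATA opcodes, including the ones seen in this thread.
ATA_COMMANDS = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
}


def decode_status(stat):
    """Return the names of the bits set in an ATA status byte."""
    if stat & 0x80:
        # When Busy is set, the other bits are not meaningful.
        return ["Busy"]
    return [name for mask, name in STATUS_BITS if stat & mask]


def describe_command(opcode):
    """Return a readable name for an ATA opcode, if it is in the table."""
    return ATA_COMMANDS.get(opcode, "unknown (0x%02x)" % opcode)


if __name__ == "__main__":
    print(describe_command(0xCA), decode_status(0x50))
```

Read this way, 0x50 has neither the Busy nor the Error bit set, which fits the "No Sense" sense key: the drive did not report a failure, the command simply was not completed in time.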
Bowie Bailey spake the following on 9/21/2006 12:56 PM:
> One of my CentOS boxes has started giving me errors. The box is
> CentOS-4.4 (i386) fully updated. It has a pair of SATA drives in a
> software raid 1 configuration.
>
> The errors I see are:
>
> ata1: command 0xca timeout, stat 0x50 host_stat 0x24
> ata1: status=0x50 { DriveReady SeekComplete }
> Info fld=0x1e22b8, Current sda: sense key No Sense
> ata2: command 0xca timeout, stat 0x50 host_stat 0x24
> ata2: status=0x50 { DriveReady SeekComplete }
> Info fld=0x1e2598, Current sdb: sense key No Sense
>
> If it was just coming from one drive, I would say it was a bad drive,
> but this is happening on both drives.
>
> Can anyone give me a starting place to diagnose this?
>
> Thanks.
>
> --
> Bowie

Cables? Connections? Power supply going bad?

--
MailScanner is like deodorant...
You hope everybody uses it, and you notice quickly if they don't!!!!
> One of my CentOS boxes has started giving me errors. The box is
> CentOS-4.4 (i386) fully updated. It has a pair of SATA drives in a
> software raid 1 configuration.
>
> The errors I see are:
>
> ata1: command 0xca timeout, stat 0x50 host_stat 0x24
> ata1: status=0x50 { DriveReady SeekComplete }
> Info fld=0x1e22b8, Current sda: sense key No Sense
> ata2: command 0xca timeout, stat 0x50 host_stat 0x24
> ata2: status=0x50 { DriveReady SeekComplete }
> Info fld=0x1e2598, Current sdb: sense key No Sense
>
> If it was just coming from one drive, I would say it was a bad drive,
> but this is happening on both drives.
>
> Can anyone give me a starting place to diagnose this?

The ideal should be to use the SmartTools utilities, but, if I'm right, there're problems with SATA support.

At least, you can boot with liveCD-util (like UltimateBootCD, for example) and check the HD integrity.

--
Jordi Espasa Clofent
PGP id 0xC5ABA76A #http://pgp.mit.edu/
FSF Associate Member id 4281 #http://www.fsf.org/
Jordi Espasa Clofent wrote:
>
> The ideal should be to use the SmartTools utilities, but, if I'm right,
> there're problems with SATA support.

smartctl on CentOS >= 4.3, supports SATA just fine.

> At least, you can boot with liveCD-util (like UltimateBootCD, for example) and
> check the HD integrity.

why not use the CentOS-4 LiveCD ?

--
Karanbir Singh : http://www.karan.org/ : 2522219 at icq
Scott Silva wrote:
> Bowie Bailey spake the following on 9/21/2006 12:56 PM:
> > One of my CentOS boxes has started giving me errors. The box is
> > CentOS-4.4 (i386) fully updated. It has a pair of SATA drives in a
> > software raid 1 configuration.
> >
> > The errors I see are:
> >
> > ata1: command 0xca timeout, stat 0x50 host_stat 0x24
> > ata1: status=0x50 { DriveReady SeekComplete }
> > Info fld=0x1e22b8, Current sda: sense key No Sense
> > ata2: command 0xca timeout, stat 0x50 host_stat 0x24
> > ata2: status=0x50 { DriveReady SeekComplete }
> > Info fld=0x1e2598, Current sdb: sense key No Sense
> >
> > If it was just coming from one drive, I would say it was a bad
> > drive, but this is happening on both drives.
> >
> > Can anyone give me a starting place to diagnose this?
>
> Cables? Connections? Power supply going bad?

Already reseated all of the cables. I even moved the drives to two different SATA ports on the MB.

Power supply...maybe, but not my first guess. I may swap it out later if I can't find any other cause.

--
Bowie
Peter Farrow wrote:
> This can be caused by overheating. If the drives are mounted close
> together as some chassis configurations permit, then it makes sense
> they may both exhibit the same problem, but not necessarily
> simultaneously.
>
> Are these log entries adjacent to each other?

The drives have an open slot between them and a 120mm fan right in front of them, so I don't think overheating is an issue. At one point, I shut the machine down and let it sit for an hour or so. The errors started back up immediately during the boot process when I turned it back on.

The log entries are adjacent...usually with the exact same timestamp. There are no other errors mixed in with them.

I was able to get the system to boot yesterday. It triggered a raid rebuild which finished last night. The last batch of errors happened at 4am and the system appears to be running normally now.

Here's a snip from the logs showing the errors and the rebuild:
(I've cut out a few columns to shorten the lines)

----------------------------------
22:36:21 kernel: ata3: command 0x25 timeout, stat 0x50 host_stat 0x24
22:36:21 kernel: ata3: status=0x50 { DriveReady SeekComplete }
22:36:21 kernel: Current sda: sense key No Sense
22:36:21 kernel: ata4: command 0x35 timeout, stat 0x50 host_stat 0x24
22:36:21 kernel: ata4: status=0x50 { DriveReady SeekComplete }
22:36:21 kernel: Current sdb: sense key No Sense
22:37:09 kernel: ata3: command 0x25 timeout, stat 0x50 host_stat 0x24
22:37:09 kernel: ata3: status=0x50 { DriveReady SeekComplete }
22:37:09 kernel: Current sda: sense key No Sense
22:38:03 kernel: md: md1: sync done.
22:38:03 kernel: RAID1 conf printout:
22:38:03 kernel: --- wd:2 rd:2
22:38:03 kernel: disk 0, wo:0, o:1, dev:sda2
22:38:03 kernel: disk 1, wo:0, o:1, dev:sdb2
23:25:39 kernel: ata3: command 0xca timeout, stat 0x50 host_stat 0x24
23:25:39 kernel: ata3: status=0x50 { DriveReady SeekComplete }
23:25:39 kernel: Info fld=0x662270, Current sda: sense key No Sense
23:25:39 kernel: ata4: command 0xca timeout, stat 0x50 host_stat 0x24
23:25:39 kernel: ata4: status=0x50 { DriveReady SeekComplete }
23:25:39 kernel: Info fld=0x662270, Current sdb: sense key No Sense
----------------------------------

These errors are showing ata3 and ata4 because I switched the drives to different SATA connections on the MB. These are the same drives that I showed previously with errors on ata1 and ata2.

--
Bowie
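The observation that both ports time out with the same timestamp can be checked mechanically. The following Python sketch groups timeout lines by timestamp and reports the seconds in which more than one port timed out. It assumes the shortened line format shown in the snip above (time, `kernel:`, port, message); the regular expression and function names are illustrative, not part of any existing tool.

```python
import re
from collections import defaultdict

# Matches only the "command ... timeout" lines in the shortened log format.
TIMEOUT_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) kernel: (?P<port>ata\d+): "
    r"command (?P<cmd>0x[0-9a-fA-F]+) timeout"
)


def timeouts_by_time(lines):
    """Map each timestamp to the set of ATA ports that timed out then."""
    grouped = defaultdict(set)
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            grouped[match.group("time")].add(match.group("port"))
    return dict(grouped)


def simultaneous(grouped):
    """Return the timestamps at which more than one port timed out."""
    return sorted(t for t, ports in grouped.items() if len(ports) > 1)


# Timeout lines copied from the log excerpt in the message above.
SAMPLE = """\
22:36:21 kernel: ata3: command 0x25 timeout, stat 0x50 host_stat 0x24
22:36:21 kernel: ata3: status=0x50 { DriveReady SeekComplete }
22:36:21 kernel: ata4: command 0x35 timeout, stat 0x50 host_stat 0x24
22:37:09 kernel: ata3: command 0x25 timeout, stat 0x50 host_stat 0x24
22:38:03 kernel: md: md1: sync done.
23:25:39 kernel: ata3: command 0xca timeout, stat 0x50 host_stat 0x24
23:25:39 kernel: ata4: command 0xca timeout, stat 0x50 host_stat 0x24
"""

if __name__ == "__main__":
    grouped = timeouts_by_time(SAMPLE.splitlines())
    for stamp in simultaneous(grouped):
        print(stamp, sorted(grouped[stamp]))
```

Timeouts that land on both ports in the same second would point toward something the drives share (controller, power, driver) rather than two independent drive failures, although the sketch itself only counts coincidences and proves nothing about the cause.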
Karanbir Singh wrote:
> Jordi Espasa Clofent wrote:
> >
> > The ideal should be to use the SmartTools utilities, but, if I'm
> > right, there're problems with SATA support.
>
> smartctl on CentOS >= 4.3, supports SATA just fine.

Sounds good. I'll give it a try.

> > At least, you can boot with liveCD-util (like UltimateBootCD, for
> > example) and check the HD integrity.
>
> why not use the CentOS-4 LiveCD ?

I downloaded the CentOS-4 LiveCD last night. The system is running from the hard drives at the moment, but if it dies again, I'll try the LiveCD.

--
Bowie
Bowie Bailey wrote:
> Karanbir Singh wrote:
> > Jordi Espasa Clofent wrote:
> > >
> > > The ideal should be to use the SmartTools utilities, but, if I'm
> > > right, there're problems with SATA support.
> >
> > smartctl on CentOS >= 4.3, supports SATA just fine.
>
> Sounds good. I'll give it a try.

I tried smartctl and it ran fine, but found no errors (on the short test).

FYI - I had to specify the ata device type to make it work.

    smartctl -d ata -t short /dev/sda

I was also able to get the smartd service running by adding '-d ata' to both lines in the /etc/smartd.conf file.

Everything seems to be running normally now. Smartd is configured to send me email on problems now, so I'll keep an eye on it and see what happens.

--
Bowie
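For anyone repeating the smartd.conf change on several machines, here is a small Python sketch of the edit described above: it adds `-d ata` to each device line that does not already carry a `-d` directive. The actual contents of the poster's file are not shown in the thread, so the sample lines and the function name `add_device_type` are hypothetical, and the sketch assumes every device line starts with `/dev/`.

```python
def add_device_type(conf_text, dev_type="ata"):
    """Insert '-d <dev_type>' after the device path on each /dev/ line.

    Comment lines, blank lines, and lines that already contain a -d
    directive are returned unchanged.
    """
    result = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        tokens = stripped.split()
        if stripped.startswith("/dev/") and "-d" not in tokens:
            # Rebuild the line as: device path, -d TYPE, remaining directives.
            line = " ".join([tokens[0], "-d", dev_type] + tokens[1:])
        result.append(line)
    return "\n".join(result)


# Hypothetical file contents; the real smartd.conf is not shown in the thread.
SAMPLE_CONF = """\
# monitored drives
/dev/sda -H -m root
/dev/sdb -H -m root"""

if __name__ == "__main__":
    print(add_device_type(SAMPLE_CONF))
```

The function only transforms text; writing the result back to /etc/smartd.conf and restarting smartd are left as manual steps, so that the change can be reviewed first.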
Cian Cullinan wrote:
> Well given the simultaneous errors from both disks, and the all-clear
> from smartctl, it *really* sounds like a problem external to the
> disks, I.E.: cables, controller etc.

I'm running the long self-test on the drives now. That will give me a more definitive answer.

The problem does sound external to the disks. Someone else suggested the power supply. I wouldn't suspect both cables to have problems, so that's probably not an issue. The controller is on the MB and I REALLY don't want to have to tear the machine apart to replace it.

The errors have stopped for now. No more errors in the past four hours. At the moment all I can do is keep monitoring it.

--
Bowie
William L. Maltby wrote:
> On Fri, 2006-09-22 at 13:06 -0400, Bowie Bailey wrote:
> > Cian Cullinan wrote:
> > > <snip>
>
> > The problem does sound external to the disks. Someone else
> > suggested the power supply. I wouldn't suspect both cables to have
> > problems, so that's probably not an issue.
>
> Have you looked inside the PS? What you think of as separate cables may
> be joined at the base in the PS. And then if one cable has a high
> resistance short, it affects voltage on both legs. And if the output
> tap feeding those two wires (even separate wires on the same "bus" or
> "tap" are "joined at the base" electrically speaking) has a problem
> ... it appears on both wires.
>
> That said, *usually* problems in the wires are "opens" and you just
> lose that one "leg".

I was actually referring to the SATA cables. I guess my reply got a bit garbled there. One topic at a time... :)

I take your point regarding the power supply cables. I'll put the drives on separate legs and see what happens.

--
Bowie
Scott Silva wrote:
> Les Mikesell spake the following on 9/25/2006 2:16 PM:
> > On Mon, 2006-09-25 at 09:08 -0400, Lamar Owen wrote:
> >
> > > Power supply issues can do real damage to the drives, as well. I
> > > had a pair of 250GB drives that have hard errored sectors thanks
> > > to a power supply 'glitching' on me. The drives still work fine
> > > otherwise, but they have several dozen sectors each that will
> > > never be usable.
> >
> > Most drive manufacturers have a diagnostic/repair utility that has
> > a fair chance of fixing that sort of problem. Unfortunately they
> > tend to only run under windows.
> >
> Look for the Ultimate boot disk;
> http://www.ultimatebootcd.com/
>
> It has many manufacturer hard drive utilities, and you just need to
> be able to boot from CD.

I burned the CD and tried it out. There are two utilities on the CD for Seagate drives, but unless I am missing something, neither of them have any diagnostic capability.

--
Bowie
chrism at imntv.com wrote:
> Bowie Bailey wrote:
> > I burned the CD and tried it out. There are two utilities on the CD
> > for Seagate drives, but unless I am missing something, neither of
> > them have any diagnostic capability.
> >
>
> Have you tried here?
>
> http://www.seagate.com/support/disc/utils.html

Not yet, but my comment was directed toward the Ultimate Boot CD. If the CD is designed for doing diagnostics, it seems odd to me that there are two Seagate utilities included, but neither of them is capable of doing diagnostics or testing.

(I know this is offtopic, just posting here since someone here suggested using the CD)

--
Bowie
Scott Silva wrote:
> Bowie Bailey spake the following on 9/26/2006 8:35 AM:
> > Scott Silva wrote:
> > > Look for the Ultimate boot disk;
> > > http://www.ultimatebootcd.com/
> > >
> > > It has many manufacturer hard drive utilities, and you just need
> > > to be able to boot from CD.
> >
> > I burned the CD and tried it out. There are two utilities on the CD
> > for Seagate drives, but unless I am missing something, neither of
> > them have any diagnostic capability.
> >
> When you press F2 to get to the disk utilities, you have to use the
> left/right arrows to see more menus.

I knew I was missing something. Thanks!

--
Bowie