Bart Schaefer
2006-Apr-01 19:16 UTC
[CentOS] CentOS 4.3 occasionally locking up accessing IDE drive
For those who haven't seen my several previous postings about problems with this (now not quite so) new PC, I have an ASUS P5N32-SLI Deluxe motherboard. The boot drive and primary filesystems are on an SATA disk and I'm having no problem with that. However, I recently plugged in a couple of IDE drives salvaged from my old PCs and I'm running into trouble with one of those.

The drive in question is a 20GB Maxtor 92049U6. It had an old RH5.2 ext2 filesystem on it when I first plugged it in, from which I tried to recover some data to back up to CD. Mostly this worked, but I started encountering read errors accessing some files so I unmounted the partition and started a fsck on it. At some point during the fsck (I was off doing something else on another workspace at the time), the system locked up hard, leaving the disk activity LED lit. I had to reset the PC.

So at that point I booted single-user and ran the fsck from there. It completed successfully after fixing a number of problems. I continued into multi-user mode, finished doing my backups, repartitioned the drive, and started "mkfs -t ext3 -c" on the larger partition, to check for bad blocks. Again at some point part way through the mkfs, the system locked up.

Back to single user mode, run the "mkfs", everything finishes fine. Back to multi-user mode, start to copy some large files onto the drive. MD5 sums fail to match for some of the copied files. Unmounted and started up "fsck -y". This succeeded, after fixing a number of errors, so (at this point just as a test case) I re-copied the files with bad MD5s. Some of these came through OK this time, others still did not. I decided perhaps this meant there were still bad blocks on the drive that a read-only test was not finding.

You'd think I'd have learned, but encouraged by the success of the previous fsck I optimistically started up another "fsck -c -c -y" on the suspect partition, and this time I waited around to watch it. About 1.6GB into the 16GB partition, the system locked up again.

This time I booted into a hard disk diagnostic program instead of into CentOS. After running overnight last night, a non-destructive read-write surface-scan reported no problems with the drive. This leads me to suspect that the problem is with linux, but I don't know how to proceed with diagnosing it. Suggestions would be appreciated.
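For anyone wanting to repeat the copy-and-checksum test described above, here is a minimal sketch in Python. It is not the procedure actually used in this thread. The function names (md5sum, copy_and_verify) and the number of attempts are made up for illustration. It copies one file to the suspect filesystem several times and records whether each copy's MD5 matches the source.

```python
import hashlib
import shutil
from pathlib import Path


def md5sum(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, dest_dir, attempts=3):
    """Copy `source` into `dest_dir` repeatedly and compare MD5 sums.

    Returns a list of (attempt_number, matched) tuples so that
    intermittent mismatches show up as a mix of True and False.
    `attempts` is an arbitrary default chosen for this sketch.
    """
    source = Path(source)
    dest = Path(dest_dir) / source.name
    expected = md5sum(source)  # checksum of the known-good original
    results = []
    for attempt in range(1, attempts + 1):
        shutil.copyfile(source, dest)  # overwrite the previous copy
        results.append((attempt, md5sum(dest) == expected))
    return results
```

A call such as copy_and_verify("/home/me/big.iso", "/mnt/suspect") (both paths hypothetical) returns something like [(1, True), (2, False), (3, True)] when corruption is intermittent. One caveat: a file read back immediately after being written may be served from the kernel's page cache rather than from the disk, so unmounting and remounting the partition before calling md5sum on the copy gives a more honest test of the drive itself.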
William L. Maltby
2006-Apr-01 19:47 UTC
[CentOS] CentOS 4.3 occasionally locking up accessing IDE drive
On Sat, 2006-04-01 at 11:16 -0800, Bart Schaefer wrote:
> This time I booted into a hard disk diagnostic program instead of into
> CentOS. After running overnight last night, a non-destructive
> read-write surface-scan reported no problems with the drive. This
> leads me to suspect that the problem is with linux, but I don't know
> how to proceed with diagnosing it. Suggestions would be appreciated.

Re the overnight diag, are environmental conditions similar to when you encounter problems? Temp, power "brown out", etc? If *not*, try the diag when conditions are similar to when you have the problem. Long shot, but you've obviously gotten to the point of needing a long rifle.

Secondly, are your current HD configurations consistent with what is actually on the drive? Run "sfdisk -l /dev/hdXXX" and then look at your BIOS settings for it. If the BIOS where the disk was originally set up assigned different params than the current BIOS (assuming you did auto-detect rather than set up manually), could this be involved? I don't *think* BIOS settings are actually used in current Linuxes, but I could be wrong. If sfdisk and/or the BIOS show different params than the old BIOS, maybe a manual setup of the HD params will fix your problem?

Since you have (apparently) inconsistent behavior, how about cable integrity, connectors, power etc.? A poorly seated or worn connector could be very sensitive to temp changes and vibration. Have you visually inspected all this, especially the power? Have you put volt meters on the +5/+12 and their respective grounds when the system is under load to see if maybe your PS is too weak?

HTH
Bill

P.S. Tried different IDE cable/power connectors?
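The geometry comparison suggested above comes down to simple arithmetic: multiply cylinders, heads and sectors per track, then check that the OS and the BIOS describe about the same number of sectors. Below is a rough Python sketch. The summary-line format is an assumption based on what older sfdisk versions print. The 512-byte sector size and the one-cylinder tolerance are also assumptions. All geometry numbers are hypothetical and are not the real values for the Maxtor 92049U6.

```python
import re

SECTOR_BYTES = 512  # assumed sector size for an IDE drive of this era

# Assumed shape of the geometry summary line from older `sfdisk -l`, e.g.
# "Disk /dev/hdc: 39703 cylinders, 16 heads, 63 sectors/track"
GEOMETRY_LINE = re.compile(
    r"Disk (\S+): (\d+) cylinders, (\d+) heads, (\d+) sectors/track"
)


def parse_geometry(line):
    """Return (cylinders, heads, sectors) from a summary line, or None."""
    match = GEOMETRY_LINE.search(line)
    if match is None:
        return None
    return int(match.group(2)), int(match.group(3)), int(match.group(4))


def capacity_bytes(geometry):
    """Capacity implied by a (cylinders, heads, sectors) triple."""
    cylinders, heads, sectors = geometry
    return cylinders * heads * sectors * SECTOR_BYTES


def capacity_difference(first, second):
    """Absolute difference in bytes between two geometries."""
    return abs(capacity_bytes(first) - capacity_bytes(second))


def geometries_agree(first, second):
    """Heuristic: treat geometries as equivalent if they differ by less
    than one cylinder of the coarser geometry (translation rounding)."""
    cylinder_bytes = max(g[1] * g[2] for g in (first, second)) * SECTOR_BYTES
    return capacity_difference(first, second) < cylinder_bytes
```

With the made-up values (39703, 16, 63) from the OS and (2491, 255, 63) from the BIOS, the capacities are 20,490,559,488 and 20,489,172,480 bytes. The difference is well under one 255-head cylinder, so geometries_agree returns True. A much larger gap would be a reason to look at the BIOS setup by hand.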
Bart Schaefer
2006-Apr-02 05:56 UTC
[CentOS] CentOS 4.3 occasionally locking up accessing IDE drive
On 4/1/06, William L. Maltby <BillsCentOS at triad.rr.com> wrote:
> Re the overnight diag, are environmental conditions similar to when you
> encounter problems? Temp, power "brown out", etc?

Yep. The PC was custom-built less than 12 weeks ago, it's on a UPS only a couple of weeks older than that, the case power supply is supposed to handle up to twice as many drives as I have in there, and the IDE cables are brand new and firmly seated.

> Secondly, are your current HD configurations consistent with what is
> actually on the drive?

AFAICT, yes. And the other drive on the same IDE cable, also a Maxtor, is working fine.
Leo Arnts
2006-Apr-02 09:54 UTC
[CentOS] CentOS 4.3 occasionally locking up accessing IDE drive
Hi Bart,

I have the same problems here with an Asus P5ND2 SLI motherboard and 2 ATA Maxtor 6L200P0 200GB hard drives in RAID 0+1. The disk also checks OK, even with the Maxtor tools. I'm guessing it is a Linux problem, maybe one of the ATA drivers? I have seen the problem with several kernels; I am currently running 2.6.9-34.ELsmp-CUSTOM.

-----Original Message-----
From: centos-bounces at centos.org [mailto:centos-bounces at centos.org] On Behalf Of Bart Schaefer
Sent: Saturday, 1 April 2006 21:17
To: CentOS mailing list
Subject: [CentOS] CentOS 4.3 occasionally locking up accessing IDE drive