I have an Athlon with about 10 HDDs plugged in, primarily to do Disk2Disk backups. Some drives are PATA, some are SATA, some are USB. A strange concoction, but it's been relatively stable for some 4-5 years, despite numerous upgrades and so on. It's been running CentOS 4 for years.

Recently, I've started to have problems with its stability, and after 2 weeks of swapping hardware, found that using an earlier kernel restores its stability! It takes a few days to determine if anything "goes south", so debugging is very, very slow. But I get random read errors, either SCSI errors or (a few times) HDA read errors. Once the read errors begin, the system becomes very unresponsive, and often won't restart unless I hit the "kill switch", even if I wait for hours.

# uname -a
Linux backuphost 2.6.9-67.0.22.EL #1 Wed Jul 23 17:17:45 EDT 2008 i686 athlon i386 GNU/Linux

The failures occur on all /dev/sd* devices, even those that are USB. Once, /dev/hdc had a similar problem after /dev/sdb had failed.

Don't know if the mapping below helps?

/dev/hda - PATA, on motherboard, 20 GB.
/dev/hdb - IDE CDROM
/dev/hdc - on motherboard 500 GB IDE
/dev/hdd - on motherboard 300 GB IDE
/dev/hde - on PCI card, 500 GB IDE
/dev/sda - SATA, on a PCI card, 1 TB
/dev/sdb - SATA, on a PCI card 1 TB
/dev/sdc - USB on a USB 2.0 PCI card, 750 GB
/dev/sde - USB on a USB 2.0 PCI card, 750 GB
/dev/sdf - USB on a USB 2.0 PCI card, 1 TB

Here's what I see in /var/log/messages:

May 27 05:08:42 hume ntpd[4844]: kernel time sync enabled 0001
May 27 08:01:01 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 08:01:01 hume kernel: end_request: I/O error, dev sda, sector 12847
May 27 08:01:01 hume kernel: EXT3-fs error (device sda1): ext3_find_entry: reading directory #2 offset 0
May 27 08:01:01 hume kernel:
May 27 08:14:27 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 08:14:27 hume kernel: end_request: I/O error, dev sda, sector 12847
May 27 08:14:27 hume kernel: EXT3-fs error (device sda1): ext3_find_entry: reading directory #2 offset 0
May 27 08:14:27 hume kernel:
May 27 10:28:30 hume ntpd[4844]: synchronized to 63.240.161.99, stratum 2
May 27 11:48:07 hume sshd(pam_unix)[26873]: session opened for user root by (uid=0)
May 27 11:48:10 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:10 hume kernel: end_request: I/O error, dev sda, sector 12847
May 27 11:48:10 hume kernel: EXT3-fs error (device sda1): ext3_find_entry: reading directory #2 offset 0
May 27 11:48:10 hume kernel:
May 27 11:48:16 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:16 hume kernel: end_request: I/O error, dev sda, sector 12847
May 27 11:48:16 hume kernel: EXT3-fs error (device sda1): ext3_readdir: directory #2 contains a hole at offset 0
May 27 11:48:23 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:23 hume kernel: end_request: I/O error, dev sda, sector 12847
May 27 11:48:23 hume kernel: EXT3-fs error (device sda1): ext3_readdir: directory #2 contains a hole at offset 0
May 27 11:48:24 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:24 hume kernel: end_request: I/O error, dev sda, sector 12847
May 27 11:48:24 hume kernel: EXT3-fs error (device sda1): ext3_readdir: directory #2 contains a hole at offset 0
May 27 11:48:38 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:38 hume kernel: end_request: I/O error, dev sda, sector 0
May 27 11:48:38 hume kernel: Buffer I/O error on device sda, logical block 0
May 27 11:48:38 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:38 hume kernel: end_request: I/O error, dev sda, sector 8
May 27 11:48:38 hume kernel: Buffer I/O error on device sda, logical block 1
May 27 11:48:38 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:38 hume kernel: end_request: I/O error, dev sda, sector 16
May 27 11:48:38 hume kernel: Buffer I/O error on device sda, logical block 2
May 27 11:48:38 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:38 hume kernel: end_request: I/O error, dev sda, sector 24
May 27 11:48:38 hume kernel: Buffer I/O error on device sda, logical block 3
May 27 11:48:38 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:38 hume kernel: end_request: I/O error, dev sda, sector 32
May 27 11:48:38 hume kernel: Buffer I/O error on device sda, logical block 4
May 27 11:48:38 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:38 hume kernel: end_request: I/O error, dev sda, sector 40
May 27 11:48:38 hume kernel: Buffer I/O error on device sda, logical block 5
May 27 11:48:38 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:38 hume kernel: end_request: I/O error, dev sda, sector 48
May 27 11:48:38 hume kernel: Buffer I/O error on device sda, logical block 6

.. MANY MEGABYTES OF STUFF LIKE THIS ..
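Since the log runs to many megabytes, it may help to condense it into a per-device tally before comparing kernels. The following is only a minimal sketch, assuming the "end_request: I/O error, dev X, sector N" line format shown in the excerpt; the function name summarize_io_errors and the regular expression are illustrative, not part of any existing tool.

```python
import re
from collections import Counter

# Matches kernel lines of the form seen in the excerpt, e.g.
#   May 27 08:01:01 hume kernel: end_request: I/O error, dev sda, sector 12847
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def summarize_io_errors(lines):
    """Return (error count per device, set of failing sectors per device)."""
    counts = Counter()
    sectors = {}
    for line in lines:
        match = IO_ERROR.search(line)
        if match is None:
            continue  # not an end_request I/O error line
        dev, sector = match.group(1), int(match.group(2))
        counts[dev] += 1
        sectors.setdefault(dev, set()).add(sector)
    return counts, sectors


# A few lines taken from the excerpt; any iterable of log lines works.
sample = [
    "May 27 08:01:01 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000",
    "May 27 08:01:01 hume kernel: end_request: I/O error, dev sda, sector 12847",
    "May 27 08:14:27 hume kernel: end_request: I/O error, dev sda, sector 12847",
    "May 27 11:48:38 hume kernel: end_request: I/O error, dev sda, sector 0",
]
counts, sectors = summarize_io_errors(sample)
for dev, n in counts.most_common():
    print(f"{dev}: {n} errors, {len(sectors[dev])} distinct sectors")
```

To run it against the real log, pass an open file instead of the sample list, for example summarize_io_errors(open("/var/log/messages")). A tally like this should make it easier to see whether the errors really hit every /dev/sd* device or cluster on one controller.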