Out of the blue, dmesg on my HP ProLiant with a SCSI disk shows loads of messages like this one:

EXT3-fs error (device dm-0) in start_transaction: Journal has aborted

Then the root fs goes read-only, so little else can be done on the machine. LVM locks up. The machine then needs a reboot, and the fs recovers after an fsck. The host starts up OK, and I get a few more minutes before the problem reappears. This is stock CentOS 4.4; I have never managed to update it because of this very problem.

The system logs report SCSI I/O errors, but SMART reports no problems, and neither does badblocks (run from a rescue CD). SCSI cabling, terminator, etc. have been checked.

What should I investigate next? Is the disk condemned?

TIA

--
Eduardo Grosclaude
Universidad Nacional del Comahue
Neuquen, Argentina

On Wed, Jul 11, 2007 at 06:20:50PM -0300, Eduardo Grosclaude alleged:
> Out of the blue, dmesg on my HP Proliant w/ a SCSI disk gives loads of
> messages like this one:
>
> EXT3-fs error (device dm-0) in start_transaction: Journal has aborted
>
> Then the root fs goes read-only, so little else can be done on the machine.
> LVM locks up. At restart, fs needs a reboot to recover after fsck. The host
> starts up ok, then I am given some more minutes before the problem
> reappears. This is stock CentOS 4.4, never have gotten to update it because
> of this very same problem.
>
> System logs say SCSI I/O error, but SMART says no problem has been found,
> neither does badblocks (run from a rescue CD bootup). SCSI cabling,
> terminator, etc has been checked.
>
> What should I investigate next? Is the disk condemned?

Quite likely the drive is dying. If you want proof from SMART, a long self-test started with something like 'smartctl -t long /dev/sda' will likely fail.

--
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
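
For anyone following along, the self-test suggested above could be run roughly as sketched below. This is only an illustration: the function names and the /dev/sda device are assumptions, not something posted in the thread, and a disk sitting behind a RAID controller may need smartctl's -d option to be reached at all.

```shell
#!/bin/sh
# Sketch only: the function names and the example device are illustrative
# assumptions, not something posted in the thread.

# Start the drive's extended (long) self-test. The test runs in the
# background on the drive itself and can take a long time on large disks.
start_long_selftest() {
    smartctl -t long "$1"
}

# Print the drive's self-test log, where the outcome of the test is
# recorded once it has finished.
show_selftest_log() {
    smartctl -l selftest "$1"
}

# Print the drive's overall health assessment.
show_health() {
    smartctl -H "$1"
}

# Example (as root):
#   start_long_selftest /dev/sda
#   ...wait for the test to finish...
#   show_selftest_log /dev/sda
#   show_health /dev/sda
```

A passing overall health check does not rule out a failing drive, so the self-test log is the more useful of the two outputs here.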

On 7/11/07, Eduardo Grosclaude <eduardo.grosclaude at gmail.com> wrote:
> Out of the blue, dmesg on my HP Proliant w/ a SCSI disk gives loads of
> messages like this one:
>
> EXT3-fs error (device dm-0) in start_transaction: Journal has aborted
>
> Then the root fs goes read-only, so little else can be done on the machine.
> LVM locks up. At restart, fs needs a reboot to recover after fsck. The host
> starts up ok, then I am given some more minutes before the problem
> reappears. This is stock CentOS 4.4, never have gotten to update it because
> of this very same problem.
>
> System logs say SCSI I/O error, but SMART says no problem has been found,
> neither does badblocks (run from a rescue CD bootup). SCSI cabling,
> terminator, etc has been checked.
>
> What should I investigate next? Is the disk condemned?

SMART isn't foolproof. I have had disks that made clunking and scraping sounds and kept spinning up again, yet still got the SMART seal of approval.

My normal checklist is the above, but replacing the items rather than just inspecting them (in case that isn't what you meant by "checked"). Replace:

  terminator
  SCSI cable
  controller
  disk drive

though I usually do the disk drive first, then the controller.

--
Stephen J Smoogen. -- CSIRT/Linux System Administrator
How far that little candle throws his beams!
So shines a good deed in a naughty world. = Shakespeare. "The Merchant of Venice"