I had a busy mailserver fail on me the other day. Below is what was
printed in dmesg. We first suspected a hardware failure (RAID controller
or something else), so we moved the drives to another machine with
identical hardware and ran fsck. Fsck complained ("short read while
reading inode") and asked whether I wanted to ignore the error and
rewrite, which I did.
After booting up again, the problem came back immediately and root was
remounted read-only. We moved the data from the read-only drive to a new
machine. While copying the data, we got this message from time to time
(on various files): "EXT3-fs error (device dm-0): ext3_get_inode_loc:
unable to read inode block - inode=22561891, block=90243144".
I need to find the cause(s) of the problems. So far I have these
questions/concerns:
- Kernel bug? (This is Ubuntu 8.10 with 2.6.27-7-server)
- Filesystem bug/failure?
- Did the RAID controller fail to detect a failing drive? This is an
Adaptec AOC-USAS-S4iR running on a Supermicro motherboard.
I suspect that one of the drives (RAID 6 btw) has failed, but I'm not
sure what to do from here.
Any ideas? Thanks in advance.
dmesg:
[ 38.907730] end_request: I/O error, dev sda, sector 284688831
[ 38.907802] EXT3-fs error (device dm-0): read_block_bitmap: Cannot read block
bitmap - block_group = 1086, block_bitmap = 35586048
[ 38.907956] Aborting journal on device dm-0.
[ 38.919742] ext3_abort called.
[ 38.919798] EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected
aborted journal
[ 38.919942] Remounting filesystem read-only
[ 38.925855] __journal_remove_journal_head: freeing b_committed_data
[ 38.925915] journal commit I/O error
[ 38.925935] journal commit I/O error
[ 38.925953] journal commit I/O error
[ 38.943245] Remounting filesystem read-only
[ 38.958907] EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal
has aborted
[ 38.958988] EXT3-fs error (device dm-0) in ext3_truncate: Journal has aborted
[ 38.959051] EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal
has aborted
[ 38.959137] EXT3-fs error (device dm-0) in ext3_orphan_del: Journal has
aborted
[ 38.959222] EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal
has aborted
[ 39.024087] journal commit I/O error
[ 39.024103] journal commit I/O error
[ 39.024117] journal commit I/O error
[ 39.024124] journal commit I/O error
[ 39.024181] journal commit I/O error
[ 39.024201] journal commit I/O error
[ 39.024208] journal commit I/O error
[ 39.024258] journal commit I/O error
[ 39.024275] journal commit I/O error
[ 39.024284] journal commit I/O error
[ 39.024330] journal commit I/O error
[ 39.024358] journal commit I/O error
[ 39.024384] journal commit I/O error
[ 39.024432] journal commit I/O error
[ 39.024481] journal commit I/O error
[ 45.749997] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE,SUGGEST_OK
[ 45.750008] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current]
[ 45.750012] sd 0:0:0:0: [sda] Add. Sense: Internal target failure
[ 45.750017] end_request: I/O error, dev sda, sector 721945599
[ 45.750079] Buffer I/O error on device dm-0, logical block 90243144
[ 45.750137] lost page write due to I/O error on dm-0
[ 87.970284] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE,SUGGEST_OK
[ 87.970292] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current]
[ 87.970296] sd 0:0:0:0: [sda] Add. Sense: Internal target failure
[ 87.970302] end_request: I/O error, dev sda, sector 83324999
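
One way to check whether the ext3 errors in the log line up with the I/O
errors on sda is to compare the block numbers reported for dm-0 with the
failing sda sectors. The sketch below is only a rough sanity check. It
assumes a 4096-byte ext3 block size and 512-byte sectors, neither of
which is stated in the log, and it assumes that each ext3 message belongs
to the end_request line printed just before it. If both pairs imply the
same starting offset of dm-0 on sda, that would suggest the filesystem
errors follow directly from the reads and writes that sda rejected,
rather than from separate filesystem corruption.

```python
# Assumptions (not stated in the log): 512-byte sectors, 4096-byte ext3 blocks.
SECTOR_SIZE = 512
BLOCK_SIZE = 4096
SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE

# (block number on dm-0, failing sector on sda), paired by adjacency in the log.
pairs = [
    (35586048, 284688831),  # block bitmap of group 1086
    (90243144, 721945599),  # logical block from the "Buffer I/O error" line
]


def implied_offset(fs_block, disk_sector):
    """Sectors between the start of sda and the start of dm-0 implied by one pair."""
    return disk_sector - fs_block * SECTORS_PER_BLOCK


offsets = [implied_offset(block, sector) for block, sector in pairs]
print(offsets)  # -> [447, 447]

if len(set(offsets)) == 1:
    print("Both ext3 errors map to the failing sda sectors with the same offset.")
else:
    print("Offsets differ; the block size or the pairing assumption is probably wrong.")
```

A matching offset only ties the ext3 messages to the sda errors. It does
not say whether the fault is in a drive, the controller, or the cabling.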
--
Vegard Svanberg <vegard at svanberg.no> [*Takapa at IRC (EFnet)]