David Shaw
2005-Jun-14 21:14 UTC
bad inode number followed by ext3_abort and remount readonly
I have seen this happen a number of times:

Jun 13 13:58:16 n202 kernel: EXT3-fs error (device sda5): ext3_get_inode_block: bad inode number: 9
Jun 13 13:58:16 n202 kernel: Aborting journal on device sda5.
Jun 13 13:58:16 n202 kernel: EXT3-fs error (device sda5): ext3_get_inode_block: bad inode number: 9
Jun 13 13:58:16 n202 last message repeated 6 times
Jun 13 13:58:18 n202 kernel: ext3_abort called.
Jun 13 13:58:18 n202 kernel: EXT3-fs error (device sda5): ext3_journal_start_sb: Detected aborted journal
Jun 13 13:58:18 n202 kernel: Remounting filesystem read-only

Once this happens, things break quickly (/tmp being read-only, for a start). Upon reboot, a manual fsck is required, after which the machine is operational again.

This particular example is a SATA disk, but it has happened to a regular old IDE disk as well. It is always the root partition. The bad inode number varies (but is always either 3 or 9). There are no other errors about the disk in the log.

Kernel: 2.6.11.7
e2fsprogs: 1.35 (28-Feb-2004)

Any thoughts on how to proceed here? Unfortunately, I'm not able to duplicate this at will.

David
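To check across several machines whether the reported inode numbers really are always 3 or 9, one could tally them from the kernel log. The following is a minimal sketch, assuming the log lines look exactly like the excerpt above; the regular expression and the function name tally_bad_inodes are illustrative, not part of any existing tool.

```python
import re
from collections import Counter

# Matches the ext3 message format seen in the excerpt above, e.g.
# "EXT3-fs error (device sda5): ext3_get_inode_block: bad inode number: 9"
BAD_INODE_RE = re.compile(
    r"EXT3-fs error \(device (?P<dev>\S+)\): "
    r"ext3_get_inode_block: bad inode number: (?P<ino>\d+)"
)


def tally_bad_inodes(lines):
    """Count (device, inode number) pairs reported by ext3_get_inode_block."""
    counts = Counter()
    for line in lines:
        m = BAD_INODE_RE.search(line)
        if m:
            counts[(m.group("dev"), int(m.group("ino")))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jun 13 13:58:16 n202 kernel: EXT3-fs error (device sda5): "
        "ext3_get_inode_block: bad inode number: 9",
        "Jun 13 13:58:16 n202 kernel: Aborting journal on device sda5.",
        "Jun 13 13:58:16 n202 kernel: EXT3-fs error (device sda5): "
        "ext3_get_inode_block: bad inode number: 9",
    ]
    print(tally_bad_inodes(sample))
```

Note that syslogd collapses duplicates into "last message repeated N times" lines, which this sketch does not expand, so the counts are a lower bound. The log file location varies by distribution; any iterable of lines (such as an open file) can be passed in.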
Andreas Dilger
2005-Jun-14 23:19 UTC
On Jun 14, 2005 17:14 -0400, David Shaw wrote:
> Jun 13 13:58:16 n202 kernel: EXT3-fs error (device sda5): ext3_get_inode_block: bad inode number: 9
>
> This particular example is a SATA disk, but it has happened to a
> regular old IDE disk as well. It is always the root partition. The
> bad inode number varies (but is always either 3 or 9). There are no
> other errors about the disk in the log.

The "bad inode number" check is only for inodes inside the "reserved inode" area, namely inum < 12. The only commonly used (=valid) inode numbers in this range are the root inode (=2) and the journal inode (=8), so I suspect you are getting single-bit memory errors in bit 1. If the controller is the same in both cases, it would also be viewed with suspicion. It is very likely that you are getting other single-bit errors elsewhere, but they are harder to notice.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
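The bit-flip reasoning can be checked with a little arithmetic: 3 is 2 (root) with the lowest-order bit set, and 9 is 8 (journal) with the same bit set. That is the bit with value 1, which is bit 0 when counting from zero. The sketch below assumes only the two valid reserved inodes named in the message; the helper name single_bit_neighbours is hypothetical.

```python
# Valid reserved inodes named in the message: root (2) and journal (8).
VALID_RESERVED = {2: "root", 8: "journal"}


def single_bit_neighbours(bad_ino):
    """Return (inode, name, bit index) for valid reserved inodes that
    differ from bad_ino in exactly one bit."""
    hits = []
    for good, name in VALID_RESERVED.items():
        diff = good ^ bad_ino
        # diff is a nonzero power of two exactly when one bit differs
        if diff and diff & (diff - 1) == 0:
            hits.append((good, name, diff.bit_length() - 1))
    return hits


for bad in (3, 9):
    for good, name, bit in single_bit_neighbours(bad):
        print(f"{bad} = {good} ({name}) with bit {bit} (value {1 << bit}) flipped")
```

Running it prints one line per observed number, showing 3 pairs with the root inode and 9 with the journal inode, each differing in the same low-order bit. A consistent bit position like this is what points toward memory or a shared data path rather than random on-disk corruption.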