On Tue, Oct 28, 2003 at 04:51:52PM +0000, Ben Mansell wrote:
> ext3 is having big problems on my x86-64 system. An ext3 partition has
> just gone crazy, with the following messages in dmesg:
>
> EXT3-fs error (device hda3): ext3_readdir: bad entry in directory #4603905:
> rec_len % 4 != 0 - offset=0, inode=2507704792, rec_len=42,
> name_len=0
> Aborting journal on device hda3.
> ext3_abort called.
> EXT3-fs abort (device hda3): ext3_journal_start: Detected aborted
> journal
> Remounting filesystem read-only
> EXT3-fs error (device hda3) in start_transaction: Journal has aborted
> EXT3-fs error (device hda3) in start_transaction: Journal has aborted
> [...]
>
> I hit a similar problem yesterday, but lost some details so I couldn't
> make a proper bug report. However, it did mean that the partition got
> fully fscked, so I think these errors are ext3 getting confused all by
> itself, rather than it complaining about an already-corrupt filesystem.
> There's no sign of any hardware problems with the disk or controller in
> the logs.

The "bad entry in directory" message is very clearly a corrupted
filesystem error. Sometimes, though, fsck might not see the problem if
the block was corrupted when it was read from the disk (so that the
in-memory copy is corrupt, but the on-disk copy is still valid). This
is one of the reasons why, as soon as filesystem corruption is
detected, the first thing ext3 does is to (figuratively) slam down the
bulkheads to contain the damage and remount the filesystem read-only.

Can you try running e2fsck on it, and send us a transcript of e2fsck's
output?

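
For illustration, here is a rough sketch of the kind of sanity checks
that produce the "bad entry in directory" message. It is modeled
loosely on the directory-entry validation in ext3, not copied from it.
The function names and the 4096-byte block size are assumptions made
for the example. Fed the values from the log above, it trips the same
"rec_len % 4 != 0" check:

```python
def dir_rec_len(name_len):
    """Minimum record length for a name: 8-byte header plus the name,
    rounded up to a multiple of 4."""
    return (name_len + 8 + 3) & ~3


def check_dir_entry(rec_len, name_len, offset, block_size=4096):
    """Return a reason string if the entry looks corrupt, else None.

    Hypothetical helper; block_size=4096 is an assumed value.
    """
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > block_size:
        return "directory entry across blocks"
    return None


# Values taken from the dmesg line quoted above.
print(check_dir_entry(rec_len=42, name_len=0, offset=0))
# prints: rec_len % 4 != 0
```

A record length of 42 can never be valid, since every entry is padded
to a 4-byte boundary, so the block holding that directory was garbage
by the time the filesystem code looked at it.
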
> Going back through the logs, there are also these worrying messages
> (from the previous crash):
>
> attempt to access beyond end of device
> hda3: rw=0, want=24855217912, limit=143235540
> attempt to access beyond end of device
> hda3: rw=0, want=7590624504, limit=143235540
> attempt to access beyond end of device
> hda3: rw=0, want=17158939432, limit=143235540
> attempt to access beyond end of device
> hda3: rw=0, want=32230066456, limit=143235540
>

These are also traditionally signs of corrupted filesystem metadata.
Note that if data blocks are corrupted in transit between the CPU and
the disk, in either direction, there may very well not be any log
entries warning about such failures, other than the complaints from
the ext2/3 filesystem.

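
To put numbers on how far out of range those requests are, here is a
small sketch that parses the quoted log lines and compares "want"
against "limit". It assumes both values are reported in the same unit,
and the helper name is made up for the example:

```python
import re

# Log lines copied from the report above.
LOG = """\
hda3: rw=0, want=24855217912, limit=143235540
hda3: rw=0, want=7590624504, limit=143235540
hda3: rw=0, want=17158939432, limit=143235540
hda3: rw=0, want=32230066456, limit=143235540
"""

PATTERN = re.compile(r"(\w+): rw=(\d+), want=(\d+), limit=(\d+)")


def out_of_range_requests(log_text):
    """Return (device, want, limit, want/limit) for each request that
    asks for a location past the end of the device."""
    results = []
    for m in PATTERN.finditer(log_text):
        dev = m.group(1)
        want, limit = int(m.group(3)), int(m.group(4))
        if want > limit:
            results.append((dev, want, limit, want / limit))
    return results


for dev, want, limit, ratio in out_of_range_requests(LOG):
    print(f"{dev}: want={want} is {ratio:.1f}x the device limit {limit}")
```

Every request lands many times past the end of the partition, which is
what you would expect if the block numbers came from trashed metadata
rather than from an off-by-a-little bug.
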
- Ted