Hi. I woke up this morning to find a ton of waiting emails complaining that
some cron jobs on my system couldn't run because one of my filesystems (ext3
on software RAID 1) was suddenly mounted read-only. Always nice when you're
away from the server due to travel. ;^> I investigated in the logs and
found:
2007-08-02 04:02:25 kern.crit www kernel: EXT3-fs error (device md2): htree_dirblock_to_tree: bad entry in directory #3616894: rec_len is too small for name_len - offset=103576, inode=3619715, rec_len=12, name_len=132
2007-08-02 04:02:25 kern.err www kernel: Aborting journal on device md2.
2007-08-02 04:02:25 kern.crit www kernel: ext3_abort called.
2007-08-02 04:02:25 kern.crit www kernel: EXT3-fs error (device md2): ext3_journal_start_sb: Detected aborted journal
2007-08-02 04:02:25 kern.crit www kernel: Remounting filesystem read-only
I unmounted the filesystem and ran fsck, but though it detected that the
filesystem had errors, it didn't report any findings during the check:
fsck 1.35 (28-Feb-2004)
e2fsck 1.35 (28-Feb-2004)
/dev/md2: recovering journal
/dev/md2 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/md2: 200231/5248992 files (1.4% non-contiguous), 1563304/10492432 blocks
I remounted the filesystem and all *seems* to be okay now. I was curious
what "directory #3616894" (inode 3619715) was, so I did 'find / -inum
3619715 -exec ls -dioF {} \;', but the output showed that that was a
non-directory file created and last modified in 2004. How could this be?
And what would cause an error like the above? Am I out of the woods now, or
is there more checking of some kind that I should do to make sure this isn't
going to be happening again?
Thank you for your time!
--
Dan Harkless
http://harkless.org/dan/
Andreas Dilger
2007-Aug-03 16:49 UTC
"htree_dirblock_to_tree: bad entry in directory" error
On Aug 02, 2007 09:01 -0700, Dan Harkless wrote:
> 2007-08-02 04:02:25 kern.crit www kernel: EXT3-fs error (device md2): htree_dirblock_to_tree: bad entry in directory #3616894: rec_len is too small for name_len - offset=103576, inode=3619715, rec_len=12, name_len=132
>
> I unmounted the filesystem and ran fsck, but though it detected that the
> filesystem had errors, it didn't report any findings during the check:
>
> fsck 1.35 (28-Feb-2004)
> e2fsck 1.35 (28-Feb-2004)
> /dev/md2: recovering journal
> /dev/md2 contains a file system with errors, check forced.
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/md2: 200231/5248992 files (1.4% non-contiguous), 1563304/10492432 blocks
>
> I remounted the filesystem and all *seems* to be okay now. I was curious
> what "directory #3616894" (inode 3619715) was, so I did 'find / -inum
> 3619715 -exec ls -dioF {} \;', but the output showed that that was a
> non-directory file created and last modified in 2004. How could this be?

Note that the DIRECTORY is 3616894, and the entry within that directory
that was corrupted is 3619715.

> And what would cause an error like the above? Am I out of the woods now, or
> is there more checking of some kind that I should do to make sure this isn't
> going to be happening again?

Given that there is no corruption on disk, I would put this toward some
kind of memory corruption. It might be a single-bit error though, because
12 = 0xc and 132 = 0x84 so if you clear bit 0x80 from the name_len (leaving
a name_len = 4) it would be correct for a rec_len of 12. Is the filename
4 characters long?

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
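
For anyone following along, the single-bit arithmetic in the reply can be checked with a few lines of Python. This is only a sketch, not kernel code: it assumes the usual ext3 rule that a directory entry needs an 8-byte header plus the name, rounded up to a multiple of 4 bytes (my understanding of the kernel's EXT3_DIR_REC_LEN macro). The helper names min_rec_len and single_bit_candidates are made up for illustration.

```python
# Sketch only: illustrates the rec_len / name_len arithmetic from the log line.
# Assumption: an ext3 directory entry has an 8-byte header
# (inode 4 + rec_len 2 + name_len 1 + file_type 1) and is padded to 4 bytes.
DIRENT_HEADER_BYTES = 8
DIRENT_ROUND = 3


def min_rec_len(name_len: int) -> int:
    """Smallest record length that can hold a name of name_len bytes."""
    return (name_len + DIRENT_HEADER_BYTES + DIRENT_ROUND) & ~DIRENT_ROUND


def single_bit_candidates(rec_len: int, name_len: int) -> list[tuple[int, int]]:
    """Flip each bit of the 8-bit name_len in turn and keep the values
    that would fit inside the observed rec_len."""
    candidates = []
    for bit in range(8):
        flipped = name_len ^ (1 << bit)
        if flipped > 0 and min_rec_len(flipped) <= rec_len:
            candidates.append((bit, flipped))
    return candidates


if __name__ == "__main__":
    rec_len, name_len = 12, 132  # values from the kernel message
    print("minimum rec_len for name_len=132:", min_rec_len(name_len))
    print("name_len with bit 0x80 cleared:", name_len & ~0x80)
    print("single-bit flips that fit:", single_bit_candidates(rec_len, name_len))
```

Under those assumptions, a 132-byte name would need a 140-byte record, which is why the kernel complains about rec_len=12. The only single-bit flip of name_len that fits is bit 7 (0x80), giving a name length of 4, which matches the guess above.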