Hi. I woke up this morning to find a ton of waiting emails complaining that
some cron jobs on my system couldn't run because one of my filesystems (ext3
on software RAID 1) was suddenly mounted read-only. Always nice when you're
away from the server due to travel. ;^>

I investigated in the logs and found:

2007-08-02 04:02:25 kern.crit www kernel: EXT3-fs error (device md2): htree_dirblock_to_tree: bad entry in directory #3616894: rec_len is too small for name_len - offset=103576, inode=3619715, rec_len=12, name_len=132
2007-08-02 04:02:25 kern.err www kernel: Aborting journal on device md2.
2007-08-02 04:02:25 kern.crit www kernel: ext3_abort called.
2007-08-02 04:02:25 kern.crit www kernel: EXT3-fs error (device md2): ext3_journal_start_sb: Detected aborted journal
2007-08-02 04:02:25 kern.crit www kernel: Remounting filesystem read-only

I unmounted the filesystem and ran fsck, but though it detected that the
filesystem had errors, it didn't report any findings during the check:

fsck 1.35 (28-Feb-2004)
e2fsck 1.35 (28-Feb-2004)
/dev/md2: recovering journal
/dev/md2 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/md2: 200231/5248992 files (1.4% non-contiguous), 1563304/10492432 blocks

I remounted the filesystem and all *seems* to be okay now. I was curious
what "directory #3616894" (inode 3619715) was, so I did 'find / -inum
3619715 -exec ls -dioF {} \;', but the output showed that that was a
non-directory file created and last modified in 2004. How could this be?

And what would cause an error like the above? Am I out of the woods now, or
is there more checking of some kind that I should do to make sure this isn't
going to be happening again?

Thank you for your time!

--
Dan Harkless
http://harkless.org/dan/

Andreas Dilger
2007-Aug-03 16:49 UTC
"htree_dirblock_to_tree: bad entry in directory" error
On Aug 02, 2007 09:01 -0700, Dan Harkless wrote:
> 2007-08-02 04:02:25 kern.crit www kernel: EXT3-fs error (device md2): htree_dirblock_to_tree: bad entry in directory #3616894: rec_len is too small for name_len - offset=103576, inode=3619715, rec_len=12, name_len=132
>
> I unmounted the filesystem and ran fsck, but though it detected that the
> filesystem had errors, it didn't report any findings during the check:
>
> fsck 1.35 (28-Feb-2004)
> e2fsck 1.35 (28-Feb-2004)
> /dev/md2: recovering journal
> /dev/md2 contains a file system with errors, check forced.
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/md2: 200231/5248992 files (1.4% non-contiguous), 1563304/10492432 blocks
>
> I remounted the filesystem and all *seems* to be okay now. I was curious
> what "directory #3616894" (inode 3619715) was, so I did 'find / -inum
> 3619715 -exec ls -dioF {} \;', but the output showed that that was a
> non-directory file created and last modified in 2004. How could this be?

Note that the DIRECTORY is 3616894, and the entry within that directory
that was corrupted is 3619715.

> And what would cause an error like the above? Am I out of the woods now, or
> is there more checking of some kind that I should do to make sure this isn't
> going to be happening again?

Given that there is no corruption on disk, I would put this toward some
kind of memory corruption. It might be a single-bit error though, because
12 = 0xc and 132 = 0x84 so if you clear bit 0x80 from the name_len (leaving
a name_len = 4) it would be correct for a rec_len of 12. Is the filename
4 characters long?

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
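
The single-bit reasoning in the reply above can be checked with a small sketch. It assumes the usual ext3 directory-entry layout (an 8-byte header followed by the name, padded to a 4-byte boundary), which is what the kernel's EXT3_DIR_REC_LEN macro computes. The function names min_rec_len and single_bit_candidates are made up for this illustration and are not part of any tool.

```python
EXT3_DIR_ROUND = 3  # directory entries are padded to a 4-byte boundary


def min_rec_len(name_len: int) -> int:
    """Smallest valid rec_len for a name of this length.

    Mirrors EXT3_DIR_REC_LEN: 8-byte entry header plus the name,
    rounded up to a multiple of 4.
    """
    return (name_len + 8 + EXT3_DIR_ROUND) & ~EXT3_DIR_ROUND


def single_bit_candidates(name_len: int, rec_len: int) -> list[tuple[int, int]]:
    """Return (bit, candidate) pairs where flipping one bit of the
    8-bit name_len gives a non-empty name that fits in rec_len."""
    found = []
    for bit in range(8):
        candidate = name_len ^ (1 << bit)
        if candidate > 0 and min_rec_len(candidate) <= rec_len:
            found.append((bit, candidate))
    return found


if __name__ == "__main__":
    # Values taken from the kernel log line in this thread.
    print(min_rec_len(132))                 # 140, so rec_len=12 is too small
    print(single_bit_candidates(132, 12))   # [(7, 4)]: clearing 0x80 leaves 4
```

With the logged values, the only single-bit flip that yields a consistent entry is bit 7 (0x80), leaving name_len = 4, which is why a 4-character filename would support the memory-corruption theory. This only shows the arithmetic is consistent; it does not prove what actually happened in memory.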