Hi,

I am getting the following errors on /boot partition on a production
server running 2.4.18 kernel.

-------------
Aug 10 00:19:37 kernel: EXT3-fs error (device sd(8,1)):
ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal -
offset=0, inode=0, rec_len=0, name_len=0
-------------

/boot is not readable at all and any attempt to do an ls returns the same
error.

attempts to force an fsck on that mounted partition (not sure if it is ok
to umount /boot) result in fsck returning a segmentation fault.

here is an strace of fsck crashing.

Any idea on how to recover from this one gracefully? preferably without
rebooting :)

------
strace /sbin/fsck.ext3 -n /dev/sda1
execve("/sbin/fsck.ext3", ["/sbin/fsck.ext3", "-n", "/dev/sda1"], [/* 26 vars */]) = 0
fcntl64(0, 0x1, 0, 0xbffffc74) = 0
fcntl64(0x1, 0x1, 0, 0xbffffc74) = 0
fcntl64(0x2, 0x1, 0, 0xbffffc74) = 0
geteuid32() = 0
getuid32() = 0
getegid32() = 0
getgid32() = 0
brk(0) = 0x80d4fd0
brk(0x80d4ff0) = 0x80d4ff0
brk(0x80d5000) = 0x80d5000
brk(0x80d6000) = 0x80d6000
rt_sigaction(SIGUSR1, {0x8048e40, [], SA_RESTART|0x4000000}, NULL, 8) = 0
rt_sigaction(SIGUSR2, {0x8048e74, [], SA_RESTART|0x4000000}, NULL, 8) = 0
open("/dev/null", O_RDWR) = 3
close(3) = 0
gettimeofday({1028968512, 177653}, NULL) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 0}, ru_stime={0, 0}, ...}) = 0
write(2, "e2fsck 1.26 (3-Feb-2002)\n", 25e2fsck 1.26 (3-Feb-2002)
) = 25
stat64("/dev/sda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 1), ...}) = 0
open("/proc/swaps", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40000000
read(3, "Filename\t\t\tType\t\tSize\tUsed\tPrior"..., 1024) = 148
stat64("/dev/sda7", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 7), ...}) = 0
stat64("/dev/sda8", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 8), ...}) = 0
read(3, "", 1024) = 0
open("/proc/mounts", O_RDONLY) = 5
stat64("/dev/sda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 1), ...}) = 0
brk(0x80d8000) = 0x80d8000
fstat64(5, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40001000
read(5, "/dev/root / ext3 rw 0 0\n/proc /p"..., 1024) = 529
stat64("/dev/root", 0xbffff930) = -1 ENOENT (No such file or directory)
stat64("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
stat64("/boot", {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
close(5) = 0
munmap(0x40001000, 4096) = 0
write(1, "Warning! /dev/sda1 is mounted.\n", 32Warning! /dev/sda1 is mounted.
) = 32
brk(0x80d9000) = 0x80d9000
open("/dev/sda1", O_RDONLY|O_LARGEFILE) = 5
lseek(5, 1024, SEEK_SET) = 1024
read(5, "\210\27\0\0\2^\0\0\263\4\0\0B(\0\0X\27\0\0\1\0\0\0\0\0"..., 1024) = 1024
brk(0x80da000) = 0x80da000
lseek(5, 2048, SEEK_SET) = 2048
read(5, "\3\0\0\0\4\0\0\0\5\0\0\0\17\6\265\7\2\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
lseek(5, 5120, SEEK_SET) = 5120
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++
------------

------
Martial
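The kernel message above reports a directory entry at offset 0 of the root directory (#2) whose inode, rec_len and name_len are all zero; normally the first entry there is ".". The minimal Python sketch below shows why an all-zero block trips exactly this message. It is an approximation of the sanity checks the kernel applies to directory entries, not a copy of the 2.4 code: the check order, the message strings, and the helper names `dir_rec_len` and `check_dir_entry` are assumptions for illustration, and the default inode count of 6024 is taken from the debugfs stats posted later in the thread.

```python
import struct
from typing import Optional


def dir_rec_len(name_len: int) -> int:
    # Smallest legal record for a name of name_len bytes: the 8-byte header
    # plus the name, rounded up to a multiple of 4.
    return (name_len + 8 + 3) & ~3


def check_dir_entry(block: bytes, offset: int, block_size: int = 1024,
                    inodes_count: int = 6024) -> Optional[str]:
    """Return the first problem found in the entry at `offset`, or None."""
    # On-disk layout with the filetype feature (little-endian):
    # inode (u32), rec_len (u16), name_len (u8), file_type (u8), then the name.
    inode, rec_len, name_len, _file_type = struct.unpack_from("<IHBB", block, offset)
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > block_size:
        return "directory entry across blocks"
    if inode > inodes_count:
        return "inode out of bounds"
    return None
```

Called on a block of 1024 zero bytes with offset 0, the very first test fails because rec_len is 0, which matches the log line.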
On Aug 10, 2002 18:42 +1000, Martial Herbaut wrote:
> I am getting the following errors on /boot partition on a production
> server running 2.4.18 kernel.
>
> -------------
> Aug 10 00:19:37 kernel: EXT3-fs error (device sd(8,1)):
> ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal -
> offset=0, inode=0, rec_len=0, name_len=0
>
> /boot is not readable at all and attempt to do an ls will return the same
> error.

This looks like it was trying to read some data and got zeros instead.

> attempts to force an fsck on that mounted partition (not sure if it is ok
> to umount /boot) results in fsck returning a segmentation fault.

It is a very bad idea to fsck a mounted filesystem. You can try just
unmounting it and then running e2fsck on it (you probably have to stop
sysklogd because it keeps System.map open, and maybe others if lsof shows
anything).

Have you rebooted this system since you got the error? It may also be
that a bad page was "read" in, and any further attempts to read this
chunk of data are being served from the cache instead of from the disk.
This is partly speculation, though.

> here is an strace of fsck crashing.
>
> open("/dev/sda1", O_RDONLY|O_LARGEFILE) = 5
> lseek(5, 1024, SEEK_SET) = 1024
> read(5, "\210\27\0\0\2^\0\0\263\4\0\0B(\0\0X\27\0\0\1\0\0\0\0\0"..., 1024)
> = 1024

Reading the superblock, OK.

> lseek(5, 2048, SEEK_SET) = 2048
> read(5, "\3\0\0\0\4\0\0\0\5\0\0\0\17\6\265\7\2\0\0\0\0\0\0\0\0\0"...,
> 1024) = 1024

Reading the group descriptor table, OK.

> lseek(5, 5120, SEEK_SET) = 5120
> read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024)
> = 1024

Reading the inode table, including the root inode (#2, which is what you
are having problems with). It is hard to tell whether this is bad data,
since strace only shows the first few bytes of the read, and those belong
to inode #1, whose 128 bytes may well be all zeros anyway. In any case,
the fact that e2fsck is crashing is bad, and if it is crashing because of
a kernel oops, that is even worse.
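The "Reading the superblock, OK" verdict can be checked by decoding the bytes strace printed for the read at offset 1024. A minimal Python sketch, assuming the standard ext2/ext3 superblock layout, which begins with six little-endian 32-bit counters; `decode_superblock_prefix` is a hypothetical helper name. strace escapes bytes in C octal notation, which Python bytes literals also accept, so the string can be pasted as-is. The magic number (0xEF53, at byte 56) lies beyond the part strace displayed, so it cannot be checked from this output.

```python
import struct

# First bytes of the 1024-byte read at offset 1024, copied from the strace.
SB_PREFIX = b"\210\27\0\0\2^\0\0\263\4\0\0B(\0\0X\27\0\0\1\0\0\0\0\0"

FIELDS = ("s_inodes_count", "s_blocks_count", "s_r_blocks_count",
          "s_free_blocks_count", "s_free_inodes_count", "s_first_data_block")


def decode_superblock_prefix(raw: bytes) -> dict:
    # Six little-endian unsigned 32-bit integers at the start of the superblock.
    return dict(zip(FIELDS, struct.unpack_from("<6I", raw, 0)))


sb = decode_superblock_prefix(SB_PREFIX)
for name, value in sb.items():
    print(f"{name:22} {value}")
```

The decoded values (6024 inodes, 24066 blocks, 1203 reserved, 10306 free blocks, 5976 free inodes, first data block 1) agree with the debugfs stats output posted later in the thread, so the primary superblock really does look intact.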
What would be very useful is if you ran e2fsck under GDB and found where
it is crashing, so that this can be fixed. You may have to download the
sources and build e2fsck yourself to get a version with debugging symbols.

You could try debugfs to see what is there, like this (it may crash too,
but it will do no harm, since debugfs opens the device read-only unless
you give it -w):

# debugfs /dev/sda1
debugfs> stats
debugfs> stat <2>

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/
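The `stat <2>` request above asks debugfs for the root inode. Where that inode lives on disk also explains why the strace of the read at offset 5120 cannot settle whether it is damaged. A minimal Python sketch of the usual inode-location arithmetic, assuming this filesystem's parameters (2008 inodes per group, 128-byte inodes, 1 KiB blocks) and group 0's inode table at block 5 as seen in the first strace; `inode_location` is a hypothetical helper.

```python
def inode_location(ino, inodes_per_group, inode_size, block_size, inode_tables):
    """Return (group, byte offset on the device) of inode number `ino`.

    `inode_tables` maps a group number to the first block of its inode table.
    """
    index = ino - 1                       # inode numbers start at 1, not 0
    group, slot = divmod(index, inodes_per_group)
    return group, inode_tables[group] * block_size + slot * inode_size


# /boot parameters: 2008 inodes/group, 128-byte inodes, 1 KiB blocks,
# group 0's inode table at block 5.
group, offset = inode_location(2, 2008, 128, 1024, {0: 5})
print(f"inode <2>: group {group}, byte offset {offset}")
```

This puts the root inode at bytes 5248 through 5375, past the portion of the 1024-byte read that strace displayed, which covers only part of inode #1.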
Thank you Andreas,

I unmounted it and it returned:

fsck.ext3 /dev/sda1
e2fsck 1.26 (3-Feb-2002)
Group descriptors look bad... trying backup blocks...
Segmentation fault
-----

The system has not been rebooted; I would rather not.
----------------

Here is the debugfs output:
-------
debugfs: stats
Filesystem volume name:   /boot
Last mounted on:          <not available>
Filesystem UUID:          bf4f160a-e31b-11d5-8665-ec36598cd26c
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal filetype sparse_super
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              6024
Block count:              24066
Reserved block count:     1203
Free blocks:              10306
Free inodes:              5976
First block:              1
Block size:               1024
Fragment size:            1024
Blocks per group:         8192
Fragments per group:      8192
Inodes per group:         2008
Inode blocks per group:   251
Last mount time:          Wed Jul 24 13:36:16 2002
Last write time:          Sat Aug 10 21:59:43 2002
Mount count:              17
Maximum mount count:      -1
Last checked:             Sat Apr 27 23:29:38 2002
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal UUID:             <none>
Journal inode:            8
Journal device:           0x0000
First orphan inode:       0
Group 0: block bitmap at 0, inode bitmap at 0, inode table at 0
         0 free blocks, 0 free inodes, 0 used directories
Group 1: block bitmap at 0, inode bitmap at 0, inode table at 0
         0 free blocks, 0 free inodes, 0 used directories
Group 2: block bitmap at 0, inode bitmap at 0, inode table at 0
         0 free blocks, 0 free inodes, 0 used directories
------------

> _______________________________________________
> Ext3-users mailing list
> Ext3-users@redhat.com
> https://listman.redhat.com/mailman/listinfo/ext3-users

--
Martial Herbaut
---------------
Server101
Fast and Reliable Hosting!
http://www.server101.com/
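The stats output above contains everything needed to work out where the backup superblock and group descriptors should be. A minimal Python sketch, assuming the sparse_super placement rule (copies in groups 0 and 1 and in groups that are powers of 3, 5 or 7) and the usual layout where group g starts at first_data_block + g * blocks_per_group; the function names are hypothetical.

```python
def group_count(blocks_count, first_data_block, blocks_per_group):
    # Ceiling division: the last group may be only partly populated.
    return -(-(blocks_count - first_data_block) // blocks_per_group)


def is_power_of(n, base):
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def has_super_backup(group):
    # sparse_super: groups 0 and 1, plus powers of 3, 5 and 7.
    return group in (0, 1) or any(is_power_of(group, b) for b in (3, 5, 7))


def superblock_blocks(blocks_count, first_data_block, blocks_per_group):
    """Blocks holding a superblock copy; that group's descriptor copy follows at +1."""
    groups = group_count(blocks_count, first_data_block, blocks_per_group)
    return [first_data_block + g * blocks_per_group
            for g in range(groups) if has_super_backup(g)]


# Values from the debugfs stats output for /dev/sda1.
BLOCK_SIZE = 1024
for blk in superblock_blocks(24066, 1, 8192):
    print(f"superblock copy at block {blk} (byte {blk * BLOCK_SIZE})")
```

With 3 groups, the only backup is in group 1, at block 8193 (byte 8389632, about 8MB in), with the backup group descriptors in block 8194. That block number is the kind of value e2fsck's `-b` option takes, and something like `od -Ax -tx4 -j 8389632 -N 2048 /dev/sda1` would dump just those two blocks.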
On Aug 10, 2002 22:14 +1000, Martial Herbaut wrote:
> I unmounted it and it returned:
> fsck.ext3 /dev/sda1
> e2fsck 1.26 (3-Feb-2002)
> Group descriptors look bad... trying backup blocks...
> Segmentation fault

Well, you might get some better behaviour from 1.27, but you might not.
The only way to really debug the crash is to run e2fsck under GDB, and
until that is fixed there is not much that can be done to repair this
filesystem.

> the system has not been rebooted. would rather not..

Well, it is likely that any (bad) data in memory has already been written
to disk anyway. It looks like someone/something overwrote your disk with
zeros. Try something like:

od -Ax -tx4 -a /dev/sda1

to see if there is anything there at all worth saving, or if the whole
disk was overwritten (od collapses runs of identical lines into a single
"*", so a zeroed disk produces very little output). If the backup
superblock and group descriptors are also gone (they start at block 8193,
about 8MB into the filesystem, given 1 KiB blocks and 8192 blocks per
group), then there is not much hope. The only chance is that there _is_
data there and e2fsck is just crashing when it is trying to read the
backups...

If everything is gone, you may as well just make a new filesystem and
restore your /boot partition. There is not much important data there,
just your kernel and maybe the initrd. You can copy the kernel from
/usr/src/linux or from another system and re-run mkinitrd, or re-install
your kernel RPM to get it back.
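The `od` check above can also be scripted so that it simply lists which blocks contain anything at all, which is easier to read on a 24MB partition. A minimal read-only Python sketch, assuming 1 KiB blocks as in the stats output; `nonzero_blocks` is a hypothetical helper, and it should be pointed at the unmounted device or at a `dd` copy of it.

```python
def nonzero_blocks(path, block_size=1024, limit=None):
    """Yield the numbers of blocks that are not entirely zero bytes.

    Opens `path` read-only; `limit` optionally caps how many blocks are read.
    """
    zero = bytes(block_size)
    with open(path, "rb") as dev:
        block = 0
        while limit is None or block < limit:
            chunk = dev.read(block_size)
            if not chunk:                 # end of device or image
                break
            if chunk != zero[:len(chunk)]:
                yield block
            block += 1
```

For example, `list(nonzero_blocks("/dev/sda1"))` on this filesystem would show whether block 8193 (the backup superblock) or any file data survived; an empty list would mean the whole partition reads back as zeros.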
You will also need to re-run lilo if you are using it rather than grub.

> Group 0: block bitmap at 0, inode bitmap at 0, inode table at 0
>          0 free blocks, 0 free inodes, 0 used directories
> Group 1: block bitmap at 0, inode bitmap at 0, inode table at 0
>          0 free blocks, 0 free inodes, 0 used directories
> Group 2: block bitmap at 0, inode bitmap at 0, inode table at 0
>          0 free blocks, 0 free inodes, 0 used directories
> > > lseek(5, 2048, SEEK_SET) = 2048
> > > read(5, "\3\0\0\0\4\0\0\0\5\0\0\0\17\6\265\7\2\0\0\0\0\0\0\0\0\0"...,
> > > 1024) = 1024
> >
> > Reading the group descriptor table, OK.

What is interesting is that in your first message, e2fsck read the group
descriptor OK (block bitmap at block 3, inode bitmap at block 4, and the
inode table at block 5, along with the free block counts), but now the
descriptors read back as all zeros and e2fsck says they look bad.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/
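The group descriptor values quoted above can be read straight out of the bytes in the first strace. A minimal Python sketch, assuming the classic 32-byte ext2/ext3 group descriptor, whose first fields are three little-endian 32-bit block numbers followed by three 16-bit counters; `decode_group_desc` is a hypothetical helper.

```python
import struct

# First bytes of the group descriptor block (read at offset 2048), copied from
# the strace in the first message; strace's octal escapes are valid Python too.
GD_PREFIX = b"\3\0\0\0\4\0\0\0\5\0\0\0\17\6\265\7\2\0\0\0\0\0\0\0\0\0"

NAMES = ("block_bitmap", "inode_bitmap", "inode_table",
         "free_blocks", "free_inodes", "used_dirs")


def decode_group_desc(raw: bytes, index: int = 0) -> dict:
    # Descriptors are 32 bytes apart; only the first 18 bytes are decoded here.
    return dict(zip(NAMES, struct.unpack_from("<3I3H", raw, index * 32)))


print(decode_group_desc(GD_PREFIX))   # group 0 as seen in the first strace
print(decode_group_desc(bytes(32)))   # an all-zero descriptor, as debugfs now shows
```

Group 0 decodes to block bitmap 3, inode bitmap 4, inode table 5, with 1551 free blocks, 1973 free inodes and 2 directories; the all-zero case reproduces the "block bitmap at 0, inode bitmap at 0, inode table at 0" lines in the later debugfs output.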