Hi,
I am getting the following errors on the /boot partition of a production
server running a 2.4.18 kernel.
-------------
Aug 10 00:19:37 kernel: EXT3-fs error (device sd(8,1)):
ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal -
offset=0, inode=0, rec_len=0, name_len=0
-------------
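The "rec_len is smaller than minimal" message comes from the sanity checks ext3 runs on each directory entry before it trusts it. The Python sketch below is modelled on the checks in the 2.4 kernel's `ext3_check_dir_entry()`. It is simplified (the inode-number bound check is left out), and the helper names are illustrative. An entry that reads back as all zeros, like the one logged above (inode=0, rec_len=0, name_len=0), fails the very first test.

```python
import struct
from typing import Optional

# On-disk ext3 directory entry header (struct ext3_dir_entry_2):
#   __u32 inode; __u16 rec_len; __u8 name_len; __u8 file_type; then the name.
DIRENT_HEADER = struct.Struct("<IHBB")


def dir_rec_len(name_len: int) -> int:
    """Smallest legal record length: 8-byte header plus name, rounded up to 4."""
    return (name_len + 8 + 3) & ~3


def check_dir_entry(block: bytes, offset: int, block_size: int = 1024) -> Optional[str]:
    """Return a kernel-style error string, or None if the entry looks sane.

    Simplified sketch: the kernel also rejects inode numbers above s_inodes_count.
    """
    _inode, rec_len, name_len, _file_type = DIRENT_HEADER.unpack_from(block, offset)
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > block_size:
        return "directory entry across blocks"
    return None


if __name__ == "__main__":
    zeroed = bytes(1024)  # a directory block that came back as all zeros
    print(check_dir_entry(zeroed, 0))  # rec_len is smaller than minimal
```

A valid "." entry (inode 2, rec_len 12, name_len 1) passes every check. A zeroed block has rec_len 0, which is below the 12-byte minimum.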
/boot is not readable at all, and any attempt to run ls on it returns the
same error.
Attempts to force an fsck on the mounted partition (I am not sure whether
it is OK to umount /boot) result in fsck crashing with a segmentation
fault.
Here is an strace of fsck crashing.
Any idea how to recover from this one gracefully? Preferably without
rebooting :)
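Before it touches the device, e2fsck checks whether it is mounted. The trace below shows it reading /proc/swaps and /proc/mounts and then printing "Warning! /dev/sda1 is mounted." The Python sketch below shows the same idea, matching by device name only. That is a simplification: e2fsck also stat()s the entries it finds (see the stat64 calls in the trace). The sample mounts text is hypothetical, modelled on the truncated /proc/mounts read in the trace.

```python
from typing import Optional


def mount_point_of(device: str, mounts_text: str) -> Optional[str]:
    """Return where `device` is mounted according to /proc/mounts-style text, else None.

    Each line is: device, mount point, fs type, options, dump, pass.
    Matching is by device name only (a simplification of e2fsck's check).
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == device:
            return fields[1]
    return None


# Hypothetical sample in the same format as the /proc/mounts read in the trace.
SAMPLE_MOUNTS = (
    "/dev/root / ext3 rw 0 0\n"
    "/proc /proc proc rw 0 0\n"
    "/dev/sda1 /boot ext3 rw 0 0\n"
)

if __name__ == "__main__":
    where = mount_point_of("/dev/sda1", SAMPLE_MOUNTS)
    if where is not None:
        print(f"/dev/sda1 is mounted on {where}; unmount it before a read-write fsck")
```

On a live system you would pass `open("/proc/mounts").read()` instead of the sample. In the trace, the /dev/root entry has no matching device node, which is why its stat64() call fails with ENOENT.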
------
strace /sbin/fsck.ext3 -n /dev/sda1
execve("/sbin/fsck.ext3", ["/sbin/fsck.ext3", "-n", "/dev/sda1"], [/* 26 vars */]) = 0
fcntl64(0, 0x1, 0, 0xbffffc74) = 0
fcntl64(0x1, 0x1, 0, 0xbffffc74) = 0
fcntl64(0x2, 0x1, 0, 0xbffffc74) = 0
geteuid32() = 0
getuid32() = 0
getegid32() = 0
getgid32() = 0
brk(0) = 0x80d4fd0
brk(0x80d4ff0) = 0x80d4ff0
brk(0x80d5000) = 0x80d5000
brk(0x80d6000) = 0x80d6000
rt_sigaction(SIGUSR1, {0x8048e40, [], SA_RESTART|0x4000000}, NULL, 8) = 0
rt_sigaction(SIGUSR2, {0x8048e74, [], SA_RESTART|0x4000000}, NULL, 8) = 0
open("/dev/null", O_RDWR) = 3
close(3) = 0
gettimeofday({1028968512, 177653}, NULL) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 0}, ru_stime={0, 0}, ...}) = 0
write(2, "e2fsck 1.26 (3-Feb-2002)\n", 25e2fsck 1.26 (3-Feb-2002)
) = 25
stat64("/dev/sda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 1), ...}) = 0
open("/proc/swaps", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40000000
read(3, "Filename\t\t\tType\t\tSize\tUsed\tPrior"..., 1024) = 148
stat64("/dev/sda7", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 7), ...}) = 0
stat64("/dev/sda8", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 8), ...}) = 0
read(3, "", 1024) = 0
open("/proc/mounts", O_RDONLY) = 5
stat64("/dev/sda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 1), ...}) = 0
brk(0x80d8000) = 0x80d8000
fstat64(5, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40001000
read(5, "/dev/root / ext3 rw 0 0\n/proc /p"..., 1024) = 529
stat64("/dev/root", 0xbffff930) = -1 ENOENT (No such file or directory)
stat64("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
stat64("/boot", {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
close(5) = 0
munmap(0x40001000, 4096) = 0
write(1, "Warning! /dev/sda1 is mounted.\n", 32Warning! /dev/sda1 is mounted.
) = 32
brk(0x80d9000) = 0x80d9000
open("/dev/sda1", O_RDONLY|O_LARGEFILE) = 5
lseek(5, 1024, SEEK_SET) = 1024
read(5, "\210\27\0\0\2^\0\0\263\4\0\0B(\0\0X\27\0\0\1\0\0\0\0\0"..., 1024) = 1024
brk(0x80da000) = 0x80da000
lseek(5, 2048, SEEK_SET) = 2048
read(5, "\3\0\0\0\4\0\0\0\5\0\0\0\17\6\265\7\2\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
lseek(5, 5120, SEEK_SET) = 5120
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++
------------
------
Martial
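The first read in the trace, at byte offset 1024, is the ext2/ext3 superblock, which always starts 1024 bytes into the partition. As a cross-check, the 26 bytes strace printed can be decoded by hand. The Python sketch below unpacks the first six little-endian 32-bit fields of the superblock, using strace's octal escapes verbatim (Python accepts the same notation). It recovers the same counts that debugfs reports later in this thread: 6024 inodes, 24066 blocks, 1203 reserved blocks, 10306 free blocks, 5976 free inodes, first data block 1. Only the printed prefix is available, because strace elided the rest of the read.

```python
import struct

# Bytes printed by strace for read(5, ...) at offset 1024. strace elided the rest.
SUPERBLOCK_PREFIX = b"\210\27\0\0\2^\0\0\263\4\0\0B(\0\0X\27\0\0\1\0\0\0\0\0"

# First six 32-bit little-endian fields of struct ext2_super_block.
SB_FIELDS = (
    "s_inodes_count",
    "s_blocks_count",
    "s_r_blocks_count",
    "s_free_blocks_count",
    "s_free_inodes_count",
    "s_first_data_block",
)


def decode_superblock_prefix(raw: bytes) -> dict:
    """Map the leading superblock field names to their decoded values."""
    values = struct.unpack_from("<6I", raw, 0)
    return dict(zip(SB_FIELDS, values))


if __name__ == "__main__":
    for name, value in decode_superblock_prefix(SUPERBLOCK_PREFIX).items():
        print(f"{name:22} {value}")
```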
On Aug 10, 2002 18:42 +1000, Martial Herbaut wrote:
> I am getting the following errors on /boot partition on a production
> server running 2.4.18 kernel.
>
> -------------
> Aug 10 00:19:37 kernel: EXT3-fs error (device sd(8,1)):
> ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal -
> offset=0, inode=0, rec_len=0, name_len=0
>
> /boot is not readable at all and attempt to do an ls will return the same
> error.

This looks like it was trying to read some data and it got zero instead.

> attempts to force an fsck on that mounted partition (not sure if if it ok
> to umount /boot) results in fsck returning a segmentation fault.

It is a very bad idea to fsck a mounted filesystem. You can try just
unmounting it and then running e2fsck on it (you probably have to stop
sysklogd because it keeps System.map open, and maybe others if lsof shows
anything).

Have you rebooted this system since you got the error? It may also be
that, if a bad page was "read" in, any further attempts to read this
chunk of data are being satisfied from the cache instead of from the
disk. This is partly speculation though.

> here is an strace of fsck crashing.
>
> open("/dev/sda1", O_RDONLY|O_LARGEFILE) = 5
> lseek(5, 1024, SEEK_SET) = 1024
> read(5, "\210\27\0\0\2^\0\0\263\4\0\0B(\0\0X\27\0\0\1\0\0\0\0\0"..., 1024)
> = 1024

Reading the superblock, OK.

> lseek(5, 2048, SEEK_SET) = 2048
> read(5, "\3\0\0\0\4\0\0\0\5\0\0\0\17\6\265\7\2\0\0\0\0\0\0\0\0\0"...,
> 1024) = 1024

Reading the group descriptor table, OK.

> lseek(5, 5120, SEEK_SET) = 5120
> read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024)
> = 1024

Reading the inode table, including the root inode (#2, which is what you
are having problems with). It is hard to tell whether this is bad data or
not, since the inode #1 space may well be all zeros (128 bytes worth). In
any case, the fact that e2fsck is crashing is bad, unless, of course, it
is crashing because of a kernel oops, which is even worse.
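The offset of the third read follows from the group descriptor. With 1 KiB blocks, an inode table starting at block 5 begins at byte 5 × 1024 = 5120. Inodes are numbered from 1, so inode #1 occupies the first 128 bytes (5120 to 5247) and the root inode #2 starts at byte 5248. The zeros strace printed therefore belong to inode #1, which is why they prove little on their own. The Python sketch below does that arithmetic. The parameter values come from the trace and from the debugfs stats later in this thread, and the function name is illustrative.

```python
from typing import Dict, Tuple


def inode_location(ino: int, inodes_per_group: int, inode_size: int,
                   block_size: int, inode_table_block: Dict[int, int]) -> Tuple[int, int]:
    """Return (group, byte offset on the device) of inode number `ino`.

    Inode numbers start at 1. `inode_table_block` maps group -> first block of
    that group's inode table, as read from the group descriptors.
    """
    group, index = divmod(ino - 1, inodes_per_group)
    return group, inode_table_block[group] * block_size + index * inode_size


# This /boot filesystem: 2008 inodes/group, 128-byte inodes, 1 KiB blocks,
# and group 0's inode table at block 5 (from the first trace).
PARAMS = dict(inodes_per_group=2008, inode_size=128, block_size=1024,
              inode_table_block={0: 5})

if __name__ == "__main__":
    for ino in (1, 2):
        group, offset = inode_location(ino, **PARAMS)
        print(f"inode #{ino}: group {group}, bytes {offset}..{offset + 127}")
    # One 1024-byte read at 5120 therefore covers inodes #1 to #8.
    print(1024 // 128, "inodes per block")
```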
What would be very useful is if you ran e2fsck under GDB and found where
it is crashing, so that this can be fixed. You may have to download the
sources and build it yourself to get a version of e2fsck with debugging
symbols.

You could try debugfs to see what is there, for example (this may also
crash, but it will do no harm):

# debugfs /dev/sda1
debugfs> stats
debugfs> stat <2>

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/
Thank you Andreas,
I unmounted it and it returned:
fsck.ext3 /dev/sda1
e2fsck 1.26 (3-Feb-2002)
Group descriptors look bad... trying backup blocks...
Segmentation fault
-----
The system has not been rebooted, and I would rather not.
----------------
Here is the debugfs output:
-------
debugfs: stats
Filesystem volume name: /boot
Last mounted on: <not available>
Filesystem UUID: bf4f160a-e31b-11d5-8665-ec36598cd26c
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal filetype sparse_super
Filesystem state: clean with errors
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 6024
Block count: 24066
Reserved block count: 1203
Free blocks: 10306
Free inodes: 5976
First block: 1
Block size: 1024
Fragment size: 1024
Blocks per group: 8192
Fragments per group: 8192
Inodes per group: 2008
Inode blocks per group: 251
Last mount time: Wed Jul 24 13:36:16 2002
Last write time: Sat Aug 10 21:59:43 2002
Mount count: 17
Maximum mount count: -1
Last checked: Sat Apr 27 23:29:38 2002
Check interval: 0 (<none>)
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal UUID: <none>
Journal inode: 8
Journal device: 0x0000
First orphan inode: 0
Group 0: block bitmap at 0, inode bitmap at 0, inode table at 0
0 free blocks, 0 free inodes, 0 used directories
Group 1: block bitmap at 0, inode bitmap at 0, inode table at 0
0 free blocks, 0 free inodes, 0 used directories
Group 2: block bitmap at 0, inode bitmap at 0, inode table at 0
0 free blocks, 0 free inodes, 0 used directories
------------
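The stats above are enough to work out where e2fsck's "backup blocks" should be. With sparse_super, only groups 0 and 1, plus groups whose number is a power of 3, 5 or 7, carry a copy of the superblock and group descriptors. Each group starts at s_first_data_block + group × blocks_per_group. The Python sketch below applies that rule, which e2fsprogs implements in ext2fs_bg_has_super(). The helper names are illustrative. For this filesystem it gives three groups and a single backup at block 8193, i.e. byte 8193 × 1024 = 8,389,632, roughly 8 MB into the partition.

```python
from typing import List


def is_power_of(n: int, base: int) -> bool:
    """True if n == base**k for some k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def group_has_superblock(group: int, sparse_super: bool = True) -> bool:
    """sparse_super rule: groups 0, 1 and powers of 3, 5, 7 hold a copy."""
    if group <= 1 or not sparse_super:
        return True
    return any(is_power_of(group, b) for b in (3, 5, 7))


def backup_superblocks(blocks_count: int, blocks_per_group: int,
                       first_data_block: int, sparse_super: bool = True) -> List[int]:
    """Block numbers of the backup superblocks (group 0 holds the primary)."""
    groups = -(-(blocks_count - first_data_block) // blocks_per_group)  # ceiling division
    return [first_data_block + g * blocks_per_group
            for g in range(1, groups)
            if group_has_superblock(g, sparse_super)]


if __name__ == "__main__":
    # Numbers from the debugfs stats output above.
    backups = backup_superblocks(24066, 8192, 1)
    print(backups, [b * 1024 for b in backups])
```

If that copy is intact, 8193 is the block number to give to e2fsck's `-b` option (together with `-B 1024` for the block size). In this case e2fsck 1.26 had already tried the backups on its own.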
--
Martial Herbaut
---------------
Server101
Fast and Reliable Hosting!
http://www.server101.com/
On Aug 10, 2002 22:14 +1000, Martial Herbaut wrote:
> I unmounted it and it returned:
> fsck.ext3 /dev/sda1
> e2fsck 1.26 (3-Feb-2002)
> Group descriptors look bad... trying backup blocks...
> Segmentation fault

Well, you might get better behaviour from 1.27, but you might not. The
only way to really debug the crash is to run e2fsck under GDB, and unless
that is fixed there is not much that can be done to repair this
filesystem.

> the system has not been rebooted. would rather not..

Well, it is likely that any (bad) data in memory has already been written
to disk anyway. It looks like someone or something overwrote your disk
with zeros. Try something like:

od -Ax -tx4 -a /dev/sda1

to see whether there is anything there at all worth saving, or whether
the whole disk was overwritten. If the backup superblock and group
descriptors (which are at an offset of 8MB into the filesystem) are also
gone, then there is not much hope. The only chance is that there _is_
data there and e2fsck is just crashing when it tries to read the
backups...

If everything is gone, you may as well just make a new filesystem and
restore your /boot partition. There is not much important data there,
just your kernel and maybe the initrd. You can copy the kernel from
/usr/src/linux or from another system and re-run mkinitrd, or re-install
your kernel RPM to get it back.
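By default, od collapses a long run of identical lines into a single "*", so an all-zero device shows up as only a few lines of output. For a more compact answer to "is anything left?", the Python sketch below lists the ranges of 1 KiB blocks that contain any non-zero byte. Block N starts at byte N × 1024, so a non-zero run covering block 8193 would suggest the backup superblock survived. This is an illustration, not a recovery tool. It only reads the device, and the function name is hypothetical.

```python
import sys
from typing import BinaryIO, Iterator, Optional, Tuple


def nonzero_runs(f: BinaryIO, block_size: int = 1024) -> Iterator[Tuple[int, int]]:
    """Yield (first_block, last_block) for each run of blocks holding any non-zero byte."""
    start: Optional[int] = None
    block_no = 0
    while True:
        chunk = f.read(block_size)
        if not chunk:
            break
        has_data = chunk != bytes(len(chunk))
        if has_data and start is None:
            start = block_no
        elif not has_data and start is not None:
            yield start, block_no - 1
            start = None
        block_no += 1
    if start is not None:
        yield start, block_no - 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 nonzero.py /dev/sda1  (read-only; needs permission to read the device)
    with open(sys.argv[1], "rb") as dev:
        for first, last in nonzero_runs(dev):
            print(f"blocks {first}..{last} contain data")
```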
If you are using lilo rather than grub, you will also need to re-run lilo
after putting the kernel back.

> Group 0: block bitmap at 0, inode bitmap at 0, inode table at 0
> 0 free blocks, 0 free inodes, 0 used directories
> Group 1: block bitmap at 0, inode bitmap at 0, inode table at 0
> 0 free blocks, 0 free inodes, 0 used directories
> Group 2: block bitmap at 0, inode bitmap at 0, inode table at 0
> 0 free blocks, 0 free inodes, 0 used directories
>
> > > lseek(5, 2048, SEEK_SET) = 2048
> > > read(5, "\3\0\0\0\4\0\0\0\5\0\0\0\17\6\265\7\2\0\0\0\0\0\0\0\0\0"...,
> > > 1024) = 1024
> >
> > Reading the group descriptor table, OK.

What is interesting is that in your first message, it read the group
descriptor OK (it had the block bitmap at block 3, the inode bitmap at
block 4 and the inode table at block 5, along with the free block
counts), and now it looks like it also had problems reading the group
descriptor.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/
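The reading of the first trace can be checked by decoding the bytes strace printed for the read at offset 2048. That is the group descriptor table, which starts in the block right after the superblock (block 2 when blocks are 1 KiB). Each descriptor is 32 bytes. The Python sketch below decodes the leading fields of descriptor 0 and compares them with an all-zero descriptor, which is what debugfs showed on the second attempt. The first trace gives block bitmap 3, inode bitmap 4, inode table 5, 1551 free blocks, 1973 free inodes and 2 used directories. The plausibility test is a deliberately crude stand-in for e2fsck's own checks and is an assumption of this sketch.

```python
import struct
from typing import NamedTuple

# Leading fields of struct ext2_group_desc (32 bytes per descriptor, little-endian).
GD_HEAD = struct.Struct("<IIIHHH")
GD_SIZE = 32


class GroupDesc(NamedTuple):
    block_bitmap: int
    inode_bitmap: int
    inode_table: int
    free_blocks: int
    free_inodes: int
    used_dirs: int


def decode_group_desc(table: bytes, group: int = 0) -> GroupDesc:
    """Decode the leading fields of descriptor `group` from the raw table bytes."""
    return GroupDesc(*GD_HEAD.unpack_from(table, group * GD_SIZE))


def looks_plausible(gd: GroupDesc, blocks_count: int) -> bool:
    """Crude check: bitmap and inode-table block numbers must lie inside the filesystem."""
    return all(0 < blk < blocks_count
               for blk in (gd.block_bitmap, gd.inode_bitmap, gd.inode_table))


# Bytes printed by strace for the read at offset 2048 in the first message.
FIRST_TRACE = b"\3\0\0\0\4\0\0\0\5\0\0\0\17\6\265\7\2\0\0\0\0\0\0\0\0\0"

if __name__ == "__main__":
    before = decode_group_desc(FIRST_TRACE)
    after = decode_group_desc(bytes(GD_SIZE))  # what debugfs later reported
    for label, gd in (("first trace", before), ("debugfs", after)):
        print(label, gd, "plausible" if looks_plausible(gd, 24066) else "bad")
```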