thr3ads.net - Ext3 users - ext3_readdir error [Aug 2002]

Martial Herbaut

2002-Aug-10 08:42 UTC

ext3_readdir error

Hi,

I am getting the following errors on /boot partition on a production 
server running 2.4.18 kernel.

-------------
Aug 10 00:19:37 kernel: EXT3-fs error (device sd(8,1)): 
ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - 
offset=0, inode=0, rec_len=0, name_len=0

-------------

/boot is not readable at all, and any attempt to do an ls returns the same
error.

Attempts to force an fsck on that mounted partition (not sure if it is OK
to umount /boot) result in fsck returning a segmentation fault.

Here is an strace of fsck crashing.
Any idea on how to recover from this one gracefully? Preferably without
rebooting :)


------
strace /sbin/fsck.ext3 -n /dev/sda1
execve("/sbin/fsck.ext3", ["/sbin/fsck.ext3", "-n", "/dev/sda1"], [/* 26 vars */]) = 0
fcntl64(0, 0x1, 0, 0xbffffc74)          = 0
fcntl64(0x1, 0x1, 0, 0xbffffc74)        = 0
fcntl64(0x2, 0x1, 0, 0xbffffc74)        = 0
geteuid32()                             = 0
getuid32()                              = 0
getegid32()                             = 0
getgid32()                              = 0
brk(0)                                  = 0x80d4fd0
brk(0x80d4ff0)                          = 0x80d4ff0
brk(0x80d5000)                          = 0x80d5000
brk(0x80d6000)                          = 0x80d6000
rt_sigaction(SIGUSR1, {0x8048e40, [], SA_RESTART|0x4000000}, NULL, 8) = 0
rt_sigaction(SIGUSR2, {0x8048e74, [], SA_RESTART|0x4000000}, NULL, 8) = 0
open("/dev/null", O_RDWR)               = 3
close(3)                                = 0
gettimeofday({1028968512, 177653}, NULL) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 0}, ru_stime={0, 0}, ...}) = 0
write(2, "e2fsck 1.26 (3-Feb-2002)\n", 25e2fsck 1.26 (3-Feb-2002)
) = 25
stat64("/dev/sda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 1), ...}) = 0
open("/proc/swaps", O_RDONLY)           = 3
fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40000000
read(3, "Filename\t\t\tType\t\tSize\tUsed\tPrior"..., 1024) = 148
stat64("/dev/sda7", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 7), ...}) = 0
stat64("/dev/sda8", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 8), ...}) = 0
read(3, "", 1024)                       = 0
open("/proc/mounts", O_RDONLY)          = 5
stat64("/dev/sda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 1), ...}) = 0
brk(0x80d8000)                          = 0x80d8000
fstat64(5, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40001000
read(5, "/dev/root / ext3 rw 0 0\n/proc /p"..., 1024) = 529
stat64("/dev/root", 0xbffff930)         = -1 ENOENT (No such file or directory)
stat64("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
stat64("/boot", {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
close(5)                                = 0
munmap(0x40001000, 4096)                = 0
write(1, "Warning!  /dev/sda1 is mounted.\n", 32Warning!  /dev/sda1 is mounted.
) = 32
brk(0x80d9000)                          = 0x80d9000
open("/dev/sda1", O_RDONLY|O_LARGEFILE) = 5
lseek(5, 1024, SEEK_SET)                = 1024
read(5, "\210\27\0\0\2^\0\0\263\4\0\0B(\0\0X\27\0\0\1\0\0\0\0\0"..., 1024) = 1024
brk(0x80da000)                          = 0x80da000
lseek(5, 2048, SEEK_SET)                = 2048
read(5, "\3\0\0\0\4\0\0\0\5\0\0\0\17\6\265\7\2\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
lseek(5, 5120, SEEK_SET)                = 5120
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++

------------


------
Martial

Andreas Dilger

2002-Aug-10 10:35 UTC

Re: ext3_readdir error

On Aug 10, 2002  18:42 +1000, Martial Herbaut wrote:
> I am getting the following errors on /boot partition on a production 
> server running 2.4.18 kernel.
> 
> -------------
> Aug 10 00:19:37 kernel: EXT3-fs error (device sd(8,1)): 
> ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - 
> offset=0, inode=0, rec_len=0, name_len=0
> 
> /boot is not readable at all and attempt to do an ls will return the same 
> error.
This looks like it was trying to read some data and it got zero instead.
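
To illustrate why zeroed data produces exactly this message, here is a minimal Python sketch (not the kernel code). It assumes the standard ext2/ext3 directory entry header (32-bit inode, 16-bit rec_len, 8-bit name_len, 8-bit file type, little-endian) and a minimum record length of 12 bytes. The function names are made up for the example.

```python
import struct

# Smallest legal record: 8-byte header plus a 1-character name,
# rounded up to a multiple of 4 bytes.
MIN_REC_LEN = (8 + 1 + 3) & ~3  # 12


def parse_dirent_header(block: bytes, offset: int = 0) -> dict:
    """Decode the fixed 8-byte header of a directory entry (hypothetical helper)."""
    inode, rec_len, name_len, file_type = struct.unpack_from("<IHBB", block, offset)
    return {"inode": inode, "rec_len": rec_len,
            "name_len": name_len, "file_type": file_type}


def rec_len_too_small(entry: dict) -> bool:
    """True when rec_len is below the minimum, as in the logged error."""
    return entry["rec_len"] < MIN_REC_LEN


# A directory block that reads back as all zeros, as suspected here.
zeroed = parse_dirent_header(bytes(1024))
print(zeroed["inode"], zeroed["rec_len"], zeroed["name_len"])
print(rec_len_too_small(zeroed))
```

An all-zero block decodes to inode=0, rec_len=0, name_len=0 at offset 0, which matches the values in the log line.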
> attempts to force  an fsck on that mounted partition (not sure if if it ok 
> to umount /boot) results in fsck returning a segmentation fault.
It is a very bad idea to fsck a mounted filesystem.  You can try just
unmounting it and then running e2fsck on it (you probably have to stop
sysklogd because it keeps System.map open, maybe others if lsof shows
anything).

Have you rebooted this system since you got the error?  It may also be
that, if a bad page was "read" in, any further attempts to read this
chunk of data are being served from cache instead of from the disk.  This
is partly speculation though.
> here is an strace of fsck crashing.
> 
> open("/dev/sda1", O_RDONLY|O_LARGEFILE) = 5
> lseek(5, 1024, SEEK_SET)                = 1024
> read(5, "\210\27\0\0\2^\0\0\263\4\0\0B(\0\0X\27\0\0\1\0\0\0\0\0"..., 1024) = 1024
Reading the superblock, OK.
> lseek(5, 2048, SEEK_SET)                = 2048
> read(5, "\3\0\0\0\4\0\0\0\5\0\0\0\17\6\265\7\2\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
Reading the group descriptor table, OK.
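
As a rough illustration of how those bytes decode, the Python sketch below unpacks the first 18 bytes shown by strace. It assumes the usual ext2 group descriptor layout: three little-endian 32-bit block numbers followed by three 16-bit counters. The field names and the helper are chosen for this example only.

```python
import struct

# First 18 bytes of the read at offset 2048, copied from the strace output
# (strace prints non-printable bytes as octal escapes).
raw = b"\3\0\0\0\4\0\0\0\5\0\0\0\17\6\265\7\2\0"

FIELDS = ("block_bitmap", "inode_bitmap", "inode_table",
          "free_blocks", "free_inodes", "used_dirs")


def parse_group_desc(data: bytes) -> dict:
    """Decode the leading fields of one group descriptor (hypothetical helper)."""
    return dict(zip(FIELDS, struct.unpack_from("<IIIHHH", data, 0)))


desc = parse_group_desc(raw)
for name in FIELDS:
    print(f"{name:13s} {desc[name]}")
```

For group 0 this gives the block bitmap at block 3, the inode bitmap at block 4 and the inode table at block 5, which is consistent with the next lseek going to 5 * 1024 = 5120.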
> lseek(5, 5120, SEEK_SET)                = 5120
> read(5,
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024)
> = 1024
Reading the inode table, including the root inode (#2, which is what you
are having problems with).  Hard to tell if this is bad data or not, since
the inode #1 space may well be all zeros (128 bytes worth).  In any
case, the fact that e2fsck is crashing is bad; if it is crashing because
of a kernel oops, that is even worse.
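
The offsets involved can be worked out with a small Python sketch. The constants are assumptions read off the strace (1024-byte reads, inode table at 5120 / 1024 = block 5) and the 128-byte inode size mentioned above. The helper names are made up.

```python
BLOCK_SIZE = 1024       # assumed from the 1024-byte reads in the strace
INODE_SIZE = 128        # "128 bytes worth" per inode
INODE_TABLE_BLOCK = 5   # assumed from lseek(5, 5120): 5120 / 1024 = 5


def inode_offset(ino: int) -> int:
    """Byte offset on the device of inode `ino` (1-based) in group 0."""
    return INODE_TABLE_BLOCK * BLOCK_SIZE + (ino - 1) * INODE_SIZE


def inode_is_zeroed(table: bytes, ino: int) -> bool:
    """True if the slot for inode `ino` is all zeros in a buffer that
    starts at the beginning of the inode table."""
    start = (ino - 1) * INODE_SIZE
    slot = table[start:start + INODE_SIZE]
    return len(slot) == INODE_SIZE and not any(slot)


print(inode_offset(1), inode_offset(2))
```

So the 27 zero bytes strace shows all belong to inode #1. The root inode starts 128 bytes further in, at offset 5248, and strace cuts the buffer off long before that point.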

What would be very useful is if you ran e2fsck under GDB and found where
it is crashing, so that this can be fixed.  You may have to download the
sources and build it yourself to get a version of e2fsck with debugging
symbols.

You could try debugfs to see what is there, like (this may crash also,
but will do no harm):

# debugfs /dev/sda1
debugfs> stats
debugfs> stat <2>

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/

Martial Herbaut

2002-Aug-10 12:14 UTC

Re: ext3_readdir error

Thank you Andreas,

I unmounted it and it returned:
fsck.ext3 /dev/sda1
e2fsck 1.26 (3-Feb-2002)
Group descriptors look bad... trying backup blocks...
Segmentation fault

-----
the system has not been rebooted. would rather not..

----------------

here is the debugfs output:

-------
debugfs:  stats
Filesystem volume name:   /boot
Last mounted on:          <not available>
Filesystem UUID:          bf4f160a-e31b-11d5-8665-ec36598cd26c
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal filetype sparse_super
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              6024
Block count:              24066
Reserved block count:     1203
Free blocks:              10306
Free inodes:              5976
First block:              1
Block size:               1024
Fragment size:            1024
Blocks per group:         8192
Fragments per group:      8192
Inodes per group:         2008
Inode blocks per group:   251
Last mount time:          Wed Jul 24 13:36:16 2002
Last write time:          Sat Aug 10 21:59:43 2002
Mount count:              17
Maximum mount count:      -1
Last checked:             Sat Apr 27 23:29:38 2002
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal UUID:             <none>
Journal inode:            8
Journal device:           0x0000
First orphan inode:       0
 Group  0: block bitmap at 0, inode bitmap at 0, inode table at 0
           0 free blocks, 0 free inodes, 0 used directories
 Group  1: block bitmap at 0, inode bitmap at 0, inode table at 0
           0 free blocks, 0 free inodes, 0 used directories
 Group  2: block bitmap at 0, inode bitmap at 0, inode table at 0
           0 free blocks, 0 free inodes, 0 used directories

------------
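
As a sanity check on the numbers above, the Python sketch below recomputes the group count and inode table size from the superblock fields that debugfs printed. It is only arithmetic on the reported values, and the function names are made up for the example.

```python
def group_count(block_count: int, first_block: int, blocks_per_group: int) -> int:
    """Number of block groups, rounding up the last partial group."""
    return -(-(block_count - first_block) // blocks_per_group)


def inode_table_blocks(inodes_per_group: int, inode_size: int, block_size: int) -> int:
    """Blocks needed to hold one group's inode table."""
    return -(-(inodes_per_group * inode_size) // block_size)


# Values taken from the debugfs "stats" output above.
groups = group_count(block_count=24066, first_block=1, blocks_per_group=8192)
print("groups:", groups)
print("inode count:", groups * 2008)
print("inode blocks per group:", inode_table_blocks(2008, 128, 1024))
```

The superblock fields agree with each other (3 groups, 6024 inodes, 251 inode blocks per group), so the superblock itself looks sane and only the per-group descriptor lines read back as zeros.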


-- 
Martial Herbaut
---------------
Server101
Fast and Reliable Hosting!
http://www.server101.com/

Andreas Dilger

2002-Aug-10 15:02 UTC

Re: ext3_readdir error

On Aug 10, 2002  22:14 +1000, Martial Herbaut wrote:
> I unmounted it and it returned:
> fsck.ext3 /dev/sda1
> e2fsck 1.26 (3-Feb-2002)
> Group descriptors look bad... trying backup blocks...
> Segmentation fault
Well, you might get some better behaviour from 1.27, but you might not.
The only way to really debug the crash is to run e2fsck under GDB, and
unless that is fixed there is not much that can be done to repair this
filesystem.
> the system has not been rebooted. would rather not..
Well, it is likely that any (bad) data in memory has been written to
disk already anyways.  It looks like someone/something overwrote your
disk with zeros.  Try something like:

od -Ax -tx4 -a /dev/sda1

to see if there is anything there at all worth saving, or if the whole
disk was overwritten.  If the backup superblock and group descriptors are
also gone (which are at an offset of 8MB into the filesystem) then there
is not much hope.  The only chance is that there _is_ data there and
e2fsck is just crashing when it is trying to read the backups...
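
To make the 8MB figure concrete, here is a hedged Python sketch that computes where the backup superblocks should sit for the geometry debugfs reported, and checks a buffer for the ext2 magic number. It assumes the usual sparse_super placement (groups 0 and 1, plus powers of 3, 5 and 7) and the magic 0xEF53 at byte 56 of the superblock. The function names are made up.

```python
import struct

EXT2_MAGIC = 0xEF53
MAGIC_OFFSET = 56  # position of the 16-bit magic inside the superblock


def has_backup(group: int) -> bool:
    """sparse_super rule: groups 0 and 1, and powers of 3, 5 and 7."""
    if group <= 1:
        return True
    for base in (3, 5, 7):
        n = base
        while n < group:
            n *= base
        if n == group:
            return True
    return False


def backup_superblock_offsets(block_count, blocks_per_group, first_block, block_size):
    """Byte offsets of the backup superblocks (the primary one is excluded)."""
    groups = -(-(block_count - first_block) // blocks_per_group)
    return [(first_block + g * blocks_per_group) * block_size
            for g in range(1, groups) if has_backup(g)]


def looks_like_superblock(buf: bytes) -> bool:
    """True if the buffer carries the ext2/ext3 magic at the expected place."""
    if len(buf) < MAGIC_OFFSET + 2:
        return False
    return struct.unpack_from("<H", buf, MAGIC_OFFSET)[0] == EXT2_MAGIC


# Geometry from the debugfs "stats" output earlier in the thread.
print(backup_superblock_offsets(24066, 8192, 1, 1024))
```

With these values there is a single backup, at block 8193 (byte offset 8389632), just past the 8MB mark. Reading 1024 bytes from that offset and passing them to looks_like_superblock() would show whether the backup survived.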

If everything is gone, you may as well just make a new filesystem and
restore your /boot partition.  There is not much important data there,
just your kernel and maybe the initrd.  You can copy the kernel from
/usr/src/linux or another system and re-run mkinitrd,
or re-install your kernel RPM to get it back.  You need to re-run
lilo if you are using it and not grub.
>  Group  0: block bitmap at 0, inode bitmap at 0, inode table at 0
>            0 free blocks, 0 free inodes, 0 used directories
>  Group  1: block bitmap at 0, inode bitmap at 0, inode table at 0
>            0 free blocks, 0 free inodes, 0 used directories
>  Group  2: block bitmap at 0, inode bitmap at 0, inode table at 0
>            0 free blocks, 0 free inodes, 0 used directories
> > > lseek(5, 2048, SEEK_SET)                = 2048
> > > read(5, "\3\0\0\0\4\0\0\0\5\0\0\0\17\6\265\7\2\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
> > 
> > Reading the group descriptor table, OK.
What is interesting is that in your first message, it read the group
descriptor OK (it had block bitmap at block 3, inode bitmap at block 4,
and the inode table at block 5, along with the free block counts), and
now it looks like it also had problems reading the group descriptor.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/
