Andreas Steinmetz
2002-Dec-04  22:29 UTC
[Fwd: [RESEND] 2.4.20: ext3: Assertion failure in journal_forget()/Oops on another system]
Just to make sure somebody reacts (please) I'm forwarding this. Please 
cc me on replies as I'm not subscribed to this list.
-------- Original Message --------
Subject: [RESEND] 2.4.20: ext3: Assertion failure in 
journal_forget()/Oops on another system
Date: Wed, 04 Dec 2002 21:27:31 +0100
From: Andreas Steinmetz <ast@domdv.de>
To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, 
sct@redhat.com,  akpm@zip.com.au,  adilger@clusterfs.com
It seems that either my previous post (below) was to vague or it either
got lost or ignored. Anyway I did hope for a spurious hardware error.
Unfortunately today I got an Oops from a completely different system
using ext3 on software raid 0 and raid 1 with data=ordered that again
points to a problem with ext3. The ksymoops output is attached. I'm
really beginning to get worried.
Below is my previous post.
--------------------------
This started to happen during larger (10MB-420MB) rsync based writes to
a striped ext3 partition (/dev/md11) residing on 4 scsi disks which is
mounted with defaults, i.e. data=ordered (rsync over 100Mbps link):
Dec  1 12:25:43 pollux kernel: EXT3-fs error (device md(9,11)):
ext3_new_block:
Allocating block in system zone - block = 114696
Dec  1 12:25:43 pollux kernel: EXT3-fs error (device md(9,11)):
ext3_new_block:
Allocating block in system zone - block = 114697
Dec  1 12:25:43 pollux kernel: EXT3-fs error (device md(9,11)):
ext3_new_block:
Allocating block in system zone - block = 114700
Dec  1 12:25:43 pollux kernel: EXT3-fs error (device md(9,11)):
ext3_new_block:
Allocating block in system zone - block = 114701
Dec  1 12:25:43 pollux kernel: EXT3-fs error (device md(9,11)):
ext3_new_block:
Allocating block in system zone - block = 114702
Dec  1 12:25:43 pollux kernel: EXT3-fs error (device md(9,11)):
ext3_new_block:
Allocating block in system zone - block = 114706
<snip>
Dec  1 22:17:55 pollux kernel: EXT3-fs error (device md(9,11)):
ext3_free_blocks
: Freeing blocks in system zones - Block = 573501, count = 2
Dec  1 22:17:55 pollux kernel: EXT3-fs error (device md(9,11)):
ext3_free_blocks
: Freeing blocks in system zones - Block = 573552, count = 14
Dec  1 22:17:55 pollux kernel: Assertion failure in journal_forget() at
transaction.c:1225: "!jh->b_committed_data"
Trying to access the partition resulted in processes hanging in D state:
5336 pts/0    D      0:00 ls -a -N --color=tty -T 0 -l /mnt/data8
e2fstools version is 1.32 and the partition was created with this
version using 'mke2fs -j -b 2048 -i 4096 -R stride=16 /dev/md11'.
An earlier dump of the partition data using tune2fs -l gave:
tune2fs 1.32 (09-Nov-2002)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          7c8d7827-4b25-40ab-a3b8-1c4c6e286868
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal filetype needs_recovery sparse_super
Default mount options:    (none)
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              2621440
Block count:              5236992
Reserved block count:     261849
Free blocks:              4855697
Free inodes:              2621416
First block:              0
Block size:               2048
Fragment size:            2048
Blocks per group:         16384
Fragments per group:      16384
Inodes per group:         8192
Inode blocks per group:   512
Last mount time:          Sat Nov 30 11:23:59 2002
Last write time:          Sun Dec  1 14:09:55 2002
Mount count:              2
Maximum mount count:      -1
Last checked:             Fri Dec  1 19:18:16 2000
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal UUID:             <none>
Journal inode:            8
Journal device:           0x0000
First orphan inode:       0
Trying 'e2fsck -y /dev/md11' after a reboot showed so many errors and
continued to run for minutes that I aborted e2fsck and do assume that
the file system was completely destroyed.
After recreation of the filesystem on /dev/md11 a rsync run completed
without errors.
As a side note: the system having the rsync sources has an identical
formatted partition (the systems are hardware twins) and doesn't show
any errors.
Some final information about the raid configuration of /dev/md11:
raiddev /dev/md11
            raid-level              0
            nr-raid-disks           4
            nr-spare-disks          0
            chunk-size              32
            persistent-superblock   1
            device                  /dev/sda13
            raid-disk               0
            device                  /dev/sdb13
            raid-disk               1
            device                  /dev/sdc15
            raid-disk               2
            device                  /dev/sdd15
            raid-disk               3
-- 
Andreas Steinmetz
D.O.M. Datenverarbeitung GmbH
-- 
Andreas Steinmetz
D.O.M. Datenverarbeitung GmbH
