thr3ads.net - Ext3 users - Bad directories appearing in ext3 after upgrade 2.4.16 -> 2.4.18+cvs [May 2002]

Neil Brown

2002-May-21 06:30 UTC

Bad directories appearing in ext3 after upgrade 2.4.16 -> 2.4.18+cvs

Hi,
 I recently upgraded one of my fileservers from 2.4.16 to 2.4.18 plus
 the ext3-cvs.patch that Andrew Morton pointed me to for addressing
 an assertion failure.

 Since then I have been getting lots of errors like:

May 21 14:07:03 glass kernel: EXT3-fs error (device md(9,0)): ext3_add_entry:
bad entry in directory #2945366: rec_len %% 4 != 0 - offset=0, inode=1886221359,
rec_len=24927, name_len=109
May 21 14:07:23 glass kernel: EXT3-fs error (device md(9,0)): ext3_readdir: bad
entry in directory #2945366: rec_len %% 4 != 0 - offset=0, inode=1886221359,
rec_len=24927, name_len=109
May 21 14:07:23 glass kernel: EXT3-fs warning (device md(9,0)): empty_dir: bad
directory (dir #2945366) - no `.' or `..'

 If I hunt down the directory (find .  -inum ...) and do an "ls -la",
 it appears empty and responds reasonably well to "rmdir".
 
 So far the directories have mostly been browser caches, so no real
 data has been lost (I think), but it is worrisome.

 I will probably revert to 2.4.16 plus the relevant bits of the CVS
 patch hand-applied.  But I wonder if anyone else has seen this or
 has any idea what might be happening?
 The directories haven't been moved from buffercache to
 pagecache between 2.4.16 and 2.4.18 or anything like that have they?

 Possibly related...

 My ext3 filesystem is on a raid5 array, with the journal on
 a separate raid1 array. (data=journal mode).

 I get quite a few messages in the logs which say:
  
May 21 14:20:06 glass kernel: raid5: multiple 1 requests for sector 7540536

 For a variety of sector numbers.

 This means that raid5 has received two separate write requests, with
 two separate buffer heads, for the same sector.  This seems like a
 filesystem error to me.
 
 raid5 tries to apply them in the same order that they were received,
 but I don't feel confident that means that the *right* thing is
 happening.

 These were happening with 2.4.16, but the incidence seems to have
 increased with 2.4.18 (though that isn't a very strong observation).

NeilBrown

Neil Brown

2002-May-21 07:13 UTC

Re: Bad directories appearing in ext3 after upgrade 2.4.16 -> 2.4.18+cvs

On Tuesday May 21, neilb@cse.unsw.edu.au wrote:
> 
> Hi,
>  I recently upgraded one of my fileservers from 2.4.16 to 2.4.18 plus
>  the ext3-cvs.patch that Andrew Morton pointed me to for addressing
>  an assertion failure.
I just double checked, and the old kernel wasn't exactly 2.4.16...
It was 2.4.17-pre2 plus Andrew Morton 0.9.16-2417p2 patch for ext3.

NeilBrown

Stephen C. Tweedie

2002-May-21 09:48 UTC

Re: Bad directories appearing in ext3 after upgrade 2.4.16 -> 2.4.18+cvs

Hi,

On Tue, May 21, 2002 at 04:30:49PM +1000, Neil Brown wrote:
>  The directories haven't been moved from buffercache to
>  pagecache between 2.4.16 and 2.4.18 or anything like that have they?
No, no big changes.
>  My ext3 filesystem is on a raid5 array, with the journal on
>  a separate raid1 array. (data=journal mode).
Not a configuration I've tested, but I can set up a box here with that
config and see if I can reproduce any problems.
>  I get quite a few messages in the logs which say:
>   
> May 21 14:20:06 glass kernel: raid5: multiple 1 requests for sector 7540536
Can you go into debugfs and find out what inode that sector belongs
to?  You'll need to convert it into a block number first, then use
"icheck".
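
A rough sketch of that conversion, assuming the raid5 message reports 512-byte sectors counted from the start of the md device, and assuming a 4096-byte filesystem block size (the real value should be read from the superblock, e.g. with "dumpe2fs -h"). The helper names and the /dev/md0 device path are illustrative, not taken from this thread:

```python
import re

SECTOR_SIZE = 512  # assumption: the raid5 message counts 512-byte sectors

# Matches lines like: "raid5: multiple 1 requests for sector 7540536"
RAID5_LINE = re.compile(r"raid5: multiple \d+ requests for sector (\d+)")


def sector_to_block(sector, block_size=4096):
    """Convert a 512-byte sector number to a filesystem block number."""
    return sector * SECTOR_SIZE // block_size


def blocks_from_log(lines, block_size=4096):
    """Pull sector numbers out of raid5 warnings and convert them to blocks."""
    blocks = []
    for line in lines:
        match = RAID5_LINE.search(line)
        if match:
            blocks.append(sector_to_block(int(match.group(1)), block_size))
    return blocks


if __name__ == "__main__":
    log = [
        "May 21 14:20:06 glass kernel: raid5: multiple 1 requests for sector 7540536",
    ]
    for block in blocks_from_log(log):
        # /dev/md0 is a guess at the device node for md(9,0)
        print(f'debugfs -R "icheck {block}" /dev/md0')
```

With a 4096-byte block size, sector 7540536 maps to block 942567. The inode number that "icheck" reports can then be passed to "ncheck" to get a path name.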
>  For a variety of sector numbers.
>  This means that raid5 has received two separate write requests, with
>  two separate buffer heads, for the same sector.  This seems like a
>  filesystem error to me.
Probably, yes.  Is this a SMP box?  I recently fixed a long-standing
bug which could cause that on SMP; it would be worth seeing if the
ext3-0.9.18 I posted fixes the problems.  Are you running dump(8)?  If
so, it could actually be a read request from dump colliding with a
write request from the fs; add_write_bh() doesn't seem to distinguish
between reads and writes when warning about IO collisions.

Otherwise, I'd really like to track this down by turning the raid
warning into a BUG() or BH_ASSERT(), and enable buffer tracing.  (In
fact I can probably do that non-destructively, by calling the buffer
trace printing manually from add_stripe_bh().)

Cheers,
 Stephen

Andreas Dilger

2002-May-21 17:40 UTC

Re: Bad directories appearing in ext3 after upgrade 2.4.16 -> 2.4.18+cvs

On May 21, 2002  16:30 +1000, Neil Brown wrote:
>  I recently upgraded one of my fileservers from 2.4.16 to 2.4.18 plus
>  the ext3-cvs.patch that Andrew Morton pointed me to for addressing
>  an assertion failure.
> 
>  Since then I have been getting lots of errors like:
> 
> May 21 14:07:03 glass kernel: EXT3-fs error (device md(9,0)):
ext3_add_entry: bad entry in directory #2945366: rec_len %% 4 != 0 - offset=0,
inode=1886221359, rec_len=24927, name_len=109
> May 21 14:07:23 glass kernel: EXT3-fs error (device md(9,0)): ext3_readdir:
bad entry in directory #2945366: rec_len %% 4 != 0 - offset=0, inode=1886221359,
rec_len=24927, name_len=109
> May 21 14:07:23 glass kernel: EXT3-fs warning (device md(9,0)): empty_dir:
bad directory (dir #2945366) - no `.' or `..'
>  But I wonder if anyone else has this or has any idea what it might be?
Well, there were several recent reports of ext3 problems, and two
reports had the same message.  Two of them definitely were also using
RAID (RAID 1 and RAID 5).

See thread starting at:
http://marc.theaimsgroup.com/?l=linux-kernel&m=102070252100762&w=4

and also an older (but still 2.4.18) thread at:
http://marc.theaimsgroup.com/?l=ext3-users&m=101041709332435&w=4

No mention of RAID in the second thread.  I can't find the other
messages to which I refer in the first thread about "several other
reports of corruption with ext3 on MD RAID".

Neil, you posted in another message the text from the corrupt directory
blocks, which matches with the second thread where the bad data looked
like filenames (the first thread's directory data doesn't decode to
anything useful in ASCII).
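
The "looks like filenames" reading can be checked against the numbers in the error message itself. A minimal sketch, assuming the usual ext2/ext3 on-disk directory entry header (little-endian 32-bit inode, 16-bit rec_len, 8-bit name_len); the function name is made up for illustration:

```python
import struct


def dirent_header_bytes(inode, rec_len, name_len):
    """Re-pack the logged fields into the bytes they would occupy on disk.

    Assumed layout: le32 inode, le16 rec_len, u8 name_len.
    """
    return struct.pack("<IHB", inode, rec_len, name_len)


# Values copied from the EXT3-fs error lines quoted above
raw = dirent_header_bytes(1886221359, 24927, 109)
print(raw)        # b'/tmp_am'
print(24927 % 4)  # 3, so the "rec_len % 4 != 0" check fails
```

The first seven bytes of the block spell "/tmp_am", which looks like the start of a path string and not like a directory entry. That fits the idea that some other data landed in the directory block, though it does not say where that data came from.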

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/

Andreas Dilger

2002-May-21 17:49 UTC

Re: Bad directories appearing in ext3 after upgrade 2.4.16 -> 2.4.18+cvs

On May 21, 2002  16:30 +1000, Neil Brown wrote:
>  My ext3 filesystem is on a raid5 array, with the journal on
>  a separate raid1 array. (data=journal mode).
Just to follow up on my previous email, the "other ext3/RAID problems"
are probably from the thread below (start, and ominous end):
http://marc.theaimsgroup.com/?l=ext3-users&m=101489319820233&w=4
http://marc.theaimsgroup.com/?l=ext3-users&m=102011493526790&w=4

In his case, running e2fsck on the MD RAID filesystem actually
causes corruption that wasn't there before (see last message)...

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/
