Neil Brown
2003-Mar-18 01:01 UTC
Re: 2.4.20: ext3/raid5 - allocating block in system zone/multiple 1 requests for sector
On Sunday March 16, gilbertd@treblig.org wrote:
> Hi,
> I've just built an 800GB RAID5 array and built an ext3 file system
> on it; on trying to copy data off the 200GB RAID it is replacing I'm
> starting to see errors of the form:
>
> kernel: EXT3-fs error (device md(9,2)): ext3_new_block: Allocating block in
> system zone - block = 140509185
>
> and
>
> kernel: EXT3-fs error (device md(9,2)): ext3_add_entry: bad entry in
> directory #70254593: rec_len %% 4 != 0 - offset=28, inode=23880564,
> rec_len=21587, name_len=76
>
> and
>
> kernel: raid5: multiple 1 requests for sector 281018464

I had exactly these symptoms about a year ago in 2.4.18. I found and fixed the problem and have just checked, and the fix is definitely in 2.4.20. So if you really are running 2.4.20 then it looks like a similar bug has appeared.

These two symptoms strongly suggest a buffer aliasing problem, i.e. you have two buffers (one for data and one for metadata) that refer to the same location on disc. One is part of a file that was recently deleted, but the buffer hasn't been flushed yet. The other is part of a new directory. The old buffer and the new buffer both get written to disc at much the same time (hence the "multiple 1 requests"), but the old buffer hits the disc second and so corrupts the filesystem.

The bug I found was specific to data=journal mode, and this certainly has more options for buffer aliasing. Were you using data=journal?

NeilBrown
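
The aliasing scenario described above can be illustrated with a toy model. This is only a sketch in Python, not kernel code: the `Buffer` class, the `flush` function, and the block number 1000 are all hypothetical names and values invented for the illustration. It shows two in-memory buffers mapped to the same disc block, both written in one flush, with the stale one landing last.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    """Hypothetical in-memory buffer caching one disc block (not a kernel struct)."""
    block: int    # disc block number this buffer maps to
    payload: str  # contents waiting to be written
    owner: str    # what the buffer belongs to, for illustration only


def flush(disk, buffers):
    """Write buffers to 'disk' in the given order.

    Returns the block numbers that were written more than once during
    this flush, which is the toy analogue of two requests for one sector.
    """
    seen = set()
    aliased = []
    for buf in buffers:
        if buf.block in seen:
            aliased.append(buf.block)  # a second write to the same block
        seen.add(buf.block)
        disk[buf.block] = buf.payload  # last writer wins
    return aliased


disk = {}
# Arbitrary block number, chosen only for the example.
stale = Buffer(block=1000, payload="old file data", owner="deleted file")
fresh = Buffer(block=1000, payload="new directory entries", owner="new directory")

# The new directory buffer is written first, the stale data buffer second.
aliased = flush(disk, [fresh, stale])
print("blocks written more than once:", aliased)
print("final contents of block 1000:", disk[1000])
```

In this model the block ends up holding the deleted file's data where directory entries were expected, which matches the kind of "bad entry in directory" corruption reported. Reversing the order of the two buffers would leave the block intact, so the damage depends on write ordering as well as on the aliasing itself.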
Andrew Morton
2003-Mar-18 03:27 UTC
Re: 2.4.20: ext3/raid5 - allocating block in system zone/multiple 1 requests for sector
Neil Brown <neilb@cse.unsw.edu.au> wrote:
>
> These two symptoms strongly suggest a buffer aliasing problem.
> i.e. you have two buffers (one for data and one for metadata)
> that refer to the same location on disc.
> One is part of a file that was recently deleted, but the buffer hasn't
> been flushed yet. The other is part of a new directory.
> The old buffer and the new buffer both get written to disc at much the
> same time (hence the "multiple 1 requests"), but the old buffer hits
> the disc second and so corrupts the filesystem.

This aliasing can happen very easily with direct-io, and it is something which drivers should be able to cope with. I hope RAID is not still assuming that all requests are unique in this way?
Dave Gilbert (Home)
2003-Mar-18 14:04 UTC
Re: 2.4.20: ext3/raid5 - allocating block in system zone/multiple 1 requests for sector
Neil Brown wrote:
> The bug I found was specific to data=journal mode, and this certainly
> has more options for buffer aliasing. Were you using data=journal?

No.

Dave