Patch against linux-2.4.8 is at

	http://www.uow.edu.au/~andrewm/linux/ext3/

The only changes here are merging up to 2.4.8 and the bigendian fix.

linux-2.4.8-ac1 currently has ext3-0.9.3, which has no known crash-worthy bugs, but is old. I'm about to send Alan a diff which takes -ac up to 0.9.6.

The changes between 0.9.3 and 0.9.6 may be summarised as:

- Simplify the handling of synchronous operations (O_SYNC, fsync(), chattr +S, etc).

- Fix a couple of places where we're not syncing writes when we should.

- Implement batching of synchronous operations: when multiple threads want to perform synchronous writes we allow the threads to block together and all their writes happen in the same transaction. Speeds things up muchly.

- Implement support for external journal devices. This is experimental at this stage. It works fine, but the operational interfaces will change. At present the external journal device is not "mounted" when we're using it and it really should be.

- ext3 has for a long time had developer code which allows the target device to be turned read-only at the disk device driver level a certain number of jiffies after the fs was mounted. This is to allow scripted testing of crash recovery. This facility has been extended to support two devices; one for the filesystem and one for the external journal device.

- Accelerate an O(N^2) algorithm in log_do_checkpoint().

- Accelerate an O(N^2) algorithm in journal_commit_transaction().

- Rate-limit some error messages which can come out when we're hopelessly out of memory.

- Honour __GFP_WAIT in journal_try_to_free_buffers(). The fs is supposed to perform synchronous writeout on the second pass of page_launder() and we weren't doing that - we were starting all IO async. The net effect of this change is to decrease throughput with dbench by 10-20%, but system CPU time goes from 60% to 30%. It's the right thing to do...

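The batching of synchronous operations in the changelog above can be illustrated with a toy user-space model. This is only a sketch of the general idea (several threads block together and their writes land in one transaction); it is not ext3's kernel code. The class name `BatchingJournal`, the `batch_window` parameter and the leader/follower scheme are all assumptions made for illustration.

```python
import threading


class BatchingJournal:
    """Toy model of batched synchronous writes (hypothetical, not ext3 code)."""

    def __init__(self, batch_window=0.05):
        self._cond = threading.Condition()
        self._running = []         # writes that joined the running transaction
        self._running_tid = 0      # id of the running transaction
        self._committed_tid = -1   # id of the last committed transaction
        self._leader = False       # True while one thread is waiting to commit
        self.batch_window = batch_window
        self.commits = []          # one list of writes per committed transaction

    def sync_write(self, data):
        """Block until `data` is committed; return the transaction id used."""
        with self._cond:
            self._running.append(data)
            my_tid = self._running_tid
            if not self._leader:
                # First thread in: wait briefly so other writers can join,
                # then commit everything that joined as one transaction.
                self._leader = True
                self._cond.wait(timeout=self.batch_window)
                self.commits.append(self._running)
                self._running = []
                self._committed_tid = my_tid
                self._running_tid += 1
                self._leader = False
                self._cond.notify_all()
            else:
                # Later threads simply block until that commit has happened.
                while self._committed_tid < my_tid:
                    self._cond.wait()
            return my_tid


if __name__ == "__main__":
    journal = BatchingJournal()
    workers = [threading.Thread(target=journal.sync_write, args=(f"write-{i}",))
               for i in range(8)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(len(journal.commits), "transaction(s) for 8 synchronous writes")
```

How many transactions the eight writes end up in depends on thread timing, but each writer returns only after its own data has been committed, which is the property a synchronous write needs.
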
On Sat, Aug 11, 2001 at 06:40:22PM -0700, Andrew Morton wrote:
> Patch against linux-2.4.8 is at
>
> http://www.uow.edu.au/~andrewm/linux/ext3/
>
> The only changes here are merging up to 2.4.8 and the bigendian
> fix.

Gack. I think about when you wrote this, I managed to crash again. I was running 2.4.8-pre8 + fsync_dev -> fsync_no_super + first fix. It was at transaction.c:1184, but the logs didn't make it to disk.

On a related note, what does ext3 do to the disk when this happens? I think I need to point the yaboot author at it, since it couldn't load a kernel (which was fun, let me tell you.. :))

-- 
Tom Rini (TR1265)
http://gate.crashing.org/~trini/

On Sat, Aug 11, 2001 at 06:40:22PM -0700, Andrew Morton wrote:
> - ext3 has for a long time had developer code which allows the target device
> to be turned read-only at the disk device driver level a certain number
> of jiffies after the fs was mounted. This is to allow scripted testing
> of crash recovery. This facility has been extended to support two devices;
> one for the filesystem and one for the external journal device.

Would this facility also be able to deal with parts of a device becoming read-only unexpectedly? Some of the disks I have in RAIDs have the nice habit of disabling write access when overheating. That's an interesting failure scenario in a RAID system.

Ralf

Ralf Baechle wrote:
>
> On Sat, Aug 11, 2001 at 06:40:22PM -0700, Andrew Morton wrote:
>
> > - ext3 has for a long time had developer code which allows the target device
> > to be turned read-only at the disk device driver level a certain number
> > of jiffies after the fs was mounted. This is to allow scripted testing
> > of crash recovery. This facility has been extended to support two devices;
> > one for the filesystem and one for the external journal device.
>
> Would this facility also be able to deal with parts of a device becoming
> read-only unexpectedly? Some of the disks I have in RAIDs have the
> nice habit of disabling write access when overheating. That's an
> interesting failure scenario in a RAID system.

Well, that facility is purely for development purposes. The obvious way of testing recovery is to hit the reset button at strategic times, which rather sucks.

So what the above IDE driver trick does is add a new mount option `ro-after=3000'. When this is provided, a kernel timer fires 30 seconds after mount and the IDE driver starts silently ignoring writes to the underlying device. It also provides a special ioctl() which blocks the caller until the timer has fired.

So we have scripts which do:

1: mount fs, set to go read-only in 30 seconds
2: start some filesystem activity
3: Block on the timer
4: wake up, kill off the filesystem activity
5: unmount the fs
6: mount the fs (this will run recovery)
7: unmount the fs
8: fsck it
9: repeat with a different read-only interval.

I also have a hacked-on version of dbench which writes known-but-variable info into the files, so we can check that the contents of whatever files survived the "crash" are correct.

This setup has allowed me to run crash+recovery many thousands of times with varying workloads - I'm pretty confident about recovery because of this. The one thing it doesn't cover is the effects of disk write caching.

As for the RAID problem: if the filesystem has magically turned read-only then all you need to do is to unmount it (often hard to do, if it's in use), then make it writable and then mount it or run fsck against it. ext3 will perform recovery and all should be peachy, until next time...

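The nine-step crash/recovery cycle listed earlier in this message can be sketched as a dry-run planner that only builds the command sequence and runs nothing. This is a hedged illustration, not the actual test scripts: the device and mount point names are made up, `WORKLOAD`, `WAIT_FOR_RO` and `KILL_WORKLOAD` are hypothetical placeholders for the workload and the unnamed ioctl() helper, and HZ=100 is an assumption inferred from `ro-after=3000' meaning 30 seconds. The `ro-after` option itself exists only with the development patch described above.

```python
HZ = 100  # assumption: 100 jiffies per second, so ro-after=3000 means 30 s


def crash_test_plan(device, mountpoint, ro_after_jiffies):
    """Return the ordered (description, argv) steps for one crash/recovery run."""
    seconds = ro_after_jiffies / HZ
    return [
        (f"mount fs, read-only after {seconds:.0f} s",
         ["mount", "-t", "ext3", "-o", f"ro-after={ro_after_jiffies}",
          device, mountpoint]),
        # The next three entries are hypothetical placeholders, not real tools.
        ("start some filesystem activity", ["WORKLOAD", mountpoint]),
        ("block until the read-only timer fires", ["WAIT_FOR_RO", device]),
        ("kill off the filesystem activity", ["KILL_WORKLOAD"]),
        ("unmount the fs", ["umount", mountpoint]),
        ("mount the fs (runs recovery)",
         ["mount", "-t", "ext3", device, mountpoint]),
        ("unmount the fs", ["umount", mountpoint]),
        ("fsck it", ["e2fsck", "-f", "-n", device]),
    ]


def crash_test_runs(device, mountpoint, intervals):
    """Step 9: repeat the plan with a different read-only interval each time."""
    return {j: crash_test_plan(device, mountpoint, j) for j in intervals}


if __name__ == "__main__":
    # /dev/hdb1 and /mnt/test are made-up names for illustration only.
    for jiffies, plan in crash_test_runs("/dev/hdb1", "/mnt/test",
                                         [1000, 3000]).items():
        print(f"--- ro-after={jiffies} ---")
        for description, argv in plan:
            print(f"{description}: {' '.join(argv)}")
```

Keeping the plan as data makes it easy to vary the read-only interval between runs, which is what lets the "crash" land at different points in the workload.
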