Hi,

After our big ext3 file server crashes, I notice the fsck spends some time replaying the journals (about 5-10 mins for all volumes on the server in question). I guess it must do this should you want to mount the volumes as ext2.

My question: is it (theoretically) possible to tell fsck only to replay half-finished and to knock out incomplete transactions from the journals, leaving the kernel to replay the good ones in its own time, possibly reducing downtime by a few minutes? Or might this break assumptions the kernel code makes? Or is it totally impossible and ridiculous? :)

Matt

mb/ext3@dcs.qmul.ac.uk wrote:
>
> Hi,
>
> After our big ext3 file server crashes, I notice the fsck spends some time
> replaying the journals (about 5-10 mins for all volumes on the server in
> question). I guess it must do this should you want to mount the volumes as
> ext2.

That must be one big server. Are the fscks running in parallel?

The ideal situation is that individual partitions on each disk are fsck'ed sequentially, and that all disks are fsck'ed in parallel. So you end up with a *single* fsck running against each disk, but all disks being worked on in parallel. This is all possible, but requires some genuflection over the fsck and fstab manpages.

Oh, and how come your server crashes?

> My question: is it (theoretically) possible to tell fsck only to replay
> half-finished and to knock out incomplete transactions from the journals,
> leaving the kernel to replay the good ones in its own time, possibly
> reducing downtime by a few minutes? Or might this break assumptions the
> kernel code makes? Or is it totally impossible and ridiculous? :)
>

No, this is quite possible, but not desirable.

What happens at present is that recovery can replay data unnecessarily: it will rewrite transactions which were in fact fully checkpointed at the time of the crash. But addressing this shortcoming would require that the ext3 commit phase seek to the head of the journal to update the journal superblock each time we've fully checkpointed a transaction. That would slow down normal operation to gain a recovery-time speedup, which is a bad tradeoff.

Possibly we could optimise this by putting additional information into the journal commit blocks: record the highest known-to-be-committed transaction ID within the commit block. Hmm.

I suggest that you ensure that you're getting the best possible parallelism in the recovery, and perhaps experiment with smaller journals.

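A back-of-envelope model of the "one fsck per disk, all disks in parallel" advice, in Python. This is a sketch, not how fsck actually schedules work: it assumes each partition's recovery time is known and independent of the others, and the disk names, partition names and minute values are made up for illustration. With one check per disk and all disks running at once, the total is the slowest disk's sequential sum rather than the sum over every partition. As I understand the fsck(8) manpage, "fsck -A" checks filesystems that share the same fs_passno value (the sixth field in /etc/fstab) in parallel while trying not to run two checks on the same physical disk at once, which roughly corresponds to per_disk=True below.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def wall_clock(partitions: Iterable[Tuple[str, str, float]], per_disk: bool = True) -> float:
    """Estimate total recovery time in minutes (toy model, hypothetical inputs).

    partitions: (disk, partition, minutes) triples.
    per_disk=True:  partitions on one disk run one after another,
                    different disks run in parallel -> slowest disk's sum.
    per_disk=False: every partition is done one after another -> plain sum.
    """
    parts = list(partitions)
    if not per_disk:
        return sum(minutes for _disk, _part, minutes in parts)
    per_disk_total: Dict[str, float] = defaultdict(float)
    for disk, _part, minutes in parts:
        per_disk_total[disk] += minutes  # sequential work on the same spindle adds up
    return max(per_disk_total.values(), default=0.0)  # disks overlap in time


if __name__ == "__main__":
    # Hypothetical layout: two partitions on sda, one each on sdb and sdc.
    layout = [("sda", "sda1", 2.0), ("sda", "sda2", 1.0),
              ("sdb", "sdb1", 3.0), ("sdc", "sdc1", 1.5)]
    print(wall_clock(layout, per_disk=False))  # 7.5
    print(wall_clock(layout))                  # 3.0
```

With these made-up numbers, doing everything serially costs 7.5 minutes, while the per-disk schedule is bounded by sda and sdb at 3 minutes each.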
Hi,

On Fri, Feb 15, 2002 at 11:56:38AM +0000, mb/ext3@dcs.qmul.ac.uk wrote:

> After our big ext3 file server crashes, I notice the fsck spends some time
> replaying the journals (about 5-10 mins for all volumes on the server in
> question). I guess it must do this should you want to mount the volumes as
> ext2.

Yes, but 5-10 minutes is a long time. How many volumes are there? How large are the journals? Can you not parallelise the fscks a bit?

> My question: is it (theoretically) possible to tell fsck only to replay
> half-finished and to knock out incomplete transactions from the journals,
> leaving the kernel to replay the good ones in its own time, possibly
> reducing downtime by a few minutes? Or might this break assumptions the
> kernel code makes? Or is it totally impossible and ridiculous? :)

Incomplete transactions are always ignored completely, both by the kernel and by e2fsck recovery.

The replay is _always_ going to get done, because if e2fsck doesn't do it, then the kernel will do exactly the same thing when you try to mount the filesystems. The kernel and e2fsprogs actually share the same recovery.c file for that.

Unless Something Weird is happening, doing the recovery in fsck should be better because you'll be able to parallelise the recovery on different disks. With kernel recovery, recovery happens at mount time, and mounts are typically done sequentially.

Cheers,
 Stephen
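The point that incomplete transactions are always ignored can be illustrated with a toy model of the journal scan in Python. This is a simplified sketch under stated assumptions, not the on-disk JBD format and not the logic of the shared recovery.c (which, among other things, also handles revoke records): the journal is modelled as a list of hypothetical ("write", tid, block, payload) and ("commit", tid) records. A transaction's writes are replayed only once its commit record has been seen, a trailing transaction with no commit record is dropped, and scanning stops at the first record carrying an unexpected transaction ID. In this model, replaying the same committed transactions a second time leaves the disk unchanged, which is why the unnecessary rewriting of already-checkpointed transactions mentioned earlier in the thread costs time but not correctness.

```python
from typing import Dict, List, Tuple

# Hypothetical record shapes for this toy model (NOT the real on-disk JBD layout):
#   ("write", tid, fs_block, payload): a logged copy of one filesystem block
#   ("commit", tid):                   the commit record that closes transaction tid


def scan_committed(log: List[tuple], first_tid: int) -> List[List[Tuple[int, bytes]]]:
    """Return the writes of each fully committed transaction, oldest first."""
    committed: List[List[Tuple[int, bytes]]] = []
    pending: List[Tuple[int, bytes]] = []
    expected = first_tid
    for rec in log:
        kind, tid = rec[0], rec[1]
        if tid != expected:
            break  # unexpected tid: treat as the end of the valid log
        if kind == "write":
            pending.append((rec[2], rec[3]))
        elif kind == "commit":
            committed.append(pending)  # transaction is complete: keep it
            pending = []
            expected += 1
    # Whatever is left in `pending` has no commit record, so it is ignored.
    return committed


def replay(log: List[tuple], first_tid: int, disk: Dict[int, bytes]) -> Dict[int, bytes]:
    """Copy every committed block image back to its home location, in order."""
    for writes in scan_committed(log, first_tid):
        for block, payload in writes:
            disk[block] = payload  # later transactions overwrite earlier ones
    return disk


if __name__ == "__main__":
    log = [("write", 7, 100, b"A"), ("write", 7, 101, b"B"), ("commit", 7),
           ("write", 8, 100, b"C"), ("commit", 8),
           ("write", 9, 102, b"D")]  # tid 9 never committed
    print(replay(log, 7, {}))  # {100: b'C', 101: b'B'}
```

Block 102 is never written, because transaction 9 has no commit record; running replay again on the result gives the same dictionary.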