On Thu, 26 Jul 2001, Andrew Morton wrote:
> Ted has put out a prelease of e2fsprogs-1.23 which supports

Where to get it? On the sourceforge page there is no prerelease.

Best wishes
Norbert

-- 
ciao
norb

Norbert Preining   http://www.logic.at/people/preining
University of Technology Vienna, Austria   preining@logic.at
DSA: 0x09C5B094 (RSA: 0xCF1FA165)   mail subject: get [DSA|RSA]-key
An update to the ext3 filesystem for 2.4 kernels is available at

  http://www.uow.edu.au/~andrewm/linux/ext3/

The diffs are against linux-2.4.7 and linux-2.4.6-ac5.

The changelog is there. One rarely-occurring but oopsable bug was fixed, and several quite significant performance enhancements have been made. These are in addition to the performance fixes which went into 0.9.3.

Ted has put out a prerelease of e2fsprogs-1.23 which supports filesystem type `auto' in /etc/fstab, so it is now possible to switch between ext3 and non-ext3 kernels without changing any configuration.

It is recommended that users of earlier ext3 releases upgrade to 0.9.4.

For people who are undertaking performance testing, it is perhaps useful to point out that ext3 operates in one of three different journalling modes, and that these modes have very different functionality and very different performance characteristics. Really, you need to test all three and balance the functionality which each mode offers against the throughput which you obtain in your application.

The modes are:

data=writeback

  This is classic metadata-only journalling. File data is written back to the main fs lazily.

  After a crash+recovery the fs's structural integrity is preserved, but the *contents* of files can and will contain old, stale data. Potentially hundreds of megabytes of it.

  This is the fastest mode for normal filesystem applications.

data=ordered

  The fs ensures that file data is written into the main fs prior to committing its metadata. Hence after a crash+recovery, your files will contain the correct data.

  This is the default operating mode, and throughput is good. It adds about one second to a four-minute kernel compile when compared with ext2. Under heavier loads the difference becomes larger.

data=journal

  All data (as well as metadata) is written to the journal before it is released to the main fs for writeback.
  This is a specialised mode - for normal fs usage you're better off using ordered data, which has the same benefit of not corrupting data after crash+recovery. However, for applications which require synchronous operation, such as mail spools and synchronously exported NFS servers, this can be a performance win. I have seen dbench figures in this mode (where the files were opened O_SYNC) running at ten times the throughput of ext2. Not that this is the expected benefit for other applications!

Looking at the above issues, one may initially think that post-recovery data corruption is a serious problem with writeback mode, and that there are big advantages to using journalled or ordered data. However, even in these modes the affected files may be shorter than expected after recovery, because the app hadn't finished writing them yet. And usually a truncated file is just as useless as one which contains garbage: it needs to be deleted.

It's not really as simple as that. For small (< a few hundred k) files, it tends to be the case that either the whole file is intact after a crash, or none of it is. This is because the journalling mechanism starts a new transaction every five seconds, and a typical open/write/close operation usually fits entirely inside this window.

There is also a security issue to be considered: a recovered writeback-mode filesystem will expose other people's old data to unintended recipients.

Hopefully this description will help people make their deployment choices. If not, assistance is available on the ext3-users@redhat.com mailing list.
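For anyone setting up the kind of test described for data=journal (mail spools, dbench with files opened O_SYNC), here is a minimal C sketch of an O_SYNC writer. It assumes a POSIX system; the function name spool_append_osync() is hypothetical and is not taken from ext3, dbench or any MTA. With O_SYNC, each successful write() returns only once the data, and the metadata needed to retrieve it, has reached stable storage. That synchronous workload is the one for which data=journal is said to be a performance win.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append one record to a spool file opened with
 * O_SYNC, so every successful write() is on stable storage before it
 * returns.  Returns 0 on success, -1 on error (errno set). */
int spool_append_osync(const char *path, const char *rec)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = rec;
    size_t left = strlen(rec);
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;                    /* cope with short writes */
        left -= (size_t)n;
    }
    /* No fsync() needed: O_SYNC already made each write() durable. */
    return close(fd);
}
```

Timing a loop of such calls under each of data=writeback, data=ordered and data=journal is one way to do the per-mode comparison recommended above; no particular figures are implied by this sketch.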
On Thu, 26 Jul 2001, Norbert Preining wrote:
> Where to get it? On the sourceforge page there is no prerelease.

Forget it, found it ;-)

-- 
ciao
norb
On Thu, 26 Jul 2001, Andrew Morton wrote:

> data=journal
>
> All data (as well as to metadata) is written to the journal
> before it is released to the main fs for writeback.
>
> This is a specialised mode - for normal fs usage you're better
> off using ordered data, which has the same benefits of not corrupting
> data after crash+recovery. However for applications which require
> synchronous operation such as mail spools and synchronously exported
> NFS servers, this can be a performance win. I have seen dbench

In ordered and journal mode, are meta data operations, namely creating a file, rename(), link(), unlink() "synchronous" in the sense that after the call has returned, the effect of this call is never lost, i.e., if link(2) has returned and the machine crashes immediately, will the next recovery ALWAYS recover the link?

Or will ext3 still need chattr +S? Does it still support chattr +S at all?

Synchronous meta data operations are crucial for mail transfer agents such as Postfix or qmail. Postfix has up until now been setting chattr +S /var/spool/postfix, which left the original (esp. soft-updating) BSD file systems significantly faster than ext2 for data (payload) writes in this directory.

Note: I'm not on the ext3-users list. Please Cc: replies back to me.

-- 
Matthias Andree
Matthias Andree wrote:
>
> On Thu, 26 Jul 2001, Andrew Morton wrote:
>
> > data=journal
> >
> > All data (as well as to metadata) is written to the journal
> > before it is released to the main fs for writeback.
> >
> > This is a specialised mode - for normal fs usage you're better
> > off using ordered data, which has the same benefits of not corrupting
> > data after crash+recovery. However for applications which require
> > synchronous operation such as mail spools and synchronously exported
> > NFS servers, this can be a performance win. I have seen dbench
>
> In ordered and journal mode, are meta data operations, namely creating a
> file, rename(), link(), unlink() "synchronous" in the sense that after
> the call has returned, the effect of this call is never lost, i. e., if
> link(2) has returned and the machine crashes immediately, will the next
> recovery ALWAYS recover the link?

No, they're not synchronous by default. After recovery they will either be wholly intact or wholly absent.

> Or will ext3 still need chattr +S?

Yes, if the app doesn't support O_SYNC or fsync(). I believe that MTAs *do* support those things.

> Does it still support chattr +S at all?

Yes.

> Synchronous meta data operations are crucial for mail transfer agents
> such as Postfix or qmail. Postfix has up until now been setting
> chattr +S /var/spool/postfix, making original (esp. soft-updating) BSD
> file systems significantly faster for data (payload) writes in this
> directory than ext2.

If postfix is capable of opening the files O_SYNC or of doing fsync() on them, then the `chattr +S' is no longer necessary. Unlike with ext2, when the O_SYNC write() or the fsync() returns, the directory contents (as well as the inode, bitmaps, data, etc.) will all be safely on disk and will be restored after a crash. This should speed things up considerably, especially in journalled-data mode.
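As a rough illustration of the fsync() alternative to `chattr +S', here is a minimal C sketch. It assumes POSIX openat() and a spool directory the caller can write to; deliver_fsync() and write_all() are hypothetical names, not Postfix or qmail code. Per the reply above, on ext3 the fsync() on the file already commits the new directory entry; the extra fsync() on the directory descriptor is a portable belt-and-braces step for filesystems such as ext2, where that is not the case.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: create dir/name containing msg and make it
 * durable with fsync() instead of relying on `chattr +S' for the
 * spool directory.  O_EXCL makes a name collision an error rather
 * than silently overwriting an existing queue file.
 * Returns 0 on success, -1 on error. */
int deliver_fsync(const char *dir, const char *name, const char *msg)
{
    int rc = -1;
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;

    int fd = openat(dfd, name, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd >= 0) {
        /* Flush file data and inode; per the thread, ext3 also commits
         * the new directory entry as part of this fsync(). */
        if (write_all(fd, msg, strlen(msg)) == 0 && fsync(fd) == 0)
            rc = 0;
        if (close(fd) != 0)
            rc = -1;
        /* Portable extra step for filesystems such as ext2, where the
         * file's fsync() does not cover its directory entry. */
        if (rc == 0 && fsync(dfd) != 0)
            rc = -1;
    }
    close(dfd);
    return rc;
}
```

On ext3 the directory fsync() should be redundant, so an MTA targeting only ext3 could drop it; keeping it costs little and avoids depending on filesystem-specific behaviour.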
I need to test and characterise this some more to come up with some quantitative results and configuration recommendations.

BTW, if you have more-than-modest throughput requirements, don't even *think* of mounting the fs with `mount -o sync'. Our performance in this mode is terrible :( I have a hack somewhere which fixes this as much as it can be fixed, but I didn't even bother committing it. It's feasible, but tiresome. A better solution is to fix some lock inversion problems in the core kernel which prevent optimal implementation of data-journalling filesystems. I don't really expect this to occur in the medium term, or ever. A middle-ground solution may be to add an fs-private `osync' mount option, so that all files are treated similarly to O_SYNC, which would work well.
On Thu, 26 Jul 2001, Andrew Morton wrote:
> An update to the ext3 filesystem for 2.4 kernels is available at
>
> http://www.uow.edu.au/~andrewm/linux/ext3/

I'm using ext3-0.9.4 with linux-2.4.7 / 2.4.8-pre1 and get some hangs on my dual P2-350:

From time to time I will have multiple cron daemons in D-state, and login hangs. It even happens during boot, before my MTA is started. I have a single ext3 partition which is exported by the kernel NFS server. As soon as I do an Alt-SysRq-S forced sync, the hang goes away and everything works normally.

If you need further information, send me an email. SGI's kdb is already compiled in, so if we need it ...

BYtE
Philipp

-- 
Philipp Hahn
pmhahn@titan.lahn.de