In the past few days, I've been reading about ext3/journalling/... In order to fully understand how it works, I have a few questions.

*Imagine the following situation: you opened a file in vi(m) and are editing it, but haven't saved your work yet. The system crashes: what will the result be? Will the metadata be modified (assume both atime and noatime)? Will the data itself be corrupted? Or will there be no modification whatsoever because you hadn't saved yet (your work will simply be lost)?

*What happens when the system crashes during a write to the journal? Can the journal be corrupted?

*About ext3's ordered mode
[quote]from Wikipedia:
Ordered (medium speed, medium risk): Ordered is as with writeback, but forces file contents to be written before its associated metadata is marked as committed in the journal.[/quote]

What's the sequence of events here?
1. The user issues a command to write his work to disk.
2. Metadata is recorded in the journal, but is marked as "not yet executed" (or something similar).
3. Data (file contents) and metadata are written to disk.
4. The metadata flag is set to "executed".

If a crash happens between steps 1 and 2, we are in the situation described above (the first situation): nothing has been written yet.
If a crash happens between steps 2 and 3, isn't this the same as writeback? Or is this impossible (I read something about a single transaction, but I forgot where)?
A crash between steps 3 and 4 can be corrected by replaying the journal.

Is this a correct view of things?
On Tue, December 18, 2007 22:11, Bart wrote:
> *Imagine the following situation: you opened a file in vi(m), you are
> editing, but haven't yet saved your work. The system crashes: what will be
> the result?

If you haven't saved yet, nothing will happen to the file. But since vi(m) creates a temporary file (.file.swp or something similar), that file could already have made it to the disk.

> Will the metadata be modified (assume both atime and noatime)? Will
> the data itself be corrupted?

The file itself should not be corrupt. If it were, it would have been repaired by replaying the journal during bootup (by fsck, or by the kernel when the filesystem is mounted) to provide a non-corrupt filesystem.

> simply be lost)? *What happens when the system crashes during a write to
> the journal? Can the journal be corrupted?

It shouldn't be corrupted. If it were, fsck should be able to fix that; otherwise I'd consider it a bug.

> If a crash happens between step 1 and 2, we are in the situation as
> described above (first situation): not yet written If a crash happens
> between step 2 and 3, isn't this the same as writeback?

AFAIK, writes to the journal have to be atomic: either the journal is updated, or (when the system crashes during this operation) it isn't. A transaction only counts once its commit block is on disk; an incomplete one is discarded on replay. With data=ordered, the journal is updated after the data has made it to the disk. With data=journal, both data and metadata are written to the journal first.

C.
-- 
BOFH excuse #442: Trojan horse ran out of hay
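The commit-record atomicity and the data=ordered behaviour described above can be checked with a toy model. This is a minimal Python sketch under loudly simplified assumptions, not the real jbd code: the disk is a dict of named blocks, the file is one inode pointing at one block, the block names (blk3, blk7) and helper names are hypothetical, and writeback mode is given its worst-case ordering (data reaching disk only after the commit). It crashes after each write, replays the journal, and reports what the file reads as.

```python
from dataclasses import dataclass, field

# Hypothetical block contents for the toy model.
OLD = "old text"                          # the file's current contents (blk3)
STALE = "stale data from a deleted file"  # what the free block blk7 still holds
NEW = "saved text"                        # what the user is saving


@dataclass
class Disk:
    # In-place blocks: "inode" names the block that holds the file's data.
    blocks: dict = field(
        default_factory=lambda: {"inode": "blk3", "blk3": OLD, "blk7": STALE})
    journal: list = field(default_factory=list)


def write_plan(mode):
    """Disk writes, in order, for saving NEW into a freshly allocated block."""
    data = ("data", "blk7", NEW)      # file contents, written in place
    meta = ("meta", "inode", "blk7")  # metadata change, logged in the journal
    commit = ("commit",)              # commit record closes the transaction
    checkpoint = ("checkpoint",)      # logged metadata copied to its home location
    if mode == "ordered":
        return [data, meta, commit, checkpoint]
    if mode == "writeback":
        # Worst case: nothing forces the data out before the commit.
        return [meta, commit, checkpoint, data]
    raise ValueError(f"unknown mode: {mode}")


def replay(disk):
    """Apply committed transactions; drop a trailing one with no commit record."""
    pending = []
    for record in disk.journal:
        if record[0] == "meta":
            pending.append(record)
        elif record[0] == "commit":
            for _, key, value in pending:
                disk.blocks[key] = value
            pending = []
    disk.journal.clear()


def apply(disk, op):
    """Perform one disk write from the plan."""
    if op[0] == "data":
        disk.blocks[op[1]] = op[2]
    elif op[0] in ("meta", "commit"):
        disk.journal.append(op)
    elif op[0] == "checkpoint":
        replay(disk)


def crash_after(mode, n):
    """Do the first n writes, crash, recover, and return what the file reads as."""
    disk = Disk()
    for op in write_plan(mode)[:n]:
        apply(disk, op)
    replay(disk)  # journal recovery on the next mount
    return disk.blocks[disk.blocks["inode"]]


if __name__ == "__main__":
    for mode in ("ordered", "writeback"):
        print(mode, [crash_after(mode, n) for n in range(5)])
```

In ordered mode every crash point yields either the complete old contents or the complete new ones; a metadata record without its commit record is simply dropped. In the worst-case writeback ordering, a crash after the commit but before the data write (n = 2 or 3) leaves the file pointing at a block that still holds someone else's stale data. The model only covers writes into newly allocated blocks; overwriting existing blocks in place involves no metadata change and is not modelled.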
On Tue, Dec 18, 2007 at 22:11:19 +0100, Bart <bart.bas at gmail.com> wrote:
> In the past few days, I've been reading about ext3/journalling/... In order
> to fully understand how it works I have a few questions.
>
> *Imagine the following situation: you opened a file in vi(m), you are
> editing, but haven't yet saved your work. The system crashes: what will be
> the result? Will the metadata be modified (assume both atime and noatime)?
> Will the data itself be corrupted? Or will there be no modification
> whatsoever because you hadn't saved yet (your work will simply be lost)?
> *What happens when the system crashes during a write to the journal? Can the
> journal be corrupted?

vi keeps data in a scratch file, so if you haven't forced a save, your original file will be intact. You should be able to use vi -r to recover at least some of the changes you were working on. This is mostly independent of what is going on in the file system.

What you are really looking for with journaling is that when you are told data is safe on disk, it is in fact safe on disk. You also need to worry about drive caching when dealing with this. You can turn write caching off, have the cache backed by a battery (common with real RAID controllers), use write barriers (a mount option, barrier=1 for ext3) to force cache flushes when needed (not all drives support this), or use disk drives that can report back when commands have really been completed (which is not available for PATA drives).

> *About ext3's ordered mode
> [quote]from Wikipedia:
> Ordered
> (medium speed, medium risk) Ordered is as with writeback, but forces
> file contents to be written before its associated metadata is marked as
> committed in the journal.[/quote]

Note that for some workloads data=journal can be as fast as data=ordered. Are you really having a throughput problem with your disk drives? If not, then you probably want to use data=journal (assuming that reliability is a high concern for you).
If you are having throughput problems, there are other potential solutions besides reducing the effectiveness of journalling.
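From an application's point of view, "being told data is safe on disk" usually means that fsync() has returned. A minimal Python sketch of a crash-safe save follows, assuming a POSIX system such as Linux where a directory can be opened and fsync'd; the function name durable_save and the .tmp- prefix are hypothetical, and this is not how vi itself writes files. As noted in the thread, fsync only means something if the drive's write cache is disabled, battery-backed, or flushed with barriers.

```python
import os
import tempfile


def durable_save(path, text):
    """Replace `path` with `text` so a crash leaves either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so the rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()             # Python's buffer -> kernel page cache
            os.fsync(f.fileno())  # page cache -> disk (subject to drive caching)
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        os.unlink(tmp_path)
        raise
    # Make the rename itself durable by syncing the directory entry.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

After a crash, path should hold either the complete old contents or the complete new ones, provided the journal and the drive cache behave as described above. A real editor would also preserve the original file's permissions and ownership, which this sketch does not (mkstemp creates the file with mode 0600).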