Hello.

Is there a simple way, at a shell script level, of finding out whether an ext3 fs has a sane journal, other than mounting it or running a full fsck?

I may quite well be missing a few things here, but what I think I'd like is some extra option to e2fsck that says "if this is a journalled filesystem, and it was shut down uncleanly, just replay the journal and check for immediately obvious problems, but don't bother scanning the whole filesystem unless there's a '-f' in sight".

AFAICT, the usual way of handling ext3 filesystems seems to be to mark them with fs_passno=0, so they never get fscked from the init scripts - but the journal gets replayed, and a few things get checked at mount time.

If mount fails - because something horrible really did happen - then the /etc/rc.sysinit doesn't seem to have any way of coping, or dropping to an interactive shell.

John.
On Dec 02, 2002 17:10 +0000, John Vickers wrote:
> Is there a simple way, at a shell script level, of finding out whether an
> ext3 fs has a sane journal, other than mounting it or running a full fsck ?

Yes, "tune2fs -l <dev> | grep 'features:.*needs_recovery'", but reading further you do not actually need it.

> I may quite well be missing a few things here, but what I think I'd like is
> some option extra to e2fsck that says "if this is a journalled filesystem,
> and it was shut down uncleanly, just replay the journal and check for
> immediately obvious problems, but don't bother scanning the whole filesystem
> unless there's a '-f' in sight".

That is how e2fsck already works, no need to change anything. By default it will replay the journal, and then check the superblock for errors. If no error is marked in the superblock, it is done in a second or so[*]. Just doing this with the above script isn't enough, since errors can also be stored in the journal header in case of very serious errors, and the un-recovered filesystem superblock will _appear_ to be fine, but the filesystem really needs a full check.

[*] There is also a feature of ext2/3 that allows you to specify full filesystem checks after a certain number of mounts/time. Some people turn this off in the mistaken thought that "it has a journal, I don't need no stinking fsck on my filesystems". However, a journal is no protection against disk, memory, CPU, or kernel errors, so doing periodic full fscks can help find errors before they cause cascading data corruption on your filesystem, or get detected right in the middle of some important work and make your system unusable.
If you don't like the "every 20 mounts" full fsck, change it with "tune2fs -c" to be some longer interval.

> AFAICT, the usual way of handling ext3 filesystems seems to be to mark them
> with fs_passno=0, so they never get fscked from the init scripts - but the
> journal gets replayed, and a few things get checked at mount time.

That is just plain wrong, since it will skip full checking if there was an error detected in the filesystem.

> If mount fails - because something horrible really did happen - then the
> /etc/rc.sysinit doesn't seem to have any way of coping, or dropping to an
> interactive shell.

That's why you should have passno != 0 for all ext3 filesystems, so that e2fsck has a chance to check the superblock before the filesystem is mounted.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
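
For reference, the tune2fs check quoted in the reply above could be wrapped for use in a script roughly as follows. This is only a sketch: the function name needs_recovery and the device /dev/hda1 are placeholders that do not come from the thread, and, as the reply points out, a clean result here does not prove the filesystem is fine, because serious errors may be recorded only in the journal.

```shell
#!/bin/sh
# Hypothetical helper (the name is not from the thread): succeeds if the
# `tune2fs -l` output supplied on stdin lists the needs_recovery feature,
# i.e. the journal has not been replayed since an unclean shutdown.
needs_recovery() {
    grep -q 'features:.*needs_recovery'
}

# Usage sketch. /dev/hda1 is a placeholder device, and tune2fs -l
# normally has to be run as root:
#
#   if tune2fs -l /dev/hda1 | needs_recovery; then
#       echo "journal replay pending on /dev/hda1"
#   fi
```

Reading the tune2fs output from stdin keeps the helper free of any device access of its own, so it can be tried against saved output first.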
On Mon, 2002-12-02 at 17:10, John Vickers wrote:
> Is there a simple way, at a shell script level, of finding out whether an ext3 fs
> has a sane journal, other than mounting it or running a full fsck ?

Define a "sane journal"? The journal just contains copies of disk blocks. It's nothing more than a list of "here's a copy of the new block number FOO of the filesystem." And the journal is *supposed* to contain gaps after an unexpected reboot --- it's by looking for missing bits that we work out just how much of the journal did get successfully written out to disk when things crashed.

In other words, the journal is really really dumb, and there's next to no validation you can sensibly do on its contents without invoking a lot of filesystem layout knowledge (and at that point you're into full fsck territory.)

> AFAICT, the usual way of handling ext3 filesystems seems to be to mark them with fs_passno=0,
> so they never get fscked from the init scripts - but the journal gets replayed, and a few things
> get checked at mount time.

No, you should give them a valid pass number to force fsck to run, but when fsck sees an ext3 filesystem needing recovery, it skips the full check and just does the recovery stage.

You still want the fsck to run because in case of a filesystem error being detected at run time, the kernel can mark the partition as having an error, and the subsequent fsck can pick that up and force a full fsck to fix it. That mechanism fails if you set the pass number to zero.

You can disable forced fscks while preserving that error-recovery behaviour by leaving the passno intact but setting the fsck mount-count and check-intervals to zero with tune2fs.

Cheers, Stephen
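
The advice in both replies comes down to two settings: a non-zero fs_passno in /etc/fstab for every ext3 filesystem, and, if the periodic full checks are unwanted, zero mount-count and interval limits set with tune2fs. A rough way to audit the first and apply the second might look like this. The function name ext3_unchecked and the device /dev/hda1 are placeholders rather than anything from the thread.

```shell
#!/bin/sh
# Hypothetical helper (the name is not from the thread): reads fstab-format
# text on stdin and prints the mount point of every ext3 entry whose sixth
# field (fs_passno) is 0 or absent, i.e. entries the boot-time fsck skips.
ext3_unchecked() {
    awk '$1 !~ /^#/ && $3 == "ext3" && ($6 == "" || $6 == 0) { print $2 }'
}

# Usage sketch:
#
#   ext3_unchecked < /etc/fstab
#
# To keep the passno but turn off the periodic forced checks, set both
# limits to zero (placeholder device, run as root):
#
#   tune2fs -c 0 -i 0 /dev/hda1
```

With the passno left in place, e2fsck still runs at boot, replays the journal, and performs a full check only when an error has been flagged.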