Hmm... that means the corrupted area was not touched by the running fs.
If it had been, the fs would have been remounted read-only with a message
asking the user to run fsck.
Remember, ocfs2 is a journaled fs. So there is no need to run fsck
on a regular basis. If a node dies, the surviving node recovers the dead
node (replays its journal). So the parallel with a local fs is just not
there. For example, even with ext3, when its fsck runs on boot, it merely
replays the journal. But yes, it has other triggers like number of mounts
and time since the last fsck that cause it to force a full fsck.
While the number of mounts does not work for us, we could implement
something with the time since the last fsck run, and force a fsck during
mount whenever the fs gets the opportunity to do so. Something to think
about.
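
To make the time-based idea concrete, here is a rough sketch of the decision
only. It is not how ocfs2 behaves today: the function name, its inputs and
the 180-day interval are all made up for illustration, and how the last check
time would be stored or how "mounted elsewhere" would be detected is left
open.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical policy value chosen for illustration; not an ocfs2 default.
MAX_CHECK_INTERVAL = timedelta(days=180)


def should_force_fsck(
    last_checked: Optional[datetime],
    now: datetime,
    mounted_elsewhere: bool,
    max_interval: timedelta = MAX_CHECK_INTERVAL,
) -> bool:
    """Decide whether a forced fsck should run before this mount.

    last_checked      -- time of the last completed fsck, or None if unknown
    mounted_elsewhere -- True if another node currently has the fs mounted
    """
    # A check can only run when no other node is using the volume, so the
    # "opportunity" exists only when this would be the first mounter.
    if mounted_elsewhere:
        return False
    # No record of a previous check: treat the check as overdue.
    if last_checked is None:
        return True
    # Otherwise force a check once the interval has elapsed.
    return now - last_checked >= max_interval
```

With this policy a node rejoining a running cluster never triggers a check;
only the first mounter after a full cluster shutdown would.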
Sunil
Henrik Carlqvist wrote:
> Thanks for a great file system!
>
> I have a two-node cluster working as a HA NFS server. This system has
> worked fine for almost a year, but recently I found that an ocfs2 file
> system had been corrupted and needed to be repaired with fsck.ocfs2.
>
> Even though I don't think that any data was lost I found that 4 of my 14
> ocfs2 partitions had some errors which were corrected by fsck.ocfs2. I
> then realized that these errors had probably been there for a rather
> long time, since a UPS broke down. I also found that on my system fsck
> is never run on the ocfs2 file systems at reboot unless some kind of
> manual work was done to run fsck.
>
> On my Slackware system fsck is supposed to be run at an early stage in the
> boot scripts, but this doesn't apply to the ocfs2 file systems as o2cb
> hasn't been started at that time. At a late stage in the boot process
> /etc/init.d/o2cb and /etc/init.d/ocfs2 are called from rc.local. However,
> the ocfs2 script mounts the file systems without checking them even
> if they haven't been cleanly unmounted.
>
> I realize that adding fsck.ocfs2 to the startup scripts might be a
> non-trivial task; there are a few scenarios that have to be handled:
>
> 1) A nice controlled reboot of one node while the other node is running
>
> 2) Nice controlled reboot of both nodes at the same time
>
> 3) One node crashing for some reason while the other node is running
>
> 4) Both nodes crashing and rebooting at the same time
>
> Would it be OK to run fsck.ocfs2 on both nodes after the call to o2cb but
> before the call to /etc/init.d/ocfs2?
>
> If one node is still running I suppose that o2cb will prevent fsck.ocfs2
> from running as a node has the file systems mounted?
>
> What if both nodes reboot at the same time? Will o2cb make sure that only
> one node runs fsck on each file system? If one node tries to mount a file
> system which the other node is still running fsck on, will this mounting
> be delayed or aborted by o2cb?
>
> Thanks in advance for any advice regarding fsck and ocfs2!
>
> regards Henrik
>