It is waiting for the heartbeat timeout to expire, which is what
declares the node dead. Only then does it initiate recovery. With the
default settings, this takes a little over a minute. This is with the
o2cb cluster stack.
Note that not all fs ops hang during the detection phase. The only
ops that hang are those that directly or indirectly require the
not-yet-declared-dead node to respond to some DLM message.
So the notion of ocfs2 being ready to resume operation is very fuzzy.
For example, a process doing non-extending O_DIRECT reads/writes to a
file will continue to operate during the detection phase. It will only
pause during the recovery phase, which typically takes a few seconds.
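
If you want a rough check from the service side, one option is to time
a small filesystem operation against the mount and treat a stall as
"not ready yet". The sketch below is only an illustration, not an ocfs2
interface: probe_fs and wait_until_responsive are hypothetical helper
names, and the choice of probe (creating and removing a scratch file)
is an assumption. As noted above, only some ops block, so a passing
probe tells you that this particular op went through, not that every
op will.

```python
import os
import tempfile
import threading
import time


def probe_fs(path, timeout_s=5.0):
    """Try one create/unlink in `path`; return "ok", "error" or "timeout".

    Hypothetical helper: the probe runs in a daemon thread so the caller
    can stop waiting even if the operation itself is still blocked.
    """
    result = {}

    def worker():
        try:
            # Create and remove a scratch file. On a cluster filesystem
            # this is a metadata operation that may need cluster locks.
            fd, name = tempfile.mkstemp(prefix=".probe-", dir=path)
            os.close(fd)
            os.unlink(name)
            result["status"] = "ok"
        except OSError as exc:
            result["status"] = "error"
            result["detail"] = str(exc)

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)
    if t.is_alive():
        # The operation has not returned yet; the thread stays blocked
        # in the background until the filesystem answers.
        return "timeout"
    return result["status"]


def wait_until_responsive(path, deadline_s=120.0, interval_s=2.0):
    """Poll probe_fs until it returns "ok" or the deadline passes."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if probe_fs(path) == "ok":
            return True
        time.sleep(interval_s)
    return False
```

You would call wait_until_responsive on a directory inside the mount
before restarting the service, and pick the deadline to cover both the
detection and the recovery phases for your settings.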
From this it appears you are doing node death detection of your own.
What are you using for that?
Sunil
> Folks,
>
> When a node crashes, OCFS2 tends to hang for a while fencing the node
> off the cluster. Is there a way to test whether OCFS2 is ready to
> resume operations? I am trying to restart a service on a different box
> and I keep getting errors, and I would like to distinguish the
> filesystem not being ready from the inability to restart the service
> on a different box.
>
> thank you,
> Enrique Sanchez.