Hi Andy,
As your email has not received a specific response from someone with
better knowledge of NFS and OCFS locking, I will provide my 2 cents.
We started using OCFS2 probably 5 years ago, running Ubuntu on the 2.6
kernel. When we upgraded to Ubuntu 10.04, OCFS2 became more unstable,
causing locking issues and crashes similar to those you have mentioned.
Since then OCFS2 has come a long way; with newer kernels we are seeing
fewer and fewer issues.
I would recommend checking your kernel versions and upgrading if that
is possible. We are currently running 3.13 on 2 Samba file servers
with no issues, and 3.5 on 4 VM nodes, which isolate as expected.
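
If it helps, here is a minimal sketch for checking the running kernel on
each node. The helper names (parse_kernel, kernel_at_least) are made up
for illustration, and the 3.13 default is simply the version we run on
our file servers, not an official OCFS2 threshold.

```python
import platform
import re


def parse_kernel(release):
    """Return (major, minor) from a release string like '3.13.0-24-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(release, minimum=(3, 13)):
    """True if the release is at or above the (assumed) minimum version."""
    return parse_kernel(release) >= minimum


if __name__ == "__main__":
    # platform.release() reports the same string as `uname -r`.
    release = platform.release()
    status = "ok" if kernel_at_least(release) else "older than 3.13"
    print(release, status)
```

Running it on every node gives a quick view of which ones are still on
an older kernel.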
Let us know if you require any further assistance.
Kind regards,
Marty Sweet
On 6 May 2014 17:27, Andy <aryan1 at allantgroup.com> wrote:
> We have been using OCFS2 for a couple years, and have had a number of
> issues pop up. Some of them seem resolved, but we are still concerned
> because the systems still seem a bit fragile.
>
> Several times we have had various OCFS2 volumes become unresponsive or
> slow. We have also run into the "wants too many credits" error a few
> times, which seems to have been fixed by increasing the journal size on
> the volume causing the issue (at 256MB, I might have made the journals
> bigger than they really need to be, but I want to avoid the credits
> problem). The slowness/unresponsiveness issues seem to have been
> solved by increasing the cluster size (especially on largish volumes).
> But there are still a few concerns.
>
> The major concern is that when a volume becomes unresponsive, it causes
> a cascade effect where servers that simply have that volume NFS mounted,
> but are not using it, will have problems because commands like df will
> hang on that volume. I know that the NFS server is trying to return the
> current free space for the volume, but cannot get it because the volume
> is unresponsive. However, I think it would be better if a cached
> version of the free space could be returned instead when the volume is
> unresponsive.
>
> When a server does hang a volume (probably locks), what is the best
> procedure to find the server that is causing the issue and the root
> cause of the problem? I have the scanlocks scripts, and have gotten
> better at determining which server is the problem and, to some extent,
> the program or directory, but to me it still is not an exact science.
> Are there any suggestions about the best way to do this? Ideally, it
> would be nice if I could get the systems to detect this on their own and
> either fence themselves or reboot.
>
> Any help would be appreciated.
>
> Thanks,
>
> Andy
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-users