Osborne, Paul (paul.osborne@canterbury.ac.uk)
2015-Aug-03 14:01 UTC
[Gluster-users] Locking failed - since upgrade to 3.6.4
Hi,

Last week I upgraded one of my gluster clusters (3 hosts with bricks as replica 3) to 3.6.4 from 3.5.4 and all seemed well. Today I am getting reports that locking has failed:

gfse-cant-01:/var/log/glusterfs# gluster volume status
Locking failed on gfse-rh-01.core.canterbury.ac.uk. Please check log file for details.
Locking failed on gfse-isr-01.core.canterbury.ac.uk. Please check log file for details.

Logs:
[2015-08-03 13:45:29.974560] E [glusterd-syncop.c:1640:gd_sync_task_begin] 0-management: Locking Peers Failed.
[2015-08-03 13:49:48.273159] E [glusterd-syncop.c:105:gd_collate_errors] 0-: Locking failed on gfse-rh-01.core.canterbury.ac.uk. Please check log file for details.
[2015-08-03 13:49:48.273778] E [glusterd-syncop.c:105:gd_collate_errors] 0-: Locking failed on gfse-isr-01.core.canterbury.ac.uk. Please check log file for details.

I am wondering whether this is new behaviour in 3.6.4 or something that has gone wrong. Restarting gluster entirely resolves the issue (by the way, the restart script does not actually appear to kill the processes...), but it recurs a few minutes later, which is rather suboptimal for a running service.

Googling suggests that simultaneous actions can cause a locking issue. I know that I have nagios running volume status <volname> for each of my volumes on each host every few minutes; however, this is not new and has been in place for the last 8-9 months against 3.5 without issue, so I would hope this is not the cause. I am not sure where to look next, to be honest.

Paul Osborne
Senior Systems Engineer
Canterbury Christ Church University
Tel: 01227 782751
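For context on why concurrent monitoring matters here: glusterd takes a cluster-wide lock for every volume status, so checks firing at the same instant on several peers can contend. One low-risk mitigation is to stagger the per-host checks. A sketch of a wrapper for the nagios check follows; the hostname-hash scheme and the 60-second window are purely illustrative assumptions, not anything prescribed by gluster:

```shell
#!/bin/sh
# Illustrative wrapper for the nagios check: derive a stable per-host
# offset (0-59 seconds) from the hostname and sleep that long before
# running the check, so peers are unlikely to issue "gluster volume
# status" at the same instant. The cksum-based hash and 60s window are
# assumptions for this sketch.
offset=$(uname -n | cksum | awk '{ print $1 % 60 }')
echo "staggering check by ${offset}s"
# sleep "$offset"
# exec gluster volume status "$1"
```

The offset is deterministic per host, so each peer always lands in the same slot rather than drifting randomly.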
Atin Mukherjee
2015-Aug-03 14:22 UTC
[Gluster-users] Locking failed - since upgrade to 3.6.4
Could you check the glusterd log on the other nodes? That would give you a hint about the exact issue. Also, looking at .cmd_log_history will give you the interval at which the volume status commands are executed. If the gap is in milliseconds then you are bound to hit this, and it is expected.

-Atin

Sent from one plus one

On Aug 3, 2015 7:32 PM, "Osborne, Paul (paul.osborne at canterbury.ac.uk)" <paul.osborne at canterbury.ac.uk> wrote:
> Last week I upgraded one of my gluster clusters (3 hosts with bricks as
> replica 3) to 3.6.4 from 3.5.4 and all seemed well.
> [...]
> I know that I have nagios running volume status <volname> for each of my
> volumes on each host every few minutes however this is not new [...]
> I am not sure where to look now tbh.
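The interval check Atin describes can be scripted. A rough sketch, assuming the usual /var/log/glusterfs/.cmd_log_history location and a [YYYY-MM-DD hh:mm:ss.ffffff] timestamp prefix on each entry (both assumptions; day rollovers are ignored for simplicity):

```shell
#!/bin/sh
# Sketch: print the gap in seconds between successive "volume status"
# entries in .cmd_log_history. If gaps come out in the sub-second range,
# the monitoring checks are contending for glusterd's cluster-wide lock.
cmd_gaps() {
    # keep only volume status entries, then diff successive timestamps;
    # only the hh:mm:ss fields are used, so day rollovers are ignored
    grep 'volume status' | awk -F'[][]' '{
        ts = $2                      # "2015-08-03 13:49:47.100000"
        gsub(/[-:]/, " ", ts)
        split(ts, t, /[ .]/)         # t[4]=hour t[5]=min t[6]=sec
        sec = t[4] * 3600 + t[5] * 60 + t[6]
        if (NR > 1) printf "gap: %ds\n", sec - prev
        prev = sec
    }'
}

# demo on sample lines; real use: cmd_gaps < /var/log/glusterfs/.cmd_log_history
cmd_gaps <<'EOF'
[2015-08-03 13:49:47.100000]  : volume status vol1 : SUCCESS
[2015-08-03 13:49:47.200000]  : volume status vol2 : FAILED : Locking failed
[2015-08-03 13:52:47.100000]  : volume status vol1 : SUCCESS
EOF
```

The sample output would show a 0-second gap between the first two entries (two volumes checked back to back) followed by the 3-minute monitoring interval.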