B.K.Raghuram
2016-Jun-16 08:02 UTC
[Gluster-users] Problem with glusterd locks on gluster 3.6.1
Thanks a lot Atin,

The problem is that we are using a forked version of 3.6.1 which has been
modified to work with ZFS (for snapshots), but we do not have the resources
to port that over to the later versions of gluster.

Would you know of anyone who would be willing to take this on?!

Regards,
-Ram

On Thu, Jun 16, 2016 at 11:02 AM, Atin Mukherjee <amukherj at redhat.com> wrote:
>
> On 06/16/2016 10:49 AM, B.K.Raghuram wrote:
> >
> > On Wed, Jun 15, 2016 at 5:01 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> >
> > > On 06/15/2016 04:24 PM, B.K.Raghuram wrote:
> > > > Hi,
> > > >
> > > > We're using gluster 3.6.1 and we periodically find that gluster commands
> > > > fail, saying that it could not get the lock on one of the brick machines.
> > > > The logs on that machine then say something like:
> > > >
> > > > [2016-06-15 08:17:03.076119] E
> > > > [glusterd-op-sm.c:3058:glusterd_op_ac_lock] 0-management: Unable to
> > > > acquire lock for vol2
> > >
> > > This is a possible case if concurrent volume operations are run. Do you
> > > have any script which checks the volume status on an interval from all
> > > the nodes? If so, then this is expected behavior.
> >
> > Yes, I do have a couple of scripts that check on volume and quota
> > status. Given this, I do get an "Another transaction is in progress..."
> > message, which is ok. The problem is that sometimes I get the volume-lock
> > held message, which never goes away. This sometimes results in glusterd
> > consuming a lot of memory and CPU, and the problem can only be fixed with
> > a reboot. The log files are huge, so I'm not sure if it's ok to attach
> > them to an email.
>
> Ok, so this is known. We have fixed lots of stale lock issues in the 3.7
> branch, and some of them, if not all, were also backported to the 3.6
> branch. The issue is that you are using 3.6.1, which is quite old. If you
> can upgrade to the latest version of 3.7, or at worst of 3.6, I am
> confident that this will go away.
>
> ~Atin
>
> > > > After some time, glusterd then seems to give up and die.
> > >
> > > Do you mean glusterd shuts down or segfaults? If so, I am more interested
> > > in analyzing that part. Could you provide us the glusterd log and the
> > > cmd_history log file, along with the core (in case of SEGV), from all the
> > > nodes for further analysis?
> >
> > There is no segfault. glusterd just shuts down. As I said above,
> > sometimes this happens and sometimes it just continues to hog a lot of
> > memory and CPU.
> >
> > > > Interestingly, I also find the following line at the beginning of
> > > > etc-glusterfs-glusterd.vol.log and I don't know if it has any
> > > > significance to the issue:
> > > >
> > > > [2016-06-14 06:48:57.282290] I
> > > > [glusterd-store.c:2063:glusterd_restore_op_version] 0-management:
> > > > Detected new install. Setting op-version to maximum : 30600
> >
> > What does this line signify?
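Since periodic status scripts running from multiple nodes are what trigger the
lock contention, one mitigation is to serialize the gluster CLI calls on each
node and back off whenever glusterd reports that another transaction holds the
lock. Below is a minimal, untested sketch of such a wrapper; the lock file
path, retry counts, and volume name are illustrative assumptions, not anything
shipped with gluster:

    #!/bin/bash
    # Illustrative monitoring wrapper: serialize gluster CLI calls on this
    # node and retry briefly when another transaction holds the cluster lock.
    LOCKFILE=/var/run/gluster-monitor.lock

    run_gluster() {
        local tries=5 out
        while (( tries-- > 0 )); do
            if out=$(gluster "$@" 2>&1); then
                printf '%s\n' "$out"
                return 0
            fi
            # Back off and retry if another volume operation holds the lock.
            if grep -qi "another transaction is in progress" <<<"$out"; then
                sleep 10
                continue
            fi
            printf '%s\n' "$out" >&2
            return 1
        done
        return 1
    }

    # flock keeps overlapping cron runs on this node from racing each other.
    (
        flock -w 60 9 || exit 1
        run_gluster volume status
        run_gluster volume quota vol2 list
    ) 9> "$LOCKFILE"

Staggering the cron schedules across nodes, rather than running the checks at
the same minute everywhere, also reduces the chance of two peers competing for
the same cluster-wide lock.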
Atin Mukherjee
2016-Jun-16 12:47 UTC
[Gluster-users] Problem with glusterd locks on gluster 3.6.1
On 06/16/2016 01:32 PM, B.K.Raghuram wrote:
> Thanks a lot Atin,
>
> The problem is that we are using a forked version of 3.6.1 which has
> been modified to work with ZFS (for snapshots), but we do not have the
> resources to port that over to the later versions of gluster.
>
> Would you know of anyone who would be willing to take this on?!

If you can cherry-pick the patches, apply them to your source, and rebuild
it, I can point you to the patches, but you'd need to give me a day's time
as I have some other items on my plate to finish.

~Atin
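For reference, the cherry-pick-and-rebuild workflow Atin describes would look
roughly like the sketch below. The remote URL and branch name follow the
upstream glusterfs repository layout of the time; the commit IDs are
placeholders for whichever stale-lock fixes end up being identified, and the
build steps are the stock autotools sequence rather than a tested recipe for
the ZFS fork:

    # Rough, untested outline: apply selected upstream fixes on top of the
    # local 3.6.1-based fork and rebuild from source.
    cd glusterfs                       # the forked source tree
    git remote add upstream https://github.com/gluster/glusterfs.git
    git fetch upstream release-3.6
    # <commit-id-...> are placeholders for the stale-lock fixes to be applied.
    git cherry-pick <commit-id-1> <commit-id-2>

    ./autogen.sh
    ./configure
    make -j"$(nproc)"
    sudo make install

Conflicts are likely wherever the ZFS snapshot changes touch the same glusterd
code, so each pick may need manual resolution before the rebuild.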
Joe Julian
2016-Jun-17 11:49 UTC
[Gluster-users] Problem with glusterd locks on gluster 3.6.1
Have you offered those patches upstream?

On June 16, 2016 1:02:24 AM PDT, "B.K.Raghuram" <bkrram at gmail.com> wrote:
> Thanks a lot Atin,
>
> The problem is that we are using a forked version of 3.6.1 which has been
> modified to work with ZFS (for snapshots), but we do not have the resources
> to port that over to the later versions of gluster.
>
> Would you know of anyone who would be willing to take this on?!
>
> Regards,
> -Ram

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
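Offering the ZFS-snapshot work upstream would have gone through Gluster's
Gerrit instance (review.gluster.org) at the time. A hedged sketch of that
submission flow follows; it assumes the fork's commits are first rebased onto
a current upstream branch, and that the rfc.sh helper shipped in the glusterfs
source tree is used, so the exact steps should be checked against the current
developer documentation:

    # Approximate upstream-submission flow of that era; verify against the
    # glusterfs developer guide before using.
    git checkout -b zfs-snapshot-support upstream/master
    git cherry-pick <fork-commit-1> <fork-commit-2>   # placeholders
    # rfc.sh rebases the branch, adds a Gerrit Change-Id, and pushes the
    # change to review.gluster.org for review.
    ./rfc.sh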