thr3ads.net - Gluster users - [Gluster-users] Problem with glusterd locks on gluster 3.6.1 [Jun 2016]

Atin Mukherjee

2016-Jun-16 12:47 UTC

[Gluster-users] Problem with glusterd locks on gluster 3.6.1

On 06/16/2016 01:32 PM, B.K.Raghuram wrote:
> Thanks a lot Atin,
> 
> The problem is that we are using a forked version of 3.6.1 which has
> been modified to work with ZFS (for snapshots) but we do not have the
> resources to port that over to the later versions of gluster.
> 
> Would you know of anyone who would be willing to take this on?!

If you can cherry-pick the patches, apply them to your source, and
rebuild, I can point you to the patches, but you'd need to give me a
day, as I have some other items on my plate to finish first.

~Atin

>
> Regards,
> -Ram
> 
> On Thu, Jun 16, 2016 at 11:02 AM, Atin Mukherjee <amukherj at redhat.com> wrote:
> 
> 
> 
>     On 06/16/2016 10:49 AM, B.K.Raghuram wrote:
>     >
>     >
>     > On Wed, Jun 15, 2016 at 5:01 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>     >
>     >
>     >
>     >     On 06/15/2016 04:24 PM, B.K.Raghuram wrote:
>     >     > Hi,
>     >     >
>     >     > We're using gluster 3.6.1 and we periodically find that gluster
>     >     > commands fail saying that it could not get the lock on one of the
>     >     > brick machines. The logs on that machine then say something like:
>     >     >
>     >     > [2016-06-15 08:17:03.076119] E
>     >     > [glusterd-op-sm.c:3058:glusterd_op_ac_lock] 0-management: Unable to
>     >     > acquire lock for vol2
>     >
>     >     This is possible if concurrent volume operations are run. Do you
>     >     have any script that checks volume status at an interval from all
>     >     the nodes? If so, this is expected behavior.
>     >
>     >
>     > Yes, I do have a couple of scripts that check on volume and quota
>     > status. Given this, I do get an "Another transaction is in progress.."
>     > message, which is OK. The problem is that sometimes I get the volume
>     > lock held message, which never goes away. This sometimes results in
>     > glusterd consuming a lot of memory and CPU, and the problem can only be
>     > fixed with a reboot. The log files are huge, so I'm not sure if it's OK
>     > to attach them to an email.
> 
>     OK, so this is known. We have fixed lots of stale lock issues in the 3.7
>     branch, and some of them, if not all, were also backported to the 3.6
>     branch. The issue is that you are using 3.6.1, which is quite old. If you
>     can upgrade to the latest version of 3.7, or at worst of 3.6, I am
>     confident that this will go away.
> 
>     ~Atin
>     >
>     >     >
>     >     > After some time, glusterd then seems to give up and die.
>     >
>     >     Do you mean glusterd shuts down or segfaults? If so, I am more
>     >     interested in analyzing this part. Could you provide us the glusterd
>     >     log and cmd_history log file, along with the core (in case of SEGV),
>     >     from all the nodes for further analysis?
>     >
>     >
>     > There is no segfault. glusterd just shuts down. As I said above,
>     > sometimes this happens and sometimes it just continues to hog a lot of
>     > memory and CPU.
>     >
>     >
>     >     >
>     >     > Interestingly, I also find the following line at the beginning of
>     >     > etc-glusterfs-glusterd.vol.log and I don't know if this has any
>     >     > significance to the issue:
>     >     >
>     >     > [2016-06-14 06:48:57.282290] I
>     >     > [glusterd-store.c:2063:glusterd_restore_op_version] 0-management:
>     >     > Detected new install. Setting op-version to maximum : 30600
>     >     >
>     >
>     >
>     > What does this line signify?
> 
>
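
The thread notes that monitoring scripts polling volume and quota status from several nodes can collide and produce the "Another transaction is in progress" message. One way such a script might cope is to retry with a growing delay instead of failing outright. The following is only a sketch under assumptions: the function name, retry count, and delays are illustrative, and it does not address the stale-lock bug itself, which the thread attributes to fixes missing from 3.6.1.

```python
import subprocess
import time

# Message text quoted in the thread; matching on it is an assumption about
# what the CLI prints when another operation holds the cluster lock.
BUSY_MARKER = "Another transaction is in progress"


def run_with_retry(cmd, attempts=5, delay=2.0,
                   run=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying with a growing delay while glusterd reports busy.

    Returns the last result object. Any failure that does not carry the
    busy marker is returned immediately without retrying.
    """
    result = None
    for attempt in range(attempts):
        result = run(cmd, capture_output=True, text=True)
        output = (result.stdout or "") + (result.stderr or "")
        if result.returncode == 0 or BUSY_MARKER not in output:
            return result
        if attempt < attempts - 1:
            # Linear backoff: delay, 2*delay, 3*delay, ...
            sleep(delay * (attempt + 1))
    return result
```

A polling script could then call `run_with_retry(["gluster", "volume", "status", "vol2"])`, ideally from a single node rather than from every node at once, so that the checks do not compete for the same lock.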

Atin Mukherjee

2016-Jun-17 07:05 UTC

[Gluster-users] Problem with glusterd locks on gluster 3.6.1

On 06/16/2016 06:17 PM, Atin Mukherjee wrote:
>
> 
> On 06/16/2016 01:32 PM, B.K.Raghuram wrote:
>> Thanks a lot Atin,
>>
>> The problem is that we are using a forked version of 3.6.1 which has
>> been modified to work with ZFS (for snapshots) but we do not have the
>> resources to port that over to the later versions of gluster.
>>
>> Would you know of anyone who would be willing to take this on?!
> 
> If you can cherry pick the patches and apply them on your source and
> rebuild it, I can point the patches to you, but you'd need to give a
> day's time to me as I have some other items to finish from my plate.

Here is the list of patches that need to be applied, in the following order:

http://review.gluster.org/9328
http://review.gluster.org/9393
http://review.gluster.org/10023
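
Because the patches must go in a fixed order, a small helper that stops at the first failed cherry-pick can keep a forked tree from ending up with a later patch applied on top of a missing earlier one. This is a rough sketch, not a tested procedure for this fork: the helper name is made up, and the commit ids passed in are placeholders for whatever commits you fetch locally for the three reviews above (the message gives only the review URLs).

```python
import subprocess

# Review numbers from the message, in the order given there.
REVIEWS_IN_ORDER = ["9328", "9393", "10023"]


def cherry_pick_in_order(commits, run=subprocess.run):
    """Cherry-pick each commit in sequence, stopping at the first failure.

    `commits` is a list of locally available commit ids, one per review,
    already sorted in the required order (hypothetical input).
    Returns the list of commits that applied cleanly.
    """
    applied = []
    for commit in commits:
        # -x records the original commit id in the new commit message.
        result = run(["git", "cherry-pick", "-x", commit])
        if result.returncode != 0:
            # Leave the conflict for manual resolution; do not continue,
            # since later patches are expected to build on earlier ones.
            break
        applied.append(commit)
    return applied
```

If the returned list is shorter than the input, the first commit not in it is the one that needs manual conflict resolution before the remaining patches are attempted.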
