Atin Mukherjee
2016-Jun-17 07:05 UTC
[Gluster-users] Problem with glusterd locks on gluster 3.6.1
On 06/16/2016 06:17 PM, Atin Mukherjee wrote:
>
> On 06/16/2016 01:32 PM, B.K.Raghuram wrote:
>> Thanks a lot Atin,
>>
>> The problem is that we are using a forked version of 3.6.1 which has
>> been modified to work with ZFS (for snapshots), but we do not have the
>> resources to port that over to the later versions of gluster.
>>
>> Would you know of anyone who would be willing to take this on?!
>
> If you can cherry-pick the patches and apply them on your source and
> rebuild it, I can point the patches to you, but you'd need to give me a
> day's time as I have some other items to finish from my plate.

Here is the list of the patches; they need to be applied in the following order:

http://review.gluster.org/9328
http://review.gluster.org/9393
http://review.gluster.org/10023

> ~Atin
>>
>> Regards,
>> -Ram
>>
>> On Thu, Jun 16, 2016 at 11:02 AM, Atin Mukherjee <amukherj at redhat.com> wrote:
>>
>>     On 06/16/2016 10:49 AM, B.K.Raghuram wrote:
>>     >
>>     > On Wed, Jun 15, 2016 at 5:01 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>>     >
>>     >     On 06/15/2016 04:24 PM, B.K.Raghuram wrote:
>>     >     > Hi,
>>     >     >
>>     >     > We're using gluster 3.6.1 and we periodically find that gluster commands
>>     >     > fail, saying that it could not get the lock on one of the brick machines.
>>     >     > The logs on that machine then say something like:
>>     >     >
>>     >     > [2016-06-15 08:17:03.076119] E
>>     >     > [glusterd-op-sm.c:3058:glusterd_op_ac_lock] 0-management: Unable to
>>     >     > acquire lock for vol2
>>     >
>>     >     This is a possible case if concurrent volume operations are run. Do you
>>     >     have any script which checks volume status on an interval from all
>>     >     the nodes? If so, then this is expected behavior.
>>     >
>>     > Yes, I do have a couple of scripts that check on volume and quota
>>     > status. Given this, I do get an "Another transaction is in progress.."
>>     > message, which is ok. The problem is that sometimes I get the volume
>>     > lock held message which never goes away. This sometimes results in
>>     > glusterd consuming a lot of memory and CPU, and the problem can only
>>     > be fixed with a reboot. The log files are huge so I'm not sure if it's
>>     > ok to attach them to an email.
>>
>>     Ok, so this is known. We have fixed lots of stale lock issues in the 3.7
>>     branch, and some of them, if not all, were also backported to the 3.6
>>     branch. The issue is you are using 3.6.1, which is quite old. If you can
>>     upgrade to the latest version of 3.7, or at worst of 3.6, I am confident
>>     that this will go away.
>>
>>     ~Atin
>>     >
>>     >     > After some time, glusterd then seems to give up and die..
>>     >
>>     >     Do you mean glusterd shuts down or segfaults? If so, I am more
>>     >     interested in analyzing this part. Could you provide us the glusterd
>>     >     log and cmd_history log file, along with the core (in case of SEGV),
>>     >     from all the nodes for further analysis?
>>     >
>>     > There is no segfault. glusterd just shuts down. As I said above,
>>     > sometimes this happens and sometimes it just continues to hog a lot of
>>     > memory and CPU..
>>     >
>>     >     > Interestingly, I also find the following line in the beginning of
>>     >     > etc-glusterfs-glusterd.vol.log and I don't know if this has any
>>     >     > significance to the issue:
>>     >     >
>>     >     > [2016-06-14 06:48:57.282290] I
>>     >     > [glusterd-store.c:2063:glusterd_restore_op_version] 0-management:
>>     >     > Detected new install. Setting op-version to maximum : 30600
>>     >
>>     > What does this line signify?
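[Editor's note: for readers unfamiliar with pulling changes from the review system, the cherry-pick workflow Atin describes can be sketched as below. Gerrit (which review.gluster.org runs) publishes each patchset under `refs/changes/<last two digits of change number>/<change number>/<patchset number>`. The patchset numbers and the fetch URL in the usage comment are assumptions; check each review page for the patchset you actually want.]

```shell
#!/bin/sh
# Sketch: compute the Gerrit ref for a change and cherry-pick it onto a
# local glusterfs source tree. Patchset numbers are assumptions; confirm
# them on the review page before fetching.

# Gerrit layout: refs/changes/<last 2 digits of change>/<change>/<patchset>
gerrit_ref() {
    change=$1
    patchset=$2
    # ${change%??} drops the last two digits; stripping that prefix from
    # $change leaves just the last two digits (e.g. 9328 -> 28).
    printf 'refs/changes/%s/%s/%s\n' \
        "${change#"${change%??}"}" "$change" "$patchset"
}

# Usage (inside a glusterfs git clone; repeat for 9328, 9393, 10023 in order):
#   git fetch https://review.gluster.org/glusterfs "$(gerrit_ref 9328 1)" \
#       && git cherry-pick FETCH_HEAD
```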
B.K.Raghuram
2016-Jun-17 07:14 UTC
[Gluster-users] Problem with glusterd locks on gluster 3.6.1
Thanks Atin.. I'm not familiar with pulling patches from the review system but will try :)

On Fri, Jun 17, 2016 at 12:35 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>
> Here is the list of the patches; they need to be applied in the following order:
>
> http://review.gluster.org/9328
> http://review.gluster.org/9393
> http://review.gluster.org/10023
>
> ~Atin
[remainder of quoted thread trimmed; see previous message]
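[Editor's note: since the thread traces the lock contention to periodic status scripts racing each other, one mitigation (a sketch, not something proposed in the thread) is to serialize the gluster CLI calls on each node with flock(1). This only prevents jobs on the *same* node from colliding, not operations issued from different nodes, and the lock-file path is arbitrary.]

```shell
#!/bin/sh
# Sketch: serialize periodic gluster CLI probes on one node so cron jobs
# don't issue concurrent volume operations against glusterd. This does NOT
# serialize across nodes; it only stops same-node jobs from racing.
LOCKFILE=${LOCKFILE:-/var/run/gluster-cli.lock}   # path is an assumption

run_serialized() {
    # flock blocks until the lock file is free, then runs the given command;
    # it creates the lock file if it does not already exist.
    flock "$LOCKFILE" "$@"
}

# Cron entries would then call, e.g.:
#   run_serialized gluster volume status vol2
#   run_serialized gluster volume quota vol2 list
```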