B.K.Raghuram
2016-Jun-17 09:25 UTC
[Gluster-users] Problem with glusterd locks on gluster 3.6.1
Thanks Atin, I had three merge conflicts in the third patch.. I've
attached the files with the conflicts. Would any of the intervening
commits be needed as well?

The conflicts were in :

    both modified: libglusterfs/src/mem-types.h
    both modified: xlators/mgmt/glusterd/src/glusterd-utils.c
    both modified: xlators/mgmt/glusterd/src/glusterd-utils.h

On Fri, Jun 17, 2016 at 2:17 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> On 06/17/2016 12:44 PM, B.K.Raghuram wrote:
> > Thanks Atin.. I'm not familiar with pulling patches the review system
> > but will try:)
>
> It's not that difficult. Open the gerrit review link, go to the download
> drop box at the top right corner, click on it and then you will see a
> cherry pick option, copy that content and paste it the source code repo
> you host. If there are no merge conflicts, it should auto apply,
> otherwise you'd need to fix them manually.
>
> HTH.
> Atin
>
> > On Fri, Jun 17, 2016 at 12:35 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> > > On 06/16/2016 06:17 PM, Atin Mukherjee wrote:
> > > > On 06/16/2016 01:32 PM, B.K.Raghuram wrote:
> > > > > Thanks a lot Atin,
> > > > >
> > > > > The problem is that we are using a forked version of 3.6.1 which has
> > > > > been modified to work with ZFS (for snapshots) but we do not have the
> > > > > resources to port that over to the later versions of gluster.
> > > > >
> > > > > Would you know of anyone who would be willing to take this on?!
> > > >
> > > > If you can cherry pick the patches and apply them on your source and
> > > > rebuild it, I can point the patches to you, but you'd need to give a
> > > > day's time to me as I have some other items to finish from my plate.
> > >
> > > Here is the list of the patches need to be applied on the following
> > > order:
> > >
> > > http://review.gluster.org/9328
> > > http://review.gluster.org/9393
> > > http://review.gluster.org/10023
> > >
> > > > ~Atin
> > > >
> > > > > Regards,
> > > > > -Ram
> > > > >
> > > > > On Thu, Jun 16, 2016 at 11:02 AM, Atin Mukherjee <amukherj at redhat.com> wrote:
> > > > > > On 06/16/2016 10:49 AM, B.K.Raghuram wrote:
> > > > > > > On Wed, Jun 15, 2016 at 5:01 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> > > > > > > > On 06/15/2016 04:24 PM, B.K.Raghuram wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > We're using gluster 3.6.1 and we periodically find that gluster commands
> > > > > > > > > fail saying the it could not get the lock on one of the brick machines.
> > > > > > > > > The logs on that machine then say something like :
> > > > > > > > >
> > > > > > > > > [2016-06-15 08:17:03.076119] E
> > > > > > > > > [glusterd-op-sm.c:3058:glusterd_op_ac_lock] 0-management: Unable to
> > > > > > > > > acquire lock for vol2
> > > > > > > >
> > > > > > > > This is a possible case if concurrent volume operations are run. Do you
> > > > > > > > have any script which checks for volume status on an interval from all
> > > > > > > > the nodes, if so then this is an expected behavior.
> > > > > > >
> > > > > > > Yes, I do have a couple of scripts that check on volume and quota
> > > > > > > status.. Given this, I do get a "Another transaction is in progress.."
> > > > > > > message which is ok. The problem is that sometimes I get the volume lock
> > > > > > > held message which never goes away. This sometimes results in glusterd
> > > > > > > consuming a lot of memory and CPU and the problem can only be fixed with
> > > > > > > a reboot. The log files are huge so I'm not sure if its ok to attach
> > > > > > > them to an email.
> > > > > >
> > > > > > Ok, so this is known. We have fixed lots of stale lock issues in 3.7
> > > > > > branch and some of them if not all were also backported to 3.6 branch.
> > > > > > The issue is you are using 3.6.1 which is quite old. If you can upgrade
> > > > > > to latest versions of 3.7 or at worst of 3.6 I am confident that this
> > > > > > will go away.
> > > > > >
> > > > > > ~Atin
> > > > > >
> > > > > > > > > After sometime, glusterd then seems to give up and die..
> > > > > > > >
> > > > > > > > Do you mean glusterd shuts down or segfaults, if so I am more interested
> > > > > > > > in analyzing this part. Could you provide us the glusterd log,
> > > > > > > > cmd_history log file along with core (in case of SEGV) from all the
> > > > > > > > nodes for the further analysis?
> > > > > > >
> > > > > > > There is no segfault. glusterd just shuts down. As I said above,
> > > > > > > sometimes this happens and sometimes it just continues to hog a lot of
> > > > > > > memory and CPU..
> > > > > > >
> > > > > > > > > Interestingly, I also find the following line in the beginning of
> > > > > > > > > etc-glusterfs-glusterd.vol.log and I dont know if this has any
> > > > > > > > > significance to the issue :
> > > > > > > > >
> > > > > > > > > [2016-06-14 06:48:57.282290] I
> > > > > > > > > [glusterd-store.c:2063:glusterd_restore_op_version] 0-management:
> > > > > > > > > Detected new install. Setting op-version to maximum : 30600
> > > > > > >
> > > > > > > What does this line signify?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mem-types.h
Type: text/x-chdr
Size: 6082 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160617/72c7f905/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glusterd-utils.c
Type: text/x-csrc
Size: 472427 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160617/72c7f905/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glusterd-utils.h
Type: text/x-chdr
Size: 28526 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160617/72c7f905/attachment-0005.bin>
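
For anyone following along, the cherry-pick workflow discussed in this message can be sketched as a small script. This is only an illustration and not something posted in the thread: the thread gives just the change numbers (9328, 9393, 10023) and their order, so the patchset numbers, the fetch URL placeholder and the helper names below are all assumptions. Gerrit publishes each patchset under a ref of the form refs/changes/<last two digits of the change>/<change>/<patchset>, and the cherry-pick entry in the download box is a fetch of that ref followed by a cherry-pick of FETCH_HEAD.

```python
# Hypothetical helper: build Gerrit refs and print the git commands
# needed to apply the changes in the order given in the thread.

CHANGES_IN_ORDER = [9328, 9393, 10023]  # order listed by Atin

# Assumption: placeholder only. Use the fetch URL shown in the Gerrit
# download box for each review.
GERRIT_REMOTE = "<gerrit-fetch-url>"


def gerrit_ref(change, patchset):
    """Return the Gerrit ref for a change/patchset pair."""
    return "refs/changes/{:02d}/{}/{}".format(change % 100, change, patchset)


def cherry_pick_commands(changes, patchsets, remote=GERRIT_REMOTE):
    """Return the shell commands to fetch and cherry-pick each change in order."""
    commands = []
    for change in changes:
        ref = gerrit_ref(change, patchsets[change])
        commands.append(
            "git fetch {} {} && git cherry-pick FETCH_HEAD".format(remote, ref)
        )
    return commands


if __name__ == "__main__":
    # Assumption: patchset 1 for every change. Check each review page
    # for the patchset that was actually merged.
    assumed_patchsets = {change: 1 for change in CHANGES_IN_ORDER}
    for command in cherry_pick_commands(CHANGES_IN_ORDER, assumed_patchsets):
        print(command)
```

Printing the commands instead of running them keeps the sketch harmless: each one can be pasted into the forked source tree by hand, stopping at the first conflict.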
Atin Mukherjee
2016-Jun-17 09:37 UTC
[Gluster-users] Problem with glusterd locks on gluster 3.6.1
I've resolved the merge conflicts and files are attached. Copy these
files and follow the instructions from the cherry pick command which failed.

~Atin

On 06/17/2016 02:55 PM, B.K.Raghuram wrote:
> Thanks Atin, I had three merge conflicts in the third patch.. I've
> attached the files with the conflicts. Would any of the intervening
> commits be needed as well?
>
> The conflicts were in :
>
>     both modified: libglusterfs/src/mem-types.h
>     both modified: xlators/mgmt/glusterd/src/glusterd-utils.c
>     both modified: xlators/mgmt/glusterd/src/glusterd-utils.h

-------------- next part --------------
A non-text attachment was scrubbed...
Name: glusterd-utils.c
Type: text/x-csrc
Size: 471579 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160617/34c4f44e/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glusterd-utils.h
Type: text/x-csrc
Size: 28241 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160617/34c4f44e/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mem-types.h
Type: text/x-csrc
Size: 5892 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160617/34c4f44e/attachment-0005.bin>
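
The "both modified" entries quoted in this message are what git status reports for files left unmerged by a failed cherry-pick. As a rough sketch (not from the thread, and with made-up function names), the following lists such files from `git status --porcelain` output, where the status code `UU` marks a path modified on both sides. After the resolved files are copied into place, the usual continuation is `git add` on each path followed by `git cherry-pick --continue`, which is what git itself suggests when the cherry-pick stops.

```python
import subprocess
from typing import List


def both_modified(porcelain_output: str) -> List[str]:
    """Return paths that `git status --porcelain` marks as UU (both modified)."""
    paths = []
    for line in porcelain_output.splitlines():
        # Porcelain v1 lines look like "XY <path>"; "UU" means the path
        # is unmerged and was modified on both sides.
        if line.startswith("UU "):
            paths.append(line[3:])
    return paths


def conflicted_files(repo_dir: str = ".") -> List[str]:
    """Run git in repo_dir and list the files still in conflict."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return both_modified(result.stdout)
```

An empty list from `conflicted_files()` means no path is still marked as both modified; other kinds of unmerged entries (for example a file deleted on one side) are not covered by this sketch.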