Atin Mukherjee
2016-Jun-17 08:47 UTC
[Gluster-users] Problem with glusterd locks on gluster 3.6.1
On 06/17/2016 12:44 PM, B.K.Raghuram wrote:
> Thanks Atin.. I'm not familiar with pulling patches from the review system
> but will try :)

It's not that difficult. Open the Gerrit review link, go to the download
drop-down at the top right corner, click on it and you will see a
cherry-pick option. Copy that command and paste it into the source code repo
you host. If there are no merge conflicts, it should apply automatically;
otherwise you'd need to fix them manually.

HTH.
Atin

>
> On Fri, Jun 17, 2016 at 12:35 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>
> On 06/16/2016 06:17 PM, Atin Mukherjee wrote:
> >
> > On 06/16/2016 01:32 PM, B.K.Raghuram wrote:
> >> Thanks a lot Atin,
> >>
> >> The problem is that we are using a forked version of 3.6.1 which has
> >> been modified to work with ZFS (for snapshots) but we do not have the
> >> resources to port that over to the later versions of gluster.
> >>
> >> Would you know of anyone who would be willing to take this on?!
> >
> > If you can cherry pick the patches and apply them on your source and
> > rebuild it, I can point the patches to you, but you'd need to give a
> > day's time to me as I have some other items to finish from my plate.
>
> Here is the list of the patches that need to be applied, in the following
> order:
>
> http://review.gluster.org/9328
> http://review.gluster.org/9393
> http://review.gluster.org/10023
>
> >
> > ~Atin
> >>
> >> Regards,
> >> -Ram
> >>
> >> On Thu, Jun 16, 2016 at 11:02 AM, Atin Mukherjee <amukherj at redhat.com> wrote:
> >>
> >> On 06/16/2016 10:49 AM, B.K.Raghuram wrote:
> >> >
> >> > On Wed, Jun 15, 2016 at 5:01 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> >> >
> >> > On 06/15/2016 04:24 PM, B.K.Raghuram wrote:
> >> > > Hi,
> >> > >
> >> > > We're using gluster 3.6.1 and we periodically find that gluster commands
> >> > > fail saying that it could not get the lock on one of the brick machines.
> >> > > The logs on that machine then say something like:
> >> > >
> >> > > [2016-06-15 08:17:03.076119] E
> >> > > [glusterd-op-sm.c:3058:glusterd_op_ac_lock] 0-management: Unable to
> >> > > acquire lock for vol2
> >> >
> >> > This is a possible case if concurrent volume operations are run. Do you
> >> > have any script which checks for volume status on an interval from all
> >> > the nodes? If so, then this is expected behavior.
> >> >
> >> >
> >> > Yes, I do have a couple of scripts that check on volume and quota
> >> > status.. Given this, I do get a "Another transaction is in progress.."
> >> > message which is ok. The problem is that sometimes I get the volume lock
> >> > held message which never goes away. This sometimes results in glusterd
> >> > consuming a lot of memory and CPU and the problem can only be fixed with
> >> > a reboot. The log files are huge so I'm not sure if it's ok to attach
> >> > them to an email.
> >>
> >> Ok, so this is known. We have fixed lots of stale lock issues in 3.7
> >> branch and some of them if not all were also backported to 3.6 branch.
> >> The issue is you are using 3.6.1 which is quite old. If you can upgrade
> >> to latest versions of 3.7 or at worst of 3.6 I am confident that this
> >> will go away.
> >>
> >> ~Atin
> >> >
> >> > >
> >> > > After some time, glusterd then seems to give up and die..
> >> >
> >> > Do you mean glusterd shuts down or segfaults? If so, I am more interested
> >> > in analyzing this part. Could you provide us the glusterd log,
> >> > cmd_history log file along with core (in case of SEGV) from all the
> >> > nodes for further analysis?
> >> >
> >> >
> >> > There is no segfault. glusterd just shuts down. As I said above,
> >> > sometimes this happens and sometimes it just continues to hog a lot of
> >> > memory and CPU..
> >> >
> >> > >
> >> > > Interestingly, I also find the following line in the beginning of
> >> > > etc-glusterfs-glusterd.vol.log and I don't know if this has any
> >> > > significance to the issue:
> >> > >
> >> > > [2016-06-14 06:48:57.282290] I
> >> > > [glusterd-store.c:2063:glusterd_restore_op_version] 0-management:
> >> > > Detected new install. Setting op-version to maximum : 30600
> >> > >
> >> >
> >> > What does this line signify?
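
For readers following the cherry-pick steps in Atin's reply at the top of this message, here is a minimal Python sketch of what the Gerrit download box produces. It relies on Gerrit's standard ref layout, refs/changes/<last two digits of the change>/<change>/<patch set>. The remote name "gerrit" and the patch-set numbers are hypothetical placeholders, not values from this thread; take the real ones from each review page. The script only prints the commands, it does not run them.

```python
import shlex

# The three changes from Atin's list, in the order they must be applied.
CHANGES = [9328, 9393, 10023]


def gerrit_ref(change, patchset):
    """Return the Gerrit ref for one patch set of a change.

    Gerrit stores each patch set under
    refs/changes/<last two digits of change>/<change>/<patchset>.
    """
    return f"refs/changes/{change % 100:02d}/{change}/{patchset}"


def cherry_pick_commands(remote, patchsets):
    """Build one 'fetch + cherry-pick' shell line per change, in CHANGES order."""
    commands = []
    for change in CHANGES:
        ref = gerrit_ref(change, patchsets[change])
        commands.append(
            f"git fetch {shlex.quote(remote)} {ref} && git cherry-pick FETCH_HEAD"
        )
    return commands


if __name__ == "__main__":
    # ASSUMPTIONS: "gerrit" is a git remote you have configured for the review
    # server, and the patch-set numbers below are placeholders only.
    example_patchsets = {9328: 1, 9393: 1, 10023: 1}
    for line in cherry_pick_commands("gerrit", example_patchsets):
        print(line)
```

Run the printed lines one at a time inside the forked source tree, and resolve any conflict before moving on to the next change, since the order matters.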
B.K.Raghuram
2016-Jun-17 09:25 UTC
[Gluster-users] Problem with glusterd locks on gluster 3.6.1
Thanks Atin, I had three merge conflicts in the third patch.. I've attached
the files with the conflicts. Would any of the intervening commits be
needed as well?
The conflicts were in:
both modified: libglusterfs/src/mem-types.h
both modified: xlators/mgmt/glusterd/src/glusterd-utils.c
both modified: xlators/mgmt/glusterd/src/glusterd-utils.h
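
The "both modified:" lines above are what `git status` prints for files left unmerged by a cherry-pick. As a rough aid for working through them, the following Python sketch pulls those paths out of saved `git status` text and counts the conflict hunks in a file by its `<<<<<<<` markers. Both helper names are made up for this illustration and are not part of git or gluster.

```python
def conflicted_paths(status_output):
    """Return the paths that `git status` text reports as 'both modified:'."""
    marker = "both modified:"
    paths = []
    for line in status_output.splitlines():
        stripped = line.strip()
        if stripped.startswith(marker):
            paths.append(stripped[len(marker):].strip())
    return paths


def count_conflict_hunks(file_text):
    """Count conflict hunks by counting lines that open with '<<<<<<<'."""
    return sum(1 for line in file_text.splitlines() if line.startswith("<<<<<<<"))


if __name__ == "__main__":
    status = """
        both modified:   libglusterfs/src/mem-types.h
        both modified:   xlators/mgmt/glusterd/src/glusterd-utils.c
        both modified:   xlators/mgmt/glusterd/src/glusterd-utils.h
    """
    for path in conflicted_paths(status):
        print(path)
```

Each reported file has to be edited until no conflict markers remain, then staged with `git add` before running `git cherry-pick --continue`.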
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mem-types.h
Type: text/x-chdr
Size: 6082 bytes
Desc: not available
URL:
<http://www.gluster.org/pipermail/gluster-users/attachments/20160617/72c7f905/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glusterd-utils.c
Type: text/x-csrc
Size: 472427 bytes
Desc: not available
URL:
<http://www.gluster.org/pipermail/gluster-users/attachments/20160617/72c7f905/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glusterd-utils.h
Type: text/x-chdr
Size: 28526 bytes
Desc: not available
URL:
<http://www.gluster.org/pipermail/gluster-users/attachments/20160617/72c7f905/attachment-0005.bin>