thr3ads.net - Gluster users - [Gluster-users] How to check running transactions in gluster? [Nov 2018]


Jeevan Patnaik

2018-Nov-26 04:04 UTC

[Gluster-users] How to check running transactions in gluster?

Hi Atin,

Thanks for the details. I think the issue is with a few nodes that aren't
serving any bricks and are in the rejected state. When I remove them from the
pool and stop glusterfs on those nodes, everything seems normal.

We keep those nodes as spares but keep glusterd running on them, because in
our configuration the servers are also clients. We use gluster NFS without
failover for mounts, and to localize the impact if a node goes down, each
node uses localhost as its NFS server, i.e.:
mount -t nfs localhost:/volume /mountpoint

So glusterfs has to run on these spare nodes. Is it okay to keep those nodes
in the pool? Will they go into the rejected state again and cause transaction
locks? And why aren't they in sync even though they're part of the pool?

Regards,
Jeevan.
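To check which pool members are currently in the rejected state (for example the spare nodes described above), a small filter over `gluster peer status` output can help. This is a minimal sketch, assuming the usual `Hostname:` / `State:` line layout of that command's output; the function name `rejected_peers` is hypothetical.

```shell
#!/bin/sh
# Print the hostname of every peer whose state is "Peer Rejected".
# Reads `gluster peer status` output on stdin (layout assumed, see above).
rejected_peers() {
    awk '
        /^Hostname:/ { host = $2 }              # remember the current peer
        /^State: Peer Rejected/ { print host }  # report it if rejected
    '
}

# Typical use on any pool member (not run here):
#   gluster peer status | rejected_peers
```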

On Mon, Nov 26, 2018, 8:22 AM Atin Mukherjee <amukherj at redhat.com> wrote:
>
>
> On Mon, Nov 26, 2018 at 8:21 AM Atin Mukherjee <amukherj at redhat.com>
> wrote:
>
>>
>>
>> On Sun, Nov 25, 2018 at 8:40 PM Jeevan Patnaik <g1patnaik at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am getting the output "Another transaction is in progress" with a few
>>> gluster volume commands, including the stop command. And the gluster
>>> volume status command just hangs and fails with a timeout error.
>>>
>>
>> This is primarily because glusterd is not allowed to complete its
>> handshake with the other peers when glusterd services are restarted
>> concurrently (as I understood from your previous email to the list). With
>> GlusterD (read as GD1), this is a known challenge of its design: at
>> restart, it runs an N x N handshake to bring all the configuration data
>> into a consistent state, and what we've seen is that the overall recovery
>> time of the cluster can be very long when N is large (in your case N = 72,
>> which is certainly high). Hence the recommendation is not to restart the
>> glusterd services concurrently, but to restart them one at a time and wait
>> for the handshaking to complete.
>>
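The one-node-at-a-time restart recommended above could be scripted roughly as follows. This is a sketch under several assumptions: passwordless ssh to each node, glusterd managed by a systemd unit named `glusterd`, and "handshake complete" approximated as every peer reporting `Peer in Cluster (Connected)` in `gluster peer status`. The function names, the 10-second poll interval and the 60-attempt limit are hypothetical choices.

```shell
#!/bin/sh
# Succeeds only if stdin (output of `gluster peer status`) contains at least
# one State line and every State line reads "Peer in Cluster (Connected)".
# Empty output, e.g. while glusterd is still starting, counts as "not ready".
all_peers_connected() {
    awk '
        /^State:/ {
            seen = 1
            if ($0 != "State: Peer in Cluster (Connected)") bad = 1
        }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}

# Restart glusterd on one host at a time, waiting for that host to see all
# of its peers connected before touching the next one. Assumes passwordless
# ssh and a systemd unit named "glusterd" (hypothetical for your setup).
rolling_restart() {
    for host in "$@"; do
        ssh "$host" systemctl restart glusterd || return 1
        tries=0
        until ssh "$host" gluster peer status | all_peers_connected; do
            tries=$((tries + 1))
            if [ "$tries" -ge 60 ]; then
                echo "$host: peers still not connected, stopping" >&2
                return 1
            fi
            sleep 10   # give the N x N handshake time to progress
        done
    done
}
```

Usage (not run here): `rolling_restart node1 node2 node3`. Stopping on the first host that does not converge keeps a bad state from spreading to the rest of the pool.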
>
> Forgot to mention that GlusterD2 (https://github.com/gluster/glusterd2),
> which is in the development phase, addresses this design problem.
>
>
>>
>>> So, I want to find out which transaction is hung. How can I find this? I
>>> ran the volume statedump command, but didn't wait for it to complete to
>>> check whether it was hung or gave any result, as it was also taking time.
>>>
>>
>> kill -SIGUSR1 $(pidof glusterd) should get you a glusterd statedump file
>> in /var/run/gluster; the backtrace dump at the bottom of that file can help
>> you understand which transaction is currently in progress. If this
>> transaction stays queued for more than 180 seconds (which is not usual),
>> the unlock timer kicks out such locks.
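The statedump step above can be wrapped in a small helper that sends the signal and prints the newest dump file. This is a sketch only: the `glusterdump.*` file-name pattern and the two-second wait are assumptions, so check what actually appears in /var/run/gluster on your version; the function names are hypothetical.

```shell
#!/bin/sh
# Print the newest glusterd statedump in a directory (default /var/run/gluster).
# The "glusterdump.*" pattern is an assumption about the dump file name.
latest_statedump() {
    dump_dir=${1:-/var/run/gluster}
    # ls -t lists newest first; an unmatched glob just makes ls fail quietly.
    ls -t "$dump_dir"/glusterdump.* 2>/dev/null | head -n 1
}

# Ask glusterd for a statedump (same signal as `kill -SIGUSR1` above),
# then print the file it most likely just wrote.
take_glusterd_statedump() {
    pid=$(pidof glusterd) || return 1
    kill -s USR1 $pid || return 1   # unquoted in case pidof prints several PIDs
    sleep 2                         # hypothetical wait for the dump to be written
    latest_statedump
}
```

Run as root, then inspect the end of the printed file (for example with `tail`) for the in-progress transaction.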
>>
>>
>>> Please help me with this. I'm struggling with these gluster timeout
>>> errors :(
>>>
>>> Besides, I have also tuned the gluster option transport.listen-backlog
>>> to 200, and the following kernel parameters to avoid SYN overflow
>>> rejects:
>>> net.core.somaxconn = 1024
>>> net.ipv4.tcp_max_syn_backlog = 20480
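On Linux the kernel silently caps a listening socket's backlog at net.core.somaxconn, so, assuming transport.listen-backlog is used as the socket's listen() backlog, somaxconn needs to be at least as large (1024 vs. 200 here). Values set with `sysctl -w` are lost on reboot; a drop-in file keeps them. A minimal sketch, assuming a distribution that reads /etc/sysctl.d/; the file name `90-gluster-backlog.conf` and the function name are hypothetical.

```shell
#!/bin/sh
# Write the two kernel settings from this message to a sysctl drop-in so
# they persist across reboots. The default file name is hypothetical.
write_backlog_conf() {
    conf_file=${1:-/etc/sysctl.d/90-gluster-backlog.conf}
    cat > "$conf_file" <<'EOF'
# Max accept-queue length; listen() backlogs above this are truncated
net.core.somaxconn = 1024
# Max remembered connection requests not yet acknowledged by the client
net.ipv4.tcp_max_syn_backlog = 20480
EOF
}

# As root (not run here):
#   write_backlog_conf && sysctl -p /etc/sysctl.d/90-gluster-backlog.conf
```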
>>>
>>> Regards,
>>> Jeevan.
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>

Sanju Rakonde

2018-Nov-26 12:44 UTC


[Gluster-users] How to check running transactions in gluster?

Hi Jeevan,

You might be hitting https://bugzilla.redhat.com/show_bug.cgi?id=1635820

Were any of the volumes in the "Created" state when the peer reject issue
was seen?

Thanks,
Sanju
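To answer the question above on a given node, the volumes still in the "Created" state (defined but never started) can be pulled out of `gluster volume info`. A minimal sketch, assuming that command's `Volume Name:` / `Status:` line layout; `created_volumes` is a hypothetical name.

```shell
#!/bin/sh
# Print the names of volumes whose status is "Created".
# Reads `gluster volume info` output on stdin (layout assumed, see above).
created_volumes() {
    awk '
        /^Volume Name:/ { name = $3 }          # "Volume Name: <vol>"
        /^Status: Created/ { print name }      # never-started volume
    '
}

# Typical use (not run here):
#   gluster volume info | created_volumes
```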



