Hi,
It looks like using the steps mentioned here
https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Administrator%20Guide/Resolving%20Peer%20Rejected/
is the reason you are facing so many problems. You said that the vol files
differ across the nodes, which should never happen, because it is the
purpose of shd to heal the vol files in case of any differences between
them. But deleting the files several times, restarting the glusterd
service and then having the nodes disconnect has led to a state where
that task is left incomplete. Whenever you try to start the volume or get
its status again, the heal runs but cannot complete, because it cannot
find a proper head from which it could do the heal.
Before the node was rejected and you applied those steps to rectify it,
did everything work fine?
Also, can you share the sos-reports or, if not, the vol files that you see
as different, so that I can be sure of that?
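In case it helps to narrow down which vol files differ, here is a rough
sketch rather than an official procedure. It assumes you have first copied
/var/lib/glusterd/vols/<volname> from two nodes into two local directories;
the function name volfile_diff and both directory arguments are made up for
illustration. It only walks the *.vol files present in the first directory.

```shell
#!/bin/sh
# Sketch only: report *.vol files that differ between two copies of a
# volume directory (e.g. copies of /var/lib/glusterd/vols/tmp taken from
# two nodes). volfile_diff and its arguments are hypothetical names.
volfile_diff() {
    dir_a=$1
    dir_b=$2
    for f in "$dir_a"/*.vol; do
        # Skip the literal pattern when the directory has no *.vol files.
        [ -e "$f" ] || continue
        name=$(basename "$f")
        if [ ! -e "$dir_b/$name" ]; then
            echo "missing in $dir_b: $name"
        elif ! cmp -s "$f" "$dir_b/$name"; then
            # cmp -s is silent and returns non-zero when contents differ.
            echo "differs: $name"
        fi
    done
}
```

For example, volfile_diff ./node1/tmp ./node2/tmp would print one line per
vol file that is missing from, or different in, the second copy. Files that
exist only in the second directory are not reported, so it may be worth
running it in both directions.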
Regards
Nikhil Ladha
On Wed, Apr 29, 2020 at 12:54 PM <nico at furyweb.fr> wrote:
> I made another test: I restarted glusterd on all 3 nodes, and right after
> the restart I can get a partial volume status, but "Locking failed" occurs
> a few seconds later.
>
> Example of output on node 2 and 3:
> root@glusterDevVM2:~# systemctl restart glusterd
> root@glusterDevVM2:~# gluster volume status tmp
> Status of volume: tmp
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick glusterDevVM2:/bricks/tmp/brick1/data N/A       N/A        N       N/A
> Self-heal Daemon on localhost               N/A       N/A        N       N/A
>
> Task Status of Volume tmp
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> root@glusterDevVM2:~# gluster volume status tmp
> Locking failed on glusterDevVM1. Please check log file for details.
> root@glusterDevVM2:~# gluster volume status tmp
> Status of volume: tmp
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick glusterDevVM1:/bricks/tmp/brick1/data 49215     0          Y       5335
> Brick glusterDevVM2:/bricks/tmp/brick1/data 49215     0          Y       5239
> Self-heal Daemon on localhost               N/A       N/A        N       N/A
> Self-heal Daemon on glusterDevVM1           N/A       N/A        N       N/A
>
> Task Status of Volume tmp
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> root@glusterDevVM2:~# gluster volume status tmp
> Locking failed on glusterDevVM3. Please check log file for details.
> Locking failed on glusterDevVM1. Please check log file for details.
>
>
> root@glusterDevVM3:~# systemctl restart glusterd
> root@glusterDevVM3:~# gluster volume status tmp
> Status of volume: tmp
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick glusterDevVM1:/bricks/tmp/brick1/data 49215     0          Y       5335
> Brick glusterDevVM2:/bricks/tmp/brick1/data 49215     0          Y       5239
> Brick glusterDevVM3:/bricks/tmp/brick1/data 49215     0          Y       3693
> Self-heal Daemon on localhost               N/A       N/A        N       N/A
> Self-heal Daemon on glusterDevVM2           N/A       N/A        N       N/A
> Self-heal Daemon on glusterDevVM1           N/A       N/A        Y       102850
>
> Task Status of Volume tmp
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> root@glusterDevVM3:~# gluster volume status tmp
> Locking failed on glusterDevVM2. Please check log file for details.
> root@glusterDevVM3:~# gluster volume status tmp
> Locking failed on glusterDevVM1. Please check log file for details.
> Locking failed on glusterDevVM2. Please check log file for details.
> root@glusterDevVM3:~# systemctl restart glusterd
> root@glusterDevVM3:~# gluster volume status tmp
> Status of volume: tmp
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick glusterDevVM3:/bricks/tmp/brick1/data N/A       N/A        N       N/A
> Self-heal Daemon on localhost               N/A       N/A        N       N/A
>
> Task Status of Volume tmp
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> root@glusterDevVM3:~# gluster volume status tmp
> Another transaction is in progress for tmp. Please try again after some time.
> root@glusterDevVM3:~# gluster volume status tmp
> Another transaction is in progress for tmp. Please try again after some time.
> root@glusterDevVM3:~# gluster volume status tmp
> Another transaction is in progress for tmp. Please try again after some time.
>
>
> ------------------------------
> *From: *"Nikhil Ladha" <nladha at redhat.com>
> *To: *nico at furyweb.fr
> *Cc: *"gluster-users" <gluster-users at gluster.org>
> *Sent: *Tuesday, 28 April 2020 14:17:46
> *Subject: *Re: [Gluster-users] never ending logging
>
> Hi,
> The log you shared reports a syntax error, so there must be some mistake
> in what you are passing as an argument, or a spelling mistake. Otherwise,
> how could it run on one of them and not on another with the same
> configuration?
> Also, can you please share the complete log file?
> And try restarting all the nodes in the trusted storage pool (TSP), then
> execute the commands.
>
> Regards
> Nikhil Ladha
>
> On Tue, Apr 28, 2020 at 5:20 PM <nico at furyweb.fr> wrote:
>
>> Hi.
>>
>> It did not really work well. I restarted node 2 at least a dozen times
>> until almost all bricks came online, but the Rejected state disappeared
>> after applying the fix.
>> I'm not able to create a volume, as all gluster commands return the
>> "Another transaction is in progress" error.
>> All pings are below 0.5 ms.
>>
>> I noticed another error in the brick logs for a failed brick:
>> [2020-04-28 10:58:59.009933] E [MSGID: 101021] [graph.y:364:graphyyerror]
>> 0-parser: syntax error: line 140 (volume 'data_export-server'): "!SSLv2"
>> allowed tokens are 'volume', 'type', 'subvolumes', 'option',
>> 'end-volume'()
>>
>> root@glusterDevVM2:/var/lib/glusterd/vols/data_export# grep -n SSLv2 *
>> data_export.gfproxyd.vol:8: option transport.socket.ssl-cipher-list HIGH:\!SSLv2
>> data_export.gfproxyd.vol:26: option transport.socket.ssl-cipher-list HIGH:\!SSLv2
>> data_export.gfproxyd.vol:44: option transport.socket.ssl-cipher-list HIGH:\!SSLv2
>> data_export.glusterDevVM1.bricks-data_export-brick1-data.vol:140: option transport.socket.ssl-cipher-list HIGH:\!SSLv2
>> data_export.glusterDevVM2.bricks-data_export-brick1-data.vol:140: option transport.socket.ssl-cipher-list HIGH:\!SSLv2
>> data_export.glusterDevVM3.bricks-data_export-brick1-data.vol:145: option transport.socket.ssl-cipher-list HIGH:\!SSLv2
>> data_export-shd.vol:7: option transport.socket.ssl-cipher-list HIGH:\!SSLv2
>> data_export-shd.vol:24: option transport.socket.ssl-cipher-list HIGH:\!SSLv2
>> data_export-shd.vol:41: option transport.socket.ssl-cipher-list HIGH:\!SSLv2
>> data_export.tcp-fuse.vol:8: option transport.socket.ssl-cipher-list HIGH:\!SSLv2
>> data_export.tcp-fuse.vol:24: option transport.socket.ssl-cipher-list HIGH:\!SSLv2
>> data_export.tcp-fuse.vol:40: option transport.socket.ssl-cipher-list HIGH:\!SSLv2
>> info:22:ssl.cipher-list=HIGH:\!SSLv2
>> trusted-data_export.tcp-fuse.vol:8: option transport.socket.ssl-cipher-list HIGH:\!SSLv2
>> trusted-data_export.tcp-fuse.vol:26: option transport.socket.ssl-cipher-list HIGH:\!SSLv2
>> trusted-data_export.tcp-fuse.vol:44: option transport.socket.ssl-cipher-list HIGH:\!SSLv2
>> trusted-data_export.tcp-gfproxy-fuse.vol:8: option transport.socket.ssl-cipher-list HIGH:\!SSLv2
>>
>> Another volume with the same parameters doesn't show this error:
>> root@glusterDevVM2:/var/lib/glusterd/vols/userfiles# grep '2020-04-28 10:5[89]:' /var/log/glusterfs/bricks/bricks-userfiles-brick1-data.log
>> [2020-04-28 10:58:53.427441] I [MSGID: 100030] [glusterfsd.c:2867:main]
>> 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 7.5
>> (args: /usr/sbin/glusterfsd -s glusterDevVM2 --volfile-id
>> userfiles.glusterDevVM2.bricks-userfiles-brick1-data -p
>> /var/run/gluster/vols/userfiles/glusterDevVM2-bricks-userfiles-brick1-data.pid
>> -S /var/run/gluster/072c6be1df6e31e4.socket --brick-name
>> /bricks/userfiles/brick1/data -l
>> /var/log/glusterfs/bricks/bricks-userfiles-brick1-data.log --xlator-option
>> *-posix.glusterd-uuid=7f6c3023-144b-4db2-9063-d90926dbdd18 --process-name
>> brick --brick-port 49216 --xlator-option userfiles-server.listen-port=49216)
>> [2020-04-28 10:58:53.428426] I [glusterfsd.c:2594:daemonize] 0-glusterfs:
>> Pid of current running process is 5184
>> [2020-04-28 10:58:53.432337] I
>> [socket.c:4350:ssl_setup_connection_params] 0-socket.glusterfsd: SSL
>> support for glusterd is ENABLED
>> [2020-04-28 10:58:53.436982] I
>> [socket.c:4360:ssl_setup_connection_params] 0-socket.glusterfsd: using
>> certificate depth 1
>> [2020-04-28 10:58:53.437873] I [socket.c:958:__socket_server_bind]
>> 0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 9
>> [2020-04-28 10:58:53.438830] I
>> [socket.c:4347:ssl_setup_connection_params] 0-glusterfs: SSL support on
>> the I/O path is ENABLED
>> [2020-04-28 10:58:53.439206] I
>> [socket.c:4350:ssl_setup_connection_params] 0-glusterfs: SSL support for
>> glusterd is ENABLED
>> [2020-04-28 10:58:53.439238] I
>> [socket.c:4360:ssl_setup_connection_params] 0-glusterfs: using
>> certificate depth 1
>> [2020-04-28 10:58:53.441296] I [MSGID: 101190]
>> [event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread
>> with index 0
>> [2020-04-28 10:58:53.441434] I [MSGID: 101190]
>> [event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread
>> with index 1
>> [2020-04-28 10:59:01.609052] I
>> [rpcsvc.c:2690:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service:
>> Configured rpc.outstanding-rpc-limit with value 64
>> [2020-04-28 10:59:01.609353] I
>> [socket.c:4347:ssl_setup_connection_params] 0-tcp.userfiles-server: SSL
>> support on the I/O path is ENABLED
>> [2020-04-28 10:59:01.609373] I
>> [socket.c:4350:ssl_setup_connection_params] 0-tcp.userfiles-server: SSL
>> support for glusterd is ENABLED
>> [2020-04-28 10:59:01.609388] I
>> [socket.c:4360:ssl_setup_connection_params] 0-tcp.userfiles-server:
>> using certificate depth 1
>> [2020-04-28 10:59:01.609403] I
>> [socket.c:4363:ssl_setup_connection_params] 0-tcp.userfiles-server:
>> using cipher list HIGH:!SSLv2
>> [2020-04-28 10:59:01.644924] I
>> [socket.c:4350:ssl_setup_connection_params] 0-socket.userfiles-changelog:
>> SSL support for glusterd is ENABLED
>> [2020-04-28 10:59:01.644958] I
>> [socket.c:4360:ssl_setup_connection_params] 0-socket.userfiles-changelog:
>> using certificate depth 1
>>
>>
>> ------------------------------
>> *From: *"Nikhil Ladha" <nladha at redhat.com>
>> *To: *nico at furyweb.fr
>> *Cc: *"gluster-users" <gluster-users at gluster.org>
>> *Sent: *Tuesday, 28 April 2020 09:31:45
>> *Subject: *Re: [Gluster-users] never ending logging
>>
>> Hi,
>> Okay. So, after applying the fix, did everything work well? That is, were
>> all the peers in the Connected state?
>> If so, can you try creating a new volume without enabling SSL and share
>> the log? Also, for the volume that is not starting, can you try the steps
>> mentioned here
>> https://docs.gluster.org/en/latest/Troubleshooting/troubleshooting-glusterd/
>> (only those in the "Common issues and how to resolve them" section) and
>> tell me what logs you get?
>> Also, could you ping test all the peers?
>> The error 'failed to fetch volume files' occurs because it is not able to
>> fetch the vol file from its peers, as all the peers in the cluster share
>> the same vol file for a volume.
>>
>> Regards
>> Nikhil Ladha
>>
>>
>> On Tue, Apr 28, 2020 at 12:37 PM <nico at furyweb.fr> wrote:
>>
>>> Hi.
>>>
>>> No operation was performed on any volume or brick; the only change was
>>> the SSL certificate renewal on the 3 nodes and all clients. Then node 2
>>> was rejected and I applied the following steps to fix it:
>>> https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Administrator%20Guide/Resolving%20Peer%20Rejected/
>>> I also saw
>>> https://docs.gluster.org/en/latest/Troubleshooting/troubleshooting-glusterd/
>>> but the solution wasn't applicable, as cluster.max-op-version doesn't
>>> exist and all op-versions are the same on all 3 nodes.
>>>
>>> The strange thing is that the error "failed to fetch volume file" occurs
>>> on the node owning the brick. Does it mean it can't access its own brick?
>>>
>>> Regards,
>>> Nicolas.
>>>
>>> ------------------------------
>>> *From: *"Nikhil Ladha" <nladha at redhat.com>
>>> *To: *nico at furyweb.fr
>>> *Cc: *"gluster-users" <gluster-users at gluster.org>
>>> *Sent: *Tuesday, 28 April 2020 07:43:20
>>> *Subject: *Re: [Gluster-users] never ending logging
>>>
>>> Hi,
>>> Since everything is working fine except for a few bricks that are not
>>> coming up, I doubt there is any issue with gluster itself. Did you by
>>> chance make any changes to those bricks, the volume, or the node to which
>>> they are linked?
>>> As far as the SSL logs are concerned, I am looking into that matter.
>>>
>>> Regards
>>> Nikhil Ladha
>>>
>>>
>>> On Mon, Apr 27, 2020 at 7:17 PM <nico at furyweb.fr> wrote:
>>>
>>>> Thanks for the reply.
>>>>
>>>> I updated the storage pool to 7.5 and restarted all 3 nodes sequentially.
>>>> All nodes now appear in Connected state from every node, and gluster
>>>> volume list shows all 74 volumes.
>>>> SSL log lines are still flooding the glusterd log file on all nodes but
>>>> don't appear in the brick log files. As there's no information about the
>>>> volume or the client on these lines, I'm not able to check whether a
>>>> given volume produces this error or not.
>>>> I also tried pstack after installing the Debian package glusterfs-dbg
>>>> but I am still getting the "No symbols" error.
>>>>
>>>> I found that 5 brick processes didn't start on node 2 and 1 on node 3:
>>>> [2020-04-27 11:54:23.622659] I [MSGID: 100030] [glusterfsd.c:2867:main]
>>>> 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 7.5
>>>> (args: /usr/sbin/glusterfsd -s glusterDevVM2 --volfile-id
>>>> svg_pg_wed_dev_bkp.glusterDevVM2.bricks-svg_pg_wed_dev_bkp-brick1-data -p
>>>> /var/run/gluster/vols/svg_pg_wed_dev_bkp/glusterDevVM2-bricks-svg_pg_wed_dev_bkp-brick1-data.pid
>>>> -S /var/run/gluster/5023d38a22a8a874.socket --brick-name
>>>> /bricks/svg_pg_wed_dev_bkp/brick1/data -l
>>>> /var/log/glusterfs/bricks/bricks-svg_pg_wed_dev_bkp-brick1-data.log
>>>> --xlator-option *-posix.glusterd-uuid=7f6c3023-144b-4db2-9063-d90926dbdd18
>>>> --process-name brick --brick-port 49206 --xlator-option
>>>> svg_pg_wed_dev_bkp-server.listen-port=49206)
>>>> [2020-04-27 11:54:23.632870] I [glusterfsd.c:2594:daemonize]
>>>> 0-glusterfs: Pid of current running process is 5331
>>>> [2020-04-27 11:54:23.636679] I
>>>> [socket.c:4350:ssl_setup_connection_params] 0-socket.glusterfsd: SSL
>>>> support for glusterd is ENABLED
>>>> [2020-04-27 11:54:23.636745] I
>>>> [socket.c:4360:ssl_setup_connection_params] 0-socket.glusterfsd: using
>>>> certificate depth 1
>>>> [2020-04-27 11:54:23.637580] I [socket.c:958:__socket_server_bind]
>>>> 0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 9
>>>> [2020-04-27 11:54:23.637932] I
>>>> [socket.c:4347:ssl_setup_connection_params] 0-glusterfs: SSL support on
>>>> the I/O path is ENABLED
>>>> [2020-04-27 11:54:23.637949] I
>>>> [socket.c:4350:ssl_setup_connection_params] 0-glusterfs: SSL support for
>>>> glusterd is ENABLED
>>>> [2020-04-27 11:54:23.637960] I
>>>> [socket.c:4360:ssl_setup_connection_params] 0-glusterfs: using
>>>> certificate depth 1
>>>> [2020-04-27 11:54:23.639324] I [MSGID: 101190]
>>>> [event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread
>>>> with index 0
>>>> [2020-04-27 11:54:23.639380] I [MSGID: 101190]
>>>> [event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread
>>>> with index 1
>>>> [2020-04-27 11:54:28.933102] E
>>>> [glusterfsd-mgmt.c:2217:mgmt_getspec_cbk] 0-glusterfs: failed to get the
>>>> 'volume file' from server
>>>> [2020-04-27 11:54:28.933134] E
>>>> [glusterfsd-mgmt.c:2416:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume
>>>> file
>>>> (key:svg_pg_wed_dev_bkp.glusterDevVM2.bricks-svg_pg_wed_dev_bkp-brick1-data)
>>>> [2020-04-27 11:54:28.933361] W [glusterfsd.c:1596:cleanup_and_exit]
>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe5d1) [0x7f2b08ec35d1]
>>>> -->/usr/sbin/glusterfsd(mgmt_getspec_cbk+0x8d0) [0x55d46cb5a110]
>>>> -->/usr/sbin/glusterfsd(cleanup_and_exit+0x54) [0x55d46cb51ec4] ) 0-:
>>>> received signum (0), shutting down
>>>>
>>>> I tried to stop the volume but gluster commands are still locked
>>>> (Another transaction is in progress.).
>>>>
>>>> Best regards,
>>>> Nicolas.
>>>>
>>>> ------------------------------
>>>> *From: *"Nikhil Ladha" <nladha at redhat.com>
>>>> *To: *nico at furyweb.fr
>>>> *Cc: *"gluster-users" <gluster-users at gluster.org>
>>>> *Sent: *Monday, 27 April 2020 13:34:47
>>>> *Subject: *Re: [Gluster-users] never ending logging
>>>>
>>>> Hi,
>>>> As you mentioned that node 2 is in a "semi-connected" state, I think
>>>> the locking of the volume is failing because of that, and since it is
>>>> failing for one of the volumes, the transaction is not complete and you
>>>> are seeing a transaction error on another volume.
>>>> Moreover, regarding the repeated logging of the lines:
>>>> SSL support on the I/O path is enabled, SSL support for glusterd is
>>>> enabled and using certificate depth 1
>>>> Could you try creating a volume without SSL enabled and then check
>>>> whether the same log messages appear?
>>>> Also, if you update to 7.5 and find any change in the log messages with
>>>> SSL ENABLED, then please do share that.
>>>>
>>>> Regards
>>>> Nikhil Ladha
>>>>
>>>
>