Atin Mukherjee
2017-May-22 16:12 UTC
[Gluster-users] Failure while upgrading gluster to 3.10.1
On Mon, May 22, 2017 at 9:05 PM, Pawan Alwandi <pawan at platform.sh> wrote:

> On Mon, May 22, 2017 at 8:36 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>
>> On Mon, May 22, 2017 at 7:51 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>>
>>> Sorry Pawan, I did miss the other part of the attachments. Looking at
>>> the glusterd.info file from all the hosts, it looks like host2 and
>>> host3 do not have the correct op-version. Can you please set
>>> "operating-version=30702" in host2 and host3 and restart the glusterd
>>> instance one by one on all the nodes?
>>
>> Please ensure that all the hosts are upgraded to the same bits before
>> doing this change.
>
> Having to upgrade all 3 hosts to the newer version before gluster can
> work on any of them means application downtime. The applications running
> on these hosts are expected to be highly available. So, with the way
> things are right now, is an online upgrade possible? My upgrade steps
> are: (1) stop the applications, (2) unmount the gluster volume, and then
> (3) upgrade gluster one host at a time.

One of the ways to mitigate this is to first do an online upgrade to
glusterfs-3.7.9 (op-version 30707), given this bug was introduced in
3.7.10, and then move to 3.11.

> Our goal is to get gluster upgraded to 3.11 from 3.6.9, and to make this
> an online upgrade we are okay taking two steps: 3.6.9 -> 3.7 and then
> 3.7 -> 3.11.
>
>>> Apparently you have uncovered a bug here: during peer handshaking, if
>>> one of the glusterd instances is running with old bits, the uuid
>>> received while validating the handshake request may be blank. That
>>> used to be ignored, but patch http://review.gluster.org/13519 added
>>> changes that always inspect this field and do some extra checks, which
>>> causes the handshake to fail.
For now, the above workaround should >>> suffice. I'll be sending a patch pretty soon. >>> >> >> Posted a patch https://review.gluster.org/#/c/17358 . >> >> >>> >>> >>> >>> On Mon, May 22, 2017 at 11:35 AM, Pawan Alwandi <pawan at platform.sh> >>> wrote: >>> >>>> Hello Atin, >>>> >>>> The tar's have the content of `/var/lib/glusterd` too for all 3 nodes, >>>> please check again. >>>> >>>> Thanks >>>> >>>> On Mon, May 22, 2017 at 11:32 AM, Atin Mukherjee <amukherj at redhat.com> >>>> wrote: >>>> >>>>> Pawan, >>>>> >>>>> I see you have provided the log files from the nodes, however it'd be >>>>> really helpful if you can provide me the content of /var/lib/glusterd from >>>>> all the nodes to get to the root cause of this issue. >>>>> >>>>> On Fri, May 19, 2017 at 12:09 PM, Pawan Alwandi <pawan at platform.sh> >>>>> wrote: >>>>> >>>>>> Hello Atin, >>>>>> >>>>>> Thanks for continued support. I've attached requested files from all >>>>>> 3 nodes. >>>>>> >>>>>> (I think we already verified the UUIDs to be correct, anyway let us >>>>>> know if you find any more info in the logs) >>>>>> >>>>>> Pawan >>>>>> >>>>>> On Thu, May 18, 2017 at 11:45 PM, Atin Mukherjee <amukherj at redhat.com >>>>>> > wrote: >>>>>> >>>>>>> >>>>>>> On Thu, 18 May 2017 at 23:40, Atin Mukherjee <amukherj at redhat.com> >>>>>>> wrote: >>>>>>> >>>>>>>> On Wed, 17 May 2017 at 12:47, Pawan Alwandi <pawan at platform.sh> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hello Atin, >>>>>>>>> >>>>>>>>> I realized that these http://gluster.readthedocs.io/ >>>>>>>>> en/latest/Upgrade-Guide/upgrade_to_3.10/ instructions only work >>>>>>>>> for upgrades from 3.7, while we are running 3.6.2. Are there >>>>>>>>> instructions/suggestion you have for us to upgrade from 3.6 version? >>>>>>>>> >>>>>>>>> I believe upgrade from 3.6 to 3.7 and then to 3.10 would work, but >>>>>>>>> I see similar errors reported when I upgraded to 3.7 too. 
>>>>>>>>> >>>>>>>>> For what its worth, I was able to set the op-version (gluster v >>>>>>>>> set all cluster.op-version 30702) but that doesn't seem to help. >>>>>>>>> >>>>>>>>> [2017-05-17 06:48:33.700014] I [MSGID: 100030] >>>>>>>>> [glusterfsd.c:2338:main] 0-/usr/sbin/glusterd: Started running >>>>>>>>> /usr/sbin/glusterd version 3.7.20 (args: /usr/sbin/glusterd -p >>>>>>>>> /var/run/glusterd.pid) >>>>>>>>> [2017-05-17 06:48:33.703808] I [MSGID: 106478] >>>>>>>>> [glusterd.c:1383:init] 0-management: Maximum allowed open file descriptors >>>>>>>>> set to 65536 >>>>>>>>> [2017-05-17 06:48:33.703836] I [MSGID: 106479] >>>>>>>>> [glusterd.c:1432:init] 0-management: Using /var/lib/glusterd as working >>>>>>>>> directory >>>>>>>>> [2017-05-17 06:48:33.708866] W [MSGID: 103071] >>>>>>>>> [rdma.c:4594:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm >>>>>>>>> event channel creation failed [No such device] >>>>>>>>> [2017-05-17 06:48:33.709011] W [MSGID: 103055] [rdma.c:4901:init] >>>>>>>>> 0-rdma.management: Failed to initialize IB Device >>>>>>>>> [2017-05-17 06:48:33.709033] W [rpc-transport.c:359:rpc_transport_load] >>>>>>>>> 0-rpc-transport: 'rdma' initialization failed >>>>>>>>> [2017-05-17 06:48:33.709088] W [rpcsvc.c:1642:rpcsvc_create_listener] >>>>>>>>> 0-rpc-service: cannot create listener, initing the transport failed >>>>>>>>> [2017-05-17 06:48:33.709105] E [MSGID: 106243] >>>>>>>>> [glusterd.c:1656:init] 0-management: creation of 1 listeners failed, >>>>>>>>> continuing with succeeded transport >>>>>>>>> [2017-05-17 06:48:35.480043] I [MSGID: 106513] >>>>>>>>> [glusterd-store.c:2068:glusterd_restore_op_version] 0-glusterd: >>>>>>>>> retrieved op-version: 30600 >>>>>>>>> [2017-05-17 06:48:35.605779] I [MSGID: 106498] >>>>>>>>> [glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo] >>>>>>>>> 0-management: connect returned 0 >>>>>>>>> [2017-05-17 06:48:35.607059] I [rpc-clnt.c:1046:rpc_clnt_connection_init] >>>>>>>>> 0-management: setting 
frame-timeout to 600 >>>>>>>>> [2017-05-17 06:48:35.607670] I [rpc-clnt.c:1046:rpc_clnt_connection_init] >>>>>>>>> 0-management: setting frame-timeout to 600 >>>>>>>>> [2017-05-17 06:48:35.607025] I [MSGID: 106498] >>>>>>>>> [glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo] >>>>>>>>> 0-management: connect returned 0 >>>>>>>>> [2017-05-17 06:48:35.608125] I [MSGID: 106544] >>>>>>>>> [glusterd.c:159:glusterd_uuid_init] 0-management: retrieved UUID: >>>>>>>>> 7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>> >>>>>>>> >>>>>>>>> Final graph: >>>>>>>>> +----------------------------------------------------------- >>>>>>>>> -------------------+ >>>>>>>>> 1: volume management >>>>>>>>> 2: type mgmt/glusterd >>>>>>>>> 3: option rpc-auth.auth-glusterfs on >>>>>>>>> 4: option rpc-auth.auth-unix on >>>>>>>>> 5: option rpc-auth.auth-null on >>>>>>>>> 6: option rpc-auth-allow-insecure on >>>>>>>>> 7: option transport.socket.listen-backlog 128 >>>>>>>>> 8: option event-threads 1 >>>>>>>>> 9: option ping-timeout 0 >>>>>>>>> 10: option transport.socket.read-fail-log off >>>>>>>>> 11: option transport.socket.keepalive-interval 2 >>>>>>>>> 12: option transport.socket.keepalive-time 10 >>>>>>>>> 13: option transport-type rdma >>>>>>>>> 14: option working-directory /var/lib/glusterd >>>>>>>>> 15: end-volume >>>>>>>>> 16: >>>>>>>>> +----------------------------------------------------------- >>>>>>>>> -------------------+ >>>>>>>>> [2017-05-17 06:48:35.609868] I [MSGID: 101190] >>>>>>>>> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started >>>>>>>>> thread with index 1 >>>>>>>>> [2017-05-17 06:48:35.610839] W [socket.c:596:__socket_rwv] >>>>>>>>> 0-management: readv on 192.168.0.7:24007 failed (No data >>>>>>>>> available) >>>>>>>>> [2017-05-17 06:48:35.611907] E [rpc-clnt.c:370:saved_frames_unwind] >>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7fd6c2d70bb3] >>>>>>>>> (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7fd6c2b3a2df] >>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fd6c2b3a3fe] >>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7fd6c2b3ba39] >>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x160)[0x7fd6c2b3c380] >>>>>>>>> ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) >>>>>>>>> called at 2017-05-17 06:48:35.609965 (xid=0x1) >>>>>>>>> [2017-05-17 06:48:35.611928] E [MSGID: 106167] >>>>>>>>> [glusterd-handshake.c:2091:__glusterd_peer_dump_version_cbk] >>>>>>>>> 0-management: Error through RPC layer, retry again later >>>>>>>>> [2017-05-17 06:48:35.611944] I [MSGID: 106004] >>>>>>>>> [glusterd-handler.c:5201:__glusterd_peer_rpc_notify] >>>>>>>>> 0-management: Peer <192.168.0.7> (<5ec54b4f-f60c-48c6-9e55-95f2bb58f633>), >>>>>>>>> in state <Peer in Cluster>, has disconnected from glusterd. >>>>>>>>> [2017-05-17 06:48:35.612024] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] >>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/g >>>>>>>>> lusterd.so(glusterd_big_locked_notify+0x4b) [0x7fd6bdc4912b] >>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/gl >>>>>>>>> usterd.so(__glusterd_peer_rpc_notify+0x160) [0x7fd6bdc52dd0] >>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/gl >>>>>>>>> usterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7fd6bdcef1b3] ) >>>>>>>>> 0-management: Lock for vol shared not held >>>>>>>>> [2017-05-17 06:48:35.612039] W [MSGID: 106118] >>>>>>>>> [glusterd-handler.c:5223:__glusterd_peer_rpc_notify] >>>>>>>>> 0-management: Lock not released for shared >>>>>>>>> [2017-05-17 06:48:35.612079] W [socket.c:596:__socket_rwv] >>>>>>>>> 0-management: readv on 192.168.0.6:24007 failed (No data >>>>>>>>> available) >>>>>>>>> [2017-05-17 06:48:35.612179] E [rpc-clnt.c:370:saved_frames_unwind] >>>>>>>>> (--> 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7fd6c2d70bb3] >>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7fd6c2b3a2df] >>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fd6c2b3a3fe] >>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7fd6c2b3ba39] >>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x160)[0x7fd6c2b3c380] >>>>>>>>> ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) >>>>>>>>> called at 2017-05-17 06:48:35.610007 (xid=0x1) >>>>>>>>> [2017-05-17 06:48:35.612197] E [MSGID: 106167] >>>>>>>>> [glusterd-handshake.c:2091:__glusterd_peer_dump_version_cbk] >>>>>>>>> 0-management: Error through RPC layer, retry again later >>>>>>>>> [2017-05-17 06:48:35.612211] I [MSGID: 106004] >>>>>>>>> [glusterd-handler.c:5201:__glusterd_peer_rpc_notify] >>>>>>>>> 0-management: Peer <192.168.0.6> (<83e9a0b9-6bd5-483b-8516-d8928805ed95>), >>>>>>>>> in state <Peer in Cluster>, has disconnected from glusterd. 
>>>>>>>>> [2017-05-17 06:48:35.612292] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] >>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/g >>>>>>>>> lusterd.so(glusterd_big_locked_notify+0x4b) [0x7fd6bdc4912b] >>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/gl >>>>>>>>> usterd.so(__glusterd_peer_rpc_notify+0x160) [0x7fd6bdc52dd0] >>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/gl >>>>>>>>> usterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7fd6bdcef1b3] ) >>>>>>>>> 0-management: Lock for vol shared not held >>>>>>>>> [2017-05-17 06:48:35.613432] W [MSGID: 106118] >>>>>>>>> [glusterd-handler.c:5223:__glusterd_peer_rpc_notify] >>>>>>>>> 0-management: Lock not released for shared >>>>>>>>> [2017-05-17 06:48:35.614317] E [MSGID: 106170] >>>>>>>>> [glusterd-handshake.c:1051:gd_validate_mgmt_hndsk_req] >>>>>>>>> 0-management: Request from peer 192.168.0.6:991 has an entry in >>>>>>>>> peerinfo, but uuid does not match >>>>>>>>> >>>>>>>> >>>>>>>> Apologies for delay. My initial suspect was correct. You have an >>>>>>>> incorrect UUID in the peer file which is causing this. Can you please >>>>>>>> provide me the >>>>>>>> >>>>>>> >>>>>>> Clicked the send button accidentally! >>>>>>> >>>>>>> Can you please send me the content of /var/lib/glusterd & glusterd >>>>>>> log from all the nodes? >>>>>>> >>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, May 15, 2017 at 10:31 PM, Atin Mukherjee < >>>>>>>>> amukherj at redhat.com> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, 15 May 2017 at 11:58, Pawan Alwandi <pawan at platform.sh> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Atin, >>>>>>>>>>> >>>>>>>>>>> I see below error. Do I require gluster to be upgraded on all 3 >>>>>>>>>>> hosts for this to work? 
Right now I have host 1 running 3.10.1 and host 2 >>>>>>>>>>> & 3 running 3.6.2 >>>>>>>>>>> >>>>>>>>>>> # gluster v set all cluster.op-version 31001 >>>>>>>>>>> volume set: failed: Required op_version (31001) is not supported >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Yes you should given 3.6 version is EOLed. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, May 15, 2017 at 3:32 AM, Atin Mukherjee < >>>>>>>>>>> amukherj at redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> On Sun, 14 May 2017 at 21:43, Atin Mukherjee < >>>>>>>>>>>> amukherj at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Allright, I see that you haven't bumped up the op-version. Can >>>>>>>>>>>>> you please execute: >>>>>>>>>>>>> >>>>>>>>>>>>> gluster v set all cluster.op-version 30101 and then restart >>>>>>>>>>>>> glusterd on all the nodes and check the brick status? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> s/30101/31001 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Sun, May 14, 2017 at 8:55 PM, Pawan Alwandi < >>>>>>>>>>>>> pawan at platform.sh> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hello Atin, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for looking at this. Below is the output you >>>>>>>>>>>>>> requested for. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Again, I'm seeing those errors after upgrading gluster on >>>>>>>>>>>>>> host 1. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Host 1 >>>>>>>>>>>>>> >>>>>>>>>>>>>> # cat /var/lib/glusterd/glusterd.info >>>>>>>>>>>>>> UUID=7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>>>>>>> operating-version=30600 >>>>>>>>>>>>>> >>>>>>>>>>>>>> # cat /var/lib/glusterd/peers/* >>>>>>>>>>>>>> uuid=5ec54b4f-f60c-48c6-9e55-95f2bb58f633 >>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>> hostname1=192.168.0.7 >>>>>>>>>>>>>> uuid=83e9a0b9-6bd5-483b-8516-d8928805ed95 >>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>> hostname1=192.168.0.6 >>>>>>>>>>>>>> >>>>>>>>>>>>>> # gluster --version >>>>>>>>>>>>>> glusterfs 3.10.1 >>>>>>>>>>>>>> >>>>>>>>>>>>>> Host 2 >>>>>>>>>>>>>> >>>>>>>>>>>>>> # cat /var/lib/glusterd/glusterd.info >>>>>>>>>>>>>> UUID=83e9a0b9-6bd5-483b-8516-d8928805ed95 >>>>>>>>>>>>>> operating-version=30600 >>>>>>>>>>>>>> >>>>>>>>>>>>>> # cat /var/lib/glusterd/peers/* >>>>>>>>>>>>>> uuid=5ec54b4f-f60c-48c6-9e55-95f2bb58f633 >>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>> hostname1=192.168.0.7 >>>>>>>>>>>>>> uuid=7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>> hostname1=192.168.0.5 >>>>>>>>>>>>>> >>>>>>>>>>>>>> # gluster --version >>>>>>>>>>>>>> glusterfs 3.6.2 built on Jan 21 2015 14:23:44 >>>>>>>>>>>>>> >>>>>>>>>>>>>> Host 3 >>>>>>>>>>>>>> >>>>>>>>>>>>>> # cat /var/lib/glusterd/glusterd.info >>>>>>>>>>>>>> UUID=5ec54b4f-f60c-48c6-9e55-95f2bb58f633 >>>>>>>>>>>>>> operating-version=30600 >>>>>>>>>>>>>> >>>>>>>>>>>>>> # cat /var/lib/glusterd/peers/* >>>>>>>>>>>>>> uuid=7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>> hostname1=192.168.0.5 >>>>>>>>>>>>>> uuid=83e9a0b9-6bd5-483b-8516-d8928805ed95 >>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>> hostname1=192.168.0.6 >>>>>>>>>>>>>> >>>>>>>>>>>>>> # gluster --version >>>>>>>>>>>>>> glusterfs 3.6.2 built on Jan 21 2015 14:23:44 >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sat, May 13, 2017 at 6:28 PM, Atin Mukherjee < >>>>>>>>>>>>>> amukherj at redhat.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have already 
asked for the following earlier: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Can you please provide output of following from all the >>>>>>>>>>>>>>> nodes: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> cat /var/lib/glusterd/glusterd.info >>>>>>>>>>>>>>> cat /var/lib/glusterd/peers/* >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sat, 13 May 2017 at 12:22, Pawan Alwandi >>>>>>>>>>>>>>> <pawan at platform.sh> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hello folks, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Does anyone have any idea whats going on here? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Pawan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 5:02 PM, Pawan Alwandi < >>>>>>>>>>>>>>>> pawan at platform.sh> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'm trying to upgrade gluster from 3.6.2 to 3.10.1 but >>>>>>>>>>>>>>>>> don't see the glusterfsd and glusterfs processes coming up. >>>>>>>>>>>>>>>>> http://gluster.readthedocs.io/ >>>>>>>>>>>>>>>>> en/latest/Upgrade-Guide/upgrade_to_3.10/ is the process >>>>>>>>>>>>>>>>> that I'm trying to follow. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> This is a 3 node server setup with a replicated volume >>>>>>>>>>>>>>>>> having replica count of 3. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Logs below: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.507959] I [MSGID: 100030] >>>>>>>>>>>>>>>>> [glusterfsd.c:2460:main] 0-/usr/sbin/glusterd: Started running >>>>>>>>>>>>>>>>> /usr/sbin/glusterd version 3.10.1 (args: /usr/sbin/glusterd -p >>>>>>>>>>>>>>>>> /var/run/glusterd.pid) >>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.512827] I [MSGID: 106478] >>>>>>>>>>>>>>>>> [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors >>>>>>>>>>>>>>>>> set to 65536 >>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.512855] I [MSGID: 106479] >>>>>>>>>>>>>>>>> [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working >>>>>>>>>>>>>>>>> directory >>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520426] W [MSGID: 103071] >>>>>>>>>>>>>>>>> [rdma.c:4590:__gf_rdma_ctx_create] 0-rpc-transport/rdma: >>>>>>>>>>>>>>>>> rdma_cm event channel creation failed [No such device] >>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520452] W [MSGID: 103055] >>>>>>>>>>>>>>>>> [rdma.c:4897:init] 0-rdma.management: Failed to initialize IB Device >>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520465] W >>>>>>>>>>>>>>>>> [rpc-transport.c:350:rpc_transport_load] 0-rpc-transport: >>>>>>>>>>>>>>>>> 'rdma' initialization failed >>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520518] W >>>>>>>>>>>>>>>>> [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: >>>>>>>>>>>>>>>>> cannot create listener, initing the transport failed >>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520534] E [MSGID: 106243] >>>>>>>>>>>>>>>>> [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, >>>>>>>>>>>>>>>>> continuing with succeeded transport >>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.931764] I [MSGID: 106513] >>>>>>>>>>>>>>>>> [glusterd-store.c:2197:glusterd_restore_op_version] >>>>>>>>>>>>>>>>> 0-glusterd: retrieved op-version: 30600 >>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.964354] I [MSGID: 106544] >>>>>>>>>>>>>>>>> [glusterd.c:158:glusterd_uuid_init] 0-management: >>>>>>>>>>>>>>>>> retrieved 
UUID: 7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.993944] I [MSGID: 106498] >>>>>>>>>>>>>>>>> [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] >>>>>>>>>>>>>>>>> 0-management: connect returned 0 >>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.995864] I [MSGID: 106498] >>>>>>>>>>>>>>>>> [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] >>>>>>>>>>>>>>>>> 0-management: connect returned 0 >>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.995879] W [MSGID: 106062] >>>>>>>>>>>>>>>>> [glusterd-handler.c:3466:glust >>>>>>>>>>>>>>>>> erd_transport_inet_options_build] 0-glusterd: Failed to >>>>>>>>>>>>>>>>> get tcp-user-timeout >>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.995903] I >>>>>>>>>>>>>>>>> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: >>>>>>>>>>>>>>>>> setting frame-timeout to 600 >>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.996325] I >>>>>>>>>>>>>>>>> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: >>>>>>>>>>>>>>>>> setting frame-timeout to 600 >>>>>>>>>>>>>>>>> Final graph: >>>>>>>>>>>>>>>>> +----------------------------- >>>>>>>>>>>>>>>>> -------------------------------------------------+ >>>>>>>>>>>>>>>>> 1: volume management >>>>>>>>>>>>>>>>> 2: type mgmt/glusterd >>>>>>>>>>>>>>>>> 3: option rpc-auth.auth-glusterfs on >>>>>>>>>>>>>>>>> 4: option rpc-auth.auth-unix on >>>>>>>>>>>>>>>>> 5: option rpc-auth.auth-null on >>>>>>>>>>>>>>>>> 6: option rpc-auth-allow-insecure on >>>>>>>>>>>>>>>>> 7: option transport.socket.listen-backlog 128 >>>>>>>>>>>>>>>>> 8: option event-threads 1 >>>>>>>>>>>>>>>>> 9: option ping-timeout 0 >>>>>>>>>>>>>>>>> 10: option transport.socket.read-fail-log off >>>>>>>>>>>>>>>>> 11: option transport.socket.keepalive-interval 2 >>>>>>>>>>>>>>>>> 12: option transport.socket.keepalive-time 10 >>>>>>>>>>>>>>>>> 13: option transport-type rdma >>>>>>>>>>>>>>>>> 14: option working-directory /var/lib/glusterd >>>>>>>>>>>>>>>>> 15: end-volume >>>>>>>>>>>>>>>>> 16: >>>>>>>>>>>>>>>>> 
+----------------------------- >>>>>>>>>>>>>>>>> -------------------------------------------------+ >>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.996310] W [MSGID: 106062] >>>>>>>>>>>>>>>>> [glusterd-handler.c:3466:glust >>>>>>>>>>>>>>>>> erd_transport_inet_options_build] 0-glusterd: Failed to >>>>>>>>>>>>>>>>> get tcp-user-timeout >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.000461] I [MSGID: 101190] >>>>>>>>>>>>>>>>> [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: >>>>>>>>>>>>>>>>> Started thread with index 1 >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001493] W [socket.c:593:__socket_rwv] >>>>>>>>>>>>>>>>> 0-management: readv on 192.168.0.7:24007 failed (No data >>>>>>>>>>>>>>>>> available) >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001513] I [MSGID: 106004] >>>>>>>>>>>>>>>>> [glusterd-handler.c:5882:__glusterd_peer_rpc_notify] >>>>>>>>>>>>>>>>> 0-management: Peer <192.168.0.7> (<5ec54b4f-f60c-48c6-9e55-95f2bb58f633>), >>>>>>>>>>>>>>>>> in state <Peer in Cluster>, h >>>>>>>>>>>>>>>>> as disconnected from glusterd. 
>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001677] W >>>>>>>>>>>>>>>>> [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] >>>>>>>>>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/ >>>>>>>>>>>>>>>>> glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x20559) >>>>>>>>>>>>>>>>> [0x7f0bf9d74559] -->/usr/lib/x86_64-linux-gnu >>>>>>>>>>>>>>>>> /glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x29cf0) >>>>>>>>>>>>>>>>> [0x7f0bf9d7dcf0] -->/usr/lib/x86_64-linux-gnu/g >>>>>>>>>>>>>>>>> lusterfs/3.10.1/xlator/mgmt/glusterd.so(+0xd5ba3) >>>>>>>>>>>>>>>>> [0x7f0bf9e29ba3] ) 0-management: Lock for vol shared no >>>>>>>>>>>>>>>>> t held >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001696] W [MSGID: 106118] >>>>>>>>>>>>>>>>> [glusterd-handler.c:5907:__glusterd_peer_rpc_notify] >>>>>>>>>>>>>>>>> 0-management: Lock not released for shared >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003099] E >>>>>>>>>>>>>>>>> [rpc-clnt.c:365:saved_frames_unwind] (--> >>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>> lusterfs.so.0(_gf_log_callingfn+0x13c)[0x7f0bfeeca73c] >>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(s >>>>>>>>>>>>>>>>> aved_frames_unwind+0x1cf)[0x7f0bfec904bf] (--> >>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>> frpc.so.0(saved_frames_destroy+0xe)[0x7f0bfec905de] (--> >>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>> frpc.so.0(rpc_clnt_connection_cleanup+0x >>>>>>>>>>>>>>>>> 91)[0x7f0bfec91c21] (--> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>> frpc.so.0(rpc_clnt_notify+0x290)[0x7f0bfec92710] ))))) >>>>>>>>>>>>>>>>> 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called >>>>>>>>>>>>>>>>> at 2017-05-10 09:0 >>>>>>>>>>>>>>>>> 7:05.000627 (xid=0x1) >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003129] E [MSGID: 106167] >>>>>>>>>>>>>>>>> [glusterd-handshake.c:2181:__glusterd_peer_dump_version_cbk] >>>>>>>>>>>>>>>>> 0-management: Error through RPC layer, retry again later >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003251] W 
[socket.c:593:__socket_rwv] >>>>>>>>>>>>>>>>> 0-management: readv on 192.168.0.6:24007 failed (No data >>>>>>>>>>>>>>>>> available) >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003267] I [MSGID: 106004] >>>>>>>>>>>>>>>>> [glusterd-handler.c:5882:__glusterd_peer_rpc_notify] >>>>>>>>>>>>>>>>> 0-management: Peer <192.168.0.6> (<83e9a0b9-6bd5-483b-8516-d8928805ed95>), >>>>>>>>>>>>>>>>> in state <Peer in Cluster>, h >>>>>>>>>>>>>>>>> as disconnected from glusterd. >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003318] W >>>>>>>>>>>>>>>>> [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] >>>>>>>>>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/ >>>>>>>>>>>>>>>>> glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x20559) >>>>>>>>>>>>>>>>> [0x7f0bf9d74559] -->/usr/lib/x86_64-linux-gnu >>>>>>>>>>>>>>>>> /glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x29cf0) >>>>>>>>>>>>>>>>> [0x7f0bf9d7dcf0] -->/usr/lib/x86_64-linux-gnu/g >>>>>>>>>>>>>>>>> lusterfs/3.10.1/xlator/mgmt/glusterd.so(+0xd5ba3) >>>>>>>>>>>>>>>>> [0x7f0bf9e29ba3] ) 0-management: Lock for vol shared no >>>>>>>>>>>>>>>>> t held >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003329] W [MSGID: 106118] >>>>>>>>>>>>>>>>> [glusterd-handler.c:5907:__glusterd_peer_rpc_notify] >>>>>>>>>>>>>>>>> 0-management: Lock not released for shared >>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003457] E >>>>>>>>>>>>>>>>> [rpc-clnt.c:365:saved_frames_unwind] (--> >>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>> lusterfs.so.0(_gf_log_callingfn+0x13c)[0x7f0bfeeca73c] >>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(s >>>>>>>>>>>>>>>>> aved_frames_unwind+0x1cf)[0x7f0bfec904bf] (--> >>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>> frpc.so.0(saved_frames_destroy+0xe)[0x7f0bfec905de] (--> >>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>> frpc.so.0(rpc_clnt_connection_cleanup+0x >>>>>>>>>>>>>>>>> 91)[0x7f0bfec91c21] (--> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>> frpc.so.0(rpc_clnt_notify+0x290)[0x7f0bfec92710] ))))) 
>>>>>>>>>>>>>>>>> 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2017-05-10 09:07:05.001407 (xid=0x1)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> There are a bunch of errors reported but I'm not sure which is signal and which ones are noise. Does anyone have any idea what's going on here?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Pawan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> - Atin (atinm)
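The manual workaround Atin describes above (pin `operating-version` in `/var/lib/glusterd/glusterd.info`, then restart glusterd one node at a time) can be sketched as a small shell snippet. This is a hedged sketch, not an official procedure: to keep it runnable anywhere it edits a sample copy of the file; on a real node you would point `INFO` at `/var/lib/glusterd/glusterd.info`, and only after all hosts run the same bits.

```shell
#!/bin/sh
# Sketch of the op-version workaround from the thread. Edits a sample
# copy of glusterd.info; on a real node, point INFO at
# /var/lib/glusterd/glusterd.info instead (and back it up first).
set -eu

INFO="$(mktemp)"
cat > "$INFO" <<'EOF'
UUID=83e9a0b9-6bd5-483b-8516-d8928805ed95
operating-version=30600
EOF

# Pin the op-version the 3.7.x peers expect (30702, as suggested above),
# keeping the UUID line and anything else in the file untouched.
sed -i 's/^operating-version=.*/operating-version=30702/' "$INFO"
grep '^operating-version=' "$INFO"

# On the real node, follow the edit with a glusterd restart, e.g.:
#   systemctl restart glusterd    # or: service glusterd restart
rm -f "$INFO"
```

Repeating this host by host (edit, restart, wait for `gluster peer status` to settle, then move on) is what keeps the change rolling rather than a full-cluster outage.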
Pawan Alwandi
2017-May-24 10:50 UTC
[Gluster-users] Failure while upgrading gluster to 3.10.1
Thanks Atin,

So I got gluster downgraded to 3.7.9 on host 1, and the glusterfs and
glusterfsd processes now come up. But I see the volume is mounted
read-only, and these messages are logged every 3s:

[2017-05-24 10:45:44.440435] W [socket.c:852:__socket_keepalive] 0-socket: failed to set keep idle -1 on socket 17, Invalid argument
[2017-05-24 10:45:44.440475] E [socket.c:2966:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2017-05-24 10:45:44.440734] W [socket.c:852:__socket_keepalive] 0-socket: failed to set keep idle -1 on socket 20, Invalid argument
[2017-05-24 10:45:44.440754] E [socket.c:2966:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2017-05-24 10:45:44.441354] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f767c46d483] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f767c2383af] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f767c2384ce] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7f767c239c8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f767c23a4a8] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2017-05-24 10:45:44.440945 (xid=0xbf)
[2017-05-24 10:45:44.441505] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4b) [0x7f767734dffb] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x14a) [0x7f7677357c6a] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7f76773f0ef3] ) 0-management: Lock for vol shared not held
[2017-05-24 10:45:44.441660] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f767c46d483] (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f767c2383af] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f767c2384ce] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7f767c239c8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f767c23a4a8] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2017-05-24 10:45:44.441086 (xid=0xbf)
[2017-05-24 10:45:44.441790] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4b) [0x7f767734dffb] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x14a) [0x7f7677357c6a] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7f76773f0ef3] ) 0-management: Lock for vol shared not held

The heal info says this:

# gluster volume heal shared info
Brick 192.168.0.5:/data/exports/shared
Number of entries: 0

Brick 192.168.0.6:/data/exports/shared
Status: Transport endpoint is not connected

Brick 192.168.0.7:/data/exports/shared
Status: Transport endpoint is not connected

Any idea what's up here?

Pawan
>>>> >>> >>> Please ensure that all the hosts are upgraded to the same bits before >>> doing this change. >>> >> >> Having to upgrade all 3 hosts to newer version before gluster could work >> successfully on any of them means application downtime. The applications >> running on these hosts are expected to be highly available. So with the >> way the things are right now, is an online upgrade possible? My upgrade >> steps are: (1) stop the applications (2) umount the gluster volume, and >> then (3) upgrade gluster one host at a time. >> > > One of the way to mitigate this is to first do an online upgrade to > glusterfs-3.7.9 (op-version:30707) given this bug was introduced in 3.7.10 > and then come to 3.11. > > >> Our goal is to get gluster upgraded to 3.11 from 3.6.9, and to make this >> an online upgrade we are okay to take two steps 3.6.9 -> 3.7 and then 3.7 >> to 3.11. >> >> >>> >>> >>>> >>>> Apparently it looks like there is a bug which you have uncovered, >>>> during peer handshaking if one of the glusterd instance is running with old >>>> bits then during validating the handshake request there is a possibility >>>> that uuid received will be blank and the same was ignored however there was >>>> a patch http://review.gluster.org/13519 which had some additional >>>> changes which was always looking at this field and doing some extra checks >>>> which was causing the handshake to fail. For now, the above workaround >>>> should suffice. I'll be sending a patch pretty soon. >>>> >>> >>> Posted a patch https://review.gluster.org/#/c/17358 . >>> >>> >>>> >>>> >>>> >>>> On Mon, May 22, 2017 at 11:35 AM, Pawan Alwandi <pawan at platform.sh> >>>> wrote: >>>> >>>>> Hello Atin, >>>>> >>>>> The tar's have the content of `/var/lib/glusterd` too for all 3 nodes, >>>>> please check again. 
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Mon, May 22, 2017 at 11:32 AM, Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>>> Pawan,
>>>>>>
>>>>>> I see you have provided the log files from the nodes; however, it'd be really helpful if you could provide me the content of /var/lib/glusterd from all the nodes to get to the root cause of this issue.
>>>>>>
>>>>>> On Fri, May 19, 2017 at 12:09 PM, Pawan Alwandi <pawan at platform.sh> wrote:
>>>>>>> Hello Atin,
>>>>>>>
>>>>>>> Thanks for the continued support. I've attached the requested files from all 3 nodes.
>>>>>>>
>>>>>>> (I think we already verified the UUIDs to be correct; anyway, let us know if you find any more info in the logs.)
>>>>>>>
>>>>>>> Pawan
>>>>>>>
>>>>>>> On Thu, May 18, 2017 at 11:45 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>>>>> On Thu, 18 May 2017 at 23:40, Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>>>>>> On Wed, 17 May 2017 at 12:47, Pawan Alwandi <pawan at platform.sh> wrote:
>>>>>>>>>> Hello Atin,
>>>>>>>>>>
>>>>>>>>>> I realized that these http://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.10/ instructions only work for upgrades from 3.7, while we are running 3.6.2. Are there instructions/suggestions you have for us to upgrade from the 3.6 version?
>>>>>>>>>>
>>>>>>>>>> I believe an upgrade from 3.6 to 3.7 and then to 3.10 would work, but I see similar errors reported when I upgraded to 3.7 too.
>>>>>>>>>>
>>>>>>>>>> For what it's worth, I was able to set the op-version (gluster v set all cluster.op-version 30702) but that doesn't seem to help.
>>>>>>>>>> >>>>>>>>>> [2017-05-17 06:48:33.700014] I [MSGID: 100030] >>>>>>>>>> [glusterfsd.c:2338:main] 0-/usr/sbin/glusterd: Started running >>>>>>>>>> /usr/sbin/glusterd version 3.7.20 (args: /usr/sbin/glusterd -p >>>>>>>>>> /var/run/glusterd.pid) >>>>>>>>>> [2017-05-17 06:48:33.703808] I [MSGID: 106478] >>>>>>>>>> [glusterd.c:1383:init] 0-management: Maximum allowed open file descriptors >>>>>>>>>> set to 65536 >>>>>>>>>> [2017-05-17 06:48:33.703836] I [MSGID: 106479] >>>>>>>>>> [glusterd.c:1432:init] 0-management: Using /var/lib/glusterd as working >>>>>>>>>> directory >>>>>>>>>> [2017-05-17 06:48:33.708866] W [MSGID: 103071] >>>>>>>>>> [rdma.c:4594:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm >>>>>>>>>> event channel creation failed [No such device] >>>>>>>>>> [2017-05-17 06:48:33.709011] W [MSGID: 103055] [rdma.c:4901:init] >>>>>>>>>> 0-rdma.management: Failed to initialize IB Device >>>>>>>>>> [2017-05-17 06:48:33.709033] W [rpc-transport.c:359:rpc_transport_load] >>>>>>>>>> 0-rpc-transport: 'rdma' initialization failed >>>>>>>>>> [2017-05-17 06:48:33.709088] W [rpcsvc.c:1642:rpcsvc_create_listener] >>>>>>>>>> 0-rpc-service: cannot create listener, initing the transport failed >>>>>>>>>> [2017-05-17 06:48:33.709105] E [MSGID: 106243] >>>>>>>>>> [glusterd.c:1656:init] 0-management: creation of 1 listeners failed, >>>>>>>>>> continuing with succeeded transport >>>>>>>>>> [2017-05-17 06:48:35.480043] I [MSGID: 106513] >>>>>>>>>> [glusterd-store.c:2068:glusterd_restore_op_version] 0-glusterd: >>>>>>>>>> retrieved op-version: 30600 >>>>>>>>>> [2017-05-17 06:48:35.605779] I [MSGID: 106498] >>>>>>>>>> [glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo] >>>>>>>>>> 0-management: connect returned 0 >>>>>>>>>> [2017-05-17 06:48:35.607059] I [rpc-clnt.c:1046:rpc_clnt_connection_init] >>>>>>>>>> 0-management: setting frame-timeout to 600 >>>>>>>>>> [2017-05-17 06:48:35.607670] I [rpc-clnt.c:1046:rpc_clnt_connection_init] >>>>>>>>>> 0-management: setting 
frame-timeout to 600 >>>>>>>>>> [2017-05-17 06:48:35.607025] I [MSGID: 106498] >>>>>>>>>> [glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo] >>>>>>>>>> 0-management: connect returned 0 >>>>>>>>>> [2017-05-17 06:48:35.608125] I [MSGID: 106544] >>>>>>>>>> [glusterd.c:159:glusterd_uuid_init] 0-management: retrieved >>>>>>>>>> UUID: 7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>>> >>>>>>>>> >>>>>>>>>> Final graph: >>>>>>>>>> +----------------------------------------------------------- >>>>>>>>>> -------------------+ >>>>>>>>>> 1: volume management >>>>>>>>>> 2: type mgmt/glusterd >>>>>>>>>> 3: option rpc-auth.auth-glusterfs on >>>>>>>>>> 4: option rpc-auth.auth-unix on >>>>>>>>>> 5: option rpc-auth.auth-null on >>>>>>>>>> 6: option rpc-auth-allow-insecure on >>>>>>>>>> 7: option transport.socket.listen-backlog 128 >>>>>>>>>> 8: option event-threads 1 >>>>>>>>>> 9: option ping-timeout 0 >>>>>>>>>> 10: option transport.socket.read-fail-log off >>>>>>>>>> 11: option transport.socket.keepalive-interval 2 >>>>>>>>>> 12: option transport.socket.keepalive-time 10 >>>>>>>>>> 13: option transport-type rdma >>>>>>>>>> 14: option working-directory /var/lib/glusterd >>>>>>>>>> 15: end-volume >>>>>>>>>> 16: >>>>>>>>>> +----------------------------------------------------------- >>>>>>>>>> -------------------+ >>>>>>>>>> [2017-05-17 06:48:35.609868] I [MSGID: 101190] >>>>>>>>>> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started >>>>>>>>>> thread with index 1 >>>>>>>>>> [2017-05-17 06:48:35.610839] W [socket.c:596:__socket_rwv] >>>>>>>>>> 0-management: readv on 192.168.0.7:24007 failed (No data >>>>>>>>>> available) >>>>>>>>>> [2017-05-17 06:48:35.611907] E [rpc-clnt.c:370:saved_frames_unwind] >>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7fd6c2d70bb3] >>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7fd6c2b3a2df] >>>>>>>>>> (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fd6c2b3a3fe] >>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7fd6c2b3ba39] >>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x160)[0x7fd6c2b3c380] >>>>>>>>>> ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) >>>>>>>>>> called at 2017-05-17 06:48:35.609965 (xid=0x1) >>>>>>>>>> [2017-05-17 06:48:35.611928] E [MSGID: 106167] >>>>>>>>>> [glusterd-handshake.c:2091:__glusterd_peer_dump_version_cbk] >>>>>>>>>> 0-management: Error through RPC layer, retry again later >>>>>>>>>> [2017-05-17 06:48:35.611944] I [MSGID: 106004] >>>>>>>>>> [glusterd-handler.c:5201:__glusterd_peer_rpc_notify] >>>>>>>>>> 0-management: Peer <192.168.0.7> (<5ec54b4f-f60c-48c6-9e55-95f2bb58f633>), >>>>>>>>>> in state <Peer in Cluster>, has disconnected from glusterd. >>>>>>>>>> [2017-05-17 06:48:35.612024] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] >>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/g >>>>>>>>>> lusterd.so(glusterd_big_locked_notify+0x4b) [0x7fd6bdc4912b] >>>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/gl >>>>>>>>>> usterd.so(__glusterd_peer_rpc_notify+0x160) [0x7fd6bdc52dd0] >>>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/gl >>>>>>>>>> usterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7fd6bdcef1b3] ) >>>>>>>>>> 0-management: Lock for vol shared not held >>>>>>>>>> [2017-05-17 06:48:35.612039] W [MSGID: 106118] >>>>>>>>>> [glusterd-handler.c:5223:__glusterd_peer_rpc_notify] >>>>>>>>>> 0-management: Lock not released for shared >>>>>>>>>> [2017-05-17 06:48:35.612079] W [socket.c:596:__socket_rwv] >>>>>>>>>> 0-management: readv on 192.168.0.6:24007 failed (No data >>>>>>>>>> available) >>>>>>>>>> [2017-05-17 06:48:35.612179] E [rpc-clnt.c:370:saved_frames_unwind] >>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7fd6c2d70bb3] 
>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7fd6c2b3a2df] >>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fd6c2b3a3fe] >>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7fd6c2b3ba39] >>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x160)[0x7fd6c2b3c380] >>>>>>>>>> ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) >>>>>>>>>> called at 2017-05-17 06:48:35.610007 (xid=0x1) >>>>>>>>>> [2017-05-17 06:48:35.612197] E [MSGID: 106167] >>>>>>>>>> [glusterd-handshake.c:2091:__glusterd_peer_dump_version_cbk] >>>>>>>>>> 0-management: Error through RPC layer, retry again later >>>>>>>>>> [2017-05-17 06:48:35.612211] I [MSGID: 106004] >>>>>>>>>> [glusterd-handler.c:5201:__glusterd_peer_rpc_notify] >>>>>>>>>> 0-management: Peer <192.168.0.6> (<83e9a0b9-6bd5-483b-8516-d8928805ed95>), >>>>>>>>>> in state <Peer in Cluster>, has disconnected from glusterd. 
>>>>>>>>>> [2017-05-17 06:48:35.612292] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4b) [0x7fd6bdc4912b] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x160) [0x7fd6bdc52dd0] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7fd6bdcef1b3] ) 0-management: Lock for vol shared not held
>>>>>>>>>> [2017-05-17 06:48:35.613432] W [MSGID: 106118] [glusterd-handler.c:5223:__glusterd_peer_rpc_notify] 0-management: Lock not released for shared
>>>>>>>>>> [2017-05-17 06:48:35.614317] E [MSGID: 106170] [glusterd-handshake.c:1051:gd_validate_mgmt_hndsk_req] 0-management: Request from peer 192.168.0.6:991 has an entry in peerinfo, but uuid does not match
>>>>>>>>>>
>>>>>>>>> Apologies for the delay. My initial suspicion was correct. You have an incorrect UUID in the peer file, which is causing this. Can you please provide me the
>>>>>>>>>
>>>>>>>> Clicked the send button accidentally!
>>>>>>>>
>>>>>>>> Can you please send me the content of /var/lib/glusterd & the glusterd log from all the nodes?
>>>>>>>>
>>>>>>>>>> On Mon, May 15, 2017 at 10:31 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>>>>>>>> On Mon, 15 May 2017 at 11:58, Pawan Alwandi <pawan at platform.sh> wrote:
>>>>>>>>>>>> Hi Atin,
>>>>>>>>>>>>
>>>>>>>>>>>> I see the below error. Do I require gluster to be upgraded on all 3 hosts for this to work?
>>>>>>>>>>>> Right now I have host 1 running 3.10.1 and hosts 2 & 3 running 3.6.2.
>>>>>>>>>>>>
>>>>>>>>>>>> # gluster v set all cluster.op-version 31001
>>>>>>>>>>>> volume set: failed: Required op_version (31001) is not supported
>>>>>>>>>>>>
>>>>>>>>>>> Yes, you should, given that the 3.6 version is EOLed.
>>>>>>>>>>>
>>>>>>>>>>>> On Mon, May 15, 2017 at 3:32 AM, Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>>>>>>>>>> On Sun, 14 May 2017 at 21:43, Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>>>>>>>>>>> Alright, I see that you haven't bumped up the op-version. Can you please execute:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> gluster v set all cluster.op-version 30101 and then restart glusterd on all the nodes and check the brick status?
>>>>>>>>>>>>>
>>>>>>>>>>>>> s/30101/31001
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, May 14, 2017 at 8:55 PM, Pawan Alwandi <pawan at platform.sh> wrote:
>>>>>>>>>>>>>>> Hello Atin,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for looking at this. Below is the output you requested.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Again, I'm seeing those errors after upgrading gluster on host 1.
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Host 1 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # cat /var/lib/glusterd/glusterd.info >>>>>>>>>>>>>>> UUID=7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>>>>>>>> operating-version=30600 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # cat /var/lib/glusterd/peers/* >>>>>>>>>>>>>>> uuid=5ec54b4f-f60c-48c6-9e55-95f2bb58f633 >>>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>>> hostname1=192.168.0.7 >>>>>>>>>>>>>>> uuid=83e9a0b9-6bd5-483b-8516-d8928805ed95 >>>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>>> hostname1=192.168.0.6 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # gluster --version >>>>>>>>>>>>>>> glusterfs 3.10.1 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Host 2 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # cat /var/lib/glusterd/glusterd.info >>>>>>>>>>>>>>> UUID=83e9a0b9-6bd5-483b-8516-d8928805ed95 >>>>>>>>>>>>>>> operating-version=30600 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # cat /var/lib/glusterd/peers/* >>>>>>>>>>>>>>> uuid=5ec54b4f-f60c-48c6-9e55-95f2bb58f633 >>>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>>> hostname1=192.168.0.7 >>>>>>>>>>>>>>> uuid=7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>>> hostname1=192.168.0.5 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # gluster --version >>>>>>>>>>>>>>> glusterfs 3.6.2 built on Jan 21 2015 14:23:44 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Host 3 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # cat /var/lib/glusterd/glusterd.info >>>>>>>>>>>>>>> UUID=5ec54b4f-f60c-48c6-9e55-95f2bb58f633 >>>>>>>>>>>>>>> operating-version=30600 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # cat /var/lib/glusterd/peers/* >>>>>>>>>>>>>>> uuid=7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>>> hostname1=192.168.0.5 >>>>>>>>>>>>>>> uuid=83e9a0b9-6bd5-483b-8516-d8928805ed95 >>>>>>>>>>>>>>> state=3 >>>>>>>>>>>>>>> hostname1=192.168.0.6 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # gluster --version >>>>>>>>>>>>>>> glusterfs 3.6.2 built on Jan 21 2015 14:23:44 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sat, May 13, 2017 at 6:28 PM, Atin Mukherjee < >>>>>>>>>>>>>>> amukherj at redhat.com> 
wrote:
>>>>>>>>>>>>>>>> I have already asked for the following earlier:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can you please provide the output of the following from all the nodes:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> cat /var/lib/glusterd/glusterd.info
>>>>>>>>>>>>>>>> cat /var/lib/glusterd/peers/*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, 13 May 2017 at 12:22, Pawan Alwandi <pawan at platform.sh> wrote:
>>>>>>>>>>>>>>>>> Hello folks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Does anyone have any idea what's going on here?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Pawan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 5:02 PM, Pawan Alwandi <pawan at platform.sh> wrote:
>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm trying to upgrade gluster from 3.6.2 to 3.10.1 but don't see the glusterfsd and glusterfs processes coming up. http://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.10/ is the process that I'm trying to follow.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This is a 3-node server setup with a replicated volume having a replica count of 3.
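As a side note, the glusterd.info and peers/* outputs requested above can be cross-checked mechanically: every UUID a node lists as a peer should match some other node's own UUID, and any entry that doesn't explains the "uuid does not match" handshake error. A minimal illustrative sketch (not a gluster tool; the sample data mirrors the three hosts' values posted in this thread):

```shell
# Illustrative cross-check of glusterd peer metadata.
# own_uuids: each node's UUID from its /var/lib/glusterd/glusterd.info.
own_uuids="7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073
83e9a0b9-6bd5-483b-8516-d8928805ed95
5ec54b4f-f60c-48c6-9e55-95f2bb58f633"

# host1's peer entries; on a real node these come from:
#   grep '^uuid=' /var/lib/glusterd/peers/* | cut -d= -f2
host1_peers="5ec54b4f-f60c-48c6-9e55-95f2bb58f633
83e9a0b9-6bd5-483b-8516-d8928805ed95"

# Flag any peer UUID not found among the nodes' own UUIDs (stale metadata).
bad=0
for u in $host1_peers; do
    echo "$own_uuids" | grep -qx "$u" || { echo "unknown peer uuid: $u"; bad=1; }
done
[ "$bad" -eq 0 ] && echo "host1 peer entries OK"
```

Repeating the loop for each host quickly shows whether the three nodes' views of each other agree.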
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Logs below: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.507959] I [MSGID: 100030] >>>>>>>>>>>>>>>>>> [glusterfsd.c:2460:main] 0-/usr/sbin/glusterd: Started running >>>>>>>>>>>>>>>>>> /usr/sbin/glusterd version 3.10.1 (args: /usr/sbin/glusterd -p >>>>>>>>>>>>>>>>>> /var/run/glusterd.pid) >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.512827] I [MSGID: 106478] >>>>>>>>>>>>>>>>>> [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors >>>>>>>>>>>>>>>>>> set to 65536 >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.512855] I [MSGID: 106479] >>>>>>>>>>>>>>>>>> [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working >>>>>>>>>>>>>>>>>> directory >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520426] W [MSGID: 103071] >>>>>>>>>>>>>>>>>> [rdma.c:4590:__gf_rdma_ctx_create] 0-rpc-transport/rdma: >>>>>>>>>>>>>>>>>> rdma_cm event channel creation failed [No such device] >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520452] W [MSGID: 103055] >>>>>>>>>>>>>>>>>> [rdma.c:4897:init] 0-rdma.management: Failed to initialize IB Device >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520465] W >>>>>>>>>>>>>>>>>> [rpc-transport.c:350:rpc_transport_load] >>>>>>>>>>>>>>>>>> 0-rpc-transport: 'rdma' initialization failed >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520518] W >>>>>>>>>>>>>>>>>> [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: >>>>>>>>>>>>>>>>>> cannot create listener, initing the transport failed >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520534] E [MSGID: 106243] >>>>>>>>>>>>>>>>>> [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, >>>>>>>>>>>>>>>>>> continuing with succeeded transport >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.931764] I [MSGID: 106513] >>>>>>>>>>>>>>>>>> [glusterd-store.c:2197:glusterd_restore_op_version] >>>>>>>>>>>>>>>>>> 0-glusterd: retrieved op-version: 30600 >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.964354] I [MSGID: 106544] >>>>>>>>>>>>>>>>>> [glusterd.c:158:glusterd_uuid_init] 0-management: 
>>>>>>>>>>>>>>>>>> retrieved UUID: 7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073 >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.993944] I [MSGID: 106498] >>>>>>>>>>>>>>>>>> [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] >>>>>>>>>>>>>>>>>> 0-management: connect returned 0 >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.995864] I [MSGID: 106498] >>>>>>>>>>>>>>>>>> [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] >>>>>>>>>>>>>>>>>> 0-management: connect returned 0 >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.995879] W [MSGID: 106062] >>>>>>>>>>>>>>>>>> [glusterd-handler.c:3466:glust >>>>>>>>>>>>>>>>>> erd_transport_inet_options_build] 0-glusterd: Failed to >>>>>>>>>>>>>>>>>> get tcp-user-timeout >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.995903] I >>>>>>>>>>>>>>>>>> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: >>>>>>>>>>>>>>>>>> setting frame-timeout to 600 >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.996325] I >>>>>>>>>>>>>>>>>> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: >>>>>>>>>>>>>>>>>> setting frame-timeout to 600 >>>>>>>>>>>>>>>>>> Final graph: >>>>>>>>>>>>>>>>>> +----------------------------- >>>>>>>>>>>>>>>>>> -------------------------------------------------+ >>>>>>>>>>>>>>>>>> 1: volume management >>>>>>>>>>>>>>>>>> 2: type mgmt/glusterd >>>>>>>>>>>>>>>>>> 3: option rpc-auth.auth-glusterfs on >>>>>>>>>>>>>>>>>> 4: option rpc-auth.auth-unix on >>>>>>>>>>>>>>>>>> 5: option rpc-auth.auth-null on >>>>>>>>>>>>>>>>>> 6: option rpc-auth-allow-insecure on >>>>>>>>>>>>>>>>>> 7: option transport.socket.listen-backlog 128 >>>>>>>>>>>>>>>>>> 8: option event-threads 1 >>>>>>>>>>>>>>>>>> 9: option ping-timeout 0 >>>>>>>>>>>>>>>>>> 10: option transport.socket.read-fail-log off >>>>>>>>>>>>>>>>>> 11: option transport.socket.keepalive-interval 2 >>>>>>>>>>>>>>>>>> 12: option transport.socket.keepalive-time 10 >>>>>>>>>>>>>>>>>> 13: option transport-type rdma >>>>>>>>>>>>>>>>>> 14: option working-directory /var/lib/glusterd >>>>>>>>>>>>>>>>>> 15: end-volume 
>>>>>>>>>>>>>>>>>> 16: >>>>>>>>>>>>>>>>>> +----------------------------- >>>>>>>>>>>>>>>>>> -------------------------------------------------+ >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.996310] W [MSGID: 106062] >>>>>>>>>>>>>>>>>> [glusterd-handler.c:3466:glust >>>>>>>>>>>>>>>>>> erd_transport_inet_options_build] 0-glusterd: Failed to >>>>>>>>>>>>>>>>>> get tcp-user-timeout >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.000461] I [MSGID: 101190] >>>>>>>>>>>>>>>>>> [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: >>>>>>>>>>>>>>>>>> Started thread with index 1 >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001493] W >>>>>>>>>>>>>>>>>> [socket.c:593:__socket_rwv] 0-management: readv on >>>>>>>>>>>>>>>>>> 192.168.0.7:24007 failed (No data available) >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001513] I [MSGID: 106004] >>>>>>>>>>>>>>>>>> [glusterd-handler.c:5882:__glusterd_peer_rpc_notify] >>>>>>>>>>>>>>>>>> 0-management: Peer <192.168.0.7> (<5ec54b4f-f60c-48c6-9e55-95f2bb58f633>), >>>>>>>>>>>>>>>>>> in state <Peer in Cluster>, h >>>>>>>>>>>>>>>>>> as disconnected from glusterd. 
>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001677] W >>>>>>>>>>>>>>>>>> [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] >>>>>>>>>>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/ >>>>>>>>>>>>>>>>>> glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x20559) >>>>>>>>>>>>>>>>>> [0x7f0bf9d74559] -->/usr/lib/x86_64-linux-gnu >>>>>>>>>>>>>>>>>> /glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x29cf0) >>>>>>>>>>>>>>>>>> [0x7f0bf9d7dcf0] -->/usr/lib/x86_64-linux-gnu/g >>>>>>>>>>>>>>>>>> lusterfs/3.10.1/xlator/mgmt/glusterd.so(+0xd5ba3) >>>>>>>>>>>>>>>>>> [0x7f0bf9e29ba3] ) 0-management: Lock for vol shared no >>>>>>>>>>>>>>>>>> t held >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001696] W [MSGID: 106118] >>>>>>>>>>>>>>>>>> [glusterd-handler.c:5907:__glusterd_peer_rpc_notify] >>>>>>>>>>>>>>>>>> 0-management: Lock not released for shared >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003099] E >>>>>>>>>>>>>>>>>> [rpc-clnt.c:365:saved_frames_unwind] (--> >>>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>>> lusterfs.so.0(_gf_log_callingfn+0x13c)[0x7f0bfeeca73c] >>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(s >>>>>>>>>>>>>>>>>> aved_frames_unwind+0x1cf)[0x7f0bfec904bf] (--> >>>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>>> frpc.so.0(saved_frames_destroy+0xe)[0x7f0bfec905de] (--> >>>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>>> frpc.so.0(rpc_clnt_connection_cleanup+0x >>>>>>>>>>>>>>>>>> 91)[0x7f0bfec91c21] (--> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>>> frpc.so.0(rpc_clnt_notify+0x290)[0x7f0bfec92710] ))))) >>>>>>>>>>>>>>>>>> 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called >>>>>>>>>>>>>>>>>> at 2017-05-10 09:0 >>>>>>>>>>>>>>>>>> 7:05.000627 (xid=0x1) >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003129] E [MSGID: 106167] >>>>>>>>>>>>>>>>>> [glusterd-handshake.c:2181:__glusterd_peer_dump_version_cbk] >>>>>>>>>>>>>>>>>> 0-management: Error through RPC layer, retry again later >>>>>>>>>>>>>>>>>> [2017-05-10 
09:07:05.003251] W >>>>>>>>>>>>>>>>>> [socket.c:593:__socket_rwv] 0-management: readv on >>>>>>>>>>>>>>>>>> 192.168.0.6:24007 failed (No data available) >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003267] I [MSGID: 106004] >>>>>>>>>>>>>>>>>> [glusterd-handler.c:5882:__glusterd_peer_rpc_notify] >>>>>>>>>>>>>>>>>> 0-management: Peer <192.168.0.6> (<83e9a0b9-6bd5-483b-8516-d8928805ed95>), >>>>>>>>>>>>>>>>>> in state <Peer in Cluster>, h >>>>>>>>>>>>>>>>>> as disconnected from glusterd. >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003318] W >>>>>>>>>>>>>>>>>> [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] >>>>>>>>>>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/ >>>>>>>>>>>>>>>>>> glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x20559) >>>>>>>>>>>>>>>>>> [0x7f0bf9d74559] -->/usr/lib/x86_64-linux-gnu >>>>>>>>>>>>>>>>>> /glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x29cf0) >>>>>>>>>>>>>>>>>> [0x7f0bf9d7dcf0] -->/usr/lib/x86_64-linux-gnu/g >>>>>>>>>>>>>>>>>> lusterfs/3.10.1/xlator/mgmt/glusterd.so(+0xd5ba3) >>>>>>>>>>>>>>>>>> [0x7f0bf9e29ba3] ) 0-management: Lock for vol shared no >>>>>>>>>>>>>>>>>> t held >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003329] W [MSGID: 106118] >>>>>>>>>>>>>>>>>> [glusterd-handler.c:5907:__glusterd_peer_rpc_notify] >>>>>>>>>>>>>>>>>> 0-management: Lock not released for shared >>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003457] E >>>>>>>>>>>>>>>>>> [rpc-clnt.c:365:saved_frames_unwind] (--> >>>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>>> lusterfs.so.0(_gf_log_callingfn+0x13c)[0x7f0bfeeca73c] >>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(s >>>>>>>>>>>>>>>>>> aved_frames_unwind+0x1cf)[0x7f0bfec904bf] (--> >>>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>>> frpc.so.0(saved_frames_destroy+0xe)[0x7f0bfec905de] (--> >>>>>>>>>>>>>>>>>> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>>> frpc.so.0(rpc_clnt_connection_cleanup+0x >>>>>>>>>>>>>>>>>> 91)[0x7f0bfec91c21] (--> /usr/lib/x86_64-linux-gnu/libg >>>>>>>>>>>>>>>>>> 
frpc.so.0(rpc_clnt_notify+0x290)[0x7f0bfec92710] ))))) >>>>>>>>>>>>>>>>>> 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called >>>>>>>>>>>>>>>>>> at 2017-05-10 09:07:05.001407 (xid=0x1)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> There are a bunch of errors reported but I'm not sure which is signal and which ones are noise. Does anyone have any idea what's going on here?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Pawan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> - Atin (atinm)