Ziemowit Pierzycki
2017-Dec-19 21:55 UTC
[Gluster-users] Upgrading from Gluster 3.8 to 3.12
I have not done the upgrade yet. Since this is a production cluster, I need to
make sure it stays up, or schedule some downtime if it doesn't. Thanks.

On Tue, Dec 19, 2017 at 10:11 AM, Atin Mukherjee <amukherj at redhat.com> wrote:
>
> On Tue, Dec 19, 2017 at 1:10 AM, Ziemowit Pierzycki <ziemowit at pierzycki.com> wrote:
>>
>> Hi,
>>
>> I have a cluster of 10 servers, all running Fedora 24 along with
>> Gluster 3.8. I'm planning on doing rolling upgrades to Fedora 27 with
>> Gluster 3.12. I saw the documentation and did some testing, but I
>> would like to run my plan past some (more?) educated minds.
>>
>> The current setup is:
>>
>> Volume Name: vol0
>> Distributed-Replicate
>> Number of Bricks: 2 x (2 + 1) = 6
>> Bricks:
>> Brick1: glt01:/vol/vol0
>> Brick2: glt02:/vol/vol0
>> Brick3: glt05:/vol/vol0 (arbiter)
>> Brick4: glt03:/vol/vol0
>> Brick5: glt04:/vol/vol0
>> Brick6: glt06:/vol/vol0 (arbiter)
>>
>> Volume Name: vol1
>> Distributed-Replicate
>> Number of Bricks: 2 x (2 + 1) = 6
>> Bricks:
>> Brick1: glt07:/vol/vol1
>> Brick2: glt08:/vol/vol1
>> Brick3: glt05:/vol/vol1 (arbiter)
>> Brick4: glt09:/vol/vol1
>> Brick5: glt10:/vol/vol1
>> Brick6: glt06:/vol/vol1 (arbiter)
>>
>> After performing the upgrade, the upgraded nodes end up in the following
>> state because of differences in checksums:
>>
>> State: Peer Rejected (Connected)
>
> Have you upgraded all the nodes? If yes, have you bumped up the
> cluster.op-version after upgrading all the nodes? Please follow
> http://docs.gluster.org/en/latest/Upgrade-Guide/op_version/ for more details
> on how to bump up the cluster.op-version. If you have done all of this and
> you're still seeing a checksum issue, then I'm afraid you have hit a bug. I'd
> need further details, namely the checksum mismatch error from the glusterd.log
> file along with the exact volume's info file
> (/var/lib/glusterd/vols/<volname>/info) from both of the peers, to debug this
> further.
>
>> If I do the upgrades one at a time, going from glt10 to glt01 but leaving
>> out the arbiters glt05 and glt06, and then upgrade the arbiters last,
>> everything should remain online at all times throughout the process.
>> Correct?
>>
>> Thanks.
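For reference, a minimal sketch of the checks Atin describes, using the stock
gluster CLI and the volume names from this thread. The target op-version 31202
is an assumption for a 3.12.x build; confirm the exact number against the 3.12
release notes (or cluster.max-op-version) before setting it.

  # After upgrading each node, wait for self-heal to finish before moving on,
  # so every replica/arbiter set keeps quorum during the rolling upgrade.
  gluster volume heal vol0 info
  gluster volume heal vol1 info

  # Cluster-wide operating version currently in effect (30800 maps to 3.8):
  gluster volume get all cluster.op-version

  # On a 3.12 node, query the highest op-version the cluster can support:
  gluster volume get all cluster.max-op-version

  # Only after *all* nodes run 3.12, bump the op-version (31202 assumed here):
  gluster volume set all cluster.op-version 31202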
I was attempting the same on a local sandbox and also have the same problem.

Current: 3.8.4

Volume Name: shchst01
Type: Distributed-Replicate
Volume ID: bcd53e52-cde6-4e58-85f9-71d230b7b0d3
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: shchhv01-sto:/data/brick3/shchst01
Brick2: shchhv02-sto:/data/brick3/shchst01
Brick3: shchhv03-sto:/data/brick3/shchst01
Brick4: shchhv01-sto:/data/brick1/shchst01
Brick5: shchhv02-sto:/data/brick1/shchst01
Brick6: shchhv03-sto:/data/brick1/shchst01
Brick7: shchhv02-sto:/data/brick2/shchst01
Brick8: shchhv03-sto:/data/brick2/shchst01
Brick9: shchhv04-sto:/data/brick2/shchst01
Brick10: shchhv02-sto:/data/brick4/shchst01
Brick11: shchhv03-sto:/data/brick4/shchst01
Brick12: shchhv04-sto:/data/brick4/shchst01
Options Reconfigured:
cluster.data-self-heal-algorithm: full
features.shard-block-size: 512MB
features.shard: enable
performance.readdir-ahead: on
storage.owner-uid: 9869
storage.owner-gid: 9869
server.allow-insecure: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.self-heal-daemon: on
nfs.disable: on
performance.io-thread-count: 64
performance.cache-size: 1GB

Upgraded shchhv01-sto to 3.12.3; the others remain at 3.8.4.

RESULT
====================
Hostname: shchhv01-sto
Uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
State: Peer Rejected (Connected)

Upgraded Server: shchhv01-sto
=============================
[2017-12-20 05:02:44.747313] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-12-20 05:02:44.747387] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2017-12-20 05:02:44.749087] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk] 0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:44.749165] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk] 0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:44.749563] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk] 0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:54.676324] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272, host: shchhv02-sto, port: 0
[2017-12-20 05:02:54.690237] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-12-20 05:02:54.695823] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272
[2017-12-20 05:02:54.696956] E [MSGID: 106010] [glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum 2747317484 on peer shchhv02-sto
[2017-12-20 05:02:54.697796] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv02-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.033822] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b, host: shchhv03-sto, port: 0
[2017-12-20 05:02:55.038460] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-12-20 05:02:55.040032] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b
[2017-12-20 05:02:55.040266] E [MSGID: 106010] [glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum 2747317484 on peer shchhv03-sto
[2017-12-20 05:02:55.040405] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv03-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.584854] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 36306e37-d7f0-4fec-9140-0d0f1bd2d2d5, host: shchhv04-sto, port: 0
[2017-12-20 05:02:55.595125] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-12-20 05:02:55.600804] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 36306e37-d7f0-4fec-9140-0d0f1bd2d2d5
[2017-12-20 05:02:55.601288] E [MSGID: 106010] [glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum 2747317484 on peer shchhv04-sto
[2017-12-20 05:02:55.601497] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv04-sto (0), ret: 0, op_ret: -1

Another Server: shchhv02-sto
=============================
[2017-12-20 05:02:44.667833] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x1de5c) [0x7f75fdc12e5c] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x27a08) [0x7f75fdc1ca08] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd07fa) [0x7f75fdcc57fa] ) 0-management: Lock for vol shchst01-sto not held
[2017-12-20 05:02:44.667795] I [MSGID: 106004] [glusterd-handler.c:5219:__glusterd_peer_rpc_notify] 0-management: Peer <shchhv01-sto> (<f6205edb-a0ea-4247-9594-c4cdc0d05816>), in state <Peer Rejected>, has disconnected from glusterd.
[2017-12-20 05:02:44.667948] W [MSGID: 106118] [glusterd-handler.c:5241:__glusterd_peer_rpc_notify] 0-management: Lock not released for shchst01-sto
[2017-12-20 05:02:44.760103] I [MSGID: 106163] [glusterd-handshake.c:1271:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-12-20 05:02:44.765389] I [MSGID: 106490] [glusterd-handler.c:2608:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
[2017-12-20 05:02:54.686185] E [MSGID: 106010] [glusterd-utils.c:2930:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01 differ. local cksum = 2747317484, remote cksum 4218452135 on peer shchhv01-sto
[2017-12-20 05:02:54.686882] I [MSGID: 106493] [glusterd-handler.c:3852:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv01-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:54.717854] I [MSGID: 106493] [glusterd-rpc-ops.c:476:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816, host: shchhv01-sto, port: 0

Another Server: shchhv04-sto
=============================
[2017-12-20 05:02:44.667620] I [MSGID: 106004] [glusterd-handler.c:5219:__glusterd_peer_rpc_notify] 0-management: Peer <shchhv01-sto> (<f6205edb-a0ea-4247-9594-c4cdc0d05816>), in state <Peer Rejected>, has disconnected from glusterd.
[2017-12-20 05:02:44.667808] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x1de5c) [0x7f10a33d9e5c] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x27a08) [0x7f10a33e3a08] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd07fa) [0x7f10a348c7fa] ) 0-management: Lock for vol shchst01-sto not held
[2017-12-20 05:02:44.667827] W [MSGID: 106118] [glusterd-handler.c:5241:__glusterd_peer_rpc_notify] 0-management: Lock not released for shchst01-sto
[2017-12-20 05:02:44.760077] I [MSGID: 106163] [glusterd-handshake.c:1271:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-12-20 05:02:44.768796] I [MSGID: 106490] [glusterd-handler.c:2608:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
[2017-12-20 05:02:55.595095] E [MSGID: 106010] [glusterd-utils.c:2930:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 2747317484, remote cksum 4218452135 on peer shchhv01-sto
[2017-12-20 05:02:55.595273] I [MSGID: 106493] [glusterd-handler.c:3852:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv01-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.612957] I [MSGID: 106493] [glusterd-rpc-ops.c:476:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816, host: shchhv01-sto, port: 0

<vol>/info

Upgraded Server: shchhv01-sto
=============================
type=2
count=12
status=1
sub_count=3
stripe_count=1
replica_count=3
disperse_count=0
redundancy_count=0
version=52
transport-type=0
volume-id=bcd53e52-cde6-4e58-85f9-71d230b7b0d3
username=5a4ae8d8-dbcb-408e-ab73-629255c14ffc
password=58652573-0955-4d00-893a-9f42d0f16717
op-version=30700
client-op-version=30700
quota-version=0
tier-enabled=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
cluster.data-self-heal-algorithm=full
features.shard-block-size=512MB
features.shard=enable
nfs.disable=on
cluster.self-heal-daemon=on
cluster.server-quorum-type=server
cluster.quorum-type=auto
network.remote-dio=enable
cluster.eager-lock=enable
performance.stat-prefetch=off
performance.io-cache=off
performance.read-ahead=off
performance.quick-read=off
server.allow-insecure=on
storage.owner-gid=9869
storage.owner-uid=9869
performance.readdir-ahead=on
performance.io-thread-count=64
performance.cache-size=1GB
brick-0=shchhv01-sto:-data-brick3-shchst01
brick-1=shchhv02-sto:-data-brick3-shchst01
brick-2=shchhv03-sto:-data-brick3-shchst01
brick-3=shchhv01-sto:-data-brick1-shchst01
brick-4=shchhv02-sto:-data-brick1-shchst01
brick-5=shchhv03-sto:-data-brick1-shchst01
brick-6=shchhv02-sto:-data-brick2-shchst01
brick-7=shchhv03-sto:-data-brick2-shchst01
brick-8=shchhv04-sto:-data-brick2-shchst01
brick-9=shchhv02-sto:-data-brick4-shchst01
brick-10=shchhv03-sto:-data-brick4-shchst01
brick-11=shchhv04-sto:-data-brick4-shchst01

Another Server: shchhv02-sto
=============================
type=2
count=12
status=1
sub_count=3
stripe_count=1
replica_count=3
disperse_count=0
redundancy_count=0
version=52
transport-type=0
volume-id=bcd53e52-cde6-4e58-85f9-71d230b7b0d3
username=5a4ae8d8-dbcb-408e-ab73-629255c14ffc
password=58652573-0955-4d00-893a-9f42d0f16717
op-version=30700
client-op-version=30700
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
cluster.data-self-heal-algorithm=full
features.shard-block-size=512MB
features.shard=enable
performance.readdir-ahead=on
storage.owner-uid=9869
storage.owner-gid=9869
server.allow-insecure=on
performance.quick-read=off
performance.read-ahead=off
performance.io-cache=off
performance.stat-prefetch=off
cluster.eager-lock=enable
network.remote-dio=enable
cluster.quorum-type=auto
cluster.server-quorum-type=server
cluster.self-heal-daemon=on
nfs.disable=on
performance.io-thread-count=64
performance.cache-size=1GB
brick-0=shchhv01-sto:-data-brick3-shchst01
brick-1=shchhv02-sto:-data-brick3-shchst01
brick-2=shchhv03-sto:-data-brick3-shchst01
brick-3=shchhv01-sto:-data-brick1-shchst01
brick-4=shchhv02-sto:-data-brick1-shchst01
brick-5=shchhv03-sto:-data-brick1-shchst01
brick-6=shchhv02-sto:-data-brick2-shchst01
brick-7=shchhv03-sto:-data-brick2-shchst01
brick-8=shchhv04-sto:-data-brick2-shchst01
brick-9=shchhv02-sto:-data-brick4-shchst01
brick-10=shchhv03-sto:-data-brick4-shchst01
brick-11=shchhv04-sto:-data-brick4-shchst01

NOTE

[root at shchhv01 shchst01]# gluster volume get shchst01 cluster.op-version
Warning: Support to get global option value using `volume get <volname>` will be deprecated from next release. Consider using `volume get all` instead for global options
Option                                  Value
------                                  -----
cluster.op-version                      30800

[root at shchhv02 shchst01]# gluster volume get shchst01 cluster.op-version
Option                                  Value
------                                  -----
cluster.op-version                      30800

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Ziemowit Pierzycki
Sent: Tuesday, December 19, 2017 3:56 PM
To: gluster-users <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Upgrading from Gluster 3.8 to 3.12

[Ziemowit's message of Dec 19 and the earlier thread, quoted in full above; snipped]
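A quick way to see the drift between the two <vol>/info dumps above is to diff
the peers' copies directly. A minimal sketch, assuming bash and password-less
root SSH to both storage hosts (hostnames and paths as in this thread); sorting
first hides the harmless ordering differences, so in this case only the extra
tier-enabled=0 line on the upgraded peer should show up:

  # Compare the glusterd store entry for the volume on the upgraded peer
  # against one of the rejected 3.8.4 peers.
  diff <(ssh shchhv01-sto 'sort /var/lib/glusterd/vols/shchst01/info') \
       <(ssh shchhv02-sto 'sort /var/lib/glusterd/vols/shchst01/info')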
Looks like a bug: I see that tier-enabled=0 is an additional entry in the info
file on shchhv01. As per the code, this field should be written into the
glusterd store if the op-version is >= 30706. My guess is that since 3.8.4
didn't have commit 33f8703a1 ("glusterd: regenerate volfiles on op-version bump
up"), the info and volfiles were not regenerated while bumping up the
op-version, which left the tier-enabled entry missing from the info file.

For now, you can copy the info file for the volumes where the mismatch happened
from shchhv01 to shchhv02 and restart the glusterd service on shchhv02. That
should fix this up temporarily. Unfortunately, this step might need to be
repeated for the other nodes as well.

@Hari - could you help in debugging this further?

On Wed, Dec 20, 2017 at 10:44 AM, Gustave Dahl <gustave at dahlfamily.net> wrote:
> [Gustave Dahl's message of Dec 20, quoted in full above; snipped]
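For completeness, a hedged sketch of that workaround using the volume name and
hostnames from this thread. It assumes a systemd-managed glusterd and root SSH
access from the rejected peer to the upgraded one, and it keeps a backup of the
old file:

  # Run on the rejected 3.8.4 peer (here shchhv02-sto).
  VOL=shchst01
  cp -a /var/lib/glusterd/vols/$VOL/info /var/lib/glusterd/vols/$VOL/info.bak
  scp shchhv01-sto:/var/lib/glusterd/vols/$VOL/info /var/lib/glusterd/vols/$VOL/info
  systemctl restart glusterd
  # The peer should leave "Peer Rejected" once the volume checksums agree again.
  gluster peer status

As noted above, the same copy-and-restart may have to be repeated on
shchhv03-sto and shchhv04-sto if they remain rejected.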