Ziemowit Pierzycki
2017-Dec-19 21:55 UTC
[Gluster-users] Upgrading from Gluster 3.8 to 3.12
I have not done the upgrade yet. Since this is a production cluster, I need to
make sure it stays up, or schedule some downtime if it doesn't. Thanks.

On Tue, Dec 19, 2017 at 10:11 AM, Atin Mukherjee <amukherj at redhat.com> wrote:
>
> On Tue, Dec 19, 2017 at 1:10 AM, Ziemowit Pierzycki <ziemowit at pierzycki.com> wrote:
>>
>> Hi,
>>
>> I have a cluster of 10 servers, all running Fedora 24 along with
>> Gluster 3.8. I'm planning on doing rolling upgrades to Fedora 27 with
>> Gluster 3.12. I saw the documentation and did some testing, but I
>> would like to run my plan past some (more?) educated minds.
>>
>> The current setup is:
>>
>> Volume Name: vol0
>> Distributed-Replicate
>> Number of Bricks: 2 x (2 + 1) = 6
>> Bricks:
>> Brick1: glt01:/vol/vol0
>> Brick2: glt02:/vol/vol0
>> Brick3: glt05:/vol/vol0 (arbiter)
>> Brick4: glt03:/vol/vol0
>> Brick5: glt04:/vol/vol0
>> Brick6: glt06:/vol/vol0 (arbiter)
>>
>> Volume Name: vol1
>> Distributed-Replicate
>> Number of Bricks: 2 x (2 + 1) = 6
>> Bricks:
>> Brick1: glt07:/vol/vol1
>> Brick2: glt08:/vol/vol1
>> Brick3: glt05:/vol/vol1 (arbiter)
>> Brick4: glt09:/vol/vol1
>> Brick5: glt10:/vol/vol1
>> Brick6: glt06:/vol/vol1 (arbiter)
>>
>> After performing the upgrade, the upgraded nodes end up in the following
>> state because of differences in checksums:
>>
>> State: Peer Rejected (Connected)
>
> Have you upgraded all the nodes? If yes, have you bumped up the
> cluster.op-version after upgrading all the nodes? Please follow
> http://docs.gluster.org/en/latest/Upgrade-Guide/op_version/ for more details
> on how to bump up the cluster.op-version. If you have done all of this and
> you're still seeing a checksum issue, then I'm afraid you have hit a bug. I'd
> need further details, namely the checksum mismatch error from the glusterd.log
> file along with the exact volume's info file
> (/var/lib/glusterd/vols/<volname>/info) from both of the peers, to debug this
> further.
>
>> If I do the upgrades one at a time, going from glt10 to glt01 but leaving
>> out the arbiters glt05 and glt06, and then upgrade the arbiters last,
>> everything should remain online at all times throughout the process.
>> Correct?
>>
>> Thanks.
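For reference, a minimal sketch of the checks Atin describes, using the stock
gluster CLI and the volume names from this thread. The target op-version 31202
is an assumption for a 3.12.x build; confirm the exact number against the 3.12
release notes (or cluster.max-op-version) before setting it.

  # After upgrading each node, wait for self-heal to finish before moving on,
  # so every replica/arbiter set keeps quorum during the rolling upgrade.
  gluster volume heal vol0 info
  gluster volume heal vol1 info

  # Cluster-wide operating version currently in effect (30800 maps to 3.8):
  gluster volume get all cluster.op-version

  # On a 3.12 node, query the highest op-version the cluster can support:
  gluster volume get all cluster.max-op-version

  # Only after *all* nodes run 3.12, bump the op-version (31202 assumed here):
  gluster volume set all cluster.op-version 31202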
I was attempting the same on a local sandbox and also have the same problem.

Current: 3.8.4

Volume Name: shchst01
Type: Distributed-Replicate
Volume ID: bcd53e52-cde6-4e58-85f9-71d230b7b0d3
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: shchhv01-sto:/data/brick3/shchst01
Brick2: shchhv02-sto:/data/brick3/shchst01
Brick3: shchhv03-sto:/data/brick3/shchst01
Brick4: shchhv01-sto:/data/brick1/shchst01
Brick5: shchhv02-sto:/data/brick1/shchst01
Brick6: shchhv03-sto:/data/brick1/shchst01
Brick7: shchhv02-sto:/data/brick2/shchst01
Brick8: shchhv03-sto:/data/brick2/shchst01
Brick9: shchhv04-sto:/data/brick2/shchst01
Brick10: shchhv02-sto:/data/brick4/shchst01
Brick11: shchhv03-sto:/data/brick4/shchst01
Brick12: shchhv04-sto:/data/brick4/shchst01
Options Reconfigured:
cluster.data-self-heal-algorithm: full
features.shard-block-size: 512MB
features.shard: enable
performance.readdir-ahead: on
storage.owner-uid: 9869
storage.owner-gid: 9869
server.allow-insecure: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.self-heal-daemon: on
nfs.disable: on
performance.io-thread-count: 64
performance.cache-size: 1GB

Upgraded shchhv01-sto to 3.12.3; the others remain at 3.8.4.

RESULT
====================
Hostname: shchhv01-sto
Uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
State: Peer Rejected (Connected)

Upgraded Server: shchhv01-sto
=============================
[2017-12-20 05:02:44.747313] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-12-20 05:02:44.747387] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2017-12-20 05:02:44.749087] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk] 0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:44.749165] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk] 0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:44.749563] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk] 0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:54.676324] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272, host: shchhv02-sto, port: 0
[2017-12-20 05:02:54.690237] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-12-20 05:02:54.695823] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272
[2017-12-20 05:02:54.696956] E [MSGID: 106010] [glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum 2747317484 on peer shchhv02-sto
[2017-12-20 05:02:54.697796] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv02-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.033822] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b, host: shchhv03-sto, port: 0
[2017-12-20 05:02:55.038460] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-12-20 05:02:55.040032] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b
[2017-12-20 05:02:55.040266] E [MSGID: 106010] [glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum 2747317484 on peer shchhv03-sto
[2017-12-20 05:02:55.040405] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv03-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.584854] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 36306e37-d7f0-4fec-9140-0d0f1bd2d2d5, host: shchhv04-sto, port: 0
[2017-12-20 05:02:55.595125] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-12-20 05:02:55.600804] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 36306e37-d7f0-4fec-9140-0d0f1bd2d2d5
[2017-12-20 05:02:55.601288] E [MSGID: 106010] [glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum 2747317484 on peer shchhv04-sto
[2017-12-20 05:02:55.601497] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv04-sto (0), ret: 0, op_ret: -1

Another Server: shchhv02-sto
=============================
[2017-12-20 05:02:44.667833] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x1de5c) [0x7f75fdc12e5c] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x27a08) [0x7f75fdc1ca08] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd07fa) [0x7f75fdcc57fa] ) 0-management: Lock for vol shchst01-sto not held
[2017-12-20 05:02:44.667795] I [MSGID: 106004] [glusterd-handler.c:5219:__glusterd_peer_rpc_notify] 0-management: Peer <shchhv01-sto> (<f6205edb-a0ea-4247-9594-c4cdc0d05816>), in state <Peer Rejected>, has disconnected from glusterd.
[2017-12-20 05:02:44.667948] W [MSGID: 106118] [glusterd-handler.c:5241:__glusterd_peer_rpc_notify] 0-management: Lock not released for shchst01-sto
[2017-12-20 05:02:44.760103] I [MSGID: 106163] [glusterd-handshake.c:1271:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-12-20 05:02:44.765389] I [MSGID: 106490] [glusterd-handler.c:2608:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
[2017-12-20 05:02:54.686185] E [MSGID: 106010] [glusterd-utils.c:2930:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01 differ. local cksum = 2747317484, remote cksum 4218452135 on peer shchhv01-sto
[2017-12-20 05:02:54.686882] I [MSGID: 106493] [glusterd-handler.c:3852:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv01-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:54.717854] I [MSGID: 106493] [glusterd-rpc-ops.c:476:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816, host: shchhv01-sto, port: 0

Another Server: shchhv04-sto
=============================
[2017-12-20 05:02:44.667620] I [MSGID: 106004] [glusterd-handler.c:5219:__glusterd_peer_rpc_notify] 0-management: Peer <shchhv01-sto> (<f6205edb-a0ea-4247-9594-c4cdc0d05816>), in state <Peer Rejected>, has disconnected from glusterd.
[2017-12-20 05:02:44.667808] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x1de5c) [0x7f10a33d9e5c] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x27a08) [0x7f10a33e3a08] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd07fa) [0x7f10a348c7fa] ) 0-management: Lock for vol shchst01-sto not held
[2017-12-20 05:02:44.667827] W [MSGID: 106118] [glusterd-handler.c:5241:__glusterd_peer_rpc_notify] 0-management: Lock not released for shchst01-sto
[2017-12-20 05:02:44.760077] I [MSGID: 106163] [glusterd-handshake.c:1271:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-12-20 05:02:44.768796] I [MSGID: 106490] [glusterd-handler.c:2608:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
[2017-12-20 05:02:55.595095] E [MSGID: 106010] [glusterd-utils.c:2930:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 2747317484, remote cksum 4218452135 on peer shchhv01-sto
[2017-12-20 05:02:55.595273] I [MSGID: 106493] [glusterd-handler.c:3852:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv01-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.612957] I [MSGID: 106493] [glusterd-rpc-ops.c:476:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816, host: shchhv01-sto, port: 0

<vol>/info

Upgraded Server: shchhv01-sto
=============================
type=2
count=12
status=1
sub_count=3
stripe_count=1
replica_count=3
disperse_count=0
redundancy_count=0
version=52
transport-type=0
volume-id=bcd53e52-cde6-4e58-85f9-71d230b7b0d3
username=5a4ae8d8-dbcb-408e-ab73-629255c14ffc
password=58652573-0955-4d00-893a-9f42d0f16717
op-version=30700
client-op-version=30700
quota-version=0
tier-enabled=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
cluster.data-self-heal-algorithm=full
features.shard-block-size=512MB
features.shard=enable
nfs.disable=on
cluster.self-heal-daemon=on
cluster.server-quorum-type=server
cluster.quorum-type=auto
network.remote-dio=enable
cluster.eager-lock=enable
performance.stat-prefetch=off
performance.io-cache=off
performance.read-ahead=off
performance.quick-read=off
server.allow-insecure=on
storage.owner-gid=9869
storage.owner-uid=9869
performance.readdir-ahead=on
performance.io-thread-count=64
performance.cache-size=1GB
brick-0=shchhv01-sto:-data-brick3-shchst01
brick-1=shchhv02-sto:-data-brick3-shchst01
brick-2=shchhv03-sto:-data-brick3-shchst01
brick-3=shchhv01-sto:-data-brick1-shchst01
brick-4=shchhv02-sto:-data-brick1-shchst01
brick-5=shchhv03-sto:-data-brick1-shchst01
brick-6=shchhv02-sto:-data-brick2-shchst01
brick-7=shchhv03-sto:-data-brick2-shchst01
brick-8=shchhv04-sto:-data-brick2-shchst01
brick-9=shchhv02-sto:-data-brick4-shchst01
brick-10=shchhv03-sto:-data-brick4-shchst01
brick-11=shchhv04-sto:-data-brick4-shchst01

Another Server: shchhv02-sto
=============================
type=2
count=12
status=1
sub_count=3
stripe_count=1
replica_count=3
disperse_count=0
redundancy_count=0
version=52
transport-type=0
volume-id=bcd53e52-cde6-4e58-85f9-71d230b7b0d3
username=5a4ae8d8-dbcb-408e-ab73-629255c14ffc
password=58652573-0955-4d00-893a-9f42d0f16717
op-version=30700
client-op-version=30700
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
cluster.data-self-heal-algorithm=full
features.shard-block-size=512MB
features.shard=enable
performance.readdir-ahead=on
storage.owner-uid=9869
storage.owner-gid=9869
server.allow-insecure=on
performance.quick-read=off
performance.read-ahead=off
performance.io-cache=off
performance.stat-prefetch=off
cluster.eager-lock=enable
network.remote-dio=enable
cluster.quorum-type=auto
cluster.server-quorum-type=server
cluster.self-heal-daemon=on
nfs.disable=on
performance.io-thread-count=64
performance.cache-size=1GB
brick-0=shchhv01-sto:-data-brick3-shchst01
brick-1=shchhv02-sto:-data-brick3-shchst01
brick-2=shchhv03-sto:-data-brick3-shchst01
brick-3=shchhv01-sto:-data-brick1-shchst01
brick-4=shchhv02-sto:-data-brick1-shchst01
brick-5=shchhv03-sto:-data-brick1-shchst01
brick-6=shchhv02-sto:-data-brick2-shchst01
brick-7=shchhv03-sto:-data-brick2-shchst01
brick-8=shchhv04-sto:-data-brick2-shchst01
brick-9=shchhv02-sto:-data-brick4-shchst01
brick-10=shchhv03-sto:-data-brick4-shchst01
brick-11=shchhv04-sto:-data-brick4-shchst01

NOTE

[root at shchhv01 shchst01]# gluster volume get shchst01 cluster.op-version
Warning: Support to get global option value using `volume get <volname>` will be deprecated from next release. Consider using `volume get all` instead for global options
Option                                  Value
------                                  -----
cluster.op-version                      30800

[root at shchhv02 shchst01]# gluster volume get shchst01 cluster.op-version
Option                                  Value
------                                  -----
cluster.op-version                      30800

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Ziemowit Pierzycki
Sent: Tuesday, December 19, 2017 3:56 PM
To: gluster-users <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Upgrading from Gluster 3.8 to 3.12

[Ziemowit's message of Dec 19 and the earlier thread, quoted in full above; snipped]
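A quick way to see the drift between the two <vol>/info dumps above is to diff
the peers' copies directly. A minimal sketch, assuming bash and password-less
root SSH to both storage hosts (hostnames and paths as in this thread); sorting
first hides the harmless ordering differences, so in this case only the extra
tier-enabled=0 line on the upgraded peer should show up:

  # Compare the glusterd store entry for the volume on the upgraded peer
  # against one of the rejected 3.8.4 peers.
  diff <(ssh shchhv01-sto 'sort /var/lib/glusterd/vols/shchst01/info') \
       <(ssh shchhv02-sto 'sort /var/lib/glusterd/vols/shchst01/info')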
Looks like a bug: I see that tier-enabled=0 is an additional entry in the info
file on shchhv01. As per the code, this field should be written into the
glusterd store if the op-version is >= 30706. My guess is that since 3.8.4
didn't have commit 33f8703a1 ("glusterd: regenerate volfiles on op-version bump
up"), the info and volfiles were not regenerated while bumping up the
op-version, which left the tier-enabled entry missing from the info file.

For now, you can copy the info file for the volumes where the mismatch happened
from shchhv01 to shchhv02 and restart the glusterd service on shchhv02. That
should fix this up temporarily. Unfortunately, this step might need to be
repeated for the other nodes as well.

@Hari - could you help in debugging this further?

On Wed, Dec 20, 2017 at 10:44 AM, Gustave Dahl <gustave at dahlfamily.net> wrote:
> [Gustave Dahl's message of Dec 20, quoted in full above; snipped]
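For completeness, a hedged sketch of that workaround using the volume name and
hostnames from this thread. It assumes a systemd-managed glusterd and root SSH
access from the rejected peer to the upgraded one, and it keeps a backup of the
old file:

  # Run on the rejected 3.8.4 peer (here shchhv02-sto).
  VOL=shchst01
  cp -a /var/lib/glusterd/vols/$VOL/info /var/lib/glusterd/vols/$VOL/info.bak
  scp shchhv01-sto:/var/lib/glusterd/vols/$VOL/info /var/lib/glusterd/vols/$VOL/info
  systemctl restart glusterd
  # The peer should leave "Peer Rejected" once the volume checksums agree again.
  gluster peer status

As noted above, the same copy-and-restart may have to be repeated on
shchhv03-sto and shchhv04-sto if they remain rejected.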