JuanFra Rodríguez Cardoso
2015-Oct-26 10:59 UTC
[Gluster-users] [Gluster-devel] 3.7.5 upgrade issues
I have replicated my upgrade environment in a testing lab with the following configuration, a distributed Gluster volume (one brick per node):

- Node gluster-1: glusterfs version 3.7.4
- Node gluster-2: glusterfs version 3.7.4
- Node gluster-3: glusterfs version 3.7.4

I began by upgrading only the first node to the newest version (3.7.5):

[root at gluster-1 ~]# gluster --version
glusterfs 3.7.5 built on Oct 7 2015 16:27:05

When I then requested the status of the gluster volume, I got these error messages:

[root at gluster-1 ~]# gluster volume status
Staging failed on gluster-2. Please check log file for details.
Staging failed on gluster-3. Please check log file for details.

On node gluster-2, the tail of /var/log/glusterfs/etc-glusterfs-glusterd.vol.log shows:

[2015-10-26 10:50:16.378672] E [MSGID: 106062] [glusterd-volume-ops.c:1796:glusterd_op_stage_heal_volume] 0-glusterd: Unable to get volume name
[2015-10-26 10:50:16.378735] E [MSGID: 106301] [glusterd-op-sm.c:5171:glusterd_op_ac_stage_op] 0-management: Stage failed on operation 'Volume Heal', Status : -2

On the other hand, if I upgrade all the nodes at the same time, everything seems to work fine! The issue appears only while nodes run different versions (3.7.4 and 3.7.5). Is this normal behavior? Is it necessary to stop the entire cluster? (A sketch for comparing the versions running on each node follows this message.)

Regards,
.....................................................................
Juan Francisco Rodríguez Cardoso
jfrodriguez at keedio.com | +34 636 69 26 91
www.keedio.com
.....................................................................

On 26 October 2015 at 11:48, Alan Orth <alan.orth at gmail.com> wrote:

> Hi,
>
> We're debating updating from 3.5.x to 3.7.x soon on our 2x2 replica set,
> and these upgrade issues are a bit worrying. Can I hear a few voices from
> people who have had positive experiences? :)
>
> Thanks,
>
> Alan
>
> On Fri, Oct 23, 2015 at 6:32 PM, JuanFra Rodríguez Cardoso <
> jfrodriguez at keedio.com> wrote:
>
>> I had that problem too, but I was not able to fix it. I was forced to
>> downgrade to 3.7.4 to keep my gluster volumes running.
>>
>> The upgrade process (3.7.4 -> 3.7.5) does not seem fully reliable.
>>
>> Best.
>>
>> .....................................................................
>> Juan Francisco Rodríguez Cardoso
>> jfrodriguez at keedio.com | +34 636 69 26 91
>> www.keedio.com
>> .....................................................................
>>
>> On 16 October 2015 at 15:24, David Robinson <david.robinson at corvidtec.com> wrote:
>>
>>> That log was the frick one, which is the node that I upgraded. The
>>> frack one is attached. One thing I did notice was the errors below in
>>> the etc log file. The /usr/lib64/glusterfs/3.7.5 directory doesn't
>>> exist yet on frack.
>>>
>>> +------------------------------------------------------------------------------+
>>> [2015-10-16 12:04:06.235993] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
>>> [2015-10-16 12:04:06.236036] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>>> [2015-10-16 12:04:06.236099] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
>>> [2015-10-16 12:04:09.242413] E [socket.c:2278:socket_connect_finish] 0-management: connection to 10.200.82.1:24007 failed (No route to host)
>>> [2015-10-16 12:04:09.242504] I [MSGID: 106004] [glusterd-handler.c:5056:__glusterd_peer_rpc_notify] 0-management: Peer <frackib01.corvidtec.com> (<8ab9a966-d536-4bd1-828a-64b2d72c47ca>), in state <Peer in Cluster>, has disconnected from glusterd.
>>> [2015-10-16 12:04:09.726895] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 14, Invalid argument
>>> [2015-10-16 12:04:09.726918] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
>>> [2015-10-16 12:04:09.902756] W [MSGID: 101095] [xlator.c:143:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.5/xlator/rpc-transport/socket.so: cannot open shared object file: No such file or directory
>>>
>>> ------ Original Message ------
>>> From: "Mohammed Rafi K C" <rkavunga at redhat.com>
>>> To: "David Robinson" <drobinson at corvidtec.com>; "gluster-users at gluster.org" <gluster-users at gluster.org>; "Gluster Devel" <gluster-devel at gluster.org>
>>> Sent: 10/16/2015 8:43:21 AM
>>> Subject: Re: [Gluster-devel] 3.7.5 upgrade issues
>>>
>>> Hi David,
>>>
>>> Are the logs you attached from node "frackib01.corvidtec.com"? If not,
>>> can you attach logs from that node?
>>>
>>> Regards,
>>> Rafi KC
>>>
>>> On 10/16/2015 05:46 PM, David Robinson wrote:
>>>
>>> I have a replica pair that I was trying to upgrade from 3.7.4 to 3.7.5.
>>> After upgrading the rpm packages (rpm -Uvh *.rpm) and rebooting one of
>>> the nodes, I am now receiving the following:
>>>
>>> [root at frick01 log]# gluster volume status
>>> Staging failed on frackib01.corvidtec.com. Please check log file for details.
>>>
>>> The logs are attached and my setup is shown below. Can anyone help?
>>>
>>> [root at frick01 log]# gluster volume info
>>>
>>> Volume Name: gfs
>>> Type: Replicate
>>> Volume ID: abc63b5c-bed7-4e3d-9057-00930a2d85d3
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp,rdma
>>> Bricks:
>>> Brick1: frickib01.corvidtec.com:/data/brick01/gfs
>>> Brick2: frackib01.corvidtec.com:/data/brick01/gfs
>>> Options Reconfigured:
>>> storage.owner-gid: 100
>>> server.allow-insecure: on
>>> performance.readdir-ahead: on
>>> server.event-threads: 4
>>> client.event-threads: 4
>>>
>>> David
>
> --
> Alan Orth
> alan.orth at gmail.com
> https://alaninkenya.org
> https://mjanja.ch
> "In heaven all the interesting people are missing." -Friedrich Nietzsche
> GPG public key ID: 0x8cb0d0acb5cd81ec209c6cdfbd1a0e09c2f836c0
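The sketch referenced in JuanFra's message above: a minimal way to confirm what each node is actually running before taking the next node in a rolling upgrade. This is not from the original thread; it assumes the node names gluster-1..3 from the lab setup and working SSH between the nodes, and it reads the operating version from /var/lib/glusterd/glusterd.info, the file where glusterd records it.

    #!/bin/sh
    # Minimal sketch: report each node's glusterfs build and glusterd
    # operating-version during a rolling upgrade. The node names are
    # assumptions taken from the lab setup described above.
    for node in gluster-1 gluster-2 gluster-3; do
        echo "== $node =="
        # glusterfs package/build installed on that node
        ssh "$node" 'gluster --version | head -n 1'
        # operating version glusterd records on disk
        ssh "$node" 'grep operating-version /var/lib/glusterd/glusterd.info'
    done

If the builds differ while the cluster is half-upgraded, that is exactly the mixed-version window in which the staging failures above were observed.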
Raghavendra Talur
2015-Oct-28 12:53 UTC
[Gluster-users] [Gluster-devel] 3.7.5 upgrade issues
I have filed a bug for this on bugzilla. Here is the link:
https://bugzilla.redhat.com/show_bug.cgi?id=1276029

Please cc yourself for updates on the bug.

Thanks,
Raghavendra Talur
Here is an update to this issue. Gaurav Garg (in Cc) has identified the root cause, and the fix [1] has been posted for review in mainline. Once it is merged we will backport it and push it for 3.7.6.

The issue originated from introducing new enums in the middle of the enum structure. This shifted the numeric values of the entries that follow, so the receiving glusterd decoded a different operation than the sender intended, and commands failed. The fix is to move these new enums to the end of the structure. However, this will not repair the 3.7.5 to 3.7.6 upgrade path, as the same mismatch will occur in that case too; if you upgrade the complete cluster, the issue goes away. We could have chosen to maintain two different enum structures (one for pre-3.7.6 and one for >= 3.7.6), but that makes the code redundant and, more importantly, ugly. So we chose the first option, moving the new enums to the end.

Another BZ will be raised to mark the 3.7.5 to 3.7.6 upgrade issue as a known issue, and the same will be captured in the release notes. From 3.7.7 onward the upgrade path will be smooth.

[1] http://review.gluster.org/#/c/12473/

Thanks,
Atin

On 10/28/2015 06:23 PM, Raghavendra Talur wrote:
> I have filed a bug for this on bugzilla.
> Here is the link https://bugzilla.redhat.com/show_bug.cgi?id=1276029.
>
> Please cc yourself for updates on the bug.
>
> Thanks,
> Raghavendra Talur
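To make the failure mode concrete, here is a minimal C sketch of the numbering problem Atin describes. The identifiers are invented for illustration only; the real enum lives in glusterd's sources, and the actual operations involved are not spelled out in this thread.

    /*
     * Hypothetical illustration: these names are invented and are NOT
     * the actual glusterd enums. Only the numbering matters.
     */

    /* Roughly the situation in 3.7.4: */
    enum gd_op_374 {
        GD_OP_NONE_374 = 0,
        GD_OP_CREATE_VOLUME_374,   /* 1 */
        GD_OP_STATUS_VOLUME_374,   /* 2 */
        GD_OP_HEAL_VOLUME_374      /* 3 */
    };

    /* 3.7.5 inserted a new operation in the middle: */
    enum gd_op_375 {
        GD_OP_NONE_375 = 0,
        GD_OP_CREATE_VOLUME_375,   /* 1 */
        GD_OP_NEW_FEATURE_375,     /* 2, newly inserted */
        GD_OP_STATUS_VOLUME_375,   /* 3, was 2 in 3.7.4 */
        GD_OP_HEAL_VOLUME_375      /* 4, was 3 in 3.7.4 */
    };

    /*
     * A 3.7.5 node staging "volume status" now puts 3 on the wire,
     * which a 3.7.4 peer decodes as its own value 3, a different
     * operation entirely. That kind of shift is consistent with the
     * logs above, where "gluster volume status" on the upgraded node
     * produced "Stage failed on operation 'Volume Heal'" on the old
     * peers. Appending new values keeps existing numbers stable:
     */
    enum gd_op_fixed {
        GD_OP_NONE = 0,
        GD_OP_CREATE_VOLUME,
        GD_OP_STATUS_VOLUME,
        GD_OP_HEAL_VOLUME,
        GD_OP_NEW_FEATURE          /* appended; old values unchanged */
    };

This is also why appending only helps future upgrades: a 3.7.5 cluster already uses the shifted numbers, so the 3.7.5 to 3.7.6 path still mismatches, as Atin notes.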