Hi,

We're debating upgrading from 3.5.x to 3.7.x soon on our 2x2 replica set,
and these upgrade issues are a bit worrying. Can I hear a few voices from
people who have had positive experiences? :)

Thanks,

Alan

On Fri, Oct 23, 2015 at 6:32 PM, JuanFra Rodríguez Cardoso
<jfrodriguez at keedio.com> wrote:

> I had that problem too, but I was not able to fix it. I was forced to
> downgrade to 3.7.4 to keep my gluster volumes running.
>
> The upgrade process (3.7.4 -> 3.7.5) does not seem fully reliable.
>
> Best.
>
> .....................................................................
> Juan Francisco Rodríguez Cardoso
> jfrodriguez at keedio.com | +34 636 69 26 91
> www.keedio.com
> .....................................................................
>
> On 16 October 2015 at 15:24, David Robinson <david.robinson at corvidtec.com> wrote:
>
>> That log was the frick one, which is the node that I upgraded. The frack
>> one is attached. One thing I did notice was the errors below in the etc
>> log file. The /usr/lib64/glusterfs/3.7.5 directory doesn't exist yet on
>> frack.
>>
>> +------------------------------------------------------------------------------+
>> [2015-10-16 12:04:06.235993] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
>> [2015-10-16 12:04:06.236036] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>> [2015-10-16 12:04:06.236099] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
>> [2015-10-16 12:04:09.242413] E [socket.c:2278:socket_connect_finish] 0-management: connection to 10.200.82.1:24007 failed (No route to host)
>> [2015-10-16 12:04:09.242504] I [MSGID: 106004] [glusterd-handler.c:5056:__glusterd_peer_rpc_notify] 0-management: Peer <frackib01.corvidtec.com> (<8ab9a966-d536-4bd1-828a-64b2d72c47ca>), in state <Peer in Cluster>, has disconnected from glusterd.
>> [2015-10-16 12:04:09.726895] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 14, Invalid argument
>> [2015-10-16 12:04:09.726918] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
>> [2015-10-16 12:04:09.902756] W [MSGID: 101095] [xlator.c:143:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.5/xlator/rpc-transport/socket.so: cannot open shared object file: No such file or directory
>>
>> ------ Original Message ------
>> From: "Mohammed Rafi K C" <rkavunga at redhat.com>
>> To: "David Robinson" <drobinson at corvidtec.com>; "gluster-users at gluster.org" <gluster-users at gluster.org>; "Gluster Devel" <gluster-devel at gluster.org>
>> Sent: 10/16/2015 8:43:21 AM
>> Subject: Re: [Gluster-devel] 3.7.5 upgrade issues
>>
>> Hi David,
>>
>> Are the logs you attached from node "frackib01.corvidtec.com"? If not,
>> can you attach the logs from that node?
>>
>> Regards,
>> Rafi KC
>>
>> On 10/16/2015 05:46 PM, David Robinson wrote:
>>
>> I have a replica pair setup that I was trying to upgrade from 3.7.4 to
>> 3.7.5. After upgrading the rpm packages (rpm -Uvh *.rpm) and rebooting
>> one of the nodes, I am now receiving the following:
>>
>> [root at frick01 log]# gluster volume status
>> Staging failed on frackib01.corvidtec.com. Please check log file for details.
>>
>> The logs are attached and my setup is shown below. Can anyone help?
>>
>> [root at frick01 log]# gluster volume info
>>
>> Volume Name: gfs
>> Type: Replicate
>> Volume ID: abc63b5c-bed7-4e3d-9057-00930a2d85d3
>> Status: Started
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp,rdma
>> Bricks:
>> Brick1: frickib01.corvidtec.com:/data/brick01/gfs
>> Brick2: frackib01.corvidtec.com:/data/brick01/gfs
>> Options Reconfigured:
>> storage.owner-gid: 100
>> server.allow-insecure: on
>> performance.readdir-ahead: on
>> server.event-threads: 4
>> client.event-threads: 4
>>
>> David

--
Alan Orth
alan.orth at gmail.com
https://alaninkenya.org
https://mjanja.ch
"In heaven all the interesting people are missing." -Friedrich Nietzsche
GPG public key ID: 0x8cb0d0acb5cd81ec209c6cdfbd1a0e09c2f836c0
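The last warning in David's log is telling: glusterd is trying to load a
translator from /usr/lib64/glusterfs/3.7.5/, a directory that only exists
once the 3.7.5 packages are fully installed on that node. A quick way to
catch this is to compare the version glusterd reports with the xlator tree
that is actually on disk. The following is a minimal sketch, assuming the
standard RPM layout under /usr/lib64/glusterfs/<version>/, not any official
check:

#!/bin/bash
# Confirm the installed xlator directory matches the running glusterfs
# version. Assumes the standard RPM layout used by the Gluster packages.

ver=$(gluster --version | awk 'NR==1 {print $2}')   # e.g. "3.7.5"
xlator_dir="/usr/lib64/glusterfs/${ver}/xlator"

if [ -d "$xlator_dir" ]; then
    echo "OK: ${xlator_dir} exists for glusterfs ${ver}"
else
    echo "MISSING: ${xlator_dir} - glusterd ${ver} cannot load its translators" >&2
    exit 1
fi

# The specific object from the log above should also be present:
ls -l "${xlator_dir}/rpc-transport/socket.so"

Running this on each node before restarting glusterd would show immediately
whether the new translators are in place.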
JuanFra Rodríguez Cardoso
2015-Oct-26 10:59 UTC
[Gluster-users] [Gluster-devel] 3.7.5 upgrade issues
I have replicated my upgrade scenario in a testing lab with the following
configuration:

Distributed gluster volume (one brick per node)
- Node gluster-1: glusterfs version 3.7.4
- Node gluster-2: glusterfs version 3.7.4
- Node gluster-3: glusterfs version 3.7.4

I began by upgrading only the first node to the newest version (3.7.5):

[root at gluster-1 ~]# gluster --version
glusterfs 3.7.5 built on Oct 7 2015 16:27:05

When I then requested the status of the gluster volume, I got these error
messages:

[root at gluster-1 ~]# gluster volume status
Staging failed on gluster-2. Please check log file for details.
Staging failed on gluster-3. Please check log file for details.

On node gluster-2, the tail of /var/log/glusterfs/etc-glusterfs-glusterd.vol.log shows:

[2015-10-26 10:50:16.378672] E [MSGID: 106062] [glusterd-volume-ops.c:1796:glusterd_op_stage_heal_volume] 0-glusterd: Unable to get volume name
[2015-10-26 10:50:16.378735] E [MSGID: 106301] [glusterd-op-sm.c:5171:glusterd_op_ac_stage_op] 0-management: Stage failed on operation 'Volume Heal', Status : -2

On the other hand, if I upgrade all the nodes at the same time, everything
seems to work fine! The issue only appears while the nodes are running
different versions (3.7.4 and 3.7.5).

Is this normal behavior? Is it necessary to stop the entire cluster?

Regards,

.....................................................................
Juan Francisco Rodríguez Cardoso
jfrodriguez at keedio.com | +34 636 69 26 91
www.keedio.com
.....................................................................

On 26 October 2015 at 11:48, Alan Orth <alan.orth at gmail.com> wrote:

> Hi,
>
> We're debating upgrading from 3.5.x to 3.7.x soon on our 2x2 replica set,
> and these upgrade issues are a bit worrying. Can I hear a few voices from
> people who have had positive experiences? :)
>
> Thanks,
>
> Alan
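Since everything works once all three nodes are on the same version, the
staging failures look like a symptom of mixed 3.7.4/3.7.5 peers rather than
a fault on any single node. Before issuing cluster-wide commands during a
rolling upgrade, it helps to confirm what every peer is actually running.
A small sketch, reusing the lab's node names and assuming root SSH access
to each node:

#!/bin/bash
# Audit the glusterfs version and operating op-version on every peer.
# Node names are taken from the lab above; adjust for your own cluster.

for node in gluster-1 gluster-2 gluster-3; do
    echo "== ${node} =="
    ssh "root@${node}" 'gluster --version | head -1'
    # glusterd records the op-version it is operating at in glusterd.info:
    ssh "root@${node}" 'grep operating-version /var/lib/glusterd/glusterd.info'
done

If the versions differ, that matches the condition under which the staging
errors above were reproduced.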
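As for whether the entire cluster must be stopped: point releases are
normally expected to support rolling upgrades, but given the failures
reported in this thread, the conservative option is to upgrade every node
within one maintenance window. A rough outline, assuming RPM-based installs
and an acceptable client outage; this is a sketch, not an official upgrade
procedure:

# On every node, during the same maintenance window:
service glusterd stop        # or: systemctl stop glusterd
pkill glusterfsd             # brick processes
pkill glusterfs              # any remaining gluster daemons

rpm -Uvh glusterfs-*.rpm     # same upgrade method David used

service glusterd start       # or: systemctl start glusterd
gluster peer status          # all peers should be back "in Cluster"
gluster volume status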