JuanFra Rodríguez Cardoso
2015-Oct-26 10:59 UTC
[Gluster-users] [Gluster-devel] 3.7.5 upgrade issues
I have replicated my upgrade environment in a testing lab with the following configuration, a distributed Gluster volume (one brick per node):

- Node gluster-1: glusterfs version 3.7.4
- Node gluster-2: glusterfs version 3.7.4
- Node gluster-3: glusterfs version 3.7.4

I began by upgrading only the first node to the newest version (3.7.5):

[root at gluster-1 ~]# gluster --version
glusterfs 3.7.5 built on Oct 7 2015 16:27:05

When I then requested the status of the gluster volume, I got these error messages:

[root at gluster-1 ~]# gluster volume status
Staging failed on gluster-2. Please check log file for details.
Staging failed on gluster-3. Please check log file for details.

On node gluster-2, the tail of /var/log/glusterfs/etc-glusterfs-glusterd.vol.log shows:

[2015-10-26 10:50:16.378672] E [MSGID: 106062] [glusterd-volume-ops.c:1796:glusterd_op_stage_heal_volume] 0-glusterd: Unable to get volume name
[2015-10-26 10:50:16.378735] E [MSGID: 106301] [glusterd-op-sm.c:5171:glusterd_op_ac_stage_op] 0-management: Stage failed on operation 'Volume Heal', Status : -2

On the other hand, if I upgrade all the nodes at the same time, everything seems to work fine! The issue appears only while nodes run different versions (3.7.4 and 3.7.5). Is this normal behavior? Is it necessary to stop the entire cluster? (A sketch for comparing the versions running on each node follows this message.)

Regards,
.....................................................................
Juan Francisco Rodríguez Cardoso
jfrodriguez at keedio.com | +34 636 69 26 91
www.keedio.com
.....................................................................

On 26 October 2015 at 11:48, Alan Orth <alan.orth at gmail.com> wrote:

> Hi,
>
> We're debating updating from 3.5.x to 3.7.x soon on our 2x2 replica set,
> and these upgrade issues are a bit worrying. Can I hear a few voices from
> people who have had positive experiences? :)
>
> Thanks,
>
> Alan
>
> On Fri, Oct 23, 2015 at 6:32 PM, JuanFra Rodríguez Cardoso <
> jfrodriguez at keedio.com> wrote:
>
>> I had that problem too, but I was not able to fix it. I was forced to
>> downgrade to 3.7.4 to keep my gluster volumes running.
>>
>> The upgrade process (3.7.4 -> 3.7.5) does not seem fully reliable.
>>
>> Best.
>>
>> .....................................................................
>> Juan Francisco Rodríguez Cardoso
>> jfrodriguez at keedio.com | +34 636 69 26 91
>> www.keedio.com
>> .....................................................................
>>
>> On 16 October 2015 at 15:24, David Robinson <david.robinson at corvidtec.com> wrote:
>>
>>> That log was the frick one, which is the node that I upgraded. The
>>> frack one is attached. One thing I did notice was the errors below in
>>> the etc log file. The /usr/lib64/glusterfs/3.7.5 directory doesn't
>>> exist yet on frack.
>>>
>>> +------------------------------------------------------------------------------+
>>> [2015-10-16 12:04:06.235993] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
>>> [2015-10-16 12:04:06.236036] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>>> [2015-10-16 12:04:06.236099] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
>>> [2015-10-16 12:04:09.242413] E [socket.c:2278:socket_connect_finish] 0-management: connection to 10.200.82.1:24007 failed (No route to host)
>>> [2015-10-16 12:04:09.242504] I [MSGID: 106004] [glusterd-handler.c:5056:__glusterd_peer_rpc_notify] 0-management: Peer <frackib01.corvidtec.com> (<8ab9a966-d536-4bd1-828a-64b2d72c47ca>), in state <Peer in Cluster>, has disconnected from glusterd.
>>> [2015-10-16 12:04:09.726895] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 14, Invalid argument
>>> [2015-10-16 12:04:09.726918] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
>>> [2015-10-16 12:04:09.902756] W [MSGID: 101095] [xlator.c:143:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.5/xlator/rpc-transport/socket.so: cannot open shared object file: No such file or directory
>>>
>>> ------ Original Message ------
>>> From: "Mohammed Rafi K C" <rkavunga at redhat.com>
>>> To: "David Robinson" <drobinson at corvidtec.com>; "gluster-users at gluster.org" <gluster-users at gluster.org>; "Gluster Devel" <gluster-devel at gluster.org>
>>> Sent: 10/16/2015 8:43:21 AM
>>> Subject: Re: [Gluster-devel] 3.7.5 upgrade issues
>>>
>>> Hi David,
>>>
>>> Are the logs you attached from node "frackib01.corvidtec.com"? If not,
>>> can you attach logs from that node?
>>>
>>> Regards,
>>> Rafi KC
>>>
>>> On 10/16/2015 05:46 PM, David Robinson wrote:
>>>
>>> I have a replica pair that I was trying to upgrade from 3.7.4 to 3.7.5.
>>> After upgrading the rpm packages (rpm -Uvh *.rpm) and rebooting one of
>>> the nodes, I am now receiving the following:
>>>
>>> [root at frick01 log]# gluster volume status
>>> Staging failed on frackib01.corvidtec.com. Please check log file for details.
>>>
>>> The logs are attached and my setup is shown below. Can anyone help?
>>>
>>> [root at frick01 log]# gluster volume info
>>>
>>> Volume Name: gfs
>>> Type: Replicate
>>> Volume ID: abc63b5c-bed7-4e3d-9057-00930a2d85d3
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp,rdma
>>> Bricks:
>>> Brick1: frickib01.corvidtec.com:/data/brick01/gfs
>>> Brick2: frackib01.corvidtec.com:/data/brick01/gfs
>>> Options Reconfigured:
>>> storage.owner-gid: 100
>>> server.allow-insecure: on
>>> performance.readdir-ahead: on
>>> server.event-threads: 4
>>> client.event-threads: 4
>>>
>>> David
>
> --
> Alan Orth
> alan.orth at gmail.com
> https://alaninkenya.org
> https://mjanja.ch
> "In heaven all the interesting people are missing." -Friedrich Nietzsche
> GPG public key ID: 0x8cb0d0acb5cd81ec209c6cdfbd1a0e09c2f836c0
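The sketch referenced in JuanFra's message above: a minimal way to confirm what each node is actually running before taking the next node in a rolling upgrade. This is not from the original thread; it assumes the node names gluster-1..3 from the lab setup and working SSH between the nodes, and it reads the operating version from /var/lib/glusterd/glusterd.info, the file where glusterd records it.

    #!/bin/sh
    # Minimal sketch: report each node's glusterfs build and glusterd
    # operating-version during a rolling upgrade. The node names are
    # assumptions taken from the lab setup described above.
    for node in gluster-1 gluster-2 gluster-3; do
        echo "== $node =="
        # glusterfs package/build installed on that node
        ssh "$node" 'gluster --version | head -n 1'
        # operating version glusterd records on disk
        ssh "$node" 'grep operating-version /var/lib/glusterd/glusterd.info'
    done

If the builds differ while the cluster is half-upgraded, that is exactly the mixed-version window in which the staging failures above were observed.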
Raghavendra Talur
2015-Oct-28 12:53 UTC
[Gluster-users] [Gluster-devel] 3.7.5 upgrade issues
I have filed a bug for this on bugzilla. Here is the link:
https://bugzilla.redhat.com/show_bug.cgi?id=1276029

Please cc yourself for updates on the bug.

Thanks,
Raghavendra Talur
Here is an update to this issue. Gaurav Garg (in Cc) has identified the root cause, and the fix [1] has been posted for review in mainline. Once it is merged we will backport it and push it for 3.7.6.

The issue originated from introducing new enums in the middle of the enum structure. This shifted the numeric values of the entries that follow, so the receiving glusterd decoded a different operation than the sender intended, and commands failed. The fix is to move these new enums to the end of the structure. However, this will not repair the 3.7.5 to 3.7.6 upgrade path, as the same mismatch will occur in that case too; if you upgrade the complete cluster, the issue goes away. We could have chosen to maintain two different enum structures (one for pre-3.7.6 and one for >= 3.7.6), but that makes the code redundant and, more importantly, ugly. So we chose the first option, moving the new enums to the end.

Another BZ will be raised to mark the 3.7.5 to 3.7.6 upgrade issue as a known issue, and the same will be captured in the release notes. From 3.7.7 onward the upgrade path will be smooth.

[1] http://review.gluster.org/#/c/12473/

Thanks,
Atin

On 10/28/2015 06:23 PM, Raghavendra Talur wrote:
> I have filed a bug for this on bugzilla.
> Here is the link https://bugzilla.redhat.com/show_bug.cgi?id=1276029.
>
> Please cc yourself for updates on the bug.
>
> Thanks,
> Raghavendra Talur
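To make the failure mode concrete, here is a minimal C sketch of the numbering problem Atin describes. The identifiers are invented for illustration only; the real enum lives in glusterd's sources, and the actual operations involved are not spelled out in this thread.

    /*
     * Hypothetical illustration: these names are invented and are NOT
     * the actual glusterd enums. Only the numbering matters.
     */

    /* Roughly the situation in 3.7.4: */
    enum gd_op_374 {
        GD_OP_NONE_374 = 0,
        GD_OP_CREATE_VOLUME_374,   /* 1 */
        GD_OP_STATUS_VOLUME_374,   /* 2 */
        GD_OP_HEAL_VOLUME_374      /* 3 */
    };

    /* 3.7.5 inserted a new operation in the middle: */
    enum gd_op_375 {
        GD_OP_NONE_375 = 0,
        GD_OP_CREATE_VOLUME_375,   /* 1 */
        GD_OP_NEW_FEATURE_375,     /* 2, newly inserted */
        GD_OP_STATUS_VOLUME_375,   /* 3, was 2 in 3.7.4 */
        GD_OP_HEAL_VOLUME_375      /* 4, was 3 in 3.7.4 */
    };

    /*
     * A 3.7.5 node staging "volume status" now puts 3 on the wire,
     * which a 3.7.4 peer decodes as its own value 3, a different
     * operation entirely. That kind of shift is consistent with the
     * logs above, where "gluster volume status" on the upgraded node
     * produced "Stage failed on operation 'Volume Heal'" on the old
     * peers. Appending new values keeps existing numbers stable:
     */
    enum gd_op_fixed {
        GD_OP_NONE = 0,
        GD_OP_CREATE_VOLUME,
        GD_OP_STATUS_VOLUME,
        GD_OP_HEAL_VOLUME,
        GD_OP_NEW_FEATURE          /* appended; old values unchanged */
    };

This is also why appending only helps future upgrades: a 3.7.5 cluster already uses the shifted numbers, so the 3.7.5 to 3.7.6 path still mismatches, as Atin notes.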