I had opened another thread on this mailing list (Subject: "After upgrade
from 3.4.2 to 3.8.5 - High CPU usage resulting in disconnects and
split-brain").

The title may be a bit misleading now, as I am no longer observing high
CPU usage after upgrading to 3.8.6, but the disconnects are still
happening and the number of files in split-brain is growing.

Setup: 6 compute nodes, each serving as a glusterfs server and client,
Ubuntu 14.04, two bricks per node, distribute-replicate.

I have two gluster volumes set up (one for scratch data, one for the slurm
scheduler). Only the scratch data volume shows critical errors "[...] has
not responded in the last 42 seconds, disconnecting.", so I can rule out
network problems: the gigabit link between the nodes is not saturated at
all, and the disks are almost idle (<10%).

I have glusterfs 3.4.2 on Ubuntu 12.04 on another compute cluster, running
fine since it was deployed.
I had glusterfs 3.4.2 on Ubuntu 14.04 on this cluster, running fine for
almost a year.

After upgrading to 3.8.5, the problems (as described) started. I would
like to use some of the new features of the newer versions (like bitrot),
but the users can't run their compute jobs right now because the result
files are garbled.

There also seems to be a bug report with a similar problem (but no
progress):
https://bugzilla.redhat.com/show_bug.cgi?id=1370683

For me, ALL servers are affected (not isolated to one or two servers).

I also see messages like "INFO: task gpu_graphene_bv:4476 blocked for more
than 120 seconds." in the syslog.

For completeness (gv0 is the scratch volume, gv2 the slurm volume):

[root at giant2: ~]# gluster v info

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 993ec7c9-e4bc-44d0-b7c4-2d977e622e86
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: giant1:/gluster/sdc/gv0
Brick2: giant2:/gluster/sdc/gv0
Brick3: giant3:/gluster/sdc/gv0
Brick4: giant4:/gluster/sdc/gv0
Brick5: giant5:/gluster/sdc/gv0
Brick6: giant6:/gluster/sdc/gv0
Brick7: giant1:/gluster/sdd/gv0
Brick8: giant2:/gluster/sdd/gv0
Brick9: giant3:/gluster/sdd/gv0
Brick10: giant4:/gluster/sdd/gv0
Brick11: giant5:/gluster/sdd/gv0
Brick12: giant6:/gluster/sdd/gv0
Options Reconfigured:
auth.allow: X.X.X.*,127.0.0.1
nfs.disable: on

Volume Name: gv2
Type: Replicate
Volume ID: 30c78928-5f2c-4671-becc-8deaee1a7a8d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: giant1:/gluster/sdd/gv2
Brick2: giant2:/gluster/sdd/gv2
Options Reconfigured:
auth.allow: X.X.X.*,127.0.0.1
cluster.granular-entry-heal: on
cluster.locking-scheme: granular
nfs.disable: on
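For anyone following along: the "42 seconds" in that error message is
glusterfs's default network.ping-timeout, and the split-brain count can be
watched from the standard CLI. A few commands that show the state (raising
the timeout is only a diagnostic experiment, not a fix; the value 60 is
arbitrary):

# current ping timeout -- 42 seconds is the default
gluster volume get gv0 network.ping-timeout
# list files currently in split-brain on the scratch volume
gluster volume heal gv0 info split-brain
# diagnostic only: give nodes more headroom before they are disconnected
gluster volume set gv0 network.ping-timeout 60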
2016-11-30 0:10 GMT+01:00 Micha Ober <micha2k at gmail.com>:

> 2016-11-29 19:21 GMT+01:00 Micha Ober <micha2k at gmail.com>:
>
>> 2016-11-29 18:53 GMT+01:00 Atin Mukherjee <amukherj at redhat.com>:
>>
>>> Would you be able to share what is not working for you in 3.8.x
>>> (mention the exact version)? 3.4 is quite old and falling back to an
>>> unsupported version doesn't look like a feasible option.
>>>
>>> On Tue, 29 Nov 2016 at 17:01, Micha Ober <micha2k at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I was using gluster 3.4 and upgraded to 3.8, but that version showed
>>>> itself to be unusable for me. I now need to downgrade.
>>>>
>>>> I'm running Ubuntu 14.04. As upgrades of the op version are
>>>> irreversible, I guess I have to delete all gluster volumes and
>>>> re-create them with the downgraded version:
>>>>
>>>> 0. Backup data
>>>> 1. Unmount all gluster volumes
>>>> 2. apt-get purge glusterfs-server glusterfs-client
>>>> 3. Remove PPA for 3.8
>>>> 4. Add PPA for older version
>>>> 5. apt-get install glusterfs-server glusterfs-client
>>>> 6. Create volumes
>>>>
>>>> Is "purge" enough to delete all configuration files of the currently
>>>> installed version, or do I need to manually clear some residues
>>>> before installing an older version?
>>>>
>>>> Thanks.
>>>
>>> --
>>> - Atin (atinm)
Mohammed Rafi K C
2016-Nov-30 05:57 UTC
[Gluster-users] RE: Frequent connect and disconnect messages flooded in logs
Hi Micha,

I have changed the thread and subject so that your original thread remains
intact for your query. Let's try to fix the problem you observed with
3.8.x, so I have started this new thread to discuss the frequent
disconnect problem.

*If anyone else has experienced the same problem, please respond to the
mail.*

It would be very helpful if you could give us some more logs from clients
and bricks. Also, any reproducible steps will surely help to chase the
problem further.
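In case it helps, these are the places a default installation keeps the
logs we usually ask for (paths assume the stock packaging; adjust if your
packages put them elsewhere):

# client (fuse mount) logs, one per mount point
ls /var/log/glusterfs/*.log
# brick logs, one per brick
ls /var/log/glusterfs/bricks/*.log
# a statedump taken around the time of a disconnect is also useful
gluster volume statedump gv0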
Regards

Rafi KC

On 11/30/2016 04:44 AM, Micha Ober wrote:

> I had opened another thread on this mailing list (Subject: "After
> upgrade from 3.4.2 to 3.8.5 - High CPU usage resulting in disconnects
> and split-brain"). The title may be a bit misleading now, as I am no
> longer observing high CPU usage after upgrading to 3.8.6, but the
> disconnects are still happening and the number of files in split-brain
> is growing.
> [...]
Micha Ober
2016-Dec-02 19:26 UTC
[Gluster-users] RE: Frequent connect and disconnect messages flooded in logs
** Update: **

I have downgraded from 3.8.6 to 3.7.17 now, but the problem still exists.

Client log: http://paste.ubuntu.com/23569065/
Brick log: http://paste.ubuntu.com/23569067/

Please note that each server has two bricks. According to the logs, one
brick loses the connection to all other hosts:

[2016-12-02 18:38:53.703301] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.219:49121 failed (Broken pipe)
[2016-12-02 18:38:53.703381] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.62:49118 failed (Broken pipe)
[2016-12-02 18:38:53.703380] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.107:49121 failed (Broken pipe)
[2016-12-02 18:38:53.703424] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.206:49120 failed (Broken pipe)
[2016-12-02 18:38:53.703359] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.58:49121 failed (Broken pipe)

The SECOND brick on the SAME host is NOT affected, i.e. no disconnects!
As I said, the network connection is fine and the disks are idle. The CPU
always has two free cores.

It looks like I have to downgrade to 3.4 now in order for the disconnects
to stop.
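Two sketches, in case they are useful to anyone hitting the same thing.
First, a quick way to confirm that only one brick per host is affected,
by counting the disconnect errors in each brick log (default log location
assumed):

for f in /var/log/glusterfs/bricks/*.log; do
    printf '%s: ' "$f"
    grep -c 'Broken pipe' "$f"
done

Second, my understanding of what a full cleanup before reinstalling 3.4
would look like on each node, since a plain "purge" removes packages and
their /etc files but (as far as I can tell) not the glusterd state
directory or the metadata on the bricks. This is only a sketch (brick
paths as in the volume info above); back up and verify the paths on your
own systems first:

apt-get purge glusterfs-server glusterfs-client glusterfs-common
rm -rf /var/lib/glusterd                # volume definitions, peers, op-version
# for every brick directory that will be reused:
setfattr -x trusted.glusterfs.volume-id /gluster/sdc/gv0
setfattr -x trusted.gfid /gluster/sdc/gv0
rm -rf /gluster/sdc/gv0/.glusterfs      # gluster's internal hardlink store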
- Micha

On 30.11.2016 at 06:57, Mohammed Rafi K C wrote:

> Hi Micha,
>
> I have changed the thread and subject so that your original thread
> remains intact for your query. Let's try to fix the problem you
> observed, so I have started a new thread to discuss the frequent
> disconnect problem.
>
> *If anyone else has experienced the same problem, please respond to
> the mail.*
>
> It would be very helpful if you could give us some more logs from
> clients and bricks. Also, any reproducible steps will surely help to
> chase the problem further.
>
> Regards
>
> Rafi KC
> [...]