hi all

this:

$ gluster vol heal $_vol info

outputs OK and the exit code is 0. But if I want to see statistics:

$ gluster vol heal $_vol statistics
Gathering crawl statistics on volume GROUP-WORK has been unsuccessful on bricks that are down. Please check if all brick processes are running.

I suspect gluster's inability to cope with a situation where one peer (which is not even a brick for a single vol on the cluster!) is inaccessible to the rest of the cluster.
I have not played with any other variations of this case, e.g. more than one peer going down, etc.
But I hope someone could try to replicate this simple test case.

Cluster and vols, when something like this happens, seem accessible and as such "all" works, except when you want more details. This also fails:

$ gluster vol status $_vol detail
Error : Request timed out

My gluster (3.10.5-1.el7.x86_64) exhibits these symptoms every time at least one peer goes out of reach of the rest.

maybe @devel can comment?

many thanks, L.
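For anyone trying to replicate: a minimal sketch of the test, assuming a volume named GROUP-WORK (substitute your own) and a pool where one peer has just become unreachable; the commands are the ones quoted above, only the exit-code echoes are added for illustration.

#!/bin/sh
# Minimal reproduction sketch (assumes one peer in the pool is unreachable).
# GROUP-WORK is the volume name from the error message above; substitute your own.
_vol=GROUP-WORK

gluster vol heal "$_vol" info
echo "heal info exit code: $?"          # reported as 0 even in this state

gluster vol heal "$_vol" statistics
echo "heal statistics exit code: $?"    # fails: "...unsuccessful on bricks that are down"

gluster vol status "$_vol" detail
echo "status detail exit code: $?"      # times out: "Error : Request timed out"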
Atin Mukherjee
2017-Sep-04 10:47 UTC
[Gluster-users] heal info OK but statistics not working
Please provide the output of gluster volume info, gluster volume status and gluster peer status.

On Mon, Sep 4, 2017 at 4:07 PM, lejeczek <peljasz at yahoo.co.uk> wrote:

> hi all
>
> this:
> $ gluster vol heal $_vol info
> outputs OK and the exit code is 0. But if I want to see statistics:
> $ gluster vol heal $_vol statistics
> Gathering crawl statistics on volume GROUP-WORK has been unsuccessful on
> bricks that are down. Please check if all brick processes are running.
> [...]
1) One peer, out of four, got separated from the network, i.e. from the rest of the cluster.
2) That peer (while it was unavailable) got detached with the "gluster peer detach" command, which succeeded, so the cluster now comprises three peers.
3) The self-heal daemon (for some reason) does not start on the peer which had probed that fourth peer, even with an attempt to restart glusterd (see the command sketch at the end of this message).
4) The fourth, unavailable peer is still up & running but is inaccessible to the other peers because the network is disconnected/segmented. That peer's gluster peer status shows the others as still in the cluster.
5) So the fourth peer's gluster stack (and its other processes) did not fail or crash; only the network got, and stays, disconnected.
6) gluster peer status shows ok & connected for the current three peers.

This is the third time it has happened to me, in the very same way: each time the net-disjointed peer was brought back online, statistics & details worked again.
Can you not reproduce it?

$ gluster vol info QEMU-VMs

Volume Name: QEMU-VMs
Type: Replicate
Volume ID: 8709782a-daa5-4434-a816-c4e0aef8fef2
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.5.6.32:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
Brick2: 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
Brick3: 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
storage.owner-gid: 107
storage.owner-uid: 107
performance.readdir-ahead: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on

$ gluster vol status QEMU-VMs
Status of volume: QEMU-VMs
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUS
TERs/0GLUSTER-QEMU-VMs                      49156     0          Y       9302
Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUS
TERs/0GLUSTER-QEMU-VMs                      49156     0          Y       7610
Brick 10.5.6.100:/__.aLocalStorages/0/0-GLU
STERs/0GLUSTER-QEMU-VMs                     49156     0          Y       11013
Self-heal Daemon on localhost               N/A       N/A        Y       3069276
Self-heal Daemon on 10.5.6.32               N/A       N/A        Y       3315870
Self-heal Daemon on 10.5.6.49               N/A       N/A        N       N/A   <--- HERE
Self-heal Daemon on 10.5.6.17               N/A       N/A        Y       5163

Task Status of Volume QEMU-VMs
------------------------------------------------------------------------------
There are no active volume tasks

$ gluster vol heal QEMU-VMs statistics heal-count
Gathering count of entries to be healed on volume QEMU-VMs has been unsuccessful on bricks that are down. Please check if all brick processes are running.

On 04/09/17 11:47, Atin Mukherjee wrote:
> Please provide the output of gluster volume info, gluster
> volume status and gluster peer status.
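For completeness, a minimal sketch of the commands behind steps 2, 3 and 6 above, run against the QEMU-VMs volume from the output; the detach target is a placeholder, and the glusterd restart plus "volume start ... force" attempt to respawn the self-heal daemon is a general workaround, not something confirmed in this thread.

#!/bin/sh
# Sketch only -- <unreachable-peer> is a placeholder; using "volume start ... force"
# to respawn missing self-heal/brick processes is an assumption here, not advice
# taken from this thread.

gluster peer status                          # step 6: remaining peers show "Connected"

# step 2: drop the net-disjointed peer from the pool ("force" because it
# cannot be reached any more)
gluster peer detach <unreachable-peer> force

# step 3: on the peer whose self-heal daemon stays down (10.5.6.49 above)
systemctl restart glusterd
gluster volume start QEMU-VMs force          # respawns missing brick/SHD processes

gluster volume status QEMU-VMs               # check the Self-heal Daemon row again
gluster vol heal QEMU-VMs statistics heal-count   # re-test the failing command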