Gianluca Cecchi
2017-Jul-05 15:42 UTC
[Gluster-users] op-version for reset-brick (Was: Re: [ovirt-users] Upgrading HC from 4.0 to 4.1)
On Wed, Jul 5, 2017 at 5:22 PM, Atin Mukherjee <amukherj at redhat.com> wrote:

> And what does glusterd log indicate for these failures?

See here in gzip format:
https://drive.google.com/file/d/0BwoPbcrMv8mvYmlRLUgyV0pFN0k/view?usp=sharing

It seems that on each host the peer files have been updated with a new
entry "hostname2":

[root@ovirt01 ~]# cat /var/lib/glusterd/peers/*
uuid=b89311fe-257f-4e44-8e15-9bff6245d689
state=3
hostname1=ovirt02.localdomain.local
hostname2=10.10.2.103
uuid=ec81a04c-a19c-4d31-9d82-7543cefe79f3
state=3
hostname1=ovirt03.localdomain.local
hostname2=10.10.2.104
[root@ovirt01 ~]#

[root@ovirt02 ~]# cat /var/lib/glusterd/peers/*
uuid=e9717281-a356-42aa-a579-a4647a29a0bc
state=3
hostname1=ovirt01.localdomain.local
hostname2=10.10.2.102
uuid=ec81a04c-a19c-4d31-9d82-7543cefe79f3
state=3
hostname1=ovirt03.localdomain.local
hostname2=10.10.2.104
[root@ovirt02 ~]#

[root@ovirt03 ~]# cat /var/lib/glusterd/peers/*
uuid=b89311fe-257f-4e44-8e15-9bff6245d689
state=3
hostname1=ovirt02.localdomain.local
hostname2=10.10.2.103
uuid=e9717281-a356-42aa-a579-a4647a29a0bc
state=3
hostname1=ovirt01.localdomain.local
hostname2=10.10.2.102
[root@ovirt03 ~]#

But the gluster volume info on the second and third node has not been
updated accordingly: they have lost the ovirt01/gl01 brick information.

E.g. on ovirt02:

[root@ovirt02 peers]# gluster volume info export

Volume Name: export
Type: Replicate
Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
Status: Started
Snapshot Count: 0
Number of Bricks: 0 x (2 + 1) = 2
Transport-type: tcp
Bricks:
Brick1: ovirt02.localdomain.local:/gluster/brick3/export
Brick2: ovirt03.localdomain.local:/gluster/brick3/export
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on
[root@ovirt02 peers]#

And on ovirt03:

[root@ovirt03 ~]# gluster volume info export

Volume Name: export
Type: Replicate
Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
Status: Started
Snapshot Count: 0
Number of Bricks: 0 x (2 + 1) = 2
Transport-type: tcp
Bricks:
Brick1: ovirt02.localdomain.local:/gluster/brick3/export
Brick2: ovirt03.localdomain.local:/gluster/brick3/export
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on
[root@ovirt03 ~]#

While on ovirt01 it seems isolated...

[root@ovirt01 ~]# gluster volume info export

Volume Name: export
Type: Replicate
Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
Status: Started
Snapshot Count: 0
Number of Bricks: 0 x (2 + 1) = 1
Transport-type: tcp
Bricks:
Brick1: gl01.localdomain.local:/gluster/brick3/export
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on
[root@ovirt01 ~]#
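Since the thread subject is the op-version needed for reset-brick, a quick sanity check is to confirm the cluster is actually running at a level that supports it, and to compare what each glusterd has stored on disk. The commands below are a sketch only: they assume GlusterFS 3.10+ (where "volume get all" is available), root SSH access between the nodes, the hostnames and the "export" volume from this thread, and stock RPM paths under /var/lib/glusterd.

# Cluster-wide operating version currently in effect:
gluster volume get all cluster.op-version

# Highest op-version every peer in the cluster can support (3.10+ only):
gluster volume get all cluster.max-op-version

# reset-brick arrived with the 3.9.0 feature set, so the cluster op-version
# has to be at least that level; once all peers are upgraded it can be
# raised with something like:
# gluster volume set all cluster.op-version 31000

# Cross-node comparison of the brick list each glusterd has recorded for
# the "export" volume (assumes root SSH between the nodes):
for h in ovirt01 ovirt02 ovirt03; do
    echo "== $h =="
    ssh root@$h "grep '^brick-' /var/lib/glusterd/vols/export/info"
done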
Atin Mukherjee
2017-Jul-05 16:39 UTC
[Gluster-users] op-version for reset-brick (Was: Re: [ovirt-users] Upgrading HC from 4.0 to 4.1)
OK, so the log just hints at the following:

[2017-07-05 15:04:07.178204] E [MSGID: 106123] [glusterd-mgmt.c:1532:glusterd_mgmt_v3_commit] 0-management: Commit failed for operation Reset Brick on local node
[2017-07-05 15:04:07.178214] E [MSGID: 106123] [glusterd-replace-brick.c:649:glusterd_mgmt_v3_initiate_replace_brick_cmd_phases] 0-management: Commit Op Failed

Going through the code, these messages are produced when glusterd_op_reset_brick () fails. I don't see any error logs generated from glusterd_op_reset_brick () itself, which makes me think we failed at a place where the failure is only logged at debug level. Would you be able to restart the glusterd service in debug log mode, rerun this test and share the log?

On Wed, Jul 5, 2017 at 9:12 PM, Gianluca Cecchi <gianluca.cecchi at gmail.com> wrote:

> On Wed, Jul 5, 2017 at 5:22 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>
>> And what does glusterd log indicate for these failures?
>
> See here in gzip format:
> https://drive.google.com/file/d/0BwoPbcrMv8mvYmlRLUgyV0pFN0k/view?usp=sharing
>
> It seems that on each host the peer files have been updated with a new
> entry "hostname2":
>
> [root@ovirt01 ~]# cat /var/lib/glusterd/peers/*
> uuid=b89311fe-257f-4e44-8e15-9bff6245d689
> state=3
> hostname1=ovirt02.localdomain.local
> hostname2=10.10.2.103
> uuid=ec81a04c-a19c-4d31-9d82-7543cefe79f3
> state=3
> hostname1=ovirt03.localdomain.local
> hostname2=10.10.2.104
> [root@ovirt01 ~]#
>
> [root@ovirt02 ~]# cat /var/lib/glusterd/peers/*
> uuid=e9717281-a356-42aa-a579-a4647a29a0bc
> state=3
> hostname1=ovirt01.localdomain.local
> hostname2=10.10.2.102
> uuid=ec81a04c-a19c-4d31-9d82-7543cefe79f3
> state=3
> hostname1=ovirt03.localdomain.local
> hostname2=10.10.2.104
> [root@ovirt02 ~]#
>
> [root@ovirt03 ~]# cat /var/lib/glusterd/peers/*
> uuid=b89311fe-257f-4e44-8e15-9bff6245d689
> state=3
> hostname1=ovirt02.localdomain.local
> hostname2=10.10.2.103
> uuid=e9717281-a356-42aa-a579-a4647a29a0bc
> state=3
> hostname1=ovirt01.localdomain.local
> hostname2=10.10.2.102
> [root@ovirt03 ~]#
>
> But the gluster volume info on the second and third node has not been
> updated accordingly: they have lost the ovirt01/gl01 brick information.
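For the debug-log request above, one possible way to capture glusterd at DEBUG level on a systemd-based CentOS/RHEL host is sketched below. It is not the only route (the log level can also be set through the service's environment file, whose variable names vary between packagings, so check "glusterd --help" and /etc/sysconfig/glusterd on the node); the reset-brick lines use placeholders rather than the exact commands run in this thread.

# Stop the service and run glusterd by hand at debug verbosity:
systemctl stop glusterd
glusterd --log-level DEBUG     # daemonizes, logs to /var/log/glusterfs/glusterd.log
# (or: glusterd --debug        # stays in the foreground and logs to stderr)

# Re-run the failing sequence on the same node, substituting the real
# volume and brick names for the placeholders:
# gluster volume reset-brick <VOLNAME> <HOST:BRICKPATH> start
# gluster volume reset-brick <VOLNAME> <HOST:BRICKPATH> <NEWHOST:BRICKPATH> commit force

# Collect /var/log/glusterfs/glusterd.log (and cmd_history.log), then stop
# the hand-started daemon and bring the service back:
pkill glusterd
systemctl start glusterd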