Gianluca Cecchi
2017-Jul-05 15:42 UTC
[Gluster-users] op-version for reset-brick (Was: Re: [ovirt-users] Upgrading HC from 4.0 to 4.1)
On Wed, Jul 5, 2017 at 5:22 PM, Atin Mukherjee <amukherj at redhat.com> wrote:

> And what does glusterd log indicate for these failures?

See here in gzip format:
https://drive.google.com/file/d/0BwoPbcrMv8mvYmlRLUgyV0pFN0k/view?usp=sharing

It seems that on each host the peer files have been updated with a new
entry "hostname2":

[root@ovirt01 ~]# cat /var/lib/glusterd/peers/*
uuid=b89311fe-257f-4e44-8e15-9bff6245d689
state=3
hostname1=ovirt02.localdomain.local
hostname2=10.10.2.103
uuid=ec81a04c-a19c-4d31-9d82-7543cefe79f3
state=3
hostname1=ovirt03.localdomain.local
hostname2=10.10.2.104
[root@ovirt01 ~]#

[root@ovirt02 ~]# cat /var/lib/glusterd/peers/*
uuid=e9717281-a356-42aa-a579-a4647a29a0bc
state=3
hostname1=ovirt01.localdomain.local
hostname2=10.10.2.102
uuid=ec81a04c-a19c-4d31-9d82-7543cefe79f3
state=3
hostname1=ovirt03.localdomain.local
hostname2=10.10.2.104
[root@ovirt02 ~]#

[root@ovirt03 ~]# cat /var/lib/glusterd/peers/*
uuid=b89311fe-257f-4e44-8e15-9bff6245d689
state=3
hostname1=ovirt02.localdomain.local
hostname2=10.10.2.103
uuid=e9717281-a356-42aa-a579-a4647a29a0bc
state=3
hostname1=ovirt01.localdomain.local
hostname2=10.10.2.102
[root@ovirt03 ~]#

But the gluster volume info on the second and third node has not been
updated accordingly: they have lost the ovirt01/gl01 brick information.

E.g. on ovirt02:

[root@ovirt02 peers]# gluster volume info export

Volume Name: export
Type: Replicate
Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
Status: Started
Snapshot Count: 0
Number of Bricks: 0 x (2 + 1) = 2
Transport-type: tcp
Bricks:
Brick1: ovirt02.localdomain.local:/gluster/brick3/export
Brick2: ovirt03.localdomain.local:/gluster/brick3/export
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on
[root@ovirt02 peers]#

And on ovirt03:

[root@ovirt03 ~]# gluster volume info export

Volume Name: export
Type: Replicate
Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
Status: Started
Snapshot Count: 0
Number of Bricks: 0 x (2 + 1) = 2
Transport-type: tcp
Bricks:
Brick1: ovirt02.localdomain.local:/gluster/brick3/export
Brick2: ovirt03.localdomain.local:/gluster/brick3/export
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on
[root@ovirt03 ~]#

While on ovirt01 it seems isolated...

[root@ovirt01 ~]# gluster volume info export

Volume Name: export
Type: Replicate
Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
Status: Started
Snapshot Count: 0
Number of Bricks: 0 x (2 + 1) = 1
Transport-type: tcp
Bricks:
Brick1: gl01.localdomain.local:/gluster/brick3/export
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on
[root@ovirt01 ~]#
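Since the thread subject is the op-version needed for reset-brick, a quick sanity check is to confirm the cluster is actually running at a level that supports it, and to compare what each glusterd has stored on disk. The commands below are a sketch only: they assume GlusterFS 3.10+ (where "volume get all" is available), root SSH access between the nodes, the hostnames and the "export" volume from this thread, and stock RPM paths under /var/lib/glusterd.

# Cluster-wide operating version currently in effect:
gluster volume get all cluster.op-version

# Highest op-version every peer in the cluster can support (3.10+ only):
gluster volume get all cluster.max-op-version

# reset-brick arrived with the 3.9.0 feature set, so the cluster op-version
# has to be at least that level; once all peers are upgraded it can be
# raised with something like:
# gluster volume set all cluster.op-version 31000

# Cross-node comparison of the brick list each glusterd has recorded for
# the "export" volume (assumes root SSH between the nodes):
for h in ovirt01 ovirt02 ovirt03; do
    echo "== $h =="
    ssh root@$h "grep '^brick-' /var/lib/glusterd/vols/export/info"
done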
Atin Mukherjee
2017-Jul-05 16:39 UTC
[Gluster-users] op-version for reset-brick (Was: Re: [ovirt-users] Upgrading HC from 4.0 to 4.1)
OK, so the log just hints at the following:

[2017-07-05 15:04:07.178204] E [MSGID: 106123] [glusterd-mgmt.c:1532:glusterd_mgmt_v3_commit] 0-management: Commit failed for operation Reset Brick on local node
[2017-07-05 15:04:07.178214] E [MSGID: 106123] [glusterd-replace-brick.c:649:glusterd_mgmt_v3_initiate_replace_brick_cmd_phases] 0-management: Commit Op Failed

Going through the code, these messages are produced when glusterd_op_reset_brick () fails. I don't see any error logs generated from glusterd_op_reset_brick () itself, which makes me think we failed at a place where the failure is only logged at debug level. Would you be able to restart the glusterd service in debug log mode, rerun this test and share the log?

On Wed, Jul 5, 2017 at 9:12 PM, Gianluca Cecchi <gianluca.cecchi at gmail.com> wrote:

> On Wed, Jul 5, 2017 at 5:22 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>
>> And what does glusterd log indicate for these failures?
>
> See here in gzip format:
> https://drive.google.com/file/d/0BwoPbcrMv8mvYmlRLUgyV0pFN0k/view?usp=sharing
>
> It seems that on each host the peer files have been updated with a new
> entry "hostname2":
>
> [root@ovirt01 ~]# cat /var/lib/glusterd/peers/*
> uuid=b89311fe-257f-4e44-8e15-9bff6245d689
> state=3
> hostname1=ovirt02.localdomain.local
> hostname2=10.10.2.103
> uuid=ec81a04c-a19c-4d31-9d82-7543cefe79f3
> state=3
> hostname1=ovirt03.localdomain.local
> hostname2=10.10.2.104
> [root@ovirt01 ~]#
>
> [root@ovirt02 ~]# cat /var/lib/glusterd/peers/*
> uuid=e9717281-a356-42aa-a579-a4647a29a0bc
> state=3
> hostname1=ovirt01.localdomain.local
> hostname2=10.10.2.102
> uuid=ec81a04c-a19c-4d31-9d82-7543cefe79f3
> state=3
> hostname1=ovirt03.localdomain.local
> hostname2=10.10.2.104
> [root@ovirt02 ~]#
>
> [root@ovirt03 ~]# cat /var/lib/glusterd/peers/*
> uuid=b89311fe-257f-4e44-8e15-9bff6245d689
> state=3
> hostname1=ovirt02.localdomain.local
> hostname2=10.10.2.103
> uuid=e9717281-a356-42aa-a579-a4647a29a0bc
> state=3
> hostname1=ovirt01.localdomain.local
> hostname2=10.10.2.102
> [root@ovirt03 ~]#
>
> But the gluster volume info on the second and third node has not been
> updated accordingly: they have lost the ovirt01/gl01 brick information.
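For the debug-log request above, one possible way to capture glusterd at DEBUG level on a systemd-based CentOS/RHEL host is sketched below. It is not the only route (the log level can also be set through the service's environment file, whose variable names vary between packagings, so check "glusterd --help" and /etc/sysconfig/glusterd on the node); the reset-brick lines use placeholders rather than the exact commands run in this thread.

# Stop the service and run glusterd by hand at debug verbosity:
systemctl stop glusterd
glusterd --log-level DEBUG     # daemonizes, logs to /var/log/glusterfs/glusterd.log
# (or: glusterd --debug        # stays in the foreground and logs to stderr)

# Re-run the failing sequence on the same node, substituting the real
# volume and brick names for the placeholders:
# gluster volume reset-brick <VOLNAME> <HOST:BRICKPATH> start
# gluster volume reset-brick <VOLNAME> <HOST:BRICKPATH> <NEWHOST:BRICKPATH> commit force

# Collect /var/log/glusterfs/glusterd.log (and cmd_history.log), then stop
# the hand-started daemon and bring the service back:
pkill glusterd
systemctl start glusterd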