Atin Mukherjee
2017-Jul-05 16:39 UTC
[Gluster-users] op-version for reset-brick (Was: Re: [ovirt-users] Upgrading HC from 4.0 to 4.1)
OK, so the log just hints at the following:

[2017-07-05 15:04:07.178204] E [MSGID: 106123] [glusterd-mgmt.c:1532:glusterd_mgmt_v3_commit] 0-management: Commit failed for operation Reset Brick on local node
[2017-07-05 15:04:07.178214] E [MSGID: 106123] [glusterd-replace-brick.c:649:glusterd_mgmt_v3_initiate_replace_brick_cmd_phases] 0-management: Commit Op Failed

While going through the code, glusterd_op_reset_brick () failed, resulting in these logs. I don't see any error logs generated from glusterd_op_reset_brick (), which makes me think that we have failed at a place where the failure is only logged at debug level. Would you be able to restart the glusterd service in debug log mode, re-run this test, and share the log?

On Wed, Jul 5, 2017 at 9:12 PM, Gianluca Cecchi <gianluca.cecchi at gmail.com> wrote:

> On Wed, Jul 5, 2017 at 5:22 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>
>> And what does glusterd log indicate for these failures?
>
> See here in gzip format:
> https://drive.google.com/file/d/0BwoPbcrMv8mvYmlRLUgyV0pFN0k/view?usp=sharing
>
> It seems that on each host the peer files have been updated with a new
> entry "hostname2":
>
> [root at ovirt01 ~]# cat /var/lib/glusterd/peers/*
> uuid=b89311fe-257f-4e44-8e15-9bff6245d689
> state=3
> hostname1=ovirt02.localdomain.local
> hostname2=10.10.2.103
> uuid=ec81a04c-a19c-4d31-9d82-7543cefe79f3
> state=3
> hostname1=ovirt03.localdomain.local
> hostname2=10.10.2.104
> [root at ovirt01 ~]#
>
> [root at ovirt02 ~]# cat /var/lib/glusterd/peers/*
> uuid=e9717281-a356-42aa-a579-a4647a29a0bc
> state=3
> hostname1=ovirt01.localdomain.local
> hostname2=10.10.2.102
> uuid=ec81a04c-a19c-4d31-9d82-7543cefe79f3
> state=3
> hostname1=ovirt03.localdomain.local
> hostname2=10.10.2.104
> [root at ovirt02 ~]#
>
> [root at ovirt03 ~]# cat /var/lib/glusterd/peers/*
> uuid=b89311fe-257f-4e44-8e15-9bff6245d689
> state=3
> hostname1=ovirt02.localdomain.local
> hostname2=10.10.2.103
> uuid=e9717281-a356-42aa-a579-a4647a29a0bc
> state=3
> hostname1=ovirt01.localdomain.local
> hostname2=10.10.2.102
> [root at ovirt03 ~]#
>
> But the gluster volume info on the second and third node has not been
> updated: those nodes have lost the ovirt01/gl01 host brick information...
>
> E.g. on ovirt02:
>
> [root at ovirt02 peers]# gluster volume info export
>
> Volume Name: export
> Type: Replicate
> Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 0 x (2 + 1) = 2
> Transport-type: tcp
> Bricks:
> Brick1: ovirt02.localdomain.local:/gluster/brick3/export
> Brick2: ovirt03.localdomain.local:/gluster/brick3/export
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: off
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard: on
> features.shard-block-size: 512MB
> performance.low-prio-threads: 32
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-wait-qlength: 10000
> cluster.shd-max-threads: 6
> network.ping-timeout: 30
> user.cifs: off
> nfs.disable: on
> performance.strict-o-direct: on
> [root at ovirt02 peers]#
>
> And on ovirt03:
>
> [root at ovirt03 ~]# gluster volume info export
>
> Volume Name: export
> Type: Replicate
> Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 0 x (2 + 1) = 2
> Transport-type: tcp
> Bricks:
> Brick1: ovirt02.localdomain.local:/gluster/brick3/export
> Brick2: ovirt03.localdomain.local:/gluster/brick3/export
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: off
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard: on
> features.shard-block-size: 512MB
> performance.low-prio-threads: 32
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-wait-qlength: 10000
> cluster.shd-max-threads: 6
> network.ping-timeout: 30
> user.cifs: off
> nfs.disable: on
> performance.strict-o-direct: on
> [root at ovirt03 ~]#
>
> While on ovirt01 it seems isolated...
>
> [root at ovirt01 ~]# gluster volume info export
>
> Volume Name: export
> Type: Replicate
> Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 0 x (2 + 1) = 1
> Transport-type: tcp
> Bricks:
> Brick1: gl01.localdomain.local:/gluster/brick3/export
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: off
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard: on
> features.shard-block-size: 512MB
> performance.low-prio-threads: 32
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-wait-qlength: 10000
> cluster.shd-max-threads: 6
> network.ping-timeout: 30
> user.cifs: off
> nfs.disable: on
> performance.strict-o-direct: on
> [root at ovirt01 ~]#
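A quick way to confirm this divergence directly from the shell is to compare what each glusterd has stored on disk for the volume. The loop below is only a sketch, not part of the original exchange: it assumes the standard /var/lib/glusterd layout and working root ssh between the nodes, with the host names taken from the outputs above.

# Compare the stored definition of volume "export" on every peer.
# In a consistent cluster the info-file checksums and brick lists match;
# here ovirt01 would presumably show a single gl01 brick file while
# ovirt02/ovirt03 show two, mirroring the volume info outputs above.
for h in ovirt01.localdomain.local ovirt02.localdomain.local ovirt03.localdomain.local; do
    echo "== $h =="
    ssh root@$h 'cksum /var/lib/glusterd/vols/export/info; ls /var/lib/glusterd/vols/export/bricks/'
done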
Gianluca Cecchi
2017-Jul-05 22:17 UTC
[Gluster-users] op-version for reset-brick (Was: Re: [ovirt-users] Upgrading HC from 4.0 to 4.1)
On Wed, Jul 5, 2017 at 6:39 PM, Atin Mukherjee <amukherj at redhat.com> wrote:

> OK, so the log just hints at the following:
>
> [2017-07-05 15:04:07.178204] E [MSGID: 106123] [glusterd-mgmt.c:1532:glusterd_mgmt_v3_commit]
> 0-management: Commit failed for operation Reset Brick on local node
> [2017-07-05 15:04:07.178214] E [MSGID: 106123]
> [glusterd-replace-brick.c:649:glusterd_mgmt_v3_initiate_replace_brick_cmd_phases]
> 0-management: Commit Op Failed
>
> While going through the code, glusterd_op_reset_brick () failed, resulting
> in these logs. I don't see any error logs generated from
> glusterd_op_reset_brick (), which makes me think that we have failed at a
> place where the failure is only logged at debug level. Would you be able to
> restart the glusterd service in debug log mode, re-run this test, and share
> the log?

What's the best way to set glusterd in debug mode?
Can I set it on this volume and work on it, even though it is now compromised?

I ask because I have tried this:

[root at ovirt01 ~]# gluster volume get export diagnostics.brick-log-level
Option                                  Value
------                                  -----
diagnostics.brick-log-level             INFO

[root at ovirt01 ~]# gluster volume set export diagnostics.brick-log-level DEBUG
volume set: failed: Error, Validation Failed
[root at ovirt01 ~]#

While on another volume that is in a good state, I can run:

[root at ovirt01 ~]# gluster volume set iso diagnostics.brick-log-level DEBUG
volume set: success
[root at ovirt01 ~]#
[root at ovirt01 ~]# gluster volume get iso diagnostics.brick-log-level
Option                                  Value
------                                  -----
diagnostics.brick-log-level             DEBUG

[root at ovirt01 ~]# gluster volume set iso diagnostics.brick-log-level INFO
volume set: success
[root at ovirt01 ~]#
[root at ovirt01 ~]# gluster volume get iso diagnostics.brick-log-level
Option                                  Value
------                                  -----
diagnostics.brick-log-level             INFO

[root at ovirt01 ~]#

Do you mean to run the reset-brick command for another volume or for the
same one? Can I run it against this "now broken" volume?

Or perhaps I can modify /usr/lib/systemd/system/glusterd.service and, in the
[Service] section, change

Environment="LOG_LEVEL=INFO"

to

Environment="LOG_LEVEL=DEBUG"

and then:

systemctl daemon-reload
systemctl restart glusterd

I think it would be better to keep gluster in debug mode for as little time
as possible, as there are other volumes active right now and I want to avoid
filling the filesystem that holds the log files. Best to put only some
components in debug mode, if possible, as in the example commands above.

Let me know, thanks
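For what it's worth, the LOG_LEVEL change does not have to be made in /usr/lib/systemd/system/glusterd.service itself: a systemd drop-in gives the same effect and is easy to revert once the log has been captured. A rough sketch, assuming the stock unit passes $LOG_LEVEL to glusterd as shown above:

# Raise glusterd to debug via a drop-in instead of editing the packaged unit
mkdir -p /etc/systemd/system/glusterd.service.d
printf '[Service]\nEnvironment="LOG_LEVEL=DEBUG"\n' > /etc/systemd/system/glusterd.service.d/debug.conf
systemctl daemon-reload
systemctl restart glusterd

# ... re-run the reset-brick test, then grab the glusterd log from /var/log/glusterfs/ ...

# Revert as soon as the log has been captured
rm /etc/systemd/system/glusterd.service.d/debug.conf
systemctl daemon-reload
systemctl restart glusterd

Restarting glusterd only bounces the management daemon, so, as far as I know, the running brick processes and client I/O on the other volumes should not be interrupted; keeping the drop-in in place only for the duration of the test also limits how much debug output ends up in the logs.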