Karthik Subrahmanya
2019-Apr-16 12:19 UTC
[Gluster-users] Volume stuck unable to add a brick
Hi Boris,

Thank you for providing the logs.

The problem here is caused by the "auth.allow: 127.0.0.1" setting on the volume. When you add a new brick, the replication module internally creates a temporary mount and uses it to set some metadata on the existing bricks, marking the pending heal onto the new brick. Because of the auth.allow setting, that temporary mount is denied access, as seen in the logs below, leading to the add-brick failure.

From data-gluster-dockervols.log-webserver9:

[2019-04-15 14:00:34.226838] I [addr.c:55:compare_addr_and_update] 0-/data/gluster/dockervols: allowed = "127.0.0.1", received addr = "192.168.200.147"
[2019-04-15 14:00:34.226895] E [MSGID: 115004] [authenticate.c:224:gf_authenticate] 0-auth: no authentication module is interested in accepting remote-client (null)
[2019-04-15 14:00:34.227129] E [MSGID: 115001] [server-handshake.c:848:server_setvolume] 0-dockervols-server: Cannot authenticate client from webserver8.cast.org-55674-2019/04/15-14:00:20:495333-dockervols-client-2-0-0 3.12.2 [Permission denied]

From dockervols-add-brick-mount.log:

[2019-04-15 14:00:20.672033] W [MSGID: 114043] [client-handshake.c:1109:client_setvolume_cbk] 0-dockervols-client-2: failed to set the volume [Permission denied]
[2019-04-15 14:00:20.672102] W [MSGID: 114007] [client-handshake.c:1138:client_setvolume_cbk] 0-dockervols-client-2: failed to get 'process-uuid' from reply dict [Invalid argument]
[2019-04-15 14:00:20.672129] E [MSGID: 114044] [client-handshake.c:1144:client_setvolume_cbk] 0-dockervols-client-2: SETVOLUME on remote-host failed: Authentication failed [Permission denied]
[2019-04-15 14:00:20.672151] I [MSGID: 114049] [client-handshake.c:1258:client_setvolume_cbk] 0-dockervols-client-2: sending AUTH_FAILED event

This is a known issue and we are planning to fix it. For the time being there is a workaround:

- Before adding the brick, set the auth.allow option back to its default, i.e. "*", or do this by running "gluster v reset <volname> auth.allow".
- Add the brick.
- After it succeeds, set the auth.allow option back to its previous value.

Regards,
Karthik
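A minimal sketch of that workaround, using the volume name "dockervols", the new brick webserver8:/data/gluster/dockervols, and the previous auth.allow value of 127.0.0.1 from this thread (adjust the names and value for your own setup):

    # 1. Temporarily reset auth.allow to its default ("*")
    sudo gluster volume reset dockervols auth.allow

    # 2. Add the new brick, raising the replica count
    sudo gluster volume add-brick dockervols replica 4 webserver8:/data/gluster/dockervols force

    # 3. Once the add-brick succeeds, restore the previous auth.allow value
    sudo gluster volume set dockervols auth.allow 127.0.0.1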
On Tue, Apr 16, 2019 at 5:20 PM Boris Goldowsky <bgoldowsky at cast.org> wrote:

> OK, log files attached.
>
> Boris
>
> From: Karthik Subrahmanya <ksubrahm at redhat.com>
> Date: Tuesday, April 16, 2019 at 2:52 AM
> To: Atin Mukherjee <atin.mukherjee83 at gmail.com>, Boris Goldowsky <bgoldowsky at cast.org>
> Cc: Gluster-users <gluster-users at gluster.org>
> Subject: Re: [Gluster-users] Volume stuck unable to add a brick
>
> On Mon, Apr 15, 2019 at 9:43 PM Atin Mukherjee <atin.mukherjee83 at gmail.com> wrote:
>
> +Karthik Subrahmanya <ksubrahm at redhat.com>
>
> Didn't we fix this problem recently? "Failed to set extended attribute" indicates that the temp mount is failing and we don't have a quorum number of bricks up.
>
> We had two fixes, handling two kinds of add-brick scenarios:
>
> [1] Fail add-brick when increasing the replica count if any of the bricks is down, to avoid data loss. This can be overridden by using the force option.
>
> [2] Allow add-brick to set the extended attributes through the temp mount if the volume is already mounted (has clients).
>
> They are on version 3.12.2, so patch [1] is present there. But since they are using the force option it should not be a problem even if a brick is down. The error message they are getting is also different, so it is not because of any brick being down, I guess.
>
> Patch [2] is not present in 3.12.2, and this is not a conversion from a plain distribute to a replicate volume, so that scenario is different here. It seems like they are hitting some other issue.
>
> @Boris,
> Can you attach the add-brick's temp mount log? The file name should look something like "dockervols-add-brick-mount.log". Can you also provide all the brick logs of that volume from that time?
>
> [1] https://review.gluster.org/#/c/glusterfs/+/16330/
> [2] https://review.gluster.org/#/c/glusterfs/+/21791/
>
> Regards,
> Karthik
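For reference, the scenario patch [1] guards against is a brick being offline while the replica count is raised. A minimal sketch of how to confirm that every brick of a volume is online before attempting an add-brick, using the "dockervols" volume from this thread:

    # each brick should show "Y" in the Online column of the output
    sudo gluster volume status dockervols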
> Boris - What gluster version are you using?
>
> On Mon, Apr 15, 2019 at 7:35 PM Boris Goldowsky <bgoldowsky at cast.org> wrote:
>
> Atin, thank you for the reply. Here are all of those pieces of information:
>
> [bgoldowsky at webserver9 ~]$ gluster --version
> glusterfs 3.12.2
> (same on all nodes)
>
> [bgoldowsky at webserver9 ~]$ sudo gluster peer status
> Number of Peers: 3
>
> Hostname: webserver11.cast.org
> Uuid: c2b147fd-cab4-4859-9922-db5730f8549d
> State: Peer in Cluster (Connected)
>
> Hostname: webserver1.cast.org
> Uuid: 4b918f65-2c9d-478e-8648-81d1d6526d4c
> State: Peer in Cluster (Connected)
> Other names:
> 192.168.200.131
> webserver1
>
> Hostname: webserver8.cast.org
> Uuid: be2f568b-61c5-4016-9264-083e4e6453a2
> State: Peer in Cluster (Connected)
> Other names:
> webserver8
>
> [bgoldowsky at webserver1 ~]$ sudo gluster v info
>
> Volume Name: dockervols
> Type: Replicate
> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: webserver1:/data/gluster/dockervols
> Brick2: webserver11:/data/gluster/dockervols
> Brick3: webserver9:/data/gluster/dockervols
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
> auth.allow: 127.0.0.1
>
> Volume Name: testvol
> Type: Replicate
> Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: webserver1:/data/gluster/testvol
> Brick2: webserver9:/data/gluster/testvol
> Brick3: webserver11:/data/gluster/testvol
> Brick4: webserver8:/data/gluster/testvol
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
>
> [bgoldowsky at webserver8 ~]$ sudo gluster v info
>
> Volume Name: dockervols
> Type: Replicate
> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: webserver1:/data/gluster/dockervols
> Brick2: webserver11:/data/gluster/dockervols
> Brick3: webserver9:/data/gluster/dockervols
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
> auth.allow: 127.0.0.1
>
> Volume Name: testvol
> Type: Replicate
> Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: webserver1:/data/gluster/testvol
> Brick2: webserver9:/data/gluster/testvol
> Brick3: webserver11:/data/gluster/testvol
> Brick4: webserver8:/data/gluster/testvol
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
>
> [bgoldowsky at webserver9 ~]$ sudo gluster v info
>
> Volume Name: dockervols
> Type: Replicate
> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: webserver1:/data/gluster/dockervols
> Brick2: webserver11:/data/gluster/dockervols
> Brick3: webserver9:/data/gluster/dockervols
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
> auth.allow: 127.0.0.1
>
> Volume Name: testvol
> Type: Replicate
> Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: webserver1:/data/gluster/testvol
> Brick2: webserver9:/data/gluster/testvol
> Brick3: webserver11:/data/gluster/testvol
> Brick4: webserver8:/data/gluster/testvol
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
>
> [bgoldowsky at webserver11 ~]$ sudo gluster v info
>
> Volume Name: dockervols
> Type: Replicate
> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: webserver1:/data/gluster/dockervols
> Brick2: webserver11:/data/gluster/dockervols
> Brick3: webserver9:/data/gluster/dockervols
> Options Reconfigured:
> auth.allow: 127.0.0.1
> transport.address-family: inet
> nfs.disable: on
>
> Volume Name: testvol
> Type: Replicate
> Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: webserver1:/data/gluster/testvol
> Brick2: webserver9:/data/gluster/testvol
> Brick3: webserver11:/data/gluster/testvol
> Brick4: webserver8:/data/gluster/testvol
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
>
> [bgoldowsky at webserver9 ~]$ sudo gluster volume add-brick dockervols replica 4 webserver8:/data/gluster/dockervols force
> volume add-brick: failed: Commit failed on webserver8.cast.org. Please check log file for details.
> Webserver8 glusterd.log:
>
> [2019-04-15 13:55:42.338197] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
> The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-04-15 13:55:42.338197] and [2019-04-15 13:55:42.341618]
> [2019-04-15 14:00:20.445011] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) [0x7fe697764215] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) [0x7fe69780de9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fe6a2d16ea5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=dockervols --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd
> [2019-04-15 14:00:20.445148] I [MSGID: 106578] [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: replica-count is set 4
> [2019-04-15 14:00:20.445184] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it
> [2019-04-15 14:00:20.672347] E [MSGID: 106054] [glusterd-utils.c:13863:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Transport endpoint is not connected [Transport endpoint is not connected]
> [2019-04-15 14:00:20.693491] E [MSGID: 101042] [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /tmp/mntmvdFGq [Transport endpoint is not connected]
> [2019-04-15 14:00:20.693597] E [MSGID: 106074] [glusterd-brick-ops.c:2590:glusterd_op_add_brick] 0-glusterd: Unable to add bricks
> [2019-04-15 14:00:20.693637] E [MSGID: 106123] [glusterd-mgmt.c:312:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit failed.
> [2019-04-15 14:00:20.693667] E [MSGID: 106123] [glusterd-mgmt-handler.c:616:glusterd_handle_commit_fn] 0-management: commit failed on operation Add brick
>
> Webserver11 log file:
>
> [2019-04-15 13:56:29.563270] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
> The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-04-15 13:56:29.563270] and [2019-04-15 13:56:29.566209]
> [2019-04-15 14:00:33.996866] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) [0x7f36de924215] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) [0x7f36de9cde9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f36e9ed6ea5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=dockervols --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd
> [2019-04-15 14:00:33.996979] I [MSGID: 106578] [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: replica-count is set 4
> [2019-04-15 14:00:33.997004] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it
> [2019-04-15 14:00:34.013789] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: nfs already stopped
> [2019-04-15 14:00:34.013849] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is stopped
> [2019-04-15 14:00:34.017535] I [MSGID: 106568] [glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping glustershd daemon running in pid: 6087
> [2019-04-15 14:00:35.018783] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: glustershd service is stopped
> [2019-04-15 14:00:35.018952] I [MSGID: 106567] [glusterd-svc-mgmt.c:211:glusterd_svc_start] 0-management: Starting glustershd service
> [2019-04-15 14:00:35.028306] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: bitd already stopped
> [2019-04-15 14:00:35.028408] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: bitd service is stopped
> [2019-04-15 14:00:35.028601] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: scrub already stopped
> [2019-04-15 14:00:35.028645] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: scrub service is stopped
>
> Thank you for taking a look!
>
> Boris
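The excerpts above come from glusterd's log on each peer; the temporary mount used by add-brick writes its own log as well (the "dockervols-add-brick-mount.log" file mentioned earlier). A minimal sketch for gathering the related entries on one node, assuming the usual default log location of /var/log/glusterfs (adjust the path if your installation logs elsewhere):

    # collect add-brick and authentication failures from all gluster logs on this node
    sudo grep -rH "add-brick\|Permission denied\|AUTH_FAILED" /var/log/glusterfs/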
> From: Atin Mukherjee <atin.mukherjee83 at gmail.com>
> Date: Friday, April 12, 2019 at 1:10 PM
> To: Boris Goldowsky <bgoldowsky at cast.org>
> Cc: Gluster-users <gluster-users at gluster.org>
> Subject: Re: [Gluster-users] Volume stuck unable to add a brick
>
> On Fri, 12 Apr 2019 at 22:32, Boris Goldowsky <bgoldowsky at cast.org> wrote:
>
> I've got a replicated volume with three bricks ("1 x 3 = 3"); the idea is to have a common set of files that are locally available on all the machines (Scientific Linux 7, which is essentially CentOS 7) in a cluster.
>
> I tried to add on a fourth machine, so used a command like this:
>
> sudo gluster volume add-brick dockervols replica 4 webserver8:/data/gluster/dockervols force
>
> but the result is:
>
> volume add-brick: failed: Commit failed on webserver1. Please check log file for details.
> Commit failed on webserver8. Please check log file for details.
> Commit failed on webserver11. Please check log file for details.
>
> Tried: removing the new brick (this also fails) and trying again.
> Tried: checking the logs. The log files are not enlightening to me; I don't know what's normal and what's not.
>
> From webserver8 & webserver11 could you attach the glusterd log files?
>
> Also please share the following:
> - gluster version (gluster --version)
> - Output of "gluster peer status"
> - Output of "gluster v info" from all 4 nodes.
>
> Tried: deleting the brick directory from the previous attempt, so that it's not in the way.
> Tried: restarting gluster services.
> Tried: rebooting.
> Tried: setting up a new volume, replicated to all four machines. This works, so I'm assuming it's not a networking issue. But it still fails with this existing volume that has the critical data in it.
>
> Running out of ideas. Any suggestions? Thank you!
>
> Boris
>
> --
> --Atin
That worked! Thank you SO much!

Boris
Karthik Subrahmanya
2019-Apr-16 15:26 UTC
[Gluster-users] Volume stuck unable to add a brick
You're welcome!

On Tue 16 Apr, 2019, 7:12 PM Boris Goldowsky, <bgoldowsky at cast.org> wrote:

> That worked! Thank you SO much!
>
> Boris