Hi David,
On 23/05/19 3:54 AM, David Cunningham wrote:
> Hi Ravi,
>
> Please see the log attached.
When I `grep -E "Connected to |disconnected from"`
gvol0-add-brick-mount.log, I don't see a "Connected to
gvol0-client-1".
It looks like this temporary mount is not able to connect to the 2nd
brick, which is why the lookup is failing due to lack of quorum.
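For reference, this is the exact check I ran against the attached log. (On the
servers themselves the client logs normally live under /var/log/glusterfs/, so
the path below is an assumption about your setup.)

    # Each brick the temporary add-brick mount reaches logs a
    # "Connected to gvol0-client-N" line; client-1 never shows up here.
    grep -E "Connected to |disconnected from" \
        /var/log/glusterfs/gvol0-add-brick-mount.log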

> The output of "gluster volume status" is as follows. Should there
> be something listening on gfs3? I'm not sure whether it having TCP Port
> and Pid as N/A is a symptom or cause. Thank you.
>
> # gluster volume status
> Status of volume: gvol0
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick gfs1:/nodirectwritedata/gluster/gvol0 49152     0          Y       7706
> Brick gfs2:/nodirectwritedata/gluster/gvol0 49152     0          Y       7624
> Brick gfs3:/nodirectwritedata/gluster/gvol0 N/A       N/A        N       N/A
Can you see if the following steps help?
1. Do a `setfattr -n trusted.afr.gvol0-client-2 -v
0x000000000000000100000001 /nodirectwritedata/gluster/gvol0` on *both*
gfs1 and gfs2.
2. Run `gluster volume start gvol0 force` (the commands for steps 1 to 3
are collected below).
3. Check if Brick-3 now comes online with a valid TCP port and PID. If
it doesn't, check the brick log under /var/log/glusterfs/bricks on gfs3
to see why.
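
Putting the steps together as a sketch (nothing new here: the xattr name,
value and brick path are exactly the ones above; the status call just
re-checks step 3):

    # Step 1: on *both* gfs1 and gfs2 -- marks pending heals against the
    # new arbiter brick (gvol0-client-2) so self-heal treats it as stale.
    setfattr -n trusted.afr.gvol0-client-2 -v 0x000000000000000100000001 \
        /nodirectwritedata/gluster/gvol0

    # Step 2: from any node -- force-start so glusterd (re)spawns the
    # brick process on gfs3.
    gluster volume start gvol0 force

    # Step 3: Brick gfs3:/nodirectwritedata/gluster/gvol0 should now show
    # a valid TCP port and PID.
    gluster volume status gvol0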
Thanks,
Ravi
> Self-heal Daemon on localhost               N/A       N/A        Y       19853
> Self-heal Daemon on gfs1                    N/A       N/A        Y       28600
> Self-heal Daemon on gfs2                    N/A       N/A        Y       17614
>
> Task Status of Volume gvol0
>
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> On Wed, 22 May 2019 at 18:06, Ravishankar N <ravishankar at redhat.com> wrote:
>
>     If you are trying this again, please `gluster volume set $volname
>     client-log-level DEBUG` before attempting the add-brick and attach
> the gvol0-add-brick-mount.log here. After that, you can change the
> client-log-level back to INFO.
>
> -Ravi
>
> On 22/05/19 11:32 AM, Ravishankar N wrote:
>>
>>
>> On 22/05/19 11:23 AM, David Cunningham wrote:
>>> Hi Ravi,
>>>
>>> I'd already done exactly that before, where step 3 was a simple
>>> 'rm -rf /nodirectwritedata/gluster/gvol0'. Have you another
>>> suggestion on what the cleanup or reformat should be?
>> `rm -rf /nodirectwritedata/gluster/gvol0` does look okay to me
>> David. Basically, '/nodirectwritedata/gluster/gvol0' must be
>> empty and must not have any extended attributes set on it. Why
>> fuse_first_lookup() is failing is a bit of a mystery to me at
>> this point. :-(
>> Regards,
>> Ravi
>>>
>>> Thank you.
>>>
>>>
>>> On Wed, 22 May 2019 at 13:56, Ravishankar N
>>> <ravishankar at redhat.com> wrote:
>>>
>>> Hmm, so the volume info seems to indicate that the add-brick
>>> was successful but the gfid xattr is missing on the new
>>> was successful but the gfid xattr is missing on the new
>>> brick (as are the actual files, barring the .glusterfs
>>> folder, according to your previous mail).
>>>
>>> Do you want to try removing and adding it again?
>>>
>>> 1. `gluster volume remove-brick gvol0 replica 2
>>> gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1
>>>
>>> 2. Check that gluster volume info is now back to a 1x2
>>> volume on all nodes and `gluster peer status` is connected
>>> on all nodes.
>>>
>>> 3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on
>>> gfs3.
>>>
>>> 4. `gluster volume add-brick gvol0 replica 3 arbiter 1
>>> gfs3:/nodirectwritedata/gluster/gvol0` from gfs1.
>>>
>>> 5. Check that the files are getting healed on to the new
>>> brick.
>>>
>>> Thanks,
>>> Ravi
>>> On 22/05/19 6:50 AM, David Cunningham wrote:
>>>> Hi Ravi,
>>>>
>>>> Certainly. On the existing two nodes:
>>>>
>>>> gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: nodirectwritedata/gluster/gvol0
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.gvol0-client-2=0x000000000000000000000000
>>>> trusted.gfid=0x00000000000000000000000000000001
>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
>>>>
>>>> gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: nodirectwritedata/gluster/gvol0
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.gvol0-client-0=0x000000000000000000000000
>>>> trusted.afr.gvol0-client-2=0x000000000000000000000000
>>>> trusted.gfid=0x00000000000000000000000000000001
>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
>>>>
>>>> On the new node:
>>>>
>>>> gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: nodirectwritedata/gluster/gvol0
>>>> trusted.afr.dirty=0x000000000000000000000001
>>>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
>>>>
>>>> Output of "gluster volume info" is the same on all 3 nodes
>>>> and is:
>>>>
>>>> # gluster volume info
>>>>
>>>> Volume Name: gvol0
>>>> Type: Replicate
>>>> Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 1 x (2 + 1) = 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: gfs1:/nodirectwritedata/gluster/gvol0
>>>> Brick2: gfs2:/nodirectwritedata/gluster/gvol0
>>>> Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter)
>>>> Options Reconfigured:
>>>> performance.client-io-threads: off
>>>> nfs.disable: on
>>>> transport.address-family: inet
>>>>
>>>>
>>>> On Wed, 22 May 2019 at 12:43, Ravishankar N
>>>> <ravishankar at redhat.com> wrote:
>>>>
>>>> Hi David,
>>>> Could you provide the `getfattr -d -m. -e hex
>>>> /nodirectwritedata/gluster/gvol0` output of all bricks
>>>> and the output of `gluster volume info`?
>>>>
>>>> Thanks,
>>>> Ravi
>>>> On 22/05/19 4:57 AM, David Cunningham wrote:
>>>>> Hi Sanju,
>>>>>
>>>>> Here's what glusterd.log says on the new arbiter
>>>>> server when trying to add the node:
>>>>>
>>>>> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log]
>>>>> (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd)
>>>>> [0x7fe4ca9102cd]
>>>>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85)
>>>>> [0x7fe4ca9bbb85]
>>>>> -->/lib64/libglusterfs.so.0(runner_log+0x115)
>>>>> [0x7fe4d5ecc955] ) 0-management: Ran script:
>>>>> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh
>>>>> --volname=gvol0 --version=1 --volume-op=add-brick
>>>>> --gd-workdir=/var/lib/glusterd
>>>>> [2019-05-22 00:15:05.963177] I [MSGID: 106578]
>>>>> [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks]
>>>>> 0-management: replica-count is set 3
>>>>> [2019-05-22 00:15:05.963228] I [MSGID: 106578]
>>>>> [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks]
>>>>> 0-management: arbiter-count is set 1
>>>>> [2019-05-22 00:15:05.963257] I [MSGID: 106578]
>>>>> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks]
>>>>> 0-management: type is set 0, need to change it
>>>>> [2019-05-22 00:15:17.015268] E [MSGID: 106053]
>>>>> [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops]
>>>>> 0-management: Failed to set extended attribute
>>>>> trusted.add-brick : Transport endpoint is not
>>>>> connected [Transport endpoint is not connected]
>>>>> [2019-05-22 00:15:17.036479] E [MSGID: 106073]
>>>>> [glusterd-brick-ops.c:2595:glusterd_op_add_brick]
>>>>> 0-glusterd: Unable to add bricks
>>>>> [2019-05-22 00:15:17.036595] E [MSGID: 106122]
>>>>> [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn]
>>>>> 0-management: Add-brick commit failed.
>>>>> [2019-05-22 00:15:17.036710] E [MSGID: 106122]
>>>>> [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn]
>>>>> 0-management: commit failed on operation Add brick
>>>>>
>>>>> As before gvol0-add-brick-mount.log said:
>>>>>
>>>>> [2019-05-22 00:15:17.005695] I
>>>>> [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE
>>>>> inited with protocol versions: glusterfs 7.24 kernel 7.22
>>>>> [2019-05-22 00:15:17.005749] I
>>>>> [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched
>>>>> to graph 0
>>>>> [2019-05-22 00:15:17.010101] E
>>>>> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first
>>>>> lookup on root failed (Transport endpoint is not
>>>>> connected)
>>>>> [2019-05-22 00:15:17.014217] W
>>>>> [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 2:
>>>>> LOOKUP() / => -1 (Transport endpoint is not connected)
>>>>> [2019-05-22 00:15:17.015097] W
>>>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse:
>>>>> 00000000-0000-0000-0000-000000000001: failed to
>>>>> resolve (Transport endpoint is not connected)
>>>>> [2019-05-22 00:15:17.015158] W
>>>>> [fuse-bridge.c:3294:fuse_setxattr_resume]
>>>>> 0-glusterfs-fuse: 3: SETXATTR
>>>>> 00000000-0000-0000-0000-000000000001/1
>>>>> (trusted.add-brick) resolution failed
>>>>> [2019-05-22 00:15:17.035636] I
>>>>> [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse:
>>>>> initating unmount of /tmp/mntYGNbj9
>>>>> [2019-05-22 00:15:17.035854] W
>>>>> [glusterfsd.c:1500:cleanup_and_exit]
>>>>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5]
>>>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5)
>>>>> [0x55c81b63de75]
>>>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b)
>>>>> [0x55c81b63dceb] ) 0-: received signum (15), shutting down
>>>>> [2019-05-22 00:15:17.035942] I
>>>>> [fuse-bridge.c:5914:fini] 0-fuse: Unmounting
>>>>> '/tmp/mntYGNbj9'.
>>>>> [2019-05-22 00:15:17.035966] I
>>>>> [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse
>>>>> connection to '/tmp/mntYGNbj9'.
>>>>>
>>>>> Here are the processes running on the new arbiter server:
>>>>> # ps -ef | grep gluster
>>>>> root      3466     1  0 20:13 ?        00:00:00
>>>>> /usr/sbin/glusterfs -s localhost --volfile-id
>>>>> gluster/glustershd -p
>>>>> /var/run/gluster/glustershd/glustershd.pid -l
>>>>> /var/log/glusterfs/glustershd.log -S
>>>>> /var/run/gluster/24c12b09f93eec8e.socket
>>>>> --xlator-option
>>>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412
>>>>> --process-name glustershd
>>>>> root      6832     1  0 May16 ?        00:02:10
>>>>> /usr/sbin/glusterd -p /var/run/glusterd.pid
>>>>> --log-level INFO
>>>>> root     17841     1  0 May16 ?        00:00:58
>>>>> /usr/sbin/glusterfs --process-name fuse
>>>>> --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs
>>>>>
>>>>> Here are the files created on the new arbiter server:
>>>>> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald
>>>>> drwxr-xr-x 3 root root 4096 May 21 20:15
>>>>> /nodirectwritedata/gluster/gvol0
>>>>> drw------- 2 root root 4096 May 21 20:15
>>>>> /nodirectwritedata/gluster/gvol0/.glusterfs
>>>>>
>>>>> Thank you for your help!
>>>>>
>>>>>
>>>>> On Tue, 21 May 2019 at 00:10, Sanju Rakonde
>>>>> <srakonde at redhat.com> wrote:
>>>>>
>>>>> David,
>>>>>
>>>>> can you please attach glusterd.logs? As the error
>>>>> message says, Commit failed on the arbiter node,
>>>>> we might be able to find some issue on that node.
>>>>>
>>>>> On Mon, May 20, 2019 at 10:10 AM Nithya
>>>>> Balachandran <nbalacha at redhat.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 17 May 2019 at 06:01, David Cunningham
>>>>> <dcunningham at voisonics.com> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> We're adding an arbiter node to an
>>>>> existing volume and having an issue. Can
>>>>> anyone help? The root cause error appears
>>>>> to be "00000000-0000-0000-0000-000000000001:
>>>>> failed to resolve (Transport endpoint is
>>>>> not connected)", as below.
>>>>>
>>>>> We are running glusterfs 5.6.1. Thanks in
>>>>> advance for any assistance!
>>>>>
>>>>> On existing node gfs1, trying to add new
>>>>> arbiter node gfs3:
>>>>>
>>>>> # gluster volume add-brick gvol0 replica 3
>>>>> arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0
>>>>> volume add-brick: failed: Commit failed on
>>>>> gfs3. Please check log file for details.
>>>>>
>>>>>
>>>>> This looks like a glusterd issue. Please check
>>>>> the glusterd logs for more info.
>>>>> Adding the glusterd dev to this thread. Sanju,
>>>>> can you take a look?
>>>>> Regards,
>>>>> Nithya
>>>>>
>>>>>
>>>>> On new node gfs3 in gvol0-add-brick-mount.log:
>>>>>
>>>>> [2019-05-17 01:20:22.689721] I
>>>>> [fuse-bridge.c:4267:fuse_init]
>>>>> 0-glusterfs-fuse: FUSE inited with
>>>>> protocol versions: glusterfs 7.24 kernel 7.22
>>>>> [2019-05-17 01:20:22.689778] I
>>>>> [fuse-bridge.c:4878:fuse_graph_sync]
>>>>> 0-fuse: switched to graph 0
>>>>> [2019-05-17 01:20:22.694897] E
>>>>> [fuse-bridge.c:4336:fuse_first_lookup]
>>>>> 0-fuse: first lookup on root failed
>>>>> (Transport endpoint is not connected)
>>>>> [2019-05-17 01:20:22.699770] W
>>>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk]
>>>>> 0-fuse:
>>>>> 00000000-0000-0000-0000-000000000001:
>>>>> failed to resolve (Transport endpoint is
>>>>> not connected)
>>>>> [2019-05-17 01:20:22.699834] W
>>>>> [fuse-bridge.c:3294:fuse_setxattr_resume]
>>>>> 0-glusterfs-fuse: 2: SETXATTR
>>>>> 00000000-0000-0000-0000-000000000001/1
>>>>> (trusted.add-brick) resolution failed
>>>>> [2019-05-17 01:20:22.715656] I
>>>>> [fuse-bridge.c:5144:fuse_thread_proc]
>>>>> 0-fuse: initating unmount of /tmp/mntQAtu3f
>>>>> [2019-05-17 01:20:22.715865] W
>>>>> [glusterfsd.c:1500:cleanup_and_exit]
>>>>> (-->/lib64/libpthread.so.0(+0x7dd5)
>>>>> [0x7fb223bf6dd5]
>>>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5)
>>>>> [0x560886581e75]
>>>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b)
>>>>> [0x560886581ceb] ) 0-: received signum
>>>>> (15), shutting down
>>>>> [2019-05-17 01:20:22.715926] I
>>>>> [fuse-bridge.c:5914:fini] 0-fuse:
>>>>> Unmounting '/tmp/mntQAtu3f'.
>>>>> [2019-05-17 01:20:22.715953] I
>>>>> [fuse-bridge.c:5919:fini] 0-fuse: Closing
>>>>> fuse connection to '/tmp/mntQAtu3f'.
>>>>>
>>>>> Processes running on new node gfs3:
>>>>>
>>>>> # ps -ef | grep gluster
>>>>> root      6832     1  0 20:17 ?        00:00:00
>>>>> /usr/sbin/glusterd -p
>>>>> /var/run/glusterd.pid --log-level INFO
>>>>> root     15799     1  0 20:17 ?        00:00:00
>>>>> /usr/sbin/glusterfs -s localhost
>>>>> --volfile-id gluster/glustershd -p
>>>>> /var/run/gluster/glustershd/glustershd.pid
>>>>> -l /var/log/glusterfs/glustershd.log -S
>>>>> /var/run/gluster/24c12b09f93eec8e.socket
>>>>> --xlator-option
>>>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412
>>>>> --process-name glustershd
>>>>> root     16856 16735  0 21:21 pts/0
>>>>> 00:00:00 grep --color=auto gluster
>>>>>
>>>>> --
>>>>> David Cunningham, Voisonics Limited
>>>>> http://voisonics.com/
>>>>> USA: +1 213 221 1092
>>>>> New Zealand: +64 (0)28 2558 3782
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Sanju
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> David Cunningham, Voisonics Limited
>>>>> http://voisonics.com/
>>>>> USA: +1 213 221 1092
>>>>> New Zealand: +64 (0)28 2558 3782
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>>
>>>>
>>>> --
>>>> David Cunningham, Voisonics Limited
>>>> http://voisonics.com/
>>>> USA: +1 213 221 1092
>>>> New Zealand: +64 (0)28 2558 3782
>>>
>>>
>>>
>>> --
>>> David Cunningham, Voisonics Limited
>>> http://voisonics.com/
>>> USA: +1 213 221 1092
>>> New Zealand: +64 (0)28 2558 3782
>
>
>
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782