thr3ads.net - Gluster users - [Gluster-users] glusterfs 4.1.6 error in starting glusterd service [Jan 2019]


Atin Mukherjee

2019-Jan-17 10:13 UTC

[Gluster-users] glusterfs 4.1.6 error in starting glusterd service

Can you please run 'glusterd -LDEBUG' and share the resulting glusterd.log?
To avoid too much back and forth, I suggest you also share the contents
of /var/lib/glusterd from all the nodes. Please also mention on which
node the glusterd service is unable to come up.
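
One way to gather the contents of /var/lib/glusterd from each node for sharing is to pack the directory into an archive. The following is a minimal Python sketch, not an official Gluster tool: the function name bundle_glusterd_state and the archive location are hypothetical, and it assumes the script runs with read access to the directory.

```python
import tarfile
from pathlib import Path
from typing import List


def bundle_glusterd_state(workdir: str, archive_path: str) -> List[str]:
    """Pack a glusterd working directory into a gzip-compressed tarball.

    Returns the sorted member names so the sender can review what is
    about to be shared. (Hypothetical helper, not part of GlusterFS.)
    """
    src = Path(workdir)
    if not src.is_dir():
        raise FileNotFoundError(f"{workdir} is not a directory")

    # arcname keeps member paths relative, e.g. "glusterd/vols/gfs-tst/info"
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(str(src), arcname=src.name)

    # Re-open the archive to list exactly what was written.
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

On a node this might be called as bundle_glusterd_state("/var/lib/glusterd", "/tmp/glusterd-node3.tar.gz"), where the output file name is only an example. It is worth reviewing the returned member list before sending the archive.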

On Thu, Jan 17, 2019 at 11:34 AM Amudhan P <amudhan83 at gmail.com> wrote:
> I have created the folder at the path as suggested, but the service still
> failed to start. Below is the error message in glusterd.log:
>
> [2019-01-16 14:50:14.555742] I [MSGID: 100030] [glusterfsd.c:2741:main]
> 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd
> version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
> [2019-01-16 14:50:14.559835] I [MSGID: 106478] [glusterd.c:1423:init]
> 0-management: Maximum allowed open file descriptors set to 65536
> [2019-01-16 14:50:14.559894] I [MSGID: 106479] [glusterd.c:1481:init]
> 0-management: Using /var/lib/glusterd as working directory
> [2019-01-16 14:50:14.559912] I [MSGID: 106479] [glusterd.c:1486:init]
> 0-management: Using /var/run/gluster as pid file working directory
> [2019-01-16 14:50:14.563834] W [MSGID: 103071]
> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
> channel creation failed [No such device]
> [2019-01-16 14:50:14.563867] W [MSGID: 103055] [rdma.c:4938:init]
> 0-rdma.management: Failed to initialize IB Device
> [2019-01-16 14:50:14.563882] W [rpc-transport.c:351:rpc_transport_load]
> 0-rpc-transport: 'rdma' initialization failed
> [2019-01-16 14:50:14.563957] W [rpcsvc.c:1781:rpcsvc_create_listener]
> 0-rpc-service: cannot create listener, initing the transport failed
> [2019-01-16 14:50:14.563974] E [MSGID: 106244] [glusterd.c:1764:init]
> 0-management: creation of 1 listeners failed, continuing with succeeded
> transport
> [2019-01-16 14:50:15.565868] I [MSGID: 106513]
> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
> op-version: 40100
> [2019-01-16 14:50:15.642532] I [MSGID: 106544]
> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
> d6bf51a7-c296-492f-8dac-e81efa9dd22d
> [2019-01-16 14:50:15.675333] I [MSGID: 106498]
> [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management:
> connect returned 0
> [2019-01-16 14:50:15.675421] W [MSGID: 106061]
> [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd:
> Failed to get tcp-user-timeout
> [2019-01-16 14:50:15.675451] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> *[2019-01-16 14:50:15.676912] E [MSGID: 106187]
> [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve
> brick failed in restore*
> *[2019-01-16 14:50:15.676956] E [MSGID: 101019] [xlator.c:720:xlator_init]
> 0-management: Initialization of volume 'management' failed, review your
> volfile again*
> [2019-01-16 14:50:15.676973] E [MSGID: 101066]
> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator
> failed
> [2019-01-16 14:50:15.676986] E [MSGID: 101176]
> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
> [2019-01-16 14:50:15.677479] W [glusterfsd.c:1514:cleanup_and_exit]
> (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
> -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41]
> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-:
> received signum (-1), shutting down
>
>
> On Thu, Jan 17, 2019 at 8:06 AM Atin Mukherjee <amukherj at redhat.com>
> wrote:
>
>> If gluster volume info/status shows the brick to be /media/disk4/brick4
>> then you'd need to mount the same path and hence you'd need to create the
>> brick4 directory explicitly. I fail to understand the rationale how only
>> /media/disk4 can be used as the mount path for the brick.
>>
>> On Wed, Jan 16, 2019 at 5:24 PM Amudhan P <amudhan83 at gmail.com> wrote:
>>
>>> Yes, I did mount the bricks, but the folder 'brick4' was still not created
>>> inside the brick.
>>> Do I need to create this folder? When I run replace-brick, it creates
>>> the folder inside the brick. I have seen this behavior before when
>>> running replace-brick or when heal begins.
>>>
>>> On Wed, Jan 16, 2019 at 5:05 PM Atin Mukherjee <amukherj at redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Jan 16, 2019 at 5:02 PM Amudhan P <amudhan83 at gmail.com> wrote:
>>>>
>>>>> Atin,
>>>>> I have copied the content of 'gfs-tst' from the vols folder on another
>>>>> node. Starting the service again fails, with this error message in the
>>>>> glusterd.log file:
>>>>>
>>>>> [2019-01-15 20:16:59.513023] I [MSGID: 100030]
>>>>> [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running
>>>>> /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p
>>>>> /var/run/glusterd.pid)
>>>>> [2019-01-15 20:16:59.517164] I [MSGID: 106478] [glusterd.c:1423:init]
>>>>> 0-management: Maximum allowed open file descriptors set to 65536
>>>>> [2019-01-15 20:16:59.517264] I [MSGID: 106479] [glusterd.c:1481:init]
>>>>> 0-management: Using /var/lib/glusterd as working directory
>>>>> [2019-01-15 20:16:59.517283] I [MSGID: 106479] [glusterd.c:1486:init]
>>>>> 0-management: Using /var/run/gluster as pid file working directory
>>>>> [2019-01-15 20:16:59.521508] W [MSGID: 103071]
>>>>> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
>>>>> channel creation failed [No such device]
>>>>> [2019-01-15 20:16:59.521544] W [MSGID: 103055] [rdma.c:4938:init]
>>>>> 0-rdma.management: Failed to initialize IB Device
>>>>> [2019-01-15 20:16:59.521562] W
>>>>> [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma'
>>>>> initialization failed
>>>>> [2019-01-15 20:16:59.521629] W [rpcsvc.c:1781:rpcsvc_create_listener]
>>>>> 0-rpc-service: cannot create listener, initing the transport failed
>>>>> [2019-01-15 20:16:59.521648] E [MSGID: 106244] [glusterd.c:1764:init]
>>>>> 0-management: creation of 1 listeners failed, continuing with succeeded
>>>>> transport
>>>>> [2019-01-15 20:17:00.529390] I [MSGID: 106513]
>>>>> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
>>>>> op-version: 40100
>>>>> [2019-01-15 20:17:00.608354] I [MSGID: 106544]
>>>>> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>> [2019-01-15 20:17:00.650911] W [MSGID: 106425]
>>>>> [glusterd-store.c:2643:glusterd_store_retrieve_bricks] 0-management: failed
>>>>> to get statfs() call on brick /media/disk4/brick4 [No such file or
>>>>> directory]
>>>>>
>>>>
>>>> This means that underlying brick /media/disk4/brick4 doesn't exist. You
>>>> already mentioned that you had replaced the faulty disk, but have you not
>>>> mounted it yet?
>>>>
>>>>
>>>>> [2019-01-15 20:17:00.691240] I [MSGID: 106498]
>>>>> [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management:
>>>>> connect returned 0
>>>>> [2019-01-15 20:17:00.691307] W [MSGID: 106061]
>>>>> [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd:
>>>>> Failed to get tcp-user-timeout
>>>>> [2019-01-15 20:17:00.691331] I
>>>>> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting
>>>>> frame-timeout to 600
>>>>> [2019-01-15 20:17:00.692547] E [MSGID: 106187]
>>>>> [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve
>>>>> brick failed in restore
>>>>> [2019-01-15 20:17:00.692582] E [MSGID: 101019]
>>>>> [xlator.c:720:xlator_init] 0-management: Initialization of volume
>>>>> 'management' failed, review your volfile again
>>>>> [2019-01-15 20:17:00.692597] E [MSGID: 101066]
>>>>> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator
>>>>> failed
>>>>> [2019-01-15 20:17:00.692607] E [MSGID: 101176]
>>>>> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>>>> [2019-01-15 20:17:00.693004] W [glusterfsd.c:1514:cleanup_and_exit]
>>>>> (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
>>>>> -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41]
>>>>> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-:
>>>>> received signum (-1), shutting down
>>>>>
>>>>>
>>>>> On Wed, Jan 16, 2019 at 4:34 PM Atin Mukherjee <amukherj at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> This is a case of a partially written transaction. The host ran out of
>>>>>> space on the root partition, where all the glusterd-related
>>>>>> configuration is persisted, so the transaction couldn't be written and
>>>>>> the new (replaced) brick's information wasn't persisted in the
>>>>>> configuration. The workaround is to copy the content of
>>>>>> /var/lib/glusterd/vols/gfs-tst/ from one of the nodes in the trusted
>>>>>> storage pool to the node where the glusterd service fails to come up.
>>>>>> After that, restarting the glusterd service should make peer status
>>>>>> report all nodes as healthy and connected.
>>>>>>
>>>>>> On Wed, Jan 16, 2019 at 3:49 PM Amudhan P <amudhan83 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> In short, when I start the glusterd service I get the following
>>>>>>> error message in the glusterd.log file on one server.
>>>>>>> What needs to be done?
>>>>>>>
>>>>>>> Error logged in glusterd.log:
>>>>>>>
>>>>>>> [2019-01-15 17:50:13.956053] I [MSGID: 100030]
>>>>>>> [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running
>>>>>>> /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p
>>>>>>> /var/run/glusterd.pid)
>>>>>>> [2019-01-15 17:50:13.960131] I [MSGID: 106478]
>>>>>>> [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors
>>>>>>> set to 65536
>>>>>>> [2019-01-15 17:50:13.960193] I [MSGID: 106479]
>>>>>>> [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working
>>>>>>> directory
>>>>>>> [2019-01-15 17:50:13.960212] I [MSGID: 106479]
>>>>>>> [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file
>>>>>>> working directory
>>>>>>> [2019-01-15 17:50:13.964437] W [MSGID: 103071]
>>>>>>> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
>>>>>>> channel creation failed [No such device]
>>>>>>> [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init]
>>>>>>> 0-rdma.management: Failed to initialize IB Device
>>>>>>> [2019-01-15 17:50:13.964491] W
>>>>>>> [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma'
>>>>>>> initialization failed
>>>>>>> [2019-01-15 17:50:13.964560] W
>>>>>>> [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create
>>>>>>> listener, initing the transport failed
>>>>>>> [2019-01-15 17:50:13.964579] E [MSGID: 106244]
>>>>>>> [glusterd.c:1764:init] 0-management: creation of 1 listeners failed,
>>>>>>> continuing with succeeded transport
>>>>>>> [2019-01-15 17:50:14.967681] I [MSGID: 106513]
>>>>>>> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
>>>>>>> op-version: 40100
>>>>>>> [2019-01-15 17:50:14.973931] I [MSGID: 106544]
>>>>>>> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
>>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>>> [2019-01-15 17:50:15.046620] E [MSGID: 101032]
>>>>>>> [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to
>>>>>>> /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such
>>>>>>> file or directory]
>>>>>>> [2019-01-15 17:50:15.046685] E [MSGID: 106201]
>>>>>>> [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management:
>>>>>>> Unable to restore volume: gfs-tst
>>>>>>> [2019-01-15 17:50:15.046718] E [MSGID: 101019]
>>>>>>> [xlator.c:720:xlator_init] 0-management: Initialization of volume
>>>>>>> 'management' failed, review your volfile again
>>>>>>> [2019-01-15 17:50:15.046732] E [MSGID: 101066]
>>>>>>> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator
>>>>>>> failed
>>>>>>> [2019-01-15 17:50:15.046741] E [MSGID: 101176]
>>>>>>> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>>>>>> [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit]
>>>>>>> (-->/usr/local/sbin/glusterd(glusterfs_volumes
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> In detail, I am trying to simulate a situation where the volume stopped
>>>>>>> abnormally and the entire cluster restarted with some missing disks.
>>>>>>>
>>>>>>> My test cluster is set up with 3 nodes, each with four disks. I
>>>>>>> have set up a volume with disperse 4+2.
>>>>>>> In Node-3, 2 disks have failed; to replace them I shut down all systems.
>>>>>>>
>>>>>>> Below are the steps done.
>>>>>>>
>>>>>>> 1. umount from client machine
>>>>>>> 2. shut down all systems by running the `shutdown -h now` command
>>>>>>> (without stopping the volume or the service)
>>>>>>> 3. replace faulty disk in Node-3
>>>>>>> 4. powered ON all systems
>>>>>>> 5. format replaced drives, and mount all drives
>>>>>>> 6. start glusterd service on all nodes (success)
>>>>>>> 7. Now running `volume status` command from node-3
>>>>>>> output : [2019-01-15 16:52:17.718422]  : v status : FAILED : Staging
>>>>>>> failed on 0083ec0c-40bf-472a-a128-458924e56c96. Please check log file for
>>>>>>> details.
>>>>>>> 8. running `volume start gfs-tst` command from node-3
>>>>>>> output : [2019-01-15 16:53:19.410252]  : v start gfs-tst : FAILED :
>>>>>>> Volume gfs-tst already started
>>>>>>>
>>>>>>> 9. running `gluster v status` on another node, showing all bricks
>>>>>>> available but 'self-heal daemon' not running
>>>>>>> @gfstst-node2:~$ sudo gluster v status
>>>>>>> Status of volume: gfs-tst
>>>>>>> Gluster process                         TCP Port  RDMA Port  Online  Pid
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> Brick IP.2:/media/disk1/brick1          49152     0          Y       1517
>>>>>>> Brick IP.4:/media/disk1/brick1          49152     0          Y       1668
>>>>>>> Brick IP.2:/media/disk2/brick2          49153     0          Y       1522
>>>>>>> Brick IP.4:/media/disk2/brick2          49153     0          Y       1678
>>>>>>> Brick IP.2:/media/disk3/brick3          49154     0          Y       1527
>>>>>>> Brick IP.4:/media/disk3/brick3          49154     0          Y       1677
>>>>>>> Brick IP.2:/media/disk4/brick4          49155     0          Y       1541
>>>>>>> Brick IP.4:/media/disk4/brick4          49155     0          Y       1683
>>>>>>> Self-heal Daemon on localhost           N/A       N/A        Y       2662
>>>>>>> Self-heal Daemon on IP.4                N/A       N/A        Y       2786
>>>>>>>
>>>>>>> 10. the above output says 'volume already started', so I ran the
>>>>>>> `reset-brick` command
>>>>>>>    v reset-brick gfs-tst IP.3:/media/disk3/brick3
>>>>>>> IP.3:/media/disk3/brick3 commit force
>>>>>>>
>>>>>>> output : [2019-01-15 16:57:37.916942]  : v reset-brick gfs-tst
>>>>>>> IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force : FAILED :
>>>>>>> /media/disk3/brick3 is already part of a volume
>>>>>>>
>>>>>>> 11. the reset-brick command was not working, so I tried stopping the
>>>>>>> volume and starting it with the force option
>>>>>>> output : [2019-01-15 17:01:04.570794]  : v start gfs-tst force :
>>>>>>> FAILED : Pre-validation failed on localhost. Please check log file for
>>>>>>> details
>>>>>>>
>>>>>>> 12. now stopped the service on all nodes and tried starting it again.
>>>>>>> Except on node-3, the service started successfully without any issues.
>>>>>>>
>>>>>>> On node-3 I receive the following message.
>>>>>>>
>>>>>>> sudo service glusterd start
>>>>>>> * Starting glusterd service glusterd                 [fail]
>>>>>>> /usr/local/sbin/glusterd: option requires an argument -- 'f'
>>>>>>> Try `glusterd --help' or `glusterd --usage' for more information.
>>>>>>>
>>>>>>> 13. checking the glusterd log file, I found that the OS drive was
>>>>>>> running out of space
>>>>>>> output : [2019-01-15 16:51:37.210792] W [MSGID: 101012]
>>>>>>> [store.c:372:gf_store_save_value] 0-management: fflush failed. [No space
>>>>>>> left on device]
>>>>>>> [2019-01-15 16:51:37.210874] E [MSGID: 106190]
>>>>>>> [glusterd-store.c:1058:glusterd_volume_exclude_options_write] 0-management:
>>>>>>> Unable to write volume values for gfs-tst
>>>>>>>
>>>>>>> 14. cleared some space on the OS drive, but the service is still not
>>>>>>> running. Below is the error logged in glusterd.log
>>>>>>>
>>>>>>> [2019-01-15 17:50:13.956053] I [MSGID: 100030]
>>>>>>> [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running
>>>>>>> /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p
>>>>>>> /var/run/glusterd.pid)
>>>>>>> [2019-01-15 17:50:13.960131] I [MSGID: 106478]
>>>>>>> [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors
>>>>>>> set to 65536
>>>>>>> [2019-01-15 17:50:13.960193] I [MSGID: 106479]
>>>>>>> [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working
>>>>>>> directory
>>>>>>> [2019-01-15 17:50:13.960212] I [MSGID: 106479]
>>>>>>> [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file
>>>>>>> working directory
>>>>>>> [2019-01-15 17:50:13.964437] W [MSGID: 103071]
>>>>>>> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
>>>>>>> channel creation failed [No such device]
>>>>>>> [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init]
>>>>>>> 0-rdma.management: Failed to initialize IB Device
>>>>>>> [2019-01-15 17:50:13.964491] W
>>>>>>> [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma'
>>>>>>> initialization failed
>>>>>>> [2019-01-15 17:50:13.964560] W
>>>>>>> [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create
>>>>>>> listener, initing the transport failed
>>>>>>> [2019-01-15 17:50:13.964579] E [MSGID: 106244]
>>>>>>> [glusterd.c:1764:init] 0-management: creation of 1 listeners failed,
>>>>>>> continuing with succeeded transport
>>>>>>> [2019-01-15 17:50:14.967681] I [MSGID: 106513]
>>>>>>> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
>>>>>>> op-version: 40100
>>>>>>> [2019-01-15 17:50:14.973931] I [MSGID: 106544]
>>>>>>> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
>>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>>> [2019-01-15 17:50:15.046620] E [MSGID: 101032]
>>>>>>> [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to
>>>>>>> /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such
>>>>>>> file or directory]
>>>>>>> [2019-01-15 17:50:15.046685] E [MSGID: 106201]
>>>>>>> [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management:
>>>>>>> Unable to restore volume: gfs-tst
>>>>>>> [2019-01-15 17:50:15.046718] E [MSGID: 101019]
>>>>>>> [xlator.c:720:xlator_init] 0-management: Initialization of volume
>>>>>>> 'management' failed, review your volfile again
>>>>>>> [2019-01-15 17:50:15.046732] E [MSGID: 101066]
>>>>>>> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator
>>>>>>> failed
>>>>>>> [2019-01-15 17:50:15.046741] E [MSGID: 101176]
>>>>>>> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>>>>>> [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit]
>>>>>>> (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
>>>>>>> -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41]
>>>>>>> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-:
>>>>>>> received signum (-1), shutting down
>>>>>>>
>>>>>>>
>>>>>>> 15. On another node, running `volume status` still shows the node-3
>>>>>>> bricks as live,
>>>>>>>      but 'peer status' shows node-3 disconnected
>>>>>>>
>>>>>>> @gfstst-node2:~$ sudo gluster v status
>>>>>>> Status of volume: gfs-tst
>>>>>>> Gluster process                         TCP Port  RDMA Port  Online  Pid
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> Brick IP.2:/media/disk1/brick1          49152     0          Y       1517
>>>>>>> Brick IP.4:/media/disk1/brick1          49152     0          Y       1668
>>>>>>> Brick IP.2:/media/disk2/brick2          49153     0          Y       1522
>>>>>>> Brick IP.4:/media/disk2/brick2          49153     0          Y       1678
>>>>>>> Brick IP.2:/media/disk3/brick3          49154     0          Y       1527
>>>>>>> Brick IP.4:/media/disk3/brick3          49154     0          Y       1677
>>>>>>> Brick IP.2:/media/disk4/brick4          49155     0          Y       1541
>>>>>>> Brick IP.4:/media/disk4/brick4          49155     0          Y       1683
>>>>>>> Self-heal Daemon on localhost           N/A       N/A        Y       2662
>>>>>>> Self-heal Daemon on IP.4                N/A       N/A        Y       2786
>>>>>>>
>>>>>>> Task Status of Volume gfs-tst
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> There are no active volume tasks
>>>>>>>
>>>>>>>
>>>>>>> root@gfstst-node2:~$ sudo gluster pool list
>>>>>>> UUID                                    Hostname    State
>>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d    IP.3        Disconnected
>>>>>>> c1cbb58e-3ceb-4637-9ba3-3d28ef20b143    IP.4        Connected
>>>>>>> 0083ec0c-40bf-472a-a128-458924e56c96    localhost   Connected
>>>>>>>
>>>>>>> root@gfstst-node2:~$ sudo gluster peer status
>>>>>>> Number of Peers: 2
>>>>>>>
>>>>>>> Hostname: IP.3
>>>>>>> Uuid: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>>> State: Peer in Cluster (Disconnected)
>>>>>>>
>>>>>>> Hostname: IP.4
>>>>>>> Uuid: c1cbb58e-3ceb-4637-9ba3-3d28ef20b143
>>>>>>> State: Peer in Cluster (Connected)
>>>>>>>
>>>>>>>
>>>>>>> regards
>>>>>>> Amudhan
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>>
https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>
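
The "Path corresponding to ... [No such file or directory]" error in the original report points at a missing brick store file under /var/lib/glusterd/vols/gfs-tst/bricks/. A rough Python sketch for spotting such gaps before restarting glusterd follows. The naming rule (host, a colon, then the brick path with "/" replaced by "-") is inferred only from the single log line in this thread and is an assumption, as are the helper names brick_store_name and missing_brick_files.

```python
from pathlib import Path
from typing import List


def brick_store_name(brick: str) -> str:
    """Map 'host:/path/to/brick' to the store file name seen in the log.

    Assumption: the rule is inferred from one log line in this thread,
    where IP.3:/media/disk3/brick3 maps to IP.3:-media-disk3-brick3.
    """
    host, sep, path = brick.partition(":")
    if not sep or not path.startswith("/"):
        raise ValueError(f"expected 'host:/absolute/path', got {brick!r}")
    return f"{host}:{path.replace('/', '-')}"


def missing_brick_files(glusterd_dir: str, volume: str,
                        bricks: List[str]) -> List[str]:
    """Return the bricks that have no store file under vols/<volume>/bricks."""
    bricks_dir = Path(glusterd_dir) / "vols" / volume / "bricks"
    return [b for b in bricks
            if not (bricks_dir / brick_store_name(b)).is_file()]
```

Called with the working directory, the volume name gfs-tst, and the brick list from `gluster volume info` on a healthy node, a non-empty result would suggest the node's copy of the volume configuration is incomplete, which matches the partial-write explanation given in the thread.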

Amudhan P

2019-Jan-18 11:22 UTC


[Gluster-users] glusterfs 4.1.6 error in starting glusterd service

Hi Atin,

I have sent the files directly to your email in a separate mail. I hope you
have received them.

regards
Amudhan

On Thu, Jan 17, 2019 at 3:43 PM Atin Mukherjee <amukherj at redhat.com>
wrote:
> Can you please run 'glusterd -LDEBUG' and share the resulting glusterd.log?
> To avoid too much back and forth, I suggest you also share the contents
> of /var/lib/glusterd from all the nodes. Please also mention on which
> node the glusterd service is unable to come up.
>
> On Thu, Jan 17, 2019 at 11:34 AM Amudhan P <amudhan83 at gmail.com>
> wrote:
>
>> I have created the folder at the path as suggested, but the service still
>> failed to start. Below is the error message in glusterd.log:
>>
>> [2019-01-16 14:50:14.555742] I [MSGID: 100030] [glusterfsd.c:2741:main]
>> 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd
>> version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>> [2019-01-16 14:50:14.559835] I [MSGID: 106478] [glusterd.c:1423:init]
>> 0-management: Maximum allowed open file descriptors set to 65536
>> [2019-01-16 14:50:14.559894] I [MSGID: 106479] [glusterd.c:1481:init]
>> 0-management: Using /var/lib/glusterd as working directory
>> [2019-01-16 14:50:14.559912] I [MSGID: 106479] [glusterd.c:1486:init]
>> 0-management: Using /var/run/gluster as pid file working directory
>> [2019-01-16 14:50:14.563834] W [MSGID: 103071]
>> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
>> channel creation failed [No such device]
>> [2019-01-16 14:50:14.563867] W [MSGID: 103055] [rdma.c:4938:init]
>> 0-rdma.management: Failed to initialize IB Device
>> [2019-01-16 14:50:14.563882] W [rpc-transport.c:351:rpc_transport_load]
>> 0-rpc-transport: 'rdma' initialization failed
>> [2019-01-16 14:50:14.563957] W [rpcsvc.c:1781:rpcsvc_create_listener]
>> 0-rpc-service: cannot create listener, initing the transport failed
>> [2019-01-16 14:50:14.563974] E [MSGID: 106244] [glusterd.c:1764:init]
>> 0-management: creation of 1 listeners failed, continuing with succeeded
>> transport
>> [2019-01-16 14:50:15.565868] I [MSGID: 106513]
>> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
>> op-version: 40100
>> [2019-01-16 14:50:15.642532] I [MSGID: 106544]
>> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
>> d6bf51a7-c296-492f-8dac-e81efa9dd22d
>> [2019-01-16 14:50:15.675333] I [MSGID: 106498]
>> [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management:
>> connect returned 0
>> [2019-01-16 14:50:15.675421] W [MSGID: 106061]
>> [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd:
>> Failed to get tcp-user-timeout
>> [2019-01-16 14:50:15.675451] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
>> 0-management: setting frame-timeout to 600
>> *[2019-01-16 14:50:15.676912] E [MSGID: 106187]
>> [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve
>> brick failed in restore*
>> *[2019-01-16 14:50:15.676956] E [MSGID: 101019]
>> [xlator.c:720:xlator_init] 0-management: Initialization of volume
>> 'management' failed, review your volfile again*
>> [2019-01-16 14:50:15.676973] E [MSGID: 101066]
>> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator
>> failed
>> [2019-01-16 14:50:15.676986] E [MSGID: 101176]
>> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>> [2019-01-16 14:50:15.677479] W [glusterfsd.c:1514:cleanup_and_exit]
>> (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
>> -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41]
>> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-:
>> received signum (-1), shutting down
>>
>>
>> On Thu, Jan 17, 2019 at 8:06 AM Atin Mukherjee <amukherj at redhat.com>
>> wrote:
>>
>>> If gluster volume info/status shows the brick to be /media/disk4/brick4
>>> then you'd need to mount the same path and hence you'd need to create the
>>> brick4 directory explicitly. I fail to understand the rationale how only
>>> /media/disk4 can be used as the mount path for the brick.
>>>
>>> On Wed, Jan 16, 2019 at 5:24 PM Amudhan P <amudhan83 at gmail.com> wrote:
>>>
>>>> Yes, I did mount the bricks, but the folder 'brick4' was still not created
>>>> inside the brick.
>>>> Do I need to create this folder? When I run replace-brick, it creates
>>>> the folder inside the brick. I have seen this behavior before when
>>>> running replace-brick or when heal begins.
>>>>
>>>> On Wed, Jan 16, 2019 at 5:05 PM Atin Mukherjee <amukherj at redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Jan 16, 2019 at 5:02 PM Amudhan P <amudhan83 at gmail.com> wrote:
>>>>>
>>>>>> Atin,
>>>>>> I have copied the content of 'gfs-tst' from the vols folder on another
>>>>>> node. Starting the service again fails, with this error message in the
>>>>>> glusterd.log file:
>>>>>>
>>>>>> [2019-01-15 20:16:59.513023] I [MSGID: 100030]
>>>>>> [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd:
Started running
>>>>>> /usr/local/sbin/glusterd version 4.1.6 (args:
/usr/local/sbin/glusterd -p
>>>>>> /var/run/glusterd.pid)
>>>>>> [2019-01-15 20:16:59.517164] I [MSGID: 106478]
[glusterd.c:1423:init]
>>>>>> 0-management: Maximum allowed open file descriptors set
to 65536
>>>>>> [2019-01-15 20:16:59.517264] I [MSGID: 106479]
[glusterd.c:1481:init]
>>>>>> 0-management: Using /var/lib/glusterd as working
directory
>>>>>> [2019-01-15 20:16:59.517283] I [MSGID: 106479]
[glusterd.c:1486:init]
>>>>>> 0-management: Using /var/run/gluster as pid file
working directory
>>>>>> [2019-01-15 20:16:59.521508] W [MSGID: 103071]
>>>>>> [rdma.c:4629:__gf_rdma_ctx_create]
0-rpc-transport/rdma: rdma_cm event
>>>>>> channel creation failed [No such device]
>>>>>> [2019-01-15 20:16:59.521544] W [MSGID: 103055]
[rdma.c:4938:init]
>>>>>> 0-rdma.management: Failed to initialize IB Device
>>>>>> [2019-01-15 20:16:59.521562] W
>>>>>> [rpc-transport.c:351:rpc_transport_load]
0-rpc-transport: 'rdma'
>>>>>> initialization failed
>>>>>> [2019-01-15 20:16:59.521629] W
[rpcsvc.c:1781:rpcsvc_create_listener]
>>>>>> 0-rpc-service: cannot create listener, initing the
transport failed
>>>>>> [2019-01-15 20:16:59.521648] E [MSGID: 106244]
[glusterd.c:1764:init]
>>>>>> 0-management: creation of 1 listeners failed,
continuing with succeeded
>>>>>> transport
>>>>>> [2019-01-15 20:17:00.529390] I [MSGID: 106513]
>>>>>> [glusterd-store.c:2240:glusterd_restore_op_version]
0-glusterd: retrieved
>>>>>> op-version: 40100
>>>>>> [2019-01-15 20:17:00.608354] I [MSGID: 106544]
>>>>>> [glusterd.c:158:glusterd_uuid_init] 0-management:
retrieved UUID:
>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>> [2019-01-15 20:17:00.650911] W [MSGID: 106425]
>>>>>> [glusterd-store.c:2643:glusterd_store_retrieve_bricks]
0-management: failed
>>>>>> to get statfs() call on brick /media/disk4/brick4 [No
such file or
>>>>>> directory]
>>>>>>
>>>>>
>>>>> This means that underlying brick /media/disk4/brick4 doesn't exist.
>>>>> You already mentioned that you had replaced the faulty disk, but have you
>>>>> not mounted it yet?
>>>>>
>>>>>
>>>>>> [2019-01-15 20:17:00.691240] I [MSGID: 106498]
>>>>>>
[glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management:
>>>>>> connect returned 0
>>>>>> [2019-01-15 20:17:00.691307] W [MSGID: 106061]
>>>>>>
[glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd:
>>>>>> Failed to get tcp-user-timeout
>>>>>> [2019-01-15 20:17:00.691331] I
>>>>>> [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-management: setting
>>>>>> frame-timeout to 600
>>>>>> [2019-01-15 20:17:00.692547] E [MSGID: 106187]
>>>>>> [glusterd-store.c:4662:glusterd_resolve_all_bricks]
0-glusterd: resolve
>>>>>> brick failed in restore
>>>>>> [2019-01-15 20:17:00.692582] E [MSGID: 101019]
>>>>>> [xlator.c:720:xlator_init] 0-management: Initialization
of volume
>>>>>> 'management' failed, review your volfile again
>>>>>> [2019-01-15 20:17:00.692597] E [MSGID: 101066]
>>>>>> [graph.c:367:glusterfs_graph_init] 0-management:
initializing translator
>>>>>> failed
>>>>>> [2019-01-15 20:17:00.692607] E [MSGID: 101176]
>>>>>> [graph.c:738:glusterfs_graph_activate] 0-graph: init
failed
>>>>>> [2019-01-15 20:17:00.693004] W
[glusterfsd.c:1514:cleanup_and_exit]
>>>>>>
(-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
>>>>>>
-->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41]
>>>>>> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f)
[0x40942f] ) 0-:
>>>>>> received signum (-1), shutting down
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 16, 2019 at 4:34 PM Atin Mukherjee <amukherj at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>> This is a case of a partially written transaction. The host ran out of
>>>>>>> space on the root partition, where all the glusterd-related
>>>>>>> configuration is persisted, so the transaction couldn't be written and
>>>>>>> the new (replaced) brick's information wasn't persisted in the
>>>>>>> configuration. The workaround is to copy the content of
>>>>>>> /var/lib/glusterd/vols/gfs-tst/ from one of the nodes in the trusted
>>>>>>> storage pool to the node where the glusterd service fails to come up.
>>>>>>> After that, restarting the glusterd service should make peer status
>>>>>>> report all nodes as healthy and connected.
>>>>>>>
>>>>>>> On Wed, Jan 16, 2019 at 3:49 PM Amudhan P <amudhan83 at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> In short, when I start the glusterd service I get the following
>>>>>>>> error message in the glusterd.log file on one server.
>>>>>>>> What needs to be done?
>>>>>>>>
>>>>>>>> Error logged in glusterd.log:
>>>>>>>>
>>>>>>>> [2019-01-15 17:50:13.956053] I [MSGID: 100030]
>>>>>>>> [glusterfsd.c:2741:main]
0-/usr/local/sbin/glusterd: Started running
>>>>>>>> /usr/local/sbin/glusterd version 4.1.6 (args:
/usr/local/sbin/glusterd -p
>>>>>>>> /var/run/glusterd.pid)
>>>>>>>> [2019-01-15 17:50:13.960131] I [MSGID: 106478]
>>>>>>>> [glusterd.c:1423:init] 0-management: Maximum
allowed open file descriptors
>>>>>>>> set to 65536
>>>>>>>> [2019-01-15 17:50:13.960193] I [MSGID: 106479]
>>>>>>>> [glusterd.c:1481:init] 0-management: Using
/var/lib/glusterd as working
>>>>>>>> directory
>>>>>>>> [2019-01-15 17:50:13.960212] I [MSGID: 106479]
>>>>>>>> [glusterd.c:1486:init] 0-management: Using
/var/run/gluster as pid file
>>>>>>>> working directory
>>>>>>>> [2019-01-15 17:50:13.964437] W [MSGID: 103071]
>>>>>>>> [rdma.c:4629:__gf_rdma_ctx_create]
0-rpc-transport/rdma: rdma_cm event
>>>>>>>> channel creation failed [No such device]
>>>>>>>> [2019-01-15 17:50:13.964474] W [MSGID: 103055]
[rdma.c:4938:init]
>>>>>>>> 0-rdma.management: Failed to initialize IB
Device
>>>>>>>> [2019-01-15 17:50:13.964491] W
>>>>>>>> [rpc-transport.c:351:rpc_transport_load]
0-rpc-transport: 'rdma'
>>>>>>>> initialization failed
>>>>>>>> [2019-01-15 17:50:13.964560] W
>>>>>>>> [rpcsvc.c:1781:rpcsvc_create_listener]
0-rpc-service: cannot create
>>>>>>>> listener, initing the transport failed
>>>>>>>> [2019-01-15 17:50:13.964579] E [MSGID: 106244]
>>>>>>>> [glusterd.c:1764:init] 0-management: creation
of 1 listeners failed,
>>>>>>>> continuing with succeeded transport
>>>>>>>> [2019-01-15 17:50:14.967681] I [MSGID: 106513]
>>>>>>>>
[glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
>>>>>>>> op-version: 40100
>>>>>>>> [2019-01-15 17:50:14.973931] I [MSGID: 106544]
>>>>>>>> [glusterd.c:158:glusterd_uuid_init]
0-management: retrieved UUID:
>>>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>>>> [2019-01-15 17:50:15.046620] E [MSGID: 101032]
>>>>>>>> [store.c:441:gf_store_handle_retrieve] 0-: Path
corresponding to
>>>>>>>>
/var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such
>>>>>>>> file or directory]
>>>>>>>> [2019-01-15 17:50:15.046685] E [MSGID: 106201]
>>>>>>>>
[glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management:
>>>>>>>> Unable to restore volume: gfs-tst
>>>>>>>> [2019-01-15 17:50:15.046718] E [MSGID: 101019]
>>>>>>>> [xlator.c:720:xlator_init] 0-management:
Initialization of volume
>>>>>>>> 'management' failed, review your
volfile again
>>>>>>>> [2019-01-15 17:50:15.046732] E [MSGID: 101066]
>>>>>>>> [graph.c:367:glusterfs_graph_init]
0-management: initializing translator
>>>>>>>> failed
>>>>>>>> [2019-01-15 17:50:15.046741] E [MSGID: 101176]
>>>>>>>> [graph.c:738:glusterfs_graph_activate] 0-graph:
init failed
>>>>>>>> [2019-01-15 17:50:15.047171] W
[glusterfsd.c:1514:cleanup_and_exit]
>>>>>>>>
(-->/usr/local/sbin/glusterd(glusterfs_volumes
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> In detail: I am trying to simulate a situation where the volume stopped
>>>>>>>> abnormally and the entire cluster restarted with some missing disks.
>>>>>>>>
>>>>>>>> My test cluster is set up with 3 nodes, each with four disks, and I
>>>>>>>> have set up a volume with disperse 4+2.
>>>>>>>> On node-3, 2 disks failed; to replace them I shut down all systems.
>>>>>>>>
>>>>>>>> Below are the steps done.
>>>>>>>>
>>>>>>>> 1. umount from the client machine
>>>>>>>> 2. shut down all systems by running `shutdown -h now` (without
>>>>>>>> stopping the volume or the service)
>>>>>>>> 3. replaced the faulty disk in node-3
>>>>>>>> 4. powered on all systems
>>>>>>>> 5. formatted the replaced drives and mounted all drives
>>>>>>>> 6. started the glusterd service on all nodes (success)
>>>>>>>> 7. ran the `volume status` command from node-3
>>>>>>>> output : [2019-01-15 16:52:17.718422]  : v status : FAILED :
>>>>>>>> Staging failed on 0083ec0c-40bf-472a-a128-458924e56c96. Please check log
>>>>>>>> file for details.
>>>>>>>> 8. ran the `volume start gfs-tst` command from node-3
>>>>>>>> output : [2019-01-15 16:53:19.410252]  : v start gfs-tst : FAILED :
>>>>>>>> Volume gfs-tst already started
>>>>>>>>
>>>>>>>> 9. ran `gluster v status` on another node; it shows all bricks
>>>>>>>> available but the 'self-heal daemon' not running
>>>>>>>> @gfstst-node2:~$ sudo gluster v status
>>>>>>>> Status of volume: gfs-tst
>>>>>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>> Brick IP.2:/media/disk1/brick1              49152     0          Y       1517
>>>>>>>> Brick IP.4:/media/disk1/brick1              49152     0          Y       1668
>>>>>>>> Brick IP.2:/media/disk2/brick2              49153     0          Y       1522
>>>>>>>> Brick IP.4:/media/disk2/brick2              49153     0          Y       1678
>>>>>>>> Brick IP.2:/media/disk3/brick3              49154     0          Y       1527
>>>>>>>> Brick IP.4:/media/disk3/brick3              49154     0          Y       1677
>>>>>>>> Brick IP.2:/media/disk4/brick4              49155     0          Y       1541
>>>>>>>> Brick IP.4:/media/disk4/brick4              49155     0          Y       1683
>>>>>>>> Self-heal Daemon on localhost               N/A       N/A        Y       2662
>>>>>>>> Self-heal Daemon on IP.4                    N/A       N/A        Y       2786
>>>>>>>>
>>>>>>>> 10. the output above said 'volume already started', so I ran the
>>>>>>>> `reset-brick` command
>>>>>>>>    v reset-brick gfs-tst IP.3:/media/disk3/brick3
>>>>>>>> IP.3:/media/disk3/brick3 commit force
>>>>>>>>
>>>>>>>> output : [2019-01-15 16:57:37.916942]  : v reset-brick gfs-tst
>>>>>>>> IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force : FAILED :
>>>>>>>> /media/disk3/brick3 is already part of a volume
>>>>>>>>
>>>>>>>> 11. the reset-brick command did not work, so I tried stopping the volume
>>>>>>>> and starting it with the force option
>>>>>>>> output : [2019-01-15 17:01:04.570794]  : v start gfs-tst force :
>>>>>>>> FAILED : Pre-validation failed on localhost. Please check log file for
>>>>>>>> details
>>>>>>>>
>>>>>>>> 12. stopped the service on all nodes and tried starting it again.
>>>>>>>> On every node except node-3 the service started without any issues.
>>>>>>>>
>>>>>>>> On node-3 I receive the following message.
>>>>>>>>
>>>>>>>> sudo service glusterd start
>>>>>>>> * Starting glusterd service glusterd                 [fail]
>>>>>>>> /usr/local/sbin/glusterd: option requires an argument -- 'f'
>>>>>>>> Try `glusterd --help' or `glusterd --usage' for more information.
>>>>>>>>
>>>>>>>> 13. checking the glusterd log file, I found that the OS drive was
>>>>>>>> running out of space
>>>>>>>> output : [2019-01-15 16:51:37.210792] W [MSGID: 101012]
>>>>>>>> [store.c:372:gf_store_save_value] 0-management: fflush failed. [No space
>>>>>>>> left on device]
>>>>>>>> [2019-01-15 16:51:37.210874] E [MSGID: 106190]
>>>>>>>> [glusterd-store.c:1058:glusterd_volume_exclude_options_write] 0-management:
>>>>>>>> Unable to write volume values for gfs-tst
>>>>>>>>
>>>>>>>> 14. cleared some space on the OS drive, but the service is still not
>>>>>>>> running. Below is the error logged in glusterd.log
>>>>>>>>
>>>>>>>> [2019-01-15 17:50:13.956053] I [MSGID: 100030]
>>>>>>>> [glusterfsd.c:2741:main]
0-/usr/local/sbin/glusterd: Started running
>>>>>>>> /usr/local/sbin/glusterd version 4.1.6 (args:
/usr/local/sbin/glusterd -p
>>>>>>>> /var/run/glusterd.pid)
>>>>>>>> [2019-01-15 17:50:13.960131] I [MSGID: 106478]
>>>>>>>> [glusterd.c:1423:init] 0-management: Maximum
allowed open file descriptors
>>>>>>>> set to 65536
>>>>>>>> [2019-01-15 17:50:13.960193] I [MSGID: 106479]
>>>>>>>> [glusterd.c:1481:init] 0-management: Using
/var/lib/glusterd as working
>>>>>>>> directory
>>>>>>>> [2019-01-15 17:50:13.960212] I [MSGID: 106479]
>>>>>>>> [glusterd.c:1486:init] 0-management: Using
/var/run/gluster as pid file
>>>>>>>> working directory
>>>>>>>> [2019-01-15 17:50:13.964437] W [MSGID: 103071]
>>>>>>>> [rdma.c:4629:__gf_rdma_ctx_create]
0-rpc-transport/rdma: rdma_cm event
>>>>>>>> channel creation failed [No such device]
>>>>>>>> [2019-01-15 17:50:13.964474] W [MSGID: 103055]
[rdma.c:4938:init]
>>>>>>>> 0-rdma.management: Failed to initialize IB
Device
>>>>>>>> [2019-01-15 17:50:13.964491] W
>>>>>>>> [rpc-transport.c:351:rpc_transport_load]
0-rpc-transport: 'rdma'
>>>>>>>> initialization failed
>>>>>>>> [2019-01-15 17:50:13.964560] W
>>>>>>>> [rpcsvc.c:1781:rpcsvc_create_listener]
0-rpc-service: cannot create
>>>>>>>> listener, initing the transport failed
>>>>>>>> [2019-01-15 17:50:13.964579] E [MSGID: 106244]
>>>>>>>> [glusterd.c:1764:init] 0-management: creation
of 1 listeners failed,
>>>>>>>> continuing with succeeded transport
>>>>>>>> [2019-01-15 17:50:14.967681] I [MSGID: 106513]
>>>>>>>>
[glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
>>>>>>>> op-version: 40100
>>>>>>>> [2019-01-15 17:50:14.973931] I [MSGID: 106544]
>>>>>>>> [glusterd.c:158:glusterd_uuid_init]
0-management: retrieved UUID:
>>>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>>>> [2019-01-15 17:50:15.046620] E [MSGID: 101032]
>>>>>>>> [store.c:441:gf_store_handle_retrieve] 0-: Path
corresponding to
>>>>>>>>
/var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such
>>>>>>>> file or directory]
>>>>>>>> [2019-01-15 17:50:15.046685] E [MSGID: 106201]
>>>>>>>>
[glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management:
>>>>>>>> Unable to restore volume: gfs-tst
>>>>>>>> [2019-01-15 17:50:15.046718] E [MSGID: 101019]
>>>>>>>> [xlator.c:720:xlator_init] 0-management:
Initialization of volume
>>>>>>>> 'management' failed, review your
volfile again
>>>>>>>> [2019-01-15 17:50:15.046732] E [MSGID: 101066]
>>>>>>>> [graph.c:367:glusterfs_graph_init]
0-management: initializing translator
>>>>>>>> failed
>>>>>>>> [2019-01-15 17:50:15.046741] E [MSGID: 101176]
>>>>>>>> [graph.c:738:glusterfs_graph_activate] 0-graph:
init failed
>>>>>>>> [2019-01-15 17:50:15.047171] W
[glusterfsd.c:1514:cleanup_and_exit]
>>>>>>>>
(-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
>>>>>>>>
-->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41]
>>>>>>>>
-->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-:
>>>>>>>> received signum (-1), shutting down
>>>>>>>>
>>>>>>>>
>>>>>>>> 15. On another node, `volume status` still shows the node-3 bricks as
>>>>>>>> live, but `peer status` shows node-3 disconnected
>>>>>>>>
>>>>>>>> @gfstst-node2:~$ sudo gluster v status
>>>>>>>> Status of volume: gfs-tst
>>>>>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>> Brick IP.2:/media/disk1/brick1              49152     0          Y       1517
>>>>>>>> Brick IP.4:/media/disk1/brick1              49152     0          Y       1668
>>>>>>>> Brick IP.2:/media/disk2/brick2              49153     0          Y       1522
>>>>>>>> Brick IP.4:/media/disk2/brick2              49153     0          Y       1678
>>>>>>>> Brick IP.2:/media/disk3/brick3              49154     0          Y       1527
>>>>>>>> Brick IP.4:/media/disk3/brick3              49154     0          Y       1677
>>>>>>>> Brick IP.2:/media/disk4/brick4              49155     0          Y       1541
>>>>>>>> Brick IP.4:/media/disk4/brick4              49155     0          Y       1683
>>>>>>>> Self-heal Daemon on localhost               N/A       N/A        Y       2662
>>>>>>>> Self-heal Daemon on IP.4                    N/A       N/A        Y       2786
>>>>>>>>
>>>>>>>> Task Status of Volume gfs-tst
>>>>>>>>
>>>>>>>>
------------------------------------------------------------------------------
>>>>>>>> There are no active volume tasks
>>>>>>>>
>>>>>>>>
>>>>>>>> root@gfstst-node2:~$ sudo gluster pool list
>>>>>>>> UUID                                    Hostname        State
>>>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d    IP.3            Disconnected
>>>>>>>> c1cbb58e-3ceb-4637-9ba3-3d28ef20b143    IP.4            Connected
>>>>>>>> 0083ec0c-40bf-472a-a128-458924e56c96    localhost       Connected
>>>>>>>>
>>>>>>>> root@gfstst-node2:~$ sudo gluster peer status
>>>>>>>> Number of Peers: 2
>>>>>>>>
>>>>>>>> Hostname: IP.3
>>>>>>>> Uuid: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>>>> State: Peer in Cluster (Disconnected)
>>>>>>>>
>>>>>>>> Hostname: IP.4
>>>>>>>> Uuid: c1cbb58e-3ceb-4637-9ba3-3d28ef20b143
>>>>>>>> State: Peer in Cluster (Connected)
>>>>>>>>
>>>>>>>>
>>>>>>>> regards
>>>>>>>> Amudhan
>>>>>>>> _______________________________________________
>>>>>>>> Gluster-users mailing list
>>>>>>>> Gluster-users at gluster.org
>>>>>>>>
https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>

Atin Mukherjee

2019-Jan-23 17:32 UTC

[Gluster-users] glusterfs 4.1.6 error in starting glusterd service

Amudhan,

I see that you have provided only the configuration of the volume gfs-tst,
whereas the request was to share a dump of /var/lib/glusterd/*. I cannot
debug this further until you share the correct dump.
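
For anyone unsure how to produce such a dump, here is a minimal sketch of one way to pack the whole glusterd working directory on a node into a single archive. The function name `archive_glusterd_state`, the output file naming, and the use of Python are assumptions for illustration, not something prescribed in this thread. It would need to be run with enough privileges to read /var/lib/glusterd, once on every node.

```python
import socket
import tarfile
from pathlib import Path


def archive_glusterd_state(workdir="/var/lib/glusterd", out_dir=".", node_name=None):
    """Pack the complete glusterd working directory into <node>-glusterd.tar.gz.

    Hypothetical helper: the name and the archive naming scheme are assumptions.
    """
    src = Path(workdir)
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    # Tag the archive with the node name so dumps from different nodes
    # can be told apart when they are shared together.
    node = node_name or socket.gethostname()
    target = Path(out_dir) / f"{node}-glusterd.tar.gz"
    with tarfile.open(target, "w:gz") as tar:
        # arcname stores entries relative to the directory name
        # (e.g. "glusterd/vols/...") instead of the absolute path.
        tar.add(src, arcname=src.name)
    return target
```

Calling `archive_glusterd_state()` with no arguments on each node would write one archive per node into the current directory. Because it archives the whole directory rather than only `vols/gfs-tst`, it captures everything under the working directory in one file.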

On Thu, Jan 17, 2019 at 3:43 PM Atin Mukherjee <amukherj at redhat.com> wrote:
> Can you please run 'glusterd -LDEBUG' and share back the glusterd.log?
> Instead of doing too many back and forth I suggest you to share the content
> of /var/lib/glusterd from all the nodes. Also do mention which particular
> node the glusterd service is unable to come up.
>
> On Thu, Jan 17, 2019 at 11:34 AM Amudhan P <amudhan83 at gmail.com> wrote:
>
>> I have created the folder in the path as said, but the service still failed
>> to start. Below is the error message in glusterd.log
>>
>> [2019-01-16 14:50:14.555742] I [MSGID: 100030] [glusterfsd.c:2741:main]
>> 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd
>> version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>> [2019-01-16 14:50:14.559835] I [MSGID: 106478] [glusterd.c:1423:init]
>> 0-management: Maximum allowed open file descriptors set to 65536
>> [2019-01-16 14:50:14.559894] I [MSGID: 106479] [glusterd.c:1481:init]
>> 0-management: Using /var/lib/glusterd as working directory
>> [2019-01-16 14:50:14.559912] I [MSGID: 106479] [glusterd.c:1486:init]
>> 0-management: Using /var/run/gluster as pid file working directory
>> [2019-01-16 14:50:14.563834] W [MSGID: 103071]
>> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
>> channel creation failed [No such device]
>> [2019-01-16 14:50:14.563867] W [MSGID: 103055] [rdma.c:4938:init]
>> 0-rdma.management: Failed to initialize IB Device
>> [2019-01-16 14:50:14.563882] W [rpc-transport.c:351:rpc_transport_load]
>> 0-rpc-transport: 'rdma' initialization failed
>> [2019-01-16 14:50:14.563957] W [rpcsvc.c:1781:rpcsvc_create_listener]
>> 0-rpc-service: cannot create listener, initing the transport failed
>> [2019-01-16 14:50:14.563974] E [MSGID: 106244] [glusterd.c:1764:init]
>> 0-management: creation of 1 listeners failed, continuing with succeeded
>> transport
>> [2019-01-16 14:50:15.565868] I [MSGID: 106513]
>> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd:
retrieved
>> op-version: 40100
>> [2019-01-16 14:50:15.642532] I [MSGID: 106544]
>> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
>> d6bf51a7-c296-492f-8dac-e81efa9dd22d
>> [2019-01-16 14:50:15.675333] I [MSGID: 106498]
>> [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo]
0-management:
>> connect returned 0
>> [2019-01-16 14:50:15.675421] W [MSGID: 106061]
>> [glusterd-handler.c:3408:glusterd_transport_inet_options_build]
0-glusterd:
>> Failed to get tcp-user-timeout
>> [2019-01-16 14:50:15.675451] I
[rpc-clnt.c:1059:rpc_clnt_connection_init]
>> 0-management: setting frame-timeout to 600
>> *[2019-01-16 14:50:15.676912] E [MSGID: 106187]
>> [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve
>> brick failed in restore*
>> *[2019-01-16 14:50:15.676956] E [MSGID: 101019]
>> [xlator.c:720:xlator_init] 0-management: Initialization of volume
>> 'management' failed, review your volfile again*
>> [2019-01-16 14:50:15.676973] E [MSGID: 101066]
>> [graph.c:367:glusterfs_graph_init] 0-management: initializing
translator
>> failed
>> [2019-01-16 14:50:15.676986] E [MSGID: 101176]
>> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>> [2019-01-16 14:50:15.677479] W [glusterfsd.c:1514:cleanup_and_exit]
>> (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
>> -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151)
[0x409e41]
>> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-:
>> received signum (-1), shutting down
>>
>>
>> On Thu, Jan 17, 2019 at 8:06 AM Atin Mukherjee <amukherj at redhat.com>
>> wrote:
>>
>>> If gluster volume info/status shows the brick to be /media/disk4/brick4,
>>> then you'd need to mount the same path, and hence you'd need to create the
>>> brick4 directory explicitly. I fail to understand the rationale for how
>>> only /media/disk4 can be used as the mount path for the brick.
>>>
>>> On Wed, Jan 16, 2019 at 5:24 PM Amudhan P <amudhan83 at gmail.com> wrote:
>>>
>>>> Yes, I did mount the bricks, but the folder 'brick4' was still not created
>>>> inside the brick.
>>>> Do I need to create this folder? When I run replace-brick, it
>>>> creates the folder inside the brick. I have seen this behavior before when
>>>> running replace-brick or when heal begins.
>>>>
>>>> On Wed, Jan 16, 2019 at 5:05 PM Atin Mukherjee <amukherj at redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Jan 16, 2019 at 5:02 PM Amudhan P <amudhan83 at gmail.com> wrote:
>>>>>
>>>>>> Atin,
>>>>>> I have copied the content of 'gfs-tst' from the vols folder on another
>>>>>> node. Starting the service fails again, with this error in glusterd.log.
>>>>>>
>>>>>> [2019-01-15 20:16:59.513023] I [MSGID: 100030]
>>>>>> [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd:
Started running
>>>>>> /usr/local/sbin/glusterd version 4.1.6 (args:
/usr/local/sbin/glusterd -p
>>>>>> /var/run/glusterd.pid)
>>>>>> [2019-01-15 20:16:59.517164] I [MSGID: 106478]
[glusterd.c:1423:init]
>>>>>> 0-management: Maximum allowed open file descriptors set
to 65536
>>>>>> [2019-01-15 20:16:59.517264] I [MSGID: 106479]
[glusterd.c:1481:init]
>>>>>> 0-management: Using /var/lib/glusterd as working
directory
>>>>>> [2019-01-15 20:16:59.517283] I [MSGID: 106479]
[glusterd.c:1486:init]
>>>>>> 0-management: Using /var/run/gluster as pid file
working directory
>>>>>> [2019-01-15 20:16:59.521508] W [MSGID: 103071]
>>>>>> [rdma.c:4629:__gf_rdma_ctx_create]
0-rpc-transport/rdma: rdma_cm event
>>>>>> channel creation failed [No such device]
>>>>>> [2019-01-15 20:16:59.521544] W [MSGID: 103055]
[rdma.c:4938:init]
>>>>>> 0-rdma.management: Failed to initialize IB Device
>>>>>> [2019-01-15 20:16:59.521562] W
>>>>>> [rpc-transport.c:351:rpc_transport_load]
0-rpc-transport: 'rdma'
>>>>>> initialization failed
>>>>>> [2019-01-15 20:16:59.521629] W
[rpcsvc.c:1781:rpcsvc_create_listener]
>>>>>> 0-rpc-service: cannot create listener, initing the
transport failed
>>>>>> [2019-01-15 20:16:59.521648] E [MSGID: 106244]
[glusterd.c:1764:init]
>>>>>> 0-management: creation of 1 listeners failed,
continuing with succeeded
>>>>>> transport
>>>>>> [2019-01-15 20:17:00.529390] I [MSGID: 106513]
>>>>>> [glusterd-store.c:2240:glusterd_restore_op_version]
0-glusterd: retrieved
>>>>>> op-version: 40100
>>>>>> [2019-01-15 20:17:00.608354] I [MSGID: 106544]
>>>>>> [glusterd.c:158:glusterd_uuid_init] 0-management:
retrieved UUID:
>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>> [2019-01-15 20:17:00.650911] W [MSGID: 106425]
>>>>>> [glusterd-store.c:2643:glusterd_store_retrieve_bricks]
0-management: failed
>>>>>> to get statfs() call on brick /media/disk4/brick4 [No
such file or
>>>>>> directory]
>>>>>>
>>>>>
>>>>> This means that the underlying brick /media/disk4/brick4 doesn't exist.
>>>>> You already mentioned that you had replaced the faulty disk, but have you
>>>>> not mounted it yet?
>>>>>
>>>>>
>>>>>> [2019-01-15 20:17:00.691240] I [MSGID: 106498]
>>>>>>
[glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management:
>>>>>> connect returned 0
>>>>>> [2019-01-15 20:17:00.691307] W [MSGID: 106061]
>>>>>>
[glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd:
>>>>>> Failed to get tcp-user-timeout
>>>>>> [2019-01-15 20:17:00.691331] I
>>>>>> [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-management: setting
>>>>>> frame-timeout to 600
>>>>>> [2019-01-15 20:17:00.692547] E [MSGID: 106187]
>>>>>> [glusterd-store.c:4662:glusterd_resolve_all_bricks]
0-glusterd: resolve
>>>>>> brick failed in restore
>>>>>> [2019-01-15 20:17:00.692582] E [MSGID: 101019]
>>>>>> [xlator.c:720:xlator_init] 0-management: Initialization
of volume
>>>>>> 'management' failed, review your volfile again
>>>>>> [2019-01-15 20:17:00.692597] E [MSGID: 101066]
>>>>>> [graph.c:367:glusterfs_graph_init] 0-management:
initializing translator
>>>>>> failed
>>>>>> [2019-01-15 20:17:00.692607] E [MSGID: 101176]
>>>>>> [graph.c:738:glusterfs_graph_activate] 0-graph: init
failed
>>>>>> [2019-01-15 20:17:00.693004] W
[glusterfsd.c:1514:cleanup_and_exit]
>>>>>>
(-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
>>>>>>
-->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41]
>>>>>> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f)
[0x40942f] ) 0-:
>>>>>> received signum (-1), shutting down
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 16, 2019 at 4:34 PM Atin Mukherjee <amukherj at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>> This is a case of a partial write of a transaction. The host ran out of
>>>>>>> space on the root partition, where all the glusterd-related
>>>>>>> configuration is persisted, so the transaction couldn't be written and
>>>>>>> the new (replaced) brick's information wasn't persisted in the
>>>>>>> configuration. The workaround is to copy the content of
>>>>>>> /var/lib/glusterd/vols/gfs-tst/ from one of the nodes in the trusted
>>>>>>> storage pool to the node where the glusterd service fails to come up.
>>>>>>> After that, restarting the glusterd service should make peer status
>>>>>>> report all nodes healthy and connected.
>>>>>>>
>>>>>>> On Wed, Jan 16, 2019 at 3:49 PM Amudhan P <amudhan83 at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> In short, when I start the glusterd service I get the following
>>>>>>>> error message in the glusterd.log file on one server.
>>>>>>>> What needs to be done?
>>>>>>>>
>>>>>>>> Error logged in glusterd.log:
>>>>>>>>
>>>>>>>> [2019-01-15 17:50:13.956053] I [MSGID: 100030]
>>>>>>>> [glusterfsd.c:2741:main]
0-/usr/local/sbin/glusterd: Started running
>>>>>>>> /usr/local/sbin/glusterd version 4.1.6 (args:
/usr/local/sbin/glusterd -p
>>>>>>>> /var/run/glusterd.pid)
>>>>>>>> [2019-01-15 17:50:13.960131] I [MSGID: 106478]
>>>>>>>> [glusterd.c:1423:init] 0-management: Maximum
allowed open file descriptors
>>>>>>>> set to 65536
>>>>>>>> [2019-01-15 17:50:13.960193] I [MSGID: 106479]
>>>>>>>> [glusterd.c:1481:init] 0-management: Using
/var/lib/glusterd as working
>>>>>>>> directory
>>>>>>>> [2019-01-15 17:50:13.960212] I [MSGID: 106479]
>>>>>>>> [glusterd.c:1486:init] 0-management: Using
/var/run/gluster as pid file
>>>>>>>> working directory
>>>>>>>> [2019-01-15 17:50:13.964437] W [MSGID: 103071]
>>>>>>>> [rdma.c:4629:__gf_rdma_ctx_create]
0-rpc-transport/rdma: rdma_cm event
>>>>>>>> channel creation failed [No such device]
>>>>>>>> [2019-01-15 17:50:13.964474] W [MSGID: 103055]
[rdma.c:4938:init]
>>>>>>>> 0-rdma.management: Failed to initialize IB
Device
>>>>>>>> [2019-01-15 17:50:13.964491] W
>>>>>>>> [rpc-transport.c:351:rpc_transport_load]
0-rpc-transport: 'rdma'
>>>>>>>> initialization failed
>>>>>>>> [2019-01-15 17:50:13.964560] W
>>>>>>>> [rpcsvc.c:1781:rpcsvc_create_listener]
0-rpc-service: cannot create
>>>>>>>> listener, initing the transport failed
>>>>>>>> [2019-01-15 17:50:13.964579] E [MSGID: 106244]
>>>>>>>> [glusterd.c:1764:init] 0-management: creation
of 1 listeners failed,
>>>>>>>> continuing with succeeded transport
>>>>>>>> [2019-01-15 17:50:14.967681] I [MSGID: 106513]
>>>>>>>>
[glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved
>>>>>>>> op-version: 40100
>>>>>>>> [2019-01-15 17:50:14.973931] I [MSGID: 106544]
>>>>>>>> [glusterd.c:158:glusterd_uuid_init]
0-management: retrieved UUID:
>>>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>>>> [2019-01-15 17:50:15.046620] E [MSGID: 101032]
>>>>>>>> [store.c:441:gf_store_handle_retrieve] 0-: Path
corresponding to
>>>>>>>>
/var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such
>>>>>>>> file or directory]
>>>>>>>> [2019-01-15 17:50:15.046685] E [MSGID: 106201]
>>>>>>>>
[glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management:
>>>>>>>> Unable to restore volume: gfs-tst
>>>>>>>> [2019-01-15 17:50:15.046718] E [MSGID: 101019]
>>>>>>>> [xlator.c:720:xlator_init] 0-management:
Initialization of volume
>>>>>>>> 'management' failed, review your
volfile again
>>>>>>>> [2019-01-15 17:50:15.046732] E [MSGID: 101066]
>>>>>>>> [graph.c:367:glusterfs_graph_init]
0-management: initializing translator
>>>>>>>> failed
>>>>>>>> [2019-01-15 17:50:15.046741] E [MSGID: 101176]
>>>>>>>> [graph.c:738:glusterfs_graph_activate] 0-graph:
init failed
>>>>>>>> [2019-01-15 17:50:15.047171] W
[glusterfsd.c:1514:cleanup_and_exit]
>>>>>>>>
(-->/usr/local/sbin/glusterd(glusterfs_volumes
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> In detail: I am trying to simulate a situation where the volume stopped
>>>>>>>> abnormally and the entire cluster restarted with some missing disks.
>>>>>>>>
>>>>>>>> My test cluster is set up with 3 nodes, each with four disks, and I
>>>>>>>> have set up a volume with disperse 4+2.
>>>>>>>> On node-3, 2 disks failed; to replace them I shut down all systems.
>>>>>>>>
>>>>>>>> Below are the steps done.
>>>>>>>>
>>>>>>>> 1. umount from the client machine
>>>>>>>> 2. shut down all systems by running `shutdown -h now` (without
>>>>>>>> stopping the volume or the service)
>>>>>>>> 3. replaced the faulty disk in node-3
>>>>>>>> 4. powered on all systems
>>>>>>>> 5. formatted the replaced drives and mounted all drives
>>>>>>>> 6. started the glusterd service on all nodes (success)
>>>>>>>> 7. ran the `volume status` command from node-3
>>>>>>>> output : [2019-01-15 16:52:17.718422]  : v status : FAILED :
>>>>>>>> Staging failed on 0083ec0c-40bf-472a-a128-458924e56c96. Please check log
>>>>>>>> file for details.
>>>>>>>> 8. ran the `volume start gfs-tst` command from node-3
>>>>>>>> output : [2019-01-15 16:53:19.410252]  : v start gfs-tst : FAILED :
>>>>>>>> Volume gfs-tst already started
>>>>>>>>
>>>>>>>> 9. ran `gluster v status` on another node; it shows all bricks
>>>>>>>> available but the 'self-heal daemon' not running
>>>>>>>> @gfstst-node2:~$ sudo gluster v status
>>>>>>>> Status of volume: gfs-tst
>>>>>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>> Brick IP.2:/media/disk1/brick1              49152     0          Y       1517
>>>>>>>> Brick IP.4:/media/disk1/brick1              49152     0          Y       1668
>>>>>>>> Brick IP.2:/media/disk2/brick2              49153     0          Y       1522
>>>>>>>> Brick IP.4:/media/disk2/brick2              49153     0          Y       1678
>>>>>>>> Brick IP.2:/media/disk3/brick3              49154     0          Y       1527
>>>>>>>> Brick IP.4:/media/disk3/brick3              49154     0          Y       1677
>>>>>>>> Brick IP.2:/media/disk4/brick4              49155     0          Y       1541
>>>>>>>> Brick IP.4:/media/disk4/brick4              49155     0          Y       1683
>>>>>>>> Self-heal Daemon on localhost               N/A       N/A        Y       2662
>>>>>>>> Self-heal Daemon on IP.4                    N/A       N/A        Y       2786
>>>>>>>>
>>>>>>>> 10. the output above said 'volume already started', so I ran the
>>>>>>>> `reset-brick` command
>>>>>>>>    v reset-brick gfs-tst IP.3:/media/disk3/brick3
>>>>>>>> IP.3:/media/disk3/brick3 commit force
>>>>>>>>
>>>>>>>> output : [2019-01-15 16:57:37.916942]  : v reset-brick gfs-tst
>>>>>>>> IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force : FAILED :
>>>>>>>> /media/disk3/brick3 is already part of a volume
>>>>>>>>
>>>>>>>> 11. the reset-brick command did not work, so I tried stopping the volume
>>>>>>>> and starting it with the force option
>>>>>>>> output : [2019-01-15 17:01:04.570794]  : v start gfs-tst force :
>>>>>>>> FAILED : Pre-validation failed on localhost. Please check log file for
>>>>>>>> details
>>>>>>>>
>>>>>>>> 12. stopped the service on all nodes and tried starting it again.
>>>>>>>> On every node except node-3 the service started without any issues.
>>>>>>>>
>>>>>>>> On node-3 I receive the following message.
>>>>>>>>
>>>>>>>> sudo service glusterd start
>>>>>>>> * Starting glusterd service glusterd                 [fail]
>>>>>>>> /usr/local/sbin/glusterd: option requires an argument -- 'f'
>>>>>>>> Try `glusterd --help' or `glusterd --usage' for more information.
>>>>>>>>
>>>>>>>> 13. checking the glusterd log file, I found that the OS drive was
>>>>>>>> running out of space
>>>>>>>> output : [2019-01-15 16:51:37.210792] W [MSGID: 101012]
>>>>>>>> [store.c:372:gf_store_save_value] 0-management: fflush failed. [No space
>>>>>>>> left on device]
>>>>>>>> [2019-01-15 16:51:37.210874] E [MSGID: 106190]
>>>>>>>> [glusterd-store.c:1058:glusterd_volume_exclude_options_write] 0-management:
>>>>>>>> Unable to write volume values for gfs-tst
>>>>>>>>
>>>>>>>> 14. cleared some space on the OS drive, but the service is still not
>>>>>>>> running. Below is the error logged in glusterd.log
>>>>>>>>
>>>>>>>> [2019-01-15 17:50:13.956053] I [MSGID: 100030]
>>>>>>>> [glusterfsd.c:2741:main]
0-/usr/local/sbin/glusterd: Started running
>>>>>>>> /usr/local/sbin/glusterd version 4.1.6 (args:
/usr/local/sbin/glusterd -p
>>>>>>>> /var/run/glusterd.pid)
>>>>>>>> [2019-01-15 17:50:13.960131] I [MSGID: 106478]
>>>>>>>> [glusterd.c:1423:init] 0-management: Maximum
allowed open file descriptors
>>>>>>>> set to 65536
>>>>>>>> [2019-01-15 17:50:13.960193] I [MSGID: 106479]
>>>>>>>> [glusterd.c:1481:init] 0-management: Using
/var/lib/glusterd as working
>>>>>>>> directory
>>>>>>>> [2019-01-15 17:50:13.960212] I [MSGID: 106479]
>>>>>>>> [glusterd.c:1486:init] 0-management: Using
/var/run/gluster as pid file
>>>>>>>> working directory
>>>>>>>> [2019-01-15 17:50:13.964437] W [MSGID: 103071]
>>>>>>>> [rdma.c:4629:__gf_rdma_ctx_create]
0-rpc-transport/rdma: rdma_cm event
>>>>>>>> channel creation failed [No such device]
>>>>>>>> [2019-01-15 17:50:13.964474] W [MSGID: 103055]
[rdma.c:4938:init]
>>>>>>>> 0-rdma.management: Failed to initialize IB
Device
>>>>>>>> [2019-01-15 17:50:13.964491] W
>>>>>>>> [rpc-transport.c:351:rpc_transport_load]
0-rpc-transport: 'rdma'
>>>>>>>> initialization failed
>>>>>>>> [2019-01-15 17:50:13.964560] W
>>>>>>>> [rpcsvc.c:1781:rpcsvc_create_listener]
0-rpc-service: cannot create
>>>>>>>> listener, initing the transport failed
>>>>>>>> [2019-01-15 17:50:13.964579] E [MSGID: 106244]
>>>>>>>> [glusterd.c:1764:init] 0-management: creation
of 1 listeners failed,
>>>>>>>> continuing with succeeded transport
>>>>>>>> [2019-01-15 17:50:14.967681] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
>>>>>>>> [2019-01-15 17:50:14.973931] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>>>> [2019-01-15 17:50:15.046620] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such file or directory]
>>>>>>>> [2019-01-15 17:50:15.046685] E [MSGID: 106201] [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: gfs-tst
>>>>>>>> [2019-01-15 17:50:15.046718] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
>>>>>>>> [2019-01-15 17:50:15.046732] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
>>>>>>>> [2019-01-15 17:50:15.046741] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>>>>>>> [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down
>>>>>>>>
>>>>>>>>
>>>>>>>> 15. On another node, running 'volume status' still shows the node3 bricks as live,
>>>>>>>>     but 'peer status' shows node-3 as disconnected.
>>>>>>>>
>>>>>>>> @gfstst-node2:~$ sudo gluster v status
>>>>>>>> Status of volume: gfs-tst
>>>>>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>> Brick IP.2:/media/disk1/brick1              49152     0          Y       1517
>>>>>>>> Brick IP.4:/media/disk1/brick1              49152     0          Y       1668
>>>>>>>> Brick IP.2:/media/disk2/brick2              49153     0          Y       1522
>>>>>>>> Brick IP.4:/media/disk2/brick2              49153     0          Y       1678
>>>>>>>> Brick IP.2:/media/disk3/brick3              49154     0          Y       1527
>>>>>>>> Brick IP.4:/media/disk3/brick3              49154     0          Y       1677
>>>>>>>> Brick IP.2:/media/disk4/brick4              49155     0          Y       1541
>>>>>>>> Brick IP.4:/media/disk4/brick4              49155     0          Y       1683
>>>>>>>> Self-heal Daemon on localhost               N/A       N/A        Y       2662
>>>>>>>> Self-heal Daemon on IP.4                    N/A       N/A        Y       2786
>>>>>>>>
>>>>>>>> Task Status of Volume gfs-tst
>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>> There are no active volume tasks
>>>>>>>>
>>>>>>>>
>>>>>>>> root@gfstst-node2:~$ sudo gluster pool list
>>>>>>>> UUID                                    Hostname        State
>>>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d    IP.3            Disconnected
>>>>>>>> c1cbb58e-3ceb-4637-9ba3-3d28ef20b143    IP.4            Connected
>>>>>>>> 0083ec0c-40bf-472a-a128-458924e56c96    localhost       Connected
>>>>>>>>
>>>>>>>> root@gfstst-node2:~$ sudo gluster peer status
>>>>>>>> Number of Peers: 2
>>>>>>>>
>>>>>>>> Hostname: IP.3
>>>>>>>> Uuid: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>>>> State: Peer in Cluster (Disconnected)
>>>>>>>>
>>>>>>>> Hostname: IP.4
>>>>>>>> Uuid: c1cbb58e-3ceb-4637-9ba3-3d28ef20b143
>>>>>>>> State: Peer in Cluster (Connected)
>>>>>>>>
>>>>>>>>
>>>>>>>> regards
>>>>>>>> Amudhan
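
The E-level entries in the log above point at a missing per-brick store file under /var/lib/glusterd/vols/gfs-tst/bricks/. Below is a minimal Python sketch for listing which brick files are absent on a node. It assumes the store file name is the brick spec with every '/' in the path replaced by '-'; that rule is inferred from the single path shown in the log, not taken from glusterd documentation. The function names are hypothetical.

```python
import os


def brick_store_name(brick):
    """Map 'host:/path/to/brick' to the store file name seen in the log.

    Assumption: '/' in the brick path becomes '-', as in
    'IP.3:-media-disk3-brick3' from the glusterd.log excerpt.
    """
    host, path = brick.split(":", 1)
    return f"{host}:{path.replace('/', '-')}"


def missing_brick_files(volume, bricks, workdir="/var/lib/glusterd"):
    """Return the expected store file paths that do not exist on this node."""
    bricks_dir = os.path.join(workdir, "vols", volume, "bricks")
    missing = []
    for brick in bricks:
        candidate = os.path.join(bricks_dir, brick_store_name(brick))
        if not os.path.isfile(candidate):
            missing.append(candidate)
    return missing
```

Run on the node where glusterd fails to start, a call such as missing_brick_files("gfs-tst", ["IP.3:/media/disk3/brick3"]) would report the path from the log if that file is still absent. It only reads the directory and changes nothing.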
