thr3ads.net - Gluster users - [Gluster-users] glusterfs 4.1.6 error in starting glusterd service [Jan 2019]

Amudhan P

2019-Jan-30 13:55 UTC

[Gluster-users] glusterfs 4.1.6 error in starting glusterd service

Hi Atin,

Yes, it worked. Thank you.

What could have caused this issue?



On Fri, Jan 25, 2019 at 1:56 PM Atin Mukherjee <amukherj at redhat.com> wrote:
> Amudhan,
>
> So here's the issue:
>
> In node3, 'cat /var/lib/glusterd/peers/* ' doesn't show up node2's details
> and that's why glusterd wasn't able to resolve the brick(s) hosted on node2.
>
> Can you please pick up 0083ec0c-40bf-472a-a128-458924e56c96 file from
> /var/lib/glusterd/peers/ from node 4 and place it in the same location in
> node 3 and then restart glusterd service on node 3?
>
>
> On Thu, Jan 24, 2019 at 11:57 AM Amudhan P <amudhan83 at gmail.com> wrote:
>
>> Atin,
>>
>> Sorry, i missed to send entire `glusterd` folder.  Now attached zip
>> contains `glusterd` folder from all nodes.
>>
>> the problem node is node3 IP 10.1.2.3, `glusterd` log file is inside
>> node3 folder.
>>
>> regards
>> Amudhan
>>
>> On Wed, Jan 23, 2019 at 11:02 PM Atin Mukherjee <amukherj at redhat.com>
>> wrote:
>>
>>> Amudhan,
>>>
>>> I see that you have provided the content of the configuration of the
>>> volume gfs-tst where the request was to share the dump of
>>> /var/lib/glusterd/* . I can not debug this further until you share the
>>> correct dump.
>>>
>>> On Thu, Jan 17, 2019 at 3:43 PM Atin Mukherjee <amukherj at redhat.com>
>>> wrote:
>>>
>>>> Can you please run 'glusterd -LDEBUG' and share back the glusterd.log?
>>>> Instead of doing too many back and forth I suggest you to share the content
>>>> of /var/lib/glusterd from all the nodes. Also do mention which particular
>>>> node the glusterd service is unable to come up.
>>>>
>>>> On Thu, Jan 17, 2019 at 11:34 AM Amudhan P <amudhan83 at gmail.com> wrote:
>>>>
>>>>> I have created the folder in the path as said but still, service
>>>>> failed to start below is the error msg in glusterd.log
>>>>>
>>>>> [2019-01-16 14:50:14.555742] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>>>>> [2019-01-16 14:50:14.559835] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
>>>>> [2019-01-16 14:50:14.559894] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
>>>>> [2019-01-16 14:50:14.559912] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
>>>>> [2019-01-16 14:50:14.563834] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>>>>> [2019-01-16 14:50:14.563867] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
>>>>> [2019-01-16 14:50:14.563882] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>>>>> [2019-01-16 14:50:14.563957] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
>>>>> [2019-01-16 14:50:14.563974] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
>>>>> [2019-01-16 14:50:15.565868] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
>>>>> [2019-01-16 14:50:15.642532] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>> [2019-01-16 14:50:15.675333] I [MSGID: 106498] [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
>>>>> [2019-01-16 14:50:15.675421] W [MSGID: 106061] [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
>>>>> [2019-01-16 14:50:15.675451] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
>>>>> *[2019-01-16 14:50:15.676912] E [MSGID: 106187] [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore*
>>>>> *[2019-01-16 14:50:15.676956] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again*
>>>>> [2019-01-16 14:50:15.676973] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
>>>>> [2019-01-16 14:50:15.676986] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>>>> [2019-01-16 14:50:15.677479] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down
>>>>>
>>>>>
>>>>> On Thu, Jan 17, 2019 at 8:06 AM Atin Mukherjee <amukherj at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> If gluster volume info/status shows the brick to be
>>>>>> /media/disk4/brick4 then you'd need to mount the same path and hence you'd
>>>>>> need to create the brick4 directory explicitly. I fail to understand the
>>>>>> rationale how only /media/disk4 can be used as the mount path for the
>>>>>> brick.
>>>>>>
>>>>>> On Wed, Jan 16, 2019 at 5:24 PM Amudhan P <amudhan83 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Yes, I did mount bricks but the folder 'brick4' was still not
>>>>>>> created inside the brick.
>>>>>>> Do I need to create this folder because when I run replace-brick it
>>>>>>> will create folder inside the brick. I have seen this behavior before when
>>>>>>> running replace-brick or heal begins.
>>>>>>>
>>>>>>> On Wed, Jan 16, 2019 at 5:05 PM Atin Mukherjee <amukherj at redhat.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jan 16, 2019 at 5:02 PM Amudhan P <amudhan83 at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Atin,
>>>>>>>>> I have copied the content of 'gfs-tst' from vol folder in another
>>>>>>>>> node. when starting service again fails with error msg in glusterd.log file.
>>>>>>>>>
>>>>>>>>> [2019-01-15 20:16:59.513023] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>>>>>>>>> [2019-01-15 20:16:59.517164] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
>>>>>>>>> [2019-01-15 20:16:59.517264] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
>>>>>>>>> [2019-01-15 20:16:59.517283] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
>>>>>>>>> [2019-01-15 20:16:59.521508] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>>>>>>>>> [2019-01-15 20:16:59.521544] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
>>>>>>>>> [2019-01-15 20:16:59.521562] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>>>>>>>>> [2019-01-15 20:16:59.521629] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
>>>>>>>>> [2019-01-15 20:16:59.521648] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
>>>>>>>>> [2019-01-15 20:17:00.529390] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
>>>>>>>>> [2019-01-15 20:17:00.608354] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>>>>> [2019-01-15 20:17:00.650911] W [MSGID: 106425] [glusterd-store.c:2643:glusterd_store_retrieve_bricks] 0-management: failed to get statfs() call on brick /media/disk4/brick4 [No such file or directory]
>>>>>>>>>
>>>>>>>>
>>>>>>>> This means that underlying brick /media/disk4/brick4 doesn't exist.
>>>>>>>> You already mentioned that you had replaced the faulty disk, but have you
>>>>>>>> not mounted it yet?
>>>>>>>>
>>>>>>>>
>>>>>>>>> [2019-01-15 20:17:00.691240] I [MSGID: 106498] [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
>>>>>>>>> [2019-01-15 20:17:00.691307] W [MSGID: 106061] [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
>>>>>>>>> [2019-01-15 20:17:00.691331] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
>>>>>>>>> [2019-01-15 20:17:00.692547] E [MSGID: 106187] [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
>>>>>>>>> [2019-01-15 20:17:00.692582] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
>>>>>>>>> [2019-01-15 20:17:00.692597] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
>>>>>>>>> [2019-01-15 20:17:00.692607] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>>>>>>>> [2019-01-15 20:17:00.693004] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jan 16, 2019 at 4:34 PM Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> This is a case of partial write of a transaction and as the host
>>>>>>>>>> ran out of space for the root partition where all the glusterd related
>>>>>>>>>> configurations are persisted, the transaction couldn't be written and hence
>>>>>>>>>> the new (replaced) brick's information wasn't persisted in the
>>>>>>>>>> configuration. The workaround for this is to copy the content of
>>>>>>>>>> /var/lib/glusterd/vols/gfs-tst/ from one of the nodes in the trusted
>>>>>>>>>> storage pool to the node where glusterd service fails to come up and post
>>>>>>>>>> that restarting the glusterd service should be able to make peer status
>>>>>>>>>> reporting all nodes healthy and connected.
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 16, 2019 at 3:49 PM Amudhan P <amudhan83 at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> In short, when I started glusterd service I am getting following
>>>>>>>>>>> error msg in the glusterd.log file in one server.
>>>>>>>>>>> what needs to be done?
>>>>>>>>>>>
>>>>>>>>>>> error logged in glusterd.log
>>>>>>>>>>>
>>>>>>>>>>> [2019-01-15 17:50:13.956053] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>>>>>>>>>>> [2019-01-15 17:50:13.960131] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
>>>>>>>>>>> [2019-01-15 17:50:13.960193] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
>>>>>>>>>>> [2019-01-15 17:50:13.960212] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
>>>>>>>>>>> [2019-01-15 17:50:13.964437] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>>>>>>>>>>> [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
>>>>>>>>>>> [2019-01-15 17:50:13.964491] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>>>>>>>>>>> [2019-01-15 17:50:13.964560] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
>>>>>>>>>>> [2019-01-15 17:50:13.964579] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
>>>>>>>>>>> [2019-01-15 17:50:14.967681] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
>>>>>>>>>>> [2019-01-15 17:50:14.973931] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>>>>>>> [2019-01-15 17:50:15.046620] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such file or directory]
>>>>>>>>>>> [2019-01-15 17:50:15.046685] E [MSGID: 106201] [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: gfs-tst
>>>>>>>>>>> [2019-01-15 17:50:15.046718] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
>>>>>>>>>>> [2019-01-15 17:50:15.046732] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
>>>>>>>>>>> [2019-01-15 17:50:15.046741] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>>>>>>>>>> [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> In long, I am trying to simulate a situation where the volume
>>>>>>>>>>> stopped abnormally and the
>>>>>>>>>>> entire cluster restarted with some missing disks.
>>>>>>>>>>>
>>>>>>>>>>> My test cluster is set up with 3 nodes and each has four disks,
>>>>>>>>>>> I have setup a volume with disperse 4+2.
>>>>>>>>>>> In Node-3 2 disks have failed, to replace I have shutdown all
>>>>>>>>>>> system
>>>>>>>>>>>
>>>>>>>>>>> below are the steps done.
>>>>>>>>>>>
>>>>>>>>>>> 1. umount from client machine
>>>>>>>>>>> 2. shutdown all system by running `shutdown -h now` command
>>>>>>>>>>> (without stopping volume and stop service)
>>>>>>>>>>> 3. replace faulty disk in Node-3
>>>>>>>>>>> 4. powered ON all system
>>>>>>>>>>> 5. format replaced drives, and mount all drives
>>>>>>>>>>> 6. start glusterd service in all node (success)
>>>>>>>>>>> 7. Now running `volume status` command from node-3
>>>>>>>>>>> output : [2019-01-15 16:52:17.718422]  : v status : FAILED : Staging failed on 0083ec0c-40bf-472a-a128-458924e56c96. Please check log file for details.
>>>>>>>>>>> 8. running `volume start gfs-tst` command from node-3
>>>>>>>>>>> output : [2019-01-15 16:53:19.410252]  : v start gfs-tst : FAILED : Volume gfs-tst already started
>>>>>>>>>>>
>>>>>>>>>>> 9. running `gluster v status` in other node. showing all brick
>>>>>>>>>>> available but 'self-heal daemon' not running
>>>>>>>>>>> @gfstst-node2:~$ sudo gluster v status
>>>>>>>>>>> Status of volume: gfs-tst
>>>>>>>>>>> Gluster process                     TCP Port  RDMA Port  Online  Pid
>>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>>> Brick IP.2:/media/disk1/brick1      49152     0          Y       1517
>>>>>>>>>>> Brick IP.4:/media/disk1/brick1      49152     0          Y       1668
>>>>>>>>>>> Brick IP.2:/media/disk2/brick2      49153     0          Y       1522
>>>>>>>>>>> Brick IP.4:/media/disk2/brick2      49153     0          Y       1678
>>>>>>>>>>> Brick IP.2:/media/disk3/brick3      49154     0          Y       1527
>>>>>>>>>>> Brick IP.4:/media/disk3/brick3      49154     0          Y       1677
>>>>>>>>>>> Brick IP.2:/media/disk4/brick4      49155     0          Y       1541
>>>>>>>>>>> Brick IP.4:/media/disk4/brick4      49155     0          Y       1683
>>>>>>>>>>> Self-heal Daemon on localhost       N/A       N/A        Y       2662
>>>>>>>>>>> Self-heal Daemon on IP.4            N/A       N/A        Y       2786
>>>>>>>>>>>
>>>>>>>>>>> 10. in the above output 'volume already started'. so, running
>>>>>>>>>>> `reset-brick` command
>>>>>>>>>>>    v reset-brick gfs-tst IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force
>>>>>>>>>>>
>>>>>>>>>>> output : [2019-01-15 16:57:37.916942]  : v reset-brick gfs-tst IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force : FAILED : /media/disk3/brick3 is already part of a volume
>>>>>>>>>>>
>>>>>>>>>>> 11. reset-brick command was not working, so, tried stopping
>>>>>>>>>>> volume and start with force command
>>>>>>>>>>> output : [2019-01-15 17:01:04.570794]  : v start gfs-tst force : FAILED : Pre-validation failed on localhost. Please check log file for details
>>>>>>>>>>>
>>>>>>>>>>> 12. now stopped service in all node and tried starting again.
>>>>>>>>>>> except node-3 other nodes service started successfully without any issues.
>>>>>>>>>>>
>>>>>>>>>>> in node-3 receiving following message.
>>>>>>>>>>>
>>>>>>>>>>> sudo service glusterd start
>>>>>>>>>>> * Starting glusterd service glusterd                     [fail]
>>>>>>>>>>> /usr/local/sbin/glusterd: option requires an argument -- 'f'
>>>>>>>>>>> Try `glusterd --help' or `glusterd --usage' for more information.
>>>>>>>>>>>
>>>>>>>>>>> 13. checking glusterd log file found that OS drive was running
>>>>>>>>>>> out of space
>>>>>>>>>>> output : [2019-01-15 16:51:37.210792] W [MSGID: 101012] [store.c:372:gf_store_save_value] 0-management: fflush failed. [No space left on device]
>>>>>>>>>>> [2019-01-15 16:51:37.210874] E [MSGID: 106190] [glusterd-store.c:1058:glusterd_volume_exclude_options_write] 0-management: Unable to write volume values for gfs-tst
>>>>>>>>>>>
>>>>>>>>>>> 14. cleared some space in OS drive but still, service is not
>>>>>>>>>>> running. below is the error logged in glusterd.log
>>>>>>>>>>>
>>>>>>>>>>> [2019-01-15 17:50:13.956053] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>>>>>>>>>>> [2019-01-15 17:50:13.960131] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
>>>>>>>>>>> [2019-01-15 17:50:13.960193] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
>>>>>>>>>>> [2019-01-15 17:50:13.960212] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
>>>>>>>>>>> [2019-01-15 17:50:13.964437] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>>>>>>>>>>> [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
>>>>>>>>>>> [2019-01-15 17:50:13.964491] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>>>>>>>>>>> [2019-01-15 17:50:13.964560] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
>>>>>>>>>>> [2019-01-15 17:50:13.964579] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
>>>>>>>>>>> [2019-01-15 17:50:14.967681] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
>>>>>>>>>>> [2019-01-15 17:50:14.973931] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>>>>>>> [2019-01-15 17:50:15.046620] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such file or directory]
>>>>>>>>>>> [2019-01-15 17:50:15.046685] E [MSGID: 106201] [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: gfs-tst
>>>>>>>>>>> [2019-01-15 17:50:15.046718] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
>>>>>>>>>>> [2019-01-15 17:50:15.046732] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
>>>>>>>>>>> [2019-01-15 17:50:15.046741] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>>>>>>>>>> [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 15. In other node running `volume status` still shows bricks
>>>>>>>>>>> node3 is live
>>>>>>>>>>>      but 'peer status' showing node-3 disconnected
>>>>>>>>>>>
>>>>>>>>>>> @gfstst-node2:~$ sudo gluster v status
>>>>>>>>>>> Status of volume: gfs-tst
>>>>>>>>>>> Gluster process                     TCP Port  RDMA Port  Online  Pid
>>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>>> Brick IP.2:/media/disk1/brick1      49152     0          Y       1517
>>>>>>>>>>> Brick IP.4:/media/disk1/brick1      49152     0          Y       1668
>>>>>>>>>>> Brick IP.2:/media/disk2/brick2      49153     0          Y       1522
>>>>>>>>>>> Brick IP.4:/media/disk2/brick2      49153     0          Y       1678
>>>>>>>>>>> Brick IP.2:/media/disk3/brick3      49154     0          Y       1527
>>>>>>>>>>> Brick IP.4:/media/disk3/brick3      49154     0          Y       1677
>>>>>>>>>>> Brick IP.2:/media/disk4/brick4      49155     0          Y       1541
>>>>>>>>>>> Brick IP.4:/media/disk4/brick4      49155     0          Y       1683
>>>>>>>>>>> Self-heal Daemon on localhost       N/A       N/A        Y       2662
>>>>>>>>>>> Self-heal Daemon on IP.4            N/A       N/A        Y       2786
>>>>>>>>>>>
>>>>>>>>>>> Task Status of Volume gfs-tst
>>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>>> There are no active volume tasks
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> root at gfstst-node2:~$ sudo gluster pool list
>>>>>>>>>>> UUID                                    Hostname        State
>>>>>>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d    IP.3            Disconnected
>>>>>>>>>>> c1cbb58e-3ceb-4637-9ba3-3d28ef20b143    IP.4            Connected
>>>>>>>>>>> 0083ec0c-40bf-472a-a128-458924e56c96    localhost       Connected
>>>>>>>>>>>
>>>>>>>>>>> root at gfstst-node2:~$ sudo gluster peer status
>>>>>>>>>>> Number of Peers: 2
>>>>>>>>>>>
>>>>>>>>>>> Hostname: IP.3
>>>>>>>>>>> Uuid: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>>>>>>> State: Peer in Cluster (Disconnected)
>>>>>>>>>>>
>>>>>>>>>>> Hostname: IP.4
>>>>>>>>>>> Uuid: c1cbb58e-3ceb-4637-9ba3-3d28ef20b143
>>>>>>>>>>> State: Peer in Cluster (Connected)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> regards
>>>>>>>>>>> Amudhan
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>

Atin Mukherjee

2019-Jan-31 03:24 UTC


[Gluster-users] glusterfs 4.1.6 error in starting glusterd service

I'm not sure how you ended up in a state where one of the nodes lost the
information of one peer in the cluster. I suspect that while doing a replace
node operation you landed in this situation through an incorrect step.
Unless you can elaborate on all the steps you performed in the cluster,
it will be difficult to figure out the exact cause.
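
For anyone retracing this, one rough way to spot a node that has lost a peer
entry is to compare the file names under peers/ in each node's copy of
/var/lib/glusterd. The Python sketch below is only an illustration, not a
Gluster tool. It assumes each node's /var/lib/glusterd has been copied to a
local directory (as was done for this thread), that peer files are named after
the peer's UUID, and that glusterd.info contains a UUID= line. The function
names read_own_uuid and missing_peers are made up for this example.

```python
from pathlib import Path
from typing import Dict, Set


def read_own_uuid(glusterd_dir: Path) -> str:
    """Return the node's own UUID from glusterd.info (assumed key=value lines)."""
    info = glusterd_dir / "glusterd.info"
    for line in info.read_text().splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "UUID":
            return value.strip()
    raise ValueError(f"no UUID= line found in {info}")


def missing_peers(dumps: Dict[str, Path]) -> Dict[str, Set[str]]:
    """For each node, list the UUIDs of other nodes that have no file in peers/.

    `dumps` maps a label such as "node3" to a local copy of that node's
    /var/lib/glusterd directory.
    """
    own = {node: read_own_uuid(path) for node, path in dumps.items()}
    report: Dict[str, Set[str]] = {}
    for node, path in dumps.items():
        peers_dir = path / "peers"
        # Peer files are assumed to be named after the peer's UUID.
        known = {p.name for p in peers_dir.iterdir()} if peers_dir.is_dir() else set()
        expected = set(own.values()) - {own[node]}
        report[node] = expected - known
    return report
```

Called with the copied directories of all three nodes, a non-empty set for one
node would point at the kind of missing peer file discussed in the quoted
messages; an empty set for every node means each node knows about all the
others.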

On Wed, Jan 30, 2019 at 7:25 PM Amudhan P <amudhan83 at gmail.com> wrote:
> Hi Atin,
>
> yes, it worked out thank you.
>
> what would be the cause of this issue?
>
>
>
> On Fri, Jan 25, 2019 at 1:56 PM Atin Mukherjee <amukherj at
redhat.com>
> wrote:
>
>> Amudhan,
>>
>> So here's the issue:
>>
>> In node3, 'cat /var/lib/glusterd/peers/* ' doesn't show up
node2's
>> details and that's why glusterd wasn't able to resolve the
brick(s) hosted
>> on node2.
>>
>> Can you please pick up 0083ec0c-40bf-472a-a128-458924e56c96 file from
>> /var/lib/glusterd/peers/ from node 4 and place it in the same location
in
>> node 3 and then restart glusterd service on node 3?
>>
>>
>> On Thu, Jan 24, 2019 at 11:57 AM Amudhan P <amudhan83 at
gmail.com> wrote:
>>
>>> Atin,
>>>
>>> Sorry, i missed to send entire `glusterd` folder.  Now attached zip
>>> contains `glusterd` folder from all nodes.
>>>
>>> the problem node is node3 IP 10.1.2.3, `glusterd` log file is
inside
>>> node3 folder.
>>>
>>> regards
>>> Amudhan
>>>
>>> On Wed, Jan 23, 2019 at 11:02 PM Atin Mukherjee <amukherj at
redhat.com>
>>> wrote:
>>>
>>>> Amudhan,
>>>>
>>>> I see that you have provided the content of the configuration
of the
>>>> volume gfs-tst where the request was to share the dump of
>>>> /var/lib/glusterd/* . I can not debug this further until you
share the
>>>> correct dump.
>>>>
>>>> On Thu, Jan 17, 2019 at 3:43 PM Atin Mukherjee <amukherj at
redhat.com>
>>>> wrote:
>>>>
>>>>> Can you please run 'glusterd -LDEBUG' and share
back the glusterd.log?
>>>>> Instead of doing too many back and forth I suggest you to
share the content
>>>>> of /var/lib/glusterd from all the nodes. Also do mention
which particular
>>>>> node the glusterd service is unable to come up.
>>>>>
>>>>> On Thu, Jan 17, 2019 at 11:34 AM Amudhan P <amudhan83 at
gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I have created the folder in the path as said but
still, service
>>>>>> failed to start below is the error msg in glusterd.log
>>>>>>
>>>>>> [2019-01-16 14:50:14.555742] I [MSGID: 100030]
>>>>>> [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd:
Started running
>>>>>> /usr/local/sbin/glusterd version 4.1.6 (args:
/usr/local/sbin/glusterd -p
>>>>>> /var/run/glusterd.pid)
>>>>>> [2019-01-16 14:50:14.559835] I [MSGID: 106478]
[glusterd.c:1423:init]
>>>>>> 0-management: Maximum allowed open file descriptors set
to 65536
>>>>>> [2019-01-16 14:50:14.559894] I [MSGID: 106479]
[glusterd.c:1481:init]
>>>>>> 0-management: Using /var/lib/glusterd as working
directory
>>>>>> [2019-01-16 14:50:14.559912] I [MSGID: 106479]
[glusterd.c:1486:init]
>>>>>> 0-management: Using /var/run/gluster as pid file
working directory
>>>>>> [2019-01-16 14:50:14.563834] W [MSGID: 103071]
>>>>>> [rdma.c:4629:__gf_rdma_ctx_create]
0-rpc-transport/rdma: rdma_cm event
>>>>>> channel creation failed [No such device]
>>>>>> [2019-01-16 14:50:14.563867] W [MSGID: 103055]
[rdma.c:4938:init]
>>>>>> 0-rdma.management: Failed to initialize IB Device
>>>>>> [2019-01-16 14:50:14.563882] W
>>>>>> [rpc-transport.c:351:rpc_transport_load]
0-rpc-transport: 'rdma'
>>>>>> initialization failed
>>>>>> [2019-01-16 14:50:14.563957] W
[rpcsvc.c:1781:rpcsvc_create_listener]
>>>>>> 0-rpc-service: cannot create listener, initing the
transport failed
>>>>>> [2019-01-16 14:50:14.563974] E [MSGID: 106244]
[glusterd.c:1764:init]
>>>>>> 0-management: creation of 1 listeners failed,
continuing with succeeded
>>>>>> transport
>>>>>> [2019-01-16 14:50:15.565868] I [MSGID: 106513]
>>>>>> [glusterd-store.c:2240:glusterd_restore_op_version]
0-glusterd: retrieved
>>>>>> op-version: 40100
>>>>>> [2019-01-16 14:50:15.642532] I [MSGID: 106544]
>>>>>> [glusterd.c:158:glusterd_uuid_init] 0-management:
retrieved UUID:
>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>> [2019-01-16 14:50:15.675333] I [MSGID: 106498]
>>>>>>
[glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management:
>>>>>> connect returned 0
>>>>>> [2019-01-16 14:50:15.675421] W [MSGID: 106061]
>>>>>>
[glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd:
>>>>>> Failed to get tcp-user-timeout
>>>>>> [2019-01-16 14:50:15.675451] I
>>>>>> [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-management: setting
>>>>>> frame-timeout to 600
>>>>>> *[2019-01-16 14:50:15.676912] E [MSGID: 106187]
>>>>>> [glusterd-store.c:4662:glusterd_resolve_all_bricks]
0-glusterd: resolve
>>>>>> brick failed in restore*
>>>>>> *[2019-01-16 14:50:15.676956] E [MSGID: 101019]
>>>>>> [xlator.c:720:xlator_init] 0-management: Initialization
of volume
>>>>>> 'management' failed, review your volfile again*
>>>>>> [2019-01-16 14:50:15.676973] E [MSGID: 101066]
>>>>>> [graph.c:367:glusterfs_graph_init] 0-management:
initializing translator
>>>>>> failed
>>>>>> [2019-01-16 14:50:15.676986] E [MSGID: 101176]
>>>>>> [graph.c:738:glusterfs_graph_activate] 0-graph: init
failed
>>>>>> [2019-01-16 14:50:15.677479] W
[glusterfsd.c:1514:cleanup_and_exit]
>>>>>>
(-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52]
>>>>>>
-->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41]
>>>>>> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f)
[0x40942f] ) 0-:
>>>>>> received signum (-1), shutting down
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 17, 2019 at 8:06 AM Atin Mukherjee
<amukherj at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>> If gluster volume info/status shows the brick to be
>>>>>>> /media/disk4/brick4 then you'd need to mount
the same path and hence you'd
>>>>>>> need to create the brick4 directory explicitly. I
fail to understand the
>>>>>>> rationale how only /media/disk4 can be used as the
mount path for the
>>>>>>> brick.
>>>>>>>
>>>>>>> On Wed, Jan 16, 2019 at 5:24 PM Amudhan P <amudhan83 at gmail.com> wrote:
>>>>>>>
>>>>>>>> Yes, I did mount the bricks, but the folder 'brick4' was still not
>>>>>>>> created inside the brick.
>>>>>>>> Do I need to create this folder? When I run replace-brick, it
>>>>>>>> creates the folder inside the brick. I have seen this behavior
>>>>>>>> before when running replace-brick or when heal begins.
>>>>>>>>
>>>>>>>> On Wed, Jan 16, 2019 at 5:05 PM Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>>>>>
>>>>>>>>> On Wed, Jan 16, 2019 at 5:02 PM Amudhan P <amudhan83 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Atin,
>>>>>>>>>> I have copied the content of 'gfs-tst' from the vol folder on
>>>>>>>>>> another node. Starting the service fails again, with this error
>>>>>>>>>> in the glusterd.log file.
>>>>>>>>>>
>>>>>>>>>> [2019-01-15 20:16:59.513023] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>>>>>>>>>> [2019-01-15 20:16:59.517164] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
>>>>>>>>>> [2019-01-15 20:16:59.517264] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
>>>>>>>>>> [2019-01-15 20:16:59.517283] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
>>>>>>>>>> [2019-01-15 20:16:59.521508] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>>>>>>>>>> [2019-01-15 20:16:59.521544] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
>>>>>>>>>> [2019-01-15 20:16:59.521562] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>>>>>>>>>> [2019-01-15 20:16:59.521629] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
>>>>>>>>>> [2019-01-15 20:16:59.521648] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
>>>>>>>>>> [2019-01-15 20:17:00.529390] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
>>>>>>>>>> [2019-01-15 20:17:00.608354] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>>>>>> [2019-01-15 20:17:00.650911] W [MSGID: 106425] [glusterd-store.c:2643:glusterd_store_retrieve_bricks] 0-management: failed to get statfs() call on brick /media/disk4/brick4 [No such file or directory]
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This means that the underlying brick /media/disk4/brick4 doesn't
>>>>>>>>> exist. You already mentioned that you had replaced the faulty
>>>>>>>>> disk, but have you not mounted it yet?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> [2019-01-15 20:17:00.691240] I [MSGID: 106498] [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
>>>>>>>>>> [2019-01-15 20:17:00.691307] W [MSGID: 106061] [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
>>>>>>>>>> [2019-01-15 20:17:00.691331] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
>>>>>>>>>> [2019-01-15 20:17:00.692547] E [MSGID: 106187] [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
>>>>>>>>>> [2019-01-15 20:17:00.692582] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
>>>>>>>>>> [2019-01-15 20:17:00.692597] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
>>>>>>>>>> [2019-01-15 20:17:00.692607] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>>>>>>>>> [2019-01-15 20:17:00.693004] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down
>>>>>>>>>>
>>>>>>>>>>
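
When reading a startup log like the one above, the lines that matter are the ones at level E. The following Python sketch pulls them out of a glusterd.log; it assumes the on-disk log has one entry per line in the layout shown in this thread (timestamp, level letter, optional MSGID, source location, message). The regular expression and function name are illustrative, not part of GlusterFS.

```python
import re

# Matches entries such as:
# [2019-01-15 20:17:00.692547] E [MSGID: 106187] [file.c:123:func] 0-glusterd: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^\]]+)\]\s+(?P<msg>.*)$"
)

def error_entries(lines):
    """Return (timestamp, msgid, message) for every error-level entry."""
    out = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group("level") == "E":
            out.append((m.group("ts"), m.group("msgid"), m.group("msg")))
    return out
```

A typical use would be `error_entries(open(path_to_log))`, where `path_to_log` is wherever glusterd writes its log on the node in question. Entries without a MSGID are returned with `None` in that position.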
>>>>>>>>>> On Wed, Jan 16, 2019 at 4:34 PM Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> This is a case of a partially written transaction. The host ran
>>>>>>>>>>> out of space on the root partition, where all the glusterd
>>>>>>>>>>> related configuration is persisted, so the transaction couldn't
>>>>>>>>>>> be written and the new (replaced) brick's information wasn't
>>>>>>>>>>> persisted in the configuration. The workaround is to copy the
>>>>>>>>>>> content of /var/lib/glusterd/vols/gfs-tst/ from one of the nodes
>>>>>>>>>>> in the trusted storage pool to the node where the glusterd
>>>>>>>>>>> service fails to come up. After that, restarting the glusterd
>>>>>>>>>>> service should make peer status report all nodes healthy and
>>>>>>>>>>> connected.
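
Before copying the volume directory as described above, it can help to see which store files the failing node is actually missing. The sketch below compares two directory trees and lists files present in a reference copy but absent on the target. It assumes a copy of /var/lib/glusterd/vols/gfs-tst from a healthy node has already been fetched to some local directory; the function names are hypothetical and the script only reports, it copies nothing.

```python
import os

def relative_files(root):
    """Collect every file under root as a path relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found

def missing_on_target(reference_root, target_root):
    """Files that exist in the reference tree but not in the target tree."""
    return sorted(relative_files(reference_root) - relative_files(target_root))
```

In this thread the missing entry would be expected under the `bricks/` sub-directory, since the log complains about a brick file that could not be found there. The comparison is by file name only and does not detect files that exist but were truncated by the failed write.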
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jan 16, 2019 at 3:49 PM Amudhan P <amudhan83 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> In short, when I start the glusterd service I get the following
>>>>>>>>>>>> error in the glusterd.log file on one server.
>>>>>>>>>>>> What needs to be done?
>>>>>>>>>>>>
>>>>>>>>>>>> Error logged in glusterd.log:
>>>>>>>>>>>>
>>>>>>>>>>>> [2019-01-15 17:50:13.956053] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>>>>>>>>>>>> [2019-01-15 17:50:13.960131] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
>>>>>>>>>>>> [2019-01-15 17:50:13.960193] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
>>>>>>>>>>>> [2019-01-15 17:50:13.960212] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
>>>>>>>>>>>> [2019-01-15 17:50:13.964437] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>>>>>>>>>>>> [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
>>>>>>>>>>>> [2019-01-15 17:50:13.964491] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>>>>>>>>>>>> [2019-01-15 17:50:13.964560] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
>>>>>>>>>>>> [2019-01-15 17:50:13.964579] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
>>>>>>>>>>>> [2019-01-15 17:50:14.967681] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
>>>>>>>>>>>> [2019-01-15 17:50:14.973931] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>>>>>>>> [2019-01-15 17:50:15.046620] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such file or directory]
>>>>>>>>>>>> [2019-01-15 17:50:15.046685] E [MSGID: 106201] [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: gfs-tst
>>>>>>>>>>>> [2019-01-15 17:50:15.046718] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
>>>>>>>>>>>> [2019-01-15 17:50:15.046732] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
>>>>>>>>>>>> [2019-01-15 17:50:15.046741] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>>>>>>>>>>> [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> In long, I am trying to simulate a situation where the volume
>>>>>>>>>>>> stopped abnormally and the entire cluster was restarted with
>>>>>>>>>>>> some missing disks.
>>>>>>>>>>>>
>>>>>>>>>>>> My test cluster is set up with 3 nodes, each with four disks.
>>>>>>>>>>>> I have set up a volume with disperse 4+2.
>>>>>>>>>>>> In Node-3, 2 disks have failed; to replace them I shut down
>>>>>>>>>>>> all systems.
>>>>>>>>>>>>
>>>>>>>>>>>> Below are the steps done.
>>>>>>>>>>>>
>>>>>>>>>>>> 1. umount from the client machine
>>>>>>>>>>>> 2. shut down all systems by running the `shutdown -h now` command
>>>>>>>>>>>> (without stopping the volume or the service)
>>>>>>>>>>>> 3. replaced the faulty disk in Node-3
>>>>>>>>>>>> 4. powered on all systems
>>>>>>>>>>>> 5. formatted the replaced drives and mounted all drives
>>>>>>>>>>>> 6. started the glusterd service on all nodes (success)
>>>>>>>>>>>> 7. ran the `volume status` command from node-3
>>>>>>>>>>>> output : [2019-01-15 16:52:17.718422]  : v status : FAILED : Staging failed on 0083ec0c-40bf-472a-a128-458924e56c96. Please check log file for details.
>>>>>>>>>>>> 8. ran the `volume start gfs-tst` command from node-3
>>>>>>>>>>>> output : [2019-01-15 16:53:19.410252]  : v start gfs-tst : FAILED : Volume gfs-tst already started
>>>>>>>>>>>>
>>>>>>>>>>>> 9. ran `gluster v status` on another node. It shows all bricks
>>>>>>>>>>>> available but 'self-heal daemon' not running
>>>>>>>>>>>> @gfstst-node2:~$ sudo gluster v status
>>>>>>>>>>>> Status of volume: gfs-tst
>>>>>>>>>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>>>> Brick IP.2:/media/disk1/brick1              49152     0          Y       1517
>>>>>>>>>>>> Brick IP.4:/media/disk1/brick1              49152     0          Y       1668
>>>>>>>>>>>> Brick IP.2:/media/disk2/brick2              49153     0          Y       1522
>>>>>>>>>>>> Brick IP.4:/media/disk2/brick2              49153     0          Y       1678
>>>>>>>>>>>> Brick IP.2:/media/disk3/brick3              49154     0          Y       1527
>>>>>>>>>>>> Brick IP.4:/media/disk3/brick3              49154     0          Y       1677
>>>>>>>>>>>> Brick IP.2:/media/disk4/brick4              49155     0          Y       1541
>>>>>>>>>>>> Brick IP.4:/media/disk4/brick4              49155     0          Y       1683
>>>>>>>>>>>> Self-heal Daemon on localhost               N/A       N/A        Y       2662
>>>>>>>>>>>> Self-heal Daemon on IP.4                    N/A       N/A        Y       2786
>>>>>>>>>>>>
>>>>>>>>>>>> 10. the output above says 'volume already started', so I ran
>>>>>>>>>>>> the `reset-brick` command
>>>>>>>>>>>>    v reset-brick gfs-tst IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force
>>>>>>>>>>>>
>>>>>>>>>>>> output : [2019-01-15 16:57:37.916942]  : v reset-brick gfs-tst IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force : FAILED : /media/disk3/brick3 is already part of a volume
>>>>>>>>>>>>
>>>>>>>>>>>> 11. the reset-brick command was not working, so I tried stopping
>>>>>>>>>>>> the volume and starting it with the force option
>>>>>>>>>>>> output : [2019-01-15 17:01:04.570794]  : v start gfs-tst force : FAILED : Pre-validation failed on localhost. Please check log file for details
>>>>>>>>>>>>
>>>>>>>>>>>> 12. now stopped the service on all nodes and tried starting it
>>>>>>>>>>>> again. Except on node-3, the service started successfully on
>>>>>>>>>>>> the other nodes without any issues.
>>>>>>>>>>>>
>>>>>>>>>>>> On node-3 I receive the following message.
>>>>>>>>>>>>
>>>>>>>>>>>> sudo service glusterd start
>>>>>>>>>>>> * Starting glusterd service glusterd                     [fail]
>>>>>>>>>>>> /usr/local/sbin/glusterd: option requires an argument -- 'f'
>>>>>>>>>>>> Try `glusterd --help' or `glusterd --usage' for more information.
>>>>>>>>>>>>
>>>>>>>>>>>> 13. checking the glusterd log file, I found that the OS drive
>>>>>>>>>>>> was running out of space
>>>>>>>>>>>> output : [2019-01-15 16:51:37.210792] W [MSGID: 101012] [store.c:372:gf_store_save_value] 0-management: fflush failed. [No space left on device]
>>>>>>>>>>>> [2019-01-15 16:51:37.210874] E [MSGID: 106190] [glusterd-store.c:1058:glusterd_volume_exclude_options_write] 0-management: Unable to write volume values for gfs-tst
>>>>>>>>>>>>
>>>>>>>>>>>> 14. cleared some space on the OS drive, but the service is still
>>>>>>>>>>>> not running. Below is the error logged in glusterd.log
>>>>>>>>>>>>
>>>>>>>>>>>> [2019-01-15 17:50:13.956053] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>>>>>>>>>>>> [2019-01-15 17:50:13.960131] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
>>>>>>>>>>>> [2019-01-15 17:50:13.960193] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
>>>>>>>>>>>> [2019-01-15 17:50:13.960212] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
>>>>>>>>>>>> [2019-01-15 17:50:13.964437] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>>>>>>>>>>>> [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
>>>>>>>>>>>> [2019-01-15 17:50:13.964491] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>>>>>>>>>>>> [2019-01-15 17:50:13.964560] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
>>>>>>>>>>>> [2019-01-15 17:50:13.964579] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
>>>>>>>>>>>> [2019-01-15 17:50:14.967681] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
>>>>>>>>>>>> [2019-01-15 17:50:14.973931] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>>>>>>>> [2019-01-15 17:50:15.046620] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such file or directory]
>>>>>>>>>>>> [2019-01-15 17:50:15.046685] E [MSGID: 106201] [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: gfs-tst
>>>>>>>>>>>> [2019-01-15 17:50:15.046718] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
>>>>>>>>>>>> [2019-01-15 17:50:15.046732] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
>>>>>>>>>>>> [2019-01-15 17:50:15.046741] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>>>>>>>>>>> [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 15. On another node, running `volume status` still shows the
>>>>>>>>>>>> bricks of node3 as live,
>>>>>>>>>>>>      but 'peer status' shows node-3 disconnected
>>>>>>>>>>>>
>>>>>>>>>>>> @gfstst-node2:~$ sudo gluster v status
>>>>>>>>>>>> Status of volume: gfs-tst
>>>>>>>>>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>>>> Brick IP.2:/media/disk1/brick1              49152     0          Y       1517
>>>>>>>>>>>> Brick IP.4:/media/disk1/brick1              49152     0          Y       1668
>>>>>>>>>>>> Brick IP.2:/media/disk2/brick2              49153     0          Y       1522
>>>>>>>>>>>> Brick IP.4:/media/disk2/brick2              49153     0          Y       1678
>>>>>>>>>>>> Brick IP.2:/media/disk3/brick3              49154     0          Y       1527
>>>>>>>>>>>> Brick IP.4:/media/disk3/brick3              49154     0          Y       1677
>>>>>>>>>>>> Brick IP.2:/media/disk4/brick4              49155     0          Y       1541
>>>>>>>>>>>> Brick IP.4:/media/disk4/brick4              49155     0          Y       1683
>>>>>>>>>>>> Self-heal Daemon on localhost               N/A       N/A        Y       2662
>>>>>>>>>>>> Self-heal Daemon on IP.4                    N/A       N/A        Y       2786
>>>>>>>>>>>>
>>>>>>>>>>>> Task Status of Volume gfs-tst
>>>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>>>> There are no active volume tasks
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> root@gfstst-node2:~$ sudo gluster pool list
>>>>>>>>>>>> UUID                                    Hostname        State
>>>>>>>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d    IP.3            Disconnected
>>>>>>>>>>>> c1cbb58e-3ceb-4637-9ba3-3d28ef20b143    IP.4            Connected
>>>>>>>>>>>> 0083ec0c-40bf-472a-a128-458924e56c96    localhost       Connected
>>>>>>>>>>>>
>>>>>>>>>>>> root@gfstst-node2:~$ sudo gluster peer status
>>>>>>>>>>>> Number of Peers: 2
>>>>>>>>>>>>
>>>>>>>>>>>> Hostname: IP.3
>>>>>>>>>>>> Uuid: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>>>>>>>> State: Peer in Cluster (Disconnected)
>>>>>>>>>>>>
>>>>>>>>>>>> Hostname: IP.4
>>>>>>>>>>>> Uuid: c1cbb58e-3ceb-4637-9ba3-3d28ef20b143
>>>>>>>>>>>> State: Peer in Cluster (Connected)
>>>>>>>>>>>>
>>>>>>>>>>>>
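
The pool listing above is easy to scan by eye on three nodes, but the same check can be scripted. This Python sketch takes the text printed by `gluster pool list` and returns the peers whose state is Disconnected. It assumes the three-column layout shown in this thread (UUID, hostname, state); the function name is hypothetical and the parsing is not an official interface.

```python
def disconnected_peers(pool_list_output):
    """Return (uuid, hostname) for each peer reported as Disconnected."""
    peers = []
    for line in pool_list_output.splitlines():
        fields = line.split()
        # Expect exactly UUID, hostname, state; skip the header and blanks.
        if len(fields) != 3 or fields[0] == "UUID":
            continue
        uuid, hostname, state = fields
        if state == "Disconnected":
            peers.append((uuid, hostname))
    return peers
```

Fed the listing from node2 in this thread, it would report the node with UUID d6bf51a7-c296-492f-8dac-e81efa9dd22d (IP.3), which is the node whose glusterd fails to start.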
>>>>>>>>>>>> regards
>>>>>>>>>>>> Amudhan
>>>>>>>>>>>>
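
Step 13 of the report above traced the trouble to the OS drive running out of space while glusterd was writing its configuration. A small Python sketch for checking free space on the filesystem that holds the glusterd working directory follows. The function name and the threshold are hypothetical; /var/lib/glusterd is the working directory named in the logs.

```python
import shutil

def free_space_ok(path, min_free_bytes):
    """True if the filesystem containing path has at least min_free_bytes free."""
    usage = shutil.disk_usage(path)
    return usage.free >= min_free_bytes

if __name__ == "__main__":
    # 100 MiB is an arbitrary example threshold, not a GlusterFS requirement.
    try:
        print(free_space_ok("/var/lib/glusterd", 100 * 1024 * 1024))
    except FileNotFoundError:
        print("/var/lib/glusterd does not exist on this machine")
```

Running a check like this before configuration-changing commands such as reset-brick would not repair a partial write, but it could flag the condition that caused one here.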
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>