Amudhan P
2019-Jan-16 11:54 UTC
[Gluster-users] glusterfs 4.1.6 error in starting glusterd service
Yes, I did mount the bricks, but the folder 'brick4' was still not created inside the brick.
Do I need to create this folder? Because when I run replace-brick it creates the folder inside the brick. I have seen this behavior before when running replace-brick or when heal begins.

On Wed, Jan 16, 2019 at 5:05 PM Atin Mukherjee <amukherj at redhat.com> wrote:
>
> On Wed, Jan 16, 2019 at 5:02 PM Amudhan P <amudhan83 at gmail.com> wrote:
>
>> Atin,
>> I have copied the content of 'gfs-tst' from the vol folder on another node.
>> When starting the service again, it fails with this error msg in the glusterd.log file:
>>
>> [2019-01-15 20:16:59.513023] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>> [2019-01-15 20:16:59.517164] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
>> [2019-01-15 20:16:59.517264] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
>> [2019-01-15 20:16:59.517283] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
>> [2019-01-15 20:16:59.521508] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>> [2019-01-15 20:16:59.521544] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
>> [2019-01-15 20:16:59.521562] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>> [2019-01-15 20:16:59.521629] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
>> [2019-01-15 20:16:59.521648] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
>> [2019-01-15 20:17:00.529390] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
>> [2019-01-15 20:17:00.608354] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>> [2019-01-15 20:17:00.650911] W [MSGID: 106425] [glusterd-store.c:2643:glusterd_store_retrieve_bricks] 0-management: failed to get statfs() call on brick /media/disk4/brick4 [No such file or directory]
>
> This means that the underlying brick /media/disk4/brick4 doesn't exist. You already mentioned that you had replaced the faulty disk, but have you not mounted it yet?
>
>> [2019-01-15 20:17:00.691240] I [MSGID: 106498] [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
>> [2019-01-15 20:17:00.691307] W [MSGID: 106061] [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
>> [2019-01-15 20:17:00.691331] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
>> [2019-01-15 20:17:00.692547] E [MSGID: 106187] [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
>> [2019-01-15 20:17:00.692582] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
>> [2019-01-15 20:17:00.692597] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
>> [2019-01-15 20:17:00.692607] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>> [2019-01-15 20:17:00.693004] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down
>>
>> On Wed, Jan 16, 2019 at 4:34 PM Atin Mukherjee <amukherj at redhat.com> wrote:
>>
>>> This is a case of a partial write of a transaction: because the host ran out of space on the root partition, where all the glusterd-related configuration is persisted, the transaction couldn't be written, and hence the new (replaced) brick's information wasn't persisted in the configuration. The workaround is to copy the content of /var/lib/glusterd/vols/gfs-tst/ from one of the nodes in the trusted storage pool to the node where the glusterd service fails to come up; after that, restarting the glusterd service should make peer status report all nodes healthy and connected.
>>>
>>> On Wed, Jan 16, 2019 at 3:49 PM Amudhan P <amudhan83 at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> In short: when I start the glusterd service, I get the following error msg in the glusterd.log file on one server.
>>>> What needs to be done?
>>>>
>>>> Error logged in glusterd.log:
>>>>
>>>> [2019-01-15 17:50:13.956053] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>>>> [2019-01-15 17:50:13.960131] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
>>>> [2019-01-15 17:50:13.960193] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
>>>> [2019-01-15 17:50:13.960212] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
>>>> [2019-01-15 17:50:13.964437] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>>>> [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
>>>> [2019-01-15 17:50:13.964491] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>>>> [2019-01-15 17:50:13.964560] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
>>>> [2019-01-15 17:50:13.964579] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
>>>> [2019-01-15 17:50:14.967681] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
>>>> [2019-01-15 17:50:14.973931] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>> [2019-01-15 17:50:15.046620] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such file or directory]
>>>> [2019-01-15 17:50:15.046685] E [MSGID: 106201] [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: gfs-tst
>>>> [2019-01-15 17:50:15.046718] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
>>>> [2019-01-15 17:50:15.046732] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
>>>> [2019-01-15 17:50:15.046741] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>>> [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes
>>>>
>>>> In long: I am trying to simulate a situation where a volume stopped abnormally and the entire cluster was restarted with some missing disks.
>>>>
>>>> My test cluster is set up with 3 nodes, each with four disks, and I have set up a volume with disperse 4+2.
>>>> In Node-3, 2 disks failed; to replace them I shut down all systems.
>>>>
>>>> Below are the steps done:
>>>>
>>>> 1. umount from the client machine
>>>> 2. shut down all systems by running the `shutdown -h now` command (without stopping the volume or the service)
>>>> 3. replace the faulty disks in Node-3
>>>> 4. power ON all systems
>>>> 5. format the replaced drives and mount all drives
>>>> 6. start the glusterd service on all nodes (success)
>>>> 7. run the `volume status` command from node-3
>>>> output: [2019-01-15 16:52:17.718422] : v status : FAILED : Staging failed on 0083ec0c-40bf-472a-a128-458924e56c96. Please check log file for details.
>>>> 8. run the `volume start gfs-tst` command from node-3
>>>> output: [2019-01-15 16:53:19.410252] : v start gfs-tst : FAILED : Volume gfs-tst already started
>>>>
>>>> 9. run `gluster v status` on another node: it shows all bricks available, but the 'self-heal daemon' is not running
>>>> @gfstst-node2:~$ sudo gluster v status
>>>> Status of volume: gfs-tst
>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>> ------------------------------------------------------------------------------
>>>> Brick IP.2:/media/disk1/brick1              49152     0          Y       1517
>>>> Brick IP.4:/media/disk1/brick1              49152     0          Y       1668
>>>> Brick IP.2:/media/disk2/brick2              49153     0          Y       1522
>>>> Brick IP.4:/media/disk2/brick2              49153     0          Y       1678
>>>> Brick IP.2:/media/disk3/brick3              49154     0          Y       1527
>>>> Brick IP.4:/media/disk3/brick3              49154     0          Y       1677
>>>> Brick IP.2:/media/disk4/brick4              49155     0          Y       1541
>>>> Brick IP.4:/media/disk4/brick4              49155     0          Y       1683
>>>> Self-heal Daemon on localhost               N/A       N/A        Y       2662
>>>> Self-heal Daemon on IP.4                    N/A       N/A        Y       2786
>>>>
>>>> 10. since the above output says 'volume already started', I ran the `reset-brick` command:
>>>> v reset-brick gfs-tst IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force
>>>>
>>>> output: [2019-01-15 16:57:37.916942] : v reset-brick gfs-tst IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force : FAILED : /media/disk3/brick3 is already part of a volume
>>>>
>>>> 11. the reset-brick command was not working, so I tried stopping the volume and starting it with the force option
>>>> output: [2019-01-15 17:01:04.570794] : v start gfs-tst force : FAILED : Pre-validation failed on localhost. Please check log file for details
>>>>
>>>> 12. I then stopped the service on all nodes and tried starting it again. Except for node-3, the service on all other nodes started successfully without any issues.
>>>>
>>>> On node-3 I receive the following message:
>>>>
>>>> sudo service glusterd start
>>>> * Starting glusterd service glusterd                                   [fail]
>>>> /usr/local/sbin/glusterd: option requires an argument -- 'f'
>>>> Try `glusterd --help' or `glusterd --usage' for more information.
>>>>
>>>> 13. checking the glusterd log file, I found that the OS drive was running out of space
>>>> output: [2019-01-15 16:51:37.210792] W [MSGID: 101012] [store.c:372:gf_store_save_value] 0-management: fflush failed. [No space left on device]
>>>> [2019-01-15 16:51:37.210874] E [MSGID: 106190] [glusterd-store.c:1058:glusterd_volume_exclude_options_write] 0-management: Unable to write volume values for gfs-tst
>>>>
>>>> 14. I cleared some space on the OS drive, but the service is still not running.
>>>> Below is the error logged in glusterd.log:
>>>>
>>>> [2019-01-15 17:50:13.956053] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>>>> [2019-01-15 17:50:13.960131] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
>>>> [2019-01-15 17:50:13.960193] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
>>>> [2019-01-15 17:50:13.960212] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
>>>> [2019-01-15 17:50:13.964437] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>>>> [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
>>>> [2019-01-15 17:50:13.964491] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>>>> [2019-01-15 17:50:13.964560] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
>>>> [2019-01-15 17:50:13.964579] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
>>>> [2019-01-15 17:50:14.967681] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
>>>> [2019-01-15 17:50:14.973931] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>> [2019-01-15 17:50:15.046620] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such file or directory]
>>>> [2019-01-15 17:50:15.046685] E [MSGID: 106201] [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: gfs-tst
>>>> [2019-01-15 17:50:15.046718] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
>>>> [2019-01-15 17:50:15.046732] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
>>>> [2019-01-15 17:50:15.046741] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>>> [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down
>>>>
>>>> 15. On the other nodes, running `volume status` still shows the node-3 bricks as live, but `peer status` shows node-3 as disconnected.
>>>>
>>>> @gfstst-node2:~$ sudo gluster v status
>>>> Status of volume: gfs-tst
>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>> ------------------------------------------------------------------------------
>>>> Brick IP.2:/media/disk1/brick1              49152     0          Y       1517
>>>> Brick IP.4:/media/disk1/brick1              49152     0          Y       1668
>>>> Brick IP.2:/media/disk2/brick2              49153     0          Y       1522
>>>> Brick IP.4:/media/disk2/brick2              49153     0          Y       1678
>>>> Brick IP.2:/media/disk3/brick3              49154     0          Y       1527
>>>> Brick IP.4:/media/disk3/brick3              49154     0          Y       1677
>>>> Brick IP.2:/media/disk4/brick4              49155     0          Y       1541
>>>> Brick IP.4:/media/disk4/brick4              49155     0          Y       1683
>>>> Self-heal Daemon on localhost               N/A       N/A        Y       2662
>>>> Self-heal Daemon on IP.4                    N/A       N/A        Y       2786
>>>>
>>>> Task Status of Volume gfs-tst
>>>> ------------------------------------------------------------------------------
>>>> There are no active volume tasks
>>>>
>>>> root@gfstst-node2:~$ sudo gluster pool list
>>>> UUID                                    Hostname        State
>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d    IP.3            Disconnected
>>>> c1cbb58e-3ceb-4637-9ba3-3d28ef20b143    IP.4            Connected
>>>> 0083ec0c-40bf-472a-a128-458924e56c96    localhost       Connected
>>>>
>>>> root@gfstst-node2:~$ sudo gluster peer status
>>>> Number of Peers: 2
>>>>
>>>> Hostname: IP.3
>>>> Uuid: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>> State: Peer in Cluster (Disconnected)
>>>>
>>>> Hostname: IP.4
>>>> Uuid: c1cbb58e-3ceb-4637-9ba3-3d28ef20b143
>>>> State: Peer in Cluster (Connected)
>>>>
>>>> regards
>>>> Amudhan
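A minimal sketch of the workaround described above (copying the gfs-tst volume configuration from a healthy peer to the node where glusterd fails to start), assuming root SSH access between the nodes; the peer address IP.2 and the backup path are placeholders rather than details confirmed in the thread. Run on node-3:

# move the partially written volume configuration aside, keeping it for comparison
sudo mv /var/lib/glusterd/vols/gfs-tst /var/lib/glusterd/vols/gfs-tst.bak
# copy the volume configuration from a healthy peer in the trusted storage pool
sudo scp -r root@IP.2:/var/lib/glusterd/vols/gfs-tst /var/lib/glusterd/vols/
# confirm the root partition is no longer full, then start glusterd again
df -h /var/lib/glusterd
sudo service glusterd start
# peers should now report healthy and connected
sudo gluster peer status

Moving the existing gfs-tst directory aside rather than overwriting it in place keeps the partially written configuration available in case it needs to be examined later.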
Atin Mukherjee
2019-Jan-17 02:36 UTC
[Gluster-users] glusterfs 4.1.6 error in starting glusterd service
If gluster volume info/status shows the brick to be /media/disk4/brick4, then you'd need to mount the same path, and hence you'd need to create the brick4 directory explicitly. I fail to understand the rationale for why only /media/disk4 can be used as the mount path for the brick.

On Wed, Jan 16, 2019 at 5:24 PM Amudhan P <amudhan83 at gmail.com> wrote:

> Yes, I did mount the bricks, but the folder 'brick4' was still not created inside the brick.
> Do I need to create this folder? Because when I run replace-brick it creates the folder inside the brick. I have seen this behavior before when running replace-brick or when heal begins.
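A minimal sketch of the suggestion above, assuming the replaced disk shows up as /dev/sdX and has already been formatted with a filesystem (the device name is a placeholder, not a detail from the thread). Run on node-3:

# mount the replaced disk at the path the volume configuration expects
sudo mkdir -p /media/disk4
sudo mount /dev/sdX /media/disk4
# recreate the brick directory referenced by the volume (gfs-tst expects /media/disk4/brick4)
sudo mkdir -p /media/disk4/brick4
# start glusterd and check whether the brick is reported online
sudo service glusterd start
sudo gluster volume status gfs-tst

If glusterd still fails to start once the brick path exists, the gfs-tst configuration under /var/lib/glusterd/vols/ may also need to be restored from a healthy peer, as discussed earlier in the thread.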