Atin Mukherjee
2019-Jan-16 11:04 UTC
[Gluster-users] glusterfs 4.1.6 error in starting glusterd service
This is a case of a partially written transaction: the host ran out of space on the root partition, where all glusterd-related configuration is persisted, so the transaction could not be written and the new (replaced) brick's information was never persisted in the configuration. The workaround is to copy the contents of /var/lib/glusterd/vols/gfs-tst/ from one of the other nodes in the trusted storage pool to the node where the glusterd service fails to come up; after that, restarting the glusterd service should make peer status report all nodes healthy and connected.

On Wed, Jan 16, 2019 at 3:49 PM Amudhan P <amudhan83 at gmail.com> wrote:

> Hi,
>
> In short: when I start the glusterd service, I get the following error
> messages in the glusterd.log file on one server. What needs to be done?
>
> Error logged in glusterd.log:
>
> [2019-01-15 17:50:13.956053] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
> [2019-01-15 17:50:13.960131] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
> [2019-01-15 17:50:13.960193] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
> [2019-01-15 17:50:13.960212] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
> [2019-01-15 17:50:13.964437] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
> [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
> [2019-01-15 17:50:13.964491] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
> [2019-01-15 17:50:13.964560] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
> [2019-01-15 17:50:13.964579] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
> [2019-01-15 17:50:14.967681] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
> [2019-01-15 17:50:14.973931] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
> [2019-01-15 17:50:15.046620] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such file or directory]
> [2019-01-15 17:50:15.046685] E [MSGID: 106201] [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: gfs-tst
> [2019-01-15 17:50:15.046718] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
> [2019-01-15 17:50:15.046732] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
> [2019-01-15 17:50:15.046741] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
> [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes
>
> In more detail: I am trying to simulate a situation where the volume
> stopped abnormally and the entire cluster was restarted with some
> missing disks.
>
> My test cluster is set up with 3 nodes, each with four disks, and I have
> set up a volume with disperse 4+2. In node-3, 2 disks have failed; to
> replace them I shut down all systems.
>
> Below are the steps done:
>
> 1. unmount the volume from the client machine
> 2. shut down all systems with `shutdown -h now` (without stopping the
>    volume or the service)
> 3. replace the faulty disks in node-3
> 4. power on all systems
> 5. format the replaced drives and mount all drives
> 6. start the glusterd service on all nodes (success)
> 7. run `volume status` from node-3
>    output: [2019-01-15 16:52:17.718422] : v status : FAILED : Staging failed on 0083ec0c-40bf-472a-a128-458924e56c96. Please check log file for details.
> 8. run `volume start gfs-tst` from node-3
>    output: [2019-01-15 16:53:19.410252] : v start gfs-tst : FAILED : Volume gfs-tst already started
>
> 9. run `gluster v status` on another node: all bricks show as available,
>    but the 'self-heal daemon' is not running
>
> @gfstst-node2:~$ sudo gluster v status
> Status of volume: gfs-tst
> Gluster process                            TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick IP.2:/media/disk1/brick1             49152     0          Y       1517
> Brick IP.4:/media/disk1/brick1             49152     0          Y       1668
> Brick IP.2:/media/disk2/brick2             49153     0          Y       1522
> Brick IP.4:/media/disk2/brick2             49153     0          Y       1678
> Brick IP.2:/media/disk3/brick3             49154     0          Y       1527
> Brick IP.4:/media/disk3/brick3             49154     0          Y       1677
> Brick IP.2:/media/disk4/brick4             49155     0          Y       1541
> Brick IP.4:/media/disk4/brick4             49155     0          Y       1683
> Self-heal Daemon on localhost              N/A       N/A        Y       2662
> Self-heal Daemon on IP.4                   N/A       N/A        Y       2786
>
> 10. since the output above says 'volume already started', run the
>     `reset-brick` command:
>     v reset-brick gfs-tst IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force
>     output: [2019-01-15 16:57:37.916942] : v reset-brick gfs-tst IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force : FAILED : /media/disk3/brick3 is already part of a volume
>
> 11. the reset-brick command was not working, so I tried stopping the
>     volume and starting it with the force option
>     output: [2019-01-15 17:01:04.570794] : v start gfs-tst force : FAILED : Pre-validation failed on localhost. Please check log file for details
>
> 12. then I stopped the service on all nodes and tried starting again;
>     except for node-3, the service started successfully on all nodes
>     without any issues.
>
>     On node-3 I receive the following message:
>
>     sudo service glusterd start
>     * Starting glusterd service glusterd              [fail]
>     /usr/local/sbin/glusterd: option requires an argument -- 'f'
>     Try `glusterd --help' or `glusterd --usage' for more information.
>
> 13. checking the glusterd log file showed that the OS drive had run out
>     of space:
>     output: [2019-01-15 16:51:37.210792] W [MSGID: 101012] [store.c:372:gf_store_save_value] 0-management: fflush failed. [No space left on device]
>     [2019-01-15 16:51:37.210874] E [MSGID: 106190] [glusterd-store.c:1058:glusterd_volume_exclude_options_write] 0-management: Unable to write volume values for gfs-tst
>
> 14. I cleared some space on the OS drive, but the service still does not
>     start. Below is the error logged in glusterd.log:
>
> [2019-01-15 17:50:13.956053] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
> [2019-01-15 17:50:13.960131] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
> [2019-01-15 17:50:13.960193] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
> [2019-01-15 17:50:13.960212] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
> [2019-01-15 17:50:13.964437] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
> [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
> [2019-01-15 17:50:13.964491] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
> [2019-01-15 17:50:13.964560] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
> [2019-01-15 17:50:13.964579] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
> [2019-01-15 17:50:14.967681] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
> [2019-01-15 17:50:14.973931] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
> [2019-01-15 17:50:15.046620] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such file or directory]
> [2019-01-15 17:50:15.046685] E [MSGID: 106201] [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: gfs-tst
> [2019-01-15 17:50:15.046718] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
> [2019-01-15 17:50:15.046732] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
> [2019-01-15 17:50:15.046741] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
> [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down
>
> 15. on the other nodes, `volume status` still shows node-3's bricks as
>     live, but `peer status` shows node-3 as disconnected
>
> @gfstst-node2:~$ sudo gluster v status
> Status of volume: gfs-tst
> Gluster process                            TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick IP.2:/media/disk1/brick1             49152     0          Y       1517
> Brick IP.4:/media/disk1/brick1             49152     0          Y       1668
> Brick IP.2:/media/disk2/brick2             49153     0          Y       1522
> Brick IP.4:/media/disk2/brick2             49153     0          Y       1678
> Brick IP.2:/media/disk3/brick3             49154     0          Y       1527
> Brick IP.4:/media/disk3/brick3             49154     0          Y       1677
> Brick IP.2:/media/disk4/brick4             49155     0          Y       1541
> Brick IP.4:/media/disk4/brick4             49155     0          Y       1683
> Self-heal Daemon on localhost              N/A       N/A        Y       2662
> Self-heal Daemon on IP.4                   N/A       N/A        Y       2786
>
> Task Status of Volume gfs-tst
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> root at gfstst-node2:~$ sudo gluster pool list
> UUID                                  Hostname   State
> d6bf51a7-c296-492f-8dac-e81efa9dd22d  IP.3       Disconnected
> c1cbb58e-3ceb-4637-9ba3-3d28ef20b143  IP.4       Connected
> 0083ec0c-40bf-472a-a128-458924e56c96  localhost  Connected
>
> root at gfstst-node2:~$ sudo gluster peer status
> Number of Peers: 2
>
> Hostname: IP.3
> Uuid: d6bf51a7-c296-492f-8dac-e81efa9dd22d
> State: Peer in Cluster (Disconnected)
>
> Hostname: IP.4
> Uuid: c1cbb58e-3ceb-4637-9ba3-3d28ef20b143
> State: Peer in Cluster (Connected)
>
> regards
> Amudhan
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
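The workaround described above can be sketched as a short script run on the failed node. This is a rough sketch, not a tested procedure: it assumes rsync over ssh with root access to a healthy peer, and `HEALTHY_NODE` is a placeholder for any good peer's address. DRY_RUN defaults to 1 so the script only prints what it would do.

```shell
# Sketch of the workaround, to be run ON the failed node (node-3).
# Assumptions: rsync over ssh with root access to a healthy peer;
# HEALTHY_NODE is a placeholder. Set DRY_RUN=0 to actually execute.
HEALTHY_NODE=${HEALTHY_NODE:-IP.2}
VOLDIR=/var/lib/glusterd/vols/gfs-tst
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Keep the partially written config around for later inspection.
run cp -a "$VOLDIR" "$VOLDIR.bad"
# Replace it with the copy from a healthy peer in the trusted pool.
run rsync -a --delete "root@$HEALTHY_NODE:$VOLDIR/" "$VOLDIR/"
# Restart glusterd and verify that all peers report as connected.
run service glusterd restart
run gluster peer status
```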
Amudhan P
2019-Jan-16 11:32 UTC
[Gluster-users] glusterfs 4.1.6 error in starting glusterd service
Atin,

I have copied the contents of 'gfs-tst' from the vols folder on another node. When starting the service again, it fails with the following error messages in the glusterd.log file:

[2019-01-15 20:16:59.513023] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
[2019-01-15 20:16:59.517164] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
[2019-01-15 20:16:59.517264] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
[2019-01-15 20:16:59.517283] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
[2019-01-15 20:16:59.521508] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2019-01-15 20:16:59.521544] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
[2019-01-15 20:16:59.521562] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2019-01-15 20:16:59.521629] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2019-01-15 20:16:59.521648] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2019-01-15 20:17:00.529390] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
[2019-01-15 20:17:00.608354] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
[2019-01-15 20:17:00.650911] W [MSGID: 106425] [glusterd-store.c:2643:glusterd_store_retrieve_bricks] 0-management: failed to get statfs() call on brick /media/disk4/brick4 [No such file or directory]
[2019-01-15 20:17:00.691240] I [MSGID: 106498] [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2019-01-15 20:17:00.691307] W [MSGID: 106061] [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2019-01-15 20:17:00.691331] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2019-01-15 20:17:00.692547] E [MSGID: 106187] [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2019-01-15 20:17:00.692582] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2019-01-15 20:17:00.692597] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
[2019-01-15 20:17:00.692607] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
[2019-01-15 20:17:00.693004] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down

On Wed, Jan 16, 2019 at 4:34 PM Atin Mukherjee <amukherj at redhat.com> wrote:

> This is a case of partial write of a transaction and as the host ran out
> of space for the root partition where all the glusterd related
> configurations are persisted, the transaction couldn't be written and
> hence the new (replaced) brick's information wasn't persisted in the
> configuration. The workaround for this is to copy the content of
> /var/lib/glusterd/vols/gfs-tst/ from one of the nodes in the trusted
> storage pool to the node where glusterd service fails to come up and post
> that restarting the glusterd service should be able to make peer status
> reporting all nodes healthy and connected.
> [snip]
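The statfs() failure on /media/disk4/brick4 in the log above suggests that a data disk is not mounted (or the brick directory was never recreated), which makes glusterd's brick resolution fail at startup. Before restarting glusterd, it may be worth confirming every brick path. A minimal sketch, assuming the /media/diskN mount layout from this thread; the helper name is my own:

```shell
# check_brick: succeed only if the brick's parent directory is a real
# mount point (different device than its own parent) and the brick
# directory itself exists. Brick paths below are from the thread.
check_brick() {
    brick=$1
    mnt=$(dirname "$brick")   # e.g. /media/disk3 for /media/disk3/brick3
    if [ "$(stat -c %d "$mnt" 2>/dev/null)" = "$(stat -c %d "$mnt/.." 2>/dev/null)" ]; then
        echo "NOT MOUNTED: $mnt"
        return 1
    fi
    if [ ! -d "$brick" ]; then
        echo "MISSING BRICK DIR: $brick"
        return 1
    fi
    echo "OK: $brick"
}

failed=0
for b in /media/disk1/brick1 /media/disk2/brick2 \
         /media/disk3/brick3 /media/disk4/brick4; do
    check_brick "$b" || failed=1
done
# A wrapper could refuse to start glusterd while $failed is 1.
```

The device comparison avoids writing brick data onto the OS disk by mistake when a mount is missing, which is exactly how the root partition filled up in this incident.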