Amudhan P
2019-Jan-18 11:22 UTC
[Gluster-users] glusterfs 4.1.6 error in starting glusterd service
Hi Atin, I have sent files to your email directly in other mail. hope you have received. regards Amudhan On Thu, Jan 17, 2019 at 3:43 PM Atin Mukherjee <amukherj at redhat.com> wrote:> Can you please run 'glusterd -LDEBUG' and share back the glusterd.log? > Instead of doing too many back and forth I suggest you to share the content > of /var/lib/glusterd from all the nodes. Also do mention which particular > node the glusterd service is unable to come up. > > On Thu, Jan 17, 2019 at 11:34 AM Amudhan P <amudhan83 at gmail.com> wrote: > >> I have created the folder in the path as said but still, service failed >> to start below is the error msg in glusterd.log >> >> [2019-01-16 14:50:14.555742] I [MSGID: 100030] [glusterfsd.c:2741:main] >> 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd >> version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid) >> [2019-01-16 14:50:14.559835] I [MSGID: 106478] [glusterd.c:1423:init] >> 0-management: Maximum allowed open file descriptors set to 65536 >> [2019-01-16 14:50:14.559894] I [MSGID: 106479] [glusterd.c:1481:init] >> 0-management: Using /var/lib/glusterd as working directory >> [2019-01-16 14:50:14.559912] I [MSGID: 106479] [glusterd.c:1486:init] >> 0-management: Using /var/run/gluster as pid file working directory >> [2019-01-16 14:50:14.563834] W [MSGID: 103071] >> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event >> channel creation failed [No such device] >> [2019-01-16 14:50:14.563867] W [MSGID: 103055] [rdma.c:4938:init] >> 0-rdma.management: Failed to initialize IB Device >> [2019-01-16 14:50:14.563882] W [rpc-transport.c:351:rpc_transport_load] >> 0-rpc-transport: 'rdma' initialization failed >> [2019-01-16 14:50:14.563957] W [rpcsvc.c:1781:rpcsvc_create_listener] >> 0-rpc-service: cannot create listener, initing the transport failed >> [2019-01-16 14:50:14.563974] E [MSGID: 106244] [glusterd.c:1764:init] >> 0-management: creation of 1 listeners failed, continuing with succeeded >> transport >> [2019-01-16 14:50:15.565868] I [MSGID: 106513] >> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved >> op-version: 40100 >> [2019-01-16 14:50:15.642532] I [MSGID: 106544] >> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: >> d6bf51a7-c296-492f-8dac-e81efa9dd22d >> [2019-01-16 14:50:15.675333] I [MSGID: 106498] >> [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management: >> connect returned 0 >> [2019-01-16 14:50:15.675421] W [MSGID: 106061] >> [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd: >> Failed to get tcp-user-timeout >> [2019-01-16 14:50:15.675451] I [rpc-clnt.c:1059:rpc_clnt_connection_init] >> 0-management: setting frame-timeout to 600 >> *[2019-01-16 14:50:15.676912] E [MSGID: 106187] >> [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve >> brick failed in restore* >> *[2019-01-16 14:50:15.676956] E [MSGID: 101019] >> [xlator.c:720:xlator_init] 0-management: Initialization of volume >> 'management' failed, review your volfile again* >> [2019-01-16 14:50:15.676973] E [MSGID: 101066] >> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator >> failed >> [2019-01-16 14:50:15.676986] E [MSGID: 101176] >> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed >> [2019-01-16 14:50:15.677479] W [glusterfsd.c:1514:cleanup_and_exit] >> (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] >> -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) 
[0x409e41] >> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: >> received signum (-1), shutting down >> >> >> On Thu, Jan 17, 2019 at 8:06 AM Atin Mukherjee <amukherj at redhat.com> >> wrote: >> >>> If gluster volume info/status shows the brick to be /media/disk4/brick4 >>> then you'd need to mount the same path and hence you'd need to create the >>> brick4 directory explicitly. I fail to understand the rationale how only >>> /media/disk4 can be used as the mount path for the brick. >>> >>> On Wed, Jan 16, 2019 at 5:24 PM Amudhan P <amudhan83 at gmail.com> wrote: >>> >>>> Yes, I did mount bricks but the folder 'brick4' was still not created >>>> inside the brick. >>>> Do I need to create this folder because when I run replace-brick it >>>> will create folder inside the brick. I have seen this behavior before when >>>> running replace-brick or heal begins. >>>> >>>> On Wed, Jan 16, 2019 at 5:05 PM Atin Mukherjee <amukherj at redhat.com> >>>> wrote: >>>> >>>>> >>>>> >>>>> On Wed, Jan 16, 2019 at 5:02 PM Amudhan P <amudhan83 at gmail.com> wrote: >>>>> >>>>>> Atin, >>>>>> I have copied the content of 'gfs-tst' from vol folder in another >>>>>> node. when starting service again fails with error msg in glusterd.log file. >>>>>> >>>>>> [2019-01-15 20:16:59.513023] I [MSGID: 100030] >>>>>> [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running >>>>>> /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p >>>>>> /var/run/glusterd.pid) >>>>>> [2019-01-15 20:16:59.517164] I [MSGID: 106478] [glusterd.c:1423:init] >>>>>> 0-management: Maximum allowed open file descriptors set to 65536 >>>>>> [2019-01-15 20:16:59.517264] I [MSGID: 106479] [glusterd.c:1481:init] >>>>>> 0-management: Using /var/lib/glusterd as working directory >>>>>> [2019-01-15 20:16:59.517283] I [MSGID: 106479] [glusterd.c:1486:init] >>>>>> 0-management: Using /var/run/gluster as pid file working directory >>>>>> [2019-01-15 20:16:59.521508] W [MSGID: 103071] >>>>>> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event >>>>>> channel creation failed [No such device] >>>>>> [2019-01-15 20:16:59.521544] W [MSGID: 103055] [rdma.c:4938:init] >>>>>> 0-rdma.management: Failed to initialize IB Device >>>>>> [2019-01-15 20:16:59.521562] W >>>>>> [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' >>>>>> initialization failed >>>>>> [2019-01-15 20:16:59.521629] W [rpcsvc.c:1781:rpcsvc_create_listener] >>>>>> 0-rpc-service: cannot create listener, initing the transport failed >>>>>> [2019-01-15 20:16:59.521648] E [MSGID: 106244] [glusterd.c:1764:init] >>>>>> 0-management: creation of 1 listeners failed, continuing with succeeded >>>>>> transport >>>>>> [2019-01-15 20:17:00.529390] I [MSGID: 106513] >>>>>> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved >>>>>> op-version: 40100 >>>>>> [2019-01-15 20:17:00.608354] I [MSGID: 106544] >>>>>> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: >>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d >>>>>> [2019-01-15 20:17:00.650911] W [MSGID: 106425] >>>>>> [glusterd-store.c:2643:glusterd_store_retrieve_bricks] 0-management: failed >>>>>> to get statfs() call on brick /media/disk4/brick4 [No such file or >>>>>> directory] >>>>>> >>>>> >>>>> This means that underlying brick /media/disk4/brick4 doesn't exist. >>>>> You already mentioned that you had replaced the faulty disk, but have you >>>>> not mounted it yet? 
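A minimal sketch of how the missing brick path could be verified before restarting glusterd, assuming the mount point /media/disk4 and brick path /media/disk4/brick4 discussed in this thread (plain Linux commands; the grep of the brick store file is an assumption about where glusterd records the brick path):

    # Is the replaced disk actually mounted where the volume expects it?
    df -h /media/disk4
    grep /media/disk4 /proc/mounts

    # If the filesystem is mounted but the brick directory is missing,
    # create it explicitly, as suggested above
    mkdir -p /media/disk4/brick4

    # The brick definition under /var/lib/glusterd should point at the same path
    grep '^path=' /var/lib/glusterd/vols/gfs-tst/bricks/*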
>>>>> >>>>> >>>>>> [2019-01-15 20:17:00.691240] I [MSGID: 106498] >>>>>> [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management: >>>>>> connect returned 0 >>>>>> [2019-01-15 20:17:00.691307] W [MSGID: 106061] >>>>>> [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd: >>>>>> Failed to get tcp-user-timeout >>>>>> [2019-01-15 20:17:00.691331] I >>>>>> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting >>>>>> frame-timeout to 600 >>>>>> [2019-01-15 20:17:00.692547] E [MSGID: 106187] >>>>>> [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve >>>>>> brick failed in restore >>>>>> [2019-01-15 20:17:00.692582] E [MSGID: 101019] >>>>>> [xlator.c:720:xlator_init] 0-management: Initialization of volume >>>>>> 'management' failed, review your volfile again >>>>>> [2019-01-15 20:17:00.692597] E [MSGID: 101066] >>>>>> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator >>>>>> failed >>>>>> [2019-01-15 20:17:00.692607] E [MSGID: 101176] >>>>>> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed >>>>>> [2019-01-15 20:17:00.693004] W [glusterfsd.c:1514:cleanup_and_exit] >>>>>> (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] >>>>>> -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] >>>>>> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: >>>>>> received signum (-1), shutting down >>>>>> >>>>>> >>>>>> On Wed, Jan 16, 2019 at 4:34 PM Atin Mukherjee <amukherj at redhat.com> >>>>>> wrote: >>>>>> >>>>>>> This is a case of partial write of a transaction and as the host ran >>>>>>> out of space for the root partition where all the glusterd related >>>>>>> configurations are persisted, the transaction couldn't be written and hence >>>>>>> the new (replaced) brick's information wasn't persisted in the >>>>>>> configuration. The workaround for this is to copy the content of >>>>>>> /var/lib/glusterd/vols/gfs-tst/ from one of the nodes in the trusted >>>>>>> storage pool to the node where glusterd service fails to come up and post >>>>>>> that restarting the glusterd service should be able to make peer status >>>>>>> reporting all nodes healthy and connected. >>>>>>> >>>>>>> On Wed, Jan 16, 2019 at 3:49 PM Amudhan P <amudhan83 at gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> In short, when I started glusterd service I am getting following >>>>>>>> error msg in the glusterd.log file in one server. >>>>>>>> what needs to be done? 
>>>>>>>> >>>>>>>> error logged in glusterd.log >>>>>>>> >>>>>>>> [2019-01-15 17:50:13.956053] I [MSGID: 100030] >>>>>>>> [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running >>>>>>>> /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p >>>>>>>> /var/run/glusterd.pid) >>>>>>>> [2019-01-15 17:50:13.960131] I [MSGID: 106478] >>>>>>>> [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors >>>>>>>> set to 65536 >>>>>>>> [2019-01-15 17:50:13.960193] I [MSGID: 106479] >>>>>>>> [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working >>>>>>>> directory >>>>>>>> [2019-01-15 17:50:13.960212] I [MSGID: 106479] >>>>>>>> [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file >>>>>>>> working directory >>>>>>>> [2019-01-15 17:50:13.964437] W [MSGID: 103071] >>>>>>>> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event >>>>>>>> channel creation failed [No such device] >>>>>>>> [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init] >>>>>>>> 0-rdma.management: Failed to initialize IB Device >>>>>>>> [2019-01-15 17:50:13.964491] W >>>>>>>> [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' >>>>>>>> initialization failed >>>>>>>> [2019-01-15 17:50:13.964560] W >>>>>>>> [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create >>>>>>>> listener, initing the transport failed >>>>>>>> [2019-01-15 17:50:13.964579] E [MSGID: 106244] >>>>>>>> [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, >>>>>>>> continuing with succeeded transport >>>>>>>> [2019-01-15 17:50:14.967681] I [MSGID: 106513] >>>>>>>> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved >>>>>>>> op-version: 40100 >>>>>>>> [2019-01-15 17:50:14.973931] I [MSGID: 106544] >>>>>>>> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: >>>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d >>>>>>>> [2019-01-15 17:50:15.046620] E [MSGID: 101032] >>>>>>>> [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to >>>>>>>> /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such >>>>>>>> file or directory] >>>>>>>> [2019-01-15 17:50:15.046685] E [MSGID: 106201] >>>>>>>> [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: >>>>>>>> Unable to restore volume: gfs-tst >>>>>>>> [2019-01-15 17:50:15.046718] E [MSGID: 101019] >>>>>>>> [xlator.c:720:xlator_init] 0-management: Initialization of volume >>>>>>>> 'management' failed, review your volfile again >>>>>>>> [2019-01-15 17:50:15.046732] E [MSGID: 101066] >>>>>>>> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator >>>>>>>> failed >>>>>>>> [2019-01-15 17:50:15.046741] E [MSGID: 101176] >>>>>>>> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed >>>>>>>> [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit] >>>>>>>> (-->/usr/local/sbin/glusterd(glusterfs_volumes >>>>>>>> >>>>>>>> >>>>>>>> In long, I am trying to simulate a situation where the volume stopped >>>>>>>> abnormally and >>>>>>>> the entire cluster restarted with some missing disks. >>>>>>>> >>>>>>>> My test cluster is set up with 3 nodes and each has four disks, I >>>>>>>> have set up a volume with disperse 4+2. >>>>>>>> In Node-3, 2 disks have failed; to replace them I have shut down all systems >>>>>>>> >>>>>>>> below are the steps done. >>>>>>>> >>>>>>>> 1. umount from client machine >>>>>>>> 2.
shut down all systems by running the `shutdown -h now` command ( >>>>>>>> without stopping the volume or the service) >>>>>>>> 3. replace faulty disk in Node-3 >>>>>>>> 4. powered ON all systems >>>>>>>> 5. format replaced drives, and mount all drives >>>>>>>> 6. start glusterd service in all nodes (success) >>>>>>>> 7. Now running `volume status` command from node-3 >>>>>>>> output : [2019-01-15 16:52:17.718422] : v status : FAILED : >>>>>>>> Staging failed on 0083ec0c-40bf-472a-a128-458924e56c96. Please check log >>>>>>>> file for details. >>>>>>>> 8. running `volume start gfs-tst` command from node-3 >>>>>>>> output : [2019-01-15 16:53:19.410252] : v start gfs-tst : FAILED : >>>>>>>> Volume gfs-tst already started >>>>>>>> >>>>>>>> 9. running `gluster v status` in another node, showing all bricks >>>>>>>> available but 'self-heal daemon' not running >>>>>>>> @gfstst-node2:~$ sudo gluster v status >>>>>>>> Status of volume: gfs-tst >>>>>>>> Gluster process                             TCP Port  RDMA Port >>>>>>>> Online  Pid >>>>>>>> >>>>>>>> ------------------------------------------------------------------------------ >>>>>>>> Brick IP.2:/media/disk1/brick1              49152     0          Y >>>>>>>> 1517 >>>>>>>> Brick IP.4:/media/disk1/brick1              49152     0          Y >>>>>>>> 1668 >>>>>>>> Brick IP.2:/media/disk2/brick2              49153     0          Y >>>>>>>> 1522 >>>>>>>> Brick IP.4:/media/disk2/brick2              49153     0          Y >>>>>>>> 1678 >>>>>>>> Brick IP.2:/media/disk3/brick3              49154     0          Y >>>>>>>> 1527 >>>>>>>> Brick IP.4:/media/disk3/brick3              49154     0          Y >>>>>>>> 1677 >>>>>>>> Brick IP.2:/media/disk4/brick4              49155     0          Y >>>>>>>> 1541 >>>>>>>> Brick IP.4:/media/disk4/brick4              49155     0          Y >>>>>>>> 1683 >>>>>>>> Self-heal Daemon on localhost               N/A       N/A        Y >>>>>>>> 2662 >>>>>>>> Self-heal Daemon on IP.4                    N/A       N/A        Y >>>>>>>> 2786 >>>>>>>> >>>>>>>> 10. the above output said 'volume already started', so, running the >>>>>>>> `reset-brick` command >>>>>>>> v reset-brick gfs-tst IP.3:/media/disk3/brick3 >>>>>>>> IP.3:/media/disk3/brick3 commit force >>>>>>>> >>>>>>>> output : [2019-01-15 16:57:37.916942] : v reset-brick gfs-tst >>>>>>>> IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force : FAILED : >>>>>>>> /media/disk3/brick3 is already part of a volume >>>>>>>> >>>>>>>> 11. the reset-brick command was not working, so, tried stopping the volume >>>>>>>> and starting with the force command >>>>>>>> output : [2019-01-15 17:01:04.570794] : v start gfs-tst force : >>>>>>>> FAILED : Pre-validation failed on localhost. Please check log file for >>>>>>>> details >>>>>>>> >>>>>>>> 12. now stopped the service in all nodes and tried starting again. >>>>>>>> except node-3, other nodes' service started successfully without any issues. >>>>>>>> >>>>>>>> in node-3 receiving the following message. >>>>>>>> >>>>>>>> sudo service glusterd start >>>>>>>> * Starting glusterd service glusterd >>>>>>>> >>>>>>>> [fail] >>>>>>>> /usr/local/sbin/glusterd: option requires an argument -- 'f' >>>>>>>> Try `glusterd --help' or `glusterd --usage' for more information. >>>>>>>> >>>>>>>> 13. checking the glusterd log file found that the OS drive was running out >>>>>>>> of space >>>>>>>> output : [2019-01-15 16:51:37.210792] W [MSGID: 101012] >>>>>>>> [store.c:372:gf_store_save_value] 0-management: fflush failed. [No space >>>>>>>> left on device] >>>>>>>> [2019-01-15 16:51:37.210874] E [MSGID: 106190] >>>>>>>> [glusterd-store.c:1058:glusterd_volume_exclude_options_write] 0-management: >>>>>>>> Unable to write volume values for gfs-tst >>>>>>>> >>>>>>>> 14. cleared some space in the OS drive but still, the service is not >>>>>>>> running.
below is the error logged in glusterd.log >>>>>>>> >>>>>>>> [2019-01-15 17:50:13.956053] I [MSGID: 100030] >>>>>>>> [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running >>>>>>>> /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p >>>>>>>> /var/run/glusterd.pid) >>>>>>>> [2019-01-15 17:50:13.960131] I [MSGID: 106478] >>>>>>>> [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors >>>>>>>> set to 65536 >>>>>>>> [2019-01-15 17:50:13.960193] I [MSGID: 106479] >>>>>>>> [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working >>>>>>>> directory >>>>>>>> [2019-01-15 17:50:13.960212] I [MSGID: 106479] >>>>>>>> [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file >>>>>>>> working directory >>>>>>>> [2019-01-15 17:50:13.964437] W [MSGID: 103071] >>>>>>>> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event >>>>>>>> channel creation failed [No such device] >>>>>>>> [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init] >>>>>>>> 0-rdma.management: Failed to initialize IB Device >>>>>>>> [2019-01-15 17:50:13.964491] W >>>>>>>> [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' >>>>>>>> initialization failed >>>>>>>> [2019-01-15 17:50:13.964560] W >>>>>>>> [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create >>>>>>>> listener, initing the transport failed >>>>>>>> [2019-01-15 17:50:13.964579] E [MSGID: 106244] >>>>>>>> [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, >>>>>>>> continuing with succeeded transport >>>>>>>> [2019-01-15 17:50:14.967681] I [MSGID: 106513] >>>>>>>> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved >>>>>>>> op-version: 40100 >>>>>>>> [2019-01-15 17:50:14.973931] I [MSGID: 106544] >>>>>>>> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: >>>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d >>>>>>>> [2019-01-15 17:50:15.046620] E [MSGID: 101032] >>>>>>>> [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to >>>>>>>> /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such >>>>>>>> file or directory] >>>>>>>> [2019-01-15 17:50:15.046685] E [MSGID: 106201] >>>>>>>> [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: >>>>>>>> Unable to restore volume: gfs-tst >>>>>>>> [2019-01-15 17:50:15.046718] E [MSGID: 101019] >>>>>>>> [xlator.c:720:xlator_init] 0-management: Initialization of volume >>>>>>>> 'management' failed, review your volfile again >>>>>>>> [2019-01-15 17:50:15.046732] E [MSGID: 101066] >>>>>>>> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator >>>>>>>> failed >>>>>>>> [2019-01-15 17:50:15.046741] E [MSGID: 101176] >>>>>>>> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed >>>>>>>> [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit] >>>>>>>> (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] >>>>>>>> -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] >>>>>>>> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: >>>>>>>> received signum (-1), shutting down >>>>>>>> >>>>>>>> >>>>>>>> 15. 
In other nodes, running `volume status` still shows node-3's bricks >>>>>>>> as live >>>>>>>> but 'peer status' shows node-3 as disconnected >>>>>>>> >>>>>>>> @gfstst-node2:~$ sudo gluster v status >>>>>>>> Status of volume: gfs-tst >>>>>>>> Gluster process                             TCP Port  RDMA Port >>>>>>>> Online  Pid >>>>>>>> >>>>>>>> ------------------------------------------------------------------------------ >>>>>>>> Brick IP.2:/media/disk1/brick1              49152     0          Y >>>>>>>> 1517 >>>>>>>> Brick IP.4:/media/disk1/brick1              49152     0          Y >>>>>>>> 1668 >>>>>>>> Brick IP.2:/media/disk2/brick2              49153     0          Y >>>>>>>> 1522 >>>>>>>> Brick IP.4:/media/disk2/brick2              49153     0          Y >>>>>>>> 1678 >>>>>>>> Brick IP.2:/media/disk3/brick3              49154     0          Y >>>>>>>> 1527 >>>>>>>> Brick IP.4:/media/disk3/brick3              49154     0          Y >>>>>>>> 1677 >>>>>>>> Brick IP.2:/media/disk4/brick4              49155     0          Y >>>>>>>> 1541 >>>>>>>> Brick IP.4:/media/disk4/brick4              49155     0          Y >>>>>>>> 1683 >>>>>>>> Self-heal Daemon on localhost               N/A       N/A        Y >>>>>>>> 2662 >>>>>>>> Self-heal Daemon on IP.4                    N/A       N/A        Y >>>>>>>> 2786 >>>>>>>> >>>>>>>> Task Status of Volume gfs-tst >>>>>>>> >>>>>>>> ------------------------------------------------------------------------------ >>>>>>>> There are no active volume tasks >>>>>>>> >>>>>>>> >>>>>>>> root at gfstst-node2:~$ sudo gluster pool list >>>>>>>> UUID Hostname State >>>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d IP.3 Disconnected >>>>>>>> c1cbb58e-3ceb-4637-9ba3-3d28ef20b143 IP.4 Connected >>>>>>>> 0083ec0c-40bf-472a-a128-458924e56c96 localhost Connected >>>>>>>> >>>>>>>> root at gfstst-node2:~$ sudo gluster peer status >>>>>>>> Number of Peers: 2 >>>>>>>> >>>>>>>> Hostname: IP.3 >>>>>>>> Uuid: d6bf51a7-c296-492f-8dac-e81efa9dd22d >>>>>>>> State: Peer in Cluster (Disconnected) >>>>>>>> >>>>>>>> Hostname: IP.4 >>>>>>>> Uuid: c1cbb58e-3ceb-4637-9ba3-3d28ef20b143 >>>>>>>> State: Peer in Cluster (Connected) >>>>>>>> >>>>>>>> >>>>>>>> regards >>>>>>>> Amudhan >>>>>>>> _______________________________________________ >>>>>>>> Gluster-users mailing list >>>>>>>> Gluster-users at gluster.org >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
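The workaround described earlier in the thread (copy the 'gfs-tst' volume configuration from a healthy peer, then restart glusterd) might look roughly like the sketch below on the failing node. IP.2 stands in for any healthy peer, the backup location is arbitrary, and the service/debug commands mirror the ones already used in this thread; treat it as an outline under those assumptions, not an exact procedure:

    # Move the partially written volume configuration out of the way (kept as a backup)
    sudo mv /var/lib/glusterd/vols/gfs-tst /root/gfs-tst.vol-config.bak

    # Copy the intact configuration from a healthy peer (IP.2 is a placeholder)
    sudo scp -r IP.2:/var/lib/glusterd/vols/gfs-tst /var/lib/glusterd/vols/

    # Restart glusterd and confirm the peer rejoins and the volume is restored
    sudo service glusterd restart
    sudo gluster peer status
    sudo gluster volume status gfs-tst

    # If startup still fails, run glusterd at debug log level as requested above
    # and share the resulting glusterd.log
    sudo /usr/local/sbin/glusterd -LDEBUG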
Atin Mukherjee
2019-Jan-19 02:25 UTC
[Gluster-users] glusterfs 4.1.6 error in starting glusterd service
I have received but haven't got a chance to look at them. I can only come back on this sometime early next week based on my schedule.

On Fri, 18 Jan 2019 at 16:52, Amudhan P <amudhan83 at gmail.com> wrote:
> Hi Atin,
>
> I have sent files to your email directly in other mail. hope you have
> received.
>
> regards
> Amudhan
>
> [...]

---
Atin (atinm)