Amudhan P
2019-Jan-16 11:32 UTC
[Gluster-users] glusterfs 4.1.6 error in starting glusterd service
Atin,

I have copied the content of 'gfs-tst' from the vol folder on another node. When starting the service again, it fails with the following error messages in the glusterd.log file.

[2019-01-15 20:16:59.513023] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
[2019-01-15 20:16:59.517164] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
[2019-01-15 20:16:59.517264] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
[2019-01-15 20:16:59.517283] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
[2019-01-15 20:16:59.521508] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2019-01-15 20:16:59.521544] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
[2019-01-15 20:16:59.521562] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2019-01-15 20:16:59.521629] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2019-01-15 20:16:59.521648] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2019-01-15 20:17:00.529390] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
[2019-01-15 20:17:00.608354] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
[2019-01-15 20:17:00.650911] W [MSGID: 106425] [glusterd-store.c:2643:glusterd_store_retrieve_bricks] 0-management: failed to get statfs() call on brick /media/disk4/brick4 [No such file or directory]
[2019-01-15 20:17:00.691240] I [MSGID: 106498] [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2019-01-15 20:17:00.691307] W [MSGID: 106061] [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2019-01-15 20:17:00.691331] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2019-01-15 20:17:00.692547] E [MSGID: 106187] [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2019-01-15 20:17:00.692582] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2019-01-15 20:17:00.692597] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
[2019-01-15 20:17:00.692607] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
[2019-01-15 20:17:00.693004] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down

On Wed, Jan 16, 2019 at 4:34 PM Atin Mukherjee <amukherj at redhat.com> wrote:

> This is a case of partial write of a transaction: as the host ran out
> of space for the root partition, where all the glusterd-related
> configurations are persisted, the transaction couldn't be written and hence
> the new (replaced) brick's information wasn't persisted in
> the configuration. The workaround for this is to copy the content of
> /var/lib/glusterd/vols/gfs-tst/ from one of the nodes in the trusted
> storage pool to the node where the glusterd service fails to come up; after
> that, restarting the glusterd service should make peer status report all
> nodes as healthy and connected.
>
> On Wed, Jan 16, 2019 at 3:49 PM Amudhan P <amudhan83 at gmail.com> wrote:
>
>> Hi,
>>
>> In short, when I start the glusterd service I get the following error
>> messages in the glusterd.log file on one server.
>> What needs to be done?
>>
>> Error logged in glusterd.log:
>>
>> [2019-01-15 17:50:13.956053] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>> [2019-01-15 17:50:13.960131] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
>> [2019-01-15 17:50:13.960193] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
>> [2019-01-15 17:50:13.960212] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
>> [2019-01-15 17:50:13.964437] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>> [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
>> [2019-01-15 17:50:13.964491] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>> [2019-01-15 17:50:13.964560] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
>> [2019-01-15 17:50:13.964579] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
>> [2019-01-15 17:50:14.967681] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
>> [2019-01-15 17:50:14.973931] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>> [2019-01-15 17:50:15.046620] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such file or directory]
>> [2019-01-15 17:50:15.046685] E [MSGID: 106201] [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: gfs-tst
>> [2019-01-15 17:50:15.046718] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
>> [2019-01-15 17:50:15.046732] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
>> [2019-01-15 17:50:15.046741] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>> [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes
>>
>> In long, I am trying to simulate a situation where the volume stopped
>> abnormally and the entire cluster was restarted with some missing disks.
>>
>> My test cluster is set up with 3 nodes and each has four disks; I have
>> set up a volume with disperse 4+2.
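For reference, a 4+2 disperse volume across three nodes with four disks each would typically be created with something along these lines. This is only a sketch, not the command actually used in this thread: the hostnames (IP.2, IP.3, IP.4) and brick paths are taken from the status output shown later, and with only three servers each 6-brick disperse set necessarily places two bricks on the same server, so gluster warns about it and the command has to be confirmed with force.

  gluster volume create gfs-tst disperse 6 redundancy 2 \
      IP.2:/media/disk1/brick1 IP.3:/media/disk1/brick1 IP.4:/media/disk1/brick1 \
      IP.2:/media/disk2/brick2 IP.3:/media/disk2/brick2 IP.4:/media/disk2/brick2 \
      IP.2:/media/disk3/brick3 IP.3:/media/disk3/brick3 IP.4:/media/disk3/brick3 \
      IP.2:/media/disk4/brick4 IP.3:/media/disk4/brick4 IP.4:/media/disk4/brick4 \
      force
  gluster volume start gfs-tst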
>> In Node-3, 2 disks have failed; to replace them I shut down all systems.
>>
>> Below are the steps done.
>>
>> 1. umount from the client machine
>> 2. shut down all systems by running the `shutdown -h now` command (without
>>    stopping the volume or stopping the service)
>> 3. replace the faulty disks in Node-3
>> 4. powered ON all systems
>> 5. format the replaced drives, and mount all drives
>> 6. start the glusterd service on all nodes (success)
>> 7. Now running the `volume status` command from node-3
>>    output: [2019-01-15 16:52:17.718422] : v status : FAILED : Staging failed on 0083ec0c-40bf-472a-a128-458924e56c96. Please check log file for details.
>> 8. running the `volume start gfs-tst` command from node-3
>>    output: [2019-01-15 16:53:19.410252] : v start gfs-tst : FAILED : Volume gfs-tst already started
>>
>> 9. running `gluster v status` on another node shows all bricks available,
>>    but the 'self-heal daemon' is not running
>>
>> @gfstst-node2:~$ sudo gluster v status
>> Status of volume: gfs-tst
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick IP.2:/media/disk1/brick1              49152     0          Y       1517
>> Brick IP.4:/media/disk1/brick1              49152     0          Y       1668
>> Brick IP.2:/media/disk2/brick2              49153     0          Y       1522
>> Brick IP.4:/media/disk2/brick2              49153     0          Y       1678
>> Brick IP.2:/media/disk3/brick3              49154     0          Y       1527
>> Brick IP.4:/media/disk3/brick3              49154     0          Y       1677
>> Brick IP.2:/media/disk4/brick4              49155     0          Y       1541
>> Brick IP.4:/media/disk4/brick4              49155     0          Y       1683
>> Self-heal Daemon on localhost               N/A       N/A        Y       2662
>> Self-heal Daemon on IP.4                    N/A       N/A        Y       2786
>>
>> 10. in the above output the volume is 'already started', so running the
>>     `reset-brick` command:
>>     v reset-brick gfs-tst IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force
>>
>>     output: [2019-01-15 16:57:37.916942] : v reset-brick gfs-tst IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force : FAILED : /media/disk3/brick3 is already part of a volume
>>
>> 11. the reset-brick command was not working, so I tried stopping the volume
>>     and starting it with the force command
>>     output: [2019-01-15 17:01:04.570794] : v start gfs-tst force : FAILED : Pre-validation failed on localhost. Please check log file for details
>>
>> 12. now stopped the service on all nodes and tried starting again. Except for
>>     node-3, the other nodes' service started successfully without any issues.
>>
>>     on node-3 I receive the following message:
>>
>>     sudo service glusterd start
>>      * Starting glusterd service glusterd                              [fail]
>>     /usr/local/sbin/glusterd: option requires an argument -- 'f'
>>     Try `glusterd --help' or `glusterd --usage' for more information.
>>
>> 13. checking the glusterd log file I found that the OS drive was running out
>>     of space
>>     output: [2019-01-15 16:51:37.210792] W [MSGID: 101012] [store.c:372:gf_store_save_value] 0-management: fflush failed. [No space left on device]
>>     [2019-01-15 16:51:37.210874] E [MSGID: 106190] [glusterd-store.c:1058:glusterd_volume_exclude_options_write] 0-management: Unable to write volume values for gfs-tst
>>
>> 14. cleared some space on the OS drive, but the service is still not running.
>> Below is the error logged in glusterd.log:
>>
>> [2019-01-15 17:50:13.956053] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>> [2019-01-15 17:50:13.960131] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
>> [2019-01-15 17:50:13.960193] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
>> [2019-01-15 17:50:13.960212] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
>> [2019-01-15 17:50:13.964437] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>> [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
>> [2019-01-15 17:50:13.964491] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>> [2019-01-15 17:50:13.964560] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
>> [2019-01-15 17:50:13.964579] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
>> [2019-01-15 17:50:14.967681] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
>> [2019-01-15 17:50:14.973931] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>> [2019-01-15 17:50:15.046620] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such file or directory]
>> [2019-01-15 17:50:15.046685] E [MSGID: 106201] [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: gfs-tst
>> [2019-01-15 17:50:15.046718] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
>> [2019-01-15 17:50:15.046732] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
>> [2019-01-15 17:50:15.046741] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>> [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down
>>
>> 15. On the other nodes, running `volume status` still shows the node-3 bricks
>>     as live, but 'peer status' shows node-3 as disconnected.
>>
>> @gfstst-node2:~$ sudo gluster v status
>> Status of volume: gfs-tst
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick IP.2:/media/disk1/brick1              49152     0          Y       1517
>> Brick IP.4:/media/disk1/brick1              49152     0          Y       1668
>> Brick IP.2:/media/disk2/brick2              49153     0          Y       1522
>> Brick IP.4:/media/disk2/brick2              49153     0          Y       1678
>> Brick IP.2:/media/disk3/brick3              49154     0          Y       1527
>> Brick IP.4:/media/disk3/brick3              49154     0          Y       1677
>> Brick IP.2:/media/disk4/brick4              49155     0          Y       1541
>> Brick IP.4:/media/disk4/brick4              49155     0          Y       1683
>> Self-heal Daemon on localhost               N/A       N/A        Y       2662
>> Self-heal Daemon on IP.4                    N/A       N/A        Y       2786
>>
>> Task Status of Volume gfs-tst
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> root@gfstst-node2:~$ sudo gluster pool list
>> UUID                                    Hostname        State
>> d6bf51a7-c296-492f-8dac-e81efa9dd22d    IP.3            Disconnected
>> c1cbb58e-3ceb-4637-9ba3-3d28ef20b143    IP.4            Connected
>> 0083ec0c-40bf-472a-a128-458924e56c96    localhost       Connected
>>
>> root@gfstst-node2:~$ sudo gluster peer status
>> Number of Peers: 2
>>
>> Hostname: IP.3
>> Uuid: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>> State: Peer in Cluster (Disconnected)
>>
>> Hostname: IP.4
>> Uuid: c1cbb58e-3ceb-4637-9ba3-3d28ef20b143
>> State: Peer in Cluster (Connected)
>>
>> regards
>> Amudhan
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
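For reference, the reset-brick flow that GlusterFS documents for swapping a brick in place is a two-step sequence: take the old brick offline first, then commit the replacement. The sketch below uses the hostname and path from step 10 above; it is not the exact sequence that was run in this thread, and whether it would have succeeded here depends on the state of glusterd on node-3, which was already damaged by the full root partition.

  # on node-3, after formatting and mounting the replacement disk
  gluster volume reset-brick gfs-tst IP.3:/media/disk3/brick3 start
  mkdir -p /media/disk3/brick3
  gluster volume reset-brick gfs-tst IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force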
Atin Mukherjee
2019-Jan-16 11:34 UTC
[Gluster-users] glusterfs 4.1.6 error in starting glusterd service
On Wed, Jan 16, 2019 at 5:02 PM Amudhan P <amudhan83 at gmail.com> wrote:

> Atin,
> I have copied the content of 'gfs-tst' from vol folder in another node.
> when starting service again fails with error msg in glusterd.log file.
>
> [2019-01-15 20:16:59.513023] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
> [2019-01-15 20:16:59.517164] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
> [2019-01-15 20:16:59.517264] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
> [2019-01-15 20:16:59.517283] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
> [2019-01-15 20:16:59.521508] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
> [2019-01-15 20:16:59.521544] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
> [2019-01-15 20:16:59.521562] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
> [2019-01-15 20:16:59.521629] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
> [2019-01-15 20:16:59.521648] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
> [2019-01-15 20:17:00.529390] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
> [2019-01-15 20:17:00.608354] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
> [2019-01-15 20:17:00.650911] W [MSGID: 106425] [glusterd-store.c:2643:glusterd_store_retrieve_bricks] 0-management: failed to get statfs() call on brick /media/disk4/brick4 [No such file or directory]

This means that underlying brick /media/disk4/brick4 doesn't exist.
You already mentioned that you had replaced the faulty disk, but have you not mounted it yet?

> [2019-01-15 20:17:00.691240] I [MSGID: 106498] [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
> [2019-01-15 20:17:00.691307] W [MSGID: 106061] [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
> [2019-01-15 20:17:00.691331] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
> [2019-01-15 20:17:00.692547] E [MSGID: 106187] [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
> [2019-01-15 20:17:00.692582] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
> [2019-01-15 20:17:00.692597] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
> [2019-01-15 20:17:00.692607] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
> [2019-01-15 20:17:00.693004] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down
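A quick way to confirm this on node-3 might be something like the following; the mount points are the ones used in this thread, and the fstab entries are assumed rather than shown anywhere in it:

  df -h /media/disk3 /media/disk4     # which filesystem actually backs each brick path?
  grep -E 'disk3|disk4' /etc/fstab    # assuming the bricks are mounted via fstab entries
  mount /media/disk4                  # remount the replacement disk if it is missing
  mkdir -p /media/disk4/brick4        # recreate the brick directory if it is absent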
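To close the loop on the workaround described earlier in the thread, here is a minimal sketch, assuming the replaced disks are mounted again on node-3, that IP.2 holds a healthy copy of the volume's configuration, and that root SSH between the nodes is available (none of which is shown explicitly in the thread):

  # on node-3
  service glusterd stop
  rsync -a IP.2:/var/lib/glusterd/vols/gfs-tst/ /var/lib/glusterd/vols/gfs-tst/
  service glusterd start
  gluster peer status            # all peers should now report Connected
  gluster volume status gfs-tst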