Amudhan P
2019-Jan-16 11:54 UTC
[Gluster-users] glusterfs 4.1.6 error in starting glusterd service
Yes, I did mount the bricks, but the folder 'brick4' was still not created inside the brick.
Do I need to create this folder? Because when I run replace-brick it creates the folder inside the brick. I have seen this behavior before when running replace-brick or when heal begins.

On Wed, Jan 16, 2019 at 5:05 PM Atin Mukherjee <amukherj at redhat.com> wrote:
>
> On Wed, Jan 16, 2019 at 5:02 PM Amudhan P <amudhan83 at gmail.com> wrote:
>
>> Atin,
>> I have copied the content of 'gfs-tst' from the vol folder on another node.
>> When starting the service again, it fails with this error msg in the glusterd.log file:
>>
>> [2019-01-15 20:16:59.513023] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>> [2019-01-15 20:16:59.517164] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
>> [2019-01-15 20:16:59.517264] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
>> [2019-01-15 20:16:59.517283] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
>> [2019-01-15 20:16:59.521508] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>> [2019-01-15 20:16:59.521544] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
>> [2019-01-15 20:16:59.521562] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>> [2019-01-15 20:16:59.521629] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
>> [2019-01-15 20:16:59.521648] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
>> [2019-01-15 20:17:00.529390] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
>> [2019-01-15 20:17:00.608354] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>> [2019-01-15 20:17:00.650911] W [MSGID: 106425] [glusterd-store.c:2643:glusterd_store_retrieve_bricks] 0-management: failed to get statfs() call on brick /media/disk4/brick4 [No such file or directory]
>
> This means that the underlying brick /media/disk4/brick4 doesn't exist. You already mentioned that you had replaced the faulty disk, but have you not mounted it yet?
>
>> [2019-01-15 20:17:00.691240] I [MSGID: 106498] [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
>> [2019-01-15 20:17:00.691307] W [MSGID: 106061] [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
>> [2019-01-15 20:17:00.691331] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
>> [2019-01-15 20:17:00.692547] E [MSGID: 106187] [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
>> [2019-01-15 20:17:00.692582] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
>> [2019-01-15 20:17:00.692597] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
>> [2019-01-15 20:17:00.692607] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>> [2019-01-15 20:17:00.693004] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down
>>
>> On Wed, Jan 16, 2019 at 4:34 PM Atin Mukherjee <amukherj at redhat.com> wrote:
>>
>>> This is a case of a partial write of a transaction: because the host ran out of space on the root partition, where all the glusterd-related configuration is persisted, the transaction couldn't be written, and hence the new (replaced) brick's information wasn't persisted in the configuration. The workaround is to copy the content of /var/lib/glusterd/vols/gfs-tst/ from one of the nodes in the trusted storage pool to the node where the glusterd service fails to come up; after that, restarting the glusterd service should make peer status report all nodes healthy and connected.
>>>
>>> On Wed, Jan 16, 2019 at 3:49 PM Amudhan P <amudhan83 at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> In short: when I start the glusterd service, I get the following error msg in the glusterd.log file on one server.
>>>> What needs to be done?
>>>>
>>>> Error logged in glusterd.log:
>>>>
>>>> [2019-01-15 17:50:13.956053] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>>>> [2019-01-15 17:50:13.960131] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
>>>> [2019-01-15 17:50:13.960193] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
>>>> [2019-01-15 17:50:13.960212] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
>>>> [2019-01-15 17:50:13.964437] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>>>> [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
>>>> [2019-01-15 17:50:13.964491] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>>>> [2019-01-15 17:50:13.964560] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
>>>> [2019-01-15 17:50:13.964579] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
>>>> [2019-01-15 17:50:14.967681] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
>>>> [2019-01-15 17:50:14.973931] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>> [2019-01-15 17:50:15.046620] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such file or directory]
>>>> [2019-01-15 17:50:15.046685] E [MSGID: 106201] [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: gfs-tst
>>>> [2019-01-15 17:50:15.046718] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
>>>> [2019-01-15 17:50:15.046732] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
>>>> [2019-01-15 17:50:15.046741] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>>> [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes
>>>>
>>>> In long: I am trying to simulate a situation where a volume stopped abnormally and the entire cluster was restarted with some missing disks.
>>>>
>>>> My test cluster is set up with 3 nodes, each with four disks, and I have set up a volume with disperse 4+2.
>>>> In Node-3, 2 disks failed; to replace them I shut down all systems.
>>>>
>>>> Below are the steps done:
>>>>
>>>> 1. umount from the client machine
>>>> 2. shut down all systems by running the `shutdown -h now` command (without stopping the volume or the service)
>>>> 3. replace the faulty disks in Node-3
>>>> 4. power ON all systems
>>>> 5. format the replaced drives and mount all drives
>>>> 6. start the glusterd service on all nodes (success)
>>>> 7. run the `volume status` command from node-3
>>>> output: [2019-01-15 16:52:17.718422] : v status : FAILED : Staging failed on 0083ec0c-40bf-472a-a128-458924e56c96. Please check log file for details.
>>>> 8. run the `volume start gfs-tst` command from node-3
>>>> output: [2019-01-15 16:53:19.410252] : v start gfs-tst : FAILED : Volume gfs-tst already started
>>>>
>>>> 9. run `gluster v status` on another node: it shows all bricks available, but the 'self-heal daemon' is not running
>>>> @gfstst-node2:~$ sudo gluster v status
>>>> Status of volume: gfs-tst
>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>> ------------------------------------------------------------------------------
>>>> Brick IP.2:/media/disk1/brick1              49152     0          Y       1517
>>>> Brick IP.4:/media/disk1/brick1              49152     0          Y       1668
>>>> Brick IP.2:/media/disk2/brick2              49153     0          Y       1522
>>>> Brick IP.4:/media/disk2/brick2              49153     0          Y       1678
>>>> Brick IP.2:/media/disk3/brick3              49154     0          Y       1527
>>>> Brick IP.4:/media/disk3/brick3              49154     0          Y       1677
>>>> Brick IP.2:/media/disk4/brick4              49155     0          Y       1541
>>>> Brick IP.4:/media/disk4/brick4              49155     0          Y       1683
>>>> Self-heal Daemon on localhost               N/A       N/A        Y       2662
>>>> Self-heal Daemon on IP.4                    N/A       N/A        Y       2786
>>>>
>>>> 10. since the above output says 'volume already started', I ran the `reset-brick` command:
>>>> v reset-brick gfs-tst IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force
>>>>
>>>> output: [2019-01-15 16:57:37.916942] : v reset-brick gfs-tst IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force : FAILED : /media/disk3/brick3 is already part of a volume
>>>>
>>>> 11. the reset-brick command was not working, so I tried stopping the volume and starting it with the force option
>>>> output: [2019-01-15 17:01:04.570794] : v start gfs-tst force : FAILED : Pre-validation failed on localhost. Please check log file for details
>>>>
>>>> 12. I then stopped the service on all nodes and tried starting it again. Except for node-3, the service on all other nodes started successfully without any issues.
>>>>
>>>> On node-3 I receive the following message:
>>>>
>>>> sudo service glusterd start
>>>> * Starting glusterd service glusterd                                   [fail]
>>>> /usr/local/sbin/glusterd: option requires an argument -- 'f'
>>>> Try `glusterd --help' or `glusterd --usage' for more information.
>>>>
>>>> 13. checking the glusterd log file, I found that the OS drive was running out of space
>>>> output: [2019-01-15 16:51:37.210792] W [MSGID: 101012] [store.c:372:gf_store_save_value] 0-management: fflush failed. [No space left on device]
>>>> [2019-01-15 16:51:37.210874] E [MSGID: 106190] [glusterd-store.c:1058:glusterd_volume_exclude_options_write] 0-management: Unable to write volume values for gfs-tst
>>>>
>>>> 14. I cleared some space on the OS drive, but the service is still not running.
>>>> Below is the error logged in glusterd.log:
>>>>
>>>> [2019-01-15 17:50:13.956053] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>>>> [2019-01-15 17:50:13.960131] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
>>>> [2019-01-15 17:50:13.960193] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
>>>> [2019-01-15 17:50:13.960212] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
>>>> [2019-01-15 17:50:13.964437] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>>>> [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
>>>> [2019-01-15 17:50:13.964491] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>>>> [2019-01-15 17:50:13.964560] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
>>>> [2019-01-15 17:50:13.964579] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
>>>> [2019-01-15 17:50:14.967681] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
>>>> [2019-01-15 17:50:14.973931] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>> [2019-01-15 17:50:15.046620] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such file or directory]
>>>> [2019-01-15 17:50:15.046685] E [MSGID: 106201] [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: gfs-tst
>>>> [2019-01-15 17:50:15.046718] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
>>>> [2019-01-15 17:50:15.046732] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
>>>> [2019-01-15 17:50:15.046741] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>>> [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down
>>>>
>>>> 15. On the other nodes, running `volume status` still shows the node-3 bricks as live, but `peer status` shows node-3 as disconnected.
>>>>
>>>> @gfstst-node2:~$ sudo gluster v status
>>>> Status of volume: gfs-tst
>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>> ------------------------------------------------------------------------------
>>>> Brick IP.2:/media/disk1/brick1              49152     0          Y       1517
>>>> Brick IP.4:/media/disk1/brick1              49152     0          Y       1668
>>>> Brick IP.2:/media/disk2/brick2              49153     0          Y       1522
>>>> Brick IP.4:/media/disk2/brick2              49153     0          Y       1678
>>>> Brick IP.2:/media/disk3/brick3              49154     0          Y       1527
>>>> Brick IP.4:/media/disk3/brick3              49154     0          Y       1677
>>>> Brick IP.2:/media/disk4/brick4              49155     0          Y       1541
>>>> Brick IP.4:/media/disk4/brick4              49155     0          Y       1683
>>>> Self-heal Daemon on localhost               N/A       N/A        Y       2662
>>>> Self-heal Daemon on IP.4                    N/A       N/A        Y       2786
>>>>
>>>> Task Status of Volume gfs-tst
>>>> ------------------------------------------------------------------------------
>>>> There are no active volume tasks
>>>>
>>>> root@gfstst-node2:~$ sudo gluster pool list
>>>> UUID                                    Hostname        State
>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d    IP.3            Disconnected
>>>> c1cbb58e-3ceb-4637-9ba3-3d28ef20b143    IP.4            Connected
>>>> 0083ec0c-40bf-472a-a128-458924e56c96    localhost       Connected
>>>>
>>>> root@gfstst-node2:~$ sudo gluster peer status
>>>> Number of Peers: 2
>>>>
>>>> Hostname: IP.3
>>>> Uuid: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>> State: Peer in Cluster (Disconnected)
>>>>
>>>> Hostname: IP.4
>>>> Uuid: c1cbb58e-3ceb-4637-9ba3-3d28ef20b143
>>>> State: Peer in Cluster (Connected)
>>>>
>>>> regards
>>>> Amudhan
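A minimal sketch of the workaround described above (copying the gfs-tst volume configuration from a healthy peer to the node where glusterd fails to start), assuming root SSH access between the nodes; the peer address IP.2 and the backup path are placeholders rather than details confirmed in the thread. Run on node-3:

# move the partially written volume configuration aside, keeping it for comparison
sudo mv /var/lib/glusterd/vols/gfs-tst /var/lib/glusterd/vols/gfs-tst.bak
# copy the volume configuration from a healthy peer in the trusted storage pool
sudo scp -r root@IP.2:/var/lib/glusterd/vols/gfs-tst /var/lib/glusterd/vols/
# confirm the root partition is no longer full, then start glusterd again
df -h /var/lib/glusterd
sudo service glusterd start
# peers should now report healthy and connected
sudo gluster peer status

Moving the existing gfs-tst directory aside rather than overwriting it in place keeps the partially written configuration available in case it needs to be examined later.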
Atin Mukherjee
2019-Jan-17 02:36 UTC
[Gluster-users] glusterfs 4.1.6 error in starting glusterd service
If gluster volume info/status shows the brick to be /media/disk4/brick4, then you'd need to mount the same path, and hence you'd need to create the brick4 directory explicitly. I fail to understand the rationale for why only /media/disk4 can be used as the mount path for the brick.

On Wed, Jan 16, 2019 at 5:24 PM Amudhan P <amudhan83 at gmail.com> wrote:

> Yes, I did mount the bricks, but the folder 'brick4' was still not created inside the brick.
> Do I need to create this folder? Because when I run replace-brick it creates the folder inside the brick. I have seen this behavior before when running replace-brick or when heal begins.
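A minimal sketch of the suggestion above, assuming the replaced disk shows up as /dev/sdX and has already been formatted with a filesystem (the device name is a placeholder, not a detail from the thread). Run on node-3:

# mount the replaced disk at the path the volume configuration expects
sudo mkdir -p /media/disk4
sudo mount /dev/sdX /media/disk4
# recreate the brick directory referenced by the volume (gfs-tst expects /media/disk4/brick4)
sudo mkdir -p /media/disk4/brick4
# start glusterd and check whether the brick is reported online
sudo service glusterd start
sudo gluster volume status gfs-tst

If glusterd still fails to start once the brick path exists, the gfs-tst configuration under /var/lib/glusterd/vols/ may also need to be restored from a healthy peer, as discussed earlier in the thread.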