Atin Mukherjee
2019-Jan-25 08:26 UTC
[Gluster-users] glusterfs 4.1.6 error in starting glusterd service
Amudhan,

So here's the issue:

In node3, 'cat /var/lib/glusterd/peers/*' doesn't show node2's details, and
that's why glusterd wasn't able to resolve the brick(s) hosted on node2.

Can you please pick up the 0083ec0c-40bf-472a-a128-458924e56c96 file from
/var/lib/glusterd/peers/ on node 4, place it in the same location on node 3,
and then restart the glusterd service on node 3?

On Thu, Jan 24, 2019 at 11:57 AM Amudhan P <amudhan83 at gmail.com> wrote:

> Atin,
>
> Sorry, I missed sending the entire `glusterd` folder. The attached zip now
> contains the `glusterd` folder from all nodes.
>
> The problem node is node3, IP 10.1.2.3; its `glusterd` log file is inside
> the node3 folder.
>
> regards
> Amudhan
>
> On Wed, Jan 23, 2019 at 11:02 PM Atin Mukherjee <amukherj at redhat.com> wrote:
>
>> Amudhan,
>>
>> I see that you have provided the content of the configuration of the
>> volume gfs-tst, whereas the request was to share the dump of
>> /var/lib/glusterd/*. I cannot debug this further until you share the
>> correct dump.
>>
>> On Thu, Jan 17, 2019 at 3:43 PM Atin Mukherjee <amukherj at redhat.com> wrote:
>>
>>> Can you please run 'glusterd -LDEBUG' and share back the glusterd.log?
>>> Instead of doing too much back and forth, I suggest you share the
>>> content of /var/lib/glusterd from all the nodes. Also mention on which
>>> particular node the glusterd service is unable to come up.
>>>
>>> On Thu, Jan 17, 2019 at 11:34 AM Amudhan P <amudhan83 at gmail.com> wrote:
>>>
>>>> I have created the folder in the path as said, but the service still
>>>> failed to start; below is the error msg in glusterd.log
>>>>
>>>> [2019-01-16 14:50:14.555742] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p /var/run/glusterd.pid)
>>>> [2019-01-16 14:50:14.559835] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
>>>> [2019-01-16 14:50:14.559894] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
>>>> [2019-01-16 14:50:14.559912] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
>>>> [2019-01-16 14:50:14.563834] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>>>> [2019-01-16 14:50:14.563867] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
>>>> [2019-01-16 14:50:14.563882] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>>>> [2019-01-16 14:50:14.563957] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
>>>> [2019-01-16 14:50:14.563974] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
>>>> [2019-01-16 14:50:15.565868] I [MSGID: 106513] [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 40100
>>>> [2019-01-16 14:50:15.642532] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>> [2019-01-16 14:50:15.675333] I [MSGID: 106498] [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
>>>> [2019-01-16 14:50:15.675421] W [MSGID: 106061] [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
>>>> [2019-01-16 14:50:15.675451] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
>>>> *[2019-01-16 14:50:15.676912] E [MSGID: 106187] [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore*
>>>> *[2019-01-16 14:50:15.676956] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again*
>>>> [2019-01-16 14:50:15.676973] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
>>>> [2019-01-16 14:50:15.676986] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>>> [2019-01-16 14:50:15.677479] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down
>>>>
>>>> On Thu, Jan 17, 2019 at 8:06 AM Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>
>>>>> If gluster volume info/status shows the brick to be /media/disk4/brick4,
>>>>> then you'd need to mount the same path, and hence you'd need to create
>>>>> the brick4 directory explicitly. I fail to understand the rationale for
>>>>> how only /media/disk4 can be used as the mount path for the brick.
>>>>>
>>>>> On Wed, Jan 16, 2019 at 5:24 PM Amudhan P <amudhan83 at gmail.com> wrote:
>>>>>
>>>>>> Yes, I did mount the bricks, but the folder 'brick4' was still not
>>>>>> created inside the brick.
>>>>>> Do I need to create this folder? When I run replace-brick it will
>>>>>> create the folder inside the brick; I have seen this behavior before
>>>>>> when running replace-brick or when heal begins.
>>>>>>
>>>>>> On Wed, Jan 16, 2019 at 5:05 PM Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>>>
>>>>>>> On Wed, Jan 16, 2019 at 5:02 PM Amudhan P <amudhan83 at gmail.com> wrote:
>>>>>>>
>>>>>>>> Atin,
>>>>>>>> I have copied the content of 'gfs-tst' from the vol folder on another
>>>>>>>> node. When starting the service, it again fails with the error msg
>>>>>>>> below in the glusterd.log file.
>>>>>>>> >>>>>>>> [2019-01-15 20:16:59.513023] I [MSGID: 100030] >>>>>>>> [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running >>>>>>>> /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p >>>>>>>> /var/run/glusterd.pid) >>>>>>>> [2019-01-15 20:16:59.517164] I [MSGID: 106478] >>>>>>>> [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors >>>>>>>> set to 65536 >>>>>>>> [2019-01-15 20:16:59.517264] I [MSGID: 106479] >>>>>>>> [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working >>>>>>>> directory >>>>>>>> [2019-01-15 20:16:59.517283] I [MSGID: 106479] >>>>>>>> [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file >>>>>>>> working directory >>>>>>>> [2019-01-15 20:16:59.521508] W [MSGID: 103071] >>>>>>>> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event >>>>>>>> channel creation failed [No such device] >>>>>>>> [2019-01-15 20:16:59.521544] W [MSGID: 103055] [rdma.c:4938:init] >>>>>>>> 0-rdma.management: Failed to initialize IB Device >>>>>>>> [2019-01-15 20:16:59.521562] W >>>>>>>> [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' >>>>>>>> initialization failed >>>>>>>> [2019-01-15 20:16:59.521629] W >>>>>>>> [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create >>>>>>>> listener, initing the transport failed >>>>>>>> [2019-01-15 20:16:59.521648] E [MSGID: 106244] >>>>>>>> [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, >>>>>>>> continuing with succeeded transport >>>>>>>> [2019-01-15 20:17:00.529390] I [MSGID: 106513] >>>>>>>> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved >>>>>>>> op-version: 40100 >>>>>>>> [2019-01-15 20:17:00.608354] I [MSGID: 106544] >>>>>>>> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: >>>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d >>>>>>>> [2019-01-15 20:17:00.650911] W [MSGID: 106425] >>>>>>>> [glusterd-store.c:2643:glusterd_store_retrieve_bricks] 0-management: failed >>>>>>>> to get statfs() call on brick /media/disk4/brick4 [No such file or >>>>>>>> directory] >>>>>>>> >>>>>>> >>>>>>> This means that underlying brick /media/disk4/brick4 doesn't exist. >>>>>>> You already mentioned that you had replaced the faulty disk, but have you >>>>>>> not mounted it yet? 
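For anyone hitting the same "failed to get statfs()" message: the check Atin is pointing at amounts to confirming that the replaced disk is actually mounted at the brick mount point and that the brick directory exists under it. A minimal sketch for node 3 follows; the device name /dev/sdX1 is only a placeholder for whatever the replaced disk is on your system:

df -h /media/disk4                  # is a filesystem mounted here at all?
sudo mount /dev/sdX1 /media/disk4   # mount the replaced disk (placeholder device name)
sudo mkdir -p /media/disk4/brick4   # recreate the brick directory the volume expects
ls -ld /media/disk4/brick4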
>>>>>>> >>>>>>> >>>>>>>> [2019-01-15 20:17:00.691240] I [MSGID: 106498] >>>>>>>> [glusterd-handler.c:3614:glusterd_friend_add_from_peerinfo] 0-management: >>>>>>>> connect returned 0 >>>>>>>> [2019-01-15 20:17:00.691307] W [MSGID: 106061] >>>>>>>> [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd: >>>>>>>> Failed to get tcp-user-timeout >>>>>>>> [2019-01-15 20:17:00.691331] I >>>>>>>> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting >>>>>>>> frame-timeout to 600 >>>>>>>> [2019-01-15 20:17:00.692547] E [MSGID: 106187] >>>>>>>> [glusterd-store.c:4662:glusterd_resolve_all_bricks] 0-glusterd: resolve >>>>>>>> brick failed in restore >>>>>>>> [2019-01-15 20:17:00.692582] E [MSGID: 101019] >>>>>>>> [xlator.c:720:xlator_init] 0-management: Initialization of volume >>>>>>>> 'management' failed, review your volfile again >>>>>>>> [2019-01-15 20:17:00.692597] E [MSGID: 101066] >>>>>>>> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator >>>>>>>> failed >>>>>>>> [2019-01-15 20:17:00.692607] E [MSGID: 101176] >>>>>>>> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed >>>>>>>> [2019-01-15 20:17:00.693004] W [glusterfsd.c:1514:cleanup_and_exit] >>>>>>>> (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] >>>>>>>> -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] >>>>>>>> -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: >>>>>>>> received signum (-1), shutting down >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Jan 16, 2019 at 4:34 PM Atin Mukherjee <amukherj at redhat.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> This is a case of partial write of a transaction and as the host >>>>>>>>> ran out of space for the root partition where all the glusterd related >>>>>>>>> configurations are persisted, the transaction couldn't be written and hence >>>>>>>>> the new (replaced) brick's information wasn't persisted in the >>>>>>>>> configuration. The workaround for this is to copy the content of >>>>>>>>> /var/lib/glusterd/vols/gfs-tst/ from one of the nodes in the trusted >>>>>>>>> storage pool to the node where glusterd service fails to come up and post >>>>>>>>> that restarting the glusterd service should be able to make peer status >>>>>>>>> reporting all nodes healthy and connected. >>>>>>>>> >>>>>>>>> On Wed, Jan 16, 2019 at 3:49 PM Amudhan P <amudhan83 at gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> In short, when I started glusterd service I am getting following >>>>>>>>>> error msg in the glusterd.log file in one server. >>>>>>>>>> what needs to be done? 
>>>>>>>>>> >>>>>>>>>> error logged in glusterd.log >>>>>>>>>> >>>>>>>>>> [2019-01-15 17:50:13.956053] I [MSGID: 100030] >>>>>>>>>> [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running >>>>>>>>>> /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p >>>>>>>>>> /var/run/glusterd.pid) >>>>>>>>>> [2019-01-15 17:50:13.960131] I [MSGID: 106478] >>>>>>>>>> [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors >>>>>>>>>> set to 65536 >>>>>>>>>> [2019-01-15 17:50:13.960193] I [MSGID: 106479] >>>>>>>>>> [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working >>>>>>>>>> directory >>>>>>>>>> [2019-01-15 17:50:13.960212] I [MSGID: 106479] >>>>>>>>>> [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file >>>>>>>>>> working directory >>>>>>>>>> [2019-01-15 17:50:13.964437] W [MSGID: 103071] >>>>>>>>>> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event >>>>>>>>>> channel creation failed [No such device] >>>>>>>>>> [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init] >>>>>>>>>> 0-rdma.management: Failed to initialize IB Device >>>>>>>>>> [2019-01-15 17:50:13.964491] W >>>>>>>>>> [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' >>>>>>>>>> initialization failed >>>>>>>>>> [2019-01-15 17:50:13.964560] W >>>>>>>>>> [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create >>>>>>>>>> listener, initing the transport failed >>>>>>>>>> [2019-01-15 17:50:13.964579] E [MSGID: 106244] >>>>>>>>>> [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, >>>>>>>>>> continuing with succeeded transport >>>>>>>>>> [2019-01-15 17:50:14.967681] I [MSGID: 106513] >>>>>>>>>> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved >>>>>>>>>> op-version: 40100 >>>>>>>>>> [2019-01-15 17:50:14.973931] I [MSGID: 106544] >>>>>>>>>> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: >>>>>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d >>>>>>>>>> [2019-01-15 17:50:15.046620] E [MSGID: 101032] >>>>>>>>>> [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to >>>>>>>>>> /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. [No such >>>>>>>>>> file or directory] >>>>>>>>>> [2019-01-15 17:50:15.046685] E [MSGID: 106201] >>>>>>>>>> [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: >>>>>>>>>> Unable to restore volume: gfs-tst >>>>>>>>>> [2019-01-15 17:50:15.046718] E [MSGID: 101019] >>>>>>>>>> [xlator.c:720:xlator_init] 0-management: Initialization of volume >>>>>>>>>> 'management' failed, review your volfile again >>>>>>>>>> [2019-01-15 17:50:15.046732] E [MSGID: 101066] >>>>>>>>>> [graph.c:367:glusterfs_graph_init] 0-management: initializing translator >>>>>>>>>> failed >>>>>>>>>> [2019-01-15 17:50:15.046741] E [MSGID: 101176] >>>>>>>>>> [graph.c:738:glusterfs_graph_activate] 0-graph: init failed >>>>>>>>>> [2019-01-15 17:50:15.047171] W >>>>>>>>>> [glusterfsd.c:1514:cleanup_and_exit] >>>>>>>>>> (-->/usr/local/sbin/glusterd(glusterfs_volumes >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> In long, I am trying to simulate a situation. where volume stoped >>>>>>>>>> abnormally and >>>>>>>>>> entire cluster restarted with some missing disks. >>>>>>>>>> >>>>>>>>>> My test cluster is set up with 3 nodes and each has four disks, I >>>>>>>>>> have setup a volume with disperse 4+2. >>>>>>>>>> In Node-3 2 disks have failed, to replace I have shutdown all >>>>>>>>>> system >>>>>>>>>> >>>>>>>>>> below are the steps done. 
>>>>>>>>>> >>>>>>>>>> 1. umount from client machine >>>>>>>>>> 2. shutdown all system by running `shutdown -h now` command ( >>>>>>>>>> without stopping volume and stop service) >>>>>>>>>> 3. replace faulty disk in Node-3 >>>>>>>>>> 4. powered ON all system >>>>>>>>>> 5. format replaced drives, and mount all drives >>>>>>>>>> 6. start glusterd service in all node (success) >>>>>>>>>> 7. Now running `voulume status` command from node-3 >>>>>>>>>> output : [2019-01-15 16:52:17.718422] : v status : FAILED : >>>>>>>>>> Staging failed on 0083ec0c-40bf-472a-a128-458924e56c96. Please check log >>>>>>>>>> file for details. >>>>>>>>>> 8. running `voulume start gfs-tst` command from node-3 >>>>>>>>>> output : [2019-01-15 16:53:19.410252] : v start gfs-tst : FAILED >>>>>>>>>> : Volume gfs-tst already started >>>>>>>>>> >>>>>>>>>> 9. running `gluster v status` in other node. showing all brick >>>>>>>>>> available but 'self-heal daemon' not running >>>>>>>>>> @gfstst-node2:~$ sudo gluster v status >>>>>>>>>> Status of volume: gfs-tst >>>>>>>>>> Gluster process TCP Port RDMA Port >>>>>>>>>> Online Pid >>>>>>>>>> >>>>>>>>>> ------------------------------------------------------------------------------ >>>>>>>>>> Brick IP.2:/media/disk1/brick1 49152 0 Y >>>>>>>>>> 1517 >>>>>>>>>> Brick IP.4:/media/disk1/brick1 49152 0 Y >>>>>>>>>> 1668 >>>>>>>>>> Brick IP.2:/media/disk2/brick2 49153 0 Y >>>>>>>>>> 1522 >>>>>>>>>> Brick IP.4:/media/disk2/brick2 49153 0 Y >>>>>>>>>> 1678 >>>>>>>>>> Brick IP.2:/media/disk3/brick3 49154 0 Y >>>>>>>>>> 1527 >>>>>>>>>> Brick IP.4:/media/disk3/brick3 49154 0 Y >>>>>>>>>> 1677 >>>>>>>>>> Brick IP.2:/media/disk4/brick4 49155 0 Y >>>>>>>>>> 1541 >>>>>>>>>> Brick IP.4:/media/disk4/brick4 49155 0 Y >>>>>>>>>> 1683 >>>>>>>>>> Self-heal Daemon on localhost N/A N/A >>>>>>>>>> Y 2662 >>>>>>>>>> Self-heal Daemon on IP.4 N/A N/A Y >>>>>>>>>> 2786 >>>>>>>>>> >>>>>>>>>> 10. in the above output 'volume already started'. so, running >>>>>>>>>> `reset-brick` command >>>>>>>>>> v reset-brick gfs-tst IP.3:/media/disk3/brick3 >>>>>>>>>> IP.3:/media/disk3/brick3 commit force >>>>>>>>>> >>>>>>>>>> output : [2019-01-15 16:57:37.916942] : v reset-brick gfs-tst >>>>>>>>>> IP.3:/media/disk3/brick3 IP.3:/media/disk3/brick3 commit force : FAILED : >>>>>>>>>> /media/disk3/brick3 is already part of a volume >>>>>>>>>> >>>>>>>>>> 11. reset-brick command was not working, so, tried stopping >>>>>>>>>> volume and start with force command >>>>>>>>>> output : [2019-01-15 17:01:04.570794] : v start gfs-tst force : >>>>>>>>>> FAILED : Pre-validation failed on localhost. Please check log file for >>>>>>>>>> details >>>>>>>>>> >>>>>>>>>> 12. now stopped service in all node and tried starting again. >>>>>>>>>> except node-3 other nodes service started successfully without any issues. >>>>>>>>>> >>>>>>>>>> in node-3 receiving following message. >>>>>>>>>> >>>>>>>>>> sudo service glusterd start >>>>>>>>>> * Starting glusterd service glusterd >>>>>>>>>> >>>>>>>>>> [fail] >>>>>>>>>> /usr/local/sbin/glusterd: option requires an argument -- 'f' >>>>>>>>>> Try `glusterd --help' or `glusterd --usage' for more information. >>>>>>>>>> >>>>>>>>>> 13. checking glusterd log file found that OS drive was running >>>>>>>>>> out of space >>>>>>>>>> output : [2019-01-15 16:51:37.210792] W [MSGID: 101012] >>>>>>>>>> [store.c:372:gf_store_save_value] 0-management: fflush failed. 
[No space >>>>>>>>>> left on device] >>>>>>>>>> [2019-01-15 16:51:37.210874] E [MSGID: 106190] >>>>>>>>>> [glusterd-store.c:1058:glusterd_volume_exclude_options_write] 0-management: >>>>>>>>>> Unable to write volume values for gfs-tst >>>>>>>>>> >>>>>>>>>> 14. cleared some space in OS drive but still, service is not >>>>>>>>>> running. below is the error logged in glusterd.log >>>>>>>>>> >>>>>>>>>> [2019-01-15 17:50:13.956053] I [MSGID: 100030] >>>>>>>>>> [glusterfsd.c:2741:main] 0-/usr/local/sbin/glusterd: Started running >>>>>>>>>> /usr/local/sbin/glusterd version 4.1.6 (args: /usr/local/sbin/glusterd -p >>>>>>>>>> /var/run/glusterd.pid) >>>>>>>>>> [2019-01-15 17:50:13.960131] I [MSGID: 106478] >>>>>>>>>> [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors >>>>>>>>>> set to 65536 >>>>>>>>>> [2019-01-15 17:50:13.960193] I [MSGID: 106479] >>>>>>>>>> [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working >>>>>>>>>> directory >>>>>>>>>> [2019-01-15 17:50:13.960212] I [MSGID: 106479] >>>>>>>>>> [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file >>>>>>>>>> working directory >>>>>>>>>> [2019-01-15 17:50:13.964437] W [MSGID: 103071] >>>>>>>>>> [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event >>>>>>>>>> channel creation failed [No such device] >>>>>>>>>> [2019-01-15 17:50:13.964474] W [MSGID: 103055] [rdma.c:4938:init] >>>>>>>>>> 0-rdma.management: Failed to initialize IB Device >>>>>>>>>> [2019-01-15 17:50:13.964491] W >>>>>>>>>> [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' >>>>>>>>>> initialization failed >>>>>>>>>> [2019-01-15 17:50:13.964560] W >>>>>>>>>> [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create >>>>>>>>>> listener, initing the transport failed >>>>>>>>>> [2019-01-15 17:50:13.964579] E [MSGID: 106244] >>>>>>>>>> [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, >>>>>>>>>> continuing with succeeded transport >>>>>>>>>> [2019-01-15 17:50:14.967681] I [MSGID: 106513] >>>>>>>>>> [glusterd-store.c:2240:glusterd_restore_op_version] 0-glusterd: retrieved >>>>>>>>>> op-version: 40100 >>>>>>>>>> [2019-01-15 17:50:14.973931] I [MSGID: 106544] >>>>>>>>>> [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: >>>>>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d >>>>>>>>>> [2019-01-15 17:50:15.046620] E [MSGID: 101032] >>>>>>>>>> [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to >>>>>>>>>> /var/lib/glusterd/vols/gfs-tst/bricks/IP.3:-media-disk3-brick3. 
[No such file or directory]
>>>>>>>>>> [2019-01-15 17:50:15.046685] E [MSGID: 106201] [glusterd-store.c:3384:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: gfs-tst
>>>>>>>>>> [2019-01-15 17:50:15.046718] E [MSGID: 101019] [xlator.c:720:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
>>>>>>>>>> [2019-01-15 17:50:15.046732] E [MSGID: 101066] [graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
>>>>>>>>>> [2019-01-15 17:50:15.046741] E [MSGID: 101176] [graph.c:738:glusterfs_graph_activate] 0-graph: init failed
>>>>>>>>>> [2019-01-15 17:50:15.047171] W [glusterfsd.c:1514:cleanup_and_exit] (-->/usr/local/sbin/glusterd(glusterfs_volumes_init+0xc2) [0x409f52] -->/usr/local/sbin/glusterd(glusterfs_process_volfp+0x151) [0x409e41] -->/usr/local/sbin/glusterd(cleanup_and_exit+0x5f) [0x40942f] ) 0-: received signum (-1), shutting down
>>>>>>>>>>
>>>>>>>>>> 15. On the other nodes, running `gluster v status` still shows the node3 bricks as online,
>>>>>>>>>> but `peer status` shows node-3 as disconnected
>>>>>>>>>>
>>>>>>>>>> @gfstst-node2:~$ sudo gluster v status
>>>>>>>>>> Status of volume: gfs-tst
>>>>>>>>>> Gluster process                            TCP Port  RDMA Port  Online  Pid
>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>> Brick IP.2:/media/disk1/brick1             49152     0          Y       1517
>>>>>>>>>> Brick IP.4:/media/disk1/brick1             49152     0          Y       1668
>>>>>>>>>> Brick IP.2:/media/disk2/brick2             49153     0          Y       1522
>>>>>>>>>> Brick IP.4:/media/disk2/brick2             49153     0          Y       1678
>>>>>>>>>> Brick IP.2:/media/disk3/brick3             49154     0          Y       1527
>>>>>>>>>> Brick IP.4:/media/disk3/brick3             49154     0          Y       1677
>>>>>>>>>> Brick IP.2:/media/disk4/brick4             49155     0          Y       1541
>>>>>>>>>> Brick IP.4:/media/disk4/brick4             49155     0          Y       1683
>>>>>>>>>> Self-heal Daemon on localhost              N/A       N/A        Y       2662
>>>>>>>>>> Self-heal Daemon on IP.4                   N/A       N/A        Y       2786
>>>>>>>>>>
>>>>>>>>>> Task Status of Volume gfs-tst
>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>> There are no active volume tasks
>>>>>>>>>>
>>>>>>>>>> root@gfstst-node2:~$ sudo gluster pool list
>>>>>>>>>> UUID                                    Hostname        State
>>>>>>>>>> d6bf51a7-c296-492f-8dac-e81efa9dd22d    IP.3            Disconnected
>>>>>>>>>> c1cbb58e-3ceb-4637-9ba3-3d28ef20b143    IP.4            Connected
>>>>>>>>>> 0083ec0c-40bf-472a-a128-458924e56c96    localhost       Connected
>>>>>>>>>>
>>>>>>>>>> root@gfstst-node2:~$ sudo gluster peer status
>>>>>>>>>> Number of Peers: 2
>>>>>>>>>>
>>>>>>>>>> Hostname: IP.3
>>>>>>>>>> Uuid: d6bf51a7-c296-492f-8dac-e81efa9dd22d
>>>>>>>>>> State: Peer in Cluster (Disconnected)
>>>>>>>>>>
>>>>>>>>>> Hostname: IP.4
>>>>>>>>>> Uuid: c1cbb58e-3ceb-4637-9ba3-3d28ef20b143
>>>>>>>>>> State: Peer in Cluster (Connected)
>>>>>>>>>>
>>>>>>>>>> regards
>>>>>>>>>> Amudhan
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
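For reference, the fix suggested at the top of this message comes down to copying the missing peer file (node 2's UUID file) from a healthy node to node 3 and restarting glusterd there. A rough sketch using the node-3 IP mentioned in this thread; the scp call, the root user, and the service manager are illustrative, and any copy mechanism will do:

# on node 4: copy node 2's peer file to node 3
scp /var/lib/glusterd/peers/0083ec0c-40bf-472a-a128-458924e56c96 root@10.1.2.3:/var/lib/glusterd/peers/

# on node 3: restart glusterd and confirm the pool is healthy again
sudo systemctl restart glusterd     # or: sudo service glusterd restart
sudo gluster peer status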
Amudhan P
2019-Jan-30 13:55 UTC
[Gluster-users] glusterfs 4.1.6 error in starting glusterd service
Hi Atin,

Yes, it worked out. Thank you.

What would be the cause of this issue?

On Fri, Jan 25, 2019 at 1:56 PM Atin Mukherjee <amukherj at redhat.com> wrote:

> Amudhan,
>
> So here's the issue:
>
> In node3, 'cat /var/lib/glusterd/peers/*' doesn't show node2's details,
> and that's why glusterd wasn't able to resolve the brick(s) hosted on
> node2.
>
> Can you please pick up the 0083ec0c-40bf-472a-a128-458924e56c96 file from
> /var/lib/glusterd/peers/ on node 4, place it in the same location on
> node 3, and then restart the glusterd service on node 3?
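As for the cause, it was identified earlier in the thread: the root filesystem on node 3 ran out of space while glusterd was persisting its configuration, so the store under /var/lib/glusterd was only partially written, which is consistent with node 2's peer file never making it to disk there. A quick way to spot this kind of partial write on the affected node (the glusterd.log path depends on the install prefix, so treat it as an example):

df -h /var/lib/glusterd
ls -l /var/lib/glusterd/peers/        # compare the file count with a healthy node
grep "No space left on device" /var/log/glusterfs/glusterd.log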