Shaik Salam
2019-Jan-22 05:58 UTC
[Gluster-users] Bricks are going offline unable to recover with heal/start force commands
Can anyone advise how to recover bricks, other than with heal/start force, given the events in the logs below? Please let me know if any other logs are required. Thanks in advance.

BR
Salam

From: Shaik Salam/HYD/TCS
To: bugs at gluster.org, gluster-users at gluster.org
Date: 01/21/2019 10:03 PM
Subject: Bricks are going offline unable to recover with heal/start force commands

Hi,

Bricks are offline and cannot be recovered with the following commands:

gluster volume heal <vol-name>
gluster volume start <vol-name> force

The bricks are still offline afterwards.

sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.3.6:/var/lib/heketi/mounts/vg
_ca57f326195c243be2380ce4e42a4191/brick_952
d75fd193c7209c9a81acbc23a3747/brick         49166     0          Y       269
Brick 192.168.3.5:/var/lib/heketi/mounts/vg
_d5f17487744584e3652d3ca943b0b91b/brick_e15
c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
Brick 192.168.3.15:/var/lib/heketi/mounts/v
g_462ea199185376b03e4b0317363bb88c/brick_17
36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
Self-heal Daemon on localhost               N/A       N/A        Y       45826
Self-heal Daemon on 192.168.3.6             N/A       N/A        Y       65196
Self-heal Daemon on 192.168.3.15            N/A       N/A        Y       52915

Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8
------------------------------------------------------------------------------

We can see the following events when we force-start the volume (the first log line is truncated as pasted):

/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fcaa346f0e5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2019-01-21 08:22:34.555068] E [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563) [0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fcaa346f0e5] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2019-01-21 08:22:53.389049] I [MSGID: 106499] [glusterd-handler.c:4314:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_3442e86b6d994a14de73f1b8c82cf0b8
[2019-01-21 08:23:25.346839] I [MSGID: 106487] [glusterd-handler.c:1486:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req

We can see the following events when we heal the volume:

[2019-01-21 08:20:07.576070] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0
[2019-01-21 08:20:07.580225] I [cli-rpc-ops.c:9182:gf_cli_heal_volume_cbk] 0-cli: Received resp to heal volume
[2019-01-21 08:20:07.580326] I [input.c:31:cli_batch] 0-: Exiting with: -1
[2019-01-21 08:22:30.423311] I [cli.c:768:main] 0-cli: Started running gluster with version 4.1.5
[2019-01-21 08:22:30.463648] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-01-21 08:22:30.463718] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2019-01-21 08:22:30.463859] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0
[2019-01-21 08:22:33.427710] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2019-01-21 08:22:34.581555] I [cli-rpc-ops.c:1472:gf_cli_start_volume_cbk] 0-cli: Received resp to start volume
[2019-01-21 08:22:34.581678] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2019-01-21 08:22:53.345351] I [cli.c:768:main] 0-cli: Started running gluster with version 4.1.5
[2019-01-21 08:22:53.387992] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-01-21 08:22:53.388059] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2019-01-21 08:22:53.388138] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0
[2019-01-21 08:22:53.394737] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2019-01-21 08:23:25.304688] I [cli.c:768:main] 0-cli: Started running gluster with version 4.1.5
[2019-01-21 08:23:25.346319] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-01-21 08:23:25.346389] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2019-01-21 08:23:25.346500] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0

Please let us know the steps to recover the bricks.

BR
Salam

Notice: The information contained in this e-mail message and/or attachments to it may contain confidential or privileged information. If you are not the intended recipient, any dissemination, use, review, distribution, printing or copying of the information contained in this e-mail message and/or attachments to it is strictly prohibited. If you have received this communication in error, please notify us by reply e-mail or telephone and immediately and permanently delete the message and any attachments. Thank you
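(Aside: the brick that needs attention can be picked out of the status output above mechanically. Below is a rough shell sketch that lists bricks whose Online column is "N". It is only an illustration: the awk field positions assume the wrapped 4.1.x status layout pasted above, and the status.txt here-document stands in for a saved copy of the real `gluster volume status` output.)

```shell
# Sample of the wrapped status layout shown above (brick paths span
# three lines), saved to a file as `gluster volume status ... > status.txt`
# would on a real node.
cat > status.txt <<'EOF'
Brick 192.168.3.6:/var/lib/heketi/mounts/vg
_ca57f326195c243be2380ce4e42a4191/brick_952
d75fd193c7209c9a81acbc23a3747/brick         49166     0          Y       269
Brick 192.168.3.5:/var/lib/heketi/mounts/vg
_d5f17487744584e3652d3ca943b0b91b/brick_e15
c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       45826
EOF

# Join each brick's wrapped path lines, then check the Online column
# (second-to-last field once the line is rejoined).
offline=$(awk '
  function flush() {
    if (buf == "") return
    n = split(buf, f, " ")
    if (f[n-1] == "N") print f[2]     # f[2] = brick path, f[n-1] = Online
    buf = ""
  }
  /^Brick / { flush(); buf = $0; next }       # start of a brick entry
  buf != "" && /^[_a-z0-9]/ { buf = buf $0; next }  # wrapped path continuation
  { flush() }
  END { flush() }
' status.txt)

echo "$offline"
```

With the sample above this prints the one offline brick, 192.168.3.5:/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick.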
Amar Tumballi Suryanarayan
2019-Jan-22 06:07 UTC
[Gluster-users] [Bugs] Bricks are going offline unable to recover with heal/start force commands
Hi Shaik,

Can you check what is in the brick logs? They are located in /var/log/glusterfs/bricks/*. It looks like the Samba hook script failed, but that shouldn't matter in this use case.

Also, I see that you are trying to set up heketi to provision volumes, which suggests you may be using Gluster in a container use case. If you are still in the 'PoC' phase, can you give https://github.com/gluster/gcs a try? That makes the deployment and the stack a little simpler.

-Amar

On Tue, Jan 22, 2019 at 11:29 AM Shaik Salam <shaik.salam at tcs.com> wrote:
> [Shaik's message of 2019-Jan-22 05:58, quoted in full above - snipped]
>
> _______________________________________________
> Bugs mailing list
> Bugs at gluster.org
> https://lists.gluster.org/mailman/listinfo/bugs

-- 
Amar Tumballi (amarts)
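(As an illustration of the brick-log check suggested here: each brick process writes a log under /var/log/glusterfs/bricks/, named after the brick's mount path, and filtering those logs for E (error) and C (critical) entries is a quick way to see why a brick process stopped. A minimal sketch, using a fabricated sample log in a local bricks/ directory rather than the real path:)

```shell
# Fabricated sample brick log in a local directory; on a real node the
# files live under /var/log/glusterfs/bricks/.
mkdir -p bricks
cat > bricks/var-lib-heketi-mounts-brick.log <<'EOF'
[2019-01-21 08:22:30.463718] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2019-01-21 08:22:34.555068] E [run.c:241:runner_log] 0-management: Failed to execute script
EOF

# Gluster log lines carry a severity letter (T/D/I/W/E/C) after the
# timestamp; keep only E and C entries.
# Real-node equivalent: grep -hE '\] [EC] \[' /var/log/glusterfs/bricks/*.log
errors=$(grep -hE '\] [EC] \[' bricks/*.log)

echo "$errors"
```

Here only the E entry survives the filter; on a live node the surviving entries (often "Failed to execute" or disconnect/crash messages) usually point at why the brick went offline.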