woz woz
2019-Dec-11 11:17 UTC
[Gluster-users] Problem with heal operation with replica 2: "Launching heal operation to perform full self-heal on volume gv0 has been unsuccessful on bricks that are down. Please check if all brick processes are running."
Hi guys, how are you? I have a question for you.

Last week one of our 8 servers went down due to a problem on the RAID controller and, unfortunately, we had to reinstall and reconfigure it. The hostname of this server is gluster09.example.int; below you can find the volume status:

#####################################################
Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 4e9122d3-f4c9-4509-b25e-a30f7b5f452f
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: gluster09.example.int:/mnt/gluster/brick
Brick2: gluster01.example.int:/mnt/gluster/brick
Brick3: gluster03.example.int:/mnt/gluster/brick
Brick4: gluster04.example.int:/mnt/gluster/brick
Brick5: gluster05.example.int:/mnt/gluster/brick
Brick6: gluster06.example.int:/mnt/gluster/brick
Brick7: gluster07.example.int:/mnt/gluster/brick
Brick8: gluster08.example.int:/mnt/gluster/brick
Options Reconfigured:
cluster.shd-wait-qlength: 16384
cluster.self-heal-daemon: enable
cluster.shd-max-threads: 12
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
#####################################################

To add the server back we followed the official procedure provided by Red Hat (https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/pdf/administration_guide/Red_Hat_Gluster_Storage-3.5-Administration_Guide-en-US.pdf) and we noticed that the self-heal is in progress. In fact, checking its status with the command gluster volume heal gv0 info, we see a lot of entries like:

#####################################################
Brick gluster09.example.int:/mnt/gluster/brick
Status: Connected
Number of entries: 0

Brick gluster01.example.int:/mnt/gluster/brick
<gfid:2786f558-3b4a-42dd-b519-4be548b7c735>
<gfid:15921050-7538-43bb-9c5a-80d2e74c73b8>
<gfid:00a120eb-d7c9-490b-8996-e06ceeaf2f7c>
<gfid:366ce3d7-6ab4-49b6-8fe0-10f8af07864c>
<gfid:09d94803-c4e0-4d3d-b89d-1f70e0910fdd>
<gfid:b54ebaac-ab84-4277-bded-762b2dd09ae2>
<gfid:3bab3231-cccc-44db-9ec5-1f167a7b47b1>
<gfid:e3d209ae-e260-4d0a-93fd-65e8ef35d2ff>
<gfid:8fbe07a6-b46e-4c02-b20e-0c0a2d36cee1>
<gfid:131b9889-2a4d-4b67-a0f2-99c782de2b69>
<gfid:73f76c29-b4a2-4a2d-92e3-2c9f86884d04>
<gfid:b998387c-d8fc-4fc0-9ddf-dabb26eb2f4e>
<gfid:ccdbd77d-e548-4561-a9ec-1d186e0a6bbe>
............................
#####################################################

However, it seems that only the indices are being synchronized and not the data: the disk space currently occupied on the new server is just 120 GB, whereas on the "master" node gluster01.example.int it is 60 TB. We also tried changing the self-heal daemon parameters, raising cluster.shd-max-threads from 1 to 12 and cluster.shd-wait-qlength from 1024 to 16384, but we didn't notice any improvement.

Moreover, we also tried to start a full heal of the volume using the command gluster volume heal gv0 full, but we receive the following error:

Launching heal operation to perform full self-heal on volume gv0 has been unsuccessful on bricks that are down. Please check if all brick processes are running.

How can we synchronize the data on the new server and not just the indices?

Thanks in advance for your support,
Best regards,
Woz
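For readers triaging similar output: the per-brick pending count can be pulled out of `gluster volume heal gv0 info` with a short awk script. This is a sketch, not a command from the original thread; the heredoc below is an abbreviated stand-in for live output, so on a real cluster you would pipe the heal command into the same awk program instead:

```shell
# Summarize pending self-heal entries per brick.
# Live usage (assumed volume name gv0):
#   gluster volume heal gv0 info | awk '/^Brick /{b=$2} /^Number of entries:/{print b": "$NF" pending"}'
cat <<'EOF' | awk '
/^Brick / { brick = $2 }
/^Number of entries:/ { print brick ": " $NF " pending" }
'
Brick gluster09.example.int:/mnt/gluster/brick
Status: Connected
Number of entries: 0

Brick gluster01.example.int:/mnt/gluster/brick
<gfid:2786f558-3b4a-42dd-b519-4be548b7c735>
<gfid:15921050-7538-43bb-9c5a-80d2e74c73b8>
Status: Connected
Number of entries: 2
EOF
```

Watching these counts drop over time is a quick way to tell whether the heal is actually making progress, independently of disk usage on the new brick.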
Sanju Rakonde
2019-Dec-12 09:46 UTC
[Gluster-users] Problem with heal operation with replica 2: "Launching heal operation to perform full self-heal on volume gv0 has been unsuccessful on bricks that are down. Please check if all brick processes are running."
On Wed, Dec 11, 2019 at 10:06 PM woz woz <thewoz10 at gmail.com> wrote:

> [...]
> Moreover, we also tried to start a full heal of the volume using the
> command gluster volume heal gv0 full, but we receive the following error:
> Launching heal operation to perform full self-heal on volume gv0 has been
> unsuccessful on bricks that are down. Please check if all brick processes
> are running.

Please check whether you are hitting
https://bugzilla.redhat.com/show_bug.cgi?id=1676812

> How can we synchronize the data on the new server and not just the indices?
>
> Thanks in advance for your support,
> Best regards,
> Woz

--
Thanks,
Sanju
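The error message indicates that glusterd believes some brick process is not running. A way to verify that, sketched here rather than taken from the thread, is to scan `gluster volume status` for any brick whose Online column is not "Y". The heredoc is a made-up sample (the offline brick shown is hypothetical); against a live cluster you would pipe the real command:

```shell
# Flag bricks that are not online in `gluster volume status` output.
# Live usage (assumed volume name gv0):
#   gluster volume status gv0 | awk '$1 == "Brick" && $(NF-1) != "Y" { print "offline:", $2 }'
cat <<'EOF' | awk '$1 == "Brick" && $(NF-1) != "Y" { print "offline:", $2 }'
Gluster process                                 TCP Port  RDMA Port  Online  Pid
--------------------------------------------------------------------------------
Brick gluster09.example.int:/mnt/gluster/brick  49152     0          Y       1234
Brick gluster01.example.int:/mnt/gluster/brick  N/A       N/A        N       N/A
Self-heal Daemon on localhost                   N/A       N/A        Y       5678
EOF
```

If an offline brick turns up, the usual remedy before retrying `gluster volume heal gv0 full` is `gluster volume start gv0 force`, which respawns missing brick processes without disturbing the running ones.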