Micha Ober
2016-Oct-01 10:44 UTC
[Gluster-users] Some files are not healed (but not in split-brain), manual fix with setfattr does not work
Hi all,

I noticed that I have two files which are not healed:

root at giant5:~# gluster volume heal gv0 info
Gathering Heal info on volume gv0 has been successful

Brick giant1:/gluster/sdc/gv0
Number of entries: 1
/holicki/lqcd/slurm-7251.out

Brick giant2:/gluster/sdc/gv0
Number of entries: 1
/holicki/lqcd/slurm-7251.out

Brick giant3:/gluster/sdc/gv0
Number of entries: 1
/holicki/lqcd/slurm-7249.out

Brick giant4:/gluster/sdc/gv0
Number of entries: 1
/holicki/lqcd/slurm-7249.out

Brick giant5:/gluster/sdc/gv0
Number of entries: 1
<gfid:e9793d5e-7174-49b0-9fa9-90f8c35948e7>

Brick giant6:/gluster/sdc/gv0
Number of entries: 1
<gfid:e9793d5e-7174-49b0-9fa9-90f8c35948e7>

Brick giant1:/gluster/sdd/gv0
Number of entries: 1
/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out

Brick giant2:/gluster/sdd/gv0
Number of entries: 1
/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out

Brick giant3:/gluster/sdd/gv0
Number of entries: 0

Brick giant4:/gluster/sdd/gv0
Number of entries: 0

Brick giant5:/gluster/sdd/gv0
Number of entries: 0

Brick giant6:/gluster/sdd/gv0
Number of entries: 0

(Disregard the file "slurm-7251.out"; that one is/was I/O in progress.)

The logs are filled with entries like this:

[2016-09-30 12:45:26.611375] I [afr-self-heal-data.c:655:afr_sh_data_fix] 0-gv0-replicate-3: no active sinks for performing self-heal on file /jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out
[2016-09-30 12:45:36.874802] I [afr-self-heal-data.c:655:afr_sh_data_fix] 0-gv0-replicate-3: no active sinks for performing self-heal on file /jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out
[2016-09-30 12:45:53.701884] I [afr-self-heal-data.c:655:afr_sh_data_fix] 0-gv0-replicate-3: no active sinks for performing self-heal on file /jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out

I checked with md5sum that both copies of the file are identical. Then I used setfattr as proposed in an older thread on this mailing list:

setfattr -n trusted.afr.gv0-client-7 -v 0x000000000000000000000000 /gluster/sdd/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out

I did this on both nodes for both clients, so it now looks like this (on both nodes/bricks):

getfattr -d -m . -e hex /gluster/sdd/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out
getfattr: Removing leading '/' from absolute path names
# file: gluster/sdd/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out
trusted.afr.gv0-client-6=0x000000000000000000000000
trusted.afr.gv0-client-7=0x000000000000000000000000
trusted.gfid=0xcb7978fa42e74a0b97928a87126338ac

I triggered a heal, but the files do not disappear from heal info. They are also not listed under split-brain or heal-failed.
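For completeness, this is essentially what I ran on each of the two bricks to clear the changelog (just a small loop around the setfattr call above; client-6 and client-7 are the client indices for this replica pair on my volume):

# Reset the AFR changelog xattrs for both clients of the replica pair.
# Run directly on each brick that holds a copy of the file.
FILE=/gluster/sdd/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out
for client in trusted.afr.gv0-client-6 trusted.afr.gv0-client-7; do
    setfattr -n "$client" -v 0x000000000000000000000000 "$FILE"
done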
I used gfid-resolver.sh for the other file:

e9793d5e-7174-49b0-9fa9-90f8c35948e7 == File: /gluster/sdc/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu0.800/slurm-5663.out

This file is also marked as dirty:

root at giant5:/var/log/glusterfs# getfattr -d -m . -e hex /gluster/sdc/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu0.800/slurm-5663.out
getfattr: Removing leading '/' from absolute path names
# file: gluster/sdc/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu0.800/slurm-5663.out
trusted.afr.gv0-client-4=0x000000010000000000000000
trusted.afr.gv0-client-5=0x000000010000000000000000
trusted.gfid=0xe9793d5e717449b09fa990f8c35948e7

(If I read the changelog format correctly, the xattr value is three 32-bit counters for data, metadata and entry operations, so the leading 0x00000001 means one pending data operation is recorded against each copy.)

How can I fix this, i.e. get the files healed? I'm using gluster 3.4.2 on Ubuntu 14.04.3.

I also thought about scheduling a downtime and upgrading gluster, but I don't know if I can do this as long as there are files to be healed.

Thanks for any advice.
Micha Ober
2016-Oct-02 09:57 UTC
[Gluster-users] Some files are not healed (but not in split-brain), manual fix with setfattr does not work
I noticed one more detail, since I found some entries which are not healed on another gluster filesystem as well: after resolving the gfids, it turns out that *only* the log files generated by the SLURM workload manager (slurm-*.out) are affected. Are there any known problems with SLURM + glusterfs?
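For reference, one way to list the changelog xattrs of all SLURM log files directly on a brick (a minimal sketch against my brick path; .glusterfs is pruned so the gfid hardlinks are skipped):

# Show the AFR changelog xattrs of every slurm-*.out file on this brick.
find /gluster/sdc/gv0 -path '*/.glusterfs' -prune -o -name 'slurm-*.out' \
    -exec getfattr -d -m trusted.afr -e hex {} + 2>/dev/null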