Hi Gluster Gurus,

I'm using a gluster volume as home for our users. The volume is replica 3, running on CentOS 7, gluster version 3.10 (3.10.6-1.el7.x86_64). Clients are running Fedora 26 and also gluster 3.10 (3.10.6-3.fc26.x86_64).

During the data backup I got an I/O error on one file. Manually checking for this file on a client confirms this:

ls -l romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/
ls: cannot access 'romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4': Input/output error
total 2015
-rw-------. 1 romanoch tbi 998211 Sep 15 18:44 previous.js
-rw-------. 1 romanoch tbi  65222 Oct 17 17:57 previous.jsonlz4
-rw-------. 1 romanoch tbi 149161 Oct  1 13:46 recovery.bak
-?????????? ? ? ? ? ? recovery.baklz4

Out of curiosity I checked all the bricks for this file. It's present there. Comparing checksums shows that the file differs on one of the three replica servers.

Querying healing information shows that the file should be healed:

# gluster volume heal home info
Brick sphere-six:/srv/gluster_home/brick
/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
Status: Connected
Number of entries: 1

Brick sphere-five:/srv/gluster_home/brick
/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
Status: Connected
Number of entries: 1

Brick sphere-four:/srv/gluster_home/brick
Status: Connected
Number of entries: 0

Manually triggering a heal doesn't report an error but also does not heal the file:

# gluster volume heal home
Launching heal operation to perform index self heal on volume home has been successful

Same with a full heal:

# gluster volume heal home full
Launching heal operation to perform full self heal on volume home has been successful

According to the split-brain query, that's not the problem:

# gluster volume heal home info split-brain
Brick sphere-six:/srv/gluster_home/brick
Status: Connected
Number of entries in split-brain: 0

Brick sphere-five:/srv/gluster_home/brick
Status: Connected
Number of entries in split-brain: 0

Brick sphere-four:/srv/gluster_home/brick
Status: Connected
Number of entries in split-brain: 0

I have no idea why this situation arose in the first place and also no idea how to solve it. I would highly appreciate any helpful feedback.

The only mention in the logs matching this file is a rename operation:

/var/log/glusterfs/bricks/srv-gluster_home-brick.log:[2017-10-23 09:19:11.561661] I [MSGID: 115061] [server-rpc-fops.c:1022:server_rename_cbk] 0-home-server: 5266153: RENAME /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.jsonlz4 (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.jsonlz4) -> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4 (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.baklz4), client: romulus.tbi.univie.ac.at-11894-2017/10/18-07:06:07:206366-home-client-3-0-0, error-xlator: home-posix [No data available]

I enabled directory quotas the same day this problem showed up, but I'm not sure how quotas could have an effect like this (except maybe when a limit is reached, which is also not the case here).

Thanks again if anyone has an idea.
Cheers
Richard
--
/dev/null
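P.S.: In case the brick-side metadata helps with debugging, here is roughly how I'd collect checksums and extended attributes for the affected file directly on each of the three servers. This is only a sketch: the full path is assembled from the brick root and the file path shown in the heal info output above, and the FILE variable is just for readability.

# run on sphere-four, sphere-five and sphere-six
FILE=/srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
md5sum "$FILE"
stat "$FILE"
# dump the gluster xattrs (trusted.gfid, trusted.afr.*) for comparison between bricks
getfattr -d -m . -e hex "$FILE"

The trusted.afr.home-client-* values (if present) record pending changes and should indicate which copy is considered out of date.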
Thanks for this report. This week many of the developers are at Gluster Summit in Prague; we will look into this and respond next week. Hope that's fine.

Thanks,
Amar

On 25-Oct-2017 3:07 PM, "Richard Neuboeck" <hawk at tbi.univie.ac.at> wrote:
> Hi Gluster Gurus,
> [...]
On a side note, try the recently released health report tool and see if it diagnoses any issues in your setup. Currently you may have to run it on all three machines.

On 26-Oct-2017 6:50 AM, "Amar Tumballi" <atumball at redhat.com> wrote:
> Thanks for this report.
> [...]
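For reference, a rough sketch of how the tool could be installed and run on each of the three servers. The package and command names below are taken from the tool's announcement as far as I recall them, so treat this as a sketch rather than exact instructions:

# on each of sphere-four, sphere-five and sphere-six
sudo pip install gluster-health-report   # package name assumed from the announcement; may differ
gluster-health-report

The generated report is per node, which is why it currently needs to be run on all three machines.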