Xavi,
Thanks for checking this. We have an external metadata server which keeps
track of every file written to the volume and can validate the file contents.
We will use that capability to verify the data. Once the data is verified,
would the following sequence of steps be sufficient to restore the volume
(a rough command sketch follows the list)?
1) Rebalance the volume.
2) After rebalance is complete, stop ingesting more data to the volume.
3) Let the pending heals complete.
4) Stop the volume.
5) For any heals that fail because of mismatching version/dirty extended
attributes on the directories, set those attributes to a matching value on all
the nodes.
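A rough sketch of the commands behind those steps, assuming the volume is
named StoragePool (taken from the log messages further down in the thread)
and using an illustrative brick path (adjust /bricks/disk1 to the actual
layout):

  # 1) Rebalance the volume and wait until it reports "completed"
  gluster volume rebalance StoragePool start
  gluster volume rebalance StoragePool status

  # 2) Stopping ingestion is done on the application side; no command here.

  # 3) Wait until no pending heals remain
  gluster volume heal StoragePool info

  # 4) Stop the volume so nothing modifies the directory any more
  gluster volume stop StoragePool

  # 5) On each brick, read the directory xattrs and align the mismatching
  #    ones, copying the value from a brick you trust
  D=/bricks/disk1/Folder_07.11.2016_23.02/CV_MAGNETIC
  getfattr -d -m . -e hex "$D"
  setfattr -n trusted.ec.version -v <value-from-trusted-brick> "$D"
  setfattr -n trusted.ec.dirty -v <value-from-trusted-brick> "$D"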
Thanks and Regards,
Ram
-----Original Message-----
From: Xavier Hernandez [mailto:xhernandez at datalab.es]
Sent: Tuesday, March 14, 2017 5:28 AM
To: Ankireddypalle Reddy; Gluster Devel (gluster-devel at gluster.org);
gluster-users at gluster.org
Subject: Re: [Gluster-users] Disperse mkdir fails
Hi Ram,
On 13/03/17 15:02, Ankireddypalle Reddy wrote:
> Xavi,
>             CV_MAGNETIC directory on a single brick has 155683
> entries. There are altogether 60 bricks in the volume. I could provide the
> output if you still need that.
The problem is that not all bricks have the same number of entries:
glusterfs1:disk1 155674
glusterfs2:disk1 155675
glusterfs3:disk1 155718
glusterfs1:disk2 155688
glusterfs2:disk2 155687
glusterfs3:disk2 155730
glusterfs1:disk3 155675
glusterfs2:disk3 155674
glusterfs3:disk3 155717
glusterfs1:disk4 155684
glusterfs2:disk4 155683
glusterfs3:disk4 155726
glusterfs1:disk5 155698
glusterfs2:disk5 155695
glusterfs3:disk5 155738
glusterfs1:disk6 155668
glusterfs2:disk6 155667
glusterfs3:disk6 155710
glusterfs1:disk7 155687
glusterfs2:disk7 155689
glusterfs3:disk7 155732
glusterfs1:disk8 155673
glusterfs2:disk8 155675
glusterfs3:disk8 155718
glusterfs4:disk1 149097
glusterfs5:disk1 149097
glusterfs6:disk1 149098
glusterfs4:disk2 149097
glusterfs5:disk2 149097
glusterfs6:disk2 149098
glusterfs4:disk3 149097
glusterfs5:disk3 149097
glusterfs6:disk3 149098
glusterfs4:disk4 149097
glusterfs5:disk4 149097
glusterfs6:disk4 149098
glusterfs4:disk5 149097
glusterfs5:disk5 149097
glusterfs6:disk5 149098
glusterfs4:disk6 149097
glusterfs5:disk6 149097
glusterfs6:disk6 149098
glusterfs4:disk7 149097
glusterfs5:disk7 149097
glusterfs6:disk7 149098
glusterfs4:disk8 149097
glusterfs5:disk8 149097
glusterfs6:disk8 149098
A small difference could be explained by concurrent operations while this
data was being retrieved, but some bricks are way out of sync.
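For reference, per-brick counts like the ones above can be gathered on each
server with something like the following sketch (the brick root /bricks/disk*
is an assumption, adjust it to the actual layout):

  for d in /bricks/disk*/Folder_07.11.2016_23.02/CV_MAGNETIC; do
      echo "$d: $(ls -A "$d" | wc -l)"
  done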
trusted.ec.dirty and trusted.ec.version also show many discrepancies:
glusterfs1:disk1 trusted.ec.dirty=0x0000000000000ba40000000000000000
glusterfs2:disk1 trusted.ec.dirty=0x0000000000000bb80000000000000000
glusterfs3:disk1 trusted.ec.dirty=0x00000000000000160000000000000000
glusterfs1:disk1 trusted.ec.version=0x0000000000084db40000000000084e11
glusterfs2:disk1 trusted.ec.version=0x0000000000084e070000000000084e0c
glusterfs3:disk1 trusted.ec.version=0x000000000008426a0000000000084e11
glusterfs1:disk2 trusted.ec.dirty=0x0000000000000ba50000000000000000
glusterfs2:disk2 trusted.ec.dirty=0x0000000000000bb60000000000000000
glusterfs3:disk2 trusted.ec.dirty=0x00000000000000170000000000000000
glusterfs1:disk2 trusted.ec.version=0x000000000005ccb7000000000005cd0a
glusterfs2:disk2 trusted.ec.version=0x000000000005cd00000000000005cd05
glusterfs3:disk2 trusted.ec.version=0x000000000005c166000000000005cd0a
glusterfs1:disk3 trusted.ec.dirty=0x0000000000000ba50000000000000000
glusterfs2:disk3 trusted.ec.dirty=0x0000000000000bb50000000000000000
glusterfs3:disk3 trusted.ec.dirty=0x00000000000000160000000000000000
glusterfs1:disk3 trusted.ec.version=0x000000000005d0cb000000000005d123
glusterfs2:disk3 trusted.ec.version=0x000000000005d119000000000005d11e
glusterfs3:disk3 trusted.ec.version=0x000000000005c57f000000000005d123
glusterfs1:disk4 trusted.ec.dirty=0x0000000000000ba00000000000000000
glusterfs2:disk4 trusted.ec.dirty=0x0000000000000bb10000000000000000
glusterfs3:disk4 trusted.ec.dirty=0x00000000000000130000000000000000
glusterfs1:disk4 trusted.ec.version=0x0000000000084e2e0000000000084e78
glusterfs2:disk4 trusted.ec.version=0x0000000000084e6e0000000000084e73
glusterfs3:disk4 trusted.ec.version=0x00000000000842d50000000000084e78
glusterfs1:disk5 trusted.ec.dirty=0x0000000000000b9a0000000000000000
glusterfs2:disk5 trusted.ec.dirty=0x0000000000002e270000000000000000
glusterfs3:disk5 trusted.ec.dirty=0x00000000000022950000000000000000
glusterfs1:disk5 trusted.ec.version=0x000000000005aa1f000000000005cd18
glusterfs2:disk5 trusted.ec.version=0x000000000005cd0d000000000005cd13
glusterfs3:disk5 trusted.ec.version=0x000000000005c180000000000005cd18
glusterfs1:disk6 trusted.ec.dirty=0x0000000000000ba20000000000000000
glusterfs2:disk6 trusted.ec.dirty=0x0000000000000bad0000000000000000
glusterfs3:disk6 trusted.ec.dirty=0x000000000000000f0000000000000000
glusterfs1:disk6 trusted.ec.version=0x000000000005ccba000000000005cce7
glusterfs2:disk6 trusted.ec.version=0x000000000005ccde000000000005cce2
glusterfs3:disk6 trusted.ec.version=0x000000000005c145000000000005cce7
glusterfs1:disk7 trusted.ec.dirty=0x0000000000000ba50000000000000000
glusterfs2:disk7 trusted.ec.dirty=0x0000000000000bab0000000000000000
glusterfs3:disk7 trusted.ec.dirty=0x000000000000000a0000000000000000
glusterfs1:disk7 trusted.ec.version=0x000000000005cd03000000000005cd0d
glusterfs2:disk7 trusted.ec.version=0x000000000005cd04000000000005cd08
glusterfs3:disk7 trusted.ec.version=0x000000000005c138000000000005cd0d
glusterfs1:disk8 trusted.ec.dirty=0x0000000000000bbb0000000000000000
glusterfs2:disk8 trusted.ec.dirty=0x0000000000000bc00000000000000000
glusterfs3:disk8 trusted.ec.dirty=0x00000000000000090000000000000000
glusterfs1:disk8 trusted.ec.version=0x000000000005cdc4000000000005cdcd
glusterfs2:disk8 trusted.ec.version=0x000000000005cdc4000000000005cdc8
glusterfs3:disk8 trusted.ec.version=0x000000000005c158000000000005cdcd
glusterfs4:disk1 trusted.ec.version=0x000000000005901d0000000000059021
glusterfs5:disk1 trusted.ec.version=0x000000000005901d0000000000059021
glusterfs6:disk1 trusted.ec.version=0x000000000005901e0000000000059022
glusterfs4:disk2 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk2 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk2 trusted.ec.version=0x000000000002d2d8000000000002d2da
glusterfs4:disk3 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk3 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk3 trusted.ec.version=0x000000000002d2d8000000000002d2da
glusterfs4:disk4 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk4 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk4 trusted.ec.version=0x000000000002d2d8000000000002d2da
glusterfs4:disk5 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk5 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk5 trusted.ec.version=0x000000000002d2d8000000000002d2da
glusterfs4:disk6 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk6 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk6 trusted.ec.version=0x000000000002d2d8000000000002d2da
glusterfs4:disk7 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk7 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk7 trusted.ec.version=0x000000000002d2d8000000000002d2da
glusterfs4:disk8 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk8 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk8 trusted.ec.version=0x000000000002d2d8000000000002d2da
Newer bricks seem to be healthy, but old bricks have a lot of differences.
I also see that trusted.glusterfs.dht is not set on the newer bricks, and the
full range of hashes is assigned to the old bricks (at least for the
CV_MAGNETIC directory). This probably means that a rebalance has not been
executed on the volume after adding the new bricks (or that it failed).
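A quick way to double-check that, again assuming the volume name StoragePool
and an illustrative brick path:

  # has a rebalance ever been started/completed on this volume?
  gluster volume rebalance StoragePool status

  # inspect the layout xattr of the directory on a new brick
  getfattr -n trusted.glusterfs.dht -e hex \
      /bricks/disk1/Folder_07.11.2016_23.02/CV_MAGNETIC

  # a fix-layout pass alone would only spread the hash ranges to the new
  # bricks; it does not migrate existing data
  gluster volume rebalance StoragePool fix-layout start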
This will require much more investigation and knowledge about how you do
things, from how many clients, ...
Xavi
>
> Thanks and Regards,
> Ram
>
> -----Original Message-----
> From: Xavier Hernandez [mailto:xhernandez at datalab.es]
> Sent: Monday, March 13, 2017 9:56 AM
> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel at gluster.org);
> gluster-users at gluster.org
> Subject: Re: [Gluster-users] Disperse mkdir fails
>
> Hi Ram,
>
> On 13/03/17 14:13, Ankireddypalle Reddy wrote:
>> Attachment (1): data.txt (17.63 KB)
>>
>> Xavier,
>> Please find attached the required info from all the
>> six nodes of the cluster.
>
> I asked for the contents of CV_MAGNETIC because that is the damaged
> directory, not the parent. But anyway we can see that the number of hard
> links of the directory differs on each brick, which means that the number
> of subdirectories is different on each brick. A small difference could be
> explained by the current activity of the volume while the data was being
> captured, but the differences are too big.
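As a side note, the hard link count referred to above can be read directly on
each brick; for a directory it is roughly the number of subdirectories plus
two, since every subdirectory's ".." adds a link (brick path is illustrative):

  stat -c '%h %n' /bricks/disk1/Folder_07.11.2016_23.02/CV_MAGNETIC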
>
>> We need to find
>> 1) What is the solution through which this problem can
>> be avoided.
>> 2) How do we fix the current state of the cluster.
>>
>> Thanks and Regards,
>> Ram
>> -----Original Message-----
>> From: Xavier Hernandez [mailto:xhernandez at datalab.es]
>> Sent: Friday, March 10, 2017 3:34 AM
>> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel at gluster.org);
>> gluster-users at gluster.org
>> Subject: Re: [Gluster-users] Disperse mkdir fails
>>
>> Hi Ram,
>>
>> On 09/03/17 20:15, Ankireddypalle Reddy wrote:
>>> Xavi,
>>> Thanks for checking this.
>>> 1) mkdir returns errnum 5. EIO.
>>> 2) The specified directory is the parent directory
>>> under which all the data in the gluster volume will be stored. Currently
>>> around 160 TB of 262 TB is consumed.
>>
>> I only need the first level entries of that directory, not the entire
>> tree of entries. This should be in the order of thousands, right?
>>
>> We need to make sure that all bricks have the same entries in this
>> directory. Otherwise we would need to check other things.
>>
>>> 3) It is extremely difficult to list the exact sequence
>>> of FOPS that would have been issued to the directory. The storage is
>>> heavily used and a lot of sub directories are present inside this
>>> directory.
>>>
>>> Are you looking for the extended attributes for this
>>> directory from all the bricks inside the volume? There are about 60
>>> bricks.
>>
>> If possible, yes.
>>
>> However, if there are a lot of modifications on that directory while
>> you are getting the xattrs, it's possible that you get values that look
>> inconsistent even though they are not really inconsistent.
>>
>> If possible, you should get that information after pausing all activity
>> on that directory.
>>
>> Xavi
>>
>>>
>>> Thanks and Regards,
>>> Ram
>>>
>>> -----Original Message-----
>>> From: Xavier Hernandez [mailto:xhernandez at datalab.es]
>>> Sent: Thursday, March 09, 2017 11:15 AM
>>> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel at gluster.org);
>>> gluster-users at gluster.org
>>> Subject: Re: [Gluster-users] Disperse mkdir fails
>>>
>>> Hi Ram,
>>>
>>> On 09/03/17 16:52, Ankireddypalle Reddy wrote:
>>>> Attachment (1): info.txt (3.35 KB)
>>>>
>>>> Hi,
>>>>
>>>> I have a disperse gluster volume with 6 servers and 262 TB
>>>> of usable capacity. Gluster version is 3.7.19.
>>>>
>>>> glusterfs1, glusterfs2 and glusterfs3 nodes were initially
>>>> used for creating the volume. Nodes glusterfs4, glusterfs5 and
>>>> glusterfs6 were later added to the volume.
>>>>
>>>>
>>>>
>>>> Directory creation failed on a directory called
>>>> /ws/glus/Folder_07.11.2016_23.02/CV_MAGNETIC.
>>>>
>>>> # file: ws/glus/Folder_07.11.2016_23.02/CV_MAGNETIC
>>>>
>>>>
>>>> glusterfs.gfid.string="e8e51015-616f-4f04-b9d2-92f46eb5cfc7"
>>>>
>>>>
>>>>
>>>> The gluster mount log contains a lot of the following errors:
>>>>
>>>> [2017-03-09 15:32:36.773937] W [MSGID: 122056]
>>>> [ec-combine.c:875:ec_combine_check] 0-StoragePool-disperse-7:
>>>> Mismatching xdata in answers of 'LOOKUP' for
>>>> e8e51015-616f-4f04-b9d2-92f46eb5cfc7
>>>>
>>>>
>>>>
>>>> The directory seems to be out of sync between nodes
>>>> glusterfs1,
>>>> glusterfs2 and glusterfs3. Each has a different version.
>>>>
>>>>
>>>>
>>>> trusted.ec.version=0x00000000000839f00000000000083a4d
>>>>
>>>> trusted.ec.version=0x0000000000082ea40000000000083a4b
>>>>
>>>> trusted.ec.version=0x0000000000083a760000000000083a7b
>>>>
>>>>
>>>>
>>>> Self-heal does not seem to be healing this directory.
>>>>
>>>
>>> This is very similar to what happened the other time. Once more than
>>> 1 brick is damaged, self-heal cannot do anything to heal it on a 2+1
>>> configuration.
>>>
>>> What error does the mkdir request return?
>>>
>>> Does the directory you are trying to create already exist on some
>>> brick?
>>>
>>> Can you show all the remaining extended attributes of the directory?
>>>
>>> It would also be useful to have the directory contents on each brick
>>> (an 'ls -l'). In this case, include the name of the directory you are
>>> trying to create.
>>>
>>> Can you explain a detailed sequence of operations done on that
>>> directory since the last time you successfully created a new
>>> subdirectory, including any metadata change?
>>>
>>> Xavi
>>>
>>>>
>>>>
>>>> Thanks and Regards,
>>>>
>>>> Ram
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>
>>
>
***************************Legal Disclaimer***************************
"This communication may contain confidential and privileged material for
the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**********************************************************************