thr3ads.net - Gluster users - [Gluster-users] 3.7.13, index healing broken? [Jul 2016]

Pranith Kumar Karampuri

2016-Jul-13 05:36 UTC

[Gluster-users] 3.7.13, index healing broken?

On Wed, Jul 13, 2016 at 10:58 AM, Dmitry Melekhov <dm at belkam.com> wrote:
> 13.07.2016 09:26, Pranith Kumar Karampuri wrote:
>
> On Wed, Jul 13, 2016 at 10:50 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>
>> 13.07.2016 09:16, Pranith Kumar Karampuri wrote:
>>
>> On Wed, Jul 13, 2016 at 10:38 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>
>>> 13.07.2016 09:04, Pranith Kumar Karampuri wrote:
>>>
>>> On Wed, Jul 13, 2016 at 10:29 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>
>>>> 13.07.2016 08:56, Pranith Kumar Karampuri wrote:
>>>>
>>>> On Wed, Jul 13, 2016 at 10:23 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>
>>>>> 13.07.2016 08:46, Pranith Kumar Karampuri wrote:
>>>>>
>>>>> On Wed, Jul 13, 2016 at 10:10 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>
>>>>>> 13.07.2016 08:36, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>> On Wed, Jul 13, 2016 at 9:35 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>
>>>>>>> 13.07.2016 01:52, Anuradha Talur wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>
>>>>>>>>> From: "Dmitry Melekhov" <dm at belkam.com>
>>>>>>>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>>>>> Cc: "gluster-users" <gluster-users at gluster.org>
>>>>>>>>> Sent: Tuesday, July 12, 2016 9:27:17 PM
>>>>>>>>> Subject: Re: [Gluster-users] 3.7.13, index healing broken?
>>>>>>>>>
>>>>>>>>> 12.07.2016 17:39, Pranith Kumar Karampuri wrote:
>>>>>>>>>
>>>>>>>>> Wow, what are the steps to recreate the problem?
>>>>>>>>>
>>>>>>>>> just set file length to zero, always reproducible.
>>>>>>>>>
>>>>>>>> If you are setting the file length to 0 on one of the bricks
>>>>>>>> (looks like that is the case), it is not a bug.
>>>>>>>>
>>>>>>>> Index heal relies on failures seen from the mount point(s)
>>>>>>>> to identify the files that need heal. It won't be able to recognize
>>>>>>>> any file modification done directly on bricks. The same goes for the
>>>>>>>> heal info command, which is why heal info also shows 0 entries.
>>>>>>>>
>>>>>>>
>>>>>>> Well, this makes self-heal useless then: if any file is accidentally
>>>>>>> corrupted or deleted (yes! if a file is deleted directly from a brick,
>>>>>>> this is not recognized by index heal either), then it will not be
>>>>>>> self-healed, because self-heal uses index heal.
>>>>>>>
>>>>>>
>>>>>> It is better to look into the bit-rot feature if you want to guard
>>>>>> against these kinds of problems.
>>>>>>
>>>>>>
>>>>>> Bit rot detects bit problems, not missing files or their wrong
>>>>>> length, i.e. this is overhead for such a simple task.
>>>>>>
>>>>>
>>>>> It detects wrong length, because the checksum won't match anymore.
>>>>>
>>>>>
>>>>> Yes, sure. I guess that it will detect missing files too. But it needs
>>>>> far more resources than just comparing directories in bricks?
>>>>>
>>>>>
>>>>> What use case are you trying out that leads to changing things
>>>>> directly on the brick?
>>>>>
>>>>> I'm trying to test gluster failure tolerance and right now I'm not
>>>>> happy with it...
>>>>>
>>>>
>>>> Which cases of fault tolerance are you not happy with? Making changes
>>>> directly on the brick or anything else as well?
>>>>
>>>> I'll repeat:
>>>> As I already said, if for some reason (in a real case it can only be by
>>>> accident) I delete a file, this will not be detected by the self-heal
>>>> daemon and thus will lead to a lower replication level, i.e. lower
>>>> failure tolerance.
>>>>
>>>
>>> To prevent such accidents you need to set SELinux policies so that files
>>> under the brick are not modified by accident by any user. At least that is
>>> the solution I remember when this was discussed 3-4 years back.
>>>
>>> So the only supported platform is Linux? Or maybe it is better to improve
>>> self-healing to detect missing or wrong-length files; I guess this is a
>>> very low-cost operation in terms of host resources.
>>> Just a suggestion, maybe we need to look at alternatives in the near
>>> future....
>>>
>> This is a corner case; from a design perspective it is generally not a
>> good idea to optimize for the corner case. It is better to protect
>> ourselves from the corner case (SELinux etc.), or you can also use
>> snapshots to protect against these kinds of mishaps.
>>
>> Sorry, I don't agree.
>> As you know, if a missing or wrong-length file is accessed from a FUSE
>> client, it is restored (healed), i.e. gluster recognizes that the file is
>> wrong and heals it, so I do not see any reason not to provide the same
>> function in self-healing.
>> Thank you!
>>
> Ah! Now how do you suggest we keep track of which of tens of millions of
> files the user accidentally deleted from the brick without gluster's
> knowledge? Once it comes to gluster's knowledge we can do something. But
> how does gluster become aware of something it is not keeping track of? At
> the time you access it, gluster knows something went wrong, so it restores
> it. If you change something on the bricks, even by accident, all the data
> gluster keeps (similar to a journal) is a waste. Even disk filesystems
> will ask you to run fsck if something unexpected happens, so a full
> self-heal is a similar operation.
>
>
> You are absolutely right; the question is why gluster does not become aware
> of such a problem in the case of self-healing?
>
Because operations that are performed directly on the brick do not go
through the gluster stack.
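
To illustrate why that matters for index heal, here is a toy Python model.
It is not GlusterFS code: the names Brick, ReplicatedVolume and index_heal
are made up for this sketch, and the "index" is just a dictionary. The only
point it shows is that a file lands in the index when the stack itself sees
a failed write, so a change made behind the stack's back leaves no entry.

```python
"""Toy model of index-based healing. Hypothetical names, not GlusterFS code."""


class Brick:
    """One replica: a mapping of file name to contents."""

    def __init__(self):
        self.files = {}
        self.online = True


class ReplicatedVolume:
    """Writes go through here; this class stands in for the gluster stack."""

    def __init__(self, bricks):
        self.bricks = bricks
        self.index = {}  # file name -> a brick that holds a good copy

    def write(self, name, data):
        good = [b for b in self.bricks if b.online]
        bad = [b for b in self.bricks if not b.online]
        for brick in good:
            brick.files[name] = data
        if good and bad:
            # The stack saw a write fail on some brick, so it records the file.
            self.index[name] = good[0]

    def index_heal(self):
        """Heal only the files recorded in the index; return their names."""
        healed = sorted(self.index)
        for name in healed:
            source = self.index[name]
            for brick in self.bricks:
                if brick.online:
                    brick.files[name] = source.files[name]
        self.index.clear()
        return healed


b1, b2 = Brick(), Brick()
vol = ReplicatedVolume([b1, b2])

vol.write("a.txt", b"hello")

b2.online = False
vol.write("b.txt", b"world")  # b2 misses this write; the stack records it
b2.online = True

# Truncate a file directly on one brick, bypassing the stack entirely.
b2.files["a.txt"] = b""

pending = sorted(vol.index)  # what a "heal info"-style listing would show
healed = vol.index_heal()
print(pending, healed, b2.files["a.txt"])
```

In this model b.txt is listed and healed, while the truncated a.txt on the
second brick stays truncated, because nothing ever told the stack about it.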

-- 
Pranith

Dmitry Melekhov

2016-Jul-13 05:41 UTC

[Gluster-users] 3.7.13, index healing broken?

13.07.2016 09:36, Pranith Kumar Karampuri wrote:
> Because the operations that are performed directly on brick do not go 
> through gluster stack.
OK, I'll repeat:
As you know, if a missing or wrong-length file is accessed from a FUSE
client, it is restored (healed), i.e. gluster recognizes that the file is
wrong and heals it, so I do not see any reason not to provide the same
function in self-healing.
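
The kind of check being asked for here, finding files that are missing or
have a different length on one brick, could be sketched roughly as below.
This is only an illustration under several assumptions: a plain two-way
replica where both bricks are expected to hold the same files, brick
directories that can be read locally, and made-up helper names (scan_brick,
compare_bricks). It skips the brick's internal .glusterfs directory, cannot
tell which copy is the good one, and does not notice corruption that leaves
the size unchanged. It also has to walk every file, which is the cost
objection raised earlier in the thread.

```python
import os
import tempfile


def scan_brick(root):
    """Map each file path (relative to the brick root) to its size in bytes."""
    sizes = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into the brick's internal metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".glusterfs"]
        for fname in filenames:
            full = os.path.join(dirpath, fname)
            # lstat avoids errors on dangling symlinks.
            sizes[os.path.relpath(full, root)] = os.lstat(full).st_size
    return sizes


def compare_bricks(root_a, root_b):
    """Return (paths present on only one brick, paths whose sizes differ)."""
    a, b = scan_brick(root_a), scan_brick(root_b)
    missing = sorted(set(a) ^ set(b))
    size_mismatch = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return missing, size_mismatch


# Small demonstration on two temporary directories standing in for bricks.
with tempfile.TemporaryDirectory() as brick1, \
        tempfile.TemporaryDirectory() as brick2:
    for root in (brick1, brick2):
        for name in ("same.txt", "truncated.txt"):
            with open(os.path.join(root, name), "w") as f:
                f.write("data")
    # A file that exists on the first brick only (deleted from the second).
    with open(os.path.join(brick1, "deleted.txt"), "w") as f:
        f.write("data")
    # Set the file length to zero on the second brick.
    open(os.path.join(brick2, "truncated.txt"), "w").close()

    missing, size_mismatch = compare_bricks(brick1, brick2)

print(missing, size_mismatch)
```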
