Pranith Kumar Karampuri
2016-Jul-13 05:36 UTC
[Gluster-users] 3.7.13, index healing broken?
> 12.07.2016 17:39, Pranith Kumar Karampuri wrote:
>
> Wow, what are the steps to recreate the problem?
>
> ----- Original Message -----
> From: "Dmitry Melekhov" <dm at belkam.com>
> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> Cc: "gluster-users" <gluster-users at gluster.org>
> Sent: Tuesday, July 12, 2016 9:27:17 PM
> Subject: Re: [Gluster-users] 3.7.13, index healing broken?
>
> Just set the file length to zero; it is always reproducible.
>
> 13.07.2016 01:52, Anuradha Talur wrote:
>
> If you are setting the file length to 0 on one of the bricks (looks like
> that is the case), it is not a bug.
>
> Index heal relies on failures seen from the mount point(s) to identify
> the files that need heal. It won't be able to recognize any file
> modification done directly on bricks. The same goes for the heal info
> command, which is why heal info also shows 0 entries.
>
> On Wed, Jul 13, 2016 at 9:35 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>
> Well, this makes self-heal useless then: if any file is accidentally
> corrupted or deleted (yes, a file deleted directly from a brick is not
> recognized by index heal either), it will not be self-healed, because
> self-heal uses index heal.
>
> 13.07.2016 08:36, Pranith Kumar Karampuri wrote:
>
> It is better to look into the bit-rot feature if you want to guard
> against these kinds of problems.
>
> On Wed, Jul 13, 2016 at 10:10 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>
> Bit rot detects bit problems, not missing files or wrong file lengths,
> i.e. it is overhead for such a simple task.
>
> 13.07.2016 08:46, Pranith Kumar Karampuri wrote:
>
> It detects wrong length, because the checksum won't match anymore.
>
> What use case are you trying out that leads to changing things directly
> on the brick?
>
> On Wed, Jul 13, 2016 at 10:23 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>
> Yes, sure. I guess it will detect missing files too. But doesn't it need
> far more resources than just comparing directories across bricks?
>
> I'm trying to test gluster failure tolerance and right now I'm not happy
> with it...
>
> 13.07.2016 08:56, Pranith Kumar Karampuri wrote:
>
> Which cases of fault tolerance are you not happy with? Making changes
> directly on the brick, or anything else as well?
>
> On Wed, Jul 13, 2016 at 10:29 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>
> I'll repeat: as I already said, if I delete a file for some reason (in a
> real case this can only happen by accident), this will not be detected by
> the self-heal daemon and will thus lead to a lower replication level,
> i.e. lower failure tolerance.
>
> 13.07.2016 09:04, Pranith Kumar Karampuri wrote:
>
> To prevent such accidents you need to set SELinux policies so that files
> under the brick are not modified by accident by any user. At least that
> is the solution I remember from when this was discussed 3-4 years back.
>
> On Wed, Jul 13, 2016 at 10:38 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>
> So the only supported platform is Linux? Or maybe it is better to improve
> self-healing to detect missing or wrong-length files; I guess this is a
> very low-cost operation in terms of host resources.
> Just a suggestion; maybe we need to look at alternatives in the near
> future....
>
> 13.07.2016 09:16, Pranith Kumar Karampuri wrote:
>
> This is a corner case. From a design perspective it is generally not a
> good idea to optimize for the corner case. It is better to protect
> ourselves from the corner case (SELinux etc.), or you can also use
> snapshots to protect against these kinds of mishaps.
>
> On Wed, Jul 13, 2016 at 10:50 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>
> Sorry, I don't agree.
> As you know, when a missing or wrong-length file is accessed from a FUSE
> client, it is restored (healed), i.e. gluster recognizes that the file is
> wrong and heals it, so I do not see any reason not to provide the same
> function in self-healing.
> Thank you!
>
> 13.07.2016 09:26, Pranith Kumar Karampuri wrote:
>
> Ah! Now how do you suggest we keep track of which of 10s of millions of
> files the user accidentally deleted from the brick without gluster's
> knowledge? Once it comes to gluster's knowledge we can do something. But
> how does gluster become aware of something it is not keeping track of? At
> the time you access it, gluster knows something went wrong, so it
> restores it. If you change something on the bricks, even by accident, all
> the data gluster keeps (similar to a journal) is a waste. Even disk
> filesystems will ask you to run fsck if something unexpected happens, so
> full self-heal is a similar operation.
>
> On Wed, Jul 13, 2016 at 10:58 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>
> You are absolutely right. The question is why gluster does not become
> aware of such a problem in the case of self-healing.

Because operations that are performed directly on the brick do not go
through the gluster stack.

--
Pranith
> On Wed, Jul 13, 2016 at 10:58 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>
> You are absolutely right. The question is why gluster does not become
> aware of such a problem in the case of self-healing.
>
> 13.07.2016 09:36, Pranith Kumar Karampuri wrote:
>
> Because operations that are performed directly on the brick do not go
> through the gluster stack.

OK, I'll repeat: as you know, when a missing or wrong-length file is
accessed from a FUSE client, it is restored (healed), i.e. gluster
recognizes that the file is wrong and heals it. So I do not see any reason
not to provide the same function in self-healing.
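
The distinction drawn in this thread, between an index that records only
failures seen through the mount point and a full crawl that compares the
bricks, can be illustrated with a toy model. The sketch below is plain Python
and is not Gluster code: `ToyReplica`, `write`, `heal_info` and `full_crawl`
are hypothetical names invented for illustration, and bricks are modelled as
in-memory dictionaries.

```python
class ToyReplica:
    """Toy replicated volume: each brick is a dict of file name -> bytes."""

    def __init__(self, brick_count=2):
        self.bricks = [dict() for _ in range(brick_count)]
        self.online = [True] * brick_count
        self.index = set()  # names recorded as needing heal

    def write(self, name, data):
        """Write via the 'mount point'; a write missed by a brick is indexed."""
        for brick, is_up in zip(self.bricks, self.online):
            if is_up:
                brick[name] = data
        if not all(self.online):
            self.index.add(name)

    def heal_info(self):
        """Names the index knows about; cheap, but blind to direct brick edits."""
        return sorted(self.index)

    def full_crawl(self):
        """Compare every name on every brick; cost grows with total file count."""
        names = set().union(*self.bricks)
        return sorted(
            n for n in names
            if len({brick.get(n) for brick in self.bricks}) > 1
        )


vol = ToyReplica()
vol.write("a.txt", b"hello")
vol.write("b.txt", b"world")

# Case 1: one brick is down while a write goes through the mount point.
vol.online[1] = False
vol.write("a.txt", b"hello again")
vol.online[1] = True
print(vol.heal_info())   # ['a.txt']

# Case 2: a file is truncated directly on a brick, bypassing the stack.
vol.bricks[0]["b.txt"] = b""
print(vol.heal_info())   # ['a.txt']  (the index never saw the truncation)
print(vol.full_crawl())  # ['a.txt', 'b.txt']
```

In this model the index costs nothing to consult but only knows about writes
that went through `write`, while `full_crawl` finds the truncated file at the
price of visiting every name on every brick, which is the trade-off described
above for tens of millions of files.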