Pranith Kumar Karampuri
2016-Jul-13 07:40 UTC
[Gluster-users] 3.7.13, index healing broken?
On Wed, Jul 13, 2016 at 12:11 PM, Dmitry Melekhov <dm at belkam.com> wrote:

> 13.07.2016 10:24, Pranith Kumar Karampuri wrote:
>
> On Wed, Jul 13, 2016 at 11:49 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>
>> 13.07.2016 10:10, Pranith Kumar Karampuri wrote:
>>
>> On Wed, Jul 13, 2016 at 11:27 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>
>>> 13.07.2016 09:50, Pranith Kumar Karampuri wrote:
>>>
>>> On Wed, Jul 13, 2016 at 11:11 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>
>>>> 13.07.2016 09:36, Pranith Kumar Karampuri wrote:
>>>>
>>>> On Wed, Jul 13, 2016 at 10:58 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>
>>>>> 13.07.2016 09:26, Pranith Kumar Karampuri wrote:
>>>>>
>>>>> On Wed, Jul 13, 2016 at 10:50 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>
>>>>>> 13.07.2016 09:16, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>> On Wed, Jul 13, 2016 at 10:38 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>
>>>>>>> 13.07.2016 09:04, Pranith Kumar Karampuri wrote:
>>>>>>>
>>>>>>> On Wed, Jul 13, 2016 at 10:29 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>>
>>>>>>>> 13.07.2016 08:56, Pranith Kumar Karampuri wrote:
>>>>>>>>
>>>>>>>> On Wed, Jul 13, 2016 at 10:23 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>>>
>>>>>>>>> 13.07.2016 08:46, Pranith Kumar Karampuri wrote:
>>>>>>>>>
>>>>>>>>> On Wed, Jul 13, 2016 at 10:10 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>>>>
>>>>>>>>>> 13.07.2016 08:36, Pranith Kumar Karampuri wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 13, 2016 at 9:35 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> 13.07.2016 01:52, Anuradha Talur wrote:
>>>>>>>>>>>
>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>> From: "Dmitry Melekhov" <dm at belkam.com>
>>>>>>>>>>>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>>>>>>>>> Cc: "gluster-users" <gluster-users at gluster.org>
>>>>>>>>>>>>> Sent: Tuesday, July 12, 2016 9:27:17 PM
>>>>>>>>>>>>> Subject: Re: [Gluster-users] 3.7.13, index healing broken?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 12.07.2016 17:39, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Wow, what are the steps to recreate the problem?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just set the file length to zero; it is always reproducible.
>>>>>>>>>>>>
>>>>>>>>>>>> If you are setting the file length to 0 on one of the bricks (looks like that is the case), it is not a bug.
>>>>>>>>>>>>
>>>>>>>>>>>> Index heal relies on failures seen from the mount point(s) to identify the files that need heal. It won't be able to recognize any file modification done directly on the bricks. The same goes for the heal info command, which is the reason heal info also shows 0 entries.
>>>>>>>>>>>
>>>>>>>>>>> Well, this makes self-heal useless then: if any file is accidentally corrupted or deleted (yes, a file deleted directly from a brick is not recognized by index heal either), then it will not be self-healed, because self-heal uses index heal.
>>>>>>>>>>
>>>>>>>>>> It is better to look into the bit-rot feature if you want to guard against these kinds of problems.
>>>>>>>>>>
>>>>>>>>>> Bit rot detects bit problems, not missing files or their wrong length, i.e. it is overhead for such a simple task.
>>>>>>>>>
>>>>>>>>> It detects wrong length, because the checksum won't match anymore.
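To make the bit-rot suggestion concrete, a minimal sketch of enabling it (the volume name is taken from the mount command further down; the option names are from memory for the 3.7 CLI, so please double-check them against the documentation):

    gluster volume bitrot pool enable                  # start checksumming files on this volume
    gluster volume bitrot pool scrub-frequency daily   # how often the scrubber re-verifies checksums

Note that the scrubber only flags a corrupted file on its next pass, so this is not instant detection either.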
>>>>>>>>> Yes, sure. I guess it will detect missing files too. But it needs far more resources than just comparing directories on the bricks?
>>>>>>>>>
>>>>>>>>> What use-case are you trying out that leads to changing things directly on the brick?
>>>>>>>>>
>>>>>>>>> I'm trying to test gluster failure tolerance, and right now I'm not happy with it...
>>>>>>>>
>>>>>>>> Which cases of fault tolerance are you not happy with? Making changes directly on the brick, or anything else as well?
>>>>>>>>
>>>>>>>> I'll repeat: as I already said, if I for some reason (a real case can only be by accident) delete a file, this will not be detected by the self-heal daemon and will thus lead to a lower replication level, i.e. lower failure tolerance.
>>>>>>>
>>>>>>> To prevent such accidents you need to set SELinux policies so that files under the brick are not modified by accident by any user. At least that is the solution I remember from when this was discussed 3-4 years back.
>>>>>>>
>>>>>>> So the only supported platform is Linux? Or maybe it is better to improve self-healing to detect missing or wrong-length files; I guess this is a very low-cost operation in terms of host resources.
>>>>>>> Just a suggestion; maybe we need to look at alternatives in the near future...
>>>>>>
>>>>>> This is a corner case, and from a design perspective it is generally not a good idea to optimize for the corner case. It is better to protect ourselves from the corner case (SELinux etc.), or you can also use snapshots to protect against these kinds of mishaps.
>>>>>>
>>>>>> Sorry, I don't agree.
>>>>>> As you know, if you access a missing or wrong-length file from a fuse client it is restored (healed), i.e. gluster recognizes that the file is wrong and heals it, so I do not see any reason why self-healing can't provide this same function.
>>>>>> Thank you!
>>>>>
>>>>> Ah! Now how do you suggest we keep track of which of tens of millions of files the user accidentally deleted from the brick without gluster's knowledge? Once it comes to gluster's knowledge we can do something, but how does gluster become aware of something it is not keeping track of? At the time you access the file, gluster knows something went wrong, so it restores it. If you change something on the bricks, even by accident, all the data gluster keeps (similar to a journal) is a waste. Even disk filesystems will ask you to run fsck if something unexpected happens, so a full self-heal is a similar operation.
>>>>>
>>>>> You are absolutely right. The question is: why does gluster not become aware of such a problem in the case of self-healing?
>>>>
>>>> Because operations that are performed directly on a brick do not go through the gluster stack.
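A sketch of where that tracking actually lives, assuming the brick path from the listings below: the index that index heal and "heal info" consult sits inside the brick's .glusterfs directory:

    ls /wall/pool/brick/.glusterfs/indices/xattrop

An operation that fails on one replica while going through a client mount leaves a gfid entry in there; a truncate or rm done straight on the brick never touches it, which is exactly why index heal and heal info come up empty.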
>>>> OK, I'll repeat:
>>>> As you know, if you access a missing or wrong-length file from a fuse client it is restored (healed), i.e. gluster recognizes that the file is wrong and heals it, so I do not see any reason why self-healing can't provide this same function.
>>>
>>> For which you need to access the file.
>>>
>>> That's right.
>>>
>>> For which you need a full crawl. You can't detect a modification which doesn't go through the stack, so this is the only possibility.
>>>
>>> OK, then: if self-heal is really useless here and no way to get it will be provided, I guess we'll use an external script to check the consistency of the brick directories; I don't think ls and diff will take many resources.
>>
>> How is this different from full self-heal?
>>
>> Self-heal does not detect deleted or wrong-length files.
>
> It does detect them when you do a full crawl, which is essentially an "ls -laR" kind of thing on the whole volume. You don't need any external scripts; keep doing a full crawl once in a while, maybe?
>
> You mean on a fuse mount?
>
> It doesn't work:
>
> [root at father ~]# mount -t glusterfs localhost:/pool gluster
> [root at father ~]#
>
> Then make the file zero-length directly on the brick:
>
> [root at father gluster]# > /wall/pool/brick/gstatus-0.64-3.el7.x86_64.rpm
> [root at father gluster]#
>
> [root at father gluster]# ls -laR /root/gluster/
> /root/gluster/:
> total 122153384
> drwxr-xr-x   4 qemu qemu        4096 Jul 11 13:36 .
> dr-xr-x---. 10 root root        4096 Jul 11 12:26 ..
> -rw-r--r--   1 root root  8589934592 Jul 11 09:14 csr1000v1.img
> -rw-r--r--   1 root root           0 Jul 13 10:34 gstatus-0.64-3.el7.x86_64.rpm
>
> As you can see, gstatus-0.64-3.el7.x86_64.rpm has 0 length. But:
>
> [root at father gluster]# touch /root/gluster/gstatus-0.64-3.el7.x86_64.rpm
> [root at father gluster]# ls -laR /root/gluster/
> /root/gluster/:
> total 122153436
> drwxr-xr-x   4 qemu qemu        4096 Jul 11 13:36 .
> dr-xr-x---. 10 root root        4096 Jul 11 12:26 ..
> -rw-r--r--   1 root root  8589934592 Jul 11 09:14 csr1000v1.img
> -rw-r--r--   1 root root       52268 Jul 13 10:36 gstatus-0.64-3.el7.x86_64.rpm
>
> I.e. only if I do some I/O on the file does it come back.
>
> By the way, there is the same problem if I delete a file directly on the brick:
>
> [root at father gluster]# rm /wall/pool/brick/gstatus-0.64-3.el7.x86_64.rpm
> rm: remove regular file '/wall/pool/brick/gstatus-0.64-3.el7.x86_64.rpm'? y
> [root at father gluster]# ls -laR /root/gluster/
> /root/gluster/:
> total 122153384
> drwxr-xr-x   4 qemu qemu        4096 Jul 13 10:38 .
> dr-xr-x---. 10 root root        4096 Jul 11 12:26 ..
> -rw-r--r--   1 root root  8589934592 Jul 11 09:14 csr1000v1.img
> -rw-r--r--   1 qemu qemu 43692064768 Jul 13 10:38 infimonitor.img
>
> I don't see it in the directory on the fuse mount at all until a touch, which restores the file too.
>
> If you need any performance improvements here, we will be happy to help. Please give us feedback.
>
> Your recipe doesn't work :-( If there is a difference between the brick directories due to direct brick manipulation, it leads to problems.

You have to execute "gluster volume heal <volname> full" for triggering full heal.
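For example, with the volume from your mount command above:

    gluster volume heal pool full    # schedule a crawl of all bricks and heal any differences found
    gluster volume heal pool info    # list the entries gluster currently knows need heal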
>> All I was saying is that it is not possible to detect them through index heal, because for the index to be populated you need the operations to go through the gluster stack.
>
> Why can't it? I don't know; you just said it is impossible in gluster because it can only track changes made through gluster, i.e. bricks can have different file sets and this is not recognized (true), because, as I understand it, gluster's self-heal assumes that the brick's underlying filesystem can't be corrupted by the server admin (not true; I can say this as an engineer with almost 25 years of experience, i.e. I have done it several times ;-) ).
>
> Thank you!
>
> p.s.
> I still can't understand why it can't be implemented in gluster... :-(

-- 
Pranith
13.07.2016 11:40, Pranith Kumar Karampuri wrote:

>> Your recipe doesn't work :-( If there is a difference between the brick directories due to direct brick manipulation, it leads to problems.
>
> You have to execute "gluster volume heal <volname> full" for triggering full heal.

Yeah, but I need to know that I need to execute it. Any help from gluster, or only an external script?
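For what it's worth, two rough, untested sketches of how this could be automated; the second host name ("son") and the brick path are placeholders based on the setup in this thread, not anything gluster provides:

    # 1) Don't try to detect at all: schedule a full heal periodically
    #    (crontab entry on one node; the path to the gluster binary may differ).
    0 3 * * * /usr/sbin/gluster volume heal pool full

    # 2) External check: compare file names and sizes across the two replica bricks,
    #    skipping gluster's internal .glusterfs directory (GNU find).
    ssh father 'cd /wall/pool/brick && find . -path ./.glusterfs -prune -o -type f -printf "%P %s\n"' | sort > /tmp/brick-a
    ssh son    'cd /wall/pool/brick && find . -path ./.glusterfs -prune -o -type f -printf "%P %s\n"' | sort > /tmp/brick-b
    diff /tmp/brick-a /tmp/brick-b    # any output means the bricks have diverged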