Ravishankar N
2016-Aug-17 01:24 UTC
[Gluster-users] Self healing does not see files to heal
On 08/16/2016 10:44 PM, Dmitry Glushenok wrote:
> Hello,
>
> While testing healing after a bitrot error it was found that self healing cannot heal files which were manually deleted from a brick. Gluster 3.8.1:
>
> - Create a volume, mount it locally and copy a test file to it
> [root at srv01 ~]# gluster volume create test01 replica 2 srv01:/R1/test01 srv02:/R1/test01
> volume create: test01: success: please start the volume to access data
> [root at srv01 ~]# gluster volume start test01
> volume start: test01: success
> [root at srv01 ~]# mount -t glusterfs srv01:/test01 /mnt
> [root at srv01 ~]# cp /etc/passwd /mnt
> [root at srv01 ~]# ls -l /mnt
> total 2
> -rw-r--r--. 1 root root 1505 Aug 16 19:59 passwd
>
> - Then remove the test file from the first brick, like we have to do in case of a bitrot error in the file

You also need to remove all hard-links to the corrupted file from the brick, including the one in the .glusterfs folder.
There is a bug in heal-full that prevents it from crawling all bricks of the replica. The right way to heal the corrupted files as of now is to access them from the mount-point like you did after removing the hard-links. The list of files that are corrupted can be obtained with the scrub status command.

Hope this helps,
Ravi

> [root at srv01 ~]# rm /R1/test01/passwd
> [root at srv01 ~]# ls -l /mnt
> total 0
> [root at srv01 ~]#
>
> - Issue a full self heal
> [root at srv01 ~]# gluster volume heal test01 full
> Launching heal operation to perform full self heal on volume test01 has been successful
> Use heal info commands to check status
> [root at srv01 ~]# tail -2 /var/log/glusterfs/glustershd.log
> [2016-08-16 16:59:56.483767] I [MSGID: 108026] [afr-self-heald.c:611:afr_shd_full_healer] 0-test01-replicate-0: starting full sweep on subvol test01-client-0
> [2016-08-16 16:59:56.486560] I [MSGID: 108026] [afr-self-heald.c:621:afr_shd_full_healer] 0-test01-replicate-0: finished full sweep on subvol test01-client-0
>
> - Now we still see no files in the mount point (it becomes empty right after removing the file from the brick)
> [root at srv01 ~]# ls -l /mnt
> total 0
> [root at srv01 ~]#
>
> - Then try to access the file by its full name (lookup-optimize and readdir-optimize are turned off by default). Now glusterfs shows the file!
> [root at srv01 ~]# ls -l /mnt/passwd
> -rw-r--r--. 1 root root 1505 Aug 16 19:59 /mnt/passwd
>
> - And it reappeared in the brick
> [root at srv01 ~]# ls -l /R1/test01/
> total 4
> -rw-r--r--. 2 root root 1505 Aug 16 19:59 passwd
> [root at srv01 ~]#
>
> Is it a bug, or can we tell self heal to scan all files on all bricks in the volume?
>
> --
> Dmitry Glushenok
> Jet Infosystems
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
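For anyone hitting this on a real bitrot error, here is a rough sketch of the manual recovery Ravi describes, using the test01 volume and brick path from the example above. The gfid components in the .glusterfs path are placeholders that have to be filled in from the xattr value, and the exact scrub status invocation may differ slightly between releases:

# 1. On the affected brick, note the file's gfid (hex value of trusted.gfid):
getfattr -n trusted.gfid -e hex /R1/test01/passwd
# 2. Remove the hard-link under .glusterfs; its path is derived from the gfid
#    as .glusterfs/<first two hex chars>/<next two hex chars>/<gfid-with-dashes>:
rm /R1/test01/.glusterfs/<xx>/<yy>/<gfid>
# 3. Remove the named file itself from the brick:
rm /R1/test01/passwd
# 4. Trigger the heal with a named lookup from a client mount:
stat /mnt/passwd
# 5. List the files the scrubber has flagged as corrupted:
gluster volume bitrot test01 scrub status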
Lindsay Mathieson
2016-Aug-17 01:55 UTC
[Gluster-users] Self healing does not see files to heal
On 17 August 2016 at 11:24, Ravishankar N <ravishankar at redhat.com> wrote:
> The right way to heal the corrupted files as of now is to access them from
> the mount-point like you did after removing the hard-links. The list of
> files that are corrupted can be obtained with the scrub status command.

How does that work with sharding, where you can't see the shards from the mount point?

--
Lindsay
Дмитрий Глушенок
2016-Aug-17 08:18 UTC
[Gluster-users] Self healing does not see files to heal
Hello Ravi,

Thank you for the reply. Found the bug number (for those who will google this email): https://bugzilla.redhat.com/show_bug.cgi?id=1112158

Accessing the removed file from the mount-point does not always work, because we have to find the particular client whose DHT will point it to the brick with the removed file. Otherwise the file is accessed from the good brick and self-healing does not happen (just verified). Or by accessing did you mean something like touch?

--
Dmitry Glushenok
Jet Infosystems

> 17 Aug 2016, at 4:24, Ravishankar N <ravishankar at redhat.com> wrote:
>
> On 08/16/2016 10:44 PM, Dmitry Glushenok wrote:
>> Hello,
>>
>> While testing healing after a bitrot error it was found that self healing cannot heal files which were manually deleted from a brick. Gluster 3.8.1:
>>
>> - Create a volume, mount it locally and copy a test file to it
>> [root at srv01 ~]# gluster volume create test01 replica 2 srv01:/R1/test01 srv02:/R1/test01
>> volume create: test01: success: please start the volume to access data
>> [root at srv01 ~]# gluster volume start test01
>> volume start: test01: success
>> [root at srv01 ~]# mount -t glusterfs srv01:/test01 /mnt
>> [root at srv01 ~]# cp /etc/passwd /mnt
>> [root at srv01 ~]# ls -l /mnt
>> total 2
>> -rw-r--r--. 1 root root 1505 Aug 16 19:59 passwd
>>
>> - Then remove the test file from the first brick, like we have to do in case of a bitrot error in the file
>
> You also need to remove all hard-links to the corrupted file from the brick, including the one in the .glusterfs folder.
> There is a bug in heal-full that prevents it from crawling all bricks of the replica. The right way to heal the corrupted files as of now is to access them from the mount-point like you did after removing the hard-links. The list of files that are corrupted can be obtained with the scrub status command.
>
> Hope this helps,
> Ravi
>
>> [root at srv01 ~]# rm /R1/test01/passwd
>> [root at srv01 ~]# ls -l /mnt
>> total 0
>> [root at srv01 ~]#
>>
>> - Issue a full self heal
>> [root at srv01 ~]# gluster volume heal test01 full
>> Launching heal operation to perform full self heal on volume test01 has been successful
>> Use heal info commands to check status
>> [root at srv01 ~]# tail -2 /var/log/glusterfs/glustershd.log
>> [2016-08-16 16:59:56.483767] I [MSGID: 108026] [afr-self-heald.c:611:afr_shd_full_healer] 0-test01-replicate-0: starting full sweep on subvol test01-client-0
>> [2016-08-16 16:59:56.486560] I [MSGID: 108026] [afr-self-heald.c:621:afr_shd_full_healer] 0-test01-replicate-0: finished full sweep on subvol test01-client-0
>>
>> - Now we still see no files in the mount point (it becomes empty right after removing the file from the brick)
>> [root at srv01 ~]# ls -l /mnt
>> total 0
>> [root at srv01 ~]#
>>
>> - Then try to access the file by its full name (lookup-optimize and readdir-optimize are turned off by default). Now glusterfs shows the file!
>> [root at srv01 ~]# ls -l /mnt/passwd
>> -rw-r--r--. 1 root root 1505 Aug 16 19:59 /mnt/passwd
>>
>> - And it reappeared in the brick
>> [root at srv01 ~]# ls -l /R1/test01/
>> total 4
>> -rw-r--r--. 2 root root 1505 Aug 16 19:59 passwd
>> [root at srv01 ~]#
>>
>> Is it a bug, or can we tell self heal to scan all files on all bricks in the volume?
>>
>> --
>> Dmitry Glushenok
>> Jet Infosystems
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
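For anyone reading the archive later: in the test at the top of the thread it was a plain named lookup from the fuse mount that brought the file back, so something like the following should be equivalent to the ls -l shown earlier (same test01 volume and /mnt mount point assumed; touch would also work, but it updates the file's timestamps as a side effect):

# A named lookup by full path is enough for the client to notice the
# missing replica and trigger the heal; no read of the data is required.
stat /mnt/passwd
# Any pending or in-progress heals can then be checked with:
gluster volume heal test01 info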