Дмитрий Глушенок
2016-Aug-16 17:14 UTC
[Gluster-users] Self healing does not see files to heal
Hello,

While testing healing after a bitrot error, it was found that self-heal cannot heal files which were manually deleted from a brick. Gluster 3.8.1:

- Create the volume, mount it locally and copy a test file to it
[root@srv01 ~]# gluster volume create test01 replica 2 srv01:/R1/test01 srv02:/R1/test01
volume create: test01: success: please start the volume to access data
[root@srv01 ~]# gluster volume start test01
volume start: test01: success
[root@srv01 ~]# mount -t glusterfs srv01:/test01 /mnt
[root@srv01 ~]# cp /etc/passwd /mnt
[root@srv01 ~]# ls -l /mnt
total 2
-rw-r--r--. 1 root root 1505 Aug 16 19:59 passwd

- Then remove the test file from the first brick, as we have to do in case of a bitrot error in the file
[root@srv01 ~]# rm /R1/test01/passwd
[root@srv01 ~]# ls -l /mnt
total 0
[root@srv01 ~]#

- Issue a full self-heal
[root@srv01 ~]# gluster volume heal test01 full
Launching heal operation to perform full self heal on volume test01 has been successful
Use heal info commands to check status
[root@srv01 ~]# tail -2 /var/log/glusterfs/glustershd.log
[2016-08-16 16:59:56.483767] I [MSGID: 108026] [afr-self-heald.c:611:afr_shd_full_healer] 0-test01-replicate-0: starting full sweep on subvol test01-client-0
[2016-08-16 16:59:56.486560] I [MSGID: 108026] [afr-self-heald.c:621:afr_shd_full_healer] 0-test01-replicate-0: finished full sweep on subvol test01-client-0

- We still see no files at the mount point (it became empty right after the file was removed from the brick)
[root@srv01 ~]# ls -l /mnt
total 0
[root@srv01 ~]#

- Then try to access the file by its full name (lookup-optimize and readdir-optimize are off by default). Now glusterfs shows the file!
[root@srv01 ~]# ls -l /mnt/passwd
-rw-r--r--. 1 root root 1505 Aug 16 19:59 /mnt/passwd

- And it reappeared in the brick
[root@srv01 ~]# ls -l /R1/test01/
total 4
-rw-r--r--. 2 root root 1505 Aug 16 19:59 passwd
[root@srv01 ~]#

Is this a bug, or is there a way to tell self-heal to scan all files on all bricks in the volume?

--
Dmitry Glushenok
Jet Infosystems
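For completeness, a small sketch (not part of the test run above) of how pending heals could have been checked at this point with the standard heal CLI, as the "Use heal info" hint suggests:

gluster volume heal test01 info              # entries queued for heal, per brick
gluster volume heal test01 info split-brain  # entries stuck in split-brain, if any

Since the shd full sweep above finished almost instantly, heal info would be expected to show zero entries here, which matches the observation that nothing was healed.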
Ravishankar N
2016-Aug-17 01:24 UTC
[Gluster-users] Self healing does not see files to heal
On 08/16/2016 10:44 PM, Дмитрий Глушенок wrote:
> Hello,
>
> While testing healing after a bitrot error, it was found that self-heal cannot heal files which were manually deleted from a brick. Gluster 3.8.1:
>
> - Create the volume, mount it locally and copy a test file to it
> [root@srv01 ~]# gluster volume create test01 replica 2 srv01:/R1/test01 srv02:/R1/test01
> volume create: test01: success: please start the volume to access data
> [root@srv01 ~]# gluster volume start test01
> volume start: test01: success
> [root@srv01 ~]# mount -t glusterfs srv01:/test01 /mnt
> [root@srv01 ~]# cp /etc/passwd /mnt
> [root@srv01 ~]# ls -l /mnt
> total 2
> -rw-r--r--. 1 root root 1505 Aug 16 19:59 passwd
>
> - Then remove the test file from the first brick, as we have to do in case of a bitrot error in the file

You also need to remove all hard-links to the corrupted file from the brick, including the one in the .glusterfs folder. There is a bug in heal-full that prevents it from crawling all bricks of the replica. The right way to heal the corrupted files as of now is to access them from the mount-point like you did, after removing the hard-links. The list of corrupted files can be obtained with the scrub status command. (A command-level sketch of this procedure is appended at the end of this message.)

Hope this helps,
Ravi

> [root@srv01 ~]# rm /R1/test01/passwd
> [root@srv01 ~]# ls -l /mnt
> total 0
> [root@srv01 ~]#
>
> - Issue a full self-heal
> [root@srv01 ~]# gluster volume heal test01 full
> Launching heal operation to perform full self heal on volume test01 has been successful
> Use heal info commands to check status
> [root@srv01 ~]# tail -2 /var/log/glusterfs/glustershd.log
> [2016-08-16 16:59:56.483767] I [MSGID: 108026] [afr-self-heald.c:611:afr_shd_full_healer] 0-test01-replicate-0: starting full sweep on subvol test01-client-0
> [2016-08-16 16:59:56.486560] I [MSGID: 108026] [afr-self-heald.c:621:afr_shd_full_healer] 0-test01-replicate-0: finished full sweep on subvol test01-client-0
>
> - We still see no files at the mount point (it became empty right after the file was removed from the brick)
> [root@srv01 ~]# ls -l /mnt
> total 0
> [root@srv01 ~]#
>
> - Then try to access the file by its full name (lookup-optimize and readdir-optimize are off by default). Now glusterfs shows the file!
> [root@srv01 ~]# ls -l /mnt/passwd
> -rw-r--r--. 1 root root 1505 Aug 16 19:59 /mnt/passwd
>
> - And it reappeared in the brick
> [root@srv01 ~]# ls -l /R1/test01/
> total 4
> -rw-r--r--. 2 root root 1505 Aug 16 19:59 passwd
> [root@srv01 ~]#
>
> Is this a bug, or is there a way to tell self-heal to scan all files on all bricks in the volume?
>
> --
> Dmitry Glushenok
> Jet Infosystems
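For reference, a rough command-level sketch of the procedure Ravi describes, assuming the corrupted copy is the one on srv01's brick at /R1/test01/passwd; the GFID value below is illustrative and must be taken from getfattr on the real file:

# 1. List the files the bitrot scrubber has flagged as corrupted.
gluster volume bitrot test01 scrub status

# 2. On the affected brick, read the file's GFID to locate its hard-link
#    under .glusterfs (getfattr prints it as hex without dashes).
getfattr -n trusted.gfid -e hex /R1/test01/passwd

# 3. Remove the data file and its .glusterfs hard-link; for a GFID of the
#    form aabbcccc-... the link lives at .glusterfs/aa/bb/aabbcccc-...
GFID=d9f2c17e-4c6a-4b0e-9f2d-3a4b5c6d7e8f   # illustrative value only
rm /R1/test01/passwd
rm /R1/test01/.glusterfs/${GFID:0:2}/${GFID:2:2}/${GFID}

# 4. Trigger the heal by looking the file up through the mount point,
#    since the full-heal crawl will not pick it up because of the bug.
stat /mnt/passwd

# 5. Verify nothing remains pending.
gluster volume heal test01 info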