Hi Brett,

First, the answers to all your questions:

1. If a self-heal daemon is listed on a host (all of mine show one with a volume status command) can I assume it's enabled and running?

For your volume "projects", the self-heal daemon is UP and running.

2. I assume the volume that has all the self-heals pending has some serious issues even though I can access the files and directories on it. If self-heal is running, shouldn't the numbers be decreasing?

It should heal the entries, and the number of entries shown by the "gluster v heal <volname> info" command should be decreasing.

It appears to me self-heal is not working properly, so how do I get it to start working, or should I delete the volume and start over?

As you can access all the files from the mount point, I think the volume and the files are in good state as of now. I don't think you should consider deleting your volume before trying to fix it. If there is no fix, or the fix is taking too long, you can go ahead with that option.

-----------------------
Why are all these options off?

performance.quick-read: off
performance.parallel-readdir: off
performance.readdir-ahead: off
performance.write-behind: off
performance.read-ahead: off

Although this should not matter for your issue, I think you should enable all of the above unless you have a reason not to.
--------------------

I would like you to perform the following steps and provide some more information:

1 - Try to restart self-heal and see if that works. "gluster v start <volname> force" will kill and restart the self-heal processes.

2 - If step 1 is not fruitful, get the list of entries that need to be healed and pick one entry to heal. We should focus on one entry to find out why it is not getting healed, instead of all the 5900 entries. Let's call it entry1.

3 - Now access entry1 from the mount point, read from and write to it, and see whether it gets healed; check heal info again. Accessing a file from the mount point triggers client-side heal, which could also heal the file.

4 - Check the logs in /var/log/glusterfs; the mount logs and the glustershd logs should be checked and provided.

5 - Get the extended attributes of entry1 from all the bricks. If the path of entry1 on the mount point is /a/b/c/entry1, then you have to run the following command on all the nodes:

getfattr -m. -d -e hex <path of the brick on the node>/a/b/c/entry1

Please provide the output of the above command too.

---
Ashish
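[A minimal command sketch of steps 1 and 3-5 above, assuming the volume name (projects) and brick path (/srv/gfs01/Projects) from this thread. The client mount point /mnt/projects, its log file name, and the path /a/b/c/entry1 are only placeholders for whatever heal info actually reports.]

# Step 1: a force start restarts the self-heal daemons while the bricks keep running
gluster volume start projects force
gluster volume heal projects info summary

# Step 3: touch the chosen entry through a client mount (placeholder paths)
stat /mnt/projects/a/b/c/entry1
head -c 1 /mnt/projects/a/b/c/entry1 > /dev/null

# Step 4: default log locations on the servers and the client
tail -n 100 /var/log/glusterfs/glustershd.log
tail -n 100 /var/log/glusterfs/mnt-projects.log    # client mount log; name follows the mount point

# Step 5: extended attributes of the same entry, run against the brick path on every node
getfattr -m . -d -e hex /srv/gfs01/Projects/a/b/c/entry1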
----- Original Message -----
From: "Brett Holcomb" <biholcomb at l1049h.com>
To: gluster-users at gluster.org
Sent: Friday, December 28, 2018 3:49:50 AM
Subject: Re: [Gluster-users] Self Heal Confusion

Resend as I did not reply to the list earlier. TBird responded to the poster and not the list.

On 12/27/18 11:46 AM, Brett Holcomb wrote:

Thank you, I appreciate the help. Here is the information. Let me know if you need anything else. I'm fairly new to Gluster.

Gluster version is 5.2

1. gluster v info

Volume Name: projects
Type: Distributed-Replicate
Volume ID: 5aac71aa-feaa-44e9-a4f9-cb4dd6e0fdc3
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: gfssrv1:/srv/gfs01/Projects
Brick2: gfssrv2:/srv/gfs01/Projects
Brick3: gfssrv3:/srv/gfs01/Projects
Brick4: gfssrv4:/srv/gfs01/Projects
Brick5: gfssrv5:/srv/gfs01/Projects
Brick6: gfssrv6:/srv/gfs01/Projects
Options Reconfigured:
cluster.self-heal-daemon: enable
performance.quick-read: off
performance.parallel-readdir: off
performance.readdir-ahead: off
performance.write-behind: off
performance.read-ahead: off
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
server.allow-insecure: on
storage.build-pgfid: on
changelog.changelog: on
changelog.capture-del-path: on

2. gluster v status

Status of volume: projects
Gluster process                                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gfssrv1:/srv/gfs01/Projects               49154     0          Y       7213
Brick gfssrv2:/srv/gfs01/Projects               49154     0          Y       6932
Brick gfssrv3:/srv/gfs01/Projects               49154     0          Y       6920
Brick gfssrv4:/srv/gfs01/Projects               49154     0          Y       6732
Brick gfssrv5:/srv/gfs01/Projects               49154     0          Y       6950
Brick gfssrv6:/srv/gfs01/Projects               49154     0          Y       6879
Self-heal Daemon on localhost                   N/A       N/A        Y       11484
Self-heal Daemon on gfssrv2                     N/A       N/A        Y       10366
Self-heal Daemon on gfssrv4                     N/A       N/A        Y       9872
Self-heal Daemon on srv-1-gfs3.corp.l1049h.net  N/A       N/A        Y       9892
Self-heal Daemon on gfssrv6                     N/A       N/A        Y       10372
Self-heal Daemon on gfssrv5                     N/A       N/A        Y       10761

Task Status of Volume projects
------------------------------------------------------------------------------
There are no active volume tasks

3. I've given the summary since the actual list for two volumes is around 5900 entries.

Brick gfssrv1:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 85
Number of entries in heal pending: 85
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gfssrv2:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gfssrv3:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gfssrv4:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gfssrv5:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 58854
Number of entries in heal pending: 58854
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gfssrv6:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 58854
Number of entries in heal pending: 58854
Number of entries in split-brain: 0
Number of entries possibly healing: 0

On 12/27/18 3:09 AM, Ashish Pandey wrote:

Hi Brett,

Could you please tell us more about the setup?

1 - gluster v info
2 - gluster v status
3 - gluster v heal <volname> info

This is the very basic information needed to start debugging or to suggest any workaround. It should always be included when asking such questions on the mailing list so that people can reply sooner.

Note: Please hide IP addresses/hostnames or any other information you don't want the world to see.
---
Ashish

----- Original Message -----
From: "Brett Holcomb" <biholcomb at l1049h.com>
To: gluster-users at gluster.org
Sent: Thursday, December 27, 2018 12:19:15 AM
Subject: Re: [Gluster-users] Self Heal Confusion

Still no change in the heals pending. I found this reference, https://archive.fosdem.org/2017/schedule/event/glusterselinux/attachments/slides/1876/export/events/attachments/glusterselinux/slides/1876/fosdem.pdf, which mentions the default SELinux context for a brick and says that internal operations such as self-heal and rebalance should be ignored, but it does not elaborate on what "ignored" means - is it just not doing self-heal, or something else?

I did set SELinux to permissive and nothing changed. I'll try setting the bricks to the context mentioned in this pdf and see what happens.

On 12/20/18 8:26 PM, John Strunk wrote:

Assuming your bricks are up... yes, the heal count should be decreasing.

There is/was a bug wherein self-heal would stop healing but would still be running. I don't know whether your version is affected, but the remedy is to just restart the self-heal daemon. Force start one of the volumes that has heals pending. The bricks are already running, but it will cause shd to restart and, assuming this is the problem, healing should begin...

$ gluster vol start my-pending-heal-vol force

Others could better comment on the status of the bug.

-John

On Thu, Dec 20, 2018 at 5:45 PM Brett Holcomb <biholcomb at l1049h.com> wrote:

I have one volume that has 85 pending entries in healing and two more volumes with 58,854 entries in healing pending. These numbers are from the volume heal info summary command. They have stayed constant for two days now. I've read the Gluster docs and many more; the Gluster docs just give some commands, and the non-Gluster docs basically repeat that. Given that it appears no self-healing is going on for my volume, I am confused as to why.

1. If a self-heal daemon is listed on a host (all of mine show one with a volume status command) can I assume it's enabled and running?

2. I assume the volume that has all the self-heals pending has some serious issues even though I can access the files and directories on it. If self-heal is running, shouldn't the numbers be decreasing?

It appears to me self-heal is not working properly, so how do I get it to start working, or should I delete the volume and start over?

I'm running Gluster 5.2 on CentOS 7, latest and updated.

Thank you.
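[To see whether healing actually makes progress after the force start suggested above (the counts in this thread stayed constant for two days), the heal counters can be polled from any node. A small sketch using the volume name from this thread and the standard heal subcommands:]

gluster volume heal projects                                # trigger an index heal sweep
gluster volume heal projects statistics heal-count          # pending count per brick
watch -n 60 "gluster volume heal projects info summary"     # should trend toward zero if shd is healing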
Thank you for the answers to my questions. That helps me get a better understanding of Gluster. I will do the steps you listed later today and get back to you.

On 12/28/18 1:00 AM, Ashish Pandey wrote:
> I would like you to perform the following steps and provide some more information -
I assume the options were off by default, but I'll turn them back on. I'm working on getting the information.

On 12/28/18 1:00 AM, Ashish Pandey wrote:
> Why are all these options off?
> I think you should enable all of the above unless you have a reason not to.
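[If there is no particular reason to keep those translators off, a sketch of switching them back on with gluster volume set; the volume name is the one from this thread and the loop is just shell convenience.]

# Re-enable the performance translators; readdir-ahead is listed first because
# parallel-readdir generally expects it to be on as well
for opt in performance.readdir-ahead performance.parallel-readdir \
           performance.quick-read performance.write-behind performance.read-ahead
do
    gluster volume set projects "$opt" on
done

# Check what is now in effect ("gluster volume reset projects <option>" would
# instead return an option to its built-in default)
gluster volume get projects all | grep -E 'quick-read|readdir|write-behind|read-ahead'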
I've done step 1 with no results yet, so I'm trying step 2, but I can't find the file via the GFID name. The "gluster volume heal projects info" output is in a text file, so I grabbed the first entry from the file for Brick gfssrv1:/srv/gfs01/Projects, which is listed as <gfid:the long gfid>.

I then tried to use this method, https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/ch, to find the file. However, when I do the mount there is no .gfid directory anywhere.

I then used the Gluster GFID resolver from here, https://gist.github.com/semiosis/4392640, and that gives me this output, which has no file linked to it.

[root at srv-1-gfs1 ~]# ./gfid-resolver.sh /srv/gfs01/Projects 6e5ab8ae-65f4-4594-9313-3483bf031adc
6e5ab8ae-65f4-4594-9313-3483bf031adc    ==    File:
Done.

So at this point either I'm doing something wrong (most likely) or the files do not exist. I've tried this on several files.

On 12/28/18 1:00 AM, Ashish Pandey wrote:
> 2 - If step 1 is not fruitful, get the list of entries that need to be healed and pick one entry to heal. Let's call it entry1.
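[For reference, a sketch of how a <gfid:...> entry can usually be traced back to a real path directly on a brick via the .glusterfs store Gluster keeps there. The GFID and brick path are the ones from the resolver run above; the rest is plain shell. Also note that with a fuse mount made with -o aux-gfid-mount, the .gfid directory is virtual, so it normally never shows up in a directory listing and can only be reached by explicit path.]

GFID=6e5ab8ae-65f4-4594-9313-3483bf031adc
BRICK=/srv/gfs01/Projects

# Every entry on a brick has a .glusterfs entry named after its GFID
GPATH=$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID
ls -l "$GPATH"    # a regular file appears here as a hard link, a directory as a symlink

# For a regular file, locate the named path sharing the same inode
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -samefile "$GPATH" -print 2>/dev/null

# If the .glusterfs entry is missing or its link count is 1, there is no named
# path on this brick to resolve to, which would explain an empty "File:" line.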