Karthik Subrahmanya
2018-Mar-13 15:46 UTC
[Gluster-users] Can't heal a volume: "Please check if all brick processes are running."
Hi Anatoliy,

The heal command is used to repair mismatched contents between the replica
copies of files. For "gluster volume heal <volname>" to succeed, the
self-heal daemon must be running, and it runs only if the volume is of type
replicate or disperse. In your case you have a plain distribute volume,
which stores no replica of any file, so the heal command returns this error.

Regards,
Karthik

On Tue, Mar 13, 2018 at 7:53 PM, Anatoliy Dmytriyev <tolid at tolid.eu.org> wrote:

> Hi,
>
> Maybe someone can point me to documentation or explain this? I can't
> find it myself.
> Do we have any other useful resources besides doc.gluster.org? As far as
> I can see, many gluster options are not described there, or there is no
> explanation of what they do...
>
> On 2018-03-12 15:58, Anatoliy Dmytriyev wrote:
>
>> Hello,
>>
>> We have a very fresh gluster 3.10.10 installation.
>> Our volume is created as a distributed volume: 9 bricks, 96TB in total
>> (87TB after the 10% gluster disk space reservation).
>>
>> For some reason I can't "heal" the volume:
>> # gluster volume heal gv0
>> Launching heal operation to perform index self heal on volume gv0 has
>> been unsuccessful on bricks that are down. Please check if all brick
>> processes are running.
>>
>> Which processes should be running on every brick for the heal operation?
>>
>> # gluster volume status
>> Status of volume: gv0
>> Gluster process                        TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick cn01-ib:/gfs/gv0/brick1/brick    0         49152      Y       70850
>> Brick cn02-ib:/gfs/gv0/brick1/brick    0         49152      Y       102951
>> Brick cn03-ib:/gfs/gv0/brick1/brick    0         49152      Y       57535
>> Brick cn04-ib:/gfs/gv0/brick1/brick    0         49152      Y       56676
>> Brick cn05-ib:/gfs/gv0/brick1/brick    0         49152      Y       56880
>> Brick cn06-ib:/gfs/gv0/brick1/brick    0         49152      Y       56889
>> Brick cn07-ib:/gfs/gv0/brick1/brick    0         49152      Y       56902
>> Brick cn08-ib:/gfs/gv0/brick1/brick    0         49152      Y       94920
>> Brick cn09-ib:/gfs/gv0/brick1/brick    0         49152      Y       56542
>>
>> Task Status of Volume gv0
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>>
>> # gluster volume info gv0
>> Volume Name: gv0
>> Type: Distribute
>> Volume ID: 8becaf78-cf2d-4991-93bf-f2446688154f
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 9
>> Transport-type: rdma
>> Bricks:
>> Brick1: cn01-ib:/gfs/gv0/brick1/brick
>> Brick2: cn02-ib:/gfs/gv0/brick1/brick
>> Brick3: cn03-ib:/gfs/gv0/brick1/brick
>> Brick4: cn04-ib:/gfs/gv0/brick1/brick
>> Brick5: cn05-ib:/gfs/gv0/brick1/brick
>> Brick6: cn06-ib:/gfs/gv0/brick1/brick
>> Brick7: cn07-ib:/gfs/gv0/brick1/brick
>> Brick8: cn08-ib:/gfs/gv0/brick1/brick
>> Brick9: cn09-ib:/gfs/gv0/brick1/brick
>> Options Reconfigured:
>> client.event-threads: 8
>> performance.parallel-readdir: on
>> performance.readdir-ahead: on
>> cluster.nufa: on
>> nfs.disable: on
>
> --
> Best regards,
> Anatoliy
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
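To confirm this diagnosis on a running cluster — a minimal check, using the
volume name gv0 from this thread — look at the volume type and at whether a
self-heal daemon appears in the status output:

# gluster volume info gv0 | grep '^Type'
Type: Distribute

# gluster volume status gv0 | grep -i 'self-heal'

The second command prints nothing on a plain distribute volume; on a
replicate or disperse volume it would list one "Self-heal Daemon" entry
per node.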
Laura Bailey
2018-Mar-13 23:03 UTC
[Gluster-users] Can't heal a volume: "Please check if all brick processes are running."
Can we add a smarter error message for this situation by checking the
volume type first?

Cheers,
Laura B

On Wednesday, March 14, 2018, Karthik Subrahmanya <ksubrahm at redhat.com> wrote:

> Hi Anatoliy,
>
> The heal command is used to repair mismatched contents between the
> replica copies of files. For "gluster volume heal <volname>" to succeed,
> the self-heal daemon must be running, and it runs only if the volume is
> of type replicate or disperse. In your case you have a plain distribute
> volume, which stores no replica of any file, so the heal command returns
> this error.
>
> Regards,
> Karthik
>
> [earlier messages in the thread trimmed; see above]

--
Laura Bailey
Senior Technical Writer
Customer Content Services BNE
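The guard Laura suggests belongs in glusterd itself, but the behaviour is
easy to sketch client-side. A rough illustration only, assuming nothing
beyond the standard gluster CLI (the script name and message wording are
hypothetical, not the actual fix):

#!/bin/sh
# heal-if-healable.sh: run "gluster volume heal" only for volume types
# that actually have a self-heal daemon; otherwise explain why not.
VOL="$1"
[ -n "$VOL" ] || { echo "usage: $0 <volname>" >&2; exit 2; }

# "Type:" in volume info is e.g. Distribute, Replicate,
# Distributed-Replicate, Disperse, Distributed-Disperse.
TYPE=$(gluster volume info "$VOL" | awk -F': ' '/^Type:/ {print $2}')

case "$TYPE" in
    *Replicate*|*Disperse*)
        gluster volume heal "$VOL"
        ;;
    *)
        echo "Volume $VOL is of type '$TYPE', which has no self-heal daemon;" >&2
        echo "'gluster volume heal' does not apply. Try 'gluster volume status $VOL'." >&2
        exit 1
        ;;
esac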
Karthik Subrahmanya
2018-Mar-14 04:47 UTC
[Gluster-users] Can't heal a volume: "Please check if all brick processes are running."
On Wed, Mar 14, 2018 at 4:33 AM, Laura Bailey <lbailey at redhat.com> wrote:

> Can we add a smarter error message for this situation by checking the
> volume type first?

Yes, we can. I will do that.

Thanks,
Karthik

> [earlier messages in the thread trimmed; see above]
Anatoliy Dmytriyev
2018-Mar-14 10:06 UTC
[Gluster-users] Can't heal a volume: "Please check if all brick processes are running."
Hi Karthik,

Thanks a lot for the explanation.

Does it mean that the health of a distributed volume can be checked only
with the "gluster volume status" command?

And one more question: cluster.min-free-disk is 10% by default. What kind
of "side effects" can we face if this option is reduced to, for example,
5%? Could you point to any best practice document(s)?

Regards,
Anatoliy

On 2018-03-13 16:46, Karthik Subrahmanya wrote:

> Hi Anatoliy,
>
> The heal command is used to repair mismatched contents between the
> replica copies of files. For "gluster volume heal <volname>" to succeed,
> the self-heal daemon must be running, and it runs only if the volume is
> of type replicate or disperse. In your case you have a plain distribute
> volume, which stores no replica of any file, so the heal command returns
> this error.
>
> Regards,
> Karthik
>
> [earlier messages in the thread trimmed; see above]

--
Best regards,
Anatoliy
Karthik Subrahmanya
2018-Mar-14 12:12 UTC
[Gluster-users] Can't heal a volume: "Please check if all brick processes are running."
On Wed, Mar 14, 2018 at 3:36 PM, Anatoliy Dmytriyev <tolid at tolid.eu.org> wrote:

> Hi Karthik,
>
> Thanks a lot for the explanation.
>
> Does it mean that the health of a distributed volume can be checked only
> with the "gluster volume status" command?

Yes. I am not aware of any other command that reports the status of a
plain distribute volume the way the heal info command does for
replicate/disperse volumes.

> And one more question: cluster.min-free-disk is 10% by default. What kind
> of "side effects" can we face if this option is reduced to, for example,
> 5%? Could you point to any best practice document(s)?

Yes, you can decrease it to any value. There won't be any side effects.

Regards,
Karthik

> [earlier messages in the thread trimmed; see above]
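For reference, the two operations discussed above map to these commands —
shown against the gv0 volume from this thread; cluster.min-free-disk
accepts either a percentage or an absolute size:

Lowering the watermark from the default 10% to 5%:

# gluster volume set gv0 cluster.min-free-disk 5%

Checking per-brick free space and status on a plain distribute volume:

# gluster volume status gv0 detail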