Pranith Kumar Karampuri
2014-May-08 11:12 UTC
[Gluster-users] Proposal for improvements for heal commands
Hi,

1) Command: "gluster volume heal <volname> info" did not distinguish between files undergoing I/O and files that need self-heal. It also did not scale well for big outputs. I have already sent a re-implementation for the 3.5 branch, and it has been merged.
- It distinguishes file data modifications (writes/truncates) from data that needs healing.
- The command scales well for VERY BIG output, i.e. no CLI timeouts, and it prints all the entries.
- TODO: Distinguish metadata (chown/chmod/setfattr etc.) and entry (creates/deletes) I/O from self-heal.

2) According to the feedback we got, the commands "gluster volume heal <volname> info healed/heal-failed" are not helpful in debugging anything, so I am thinking of deprecating these two commands.
Reasons:
- The commands only give the last 1024 entries that succeeded/failed, so most of the time users need to inspect the logs anyway.
Even without the "gluster volume heal <volname> info healed/heal-failed" commands, the user can gather the status using "gluster volume heal <volname> info" as below:
- If the heal succeeds, the entry will stop showing in "gluster volume heal <volname> info".
- If the heal fails, the entry keeps showing up in "gluster volume heal <volname> info", and the logs give better reasons for the failure.

3) "gluster volume heal <volname> info split-brain" will be re-implemented to print all the files that are in split-brain instead of the limited 1024 entries.
- One constant complaint is that even after a file is fixed from split-brain, it may still show up in the previously cached output. In this implementation the goal is to remove all the caching and compute the results afresh.

Please let us know your feedback. I will wait 2-3 days to gather feedback and then start working on these.

Pranith
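The inference described in point 2 (an entry that disappears from "heal info" has healed, an entry that stays has not) can be sketched as a comparison of two snapshots. This is only an illustration: the function name and the result keys are made up here, and the inputs are assumed to be entry lists already extracted from two runs of "gluster volume heal <volname> info". Parsing the CLI output is deliberately left out, since its exact format is not shown in this thread.

```python
def classify_heal_progress(before, after):
    """Compare two snapshots of entries listed by `heal info`.

    `before` and `after` are iterables of entry names (paths or gfids),
    taken from an earlier and a later run of the command. Both the
    function name and the keys below are hypothetical.
    """
    before, after = set(before), set(after)
    return {
        # Listed earlier, gone now: the heal is assumed to have succeeded.
        "healed": sorted(before - after),
        # Listed in both runs: heal has not completed; check the logs.
        "still_pending": sorted(before & after),
        # Not listed earlier: newly in need of heal.
        "newly_listed": sorted(after - before),
    }


if __name__ == "__main__":
    result = classify_heal_progress(["/a", "/b"], ["/b", "/c"])
    for state, entries in result.items():
        print(state, entries)
```

An entry in "still_pending" is not necessarily a failure; the heal may simply not have been attempted yet, which is why the logs remain the place to look for the reason.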
Jeff Darcy
2014-May-08 13:23 UTC
[Gluster-users] [Gluster-devel] Proposal for improvements for heal commands
> 2) According to the feedback we got, Commands: "gluster volume heal <volname>
> info healed/heal-failed" are not helpful in debugging anything. So I am
> thinking of deprecating these two commands.
> Reasons:
> - The commands only give the last 1024 entries that succeeded/failed, so
> most of the times users need to inspect logs.

Seems reasonable, though if it's just an issue of not keeping enough information to be useful we could fix that by simply retaining more.

> 3) "gluster volume heal <volname> info split-brain" will be re-implemented to
> print all the files that are in split-brain instead of the limited 1024
> entries.
> - One constant complaint is that even after the file is fixed from
> split-brain, it may still show up in the previously cached output. In
> this implementation the goal is to remove all the caching and compute the
> results afresh.

This seems reasonable too. I can't help but wonder if it might be worth tracking split-brain files using a Merkle tree approach like we did with xtime, so we could track any number of such files efficiently.
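One way to picture the tree-based tracking suggested above is an index in which every directory carries a summary of what lies beneath it, so a listing only has to descend into directories whose summary is non-zero. The sketch below is not GlusterFS code and is not a hash tree in the strict Merkle sense: it borrows only the idea of propagating a per-directory summary up to the ancestors. The class and method names are hypothetical, and paths are assumed to be absolute POSIX paths within the volume.

```python
import posixpath
from collections import defaultdict


class SplitBrainIndex:
    """Hypothetical in-memory index of files flagged as split-brain."""

    def __init__(self):
        self._files = set()
        # directory -> number of flagged files anywhere beneath it
        self._counts = defaultdict(int)

    @staticmethod
    def _ancestors(path):
        """Yield every ancestor directory of `path`, ending at "/"."""
        if not path.startswith("/"):
            raise ValueError("expected an absolute path: %r" % path)
        directory = posixpath.dirname(path)
        while True:
            yield directory
            if directory == "/":
                return
            directory = posixpath.dirname(directory)

    def mark(self, path):
        """Flag a file and bump the counter on each ancestor."""
        if path in self._files:
            return
        self._files.add(path)
        for directory in self._ancestors(path):
            self._counts[directory] += 1

    def clear(self, path):
        """Unflag a file once it is fixed; no stale cache is left behind."""
        if path not in self._files:
            return
        self._files.remove(path)
        for directory in self._ancestors(path):
            self._counts[directory] -= 1
            if self._counts[directory] == 0:
                del self._counts[directory]

    def count_under(self, directory):
        """Number of flagged files beneath `directory`."""
        return self._counts.get(directory, 0)

    def list_all(self):
        """All flagged files, with no fixed limit on how many."""
        return sorted(self._files)
```

Marking or clearing a file costs one update per path component, and a directory with a zero count can be skipped without looking at its contents.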
Ted Miller
2014-May-22 17:33 UTC
[Gluster-users] Proposal for improvements for heal commands
On 5/8/2014 7:12 AM, Pranith Kumar Karampuri wrote:
> [snip]
>
> 3) "gluster volume heal <volname> info split-brain" will be re-implemented to print all the files that are in split-brain instead of the limited 1024 entries.
> - One constant complaint is that even after the file is fixed from split-brain, it may still show up in the previously cached output. In this implementation the goal is to remove all the caching and compute the results afresh.

Pranith,

I missed your three-day deadline, because I was on vacation/holiday. If not for this time, perhaps for the next iteration.

If at all possible, can we have the gfids in the split-brain output replaced by file names? No matter how you go about fixing a split-brain, not knowing the file name when faced with a list of possibilities is an infuriating situation. You can't tell log files from databases, so you can't even do triage.

Thanks for your work,
Ted Miller
Elkhart, IN
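Until the command prints names itself, a gfid can be mapped back to a file name by hand on a brick. The sketch below assumes the usual brick layout, where a regular file's gfid appears as a hard link at .glusterfs/<first two hex chars>/<next two>/<gfid>, and looks for other names sharing that inode. The function names are hypothetical, directories (whose gfid entries are symlinks rather than hard links) are not handled, and the full walk will be slow on a large brick.

```python
import os


def gfid_backend_path(brick_root, gfid):
    """Path of the gfid entry under the brick's .glusterfs directory."""
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


def paths_for_gfid(brick_root, gfid):
    """Return brick-relative names that share an inode with the gfid entry.

    Assumes `gfid` refers to a regular file on this brick.
    """
    target = os.stat(gfid_backend_path(brick_root, gfid))
    matches = []
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Do not descend into .glusterfs: it would only match the
        # gfid link itself.
        dirnames[:] = [d for d in dirnames if d != ".glusterfs"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
                matches.append(os.path.relpath(full, brick_root))
    return sorted(matches)
```

Run against a brick directory on the server (not the client mount), this returns names relative to the brick root, which is enough to tell a log file from a database before deciding how to resolve the split-brain.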