Johannes Altmanninger via llvm-dev
2017-Mar-20 22:20 UTC
[llvm-dev] [GSoC 2017] Clang-based diff tool project
Hello,

I am currently studying Computer Science at TU Eindhoven. I am doing a course that involves programming assignments on parts of LLVM such as lowering, scheduling and optimization. For this year's Google Summer of Code I plan to submit a proposal to implement a clang-based diff tool [1].

I think it really pays off to have decent developer tools available, as they can save tons of time. Clang tooling has obviously been very successful. I think it would be a good idea to develop a diff tool that considers the structure of the code, as opposed to just the lines. Plain old diff only thinks in terms of "additions" and "deletions", although it would be more natural to also consider "updates" and "moves".

A structural diff would work solely on the AST, so formatting changes are ignored. It would allow highlighting the exact location of a change rather than a whole line. Furthermore, it would allow comparing pieces of code with the same structure (think subclasses).

Besides some papers with clever AST-matching algorithms, a quick web search yielded [2], which is a proof-of-concept implementation of a structural comparison algorithm. I think it demonstrates rather nicely what could be done: movement of chunks of code can be easily traced.

Anyway, one could make all kinds of nice visualizations using an AST diff tool. However, I think the initial focus should probably be on creating one with output similar to traditional diff, with the difference that updates and moves are displayed in an easily readable way, which could already improve developer productivity and happiness.

As of now I have one question: The output of the tool is meant just for humans to read (and not for actual patching), right?

To sum up, this could be a very interesting project for me to work on, and the result will hopefully be useful to a wide range of developers. I would appreciate any feedback. Also, suggestions on how the diff output should be presented are welcome. Thank you!

Johannes

[1] http://llvm.org/OpenProjects.html#clang-diff-tool
[2] https://yinwang0.wordpress.com/2012/01/03/ydiff/
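Editor's illustration: the claim that a structural diff ignores formatting changes can be sketched in a few lines. This is only a minimal sketch and not part of the proposal. It uses Python's built-in ast module as a stand-in for the Clang AST (an assumption made purely so the example is self-contained), and the variable names are hypothetical.

```python
import ast
import difflib

old_src = "total = price * (1 + tax)\n"
new_src = "total = price*(1+tax)\n"          # formatting-only change
changed_src = "total = price * (1 - tax)\n"  # real change: + became -


def tree_repr(source: str) -> str:
    """Return a canonical text form of the AST; layout is not part of it."""
    return ast.dump(ast.parse(source))


# A line-based diff reports the formatting-only change as one deletion
# plus one addition of the whole line.
line_diff = [
    line
    for line in difflib.unified_diff(
        old_src.splitlines(), new_src.splitlines(), lineterm=""
    )
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

# Comparing the trees instead sees no change for the reformatted line,
# but does see the operator change.
same_structure = tree_repr(old_src) == tree_repr(new_src)
real_change = tree_repr(old_src) != tree_repr(changed_src)

print(line_diff)
print(same_structure, real_change)
```

A real tool would go further and report which node changed (here, the operator), rather than only whether the trees are equal.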
Hal Finkel via llvm-dev
2017-Mar-20 23:11 UTC
[llvm-dev] [GSoC 2017] Clang-based diff tool project
On 03/20/2017 05:20 PM, Johannes Altmanninger via llvm-dev wrote:
> Besides some papers with clever AST-matching algorithms, a quick web
> search yielded [2], which is a proof-of-concept implementation of a
> structural comparison algorithm. I think it demonstrates rather nicely
> what could be done: movement of chunks of code can be easily traced.

There is also a fair amount of literature associated with "XML Diff" tools, which also demonstrate this kind of structural comparison. For example, see:

http://diffxml.sourceforge.net/
https://www.cs.hut.fi/~ctl/3dm/
http://pages.cs.wisc.edu/~yuanwang/xdiff.html

> As of now I have one question: The output of the tool is meant just for
> humans to read (and not for actual patching), right?
>
> To sum up, this could be a very interesting project for me to work on,
> and the result will hopefully be useful to a wide range of developers. I
> would appreciate any feedback. Also, suggestions on how the diff output
> should be presented are welcome. Thank you!

In the long term, I'd love to see a semantic diff tool which could help resolve merge conflicts. Merging two branches, both of which have added a function to a class, or a member to a class (including updating the constructors), is a common problem. Having some way to "apply both changes" automatically would be a big help.

To that end, I'd hope that the output of the tool could indeed be used for actual patching. That does not mean it would need to follow the traditional diff format (in fact, I'd expect that it would not). Moreover, the 'patch' part of the tool might well be out-of-scope for the initial project. I do think, however, we should at least have a mode where the 'diff' output is precise and machine readable so that we might later design patching tools.

 -Hal

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
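Editor's illustration: the "apply both changes" case, where two branches each add a member to the same class, can be sketched as a three-way merge over member lists. This is a minimal sketch under strong assumptions: members are identified by name only, the function merge_members and the sample member names are hypothetical, and edits to an existing member (such as updating a constructor) are not handled at all.

```python
def merge_members(base, ours, theirs):
    """Three-way merge of ordered member-name lists.

    Keeps every base member that neither side removed, then appends the
    members each side added (ours first, then theirs, without duplicates).
    """
    removed = (set(base) - set(ours)) | (set(base) - set(theirs))
    merged = [m for m in base if m not in removed]
    for side in (ours, theirs):
        for member in side:
            if member not in base and member not in merged:
                merged.append(member)
    return merged


base = ["Widget()", "draw()"]
ours = ["Widget()", "draw()", "resize()"]   # branch A adds resize()
theirs = ["Widget()", "draw()", "hide()"]   # branch B adds hide()

merged = merge_members(base, ours, theirs)
print(merged)  # ['Widget()', 'draw()', 'resize()', 'hide()']
```

A line-based merge would typically flag this as a conflict because both additions touch the same lines; working on named members is what makes the two additions independent.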
Mehdi Amini via llvm-dev
2017-Mar-20 23:47 UTC
[llvm-dev] [GSoC 2017] Clang-based diff tool project
(+CC: Greg Clayton who gave me this idea in the first place)

> On Mar 20, 2017, at 3:20 PM, Johannes Altmanninger via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> For this year's Google Summer of
> Code I plan to submit a proposal to implement a clang-based diff tool
> [1].

Great! I look forward to seeing this :)

> As of now I have one question: The output of the tool is meant just for
> humans to read (and not for actual patching), right?

Yes. But we usually develop software as libraries. Practically, I expect the main part of the work to be writing an API that generates an “in-memory” representation of the diff.

A tool that generates human-readable textual output is likely to be the first client of this API, and is likely critical for functionally testing it early in development. In the future I hope it’d enable other graphical diff clients to plug in, or git-merge resolution tools as well.

Best,

--
Mehdi
Johannes Altmanninger via llvm-dev
2017-Mar-21 21:44 UTC
[llvm-dev] [GSoC 2017] Clang-based diff tool project
So I have been playing around with this some more.

Mehdi Amini <mehdi.amini at apple.com> writes:

> Yes. But we developed software as libraries usually. Practically I
> expect the main part of the work to write some piece of API that
> generate an “in-memory” representation of the diff.

Yes, of course. I think the representation will be just a set of changes, where a change is an insert, update, or removal of a node.

> In the future I hope it’d enable other graphical diff client to
> plug-in, or git-merge resolution tools as well.

Yes, merge tools would be a nice application for this, as Hal also pointed out. Of course it is easy to make the output machine readable or just use the API in the first place.

Unfortunately it is not possible to (de-)serialize the AST from/to XML (or some other structured data format). If it were, one could use existing structural diff tools to make prototypes or help test the final version.

I have another question, as I have never written a clang tool before: One could use LibClang or LibTooling; the former seems easier and probably good enough (?)

Furthermore, should I work on a small patch / fix an easy bug to show that I am comfortable with the development process? I am a bit lost with the bug tracker, so if you can suggest things to work on, I will try that. Other than that I can work on a prototype for the tool, but that takes some time.

I found one small typo in clang; I guess I could submit that.

Johannes

diff --git a/include/clang/AST/RecursiveASTVisitor.h b/include/clang/AST/RecursiveASTVisitor.h
index 1b5850a05b..c31e4b0bb4 100644
--- a/include/clang/AST/RecursiveASTVisitor.h
+++ b/include/clang/AST/RecursiveASTVisitor.h
@@ -83,7 +83,7 @@ namespace clang {
       return false; \
   } while (false)
 
-/// \brief A class that does preordor or postorder
+/// \brief A class that does preorder or postorder
 /// depth-first traversal on the entire Clang AST and visits each node.
 ///
 /// This class performs three distinct tasks:
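Editor's illustration: an in-memory diff that is "just a set of changes, where a change is an insert, update, or removal of a node" might look roughly like the sketch below. Everything here is an assumption made for illustration: the Node and Change types, the naive position-by-position comparison, and the node kind names (borrowed from Clang only for flavor). A real implementation would use a proper tree-matching algorithm and would also detect moves.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    kind: str                       # hypothetical node kind, e.g. "CallExpr"
    value: str = ""                 # e.g. an identifier or literal text
    children: List["Node"] = field(default_factory=list)


@dataclass(frozen=True)
class Change:
    op: str                         # "insert", "update" or "remove"
    path: Tuple[int, ...]           # child indices leading from the root
    old: Optional[str] = None
    new: Optional[str] = None


def label(node: Node) -> str:
    return f"{node.kind}({node.value})"


def diff(old: Node, new: Node, path: Tuple[int, ...] = ()) -> List[Change]:
    """Naive top-down diff: children are compared position by position."""
    changes: List[Change] = []
    if (old.kind, old.value) != (new.kind, new.value):
        changes.append(Change("update", path, label(old), label(new)))
    common = min(len(old.children), len(new.children))
    for i in range(common):
        changes += diff(old.children[i], new.children[i], path + (i,))
    for i in range(common, len(old.children)):
        changes.append(Change("remove", path + (i,), old=label(old.children[i])))
    for i in range(common, len(new.children)):
        changes.append(Change("insert", path + (i,), new=label(new.children[i])))
    return changes


# Toy trees for f(a, 1) and f(a, 2, b).
old_tree = Node("CallExpr", "f", [Node("DeclRefExpr", "a"), Node("IntegerLiteral", "1")])
new_tree = Node("CallExpr", "f", [Node("DeclRefExpr", "a"), Node("IntegerLiteral", "2"),
                                  Node("DeclRefExpr", "b")])

changes = diff(old_tree, new_tree)
for change in changes:
    print(change)
```

Because the result is plain data, a textual printer, a graphical client, or a merge tool could all consume the same list of changes.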
Greg Clayton via llvm-dev
2017-Mar-23 17:41 UTC
[llvm-dev] [GSoC 2017] Clang-based diff tool project
My original idea was to write a semantic diff tool that just does some simple things up front: create an MD5 from all top level blocks of the code. Start by just finding matching blocks of code ('{' and '}', '(' and ')') and remember the source locations for these and their MD5 values. Run a normal diff on the code and see what blocks the diffs fall into. Then try to figure out where things moved by possibly delving deeper into each block that matched something from the diff. Also, if any blocks moved to a completely different location, try to figure that out by matching the MD5 of any blocks.

For example, if you had:

    int main(int argc, const char **argv) {
      if (argc > 2) {
      }
      switch (argc) {
      }
    }

You would first make MD5s for the '(' and ')' in the "main" line and for the block that starts at the '{' at the end of the main line and ends at the end of the code.

Now the code looks like:

    int main(int argc, const char **argv) {
      switch (argc) {
      }
      if (argc > 2) {
      }
    }

The diff would show that the "if" is gone and a new "if" is found after the switch in the new version of the file. We would notice that the diff appears inside the block from the first:

    {
      if (argc > 2) {
      }
      switch (argc) {
      }
    }

And in the block from the second:

    {
      switch (argc) {
      }
      if (argc > 2) {
      }
    }

So we would then compute the MD5 for the blocks inside each of these blocks and try to match things up. The MD5 would of course remove spaces that aren't in strings and only compute the MD5 from the characters that make sense.

This simple type of approach could work on almost any language without the need to be able to correctly compile each file with all the right options.

Greg
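Editor's illustration: the block-hashing idea can be sketched as follows. This is a rough sketch, not Greg's implementation. The helper names (normalize, child_blocks, digest, moved_blocks) are hypothetical, only '{' and '}' blocks are handled, braces inside string literals or comments are not accounted for, and statements without their own braces are simply attached to the block that follows them.

```python
import hashlib


def normalize(text):
    """Drop whitespace that is not inside a double-quoted string literal."""
    out, in_string, escaped = [], False, False
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif not ch.isspace():
            out.append(ch)
    return "".join(out)


def child_blocks(body):
    """Split the inside of a block into pieces that each end with their
    own balanced '{...}', in source order."""
    blocks, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                blocks.append(body[start:i + 1])
                start = i + 1
    return blocks


def digest(text):
    return hashlib.md5(normalize(text).encode("utf-8")).hexdigest()


def moved_blocks(old_body, new_body):
    """Return (old_index, new_index) for blocks whose hash matches but
    whose position differs."""
    old = [digest(b) for b in child_blocks(old_body)]
    new = [digest(b) for b in child_blocks(new_body)]
    return [(i, new.index(h)) for i, h in enumerate(old)
            if h in new and new.index(h) != i]


# The bodies of main() from the example above, with different formatting.
old_body = "if (argc > 2) { } switch (argc) { }"
new_body = "switch (argc) {\n}\nif (argc > 2) {\n}"

moves = moved_blocks(old_body, new_body)
print(moves)  # [(0, 1), (1, 0)]
```

Stripping all whitespace is the crudest possible normalization (it would also make "int x" and "intx" hash the same), so a practical version would more likely hash a token stream.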
David Chisnall via llvm-dev
2017-Mar-23 17:48 UTC
[llvm-dev] [GSoC 2017] Clang-based diff tool project
On 20 Mar 2017, at 22:20, Johannes Altmanninger via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> I am currently studying Computer Science at TU Eindhoven. I am doing a
> course that involves programming assignments on parts of LLVM such as
> lowering, scheduling and optimization. For this year's Google Summer of
> Code I plan to submit a proposal to implement a clang-based diff tool
> [1].

I had a student implement such a tool a couple of years ago as a final-year (undergrad) project. He used the Python bindings to libclang and made something that worked with both the Python AST and the C AST exposed by libclang. The core idea for his work was the ability to recognise common refactoring patterns (for example inlining, outlining, variable renaming) and some fuzzy matching (so his tool could report things like ‘you renamed X to Y everywhere except here and here’).

His code was a nice proof-of-concept, but was never tidied up enough to be ready for widespread use. It would be great to see something a bit more robust.

David
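Editor's illustration: a report such as "you renamed X to Y everywhere except here" can be approximated with very little code when the two versions have the same shape. The sketch below is not the student's tool. It uses Python's built-in ast module on Python source (an assumption chosen so the example runs without libclang), the function names are hypothetical, and it only works when both versions contain the same number of variable references in the same structure.

```python
import ast
from collections import Counter


def names_in_order(source):
    """Return (identifier, line) for every variable reference, in a
    deterministic traversal order."""
    tree = ast.parse(source)
    return [(n.id, n.lineno) for n in ast.walk(tree) if isinstance(n, ast.Name)]


def incomplete_renames(old_src, new_src):
    """Return (old_name, new_name, lines_still_using_old_name) tuples."""
    old_names, new_names = names_in_order(old_src), names_in_order(new_src)
    if len(old_names) != len(new_names):
        raise ValueError("this sketch only handles structurally identical code")
    pairs = list(zip(old_names, new_names))
    renames = Counter((o, n) for (o, _), (n, _) in pairs if o != n)
    report = []
    for (old, new), _count in renames.most_common():
        missed = [line for (o, _), (n, line) in pairs if o == old and n == old]
        report.append((old, new, missed))
    return report


old_src = (
    "count = 0\n"
    "for item in items:\n"
    "    count = count + 1\n"
    "print(count)\n"
)
new_src = (
    "total = 0\n"
    "for item in items:\n"
    "    total = total + 1\n"
    "print(count)\n"   # the rename was missed on this line
)

report = incomplete_renames(old_src, new_src)
print(report)  # [('count', 'total', [4])]
```

The same pairing idea should carry over to any AST in which identifier references can be enumerated in a stable order, although matching trees that differ in shape needs the fuzzy matching David describes.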