Chris Ye via llvm-dev
2019-Jul-23 08:09 UTC
[llvm-dev] How to contribute on LLVM project as beginner
Hi Paul,

Thanks for your useful guidelines. May I confirm with you whether the steps listed below are correct?

1. Find sample code (.c).
2. Compile the sample code with clang using different options (passes), with and without "-g".
3. objdump output.o and output-g.o.
4. Compare the text sections of the two files and check whether there is any difference.
5. If there is a difference, great: file a bug and fix it.

Please correct me if I missed something.

Following these steps:

* I used this sample code (foo.c)
------------------------------------------------------------------------------------------
int foo() { return 42; }
int bar() { return foo(); }
------------------------------------------------------------------------------------------

* I created a compare tool (compare.sh)
------------------------------------------------------------------------------------------
#!/bin/bash

options=$1
file=$2

# Build once without debug info and once with -g, using otherwise identical flags.
clang -c -ffunction-sections -fexceptions -mllvm -opt-bisect-limit=200 $options $file -o output.o
clang -c -ffunction-sections -fexceptions -mllvm -opt-bisect-limit=200 $options -g $file -o output-g.o

# Disassemble both objects and compare the results.
objdump -d output.o > output.objdump
objdump -d output-g.o > output-g.objdump

diff -uNar output.objdump output-g.objdump
------------------------------------------------------------------------------------------

* Then I ran the comparison tests
------------------------------------------------------------------------------------------
$ ./compare.sh -O0 foo.c
$ ./compare.sh -O1 foo.c
$ ./compare.sh -O2 foo.c
$ ./compare.sh -O3 foo.c
------------------------------------------------------------------------------------------

The diff results are the same. How can I find a bug? Is the sample code I used too simple? Or do I need to add more pass options? Please help correct my steps if I missed something. Thank you very much.

Best Regards,
Chris Ye

At 2019-07-17 01:40:43, paul.robinson at sony.com wrote:

Hi Chris,

"Debug info should have no effect on codegen" would be a fine project for you; nobody is working on it that I know of.
Another way to contribute would be to go to our Bugzilla (bugs.llvm.org) and search for open bugs with the "beginner" keyword.

Regarding the "debug info has no effect on codegen" project, unfortunately I am having IT issues that keep me from providing much in the way of specific suggestions, so what follows is fairly generic.

In principle, you compile some piece of code with and without -g, and see if there is any difference in the generated instructions. My experience is that you want to compile to a .o file, and then use a disassembler to dump the text sections. This will give you a cleaner diff than using -S to generate assembler files.

I also recommend compiling with `-ffunction-sections` and probably `-fexceptions`. The former will put each compiled function into its own object-file section, so that differences in one function won't affect the disassembly of a later function. The latter option should work around one fairly intractable known difference: -g will cause the compiler to emit directives to produce call-frame information, and these tend to act as instruction-scheduling barriers. Using -fexceptions (I am 95% sure that is the correct option) should cause the non-dash-g compilation to use the same directives, and avoid that known difference.

You can repeat this experiment with different optimization levels, as differences are far more likely to show up with optimization.

Once you find a difference, you can begin experimenting with ways to identify specific compiler passes that are contributing to the difference. A very useful tool here is the backend option `-opt-bisect-limit=N` where N is the number of passes to execute. Because it is a backend option, you would use it this way:

    clang -c -O2 -mllvm -opt-bisect-limit=100 foo.c -o foo.o
    clang -c -O2 -mllvm -opt-bisect-limit=100 foo.c -g -o foo-g.o

Then disassemble and diff as usual.
After you have identified a problematic pass, you can try your hand at fixing it yourself, or you can file a bug (with a reduced reproducer if at all possible) and move on to another sample.

Of course you will need some sample source code to run experiments on. This can be anything convenient. You could try it on any personal projects you have, or you could find a random code generator, or whatever you like. Some people have recommended LLVM's own 'test-suite' project although I have not looked at it in any detail.

Good luck, and feel free to post additional questions on llvm-dev if you run into any problems.

--paulr

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Chris Ye via llvm-dev
Sent: Sunday, July 14, 2019 11:59 PM
To: llvm-dev at lists.llvm.org
Subject: [llvm-dev] How to contribute on LLVM project as beginner

Hi LLVM project Leaders,

I am a software engineer working on several other open source projects. Recently I have become very interested in LLVM technology, especially the backend. I have spent two months studying the documents from llvm.org in my spare time. As a beginner, I would like to contribute some code to the LLVM project. In the "Google Summer of Code 2019" list I found one project, "Debug Info should have no effect on codegen", that I may be able to contribute to; I am not sure whether the project has already been completed. If there are still open tasks, how can I join in? Or is there any other project I can work on? I can spend 10~20 hours on LLVM development every week, as I want to gather experience to find a job as an LLVM developer in the future. I am a quick learner, and I would very much appreciate it if you could help me and give me some guidance, so that I can move faster on my way into the LLVM field. Many thanks.

-----------------------------------------------------------------
LLVM
Debug Info should have no effect on codegen

Description of the project: Adding Debug Info (compiling with `clang -g`) shouldn't change the generated code at all.
Unfortunately we have bugs. These are usually not too hard to fix and a good way to discover a new part of the codebase! We suggest building object files both ways and disassembling the text sections, which will give cleaner diffs than comparing .s files.

Expected results: Reduced test cases, bug reports with analysis (e.g., which pass is responsible), possibly patches.

Confirmed Mentor: Paul Robinson

Desirable skills: Intermediate knowledge of C++, some familiarity with the x86 or ARM instruction set.

Best Regards,
Chris Ye
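For reference, the compare-with-and-without-g workflow described in this message could be sketched as a pair of shell functions. This is only a sketch under a few assumptions: the function names and the CC and OBJDUMP override variables are hypothetical and do not come from the thread, and dropping objdump's "file format" header line (it names the object file, so it would differ between the two dumps regardless of codegen) is an addition. Only the second build is given -g; everything else is kept identical.

```shell
#!/bin/bash
# Sketch: compare the disassembly of one source file built with and without -g.
# CC and OBJDUMP are hypothetical override variables (handy for other toolchains).
CC=${CC:-clang}
OBJDUMP=${OBJDUMP:-objdump}

# disassemble OBJ: print the disassembly without the header line naming the file.
disassemble() {
    "$OBJDUMP" -d "$1" | grep -v 'file format'
}

# compare_codegen FILE [FLAGS...]
# Returns 0 if the code matches, 1 if it differs, 2 if a build step failed.
compare_codegen() {
    local file=$1; shift
    local dir status=0
    dir=$(mktemp -d) || return 2
    # Identical flags for both builds; only the second one adds -g.
    "$CC" -c -ffunction-sections -fexceptions "$@" "$file" -o "$dir/plain.o" || status=2
    "$CC" -c -ffunction-sections -fexceptions "$@" -g "$file" -o "$dir/debug.o" || status=2
    if [ "$status" -eq 0 ]; then
        disassemble "$dir/plain.o" > "$dir/plain.dis"
        disassemble "$dir/debug.o" > "$dir/debug.dis"
        # diff prints the differences and exits non-zero when the dumps differ.
        diff -u "$dir/plain.dis" "$dir/debug.dis" || status=1
    fi
    rm -rf "$dir"
    return "$status"
}
```

With this loaded, something like `compare_codegen foo.c -O2` would print a unified diff and return 1 when -g changed the generated instructions, and return 0 when the two dumps match.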
Oliver Stannard via llvm-dev
2019-Jul-24 09:52 UTC
[llvm-dev] How to contribute on LLVM project as beginner
Your script looks OK, though you won't want to use the -opt-bisect-limit option until you've found a case where code-generation changes. Instead, that's a tool which you could use to narrow down the pass inside LLVM which is causing the change.

The problem is that your input code is far too simple to trigger any interesting optimisations. I'd suggest starting with either some code from the LLVM test suite (https://github.com/llvm/llvm-test-suite), or some code generated by csmith (https://embed.cs.utah.edu/csmith/). The former has the advantage of being (mostly) real code people actually write, and the latter can generate a large amount of complex code without any external dependencies (so it's easy to build).

I'd also suggest looking into creduce (https://embed.cs.utah.edu/creduce/), which will allow you to quickly reduce a large input file which triggers a bug down to a much smaller one.

Oliver

On Wed, 24 Jul 2019 at 01:18, Chris Ye via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Hi Paul,
> Thanks for your useful guidelines, may I confirm with you the steps list
> below is correct or not?
>
> 1. find sample code (.c)
> 2. using different options(pass) to compile sample code by clang
> with/without "-g"
> 3. objdump the output.o and output-g.o
> 4. compare two file of text section check if there has any difference.
> 5. if find difference, great, file bug and fix it.
>
> Please correct me if I miss something.
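Once a difference has been found, the narrowing step with -opt-bisect-limit could be automated with a binary search over the limit. The following is a minimal sketch, not an established tool: first_differing_limit and differs_at are hypothetical names, foo.c and -O2 are placeholders, and the search assumes that there is no difference at the lower bound, that there is one at the upper bound, and that a difference, once it appears at some limit, persists for all larger limits.

```shell
#!/bin/bash
# first_differing_limit LOW HIGH CMD...
# Prints the smallest N in (LOW, HIGH] for which "CMD... N" succeeds,
# assuming CMD fails at LOW, succeeds at HIGH, and never flips back.
first_differing_limit() {
    local lo=$1 hi=$2 mid
    shift 2
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$@" "$mid"; then hi=$mid; else lo=$mid; fi
    done
    echo "$hi"
}

# differs_at N: hypothetical predicate for the search above. It succeeds when
# the builds with and without -g disassemble differently at bisect limit N.
differs_at() {
    local n=$1 dir rc
    dir=$(mktemp -d) || return 2
    # stderr is discarded because the bisect option reports each pass there.
    clang -c -O2 -mllvm -opt-bisect-limit="$n" foo.c -o "$dir/plain.o" 2>/dev/null
    clang -c -O2 -mllvm -opt-bisect-limit="$n" foo.c -g -o "$dir/debug.o" 2>/dev/null
    objdump -d "$dir/plain.o" | grep -v 'file format' > "$dir/plain.dis"
    objdump -d "$dir/debug.o" | grep -v 'file format' > "$dir/debug.dis"
    if cmp -s "$dir/plain.dis" "$dir/debug.dis"; then rc=1; else rc=0; fi
    rm -rf "$dir"
    return "$rc"
}
```

A call such as `first_differing_limit 0 200 differs_at` would then print the first limit at which the two builds diverge. Rebuilding once at that limit without discarding stderr shows which passes ran, which points at the pass worth investigating.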
Greg Bedwell via llvm-dev
2019-Jul-24 10:41 UTC
[llvm-dev] How to contribute on LLVM project as beginner
On Wed, 24 Jul 2019 at 10:52, Oliver Stannard via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Your script looks OK, though you won't want to use the -opt-bisect-limit
> option until you've found a case where code-generation changes. Instead,
> that's a tool which you could use to narrow down the pass inside LLVM which
> is causing the change.
>
> The problem is that your input code is far too simple to trigger any
> interesting optimisations. I'd suggest starting with either some code from
> the LLVM test suite (https://github.com/llvm/llvm-test-suite), or some
> code generated by csmith (https://embed.cs.utah.edu/csmith/). The former
> has the advantage of being (mostly) real code people actually write, and
> the latter can generate a large amount of complex code without any external
> dependencies (so it's easy to build).

A few other things to note:

There's a tool in clang here (https://github.com/llvm/llvm-project/tree/master/clang/utils/check_cfc) called check_cfc which uses the same basic idea as the script above. It's designed to transparently wrap clang invocations so that any differences in codegen will actually trigger a build failure. There are a few more details in these slides (https://llvm.org/devmtg/2015-04/slides/Verifying_code_gen_dash_g_final.pdf). Ultimately it doesn't matter which tools you use in order to find bugs, but you may find it useful.

We've got a meta-bug here to which we've been attaching already-reported bugs in this area (https://bugs.llvm.org/show_bug.cgi?id=37728) which might be a nice place to start so that you can try replicating the results.

In particular https://bugs.llvm.org/show_bug.cgi?id=42138 is a bug that one of our interns found recently using the check_cfc script with llvm test-suite (and then reducing with creduce). Unfortunately it was right at the end of his internship so he didn't get a chance to try and fix it.
It might be a good starting point to have a go at replicating the failure and then trying to figure out what's happening and fixing it (assuming that it's still present). I'm sure that there are plenty of people in the community willing to help out with any specific issues you run into along the way.

Good luck with whichever approach you take!

-Greg