Shiva Stanford via llvm-dev
2020-Mar-24 09:13 UTC
[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
I am a CS grad student at Stanford and would like to engage with EJ Park, Giorgis Georgakoudis, and Johannes Doerfert to further develop the Machine Learning and Compiler Optimization concept. My background is in machine learning, cluster computing, distributed systems, etc. I am a good C/C++ developer and have a strong background in algorithms and data structures.

I am also taking an advanced compiler course at Stanford this quarter, so I would be studying several of these topics anyway; I thought I might as well co-engage on the LLVM compiler infrastructure project. I am currently studying background material on SCC call graphs, dominator trees, and other global and inter-procedural analyses to lay some groundwork for tackling this optimization pass with ML models. I have run a couple of function passes over whole programs and visualized call graphs to get familiar with the LLVM optimization pass setup. I have also set up GDB and learned to use it to debug function pass code.

I have submitted the ML and Compiler Optimization proposal to GSoC 2020. In it I added an extension that applies the ML optimization to code that crosses over to the GPU, and that investigates how function call graphs can be visualized as SCCs across CPU and GPU implementations. If the GPU extension is too much for a summer project, we could instead focus on developing a framework for studying SCCs across a unified CPU/GPU setup and leave the coding, if feasible, to next summer. These are all preliminary ideas.

I am not sure how to proceed from here, hence my email to this list. Please let me know.

Thank you
Shiva Badruswamy
shivastanford at gmail.com
Johannes Doerfert via llvm-dev
2020-Mar-26 16:41 UTC
[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
Hi Shiva,

apologies for the delayed response.

On 3/24/20 4:13 AM, Shiva Stanford via llvm-dev wrote:
> I am a grad CS student at Stanford and wanted to engage with EJ Park, Giorgis Georgakoudis, Johannes Doerfert to further develop the Machine Learning and Compiler Optimization concept.

Cool!

> My background is in machine learning, cluster computing, distributed systems etc. I am a good C/C++ developer and have a strong background in algorithms and data structure.

Sounds good.

> I am also taking an advanced compiler course this quarter at Stanford. So I would be studying several of these topics anyways - so I thought I might as well co-engage on the LLVM compiler infra project.

Agreed ;)

> I am currently studying the background information on SCC Call Graphs, Dominator Trees and other Global and inter-procedural analysis to lay some ground work on how to tackle this optimization pass using ML models. I have run a couple of all program function passes and visualized call graphs to get familiarized with the LLVM optimization pass setup. I have also setup and learnt the use of GDB to debug function pass code.

Very nice.

> I have submitted the ML and Compiler Optimization proposal to GSOC 2020. I have added an additional feature to enhance the ML optimization to include crossover code to GPU and investigate how the function call graphs can be visualized as SCCs across CPU and GPU implementations. If the extension to GPU is too much for a summer project, potentially we can focus on developing a framework for studying SCCs across a unified CPU, GPU setup and leave the coding, if feasible, to next Summer. All preliminary ideas.

I haven't looked at the proposals yet (I think we can only do so after the deadline). TBH, I'm not sure I fully understand your extension. Also, full disclosure, the project is pretty open-ended, from my side at least. I do not necessarily believe we (=LLVM) are ready for an ML-driven pass, or even for inference, in practice.
What I want is to explore the use of ML to improve the code we have, especially heuristics. We build analyses and transformations, but it is hard to combine them in a way that balances compile time, code size, and performance.

Some high-level statements that might help to put my view into perspective:

I want to use ML to identify patterns and code features that we can check for with conventional techniques, and that yield better compile time, code size, and/or performance when we base our decisions on them.

I want to use ML to identify shortcomings in our existing heuristics, e.g., transformation cut-off values or pass schedules. This could also mean identifying alternative values (or combinations of values) that perform substantially better (on some inputs).

> Not sure how to proceed from here. Hence my email to this list. Please let me know.

The email to the list was a great first step. The next one is usually to set up an LLVM development and testing environment, i.e., LLVM + Clang + the LLVM test suite. It is also advisable to work on a small task before GSoC to get used to LLVM development.

I don't have a really small ML "coding" task handy right now, but the project is more about experiments anyway. To get some LLVM development experience, we can just take a small task in the IPO Attributor pass.

One thing we need and don't have is data. The Attributor is a fixpoint iteration framework, so the number of iterations is a pretty integral part of it. We have a statistics counter that records whether the number of iterations required was higher than the given threshold, but none that records the maximum iteration count required during compilation. It would be great if you could add that, i.e., a statistics counter that shows the maximum number of iterations required to reach a fixpoint across all invocations of the Attributor. Does this make sense? Let me know what you think, and feel free to ask questions via email or on IRC.

Cheers,
Johannes

P.S.
Check out the coding style guide and the how to contribute guide!
Shiva Stanford via llvm-dev
2020-Mar-27 20:46 UTC
[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
Hi Johannes, great that we are engaging on this. Some responses now and some later.

1. When you say set up the LLVM dev environment + Clang + tools etc., do you mean set up the LLVM compiler code from the repo and build it locally? If so, yes, this is all done on my end: I have built all of this on my machine and compiled and run a couple of function passes. I have looked at some of the LLVM output emitted by the Clang tools, but I will familiarize myself more. I have added some small code segments, modified CMakeLists files, and rebuilt the code to get a feel for the packaging structure. Btw, is there a Bazel build for this? Right now, I am using OS X as the SDK, since Apple is the one that has adopted LLVM the most. But I can switch to Linux containers to completely wall off the LLVM build from any OS X system builds, to prevent path conflicts and have a truly separate environment. Is there a preferred environment? In any case, I am thinking of containerizing the build so that OS X system paths don't interfere with include paths. Have you received feedback from other developers on whether the include paths interfere with the OS X system LLVM build?

2. The Attributor pass task gives some specific direction as a starter project, so that's great. Let me study this pass and I will get back to you with more questions.

3. Yes, I will stick to the style guide for sure (Baaaah... Stanford is strict on code styling and so are you guys :)).