Shiva Stanford via llvm-dev
2020-Mar-27 20:46 UTC
[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
Hi Johannes - great that we are engaging on this.

Some responses now and some later.

1. When you say set up an LLVM dev environment + clang + tools etc., do you
mean check out the LLVM compiler code from the repo and build it locally? If
so, yes, this is all done on my end - that is, I have built all this on my
machine and compiled and run a couple of function passes. I have looked at
some of the LLVM output emitted by the clang tools, but I will familiarize
myself with it more. I have added some small code segments, modified the
CMakeLists files and rebuilt the code to get a feel for the packaging
structure. Btw, is there a Bazel build for this? Right now, I am using OS X
as the SDK, as Apple is the one that has adopted LLVM the most. But I can
switch to Linux containers to completely wall off the LLVM build from any
OS X system builds, to avoid path conflicts and keep the environment truly
separate. Is there a preferable environment? In any case, I am thinking of
containerizing the build so OS X system paths don't interfere with include
paths - have you received feedback from other developers on whether the
include paths interfere with the OS X system LLVM build?

2. The Attributor pass refactoring gives some specific direction as a
starter project - so that's great. Let me study this pass and I will get
back to you with more questions.

3. Yes, I will stick to the style guide (Baaaah...Stanford is strict on
code styling and so are you guys :)) for sure.

On Thu, Mar 26, 2020 at 9:42 AM Johannes Doerfert
<johannesdoerfert at gmail.com> wrote:

> Hi Shiva,
>
> apologies for the delayed response.
>
> On 3/24/20 4:13 AM, Shiva Stanford via llvm-dev wrote:
> > I am a grad CS student at Stanford and wanted to engage with EJ Park,
> > Giorgis Georgakoudis, Johannes Doerfert to further develop the Machine
> > Learning and Compiler Optimization concept.
>
> Cool!
>
> > My background is in machine learning, cluster computing, distributed
> > systems etc. I am a good C/C++ developer and have a strong background
> > in algorithms and data structure.
>
> Sounds good.
>
> > I am also taking an advanced compiler course this quarter at Stanford.
> > So I would be studying several of these topics anyways - so I thought
> > I might as well co-engage on the LLVM compiler infra project.
>
> Agreed ;)
>
> > I am currently studying the background information on SCC Call Graphs,
> > Dominator Trees and other Global and inter-procedural analysis to lay
> > some ground work on how to tackle this optimization pass using ML
> > models. I have run a couple of all program function passes and
> > visualized call graphs to get familiarized with the LLVM optimization
> > pass setup. I have also setup and learnt the use of GDB to debug
> > function pass code.
>
> Very nice.
>
> > I have submitted the ML and Compiler Optimization proposal to GSOC
> > 2020. I have added an additional feature to enhance the ML
> > optimization to include crossover code to GPU and investigate how the
> > function call graphs can be visualized as SCCs across CPU and GPU
> > implementations. If the extension to GPU is too much for a summer
> > project, potentially we can focus on developing a framework for
> > studying SCCs across a unified CPU, GPU setup and leave the coding, if
> > feasible, to next Summer. All preliminary ideas.
>
> I haven't looked at the proposals yet (I think we can only after the
> deadline). TBH, I'm not sure I fully understand your extension. Also,
> full disclosure, the project is pretty open-ended from my side at least.
> I do not necessarily believe we (=llvm) are ready for an ML-driven pass
> or even inference in practice. What I want is to explore the use of ML
> to improve the code we have, especially heuristics. We build analyses
> and transformations but it is hard to combine them in a way that
> balances compile-time, code-size, and performance.
>
> Some high-level statements that might help to put my view into
> perspective:
>
> I want to use ML to identify patterns and code features that we can
> check for using common techniques but when we base our decision making
> on these patterns or features we achieve better compile-time, code-size,
> and/or performance.
>
> I want to use ML to identify shortcomings in our existing heuristics,
> e.g. transformation cut-off values or pass schedules. This could also
> mean to identify alternative (combination of) values that perform
> substantially better (on some inputs).
>
> > Not sure how to proceed from here. Hence my email to this list.
> > Please let me know.
>
> The email to the list was a great first step. The next one usually is to
> set up an LLVM development and testing environment, thus LLVM + Clang +
> LLVM-Test Suite that you can use. It is also advised to work on a small
> task before the GSoC to get used to the LLVM development.
>
> I don't have a really small ML "coding" task handy right now but the
> project is more about experiments anyway. To get some LLVM development
> experience we can just take a small task in the IPO Attributor pass.
>
> One thing we need and we don't have is data. The Attributor is a
> fixpoint iteration framework so the number of iterations is a pretty
> integral part. We have a statistics counter to determine if the number
> required was higher than the given threshold but not one to determine
> the maximum iteration count required during compilation. It would be
> great if you could add that, thus a statistics counter that shows how
> many iterations were required until a fixpoint was found across all
> invocations of the Attributor. Does this make sense? Let me know what
> you think and feel free to ask questions via email or on IRC.
>
> Cheers,
>   Johannes
>
> P.S. Check out the coding style guide and the how to contribute guide!
>
> > Thank you
> > Shiva Badruswamy
> > shivastanford at gmail.com
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
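The counter requested in the quoted message above (the maximum number of
iterations needed to reach a fixpoint, taken across all invocations) can be
sketched roughly as follows. This is only an illustration under stated
assumptions, not the Attributor's actual code: MaxCounter, runToFixpoint and
MaxIterationsToFixpoint are hypothetical names, and the sketch uses a plain
struct so that it compiles without LLVM. In-tree, one would presumably use
LLVM's STATISTIC macro rather than a hand-rolled counter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for a statistics counter (not LLVM's Statistic).
// It only ever grows: updateMax keeps the largest value seen so far.
struct MaxCounter {
  uint64_t Value = 0;
  void updateMax(uint64_t V) { Value = std::max(Value, V); }
};

// Largest iteration count any single invocation has needed so far.
static MaxCounter MaxIterationsToFixpoint;

// Calls Step repeatedly until it reports "no change" or the iteration
// limit is hit. Step returns true if it changed the state.
// Returns true if a fixpoint was reached within MaxIterations.
bool runToFixpoint(const std::function<bool()> &Step,
                   unsigned MaxIterations) {
  assert(MaxIterations > 0 && "need at least one iteration");
  unsigned Iterations = 0;
  bool Changed = true;
  while (Changed && Iterations < MaxIterations) {
    Changed = Step();
    ++Iterations;
  }
  // Record the count for this invocation; the counter keeps the maximum
  // across all invocations, which is the data point asked for above.
  MaxIterationsToFixpoint.updateMax(Iterations);
  return !Changed;
}
```

A caller would pass one update round as Step and read
MaxIterationsToFixpoint.Value at the end of compilation. Note that an
invocation that hits the limit records the limit itself, so a reported
maximum equal to the threshold means at least one run may not have
converged.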
Johannes Doerfert via llvm-dev
2020-Mar-31 00:29 UTC
[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
On 3/27/20 3:46 PM, Shiva Stanford wrote:
> Hi Johannes - great we are engaging on this.
>
> Some responses now and some later.
>
> 1. When you say setup LLVM dev environment + clang + tools etc, do you
> mean setup LLVM compiler code from the repo and build it locally? If so,
> yes, this is all done from my end - that is, I have built all this on my
> machine and compiled and run a couple of function passes. I have looked
> at some LLVM emits from clang tools but I will familiarize more. I have
> added some small code segments, modified CMakeLists and re-built code to
> get a feel for the packaging structure. Btw, is there a version of Bazel
> build for this? Right now, I am using OS X as the SDK as Apple is the
> one that has adopted LLVM the most. But I can switch to Linux containers
> to completely wall off the LLVM build against any OS X system builds to
> prevent path obfuscation and truly have a separate address space. Is
> there a preferable environment? In any case, I am thinking of
> containerizing the build, so OS X system paths don't interfere with
> include paths - have you received feedback from other developers on
> whether the include paths interfere with OS X LLVM system build?

Setup sounds good.

I have never used OS X but people do and I would expect it to be OK.

I don't think you need to worry about this right now.

> 2. The attributor pass refactoring gives some specific direction as a
> startup project - so that's great. Let me study this pass and I will get
> back to you with more questions.

Sure.

> 3. Yes, I will stick to the style guide (Baaaah...Stanford is strict on
> code styling and so are you guys :)) for sure.

For better or worse.

Cheers,
  Johannes
Shiva Stanford via llvm-dev
2020-Mar-31 01:07 UTC
[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
1. Thanks for the clarifications. I will stick to non-containerized OS X
for now.

2. As an aside, I did try to build a Debian Docker container by git cloning
into it and using the Dockerfile in LLVM/utils/docker as a starting point.
With some changes to update packages (GCC in particular needs to be the
latest) and the Debian image (Debian 9 instead of Debian 8), that Dockerfile
pretty much sets up the container well. But for some reason, the Ninja build
tool within the CMake generator fails. I am looking into it. Maybe I can
produce a working Docker workflow for others who want to build and work with
LLVM in a container environment.

3. I have submitted the final proposal to GSoC 2020 today after
incorporating some comments and thoughts. When you all get a chance to
review, let me know your thoughts.

4. On the GPU extension, my thoughts were around what an integrated compiler
like Nvidia's nvcc (GCC for CPU + PTX for GPU) does when GCC is substituted
with LLVM, and whether that arrangement can be optimized for ML passes. But
I am beginning to think that structuring this problem well and doing
meaningful work over the summer might be a bit difficult. As mentors, do you
have any thoughts on how LLVM might be integrated into a joint CPU-GPU
compiler by the likes of Nvidia, Apple etc.?

Best
Shiva