Shiva Stanford via llvm-dev
2020-Mar-31 02:28 UTC
[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
Hi Johannes:

1. Attached is the submitted PDF.

2. I have a notes section where I state: I am still unsure of the GPU extension I proposed, as I don't know how LLVM plays into the GPU crossover space the way nvcc (Nvidia's compiler, which integrates GCC and PTX) does. I don't know whether function graphs in the CPU+GPU namespaces are seamless/continuous within nvcc, or whether nvcc is just a wrapper that invokes GCC on the CPU sources and PTX on the GPU sources. So what I have said is: if there is time to investigate, we could look at this. But I am not sure I am even framing the problem statement correctly at this point.

3. I have added a tentative tasks section and made a note that the project is open ended and things are quite fluid and may change significantly.

Cheers
Shiva

On Mon, Mar 30, 2020 at 6:52 PM Johannes Doerfert <johannesdoerfert at gmail.com> wrote:

> On 3/30/20 8:07 PM, Shiva Stanford wrote:
> > 1. Thanks for the clarifications. I will stick to non-containerized OS X for now.
>
> Sounds good. As long as you can build it and run lit and llvm-test suite tests :)
>
> > 2. As an aside, I did try to build a Debian docker container by git cloning into it and using the Dockerfile in LLVM/utils/docker as a starting point: some changes to update packages (GCC in particular needs to be latest) and the Debian image (Debian 9 instead of Debian 8) pretty much set up the docker container well. But for some reason, the Ninja build tool within the CMake generator fails. I am looking into it. Maybe I can produce a working docker workflow for others who want to build and work with LLVM in a container environment.
>
> Feel free to propose a fix but I'm the wrong one to talk to ;)
>
> > 3. I have submitted the final proposal to GSoC 2020 today after incorporating some comments and thoughts. When you all get a chance to review, let me know your thoughts.
>
> Good. Can you share the google docs with me (johannesdoerfert at gmail.com)? [Or did you and I misplace the link? In that case send it again ;)]
>
> > 4. On GPU extension, my thoughts were around what an integrated compiler like Nvidia's nvcc (GCC for CPU + PTX for GPU) does when GCC is substituted with LLVM, and whether that arrangement can be optimized for ML passes. But I am beginning to think that structuring this problem well and doing meaningful work over the summer might be a bit difficult.
>
> As far as I know, neither GCC nor Clang will behave much differently if they are used by nvcc than in their standalone mode.
>
> Having an "ML-mode" is probably a generic thing to look at. Though, the "high-level" optimizations are not necessarily performed in LLVM-IR.
>
> > As mentors, do you have any thoughts on how LLVM might be integrated into a joint CPU-GPU compiler by the likes of Nvidia, Apple etc.?
>
> I'm unsure what you are asking exactly. Clang can be used in CPU-GPU compilation via CUDA, OpenCL, OpenMP offload, SYCL, ... is this it? I'm personally mostly interested in generic optimizations in this space, but actually quite interested. Some ideas:
> - transfer latency hiding (another GSoC project),
> - kernel granularity optimizations (not being worked on yet, but this requires some infrastructure changes that are as of now still in the making),
> - data "location" tracking so we can "move" computation to the right device, e.g., for really dependence-free loops like `pragma omp loop`
>
> I can list more things but I'm unsure this is the direction you were thinking of.
>
> Cheers,
> Johannes
>
> > Best
> > Shiva
> >
> > On Mon, Mar 30, 2020 at 5:30 PM Johannes Doerfert <johannesdoerfert at gmail.com> wrote:
> >
> >> On 3/27/20 3:46 PM, Shiva Stanford wrote:
> >>> Hi Johannes - great we are engaging on this.
> >>>
> >>> Some responses now and some later.
> >>>
> >>> 1. When you say set up an LLVM dev environment + clang + tools etc., do you mean set up the LLVM compiler code from the repo and build it locally? If so, yes, this is all done from my end - that is, I have built all this on my machine and compiled and run a couple of function passes. I have looked at some LLVM emits from clang tools, but I will familiarize myself more. I have added some small code segments, modified CMake lists, and re-built the code to get a feel for the packaging structure. Btw, is there a version of a Bazel build for this? Right now, I am using OS X as the SDK, as Apple is the one that has adopted LLVM the most. But I can switch to Linux containers to completely wall off the LLVM build against any OS X system builds, to prevent path obfuscation and truly have a separate address space. Is there a preferable environment? In any case, I am thinking of containerizing the build so OS X system paths don't interfere with include paths - have you received feedback from other developers on whether the include paths interfere with the OS X LLVM system build?
> >>
> >> Setup sounds good.
> >>
> >> I have never used OS X but people do and I would expect it to be OK.
> >>
> >> I don't think you need to worry about this right now.
> >>
> >>> 2. The Attributor pass refactoring gives some specific direction as a startup project - so that's great. Let me study this pass and I will get back to you with more questions.
> >>
> >> Sure.
> >>
> >>> 3. Yes, I will stick to the style guide (Baaaah...Stanford is strict on code styling and so are you guys :)) for sure.
> >>
> >> For better or worse.
> >>
> >> Cheers,
> >>
> >> Johannes
> >>
> >>> On Thu, Mar 26, 2020 at 9:42 AM Johannes Doerfert <johannesdoerfert at gmail.com> wrote:
> >>>
> >>>> Hi Shiva,
> >>>>
> >>>> apologies for the delayed response.
> >>>>
> >>>> On 3/24/20 4:13 AM, Shiva Stanford via llvm-dev wrote:
> >>>> > I am a grad CS student at Stanford and wanted to engage with EJ Park, Giorgis Georgakoudis, and Johannes Doerfert to further develop the Machine Learning and Compiler Optimization concept.
> >>>>
> >>>> Cool!
> >>>>
> >>>> > My background is in machine learning, cluster computing, distributed systems, etc. I am a good C/C++ developer and have a strong background in algorithms and data structures.
> >>>>
> >>>> Sounds good.
> >>>>
> >>>> > I am also taking an advanced compiler course this quarter at Stanford. So I would be studying several of these topics anyway - so I thought I might as well co-engage on the LLVM compiler infra project.
> >>>>
> >>>> Agreed ;)
> >>>>
> >>>> > I am currently studying the background information on SCC call graphs, dominator trees, and other global and inter-procedural analyses to lay some groundwork on how to tackle this optimization pass using ML models. I have run a couple of whole-program function passes and visualized call graphs to get familiarized with the LLVM optimization pass setup. I have also set up and learnt the use of GDB to debug function pass code.
> >>>>
> >>>> Very nice.
> >>>>
> >>>> > I have submitted the ML and Compiler Optimization proposal to GSoC 2020. I have added an additional feature to enhance the ML optimization to include crossover code to GPU and investigate how the function call graphs can be visualized as SCCs across CPU and GPU implementations. If the extension to GPU is too much for a summer project, potentially we can focus on developing a framework for studying SCCs across a unified CPU, GPU setup and leave the coding, if feasible, to next summer. All preliminary ideas.
> >>>>
> >>>> I haven't looked at the proposals yet (I think we can only after the deadline). TBH, I'm not sure I fully understand your extension. Also, full disclosure, the project is pretty open-ended from my side at least. I do not necessarily believe we (=LLVM) are ready for an ML-driven pass or even inference in practice. What I want is to explore the use of ML to improve the code we have, especially heuristics. We build analyses and transformations, but it is hard to combine them in a way that balances compile-time, code-size, and performance.
> >>>>
> >>>> Some high-level statements that might help to put my view into perspective:
> >>>>
> >>>> I want to use ML to identify patterns and code features that we can check for using common techniques, but where basing our decision making on these patterns or features achieves better compile-time, code-size, and/or performance.
> >>>> I want to use ML to identify shortcomings in our existing heuristics, e.g., transformation cut-off values or pass schedules. This could also mean identifying alternative (combinations of) values that perform substantially better (on some inputs).
> >>>>
> >>>> > Not sure how to proceed from here. Hence my email to this list. Please let me know.
> >>>>
> >>>> The email to the list was a great first step. The next one usually is to set up an LLVM development and testing environment, thus LLVM + Clang + LLVM-Test Suite, that you can use. It is also advised to work on a small task before the GSoC to get used to LLVM development.
> >>>>
> >>>> I don't have a really small ML "coding" task handy right now, but the project is more about experiments anyway. To get some LLVM development experience we can just take a small task in the IPO Attributor pass.
> >>>>
> >>>> One thing we need and don't have is data. The Attributor is a fixpoint iteration framework, so the number of iterations is a pretty integral part. We have a statistics counter to determine if the number required was higher than the given threshold, but not one to determine the maximum iteration count required during compilation. It would be great if you could add that, thus a statistics counter that shows how many iterations were required until a fixpoint was found, across all invocations of the Attributor. Does this make sense? Let me know what you think and feel free to ask questions via email or on IRC.
> >>>>
> >>>> Cheers,
> >>>> Johannes
> >>>>
> >>>> P.S. Check out the coding style guide and the how-to-contribute guide!
> >>>>
> >>>> > Thank you
> >>>> > Shiva Badruswamy
> >>>> > shivastanford at gmail.com
> >>>> >
> >>>> > _______________________________________________
> >>>> > LLVM Developers mailing list
> >>>> > llvm-dev at lists.llvm.org
> >>>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

[Attachment: Final_Proposal_MachineLearningAndCompilerOptimization.pdf (application/pdf, 40819 bytes)]
Johannes Doerfert via llvm-dev
2020-Mar-31 06:42 UTC
[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
On 3/30/20 9:28 PM, Shiva Stanford wrote:
> Hi Johannes:
>
> 1. Attached is the submitted PDF.

I thought they make you submit via gdoc, and I also thought they wanted a timeline and had other requirements. Please verify this so it's not a problem (I base this on the proposals I've seen this year and not on the information actually provided by GSoC).

> 2. I have a notes section where I state: I am still unsure of the GPU extension I proposed, as I don't know how LLVM plays into the GPU crossover space the way nvcc (Nvidia's compiler integrates GCC and PTX) does.

You can use clang as "host compiler". As mentioned before, there is clang-cuda, and OpenMP offloading also generates PTX for the GPU code.

> I don't know whether function graphs in the CPU+GPU namespaces are seamless/continuous within nvcc, or whether nvcc is just a wrapper that invokes GCC on the CPU sources and PTX on the GPU sources.

Something like that, as far as I know.

> So what I have said is: if there is time to investigate, we could look at this. But I am not sure I am even framing the problem statement correctly at this point.

As I said, I'd be very happy for you to also work on GPU related things; what exactly can be defined over the next weeks.

GPU offloading is by nature inter-procedural (take CUDA kernels), so creating the infrastructure to alter the granularity of kernels (when/where to fuse/split them) could be a task. For this it is fairly important (as far as I know now) to predict the register usage accurately. Using learning here might be interesting as well.

As you mention in the pdf, one can also split the index space to balance computation. When we implement something like `pragma omp loop`, we can also balance computations across multiple GPUs as long as we get the data movement right.

> 3. I have added a tentative tasks section and made a note that the project is open ended and things are quite fluid and may change significantly.

That is good. This is a moving target and an open-ended task; I expect things to be determined more clearly as we go and based on the data we gather.

Cheers,
Johannes

[Earlier thread quoted in full above trimmed.]
Shiva Stanford via llvm-dev
2020-Mar-31 10:22 UTC
[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
1. Draft proposals via gdoc. Final via PDF. 2. I did not see any timeline requests from GSoC but spring quarter ends June 6 or so or maybe by a week more due to Coronavirus schedule delays. Summer begins then. I will look into it some more in the morning and see what I can add to timelines. Thanks. On Mon, Mar 30, 2020 at 11:43 PM Johannes Doerfert < johannesdoerfert at gmail.com> wrote:> > On 3/30/20 9:28 PM, Shiva Stanford wrote: > > Hi Johannes: > > > > 1. Attached is the submitted PDF. > > I thought they make you submit via gdoc and I also thought they wanted a > timeline and had other requirements. Please verify this so it's not a > problem (I base this on the proposals I've seen this year and not on the > information actually provided by GSoC). > > > > 2. I have a notes section where I state: I am still unsure of the GPU > > extension I proposed as I dont know how LLVM plays into the GPU cross > over > > space like how nvcc (Nvidia's compiler integrates gcc and PTX) does. > > You can use clang as "host compiler". As mentioned before, there is > clang-cuda and OpenMP offloading also generates PTX for the GPU code. > > > > I dont know if there is a chance that function graphs in the CPU+GPU > > name spaces are seamless/continupus within nvcc or if nvcc is just a > > wrapper that invokes gcc on the cpu sources and ptx on the gpu > > sources. > > Something like that as far as I know. > > > > So what I have said is - if there is time to investigate we could > > look at this. But I am not sure I am even framing the problem > > statement correctly at this point. > > As I said, I'd be very happy for you to also work on GPU related things, > what exactly can be defined over the next weeks. > > GPU offloading is by nature inter-procedural (take CUDA kernels) so > creating the infrastructure to alter the granularity of kernels > (when/where to fuse/split them) could be a task. 
For this it is fairly > important (as far as I know now) to predict the register usage > accurately. Using learning here might be interesting as well. > > As you mention in the pdf, one can also split the index space to balance > computation. When we implement something like `pragma omp loop` we can > also balance computations across multiple GPUs as long as we get the > data movement right. > > > > 3. I have added a tentative tasks section and made a note that the > > project is open ended and things are quite fluid and may change > > significantly. > > That is good. This is a moving target and open ended task, I expect > things to be determined more clearly as we go and based on the data we > gather. > > Cheers, > Johannes > > > > Cheers Shiva > > > > > > On Mon, Mar 30, 2020 at 6:52 PM Johannes Doerfert < > > johannesdoerfert at gmail.com> wrote: > > > >> On 3/30/20 8:07 PM, Shiva Stanford wrote: > >> > 1. Thanks for the clarifications. I will stick to > >> > non-containerized OS X for now. > >> > >> Sounds good. As long as you can build it and run lit and llvm-test > >> suite tests :) > >> > >> > >> > 2. As an aside, I did try to build a Debian docker container by > >> > git > >> cloning > >> > into it and using the Dockerfile in LLVM/utils/docker as a > >> > starting > >> point: > >> > - some changes needed to updated packages (GCC in particular > >> > needs to > >> be > >> > latest) and the Debian image (Debian 9 instead of Debian 8) pretty > >> > much sets up the docker container well. But for some reason, the > >> > Ninja build tool within the CMake Generator fails. I am looking > >> > into it. Maybe I can produce a working docker workflow for others > >> > who want to build and work with LLVM in a container environment. > >> > >> Feel free to propose a fix but I'm the wrong one to talk to ;) > >> > >> > >> > 3. I have submitted the final proposal today to GSoC 2020 today > >> > after incorporating some comments and thoughts. 
When you all get a > >> > chance to review, let me know your thoughts. > >> > >> Good. Can you share the google docs with me > >> (johannesdoerfert at gmail.com)? [Or did you and I misplaced the link? > >> In that case send it again ;)] > >> > >> > >> > 4. On GPU extension, my thoughts were around what an integrated > >> > compiler like Nvidia's nvcc (GCC for CPU + PTX for GPU) does when > >> > GCC is > >> substituted > >> > with LLVM and if that arrangement can be optimized for ML passes. > >> > But I am beginning to think that structuring this problem well and > >> > doing meaningful work over the summer might be a bit difficult. > >> > >> As far as I know, neither GCC nor Clang will behave much differently > >> if they are used by nvcc than in their standalone mode. > >> > >> Having an "ML-mode" is probably a generic thing to look at. Though, > >> the "high-level" optimizations are not necessarily performed in > >> LLVM-IR. > >> > >> > >> > As mentors, do you have any thoughts on how LLVM might be > >> > integrated into a joint CPU-GPU compiler by the likes of Nvidia, > >> > Apple etc.? > >> > >> I'm unsure what you ask exactly. Clang can be used in CPU-GPU > >> compilation via Cuda, OpenCL, OpenMP offload, Sycl, ... is this it? > >> I'm personally mostly interested in generic optimizations in this > >> space but actually quite interested. Some ideas: - transfer latency > >> hiding (another GSoC project), - kernel granularity optimizations > >> (not worked being worked on yet but requires some infrastructe > >> changes that are as of now still in the making), - data "location" > >> tracking so we can "move" computation to the right device, e.g., for > >> really dependence free loops like `pragma omp loop` > >> > >> I can list more things but I'm unsure this is the direction you were > >> thinking. 
> >>
> >> Cheers,
> >>   Johannes
> >>
> >> > Best,
> >> > Shiva
> >> >
> >> > On Mon, Mar 30, 2020 at 5:30 PM Johannes Doerfert <
> >> > johannesdoerfert at gmail.com> wrote:
> >> >
> >> >> On 3/27/20 3:46 PM, Shiva Stanford wrote:
> >> >>> Hi Johannes - great we are engaging on this.
> >> >>>
> >> >>> Some responses now and some later.
> >> >>>
> >> >>> 1. When you say set up an LLVM dev environment + clang + tools
> >> >>> etc., do you mean set up the LLVM compiler code from the repo
> >> >>> and build it locally? If so, yes, this is all done from my end -
> >> >>> that is, I have built all this on my machine and compiled and
> >> >>> run a couple of function passes. I have looked at some LLVM
> >> >>> emits from clang tools but I will familiarize more. I have added
> >> >>> some small code segments, modified CMakeLists, and re-built the
> >> >>> code to get a feel for the packaging structure. Btw, is there a
> >> >>> version of a Bazel build for this? Right now, I am using OS X as
> >> >>> the SDK as Apple is the one that has adopted LLVM the most. But
> >> >>> I can switch to Linux containers to completely wall off the LLVM
> >> >>> build against any OS X system builds to prevent path obfuscation
> >> >>> and truly have a separate address space. Is there a preferable
> >> >>> environment? In any case, I am thinking of containerizing the
> >> >>> build, so OS X system paths don't interfere with include paths -
> >> >>> have you received feedback from other developers on whether the
> >> >>> include paths interfere with the OS X LLVM system build?
> >> >>
> >> >> Setup sounds good.
> >> >>
> >> >> I have never used OS X but people do and I would expect it to be
> >> >> OK.
> >> >>
> >> >> I don't think you need to worry about this right now.
> >> >>
> >> >>> 2. The Attributor pass refactoring gives some specific direction
> >> >>> as a startup project - so that's great. Let me study this pass
> >> >>> and I will get back to you with more questions.
> >> >>
> >> >> Sure.
> >> >>
> >> >>> 3. Yes, I will stick to the style guide (Baaaah...Stanford is
> >> >>> strict on code styling and so are you guys :)) for sure.
> >> >>
> >> >> For better or worse.
> >> >>
> >> >> Cheers,
> >> >>   Johannes
> >> >>
> >> >>> On Thu, Mar 26, 2020 at 9:42 AM Johannes Doerfert <
> >> >>> johannesdoerfert at gmail.com> wrote:
> >> >>>
> >> >>>> Hi Shiva,
> >> >>>>
> >> >>>> apologies for the delayed response.
> >> >>>>
> >> >>>> On 3/24/20 4:13 AM, Shiva Stanford via llvm-dev wrote:
> >> >>>> > I am a grad CS student at Stanford and wanted to engage with
> >> >>>> > EJ Park, Giorgis Georgakoudis, and Johannes Doerfert to
> >> >>>> > further develop the Machine Learning and Compiler
> >> >>>> > Optimization concept.
> >> >>>>
> >> >>>> Cool!
> >> >>>>
> >> >>>> > My background is in machine learning, cluster computing,
> >> >>>> > distributed systems, etc. I am a good C/C++ developer and
> >> >>>> > have a strong background in algorithms and data structures.
> >> >>>>
> >> >>>> Sounds good.
> >> >>>>
> >> >>>> > I am also taking an advanced compiler course this quarter at
> >> >>>> > Stanford. So I would be studying several of these topics
> >> >>>> > anyway - so I thought I might as well co-engage on the LLVM
> >> >>>> > compiler infra project.
> >> >>>>
> >> >>>> Agreed ;)
> >> >>>>
> >> >>>> > I am currently studying the background information on SCC
> >> >>>> > call graphs, dominator trees, and other global and
> >> >>>> > inter-procedural analyses to lay some groundwork on how to
> >> >>>> > tackle this optimization pass using ML models. I have run a
> >> >>>> > couple of whole-program function passes and visualized call
> >> >>>> > graphs to get familiarized with the LLVM optimization pass
> >> >>>> > setup. I have also set up and learnt the use of GDB to debug
> >> >>>> > function pass code.
> >> >>>>
> >> >>>> Very nice.
> >> >>>>
> >> >>>> > I have submitted the ML and Compiler Optimization proposal to
> >> >>>> > GSoC 2020. I have added an additional feature to enhance the
> >> >>>> > ML optimization to include crossover code to GPU and
> >> >>>> > investigate how the function call graphs can be visualized as
> >> >>>> > SCCs across CPU and GPU implementations. If the extension to
> >> >>>> > GPU is too much for a summer project, potentially we can
> >> >>>> > focus on developing a framework for studying SCCs across a
> >> >>>> > unified CPU+GPU setup and leave the coding, if feasible, to
> >> >>>> > next summer. All preliminary ideas.
> >> >>>>
> >> >>>> I haven't looked at the proposals yet (I think we can only
> >> >>>> after the deadline). TBH, I'm not sure I fully understand your
> >> >>>> extension. Also, full disclosure, the project is pretty
> >> >>>> open-ended from my side at least. I do not necessarily believe
> >> >>>> we (=LLVM) are ready for an ML-driven pass or even inference in
> >> >>>> practice. What I want is to explore the use of ML to improve
> >> >>>> the code we have, especially heuristics.
> >> >>>> We build analyses and transformations, but it is hard to
> >> >>>> combine them in a way that balances compile-time, code-size,
> >> >>>> and performance.
> >> >>>>
> >> >>>> Some high-level statements that might help to put my view into
> >> >>>> perspective:
> >> >>>>
> >> >>>> I want to use ML to identify patterns and code features that we
> >> >>>> can check for using common techniques, but when we base our
> >> >>>> decision making on these patterns or features we achieve better
> >> >>>> compile-time, code-size, and/or performance. I want to use ML
> >> >>>> to identify shortcomings in our existing heuristics, e.g.,
> >> >>>> transformation cut-off values or pass schedules. This could
> >> >>>> also mean identifying alternative (combinations of) values that
> >> >>>> perform substantially better (on some inputs).
> >> >>>>
> >> >>>> > Not sure how to proceed from here. Hence my email to this
> >> >>>> > list. Please let me know.
> >> >>>>
> >> >>>> The email to the list was a great first step. The next one
> >> >>>> usually is to set up an LLVM development and testing
> >> >>>> environment, thus LLVM + Clang + LLVM Test Suite, that you can
> >> >>>> use. It is also advised to work on a small task before the GSoC
> >> >>>> to get used to LLVM development.
> >> >>>>
> >> >>>> I don't have a really small ML "coding" task handy right now,
> >> >>>> but the project is more about experiments anyway. To get some
> >> >>>> LLVM development experience we can just take a small task in
> >> >>>> the IPO Attributor pass.
> >> >>>>
> >> >>>> One thing we need and we don't have is data. The Attributor is
> >> >>>> a fixpoint iteration framework, so the number of iterations is
> >> >>>> a pretty integral part.
> >> >>>> We have a statistics counter to determine if the number
> >> >>>> required was higher than the given threshold, but not one to
> >> >>>> determine the maximum iteration count required during
> >> >>>> compilation. It would be great if you could add that, thus a
> >> >>>> statistics counter that shows how many iterations were required
> >> >>>> until a fixpoint was found, across all invocations of the
> >> >>>> Attributor. Does this make sense? Let me know what you think
> >> >>>> and feel free to ask questions via email or on IRC.
> >> >>>>
> >> >>>> Cheers,
> >> >>>>   Johannes
> >> >>>>
> >> >>>> P.S. Check out the coding style guide and the how-to-contribute
> >> >>>> guide!
> >> >>>>
> >> >>>> > Thank you,
> >> >>>> > Shiva Badruswamy
> >> >>>> > shivastanford at gmail.com
> >> >>>> >
> >> >>>> > _______________________________________________
> >> >>>> > LLVM Developers mailing list
> >> >>>> > llvm-dev at lists.llvm.org
> >> >>>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev