thr3ads.net - llvm dev - [llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations [Mar 2020]

Shiva Stanford via llvm-dev

2020-Mar-24 09:13 UTC

[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations

I am a grad CS student at Stanford and wanted to engage with EJ Park,
Giorgis Georgakoudis, Johannes Doerfert to further develop the Machine
Learning and Compiler Optimization concept.

My background is in machine learning, cluster computing, distributed
systems, etc. I am a good C/C++ developer and have a strong background in
algorithms and data structures.

I am also taking an advanced compiler course this quarter at Stanford. So I
would be studying several of these topics anyways - so I thought I might as
well co-engage on the LLVM compiler infra project.

I am currently studying the background information on SCC call graphs,
dominator trees, and other global and inter-procedural analyses to lay some
groundwork on how to tackle this optimization pass using ML models. I have
run a couple of all program function passes and visualized call graphs to
get familiar with the LLVM optimization pass setup. I have also set up
GDB and learned to use it to debug function pass code.

I have submitted the ML and Compiler Optimization proposal to GSoC 2020. I
have added an additional feature to enhance the ML optimization to include
crossover code to GPU and investigate how the function call graphs can be
visualized as SCCs across CPU and GPU implementations. If the extension to
GPU is too much for a summer project, potentially we can focus on
developing a framework for studying SCCs across a unified CPU/GPU setup
and leave the coding, if feasible, to next summer. All preliminary ideas.

Not sure how to proceed from here. Hence my email to this list. Please let
me know.

Thank you
Shiva Badruswamy
shivastanford at gmail.com

Johannes Doerfert via llvm-dev

2020-Mar-26 16:41 UTC

Hi Shiva,

apologies for the delayed response.

On 3/24/20 4:13 AM, Shiva Stanford via llvm-dev wrote:
 > I am a grad CS student at Stanford and wanted to engage with EJ Park,
 > Giorgis Georgakoudis, Johannes Doerfert to further develop the Machine
 > Learning and Compiler Optimization concept.

Cool!

 > My background is in machine learning, cluster computing, distributed
 > systems, etc. I am a good C/C++ developer and have a strong background in
 > algorithms and data structures.

Sounds good.

 > I am also taking an advanced compiler course this quarter at Stanford. So I
 > would be studying several of these topics anyways - so I thought I might as
 > well co-engage on the LLVM compiler infra project.

Agreed ;)

 > I am currently studying the background information on SCC call graphs,
 > dominator trees, and other global and inter-procedural analyses to lay some
 > groundwork on how to tackle this optimization pass using ML models. I have
 > run a couple of all program function passes and visualized call graphs to
 > get familiar with the LLVM optimization pass setup. I have also set up
 > GDB and learned to use it to debug function pass code.

Very nice.

 > I have submitted the ML and Compiler Optimization proposal to GSoC 2020. I
 > have added an additional feature to enhance the ML optimization to include
 > crossover code to GPU and investigate how the function call graphs can be
 > visualized as SCCs across CPU and GPU implementations. If the extension to
 > GPU is too much for a summer project, potentially we can focus on
 > developing a framework for studying SCCs across a unified CPU/GPU setup
 > and leave the coding, if feasible, to next summer. All preliminary ideas.

I haven't looked at the proposals yet (I think we can only do so after the
deadline). TBH, I'm not sure I fully understand your extension. Also,
full disclosure, the project is pretty open-ended from my side at least.
I do not necessarily believe we (=llvm) are ready for an ML-driven pass or
even inference in practice. What I want is to explore the use of ML to
improve the code we have, especially heuristics. We build analyses and
transformations but it is hard to combine them in a way that balances
compile-time, code-size, and performance.

Some high-level statements that might help to put my view into
perspective:

I want to use ML to identify patterns and code features that we can
check for using common techniques and that, when we base our decision
making on them, give us better compile-time, code-size, and/or
performance.

I want to use ML to identify shortcomings in our existing heuristics,
e.g. transformation cut-off values or pass schedules. This could also
mean identifying alternative (combinations of) values that perform
substantially better (on some inputs).
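
As a rough illustration of the second statement, the sketch below sweeps candidate values for a single cut-off and reports the inputs on which some candidate beats the default by a margin. Everything in it is an assumption made for illustration: `find_better_cutoffs`, the `measure` callback, the `min_gain` margin, and the toy cost model are hypothetical and do not correspond to any LLVM option or API. In a real experiment `measure` would run the compiler and record compile-time, code-size, or run time.

```python
def find_better_cutoffs(inputs, candidates, default, measure, min_gain=0.05):
    """Report inputs where some candidate cut-off beats the default.

    `measure(item, cutoff)` is a hypothetical callback returning a positive
    cost (lower is better), e.g. compile time or code size from a real run.
    """
    better = {}
    for item in inputs:
        baseline = measure(item, default)
        best = min(candidates, key=lambda cutoff: measure(item, cutoff))
        # Relative improvement of the best candidate over the default.
        gain = (baseline - measure(item, best)) / baseline
        if gain >= min_gain:
            better[item] = (best, gain)
    return better


# Toy cost model, invented purely so the sketch runs: each input has a
# made-up "ideal" cut-off and the cost grows with the distance from it.
IDEAL = {"small": 200, "large": 900}


def toy_cost(item, cutoff):
    return 100 + abs(cutoff - IDEAL[item])


result = find_better_cutoffs(
    inputs=["small", "large"],
    candidates=[50, 225, 500, 1000],
    default=225,
    measure=toy_cost,
)
print(result)
```

Inputs that show up in `result` are the ones where the default value looks like a shortcoming worth studying. The features that distinguish them from the other inputs are what a model would then have to learn.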

 > Not sure how to proceed from here. Hence my email to this list. Please let
 > me know.

The email to the list was a great first step. The next one usually is to
set up an LLVM development and testing environment, i.e. LLVM + Clang +
LLVM-Test Suite that you can use. It is also advised to work on a small
task before the GSoC to get used to LLVM development.

I don't have a really small ML "coding" task handy right now but the
project is more about experiments anyway. To get some LLVM development
experience we can just take a small task in the IPO Attributor pass.

One thing we need and we don't have is data. The Attributor is a
fixpoint iteration framework so the number of iterations is a pretty
integral part. We have a statistics counter to determine if the number
required was higher than the given threshold but not one to determine
the maximum iteration count required during compilation. It would be
great if you could add that, i.e. a statistics counter that shows how
many iterations were required until a fixpoint was found across all
invocations of the Attributor. Does this make sense? Let me know what
you think and feel free to ask questions via email or on IRC.
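
The bookkeeping described above can be sketched outside of LLVM as follows. This is only a toy model under stated assumptions, not Attributor code: `IterationStats`, `run_to_fixpoint`, and `make_step` are hypothetical names, and the "analysis" is a simple reachability propagation chosen because it converges after a few rounds. The existing counter is modelled by `exceeded_threshold`; the requested one is `max_iterations`.

```python
from dataclasses import dataclass


@dataclass
class IterationStats:
    """Hypothetical stand-in for statistics counters; not an LLVM API."""
    invocations: int = 0
    exceeded_threshold: int = 0  # runs that hit the limit without converging
    max_iterations: int = 0      # the requested counter: maximum over all runs


def run_to_fixpoint(state, step, threshold, stats):
    """Apply `step` until the state stops changing or `threshold` is hit."""
    stats.invocations += 1
    iterations = 0
    converged = False
    while iterations < threshold:
        new_state = step(state)
        iterations += 1
        if new_state == state:
            converged = True
            break
        state = new_state
    if not converged:
        stats.exceeded_threshold += 1
    # For runs that did not converge this records the threshold itself.
    stats.max_iterations = max(stats.max_iterations, iterations)
    return state


def make_step(edges):
    """Build a step function that adds the successors of all reached nodes."""
    def step(reached):
        return reached | {succ for node in reached for succ in edges.get(node, ())}
    return step


stats = IterationStats()
chain = {"a": ["b"], "b": ["c"], "c": ["d"]}
run_to_fixpoint(frozenset({"a"}), make_step(chain), threshold=32, stats=stats)
run_to_fixpoint(frozenset({"x"}), make_step({"x": ["y"]}), threshold=32, stats=stats)
print(stats)
```

The maximum is updated once per invocation rather than once per iteration, so the counter reflects the worst run seen during the whole compilation, which is the number needed to judge whether the threshold is set sensibly.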

Cheers,
   Johannes

P.S. Check out the coding style guide and the how to contribute guide!


Shiva Stanford via llvm-dev

2020-Mar-27 20:46 UTC

Hi Johannes - great we are engaging on this.

Some responses now and some later.

1. When you say set up an LLVM dev environment + Clang + tools etc., do you
mean check out the LLVM code from the repo and build it locally? If so,
yes, this is all done on my end - that is, I have built all this on my
machine and compiled and run a couple of function passes. I have looked at
some of the LLVM output emitted by the clang tools but I will familiarize
myself more. I have added some small code segments, modified the CMakeLists
files and rebuilt the code to get a feel for the packaging structure. Btw,
is there a Bazel build for this? Right now, I am using OS X as the SDK as
Apple is the one that has adopted LLVM the most. But I can switch to Linux
containers to completely wall off the LLVM build from any OS X system
builds, to prevent path conflicts and have a truly separate environment. Is
there a preferable environment? In any case, I am thinking of containerizing
the build, so OS X system paths don't interfere with include paths - have
you received feedback from other developers on whether the include paths
interfere with the OS X system LLVM build?

2. The Attributor pass task gives some specific direction as a
starter project - so that's great. Let me study this pass and I will get
back to you with more questions.

3. Yes, I will stick to the style guide (Baaaah...Stanford is strict on
code styling and so are you guys :)) for sure.
