Shiva Stanford via llvm-dev
2020-Mar-27 20:46 UTC
[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
Hi Johannes - great we are engaging on this. Some responses now and some later.

1. When you say set up the LLVM dev environment + clang + tools etc., do you mean set up the LLVM compiler code from the repo and build it locally? If so, yes, this is all done on my end - that is, I have built all of this on my machine and compiled and run a couple of function passes. I have looked at some of the LLVM IR that the clang tools emit, but I will familiarize myself more. I have added some small code segments, modified CMakeLists files, and rebuilt the code to get a feel for the packaging structure. Btw, is there a version of a Bazel build for this? Right now, I am using OS X as the SDK, as Apple is the vendor that has adopted LLVM the most. But I can switch to Linux containers to completely wall off the LLVM build from any OS X system builds, avoid path confusion, and truly have a separate address space. Is there a preferable environment? In any case, I am thinking of containerizing the build so OS X system paths don't interfere with include paths - have you received feedback from other developers on whether the include paths interfere with the OS X system LLVM build?

2. The Attributor pass refactoring gives some specific direction as a starter project - so that's great. Let me study this pass and I will get back to you with more questions.

3. Yes, I will stick to the style guide (Baaaah... Stanford is strict on code styling and so are you guys :)) for sure.

On Thu, Mar 26, 2020 at 9:42 AM Johannes Doerfert <johannesdoerfert at gmail.com> wrote:

> Hi Shiva,
>
> apologies for the delayed response.
>
> On 3/24/20 4:13 AM, Shiva Stanford via llvm-dev wrote:
> > I am a grad CS student at Stanford and wanted to engage with EJ Park,
> > Giorgis Georgakoudis, and Johannes Doerfert to further develop the Machine
> > Learning and Compiler Optimization concept.
>
> Cool!
>
> > My background is in machine learning, cluster computing, distributed
> > systems, etc. I am a good C/C++ developer and have a strong background in
> > algorithms and data structures.
>
> Sounds good.
>
> > I am also taking an advanced compiler course this quarter at Stanford. So I
> > would be studying several of these topics anyway - so I thought I might as
> > well co-engage on the LLVM compiler infra project.
>
> Agreed ;)
>
> > I am currently studying the background information on SCC call graphs,
> > dominator trees, and other global and inter-procedural analyses to lay some
> > groundwork on how to tackle this optimization pass using ML models. I have
> > run a couple of whole-program function passes and visualized call graphs to
> > get familiarized with the LLVM optimization pass setup. I have also set up
> > and learnt to use GDB to debug function pass code.
>
> Very nice.
>
> > I have submitted the ML and Compiler Optimization proposal to GSoC 2020. I
> > have added an additional feature to enhance the ML optimization to include
> > crossover code to GPU and to investigate how the function call graphs can be
> > visualized as SCCs across CPU and GPU implementations. If the extension to
> > GPU is too much for a summer project, we can potentially focus on
> > developing a framework for studying SCCs across a unified CPU/GPU setup
> > and leave the coding, if feasible, to next summer. All preliminary ideas.
>
> I haven't looked at the proposals yet (I think we can only do so after the
> deadline). TBH, I'm not sure I fully understand your extension. Also,
> full disclosure, the project is pretty open-ended, from my side at least.
> I do not necessarily believe we (=llvm) are ready for an ML-driven pass or
> even inference in practice. What I want is to explore the use of ML to
> improve the code we have, especially heuristics. We build analyses and
> transformations, but it is hard to combine them in a way that balances
> compile time, code size, and performance.
>
> Some high-level statements that might help to put my view into perspective:
>
> I want to use ML to identify patterns and code features that we can
> check for using common techniques, but where basing our decision making
> on these patterns or features achieves better compile time, code size,
> and/or performance.
> I want to use ML to identify shortcomings in our existing heuristics,
> e.g. transformation cut-off values or pass schedules. This could also
> mean identifying alternative (combinations of) values that perform
> substantially better (on some inputs).
>
> > Not sure how to proceed from here. Hence my email to this list. Please let
> > me know.
>
> The email to the list was a great first step. The next one usually is to
> set up an LLVM development and testing environment, thus LLVM + Clang +
> the LLVM test suite that you can use. It is also advisable to work on a small
> task before the GSoC to get used to LLVM development.
>
> I don't have a really small ML "coding" task handy right now, but the
> project is more about experiments anyway. To get some LLVM development
> experience we can just take a small task in the IPO Attributor pass.
>
> One thing we need and don't have is data. The Attributor is a
> fixpoint iteration framework, so the number of iterations is a pretty
> integral part of it. We have a statistics counter to determine whether the
> number of iterations required was higher than the given threshold, but not
> one to determine the maximum iteration count required during compilation.
> It would be great if you could add that, thus a statistics counter that
> shows how many iterations were required until a fixpoint was found across
> all invocations of the Attributor. Does this make sense? Let me know what
> you think and feel free to ask questions via email or on IRC.
>
> Cheers,
> Johannes
>
> P.S. Check out the coding style guide and the how-to-contribute guide!
>
> > Thank you
> > Shiva Badruswamy
> > shivastanford at gmail.com
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
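[Editor's sketch] To make the suggested starter task concrete, here is a minimal sketch of how such a counter could be expressed with LLVM's STATISTIC machinery. The free function, the loop body, and the threshold name are illustrative placeholders, not the actual Attributor code:

    // Hypothetical sketch, loosely modeled on llvm/lib/Transforms/IPO/Attributor.cpp.
    #include "llvm/ADT/Statistic.h"

    #define DEBUG_TYPE "attributor"

    // New counter (sketch): the largest number of fixpoint iterations any
    // single Attributor invocation needed during this compilation.
    STATISTIC(MaxFixpointIterations,
              "Maximum number of fixpoint iterations needed in a single "
              "Attributor run");

    void runAttributorSketch(/* Attributor &A, ... */) {
      unsigned IterationCounter = 0;
      bool Changed;
      do {
        ++IterationCounter;
        Changed = /* update all abstract attributes once */ false;
      } while (Changed /* && IterationCounter < SomeIterationThreshold */);

      // Record the worst case seen across all invocations;
      // Statistic::updateMax keeps the maximum of all reported values.
      MaxFixpointIterations.updateMax(IterationCounter);
    }

Statistic values are printed with -stats, so the maximum would then show up alongside the existing Attributor counters.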
Johannes Doerfert via llvm-dev
2020-Mar-31 00:29 UTC
[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
On 3/27/20 3:46 PM, Shiva Stanford wrote:
> Hi Johannes - great we are engaging on this.
>
> Some responses now and some later.
>
> 1. When you say set up the LLVM dev environment + clang + tools etc., do you
> mean set up the LLVM compiler code from the repo and build it locally? If so,
> yes, this is all done on my end - that is, I have built all of this on my
> machine and compiled and run a couple of function passes. I have looked at
> some of the LLVM IR that the clang tools emit, but I will familiarize myself
> more. I have added some small code segments, modified CMakeLists files, and
> rebuilt the code to get a feel for the packaging structure. Btw, is there a
> version of a Bazel build for this? Right now, I am using OS X as the SDK, as
> Apple is the vendor that has adopted LLVM the most. But I can switch to Linux
> containers to completely wall off the LLVM build from any OS X system builds,
> avoid path confusion, and truly have a separate address space. Is there a
> preferable environment? In any case, I am thinking of containerizing the
> build so OS X system paths don't interfere with include paths - have you
> received feedback from other developers on whether the include paths
> interfere with the OS X system LLVM build?

Setup sounds good.

I have never used OS X but people do and I would expect it to be OK.

I don't think you need to worry about this right now.

> 2. The Attributor pass refactoring gives some specific direction as a
> starter project - so that's great. Let me study this pass and I will get
> back to you with more questions.

Sure.

> 3. Yes, I will stick to the style guide (Baaaah... Stanford is strict on
> code styling and so are you guys :)) for sure.

For better or worse.

Cheers,
Johannes
Shiva Stanford via llvm-dev
2020-Mar-31 01:07 UTC
[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
1. Thanks for the clarifications. I will stick to a non-containerized OS X build for now.

2. As an aside, I did try to build a Debian docker container by git-cloning the repo into it, using the Dockerfile in llvm/utils/docker as a starting point. A few changes are needed - the packages have to be updated (GCC in particular needs to be the latest) and the base image bumped (Debian 9 instead of Debian 8) - and with those the container pretty much sets itself up. But for some reason the Ninja build invoked through the CMake generator fails. I am looking into it. Maybe I can produce a working docker workflow for others who want to build and work with LLVM in a container environment.

3. I submitted the final proposal to GSoC 2020 today after incorporating some comments and thoughts. When you all get a chance to review it, let me know your thoughts.

4. On the GPU extension, my thoughts were around what an integrated compiler like Nvidia's nvcc (GCC for CPU + PTX for GPU) does when GCC is substituted with LLVM, and whether that arrangement can be optimized for ML passes. But I am beginning to think that structuring this problem well and doing meaningful work over the summer might be a bit difficult. As mentors, do you have any thoughts on how LLVM might be integrated into a joint CPU-GPU compiler by the likes of Nvidia, Apple, etc.?

Best
Shiva
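[Editor's sketch] For anyone reproducing point 2 above, a minimal stand-alone container build might look like the following. This is not the in-tree llvm/utils/docker recipe; the base image, package list, and CMake options are illustrative assumptions:

    # Hypothetical stand-alone Dockerfile sketch (not the in-tree
    # llvm/utils/docker recipe): build LLVM + Clang with CMake and Ninja
    # inside a Debian container.
    FROM debian:9

    # Toolchain and build dependencies; the exact package set is an assumption.
    RUN apt-get update && apt-get install -y \
        build-essential cmake ninja-build python3 git && \
        rm -rf /var/lib/apt/lists/*

    # Clone the monorepo and configure a Release build of LLVM and Clang.
    RUN git clone --depth 1 https://github.com/llvm/llvm-project.git /llvm-project
    WORKDIR /llvm-project/build
    RUN cmake -G Ninja ../llvm \
          -DLLVM_ENABLE_PROJECTS="clang" \
          -DCMAKE_BUILD_TYPE=Release \
          -DCMAKE_INSTALL_PREFIX=/usr/local && \
        ninja && ninja install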