Shiva Stanford via llvm-dev
2020-Mar-31 02:28 UTC
[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
Hi Johannes:

1. Attached is the submitted PDF.

2. I have a notes section where I state: I am still unsure of the GPU extension I proposed, as I don't know how LLVM plays into the GPU crossover space the way nvcc (Nvidia's compiler, which integrates GCC and PTX) does. I don't know whether function graphs in the CPU+GPU namespaces are seamless/continuous within nvcc, or whether nvcc is just a wrapper that invokes GCC on the CPU sources and PTX on the GPU sources. So what I have said is: if there is time to investigate, we could look at this. But I am not sure I am even framing the problem statement correctly at this point.

3. I have added a tentative tasks section and made a note that the project is open ended and things are quite fluid and may change significantly.

Cheers
Shiva

On Mon, Mar 30, 2020 at 6:52 PM Johannes Doerfert <johannesdoerfert at gmail.com> wrote:

> On 3/30/20 8:07 PM, Shiva Stanford wrote:
> > 1. Thanks for the clarifications. I will stick to non-containerized OS X for now.
>
> Sounds good. As long as you can build it and run lit and llvm-test suite tests :)
>
> > 2. As an aside, I did try to build a Debian docker container by git cloning into it and using the Dockerfile in LLVM/utils/docker as a starting point: some changes to update packages (GCC in particular needs to be latest) and the Debian image (Debian 9 instead of Debian 8) pretty much set up the docker container well. But for some reason, the Ninja build tool within the CMake generator fails. I am looking into it. Maybe I can produce a working docker workflow for others who want to build and work with LLVM in a container environment.
>
> Feel free to propose a fix but I'm the wrong one to talk to ;)
>
> > 3. I have submitted the final proposal to GSoC 2020 today after incorporating some comments and thoughts. When you all get a chance to review, let me know your thoughts.
>
> Good. Can you share the google docs with me (johannesdoerfert at gmail.com)? [Or did you and I misplace the link? In that case send it again ;)]
>
> > 4. On GPU extension, my thoughts were around what an integrated compiler like Nvidia's nvcc (GCC for CPU + PTX for GPU) does when GCC is substituted with LLVM, and whether that arrangement can be optimized for ML passes. But I am beginning to think that structuring this problem well and doing meaningful work over the summer might be a bit difficult.
>
> As far as I know, neither GCC nor Clang will behave much differently if they are used by nvcc than in their standalone mode.
>
> Having an "ML-mode" is probably a generic thing to look at. Though, the "high-level" optimizations are not necessarily performed in LLVM-IR.
>
> > As mentors, do you have any thoughts on how LLVM might be integrated into a joint CPU-GPU compiler by the likes of Nvidia, Apple etc.?
>
> I'm unsure what you are asking exactly. Clang can be used in CPU-GPU compilation via CUDA, OpenCL, OpenMP offload, SYCL, ... is this it? I'm personally mostly interested in generic optimizations in this space, but actually quite interested. Some ideas:
> - transfer latency hiding (another GSoC project),
> - kernel granularity optimizations (not being worked on yet, but this requires some infrastructure changes that are as of now still in the making),
> - data "location" tracking so we can "move" computation to the right device, e.g., for really dependence-free loops like `pragma omp loop`
>
> I can list more things but I'm unsure this is the direction you were thinking of.
>
> Cheers,
> Johannes
>
> > Best
> > Shiva
> >
> > On Mon, Mar 30, 2020 at 5:30 PM Johannes Doerfert <johannesdoerfert at gmail.com> wrote:
> >
> >> On 3/27/20 3:46 PM, Shiva Stanford wrote:
> >>> Hi Johannes - great we are engaging on this.
> >>>
> >>> Some responses now and some later.
> >>>
> >>> 1. When you say set up an LLVM dev environment + clang + tools etc., do you mean set up the LLVM compiler code from the repo and build it locally? If so, yes, this is all done from my end - that is, I have built all this on my machine and compiled and run a couple of function passes. I have looked at some LLVM emits from clang tools, but I will familiarize myself more. I have added some small code segments, modified CMake lists, and re-built the code to get a feel for the packaging structure. Btw, is there a version of a Bazel build for this? Right now, I am using OS X as the SDK, as Apple is the one that has adopted LLVM the most. But I can switch to Linux containers to completely wall off the LLVM build against any OS X system builds, to prevent path obfuscation and truly have a separate address space. Is there a preferable environment? In any case, I am thinking of containerizing the build so OS X system paths don't interfere with include paths - have you received feedback from other developers on whether the include paths interfere with the OS X LLVM system build?
> >>
> >> Setup sounds good.
> >>
> >> I have never used OS X but people do and I would expect it to be OK.
> >>
> >> I don't think you need to worry about this right now.
> >>
> >>> 2. The Attributor pass refactoring gives some specific direction as a startup project - so that's great. Let me study this pass and I will get back to you with more questions.
> >>
> >> Sure.
> >>
> >>> 3. Yes, I will stick to the style guide (Baaaah...Stanford is strict on code styling and so are you guys :)) for sure.
> >>
> >> For better or worse.
> >>
> >> Cheers,
> >>
> >> Johannes
> >>
> >>> On Thu, Mar 26, 2020 at 9:42 AM Johannes Doerfert <johannesdoerfert at gmail.com> wrote:
> >>>
> >>>> Hi Shiva,
> >>>>
> >>>> apologies for the delayed response.
> >>>>
> >>>> On 3/24/20 4:13 AM, Shiva Stanford via llvm-dev wrote:
> >>>> > I am a grad CS student at Stanford and wanted to engage with EJ Park, Giorgis Georgakoudis, and Johannes Doerfert to further develop the Machine Learning and Compiler Optimization concept.
> >>>>
> >>>> Cool!
> >>>>
> >>>> > My background is in machine learning, cluster computing, distributed systems, etc. I am a good C/C++ developer and have a strong background in algorithms and data structures.
> >>>>
> >>>> Sounds good.
> >>>>
> >>>> > I am also taking an advanced compiler course this quarter at Stanford. So I would be studying several of these topics anyway - so I thought I might as well co-engage on the LLVM compiler infra project.
> >>>>
> >>>> Agreed ;)
> >>>>
> >>>> > I am currently studying the background information on SCC call graphs, dominator trees, and other global and inter-procedural analyses to lay some groundwork on how to tackle this optimization pass using ML models. I have run a couple of whole-program function passes and visualized call graphs to get familiarized with the LLVM optimization pass setup. I have also set up and learnt the use of GDB to debug function pass code.
> >>>>
> >>>> Very nice.
> >>>>
> >>>> > I have submitted the ML and Compiler Optimization proposal to GSoC 2020. I have added an additional feature to enhance the ML optimization to include crossover code to GPU and investigate how the function call graphs can be visualized as SCCs across CPU and GPU implementations. If the extension to GPU is too much for a summer project, potentially we can focus on developing a framework for studying SCCs across a unified CPU, GPU setup and leave the coding, if feasible, to next summer. All preliminary ideas.
> >>>>
> >>>> I haven't looked at the proposals yet (I think we can only after the deadline). TBH, I'm not sure I fully understand your extension. Also, full disclosure, the project is pretty open-ended from my side at least. I do not necessarily believe we (=LLVM) are ready for an ML-driven pass or even inference in practice. What I want is to explore the use of ML to improve the code we have, especially heuristics. We build analyses and transformations, but it is hard to combine them in a way that balances compile-time, code-size, and performance.
> >>>>
> >>>> Some high-level statements that might help to put my view into perspective:
> >>>>
> >>>> I want to use ML to identify patterns and code features that we can check for using common techniques, but where basing our decision making on these patterns or features achieves better compile-time, code-size, and/or performance.
> >>>> I want to use ML to identify shortcomings in our existing heuristics, e.g., transformation cut-off values or pass schedules. This could also mean identifying alternative (combinations of) values that perform substantially better (on some inputs).
> >>>>
> >>>> > Not sure how to proceed from here. Hence my email to this list. Please let me know.
> >>>>
> >>>> The email to the list was a great first step. The next one usually is to set up an LLVM development and testing environment, thus LLVM + Clang + LLVM-Test Suite, that you can use. It is also advised to work on a small task before the GSoC to get used to LLVM development.
> >>>>
> >>>> I don't have a really small ML "coding" task handy right now, but the project is more about experiments anyway. To get some LLVM development experience we can just take a small task in the IPO Attributor pass.
> >>>>
> >>>> One thing we need and don't have is data. The Attributor is a fixpoint iteration framework, so the number of iterations is a pretty integral part. We have a statistics counter to determine if the number required was higher than the given threshold, but not one to determine the maximum iteration count required during compilation. It would be great if you could add that, thus a statistics counter that shows how many iterations were required until a fixpoint was found, across all invocations of the Attributor. Does this make sense? Let me know what you think and feel free to ask questions via email or on IRC.
> >>>>
> >>>> Cheers,
> >>>> Johannes
> >>>>
> >>>> P.S. Check out the coding style guide and the how-to-contribute guide!
> >>>>
> >>>> > Thank you
> >>>> > Shiva Badruswamy
> >>>> > shivastanford at gmail.com
> >>>> >
> >>>> > _______________________________________________
> >>>> > LLVM Developers mailing list
> >>>> > llvm-dev at lists.llvm.org
> >>>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

[Attachment: Final_Proposal_MachineLearningAndCompilerOptimization.pdf (application/pdf, 40819 bytes)]
Johannes Doerfert via llvm-dev
2020-Mar-31 06:42 UTC
[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
On 3/30/20 9:28 PM, Shiva Stanford wrote:
> Hi Johannes:
>
> 1. Attached is the submitted PDF.

I thought they make you submit via gdoc, and I also thought they wanted a timeline and had other requirements. Please verify this so it's not a problem (I base this on the proposals I've seen this year and not on the information actually provided by GSoC).

> 2. I have a notes section where I state: I am still unsure of the GPU extension I proposed, as I don't know how LLVM plays into the GPU crossover space the way nvcc (Nvidia's compiler integrates GCC and PTX) does.

You can use clang as "host compiler". As mentioned before, there is clang-cuda, and OpenMP offloading also generates PTX for the GPU code.

> I don't know whether function graphs in the CPU+GPU namespaces are seamless/continuous within nvcc, or whether nvcc is just a wrapper that invokes GCC on the CPU sources and PTX on the GPU sources.

Something like that, as far as I know.

> So what I have said is: if there is time to investigate, we could look at this. But I am not sure I am even framing the problem statement correctly at this point.

As I said, I'd be very happy for you to also work on GPU related things; what exactly can be defined over the next weeks.

GPU offloading is by nature inter-procedural (take CUDA kernels), so creating the infrastructure to alter the granularity of kernels (when/where to fuse/split them) could be a task. For this it is fairly important (as far as I know now) to predict the register usage accurately. Using learning here might be interesting as well.

As you mention in the pdf, one can also split the index space to balance computation. When we implement something like `pragma omp loop`, we can also balance computations across multiple GPUs as long as we get the data movement right.

> 3. I have added a tentative tasks section and made a note that the project is open ended and things are quite fluid and may change significantly.

That is good. This is a moving target and an open-ended task; I expect things to be determined more clearly as we go and based on the data we gather.

Cheers,
Johannes

[Earlier thread quoted in full above trimmed.]
Shiva Stanford via llvm-dev
2020-Mar-31 10:22 UTC
[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
1. Draft proposals via gdoc. Final via PDF. 2. I did not see any timeline requests from GSoC but spring quarter ends June 6 or so or maybe by a week more due to Coronavirus schedule delays. Summer begins then. I will look into it some more in the morning and see what I can add to timelines. Thanks. On Mon, Mar 30, 2020 at 11:43 PM Johannes Doerfert < johannesdoerfert at gmail.com> wrote:> > On 3/30/20 9:28 PM, Shiva Stanford wrote: > > Hi Johannes: > > > > 1. Attached is the submitted PDF. > > I thought they make you submit via gdoc and I also thought they wanted a > timeline and had other requirements. Please verify this so it's not a > problem (I base this on the proposals I've seen this year and not on the > information actually provided by GSoC). > > > > 2. I have a notes section where I state: I am still unsure of the GPU > > extension I proposed as I dont know how LLVM plays into the GPU cross > over > > space like how nvcc (Nvidia's compiler integrates gcc and PTX) does. > > You can use clang as "host compiler". As mentioned before, there is > clang-cuda and OpenMP offloading also generates PTX for the GPU code. > > > > I dont know if there is a chance that function graphs in the CPU+GPU > > name spaces are seamless/continupus within nvcc or if nvcc is just a > > wrapper that invokes gcc on the cpu sources and ptx on the gpu > > sources. > > Something like that as far as I know. > > > > So what I have said is - if there is time to investigate we could > > look at this. But I am not sure I am even framing the problem > > statement correctly at this point. > > As I said, I'd be very happy for you to also work on GPU related things, > what exactly can be defined over the next weeks. > > GPU offloading is by nature inter-procedural (take CUDA kernels) so > creating the infrastructure to alter the granularity of kernels > (when/where to fuse/split them) could be a task. 
For this it is fairly > important (as far as I know now) to predict the register usage > accurately. Using learning here might be interesting as well. > > As you mention in the pdf, one can also split the index space to balance > computation. When we implement something like `pragma omp loop` we can > also balance computations across multiple GPUs as long as we get the > data movement right. > > > > 3. I have added a tentative tasks section and made a note that the > > project is open ended and things are quite fluid and may change > > significantly. > > That is good. This is a moving target and open ended task, I expect > things to be determined more clearly as we go and based on the data we > gather. > > Cheers, > Johannes > > > > Cheers Shiva > > > > > > On Mon, Mar 30, 2020 at 6:52 PM Johannes Doerfert < > > johannesdoerfert at gmail.com> wrote: > > > >> On 3/30/20 8:07 PM, Shiva Stanford wrote: > >> > 1. Thanks for the clarifications. I will stick to > >> > non-containerized OS X for now. > >> > >> Sounds good. As long as you can build it and run lit and llvm-test > >> suite tests :) > >> > >> > >> > 2. As an aside, I did try to build a Debian docker container by > >> > git > >> cloning > >> > into it and using the Dockerfile in LLVM/utils/docker as a > >> > starting > >> point: > >> > - some changes needed to updated packages (GCC in particular > >> > needs to > >> be > >> > latest) and the Debian image (Debian 9 instead of Debian 8) pretty > >> > much sets up the docker container well. But for some reason, the > >> > Ninja build tool within the CMake Generator fails. I am looking > >> > into it. Maybe I can produce a working docker workflow for others > >> > who want to build and work with LLVM in a container environment. > >> > >> Feel free to propose a fix but I'm the wrong one to talk to ;) > >> > >> > >> > 3. I have submitted the final proposal today to GSoC 2020 today > >> > after incorporating some comments and thoughts. 
When you all get a > >> > chance to review, let me know your thoughts. > >> > >> Good. Can you share the google docs with me > >> (johannesdoerfert at gmail.com)? [Or did you and I misplaced the link? > >> In that case send it again ;)] > >> > >> > >> > 4. On GPU extension, my thoughts were around what an integrated > >> > compiler like Nvidia's nvcc (GCC for CPU + PTX for GPU) does when > >> > GCC is > >> substituted > >> > with LLVM and if that arrangement can be optimized for ML passes. > >> > But I am beginning to think that structuring this problem well and > >> > doing meaningful work over the summer might be a bit difficult. > >> > >> As far as I know, neither GCC nor Clang will behave much differently > >> if they are used by nvcc than in their standalone mode. > >> > >> Having an "ML-mode" is probably a generic thing to look at. Though, > >> the "high-level" optimizations are not necessarily performed in > >> LLVM-IR. > >> > >> > >> > As mentors, do you have any thoughts on how LLVM might be > >> > integrated into a joint CPU-GPU compiler by the likes of Nvidia, > >> > Apple etc.? > >> > >> I'm unsure what you ask exactly. Clang can be used in CPU-GPU > >> compilation via Cuda, OpenCL, OpenMP offload, Sycl, ... is this it? > >> I'm personally mostly interested in generic optimizations in this > >> space but actually quite interested. Some ideas: - transfer latency > >> hiding (another GSoC project), - kernel granularity optimizations > >> (not worked being worked on yet but requires some infrastructe > >> changes that are as of now still in the making), - data "location" > >> tracking so we can "move" computation to the right device, e.g., for > >> really dependence free loops like `pragma omp loop` > >> > >> I can list more things but I'm unsure this is the direction you were > >> thinking. 
> >>
> >> Cheers,
> >>   Johannes
> >>
> >> > Best,
> >> > Shiva
> >> >
> >> > On Mon, Mar 30, 2020 at 5:30 PM Johannes Doerfert <
> >> > johannesdoerfert at gmail.com> wrote:
> >> >
> >> >> On 3/27/20 3:46 PM, Shiva Stanford wrote:
> >> >>> Hi Johannes - great we are engaging on this.
> >> >>>
> >> >>> Some responses now and some later.
> >> >>>
> >> >>> 1. When you say set up an LLVM dev environment + clang + tools
> >> >>> etc., do you mean set up the LLVM compiler code from the repo
> >> >>> and build it locally? If so, yes, this is all done from my end -
> >> >>> that is, I have built all this on my machine and compiled and
> >> >>> run a couple of function passes. I have looked at some LLVM
> >> >>> emits from clang tools but I will familiarize more. I have added
> >> >>> some small code segments, modified CMakeLists, and re-built the
> >> >>> code to get a feel for the packaging structure. Btw, is there a
> >> >>> version of a Bazel build for this? Right now, I am using OS X as
> >> >>> the SDK as Apple is the one that has adopted LLVM the most. But
> >> >>> I can switch to Linux containers to completely wall off the LLVM
> >> >>> build against any OS X system builds to prevent path obfuscation
> >> >>> and truly have a separate address space. Is there a preferable
> >> >>> environment? In any case, I am thinking of containerizing the
> >> >>> build, so OS X system paths don't interfere with include paths -
> >> >>> have you received feedback from other developers on whether the
> >> >>> include paths interfere with the OS X LLVM system build?
> >> >>
> >> >> Setup sounds good.
> >> >>
> >> >> I have never used OS X but people do and I would expect it to be
> >> >> OK.
> >> >>
> >> >> I don't think you need to worry about this right now.
> >> >>
> >> >>> 2. The Attributor pass refactoring gives some specific direction
> >> >>> as a startup project - so that's great. Let me study this pass
> >> >>> and I will get back to you with more questions.
> >> >>
> >> >> Sure.
> >> >>
> >> >>> 3. Yes, I will stick to the style guide (Baaaah...Stanford is
> >> >>> strict on code styling and so are you guys :)) for sure.
> >> >>
> >> >> For better or worse.
> >> >>
> >> >> Cheers,
> >> >>   Johannes
> >> >>
> >> >>> On Thu, Mar 26, 2020 at 9:42 AM Johannes Doerfert <
> >> >>> johannesdoerfert at gmail.com> wrote:
> >> >>>
> >> >>>> Hi Shiva,
> >> >>>>
> >> >>>> apologies for the delayed response.
> >> >>>>
> >> >>>> On 3/24/20 4:13 AM, Shiva Stanford via llvm-dev wrote:
> >> >>>> > I am a grad CS student at Stanford and wanted to engage with
> >> >>>> > EJ Park, Giorgis Georgakoudis, and Johannes Doerfert to
> >> >>>> > further develop the Machine Learning and Compiler
> >> >>>> > Optimization concept.
> >> >>>>
> >> >>>> Cool!
> >> >>>>
> >> >>>> > My background is in machine learning, cluster computing,
> >> >>>> > distributed systems, etc. I am a good C/C++ developer and
> >> >>>> > have a strong background in algorithms and data structures.
> >> >>>>
> >> >>>> Sounds good.
> >> >>>>
> >> >>>> > I am also taking an advanced compiler course this quarter at
> >> >>>> > Stanford. So I would be studying several of these topics
> >> >>>> > anyway - so I thought I might as well co-engage on the LLVM
> >> >>>> > compiler infra project.
> >> >>>>
> >> >>>> Agreed ;)
> >> >>>>
> >> >>>> > I am currently studying the background information on SCC
> >> >>>> > call graphs, dominator trees, and other global and
> >> >>>> > inter-procedural analyses to lay some groundwork on how to
> >> >>>> > tackle this optimization pass using ML models. I have run a
> >> >>>> > couple of whole-program function passes and visualized call
> >> >>>> > graphs to get familiarized with the LLVM optimization pass
> >> >>>> > setup. I have also set up and learnt the use of GDB to debug
> >> >>>> > function pass code.
> >> >>>>
> >> >>>> Very nice.
> >> >>>>
> >> >>>> > I have submitted the ML and Compiler Optimization proposal to
> >> >>>> > GSoC 2020. I have added an additional feature to enhance the
> >> >>>> > ML optimization to include crossover code to GPU and
> >> >>>> > investigate how the function call graphs can be visualized as
> >> >>>> > SCCs across CPU and GPU implementations. If the extension to
> >> >>>> > GPU is too much for a summer project, potentially we can
> >> >>>> > focus on developing a framework for studying SCCs across a
> >> >>>> > unified CPU+GPU setup and leave the coding, if feasible, to
> >> >>>> > next summer. All preliminary ideas.
> >> >>>>
> >> >>>> I haven't looked at the proposals yet (I think we can only
> >> >>>> after the deadline). TBH, I'm not sure I fully understand your
> >> >>>> extension. Also, full disclosure, the project is pretty
> >> >>>> open-ended from my side at least. I do not necessarily believe
> >> >>>> we (=LLVM) are ready for an ML-driven pass or even inference in
> >> >>>> practice. What I want is to explore the use of ML to improve
> >> >>>> the code we have, especially heuristics.
> >> >>>> We build analyses and transformations, but it is hard to
> >> >>>> combine them in a way that balances compile-time, code-size,
> >> >>>> and performance.
> >> >>>>
> >> >>>> Some high-level statements that might help to put my view into
> >> >>>> perspective:
> >> >>>>
> >> >>>> I want to use ML to identify patterns and code features that we
> >> >>>> can check for using common techniques, but when we base our
> >> >>>> decision making on these patterns or features we achieve better
> >> >>>> compile-time, code-size, and/or performance. I want to use ML
> >> >>>> to identify shortcomings in our existing heuristics, e.g.,
> >> >>>> transformation cut-off values or pass schedules. This could
> >> >>>> also mean identifying alternative (combinations of) values that
> >> >>>> perform substantially better (on some inputs).
> >> >>>>
> >> >>>> > Not sure how to proceed from here. Hence my email to this
> >> >>>> > list. Please let me know.
> >> >>>>
> >> >>>> The email to the list was a great first step. The next one
> >> >>>> usually is to set up an LLVM development and testing
> >> >>>> environment, thus LLVM + Clang + LLVM Test Suite, that you can
> >> >>>> use. It is also advised to work on a small task before the GSoC
> >> >>>> to get used to LLVM development.
> >> >>>>
> >> >>>> I don't have a really small ML "coding" task handy right now,
> >> >>>> but the project is more about experiments anyway. To get some
> >> >>>> LLVM development experience we can just take a small task in
> >> >>>> the IPO Attributor pass.
> >> >>>>
> >> >>>> One thing we need and we don't have is data. The Attributor is
> >> >>>> a fixpoint iteration framework, so the number of iterations is
> >> >>>> a pretty integral part.
> >> >>>> We have a statistics counter to determine if the number
> >> >>>> required was higher than the given threshold, but not one to
> >> >>>> determine the maximum iteration count required during
> >> >>>> compilation. It would be great if you could add that, thus a
> >> >>>> statistics counter that shows how many iterations were required
> >> >>>> until a fixpoint was found, across all invocations of the
> >> >>>> Attributor. Does this make sense? Let me know what you think
> >> >>>> and feel free to ask questions via email or on IRC.
> >> >>>>
> >> >>>> Cheers,
> >> >>>>   Johannes
> >> >>>>
> >> >>>> P.S. Check out the coding style guide and the how-to-contribute
> >> >>>> guide!
> >> >>>>
> >> >>>> > Thank you,
> >> >>>> > Shiva Badruswamy
> >> >>>> > shivastanford at gmail.com
> >> >>>> >
> >> >>>> > _______________________________________________
> >> >>>> > LLVM Developers mailing list
> >> >>>> > llvm-dev at lists.llvm.org
> >> >>>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev