Michael Kruse via llvm-dev
2021-Feb-24 02:42 UTC
[llvm-dev] [Q] What can drive compiler performance improvements in the future?
To add to Stefanos' list, I think autotuning would be another point, since at compile time it is unknown with which parameters a program will be invoked, and cost heuristics as in 1) cannot model the entire architecture. Ideally, reoptimization using information collected at runtime would be done transparently by a JIT, as in Chris Lattner's original master's thesis [9].

Stefanos' items 1)-3) would be possible, at least for loop nests, using a framework that I outlined in [7].

[9] https://llvm.org/pubs/2004-01-30-CGO-LLVM.html
[7] https://youtu.be/zHHUh0c5wig

On Mon, Feb 22, 2021 at 19:43, Stefanos Baziotis via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hi Denis,
>
> Looking forward to your talk at LLVM-CGO!
>
> Here are some directions that I have seen lately:
>
> 1) "Unconstrained" Optimization
>
> Currently, optimization passes use a pre-determined series of steps, so optimizations are inherently constrained in how big the leaps made by the transformations can be. On the other hand, research such as STOKE [1] has shown that a "dumber" but unconstrained optimizer can radically change even the very algorithm used. To explain the "dumber" but unconstrained part, the algorithm used to optimize the program is literally:
>
> - Start with a program (or no program, in which case the program is synthesized)
> - Make a random change to the program
> - Compute a cost (whose specifics deserve a big discussion, but they are not the central point here; the first pointer at the end is related, though)
> - If the cost is better, keep the change
> - Otherwise, keep the change with some probability
> - Repeat
>
> This resulted in great improvements to the programs, with acceptable compilation times.
>
> 2) Automatic Parallelization Revival
>
> Automatic parallelization is thought to have died, but in the last couple of years a group at Princeton has shown some promising improvements, specifically with Perspective [2].
> I think this is a great step forward, as it obtained a _23.0x_ speedup for 12 general-purpose C/C++ programs (SPEC, IIRC) running on a 28-core shared-memory commodity machine. I would urge you to take a closer look at it, since the infrastructure is built on top of LLVM.
>
> Here's some related work [3] trying to revive automatic parallelization from a different perspective (pun not intended).
>
> 3) Decoupling Transformations and Cost Modeling
>
> An important problem in today's compilers, I think, is that cost is baked into the transformations (and it's not even clear how it is computed).
>
> The result of this is that even if you had a perfect oracle, which always knew the perfect transformations to perform, there would simply be no way to instruct the compiler to perform the sequence. So, my personal opinion is that in the years to come, there will be an effort to separate transformations into their own dedicated, fine-grained modules (as opposed to the monolithic entities they are now, i.e., passes). This in turn can enable machine-learning models, which will decide _what_ has to happen and then use the fine-grained APIs of the transformations to make it happen.
>
> (I think this is closely related to what Mircea said above.)
>
> --- Random pointers ---
>
> * The DeepCompiler [4] project at MIT has made significant progress in predicting the performance of x86 code.
> * Alex Aiken's opinion on the future of compilers [5]
>
> Disclaimer: This is definitely not an exhaustive list!
> [1] https://github.com/StanfordPL/stoke
> [2] https://liberty.princeton.edu/Projects/AutoPar/Perspective/
> [3] https://www.youtube.com/watch?v=8B25HQeJ0Ms
> [4] https://www.deep-compiler.org/
> [5] https://youtu.be/ob0nfNr2FLc?t=156
>
> On Tue, Feb 23, 2021 at 2:57 AM, Mircea Trofin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> On Mon, Feb 22, 2021 at 4:50 PM Denis Bakhvalov via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>
>>> Hello,
>>>
>>> I'll be giving a short presentation at the LLVM performance workshop soon, and I want to touch on the topic of future performance improvements. I decided to ask the community: what can drive performance improvements in a classic C++ LLVM compiler CPU backend in the future? If I summarize all the thoughts and opinions, I think it would make for an interesting discussion.
>>>
>>> There is already a body of research on the topic, including [1], which talks about superoptimizers, but maybe somebody has some interesting new ideas.
>>> In particular, I'm interested to hear thoughts on the following things:
>>> 1. How big is the performance headroom in existing LLVM optimization passes?
>>> 2. I think PGO can play a bigger role in the future. I see the benefits of more optimizations being guided by profiling data. For example, there is potential for intelligent injection of memory prefetching hints based on HW telemetry data on modern Intel CPUs. This HW telemetry data allows finding memory accesses that miss in caches and estimating the prefetch window (in cycles). Using this data, the compiler can determine the place for a prefetch hint. Obviously, there are lots of limitations, but it's just a thought. BTW, the same can be done for PGO-driven branch-to-cmov conversion (fighting branch mispredictions).
>>> 3. ML opportunities in compiler tooling. For example, code similarity analysis [2][3] opens a wide range of opportunities, e.g.
>>> build a recommendation system that will suggest a better-performing code sequence.
>>
>> On this, also: replacing hand-crafted heuristics with machine-learned policies, for those passes that are heuristics-driven, like inlining, regalloc, instruction selection, etc. Same for cost models.
>>
>>> Please also share any thoughts you have that are not on this list.
>>>
>>> If this topic was discussed in the past, sorry, and please send links to those discussions.
>>>
>>> -Denis
>>> https://easyperf.net
>>>
>>> [1]: https://arxiv.org/abs/1809.02161
>>> [2]: https://doi.org/10.1145/3360578
>>> [3]: https://arxiv.org/abs/2006.05265
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Stefanos Baziotis via llvm-dev
2021-Feb-24 13:15 UTC
[llvm-dev] [Q] What can drive compiler performance improvements in the future?
Hi everyone,

1) is already doing autotuning (it's a hybrid between a static cost model and actual runs on the target; you can see more here [1]). But what I tried to convey, at least from my perspective and that of one of the authors (Alex Aiken), is that it is an example of a bigger idea: that we don't even code the transformation in the classic sense; we make "dumb" optimizers which, incidentally, are also unconstrained.

Parallelization in 2) is based on different ideas than a loop transformation framework where we try multiple transformations. The core idea is to remove dependence edges based on speculation in order to enable parallelization. To put it differently, it is dependence-centric, where we don't have many transformations to perform (actually, pretty much just one: parallelization), as opposed to transformation-centric, where we care about the best sequence of, e.g., loop transformations to apply (and dependences are the means to the transformations, not the goal).

That is not to say that the framework you mentioned, Michael, is not great. It's just that, AFAIU, the core ideas are different (which is great for pluralism :)). IMHO, this framework certainly is great and, in fact, it ties nicely with 3). I would argue that it is an important step for loop optimizations in LLVM, whether it is later used for autotuning or not. (FWIW, I'm working on 3) from a different angle and hopefully we'll be able to make the work public soon :))

Best,
Stefanos

[1] www.youtube.com/watch?v=rZFeTTFp7x4

On Wed, Feb 24, 2021 at 4:42 AM, Michael Kruse <llvmdev at meinersbur.de> wrote:
> To add to Stefanos' list, I think autotuning would be another point, since at compile time it is unknown with which parameters a program will be invoked, and cost heuristics as in 1) cannot model the entire architecture. Ideally, reoptimization using information collected at runtime would be done transparently by a JIT, as in Chris Lattner's original master's thesis.