thr3ads.net - llvm dev - [llvm-dev] Your help needed: List of LLVM Open Projects 2017 [Jan 2017]

Mehdi Amini via llvm-dev

2017-Jan-16 22:07 UTC

[llvm-dev] Your help needed: List of LLVM Open Projects 2017

> On Jan 16, 2017, at 1:47 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
> On Mon, Jan 16, 2017 at 1:25 PM, Davide Italiano <davide at freebsd.org> wrote:
> On Mon, Jan 16, 2017 at 12:31 PM, Sean Silva via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > Do we have any open projects on LLD?
> >
> > I know we usually try to avoid any big "projects" and mainly add/fix things
> > in response to user needs, but just wondering if somebody has any ideas.
> >
> 
> I'm not particularly active in lld anymore, but the last big item I'd
> like to see implemented is Pettis-Hansen layout.
> http://perso.ensta-paristech.fr/~bmonsuez/Cours/B6-4/Articles/papers15.pdf
> (mainly because it improves performances of the final executable).
> GCC/gold have an implementation of the algorithm that can be used as
> base. I'll expand if anybody is interested.
> Side note: I'd like to propose a couple of llvm projects as well, I'll
> sit down later today and write them.
> 
I’m not sure: can you confirm that this kind of layout optimization on ELF
requires -ffunction-sections?

Also, for clang on OSX, the best layout we could get is to order functions in
the order in which they get executed at runtime.
 
> For FullLTO it is conceptually pretty easy to get profile data we need for
> this, but I'm not sure about the ThinLTO case.
> 
> Teresa, Mehdi,
> 
> Are there any plans (or things already working!) for getting profile data
> from ThinLTO in a format that the linker can use for code layout? I assume
> that profile data is being used already to guide importing, so it may just
> be a matter of siphoning that off.

I’m not sure what kind of “profile information” is needed, and what makes it
easier for MonolithicLTO compared to ThinLTO?

> Or maybe that layout code should be inside LLVM; maybe part of the general
> LTO interface? It looks like the current gcc plugin calls back into gcc for
> the actual layout algorithm itself (function call
> find_pettis_hansen_function_layout) rather than the reordering logic living
> in the linker:
> https://android.googlesource.com/toolchain/gcc/+/3f73d6ef90458b45bbbb33ef4c2b174d4662a22d/gcc-4.6/function_reordering_plugin/function_reordering_plugin.c

I was thinking about this: could this be done by reorganizing the module itself
for LTO?

That wouldn’t help non-LTO and ThinLTO though.

— 
Mehdi

> 
> -- Sean Silva
>  
> 
> --
> Davide
> 
> "There are no solved problems; there are only problems that are more
> or less solved" -- Henri Poincare

Davide Italiano via llvm-dev

2017-Jan-16 22:31 UTC

On Mon, Jan 16, 2017 at 2:07 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> On Jan 16, 2017, at 1:47 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
> On Mon, Jan 16, 2017 at 1:25 PM, Davide Italiano <davide at freebsd.org> wrote:
>>
>> On Mon, Jan 16, 2017 at 12:31 PM, Sean Silva via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> > Do we have any open projects on LLD?
>> >
>> > I know we usually try to avoid any big "projects" and mainly add/fix
>> > things in response to user needs, but just wondering if somebody has any
>> > ideas.
>> >
>>
>> I'm not particularly active in lld anymore, but the last big item I'd
>> like to see implemented is Pettis-Hansen layout.
>> http://perso.ensta-paristech.fr/~bmonsuez/Cours/B6-4/Articles/papers15.pdf
>> (mainly because it improves performances of the final executable).
>> GCC/gold have an implementation of the algorithm that can be used as
>> base. I'll expand if anybody is interested.
>> Side note: I'd like to propose a couple of llvm projects as well, I'll
>> sit down later today and write them.
>
>
>
> I’m not sure, can you confirm that such layout optimization on ELF requires
> -ffunction-sections?
>
For the non-LTO case, I think so.

> Also, for clang on OSX the best layout we could get is to order functions in
> the order in which they get executed at runtime.
>
That's what we already do for lld. We collect an order file (by running a
profiler) and pass it to the linker, which lays out functions accordingly.
This is to improve startup time for a class of startup-time-sensitive
operations. The algorithm proposed by Pettis (allegedly) aims to
reduce TLB misses by laying out hot functions (or functions that are
likely to be called together) near each other in the final binary.
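
To make the "order of first execution" idea concrete, here is a minimal
sketch of how an order file could be derived from a runtime trace. The
function name `orderByFirstExecution` and the trace representation (a plain
list of function names in the order they were entered) are assumptions made
for illustration; this is not how lld or any profiler actually stores its
data.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: Trace lists function names in the order they were
// entered at runtime, with repeats. The result keeps only the first
// occurrence of each name, i.e. the order of first execution.
std::vector<std::string>
orderByFirstExecution(const std::vector<std::string> &Trace) {
  std::unordered_set<std::string> Seen;
  std::vector<std::string> Order;
  for (const std::string &Fn : Trace)
    if (Seen.insert(Fn).second) // true only the first time Fn is seen
      Order.push_back(Fn);
  return Order;
}
```

Writing the returned names one per line would give a file in the spirit of
the order file mentioned above; functions that never appear in the trace
are simply left wherever the linker would otherwise put them.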
>
> For FullLTO it is conceptually pretty easy to get profile data we need for
> this, but I'm not sure about the ThinLTO case.
>
> Teresa, Mehdi,
>
> Are there any plans (or things already working!) for getting profile data
> from ThinLTO in a format that the linker can use for code layout? I assume
> that profile data is being used already to guide importing, so it may just
> be a matter of siphoning that off.
>
>
> I’m not sure what kind of “profile information” is needed, and what makes it
> easier for MonolithicLTO compared to ThinLTO?
>
> Or maybe that layout code should be inside LLVM; maybe part of the general
> LTO interface? It looks like the current gcc plugin calls back into gcc for
> the actual layout algorithm itself (function call
> find_pettis_hansen_function_layout) rather than the reordering logic living
> in the linker:
> https://android.googlesource.com/toolchain/gcc/+/3f73d6ef90458b45bbbb33ef4c2b174d4662a22d/gcc-4.6/function_reordering_plugin/function_reordering_plugin.c
>
>
> I was thinking about this: could this be done by reorganizing the module
> itself for LTO?
>
> That wouldn’t help non-LTO and ThinLTO though.

This is a dimension that I think can be explored. The fact that it
wouldn't help with other modes of operation is completely orthogonal,
in particular until it's proven that this kind of optimization makes
sense with ThinLTO (and if it doesn't, it can be an optimization run
only during full LTO).

-- 
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare

Sean Silva via llvm-dev

2017-Jan-16 23:24 UTC

On Mon, Jan 16, 2017 at 2:07 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> On Jan 16, 2017, at 1:47 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
> On Mon, Jan 16, 2017 at 1:25 PM, Davide Italiano <davide at freebsd.org>
> wrote:
>
>> On Mon, Jan 16, 2017 at 12:31 PM, Sean Silva via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> > Do we have any open projects on LLD?
>> >
>> > I know we usually try to avoid any big "projects" and mainly add/fix
>> > things in response to user needs, but just wondering if somebody has any
>> > ideas.
>> >
>>
>> I'm not particularly active in lld anymore, but the last big item I'd
>> like to see implemented is Pettis-Hansen layout.
>> http://perso.ensta-paristech.fr/~bmonsuez/Cours/B6-4/Articles/papers15.pdf
>> (mainly because it improves performances of the final executable).
>> GCC/gold have an implementation of the algorithm that can be used as
>> base. I'll expand if anybody is interested.
>> Side note: I'd like to propose a couple of llvm projects as well, I'll
>> sit down later today and write them.
>>
>
>
> I’m not sure, can you confirm that such layout optimization on ELF
> requires -ffunction-sections?
>
In order for a standard ELF linker to safely be able to reorder sections at
function granularity, -ffunction-sections would be required. This isn't a
problem during LTO since the code generation is set up by the linker :)

>
> Also, for clang on OSX the best layout we could get is to order functions
> in the order in which they get executed at runtime.
>
What the optimal layout may be for given apps is a bit of a separate
question. Right now we're mostly talking about how to plumb everything
together so that we can do the reordering of the final executable.

In fact, standard ELF linking semantics generally require input sections to
be concatenated in command line order (this is e.g. how .init_array/.ctors
build up their arrays of pointers to initializers; a crt*.o file at the
beginning/end has a sentinel value and so the order matters). So the linker
will generally need blessing from the compiler to do most sorts of
reorderings as far as I'm aware.

Other signals besides profile info, such as a startup trace, might be
useful too, and we should make sure we can plug that into the design.

My understanding of the clang on OSX case is based on a comparison of the
`form_by_*` functions in clang/utils/perf-training/perf-helper.py which
offer a relatively simple set of algorithms, so I think the jury is still
out on the best approach. (That script also uses a data collection method
that is not part of LLVM's usual instrumentation or sampling workflows for
PGO, so we may not be able to provide the same signals out of the box as
part of our standard offering in the compiler.)

I think that once we have this ordering capability integrated more deeply
into the compiler, we'll be able to evaluate more complicated algorithms
like Pettis-Hansen, have access to signals like global profile info, do
interesting call graph analyses, etc. to find interesting approaches.

>
> For FullLTO it is conceptually pretty easy to get profile data we need for
> this, but I'm not sure about the ThinLTO case.
>
> Teresa, Mehdi,
>
> Are there any plans (or things already working!) for getting profile data
> from ThinLTO in a format that the linker can use for code layout? I assume
> that profile data is being used already to guide importing, so it may just
> be a matter of siphoning that off.
>
>
> I’m not sure what kind of “profile information” is needed, and what makes
> it easier for MonolithicLTO compared to ThinLTO?
>
For MonolithicLTO I had in mind that a simple implementation would be:
```
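// Filled in by the pass with function names in the desired layout order.
std::vector<std::string> Ordering;
// LayoutModulePass and addPassToLTOPipeline are placeholder names in this sketch.
auto Pass = std::make_unique<LayoutModulePass>(&Ordering);
addPassToLTOPipeline(std::move(Pass));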
```

The module pass would just query the profile data directly on IR
data structures and get the order out. This would require very little
"plumbing".
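
As a rough illustration of what "get the order out" could mean, here is a
minimal, LLVM-free sketch. `FunctionProfile` and `computeLayoutOrder` are
hypothetical names, not LLVM APIs; a real pass would presumably read entry
counts from the IR rather than from a plain struct. Sorting hottest-first is
only one possible policy, picked here because it is the simplest.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for per-function profile data read off the IR.
struct FunctionProfile {
  std::string Name;
  uint64_t EntryCount; // how many times the function was entered
};

// Returns function names ordered hottest-first. std::stable_sort keeps the
// original (module) order for functions with equal counts, so the result is
// deterministic.
std::vector<std::string>
computeLayoutOrder(std::vector<FunctionProfile> Profiles) {
  std::stable_sort(Profiles.begin(), Profiles.end(),
                   [](const FunctionProfile &A, const FunctionProfile &B) {
                     return A.EntryCount > B.EntryCount;
                   });
  std::vector<std::string> Ordering;
  Ordering.reserve(Profiles.size());
  for (const FunctionProfile &P : Profiles)
    Ordering.push_back(P.Name);
  return Ordering;
}
```

The resulting vector plays the role of `Ordering` in the snippet above: the
linker would consume it after the LTO pipeline has run.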

>
> Or maybe that layout code should be inside LLVM; maybe part of the general
> LTO interface? It looks like the current gcc plugin calls back into gcc for
> the actual layout algorithm itself (function call
> find_pettis_hansen_function_layout) rather than the reordering logic
> living in the linker:
> https://android.googlesource.com/toolchain/gcc/+/3f73d6ef90458b45bbbb33ef4c2b174d4662a22d/gcc-4.6/function_reordering_plugin/function_reordering_plugin.c
>
>
> I was thinking about this: could this be done by reorganizing the module
> itself for LTO?
>
For MonolithicLTO that's another simple approach.

>
> That wouldn’t help non-LTO and ThinLTO though.
>
I think we should ideally aim for something that works uniformly for
Monolithic and Thin. For example, GCC emits special sections containing the
profile data and the linker just reads those sections; something analogous
in LLVM would just happen in the backend and be common to Monolithic and
Thin. If ThinLTO already has profile summaries in some nice form though, it
may be possible to bypass this.

Another advantage of using special sections in the output like GCC does is
that you don't actually need LTO at all to get the function reordering. The
profile data passed to the compiler during per-TU compilation can be
lowered into the same kind of annotations. (though LTO and function
ordering are likely to go hand-in-hand most often for peak-performance
builds).

-- Sean Silva

>
> —
> Mehdi
>
>
>
> -- Sean Silva
>
>
>>
>> --
>> Davide
>>
>> "There are no solved problems; there are only problems that are
more
>> or less solved" -- Henri Poincare
>
>

Sean Silva via llvm-dev

2017-Jan-16 23:32 UTC

On Mon, Jan 16, 2017 at 2:31 PM, Davide Italiano <davide at freebsd.org>
wrote:
> On Mon, Jan 16, 2017 at 2:07 PM, Mehdi Amini <mehdi.amini at apple.com>
> wrote:
> >
> > On Jan 16, 2017, at 1:47 PM, Sean Silva <chisophugis at gmail.com> wrote:
> >
> > On Mon, Jan 16, 2017 at 1:25 PM, Davide Italiano <davide at freebsd.org>
> > wrote:
> >>
> >> On Mon, Jan 16, 2017 at 12:31 PM, Sean Silva via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >> > Do we have any open projects on LLD?
> >> >
> >> > I know we usually try to avoid any big "projects" and mainly add/fix
> >> > things in response to user needs, but just wondering if somebody has
> >> > any ideas.
> >> >
> >>
> >> I'm not particularly active in lld anymore, but the last big item I'd
> >> like to see implemented is Pettis-Hansen layout.
> >> http://perso.ensta-paristech.fr/~bmonsuez/Cours/B6-4/Articles/papers15.pdf
> >> (mainly because it improves performances of the final executable).
> >> GCC/gold have an implementation of the algorithm that can be used as
> >> base. I'll expand if anybody is interested.
> >> Side note: I'd like to propose a couple of llvm projects as well, I'll
> >> sit down later today and write them.
> >
> > I’m not sure, can you confirm that such layout optimization on ELF
> > requires -ffunction-sections?
> >
>
> For the non-LTO case, I think so.
>
> > Also, for clang on OSX the best layout we could get is to order
> > functions in the order in which they get executed at runtime.
> >
>
> That's what we already do for lld. We collect an order file (by running a
> profiler) and pass it to the linker, which lays out functions accordingly.
> This is to improve startup time for a class of startup-time-sensitive
> operations. The algorithm proposed by Pettis (allegedly) aims to
> reduce TLB misses by laying out hot functions (or functions that are
> likely to be called together) near each other in the final binary.
>
IIRC from when I looked at the paper a while ago, it is mostly just a
"Huffman tree construction" type algorithm (agglomerating based on highest
probability) and assumes that if two functions are hot then they are likely
to be needed together. This is not always the case.

E.g. consider a server that accepts RPC requests and based on those
requests either does Foo or Bar which are largely disjoint. It's entirely
possible for the top two functions of the profile to be one in Foo and one
in Bar, but laying them out near each other doesn't make sense since there
is never locality (for a given RPC, either Foo or Bar gets run). A static
call graph analysis can provide the needed signals to handle this case
better.
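
A minimal sketch of the agglomerative idea, under loudly stated assumptions:
this is a simplified greedy chain merge driven by call-graph edge weights,
not the full procedure from the paper (it does not re-sum weights between
merged groups or choose how to orient chains when joining them). `CallEdge`
and `mergeChains` are hypothetical names introduced only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical input: an undirected call-graph edge with a profile weight.
struct CallEdge {
  std::string A, B;
  uint64_t Weight;
};

// Greedy agglomeration: visit edges from heaviest to lightest and append the
// chain containing E.B to the chain containing E.A. Returns the surviving
// chains; each chain is a run of functions to place next to each other.
std::vector<std::vector<std::string>>
mergeChains(const std::vector<std::string> &Functions,
            std::vector<CallEdge> Edges) {
  std::vector<std::vector<std::string>> Chains;
  std::unordered_map<std::string, std::size_t> ChainOf;
  for (const std::string &F : Functions) {
    ChainOf[F] = Chains.size();
    Chains.push_back({F}); // every function starts in its own chain
  }
  std::stable_sort(Edges.begin(), Edges.end(),
                   [](const CallEdge &X, const CallEdge &Y) {
                     return X.Weight > Y.Weight;
                   });
  for (const CallEdge &E : Edges) {
    auto ItA = ChainOf.find(E.A), ItB = ChainOf.find(E.B);
    if (ItA == ChainOf.end() || ItB == ChainOf.end())
      continue; // edge mentions a function we were not told about
    std::size_t To = ItA->second, From = ItB->second;
    if (To == From)
      continue; // endpoints already share a chain
    for (const std::string &F : Chains[From]) {
      ChainOf[F] = To;
      Chains[To].push_back(F);
    }
    Chains[From].clear();
  }
  // Drop the chains that were emptied by merging.
  Chains.erase(std::remove_if(Chains.begin(), Chains.end(),
                              [](const std::vector<std::string> &C) {
                                return C.empty();
                              }),
               Chains.end());
  return Chains;
}
```

In this edge-driven form, two hot functions end up adjacent only if some
edge connects their chains. For the RPC example, if the Foo and Bar handlers
never call into each other, the sketch leaves them in separate chains no
matter how hot each one is; the relative placement of those chains is then a
separate decision.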

-- Sean Silva

>
> >
> > For FullLTO it is conceptually pretty easy to get profile data we need
> > for this, but I'm not sure about the ThinLTO case.
> >
> > Teresa, Mehdi,
> >
> > Are there any plans (or things already working!) for getting profile
> > data from ThinLTO in a format that the linker can use for code layout? I
> > assume that profile data is being used already to guide importing, so it
> > may just be a matter of siphoning that off.
> >
> >
> > I’m not sure what kind of “profile information” is needed, and what
> > makes it easier for MonolithicLTO compared to ThinLTO?
> >
> > Or maybe that layout code should be inside LLVM; maybe part of the
> > general LTO interface? It looks like the current gcc plugin calls back
> > into gcc for the actual layout algorithm itself (function call
> > find_pettis_hansen_function_layout) rather than the reordering logic
> > living in the linker:
> > https://android.googlesource.com/toolchain/gcc/+/3f73d6ef90458b45bbbb33ef4c2b174d4662a22d/gcc-4.6/function_reordering_plugin/function_reordering_plugin.c
> >
> >
> > I was thinking about this: could this be done by reorganizing the
> > module itself for LTO?
> >
> > That wouldn’t help non-LTO and ThinLTO though.
>
> This is a dimension that I think can be explored. The fact that it
> wouldn't help with other modes of operation is completely orthogonal,
> in particular until it's proven that this kind of optimization makes
> sense with ThinLTO (and if it doesn't, it can be an optimization ran
> only during full LTO).
>
> --
> Davide
>
> "There are no solved problems; there are only problems that are more
> or less solved" -- Henri Poincare

Mehdi Amini via llvm-dev

2017-Jan-16 23:35 UTC

> On Jan 16, 2017, at 3:24 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
> On Mon, Jan 16, 2017 at 2:07 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
>> On Jan 16, 2017, at 1:47 PM, Sean Silva <chisophugis at gmail.com> wrote:
>>
>> On Mon, Jan 16, 2017 at 1:25 PM, Davide Italiano <davide at freebsd.org> wrote:
>> On Mon, Jan 16, 2017 at 12:31 PM, Sean Silva via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> > Do we have any open projects on LLD?
>> >
>> > I know we usually try to avoid any big "projects" and mainly add/fix things
>> > in response to user needs, but just wondering if somebody has any ideas.
>> >
>>
>> I'm not particularly active in lld anymore, but the last big item I'd
>> like to see implemented is Pettis-Hansen layout.
>> http://perso.ensta-paristech.fr/~bmonsuez/Cours/B6-4/Articles/papers15.pdf
>> (mainly because it improves performances of the final executable).
>> GCC/gold have an implementation of the algorithm that can be used as
>> base. I'll expand if anybody is interested.
>> Side note: I'd like to propose a couple of llvm projects as well, I'll
>> sit down later today and write them.
>> 
> 
> I’m not sure, can you confirm that such layout optimization on ELF requires
> -ffunction-sections?
>
> In order for a standard ELF linker to safely be able to reorder sections at
> function granularity, -ffunction-sections would be required. This isn't a
> problem during LTO since the code generation is set up by the linker :)
>
> Also, for clang on OSX the best layout we could get is to order functions
> in the order in which they get executed at runtime.
>
> What the optimal layout may be for given apps is a bit of a separate
> question. Right now we're mostly talking about how to plumb everything
> together so that we can do the reordering of the final executable.

Yes, I was raising this exactly with the idea that “we may want to try
different algorithms based on different kinds of data”.

> 
> In fact, standard ELF linking semantics generally require input sections to
> be concatenated in command line order (this is e.g. how .init_array/.ctors
> build up their arrays of pointers to initializers; a crt*.o file at the
> beginning/end has a sentinel value and so the order matters). So the linker
> will generally need blessing from the compiler to do most sorts of
> reorderings as far as I'm aware.
>
> Other signals besides profile info, such as a startup trace, might be
> useful too, and we should make sure we can plug that into the design.
> My understanding of the clang on OSX case is based on a comparison of the
> `form_by_*` functions in clang/utils/perf-training/perf-helper.py which
> offer a relatively simple set of algorithms, so I think the jury is still
> out on the best approach (that script also uses a data collection method
> that is not part of LLVM's usual instrumentation or sampling workflows for
> PGO, so we may not be able to provide the same signals out of the box as
> part of our standard offering in the compiler)

Yes, I was thinking that some XRay-based instrumentation could be used to
provide the same data.

> I think that once we have this ordering capability integrated more deeply
> into the compiler, we'll be able to evaluate more complicated algorithms
> like Pettis-Hansen, have access to signals like global profile info, do
> interesting call graph analyses, etc. to find interesting approaches.
> 
>  
> 
>> For FullLTO it is conceptually pretty easy to get profile data we need
>> for this, but I'm not sure about the ThinLTO case.
>>
>> Teresa, Mehdi,
>>
>> Are there any plans (or things already working!) for getting profile
>> data from ThinLTO in a format that the linker can use for code layout? I
>> assume that profile data is being used already to guide importing, so it
>> may just be a matter of siphoning that off.
>
> I’m not sure what kind of “profile information” is needed, and what makes
> it easier for MonolithicLTO compared to ThinLTO?
> 
> For MonolithicLTO I had in mind that a simple implementation would be:
> ```
> std::vector<std::string> Ordering;
> auto Pass = make_unique<LayoutModulePass>(&Ordering);
> addPassToLTOPipeline(std::move(Pass))
> ```
> 
> The module pass would just query the profile data directly on IR
> datastructures and get the order out. This would require very little
> "plumbing".
>  
> 
>> Or maybe that layout code should be inside LLVM; maybe part of the
>> general LTO interface? It looks like the current gcc plugin calls back
>> into gcc for the actual layout algorithm itself (function call
>> find_pettis_hansen_function_layout) rather than the reordering logic
>> living in the linker:
>> https://android.googlesource.com/toolchain/gcc/+/3f73d6ef90458b45bbbb33ef4c2b174d4662a22d/gcc-4.6/function_reordering_plugin/function_reordering_plugin.c
>
> I was thinking about this: could this be done by reorganizing the module
> itself for LTO?
> 
> For MonolithicLTO that's another simple approach.
>  
> 
> That wouldn’t help non-LTO and ThinLTO though.
> 
> I think we should ideally aim for something that works uniformly for
> Monolithic and Thin. For example, GCC emits special sections containing the
> profile data and the linker just reads those sections; something analogous
> in LLVM would just happen in the backend and be common to Monolithic and
> Thin. If ThinLTO already has profile summaries in some nice form though, it
> may be possible to bypass this.
>
> Another advantage of using special sections in the output like GCC does is
> that you don't actually need LTO at all to get the function reordering. The
> profile data passed to the compiler during per-TU compilation can be
> lowered into the same kind of annotations. (though LTO and function
> ordering are likely to go hand-in-hand most often for peak-performance
> builds).

Yes, I agree with all of this :)
That makes for an interesting design trade-off!
 
— 
Mehdi

