thr3ads.net - llvm dev - [llvm-dev] [RFC] Propeller: A frame work for Post Link Optimizations [Sep 2019]

If this information is useful, please help other people find it:
Share via:

Sriraman Tallam via llvm-dev

2019-Sep-27 00:30 UTC

[llvm-dev] [RFC] Propeller: A frame work for Post Link Optimizations

On Thu, Sep 26, 2019 at 5:13 PM Eli Friedman <efriedma at quicinc.com>
wrote:>
> > -----Original Message-----
> > From: Sriraman Tallam <tmsriram at google.com>
> > Sent: Thursday, September 26, 2019 3:24 PM
> > To: Eli Friedman <efriedma at quicinc.com>
> > Cc: Xinliang David Li <xinliangli at gmail.com>; llvm-dev
<llvm-dev at lists.llvm.org>
> > Subject: [EXT] Re: [llvm-dev] [RFC] Propeller: A frame work for Post
Link
> > Optimizations
> >
> > On Thu, Sep 26, 2019 at 12:39 PM Eli Friedman <efriedma at
quicinc.com> wrote:
> > >
> > >
> > >
> > > From: Xinliang David Li <xinliangli at gmail.com>
> > > Sent: Wednesday, September 25, 2019 5:58 PM
> > > To: Eli Friedman <efriedma at quicinc.com>
> > > Cc: Sriraman Tallam <tmsriram at google.com>; llvm-dev
<llvm-
> > dev at lists.llvm.org>
> > > Subject: [EXT] Re: [llvm-dev] [RFC] Propeller: A frame work for
Post Link
> > Optimizations
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Sep 25, 2019 at 5:02 PM Eli Friedman via llvm-dev
<llvm-
> > dev at lists.llvm.org> wrote:
> > >
> > > My biggest question about this architecture is about when
propeller runs basic
> > block reordering within a function.  It seems like a lot of the
complexity comes
> > from using the proposed -fbasicblock-sections to generated mangled
ELF, and
> > then re-parsing the mangled ELF as a separate step.  I'm not sure
that's the right
> > approach, long-term.
> > >
> > > Splitting every basic block into its own section introduces
overhead, like you
> > described.  And it's likely more complex on non-x86 targets, which
have a
> > greater variety of conditional branches.  And the reordering itself
involves a
> > bunch of x86 and ELF-specific code.
> > >
> > > I'd like to suggest an alternative: instead of perform basic
block reordering and
> > function splitting by manipulating the ELF files, you could perform
reordering
> > and splitting as part of link-time code generation, as an MIR pass
that runs just
> > before the assembly printer.  MIR is almost exactly the form you want
for this
> > sort of manipulation: it has basic blocks which correspond closely to
the final
> > binary, and a high-level representation of branch instructions.
> > >
> > >
> > >
> > > This was considered for Propeller.  This  is currently being
explored in a similar
> > way as an alternative of CSFDO which uses PMU samples.
> > >
> > >
> > >
> > > Makes sense.
> > >
> > >
> > >
> > > And it's before the DWARF/CFI emission, so you don't need
to worry about
> > fixing them afterwards.  This should take less code overall, and much
less target-
> > specific code. And infrastructure for function splitting would be
useful for non-
> > Propeller workflows.
> > >
> > > There are some minor downsides to this approach I can think of. 
You lose a
> > little flexibility, in that you can't mix blocks from different
functions together,
> > but you aren't doing that anyway, from your description?
> > >
> > >
> > >
> > > One of the main design objectives of Propeller is to have the
capability to do
> > interprocedural code transformations (reordering, cloning, dedupping
etc), so
> > this won't be a minor downside. Function/block alignment (for
branch
> > misprediction reduction etc) will also need to be done as a global
optimization in
> > the future.
> > >
> > >
> > >
> > > Okay, so my suggestion doesn’t work. I’m still concerned the
proposed design
> > is going to push us in a direction we don’t want to go.  Particularly,
if you’re
> > going to attempt more complicated transforms, the problems caused by
the
> > limited information available in an ELF file will become more
prominent.  I mean,
> > yes, you can come up with a more complicated implicit contract between
the
> > compiler and Propeller about the exact format of Propeller basic
blocks, and add
> > additional symbol annotations, and eventually come up with an “IR”
that allows
> > Propeller to perform arbitrary code transforms.  But that’s inevitably
going to be
> > more complicated, and harder to understand, than a compiler IR
designed for
> > optimizations.
> >
> > Thanks for the feedback, I am not sure I fully understand your
> > concerns but let me try to make some of the things clearer:
> >
> > *  Propeller relinks. Specifically, it regenerates ELF object files
> > from MIR.  Even if MIR were serializable, we would still be starting
> > before CFI instruction inserter pass and then regenerate the native
> > ELF objects.
>
> Now I'm confused.  Why are you regenerating the ELF files?
TLDR;  We are regenerating ELF files from optimized IR to keep the
cost of generating basic block sections low.

If what you describe above is done, where we generate basic block
sections even before we profile, we must conservatively generate
sections for all functions.  This is unnecessarily expensive and we
have shown that generating it on demand based on profiles is much
cheaper.  Since the backend generation can be distributed or
parallelized, this is not expensive to do.    If we could make the
cost of basic block sections in terms of size bloats cheaper we could
do what you just suggested here, that is, always build with basic
block sections for all basic blocks.
>
> I thought the workflow was something like the following:
>
> 1. The compiler (or LTO codegen) generates ELF files with basic block
sections.
The correction here is that we generate first with basic block labels
and not sections.
> 2. You link once without propeller optimizations.
This is correct.
> 3. You collect the propeller profile.
This is correct.
> 4. You use the same ELF files to relink with propeller optimizations.
We use the same optimized IR files to regenerate the ELF files.  We
could use optimized MIR here but serializability of MIR is not well
supported yet and is part of the larger plan.  We only need to re-run
CFI instruction inserter in MIR.  For future optimizations, we can
plug in here and make small code transformations for optimizations
like prefetch insertion.  If we attempt to do basic block re-ordering
here, we would be limited to intra-procedural basic block reordering
which is sub-optimal.  Hence, we generate basic block sections here
only for the sampled functions which is a small %age.
>
> That's sufficient to implement all the optimizations you described, as
far as I can tell.
>
> But it sounds like that's wrong?  It sounds like what you're
describing is:
>
> 1. LTO codegen generates MIR files.
> 2. You convert the MIR files to ELF object files, and link without
propeller optimizations.  (Does this step have any propeller-specific code
generation differences?)
Yes, like I said above, basic block labels are used here to identify
the basic blocks which are later used in the last step to annotate
profiles.
> 4. You collect the propeller profile.
> 5. You apply some propeller optimization to the MIR files.
> 6. You convert the optimized MIR files to ELF object files, with basic
block sections enabled.
Right, technically this is high level IR now as MIR is not
serializable but this is ok for discussion.
> 7. You link the ELF files, applying propeller reordering optimizations.
>
> (And since you can't serialize MIR, you actually serialize LLVM IR, and
run the code generator when you deserialize, to convert that to MIR.)
>
> Is that correct?
That is correct.
>
> If you have the MIR at the time you're making all the propeller
optimization decisions, why is the linker rewriting raw x86 assembly, as opposed
to performing the relevant transforms on MIR?
MIR is still one module at a time.  We cannot do inter-procedural
basic block layout here.  We can do much more advanced stuff at the
whole-program level in the linker.  The relaxation code is the down
side.

For more details, We strongly considered this.  We could run something
like a thin link in thin lto figure out the global layout and hand out
the relevant  subsets of the global decision  to each module.  This
looked more complicated and the individual pieces from each module
should still be globally laid out again by the linker.  This
constraints us on what we can do for layout and also does not work
well with future optimizations like global alignment like David
pointed out.
>
> Why are you proposing to add a bunch of options to clang to manipulate LLVM
code generation, given none of those options are actually used for Propeller
workflows?
Where do you suggest labelling and section options should exist?  We
looked at  basic block sections to be similar to function sections in
terms of option handling?

Thanks
Sri
>
> -Eli

Eli Friedman via llvm-dev

2019-Sep-27 01:16 UTC

head link

[llvm-dev] [RFC] Propeller: A frame work for Post Link Optimizations

> -----Original Message-----
> From: Sriraman Tallam <tmsriram at google.com>
> Sent: Thursday, September 26, 2019 5:31 PM
> To: Eli Friedman <efriedma at quicinc.com>
> Cc: Xinliang David Li <xinliangli at gmail.com>; llvm-dev
<llvm-dev at lists.llvm.org>
> Subject: [EXT] Re: [llvm-dev] [RFC] Propeller: A frame work for Post Link
> Optimizations
>
> For more details, We strongly considered this.  We could run something
> like a thin link in thin lto figure out the global layout and hand out
> the relevant  subsets of the global decision  to each module.  This
> looked more complicated and the individual pieces from each module
> should still be globally laid out again by the linker.  This
> constraints us on what we can do for layout and also does not work
> well with future optimizations like global alignment like David
> pointed out.
Basically, you need to do branch relaxation etc. in the linker because the
layout isn't sufficiently finalized until then?  And you *only* need to
rewrite branches this way; never anything else, because other instructions
don't need relaxation?

On most targets, we do branch relaxation in the compiler, unlike x86 which does
it in the assembler.  I guess we could teach the compiler to emit some specific
"relaxed" pattern, and teach the linker to reverse it.  Might be a
little tricky to handle certain odd cases well; currently, for example, on
AArch64 we sometimes insert a cpsr clobber as part of relaxation.  I'm not
really happy with having to reimplement that for every target, but if we're
sure it's limited to branches, I guess it's okay.
> > Why are you proposing to add a bunch of options to clang to manipulate
LLVM
> code generation, given none of those options are actually used for
Propeller
> workflows?
>
> Where do you suggest labelling and section options should exist?  We
> looked at  basic block sections to be similar to function sections in
> terms of option handling?
Generating bitcode with/without propeller doesn't actually affect the
generated bitcode, right?  So you could say that Propeller is enabled with
"-Wl,--enable-propeller", and not change clang at all.  I'm not a
fan of adding options just because we can.  If basic block sections are only
used as a sort of secret handshake between the propeller compiler and the
propeller linker, we can change that interface later, if we want; if we expose
it as a clang option, we're committing to making basic block sections
continue to work the same way until the end of time.

-Eli

Xinliang David Li via llvm-dev

2019-Sep-27 02:02 UTC

head link

[llvm-dev] [RFC] Propeller: A frame work for Post Link Optimizations

On Thu, Sep 26, 2019 at 5:31 PM Sriraman Tallam <tmsriram at google.com>
wrote:
> On Thu, Sep 26, 2019 at 5:13 PM Eli Friedman <efriedma at
quicinc.com> wrote:
> >
> > > -----Original Message-----
> > > From: Sriraman Tallam <tmsriram at google.com>
> > > Sent: Thursday, September 26, 2019 3:24 PM
> > > To: Eli Friedman <efriedma at quicinc.com>
> > > Cc: Xinliang David Li <xinliangli at gmail.com>; llvm-dev
<
> llvm-dev at lists.llvm.org>
> > > Subject: [EXT] Re: [llvm-dev] [RFC] Propeller: A frame work for
Post
> Link
> > > Optimizations
> > >
> > > On Thu, Sep 26, 2019 at 12:39 PM Eli Friedman <efriedma at
quicinc.com>
> wrote:
> > > >
> > > >
> > > >
> > > > From: Xinliang David Li <xinliangli at gmail.com>
> > > > Sent: Wednesday, September 25, 2019 5:58 PM
> > > > To: Eli Friedman <efriedma at quicinc.com>
> > > > Cc: Sriraman Tallam <tmsriram at google.com>; llvm-dev
<llvm-
> > > dev at lists.llvm.org>
> > > > Subject: [EXT] Re: [llvm-dev] [RFC] Propeller: A frame work
for Post
> Link
> > > Optimizations
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Sep 25, 2019 at 5:02 PM Eli Friedman via llvm-dev
<llvm-
> > > dev at lists.llvm.org> wrote:
> > > >
> > > > My biggest question about this architecture is about when
propeller
> runs basic
> > > block reordering within a function.  It seems like a lot of the
> complexity comes
> > > from using the proposed -fbasicblock-sections to generated
mangled
> ELF, and
> > > then re-parsing the mangled ELF as a separate step.  I'm not
sure
> that's the right
> > > approach, long-term.
> > > >
> > > > Splitting every basic block into its own section introduces
> overhead, like you
> > > described.  And it's likely more complex on non-x86 targets,
which
> have a
> > > greater variety of conditional branches.  And the reordering
itself
> involves a
> > > bunch of x86 and ELF-specific code.
> > > >
> > > > I'd like to suggest an alternative: instead of perform
basic block
> reordering and
> > > function splitting by manipulating the ELF files, you could
perform
> reordering
> > > and splitting as part of link-time code generation, as an MIR
pass
> that runs just
> > > before the assembly printer.  MIR is almost exactly the form you
want
> for this
> > > sort of manipulation: it has basic blocks which correspond
closely to
> the final
> > > binary, and a high-level representation of branch instructions.
> > > >
> > > >
> > > >
> > > > This was considered for Propeller.  This  is currently being
> explored in a similar
> > > way as an alternative of CSFDO which uses PMU samples.
> > > >
> > > >
> > > >
> > > > Makes sense.
> > > >
> > > >
> > > >
> > > > And it's before the DWARF/CFI emission, so you don't
need to worry
> about
> > > fixing them afterwards.  This should take less code overall, and
much
> less target-
> > > specific code. And infrastructure for function splitting would be
> useful for non-
> > > Propeller workflows.
> > > >
> > > > There are some minor downsides to this approach I can think
of.  You
> lose a
> > > little flexibility, in that you can't mix blocks from
different
> functions together,
> > > but you aren't doing that anyway, from your description?
> > > >
> > > >
> > > >
> > > > One of the main design objectives of Propeller is to have
the
> capability to do
> > > interprocedural code transformations (reordering, cloning,
dedupping
> etc), so
> > > this won't be a minor downside. Function/block alignment (for
branch
> > > misprediction reduction etc) will also need to be done as a
global
> optimization in
> > > the future.
> > > >
> > > >
> > > >
> > > > Okay, so my suggestion doesn’t work. I’m still concerned the
> proposed design
> > > is going to push us in a direction we don’t want to go. 
Particularly,
> if you’re
> > > going to attempt more complicated transforms, the problems caused
by
> the
> > > limited information available in an ELF file will become more
> prominent.  I mean,
> > > yes, you can come up with a more complicated implicit contract
between
> the
> > > compiler and Propeller about the exact format of Propeller basic
> blocks, and add
> > > additional symbol annotations, and eventually come up with an
“IR”
> that allows
> > > Propeller to perform arbitrary code transforms.  But that’s
inevitably
> going to be
> > > more complicated, and harder to understand, than a compiler IR
> designed for
> > > optimizations.
> > >
> > > Thanks for the feedback, I am not sure I fully understand your
> > > concerns but let me try to make some of the things clearer:
> > >
> > > *  Propeller relinks. Specifically, it regenerates ELF object
files
> > > from MIR.  Even if MIR were serializable, we would still be
starting
> > > before CFI instruction inserter pass and then regenerate the
native
> > > ELF objects.
> >
> > Now I'm confused.  Why are you regenerating the ELF files?
>
> TLDR;  We are regenerating ELF files from optimized IR to keep the
> cost of generating basic block sections low.
>
> If what you describe above is done, where we generate basic block
> sections even before we profile, we must conservatively generate
> sections for all functions.  This is unnecessarily expensive and we
> have shown that generating it on demand based on profiles is much
> cheaper.  Since the backend generation can be distributed or
> parallelized, this is not expensive to do.    If we could make the
> cost of basic block sections in terms of size bloats cheaper we could
> do what you just suggested here, that is, always build with basic
> block sections for all basic blocks.
>
> >
> > I thought the workflow was something like the following:
> >
> > 1. The compiler (or LTO codegen) generates ELF files with basic block
> sections.
>
> The correction here is that we generate first with basic block labels
> and not sections.
>
> > 2. You link once without propeller optimizations.
>
> This is correct.
>
> > 3. You collect the propeller profile.
>
> This is correct.
>
> > 4. You use the same ELF files to relink with propeller optimizations.
>
> We use the same optimized IR files to regenerate the ELF files.  We
> could use optimized MIR here but serializability of MIR is not well
> supported yet and is part of the larger plan.  We only need to re-run
> CFI instruction inserter in MIR.  For future optimizations, we can
> plug in here and make small code transformations for optimizations
> like prefetch insertion.  If we attempt to do basic block re-ordering
> here, we would be limited to intra-procedural basic block reordering
> which is sub-optimal.  Hence, we generate basic block sections here
> only for the sampled functions which is a small %age.
>
> >
> > That's sufficient to implement all the optimizations you
described, as
> far as I can tell.
> >
> > But it sounds like that's wrong?  It sounds like what you're
describing
> is:
> >
> > 1. LTO codegen generates MIR files.
> > 2. You convert the MIR files to ELF object files, and link without
> propeller optimizations.  (Does this step have any propeller-specific code
> generation differences?)
>
> Yes, like I said above, basic block labels are used here to identify
> the basic blocks which are later used in the last step to annotate
> profiles.
>
> > 4. You collect the propeller profile.
> > 5. You apply some propeller optimization to the MIR files.
> > 6. You convert the optimized MIR files to ELF object files, with basic
> block sections enabled.
>
> Right, technically this is high level IR now as MIR is not
> serializable but this is ok for discussion.
>
> > 7. You link the ELF files, applying propeller reordering
optimizations.
> >
> > (And since you can't serialize MIR, you actually serialize LLVM
IR, and
> run the code generator when you deserialize, to convert that to MIR.)
> >
> > Is that correct?
>
> That is correct.
>
> >
> > If you have the MIR at the time you're making all the propeller
> optimization decisions, why is the linker rewriting raw x86 assembly, as
> opposed to performing the relevant transforms on MIR?
>
MIR level transformations are not excluded and can be done in Propeller
framework, just limited to module level optimizations. Anything
Global/Interprocedural needs to happen at (real) link time with assistance
of module level annotations.

>
> MIR is still one module at a time.  We cannot do inter-procedural
> basic block layout here.  We can do much more advanced stuff at the
> whole-program level in the linker.  The relaxation code is the down
> side.
>
> For more details, We strongly considered this.  We could run something
> like a thin link in thin lto figure out the global layout and hand out
> the relevant  subsets of the global decision  to each module.  This
> looked more complicated and the individual pieces from each module
> should still be globally laid out again by the linker.

This won't work well -- as cross module inlining has not yet happened. Not
only will there be problems with profile annotation, all the profile
context sensitives will be lost.

> This
> constraints us on what we can do for layout and also does not work
> well with future optimizations like global alignment like David
> pointed out.
>
> >
> > Why are you proposing to add a bunch of options to clang to manipulate
> LLVM code generation, given none of those options are actually used for
> Propeller workflows?
>
>Propeller workflows include precise profile annotations, so the options are
used there.

David


> Where do you suggest labelling and section options should exist?  We
> looked at  basic block sections to be similar to function sections in
> terms of option handling?
>
> Thanks
> Sri
>
> >
> > -Eli
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190926/511eb860/attachment.html>

Kristof Beyls via llvm-dev

2019-Sep-27 07:37 UTC

head link

[llvm-dev] [RFC] Propeller: A frame work for Post Link Optimizations

Op vr 27 sep. 2019 om 02:32 schreef Sriraman Tallam via llvm-dev <
llvm-dev at lists.llvm.org>:
> >
> > If you have the MIR at the time you're making all the propeller
> optimization decisions, why is the linker rewriting raw x86 assembly, as
> opposed to performing the relevant transforms on MIR?
>
> MIR is still one module at a time.  We cannot do inter-procedural
> basic block layout here.  We can do much more advanced stuff at the
> whole-program level in the linker.  The relaxation code is the down
> side.
>
> For more details, We strongly considered this.  We could run something
> like a thin link in thin lto figure out the global layout and hand out
> the relevant  subsets of the global decision  to each module.  This
> looked more complicated and the individual pieces from each module
> should still be globally laid out again by the linker.  This
> constraints us on what we can do for layout and also does not work
> well with future optimizations like global alignment like David
> pointed out.
>
Apologies for the naive question.
Why is MIR restricted to being only module at a time?
If the restriction of only being able to do per-module processing at the
MIR level would go away, then all these optimizations could be done in a
compile step, and no support would need to be added to a linker?
If the restriction to do cross-module optimization at the MIR level could
be removed, would it become a better tradeoff to do this in an LTO compiler
step rather than a linker step?

Thanks,

Kristof
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190927/45f1471c/attachment.html>

Xinliang David Li via llvm-dev

2019-Sep-27 15:53 UTC

head link

[llvm-dev] [RFC] Propeller: A frame work for Post Link Optimizations

On Fri, Sep 27, 2019 at 12:37 AM Kristof Beyls via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
>
> Op vr 27 sep. 2019 om 02:32 schreef Sriraman Tallam via llvm-dev <
> llvm-dev at lists.llvm.org>:
>
>> >
>> > If you have the MIR at the time you're making all the
propeller
>> optimization decisions, why is the linker rewriting raw x86 assembly,
as
>> opposed to performing the relevant transforms on MIR?
>>
>> MIR is still one module at a time.  We cannot do inter-procedural
>> basic block layout here.  We can do much more advanced stuff at the
>> whole-program level in the linker.  The relaxation code is the down
>> side.
>>
>> For more details, We strongly considered this.  We could run something
>> like a thin link in thin lto figure out the global layout and hand out
>> the relevant  subsets of the global decision  to each module.  This
>> looked more complicated and the individual pieces from each module
>> should still be globally laid out again by the linker.  This
>> constraints us on what we can do for layout and also does not work
>> well with future optimizations like global alignment like David
>> pointed out.
>>
>
> Apologies for the naive question.
> Why is MIR restricted to being only module at a time?
> If the restriction of only being able to do per-module processing at the
> MIR level would go away, then all these optimizations could be done in a
> compile step, and no support would need to be added to a linker?
> If the restriction to do cross-module optimization at the MIR level could
> be removed, would it become a better tradeoff to do this in an LTO compiler
> step rather than a linker step?
>
LTO style optimization uses the monolithic model which Propeller moves away
from. The design principle is as much as possible preparation work will be
done at module level, while the global step is lean and mean.

For layout/splitting/alignment/packing, linker is the right platform to
use. It sees/creates many things compiler does not, such as
PLTs/stubs/trampolines. etc. Besides, using linker allows inter-procedural
reordering of blocks in native object files as long as those files are
compiled with the right annotations. For instance, we can do per
application optimal layout for memcpy/memset functions from libc.a
depending on the profile data for that app.

David


>
> Thanks,
>
> Kristof
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190927/b66ee645/attachment-0001.html>

Sriraman Tallam via llvm-dev

2019-Sep-27 16:42 UTC

head link

[llvm-dev] [RFC] Propeller: A frame work for Post Link Optimizations

On Thu, Sep 26, 2019 at 6:16 PM Eli Friedman <efriedma at quicinc.com>
wrote:>
> > -----Original Message-----
> > From: Sriraman Tallam <tmsriram at google.com>
> > Sent: Thursday, September 26, 2019 5:31 PM
> > To: Eli Friedman <efriedma at quicinc.com>
> > Cc: Xinliang David Li <xinliangli at gmail.com>; llvm-dev
<llvm-dev at lists.llvm.org>
> > Subject: [EXT] Re: [llvm-dev] [RFC] Propeller: A frame work for Post
Link
> > Optimizations
> >
> > For more details, We strongly considered this.  We could run something
> > like a thin link in thin lto figure out the global layout and hand out
> > the relevant  subsets of the global decision  to each module.  This
> > looked more complicated and the individual pieces from each module
> > should still be globally laid out again by the linker.  This
> > constraints us on what we can do for layout and also does not work
> > well with future optimizations like global alignment like David
> > pointed out.
>
> Basically, you need to do branch relaxation etc. in the linker because the
layout isn't sufficiently finalized until then?
Yes, this is correct.  Relaxation is needed because the layout is not
finalized until link time.




And you *only* need to rewrite branches this way; never anything else,
because other instructions don't need relaxation?

I cannot think of any other instance where such relaxation is needed.
>
> On most targets, we do branch relaxation in the compiler, unlike x86 which
does it in the assembler.  I guess we could teach the compiler to emit some
specific "relaxed" pattern, and teach the linker to reverse it.  Might
be a little tricky to handle certain odd cases well; currently, for example, on
AArch64 we sometimes insert a cpsr clobber as part of relaxation.  I'm not
really happy with having to reimplement that for every target, but if we're
sure it's limited to branches, I guess it's okay.
Understood.
>
> > > Why are you proposing to add a bunch of options to clang to
manipulate LLVM
> > code generation, given none of those options are actually used for
Propeller
> > workflows?
> >
> > Where do you suggest labelling and section options should exist?  We
> > looked at  basic block sections to be similar to function sections in
> > terms of option handling?
>
> Generating bitcode with/without propeller doesn't actually affect the
generated bitcode, right?  So you could say that Propeller is enabled with
"-Wl,--enable-propeller", and not change clang at all.  I'm not a
fan of adding options just because we can.  If basic block sections are only
used as a sort of secret handshake between the propeller compiler and the
propeller linker, we can change that interface later, if we want; if we expose
it as a clang option, we're committing to making basic block sections
continue to work the same way until the end of time.
The generated MIR code is slightly different as the later passes have
more CFI instructions, basic block labels and extra direct jumps which
would have been fall throughs otherwise.  So, some llvm option is
needed.

I envisioned that basic block sections could also be useful on its own
outside of propeller.   Hence, we made it like function-sections with
a separate option -fbasicblock-sections.  The propeller option is an
umbrella option to make it easy to invoke Propeller.

Thanks
Sri
>
> -Eli

llvm dev - Sep 2019 - [RFC] Propeller: A frame work for Post Link Optimizations

[llvm-dev] [RFC] Propeller: A frame work for Post Link Optimizations

[llvm-dev] [RFC] Propeller: A frame work for Post Link Optimizations

[llvm-dev] [RFC] Propeller: A frame work for Post Link Optimizations

[llvm-dev] [RFC] Propeller: A frame work for Post Link Optimizations

[llvm-dev] [RFC] Propeller: A frame work for Post Link Optimizations

[llvm-dev] [RFC] Propeller: A frame work for Post Link Optimizations