thr3ads.net - llvm dev - [llvm-dev] RFC: LLD range extension thunks [Jan 2017]

Peter Smith via llvm-dev

2017-Jan-04 18:34 UTC

[llvm-dev] RFC: LLD range extension thunks

I'm about to start working on range extension thunks in lld. This is
an attempt to summarize the approach I'd like to take and what the
impact will be on lld outside of thunks. I'm interested if anyone has
any constraints the approach will break, alternative suggestions, or
is working on something I'll need to take account of?

I expect range extension thunks to be important for ARM and useful for
AArch64. In principle any target with a limited range branch immediate
instruction could benefit from them.

I've put some detail about range extension thunks at the end of this
message for those not familiar with them. The name "range extension
thunk" is by no means universal: ARM's ABI documentation calls them
veneers, and the GNU linker calls them stubs.

Summary of existing thunk implementation (ARM interworking and Mips
PIC to non-PIC calls):
- A Regular, Shared or Undefined symbol may have a single thunk
- For each relocation to a symbol S, if we need a thunk we use the
thunk for S as the target of the relocation rather than S. The thunk
will transfer control to S.
- Thunks are assigned to an InputSection, these are written out when
the InputSection is written. The InputSection with the Thunk contains
either the caller (ARM) or callee (Mips).
- For all existing thunks, the decision of whether a thunk is needed
is not dependent on address. A Thumb branch to ARM will always need a
thunk, no matter the distance. Thunks can therefore be generated
relatively early in Writer::run().

High level impact of range extension thunks:

There may be more than one thunk per callee:
- A range extension thunk must be placed within range of the caller.
There may be cases where no single thunk for a callee is in range of
all callers.
- An ARM Target may need a different Thunk for ARM and Thumb callers.

Address information is needed to determine if a range extension thunk is needed:
- The more precise the address information available, the fewer thunks
will be generated. The most precise case is when the final addresses of
both caller and callee are known at thunk creation time; the least
precise is when neither the caller's nor the callee's address is known.
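
To make the point about precision concrete, here is a minimal Python sketch (it is not lld code). It assumes the ARM-state B/BL reach of about +/-32 MB described later in this message. The `slack` parameter is a hypothetical stand-in for "how much the caller-to-callee distance could still grow after this check"; with final addresses it is 0, and the larger it is, the more thunks have to be created conservatively.

```python
# Hypothetical illustration; none of these names come from lld.
ARM_BRANCH_MIN = -(1 << 25)      # most negative reachable offset (-32 MiB)
ARM_BRANCH_MAX = (1 << 25) - 4   # most positive reachable offset


def needs_range_thunk(caller: int, callee: int, slack: int = 0) -> bool:
    """Return True if a branch at `caller` might not reach `callee`.

    `slack` is an assumed upper bound on how far the distance could still
    grow after this check (0 when both addresses are final).
    """
    offset = callee - (caller + 8)  # ARM-state PC reads as instruction address + 8
    return offset - slack < ARM_BRANCH_MIN or offset + slack > ARM_BRANCH_MAX
```

With exact addresses a call 16 MiB away needs no thunk, but the same call checked with 17 MiB of uncertainty has to be given one.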

Range extension thunks can be combined with, or replace, other thunks
- Thunks may also be used for instruction set interworking (ARM) or
for calling between position independent and non-position independent
code (Mips). Either a chain of thunks or a combined thunk that does
both operations is needed. For ARM all range extension thunks can
trivially be interworking thunks as well.

Range extension thunk placement can be important
- Many callers may need a range extension. Placing a range extension
thunk so that it is in range of the most callers minimizes the number
of thunks needed.
- Thunks may be better as synthetic sections rather than as additions
to input sections.

Adding/removing content must not break the range calculations used in
range extension thunks.
- If any caller, callee or thunk address is changed after range
extension thunks are calculated it could invalidate the range
calculation.
- Ideally range extension thunks are the last operation the linker
does prior to resolving relocations.

I think that there are two separate areas to a range extension thunk
implementation that can be considered separately.
1.) Moving thunk generation to a later stage. At a minimum we need an
estimate of the address of each caller and callee; in an ideal world
we know the final address of each caller and callee. This could mean
assigning section addresses multiple times.
2.) The alterations to the core data structures to permit more than
one Thunk per symbol and the logic to select the "right" Thunk for
each relocation.

The design I'd like to aim at moves thunk creation into
finalizeSections() at a point where the sizes and addresses of all the
SyntheticSections are known. This would mean that the final address of
each caller and callee could be used, and after thunk creation there
would be no further content changes. This would mean:
- All code that runs prior to thunk creation may have the offset in
the OutputSection altered by the addition of thunks. In particular
scanRelocs() calculates the offset in the OutputSection of some
relocations. We would need to find alternative ways of handling these
cases so that they could either survive thunk creation or be patched
up afterwards.
- assignAddresses() will need to run at least twice if thunks are
created. At least once to give the thunk creation the caller and
callee addresses, and at least once after all thunks have been
created.

There is an alternative design that only uses estimates of caller and
callee address to decide if a thunk is needed. In effect we use a
heuristic to predict how much extra synthetic content, such as plt and
got size, will be added after Thunk creation when deciding if a Thunk
is needed. I'm not in favour of this approach as from bitter
experience it tends to result in hard to debug problems when the
heuristics break down. Precise addresses would also allow errata
patching thunks [*]

I've not thought too hard about how to alter the core data structures
yet. I think this will mostly be implementation detail though.

Next steps:
I'd like to proceed with the following plan:
1.) Move the existing thunk implementation to where it would need to
be in finalizeSections(). This should flush out all the non-thunk
related assumptions about addresses without adding any extra
complexity to the Thunk implementation.
2.) Add support for multiple thunks per symbol
3.) If it turns out to be a good idea, implement thunks as SyntheticSections
4.) Add support for range extensions.

I think the first implementation of range extension thunks should be
simple and not try too hard to minimize the number of thunks needed.
If there is a need to optimize it can be done later as the changes
should be within the thunk creation module.

Thanks for reading

Peter

The remainder of the message is a brief explanation of range extension
and errata patching thunks.

What are range extension thunks?
Many architectures have branch instructions that have a finite range
that is insufficient to reach all possible program locations. For
example the ARM branch immediate instruction has an immediate that
encodes an offset of +/-32 MB from the branch instruction. A range
extension thunk is a linker generated code sequence, inserted between
the caller and the callee, that completes the transfer of control to
the callee when the distance between the caller and callee exceeds the
range of the branch instruction. A simple example in pseudo assembly
for a non-position independent ARM function call:

source:
    BL long_range_thunk_to_target
    ...
long_range_thunk_to_target:
    LDR r12, target_address ; r12 (ip) is the corruptible
                            ; intra-procedure-call scratch register
    BX r12
target_address:
    .word target
    ...
target:
    ...
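
To show where the range limit on the BL in the example above comes from, here is a small Python sketch of the ARM-state BL encoding (condition AL). It is an illustration rather than linker code; the function name is made up. The 24-bit immediate holds a signed word offset relative to the instruction address plus 8, which is what gives the roughly +/-32 MB reach.

```python
def encode_arm_bl(place: int, target: int) -> int:
    """Return the 32-bit ARM-state `BL target` word for an instruction at `place`."""
    offset = target - (place + 8)       # PC reads as instruction address + 8
    if offset % 4 != 0:
        raise ValueError("ARM-state branch offsets must be a multiple of 4")
    if not -(1 << 25) <= offset < (1 << 25):
        raise OverflowError("out of range: a range extension thunk is needed")
    imm24 = (offset >> 2) & 0xFFFFFF    # signed word offset packed into 24 bits
    return 0xEB000000 | imm24           # 0xEB: condition AL, branch with link
```

A branch to itself encodes as 0xEBFFFFFE, and a target just past the positive limit raises `OverflowError`, which is the case a range extension thunk exists to handle.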

What is an errata patching thunk?
Some CPU errata (hardware bugs) can be fixed at link time by
replacing an instruction with a branch to a sequence of equivalent
instructions that are guaranteed not to trigger the erratum. In
some cases the trigger sequence is dependent on precise addresses such
as immediates crossing page boundaries, for example
https://sourceware.org/ml/binutils-cvs/2015-04/msg00012.html . Errata
patching is out of the scope of implementing range extension thunks
but can be seen as a generalization of it.

Rui Ueyama via llvm-dev

2017-Jan-05 09:52 UTC


Hi Peter,

Here are my comments:

- I haven't thought about it hard enough, but I believe creating thunks as
synthetic sections, instead of as data attached to other input sections, is
a step in the right direction, because synthetic sections are suitable for
adding linker-generated data to output files.

- As you wrote, we need to iterate relocations at least twice to create
range extension thunks. Each iteration can be a linear scan, correct? I
mean, we can start from the section at the lowest address towards higher
address examining relocations and create thunks if targets are too far.

- I do not see a reason that we need to associate range extension thunks with
symbols. It seems to me that while scanning relocations, we need to keep
only the last thunk address for each symbol. If we find that some
relocation against symbol S needs a range extension thunk, we first check
if the last thunk for S is within the range and reuse it if it is. In this
way, we need to keep only one thunk for one symbol at any moment.

- Have you considered rewriting relocations? I think, if we find that
relocation R pointing to symbol S needs a range extension thunk, we should
(1) create a range extension thunk, (2) create a symbol body object S' for
the thunk, (3) and rewrite R to point to S' instead of S. Then later passes
don't have to deal with thunks.
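
The last two points can be sketched together in a few lines of Python. This is only an illustration under simplifying assumptions: the branch reach is treated as symmetric, `Reloc`, `rewrite_relocs` and `place_thunk` are hypothetical names, and the sketch ignores the fact that inserting a thunk moves later code (which is why more than one pass is needed in practice).

```python
from dataclasses import dataclass

BRANCH_RANGE = 32 * 1024 * 1024  # assumed symmetric reach, for simplicity


@dataclass
class Reloc:
    place: int   # address of the branch instruction
    target: str  # name of the symbol the branch refers to


def in_range(src: int, dst: int) -> bool:
    return abs(dst - src) < BRANCH_RANGE


def rewrite_relocs(relocs, symbols, place_thunk):
    """Scan relocations in address order and retarget out-of-range branches.

    `symbols` maps name -> address and gains one entry per thunk created.
    `place_thunk(near)` is a caller-supplied policy returning a thunk address.
    """
    last_thunk = {}  # original symbol -> name of its most recent thunk
    for r in sorted(relocs, key=lambda r: r.place):
        if in_range(r.place, symbols[r.target]):
            continue
        callee = r.target
        thunk = last_thunk.get(callee)
        if thunk is None or not in_range(r.place, symbols[thunk]):
            thunk = f"__thunk_{len(symbols)}_{callee}"
            symbols[thunk] = place_thunk(r.place)
            last_thunk[callee] = thunk
        r.target = thunk  # later passes see an ordinary symbol S'
    return relocs
```

Only one thunk per symbol is remembered at any time, and once a relocation has been rewritten nothing downstream needs to know that its target is a thunk.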


Peter Smith via llvm-dev

2017-Jan-05 11:15 UTC


Hello Rui,

Thanks for the comments

- Synthetic sections and rewriting relocations
I think that this would definitely be worth trying. It should remove
the need for thunks to be represented in the core data structures, and
would allow .

It would also mean that we wouldn't have to associate symbols with
thunks as the relocations would directly target the thunks. ARM
interworking makes reusing thunks more difficult as not every thunk is
compatible with every caller. For example:
ARM B target and Thumb2 B.W target can't reuse the same thunk even if
in range as the branch instruction can't change state.

I think it is worth an experiment to make the existing implementation
of thunks use synthetic sections and rewriting relocations before
trying to implement range extension thunks.

- Yes, the scan is linear; it is essentially:
do
    assign addresses to input sections
    for each relocation
        if (thunk needed)
            create thunk or reuse existing one
while (any thunks were added)
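
As a rough, runnable version of that loop (repeating until a pass adds no thunk), here is a Python sketch. Everything in it is an assumption made for illustration: sections are laid out back to back, the reach is symmetric, each thunk is its own 12-byte section placed directly after the calling section, and reuse only happens within one calling section. The `max_passes` guard is there because this naive placement would never converge if a calling section were itself larger than the branch reach.

```python
from dataclasses import dataclass, field

REACH = 1 << 25   # assumed symmetric branch reach (32 MiB)
THUNK_SIZE = 12   # LDR + BX + literal word


@dataclass
class Section:
    name: str
    size: int
    calls: list = field(default_factory=list)  # [offset in section, target section name]
    addr: int = 0


def assign_addresses(sections, base=0):
    addr = base
    for s in sections:
        s.addr = addr
        addr += s.size


def create_thunks(sections, max_passes=10):
    """Assign addresses and add thunks until a pass adds none; return the pass count."""
    for passes in range(1, max_passes + 1):
        assign_addresses(sections)
        by_name = {s.name: s for s in sections}
        new_thunks = []  # (index to insert at, thunk section)
        for i, s in enumerate(sections):
            local = {}   # target -> thunk created after this section in this pass
            for call in s.calls:
                offset, target = call
                if abs(by_name[target].addr - (s.addr + offset)) < REACH:
                    continue
                thunk = local.get(target)
                if thunk is None:
                    name = f"thunk{len(by_name) + len(new_thunks)}.{target}"
                    thunk = Section(name, THUNK_SIZE)
                    local[target] = thunk
                    new_thunks.append((i + 1, thunk))
                call[1] = thunk.name  # retarget the branch at the thunk
        if not new_thunks:
            return passes
        for index, thunk in reversed(new_thunks):  # insert back to front so indices stay valid
            sections.insert(index, thunk)
    raise RuntimeError("thunk creation did not converge")
```

The second pass is what confirms that the thunks added in the first pass did not push any other branch out of range.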

There's quite a lot of complexity that can be added with respect to
the placement of thunks within the output section. For example if
there is a caller with a low address and a caller with a high address,
both might be able to reuse a thunk placed in the middle. I think it
is worth starting simple though.
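
The "thunk in the middle" case can be checked with a little interval arithmetic. The Python sketch below assumes a symmetric branch reach and uses a made-up helper name; it computes the address window in which a single thunk would be reachable from every listed caller.

```python
REACH = 1 << 25  # assumed symmetric branch reach (32 MiB)


def shared_thunk_window(callers, reach=REACH):
    """Return the (low, high) addresses between which one thunk reaches all callers.

    Returns None when the callers are too far apart to share a thunk.
    """
    low = max(callers) - reach + 1   # lowest address the highest caller can reach
    high = min(callers) + reach - 1  # highest address the lowest caller can reach
    return (low, high) if low <= high else None
```

Two callers 48 MiB apart can share a thunk placed anywhere in the 16 MiB overlap between them, whereas callers 64 MiB apart cannot share one at all.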

Peter


Rafael Avila de Espindola via llvm-dev

2017-Jan-18 15:54 UTC


Sorry for being late on the thread, but I just wanted to say that I
agree with the design. The problem is very similar to relaxation in MC
and should probably have a similar solution:

* Keep all offsets relative to input/synthetic sections (fragments in
  MC).
* Compute addresses.
* If anything is not in range add a thunk (relax in MC).
* Repeat.
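
The first point is what makes the repeat step cheap: if symbols store only a section and an offset within it, inserting a thunk section and recomputing section addresses updates every symbol address automatically. A minimal Python sketch, with hypothetical class names and an arbitrary base address:

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    size: int
    addr: int = 0


@dataclass
class Symbol:
    section: Section
    offset: int  # relative to the start of `section`, never absolute

    @property
    def address(self) -> int:
        return self.section.addr + self.offset


def assign_addresses(sections, base=0x10000):
    addr = base
    for s in sections:
        s.addr = addr
        addr += s.size
```

Inserting a 12-byte thunk section ahead of a symbol's section and calling `assign_addresses` again moves the symbol's address by 12 without touching the `Symbol` object.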

Cheers,
Rafael

Peter Smith <peter.smith at linaro.org> writes:
> I'm about to start working on range extension thunks in lld. This is
> an attempt to summarize the approach I'd like to take and what the
> impact will be on lld outside of thunks. I'm interested if anyone has
> any constraints the approach will break, alternative suggestions, or
> is working on something I'll need to take account of?
>
> I expect range extension thunks to be important for ARM and useful for
> AArch64. In principle any target with a limited range branch immediate
> instruction could benefit from them.
>
> I've put some detail about range extension thunks at the end of this
> message for those not familiar with them. The name range extension
> thunks is by no means universal, for example in ARM's ABI
> documentation they are called veneers, The GNU linker calls them
> stubs.
>
> Summary of existing thunk implementation (ARM interworking and Mips
> PIC to non-PIC calls):
> - A Regular, Shared or Undefined symbol may have a single thunk
> - For each relocation to a symbol S, if we need a thunk we use the
> thunk for S as the target of the relocation rather than S. The thunk
> will transfer control to S.
> - Thunks are assigned to an InputSection, these are written out when
> the InputSection is written. The InputSection with the Thunk contains
> either the caller (ARM) or callee (Mips).
> - For all existing thunks, the decision of whether a thunk is needed
> is not dependent on address. A Thumb branch to ARM will always need a
> thunk, no matter the distance. Thunks can therefore be generated
> relatively early in Writer::run().
>
> High level impact of range extension thunks:
>
> There may be more than one than one thunk per callee:
> - A range extension thunk must be placed within range of the caller,
> there may be cases where no single thunk for a callee is in range of
> all callers.
> - An ARM Target may need a different Thunk for ARM and Thumb callers.
>
> Address information is needed to determine if a range extension thunk is
needed:
> - The more precise the address information available the less thunks
> will be generated. the most precise address information is the final
> address of caller and callee is known at thunk creation time, the
> least precise is neither the address of the caller or callee is known.
>
> Range extension thunks can be combined or replace other thunks
> - Thunks may also be used for instruction set interworking (ARM) or
> for calling between position independent and non-position independent
> code (Mips). Either a chain of thunks or a combined thunk that does
> both operations is needed. For ARM all range extension thunks can
> trivially be interworking thunks as well.
>
> Range extension thunk placement can be important
> - Many callers may need a range extension. Placing a range extension
> thunk so that it is in range of the most callers minimizes number of
> thunks needed.
> - Thunks may be better as synthetic sections rather than as additions
> to input sections.
>
> Adding/removing content must not break the range calculations used in
> range extension thunks.
> - If any caller, callee or thunk address is changed after range
> extension thunks are calculated it could invalidate the range
> calculation.
> - Ideally range extension thunks are the last operation the linker
> does prior to resolving relocations.
>
> I think that there are two separate areas to a range extension thunk
> implementation that can be considered separately.
> 1.) Moving thunk generation to a later stage. At a minimum we need an
> estimate of the address of each caller and callee; in an ideal world
> we know the final address of each caller and callee. This could mean
> assigning section addresses multiple times.
> 2.) The alterations to the core data structures to permit more than
> one Thunk per symbol and the logic to select the "right" Thunk for
> each relocation.
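
One way to picture point 2 is a per-symbol list of thunks with a lookup that picks one reachable from the caller. This is only a sketch under stated assumptions: `Thunk`, `ThunkTable`, `reaches` and the single simplified +-32 MiB reach used for both ARM and Thumb callers are hypothetical and do not describe lld's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a thunk: where it lives and which callers it serves.
struct Thunk {
  uint64_t addr;  // address at which the thunk is placed
  bool thumb;     // true if it serves Thumb callers, false for ARM callers
};

constexpr int64_t kRange = int64_t{1} << 25;  // simplified +-32 MiB reach

bool reaches(uint64_t from, uint64_t to) {
  int64_t d = static_cast<int64_t>(to) - static_cast<int64_t>(from);
  return d >= -kRange && d < kRange;
}

class ThunkTable {
public:
  // Reuse a thunk for sym that matches the caller's state and is in range of
  // callerAddr; otherwise record a new one at placeAddr (chosen by the caller
  // of this function, and assumed to be in range of callerAddr).
  Thunk getOrCreate(const std::string &sym, uint64_t callerAddr,
                    bool callerThumb, uint64_t placeAddr) {
    assert(!sym.empty() && "a symbol name is expected");
    std::vector<Thunk> &list = thunks[sym];
    for (const Thunk &t : list)
      if (t.thumb == callerThumb && reaches(callerAddr, t.addr))
        return t;
    list.push_back({placeAddr, callerThumb});
    return list.back();
  }

  // Number of thunks currently recorded for sym.
  std::size_t count(const std::string &sym) const {
    auto it = thunks.find(sym);
    return it == thunks.end() ? 0 : it->second.size();
  }

private:
  std::map<std::string, std::vector<Thunk>> thunks;
};
```

Two nearby ARM callers of the same symbol would share one entry, while a caller hundreds of megabytes away, or a Thumb caller, would get its own.
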
>
> The design I'd like to aim at moves thunk creation into
> finalizeSections() at a point where the sizes and addresses of all the
> SyntheticSections are known. This would mean that the final address of
> each caller and callee could be used, and after thunk creation there
> would be no further content changes. This would mean:
> - All code that runs prior to thunk creation may have the offset in
> the OutputSection altered by the addition of thunks. In particular
> scanRelocs() calculates the offset in the OutputSection of some
> relocations. We would need to find alternative ways of handling these
> cases so that they could either survive thunk creation or be patched
> up afterwards.
> - assignAddresses() will need to run at least twice if thunks are
> created. At least once to give the thunk creation the caller and
> callee addresses, and at least once after all thunks have been
> created.
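
The "assign addresses, create thunks, assign again" cycle can be sketched as a loop that repeats until no new thunks appear. Everything here is a simplified model rather than lld's implementation: `Section`, `Call`, `createThunks`, the 12-byte thunk size (LDR + BX + one literal word) and the choice to grow the caller's section are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Section {
  uint64_t size;
  uint64_t addr;
};

struct Call {
  std::size_t fromSec;  // index of the section containing the branch
  uint64_t fromOff;     // offset of the branch within that section
  std::size_t toSec;    // index of the section the callee starts
  bool hasThunk;
};

constexpr int64_t kRange = int64_t{1} << 25;  // simplified +-32 MiB reach
constexpr uint64_t kThunkSize = 12;           // assumed: LDR, BX, literal word

// Lay the sections out back to back starting at base.
void assignAddresses(std::vector<Section> &secs, uint64_t base) {
  uint64_t addr = base;
  for (Section &s : secs) {
    s.addr = addr;
    addr += s.size;
  }
}

// Repeat layout until a pass adds no thunks; returns the number of layout
// passes. Terminates because each call receives at most one thunk.
int createThunks(std::vector<Section> &secs, std::vector<Call> &calls,
                 uint64_t base) {
  int passes = 0;
  bool changed = true;
  while (changed) {
    assignAddresses(secs, base);
    ++passes;
    changed = false;
    for (Call &c : calls) {
      assert(c.fromSec < secs.size() && c.toSec < secs.size());
      if (c.hasThunk)
        continue;
      int64_t d = static_cast<int64_t>(secs[c.toSec].addr) -
                  static_cast<int64_t>(secs[c.fromSec].addr + c.fromOff);
      if (d < -kRange || d >= kRange) {
        secs[c.fromSec].size += kThunkSize;  // thunk grows the caller section
        c.hasThunk = true;
        changed = true;  // later addresses moved, so lay out again
      }
    }
  }
  return passes;
}
```

With no out-of-range calls the loop runs one layout pass; as soon as one thunk is added it runs at least two, matching the "at least twice" observation above.
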
>
> There is an alternative design that only uses estimates of caller and
> callee address to decide if a thunk is needed. In effect we use a
> heuristic to predict how much extra synthetic content, such as plt and
> got size, will be added after Thunk creation when deciding if a Thunk
> is needed. I'm not in favour of this approach as from bitter
> experience it tends to result in hard to debug problems when the
> heuristics break down. Precise addresses would also allow errata
> patching thunks [*]
>
> I've not thought too hard about how to alter the core data structures
> yet. I think this will mostly be implementation detail though.
>
> Next steps:
> I'd like to proceed with the following plan:
> 1.) Move the existing thunk implementation to where it would need to
> be in finalizeSections(). This should flush out all the non-thunk
> related assumptions about addresses without adding any additional
> complexity to the Thunk implementation.
> 2.) Add support for multiple thunks per symbol
> 3.) If it turns out to be a good idea, implement thunks as
> SyntheticSections
> 4.) Add support for range extensions.
>
> I think the first implementation of range extension thunks should be
> simple and not try too hard to minimize the number of thunks needed.
> If there is a need to optimize it can be done later as the changes
> should be within the thunk creation module.
>
> Thanks for reading
>
> Peter
>
> The remainder of the message is a brief explanation of range extension
> and errata patching thunks.
>
> What are range extension thunks?
> Many architectures have branch instructions that have a finite range
> that is insufficient to reach all possible program locations. For
> example the ARM branch immediate instruction has an immediate that
> encodes an offset of +-32MB from the branch instruction. A range
> extension thunk is a linker generated code sequence, inserted between
> the caller and the callee, that completes the transfer of control to
> the callee when the distance between the caller and callee exceeds the
> range of the branch instruction. A simple example in pseudo assembly
> for a non-position independent ARM function call:
>
> source:
>     BL long_range_thunk_to_target
>     ...
> long_range_thunk_to_target:
>     LDR r12, target_address ; r12 is the corruptible interprocedural
>                             ; scratch register (ip)
>     BX r12
> target_address:
>     .word target            ; address of the real callee
>     ...
> target:
>     ...
>
> What is an errata patching thunk?
> Some CPU errata (hardware bugs) can be fixed at link time by
> replacing an instruction with a branch to a sequence of equivalent
> instructions that are guaranteed not to trigger the erratum. In
> some cases the trigger sequence is dependent on precise addresses such
> as immediates crossing page boundaries, for example
> https://sourceware.org/ml/binutils-cvs/2015-04/msg00012.html . Errata
> patching is out of the scope of implementing range extension thunks
> but can be seen as a generalization of it.

Ed Maste via llvm-dev

2017-Jan-18 23:48 UTC

[llvm-dev] RFC: LLD range extension thunks

On 4 January 2017 at 13:34, Peter Smith via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> I'm about to start working on range extension thunks in lld. This is
> an attempt to summarize the approach I'd like to take and what the
> impact will be on lld outside of thunks.

Now that LLD works well for FreeBSD/amd64 (and arm64 is very close)
I'm looking at other architectures, starting with mips64. The
statically-linked toolchain components currently fail to link with an
out of range jump, so I'm very interested in seeing this work
progress. Are you looking at only arm and AArch64? Once the
infrastructure is in I'll try to take a look at mips if nobody else
does first.

Simon Atanasyan via llvm-dev

2017-Jan-18 23:59 UTC

[llvm-dev] RFC: LLD range extension thunks

On Jan 19, 2017 2:48 AM, "Ed Maste" <emaste at freebsd.org> wrote:

> On 4 January 2017 at 13:34, Peter Smith via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > I'm about to start working on range extension thunks in lld. This is
> > an attempt to summarize the approach I'd like to take and what the
> > impact will be on lld outside of thunks.
>
> Now that LLD works well for FreeBSD/amd64 (and arm64 is very close)
> I'm looking at other architectures, starting with mips64. The
> statically-linked toolchain components currently fail to link with an
> out of range jump, so I'm very interested in seeing this work
> progress. Are you looking at only arm and AArch64? Once the
> infrastructure is in I'll try to take a look at mips if nobody else
> does first.


I'm waiting for these changes too. Currently mips thunks are placed at the
end of the corresponding section. Not sure about FreeBSD, but on Linux that
leads to incorrect code in the case of static linking: a thunk goes between
crt*.o files which need to be "joined" together. The GNU linker puts thunks
in a separate section. We need to do the same thing.
