I'm about to start working on range extension thunks in lld. This is an attempt to summarize the approach I'd like to take and what the impact will be on lld outside of thunks. I'm interested if anyone has any constraints the approach will break, alternative suggestions, or is working on something I'll need to take account of? I expect range extension thunks to be important for ARM and useful for AArch64. In principle any target with a limited range branch immediate instruction could benefit from them. I've put some detail about range extension thunks at the end of this message for those not familiar with them. The name range extension thunks is by no means universal, for example in ARM's ABI documentation they are called veneers, The GNU linker calls them stubs. Summary of existing thunk implementation (ARM interworking and Mips PIC to non-PIC calls): - A Regular, Shared or Undefined symbol may have a single thunk - For each relocation to a symbol S, if we need a thunk we use the thunk for S as the target of the relocation rather than S. The thunk will transfer control to S. - Thunks are assigned to an InputSection, these are written out when the InputSection is written. The InputSection with the Thunk contains either the caller (ARM) or callee (Mips). - For all existing thunks, the decision of whether a thunk is needed is not dependent on address. A Thumb branch to ARM will always need a thunk, no matter the distance. Thunks can therefore be generated relatively early in Writer::run(). High level impact of range extension thunks: There may be more than one than one thunk per callee: - A range extension thunk must be placed within range of the caller, there may be cases where no single thunk for a callee is in range of all callers. - An ARM Target may need a different Thunk for ARM and Thumb callers. Address information is needed to determine if a range extension thunk is needed: - The more precise the address information available the less thunks will be generated. the most precise address information is the final address of caller and callee is known at thunk creation time, the least precise is neither the address of the caller or callee is known. Range extension thunks can be combined or replace other thunks - Thunks may also be used for instruction set interworking (ARM) or for calling between position independent and non-position independent code (Mips). Either a chain of thunks or a combined thunk that does both operations is needed. For ARM all range extension thunks can trivially be interworking thunks as well. Range extension thunk placement can be important - Many callers may need a range extension. Placing a range extension thunk so that it is in range of the most callers minimizes number of thunks needed. - Thunks may be better as synthetic sections rather than as additions to input sections. Adding/removing content must not break the range calculations used in range extension thunks. - If any caller, callee or thunk address is changed after range extension thunks are calculated it could invalidate the range calculation. - Ideally range extension thunks are the last operation the linker does prior to resolving relocations. I think that there are two separate areas to a range extension thunk implementation that can be considered separately. 1.) Moving thunk generation to a later stage, at a minimum we need an estimation of the address of each caller and callee, in an ideal world we know the final address of each caller and callee. This could mean assigning section addresses multiple times. 2.) The alterations to the core data structures to permit more than one Thunk per symbol and the logic to select the "right" Thunk for each relocation. The design I'd like to aim at moves thunk creation into finalizeSections() at a point where the sizes and addresses of all the SyntheticSections are known. This would mean that the final address of each caller and callee could be used, and after thunk creation there would be no further content changes. This would mean: - All code that runs prior to thunk creation may have the offset in the OutputSection altered by the addition of thunks. In particular scanRelocs() calculates the offset in the OutputSection of some relocations. We would need to find alternative ways of handling these cases so that they could either survive thunk creation or be patched up afterwards. - assignAddresses() will need to run at least twice if thunks are created. At least once to give the thunk creation the caller and callee addresses, and at least once after all thunks have been created. There is an alternative design that only uses estimates of caller and callee address to decide if a thunk is needed. In effect we use a heuristic to predict how much extra synthetic content, such as plt and got size, will be added after Thunk creation when deciding if a Thunk is needed. I'm not in favour of this approach as from bitter experience it tends to result in hard to debug problems when the heuristics break down. Precise addresses would also allow errata patching thunks [*] I've not thought too hard about how to alter the core data structures yet. I think this will mostly be implementation detail though. Next steps: I'd like to proceed with the following plan: 1.) Move the existing thunk implementation to where it would need to be in finalizeSections(). This should flush out all the non-thunk related assumptions about addresses without adding any existing complexity to the Thunk implementation. 2.) Add support for multiple thunks per symbol 3.) If it turns out to be a good idea, implement thunks as SyntheticSections 4.) Add support for range extensions. I think the first implementation of range extension thunks should be simple and not try too hard to minimize the number of thunks needed. If there is a need to optimize it can be done later as the changes should be within the thunk creation module. Thanks for reading Peter The remainder of the message is a brief explanation of range extension and errata patching thunks. What are range extension thunks? Many architectures have branch instructions that have a finite range that is insufficient to reach all possible program locations. For example the ARM branch immediate instruction has an immediate that encodes an offset of +-32Mb from the branch instruction. A range extension thunk is a linker generated code sequence, inserted between the caller and the callee, that completes the transfer of control to the callee when the distance between the caller and callee exceeds the range of the branch instruction. A simple example in psuedo assembly for a non-position independent ARM function call. source: BL long_range_thunk_to_target ... long_range_thunk_to_target LDR r12, target_address ; r12 is the corruptible interprocedural scratch register (ip) BX r12 target_address: .word target ; ... target: ... What is an errata patching thunk? Some CPU errata (hardware bugs) can be fixed at a link time by replacing an instruction with a branch to a sequence of equivalent instructions that are guaranteed to to not trigger the erratum. In some cases the trigger sequence is dependent on precise addresses such as immediates crossing page boundaries, for example https://sourceware.org/ml/binutils-cvs/2015-04/msg00012.html . Errata patching is out of the scope of implementing range extension thunks but can be seen as a generalization of it.
Hi Peter, Here are my comments: - I didn't think hard enough, but I believe creating thunks as synthetic sections instead of attached data for other input sections is towards a right direction, because synthetic sections are suitable for adding linker-generated data to output files. - As you wrote, we need to iterate relocations at least twice to create range extension thunks. Each iteration can be a linear scan, correct? I mean, we can start from the section at the lowest address towards higher address examining relocations and create thunks if targets are too far. - I do not see a reason that we need to associate range extension thunks to symbols. It seems to me that while scanning relocations, we need to keep only the last thunk address for each symbol. If we find that some relocation against symbol S needs a range extension thunk, we first check if the last thunk for S is within the range and reuse it if it is. In this way, we need to keep only one thunk for one symbol at any moment. - Have you considered rewriting relocations? I think, if we find that relocation R pointing to symbol S needs a range extension thunk, we should (1) create a range extension thunk, (2) create a symbol body object S' for the thunk, (3) and rewrite R to point to S' instead of S. Then later passes don't have to deal with thunks. On Thu, Jan 5, 2017 at 3:34 AM, Peter Smith <peter.smith at linaro.org> wrote:> I'm about to start working on range extension thunks in lld. This is > an attempt to summarize the approach I'd like to take and what the > impact will be on lld outside of thunks. I'm interested if anyone has > any constraints the approach will break, alternative suggestions, or > is working on something I'll need to take account of? > > I expect range extension thunks to be important for ARM and useful for > AArch64. In principle any target with a limited range branch immediate > instruction could benefit from them. > > I've put some detail about range extension thunks at the end of this > message for those not familiar with them. The name range extension > thunks is by no means universal, for example in ARM's ABI > documentation they are called veneers, The GNU linker calls them > stubs. > > Summary of existing thunk implementation (ARM interworking and Mips > PIC to non-PIC calls): > - A Regular, Shared or Undefined symbol may have a single thunk > - For each relocation to a symbol S, if we need a thunk we use the > thunk for S as the target of the relocation rather than S. The thunk > will transfer control to S. > - Thunks are assigned to an InputSection, these are written out when > the InputSection is written. The InputSection with the Thunk contains > either the caller (ARM) or callee (Mips). > - For all existing thunks, the decision of whether a thunk is needed > is not dependent on address. A Thumb branch to ARM will always need a > thunk, no matter the distance. Thunks can therefore be generated > relatively early in Writer::run(). > > High level impact of range extension thunks: > > There may be more than one than one thunk per callee: > - A range extension thunk must be placed within range of the caller, > there may be cases where no single thunk for a callee is in range of > all callers. > - An ARM Target may need a different Thunk for ARM and Thumb callers. > > Address information is needed to determine if a range extension thunk is > needed: > - The more precise the address information available the less thunks > will be generated. the most precise address information is the final > address of caller and callee is known at thunk creation time, the > least precise is neither the address of the caller or callee is known. > > Range extension thunks can be combined or replace other thunks > - Thunks may also be used for instruction set interworking (ARM) or > for calling between position independent and non-position independent > code (Mips). Either a chain of thunks or a combined thunk that does > both operations is needed. For ARM all range extension thunks can > trivially be interworking thunks as well. > > Range extension thunk placement can be important > - Many callers may need a range extension. Placing a range extension > thunk so that it is in range of the most callers minimizes number of > thunks needed. > - Thunks may be better as synthetic sections rather than as additions > to input sections. > > Adding/removing content must not break the range calculations used in > range extension thunks. > - If any caller, callee or thunk address is changed after range > extension thunks are calculated it could invalidate the range > calculation. > - Ideally range extension thunks are the last operation the linker > does prior to resolving relocations. > > I think that there are two separate areas to a range extension thunk > implementation that can be considered separately. > 1.) Moving thunk generation to a later stage, at a minimum we need an > estimation of the address of each caller and callee, in an ideal world > we know the final address of each caller and callee. This could mean > assigning section addresses multiple times. > 2.) The alterations to the core data structures to permit more than > one Thunk per symbol and the logic to select the "right" Thunk for > each relocation. > > The design I'd like to aim at moves thunk creation into > finalizeSections() at a point where the sizes and addresses of all the > SyntheticSections are known. This would mean that the final address of > each caller and callee could be used, and after thunk creation there > would be no further content changes. This would mean: > - All code that runs prior to thunk creation may have the offset in > the OutputSection altered by the addition of thunks. In particular > scanRelocs() calculates the offset in the OutputSection of some > relocations. We would need to find alternative ways of handling these > cases so that they could either survive thunk creation or be patched > up afterwards. > - assignAddresses() will need to run at least twice if thunks are > created. At least once to give the thunk creation the caller and > callee addresses, and at least once after all thunks have been > created. > > There is an alternative design that only uses estimates of caller and > callee address to decide if a thunk is needed. In effect we use a > heuristic to predict how much extra synthetic content, such as plt and > got size, will be added after Thunk creation when deciding if a Thunk > is needed. I'm not in favour of this approach as from bitter > experience it tends to result in hard to debug problems when the > heuristics break down. Precise addresses would also allow errata > patching thunks [*] > > I've not thought too hard about how to alter the core data structures > yet. I think this will mostly be implementation detail though. > > Next steps: > I'd like to proceed with the following plan: > 1.) Move the existing thunk implementation to where it would need to > be in finalizeSections(). This should flush out all the non-thunk > related assumptions about addresses without adding any existing > complexity to the Thunk implementation. > 2.) Add support for multiple thunks per symbol > 3.) If it turns out to be a good idea, implement thunks as > SyntheticSections > 4.) Add support for range extensions. > > I think the first implementation of range extension thunks should be > simple and not try too hard to minimize the number of thunks needed. > If there is a need to optimize it can be done later as the changes > should be within the thunk creation module. > > Thanks for reading > > Peter > > The remainder of the message is a brief explanation of range extension > and errata patching thunks. > > What are range extension thunks? > Many architectures have branch instructions that have a finite range > that is insufficient to reach all possible program locations. For > example the ARM branch immediate instruction has an immediate that > encodes an offset of +-32Mb from the branch instruction. A range > extension thunk is a linker generated code sequence, inserted between > the caller and the callee, that completes the transfer of control to > the callee when the distance between the caller and callee exceeds the > range of the branch instruction. A simple example in psuedo assembly > for a non-position independent ARM function call. > > source: > BL long_range_thunk_to_target > ... > long_range_thunk_to_target > LDR r12, target_address ; r12 is the corruptible interprocedural > scratch register (ip) > BX r12 > target_address: > .word target ; > ... > target: > ... > > What is an errata patching thunk? > Some CPU errata (hardware bugs) can be fixed at a link time by > replacing an instruction with a branch to a sequence of equivalent > instructions that are guaranteed to to not trigger the erratum. In > some cases the trigger sequence is dependent on precise addresses such > as immediates crossing page boundaries, for example > https://sourceware.org/ml/binutils-cvs/2015-04/msg00012.html . Errata > patching is out of the scope of implementing range extension thunks > but can be seen as a generalization of it. >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170105/d47dab2a/attachment.html>
Hello Rui, Thanks for the comments - Synthetic sections and rewriting relocations I think that this would definitely be worth trying. It should remove the need for thunks to be represented in the core data structures, and would allow . It would also mean that we wouldn't have to associate symbols with thunks as the relocations would directly target the thunks. ARM interworking makes reusing thunks more difficult as not every thunk is compatible with every caller. For example: ARM B target and Thumb2 B.W target can't reuse the same thunk even if in range as the branch instruction can't change state. I think it is worth an experiment to make the existing implementation of thunks use synthetic sections and rewriting relocations before trying to implement range extension thunks. - Yes the scan is linear it is essentially: do assign addresses to input sections for each relocation if (thunk needed) create thunk or reuse existing one while (no more thunks added) There's quite a lot of complexity that can be added with respect to the placement of thunks within the output section. For example if there is a caller with a low address and a caller with a high address, both might be able to reuse a thunk placed in the middle. I think it is worth starting simple though. Peter On 5 January 2017 at 09:52, Rui Ueyama <ruiu at google.com> wrote:> Hi Peter, > > Here are my comments: > > - I didn't think hard enough, but I believe creating thunks as synthetic > sections instead of attached data for other input sections is towards a > right direction, because synthetic sections are suitable for adding > linker-generated data to output files. > > - As you wrote, we need to iterate relocations at least twice to create > range extension thunks. Each iteration can be a linear scan, correct? I > mean, we can start from the section at the lowest address towards higher > address examining relocations and create thunks if targets are too far. > > - I do not see a reason that we need to associate range extension thunks to > symbols. It seems to me that while scanning relocations, we need to keep > only the last thunk address for each symbol. If we find that some relocation > against symbol S needs a range extension thunk, we first check if the last > thunk for S is within the range and reuse it if it is. In this way, we need > to keep only one thunk for one symbol at any moment. > > - Have you considered rewriting relocations? I think, if we find that > relocation R pointing to symbol S needs a range extension thunk, we should > (1) create a range extension thunk, (2) create a symbol body object S' for > the thunk, (3) and rewrite R to point to S' instead of S. Then later passes > don't have to deal with thunks. > > > On Thu, Jan 5, 2017 at 3:34 AM, Peter Smith <peter.smith at linaro.org> wrote: >> >> I'm about to start working on range extension thunks in lld. This is >> an attempt to summarize the approach I'd like to take and what the >> impact will be on lld outside of thunks. I'm interested if anyone has >> any constraints the approach will break, alternative suggestions, or >> is working on something I'll need to take account of? >> >> I expect range extension thunks to be important for ARM and useful for >> AArch64. In principle any target with a limited range branch immediate >> instruction could benefit from them. >> >> I've put some detail about range extension thunks at the end of this >> message for those not familiar with them. The name range extension >> thunks is by no means universal, for example in ARM's ABI >> documentation they are called veneers, The GNU linker calls them >> stubs. >> >> Summary of existing thunk implementation (ARM interworking and Mips >> PIC to non-PIC calls): >> - A Regular, Shared or Undefined symbol may have a single thunk >> - For each relocation to a symbol S, if we need a thunk we use the >> thunk for S as the target of the relocation rather than S. The thunk >> will transfer control to S. >> - Thunks are assigned to an InputSection, these are written out when >> the InputSection is written. The InputSection with the Thunk contains >> either the caller (ARM) or callee (Mips). >> - For all existing thunks, the decision of whether a thunk is needed >> is not dependent on address. A Thumb branch to ARM will always need a >> thunk, no matter the distance. Thunks can therefore be generated >> relatively early in Writer::run(). >> >> High level impact of range extension thunks: >> >> There may be more than one than one thunk per callee: >> - A range extension thunk must be placed within range of the caller, >> there may be cases where no single thunk for a callee is in range of >> all callers. >> - An ARM Target may need a different Thunk for ARM and Thumb callers. >> >> Address information is needed to determine if a range extension thunk is >> needed: >> - The more precise the address information available the less thunks >> will be generated. the most precise address information is the final >> address of caller and callee is known at thunk creation time, the >> least precise is neither the address of the caller or callee is known. >> >> Range extension thunks can be combined or replace other thunks >> - Thunks may also be used for instruction set interworking (ARM) or >> for calling between position independent and non-position independent >> code (Mips). Either a chain of thunks or a combined thunk that does >> both operations is needed. For ARM all range extension thunks can >> trivially be interworking thunks as well. >> >> Range extension thunk placement can be important >> - Many callers may need a range extension. Placing a range extension >> thunk so that it is in range of the most callers minimizes number of >> thunks needed. >> - Thunks may be better as synthetic sections rather than as additions >> to input sections. >> >> Adding/removing content must not break the range calculations used in >> range extension thunks. >> - If any caller, callee or thunk address is changed after range >> extension thunks are calculated it could invalidate the range >> calculation. >> - Ideally range extension thunks are the last operation the linker >> does prior to resolving relocations. >> >> I think that there are two separate areas to a range extension thunk >> implementation that can be considered separately. >> 1.) Moving thunk generation to a later stage, at a minimum we need an >> estimation of the address of each caller and callee, in an ideal world >> we know the final address of each caller and callee. This could mean >> assigning section addresses multiple times. >> 2.) The alterations to the core data structures to permit more than >> one Thunk per symbol and the logic to select the "right" Thunk for >> each relocation. >> >> The design I'd like to aim at moves thunk creation into >> finalizeSections() at a point where the sizes and addresses of all the >> SyntheticSections are known. This would mean that the final address of >> each caller and callee could be used, and after thunk creation there >> would be no further content changes. This would mean: >> - All code that runs prior to thunk creation may have the offset in >> the OutputSection altered by the addition of thunks. In particular >> scanRelocs() calculates the offset in the OutputSection of some >> relocations. We would need to find alternative ways of handling these >> cases so that they could either survive thunk creation or be patched >> up afterwards. >> - assignAddresses() will need to run at least twice if thunks are >> created. At least once to give the thunk creation the caller and >> callee addresses, and at least once after all thunks have been >> created. >> >> There is an alternative design that only uses estimates of caller and >> callee address to decide if a thunk is needed. In effect we use a >> heuristic to predict how much extra synthetic content, such as plt and >> got size, will be added after Thunk creation when deciding if a Thunk >> is needed. I'm not in favour of this approach as from bitter >> experience it tends to result in hard to debug problems when the >> heuristics break down. Precise addresses would also allow errata >> patching thunks [*] >> >> I've not thought too hard about how to alter the core data structures >> yet. I think this will mostly be implementation detail though. >> >> Next steps: >> I'd like to proceed with the following plan: >> 1.) Move the existing thunk implementation to where it would need to >> be in finalizeSections(). This should flush out all the non-thunk >> related assumptions about addresses without adding any existing >> complexity to the Thunk implementation. >> 2.) Add support for multiple thunks per symbol >> 3.) If it turns out to be a good idea, implement thunks as >> SyntheticSections >> 4.) Add support for range extensions. >> >> I think the first implementation of range extension thunks should be >> simple and not try too hard to minimize the number of thunks needed. >> If there is a need to optimize it can be done later as the changes >> should be within the thunk creation module. >> >> Thanks for reading >> >> Peter >> >> The remainder of the message is a brief explanation of range extension >> and errata patching thunks. >> >> What are range extension thunks? >> Many architectures have branch instructions that have a finite range >> that is insufficient to reach all possible program locations. For >> example the ARM branch immediate instruction has an immediate that >> encodes an offset of +-32Mb from the branch instruction. A range >> extension thunk is a linker generated code sequence, inserted between >> the caller and the callee, that completes the transfer of control to >> the callee when the distance between the caller and callee exceeds the >> range of the branch instruction. A simple example in psuedo assembly >> for a non-position independent ARM function call. >> >> source: >> BL long_range_thunk_to_target >> ... >> long_range_thunk_to_target >> LDR r12, target_address ; r12 is the corruptible interprocedural >> scratch register (ip) >> BX r12 >> target_address: >> .word target ; >> ... >> target: >> ... >> >> What is an errata patching thunk? >> Some CPU errata (hardware bugs) can be fixed at a link time by >> replacing an instruction with a branch to a sequence of equivalent >> instructions that are guaranteed to to not trigger the erratum. In >> some cases the trigger sequence is dependent on precise addresses such >> as immediates crossing page boundaries, for example >> https://sourceware.org/ml/binutils-cvs/2015-04/msg00012.html . Errata >> patching is out of the scope of implementing range extension thunks >> but can be seen as a generalization of it. > >
Rafael Avila de Espindola via llvm-dev
2017-Jan-18 15:54 UTC
[llvm-dev] RFC: LLD range extension thunks
Sorry for being late on the thread, but I just wanted to say that I agree with the design. The problem is very similar to relaxation in MC and should probably have a similar solution: * Keep all offsets relative to input/synthetic sections (fragments in MC). * Compute addresses. * If anything is not in range add a thunk (relax in MC). * Repeat. Cheers, Rafael Peter Smith <peter.smith at linaro.org> writes:> I'm about to start working on range extension thunks in lld. This is > an attempt to summarize the approach I'd like to take and what the > impact will be on lld outside of thunks. I'm interested if anyone has > any constraints the approach will break, alternative suggestions, or > is working on something I'll need to take account of? > > I expect range extension thunks to be important for ARM and useful for > AArch64. In principle any target with a limited range branch immediate > instruction could benefit from them. > > I've put some detail about range extension thunks at the end of this > message for those not familiar with them. The name range extension > thunks is by no means universal, for example in ARM's ABI > documentation they are called veneers, The GNU linker calls them > stubs. > > Summary of existing thunk implementation (ARM interworking and Mips > PIC to non-PIC calls): > - A Regular, Shared or Undefined symbol may have a single thunk > - For each relocation to a symbol S, if we need a thunk we use the > thunk for S as the target of the relocation rather than S. The thunk > will transfer control to S. > - Thunks are assigned to an InputSection, these are written out when > the InputSection is written. The InputSection with the Thunk contains > either the caller (ARM) or callee (Mips). > - For all existing thunks, the decision of whether a thunk is needed > is not dependent on address. A Thumb branch to ARM will always need a > thunk, no matter the distance. Thunks can therefore be generated > relatively early in Writer::run(). > > High level impact of range extension thunks: > > There may be more than one than one thunk per callee: > - A range extension thunk must be placed within range of the caller, > there may be cases where no single thunk for a callee is in range of > all callers. > - An ARM Target may need a different Thunk for ARM and Thumb callers. > > Address information is needed to determine if a range extension thunk is needed: > - The more precise the address information available the less thunks > will be generated. the most precise address information is the final > address of caller and callee is known at thunk creation time, the > least precise is neither the address of the caller or callee is known. > > Range extension thunks can be combined or replace other thunks > - Thunks may also be used for instruction set interworking (ARM) or > for calling between position independent and non-position independent > code (Mips). Either a chain of thunks or a combined thunk that does > both operations is needed. For ARM all range extension thunks can > trivially be interworking thunks as well. > > Range extension thunk placement can be important > - Many callers may need a range extension. Placing a range extension > thunk so that it is in range of the most callers minimizes number of > thunks needed. > - Thunks may be better as synthetic sections rather than as additions > to input sections. > > Adding/removing content must not break the range calculations used in > range extension thunks. > - If any caller, callee or thunk address is changed after range > extension thunks are calculated it could invalidate the range > calculation. > - Ideally range extension thunks are the last operation the linker > does prior to resolving relocations. > > I think that there are two separate areas to a range extension thunk > implementation that can be considered separately. > 1.) Moving thunk generation to a later stage, at a minimum we need an > estimation of the address of each caller and callee, in an ideal world > we know the final address of each caller and callee. This could mean > assigning section addresses multiple times. > 2.) The alterations to the core data structures to permit more than > one Thunk per symbol and the logic to select the "right" Thunk for > each relocation. > > The design I'd like to aim at moves thunk creation into > finalizeSections() at a point where the sizes and addresses of all the > SyntheticSections are known. This would mean that the final address of > each caller and callee could be used, and after thunk creation there > would be no further content changes. This would mean: > - All code that runs prior to thunk creation may have the offset in > the OutputSection altered by the addition of thunks. In particular > scanRelocs() calculates the offset in the OutputSection of some > relocations. We would need to find alternative ways of handling these > cases so that they could either survive thunk creation or be patched > up afterwards. > - assignAddresses() will need to run at least twice if thunks are > created. At least once to give the thunk creation the caller and > callee addresses, and at least once after all thunks have been > created. > > There is an alternative design that only uses estimates of caller and > callee address to decide if a thunk is needed. In effect we use a > heuristic to predict how much extra synthetic content, such as plt and > got size, will be added after Thunk creation when deciding if a Thunk > is needed. I'm not in favour of this approach as from bitter > experience it tends to result in hard to debug problems when the > heuristics break down. Precise addresses would also allow errata > patching thunks [*] > > I've not thought too hard about how to alter the core data structures > yet. I think this will mostly be implementation detail though. > > Next steps: > I'd like to proceed with the following plan: > 1.) Move the existing thunk implementation to where it would need to > be in finalizeSections(). This should flush out all the non-thunk > related assumptions about addresses without adding any existing > complexity to the Thunk implementation. > 2.) Add support for multiple thunks per symbol > 3.) If it turns out to be a good idea, implement thunks as SyntheticSections > 4.) Add support for range extensions. > > I think the first implementation of range extension thunks should be > simple and not try too hard to minimize the number of thunks needed. > If there is a need to optimize it can be done later as the changes > should be within the thunk creation module. > > Thanks for reading > > Peter > > The remainder of the message is a brief explanation of range extension > and errata patching thunks. > > What are range extension thunks? > Many architectures have branch instructions that have a finite range > that is insufficient to reach all possible program locations. For > example the ARM branch immediate instruction has an immediate that > encodes an offset of +-32Mb from the branch instruction. A range > extension thunk is a linker generated code sequence, inserted between > the caller and the callee, that completes the transfer of control to > the callee when the distance between the caller and callee exceeds the > range of the branch instruction. A simple example in psuedo assembly > for a non-position independent ARM function call. > > source: > BL long_range_thunk_to_target > ... > long_range_thunk_to_target > LDR r12, target_address ; r12 is the corruptible interprocedural > scratch register (ip) > BX r12 > target_address: > .word target ; > ... > target: > ... > > What is an errata patching thunk? > Some CPU errata (hardware bugs) can be fixed at a link time by > replacing an instruction with a branch to a sequence of equivalent > instructions that are guaranteed to to not trigger the erratum. In > some cases the trigger sequence is dependent on precise addresses such > as immediates crossing page boundaries, for example > https://sourceware.org/ml/binutils-cvs/2015-04/msg00012.html . Errata > patching is out of the scope of implementing range extension thunks > but can be seen as a generalization of it.
On 4 January 2017 at 13:34, Peter Smith via llvm-dev <llvm-dev at lists.llvm.org> wrote:> I'm about to start working on range extension thunks in lld. This is > an attempt to summarize the approach I'd like to take and what the > impact will be on lld outside of thunks.Now that LLD works well for FreeBSD/amd64 (and arm64 is very close) I'm looking at other architectures, starting with mips64. The statically-linked toolchain components currently fail to link with an out of range jump, so I'm very interested in seeing this work progress. Are you looking at only arm and AArch64? Once the infrastructure is in I'll try to take a look at mips if nobody else does first.
Simon Atanasyan via llvm-dev
2017-Jan-18 23:59 UTC
[llvm-dev] RFC: LLD range extension thunks
On Jan 19, 2017 2:48 AM, "Ed Maste" <emaste at freebsd.org> wrote: On 4 January 2017 at 13:34, Peter Smith via llvm-dev <llvm-dev at lists.llvm.org> wrote:> I'm about to start working on range extension thunks in lld. This is > an attempt to summarize the approach I'd like to take and what the > impact will be on lld outside of thunks.Now that LLD works well for FreeBSD/amd64 (and arm64 is very close) I'm looking at other architectures, starting with mips64. The statically-linked toolchain components currently fail to link with an out of range jump, so I'm very interested in seeing this work progress. Are you looking at only arm and AArch64? Once the infrastructure is in I'll try to take a look at mips if nobody else does first. I'm waiting for this changes too. Now mips thunks places at the end of the corresponding section. Not sure about FreeBSD but on Linux that leads to incorrect code in case of static linking -- a thunk goes between crt*.o files which needs to be "joined" together. Gnu linker puts thunks to the separate section. We need to do the same thing. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170119/e4447c5d/attachment.html>