Simon Atanasyan via llvm-dev
2016-Sep-07 13:58 UTC
[llvm-dev] [LLD] Writing thunks before the corresponding section
Hi,

The MIPS LA25 thunk is used to call a PIC function from non-PIC code. It usually contains three instructions:

    lui   $25, %hi(func)
    addiu $25, $25, %lo(func)
    j     func

We can write such a thunk at an arbitrary place in the generated file. But if a PIC function that requires the thunk is the first routine in a section, we can optimize the code and omit the jump instruction. To do so, we just write the following thunk right before the PIC routine:

    lui   $25, %hi(func)
    addiu $25, $25, %lo(func)

In fact, the GNU bfd/gold linkers write all MIPS LA25 thunks required for a section "A" into a separate input section "S" and put section "S" before "A". The last thunk in section "S" might have the optimized two-instruction form.

I would like to implement this optimization in LLD. My question is about ARM thunks: is it okay to write them before the corresponding input section, not after it as LLD does now?

--
Simon Atanasyan
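To make the two thunk forms concrete, here is a minimal Python sketch that encodes them as 32-bit MIPS instruction words. The helper names (hi16, lo16, la25_thunk) and the example address are invented for illustration and are not LLD code. The sketch keeps the instruction order used in the message and ignores the branch delay slot of j, which a real linker has to account for.

```python
def hi16(addr):
    # %hi(addr): upper 16 bits, adjusted because addiu sign-extends %lo.
    return ((addr + 0x8000) >> 16) & 0xFFFF


def lo16(addr):
    # %lo(addr): lower 16 bits.
    return addr & 0xFFFF


def la25_thunk(func_addr, falls_through):
    """Return the thunk as a list of 32-bit instruction words.

    falls_through=True models a thunk written right before the PIC
    routine, so the trailing jump can be dropped.
    """
    words = [
        0x3C190000 | hi16(func_addr),  # lui   $25, %hi(func)
        0x27390000 | lo16(func_addr),  # addiu $25, $25, %lo(func)
    ]
    if not falls_through:
        # j func: 26-bit word index of the target within the 256 MB region.
        words.append(0x08000000 | ((func_addr >> 2) & 0x03FFFFFF))
    return words


if __name__ == "__main__":
    for form in (False, True):
        print([hex(w) for w in la25_thunk(0x00418004, form)])
```

The only difference between the two forms is the final j word, so the optimized form saves 4 bytes and one jump per call through the thunk.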
Peter Smith via llvm-dev
2016-Sep-07 16:55 UTC
[llvm-dev] [LLD] Writing thunks before the corresponding section
Hello Simon,

Yes, it is okay to write ARM thunks before an InputSection. There is a similar "inline state change" thunk in ARM that does BX PC, NOP to change state and fall through. The ARM thunks that are implemented now just need to be in range of the source branch.

I have previously worked on an ARM linker that has thunks in separate sections, in the same way that you describe for bfd/gold. I can't tell whether you are planning to implement thunks as separate InputSections, or to keep assigning them to existing InputSections as they are now but write them at the front rather than the end.

If you are considering putting the thunks as data to be written prior to the InputSection contents, I think you'll need some extra bookkeeping:

- Padding might be needed between the last thunk and the InputSection contents if the alignment of the InputSection is higher than the usual 2 or 4.
- If the thunk is conceptually part of the InputSection (starts at offset 0), then all the relocations and symbols will need displacing.

It is worth mentioning that disassembly of ARM and Thumb thunks may look a bit strange if they are moved from after the InputSection. This is because they lack a mapping symbol ($a or $t) that tells the disassembler which instruction set to disassemble. Adding mapping symbols for linker-generated InputSections is on my list of things to do.

Hope this helps,
Peter
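The two bookkeeping points can be sketched in a few lines of Python. This is only an illustration under assumed inputs: prepend_thunks, its parameters, and the symbol table shape are hypothetical, and the alignment is assumed to be a power of two.

```python
def align_up(value, alignment):
    # Round value up to the next multiple of a power-of-two alignment.
    return (value + alignment - 1) & ~(alignment - 1)


def prepend_thunks(thunk_sizes, section_align, symbol_offsets):
    """Model thunks written at offset 0 of an existing input section.

    Returns (padding, displacement, moved_symbols):
      padding       bytes needed between the last thunk and the
                    original contents to preserve their alignment
      displacement  amount by which every original offset moves
      moved_symbols symbol offsets after displacement
    Relocation offsets would need the same displacement.
    """
    thunk_bytes = sum(thunk_sizes)
    displacement = align_up(thunk_bytes, section_align)
    padding = displacement - thunk_bytes
    moved = {name: off + displacement for name, off in symbol_offsets.items()}
    return padding, displacement, moved


if __name__ == "__main__":
    # Two thunks (12 and 8 bytes) in front of a 16-byte-aligned section.
    print(prepend_thunks([12, 8], 16, {"func": 0, "helper": 0x20}))
```

With a 4-byte-aligned section the padding is zero for 4-byte-multiple thunks, which is why the issue only shows up for higher alignments.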
Bruce Hoult via llvm-dev
2016-Sep-07 20:50 UTC
[llvm-dev] [LLD] Writing thunks before the corresponding section
On Wed, Sep 7, 2016 at 7:55 PM, Peter Smith via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Hello Simon,
>
> Yes it is okay to write ARM thunks before an InputSection. There is a
> similar "inline state change" thunk in ARM that does BX PC, NOP to
> change state and fall through.

Maybe it's a little bit evil, but I've found that SUB PC,PC,#3 works just fine to change to Thumb state, without any NOP needed, on all current-generation CPUs I've tried it on, in particular Raspberry Pi 2 (Cortex A7), Pi 3 (Cortex A53) and Odroid XU4 (Cortex A15). Unfortunately I never thought to try this ten years ago on the ARM7TDMI.

e.g. (assumes Linux EABI kernel)

            .equ    SYSCALL_EXIT, 1
            .equ    SYSCALL_WRITE, 4
            .equ    STDOUT, 1

            .globl  _start
            .syntax unified
    _start:
            sub     pc,pc,#3        @ ARM state: PC reads as _start+8, so this lands on _start+5
            .thumb                  @ odd target address: continue at _start+4 in Thumb state
            movs    r0,#STDOUT
            adr     r1,hello
            movs    r2,#11          @ length of "Hello asm!\n"
            movs    r7,#SYSCALL_WRITE
            swi     0
            movs    r7,#SYSCALL_EXIT
            swi     0
            .align  2
    hello:
            .asciz  "Hello asm!\n"

> It is worth mentioning that disassembly of ARM and Thumb Thunks may
> look a bit strange if they are moved from after the InputSection. This
> is because they lack a mapping symbol ($a or $t) that tells the
> disassembler what instruction set to disassemble. I've got adding
> mapping symbol for linker generated InputSections on my list of things
> to do.

This disassembles fine when built in the standard way, so there's clearly no fundamental problem with disassembling past inline thunks:

    $ as asm_test.s -o asm_test.o
    $ ld asm_test.o -o asm_test
    $ ./asm_test
    Hello asm!
    $ objdump -d asm_test

    asm_test:     file format elf32-littlearm

    Disassembly of section .text:

    00010054 <_start>:
       10054:       e24ff003        sub     pc, pc, #3
       10058:       2001            movs    r0, #1
       1005a:       a103            add     r1, pc, #12     ; (adr r1, 10068 <hello>)
       1005c:       220b            movs    r2, #11
       1005e:       2704            movs    r7, #4
       10060:       df00            svc     0
       10062:       2701            movs    r7, #1
       10064:       df00            svc     0
       10066:       46c0            nop                     ; (mov r8, r8)

    00010068 <hello>:
       10068:       6c6c6548        .word   0x6c6c6548
       1006c:       7361206f        .word   0x7361206f
       10070:       000a216d        .word   0x000a216d

NB that the first word, e24ff003, is an ARM instruction, *not* Thumb2.
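The address arithmetic behind the SUB PC,PC,#3 trick can be checked with a short Python sketch. It assumes ARMv7-style behaviour, where a data-processing write to PC in ARM state interworks on bit 0 of the result; the function name arm_sub_pc_target is made up for this note.

```python
def arm_sub_pc_target(instr_addr, imm):
    """Model `sub pc, pc, #imm` executed in ARM state.

    Assumes the result is treated as an interworking branch:
    bit 0 set selects Thumb state and is cleared from the address.
    """
    pc_value = instr_addr + 8          # ARM state: PC reads as instruction address + 8
    result = pc_value - imm
    if result & 1:
        return result & ~1, "Thumb"
    return result, "ARM"


if __name__ == "__main__":
    target, state = arm_sub_pc_target(0x10054, 3)
    print(hex(target), state)
```

For the listing above, the instruction at 0x10054 yields 0x10059, so execution continues in Thumb state at 0x10058, the halfword immediately after the SUB. That is why no padding NOP is needed.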
Rui Ueyama via llvm-dev
2016-Sep-07 22:44 UTC
[llvm-dev] [LLD] Writing thunks before the corresponding section
This seems to be a reasonable optimization, and I don't have any particular concern about implementing it.
Bruce Hoult via llvm-dev
2016-Sep-08 11:42 UTC
[llvm-dev] [LLD] Writing thunks before the corresponding section
On Wed, Sep 7, 2016 at 7:55 PM, Peter Smith via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Hello Simon,
>
> Yes it is okay to write ARM thunks before an InputSection. There is a
> similar "inline state change" thunk in ARM that does BX PC, NOP to
> change state and fall through.

Forgot to mention: BX PC won't do anything in ARM mode. The standard way is ADD Rn,PC,#1; BX Rn (typically LR).

In Thumb mode, BX PC will switch to ARM, but the BX instruction should be 4-byte aligned, and the next 2 bytes are ignored. It doesn't matter whether they are a NOP or not.

The architecture manual says BX PC from the second Thumb instruction in a 4-byte word is unpredictable. On some implementations it will work, resuming at the ARM instruction in the very next bytes (the address 4 bytes past the word the Thumb instruction was in). But it's hit and miss. The following code works on Odroid XU4 (A15) and Raspberry Pi 2 (A7) but not on Raspberry Pi 3 (A53, bus error):

    00010054 <_start>:
       10054:       e24ff003        sub     pc, pc, #3
       10058:       2001            movs    r0, #1
       1005a:       a105            add     r1, pc, #20     ; (adr r1, 10070 <hello>)
       1005c:       220b            movs    r2, #11
       1005e:       4778            bx      pc
       10060:       e3b07004        movs    r7, #4
       10064:       ef000000        svc     0x00000000
       10068:       e3b07001        movs    r7, #1
       1006c:       ef000000        svc     0x00000000

    00010070 <hello>:
       10070:       6c6c6548        .word   0x6c6c6548
       10074:       7361206f        .word   0x7361206f
       10078:       000a216d        .word   0x000a216d
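The alignment rule for BX PC in Thumb state can be sketched the same way. This is a simplified model, not a statement of what any particular core does in the unpredictable case; thumb_bx_pc is a hypothetical helper name.

```python
def thumb_bx_pc(bx_addr):
    """Model `bx pc` executed in Thumb state.

    Returns (arm_target, predictable). In Thumb state PC reads as the
    instruction address + 4. Bit 0 is clear, so the branch switches to
    ARM state; if the value is not word aligned the architecture calls
    the result unpredictable. The target returned for that case is the
    word-aligned address that some implementations appear to use.
    """
    pc_value = bx_addr + 4
    predictable = (pc_value & 3) == 0
    return pc_value & ~3, predictable


if __name__ == "__main__":
    for addr in (0x1005C, 0x1005E):
        target, ok = thumb_bx_pc(addr)
        print(hex(addr), hex(target), ok)
```

In the listing above the BX sits at 0x1005e, the second halfword of its word, so it falls in the unpredictable case even though the ARM code at 0x10060 is where a working implementation resumes. A BX at 0x1005c would reach the same ARM address predictably and skip the halfword at 0x1005e.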
Simon Atanasyan via llvm-dev
2016-Nov-29 21:18 UTC
[llvm-dev] [LLD] Writing thunks before the corresponding section
Hi,

Sorry for the delay in replying. It looks like thunks can now be implemented as synthetic sections. That gives us a flexible solution: we will be able to put thunks before or after the related sections, use different alignments, etc. As far as I know, the BFD linker uses the same approach, at least for MIPS thunks.

I will try to implement this idea.

--
Simon Atanasyan
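A rough Python sketch of why separate synthetic sections give this flexibility: each thunk section becomes an ordinary entry in the output order with its own size and alignment. The layout function, the tuple shape, and the section names are assumptions made for illustration and do not reflect LLD's data structures.

```python
def align_up(value, alignment):
    # Round value up to the next multiple of a power-of-two alignment.
    return (value + alignment - 1) & ~(alignment - 1)


def layout(sections, base=0):
    """Assign addresses to (name, size, alignment) tuples in output order."""
    addr = base
    placed = {}
    for name, size, alignment in sections:
        addr = align_up(addr, alignment)
        placed[name] = addr
        addr += size
    return placed


def abuts(placed, sizes, first, second):
    # A fall-through thunk only works if `first` ends exactly where `second` starts.
    return placed[first] + sizes[first] == placed[second]


if __name__ == "__main__":
    order = [("S", 8, 4), ("A", 64, 8)]      # thunk section S placed before A
    sizes = {name: size for name, size, _ in order}
    placed = layout(order, base=0x1000)
    print(placed, abuts(placed, sizes, "S", "A"))
```

Moving the thunk section after A is just a change in list order. The abuts check hints at one thing a real implementation would have to watch: if A's alignment introduces a gap after S, the two-instruction fall-through form cannot be used without further care.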