Tim Northover via llvm-dev
2018-May-21 13:04 UTC
[llvm-dev] ARM64, dropping ADRP instructions, and ld.lld
On 21 May 2018 at 13:57, Bruce Hoult via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> "ADRL produces position-independent code, because the address is calculated
> relative to PC."
>
> From this, I'd expect ADRP to simply do Xd <- PC + n*4096, where n is a 20-bit
> number, just like AUIPC in RISC-V (also a 20-bit literal multiplied by 4096)
> or AUIPC in MIPS (16 bits multiplied by 65536 there).

Afraid not. It really is (PC & ~0xfff) + n * 0x1000. So it does require 12-bit (4 KiB) alignment of any code section.

Now that you mention the MIPS & RISC-V alternatives, I'm not sure why ARM actually made that choice. It obviously saves you a handful of transistors, but I can't quite believe that's all there is to it.

Cheers.

Tim.
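To make the difference concrete, here is a small Python model of the two address computations discussed above: ADRP as `(PC & ~0xfff) + n * 0x1000` versus an AUIPC-style `PC + n * 0x1000`. It is a sketch, not a CPU model: the addresses, the helper names, and the simplified unsigned split of the AUIPC low part (real RISC-V uses a signed 12-bit low half) are assumptions made for illustration. It encodes an address at "link time", then loads the image at a different offset and checks whether the computed address still follows the target.

```python
MASK64 = (1 << 64) - 1


def adrp(pc, imm):
    """ADRP as described above: (PC & ~0xfff) + imm * 0x1000."""
    return ((pc & ~0xFFF) + (imm << 12)) & MASK64


def auipc_like(pc, imm):
    """AUIPC-style: PC + imm * 0x1000, keeping the PC's low 12 bits."""
    return (pc + (imm << 12)) & MASK64


def link_adrp_pair(pc, target):
    """Link-time encoding of an ADRP + low-12-bit-offset pair."""
    page_imm = ((target & ~0xFFF) - (pc & ~0xFFF)) >> 12
    return page_imm, target & 0xFFF


def link_auipc_pair(pc, target):
    """Link-time encoding of an AUIPC-style pair (simplified unsigned low part)."""
    delta = target - pc
    return delta >> 12, delta & 0xFFF


# Illustrative link-time layout: instruction at PC, data at TARGET.
PC, TARGET = 0x10004, 0x23456
ADRP_ENC = link_adrp_pair(PC, TARGET)
AUIPC_ENC = link_auipc_pair(PC, TARGET)


def run_adrp(slide):
    """Address produced when the whole image is loaded 'slide' bytes away."""
    imm, lo12 = ADRP_ENC
    return adrp(PC + slide, imm) + lo12


def run_auipc(slide):
    hi, lo = AUIPC_ENC
    return auipc_like(PC + slide, hi) + lo


for slide in (0, 0x1000, 0xC00):
    print(hex(slide), run_adrp(slide) == TARGET + slide, run_auipc(slide) == TARGET + slide)
# 0x0 True True
# 0x1000 True True
# 0xc00 False True
```

Sliding by a multiple of 0x1000 keeps both schemes correct, but the 0xC00 slide breaks the ADRP pair: the page containing the PC no longer moves in step with the target, which is the alignment requirement described above.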
Eric Gorr via llvm-dev
2018-May-21 13:23 UTC
[llvm-dev] ARM64, dropping ADRP instructions, and ld.lld
Thank you for providing the explanation for how ADRP works...something I should have done myself.

With this explanation in hand, one other alternative I was looking at was using a linkerscript to essentially rebase the code and have ADRP instructions that would address the correct location as a result. However, I am not a linkerscript expert, so I am not sure if such a thing is even possible or would make much sense. However, it may provide a legitimate shortcut to a solution which doesn't involve adding a feature to the toolchain.
Bruce Hoult via llvm-dev
2018-May-21 13:44 UTC
[llvm-dev] ARM64, dropping ADRP instructions, and ld.lld
On Tue, May 22, 2018 at 1:04 AM, Tim Northover <t.p.northover at gmail.com> wrote:
> Afraid not. It really is (PC & ~0xfff) + n * 0x1000. So it does
> require 12-bit alignment of any code section.

Wow! My mistake. Knock me down with a leaf.

> Now that you mention the MIPS & RISC-V alternatives, I'm not sure why
> ARM actually made that choice. It obviously saves you a handful of
> transistors but I can't quite believe that's all there is to it.

I'm not quite sure how passing 12 bits through an ALU unchanged uses more transistors than inserting muxes to pass them through for some instructions and replace them with zeros for other instructions :-)

I find AArch64 inexplicable. There are some truly brilliant touches, such as the bit patterns in immediate operands for logical instructions, the pass-through/invert/negate/increment in the conditional select instruction, or the bitfield move that can extract/insert/sign-extend/truncate. But there are some things that make me think the designers operated in a complete vacuum, not aware of the brilliant bits in ARM32 or other prior art. This is one of them.

The abandonment of the mixed 16/32-bit opcodes that took Thumb2 to such dominance is another. MIPS have copied that several times, with the recently announced nanoMIPS looking pretty good (and with 16, 32 & 48-bit opcodes designed in). RISC-V was of course designed for optional variable-length 16 & 32-bit (and longer in future) opcodes from almost the beginning. All of these give x86_64-beating code density without the sequential-decode nightmares.
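The "bit patterns in immediate operands for logical instructions" mentioned above are AArch64's bitmask immediates: a 13-bit (N, immr, imms) field encodes a run of ones, rotated within an element of 2, 4, 8, 16, 32 or 64 bits and replicated across the register. The Python sketch below follows the shape of the `DecodeBitMasks` pseudocode in the Arm Architecture Reference Manual, but it is simplified: it returns only the mask used by logical instructions (`wmask`), ignores the `tmask` used by bitfield moves, and the function name and `reg_size` parameter are hypothetical.

```python
def decode_bit_masks(n, imms, immr, reg_size=64):
    """Decode an AArch64 logical-immediate (N, imms, immr) field.

    Returns the replicated immediate, or None for a reserved encoding.
    """
    # Element size comes from the highest set bit of N:NOT(imms).
    combined = (n << 6) | (~imms & 0x3F)
    if combined == 0:
        return None
    length = combined.bit_length() - 1
    if length < 1:
        return None  # reserved encoding
    esize = 1 << length
    if esize > reg_size:
        return None  # e.g. N=1 is not allowed for 32-bit operations
    levels = esize - 1
    s = imms & levels  # number of ones, minus one
    r = immr & levels  # rotate-right amount within the element
    if s == levels:
        return None  # an all-ones element is not encodable
    welem = (1 << (s + 1)) - 1
    elem = ((welem >> r) | (welem << (esize - r))) & ((1 << esize) - 1)
    # Replicate the element across the whole register.
    result = 0
    for shift in range(0, reg_size, esize):
        result |= elem << shift
    return result


print(hex(decode_bit_masks(0, 0b111100, 0)))  # 0x5555555555555555
print(hex(decode_bit_masks(1, 7, 8)))         # 0xff00000000000000
```

A single 13-bit field thus covers patterns such as alternating bits, rotated byte runs, or `0x00FF00FF00FF00FF` (N=0, imms=0b100111, immr=0), which is what makes the scheme compact.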
Peter Smith via llvm-dev
2018-May-21 13:52 UTC
[llvm-dev] ARM64, dropping ADRP instructions, and ld.lld
Hello Eric,

My understanding is that the ADRP instruction isn't supposed to be used on its own. The result of the ADRP provides a 4k-aligned address; the following instruction, such as an LDR, has an immediate offset that can reach any address within the 4k page. For example, to get the address of a global variable var with -fpic in ELF:

    adrp x0, :got:var              // relocation R_AARCH64_ADR_GOT_PAGE var
    ldr  x0, [x0, :got_lo12:var]   // relocation R_AARCH64_LD64_GOT_LO12_NC

The resulting code section is 4-byte aligned. I'm not sure where the requirement for 4k-aligned sections comes from, unless you are planning to use ADRP alone? Do you need just one instruction for the purposes of reducing code size?

Another possibility, if you don't care about code size but mustn't use ADRP, is (range permitting) to have the linker turn the ADRP into an ADR and replace the following instruction with a NOP. I think that is something you'd need to maintain downstream, though.

If you can use gcc, then that supports -mcmodel=tiny. How long it would take to implement it in LLVM would depend on how familiar you are with LLVM and how much you know of the specification of -mcmodel=tiny; on the assumption you aren't that familiar, I'd guess at an order of weeks.

Peter
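The ADRP + low-12-bit pairing described above, and the "range permitting" ADRP-to-ADR rewrite, come down to a little address arithmetic in the linker. Below is a minimal Python sketch, assuming the non-GOT relocations R_AARCH64_ADR_PREL_PG_HI21 and R_AARCH64_ADD_ABS_LO12_NC (the GOT variants in the example above apply the same arithmetic to the address of the symbol's GOT entry instead). The function names are hypothetical, and nothing here produces actual instruction encodings.

```python
def page(addr):
    """Clear the low 12 bits: the 4 KiB page containing addr."""
    return addr & ~0xFFF


def resolve_adrp_lo12(place, target):
    """Values a linker would compute for an ADRP / ADD-lo12 pair.

    ADRP (R_AARCH64_ADR_PREL_PG_HI21): Page(S+A) - Page(P) in 4 KiB units,
    which must fit a signed 21-bit field (about +/-4 GiB).
    ADD  (R_AARCH64_ADD_ABS_LO12_NC): low 12 bits of S+A, no overflow check.
    """
    page_delta = (page(target) - page(place)) >> 12
    if not -(1 << 20) <= page_delta < (1 << 20):
        raise OverflowError("ADRP target out of +/-4 GiB range")
    return page_delta, target & 0xFFF


def can_relax_to_adr(adrp_place, target):
    """ADR encodes a signed 21-bit byte offset (+/-1 MiB) from its own address."""
    offset = target - adrp_place
    return -(1 << 20) <= offset < (1 << 20)


imm, lo12 = resolve_adrp_lo12(0x10004, 0x23456)
print(hex(imm), hex(lo12))                         # 0x13 0x456
print(can_relax_to_adr(0x10000, 0x10000 + 2**20))  # False
```

A target 1 MiB or more away still resolves through ADRP but fails the ADR check, which is why the ADR rewrite only works when the target is close enough.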
Tim Northover via llvm-dev
2018-May-21 13:53 UTC
[llvm-dev] ARM64, dropping ADRP instructions, and ld.lld
On 21 May 2018 at 14:23, Eric Gorr <ericgorr at gmail.com> wrote:
> I don't care about Mach-O at all. ELF is sufficient.

Ah, there are definitely no linker-optimization hints for ELF. The compiler doesn't even emit the data that the linker would need.

> As an educated opinion, how difficult might something like this be? minutes? hours? days? weeks? months?

Probably a few hours on the compiler side for me (~1 hour plumbing "tiny" through as a valid option, ~1-2 hours implementing it in AArch64, plus time compiling etc.). It's actually a pretty simple change to make as these things go; thread-local storage is likely to be the trickiest bit. That's assuming the linker can cope with the new relocations, which looks plausible from a quick grep but is not a foregone conclusion.

> With this explanation in hand, one other alternative I was looking at was
> using a linkerscript to essentially rebase the code and have ADRP
> instructions that would address the correct location as a result.

You mean provide the explicit (misaligned) address you intend to load the binary at and get the linker to fix things up? Theoretically it would have sufficient information, but I don't know how you'd convince it not to align pages.
Peter Smith via llvm-dev
2018-May-21 14:34 UTC
[llvm-dev] ARM64, dropping ADRP instructions, and ld.lld
Hello Eric,

If you do decide to investigate the linker script route, the ALIGN builtin function might be useful. I think the simplest way is to do something like:

    SECTIONS
    {
      .text ALIGN(0x1000) : { *(.text) }
      .my_next_section ALIGN(0x1000) : { *(my_next_section) }
    }

Both .text and .my_next_section would start at 4k boundaries.

Link to docs: https://sourceware.org/binutils/docs/ld/Builtin-Functions.html#Builtin-Functions

Peter
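ld's `ALIGN(align)` builtin returns the location counter rounded up to the next multiple of `align`, so the script above places each section on the next 4 KiB boundary. Here is a small Python model of that layout step. The section sizes and the 0x400000 start address are made up for illustration, and the model ignores everything else a real linker does (input-section alignment, program headers, and so on).

```python
def align_up(addr, align):
    """Round addr up to the next multiple of align (a power of two),
    mirroring what ALIGN(align) does to the location counter."""
    assert align > 0 and align & (align - 1) == 0, "align must be a power of two"
    return (addr + align - 1) & ~(align - 1)


# Hypothetical section sizes, laid out in order starting at 0x400000.
sizes = {".text": 0x2345, ".my_next_section": 0x10}
dot = 0x400000
for name, size in sizes.items():
    start = align_up(dot, 0x1000)  # ".name ALIGN(0x1000) : { ... }"
    print(name, hex(start))
    dot = start + size
# .text 0x400000
# .my_next_section 0x403000
```

An address that is already on a boundary is left unchanged, so ALIGN only inserts padding when the previous section ends mid-page.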