thr3ads.net - llvm dev - [llvm-dev] ARM64, dropping ADRP instructions, and ld.lld [May 2018]

Tim Northover via llvm-dev

2018-May-21 13:04 UTC

[llvm-dev] ARM64, dropping ADRP instructions, and ld.lld

On 21 May 2018 at 13:57, Bruce Hoult via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> "ADRL produces position-independent code, because the address is
> calculated relative to PC."
>
> From this, I'd expect ADRP to simply do Xd <- PC + n*4096, where n is a
> 20 bit number, just like AUIPC in RISC-V (also a 20-bit literal multiplied
> by 4096) or AUIPC in MIPS (16 bits multiplied by 65536 there).

Afraid not. It really is (PC & ~0xfff) + n * 0x1000. So it does
require 12-bit (4 KiB) alignment of any code section.
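
As an illustration of why the masking matters, here is a minimal Python sketch of the arithmetic. The function names (`adrp`, `resolve_after_shift`, `pc_relative_after_shift`) and all addresses are hypothetical and chosen only for the example; nothing here models a real toolchain API.

```python
PAGE = 0x1000
MASK64 = (1 << 64) - 1

def adrp(pc, n):
    """Model of ADRP: the 4 KiB page containing PC, plus n pages."""
    return ((pc & ~(PAGE - 1)) + n * PAGE) & MASK64

# Hypothetical link-time layout: an ADRP at 0x10ffc referring to 0x12340.
LINK_PC, TARGET = 0x10ffc, 0x12340
N = (TARGET >> 12) - (LINK_PC >> 12)   # page delta encoded in the ADRP
LO12 = TARGET & 0xfff                  # low 12 bits, encoded in the next instruction

def resolve_after_shift(shift):
    """Address the ADRP + lo12 pair produces if the image is loaded `shift` bytes away."""
    return adrp(LINK_PC + shift, N) + LO12

def pc_relative_after_shift(shift):
    """Same question for a plain PC + offset scheme (no masking of PC)."""
    offset = TARGET - LINK_PC
    return (LINK_PC + shift) + offset
```

Under these assumptions, moving the image by a multiple of 4 KiB keeps the ADRP pair pointing at the moved target, whereas a shift such as 0x800 does not. The unmasked PC + offset form follows the image for any shift.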

Now that you mention the MIPS & RISC-V alternatives, I'm not sure why
ARM actually made that choice. It obviously saves you a handful of
transistors but I can't quite believe that's all there is to it.

Cheers.

Tim.

Eric Gorr via llvm-dev

2018-May-21 13:23 UTC

Thank you for providing the explanation for how ADRP works...something I
should have done myself.

With this explanation in hand, one other alternative I was looking at was
using a linker script to essentially rebase the code, so that the ADRP
instructions would address the correct location as a result. I am not a
linker script expert, though, so I am not sure whether such a thing is even
possible or would make much sense. Still, it may provide a legitimate
shortcut to a solution that doesn't involve adding a feature to the
toolchain.


Bruce Hoult via llvm-dev

2018-May-21 13:44 UTC

On Tue, May 22, 2018 at 1:04 AM, Tim Northover <t.p.northover at gmail.com>
wrote:
> On 21 May 2018 at 13:57, Bruce Hoult via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > "ADRL produces position-independent code, because the address is
> > calculated relative to PC."
> >
> > From this, I'd expect ADRP to simply do Xd <- PC + n*4096, where n is a
> > 20 bit number, just like AUIPC in RISC-V (also a 20-bit literal multiplied
> > by 4096) or AUIPC in MIPS (16 bits multiplied by 65536 there).
>
> Afraid not. It really is (PC & ~0xfff) + n * 0x1000. So it does
> require 12-bit alignment of any code section.

Wow! My mistake. Knock me down with a leaf.

> Now that you mention the MIPS & RISC-V alternatives, I'm not sure why
> ARM actually made that choice. It obviously saves you a handful of
> transistors but I can't quite believe that's all there is to it.

I'm not quite sure how passing 12 bits through an ALU unchanged uses more
transistors than inserting muxes to pass them through for some instructions
and replace them with zeros for other instructions :-)

I find AArch64 inexplicable. There are some truly brilliant touches such as
the bit patterns in immediate operands for logical instructions, or the
pass-through/invert/negate/increment in the conditional select instruction,
or the bitfield move that can extract/insert/sign extend/truncate. But
there are some things that make me think the designers operated in a
complete vacuum, not aware of the brilliant bits in ARM32 or other prior
art. This is one of them. The abandonment of mixed 16/32 bit opcodes that
took Thumb2 to such dominance is another. MIPS have copied that several
times with the recently announced nanoMIPS looking pretty good (and with
16, 32 & 48 bit opcodes designed in). RISC-V was of course designed for
optional variable length 16&32 bit (and longer in future) opcodes from
almost the beginning. All of these give x86_64-beating code density without
the sequential decode nightmares.

Peter Smith via llvm-dev

2018-May-21 13:52 UTC

Hello Eric,

My understanding is that the ADRP instruction isn't supposed to be
used on its own. The result of the ADRP provides a 4k aligned address,
the following instruction such as an LDR has an immediate offset that
can reach any address within the 4k page. For example to get the
address of a global variable var with -fpic in ELF:

    adrp x0, :got:var              // relocation R_AARCH64_ADR_GOT_PAGE var
    ldr  x0, [x0, :got_lo12:var]   // relocation R_AARCH64_LD64_GOT_LO12_NC

The resulting code section is 4-byte aligned, so I'm not sure where the
requirement for 4k-aligned sections comes from, unless you are planning
to use ADRP alone. Do you need just one instruction for the purposes
of reducing code size? Another possibility if you don't care about
code-size but mustn't use ADRP is (range permitting) to have the
linker turn an ADRP to ADR and replace the following instruction with
a NOP. I think that is something you'd need to maintain downstream
though.
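
A rough Python sketch of the arithmetic behind both ideas follows: how the page delta and low 12 bits for an ADRP pair are derived, and the range check a linker would need before rewriting ADRP to ADR. The helper names (`adrp_fields`, `adr_offset`) are hypothetical and do not correspond to anything in ld.lld; `place` is the address of the instruction and `target` the address being formed (for the GOT sequence above, that would be the GOT slot).

```python
PAGE_MASK = 0xfff

def page(addr):
    """Start of the 4 KiB page containing addr."""
    return addr & ~PAGE_MASK

def adrp_fields(place, target):
    """Return (page_delta, lo12) for an ADRP at `place` plus a lo12 instruction."""
    delta = (page(target) - page(place)) >> 12
    # ADRP holds a signed 21-bit page count, i.e. roughly +/- 4 GiB.
    if not -(1 << 20) <= delta < (1 << 20):
        raise ValueError("target out of ADRP range")
    return delta, target & PAGE_MASK

def adr_offset(place, target):
    """Byte offset for a single ADR, or None if outside its +/- 1 MiB range."""
    off = target - place
    # ADR holds a signed 21-bit byte offset added directly to PC.
    return off if -(1 << 20) <= off < (1 << 20) else None
```

Because ADR adds its offset to the unmasked PC, a sequence rewritten this way does not depend on where the code sits within a 4 KiB page, which is the property the "range permitting" caveat trades against.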

If you can use gcc then that supports -mcmodel=tiny. How long it would
take to implement it in LLVM would depend on how familiar you are
with LLVM and how much you know of the specification of -mcmodel=tiny;
assuming you aren't that familiar, I'd guess on the order of
weeks.

Peter


Tim Northover via llvm-dev

2018-May-21 13:53 UTC

On 21 May 2018 at 14:23, Eric Gorr <ericgorr at gmail.com> wrote:
> I don't care about Mach-O at all. ELF is sufficient.

Ah, there are definitely no linker-optimization hints for ELF. The
compiler doesn't even emit the data that the linker would need.

> As an educated opinion, how difficult might something like this be?
> minutes? hours? days? weeks? months?

Probably a few hours on the compiler side for me (~1 plumbing "tiny"
through as a valid option, ~1-2 implementing it in AArch64, + time
compiling etc). It's actually a pretty simple change to make as these
things go; thread-local storage is likely to be the trickiest bit.

That's assuming the linker can cope with the new relocations, which
looks plausible from a quick grep but is not a foregone conclusion.

> With this explanation in hand, one other alternative I was looking at was
> using a linkerscript to essentially rebase the code and have ADRP
> instructions that would address the correct location as a result.

You mean provide the explicit (misaligned) address you intend to load
the binary at and get the linker to fix things up? Theoretically it
would have sufficient information, but I don't know how you'd convince
it not to align pages.

Peter Smith via llvm-dev

2018-May-21 14:34 UTC

Hello Eric,

If you do decide to investigate the linker script route, the ALIGN
builtin function might be useful. I think the simplest way is to do
something like:

    .text ALIGN(0x1000) : { *(.text) }
    .my_next_section ALIGN(0x1000) : { *(my_next_section) }

Both .text and .my_next_section would start at 4k boundaries.
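
For intuition only, the effect of ALIGN(0x1000) on section start addresses can be sketched in Python. This is an assumption-laden model, not how any linker is implemented; `align_up` and `layout`, the starting address, and the section sizes are all made up for the example.

```python
def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)

def layout(start, sizes, alignment=0x1000):
    """Place sections back to back, each starting on an aligned boundary."""
    addrs = []
    cursor = start
    for size in sizes:
        cursor = align_up(cursor, alignment)  # what ALIGN(0x1000) does to the start
        addrs.append(cursor)
        cursor += size                        # next section begins after this one
    return addrs
```

With a hypothetical location counter of 0x10010 and section sizes of 0x1234 and 0x80 bytes, `layout(0x10010, [0x1234, 0x80])` places the two sections on successive 4 KiB boundaries, leaving padding between them.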

Link to docs:
https://sourceware.org/binutils/docs/ld/Builtin-Functions.html#Builtin-Functions

Peter

