thr3ads.net - llvm dev - [llvm-dev] llvm-mc assembler, GNU as, and pc-relative branches for Arm/AArch64/Mips [Jan 2018]

Alex Bradbury via llvm-dev

2018-Jan-10 10:48 UTC

[llvm-dev] llvm-mc assembler, GNU as, and pc-relative branches for Arm/AArch64/Mips

# Summary

As a consequence of comparing the RISC-V LLVM MC assembler to the RISC-V GNU
assembler I've noticed that a number of targets have quite different handling
for pc-relative jumps/branches with immediate integer operands in llvm-mc vs
GNU as. I'll admit that this isn't likely to occur in hand-written code (as
you'd almost always prefer to use a label), but thought it was worth slightly
wider discussion. See below for full details, but really it boils down to
whether you treat an immediate operand to a pc-relative branch as an absolute
target or a pc-relative offset to be directly encoded in the instruction.

1) Is this an intentional difference in behaviour and just something assembly
authors should live with?
2) If not, is there any interest in resolving it? Obviously I can file bugs on
bugzilla.
3) Is anyone interested in collaborating on better automated tooling for
comparing the LLVM MC assembler and GNU as? Or even better, does anyone already
have tooling to help with this that might be open sourced? Automatically
finding problems such as the assembly parsing issue described in Google's
recent LLVMDevMeeting keynote <https://youtu.be/6l4DtR5exwo?t=10m44s> would be
great.

Please note: it's possible some of the differences I'm seeing are due to
different default ASM variants or default target options across tools - do let
me know if it seems that's the case.
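
To make the two readings concrete, here is a minimal Python sketch. It is not taken from either assembler: the function names and the `pc_bias` parameter are illustrative assumptions, where `pc_bias` stands for whatever the architecture adds to the branch's own address before applying the offset (8 for A32 Arm, 4 for Mips, 0 for AArch64).

```python
def target_if_offset(insn_addr: int, imm: int, pc_bias: int) -> int:
    """The operand is encoded directly as the pc-relative offset."""
    return insn_addr + pc_bias + imm


def target_if_absolute(insn_addr: int, imm: int, pc_bias: int) -> int:
    """The operand names the destination; the branch's location is irrelevant."""
    return imm


# `beq 128` placed at address 0x8 in an A32 Arm section (pc_bias = 8)
print(hex(target_if_offset(0x8, 128, 8)))
print(hex(target_if_absolute(0x8, 128, 8)))
```

Under the first reading the destination moves with the instruction; under the second it stays fixed, which is why a relocation is needed whenever the final address of the branch is not yet known.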

# Comparing Mips behaviour

    $ cat test-mips.s
    lab:
    beq $6, $7, 128
    bne $4, $5, 64
    beq $6, $7, 128
    bne $4, $5, 64

Assemble with llvm-mc: `llvm-mc -triple=mipsel-unknown-linux test-mips.s
-filetype=obj > foo.o` and then disassemble with `llvm-objdump -d -r`:

    foo.o:  file format ELF32-mips

    Disassembly of section .text:
    lab:
           0: 20 00 c7 10   beq $6, $7, 132 <lab+0x84>
           4: 00 00 00 00   nop
           8: 10 00 85 14   bne $4, $5, 68 <lab+0x4c>
           c: 00 00 00 00   nop
          10: 20 00 c7 10   beq $6, $7, 132 <lab+0x94>
          14: 00 00 00 00   nop
          18: 10 00 85 14   bne $4, $5, 68 <lab+0x5c>
          1c: 00 00 00 00   nop

We can see that no relocations are generated, the immediate offsets for the
beq and bne pairs remain identical, and are interpreted as a PC-relative
offset.
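
As a cross-check, the targets printed above can be recomputed from the raw bytes. The sketch below assumes the standard MIPS32 branch encoding (a signed 16-bit word offset in the low half of the instruction, applied relative to the delay-slot address); the helper names are made up for illustration.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mips_branch_target(insn_addr: int, insn: int) -> int:
    """Target of a MIPS32 conditional branch located at `insn_addr`."""
    offset = sign_extend(insn & 0xFFFF, 16) << 2  # word offset -> byte offset
    return insn_addr + 4 + offset                 # relative to the delay slot


# llvm-objdump lists the bytes in memory order, little-endian for mipsel
beq = int.from_bytes(bytes.fromhex("2000c710"), "little")
bne = int.from_bytes(bytes.fromhex("10008514"), "little")

for addr, insn in [(0x00, beq), (0x08, bne), (0x10, beq), (0x18, bne)]:
    print(f"{addr:#04x}: {insn:08x} -> lab+{mips_branch_target(addr, insn):#x}")
```

The same encoded offset yields a different target at each address, which is what a pc-relative interpretation of the operand implies.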

Assembling the same input with GNU as (no arguments), then dumping with GNU
objdump (from the Mips 2016.05-03 precompiled SDK):

    a.out:     file format elf32-tradbigmips


    Disassembly of section .text:

    00000000 <lab>:
       0: 10c70020  beq a2,a3,84 <lab+0x84>
          0: R_MIPS_PC16  *ABS*
       4: 00000000  nop
       8: 14850010  bne a0,a1,4c <lab+0x4c>
          8: R_MIPS_PC16  *ABS*
       c: 00000000  nop
      10: 10c70020  beq a2,a3,94 <lab+0x94>
          10: R_MIPS_PC16 *ABS*
      14: 00000000  nop
      18: 14850010  bne a0,a1,5c <lab+0x5c>
          18: R_MIPS_PC16 *ABS*
      1c: 00000000  nop

We note that the encoded instructions are identical and the pretty-printed
target matches LLVM. However the printed immediate is changed across the
beq/beq and bne/bne pairs so it matches the absolute target.


# Comparing Arm behaviour

    $ cat test-arm.s
    lab:
    beq 128
    bne 64
    beq 128
    bne 64

Assemble with llvm-mc: `llvm-mc -triple=armv7-unknown-none test-arm.s
-filetype=obj > foo.o` and then disassemble with `llvm-objdump -d -r`:

    foo.o:  file format ELF32-arm-little

    Disassembly of section .text:
    lab:
           0: 20 00 00 0a   beq #128 <lab+0x88>
           4: 10 00 00 1a   bne #64 <lab+0x4c>
           8: 20 00 00 0a   beq #128 <lab+0x90>
           c: 10 00 00 1a   bne #64 <lab+0x54>

No relocations are produced and the immediate argument is clearly interpreted
as a pc-relative offset.
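
The annotated targets can be reproduced by decoding the words by hand. This is a rough sketch assuming the A32 `B<cond>` encoding (signed 24-bit word offset, applied to the instruction address plus 8); `arm_branch_target` is a name invented for the example.

```python
def arm_branch_target(insn_addr: int, insn: int) -> int:
    """Target of an A32 B<cond> instruction located at `insn_addr`."""
    imm24 = insn & 0x00FFFFFF
    if imm24 & 0x00800000:          # sign-extend the 24-bit field
        imm24 -= 1 << 24
    return insn_addr + 8 + (imm24 << 2)  # PC reads as insn_addr + 8 in A32


# Words from the llvm-objdump listing (bytes shown there are little-endian)
words = [0x0A000020, 0x1A000010, 0x0A000020, 0x1A000010]

for index, insn in enumerate(words):
    addr = 4 * index
    print(f"{addr:#x}: {insn:08x} -> lab+{arm_branch_target(addr, insn):#x}")
```

The field holds 128 and 64 bytes exactly as written in the source, so llvm-mc has encoded the operand as the offset itself.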

Assembling and objdumping the same program with the
gcc-arm-none-eabi-7-2017-q4-major toolchain (no arguments to as):

    a.out:     file format elf32-littlearm


    Disassembly of section .text:

    00000000 <lab>:
       0: 0afffffe  beq 80 <*ABS*0x80>
          0: R_ARM_JUMP24 *ABS*0x80
       4: 1afffffe  bne 40 <*ABS*0x40>
          4: R_ARM_JUMP24 *ABS*0x40
       8: 0afffffe  beq 80 <*ABS*0x80>
          8: R_ARM_JUMP24 *ABS*0x80
       c: 1afffffe  bne 40 <*ABS*0x40>
          c: R_ARM_JUMP24 *ABS*0x40

In this case, relocations are generated and the argument appears to be
interpreted as an absolute target. It's worth noting that `beq #128`
and so on aren't recognised by the GNU assembler, but I might be
missing an option that enables that syntax?
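
The `0xfffffe` in each word is the in-place addend of the relocation, not a real offset. The following sketch assumes the usual REL convention for `R_ARM_JUMP24` (result = S + A - P, ignoring the Thumb bit) and shows that the branch lands on 0x80 wherever the instruction ends up. The function names and the 0x8000 address are hypothetical.

```python
def sign_extend24(field: int) -> int:
    """Sign-extend a 24-bit instruction field."""
    field &= 0xFFFFFF
    return field - (1 << 24) if field & 0x800000 else field


def apply_jump24(insn: int, sym_value: int, place: int) -> int:
    """Patch an A32 branch the way a linker would for R_ARM_JUMP24 (REL)."""
    addend = sign_extend24(insn) << 2       # -8 for ...fffffe, cancels PC + 8
    offset = sym_value + addend - place     # S + A - P
    return (insn & 0xFF000000) | ((offset >> 2) & 0xFFFFFF)


def branch_target(place: int, insn: int) -> int:
    """Where the patched branch actually goes when executed at `place`."""
    return place + 8 + (sign_extend24(insn) << 2)


for place in (0x0, 0x8, 0x8000):            # 0x8000 is a made-up load address
    linked = apply_jump24(0x0AFFFFFE, 0x80, place)
    print(f"{place:#x}: {linked:08x} -> {branch_target(place, linked):#x}")
```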

# Comparing AArch64 behaviour

    $ cat test-arm.s
    lab:
    beq 128
    bne 64
    beq 128
    bne 64

Assemble with llvm-mc: `llvm-mc -triple=aarch64-unknown-none test-arm.s
-filetype=obj > foo.o` and then disassemble with `llvm-objdump -d -r`:

    foo.o:  file format ELF64-aarch64-little

    Disassembly of section .text:
    lab:
           0: 00 04 00 54   b.eq  #128
           4: 01 02 00 54   b.ne  #64
           8: 00 04 00 54   b.eq  #128
           c: 01 02 00 54   b.ne  #64

No relocations are produced, and because the pairs of b.eq and b.ne have
identical encoding we can conclude the immediate argument is interpreted as a
pc-relative offset.

Assembling (no arguments to as) and objdumping the same input with the Linaro
gcc-linaro-7.2.1-2017.11-i686-aarch64-elf toolchain gives:

    a.out:     file format elf64-littleaarch64


    Disassembly of section .text:

    0000000000000000 <lab>:
       0: 54000400  b.eq  80 <lab+0x80>  // b.none
       4: 54000201  b.ne  44 <lab+0x44>  // b.any
       8: 54000400  b.eq  88 <lab+0x88>  // b.none
       c: 54000201  b.ne  4c <lab+0x4c>  // b.any

This seems to match the LLVM interpretation, other than different choices
about printing immediates in hex vs decimal.
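
For completeness, here is a small sketch that decodes the two AArch64 words, assuming the A64 `B.cond` layout (signed 19-bit word offset in bits 23:5, condition in bits 3:0, no pc bias). The helper name and the two-entry condition table are illustrative only.

```python
def a64_bcond_target(insn_addr: int, insn: int) -> int:
    """Target of an A64 B.cond instruction located at `insn_addr`."""
    imm19 = (insn >> 5) & 0x7FFFF
    if imm19 & 0x40000:             # sign-extend the 19-bit field
        imm19 -= 1 << 19
    return insn_addr + (imm19 << 2)  # offset is relative to the branch itself


COND = {0: "eq", 1: "ne"}           # only the conditions used in this test

for addr, insn in [(0x0, 0x54000400), (0x4, 0x54000201),
                   (0x8, 0x54000400), (0xC, 0x54000201)]:
    name = COND[insn & 0xF]
    print(f"{addr:#x}: b.{name} -> lab+{a64_bcond_target(addr, insn):#x}")
```

The computed targets agree with the ones GNU objdump annotates, so both tools encode the operand as an offset here.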


Thanks,

Alex

Peter Smith via llvm-dev

2018-Jan-10 13:55 UTC

Hello Alex,

> 1) Is this an intentional difference in behaviour and just something assembly
> authors should live with?

I can't speak for the intent of the original author, although in Arm
and AArch64 a branch to a number is not well specified and it is
pretty much implementation defined how an assembler copes with it. For
example, the Arm ARM section "Use of labels in UAL instruction syntax"
says: "B, BL, and BLX (immediate). The assembler syntax for these
instructions always specifies the label of the instruction that they
branch to". The original Arm proprietary assembler armasm creates an
absolute symbol with a value of the immediate and gas may have decided
to follow suit.

> 2) If not, is there any interest in resolving it? Obviously I can file bugs on
> bugzilla.

For Arm/AArch64, practically speaking I've not seen a branch to a number
used outside of an assembler test suite. In embedded systems it is
common to branch to an address in some ROM/Flash that is defined outside
of the current program. The usual way to handle this in the toolchain is
to branch to an external symbol that can be defined as absolute in a
different file or via some link-time mechanism. This was more
maintainable than branching directly to the address in the assembler
code and worked with C code as well.

My personal opinion is that matching gas is desirable, but in this
case fixing it is probably more trouble than it is worth. Outside of
some horrific program that uses macros to construct the addresses at
assembly time, I'd think that it would be simple to rewrite the
portion of the program to use a symbol. It is worth noting that
although Arm and AArch64 have similar branch instructions (all
immediates are offsets), the GNU assembler has chosen to handle the
cases differently.

> 3) Is anyone interested in collaborating on better automated tooling for
> comparing the LLVM MC assembler and GNU as? Or even better, already have
> tooling help with this that might be open sourced? Automatically finding
> problems such as the assembly parsing issue described in Google's recent
> LLVMDevMeeting keynote <https://youtu.be/6l4DtR5exwo?t=10m44s> would be
> great.

I'd be interested in seeing it, although I don't know whether I'll be
able to dedicate a lot of time to it.

Peter

Rafael Avila de Espindola via llvm-dev

2018-Jan-10 17:45 UTC

Alex Bradbury via llvm-dev <llvm-dev at lists.llvm.org> writes:
> # Summary
>
> As a consequence of comparing the RISC-V LLVM MC assembler to the RISC-V GNU
> assembler I've noticed that a number of targets have quite different handling
> for pc-relative jumps/branches with immediate integer operands in llvm-mc vs
> GNU as. I'll admit that this isn't likely to occur in hand-written code (as
> you'd almost always prefer to use a label), but thought it was worth slightly
> wider discussion. See below for full details, but really it boils down to
> whether you treat an immediate offset to a pc-relative branch as an absolute
> target or a pc-relative offset to be directly encoded in the instruction.

It sounds like a bug. Note that for X86_64 we produce relocations like
gas:

        jmp 0x123

produces

000000000001  000000000002 R_X86_64_PC32                        11f
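
The addend of 0x11f rather than 0x123 follows from how `jmp rel32` executes. Here is a minimal sketch, assuming the standard `R_X86_64_PC32` formula (S + A - P) with a symbol value of 0, and a 4-byte displacement that the CPU adds to the address of the next instruction. The function names and section base addresses are hypothetical.

```python
def pc32_field(sym_value: int, addend: int, place: int) -> int:
    """Value a linker stores for R_X86_64_PC32: S + A - P, truncated to 32 bits."""
    return (sym_value + addend - place) & 0xFFFFFFFF


def jmp_rel32_target(place: int, field: int) -> int:
    """Destination of `jmp rel32` whose displacement field sits at `place`."""
    disp = field - (1 << 32) if field & 0x80000000 else field
    return place + 4 + disp         # relative to the end of the 4-byte field


for section_base in (0x0, 0x1000, 0x400000):   # made-up section addresses
    place = section_base + 1                   # relocation offset is 1 (after 0xE9)
    field = pc32_field(0, 0x11F, place)
    print(f"{section_base:#x}: jmp -> {jmp_rel32_target(place, field):#x}")
```

Because the addend is the requested address minus 4, the jump resolves to 0x123 regardless of where the section is placed, i.e. the operand is treated as an absolute target.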

Cheers,
Rafael

Daniel Sanders via llvm-dev

2018-Jan-10 18:23 UTC

On the Mips side of things, I expect this difference was missed because
there's little reason to use immediates in branches. The Mips assembler has
relative labels that can be redefined and don't create entries in the symbol
table:

    1:
        ...
        beq $6, $7, 1f  # The next label 1
        beq $6, $7, 1b  # The previous label 1
        ...
    1:

Generally speaking, I think LLVM's assembler output should match GNU's
but I'm not sure this particular case occurs in real code.