Apologies - I apparently remembered part of the issue incorrectly, so this
ended up quite confusing. The problem comes when referencing labels in a
different section of the binary. To clarify, if I assemble the code:

    .data
    foo BYTE 5
    .code
    mov eax, foo

with Microsoft's ml64.exe, it emits an object file disassembling to:

    0: 8b 05 00 00 00 00 mov eax, dword ptr [rip]
    000000000000000b: IMAGE_REL_AMD64_REL32 foo

On the other hand, if I use my current local draft of llvm-ml, I get a
different result. I actually get the same result as I do for llvm-mc, using
the corresponding code:

    .data
    foo:
    .byte 5
    .text
    .intel_syntax
    mov eax, foo

Either way, LLVM emits an object file with disassembly (and relocation) as
follows:

    0: 8b 04 25 00 00 00 00 mov eax, dword ptr [0]
    0000000000000003: IMAGE_REL_AMD64_ADDR32 foo

To replicate the results from ml64.exe with LLVM, I instead need to use

    mov eax, [foo + rip]

in place of mov eax, foo. At least when building with llvm-ml, we need to
mimic ml64.exe's approach; a reference to a symbol in another section should
use the RIP-relative addressing mode.
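
For readers comparing the two byte sequences, here is a minimal Python sketch
that decodes the ModRM/SIB bytes of each encoding and reports the addressing
mode and where the 32-bit displacement (the field the relocation points at)
starts. It assumes a one-byte opcode with no prefixes and handles only the two
register-free disp32 forms shown in this message; the function name is
hypothetical and is not part of LLVM.

```python
def classify_disp32_operand(code: bytes):
    """Return (addressing_mode, disp32_offset) for a one-byte-opcode instruction.

    Assumes 64-bit mode, no prefixes, and a memory operand with no registers.
    """
    modrm = code[1]
    mod, rm = modrm >> 6, modrm & 0b111
    if mod == 0b00 and rm == 0b101:
        # In 64-bit mode, mod=00 rm=101 means [rip + disp32].
        return ("rip-relative", 2)
    if mod == 0b00 and rm == 0b100:
        # rm=100 means a SIB byte follows the ModRM byte.
        sib = code[2]
        index, base = (sib >> 3) & 0b111, sib & 0b111
        if index == 0b100 and base == 0b101:
            # No index, no base: a plain absolute [disp32].
            return ("absolute", 3)
    raise ValueError("not a register-free disp32 operand")


ML64_BYTES = bytes.fromhex("8b0500000000")    # mov eax, dword ptr [rip]
LLVM_BYTES = bytes.fromhex("8b042500000000")  # mov eax, dword ptr [0]

print(classify_disp32_operand(ML64_BYTES))
print(classify_disp32_operand(LLVM_BYTES))
```

The displacement offsets (2 and 3) line up with the relocation offsets 2 and 3
reported elsewhere in this thread.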

My first attempt to fix this was very clumsy - when in MASM mode, I forced
all expressions without a base register to presume RIP. Unfortunately, that
breaks any attempt to use "jcc", since it turns label references into
memory references with a base register (and the "jcc" family accepts only a
relative-offset label, not a memory operand). Any suggestions for how I can
fix the issue described here without breaking "jcc"?
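
One possible direction, sketched in Python rather than in LLVM's C++: record
the implied RIP base separately from any base register the user wrote, so that
branch-target matching still sees a bare label while memory-operand encoding
can fall back to RIP. This is only an illustration of the idea under that
assumption; every name below is hypothetical, and it does not describe how
X86AsmParser.cpp is actually structured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Operand:
    symbol: str
    base: Optional[str] = None          # base register written in the source
    default_base: Optional[str] = None  # base implied by the assembler mode


def parse_operand(symbol: str, explicit_base: Optional[str] = None,
                  masm64: bool = True) -> Operand:
    # Only attach the implied RIP base when the user gave no base register.
    implied = "rip" if masm64 and explicit_base is None else None
    return Operand(symbol, explicit_base, implied)


def matches_branch_target(op: Operand) -> bool:
    # A jcc/jmp/call rel32 operand must be a bare label; the implied base is
    # ignored here, so "jcc label" keeps matching.
    return op.base is None


def effective_base(op: Operand) -> Optional[str]:
    # Consulted only when the operand is encoded as a memory reference.
    return op.base or op.default_base


label = parse_operand("foo")
print(matches_branch_target(label), effective_base(label))
```

With this split, "mov eax, foo" would be encoded with a RIP base, while
"jcc foo" would still be treated as a plain relative branch target.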

On Tue, Jan 21, 2020 at 3:43 PM Eli Friedman <efriedma at quicinc.com> wrote:
> All immediate jump instructions on x86 (call/jmp/jcc) have a relative
> offset operand. The destination is, in some sense, “rip-relative”, but we
> don’t represent it like that in LLVM. If you look at the TableGen
> descriptions, jumps use brtarget32, and calls use i32imm_pcrel. In both
> Microsoft and GNU assembly syntax, this is something like “call baz”.
>
> “call”/“jmp” also have a register/memory form, for indirect calls. In
> 64-bit, this allows rip-relative references, to call a function pointer
> stored in a global variable. In Microsoft assembly syntax, this is “call
> QWORD PTR baz”. In GNU assembly syntax, this is “call *baz(%rip)”.
>
> For 64-bit x86, any reference to a global has to be a rip-relative address
> (since all 64-bit programs are position-independent), but on 32-bit x86,
> it’s also possible to refer to the address of a variable using something
> like “add eax, OFFSET baz”.
>
> For globals which are explicitly labeled “PTR” or “OFFSET”, the correct
> representation should be unambiguous, and it should be easy to print
> appropriate error messages. For other cases, I’m not sure what the
> inference rules are. It might vary depending on the opcode.
>
> -Eli
>
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Eric
> Astor via llvm-dev
> Sent: Monday, January 20, 2020 6:26 PM
> To: LLVM-dev <llvm-dev at lists.llvm.org>
> Subject: [EXT] [llvm-dev] MASM & RIP-relative addressing
>
> Hi all,
>
> Continuing work on llvm-ml (a MASM assembler)... and my latest obstacle is
> in enabling MASM's convention that (unless specified) all memory location
> references should be RIP-relative. Without it, we emit the wrong
> instructions for "call", "jmp", etc., and anything we build fails at the
> linking stage.
>
> My best attempt at this so far is a small patch to X86AsmParser.cpp - just
> taking any Intel expression with no specified base register and switching
> it to use RIP - and this works alright. There's at least one exception: it
> breaks the "jcc" instructions, at least "jcc <label>". The issue seems to
> be that the "jcc" family exclusively takes a relative offset, never an
> absolute reference... so adding a base register causes the operand not to
> match. ("jcc" is always RIP-relative anyway.)
>
> I'm not very familiar with the operand-matching logic, and am still pretty
> new to LLVM as a whole. Are there more X86 instructions this will interact
> badly with? Any thoughts on how this could be handled better?
>
> If this is mostly a valid approach, might there be a way to change the
> operand type of "jcc" to accept offset(base) operands, as long as
> base == X86::RIP, then ignore the RIP bit?
>
> Thanks,
>
> - Eric

Clarifying a minor copy/paste error, ml64.exe actually outputs:

    0: 8b 05 00 00 00 00 mov eax, dword ptr [rip]
    0000000000000002: IMAGE_REL_AMD64_REL32 foo

In other words, the relocation info is the same... but the instruction uses
RIP-relative addressing, not absolute.
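
To make the difference between the two relocation types concrete, the sketch
below works through the arithmetic a linker would perform for each one. All
addresses are hypothetical and chosen only for illustration, and the function
names are not taken from any real linker. It assumes the in-place addend is
zero, as in the disassembly in this thread.

```python
def resolve_rel32(target_va: int, reloc_va: int, addend: int = 0) -> int:
    # IMAGE_REL_AMD64_REL32: distance from the byte following the 4-byte
    # relocated field to the target.
    value = target_va + addend - (reloc_va + 4)
    if not -2**31 <= value < 2**31:
        raise OverflowError("REL32 displacement does not fit in 32 bits")
    return value


def resolve_addr32(target_va: int, addend: int = 0) -> int:
    # IMAGE_REL_AMD64_ADDR32: the target's virtual address, truncated to
    # 32 bits, so the address itself must fit in 32 bits.
    value = target_va + addend
    if not 0 <= value < 2**32:
        raise OverflowError("ADDR32 target does not fit in 32 bits")
    return value


# Hypothetical layout: the mov sits at the start of .text, foo in .data.
INSN_VA = 0x140001000
FOO_VA = 0x140003000
RELOC_VA = INSN_VA + 2  # disp32 of "8b 05 <disp32>" starts at offset 2

REL32_VALUE = resolve_rel32(FOO_VA, RELOC_VA)
print(hex(REL32_VALUE))

# At run time RIP holds the address of the next instruction (INSN_VA + 6),
# so RIP + disp32 lands exactly on foo.
assert INSN_VA + 6 + REL32_VALUE == FOO_VA
```

With the same hypothetical layout, resolve_addr32(FOO_VA) raises OverflowError
because the address is above 4 GiB. That is one plausible reason the absolute
form can fail at link time, though the thread does not state the exact linker
error.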

On Tue, Jan 21, 2020 at 5:41 PM Eric Astor <epastor at google.com> wrote:
> [...]

Are you asking what the parsing rules are, or how you should modify the LLVM
code to achieve that result?

If the latter, you haven’t really given enough detail here. What code, exactly,
have you tried modifying? Do you have any ideas for how it could work?

-Eli

From: Eric Astor <epastor at google.com>
Sent: Tuesday, January 21, 2020 2:44 PM
To: Eli Friedman <efriedma at quicinc.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: [EXT] Re: [llvm-dev] MASM & RIP-relative addressing

[...]