thr3ads.net - llvm dev - [LLVMdev] R_ARM_ABS32 disassembly with integrated-as [Oct 2012]

Jim Grosbach

2012-Oct-05 16:48 UTC

[LLVMdev] R_ARM_ABS32 disassembly with integrated-as

On Oct 5, 2012, at 12:15 AM, Tim Northover <t.p.northover at gmail.com> wrote:
> Hi Greg,
> 
>> Is this a bug?  If so, how can I fix it?
> 
> It's somewhere between a bug and a quality-of-implementation issue.
> ARM often uses literal pools in the middle of code when it needs to
> materialize a large constant (or variable address more likely for
> R_ARM_ABS32). This results in a sequence roughly like:
> 
>    ldr r0, special_lit_sym
>    [...]
>    b past_literals
> special_lit_sym:
>    .word variable_desired
> past_literals:
>    [...instructions...]
> 
> In general, deciding whether to disassemble a given location as code
> or data is a very hard problem (think of all the evil tricks you could
> play with dual-purpose), so the ARM ELF ABI
> (http://infocenter.arm.com/help/topic/com.arm.../IHI0044D_aaelf.pdf)
> specifies something called mapping symbols, which assemblers should
> insert to tell disassemblers what's actually needed.
> 
> The idea is that a $a should be inserted at the start of each section
> of ARM code, $t before Thumb and $d before data (including these
> embedded litpools). In the above example, $a would be somewhere before
> the first ldr, $d at "special_lit_sym" and $a again at
> "past_literals". objdump will then use these to decide how to display
> a given address.
> 
> If you dump the symbol table with "readelf -s" (objdump hides them on
> my system at least) you should see these in the GCC binary, but almost
> certainly not in the LLVM one.
> 
> There's some kind of half-written support already in LLVM I believe,
> but it's been broken for as long as I can remember. You'd need to make
> the MC emitters properly understand when they're switching between
> code and data areas, and insert the appropriate symbols.

The recent MachO data-in-code support should have fixed a lot of the problems.
There's probably still some quirks in the specifics ($a vs. $t and making
sure the symbols get into the ELF properly), but the core functionality to know
how to mark data regions is there and works very well.

-Jim
> 
> Hope this helps.
> 
> Tim.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
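
As a rough illustration of the mapping-symbol scheme Tim describes, the sketch below classifies an address by finding the nearest mapping symbol at or before it. The addresses, the symbol list and the `classify` helper are all made up for this example; it is not how objdump or llvm-objdump is actually implemented.

```python
from bisect import bisect_right

# Hypothetical mapping symbols for the literal-pool example: (address, name).
# The addresses are invented for illustration.
MAPPING_SYMBOLS = [
    (0x00, "$a"),  # start of ARM code, before the first ldr
    (0x0C, "$d"),  # special_lit_sym: the literal pool word
    (0x10, "$a"),  # past_literals: ARM code resumes
]

KINDS = {"$a": "arm", "$t": "thumb", "$d": "data"}


def classify(address, symbols=MAPPING_SYMBOLS):
    """Return 'arm', 'thumb' or 'data' for an address, or None if unmapped."""
    ordered = sorted(symbols)
    starts = [addr for addr, _ in ordered]
    index = bisect_right(starts, address) - 1
    if index < 0:
        return None  # no mapping symbol at or before this address
    name = ordered[index][1]
    # The ABI also allows suffixed names such as "$d.1"; keep only the prefix.
    return KINDS.get(name[:2])
```

With these invented addresses, `classify(0x04)` gives `"arm"`, `classify(0x0C)` gives `"data"`, and `classify(0x10)` gives `"arm"` again.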

Renato Golin

2012-Oct-07 10:14 UTC

On 5 October 2012 17:48, Jim Grosbach <grosbach at apple.com> wrote:
> The recent MachO data-in-code support should have fixed a lot of the
> problems. There's probably still some quirks in the specifics ($a vs. $t
> and making sure the symbols get into the ELF properly), but the core
> functionality to know how to mark data regions is there and works very
> well.

Hi Jim,

I'm trying to help Greg track this down. From your recent commits, I
take it you're re-using the data-in-code detection, previously used only
for ASM output, for object output as well, via
EmitDataRegion/EmitDataRegionEnd.

I haven't looked too deep in the MC, but I'm supposing that will work
automatically when the output streamer is printing object code and
meets a non-code region, so in theory, changing MCELFStreamer
accordingly (overriding those functions in there) would take care of
data vs. code issue in ELF.

Assuming LLVM doesn't generate ARM/Thumb veneers inside the same
function (i.e. a Thumb function has only Thumb code), Greg could use
EmitDataRegion and EmitDataRegionEnd, with the former saving the
state of the current code (Thumb/ARM) and the latter restoring it, by
emitting the $d and $a/$t respectively.

Does it seem like a good initial approach?

Continuing... It seems MCELFStreamer already has an EmitThumbFunc,
which looks to me like the wrong place for it. I'd imagine MCELFStreamer
would have EmitFunc and MCARMELCStreamer (or whatever) would identify
its type and call the appropriate EmitThumbFunc/EmitARMFunc. Being
pedantic, even that is still too high level because of the ARM/Thumb
veneers, but we don't want to worry about that if LLVM doesn't even
try to mix ARM and Thumb (and assuming external libraries would have
the symbols, if they do).

Whether or not LLVM generates them, its disassembler should know about
those symbols and should be able to mark them accordingly. Where would be
the best place to put those symbols (in an enum or table), so that the
MCStreamer and the disassembler could reference a single place?


-- 
cheers,
--renato

http://systemcall.org/

Jim Grosbach

2012-Oct-09 22:58 UTC

On Oct 7, 2012, at 3:14 AM, Renato Golin <rengolin at systemcall.org> wrote:
> On 5 October 2012 17:48, Jim Grosbach <grosbach at apple.com> wrote:
>> The recent MachO data-in-code support should have fixed a lot of the
>> problems. There's probably still some quirks in the specifics ($a vs. $t
>> and making sure the symbols get into the ELF properly), but the core
>> functionality to know how to mark data regions is there and works very
>> well.
> 
> Hi Jim,
> 
> I'm trying to help Greg track this down. From your recent commits, I
> take it you're re-using the data-in-code detection, previously used only
> for ASM output, for object output as well, via
> EmitDataRegion/EmitDataRegionEnd.
> 
It's a bit more than that. Those Emit* methods are new for this support.
There was spotty support for the raw $a/$t/$d stuff before; this work
abstracted and extended it to support both asm and binary emission, and added
calls to the methods in the various bits of the ARM backend where data-in-code
regions get created (jump tables, constant pools, et al.).

> I haven't looked too deep in the MC, but I'm supposing that will work
> automatically when the output streamer is printing object code and
> meets a non-code region, so in theory, changing MCELFStreamer
> accordingly (overriding those functions in there) would take care of
> data vs. code issue in ELF.
Yep. They'll likely be implemented as, effectively, an EmitLabel().
> Assuming LLVM doesn't generate ARM/Thumb veneers inside the same
> function (i.e. a Thumb function has only Thumb code), Greg could use
> EmitDataRegion and EmitDataRegionEnd, with the former saving the
> state of the current code (Thumb/ARM) and the latter restoring it, by
> emitting the $d and $a/$t respectively.
> 
> Does it seem like a good initial approach?
> 
> Continuing... It seems MCELFStreamer already has an EmitThumbFunc,
> which looks to me like the wrong place for it.
That's just the handler for the .thumb_func directive. It has nothing to do
with emitting the contents of the actual function.
> I'd imagine MCELFStreamer
> would have EmitFunc and MCARMELCStreamer (or whatever) would identify
> its type and call the appropriate EmitThumbFunc/EmitARMFunc. Being
> pedantic, even that is still too high level because of the ARM/Thumb
> veneers, but we don't want to worry about that if LLVM doesn't even
> try to mix ARM and Thumb (and assuming external libraries would have
> the symbols, if they do).
This is complicated a bit by needing to work for plain .s files, not just
compiler generated files. Those can intermix arm and thumb code in crazy ways.

The assembler already has a thumb vs. arm mode state (which gets adjusted via
the .arm/.thumb directives and the .code synonyms). ELF will want to check that
state and use it to determine whether a data-region-end directive should result
in a $a or a $t in the output ELF.
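
The idea of consulting the assembler's current mode at the end of a data region can be sketched with a toy model. Everything here is hypothetical: `ToyElfStreamer` and its methods are not LLVM's MCELFStreamer API, and the sketch only records which mapping symbol would land at which offset.

```python
class ToyElfStreamer:
    """Toy model, not LLVM's MCELFStreamer: records (offset, symbol) pairs."""

    def __init__(self):
        self.offset = 0
        self.thumb_mode = False    # a real assembler flips this on .arm/.thumb
        self.mapping_symbols = []
        self._last = None          # most recently emitted mapping symbol

    def _emit_mapping_symbol(self, name):
        # Only emit a symbol when the kind of region actually changes.
        if name != self._last:
            self.mapping_symbols.append((self.offset, name))
            self._last = name

    def _code_symbol(self):
        return "$t" if self.thumb_mode else "$a"

    def set_thumb_mode(self, thumb):
        self.thumb_mode = thumb

    def emit_instruction(self, size):
        self._emit_mapping_symbol(self._code_symbol())
        self.offset += size

    def emit_data_region(self):
        self._emit_mapping_symbol("$d")

    def emit_word(self):
        self.offset += 4

    def emit_data_region_end(self):
        # Consult the *current* mode instead of a saved one, so hand-written
        # .s files that switch modes around a data region still come out right.
        self._emit_mapping_symbol(self._code_symbol())


s = ToyElfStreamer()
s.emit_instruction(4)      # ldr r0, special_lit_sym
s.emit_instruction(4)      # b past_literals
s.emit_data_region()
s.emit_word()              # .word variable_desired
s.emit_data_region_end()
s.emit_instruction(4)      # first instruction at past_literals
print(s.mapping_symbols)   # [(0, '$a'), (8, '$d'), (12, '$a')]
```

This toy does not handle a mode switch that happens at the same offset as a region end, which would record two symbols at one offset; a real implementation would need a policy for that.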
> Whether or not LLVM generates them, its disassembler should know about
> those symbols and should be able to mark them accordingly. Where would be
> the best place to put those symbols (in an enum or table), so that the
> MCStreamer and the disassembler could reference a single place?
It's not the disassembler itself that should know about them, but the driver
for the disassembler. In this case, llvm-objdump. The disassembler doesn't
have that kind of gestalt knowledge.
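
To illustrate that split of responsibilities, here is a minimal sketch of a driver that reads the mapping symbols itself and only hands code regions to a decoder. The `split_regions` and `dump` helpers and the stub decoder are invented for this example and do not reflect llvm-objdump's actual structure.

```python
def split_regions(section_size, symbols):
    """Yield (start, end, kind) ranges covering the mapped part of a section."""
    ordered = sorted(symbols)
    for i, (start, name) in enumerate(ordered):
        end = ordered[i + 1][0] if i + 1 < len(ordered) else section_size
        if end > start:
            yield start, end, name[:2]  # drop any ".suffix" on the symbol


def dump(section_bytes, symbols, decode_arm, decode_thumb):
    """Driver: print data regions itself, pass code regions to a decoder."""
    lines = []
    for start, end, kind in split_regions(len(section_bytes), symbols):
        chunk = section_bytes[start:end]
        if kind == "$d":
            for off in range(0, len(chunk), 4):
                piece = chunk[off:off + 4]
                if len(piece) == 4:
                    word = int.from_bytes(piece, "little")
                    lines.append(f"{start + off:08x}: .word 0x{word:08x}")
                else:  # trailing bytes that do not fill a whole word
                    for i, b in enumerate(piece):
                        lines.append(f"{start + off + i:08x}: .byte 0x{b:02x}")
        elif kind == "$t":
            lines.extend(decode_thumb(start, chunk))
        elif kind == "$a":
            lines.extend(decode_arm(start, chunk))
    return lines


def stub_decoder(address, chunk):
    # Stand-in for a real instruction decoder, which knows nothing of symbols.
    return [f"{address:08x}: <code, {len(chunk)} bytes>"]


section = bytes(8) + (0x12345678).to_bytes(4, "little") + bytes(4)
symbols = [(0, "$a"), (8, "$d"), (12, "$a")]
for line in dump(section, symbols, stub_decoder, stub_decoder):
    print(line)
```

The decoder callbacks never see the symbol table; only the driver decides which bytes are code and which are data.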

-Jim

> 
> -- 
> cheers,
> --renato
> 
> http://systemcall.org/
