Greg Fitzgerald
2012-Oct-04 23:35 UTC
[LLVMdev] R_ARM_ABS32 disassembly with integrated-as
I'm attempting to detect encoding bugs by comparing disassembly when using GCC's 'as' versus LLVM's integrated assembler. Generally this has gone very well, but one thing that adds a lot of noise is that a .word marked with an R_ARM_ABS32 relocation is disassembled as an instruction and not as data. Please see the attached 'dump.diff', which was generated by diffing the "objdump -d --all-headers" output for each object file.

Is this a bug? If so, how can I fix it?

Thanks,
Greg

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dump.diff
Type: application/octet-stream
Size: 10954 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20121004/31109958/attachment.obj>
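The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the script that produced 'dump.diff': the function name `diff_disassembly`, the labels "gcc-as" and "integrated-as", and the two sample listings are all hypothetical, and the listings are assumed to have been captured already from "objdump -d" on each object file.

```python
import difflib

def diff_disassembly(gcc_dump, llvm_dump):
    """Return unified-diff lines between two disassembly listings (as text)."""
    return list(difflib.unified_diff(
        gcc_dump.splitlines(),
        llvm_dump.splitlines(),
        fromfile="gcc-as",          # hypothetical label
        tofile="integrated-as",     # hypothetical label
        lineterm="",
    ))

# Hypothetical listings: the same literal-pool word, shown once as data
# and once decoded as if it were an instruction.
gcc_dump = "   0:\te59f0000 \tldr\tr0, [pc]\n   8:\t00000000 \t.word\t0x00000000"
llvm_dump = "   0:\te59f0000 \tldr\tr0, [pc]\n   8:\t00000000 \tandeq\tr0, r0, r0"

for line in diff_disassembly(gcc_dump, llvm_dump):
    print(line)
```

Every literal-pool word that one listing shows as data and the other as an instruction becomes a -/+ pair like this, which is the noise in question.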
Hi Greg,

> Is this a bug? If so, how can I fix it?

It's somewhere between a bug and a quality-of-implementation issue. ARM often uses literal pools in the middle of code when it needs to materialize a large constant (or, more likely for R_ARM_ABS32, the address of a variable). This results in a sequence roughly like:

        ldr r0, special_lit_sym
        [...]
        b past_literals
    special_lit_sym:
        .word variable_desired
    past_literals:
        [...instructions...]

In general, deciding whether to disassemble a given location as code or data is a very hard problem (think of all the evil tricks you could play with bytes that serve as both code and data), so the ARM ELF ABI (http://infocenter.arm.com/help/topic/com.arm.../IHI0044D_aaelf.pdf) specifies something called mapping symbols, which assemblers should insert to tell disassemblers what's actually needed.

The idea is that a $a should be inserted at the start of each region of ARM code, $t before Thumb code and $d before data (including these embedded litpools). In the above example, $a would be somewhere before the first ldr, $d at "special_lit_sym" and $a again at "past_literals". objdump will then use these to decide how to display a given address.

If you dump the symbol table with "readelf -s" (objdump hides them on my system at least) you should see these in the GCC binary, but almost certainly not in the LLVM one.

There's some kind of half-written support already in LLVM I believe, but it's been broken for as long as I can remember. You'd need to make the MC emitters properly understand when they're switching between code and data areas, and insert the appropriate symbols.

Hope this helps.

Tim.
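To make the lookup concrete, here is a minimal Python sketch of how a disassembler might consult mapping symbols. It is not objdump's actual implementation: the function name `classify`, the `KINDS` table and the offsets 0x0/0x8/0xC are assumptions chosen to mirror the ldr / .word / past_literals example. Only the first two characters of the name are inspected, so suffixed forms such as "$d.1" are treated like "$d".

```python
import bisect

# Mapping symbol prefixes: $a = ARM code, $t = Thumb code, $d = data.
KINDS = {"$a": "arm", "$t": "thumb", "$d": "data"}

def classify(address, mapping_symbols):
    """Return 'arm', 'thumb', 'data', or None for an address.

    mapping_symbols is a list of (address, name) pairs sorted by address.
    The nearest mapping symbol at or below the address decides the answer.
    """
    addresses = [addr for addr, _ in mapping_symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None  # no mapping symbol precedes this address
    name = mapping_symbols[index][1]
    return KINDS.get(name[:2])

# Hypothetical layout: code at 0x0, a one-word literal pool at 0x8,
# code again from 0xC onwards.
symbols = [(0x0, "$a"), (0x8, "$d"), (0xC, "$a")]

for addr in (0x0, 0x4, 0x8, 0xC):
    print(hex(addr), classify(addr, symbols))
```

With an empty symbol list every address comes back as None, which corresponds to the case where the disassembler has nothing to go on and decodes everything as instructions.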
FWIW, I believe the following bugzilla issue reports/covers that mapping symbols are not being produced:
http://llvm.org/bugs/show_bug.cgi?id=9582

> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Tim Northover
> Sent: 05 October 2012 08:15
> To: Greg Fitzgerald
> Cc: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] R_ARM_ABS32 disassembly with integrated-as
>
> [...]
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
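One way to check whether an object file carries mapping symbols is to scan the text printed by "readelf -s". The Python sketch below assumes the usual eight-column GNU readelf layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name); the function name `mapping_symbols` and the sample table are hypothetical, not output taken from the objects discussed in this thread.

```python
import re

# $a, $t or $d, optionally followed by "." and further characters.
MAPPING_NAME = re.compile(r"^\$[atd](\..*)?$")

def mapping_symbols(readelf_output):
    """Extract sorted (address, name) pairs for mapping symbols from `readelf -s` text."""
    found = []
    for line in readelf_output.splitlines():
        fields = line.split()
        # Assumed row layout: Num: Value Size Type Bind Vis Ndx Name
        if (len(fields) == 8 and fields[0].endswith(":")
                and MAPPING_NAME.match(fields[7])):
            found.append((int(fields[1], 16), fields[7]))
    return sorted(found)

# Hypothetical symbol table for an object that does contain mapping symbols.
SAMPLE = """\
Symbol table '.symtab' contains 5 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 NOTYPE  LOCAL  DEFAULT    1 $a
     2: 00000008     0 NOTYPE  LOCAL  DEFAULT    1 $d
     3: 0000000c     0 NOTYPE  LOCAL  DEFAULT    1 $a
     4: 00000000    16 FUNC    GLOBAL DEFAULT    1 f
"""

print(mapping_symbols(SAMPLE))
```

An empty result for an object that has literal pools in its code would be consistent with the missing-mapping-symbols problem described in the bug report.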
On 5 October 2012 10:53, Kristof Beyls <kristof.beyls at arm.com> wrote:

> FWIW, I believe the following bugzilla issue reports/covers that
> mapping symbols are not being produced:
> http://llvm.org/bugs/show_bug.cgi?id=9582

Yes.. I meant to fix that and it kinda slipped... ;)

Greg,

There should be information enough in the bug to be an easy fix. If you *really* don't want to look at it, I might have some time in the near future...

--
cheers,
--renato

http://systemcall.org/
On Oct 5, 2012, at 12:15 AM, Tim Northover <t.p.northover at gmail.com> wrote:

> [...]
> There's some kind of half-written support already in LLVM I believe,
> but it's been broken for as long as I can remember. You'd need to make
> the MC emitters properly understand when they're switching between
> code and data areas, and insert the appropriate symbols.

The recent MachO data-in-code support should have fixed a lot of the problems. There are probably still some quirks in the specifics ($a vs. $t, and making sure the symbols get into the ELF properly), but the core functionality to know how to mark data regions is there and works very well.

-Jim