On Oct 5, 2012, at 12:15 AM, Tim Northover <t.p.northover at gmail.com> wrote:

> Hi Greg,
>
>> Is this a bug? If so, how can I fix it?
>
> It's somewhere between a bug and a quality-of-implementation issue.
> ARM often uses literal pools in the middle of code when it needs to
> materialize a large constant (or variable address more likely for
> R_ARM_ABS32). This results in a sequence roughly like:
>
>     ldr r0, special_lit_sym
>     [...]
>     b past_literals
> special_lit_sym:
>     .word variable_desired
> past_literals:
>     [...instructions...]
>
> In general, deciding whether to disassemble a given location as code
> or data is a very hard problem (think of all the evil tricks you could
> play with dual-purpose), so the ARM ELF ABI
> (http://infocenter.arm.com/help/topic/com.arm.../IHI0044D_aaelf.pdf)
> specifies something called mapping symbols, which assemblers should
> insert to tell disassemblers what's actually needed.
>
> The idea is that a $a should be inserted at the start of each section
> of ARM code, $t before Thumb and $d before data (including these
> embedded litpools). In the above example, $a would be somewhere before
> the first ldr, $d at "special_lit_sym" and $a again at
> "past_literals". objdump will then use these to decide how to display
> a given address.
>
> If you dump the symbol table with "readelf -s" (objdump hides them on
> my system at least) you should see these in the GCC binary, but almost
> certainly not in the LLVM one.
>
> There's some kind of half-written support already in LLVM I believe,
> but it's been broken for as long as I can remember. You'd need to make
> the MC emitters properly understand when they're switching between
> code and data areas, and insert the appropriate symbols.

The recent MachO data-in-code support should have fixed a lot of the problems. There's probably still some quirks in the specifics ($a vs. $t and making sure the symbols get into the ELF properly), but the core functionality to know how to mark data regions is there and works very well.

-Jim

>
> Hope this helps.
>
> Tim.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
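Tim's description of how a tool like objdump consumes mapping symbols can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not objdump's or LLVM's actual code: the function name `classify`, the `(address, name)` tuple layout and the example addresses are all hypothetical, and the symbols are assumed to be already extracted from one section and sorted by address.

```python
from bisect import bisect_right

# Mapping symbol prefixes from the ARM ELF ABI, as described in the thread.
KINDS = {"$a": "arm", "$t": "thumb", "$d": "data"}


def classify(mapping_symbols, address):
    """Return 'arm', 'thumb' or 'data' for an address, or None if unknown.

    mapping_symbols is a list of (address, name) pairs for one section,
    sorted by address. The nearest symbol at or below the address decides
    how that address should be displayed.
    """
    addresses = [addr for addr, _ in mapping_symbols]
    index = bisect_right(addresses, address) - 1
    if index < 0:
        return None  # no mapping symbol precedes this address
    name = mapping_symbols[index][1]
    # Only the first two characters matter, so a suffixed name such as
    # "$d.1" is treated the same as "$d".
    return KINDS[name[:2]]


# Hypothetical layout of the example above: two ARM instructions,
# a 4-byte literal at special_lit_sym, then code again at past_literals.
symbols = [(0x0, "$a"), (0x8, "$d"), (0xC, "$a")]
```

With this layout, `classify(symbols, 0x8)` reports the literal as data, while addresses on either side of it are reported as ARM code.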
On 5 October 2012 17:48, Jim Grosbach <grosbach at apple.com> wrote:

> The recent MachO data-in-code support should have fixed a lot of the problems. There's probably still some quirks in the specifics ($a vs. $t and making sure the symbols get into the ELF properly), but the core functionality to know how to mark data regions is there and works very well.

Hi Jim,

I'm trying to help Greg crack it down. From your recent commits, I
take it you're re-using a data-in-code detection previously used only
for ASM output, to object output, via the
EmitDataRegion/EmitDataRegionEnd.

I haven't looked too deep in the MC, but I'm supposing that will work
automatically when the output streamer is printing object code and
meets a non-code region, so in theory, changing MCELFStreamer
accordingly (overriding those functions in there) would take care of
data vs. code issue in ELF.

Assuming LLVM doesn't generate ARM/Thumb veneers inside the same
function (i.e. a Thumb function has only Thumb code), Greg could use
the EmitDataRegion and EmitDataRegionEnd, with the former saving the
state of the current code (Thumb/Arm) and the latter restoring it, by
emitting the $d and $a/t respectively.

Does it seem like a good initial approach?

Continuing... It seems MCELFStreamer already has an EmitThumbFunc,
which looks to me as the wrong place to be. I'd imagine MCELFStreamer
would have EmitFunc and MCARMELCStreamer (or whatever) would identify
its type and call the appropriate EmitThumbFunc/EmitARMFunc. Being
pedantic, even that is still too high level because of the ARM/Thumb
veneers, but we don't want to worry about that if LLVM doesn't even
try to mix ARM and Thumb (and assuming external libraries would have
the symbols, if they do).

Generating or not, LLVM's disassembler should know about those symbols
and should be able to mark them accordingly. Where would be the best
part to put those symbols (in an enum or table), so that the
MCStreamer and the disassembler could reference a single place?

--
cheers,
--renato

http://systemcall.org/
On Oct 7, 2012, at 3:14 AM, Renato Golin <rengolin at systemcall.org> wrote:

> On 5 October 2012 17:48, Jim Grosbach <grosbach at apple.com> wrote:
>> The recent MachO data-in-code support should have fixed a lot of the problems. There's probably still some quirks in the specifics ($a vs. $t and making sure the symbols get into the ELF properly), but the core functionality to know how to mark data regions is there and works very well.
>
> Hi Jim,
>
> I'm trying to help Greg crack it down. From your recent commits, I
> take it you're re-using a data-in-code detection previously used only
> for ASM output, to object output, via the
> EmitDataRegion/EmitDataRegionEnd.
>

It's a bit more than that. Those Emit* methods are new for this support. There was spotty support for the raw $a/$t/$d stuff before, and this abstracted and extended it to support both asm and binary emission as well as added uses for the methods to the various bits in the ARM backend where data-in-code regions get created (jump tables, constant pools, et al.).

> I haven't looked too deep in the MC, but I'm supposing that will work
> automatically when the output streamer is printing object code and
> meets a non-code region, so in theory, changing MCELFStreamer
> accordingly (overriding those functions in there) would take care of
> data vs. code issue in ELF.

Yep. They'll likely be implemented as, effectively, an EmitLabel().

> Assuming LLVM doesn't generate ARM/Thumb veneers inside the same
> function (i.e. a Thumb function has only Thumb code), Greg could use
> the EmitDataRegion and EmitDataRegionEnd, with the former saving the
> state of the current code (Thumb/Arm) and the latter restoring it, by
> emitting the $d and $a/t respectively.
>
> Does it seem like a good initial approach?
>
> Continuing... It seems MCELFStreamer already has an EmitThumbFunc,
> which looks to me as the wrong place to be.

That's just the handler for the .thumb_func directive. It has nothing to do with emitting the contents of the actual function.

> I'd imagine MCELFStreamer
> would have EmitFunc and MCARMELCStreamer (or whatever) would identify
> its type and call the appropriate EmitThumbFunc/EmitARMFunc. Being
> pedantic, even that is still too high level because of the ARM/Thumb
> veneers, but we don't want to worry about that if LLVM doesn't even
> try to mix ARM and Thumb (and assuming external libraries would have
> the symbols, if they do).

This is complicated a bit by needing to work for plain .s files, not just compiler generated files. Those can intermix arm and thumb code in crazy ways. The assembler already has a thumb vs. arm mode state (which gets adjusted via the .arm/.thumb directives and the .code synonyms). ELF will want to check that state and use it to determine whether a data-region-end directive should result in a $a or a $t in the output ELF.

> Generating or not, LLVM's disassembler should know about those symbols
> and should be able to mark them accordingly. Where would be the best
> part to put those symbols (in an enum or table), so that the
> MCStreamer and the disassembler could reference a single place?

It's not the disassembler itself that should know about them, but the driver for the disassembler. In this case, llvm-objdump. The disassembler doesn't have that kind of gestalt knowledge.

-Jim

>
> --
> cheers,
> --renato
>
> http://systemcall.org/
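The emission side discussed in this exchange (a $d when a data region begins, then a $a or $t chosen from the assembler's current ARM/Thumb state when it ends) can be modelled as a small state machine. The Python below is a toy sketch under assumptions, not LLVM's MC code: the class `MappingSymbolModel` and all of its method names are hypothetical stand-ins, and instruction and data sizes are passed in by hand.

```python
class MappingSymbolModel:
    """Toy model of a streamer that records ARM ELF mapping symbols."""

    def __init__(self):
        self.mode = "arm"    # assembler state, flipped by .arm / .thumb
        self.offset = 0      # current offset within the section
        self.symbols = []    # (offset, name) pairs in emission order
        self.last = None     # most recently emitted mapping symbol

    def _mark(self, name):
        # Emit a mapping symbol only when the kind of content changes.
        if name != self.last:
            self.symbols.append((self.offset, name))
            self.last = name

    def _code_symbol(self):
        return "$a" if self.mode == "arm" else "$t"

    def set_mode(self, mode):
        # Stands in for the .arm / .thumb directives.
        if mode not in ("arm", "thumb"):
            raise ValueError("mode must be 'arm' or 'thumb'")
        self.mode = mode

    def emit_instruction(self, size):
        self._mark(self._code_symbol())
        self.offset += size

    def begin_data_region(self):
        self._mark("$d")

    def emit_data(self, size):
        self.offset += size

    def end_data_region(self):
        # The current mode, not a saved one, picks $a versus $t, so a
        # hand-written .s file that switches modes is still labelled
        # according to the assembler state at this point.
        self._mark(self._code_symbol())


# The literal-pool sequence from earlier in the thread, in ARM mode.
arm = MappingSymbolModel()
arm.emit_instruction(4)   # ldr r0, special_lit_sym
arm.emit_instruction(4)   # b past_literals
arm.begin_data_region()
arm.emit_data(4)          # .word variable_desired
arm.end_data_region()
arm.emit_instruction(4)   # first instruction at past_literals
```

After this run, `arm.symbols` holds a $a at offset 0, a $d at offset 8 and a $a at offset 12. Calling `set_mode("thumb")` before the first instruction yields $t in place of $a at the code boundaries.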