Kevin Enderby
2014-Aug-06  18:31 UTC
[LLVMdev] Looking for ideas on how to make llvm-objdump handle both arm and thumb disassembly from the same object file
Hello Tim, Rafael, Renato and llvmdev,

I’m working to get llvm-objdump to handle both arm and thumb disassembly from the same object file, similarly to how darwin’s otool(1) works, and I’m looking for implementation direction. I spoke to Jim Grosbach about some ideas and he suggested I send out an email about some of the possibilities. Since none of the ones I could think of are pretty, he thought maybe you would have some thoughts or suggestions.

First a little background. The way darwin’s otool(1) does this is that it creates an llvm disassembler for both arm and thumb when disassembling a binary with a 32-bit ARM cpu. It uses the C API in <llvm-c/Disassembler.h> and calls LLVMCreateDisasmCPU() twice, once with an arm TripleName and once with a matching thumb TripleName. Then for each 32-bit ARM cpu it will default to one or the other disassembler. Then, as it disassembles and finds a symbol in the symbol table for the current PC being disassembled, it will see if the symbol has the N_ARM_THUMB_DEF bit set or not, and then switch between the arm and thumb disassemblers. While this is a bit of a hack, there is a limited set of Mach-O cpus otool(1) deals with.

For llvm-objdump, it eventually just calls TheTarget->createMCDisassembler() and gets one disassembler for the TheTarget it created. I talked to Jim a bit about sinking the logic of maintaining multiple disassemblers down into the core disassembler logic and using the subtarget to select between them, like the ARMAsmParser and I think the ARMInstPrinter work. But that seems very complicated for a single target that has two disassemblers.

The implementation of llvm-objdump does have a MachODump.cpp for use with the -m option, where I could do a similar otool(1)-like hack and special case 32-bit ARM cpus. At least that contains the ugliness. But this does not really help the non -m case, and I suspect ELF objects may face a similar problem.

The other more radical change I was thinking of was maybe changing MachODump.cpp to use the C API. Then at least this way we would have something in the tree that used this and could actually have test cases. That could then use the callbacks for symbolic operands etc. But that still could be done with the C++ API using TheTarget->createMCSymbolizer() anyway.

So if any of you have suggestions for a direction for this let me know,
Kev
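The symbol-driven switch that otool(1) is described as doing can be modelled roughly as below. This is only a sketch in Python, not otool or LLVM code: pick_modes() and its arguments are hypothetical names invented for illustration, and the only detail taken from Mach-O is the N_ARM_THUMB_DEF flag (0x0008 in a symbol's n_desc field).

```python
# Sketch only: models the symbol-driven ARM/Thumb switch described above.
# Everything except the N_ARM_THUMB_DEF name and value is hypothetical.

N_ARM_THUMB_DEF = 0x0008  # n_desc flag from <mach-o/nlist.h>: Thumb function


def pick_modes(symbols, addresses, default_mode="arm"):
    """Return (address, mode) pairs, switching mode at each symbol.

    symbols   -- dict mapping a symbol address to its n_desc value
    addresses -- instruction addresses in increasing order
    """
    mode = default_mode
    result = []
    for pc in addresses:
        n_desc = symbols.get(pc)
        if n_desc is not None:
            # A symbol starts at this PC: its Thumb bit decides which of
            # the two disassemblers handles the code that follows.
            mode = "thumb" if n_desc & N_ARM_THUMB_DEF else "arm"
        result.append((pc, mode))
    return result


# One ARM function at 0x1000 followed by one Thumb function at 0x1008.
symbols = {0x1000: 0x0000, 0x1008: N_ARM_THUMB_DEF}
modes = pick_modes(symbols, [0x1000, 0x1004, 0x1008, 0x100A])
```

In a real tool the "arm" or "thumb" result would select one of the two contexts returned by the two LLVMCreateDisasmCPU() calls, rather than a string.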
Renato Golin
2014-Aug-06  19:50 UTC
[LLVMdev] Looking for ideas on how to make llvm-objdump handle both arm and thumb disassembly from the same object file
On 6 August 2014 19:31, Kevin Enderby <enderby at apple.com> wrote:
> First a little background. The way darwin’s otool(1) does this is that it creates an llvm disassembler for both arm and thumb when disassembling a binary with a 32-bit ARM cpu. It uses the C API in <llvm-c/Disassembler.h> and calls LLVMCreateDisasmCPU() twice, once with an arm TripleName and once with a matching thumb TripleName. Then for each 32-bit ARM cpu it will default to one or the other disassembler. Then, as it disassembles and finds a symbol in the symbol table for the current PC being disassembled, it will see if the symbol has the N_ARM_THUMB_DEF bit set or not, and then switch between the arm and thumb disassemblers. While this is a bit of a hack, there is a limited set of Mach-O cpus otool(1) deals with.

Hi Kevin,

I guess it depends on how many other targets need to deal with the same problem, and how much their maintainers want to cope with the change on their side.

Creating multiple disassemblers is wasteful, but not critical to tools like objdump, which are rarely on the hot path. It would be simpler/quicker to instantiate them in objdump and then have it choose one or the other based on the Thumb bit. However, I think this would not be the best solution, for a few reasons:

1. This is disassembler logic, and having objdump do this at a higher level means that other tools that (eventually) need the same functionality will have to re-implement it.

2. It shouldn't be that hard to join the ARM and Thumb disassemblers, given that they're in the same file, share most of the static functions and could easily delegate, with getInstruction() deciding which to use: getThumbInstruction() or getARMInstruction(), and renaming the Thumb/ARMDisassembler functions to match.

Though, I haven't got my hands on the disassembler that much, so other people with more experience in that area could chime in and give more reasons on either side.

My personal opinion is that it'd be more elegant and stable that way, and any work we have to do now would pay off in the future, but other back-end maintainers could disagree.

cheers,
--renato
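The delegation in point 2 might look something like the following. It is a toy model in Python, not the LLVM ARMDisassembler: the class, its thumb flag and the fixed instruction widths are assumptions made for illustration, and only the three method names echo the getInstruction(), getThumbInstruction() and getARMInstruction() split suggested above.

```python
# Sketch only: a toy model of one disassembler object that delegates by mode.
# The class name, the `thumb` flag and the decoding are hypothetical.


class UnifiedDisassembler:
    def __init__(self):
        # Stand-in for whatever state would carry the Thumb mode bit.
        self.thumb = False

    def get_arm_instruction(self, data, offset):
        # ARM (A32) instructions are always 4 bytes wide.
        return ("arm", data[offset:offset + 4])

    def get_thumb_instruction(self, data, offset):
        # Toy simplification: every Thumb instruction is treated as 2 bytes.
        # Real Thumb-2 code also contains 4-byte encodings.
        return ("thumb", data[offset:offset + 2])

    def get_instruction(self, data, offset):
        # Single entry point: callers never choose a decoder themselves.
        if self.thumb:
            return self.get_thumb_instruction(data, offset)
        return self.get_arm_instruction(data, offset)
```

With this shape a tool such as objdump would hold one object and flip its mode at function boundaries, instead of owning two disassemblers and choosing between them itself.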
Kevin Enderby
2014-Aug-06  20:18 UTC
[LLVMdev] Looking for ideas on how to make llvm-objdump handle both arm and thumb disassembly from the same object file
Hi Renato,

Thanks for your reply. A few comments in line below.

Kev

On Aug 6, 2014, at 12:50 PM, Renato Golin <renato.golin at linaro.org> wrote:
> On 6 August 2014 19:31, Kevin Enderby <enderby at apple.com> wrote:
>> First a little background. The way darwin’s otool(1) does this is that it creates an llvm disassembler for both arm and thumb when disassembling a binary with a 32-bit ARM cpu. It uses the C API in <llvm-c/Disassembler.h> and calls LLVMCreateDisasmCPU() twice, once with an arm TripleName and once with a matching thumb TripleName. Then for each 32-bit ARM cpu it will default to one or the other disassembler. Then, as it disassembles and finds a symbol in the symbol table for the current PC being disassembled, it will see if the symbol has the N_ARM_THUMB_DEF bit set or not, and then switch between the arm and thumb disassemblers. While this is a bit of a hack, there is a limited set of Mach-O cpus otool(1) deals with.
>
> Hi Kevin,
>
> I guess it depends on how many other targets need to deal with the same problem, and how much their maintainers want to cope with the change on their side.

That’s the rub. I think only 32-bit arm has this issue with multiple disassemblers, and I would hate to add a bunch of stuff that all targets would have to deal with. I'd love to hear if any other target maintainer could even use this.

> Creating multiple disassemblers is wasteful, but not critical to tools like objdump, that are rarely on the hot path. It would be simpler/quicker to instantiate them on objdump and then, based on the Thumb bit, it chooses one or the other.

I agree with all that. And this would be pretty simple to deal with inside objdump and be done with it.

> However, I think this would not be the best solution for some reasons:
>
> 1. This is disassembler logic, and having objdump doing this on a higher level means that other tools that (eventually) need the same functionality will have to re-implement.

Also agreed. But can you or anyone else think of other tools that would need this logic? I'd hate to do a whole bunch of work adding to the lower layers just to make objdump a bit cleaner.

> 2. It shouldn't be that hard to join the ARM and Thumb disassembler, given that they're on the same file, share most of the static functions and could easily delegate with getInstruction() deciding which to use: getThumbInstruction() or getARMInstruction() and renaming Thumb/ARMDisassembler functions to match.

Yep, could do that. But it seems like a lot of work for very little payoff.

> Though, I haven't got my hands on the disassembler that much, so other people with more experience in that area could chime in and give more reasons on either side.

Me too.

> My personal opinion is that it'd be more elegant and stable that way,

Absolutely agree about this being more elegant.

> and any work we have to do now would compensate in the future, but other back-end maintainers could disagree.

As a maintainer of darwin’s otool(1) for some 20 plus years, having plugged in some 9 or more different disassemblers, I find the use of two llvm disassemblers for 32-bit arm to be no big deal at all. Heck, it still has the old arm disassemblers in it and many other old ones, and it is very stable and I rarely if ever have to touch those interfaces.

> cheers,
> --renato
Rafael Espíndola
2014-Aug-07  14:09 UTC
[LLVMdev] Looking for ideas on how to make llvm-objdump handle both arm and thumb disassembly from the same object file
> The implementation of llvm-objdump does have a MachODump.cpp for use with the -m option, where I could do a similar otool(1)-like hack and special case 32-bit ARM cpus. At least that contains the ugliness. But this does not really help the non -m case, and I suspect ELF objects may face a similar problem.

My gut feeling would be to do this first and keep an eye out for how to refactor it when we want to add support for ELF or for things like mips16.

CCing Eric since he is now the expert on the target/subtarget relationship.

> The other more radical change I was thinking of was maybe changing MachODump.cpp to use the C API. Then at least this way we would have something in the tree that used this and could actually have test cases. That could then use the callbacks for symbolic operands etc. But that still could be done with the C++ API using TheTarget->createMCSymbolizer() anyway.

Using the C API seems odd.

> So if any of you have suggestions for a direction for this let me know,
> Kev

Cheers,
Rafael
Eric Christopher
2014-Aug-09  00:56 UTC
[LLVMdev] Looking for ideas on how to make llvm-objdump handle both arm and thumb disassembly from the same object file
On Thu, Aug 7, 2014 at 7:09 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>> The implementation of llvm-objdump does have a MachODump.cpp for use with the -m option, where I could do a similar otool(1)-like hack and special case 32-bit ARM cpus. At least that contains the ugliness. But this does not really help the non -m case, and I suspect ELF objects may face a similar problem.
>
> My gut feeling would be to do this first and keep an eye for how to refactor it when we want to add support for ELF or for things like mips16.
>
> CCing Eric since he is now the expert on the target/subtarget relationship.

Sure. Probably the easiest way would be to take all of the disparate classes and throw them under the MCSubtargetInfo as best as can be done; then you can just create a single object for each target you want to disassemble for in the binary. arm/thumb and mips/mips16 might be a bit more difficult, as you'll need to swap on a function-by-function basis, but as long as you've got an index of subtargets to handle disassembly it should be possible.

Right now the arm/thumb interface is a bit wonky there, but I think mips/mips16 should work. At least it does for code generation; I haven't looked heavily at the disassembler in a while.

If you're interested in going down this path, though, I'll give it some thought and see what I can do.

-eric
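One way to picture the "index of subtargets" with function-by-function swapping is a sorted table of function start addresses, each tagged with the subtarget that should decode it. The sketch below is Python and entirely hypothetical: SubtargetIndex, add_function() and lookup() are illustrative names, not LLVM classes or methods, and plain strings stand in for real subtarget objects.

```python
# Sketch only: a per-function subtarget index, not an LLVM API.
import bisect


class SubtargetIndex:
    def __init__(self):
        self._starts = []  # function start addresses, kept sorted
        self._names = []   # subtarget name for each start address

    def add_function(self, start, subtarget):
        # Insert while keeping both lists ordered by start address.
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._names.insert(i, subtarget)

    def lookup(self, address):
        # The owning function is the last one starting at or before address.
        i = bisect.bisect_right(self._starts, address) - 1
        if i < 0:
            return None  # address precedes every known function
        return self._names[i]
```

A disassembly loop would call lookup() for each address and hand the bytes to whichever disassembler was created for the returned subtarget, for example "arm" versus "thumb" or "mips" versus "mips16".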