thr3ads.net - llvm dev - [LLVMdev] Looking for ideas on how to make llvm-objdump handle both arm and thumb disassembly from the same object file [Aug 2014]


Kevin Enderby

2014-Aug-06 18:31 UTC

[LLVMdev] Looking for ideas on how to make llvm-objdump handle both arm and thumb disassembly from the same object file

Hello Tim, Rafael, Renato and llvmdev,

I’m working to get llvm-objdump to handle both arm and thumb disassembly from the
same object file, similarly to how darwin’s otool(1) works, and I’m looking for
direction on the implementation.  I spoke to Jim Grosbach about some ideas and he
suggested I send out an email about some of the possibilities.  Since none of
the ones I could think of are pretty, he thought maybe you would have some
thoughts or suggestions.

First a little background: the way darwin’s otool(1) does this is that it
creates an llvm disassembler for both arm and thumb when disassembling a binary
with a 32-bit ARM cpu.  It uses the C API in <llvm-c/Disassembler.h> and
calls LLVMCreateDisasmCPU() twice, once with an arm TripleName and once with a
matching thumb TripleName.  Then for each 32-bit ARM cpu it will default to one
or the other disassembler.  Then as it disassembles and finds a symbol in the
symbol table for the current PC being disassembled, it will see if the symbol has
the N_ARM_THUMB_DEF bit set or not, and then switch between the arm and thumb
disassemblers accordingly.  While this is a bit of a hack, there is a limited
set of Mach-O cpus otool(1) deals with.
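
A minimal sketch of that switching logic, kept independent of LLVM so it stands alone. The `Symbol` struct, the `Mode` enum and `modeAt()` are hypothetical names used only for illustration; the one real constant is the value of N_ARM_THUMB_DEF from <mach-o/nlist.h>. This is not otool(1)'s actual code, just the shape of the decision it makes at each PC.

```cpp
#include <cstdint>
#include <vector>

// Value of N_ARM_THUMB_DEF in <mach-o/nlist.h>; when set in n_desc the
// symbol marks a Thumb definition.
constexpr std::uint16_t kArmThumbDef = 0x0008;

enum class Mode { ARM, Thumb };

// Hypothetical, simplified view of a symbol table entry.
struct Symbol {
  std::uint64_t address;
  std::uint16_t n_desc;
};

// Returns the mode to use at `pc`. If a symbol sits exactly at `pc`, its
// Thumb bit decides; otherwise the current mode carries on unchanged.
Mode modeAt(std::uint64_t pc, const std::vector<Symbol> &symbols,
            Mode current) {
  for (const Symbol &s : symbols)
    if (s.address == pc)
      return (s.n_desc & kArmThumbDef) ? Mode::Thumb : Mode::ARM;
  return current;
}
```

A caller would keep one disassembler per mode and, before decoding each instruction, pick the one matching `modeAt(pc, symbols, current)`.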

For llvm-objdump, it eventually just calls TheTarget->createMCDisassembler()
and gets one disassembler for TheTarget it created.

I talked to Jim a bit about sinking the logic of maintaining multiple
disassemblers down into the core disassembler logic and using the subtarget to
select between them, like the ARMAsmParser and, I think, the ARMInstPrinter
do.  But that seems very complicated for a single target that has two
disassemblers.

The implementation of llvm-objdump does have a MachODump.cpp for use with the -m
option, where I could do a similar otool(1)-like hack and special case
32-bit ARM cpus.  At least that contains the ugliness.  But this does not
really help the non -m case, and I suspect ELF objects may face a similar
problem.

The other, more radical change I was thinking of was maybe changing MachODump.cpp
to use the C API.  Then at least we would have something in the tree
that used it and could actually have test cases.  That could then use the
callbacks for symbolic operands etc.  But that could still be done with the C++
API using TheTarget->createMCSymbolizer() anyway.

So if any of you have suggestions for a direction for this let me know,
Kev

Renato Golin

2014-Aug-06 19:50 UTC


On 6 August 2014 19:31, Kevin Enderby <enderby at apple.com> wrote:
> First a little back ground, the way darwin’s otool(1) does this is that it
> creates an llvm disassembler for both arm and thumb when disassembling a binary
> with 32-bit ARM cpu.  It uses the C API in <llvm-c/Disassembler.h> and
> calls LLVMCreateDisasmCPU() twice, once with an arm TripleName and once with a
> matching thumb TripleName.  Then for each 32-bit ARM cpu it will default to one
> or the other disassembler.  Then as it disassembles and finds a symbol in the
> symbol table for the current PC being disassembled it will see of the symbol has
> the N_ARM_THUMB_DEF bit set or not.  And then switch disassemblers between the
> arm and thumb disassemblers.  While this is a bit of a hack there are a limited
> set of Mach-O cpus otool(1) deals with.

Hi Kevin,

I guess it depends on how many other targets need to deal with the
same problem, and how much their maintainers want to cope with the
change on their side.

Creating multiple disassemblers is wasteful, but not critical for tools
like objdump, which are rarely on the hot path. It would be
simpler/quicker to instantiate both in objdump and then have it choose
one or the other based on the Thumb bit. However, I think this would
not be the best solution, for a couple of reasons:

1. This is disassembler logic, and having objdump do this at a
higher level means that other tools that (eventually) need the same
functionality will have to re-implement it.
2. It shouldn't be that hard to join the ARM and Thumb disassemblers,
given that they're in the same file and share most of the static
functions. getInstruction() could easily delegate, deciding
which to use: getThumbInstruction() or getARMInstruction(), with the
Thumb/ARMDisassembler functions renamed to match.
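
To make point 2 concrete, here is a rough sketch of the delegation. The `ArmThumbDisassembler` class and its `isThumb` parameter are hypothetical, and it models only the instruction-size decision rather than real decoding. It assumes little-endian instruction bytes, and relies on the Thumb encoding rule that a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical merged disassembler. It models only the delegation and the
// instruction-size decision, not real decoding.
class ArmThumbDisassembler {
public:
  // Returns the size in bytes of the instruction at `offset`, or 0 if the
  // buffer is too short to hold it.
  std::size_t getInstruction(const std::vector<std::uint8_t> &bytes,
                             std::size_t offset, bool isThumb) const {
    return isThumb ? getThumbInstruction(bytes, offset)
                   : getARMInstruction(bytes, offset);
  }

private:
  // ARM instructions are always 4 bytes.
  std::size_t getARMInstruction(const std::vector<std::uint8_t> &bytes,
                                std::size_t offset) const {
    return offset + 4 <= bytes.size() ? 4 : 0;
  }

  // Thumb instructions are 2 or 4 bytes, decided by the first halfword.
  std::size_t getThumbInstruction(const std::vector<std::uint8_t> &bytes,
                                  std::size_t offset) const {
    if (offset + 2 > bytes.size())
      return 0;
    // Little-endian halfword: its top five bits live in the second byte.
    unsigned top5 = bytes[offset + 1] >> 3;
    if (top5 < 0x1D)
      return 2;
    return offset + 4 <= bytes.size() ? 4 : 0;
  }
};
```

Where the `isThumb` flag comes from (symbol table, subtarget, or something else) is exactly the open question in this thread; the sketch leaves it to the caller.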

Though, I haven't got my hands on the disassembler that much, so other
people with more experience in that area could chime in and give more
reasons on either side.

My personal opinion is that it'd be more elegant and stable that way,
and any work we have to do now would pay off in the future, but
other back-end maintainers could disagree.

cheers,
--renato

Kevin Enderby

2014-Aug-06 20:18 UTC


Hi Renato,

Thanks for your reply.  A few comments in line below.

Kev

On Aug 6, 2014, at 12:50 PM, Renato Golin <renato.golin at linaro.org> wrote:
> On 6 August 2014 19:31, Kevin Enderby <enderby at apple.com> wrote:
>> First a little back ground, the way darwin’s otool(1) does this is that
>> it creates an llvm disassembler for both arm and thumb when disassembling a
>> binary with 32-bit ARM cpu.  It uses the C API in <llvm-c/Disassembler.h>
>> and calls LLVMCreateDisasmCPU() twice, once with an arm TripleName and once with
>> a matching thumb TripleName.  Then for each 32-bit ARM cpu it will default to
>> one or the other disassembler.  Then as it disassembles and finds a symbol in
>> the symbol table for the current PC being disassembled it will see of the symbol
>> has the N_ARM_THUMB_DEF bit set or not.  And then switch disassemblers between
>> the arm and thumb disassemblers.  While this is a bit of a hack there are a
>> limited set of Mach-O cpus otool(1) deals with.
> 
> Hi Kevin,
> 
> I guess it depends on how many other targets need to deal with the
> same problem, and how much their maintainers want to cope with the
> change on their side.
That’s the rub.  I think only 32-bit arm has this issue with multiple
disassemblers, and I would hate to add a bunch of stuff that all targets would
have to deal with.  I’d love to hear if any other target maintainer could even
use this.
> 
> Creating multiple disassemblers is wasteful, but not critical to tools
> like objdump, that are rarely on the hot path. It would be
> simpler/quicker to instantiate them on objdump and then, based on the
> Thumb bit, it chooses one or the other.
I agree with all that.  And this would be pretty simple to deal with inside
objdump and be done with it.
> However, I think this would
> not be the best solution for some reasons:
> 
> 1. This is disassembler logic, and having objdump doing this on a
> higher level means that other tools that (eventually) need the same
> functionality will have to re-implement.
Also agreed.  But can you or anyone else think of other tools that would
need this logic?  Hate to do a whole bunch of work adding to the lower
layers just to make objdump a bit cleaner.
> 2. It shouldn't be that hard to join the ARM and Thumb disassembler,
> given that they're on the same file, share most of the static
> functions and could easily delegate with getInstruction() deciding
> which to use: getThumbInstruction() or getARMInstruction() and
> renaming Thumb/ARMDisassembler functions to match.
Yep could do that.  But it seems like a lot of work for very little pay off.
> Though, I haven't got my hands on the disassembler that much, so other
> people with more experience in that area could chime in and give more
> reasons on either side.
Me too.
> My personal opinion is that it'd be more elegant and stable that way,
Absolutely agree about this being more elegant.
> and any work we have to do now would compensate in the future, but
> other back-end maintainers could disagree.
As a maintainer of darwin’s otool(1) for some 20 plus years, plugging in
some 9 or more different disassemblers, the use of two llvm disassemblers for
32-bit arm is no big deal at all.  Heck, it still has the old arm disassemblers
in it and many other old ones, and it is very stable and I rarely if ever have
to touch those interfaces.
> 
> cheers,
> --renato

Rafael Espíndola

2014-Aug-07 14:09 UTC


> The implementation of llvm-objdump does have a MachODump.cpp for use with
> the -m option that I could do the a similar hack otool(1) like hack and special
> case 32-bit ARM cpus.  And at least it contains the ugliness.  But this does not
> really help the non -m case and I suspect ELF objects may face a similar
> problem.
My gut feeling would be to do this first and keep an eye for how to
refactor it when we want to add support for ELF or for things like
mips16.

CCing Eric since he is now the expert on the target/subtarget relationship.
> The other more radical change I was thinking of was maybe changing
> MachODump.cpp to use the C API.  Then at least this way we would have something
> in the tree that used this and could actually have test cases.  That could then
> use the call backs to symbolic operands etc.  But that still could be done with
> the C++ API using TheTarget->createMCSymbolizer() anyway.
Using the C api seems odd.
> So if any of you have suggestions for a direction for this let me know,
> Kev
Cheers,
Rafael

Eric Christopher

2014-Aug-09 00:56 UTC


On Thu, Aug 7, 2014 at 7:09 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>> The implementation of llvm-objdump does have a MachODump.cpp for use
>> with the -m option that I could do the a similar hack otool(1) like hack and
>> special case 32-bit ARM cpus.  And at least it contains the ugliness.  But this
>> does not really help the non -m case and I suspect ELF objects may face a
>> similar problem.
>
> My gut feeling would be to do this first and keep an eye for how to
> refactor it when we want to add support for ELF or for things like
> mips16.
>
> CCing Eric since he is now the expert on the target/subtarget relationship.
>
Sure. Probably the easiest way would be to take all of the disparate
classes and throw them under the MCSubtargetInfo as best as can be
done, then you can just create a single object for each target you
want to disassemble for in the binary. arm/thumb mips/mips16 might be
a bit more difficult as you'll need to swap on a function by function
basis, but as long as you've got an index of subtargets to handle
disassembly it should be possible. Right now the arm/thumb interface
is a bit wonky there, but I think mips/mips16 should work - at least
it does for code generation, I haven't looked heavily at the
disassembler in a while.
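
One way to picture the "index of subtargets" swapped on a function by function basis is a map from each function's start address to the subtarget it was built for. This is only a sketch under that assumption: `Subtarget`, `SubtargetIndex` and `subtargetFor()` are made-up names, not anything in MCSubtargetInfo, and the caller is assumed to fill the index from the symbol table.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

enum class Subtarget { ARM, Thumb };

// Maps each function's start address to the subtarget it was built for.
using SubtargetIndex = std::map<std::uint64_t, Subtarget>;

// Finds the subtarget for `pc` via the nearest function start at or below
// it. Falls back to `fallback` when `pc` precedes every known function.
Subtarget subtargetFor(const SubtargetIndex &index, std::uint64_t pc,
                       Subtarget fallback) {
  auto it = index.upper_bound(pc); // first function starting after pc
  if (it == index.begin())
    return fallback;
  return std::prev(it)->second;
}
```

The disassembly loop would then look up `subtargetFor(index, pc, fallback)` at each function boundary and hand the result to whatever selects the decoder.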

If you're interested in going down this path though I'll give it some
thought and see what I can do.

-eric
