Hi Evan,

Thanks for the information! I've tried to use llvm-objdump to disassemble some ARM binaries, such as busybox in Android:

./llvm-objdump -arch=arm -d busybox

There are many instructions it cannot decode:

./llvm-objdump: warning: invalid instruction encoding

Did I use llvm-objdump in the correct way? I think one possible reason is that llvm-objdump encounters PC-relative data. I'll figure out whether this is the reason.

Thanks,
David

On Wed, Jun 6, 2012 at 1:36 PM, Evan Cheng <evan.cheng at apple.com> wrote:
>
> On Jun 5, 2012, at 7:44 PM, Fan Dawei <fandawei.s at gmail.com> wrote:
>
> > Hi,
> >
> > I'm considering using the MC disassembler for the ARM target in a binary translation project. However, after trying some ARM binaries, I find that there are a lot of instructions that the disassembler fails to decode.
> >
> > Could anyone give me some information about the maturity of the ARM disassembler?
>
> It's production quality. We're not aware of any instructions that it fails to decode. Please provide examples / file bugs.
>
> Evan
>
> > Thanks!
> > David
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120606/3985fec1/attachment.html>
Hi David,

> I've tried to use llvm-objdump to disassemble some ARM binaries, such as busybox
> in Android.
>
> ./llvm-objdump -arch=arm -d busybox

It's probably assuming the wrong architecture revision. I don't have an Android busybox handy, but I see similar errors on binaries compiled for ARMv7. The trick is to use:

llvm-objdump -triple=armv7 -d whatever

(ARMv7 covers virtually anything Android will be running on these days.)

There are a couple of other things to be wary of at the moment, though:

1. PC-relative data, as you said: ARM code often includes literal data inline with the code, and this data could well *not* have a valid disassembly. In relocatable object files these regions should be marked[*], but I believe LLVM currently has problems with that. In executable files (like "busybox") the regions won't necessarily even be marked.

2. ARM object files may contain mixed ARM and Thumb code: two different instruction sets. Obviously, disassembling ARM as Thumb or the reverse won't give you anything sensible. Again, relocatable files mark these regions[*] but executables don't. If you know that what you want is Thumb code, you can use the triple "thumbv7" instead for llvm-objdump.

So a combination of those probably explains why you're getting problems. Choosing the right triple may improve matters, but it probably won't make things perfect (and arguably can't in the case of the ARM/Thumb distinction without reconstructing all possible control-flow graphs).

Tim.

[*] The marking is via symbols $a, $t and $d, which reference the beginning of each stretch of ARM code, Thumb code and data respectively.
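For illustration, here is a minimal Python sketch of how a disassembler could use those mapping symbols: each $a, $t or $d symbol starts a stretch that runs until the next mapping symbol or the end of the section. The function names and the (address, name) input format are hypothetical, and the sketch assumes the ARM ELF convention that a mapping symbol may carry a ".suffix" (e.g. "$d.1"). It is not how llvm-objdump itself is implemented.

```python
from typing import List, Optional, Tuple

# Base names of ARM ELF mapping symbols; a ".suffix" (e.g. "$d.1") is allowed.
KINDS = {"$a": "arm", "$t": "thumb", "$d": "data"}


def mapping_kind(name: str) -> Optional[str]:
    """Return 'arm', 'thumb' or 'data' for a mapping symbol name, else None."""
    return KINDS.get(name.split(".", 1)[0])


def classify_regions(
    symbols: List[Tuple[int, str]], section_start: int, section_end: int
) -> List[Tuple[int, int, str]]:
    """Split [section_start, section_end) into (start, end, kind) regions.

    Each mapping symbol starts a stretch that lasts until the next mapping
    symbol or the end of the section; bytes before the first mapping symbol
    are reported as 'unknown'. Non-mapping symbols (e.g. 'main') are ignored.
    """
    marks = []
    for addr, name in symbols:
        kind = mapping_kind(name)
        if kind is not None and section_start <= addr < section_end:
            marks.append((addr, kind))
    marks.sort()

    regions = []
    cursor, current = section_start, "unknown"
    for addr, kind in marks:
        if addr > cursor:  # close the stretch that started at `cursor`
            regions.append((cursor, addr, current))
        cursor, current = addr, kind
    if section_end > cursor:
        regions.append((cursor, section_end, current))
    return regions


if __name__ == "__main__":
    syms = [(0x8000, "$a"), (0x8010, "$d"), (0x8018, "$t"), (0x8020, "main")]
    # Prints 0x8000-0x8010: arm, 0x8010-0x8018: data, 0x8018-0x8030: thumb
    for start, end, kind in classify_regions(syms, 0x8000, 0x8030):
        print(f"{start:#x}-{end:#x}: {kind}")
```

Each "arm" or "thumb" region would then be handed to the disassembler with the matching triple (armv7 or thumbv7), and "data" regions skipped. If the file carries no mapping symbols, as is often the case for executables, the whole section comes back as "unknown" and the ARM/Thumb guess has to be made some other way.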
> ./llvm-objdump -arch=arm -d busybox

It's possible that this defaults to ARMv4.

--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University
Hi Tim,

Thanks a lot for the reply. I tested libc.so, which is a shared library, and llvm-objdump also reports some disassembly errors there.

Could you please tell me more about the $a, $t and $d symbols? How are these symbols used to define the different regions? Where can I find these symbols in an ELF object file?

Thanks,
David

I'm now trying to find a decoder of ARM instructions in order

On Thu, Jun 7, 2012 at 3:57 AM, Tim Northover <t.p.northover at gmail.com> wrote:
> [*] The marking is via symbols $a, $t and $d, which reference the
> beginning of each stretch of ARM code, Thumb code and data.
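As a rough answer to "where are they": in ARM ELF files the mapping symbols are ordinary (local) entries in the static symbol table, the SHT_SYMTAB section usually called .symtab, and their names live in the string table that the symbol table's sh_link field points to. The Python sketch below walks that structure using only the standard struct module. It assumes a 32-bit little-endian ELF file (the usual case for ARM Android binaries), and the function name mapping_symbols is hypothetical. A stripped file may have no .symtab at all, in which case it returns an empty list.

```python
import struct

SHT_SYMTAB = 2  # sh_type of the static symbol table (.symtab)
MAPPING_BASES = ("$a", "$t", "$d")


def read_cstr(blob: bytes, offset: int) -> str:
    """Read a NUL-terminated string starting at `offset`."""
    end = blob.index(b"\0", offset)
    return blob[offset:end].decode("ascii", "replace")


def mapping_symbols(elf: bytes):
    """Return (st_value, name, st_shndx) for each $a/$t/$d mapping symbol.

    Assumes a 32-bit little-endian ELF file; ignores extended section
    numbering (files with 0xff00 or more sections).
    """
    if elf[:4] != b"\x7fELF" or elf[4] != 1 or elf[5] != 1:
        raise ValueError("expected a 32-bit little-endian ELF file")
    # Elf32_Ehdr: ident, type, machine, version, entry, phoff, shoff, flags,
    # ehsize, phentsize, phnum, shentsize, shnum, shstrndx
    hdr = struct.unpack_from("<16sHHIIIIIHHHHHH", elf, 0)
    e_shoff, e_shentsize, e_shnum = hdr[6], hdr[11], hdr[12]

    # Elf32_Shdr: name, type, flags, addr, offset, size, link, info, addralign, entsize
    sections = [
        struct.unpack_from("<10I", elf, e_shoff + i * e_shentsize)
        for i in range(e_shnum)
    ]
    found = []
    for sh in sections:
        sh_type, sh_offset, sh_size, sh_link, sh_entsize = sh[1], sh[4], sh[5], sh[6], sh[9]
        if sh_type != SHT_SYMTAB:
            continue
        strtab_offset = sections[sh_link][4]  # sh_link -> associated string table
        # Elf32_Sym: name, value, size, info, other, shndx (16 bytes each)
        for off in range(sh_offset, sh_offset + sh_size, sh_entsize or 16):
            st_name, st_value, _, _, _, st_shndx = struct.unpack_from("<IIIBBH", elf, off)
            name = read_cstr(elf, strtab_offset + st_name)
            if name.split(".", 1)[0] in MAPPING_BASES:
                found.append((st_value, name, st_shndx))
    return found
```

Calling it as mapping_symbols(open("libc.so", "rb").read()) would list each marker's address, name and section index; the (address, name) pairs can then be used to split each code section into ARM, Thumb and data stretches.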