Hi Tim,

Thanks a lot for your help! I'm very grateful.

libc.so is a prelinked library; I'll build a non-prelinked one and try again.

I'm now at the start of a binary translation project. I want to convert ARM binary code [*] to LLVM IR, which is then translated to binary for our MIPS-like architecture. That's why I'm looking for a decoder for ARM binaries.

The ARMMCDisassembler is production quality, as Evan told me. That's why I'm so interested in it. However, I realized today that it might not be a good choice. Although the disassembled MCInsts have a clean and simple interface, the opcodes in them are auto-generated from the instruction description (.td) files. There are a large number of them, and they do not have a one-to-one correspondence to ARM instructions. I don't think it is a good idea for our translator to rely on the implementation of the LLVM ARM back-end, so we will have to find another decoder or implement one ourselves.

Thanks,
David

[*] In most cases, the targets are the shared libraries in Android APKs developed with the NDK, like libangraybird.so. I think most of them are prelinked, which is bad for us: because there are no $a, $t and $d symbols, we cannot statically figure out which regions are ARM code and which are Thumb code.

On Thu, Jun 7, 2012 at 8:11 PM, Tim Northover <t.p.northover at gmail.com> wrote:
> Hi David,
>
> On Thu, Jun 7, 2012 at 10:17 AM, Fan Dawei <fandawei.s at gmail.com> wrote:
> > Could you please tell me more about the $a, $t and $d symbols? How are
> > these symbols used to define different regions? Where can I find these
> > symbols in an ELF object file?
>
> At the start of each range of ARM code, an assembler or compiler
> should produce a "$a" symbol with that address, and put it (naturally
> enough) in the ELF symbol-table. Similarly each stretch of Thumb code
> gets a "$t" and each stretch of data a "$d".
>
> For example if I assemble:
>
>         .arm
>         mov r0, r3
>         ldr r2, Lit
>     Lit:
>         .word 42
>         add r0, r0, r0
>         .thumb
>         mov r5, r2
>
> then the symbol table contains these entries:
>
>      4: 00000000     0 NOTYPE  LOCAL  DEFAULT    1 $a
>      [...]
>      6: 00000008     0 NOTYPE  LOCAL  DEFAULT    1 $d
>      7: 0000000c     0 NOTYPE  LOCAL  DEFAULT    1 $a
>      8: 00000010     0 NOTYPE  LOCAL  DEFAULT    1 $t
>
> which shows that an ARM region begins at offset 0x0, a data one at
> offset 0x8, we switch back to ARM at 0xc and finally Thumb takes over
> at 0x10.
>
> GNU objdump hides the symbols by default when printing the
> symbol-table (you can give it the --special-syms option to show them),
> but readelf shows them always.
>
> If you want the really deep details, they're fully documented in the
> ARM ELF ABI here (section 4.6.5):
>
> infocenter.arm.com/help/topic/com.arm.doc.ihi0044d/IHI0044D_aaelf.pdf
>
> Which is all nice to know, but I'm afraid it probably doesn't offer an
> immediate solution to the undefined instructions:
> + libc.so isn't a relocatable object file (well, it is dynamically,
>   but that doesn't count).
> + llvm-objdump ignores them anyway at the moment, as far as I can tell.
>
> Tim.
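The region lookup Tim describes can be sketched in a few lines of Python. This is a minimal illustration, not something from the thread: it assumes the input is readelf-style symbol-table text like the lines quoted above, that all mapping symbols belong to a single section (a real tool would group them by section index), and that a name may carry an optional "." suffix (for example "$d.foo"), which the sketch strips. The helpers `mapping_symbols` and `classify` are hypothetical names made up for this example.

```python
import bisect
import re

# One readelf symbol-table line, e.g.
#   "4: 00000000     0 NOTYPE  LOCAL  DEFAULT    1 $a"
# Groups: (1) symbol value in hex, (2) symbol name.
SYMBOL_LINE = re.compile(
    r"^\s*\d+:\s+([0-9a-fA-F]+)\s+\S+\s+\w+\s+\w+\s+\w+\s+\S+\s+(\S+)\s*$"
)

# Mapping-symbol base names and the kind of region each one starts.
KINDS = {"$a": "arm", "$t": "thumb", "$d": "data"}


def mapping_symbols(lines):
    """Return (address, kind) pairs for $a/$t/$d symbols, sorted by address."""
    result = []
    for line in lines:
        m = SYMBOL_LINE.match(line)
        if m is None:
            continue  # headers, "[...]" elisions, unnamed symbols
        addr = int(m.group(1), 16)
        base = m.group(2).split(".", 1)[0]  # "$d.foo" -> "$d"
        if base in KINDS:
            result.append((addr, KINDS[base]))
    result.sort()
    return result


def classify(symbols, addr):
    """Kind of region containing addr, or None if addr precedes every symbol."""
    starts = [a for a, _ in symbols]
    # The region is started by the last mapping symbol at or below addr.
    i = bisect.bisect_right(starts, addr) - 1
    return symbols[i][1] if i >= 0 else None


if __name__ == "__main__":
    table = [
        "     4: 00000000     0 NOTYPE  LOCAL  DEFAULT    1 $a",
        "     [...]",
        "     6: 00000008     0 NOTYPE  LOCAL  DEFAULT    1 $d",
        "     7: 0000000c     0 NOTYPE  LOCAL  DEFAULT    1 $a",
        "     8: 00000010     0 NOTYPE  LOCAL  DEFAULT    1 $t",
    ]
    syms = mapping_symbols(table)
    for addr in (0x0, 0x4, 0x8, 0xC, 0x10):
        print(hex(addr), classify(syms, addr))
```

Each address is attributed to the nearest mapping symbol at or below it, which is why 0x4 (the `ldr`) still reads as ARM. As David points out, this only helps when the $a/$t/$d symbols are actually present; it does nothing for prelinked libraries that lack them.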
On Jun 7, 2012, at 7:53 AM, Fan Dawei <fandawei.s at gmail.com> wrote:
> [...]
> The ARMMCDisassembler is production quality, as Evan told me. That's why
> I'm so interested in it. However, I realized today that it might not be a
> good choice. Although the disassembled MCInsts have a clean and simple
> interface, the opcodes in them are auto-generated from the instruction
> description (.td) files. There are a large number of them, and they do not
> have a one-to-one correspondence to ARM instructions. I don't think it is a
> good idea for our translator to rely on the implementation of the LLVM ARM
> back-end, so we will have to find another decoder or implement one ourselves.

Every MCInst created by the MCDisassembler will have a one-to-one mapping to an actual ARM instruction.
Hi Jim,

Thanks for the reply. I'm sorry I didn't make myself clear enough. The MCInsts created by the MCDisassembler depend on the instructions defined in the .td files, and these instructions do not have a one-to-one mapping to ARM instructions: a single actual ARM instruction often corresponds to several instructions defined in the .td files.

Thanks,
David

On Thu, Jun 7, 2012 at 1:27 PM, Jim Grosbach <grosbach at apple.com> wrote:
> On Jun 7, 2012, at 7:53 AM, Fan Dawei <fandawei.s at gmail.com> wrote:
> > [...] Although the disassembled MCInsts have a clean and simple
> > interface, the opcodes in them are auto-generated from the instruction
> > description (.td) files. There are a large number of them, and they do
> > not have a one-to-one correspondence to ARM instructions. [...]
>
> Every MCInst created by the MCDisassembler will have a one-to-one mapping
> to an actual ARM instruction.
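David's objection is that the back-end's opcodes are finer-grained than the ARM mnemonics a translator reasons about. One way a translator could absorb that, sketched here in Python purely as an illustration, is a many-to-one table from opcode names to the mnemonics it actually handles, failing loudly on anything unmapped. The opcode names below imitate the LLVM ARM back-end's naming style but are illustrative assumptions, not an audited list; the sketch also assumes a name string is available for each decoded opcode, and `mnemonic_for` and `group_by_mnemonic` are hypothetical helpers.

```python
from collections import defaultdict

# Hypothetical, hand-written table: back-end opcode name -> mnemonic the
# translator handles. Names imitate LLVM ARM back-end naming but are only
# illustrative; a real table would have to be generated or audited.
OPCODE_TO_MNEMONIC = {
    "ADDri": "ADD",   # immediate operand form
    "ADDrr": "ADD",   # register operand form
    "ADDrsi": "ADD",  # shifted-register forms
    "ADDrsr": "ADD",
    "SUBri": "SUB",
    "SUBrr": "SUB",
    "MOVi": "MOV",
    "MOVr": "MOV",
}


def mnemonic_for(opcode_name):
    """Map a back-end opcode name to the mnemonic the translator handles."""
    try:
        return OPCODE_TO_MNEMONIC[opcode_name]
    except KeyError:
        # Refuse to guess: an unmapped opcode means the table is incomplete.
        raise NotImplementedError(f"no translation rule for {opcode_name}") from None


def group_by_mnemonic(table):
    """Invert the table to show how many back-end opcodes share a mnemonic."""
    groups = defaultdict(list)
    for opcode, mnemonic in table.items():
        groups[mnemonic].append(opcode)
    return {m: sorted(ops) for m, ops in groups.items()}


if __name__ == "__main__":
    print(group_by_mnemonic(OPCODE_TO_MNEMONIC)["ADD"])
    # ['ADDri', 'ADDrr', 'ADDrsi', 'ADDrsr']
```

Failing on an unknown opcode keeps an incomplete table from silently mistranslating code; the trade-off is that the table has to be maintained against whichever back-end version supplies the opcodes, which is exactly the coupling David wants to avoid.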