I would like to improve llvm-objdump.  However, many unit tests depend
precisely on the current output, making the picture a little tricky.
My experience is limited to ELF format objects, so experts in other
formats please sanity check.

Suggested changes:
1) Symbolize conditional branch targets.  Currently, llvm-objdump
prints branch targets numerically regardless of -symbolize.

2) Make -symbolize the default behavior for human friendliness.

3) Add new -bare option to suppress symbolizing.  Many unit tests will
use -bare to preserve expected output in today's format.

4) When multiple symbols exist for a given address, print all of them.
Today, llvm-objdump only prints the last symbol found, but symbolizes
references with the first symbol found.  So, it's a bit of a mess.

5) When symbolizing code references, prefer matching symbols with type
FUNC, but fall back to matches with type NOTYPE.  This matches GNU
objdump behavior, and many hand-written assembly files don't specify
.type directives anyway.

How does this sound?

Regards,
-steve
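Items 4 and 5 above can be sketched in a few lines of Python.  This is
only an illustration of the proposed behavior, not llvm-objdump code: the
symbol records, the names (start, loop_top, loop_alias, handler,
handler_raw) and the helper functions are all made up for the example,
and the type strings merely stand in for ELF's FUNC and NOTYPE symbol
types.

```python
from collections import defaultdict

# Hypothetical symbol table: (name, address, type), in symbol-table order.
SYMBOLS = [
    ("start", 0x0, "NOTYPE"),
    ("loop_top", 0x4, "NOTYPE"),
    ("loop_alias", 0x4, "NOTYPE"),
    ("handler_raw", 0x10, "NOTYPE"),
    ("handler", 0x10, "FUNC"),
]


def index_by_address(symbols):
    """Group symbols by address, preserving symbol-table order."""
    by_addr = defaultdict(list)
    for name, addr, sym_type in symbols:
        by_addr[addr].append((name, sym_type))
    return by_addr


def labels_at(by_addr, addr):
    """Item 4: every symbol name defined at addr, not just one of them."""
    return [name for name, _ in by_addr.get(addr, [])]


def symbolize(by_addr, addr):
    """Item 5: prefer a FUNC symbol, fall back to NOTYPE, else None."""
    candidates = by_addr.get(addr, [])
    for wanted in ("FUNC", "NOTYPE"):
        for name, sym_type in candidates:
            if sym_type == wanted:
                return name
    return None


if __name__ == "__main__":
    table = index_by_address(SYMBOLS)
    print(labels_at(table, 0x4))    # all labels at 0x4
    print(symbolize(table, 0x10))   # the FUNC symbol wins over NOTYPE
```

With this shape, the label printed above an instruction and the name used
when symbolizing a reference to it come from the same per-address list, so
the two can no longer disagree about which symbol is "first" or "last".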
Hi Steve,

I too have been working on improving llvm-objdump, for Mach-O files, where
I guess I would be called an expert.  My long-term goal is to match
llvm-objdump’s functionality with that of darwin’s otool(1) and improve
beyond that.

For branch targets my preference is to print the target’s address (not the
displacement of the branch), and preferably in hex, with a way to toggle
between non-symbolic and symbolic, as non-symbolic is needed for debugging.
Symbolic should go all out and use the symbol table, relocation entries,
past instructions, indirect tables, literal tables, Objective-C meta data,
C++ demanglers, and even debug info etc. to print the best operand and
comment along with the instruction.  For symbolic we go to all these
lengths (short of debug info) in darwin’s otool(1) using llvm’s
disassembler hooks.

I do think it makes sense for symbolic to be the default, with non-symbolic
available as an option.  I would love to extend the non-symbolic option to
things like printing the private headers, relocation entries, etc. as raw
values.  Again, this is very useful for debugging and for dealing with
broken object files, when you need to see the values and what could be
going on.  The name -bare as an option seems fine for this to me.

I don’t think having multiple addresses for a target is a real problem with
the exception of the address 0 (which is often an unrelocated no addend
value).  So the trick is to not print the symbol name in the object with
the address of zero in those cases.  Generally in Mach-O we don’t see
multiple symbols at the same address.

In Mach-O we don’t have typed symbols in the symbol table without looking
at debugging info.  But what you say about using type FUNC symbols for ELF
seems to make sense to me.

My thoughts,
Kev
On Tue, Aug 26, 2014 at 9:52 AM, Steve King <steve at metrokings.com> wrote:

> I would like to improve llvm-objdump.  However, many unit tests depend
> precisely on the current output, making the picture a little tricky.
> My experience is limited to ELF format objects, so experts in other
> formats please sanity check.
>
> Suggested changes:
> 1) Symbolize conditional branch targets.  Currently, llvm-objdump
> prints branch targets numerically regardless of -symbolize.
>
> 2) Make -symbolize the default behavior for human friendliness.

Last I checked (which admittedly was about a year ago), -symbolize had
significant performance problems on large object files.  If those are still
present, I think you should focus on fixing them before changing the
default.

> 3) Add new -bare option to suppress symbolizing.  Many unit tests will
> use -bare to preserve expected output in today's format.
>
> 4) When multiple symbols exist for a given address, print all of them.
> Today, llvm-objdump only prints the last symbol found, but symbolizes
> references with the first symbol found.  So, it's a bit of a mess.
>
> 5) When symbolizing code references, prefer matching symbols with type
> FUNC, but fall back to matches with type NOTYPE.  This matches GNU
> objdump behavior and many hand written assembly files don't specify
> .type directives anyway.
>
> How does this sound?

You seem to be focusing a lot on the user-visible behavior.  However, I
would say that a lot of the work that needs to be done is actually internal
to the code; that will make adding new functionality easier.  Here are some
suggested changes:

1. Clean up the code, improving the usability of LLVM's C++ APIs as
necessary.  This will in fact benefit all LLVM users of this functionality.
The main thing is to clarify the core logic and reduce boilerplate.

2. Rip out the YAMLCFG stuff (including the corresponding library code)
since it seems totally borked.

btw, for tools that we consider as internal testing tools, there has
historically been pushback for adding user-visible features if they do not
serve an immediate need within the LLVM codebase (CC'ing Rafael: does
llvm-objdump fall under this category?).

-- Sean Silva
Hi Kev,
I'm glad to hear llvm-objdump is getting attention.  I'm unclear on
how much output specialization one could (or should) do for ELF vs.
Mach-O.  If you're game, let's compare an example:
$ cat labeltest.s
.text
foo:
    nop
bar:
bum:
    nop
    jmp   bar
    jmp   bum
    jmp   baz
    nop
baz:
    nop
Assembling for x86 and llvm-objdump'ing, I get
$ llvm-mc -arch=x86 -filetype=obj labeltest.s -o x86_labeltest.o
$ llvm-objdump -d  x86_labeltest.o
x86_labeltest.o: file format ELF32-i386
Disassembly of section .text:
foo:
       0: 90                                           nop
bum:
       1: 90                                           nop
       2: eb fd                                         jmp -3
       4: eb fb                                         jmp -5
       6: eb 01                                         jmp 1
       8: 90                                           nop
baz:
       9: 90                                           nop
I get the dump above with or without -symbolize.
My personal golden reference, GNU objdump, does this:
$ objdump -dw x86_labeltest.o
x86_labeltest.o:     file format elf32-i386
Disassembly of section .text:
00000000 <foo>:
   0: 90                   nop
00000001 <bar>:
   1: 90                   nop
   2: eb fd                 jmp    1 <bar>
   4: eb fb                 jmp    1 <bar>
   6: eb 01                 jmp    9 <baz>
   8: 90                   nop
00000009 <baz>:
   9: 90                   nop
What does otool produce?
On Tue, Aug 26, 2014 at 11:16 AM, Kevin Enderby <enderby at apple.com> wrote:
> For branch targets my preference is to print the target’s address (not the
> displacement of the branch), and preferably in hex.
I like this too.
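For reference, the two forms are related by simple arithmetic: for an x86
relative branch, target = instruction address + instruction length +
displacement.  Below is a small Python sketch of that conversion, fed with
the three jmp instructions from the dump above.  The function names and
the address-to-name dictionary are hypothetical, chosen for the example;
nothing here is taken from either tool's source.

```python
def branch_target(insn_addr, insn_len, displacement, bits=32):
    """Target of a PC-relative branch.

    The displacement is relative to the address of the *next* instruction,
    so the instruction's own length has to be added in.  The mask wraps
    the result to the address width.
    """
    mask = (1 << bits) - 1
    return (insn_addr + insn_len + displacement) & mask


def format_branch(mnemonic, insn_addr, insn_len, displacement, symbols=None):
    """Render a branch with a hex target and, when known, a symbol name."""
    target = branch_target(insn_addr, insn_len, displacement)
    text = f"{mnemonic} {target:#x}"
    name = (symbols or {}).get(target)
    if name is not None:
        text += f" <{name}>"
    return text


if __name__ == "__main__":
    # (address, length, displacement) for the three jmps in the dump above.
    jumps = [(2, 2, -3), (4, 2, -5), (6, 2, 1)]
    names = {1: "bar", 9: "baz"}
    for addr, length, disp in jumps:
        print(format_branch("jmp", addr, length, disp, names))
```

Passing no dictionary gives the non-symbolic form (hex target only), which
is roughly what a -bare style option would want.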
> I don’t think having multiple addresses for a target is a real problem with
> the exception of the address 0 (which is often an unrelocated no addend value).
> So the trick is to not print the symbol name in the object with the address of
> zero in those cases
Right, relocations are a special case.
On Tue, Aug 26, 2014 at 12:18 PM, Sean Silva <chisophugis at gmail.com> wrote:
> Last I checked (which admittedly was about a year ago), -symbolize had
> significant performance problems on large object files.  If those are still
> present, I think you should focus on fixing them before changing the
> default.

I haven't tested performance, but you're probably right.  The symbolizing
code repeats a linear search for each symbol.  There is a FIXME comment in
the loop suggesting a map be used instead.
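To make the cost difference concrete, here is a rough Python sketch of the
two approaches: a linear scan per lookup versus an index built once and
then searched by bisection (a sorted list standing in for an ordered map).
It is an assumption-laden illustration, not the llvm-objdump code; the
(address, name) tuples and all function names are invented for the
example.

```python
import bisect


def lookup_linear(symbols, addr):
    """Scan every symbol on every lookup: O(n) per reference."""
    return [name for sym_addr, name in symbols if sym_addr == addr]


def build_index(symbols):
    """Sort (address, name) pairs once so later lookups can bisect."""
    ordered = sorted(symbols)
    addrs = [sym_addr for sym_addr, _ in ordered]
    return addrs, ordered


def lookup_indexed(index, addr):
    """Binary search for addr: O(log n) per reference after the sort."""
    addrs, ordered = index
    i = bisect.bisect_left(addrs, addr)
    names = []
    while i < len(ordered) and ordered[i][0] == addr:
        names.append(ordered[i][1])
        i += 1
    return names


if __name__ == "__main__":
    symbols = [(9, "baz"), (0, "foo"), (1, "bum"), (1, "bar")]
    index = build_index(symbols)
    print(lookup_indexed(index, 1))
    print(lookup_indexed(index, 5))
```

For n symbols and m references, the scan costs on the order of n * m
comparisons, while the indexed version pays one sort and then roughly
log n per reference, which is where large object files would benefit.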
> btw, for tools that we consider as internal testing tools, there has
> historically been pushback for adding user-visible features if they do not
> serve an immediate need within the LLVM codebase (CC'ing Rafael: does
> llvm-objdump fall under this category?).

I don't think so.  If I remember correctly, the reason for having
llvm-readobj and llvm-objdump is that readobj can be whatever we want for
testing, and llvm-objdump can match as closely as practical the system
(gnu?) objdump.

Cheers,
Rafael
> 2. Rip out the YAMLCFG stuff (including the corresponding library code)
> since it seems totally borked.

Is the attached patch OK? :-)

Cheers,
Rafael

Attachment: t.patch (text/x-patch, 25260 bytes)
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140827/4fc834ce/attachment.bin>