thr3ads.net - llvm dev - [LLVMdev] llvm-objdump [Aug 2014]

Steve King

2014-Aug-26 16:52 UTC

[LLVMdev] llvm-objdump

I would like to improve llvm-objdump.  However, many unit tests depend
precisely on the current output, making the picture a little tricky.
My experience is limited to ELF format objects, so experts in other
formats please sanity check.

Suggested changes:
1) Symbolize conditional branch targets.  Currently, llvm-objdump
prints branch targets numerically regardless of -symbolize.

2) Make -symbolize the default behavior for human friendliness.

3) Add new -bare option to suppress symbolizing.  Many unit tests will
use -bare to preserve expected output in today's format.

4) When multiple symbols exist for a given address, print all of them.
Today, llvm-objdump only prints the last symbol found, but symbolizes
references with the first symbol found.  So, it's a bit of a mess.

5) When symbolizing code references, prefer matching symbols with type
FUNC, but fall back to matches with type NOTYPE.  This matches GNU
objdump behavior and many hand written assembly files don't specify
.type directives anyway.
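
Items 4 and 5 can be illustrated with a minimal sketch. This is not llvm-objdump code: the symbol records, the names `start`, `loop`, `loop_alias` and `helper`, and every function below are hypothetical, chosen only to show "print all labels at an address" and "prefer FUNC, fall back to NOTYPE".

```python
from collections import defaultdict

# Hypothetical symbol records: (address, name, type). The type strings mirror
# the ELF symbol types FUNC and NOTYPE mentioned in item 5.
SYMBOLS = [
    (0x0, "start", "NOTYPE"),
    (0x1, "loop", "NOTYPE"),
    (0x1, "loop_alias", "NOTYPE"),
    (0x9, "helper_label", "NOTYPE"),
    (0x9, "helper", "FUNC"),
]


def index_by_address(symbols):
    """Group (name, type) pairs by address, keeping symbol-table order."""
    by_addr = defaultdict(list)
    for addr, name, sym_type in symbols:
        by_addr[addr].append((name, sym_type))
    return by_addr


def labels_at(by_addr, addr):
    """Item 4: every symbol name defined at addr, not just the last one."""
    return [name for name, _ in by_addr.get(addr, [])]


def symbolize_code_ref(by_addr, addr):
    """Item 5: prefer a FUNC symbol, fall back to NOTYPE, else None."""
    candidates = by_addr.get(addr, [])
    for wanted in ("FUNC", "NOTYPE"):
        for name, sym_type in candidates:
            if sym_type == wanted:
                return name
    return None
```

Using one index for both the label lines and the operand references would also remove the last-symbol versus first-symbol mismatch described in item 4.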

How does this sound?

Regards,
-steve

Kevin Enderby

2014-Aug-26 18:16 UTC

Hi Steve,

I too have been working on improving llvm-objdump, in my case for Mach-O files,
a format I guess I would be called an expert in.  My long-term goal is to match
llvm-objdump’s functionality with that of Darwin’s otool(1) and then improve
beyond that.

For branch targets my preference is to print the target’s address (not the
displacement of the branch), preferably in hex, with a way to toggle between
non-symbolic and symbolic output, since non-symbolic output is needed for
debugging.  Symbolic output should go all out: use the symbol table, relocation
entries, past instructions, indirect tables, literal tables, Objective-C
metadata, C++ demanglers, and even debug info to print the best operand and
comment along with the instruction.  We go to all these lengths (short of debug
info) in Darwin’s otool(1) using LLVM’s disassembler hooks.

I do think it makes sense for the default to be symbolic, with non-symbolic
output behind an option.  I would love to extend the non-symbolic option to
things like printing the private headers, relocation entries, etc. as raw
values.  Again, this is very useful for debugging and for dealing with broken
object files, when you need to see the values to work out what could be going
on.  The name -bare seems fine to me for this option.

I don’t think having multiple addresses for a target is a real problem with the
exception of the address 0 (which is often an unrelocated no addend value).  So
the trick is to not print the symbol name in the object with the address of zero
in those cases.  Generally in Mach-O we don’t see multiple symbols at the same
address.
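
The address-0 rule can be sketched as a small guard. The function name, the `has_relocation` flag and the symbol dictionary are assumptions made for illustration; the sketch reads "unrelocated no addend value" as "a zero operand that still has a relocation pending", which is an interpretation and not something the thread spells out.

```python
def label_for_operand(value, symbols, has_relocation):
    """Return a symbol name for an operand value, or None to print it raw.

    Assumption: a zero operand with a pending relocation is an unrelocated
    placeholder, so whatever symbol happens to sit at address 0 is unrelated
    to the real target and should not be printed.
    """
    if value == 0 and has_relocation:
        return None
    return symbols.get(value)
```

A fuller tool would presumably take the name from the relocation entry in that case; the sketch only shows when to suppress the misleading label.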

In Mach-O we don’t have typed symbols in the symbol table without looking at
debugging info.  But what you say about using type FUNC symbols for ELF seems to
make sense to me.

My thoughts,
Kev

Sean Silva

2014-Aug-26 19:18 UTC

On Tue, Aug 26, 2014 at 9:52 AM, Steve King <steve at metrokings.com>
wrote:
> I would like to improve llvm-objdump.  However, many unit tests depend
> precisely on the current output, making the picture a little tricky.
> My experience is limited to ELF format objects, so experts in other
> formats please sanity check.
>
> Suggested changes:
> 1) Symbolize conditional branch targets.  Currently, llvm-objdump
> prints branch targets numerically regardless of -symbolize.
>
> 2) Make -symbolize the default behavior for human friendliness.
>
Last I checked (which admittedly was about a year ago), -symbolize had
significant performance problems on large object files. If those are still
present, I think you should focus on fixing them before changing the
default.

>
> 3) Add new -bare option to suppress symbolizing.  Many unit tests will
> use -bare to preserve expected output in today's format.
>
> 4) When multiple symbols exist for a given address, print all of them.
> Today, llvm-objdump only prints the last symbol found, but symbolizes
> references with the first symbol found.  So, it's a bit of a mess.
>
> 5) When symbolizing code references, prefer matching symbols with type
> FUNC, but fall back to matches with type NOTYPE.  This matches GNU
> objdump behavior and many hand written assembly files don't specify
> .type directives anyway.
>
> How does this sound?
>
You seem to be focusing a lot on the user-visible behavior. However, I
would say that a lot of the work that needs to be done is actually internal
to the code; that will make adding new functionality easier.

Here are some suggested changes:

1. Clean up the code, improving the usability of LLVM's C++ APIs as
necessary. In fact, this will benefit all LLVM users of this functionality.
The main thing is to clarify the core logic and reduce boilerplate.
2. Rip out the YAMLCFG stuff (including the corresponding library code)
since it seems totally borked.

btw, for tools that we consider as internal testing tools, there has
historically been pushback for adding user-visible features if they do not
serve an immediate need within the LLVM codebase (CC'ing Rafael: does
llvm-objdump fall under this category?).

-- Sean Silva

Steve King

2014-Aug-26 19:43 UTC

Hi Kev,
I'm glad to hear llvm-objdump is getting attention.  I'm unclear on
how much output specialization one could (or should) do for ELF vs.
Mach-O.  If you're game, let's compare an example:

$ cat labeltest.s
.text
foo:
    nop
bar:
bum:
    nop
    jmp   bar
    jmp   bum
    jmp   baz
    nop
baz:
    nop

Assembling for x86 and running llvm-objdump, I get

$ llvm-mc -arch=x86 -filetype=obj labeltest.s -o x86_labeltest.o
$ llvm-objdump -d  x86_labeltest.o

x86_labeltest.o: file format ELF32-i386

Disassembly of section .text:
foo:
       0: 90                                           nop

bum:
       1: 90                                           nop
       2: eb fd                                         jmp -3
       4: eb fb                                         jmp -5
       6: eb 01                                         jmp 1
       8: 90                                           nop

baz:
       9: 90                                           nop

I get the dump above with or without -symbolize.

My personal golden reference, GNU objdump, does this:

$ objdump -dw x86_labeltest.o

x86_labeltest.o:     file format elf32-i386


Disassembly of section .text:

00000000 <foo>:
   0: 90                   nop

00000001 <bar>:
   1: 90                   nop
   2: eb fd                 jmp    1 <bar>
   4: eb fb                 jmp    1 <bar>
   6: eb 01                 jmp    9 <baz>
   8: 90                   nop

00000009 <baz>:
   9: 90                   nop
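
The gap between the two dumps is mostly arithmetic: for these short jumps the target is the instruction's address plus its length plus the signed displacement. The sketch below reproduces that step for the three jmp instructions in labeltest.s. The tuples are read off the llvm-objdump listing; the function names and the symbol table layout are made up for illustration, and the output spacing is simplified relative to GNU objdump.

```python
# Symbols from labeltest.s, keyed by address; bar and bum share address 1.
SYMBOLS = {0x0: ["foo"], 0x1: ["bar", "bum"], 0x9: ["baz"]}

# (address, instruction length in bytes, signed displacement) for each jmp,
# taken from the llvm-objdump listing.
JUMPS = [(0x2, 2, -3), (0x4, 2, -5), (0x6, 2, 1)]


def branch_target(address, length, displacement):
    """A relative branch is measured from the end of the instruction."""
    return address + length + displacement


def format_jump(address, length, displacement, symbols):
    """Render the operand as a hex target address plus the first symbol name."""
    target = branch_target(address, length, displacement)
    operand = format(target, "x")
    names = symbols.get(target)
    if names:
        operand += " <" + names[0] + ">"
    return "jmp " + operand


for jump in JUMPS:
    print(format_jump(*jump, SYMBOLS))
```

Picking `names[0]` mirrors the GNU listing, which labels both jumps to address 1 as `bar`.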

What does otool produce?


On Tue, Aug 26, 2014 at 11:16 AM, Kevin Enderby <enderby at apple.com>
wrote:
> For branch targets my preference is to print the target’s address (not the
> displacement of the branch), and preferably in hex.

I like this too.

> I don’t think having multiple addresses for a target is a real problem with
> the exception of the address 0 (which is often an unrelocated no addend value).
> So the trick is to not print the symbol name in the object with the address of
> zero in those cases

Right, relocations are a special case.

Steve King

2014-Aug-26 21:55 UTC

On Tue, Aug 26, 2014 at 12:18 PM, Sean Silva <chisophugis at gmail.com>
wrote:
> Last I checked (which admittedly was about a year ago), -symbolize had
> significant performance problems on large object files. If those are still
> present, I think you should focus on fixing them before changing the
> default.
>
I haven't tested performance, but you're probably right.  The
symbolizing code repeats a linear search for each symbol.  There is a
FIXME comment in the loop suggesting to use a map instead.
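
As a rough illustration of what replacing the linear search could look like, the sketch below sorts the symbols once and then answers each lookup with a binary search. It is written from the description above, not from the llvm-objdump source, and `build_index` and `nearest_symbol` are hypothetical names.

```python
import bisect


def build_index(symbols):
    """Sort (address, name) pairs once so later lookups avoid a linear scan."""
    ordered = sorted(symbols)
    addresses = [addr for addr, _ in ordered]
    return addresses, ordered


def nearest_symbol(index, address):
    """Return (name, offset) for the closest symbol at or below address."""
    addresses, ordered = index
    pos = bisect.bisect_right(addresses, address) - 1
    if pos < 0:
        return None  # address lies below every known symbol
    sym_addr, name = ordered[pos]
    return name, address - sym_addr
```

Building the index costs one sort; each lookup is then logarithmic in the number of symbols instead of linear. When several symbols share an address, this version returns whichever sorts last, so a real tool would still need a tie-breaking policy.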

Rafael Espíndola

2014-Aug-27 16:02 UTC

> btw, for tools that we consider as internal testing tools, there has
> historically been pushback for adding user-visible features if they do not
> serve an immediate need within the LLVM codebase (CC'ing Rafael: does
> llvm-objdump fall under this category?).
I don't think so. If I remember correctly the reason for having
llvm-readobj and llvm-objdump is that readobj can be whatever we want
for testing and llvm-objdump can match as closely as practical the
system (gnu?) objdump.

Cheers,
Rafael

Rafael Espíndola

2014-Aug-27 16:14 UTC

> 2. Rip out the YAMLCFG stuff (including the corresponding library code)
> since it seems totally borked.
Is the attached patch OK? :-)



Cheers,
Rafael
-------------- next part --------------
A non-text attachment was scrubbed...
Name: t.patch
Type: text/x-patch
Size: 25260 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140827/4fc834ce/attachment.bin>
