thr3ads.net - llvm dev - [llvm-dev] GSoC19: Improve LLVM binary utilities [Mar 2019]

Jordan Rupprecht via llvm-dev

2019-Mar-26 22:40 UTC

[llvm-dev] GSoC19: Improve LLVM binary utilities

(Adding just a bit to Jake's response)

On Tue, Mar 26, 2019 at 11:31 AM Jake Ehrlich via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi Seiya,
>
>> What should I prioritize? I suppose that improving llvm-objcopy is the
>> most crucial work this summer.
>
> This is an opinion that will vary a lot from person to person.
>

+1! And don't forget that one of those people is you -- I don't think it
would be useful to start a GSoC project on something you don't enjoy just
because others think it's important. I would agree about objcopy :) but I'm
also happy to help you figure out what project you'd like for any other
tool.

> At the top of my list are improvements to llvm-objdump and working on MachO
> backends for LLD and llvm-objcopy. The critical thing to avoid IMO is
> implementing features without a direct use case in mind. I've let myself
> fall victim to this mistake many times before. I would ask the community
> for improvements they want to see and especially rely on your host to
> guide the direction you take. If you and your host feel that llvm-objcopy
> is the most critical then I certainly know some people and use cases that
> would be interested and will respond to an email on llvm-dev asking what
> you could work on. Several people have been adding bugs for llvm-objcopy
> recently and you should be able to find things to do there.
>
I think objcopy has the *most* things left to be done, but there's plenty
of work in the other binutils. I'm not sure any particular bit would be
called "crucial", however.
A couple ideas that have been kicked around for llvm-objcopy are:
* Librarify it (https://bugs.llvm.org/show_bug.cgi?id=41044)
* Improve MachO/COFF support (COFF support is pretty good, MachO is barely
there).
* Support ihex (https://bugs.llvm.org/show_bug.cgi?id=39841) or efi (
https://bugs.llvm.org/show_bug.cgi?id=40618) [Not that many people are
probably asking for these though]

>> How can I avoid proposing functionalities that others are already working
>> on? It seems that the tools are still being actively developed.
>
> The bug tracker is one way to look at this; people will say if they're
> working on any open bugs there. In practice I found that if I have a real
> use case and the feature I need hasn't been implemented, no one is likely
> to be currently working on it. For bigger features you should email
> llvm-dev. Many people are likely to have thought about how bigger features
> should be implemented and there's a better chance that someone is already
> actively working on things.
>
>> Are there good first issues related to the project? This is the first time
>> for me to dig into the LLVM source code so currently I cannot show
>> convincing evidence that I'm able to work on the project.
>
> Well I have biased opinions. I'd like alignment to be better handled in
> llvm-objdump, I'd like for symbol references to be resolved in an easier to
> parse fashion, and for module and function offsets to be output in a way
> that makes them easy to jump between.
>

Many bugs (though not enough) are tagged with the "beginner" keyword:
https://bugs.llvm.org/buglist.cgi?quicksearch=keyword%3Abeginner&list_id=157827.
That's usually a good start.

> On Tue, Mar 26, 2019 at 3:34 AM Seiya Nuta via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi all,
>>
>> My name is Seiya Nuta. I'm studying for my master's degree at the
>> University of Tsukuba and am interested in the project named "Improve
>> LLVM binary utilities". I've skimmed through llvm-objcopy/llvm-objdump,
>> commit logs, and Bugzilla to figure out what I should do.
>>
>> I have some questions about the project:
>>
>> - What should I prioritize? I suppose that improving llvm-objcopy is the
>>    most crucial work this summer.
>> - How can I avoid proposing functionalities that others are already
>>    working on? It seems that the tools are still being actively
>>    developed.
>> - Are there good first issues related to the project? This is the first
>>    time for me to dig into the LLVM source code so currently I cannot
>>    show convincing evidence that I'm able to work on the project.
>>
>> Best regards,
>> Seiya
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>

bd1976 llvm via llvm-dev

2019-Mar-27 14:32 UTC

[llvm-dev] GSoC19: Improve LLVM binary utilities

Hi Seiya,

If you want a project that is not trivial but is doable in a summer, will be
a great learning opportunity, and will be very useful to developers, then
I would suggest improving the disassembly of object files on x86_64. I
can't count the number of times this has caused confusion.

Consider the following assembly:

    nop
    nop
    .globl sym1
sym1:
    ret

.section .text2,"ax",@progbits
    jmp .text
    jmp .text+1
    jmp .text+6
    jmp sym1
    .globl sym2
sym2:
    jmp .text2
    jmp .text2+1
    jmp .text2+20
    jmp sym2
    jmp sym2@plt

When assembled and then disassembled you will see output something like:

Disassembly of section .text:
0x00000000: 90                      nop
0x00000001: 90                      nop

sym1:
0x00000002: C3                      ret

Disassembly of section .text2:
0x00000000: E9 00 00 00 00          jmp      .text+0xFFFFFFFFFFFFFFFC (0000000000000005h)
0x00000005: E9 00 00 00 00          jmp      .text+0xFFFFFFFFFFFFFFFD (000000000000000Ah)
0x0000000A: E9 00 00 00 00          jmp      sym1 (000000000000000Fh)
0x0000000F: E9 00 00 00 00          jmp      sym2 (0000000000000014h)

sym2:
0x00000014: EB EA                   jmp      0000000000000000h
0x00000016: EB E9                   jmp      0000000000000001h
0x00000018: EB FA                   jmp      sym2 (0000000000000014h)
0x0000001A: EB F8                   jmp      sym2 (0000000000000014h)
0x0000001C: E9 00 00 00 00          jmp      sym2 (0000000000000021h)

This is pretty confusing. What is wanted is output more like this:

Disassembly of section .text[0]:
0x00000000: 90                      nop
0x00000001: 90                      nop

sym1:
0x00000002: C3                      ret

Disassembly of section .text2[1]:
0x00000000: E9 ?? ?? ?? ??          jmp      .text[0] + 0x0
0x00000005: E9 ?? ?? ?? ??          jmp      .text[0] + 0x1
0x0000000A: E9 ?? ?? ?? ??          jmp      .text[0] + 0x6 (sym1 + 0x4)
0x0000000F: E9 ?? ?? ?? ??          jmp      sym1 + 0x0

sym2:
0x00000014: EB EA                   jmp      .text2[0] + 0x0
0x00000016: EB E9                   jmp      .text2[0] + 0x1
0x00000018: EB FA                   jmp      .text2[0] + 0x14 (sym2 + 0x0)
0x0000001A: EB F8                   jmp      .text2[0] + 0x14 (sym2 + 0x0)
0x0000001C: E9 ?? ?? ?? ??          jmp      sym2 (via GOT)


Please forgive me for using the output of our internal tools to illustrate
the point (I prepared this internally and don't have much time to write
this email so I just copied and pasted). If you try this with LLVM's binary
tools or GNU's you will see similar results.
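
The targets in the first listing can be reproduced by hand. A minimal sketch, assuming the usual x86-64 encodings (`E9` followed by a 32-bit little-endian displacement, `EB` followed by an 8-bit one); the helper name `jmp_target` is made up for illustration. The displacement is added to the address of the next instruction, so an unrelocated `E9 00 00 00 00` always appears to jump to whatever follows it.

```python
import struct

def jmp_target(addr: int, insn: bytes) -> int:
    """Return the target of an x86-64 `jmp rel8` or `jmp rel32`.

    addr is the instruction's offset within its section; the displacement
    is relative to the address of the *next* instruction.
    """
    if insn[0] == 0xEB:                      # jmp rel8
        (disp,) = struct.unpack("<b", insn[1:2])
    elif insn[0] == 0xE9:                    # jmp rel32
        (disp,) = struct.unpack("<i", insn[1:5])
    else:
        raise ValueError("not a direct jmp")
    return addr + len(insn) + disp

# Bytes taken from the first .text2 listing above.
print(hex(jmp_target(0x00, bytes.fromhex("E900000000"))))  # 0x5
print(hex(jmp_target(0x14, bytes.fromhex("EBEA"))))        # 0x0
print(hex(jmp_target(0x18, bytes.fromhex("EBFA"))))        # 0x14
```

The `0x5` result is only the address of the following instruction: the real target is not known until the relocation is applied, and the raw listing gives no hint of that.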

Concrete suggestions for improvements:

   - section-relative targets augmented with symbol information
   - ?? to indicate relocation patches
   - targets of PC-relative jumps computed correctly
   - section names augmented with their indices (section names are
   ambiguous)
   - branches via PLT indicated with added comments
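
A rough sketch of the first two suggestions, written in Python rather than against any real llvm-objdump code. The symbol table, the relocation offsets, and the helper names `symbolize` and `show_bytes` are all made up for illustration, and relocation fields are assumed to be 4 bytes wide.

```python
from bisect import bisect_right

def symbolize(section: str, index: int, offset: int,
              symbols: list[tuple[int, str]]) -> str:
    """Format a section-relative target, adding the nearest preceding symbol.

    symbols is a list of (offset, name) pairs for the section, sorted by offset.
    """
    text = f"{section}[{index}] + {offset:#x}"
    pos = bisect_right([off for off, _ in symbols], offset)
    if pos:  # at least one symbol starts at or before this offset
        sym_off, name = symbols[pos - 1]
        text += f" ({name} + {offset - sym_off:#x})"
    return text

def show_bytes(insn_offset: int, insn: bytes,
               reloc_offsets: set[int], width: int = 4) -> str:
    """Hex-dump an instruction, printing ?? for bytes a relocation will patch."""
    patched = {r + i for r in reloc_offsets for i in range(width)}
    return " ".join(
        "??" if insn_offset + i in patched else f"{b:02X}"
        for i, b in enumerate(insn)
    )

print(symbolize(".text", 0, 0x6, [(0x2, "sym1")]))          # .text[0] + 0x6 (sym1 + 0x4)
print(show_bytes(0x0, bytes.fromhex("E900000000"), {0x1}))  # E9 ?? ?? ?? ??
print(show_bytes(0x14, bytes.fromhex("EBEA"), {0x1}))       # EB EA
```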

This is not trivial to accomplish. Specifically, computing the targets of
branches will either require more integration between the binary tools and
the disassembler or, possibly, the binary tools could create a fake layout
and then patch up the instructions so that they disassemble "correctly".
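
One way to picture the "fake layout" option: give every section a made-up base address, apply each relocation the way a linker would, and then translate the resulting target address back into section-plus-offset form. The sketch below assumes 32-bit PC-relative relocations computed as S + A - P (as for `R_X86_64_PC32` in the x86-64 psABI). The base addresses, relocation details, and function names are invented for the example and do not come from any existing tool.

```python
import struct

# Hypothetical fake layout: section name -> made-up base address.
LAYOUT = {".text": 0x1000, ".text2": 0x2000}

def apply_pc32(code: bytearray, section: str, reloc_off: int,
               target_section: str, addend: int) -> None:
    """Patch a 32-bit PC-relative field in place: value = S + A - P."""
    s = LAYOUT[target_section]        # S: address of the referenced section symbol
    p = LAYOUT[section] + reloc_off   # P: address of the field being patched
    struct.pack_into("<i", code, reloc_off, s + addend - p)

def describe(addr: int) -> str:
    """Map a fake address back to 'section + offset' (section sizes not checked)."""
    name, base = max(
        ((n, b) for n, b in LAYOUT.items() if b <= addr),
        key=lambda item: item[1],
    )
    return f"{name} + {addr - base:#x}"

# `jmp .text` and `jmp .text+1` from the start of .text2: the rel32 fields sit
# at offsets 1 and 6, with addends -4 and -3 as shown in the first listing.
text2 = bytearray.fromhex("E900000000" "E900000000")
apply_pc32(text2, ".text2", 1, ".text", -4)
apply_pc32(text2, ".text2", 6, ".text", -3)

for off in (0, 5):
    (disp,) = struct.unpack_from("<i", text2, off + 1)
    target = LAYOUT[".text2"] + off + 5 + disp  # next instruction + displacement
    print(f"{off:#x}: jmp {describe(target)}")
```

With the fields patched, an ordinary PC-relative decode yields `.text + 0x0` and `.text + 0x1`, which is the information the wanted listing shows. The trade-off is that the tool has to do part of a linker's job before handing bytes to the disassembler.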

If you manage to get that done, then I would suggest going further and
trying to enhance the disassembly by adding color coding, outlining, or ASCII
art to the output to show things like loops, if statements, and basic blocks.
As inspiration see "rich disassembly" in this presentation by Apple:
http://devimages.apple.com/llvm/videos/LLVMMCinPractice.m4v.

Jake Ehrlich via llvm-dev

2019-Mar-27 17:54 UTC

[llvm-dev] GSoC19: Improve LLVM binary utilities

This is what I meant by llvm-objdump improvements.


Seiya Nuta via llvm-dev

2019-Mar-28 03:50 UTC

[llvm-dev] GSoC19: Improve LLVM binary utilities

Hi,

Thank you for your suggestion. It won't be easy, but it's
really attractive to me!

>   * sections names augmented with their indices (section name are ambiguous)

Could you explain a little further what "ambiguous" means here?
Do you mean similar section names (e.g., .text1 and .textl)?

Seiya


Krzysztof Parzyszek via llvm-dev

2019-Mar-28 13:53 UTC

[llvm-dev] [EXT] Re: GSoC19: Improve LLVM binary utilities

This augmented output should not be the default; it should only be enabled
with an option.

-- 
Krzysztof Parzyszek  mailto:kparzysz at quicinc.com   LLVM compiler development

