Jordan Rupprecht via llvm-dev
2019-Mar-26 22:40 UTC
[llvm-dev] GSoC19: Improve LLVM binary utilities
(Adding just a bit to Jake's response)

On Tue, Mar 26, 2019 at 11:31 AM Jake Ehrlich via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Hi Seiya,
>
>> What should I prioritize? I suppose that improving llvm-objcopy is the most crucial work this summer.
>
> This is an opinion that will vary a lot from person to person.

+1! And don't forget that one of those people is you -- I don't think it would be useful to start a GSoC project on something you don't enjoy just because others think it's important. I would agree about objcopy :) but I'm also happy to help you figure out what project you'd like for any other tool.

> At the top of my list are improvements to llvm-objdump and working on MachO backends for LLD and llvm-objcopy. The critical thing to avoid IMO is implementing features without a direct use case in mind. I've let myself fall victim to this mistake many times before. I would ask the community for improvements they want to see and especially rely on your host to guide the direction you take. If you and your host feel that llvm-objcopy is the most critical, then I certainly know some people and use cases that would be interested and will respond to an email on llvm-dev asking what you could work on. Several people have been adding bugs for llvm-objcopy recently, and you should be able to find things to do there.

I think objcopy has the *most* work left to be done, but there's plenty of work in the other binutils. I'm not sure any particular bit would be called "crucial", however. A couple of ideas that have been kicked around for llvm-objcopy are:

* Librarify it (https://bugs.llvm.org/show_bug.cgi?id=41044)
* Improve MachO/COFF support (COFF support is pretty good, MachO is barely there).
* Support ihex (https://bugs.llvm.org/show_bug.cgi?id=39841) or efi (https://bugs.llvm.org/show_bug.cgi?id=40618) [Not that many people are probably asking for these, though]

>> How can I avoid proposing functionality that others are already working on? It seems that the tools are still being actively developed.
>
> The bug tracker is one way to look at this; people will say if they're working on any open bugs there. In practice I found that if I have a real use case and the feature I need hasn't been implemented, no one is likely to be currently working on it. For bigger features you should email llvm-dev. Many people are likely to have thought about how bigger features should be implemented, and there's a better chance that someone is already actively working on things.
>
>> Are there good first issues related to the project? This is the first time I have dug into the LLVM source code, so currently I cannot show convincing evidence that I'm able to work on the project.
>
> Well, I have biased opinions. I'd like alignment to be better handled in llvm-objdump, I'd like symbol references to be resolved in an easier-to-parse fashion, and I'd like module and function offsets to be output in a way that makes them easy to jump between.

Many bugs (though not enough) are tagged with the "beginner" keyword: https://bugs.llvm.org/buglist.cgi?quicksearch=keyword%3Abeginner&list_id=157827. That's usually a good start.

> On Tue, Mar 26, 2019 at 3:34 AM Seiya Nuta via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> Hi all,
>>
>> My name is Seiya Nuta. I'm studying for my master's degree at the University of Tsukuba and am interested in the project named "Improve LLVM binary utilities". I've skimmed through llvm-objcopy/llvm-objdump, commit logs, and Bugzilla to figure out what I should do.
>>
>> I have some questions about the project:
>>
>> - What should I prioritize? I suppose that improving llvm-objcopy is the most crucial work this summer.
>> - How can I avoid proposing functionality that others are already working on? It seems that the tools are still being actively developed.
>> - Are there good first issues related to the project? This is the first time I have dug into the LLVM source code, so currently I cannot show convincing evidence that I'm able to work on the project.
>>
>> Best regards,
>> Seiya
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
bd1976 llvm via llvm-dev
2019-Mar-27 14:32 UTC
[llvm-dev] GSoC19: Improve LLVM binary utilities
Hi Seiya,

If you want a project that is not trivial but is doable in a summer, will be a great learning opportunity, and will be very useful to developers, then I would suggest improving the disassembly of object files on x86_64. I can't count the number of times this has caused confusion.

Consider the following assembly:

        nop
        nop
        .globl sym1
    sym1:
        ret

        .section .text2,"ax",@progbits
        jmp .text
        jmp .text+1
        jmp .text+6
        jmp sym1
        .globl sym2
    sym2:
        jmp .text2
        jmp .text2+1
        jmp .text2+20
        jmp sym2
        jmp sym2@plt

When assembled and then disassembled, you will see output something like:

    Disassembly of section .text:
    0x00000000: 90              nop
    0x00000001: 90              nop

    sym1:
    0x00000002: C3              ret

    Disassembly of section .text2:
    0x00000000: E9 00 00 00 00  jmp .text+0xFFFFFFFFFFFFFFFC (0000000000000005h)
    0x00000005: E9 00 00 00 00  jmp .text+0xFFFFFFFFFFFFFFFD (000000000000000Ah)
    0x0000000A: E9 00 00 00 00  jmp sym1 (000000000000000Fh)
    0x0000000F: E9 00 00 00 00  jmp sym2 (0000000000000014h)

    sym2:
    0x00000014: EB EA           jmp 0000000000000000h
    0x00000016: EB E9           jmp 0000000000000001h
    0x00000018: EB FA           jmp sym2 (0000000000000014h)
    0x0000001A: EB F8           jmp sym2 (0000000000000014h)
    0x0000001C: E9 00 00 00 00  jmp sym2 (0000000000000021h)

This is pretty confusing. What is wanted is output more like this:

    Disassembly of section .text[0]:
    0x00000000: 90              nop
    0x00000001: 90              nop

    sym1:
    0x00000002: C3              ret

    Disassembly of section .text2[1]:
    0x00000000: E9 ?? ?? ?? ??  jmp .text[0] + 0x0
    0x00000005: E9 ?? ?? ?? ??  jmp .text[0] + 0x1
    0x0000000A: E9 ?? ?? ?? ??  jmp .text[0] + 0x6 (sym1 + 0x4)
    0x0000000F: E9 ?? ?? ?? ??  jmp sym1 + 0x0

    sym2:
    0x00000014: EB EA           jmp .text2[1] + 0x0
    0x00000016: EB E9           jmp .text2[1] + 0x1
    0x00000018: EB FA           jmp .text2[1] + 0x14 (sym2 + 0x0)
    0x0000001A: EB F8           jmp .text2[1] + 0x14 (sym2 + 0x0)
    0x0000001C: E9 ?? ?? ?? ??  jmp sym2 (via GOT)

Please forgive me for using the output of our internal tools to illustrate the point (I prepared this internally and don't have much time to write this email, so I just copied and pasted it). If you try this with LLVM's binary tools or GNU's, you will see similar results.

Concrete suggestions for improvements:

- section-relative targets augmented with symbol information
- ?? to indicate relocation patches
- targets of PC-relative jumps computed correctly
- section names augmented with their indices (section names are ambiguous)
- branches via the PLT indicated with added comments

This is not trivial to accomplish. Specifically, computing the targets of branches will require either more integration between the binary tools and the disassembler or, possibly, having the binary tools create a fake layout and then patch up the instructions so that they disassemble "correctly".

If you manage to get that done, then I would suggest going further and trying to enhance the disassembly by adding color coding/outlining/ASCII art to the output to show things like loops, if statements, and basic blocks. As inspiration, see "rich disassembly" in this presentation by Apple: http://devimages.apple.com/llvm/videos/LLVMMCinPractice.m4v.
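A minimal Python sketch of the arithmetic behind three of the suggestions above ("?? to indicate relocation patches", "targets of PC-relative jumps computed correctly", and "section-relative targets augmented with symbol information"). It assumes x86-64 ELF relocatable objects and handles only the two JMP encodings in the example (EB rel8, E9 rel32). All helper names (`pc_relative_target`, `reloc_branch_offset`, `symbolize`, `format_bytes`) are hypothetical and are not llvm-objdump or LLVM MC APIs. The key facts: an x86 branch displacement is relative to the *next* instruction, and a PC-relative relocation stores S + A - P where P is the address of the 4-byte field, so the assembler's addend of -4 (printed today as `.text+0xFFFFFFFFFFFFFFFC`) really means `.text + 0x0`.

```python
# Hypothetical helpers (not llvm-objdump or LLVM MC APIs). Assumes x86-64 ELF
# relocatable objects and only JMP rel8 (EB) / JMP rel32 (E9) encodings.
from __future__ import annotations

import struct


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit addend (e.g. 0xFFFFFFFFFFFFFFFC) as signed."""
    return value - (1 << 64) if value >= (1 << 63) else value


def pc_relative_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Target of a JMP rel8 (EB) or JMP rel32 (E9) located at insn_addr.

    x86 branch displacements are relative to the next instruction:
    target = insn_addr + insn_len + signed_displacement.
    """
    opcode = insn_bytes[0]
    if opcode == 0xEB:  # JMP rel8: 1-byte signed displacement
        (disp,) = struct.unpack("<b", insn_bytes[1:2])
        insn_len = 2
    elif opcode == 0xE9:  # JMP rel32: 4-byte little-endian signed displacement
        (disp,) = struct.unpack("<i", insn_bytes[1:5])
        insn_len = 5
    else:
        raise ValueError(f"unsupported opcode {opcode:#04x}")
    return insn_addr + insn_len + disp


def reloc_branch_offset(addend: int, field_offset: int = 1, insn_len: int = 5) -> int:
    """Offset from the relocation's symbol/section that the branch really targets.

    The linker writes S + A - P into the field (P = field address). The CPU adds
    the field to the next instruction's address, P + (insn_len - field_offset),
    so the real target is S + A + (insn_len - field_offset): S + A + 4 for E9.
    """
    return to_signed64(addend) + (insn_len - field_offset)


def symbolize(offset: int, symbols: dict[str, int]) -> str:
    """Describe offset relative to the closest symbol at or before it."""
    candidates = [(addr, name) for name, addr in symbols.items() if addr <= offset]
    if not candidates:
        return hex(offset)
    addr, name = max(candidates)
    return f"{name} + {offset - addr:#x}"


def format_bytes(insn_bytes: bytes, reloc_field: range | None = None) -> str:
    """Hex-dump an instruction, printing '??' for bytes a relocation will patch."""
    return " ".join(
        "??" if reloc_field is not None and i in reloc_field else f"{b:02X}"
        for i, b in enumerate(insn_bytes)
    )


# Usage with values from the .text2 example above.
print(format_bytes(bytes.fromhex("E900000000"), range(1, 5)))  # E9 ?? ?? ?? ??
print(hex(pc_relative_target(0x14, bytes.fromhex("EBEA"))))     # 0x0
print(hex(pc_relative_target(0x18, bytes.fromhex("EBFA"))))     # 0x14
print(hex(reloc_branch_offset(0xFFFFFFFFFFFFFFFC)))             # 0x0 (".text + 0x0")
print(symbolize(6, {"sym1": 2}))                                # sym1 + 0x4
```

The helper takes `insn_len` and `field_offset` instead of hard-coding 4 because the bias is just the distance from the start of the 32-bit field to the end of the instruction; for an operand followed by an immediate, that distance is larger.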
Jake Ehrlich via llvm-dev
2019-Mar-27 17:54 UTC
[llvm-dev] GSoC19: Improve LLVM binary utilities
This is what I meant by llvm-objdump improvements.

On Wed, Mar 27, 2019 at 7:32 AM bd1976 llvm <bd1976llvm at gmail.com> wrote:

> Hi Seiya,
>
> [...]
Seiya Nuta via llvm-dev
2019-Mar-28 03:50 UTC
[llvm-dev] GSoC19: Improve LLVM binary utilities
Hi,

Thank you for your suggestion. It won't be easy, but it's really attractive to me!

> * section names augmented with their indices (section names are ambiguous)

Could you explain a little further what "ambiguous" means here? Do you mean similar-looking section names (e.g., .text1 and .textl)?

Seiya

On 3/27/19 23:32, bd1976 llvm via llvm-dev wrote:
> Hi Seiya,
>
> [...]
Krzysztof Parzyszek via llvm-dev
2019-Mar-28 13:53 UTC
[llvm-dev] [EXT] Re: GSoC19: Improve LLVM binary utilities
This augmented output should not be the default; it should only be enabled with an option.

--
Krzysztof Parzyszek  mailto:kparzysz at quicinc.com  LLVM compiler development

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of bd1976 llvm via llvm-dev
Sent: Wednesday, March 27, 2019 9:33 AM
To: Jordan Rupprecht <rupprecht at google.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: [EXT] Re: [llvm-dev] GSoC19: Improve LLVM binary utilities

Hi Seiya,

[...]