Michael Trent via llvm-dev
2019-May-23 18:18 UTC
[llvm-dev] Proposal for Mach-O support in llvm-objcopy: section renaming
> On May 23, 2019, at 2:05 AM, James Henderson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> I discussed this with Seiya off the mailing list yesterday, and this was the suggestion we came up with, on the basis that GNU objcopy does this renaming for the sake of GDB support, but it might be confusing to people who are new to the system, so we provide an option that gives the more expected output. I'm not experienced with MachO at all, though, so we'd appreciate any feedback from any MachO users.

Generally, Mach-O tools separate the segment name and the section name as different entries on the command line. The "<Segment Name>,<Section Name>" string is almost always an output format. Some examples include:

  otool -s __TEXT __text /bin/ls
  ld -sectcreate __EXAMPLE __example /dev/zero ...

In my opinion, that would be ideal from a "Mach-O users" point of view.

That said, the "two arguments" pattern isn't very common in llvm, although it does appear in places such as llvm-nm. llvm-objdump has a -section option that takes a single string in the "<Segment Name>,<Section Name>" format. This option only applies when a Mach-O specific flag, "-macho" or "-m", appears on the command line. And that's basically the proposal here. So while not ideal, it's certainly familiar.

How will people use llvm-objcopy when the segment and section names legitimately contain "." or "," characters? Will these be escapable?

The rest of the behavior, especially around "__TEXT.__text is bad but __TEXT.__unwind_info is good", is pretty confusing. Can we define our own "canonical names" for canonical sections such as __unwind_info?

> Thanks,
>
> James
>
> On Thu, 23 May 2019 at 05:43, Seiya Nuta <nuta at seiya.me> wrote:
> > Hi,
> >
> > I'm going to implement Mach-O support in llvm-objcopy. Before working
> > on this, I'd like to hear your thoughts on how llvm-objcopy should handle
> > Mach-O section names.
> >
> > By convention, Mach-O section names are denoted by "<segment
> > name>,<section name>". However, GNU objcopy renames them according to the
> > following rule [1]:
> >
> > - If the section name is well-known, rename it to a "canonical" name [2].
> > - Otherwise:
> >   - Rename to "<segment name>.<section name>" (the separator is '.', not ',')
> >   - If the segment name does not start with '_', prefix it with 'LC_SEGMENT.'

Can you explain what LC_SEGMENT means here? What happens if the segment name (in the file) does not begin with a "_"?

Thanks!

MDT

> > For example, __TEXT,__text is renamed to .text and
> > __TEXT,__unwind_info is renamed to __TEXT.__unwind_info. For that
> > reason, specifying a section in command line options is rather
> > nonintuitive:
> >
> >   WRONG: objcopy --only-section=__TEXT,__text a.out
> >   WRONG: objcopy --only-section=__TEXT.__text a.out
> >   OK:    objcopy --only-section=.text a.out
> >
> >   WRONG: objcopy --only-section=__TEXT,__unwind_info a.out
> >   WRONG: objcopy --only-section=.unwind_info a.out
> >   OK:    objcopy --only-section=__TEXT.__unwind_info a.out
> >
> > For compatibility with GNU binutils, I propose to make this
> > section renaming rule the default in llvm-objcopy and implement a flag
> > named --macho-names to use conventional section names:
> >
> >   WRONG: llvm-objcopy --only-section=__TEXT,__text a.out
> >   WRONG: llvm-objcopy --only-section=__TEXT.__text a.out
> >   OK:    llvm-objcopy --only-section=.text a.out a.out2
> >
> >   WRONG: llvm-objcopy --macho-names --only-section=.text a.out
> >   WRONG: llvm-objcopy --macho-names --only-section=__TEXT.__text a.out
> >   OK:    llvm-objcopy --macho-names --only-section=__TEXT,__text a.out
> >
> > What do you think about this behavior?
> >
> > Thanks,
> > Seiya
> >
> > [1]: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=bfd/mach-o.c;h=d9edef2871d83b53280b613935c068e4327f3270;hb=HEAD#l364
> > [2]: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=bfd/mach-o.c;h=d9edef2871d83b53280b613935c068e4327f3270;hb=HEAD#l90
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
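The GNU objcopy renaming rule quoted in this message can be sketched in a few lines of Python. This is only an illustration of the rule as described in the thread, not BFD's actual code: the function name `bfd_style_name` and the table name `CANONICAL_NAMES` are hypothetical, and the table holds only the single mapping the thread confirms (`__TEXT,__text` to `.text`); the real table is the one referenced as [2].

```python
from typing import Dict, Tuple

# Illustrative subset only: the thread confirms just __TEXT,__text -> .text.
# The full table of well-known sections lives in bfd/mach-o.c (reference [2]).
CANONICAL_NAMES: Dict[Tuple[str, str], str] = {
    ("__TEXT", "__text"): ".text",
}


def bfd_style_name(segment: str, section: str) -> str:
    """Map a Mach-O (segment, section) pair to a GNU-objcopy-style name.

    Hypothetical helper that follows the rule as stated in the thread.
    """
    # Rule 1: well-known sections get a canonical name.
    canonical = CANONICAL_NAMES.get((segment, section))
    if canonical is not None:
        return canonical
    # Rule 2: otherwise join with '.', adding the "LC_SEGMENT." prefix
    # when the segment name does not start with '_'.
    prefix = "" if segment.startswith("_") else "LC_SEGMENT."
    return f"{prefix}{segment}.{section}"


if __name__ == "__main__":
    for seg, sect in [("__TEXT", "__text"), ("__TEXT", "__unwind_info"), ("foo", "bar")]:
        print(f"{seg},{sect} -> {bfd_style_name(seg, sect)}")
```

Under these assumptions, `__TEXT,__unwind_info` falls through to the second rule because it is not in the table, which is exactly the asymmetry the message calls confusing.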
Jake Ehrlich via llvm-dev
2019-May-23 23:58 UTC
[llvm-dev] Proposal for Mach-O support in llvm-objcopy: section renaming
There's a constraint on the ELF backend that might not exist on the MachO backend: for ELF it was critical that we be a drop-in replacement. We should find evidence one way or the other as to whether this is critical for MachO. My assumption, however, is that it still is, in which case we have to match what GNU objcopy does, at least on the public interface.
Seiya Nuta via llvm-dev
2019-May-24 05:33 UTC
[llvm-dev] Proposal for Mach-O support in llvm-objcopy: section renaming
Hi Michael,

Thank you for your comments. Now I think the following behavior is more appropriate:

--*-section behaves like GNU objcopy:

  WRONG: llvm-objcopy --only-section=__TEXT,__text a.out
  WRONG: llvm-objcopy --only-section=__TEXT.__text a.out
  OK:    llvm-objcopy --only-section=.text a.out a.out2

--*-macho-section accepts the "two arguments pattern":

  WRONG: llvm-objcopy --only-macho-section=.text a.out a.out2
  ERROR: llvm-objcopy --only-section=__TEXT,__text --only-macho-section __TEXT __text a.out
         (error: --only-section and --only-macho-section are exclusive)
  OK:    llvm-objcopy --only-macho-section __TEXT __text a.out
         (use two arguments to specify the segment/section name)

> How will people use llvm-objcopy when the segment and section names legitimately contain "." or "," characters? Will these be escapable?

I think the "two arguments pattern" you described sounds like the best way to solve this.

> The rest of the behavior, especially around "__TEXT.__text is bad but __TEXT.__unwind_info is good", is pretty confusing. Can we define our own "canonical names" for canonical sections such as __unwind_info?

While it's confusing, for compatibility with GNU objcopy, I think we should inherit the canonical names hard-coded in it [1].

> Can you explain what LC_SEGMENT means here? What happens if the segment name (in the file) does not begin with a "_"?

If a segment name doesn't start with a "_" (they call it "a weird name"), GNU objcopy adds the prefix "LC_SEGMENT." to the BFD section name. For example, running "objdump -h" on an object file which contains a section named "foo,bar" prints the following. I'm not sure why they do so, though.

foo:     file format mach-o-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         002ffec7  0000000100001dd4  0000000100001dd4  00000dd4  2**2
                  CONTENTS, ALLOC, LOAD, CODE
...
 13 LC_SEGMENT.foo.bar 0000000d  0000000000000000  0000000000000000  0043d000  2**0
                  CONTENTS, ALLOC, LOAD

[1]: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=bfd/mach-o.c;h=d9edef2871d83b53280b613935c068e4327f3270;hb=HEAD#l90

Thank you,
Seiya
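To make the proposed command-line shape concrete, here is a small Python model of it using the standard `argparse` module. It is only a sketch of the behavior described in this message: llvm-objcopy is written in C++ and does not use argparse, and the program name `objcopy-sketch` and the function `build_parser` are hypothetical. The model shows a one-argument `--only-section`, a two-argument `--only-macho-section`, and the rule that the two are mutually exclusive.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser that models the two proposed option families."""
    parser = argparse.ArgumentParser(prog="objcopy-sketch")
    # The two ways of naming a section may not be combined.
    group = parser.add_mutually_exclusive_group()
    # GNU-style: a single BFD section name such as ".text".
    group.add_argument("--only-section", metavar="NAME")
    # Mach-O style: segment and section as two separate arguments.
    group.add_argument(
        "--only-macho-section", nargs=2, metavar=("SEGMENT", "SECTION")
    )
    parser.add_argument("input")
    parser.add_argument("output", nargs="?")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--only-macho-section", "__TEXT", "__text", "a.out"]
    )
    segment, section = args.only_macho_section
    print(f"segment={segment} section={section} input={args.input}")
```

Because the segment and section arrive as separate arguments, names containing "." or "," need no escaping in this model. Passing both options makes argparse report an error and exit, which mirrors the ERROR case listed above.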
Seiya Nuta via llvm-dev
2019-May-24 13:22 UTC
[llvm-dev] Proposal for Mach-O support in llvm-objcopy: section renaming
I've discussed this more with James off the mailing list.

I think that in llvm-objcopy, the "two arguments pattern" makes the command-line parsing complicated. For example, "--rename-section __FOO __foo=__FOO __bar" looks weird and is not easy to parse.

Instead, I propose the following changes:

- Introduce a "--macho" option to allow specifying the section as "<segment>,<section>".
- With "--macho", enable escaping "," with backslashes (e.g., "__FOO,__\,ba\\r" represents the "__,ba\r" section in the "__FOO" segment).
- Without "--macho", imitate GNU objcopy for compatibility. That is, implement the section renaming as described.

In a nutshell:

  WRONG: llvm-objcopy --only-section=__TEXT,__text a.out
  OK:    llvm-objcopy --only-section=.text a.out

  WRONG: llvm-objcopy --macho --only-section=.text a.out
  OK:    llvm-objcopy --macho --only-section=__TEXT,__text a.out

By the way, as Jake pointed out, I think we need evidence to decide whether this section name handling should be a drop-in replacement for GNU objcopy. At least, I am not aware of any such evidence.

Thanks,
Seiya
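The backslash-escaping rule proposed for "--macho" can be sketched as a small parser. This is an illustration under stated assumptions rather than llvm-objcopy's implementation: the function name `split_macho_name` is hypothetical, a backslash is assumed to make the following character literal (so "\," is a comma and "\\" is a backslash), and exactly one unescaped comma is assumed to separate segment from section.

```python
from typing import List, Tuple


def split_macho_name(spec: str) -> Tuple[str, str]:
    """Split "<segment>,<section>" into its parts, honoring backslash escapes.

    Hypothetical helper modeling the escaping rule proposed in the thread.
    """
    parts: List[List[str]] = [[]]
    chars = iter(spec)
    for ch in chars:
        if ch == "\\":
            # A backslash makes the next character literal.
            escaped = next(chars, None)
            if escaped is None:
                raise ValueError(f"dangling backslash in {spec!r}")
            parts[-1].append(escaped)
        elif ch == ",":
            # An unescaped comma separates segment from section.
            parts.append([])
        else:
            parts[-1].append(ch)
    if len(parts) != 2:
        raise ValueError(f"expected '<segment>,<section>', got {spec!r}")
    return "".join(parts[0]), "".join(parts[1])


if __name__ == "__main__":
    # The example from the message: section "__,ba\r" in segment "__FOO".
    print(split_macho_name(r"__FOO,__\,ba\\r"))
```

In this sketch a GNU-style name such as ".text" is rejected because it contains no unescaped comma, which corresponds to the WRONG case listed for "--macho" above.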