Shoaib Meenai via llvm-dev
2020-May-26 22:11 UTC
[llvm-dev] Emitting aligned nlist_64 structures for Mach-O in MC
I looked into this further. ld64 has a macho_nlist abstraction over the various underlying nlist structures [1]. On x86-64, the P::getP referenced in n_value will resolve to [2], which in turn goes to [3], which calls OSReadLittleInt64. On a little endian machine, OSReadLittleEndian just calls _OSReadInt64 [4], which in turn does a pointer arithmetic and cast and then dereferences the pointer [5]. As far as I can see, this will invoke UB for an unaligned int64, unless ld64 is built against a different version of OSReadLittleEndian, or unless I’m missing something. [1] https://github.com/apple-opensource/ld64/blob/fd3feabb0a1eb18ab5d7910f3c3a5eed99cef6ab/src/abstraction/MachOFileAbstraction.hpp#L1283-L1304 [2] https://github.com/apple-opensource/ld64/blob/fd3feabb0a1eb18ab5d7910f3c3a5eed99cef6ab/src/abstraction/FileAbstraction.hpp#L137 [3] https://github.com/apple-opensource/ld64/blob/fd3feabb0a1eb18ab5d7910f3c3a5eed99cef6ab/src/abstraction/FileAbstraction.hpp#L96 [4] https://github.com/apple/darwin-xnu/blob/a449c6a3b8014d9406c2ddbdc81795da24aa7443/libkern/libkern/OSByteOrder.h#L246 [5] https://github.com/apple/darwin-xnu/blob/a449c6a3b8014d9406c2ddbdc81795da24aa7443/libkern/libkern/OSByteOrder.h#L109-L117 From: David Blaikie <dblaikie at gmail.com> Date: Tuesday, May 26, 2020 at 11:25 AM To: Shoaib Meenai <smeenai at fb.com> Cc: "llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] Emitting aligned nlist_64 structures for Mach-O in MC I'd have figured if ld64 is able to handle this correctly (are you sure ld64 has UB in this situation when the lists are underaligned? perhaps it has utilities for doing underaligned reads, etc?) it's probably important for lld to handle them too - old object files in archives, things built by not-llvm, etc. On Tue, May 26, 2020 at 11:01 AM Shoaib Meenai via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: As part of our work on LLD for Mach-O, we’ve observed that the object files produced by LLVM don’t always have aligned nlist_64 entries. For context, nlist_64 is the 64-bit symbol table entry structure for Mach-O, and the symbol table is an array of nlist_64 entries. nlist_64 has an 8 byte member, so it should be 8-byte aligned, but we’ve seen object files where the symbol table only has a 4-byte alignment. I don't know if Mach-O mandates an alignment for the symbol table (I haven't found any such requirements in the documentation), but it would be convenient for MC to emit an 8-byte aligned symbol table. We parse an input file in LLD by memory mapping it and using casts to access the various structures. When the symbol table isn't 8-byte aligned, we're accessing the nlist_64 entries in an unaligned fashion, which makes UBSAN (rightly) unhappy. As far as I can see, ld64 also does pointer arithmetic and casting to parse the symbol table, so it would run into the same issue. Does anyone see any problems with making MC always emit an 8-byte aligned symbol table for 64-bit Mach-O files, to avoid this issue? Of course, we should still handle object files we see in the wild without the 8-byte alignment. From what I understand, the blessed way to handle a potentially unaligned access is with a memcpy, and the compiler should be able to optimize out the memcpy on architectures that support unaligned accesses. A coworker is implementing that approach in https://reviews.llvm.org/D80414<https://urldefense.proofpoint.com/v2/url?u=https-3A__reviews.llvm.org_D80414&d=DwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=o3kDXzdBUE3ljQXKeTWOMw&m=Ux8t6Su4JVUd45UmVgQRCU2YZHX4HTWB1742jKxnhJo&s=Lg9_RU6QSNSHF88ry4t5DTpADTj7oQ_ur3L5j-mCIf8&e=>, but before we go ahead with that, I wanted to confirm (a) how reliable the memcpy optimization is across all our supported compilers (in particular, I've seen MSVC struggle with this optimization in the past), because we really don't want to be paying the cost of an actual memcpy in this codepath, and (b) if there's other good ways to handle this that we could explore. _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=DwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=o3kDXzdBUE3ljQXKeTWOMw&m=Ux8t6Su4JVUd45UmVgQRCU2YZHX4HTWB1742jKxnhJo&s=ya3X_moM7YSEY6nBjT5Aov0B4JH04p7MJHwNSJiCPDw&e=> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200526/d08b489e/attachment.html>
James Y Knight via llvm-dev
2020-May-26 22:23 UTC
[llvm-dev] Emitting aligned nlist_64 structures for Mach-O in MC
Yes, those OSReadLittleInt* functions are *quite* the attractive nuisance. The interface is highly suggestive that it should work on values of arbitrary alignments...but it doesn't. In fact, nearly every use I've seen of this family of functions has been an error of this kind. We had to fix a bunch of that internally to Google a while back (e.g. https://github.com/protocolbuffers/protobuf/commit/a0eb83f5904757cc9690fec7867c04b8c2c95249). I'm going to bet that the ld64 developers were under the same misapprehension about this function's contract as everyone else was. :) On Tue, May 26, 2020 at 6:12 PM Shoaib Meenai via llvm-dev < llvm-dev at lists.llvm.org> wrote:> I looked into this further. ld64 has a macho_nlist abstraction over the > various underlying nlist structures [1]. On x86-64, the P::getP referenced > in n_value will resolve to [2], which in turn goes to [3], which calls > OSReadLittleInt64. On a little endian machine, OSReadLittleEndian just > calls _OSReadInt64 [4], which in turn does a pointer arithmetic and cast > and then dereferences the pointer [5]. As far as I can see, this will > invoke UB for an unaligned int64, unless ld64 is built against a different > version of OSReadLittleEndian, or unless I’m missing something. > > > > [1] > https://github.com/apple-opensource/ld64/blob/fd3feabb0a1eb18ab5d7910f3c3a5eed99cef6ab/src/abstraction/MachOFileAbstraction.hpp#L1283-L1304 > > [2] > https://github.com/apple-opensource/ld64/blob/fd3feabb0a1eb18ab5d7910f3c3a5eed99cef6ab/src/abstraction/FileAbstraction.hpp#L137 > > [3] > https://github.com/apple-opensource/ld64/blob/fd3feabb0a1eb18ab5d7910f3c3a5eed99cef6ab/src/abstraction/FileAbstraction.hpp#L96 > > [4] > https://github.com/apple/darwin-xnu/blob/a449c6a3b8014d9406c2ddbdc81795da24aa7443/libkern/libkern/OSByteOrder.h#L246 > > [5] > https://github.com/apple/darwin-xnu/blob/a449c6a3b8014d9406c2ddbdc81795da24aa7443/libkern/libkern/OSByteOrder.h#L109-L117 > > > > *From: *David Blaikie <dblaikie at gmail.com> > *Date: *Tuesday, May 26, 2020 at 11:25 AM > *To: *Shoaib Meenai <smeenai at fb.com> > *Cc: *"llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org> > *Subject: *Re: [llvm-dev] Emitting aligned nlist_64 structures for Mach-O > in MC > > > > I'd have figured if ld64 is able to handle this correctly (are you sure > ld64 has UB in this situation when the lists are underaligned? perhaps it > has utilities for doing underaligned reads, etc?) it's probably important > for lld to handle them too - old object files in archives, things built by > not-llvm, etc. > > > > On Tue, May 26, 2020 at 11:01 AM Shoaib Meenai via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > As part of our work on LLD for Mach-O, we’ve observed that the object > files produced by LLVM don’t always have aligned nlist_64 entries. For > context, nlist_64 is the 64-bit symbol table entry structure for Mach-O, > and the symbol table is an array of nlist_64 entries. nlist_64 has an 8 > byte member, so it should be 8-byte aligned, but we’ve seen object files > where the symbol table only has a 4-byte alignment. > > I don't know if Mach-O mandates an alignment for the symbol table (I > haven't found any such requirements in the documentation), but it would be > convenient for MC to emit an 8-byte aligned symbol table. We parse an input > file in LLD by memory mapping it and using casts to access the various > structures. When the symbol table isn't 8-byte aligned, we're accessing the > nlist_64 entries in an unaligned fashion, which makes UBSAN (rightly) > unhappy. As far as I can see, ld64 also does pointer arithmetic and casting > to parse the symbol table, so it would run into the same issue. Does anyone > see any problems with making MC always emit an 8-byte aligned symbol table > for 64-bit Mach-O files, to avoid this issue? > > Of course, we should still handle object files we see in the wild without > the 8-byte alignment. From what I understand, the blessed way to handle a > potentially unaligned access is with a memcpy, and the compiler should be > able to optimize out the memcpy on architectures that support unaligned > accesses. A coworker is implementing that approach in > https://reviews.llvm.org/D80414 > <https://urldefense.proofpoint.com/v2/url?u=https-3A__reviews.llvm.org_D80414&d=DwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=o3kDXzdBUE3ljQXKeTWOMw&m=Ux8t6Su4JVUd45UmVgQRCU2YZHX4HTWB1742jKxnhJo&s=Lg9_RU6QSNSHF88ry4t5DTpADTj7oQ_ur3L5j-mCIf8&e=>, > but before we go ahead with that, I wanted to confirm (a) how reliable the > memcpy optimization is across all our supported compilers (in particular, > I've seen MSVC struggle with this optimization in the past), because we > really don't want to be paying the cost of an actual memcpy in this > codepath, and (b) if there's other good ways to handle this that we could > explore. > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > <https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=DwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=o3kDXzdBUE3ljQXKeTWOMw&m=Ux8t6Su4JVUd45UmVgQRCU2YZHX4HTWB1742jKxnhJo&s=ya3X_moM7YSEY6nBjT5Aov0B4JH04p7MJHwNSJiCPDw&e=> > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200526/2c47087b/attachment.html>
David Blaikie via llvm-dev
2020-May-26 22:55 UTC
[llvm-dev] Emitting aligned nlist_64 structures for Mach-O in MC
+Lang for Darwin linker things - Lang, any ideas if this is correct in ld64? Should be fixed there (for back-compat) or fixed forward (because things seem to have been going OK so far... )? On Tue, May 26, 2020 at 3:24 PM James Y Knight <jyknight at google.com> wrote:> Yes, those OSReadLittleInt* functions are *quite* the attractive > nuisance. The interface is highly suggestive that it should work on values > of arbitrary alignments...but it doesn't. > > In fact, nearly every use I've seen of this family of functions has been > an error of this kind. We had to fix a bunch of that internally to Google a > while back (e.g. > https://github.com/protocolbuffers/protobuf/commit/a0eb83f5904757cc9690fec7867c04b8c2c95249). > I'm going to bet that the ld64 developers were under the same > misapprehension about this function's contract as everyone else was. :) > > > On Tue, May 26, 2020 at 6:12 PM Shoaib Meenai via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> I looked into this further. ld64 has a macho_nlist abstraction over the >> various underlying nlist structures [1]. On x86-64, the P::getP referenced >> in n_value will resolve to [2], which in turn goes to [3], which calls >> OSReadLittleInt64. On a little endian machine, OSReadLittleEndian just >> calls _OSReadInt64 [4], which in turn does a pointer arithmetic and cast >> and then dereferences the pointer [5]. As far as I can see, this will >> invoke UB for an unaligned int64, unless ld64 is built against a different >> version of OSReadLittleEndian, or unless I’m missing something. >> >> >> >> [1] >> https://github.com/apple-opensource/ld64/blob/fd3feabb0a1eb18ab5d7910f3c3a5eed99cef6ab/src/abstraction/MachOFileAbstraction.hpp#L1283-L1304 >> >> [2] >> https://github.com/apple-opensource/ld64/blob/fd3feabb0a1eb18ab5d7910f3c3a5eed99cef6ab/src/abstraction/FileAbstraction.hpp#L137 >> >> [3] >> https://github.com/apple-opensource/ld64/blob/fd3feabb0a1eb18ab5d7910f3c3a5eed99cef6ab/src/abstraction/FileAbstraction.hpp#L96 >> >> [4] >> https://github.com/apple/darwin-xnu/blob/a449c6a3b8014d9406c2ddbdc81795da24aa7443/libkern/libkern/OSByteOrder.h#L246 >> >> [5] >> https://github.com/apple/darwin-xnu/blob/a449c6a3b8014d9406c2ddbdc81795da24aa7443/libkern/libkern/OSByteOrder.h#L109-L117 >> >> >> >> *From: *David Blaikie <dblaikie at gmail.com> >> *Date: *Tuesday, May 26, 2020 at 11:25 AM >> *To: *Shoaib Meenai <smeenai at fb.com> >> *Cc: *"llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org> >> *Subject: *Re: [llvm-dev] Emitting aligned nlist_64 structures for >> Mach-O in MC >> >> >> >> I'd have figured if ld64 is able to handle this correctly (are you sure >> ld64 has UB in this situation when the lists are underaligned? perhaps it >> has utilities for doing underaligned reads, etc?) it's probably important >> for lld to handle them too - old object files in archives, things built by >> not-llvm, etc. >> >> >> >> On Tue, May 26, 2020 at 11:01 AM Shoaib Meenai via llvm-dev < >> llvm-dev at lists.llvm.org> wrote: >> >> As part of our work on LLD for Mach-O, we’ve observed that the object >> files produced by LLVM don’t always have aligned nlist_64 entries. For >> context, nlist_64 is the 64-bit symbol table entry structure for Mach-O, >> and the symbol table is an array of nlist_64 entries. nlist_64 has an 8 >> byte member, so it should be 8-byte aligned, but we’ve seen object files >> where the symbol table only has a 4-byte alignment. >> >> I don't know if Mach-O mandates an alignment for the symbol table (I >> haven't found any such requirements in the documentation), but it would be >> convenient for MC to emit an 8-byte aligned symbol table. We parse an input >> file in LLD by memory mapping it and using casts to access the various >> structures. When the symbol table isn't 8-byte aligned, we're accessing the >> nlist_64 entries in an unaligned fashion, which makes UBSAN (rightly) >> unhappy. As far as I can see, ld64 also does pointer arithmetic and casting >> to parse the symbol table, so it would run into the same issue. Does anyone >> see any problems with making MC always emit an 8-byte aligned symbol table >> for 64-bit Mach-O files, to avoid this issue? >> >> Of course, we should still handle object files we see in the wild without >> the 8-byte alignment. From what I understand, the blessed way to handle a >> potentially unaligned access is with a memcpy, and the compiler should be >> able to optimize out the memcpy on architectures that support unaligned >> accesses. A coworker is implementing that approach in >> https://reviews.llvm.org/D80414 >> <https://urldefense.proofpoint.com/v2/url?u=https-3A__reviews.llvm.org_D80414&d=DwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=o3kDXzdBUE3ljQXKeTWOMw&m=Ux8t6Su4JVUd45UmVgQRCU2YZHX4HTWB1742jKxnhJo&s=Lg9_RU6QSNSHF88ry4t5DTpADTj7oQ_ur3L5j-mCIf8&e=>, >> but before we go ahead with that, I wanted to confirm (a) how reliable the >> memcpy optimization is across all our supported compilers (in particular, >> I've seen MSVC struggle with this optimization in the past), because we >> really don't want to be paying the cost of an actual memcpy in this >> codepath, and (b) if there's other good ways to handle this that we could >> explore. >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> <https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=DwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=o3kDXzdBUE3ljQXKeTWOMw&m=Ux8t6Su4JVUd45UmVgQRCU2YZHX4HTWB1742jKxnhJo&s=ya3X_moM7YSEY6nBjT5Aov0B4JH04p7MJHwNSJiCPDw&e=> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200526/be6ca075/attachment.html>