Somehow this cover letter was dropped from my symbol offsets patch set:

 1. http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073200.html
 2. http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073201.html

Original message
-----------------

About a year ago a proposal suggesting symbol offsets was brought to this list[1]. This proposal goes hand-in-hand with the prefix data proposal[2], which has now been implemented, and I believe both of these arose in part due to GHC's requirement to place its info tables before symbol definitions[3]. Unfortunately, the current implementation of prefix data isn't terribly useful to GHC without symbol offsets[4,5].

This weekend I implemented option (2) in the original proposal, then eventually implemented option (1) on top of it. Here is the result. Note that it can also be found on GitHub[6] for those who prefer that.

A review would be greatly appreciated. One known deficiency of this set is the lack of tests. Unfortunately, due to the use of temporary symbols, it's not clear to me how this feature can be reliably tested. Ideas are welcome.

Cheers,

- Ben

[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-April/061511.html
[2] http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-July/063909.html
[3] http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-February/047514.html
[4] http://www.haskell.org/pipermail/ghc-devs/2013-September/002565.html
[5] https://ghc.haskell.org/trac/ghc/ticket/4213#comment:12
[6] https://github.com/bgamari/llvm/compare/symbol-offset
I'm a little concerned we got prefix data wrong. We had the following motivating use cases:

 1. Function prologue sigils, where we emit a special nop slide, maybe with data in it. Peter implemented a ubsan feature using this.

 2. Function hotpatching, where we emit some data before the function and a special nop at the start of the function. Typically the nop is 'mov edi, edi' on x86 Windows, preceded by five bytes of padding for a long jump. Profilers can use this to turn instrumentation of a running binary on and off.

 3. Tables-before-code, where data is completely prior to the code. GHC needs this.

In all cases, any code inside the prologue has no meaning to LLVM. Inlining a function with a funky prologue is completely valid.

I worry that symbol_offset combined with prefix is too low-level. What if we split this up into something like prefix data and "prologue" data? Prefix data would be an arbitrary LLVM constant, and prologue data a byte sequence of native executable code. Something like:

    define void @foo() prefix [2 x i8*] [i8* @a, i8* @b] prologue [4 x i8] c"\DE\AD\BE\EF" {
      ret void
    }

I think the two forms are fundamentally equivalent as far as optimizations like global constant propagation are concerned, but it'd be nice to have an intuitive representation. One of the strengths of LLVM's IL is that it's comprehensible to mere mortal compiler engineers, and not just to computer programs.

---

P.S. You could also represent this with aliases with a non-zero offset from the beginning of the function. Rafael is implementing this, but I don't think that's a very good representation. What does it mean to inline through an alias to a function with a non-zero offset? We could say that we just ignore the offset for analysis purposes, but it doesn't feel very clean.

On Tue, May 27, 2014 at 6:13 AM, Ben Gamari <bgamari.foss at gmail.com> wrote:
> Somehow this cover letter was dropped from my symbol offsets patch set:
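As a rough illustration of the prefix/prologue split proposed above, the C sketch below models the layout in an ordinary byte buffer rather than in a real object file: the prefix data (a typed constant) sits immediately before the symbol, and the prologue bytes are the first bytes at the symbol. The `info_table` type, the buffer `emitted`, and the helpers `emit` and `prefix_of` are hypothetical, and the prologue bytes are just the placeholder values from the example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical prefix type, standing in for an LLVM constant such as
   a GHC info table. */
typedef struct {
    uint32_t arity;
    uint32_t layout;
} info_table;

/* Placeholder prologue bytes from the example; opaque native code to LLVM. */
static const unsigned char prologue[4] = {0xDE, 0xAD, 0xBE, 0xEF};

/* Simulated emitted bytes: [prefix data][prologue]. */
static unsigned char emitted[sizeof(info_table) + sizeof(prologue)];

/* Lays out the prefix followed by the prologue and returns the simulated
   symbol address, which points at the first prologue byte. */
static const unsigned char *emit(const info_table *prefix) {
    memcpy(emitted, prefix, sizeof *prefix);
    memcpy(emitted + sizeof *prefix, prologue, sizeof prologue);
    return emitted + sizeof *prefix;
}

/* Recovers the prefix from the symbol: it occupies "index -1" when the
   symbol is viewed as a pointer to the prefix type. */
static info_table prefix_of(const unsigned char *symbol) {
    info_table t;
    memcpy(&t, symbol - sizeof t, sizeof t);
    return t;
}
```

For example, after `const unsigned char *sym = emit(&(info_table){2, 7});`, `prefix_of(sym).arity` is 2 and `sym[0]` is the first prologue byte. Reading the prefix back just before the symbol is the kind of access a tables-before-code consumer would rely on, while the prologue stays opaque bytes whose meaning LLVM never needs to know.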
On Tue, May 27, 2014 at 5:48 PM, Reid Kleckner <rnk at google.com> wrote:
> P.S. You could also represent this with aliases with a non-zero offset
> from the beginning of the function. Rafael is implementing this, but I
> don't think that's a very good representation.

I mean, I don't think it's a good representation for tables-before-code. I think it's a perfectly reasonable way to represent MSVC-style vftables that have RTTI data, which we plan to do.

> What does it mean to inline through an alias to a function with a non-zero
> offset? We could say that we just ignore the offset for analysis purposes,
> but it doesn't feel very clean.
On Tue, May 27, 2014 at 05:48:32PM -0700, Reid Kleckner wrote:
> I worry that symbol_offset combined with prefix is too low-level. What if
> we split this up into something like prefix data and "prologue" data? Prefix
> data would be an arbitrary LLVM constant, and prologue data is a byte
> sequence of native executable code.

I like this proposal. Now that I've thought about it more, I think it might not be too important for global variables and functions to share a similar representation for offsets.

One comment, though. Before, when I was thinking about calls to functions with prefix data in cases where the function entry point appears after the data (i.e. GHC's use case), I was imagining that we could have two new properties for functions: the symbol offset and the entry point offset. UBSan etc. would set both to zero, while GHC would set the former to zero and the latter to the size of the prefix.

Given that we need to cater to platforms where the function's metadata cannot appear before the function's symbol (which I believe to be the case on at least Darwin), we need some way of representing the distance between the symbol and the entry point in external function declarations. Under your proposal, we could probably do that by having a way of representing the type of the prefix separately from its "initializer".

Thanks,
--
Peter
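Peter's two properties can be read as plain address arithmetic. The sketch below assumes (an interpretation, not something stated in the thread) that both offsets are measured from the first byte emitted for the function, prefix included; `fn_layout`, `symbol_to_entry` and `call_target` are hypothetical names, not LLVM APIs. It shows why an external declaration needs the symbol-to-entry distance: a caller that only knows the symbol cannot otherwise find the entry point.

```c
#include <stdint.h>

/* Hypothetical per-function layout properties. Both offsets are assumed
   to be measured from the first byte emitted for the function. */
typedef struct {
    uintptr_t symbol_offset;      /* where the symbol points */
    uintptr_t entry_point_offset; /* where execution begins */
} fn_layout;

/* Signed distance from the symbol to the entry point; this is the value
   an external declaration would have to carry. */
static intptr_t symbol_to_entry(fn_layout l) {
    return (intptr_t)l.entry_point_offset - (intptr_t)l.symbol_offset;
}

/* Computes the call target from the symbol's address. Unsigned
   wrap-around makes a negative distance subtract correctly. */
static uintptr_t call_target(uintptr_t symbol_addr, fn_layout l) {
    return symbol_addr + (uintptr_t)symbol_to_entry(l);
}
```

Under this reading, UBSan-style functions use `{0, 0}`, so the call target is the symbol itself, while a GHC-style function with a hypothetical 16-byte info table uses `{0, 16}`, so a call through a symbol at 0x1000 lands at 0x1010.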
Now that aliases can have any expressions, can't you use something like

    @data = private global [2 x i32] [i32 42, i32 43]
    @symbol = alias getelementptr ([2 x i32]* @data, i32 0, i32 1)

This produces

    .Ldata:
            .long   42                      # 0x2a
            .long   43                      # 0x2b
    ...
            .globl  symbol
    symbol = .Ldata+4

That is, in the object file there is only one symbol (named symbol) and it is at offset 4.

On 27 May 2014 09:13, Ben Gamari <bgamari.foss at gmail.com> wrote:
> Somehow this cover letter was dropped from my symbol offsets patch set:
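The `+4` in `symbol = .Ldata+4` follows directly from the `getelementptr` indices: the leading 0 steps over whole `[2 x i32]` arrays (8 bytes each) and the 1 steps over one `i32` (4 bytes), so the offset is 0 * 8 + 1 * 4 = 4. The C sketch below is only an analogue that checks this arithmetic, since C cannot emit the alias itself; `data`, `symbol` and `symbol_offset` are illustrative names chosen to mirror the IR.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors @data = private global [2 x i32] [i32 42, i32 43]. */
static const int32_t data[2] = {42, 43};

/* Mirrors the alias target getelementptr(@data, 0, 1): the leading 0
   selects the array itself and the 1 selects its second element. */
static const int32_t *const symbol = &data[1];

/* Byte offset of symbol from the start of data, i.e. the "+4" in
   symbol = .Ldata+4. */
static ptrdiff_t symbol_offset(void) {
    return (const char *)symbol - (const char *)data;
}
```

Dereferencing `symbol` gives 43, the second element, which is what a load through `@symbol` would see.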
Rafael Espíndola <rafael.espindola at gmail.com> writes:
> Now that aliases can have any expressions, can't you use something like
>
>     @data = private global [2 x i32] [i32 42, i32 43]
>     @symbol = alias getelementptr ([2 x i32]* @data, i32 0, i32 1)

I believe we could, but I'll have to try implementing it to know for certain. Thanks for the suggestion!

Cheers,

- Ben
Rafael Espíndola <rafael.espindola at gmail.com> writes:
> Now that aliases can have any expressions, can't you use something like
>
>     @data = private global [2 x i32] [i32 42, i32 43]
>     @symbol = alias getelementptr ([2 x i32]* @data, i32 0, i32 1)
>
> That is, in the object file there is only one symbol (named symbol)
> and it is at offset 4.

How would one define the body of the function `symbol` in this case?

Cheers,

- Ben