thr3ads.net - llvm dev - [LLVMdev] [PATCH] Symbol offsets [May 2014]

Ben Gamari

2014-May-27 13:13 UTC

[LLVMdev] [PATCH] Symbol offsets

Somehow this cover letter was dropped from my symbol offsets patch set:

  1. http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073200.html
  2. http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073201.html


Original message
-----------------

About a year ago, a proposal for symbol offsets was brought to
this list[1]. It goes hand-in-hand with the prefix data proposal[2],
which has now been implemented. I believe both arose in part from
GHC's requirement to place its info tables before symbol
definitions[3]. Unfortunately, the current implementation of prefix
data isn't terribly useful to GHC without symbol offsets[4,5].
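
As a rough illustration of the layout GHC is after, the following C sketch models an info table placed directly before a function's entry code and recovered from the entry pointer alone. It is only a model under stated assumptions: `InfoTable`, `CodeBlob`, and `info_from_entry` are hypothetical names, the two table fields are placeholders rather than GHC's real layout, and a struct stands in for what a code generator would emit as data followed by machine code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified info table; not GHC's actual field layout. */
typedef struct {
    uint32_t closure_type;
    uint32_t arity;
} InfoTable;

/* Model of "tables before code": the table is laid out immediately
 * before the entry code, and the symbol is meant to name `entry`. */
typedef struct {
    InfoTable info;          /* plays the role of prefix data */
    unsigned char entry[4];  /* stand-in for the function's machine code */
} CodeBlob;

/* Given only the entry pointer, step back to the table in front of it. */
static const InfoTable *info_from_entry(const unsigned char *entry) {
    assert(entry != NULL);
    return (const InfoTable *)(entry - offsetof(CodeBlob, entry));
}
```

The point of the sketch is the negative offset: a consumer holding only the entry address can reach the table without a second symbol, which is why the position of the symbol relative to the prefix data matters.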

This weekend I implemented option (2) in the original proposal, then
eventually implemented option (1) on top of this. Here is the
result. The patches can also be found on GitHub[6] for those who
prefer that.

A review would be greatly appreciated. One known deficiency of this set
is the lack of tests. Unfortunately, due to the use of temporary symbols
it's not clear to me how this feature can be reliably tested. Ideas
are welcome.

Cheers,

- Ben


[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-April/061511.html
[2] http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-July/063909.html
[3] http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-February/047514.html
[4] http://www.haskell.org/pipermail/ghc-devs/2013-September/002565.html
[5] https://ghc.haskell.org/trac/ghc/ticket/4213#comment:12
[6] https://github.com/bgamari/llvm/compare/symbol-offset


Reid Kleckner

2014-May-28 00:48 UTC

I'm a little concerned we got prefix data wrong.  We had the following
motivating use cases:

1. Function prologue sigils, where we emit a special nop slide, maybe with
data in it.  Peter implemented a ubsan feature using this.

2. Function hotpatching, where we emit some data before the function and a
special nop before the function.  Typically the nop is 'mov edi, edi' on
x86 Windows, preceded by five bytes of padding for a long jump.  Profilers
can use this to turn on and off instrumentation of a running binary.

3. Tables-before-code, where data is completely prior to the code.  GHC
needs this.
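
To make use case 2 concrete, here is a small C sketch of the hotpatch layout on 32-bit x86: five padding bytes, then the two-byte `mov edi, edi`, and a patch step that writes a long jump into the padding and turns the nop into a short jump back to it. The function names, the choice of 0xCC as the padding byte, and the use of plain integers for addresses are assumptions of this sketch; it manipulates a byte buffer and does not patch live code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { PAD_BYTES = 5, NOP_BYTES = 2, STUB_BYTES = PAD_BYTES + NOP_BYTES };

/* Fill `buf` with the unpatched layout: five padding bytes followed by
 * the two-byte nop `mov edi, edi` (8B FF). Returns the entry offset. */
static size_t emit_hotpatch_stub(unsigned char *buf) {
    assert(buf != NULL);
    memset(buf, 0xCC, PAD_BYTES);   /* padding byte is an assumption */
    buf[PAD_BYTES] = 0x8B;          /* mov edi, edi */
    buf[PAD_BYTES + 1] = 0xFF;
    return PAD_BYTES;
}

/* Redirect the function. `entry_addr` and `target_addr` are hypothetical
 * run-time addresses of the entry point and of the replacement code. */
static void apply_hotpatch(unsigned char *buf, uint32_t entry_addr,
                           uint32_t target_addr) {
    /* The long jump occupies entry-5 .. entry-1, so the instruction
     * following it is at `entry_addr`; rel32 is relative to that. */
    uint32_t rel32 = target_addr - entry_addr;
    assert(buf != NULL);
    buf[0] = 0xE9;                                  /* jmp rel32 */
    buf[1] = (unsigned char)(rel32 & 0xFF);         /* little-endian */
    buf[2] = (unsigned char)((rel32 >> 8) & 0xFF);
    buf[3] = (unsigned char)((rel32 >> 16) & 0xFF);
    buf[4] = (unsigned char)((rel32 >> 24) & 0xFF);
    buf[PAD_BYTES] = 0xEB;                          /* jmp short */
    buf[PAD_BYTES + 1] = 0xF9;                      /* -7: entry+2 to entry-5 */
}
```

None of these bytes mean anything to the optimizer, which is the property the list above relies on: the body of the function can still be analysed and inlined as if the stub were not there.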

In all cases, any code inside the prologue had no meaning to LLVM.
Inlining a function with a funky prologue is completely valid.

I worry that symbol_offset combined with prefix is too low-level.  What if
we split this up into something like prefix data and "prologue" data?
Prefix data would be an arbitrary LLVM constant, and prologue data would be
a byte sequence of native executable code.  Something like:

define void @foo() prefix [2 x i8*] [i8* @a, i8* @b]
                   prologue [4 x i8] c"\DE\AD\BE\EF" { ret void }

I think the two forms are fundamentally equivalent to optimizations like
global constant propagation, but it'd be nice to have an intuitive
representation.  One of the strengths of LLVM's IL is that it's
comprehensible to mere mortal compiler engineers, and not just computer
programs.

---

P.S. You could also represent this with aliases with a non-zero offset from
the beginning of the function.  Rafael is implementing this, but I don't
think that's a very good representation.  What does it mean to inline
through an alias to a function with a non-zero offset?  We could say that
we just ignore the offset for analysis purposes, but it doesn't feel very
clean.


Reid Kleckner

2014-May-28 00:50 UTC

On Tue, May 27, 2014 at 5:48 PM, Reid Kleckner <rnk at google.com> wrote:
> P.S. You could also represent this with aliases with a non-zero offset
> from the beginning of the function.  Rafael is implementing this, but I
> don't think that's a very good representation.
>
I mean, I don't think it's a good representation for tables-before-code.  I
think it's a perfectly reasonable way to represent MSVC-style vftables that
have RTTI data, which we plan to do.

> What does it mean to inline through an alias to a function with a non-zero
> offset?  We could say that we just ignore the offset for analysis purposes,
> but it doesn't feel very clean.

Peter Collingbourne

2014-May-28 01:55 UTC

On Tue, May 27, 2014 at 05:48:32PM -0700, Reid Kleckner wrote:
> I'm a little concerned we got prefix data wrong.  We had the following
> motivating use cases:
> 
> 1. Function prologue sigils, where we emit a special nop slide, maybe with
> data in it.  Peter implemented a ubsan feature using this.
> 
> 2. Function hotpatching, where we emit some data before the function and a
> special nop before the function.  Typically the nop is 'mov edi, edi' on
> x86 Windows, preceded by five bytes of padding for a long jump.  Profilers
> can use this to turn on and off instrumentation of a running binary.
> 
> 3. Tables-before-code, where data is completely prior to the code.  GHC
> needs this.
> 
> In all cases, any code inside the prologue had no meaning to LLVM.
> Inlining a function with a funky prologue is completely valid.
> 
> I worry that symbol_offset combined with prefix is too low-level.  What if
> we split this up into something like prefix data and "prologue" data?
> Prefix data would be an arbitrary LLVM constant, and prologue data would be
> a byte sequence of native executable code.  Something like:
> 
> define void @foo() prefix [2 x i8*] [i8* @a, i8* @b]
>                    prologue [4 x i8] c"\DE\AD\BE\EF" { ret void }
> 
> I think the two forms are fundamentally equivalent to optimizations like
> global constant propagation, but it'd be nice to have an intuitive
> representation.  One of the strengths of LLVM's IL is that it's
> comprehensible to mere mortal compiler engineers, and not just computer
> programs.

I like this proposal. Now that I've thought about it more, I think it might
not be too important for global variables and functions to share a similar
representation for offsets. One comment though.

Before, when I was thinking about calls to functions with prefix data in
cases where the function entry point appears after the data (i.e. GHC's use
case), I was imagining that we could have two new properties for functions:
the symbol offset and the entry point offset. UBSan etc. would set both to
zero, while GHC would set the former to zero and the latter to the size of
the prefix.
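
The two-offset idea in the paragraph above can be sketched in C as follows. This is an illustrative model only: `FunctionLayout` and the helper names are hypothetical, and measuring both offsets from the first byte emitted for the function is an assumption of the sketch rather than something the thread pins down.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one function's emitted bytes. Offsets are
 * measured from the first byte emitted for the function (an assumption). */
typedef struct {
    size_t prefix_size;   /* bytes of prefix data */
    size_t symbol_offset; /* where the function's symbol points */
    size_t entry_offset;  /* where calls should land */
} FunctionLayout;

/* UBSan-style use: symbol and entry point coincide. */
static FunctionLayout ubsan_style_layout(size_t prefix_size) {
    FunctionLayout l = { prefix_size, 0, 0 };
    return l;
}

/* GHC-style use: symbol at the start, entry point after the prefix. */
static FunctionLayout ghc_style_layout(size_t prefix_size) {
    FunctionLayout l = { prefix_size, 0, prefix_size };
    return l;
}

/* Distance a caller adds to the symbol address to reach the entry point. */
static size_t call_adjustment(FunctionLayout l) {
    assert(l.entry_offset >= l.symbol_offset);
    return l.entry_offset - l.symbol_offset;
}
```

In this model the adjustment is the one number an external declaration would need to carry, which is the role the next paragraph assigns to the type of the prefix.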

Provided that we need to cater to platforms where the function's metadata
cannot appear before the function's symbol (which I believe to be the case
on at least Darwin) we need some way of representing the distance between
the symbol and the entry point in external function declarations. Under your
proposal, we could probably do that by having a way of representing the type
of the prefix separately from its "initializer".

Thanks,
-- 
Peter

Rafael Espíndola

2014-Jun-10 23:30 UTC

Now that aliases can have any expressions, can't you use something like

@data = private global [2 x i32] [i32 42, i32 43]
@symbol = alias getelementptr ([2 x i32]* @data, i32 0, i32 1)

This produces

.Ldata:
        .long   42                      # 0x2a
        .long   43                      # 0x2b
...
        .globl  symbol
symbol = .Ldata+4

That is, in the object file there is only one symbol (named symbol)
and it is at offset 4.
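
For readers more at home in C than in IR, the following is a loose analogy to the alias above, not an equivalent: here `symbol` is a pointer object with its own storage, whereas the IR alias only gives a second name to an address inside `data`. The helper function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int32_t data[2] = { 42, 43 };      /* plays the role of .Ldata */
static int32_t *const symbol = &data[1];  /* plays the role of .Ldata+4 */

/* Byte distance from the start of `data` to the address `symbol` names. */
static ptrdiff_t symbol_byte_offset(void) {
    return (const char *)symbol - (const char *)data;
}

/* Read the word that sits just before the address `symbol` names. */
static int32_t word_before_symbol(void) {
    assert(symbol > data);
    return symbol[-1];
}
```

The second helper shows the access pattern that matters for tables-before-code: data at a negative offset from the one exported name.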



Ben Gamari

2014-Jul-15 17:26 UTC

Rafael Espíndola <rafael.espindola at gmail.com> writes:
> Now that aliases can have any expressions, can't you use something like
>
> @data = private global [2 x i32] [i32 42, i32 43]
> @symbol = alias getelementptr ([2 x i32]* @data, i32 0, i32 1)
>
> This produces
>
> .Ldata:
>         .long   42                      # 0x2a
>         .long   43                      # 0x2b
> ...
>         .globl  symbol
> symbol = .Ldata+4
>
> That is, in the object file there is only one symbol (named symbol)
> and it is at offset 4.
>

I believe we could, but I'll have to try implementing it to know for
certain. Thanks for the suggestion!

Cheers,

- Ben


Ben Gamari

2014-Jul-20 22:18 UTC

Rafael Espíndola <rafael.espindola at gmail.com> writes:
> Now that aliases can have any expressions, can't you use something like
>
> @data = private global [2 x i32] [i32 42, i32 43]
> @symbol = alias getelementptr ([2 x i32]* @data, i32 0, i32 1)
>
> This produces
>
> .Ldata:
>         .long   42                      # 0x2a
>         .long   43                      # 0x2b
> ...
>         .globl  symbol
> symbol = .Ldata+4
>
> That is, in the object file there is only one symbol (named symbol)
> and it is at offset 4.
>

How would one define the body of the function `symbol` in this case?

Cheers,

- Ben
