thr3ads.net - llvm dev - [LLVMdev] Extending GC infrastructure for roots in SSA values [Dec 2012]

David Chisnall

2012-Dec-30 10:17 UTC

[LLVMdev] Extending GC infrastructure for roots in SSA values

On 30 Dec 2012, at 01:54, Talin wrote:
> I completely agree with your point about wanting to be able to attach GC
> metadata to a type (rather than attaching it to a value, as is done now). In the
> past, there have been two objections to this approach: first, the overhead that
> would be added to the Pointer type - the vast majority of LLVM users don't
> want to have to pay an extra 4-8 bytes per Pointer type. And second, that all of
> the optimization passes would have to be updated so as to not do illegal
> transformations on a GC type.

There are two other alternatives:

- Use address spaces to separate garbage-collected from non-garbage-collected
pointers.  There is (was?) a plan to add an address space cast instruction and
explicitly disallow bitcasts of pointers between address spaces.  This would
mean that you could have one address space for GC roots, one for GC-allocated
memory and enforce the casts in your front end.  Optimisations would then not be
allowed to change the address space of any pointers, so the GC status would be
preserved.  GC-aware allocations may insert explicit address space casts, where
appropriate.

- Add a new GC'd pointer type, which is an entirely separate type.  This
might make sense, as you ideally want GC'd pointers to be treated
differently from others (e.g. you may not want pointers to the starts of
allocations to be removed).
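A minimal sketch of the address-space approach, written as a toy Python verifier rather than against LLVM's C++ API. The address-space numbers and the `verify_cast` helper are hypothetical, and `addrspacecast` stands in for the address space cast instruction mentioned above. A separate GC'd pointer type would give the same guarantee through the type system instead of a numeric tag.

```python
from dataclasses import dataclass

# Hypothetical address-space numbers a front end might reserve (illustrative only).
ADDR_SPACE_DEFAULT = 0  # ordinary, non-GC memory
ADDR_SPACE_GC_ROOT = 1  # pointers that must be reported as GC roots
ADDR_SPACE_GC_HEAP = 2  # pointers into GC-allocated memory


@dataclass(frozen=True)
class PointerType:
    pointee: str
    addr_space: int = ADDR_SPACE_DEFAULT


class VerifierError(Exception):
    pass


def verify_cast(opcode: str, src: PointerType, dst: PointerType) -> None:
    """Reject any cast that would silently move a pointer between address spaces."""
    if opcode == "bitcast":
        # A bitcast may change the pointee type but never the address space,
        # so no optimisation can launder a GC pointer into a plain one.
        if src.addr_space != dst.addr_space:
            raise VerifierError(
                f"bitcast across address spaces {src.addr_space} -> {dst.addr_space}"
            )
    elif opcode == "addrspacecast":
        # The only sanctioned way to cross address spaces; because it is explicit,
        # every GC/non-GC transition stays visible to later passes.
        if src.addr_space == dst.addr_space:
            raise VerifierError("addrspacecast must change the address space")
    else:
        raise VerifierError(f"unknown cast opcode {opcode!r}")


gc_ptr = PointerType("i8", ADDR_SPACE_GC_HEAP)
plain_ptr = PointerType("i8")
verify_cast("bitcast", gc_ptr, PointerType("i32", ADDR_SPACE_GC_HEAP))  # allowed
verify_cast("addrspacecast", gc_ptr, plain_ptr)  # allowed: explicit transition
```

Calling `verify_cast("bitcast", gc_ptr, plain_ptr)` raises `VerifierError`, which is the property that would let a front end rely on optimisations never changing a pointer's GC status.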

For languages like OCaml, you also want to be able to do escape analysis on
GC'd pointers to get good performance (so you don't bother tracing ones
that can't possibly escape).  This ideally requires a pass that will
recursively and automatically apply nocapture attributes to arguments.  In
functional languages, this ends up being almost all allocations, so you can
allocate them either on the stack or on a separate bump-the-pointer allocator
and delete them on function return by just resetting the pointer.  This means
that you would want to be able to have transforms that lowered GC'd pointers
to stack or heap pointers.
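A sketch of the nocapture inference such a pass would need, assuming a hypothetical per-function summary of how each parameter is used (this is not LLVM's IR or its actual implementation). It computes an optimistic fixpoint, so a recursive function that only passes its argument to itself keeps the attribute.

```python
from __future__ import annotations

# Hypothetical summary of parameter uses, per function and parameter index:
#   ("store",)               stored to memory, so it escapes
#   ("return",)              returned to the caller, so it escapes
#   ("call", callee, index)  passed as argument `index` of `callee`


def infer_nocapture(program: dict[str, dict[int, list[tuple]]]) -> dict[str, set[int]]:
    """Return, for each function, the parameter indices that never escape.

    Starts optimistically (everything is nocapture) and demotes parameters until
    nothing changes, so recursive call cycles that never leak a value keep the
    attribute.
    """
    nocapture = {fn: set(params) for fn, params in program.items()}
    changed = True
    while changed:
        changed = False
        for fn, params in program.items():
            for index, uses in params.items():
                if index not in nocapture[fn]:
                    continue
                for use in uses:
                    if use[0] == "call":
                        _, callee, arg = use
                        # Callees outside the program are conservatively capturing.
                        escapes = arg not in nocapture.get(callee, set())
                    else:
                        # Stores, returns, and any unmodelled use all escape.
                        escapes = True
                    if escapes:
                        nocapture[fn].discard(index)
                        changed = True
                        break
    return nocapture


program = {
    # Walks a list recursively; only ever passes its argument to itself.
    "sum_list": {0: [("call", "sum_list", 0)]},
    # Stashes its argument in a global.
    "keep": {0: [("store",)]},
    # Forwards argument 0 to sum_list and argument 1 to keep.
    "wrapper": {0: [("call", "sum_list", 0)], 1: [("call", "keep", 0)]},
}
result = infer_nocapture(program)
```

Here `result` marks the argument of `sum_list` and the first argument of `wrapper` as nocapture, while `keep` and the second argument of `wrapper` capture. Allocations that only ever flow into nocapture parameters are the candidates for stack or bump-pointer allocation.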

In some implementations, GC'd pointers are fat pointers, so they should not
be represented as PointerType in the IR or as iPTR in the back end.
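To illustrate why such a value does not fit `iPTR`, here is a hypothetical fat-pointer layout described with Python's `ctypes`. The field names and the meaning of the second word are assumptions; the only point is that the reference is two machine words wide, so it cannot be carried in a single pointer-sized value.

```python
import ctypes


class FatGCPointer(ctypes.Structure):
    """Hypothetical fat GC reference: an address word plus a descriptor word.

    What the second word holds (a trace-table pointer, a bound, a type id)
    varies by implementation; here it is just an opaque word.
    """

    _fields_ = [
        ("address", ctypes.c_void_p),     # where the object currently lives
        ("descriptor", ctypes.c_void_p),  # assumed: pointer to tracing metadata
    ]


WORD = ctypes.sizeof(ctypes.c_void_p)
ref = FatGCPointer(0x1000, 0x2000)
print(f"pointer word: {WORD} bytes, fat reference: {ctypes.sizeof(FatGCPointer)} bytes")
```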

David

Talin

2012-Dec-30 19:02 UTC

[LLVMdev] Extending GC infrastructure for roots in SSA values

On Sun, Dec 30, 2012 at 2:17 AM, David Chisnall <David.Chisnall at cl.cam.ac.uk> wrote:
> On 30 Dec 2012, at 01:54, Talin wrote:
>
> > I completely agree with your point about wanting to be able to attach GC
> > metadata to a type (rather than attaching it to a value, as is done now).
> > In the past, there have been two objections to this approach: first, the
> > overhead that would be added to the Pointer type - the vast majority of
> > LLVM users don't want to have to pay an extra 4-8 bytes per Pointer type.
> > And second, that all of the optimization passes would have to be updated so
> > as to not do illegal transformations on a GC type.
>
> There are two other alternatives:
>
> - Use address spaces to separate garbage-collected from
> non-garbage-collected pointers.  There is (was?) a plan to add an address
> space cast instruction and explicitly disallow bitcasts of pointers between
> address spaces.  This would mean that you could have one address space for
> GC roots, one for GC-allocated memory and enforce the casts in your front
> end.  Optimisations would then not be allowed to change the address space
> of any pointers, so the GC status would be preserved.  GC-aware allocations
> may insert explicit address space casts, where appropriate.
>
This works fine for languages like Java where every object has a type field
that describes how to trace it. However, the existing LLVM intrinsics also
support the case where the type information is only known statically by the
compiler instead of at runtime - the metadata argument allows the compiler
to pass a trace table to the GC plugin. Trying to encode that information
into a single address space integer would be painful.
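A minimal sketch of the tagless case described here, in Python with a hypothetical trace-table format: the collector learns which words are pointers only from the table attached to each root, never from the object itself. Because a table can describe nested and recursive types, it is a data structure rather than something a single address-space number could encode.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class TraceTable:
    """Compiler-emitted layout for one static type (hypothetical format).

    Maps a word index inside the object to the TraceTable of whatever that
    word points to. Heap objects carry no type header, so this static table is
    the only way the collector knows which words are pointers.
    """

    pointer_fields: dict[int, TraceTable] = field(default_factory=dict)


def mark(heap: dict[int, list[int]], roots: list[tuple[int, TraceTable]]) -> set[int]:
    """Return the addresses of all objects reachable from `roots`.

    Each root pairs the address loaded from a stack slot with the trace table
    the compiler attached to that slot (the role played by the metadata
    argument of llvm.gcroot). Address 0 is treated as null.
    """
    marked: set[int] = set()
    worklist = list(roots)
    while worklist:
        address, table = worklist.pop()
        if address == 0 or address in marked:
            continue
        marked.add(address)
        words = heap[address]
        for index, child_table in table.pointer_fields.items():
            worklist.append((words[index], child_table))
    return marked


# A cons cell: word 0 is an unboxed integer, word 1 points to the next cell.
cons = TraceTable()
cons.pointer_fields[1] = cons  # the tail has the same static type

heap = {
    0x10: [0x30, 0x20],  # head is the integer 48, which merely looks like an address
    0x20: [8, 0],        # last cell: tail is null
    0x30: [9, 0],        # unreachable
}
live = mark(heap, [(0x10, cons)])
```

The head of the first cell holds `0x30`, but since the table marks only word 1 as a pointer, the unreachable object at `0x30` is not retained.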



-- 
-- Talin

Benjamin Saunders

2012-Dec-31 00:16 UTC

[LLVMdev] Extending GC infrastructure for roots in SSA values

On Sun, Dec 30, 2012 at 11:02 AM, Talin <viridia at gmail.com> wrote:
> On Sun, Dec 30, 2012 at 2:17 AM, David Chisnall
> <David.Chisnall at cl.cam.ac.uk> wrote:
>>
>> On 30 Dec 2012, at 01:54, Talin wrote:
>>
>> > I completely agree with your point about wanting to be able to attach GC
>> > metadata to a type (rather than attaching it to a value, as is done now). In
>> > the past, there have been two objections to this approach: first, the
>> > overhead that would be added to the Pointer type - the vast majority of LLVM
>> > users don't want to have to pay an extra 4-8 bytes per Pointer type. And
>> > second, that all of the optimization passes would have to be updated so as
>> > to not do illegal transformations on a GC type.
>>
>> There are two other alternatives:
>>
>> - Use address spaces to separate garbage-collected from
>> non-garbage-collected pointers.  There is (was?) a plan to add an address
>> space cast instruction and explicitly disallow bitcasts of pointers between
>> address spaces.  This would mean that you could have one address space for
>> GC roots, one for GC-allocated memory and enforce the casts in your front
>> end.  Optimisations would then not be allowed to change the address space of
>> any pointers, so the GC status would be preserved.  GC-aware allocations may
>> insert explicit address space casts, where appropriate.
>
>
> This works fine for languages like Java where every object has a type field
> that describes how to trace it. However, the existing LLVM intrinsics also
> support the case where the type information is only known statically by the
> compiler instead of at runtime - the metadata argument allows the compiler
> to pass a trace table to the GC plugin. Trying to encode that information
> into a single address space integer would be painful.

Indeed; this sort of tagless GC is exactly what I want to support. An
interesting note is that Rust currently implements exactly the
workaround you describe (see
https://github.com/elliottslaughter/rust-gc-notes), but, hackiness
aside, this has also caused problems with attempts to support targets
where address spaces actually have special meaning (see
http://blog.theincredibleholk.org/blog/2012/12/05/compiling-rust-for-gpus/).
I'm certain they'd be happy to be able to replace that with an
approach such as we're discussing.

>> - Add a new GC'd pointer type, which is an entirely separate type.  This
>> might make sense, as you ideally want GC'd pointers to be treated
>> differently from others (e.g. you may not want pointers to the starts of
>> allocations to be removed)

This seems like a good approach for what I originally described
(though I like Talin's proposal better) in that it's functionally
equivalent, but avoids the unnecessary overhead for non-GCing LLVM
users. Though is it really plausible that anyone would care about an
extra 4-8 bytes per PointerType? I haven't profiled, but I can't
imagine that would be a dramatic increase in the resource usage of the
tools, or of a compiler using them.

>> For languages like OCaml, you also want to be able to do escape analysis
>> on GC'd pointers to get good performance (so you don't bother tracing ones
>> that can't possibly escape).  This ideally requires a pass that will
>> recursively and automatically apply nocapture attributes to arguments.  In
>> functional languages, this ends up being almost all allocations, so you can
>> allocate them either on the stack or on a separate bump-the-pointer
>> allocator and delete them on function return by just resetting the pointer.
>> This means that you would want to be able to have transforms that lowered
>> GC'd pointers to stack or heap pointers.

Is there any particular reason to expect that supporting this would
pose a problem? What might prevent it?

>> In some implementations, GC'd pointers are fat pointers, so they should
>> not be represented as PointerType in the IR or as iPTR in the back end.

I expect that implementation techniques in those cases would be
unaffected by my proposed changes.
>> David
