thr3ads.net - llvm dev - [LLVMdev] Extending GC infrastructure for roots in SSA values [Dec 2012]

Benjamin Saunders

2012-Dec-28 21:09 UTC

[LLVMdev] Extending GC infrastructure for roots in SSA values

I'm working on an LLVM backend for Idris, a garbage-collected pure
functional programming language, and have experienced some frustration
that LLVM's GC support, specifically with regard to mapping roots,
operates only on allocas. This entails a lot of otherwise unnecessary
stack allocation (especially in a pure language, where in-place
mutation is rare) and imposes limitations on what optimizations can be
applied. Other LLVM users have used elaborate workarounds to this,
such as Rust's use of address spaces and, I believe, GHC's specialized
calling convention. I'm interested in extending LLVM to support GC
roots in regular SSA values, but, that being a significant change,
it's clear that some discussion is in order before diving in if I want
to get such a patch merged.
This topic has been discussed on multiple previous occasions, and in
each case nothing seems to have come of it, though interest appears to
be significant. In particular, concerns with how such infrastructure
could be made to abide by the invariants of arbitrary GC algorithms
seem to have stayed hands. It's not clear to me why that poses a
problem--if the property of being a GC root is correctly propagated
through all manipulations of a pointer, and that information tracked
through register allocation and made available to the GC metadata
printer, won't the resulting system have no more limitations or
constraints than the current one? A copying collector would, having a
complete list of root locations, still be able to rewrite them; a
mark-and-sweep collector would still be able to find everything in
need of marking; and so on.
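
As an illustration of the claim above, here is a minimal sketch of a copying collector that needs only a list of root locations to do its job. Everything in it (the Obj class, the dictionary standing in for a frame, the slot names) is a hypothetical model, not LLVM's GC interface.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """A heap object: a payload plus the heap addresses it references."""
    payload: str
    refs: list = field(default_factory=list)


def copy_collect(from_space, root_slots, frame):
    """Copy everything reachable from the listed root slots into a new heap.

    `root_slots` plays the role of the root map: it names the slots in
    `frame` that hold heap addresses. Those slots are rewritten in place.
    """
    to_space = []
    forward = {}  # old address -> new address

    def evacuate(addr):
        # Copy an object the first time it is seen, then reuse its new address.
        if addr not in forward:
            forward[addr] = len(to_space)
            old = from_space[addr]
            to_space.append(Obj(old.payload, list(old.refs)))
        return forward[addr]

    # Rewrite every root location named by the map.
    for slot in root_slots:
        frame[slot] = evacuate(frame[slot])

    # Scan the copied objects, evacuating whatever they reference.
    scan = 0
    while scan < len(to_space):
        obj = to_space[scan]
        obj.refs = [evacuate(r) for r in obj.refs]
        scan += 1
    return to_space


# Address 0 is unreachable; address 2 references address 1.
heap = [Obj("garbage"), Obj("leaf"), Obj("node", refs=[1])]
# Only "r1" is listed as a root; "r0" holds an unrelated integer.
frame = {"r0": 12345, "r1": 2}
new_heap = copy_collect(heap, root_slots=["r1"], frame=frame)
```

Whether a slot models an alloca or a register makes no difference to the collector here; it only needs the list to be complete and the slots to be writable.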
If my understanding above is correct, then perhaps the challenge lies
in correctly propagating the marking of a pointer in an SSA value as a
root through transforms. The pattern of propagation desired seems
identical to that of type information, so perhaps it would be best to
make the marking of a pointer as a GC root an attribute of its type,
much the way address spaces already work? Recall again Rust's approach
here, where the behavior of address space information through
transforms is exactly what is relied upon.
It's easy to imagine a GC root flag on pointer types, but one still
needs to attach metadata to enable tagless GC as supported by the
existing infrastructure. Rust simply encodes this information into the
address space number; a similar approach could be envisioned with a
'GC type ID' number that could be used by the GC metadata printer to
look up detailed information in e.g. module-level metadata, but this
is a bit awkward; it would be nice to have an interface at least as
convenient as the current intrinsic is. If the metadata is uniqued so
as not to break type equality and uniquing, would it be viable to have
the GCd pointer type itself refer to a metadata node?
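
To make the uniquing question concrete, the following is a rough sketch of a type table in which a GC metadata reference takes part in the uniquing key alongside the address space. TypeContext, PointerType and the gc_meta field are hypothetical names used for illustration; they are not LLVM classes.

```python
class PointerType:
    def __init__(self, pointee, addrspace, gc_meta):
        self.pointee = pointee
        self.addrspace = addrspace
        self.gc_meta = gc_meta


class TypeContext:
    """Interns pointer types so structurally equal types are one object."""

    def __init__(self):
        self._pointers = {}

    def pointer(self, pointee, addrspace=0, gc_meta=None):
        # gc_meta must itself be uniqued (here: a hashable tuple) so that it
        # can be part of the key without breaking type equality.
        key = (pointee, addrspace, gc_meta)
        if key not in self._pointers:
            self._pointers[key] = PointerType(pointee, addrspace, gc_meta)
        return self._pointers[key]


ctx = TypeContext()
a = ctx.pointer("i8", gc_meta=("trace_fields", (0, 1)))  # hypothetical descriptor
b = ctx.pointer("i8", gc_meta=("trace_fields", (0, 1)))
c = ctx.pointer("i8")  # ordinary, non-GC pointer
```

In this model a and b are the same object, while c is a distinct type, and the extra field is stored once per distinct type rather than once per value.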
Finally, is there anything else that needs consideration before attempting this?

Talin

2012-Dec-30 01:54 UTC

[LLVMdev] Extending GC infrastructure for roots in SSA values

First of all, thanks for looking into this! As you've no doubt discovered,
I'm one of the people who has talked a lot about this issue in the past,
and have been frustrated with the lack of progress in this area.

I completely agree with your point about wanting to be able to attach GC
metadata to a type (rather than attaching it to a value, as is done now).
In the past, there have been two objections to this approach: first, the
overhead that would be added to the Pointer type - the vast majority of
LLVM users don't want to have to pay an extra 4-8 bytes per Pointer type.
And second, that all of the optimization passes would have to be updated so
as to not do illegal transformations on a GC type.

A different approach would be to create a new kind of derived type that
associates metadata with an existing type. This "AnnotatedType" would be
essentially a tuple (type, metadata), and would be constant-folded just
like other types are. Just like the existing GC intrinsics today, there
would be some way for a post-optimization pass to get access to all of the
stack variables and examine the annotations on each to determine how to
construct the appropriate static data structures.
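
A small model of the idea, assuming types can be represented as plain strings and metadata as hashable tuples. AnnotatedType here is a hypothetical Python stand-in for the proposed derived type, and collect_roots stands in for the post-optimization pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedType:
    base: str        # the wrapped type, e.g. "i8*"
    metadata: tuple  # opaque, hashable annotation


def strip(ty):
    """Return the underlying type, discarding any annotation."""
    return ty.base if isinstance(ty, AnnotatedType) else ty


def collect_roots(stack_vars):
    """Map each annotated variable to its metadata; skip everything else."""
    return {
        name: ty.metadata
        for name, ty in stack_vars.items()
        if isinstance(ty, AnnotatedType)
    }


gc_ptr = AnnotatedType("i8*", ("gc", "cons_cell"))
stack_vars = {
    "x": gc_ptr,
    "n": "i32",  # unannotated: costs nothing and is ignored by the pass
    "y": AnnotatedType("i8*", ("gc", "cons_cell")),
}
roots = collect_roots(stack_vars)
```

Two annotated types built from the same (type, metadata) pair compare equal, and any code that calls strip() and keeps only the result has silently dropped the annotation.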

This approach has both a number of advantages and a number of challenges.
The first advantage is that it means that LLVM users who aren't interested
in GC would pay nothing. A second advantage is that this could also be used
to wrap types that are not pointers. One use case I have is being able to
handle a discriminated union type which sometimes holds a pointer and
sometimes doesn't. The existing intrinsics allow this use case (within the
limitations that you point out) - the way it works is that the entire
struct is considered a "root" and is passed through to the GC plugin, which
generates code to look at the discriminator field and decide whether to
trace it or not. However, I'm also aware of the fact that I seem to be the
only one who is interested in this particular case, so I won't strongly
object if your solution doesn't handle it, as long as the existing
intrinsics continue to work.
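
The discriminated-union case can be sketched as follows. The tag values and the (tag, payload) layout are assumptions made for the example, not the layout any particular plugin uses.

```python
PTR, INT = 0, 1  # hypothetical discriminator values


def trace_union(slot, mark):
    """Trace a (tag, payload) root: mark the payload only when it is a pointer."""
    tag, payload = slot
    if tag == PTR:
        mark(payload)


marked = []
roots = [(PTR, 0x1000), (INT, 42), (PTR, 0x2000)]
for slot in roots:
    trace_union(slot, marked.append)
```

The whole struct is registered as the root, and the decision to trace is deferred to run time, when the discriminator is known.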

The challenge of this approach is that a lot of backend code will need to
unwrap the annotated type in order to operate upon it, and it would be all
too easy to discard the associated metadata as part of this process.

-- 
-- Talin

João Matos

2012-Dec-30 02:39 UTC

[LLVMdev] Extending GC infrastructure for roots in SSA values

On Sun, Dec 30, 2012 at 1:54 AM, Talin <viridia at gmail.com> wrote:
> I completely agree with your point about wanting to be able to attach GC
> metadata to a type (rather than attaching it to a value, as is done now).
> In the past, there have been two objections to this approach: first, the
> overhead that would be added to the Pointer type - the vast majority of
> LLVM users don't want to have to pay an extra 4-8 bytes per Pointer type.
> And second, that all of the optimization passes would have to be updated so
> as to not do illegal transformations on a GC type.
>
I have extended LLVM locally to support metadata on types, though right now
it is only supported on struct types. It could be a good first step to
implement this.

If someone is interested, the code is at:
https://github.com/tritao/llvm/commit/e8c24e1c10713d358392c984389fcf2791130ca5

-- 
João Matos

Sean Silva

2012-Dec-30 03:53 UTC

[LLVMdev] Extending GC infrastructure for roots in SSA values

On Sat, Dec 29, 2012 at 6:54 PM, Talin <viridia at gmail.com> wrote:
> However, I'm also aware of the fact that I seem to be the only one who is
> interested in this particular case, so I won't strongly object if your
> solution doesn't handle it, as long as the existing intrinsics continue to
> work.
Most high-performance dynamic language implementations use something
like this, so it's not that obscure. I'm not sure if LLVM will ever be
suitable for that kind of language implementation, but this will at
least be necessary to make it happen.

-- Sean Silva

David Chisnall

2012-Dec-30 10:17 UTC

[LLVMdev] Extending GC infrastructure for roots in SSA values

On 30 Dec 2012, at 01:54, Talin wrote:
> I completely agree with your point about wanting to be able to attach GC
> metadata to a type (rather than attaching it to a value, as is done now).
> In the past, there have been two objections to this approach: first, the
> overhead that would be added to the Pointer type - the vast majority of
> LLVM users don't want to have to pay an extra 4-8 bytes per Pointer type.
> And second, that all of the optimization passes would have to be updated so
> as to not do illegal transformations on a GC type.
There are two other alternatives:

- Use address spaces to separate garbage-collected from non-garbage-collected
pointers.  There is (was?) a plan to add an address space cast instruction and
explicitly disallow bitcasts of pointers between address spaces.  This would
mean that you could have one address space for GC roots, one for GC-allocated
memory and enforce the casts in your front end.  Optimisations would then not be
allowed to change the address space of any pointers, so the GC status would be
preserved.  GC-aware allocations may insert explicit address space casts, where
appropriate.

- Add a new GC'd pointer type, which is an entirely separate type.  This
might make sense, as you ideally want GC'd pointers to be treated
differently from others (e.g. you may not want pointers to the starts of
allocations to be removed)
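
The first alternative can be modelled roughly as below: a bitcast that refuses to cross address spaces, plus one explicit cast that is allowed to. The Ptr class, the GC_HEAP number and the function names are hypothetical and only mirror the wording of the bullet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    pointee: str
    addrspace: int = 0


GC_HEAP = 1  # hypothetical address space number for GC-allocated memory


def bitcast(src, dst):
    """Reinterpret the pointee type; changing the address space is rejected."""
    if src.addrspace != dst.addrspace:
        raise TypeError("bitcast may not change address space")
    return dst


def addrspacecast(src, new_space):
    """The only operation permitted to move a pointer between address spaces."""
    return Ptr(src.pointee, new_space)


def is_gc_pointer(ty):
    return ty.addrspace == GC_HEAP


obj = Ptr("%Cons", GC_HEAP)
raw = bitcast(obj, Ptr("i8", GC_HEAP))  # allowed: same address space
```

Because every ordinary cast preserves the address space, is_gc_pointer(raw) still holds after the bitcast, which is the property the front end would rely on.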

For languages like OCaml, you also want to be able to do escape analysis on
GC'd pointers to get good performance (so you don't bother tracing ones
that can't possibly escape).  This ideally requires a pass that will
recursively and automatically apply nocapture attributes to arguments.  In
functional languages, this ends up being almost all allocations, so you can
allocate them either on the stack or on a separate bump-the-pointer allocator
and delete them on function return by just resetting the pointer.  This means
that you would want to be able to have transforms that lowered GC'd pointers
to stack or heap pointers.
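
A minimal sketch of the bump-the-pointer scheme described above, assuming the compiler has already proved that the allocation does not escape. BumpRegion and sum_pair are illustrative names, not part of any existing runtime.

```python
class BumpRegion:
    """A fixed-size region where allocation just advances a pointer."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.top = 0

    def alloc(self, nbytes):
        if self.top + nbytes > len(self.memory):
            raise MemoryError("region exhausted")
        addr = self.top
        self.top += nbytes
        return addr

    def mark(self):
        return self.top

    def release(self, mark):
        # Frees everything allocated since `mark` in one step.
        self.top = mark


region = BumpRegion(64)


def sum_pair(a, b):
    saved = region.mark()
    try:
        cell = region.alloc(16)  # temporary that never escapes this call
        region.memory[cell] = a
        region.memory[cell + 8] = b
        return region.memory[cell] + region.memory[cell + 8]
    finally:
        region.release(saved)  # reset the pointer on function return


result = sum_pair(3, 4)
```

Nothing allocated this way needs to be reported to the collector, since it is gone before any caller could observe it.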

In some implementations, GC'd pointers are fat pointers, so they should not
be represented as PointerType in the IR or as iPTR in the back end.

David

Benjamin Saunders

2012-Dec-31 00:28 UTC

[LLVMdev] Extending GC infrastructure for roots in SSA values

On Sat, Dec 29, 2012 at 5:54 PM, Talin <viridia at gmail.com> wrote:
> First of all, thanks for looking into this! As you've no doubt discovered,
> I'm one of the people who has talked a lot about this issue in the past, and
> have been frustrated with the lack of progress in this area.
Yeah, I spent some time digging through the archives. Frankly, I'm
surprised something that would be clearly useful for so many people
hasn't had someone else step up before, but I'm happy to be that
person.
> I completely agree with your point about wanting to be able to attach GC
> metadata to a type (rather than attaching it to a value, as is done now). In
> the past, there have been two objections to this approach: first, the
> overhead that would be added to the Pointer type - the vast majority of LLVM
> users don't want to have to pay an extra 4-8 bytes per Pointer type. And
> second, that all of the optimization passes would have to be updated so as
> to not do illegal transformations on a GC type.
Is the added memory cost a realistic concern, bearing in mind that types
are uniqued?

Your comment about optimization passes evokes what I've read in past
discussions, and I don't understand it any better now. Can you
describe some examples of illegal transformations that would occur if
the current optimization passes were left unchanged?
> A different approach would be to create a new kind of derived type that
> associates metadata with an existing type. This "AnnotatedType" would be
> essentially a tuple (type, metadata), and would be constant-folded just like
> other types are. Just like the existing GC intrinsics today, there would be
> some way for a post-optimization pass to get access to all of the stack
> variables and examine the annotations on each to determine how to construct
> the appropriate static data structures.
>
> This approach has both a number of advantages and a number of challenges.
> The first advantage is that it means that LLVM users who aren't interested
> in GC would pay nothing. A second advantage is that this could also be used
> to wrap types that are not pointers. One use case I have is being able to
> handle a discriminated union type which sometimes holds a pointer and
> sometimes doesn't. The existing intrinsics allow this use case (within the
> limitations that you point out) - the way it works is that the entire struct
> is considered a "root" and is passed through to the GC plugin, which
> generates code to look at the discriminator field and decide whether to
> trace it or not. However, I'm also aware of the fact that I seem to be the
> only one who is interested in this particular case, so I won't strongly
> object if your solution doesn't handle it, as long as the existing
> intrinsics continue to work.
>
> The challenge of this approach is that a lot of backend code will need to
> unwrap the annotated type in order to operate upon it, and it would be all
> too easy to discard the associated metadata as part of this process.
I imagine this could prove useful for even more than GC, as it
introduces what one might think of as type-level, as opposed to
instruction- or value-level, metadata. Being both simpler and more
general, this seems like a much better idea than my proposal, and I'd
be happy to run with it if it checks out. In fact, I would eventually
have run into exactly the same problem with discriminated unions
that you describe; Idris, like any other modern statically-typed
functional language, has algebraic datatypes after all. Getting
optimization passes to treat AnnotatedTypes the same as their
contained type would be necessary for this to pay off completely; can
anyone comment on what, if any, difficulties might be involved there?
