thr3ads.net - llvm dev - [LLVMdev] llvm.gcroot suggestion [Mar 2011]

If this information is useful, please help other people find it:
Share via:

Talin

2011-Mar-05 17:42 UTC

[LLVMdev] llvm.gcroot suggestion

On Mon, Feb 21, 2011 at 1:50 AM, nicolas geoffray <
nicolas.geoffray at gmail.com> wrote:
> Hi Talin,
>
> On Fri, Feb 18, 2011 at 5:50 PM, Talin <viridia at gmail.com> wrote:
>>
>>
>> In the current scheme, the way you tell LLVM that a root is no longer
>> needed is by assigning NULL to it. However, that assumes that all roots
are
>> pointers, which is not true in my world - a root can be a struct
containing
>> pointers inside of it. (In my current frontend, a non-pointer root is
>> indicated by passing a non-NULL metadata argument to llvm.gcroot, which
>> contains information about which fields in the struct are roots. This
is
>> especially important in the case of tagged unions, where the garbage
>> collector may have to examine the union tag field in order to determine
if
>> the pointer field is indeed a pointer - passing the pointer alone would
be
>> insufficient to determine this.)
>>
>
> For a tagged union, I guess you are currently using the second argument of
> llvm.gcroot to provde the information? I guess keeping an intrinsic for
this
> kind of code is the best way to go.
>
>
>> Putting GC roots in a different address space works OK for me, as long
as
>> I can have SSA values that are structs that have pointers embedded in
them
>> that are in this different address space. In other words, if I have an
SSA
>> value that is a struct containing pointers which are roots, I need for
the
>> garbage collector to see the entire struct, not just the pointers.
>>
>
> That's entirely fine with a different address space. The roots given by
the
> LLVM GC pass should contain the location of these embedded pointers.
>
>
>>
>> What I'm primarily asking for is to have the LLVM code generator
>> automatically spill roots from SSA values to memory during a sync
point, and
>> reload them afterward,
>>
>
>
> I don't think that's even needed: long term, LLVM should return the
> location of all roots for a given sync point (typically method call). By
all
> roots, I mean register roots and stack roots. The frontend should then be
> responsible for updating those roots.
>
>
>> instead of my frontend having to generate code to do this. As I
mentioned,
>> the current scheme results in the frontend having to generate very
>> inefficient IR because of the need to be conservative about root
liveness.
>>
>
> Agree.
>
>
>> The frontend can't know anything about the optimization passes that
LLVM
>> will perform on the function.
>>
>
> Sure. And I think the way to go is to remove the llvm.gcroot intrinsic (and
> the hackish way it currently works: right now, because we take the address
> of the alloca, the LLVM optimiziers won't try to optimize an alloca
that may
> escape through the llvm.gcroot function call). By having an address space
> for GC roots, optimizers don't need to care about anything. After the
> optimizers and the register allocator, a final LLVM pass should compute the
> root lists of all sync points.
>
> Nicolas
>
So I've been thinking about your proposal, that of using a special address
space to indicate garbage collection roots instead of intrinsics. I want to
point out some of the downsides of this approach:

1) The biggest drawback that I see is that it still requires frontends to
signal that a root is no longer being used by assigning NULL to the pointer.
This turns out to be hard to do in some cases, for example:

   while (true) {
      String s = someFunc();
      if (s->equals("foo")) {
        break;
      } else {
        // call some function with s
      }
   }

In the above example, where would you put the code to zero out the root
's'?
In this case, the answer is, after the body of the loop. Now, normally one
would zero out roots at the end of the block in which the variable is
declared, but in the case of this while loop, it's wasted effort to do that
since the root is just going to get assigned again at the top of the loop.
However, sometimes we never make it to the end of the block, in this case
due to a break statement, and so we have to either null out 's'  either
just
before or just after the break.

This example is relatively simple - if we start getting into scenarios
involving switch statements, try/catch, and so on, I can easily construct
complex examples that would make your head spin.

Worse, there are cases where there are multiple code paths exiting from a
block, and you end up having to generate the code to null out a given root
on each of those paths. Remember what I said about frontends having to
generate inefficient code to handle llvm.gcroot? This is one of the
reasons.

To address this, we need a better way of telling LLVM that a given variable
is no longer a root. Of course, in the end it doesn't matter what the
lifetime of the variable is, the only thing that matters is the state of the
variable at each safe point. If there was a way to tell LLVM 'stop tracking
this variable as a root', and then let it worry about whether the value in
the variable is live or not, the generated code could be much more
efficient.

Of course, frontends would still have to deal with multiple exits from a
block, but they can afford, I think, to be somewhat more lax about it. For
example, assume in the example above that we insert a call to
llvm-gcroot-end (or whatever we want to call it) after the while loop. The
compiler knows in this case that all paths originating from the definition
of s must pass through that point. Further, LLVM knows that there's only one
safe point in the while loop, which is in the 'else' block. Thus the
only
time the GCStrategy ever 'sees' the variable 's' is at that one
point, which
means that there's no need to zero it out. In other words, LLVM knows that
the range over which the variable is live is smaller than the range over
which it is declared a root.

2) As I mentioned, my language supports tagged unions and other
"value"
types. Another example is a tuple type, such as (String, String). Such types
are never allocated on the heap by themselves, because they don't have the
object header structure that holds the type information needed by the
garbage collector. Instead, these values can live in SSA variables, or in
allocas, or they can be embedded inside larger types which do live on the
heap.

The way I currently handle such objects is by passing the trace table for
the type as the metadata argument to llvm.gcroot(). A NULL metadata argument
means that the root is a simple pointer, a non-NULL argument means that the
root is a struct.

How do we signal that a struct is no longer a root? Currently I do so by
zeroing out the entire structure, but again that's wasteful. It would be
better to simply tell LLVM that the struct is no longer a root.

3) I've been following the discussions on llvm-dev about the use of the
address-space property of pointers to signal different kinds of memory pools
for things like shared address spaces. If we try to use that same variable
to indicate garbage collection, now we have to multiplex both meanings onto
the same field. We can't just dedicate one special ID for the garbage
collected heap, because there could be multiple such heaps. As you add
additional orthogonal meanings to the address-space field, you end up with a
combinatorial explosion of possible values for it.

-- 
-- Talin
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110305/31f38bf4/attachment.html>

nicolas geoffray

2011-Mar-07 12:08 UTC

head link

[LLVMdev] llvm.gcroot suggestion

Hi Talin,

On Sat, Mar 5, 2011 at 6:42 PM, Talin <viridia at gmail.com>
wrote:>
>
> So I've been thinking about your proposal, that of using a special
address
> space to indicate garbage collection roots instead of intrinsics.

Great!

>
> To address this, we need a better way of telling LLVM that a given variable
> is no longer a root.
>
Live variable analysis is already in LLVM and for me that's enough to know
whether a given variable is no longer a root. Note that each safe point has
its own set of root locations, and these locations all contain live
variables. Dead variables may still be in register or stack, but the GC will
not visit them.

> 2) As I mentioned, my language supports tagged unions and other
"value"
> types. Another example is a tuple type, such as (String, String). Such
types
> are never allocated on the heap by themselves, because they don't have
the
> object header structure that holds the type information needed by the
> garbage collector. Instead, these values can live in SSA variables, or in
> allocas, or they can be embedded inside larger types which do live on the
> heap.
>
If you know, at compile-time, whether you are dealing with a struct or a
heap, what prevents you from emitting code that won't need such tagged
unions in the IR. Same for structs: if they contain pointers to heap
objects, those will be in that special address space.

3) I've been following the discussions on llvm-dev about the use of
the> address-space property of pointers to signal different kinds of memory
pools
> for things like shared address spaces. If we try to use that same variable
> to indicate garbage collection, now we have to multiplex both meanings onto
> the same field. We can't just dedicate one special ID for the garbage
> collected heap, because there could be multiple such heaps. As you add
> additional orthogonal meanings to the address-space field, you end up with
a
> combinatorial explosion of possible values for it.
>
>I think there exist already some convention between an ID and some codegen.
Having one additional seems fine to me, even if you need to play with bits
in case you need different IDs for a single pointer.

I'm also fine with the intrinsic way of declaring a GC root. But I think it
is cumbersome, and error-prone in the presence of optimizers that may try to
move away that intrinsic (I remember similar issues with the current EH
intrinsics).

Nicolas

> --
> -- Talin
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110307/4b2ba122/attachment.html>

Talin

2011-Mar-07 17:35 UTC

head link

[LLVMdev] llvm.gcroot suggestion

On Mon, Mar 7, 2011 at 4:08 AM, nicolas geoffray <nicolas.geoffray at
gmail.com> wrote:
> Hi Talin,
>
> On Sat, Mar 5, 2011 at 6:42 PM, Talin <viridia at gmail.com> wrote:
>>
>>
>> So I've been thinking about your proposal, that of using a special
address
>> space to indicate garbage collection roots instead of intrinsics.
>
>
>  Great!
>
>
>>
>> To address this, we need a better way of telling LLVM that a given
>> variable is no longer a root.
>>
>
> Live variable analysis is already in LLVM and for me that's enough to
know
> whether a given variable is no longer a root. Note that each safe point has
> its own set of root locations, and these locations all contain live
> variables. Dead variables may still be in register or stack, but the GC
will
> not visit them.
>
>
>> 2) As I mentioned, my language supports tagged unions and other
"value"
>> types. Another example is a tuple type, such as (String, String). Such
types
>> are never allocated on the heap by themselves, because they don't
have the
>> object header structure that holds the type information needed by the
>> garbage collector. Instead, these values can live in SSA variables, or
in
>> allocas, or they can be embedded inside larger types which do live on
the
>> heap.
>>
>
> If you know, at compile-time, whether you are dealing with a struct or a
> heap, what prevents you from emitting code that won't need such tagged
> unions in the IR. Same for structs: if they contain pointers to heap
> objects, those will be in that special address space.
>
I'm not sure what you mean by this.

Take for example a union of a String (which is a pointer) and a float. The
union is either { i1; String * } or { i1; float }. The garbage collector
needs to see that i1 in order to know whether the second field of the struct
is a pointer - if it attempted to dereference the pointer when the field
actually contains a float, the program would crash. The metadata argument
that I pass to llvm.gcroot informs the garbage collector about the structure
of the union.
>
> 3) I've been following the discussions on llvm-dev about the use of the
>> address-space property of pointers to signal different kinds of memory
pools
>> for things like shared address spaces. If we try to use that same
variable
>> to indicate garbage collection, now we have to multiplex both meanings
onto
>> the same field. We can't just dedicate one special ID for the
garbage
>> collected heap, because there could be multiple such heaps. As you add
>> additional orthogonal meanings to the address-space field, you end up
with a
>> combinatorial explosion of possible values for it.
>>
>>
> I think there exist already some convention between an ID and some codegen.
> Having one additional seems fine to me, even if you need to play with bits
> in case you need different IDs for a single pointer.
>
> I'm also fine with the intrinsic way of declaring a GC root. But I
think it
> is cumbersome, and error-prone in the presence of optimizers that may try
to
> move away that intrinsic (I remember similar issues with the current EH
> intrinsics).
>
> Nicolas
>
>
>> --
>> -- Talin
>>
>
>

-- 
-- Talin
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110307/e32535e9/attachment.html>

Apparently Analagous Threads

Search for more possibly parallel threads

llvm dev - Mar 2011 - [LLVMdev] llvm.gcroot suggestion

[LLVMdev] llvm.gcroot suggestion

[LLVMdev] llvm.gcroot suggestion

[LLVMdev] llvm.gcroot suggestion

Apparently Analagous Threads