thr3ads.net - llvm dev - [LLVMdev] llvm.gcroot suggestion [Mar 2011]

Talin

2011-Mar-07 17:35 UTC

[LLVMdev] llvm.gcroot suggestion

On Mon, Mar 7, 2011 at 4:08 AM, nicolas geoffray <nicolas.geoffray at
gmail.com> wrote:
> Hi Talin,
>
> On Sat, Mar 5, 2011 at 6:42 PM, Talin <viridia at gmail.com> wrote:
>>
>>
>> So I've been thinking about your proposal, that of using a special address
>> space to indicate garbage collection roots instead of intrinsics.
>
>
>  Great!
>
>
>>
>> To address this, we need a better way of telling LLVM that a given
>> variable is no longer a root.
>>
>
> Live variable analysis is already in LLVM and for me that's enough to know
> whether a given variable is no longer a root. Note that each safe point has
> its own set of root locations, and these locations all contain live
> variables. Dead variables may still be in register or stack, but the GC will
> not visit them.
>
>
>> 2) As I mentioned, my language supports tagged unions and other "value"
>> types. Another example is a tuple type, such as (String, String). Such types
>> are never allocated on the heap by themselves, because they don't have the
>> object header structure that holds the type information needed by the
>> garbage collector. Instead, these values can live in SSA variables, or in
>> allocas, or they can be embedded inside larger types which do live on the
>> heap.
>>
>
> If you know, at compile-time, whether you are dealing with a struct or a
> heap, what prevents you from emitting code that won't need such tagged
> unions in the IR? Same for structs: if they contain pointers to heap
> objects, those will be in that special address space.
>
I'm not sure what you mean by this.

Take for example a union of a String (which is a pointer) and a float. The
union is either { i1; String * } or { i1; float }. The garbage collector
needs to see that i1 in order to know whether the second field of the struct
is a pointer - if it attempted to dereference the pointer when the field
actually contains a float, the program would crash. The metadata argument
that I pass to llvm.gcroot informs the garbage collector about the structure
of the union.
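
A minimal C++ sketch of the point above, assuming a collector that calls a per-type trace function on each root. The names `String`, `Tracer`, and `trace_string_or_float` are hypothetical and are not part of LLVM or of the compiler discussed in this thread; they only illustrate why the tag must be read before the payload is treated as a pointer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical heap object standing in for the language's String type.
struct String { const char* chars; };

// The union of String and float, laid out as { tag, payload }.
// is_string == true means the payload holds a String*.
struct StringOrFloat {
    bool is_string;
    union {
        String* str;
        float   num;
    } payload;
};

// Hypothetical collector hook: records the pointers the GC would follow.
struct Tracer {
    std::vector<String*> visited;
    void visit(String* p) {
        assert(p != nullptr);
        visited.push_back(p);
    }
};

// Trace function that the gcroot metadata could lead the collector to.
// It consults the tag first, so a float is never mistaken for a pointer.
void trace_string_or_float(const StringOrFloat& root, Tracer& tracer) {
    if (root.is_string && root.payload.str != nullptr) {
        tracer.visit(root.payload.str);
    }
    // Float case: the payload holds no reference, so nothing is traced.
}
```

A collector that skipped the tag check and followed `payload.str` unconditionally would read the bits of a float as an address, which is the crash described above.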
>
>> 3) I've been following the discussions on llvm-dev about the use of the
>> address-space property of pointers to signal different kinds of memory pools
>> for things like shared address spaces. If we try to use that same variable
>> to indicate garbage collection, now we have to multiplex both meanings onto
>> the same field. We can't just dedicate one special ID for the garbage
>> collected heap, because there could be multiple such heaps. As you add
>> additional orthogonal meanings to the address-space field, you end up with a
>> combinatorial explosion of possible values for it.
>>
>>
> I think some convention already exists between an ID and some codegen.
> Having one additional seems fine to me, even if you need to play with bits
> in case you need different IDs for a single pointer.
>
> I'm also fine with the intrinsic way of declaring a GC root. But I think it
> is cumbersome, and error-prone in the presence of optimizers that may try to
> move away that intrinsic (I remember similar issues with the current EH
> intrinsics).
>
> Nicolas
>
>
>> --
>> -- Talin
>>
>
>

-- 
-- Talin

Talin

2011-Mar-07 17:46 UTC

[LLVMdev] llvm.gcroot suggestion

On Mon, Mar 7, 2011 at 9:35 AM, Talin <viridia at gmail.com> wrote:
> On Mon, Mar 7, 2011 at 4:08 AM, nicolas geoffray <
> nicolas.geoffray at gmail.com> wrote:
>
>> Hi Talin,
>>
>> On Sat, Mar 5, 2011 at 6:42 PM, Talin <viridia at gmail.com> wrote:
>>>
>>>
>>> So I've been thinking about your proposal, that of using a special
>>> address space to indicate garbage collection roots instead of intrinsics.
>>
>>
>>  Great!
>>
>>
>>>
>>> To address this, we need a better way of telling LLVM that a given
>>> variable is no longer a root.
>>>
>>
>> Live variable analysis is already in LLVM and for me that's enough to know
>> whether a given variable is no longer a root. Note that each safe point has
>> its own set of root locations, and these locations all contain live
>> variables. Dead variables may still be in register or stack, but the GC will
>> not visit them.
>>
>>
>>> 2) As I mentioned, my language supports tagged unions and other "value"
>>> types. Another example is a tuple type, such as (String, String). Such types
>>> are never allocated on the heap by themselves, because they don't have the
>>> object header structure that holds the type information needed by the
>>> garbage collector. Instead, these values can live in SSA variables, or in
>>> allocas, or they can be embedded inside larger types which do live on the
>>> heap.
>>>
>>
>> If you know, at compile-time, whether you are dealing with a struct or a
>> heap, what prevents you from emitting code that won't need such tagged
>> unions in the IR? Same for structs: if they contain pointers to heap
>> objects, those will be in that special address space.
>>
>
> I'm not sure what you mean by this.
>
> Take for example a union of a String (which is a pointer) and a float. The
> union is either { i1; String * } or { i1; float }. The garbage collector
> needs to see that i1 in order to know whether the second field of the struct
> is a pointer - if it attempted to dereference the pointer when the field
> actually contains a float, the program would crash. The metadata argument
> that I pass to llvm.gcroot informs the garbage collector about the structure
> of the union.
>
Sorry, I left a part out. The way that my garbage collector works currently
is that the collector gets a pointer to the entire union struct, not just
the pointer field within the union. In other words, the entire union struct
is considered a "root".

In fact, there might not even be a pointer in the struct. You see, because
LLVM doesn't directly support unions, I have to simulate that support by
casting pointers. That is, for each different type contained in the union, I
have a different struct type, and when I want to extract data from the union
I cast the pointer to the appropriate type and then use GEP to get the data
out. However, when allocating storage for the union, I have to use the
largest data type, which might not be a pointer.

For example, suppose I have a type "String or (float, float, float)" - that
is, a union of a string and a 3-tuple of floats. Most of the time what LLVM
will see is { i1; { float; float; float; } } because that's bigger than
{ i1; String* }. LLVM won't even know there's a pointer in there, except
during those brief times when I'm accessing the pointer field. So tagging
the pointer in a different address space won't help at all here.
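
The layout problem described above can be sketched in C++ as follows. This is an illustration under stated assumptions, not the compiler's actual output: `UnionStorage`, `AsString`, `AsFloats`, and the helper functions are hypothetical names, and `std::memcpy` stands in for the pointer cast plus GEP so that the sketch stays well-defined C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical heap object standing in for the language's String type.
struct String { const char* chars; };

// One struct per union member; both begin with the same tag field.
struct AsString { bool is_string; String* str; };
struct AsFloats { bool is_string; float x, y, z; };

constexpr std::size_t kSize =
    sizeof(AsString) > sizeof(AsFloats) ? sizeof(AsString) : sizeof(AsFloats);
constexpr std::size_t kAlign =
    alignof(AsString) > alignof(AsFloats) ? alignof(AsString) : alignof(AsFloats);

// Storage is sized for the largest member. Its declared type mentions no
// pointer, so a pointer-type annotation has nothing to attach to here.
struct UnionStorage {
    alignas(kAlign) unsigned char bytes[kSize];
};

void store_string(UnionStorage& u, String* s) {
    AsString view{true, s};
    std::memcpy(u.bytes, &view, sizeof view);
}

void store_floats(UnionStorage& u, float x, float y, float z) {
    AsFloats view{false, x, y, z};
    std::memcpy(u.bytes, &view, sizeof view);
}

// The tag sits at offset 0 in every view, so it can be read on its own.
bool holds_string(const UnionStorage& u) {
    bool tag;
    std::memcpy(&tag, u.bytes, sizeof tag);
    return tag;
}

// Reads the first float; only valid when the floats view is active.
float first_float(const UnionStorage& u) {
    assert(!holds_string(u));
    AsFloats view;
    std::memcpy(&view, u.bytes, sizeof view);
    return view.x;
}

// What a root scan has to do: consult the tag, then reinterpret the storage.
String* pointer_to_trace(const UnionStorage& u) {
    if (!holds_string(u)) return nullptr;
    AsString view;
    std::memcpy(&view, u.bytes, sizeof view);
    return view.str;
}
```

Only `pointer_to_trace` ever sees a `String*`; everywhere else the root is an opaque block of bytes, which is why the whole struct, together with metadata describing it, is registered as the root.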

>>> 3) I've been following the discussions on llvm-dev about the use of the
>>> address-space property of pointers to signal different kinds of memory pools
>>> for things like shared address spaces. If we try to use that same variable
>>> to indicate garbage collection, now we have to multiplex both meanings onto
>>> the same field. We can't just dedicate one special ID for the garbage
>>> collected heap, because there could be multiple such heaps. As you add
>>> additional orthogonal meanings to the address-space field, you end up with a
>>> combinatorial explosion of possible values for it.
>>>
>>>
>> I think some convention already exists between an ID and some codegen.
>> Having one additional seems fine to me, even if you need to play with bits
>> in case you need different IDs for a single pointer.
>>
>> I'm also fine with the intrinsic way of declaring a GC root. But I think
>> it is cumbersome, and error-prone in the presence of optimizers that may try
>> to move away that intrinsic (I remember similar issues with the current EH
>> intrinsics).
>>
>> Nicolas
>>
>>
>>> --
>>> -- Talin
>>>
>>
>>
>
>
> --
> -- Talin
>


-- 
-- Talin

Joshua Warner

2011-Mar-07 18:58 UTC

[LLVMdev] llvm.gcroot suggestion

Hi Talin,

Sorry to interject -

> For example, suppose I have a type "String or (float, float, float)" - that
> is, a union of a string and a 3-tuple of floats. Most of the time what LLVM
> will see is { i1; { float; float; float; } } because that's bigger than
> { i1; String* }. LLVM won't even know there's a pointer in there, except
> during those brief times when I'm accessing the pointer field. So tagging
> the pointer in a different address space won't help at all here.
>

I think this is a fairly uncommon use case that will be tricky to deal with
no matter what method is used to track GC roots.  That said, why not do
something like make the pointer representation (the {i1, String*}) the
long-term storage format, and only bitcast *just* before loading the
floats?  You could even use another address space to indicate that something
is *sometimes* a pointer, dependent upon some other value (the i1, perhaps
indicated with metadata).

My vote (not that it really counts for much) would be the address-space
method.  It seems much more elegant.

The only thing that I think would be unusually difficult for the
address-space method to handle would be alternative pointer representations,
such as those used in the latest version of Hotspot (see
http://wikis.sun.com/display/HotSpotInternals/CompressedOops).  Essentially,
a 64-bit pointer is packed into 32 bits by assuming 8-byte alignment and
restricting the heap size to 32GB.  I've seen similar object-reference
bitfields used in game engines.  In this case, there is no "pointer" to
attach the address space to.

(Yes, I know that Hotspot currently uses CompressedOops ONLY in the heap,
decompressing them when stored in locals, but it is not inconceivable to
avoid decompressing them if the code is just moving them around, as an
optimization.)
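
The packing arithmetic described above can be sketched in C++ as follows. This is not HotSpot's code; `compress`, `decompress`, and the explicit `heap_base` parameter are assumptions made for illustration, and the special handling of null references is left out.

```cpp
#include <cassert>
#include <cstdint>

// 8-byte alignment means the low 3 bits of every object offset are zero.
constexpr unsigned kAlignShift = 3;
// 2^32 slots of 8 bytes each gives a 32 GB addressable heap.
constexpr std::uint64_t kMaxHeapBytes = std::uint64_t{1} << (32 + kAlignShift);

// Pack a 64-bit address into 32 bits as an 8-byte-granular heap offset.
std::uint32_t compress(std::uint64_t heap_base, std::uint64_t address) {
    assert(address >= heap_base);
    const std::uint64_t offset = address - heap_base;
    assert(offset % 8 == 0);          // object must be 8-byte aligned
    assert(offset < kMaxHeapBytes);   // object must lie within the 32 GB heap
    return static_cast<std::uint32_t>(offset >> kAlignShift);
}

// Recover the full 64-bit address from the packed reference.
std::uint64_t decompress(std::uint64_t heap_base, std::uint32_t ref) {
    return heap_base + (static_cast<std::uint64_t>(ref) << kAlignShift);
}
```

The packed reference is a plain 32-bit integer, so in LLVM terms there is no pointer-typed value that could carry an address-space tag until `decompress` runs.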

Just my few thoughts.

-Joshua