thr3ads.net - llvm dev - [LLVMdev] llvm.gcroot suggestion [Mar 2011]


Joshua Warner

2011-Mar-07 18:58 UTC

[LLVMdev] llvm.gcroot suggestion

Hi Talin,

Sorry to interject -

> For example, suppose I have a type "String or (float, float, float)" - that
> is, a union of a string and a 3-tuple of floats. Most of the time what LLVM
> will see is { i1; { float; float; float; } } because that's bigger than
> { i1; String* }. LLVM won't even know there's a pointer in there, except
> during those brief times when I'm accessing the pointer field. So tagging
> the pointer in a different address space won't help at all here.

I think this is a fairly uncommon use case that will be tricky to deal with
no matter what method is used to track GC roots.  That said, why not do
something like make the pointer representation (the {i1, String*}) the
long-term storage format, and only bitcast *just* before loading the
floats?  You could even use another address space to indicate that something
is *sometimes* a pointer, dependent upon some other value (the i1, perhaps
indicated with metadata).
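
To make the "sometimes a pointer" idea concrete, here is a rough Python
sketch of a collector that reads the tag before deciding whether a slot
holds a pointer. The byte layouts, the tag values, and the helper names
(pack_string, pack_floats, gc_pointer_in) are assumptions made up for
illustration; they are not anything LLVM provides, and both variants are
assumed to be padded to the same 16-byte slot.

```python
import struct

# Assumed layouts for illustration only: both variants occupy 16 bytes.
STRING_FMT = "<B7xQ"   # tag byte, 7 pad bytes, 64-bit pointer
FLOATS_FMT = "<B3x3f"  # tag byte, 3 pad bytes, three 32-bit floats
TAG_FLOATS, TAG_STRING = 0, 1


def pack_string(ptr: int) -> bytes:
    """Store the String* variant (hypothetical helper)."""
    return struct.pack(STRING_FMT, TAG_STRING, ptr)


def pack_floats(x: float, y: float, z: float) -> bytes:
    """Store the (float, float, float) variant (hypothetical helper)."""
    return struct.pack(FLOATS_FMT, TAG_FLOATS, x, y, z)


def gc_pointer_in(slot: bytes):
    """Return the pointer a collector should trace, or None for floats."""
    if slot[0] == TAG_STRING:
        _tag, ptr = struct.unpack(STRING_FMT, slot)
        return ptr
    return None
```

The point of the sketch is only that the same bytes are a root or not
depending on the tag, so whatever mechanism marks the slot has to let the
collector consult that tag.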

My vote (not that it really counts for much) would be the address-space
method.  It seems much more elegant.

The only thing that I think would be unusually difficult for the
address-space method to handle would be alternative pointer representations,
such as those used in the latest version of Hotspot (see
http://wikis.sun.com/display/HotSpotInternals/CompressedOops).  Essentially,
a 64-bit pointer is packed into 32-bits by assuming 8-byte alignment and
restricting the heap size to 32GB.  I've seen similar object-reference
bitfields used in game engines.  In this case, there is no "pointer" to
attach the address space to.

(Yes, I know that Hotspot currently uses CompressedOops ONLY in the heap,
decompressing them when stored in locals, but it is not inconceivable to
avoid decompressing them if the code is just moving them around, as an
optimization.)
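
For anyone unfamiliar with the scheme, the arithmetic can be sketched
roughly as follows. This is a simplified Python model, not HotSpot's
actual code: HEAP_BASE is an arbitrary made-up address, the function names
are hypothetical, and null handling is ignored.

```python
HEAP_BASE = 0x0000_7F00_0000_0000   # hypothetical heap base address
ALIGN_SHIFT = 3                     # 8-byte alignment: low 3 bits are zero
MAX_HEAP = (1 << 32) << ALIGN_SHIFT  # 2**35 bytes, i.e. 32 GB


def compress(addr: int) -> int:
    """Pack a 64-bit heap address into a 32-bit reference."""
    offset = addr - HEAP_BASE
    if not 0 <= offset < MAX_HEAP:
        raise ValueError("address outside the 32 GB heap window")
    if offset & ((1 << ALIGN_SHIFT) - 1):
        raise ValueError("address is not 8-byte aligned")
    return offset >> ALIGN_SHIFT


def decompress(ref: int) -> int:
    """Recover the 64-bit heap address from a 32-bit reference."""
    if not 0 <= ref < (1 << 32):
        raise ValueError("reference does not fit in 32 bits")
    return HEAP_BASE + (ref << ALIGN_SHIFT)
```

The compressed value is a plain 32-bit integer, which is why there is no
pointer type left for an address space to hang on.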

Just my few thoughts.

-Joshua

Talin

2011-Mar-07 19:48 UTC

[LLVMdev] llvm.gcroot suggestion

On Mon, Mar 7, 2011 at 10:58 AM, Joshua Warner <joshuawarner32 at gmail.com> wrote:
> Hi Talin,
>
> Sorry to interject -
>
>> For example, suppose I have a type "String or (float, float, float)" -
>> that is, a union of a string and a 3-tuple of floats. Most of the time what
>> LLVM will see is { i1; { float; float; float; } } because that's bigger than
>> { i1; String* }. LLVM won't even know there's a pointer in there, except
>> during those brief times when I'm accessing the pointer field. So tagging
>> the pointer in a different address space won't help at all here.
>
> I think this is a fairly uncommon use case that will be tricky to deal with
> no matter what method is used to track GC roots.  That said, why not do
> something like make the pointer representation (the {i1, String*}) the
> long-term storage format, and only bitcast *just* before loading the
> floats?  You could even use another address space to indicate that something
> is *sometimes* a pointer, dependent upon some other value (the i1, perhaps
> indicated with metadata).
>
I don't know if it's an uncommon use case or not, but it is something that I
handle already in my frontend. (I suppose it's uncommon in the sense that
almost no one uses the garbage collection features of LLVM, but part of the
goal of this discussion is to change that.)

The problem with making { i1, String* } the long-term storage format is that
it isn't large enough in the example I gave, so you'll overwrite other
fields if you try to store the three floats. The more general issue is that
the concepts we're talking about simply aren't expressible in IR as it
exists today.
>
> My vote (not that it really counts for much) would be the address-space
> method.  It seems much more elegant.
>
I agree that the current solution isn't the best. The problem I have is that
the solutions that are being suggested are going to break my code badly, and
with no way to fix it.

The *real* solution is to make root-ness a function of type. In other words,
you can mark any type as being a root, which exposes the base address of all
objects of that type to the garbage collector. This is essentially the same
as the pointer-address-space suggestion, except that it's not limited to
pointers. (In practice, it would only ever apply to pointers and structs.)

(Heck, I'd even be willing to go with a solution where only structs and not
pointers could be roots - it means I'd have to wrap every pointer in a
struct, which would be a royal pain, but it would at least work.)
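
A toy Python model of type-based root-ness, purely as a sketch: IRType,
Local, roots_at_safe_point, and the is_root flag are hypothetical names
invented here, not LLVM APIs, and the addresses are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRType:
    """A toy stand-in for an IR type; is_root is the hypothetical marker."""
    name: str
    is_root: bool = False


@dataclass(frozen=True)
class Local:
    """A stack slot: its name, its type, and its base address."""
    name: str
    ty: IRType
    address: int


def roots_at_safe_point(frame):
    """Return base addresses of all locals whose type is marked as a root."""
    return [local.address for local in frame if local.ty.is_root]


# Example frame: a marked pointer, a marked union struct, and a plain float.
STRING_PTR = IRType("String*", is_root=True)
STRING_OR_FLOATS = IRType("{ i1, { float, float, float } }", is_root=True)
FLOAT = IRType("float")

frame = [
    Local("s", STRING_PTR, 0x7FF0),
    Local("u", STRING_OR_FLOATS, 0x7FE0),
    Local("f", FLOAT, 0x7FDC),
]
```

Here roots_at_safe_point(frame) reports the base addresses of "s" and "u"
but not "f"; the struct is exposed as a whole even though LLVM never sees
a pointer field inside it.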
>
> The only thing that I think would be unusually difficult for the
> address-space method to handle would be alternative pointer representations,
> such as those used in the latest version of Hotspot (see
> http://wikis.sun.com/display/HotSpotInternals/CompressedOops).
> Essentially, a 64-bit pointer is packed into 32-bits by assuming 8-byte
> alignment and restricting the heap size to 32GB.  I've seen similar
> object-reference bitfields used in game engines.  In this case, there is no
> "pointer" to attach the address space to.
>
> (Yes, I know that Hotspot currently uses CompressedOops ONLY in the heap,
> decompressing them when stored in locals, but it is not inconceivable to
> avoid decompressing them if the code is just moving them around, as an
> optimization.)
>
> Just my few thoughts.
>
> -Joshua
>


-- 
-- Talin

Joshua Warner

2011-Mar-07 22:05 UTC

[LLVMdev] llvm.gcroot suggestion

On Mon, Mar 7, 2011 at 12:48 PM, Talin <viridia at gmail.com> wrote:
> On Mon, Mar 7, 2011 at 10:58 AM, Joshua Warner <joshuawarner32 at gmail.com> wrote:
>
>> Hi Talin,
>>
>> Sorry to interject -
>>
>>> For example, suppose I have a type "String or (float, float, float)" -
>>> that is, a union of a string and a 3-tuple of floats. Most of the time what
>>> LLVM will see is { i1; { float; float; float; } } because that's bigger than
>>> { i1; String* }. LLVM won't even know there's a pointer in there, except
>>> during those brief times when I'm accessing the pointer field. So tagging
>>> the pointer in a different address space won't help at all here.
>>
>> I think this is a fairly uncommon use case that will be tricky to deal
>> with no matter what method is used to track GC roots.  That said, why not do
>> something like make the pointer representation (the {i1, String*}) the
>> long-term storage format, and only bitcast *just* before loading the
>> floats?  You could even use another address space to indicate that something
>> is *sometimes* a pointer, dependent upon some other value (the i1, perhaps
>> indicated with metadata).
>>
>
> I don't know if it's an uncommon use case or not, but it is something that
> I handle already in my frontend. (I suppose it's uncommon in the sense that
> almost no one uses the garbage collection features of LLVM, but part of the
> goal of this discussion is to change that.)
>
I actually meant uncommon in the sense of having stack-allocated unions that
participate in garbage collection.  Off the top of my head, I could only
name one language (ML) that might use a feature like that.  Even then, I
suspect most ML implementations would actually push that stuff onto the
heap.

> The problem with making { i1, String* } the long-term storage format is
> that it isn't large enough in the example I gave, so you'll overwrite other
> fields if you try to store the three floats. The more general issue is that
> the concepts we're talking about simply aren't expressible in IR as it
> exists today.
>
Good catch - what I actually intended to indicate was the String "half" of
the union, properly padded - so something more like {i1, String*, float}
(for 64-bit pointers).

>
>> My vote (not that it really counts for much) would be the address-space
>> method.  It seems much more elegant.
>>
>
> I agree that the current solution isn't the best. The problem I have is
> that the solutions that are being suggested are going to break my code
> badly, and with no way to fix it.
>
> The *real* solution is to make root-ness a function of type. In other
> words, you can mark any type as being a root, which exposes the base address
> of all objects of that type to the garbage collector. This is essentially
> the same as the pointer-address-space suggestion, except that it's not
> limited to pointers. (In practice, it would only ever apply to pointers and
> structs.)
>
> (Heck, I'd even be willing to go with a solution where only structs and not
> pointers could be roots - it means I'd have to wrap every pointer in a
> struct, which would be a royal pain, but it would at least work.)
>
Hmm... do you mean something like a "marked" bit (or maybe a vector of
mark_ids) in every type, where you could query a function for values of
"marked" types at particular safe points?  This sounds like something that
might solve the problem described below with compressed pointers (not that I
am actually encountering this problem) - but in the near-term, it seems to
me that everything that you could conceivably mark as a GC root would
somehow contain a pointer value.  In this case, union support in LLVM would
make the generated IR cleaner, but not necessarily any more correct.

Being able to make a "marked" version of every type seems unnecessary, and
in some cases, somewhat non-intuitive.  Take, for instance, making a "marked"
float type - which I can't think of any good use for.  I like the idea of
using address spaces because it keeps the concepts in IR largely orthogonal,
rather than having features that overlap in purpose in many cases.  That,
and IMO it just makes sense for pointers into the (or, in general, a) heap
to be considered in a different address space from "normal" pointers.  This
could extend well to tracking pointers onto the stack (as seen in C#
out/ref) for the purpose of generating closures (in .NET - which doesn't
currently have this feature).



>
>> The only thing that I think would be unusually difficult for the
>> address-space method to handle would be alternative pointer representations,
>> such as those used in the latest version of Hotspot (see
>> http://wikis.sun.com/display/HotSpotInternals/CompressedOops).
>> Essentially, a 64-bit pointer is packed into 32-bits by assuming 8-byte
>> alignment and restricting the heap size to 32GB.  I've seen similar
>> object-reference bitfields used in game engines.  In this case, there is no
>> "pointer" to attach the address space to.
>>
>> (Yes, I know that Hotspot currently uses CompressedOops ONLY in the heap,
>> decompressing them when stored in locals, but it is not inconceivable to
>> avoid decompressing them if the code is just moving them around, as an
>> optimization.)
>>
>> Just my few thoughts.
>>
>> -Joshua
>>
>
>
>
> --
> -- Talin
>
-Joshua
