thr3ads.net - llvm dev - [LLVMdev] Garbage Collection Roots [Sep 2007]

If this information is useful, please help other people find it:
Share via:

Talin

2007-Sep-02 08:54 UTC

[LLVMdev] Garbage Collection Roots

Hi all,

I've been looking through the documentation 
(http://llvm.org/docs/GarbageCollection.html) on how to implement a 
garbage collector for LLVM and there's a couple of things that I don't 
quite understand. Specifically, it says that when a stack variable goes 
out of scope, you're supposed to assign a null value to it to indicate 
that the value is no longer live.

What I don't quite understand is how the GC can efficiently use this 
information.

The GC will generally have some data structure that represents a list of 
all current roots. Calling llvm.gcroot adds a pointer to the list. When 
the variable goes out of scope, you'll want to remove the pointer from 
the list. But merely assigning null to the value doesn't cause this to 
happen, it only prevents the object previously being pointed to from 
being traced. If the root pointer persists in the table long enough, 
that same stack address may get re-used for a different variable, which 
will be non-null, and cause bad things to happen.

It might be clearer if I explained what I want to do. The way that I 
manager root pointers is as follows: For each application thread, 
there's a pair of arrays stored in thread local data. One array is for 
adding roots, the other is for deleting roots. In other words, each time 
a root gets added, its address gets appended to the 'add' array; Each 
time a root is deleted, its pointer gets appended to the 'delete' array.
Since both arrays are thread-local, there's no locking or 
synchronization required. If either list gets full, a collection cycle 
is triggered.

Since the roots aren't actually used *except* during a collection cycle, 
we don't both compacting the lists until the start of the next 
collection. At that time, the 'add' list is compacted by removing both 
duplicate entries and any entries found in the 'delete' list. The
'add'
list is then used as the root set for the cycle.

In order for this to work, however, I definitely need a way to know when 
a reference has gone out of scope so that it can be added to the 
'deleted' list.

-- Talin

Gordon Henriksen

2007-Sep-02 12:31 UTC

head link

[LLVMdev] Garbage Collection Roots

Hi Talin,

On Sep 2, 2007, at 04:54, Talin wrote:
> I've been looking through the documentation (http://llvm.org/docs/ 
> GarbageCollection.html) on how to implement a garbage collector for  
> LLVM and there's a couple of things that I don't quite understand.
> Specifically, it says that when a stack variable goes out of scope,  
> you're supposed to assign a null value to it to indicate that the  
> value is no longer live.
>
> What I don't quite understand is how the GC can efficiently use  
> this information.
The GC intrinsics exist for the type of collector which depends on  
metadata compiled alongside the function. At runtime, such a  
collector walks the dynamic call stack and merges that with the  
static metadata to identify dynamically live roots. Like "zero cost  
exception handling" or debug info, collecting this kind of metadata  
requires backend code generator support, which again is the reason  
these intrinsics exist.
> The GC will generally have some data structure that represents a  
> list of all current roots. Calling llvm.gcroot adds a pointer to  
> the list. When the variable goes out of scope, you'll want to  
> remove the pointer from the list. But merely assigning null to the  
> value doesn't cause this to happen, it only prevents the object  
> previously being pointed to from being traced. If the root pointer  
> persists in the table long enough, that same stack address may get  
> re-used for a different variable, which will be non-null, and cause  
> bad things to happen.
>
> It might be clearer if I explained what I want to do. The way that  
> I manager root pointers is as follows: For each application thread,  
> there's a pair of arrays stored in thread local data. One array is  
> for adding roots, the other is for deleting roots. In other words,  
> each time a root gets added, its address gets appended to the 'add'
> array; Each time a root is deleted, its pointer gets appended to  
> the 'delete' array. Since both arrays are thread-local, there's
no
> locking or synchronization required. If either list gets full, a  
> collection cycle is triggered.
>
> Since the roots aren't actually used *except* during a collection  
> cycle, we don't both compacting the lists until the start of the  
> next collection. At that time, the 'add' list is compacted by  
> removing both duplicate entries and any entries found in the  
> 'delete' list. The 'add' list is then used as the root set
for the
> cycle.
Okay.
> In order for this to work, however, I definitely need a way to know  
> when a reference has gone out of scope so that it can be added to  
> the 'deleted' list.
 From what you've described, your collector does not need any code  
generator support. You could easily ignore the llvm.gc* intrinsics.  
Instead, your front-end could simply insert 'add' and 'remove'  
fragments as source variables enter and exit scope.


That said, it is possible to easily adapt LLVM's model to your  
runtime if you prefer to use the intrinsics. First, observe that the  
stack roots in a stack frame are valid until the function returns.  
Given that, you can adjust the 'add' and 'remove' arrays in a
batch
at the start and end of the function. To avoid keeping objects alive  
unnecessarily long, null the root as the variable goes out of scope,  
just as the LLVM manual suggests.

The existing LowerGC pass also finds (and removes) gcroot intrinsics  
and transforms them into LLVM code at function entry and return. You  
could easily adapt that pass to cooperate with your runtime.


Hope that helps.

— Gordon

Talin

2007-Sep-04 03:14 UTC

head link

[LLVMdev] Garbage Collection Roots

On Sep 2, 2007 5:31 AM, Gordon Henriksen <gordonhenriksen at mac.com>
wrote:> Hi Talin,
>
> On Sep 2, 2007, at 04:54, Talin wrote:
>
> > I've been looking through the documentation (http://llvm.org/docs/
> > GarbageCollection.html) on how to implement a garbage collector for
> > LLVM and there's a couple of things that I don't quite
understand.
> > Specifically, it says that when a stack variable goes out of scope,
> > you're supposed to assign a null value to it to indicate that the
> > value is no longer live.
> >
> > What I don't quite understand is how the GC can efficiently use
> > this information.
>
> The GC intrinsics exist for the type of collector which depends on
> metadata compiled alongside the function. At runtime, such a
> collector walks the dynamic call stack and merges that with the
> static metadata to identify dynamically live roots. Like "zero cost
> exception handling" or debug info, collecting this kind of metadata
> requires backend code generator support, which again is the reason
> these intrinsics exist.
So, if I understand what you are saying correctly, the "llvm.gcroot"
intrinsic isn't actually an "instruction" in the sense I normally
think of as an action to be taken; It's more of a marker which is used
by the code generator to create a data structure that describes the
stack layout. Is that correct?

If that's the case, then my next question is: How do the LLVM garbage
collection intrinsics work in a multi-threaded environment? If there's
a data structure that describes the layout of the stack, and you have
multiple stacks, what happens?

BTW, I'm also noticing that the current SemiSpace example code seems
quite incomplete.

-- 
-- Talin

Maybe Matching Threads

Search for more apparently analagous threads

llvm dev - Sep 2007 - [LLVMdev] Garbage Collection Roots

[LLVMdev] Garbage Collection Roots

[LLVMdev] Garbage Collection Roots

[LLVMdev] Garbage Collection Roots

Maybe Matching Threads