Hi all, I've been looking through the documentation (http://llvm.org/docs/GarbageCollection.html) on how to implement a garbage collector for LLVM and there's a couple of things that I don't quite understand. Specifically, it says that when a stack variable goes out of scope, you're supposed to assign a null value to it to indicate that the value is no longer live. What I don't quite understand is how the GC can efficiently use this information. The GC will generally have some data structure that represents a list of all current roots. Calling llvm.gcroot adds a pointer to the list. When the variable goes out of scope, you'll want to remove the pointer from the list. But merely assigning null to the value doesn't cause this to happen, it only prevents the object previously being pointed to from being traced. If the root pointer persists in the table long enough, that same stack address may get re-used for a different variable, which will be non-null, and cause bad things to happen. It might be clearer if I explained what I want to do. The way that I manager root pointers is as follows: For each application thread, there's a pair of arrays stored in thread local data. One array is for adding roots, the other is for deleting roots. In other words, each time a root gets added, its address gets appended to the 'add' array; Each time a root is deleted, its pointer gets appended to the 'delete' array. Since both arrays are thread-local, there's no locking or synchronization required. If either list gets full, a collection cycle is triggered. Since the roots aren't actually used *except* during a collection cycle, we don't both compacting the lists until the start of the next collection. At that time, the 'add' list is compacted by removing both duplicate entries and any entries found in the 'delete' list. The 'add' list is then used as the root set for the cycle. In order for this to work, however, I definitely need a way to know when a reference has gone out of scope so that it can be added to the 'deleted' list. -- Talin
Hi Talin, On Sep 2, 2007, at 04:54, Talin wrote:> I've been looking through the documentation (http://llvm.org/docs/ > GarbageCollection.html) on how to implement a garbage collector for > LLVM and there's a couple of things that I don't quite understand. > Specifically, it says that when a stack variable goes out of scope, > you're supposed to assign a null value to it to indicate that the > value is no longer live. > > What I don't quite understand is how the GC can efficiently use > this information.The GC intrinsics exist for the type of collector which depends on metadata compiled alongside the function. At runtime, such a collector walks the dynamic call stack and merges that with the static metadata to identify dynamically live roots. Like "zero cost exception handling" or debug info, collecting this kind of metadata requires backend code generator support, which again is the reason these intrinsics exist.> The GC will generally have some data structure that represents a > list of all current roots. Calling llvm.gcroot adds a pointer to > the list. When the variable goes out of scope, you'll want to > remove the pointer from the list. But merely assigning null to the > value doesn't cause this to happen, it only prevents the object > previously being pointed to from being traced. If the root pointer > persists in the table long enough, that same stack address may get > re-used for a different variable, which will be non-null, and cause > bad things to happen. > > It might be clearer if I explained what I want to do. The way that > I manager root pointers is as follows: For each application thread, > there's a pair of arrays stored in thread local data. One array is > for adding roots, the other is for deleting roots. In other words, > each time a root gets added, its address gets appended to the 'add' > array; Each time a root is deleted, its pointer gets appended to > the 'delete' array. Since both arrays are thread-local, there's no > locking or synchronization required. If either list gets full, a > collection cycle is triggered. > > Since the roots aren't actually used *except* during a collection > cycle, we don't both compacting the lists until the start of the > next collection. At that time, the 'add' list is compacted by > removing both duplicate entries and any entries found in the > 'delete' list. The 'add' list is then used as the root set for the > cycle.Okay.> In order for this to work, however, I definitely need a way to know > when a reference has gone out of scope so that it can be added to > the 'deleted' list.From what you've described, your collector does not need any code generator support. You could easily ignore the llvm.gc* intrinsics. Instead, your front-end could simply insert 'add' and 'remove' fragments as source variables enter and exit scope. That said, it is possible to easily adapt LLVM's model to your runtime if you prefer to use the intrinsics. First, observe that the stack roots in a stack frame are valid until the function returns. Given that, you can adjust the 'add' and 'remove' arrays in a batch at the start and end of the function. To avoid keeping objects alive unnecessarily long, null the root as the variable goes out of scope, just as the LLVM manual suggests. The existing LowerGC pass also finds (and removes) gcroot intrinsics and transforms them into LLVM code at function entry and return. You could easily adapt that pass to cooperate with your runtime. Hope that helps. — Gordon
On Sep 2, 2007 5:31 AM, Gordon Henriksen <gordonhenriksen at mac.com> wrote:> Hi Talin, > > On Sep 2, 2007, at 04:54, Talin wrote: > > > I've been looking through the documentation (http://llvm.org/docs/ > > GarbageCollection.html) on how to implement a garbage collector for > > LLVM and there's a couple of things that I don't quite understand. > > Specifically, it says that when a stack variable goes out of scope, > > you're supposed to assign a null value to it to indicate that the > > value is no longer live. > > > > What I don't quite understand is how the GC can efficiently use > > this information. > > The GC intrinsics exist for the type of collector which depends on > metadata compiled alongside the function. At runtime, such a > collector walks the dynamic call stack and merges that with the > static metadata to identify dynamically live roots. Like "zero cost > exception handling" or debug info, collecting this kind of metadata > requires backend code generator support, which again is the reason > these intrinsics exist.So, if I understand what you are saying correctly, the "llvm.gcroot" intrinsic isn't actually an "instruction" in the sense I normally think of as an action to be taken; It's more of a marker which is used by the code generator to create a data structure that describes the stack layout. Is that correct? If that's the case, then my next question is: How do the LLVM garbage collection intrinsics work in a multi-threaded environment? If there's a data structure that describes the layout of the stack, and you have multiple stacks, what happens? BTW, I'm also noticing that the current SemiSpace example code seems quite incomplete. -- -- Talin