(This is a summary of the big long thread on llvm.gcroot, for those who didn't have time to read it.)

I'm proposing the replacement of llvm.gcroot() with three new intrinsics:

- *llvm.gc.declare*(alloca, meta). This intrinsic marks an alloca as a garbage collection root. It can occur anywhere within a function, and lasts either until the end of the function or until a matching call to llvm.gc.undeclare().
- *llvm.gc.undeclare*(alloca). This intrinsic unmarks an alloca, so that it is no longer considered a root from that point onward.
- *llvm.gc.value*(value, meta). This intrinsic marks an SSA value as a root. The SSA value can be any type, not necessarily a pointer. This marking lasts for the lifetime of the SSA value.

The names of the intrinsics are intended to follow the naming convention for declaring debug variables (llvm.dbg.declare and llvm.dbg.value).

The llvm.gc.declare() and llvm.gc.value() intrinsics do essentially the same thing: at each safe point, they make the first argument available to the GC strategy as a pointer, using whatever means is most efficient from a code generation standpoint.

In the case of llvm.gc.declare(), which takes an alloca as its first argument, this is the same as what llvm.gcroot() does now, and is fairly straightforward: the GC strategy gets a reference to the value argument.

In the case of llvm.gc.value(), providing a pointer to the GC strategy is more involved, since the value may be in a register or split across several registers. In some cases, it may be necessary to spill the value into memory during safe points and reload it afterwards. In many cases, calling a function will require saving the SSA value on the stack regardless, so it may be possible to determine a pointer to that stack location.

The llvm.gc.undeclare() intrinsic is used to indicate the end of the lifetime of an alloca root. This replaces the current convention of assigning NULL to a root to indicate the end of its lifetime.
This has two advantages. First, it avoids the extra store. Second, it allows the backend code generator to reuse the same stack slots for different roots, as long as their lifetimes don't overlap. (Under the current scheme, the lifetime of a root is required to be the whole function body.)

In all cases, LLVM should not make any assumptions about the type of the value argument with respect to garbage collection, and should treat it as a black box to the extent possible. The value may or may not contain pointers, and it may or may not contain non-pointer fields. It will be up to the GC strategy to take the appropriate action based on the data type and the meta argument.

One open issue is whether formal function arguments, which are normally treated as SSA values, can be passed as arguments to llvm.gc.value(). From the standpoint of a user this would be very convenient to have, but if it's too difficult, it can be worked around by copying the function parameters to local SSA values.

Now, I realize that there were several strong supporters of a competing proposal involving using the address-space field of pointers in LLVM. I won't go into the details here, except to say two things: (1) I believe that approach limits the generality of LLVM's support for diverse collectors, and (2) in the original thread, the folks who supported my proposal tended to be people who were actual users of the current system, or who planned on using it in the near future.

--
-- Talin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110330/273fd4fe/attachment.html>
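The stack-slot reuse that llvm.gc.undeclare() is meant to enable can be sketched as follows. This is a minimal sketch under a simplifying assumption: each alloca root's lifetime is summarized as a half-open interval [declare, undeclare) of instruction indices, ignoring control flow. Roots whose intervals don't overlap can then share a slot. The function `assign_root_slots` and the interval encoding are hypothetical and purely illustrative. The approach used is the standard greedy interval-partitioning algorithm.

```python
def assign_root_slots(lifetimes):
    """Assign stack slots to GC roots (hypothetical helper).

    lifetimes maps a root name to a half-open interval (start, end) of
    instruction indices: start is where llvm.gc.declare() occurs, end is
    where llvm.gc.undeclare() occurs (or the end of the function).
    Roots whose intervals do not overlap may share a slot.
    """
    slot_free_at = []  # slot_free_at[i] = index at which slot i becomes free
    assignment = {}
    # Visit roots in order of their declare point (ties broken by name).
    for name, (start, end) in sorted(lifetimes.items(),
                                     key=lambda kv: (kv[1][0], kv[0])):
        for slot, free_at in enumerate(slot_free_at):
            if free_at <= start:  # previous occupant has been undeclared
                slot_free_at[slot] = end
                assignment[name] = slot
                break
        else:  # no free slot: allocate a new one
            assignment[name] = len(slot_free_at)
            slot_free_at.append(end)
    return assignment


# Today's scheme: every root lives for the whole function, so 3 roots need 3 slots.
whole_function = {"a": (0, 100), "b": (0, 100), "c": (0, 100)}
print(len(set(assign_root_slots(whole_function).values())))  # 3

# With undeclare, "a" and "c" never overlap, so they can share slot 0.
scoped = {"a": (10, 20), "b": (15, 40), "c": (25, 60)}
print(assign_root_slots(scoped))  # {'a': 0, 'b': 1, 'c': 0}
```

A real backend would derive lifetimes from the CFG rather than from flat intervals, but the sharing condition is the same: a slot can be reused once its previous root has been undeclared.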
On 30 March 2011 19:08, Talin <viridia at gmail.com> wrote:
> llvm.gc.declare(alloca, meta). This intrinsic marks an alloca as a garbage
> collection root. It can occur anywhere within a function, and lasts either
> until the end of the function, or until a matching call to
> llvm.gc.undeclare().
> llvm.gc.undeclare(alloca). This intrinsic unmarks an alloca, so that it is
> no longer considered a root from that point onward.

Hi Talin,

What changes to code generation would be necessary to support this?

Is there any intention of supporting a collector that has static stack maps for each function, i.e. a table telling you, for each point in the code, where all the roots are on the stack (similar to unwind info for exception handling)? If so, I think it's a bit dodgy to use intrinsic function calls to mark the start/end of the lifetime of a GC root, because function calls are inherently dynamic. For example, you can't represent this code with a static stack map:

    if (cond) {
      llvm.gc.declare(foo, bar);
    }
    ...
    // foo may or may not be a root here
    ...
    if (cond) { // same condition as above
      llvm.gc.undeclare(foo);
    }

Even if you're careful not to generate code like this in your front end, how can you guarantee that LLVM optimisation passes won't introduce it?

The old llvm.dbg.region.start and llvm.dbg.region.end had the same kind of problem.

Thanks,
Jay.
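Jay's objection can be made concrete with a small forward dataflow analysis. A static stack map needs a definite answer at every safe point to the question "is this slot a root?" The sketch below encodes his example as a hypothetical CFG (the block names and the `(effect, successors)` encoding are assumptions for illustration). It computes both "may be declared" (union over predecessors) and "must be declared" (intersection over predecessors). Wherever the two disagree, no single static table entry is correct.

```python
# Hypothetical CFG mirroring Jay's example. Each block maps to
# (effect on the root "foo", list of successor blocks).
JAY_CFG = {
    "entry": (None, ["then1", "join"]),      # if (cond)
    "then1": ("declare", ["join"]),          #   llvm.gc.declare(foo, bar)
    "join":  (None, ["then2", "exit"]),      # ... safe point ...; if (cond)
    "then2": ("undeclare", ["exit"]),        #   llvm.gc.undeclare(foo)
    "exit":  (None, []),
}


def transfer(effect, declared):
    """Root status of foo after executing a block with the given effect."""
    if effect == "declare":
        return True
    if effect == "undeclare":
        return False
    return declared


def root_status(cfg, entry="entry"):
    """Classify foo at the start of each block as "root", "not root",
    or "maybe" (declared on some paths but not on others)."""
    preds = {b: [] for b in cfg}
    for b, (_, succs) in cfg.items():
        for s in succs:
            preds[s].append(b)

    may_in = {b: False for b in cfg}        # union over incoming paths
    must_in = {b: b != entry for b in cfg}  # intersection over incoming paths
    changed = True
    while changed:                          # iterate to a fixed point
        changed = False
        for b in cfg:
            if b == entry:
                new = (False, False)
            else:
                new = (
                    any(transfer(cfg[p][0], may_in[p]) for p in preds[b]),
                    all(transfer(cfg[p][0], must_in[p]) for p in preds[b]),
                )
            if new != (may_in[b], must_in[b]):
                may_in[b], must_in[b] = new
                changed = True

    return {b: "root" if must_in[b] else "maybe" if may_in[b] else "not root"
            for b in cfg}


print(root_status(JAY_CFG)["join"])  # maybe
```

The analysis also reports `exit` as "maybe". Because it cannot see that both branches test the same `cond`, it must also consider the infeasible path entry, then1, join, exit.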
On Fri, Apr 1, 2011 at 1:58 AM, Jay Foad <jay.foad at gmail.com> wrote:
> On 30 March 2011 19:08, Talin <viridia at gmail.com> wrote:
> > llvm.gc.declare(alloca, meta). This intrinsic marks an alloca as a garbage
> > collection root. It can occur anywhere within a function, and lasts either
> > until the end of the function, or until a matching call to
> > llvm.gc.undeclare().
> > llvm.gc.undeclare(alloca). This intrinsic unmarks an alloca, so that it is
> > no longer considered a root from that point onward.
>
> Hi Talin,
>
> What changes to code generation would be necessary to support this?

I can only describe this in abstract terms, since I know very little about LLVM's code generation. (I am primarily a user of LLVM, not a developer of it, although I have made a few minor contributions.)

> Is there any intention of supporting a collector that has static stack
> maps for each function, i.e. a table telling you, for each point in
> the code, where all the roots are on the stack (similar to unwind info
> for exception handling)? If so, I think it's a bit dodgy to use

That is already supported in the current LLVM. The changes I am proposing are merely an extension of what we have now, allowing frontends to emit more efficient code.

> intrinsic function calls to mark the start/end of the lifetime of a GC
> root, because function calls are inherently dynamic. For example, you
> can't represent this code with a static stack map:
>
>     if (cond) {
>       llvm.gc.declare(foo, bar);
>     }
>     ...
>     // foo may or may not be a root here
>     ...
>     if (cond) { // same condition as above
>       llvm.gc.undeclare(foo);
>     }

You would need to do the same as what is done today: move the declare outside of the condition, and initialize the variable to a null state so that the garbage collector will know to ignore it. In the if-block, you then overwrite the variable with actual data.
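The workaround of hoisting the declare and null-initializing the root can be sketched with a toy frame model. The sketch assumes that a root slot exists for the whole function and that a null value means "not a root". `Frame`, `visit_roots`, and `example` are hypothetical names. A real GC strategy would read slot locations from the stack map the code generator emits.

```python
class Frame:
    """Toy stack frame whose root slots exist for the whole function, as with
    llvm.gcroot() today (or an llvm.gc.declare() hoisted to the entry block)."""

    def __init__(self, root_names):
        # Every root starts out null so the collector knows to ignore it.
        self.slots = {name: None for name in root_names}


def visit_roots(frame, visit):
    """What a GC strategy might do at a safe point: report only non-null roots."""
    for name, ref in frame.slots.items():
        if ref is not None:
            visit(name, ref)


def example(cond, safe_point):
    frame = Frame(["foo"])                    # declare hoisted out of the if; foo = null
    if cond:
        frame.slots["foo"] = {"payload": 42}  # overwrite with actual data
    safe_point(frame)                         # "foo may or may not be a root here"
    if cond:
        frame.slots["foo"] = None             # end of lifetime: store null
    return frame


seen = []
example(True, lambda f: visit_roots(f, lambda name, ref: seen.append(name)))
print(seen)  # ['foo']
```

The slot now has one fixed meaning at every safe point, so a static stack map can describe it. The cost is the null store on entry (and at the end of the lifetime) on every path, even when `cond` is false.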
The difference is that in today's LLVM, the variable declaration has to be in the first block and lasts for the entire function, so you have to initialize all of your stack variables to a null state in the first block. By extending the notation to allow stack roots to have a limited lifetime, we can avoid having to initialize a stack root unless we actually enter the block where it is defined.

I should mention that the declare/undeclare intrinsics are the least important part of this proposal. The important part is the ability to declare SSA values as roots: that is what will make a world of difference to folks like myself who are writing frontends that use garbage collection.

> Even if you're careful not to generate code like this in your front
> end, how can you guarantee that LLVM optimisation passes won't
> introduce it?
>
> The old llvm.dbg.region.start and llvm.dbg.region.end had the same
> kind of problem.
>
> Thanks,
> Jay.

--
-- Talin
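The spill/reload lowering for llvm.gc.value() described in the original proposal can be illustrated with a toy machine model. Everything here (`Machine`, `moving_collect`, `safe_point_with_gc_value`) is hypothetical and is not how LLVM's backend is structured. It only shows why the reload matters: a moving collector may rewrite the spilled slot, so the register must pick up the new address after the safe point.

```python
class Machine:
    """Toy model: a register file, a stack of spill slots, and a heap."""

    def __init__(self):
        self.regs = {}
        self.stack = []
        self.heap = {}
        self.next_addr = 1000

    def alloc(self, obj):
        addr = self.next_addr
        self.next_addr += 1
        self.heap[addr] = obj
        return addr


def moving_collect(machine, root_slots):
    """Toy copying collector: moves each object referenced directly by the
    given stack slots to a fresh address and rewrites the slots in place."""
    for slot in root_slots:
        old = machine.stack[slot]
        machine.stack[slot] = machine.alloc(machine.heap.pop(old))


def safe_point_with_gc_value(machine, reg):
    """One way a backend *might* lower a safe point for a value marked with
    llvm.gc.value(): spill, let the GC see the slot, then reload."""
    slot = len(machine.stack)
    machine.stack.append(machine.regs[reg])  # spill the register to a stack slot
    moving_collect(machine, [slot])          # GC strategy is given the slot's address
    machine.regs[reg] = machine.stack.pop()  # reload the (possibly updated) value


m = Machine()
m.regs["r1"] = m.alloc("hello")
before = m.regs["r1"]
safe_point_with_gc_value(m, "r1")
print(before, m.regs["r1"], m.heap[m.regs["r1"]])  # 1000 1001 hello
```

Without the reload, `r1` would still hold 1000, an address the collector has already vacated. When a call already forces the value onto the stack, the explicit spill could be folded into the save that happens anyway.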