thr3ads.net - llvm dev - [LLVMdev] Proposal for improving llvm.gcroot (summarized) [Mar 2011]

Talin

2011-Mar-30 18:08 UTC

[LLVMdev] Proposal for improving llvm.gcroot (summarized)

(This is a summary of the big long thread on llvm.gcroot, for those who
didn't have time to read it.)

I'm proposing the replacement of llvm.gcroot() with three new intrinsics:

   - *llvm.gc.declare*(alloca, meta). This intrinsic marks an alloca as a
   garbage collection root. It can occur anywhere within a function, and lasts
   either until the end of the function, or until a matching call to
   llvm.gc.undeclare().
   - *llvm.gc.undeclare*(alloca). This intrinsic unmarks an alloca, so that
   it is no longer considered a root from that point onward.
   - *llvm.gc.value*(value, meta). This intrinsic marks an SSA value as a
   root. The SSA value can be any type, not necessarily a pointer. This marking
   lasts for the lifetime of the SSA value.

The names of the intrinsics are intended to follow the naming convention for
declaring debug variables (llvm.dbg.declare and llvm.dbg.value).

The llvm.gc.declare() and llvm.gc.value() intrinsics do essentially the same
thing: At each safe point, they make the first argument available to the GC
strategy as a pointer, using whatever means is most efficient from a code
generation standpoint. In the case of llvm.gc.declare(), which takes an
alloca as its first argument, this is the same as what llvm.gcroot() does
now, and is fairly straightforward: The GC strategy gets a reference to the
value argument.

In the case of llvm.gc.value(), providing a pointer to the GC strategy is
more involved, since the value may be in a register or split across several
registers. In some cases, it may be required to spill the value into memory
during safe points, and re-load it afterwards. In many cases, calling a
function will require saving the SSA value on the stack regardless, so it
may be possible to determine a pointer to that stack location.
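
The spill-and-reload behaviour described above can be sketched as a toy model.
This is only an illustration in Python, not LLVM code: the names
safe_point_call and moving_collector, and the idea of modelling registers as a
dict, are assumptions made for the example.

```python
def safe_point_call(registers, rooted, collector):
    """Model one call that acts as a safe point.

    registers: dict of SSA value name -> current value (hypothetical model)
    rooted:    names that were marked as roots
    collector: callable that receives the spill slots and may rewrite them
    """
    registers = dict(registers)  # work on a copy, leave the caller's dict alone
    # Spill: give every rooted value an addressable stack slot.
    stack_slots = {name: registers[name] for name in rooted}
    # The collector sees only the slots, and may update them in place.
    collector(stack_slots)
    # Reload: pick up any value the collector rewrote.
    for name in rooted:
        registers[name] = stack_slots[name]
    return registers


def moving_collector(slots):
    # Toy relocation: pretend every object moved to address + 0x1000.
    for name, address in slots.items():
        slots[name] = address + 0x1000
```

Calling safe_point_call({"p": 0x10, "n": 7}, ["p"], moving_collector) leaves
the non-root "n" untouched and reloads the relocated "p", which is the reason
a reload is needed after the safe point when the collector can move objects.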

The llvm.gc.undeclare() intrinsic is used to indicate the end of the lifetime
of an alloca root. This replaces the current convention of assigning NULL to
a root to indicate the end of its lifetime. This has two advantages: First,
it avoids the extra store, and second, it allows the backend code generator
to re-use the same stack slots for different roots, as long as their
lifetimes don't overlap. (Under the current scheme, the lifetime of a root
is required to be the whole function body.)
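
To make the slot re-use argument concrete, here is a minimal sketch in Python
of a greedy slot assignment over root lifetimes. It is not how any LLVM backend
is implemented; assign_slots and the (start, end) interval encoding are
assumptions made for the example, with start standing for the declare point
and end for the undeclare point.

```python
def assign_slots(lifetimes):
    """Assign a stack slot index to each root.

    lifetimes maps a root name to a half-open (start, end) interval of
    instruction indices. Returns (name -> slot, number of slots used).
    """
    free = []        # slot indices released by an earlier undeclare
    active = []      # (end, slot) pairs for roots that are still live
    next_slot = 0
    result = {}
    by_start = sorted(lifetimes.items(), key=lambda item: item[1][0])
    for name, (start, end) in by_start:
        # Release every slot whose root ended at or before this declare.
        still_active = []
        for active_end, active_slot in active:
            if active_end <= start:
                free.append(active_slot)
            else:
                still_active.append((active_end, active_slot))
        active = still_active
        if free:
            slot = min(free)
            free.remove(slot)
        else:
            slot = next_slot
            next_slot += 1
        result[name] = slot
        active.append((end, slot))
    return result, next_slot
```

With scoped lifetimes such as {"a": (0, 10), "b": (2, 5), "c": (6, 9)}, roots
"b" and "c" share one slot and two slots suffice. If every root must live for
the whole function, as under the current scheme, each of the three roots needs
its own slot.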

In all cases, LLVM should not make any assumptions about the type of the
value argument with respect to garbage collection, and should treat it as a
black box to the extent possible. The value may or may not contain pointers,
and it may or may not contain non-pointer fields. It will be up to the GC
strategy to take the appropriate action based on the data type and the meta
argument.
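
As a rough illustration of the "black box plus meta" idea, the Python sketch
below shows a GC strategy deciding which parts of a root to trace purely from
a descriptor. The layout of the descriptor (a "pointer_fields" entry listing
field indices) and the name pointers_in_root are hypothetical; the proposal
does not define what the meta argument contains.

```python
def pointers_in_root(value, meta):
    """Return the references a collector should trace for one root.

    value: the root's contents, modelled as a tuple of fields
    meta:  hypothetical descriptor; meta["pointer_fields"] lists the indices
           of the fields that hold references (None models a null reference)
    """
    return [value[i] for i in meta["pointer_fields"] if value[i] is not None]
```

For a root modelled as (3, "obj@0x10", None, "obj@0x20") with pointer fields
(1, 2, 3), the integer in field 0 is never inspected and the null in field 2
is skipped. The compiler side only forwards value and meta; the interpretation
lives entirely in the strategy.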

One open issue is whether formal function arguments - which are normally
treated as SSA values - can be passed as arguments to llvm.gc.value(). From
the standpoint of a user, this would be very convenient to have, but if it's
too difficult, then it can be worked around by copying the function
parameters to local SSA values.

Now, I realize that there were several strong supporters of a competing
proposal involving using the address-space field of pointers in LLVM. I
won't go into the details here, except to say two things: (1) I believe that
approach limits the generality of LLVM's support for diverse collectors, and
(2) in the original thread, the folks who supported my proposal tended to be
people who were actual users of the current system, or who planned on using
it in the near future.

-- 
-- Talin

Jay Foad

2011-Apr-01 08:58 UTC

[LLVMdev] Proposal for improving llvm.gcroot (summarized)

On 30 March 2011 19:08, Talin <viridia at gmail.com> wrote:
> llvm.gc.declare(alloca, meta). This intrinsic marks an alloca as a garbage
> collection root. It can occur anywhere within a function, and lasts either
> until the end of the function, or until a matching call to
> llvm.gc.undeclare().
> llvm.gc.undeclare(alloca). This intrinsic unmarks an alloca, so that it is
> no longer considered a root from that point onward.

Hi Talin,

What changes to code generation would be necessary to support this?

Is there any intention of supporting a collector that has static stack
maps for each function, i.e. a table telling you, for each point in
the code, where all the roots are on the stack (similar to unwind info
for exception handling)? If so, I think it's a bit dodgy to use
intrinsic function calls to mark the start/end of the lifetime of a GC
root, because function calls are inherently dynamic. For example, you
can't represent this code with a static stack map:

if (cond) {
  llvm.gc.declare(foo, bar);
}
...
// foo may or may not be a root here
...
if (cond) { // same condition as above
  llvm.gc.undeclare(foo);
}

Even if you're careful not to generate code like this in your front
end, how can you guarantee that LLVM optimisation passes won't
introduce it?
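
The problem with the conditional declare can be shown with a small forward
analysis over the control-flow graph of the snippet above. This is a Python
sketch for illustration only: the block names, the three state labels, and the
function root_state_at_entry are assumptions, and the graph is assumed to be
acyclic so that a single pass in topological order is enough.

```python
def root_state_at_entry(blocks, preds, order):
    """Compute, per block, whether foo is a root on entry.

    blocks: block name -> "declare", "undeclare", or None (no marker)
    preds:  block name -> list of predecessor block names
    order:  block names in topological order
    Returns block name -> "root", "not-root", or "unknown".
    """
    state_in, state_out = {}, {}
    for block in order:
        incoming = {state_out[p] for p in preds[block]}
        if not incoming:
            state = "not-root"        # function entry: nothing declared yet
        elif len(incoming) == 1:
            state = incoming.pop()    # all predecessors agree
        else:
            state = "unknown"         # predecessors disagree at a merge
        state_in[block] = state
        action = blocks[block]
        if action == "declare":
            state_out[block] = "root"
        elif action == "undeclare":
            state_out[block] = "not-root"
        else:
            state_out[block] = state
    return state_in


# The control flow of the example: two ifs guarded by the same condition.
BLOCKS = {"entry": None, "then1": "declare", "middle": None,
          "then2": "undeclare", "exit": None}
PREDS = {"entry": [], "then1": ["entry"], "middle": ["entry", "then1"],
         "then2": ["middle"], "exit": ["middle", "then2"]}
ORDER = ["entry", "then1", "middle", "then2", "exit"]
```

Running root_state_at_entry(BLOCKS, PREDS, ORDER) reports "unknown" for the
block between the two ifs. A static stack map needs a single answer per code
address, so it has nothing valid to record there.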

The old llvm.dbg.region.start and llvm.dbg.region.end had the same
kind of problem.

Thanks,
Jay.

Talin

2011-Apr-01 17:34 UTC

[LLVMdev] Proposal for improving llvm.gcroot (summarized)

On Fri, Apr 1, 2011 at 1:58 AM, Jay Foad <jay.foad at gmail.com> wrote:
> On 30 March 2011 19:08, Talin <viridia at gmail.com> wrote:
> > llvm.gc.declare(alloca, meta). This intrinsic marks an alloca as a garbage
> > collection root. It can occur anywhere within a function, and lasts either
> > until the end of the function, or until a matching call to
> > llvm.gc.undeclare().
> > llvm.gc.undeclare(alloca). This intrinsic unmarks an alloca, so that it is
> > no longer considered a root from that point onward.
>
> Hi Talin,
>
> What changes to code generation would be necessary to support this?

I can only describe this in abstract terms, since I know very little about
LLVM's code generation. (I am primarily a user of LLVM, not a developer of
it, although I have made a few minor contributions.)

> Is there any intention of supporting a collector that has static stack
> maps for each function, i.e. a table telling you, for each point in
> the code, where all the roots are on the stack (similar to unwind info
> for exception handling)? If so, I think it's a bit dodgy to use

That is already supported in the current LLVM. The changes I am proposing
are merely an extension of what we have now, allowing frontends to emit more
efficient code.

> intrinsic function calls to mark the start/end of the lifetime of a GC
> root, because function calls are inherently dynamic. For example, you
> can't represent this code with a static stack map:
>
> if (cond) {
>  llvm.gc.declare(foo, bar);
> }
> ...
> // foo may or may not be a root here
> ...
> if (cond) { // same condition as above
>  llvm.gc.undeclare(foo);
> }

You would need to do the same as what is done today: move the declare
outside of the condition, and initialize the variable to a null state, such
that the garbage collector will know to ignore the variable. In the
if-block, you then overwrite the variable with actual data.

The difference is that in today's LLVM, the variable declaration has to
be in the first block, and lasts for the entire function - so you have to
initialize all of your stack variables to a null state in the first block.
By extending the notation to allow stack roots to have a limited lifetime,
we can avoid having to initialize the stack root unless we actually enter
the block where it is defined.
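
The workaround can be sketched as a toy model in Python. The slot is always
listed as a root, and a null value tells the collector to skip it. The names
scan_frame and run, and the use of None for the null state, are assumptions
made for the illustration; nothing here is LLVM API.

```python
def scan_frame(stack_map, frame):
    """Return the references a collector would visit in one frame.

    stack_map: root slot names registered at this safe point (static)
    frame:     slot name -> current value; None models the null state
    """
    return [frame[slot] for slot in stack_map if frame[slot] is not None]


def run(cond):
    # The declare sits outside the condition, so the stack map is the same
    # on both paths and the slot starts out in the null state.
    frame = {"foo": None}
    stack_map = ["foo"]
    if cond:
        frame["foo"] = "object#1"  # overwrite with actual data in the if-block
    return scan_frame(stack_map, frame)
```

Both run(True) and run(False) use the identical stack map. Only the contents
of the slot differ, which the collector can check at run time.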

I should mention that the declare/undeclare intrinsics are the least
important part of this proposal. The important part is the ability to
declare SSA values as roots - that is what will make a world of difference
to folks like myself that are writing frontends that use garbage collection.

> Even if you're careful not to generate code like this in your front
> end, how can you guarantee that LLVM optimisation passes won't
> introduce it?
>
> The old llvm.dbg.region.start and llvm.dbg.region.end had the same
> kind of problem.
>
> Thanks,
> Jay.
>


-- 
-- Talin
