llvm dev - [LLVMdev] Interfacing llvm with a precise, relocating GC [Oct 2013]

Sanjoy Das

2013-Oct-24 21:32 UTC

[LLVMdev] Interfacing llvm with a precise, relocating GC

Hello llvm-dev!

My colleagues and I are currently evaluating llvm's suitability as a
JIT compiler interfacing with a precise, relocating garbage collector.
While we couldn't find code or writeups that deal with the issues
specific to this design goal, it is entirely possible that we may have
missed something; we would appreciate references to relevant code or
writeups that people on this list may be aware of.

As an example, one issue that makes this non-trivial is that llvm (as
far as we know) is free to manufacture pointers to locations _inside_
objects, something referred to as a "derived pointer" in some places.
Since these pointers need to be updated in sync with the objects they
point into, a relocating GC needs to be aware of them; and the runtime
needs to be able to read off which registers and stack slots hold such
pointers at every safepoint.  We've looked into llvm's existing GC
support, and the mechanism it provides does not seem to help in this
use case.
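A minimal C sketch of what "updating derived pointers in sync" with their objects involves, assuming a hypothetical per-safepoint record that pairs each slot holding a derived pointer with the slot holding its base. The names `DerivedPair` and `relocate_object` are illustrative only, not part of any LLVM or runtime API; addresses are handled as `uintptr_t` so the offset arithmetic stays explicit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical safepoint record entry: frame slot `derived_slot` holds an
   interior pointer into the object whose address is in `base_slot`. */
typedef struct {
    size_t derived_slot;
    size_t base_slot;
} DerivedPair;

/* Move the object referenced by slots[base_slot] to `to` (size bytes) and
   update every slot that refers to it. Derived pointers keep their byte
   offset from the base; one-past-the-end (offset == size) is allowed. */
static void relocate_object(uintptr_t *slots,
                            const DerivedPair *pairs, size_t npairs,
                            size_t base_slot, void *to, size_t size) {
    uintptr_t old_base = slots[base_slot];
    uintptr_t new_base = (uintptr_t)to;
    memcpy(to, (const void *)old_base, size);

    /* Rebase derived pointers before overwriting the base slot, since each
       offset must be computed against the old base address. */
    for (size_t i = 0; i < npairs; i++) {
        if (pairs[i].base_slot != base_slot)
            continue;
        uintptr_t offset = slots[pairs[i].derived_slot] - old_base;
        assert(offset <= size); /* must point into (or just past) the object */
        slots[pairs[i].derived_slot] = new_base + offset;
    }
    slots[base_slot] = new_base;
}
```

In a real system the slots would be registers and stack locations described by compiler-emitted safepoint metadata; the sketch only shows that the runtime must know, for every derived value, which base it belongs to.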

This email is deliberately terse, but we are more than happy to get
into details about the approaches we've considered as the discussion
progresses.  Pointers to existing work related to this or similar
issues are especially welcome.

Thanks!

-- Sanjoy

Rafael Espíndola

2013-Oct-24 21:50 UTC

On 24 October 2013 17:32, Sanjoy Das <sanjoy at azulsystems.com> wrote:
> Hello llvm-dev!
>
> My colleagues and I are currently evaluating llvm's suitability as a
> JIT compiler interfacing with a precise, relocating garbage collector.
> While we couldn't find code or writeups that deal with the issues
> specific to this design goal, it is entirely possible that we may have
> missed something; we would appreciate references to relevant code or
> writeups that people on this list may be aware of.

This would be hard. Currently what we have support for is a non-moving
GC where all the roots are in memory. Adding support for a non-moving
gc with register roots would not be too hard, and it might be possible to
reuse some of the recent stackmap work.
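For contrast, the "roots in memory" model can be sketched as a chain of per-frame root arrays that the collector walks. This is loosely modeled on the idea behind LLVM's shadow-stack GC strategy, but the `ShadowFrame` layout and the `visit_roots`/`count_nonnull_root` helpers are hypothetical simplifications, not LLVM's runtime interface. Because every root lives in a memory slot, a collector could also overwrite it; what this model cannot see are references held only in registers, which is where stackmap-style metadata would come in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shadow-stack frame: a function holding GC references pushes
   one of these and keeps those references in `roots`, i.e. in memory. */
typedef struct ShadowFrame {
    struct ShadowFrame *next; /* caller's frame, NULL at the bottom */
    size_t num_roots;
    void **roots;             /* slots the collector may read and overwrite */
} ShadowFrame;

static ShadowFrame *root_chain = NULL; /* innermost frame */

typedef void (*RootVisitor)(void **slot, void *ctx);

/* Hand every in-memory root slot, innermost frame first, to `visit`. */
static void visit_roots(RootVisitor visit, void *ctx) {
    for (ShadowFrame *f = root_chain; f != NULL; f = f->next) {
        assert(f->num_roots == 0 || f->roots != NULL);
        for (size_t i = 0; i < f->num_roots; i++)
            visit(&f->roots[i], ctx);
    }
}

/* Example visitor: count slots that currently hold a reference. */
static void count_nonnull_root(void **slot, void *ctx) {
    if (*slot != NULL)
        (*(size_t *)ctx)++;
}
```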

For a moving GC you would probably have to change how we represent
pointer arithmetic in the selection dag and MI. It would be quite a
big change. CCing Andy and Patrick since they might have an idea of
how much work that would be and what the costs and benefits for LLVM
are.
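To make the pointer-arithmetic concern concrete: once an address computation like `&obj->elems[i]` (a `getelementptr` in the IR) is lowered, it is just base plus constant field offset plus scaled index. The C sketch below uses a hypothetical `Obj` layout and `lowered_gep` helper to show that the result is an ordinary integer with nothing recording which object it was derived from, which is exactly the information a relocating collector would need at a safepoint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap object: a header word followed by an inline array. */
typedef struct {
    int64_t header;
    int32_t elems[8];
} Obj;

/* Machine-level form of `&((Obj *)base)->elems[i]`: base + field offset +
   i * element size. The returned value carries no link back to `base`. */
static uintptr_t lowered_gep(uintptr_t base, size_t i) {
    assert(i <= 8); /* one past the end is still a valid derived address */
    return base + offsetof(Obj, elems) + i * sizeof(int32_t);
}
```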

Also note that there are plans to move away from selection dag,
so it might be good to sync this work with whatever we end up using
instead.

Cheers,
Rafael

Andrew Trick

2013-Oct-24 21:56 UTC

On Oct 24, 2013, at 2:50 PM, Rafael Espíndola <rafael.espindola at
gmail.com> wrote:
> On 24 October 2013 17:32, Sanjoy Das <sanjoy at azulsystems.com> wrote:
>> Hello llvm-dev!
>>
>> My colleagues and I are currently evaluating llvm's suitability as a
>> JIT compiler interfacing with a precise, relocating garbage collector.
>> While we couldn't find code or writeups that deal with the issues
>> specific to this design goal, it is entirely possible that we may have
>> missed something; we would appreciate references to relevant code or
>> writeups that people on this list may be aware of.
>
> This would be hard. Currently what we have support for is a non-moving
> GC where all the roots are in memory. Adding support for a non-moving
> gc with register roots would not be too hard, and it might be possible to
> reuse some of the recent stackmap work.
>
> For a moving GC you would probably have to change how we represent
> pointer arithmetic in the selection dag and MI. It would be quite a
> big change. CCing Andy and Patrick since they might have an idea of
> how much work that would be and what the costs and benefits for LLVM
> are.

100% agreement.

> Also note that there are plans to move away from selection dag,
> so it might be good to sync this work with whatever we end up using
> instead.

FYI: when this was discussed, I heard it mentioned that GEPs should be
lowered early in the IR->MI pipeline. I didn't hear any ideas that
would make derived pointer tracking easier.

-Andy

Philip Reames

2013-Oct-25 18:50 UTC

On 10/24/13 2:50 PM, Rafael Espíndola wrote:
> On 24 October 2013 17:32, Sanjoy Das <sanjoy at azulsystems.com> wrote:
>> Hello llvm-dev!
>>
>> My colleagues and I are currently evaluating llvm's suitability as a
>> JIT compiler interfacing with a precise, relocating garbage collector.
>> While we couldn't find code or writeups that deal with the issues
>> specific to this design goal, it is entirely possible that we may have
>> missed something; we would appreciate references to relevant code or
>> writeups that people on this list may be aware of.
>
> This would be hard. Currently what we have support for is a non-moving
> GC where all the roots are in memory. Adding support for a non-moving
> gc with register roots would not be too hard, and it might be possible to
> reuse some of the recent stackmap work.

Agreed. I think all the mechanisms are either in tree already or
shortly will be. There's still a fair amount of work required to make
it all work together, but the task seems approachable.

> For a moving GC you would probably have to change how we represent
> pointer arithmetic in the selection dag and MI. It would be quite a
> big change. CCing Andy and Patrick since they might have an idea of
> how much work that would be and what the costs and benefits for LLVM
> are.
>
> Also note that there are plans to move away from selection dag,
> so it might be good to sync this work with whatever we end up using
> instead.

Ouch. I hadn't realized that GEPs were desugared to integer arithmetic
that early. That does seem like it would be a problem. Thank you for
pointing this out.

Assuming we had a scheme to avoid/solve this specific issue, are you 
aware of any other ones?

Can you point me to any previous discussion of the selection dag 
migration?  This isn't a topic I usually follow on the list.  A quick 
search didn't find the conversations you're referencing.

Philip

Talin

2013-Oct-29 03:51 UTC

Sanjoy: This document, which I wrote several years ago, may be of some use
to you:

Building a Stack Crawler in LLVM
<https://docs.google.com/document/d/1-ws0KYo47S0CgqpwkjfWDBJ8wFhW_0UYKxPIJ0TyKrQ/edit?usp=sharing&authkey=COD8_LcL>

I have successfully implemented a copying collector using LLVM. I did not
implement support for interior pointers, however I have a number of ideas
on how to approach it. The language that I was implementing was similar to
C# in that classes were divided into "reference" types and "value" types. A
pointer to a reference type on the heap always pointed to the start of an
allocation, whereas a pointer to a value type was always an interior
pointer. (In other words, a value type could never exist by itself in the
heap, it had to be embedded within some reference type). This means that
the compiler could always know whether a pointer was an internal pointer or
not. Thus, pointers to reference types were just machine-level pointers,
whereas pointers to value types consisted of a pointer+offset, with the
pointer part pointing to the start of a heap allocation. Combined with the
fact that pointers to value types were relatively rare, this allowed
internal pointers with a minimum of overhead.
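The pointer+offset representation for value types described above might look roughly like this in C. This is a sketch under assumptions: `InteriorPtr`, `interior`, `interior_addr`, and `interior_relocate` are hypothetical names, and the collector is assumed to trace and update only `base`, which always points at the start of a heap allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interior pointer to an embedded value type: `base` always
   points at the start of the enclosing heap allocation and is the only
   part the collector traces and updates. */
typedef struct {
    void  *base;
    size_t offset; /* byte offset of the value within the allocation */
} InteriorPtr;

static InteriorPtr interior(void *base, size_t offset) {
    InteriorPtr p = { base, offset };
    return p;
}

/* Recompute the machine address from the current base on every access. */
static void *interior_addr(InteriorPtr p) {
    assert(p.base != NULL);
    return (char *)p.base + p.offset;
}

/* Called when the enclosing object moves: only `base` changes, so the
   interior pointer keeps referring to the same field. */
static void interior_relocate(InteriorPtr *p, void *new_base) {
    p->base = new_base;
}
```

The cost is one extra word per interior pointer and an add on each access, which matches the observation that this stays cheap when such pointers are rare.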

Now, having said all that, I feel compelled to give a few warnings. Part of
the reason I abandoned this project was the limitations in LLVM's
garbage collection intrinsics, which I have written about extensively on
this list. The current llvm.gcroot strategy forces the frontend to be very
complex and to generate highly inefficient code, and that code is mostly
unoptimizable, since LLVM's optimizers generally won't touch a value that
has been marked as a GC root.
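A rough C analogue of the spill/reload discipline the llvm.gcroot approach pushes onto the frontend: each live reference is stored to a registered stack slot before any call that might collect, and reloaded from that slot afterwards because the object may have moved. The toy collector, the single global root registration, and all names here (`registered_root`, `toy_collect`, `call_that_may_gc`) are hypothetical stand-ins for a real runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy runtime: one registered root slot and a "moving"
   collector that copies the referenced object into a to-space buffer. */
static void **registered_root = NULL;
static int to_space[16];

static void toy_collect(size_t size) {
    if (registered_root != NULL && *registered_root != NULL) {
        assert(size <= sizeof to_space);
        memcpy(to_space, *registered_root, size);
        *registered_root = to_space; /* the collector rewrites the slot */
    }
}

/* The pattern a frontend must emit around a call that may collect:
   spill the reference to its root slot, call, then reload it. */
static int *call_that_may_gc(int *obj, size_t size) {
    void *slot = obj;        /* spill: the slot is what the GC sees */
    registered_root = &slot;
    toy_collect(size);       /* safepoint: the object may move */
    obj = slot;              /* reload: the old value of `obj` is now stale */
    registered_root = NULL;
    return obj;
}
```

Each such spill and reload is a memory round trip that the optimizer is then reluctant to remove, which is one way to read the inefficiency described above.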

Worse, the support for GC in the LLVM community is fairly low: the garbage
collection intrinsics in LLVM have not been updated or improved in the 7
years I have been following the project, and there's been very little
discussion of GC on the mailing list (I search the LLVM archives for the
word "collect" about once a month, which is how this thread came to my
attention). Most of the people working on/with LLVM are working with non-GC
languages, or with languages that have simple enough memory models (e.g.
"everything is an atom") that the existing intrinsics are sufficient. There
are also a few people who have gotten around the problems by defining their
own stack frames instead of using the LLVM intrinsics.

There have been numerous proposals over the years for better GC intrinsics,
but nothing has come out of these discussions so far. There's a good reason
for this: improving the GC support would require a major commitment, since
all of the backend code generators and optimizers would potentially be
affected.

My current favorite GC proposal involves annotating types, that is, defining
a new kind of derived type that is essentially a 2-tuple consisting of a
base type + GC metadata (the second argument to llvm.gcroot). This means
that "root-ness" would be a property of a type rather than a value, so
root-ness could automatically be propagated to intermediate values or SSA
values through optimization without the frontend having to do a lot of
spilling and reloading of values. (Plus, the ability to associate metadata
with types might be useful for other things besides GC.)

-- 
-- Talin

Sanjoy Das

2013-Oct-29 17:17 UTC

Hi Talin,

Thank you for your response!

Since you mention that you don't implement derived pointers, I'm a
little confused about how your approach solves the issue I brought up.
It seems to me that unless you plan on pinning some objects, support
for derived pointers isn't optional.  I'm not talking about derived
pointers that the front-end introduces (which can be tracked) but the
ones that are introduced by llvm during IR level optimizations or,
worse, during and after instruction selection.

To be sure that we're talking about the same thing, as an example,
consider the loop (in pseudo llvm IR):

for (int i = 0 to (length - 1)) {
 total += a[i];
 safepoint(); // This thread may be stopped here, and `a'
              // may be relocated.
}

llvm can transform this loop to

for (int *i = &a[0] to &a[length - 1]) {
 total += *i;
 safepoint(); // llvm has now introduced an additional
              // derived pointer, `i'.
}

From llvm's point of view this is a valid transformation, but it ends
up creating a new pointer that the GC has to be aware of and that needs
to be relocated in sync with `a'.
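A runnable C rendering of the loop above, with a hypothetical `toy_relocate` standing in for the collector moving `a` at the safepoint. It assumes the strength-reduced form keeps both the cursor and the end bound as pointers into `a` (so both are derived pointers), and shows that the result is only correct if those are rebased along with `a`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a moving collector: copy the array to `to`,
   poison the old copy so stale reads are visible, return the new base. */
static int *toy_relocate(int *from, int *to, size_t n) {
    assert(from != to);
    memcpy(to, from, n * sizeof *from);
    memset(from, 0, n * sizeof *from);
    return to;
}

/* Strength-reduced form of the loop: `p` (the cursor) and `end` are derived
   pointers into `a`. One collection is simulated at the first safepoint.
   With fix_derived != 0 both are rebased by their offset from the old base,
   as a relocating GC would have to do; otherwise they keep pointing into
   the old (now poisoned) copy. */
static long sum_with_move(int *a, int *to_space, size_t length, int fix_derived) {
    long total = 0;
    int *end = a + length; /* one past the last element */
    for (int *p = a; p < end; p++) {
        total += *p;
        if (p == a) { /* safepoint after the first element: `a` moves */
            int *old_base = a;
            a = toy_relocate(a, to_space, length);
            if (fix_derived) {
                end = a + (end - old_base);
                p = a + (p - old_base);
            }
        }
    }
    return total;
}
```

A real collector would not poison the old copy like this, but continuing to read through a stale derived pointer after the move is the failure the example is meant to expose.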

Thanks!
-- Sanjoy
