thr3ads.net - llvm dev - [LLVMdev] Interfacing llvm with a precise, relocating GC [Oct 2013]

If this information is useful, please help other people find it:
Share via:

Michael Lewis

2013-Oct-26 04:37 UTC

[LLVMdev] Interfacing llvm with a precise, relocating GC

I'm also highly interested in relocating-GC support from LLVM. Up until now
my GC implementation has been non-relocating which is obviously kind of a
bummer given that it inhibits certain classes of memory
allocation/deallocation tricks.


I wrote up a bunch of my findings on the implementation of my GC here:
https://code.google.com/p/epoch-language/wiki/GarbageCollectionScheme


Frankly I haven't had time to get much further than idle pondering of how
I'd go about implementing relocation, but it seems to me like the existing
read/write barrier intrinsics might be sufficient if abused properly by the
front-end and lowered carefully. My operating hypothesis has been to stash
barriers at key points - identified by the front-end itself - and possibly
elide them during lowering if it's safe to do so. If my understanding is
correct, it should be possible to lower the barriers into exactly the kind
of boxing/unboxing procedure described in this thread.

Based on my experimentation so far, which is admittedly minimal, I think
the overhead of updating relocated pointers is actually not as bad as it
sounds. It isn't strictly necessary to store both a boxed *and* unboxed
pointer in every frame. In fact, in the current incarnation of gcroot,
marking an alloca as a gcroot effectively forces a stack spill for that
alloca; this means that the relocating collector just updates the single
existing pointer on the stack when it does its magic, and you're done. With
proper barriers in place, and/or careful location of safepoints by the
front-end, it's not that hard to make sure that a relocated pointer gets
updated.

The real trick, as near as I can tell, would be updating registers that
have live roots during a collection. But again based on my past
investigations this should just be a matter of ensuring that
post-collection you have a barrier that is lowered into a register refresh
based on the on-stack relocated pointer value.


One thing I've been meaning to try and do is use this barrier-abuse scheme
to work around the existing lack of support for tracing roots in machine
registers, by effectively setting up an artificial stack spill just prior
to a collection, and otherwise operating on registers as much as possible
instead of gcroot'ed allocas. Again, I've only considered this as a
thought
exercise until now, so I apologize if there's some obvious flaw I'm not
aware of in my reasoning. I haven't actually done any of this stuff yet so
it's more speculative than anything - just trying to creatively engineer
around the existing limitations :-)



 - Mike




On Fri, Oct 25, 2013 at 5:35 PM, Philip Reames <listmail at
philipreames.com>wrote:
>  On 10/25/13 1:10 PM, Ben Karel wrote:
>
>
>
>
> On Thu, Oct 24, 2013 at 6:42 PM, Sanjoy Das <sanjoy at
azulsystems.com>wrote:
>
>> Hi Rafael, Andrew,
>>
>> Thank you for the prompt reply.
>>
>> One approach we've been considering involves representing the
>> constraint "pointers to heap objects are invalidated at every
>> safepoint" somehow in the IR itself.  So, if %a and %b are values
the
>> GC is interested in, the safepoint at the back-edge of a loop might
>> look like:
>>
>>   ; <label>: body
>>     %a = phi i32 [ %a.relocated, %body ] [ %a.initial_value, %pred ]
>>     %b = phi i32 [ %b.relocated, %body ] [ %b.initial_value, %pred ]
>>     ;; Use %a and %b
>>
>>     ;; The safepoint starts here
>>     %a.relocated = @llvm.gc_relocate(%a)
>>     %b.relocated = @llvm.gc_relocate(%b)
>>     br %body
>>
>> This allows us to not bother with relocating derived pointers pointing
>> inside %a and %b, since it is semantically incorrect for llvm to reuse
>> them in the next iteration of the loop.
>
>
>  This is the right general idea, but you can already express this
> constraint in LLVM as it exists today, by using llvm.gcroot().  As you
> noted, this also solves the interior-pointer problem by making it the front
> end's job to convey to LLVM when it would/would not be safe to cache
> interior pointers across loop iterations. The invariant that a front-end
> must maintain is that any pointer which is live across a given
> potential-GC-point must be reloaded from its root after a (relocating) GC
> might have occurred. This falls naturally out of the viewpoint that %a is
> not an object pointer, it's the name of an object pointer's value
at a
> given point in time. So, of course, whenever that object pointer's
value
> might change, there must be a new name.
>
> To rephrase, you're saying that the current gcroot mechanism is
analogous
> to a boxed object pointer.  We could model a safepoint by storing the
> unboxed value into the box, applying an opaque operation to the box, and
> reloading the raw object pointer from the box.  (The opaque operator is key
> to prevent the store/load pair from being removed.  It also implements the
> actual safepoint operation.)  Is this a correct restatement?
>
> Here's an example:
> gcroot a = ...; b = ...;
> object* ax = *a, bx = *b;
> repeat 500000 iterations {
>   spin for 50 instructions
>   *a = ax;
>   *b = bx;
>   safepoint(a, b)
>   ax = *a;
>   bx = *b;
> }
>
> If so, I would agree with you that this is a correct encoding of object
> relocation.  I think what got me confused was using the in-tree examples to
> reverse engineer the semantics.  Both the Erlang and Ocaml examples insert
> safepoints after all the LLVM IR passes are run and derived pointers were
> inserted.  This was the part that got me thinking that the gcroot semantics
> were unsuitable for a precise relocating collector.  If you completely
> ignored these implementations and inserted the full box/safepoint
> call/unbox implementation at each safepoint before invoking any
> optimizations, I think this would be correct.
>
> One problem with this encoding is that there does not currently exist an
> obvious mechanism to describe a safepoint in the IR itself.  (i.e. there is
> no llvm.gcsafepoint)  You could model this with a "well known"
function
> which a custom GCStrategy recorded a stackmap on every call.  It would be
> good to extend the existing intrinsics with such a mechanism.
>
> At least if I'm reading things right, the current *implementation* is
not
> a correct implementation of the intended semantics.  In particular, the
> safepoint operation which provides the opaque manipulation of the boxed
> gcroots is not preserved into the SelectionDAG.  As a result, optimizations
> on the SelectionDAG could eliminate the now adjacent box/unbox pairs.  This
> would break a relocating collector.
>
> Leaving this aside, there are also a number of performance problems with
> the current implementation.  I'll summarize them here for now, but will
> expand into approaches for resolving each if this looks like the most
> likely implementation strategy.
> 1) Explicitly adding box/safepoint/unbox prevents creation of derived
> pointers.  This prevents optimizations such as loop-blocking.  Ideally,
> we'd rather let the derived pointers be created and capture them for
> updating at the safepoint.
> 2) The redundant loads and stores required for box/unbox.  (These might be
> partially eliminated by other optimization passes.)
> 3) The inability to allocate GC roots to registers.
> 4) Frame sizes are implicitly doubled since both an unboxed and boxed
> representation of every object pointer is required.
>
> Frankly, I think that (3) is likely solvable, but that (1) and (4) could
> be fatal for a high performance implementation.  I don't have hard
numbers
> on that; I'm simply stating my intuition.
>
>
>  The fact that the mutable memory associated with a gcroot() is allocated
> on the stack (rather than, say, a machine register) is an implementation
> detail; fixing it doesn't require altering the (conceptual) interface
for
> LLVM's existing GC support, AFAICT.
>
> While in principal, I agree with you here, in practice the difference is
> actually quite significant.  I am not convinced that the implementation
> would be straight forward.  Let's set this aside for the moment and
focus
> on the more fundamental questions.
>
>
>
>> We lower gc_relocate to a
>> pseudo opcode which lowered into nothing after register allocation.
>>
>> The problem is, of course, the performance penalty.  Does it make
>> sense to get the register allocator "see" the gc_relocate
instruction
>> as a copy so that they get the same register / slot?  Will that
>> violate the intended semantics of gc_relocate (for example, after
>> assigning the same register / slot to %a and %a.relocated, are there
>> passes that will try to cache derived pointers across loop
>> iterations)?
>>
>> Thanks,
>> -- Sanjoy
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>
>
>
> _______________________________________________
> LLVM Developers mailing listLLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131025/8ff7c861/attachment.html>

Filip Pizlo

2013-Oct-26 14:40 UTC

head link

[LLVMdev] Interfacing llvm with a precise, relocating GC

> On Oct 26, 2013, at 12:37 AM, Michael Lewis <don.apoch at gmail.com>
wrote:
> 
> I'm also highly interested in relocating-GC support from LLVM. Up until
now my GC implementation has been non-relocating which is obviously kind of a
bummer given that it inhibits certain classes of memory allocation/deallocation
tricks.
You can implement a copying GC (what the kids these days call relocating)
without accurate roots.
> 
> 
> I wrote up a bunch of my findings on the implementation of my GC here:
https://code.google.com/p/epoch-language/wiki/GarbageCollectionScheme
> 
Why aren't you just using the well-known Bartlett style of GC, which allows
for relocation even in the presence of conservative roots (or accurate roots
that don't allow relocation)?
> 
> Frankly I haven't had time to get much further than idle pondering of
how I'd go about implementing relocation, but it seems to me like the
existing read/write barrier intrinsics might be sufficient if abused properly by
the front-end and lowered carefully. My operating hypothesis has been to stash
barriers at key points - identified by the front-end itself - and possibly elide
them during lowering if it's safe to do so. If my understanding is correct,
it should be possible to lower the barriers into exactly the kind of
boxing/unboxing procedure described in this thread.
> 
> Based on my experimentation so far, which is admittedly minimal, I think
the overhead of updating relocated pointers is actually not as bad as it sounds.
It isn't strictly necessary to store both a boxed *and* unboxed pointer in
every frame. In fact, in the current incarnation of gcroot, marking an alloca as
a gcroot effectively forces a stack spill for that alloca; this means that the
relocating collector just updates the single existing pointer on the stack when
it does its magic, and you're done. With proper barriers in place, and/or
careful location of safepoints by the front-end, it's not that hard to make
sure that a relocated pointer gets updated.
Bartlett collectors will give you relocation with zero overhead. You won't
even need any help from LLVM to do it.
> 
> The real trick, as near as I can tell, would be updating registers that
have live roots during a collection. But again based on my past investigations
this should just be a matter of ensuring that post-collection you have a barrier
that is lowered into a register refresh based on the on-stack relocated pointer
value.
> 
> 
> One thing I've been meaning to try and do is use this barrier-abuse
scheme to work around the existing lack of support for tracing roots in machine
registers, by effectively setting up an artificial stack spill just prior to a
collection, and otherwise operating on registers as much as possible instead of
gcroot'ed allocas. Again, I've only considered this as a thought
exercise until now, so I apologize if there's some obvious flaw I'm not
aware of in my reasoning. I haven't actually done any of this stuff yet so
it's more speculative than anything - just trying to creatively engineer
around the existing limitations :-)
> 
> 
> 
>  - Mike
> 
> 
> 
> 
>> On Fri, Oct 25, 2013 at 5:35 PM, Philip Reames <listmail at
philipreames.com> wrote:
>>> On 10/25/13 1:10 PM, Ben Karel wrote:
>>> 
>>> 
>>> 
>>> On Thu, Oct 24, 2013 at 6:42 PM, Sanjoy Das <sanjoy at
azulsystems.com> wrote:
>>>> Hi Rafael, Andrew,
>>>> 
>>>> Thank you for the prompt reply.
>>>> 
>>>> One approach we've been considering involves representing
the
>>>> constraint "pointers to heap objects are invalidated at
every
>>>> safepoint" somehow in the IR itself.  So, if %a and %b are
values the
>>>> GC is interested in, the safepoint at the back-edge of a       
loop might
>>>> look like:
>>>> 
>>>>   ; <label>: body
>>>>     %a = phi i32 [ %a.relocated, %body ] [ %a.initial_value,
%pred ]
>>>>     %b = phi i32 [ %b.relocated, %body ] [ %b.initial_value,
%pred ]
>>>>     ;; Use %a and %b
>>>> 
>>>>     ;; The safepoint starts here
>>>>     %a.relocated = @llvm.gc_relocate(%a)
>>>>     %b.relocated = @llvm.gc_relocate(%b)
>>>>     br %body
>>>> 
>>>> This allows us to not bother with relocating derived pointers
pointing
>>>> inside %a and %b, since it is semantically incorrect for llvm
to reuse
>>>> them in the next iteration of the loop.
>>> 
>>> This is the right general idea, but you can already express this
constraint in LLVM as it exists today, by using llvm.gcroot().  As you noted,
this also solves the interior-pointer problem by making it the front end's
job to convey to LLVM when it would/would not be safe to cache interior pointers
across loop iterations. The invariant that a front-end must maintain is that any
pointer which is live across a given potential-GC-point must be reloaded from
its root after a (relocating) GC might have occurred. This falls naturally out
of the viewpoint that %a is not an object pointer, it's the name of an
object pointer's value at a given point in time. So, of course, whenever
that object pointer's value might change, there must be a new name.
>> To rephrase, you're saying that the current gcroot mechanism is
analogous to a boxed object pointer.  We could model a safepoint by storing the
unboxed value into the box, applying an opaque operation to the box, and
reloading the raw object pointer from the box.  (The opaque operator is key to
prevent the store/load pair from being removed.  It also implements the actual
safepoint operation.)  Is this a correct restatement?
>> 
>> Here's an example:
>> gcroot a = ...; b = ...;
>> object* ax = *a, bx = *b;
>> repeat 500000 iterations { 
>>   spin for 50 instructions
>>   *a = ax;
>>   *b = bx;
>>   safepoint(a, b)
>>   ax = *a;
>>   bx = *b;
>> } 
>> 
>> If so, I would agree with you that this is a correct encoding of object
relocation.  I think what got me confused was using the in-tree examples to
reverse engineer the semantics.  Both the Erlang and Ocaml examples insert
safepoints after all the LLVM IR passes are run and derived pointers were
inserted.  This was the part that got me thinking that the gcroot semantics were
unsuitable for a precise relocating collector.  If you completely ignored these
implementations and inserted the full box/safepoint call/unbox implementation at
each safepoint before invoking any optimizations, I think this would be correct.
>> 
>> One problem with this encoding is that there does not currently exist
an obvious mechanism to describe a safepoint in the IR itself.  (i.e. there is
no llvm.gcsafepoint)  You could model this with a "well known"
function which a custom GCStrategy recorded a stackmap on every call.  It would
be good to extend the existing intrinsics with such a mechanism.
>> 
>> At least if I'm reading things right, the current *implementation*
is not a correct implementation of the intended semantics.  In particular, the
safepoint operation which provides the opaque manipulation of the boxed gcroots
is not preserved into the SelectionDAG.  As a result, optimizations on the
SelectionDAG could eliminate the now adjacent box/unbox pairs.  This would break
a relocating collector.
>> 
>> Leaving this aside, there are also a number of performance problems
with the current implementation.  I'll summarize them here for now, but will
expand into approaches for resolving each if this looks like the most likely
implementation strategy.
>> 1) Explicitly adding box/safepoint/unbox prevents creation of derived
pointers.  This prevents optimizations such as loop-blocking.  Ideally, we'd
rather let the derived pointers be created and capture them for updating at the
safepoint.
>> 2) The redundant loads and stores required for box/unbox.  (These might
be partially eliminated by other optimization passes.)
>> 3) The inability to allocate GC roots to registers.  
>> 4) Frame sizes are implicitly doubled since both an unboxed and boxed
representation of every object pointer is required.
>> 
>> Frankly, I think that (3) is likely solvable, but that (1) and (4)
could be fatal for a high performance implementation.  I don't have hard
numbers on that; I'm simply stating my intuition.
>> 
>>> 
>>> The fact that the mutable memory associated with a gcroot() is
allocated on the stack (rather than, say, a machine register) is an
implementation detail; fixing it doesn't require altering the (conceptual)
interface for LLVM's existing GC support, AFAICT.
>> While in principal, I agree with you here, in practice the difference
is actually quite significant.  I am not convinced that the implementation would
be straight forward.  Let's set this aside for the moment and focus on the
more fundamental questions.
>>>  
>>>> We lower gc_relocate to a
>>>> pseudo opcode which lowered into nothing after register
allocation.
>>>> 
>>>> The problem is, of course, the performance penalty.  Does it
make
>>>> sense to get the register allocator "see" the
gc_relocate instruction
>>>> as a copy so that they get the same register / slot?  Will that
>>>> violate the intended semantics of gc_relocate (for example,
after
>>>> assigning the same register / slot to %a and %a.relocated, are
there
>>>> passes that will try to cache derived pointers across loop
>>>> iterations)?
>>>> 
>>>> Thanks,
>>>> -- Sanjoy
>>>> 
>>>> 
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> 
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131026/57acb2a3/attachment.html>

Philip Reames

2013-Oct-28 18:03 UTC

head link

[LLVMdev] Interfacing llvm with a precise, relocating GC

On 10/26/13 7:40 AM, Filip Pizlo wrote:> You can implement a copying GC (what the kids these days call
> relocating) without accurate roots.I use "relocating" to brush over the distinction between
"copying" and
"compacting" collectors. For the purposes of our discussions, the two
are interchangeable though.> Why aren't you just using the well-known Bartlett style of GC, which
> allows for relocation even in the presence of conservative roots (or
> accurate roots that don't allow relocation)?You've brought this up a few times, so I want to go ahead and address
your point in detail.

Background for others following this discussion: A Bartlett style
garbage collector conservatively scans the stack and promotes* any
object reachable from a possible heap pointer in place. Objects
directly reachable from the stack are not relocated. Objects reachable
from these objects are treated normally by the relocating collector and
may be moved. To support in place promotion, a small amount of extra
heap metadata is generally required.

* "Promote" is used to indicate the object is conceptually moved into
the "to space". This is not the same promotion used when discussing
generational collectors.

I agree that Bartlett collectors have significant advantages with
regards to compatibility with legacy code and compilers. A Bartlett
collector does not require derived pointer tracking inside the
compiler. A Bartlett collector simplifies interfacing with code in
other languages (i.e. C/C++) since it has far fewer assumptions about
stack layout.

Unfortunately, Bartlett collectors also have a number of serious
disadvantages. To name a few:
- False Retention - Due to conservative stack scanning, objects which
are not live may be falsely retained. (i.e. false roots or dead roots)
- External Fragmentation - Promoted-in-place objects can be anywhere in
the heap. This introduces fragmentation and complicates both allocation
and collection.
- Internal Fragmentation - Promoted-in-place objects cause the retention
of the entire "page" containing them. Unless using a free-list style
allocator, all of this space is inaccessible.
- Garbage Drag - Since falsely retained data can point to other heap
objects, an unknown fraction of the heap may be falsely kept live. This
is particularly problematic for programs with extreme heap usage.
Consider a program which maintains a large B-tree in memory with
updates. Or a program manipulating large graphs. In either case, a
single stray pointer can force the retention of large portions of the
heap graph. (In practice, this does happen.)

I do acknowledge that each of these is individually surmountable.
However, doing so requires significant work beyond the basic design
described in Bartlett's initial work. Given the complexity of
engineering a production grade garbage collector, this additional
complexity is highly undesirable.

Taking a step back from the technical merits, I have one other
overriding reason for not pursuing a Bartlett style approach: we already
have a fully precise relocating collector. We are not interested in
abandoning this. From our side, the only question is whether LLVM is
suitable for use with our existing collector.

I'm happy to keep debating the merits of various approaches with you
(this is fun!), but from a practical perspective, using an alternate
approach to garbage collection is simply a non-starter for us.

Yours,
Philip

Philip Reames

2013-Oct-28 21:26 UTC

head link

[LLVMdev] Interfacing llvm with a precise, relocating GC

On 10/25/13 9:37 PM, Michael Lewis wrote:> I'm also highly interested in relocating-GC support from LLVM. Up 
> until now my GC implementation has been non-relocating which is 
> obviously kind of a bummer given that it inhibits certain classes of 
> memory allocation/deallocation tricks.Out of idle curiosity, which optimizations were you most interested
in?> I wrote up a bunch of my findings on the implementation of my GC here: 
> https://code.google.com/p/epoch-language/wiki/GarbageCollectionSchemeThanks for sharing your experiences.  I had actually run across this a 
while ago and read through it.  It's always nice to learn from others 
rather than repeating the same lessons. For example, the stack coloring 
problem you mention was one I'm glad to know I don't have to discover 
myself.  :)>
>
> Frankly I haven't had time to get much further than idle pondering of 
> how I'd go about implementing relocation, but it seems to me like the 
> existing read/write barrier intrinsics might be sufficient if abused 
> properly by the front-end and lowered carefully. My operating 
> hypothesis has been to stash barriers at key points - identified by 
> the front-end itself - and possibly elide them during lowering if it's 
> safe to do so. If my understanding is correct, it should be possible 
> to lower the barriers into exactly the kind of boxing/unboxing 
> procedure described in this thread.
>
> Based on my experimentation so far, which is admittedly minimal, I 
> think the overhead of updating relocated pointers is actually not as 
> bad as it sounds. It isn't strictly necessary to store both a boxed 
> *and* unboxed pointer in every frame. In fact, in the current 
> incarnation of gcroot, marking an alloca as a gcroot effectively 
> forces a stack spill for that alloca; this means that the relocating 
> collector just updates the single existing pointer on the stack when 
> it does its magic, and you're done. With proper barriers in place, 
> and/or careful location of safepoints by the front-end, it's not that 
> hard to make sure that a relocated pointer gets updated.I'm not concerned about the correctness of such an implementation. I am 
concerned about the performance.  I think it's time for us to do some 
prototyping on our side to see how things work out in practice.  I can't 
promise to share the detailed results, but I will try to share as much 
as I can.>
> The real trick, as near as I can tell, would be updating registers 
> that have live roots during a collection. But again based on my past 
> investigations this should just be a matter of ensuring that 
> post-collection you have a barrier that is lowered into a register 
> refresh based on the on-stack relocated pointer value.
>
>
> One thing I've been meaning to try and do is use this barrier-abuse 
> scheme to work around the existing lack of support for tracing roots 
> in machine registers, by effectively setting up an artificial stack 
> spill just prior to a collection, and otherwise operating on registers 
> as much as possible instead of gcroot'ed allocas. Again, I've only 
> considered this as a thought exercise until now, so I apologize if 
> there's some obvious flaw I'm not aware of in my reasoning. I
haven't
> actually done any of this stuff yet so it's more speculative than 
> anything - just trying to creatively engineer around the existing 
> limitations :-)Hey speculation is fine!  That's what I'm doing at the moment.  You have
to speculate while trying figure out if something is worth actually 
implementing.  :)

This is conceptually close to the scheme I'm considering, though 
different in details.  If you get a chance, let me know how it works
out.>
>
>
>  - Mike
>
>
>
>
> On Fri, Oct 25, 2013 at 5:35 PM, Philip Reames 
> <listmail at philipreames.com <mailto:listmail at
philipreames.com>> wrote:
>
>     On 10/25/13 1:10 PM, Ben Karel wrote:
>>
>>
>>
>>     On Thu, Oct 24, 2013 at 6:42 PM, Sanjoy Das
>>     <sanjoy at azulsystems.com <mailto:sanjoy at
azulsystems.com>> wrote:
>>
>>         Hi Rafael, Andrew,
>>
>>         Thank you for the prompt reply.
>>
>>         One approach we've been considering involves representing
the
>>         constraint "pointers to heap objects are invalidated at
every
>>         safepoint" somehow in the IR itself.  So, if %a and %b are
>>         values the
>>         GC is interested in, the safepoint at the back-edge of a loop
>>         might
>>         look like:
>>
>>           ; <label>: body
>>             %a = phi i32 [ %a.relocated, %body ] [ %a.initial_value,
>>         %pred ]
>>             %b = phi i32 [ %b.relocated, %body ] [ %b.initial_value,
>>         %pred ]
>>             ;; Use %a and %b
>>
>>             ;; The safepoint starts here
>>             %a.relocated = @llvm.gc_relocate(%a)
>>             %b.relocated = @llvm.gc_relocate(%b)
>>             br %body
>>
>>         This allows us to not bother with relocating derived pointers
>>         pointing
>>         inside %a and %b, since it is semantically incorrect for llvm
>>         to reuse
>>         them in the next iteration of the loop. 
>>
>>
>>     This is the right general idea, but you can already express this
>>     constraint in LLVM as it exists today, by using
>>     llvm.gcroot().  As you noted, this also solves the
>>     interior-pointer problem by making it the front end's job to
>>     convey to LLVM when it would/would not be safe to cache interior
>>     pointers across loop iterations. The invariant that a front-end
>>     must maintain is that any pointer which is live across a given
>>     potential-GC-point must be reloaded from its root after a
>>     (relocating) GC might have occurred. This falls naturally out of
>>     the viewpoint that %a is not an object pointer, it's the name
of
>>     an object pointer's value at a given point in time. So, of
>>     course, whenever that object pointer's value might change,
there
>>     must be a new name.
>     To rephrase, you're saying that the current gcroot mechanism is
>     analogous to a boxed object pointer.  We could model a safepoint
>     by storing the unboxed value into the box, applying an opaque
>     operation to the box, and reloading the raw object pointer from
>     the box.  (The opaque operator is key to prevent the store/load
>     pair from being removed.  It also implements the actual safepoint
>     operation.)  Is this a correct restatement?
>
>     Here's an example:
>     gcroot a = ...; b = ...;
>     object* ax = *a, bx = *b;
>     repeat 500000 iterations {
>       spin for 50 instructions
>       *a = ax;
>       *b = bx;
>       safepoint(a, b)
>       ax = *a;
>       bx = *b;
>     }
>
>     If so, I would agree with you that this is a correct encoding of
>     object relocation.  I think what got me confused was using the
>     in-tree examples to reverse engineer the semantics.  Both the
>     Erlang and Ocaml examples insert safepoints after all the LLVM IR
>     passes are run and derived pointers were inserted.  This was the
>     part that got me thinking that the gcroot semantics were
>     unsuitable for a precise relocating collector.  If you completely
>     ignored these implementations and inserted the full box/safepoint
>     call/unbox implementation at each safepoint before invoking any
>     optimizations, I think this would be correct.
>
>     One problem with this encoding is that there does not currently
>     exist an obvious mechanism to describe a safepoint in the IR
>     itself.  (i.e. there is no llvm.gcsafepoint)  You could model this
>     with a "well known" function which a custom GCStrategy
recorded a
>     stackmap on every call.  It would be good to extend the existing
>     intrinsics with such a mechanism.
>
>     At least if I'm reading things right, the current *implementation*
>     is not a correct implementation of the intended semantics.  In
>     particular, the safepoint operation which provides the opaque
>     manipulation of the boxed gcroots is not preserved into the
>     SelectionDAG. As a result, optimizations on the SelectionDAG could
>     eliminate the now adjacent box/unbox pairs.  This would break a
>     relocating collector.
>
>     Leaving this aside, there are also a number of performance
>     problems with the current implementation. I'll summarize them here
>     for now, but will expand into approaches for resolving each if
>     this looks like the most likely implementation strategy.
>     1) Explicitly adding box/safepoint/unbox prevents creation of
>     derived pointers.  This prevents optimizations such as
>     loop-blocking.  Ideally, we'd rather let the derived pointers be
>     created and capture them for updating at the safepoint.
>     2) The redundant loads and stores required for box/unbox.  (These
>     might be partially eliminated by other optimization passes.)
>     3) The inability to allocate GC roots to registers.
>     4) Frame sizes are implicitly doubled since both an unboxed and
>     boxed representation of every object pointer is required.
>
>     Frankly, I think that (3) is likely solvable, but that (1) and (4)
>     could be fatal for a high performance implementation.  I don't
>     have hard numbers on that; I'm simply stating my intuition.
>
>>
>>     The fact that the mutable memory associated with a gcroot() is
>>     allocated on the stack (rather than, say, a machine register) is
>>     an implementation detail; fixing it doesn't require altering
the
>>     (conceptual) interface for LLVM's existing GC support, AFAICT.
>     While in principal, I agree with you here, in practice the
>     difference is actually quite significant.  I am not convinced that
>     the implementation would be straight forward.  Let's set this
>     aside for the moment and focus on the more fundamental questions.
>>
>>         We lower gc_relocate to a
>>         pseudo opcode which lowered into nothing after register
>>         allocation.
>>
>>         The problem is, of course, the performance penalty.  Does it
make
>>         sense to get the register allocator "see" the
gc_relocate
>>         instruction
>>         as a copy so that they get the same register / slot?  Will that
>>         violate the intended semantics of gc_relocate (for example,
after
>>         assigning the same register / slot to %a and %a.relocated,
>>         are there
>>         passes that will try to cache derived pointers across loop
>>         iterations)?
>>
>>         Thanks,
>>         -- Sanjoy
>>
>>
>>         _______________________________________________
>>         LLVM Developers mailing list
>>         LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>         http://llvm.cs.uiuc.edu
>>         http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>>
>>
>>     _______________________________________________
>>     LLVM Developers mailing list
>>     LLVMdev at cs.uiuc.edu  <mailto:LLVMdev at cs.uiuc.edu>      
http://llvm.cs.uiuc.edu
>>     http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>     _______________________________________________
>     LLVM Developers mailing list
>     LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>     http://llvm.cs.uiuc.edu
>     http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131028/39dbd3e2/attachment.html>

Possibly Parallel Threads

Search for more maybe matching threads

llvm dev - Oct 2013 - [LLVMdev] Interfacing llvm with a precise, relocating GC

[LLVMdev] Interfacing llvm with a precise, relocating GC

[LLVMdev] Interfacing llvm with a precise, relocating GC

[LLVMdev] Interfacing llvm with a precise, relocating GC

[LLVMdev] Interfacing llvm with a precise, relocating GC

Possibly Parallel Threads