thr3ads.net - llvm dev - [LLVMdev] Improving Garbage Collection [Jul 2011]

If this information is useful, please help other people find it:
Share via:

Peter Lawrence

2011-Jul-19 17:06 UTC

[LLVMdev] Improving Garbage Collection

Talin,
           how about having the front-end generate an llvm.safe.point 
() intrinsic call at
the desired safe points,  and having the addresses of the GC roots  
(at that point,
can vary from call to call) be the parameters (with noescape  
attribute) to the intrinsic,

IIUC currently the GC roots are tagged, and all analysis and  
transform optimizations
have to special case these tagged objects, but if instead their  
addresses were taken at
the safe points no special casing would have to be done -- all  
analysis and transform
optimizations already know how to deal with objects whose address is  
taken,

and since llvm does already have a "noescape" (not sure that's the
correct name?)
attribute for parameters, these addresses won't be misinterpreted by  
any alias
analysis either, and llvm is free to go ahead and keep these values  
in registers
between safe points  -- you can stop asking how to allow GC roots as  
SSA values,
any traditional load-store optimization pass will do it for you for  
free.

*** without you having to insert explicit load and store  
instructions, and having to
somehow mark them as non-delete-able, or always omit optimization  
passes that
you would otherwise like to have enabled ***

and also without you having to store NULL into a safe point to end  
its lifetime.
and I would suggest eliminating the gcroot() intrinsic as it's  
information content
would be redundant.


thoughts, comments ???


-Peter Lawrence.




On Jul 18, 2011, at 10:53 PM, Talin wrote:
> On Mon, Jul 18, 2011 at 11:00 AM, Peter Lawrence  
> <peterl95124 at sbcglobal.net> wrote:
> Talin,
>           do you identify safe-points in the current or proposed  
> llvm scheme, and if so how,
> or are they implicit as being at all call sites (which begs the  
> question what about leaves
> in the call tree, how does GC get started at all in that case).
>
> The LLVM linker has a feature where you can specify what kind of  
> safe points your collector requires - the options are Loop, Return,  
> PreCall and PostCall. You can also override this behavior and  
> examine each instruction and return a boolean indicating whether it  
> is or isn't a safe point.
>
> Currently I only have function calls as safe points, although I may  
> eventually enable loops as well. As far as leaf functions go,  
> consider that the call to allocate memory is also a safe point -  
> and if a function doesn't allocate any memory then we don't care if
> the GC is involved or not.
>
> One complication with the current scheme is that the frontend has  
> to have a sense of where the safe points are going to be. Because  
> the current scheme requires the frontend to insert additional loads  
> and stores around safe points (for spilling register values to  
> memory so they can be traced), the frontend has to be able to guess  
> which function call might be a safe point - but it can't know for  
> sure due to the fact that optimization and inlining (which happens  
> much later) may cause the removal of the actual call instruction.  
> The safe but inefficient approach is to insert the extra loads and  
> stores around every call instruction.
>
> Peter Lawrence.
>
> -- 
> -- Talin
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110719/d23bc323/attachment.html>

Andrew Trick

2011-Jul-20 18:20 UTC

head link

[LLVMdev] Improving Garbage Collection

On Jul 19, 2011, at 10:06 AM, Peter Lawrence wrote:
> Talin,
>           how about having the front-end generate an llvm.safe.point()
intrinsic call at
> the desired safe points,  and having the addresses of the GC roots (at that
point,
> can vary from call to call) be the parameters (with noescape attribute) to
the intrinsic,
> 
> IIUC currently the GC roots are tagged, and all analysis and transform
optimizations
> have to special case these tagged objects, but if instead their addresses
were taken at
> the safe points no special casing would have to be done -- all analysis and
transform
> optimizations already know how to deal with objects whose address is taken,
> 
> and since llvm does already have a "noescape" (not sure
that's the correct name?)
> attribute for parameters, these addresses won't be misinterpreted by
any alias
> analysis either, and llvm is free to go ahead and keep these values in
registers
> between safe points  -- you can stop asking how to allow GC roots as SSA
values,
> any traditional load-store optimization pass will do it for you for free.
> 
> *** without you having to insert explicit load and store instructions, and
having to
> somehow mark them as non-delete-able, or always omit optimization passes
that
> you would otherwise like to have enabled ***
> 
> and also without you having to store NULL into a safe point to end its
lifetime.
> and I would suggest eliminating the gcroot() intrinsic as it's
information content
> would be redundant.
> 
> 
> thoughts, comments ???
> 
> 
> -Peter Lawrence.
It is helpful to think of the stack/register map generation in two
phases: before and after identifying the location of heap
pointers. Say PrePtrMap and PostPtrMap. We don't need to know stack
offsets and register names at that point, but do we need a 1-1 mapping
from pointer values to identifiable physical locations. I deliberately
avoid calling these values roots here, because we may have multiple
live pointers derived from an object and multiple copies of the same
pointer. GC only needs to see one of these values to trace roots, but
each of these still needs its own entry in the stack/register map for
a moving collector.

PrePtrMap, we need a type system to precisely tag all values known to
contain valid heap pointers. No potentially uninitialized/undefined
values allowed here. IntPtr/PtrInt casts become control dependent.

PostPtrMap, we need an IR that represents mapped pointers as live-in
to safepoints, *and* safepoints need to be defined as clobbering all
locations that may contain a pointer. For example, if we have a
register map, then we can no longer move and add instruction across a
call if it may operate on a pointer. Obviously, it's easier to avoid
invalidating a stack map, but fundamentally the same problem. No
amount of spilling can bail you out without updating the map.

The current gcroot solution works around the lack of LLVM type system
support for heap pointers by effectively mapping pointers very early
(in the front end), reloading roots after safepoints (so we only need
one map entry per root), and relying on the rules that allows callees
to write their caller's stack under certain circumstances (someone
needs to explain these rules to me--is it only possible when an alloca
pointer is taken?).

Your proposal is exactly the same in terms of when the heap pointers
are identified. That leaves all of the LLVM optimizer and codegen
running in PostPtrMap mode. The problem is that LLVM is free to make
copies of pointers and optimize across call sites without knowing how to
update the map.

The most efficient way to support GC is to move identification of
pointer locations as late as possible. Optimization across safepoints
needs to be effectively disabled after that point.

-Andy
> 
> 
> 
> 
> On Jul 18, 2011, at 10:53 PM, Talin wrote:
> 
>> On Mon, Jul 18, 2011 at 11:00 AM, Peter Lawrence <peterl95124 at
sbcglobal.net> wrote:
>> Talin,
>>           do you identify safe-points in the current or proposed llvm
scheme, and if so how,
>> or are they implicit as being at all call sites (which begs the
question what about leaves
>> in the call tree, how does GC get started at all in that case).
>> 
>> The LLVM linker has a feature where you can specify what kind of safe
points your collector requires - the options are Loop, Return, PreCall and
PostCall. You can also override this behavior and examine each instruction and
return a boolean indicating whether it is or isn't a safe point.
>> 
>> Currently I only have function calls as safe points, although I may
eventually enable loops as well. As far as leaf functions go, consider that the
call to allocate memory is also a safe point - and if a function doesn't
allocate any memory then we don't care if the GC is involved or not.
>> 
>> One complication with the current scheme is that the frontend has to
have a sense of where the safe points are going to be. Because the current
scheme requires the frontend to insert additional loads and stores around safe
points (for spilling register values to memory so they can be traced), the
frontend has to be able to guess which function call might be a safe point - but
it can't know for sure due to the fact that optimization and inlining (which
happens much later) may cause the removal of the actual call instruction. The
safe but inefficient approach is to insert the extra loads and stores around
every call instruction.
>> 
>> Peter Lawrence.
>> 
>> -- 
>> -- Talin
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110720/c2e6139b/attachment.html>

Talin

2011-Jul-20 19:39 UTC

head link

[LLVMdev] Improving Garbage Collection

On Wed, Jul 20, 2011 at 11:20 AM, Andrew Trick <atrick at apple.com>
wrote:
> On Jul 19, 2011, at 10:06 AM, Peter Lawrence wrote:
>
> Talin,
>           how about having the front-end generate an llvm.safe.point()
> intrinsic call at
> the desired safe points,  and having the addresses of the GC roots (at that
> point,
> can vary from call to call) be the parameters (with noescape attribute) to
> the intrinsic,
>
> IIUC currently the GC roots are tagged, and all analysis and transform
> optimizations
> have to special case these tagged objects, but if instead their addresses
> were taken at
> the safe points no special casing would have to be done -- all analysis and
> transform
> optimizations already know how to deal with objects whose address is taken,
>
> and since llvm does already have a "noescape" (not sure
that's the correct
> name?)
> attribute for parameters, these addresses won't be misinterpreted by
any
> alias
> analysis either, and llvm is free to go ahead and keep these values in
> registers
> between safe points  -- you can stop asking how to allow GC roots as SSA
> values,
> any traditional load-store optimization pass will do it for you for free.
>
> *** without you having to insert explicit load and store instructions, and
> having to
> somehow mark them as non-delete-able, or always omit optimization passes
> that
> you would otherwise like to have enabled ***
>
> and also without you having to store NULL into a safe point to end its
> lifetime.
> and I would suggest eliminating the gcroot() intrinsic as it's
information
> content
> would be redundant.
>
>
> thoughts, comments ???
>
>
> -Peter Lawrence.
>
>
> It is helpful to think of the stack/register map generation in two
> phases: before and after identifying the location of heap
> pointers. Say PrePtrMap and PostPtrMap. We don't need to know stack
> offsets and register names at that point, but do we need a 1-1 mapping
> from pointer values to identifiable physical locations. I deliberately
> avoid calling these values roots here, because we may have multiple
> live pointers derived from an object and multiple copies of the same
> pointer. GC only needs to see one of these values to trace roots, but
> each of these still needs its own entry in the stack/register map for
> a moving collector.
>
> PrePtrMap, we need a type system to precisely tag all values known to
> contain valid heap pointers. No potentially uninitialized/undefined
> values allowed here. IntPtr/PtrInt casts become control dependent.
>
> PostPtrMap, we need an IR that represents mapped pointers as live-in
> to safepoints, *and* safepoints need to be defined as clobbering all
> locations that may contain a pointer. For example, if we have a
> register map, then we can no longer move and add instruction across a
> call if it may operate on a pointer. Obviously, it's easier to avoid
> invalidating a stack map, but fundamentally the same problem. No
> amount of spilling can bail you out without updating the map.
>
> The current gcroot solution works around the lack of LLVM type system
> support for heap pointers by effectively mapping pointers very early
> (in the front end), reloading roots after safepoints (so we only need
> one map entry per root), and relying on the rules that allows callees
> to write their caller's stack under certain circumstances (someone
> needs to explain these rules to me--is it only possible when an alloca
> pointer is taken?).
>
> Your proposal is exactly the same in terms of when the heap pointers
> are identified. That leaves all of the LLVM optimizer and codegen
> running in PostPtrMap mode. The problem is that LLVM is free to make
> copies of pointers and optimize across call sites without knowing how to
> update the map.
>
> The most efficient way to support GC is to move identification of
> pointer locations as late as possible. Optimization across safepoints
> needs to be effectively disabled after that point.
>
> -Andy
>
>Achievement unlocked: People who are smarter than me and more knowledgeable
about the LLVM backends are arguing over the details of how GC ought to work
:)
> On Jul 18, 2011, at 10:53 PM, Talin wrote:
>
> On Mon, Jul 18, 2011 at 11:00 AM, Peter Lawrence <
> peterl95124 at sbcglobal.net> wrote:
>
>> Talin,
>>           do you identify safe-points in the current or proposed llvm
>> scheme, and if so how,
>> or are they implicit as being at all call sites (which begs the
question
>> what about leaves
>> in the call tree, how does GC get started at all in that case).
>>
>> The LLVM linker has a feature where you can specify what kind of safe
> points your collector requires - the options are Loop, Return, PreCall and
> PostCall. You can also override this behavior and examine each instruction
> and return a boolean indicating whether it is or isn't a safe point.
>
> Currently I only have function calls as safe points, although I may
> eventually enable loops as well. As far as leaf functions go, consider that
> the call to allocate memory is also a safe point - and if a function
doesn't
> allocate any memory then we don't care if the GC is involved or not.
>
> One complication with the current scheme is that the frontend has to have a
> sense of where the safe points are going to be. Because the current scheme
> requires the frontend to insert additional loads and stores around safe
> points (for spilling register values to memory so they can be traced), the
> frontend has to be able to guess which function call might be a safe point
-
> but it can't know for sure due to the fact that optimization and
inlining
> (which happens much later) may cause the removal of the actual call
> instruction. The safe but inefficient approach is to insert the extra loads
> and stores around every call instruction.
>
> Peter Lawrence.
>>
>
> --
> -- Talin
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>

-- 
-- Talin
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110720/0f2a24a1/attachment.html>

Frits van Bommel

2011-Jul-21 06:12 UTC

head link

[LLVMdev] Improving Garbage Collection

On 20 July 2011 20:20, Andrew Trick <atrick at apple.com>
wrote:> safepoints need to be defined as clobbering all
> locations that may contain a pointer.
But *only* for a moving collector.
If you're using a non-moving gc the pointers won't change, so
there's
no need to force reloading them after a safe point..
Unless weak pointers are supported, in which case you need a way to
mark *some* pointers as being clobbered (but not others).

Possibly Parallel Threads

Search for more maybe matching threads

llvm dev - Jul 2011 - [LLVMdev] Improving Garbage Collection

[LLVMdev] Improving Garbage Collection

[LLVMdev] Improving Garbage Collection

[LLVMdev] Improving Garbage Collection

[LLVMdev] Improving Garbage Collection

Possibly Parallel Threads