thr3ads.net - llvm dev - [LLVMdev] Stack roots and function parameters [Sep 2010]

If this information is useful, please help other people find it:
Share via:

Kenneth Uildriks

2010-Sep-22 12:58 UTC

[LLVMdev] Stack roots and function parameters

On Tue, Sep 21, 2010 at 8:20 PM, Talin <viridia at gmail.com>
wrote:> On Mon, Sep 20, 2010 at 3:16 PM, Talin <viridia at gmail.com> wrote:
>>
>> So I've managed to get my stack crawler working and passing its
unit tests
>> - this is the one I've been working on as an alternative to
shadow-stack: it
>> uses only static constant data structures (no global variables or
>> thread-local data), which means that it's fully compatible with a
>> multi-threaded environment.
>> One question that has arisen, however, is what to do about function
>> parameters that may be stack roots. You can't call llvm.gcroot() on
a
>> function parameter. The only thing I can think of is to copy any
function
>> parameter that is a pointer, or which contains a pointer, into a local
>> variable. This seems complicated and inefficient to me (for one thing,
it
>> means that every stack frame just got 8 bytes bigger due to the need to
>> store the 'this' pointer locally)  - is there a better way?
If a function parameter is a stack root, then so is the value passed
through that parameter.  Since they both point to the same thing, only
one stack root is needed to keep it reachable.  If you've got a root
for it in the caller, you don't need one in the callee for the
argument.
>
> Anyone?
> Here's some examples of what I am talking about (I'll use Java for
the
> examples):
> Example 1:
>   class Foo {
>     static String st = "Hello";
>     void f1() {
>        f2(st);
>     }
>     void f2(String s) {
>       st = null; // Destroy static root
>       // Now allocate something, which might trigger
>       // a garbage collection.
>       StringBuffer sb = new StringBuffer();
>       // Is 's' still live? It's not stored in any
>       // alloca, or in any global variable. It only
>       // exists as a function parameter, which cannot
>       // be declared as a root via llvm.gcroot.
s is definitely still live assuming f1 copied st into a root before
using it, which it generally should.
>       sb.append(s);
>     }
>   }
> Example 2:
>   class Foo {
>     static void f1() {
>       // Create a new 'Foo' but don't store it
>       // anywhere.
>       new Foo().f2();
>     }
>     void f2() {
>       // Possibly trigger a collection
>       StringBuffer sb = new StringBuffer();
>       // Is 'this' still live at this point?
Again, yes, if f1 copied the result of "new Foo()" into a root.
>       sb.append(toString());
>     }
>   }
> Now, I know that the LLVM "Accurate GC" document says that any
"intermediate
> values" must be declared as a GC root. My question is, does merely
_loading_
> a global variable - or any mutable, non-strictly-local variable for that
> matter - count as an "intermediate" value in this case?
Yes, I believe it would.  In a multithreaded system, if you load a
pointer from a global or the heap, and some other thread changes the
pointer you loaded, you'd better have a stack root for the original
pointer value you loaded while you're using it.

Now if the global is thread-local, you don't need another root for it
since no other threads can see it.  And if you can prove that a global
isn't going to change, you also don't need another root for it.

> My problem is that I can't see how to address this without generating
> horribly bloated and inefficient code. Pretty much every function argument
> that isn't a simple integer or float will have to be copied to a hidden
> local variable before every function call - in addition to copying it to
the
> stack to create the call frame. Even local variables are not immune if you
> allow closures in your language. Under this scenario, calling conventions
> that pass arguments in registers are utterly futile and save nothing,
> because everything has to get copied to the stack anyway.
One way or another, stack roots *must* get copied to the stack, so
that the GC can find them.  Remember the GC is usually running in some
other thread and can *only* find pointers that live somewhere in
memory (either the stack or the heap).
> I really wish I could simply declare function arguments as llvm.gcroots.
For
> arguments that are not in registers, it would treat it just like an alloca,
> since they both represent stack slots. For arguments that are passed in
> registers, LLVM should automatically lower it to a non-register argument,
> making a copy if needed. (I can't do this myself, since I'm not
supposed to
> *know* which arguments are passed in registers or not - that depends on the
> target and the code generators and such).
I *believe* the code generator has some optimizations that eliminate
redundant copies within a block.  At any rate, if you keep your stack
roots in the caller rather than the callee, then there'll be a number
of cases where the caller needs to keep that stack root for purposes
other than that one call.  In those cases, having the stack root
doesn't impose a cost you aren't already paying just to allow the GC
to work.
> That still leaves me with the problem of declaring as roots function
> arguments that are the result of complex calculations, but those are far
> fewer - declaring temporary vars for those won't bloat the code so
badly.
> But having to copy every single load of a non-local variable into a
> temporary is a nightmare.
You only have to copy pointers.  If you've got a struct, you only need
to keep a root for the pointer member(s) of that struct.

Talin

2010-Sep-22 16:12 UTC

head link

[LLVMdev] Stack roots and function parameters

On Wed, Sep 22, 2010 at 5:58 AM, Kenneth Uildriks <kennethuil at
gmail.com>wrote:
> On Tue, Sep 21, 2010 at 8:20 PM, Talin <viridia at gmail.com> wrote:
> > On Mon, Sep 20, 2010 at 3:16 PM, Talin <viridia at gmail.com>
wrote:
> >>
> >> So I've managed to get my stack crawler working and passing
its unit
> tests
> >> - this is the one I've been working on as an alternative to
> shadow-stack: it
> >> uses only static constant data structures (no global variables or
> >> thread-local data), which means that it's fully compatible
with a
> >> multi-threaded environment.
> >> One question that has arisen, however, is what to do about
function
> >> parameters that may be stack roots. You can't call
llvm.gcroot() on a
> >> function parameter. The only thing I can think of is to copy any
> function
> >> parameter that is a pointer, or which contains a pointer, into a
local
> >> variable. This seems complicated and inefficient to me (for one
thing,
> it
> >> means that every stack frame just got 8 bytes bigger due to the
need to
> >> store the 'this' pointer locally)  - is there a better
way?
>
> If a function parameter is a stack root, then so is the value passed
> through that parameter.  Since they both point to the same thing, only
> one stack root is needed to keep it reachable.  If you've got a root
> for it in the caller, you don't need one in the callee for the
> argument.
>
> >
> > Anyone?
> > Here's some examples of what I am talking about (I'll use Java
for the
> > examples):
> > Example 1:
> >   class Foo {
> >     static String st = "Hello";
> >     void f1() {
> >        f2(st);
> >     }
> >     void f2(String s) {
> >       st = null; // Destroy static root
> >       // Now allocate something, which might trigger
> >       // a garbage collection.
> >       StringBuffer sb = new StringBuffer();
> >       // Is 's' still live? It's not stored in any
> >       // alloca, or in any global variable. It only
> >       // exists as a function parameter, which cannot
> >       // be declared as a root via llvm.gcroot.
>
> s is definitely still live assuming f1 copied st into a root before
> using it, which it generally should.
>
> >       sb.append(s);
> >     }
> >   }
> > Example 2:
> >   class Foo {
> >     static void f1() {
> >       // Create a new 'Foo' but don't store it
> >       // anywhere.
> >       new Foo().f2();
> >     }
> >     void f2() {
> >       // Possibly trigger a collection
> >       StringBuffer sb = new StringBuffer();
> >       // Is 'this' still live at this point?
>
> Again, yes, if f1 copied the result of "new Foo()" into a root.
>
> >       sb.append(toString());
> >     }
> >   }
> > Now, I know that the LLVM "Accurate GC" document says that
any
> "intermediate
> > values" must be declared as a GC root. My question is, does
merely
> _loading_
> > a global variable - or any mutable, non-strictly-local variable for
that
> > matter - count as an "intermediate" value in this case?
>
> Yes, I believe it would.  In a multithreaded system, if you load a
> pointer from a global or the heap, and some other thread changes the
> pointer you loaded, you'd better have a stack root for the original
> pointer value you loaded while you're using it.
>
> Now if the global is thread-local, you don't need another root for it
> since no other threads can see it.  And if you can prove that a global
> isn't going to change, you also don't need another root for it.
>
>
> > My problem is that I can't see how to address this without
generating
> > horribly bloated and inefficient code. Pretty much every function
> argument
> > that isn't a simple integer or float will have to be copied to a
hidden
> > local variable before every function call - in addition to copying it
to
> the
> > stack to create the call frame. Even local variables are not immune if
> you
> > allow closures in your language. Under this scenario, calling
conventions
> > that pass arguments in registers are utterly futile and save nothing,
> > because everything has to get copied to the stack anyway.
>
> One way or another, stack roots *must* get copied to the stack, so
> that the GC can find them.  Remember the GC is usually running in some
> other thread and can *only* find pointers that live somewhere in
> memory (either the stack or the heap).
>
> > I really wish I could simply declare function arguments as
llvm.gcroots.
> For
> > arguments that are not in registers, it would treat it just like an
> alloca,
> > since they both represent stack slots. For arguments that are passed
in
> > registers, LLVM should automatically lower it to a non-register
argument,
> > making a copy if needed. (I can't do this myself, since I'm
not supposed
> to
> > *know* which arguments are passed in registers or not - that depends
on
> the
> > target and the code generators and such).
>
> I *believe* the code generator has some optimizations that eliminate
> redundant copies within a block.  At any rate, if you keep your stack
> roots in the caller rather than the callee, then there'll be a number
> of cases where the caller needs to keep that stack root for purposes
> other than that one call.  In those cases, having the stack root
> doesn't impose a cost you aren't already paying just to allow the
GC
> to work.
>
> > That still leaves me with the problem of declaring as roots function
> > arguments that are the result of complex calculations, but those are
far
> > fewer - declaring temporary vars for those won't bloat the code so
badly.
> > But having to copy every single load of a non-local variable into a
> > temporary is a nightmare.
>
> You only have to copy pointers.  If you've got a struct, you only need
> to keep a root for the pointer member(s) of that struct.
>
Here's another way to think about the issue: Compare this whole approach to
stack roots to that of a conservative collector. Since we know conservative
collectors work, the whole reason for making an accurate collector in the
first place is efficiency, right? If we couldn't make an accurate collector
that was more efficient than a conservative collector, then no one would
bother because it's such a pain.

However, looking strictly at stack frames, a conservative collector has a
much more compact stack layout. There's no need to copy anything to local
variables, because by the time you call into the collector proper,
everything - every register variable, every function parameter, and so on -
is already on the stack *somewhere*.

The stack frame layout dictated by llvm.gcroot, by contrast, is much, much
fatter. In a typical deeply-nested call stack, there will be many copies of
the same value on the stack. Let's take the 'this' pointer as an
example.
Say we have some method that calls itself recursively, and we are currently
10 levels deep. That means that we have 20 copies of the 'this' pointer
on
the stack: Because each invocation of the method has to push the 'this'
pointer on the stack as part of the calling protocol (even if it's passed in
registers, it still has to save the prior value of the register somewhere),
and in addition we also have to save the 'this' pointer as a stack root.
A
conservative collector, on the other hand, would be able to get by with only
10 copies of the 'this' pointer.

As far as the issue of eliminating redundant stores, I'm pretty sure that
the optimizers cannot eliminate redundant *roots*. Figuring out the minimal
set of temporary roots needed for a function call is a non-trivial problem
(you don't want to just blindly make every pointer argument to a function a
temporary root, since that just bloats the size of the call frame
needlessly.)

My entire goal here is to get a GC that has high performance and efficiency.
I want my GC to be able to run on everything from a data center to a beagle
board, and use the minimum resources possible.

-- 
-- Talin
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20100922/b5103a7f/attachment.html>

Nathan Jeffords

2010-Sep-22 16:38 UTC

head link

[LLVMdev] Stack roots and function parameters

snip

>
>> > My problem is that I can't see how to address this without
generating
>> > horribly bloated and inefficient code. Pretty much every function
>> argument
>> > that isn't a simple integer or float will have to be copied to
a hidden
>> > local variable before every function call - in addition to copying
it to
>> the
>> > stack to create the call frame. Even local variables are not
immune if
>> you
>> > allow closures in your language. Under this scenario, calling
>> conventions
>> > that pass arguments in registers are utterly futile and save
nothing,
>> > because everything has to get copied to the stack anyway.
>>
>> One way or another, stack roots *must* get copied to the stack, so
>> that the GC can find them.  Remember the GC is usually running in some
>> other thread and can *only* find pointers that live somewhere in
>> memory (either the stack or the heap).
>>
>I think the logic is that by putting the responsibility for having stack
roots on the caller, the callee's do not need to
concern themselves with their arguments. A function using a gc'ed pointer
assumes that a caller somewhere up the call-chain has created a gcroot for
that pointer that will outlive its invocation therefore it doesn't need to
do anything with the garbage collector.

In the case of the 'this' pointer, only the function that got the
'this'
pointer from somewhere else (global, heap) would be responsible for creating
a gcroot, once this is done the 'this' pointer can be passed through any
number of calls and no more roots are required as long as the pointer is not
stored somewhere that will outlive the invocation.

snip

-- Talin>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>- Nathan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20100922/fbd326fd/attachment.html>

Mark Shannon

2010-Sep-22 18:08 UTC

head link

[LLVMdev] Stack roots and function parameters[MESSAGE NOT SCANNED]

Talin wrote:

[snip]
> 
> Here's another way to think about the issue: Compare this whole
approach
> to stack roots to that of a conservative collector. Since we know 
> conservative collectors work, the whole reason for making an accurate 
> collector in the first place is efficiency, right? If we couldn't make 
> an accurate collector that was more efficient than a conservative 
> collector, then no one would bother because it's such a pain.
> 
One very important to use precise GC is to have a copying collector.
For generational GC, a copying collector is faster and makes the 
allocator much simpler and faster.
> However, looking strictly at stack frames, a conservative collector has 
> a much more compact stack layout. There's no need to copy anything to 
> local variables, because by the time you call into the collector proper, 
> everything - every register variable, every function parameter, and so 
> on - is already on the stack *somewhere*.
The typical stack is *tiny* compared with the typical heap.
Its pretty small compared with the nursery, so even for minor 
collections it won't matter.
> 
> The stack frame layout dictated by llvm.gcroot, by contrast, is much, 
> much fatter. In a typical deeply-nested call stack, there will be many 
> copies of the same value on the stack. Let's take the 'this'
pointer as
> an example. Say we have some method that calls itself recursively, and 
> we are currently 10 levels deep. That means that we have 20 copies of 
> the 'this' pointer on the stack: Because each invocation of the
method
> has to push the 'this' pointer on the stack as part of the calling 
> protocol (even if it's passed in registers, it still has to save the 
> prior value of the register somewhere), and in addition we also have to 
> save the 'this' pointer as a stack root. A conservative collector,
on
> the other hand, would be able to get by with only 10 copies of the 
> 'this' pointer.
If you're really worried about the extra roots, you could do liveness 
analysis. If its not live across a GC-safe point, then its not a root.
In fact, liveness analysis is important anyway. Retaining a dead object
will hurt your performance much more than tracing a live one twice.
> 
> As far as the issue of eliminating redundant stores, I'm pretty sure 
> that the optimizers cannot eliminate redundant /roots/. Figuring out the 
> minimal set of temporary roots needed for a function call is a 
> non-trivial problem (you don't want to just blindly make every pointer 
> argument to a function a temporary root, since that just bloats the size 
> of the call frame needlessly.)
> 
> My entire goal here is to get a GC that has high performance and 
> efficiency. I want my GC to be able to run on everything from a data 
> center to a beagle board, and use the minimum resources possible.
> 
You might want to have a look at my GC(s), its probably too
entangled with the rest of the toolkit to be useful directly,
but it might be of interest.

http://code.google.com/p/gvmt/
> -- 
> -- Talin
> 
Cheers,
Mark

Kenneth Uildriks

2010-Sep-23 12:58 UTC

head link

[LLVMdev] Stack roots and function parameters

On Wed, Sep 22, 2010 at 11:12 AM, Talin <viridia at gmail.com>
wrote:> My entire goal here is to get a GC that has high performance and
efficiency.
> I want my GC to be able to run on everything from a data center to a beagle
> board, and use the minimum resources possible.
>
> --
> -- Talin
>The best way I can think of to streamline garbage collection is to cut
down on the amount of garbage it has to collect.  I'm working on an
optimization pass that transforms GC allocations into region
allocations whenever it can... I'll share it when it's done.

One of my goals is to handle enough cases that I can put off
implementing GC for a while :)

Jon Harrop

2010-Sep-24 21:31 UTC

head link

[LLVMdev] Stack roots and function parameters

Foo

 

Here's another way to think about the issue: Compare this whole approach to
stack roots to that of a conservative collector. Since we know conservative
collectors work, the whole reason for making an accurate collector in the
first place is efficiency, right? If we couldn't make an accurate collector
that was more efficient than a conservative collector, then no one would
bother because it's such a pain.

 

However, looking strictly at stack frames, a conservative collector has a
much more compact stack layout. There's no need to copy anything to local
variables, because by the time you call into the collector proper,
everything - every register variable, every function parameter, and so on -
is already on the stack *somewhere*.

 

The stack frame layout dictated by llvm.gcroot, by contrast, is much, much
fatter. In a typical deeply-nested call stack, there will be many copies of
the same value on the stack. Let's take the 'this' pointer as an
example.
Say we have some method that calls itself recursively, and we are currently
10 levels deep. That means that we have 20 copies of the 'this' pointer
on
the stack: Because each invocation of the method has to push the 'this'
pointer on the stack as part of the calling protocol (even if it's passed in
registers, it still has to save the prior value of the register somewhere),
and in addition we also have to save the 'this' pointer as a stack root.
A
conservative collector, on the other hand, would be able to get by with only
10 copies of the 'this' pointer.

 

As far as the issue of eliminating redundant stores, I'm pretty sure that
the optimizers cannot eliminate redundant roots. Figuring out the minimal
set of temporary roots needed for a function call is a non-trivial problem
(you don't want to just blindly make every pointer argument to a function a
temporary root, since that just bloats the size of the call frame
needlessly.)

 

My entire goal here is to get a GC that has high performance and efficiency.
I want my GC to be able to run on everything from a data center to a beagle
board, and use the minimum resources possible.

-- 
-- Talin

 

Foo

 

Foo

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20100924/5a398760/attachment.html>

Jon Harrop

2010-Sep-25 10:45 UTC

head link

[LLVMdev] Stack roots and function parameters

Forgive my top post but I hate Windows. J

 

I am surprised you (Talin) say that "we know conservative collectors
work"
because my experience has very much been of them not working. Indeed, if you
have 400Mb of allocated heap blocks on a 32-bit machine is there not a 10%
chance of *each* random 32-bit int "pointing" into your heap, i.e. a
false
positive? I just did a simple test here (create an array with 1M random
32-bit ints and then allocate blocks n=1..2^16 of size n bytes) and Boehm's
conservative GC sometimes takes 13s and other times runs out of memory after
several minutes. It even complains about the "Repeated allocation of very
large block (appr. size 65536)". Is 64kb really "very large"
today? I think
not.

 

Performance is certainly an important reason to implement an accurate
collector though. However, the performance of a conservative collector is
surely dominated by the cost of searching to see if each int "points"
to
within any allocated block, a completely unnecessary operation that accurate
collectors simply do not perform. That overhead will dwarf any doubling up
of roots on the stack but the main problem with conservative collectors, as
I say, is not that they are slow but that they leak.

 

I agree that eliminating redundant roots is a desirable goal but I did only
the most basic things for HLVM and was very happy with the results. So I
think it is premature to value such optimizations. I personally would much
rather see a mature VM built on top of LLVM.

 

Cheers,

Jon.

 

Talin wrote:

Here's another way to think about the issue: Compare this whole approach to
stack roots to that of a conservative collector. Since we know conservative
collectors work, the whole reason for making an accurate collector in the
first place is efficiency, right? If we couldn't make an accurate collector
that was more efficient than a conservative collector, then no one would
bother because it's such a pain.

 

However, looking strictly at stack frames, a conservative collector has a
much more compact stack layout. There's no need to copy anything to local
variables, because by the time you call into the collector proper,
everything - every register variable, every function parameter, and so on -
is already on the stack *somewhere*.

 

The stack frame layout dictated by llvm.gcroot, by contrast, is much, much
fatter. In a typical deeply-nested call stack, there will be many copies of
the same value on the stack. Let's take the 'this' pointer as an
example.
Say we have some method that calls itself recursively, and we are currently
10 levels deep. That means that we have 20 copies of the 'this' pointer
on
the stack: Because each invocation of the method has to push the 'this'
pointer on the stack as part of the calling protocol (even if it's passed in
registers, it still has to save the prior value of the register somewhere),
and in addition we also have to save the 'this' pointer as a stack root.
A
conservative collector, on the other hand, would be able to get by with only
10 copies of the 'this' pointer.

 

As far as the issue of eliminating redundant stores, I'm pretty sure that
the optimizers cannot eliminate redundant roots. Figuring out the minimal
set of temporary roots needed for a function call is a non-trivial problem
(you don't want to just blindly make every pointer argument to a function a
temporary root, since that just bloats the size of the call frame
needlessly.)

 

My entire goal here is to get a GC that has high performance and efficiency.
I want my GC to be able to run on everything from a data center to a beagle
board, and use the minimum resources possible.

-- 
-- Talin

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20100925/5c90ae48/attachment.html>

Maybe Matching Threads

Search for more apparently analagous threads

llvm dev - Sep 2010 - [LLVMdev] Stack roots and function parameters

[LLVMdev] Stack roots and function parameters

[LLVMdev] Stack roots and function parameters

[LLVMdev] Stack roots and function parameters

[LLVMdev] Stack roots and function parameters[MESSAGE NOT SCANNED]

[LLVMdev] Stack roots and function parameters

[LLVMdev] Stack roots and function parameters

[LLVMdev] Stack roots and function parameters

Maybe Matching Threads