thr3ads.net - llvm dev - [LLVMdev] Garbage collection [Feb 2009]

If this information is useful, please help other people find it:
Share via:

Talin

2009-Feb-26 08:02 UTC

[LLVMdev] Garbage collection

One of the more interesting subjects of conversation at the 2008 
developer day was related to garbage collection. With the increasing 
number of LLVM-based VMs and other projects, I suspect that the desire 
for more comprehensive garbage collection support in LLVM is only going 
to increase. (I am now involved in two different open-source projects 
both of which will eventually have a strong requirement for a 
high-performance LLVM-compatible collector.)

The last time I looked, the LLVM support for GC is fairly austere, 
consisting of (a) the bare minimum set of low-level intrinsics needed to 
integrate a collector with a runtime environment, and (b) a couple of 
example of collectors, neither of which AFAIK take full advantage of the 
low-level intrinsics to get optimal performance.

The IR-level intrinsics themselves don't much help you *write* a GC, so 
much as to integrate one with LLVM. What is provided is essentially a 
mechanism for walking the stack, and a means to insert read/write 
barriers into the generated code, which together form a tiny fraction of 
what it would take to design a working collector.

Part of the reason why there isn't more direct support for GC is the 
theory that there is no such thing as a one-size-fits-all collector. The 
argument goes that a really efficient collector design requires detailed 
knowledge of the object model of the language being compiled. Some 
languages use explicit type information embedded in the object, which 
has a specific offset and a specific format that is likely to be 
different for different languages. Other languages have the benefit of 
compile-time knowledge of the type of an entire object graph, and thus 
do not require the overhead of a per-object type field.

On the other hand, it is possible to make a counter-argument to this 
theory that goes like this: The Java VM has been used to implement a 
large number of front-end languages efficiently, without requiring a 
special garbage collector for each language. Furthermore, I've observed 
that the Java VM, more than any other runtime environment, tends to be 
used as a platform for academic research into GC algorithms, without 
requiring changes to the set of languages which are hosted by the VM. In 
other words, both the language and the GC algorithm can be varied 
independently, from which we can conclude that perhaps the choice of 
language and GC aren't as tightly coupled as we might think.

It might also be argued that many of the languages that people are 
wanting to build with LLVM have Java-like object semantics anyway, at 
least at the level of GC. (Although finalization is an issue, since this 
is an area in which language semantics vary widely.)

It also seems to me that even radically different collector designs 
could utilize some common building blocks for heap management, work 
queuing, and so on. It might make sense to create a library of such 
components as an auxilliary project to LLVM. For example, while there 
are a large number of collector algorithms, there are a fairly small 
number of designs for write barriers - so the library could provide 
several implementations of such. Similarly, there are only so many ways 
to "stop the world" (for concurrent collectors that require this), and
this could be provided as well.

Of course, there is always a danger when creating libraries of the 
"ivory tower syndrome", putting a lot of effort into components that 
don't actually get used. This is why it would be even better to create a 
standard, high performance collector for LLVM that actually uses these 
methods.

As I see it, such a collector would include two tracing strategies, one 
based on per-object type information, and one based on static type 
knowledge. Having both types of tracing in a single collector would not, 
I think, cause a serious performance degradation, especially as object 
models such as Java require both types anyway.

Specific object model details such as the offset of the type info within 
an object can be dealt with by implementing the tracing functions in 
LLVM IR, so that all of the various inefficiencies caused by 
parameterizing the object layout get optimized away. It should be 
relatively easy for a frontend to generate the IR for a tracing function 
for each concrete class, while the collector library supplies the code 
that is called by the tracer for each pointer location in memory. 
Tracing functions aren't that complicated anyway, it's the interaction 
between the tracer and the mutator is where things get complex, and that 
is largely independent of things like object memory layouts.

In other words, I think that it should be possible to create a fairly 
comprehensive and efficient collector implementation in which 80% of the 
implementation lives in a library, and the language front end generates 
the remaining 20%. Moreover, it should be possible to design this 
library in a way that makes it possible for people to replace 
significant parts of the collector, so that it can be used as a basis 
for developing a diverse selection of collection strategies. And because 
all of the parts that are specific to the object model are generated by 
the front-end, the collector should work for a large number of languages 
with different object semantics.

The hardest part will be issues involving concurrency and threading. It 
has been my observation that support for concurrency is the one thing 
that you can't build up incrementally - that is, with most aspects of a 
GC you can start with a simple implementation and gradually add 
improvements (multiple generations and so on), but once you add 
concurrency you pretty much have to start over. Unfortunately, I am not 
a huge concurrent programming whiz, which is part of the reason why I 
haven't simply gone and implemented the collector myself. However, this 
does suggest to me that if we want to support concurrency (which I 
believe we will) then it has to be supported from the very beginning.

As I might have mentioned before on this list, I have written a few 
modest components that I plan to use as the basis for a collector which 
I plan to name 'Scarcity'. One of these is a heap manager which replaces
malloc/free and which has fairly efficient functions for things like 
"free all blocks for which this callback function returns true" which 
can be used to implement mark and sweep.

-- Talin

Ralf Schneider

2009-Feb-26 13:18 UTC

head link

[LLVMdev] Garbage collection

Hello,

2009/2/26 Talin <viridia at gmail.com>
> The IR-level intrinsics themselves don't much help you *write* a GC, so
> much as to integrate one with LLVM. What is provided is essentially a
> mechanism for walking the stack, and a means to insert read/write
> barriers into the generated code, which together form a tiny fraction of
> what it would take to design a working collector.
>
<...>

>
> In other words, I think that it should be possible to create a fairly
> comprehensive and efficient collector implementation in which 80% of the
> implementation lives in a library, and the language front end generates
> the remaining 20%. Moreover, it should be possible to design this
> library in a way that makes it possible for people to replace
> significant parts of the collector, so that it can be used as a basis
> for developing a diverse selection of collection strategies. And because
> all of the parts that are specific to the object model are generated by
> the front-end, the collector should work for a large number of languages
> with different object semantics.

IMO LLVM should try to keep its <L>ow <L>evel design. Garbage
collection is
already very high level.
A library which supports building garbage collectors on the other hand would
be very helpful of course.
Basically such a library boils down to an operatic system abstraction
library. Functions like "stop the world" are completely managed by the
OS
and not the CPU. I'm not sure if such functionality is part of LLVMs goals.

Having that said; functions like :

- Enumerate threads
- Pause/Resume thread
- Get root pointers for a thread (including registers)
- Get a list of modified memory-pages (GetWriteWatch in Windows - used in
the .net GC)
- ...

for different platforms - would definitely help building a GC. Just look at
the source code of the Boehm GC: It's a completely unmaintainable mess of
#ifdefs

A little bit off topic: Has anybody tried building a concurrent GC - running
in a different _process_, instead of a thread?
The idea: To perform a collection you do a fork(). The child process
collects all unreferenced memory regions and reports them back to the parent
process. The parrent process waits for the result (in a sperate thread) and
if it gets the result it frees the memory regions.
This way you do not have to worry about barriers and all the nasty stuff.
The maximum pause time is close to zero.
Of course this is only useful with a working "copy on write"
implementation
of fork().

- Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090226/ed372a6a/attachment.html>

Pertti Kellomäki

2009-Feb-26 14:24 UTC

head link

[LLVMdev] Garbage collection

Ralf Schneider wrote:> A little bit off topic: Has anybody tried building a concurrent GC - 
> running in a different _process_, instead of a thread?
> The idea: To perform a collection you do a fork(). The child process 
> collects all unreferenced memory regions and reports them back to the 
> parent process.
I remember reading a paper in ACM Sigplan Notices (I think)
many years back, describing exactly such a system.
-- 
Pertti

Gabor Greif

2009-Feb-26 16:22 UTC

head link

[LLVMdev] Garbage collection

On Feb 26, 2:18 pm, Ralf Schneider <li... at gestaltgeber.com>
wrote:> Hello,
>
> 2009/2/26 Talin <viri... at gmail.com>
>
> > The IR-level intrinsics themselves don't much help you *write* a
GC, so
> > much as to integrate one with LLVM. What is provided is essentially a
> > mechanism for walking the stack, and a means to insert read/write
> > barriers into the generated code, which together form a tiny fraction
of
> > what it would take to design a working collector.
>
> <...>
>
>
>
> > In other words, I think that it should be possible to create a fairly
> > comprehensive and efficient collector implementation in which 80% of
the
> > implementation lives in a library, and the language front end
generates
> > the remaining 20%. Moreover, it should be possible to design this
> > library in a way that makes it possible for people to replace
> > significant parts of the collector, so that it can be used as a basis
> > for developing a diverse selection of collection strategies. And
because
> > all of the parts that are specific to the object model are generated
by
> > the front-end, the collector should work for a large number of
languages
> > with different object semantics.
>
> IMO LLVM should try to keep its <L>ow <L>evel design. Garbage
collection is
> already very high level.
> A library which supports building garbage collectors on the other hand
would
> be very helpful of course.
> Basically such a library boils down to an operatic system abstraction
> library. Functions like "stop the world" are completely managed
by the OS
> and not the CPU. I'm not sure if such functionality is part of LLVMs
goals.
>
> Having that said; functions like :
>
> - Enumerate threads
> - Pause/Resume thread
> - Get root pointers for a thread (including registers)
> - Get a list of modified memory-pages (GetWriteWatch in Windows - used in
> the .net GC)
> - ...
>
> for different platforms - would definitely help building a GC. Just look at
> the source code of the Boehm GC: It's a completely unmaintainable mess
of
> #ifdefs
>
> A little bit off topic: Has anybody tried building a concurrent GC -
running
> in a different _process_, instead of a thread?
Yes, I had a proof of concept implementation of a GC with
- shared memory as the GC arena,
- (C++) throw-catch-based marking
- simple lookup rules for (in-arena) associated
  instance metadata.

I never had the need to finish the implementation, but
the fork approach worked reasonably well, and the mark
and sweep parts ran in the forked process, with the
shared memory communicating back the collection results.

The most amusing thing was to see how the stack unwinding
in the forked process did the marking and the original process
was not harmed.

I hope to extend the concept to multiple threads by (m)protecting
increasing parts of the arena and hoping that all threads get
caught with time. Finally the last running threads must be
stopped and forced to fork. This last question and how to recover
the threads from the protection violations in the original process
are the big open questions to be solved.

Cheers,

    Gabor
> The idea: To perform a collection you do a fork(). The child process
> collects all unreferenced memory regions and reports them back to the
parent
> process. The parrent process waits for the result (in a sperate thread) and
> if it gets the result it frees the memory regions.
> This way you do not have to worry about barriers and all the nasty stuff.
> The maximum pause time is close to zero.
> Of course this is only useful with a working "copy on write"
implementation
> of fork().
>
> - Ralf
>
> _______________________________________________
> LLVM Developers mailing list
> LLVM... at cs.uiuc.edu      
 http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Chris Lattner

2009-Feb-26 17:25 UTC

head link

[LLVMdev] Garbage collection

On Feb 26, 2009, at 12:02 AM, Talin wrote:>  With the increasing
> number of LLVM-based VMs and other projects, I suspect that the desire
> for more comprehensive garbage collection support in LLVM is only  
> going
> to increase.
Absolutely!
> Part of the reason why there isn't more direct support for GC is the
> theory that there is no such thing as a one-size-fits-all collector.  
> The
> argument goes that a really efficient collector design requires  
> detailed
> knowledge of the object model of the language being compiled.
Yes, you do need to have some knowledge about the object model.   
However, it would be perfectly reasonable for LLVM to support and  
include multiple different collectors for different classes of language.
> On the other hand, it is possible to make a counter-argument to this
> theory that goes like this: The Java VM has been used to implement a
> large number of front-end languages efficiently, without requiring a
> special garbage collector for each language.
Most importantly to me, the takeaway from Java is that just having  
something that works "well enough" is really important and helps  
bootstrap a lot of projects, which can then take the "last 10% of  
performance" as an optimization opportunity, instead of being blocked  
from even starting with LLVM.

I'd claim that JavaVM really isn't a good way to implement a lisp vm  
or something like that.  However, the perf delta induced by the Java  
VM may just *not matter* in the big picture.  At least with LLVM, a  
Lisp implementation could be brought up on an "OOP GC" and switched to
something more custom as the project develops.
> It also seems to me that even radically different collector designs
> could utilize some common building blocks for heap management, work
> queuing, and so on.
Yes.
> Of course, there is always a danger when creating libraries of the
> "ivory tower syndrome", putting a lot of effort into components
that
> don't actually get used. This is why it would be even better to  
> create a
> standard, high performance collector for LLVM that actually uses these
> methods.  <many good thoughts trimmed>
What you see in LLVM right now is really only the second step of the  
planned GC evolution.  The first step was very minimal, but useful for  
bridging to other existing collectors.  The second step was Gordon's  
(significant!) extensions to the system which allowed him to tie in  
the Ocaml collector and bring some more sanity to codegen.

While people object to adding high level features to LLVM, high level  
and language-specific features are *great* in llvm as long as they are  
cleanly separable.  I would *love* to see a composable collection of  
different GC subroutines with clean interfaces built on LLVM "assembly  
language" GC stuff.

In my ideal world, this would be:

1. Subsystems [with clean interfaces] for thread management,  
finalization, object model interactions, etc.
2. Within different high-level designs (e.g. copying, mark/sweep, etc)  
there can be replaceable policy components etc.
3. A couple of actual GC implementations built on top of #1/2.   
Ideally there would only be a couple of high-level collectors that can  
be parameterized by replacing subsystems and policies.
4. A very simple language implementation that uses the facilities, on  
the order of complexity as the kaleidoscope tutorial.

As far as I know, there is nothing that prevents this from happening  
today, we just need leadership in the area to drive it.  To avoid the  
"ivory tower" problem, I'd strongly recommend starting with a
simple
GC and language and get the whole thing working top to bottom.  From  
there, the various pieces can be generalized out etc.  This ensures  
that there is always a *problem being solved* and something that works  
and is testable.

One of the annoying reasons that the GC stuff is only halfway fleshed  
out is that I was working on an out of tree project (which of course  
got forgotten about when I left) when developing the GC intrinsics, so  
there is no fully working example in public.

-Chris

ps. Code generation for the GC intrinsics can be improved  
significantly.  We can add new intrinsics that don't pin things to the  
stack, update optimizations, and do many other things if people  
started using the GC stuff seriously.

Jon Harrop

2009-Feb-27 02:17 UTC

head link

[LLVMdev] Garbage collection

On Thursday 26 February 2009 17:25:56 Chris Lattner
wrote:> In my ideal world, this would be:
>
> 1. Subsystems [with clean interfaces] for thread management,
> finalization, object model interactions, etc.
> 2. Within different high-level designs (e.g. copying, mark/sweep, etc)
> there can be replaceable policy components etc.
> 3. A couple of actual GC implementations built on top of #1/2.
> Ideally there would only be a couple of high-level collectors that can
> be parameterized by replacing subsystems and policies.
> 4. A very simple language implementation that uses the facilities, on
> the order of complexity as the kaleidoscope tutorial.
>
> As far as I know, there is nothing that prevents this from happening
> today, we just need leadership in the area to drive it.  To avoid the
> "ivory tower" problem, I'd strongly recommend starting with a
simple
> GC and language and get the whole thing working top to bottom. From 
> there, the various pieces can be generalized out etc.  This ensures
> that there is always a *problem being solved* and something that works
> and is testable.
I fear that the IR generator and GC are too tightly coupled.

For example, the IR I am generating shares pointers read from the heap even 
across function calls. That is built on the assumption that the pointers are 
immutable and, therefore, that the GC is non-moving. The generated code is 
extremely efficient even though I have not even enabled LLVM's optimizations
yet precisely because of all this shared immutable data.

If you wanted to add a copying GC to my VM you would probably replace every 
lookup of the IR register with a lookup of the code to reload it, generating 
a lot of redundant loads that would greatly degrade performance so you would 
rely upon LLVM's optimization passes to clean it up again. However, I bet 
they do not have enough information to recover all of the lost performance. 
So there is a fundamental conflict here where a simple GC design decision has 
a drastic effect on the IR generator.

Although it is theoretically possible to parameterize the IR generator 
sufficiently to account for all possible combinations of GC designs I suspect 
the result would be a mess. Consequently, perhaps it would be better to 
consider IR generation and the GC as a single entity and, instead, factor 
them both out using a common high-level representation not dissimilar to JVM 
or CLR bytecode in terms of functionality but much more closely related to 
LLVM's IR?

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Gordon Henriksen

2009-Feb-27 19:08 UTC

head link

[LLVMdev] Garbage collection

On Feb 26, 2009, at 12:25, Chris Lattner wrote:
> On Feb 26, 2009, at 12:02 AM, Talin wrote:
>
>> With the increasing number of LLVM-based VMs and other projects, I  
>> suspect that the desire for more comprehensive garbage collection  
>> support in LLVM is only going to increase.
>
> What you see in LLVM right now is really only the second step of the  
> planned GC evolution.  The first step was very minimal, but useful  
> for bridging to other existing collectors.  The second step was  
> Gordon's (significant!) extensions to the system which allowed him  
> to tie in the Ocaml collector and bring some more sanity to codegen.
I agree; this would be a great contribution, making LLVM much more  
accessible to the development of novel and existing languages.
> While people object to adding high level features to LLVM, high  
> level and language-specific features are *great* in llvm as long as  
> they are cleanly separable.  I would *love* to see a composable  
> collection of different GC subroutines with clean interfaces built  
> on LLVM "assembly language" GC stuff.
Absolutely.

It is definitely valuable that the existing infrastructure doesn't  
bolt LLVM to a particular runtime. With only a few days of work, PyPy  
was able to try out the LLVM GC intrinsics and static stack maps and  
saw a big performance boost from it on their LLVM back-end. (Their GCC  
backend still outperformed LLVM, but by a much smaller margin.) But  
this in no way prevents providing GC building blocks for projects that  
are not working with existing runtimes and GCs.
> As far as I know, there is nothing that prevents this from happening  
> today, we just need leadership in the area to drive it.  To avoid  
> the "ivory tower" problem, I'd strongly recommend starting
with a
> simple GC and language and get the whole thing working top to  
> bottom.  From there, the various pieces can be generalized out etc.   
> This ensures that there is always a *problem being solved* and  
> something that works and is testable.
I strongly agree with this as well.
> ps. Code generation for the GC intrinsics can be improved  
> significantly.  We can add new intrinsics that don't pin things to  
> the stack, update optimizations, and do many other things if people  
> started using the GC stuff seriously.

I've already commented on this elsewhere in the thread. Promoting GC  
roots into SSA variables from stack slots would allow much more  
freedom for the middle- and back-end optimizations, and I think is  
clearly the next logical step.

— Gordon

Talin

2009-Feb-28 05:29 UTC

head link

[LLVMdev] Garbage collection

Chris Lattner wrote:> On Feb 26, 2009, at 12:02 AM, Talin wrote:
>   
>>  With the increasing
>> number of LLVM-based VMs and other projects, I suspect that the desire
>> for more comprehensive garbage collection support in LLVM is only  
>> going
>> to increase.
>>     
>
> Absolutely!
>
>   
>> Part of the reason why there isn't more direct support for GC is
the
>> theory that there is no such thing as a one-size-fits-all collector.  
>> The
>> argument goes that a really efficient collector design requires  
>> detailed
>> knowledge of the object model of the language being compiled.
>>     
>
> Yes, you do need to have some knowledge about the object model.   
> However, it would be perfectly reasonable for LLVM to support and  
> include multiple different collectors for different classes of language.
>
>   
>> On the other hand, it is possible to make a counter-argument to this
>> theory that goes like this: The Java VM has been used to implement a
>> large number of front-end languages efficiently, without requiring a
>> special garbage collector for each language.
>>     
>
> Most importantly to me, the takeaway from Java is that just having  
> something that works "well enough" is really important and helps
> bootstrap a lot of projects, which can then take the "last 10% of  
> performance" as an optimization opportunity, instead of being blocked
> from even starting with LLVM.
>
> I'd claim that JavaVM really isn't a good way to implement a lisp
vm
> or something like that.  However, the perf delta induced by the Java  
> VM may just *not matter* in the big picture.  At least with LLVM, a  
> Lisp implementation could be brought up on an "OOP GC" and
switched to
> something more custom as the project develops.
>
>   
>> It also seems to me that even radically different collector designs
>> could utilize some common building blocks for heap management, work
>> queuing, and so on.
>>     
>
> Yes.
>
>   
>> Of course, there is always a danger when creating libraries of the
>> "ivory tower syndrome", putting a lot of effort into
components that
>> don't actually get used. This is why it would be even better to  
>> create a
>> standard, high performance collector for LLVM that actually uses these
>> methods.  <many good thoughts trimmed>
>>     
>
> What you see in LLVM right now is really only the second step of the  
> planned GC evolution.  The first step was very minimal, but useful for  
> bridging to other existing collectors.  The second step was Gordon's  
> (significant!) extensions to the system which allowed him to tie in  
> the Ocaml collector and bring some more sanity to codegen.
>
> While people object to adding high level features to LLVM, high level  
> and language-specific features are *great* in llvm as long as they are  
> cleanly separable.  I would *love* to see a composable collection of  
> different GC subroutines with clean interfaces built on LLVM "assembly
> language" GC stuff.
>
> In my ideal world, this would be:
>
> 1. Subsystems [with clean interfaces] for thread management,  
> finalization, object model interactions, etc.
> 2. Within different high-level designs (e.g. copying, mark/sweep, etc)  
> there can be replaceable policy components etc.
> 3. A couple of actual GC implementations built on top of #1/2.   
> Ideally there would only be a couple of high-level collectors that can  
> be parameterized by replacing subsystems and policies.
> 4. A very simple language implementation that uses the facilities, on  
> the order of complexity as the kaleidoscope tutorial.
>
> As far as I know, there is nothing that prevents this from happening  
> today, we just need leadership in the area to drive it.  To avoid the  
> "ivory tower" problem, I'd strongly recommend starting with a
simple
> GC and language and get the whole thing working top to bottom.  From  
> there, the various pieces can be generalized out etc.  This ensures  
> that there is always a *problem being solved* and something that works  
> and is testable.
>
> One of the annoying reasons that the GC stuff is only halfway fleshed  
> out is that I was working on an out of tree project (which of course  
> got forgotten about when I left) when developing the GC intrinsics, so  
> there is no fully working example in public.
>
> -Chris
>
> ps. Code generation for the GC intrinsics can be improved  
> significantly.  We can add new intrinsics that don't pin things to the
> stack, update optimizations, and do many other things if people  
> started using the GC stuff seriously.
>   So I guess what I would be helpful for me is a roadmap that defines more 
clearly (a) what parts you plan to build in LLVM (beyond what is already 
there), (b) what parts you would like to have contributed, and (c) what 
parts you definitely want to keep external. In particular, I'd like to 
get a clearer picture of the shapes of the various pieces and their roles.

For example, I mentioned the "stop the world" function - however since
LLVM defines no primitives for creating threads or synchronizing between 
them, its hard to see how this could be part of LLVM proper. On the 
other hand, a sibling project (like vmkit or clang) could probably make 
a more restrictive set of assumptions, such as the existence of either 
POSIX or Windows threading models or some analog of those being 
available. "Stop the world" is implementable in terms of those
threading
primitives if you assume that mutator threads use sync points.

Thus, it seems to me that the proper home for such a function would be 
in a sibling project outside of LLVM proper. At the same time, however, 
core LLVM could benefit from having an implementation of this, in that 
it could guide the design of the IR by providing more concrete use cases 
for things like inserting sync points in generated code and
such.> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Reasonably Related Threads

Search for more possibly parallel threads

llvm dev - Feb 2009 - [LLVMdev] Garbage collection

[LLVMdev] Garbage collection

[LLVMdev] Garbage collection

[LLVMdev] Garbage collection

[LLVMdev] Garbage collection

[LLVMdev] Garbage collection

[LLVMdev] Garbage collection

[LLVMdev] Garbage collection

[LLVMdev] Garbage collection

Reasonably Related Threads