One of the more interesting subjects of conversation at the 2008 developer day was related to garbage collection. With the increasing number of LLVM-based VMs and other projects, I suspect that the desire for more comprehensive garbage collection support in LLVM is only going to increase. (I am now involved in two different open-source projects both of which will eventually have a strong requirement for a high-performance LLVM-compatible collector.) The last time I looked, the LLVM support for GC is fairly austere, consisting of (a) the bare minimum set of low-level intrinsics needed to integrate a collector with a runtime environment, and (b) a couple of example of collectors, neither of which AFAIK take full advantage of the low-level intrinsics to get optimal performance. The IR-level intrinsics themselves don't much help you *write* a GC, so much as to integrate one with LLVM. What is provided is essentially a mechanism for walking the stack, and a means to insert read/write barriers into the generated code, which together form a tiny fraction of what it would take to design a working collector. Part of the reason why there isn't more direct support for GC is the theory that there is no such thing as a one-size-fits-all collector. The argument goes that a really efficient collector design requires detailed knowledge of the object model of the language being compiled. Some languages use explicit type information embedded in the object, which has a specific offset and a specific format that is likely to be different for different languages. Other languages have the benefit of compile-time knowledge of the type of an entire object graph, and thus do not require the overhead of a per-object type field. On the other hand, it is possible to make a counter-argument to this theory that goes like this: The Java VM has been used to implement a large number of front-end languages efficiently, without requiring a special garbage collector for each language. Furthermore, I've observed that the Java VM, more than any other runtime environment, tends to be used as a platform for academic research into GC algorithms, without requiring changes to the set of languages which are hosted by the VM. In other words, both the language and the GC algorithm can be varied independently, from which we can conclude that perhaps the choice of language and GC aren't as tightly coupled as we might think. It might also be argued that many of the languages that people are wanting to build with LLVM have Java-like object semantics anyway, at least at the level of GC. (Although finalization is an issue, since this is an area in which language semantics vary widely.) It also seems to me that even radically different collector designs could utilize some common building blocks for heap management, work queuing, and so on. It might make sense to create a library of such components as an auxilliary project to LLVM. For example, while there are a large number of collector algorithms, there are a fairly small number of designs for write barriers - so the library could provide several implementations of such. Similarly, there are only so many ways to "stop the world" (for concurrent collectors that require this), and this could be provided as well. Of course, there is always a danger when creating libraries of the "ivory tower syndrome", putting a lot of effort into components that don't actually get used. This is why it would be even better to create a standard, high performance collector for LLVM that actually uses these methods. As I see it, such a collector would include two tracing strategies, one based on per-object type information, and one based on static type knowledge. Having both types of tracing in a single collector would not, I think, cause a serious performance degradation, especially as object models such as Java require both types anyway. Specific object model details such as the offset of the type info within an object can be dealt with by implementing the tracing functions in LLVM IR, so that all of the various inefficiencies caused by parameterizing the object layout get optimized away. It should be relatively easy for a frontend to generate the IR for a tracing function for each concrete class, while the collector library supplies the code that is called by the tracer for each pointer location in memory. Tracing functions aren't that complicated anyway, it's the interaction between the tracer and the mutator is where things get complex, and that is largely independent of things like object memory layouts. In other words, I think that it should be possible to create a fairly comprehensive and efficient collector implementation in which 80% of the implementation lives in a library, and the language front end generates the remaining 20%. Moreover, it should be possible to design this library in a way that makes it possible for people to replace significant parts of the collector, so that it can be used as a basis for developing a diverse selection of collection strategies. And because all of the parts that are specific to the object model are generated by the front-end, the collector should work for a large number of languages with different object semantics. The hardest part will be issues involving concurrency and threading. It has been my observation that support for concurrency is the one thing that you can't build up incrementally - that is, with most aspects of a GC you can start with a simple implementation and gradually add improvements (multiple generations and so on), but once you add concurrency you pretty much have to start over. Unfortunately, I am not a huge concurrent programming whiz, which is part of the reason why I haven't simply gone and implemented the collector myself. However, this does suggest to me that if we want to support concurrency (which I believe we will) then it has to be supported from the very beginning. As I might have mentioned before on this list, I have written a few modest components that I plan to use as the basis for a collector which I plan to name 'Scarcity'. One of these is a heap manager which replaces malloc/free and which has fairly efficient functions for things like "free all blocks for which this callback function returns true" which can be used to implement mark and sweep. -- Talin
Hello, 2009/2/26 Talin <viridia at gmail.com>> The IR-level intrinsics themselves don't much help you *write* a GC, so > much as to integrate one with LLVM. What is provided is essentially a > mechanism for walking the stack, and a means to insert read/write > barriers into the generated code, which together form a tiny fraction of > what it would take to design a working collector. ><...>> > In other words, I think that it should be possible to create a fairly > comprehensive and efficient collector implementation in which 80% of the > implementation lives in a library, and the language front end generates > the remaining 20%. Moreover, it should be possible to design this > library in a way that makes it possible for people to replace > significant parts of the collector, so that it can be used as a basis > for developing a diverse selection of collection strategies. And because > all of the parts that are specific to the object model are generated by > the front-end, the collector should work for a large number of languages > with different object semantics.IMO LLVM should try to keep its <L>ow <L>evel design. Garbage collection is already very high level. A library which supports building garbage collectors on the other hand would be very helpful of course. Basically such a library boils down to an operatic system abstraction library. Functions like "stop the world" are completely managed by the OS and not the CPU. I'm not sure if such functionality is part of LLVMs goals. Having that said; functions like : - Enumerate threads - Pause/Resume thread - Get root pointers for a thread (including registers) - Get a list of modified memory-pages (GetWriteWatch in Windows - used in the .net GC) - ... for different platforms - would definitely help building a GC. Just look at the source code of the Boehm GC: It's a completely unmaintainable mess of #ifdefs A little bit off topic: Has anybody tried building a concurrent GC - running in a different _process_, instead of a thread? The idea: To perform a collection you do a fork(). The child process collects all unreferenced memory regions and reports them back to the parent process. The parrent process waits for the result (in a sperate thread) and if it gets the result it frees the memory regions. This way you do not have to worry about barriers and all the nasty stuff. The maximum pause time is close to zero. Of course this is only useful with a working "copy on write" implementation of fork(). - Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090226/ed372a6a/attachment.html>
Ralf Schneider wrote:> A little bit off topic: Has anybody tried building a concurrent GC - > running in a different _process_, instead of a thread? > The idea: To perform a collection you do a fork(). The child process > collects all unreferenced memory regions and reports them back to the > parent process.I remember reading a paper in ACM Sigplan Notices (I think) many years back, describing exactly such a system. -- Pertti
On Feb 26, 2:18 pm, Ralf Schneider <li... at gestaltgeber.com> wrote:> Hello, > > 2009/2/26 Talin <viri... at gmail.com> > > > The IR-level intrinsics themselves don't much help you *write* a GC, so > > much as to integrate one with LLVM. What is provided is essentially a > > mechanism for walking the stack, and a means to insert read/write > > barriers into the generated code, which together form a tiny fraction of > > what it would take to design a working collector. > > <...> > > > > > In other words, I think that it should be possible to create a fairly > > comprehensive and efficient collector implementation in which 80% of the > > implementation lives in a library, and the language front end generates > > the remaining 20%. Moreover, it should be possible to design this > > library in a way that makes it possible for people to replace > > significant parts of the collector, so that it can be used as a basis > > for developing a diverse selection of collection strategies. And because > > all of the parts that are specific to the object model are generated by > > the front-end, the collector should work for a large number of languages > > with different object semantics. > > IMO LLVM should try to keep its <L>ow <L>evel design. Garbage collection is > already very high level. > A library which supports building garbage collectors on the other hand would > be very helpful of course. > Basically such a library boils down to an operatic system abstraction > library. Functions like "stop the world" are completely managed by the OS > and not the CPU. I'm not sure if such functionality is part of LLVMs goals. > > Having that said; functions like : > > - Enumerate threads > - Pause/Resume thread > - Get root pointers for a thread (including registers) > - Get a list of modified memory-pages (GetWriteWatch in Windows - used in > the .net GC) > - ... > > for different platforms - would definitely help building a GC. Just look at > the source code of the Boehm GC: It's a completely unmaintainable mess of > #ifdefs > > A little bit off topic: Has anybody tried building a concurrent GC - running > in a different _process_, instead of a thread?Yes, I had a proof of concept implementation of a GC with - shared memory as the GC arena, - (C++) throw-catch-based marking - simple lookup rules for (in-arena) associated instance metadata. I never had the need to finish the implementation, but the fork approach worked reasonably well, and the mark and sweep parts ran in the forked process, with the shared memory communicating back the collection results. The most amusing thing was to see how the stack unwinding in the forked process did the marking and the original process was not harmed. I hope to extend the concept to multiple threads by (m)protecting increasing parts of the arena and hoping that all threads get caught with time. Finally the last running threads must be stopped and forced to fork. This last question and how to recover the threads from the protection violations in the original process are the big open questions to be solved. Cheers, Gabor> The idea: To perform a collection you do a fork(). The child process > collects all unreferenced memory regions and reports them back to the parent > process. The parrent process waits for the result (in a sperate thread) and > if it gets the result it frees the memory regions. > This way you do not have to worry about barriers and all the nasty stuff. > The maximum pause time is close to zero. > Of course this is only useful with a working "copy on write" implementation > of fork(). > > - Ralf > > _______________________________________________ > LLVM Developers mailing list > LLVM... at cs.uiuc.edu http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Feb 26, 2009, at 12:02 AM, Talin wrote:> With the increasing > number of LLVM-based VMs and other projects, I suspect that the desire > for more comprehensive garbage collection support in LLVM is only > going > to increase.Absolutely!> Part of the reason why there isn't more direct support for GC is the > theory that there is no such thing as a one-size-fits-all collector. > The > argument goes that a really efficient collector design requires > detailed > knowledge of the object model of the language being compiled.Yes, you do need to have some knowledge about the object model. However, it would be perfectly reasonable for LLVM to support and include multiple different collectors for different classes of language.> On the other hand, it is possible to make a counter-argument to this > theory that goes like this: The Java VM has been used to implement a > large number of front-end languages efficiently, without requiring a > special garbage collector for each language.Most importantly to me, the takeaway from Java is that just having something that works "well enough" is really important and helps bootstrap a lot of projects, which can then take the "last 10% of performance" as an optimization opportunity, instead of being blocked from even starting with LLVM. I'd claim that JavaVM really isn't a good way to implement a lisp vm or something like that. However, the perf delta induced by the Java VM may just *not matter* in the big picture. At least with LLVM, a Lisp implementation could be brought up on an "OOP GC" and switched to something more custom as the project develops.> It also seems to me that even radically different collector designs > could utilize some common building blocks for heap management, work > queuing, and so on.Yes.> Of course, there is always a danger when creating libraries of the > "ivory tower syndrome", putting a lot of effort into components that > don't actually get used. This is why it would be even better to > create a > standard, high performance collector for LLVM that actually uses these > methods. <many good thoughts trimmed>What you see in LLVM right now is really only the second step of the planned GC evolution. The first step was very minimal, but useful for bridging to other existing collectors. The second step was Gordon's (significant!) extensions to the system which allowed him to tie in the Ocaml collector and bring some more sanity to codegen. While people object to adding high level features to LLVM, high level and language-specific features are *great* in llvm as long as they are cleanly separable. I would *love* to see a composable collection of different GC subroutines with clean interfaces built on LLVM "assembly language" GC stuff. In my ideal world, this would be: 1. Subsystems [with clean interfaces] for thread management, finalization, object model interactions, etc. 2. Within different high-level designs (e.g. copying, mark/sweep, etc) there can be replaceable policy components etc. 3. A couple of actual GC implementations built on top of #1/2. Ideally there would only be a couple of high-level collectors that can be parameterized by replacing subsystems and policies. 4. A very simple language implementation that uses the facilities, on the order of complexity as the kaleidoscope tutorial. As far as I know, there is nothing that prevents this from happening today, we just need leadership in the area to drive it. To avoid the "ivory tower" problem, I'd strongly recommend starting with a simple GC and language and get the whole thing working top to bottom. From there, the various pieces can be generalized out etc. This ensures that there is always a *problem being solved* and something that works and is testable. One of the annoying reasons that the GC stuff is only halfway fleshed out is that I was working on an out of tree project (which of course got forgotten about when I left) when developing the GC intrinsics, so there is no fully working example in public. -Chris ps. Code generation for the GC intrinsics can be improved significantly. We can add new intrinsics that don't pin things to the stack, update optimizations, and do many other things if people started using the GC stuff seriously.
On Thursday 26 February 2009 17:25:56 Chris Lattner wrote:> In my ideal world, this would be: > > 1. Subsystems [with clean interfaces] for thread management, > finalization, object model interactions, etc. > 2. Within different high-level designs (e.g. copying, mark/sweep, etc) > there can be replaceable policy components etc. > 3. A couple of actual GC implementations built on top of #1/2. > Ideally there would only be a couple of high-level collectors that can > be parameterized by replacing subsystems and policies. > 4. A very simple language implementation that uses the facilities, on > the order of complexity as the kaleidoscope tutorial. > > As far as I know, there is nothing that prevents this from happening > today, we just need leadership in the area to drive it. To avoid the > "ivory tower" problem, I'd strongly recommend starting with a simple > GC and language and get the whole thing working top to bottom. From > there, the various pieces can be generalized out etc. This ensures > that there is always a *problem being solved* and something that works > and is testable.I fear that the IR generator and GC are too tightly coupled. For example, the IR I am generating shares pointers read from the heap even across function calls. That is built on the assumption that the pointers are immutable and, therefore, that the GC is non-moving. The generated code is extremely efficient even though I have not even enabled LLVM's optimizations yet precisely because of all this shared immutable data. If you wanted to add a copying GC to my VM you would probably replace every lookup of the IR register with a lookup of the code to reload it, generating a lot of redundant loads that would greatly degrade performance so you would rely upon LLVM's optimization passes to clean it up again. However, I bet they do not have enough information to recover all of the lost performance. So there is a fundamental conflict here where a simple GC design decision has a drastic effect on the IR generator. Although it is theoretically possible to parameterize the IR generator sufficiently to account for all possible combinations of GC designs I suspect the result would be a mess. Consequently, perhaps it would be better to consider IR generation and the GC as a single entity and, instead, factor them both out using a common high-level representation not dissimilar to JVM or CLR bytecode in terms of functionality but much more closely related to LLVM's IR? -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e
On Feb 26, 2009, at 12:25, Chris Lattner wrote:> On Feb 26, 2009, at 12:02 AM, Talin wrote: > >> With the increasing number of LLVM-based VMs and other projects, I >> suspect that the desire for more comprehensive garbage collection >> support in LLVM is only going to increase. > > What you see in LLVM right now is really only the second step of the > planned GC evolution. The first step was very minimal, but useful > for bridging to other existing collectors. The second step was > Gordon's (significant!) extensions to the system which allowed him > to tie in the Ocaml collector and bring some more sanity to codegen.I agree; this would be a great contribution, making LLVM much more accessible to the development of novel and existing languages.> While people object to adding high level features to LLVM, high > level and language-specific features are *great* in llvm as long as > they are cleanly separable. I would *love* to see a composable > collection of different GC subroutines with clean interfaces built > on LLVM "assembly language" GC stuff.Absolutely. It is definitely valuable that the existing infrastructure doesn't bolt LLVM to a particular runtime. With only a few days of work, PyPy was able to try out the LLVM GC intrinsics and static stack maps and saw a big performance boost from it on their LLVM back-end. (Their GCC backend still outperformed LLVM, but by a much smaller margin.) But this in no way prevents providing GC building blocks for projects that are not working with existing runtimes and GCs.> As far as I know, there is nothing that prevents this from happening > today, we just need leadership in the area to drive it. To avoid > the "ivory tower" problem, I'd strongly recommend starting with a > simple GC and language and get the whole thing working top to > bottom. From there, the various pieces can be generalized out etc. > This ensures that there is always a *problem being solved* and > something that works and is testable.I strongly agree with this as well.> ps. Code generation for the GC intrinsics can be improved > significantly. We can add new intrinsics that don't pin things to > the stack, update optimizations, and do many other things if people > started using the GC stuff seriously.I've already commented on this elsewhere in the thread. Promoting GC roots into SSA variables from stack slots would allow much more freedom for the middle- and back-end optimizations, and I think is clearly the next logical step. — Gordon
Chris Lattner wrote:> On Feb 26, 2009, at 12:02 AM, Talin wrote: > >> With the increasing >> number of LLVM-based VMs and other projects, I suspect that the desire >> for more comprehensive garbage collection support in LLVM is only >> going >> to increase. >> > > Absolutely! > > >> Part of the reason why there isn't more direct support for GC is the >> theory that there is no such thing as a one-size-fits-all collector. >> The >> argument goes that a really efficient collector design requires >> detailed >> knowledge of the object model of the language being compiled. >> > > Yes, you do need to have some knowledge about the object model. > However, it would be perfectly reasonable for LLVM to support and > include multiple different collectors for different classes of language. > > >> On the other hand, it is possible to make a counter-argument to this >> theory that goes like this: The Java VM has been used to implement a >> large number of front-end languages efficiently, without requiring a >> special garbage collector for each language. >> > > Most importantly to me, the takeaway from Java is that just having > something that works "well enough" is really important and helps > bootstrap a lot of projects, which can then take the "last 10% of > performance" as an optimization opportunity, instead of being blocked > from even starting with LLVM. > > I'd claim that JavaVM really isn't a good way to implement a lisp vm > or something like that. However, the perf delta induced by the Java > VM may just *not matter* in the big picture. At least with LLVM, a > Lisp implementation could be brought up on an "OOP GC" and switched to > something more custom as the project develops. > > >> It also seems to me that even radically different collector designs >> could utilize some common building blocks for heap management, work >> queuing, and so on. >> > > Yes. > > >> Of course, there is always a danger when creating libraries of the >> "ivory tower syndrome", putting a lot of effort into components that >> don't actually get used. This is why it would be even better to >> create a >> standard, high performance collector for LLVM that actually uses these >> methods. <many good thoughts trimmed> >> > > What you see in LLVM right now is really only the second step of the > planned GC evolution. The first step was very minimal, but useful for > bridging to other existing collectors. The second step was Gordon's > (significant!) extensions to the system which allowed him to tie in > the Ocaml collector and bring some more sanity to codegen. > > While people object to adding high level features to LLVM, high level > and language-specific features are *great* in llvm as long as they are > cleanly separable. I would *love* to see a composable collection of > different GC subroutines with clean interfaces built on LLVM "assembly > language" GC stuff. > > In my ideal world, this would be: > > 1. Subsystems [with clean interfaces] for thread management, > finalization, object model interactions, etc. > 2. Within different high-level designs (e.g. copying, mark/sweep, etc) > there can be replaceable policy components etc. > 3. A couple of actual GC implementations built on top of #1/2. > Ideally there would only be a couple of high-level collectors that can > be parameterized by replacing subsystems and policies. > 4. A very simple language implementation that uses the facilities, on > the order of complexity as the kaleidoscope tutorial. > > As far as I know, there is nothing that prevents this from happening > today, we just need leadership in the area to drive it. To avoid the > "ivory tower" problem, I'd strongly recommend starting with a simple > GC and language and get the whole thing working top to bottom. From > there, the various pieces can be generalized out etc. This ensures > that there is always a *problem being solved* and something that works > and is testable. > > One of the annoying reasons that the GC stuff is only halfway fleshed > out is that I was working on an out of tree project (which of course > got forgotten about when I left) when developing the GC intrinsics, so > there is no fully working example in public. > > -Chris > > ps. Code generation for the GC intrinsics can be improved > significantly. We can add new intrinsics that don't pin things to the > stack, update optimizations, and do many other things if people > started using the GC stuff seriously. >So I guess what I would be helpful for me is a roadmap that defines more clearly (a) what parts you plan to build in LLVM (beyond what is already there), (b) what parts you would like to have contributed, and (c) what parts you definitely want to keep external. In particular, I'd like to get a clearer picture of the shapes of the various pieces and their roles. For example, I mentioned the "stop the world" function - however since LLVM defines no primitives for creating threads or synchronizing between them, its hard to see how this could be part of LLVM proper. On the other hand, a sibling project (like vmkit or clang) could probably make a more restrictive set of assumptions, such as the existence of either POSIX or Windows threading models or some analog of those being available. "Stop the world" is implementable in terms of those threading primitives if you assume that mutator threads use sync points. Thus, it seems to me that the proper home for such a function would be in a sibling project outside of LLVM proper. At the same time, however, core LLVM could benefit from having an implementation of this, in that it could guide the design of the IR by providing more concrete use cases for things like inserting sync points in generated code and such.> _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >