On Feb 26, 2009, at 12:25, Chris Lattner wrote:

> On Feb 26, 2009, at 12:02 AM, Talin wrote:
>
>> With the increasing number of LLVM-based VMs and other projects, I
>> suspect that the desire for more comprehensive garbage collection
>> support in LLVM is only going to increase.
>
> What you see in LLVM right now is really only the second step of the
> planned GC evolution. The first step was very minimal, but useful for
> bridging to other existing collectors. The second step was Gordon's
> (significant!) extensions to the system, which allowed him to tie in
> the Ocaml collector and bring some more sanity to codegen.

I agree; this would be a great contribution, making LLVM much more
accessible to the development of novel and existing languages.

> While people object to adding high-level features to LLVM, high-level
> and language-specific features are *great* in LLVM as long as they are
> cleanly separable. I would *love* to see a composable collection of
> different GC subroutines with clean interfaces, built on LLVM
> "assembly language" GC stuff.

Absolutely.

It is definitely valuable that the existing infrastructure doesn't bolt
LLVM to a particular runtime. With only a few days of work, PyPy was
able to try out the LLVM GC intrinsics and static stack maps and saw a
big performance boost on their LLVM back-end. (Their GCC back-end still
outperformed LLVM, but by a much smaller margin.) But this in no way
prevents us from providing GC building blocks for projects that are not
working with existing runtimes and GCs.

> As far as I know, there is nothing that prevents this from happening
> today; we just need leadership in the area to drive it. To avoid the
> "ivory tower" problem, I'd strongly recommend starting with a simple
> GC and language and getting the whole thing working top to bottom.
> From there, the various pieces can be generalized out, etc. This
> ensures that there is always a *problem being solved* and something
> that works and is testable.

I strongly agree with this as well.

> P.S. Code generation for the GC intrinsics can be improved
> significantly. We can add new intrinsics that don't pin things to the
> stack, update optimizations, and do many other things if people
> started using the GC stuff seriously.

I've already commented on this elsewhere in the thread. Promoting GC
roots from stack slots into SSA variables would allow much more freedom
for the middle- and back-end optimizations, and I think it is clearly
the next logical step.

— Gordon
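A minimal sketch of the root-marking scheme discussed above, assuming a later
LLVM C++ API than the 2009 tree in this thread (the include paths and exact
signatures vary across LLVM versions, and the helper function here is
hypothetical):

```cpp
// Sketch: how a front end marks a GC root under the current scheme.  The root
// must live in an alloca'd stack slot registered through the llvm.gcroot
// intrinsic, which is what pins it to memory instead of leaving it in SSA form.
#include "llvm/IR/Constants.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Module.h"

using namespace llvm;

// Emit "alloca + llvm.gcroot" for one root in the entry block of F.
// (Hypothetical helper; the builder's insertion point is set by the caller.)
static AllocaInst *emitGCRoot(Module &M, Function &F, IRBuilder<> &B) {
  F.setGC("shadow-stack");                        // select a GC strategy by name
  PointerType *I8Ptr = Type::getInt8PtrTy(M.getContext());

  // The root is forced into a stack slot so the collector can find and
  // update it at any safe point.
  AllocaInst *Slot = B.CreateAlloca(I8Ptr, nullptr, "gcroot.slot");
  B.CreateStore(ConstantPointerNull::get(I8Ptr), Slot);

  // llvm.gcroot(i8** slot, i8* metadata) registers the slot with the GC.
  Function *GCRoot = Intrinsic::getDeclaration(&M, Intrinsic::gcroot);
  B.CreateCall(GCRoot, {Slot, ConstantPointerNull::get(I8Ptr)});

  // All later reads and writes of the root go through this slot; that is
  // exactly the pinning an SSA-based root representation would avoid.
  return Slot;
}
```

With roots promoted into SSA values instead, as suggested above, the loads and
stores through the slot disappear and the optimizers can treat roots like any
other value, with the stack maps presumably reconstructed late in codegen.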
BTW, have you looked at MMTk (http://jikesrvm.org/MMTk)? This is the
garbage collection library that underlies JikesRVM. It is a
'research-oriented' implementation, meaning that it has lots of
configurable settings and plugin interfaces for implementing a broad
range of collection algorithms. I was amused by the fact that "building
a hybrid copying/mark-sweep collector" is one of the steps in their
tutorial :)

Of particular interest for LLVM is MMTk's API for communicating with
the VM:

http://rvm.codehaus.org/docs/api/org/mmtk/vm/VM.html
http://rvm.codehaus.org/docs/api/org/mmtk/vm/Memory.html
http://rvm.codehaus.org/docs/api/org/mmtk/vm/Barriers.html

Although MMTk is far too much of a Swiss Army knife for my purposes -
too generalized and too complex - some of its abstractions might
nevertheless be useful starting points for designing the kind of
building blocks we've been talking about. One major simplification is
that we know we're building on top of LLVM, so many of the tunable
parameters in MMTk can be replaced with constants :)

One challenge is that MMTk's idioms for pluggability and extensibility
are fairly Java-centric. I'm trying to determine the best way to do
deeply invasive customization (like swapping out the definition of a
mutex, or changing the low-level primitive for zeroing memory) that
allows a similar degree of flexibility, but in a way that matches the
idioms of C++. The traditional OOP style of customization, involving
virtual functions and subclassing, can work for many areas, but for the
performance-critical components I would rather do the customization via
metaprogramming or some similar technique where all of the
customization decisions are made at compile time.

-- Talin
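A minimal sketch, with hypothetical names, of the compile-time customization
style described above: the mutex type and the memory-zeroing primitive are
supplied as template policy parameters, so every decision is resolved at
compile time rather than through virtual dispatch.

```cpp
#include <cstddef>
#include <cstring>
#include <mutex>

// Default policies; a VM would substitute its own implementations.
struct StdMutexPolicy {
  using Lock = std::mutex;
};

struct MemsetZeroPolicy {
  static void zero(void *p, std::size_t n) { std::memset(p, 0, n); }
};

// A skeletal bump-pointer allocator parameterized by those policies.  Both the
// lock type and the zeroing routine are fixed at compile time, so the compiler
// can inline them; no virtual dispatch is involved.
template <typename MutexPolicy = StdMutexPolicy,
          typename ZeroPolicy  = MemsetZeroPolicy>
class BumpAllocator {
  typename MutexPolicy::Lock lock_;
  char *cursor_;
  char *limit_;

public:
  BumpAllocator(char *begin, char *end) : cursor_(begin), limit_(end) {}

  void *allocate(std::size_t bytes) {
    std::lock_guard<typename MutexPolicy::Lock> guard(lock_);
    if (cursor_ + bytes > limit_)
      return nullptr;                 // out of space; a real allocator would collect
    void *result = cursor_;
    cursor_ += bytes;
    ZeroPolicy::zero(result, bytes);  // zeroing primitive chosen at compile time
    return result;
  }
};
```

A VM would then instantiate something like BumpAllocator<MySpinLockPolicy,
MyZeroPolicy> and the compiler can inline the chosen primitives, which is the
property wanted for the performance-critical paths; virtual interfaces remain
an option for the colder, less performance-sensitive hooks.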
Hi Talin,

Talin wrote:
> One challenge is that MMTk's idioms for pluggability and extensibility
> are fairly Java-centric.

That's where I come in :)

So I've been thinking of using MMTk in VMKit for quite a while. I think
it would be the best solution for GC in LLVM for starters. The design of
MMTk is VM-agnostic, and hopefully interfacing it with LLVM should not
be too difficult.

Now for the Java part. VMKit recently gained an ahead-of-time compiler
(vmjc) that generates a .ll file. The file can then be compiled into a
shared library to be loaded by VMKit. I think that by tuning/optimizing
vmjc, we can get the benefit of the runtime optimizations that MMTk
relies on for performance. That's only an impression, since I haven't
fully investigated the subject.

I've added an entry for this to the list of open GSoC projects for VMKit
(http://vmkit.llvm.org/OpenProjects.html). I know other research groups
have shown interest in having MMTk ported to LLVM.

Nicolas
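As a rough illustration of the VM-facing surface such a port would need, here
is a hypothetical C++ analogue of the kind of hooks expressed by MMTk's
org.mmtk.vm classes (VM, Memory, Barriers). None of the names or signatures
below come from MMTk, VMKit, or LLVM; they only sketch the shape of the
contract.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical VM-side hooks a ported collector library would ask the
// LLVM/VMKit side to implement.
using ObjectRef = std::uintptr_t;                // opaque reference to a heap object

struct RootVisitor {
  virtual void visitRoot(ObjectRef *slot) = 0;   // the collector may update *slot
  virtual ~RootVisitor() = default;
};

struct CollectorHostInterface {
  // Enumerate every root (stack maps, globals, handles) for the collector.
  virtual void enumerateRoots(RootVisitor &visitor) = 0;

  // Walk the reference fields of one object so the collector can trace it.
  virtual void scanObject(ObjectRef object, RootVisitor &visitor) = 0;

  // Write barrier: called when a reference field is stored into an object.
  virtual void referenceWriteBarrier(ObjectRef source, ObjectRef *slot,
                                     ObjectRef newValue) = 0;

  // Low-level memory primitive the collector delegates to the host.
  virtual void zeroMemory(void *start, std::size_t bytes) = 0;

  virtual ~CollectorHostInterface() = default;
};
```

Whether hooks like these end up as virtual calls, template policies, or LLVM
intrinsics is essentially the C++ customization question raised earlier in the
thread.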