On Thursday 26 February 2009 17:25:56 Chris Lattner wrote:
> In my ideal world, this would be:
>
> 1. Subsystems [with clean interfaces] for thread management,
> finalization, object model interactions, etc.
> 2. Within different high-level designs (e.g. copying, mark/sweep, etc)
> there can be replaceable policy components etc.
> 3. A couple of actual GC implementations built on top of #1/2.
> Ideally there would only be a couple of high-level collectors that can
> be parameterized by replacing subsystems and policies.
> 4. A very simple language implementation that uses the facilities, on
> the order of complexity as the kaleidoscope tutorial.
>
> As far as I know, there is nothing that prevents this from happening
> today, we just need leadership in the area to drive it. To avoid the
> "ivory tower" problem, I'd strongly recommend starting with a simple
> GC and language and get the whole thing working top to bottom. From
> there, the various pieces can be generalized out etc. This ensures
> that there is always a *problem being solved* and something that works
> and is testable.

I fear that the IR generator and GC are too tightly coupled.

For example, the IR I am generating shares pointers read from the heap even
across function calls. That is built on the assumption that the pointers are
immutable and, therefore, that the GC is non-moving. The generated code is
extremely efficient, even though I have not yet enabled LLVM's optimizations,
precisely because of all this shared immutable data.

If you wanted to add a copying GC to my VM, you would probably have to replace
every reuse of such an IR register with code that reloads the pointer,
generating a lot of redundant loads that would greatly degrade performance,
and then rely upon LLVM's optimization passes to clean it up again. However,
I doubt they have enough information to recover all of the lost performance.
So there is a fundamental conflict here, where a simple GC design decision has
a drastic effect on the IR generator.

Although it is theoretically possible to parameterize the IR generator
sufficiently to account for all possible combinations of GC designs, I suspect
the result would be a mess. Consequently, perhaps it would be better to
consider IR generation and the GC as a single entity and, instead, factor them
both out using a common high-level representation not dissimilar to JVM or CLR
bytecode in terms of functionality, but much more closely related to LLVM's IR?

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
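To make the coupling concrete, here is a rough sketch of the two IR shapes
being contrasted, written in LLVM 2.x-style assembly. The functions
@do_something and @use and the field argument are hypothetical; this is
illustrative only and is not taken from HLVM.

    ; Non-moving GC: a pointer loaded from the heap can be cached in an
    ; SSA register and reused across calls.
    define void @nonmoving(i8** %field) {
    entry:
      %p = load i8** %field          ; GC pointer loaded once
      call void @do_something()
      call void @use(i8* %p)         ; still valid: objects never move
      ret void
    }

    ; Copying GC: every call is a potential collection, so the pointer is
    ; spilled to a root slot (which the collector must be told about by some
    ; mechanism) and reloaded afterwards.
    define void @copying(i8** %field) {
    entry:
      %slot = alloca i8*             ; root slot visible to the collector
      %p = load i8** %field
      store i8* %p, i8** %slot
      call void @do_something()
      %p2 = load i8** %slot          ; reload: the object may have moved
      call void @use(i8* %p2)
      ret void
    }

    declare void @do_something()
    declare void @use(i8*)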
On Feb 26, 2009, at 6:17 PM, Jon Harrop wrote:
> Although it is theoretically possible to parameterize the IR generator
> sufficiently to account for all possible combinations of GC designs I
> suspect the result would be a mess. Consequently, perhaps it would be
> better to consider IR generation and the GC as a single entity and,
> instead, factor them both out using a common high-level representation
> not dissimilar to JVM or CLR bytecode in terms of functionality but much
> more closely related to LLVM's IR?

This falls into the category of "future work to improve codegen". Tagging
pointers as "magic for GC" is trivial now: just mark one LLVM "address space"
as being a GC heap, apply magic to that address space, and the problem is
solved. Let's get the basics working well first, though :)

-Chris
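A minimal sketch of what that might look like at the IR level. Address space 1
and the type and function names are arbitrary choices for illustration: any
pointer into the GC heap carries a distinct address space, so optimization
passes and a collector plugin can recognise GC references from the type alone.

    %obj = type { i32, i8 addrspace(1)* }   ; hypothetical object: header word + one GC reference

    ; Load the GC reference out of the object. The addrspace(1) annotation
    ; travels with the value, marking it as a pointer into the GC heap.
    define i8 addrspace(1)* @get_ref(%obj* %o) {
    entry:
      %fp = getelementptr %obj* %o, i32 0, i32 1
      %r  = load i8 addrspace(1)** %fp
      ret i8 addrspace(1)* %r
    }

In a real setup the object itself would presumably also live in address
space 1; it is kept in the default space here only to keep the types readable.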
Jon Harrop wrote:
> On Thursday 26 February 2009 17:25:56 Chris Lattner wrote:
>> In my ideal world, this would be:
>> [...]
>
> I fear that the IR generator and GC are too tightly coupled.
>
> [...]
>
> Although it is theoretically possible to parameterize the IR generator
> sufficiently to account for all possible combinations of GC designs I
> suspect the result would be a mess. Consequently, perhaps it would be
> better to consider IR generation and the GC as a single entity and,
> instead, factor them both out using a common high-level representation
> not dissimilar to JVM or CLR bytecode in terms of functionality but much
> more closely related to LLVM's IR?

IMHO, it would be better if support for GC was dropped from llvm altogether.
I say this having written a copying GC for my VM toolkit, which also uses
llvm to do its JIT compilation. And it works just fine! I have simply avoided
the intrinsics.

The problem with llvm is that to write a GC using the llvm intrinsics, you
have to mess around with the code-gen part of llvm.

When I add a generational collector to my toolkit in the future, it will be
easy to specify write barriers directly in the IR. Modifying code-gen to
handle the intrinsics is a task I would rather avoid.

Mark.
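For reference, this is roughly what specifying the write barrier directly in
the IR, with no GC intrinsics, looks like. The @remember runtime hook is a
hypothetical name; the point is that the front-end simply emits an ordinary
call before the store, so no code-gen support is required.

    declare void @remember(i8**)   ; hypothetical runtime hook: records the written slot

    ; Store a reference into a heap field, preceded by an explicit
    ; generational write barrier expressed as plain IR.
    define void @write_field(i8** %field, i8* %val) {
    entry:
      call void @remember(i8** %field)
      store i8* %val, i8** %field
      ret void
    }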
Mark Shannon wrote:
> Jon Harrop wrote:
>> [...]
>
> IMHO, it would be better if support for GC was dropped from llvm
> altogether. I say this having written a copying GC for my VM toolkit,
> which also uses llvm to do its JIT compilation. And it works just fine!
>
> I have simply avoided the intrinsics.
>
> The problem with the llvm is that to write a GC using the llvm
> intrinsics, you have to mess around with the code-gen part of llvm.
>
> When I want to add a generational collector to my toolkit in the future,
> it is easy to specify write-barriers in the IR. Modifying code-gen to
> handle the intrinsics is a task I would rather avoid.

People need very different things for GC. All we need for Java is the ability
to dump all live object pointers onto the stack and to generate a bitmap that
describes which words on the stack are object pointers.
Also, the optimizer has to be taught that while objects might move during a
collection, this will never cause a valid object pointer to become NULL, nor
will it change the contents of any non-reference fields. I don't think that
this is an enormous task.

Andrew.
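A rough sketch of the IR shape that requirement implies, using the existing
llvm.gcroot intrinsic. The "mygc" strategy name, @new_object and
@maybe_collect are hypothetical: every live object pointer sits in a marked
stack slot at each call, and a collector plugin can then emit a per-call-site
bitmap; for this frame, with roots in slots 0 and 1, the map would be 0b11.

    declare void @llvm.gcroot(i8**, i8*)
    declare i8* @new_object()        ; hypothetical allocator
    declare void @maybe_collect()    ; hypothetical safepoint

    define void @java_like() gc "mygc" {
    entry:
      %r0 = alloca i8*               ; stack slot 0: holds an object pointer
      %r1 = alloca i8*               ; stack slot 1: holds an object pointer
      call void @llvm.gcroot(i8** %r0, i8* null)
      call void @llvm.gcroot(i8** %r1, i8* null)
      store i8* null, i8** %r0
      store i8* null, i8** %r1
      %a = call i8* @new_object()
      store i8* %a, i8** %r0
      %b = call i8* @new_object()
      store i8* %b, i8** %r1
      call void @maybe_collect()     ; at this call site the frame bitmap is 0b11
      ret void
    }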
Jon Harrop wrote:
> On Thursday 26 February 2009 17:25:56 Chris Lattner wrote:
>> [...]
>
> I fear that the IR generator and GC are too tightly coupled.
>
> For example, the IR I am generating shares pointers read from the heap
> even across function calls. That is built on the assumption that the
> pointers are immutable and, therefore, that the GC is non-moving. The
> generated code is extremely efficient even though I have not even enabled
> LLVM's optimizations yet precisely because of all this shared immutable
> data.
>
> [...]

Most copying collector designs that I have seen rely on explicit coordination
from the mutator, which means that object addresses won't change at any
arbitrary time, but only during a "sync point". It's up to the mutator to call
"sync" at fairly regular intervals (although calls to allocate memory for an
object are implicitly sync points as well). During that call, the heap may get
re-arranged, but between sync points pointer values are stable. So you can
still share pointer values most of the time.

-- Talin
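A small sketch of that pattern, with all names hypothetical and @work assumed
to be non-allocating: within one iteration the loaded pointer can be shared
freely, but it is reloaded after each explicit sync, which is the only place
the collector is allowed to move objects.

    declare void @sync()       ; hypothetical: collection may only happen inside this call
    declare void @work(i8*)    ; hypothetical, non-allocating

    define void @loop(i8** %field, i32 %n) {
    entry:
      br label %head
    head:
      %i = phi i32 [ 0, %entry ], [ %i.next, %head ]
      %p = load i8** %field          ; stable until the next sync point
      call void @work(i8* %p)
      call void @sync()              ; objects may move only here
      %i.next = add i32 %i, 1
      %done = icmp eq i32 %i.next, %n
      br i1 %done, label %exit, label %head
    exit:
      ret void
    }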
On Feb 26, 2009, at 21:17, Jon Harrop wrote:
> For example, the IR I am generating shares pointers read from the
> heap even across function calls. That is built on the assumption
> that the pointers are immutable and, therefore, that the GC is
> non-moving. [...]
>
> If you wanted to add a copying GC to my VM you would probably
> replace every lookup of the IR register with a lookup of the code to
> reload it, generating a lot of redundant loads that would greatly
> degrade performance so you would rely upon LLVM's optimization
> passes to clean it up again. However, I bet they do not have enough
> information to recover all of the lost performance. So there is a
> fundamental conflict here where a simple GC design decision has a
> drastic effect on the IR generator.

I agree this could be better. I think it would be prudent of you, being aware
of this problem, to structure your compiler so as to limit the number of
pieces of code which would be affected when you switch to a copying collector.
I think such structure in the back-end of your compiler addresses the same
needs as your "slightly less low-level virtual machine" would.

In order to address it thoroughly, the LLVM GC infrastructure needs to track
register roots so that it doesn't need to conservatively reload from the stack
so frequently. This would likely entail a change to the IR (either a 'GC'
address space as Chris suggests, a new intrinsic to 'tag' a value as a GC
pointer, or even a change to the Type hierarchy)--another reason to isolate
GC-handling code in your compiler.

— Gordon
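For context, this is roughly what the current intrinsics force a front-end to
emit, here using the built-in "shadow-stack" strategy with hypothetical
@do_something and @use functions: the root lives in memory, so after every
call the value is conservatively reloaded from its slot. Register-root
tracking would let %p stay live in a register across the call instead.

    declare void @llvm.gcroot(i8**, i8*)
    declare void @do_something()
    declare void @use(i8*)

    define void @conservative(i8** %field) gc "shadow-stack" {
    entry:
      %root = alloca i8*
      call void @llvm.gcroot(i8** %root, i8* null)
      %p = load i8** %field
      store i8* %p, i8** %root
      call void @do_something()
      %p.reloaded = load i8** %root   ; forced reload from the stack slot
      call void @use(i8* %p.reloaded)
      ret void
    }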
On Friday 27 February 2009 18:42:13 Gordon Henriksen wrote:
> I agree this could be better. I think it would be prudent of you,
> being aware of this problem, to structure your compiler so as to limit
> the number of pieces of code which would be affected when you switch
> to a copying collector.

I think that would make my VM a lot more complicated for no clear practical
gain.

> In order to address it thoroughly, the LLVM GC infrastructure needs to
> track register roots so that it doesn't need to conservatively reload
> from the stack so frequently.

F# suffers because null references are typeless on .NET, but you want to use
typed null references to represent the empty list, the None option type
constructor and many other things. The F# team cannot fix their VM, so they
were forced to settle on an inefficient implementation of lists in a language
where lists are ubiquitous, and an implementation of the option type that is
efficient but does not pretty print correctly due to insufficient type
information.

I can and did fix my VM by representing reference types as an aggregate
containing a pointer to shared data (the type) and a pointer to individual
data (the value). Thus you can have a typed null pointer by having a pointer
to a valid type and a null pointer for the individual data. However, that
means my roots are now aggregate values. So your proposed GC infrastructure
for LLVM would need to be able to mark aggregate values as roots as well as
individual values.

Moreover, even if we went through all this work of improving LLVM to support
this GC infrastructure and altering my VM to traverse the system stack instead
of my current shadow stack, I am not even convinced it would be a significant
improvement over what I already have.

> This would likely entail a change to the
> IR (either a 'GC' address space as Chris suggests, a new intrinsic to
> 'tag' a value as a GC pointer, or even a change to the Type
> hierarchy)

Can you elaborate on changing the Type hierarchy?

> --another reason to isolate GC-handling code in your compiler.

I just cannot see how the GC-handling code can be isolated in such a way that
the result is better than just having separate HLVMs for separate design
decisions.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
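For reference, the representation described here might look something like the
following in IR; the type and function names are hypothetical. Each reference
is a two-word first-class struct, so a "typed null" is a valid type pointer
paired with a null data pointer, and a GC root is the whole pair rather than a
single pointer.

    %typeinfo = type opaque                 ; shared, per-type data
    %ref = type { %typeinfo*, i8* }         ; HLVM-style reference: { type, value }

    ; Build the typed null of a given type: valid type pointer, null data.
    define %ref @null_of(%typeinfo* %t) {
    entry:
      %r0 = insertvalue %ref undef, %typeinfo* %t, 0
      %r1 = insertvalue %ref %r0, i8* null, 1
      ret %ref %r1
    }

    ; Test whether a reference is null: only the data pointer matters,
    ; the type pointer stays valid.
    define i1 @is_null(%ref %r) {
    entry:
      %data = extractvalue %ref %r, 1
      %isnull = icmp eq i8* %data, null
      ret i1 %isnull
    }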