*Motivation & Abstract*

It has been observed by a number of LLVM users that the current garbage collection framework requires frontends to generate complex and inefficient code. This is primarily because only allocas can be stack roots - it is not possible to trace SSA values directly. Because of this limitation, a frontend wanting to support garbage collection has to manually copy any SSA values containing potential roots to memory before each call instruction (or other 'safe point'), and reload them back into SSA values after the call instruction.

*Background: How the inability to trace SSA values degrades efficiency*

Although it would be ideal if there was a way for registers to be traced directly, in practice this is very difficult, for a number of reasons - for one thing, it requires the collector to have knowledge of the register file of the current processor. It is much simpler if all garbage collection roots can be identified by an address in memory. Doing this requires that before each safe point, all register values must be spilled to memory. Although this may seem inefficient, in practice it need not be: since most processors have a relatively small register file, chances are that only a few local variables will be live in registers at any given point. Regardless of how deep the stack is (that is, how many call frames there are and how many local variables there are in each frame), the most you should ever need to copy to memory is the entire register file - less if some of the registers contain non-root values such as integers, or dead values.

An additional optimization can be obtained if we realize that in most collectors, the *current* call frame can be treated as a special case. A collection cycle only happens inside one of the collector's runtime utility functions, and those functions can be written in such a way as to not require tracing. So it is only the *parent* of the current call frame, and its ancestors, that need to be traced.
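To make the memory-only root discipline concrete, here is a minimal C sketch of the general idea - a chain of frames holding spilled roots that a collector can walk by address alone, with no knowledge of the register file. All names here (ShadowFrame, count_reachable_roots, and so on) are hypothetical illustrations, not part of LLVM or any particular runtime:

```c
#include <stddef.h>

/* Hypothetical frame of spilled roots, linked through the caller chain.
 * Before a safe point, a function stores the addresses of its live roots
 * here so the collector can find every root in memory. */
typedef struct ShadowFrame {
    struct ShadowFrame *parent; /* the caller's frame, or NULL */
    size_t num_roots;
    void **roots[4];            /* addresses of spilled root slots */
} ShadowFrame;

static ShadowFrame *shadow_top = NULL;

/* What a tracer does at a collection cycle: walk every frame and visit
 * each spilled root. Here we just count them. */
static size_t count_reachable_roots(void) {
    size_t n = 0;
    for (const ShadowFrame *f = shadow_top; f != NULL; f = f->parent)
        n += f->num_roots;
    return n;
}

/* A callee containing a safe point. Only the callers' frames are
 * visible; the collector's own frame needs no tracing. */
static size_t callee_with_safepoint(void) {
    return count_reachable_roots();
}

/* The caller spills its two live roots before the call and pops the
 * frame afterwards - the pattern a frontend must emit by hand today. */
static size_t caller(void) {
    void *a = (void *)0x10, *b = (void *)0x20; /* pretend heap pointers */
    ShadowFrame frame = { shadow_top, 2, { &a, &b, NULL, NULL } };
    shadow_top = &frame;
    size_t seen = callee_with_safepoint();
    shadow_top = frame.parent;
    return seen;
}
```

A copying collector would additionally rewrite the spilled slots through the stored addresses, which is why, in the current framework, values must also be reloaded after the call.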
Unfortunately, reaching this theoretical point of maximum efficiency requires a lot of knowledge about which local variables are in registers at any given moment, and which values are live. LLVM-based frontends do not, as a rule, have this knowledge. This forces the frontend to be extremely conservative and pessimistic - it has to treat every SSA value as if it were in a register, and copy it to a stack location whether it really needs to or not. Further, because copying collectors can modify object addresses, the temporary copy on the stack has to be moved back into an SSA value afterwards.

The end result is that if you look at the disassembled code for a typical function, the generated code is a mess - it will have dozens of additional allocas at the beginning of the function, each with its own call to llvm.gcroot(), and each initialized to a "safe" value. And each call to a subroutine is surrounded by a bunch of extra stores and loads. There's so much extra clutter that the code is hard to read.

*Background: Explicit Lifetimes for Roots*

Another problem with the current LLVM approach for handling stack roots has to do with the way that you signal to LLVM that a given stack root is going out of scope - which is by assigning NULL to the pointer. Theoretically, this extra store shouldn't be necessary, as long as your tracer can handle different safe points within the same function having different stack maps (which is fairly easy to do in the current framework). That is, if there were a way to communicate to LLVM the region in which a given stack root is in scope, then the GC strategy could simply remove that variable from the stack map for all safe points outside that region. No store would be needed.

*Background: What is a root?*

Many language runtimes make the assumption that all roots are simple pointers.
For example, in Java, there are really only two kinds of data: primitive numerical types (floats and ints), and objects. Furthermore, all objects are type-tagged, so tracing is very simple: if you see a pointer, and it's non-null, trace it. However, some languages have a richer set of internal data types, which can impact garbage collection. Three fairly common examples come to mind:

- LISP - a pointer field can hold either an actual pointer or a small integer. The least significant bit of the pointer is used to determine which type it is.
- Tagged unions - many languages support "tagged unions" or "disjoint types". These generally consist of a small struct containing a bitfield and a "payload" area, where the payload area is as large as the largest type in the union. The bitfield is used to determine what type is currently stored in the payload. This affects garbage collection because the collector needs to examine the bitfield in order to determine whether the payload contains pointers.
- Variable-length arrays - these generally contain a 'size' field that determines how many elements are in the array. If the array contains traceable objects, the collector needs to examine the 'size' field in order to determine how many values to trace. (While it is possible that you could get around this by initializing unused array elements to null, in practice this would slow down many array operations.)

Currently LLVM puts no restriction on the type of data that can be a root - in other words, a root is "whatever the frontend says is a root". Furthermore, the llvm.gcroot() intrinsic allows arbitrary metadata to be associated with each root, making it relatively easy to handle non-tagged data types - the frontend can use this metadata argument to pass a table of descriptors which tells the collector how to trace that specific value.

Take the case of tagged unions as an example.
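To illustrate why the last two cases matter to a collector, here is a small C sketch of tracing a tagged union and a variable-length array. The struct layouts and names are hypothetical illustrations, not anything prescribed by LLVM:

```c
#include <stddef.h>

typedef struct Object { int marked; } Object;  /* stand-in heap object */

/* A tagged union: the 'tag' field says whether the payload holds a
 * traceable pointer or a plain integer. */
typedef struct TaggedUnion {
    unsigned tag;              /* 0 = integer, 1 = object pointer */
    union {
        long    as_int;
        Object *as_ref;
    } payload;
} TaggedUnion;

/* A variable-length array: only elements [0, size) are meaningful. */
typedef struct VarArray {
    size_t  size;
    Object *elems[8];
} VarArray;

/* The tracer cannot treat the root address as "the address of a
 * pointer"; it must inspect the tag first. Returns the number of
 * pointers traced. */
static size_t trace_union(const TaggedUnion *u) {
    if (u->tag == 1 && u->payload.as_ref != NULL)
        return 1;              /* a real collector would mark/copy here */
    return 0;
}

/* Likewise, the tracer must read 'size' to know how far to scan. */
static size_t trace_array(const VarArray *a) {
    size_t traced = 0;
    for (size_t i = 0; i < a->size; i++)
        if (a->elems[i] != NULL)
            traced++;          /* again, mark/copy in a real collector */
    return traced;
}
```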
We can declare the entire tagged union struct to be a "root", meaning that what gets passed to the collector is not the address of the payload area, but the address of the whole union struct. The collector knows how to examine the bitfield and trace the payload area because of the metadata that is associated with that root - it won't assume that the root address is the address of a pointer.

Now, in practice, the only LLVM data types that will ever be roots are pointers and structs. In the latter case, this will most often be structs containing pointers, but this won't always be the case (imagine a tagged union containing both pointer and non-pointer members, but whose largest member contains no pointers). In general, it would be best if LLVM treated roots opaquely - that is, it should not attempt to interpret the contents of the root, but should instead just pass it through to the collector.

*Overall Proposal: Support marking SSA values as roots (an evolutionary approach)*

My proposal consists of three rather significant changes to LLVM:

- Allow frontends to mark SSA values - or even portions of SSA values - as stack roots.
- For alloca roots, add a way to communicate to LLVM when a root goes out of scope.
- Transfer the responsibility for copying stack roots to memory from the frontend to the LLVM code generators.

Let me take the last point first.

*Proposal Pt 3: Copying Stack Roots to Memory*

The LLVM code generators and analysis passes have a much more thorough knowledge of SSA value lifetimes than frontends do, and therefore could avoid spilling and reloading of values when it isn't needed. So the LLVM libraries should take over the job of creating local allocas for holding SSA values during a safe point, as well as the job of copying those SSA values to those allocas.

Of course, generating "optimal" code (as outlined in the previous section) would require a lot of work in the backends. But we don't need to do that right away.
The first goal should be a "conservative, pessimistic" implementation that's no better or worse than what we have today (other than being far easier to use). It might be possible to do this as some sort of stand-alone transformation/lowering pass. This is what I mean by an evolutionary approach - come up with the right interface between the frontends and LLVM that enables us to gradually move towards that eventual goal.

*Proposal Pt 1: Marking SSA roots*

In order for the backend to be able to properly handle SSA roots, we need some way to communicate to LLVM which values are roots. Currently this is done via the llvm.gcroot() intrinsic. The intrinsic itself actually does nothing - rather, the presence of the intrinsic instruction is used as a "marker" that is recognized by certain LLVM analysis passes.

One approach would be to extend the current scheme to work with SSA values, possibly using a new intrinsic. This is somewhat problematic because you can't really have a 'do-nothing' function on SSA values - the result of the function is always going to be a copy of the value, not the original value that was passed in.

A different approach would be to somehow annotate an LLVM type with information about its 'root-ness'. In one sense, this is logical, because it mirrors what the frontend does - it uses the type to determine how the garbage collector should treat an object. However, annotating types has some major problems:

- LLVM normally folds structurally identical types together into a single type. However, you might have two types which are structurally identical when translated into IR, but which are supposed to be treated differently from a GC standpoint. The only way to avoid this is to make the annotations part of the type's signature - to treat types having different annotations as distinct.
- For garbage collection we need at least 1 bit and 1 pointer per annotated type.
  (Actually, it's technically possible that we could do without the pointer, which is used to store the metadata, but it would make a frontend developer's life considerably harder.) Unfortunately, increasing the size of every type by that much would be a huge cost.
- Pointers in LLVM have an "address space" attribute, from which we could steal a few bits. Unfortunately, this has two problems: not all roots are pointers, and a few bits aren't really enough.

Given that, there are a few different options I can think of - none of which I completely like, but they are the best I've come up with so far.

*Option 1: Make type annotations first-class types in LLVM*

The idea here is that you would have a new kind of type expression in LLVM, which "wraps" an existing type and adds an annotation to it. I'm going to randomly pick a character that isn't used yet in the LLVM ASCII serialization format (#) and craft an example of what this might look like:

    %mytype = type #( { %mytype*, i32 }, i8* %annotation )

So basically this is saying that 'mytype' is an annotated struct - you can use it just like you would use a struct, but it's a distinct type that will not be folded with other structs unless they have an identical annotation. Multiple annotations can be handled either by nesting (one annotation wrapping another), or by allowing the annotation construct to take multiple arguments.

You can have structs that are roots embedded inside non-root structs (this applies to options 2a and 2b as well):

    %mytype = type { i32, #( { %mytype*, i32 }, i8* %annotation ) }

The way to interpret this is as follows: if you have an SSA value whose type is "mytype", then only the second member needs to be copied to memory during a safe point, and the address passed to the garbage collector is the address of that member only.

The first advantage of this approach is that it's "pay for what you use" - only people who use annotations have to pay the memory cost for them.
A second advantage of this approach is that you could use it for other things besides garbage collection - const, volatile, thread-local, and any other value attributes you can think of.

The big downside here is that you would have to go and update the LLVM code base to "unwrap" type expressions - that is, any function that takes an LLVM type as an argument now has to potentially strip off an arbitrary number of annotations before it can work with the type. This is a vast amount of work.

*Option 2a: Allow annotations for structs only*

Chris Lattner's proposal for named struct types has a fortunate side effect, which is that we can now control whether types are folded or not. Unfortunately, this only applies to structs, and not pointers, which are the other type we are interested in. However, we can convert any arbitrary LLVM type into a struct by simply wrapping it in a struct.

So let's say that in addition to allowing structs to be named, we also allow arbitrary annotations to be added. (In fact, the struct name could be implemented as one type of annotation, so we're really only adding one extra field.) We make sure that any struct that is a root has a unique name, and add the appropriate 'root' annotation to it. This means that any garbage-collectable pointer type such as X* now has to be transformed into { X* }. This shouldn't affect anything except that now you have to add an extra 0 to your GEP expressions. Ah well, such is life.

*Option 2b: Annotations on structs via an auxiliary data structure*

Instead of adding the ability to annotate struct types, we build a parallel data structure which maps the name of the struct to its garbage collection metadata. The upside is that this involves the fewest changes to LLVM, but it has a couple of downsides:

- You still have to unwrap pointers before using them (via an extra 0 to GEP).
- This data structure needs to be serialized as part of the module.
- The information about the struct is no longer all in one place: the various backend passes that know about stack roots now have to correlate information from two different sources.

*Proposal Pt 2: Explicit Lifetimes for Alloca Roots*

The final part of the proposal is to allow the frontend to tell LLVM when an alloca variable has gone out of scope. This part is easy - just add another intrinsic, similar to llvm.gcroot, which is used as a marker.

-- Talin
On Wed, Jul 6, 2011 at 5:24 PM, Talin <viridia at gmail.com> wrote:

> One approach would be to extend the current scheme to work with SSA values,
> possibly using a new intrinsic. This is somewhat problematic because you
> can't really have a 'do-nothing' function on SSA values - the result of the
> function is always going to be a copy of the value, not the original value
> that was passed in.

That doesn't seem like a huge problem. In the common case, the "marked" SSA value and the "unmarked" original wouldn't be live simultaneously. Uses of the unmarked version reachable from the mark intrinsic can either be treated as a separate value or just left undefined.

It makes sense to also add an "unmark" intrinsic to define when the SSA stack root goes out of scope.
On Wed, Jul 6, 2011 at 3:51 PM, Kenneth Uildriks <kennethuil at gmail.com> wrote:

> On Wed, Jul 6, 2011 at 5:24 PM, Talin <viridia at gmail.com> wrote:
>
> > One approach would be to extend the current scheme to work with SSA values,
> > possibly using a new intrinsic. This is somewhat problematic because you
> > can't really have a 'do-nothing' function on SSA values - the result of the
> > function is always going to be a copy of the value, not the original value
> > that was passed in.
>
> That doesn't seem like a huge problem. In the common case, the
> "marked" SSA value and the "unmarked" original wouldn't be live
> simultaneously. Uses of the unmarked version reachable from the mark
> intrinsic can either be treated as a separate value or just left
> undefined.
>
> It makes sense to also add an "unmark" intrinsic to define when the
> SSA stack root goes out of scope.

That's an interesting suggestion. I realized, however, that there's another advantage to using type-based marking rather than an intrinsic - it allows you to declare function parameters as stack roots. Depending on the calling convention, some of the parameters to a function will be located on the stack while others will be passed in registers. The ones already in memory ought to be traceable right where they are, but unfortunately there's no way to mark them as roots with the intrinsic method - which means that the frontend has to copy the parameters into an alloca and then mark that as a root.

-- Talin
On 7/6/2011 6:24 PM, Talin wrote:

> The LLVM code generators and analysis passes have a much more thorough
> knowledge of SSA value lifetimes than frontends do, and therefore
> could avoid spilling and reloading of values when it wasn't needed.

Although this would indeed be nice, it is not done by similar platforms in practice. I have investigated [very] briefly whether the CLR or JVM implement garbage collection in their IR, and it does not seem that they do (meaning, the CLR/JVM implementation itself is responsible for garbage collection, not the code generated by the CLR/Java language compilers).

However, the point quoted above is the one strong argument I see for taking an IR approach to GC. I have to say that it makes a convincing case, since the generators and analysis passes would know even more about the lifetime of variables in code than any plug-in GC library would. I'd be curious as to what benefits could be gained from this approach over the current standard way of doing it; but I'm not terribly hopeful.
On 07.07.2011 08:31, Nate Fries wrote:

> On 7/6/2011 6:24 PM, Talin wrote:
>
> > The LLVM code generators and analysis passes have a much more
> > thorough knowledge of SSA value lifetimes than frontends do, and
> > therefore could avoid spilling and reloading of values when it wasn't
> > needed.
>
> Although this would indeed be nice, it is not done by similar
> platforms in practice. I have investigated [very] briefly into whether
> the CLR or JVM implement garbage collection in their IR, and it does
> not seem that they do (meaning, the CLR/JVM implementation itself is
> responsible for garbage collection, not the code generated by the
> CLR/Java language compilers).

I'm not sure this is a valid comparison. CLR and JVM IRs are typed, and by the nature of those VMs, *all* pointers on the stack and in registers are automatically GC roots. Unlike LLVM, the JVM (and I think the CLR) doesn't even have a concept of a non-GC pointer.

Sebastian
On Jul 6, 2011, at 3:24 PM, Talin wrote:

> *Overall Proposal: Support marking SSA values as roots (an evolutionary approach)*
>
> My proposal consists of three rather significant changes to LLVM:
>
> - Allow frontends to mark SSA values - or even portions of SSA values - as stack roots.
> - For alloca roots, add a way to communicate to LLVM when a root goes out of scope.
> - Transfer the responsibility for copying stack roots to memory, from the frontend to the LLVM code generators.
>
> Let me take the last point first.
>
> *Proposal Pt 3: Copying Stack Roots to Memory*
>
> The LLVM code generators and analysis passes have a much more thorough knowledge of SSA value lifetimes than frontends do, and therefore could avoid spilling and reloading of values when it wasn't needed. So the LLVM libraries should take over the job of creating local allocas for holding SSA values during a safe point, as well as the job of copying those SSA values to those allocas.
>
> Of course, generating "optimal" code (as outlined in the previous section) would require a lot of work to the backends. But we don't need to do that right away. The first goal should be a "conservative, pessimistic" implementation that's no better or worse than what we have today (other than being far easier to use.) It might be possible to do this as some sort of stand-alone transformation/lowering pass.
>
> This is what I mean by an evolutionary approach - come up with the right interface between the frontends and LLVM that enables us to gradually move towards that eventual goal.

The initial step of your evolutionary approach sounds basically equivalent to what is possible today. When you try to go beyond that you will run into a lot of issues, since the backend is untyped and doesn't even distinguish between pointers and integers. The closest thing we have to tracking liveness of SSA values throughout the backend is debug info, which is frequently incorrect (although usually not due to the code generator itself).
Who is going to implement support for all of this in the code generator passes and maintain it? Will GC compatibility be a requirement for any further improvements in code generation?

Cameron
> The initial step of your evolutionary approach sounds basically equivalent
> to what is possible today. When you try to go beyond that you will run into
> a lot of issues, since the backend is untyped and doesn't even distinguish
> between pointers and integers. The closest thing we have to tracking
> liveness of SSA values throughout the backend is debug info, which is
> frequently incorrect (although usually not due to the code generator itself).
>
> Who is going to implement support for all of this in the code generator
> passes and maintain it? Will GC compatibility be a requirement for any
> further improvements in code generation?
>
> Cameron

[TAA] I don't know who would do it, but perhaps it isn't as daunting as one might expect. We implemented most of this in the Intel compiler and ended up changing about 1-2% of the lines in the code base. The trend seems to be for most new languages to be garbage collected. Yes, you can do garbage collection (even accurate garbage collection) totally at the user level, but in our experience you may be leaving 10% or more performance on the table by not having a tighter integration with the underlying compiler.
On Thu, Jul 7, 2011 at 2:25 PM, Cameron Zwarich <zwarich at apple.com> wrote:

> On Jul 6, 2011, at 3:24 PM, Talin wrote:
>
> > *Overall Proposal: Support marking SSA values as roots (an evolutionary approach)*
> >
> > My proposal consists of three rather significant changes to LLVM:
> >
> > - Allow frontends to mark SSA values - or even portions of SSA values - as stack roots.
> > - For alloca roots, add a way to communicate to LLVM when a root goes out of scope.
> > - Transfer the responsibility for copying stack roots to memory, from the frontend to the LLVM code generators.
> >
> > Let me take the last point first.
> >
> > *Proposal Pt 3: Copying Stack Roots to Memory*
> >
> > The LLVM code generators and analysis passes have a much more thorough knowledge of SSA value lifetimes than frontends do, and therefore could avoid spilling and reloading of values when it wasn't needed. So the LLVM libraries should take over the job of creating local allocas for holding SSA values during a safe point, as well as the job of copying those SSA values to those allocas.
> >
> > Of course, generating "optimal" code (as outlined in the previous section) would require a lot of work to the backends. But we don't need to do that right away. The first goal should be a "conservative, pessimistic" implementation that's no better or worse than what we have today (other than being far easier to use.) It might be possible to do this as some sort of stand-alone transformation/lowering pass.
> >
> > This is what I mean by an evolutionary approach - come up with the right interface between the frontends and LLVM that enables us to gradually move towards that eventual goal.
>
> The initial step of your evolutionary approach sounds basically equivalent to what is possible today. When you try to go beyond that you will run into a lot of issues, since the backend is untyped and doesn't even distinguish between pointers and integers.
> The closest thing we have to tracking liveness of SSA values throughout the backend is debug info, which is frequently incorrect (although usually not due to the code generator itself).
>
> Who is going to implement support for all of this in the code generator passes and maintain it? Will GC compatibility be a requirement for any further improvements in code generation?

That's really the key question. Most of the developers who are paid to work on LLVM are doing so in support of Clang, which doesn't need this feature. And language experimenters generally don't have time to work on their own language and at the same time become familiar enough with the LLVM code generator passes to do useful work.

It's kind of a chicken-and-egg problem: managed-language creators aren't attracted to LLVM because it lacks certain features - they are far more likely to choose the JVM as a development platform (as seen in the case of Scala, Groovy, and many others). Conversely, the lack of customers for those features in LLVM makes it difficult to justify putting significant effort into building and maintaining them. This is what I meant when I said that the LLVM community lacks momentum in certain areas.

However, what I'm attempting to do is at least get some sort of consensus on what we'd *like* to have happen, even if we don't yet have someone available to do the work. If some GSoC student showed up tomorrow and volunteered to do all this, I don't think we'd be in agreement as to what we'd want them to do. I would say that first and foremost, let's come up with some scheme that allows types, rather than values, to be marked as roots, and which doesn't impose a huge cost on users that don't need GC. Pretty much everything else is contingent on that. I suggested several approaches, but I'm sure that there are people on this list who have much better ideas than mine.

-- Talin
So - here's a really modest proposal:

Based on Renato's suggestion, there is a way to associate arbitrary annotation data with any LLVM type.

So what I would propose as a first step is to simply replace (or perhaps supplement) the current llvm.gcroot() feature with a type annotation. That is, LLVM would only support roots that are allocas, just as it does today. The backend code would not change at all, except that in addition to detecting the presence of the llvm.gcroot marker, it would also detect the presence of a type annotation. That means that *every* alloca of a given type would be considered a root - there would be no need to insert the llvm.gcroot intrinsic into each function.

Later we could think about supporting SSA local variables, parameters, and register maps. But this first step is a prerequisite for all of those other things, and I'm thinking that it's a relatively easy step compared to the others.

-- Talin
On Jul 11, 2011, at 2:28 PM, Talin wrote:

> So - here's a really modest proposal:
>
> Based on Renato's suggestion, there is a way to associate arbitrary annotation data with any LLVM type.
>
> So what I would propose as a first step is to simply replace (or perhaps supplement) the current llvm.gcroot() feature with a type annotation. That is, LLVM would only support roots that are allocas, just as it does today. The backend code would not change at all, except that in addition to detecting the presence of the llvm.gcroot marker, it would also detect the presence of a type annotation. That means that *every* alloca of a given type would be considered a root - there would be no need to insert the llvm.gcroot intrinsic into each function.
>
> Later we could think about supporting SSA local variables, parameters, and register maps. But this first step is a prerequisite for all of those other things, and I'm thinking that it's a relatively easy step compared to the others.
>
> -- Talin

I may have some fundamental misconceptions of how gcroot works for you today. If you can clarify it for me, then I can better understand your proposal.

I always assumed that an essential side effect of the gcroot() intrinsic was to take the address of a local variable, implicitly preventing optimization. But your recent comments indicate that gcroot does nothing beyond annotating a value. Without gcroot, what prevents mem2reg from promoting the alloca? Furthermore, what prevents optimizations such as GVN from optimizing access to the local variable across call sites? I assume stack variables that don't "escape" are fair game for all the usual scalar optimizations - but I could be wrong.

-Andy