Benjamin Saunders
2012-Dec-28 21:09 UTC
[LLVMdev] Extending GC infrastructure for roots in SSA values
I'm working on an LLVM backend for Idris, a garbage-collected pure functional programming language, and have experienced some frustration that LLVM's GC support, specifically with regard to mapping roots, operates only on allocas. This entails a lot of otherwise unnecessary stack allocation (especially in a pure language, where in-place mutation is rare) and imposes limitations on what optimizations can be applied. Other LLVM users have used elaborate workarounds to this, such as Rust's use of address spaces and, I believe, GHC's specialized calling convention. I'm interested in extending LLVM to support GC roots in regular SSA values, but, that being a significant change, it's clear that some discussion is in order before diving in if I want to get such a patch merged. This topic has been discussed on multiple previous occasions, and in each case nothing seems to have come of it, though interest appears to be significant. In particular, concerns with how such infrastructure could be made to abide by the invariants of arbitrary GC algorithms seem to have stayed hands. It's not clear to me why that poses a problem--if the property of being a GC root is correctly propagated through all manipulations of a pointer, and that information tracked through register allocation and made available to the GC metadata printer, won't the the resulting system have no more limitations or constraints than the current one? A copying collector would, having a complete list of root locations, still be able to rewrite them; a mark-and-sweep collector would still be able to find everything in need of marking; and so on. If my understanding above is correct, then perhaps the challenge lies in correctly propagating the marking of a pointer in an SSA value as a root through transforms. The pattern of propagation desired seems identical to that of type information, so perhaps it would be best to make the marking of a pointer as a GC root an attribute of its type, much the way address spaces already work? Recall again Rust's approach here, where the behavior of address space information through transforms is exactly what is relied upon. It's easy to imagine a GC root flag on pointer types, but one still needs to attach metadata to enable tagless GC as supported by the existing infrastructure. Rust simply encodes this information into the address space number; a similar approach could be envisioned with a 'GC type ID' number that could be used by the GC metadata printer to look up detailed information in e.g. module-level metadata, but this is a bit awkward; it would be nice to have an interface at least as convenient as the current intrinsic is. If the metadata is uniqued so as not to break type equality and uniquing, would it be viable to have the GCd pointer type itself refer to a metadata node? Finally, is there anything else that needs consideration before attempting this?
First of all, thanks for looking into this! As you've no doubt discovered, I'm one of the people who has talked a lot about this issue in the past, and have been frustrated with the lack of progress in this area. I completely agree with your point about wanting to be able to attach GC metadata to a type (rather than attaching it to a value, as is done now). In the past, there have been two objections to this approach: first, the overhead that would be added to the Pointer type - the vast majority of LLVM users don't want to have to pay an extra 4-8 bytes per Pointer type. And second, that all of the optimization passes would have to be updated so as to not do illegal transformations on a GC type. A different approach would be to create a new kind of derived type that associates metadata with an existing type. This "AnnotatedType" would be essentially a tuple (type, metadata), and would be constant-folded just like other types are. Just like the existing GC intrinsics today, there would be some way for a post-optimization pass to get access to all of the stack variables an examine the annotations on each to determine how to construct the appropriate static data structures. This approach has both a number of advantages and a number of challenges. The first advantage is that it means that LLVM users who aren't interested in GC would pay nothing. A second advantage is that this could also be used to wrap types that are not pointers. One use case I have is being able to handle a discriminated union type which sometimes holds a pointer and sometimes doesn't. The existing intrinsics allow this use case (within the limitations that you point out) - the way it works is that the entire struct is considered a "root" and is passed through to the GC plugin, which generates code to look at the discriminator field and decide whether to trace it or not. However, I'm also aware of the fact that I seem to be the only one who is interested in this particular case, so I won't strongly object if your solution doesn't handle it, as long as the existing intrinsics continue to work. The challenge of this approach is that a lot of backend code will need to unwrap the annotated type in order to operate upon it, and it would be all too easy to discard the associated metadata as part of this process. On Fri, Dec 28, 2012 at 1:09 PM, Benjamin Saunders <ralith at gmail.com> wrote:> I'm working on an LLVM backend for Idris, a garbage-collected pure > functional programming language, and have experienced some frustration > that LLVM's GC support, specifically with regard to mapping roots, > operates only on allocas. This entails a lot of otherwise unnecessary > stack allocation (especially in a pure language, where in-place > mutation is rare) and imposes limitations on what optimizations can be > applied. Other LLVM users have used elaborate workarounds to this, > such as Rust's use of address spaces and, I believe, GHC's specialized > calling convention. I'm interested in extending LLVM to support GC > roots in regular SSA values, but, that being a significant change, > it's clear that some discussion is in order before diving in if I want > to get such a patch merged. > This topic has been discussed on multiple previous occasions, and in > each case nothing seems to have come of it, though interest appears to > be significant. In particular, concerns with how such infrastructure > could be made to abide by the invariants of arbitrary GC algorithms > seem to have stayed hands. It's not clear to me why that poses a > problem--if the property of being a GC root is correctly propagated > through all manipulations of a pointer, and that information tracked > through register allocation and made available to the GC metadata > printer, won't the the resulting system have no more limitations or > constraints than the current one? A copying collector would, having a > complete list of root locations, still be able to rewrite them; a > mark-and-sweep collector would still be able to find everything in > need of marking; and so on. > If my understanding above is correct, then perhaps the challenge lies > in correctly propagating the marking of a pointer in an SSA value as a > root through transforms. The pattern of propagation desired seems > identical to that of type information, so perhaps it would be best to > make the marking of a pointer as a GC root an attribute of its type, > much the way address spaces already work? Recall again Rust's approach > here, where the behavior of address space information through > transforms is exactly what is relied upon. > It's easy to imagine a GC root flag on pointer types, but one still > needs to attach metadata to enable tagless GC as supported by the > existing infrastructure. Rust simply encodes this information into the > address space number; a similar approach could be envisioned with a > 'GC type ID' number that could be used by the GC metadata printer to > look up detailed information in e.g. module-level metadata, but this > is a bit awkward; it would be nice to have an interface at least as > convenient as the current intrinsic is. If the metadata is uniqued so > as not to break type equality and uniquing, would it be viable to have > the GCd pointer type itself refer to a metadata node? > Finally, is there anything else that needs consideration before attempting > this? > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-- -- Talin -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20121229/0e16de22/attachment.html>
João Matos
2012-Dec-30 02:39 UTC
[LLVMdev] Extending GC infrastructure for roots in SSA values
On Sun, Dec 30, 2012 at 1:54 AM, Talin <viridia at gmail.com> wrote:> I completely agree with your point about wanting to be able to attach GC > metadata to a type (rather than attaching it to a value, as is done now). > In the past, there have been two objections to this approach: first, the > overhead that would be added to the Pointer type - the vast majority of > LLVM users don't want to have to pay an extra 4-8 bytes per Pointer type. > And second, that all of the optimization passes would have to be updated so > as to not do illegal transformations on a GC type. >I have extended LLVM locally to support metadata on types, though right now it is only supported on struct types. It could be a good first step to implement this. If someone is interested, the code is at: https://github.com/tritao/llvm/commit/e8c24e1c10713d358392c984389fcf2791130ca5 -- João Matos -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20121230/6e198fbb/attachment.html>
Sean Silva
2012-Dec-30 03:53 UTC
[LLVMdev] Extending GC infrastructure for roots in SSA values
On Sat, Dec 29, 2012 at 6:54 PM, Talin <viridia at gmail.com> wrote:> However, I'm also aware of the fact that I seem to be the only one who is > interested in this particular case, so I won't strongly object if your > solution doesn't handle it, as long as the existing intrinsics continue to > work.Most high-performance dynamic language implementations use something like this, so it's not that obscure. I'm not sure if LLVM will ever be suitable for that kind of language implementation, but this will at least be necessary to make it happen. -- Sean Silva
David Chisnall
2012-Dec-30 10:17 UTC
[LLVMdev] Extending GC infrastructure for roots in SSA values
On 30 Dec 2012, at 01:54, Talin wrote:> I completely agree with your point about wanting to be able to attach GC metadata to a type (rather than attaching it to a value, as is done now). In the past, there have been two objections to this approach: first, the overhead that would be added to the Pointer type - the vast majority of LLVM users don't want to have to pay an extra 4-8 bytes per Pointer type. And second, that all of the optimization passes would have to be updated so as to not do illegal transformations on a GC type.There are two other alternatives: - Use address spaces to separate garbage-collected from non-garbage-collected pointers. There is (was?) a plan to add an address space cast instruction and explicitly disallow bitcasts of pointers between address spaces. This would mean that you could have one address space for GC roots, one for GC-allocated memory and enforce the casts in your front end. Optimisations would then not be allowed to change the address space of any pointers, so the GC status would be preserved. GC-aware allocations may insert explicit address space casts, where appropriate. - Add a new GC'd pointer type, which is an entirely separate type. This might make sense, as you ideally want GC'd pointers to be treated differently from others (e.g. you may not want pointers to the starts of allocations to be removed) For languages like OCaml, you also want to be able to do escape analysis on GC'd pointers to get good performance (so you don't bother tracing ones that can't possibly escape). This ideally requires a pass that will recursively and automatically apply nocapture attributes to arguments. In functional languages, this ends up being almost all allocations, so you can allocate them either on the stack or on a separate bump-the-pointer allocator and delete them on function return by just resetting the pointer. This means that you would want to be able to have transforms that lowered GC'd pointers to stack or heap pointers. In some implementations, GC'd pointers are fat pointers, so they should not be represented as PointerType in the IR or as iPTR in the back end. David
Benjamin Saunders
2012-Dec-31 00:28 UTC
[LLVMdev] Extending GC infrastructure for roots in SSA values
On Sat, Dec 29, 2012 at 5:54 PM, Talin <viridia at gmail.com> wrote:> First of all, thanks for looking into this! As you've no doubt discovered, > I'm one of the people who has talked a lot about this issue in the past, and > have been frustrated with the lack of progress in this area.Yeah, I spent some time digging through the archives. Frankly, I'm surprised something that would be clearly useful for so many people hasn't had someone else step up before, but I'm happy to be that person.> I completely agree with your point about wanting to be able to attach GC > metadata to a type (rather than attaching it to a value, as is done now). In > the past, there have been two objections to this approach: first, the > overhead that would be added to the Pointer type - the vast majority of LLVM > users don't want to have to pay an extra 4-8 bytes per Pointer type. And > second, that all of the optimization passes would have to be updated so as > to not do illegal transformations on a GC type.Is the added memory cost a realistic concern, bear in mind that types are uniqued? Your comment about optimization passes evokes what I've read in past discussions, and I don't understand it any better now. Can you describe some examples of illegal transformations that would occur if the current optimization passes were left unchanged?> A different approach would be to create a new kind of derived type that > associates metadata with an existing type. This "AnnotatedType" would be > essentially a tuple (type, metadata), and would be constant-folded just like > other types are. Just like the existing GC intrinsics today, there would be > some way for a post-optimization pass to get access to all of the stack > variables an examine the annotations on each to determine how to construct > the appropriate static data structures. > > This approach has both a number of advantages and a number of challenges. > The first advantage is that it means that LLVM users who aren't interested > in GC would pay nothing. A second advantage is that this could also be used > to wrap types that are not pointers. One use case I have is being able to > handle a discriminated union type which sometimes holds a pointer and > sometimes doesn't. The existing intrinsics allow this use case (within the > limitations that you point out) - the way it works is that the entire struct > is considered a "root" and is passed through to the GC plugin, which > generates code to look at the discriminator field and decide whether to > trace it or not. However, I'm also aware of the fact that I seem to be the > only one who is interested in this particular case, so I won't strongly > object if your solution doesn't handle it, as long as the existing > intrinsics continue to work. > > The challenge of this approach is that a lot of backend code will need to > unwrap the annotated type in order to operate upon it, and it would be all > too easy to discard the associated metadata as part of this process.I imagine this could prove useful for even more than GC, as it introduces what one might think of as type-level, as opposed to instruction- or value-level, metadata. Being both simpler and more general, this seems like a much better idea than my proposal, and I'd be happy to run with it if it checks out. In fact, I would have eventually ran into exactly the same problem with discriminated unions that you describe; Idris, like any other modern statically-typed functional language, has algebraic datatypes after all. Getting optimization passes to treat AnnotatedTypes the same as their contained type would be necessary for this to pay off completely; can anyone comment on what, if any, difficulties might be involved there?> On Fri, Dec 28, 2012 at 1:09 PM, Benjamin Saunders <ralith at gmail.com> wrote: >> >> I'm working on an LLVM backend for Idris, a garbage-collected pure >> functional programming language, and have experienced some frustration >> that LLVM's GC support, specifically with regard to mapping roots, >> operates only on allocas. This entails a lot of otherwise unnecessary >> stack allocation (especially in a pure language, where in-place >> mutation is rare) and imposes limitations on what optimizations can be >> applied. Other LLVM users have used elaborate workarounds to this, >> such as Rust's use of address spaces and, I believe, GHC's specialized >> calling convention. I'm interested in extending LLVM to support GC >> roots in regular SSA values, but, that being a significant change, >> it's clear that some discussion is in order before diving in if I want >> to get such a patch merged. >> This topic has been discussed on multiple previous occasions, and in >> each case nothing seems to have come of it, though interest appears to >> be significant. In particular, concerns with how such infrastructure >> could be made to abide by the invariants of arbitrary GC algorithms >> seem to have stayed hands. It's not clear to me why that poses a >> problem--if the property of being a GC root is correctly propagated >> through all manipulations of a pointer, and that information tracked >> through register allocation and made available to the GC metadata >> printer, won't the the resulting system have no more limitations or >> constraints than the current one? A copying collector would, having a >> complete list of root locations, still be able to rewrite them; a >> mark-and-sweep collector would still be able to find everything in >> need of marking; and so on. >> If my understanding above is correct, then perhaps the challenge lies >> in correctly propagating the marking of a pointer in an SSA value as a >> root through transforms. The pattern of propagation desired seems >> identical to that of type information, so perhaps it would be best to >> make the marking of a pointer as a GC root an attribute of its type, >> much the way address spaces already work? Recall again Rust's approach >> here, where the behavior of address space information through >> transforms is exactly what is relied upon. >> It's easy to imagine a GC root flag on pointer types, but one still >> needs to attach metadata to enable tagless GC as supported by the >> existing infrastructure. Rust simply encodes this information into the >> address space number; a similar approach could be envisioned with a >> 'GC type ID' number that could be used by the GC metadata printer to >> look up detailed information in e.g. module-level metadata, but this >> is a bit awkward; it would be nice to have an interface at least as >> convenient as the current intrinsic is. If the metadata is uniqued so >> as not to break type equality and uniquing, would it be viable to have >> the GCd pointer type itself refer to a metadata node? >> Finally, is there anything else that needs consideration before attempting >> this? >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > -- > -- Talin >
Reasonably Related Threads
- [LLVMdev] Extending GC infrastructure for roots in SSA values
- [LLVMdev] Extending GC infrastructure for roots in SSA values
- [LLVMdev] Extending GC infrastructure for roots in SSA values
- [LLVMdev] Extending GC infrastructure for roots in SSA values
- [LLVMdev] Initializing GC roots