On Mon, Mar 7, 2011 at 4:08 AM, nicolas geoffray <nicolas.geoffray at gmail.com> wrote:

> Hi Talin,
>
> On Sat, Mar 5, 2011 at 6:42 PM, Talin <viridia at gmail.com> wrote:
>>
>> So I've been thinking about your proposal, that of using a special address
>> space to indicate garbage collection roots instead of intrinsics.
>
> Great!
>
>> To address this, we need a better way of telling LLVM that a given
>> variable is no longer a root.
>
> Live variable analysis is already in LLVM and for me that's enough to know
> whether a given variable is no longer a root. Note that each safe point has
> its own set of root locations, and these locations all contain live
> variables. Dead variables may still be in register or stack, but the GC will
> not visit them.
>
>> 2) As I mentioned, my language supports tagged unions and other "value"
>> types. Another example is a tuple type, such as (String, String). Such types
>> are never allocated on the heap by themselves, because they don't have the
>> object header structure that holds the type information needed by the
>> garbage collector. Instead, these values can live in SSA variables, or in
>> allocas, or they can be embedded inside larger types which do live on the
>> heap.
>
> If you know, at compile-time, whether you are dealing with a struct or a
> heap, what prevents you from emitting code that won't need such tagged
> unions in the IR. Same for structs: if they contain pointers to heap
> objects, those will be in that special address space.

I'm not sure what you mean by this.

Take for example a union of a String (which is a pointer) and a float. The
union is either { i1; String * } or { i1; float }. The garbage collector
needs to see that i1 in order to know whether the second field of the struct
is a pointer - if it attempted to dereference the pointer when the field
actually contains a float, the program would crash. The metadata argument
that I pass to llvm.gcroot informs the garbage collector about the structure
of the union.

>> 3) I've been following the discussions on llvm-dev about the use of the
>> address-space property of pointers to signal different kinds of memory pools
>> for things like shared address spaces. If we try to use that same variable
>> to indicate garbage collection, now we have to multiplex both meanings onto
>> the same field. We can't just dedicate one special ID for the garbage
>> collected heap, because there could be multiple such heaps. As you add
>> additional orthogonal meanings to the address-space field, you end up with a
>> combinatorial explosion of possible values for it.
>
> I think there exist already some convention between an ID and some codegen.
> Having one additional seems fine to me, even if you need to play with bits
> in case you need different IDs for a single pointer.
>
> I'm also fine with the intrinsic way of declaring a GC root. But I think it
> is cumbersome, and error-prone in the presence of optimizers that may try to
> move away that intrinsic (I remember similar issues with the current EH
> intrinsics).
>
> Nicolas

--
-- Talin
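To make the tagged-union point concrete, here is a minimal sketch in C of a
trace routine that checks the discriminator before touching the payload. It
is an illustration only, not Talin's collector: the names StringOrFloat,
VisitFn, trace_string_or_float and count_slot are hypothetical, and a C
union stands in for the two LLVM struct layouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical heap object; stands in for the String type in the thread. */
typedef struct String { size_t length; } String;

/* Mirrors { i1; String * } and { i1; float }: a tag followed by a payload. */
typedef struct StringOrFloat {
    bool is_string;            /* plays the role of the i1 discriminator */
    union {
        String *str;           /* meaningful only when is_string is true  */
        float   num;           /* meaningful only when is_string is false */
    } payload;
} StringOrFloat;

/* Hypothetical callback the collector would supply for each pointer slot. */
typedef void (*VisitFn)(String **slot, void *ctx);

/* Trace one union root; returns the number of pointer slots visited. */
static int trace_string_or_float(StringOrFloat *root, VisitFn visit, void *ctx) {
    assert(root != NULL);
    if (!root->is_string) {
        return 0;              /* payload is a float: never treat it as a pointer */
    }
    visit(&root->payload.str, ctx);
    return 1;
}

/* Example visitor that only counts the slots it is shown. */
static void count_slot(String **slot, void *ctx) {
    (void)slot;
    ++*(int *)ctx;
}
```

The routine receives the whole struct rather than the pointer field alone,
which is why it can consult the tag first.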
On Mon, Mar 7, 2011 at 9:35 AM, Talin <viridia at gmail.com> wrote:

> Take for example a union of a String (which is a pointer) and a float. The
> union is either { i1; String * } or { i1; float }. The garbage collector
> needs to see that i1 in order to know whether the second field of the struct
> is a pointer - if it attempted to dereference the pointer when the field
> actually contains a float, the program would crash. The metadata argument
> that I pass to llvm.gcroot informs the garbage collector about the structure
> of the union.

Sorry, I left a part out. The way that my garbage collector works currently
is that the collector gets a pointer to the entire union struct, not just the
pointer field within the union. In other words, the entire union struct is
considered a "root".

In fact, there might not even be a pointer in the struct. You see, because
LLVM doesn't directly support unions, I have to simulate that support by
casting pointers. That is, for each different type contained in the union, I
have a different struct type, and when I want to extract data from the union
I cast the pointer to the appropriate type and then use GEP to get the data
out. However, when allocating storage for the union, I have to use the
largest data type, which might not be a pointer.

For example, suppose I have a type "String or (float, float, float)" - that
is, a union of a string and a 3-tuple of floats. Most of the time what LLVM
will see is { i1; { float; float; float; } } because that's bigger than
{ i1; String* }. LLVM won't even know there's a pointer in there, except
during those brief times when I'm accessing the pointer field. So tagging the
pointer in a different address space won't help at all here.

--
-- Talin
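The cast-based union described above can be sketched in C roughly as follows.
This is an assumption-laden illustration, not the actual frontend output: the
struct names AsString and AsFloats and all helper functions are hypothetical,
and malloc'ed storage is used so that viewing the same bytes through either
struct type is well defined in C. Which variant is larger depends on the
target's pointer size, so the sketch simply takes the maximum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical heap object; stands in for the String type in the thread. */
typedef struct String { size_t length; } String;

/* One struct type per variant, both starting with the tag. */
typedef struct AsString { bool tag; String *str; } AsString;  /* { i1; String* } */
typedef struct AsFloats { bool tag; float v[3]; } AsFloats;   /* { i1; { float; float; float } } */

/* Storage is sized for the largest variant, which may hold no pointer at all. */
static size_t union_storage_size(void) {
    return sizeof(AsString) > sizeof(AsFloats) ? sizeof(AsString)
                                               : sizeof(AsFloats);
}

/* Allocate untyped storage big enough for any variant. */
static void *union_alloc(void) {
    void *p = malloc(union_storage_size());
    assert(p != NULL);
    return p;
}

/* Store the string variant: cast to the matching struct, then write fields. */
static void store_string(void *u, String *s) {
    AsString *view = (AsString *)u;
    view->tag = true;
    view->str = s;
}

/* Store the float-tuple variant through the other struct type. */
static void store_floats(void *u, float x, float y, float z) {
    AsFloats *view = (AsFloats *)u;
    view->tag = false;
    view->v[0] = x;
    view->v[1] = y;
    view->v[2] = z;
}

/* The tag sits at offset 0 in every variant, so it can be read directly. */
static bool holds_string(const void *u) {
    return *(const bool *)u;
}

/* Only cast to the pointer-bearing view after checking the tag. */
static String *load_string(void *u) {
    assert(holds_string(u));
    return ((AsString *)u)->str;
}
```

Nothing in the type of the allocation itself says that it may hold a
pointer; the pointer type appears only inside store_string and load_string,
which mirrors the "brief times" mentioned in the message.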
Hi Talin,

Sorry to interject -

> For example, suppose I have a type "String or (float, float, float)" - that
> is, a union of a string and a 3-tuple of floats. Most of the time what LLVM
> will see is { i1; { float; float; float; } } because that's bigger than {
> i1; String* }. LLVM won't even know there's a pointer in there, except
> during those brief times when I'm accessing the pointer field. So tagging
> the pointer in a different address space won't help at all here.

I think this is a fairly uncommon use case that will be tricky to deal with
no matter what method is used to track GC roots. That said, why not do
something like make the pointer representation (the {i1, String*}) the
long-term storage format, and only bitcast *just* before loading the floats?
You could even use another address space to indicate that something is
*sometimes* a pointer, dependent upon some other value (the i1, perhaps
indicated with metadata).

My vote (not that it really counts for much) would be the address-space
method. It seems much more elegant.

The only thing that I think would be unusually difficult for the
address-space method to handle would be alternative pointer representations,
such as those used in the latest version of Hotspot (see
http://wikis.sun.com/display/HotSpotInternals/CompressedOops). Essentially, a
64-bit pointer is packed into 32-bits by assuming 8-byte alignment and
restricting the heap size to 32GB. I've seen similar object-reference
bitfields used in game engines. In this case, there is no "pointer" to attach
the address space to.

(Yes, I know that Hotspot currently uses CompressedOops ONLY in the heap,
decompressing them when stored in locals, but it is not inconceivable to
avoid decompressing them if the code is just moving them around, as an
optimization.)

Just my few thoughts.

-Joshua
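The arithmetic behind the packed-reference scheme can be sketched in C as
below. This is not HotSpot's implementation: compress_ref, decompress_ref and
the heap_base parameter are hypothetical names, null handling is left out,
and addresses are modelled as plain 64-bit integers. With 8-byte alignment
the low three bits of every offset are zero, so a 32-bit value can cover
2^32 * 8 bytes = 32 GB.

```c
#include <assert.h>
#include <stdint.h>

/* 8-byte alignment leaves the three low bits of every offset at zero. */
enum { OOP_SHIFT = 3 };

/* Pack a 64-bit address into 32 bits relative to a (hypothetical) heap base. */
static uint32_t compress_ref(uint64_t heap_base, uint64_t addr) {
    assert(addr >= heap_base);
    uint64_t offset = addr - heap_base;
    assert((offset & 7u) == 0);                    /* must be 8-byte aligned   */
    assert((offset >> OOP_SHIFT) <= UINT32_MAX);   /* must lie within 32 GB    */
    return (uint32_t)(offset >> OOP_SHIFT);
}

/* Recover the full 64-bit address from the packed 32-bit form. */
static uint64_t decompress_ref(uint64_t heap_base, uint32_t ref) {
    return heap_base + ((uint64_t)ref << OOP_SHIFT);
}
```

The packed value is an ordinary 32-bit integer as far as the type system is
concerned, which is the difficulty raised above: there is no pointer type on
which to hang an address space.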