I'm re-re-reading "Accurate Garbage Collection with LLVM", and I'm realizing that there are some parts of this document I find confusing.

1) I think that the term 'stack map' should be defined more precisely. For example, in one place it says "LLVM automatically computes a stack map", and elsewhere it says "The compiler plugin is responsible for generating code which conforms to the binary interface defined by library, most essentially the stack map". At first glance, this seems contradictory: who is generating the stack map, LLVM or the compiler plugin? The problem is that the words "stack map" are being used to refer to two different things, one which is computed by LLVM, the other generated by the compiler plugin. Also, a definition of what a stack map is, and what it contains, would be helpful.

2) It says LLVM does not address "discovery or registration of stack frame descriptors." I'm not sure what is meant by a "stack frame descriptor". Is this the same as a stack map?

3) The shadow-stack collector is presented as "an easy way to get started". However, it doesn't give many hints as to what the next logical evolutionary step would be. What method would you normally use for crawling the various stack frames if you didn't want to pay the performance penalty of maintaining the shadow stack? Something like libunwind perhaps?

4) A more general question: in the barrier intrinsics, is there any constraint on the values that the object pointer and the derived pointer can take? In particular, I am thinking about the case where you have an object, such as an ArrayList, in which there are two separate non-contiguous allocations: a fixed-length "header" part and a variable-length "tail" part. Assume that the header part uses type tags, but the tail part is just a raw data buffer. From a GC perspective, it may be simplest to treat these as a single object, such that only the head part gets added to the work queue, and both are traced by the same tracer function. This implies, however, that when calling gcwrite(), the 'derived' pointer might be in a different memory block than the 'object' pointer, and may even be located at a negative offset from it.

More questions to follow as I come up with them...

-- Talin
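For readers following point 4, here is a minimal C sketch of the header-plus-tail layout and a write barrier that takes its arguments in the same order as the llvm.gcwrite intrinsic (value, object, derived). Everything in it is hypothetical and for illustration only: the ListHeader type, the remembered flag (standing in for whatever a real collector would record, such as a card mark or remembered-set entry), and the helper names are assumptions, not part of LLVM or of any collector mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layout: a tagged header and an untagged tail buffer,
   allocated separately but treated by the collector as one object. */
typedef struct ListHeader {
    const void *type_tag;  /* would select the tracer function */
    size_t length;         /* number of slots in the tail */
    void **tail;           /* separate allocation holding the elements */
    int remembered;        /* stand-in for a card mark / remembered set */
} ListHeader;

/* Argument order mirrors llvm.gcwrite: value, object, derived.
   The barrier records the write against 'object' and makes no
   assumption that 'derived' lies inside the same memory block. */
static void write_barrier(void *value, ListHeader *object, void **derived) {
    object->remembered = 1;
    *derived = value;
}

/* Allocate the header and the tail as two independent blocks. */
static ListHeader *list_new(size_t length) {
    ListHeader *h = calloc(1, sizeof *h);
    if (h == NULL) return NULL;
    h->tail = calloc(length, sizeof *h->tail);
    if (h->tail == NULL) { free(h); return NULL; }
    h->length = length;
    return h;
}

/* Store through the barrier: 'object' is the header, 'derived' is a
   slot in the tail, which lives in a different allocation. */
static void list_set(ListHeader *h, size_t i, void *value) {
    assert(i < h->length);
    write_barrier(value, h, &h->tail[i]);
}

static void list_free(ListHeader *h) {
    free(h->tail);
    free(h);
}
```

Because the barrier only ever uses the object pointer to find the metadata and the derived pointer to perform the store, it works regardless of where the tail sits relative to the header. A barrier that instead computed an offset from object to derived would need the two to share a block, which is the constraint being asked about.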
Gordon Henriksen
2009-Mar-06 02:05 UTC
[LLVMdev] Suggestions for improvements to the GC docs.
Hi Talin,

Thanks for the feedback. My comments inline.

On 2009-03-05, at 16:01, Talin wrote:

> I'm re-re-reading the "Accurate Garbage Collection with LLVM", and
> I'm realizing that there are some parts of this document I find
> confusing.
>
> 1) I think that the term 'stack map' should be defined more
> precisely. For example, in one place it says "LLVM automatically
> computes a stack map", and elsewhere it says "The compiler plugin is
> responsible for generating code which conforms to the binary
> interface defined by library, most essentially the stack map". At
> first glance, this seems contradictory - who is generating the stack
> map, LLVM, or the compiler plugin? The problem is that the words
> "stack map" are being used to refer to two different things, one
> which is computed by LLVM, the other generated by the compiler plugin.

There is a conflation of terms there, yes. I've tried to clarify, but they are merely different representations of the same information.

> Also, a definition of what a stack map is, and what it contains,
> would be helpful.

I've added more detail to the 'Computing stack maps' section.

> 2) It says LLVM does not address "discovery or registration of stack
> frame descriptors." I'm not sure what is meant by a "stack frame
> descriptor". Is this the same as a stack map?

Yes, it is. I've expunged the term "descriptor" from the document for consistency.

> 3) The shadow-stack collector is presented as "an easy way to get
> started". However, it doesn't give many hints as to what the next
> logical evolutionary step would be - what method would you normally
> use for crawling the various stack frames if you didn't want to pay
> the performance penalty of maintaining the shadow stack? Something
> like libunwind perhaps?

I've added some links to the plugin section at the conclusion here; hopefully that scratches the itch. I don't think LLVM should attempt to prescribe a GC growth path, nor should this document attempt to be a substitute for domain knowledge.

> 4) A more general question: In the barrier intrinsics, is there any
> constraint on the values that the object pointer and the derived
> pointer can take?

There is no constraint on the relationship between the object and derived pointer parameters except as the plugin requires. I've made this explicit. There is currently no benefit to using gcread/gcwrite vs. coding in the barrier up front.

> In particular, I am thinking about the case where you have an
> object, such as an ArrayList, in which there are two separate
> non-contiguous allocations: A fixed-length "header" part, and a
> variable-length "tail" part. Assume that the header part uses type
> tags, but the tail part is just a raw data buffer. From a GC
> perspective, it may be simplest to treat these as a single object,
> such that only the head part gets added to the work queue, and both
> are traced by the same tracer function. This implies, however, that
> when calling gcwrite(), the 'derived' pointer might be in a different
> memory block than the 'object' pointer, and may even be located at a
> negative offset from it.

Java and .NET treat the list header as an object which contains a reference to a (fixed-length) array object, i.e., two separate objects. On the principle of least surprise, I would recommend sticking to this model if you're creating building blocks.

-- Gordon
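To make the stack map and shadow stack discussion above more concrete, the following is a minimal C sketch of the general idea: each active frame pushes an entry that pairs a per-function map (here reduced to a root count) with the frame's root slots, and the collector finds roots by walking the chain. All names (FrameMap, StackEntry, root_chain, push_frame, pop_frame, visit_roots, count_live) are hypothetical and chosen for illustration; this is not the layout emitted by LLVM's shadow-stack support, and a real stack map may carry additional per-root metadata.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-function stack map: how many GC roots a frame holds. */
typedef struct FrameMap {
    size_t num_roots;
} FrameMap;

/* One shadow-stack entry per active frame, linked to the caller's entry. */
typedef struct StackEntry {
    struct StackEntry *next;
    const FrameMap *map;
    void **roots;          /* the frame's root slots */
} StackEntry;

/* Head of the chain; the innermost active frame comes first. */
static StackEntry *root_chain = NULL;

/* Function prologue: register this frame's roots. */
static void push_frame(StackEntry *e, const FrameMap *map, void **roots) {
    e->map = map;
    e->roots = roots;
    e->next = root_chain;
    root_chain = e;
}

/* Function epilogue: unregister the innermost frame. */
static void pop_frame(void) {
    assert(root_chain != NULL);
    root_chain = root_chain->next;
}

typedef void (*RootVisitor)(void **root, void *ctx);

/* Collector side: visit every root slot of every active frame. */
static void visit_roots(RootVisitor visit, void *ctx) {
    for (StackEntry *e = root_chain; e != NULL; e = e->next)
        for (size_t i = 0; i < e->map->num_roots; i++)
            visit(&e->roots[i], ctx);
}

/* Example visitor: count the non-null roots. */
static void count_live(void **root, void *ctx) {
    if (*root != NULL)
        ++*(size_t *)ctx;
}
```

The push and pop on every call are the performance penalty raised in point 3. A scheme based on static stack maps keeps the same per-frame information but stores it in tables looked up by return address during a stack walk, so no work is done on the call path.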
On Thursday 05 March 2009 21:01:43 Talin wrote:

> ...the 'derived' pointer...

The docs and LLVM GC API currently assume that GC references are just a single pointer. That is overly restrictive and, indeed, cannot even express the GC that I just wrote. I would like to see that generalized.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
Thanks for the clarifications, they were helpful.

Gordon Henriksen wrote:

> Hi Talin,
>
> Thanks for the feedback. My comments inline.