I'm still (slowly) working on the project of creating a concurrent garbage collector that works with LLVM. I want to ask a little bit more about object tags and write barriers and so on. Let's start with the assumption that a particular language does not use per-object type tags. The code generator knows the types of all objects, however, and can pass along that type information to llvm.gcroot. And since the type of each pointer field is also known, the type information can be determined for each traced object, as long as there is no polymorphism. However, what I don't understand is how this works in the presence of write barriers. The llvm_gc_write intrinsic takes no type information, so there's no way to determine the type of the object except with a tag. So I'm not sure what exactly the write barrier can accomplish. Note that in a real language, most objects would have type tags, but there would be a significant number of objects that did not. For example, a 'string' class might have a fixed-length header with a type tag, which in turn points to a flat array of characters with no tag. The character array would not be visible to the outside world, but it would still need to be visible to the collector. -- Talin
On 2007-09-15, at 18:01, Talin wrote:> I'm still (slowly) working on the project of creating a concurrent > garbage collector that works with LLVM. I want to ask a little bit > more about object tags and write barriers and so on. > > Let's start with the assumption that a particular language does not > use per-object type tags. The code generator knows the types of all > objects, however, and can pass along that type information to > llvm.gcroot. And since the type of each pointer field is also > known, the type information can be determined for each traced > object, as long as there is no polymorphism. > > However, what I don't understand is how this works in the presence > of write barriers. The llvm_gc_write intrinsic takes no type > information, so there's no way to determine the type of the object > except with a tag. So I'm not sure what exactly the write barrier > can accomplish. > > Note that in a real language, most objects would have type tags, > but there would be a significant number of objects that did not. > For example, a 'string' class might have a fixed-length header with > a type tag, which in turn points to a flat array of characters with > no tag. The character array would not be visible to the outside > world, but it would still need to be visible to the collector.Talin, Can you be more specific the algorithm for which you need type metadata in a write barrier? No algorithms I am aware of perform any tracing from a write barrier. Write barriers are commonly used to record references from old- generation objects to new-generation ones, either by recording the referencing object, the referencing field, or using a card table. For these purposes, the addresses are sufficient. In concurrent collectors—by which I mean those that run asynchronously of the mutator—write barriers may be used to mark the written object pointer. Even byte arrays as you describe require an object header in which to store mark bits, else the collector cannot know whether the object has been marked (mark-sweep collectors) or copied (copying collectors). — Gordon
Gordon Henriksen wrote:> Can you be more specific the algorithm for which you need type > metadata in a write barrier? No algorithms I am aware of perform any > tracing from a write barrier. >This one does: http://citeseer.ist.psu.edu/cache/papers/cs2/442/http:zSzzSzwww.cs.technion.ac.ilzSz~erezzSzPaperszSzms-sliding-views.pdf/an-on-the-fly.pdf> Write barriers are commonly used to record references from old- > generation objects to new-generation ones, either by recording the > referencing object, the referencing field, or using a card table. For > these purposes, the addresses are sufficient. >In the paper cited above, the write barrier checks to see if the object being mutated has been traced; If it hasn't, then it records the values of all pointers contained in the object into a buffer - for which you need the location of the pointers.> In concurrent collectors—by which I mean those that run > asynchronously of the mutator—write barriers may be used to mark the > written object pointer. Even byte arrays as you describe require an > object header in which to store mark bits, else the collector cannot > know whether the object has been marked (mark-sweep collectors) or > copied (copying collectors). >In my design, the mark bits are maintained by the allocator as part of the allocation header of each memory block. In other words, I'm trying to maintain a strict separation between the allocator/collector and the actual objects stored in the heap. The allocator/collector "owns" all of the bits that are before the start address of the object, while the language-specific type system owns all of the bits after the start address. As a result the collector isn't tied to a specific language. -- Talin