Manuel Jacob
2013-Jul-31 02:02 UTC
[LLVMdev] New ideas about how to improve garbage collection support
Hi, I currently write a LLVM backend for RPython, a programming language and toolchain to write dynamic language interpreters. For example, this toolchain powers PyPy, a fast Python interpreter. In the moment the implementation uses a shadowstack to manage garbage collection stack roots. To improve performance, I experimented with LLVM's garbage collection support. It works by marking a stack slot as containing a garbage collection root. This enables easy code generation for frontends which use the mem2reg pass to construct the SSA form. Such frontends just need to emit a @llvm.gcroot() intrinsic referring to the AllocaInst. RPython uses a SSA representation of variables internally. To use LLVM's GC support, it has to reassign SSA values back to stack slots, which is hard. Another issue (not only for RPython) is that using the @llvm.gcroot() intrinsic prevents mem2reg from promoting these stack slots to registers, disabling optimizations. Ending the live time of a stack root is only possible by storing null to the stack slot. There were some discussions on this list about how to to mark SSA values that are stack roots. None one them resulted in implementation of a better GC root support. This was partly because most ideas assumed that a stack root must be a pointer. With the current scheme, a stack root doesn't need to be a pointer. The stack slot referred by a @llvm.gcroot() intrinsic can contain anything. Not only pointers, but also tagged unions and compressed pointers. Other ideas involved changes to the type system. To support GC roots of any type without changing the type system, a new instruction needs to be introduced. It could be used like this: %gcvar = call %gctype* @allocate_type() mark_gcroot %gctype* %gcvar For a proof-of-concept, which only supports pointers as GC roots, an intrinsic is enough: %gcvar = call %gctype* @allocate_type() %tmp = bitcast %gctype* %gcvar to i8* call void @gcroot.mark(i8* %tmp) What part of the garbage collector needs support from the code generator? When a collection cycle is triggered, the garbage collector needs to know what's contained in variables pointing to garbage collected memory. A moving garbage collector might need to change these variables to point to the new memory location. The code generator emits tables that describe the layout of the stack. Using these tables and target-specific code, the garbage collection runtime is able to "walk the stack", reading and possibly writing variables pointing to garbage collected memory. What do these tables contain? To simplify things, I assume a simple, non-concurrent garbage collector. For each call site that could trigger a garbage collection cycle, the tables contain the size of the frame in which the call happens and the relative locations of the GC roots. Where are the stack roots stored? The naive approach is to push all gc roots on the stack before a call and pop them after the call. A more clever approach is to integrate with the register allocator. Registers which are spilled anyway doesn't need to be pushed on the stack by the garbage collection support code. Also, the values don't need to be reloaded from stack until needed to reduce register pressure. I'd call this "forced spilling". Although my C++ coding skills are quite limited, I'd like to try implementing this. My open questions are: * How to pass "this is a GC root" through SelectionDAG to the post-ISel phase? * How to force spilling of a register that lives across a call? * How to encode in IR which calls can trigger a garbage collection cycle? -Manuel