Ok, I *might* be getting this from the assembly code. The assembly code has:

    L_llvm_gc_root_chain$non_lazy_ptr:
        .indirect_symbol _llvm_gc_root_chain
        .long 0

and I see it being used in the function preamble. Is that a ref to an extern symbol or the def? I.e., is it referring to

    StackEntry *llvm_gc_root_chain;

that I must have in my GC C code? (semispace.c has it)

SO! I might be getting this. The shadow stack plugin assumes I have

    struct StackEntry {
        StackEntry *Next;      // Caller's stack entry.
        const FrameMap *Map;   // Pointer to constant FrameMap.
        void *Roots[];         // Stack roots (in-place array).
    };

as my stack item layout and I must provide a shadow stack head. From that, it will push/pop in functions? If so, that's easy enough. :)

What I was/am missing is the explicit link between types and variables in a GC.c file and the generated machine code. If I can get that last explicit link, I'm off to the races.

Anybody?

My IR doesn't seem to have any roots, even though I've allocated an int and declared a ptr on the stack.

    declare void @llvm.gcroot(i8 **, i8*)
    declare void @llvm_gc_collect()
    declare i32* @llvm_gc_allocate(i32)
    declare void @llvm_gc_initialize(i32)

    define void @foo() gc "shadow-stack" {
        ; int *pa = malloc(sizeof(int));
        %a = call i32* @llvm_gc_allocate(i32 4)
        %pa = alloca i32*
        store i32* %a, i32** %pa

        %c = bitcast i32** %pa to i8**
        call void @llvm.gcroot(i8** %c, i8* null)

        ; *pa = 99;
        %t0 = add i32 99,0
        %t1 = load i32** %pa
        ;%t2 = getelementptr i32** %t1, i32 0
        store i32 %t0, i32* %t1

        store i32* null, i32** %pa ; say it's dead
        ret void
    }

    define void @main() {
        call void @llvm_gc_initialize(i32 1024)
        call void @foo()
        call void @llvm_gc_collect()
        ret void
    }

I get llvm_gc_root_chain as null when I try to walk roots.

Ter
On Apr 21, 2008, at 5:09 PM, Terence Parr wrote:

> define void @main() {
>     call void @llvm_gc_initialize(i32 1024)
>     call void @foo()
>     call void @llvm_gc_collect()
>     ret void
> }
>
> I get llvm_gc_root_chain as null when I try to walk roots.

duh...if my collect is AFTER the method terminates, there is no root. I now have a ROOT when I walk! Woohoo!

I haven't done C/C++ in 13 years...woof...finally rediscovered nm to tell me about extern symbols...that chain root is essentially binding to the ptr in my GC.c file. :)

I get some output from my collector now:

    process_root[0x0xbfffed08] = 0x0x800000

Now to figure out what to do with these suckers ;) and make it actually collect!

[wiping sweat from brow]

Ter
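
The behaviour described above (roots visible while a function is active, an empty chain once it has returned) can be sketched in plain C. This is only an illustration under stated assumptions: the two structs follow the layout quoted in this thread, while `push_frame`, `pop_frame`, and `count_live_roots` are hypothetical helpers that imitate by hand what the "shadow-stack" preamble and postamble do. The real plugin emits that code itself and keeps the entry in the function's own stack frame rather than on the heap.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct FrameMap {
    int32_t NumRoots;      /* Number of roots in stack frame. */
    int32_t NumMeta;       /* Number of metadata descriptors. */
    const void *Meta[];    /* May be absent for roots without metadata. */
} FrameMap;

typedef struct StackEntry {
    struct StackEntry *Next;  /* Caller's stack entry. */
    const FrameMap *Map;      /* Pointer to constant FrameMap. */
    void *Roots[];            /* Stack roots (in-place array). */
} StackEntry;

/* The shadow stack head that the generated code links against. */
StackEntry *llvm_gc_root_chain = NULL;

/* Hypothetical stand-in for the function preamble: link a new entry
 * with num_roots null slots onto the head of the chain. */
StackEntry *push_frame(int32_t num_roots) {
    assert(num_roots >= 0);
    FrameMap *map = malloc(sizeof *map);
    StackEntry *e = malloc(sizeof *e + (size_t)num_roots * sizeof(void *));
    assert(map != NULL && e != NULL);
    map->NumRoots = num_roots;
    map->NumMeta = 0;
    e->Map = map;
    for (int32_t i = 0; i < num_roots; i++)
        e->Roots[i] = NULL;
    e->Next = llvm_gc_root_chain;
    llvm_gc_root_chain = e;
    return e;
}

/* Hypothetical stand-in for the function postamble: unlink the entry. */
void pop_frame(void) {
    StackEntry *e = llvm_gc_root_chain;
    assert(e != NULL);
    llvm_gc_root_chain = e->Next;
    free((void *)e->Map);
    free(e);
}

/* Walk every entry on the chain and count the non-null root slots,
 * which is what a collector would do before tracing. */
int count_live_roots(void) {
    int n = 0;
    for (StackEntry *e = llvm_gc_root_chain; e != NULL; e = e->Next)
        for (int32_t i = 0; i < e->Map->NumRoots; i++)
            if (e->Roots[i] != NULL)
                n++;
    return n;
}
```

Calling `count_live_roots()` between `push_frame` and `pop_frame` sees the frame's roots; calling it after `pop_frame` sees none, which matches a collect issued from @main after @foo has returned.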
On Apr 21, 2008, at 20:09, Terence Parr wrote:

> Ok, I *might* be getting this from the assembly code. ... From
> that, it will push/pop in functions? If so, that's easy enough. :)

Yup! Sounds like you've got it.

> What I was/am missing is the explicit link between types and
> variables in a GC.c file and the generated machine code. If I can
> get that last explicit link, I'm off to the races.

You mean, how do you know what sort of object you're tracing? You've got 3 options here…

• If you have a type tree (as in Java or .NET), you can assume that every root starts with a pointer to object metadata, which should naturally include GC tracing information.

• If you have a type forest (as in C or C++) with optional vtables, then no such assumption is possible, and you can include type layout information in the %metadata parameter to @llvm.gcroot. The FrameMap type includes this data.

• You can tag values, as in Lisp or many functional languages (e.g., integer values have the low bit set, pointers do not). All fields in a block must be of a uniform size, and you'll still need to know how many words are in a block.

This decision is completely independent of the decision to use the shadow stack or something more efficient.

— Gordon
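
The third option can be made concrete with a small C sketch. It assumes the convention given in the example above (low bit set for integers, clear for pointers) and word-sized fields; `value_t`, `tag_int`, `untag_int`, `is_int`, and `count_pointers` are hypothetical names chosen for illustration, not part of LLVM or of any particular runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Every field in a block is one machine word. */
typedef uintptr_t value_t;

/* Integers are shifted left by one and get the low bit set. */
static inline value_t tag_int(intptr_t n) {
    return ((uintptr_t)n << 1) | (uintptr_t)1;
}

static inline int is_int(value_t v) {
    return (v & 1u) != 0;
}

/* Note: right-shifting a negative signed value is implementation-defined
 * in C; mainstream compilers use an arithmetic shift. */
static inline intptr_t untag_int(value_t v) {
    assert(is_int(v));
    return (intptr_t)v >> 1;
}

/* Pointers must be at least 2-byte aligned so their low bit is clear. */
static inline value_t tag_ptr(void *p) {
    assert(((uintptr_t)p & 1u) == 0);
    return (uintptr_t)p;
}

/* The collector still needs the block length (nwords) from somewhere;
 * given that, it can tell which words are pointers worth tracing. */
static inline size_t count_pointers(const value_t *block, size_t nwords) {
    size_t n = 0;
    for (size_t i = 0; i < nwords; i++)
        if (!is_int(block[i]) && block[i] != 0)
            n++;
    return n;
}
```

The trade-off is one bit of integer range in exchange for not needing any per-type layout tables.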
On Apr 21, 2008, at 6:23 PM, Gordon Henriksen wrote:

> On Apr 21, 2008, at 20:09, Terence Parr wrote:
>
>> Ok, I *might* be getting this from the assembly code. ... From
>> that, it will push/pop in functions? If so, that's easy enough. :)
>
> Yup! Sounds like you've got it.

Yup, what I was missing, and what somebody should add to the doc, is that "shadow-stack" adds a preamble/postamble snippet to each function that must bind with

    StackEntry *llvm_gc_root_chain;

wherever you choose to define it. I put it into my GC.c file.

Further, that shadow-stack snippet generation assumes the following structures for tracking roots:

    typedef struct FrameMap FrameMap;
    struct FrameMap {
        int32_t NumRoots;   // Number of roots in stack frame.
        int32_t NumMeta;    // Number of metadata descriptors. May be < NumRoots.
        void *Meta[];       // May be absent for roots without metadata.
    };

    typedef struct StackEntry StackEntry;
    struct StackEntry {
        StackEntry *Next;      // Caller's stack entry.
        const FrameMap *Map;   // Pointer to constant FrameMap.
        void *Roots[];         // Stack roots (in-place array).
    };

The doc says the compiler and runtime must agree, but not what the structs are. Seems like those few lines above would make everything clear. I don't have write access to svn, but I plan on a big chapter full of ANTLR -> LLVM examples in my DSL problem solving book.

>> What I was/am missing is the explicit link between types and
>> variables in a GC.c file and the generated machine code. If I can
>> get that last explicit link, I'm off to the races.
>
> You mean, how do you know what sort of object you're tracing?

I assumed that I needed to generate my object maps or at least a list of pointers for each object type. Related to that, I have two important questions:

1. How do I know the offset (due to alignment/padding by LLVM) of a pointer within an object using a {...} struct type? The GEP instruction gets an address, of course, but how does my C collector compute these? Do I need to make a metadata struct and fill it with GEP instructions? I guess that makes sense.

2. How do I know how big a struct is? I have my gc_allocate() method but I have no idea how big the struct will be; I see no sizeof. Alignment makes it unclear how big something is; it's >= the size of elements like i32, but how much bigger than a packed struct is it? I.e.,

    %struct.A = type { i32, [10 x i32]* }

    define void @foo() gc "shadow-stack" {
        %s = alloca %struct.A                      ; this knows how big struct.A is
        %a = call i32* @llvm_gc_allocate(i32 11)   ; this does not know. is it 11 or more?
        ret void
    }

> You've
> got 3 options here…
>
> • If you have a type tree (as in Java or .NET), you can assume that
> every root starts with a pointer to object metadata, which should
> naturally include GC tracing information.

That's what I plan on.

> • If you have a type forest (as in C or C++) with optional vtables,
> then no such assumption is possible, and you can include type layout
> information in the %metadata parameter to @llvm.gcroot. The FrameMap
> type includes this data.

Ok, so I pass it an arbitrary struct pointer and it just gives it back later for me to peruse, right?

> • You can tag values, as in lisp or many functional languages. (e.g.,
> integer values have the low bit set, pointers do not.) All fields in a
> block must be of a uniform size, and you'll still need to know how
> many words in a block.

Good to know.

> This decision is completely agnostic to the decision to use the shadow
> stack, or something more efficient.

Yup. makes sense.

Sorry for the long questions...gotta figure this out.

Ter
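
One way to picture the "object map" idea from questions 1 and 2, combined with the first option (every object starts with a pointer to its metadata), is the C sketch below. It is a sketch under assumptions, not an answer from the thread: `TypeInfo`, `A_info`, `object_size`, and `pointer_slot` are hypothetical names, and the C struct `A` is assumed to be laid out the same way LLVM lays out the matching struct type for the same target. On the IR side, a commonly used idiom is to take a getelementptr off a null pointer (to element 1 for the size, or to a field for its offset) and convert the result with ptrtoint, so the compiler can emit the same numbers as constants rather than relying on a C mirror.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-type descriptor the compiler would emit once per type. */
typedef struct TypeInfo {
    size_t size;                /* Allocation size in bytes, padding included. */
    size_t num_ptrs;            /* How many fields hold GC pointers. */
    const size_t *ptr_offsets;  /* Byte offset of each pointer field. */
} TypeInfo;

/* C mirror of a struct like { i32, [10 x i32]* }, with a header word
 * added in front so the collector can find the descriptor. */
typedef struct A {
    const TypeInfo *type;  /* Header: pointer to object metadata. */
    int32_t x;
    int32_t (*arr)[10];    /* The one pointer field to trace. */
} A;

/* offsetof and sizeof account for alignment padding automatically. */
static const size_t A_ptr_offsets[] = { offsetof(A, arr) };
static const TypeInfo A_info = { sizeof(A), 1, A_ptr_offsets };

/* Size of any object, read through its header. */
size_t object_size(const void *obj) {
    const TypeInfo *t = *(const TypeInfo *const *)obj;
    return t->size;
}

/* Address of the i-th pointer field inside an object. */
void *pointer_slot(void *obj, size_t i) {
    const TypeInfo *t = *(const TypeInfo *const *)obj;
    assert(i < t->num_ptrs);
    return (char *)obj + t->ptr_offsets[i];
}
```

With a table like this, the allocator would be handed `A_info.size` instead of a hand-counted number, and the collector would only ever touch the offsets listed in `ptr_offsets`.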