I'm curious about the type erasure that goes on when llvm-g++ compiles C++ code. Is this a consequence of it just being the easiest way to do things, given the design of gcc and how LLVM is plugged into it?

Can someone more familiar with the llvm-gcc infrastructure comment on the difficulty of generating more strongly typed virtual function tables, rather than having them all be variable-length arrays of pointers of unknown type that are cast to the "known" type before the call? I understand that there's some issue with structural type equivalence that can merge identical-looking table types, but I feel like stronger types would help me with alias analysis to determine possible targets for indirect calls/invokes, or for debug-related purposes.

What I'd really like would be a vcall/vinvoke instruction, but I get the LL part of LLVM.

I guess my main complaint is that I'd like to use the LLVM infrastructure to do some higher-level (semantics-wise) manipulation of code without waiting for clang to handle C++.

Any other suggestions would be appreciated.

Thanks,
Luke
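To make the question concrete, here is a minimal hand-written sketch of the two table shapes being contrasted: an array of untyped slots that are cast at each call, and a struct with one typed field per slot. It is not actual llvm-g++ output; `Shape`, `TypedVTable`, `ErasedSlot` and the helper functions are hypothetical names chosen for illustration.

```cpp
#include <cassert>

struct Shape;
struct TypedVTable;

// Signatures of the two "virtual" functions in this hypothetical class.
using AreaFn = int (*)(const Shape*);
using ScaleFn = int (*)(const Shape*, int);

// Erased form: every slot has the same generic function-pointer type.
using ErasedSlot = void (*)();

struct Shape {
    const ErasedSlot* erased_vtable;  // array of pointers of "unknown" type
    const TypedVTable* typed_vtable;  // struct with one typed field per slot
    int side;
};

// Typed form: the table's type records each slot's signature.
struct TypedVTable {
    AreaFn area;
    ScaleFn scale;
};

int square_area(const Shape* s) { return s->side * s->side; }
int square_scale(const Shape* s, int k) { return s->side * k; }

const ErasedSlot kErased[] = {
    reinterpret_cast<ErasedSlot>(&square_area),
    reinterpret_cast<ErasedSlot>(&square_scale),
};
const TypedVTable kTyped = {&square_area, &square_scale};

Shape make_square(int side) {
    Shape s = {kErased, &kTyped, side};
    return s;
}

// Erased dispatch: index the array, cast to the "known" type, then call.
int call_area_erased(const Shape* s) {
    assert(s != nullptr);
    return reinterpret_cast<AreaFn>(s->erased_vtable[0])(s);
}

int call_scale_erased(const Shape* s, int k) {
    assert(s != nullptr);
    return reinterpret_cast<ScaleFn>(s->erased_vtable[1])(s, k);
}

// Typed dispatch: the field access already carries the callee's type.
int call_area_typed(const Shape* s) {
    assert(s != nullptr);
    return s->typed_vtable->area(s);
}
```

Both dispatch paths reach the same function; the difference is only whether the slot's signature is visible in the table's type or has to be recovered from the cast at the call site.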
On Mar 24, 2009, at 8:40 AM, Luke Dalessandro wrote:

> I'm curious about the type erasure that goes on when llvm-g++ compiles
> C++ code. Is this a consequence of it just being the easiest way to do
> things based on the design of gcc and how LLVM is plugged into it?

This is just due to how the G++ front-end happens to lower the C++ types to C types internally.

> Can someone more familiar with the llvm-gcc infrastructure comment on
> the difficulty of generating more strongly typed virtual function
> tables rather than just having them all be variable length arrays of
> pointers of unknown type and casting to the "known" type before the
> call?

This would require changing the G++ front-end. I have no idea how difficult that would be.

-Chris
On Mar 24, 2009, at 8:40 AM, Luke Dalessandro wrote:

> Can someone more familiar with the llvm-gcc infrastructure comment on
> the difficulty of generating more strongly typed virtual function
> tables rather than just having them all be variable length arrays of
> pointers of unknown type and casting to the "known" type before the
> call?

The easiest way would be to handle this in the gcc/llvm interface layer. The type of each slot can easily be figured out, and the type of the vtable can be built up as a structure instead of an array. I'd guess it shouldn't be more than 100 lines.

Harder to do would be to transform the virtual dispatch code. It is exposed as just raw add/subtract, fetch and then indirect call. Seems like part of the solution may be to propagate the ALIAS_SET information from the type system down for llvm to reason with. It should be more complete and accurate than the information llvm has, though maybe only marginally so.

The saving grace would be that the code is heavily stylized and you're getting it before the optimizer swizzles it on you. Since all the math is usually with constants, you just need to recognize the style, the type during the call and the type at the other end, where the pointer arithmetic starts, and then transform it back into the usual llvm accessors for structures. Annoying to do, but probably under 200 lines.

> Any other suggestions would be appreciated.

Sure, just add code to propagate the types around, add code to handle constant arithmetic on these things and to handle normal virtual dispatches; after that, add support for pointer to member functions and you're done.

You should be able to figure out that these things don't escape much; that for a given constant (index), the same shape (type) is always used; that for a given variable (pointer to member function), the same shape (type) is used; and that all assignments of this variable come from things of the same shape, or come from literals that have the same shape.
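The "recognize the style" step described above could be sketched roughly as follows: given the constant that the lowered dispatch adds to the vtable pointer, map it back to a slot index and the slot's type. This is a sketch under simplifying assumptions (pointer-sized slots, offsets measured from the first function slot), not llvm-gcc code; `VTableLayout`, `RawDispatch`, `TypedDispatch` and `recover_typed_dispatch` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one class's vtable as a struct-like layout:
// element i holds the signature of slot i, written here as plain text.
struct VTableLayout {
    std::string class_name;
    std::vector<std::string> slot_signatures;
};

// The stylized lowered dispatch: load vptr, add a constant, fetch, call.
// Only the constant matters for this sketch.
struct RawDispatch {
    std::size_t byte_offset;
};

// The same dispatch re-expressed as an access to a typed struct field.
struct TypedDispatch {
    std::size_t slot_index;
    std::string signature;
};

// Returns true and fills *out if the constant offset names a whole slot
// inside the layout; returns false (leaving *out untouched) otherwise.
bool recover_typed_dispatch(const VTableLayout& layout,
                            const RawDispatch& raw,
                            std::size_t pointer_size,
                            TypedDispatch* out) {
    assert(out != nullptr);
    if (pointer_size == 0 || raw.byte_offset % pointer_size != 0) {
        return false;  // not aligned to a slot boundary
    }
    const std::size_t index = raw.byte_offset / pointer_size;
    if (index >= layout.slot_signatures.size()) {
        return false;  // offset points past the end of the table
    }
    out->slot_index = index;
    out->signature = layout.slot_signatures[index];
    return true;
}
```

A real implementation would also have to account for whatever precedes the function slots in the table and for non-constant offsets, which this sketch deliberately rejects or ignores.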
Mike Stump wrote:

> On Mar 24, 2009, at 8:40 AM, Luke Dalessandro wrote:
>> Can someone more familiar with the llvm-gcc infrastructure comment on
>> the difficulty of generating more strongly typed virtual function
>> tables rather than just having them all be variable length arrays of
>> pointers of unknown type and casting to the "known" type before the
>> call?
>
> The easiest way would be to handle this in the gcc/llvm interface
> layer. The type of each slot can easily be figured out and the type
> of the vtable can be built up as a structure instead of an array. I'd
> guess it shouldn't be more than 100 lines. Harder to do would be to
> transform the virtual dispatch code. It is exposed as just raw
> add/subtract, fetch and then indirect call. Seems like part of the
> solution may be to propagate the ALIAS_SET information from the type
> system down for llvm to reason with. It should be more complete and
> accurate than the information llvm has, though, maybe only marginally
> so. The saving grace would be the code is heavily stylized and you're
> getting it before the optimizer swizzles it on you. Since all the
> math is with constants usually, you just need to recognize the style
> and the type during the call and the type at the other end, where the
> pointer arithmetic starts, and then transform it back into the usual
> llvm accessors for structures. Annoying to do, but, probably under 200
> lines.

OK, so it's mainly a problem of becoming comfortable with the llvm-gcc internals that are affected, and not a fundamental whole-compiler design problem. That sounds like a multi-month rather than multi-year thing for me. Thanks.

>> Any other suggestions would be appreciated.
>
> Sure, just add code to propagate the types around, add code to handle
> constant arithmetic on these things and to handle normal virtual
> dispatches, after that, add support for pointer to member functions
> and you're done. You should be able to figure out that these things
> don't escape much, that for a given constant (index), the same shape
> (type) is always used, that for a given variable (pointer to member
> function), that the same shape (type) is used and all assignments of
> this variable come from things of the same shape or that they come
> from literals that have the same shape.

So I can essentially rematerialize the vtable types by pushing things back through from the indirect calls in the program. Wouldn't existing alias analysis _do_ the same thing in a less specific manner? I guess that alias analysis doesn't always "trust" casts, whereas if I pushed the types back manually I would be assuming that the casts are correct?

Luke

> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu llvm.cs.uiuc.edu
> lists.cs.uiuc.edu/mailman/listinfo/llvmdev
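The idea of rematerializing the vtable types from the indirect calls can be sketched as a small consistency check: collect the type each call site casts a slot to, and accept a slot's type only if every use of that index agrees. This is an illustrative sketch that trusts the casts, as the reply above notes; `CallSite` and `infer_slot_types` are hypothetical names, and signatures are modeled as plain strings.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One observed virtual call: the constant slot index that was loaded and
// the function type the loaded pointer was cast to before the call.
struct CallSite {
    std::size_t slot_index;
    std::string cast_signature;
};

// Pushes call-site types back onto the table. Returns true if each index
// is always used with a single shape; *slots then maps index -> signature.
// Returns false at the first conflicting use, in which case *slots holds
// only the entries gathered before the conflict.
bool infer_slot_types(const std::vector<CallSite>& sites,
                      std::map<std::size_t, std::string>* slots) {
    assert(slots != nullptr);
    slots->clear();
    for (std::size_t i = 0; i < sites.size(); ++i) {
        const CallSite& site = sites[i];
        std::pair<std::map<std::size_t, std::string>::iterator, bool> result =
            slots->insert(std::make_pair(site.slot_index, site.cast_signature));
        // insert() leaves an existing entry alone; compare against it.
        if (!result.second && result.first->second != site.cast_signature) {
            return false;  // same index used with two different shapes
        }
    }
    return true;
}
```

The check says nothing about slots that are never called, so a table rebuilt this way may have gaps that would still need a generic pointer type.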