On Oct 1, 2012, at 9:58 PM, Eli Friedman <eli.friedman at gmail.com> wrote:

> Using GEP on an i8* is a bit nicer to the optimizer, though, because
> using ptrtoint/inttoptr has effects on alias analysis.

My understanding is that, in order to use GEP, you have to provide the LLVM code with the struct layout, i.e., build a StructType object. In my case, that struct is declared in C++ code already and, in order to use GEP, I'd have to replicate the struct layout (exactly as the C++ compiler would) in LLVM code -- something that I'd rather not do, not to mention that it's fairly "brittle" even if I could manage to get it right. (Simple structs would probably be easy, but structs that have virtual functions or multiple base classes would be much harder.)

> I'm not entirely sure how you're using mbr_offset_of

Given 't', an instance of some class T, and some member T::m, find the integer offset in bytes from &t to &t.m. This offset, when added to &t, should be &t.m.

I'm using mbr_offset_of to get the C++ compiler to do the work of telling me what the correct offset is for the already existing struct.

> but it's broken if there are any classes with virtual bases involved.

Really? This simple code works just fine:

    struct A { int ai; };
    struct X : virtual A { int xi; };
    struct Y : virtual A { int yi; };

    struct S : X, Y {
      string a;
      string b;
    };

    template<class ClassType,class MbrType> inline
    ptrdiff_t mbr_offset_of( MbrType ClassType::*p ) {
      ClassType const *const c = static_cast<ClassType*>( nullptr );
      return reinterpret_cast<ptrdiff_t>( &(c->*p) );
    }

    int main() {
      ptrdiff_t offset = mbr_offset_of( &S::b );
      S s;
      string *p = (string*)((char*)&s + offset);
      p->assign( "Hello, world!" );
      cout << *p << endl;
      return 0;
    }

Despite that, however, the equivalent code in LLVM (once I introduce a base class for S, even just ordinary inheritance) crashes. I don't understand why, however. I print out the offset, and it's the correct value that's getting added to the pointer.

- Paul

On Tue, Oct 2, 2012 at 11:33 AM, Paul J. Lucas <paul at lucasmail.org> wrote:

> On Oct 1, 2012, at 9:58 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>
>> Using GEP on an i8* is a bit nicer to the optimizer, though, because
>> using ptrtoint/inttoptr has effects on alias analysis.
>
> My understanding is that, in order to use GEP, you have to provide the LLVM code with the struct layout, i.e., build a StructType object. In my case, that struct is declared in C++ code already and, in order to use GEP, I'd have to replicate the struct layout (exactly as the C++ compiler would) in LLVM code -- something that I'd rather not do, not to mention that it's fairly "brittle" even if I could manage to get it right. (Simple structs would probably be easy, but structs that have virtual functions or multiple base classes would be much harder.)

No, you don't have to... you can just use GEP on i8*'s. The LLVM type
system doesn't have any semantic significance.

>> I'm not entirely sure how you're using mbr_offset_of
>
> Given 't', an instance of some class T, and some member T::m, find the integer offset in bytes from &t to &t.m. This offset, when added to &t, should be &t.m.
>
> I'm using mbr_offset_of to get the C++ compiler to do the work of telling me what the correct offset is for the already existing struct.

If you can do that, why not just generate a thunk to perform the addressing?

>> but it's broken if there are any classes with virtual bases involved.
>
> Really? This simple code works just fine:
>
>   struct A { int ai; };
>   struct X : virtual A { int xi; };
>   struct Y : virtual A { int yi; };
>
>   struct S : X, Y {
>     string a;
>     string b;
>   };
>
>   template<class ClassType,class MbrType> inline
>   ptrdiff_t mbr_offset_of( MbrType ClassType::*p ) {
>     ClassType const *const c = static_cast<ClassType*>( nullptr );
>     return reinterpret_cast<ptrdiff_t>( &(c->*p) );
>   }
>
>   int main() {
>     ptrdiff_t offset = mbr_offset_of( &S::b );
>     S s;
>     string *p = (string*)((char*)&s + offset);
>     p->assign( "Hello, world!" );
>     cout << *p << endl;
>     return 0;
>   }

It starts to become an issue when you try to compute the offset to
e.g. A::ai in your example.

> Despite that, however, the equivalent code in LLVM (once I introduce a base class for S, even just ordinary inheritance) crashes. I don't understand why, however. I print out the offset, and it's the correct value that's getting added to the pointer.

No idea what's happening here.

-Eli

On Oct 2, 2012, at 2:34 PM, Eli Friedman <eli.friedman at gmail.com> wrote:

> On Tue, Oct 2, 2012 at 11:33 AM, Paul J. Lucas <paul at lucasmail.org> wrote:
>
>> My understanding is that, in order to use GEP, you have to provide the LLVM code with the struct layout, i.e., build a StructType object. In my case, that struct is declared in C++ code already and, in order to use GEP, I'd have to replicate the struct layout (exactly as the C++ compiler would) in LLVM code -- something that I'd rather not do, not to mention that it's fairly "brittle" even if I could manage to get it right. (Simple structs would probably be easy, but structs that have virtual functions or multiple base classes would be much harder.)
>
> No, you don't have to... you can just use GEP on i8*'s. The LLVM type
> system doesn't have any semantic significance.

Oh! I just tried it and it works. :-)

>>> I'm not entirely sure how you're using mbr_offset_of
>>
>> Given 't', an instance of some class T, and some member T::m, find the integer offset in bytes from &t to &t.m. This offset, when added to &t, should be &t.m.
>>
>> I'm using mbr_offset_of to get the C++ compiler to do the work of telling me what the correct offset is for the already existing struct.
>
> If you can do that, why not just generate a thunk to perform the addressing?

Because if I can create a thunk to do that, I can just as easily create a thunk to provide a "setter" for the struct member (something I'd prefer not to do). I'm trying to compute the offset "inline" in the LLVM code rather than (a) have to create yet another C thunk and (b) call it.

>>> but it's broken if there are any classes with virtual bases involved.
>>
>> Really? This simple code works just fine:
>>
>>   struct A { int ai; };
>>   struct X : virtual A { int xi; };
>>   struct Y : virtual A { int yi; };
>>
>>   struct S : X, Y {
>>     string a;
>>     string b;
>>   };
>>
>>   template<class ClassType,class MbrType> inline
>>   ptrdiff_t mbr_offset_of( MbrType ClassType::*p ) {
>>     ClassType const *const c = static_cast<ClassType*>( nullptr );
>>     return reinterpret_cast<ptrdiff_t>( &(c->*p) );
>>   }
>>
>>   int main() {
>>     ptrdiff_t offset = mbr_offset_of( &S::b );
>>     S s;
>>     string *p = (string*)((char*)&s + offset);
>>     p->assign( "Hello, world!" );
>>     cout << *p << endl;
>>     return 0;
>>   }
>
> It starts to become an issue when you try to compute the offset to
> e.g. A::ai in your example.

Hmmmm.... I just changed:

    s/int ai/string as/
    s/&S::b/&S::as/

and the code still works. The offset is 0, which is what you'd expect with class 'A' being a public virtual (shared) base class.

>> Despite that, however, the equivalent code in LLVM (once I introduce a base class for S, even just ordinary inheritance) crashes. I don't understand why, however. I print out the offset, and it's the correct value that's getting added to the pointer.
>
> No idea what's happening here.

The IR code is now:

    @0 = private unnamed_addr constant [14 x i8] c"Hello, world!\00"
    ...
    %0 = call i8* @T_S_M_new(i8* %heap)
    %1 = getelementptr i8* %0, i64 16
    call void @T_string_M_assign_A_Pv(i8* %1, i8* getelementptr inbounds ([14 x i8]* @0, i64 0, i64 0))

where the "16" is the correct offset (it agrees with my pure C++ version of the code), yet it still crashes. It's not obvious why.

- Paul