Given the C++ struct:

    struct S {
      string a;
      string b;
    };

I also have C "thunk" functions that I call from LLVM code:

    // calls S::S()
    void* T_S_M_new( void *heap );

    // calls string::assign(char const*)
    void T_string_M_assign_Pv( void *that, void *value );

I want to do the LLVM equivalent of the following C++ (where 's' is a pointer to an instance of 'S'):

    s->b.assign( "Hello, world!" ); // assign to S::b

If there were an S member function:

    void S::assign_to_b( char const* );

it would be easy to write a "thunk" wrapper to call it. However, assume that there is no such S member function. I therefore need a way to get the offset of 'b' and add it to 's' so that I can call T_string_M_assign_Pv() on it.

Given this helper function:

    template<class ClassType,class MbrType> inline
    ptrdiff_t mbr_offset_of( MbrType ClassType::*p ) {
      ClassType const *const c = static_cast<ClassType*>( nullptr );
      return reinterpret_cast<ptrdiff_t>( &(c->*p) );
    }

I could take a pointer to an S, use ptrtoint, add the offset, use inttoptr, and pass the resulting pointer as the 'this' argument to T_string_M_assign_Pv(). The LLVM code generated via the IRBuilder is:

    %0 = call i8* @T_S_M_new(i8* %heap)
    %1 = ptrtoint i8* %0 to i64
    %2 = add i64 %1, 8 ; 8 is what's returned by mbr_offset_of()
    %3 = inttoptr i64 %2 to i8*
    call void @T_string_M_assign_A_Pv(i8* %3, i8* getelementptr inbounds ([15 x i8]* @0, i64 0, i64 0))

The code does in fact work. My questions are:

* Is this an "OK" thing to do?
* Is there a better way?

- Paul

P.S.: I don't explicitly put the getelementptr instruction in there. That's something the IRBuilder does all by itself.

On Mon, Oct 1, 2012 at 9:33 PM, Paul J. Lucas <paul at lucasmail.org> wrote:
> Given the C++ struct:
>
>     struct S {
>       string a;
>       string b;
>     };
>
> I also have C "thunk" functions that I call from LLVM code:
>
>     // calls S::S()
>     void* T_S_M_new( void *heap );
>
>     // call string::assign(char const*)
>     void T_string_M_assign_Pv( void *that, void *value );
>
> I want to do the LLVM equivalent of the following C++ (where 's' is a pointer to an instance of 'S'):
>
>     s->b.assign( "Hello, world!" ); // assign to S::b
>
> If there were an S member function:
>
>     void S::assign_to_b( char const* );
>
> it would be easy to write a "thunk" wrapper to call it. However, assume that there is no such S member function. I therefore need a way to get the offset of 'b' and add it to 's' so that I can call T_string_M_assign_Pv() on it.
>
> Given this helper function:
>
>     template<class ClassType,class MbrType> inline
>     ptrdiff_t mbr_offset_of( MbrType ClassType::*p ) {
>       ClassType const *const c = static_cast<ClassType*>( nullptr );
>       return reinterpret_cast<ptrdiff_t>( &(c->*p) );
>     }
>
> I could take a pointer to an S, use ptrtoint, add the offset, use inttoptr, and use that pointer to pass as the 'this' argument to T_string_M_assign_Pv(). The LLVM code generated via the IRBuilder is:
>
>     %0 = call i8* @T_S_M_new(i8* %heap)
>     %1 = ptrtoint i8* %0 to i64
>     %2 = add i64 %1, 8 ; 8 is what's returned by mbr_offset_of()
>     %3 = inttoptr i64 %2 to i8*
>     call void @T_string_M_assign_A_Pv(i8* %3, i8* getelementptr inbounds ([15 x i8]* @0, i64 0, i64 0))
>
> The code does in fact work. My questions are:
>
> * Is this an "OK" thing to do?

Doing math on pointers if you know the offsets is perfectly legitimate. clang will generate code like this for certain casts which can't be represented in the type system.

Using GEP on an i8* is a bit nicer to the optimizer, though, because using ptrtoint/inttoptr has effects on alias analysis.

> * Is there a better way?

I'm not entirely sure how you're using mbr_offset_of, but it's broken if there are any classes with virtual bases involved. Getting this case right probably involves using clang somehow (either to synthesize the relevant thunks, or query for the right offsets and generate the code yourself).

-Eli

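For readers who think in C++ rather than IR, the two ways of applying a byte offset discussed above can be sketched roughly as follows. This is an illustration only, not code from the thread: the helper names add_offset_bytes, add_offset_int and offset_of_b are hypothetical, and the remark about how a compiler lowers each form describes typical clang behaviour rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in for the struct from the thread.
struct S {
    std::string a;
    std::string b;
};

// Byte-wise pointer arithmetic: the value stays a pointer throughout.
// clang typically lowers this form to a getelementptr on i8*.
inline void* add_offset_bytes(void* base, std::ptrdiff_t offset) {
    return static_cast<char*>(base) + offset;
}

// Integer round trip: the C++ counterpart of ptrtoint / add / inttoptr.
inline void* add_offset_int(void* base, std::ptrdiff_t offset) {
    std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(base);
    raw += static_cast<std::uintptr_t>(offset);
    return reinterpret_cast<void*>(raw);
}

// Offset of S::b, measured on a live object (hypothetical helper).
inline std::ptrdiff_t offset_of_b(const S& s) {
    return reinterpret_cast<const char*>(&s.b) -
           reinterpret_cast<const char*>(&s);
}
```

Both helpers produce the same address; the difference is only in what the optimizer is told about where the pointer came from.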
On Oct 1, 2012, at 9:58 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> Using GEP on an i8* is a bit nicer to the optimizer, though, because
> using ptrtoint/inttoptr has effects on alias analysis.

My understanding is that, in order to use GEP, you have to provide the LLVM code with the struct layout, i.e., build a StructType object. In my case, that struct is already declared in C++ code. To use GEP, I'd have to replicate the struct layout in LLVM code exactly as the C++ compiler would. That is something I'd rather not do, not to mention that it's fairly "brittle" even if I could manage to get it right. (Simple structs would probably be easy, but structs that have virtual functions or multiple base classes would be much harder.)

> I'm not entirely sure how you're using mbr_offset_of

Given 't', an instance of some class T, and some member T::m, find the integer offset in bytes from &t to &t.m. This offset, when added to &t, should be &t.m. I'm using mbr_offset_of to get the C++ compiler to do the work of telling me what the correct offset is for the already existing struct.

> but it's broken if there are any classes with virtual bases involved.

Really? This simple code works just fine:

    #include <cstddef>
    #include <iostream>
    #include <string>
    using namespace std;

    struct A { int ai; };
    struct X : virtual A { int xi; };
    struct Y : virtual A { int yi; };

    struct S : X, Y {
      string a;
      string b;
    };

    template<class ClassType,class MbrType> inline
    ptrdiff_t mbr_offset_of( MbrType ClassType::*p ) {
      ClassType const *const c = static_cast<ClassType*>( nullptr );
      return reinterpret_cast<ptrdiff_t>( &(c->*p) );
    }

    int main() {
      ptrdiff_t offset = mbr_offset_of( &S::b );
      S s;
      string *p = (string*)((char*)&s + offset);
      p->assign( "Hello, world!" );
      cout << *p << endl;
      return 0;
    }

Despite that, however, the equivalent code in LLVM crashes once I introduce a base class for S, even with just ordinary inheritance. I don't understand why. I print out the offset, and it's the correct value that's getting added to the pointer.

- Paul
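
One possible reading of the virtual-base caveat: members declared directly in S, such as S::b, sit at a fixed offset from the start of S, which may be why the test above works. A member inherited from the virtual base A is, on common ABIs, located through data stored inside the object, so evaluating it through a null pointer would have to read from address zero. A minimal sketch that avoids the null pointer by measuring against a live object is shown below. The helper name live_offset_of is hypothetical and does not appear in the thread, and for members of a virtual base the result should be assumed valid only for objects whose most-derived type matches the one measured.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct A { int ai; };
struct X : virtual A { int xi; };
struct Y : virtual A { int yi; };
struct S : X, Y { std::string a; std::string b; };

// Hypothetical helper: byte distance from the start of 'obj' to the
// member designated by 'p', computed on a real object so that any
// virtual-base lookup reads valid memory instead of address zero.
// 'Owner' may be Obj itself or an accessible, unambiguous base of Obj.
template<class Obj, class Owner, class Mbr>
std::ptrdiff_t live_offset_of(const Obj& obj, Mbr Owner::*p) {
    return reinterpret_cast<const char*>(&(obj.*p)) -
           reinterpret_cast<const char*>(&obj);
}
```

With an S instance s, live_offset_of(s, &S::b) covers the case from the test program, and live_offset_of(s, &A::ai) covers a member that lives in the virtual base.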