Stephen Cross
2015-May-31 23:01 UTC
[LLVMdev] Hash Table Virtual Calls with Conflict Resolution Stubs
Hi everyone, I'm the developer of the Loci programming language ( http://loci-lang.org ), for which the compiler is a front-end for LLVM. I would like to say that LLVM has been extremely useful for the development of the compiler and so thank you everyone for building this amazing system. ---- Virtual Method Calls ---- While most aspects of the language map well onto LLVM IR, it seems non-trivial to generate the virtual method calls in LLVM. Loci uses a concept called 'structural typing' [1], which basically means classes don't have to explicitly implement interfaces; if a class has the methods required by the interface then the class -> interface cast is valid. A hash table is therefore used to represent the virtual method table of an object, in which a virtual method call indexes the hash table with a hash of the method name and then calls the function pointer it finds in the corresponding slot. Clearly there can be clashes in the hash table where two (or more) method names map to the same hash table slot. There are various approaches for conflict resolution but the cheapest seems to be setting a hidden value (such as a register) on the caller side to a larger integer hash value (a large enough space in which collisions are assumed not to occur) and then generating a 'conflict resolution stub', which does something like: conflict_resolution_stub: cmp eax, <METHOD 1 HASH> jne .method_2 .method_1: jmp method_1 .method_2: jmp method_2 So in this case 'eax' is being used for conflict resolution and a simple comparison can distinguish the two methods. The nice aspect of this approach is that collisions aren't expected to occur in most cases so the method function can be directly placed in the vtable and the hidden value will simply be ignored in those cases. Hence in general the only overhead versus a C++ virtual call is setting the hidden value, which is essentially a negligible cost. For more detail, the basic elements of the virtual call mechanism are described in 'Invokeinterface Considered Harmless' [2] (albeit for Java). ---- Encoding calls in LLVM IR ---- Encoding this information in LLVM IR seems to be very difficult, albeit I've noticed some recent developments (such as musttail) that could be useful for this. There was a previous discussion of this issue in 2011 [3] and the conclusion seemed to be that the mechanism above couldn't be encoded in LLVM IR. It is possible to generate the necessary inline assembly, but then LLVM has no knowledge of the function calls and hence no inlining occurs and all the optimisations hit this boundary. Ideally it would be possible to generate safe LLVM IR that could be understood by existing optimisation passes. So I'm wondering: * Is there anything in LLVM IR that could help? * If this currently can't be encoded in LLVM IR, what changes do you think would be necessary to support this? I understand that LLVM may not have complete support for this use case right now; I'd be very happy to do the necessary work and submit any relevant patches for review. I am very keen *not* to fork LLVM but instead to try to make upstream changes that are as broadly useful as possible. I've added some notes below about what approach might be taken and I'd be interested to know what you think. Kind regards, Stephen Cross [1] http://loci-lang.org/StructuralTyping.html [2] https://www.research.ibm.com/people/d/dgrove/papers/oopsla01.pdf [3] http://marc.info/?l=llvm-dev&m=129594650017675 ---- Notes about potential approaches ---- In terms of the approach, there are at least two parts to this problem: * Setting/retrieving the hidden value * Forwarding function arguments For the former, two vague ideas I had were adding a new calling convention (it might be useful if external projects could use hooks to add in their own calling conventions) or modifying inreg to allow specifying a particular register (albeit encoding target-specific information like this isn't ideal). Another approach could be adding a parameter attribute (e.g. 'hidden' or 'outofband') that puts the parameter in some target-specific location. The 'llvm.read_register' and 'llvm.write_register' intrinsics would only be useful if they could guarantee the relevant register wasn't going to be overwritten in the meantime (and they'd also need to support more registers than currently). As for forwarding arguments, it seems like the combination of 'musttail' and varargs come close to satisfying this. The only issue is that it's not possible to forward a return value.
Sanjoy Das
2015-Jun-01 06:20 UTC
[LLVMdev] Hash Table Virtual Calls with Conflict Resolution Stubs
> For the former, two vague ideas I had were adding a new calling > convention (it might be useful if external projects could use hooks to > add in their own calling conventions) or modifying inreg to allow > specifying a particular register (albeit encoding target-specific > information like this isn't ideal).Assuming you have full control over your environment, I'd not start by writing a new calling convention. I'd just have: ... %dest = load_from_itable() call i32 %dest(i32 METHOD_ID, rest of the arguments ...) ... and have the conflict resolution stub be: define i32 @conflict_redo(i32 %method_id, arg0, arg1, arg2) { dispatch based on %method_id } with the normal x86 calling conventions. Later, as an *optimization* I'd consider a custom calling convention for the %method_id argument. Not using %rax for %method_id frees it up for use as %dest, for instance, so using %rdi for %method_id may not actually be a bad idea in practice. If you have to inter-operate with legacy code and are not flexible on how %method_id is propagated (i.e. it _has_ to be in %eax) then a custom calling convention that puts the first argument in %eax is probably the best way to go.> As for forwarding arguments, it seems like the combination of > 'musttail' and varargs come close to satisfying this. The only issue > is that it's not possible to forward a return value.I'm not sure I understand what you mean by "forward a return value", but running the following: declare i32 @f(i32, i64) define i32 @conflict_resolution_stub(i32 %t, i64 %arg0) { %v = musttail call i32 @f(i32 %t, i64 %arg0) ret i32 %v } through `llc -O3` gives (trimmed output): _conflict_resolution_stub: ## @conflict_resolution_stub pushq %rax popq %rax jmp _f ## TAILCALL I'm not sure what the pushq and popq %rax is here, looks like a codegen glitch, but you do get the jmp _f as expected. -- Sanjoy
Stephen Cross
2015-Jun-01 23:20 UTC
[LLVMdev] Hash Table Virtual Calls with Conflict Resolution Stubs
Hi Sanjoy, Thanks for your response.> Assuming you have full control over your environment [...]Just to answer this, I do have control over the calling convention, so comments about the implementation choices are definitely in scope. Having said that, I'm keen not to design the calling convention in order to satisfy current compiler implementation limitations.> Not using %rax for %method_id frees it up for use as %dest, for instance, > so using %rdi for %method_id may not actually be a bad idea in > practice.Yes, using eax was just an example. The problem with using rdi is that the code then has to shift all the values up one register, as well as inducing an unnecessary stack allocation. On an individual basis this is a minor cost, but this is a core language feature which may be invoked a huge number of times in a program. After your mention of eax, I investigated other registers and discovered that a viable solution already appears to exist in LLVM in the form of the 'nest' parameter attribute. On x86_64 this passes a pointer in r10, which satisfies the use case extremely well. I've therefore realised that a very similar use case to this is already well known to LLVM in the form of nested functions and the static chain pointer. As far as I can tell, the registers used are: * x86 : ecx * x86_64 : r10 * ARM (32-bit and 64-bit) : r0/x0 (seems to treat it as a normal argument) (I haven't investigated the other architectures yet.) So x86 and x86_64 are excellent. I'm curious about the implementation of nested functions on ARM, since they seem to be treating the static chain pointer as a normal argument (or is it just that 'nest' is ignored on ARM?). As mentioned above this is very costly because it means all the arguments have to be shifted by one position; it would be more efficient if a callee-save register was used. In any case, I expect to make a proposal at some point soon for either extending 'nest' or adding a new attribute (I'm thinking it could be 'outofband') that allows passing an arbitrary-typed argument via an out-of-band mechanism. Following the established norms for nested functions, this would use ecx on x86, r10 on x86_64 and a callee-save register on ARM. What do you think?> I'm not sure I understand what you mean by "forward a return value",My thinking was that the i32 return value might affect the correctness of the call, since we may in fact be returning significantly different types (e.g. a large struct). Having said that, given that we're calling with the correct prototype and returning the value in a function with the correct prototype it seems hard to imagine a code generator choking on this. Presumably optimisation passes might still have trouble with this, but it looks like the new 'thunk' function attribute is designed for this case. Could anyone explain in detail the purpose of the 'thunk' attribute and what led to its introduction? (The IR documentation is slightly vague.)> but running the following:Yes, this is right and thanks for pointing this out. In an actual conflict resolution stub we need to pass arbitrary arguments of unknown type, whereas I've found casting non-varargs functions sometimes leads LLVM to zero all the arguments. Fortunately it looks like LLVM 3.6+ allows forwarding varargs which means this works smoothly. Kind regards, Stephen
David Chisnall
2015-Jun-02 09:18 UTC
[LLVMdev] Hash Table Virtual Calls with Conflict Resolution Stubs
Hi Stephen, Have you looked at how Objective-C implements this? There are at least four Objective-C runtimes that are open source, which all solve this problem in slightly different ways. The GCC, GNUstep, Apple Legacy, Apple Modern, and ObjFW runtimes are all supported by clang, so you can take a look at the IR generation easily as well. I’d be happy to answer questions about the GNUstep runtime off list, David> On 1 Jun 2015, at 00:01, Stephen Cross <scross at scross.co.uk> wrote: > > Hi everyone, > > I'm the developer of the Loci programming language ( > http://loci-lang.org ), for which the compiler is a front-end for > LLVM. I would like to say that LLVM has been extremely useful for the > development of the compiler and so thank you everyone for building > this amazing system. > > ---- Virtual Method Calls ---- > > While most aspects of the language map well onto LLVM IR, it seems > non-trivial to generate the virtual method calls in LLVM. Loci uses a > concept called 'structural typing' [1], which basically means classes > don't have to explicitly implement interfaces; if a class has the > methods required by the interface then the class -> interface cast is > valid. A hash table is therefore used to represent the virtual method > table of an object, in which a virtual method call indexes the hash > table with a hash of the method name and then calls the function > pointer it finds in the corresponding slot. > > Clearly there can be clashes in the hash table where two (or more) > method names map to the same hash table slot. There are various > approaches for conflict resolution but the cheapest seems to be > setting a hidden value (such as a register) on the caller side to a > larger integer hash value (a large enough space in which collisions > are assumed not to occur) and then generating a 'conflict resolution > stub', which does something like: > > conflict_resolution_stub: > cmp eax, <METHOD 1 HASH> > jne .method_2 > .method_1: > jmp method_1 > .method_2: > jmp method_2 > > So in this case 'eax' is being used for conflict resolution and a > simple comparison can distinguish the two methods. The nice aspect of > this approach is that collisions aren't expected to occur in most > cases so the method function can be directly placed in the vtable and > the hidden value will simply be ignored in those cases. Hence in > general the only overhead versus a C++ virtual call is setting the > hidden value, which is essentially a negligible cost. For more detail, > the basic elements of the virtual call mechanism are described in > 'Invokeinterface Considered Harmless' [2] (albeit for Java). > > ---- Encoding calls in LLVM IR ---- > > Encoding this information in LLVM IR seems to be very difficult, > albeit I've noticed some recent developments (such as musttail) that > could be useful for this. There was a previous discussion of this > issue in 2011 [3] and the conclusion seemed to be that the mechanism > above couldn't be encoded in LLVM IR. > > It is possible to generate the necessary inline assembly, but then > LLVM has no knowledge of the function calls and hence no inlining > occurs and all the optimisations hit this boundary. Ideally it would > be possible to generate safe LLVM IR that could be understood by > existing optimisation passes. > > So I'm wondering: > > * Is there anything in LLVM IR that could help? > * If this currently can't be encoded in LLVM IR, what changes do you > think would be necessary to support this? > > I understand that LLVM may not have complete support for this use case > right now; I'd be very happy to do the necessary work and submit any > relevant patches for review. I am very keen *not* to fork LLVM but > instead to try to make upstream changes that are as broadly useful as > possible. I've added some notes below about what approach might be > taken and I'd be interested to know what you think. > > Kind regards, > Stephen Cross > > [1] http://loci-lang.org/StructuralTyping.html > [2] https://www.research.ibm.com/people/d/dgrove/papers/oopsla01.pdf > [3] http://marc.info/?l=llvm-dev&m=129594650017675 > > ---- Notes about potential approaches ---- > > In terms of the approach, there are at least two parts to this problem: > > * Setting/retrieving the hidden value > * Forwarding function arguments > > For the former, two vague ideas I had were adding a new calling > convention (it might be useful if external projects could use hooks to > add in their own calling conventions) or modifying inreg to allow > specifying a particular register (albeit encoding target-specific > information like this isn't ideal). Another approach could be adding a > parameter attribute (e.g. 'hidden' or 'outofband') that puts the > parameter in some target-specific location. The 'llvm.read_register' > and 'llvm.write_register' intrinsics would only be useful if they > could guarantee the relevant register wasn't going to be overwritten > in the meantime (and they'd also need to support more registers than > currently). > > As for forwarding arguments, it seems like the combination of > 'musttail' and varargs come close to satisfying this. The only issue > is that it's not possible to forward a return value. > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Stephen Cross
2015-Jun-02 23:04 UTC
[LLVMdev] Hash Table Virtual Calls with Conflict Resolution Stubs
Hello, So I've done a bit of digging in GCC and for the static chain pointer it seems to be using [1]: * x86: ecx * x86_64: r10 * ARM: ip (r12) * AArch64: x18 * PowerPC: r11 The choices here are notable: on ARM, GCC uses ip (r12), the 'intra-procedure-call scratch register', which can technically be used by veneers generated by the linker, but the nature of nested functions means this doesn't occur; I believe it also works for my use case. On AArch64, GCC uses x18, the 'platform register', which appears to be unused by Linux and therefore available as a scratch register. However it *is* used by iOS [2] and GCC appears to not have chosen a static chain register for iOS AArch64 yet [3]. I'm raising this issue on the GCC mailing list. Currently LLVM supports x86 (ecx) and x86_64 (r10), but as far as I can tell the other backends don't support 'nest' and hence treat the parameter as if it didn't have the attribute; it looks like the PowerPC backend supports the trampoline intrinsics but its nested parameter isn't passed in the same way as GCC (though I haven't yet investigated this target in detail). I'm putting together patches for ARM and AArch64 that implement 'nest' in their respective backends to match the choices made by GCC and writing tests to verify the registers used are correct (btw LLVM has a great test system). On the backends that don't support 'nest' I'd suggest that they should report an error (or at least some form of notice) since they're not using an out-of-band mechanism, and I'd be happy to write tests to check this. I would also be interested in making the documentation for 'nest' slightly clearer. Does this sound sensible? As for renaming/generalising 'nest', I'm going to leave this for now but probably raise it a later time. I do think it'd be cleaner to have something like 'outofband' that supports arbitrary parameter types, so e.g. the Loci frontend doesn't have to generate different IR for each target (32-bit targets will pass a pointer to a i64 value, whereas 64-bit targets pass the value directly). This would replace 'nest' but have identical behaviour (and register selection) for the case where the parameter is a pointer.> Have you looked at how Objective-C implements this? There are at least four Objective-C runtimes that are open source, which all solve this problem in slightly different ways. The GCC, GNUstep, Apple Legacy, Apple Modern, and ObjFW runtimes are all supported by clang, so you can take a look at the IR generation easily as well.I haven't looked at Objective-C. As far as I can tell, Objective-C doesn't face the same problem as shown here because blocks essentially pass their scope as a normal pointer argument (I ran an Objective-C program through Clang to check this). Calls to protocol methods appear to use objc_msg_lookup and then just directly call the returned function pointer. Does that sound correct? (Or did I miss the point you were making?) Kind regards, Stephen [1] https://github.com/gcc-mirror/gcc/blob/7c62dfbbcd3699efcbbadc9fb3aa14f23a123add/gcc/testsuite/gcc.dg/cwsc1.c [2] https://developer.apple.com/library/ios/documentation/Xcode/Conceptual/iPhoneOSABIReference/Articles/ARM64FunctionCallingConventions.html [3] https://github.com/gcc-mirror/gcc/blob/7c62dfbbcd3699efcbbadc9fb3aa14f23a123add/libffi/src/aarch64/ffitarget.h#L66 (GCC links are to specific commits so they don't go out-of-date.)