Martin J. O'Riordan via llvm-dev
2015-Sep-09 09:53 UTC
[llvm-dev] Writing built-ins for instructions returning multiple operands
I have written many builtins for our SHAVE processor which bind directly to our instructions, and making instructions that are not easily selectable by the compiler available to the programmer. The majority of these are straight-forward enough, taking a small number pf input operands and returning a single result; for example 'int __builtin_shave_mul(int, int)' might map onto a simple multiple instruction that takes two input integer operands in registers, and returns a single integer result in another register. However, I have a small number instructions that have two output operands, each in a separate register. I would like to provide access to these instructions using the builtins approach. This is easy enough to express in LLVM IR, but I have not been able to figure out how this can be presented in a C or C++ binding. Is there a pattern for doing this kind of thing that I haven't discovered, or is it simply something that cannot be expressed using the C binding to the builtins? Thanks, MartinO (Movidius Ltd.) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150909/af148aa4/attachment.html>
mats petersson via llvm-dev
2015-Sep-09 10:31 UTC
[llvm-dev] Writing built-ins for instructions returning multiple operands
Kind of depends on what you actually get back. For example, if I take the RDTSC instruction on x86, it returns a 64-bit value, but it's in two registers, EAX and EDX as 32-bit values. The natural thing in that case is to let the builtin form a 64-bit value [either in EAX and EDX in the 32-bit compiler, as that happens to be the way 64-bit values are returned, or as a single 64-bit value by shifting/oring the two register values into one 64-bit return value]. However, if we have, say, an instruction that returns two distinct values (div that also gives the remainder, as a simple example), you will either have to return a (small) struct, or pass in a pointer to be filled in by the function [the latter is not ideal from an optimisation perspective, as the optimiser has a harder time knowing if the output is aliased with something else. -- Mats On 9 September 2015 at 10:53, Martin J. O'Riordan via llvm-dev < llvm-dev at lists.llvm.org> wrote:> I have written many builtins for our SHAVE processor which bind directly > to our instructions, and making instructions that are not easily selectable > by the compiler available to the programmer. > > > > The majority of these are straight-forward enough, taking a small number > pf input operands and returning a single result; for example ‘int > __builtin_shave_mul(int, int)’ might map onto a simple multiple > instruction that takes two input integer operands in registers, and returns > a single integer result in another register. > > > > However, I have a small number instructions that have two output operands, > each in a separate register. I would like to provide access to these > instructions using the builtins approach. This is easy enough to express > in LLVM IR, but I have not been able to figure out how this can be > presented in a C or C++ binding. > > > > Is there a pattern for doing this kind of thing that I haven’t discovered, > or is it simply something that cannot be expressed using the C binding to > the builtins? > > > > Thanks, > > > > MartinO (Movidius Ltd.) > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150909/b9b9ffa6/attachment.html>
David Chisnall via llvm-dev
2015-Sep-09 10:43 UTC
[llvm-dev] [cfe-dev] Writing built-ins for instructions returning multiple operands
On 9 Sep 2015, at 11:31, mats petersson via cfe-dev <cfe-dev at lists.llvm.org> wrote:> > However, if we have, say, an instruction that returns two distinct values (div that also gives the remainder, as a simple example), you will either have to return a (small) struct, or pass in a pointer to be filled in by the function [the latter is not ideal from an optimisation perspective, as the optimiser has a harder time knowing if the output is aliased with something else.It’s important to differentiate the C builtin from the LLVM intrinsic. It’s generally more useable (and idiomatic) in C to have additional return values become arguments returned by pointer. It’s generally more useful in LLVM IR to have multiple return values as a struct. For an example, consider the overflow-checked builtins. The following C for a function that multiplies two numbers and returns either the result or 0 on overflow: unsigned int mul(unsigned int x, unsigned int y) { unsigned int result; return __builtin_umul_overflow(x, y, &result) == 0 ? 0 : result; } This becomes some fairly complex IR, with the key part being: %result = alloca i32, align 4 ... %5 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %3, i32 %4) ... %7 = extractvalue { i32, i1 } %5, 0 store i32 %7, i32* %result, align 4 The SROA happily turns this entire function into: define i32 @mul(i32 %x, i32 %y) #0 { %1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y) %2 = extractvalue { i32, i1 } %1, 1 %3 = extractvalue { i32, i1 } %1, 0 %4 = zext i1 %2 to i32 %5 = icmp eq i32 %4, 0 br i1 %5, label %6, label %7 ; <label>:6 ; preds = %0 br label %8 ; <label>:7 ; preds = %0 br label %8 ; <label>:8 ; preds = %7, %6 %9 = phi i32 [ 0, %6 ], [ %3, %7 ] ret i32 %9 } SimplifyCFG then turns the branches into a single select: define i32 @mul(i32 %x, i32 %y) #0 { %1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y) %2 = extractvalue { i32, i1 } %1, 1 %3 = extractvalue { i32, i1 } %1, 0 %4 = zext i1 %2 to i32 %5 = icmp eq i32 %4, 0 %. = select i1 %5, i32 0, i32 %3 ret i32 %. } And instcombine gets rid of the redundant zext / icmp: define i32 @mul(i32 %x, i32 %y) #0 { %1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y) %2 = extractvalue { i32, i1 } %1, 1 %3 = extractvalue { i32, i1 } %1, 0 %. = select i1 %2, i32 %3, i32 0 ret i32 %. } TL;DR version: Just because you expose a builtin to C as something that takes a pointer doesn’t mean that the optimisers will struggle with it if you expose a sensible LLVM IR intrinsic. David