thr3ads.net - llvm dev - [llvm-dev] Writing built-ins for instructions returning multiple operands [Sep 2015]

Martin J. O'Riordan via llvm-dev

2015-Sep-09 09:53 UTC

[llvm-dev] Writing built-ins for instructions returning multiple operands

I have written many builtins for our SHAVE processor which bind directly to
our instructions, making instructions that are not easily selectable by
the compiler available to the programmer.

The majority of these are straightforward enough, taking a small number of
input operands and returning a single result; for example 'int
__builtin_shave_mul(int, int)' might map onto a simple multiply instruction
that takes two input integer operands in registers, and returns a single
integer result in another register.

However, I have a small number of instructions that have two output operands,
each in a separate register.  I would like to provide access to these
instructions using the builtins approach.  This is easy enough to express in
LLVM IR, but I have not been able to figure out how this can be presented in
a C or C++ binding.

Is there a pattern for doing this kind of thing that I haven't discovered,
or is it simply something that cannot be expressed using the C binding to
the builtins?

Thanks,

MartinO (Movidius Ltd.)

mats petersson via llvm-dev

2015-Sep-09 10:31 UTC

[llvm-dev] Writing built-ins for instructions returning multiple operands

Kind of depends on what you actually get back.

For example, if I take the RDTSC instruction on x86, it returns a 64-bit
value, but it's in two registers, EAX and EDX as 32-bit values. The natural
thing in that case is to let the builtin form a 64-bit value [either in EAX
and EDX in the 32-bit compiler, as that happens to be the way 64-bit values
are returned, or as a single 64-bit value by shifting/oring the two
register values into one 64-bit return value].

However, if we have, say, an instruction that returns two distinct values
(div that also gives the remainder, as a simple example), you will either
have to return a (small) struct, or pass in a pointer to be filled in by
the function [the latter is not ideal from an optimisation perspective, as
the optimiser has a harder time knowing whether the output is aliased with
something else].
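
As a rough illustration of the two C-level shapes mentioned above (this is not code from the thread), here is a minimal sketch using division with remainder. The names divmod_result, divmod_by_value and divmod_by_pointer are hypothetical; they stand in for whatever a real two-output builtin would be called, and plain C arithmetic stands in for the instruction itself.

```c
#include <assert.h>

/* Hypothetical result type for a two-output operation. */
typedef struct {
    int quot;
    int rem;
} divmod_result;

/* Shape 1: both outputs come back by value in a small struct. */
static divmod_result divmod_by_value(int n, int d)
{
    divmod_result r;
    r.quot = n / d;
    r.rem = n % d;
    return r;
}

/* Shape 2: one output is the return value, the other is written
 * through a pointer supplied by the caller. */
static int divmod_by_pointer(int n, int d, int *rem)
{
    *rem = n % d;
    return n / d;
}
```

Both forms carry the same information; the difference is only in how the second output reaches the caller.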

--
Mats


David Chisnall via llvm-dev

2015-Sep-09 10:43 UTC

[llvm-dev] [cfe-dev] Writing built-ins for instructions returning multiple operands

On 9 Sep 2015, at 11:31, mats petersson via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> However, if we have, say, an instruction that returns two distinct values
> (div that also gives the remainder, as a simple example), you will either
> have to return a (small) struct, or pass in a pointer to be filled in by
> the function [the latter is not ideal from an optimisation perspective, as
> the optimiser has a harder time knowing if the output is aliased with
> something else.

It’s important to differentiate the C builtin from the LLVM intrinsic.  It’s
generally more useable (and idiomatic) in C to have additional return values
become arguments returned by pointer.  It’s generally more useful in LLVM IR to
have multiple return values as a struct.  For an example, consider the
overflow-checked builtins.

The following C function multiplies two numbers and returns either
the result or 0 on overflow:

unsigned int mul(unsigned int x, unsigned int y)
{
	unsigned int result;
	/* The builtin returns nonzero if the multiplication overflowed. */
	return __builtin_umul_overflow(x, y, &result) == 0 ? result : 0;
}

This becomes some fairly complex IR, with the key part being:

  %result = alloca i32, align 4
...
  %5 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %3, i32 %4)
...
  %7 = extractvalue { i32, i1 } %5, 0
  store i32 %7, i32* %result, align 4

SROA happily turns this entire function into:

define i32 @mul(i32 %x, i32 %y) #0 {
  %1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y)
  %2 = extractvalue { i32, i1 } %1, 1
  %3 = extractvalue { i32, i1 } %1, 0
  %4 = zext i1 %2 to i32
  %5 = icmp eq i32 %4, 0
  br i1 %5, label %6, label %7

; <label>:6                                       ; preds = %0
  br label %8

; <label>:7                                       ; preds = %0
  br label %8

; <label>:8                                       ; preds = %7, %6
  %9 = phi i32 [ %3, %6 ], [ 0, %7 ]
  ret i32 %9
}

SimplifyCFG then turns the branches into a single select:

define i32 @mul(i32 %x, i32 %y) #0 {
  %1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y)
  %2 = extractvalue { i32, i1 } %1, 1
  %3 = extractvalue { i32, i1 } %1, 0
  %4 = zext i1 %2 to i32
  %5 = icmp eq i32 %4, 0
  %. = select i1 %5, i32 %3, i32 0
  ret i32 %.
}

And instcombine gets rid of the redundant zext / icmp:

define i32 @mul(i32 %x, i32 %y) #0 {
  %1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y)
  %2 = extractvalue { i32, i1 } %1, 1
  %3 = extractvalue { i32, i1 } %1, 0
  %. = select i1 %2, i32 0, i32 %3
  ret i32 %.
}

TL;DR version: Just because you expose a builtin to C as something that takes a
pointer doesn’t mean that the optimisers will struggle with it if you expose a
sensible LLVM IR intrinsic.
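
For readers who would still prefer a struct-returning interface at the C level, one possible approach is a thin inline wrapper around the pointer-style builtin. The sketch below is an assumption-laden illustration, not something proposed in the thread: umul_result and checked_umul are hypothetical names, and it relies on __builtin_umul_overflow as provided by Clang and recent GCC.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical struct pairing the two outputs of the operation. */
typedef struct {
    unsigned int value;   /* product, wrapped modulo 2^N on overflow */
    bool overflowed;      /* true if the product did not fit */
} umul_result;

/* Wraps the pointer-style builtin so that callers receive both
 * outputs by value. The builtin stores the (possibly wrapped)
 * product through the pointer and returns the overflow flag. */
static inline umul_result checked_umul(unsigned int x, unsigned int y)
{
    umul_result r;
    r.overflowed = __builtin_umul_overflow(x, y, &r.value);
    return r;
}
```

Because the wrapper is a small inline function over the same builtin, it should lower to the same intrinsic call once inlined, so the choice between the two C shapes is mostly one of taste.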

David
