thr3ads.net - llvm dev - [llvm-dev] [RFC] Adding Intrinsics for Masked Vector Integer Division and Remainder [Oct 2017]

Cohen, Elad2 via llvm-dev

2017-Oct-17 05:22 UTC

[llvm-dev] [RFC] Adding Intrinsics for Masked Vector Integer Division and Remainder

Introduction
============
We would like to add support for masked vector signed/unsigned integer division
and remainder to LLVM IR by introducing new target-independent intrinsics.

This follows similar work already done for masked vector loads and stores:
http://lists.llvm.org/pipermail/llvm-dev/2014-October/078059.html.
Another relevant reference is the discussion of the masked scatter/gather
intrinsics: http://lists.llvm.org/pipermail/llvm-dev/2014-December/079843.html.


Motivation
==========
Currently, if the loop-vectorizer decides to vectorize a loop that contains a
predicated integer division, it vectorizes the loop body but scalarizes the
predicated division into a sequence of branches that guard scalar division
operations. In some cases the resulting code is not very efficient.
Speculating the divisions with an unmasked vector sdiv instruction is usually
not an option because of the danger of integer divide-by-zero in lanes whose
predicate is false.
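To make the current behavior concrete, here is a minimal C sketch of what one vector iteration computes after the predicated division has been scalarized. It assumes a hypothetical vectorization factor of 4 and a hypothetical function name, and it models only the semantics (each lane's division stays behind its own branch), not the exact IR the vectorizer emits.

```c
#include <stdint.h>

#define VF 4 /* hypothetical vectorization factor, for illustration only */

/* One vector iteration of
 *     for (i = 0; i < n; ++i) if (c[i]) x[i] = a[i] / b[i];
 * after the loop body is vectorized and the predicated division is
 * scalarized: each lane's scalar divide sits behind its own branch, so a
 * lane whose predicate is false never divides (even if b[lane] == 0). */
void predicated_div_chunk(const int32_t a[VF], const int32_t b[VF],
                          const uint8_t c[VF], int32_t x[VF]) {
    for (int lane = 0; lane < VF; ++lane) {
        if (c[lane])                      /* per-lane guard branch */
            x[lane] = a[lane] / b[lane];  /* scalar division */
        /* predicate false: x[lane] is left unchanged */
    }
}
```

The branch per lane is what makes this safe, and it is also what the masked intrinsics aim to replace with a single vector operation.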

With the proposed intrinsics, the loop-vectorizer could simply emit masked
division intrinsics and concentrate on the vector semantics rather than on how
to lower them.

Initially the intrinsics will be scalarized for all targets. This could be done
by extending scalarize-masked-mem-intrin to also handle the masked division
intrinsics. Later, the intrinsics could be optimized by:

1. Lowering the intrinsics in the backend using different expansions (for
   example, converting to floating point and using masked vector floating-point
   division instructions).

2. Mapping the intrinsics to different vector math library implementations.

3. Scalarizing the intrinsics in the backend, possibly taking target-specific
   considerations into account.
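Option 1 can be sketched in C under stated assumptions: 32-bit lanes, IEEE-754 double precision, and a hypothetical function whose per-lane loop stands in for masked vector floating-point operations. Every i32 value converts to double exactly, and for 32-bit operands the correctly rounded double quotient, truncated toward zero, equals the exact integer quotient, so the enabled lanes match sdiv. For wider element types (e.g. i64) this exactness no longer holds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical expansion of a masked i32 sdiv through double precision.
 * Enabled lanes: convert both operands to double (exact for i32), divide,
 * then truncate toward zero; for 32-bit operands this matches sdiv.
 * Masked-off lanes: no division at all, the passthru value is used.
 * Precondition (undefined for sdiv anyway): no enabled lane has b == 0 or
 * a == INT32_MIN && b == -1, since 2^31 does not fit back into int32_t. */
void masked_sdiv_via_double(const int32_t *a, const int32_t *b,
                            const uint8_t *mask, const int32_t *passthru,
                            int32_t *result, size_t lanes) {
    for (size_t i = 0; i < lanes; ++i) {
        if (mask[i]) {
            double q = (double)a[i] / (double)b[i];
            result[i] = (int32_t)q; /* C conversion truncates toward zero */
        } else {
            result[i] = passthru[i];
        }
    }
}
```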


Proposed Definition (the following example is for masked signed division; the
rest are similar)
=======================================================================
     'llvm.masked.sdiv'

     Syntax:

           An overloaded intrinsic. You can use llvm.masked.sdiv on any vector
           with integer elements.

           declare <16 x i32>  @llvm.masked.sdiv.v16i32(<16 x i32> <a>, <16 x i32> <b>, <16 x i1> <mask>, <16 x i32> <passthru>)

     Overview:

           Returns, for each vector lane enabled by the mask, the quotient of its
           two operands. The mask holds one bit per vector lane and is used to
           prevent division in the masked-off lanes. The masked-off lanes of the
           result vector are taken from the corresponding lanes of the passthru
           operand.
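A minimal C reference model of the Overview, assuming the <16 x i32> shape of the example declaration (the function name is hypothetical, and this is a semantic sketch, not an implementation of the intrinsic). Because the conditional expression evaluates only the selected operand, masked-off lanes never divide.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 16 /* matches the <16 x i32> example declaration */

/* Reference semantics of llvm.masked.sdiv.v16i32 as proposed:
 * enabled lanes get a[i] / b[i]; masked-off lanes get passthru[i] and,
 * since ?: evaluates only one arm, perform no division at all. */
void masked_sdiv_v16i32_ref(const int32_t a[LANES], const int32_t b[LANES],
                            const bool mask[LANES],
                            const int32_t passthru[LANES],
                            int32_t result[LANES]) {
    for (int i = 0; i < LANES; ++i)
        result[i] = mask[i] ? a[i] / b[i] : passthru[i];
}
```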

     Arguments:

           The first two arguments must be vectors of integer values of
           identical type. The third operand, mask, is a vector of boolean (i1)
           values with the same number of elements as the first two. The fourth
           operand, passthru, has the same type as the first two and supplies
           the values of the masked-off lanes of the result.

     Semantics:

           The 'llvm.masked.sdiv' intrinsic is designed for conditional integer
           division of selected vector elements in a single IR operation. Its
           result is equivalent to a regular vector 'sdiv' instruction followed
           by a 'select' between the quotient and the passthru values,
           predicated on the same mask. However, unlike that sequence, the
           intrinsic performs no division in masked-off lanes, so a zero divisor
           there cannot cause a divide-by-zero exception. If any divisor element
           in an enabled lane is zero, the operation has undefined behavior.
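The equivalence stated in the Semantics can be illustrated with a small C sketch (hypothetical names, an illustrative lane count of 4). The two-step form divides every lane before selecting, so it needs a nonzero divisor in every lane; the masked intrinsic, as described above, only requires nonzero divisors in enabled lanes, and when both are well-defined they produce the same lanes.

```c
#include <stdbool.h>
#include <stdint.h>

#define NLANES 4 /* illustrative lane count */

/* Two-step form: unmasked sdiv on all lanes, then select on the mask.
 * Every lane is divided, so a zero divisor in ANY lane (even a masked-off
 * one) is undefined behavior. */
void sdiv_then_select(const int32_t a[NLANES], const int32_t b[NLANES],
                      const bool mask[NLANES], const int32_t passthru[NLANES],
                      int32_t out[NLANES]) {
    int32_t quotient[NLANES];
    for (int i = 0; i < NLANES; ++i)
        quotient[i] = a[i] / b[i];                    /* vector 'sdiv' */
    for (int i = 0; i < NLANES; ++i)
        out[i] = mask[i] ? quotient[i] : passthru[i]; /* 'select' */
}

/* Divide-by-zero precondition of the masked intrinsic as described in the
 * RFC: only enabled lanes must have a nonzero divisor. */
bool masked_sdiv_divisors_ok(const int32_t b[NLANES], const bool mask[NLANES]) {
    for (int i = 0; i < NLANES; ++i)
        if (mask[i] && b[i] == 0)
            return false;
    return true;
}
```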


Feedback and comments are welcome!
Thanks, Elad
---------------------------------------------------------------------
Intel Israel (74) Limited

Friedman, Eli via llvm-dev

2017-Oct-17 17:58 UTC

[llvm-dev] [RFC] Adding Intrinsics for Masked Vector Integer Division and Remainder

On 10/16/2017 10:22 PM, Cohen, Elad2 via llvm-dev wrote:
>      Semantics:
>
>            The ‘llvm.masked.sdiv’ intrinsic is designed for 
> conditional integer division of selected vector elements in a single 
> IR operation. The result of this operation is equivalent to a regular 
> vector 'sdiv' instruction followed by a ‘select’ between the quotient
> and the passthru values, predicated on the same mask. However, using 
> this intrinsic prevents divide-by-zero exceptions on division of 
> masked-off lanes. If any element in a turned-on lane of the divisor is 
> zero, the operation has undefined behavior.
>
You probably want to mention INT_MIN/-1 overflow here?

----

The alternative here is to refine the definition of "sdiv" in LangRef;
other arithmetic operations in LLVM IR don't have undefined behavior, and
the primary reason "sdiv" has undefined behavior is the unfortunate
behavior of the x86 "IDIV" instruction.  For example, we could add a
"nooverflow" bit to "sdiv", and say that divide-by-zero has undefined
behavior if the "nooverflow" bit is present, and produces poison
otherwise.
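The suggested flag can be summarized as a per-operation outcome. The C sketch below is hypothetical: the "nooverflow" bit is only proposed here and does not exist in LLVM IR, and treating INT_MIN/-1 the same way as divide-by-zero is an assumption based on the overflow question above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Outcome of a scalar sdiv under the hypothetical "nooverflow" flag. */
typedef enum { SDIV_OK, SDIV_POISON, SDIV_UB } sdiv_outcome;

sdiv_outcome classify_sdiv(int32_t a, int32_t b, bool nooverflow) {
    /* The two cases that make today's sdiv undefined: division by zero
     * and INT_MIN / -1, whose true result does not fit in 32 bits. */
    bool bad = (b == 0) || (a == INT32_MIN && b == -1);
    if (!bad)
        return SDIV_OK;
    /* Flag present: keep today's undefined behavior. Flag absent: the
     * result is merely poison, so the division is safe to speculate. */
    return nooverflow ? SDIV_UB : SDIV_POISON;
}
```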

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project

Renato Golin via llvm-dev

2017-Oct-20 20:44 UTC

[llvm-dev] [RFC] Adding Intrinsics for Masked Vector Integer Division and Remainder

Adding Sander and Florian, as this will certainly apply to SVE.

cheers,
--renato

On 17 October 2017 at 18:58, Friedman, Eli via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> You probably want to mention INT_MIN/-1 overflow here?
>
> ----
>
> The alternative here is to refine the definition of "sdiv" in LangRef;
> other arithmetic operations in LLVM IR don't have undefined behavior, and
> the primary reason "sdiv" has undefined behavior is the unfortunate
> behavior of the x86 "IDIV" instruction.  For example, we could add a
> "nooverflow" bit to "sdiv", and say that divide-by-zero has undefined
> behavior if the "nooverflow" bit is present, and produces poison
> otherwise.
>
> -Eli

Hal Finkel via llvm-dev

2017-Oct-24 01:37 UTC

[llvm-dev] [RFC] Adding Intrinsics for Masked Vector Integer Division and Remainder

On 10/17/2017 12:58 PM, Friedman, Eli via llvm-dev wrote:
> You probably want to mention INT_MIN/-1 overflow here?
>
> ----
>
> The alternative here is to refine the definition of "sdiv" in LangRef;
> other arithmetic operations in LLVM IR don't have undefined behavior, and
> the primary reason "sdiv" has undefined behavior is the unfortunate
> behavior of the x86 "IDIV" instruction.  For example, we could add a
> "nooverflow" bit to "sdiv", and say that divide-by-zero has undefined
> behavior if the "nooverflow" bit is present, and produces poison
> otherwise.

This seems like a good idea. It will also provide us with a well-defined 
way to speculate/hoist divisions. I presume that we'd want to have Clang 
(etc.) generate all divisions with this bit set, but we could clear the 
bit when vectorizing (or hoisting, if we wanted to do that).

On x86, we'd need to lower the form without the nooverflow bit present 
using a test-and-branch sequence, but on other architectures, we could 
use the poison-generating form directly.
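The test-and-branch lowering described above might look like the following C sketch (hypothetical function name). It assumes the poison-producing form treats both divide-by-zero and INT_MIN/-1 as poison, since x86 IDIV raises a divide error for both; returning 0 is an arbitrary choice, as any value is a valid refinement of poison.

```c
#include <stdint.h>

/* Sketch of lowering sdiv without the "nooverflow" bit on a target whose
 * divide instruction traps (e.g. x86 IDIV): test for the problematic
 * operands and branch around the hardware divide. */
int32_t sdiv_poison_form_lowered(int32_t a, int32_t b) {
    if (b == 0 || (a == INT32_MIN && b == -1))
        return 0;  /* poison: any value is acceptable, skip the divide */
    return a / b;  /* operands are safe, execute the divide */
}
```

On targets whose divide instruction does not trap, no such guard would be needed, which matches the point above about using the poison-generating form directly.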

  -Hal
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
