thr3ads.net - llvm dev - [llvm-dev] RFC: Promoting experimental reduction intrinsics to first class intrinsics [Sep 2020]


Simon Pilgrim via llvm-dev

2020-Jun-17 12:52 UTC

[llvm-dev] RFC: Promoting experimental reduction intrinsics to first class intrinsics

A minor point, but I think we need to more explicitly describe the order 
of floating point operations in the LangRef as well:

"If the intrinsic call has the ‘reassoc’ or ‘fast’ flags set, then the 
reduction will not preserve the associativity of an equivalent 
scalarized counterpart. Otherwise the reduction will be ordered, thus 
implying that the operation respects the associativity of a scalarized 
reduction."

Please could we add some pseudocode to show exactly how the intrinsic 
will be re-expanded for ordered cases?
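
As a rough illustration of what such pseudocode might pin down, here is a minimal Python sketch contrasting a strictly ordered fadd reduction with one possible reassociated order. The function names and the pairwise-halving strategy are assumptions made for illustration; they are not the LangRef wording or the expansion any target actually uses.

```python
def ordered_fadd_reduce(start, vec):
    """Strict left-to-right order: ((start + v0) + v1) + ... (assumed reading of 'ordered')."""
    acc = start
    for x in vec:
        acc = acc + x
    return acc


def tree_fadd_reduce(start, vec):
    """One possible reassociated order: pairwise halving, similar in shape to a
    shuffle-based reduction. Assumes a power-of-two number of lanes."""
    lanes = list(vec)
    while len(lanes) > 1:
        half = len(lanes) // 2
        lanes = [lanes[i] + lanes[i + half] for i in range(half)]
    return start + lanes[0]


# Inputs chosen so that rounding makes the two orders disagree.
vec = [1e16, 1.0, -1e16, 1.0]
print(ordered_fadd_reduce(0.0, vec))
print(tree_fadd_reduce(0.0, vec))
```

With these inputs the two orders give different sums (1.0 for the ordered form, 2.0 for the pairwise form), which is why the text needs to say exactly which order applies when 'reassoc' is absent.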

On 16/06/2020 19:38, Sanjay Patel via llvm-dev wrote:
> We switched over to producing the intrinsics for x86 with:
> https://reviews.llvm.org/rGe50059f6b6b3
> ...I'm not aware of any regressions yet.
>
> https://bugs.llvm.org/show_bug.cgi?id=45378 is also fixed as of today.
>
> So that leaves the problem with fmin/fmax when no fast-math-flags are 
> specified. We need to update the LangRef with whatever the expected 
> behavior is for NaN and -0.0.
> x86 will probably be poor regardless of whether we choose 
> "llvm.maxnum" or "llvm.maximum" semantics.
>
> On Thu, Apr 9, 2020 at 1:28 PM Craig Topper via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>     No, we still use the shuffle expansion, which is why the issue
>     isn't unique to the intrinsic.
>
>     ~Craig
>
>
>     On Thu, Apr 9, 2020 at 10:21 AM Amara Emerson <aemerson at apple.com> wrote:
>
>         Has x86 switched to the intrinsics now?
>
>>         On Apr 9, 2020, at 10:17 AM, Craig Topper
>>         <craig.topper at gmail.com> wrote:
>>
>>         That recent X86 bug isn't unique to the intrinsic. We
>>         generate the same code from this which uses the shuffle
>>         sequence the vectorizers generated before the reduction
>>         intrinsics existed.
>>
>>         declare i64 @llvm.experimental.vector.reduce.or.v2i64(<2 x i64>)
>>         declare void @TrapFunc(i64)
>>
>>         define void @parseHeaders(i64 * %ptr) {
>>           %vptr = bitcast i64 * %ptr to <2 x i64> *
>>           %vload = load <2 x i64>, <2 x i64> * %vptr, align 8
>>
>>           %b = shufflevector <2 x i64> %vload, <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
>>           %c = or <2 x i64> %vload, %b
>>           %vreduce = extractelement <2 x i64> %c, i32 0
>>
>>           %vcheck = icmp eq i64 %vreduce, 0
>>           br i1 %vcheck, label %ret, label %trap
>>         trap:
>>           %v2 = extractelement <2 x i64> %vload, i32 1
>>           call void @TrapFunc(i64 %v2)
>>           ret void
>>         ret:
>>           ret void
>>         }
>>
>>         ~Craig
>>
>>
>>         On Thu, Apr 9, 2020 at 10:04 AM Philip Reames via llvm-dev
>>         <llvm-dev at lists.llvm.org> wrote:
>>
>>             My experience with them so far is that the code generation
>>             for these intrinsics is still missing a lot of cases. Some
>>             of them are X86 specific (the target I look at mostly), but
>>             many of them have generic forms.
>>
>>             As one recent example, consider
>>             https://bugs.llvm.org/show_bug.cgi?id=45378. (There's
>>             nothing special about this one other than it was recent.)
>>
>>             I'm not necessarily arguing they can't be promoted from
>>             experimental, but it would be a much easier case if the
>>             code gen was routinely as good or better than the scalar
>>             forms. Or to say that a bit differently, if we could
>>             canonicalize to them in the IR without major regression.
>>             Having two ways to represent something in the IR without
>>             any agreed upon canonical form is always sub-optimal.
>>
>>             Philip
>>
>>             On 4/7/20 9:59 PM, Amara Emerson via llvm-dev wrote:
>>             > Hi,
>>             >
>>             > It’s been a few years now since I added some intrinsics
>>             > for doing vector reductions. We’ve been using them
>>             > exclusively on AArch64, and I’ve seen some traffic a
>>             > while ago on list for other targets too. Sander did some
>>             > work last year to refine the semantics after some
>>             > discussion.
>>             >
>>             > Are we at the point where we can drop the
>>             > “experimental” from the name? IMO all targets should
>>             > begin to transition to using these as the preferred
>>             > representation for reductions. But for now, I’m only
>>             > proposing the naming change.
>>             >
>>             > Cheers,
>>             > Amara
>>             > _______________________________________________
>>             > LLVM Developers mailing list
>>             > llvm-dev at lists.llvm.org
>>             > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>

Amara Emerson via llvm-dev

2020-Jun-17 18:15 UTC

Proposed clarification here: https://reviews.llvm.org/D82034
> On Jun 17, 2020, at 5:52 AM, Simon Pilgrim via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> A minor point, but I think we need to more explicitly describe the order of
> floating point operations in the LangRef as well:
>
> "If the intrinsic call has the ‘reassoc’ or ‘fast’ flags set, then the
> reduction will not preserve the associativity of an equivalent scalarized
> counterpart. Otherwise the reduction will be ordered, thus implying that the
> operation respects the associativity of a scalarized reduction."
>
> Please could we add some pseudocode to show exactly how the intrinsic will
> be re-expanded for ordered cases?

Sanjay Patel via llvm-dev

2020-Sep-09 16:37 UTC

Proposal to specify semantics for the FP min/max reductions:
https://reviews.llvm.org/D87391
I'm not sure how we got to the current state of codegen for those, but it
doesn't seem consistent or correct as-is, so I've proposed updates there
too.
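
To make the choice concrete, the following Python sketch models the two candidate semantics mentioned earlier in the thread ("llvm.maxnum" versus "llvm.maximum") and applies each as a left-to-right reduction. The helper names and the `reduce_with` function are hypothetical, written only for illustration; they are not taken from D87391 or from any backend.

```python
import math


def maxnum(a, b):
    """maxnum-style: a NaN operand is ignored when the other operand is a number.
    The sign of a zero result is not pinned down here; it follows operand order."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a > b else b


def maximum(a, b):
    """maximum-style: any NaN operand yields NaN, and -0.0 is treated as less than +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == b == 0.0:
        # Both operands are zeros: prefer +0.0 if either one is +0.0.
        return a if math.copysign(1.0, a) > 0 else b
    return a if a > b else b


def reduce_with(op, vec):
    """Left-to-right reduction over a non-empty list (illustrative order only)."""
    acc = vec[0]
    for x in vec[1:]:
        acc = op(acc, x)
    return acc


with_nan = [1.0, math.nan, 3.0]
print(reduce_with(maxnum, with_nan))   # NaN lane is skipped
print(reduce_with(maximum, with_nan))  # NaN propagates to the result
```

Under the maxnum-style rule the NaN lane is ignored and the result is 3.0, while under the maximum-style rule the result is NaN. For a vector holding only zeros of mixed sign, the maximum-style rule always yields +0.0, whereas this maxnum sketch returns whichever zero the operand order happens to select.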

On Wed, Jun 17, 2020 at 2:15 PM Amara Emerson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Proposed clarification here: https://reviews.llvm.org/D82034
