thr3ads.net - llvm dev - [llvm-dev] how to force llvm generate gather intrinsic [Feb 2016]

Sanjay Patel via llvm-dev

2016-Feb-26 20:49 UTC

[llvm-dev] how to force llvm generate gather intrinsic

If I'm understanding correctly, you're saying that vgather* is slow on all
of Excavator, Haswell, Broadwell, and Skylake (client). Therefore, we will
not generate it for any of those machines.

Even if that's true, we should not define "gatherIsSlow()" as
"hasAVX2() && !hasAVX512()". It could break for some hypothetical future
processor that manages to implement it properly. The AVX2 spec includes
gather; whether it's slow or fast is an implementation detail. We need a
feature bit / cost model entry somewhere to signify this, so we're not
overloading the meaning of the architectural features with that
implementation detail.
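
To illustrate the point about not overloading architectural features, here
is a minimal C++ sketch. It is not LLVM's actual subtarget code: the
`SubtargetSketch` struct, the `HasFastGather` tuning bit, and both helper
constructors are hypothetical names invented for this example.

```cpp
// Hypothetical stand-in for a subtarget description (not LLVM's X86Subtarget).
struct SubtargetSketch {
  bool HasAVX2 = false;       // architectural: the ISA includes vgather*
  bool HasAVX512 = false;     // architectural
  bool HasFastGather = false; // implementation detail: set per CPU model

  // Fragile: infers gather speed from the ISA level alone.
  bool gatherIsSlowFromISA() const { return HasAVX2 && !HasAVX512; }

  // Sketch of the alternative: ask a dedicated tuning bit instead.
  bool gatherIsSlow() const { return !HasFastGather; }

  // Emit gather only if the ISA has it and this CPU implements it well.
  bool shouldEmitGather() const {
    return (HasAVX2 || HasAVX512) && !gatherIsSlow();
  }
};

// Assumed example: an AVX2-only CPU modeled as having slow gather,
// as claimed for current AVX2 processors in this thread.
inline SubtargetSketch makeSlowGatherAVX2Cpu() {
  SubtargetSketch S;
  S.HasAVX2 = true;
  return S;
}

// Assumed example: a hypothetical future AVX2-only CPU with fast gather.
inline SubtargetSketch makeFutureAVX2Cpu() {
  SubtargetSketch S;
  S.HasAVX2 = true;
  S.HasFastGather = true;
  return S;
}
```

With this split, the hypothetical future CPU only needs `HasFastGather`
set, whereas the ISA-derived check would still classify it as slow.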

On Fri, Feb 26, 2016 at 12:23 PM, Demikhovsky, Elena <
elena.demikhovsky at intel.com> wrote:
> No. Gather operation is slow on AVX2 processors.
>
> - *Elena*
>
> *From:* zhi chen [mailto:zchenhn at gmail.com]
> *Sent:* Thursday, February 25, 2016 20:48
> *To:* Sanjay Patel <spatel at rotateright.com>
> *Cc:* Demikhovsky, Elena <elena.demikhovsky at intel.com>; Nema, Ashutosh
> <Ashutosh.Nema at amd.com>; llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] how to force llvm generate gather intrinsic
>
> It seems that http://reviews.llvm.org/D15690 only implemented
> gather/scatter for AVX-512, but not for AVX/AVX2. Is there any plan to
> enable gather for AVX/2? Thanks.
>
>
>
> Best,
>
> Zhi
>
>
>
> On Thu, Feb 25, 2016 at 8:28 AM, Sanjay Patel <spatel at rotateright.com>
> wrote:
>
> I don't think gather has been enabled for AVX2 as of r261875.
>
> Masked load/store were enabled for AVX with:
> http://reviews.llvm.org/D16528 / http://reviews.llvm.org/rL258675
>
>
>
> On Wed, Feb 24, 2016 at 11:39 PM, Demikhovsky, Elena <
> elena.demikhovsky at intel.com> wrote:
>
> Yes, masked load/store/gather/scatter are completed.
>
> - *Elena*
>
> *From:* zhi chen [mailto:zchenhn at gmail.com]
> *Sent:* Thursday, February 25, 2016 01:20
> *To:* Demikhovsky, Elena <elena.demikhovsky at intel.com>
> *Cc:* Sanjay Patel <spatel at rotateright.com>; Nema, Ashutosh
> <Ashutosh.Nema at amd.com>; llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] how to force llvm generate gather intrinsic
>
> Hi Elena,
>
>
>
> Are the masked_load and gather working now?
>
>
>
> Best,
>
> Zhi
>
>
>
> On Sat, Jan 23, 2016 at 12:06 PM, Demikhovsky, Elena <
> elena.demikhovsky at intel.com> wrote:
>
> > Can we legalize the same set of masked load/store operations for AVX1
> > as AVX2?
>
> Yes, of course.
>
> - *Elena*
>
> *From:* Sanjay Patel [mailto:spatel at rotateright.com]
> *Sent:* Saturday, January 23, 2016 18:42
> *To:* Nema, Ashutosh <Ashutosh.Nema at amd.com>
> *Cc:* Demikhovsky, Elena <elena.demikhovsky at intel.com>; zhi chen
> <zchenhn at gmail.com>; llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] how to force llvm generate gather intrinsic
>
> On Sat, Jan 23, 2016 at 6:45 AM, Nema, Ashutosh <Ashutosh.Nema at amd.com>
> wrote:
>
> Thanks Sanjay for highlighting this. A few days back I also faced a
> similar problem while generating a masked store in avx1 mode; I found it
> is only supported under avx2, else we scalarize it.
>
> > 1) I did not switch on masked_load/store for AVX1, I can do this.
>
> Yes Elena, this should be supported for FP types in avx1 mode (for INT
> types, I doubt X86 has a masked_load/store instruction in avx1 mode).
>
>
>
> Thanks everyone for the answers. My immediate motivation is to improve the
> masked load/store ops for an AVX target. If we can fix scatter/gather
> similarly, that would be great.
>
> Can we legalize the same set of masked load/store operations for AVX1 as
> AVX2? If I'm understanding them correctly, the AVX1 FP instructions
> (vmaskmovps/pd) can be used in place of the AVX2 int instructions
> (vpmaskmovd/q), just with domain crossing penalties thrown in. I think we
> do this for other missing integer ops for an AVX1 target either in x86
> lowering or in the tablegen patterns.
>
> Elena - I'm not too familiar with the vectorizers or scatter/gather, but
> I'll certainly take a look at D15690. Thanks for pointing out the patch!
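
The reason the FP instructions can stand in for the integer ones is that a
masked move only copies bits per 32-bit lane, selected by the top bit of
each mask lane. Below is a portable scalar model of that behavior, not the
instruction or an intrinsic itself. `maskedLoadBits` and `maskedLoadInts`
are hypothetical helper names, and the domain-crossing penalties mentioned
above are not modeled.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scalar model of a masked load over 32-bit lanes: lane I is read from
// memory only when the top bit of Mask[I] is set; other lanes become zero.
template <std::size_t N>
std::array<std::uint32_t, N>
maskedLoadBits(const void *Mem, const std::array<std::uint32_t, N> &Mask) {
  std::array<std::uint32_t, N> Out{}; // masked-off lanes stay zero
  const unsigned char *Bytes = static_cast<const unsigned char *>(Mem);
  for (std::size_t I = 0; I < N; ++I)
    if (Mask[I] & 0x80000000u) // only the top bit of the mask lane matters
      std::memcpy(&Out[I], Bytes + 4 * I, 4);
  return Out;
}

// Integer data goes through the same bit-level load unchanged, which is
// why the element type (float vs. int) does not affect the result.
template <std::size_t N>
std::array<std::int32_t, N>
maskedLoadInts(const std::int32_t *Mem,
               const std::array<std::uint32_t, N> &Mask) {
  std::array<std::uint32_t, N> Bits = maskedLoadBits<N>(Mem, Mask);
  std::array<std::int32_t, N> Out{};
  std::memcpy(Out.data(), Bits.data(), sizeof(std::uint32_t) * N);
  return Out;
}
```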
>
>
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.

zhi chen via llvm-dev

2016-Feb-26 21:46 UTC

[llvm-dev] how to force llvm generate gather intrinsic

That makes great sense. It would be great if we had a profitability model
to decide whether using gathers is worthwhile. It would also be good to
have a compiler option that lets users make LLVM generate gather
instructions regardless of whether they are faster or slower.

Best,
Zhi
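
The two ideas above (a profitability check plus a user override) could be
combined roughly as follows. This is only a sketch: `GatherPolicy`,
`GatherCosts`, and `useGather` are hypothetical names, not an existing LLVM
option or API, and the cost values are assumed to come from some target
cost model.

```cpp
// Hypothetical user-facing policy: defer to the cost model, or override it.
enum class GatherPolicy { CostModel, ForceOn, ForceOff };

// Hypothetical cost estimates for one memory access pattern.
struct GatherCosts {
  unsigned GatherCost;     // estimated cost of a hardware gather
  unsigned ScalarizedCost; // estimated cost of scalar loads plus inserts
};

// Decide whether to emit a gather instruction.
inline bool useGather(GatherPolicy Policy, bool TargetHasGather,
                      GatherCosts Costs) {
  if (!TargetHasGather)
    return false; // cannot force an instruction the ISA does not have
  switch (Policy) {
  case GatherPolicy::ForceOn:
    return true; // user asked for gather whether or not it is faster
  case GatherPolicy::ForceOff:
    return false;
  case GatherPolicy::CostModel:
    return Costs.GatherCost < Costs.ScalarizedCost;
  }
  return false; // unreachable for valid enum values
}
```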


Demikhovsky, Elena via llvm-dev

2016-Feb-28 08:29 UTC

[llvm-dev] how to force llvm generate gather intrinsic

I don't plan to do this work myself right now, but I'm ready to assist and
review if somebody takes it on.

- Elena

