zhi chen via llvm-dev
2016-Jan-23 00:58 UTC
[llvm-dev] how to force llvm generate gather intrinsic
Thanks for your response, Sanjay. I know there are intrinsics available in C/C++, but I want to instrument my code at the IR level and generate those instructions there. I don't want to touch the source code.

Best,
Zhi

On Fri, Jan 22, 2016 at 4:54 PM, Sanjay Patel <spatel at rotateright.com> wrote:

> I was just looking at the related masked load/store operations, and I
> think there are at least 2 bugs:
>
> 1. X86TTIImpl::isLegalMaskedLoad/Store() should be legal for FP types with
>    AVX1 (not just AVX2).
> 2. X86TTIImpl::isLegalMaskedGather/Scatter() should be legal for 128/256
>    bit vectors with AVX2 (not just AVX512).
>
> I looked at this for the first time today, so I may be missing something...
>
> So for the moment, the answer to your question is 'no'; there's no generic
> way to produce these instructions. You should be able to use the _mm_*
> intrinsics in C though.
>
> On Fri, Jan 22, 2016 at 5:00 PM, zhi chen via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>> Hi,
>>
>> I used clang -O3 -c -emit-llvm on the following code to generate a
>> bitcode file, say a.bc. I read the .ll file and didn't see any gather
>> intrinsic. I also used opt -O3 with -mcpu=core-avx2 and -mcpu=skx, but
>> there is still no gather intrinsic generated.
>>
>> int foo(int A[800], int B[800], int C[800]) {
>>   for (int i = 0; i < 800; i++) {
>>     A[B[i]] = i + 5;
>>   }
>>
>>   for (int i = 0; i < 800; i++) {
>>     A[B[i]]++;
>>   }
>>
>>   for (int i = 0; i < 800; i++) {
>>     A[i] = B[C[i]];
>>   }
>>   return 0;
>> }
>>
>> Could someone give me an example that will generate a gather intrinsic
>> for AVX2? I tried to use the masked_gather intrinsic provided in the
>> language ref, but it seems that it only generates a gather instruction
>> for AVX-512, not for AVX-2. I found that there are 16 gather intrinsic
>> versions for AVX-2, depending on the data types. Do I have to check the
>> data type before calling them specifically, or is there a generic way to
>> use the AVX-2 gather intrinsics?
>>
>> Best,
>> Zhi
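As an illustration of the source-level route Sanjay mentions (the `_mm_*` / `_mm256_*` intrinsics in C), the third loop of the example, `A[i] = B[C[i]]`, could be written roughly as below. This is only a sketch and does not address the IR-level request: the function names `gather_scalar`, `gather_avx2`, and `gather_copy` are made up for this example, and the `target` attribute and `__builtin_cpu_supports` are assumed to be available (GCC or Clang on x86).

```c
#include <assert.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

/* Scalar reference for the third loop of the question: A[i] = B[C[i]]. */
static void gather_scalar(int *A, const int *B, const int *C, int n) {
    for (int i = 0; i < n; i++)
        A[i] = B[C[i]];
}

#if defined(__x86_64__) || defined(__i386__)
/* AVX2 version. The target attribute (GCC/Clang) enables AVX2 for this one
 * function, so the file does not have to be built with -mavx2. */
__attribute__((target("avx2")))
static void gather_avx2(int *A, const int *B, const int *C, int n) {
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        /* Load eight 32-bit indices C[i..i+7]. */
        __m256i idx = _mm256_loadu_si256((const __m256i *)(C + i));
        /* Gather B[idx[k]] for k = 0..7; scale 4 == sizeof(int). */
        __m256i val = _mm256_i32gather_epi32(B, idx, 4);
        _mm256_storeu_si256((__m256i *)(A + i), val);
    }
    for (; i < n; i++) /* scalar tail for the remaining elements */
        A[i] = B[C[i]];
}
#endif

/* Hypothetical dispatcher: use the gather path only if the CPU has AVX2. */
void gather_copy(int *A, const int *B, const int *C, int n) {
    assert(n >= 0);
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) {
        gather_avx2(A, B, C, n);
        return;
    }
#endif
    gather_scalar(A, B, C, n);
}
```

`_mm256_i32gather_epi32` is only one of the AVX2 gather variants; other element and index widths use differently named intrinsics, which is the per-type selection the original question asks about.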
Demikhovsky, Elena via llvm-dev
2016-Jan-23 08:01 UTC
[llvm-dev] how to force llvm generate gather intrinsic
1) I did not switch on masked_load/store for AVX1; I can do this.
2) I did not switch on masked gather for AVX2 because the instruction is slow. There is no scatter on AVX2.
3) Currently, gather/scatter does not work on SKX because the patch is still under review: reviews.llvm.org/D15690. I’d be happy if you agree to review it.

- Elena
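For readers following the thread, the generic masked gather operation under discussion (`llvm.masked.gather` in the language reference) can be modelled lane by lane in plain C. The sketch below shows only the semantics as I understand them from the language reference: enabled lanes load through their own pointer, disabled lanes take the pass-through value and are never dereferenced. The function name `masked_gather_i32` and its signature are invented for this illustration and are not an LLVM API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Lane-by-lane model of a masked gather of 32-bit integers.
 * result[i] = *ptrs[i]    if mask[i] is set,
 *           = passthru[i] otherwise (ptrs[i] is not touched). */
void masked_gather_i32(int *result, const int *const *ptrs,
                       const bool *mask, const int *passthru,
                       size_t lanes) {
    assert(result && ptrs && mask && passthru);
    for (size_t i = 0; i < lanes; i++)
        result[i] = mask[i] ? *ptrs[i] : passthru[i];
}
```

A sequence of conditional scalar loads like this is also roughly what one would expect to get when a target does not mark the operation as legal and it is scalarized instead of being lowered to a hardware gather.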
zhi chen via llvm-dev
2016-Jan-23 08:21 UTC
[llvm-dev] how to force llvm generate gather intrinsic
I don't need scatter, only gather on AVX-2, and performance is not the biggest concern. Could you please suggest how I can switch masked gather on?

Best,
Zhi
Nema, Ashutosh via llvm-dev
2016-Jan-23 13:45 UTC
[llvm-dev] how to force llvm generate gather intrinsic
Thanks, Sanjay, for highlighting this. A few days ago I faced a similar problem while generating a masked store in AVX1 mode; I found it is only supported under AVX2, and otherwise we scalarize it.

> 1) I did not switch-on masked_load/store to AVX1, I can do this.

Yes, Elena, this should be supported for FP types in AVX1 mode (for INT types, I doubt X86 has a masked_load/store instruction in AVX1 mode).

Thanks,
Ashutosh
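To make the AVX1 floating-point case concrete: AVX (without AVX2) provides masked load/store for `float` and `double` vectors through `_mm256_maskload_ps` / `_mm256_maskstore_ps` and their `_pd` counterparts. The sketch below copies only the lanes whose mask element has its sign bit set. The function names are hypothetical, and the `target` attribute and `__builtin_cpu_supports` are assumed to be available (GCC or Clang on x86).

```c
#include <assert.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

/* Scalar reference: copy src[i] to dst[i] only where the sign bit of
 * mask[i] is set (mask[i] < 0); other dst elements are left untouched. */
static void masked_copy_scalar(float *dst, const float *src,
                               const int *mask, int n) {
    for (int i = 0; i < n; i++)
        if (mask[i] < 0)
            dst[i] = src[i];
}

#if defined(__x86_64__) || defined(__i386__)
/* AVX1 version using the FP masked load/store instructions. */
__attribute__((target("avx")))
static void masked_copy_avx(float *dst, const float *src,
                            const int *mask, int n) {
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i m = _mm256_loadu_si256((const __m256i *)(mask + i));
        /* Lanes with the mask sign bit clear are read as 0.0f ... */
        __m256 v = _mm256_maskload_ps(src + i, m);
        /* ... and are not written by the masked store. */
        _mm256_maskstore_ps(dst + i, m, v);
    }
    for (; i < n; i++) /* scalar tail */
        if (mask[i] < 0)
            dst[i] = src[i];
}
#endif

/* Hypothetical dispatcher: use the AVX path only if the CPU supports it. */
void masked_copy(float *dst, const float *src, const int *mask, int n) {
    assert(n >= 0);
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx")) {
        masked_copy_avx(dst, src, mask, n);
        return;
    }
#endif
    masked_copy_scalar(dst, src, mask, n);
}
```

The integer-element equivalents (`_mm256_maskload_epi32` and friends) require AVX2, which is consistent with the doubt expressed above about integer masked load/store in AVX1 mode.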