Ralph Campbell
2015-Feb-09 23:33 UTC
[LLVMdev] aarch64 status for generating SIMD instructions
% clang -S -O3 -mcpu=cortex-a57 -ffast-math -Rpass-analysis=loop-vectorize dot.c
dot.c:15:1: remark: loop not vectorized: value that could not be identified as reduction is used outside the loop [-Rpass-analysis=loop-vectorize]
}
^
dot.c:15:1: note: could not determine the original source location for :0:0

I found “llvm-as < /dev/null | llc -march=aarch64 -mattr=help”, which listed a bunch of features, but when I tried adding “-mfpu=neon” or “-mattr=+neon”, clang complained that the option was unrecognized.

From: Michael Zolotukhin [mailto:mzolotukhin at apple.com]
Sent: Monday, February 09, 2015 3:08 PM
To: Ralph Campbell
Cc: Arnaud A. de Grandmaison; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] aarch64 status for generating SIMD instructions

Hi Ralph,

A bunch of useful options for the vectorizers are listed in [1].

Also, what you see might be a target-independent issue rather than an aarch64-specific one. If you can share the code you tested, I can try to explain why the vectorizer fails to handle it, and hopefully we can fix it later :)

Thanks,
Michael

[1] http://llvm.org/docs/Vectorizers.html

On Feb 9, 2015, at 2:19 PM, Ralph Campbell <ralph.campbell at broadcom.com> wrote:

So far, all I have tried is -O3, with and without “-mcpu=cortex-a57”. I’m new to LLVM, so I’m not familiar with which optimization flags are available. I tried poking around in the LLVM documentation but haven’t found a definitive list. The clang man page is skimpy on details.

From: Arnaud A. de Grandmaison [mailto:arnaud.degrandmaison at arm.com]
Sent: Monday, February 09, 2015 2:11 PM
To: Ralph Campbell
Cc: llvmdev at cs.uiuc.edu
Subject: RE: aarch64 status for generating SIMD instructions

Which compiler flags have you been using?

There is definitely support for AArch64’s SIMD instructions, but their use depends on what the vectorizers can do with your code.
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Ralph Campbell
Sent: 09 February 2015 22:30
To: llvmdev at cs.uiuc.edu
Subject: [LLVMdev] aarch64 status for generating SIMD instructions

I’m using Fedora 22 and gcc 4.9.2 to run llvm 3.5.1 on an ARM Juno reference box (Cortex-A53 & Cortex-A57). I tried compiling some simple functions like dot product and axpy() into assembly to see if any of the SIMD instructions were generated (they weren’t). Perhaps I’m missing some compiler flag to enable them.

Does anyone know what the status is for aarch64 generating SIMD instructions? Is anyone coordinating or leading this effort (if there is one)?
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150209/18df1530/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dot.s
Type: application/octet-stream
Size: 875 bytes
Desc: dot.s
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150209/18df1530/attachment.obj>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dot.c
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150209/18df1530/attachment.c>
Michael Zolotukhin
2015-Feb-09 23:52 UTC
[LLVMdev] aarch64 status for generating SIMD instructions
From this message it looks like the vectorizer is having some general problems with the testcase. I’d suggest starting with the simplest case, just to make sure the vectorizer works. Like this:

void foo(int *a, int *b, int *c) {
  int i;
  for(i = 0; i < 1000; i++) {
    a[i] = b[i] + c[i];
  }
}

If you compile it with ‘clang -O3 -arch arm64 -S’, you should see the SIMD instructions. If you do see them, it means that your original test is too complicated for the vectorizer right now (that might be due to some bug); feel free to file a bug.

Thanks,
Michael
Michael Zolotukhin
2015-Feb-10 00:01 UTC
[LLVMdev] aarch64 status for generating SIMD instructions
I just found that you attached the testcase. The reason the vectorizer fails on it is that there are three induction variables (i, ix, iy), and the vectorizer doesn’t know their strides. If you, for instance, replace inc_x and inc_y with ‘1’, the loop will be vectorized.

Thanks,
Michael

PS: The diagnostic is really confusing here.
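The dot.c attachment was scrubbed from the archive, so the exact loop is not available here. The sketch below is a hypothetical BLAS-style reconstruction that only borrows the names from Michael's reply (i, ix, iy, inc_x, inc_y); the function names dot_strided and dot_unit, the parameter order, and the double element type are all assumptions. The first function has the three induction variables Michael describes, with strides that are only known at run time; the second is the unit-stride case he suggests (inc_x = inc_y = 1), which leaves a single induction variable and consecutive accesses. Both loops are floating-point sum reductions whose result is returned, so vectorizing them reorders the additions; LLVM's loop vectorizer generally only does that when reassociation is allowed, for example with -ffast-math as in Ralph's command line.

```c
/* Hypothetical reconstruction of a BLAS-style dot product.
 * The real dot.c attachment was not preserved in the archive. */

/* Strided form: three induction variables (i, ix, iy), with strides
 * inc_x and inc_y that are only known at run time. */
double dot_strided(int n, const double *x, int inc_x,
                   const double *y, int inc_y) {
    double sum = 0.0;
    int i, ix = 0, iy = 0;
    for (i = 0; i < n; i++) {
        sum += x[ix] * y[iy];
        ix += inc_x; /* second induction variable */
        iy += inc_y; /* third induction variable */
    }
    return sum; /* the reduction result is used outside the loop */
}

/* Unit-stride form (inc_x == inc_y == 1): only i remains and the
 * loads are consecutive, which is the pattern the vectorizer expects. */
double dot_unit(int n, const double *x, const double *y) {
    double sum = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Compiling this file with clang -S -O3 -ffast-math -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize and comparing the remarks for the two functions is one way to check Michael's explanation; with the LLVM 3.5-era vectorizer discussed in this thread you would expect only dot_unit to be reported as vectorized, though newer releases may behave differently.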
Ralph Campbell
2015-Feb-10 00:35 UTC
[LLVMdev] aarch64 status for generating SIMD instructions
Better. With this test I see:

% clang -S -O3 -Rpass=loop-vectorize test.c
test.c:3:3: remark: vectorized loop (vectorization factor: 4, unrolling interleave factor: 2) [-Rpass=loop-vectorize]
  for(i = 0; i < 1000; i++) {
  ^
% clang -S -O3 -o test1.s -mcpu=cortex-a57 -Rpass=loop-vectorize test.c
test.c:3:3: remark: vectorized loop (vectorization factor: 4, unrolling interleave factor: 4) [-Rpass=loop-vectorize]
  for(i = 0; i < 1000; i++) {
  ^

Both use SIMD instructions. Changing the code to use a variable for the loop limit works OK, as does changing int to float. So I guess it is the return in dot.c that is causing a problem. I will file a bug, since I think the vectorizer should handle that case.
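Ralph reports that two variations of Michael's test still vectorize: passing the loop bound in as a variable, and using float instead of int elements. A minimal sketch combining both changes follows; the function name foo_float and the parameter n are illustrative, not taken from Ralph's file. Unlike the dot product, each iteration writes its own element and nothing is carried between iterations, so there is no reduction and no floating-point reassociation is needed for the vectorized loop to give the same results.

```c
/* Variant of Michael's foo() with the two changes Ralph describes:
 * float elements and a loop bound passed in at run time.
 * The name foo_float and the parameter n are illustrative only. */
void foo_float(float *a, const float *b, const float *c, int n) {
    int i;
    for (i = 0; i < n; i++) {
        /* Independent element-wise add: no value is carried from one
         * iteration to the next, so no reduction is involved. */
        a[i] = b[i] + c[i];
    }
}
```

Compiling it with clang -S -O3 -Rpass=loop-vectorize should produce a "vectorized loop" remark like the ones quoted above; the exact vectorization and interleave factors are likely to depend on -mcpu and on the LLVM version.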