similar to: vectorize.enable

Displaying 20 results from an estimated 1200 matches similar to: "vectorize.enable"

2019 Oct 02
2
vectorize.enable
On Wed, Oct 2, 2019 at 07:08, Florian Hahn via llvm-dev <llvm-dev at lists.llvm.org> wrote: > The other thing is that with the patch the behaviour is slightly changed and we could get a diagnostic we didn't get before: > > warning: loop not vectorized: the optimizer was unable to > perform the requested transformation; the transformation might be disabled or >
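A minimal reproducer for the diagnostic under discussion (a sketch, not from the thread): the pragma requests vectorization, but a loop-carried dependence makes the transformation illegal, so the optimizer warns that it could not perform it.

```c
/* Hypothetical example: vectorization is requested explicitly, but each
   iteration reads the value written by the previous one, so the
   vectorizer must refuse and clang emits the -Wpass-failed warning
   quoted above. */
void carried_dep(int *a, int n) {
  #pragma clang loop vectorize(enable)
  for (int i = 1; i < n; ++i)
    a[i] += a[i - 1]; /* loop-carried dependence blocks vectorization */
}
```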
2019 Sep 10
3
loop vectorizer disabling
I would like to propose that loop pragma `vectorize(disable)` actually means disabling the vectorizer for that loop. This perhaps sounds really obvious (I hope it does), but currently `vectorize(disable)` sets the vectorization width to 1, and that means the vectorizer will run and could perform other tricks such as interleaving. The main reason to change the behaviour is that it will be more what
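A sketch of the distinction being proposed (illustrative, not from the message): today both functions below reach the vectorizer; under the proposal the first would skip it entirely.

```c
/* Current behaviour: vectorize(disable) is treated as width 1, so the
   vectorizer still runs and may e.g. interleave the loop. */
void f(int *a, int n) {
  #pragma clang loop vectorize(disable)
  for (int i = 0; i < n; ++i)
    a[i] += 1;
}

/* Spelling out width 1 and interleave count 1 suppresses both, which is
   closer to what a user probably expects "disable" to mean. */
void g(int *a, int n) {
  #pragma clang loop vectorize_width(1) interleave_count(1)
  for (int i = 0; i < n; ++i)
    a[i] += 1;
}
```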
2018 Sep 20
2
Vectorization width not correct using #pragma clang loop vectorize_width
Hello, I'm trying to set the vector width using #pragma clang loop vectorize_width(32) but I'm getting width 8 for the following kernel: #define M 128 #define N 128 #define SQRT_FUN(x) sqrtf(x) int main(int argc, char** argv) { /* Variable declaration/allocation. */ double float_n = (double)N; double data[N*M]; double corr[M*M]; double mean[M]; double stddev[M]; uint32_t
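The requested width is a hint, not a guarantee: the vectorizer can clamp it to what it considers legal and profitable for the target. A self-contained sketch (simplified from the kernel above, names assumed) for checking the width actually chosen:

```c
/* Compile with:  clang -O2 -Rpass=loop-vectorize width.c -c
   The remark reports the width the vectorizer actually used, e.g.
   "vectorized loop (vectorization width: 8, ...)", even when the
   pragma asked for 32. */
#define M 128
void scale(double *data, double float_n) {
  #pragma clang loop vectorize_width(32)
  for (int j = 0; j < M; ++j)
    data[j] /= float_n; /* double elements limit the usable width */
}
```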
2016 Aug 12
2
Invoke loop vectorizer
Hi Daniel, I increased the size of your test to 128, but -stats still shows no loops optimized... Xiaochu On Aug 12, 2016 11:11 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote: > It's not possible to know that A and B don't alias in this example. It's > almost certainly not profitable to add a runtime check given the size of > the loop. > > >
2016 Aug 12
4
Invoke loop vectorizer
I'm not compiling it for x86. Should the loop optimizer be independent of the target? If so, should the code be vectorized at the IR level? On Aug 12, 2016 11:39 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote: > cat > test.c > > #define SIZE 128 > > void bar(int *restrict A, int* restrict B,int K) { > > #pragma clang loop vectorize(enable)
2016 Aug 12
2
Invoke loop vectorizer
Hi Andrey, Thanks. I found that even when the loop vectorizer and SLP vectorizer are enabled, my simple test still does not get optimized. I also tried a clang pragma in my test to force vectorization. What do you think the problem is? Test: #define SIZE 8 void bar(int *A, int* B,int K) { #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8) for (int i = 0; i < SIZE; ++i) A[i]
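A hedged rework of this test combining the points made in the thread: restrict removes the aliasing question, and a larger trip count makes vectorization clearly profitable. The loop body is assumed, since the original snippet is truncated.

```c
#define SIZE 128
/* restrict asserts A and B don't alias, so no runtime check is needed;
   SIZE 128 gives the vectorizer enough work to be profitable. */
void bar(int *restrict A, int *restrict B, int K) {
  #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
  for (int i = 0; i < SIZE; ++i)
    A[i] = B[i] + K; /* assumed body */
}
```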
2019 Oct 04
4
vectorize.enable
Thanks for your replies. That was a very useful discussion. I won't recommit on a Friday afternoon, but will do on Monday, as it looks like we agreed again on the direction and the change. Orthogonal to this change, the interesting topics brought up are improved diagnostics, and the cases the vectoriser misses. I will briefly look at why this particular case isn't vectorised, but I suspect
2020 Jun 24
2
Loop vectorization and unsafe floating point math
Hi llvm-dev! We are doing some fuzz testing using C program generators, and one question that came up when generating a program with both floating point arithmetic and loop pragmas was: Is the loop vectorizer really allowed to vectorize a loop when it can't prove that it is safe to reorder fp math, even if there is a loop pragma that hints at a preferred width? When reading here
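An illustrative case of the question (not from the thread): vectorizing this reduction reorders the floating-point additions, which is only sound under relaxed FP semantics, so the issue is whether the pragma alone may license the reordering.

```c
/* Vectorizing this sum changes the association order of the adds,
   which is not IEEE-equivalent; normally that requires -ffast-math or
   reassociation flags on the instructions, not just a width hint. */
float sum(const float *x, int n) {
  float s = 0.0f;
  #pragma clang loop vectorize(enable) vectorize_width(4)
  for (int i = 0; i < n; ++i)
    s += x[i];
  return s;
}
```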
2019 Oct 07
2
vectorize.enable
Hi, > The problem I see is that the warning isn't very actionable. Fully agreed. > Good warnings are supposed to be actionable, but what is the developer supposed to do in this case? This diagnostic is unclear. But to be more precise, the first part says the optimisation could not be performed. This is spot on, and an improvement over what we had before, because that didn't issue
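One way to get something actionable today (a sketch; the function name is illustrative) is to ask the vectorizer itself why it bailed out, via optimization remarks:

```c
/* Compile with:
     clang -O2 -Rpass-missed=loop-vectorize \
           -Rpass-analysis=loop-vectorize remark.c -c
   The analysis remark names the blocking reason, e.g. a call the
   vectorizer cannot widen. */
extern float opaque(float); /* external call blocks vectorization */
void apply(float *x, int n) {
  #pragma clang loop vectorize(enable)
  for (int i = 0; i < n; ++i)
    x[i] = opaque(x[i]);
}
```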
2015 Dec 22
2
Question about __builtin_assume()
void test_copy_vec(const short* restrict src, short* restrict res, int N) { __builtin_assume( (N > 1) && (N%2 == 0) ); #pragma clang loop vectorize(enable) vectorize_width(2) interleave_count(1) for (int j=0; j<N; ++j) *res++ = *src++; } If I use __builtin_assume(N>1) then LLVM knows the loop will execute and does not check for (j <= 0), but I can't seem to get it to
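A variation worth trying (a sketch; behaviour varies by LLVM version): splitting the conjoined condition into separate __builtin_assume calls sometimes survives optimization better than one combined expression.

```c
void test_copy_vec2(const short *restrict src, short *restrict res, int N) {
  __builtin_assume(N > 1);      /* the loop executes at least once */
  __builtin_assume(N % 2 == 0); /* no scalar remainder at width 2 */
  #pragma clang loop vectorize(enable) vectorize_width(2) interleave_count(1)
  for (int j = 0; j < N; ++j)
    *res++ = *src++;
}
```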
2019 Oct 02
2
vectorize.enable
On Wed, Oct 2, 2019 at 15:56, Finkel, Hal J. <hfinkel at anl.gov> wrote: > > It's done by the WarnMissedTransformation and just looks for > > transformation metadata that is still in the IR after all passes that > > should have transformed them have run. That is, it does not know why > > it is still there -- it could be because the LoopVectorize pass is not
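A sketch of the mechanism being described: the pragma becomes loop metadata, and the warning pass only checks whether that metadata survived the pipeline.

```c
/* The pragma below is lowered to IR metadata on the loop latch, roughly:
     br i1 %cond, ..., !llvm.loop !0
     !0 = distinct !{!0, !1}
     !1 = !{!"llvm.loop.vectorize.enable", i1 true}
   LoopVectorize strips the marker when it transforms the loop; if it is
   still present after the vectorization passes have run, the
   warn-missed-transformations pass fires, without knowing the reason. */
void h(int *a, int n) {
  #pragma clang loop vectorize(enable)
  for (int i = 0; i < n; ++i)
    a[i] *= 2;
}
```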
2013 May 23
3
[LLVMdev] LLVM Loop Vectorizer puzzle
On 05/23/2013 06:52 PM, Redmond, Paul wrote: > I'm not even sure you would need the llvm.loop.parallel anymore since the > vectorizer could just look to see if the loop id on a parallel_loop_access > matches the loop id of the loop being vectorized. > > Does this make any sense? Yes. However, I think you still need to use the self-referencing metadata trick or similar to make the
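For context, a sketch of the modern source-level route to the metadata being discussed (this pragma postdates the 2013 thread): assume_safety marks the loop's memory accesses with metadata tied back to the loop id, so the vectorizer can skip its dependence checks.

```c
/* The programmer asserts there are no loop-carried dependences; clang
   emits parallel-access metadata pointing back at the loop's id. */
void saxpy(float *x, float *y, float a, int n) {
  #pragma clang loop vectorize(assume_safety)
  for (int i = 0; i < n; ++i)
    y[i] = a * x[i] + y[i];
}
```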
2018 Jun 29
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Illustrative Example: clang -fveclib=SVML -O3 svml.c -mavx #include <math.h> void foo(double *a, int N){ int i; #pragma clang loop vectorize_width(8) for (i=0;i<N;i++){ a[i] = sin(i); } } Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer. This is the 8-element SVML sin() called with an 8-element argument. On the surface,
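A hedged workaround sketch for the width mismatch: requesting a width the AVX target can produce directly should yield a call the backend does not have to split (assuming __svml_sin4 is the 4-element variant).

```c
#include <math.h>
/* With -fveclib=SVML -mavx, a 4-wide request matches the widest legal
   double vector (256 bits), so the vectorizer can emit
   __svml_sin4(<4 x double>) directly instead of an 8-wide call. */
void foo4(double *a, int N) {
  #pragma clang loop vectorize_width(4)
  for (int i = 0; i < N; ++i)
    a[i] = sin(i);
}
```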
2018 Jul 23
2
[LoopVectorizer] Improving the performance of dot product reduction loop
On 07/23/2018 06:23 PM, Hal Finkel via llvm-dev wrote: > > On 07/23/2018 05:22 PM, Craig Topper wrote: >> Hello all, >> >> This code https://godbolt.org/g/tTyxpf is a dot product reduction >> loop multiplying sign-extended 16-bit values to produce a 32-bit >> accumulated result. The x86 backend is currently not able to optimize >> it as well as gcc and icc.
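The loop under discussion, reconstructed from the description (the godbolt link is authoritative): 16-bit elements are sign-extended, multiplied, and accumulated into a 32-bit sum, the pattern x86 can ideally lower to pmaddwd.

```c
/* sext i16 -> i32, multiply, accumulate into an i32 reduction */
int dot(const short *a, const short *b, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += (int)a[i] * (int)b[i];
  return sum;
}
```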
2020 Mar 26
5
canonical form loops
Hello, Quick question to see if I haven't missed anything: I would like to convert counting-down loops, i.e. loops with a constant -1 step value, into counting-up loops, because the vectoriser is able to deal with these loops better (see e.g. D76838, which I was discussing today with Ayal). It looks like LoopSimplifyCFG and IndVarSimplify don't do this, so I was just curious if I haven't
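A sketch of the rewrite being asked about: the count-down form with a constant -1 step, and the equivalent count-up form visiting the same elements.

```c
/* Before: induction variable steps by -1. */
void down(int *a, int n) {
  for (int i = n - 1; i >= 0; --i)
    a[i] += 1;
}

/* After: same iteration set, but a canonical +1 step that the
   vectoriser handles better. */
void up(int *a, int n) {
  for (int i = 0; i < n; ++i)
    a[i] += 1;
}
```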
2011 Nov 30
1
openldap authentication
I have an existing OpenLDAP schema which is handling mail, web and ftp services right now. I am trying to get a Windows machine talking to the same filesystem as Apache on Linux via Samba, reading and writing with the correct uid/gid. I was trying to shy away from using pam_ldap as there is no need to tie the user in LDAP directly to the filesystem. The problem is it looks like the Samba LDAP module
2020 May 01
5
LV: predication
Hi Eli, > The problem with your proposal, as written, is that the vectorizer is producing the intrinsic. Because we don’t impose any ordering on optimizations before codegen, every optimization pass in LLVM would have to be taught to preserve any @llvm.set.loop.elements.i32 whenever it makes any change. This is completely impractical because the intrinsic isn’t related to anything
2020 May 20
2
LV: predication
Hi Ayal, Let me start with commenting on this: > A dedicated intrinsic that freezes the compare instruction, for no apparent reason, may potentially cripple subsequent passes from further optimizing the vectorized loop. The point is that we have a very good reason, which is that it passes on the right information to the backend, enabling optimisations as opposed to crippling them. The compare
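Background for the predication discussion (an illustrative loop, not the proposed intrinsic): when the trip count is not a multiple of the vector width, the vectorized loop needs either a scalar epilogue or a predicated (masked) body, and the thread concerns how to convey the element count to the backend for the masked form.

```c
/* With vector width 4 and arbitrary n, the final partial iteration must
   be masked off (tail folding) or peeled into a scalar epilogue. */
void inc(int *a, int n) {
  for (int i = 0; i < n; ++i)
    a[i] += 1;
}
```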
2018 Jul 23
4
[LoopVectorizer] Improving the performance of dot product reduction loop
~Craig On Mon, Jul 23, 2018 at 4:24 PM Hal Finkel <hfinkel at anl.gov> wrote: > > On 07/23/2018 05:22 PM, Craig Topper wrote: > > Hello all, > > This code https://godbolt.org/g/tTyxpf is a dot product reduction loop > multiplying sign-extended 16-bit values to produce a 32-bit accumulated > result. The x86 backend is currently not able to optimize it as well as gcc
2013 May 23
0
[LLVMdev] LLVM Loop Vectorizer puzzle
On 2013-05-23, at 2:13 PM, Pekka Jääskeläinen wrote: > On 05/23/2013 06:52 PM, Redmond, Paul wrote: >> I'm not even sure you would need the llvm.loop.parallel anymore since the >> vectorizer could just look to see if the loop id on a parallel_loop_access >> matches the loop id of the loop being vectorized. >> >> Does this make any sense? > > Yes.