similar to: vectorize.enable

Displaying 20 results from an estimated 1200 matches similar to: "vectorize.enable"

2019 Oct 02
2
vectorize.enable
On Wed, Oct 2, 2019 at 07:08, Florian Hahn via llvm-dev <llvm-dev at lists.llvm.org> wrote: > The other thing is that with the patch the behaviour is slightly changed and we could get a diagnostic we didn't get before: > > warning: loop not vectorized: the optimizer was unable to > perform the requested transformation; the transformation might be disabled or >
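A minimal reproducer for the diagnostic under discussion (a sketch, not from the thread): the pragma requests vectorization, but a loop-carried dependence makes the transformation illegal, so the optimizer warns that it could not perform it.

```c
/* Hypothetical example: vectorization is requested explicitly, but each
   iteration reads the value written by the previous one, so the
   vectorizer must refuse and clang emits the -Wpass-failed warning
   quoted above. */
void carried_dep(int *a, int n) {
  #pragma clang loop vectorize(enable)
  for (int i = 1; i < n; ++i)
    a[i] += a[i - 1]; /* loop-carried dependence blocks vectorization */
}
```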
2019 Sep 10
3
loop vectorizer disabling
I would like to propose that loop pragma `vectorize(disable)` actually means disabling the vectorizer for that loop. This perhaps sounds really obvious (I hope it does), but currently `vectorize(disable)` sets the vectorization width to 1, and that means the vectorizer will run and could perform other tricks such as interleaving. The main reason to change the behaviour is that it will be more what
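A sketch of the distinction being proposed (illustrative, not from the message): today both functions below reach the vectorizer; under the proposal the first would skip it entirely.

```c
/* Current behaviour: vectorize(disable) is treated as width 1, so the
   vectorizer still runs and may e.g. interleave the loop. */
void f(int *a, int n) {
  #pragma clang loop vectorize(disable)
  for (int i = 0; i < n; ++i)
    a[i] += 1;
}

/* Spelling out width 1 and interleave count 1 suppresses both, which is
   closer to what a user probably expects "disable" to mean. */
void g(int *a, int n) {
  #pragma clang loop vectorize_width(1) interleave_count(1)
  for (int i = 0; i < n; ++i)
    a[i] += 1;
}
```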
2018 Sep 20
2
Vectorization width not correct using #pragma clang loop vectorize_width
Hello, I'm trying to set the vector width using #pragma clang loop vectorize_width(32) but I'm getting width 8 for the following kernel: #define M 128 #define N 128 #define SQRT_FUN(x) sqrtf(x) int main(int argc, char** argv) { /* Variable declaration/allocation. */ double float_n = (double)N; double data[N*M]; double corr[M*M]; double mean[M]; double stddev[M]; uint32_t
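The requested width is a hint, not a guarantee: the vectorizer can clamp it to what it considers legal and profitable for the target. A self-contained sketch (simplified from the kernel above, names assumed) for checking the width actually chosen:

```c
/* Compile with:  clang -O2 -Rpass=loop-vectorize width.c -c
   The remark reports the width the vectorizer actually used, e.g.
   "vectorized loop (vectorization width: 8, ...)", even when the
   pragma asked for 32. */
#define M 128
void scale(double *data, double float_n) {
  #pragma clang loop vectorize_width(32)
  for (int j = 0; j < M; ++j)
    data[j] /= float_n; /* double elements limit the usable width */
}
```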
2016 Aug 12
2
Invoke loop vectorizer
Hi Daniel, I increased the size of your test to 128, but -stats still shows no loops optimized... Xiaochu On Aug 12, 2016 11:11 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote: > It's not possible to know that A and B don't alias in this example. It's > almost certainly not profitable to add a runtime check given the size of > the loop. > > >
2016 Aug 12
4
Invoke loop vectorizer
I'm not compiling it for x86. Should the loop optimizer be independent of the target? If so, should the code be vectorized at the IR level? On Aug 12, 2016 11:39 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote: > cat > test.c > > #define SIZE 128 > > void bar(int *restrict A, int* restrict B,int K) { > > #pragma clang loop vectorize(enable)
2016 Aug 12
2
Invoke loop vectorizer
Hi Andrey, Thanks. I found that even when the loop vectorizer and SLP vectorizer are enabled, my simple test still does not get optimized. I also tried a clang pragma in my test to force vectorization. What do you think the problem is? Test: #define SIZE 8 void bar(int *A, int* B,int K) { #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8) for (int i = 0; i < SIZE; ++i) A[i]
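A hedged rework of this test combining the points made in the thread: restrict removes the aliasing question, and a larger trip count makes vectorization clearly profitable. The loop body is assumed, since the original snippet is truncated.

```c
#define SIZE 128
/* restrict asserts A and B don't alias, so no runtime check is needed;
   SIZE 128 gives the vectorizer enough work to be profitable. */
void bar(int *restrict A, int *restrict B, int K) {
  #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
  for (int i = 0; i < SIZE; ++i)
    A[i] = B[i] + K; /* assumed body */
}
```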
2019 Oct 04
4
vectorize.enable
Thanks for your replies. That was a very useful discussion. I won't recommit on a Friday afternoon, but will do on Monday, as it looks like we agreed again on the direction and the change. Orthogonal to this change, the interesting topics brought up are improved diagnostics, and the cases the vectoriser misses. I will briefly look at why this particular case isn't vectorised, but I suspect
2020 Jun 24
2
Loop vectorization and unsafe floating point math
Hi llvm-dev! We are doing some fuzz testing using C program generators, and one question that came up when generating a program with both floating point arithmetic and loop pragmas was: Is the loop vectorizer really allowed to vectorize a loop when it can't prove that it is safe to reorder fp math, even if there is a loop pragma that hints at a preferred width? When reading here
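An illustrative case of the question (not from the thread): vectorizing this reduction reorders the floating-point additions, which is only sound under relaxed FP semantics, so the issue is whether the pragma alone may license the reordering.

```c
/* Vectorizing this sum changes the association order of the adds,
   which is not IEEE-equivalent; normally that requires -ffast-math or
   reassociation flags on the instructions, not just a width hint. */
float sum(const float *x, int n) {
  float s = 0.0f;
  #pragma clang loop vectorize(enable) vectorize_width(4)
  for (int i = 0; i < n; ++i)
    s += x[i];
  return s;
}
```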
2019 Oct 07
2
vectorize.enable
Hi, > The problem I see is that the warning isn't very actionable. Fully agreed. > Good warnings are supposed to be actionable, but what is the developer supposed to do in this case? This diagnostic is unclear. But to be more precise, the first part says the optimisation could not be performed. This is spot on, and an improvement over what we had before, because that didn't issue
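One way to get something actionable today (a sketch; the function name is illustrative) is to ask the vectorizer itself why it bailed out, via optimization remarks:

```c
/* Compile with:
     clang -O2 -Rpass-missed=loop-vectorize \
           -Rpass-analysis=loop-vectorize remark.c -c
   The analysis remark names the blocking reason, e.g. a call the
   vectorizer cannot widen. */
extern float opaque(float); /* external call blocks vectorization */
void apply(float *x, int n) {
  #pragma clang loop vectorize(enable)
  for (int i = 0; i < n; ++i)
    x[i] = opaque(x[i]);
}
```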
2015 Dec 22
2
Question about __builtin_assume()
void test_copy_vec(const short* restrict src, short* restrict res, int N) { __builtin_assume( (N > 1) && (N%2 == 0) ); #pragma clang loop vectorize(enable) vectorize_width(2) interleave_count(1) for (int j=0; j<N; ++j) *res++ = *src++; } If I use __builtin_assume(N>1) then LLVM knows the loop will execute and does not check for (j <= 0), but I can't seem to get it to
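A variation worth trying (a sketch; behaviour varies by LLVM version): splitting the conjoined condition into separate __builtin_assume calls sometimes survives optimization better than one combined expression.

```c
void test_copy_vec2(const short *restrict src, short *restrict res, int N) {
  __builtin_assume(N > 1);      /* the loop executes at least once */
  __builtin_assume(N % 2 == 0); /* no scalar remainder at width 2 */
  #pragma clang loop vectorize(enable) vectorize_width(2) interleave_count(1)
  for (int j = 0; j < N; ++j)
    *res++ = *src++;
}
```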
2019 Oct 02
2
vectorize.enable
On Wed, Oct 2, 2019 at 15:56, Finkel, Hal J. <hfinkel at anl.gov> wrote: > > It's done by the WarnMissedTransformation and just looks for > > transformation metadata that is still in the IR after all passes that > > should have transformed them have run. That is, it does not know why > > it is still there -- it could be because the LoopVectorize pass is not
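A sketch of the mechanism being described: the pragma becomes loop metadata, and the warning pass only checks whether that metadata survived the pipeline.

```c
/* The pragma below is lowered to IR metadata on the loop latch, roughly:
     br i1 %cond, ..., !llvm.loop !0
     !0 = distinct !{!0, !1}
     !1 = !{!"llvm.loop.vectorize.enable", i1 true}
   LoopVectorize strips the marker when it transforms the loop; if it is
   still present after the vectorization passes have run, the
   warn-missed-transformations pass fires, without knowing the reason. */
void h(int *a, int n) {
  #pragma clang loop vectorize(enable)
  for (int i = 0; i < n; ++i)
    a[i] *= 2;
}
```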
2013 May 23
3
[LLVMdev] LLVM Loop Vectorizer puzzle
On 05/23/2013 06:52 PM, Redmond, Paul wrote: > I'm not even sure you would need the llvm.loop.parallel anymore since the > vectorizer could just look to see if the loop id on a parallel_loop_access > matches the loop id of the loop being vectorized. > > Does this make any sense? Yes. However, I think you still need to use the self-referencing metadata trick or similar to make the
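For context, a sketch of the modern source-level route to the metadata being discussed (this pragma postdates the 2013 thread): assume_safety marks the loop's memory accesses with metadata tied back to the loop id, so the vectorizer can skip its dependence checks.

```c
/* The programmer asserts there are no loop-carried dependences; clang
   emits parallel-access metadata pointing back at the loop's id. */
void saxpy(float *x, float *y, float a, int n) {
  #pragma clang loop vectorize(assume_safety)
  for (int i = 0; i < n; ++i)
    y[i] = a * x[i] + y[i];
}
```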
2018 Jun 29
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Illustrative Example: clang -fveclib=SVML -O3 svml.c -mavx #include <math.h> void foo(double *a, int N){ int i; #pragma clang loop vectorize_width(8) for (i=0;i<N;i++){ a[i] = sin(i); } } Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer. This is the 8-element SVML sin() called with an 8-element argument. On the surface,
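A hedged workaround sketch for the width mismatch: requesting a width the AVX target can produce directly should yield a call the backend does not have to split (assuming __svml_sin4 is the 4-element variant).

```c
#include <math.h>
/* With -fveclib=SVML -mavx, a 4-wide request matches the widest legal
   double vector (256 bits), so the vectorizer can emit
   __svml_sin4(<4 x double>) directly instead of an 8-wide call. */
void foo4(double *a, int N) {
  #pragma clang loop vectorize_width(4)
  for (int i = 0; i < N; ++i)
    a[i] = sin(i);
}
```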
2018 Jul 23
2
[LoopVectorizer] Improving the performance of dot product reduction loop
On 07/23/2018 06:23 PM, Hal Finkel via llvm-dev wrote: > > On 07/23/2018 05:22 PM, Craig Topper wrote: >> Hello all, >> >> This code https://godbolt.org/g/tTyxpf is a dot product reduction >> loop multiplying sign-extended 16-bit values to produce a 32-bit >> accumulated result. The x86 backend is currently not able to optimize >> it as well as gcc and icc.
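The loop under discussion, reconstructed from the description (the godbolt link is authoritative): 16-bit elements are sign-extended, multiplied, and accumulated into a 32-bit sum, the pattern x86 can ideally lower to pmaddwd.

```c
/* sext i16 -> i32, multiply, accumulate into an i32 reduction */
int dot(const short *a, const short *b, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += (int)a[i] * (int)b[i];
  return sum;
}
```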
2020 Mar 26
5
canonical form loops
Hello, Quick question to see if I haven't missed anything: I would like to convert counting-down loops, i.e. loops with a constant -1 step value, into counting-up loops, because the vectoriser is able to deal with these loops better (see e.g. D76838, which I was discussing today with Ayal). It looks like LoopSimplifyCFG and IndVarSimplify don't do this, so I was just curious if I haven't
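A sketch of the rewrite being asked about: the count-down form with a constant -1 step, and the equivalent count-up form visiting the same elements.

```c
/* Before: induction variable steps by -1. */
void down(int *a, int n) {
  for (int i = n - 1; i >= 0; --i)
    a[i] += 1;
}

/* After: same iteration set, but a canonical +1 step that the
   vectoriser handles better. */
void up(int *a, int n) {
  for (int i = 0; i < n; ++i)
    a[i] += 1;
}
```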
2011 Nov 30
1
openldap authentication
I have an existing OpenLDAP schema which is handling mail, web and ftp services right now. I am trying to get a Windows machine talking to the same filesystem as Apache on Linux via Samba, reading and writing with the correct uid/gid. I was trying to shy away from using pam_ldap as there is no need to tie the user in LDAP directly to the filesystem. The problem is it looks like the Samba LDAP module
2020 May 01
5
LV: predication
Hi Eli, > The problem with your proposal, as written, is that the vectorizer is producing the intrinsic. Because we don’t impose any ordering on optimizations before codegen, every optimization pass in LLVM would have to be taught to preserve any @llvm.set.loop.elements.i32 whenever it makes any change. This is completely impractical because the intrinsic isn’t related to anything
2020 May 20
2
LV: predication
Hi Ayal, Let me start with commenting on this: > A dedicated intrinsic that freezes the compare instruction, for no apparent reason, may potentially cripple subsequent passes from further optimizing the vectorized loop. The point is that we have a very good reason, which is that it passes on the right information to the backend, enabling optimisations as opposed to crippling them. The compare
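Background for the predication discussion (an illustrative loop, not the proposed intrinsic): when the trip count is not a multiple of the vector width, the vectorized loop needs either a scalar epilogue or a predicated (masked) body, and the thread concerns how to convey the element count to the backend for the masked form.

```c
/* With vector width 4 and arbitrary n, the final partial iteration must
   be masked off (tail folding) or peeled into a scalar epilogue. */
void inc(int *a, int n) {
  for (int i = 0; i < n; ++i)
    a[i] += 1;
}
```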
2018 Jul 23
4
[LoopVectorizer] Improving the performance of dot product reduction loop
~Craig On Mon, Jul 23, 2018 at 4:24 PM Hal Finkel <hfinkel at anl.gov> wrote: > > On 07/23/2018 05:22 PM, Craig Topper wrote: > > Hello all, > > This code https://godbolt.org/g/tTyxpf is a dot product reduction loop > multiplying sign-extended 16-bit values to produce a 32-bit accumulated > result. The x86 backend is currently not able to optimize it as well as gcc
2013 May 23
0
[LLVMdev] LLVM Loop Vectorizer puzzle
On 2013-05-23, at 2:13 PM, Pekka Jääskeläinen wrote: > On 05/23/2013 06:52 PM, Redmond, Paul wrote: >> I'm not even sure you would need the llvm.loop.parallel anymore since the >> vectorizer could just look to see if the loop id on a parallel_loop_access >> matches the loop id of the loop being vectorized. >> >> Does this make any sense? > > Yes.