similar to: [LLVMdev] Loop vectorizer behaviour for 2D arrays and parallel annotation

2013 Apr 17
0
[LLVMdev] Loop vectorizer behaviour for 2D arrays and parallel annotation
On 04/17/2013 04:55 AM, Anadi Mishra wrote: > Hello, > > I am trying to vectorize the following loop but the vectorizer says: > "Found a possible write-write reorder" and does not vectorize. > Why? To my knowledge, the dependence analysis in the loop vectorizer is not yet able to prove the absence of dependences here. > for (j=0; j < 8; j++) > { > jj
2013 Apr 17
1
[LLVMdev] Loop vectorizer behaviour for 2D arrays and parallel annotation
On Wed, Apr 17, 2013 at 8:08 AM, Tobias Grosser <tobias at grosser.es> wrote: > On 04/17/2013 04:55 AM, Anadi Mishra wrote: >> >> Hello, >> >> I am trying to vectorize the following loop but the vectorizer says: >> "Found a possible write-write reorder" and does not vectorize. >> Why? > > > To my knowledge, the dependence analysis in
2013 Apr 17
0
[LLVMdev] Loop vectorizer behaviour for 2D arrays and parallel annotation
Hi Anadi Mishra, On 04/17/2013 05:55 AM, Anadi Mishra wrote: > Another question is regarding the isannotatedparallel() check. Is > there a way to make clang (or any other frontend) to generate parallel > annotated IR? Paul Redmond was adding support for "#pragma ivdep" that would use the parallel metadata, but I haven't been able to follow its progress lately. FWIW,
2013 Apr 11
2
[LLVMdev] Decouple LoopVectorizer from O3
Done. Best, Anadi. On Thu, Apr 11, 2013 at 7:01 AM, Nadav Rotem <nrotem at apple.com> wrote: > Hi Anadi, > > Yes, this is a bug in the loop vectorizer. The loop vectorizer expects only > one loop counter (integer with step=1). There is no reason why we should > not handle the case below, and it should be easy to fix. Interestingly > enough if you reverse the order of
2013 Apr 11
2
[LLVMdev] Decouple LoopVectorizer from O3
Hi Nadav, I tried your suggestion by changing the condition to: 189 if (LoopVectorize && OptLevel >= 0) 190 MPM.add(createLoopVectorizePass()); and compiled. Then I used the following command: opt -mtriple=x86_64-linux-gnu -vectorize-loops -vectorizer-min-trip-count=6 -debug-only=loop-vectorize -O1 -S -o example1_vect.s example1.s where example1.s is IR generated by clang -S
2013 Apr 11
4
[LLVMdev] Decouple LoopVectorizer from O3
Hello, I am trying out the LoopVectorizer (LV) pass and would like to decouple it from O3, which is currently required to run LV. I want to do this because I want to understand the behaviour of LV by trying simple loops, but O3 mostly optimises away the loop body. Any ideas would be appreciated. Best, Anadi.
2013 Apr 15
0
[LLVMdev] Decouple LoopVectorizer from O3
Just an FYI: it's often handy to mention the PR number when a thread is concluded by filing a bug. That way other people reading (now, or more importantly, later) can follow the issue through to the bug and its resolution. On Apr 11, 2013 4:24 PM, "Anadi Mishra" <reachanadi at gmail.com> wrote: > Done. > > Best, > Anadi. > > > On Thu, Apr 11, 2013 at 7:01
2013 Apr 11
0
[LLVMdev] Decouple LoopVectorizer from O3
Hi Anadi, Yes, this is a bug in the loop vectorizer. The loop vectorizer expects only one loop counter (integer with step=1). There is no reason why we should not handle the case below, and it should be easy to fix. Interestingly enough if you reverse the order of iterations and count from SIZE to zero, the loop vectorizer would vectorize it. If you open a bugzilla report and assign it to me
2013 Apr 11
1
[LLVMdev] Decouple LoopVectorizer from O3
Thanks for the suggestion Jim. I already tried to do it by 'opt' but it also requires O3. BTW I think that if I invoke 'opt' with '-vectorize-loops' option, it will figure out the passes required for LV since every pass mentions what other passes are prerequisite. Am I correct? Best, Anadi. On Thu, Apr 11, 2013 at 2:48 AM, Jim Grosbach <grosbach at apple.com>
2013 Apr 11
0
[LLVMdev] Decouple LoopVectorizer from O3
Hi Anadi, In the file PassManagerBuilder.cpp you can change the lines below to get rid of the O3 restriction. 189 if (LoopVectorize && OptLevel > 2) 190 MPM.add(createLoopVectorizePass()); Nadav On Apr 10, 2013, at 5:39 PM, Anadi Mishra <reachanadi at gmail.com> wrote: > Hello, > > I am trying out the LoopVectorizer(LV) pass and would like to decouple
2013 Apr 11
0
[LLVMdev] Decouple LoopVectorizer from O3
You can take unoptimized bitcode and run it through ‘opt’ to have complete flexibility in which passes get run. It may take some fiddling to find out the pass sequence and ordering that does what you want, as some passes rely on previous passes to canonicalize code into a form they can effectively work with. -Jim On Apr 10, 2013, at 5:39 PM, Anadi Mishra <reachanadi at gmail.com> wrote:
2005 Dec 05
1
apply() and dropped dimensions
Hi I am having difficulty with apply(). I want apply() to return a matrix, but sometimes a vector is returned. Toy example follows. Function jj() takes a couple of matrices m1 and m2 as arguments and returns a matrix with r rows and c columns where r=nrow(m2) and c=nrow(m1). jj <- function(m1,m2,f,...){ apply(m1, 1, function(y) { apply(m2, 1, function(x) { f(x, y, ...)
2013 Mar 01
3
[LLVMdev] parallel loop metadata simplification
----- Original Message ----- > From: "Hal Finkel" <hfinkel at anl.gov> > To: "Paul Redmond" <paul.redmond at intel.com> > Cc: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu> > Sent: Friday, March 1, 2013 11:13:06 AM > Subject: Re: [LLVMdev] parallel loop metadata simplification > > ----- Original Message ----- > >
2013 Mar 01
2
[LLVMdev] parallel loop metadata simplification
On 2013-03-01, at 11:35 AM, Hal Finkel wrote: > ----- Original Message ----- >> From: "Paul Redmond" <paul.redmond at intel.com> >> To: "Hal Finkel" <hfinkel at anl.gov> >> Cc: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu> >> Sent: Friday, March 1, 2013 10:06:51 AM >> Subject: Re: [LLVMdev] parallel loop
2013 Mar 03
2
[LLVMdev] parallel loop metadata simplification
On 03/03/2013 06:43 PM, Tobias Grosser wrote: > Very good example, indeed. Is there a formal definition of what > #pragma ivdeps means? I see two options here: In the previous discussion we could not find a proper definition for #pragma ivdep so we concluded we can treat it as a statement of "treat the loop as parallel, I do not expect any dependency checking by the compiler",
2013 Mar 03
2
[LLVMdev] parallel loop metadata simplification
On 03/03/2013 02:34 PM, Tobias Grosser wrote: > Meaning they are due to an array or pointer access. What about loop-scope arrays? void foo(long *A, long b) { long i; #pragma ivdep for (i = 0; i < 100; i++) { long t[100]; t[0] = i + 2; A[i] = A[i+b] + t[0]; } } Clang places the alloca for t to the entry block, creating a new race condition.
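A minimal sketch of the concern raised in this message, assuming the effect of placing the alloca in the entry block can be modelled at the C level by moving the declaration of t out of the loop. The names foo_loop_scope and foo_hoisted are hypothetical, and the #pragma ivdep line is left out so the code compiles without pragma warnings. Run sequentially, both forms produce the same result; the difference only matters once iterations are executed in parallel or reordered, because the hoisted form has a single t shared by every iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Form as written in the message: t is declared inside the loop body,
   so each iteration conceptually owns a private t. */
void foo_loop_scope(long *A, long b) {
    assert(A != NULL);
    for (long i = 0; i < 100; i++) {
        long t[100];
        t[0] = i + 2;
        A[i] = A[i + b] + t[0];
    }
}

/* Hand-written approximation (an assumption, not compiler output) of the
   same loop after the storage for t has been moved out of the loop.
   Every iteration now writes and reads the same t[0], so iterations that
   run concurrently would race on it. */
void foo_hoisted(long *A, long b) {
    assert(A != NULL);
    long t[100];
    for (long i = 0; i < 100; i++) {
        t[0] = i + 2;
        A[i] = A[i + b] + t[0];
    }
}
```

Calling both functions on identical 200-element arrays with b = 100 gives identical contents when run sequentially, which is why the problem is easy to miss in a serial test.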
2004 Sep 08
3
do.call("[", ...) question
Hi again everyone I have an arbitrarily dimensional array "a" and a list "jj" of length length(dim(a)). The elements of jj are vectors of indexes. How do I use do.call() to extract a[ jj[[1]], jj[[2]], jj[[3]], ...] ? Toy example follows: a <- matrix(1:30,5,6) jj <- list(5:1,6:1) I want the following a[ jj[[1]],jj[[2]] ] How do I do this? OBAttempts:
2013 Mar 03
0
[LLVMdev] parallel loop metadata simplification
On 03/03/2013 03:34 PM, Pekka Jääskeläinen wrote: > On 03/03/2013 02:34 PM, Tobias Grosser wrote: >> Meaning they are due to an array or pointer access. > > What about loop-scope arrays? > > void foo(long *A, long b) { > long i; > > #pragma ivdep > for (i = 0; i < 100; i++) { > long t[100]; > t[0] = i + 2; >
2011 Mar 03
3
top and allocation issues
In a context where exceptions are caught, I ran the fragment: cerr << "allocating" << endl; char* arr[100]; for (int jj = 0; jj < 10; ++jj) { cerr << "jj = " << jj << endl; arr[jj] = new char[2000000000]; sleep (30); } sleep (10); for (int jj = 0; jj < 10; ++jj) delete[] arr[jj]; cerr
2000 Dec 10
1
seq(0.05,0.95,by=0.002) and logical error
Regardless of which version -- 1.1.1 or 1.2.0 (2000-11-27) -- with a fresh "directory" (i.e. no .RData), I am getting an extremely weird result. R : Copyright 2000, The R Development Core Team Version 1.2.0 Under development (unstable) (2000-11-27) > jj _ seq(0.05,0.95,by=0.002) > sum(jj==0.75) ## WRONG ANSWER [1] 0 > 0.05 + 350*.002 ## Double check that 0.75 is in jj [1]
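The snippet is cut off before any answer, so the cause is not stated here. A common explanation for this kind of result is binary floating-point rounding: neither 0.05 nor 0.002 is exactly representable as a double, so a computed element may differ from 0.75 in its last bits. The sketch below, in C rather than R, assumes the sequence is built as from + i*by (an assumption about how seq() works, not taken from the thread) and contrasts exact comparison with a tolerance-based one. The function names count_exact and count_near are hypothetical.

```c
#include <assert.h>

/* Count elements of from, from+by, ..., from+(n-1)*by that compare
   exactly equal to target. Exact equality is fragile for doubles. */
int count_exact(double from, double by, int n, double target) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (from + i * by == target) {
            count++;
        }
    }
    return count;
}

/* Count elements that lie within tol of target. */
int count_near(double from, double by, int n, double target, double tol) {
    assert(tol >= 0.0);
    int count = 0;
    for (int i = 0; i < n; i++) {
        double diff = (from + i * by) - target;
        if (diff < 0.0) {
            diff = -diff;
        }
        if (diff <= tol) {
            count++;
        }
    }
    return count;
}
```

For the sequence in the snippet (451 elements from 0.05 in steps of 0.002), count_near with a tolerance such as 1e-9 finds exactly one element near 0.75, whatever count_exact reports on a given platform.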