Nadav Rotem
2013-Feb-19 00:31 UTC
[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
> > Okay. If you'll update your local BBVectorize patches, then we can
> > pull them upstream. Then we'll just need to update the unroller.

If I understand this thread correctly, you want to enable vectorization by telling the BB vectorizer that different operations are independent. I understand your motivation, and I agree that this is indeed one way to do vectorization. However, there is something I don't completely understand. If we already have the information that consecutive iterations of the loop are independent, then the loop vectorizer should already vectorize the loop. Also, at the moment we unroll loops before BB vectorization. Can you think of cases where the unrolling can help BB vectorization? I think that it can only help in cases that are easily detected by the loop vectorizer.

Thanks,
Nadav
Hal Finkel
2013-Feb-19 04:18 UTC
[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
----- Original Message -----
> From: "Nadav Rotem" <nrotem at apple.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi>, "Tobias Grosser" <tobias at grosser.es>, "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu>
> Sent: Monday, February 18, 2013 6:31:39 PM
> Subject: Re: [LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
>
> > > Okay. If you'll update your local BBVectorize patches, then we can
> > > pull them upstream. Then we'll just need to update the unroller.
>
> If I understand this thread correctly, you want to enable
> vectorization by telling the BB vectorizer that different operations
> are independent. I understand your motivation and I agree that this
> is indeed one way to do vectorization. However, I don't completely
> understand something. If we already have the information that
> consecutive iterations of the loops are independent, then the loop
> vectorizer should already vectorize the loop. Also, at the moment
> we unroll loops before BB Vectorization. Can you think of cases
> where the unrolling can help BB-vectorization? I think that it can
> only help in cases that are easily detected by the loop vectorizer.

I think this is a question of profitability more than ability. If we mark all loop iterations as independent, then the loop vectorizer could vectorize them, but it might not find it profitable to do so. Nevertheless, it might be profitable to partially vectorize the loop, and the unroll+bb-vectorize approach can catch those cases. Note that the use of this metadata on loads is not just to vectorize those particular loads, but also to provide a means of proving the independence of their users.

In any case, I really want this iteration-independence metadata to survive unrolling so that it can assist with instruction scheduling on in-order cores (the enable-aa-sched-mi option). So long as we have it, BBVectorize might as well support it, but allowing the instruction scheduler to hide the load latencies is really my key use case.

Thanks again,
Hal

> Thanks,
> Nadav
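To make the scheduling use case concrete, here is a minimal sketch in plain C (not LLVM code). The function names `scale_add_rolled` and `scale_add_unrolled4` are hypothetical and chosen only for illustration. The second function shows the ordering a scheduler could pick for a loop unrolled by four once it knows the iterations are independent: all four loads are issued before any store, so their latencies overlap. The sketch assumes that `dst` and `src` do not overlap and that `n` is a multiple of four; without the independence guarantee this reordering would not be legal.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop: one load, one multiply-add, and one store per iteration. */
void scale_add_rolled(float *dst, const float *src, float k, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k + 1.0f;
}

/* The same loop unrolled by four, with every load hoisted above every store.
 * ASSUMPTION: iterations are independent (dst and src do not overlap).
 * If dst were src + 1, each store would feed the next load and this
 * ordering would compute different values than the rolled loop. */
void scale_add_unrolled4(float *dst, const float *src, float k, size_t n) {
    assert(n % 4 == 0); /* the sketch omits a remainder loop */
    for (size_t i = 0; i < n; i += 4) {
        /* Four loads issued back to back so their latencies can overlap. */
        float a0 = src[i + 0];
        float a1 = src[i + 1];
        float a2 = src[i + 2];
        float a3 = src[i + 3];
        /* Arithmetic and stores follow once the loads are in flight. */
        dst[i + 0] = a0 * k + 1.0f;
        dst[i + 1] = a1 * k + 1.0f;
        dst[i + 2] = a2 * k + 1.0f;
        dst[i + 3] = a3 * k + 1.0f;
    }
}
```

For non-overlapping buffers both functions produce identical results. The four adjacent multiply-adds in the unrolled body are also the kind of isomorphic, independent operations that a basic-block vectorizer could pair up, even when vectorizing the whole loop is not judged profitable.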
Pekka Jääskeläinen
2013-Feb-19 08:27 UTC
[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
Hi,

On 02/19/2013 02:31 AM, Nadav Rotem wrote:
> vectorization. However, I don't completely understand something. If we
> already have the information that consecutive iterations of the loops are
> independent, then the loop vectorizer should already vectorize the loop.

Yes, the loop vectorizer should be a better match for the parallel (inner) loops.

That's why I've been pushing the parallel loop metadata: to move towards using the generic loop vectorizer instead of the hacked bbvectorizer for work-group autovectorization in pocl. Unfortunately, it still needs some more work to be efficient for this purpose (as discussed), but a step towards it has now been made, and it can vectorize some work-group functions with pocl. That's good.

BTW, there's at least one thing the bbvectorizer handles better now: intra-kernel/function vector datatypes. IIRC, it doesn't choke when it sees vectors already present in the input, but calmly tries to combine multiple vector instructions into a larger one. Thus, when the loop at hand is not nicely and cleanly vectorizable (e.g., nasty memory access patterns), it might still be useful for providing some level of vector ISA utilization.

Hal, this OpenCL WG autovectorization work is unfortunately not my first-priority task at work currently (more like a pet project), so I cannot make any promises on when I might find time to work on it again. So, if you want to see the parallel loop iteration MD happen sooner, I'd recommend you dig into it. I think we'd like to start from scratch with the bbvectorizer utilization in pocl anyway; that is, we would add the metadata support first and then use it in a fresh bbvectorizer version. The current hacked version in pocl does not seem easily upstreamable, as it has lagged behind some LLVM versions and is rather dirty.

BR,
--
--Pekka
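The point about combining existing vector operations into a larger one can be illustrated with a small sketch in plain C. This is only a model of the idea, not the bbvectorizer's implementation; the types `float4` and `float8` and all function names are hypothetical. It assumes the two 4-wide additions are independent of each other, which is the precondition for fusing them.

```c
/* Hypothetical 4-wide and 8-wide vector types modeled as plain structs. */
typedef struct { float v[4]; } float4;
typedef struct { float v[8]; } float8;

static float4 add4(float4 a, float4 b) {
    float4 r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}

static float8 add8(float8 a, float8 b) {
    float8 r;
    for (int i = 0; i < 8; ++i)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}

/* Place lo in lanes 0..3 and hi in lanes 4..7. */
static float8 concat4(float4 lo, float4 hi) {
    float8 r;
    for (int i = 0; i < 4; ++i) {
        r.v[i] = lo.v[i];
        r.v[i + 4] = hi.v[i];
    }
    return r;
}

/* Input form: two independent 4-wide adds, as a kernel that already uses
 * vector datatypes might contain. */
float8 two_narrow_adds(float4 a0, float4 b0, float4 a1, float4 b1) {
    return concat4(add4(a0, b0), add4(a1, b1));
}

/* Fused form: the operands are widened first, then a single 8-wide add
 * replaces the two 4-wide adds. */
float8 one_wide_add(float4 a0, float4 b0, float4 a1, float4 b1) {
    return add8(concat4(a0, a1), concat4(b0, b1));
}
```

Both forms compute the same eight lanes. Whether the fused form is faster depends on the target's vector width and on the cost of assembling the wider operands, which this sketch does not model.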
Hal Finkel
2013-Feb-19 15:51 UTC
[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
----- Original Message -----
> From: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi>
> To: "Nadav Rotem" <nrotem at apple.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Tobias Grosser" <tobias at grosser.es>, "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, February 19, 2013 2:27:09 AM
> Subject: Re: [LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
>
> Hi,
>
> On 02/19/2013 02:31 AM, Nadav Rotem wrote:
> > vectorization. However, I don't completely understand something. If we
> > already have the information that consecutive iterations of the loops are
> > independent, then the loop vectorizer should already vectorize the loop.
>
> Yes, the loop vectorizer should be a better match for the parallel
> (inner)loops.
>
> That's why I've been pushing the parallel loop metadata: to go towards
> using the generic loop vectorizer instead of the hacked bbvectorizer for
> work-group autovectorization in pocl. Unfortunately, it needs some more work
> still to be efficient for this purpose (like discussed), but a step towards it
> has been now made and it can vectorize some work-group functions with pocl.
> That's good.
>
> BTW, there's at least one thing the bbvectorizer handles better now:
> intra-kernel/function vector datatypes. IIRC, it doesn't choke when it sees
> vectors already present in the input, but calmly tries to combine multiple
> vector instructions to a larger one. Thus, it might be useful in the case
> where the loop at hand is not nicely and cleanly vectorizable (e.g., nasty
> memory access patterns) to still provide some level of vector ISA
> utilization.

Indeed, that's the idea.

> Hal, this OpenCL WG autovectorization work, unfortunately, is not my first
> priority task at work currently (more like a pet project), so I cannot make any
> promises on when I might find time to work on it again. So, if you want to
> see the parallel loop iteration MD happen sooner, I'd recommend you dig into
> it. I think we'd like to start from the scratch for the bbvectorizer utilization
> in pocl anyways, that is, would add the metadata support first and then use it
> in a fresh bbvectorizer version. The current hacked version in pocl seems not
> to be upstreamable easily as it has lagged behind some LLVM versions and is
> rather dirty.

Understood. If you have some time, it seems that there are several sub-tasks:

- Update the language reference
- Update the loop vectorizer (to update the metadata when it unrolls)
- Update the regular unroller
- Update the alias analysis (maybe this is sufficient for basic support in BBVectorize) - is your current code close enough for this?
- Update the BB vectorizer to prefer pairings from different iterations

Thanks again,
Hal

> BR,
> --
> --Pekka