Terry Greyzck via llvm-dev
2019-Aug-15 20:05 UTC
[llvm-dev] [LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM
The ivdep pragma is designed to do exactly what the name states - ignore vector dependencies. Cray Research first implemented this in 1978 in their CFT compiler, and has supported it since. This pragma is typically used by application developers who want vectorized code when the compiler cannot automatically determine safety; it is not equivalent to the OpenMP SIMD pragma in that the compiler is still expected to automatically detect such things as reductions. The Cray implementation accepts an optional 'safevl=<const>' clause, however this is not commonly used and not really needed for new implementations. Characteristics of ivdep: * ivdep can be applied to any loop in a nest, but only affects the immediately following loop * Only trims dependencies at that loop nest level * If the user annotates more than one loop in a loop nest, only the ivdep on the innermost loop or loops is honored * ivdep can be used on for (), while {}, and do { } while loops * Primarily ivdep allows ambiguous dependencies to be ignored, examples: * p[i] = q[j] * a[ix[i]] = b[iy[i]] * a[ix[i]] += 1.0 * ivdep still requires automatic detection of reductions, including multiple homogeneous reductions on a single variable, examples: * x = x + a[i] * x = x + a[i]; if ( c[i] > 0.0 ) { x = x + b[i] } * ivdep implies loop control variables, loop termination expressions, and primary induction increment expressions do not alias anything else in the loop nest body * For Fortran, ivdep allows the following for array syntax * Assumption that there is no overlap between the target and right hand side in assignments * Assumption there is no overlap between the target and the address expression for the target Things that ivdep will /not /do: * ivdep on a loop nest will not diagnose or reject loops with 'provable' dependencies * ivdep on a loop nest will not restructure loops as necessary for correctness * will not reorder statements for dependence resolution * will not peel for dependence resolution * will not perform loop distribution to remove recurrences * ivdep does not force vector code to be generated; the compiler can decide to not vectorize the loop nest for any reason it sees fit, including but not limited to: * Expectations the scalar version will be faster than the vector * Non-vectorizable function calls in the loop nest * Vector operations that lack hardware support * Extremely unstructured control flow -- Terry Greyzck | Cray Inc. Sr. Principal Engineer, Compiler Optimization
Doerfert, Johannes via llvm-dev
2019-Aug-19 15:30 UTC
[llvm-dev] [LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM
Hi Terry, I'm curious.> * Primarily ivdep allows ambiguous dependencies to be ignored, examples: > * p[i] = q[j] > * a[ix[i]] = b[iy[i]] > * a[ix[i]] += 1.0 > > * ivdep still requires automatic detection of reductions, including > multiple homogeneous reductions on a single variable, examples: > * x = x + a[i] > * x = x + a[i]; if ( c[i] > 0.0 ) { x = x + b[i] }How do you define the difference between a[ix[i]] += 1.0 and x += 1.0 as you require reduction detection for the latter but seem to ignore the (histogram) reduction dependences for the former. Thanks, Johannes -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 228 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190819/3aaa3187/attachment.sig>
Michael Kruse via llvm-dev
2019-Aug-19 19:33 UTC
[llvm-dev] [LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM
I think some of the semantics could be implemented using the "llvm.mem.parallel_loop_access" annotation we already have, modulo the difficulties mentioned below. Am Do., 15. Aug. 2019 um 15:06 Uhr schrieb Terry Greyzck via llvm-dev <llvm-dev at lists.llvm.org>:> * Primarily ivdep allows ambiguous dependencies to be ignored, examples: > * p[i] = q[j] > * a[ix[i]] = b[iy[i]] > * a[ix[i]] += 1.0"ambiguous dependencies" is very vague. Does it mean the compiler has to do some analysis to detect non-ambiguous dependencies? When using "llvm.mem.parallel_loop_access", this would mean the front-end would have to detect which accesses are non-ambiguous and not annotate them. However, the annotation is for single accesses, not dependencies. Both "p[i]" and "q[j]" look non-ambiguous individually, but the vectorizer would have to add a runtime-check and loop versioning to ensure that these are not aliasing.> * ivdep still requires automatic detection of reductions, including > multiple homogeneous reductions on a single variable, examples: > * x = x + a[i] > * x = x + a[i]; if ( c[i] > 0.0 ) { x = x + b[i] }We could leave away the "llvm.mem.parallel_loop_access" for the LoadInst and StoreInst of the reduction variable, assuming detected reductions are limited over scalar variables. However, mem2reg/sroa would remove those memory accesses anyway, including their annotation, requiring the LoopVectorizer to detect that the resulting PHINode is a reduction. Mem2reg/sroa/LICM would also do so with non-reductions, and array elements that are promoted to registers during the execution of the loop, such that the loop would not be vectorizable. Michael
Terry Greyzck via llvm-dev
2019-Aug-20 19:23 UTC
[llvm-dev] [LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM
The ivdep pragma is intended to assist automatic vectorization - basically, automatic vectorization behaves as it normally does, but if it gets into a spot where it finds a potential dependence, it continues on rather than rejecting the loop. Reductions are part of cycle-breaking; one possible way to identify potential reduction objects is that their address is provably invariant with respect to the vectorized loop. Some examples (assume 'i' is the loop primary induction): x = x + b[i] &x is invariant with respect to the 'i' vector loop, and can be a reduction candidate a[0] = a[0] + b[i] &a[0] is invariant with respect to the 'i' vector loop, and can be a reduction candidate a[ix[i]] = a[ix[i]] + b[i] &a[ix[i]] varies with respect to the 'i' vector loop, and is not a reduction candidate * I am ignoring the end case here where all values of ix[i] are identical * Without an ivdep, this would be considered a histogram (or what Cray used to call 'vector update'), due to possible repeated values in array 'ix' * With an ivdep, this becomes a gather, load and add followed by a scatter When outer loop vectorization is considered, identifying vector reductions becomes somewhat more complicated, and the simple invariant address test is not always sufficient. Examples on request, though that is really a different (and possibly lengthy) discussion. -- Terry Greyzck | Cray Inc. Sr. Principal Engineer, Compiler Optimization On 8/19/2019 10:30 AM, Doerfert, Johannes wrote:> Hi Terry, > > I'm curious. > >> * Primarily ivdep allows ambiguous dependencies to be ignored, examples: >> * p[i] = q[j] >> * a[ix[i]] = b[iy[i]] >> * a[ix[i]] += 1.0 >> >> * ivdep still requires automatic detection of reductions, including >> multiple homogeneous reductions on a single variable, examples: >> * x = x + a[i] >> * x = x + a[i]; if ( c[i] > 0.0 ) { x = x + b[i] } > How do you define the difference between > a[ix[i]] += 1.0 > and > x += 1.0 > as you require reduction detection for the latter but seem to ignore the > (histogram) reduction dependences for the former. > > Thanks, > Johannes
Terry Greyzck via llvm-dev
2019-Aug-26 19:16 UTC
[llvm-dev] [LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM
My intent was to present the expected behavior for an "ivdep" pragma implementation, rather than diving into the implementation details - that seems like it should be another thread. That said, trying to predict in the front end what edges will eventually cause difficulties with automatic vectorization does seem problematic. Generally "ivdep" is an assist to automatic vectorization; for older Cray compilers that basically means the front end does nothing but pass along the "ivdep" property, and dependency analysis for vectorization uses that property directly. One thing to remember is that is perfectly valid for the "ivdep" loop nest to still be rejected as a vector candidate for any reason, so support for an "ivdep" pragma could be implemented in stages if desired. Terry On 8/19/2019 2:33 PM, Michael Kruse wrote:> I think some of the semantics could be implemented using the > "llvm.mem.parallel_loop_access" annotation we already have, modulo the > difficulties mentioned below. > > Am Do., 15. Aug. 2019 um 15:06 Uhr schrieb Terry Greyzck via llvm-dev > <llvm-dev at lists.llvm.org>: >> * Primarily ivdep allows ambiguous dependencies to be ignored, examples: >> * p[i] = q[j] >> * a[ix[i]] = b[iy[i]] >> * a[ix[i]] += 1.0 > "ambiguous dependencies" is very vague. Does it mean the compiler has > to do some analysis to detect non-ambiguous dependencies? > > When using "llvm.mem.parallel_loop_access", this would mean the > front-end would have to detect which accesses are non-ambiguous and > not annotate them. However, the annotation is for single accesses, not > dependencies. Both "p[i]" and "q[j]" look non-ambiguous individually, > but the vectorizer would have to add a runtime-check and loop > versioning to ensure that these are not aliasing. > > >> * ivdep still requires automatic detection of reductions, including >> multiple homogeneous reductions on a single variable, examples: >> * x = x + a[i] >> * x = x + a[i]; if ( c[i] > 0.0 ) { x = x + b[i] } > We could leave away the "llvm.mem.parallel_loop_access" for the > LoadInst and StoreInst of the reduction variable, assuming detected > reductions are limited over scalar variables. However, mem2reg/sroa > would remove those memory accesses anyway, including their annotation, > requiring the LoopVectorizer to detect that the resulting PHINode is a > reduction. Mem2reg/sroa/LICM would also do so with non-reductions, and > array elements that are promoted to registers during the execution of > the loop, such that the loop would not be vectorizable. > > > > Michael >