thr3ads.net - llvm dev - [llvm-dev] [LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM [Aug 2019]

If this information is useful, please help other people find it:
Share via:

Terry Greyzck via llvm-dev

2019-Aug-15 20:05 UTC

[llvm-dev] [LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM

The ivdep pragma is designed to do exactly what the name states - ignore
vector dependencies.  Cray Research first implemented this in 1978 in
their CFT compiler, and has supported it since.

This pragma is typically used by application developers who want
vectorized code when the compiler cannot automatically determine safety;
it is not equivalent to the OpenMP SIMD pragma in that the compiler is
still expected to automatically detect such things as reductions.

The Cray implementation accepts an optional 'safevl=<const>'
clause,
however this is not commonly used and not really needed for new
implementations.

Characteristics of ivdep:

   * ivdep can be applied to any loop in a nest, but only affects the
     immediately following loop
       *  Only trims dependencies at that loop nest level
       *  If the user annotates more than one loop in a loop nest, only
           the ivdep on the innermost loop or loops is honored

   * ivdep can be used on for (), while {}, and do { } while loops

   * Primarily ivdep allows ambiguous dependencies to be ignored, examples:
       *  p[i] = q[j]
       *  a[ix[i]] = b[iy[i]]
       *  a[ix[i]] += 1.0

   * ivdep still requires automatic detection of reductions, including
     multiple homogeneous reductions on a single variable, examples:
       *  x = x + a[i]
       *  x = x + a[i]; if ( c[i] > 0.0 ) { x = x + b[i] }

   * ivdep implies loop control variables, loop termination expressions,
     and primary induction increment expressions do not alias anything
     else in the loop nest body

   * For Fortran, ivdep allows the following for array syntax
       *  Assumption that there is no overlap between the target and right
           hand side in assignments
       *  Assumption there is no overlap between the target and the
           address expression for the target

Things that ivdep will /not /do:

   * ivdep on a loop nest will not diagnose or reject loops with
     'provable' dependencies

   * ivdep on a loop nest will not restructure loops as necessary for
     correctness
       * will not reorder statements for dependence resolution
       * will not peel for dependence resolution
       * will not perform loop distribution to remove recurrences

   * ivdep does not force vector code to be generated; the compiler can
     decide to not vectorize the loop nest for any reason it sees fit,
     including but not limited to:
       * Expectations the scalar version will be faster than the vector
       * Non-vectorizable function calls in the loop nest
       * Vector operations that lack hardware support
       * Extremely unstructured control flow

-- 
Terry Greyzck  |  Cray Inc.
Sr. Principal Engineer, Compiler Optimization

Doerfert, Johannes via llvm-dev

2019-Aug-19 15:30 UTC

head link

[llvm-dev] [LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM

Hi Terry,

I'm curious.
>    * Primarily ivdep allows ambiguous dependencies to be ignored, examples:
>        *  p[i] = q[j]
>        *  a[ix[i]] = b[iy[i]]
>        *  a[ix[i]] += 1.0
> 
>    * ivdep still requires automatic detection of reductions, including
>      multiple homogeneous reductions on a single variable, examples:
>        *  x = x + a[i]
>        *  x = x + a[i]; if ( c[i] > 0.0 ) { x = x + b[i] }
How do you define the difference between
  a[ix[i]] += 1.0
and
  x += 1.0
as you require reduction detection for the latter but seem to ignore the
(histogram) reduction dependences for the former.

Thanks,
  Johannes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190819/3aaa3187/attachment.sig>

Michael Kruse via llvm-dev

2019-Aug-19 19:33 UTC

head link

[llvm-dev] [LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM

I think some of the semantics could be implemented using the
"llvm.mem.parallel_loop_access" annotation we already have, modulo the
difficulties mentioned below.

Am Do., 15. Aug. 2019 um 15:06 Uhr schrieb Terry Greyzck via llvm-dev
<llvm-dev at lists.llvm.org>:>    * Primarily ivdep allows ambiguous dependencies to be ignored, examples:
>        *  p[i] = q[j]
>        *  a[ix[i]] = b[iy[i]]
>        *  a[ix[i]] += 1.0
"ambiguous dependencies" is very vague. Does it mean the compiler has
to do some analysis to detect non-ambiguous dependencies?

When using "llvm.mem.parallel_loop_access", this would mean the
front-end would have to detect which accesses are non-ambiguous and
not annotate them. However, the annotation is for single accesses, not
dependencies. Both "p[i]" and "q[j]" look non-ambiguous
individually,
but the vectorizer would have to add a runtime-check and loop
versioning to ensure that these are not aliasing.

>    * ivdep still requires automatic detection of reductions, including
>      multiple homogeneous reductions on a single variable, examples:
>        *  x = x + a[i]
>        *  x = x + a[i]; if ( c[i] > 0.0 ) { x = x + b[i] }
We could leave away the "llvm.mem.parallel_loop_access" for the
LoadInst and StoreInst of the reduction variable, assuming detected
reductions are limited over scalar variables. However, mem2reg/sroa
would remove those memory accesses anyway, including their annotation,
requiring the LoopVectorizer to detect that the resulting PHINode is a
reduction. Mem2reg/sroa/LICM would also do so with non-reductions, and
array elements that are promoted to registers during the execution of
the loop, such that the loop would not be vectorizable.



Michael

Terry Greyzck via llvm-dev

2019-Aug-20 19:23 UTC

head link

[llvm-dev] [LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM

The ivdep pragma is intended to assist automatic vectorization - 
basically, automatic vectorization behaves as it normally does, but if 
it gets into a spot where it finds a potential dependence, it continues 
on rather than rejecting the loop.

Reductions are part of cycle-breaking; one possible way to identify 
potential reduction objects is that their address is provably invariant 
with respect to the vectorized loop.  Some examples (assume 'i' is the 
loop primary induction):

x = x + b[i]
    &x is invariant with respect to the 'i' vector loop, and can be
a
reduction candidate

a[0] = a[0] + b[i]
    &a[0] is invariant with respect to the 'i' vector loop, and can
be a
reduction candidate

a[ix[i]] = a[ix[i]] + b[i]
    &a[ix[i]] varies with respect to the 'i' vector loop, and is not
a
reduction candidate
       *  I am ignoring the end case here where all values of ix[i] are 
identical
       *  Without an ivdep, this would be considered a histogram (or 
what Cray used to call 'vector update'), due to possible repeated values
in array 'ix'
       *  With an ivdep, this becomes a gather, load and add followed by 
a scatter

When outer loop vectorization is considered, identifying vector 
reductions becomes somewhat more complicated, and the simple invariant 
address test is not always sufficient.  Examples on request, though that 
is really a different (and possibly lengthy) discussion.

-- 
Terry Greyzck  |  Cray Inc.
Sr. Principal Engineer, Compiler Optimization

On 8/19/2019 10:30 AM, Doerfert, Johannes wrote:> Hi Terry,
>
> I'm curious.
>
>>     * Primarily ivdep allows ambiguous dependencies to be ignored,
examples:
>>         *  p[i] = q[j]
>>         *  a[ix[i]] = b[iy[i]]
>>         *  a[ix[i]] += 1.0
>>
>>     * ivdep still requires automatic detection of reductions, including
>>       multiple homogeneous reductions on a single variable, examples:
>>         *  x = x + a[i]
>>         *  x = x + a[i]; if ( c[i] > 0.0 ) { x = x + b[i] }
> How do you define the difference between
>    a[ix[i]] += 1.0
> and
>    x += 1.0
> as you require reduction detection for the latter but seem to ignore the
> (histogram) reduction dependences for the former.
>
> Thanks,
>    Johannes

Terry Greyzck via llvm-dev

2019-Aug-26 19:16 UTC

head link

[llvm-dev] [LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM

My intent was to present the expected behavior for an "ivdep" pragma 
implementation, rather than diving into the implementation details - 
that seems like it should be another thread.

That said, trying to predict in the front end what edges will eventually 
cause difficulties with automatic vectorization does seem problematic.  
Generally "ivdep" is an assist to automatic vectorization; for older 
Cray compilers that basically means the front end does nothing but pass 
along the "ivdep" property, and dependency analysis for vectorization 
uses that property directly.

One thing to remember is that is perfectly valid for the "ivdep" loop 
nest to still be rejected as a vector candidate for any reason, so 
support for an "ivdep" pragma could be implemented in stages if
desired.

Terry

On 8/19/2019 2:33 PM, Michael Kruse wrote:> I think some of the semantics could be implemented using the
> "llvm.mem.parallel_loop_access" annotation we already have,
modulo the
> difficulties mentioned below.
>
> Am Do., 15. Aug. 2019 um 15:06 Uhr schrieb Terry Greyzck via llvm-dev
> <llvm-dev at lists.llvm.org>:
>>     * Primarily ivdep allows ambiguous dependencies to be ignored,
examples:
>>         *  p[i] = q[j]
>>         *  a[ix[i]] = b[iy[i]]
>>         *  a[ix[i]] += 1.0
> "ambiguous dependencies" is very vague. Does it mean the compiler
has
> to do some analysis to detect non-ambiguous dependencies?
>
> When using "llvm.mem.parallel_loop_access", this would mean the
> front-end would have to detect which accesses are non-ambiguous and
> not annotate them. However, the annotation is for single accesses, not
> dependencies. Both "p[i]" and "q[j]" look non-ambiguous
individually,
> but the vectorizer would have to add a runtime-check and loop
> versioning to ensure that these are not aliasing.
>
>
>>     * ivdep still requires automatic detection of reductions, including
>>       multiple homogeneous reductions on a single variable, examples:
>>         *  x = x + a[i]
>>         *  x = x + a[i]; if ( c[i] > 0.0 ) { x = x + b[i] }
> We could leave away the "llvm.mem.parallel_loop_access" for the
> LoadInst and StoreInst of the reduction variable, assuming detected
> reductions are limited over scalar variables. However, mem2reg/sroa
> would remove those memory accesses anyway, including their annotation,
> requiring the LoopVectorizer to detect that the resulting PHINode is a
> reduction. Mem2reg/sroa/LICM would also do so with non-reductions, and
> array elements that are promoted to registers during the execution of
> the loop, such that the loop would not be vectorizable.
>
>
>
> Michael
>

llvm dev - Aug 2019 - [LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM

[llvm-dev] [LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM

[llvm-dev] [LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM

[llvm-dev] [LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM

[llvm-dev] [LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM

[llvm-dev] [LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM