Hendrik Greving via llvm-dev
2020-May-19 14:23 UTC
[llvm-dev] LLVM's loop unroller & llvm.loop.parallel_accesses
Skipping the clang question for now, this would have to be a loop pragma of some kind.

One step back: what we really need is a way to express that memory accesses from different iterations can be re-ordered. The code that's being compiled _is_ noalias, but we don't _have_ to use noalias semantics; loop-parallel semantics are sufficient. What's missing is a way to carry that information past the LLVM unroller. When the unroller merges iterations, the loop-parallel annotation no longer describes the original iterations.

So the obvious idea was to use noalias scope metadata for this: llvm.loop.noalias_accesses would cause the unroller to propagate a different scope for each iteration. It is also conceivable to keep llvm.loop.parallel_accesses and have the unroller propagate a new type of metadata analogous to noalias scopes, a loop_parallel scope or something like that. We have methods to achieve this with intrinsics, but I am looking for something more robust that also works with clang.

On Mon, May 18, 2020 at 8:44 PM Michael Kruse <llvmdev at meinersbur.de> wrote:
> What would be its semantics? When would clang attach that attribute?
>
> Michael
>
> On Mon, May 18, 2020 at 14:01, Hendrik Greving <hgreving at google.com> wrote:
>> Would you guys be open to supporting a new hint with the right
>> semantics, e.g. llvm.loop.noalias_accesses? I would need to find
>> support in clang, however, and the main point of support would be the
>> loop unroller behaving as stated in the OP.
>>
>> On Thu, May 14, 2020 at 3:04 PM Michael Kruse <llvmdev at meinersbur.de> wrote:
>>> Trivial example:
>>>
>>> #pragma clang loop vectorize(assume_safety)
>>> for (int i = 0; i < n; i+=1) {
>>>   (void)A[0];
>>> }
>>>
>>> I hope it is obvious that the loop is parallel and can be vectorized,
>>> but A[0] from iteration 0 will alias with A[0] from iteration 1.
>>> Replace `0` by `i*c` where c is a variable that can be 0 at runtime to
>>> make the fact non-obvious to the compiler.
>>>
>>> We had discussions about implementing "#pragma ivdep", but its
>>> semantics are not defined independently of the implementation. Anyway,
>>> even with #pragma omp ivdep, a compiler is not required to vectorize
>>> the loop.
>>>
>>> In LLVM, runtime/partial unrolling only takes place after
>>> vectorization, so there is less of an issue there.
>>>
>>> Michael
>>>
>>> On Thu, May 14, 2020 at 16:16, Hendrik Greving <hgreving at google.com> wrote:
>>>> This is interesting! So are you saying that loop.parallel_accesses
>>>> is strictly loop-parallel and says nothing about aliasing? I see, I
>>>> guess we may have been "abusing" the hint and re-purposed it. But
>>>> isn't LLVM's vectorizer using loop.parallel_accesses to vectorize
>>>> loops, including their memory accesses, which, if you ignore
>>>> loop-carried dependencies, usually means effectively re-ordering the
>>>> accesses? I guess this still does not imply "noalias"? What about
>>>> icc/gcc's #pragma ivdep? Again, it means no loop-carried
>>>> dependencies, yet still doesn't say anything about noalias? Another
>>>> way would indeed be to propagate noalias data and rely on the future
>>>> fix that Hal mentions above.
>>>>
>>>> On Thu, May 14, 2020 at 1:33 PM Michael Kruse <llvmdev at meinersbur.de> wrote:
>>>>> llvm.loop.parallel_accesses does not imply that these accesses from
>>>>> different iterations are not aliasing. Examples where accesses are
>>>>> parallel are accesses that are atomic or read-only from a specific
>>>>> location.
>>>>>
>>>>> The LoopUnrollPass might deduce that non-atomic stores are
>>>>> necessarily not aliasing (when not using transactional memory), but
>>>>> I don't think we can do this for all the read accesses. Would that
>>>>> be sufficiently useful?
>>>>>
>>>>> Michael
>>>>>
>>>>> On Thu, May 14, 2020 at 15:11, Hendrik Greving via llvm-dev
>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>> Hi, in our backend, which is unfortunately not upstreamed, we are
>>>>>> relying on llvm.loop.parallel_accesses metadata for certain passes
>>>>>> like software pipelining so we can re-order instructions. Ideally,
>>>>>> we would want the loop unroller to preserve the notion of the
>>>>>> loop's parallelism in its pre-unrolled version. This probably
>>>>>> should happen by propagating !alias.scope and !noalias metadata.
>>>>>> Is there any plan or open patch for supporting this?
>>>>>>
>>>>>> Simplified example:
>>>>>>
>>>>>> for.body:
>>>>>>   %0 = load [..]
>>>>>>   store %0 [..]
>>>>>>   br label %for.cond, !llvm.loop !2
>>>>>>
>>>>>> !1 = distinct !{}
>>>>>> !2 = distinct !{!2, !3, !4, !5, !6, !7}
>>>>>> !3 = !{!"llvm.loop.parallel_accesses", !1}
>>>>>> !4 = !{!"llvm.loop.vectorize.width", i32 1}
>>>>>> !5 = !{!"llvm.loop.interleave.count", i32 1}
>>>>>> !6 = !{!"llvm.loop.vectorize.enable", i1 true}
>>>>>> !7 = !{!"llvm.loop.vectorize.followup_all", !8}
>>>>>> !8 = !{!"llvm.loop.unroll.count", i32 2}
>>>>>>
>>>>>> (unroll by 2) =>
>>>>>>
>>>>>> for.body:
>>>>>>   %0 = load [..] !alias.scope !9 !noalias !11
>>>>>>   store %0 [..] !alias.scope !9 !noalias !11
>>>>>>   %1 = load [..] !alias.scope !10 !noalias !12
>>>>>>   store %1 [..] !alias.scope !10 !noalias !12
>>>>>>   br label %for.cond, !llvm.loop !2
>>>>>>
>>>>>> [..]
>>>>>>
>>>>>> !9 = distinct !{!9, !"iteration0"}
>>>>>> !10 = distinct !{!10, !"iteration1"}
>>>>>> !11 = !{!10}
>>>>>> !12 = !{!9}
>>>>>>
>>>>>> Thanks, Hendrik
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
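The per-iteration scope numbering in the simplified example of this thread can be sketched as a small C model. This is only an illustration of the proposal, not code from LLVM: noalias_scopes, print_iteration_scopes, MAX_UNROLL and first_id are hypothetical names, and the node layout mirrors the simplified example rather than the exact scope/domain format in the LangRef. It also assumes a hint that really carries noalias semantics (such as the proposed llvm.loop.noalias_accesses), since llvm.loop.parallel_accesses alone does not justify these annotations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_UNROLL 16u /* arbitrary cap for this sketch */

/* Hypothetical helper: for the copy of original iteration `iter` in a loop
   unrolled `count` times, collect the ids of the scopes it would be marked
   !noalias with, i.e. the scopes of all other iterations. Scope nodes are
   assumed to be numbered first_id .. first_id + count - 1.
   Returns the number of ids written to `out` (always count - 1). */
size_t noalias_scopes(unsigned iter, unsigned count, unsigned first_id,
                      unsigned *out) {
    assert(out != NULL && count >= 1 && count <= MAX_UNROLL && iter < count);
    size_t n = 0;
    for (unsigned other = 0; other < count; ++other) {
        if (other != iter)
            out[n++] = first_id + other;
    }
    return n;
}

/* Prints one scope node per iteration, followed by one noalias list per
   iteration, in the simplified notation used in the thread. */
void print_iteration_scopes(unsigned count, unsigned first_id) {
    unsigned others[MAX_UNROLL];
    assert(count >= 1 && count <= MAX_UNROLL);
    for (unsigned iter = 0; iter < count; ++iter)
        printf("!%u = distinct !{!%u, !\"iteration%u\"}\n",
               first_id + iter, first_id + iter, iter);
    for (unsigned iter = 0; iter < count; ++iter) {
        size_t n = noalias_scopes(iter, count, first_id, others);
        printf("!%u = !{", first_id + count + iter);
        for (size_t k = 0; k < n; ++k)
            printf("%s!%u", k ? ", " : "", others[k]);
        printf("}\n");
    }
}
```

Calling print_iteration_scopes(2, 9) reproduces the four nodes !9 to !12 of the unroll-by-2 example. For larger unroll counts each noalias list grows to count - 1 entries, so the amount of metadata is quadratic in the unroll count.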
Michael Kruse via llvm-dev
2020-May-20 11:41 UTC
[llvm-dev] LLVM's loop unroller & llvm.loop.parallel_accesses
You should know that LLVM's alias infrastructure and metadata currently do not handle cross-iteration aliasing. The current AliasAnalysis interface, when comparing two accesses, assumes you mean executions in the same iteration. This is a common source of bugs, such as http://lists.llvm.org/pipermail/llvm-dev/2019-May/132725.html and the aforementioned https://bugs.llvm.org/show_bug.cgi?id=39282.

It might be difficult to add metadata for something LLVM has no concept of. Of course, we are looking into improving the situation, and we'd be glad if you could join the conference call: http://lists.llvm.org/pipermail/llvm-dev/2020-May/141653.html

At the moment, the way to find out whether accesses from different iterations might alias is DependenceAnalysis (which has its own wrong assumptions about AliasAnalysis, see https://bugs.llvm.org/show_bug.cgi?id=42143). LoopUnroll could query DependenceAnalysis for the iterations before unrolling and add noalias metadata if DA found that they do not alias. However, there is currently no "cross-iteration" analog to alias.scope/noalias metadata that could assist DA, which is where your idea could come into play.

Note that DA is generally a computationally expensive analysis, so there could be questions about whether the gain in additional noalias information is worth the additional compile-time cost (at least up to -O2) of an otherwise "simple" transformation like unrolling.

Michael
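To make the cross-iteration question concrete, here is a minimal C sketch of the kind of check that would have to succeed before an unroller could add noalias metadata. It is not LLVM's DependenceAnalysis API and does not model it; cross_iteration_overlap and its parameters s, b, d and n are hypothetical. It handles only two accesses A[s*i + b] and A[s*j + d] with a common stride, and it ignores integer overflow.

```c
#include <assert.h>

/* Returns 1 if the accesses A[s*i + b] and A[s*j + d] can touch the same
   element in two different iterations i != j of a loop that runs for
   iterations 0 .. n-1, and 0 otherwise.
   Assumption for this sketch: all values are small enough that the
   arithmetic below cannot overflow. */
int cross_iteration_overlap(long s, long b, long d, long n) {
    assert(n >= 0);
    if (n < 2)
        return 0;               /* no pair of distinct iterations exists */
    if (s == 0)
        return b == d;          /* every iteration hits the same element */
    long delta = d - b;         /* s * (i - j) must equal delta */
    if (delta % s != 0)
        return 0;               /* no integer iteration distance */
    long dist = delta / s;      /* the iteration distance i - j */
    if (dist == 0)
        return 0;               /* overlap only within one iteration */
    long mag = dist < 0 ? -dist : dist;
    return mag < n;             /* distance must fit inside the loop */
}
```

With the A[i*c] example from earlier in the thread (b = d = 0, s = c), the check reports an overlap exactly when c is 0, which is the case where the loop is still parallel but its iterations alias. When c is only known at run time, a compile-time analysis has to answer conservatively, so metadata derived this way could not be attached to that loop.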
Hendrik Greving via llvm-dev
2020-May-20 19:07 UTC
[llvm-dev] LLVM's loop unroller & llvm.loop.parallel_accesses
Do you really need to query DependenceAnalysis, though, if the unroller could instead propagate some kind of "parallel scope" metadata based simply on the loop's llvm.loop.parallel_accesses?