Nicolau Werneck via llvm-dev
2019-May-04 21:51 UTC
[llvm-dev] Vectorizing minimum without function attributes
Thanks for the reply. I should say I'm actually working on 6.0, but I don't think this part of the code changed much since. These are traces I made with GDB optimizing a loop with floats and then integers, showing where they diverge: https://gist.github.com/nlw0/58ed9fda8e8944a9cb5e5a20f6038fcf This is the point I believe we need to set the function attribute for the vectorization to work with floats: https://github.com/llvm/llvm-project/blob/fd254e429ea103be8bab6271855c04919d33f9fb/llvm/lib/Analysis/IVDescriptors.cpp#L590 Could this be a bug? It seems to me it just wasn't changed yet to depend only on instruction flags. I would gladly work on refactoring this if there's an opportunity, but I'm a complete newbie in this project. It would be great to hear from someone more knowledgeable what can be done about this issue, especially if turns out to be a very small patch! On Sat, May 4, 2019 at 3:41 PM Finkel, Hal J. <hfinkel at anl.gov> wrote:> On 5/4/19 6:36 AM, llvm-dev wrote: > > Greetings, > > The LLVM loop vectorizer does a great job handling reductions with the > `min(a, b)` function over an array of integers or floats. This finds the > smallest value of a list exploiting SIMD instructions, and works just as > well as a summation. > > Specifically with floats, though, using the `fcmp` instruction, the > vectorization seems to require the function attribute "no-nans-fp-math" to > be set. Just setting instruction flags is not enough. > > > fcmp takes fast-math flags now, but that wasn't always true (my > recollection is that was a capability added after the arithmetic > operations). In any case, I wonder if this is just a hold-over from before > fcmp took fast-math flags, or if this is an && condition that should be an > || condition. > > > This forces us to give up on fine-grained control of fast-math in the code > in order to benefit from this vectorization. > > How to overcome this? LLVM has intrinsic functions such as `minnum` and > `minimum` (`minnan`) that accurately represent the operation. This could > permit fine-grained control of fast-math flags, although the vectorizer > seems to ignore these intrinsics. > > Beyond this specific case, it would be nice to be sure when is it ever > necessary to set these function attributes, e.g. > > https://github.com/llvm/llvm-project/blob/8205a814a691bfa62fed911b58b0a306ab5efe31/clang/lib/CodeGen/CGCall.cpp#L1743-L1750 > > What would be a way to control the vectorization for `min` without having > to rely on that function attribute? And furthermore, could LLVM > optimizations conceivably depend only on instruction flags, and not ever on > function attributes? What would be necessary to achieve this? > > > The goal has been to eliminate the dependence on the function attributes > once all of the necessary local flags are in place. Obviously I could be > missing something, but this just seems like a bug. > > -Hal > > > Thanks, > > -- > Nicolau Werneck <nwerneck at gmail.com> > http://n <http://nwerneck.sdf.org>ic.hpavc.net > > _______________________________________________ > LLVM Developers mailing listllvm-dev at lists.llvm.orghttps://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > >-- Nicolau Werneck <nwerneck at gmail.com> http://n <http://nwerneck.sdf.org>ic.hpavc.net -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190504/1d91f64c/attachment.html>
Sanjay Patel via llvm-dev
2019-May-06 13:31 UTC
[llvm-dev] Vectorizing minimum without function attributes
We have several problems with FP min/max optimization, so it would help to see an IR example. But I agree that we should not require a function attribute to enable the optimization. This might be the closest match based on the description: https://bugs.llvm.org/show_bug.cgi?id=35538 Other candidates: https://bugs.llvm.org/show_bug.cgi?id=36982 https://bugs.llvm.org/show_bug.cgi?id=34149 https://bugs.llvm.org/show_bug.cgi?id=26956 https://bugs.llvm.org/show_bug.cgi?id=35284 On Sat, May 4, 2019 at 3:51 PM Nicolau Werneck via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Thanks for the reply. I should say I'm actually working on 6.0, but I > don't think this part of the code changed much since. These are traces I > made with GDB optimizing a loop with floats and then integers, showing > where they diverge: > https://gist.github.com/nlw0/58ed9fda8e8944a9cb5e5a20f6038fcf > > This is the point I believe we need to set the function attribute for the > vectorization to work with floats: > > https://github.com/llvm/llvm-project/blob/fd254e429ea103be8bab6271855c04919d33f9fb/llvm/lib/Analysis/IVDescriptors.cpp#L590 > > Could this be a bug? It seems to me it just wasn't changed yet to depend > only on instruction flags. > > I would gladly work on refactoring this if there's an opportunity, but I'm > a complete newbie in this project. It would be great to hear from someone > more knowledgeable what can be done about this issue, especially if turns > out to be a very small patch! > > > On Sat, May 4, 2019 at 3:41 PM Finkel, Hal J. <hfinkel at anl.gov> wrote: > >> On 5/4/19 6:36 AM, llvm-dev wrote: >> >> Greetings, >> >> The LLVM loop vectorizer does a great job handling reductions with the >> `min(a, b)` function over an array of integers or floats. This finds the >> smallest value of a list exploiting SIMD instructions, and works just as >> well as a summation. >> >> Specifically with floats, though, using the `fcmp` instruction, the >> vectorization seems to require the function attribute "no-nans-fp-math" to >> be set. Just setting instruction flags is not enough. >> >> >> fcmp takes fast-math flags now, but that wasn't always true (my >> recollection is that was a capability added after the arithmetic >> operations). In any case, I wonder if this is just a hold-over from before >> fcmp took fast-math flags, or if this is an && condition that should be an >> || condition. >> >> >> This forces us to give up on fine-grained control of fast-math in the >> code in order to benefit from this vectorization. >> >> How to overcome this? LLVM has intrinsic functions such as `minnum` and >> `minimum` (`minnan`) that accurately represent the operation. This could >> permit fine-grained control of fast-math flags, although the vectorizer >> seems to ignore these intrinsics. >> >> Beyond this specific case, it would be nice to be sure when is it ever >> necessary to set these function attributes, e.g. >> >> https://github.com/llvm/llvm-project/blob/8205a814a691bfa62fed911b58b0a306ab5efe31/clang/lib/CodeGen/CGCall.cpp#L1743-L1750 >> >> What would be a way to control the vectorization for `min` without having >> to rely on that function attribute? And furthermore, could LLVM >> optimizations conceivably depend only on instruction flags, and not ever on >> function attributes? What would be necessary to achieve this? >> >> >> The goal has been to eliminate the dependence on the function attributes >> once all of the necessary local flags are in place. Obviously I could be >> missing something, but this just seems like a bug. >> >> -Hal >> >> >> Thanks, >> >> -- >> Nicolau Werneck <nwerneck at gmail.com> >> http://n <http://nwerneck.sdf.org>ic.hpavc.net >> >> _______________________________________________ >> LLVM Developers mailing listllvm-dev at lists.llvm.orghttps://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> >> > > -- > Nicolau Werneck <nwerneck at gmail.com> > http://n <http://nwerneck.sdf.org>ic.hpavc.net > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190506/28ce5ce5/attachment.html>
Nicolau Werneck via llvm-dev
2019-May-06 19:18 UTC
[llvm-dev] Vectorizing minimum without function attributes
Thanks for checking this out. I think your second comment on 35538 must be precisely what I'm talking about. I came up with this IR that might be used for testing: ``` define float @minloop(float* nocapture readonly) #0 { top: %1 = load float, float* %0 br label %loop loop: %2 = phi i64 [ %8, %loop ], [ 1, %top ] %3 = phi float [ %7, %loop ], [ %1, %top ] %4 = getelementptr float, float* %0, i64 %2 %5 = load float, float* %4, align 4 %6 = fcmp fast olt float %3, %5 %7 = select i1 %6, float %3, float %5 %8 = add i64 %2, 1 %9 = icmp eq i64 %8, 65537 br i1 %9, label %out, label %loop out: ret float %7 } attributes #0 = { "no-nans-fp-math"="true"} ``` I can get vectorized code if I use this: opt -S minloop.ll -O2 -force-vector-width=8 Setting the attributes is necessary, and commenting it prevents the vectorization. Neither the fcmp fast flag or the command-line argument -fp-contract=fast have any effect. The code that seems relevant to this issue is 6 years old. In fact, its birthday was just yesterday! https://github.com/llvm/llvm-project/commit/d96e427eacdd4d48da3a1f93a64008ef4dadcc8a#diff-b3c08c1a517adb5ef635cdf9cf224904R3087 If support for fast flags in fcmp was really introduced only after that, then this was probably just never updated, and the fix might be as simple as replacing `HasFunNoNaNAttr` by `I->hasNoNaNs()`. Does that sound like a possible PR? On Mon, May 6, 2019 at 3:31 PM Sanjay Patel <spatel at rotateright.com> wrote:> We have several problems with FP min/max optimization, so it would help to > see an IR example. But I agree that we should not require a function > attribute to enable the optimization. > > This might be the closest match based on the description: > https://bugs.llvm.org/show_bug.cgi?id=35538 > > Other candidates: > https://bugs.llvm.org/show_bug.cgi?id=36982 > https://bugs.llvm.org/show_bug.cgi?id=34149 > https://bugs.llvm.org/show_bug.cgi?id=26956 > https://bugs.llvm.org/show_bug.cgi?id=35284 > > > On Sat, May 4, 2019 at 3:51 PM Nicolau Werneck via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Thanks for the reply. I should say I'm actually working on 6.0, but I >> don't think this part of the code changed much since. These are traces I >> made with GDB optimizing a loop with floats and then integers, showing >> where they diverge: >> https://gist.github.com/nlw0/58ed9fda8e8944a9cb5e5a20f6038fcf >> >> This is the point I believe we need to set the function attribute for the >> vectorization to work with floats: >> >> https://github.com/llvm/llvm-project/blob/fd254e429ea103be8bab6271855c04919d33f9fb/llvm/lib/Analysis/IVDescriptors.cpp#L590 >> >> Could this be a bug? It seems to me it just wasn't changed yet to depend >> only on instruction flags. >> >> I would gladly work on refactoring this if there's an opportunity, but >> I'm a complete newbie in this project. It would be great to hear from >> someone more knowledgeable what can be done about this issue, especially if >> turns out to be a very small patch! >> >> >> On Sat, May 4, 2019 at 3:41 PM Finkel, Hal J. <hfinkel at anl.gov> wrote: >> >>> On 5/4/19 6:36 AM, llvm-dev wrote: >>> >>> Greetings, >>> >>> The LLVM loop vectorizer does a great job handling reductions with the >>> `min(a, b)` function over an array of integers or floats. This finds the >>> smallest value of a list exploiting SIMD instructions, and works just as >>> well as a summation. >>> >>> Specifically with floats, though, using the `fcmp` instruction, the >>> vectorization seems to require the function attribute "no-nans-fp-math" to >>> be set. Just setting instruction flags is not enough. >>> >>> >>> fcmp takes fast-math flags now, but that wasn't always true (my >>> recollection is that was a capability added after the arithmetic >>> operations). In any case, I wonder if this is just a hold-over from before >>> fcmp took fast-math flags, or if this is an && condition that should be an >>> || condition. >>> >>> >>> This forces us to give up on fine-grained control of fast-math in the >>> code in order to benefit from this vectorization. >>> >>> How to overcome this? LLVM has intrinsic functions such as `minnum` and >>> `minimum` (`minnan`) that accurately represent the operation. This could >>> permit fine-grained control of fast-math flags, although the vectorizer >>> seems to ignore these intrinsics. >>> >>> Beyond this specific case, it would be nice to be sure when is it ever >>> necessary to set these function attributes, e.g. >>> >>> https://github.com/llvm/llvm-project/blob/8205a814a691bfa62fed911b58b0a306ab5efe31/clang/lib/CodeGen/CGCall.cpp#L1743-L1750 >>> >>> What would be a way to control the vectorization for `min` without >>> having to rely on that function attribute? And furthermore, could LLVM >>> optimizations conceivably depend only on instruction flags, and not ever on >>> function attributes? What would be necessary to achieve this? >>> >>> >>> The goal has been to eliminate the dependence on the function attributes >>> once all of the necessary local flags are in place. Obviously I could be >>> missing something, but this just seems like a bug. >>> >>> -Hal >>> >>> >>> Thanks, >>> >>> -- >>> Nicolau Werneck <nwerneck at gmail.com> >>> http://n <http://nwerneck.sdf.org>ic.hpavc.net >>> >>> _______________________________________________ >>> LLVM Developers mailing listllvm-dev at lists.llvm.orghttps://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>> >>> >> >> -- >> Nicolau Werneck <nwerneck at gmail.com> >> http://n <http://nwerneck.sdf.org>ic.hpavc.net >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >-- Nicolau Werneck <nwerneck at gmail.com> http://n <http://nwerneck.sdf.org>ic.hpavc.net -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190506/6f1bc49c/attachment.html>
Seemingly Similar Threads
- Vectorizing minimum without function attributes
- Vectorizing minimum without function attributes
- PR43374 - when should comparing NaN values raise a floating point exception?
- Help understanding and lowering LLVM IDS conditional codes correctly
- Proposed new min and max intrinsics