Philip Reames <listmail at philipreames.com> writes:

> I was not suggesting that you rely on pattern matching predication for
> correctness.  As you point out, that's obviously incorrect.  I was
> assuming that you have a correct but slow lowering for the select
> form.  I was suggesting your ISEL attempt to use a predicated
> instruction where possible for performance.

The whole reason for using predication is performance.  In the presence
of traps, the select form should never even be created in the first
place.

> The point about pattern complexity is an inherent difficulty w/any
> intermediate IR.  We do quite well pattern matching complicated
> constructs in existing backends - x86 SIMD comes to mind - and I'm
> unconvinced that predication is somehow inherently more difficult.

Our experience tells us otherwise.  Intrinsics, and ultimately
first-class IR support, are the most reasonable way to get correctness
and performance.  How should we translate this to get predicated
instructions out?

    for (int i=...) {
      if (fabs(c[i]) > epsilon) {
        a[i] = b[i]/c[i];
      }
      else {
        a[i] = 0;
      }
    }

We can't use select even with constrained intrinsics, because the
constrained intrinsics only tell the optimizer they can't be
speculated.  This is not a legal translation:

    %cond = fabs(c[i]) > epsilon
    %temp = select %cond,
                   llvm.experimental.constrained.fdiv(b[i], c[i], tonearest, maytrap),
                   0
    store a[i], %temp

According to the IR, we've already speculated
llvm.experimental.constrained.fdiv above the test.

I believe the only way to safely do this with the current IR is via
control flow, and now we have to match complex control flow during isel.
Who knows what other things passes may have put into our carefully
constructed basic blocks?

The ARM backend has (had?) logic for trying to match predicated scalar
things.  I would not wish it on any codegen person.

                             -David
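The difference between the select form and the control-flow form can be illustrated with a small C model. This is only a sketch under stated assumptions: `trapping_fdiv`, `trap_seen`, `magnitude`, `div_select_form` and `div_branch_form` are hypothetical names invented for this illustration, and a real trap is modelled by setting a flag rather than by faulting. It is not LLVM code and does not use any LLVM API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a trap: set when a divide by zero executes. */
bool trap_seen = false;

/* Model of a divide that may trap. It records the "trap" instead of faulting. */
float trapping_fdiv(float b, float c) {
    if (c == 0.0f) {
        trap_seen = true;
        return 0.0f;
    }
    return b / c;
}

/* Absolute value helper, written out so the sketch needs no math library. */
float magnitude(float x) {
    return x < 0.0f ? -x : x;
}

/* Select form: the divide is an operand of the select, so it has already
   executed by the time the condition chooses between the two values. */
float div_select_form(float b, float c, float epsilon) {
    float quotient = trapping_fdiv(b, c);   /* runs even when c == 0 */
    return magnitude(c) > epsilon ? quotient : 0.0f;
}

/* Control-flow form: the divide executes only on the guarded path. */
float div_branch_form(float b, float c, float epsilon) {
    if (magnitude(c) > epsilon) {
        return trapping_fdiv(b, c);
    }
    return 0.0f;
}
```

Both functions return the same value for every input, but with `c == 0` only the select form sets `trap_seen`. That is the sense in which the select translation has "already speculated" the divide.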
Constrained intrinsics should be extended to take a mask parameter.
Scalar call sites should be verified to have a constant TRUE mask value.

    llvm.experimental.constrained.fdiv(b[i], c[i], tonearest, maytrap),

Thanks,
Hideki

-----Original Message-----
From: David Greene [mailto:dag at cray.com]
Sent: Thursday, February 21, 2019 10:28 AM
To: Philip Reames <listmail at philipreames.com>
Cc: Simon Moll <moll at cs.uni-saarland.de>; via llvm-dev <llvm-dev at lists.llvm.org>; Maslov, Sergey V <sergey.v.maslov at intel.com>; Saito, Hideki <hideki.saito at intel.com>; Topper, Craig <craig.topper at intel.com>
Subject: Re: [llvm-dev] [RFC] Vector Predication
"Saito, Hideki" <hideki.saito at intel.com> writes:> Constrained intrinsics should be extended to take a mask > parameter. Scalar call sites should be verified to have constant TRUE > mask value. > > llvm.experimental.constrained.fdiv(b[i], c[i], tonearest, maytrap),Yes, they should and I believe that is part of Simon's proposal. -David> -----Original Message----- > From: David Greene [mailto:dag at cray.com] > Sent: Thursday, February 21, 2019 10:28 AM > To: Philip Reames <listmail at philipreames.com> > Cc: Simon Moll <moll at cs.uni-saarland.de>; via llvm-dev > <llvm-dev at lists.llvm.org>; Maslov, Sergey V > <sergey.v.maslov at intel.com>; Saito, Hideki <hideki.saito at intel.com>; > Topper, Craig <craig.topper at intel.com> > Subject: Re: [llvm-dev] [RFC] Vector Predication > > Philip Reames <listmail at philipreames.com> writes: > >> I was not suggesting that you rely on pattern matching predication for >> correctness. As you point out, that's obviously incorrect. I was >> assuming that you have a correct but slow lowering for the select >> form. I was suggesting your ISEL attempt to use a predicated >> instruction where possible for performance. > > The whole reason for using predication is performance. In the >> presence of traps, the select form should never even be created in >> the first place. > >> The point about pattern complexity is an inherent difficulty w/any >> intermediate IR. We do quite well pattern matching complicate >> constructs in existing backends - x86 SIMD comes to mind - and I'm >> unconvinced that predication is somehow inherently more difficult. > > Our experience tells us otherwise. Intrinsics, and ultimately first-class IR support is the most reasonable way to get correctness and performance. How should we translate this to get predicated instructions out? > > for (int i=...) 
{ > if( fabs(c[i]) > epsilon) { > a[i] = b[i]/c[i]; > } > else { > a[i] = 0; > } > } > > We can't use select even with constrained intrinsics, because the constrained intrinsics only tell the optimizer they can't be speculated. > This is not a legal translation: > > %cond = fabs(c[i]) > epsilon > %temp = select %cond, > llvm.experimental.constrained.fdiv(b[i], c[i], tonearest, maytrap), > 0 > store a[i], %temp > > According to the IR, we've already speculated llvm.experimental.constrained.fdiv above the test. > > I believe the only way to safely do this with the current IR is via control flow and now we have to match complex control flow during isel. > Who knows what other things passes may have put into our carefully constructed basic blocks? > > The ARM backend has (had?) logic for trying to match predicated scalar things. I would not wish it on any codegen person. > > -David
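The mask parameter discussed above can be modelled in plain C, as a rough sketch only. The names `masked_fdiv`, `scalar_fdiv` and `passthru` are hypothetical and chosen for this illustration; in particular, writing a `passthru` value into masked-off lanes is an assumption made here so the result is fully defined, not something the thread or the proposal specifies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a masked divide over n lanes. Lanes whose mask
   bit is false never execute the divide, so they cannot trap. */
void masked_fdiv(float *out, const float *b, const float *c,
                 const bool *mask, float passthru, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (mask[i]) {
            out[i] = b[i] / c[i];   /* only enabled lanes divide */
        } else {
            out[i] = passthru;      /* assumption: disabled lanes get passthru */
        }
    }
}

/* Scalar call site modelled as a single lane with a constant true mask. */
float scalar_fdiv(float b, float c) {
    const bool all_true = true;
    float out;
    masked_fdiv(&out, &b, &c, &all_true, 0.0f, 1);
    return out;
}
```

For the loop from the thread, the mask would be `fabs(c[i]) > epsilon` computed per lane and `passthru` would be `0`, so the guarded divide and the `else` store collapse into one masked operation with no control flow left to match during isel.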