Nicolai Hähnle-Montoro via llvm-dev
2019-Oct-29 15:38 UTC
[llvm-dev] RFC: Removal of noduplicate attribute
On Tue, Oct 29, 2019 at 11:57 AM Savonichev, Andrew via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > These are good points. I think the first question should be: Do we know
> > of any active users of this attribute right now? If not, deprecation
> > seems like something we could do, e.g., through a warning in clang and
> > in the middle-end to ensure other front-ends are aware of it as well.
>
> The noduplicate attribute is still used by the Intel OpenCL Compiler for CPU.
> The main use case is to prevent loop unrolling when an OpenCL barrier is called
> within a loop. Such a loop can be unrolled with its semantics intact, but
> doing so introduces many distinct barrier calls, and each of them has to
> be handled separately.
>
> In other words, "noduplicate" serves as a hint not to unroll a loop if a
> certain function is called in the loop body.

I don't quite understand the reasoning behind this. Is it because your backend turns each individual barrier call into a large chunk of code?

If so, would it be a long-term viable alternative to inform the various code size heuristics about this instead of using `noduplicate`?

Cheers,
Nicolai
--
Learn how the world really is, but never forget how it should be.
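For readers who have not used the attribute, the use case quoted above can be sketched in C roughly as follows. This is only an illustration, assuming Clang's `noduplicate` function attribute; `barrier_stub`, `run_loop`, and the `NODUPLICATE` macro are hypothetical names, and the stub merely counts calls rather than synchronizing anything.

```c
#include <stddef.h>

/* Use Clang's noduplicate attribute where the compiler supports it;
   elsewhere the macro expands to nothing and the code still builds. */
#if defined(__has_attribute)
#  if __has_attribute(noduplicate)
#    define NODUPLICATE __attribute__((noduplicate))
#  endif
#endif
#ifndef NODUPLICATE
#  define NODUPLICATE
#endif

/* Hypothetical stand-in for an OpenCL barrier: it only counts calls. */
static int barrier_calls = 0;
NODUPLICATE void barrier_stub(void);
void barrier_stub(void) { barrier_calls++; }

/* The loop body contains exactly one call site. Unrolling the loop would
   copy that call site once per unrolled iteration, which the attribute
   forbids, so the optimizer has to leave the loop rolled. */
int run_loop(int *data, size_t n) {
    barrier_calls = 0;
    for (size_t i = 0; i < n; ++i) {
        data[i] += 1;
        barrier_stub();
    }
    return barrier_calls;
}
```

The number of dynamic calls is the same whether or not the loop is unrolled; what the attribute constrains is the number of static call sites the later barrier lowering has to handle.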
Savonichev, Andrew via llvm-dev
2019-Oct-29 16:56 UTC
[llvm-dev] RFC: Removal of noduplicate attribute
On 10/29, Nicolai Hähnle-Montoro wrote:
> I don't quite understand the reasoning behind this. Is it because your
> backend turns each individual barrier call into a large chunk of code?

It is even worse than just a large chunk of code: in order to support the OpenCL barrier on CPU (at least in our implementation), we have to significantly change control flow across the entire call chain. Evgeniy Tyurin gave a talk about this at last year's LLVM Developers' Meeting [1][2]. The short summary is: OpenCL barriers on CPU are complicated, and they are *very* expensive in both performance and compile time.

[1]: https://www.youtube.com/watch?v=Mm5ATyqm7Rw
[2]: https://llvm.org/devmtg/2018-10/slides/Tyurin-ImplementingOpenCLCompiler.pdf

> If so, would it be a long-term viable alternative to inform the
> various code size heuristics about this instead of using
> `noduplicate`?

I think so. If we can tell standard LLVM optimizations not to make several calls out of one call, that should be good enough. That is exactly the meaning of the current `noduplicate` attribute, though, so I'm not sure what the difference would be.

Another related problem is that the OpenCL barrier is not an LLVM intrinsic: it is a regular function (declaration) that has the attribute. If we want to inform standard LLVM optimizations about it, this function should be changed to an intrinsic, right?

--
Andrew
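One way to picture the difference being asked about: a code size heuristic expresses a preference that a budget can outweigh, whereas `noduplicate` is a hard constraint. The toy cost model below is a sketch under that assumption only. It is not LLVM's unrolling cost model, and every name and number in it (`loop_body`, `COST_BARRIER_CALL`, the budget) is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-instruction costs; illustrative numbers only,
   not values taken from LLVM. */
enum { COST_ORDINARY = 1, COST_BARRIER_CALL = 1000 };

/* Toy description of a loop body. */
struct loop_body {
    unsigned ordinary_insts;  /* cheap instructions in the body */
    unsigned barrier_calls;   /* calls to the expensive barrier function */
};

/* Estimated size of the loop after unrolling by `factor`. */
unsigned unrolled_cost(struct loop_body body, unsigned factor) {
    unsigned per_iteration = body.ordinary_insts * COST_ORDINARY
                           + body.barrier_calls * COST_BARRIER_CALL;
    return per_iteration * factor;
}

/* Unroll only if the estimated size stays within the budget. */
bool should_unroll(struct loop_body body, unsigned factor, unsigned budget) {
    return unrolled_cost(body, factor) <= budget;
}
```

With a budget of 200, a body of ten ordinary instructions is unrolled by four, while the same body with one barrier call is not. A sufficiently large budget would still allow the second loop to be unrolled, which is where this approach differs from the attribute.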
Finkel, Hal J. via llvm-dev
2019-Oct-29 18:50 UTC
[llvm-dev] RFC: Removal of noduplicate attribute
On 10/29/19 11:56 AM, Savonichev, Andrew via llvm-dev wrote:
> Another related problem is the fact that the OpenCL barrier is not an LLVM
> intrinsic - it is a regular function (declaration) that has the attribute.
> If we want to inform standard LLVM optimizations about it, this function
> should be changed to an intrinsic, right?

That's an option. It's also possible to teach TargetLibraryInfo about it. The optimizer knows about malloc(), but that's not an intrinsic.

-Hal

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
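The idea of recognizing an ordinary function by name, as the optimizer does for malloc(), can be sketched as a small lookup table. The code below is a standalone illustration and does not use or reproduce the TargetLibraryInfo API; the `known_func` structure, the `expensive_to_duplicate` flag, and the symbol name `barrier` are assumptions made for the example, and a real front-end may emit a different symbol.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of what the optimizer is told about a function. */
struct known_func {
    const char *name;
    bool expensive_to_duplicate;
};

/* Hypothetical table; "barrier" is a placeholder symbol name. */
static const struct known_func known_funcs[] = {
    { "malloc",  false },
    { "barrier", true  },
};

/* Return the table entry for `name`, or NULL if the function is unknown. */
const struct known_func *lookup_known_func(const char *name) {
    size_t count = sizeof(known_funcs) / sizeof(known_funcs[0]);
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(known_funcs[i].name, name) == 0) {
            return &known_funcs[i];
        }
    }
    return NULL;
}
```

A pass could consult such a table when it sees a call to a plain declaration, so the function would not need to become an intrinsic for the optimizer to treat it specially.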