On 09/12/2011 04:56 PM, Sebastian Pop wrote:
> On Mon, Sep 12, 2011 at 10:44, Tobias Grosser <tobias at grosser.es> wrote:
>>> You can have the parallel code generation part of Polly working as
>>> a LoopPass.
>>
>> Are you sure about this? In CodeGeneration we basically translate a CLooG
>> AST into LLVM-IR. Without a CLooG AST this does not work.
>
> I mean you could rewrite that code to work on a per-loop basis, like
> Graphite does: flag the parallel loops in the code generation of Polly
> and then, independently of Polly, as a LoopPass, iterate over all the
> loops and code generate the parallel ones.

Yes, that's true and probably a good idea. It is more work than generating it directly in Polly, but it facilitates reuse of code. We can probably reuse quite a bit of the code in the existing OpenMP code generation.

We (or actually someone) may work on three things:

1) The high-level OpenMP intrinsics that can be lowered to actual library calls. These can be reused by a clang OpenMP extension as well as by Polly directly.

2) An OpenMP builder that allows building OpenMP loops from scratch.

3) The actual parallelizer, which takes an LLVM-IR loop and parallelizes it with OpenMP intrinsics.

Cheers,
Tobi
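[To make the division of labor concrete: the parallelizer in point 3 would take a loop whose iterations are provably independent and hand it to the OpenMP layer. A minimal C sketch of the kind of source-level loop this corresponds to is below; the function name and shapes are hypothetical, and the pragma is what the generated IR would be semantically equivalent to. Without -fopenmp the pragma is simply ignored and the loop runs serially.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: a loop with independent iterations, the kind
 * a polyhedral analysis could flag as parallel. The pragma stands in
 * for the OpenMP runtime calls a parallelizer would emit. */
static void saxpy(float *y, const float *x, float a, size_t n) {
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i)
        y[i] = a * x[i] + y[i];   /* each i touches disjoint memory */
}
```

The same transformation at the IR level outlines the loop body into a worker function and passes it to the runtime, which is why a reusable "OpenMP builder" layer (point 2) is attractive: both Polly and a clang extension would need that outlining logic.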
Recently I came across the following spec:
http://gcc.gnu.org/onlinedocs/libgomp/OMP_005fNUM_005fTHREADS.html#OMP_005fNUM_005fTHREADS

It was new to me that OpenMP actually allows specifying "the number of threads to use for the corresponding nested level" in multidimensional loops. Is it correct that in Polly this feature is not supported, and the OpenMP clause is always applied only to the outermost loop?

With this supported, I think we could step much closer to a natural unification of OpenMP and the multidimensional compute grids of GPU programming models.

2011/9/12 Tobias Grosser <tobias at grosser.es>:
> On 09/12/2011 04:56 PM, Sebastian Pop wrote:
>>
>> On Mon, Sep 12, 2011 at 10:44, Tobias Grosser <tobias at grosser.es> wrote:
>>>>
>>>> You can have the parallel code generation part of Polly working as
>>>> a LoopPass.
>>>
>>> Are you sure about this? In CodeGeneration we basically translate a CLooG
>>> AST into LLVM-IR. Without a CLooG AST this does not work.
>>
>> I mean you could rewrite that code to work on a per-loop basis, like
>> Graphite does: flag the parallel loops in the code generation of Polly
>> and then, independently of Polly, as a LoopPass, iterate over all the
>> loops and code generate the parallel ones.
>
> Yes, that's true and probably a good idea. It is more work than generating
> it directly in Polly, but it facilitates reuse of code. We can probably
> reuse quite a bit of the code in the existing OpenMP code generation.
>
> We (or actually someone) may work on three things:
>
> 1) The high-level OpenMP intrinsics that can be lowered to actual library
> calls. These can be reused by a clang OpenMP extension as well as by
> Polly directly.
>
> 2) An OpenMP builder that allows building OpenMP loops from scratch.
>
> 3) The actual parallelizer, which takes an LLVM-IR loop and parallelizes
> it with OpenMP intrinsics.
>
> Cheers,
> Tobi
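[For reference, the nested-level feature means `OMP_NUM_THREADS="4,2"` would give the outer parallel region 4 threads and each nested region 2 threads, much like a 4x2 GPU thread grid. A hedged C sketch of such a loop nest follows; the function and sizes are illustrative, and without an OpenMP runtime both pragmas are ignored and the nest runs serially with identical results.]

```c
#include <assert.h>

/* Hypothetical 2-D loop nest. With nested parallelism enabled and
 * OMP_NUM_THREADS="4,2", the outer level could use 4 threads and each
 * inner team 2 threads -- analogous to a 2-D GPU compute grid. */
static void scale2d(float *m, int rows, int cols, float a) {
    #pragma omp parallel for      /* outer nesting level */
    for (int i = 0; i < rows; ++i) {
        #pragma omp parallel for  /* inner nesting level */
        for (int j = 0; j < cols; ++j)
            m[i * cols + j] *= a;
    }
}
```

The result is the same regardless of the thread assignment; only the work distribution across levels changes, which is exactly the knob the libgomp spec above exposes per nesting depth.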
On 09/12/2011 05:23 PM, Dmitry N. Mikushin wrote:
> Recently I came across the following spec:
> http://gcc.gnu.org/onlinedocs/libgomp/OMP_005fNUM_005fTHREADS.html#OMP_005fNUM_005fTHREADS
>
> It was new to me that OpenMP actually allows specifying "the number of
> threads to use for the corresponding nested level" in multidimensional
> loops. Is it correct that in Polly this feature is not supported, and the
> OpenMP clause is always applied only to the outermost loop?

Jep, it is not used by Polly.

> With this supported, I think we could step much closer to a natural
> unification of OpenMP and the multidimensional compute grids of GPU
> programming models.

I would be very interested to see a use case where this improves performance. In case such use cases appear frequently, it would be nice to see work in this direction. I would be glad to contribute to relevant discussions.

Cheers,
Tobi