Johannes Doerfert via llvm-dev
2017-Jan-20 13:32 UTC
[llvm-dev] [RFC] IR-level Region Annotations
On 01/11, Daniel Berlin via llvm-dev wrote:
> > def int_experimental_directive : Intrinsic<[], [llvm_metadata_ty],
> >                                  [IntrArgMemOnly],
> >                                  "llvm.experimental.directive">;
> >
> > def int_experimental_dir_qual : Intrinsic<[], [llvm_metadata_ty],
> >                                 [IntrArgMemOnly],
> >                                 "llvm.experimental.dir.qual">;
> >
> > def int_experimental_dir_qual_opnd : Intrinsic<[],
> >                                      [llvm_metadata_ty, llvm_any_ty],
> >                                      [IntrArgMemOnly],
> >                                      "llvm.experimental.dir.qual.opnd">;
> >
> > def int_experimental_dir_qual_opndlist : Intrinsic<
> >                                          [],
> >                                          [llvm_metadata_ty, llvm_vararg_ty],
> >                                          [IntrArgMemOnly],
> >                                          "llvm.experimental.dir.qual.opndlist">;
> >
> I'll bite.
>
> What does argmemonly mean when the operands are metadata/?
> :)
>
> If the rest is an attempt to keep the intrinsic from being floated or
> removed, i'm strongly against extending a way we already know to have
> significant effect on optimization (fake memory dependence) to do this.
> Particularly for something so major.

I guess that any kind of extension that pretends to have some sort of side
effect will have a significant effect on optimizations. The tricky part is to
find the right representation (and side effects) that implicitly preserves the
semantics/invariants of parallel code while allowing as many transformations
as possible.

[The following paragraph might be a bit abstract. If it is unclear, please
tell me and I will add a code example.]

In the example by Sanjoy [0] we saw that "parallel region markers" need to be
a barrier for alloca movement, though we might want some transformations to
"move" allocas nevertheless, e.g., to aggregate allocas that are executed in
parallel outside the parallel region as a means of communication. To allow a
transformation like this while preventing the movement Sanjoy described, we
probably have to educate some passes on the semantics of "parallel region
markers".
Alternatively, my hope is that if we use known concepts (mainly dominance) to
encode parts of the parallel invariants, such optimizations should come at a
much lower cost.

Cheers,
  Johannes

[0] http://lists.llvm.org/pipermail/llvm-dev/2017-January/109302.html

> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--
Johannes Doerfert
Researcher / PhD Student

Compiler Design Lab (Prof. Hack)
Saarland Informatics Campus, Germany
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : doerfert at cs.uni-saarland.de
Fax. +49 (0)681 302-3065  : http://www.cdl.uni-saarland.de/people/doerfert
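The alloca-movement concern in the message above can be made concrete with a small model. The following is only a sketch, not the example from [0] and not LLVM code: it assumes a toy round-robin scheduler, and the names `worker`, `run_round_robin`, `region_with_private_slots` and `region_with_hoisted_slot` are hypothetical. A one-element list stands in for a stack slot created by an alloca. When each parallel task gets its own slot (alloca inside the region), every task reads back what it stored; when one slot is shared by all tasks (alloca moved out of the region), the tasks overwrite each other.

```python
def worker(tid, slot, out):
    """One parallel task: store tid into its stack slot, then read it back."""
    slot[0] = tid        # store to the slot (models a store to an alloca)
    yield                # scheduling point: other tasks may run here
    out[tid] = slot[0]   # load from the slot


def run_round_robin(tasks):
    """Deterministically interleave tasks, one step per task per round."""
    pending = list(tasks)
    while pending:
        still_running = []
        for task in pending:
            try:
                next(task)
                still_running.append(task)
            except StopIteration:
                pass  # task finished
        pending = still_running


def region_with_private_slots(n):
    """Slot is created inside the region: one fresh slot per task."""
    out = [None] * n
    run_round_robin(worker(tid, [None], out) for tid in range(n))
    return out


def region_with_hoisted_slot(n):
    """Slot was moved out of the region: all tasks share a single slot."""
    out = [None] * n
    shared = [None]
    run_round_robin(worker(tid, shared, out) for tid in range(n))
    return out


print(region_with_private_slots(4))  # [0, 1, 2, 3]
print(region_with_hoisted_slot(4))   # [3, 3, 3, 3]
```

The second result is what a pass that is unaware of the region markers could produce by moving the slot. The intentional variant mentioned above (aggregating slots outside the region for communication) would instead allocate one slot per task outside the region, which keeps the first result.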
Yonghong Yan via llvm-dev
2017-Jan-20 14:49 UTC
[llvm-dev] [RFC] IR-level Region Annotations
Johannes and Sanjoy,

Thank you for the info, illustration and examples.

In Sanjoy's example, the current situation is that the whole parallel region
is outlined as a new function by the frontend, and intrinsic_a includes the
kmp_fork_call, which takes a pointer to that function as input and hands it to
the runtime for parallel execution. This prevents those exotic or evil
optimizations from happening. The same holds for OpenMP task and cilk_spawn
(cilk_spawn specifically requires a new function because of the way it handles
the frame and stack in the runtime).

If we do not let the compiler outline (or if we enable inter-procedural
analysis/optimization) and instead have a parallel region marker (region
metadata, instructions or something else), the region could be marked in one
of two ways:

1) As if it were a sequential loop with 4 iterations, so that most passes for
   sequential code would optimize it properly. However, we would limit
   ourselves to parallel-ignorant optimizations, which is the current
   situation anyway.

2) As a real parallel region, with parallel-aware passes developed
   independently of the parallel-ignorant passes.

In any approach to building the PIR, I would feel safer if we can separate
parallel-ignorant passes from parallel-aware ones, i.e., introduce few or even
no changes to the current passes.
Yonghong Yan
Assistant Professor
Department of Computer Science and Engineering
School of Engineering and Computer Science
Oakland University
Office: EC 534
Phone: 248-370-4087
Email: yan at oakland.edu
www.secs.oakland.edu/~yan
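The outlining scheme described in this message can be sketched as follows. This is a rough model under stated assumptions, not the OpenMP runtime: `fork_call` is a hypothetical stand-in for the runtime entry point the message calls kmp_fork_call, and `outlined_region` and `caller` are illustrative names. The point it tries to show is that once the region body is a separate function passed by pointer, its local variables live in a per-call frame, so intra-procedural passes running on the caller cannot move them across the fork.

```python
import threading


def fork_call(outlined, num_threads, *shared_args):
    """Hypothetical runtime entry: run `outlined` once per thread, then join."""
    threads = [
        threading.Thread(target=outlined, args=(tid, *shared_args))
        for tid in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def outlined_region(tid, out):
    """Region body after outlining; `local` is private to this call frame."""
    local = tid * tid
    out[tid] = local  # each thread writes a distinct element, so no race


def caller(num_threads=4):
    """What remains in the original function: shared data plus the fork."""
    out = [None] * num_threads
    fork_call(outlined_region, num_threads, out)
    return out


print(caller(4))  # [0, 1, 4, 9]
```

The cost of this design, as the message notes, is that the caller only sees an opaque call, so optimizations across the sequential/parallel boundary need inter-procedural analysis or a different representation.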
Johannes Doerfert via llvm-dev
2017-Jan-20 17:41 UTC
[llvm-dev] [RFC] IR-level Region Annotations
On 01/20, Yonghong Yan wrote:
> Johannes and Sanjoy,
>
> Thank you for the info, illustration and examples.

If you are interested, there are some more in the PIR white paper:

  http://compilers.cs.uni-saarland.de/people/doerfert/parallelcfg.pdf

> In Sanjoy's example, the current situation is that the whole parallel
> region is outlined as a new function by the frontend, and intrinsic_a
> includes the kmp_fork_call which takes input a pointer of that function to
> the runtime for parallel execution. Thus it prevented those exotic or evil
> optimization happening. The same thing for OpenMP task and cilk_spawn
> (cilk_spawn specifically requires a new function because of the way it
> handles the frame and stack in the runtime)
>
> if we do not let the compiler outline (or enable inter-procedural
> analysis/optimization) and have a parallel region marker thing (either
> region metadata, instructions or others). The region could be marked 1) as
> if it is a sequential loop with 4 iterations and then most of the pass for
> sequential code would do proper optimization. We however limit us for only
> parallel-ignorant optimization, which is the current situation regardless
> of that. or 2) The region is marked as a real parallel region and develop
> parallel-aware passes that are independent of parallel-ignorant passes. In
> any approach of making the PIR, I feel safer if we can separate
> parallel-ignorant passes and parallel-aware one, i.e. not to introduce too
> much or even no changes to the current passes.

I agree, especially with your last sentence. Optimizations that explicitly
target parallel regions/loops/... should be developed and maintained
separately, with different heuristics, etc.
At the same time, I don't think it is too far-fetched to expect some existing
passes to work even across the sequential-to-parallel boundary with little to
no modification, if and only if that boundary is designed "in a certain way".