Hi,

I am continuing the discussion about Parallel Loop Metadata from here:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-February/059168.html
and here:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-February/058999.html

Pekka suggested that we add two kinds of metadata: llvm.loop.parallel (attached to each loop latch) and llvm.mem.parallel (attached to each memory instruction!). I think that the motivation for the first metadata is clear - it says that the loop is data-parallel. I can also see us adding additional metadata such as llvm.loop.unrollcnt to allow users to control the unroll count of loops using pragmas. That's fine. Pekka, can you think of transformations that may need to invalidate or take this metadata into consideration?

Regarding the second metadata that you proposed, I am a bit skeptical. I don't fully understand its semantics and I am not sure why we need it. And even if we do need it, I think that it would require too many passes to change. It is very important to take the complexity of these features into account. In the past we rejected the parallel 'barrier' semantics because they would have required too many unrelated passes to change.

Nadav
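(To make the proposal concrete, here is a rough sketch of how the two annotations could look on a simple copy loop, written in the IR syntax of the time. The metadata names are the ones proposed in this thread; the exact encoding, including the self-referential loop-identifier node, is only an illustrative assumption, not a settled format.)

  ; Hypothetical IR for "for (i = 0; i < n; ++i) b[i] = a[i];"
  for.body:
    %i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
    %src = getelementptr inbounds float* %a, i64 %i
    %v   = load float* %src, align 4, !llvm.mem.parallel !0
    %dst = getelementptr inbounds float* %b, i64 %i
    store float %v, float* %dst, align 4, !llvm.mem.parallel !0
    %i.next = add nuw nsw i64 %i, 1
    %cmp = icmp ult i64 %i.next, %n
    br i1 %cmp, label %for.body, label %exit, !llvm.loop.parallel !0

  ; Self-referential node identifying this particular loop.
  !0 = metadata !{metadata !0}

The idea is that the latch branch marks the loop as parallel, while every memory access belonging to the parallel loop body points back at the same node.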
Hi Nadav,

On 02/07/2013 07:46 PM, Nadav Rotem wrote:
> Pekka suggested that we add two kinds of metadata: llvm.loop.parallel
> (attached to each loop latch) and llvm.mem.parallel (attached to each memory
> instruction!). I think that the motivation for the first metadata is clear -
> it says that the loop is data-parallel. I can also see us adding additional
> metadata such as llvm.loop.unrollcnt to allow users to control the unroll
> count of loops using pragmas. That's fine. Pekka, can you think of
> transformations that may need to invalidate or take this metadata into
> consideration?

Any pass that introduces new non-parallel memory instructions to the loop because it thinks the loop is sequential and it's OK to do so. I do not know of any such pass other than the one pointed out earlier, reg2mem (if the variables inside the loop body reuse stack slots). E.g., inlining should be safe, and so should unrolling an inner loop inside a parallel loop.

Anyway, the fact that I do not know of more such passes does not mean there aren't any, especially when you consider that there are out-of-tree passes in external projects that use LLVM. Therefore, the "safety first" approach of annotating the memory instructions and falling back to sequential semantics if non-annotated memory instructions are found sounds sensible to me. Your loop unroll metadata example does not need this, as it is not related to the parallel semantics; it works for both parallel and sequential loops.

The other way to go is the "jump in the cold water" approach: assume the parallel loop metadata itself is something that must be respected by all passes or breakage might happen. That is a bit rough and not allowed according to the metadata guidelines. It practically adds a new semantic construct, a parallel loop, to the LLVM IR. Thus, it is then something that all passes potentially *need* to know about in order not to accidentally break the code (by assuming it is a sequential loop and doing transformations that actually make it so).

> Regarding the second metadata that you proposed, I am a bit skeptical. I
> don't fully understand its semantics and I am not sure why we need it. And
> even if we do need it, I think that it would require too many passes to
> change.

It's there exactly to avoid the *need* for passes to know about parallel loop semantics. If they do know about it, they can *optimize* by retaining the loop as a parallel loop, but the fallback to a serial loop should be safe for parallel-loop-unaware passes. E.g., inlining and unrolling might want to update the metadata to still enable, e.g., the vectorizer to treat the loop as parallel. Some kind of helper function should definitely do this somewhere, to make it easy to add "parallel-loop-awareness" to passes.

> It is very important to take the complexity of these features into account.
> In the past we rejected the parallel 'barrier' semantics because they would
> have required too many unrelated passes to change.

The additional llvm.mem.parallel metadata tries to avoid exactly this. Simply put, llvm.loop.parallel by itself is not legal metadata (it cannot be ignored safely) without the other.

That being said, if you know of a better way to guarantee this type of "safe fallback", I'll be happy to implement it.

BR,
--
--Pekka
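(A small illustration of the fallback described above, again using the proposed names, and a made-up spill slot %spill.slot purely for the sake of the example: a parallel-loop-unaware pass such as reg2mem might insert a store into the loop body without copying the annotation.)

  for.body:
    ...
    store float %v, float* %dst, align 4, !llvm.mem.parallel !0
    ; Added by a pass that knows nothing about parallel loops:
    ; a spill to a stack slot, with no !llvm.mem.parallel attached.
    store float %v, float* %spill.slot, align 4
    ...
    br i1 %cmp, label %for.body, label %exit, !llvm.loop.parallel !0

A consumer that requires every memory access in the loop to carry !llvm.mem.parallel pointing at the loop's node would notice the unannotated store and simply treat the loop as sequential, which is exactly the "safe fallback" the per-instruction metadata is meant to buy.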
On Feb 7, 2013, at 10:55 AM, Pekka Jääskeläinen <pekka.jaaskelainen at tut.fi> wrote:
> Hi Nadav,
>
> On 02/07/2013 07:46 PM, Nadav Rotem wrote:
>> Pekka suggested that we add two kinds of metadata: llvm.loop.parallel
>> (attached to each loop latch) and llvm.mem.parallel (attached to each memory
>> instruction!). I think that the motivation for the first metadata is clear -
>> it says that the loop is data-parallel. I can also see us adding additional
>> metadata such as llvm.loop.unrollcnt to allow users to control the unroll
>> count of loops using pragmas. That's fine. Pekka, can you think of
>> transformations that may need to invalidate or take this metadata into
>> consideration?
>
> Any pass that introduces new non-parallel memory instructions to the loop
> because it thinks the loop is sequential and it's OK to do so. I do not know
> of any such pass other than the one pointed out earlier, reg2mem (if the
> variables inside the loop body reuse stack slots). E.g., inlining
> should be safe, and so should unrolling an inner loop inside a parallel loop.

I suggest that we only add the 'llvm.loop.parallel' metadata and not llvm.mem.parallel. I believe that it should be the job of the consumer pass (e.g., the loop vectorizer) to scan the loop and detect parallelism violations. This is also the approach that we use when we optimize stack slots using lifetime markers. I understand that the consumer passes will have to be more conservative and miss some optimizations, but I still think that this is better than forcing different passes in the compiler to know about parallel metadata.
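(Under this alternative the IR would carry only the loop-level marker, e.g., with the same caveats as above about names and encoding:)

  for.body:
    %v = load float* %src, align 4        ; no per-access annotation
    store float %v, float* %dst, align 4
    ...
    br i1 %cmp, label %for.body, label %exit, !llvm.loop.parallel !0

  !0 = metadata !{metadata !0}

and a consumer such as the loop vectorizer would have to scan the body itself, giving up on the parallel assumption whenever it finds a memory access it cannot account for.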
On Thu, Feb 7, 2013 at 6:55 PM, Pekka Jääskeläinen <pekka.jaaskelainen at tut.fi> wrote:
> Hi Nadav,
>
> On 02/07/2013 07:46 PM, Nadav Rotem wrote:
>> Pekka suggested that we add two kinds of metadata: llvm.loop.parallel
>> (attached to each loop latch) and llvm.mem.parallel (attached to each memory
>> instruction!). I think that the motivation for the first metadata is clear -
>> it says that the loop is data-parallel. I can also see us adding additional
>> metadata such as llvm.loop.unrollcnt to allow users to control the unroll
>> count of loops using pragmas. That's fine. Pekka, can you think of
>> transformations that may need to invalidate or take this metadata into
>> consideration?
>
> Any pass that introduces new non-parallel memory instructions to the loop
> because it thinks the loop is sequential and it's OK to do so. I do not know
> of any such pass other than the one pointed out earlier, reg2mem (if the
> variables inside the loop body reuse stack slots). E.g., inlining
> should be safe, and so should unrolling an inner loop inside a parallel loop.
>
> Anyway, the fact that I do not know of more such passes does not mean there
> aren't any, especially when you consider that there are out-of-tree passes
> in external projects that use LLVM. Therefore, the "safety first" approach
> of annotating the memory instructions and falling back to sequential
> semantics if non-annotated memory instructions are found sounds sensible
> to me.

Another possibility would be for passes to be only minimally parallel-metadata aware: a pass that doesn't know enough to correctly preserve parallel metadata simply does nothing when it sees any parallel markers (maybe trying to find the smallest region to which this "don't run yourself" applies, maybe not). In that way the metadata is guaranteed to remain correct, at the cost of missing out on some reorganisations that are done on non-parallel-metadata code. I don't know how well this would fit in with the general philosophy of LLVM passes, though.

(I'm also aware that we're coming at this from different directions: most people are interested in auto-parallelisation, where missing a parallelisation opportunity is just one of those unfortunate things, while I have a personal interest in DSLs which try to present LLVM code with huge "parallelise this" signs pointing at bits of it. It would be frustrating to have carefully made sure the DSL twisted things into a parallelisable form only to have parallelisation/vectorisation "fall at the last hurdle".)

Cheers,
Dave