Hi Renato,

On 02/08/2013 03:07 PM, Renato Golin wrote:
> In this case, I'd prefer metadata on the variables that are assumed not
> to alias, like the restrict keyword.
>
> It seems to me that having metadata on the loop basic blocks, since they
> can be invalidated, will not help that much with the vectorizer more
> than specific annotation on specific values (which are harder to lose).
> I'm not saying we should annotate *all* memory instructions on a loop,
> just the ones that make sense, or will help the vectorizer default to a
> sane value.

This is an interesting alternative! Do you mean that we would still add the llvm.mem.parallel_loop_access metadata, but only to such mem accesses that are assumed to be "hard or impossible to analyze" (to prove to be no-alias cases)? Then we'd forget about the "parallel loop metadata" as is.

Then we would rely on the regular loop-carried dependency analyzer by default, but let those (mem) annotations just *help* in the "tricky cases".

The llvm.mem.parallel_loop_access metadata would only communicate "this instruction does not alias with any other similarly annotated instruction from any other iteration in this loop".

Quickly thinking, this might work and might not lose the parallelism info too easily. Anyway, the info still has to be connected to a loop to avoid breakup in inlining, multi-level loops, etc.

Summarizing, the new metadata would be:

llvm.loop:
  Just to mark a loop (points to a unique id metadata).

llvm.mem.parallel_loop_access:
  The above-mentioned new semantics, connected to the llvm.loop's id metadata.

What do others think? Nadav?

-- 
Pekka
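To make the proposed semantics concrete, here is a minimal sketch in Python (not LLVM IR, and not an actual LLVM API). The names `MemAccess`, `parallel_loop`, and `assumed_independent` are hypothetical and exist only for illustration. The sketch models the one thing the annotation would assert: two accesses are assumed not to alias across iterations only when both carry the annotation for the same loop id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemAccess:
    """Hypothetical model of a memory instruction.

    `parallel_loop` stands in for llvm.mem.parallel_loop_access and holds
    the id of the llvm.loop it refers to, or None if unannotated.
    """
    name: str
    parallel_loop: Optional[str] = None


def assumed_independent(a: MemAccess, b: MemAccess, loop_id: str) -> bool:
    """True if the annotations alone say that `a` and `b`, taken from
    different iterations of loop `loop_id`, do not alias."""
    return a.parallel_loop == loop_id and b.parallel_loop == loop_id


a = MemAccess("A", parallel_loop="inner")
b = MemAccess("B", parallel_loop="inner")
s = MemAccess("S")  # e.g. an access added later by some pass, unannotated

print(assumed_independent(a, b, "inner"))  # True
print(assumed_independent(a, b, "outer"))  # False: says nothing about the outer loop
print(assumed_independent(a, s, "inner"))  # False: falls back to regular analysis
```

The second call illustrates why the annotation would need to name a loop: after inlining or inside a loop nest, an access annotated for one loop carries no claim about any other loop.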
On Fri, Feb 8, 2013 at 9:16 AM, Pekka Jääskeläinen <pekka.jaaskelainen at tut.fi> wrote:
> Hi Renato,
>
> On 02/08/2013 03:07 PM, Renato Golin wrote:
>>
>> In this case, I'd prefer metadata on the variables that are assumed not
>> to alias, like the restrict keyword.
>>
>> It seems to me that having metadata on the loop basic blocks, since they
>> can be invalidated, will not help that much with the vectorizer more
>> than specific annotation on specific values (which are harder to lose).
>> I'm not saying we should annotate *all* memory instructions on a loop,
>> just the ones that make sense, or will help the vectorizer default to a
>> sane value.
>
> This is an interesting alternative! Do you mean that we would still add
> the llvm.mem.parallel_loop_access metadata, but only to such mem accesses
> that are assumed to be "hard or impossible to analyze" (to prove to be
> no-alias cases)? Then we'd forget about the "parallel loop metadata" as is.
>
> Then we would rely on the regular loop-carried dependency analyzer by
> default, but let those (mem) annotations just *help* in the "tricky cases".
>
> The llvm.mem.parallel_loop_access metadata would only communicate "this
> instruction does not alias with any other similarly annotated instruction
> from any other iteration in this loop".
>
> Quickly thinking, this might work and might not lose the
> parallelism info too easily. Anyway, the info still has to be
> connected to a loop to avoid breakup in inlining, multi-level loops, etc.
>
> Summarizing, the new metadata would be:
>
> llvm.loop:
> Just to mark a loop (points to a unique id metadata).
>
> llvm.mem.parallel_loop_access:
> The above-mentioned new semantics, connected to the llvm.loop's id metadata.

How does this not require you to mark all the possible alias pairs in practice?
I.e., given memory instructions A, B, C, and D, what do you think makes the pair (A, B) hard to analyze (and thus you'd need to mark A and B with this new metadata) that doesn't also make (A, C) hard to analyze?

Is it not usually the case that it is *A* itself that is hard to analyze (because of some property of the memory access), rather than any particular pair?

I'd love to see example cases where the pair analysis is the difficulty, rather than the access analysis of any single memory piece being the difficulty.
On 02/08/2013 04:26 PM, Daniel Berlin wrote:
> I'd love to see example cases where the pair analysis is the
> difficulty, rather than the access analysis of any single memory piece
> being the difficulty.

I'm not completely sure what you mean, but is there really a difference between doing "pair analysis" across multiple iterations of the same instruction and doing it with any other instruction?

Say, a loop with memory operations Aw, Br, Cw (w=write, r=read). This is unrolled just for the sake of the example. The same applies to widening the instructions in a loop vectorizer; similar dependency analysis is needed.

  i1: Aw  Br  Cw
  i2: Aw' Br' Cw'

To do a legal code motion across the two iterations in such a way that you move e.g. Aw' before Br, in parallel with Aw (e.g. to pack them into a vector instruction or to statically schedule them on a VLIW), requires you to know that none of Aw, Br, or Cw accesses the same location as Aw'.

If we know these are parallel loop accesses, we know that Aw' doesn't alias with any of Aw, Br, or Cw. Thus we can perform the code motion with peace of mind and execute Aw and Aw' in parallel, e.g. in a vector instruction.

If some non-parallel-loop-annotated mem instructions get injected into the loop, it's up to the analyzer again to prove that the code motion is legal. For example, take the reg2mem case (where Sw is produced by reg2mem and is not marked with the llvm.mem.parallel_loop_access MD):

  i1: Aw,  Sw,  Br,  Cw
  i2: Aw', Sw', Br', Cw'

We cannot do the same code motion here unless the analyzer can prove that Aw' and Sw do not alias.

-- 
Pekka
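The legality argument above can be sketched as a small check. This is an illustrative Python model only, not how LLVM implements anything. The function name `can_hoist`, its parameters, and the `proven_no_alias` set (standing in for whatever the regular dependency analyzer manages to prove) are all assumptions made for the example. It is also conservative: it ignores whether an access is a read or a write.

```python
def can_hoist(moved, crossed, annotated, proven_no_alias=frozenset()):
    """Return True if `moved` (taken from iteration i+1) may be moved up
    to run alongside iteration i, past every access listed in `crossed`.

    annotated:       names of accesses carrying the parallel-loop annotation
    proven_no_alias: pairs (as frozensets) the analyzer proved independent
    """
    for other in crossed:
        if moved in annotated and other in annotated:
            continue  # the annotation itself promises no cross-iteration alias
        if frozenset((moved, other)) in proven_no_alias:
            continue  # the analyzer proved this pair independent
        return False  # unknown pair: the motion is not known to be legal
    return True


annotated = {"Aw", "Br", "Cw"}

# i1: Aw Br Cw / i2: Aw' Br' Cw'  -> moving Aw' up is allowed
print(can_hoist("Aw", ["Aw", "Br", "Cw"], annotated))        # True

# reg2mem injected an unannotated Sw -> blocked without a proof
print(can_hoist("Aw", ["Aw", "Sw", "Br", "Cw"], annotated))  # False

# ...unless the analyzer proves that Aw' and Sw do not alias
proof = {frozenset(("Aw", "Sw"))}
print(can_hoist("Aw", ["Aw", "Sw", "Br", "Cw"], annotated, proof))  # True
```

In this model the unannotated Sw does not invalidate the annotations on the other accesses; it only sends the (Aw', Sw) pair back to the regular analysis.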