Hi, I've been working on clang codegen for #pragma ivdep and creating the llvm.mem.parallel_loop_access metadata seems quite difficult. The main problem is that there are so many places where loads and stores are created and all of them need to be changed when emitting a parallel loop. Note that creating llvm.loop.parallel is not a problem. One option is to modify IRBuilder to enable attaching the metadata for loads and stores but that seems like a huge hack. I'd like to reopen the discussion on requiring the llvm.mem.parallel_loop_access metadata. I understand the reason for the metadata is to protect against transformations that may introduce unsafe parallel memory accesses (the reg2mem example.) I'm wondering if perhaps we can make the metadata more user-friendly by providing a single loop metadata which can be expanded into "safer" metadata as required. Specifically a loop pass could be added that expands llvm.loop.parallel into llvm.loop.parallel_protected + llvm.mem.parallel_loop_access. Perhaps there are simpler options I've overlooked? paul
Hi Paul, On 02/28/2013 09:30 PM, Redmond, Paul wrote:> I'd like to reopen the discussion on requiring the > llvm.mem.parallel_loop_access metadata. I understand the reason for the > metadata is to protect against transformations that may introduce unsafe > parallel memory accesses (the reg2mem example.) I'm wondering if perhaps we > can make the metadata more user-friendly by providing a single loop metadata > which can be expanded into "safer" metadata as required. Specifically a loop > pass could be added that expands llvm.loop.parallel into > llvm.loop.parallel_protected + llvm.mem.parallel_loop_access.Did I understand this correctly: llvm.loop.parallel would become a version which does not require the parallel_loop_access? Basically it would be a placeholder for a later pass (before unsafe optimizations?) that converts it to the "safe" metadata pair? Then you can generate the loads/stores in Clang without worrying about the mem metadata but add the safe metadata once and for all. The problem I see with this is that I do not think there is use for "unsafe"/"unprotected" parallel metadata after Clang. Thus, the "unprotected metadata" would be merely book keeping until the final IR is generated in Clang (with the safe metadata). Maybe some other book keeping mechanism could do? If it's not easily workable (I'm not a Clang expert) then I'd propose a descriptive name for the new metadata like llvm.loop.parallel_under_construction that strongly states that it's for this purpose only and should not be used for anything during optimizations (as it's not safe to do so!). -- --Pekka
Hi Pekka, On 2013-02-28, at 4:21 PM, Pekka Jääskeläinen wrote:> Hi Paul, > > On 02/28/2013 09:30 PM, Redmond, Paul wrote: >> I'd like to reopen the discussion on requiring the >> llvm.mem.parallel_loop_access metadata. I understand the reason for the >> metadata is to protect against transformations that may introduce unsafe >> parallel memory accesses (the reg2mem example.) I'm wondering if perhaps we >> can make the metadata more user-friendly by providing a single loop metadata >> which can be expanded into "safer" metadata as required. Specifically a loop >> pass could be added that expands llvm.loop.parallel into >> llvm.loop.parallel_protected + llvm.mem.parallel_loop_access. > > Did I understand this correctly: llvm.loop.parallel would become > a version which does not require the parallel_loop_access? Basically it > would be a placeholder for a later pass (before unsafe optimizations?) that > converts it to the "safe" metadata pair? Then you can generate the loads/stores > in Clang without worrying about the mem metadata but add the safe > metadata once and for all. >Yes, that is correct.> The problem I see with this is that I do not think there is use for > "unsafe"/"unprotected" parallel metadata after Clang. Thus, the > "unprotected metadata" would be merely book keeping until the final IR is > generated in Clang (with the safe metadata). Maybe some other book keeping > mechanism could do? If it's not easily workable (I'm not a Clang expert) > then I'd propose a descriptive name for the new metadata like llvm.loop.parallel_under_construction that strongly states that > it's for this purpose only and should not be used for anything > during optimizations (as it's not safe to do so!). >Those are fair points and I can't say whether or not there are other potential users. Certainly clang could use custom loop metadata and the pass could live in its codegen. Another option is two make Loop::isParallel aware of both safe and unsafe parallel metadata and not requires translation at all. In practice, passes like reg2mem are not run so the concern goes away. Perhaps a single loop metadata is good enough for most uses? paul> -- > --Pekka
----- Original Message -----> From: "Paul Redmond" <paul.redmond at intel.com> > To: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu> > Sent: Thursday, February 28, 2013 1:30:57 PM > Subject: [LLVMdev] parallel loop metadata simplification > > Hi, > > I've been working on clang codegen for #pragma ivdep and creating the > llvm.mem.parallel_loop_access metadata seems quite difficult. The > main problem is that there are so many places where loads and stores > are created and all of them need to be changed when emitting a > parallel loop. Note that creating llvm.loop.parallel is not a > problem. > > One option is to modify IRBuilder to enable attaching the metadata > for loads and stores but that seems like a huge hack.Can you please explain why this is a bad option? To be honest, this is what I expected you to do. The IRBuilder sits on top of all of the loads and stores, and seems like the perfect place to keep state for something that affects "all" loads and stores in some code region. In addition, putting this in IRBuilder should also make it easier to use this feature from other frontends. -Hal> > I'd like to reopen the discussion on requiring the > llvm.mem.parallel_loop_access metadata. I understand the reason for > the metadata is to protect against transformations that may > introduce unsafe parallel memory accesses (the reg2mem example.) I'm > wondering if perhaps we can make the metadata more user-friendly by > providing a single loop metadata which can be expanded into "safer" > metadata as required. Specifically a loop pass could be added that > expands llvm.loop.parallel into llvm.loop.parallel_protected + > llvm.mem.parallel_loop_access. > > Perhaps there are simpler options I've overlooked? > > paul > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
Hi Hal, On 2013-02-28, at 9:33 PM, Hal Finkel wrote:> ----- Original Message ----- >> From: "Paul Redmond" <paul.redmond at intel.com> >> To: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu> >> Sent: Thursday, February 28, 2013 1:30:57 PM >> Subject: [LLVMdev] parallel loop metadata simplification >> >> Hi, >> >> I've been working on clang codegen for #pragma ivdep and creating the >> llvm.mem.parallel_loop_access metadata seems quite difficult. The >> main problem is that there are so many places where loads and stores >> are created and all of them need to be changed when emitting a >> parallel loop. Note that creating llvm.loop.parallel is not a >> problem. >> >> One option is to modify IRBuilder to enable attaching the metadata >> for loads and stores but that seems like a huge hack. > > Can you please explain why this is a bad option? To be honest, this is what I expected you to do. The IRBuilder sits on top of all of the loads and stores, and seems like the perfect place to keep state for something that affects "all" loads and stores in some code region. In addition, putting this in IRBuilder should also make it easier to use this feature from other frontends.One concern I have is that the llvm.mem.parallel_loop_access can refer to nested loops. Is it possible that instructions in the same loop require different metadatas (one is parallel in both inner and outer loop and another is only parallel in the inner loop?) This is a more general problem I think but the IRBuilder would only be useful for brute force adding the same metadata to all instructions with mayReadOrWriteMemory(). Thoughts? paul> > -Hal > >> >> I'd like to reopen the discussion on requiring the >> llvm.mem.parallel_loop_access metadata. I understand the reason for >> the metadata is to protect against transformations that may >> introduce unsafe parallel memory accesses (the reg2mem example.) I'm >> wondering if perhaps we can make the metadata more user-friendly by >> providing a single loop metadata which can be expanded into "safer" >> metadata as required. Specifically a loop pass could be added that >> expands llvm.loop.parallel into llvm.loop.parallel_protected + >> llvm.mem.parallel_loop_access. >> >> Perhaps there are simpler options I've overlooked? >> >> paul >> >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>
Sergei Larin
2013-Mar-01  16:24 UTC
[LLVMdev] Interesting post increment situation in DAG combiner
Hal, (and everyone who might care about post increment generation)...
I have an interesting question/observation. Consider this vector loop.
void vec_add_const(unsigned N, short __attribute__ ((aligned (16))) *A,
                                  short __attribute__ ((aligned (16))) val)
{
 unsigned i,j;
 for (i=0; i<N; i++) {
  for (j=0; j<N; j++) {
   A[i*N+j] += val;
  }
 }
}
The innermost loop looks like this right before the DAG selection begins.
  p.loop_body.us65:                             ; preds %p.loop_body.lr.ph.us78,
%p.loop_body.us65
  %p_arrayidx.us69.phi = phi i16* [ %p_arrayidx.us69.gep,
%p.loop_body.lr.ph.us78 ], [ %p_arrayidx.us69.inc, %p.loop_body.us65 ]
  %p.loopiv48.us66 = phi i32 [ 0, %p.loop_body.lr.ph.us78 ], [
%p.next_loopiv.us67, %p.loop_body.us65 ]
  %vector_ptr.us70 = bitcast i16* %p_arrayidx.us69.phi to <4 x i16>*
  %p.next_loopiv.us67 = add nsw i32 %p.loopiv48.us66, 4 
<<<<<<<<<<<<<<<<<<
IV
  %_p_vec_full.us71 = load <4 x i16>* %vector_ptr.us70, align 16
<<<<<<<<<<<<<<<<<<<Load
  %add5p_vec.us72 = add <4 x i16> %_p_vec_full.us71, %5
  store <4 x i16> %add5p_vec.us72, <4 x i16>* %vector_ptr.us70,
align 16
<<<<<<<<<<<<<<<Store
  %p_arrayidx.us69.inc = getelementptr i16* %p_arrayidx.us69.phi, i32 4
<<<<<<<<<<<<<<< Common Ptr
  %11 = icmp slt i32 %p.next_loopiv.us67, %leftover_lb
  br i1 %11, label %p.loop_body.us65, label %p.loop_header38.preheader.us84
When it gets to the DAG Combiner, in CombineToPostIndexedLoadStore() two
opportunities for post inc are recognized - the load and the store.
Now, you can easily see that in this case you would want the store to get
the post inc, not the load, but since the DAG combiner simply scans
top-down, the opposite happens.
  So here is the question - do you recognize this as a deficiency, and can
you see the same in PPC? The fix is code trivial, but it would introduce a
general concept of a primitive cost function to the PostInc candidacy
selection in DAG combine. If you recognize the issue, I will post a patch
with more details, but if I am missing the big picture here, please advise.
Sergei
---
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
The Linux Foundation