thr3ads.net - llvm dev - [LLVMdev] parallel loop metadata simplification [Feb 2013]

If this information is useful, please help other people find it:
Share via:

Redmond, Paul

2013-Feb-28 19:30 UTC

[LLVMdev] parallel loop metadata simplification

Hi,

I've been working on clang codegen for #pragma ivdep and creating the
llvm.mem.parallel_loop_access metadata seems quite difficult. The main problem
is that there are so many places where loads and stores are created and all of
them need to be changed when emitting a parallel loop. Note that creating
llvm.loop.parallel is not a problem.

One option is to modify IRBuilder to enable attaching the metadata for loads and
stores but that seems like a huge hack.

I'd like to reopen the discussion on requiring the
llvm.mem.parallel_loop_access metadata. I understand the reason for the metadata
is to protect against transformations that may introduce unsafe parallel memory
accesses (the reg2mem example.) I'm wondering if perhaps we can make the
metadata more user-friendly by providing a single loop metadata which can be
expanded into "safer" metadata as required. Specifically a loop pass
could be added that expands llvm.loop.parallel into llvm.loop.parallel_protected
+ llvm.mem.parallel_loop_access.

Perhaps there are simpler options I've overlooked?

paul

Pekka Jääskeläinen

2013-Feb-28 21:21 UTC

head link

[LLVMdev] parallel loop metadata simplification

Hi Paul,

On 02/28/2013 09:30 PM, Redmond, Paul wrote:> I'd like to reopen the discussion on requiring the
> llvm.mem.parallel_loop_access metadata. I understand the reason for the
> metadata is to protect against transformations that may introduce unsafe
> parallel memory accesses (the reg2mem example.) I'm wondering if
perhaps we
> can make the metadata more user-friendly by providing a single loop
metadata
> which can be expanded into "safer" metadata as required.
Specifically a loop
> pass could be added that expands llvm.loop.parallel into
> llvm.loop.parallel_protected + llvm.mem.parallel_loop_access.
Did I understand this correctly: llvm.loop.parallel would become
a version which does not require the parallel_loop_access? Basically it
would be a placeholder for a later pass (before unsafe optimizations?) that
converts it to the "safe" metadata pair? Then you can generate the
loads/stores
in Clang without worrying about the mem metadata but add the safe
metadata once and for all.

The problem I see with this is that I do not think there is use for
"unsafe"/"unprotected" parallel metadata after Clang. Thus,
the
"unprotected metadata" would be merely book keeping until the final IR
is
generated in Clang (with the safe metadata). Maybe some other book keeping
mechanism could do? If it's not easily workable (I'm not a Clang expert)
then I'd propose a descriptive name for the new metadata like 
llvm.loop.parallel_under_construction that strongly states that
it's for this purpose only and should not be used for anything
during optimizations (as it's not safe to do so!).

-- 
--Pekka

Redmond, Paul

2013-Feb-28 22:37 UTC

head link

[LLVMdev] parallel loop metadata simplification

Hi Pekka,

On 2013-02-28, at 4:21 PM, Pekka Jääskeläinen wrote:
> Hi Paul,
> 
> On 02/28/2013 09:30 PM, Redmond, Paul wrote:
>> I'd like to reopen the discussion on requiring the
>> llvm.mem.parallel_loop_access metadata. I understand the reason for the
>> metadata is to protect against transformations that may introduce
unsafe
>> parallel memory accesses (the reg2mem example.) I'm wondering if
perhaps we
>> can make the metadata more user-friendly by providing a single loop
metadata
>> which can be expanded into "safer" metadata as required.
Specifically a loop
>> pass could be added that expands llvm.loop.parallel into
>> llvm.loop.parallel_protected + llvm.mem.parallel_loop_access.
> 
> Did I understand this correctly: llvm.loop.parallel would become
> a version which does not require the parallel_loop_access? Basically it
> would be a placeholder for a later pass (before unsafe optimizations?) that
> converts it to the "safe" metadata pair? Then you can generate
the loads/stores
> in Clang without worrying about the mem metadata but add the safe
> metadata once and for all.
> 
Yes, that is correct.
> The problem I see with this is that I do not think there is use for
> "unsafe"/"unprotected" parallel metadata after Clang.
Thus, the
> "unprotected metadata" would be merely book keeping until the
final IR is
> generated in Clang (with the safe metadata). Maybe some other book keeping
> mechanism could do? If it's not easily workable (I'm not a Clang
expert)
> then I'd propose a descriptive name for the new metadata like
llvm.loop.parallel_under_construction that strongly states that
> it's for this purpose only and should not be used for anything
> during optimizations (as it's not safe to do so!).
> 
Those are fair points and I can't say whether or not there are other
potential users. Certainly clang could use custom loop metadata and the pass
could live in its codegen.

Another option is two make Loop::isParallel aware of both safe and unsafe
parallel metadata and not requires translation at all. In practice, passes like
reg2mem are not run so the concern goes away. Perhaps a single loop metadata is
good enough for most uses?

paul
> -- 
> --Pekka

Hal Finkel

2013-Mar-01 02:33 UTC

head link

[LLVMdev] parallel loop metadata simplification

----- Original Message -----> From: "Paul Redmond" <paul.redmond at intel.com>
> To: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu>
> Sent: Thursday, February 28, 2013 1:30:57 PM
> Subject: [LLVMdev] parallel loop metadata simplification
> 
> Hi,
> 
> I've been working on clang codegen for #pragma ivdep and creating the
> llvm.mem.parallel_loop_access metadata seems quite difficult. The
> main problem is that there are so many places where loads and stores
> are created and all of them need to be changed when emitting a
> parallel loop. Note that creating llvm.loop.parallel is not a
> problem.
> 
> One option is to modify IRBuilder to enable attaching the metadata
> for loads and stores but that seems like a huge hack.
Can you please explain why this is a bad option? To be honest, this is what I
expected you to do. The IRBuilder sits on top of all of the loads and stores,
and seems like the perfect place to keep state for something that affects
"all" loads and stores in some code region. In addition, putting this
in IRBuilder should also make it easier to use this feature from other
frontends.

 -Hal
> 
> I'd like to reopen the discussion on requiring the
> llvm.mem.parallel_loop_access metadata. I understand the reason for
> the metadata is to protect against transformations that may
> introduce unsafe parallel memory accesses (the reg2mem example.) I'm
> wondering if perhaps we can make the metadata more user-friendly by
> providing a single loop metadata which can be expanded into
"safer"
> metadata as required. Specifically a loop pass could be added that
> expands llvm.loop.parallel into llvm.loop.parallel_protected +
> llvm.mem.parallel_loop_access.
> 
> Perhaps there are simpler options I've overlooked?
> 
> paul
> 
> 
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Redmond, Paul

2013-Mar-01 16:06 UTC

head link

[LLVMdev] parallel loop metadata simplification

Hi Hal,

On 2013-02-28, at 9:33 PM, Hal Finkel wrote:
> ----- Original Message -----
>> From: "Paul Redmond" <paul.redmond at intel.com>
>> To: "llvmdev at cs.uiuc.edu Dev" <llvmdev at
cs.uiuc.edu>
>> Sent: Thursday, February 28, 2013 1:30:57 PM
>> Subject: [LLVMdev] parallel loop metadata simplification
>> 
>> Hi,
>> 
>> I've been working on clang codegen for #pragma ivdep and creating
the
>> llvm.mem.parallel_loop_access metadata seems quite difficult. The
>> main problem is that there are so many places where loads and stores
>> are created and all of them need to be changed when emitting a
>> parallel loop. Note that creating llvm.loop.parallel is not a
>> problem.
>> 
>> One option is to modify IRBuilder to enable attaching the metadata
>> for loads and stores but that seems like a huge hack.
> 
> Can you please explain why this is a bad option? To be honest, this is what
I expected you to do. The IRBuilder sits on top of all of the loads and stores,
and seems like the perfect place to keep state for something that affects
"all" loads and stores in some code region. In addition, putting this
in IRBuilder should also make it easier to use this feature from other
frontends.
One concern I have is that the llvm.mem.parallel_loop_access can refer to nested
loops. Is it possible that instructions in the same loop require different
metadatas (one is parallel in both inner and outer loop and another is only
parallel in the inner loop?) This is a more general problem I think but the
IRBuilder would only be useful for brute force adding the same metadata to all
instructions with mayReadOrWriteMemory().

Thoughts?

paul
> 
> -Hal
> 
>> 
>> I'd like to reopen the discussion on requiring the
>> llvm.mem.parallel_loop_access metadata. I understand the reason for
>> the metadata is to protect against transformations that may
>> introduce unsafe parallel memory accesses (the reg2mem example.)
I'm
>> wondering if perhaps we can make the metadata more user-friendly by
>> providing a single loop metadata which can be expanded into
"safer"
>> metadata as required. Specifically a loop pass could be added that
>> expands llvm.loop.parallel into llvm.loop.parallel_protected +
>> llvm.mem.parallel_loop_access.
>> 
>> Perhaps there are simpler options I've overlooked?
>> 
>> paul
>> 
>> 
>> 
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>

Sergei Larin

2013-Mar-01 16:24 UTC

head link

[LLVMdev] Interesting post increment situation in DAG combiner

Hal, (and everyone who might care about post increment generation)...

I have an interesting question/observation. Consider this vector loop.

void vec_add_const(unsigned N, short __attribute__ ((aligned (16))) *A,
                                  short __attribute__ ((aligned (16))) val)
{
 unsigned i,j;
 for (i=0; i<N; i++) {
  for (j=0; j<N; j++) {
   A[i*N+j] += val;
  }
 }
}

The innermost loop looks like this right before the DAG selection begins.

  p.loop_body.us65:                             ; preds %p.loop_body.lr.ph.us78,
%p.loop_body.us65
  %p_arrayidx.us69.phi = phi i16* [ %p_arrayidx.us69.gep,
%p.loop_body.lr.ph.us78 ], [ %p_arrayidx.us69.inc, %p.loop_body.us65 ]
  %p.loopiv48.us66 = phi i32 [ 0, %p.loop_body.lr.ph.us78 ], [
%p.next_loopiv.us67, %p.loop_body.us65 ]
  %vector_ptr.us70 = bitcast i16* %p_arrayidx.us69.phi to <4 x i16>*
  %p.next_loopiv.us67 = add nsw i32 %p.loopiv48.us66, 4 
<<<<<<<<<<<<<<<<<<
IV
  %_p_vec_full.us71 = load <4 x i16>* %vector_ptr.us70, align 16
<<<<<<<<<<<<<<<<<<<Load
  %add5p_vec.us72 = add <4 x i16> %_p_vec_full.us71, %5
  store <4 x i16> %add5p_vec.us72, <4 x i16>* %vector_ptr.us70,
align 16
<<<<<<<<<<<<<<<Store
  %p_arrayidx.us69.inc = getelementptr i16* %p_arrayidx.us69.phi, i32 4
<<<<<<<<<<<<<<< Common Ptr
  %11 = icmp slt i32 %p.next_loopiv.us67, %leftover_lb
  br i1 %11, label %p.loop_body.us65, label %p.loop_header38.preheader.us84

When it gets to the DAG Combiner, in CombineToPostIndexedLoadStore() two
opportunities for post inc are recognized - the load and the store.
Now, you can easily see that in this case you would want the store to get
the post inc, not the load, but since the DAG combiner simply scans
top-down, the opposite happens.

  So here is the question - do you recognize this as a deficiency, and can
you see the same in PPC? The fix is code trivial, but it would introduce a
general concept of a primitive cost function to the PostInc candidacy
selection in DAG combine. If you recognize the issue, I will post a patch
with more details, but if I am missing the big picture here, please advise.

Sergei


---
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
The Linux Foundation

Apparently Analagous Threads

Search for more reasonably related threads

llvm dev - Feb 2013 - [LLVMdev] parallel loop metadata simplification

[LLVMdev] parallel loop metadata simplification

[LLVMdev] parallel loop metadata simplification

[LLVMdev] parallel loop metadata simplification

[LLVMdev] parallel loop metadata simplification

[LLVMdev] parallel loop metadata simplification

[LLVMdev] Interesting post increment situation in DAG combiner

Apparently Analagous Threads