thr3ads.net - llvm dev - [LLVMdev] Interesting post increment situation in DAG combiner [Mar 2013]

If this information is useful, please help other people find it:
Share via:

Sergei Larin

2013-Mar-01 16:24 UTC

[LLVMdev] Interesting post increment situation in DAG combiner

Hal, (and everyone who might care about post increment generation)...

I have an interesting question/observation. Consider this vector loop.

void vec_add_const(unsigned N, short __attribute__ ((aligned (16))) *A,
                                  short __attribute__ ((aligned (16))) val)
{
 unsigned i,j;
 for (i=0; i<N; i++) {
  for (j=0; j<N; j++) {
   A[i*N+j] += val;
  }
 }
}

The innermost loop looks like this right before the DAG selection begins.

  p.loop_body.us65:                             ; preds %p.loop_body.lr.ph.us78,
%p.loop_body.us65
  %p_arrayidx.us69.phi = phi i16* [ %p_arrayidx.us69.gep,
%p.loop_body.lr.ph.us78 ], [ %p_arrayidx.us69.inc, %p.loop_body.us65 ]
  %p.loopiv48.us66 = phi i32 [ 0, %p.loop_body.lr.ph.us78 ], [
%p.next_loopiv.us67, %p.loop_body.us65 ]
  %vector_ptr.us70 = bitcast i16* %p_arrayidx.us69.phi to <4 x i16>*
  %p.next_loopiv.us67 = add nsw i32 %p.loopiv48.us66, 4 
<<<<<<<<<<<<<<<<<<
IV
  %_p_vec_full.us71 = load <4 x i16>* %vector_ptr.us70, align 16
<<<<<<<<<<<<<<<<<<<Load
  %add5p_vec.us72 = add <4 x i16> %_p_vec_full.us71, %5
  store <4 x i16> %add5p_vec.us72, <4 x i16>* %vector_ptr.us70,
align 16
<<<<<<<<<<<<<<<Store
  %p_arrayidx.us69.inc = getelementptr i16* %p_arrayidx.us69.phi, i32 4
<<<<<<<<<<<<<<< Common Ptr
  %11 = icmp slt i32 %p.next_loopiv.us67, %leftover_lb
  br i1 %11, label %p.loop_body.us65, label %p.loop_header38.preheader.us84

When it gets to the DAG Combiner, in CombineToPostIndexedLoadStore() two
opportunities for post inc are recognized - the load and the store.
Now, you can easily see that in this case you would want the store to get
the post inc, not the load, but since the DAG combiner simply scans
top-down, the opposite happens.

  So here is the question - do you recognize this as a deficiency, and can
you see the same in PPC? The fix is code trivial, but it would introduce a
general concept of a primitive cost function to the PostInc candidacy
selection in DAG combine. If you recognize the issue, I will post a patch
with more details, but if I am missing the big picture here, please advise.

Sergei


---
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
The Linux Foundation

Hal Finkel

2013-Mar-01 22:52 UTC

head link

[LLVMdev] Interesting post increment situation in DAG combiner

----- Original Message -----> From: "Sergei Larin" <slarin at codeaurora.org>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Friday, March 1, 2013 10:24:39 AM
> Subject: Interesting post increment situation in DAG combiner
> 
> Hal, (and everyone who might care about post increment generation)...
Sergei,

Perhaps this is a problem that I wish that I had ;) -- PPC does not have
post-increment loads and stores, only pre-increment ones. Nevertheless, I think
that the situation is similar...

For one thing, I recently committed an enhancement to pre-increment generation
that causes later constant offsets to use the new incremented address instead of
the old address. I thought that this would not be an issue for post-increment
generation, but it seems that I was wrong: you'd have the same problem here:
if you post-increment the load and not the store, then you might need an extra
register to hold the original base address for the store. In this case,
you'd not have a problem if you just chose to post-increment the store
instead, but the general problem still exists.

Regarding the selection of which load/store to pre/post increment, I think that
this is also a general issue. At least for pre-increment generation I've
seen it make some odd choices.

In short, I do recognize the issue, and I'm curious to see your patch. If it
can improve pre-increment selection as well, that would help me too.

Thanks again,
Hal
> 
> I have an interesting question/observation. Consider this vector
> loop.
> 
> void vec_add_const(unsigned N, short __attribute__ ((aligned (16)))
> *A,
>                                   short __attribute__ ((aligned
>                                   (16))) val)
> {
>  unsigned i,j;
>  for (i=0; i<N; i++) {
>   for (j=0; j<N; j++) {
>    A[i*N+j] += val;
>   }
>  }
> }
> 
> The innermost loop looks like this right before the DAG selection
> begins.
> 
>   p.loop_body.us65:                             ; preds >
%p.loop_body.lr.ph.us78, %p.loop_body.us65
>   %p_arrayidx.us69.phi = phi i16* [ %p_arrayidx.us69.gep,
> %p.loop_body.lr.ph.us78 ], [ %p_arrayidx.us69.inc, %p.loop_body.us65
> ]
>   %p.loopiv48.us66 = phi i32 [ 0, %p.loop_body.lr.ph.us78 ], [
> %p.next_loopiv.us67, %p.loop_body.us65 ]
>   %vector_ptr.us70 = bitcast i16* %p_arrayidx.us69.phi to <4 x i16>*
>   %p.next_loopiv.us67 = add nsw i32 %p.loopiv48.us66, 4
>    <<<<<<<<<<<<<<<<<<
> IV
>   %_p_vec_full.us71 = load <4 x i16>* %vector_ptr.us70, align 16
>
<<<<<<<<<<<<<<<<<<<Load
>   %add5p_vec.us72 = add <4 x i16> %_p_vec_full.us71, %5
>   store <4 x i16> %add5p_vec.us72, <4 x i16>* %vector_ptr.us70,
align
>   16
> <<<<<<<<<<<<<<<Store
>   %p_arrayidx.us69.inc = getelementptr i16* %p_arrayidx.us69.phi, i32
>   4
> <<<<<<<<<<<<<<< Common Ptr
>   %11 = icmp slt i32 %p.next_loopiv.us67, %leftover_lb
>   br i1 %11, label %p.loop_body.us65, label
>   %p.loop_header38.preheader.us84
> 
> When it gets to the DAG Combiner, in CombineToPostIndexedLoadStore()
> two
> opportunities for post inc are recognized - the load and the store.
> Now, you can easily see that in this case you would want the store to
> get
> the post inc, not the load, but since the DAG combiner simply scans
> top-down, the opposite happens.
> 
>   So here is the question - do you recognize this as a deficiency,
>   and can
> you see the same in PPC? The fix is code trivial, but it would
> introduce a
> general concept of a primitive cost function to the PostInc candidacy
> selection in DAG combine. If you recognize the issue, I will post a
> patch
> with more details, but if I am missing the big picture here, please
> advise.
> 
> Sergei
> 
> 
> ---
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by
> The Linux Foundation
> 
> 
> 
>

Sergei Larin

2013-Mar-01 23:16 UTC

head link

[LLVMdev] Interesting post increment situation in DAG combiner

Hal,

  Here is my patch for the post inc case. I think it is symmetrically applicable
to the pre-inc, but I have not tested it for that.
I think you can clearly see my intent here - I simply select the
"latest" candidate when multiple are available.

   Who else might be interested in this?

Sergei

---
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The
Linux Foundation

> -----Original Message-----
> From: Hal Finkel [mailto:hfinkel at anl.gov]
> Sent: Friday, March 01, 2013 4:53 PM
> To: Sergei Larin
> Cc: llvmdev at cs.uiuc.edu
> Subject: Re: Interesting post increment situation in DAG combiner
> 
> ----- Original Message -----
> > From: "Sergei Larin" <slarin at codeaurora.org>
> > To: "Hal Finkel" <hfinkel at anl.gov>
> > Cc: llvmdev at cs.uiuc.edu
> > Sent: Friday, March 1, 2013 10:24:39 AM
> > Subject: Interesting post increment situation in DAG combiner
> >
> > Hal, (and everyone who might care about post increment generation)...
> 
> Sergei,
> 
> Perhaps this is a problem that I wish that I had ;) -- PPC does not
> have post-increment loads and stores, only pre-increment ones.
> Nevertheless, I think that the situation is similar...
> 
> For one thing, I recently committed an enhancement to pre-increment
> generation that causes later constant offsets to use the new
> incremented address instead of the old address. I thought that this
> would not be an issue for post-increment generation, but it seems that
> I was wrong: you'd have the same problem here: if you post-increment
> the load and not the store, then you might need an extra register to
> hold the original base address for the store. In this case, you'd not
> have a problem if you just chose to post-increment the store instead,
> but the general problem still exists.
> 
> Regarding the selection of which load/store to pre/post increment, I
> think that this is also a general issue. At least for pre-increment
> generation I've seen it make some odd choices.
> 
> In short, I do recognize the issue, and I'm curious to see your patch.
> If it can improve pre-increment selection as well, that would help me
> too.
> 
> Thanks again,
> Hal
> 
> >
> > I have an interesting question/observation. Consider this vector
> loop.
> >
> > void vec_add_const(unsigned N, short __attribute__ ((aligned (16)))
> > *A,
> >                                   short __attribute__ ((aligned
> >                                   (16))) val) {  unsigned i,j;  for
> > (i=0; i<N; i++) {
> >   for (j=0; j<N; j++) {
> >    A[i*N+j] += val;
> >   }
> >  }
> > }
> >
> > The innermost loop looks like this right before the DAG selection
> > begins.
> >
> >   p.loop_body.us65:                             ; preds > >
%p.loop_body.lr.ph.us78, %p.loop_body.us65
> >   %p_arrayidx.us69.phi = phi i16* [ %p_arrayidx.us69.gep,
> > %p.loop_body.lr.ph.us78 ], [ %p_arrayidx.us69.inc, %p.loop_body.us65
> ]
> >   %p.loopiv48.us66 = phi i32 [ 0, %p.loop_body.lr.ph.us78 ], [
> > %p.next_loopiv.us67, %p.loop_body.us65 ]
> >   %vector_ptr.us70 = bitcast i16* %p_arrayidx.us69.phi to <4 x
i16>*
> >   %p.next_loopiv.us67 = add nsw i32 %p.loopiv48.us66, 4
> >   
<<<<<<<<<<<<<<<<<<
> > IV
> >   %_p_vec_full.us71 = load <4 x i16>* %vector_ptr.us70, align 16
> >
<<<<<<<<<<<<<<<<<<<Load
> >   %add5p_vec.us72 = add <4 x i16> %_p_vec_full.us71, %5
> >   store <4 x i16> %add5p_vec.us72, <4 x i16>*
%vector_ptr.us70, align
> >   16
> > <<<<<<<<<<<<<<<Store
> >   %p_arrayidx.us69.inc = getelementptr i16* %p_arrayidx.us69.phi, i32
> >   4
> > <<<<<<<<<<<<<<< Common
Ptr
> >   %11 = icmp slt i32 %p.next_loopiv.us67, %leftover_lb
> >   br i1 %11, label %p.loop_body.us65, label
> >   %p.loop_header38.preheader.us84
> >
> > When it gets to the DAG Combiner, in CombineToPostIndexedLoadStore()
> > two opportunities for post inc are recognized - the load and the
> > store.
> > Now, you can easily see that in this case you would want the store to
> > get the post inc, not the load, but since the DAG combiner simply
> > scans top-down, the opposite happens.
> >
> >   So here is the question - do you recognize this as a deficiency,
> >   and can
> > you see the same in PPC? The fix is code trivial, but it would
> > introduce a general concept of a primitive cost function to the
> > PostInc candidacy selection in DAG combine. If you recognize the
> > issue, I will post a patch with more details, but if I am missing the
> > big picture here, please advise.
> >
> > Sergei
> >
> >
> > ---
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> > hosted by The Linux Foundation
> >
> >
> >
> >-------------- next part --------------
A non-text attachment was scrubbed...
Name: DagCombiner_PostInc.patch
Type: application/octet-stream
Size: 3444 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130301/97fc5162/attachment.obj>

Maybe Matching Threads

Search for more apparently analagous threads

llvm dev - Mar 2013 - [LLVMdev] Interesting post increment situation in DAG combiner

[LLVMdev] Interesting post increment situation in DAG combiner

[LLVMdev] Interesting post increment situation in DAG combiner

[LLVMdev] Interesting post increment situation in DAG combiner

Maybe Matching Threads