thr3ads.net - llvm dev - [LLVMdev] Trip count and Loop Vectorizer [Sep 2013]


Arnold Schwaighofer

2013-Sep-27 17:54 UTC

[LLVMdev] Trip count and Loop Vectorizer

On Sep 27, 2013, at 12:47 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> so you could infer that n must be smaller than 8 (because you know the
> range of the other dimension). The question is how often does such an example
> occur, where this is possible, to make such an effort justifiable?

Smaller than or equal to 8, of course ;)

Murali, Sriram

2013-Sep-27 19:21 UTC


[LLVMdev] Trip count and Loop Vectorizer

Hey Arnold,
I have run into this situation many times while benchmarking.
I think it is best if this is addressed using a simple heuristic. For that, we
need to identify the loop cost and decide if it makes sense to completely unroll
the loop, or partially unroll. I am unsure of the optimal way to implement this
though.

I want to run it by the list to get any ideas floating around :)
Thanks
Sriram


Arnold Schwaighofer

2013-Sep-27 19:56 UTC


[LLVMdev] Trip count and Loop Vectorizer

We have a framework for simple heuristics in place:

1.) We vectorize loops with unknown trip count.
2.) You can find our partial unroll heuristic in selectUnrollFactor. It probably
needs tuning.

  // We unroll the loop in order to expose ILP and reduce the loop overhead.
  // There are many micro-architectural considerations that we can't predict
  // at this level. For example frontend pressure (on decode or fetch) due to
  // code size, or the number and capabilities of the execution ports.
  //
  // We use the following heuristics to select the unroll factor:
  // 1. If the code has reductions then we unroll in order to break the cross
  // iteration dependency.
  // 2. If the loop is really small then we unroll in order to reduce the loop
  // overhead. 

   <<< This is the heuristic that works against your example.

  // 3. We don't unroll if we think that we will spill registers to memory due
  // to the increased register pressure.
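
To make the three heuristics above concrete, here is a minimal standalone sketch of how they could combine, plus one extra guard for the case discussed in this thread (a trip count known to be small). It is not the code in selectUnrollFactor: `LoopSummary`, `pickUnrollFactor`, `kSmallLoopCost` and `kMaxUnroll` are hypothetical names and values chosen for illustration, and the trip-count guard is an assumption about one possible fix, not something the vectorizer is stated to do.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical summary of a loop. None of these names come from LLVM.
struct LoopSummary {
  bool hasReduction;       // loop carries a reduction, e.g. sum += a[i]
  unsigned bodyCost;       // estimated cost of one iteration (arbitrary units)
  unsigned regsNeeded;     // registers live in one copy of the loop body
  unsigned knownTripCount; // upper bound on iterations; 0 means unknown
};

// Illustrative tuning knobs (assumptions, not LLVM's values).
constexpr unsigned kSmallLoopCost = 20;
constexpr unsigned kMaxUnroll = 4;

unsigned pickUnrollFactor(const LoopSummary &L, unsigned vectorWidth,
                          unsigned targetRegs) {
  assert(vectorWidth > 0 && L.regsNeeded > 0);

  // Heuristic 3: never unroll so far that the copies of the body would need
  // more registers than the target has.
  unsigned uf = std::min(kMaxUnroll, std::max(1u, targetRegs / L.regsNeeded));

  // Heuristics 1 and 2: unrolling is only considered worthwhile for
  // reductions (to break the cross-iteration dependency) or for small loops
  // (to amortize the loop overhead).
  if (!L.hasReduction && L.bodyCost >= kSmallLoopCost)
    return 1;

  // Extra guard (an assumption, not part of the quoted comment): if the trip
  // count is known to be small, do not let one unrolled vector iteration
  // cover more elements than the loop will ever execute.
  if (L.knownTripCount != 0)
    while (uf > 1 && vectorWidth * uf > L.knownTripCount)
      uf /= 2;

  return uf;
}
```

With a vector width of 4 and 16 target registers, a small loop (body cost 5, 4 registers) with an unknown trip count gets a factor of 4 in this sketch. The same loop with a trip count known to be at most 8 gets a factor of 2, so a single unrolled vector iteration covers exactly the 8 elements.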

