Hi Nadav,

I'm interested in knowing how you'll work up the ARM cost model and how easy it'd be to split the work.

As far as I can see, LoopVectorizationCostModel is the class that does all the work, with assistance from the target transform info. Do you think that updating ARMTTI would be the best course of action now, and inspecting the differences in the cost model later?

I also haven't seen anything related to context switches and pipeline decisions in the cost model, another issue that will be quite different between targets and sub-targets (especially in the ARM world). But that can wait...

cheers,
--renato
Hi Renato,

> I'm interested in knowing how you'll work up the ARM cost model and how
> easy it'd be to split the work.

Yes, I am starting to work on the ARM cost model and I would appreciate any help in the form of advice, performance measurements, patches, etc. I tune the cost model by running the cost model analysis pass and comparing its output to the output of llc. For example:

  opt -cost-model -analyze dumper.ll -mtriple=thumbv7 -mcpu=cortex-a15

I also run the vectorizer with -debug-only=loop-vectorize because it dumps the costs of all of the instructions with different vectorization factors, and it also detects the different kinds of shuffles that we support.

> As far as I can see, LoopVectorizationCostModel is the class that does
> all the work, with assistance from the target transform info.

The LoopVectorizationCostModel only predicts which IR will be generated when vectorizing to a specific vector width. It uses TTI to get the cost of each IR instruction. Chandler recently refactored TTI (thanks!) and now TTI is an analysis group. The BasicTTI attempts to handle all of the target-independent logic. It uses the TargetLowering interface to check whether the types are legal and how many times large vectors need to be split. Different targets need to implement the cases that BasicTTI does not catch, for example the cost of zext <8 x i8> to <8 x i32>, which is custom lowered on some targets.

> Do you think that updating ARMTTI would be the best course of action
> now, and inspecting the differences in the cost model later?

We should update TTI and inspect the cost model as we go.

> I also haven't seen anything related to context switches and pipeline
> decisions in the cost model, another issue that will be quite different
> between targets and sub-targets (especially in the ARM world). But that
> can wait...

I am not aware of anything that we can do in regard to context switches. Do you mean the cost of moving GPR to NEON? It's a good point. We need to increase the cost of vector insert/extract. It should be easy to model and we have all of the hooks already. We can use the Subtarget when we implement the hooks. This is an example from ARMTTI:

  unsigned getNumberOfRegisters(bool Vector) const {
    if (Vector) {
      if (ST->hasNEON())
        return 16;
      return 0;
    }

    if (ST->isThumb1Only())
      return 8;
    return 16;
  }

  unsigned getMaximumUnrollFactor() const {
    // These are out-of-order CPUs:
    if (ST->isCortexA15() || ST->isSwift())
      return 2;
    return 1;
  }

Thanks,
Nadav
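To make the "cases that BasicTTI does not catch" point concrete, here is a rough, untested sketch of an ARM-specific cast-cost override. The hook name and fallback mirror the TTI interface discussed in this thread, and the matched entry and returned cost are placeholders to be tuned against llc output, not measured values:

  // Hypothetical sketch only: an ARMTTI override that hand-tunes the cost
  // of casts that are custom lowered on NEON, deferring everything else to
  // the target-independent estimate.
  unsigned ARMTTI::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) const {
    // Example case from the thread: zext <8 x i8> -> <8 x i32> is custom
    // lowered on NEON, so the generic "legalize by splitting" estimate is
    // too pessimistic.
    if (Opcode == Instruction::ZExt && Src->isVectorTy() && Dst->isVectorTy()) {
      VectorType *SrcTy = cast<VectorType>(Src);
      VectorType *DstTy = cast<VectorType>(Dst);
      if (SrcTy->getNumElements() == 8 &&
          SrcTy->getElementType()->isIntegerTy(8) &&
          DstTy->getElementType()->isIntegerTy(32))
        return 2; // placeholder cost; tune by comparing against llc output
    }
    // Fall back to the target-independent logic for everything else.
    return TargetTransformInfo::getCastInstrCost(Opcode, Dst, Src);
  }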
On 9 January 2013 17:10, Nadav Rotem <nrotem at apple.com> wrote:

> For example:
>
>   opt -cost-model -analyze dumper.ll -mtriple=thumbv7 -mcpu=cortex-a15
>
> I also run the vectorizer with -debug-only=loop-vectorize because it
> dumps the costs of all of the instructions with different vectorization
> factors, and it also detects the different kinds of shuffles that we
> support.

Hi Nadav,

These are great ways of debugging the cost model!

> The LoopVectorizationCostModel only predicts which IR will be generated
> when vectorizing to a specific vector width. It uses TTI to get the cost
> of each IR instruction. Chandler recently refactored TTI (thanks!) and
> now TTI is an analysis group. The BasicTTI attempts to handle all of the
> target-independent logic. It uses the TargetLowering interface to check
> whether the types are legal and how many times large vectors need to be
> split. Different targets need to implement the cases that BasicTTI does
> not catch, for example the cost of zext <8 x i8> to <8 x i32>, which is
> custom lowered on some targets.

I'm also thinking about the individual instruction costs (getArithmeticInstrCost, getShuffleCost, etc.). That can be a simple and easily parallelized task. I have the A9 manual, which gives the cost of all instructions (including NEON and VFP); that should give us a head start.

I'm guessing the costs you already have for Intel and in BasicTTI are "ideal cycle counts", not taking into consideration the time needed to get the results, pipeline stalls, etc. In the end, when the model is complete, the individual numbers don't matter much as long as they scale equally, but for now, while we're still relying on BasicTTI, we should follow a similar approach.

> I am not aware of anything that we can do in regard to context switches.
> Do you mean the cost of moving GPR to NEON? It's a good point. We need
> to increase the cost of vector insert/extract. It should be easy to
> model and we have all of the hooks already.

Yes, and pipeline stalls, and intra-instruction behaviour, and A9 oddities, but those are all blue-sky ideas for now. I don't think it'll be a hard engineering problem to know where to put the code, but it won't be easy to get some things right without badly breaking others. Let's be conservative for now... ;)

> We can use the Subtarget when we implement the hooks. This is an example
> from ARMTTI:

Yes, this direct access is very convenient. For now, I'll focus on A9 and later we can add the subtleties of each sub-target.

cheers,
--renato
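As a rough illustration of the per-instruction tuning described above (getArithmeticInstrCost and friends), here is an untested sketch of how A9-style cycle counts might be keyed on opcode and type in ARMTTI. The signature is simplified relative to the real hook, and every number is a placeholder rather than a figure taken from the A9 manual:

  // Hypothetical sketch only: per-opcode "ideal cycle count" costs for NEON
  // vector arithmetic, deferring to the target-independent estimate for
  // anything not listed.
  unsigned ARMTTI::getArithmeticInstrCost(unsigned Opcode, Type *Ty) const {
    if (ST->hasNEON() && Ty->isVectorTy()) {
      switch (Opcode) {
      case Instruction::Add:
      case Instruction::Sub:
        return 1; // placeholder: treat NEON integer add/sub as single-cycle
      case Instruction::Mul:
        return 2; // placeholder: NEON multiply, to be tuned per subtarget
      default:
        break;
      }
    }
    // Fall back to the target-independent logic for everything else.
    return TargetTransformInfo::getArithmeticInstrCost(Opcode, Ty);
  }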