Arsenault, Matthew via llvm-dev
2016-Jan-05 16:25 UTC
[llvm-dev] TargetTransformInfo getOperationCost uses
Hi,

I'm trying to implement the TTI hooks for AMDGPU to avoid unrolling loops for operations with huge expansions (e.g. integer division).

The values ultimately reported by opt -cost-model -analyze (the actual cost model tests) seem not to matter for this. The huge cost I've assigned to division doesn't prevent the loop from being unrolled, because it isn't actually consulted during loop unrolling. The loop unroller uses CodeMetrics, which via getUserCost ultimately uses TargetTransformInfoImplBase::getOperationCost(), which returns various fixed values (4 for division, i.e. TCC_Expensive, but that isn't nearly expensive enough).

getOperationCost estimates from only the opcode and type, so it doesn't require a Value. No target overrides it. The hooks that targets do really implement, like getArithmeticInstrCost, use some information about the operands and so require a Value. These don't appear to be used at all by the cost model, only in specific places in some passes.

Why don't any targets override getOperationCost, and why aren't there any other hooks that change its behavior? Having these two parallel paths for operation costs is confusing, especially since I expected to be able to use the -cost-model output for testing all of the computed costs.

-Matt
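For concreteness, here is a rough sketch of the kind of target override I mean (the enclosing class, constructor, etc. are elided, the signature follows the TTI interface as I understand it in the current tree, and the cost number is made up, so treat this as illustrative rather than the actual AMDGPU code):

// Sketch only: member of a target TTI implementation, e.g. a class derived
// from BasicTTIImplBase<...>; the surrounding class is omitted.
unsigned getArithmeticInstrCost(
    unsigned Opcode, Type *Ty,
    TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
    TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
    TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
    TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None) {
  switch (Opcode) {
  case Instruction::SDiv:
  case Instruction::UDiv:
  case Instruction::SRem:
  case Instruction::URem:
    // Integer division expands to a long instruction sequence on this
    // target, so report a big cost (100 is an arbitrary illustration).
    return 100;
  default:
    return BaseT::getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info,
                                         Opd1PropInfo, Opd2PropInfo);
  }
}

Something like this shows up in the opt -cost-model -analyze output, but as described above it never influences the unroller, which goes through getUserCost/getOperationCost instead.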
Hal Finkel via llvm-dev
2016-Jan-05 21:11 UTC
[llvm-dev] TargetTransformInfo getOperationCost uses
Hi Matt,

So the problem here is that, as you imply, TTI has two cost models: one is used for vectorization (and has received a fair amount of attention), and one is used for inlining/unrolling (and has received less attention).

The vectorization cost model is, generally speaking, concerned with instruction throughputs, and is used to estimate the relative speed of a vectorized loop vs. the scalar one (it assumes that a proper amount of ILP is available, or created by interleaving, such that the throughputs matter much more than the latencies).

The "user" cost model, used by the inliner and unroller, is concerned with estimating something more closely related to code size (although there is obviously a desired correlation with performance). It has primarily been customized to let the inliner/unroller understand, on a target-specific basis, that certain zexts/sexts are free, etc.

As you also imply, there has been a lot more target-specific customization work on the vectorization cost model. To be honest, I don't think this situation is ideal (we could have one cost model that returns information on size, latency, and throughput and is used by all clients). Nevertheless, hopefully this makes things somewhat clearer.

-Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
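To make the split concrete, here is a rough sketch of the two query paths for the same instruction (illustrative only; the exact signatures may differ slightly from what's in your tree):

#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

// For an arithmetic instruction such as an sdiv:
//  - getUserCost() is what CodeMetrics (and hence the unroller/inliner)
//    sees; it bottoms out in getOperationCost() and the fixed TCC_* buckets.
//  - getArithmeticInstrCost() is the per-opcode hook targets actually
//    override, and is the kind of number -cost-model style queries report.
void printBothCosts(const TargetTransformInfo &TTI, Instruction &I) {
  errs() << "user cost:       " << TTI.getUserCost(&I) << "\n";
  errs() << "throughput cost: "
         << TTI.getArithmeticInstrCost(I.getOpcode(), I.getType()) << "\n";
}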
Sanjay Patel via llvm-dev
2016-Jan-06 01:12 UTC
[llvm-dev] TargetTransformInfo getOperationCost uses
Side note on TargetTransformInfoImplBase::getOperationCost() and TCC_Expensive: as part of fixing https://llvm.org/bugs/show_bug.cgi?id=24818, I made div/rem instructions default to expensive. GPU targets should probably override those back to "TCC_Basic" so CodeGenPrepare won't despeculate execution of those ops when they're operands of a select.

If we continue along that path (default costs are based on typical CPUs rather than GPUs... because there are more of those targets?), I'd like to change sqrt and possibly other math intrinsics in the same way: default them to TCC_Expensive.
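Roughly, what I have in mind for a GPU target is something like the following (class boilerplate elided, hook name and signature as discussed above; a sketch, not a patch):

// Sketch only: member of a GPU target's TTI implementation; the enclosing
// class (derived from BasicTTIImplBase<...>) is omitted.
unsigned getOperationCost(unsigned Opcode, Type *Ty, Type *OpTy) {
  switch (Opcode) {
  case Instruction::SDiv:
  case Instruction::UDiv:
  case Instruction::SRem:
  case Instruction::URem:
    // Keep div/rem "cheap" in the user cost model so CodeGenPrepare does
    // not despeculate them when they feed a select.
    return TTI::TCC_Basic;
  default:
    // Otherwise defer to the base implementation.
    return BaseT::getOperationCost(Opcode, Ty, OpTy);
  }
}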
Zaks, Ayal via llvm-dev
2016-Jan-07 12:27 UTC
[llvm-dev] TargetTransformInfo getOperationCost uses
> The vectorization cost model is, generally speaking, concerned with instruction throughputs, and is used to estimate the relative speed of a vectorized loop vs. the scalar one (it assumes that a proper amount of ILP is available, or created by interleaving, such that the throughputs matter much more than the latencies).

Note that the loop vectorizer also acts as an unroller, and in doing so considers other costs such as register pressure and expanding reductions. The SLP vectorizer should also consider latency costs in latency-bound situations, rather than considering only throughput costs, as observed in PR25108.

Ayal.