thr3ads.net - llvm dev - [LLVMdev] [RFC][PATCH] Adding absd/hadd/sad intrinsics [May 2015]


Renato Golin

2015-May-05 13:58 UTC

[LLVMdev] [RFC][PATCH] Adding absd/hadd/sad intrinsics

On 4 May 2015 at 08:37, Shahid, Asghar-ahmad
<Asghar-ahmad.Shahid at amd.com> wrote:
> My worry is regarding the query for cost calculation for specific SAD
> instructions such as ‘psad’ (X86) or ‘usad’ (ARM) in Loop Vectorizer.

Hi Shahid,

The vectorizer's cost model can return different costs for the same
instruction based on its arguments (scalar/vector, big/small, special
cases), so I don't think that adding intrinsics will help you define
the correct cost. This is true of all the vectorizer's other decisions,
and it works quite well.
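
As a rough illustration of a cost query that depends on the operand type, here is a minimal C++ sketch. It is not LLVM's API: `Opcode`, `OperandType`, `toyInstructionCost`, the 128-bit register width and every cost value are hypothetical and chosen only to show the idea.

```cpp
// Hypothetical opcodes and type description; illustrative names only,
// not LLVM's actual classes.
enum class Opcode { Add, Sub, Mul };

struct OperandType {
    unsigned lanes;     // 1 for a scalar, >1 for a vector
    unsigned elemBits;  // element width in bits
};

// Toy cost query: the same opcode gets a different cost depending on
// the operand type. All numbers are invented for illustration.
unsigned toyInstructionCost(Opcode op, OperandType ty,
                            unsigned nativeVectorBits = 128) {
    unsigned base = (op == Opcode::Mul) ? 2 : 1;

    // Invented "special case": pretend vector multiplies of 64-bit
    // elements are much more expensive on this imaginary target.
    if (op == Opcode::Mul && ty.lanes > 1 && ty.elemBits == 64)
        base = 6;

    // Wide vectors are split into native-sized pieces, and each piece
    // pays the base cost. Anything that fits in one register is one piece.
    unsigned totalBits = ty.lanes * ty.elemBits;
    unsigned parts = (totalBits + nativeVectorBits - 1) / nativeVectorBits;
    if (parts == 0)
        parts = 1;
    return base * parts;
}
```

With these made-up numbers, an add on 4 x i32 costs the same as a scalar add, an add on 8 x i32 costs twice as much because it spans two registers, and the 64-bit vector multiply hits the special case.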

If you find something missing, maybe we should fix the cost model, not
introduce more intrinsics.

cheers,
--renato

Shahid, Asghar-ahmad

2015-May-05 14:41 UTC


Hi Renato,

Thanks for your response. My concern is actually the following. For
example, take the vector type V8i16 on an X86 target.

With the llvm.sad() intrinsic:
VC1 (Vector Cost) = Cost associated with the "PSAD" instruction.

With llvm.absd() and llvm.hadd():
VC2 = Cost associated with "absolute diff" + "horizontal add" ( ??? )

As I will be querying getIntrinsicCost(ID) for these two intrinsics
separately, will VC1 == VC2?

Maybe I am missing something obvious?

Regards,
Shahid

Renato Golin

2015-May-05 15:09 UTC


On 5 May 2015 at 15:41, Shahid, Asghar-ahmad
<Asghar-ahmad.Shahid at amd.com> wrote:
> With llvm.sad() intrinsic:
> VC1 (Vector Cost) = Cost associated with "PSAD" instruction.
>
> W/ llvm.absd() and llvm.hadd()
> VC2  = Cost associated with "absolute diff" +  "horizontal add" ( ??? )
>
> As I will be querying with getIntrinsicCost(ID) for these two intrinsics
> separately, Will VC1==VC2?

I see. You are correct to say that this is a crude approximation.

What we do today is to treat one of them as "cheap" or, if that is not
possible, to hope its cost will be diluted among other, more expensive
instructions. Since the cost table is mostly there to get things going,
having 2/4 of the cost instead of 1/4 of the cost (for diff+add of 4-way
vectors instead of diff+add of 4 scalars) will count little towards the
final score, and it will probably encourage vectorization. In the
generic cases that we fail to vectorize, we end up increasing the cost
of the scalar operations.
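
One possible reading of the 2/4 versus 1/4 figures, written out as a small C++ sketch. The unit costs are invented, the names `ToyCosts`, `toyCosts` and `looksProfitable` are hypothetical, and the real vectorizer's profitability check takes more into account than this single comparison.

```cpp
// One reading of the 2/4 versus 1/4 figures, with invented unit costs.
struct ToyCosts {
    unsigned scalarLoop;   // diff+add done element by element
    unsigned vectorSplit;  // vector diff and vector add costed separately
    unsigned vectorFused;  // a single SAD-style instruction
};

ToyCosts toyCosts(unsigned vf) {
    ToyCosts c;
    c.scalarLoop = vf * 1;   // assume one unit per scalar diff+add pair
    c.vectorSplit = 1 + 1;   // one unit for the diff, one for the add
    c.vectorFused = 1;       // one unit for the fused instruction
    return c;
}

// Simplified check: vectorize when the vector cost is below the scalar cost.
bool looksProfitable(unsigned vectorCost, unsigned scalarCost) {
    return vectorCost < scalarCost;
}
```

Under these assumptions, a 4-way vector costs 2 units when diff and add are costed separately and 1 unit when fused, against 4 units for the scalar version. Both are below the scalar cost, so the less precise estimate does not change the decision in this toy case.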

I agree this is far from ideal, but it works reasonably well. The
alternative would be to have instruction pattern support, which would
give us more fine-grained control. I suggested this many years ago,
but so far the current model has worked well enough that we haven't
felt the need to implement complicated pattern-matching support.

The cases where a pattern match would help are mainly:

* Detecting cases where the back end has a special instruction that
covers multiple IR instructions. This is your case, and it is common
enough that it should benefit almost all back-ends.

* Hazard detection, for instance when moving in and out of VFP
registers, or when two instructions in sequence are really bad on
specific CPUs. This would also benefit multiple back-ends, but
probably has less impact on the quality of the choices.
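
To make the first case concrete, here is a minimal C++ sketch of pattern-aware costing over a flat instruction sequence. Everything in it is hypothetical: the `Op` enum, `toySequenceCost`, the `targetHasSad` flag and the unit costs are made up, and it matches adjacent instructions, whereas a real implementation would presumably have to follow def-use chains in the IR.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction kinds.
enum class Op { AbsDiff, HAdd, Other };

// Sum per-instruction costs, but charge an adjacent AbsDiff+HAdd pair
// as one fused instruction when the (imaginary) target has one.
unsigned toySequenceCost(const std::vector<Op>& seq, bool targetHasSad) {
    unsigned total = 0;
    for (std::size_t i = 0; i < seq.size(); ++i) {
        bool fusable = targetHasSad && seq[i] == Op::AbsDiff &&
                       i + 1 < seq.size() && seq[i + 1] == Op::HAdd;
        if (fusable) {
            total += 1;  // invented cost of the fused instruction
            ++i;         // skip the HAdd that was folded into it
        } else {
            total += 1;  // invented unit cost for any other instruction
        }
    }
    return total;
}
```

For the sequence AbsDiff, HAdd this returns 1 when `targetHasSad` is true and 2 when it is false, which is the gap between the fused and the per-instruction estimate.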

However, we should first try the current model, and only go towards
the more complex model if we have enough patterns that would benefit
strongly enough to compensate for the increase in complexity. This
should be a consensus decision, I think.

In any case, this is not an argument for implementing intrinsics just
because the cost model is not accurate enough. If anything, we should
fix the cost model.

cheers,
--renato
