On 4 May 2015 at 08:37, Shahid, Asghar-ahmad <Asghar-ahmad.Shahid at amd.com> wrote:
> My worry is regarding the query for cost calculation for specific SAD
> instructions such as ‘psad’ (X86) or ‘usad’ (ARM) in Loop Vectorizer.

Hi Shahid,

The vectorizer's cost model can return different costs for the same instruction depending on its arguments (scalar/vector, big/small, special cases), so I don't think adding intrinsics will help you define the correct cost. The same holds for all of the vectorizer's other decisions, and it works quite well.

If you find something missing, maybe we should fix the cost model, not introduce more intrinsics.

cheers,
--renato
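To make the point about argument-dependent costs concrete, here is a minimal C++ sketch of a cost query whose answer depends on the operand type rather than only on the opcode. The names `Ty` and `getSubCost`, and the assumption of 128-bit vector registers, are hypothetical and chosen for illustration; this is not LLVM's TargetTransformInfo interface.

```cpp
// Hypothetical, simplified stand-in for a vectorizer cost query.
// This is NOT LLVM's TargetTransformInfo interface; it only illustrates
// that one instruction (here an integer subtract) can be given different
// costs depending on the type it operates on.
struct Ty {
  unsigned ElemBits; // width of each element in bits
  unsigned Lanes;    // 1 for a scalar, >1 for a vector
};

// Made-up cost rule (assumes 128-bit vector registers): a scalar subtract
// costs 1; a vector subtract costs 1 per register-sized piece it is split into.
constexpr unsigned getSubCost(Ty T) {
  return T.Lanes == 1 ? 1u : (T.ElemBits * T.Lanes + 127u) / 128u;
}

// Usage: the same opcode, three different answers.
//   getSubCost({16, 1})  == 1  (scalar i16)
//   getSubCost({16, 8})  == 1  (<8 x i16>, one register)
//   getSubCost({16, 16}) == 2  (<16 x i16>, split into two registers)
```

A real cost model would consult per-target tables instead of a single formula, but the shape is the same: the type of the operands is part of the query.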
Shahid, Asghar-ahmad
2015-May-05 14:41 UTC
[LLVMdev] [RFC][PATCH] Adding absd/hadd/sad intrinsics
Hi Renato,

Thanks for your response. My concern is actually this. Take, for example, vector type V8i16 on the X86 target.

With the llvm.sad() intrinsic:
VC1 (vector cost) = cost associated with the "PSAD" instruction.

With llvm.absd() and llvm.hadd():
VC2 = cost associated with "absolute diff" + "horizontal add" (???)

As I will be querying getIntrinsicCost(ID) for these two intrinsics separately, will VC1 == VC2? Maybe I am missing something obvious?

Regards,
Shahid

> -----Original Message-----
> From: Renato Golin [mailto:renato.golin at linaro.org]
> Sent: Tuesday, May 05, 2015 7:28 PM
> To: Shahid, Asghar-ahmad
> Cc: James Molloy; llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] [RFC][PATCH] Adding absd/hadd/sad intrinsics
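Semantically, both sides of the VC1/VC2 question compute the same value: a SAD is a horizontal add of element-wise absolute differences. The C++ sketch below is a scalar reference model under stated assumptions (unsigned 16-bit lanes, a sum widened to 32 bits). The function names `absd`, `hadd` and `sad` merely mirror the proposed intrinsics; their exact semantics in the RFC (signedness, result width) are not given in this thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference models of the three operations discussed in the thread.
// Assumptions (not taken from the RFC): lanes are unsigned 16-bit values and
// the horizontal add widens its result to 32 bits so the sum cannot overflow.
constexpr std::size_t N = 8; // eight lanes, as in the V8i16 example
using V8u16 = std::array<std::uint16_t, N>;

// Element-wise absolute difference |A[i] - B[i]|, computed without underflow.
V8u16 absd(const V8u16 &A, const V8u16 &B) {
  V8u16 R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = static_cast<std::uint16_t>(A[I] > B[I] ? A[I] - B[I] : B[I] - A[I]);
  return R;
}

// Horizontal add: the sum of all lanes, widened to 32 bits.
std::uint32_t hadd(const V8u16 &V) {
  std::uint32_t Sum = 0;
  for (std::uint16_t X : V)
    Sum += X;
  return Sum;
}

// Fused sum of absolute differences, computed in one pass, the way a single
// PSAD-like instruction would produce it.
std::uint32_t sad(const V8u16 &A, const V8u16 &B) {
  std::uint32_t Sum = 0;
  for (std::size_t I = 0; I < N; ++I)
    Sum += static_cast<std::uint32_t>(A[I] > B[I] ? A[I] - B[I] : B[I] - A[I]);
  return Sum;
}

// The fused and decomposed forms must agree on every input.
void checkSadDecomposition(const V8u16 &A, const V8u16 &B) {
  assert(sad(A, B) == hadd(absd(A, B)));
}
```

Because the values always match, the open question is purely about cost: whether pricing `absd` and `hadd` separately adds up to the price of one fused instruction.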
On 5 May 2015 at 15:41, Shahid, Asghar-ahmad <Asghar-ahmad.Shahid at amd.com> wrote:
> With llvm.sad() intrinsic:
> VC1 (Vector Cost) = Cost associated with "PSAD" instruction.
>
> W/ llvm.absd() and llvm.hadd()
> VC2 = Cost associated with "absolute diff" + "horizontal add" ( ??? )
>
> As I will be querying with getIntrinsicCost(ID) for these two intrinsics separately, Will VC1==VC2?

I see. You are correct that this is a crude approximation.

What we do today is take one of them and treat it as "cheap", or, if that is not possible, hope its cost gets diluted among other, more expensive instructions. Since the cost table is mostly there to get things going, having 2/4 of the cost instead of 1/4 (for diff+add of 4-way vectors instead of diff+add of 4 scalars) will count for little in the final score, and it will probably still encourage vectorization. For the generic cases that we fail to vectorize, we end up increasing the cost of the scalar operations.

I agree this is far from ideal, but it works reasonably well.

The alternative would be to support instruction patterns, which would give us more fine-grained control. I suggested this many years ago, but so far the current model has worked well enough that we haven't felt the need to implement complicated pattern-matching support.

The cases where pattern matching would help are mainly:

* Detecting cases where the back end has special instructions covering multiple IR instructions. This is your case, and it is common enough that it should benefit almost all back ends.
* Hazard detection, for instance when moving values in and out of VFP registers, or when two instructions in sequence are really bad on specific CPUs. This would also benefit multiple back ends, but probably has less impact on the quality of the choices.

However, we should first try the current model, and only move towards the more complex model if we have enough patterns that would benefit strongly enough to compensate for the increase in complexity.
This should be a consensus decision, I think. In any case, it is not an argument for implementing intrinsics just because the cost model is not accurate enough. If anything, we should fix the cost model.

cheers,
--renato
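The pattern-matching alternative discussed above can be sketched as a toy cost function that folds an absolute difference immediately followed by a horizontal add into a single SAD when the target has one. Everything here (the `Op` enum, the unit costs, the function names) is hypothetical and not LLVM code; it only contrasts per-instruction pricing with pattern-aware pricing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy, hypothetical cost model; not LLVM code. Opcodes and costs are made up.
enum class Op { Absd, Hadd, Sad, Other };

// Assumption: every vector instruction, including a native SAD, costs 1.
constexpr unsigned vectorCost(Op) { return 1; }

// Naive per-instruction pricing: each instruction is costed on its own.
unsigned costPerInstruction(const std::vector<Op> &Seq) {
  unsigned Cost = 0;
  for (Op O : Seq)
    Cost += vectorCost(O);
  return Cost;
}

// Pattern-aware pricing: when the target has a fused SAD, an Absd that is
// immediately followed by an Hadd is priced as one Sad instead of two ops.
// (A real matcher would also check that the Hadd consumes the Absd's result.)
unsigned costWithPatterns(const std::vector<Op> &Seq, bool TargetHasSad) {
  unsigned Cost = 0;
  for (std::size_t I = 0; I < Seq.size(); ++I) {
    if (TargetHasSad && Seq[I] == Op::Absd && I + 1 < Seq.size() &&
        Seq[I + 1] == Op::Hadd) {
      Cost += vectorCost(Op::Sad);
      ++I; // skip the Hadd, which was folded into the Sad
      continue;
    }
    Cost += vectorCost(Seq[I]);
  }
  return Cost;
}

// Scalar baseline for the same loop body over `Lanes` elements:
// one absolute difference plus one accumulating add per element.
constexpr unsigned scalarAbsdAddCost(unsigned Lanes) { return 2 * Lanes; }

// Folding replaces two instructions with one, so it never raises the estimate.
void checkFoldingNeverCostsMore(const std::vector<Op> &Seq) {
  assert(costWithPatterns(Seq, true) <= costPerInstruction(Seq));
}
```

With these made-up unit costs, `{Op::Absd, Op::Hadd}` on 4 lanes is priced at 2 per-instruction and 1 with folding (2/4 versus 1/4 per element), while `scalarAbsdAddCost(4)` gives 8. Both vector estimates still favor vectorization, which is one way to read the "2/4 instead of 1/4" arithmetic above.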