similar to: [LLVMdev] Load value and broadcast in LLVM

2015 May 04
2
[LLVMdev] Load value and broadcast in LLVM
Hi Shahid, Thank you so much for your response. Your suggested approach is what I am using right now. However, it seems that the overhead is a little bit high because we are introducing two more instructions. I was wondering if there was a cheaper way to do it. Best, Zhi On Mon, May 4, 2015 at 2:12 AM, Shahid, Asghar-ahmad < Asghar-ahmad.Shahid at amd.com> wrote: > Hi Zhi, > >
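The "two more instructions" being discussed are presumably the usual insertelement + shufflevector broadcast idiom. A minimal IR sketch (the function and value names below are only illustrative):

  define <4 x float> @splat_load(float* %p) {
    ; load one scalar and broadcast it into every lane
    %s = load float, float* %p
    %ins = insertelement <4 x float> undef, float %s, i32 0
    %splat = shufflevector <4 x float> %ins, <4 x float> undef, <4 x i32> zeroinitializer
    ret <4 x float> %splat
  }

Although this adds two IR instructions, backends commonly match a splat shuffle to a dedicated broadcast or shuffle instruction, so the extra IR does not necessarily translate into proportional runtime overhead after instruction selection.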
2015 Apr 18
2
[LLVMdev] how can I create an SSE instrinsics sqrt?
Thanks, Shahid. It is fixed now. On Fri, Apr 17, 2015 at 8:50 PM, Shahid, Asghar-ahmad < Asghar-ahmad.Shahid at amd.com> wrote: > Hi Zhi, > > > > You also have to pass the value type to the getDeclaration() API, such as > > > > Value* sqrtv = Intrinsic::getDeclaration(M, Intrinsic::x86_sse2_sqrt_pd, > v->getType()); > > > > Regards, > >
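For reference, the IR the fixed builder code ends up producing is simply a declaration of and a call to the SSE2 intrinsic; a small sketch (function name is illustrative):

  declare <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double>)

  define <2 x double> @vec_sqrt(<2 x double> %v) {
    %r = call <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double> %v)
    ret <2 x double> %r
  }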
2017 Mar 15
2
Data structure improvement for the SLP vectorizer
Maybe it would be illustrative to give an IR example of the case I'm interested in. Consider
define void @"julia_transform_bvn_derivs_hessian!"(double* %data, double* %data2, double* %data3, double* %out) {
  %element11 = getelementptr inbounds double, double* %data, i32 1
  %load10 = load double, double* %data
  %load11 = load double, double* %element11
  %element21 =
2017 Mar 15
2
Data structure improvement for the SLP vectorizer
There was some discussion of this on the llvm-commits list, but I wanted to raise the topic for discussion here. The background of the -commits discussion was that r296863 added the ability to sort memory accesses when the SLP vectorizer reaches a load (the SLP vectorizer starts at a store or some other sink, and tries to go up the tree vectorizing as it goes along - if the input is in a different
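A hedged sketch of the situation that r296863 addresses: the scalar loads reach the SLP vectorizer in a different order than their memory order, so the accesses have to be sorted by address before they can be treated as one consecutive vector load (the function, pointer, and value names here are made up for illustration):

  define <2 x double> @pair(double* %base) {
    ; the two scalar loads are seen "backwards": %hi (offset 1) before %lo (offset 0)
    %p1 = getelementptr inbounds double, double* %base, i64 1
    %hi = load double, double* %p1
    %lo = load double, double* %base
    %v0 = insertelement <2 x double> undef, double %lo, i32 0
    %v1 = insertelement <2 x double> %v0, double %hi, i32 1
    ; once the accesses are sorted by address, SLP can replace the two scalar loads
    ; with a single consecutive <2 x double> load from %base
    ret <2 x double> %v1
  }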
2016 Jun 30
0
[Proposal][RFC] Strided Memory Access Vectorization
One common concern has been raised for cases where the Loop Vectorizer generates bigger types than the target supports: based on the VF, we currently check the cost and generate the expected set of instruction[s] for the bigger type. This has two challenges: for bigger types the cost is not always correct, and code generation may not produce efficient instruction[s]. This can probably depend on the support provided by the below RFC by
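For readers following the thread: a strided access (here stride 2) is what the RFC wants to vectorize efficiently, and one common lowering is a wide load followed by a shuffle that keeps every other lane. A minimal sketch, not taken from the RFC itself (types and names are illustrative):

  define <4 x double> @stride2_gather(<8 x double>* %p) {
    ; scalar form: a[2*i] for i = 0..3, lowered as one wide load plus a stride-2 shuffle
    %wide = load <8 x double>, <8 x double>* %p
    %strided = shufflevector <8 x double> %wide, <8 x double> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
    ret <4 x double> %strided
  }

The <8 x double> load is exactly the kind of "bigger type than the target supports" the quoted concern is about; whether its cost and lowering are handled well is what the follow-up RFC is meant to address.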
2015 Apr 18
2
[LLVMdev] how can I create an SSE instrinsics sqrt?
I want to create a vector-version sqrt as follows.
Value *Approx::CreateFSqrt(IRBuilder<> &builder, Value *v, const char* Name) {
  Type *tys[] = {v->getType()};
  Module* M = currF->getParent();
  Value* sqrtv = Intrinsic::getDeclaration(M, Intrinsic::x86_sse2_sqrt_pd);
  CallInst *CI = builder.CreateCall(sqrtv, v, Name);
  return CI;
}
Here Value *v is <2 x
2016 Jun 30
1
[Proposal][RFC] Strided Memory Access Vectorization
As a strong advocate of logical vector representation, I'm counting on the community liking Michael's RFC and on it proceeding sooner rather than later. I plan to pitch in (e.g., perf experiments). >Probably can depend on the support provided by below RFC by Michael: > "Allow loop vectorizer to choose vector widths that generate illegal types" >In that case Loop Vectorizer will
2015 May 04
2
[LLVMdev] [RFC][PATCH] Adding absd/hadd/sad intrinsics
Hi Asghar-Ahmad, I saw your last ping - sorry, I'm away on vacation and back on Wednesday. Generally, I'm not sure that having both absd/hadd and sad is compatible with the discussions going on in other threads, for example my thread about min and max. Given that those two intrinsics are fairly trivial to match, I don't see the need to have two different canonical forms. James On
2015 May 01
2
[LLVMdev] [RFC][PATCH] Adding absd/hadd/sad intrinsics
Hi All, I would like to introduce intrinsics to generate efficient code for the 'absolute difference', 'horizontal add' and 'sum of absolute differences' idioms used by user programs. Identifying these idioms at a lower level (CodeGen) is complex. These idioms can be identified in LV/SLP and vectorized using the above intrinsics to generate better code. Proposal: 1. Add
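As context for the proposal, the 'absolute difference' idiom that the absd intrinsic would capture looks roughly like this in IR (an unsigned byte variant; names are illustrative):

  define <16 x i8> @absd_u8(<16 x i8> %a, <16 x i8> %b) {
    ; absd(a, b) = a > b ? a - b : b - a, computed lane-wise
    %cmp = icmp ugt <16 x i8> %a, %b
    %ab = sub <16 x i8> %a, %b
    %ba = sub <16 x i8> %b, %a
    %absd = select <16 x i1> %cmp, <16 x i8> %ab, <16 x i8> %ba
    ret <16 x i8> %absd
  }

A 'sum of absolute differences' is this pattern followed by a horizontal add of the lanes, which instructions such as x86's psadbw perform in one step; recognizing the multi-instruction IR form late in CodeGen is what the proposal calls complex.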
2014 Dec 13
2
[LLVMdev] Vectorization factor limitation in Loop Vectorizer
So IMO, if we modify the VF calculation for targets/subtargets using TTI where a higher VF is supported, the vectorizer’s scope will become wider. Did/do you foresee any issue with this? Thanks, Shahid From: Nadav Rotem [mailto:nrotem at apple.com] Sent: Saturday, December 13, 2014 2:47 AM To: Shahid, Asghar-ahmad Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Vectorization factor limitation in
2014 Dec 11
2
[LLVMdev] Vectorization factor limitation in Loop Vectorizer
Hi Nadav/Devs, I am exploring the Loop Vectorizer to vectorize i8 scalar operations into 8 x i8 vector operations. I was expecting the Loop Vectorizer to analyze the profitability for a vectorization factor (VF) of 8; however, it is not doing so due to the widest-type calculation done for the blocks inside the loop. Maybe I am missing something, but I am curious to know why the Loop Vectorizer limits the
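A sketch of the kind of loop in question (i8 elements only; names are illustrative, and this is the plain scalar form before vectorization):

  define void @add1(i8* %src, i8* %dst, i64 %n) {
  entry:
    %empty = icmp eq i64 %n, 0
    br i1 %empty, label %exit, label %loop
  loop:
    %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    %sp = getelementptr inbounds i8, i8* %src, i64 %i
    %x = load i8, i8* %sp
    %y = add i8 %x, 1
    %dp = getelementptr inbounds i8, i8* %dst, i64 %i
    store i8 %y, i8* %dp
    %i.next = add i64 %i, 1
    %cmp = icmp ult i64 %i.next, %n
    br i1 %cmp, label %loop, label %exit
  exit:
    ret void
  }

With only i8 memory accesses one might hope for VF = 16 on a 128-bit target; the question in the thread is why the widest-type calculation over the whole block can still cap the VF lower than that.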
2016 Jun 18
2
[Proposal][RFC] Strided Memory Access Vectorization
>Vectorizer's output should be as clean as vector code can be so that analyses and optimizers downstream can >do a great job optimizing. Guess I should clarify this philosophical position of mine. In terms of vector code optimization that complicates the output of the vectorizer: if the vectorizer is the best place to perform the optimization, it should do so. This includes the cases like
2015 May 05
2
[LLVMdev] [RFC][PATCH] Adding absd/hadd/sad intrinsics
On 4 May 2015 at 08:37, Shahid, Asghar-ahmad <Asghar-ahmad.Shahid at amd.com> wrote: > My worry is regarding the query for cost calculation for specific SAD > instructions such as ‘psad’ (X86) or ‘usad’ (ARM) in Loop Vectorizer. Hi Shahid, The vectorizer's cost model has the ability to return different costs for the same instruction based on the arguments (scalar/vector,
2016 May 12
3
sum elements in the vector
> why in order to add this particular instruction (sum elements in a vector) I need to add an intrinsic? Adding an intrinsic is not the only way, it is one of the ways, and the user WILL NOT be required to invoke it specifically. Currently LLVM does not have any instruction to directly represent “sum of elements in a vector” and generate your particular instruction. However, you can do it without
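The "you can do it without" route usually means expanding the horizontal sum as a log2-depth shuffle-and-add ladder that the backend can then try to match to its own instruction. A sketch for <4 x i32> (function name is illustrative):

  define i32 @hsum_v4i32(<4 x i32> %v) {
    ; <a, b, c, d> -> <a+c, b+d, _, _> -> <a+b+c+d, _, _, _>
    %sh1 = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
    %s1 = add <4 x i32> %v, %sh1
    %sh2 = shufflevector <4 x i32> %s1, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
    %s2 = add <4 x i32> %s1, %sh2
    %res = extractelement <4 x i32> %s2, i32 0
    ret i32 %res
  }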
2016 May 16
4
sum elements in the vector
This would be really cool. We have several instructions that perform horizontal vector operations, and we have to use built-ins to select them as there is no easy way of expressing them in a TD file. Some, like SUM for a ‘v4i32’, are easy enough to express with a pattern fragment; SUM for a ‘v8i16’ takes TableGen a long time to compute, but SUM for a ‘v16i8’ resulted in TableGen disappearing into itself for
2016 Apr 04
7
sum elements in the vector
My target has an instruction that adds up all elements in the vector and stores the result in a register. I'm trying to implement it in my compiler but I'm not sure even where to start. I did look at other targets, but they don't seem to have anything like it (I could be wrong; my experience with LLVM is limited, so if I missed it, I'd appreciate it if someone could point it out).
2016 May 16
0
sum elements in the vector
I'm starting to think we should directly implement horizontal operations on vector types. My suspicion is that coming up with a nice model for this would help us a lot with things like:
- Idiom recognition of reduction patterns that use horizontal arithmetic
- Ability to use horizontal operations in SLPVectorizer
- Significantly easier cost modeling of vectorizing loops with reductions in
2015 May 06
2
[LLVMdev] [RFC][PATCH] Adding absd/hadd/sad intrinsics
> For the time being, if you can get away with heuristics, and that fills your allocated time for this task, then it's the best way forward for now. Sorry, I could not get what exactly you mean by "heuristics". Is it the "intrinsics approach" itself or something else? BTW, now my plan is to just add the two intrinsics for 'absolute difference' and
2015 May 06
2
[LLVMdev] [RFC][PATCH] Adding absd/hadd/sad intrinsics
Hi Renato, That’s right. I agree with your *pattern vs complexity* thinking. So I would drop llvm.sad() and go ahead with the remaining two. Does it make sense in general? Regards, Shahid > -----Original Message----- > From: Renato Golin [mailto:renato.golin at linaro.org] > Sent: Tuesday, May 05, 2015 8:40 PM > To: Shahid, Asghar-ahmad > Cc: James Molloy; llvmdev at
2015 Nov 19
5
[RFC] Introducing a vector reduction add instruction.
After some attempt to implement reduce-add in LLVM, I found an easier way to detect reduce-add without introducing new IR operations. The basic idea is annotating the phi node instead of the add (so that it is easier to handle other reduction operations). In the PHINode class, we can add a flag indicating whether the phi node is a reduction one (the flag can be set in the loop vectorizer for vectorized phi nodes).
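To make the phi-annotation idea concrete, the vectorized reduction that the proposed flag would mark looks roughly like this (a sketch with a fixed trip count of 4 and made-up names; it assumes %p points at at least four <4 x i32> vectors):

  define i32 @sum16(<4 x i32>* %p) {
  entry:
    br label %loop
  loop:
    %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    ; this is the phi the reduction flag would be attached to
    %acc = phi <4 x i32> [ zeroinitializer, %entry ], [ %acc.next, %loop ]
    %vp = getelementptr inbounds <4 x i32>, <4 x i32>* %p, i64 %i
    %x = load <4 x i32>, <4 x i32>* %vp
    %acc.next = add <4 x i32> %acc, %x        ; lane-wise partial sums
    %i.next = add i64 %i, 1
    %cmp = icmp ult i64 %i.next, 4
    br i1 %cmp, label %loop, label %exit
  exit:
    ; horizontal sum of the partial-sum lanes; knowing %acc is a reduction phi
    ; tells later passes that only this lane-sum of the accumulator matters
    %sh1 = shufflevector <4 x i32> %acc.next, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
    %s1 = add <4 x i32> %acc.next, %sh1
    %sh2 = shufflevector <4 x i32> %s1, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
    %s2 = add <4 x i32> %s1, %sh2
    %res = extractelement <4 x i32> %s2, i32 0
    ret i32 %res
  }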