similar to: [LLVMdev] Load value and broadcast in LLVM

2015 May 04
2
[LLVMdev] Load value and broadcast in LLVM
Hi Shahid, Thank you so much for your response. Your suggested approach is what I am using right now. However, it seems that the overhead is a little bit high because we are introducing two more instructions. I was wondering if there was a cheaper way to do it. Best, Zhi On Mon, May 4, 2015 at 2:12 AM, Shahid, Asghar-ahmad < Asghar-ahmad.Shahid at amd.com> wrote: > Hi Zhi, > >
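The "two more instructions" being discussed are presumably the usual insertelement + shufflevector broadcast idiom. A minimal IR sketch (the function and value names below are only illustrative):

  define <4 x float> @splat_load(float* %p) {
    ; load one scalar and broadcast it into every lane
    %s = load float, float* %p
    %ins = insertelement <4 x float> undef, float %s, i32 0
    %splat = shufflevector <4 x float> %ins, <4 x float> undef, <4 x i32> zeroinitializer
    ret <4 x float> %splat
  }

Although this adds two IR instructions, backends commonly match a splat shuffle to a dedicated broadcast or shuffle instruction, so the extra IR does not necessarily translate into proportional runtime overhead after instruction selection.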
2015 Apr 18
2
[LLVMdev] how can I create an SSE instrinsics sqrt?
Thanks, Shahid. It is fixed now. On Fri, Apr 17, 2015 at 8:50 PM, Shahid, Asghar-ahmad < Asghar-ahmad.Shahid at amd.com> wrote: > Hi Zhi, > > > > You also have to pass the value type to the getDeclaration() API, such as > > > > Value* sqrtv = Intrinsic::getDeclaration(M, Intrinsic::x86_sse2_sqrt_pd, > v->getType()); > > > > Regards, > >
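For reference, the IR the fixed builder code ends up producing is simply a declaration of and a call to the SSE2 intrinsic; a small sketch (function name is illustrative):

  declare <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double>)

  define <2 x double> @vec_sqrt(<2 x double> %v) {
    %r = call <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double> %v)
    ret <2 x double> %r
  }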
2017 Mar 15
2
Data structure improvement for the SLP vectorizer
Maybe it would be illustrative to give an IR example of the case I'm interested in. Consider
define void @"julia_transform_bvn_derivs_hessian!"(double* %data, double* %data2, double* %data3, double* %out) {
  %element11 = getelementptr inbounds double, double* %data, i32 1
  %load10 = load double, double* %data
  %load11 = load double, double* %element11
  %element21 =
2017 Mar 15
2
Data structure improvement for the SLP vectorizer
There was some discussion of this on the llvm-commits list, but I wanted to raise the topic for discussion here. The background of the -commits discussion was that r296863 added the ability to sort memory accesses when the SLP vectorizer reaches a load (the SLP vectorizer starts at a store or some other sink, and tries to go up the tree vectorizing as it goes along - if the input is in a different
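A hedged sketch of the situation that r296863 addresses: the scalar loads reach the SLP vectorizer in a different order than their memory order, so the accesses have to be sorted by address before they can be treated as one consecutive vector load (the function, pointer, and value names here are made up for illustration):

  define <2 x double> @pair(double* %base) {
    ; the two scalar loads are seen "backwards": %hi (offset 1) before %lo (offset 0)
    %p1 = getelementptr inbounds double, double* %base, i64 1
    %hi = load double, double* %p1
    %lo = load double, double* %base
    %v0 = insertelement <2 x double> undef, double %lo, i32 0
    %v1 = insertelement <2 x double> %v0, double %hi, i32 1
    ; once the accesses are sorted by address, SLP can replace the two scalar loads
    ; with a single consecutive <2 x double> load from %base
    ret <2 x double> %v1
  }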
2016 Jun 30
0
[Proposal][RFC] Strided Memory Access Vectorization
One common concern has been raised for cases where the Loop Vectorizer generates bigger types than the target supports: based on the VF, we currently check the cost and generate the expected set of instruction[s] for the bigger type. This has two challenges: for bigger types the cost is not always correct, and code generation may not produce efficient instruction[s]. This can probably depend on the support provided by the below RFC by
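For readers following the thread: a strided access (here stride 2) is what the RFC wants to vectorize efficiently, and one common lowering is a wide load followed by a shuffle that keeps every other lane. A minimal sketch, not taken from the RFC itself (types and names are illustrative):

  define <4 x double> @stride2_gather(<8 x double>* %p) {
    ; scalar form: a[2*i] for i = 0..3, lowered as one wide load plus a stride-2 shuffle
    %wide = load <8 x double>, <8 x double>* %p
    %strided = shufflevector <8 x double> %wide, <8 x double> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
    ret <4 x double> %strided
  }

The <8 x double> load is exactly the kind of "bigger type than the target supports" the quoted concern is about; whether its cost and lowering are handled well is what the follow-up RFC is meant to address.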
2015 Apr 18
2
[LLVMdev] how can I create an SSE instrinsics sqrt?
I want to create a vector-version sqrt as follows.
Value *Approx::CreateFSqrt(IRBuilder<> &builder, Value *v, const char* Name) {
  Type *tys[] = {v->getType()};
  Module* M = currF->getParent();
  Value* sqrtv = Intrinsic::getDeclaration(M, Intrinsic::x86_sse2_sqrt_pd);
  CallInst *CI = builder.CreateCall(sqrtv, v, Name);
  return CI;
}
Here Value *v is <2 x
2016 Jun 30
1
[Proposal][RFC] Strided Memory Access Vectorization
As a strong advocate of logical vector representation, I'm counting on the community liking Michael's RFC and on it proceeding sooner rather than later. I plan to pitch in (e.g., perf experiments). >Probably can depend on the support provided by below RFC by Michael: > "Allow loop vectorizer to choose vector widths that generate illegal types" >In that case Loop Vectorizer will
2015 May 04
2
[LLVMdev] [RFC][PATCH] Adding absd/hadd/sad intrinsics
Hi Asghar-Ahmad, I saw your last ping - sorry, I'm away on vacation and back on Wednesday. Generally, I'm not sure that having both absd/hadd and sad is compatible with the discussions going on in other threads, for example my thread about min and max. Given that those two intrinsics are fairly trivial to match, I don't see the need to have two different canonical forms. James On
2015 May 01
2
[LLVMdev] [RFC][PATCH] Adding absd/hadd/sad intrinsics
Hi All, I would like to introduce intrinsics to generate efficient code for the 'absolute difference', 'horizontal add' and 'sum of absolute differences' idioms used by user programs. Identifying these idioms at a lower level (CodeGen) is complex. These idioms can be identified in LV/SLP and vectorized using the above intrinsics to generate better code. Proposal: 1. Add
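As context for the proposal, the 'absolute difference' idiom that the absd intrinsic would capture looks roughly like this in IR (an unsigned byte variant; names are illustrative):

  define <16 x i8> @absd_u8(<16 x i8> %a, <16 x i8> %b) {
    ; absd(a, b) = a > b ? a - b : b - a, computed lane-wise
    %cmp = icmp ugt <16 x i8> %a, %b
    %ab = sub <16 x i8> %a, %b
    %ba = sub <16 x i8> %b, %a
    %absd = select <16 x i1> %cmp, <16 x i8> %ab, <16 x i8> %ba
    ret <16 x i8> %absd
  }

A 'sum of absolute differences' is this pattern followed by a horizontal add of the lanes, which instructions such as x86's psadbw perform in one step; recognizing the multi-instruction IR form late in CodeGen is what the proposal calls complex.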
2014 Dec 13
2
[LLVMdev] Vectorization factor limitation in Loop Vectorizer
So IMO, if we modify the VF calculation for targets/subtargets using TTI where a higher VF is supported, the vectorizer’s scope will become wider. Did/do you foresee any issue with this? Thanks, Shahid From: Nadav Rotem [mailto:nrotem at apple.com] Sent: Saturday, December 13, 2014 2:47 AM To: Shahid, Asghar-ahmad Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Vectorization factor limitation in
2014 Dec 11
2
[LLVMdev] Vectorization factor limitation in Loop Vectorizer
Hi Nadav/Devs, I am exploring the Loop Vectorizer to vectorize i8 scalar operations into 8 x i8 vector operations. I was expecting the Loop Vectorizer to analyze the profitability for a vectorization factor (VF) of 8; however, it is not doing so due to the widest-type calculation done for the blocks inside the loop. Maybe I am missing something, but I am curious to know why the Loop Vectorizer limits the
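A sketch of the kind of loop in question (i8 elements only; names are illustrative, and this is the plain scalar form before vectorization):

  define void @add1(i8* %src, i8* %dst, i64 %n) {
  entry:
    %empty = icmp eq i64 %n, 0
    br i1 %empty, label %exit, label %loop
  loop:
    %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    %sp = getelementptr inbounds i8, i8* %src, i64 %i
    %x = load i8, i8* %sp
    %y = add i8 %x, 1
    %dp = getelementptr inbounds i8, i8* %dst, i64 %i
    store i8 %y, i8* %dp
    %i.next = add i64 %i, 1
    %cmp = icmp ult i64 %i.next, %n
    br i1 %cmp, label %loop, label %exit
  exit:
    ret void
  }

With only i8 memory accesses one might hope for VF = 16 on a 128-bit target; the question in the thread is why the widest-type calculation over the whole block can still cap the VF lower than that.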
2016 Jun 18
2
[Proposal][RFC] Strided Memory Access Vectorization
>Vectorizer's output should be as clean as vector code can be so that analyses and optimizers downstream can >do a great job optimizing. Guess I should clarify this philosophical position of mine. In terms of vector code optimization that complicates the output of the vectorizer: if the vectorizer is the best place to perform the optimization, it should do so. This includes the cases like
2015 May 05
2
[LLVMdev] [RFC][PATCH] Adding absd/hadd/sad intrinsics
On 4 May 2015 at 08:37, Shahid, Asghar-ahmad <Asghar-ahmad.Shahid at amd.com> wrote: > My worry is regarding the query for cost calculation for specific SAD > instructions such as ‘psad’ (X86) or ‘usad’ (ARM) in Loop Vectorizer. Hi Shahid, The vectorizer's cost model has the ability to return different costs for the same instruction based on the arguments (scalar/vector,
2016 May 12
3
sum elements in the vector
> why in order to add this particular instruction (sum elements in a vector) I need to add an intrinsic? Adding an intrinsic is not the only way, it is one of the ways, and the user WILL NOT be required to invoke it specifically. Currently LLVM does not have any instruction to directly represent “sum of elements in a vector” and generate your particular instruction. However, you can do it without
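The "you can do it without" route usually means expanding the horizontal sum as a log2-depth shuffle-and-add ladder that the backend can then try to match to its own instruction. A sketch for <4 x i32> (function name is illustrative):

  define i32 @hsum_v4i32(<4 x i32> %v) {
    ; <a, b, c, d> -> <a+c, b+d, _, _> -> <a+b+c+d, _, _, _>
    %sh1 = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
    %s1 = add <4 x i32> %v, %sh1
    %sh2 = shufflevector <4 x i32> %s1, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
    %s2 = add <4 x i32> %s1, %sh2
    %res = extractelement <4 x i32> %s2, i32 0
    ret i32 %res
  }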
2016 May 16
4
sum elements in the vector
This would be really cool. We have several instructions that perform horizontal vector operations, and we have to use built-ins to select them as there is no easy way of expressing them in a TD file. Some, like SUM for a ‘v4i32’, are easy enough to express with a pattern fragment; SUM for a ‘v8i16’ takes TableGen a long time to compute, but SUM for a ‘v16i8’ resulted in TableGen disappearing into itself for
2016 Apr 04
7
sum elements in the vector
My target has an instruction that adds up all elements in the vector and stores the result in a register. I'm trying to implement it in my compiler but I'm not sure even where to start. I did look at other targets, but they don't seem to have anything like it (I could be wrong; my experience with LLVM is limited, so if I missed it, I'd appreciate it if someone could point it out).
2016 May 16
0
sum elements in the vector
I'm starting to think we should directly implement horizontal operations on vector types. My suspicion is that coming up with a nice model for this would help us a lot with things like:
- Idiom recognition of reduction patterns that use horizontal arithmetic
- Ability to use horizontal operations in SLPVectorizer
- Significantly easier cost modeling of vectorizing loops with reductions in
2015 May 06
2
[LLVMdev] [RFC][PATCH] Adding absd/hadd/sad intrinsics
> For the time being, if you can get away with heuristics, and that fills your allocated time for this task, then it's the best way forward for now. Sorry, I could not get what exactly you mean by "heuristics". Is it the "intrinsics approach" itself or something else? BTW, now my plan is to just add the two intrinsics for 'absolute difference' and
2015 May 06
2
[LLVMdev] [RFC][PATCH] Adding absd/hadd/sad intrinsics
Hi Renato, That’s right. I agree with your *pattern vs complexity* thinking. So I would drop llvm.sad() and go ahead with the remaining two. Does it make sense in general? Regards, Shahid > -----Original Message----- > From: Renato Golin [mailto:renato.golin at linaro.org] > Sent: Tuesday, May 05, 2015 8:40 PM > To: Shahid, Asghar-ahmad > Cc: James Molloy; llvmdev at
2015 Nov 19
5
[RFC] Introducing a vector reduction add instruction.
After some attempt to implement reduce-add in LLVM, I found an easier way to detect reduce-add without introducing new IR operations. The basic idea is annotating the phi node instead of the add (so that it is easier to handle other reduction operations). In the PHINode class, we can add a flag indicating whether the phi node is a reduction one (the flag can be set in the loop vectorizer for vectorized phi nodes).
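To make the phi-annotation idea concrete, the vectorized reduction that the proposed flag would mark looks roughly like this (a sketch with a fixed trip count of 4 and made-up names; it assumes %p points at at least four <4 x i32> vectors):

  define i32 @sum16(<4 x i32>* %p) {
  entry:
    br label %loop
  loop:
    %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    ; this is the phi the reduction flag would be attached to
    %acc = phi <4 x i32> [ zeroinitializer, %entry ], [ %acc.next, %loop ]
    %vp = getelementptr inbounds <4 x i32>, <4 x i32>* %p, i64 %i
    %x = load <4 x i32>, <4 x i32>* %vp
    %acc.next = add <4 x i32> %acc, %x        ; lane-wise partial sums
    %i.next = add i64 %i, 1
    %cmp = icmp ult i64 %i.next, 4
    br i1 %cmp, label %loop, label %exit
  exit:
    ; horizontal sum of the partial-sum lanes; knowing %acc is a reduction phi
    ; tells later passes that only this lane-sum of the accumulator matters
    %sh1 = shufflevector <4 x i32> %acc.next, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
    %s1 = add <4 x i32> %acc.next, %sh1
    %sh2 = shufflevector <4 x i32> %s1, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
    %s2 = add <4 x i32> %s1, %sh2
    %res = extractelement <4 x i32> %s2, i32 0
    ret i32 %res
  }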