similar to: [LLVMdev] Extracting a splat value from vector instruction.

Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] Extracting a splat value from vector instruction."

2015 May 04
2
[LLVMdev] Load value and broadcast in LLVM
Hi Shahid, Thank you so much for your response. You suggested approach is what I am right now using. However, it seems that the overhead is a little bit high because we are introducing two more instructions. I was wondering if there was a cheaper way to do it. Best, Zhi On Mon, May 4, 2015 at 2:12 AM, Shahid, Asghar-ahmad < Asghar-ahmad.Shahid at amd.com> wrote: > Hi Zhi, > >
2015 May 04
4
[LLVMdev] Load value and broadcast in LLVM
Is it possible to load a value into a vector register and broadcast it in LLVM? For example, for the following address %x %x = getelementptr inbounds %struct._Ray* %ray, i32 0, i32 0, i32 0 instead of loading the value at %x into a scalar register %0: %0 = load double* %x, align 4, !tbaa !0 I want to load it into a <2 x double> vector register %1 and make both of the two elements in %1
2015 Mar 03
4
[LLVMdev] Extending Vector GEP - proposal
> This problem can be solved by sinking the broadcast instruction at codegen-prepare time. I considered this option. We currently don’t have target specific optimizations in codegen-prepare time. (Or I’m wrong?) And it will be very X86-directed optimization. Even gather-scatter intrinsics are considered as common for all targets. And the second reason, why I’d prefer to generate a splat-GEP,
2013 Jul 22
0
[LLVMdev] Inverse of ConstantFP::get and similar functions?
----- Original Message ----- > Hi, > > I noticed that ConstantFP::get automatically returns the > appropriately > types Constant depending on the LLVM type passed in (i.e. if called > with a vector, it returns a splat vector with the given constant). > > Is there any simple way to do the inverse of this function? i.e., > given a llvm::Value, check whether it is either
2015 Mar 01
2
[LLVMdev] Extending Vector GEP - proposal
Hi, According to the current GEP syntax, vector GEP requires that each index must be a vector with the same number of elements. %A = getelementptr <4 x i8*> %ptrs, <4 x i64> %offsets I propose to lessen this requirement. Let each index be or vector or scalar. All vector indices must have the same number of elements. The scalar value will mean the splat vector value. %A =
2013 Jul 22
6
[LLVMdev] Inverse of ConstantFP::get and similar functions?
Hi, I noticed that ConstantFP::get automatically returns the appropriately types Constant depending on the LLVM type passed in (i.e. if called with a vector, it returns a splat vector with the given constant). Is there any simple way to do the inverse of this function? i.e., given a llvm::Value, check whether it is either a scalar of the given constant value or a splat vector with the given
2019 Aug 29
6
[SVE][AArch64] Codegen for a scalable vector splat
Hi, During the discussion on introducing scalable vectors we established that we could use the canonical IR form for splats of scalable vector types (insert element into lane 0 of an undef vector, shuffle that with another undef vector of the same type and a zeroinitializer mask). We do run into a problem for lowering to SelectionDAG however, since the canonical form there is a BUILD_VECTOR with
2015 Jun 19
2
[LLVMdev] Static function definition in "VectorUtils.h"
I was trying to extend "VectorUtils.h", found all functions are defined as static. The common problem when we include this and only use few function, compiler throw warning for unused functions. i.e. > llvm/include/llvm/Transforms/Utils/VectorUtils.h:102:1: warning: > 'llvm::Intrinsic::ID llvm::getIntrinsicIDForCall(llvm::CallInst*, const llvm::TargetLibraryInfo*)' >
2016 Feb 26
2
how to force llvm generate gather intrinsic
If I'm understanding correctly, you're saying that vgather* is slow on all of Excavator, Haswell, Broadwell, and Skylake (client). Therefore, we will not generate it for any of those machines. Even if that's true, we should not define "gatherIsSlow()" as "hasAVX2() && !hasAVX512()". It could break for some hypothetical future processor that manages to
2016 Feb 25
2
how to force llvm generate gather intrinsic
It seems that http://reviews.llvm.org/D15690 only implemented gather/scatter for AVX-512, but not for AVX/AVX2. Is there any plan to enable gather for AVX/2? Thanks. Best, Zhi On Thu, Feb 25, 2016 at 8:28 AM, Sanjay Patel <spatel at rotateright.com> wrote: > I don't think gather has been enabled for AVX2 as of r261875. > Masked load/store were enabled for AVX with: >
2016 Feb 26
0
how to force llvm generate gather intrinsic
That makes great sense. It would be great if we have profitability mode to see the necessity to use gathers. Or it also would be good if there is a compiler option for the users to enable LLVM to generate the gather instructions no matter it is faster or slow. Best, Zhi On Fri, Feb 26, 2016 at 12:49 PM, Sanjay Patel <spatel at rotateright.com> wrote: > If I'm understanding
2016 Feb 25
2
how to force llvm generate gather intrinsic
Yes, masked load/store/gather/scatter are completed. - Elena From: zhi chen [mailto:zchenhn at gmail.com] Sent: Thursday, February 25, 2016 01:20 To: Demikhovsky, Elena <elena.demikhovsky at intel.com> Cc: Sanjay Patel <spatel at rotateright.com>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] how to
2014 Oct 27
4
[LLVMdev] Adding masked vector load and store intrinsics
we just follow a common recommendation to start with intrinsics: http://llvm.org/docs/ExtendingLLVM.html - Elena From: Owen Anderson [mailto:resistor at mac.com] Sent: Sunday, October 26, 2014 23:57 To: Demikhovsky, Elena Cc: llvmdev at cs.uiuc.edu; dag at cray.com Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics What is the motivation for using intrinsics
2014 Oct 28
2
[LLVMdev] Adding masked vector load and store intrinsics
Many oveloaded intrinsics may be replaced with instructions - fabs or fma or sqrt. Chandler will probably explain the criteria. What the diff between fma and fadd? Or fptrunc and fabs? A new instruction like %a = loadm <4 x i32>* %addr, <4 x i32> %passthru, i32 4, <4 x i1>%mask is possible, but may be not very useful for most of targets. So we start from intrinsics. -
2016 Apr 12
2
X86 TRUNCATE cost for AVX & AVX2 mode
<Copied Cong> Thanks Elena. Mostly I was interested in why such a high cost 30 kept for TRUNCATE v16i32 to v16i8 in SSE41. Looking at the code it appears like TRUNCATE v16i32 to v16i8 in SSE41 is very expensive vs SSE2. I feel this number should be same/close to the cost mentioned for same operation in SSE2ConversionTbl. Below patch from Cong Hou reduce cost for same operation in SSE2
2016 May 20
5
Working on FP SCEV Analysis
To the best of my experience, handling case B (secondary induction) is must-have, and if I’m not mistaken, people aren’t opposed to that. For me, handling case A (primary induction) is “why not?”, but I certainly admit that that can be very naïve thinking coming from lack of good understanding on SCEV and their proper usages. Now, let’s assume we can postpone discussion about case A. What is the
2016 Feb 26
0
how to force llvm generate gather intrinsic
No. Gather operation is slow on AVX2 processors. - Elena From: zhi chen [mailto:zchenhn at gmail.com] Sent: Thursday, February 25, 2016 20:48 To: Sanjay Patel <spatel at rotateright.com> Cc: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] how to force
2016 Mar 04
2
Fwd: [PATCH] D17497: Support arbitrary address space for intrinsics
Per my previous email, I have just signed off on Artur's original patch. Philip On 03/02/2016 11:21 AM, Philip Reames via llvm-dev wrote: > Elena, > > I'd like to propose that we move forward withArtur's original patch > <http://reviews.llvm.org/D17270> and separate the discussion of how we > might change our intrinsic naming scheme. Artur's patch is
2015 Apr 16
2
[LLVMdev] Code review for gather and scatter intrinsics
Hi Renato, I fully agree with you, but indexed load and store is the next step. I'm asking to review gather and scatter code. Thanks. - Elena -----Original Message----- From: Renato Golin [mailto:renato.golin at linaro.org] Sent: Thursday, April 16, 2015 17:17 To: Demikhovsky, Elena Cc: llvmdev at cs.uiuc.edu; Chandler Carruth; James Molloy Subject: Re: [LLVMdev] Code review for gather
2014 Oct 24
20
[LLVMdev] Adding masked vector load and store intrinsics
Hi, We would like to add support for masked vector loads and stores by introducing new target-independent intrinsics. The loop vectorizer will then be enhanced to optimize loops containing conditional memory accesses by generating these intrinsics for existing targets such as AVX2 and AVX-512. The vectorizer will first ask the target about availability of masked vector loads and stores. The SLP