similar to: [LLVMdev] Question about LLVM NEON intrinsics

Displaying 20 results from an estimated 600 matches similar to: "[LLVMdev] Question about LLVM NEON intrinsics"

2012 Sep 21
0
[LLVMdev] Question about LLVM NEON intrinsics
On Fri, Sep 21, 2012 at 1:28 AM, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > Hi all, > > I would like to know if LLVM Neon intrinsics are designed to support only 'Legal' types for NEON units. > Using llc -march=arm -mcpu=cortex-a9 vmax4.ll -o vmax4.s on following ll code: > > > ; ModuleID = 'vmax.ll' > target datalayout =
2012 Sep 21
2
[LLVMdev] RE : Question about LLVM NEON intrinsics
Hi Eli, Thanks for the answer, it clarifies the situation for me. Do you know if there is Pass in LLVM that could be adapted to 'legalize' intrinsics calls ? Or shall I define my own intrinsics for non supported types ? Best Regards Seb ________________________________________ De : Eli Friedman [eli.friedman at gmail.com] Date d'envoi : vendredi 21 septembre 2012 11:54 À : Sebastien
2012 Sep 21
0
[LLVMdev] Question about LLVM NEON intrinsics
On Sep 21, 2012, at 2:58 AM, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > Hi Eli, > > Thanks for the answer, it clarifies the situation for me. Do you know if there is Pass in LLVM that could be adapted to 'legalize' intrinsics calls ? > Or shall I define my own intrinsics for non supported types ? You should never generate these sorts of intrinsics with
2012 Sep 21
0
[LLVMdev] Question about LLVM NEON intrinsics
On 21 September 2012 09:28, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > declare <16 x float> @llvm.arm.neon.vmaxs.v16f32(<16 x float>, <16 x float>) nounwind readnone > > llc fails with following message: > > SplitVectorResult #0: 0x2258350: v16f32 = llvm.arm.neon.vmaxs 0x2258250, 0x2258050, 0x2258150 [ORD=3] [ID=0] > > LLVM ERROR: Do not
2012 Sep 21
2
[LLVMdev] RE : Question about LLVM NEON intrinsics
Hello Renato, You're pointing me at ARM intrinsics related to loads, problem that I've reported in original e-mail, is not support for vector loads, but support for 'vmaxs'. For instance, there is no vector loads of 16 floats in ARM ISA but it is legal to write in LLVM: ; ModuleID = 'vadd.ll' target datalayout =
2009 Nov 10
3
[LLVMdev] speed up memcpy intrinsic using ARM Neon registers
On Nov 9, 2009, at 5:59 PM, David Conrad wrote: > On Nov 9, 2009, at 7:34 PM, Neel Nagar wrote: > >> I tried to speed up Dhrystone on ARM Cortex-A8 by optimizing the >> memcpy intrinsic. I used the Neon load multiple instruction to move >> up >> to 48 bytes at a time . Over 15 scalar instructions collapsed down >> into these 2 Neon instructions. Nice. Thanks
2012 Sep 21
0
[LLVMdev] RE : Question about LLVM NEON intrinsics
On 21 September 2012 10:57, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > You're pointing me at ARM intrinsics related to loads, problem that I've reported in original e-mail, is not support for vector loads, but support for 'vmaxs'. For instance, there is no vector loads of 16 floats in ARM ISA but it is legal to write in LLVM: Oh, yes, sorry. Still, Eli is
2009 Nov 10
4
[LLVMdev] speed up memcpy intrinsic using ARM Neon registers
I tried to speed up Dhrystone on ARM Cortex-A8 by optimizing the memcpy intrinsic. I used the Neon load multiple instruction to move up to 48 bytes at a time . Over 15 scalar instructions collapsed down into these 2 Neon instructions. fldmiad r3, {d0, d1, d2, d3, d4, d5} @ SrcLine dhrystone.c 359 fstmiad r1, {d0, d1, d2, d3, d4, d5} It seems like this should be faster. But I did
2015 Jan 05
2
[LLVMdev] NEON intrinsics preventing redundant load optimization?
On 4 Jan 2015, at 21:06, Tim Northover <t.p.northover at gmail.com> wrote: >>> I’ve managed to replace the load/store intrinsics with pointer dereferences (along with a typedef to get the alignment correct). This generates 100% the same IR + asm as the auto-vectorized C version (both using -O3), and works with the toolchain in the latest XCode. Are there any concerns around doing
2010 Jul 15
2
[LLVMdev] v16i32/v16f32
Hi I find types such as v16i32, v16f32 missing in my llvm version 2.7 So does the following page not list them http://llvm.org/docs/doxygen/html/classllvm_1_1MVT.html is that intentional for any reason or can I just add them ? thanks shrey
2015 Mar 15
2
[LLVMdev] Indexed Load and Store Intrinsics - proposal
hi Hao, I started to upstream and the second patch is stalled under review now. - Elena -----Original Message----- From: Hao Liu [mailto:haoliuts at gmail.com] Sent: Friday, March 13, 2015 05:56 To: Demikhovsky, Elena Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Indexed Load and Store Intrinsics - proposal Hi Elena, I think such intrinsics are very useful. Do you have any plan to
2014 Dec 18
8
[LLVMdev] Indexed Load and Store Intrinsics - proposal
Hi, Recent Intel architectures AVX-512 and AVX2 provide vector gather and/or scatter instructions. Gather/scatter instructions allow read/write access to multiple memory addresses. The addresses are specified using a base address and a vector of indices. We'd like Vectorizers to tap this functionality, and propose to do so by introducing new intrinsics: VectorValue = @llvm.sindex.load
2013 Feb 04
6
[LLVMdev] Vectorizer using Instruction, not opcodes
On 4 February 2013 18:25, Arnold Schwaighofer <aschwaighofer at apple.com>wrote: > For cases where this approach breaks really badly we could consider adding > a specialized api or parameters (like the type of a user/use). But we > should do so only as a last resort and backed by actual code that would > benefit from doing so. > Very sensible, more or less what I had in
2013 Feb 04
0
[LLVMdev] Vectorizer using Instruction, not opcodes
Hi all, My take on this is that, as you state below, at the IR level we are only roughly estimating cost, at best (or we would have to lower the code and then estimate cost - something we don't want to do). I would propose for estimating the "worst case costs" and see how far we get with this. My rational here is that we don't want vectorization to decrease performance relative
2016 Sep 19
2
RFC: New intrinsics masked.expandload and masked.compressstore
Hi all, AVX-512 ISA introduces new vector instructions VCOMPRESS and VEXPAND in order to allow vectorization of the following loops with two specific types of cross-iteration dependencies: Compress: for (int i=0; i<N; ++i) If (t[i]) *A++ = expr; Expand: for (i=0; i<N; ++i) If (t[i]) X[i] = *A++; else
2010 Jul 15
0
[LLVMdev] v16i32/v16f32
On Wed, Jul 14, 2010 at 6:48 PM, shreyas krishnan <shreyas76 at gmail.com> wrote: > Hi >   I find  types such as v16i32, v16f32  missing in my llvm version 2.7 > > So does the following page not list them > http://llvm.org/docs/doxygen/html/classllvm_1_1MVT.html > > is that intentional for any reason or can I just add them  ? As far as I know, they're not there
2016 Sep 25
5
RFC: New intrinsics masked.expandload and masked.compressstore
| |Hi Elena, | |Technically speaking, this seems straightforward. | |I wonder, however, how target-independent this is in a practical |sense; will there be an efficient lowering when targeting any other |ISA? I don't want to get into the territory where, because the |vectorizer is supposed to be architecture independent, we need to |add target-independent intrinsics for all
2013 Feb 04
2
[LLVMdev] Vectorizer using Instruction, not opcodes
Hi folks, I've been thinking on how to implement some of the costs and there is a lot of instructions which cost depend on other instructions around. Casts are one obvious case, since arithmetic and memory instructions can, sometimes, cast values for free. The cost model receives Opcodes, which lose the info on the history of the values being vectorized, and I thought we could pass the whole
2017 Aug 06
2
VBROADCAST Implementation Issues
i want to implement gather for v64i32. i wrote following code. def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src), "GATHER_256B\t{$src, $dst|$dst, $src}", [(set VR_2048:$dst, (v64i32 (masked_gather addr:$src)))], IIC_MOV_MEM>, TA; def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B
2017 Aug 07
2
VBROADCAST Implementation Issues
Hello, I did as you said, Please tell me whether the following correct now?? def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2), "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}}, $src2}"), [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32 (GatherNode