thr3ads.net - search: "vld1"

Displaying 20 results from an estimated 31 matches for "vld1".

Did you mean: ld1

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

[LLVMdev] Unaligned vector memory access for ARM/NEON.

..., > > We ran into the same issue with generating vector loads/stores for vectors > with less than word alignment. It seems we took a similar approach to > solving the problem by modifying the logic in allowsUnalignedMemoryAccesses. > > As you and Jim mentioned, it looks like the vld1/vst1 instructions should > support element aligned access for any armv7 implementation (I'm looking at > Table A3-1 ARM Architecture Reference Manual - ARM DDI 0406C). > > Right now I do not think we have the correct code setup in ARMSubtarget to > accurately represent this tabl...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

[LLVMdev] Unaligned vector memory access for ARM/NEON.

Hello, Thanks again. We did try overestimating the alignment, and saw the vldr you reference here. It looks like a recent change (r161962?) did enable vld1 generation for this case (great!) on darwin, but not linux. I'm not sure if the effect of lowering load <4 x i16>* align 2 to vld1.16 this was intentional in this change or not. If so, my question is what is the preferable way to inform the Subtarget that it is allowed to use unaligned...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

[LLVMdev] Unaligned vector memory access for ARM/NEON.

...> > We ran into the same issue with generating vector loads/stores for > vectors with less than word alignment. It seems we took a similar > approach to solving the problem by modifying the logic in allowsUnalignedMemoryAccesses. > > As you and Jim mentioned, it looks like the vld1/vst1 instructions > should support element aligned access for any armv7 implementation > (I'm looking at Table A3-1 ARM Architecture Reference Manual - ARM DDI 0406C). > > Right now I do not think we have the correct code setup in > ARMSubtarget to accurately represent this t...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 07

[LLVMdev] Unaligned vector memory access for ARM/NEON.

...o the same issue with generating vector loads/stores for >> vectors with less than word alignment. It seems we took a similar >> approach to solving the problem by modifying the logic in > allowsUnalignedMemoryAccesses. >> >> As you and Jim mentioned, it looks like the vld1/vst1 instructions >> should support element aligned access for any armv7 implementation >> (I'm looking at Table A3-1 ARM Architecture Reference Manual - ARM DDI > 0406C). >> >> Right now I do not think we have the correct code setup in >> ARMSubtarget to ac...

[Constant Folder, InstCombine, ARM, AArch64] Question about constant folding of vector load

2018 Apr 26

[Constant Folder, InstCombine, ARM, AArch64] Question about constant folding of vector load

...turn an Arm/AArch64 table lookup intrinsic that takes a constant vector mask into a shufflevector instruction: vtbl1(V,mask) ~> shufflevector(V,undef,mask) The reason is that if the mask is {7,6,5,4,3,2,1,0}, then the backend will generate rev64 instructions instead. If the mask comes from a vld1 of a global constant I could fold it to allow the above instruction combining. My question is, does the constant folding of the vld1 seem a good thing to do in the general case, as a standalone transformation, or only when used as a mask for a table lookup? Alexandros IMPORTANT NOTICE: The conten...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 05

[LLVMdev] Unaligned vector memory access for ARM/NEON.

Hello Jim, Thank you for the response. I may be confused about the alignment rules here. I had been looking at the ARM RVCT Assembler Guide, which seems to indicate vld1.16 operates on 16-bit aligned data, unless I am misinterpreting their table (Table 5-11 in ARM DUI 0204H, pg 5-70,5-71). Prior to the table, It does mention the accesses need to be "element" aligned, where I took element in this case to mean i16. Anyhow, to make this a little more conc...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

[LLVMdev] Unaligned vector memory access for ARM/NEON.

Hi Pete, We ran into the same issue with generating vector loads/stores for vectors with less than word alignment. It seems we took a similar approach to solving the problem by modifying the logic in allowsUnalignedMemoryAccesses. As you and Jim mentioned, it looks like the vld1/vst1 instructions should support element aligned access for any armv7 implementation (I'm looking at Table A3-1 ARM Architecture Reference Manual - ARM DDI 0406C). Right now I do not think we have the correct code setup in ARMSubtarget to accurately represent this table. I would propose that w...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 05

[LLVMdev] Unaligned vector memory access for ARM/NEON.

...) and get: extend: @ @extend @ BB#0: vldr d16, [r0] vmovl.s16 q8, d16 vstmia r1, {d16, d17} vldr d16, [r0, #8] add r0, r1, #16 vmovl.s16 q8, d16 vstmia r0, {d16, d17} bx lr Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for the stores. According to the ARM ARM (DDI 0406C), you're correct about the element size alignment requirement for VLD1, but our isel isn't attempting to use that instruction, but rather VLDR, which has word alignment required, so it falls over. Given that, it se...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 07

[LLVMdev] Unaligned vector memory access for ARM/NEON.

...generating vector loads/stores for > >> vectors with less than word alignment. It seems we took a similar > >> approach to solving the problem by modifying the logic in > > allowsUnalignedMemoryAccesses. > >> > >> As you and Jim mentioned, it looks like the vld1/vst1 instructions > >> should support element aligned access for any armv7 implementation > >> (I'm looking at Table A3-1 ARM Architecture Reference Manual - ARM > >> DDI > > 0406C). > >> > >> Right now I do not think we have the correct code s...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 05

[LLVMdev] Unaligned vector memory access for ARM/NEON.

Hello all, I am a first time writer here, but am a happy LLVM tinkerer. It is a pleasure to use :). We have come across some sub-optimal behavior when LLVM lowers loads for vectors with small integers, i.e. load <4 x i16>* %a, align 2, using a sequence of scalar loads rather than a single vld1 on armv7 linux with NEON. Looking at the code in svn, it appears the ARM backend is capable of lowering these loads as desired, and will if we use an appropriate darwin triple. It appears this was actually enabled relatively recently. Seemingly, the case where the Subtarget has NEON available sh...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

[LLVMdev] Unaligned vector memory access for ARM/NEON.

...xtend > @ BB#0: > vldr d16, [r0] > vmovl.s16 q8, d16 > vstmia r1, {d16, d17} > vldr d16, [r0, #8] > add r0, r1, #16 > vmovl.s16 q8, d16 > vstmia r0, {d16, d17} > bx lr > > Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for the stores. According to the ARM ARM (DDI 0406C), you're correct about the element size alignment requirement for VLD1, but our isel isn't attempting to use that instruction, but rather VLDR, which has word alignment required, so it falls over. > > Given t...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 05

[LLVMdev] Unaligned vector memory access for ARM/NEON.

VLD1 expects a 64-bit aligned address unless the target explicitly days that unaligned loads are OK. For your situation, either the subtarget should set AllowsUnalignedMem to true (if that's accurate), or the load address should be made 64-bit aligned. -Jim On Sep 5, 2012, at 2:42 PM, Peter Coupe...

[LLVMdev] NEON intrinsics preventing redundant load optimization?

2015 Jan 05

[LLVMdev] NEON intrinsics preventing redundant load optimization?

Hi all, Sorry for arriving late to the party. First, some context: vld1 is not the same as a pointer dereference. The alignment requirements are different (which I saw you hacked around in your testcase using attribute((aligned(4))) ), and in big endian environments they do totally different things (VLD1 does element-wise byteswapping and pointer dereferences byteswaps...

[LLVMdev] NEON intrinsics preventing redundant load optimization?

2015 Jan 05

[LLVMdev] NEON intrinsics preventing redundant load optimization?

On 4 Jan 2015, at 21:06, Tim Northover <t.p.northover at gmail.com> wrote: >>> I’ve managed to replace the load/store intrinsics with pointer dereferences (along with a typedef to get the alignment correct). This generates 100% the same IR + asm as the auto-vectorized C version (both using -O3), and works with the toolchain in the latest XCode. Are there any concerns around doing

[LLVMdev] MachineOperand SubReg

2013 Apr 19

[LLVMdev] MachineOperand SubReg

...is also not a tree, it is a DAG. > > Hmm. I don't doubt it but can you give me an example of a case where > there is no "most super" register? I'm having a hard time thinking up > how one would design such an ISA. The ARM NEON D-registers are 64 bits each. NEON has vld1 instructions that can load 2, 3, or 4 consecutive D-registers. Two consecutive D-registers is represented by the D0_D1, D1_D2, D2_D3, ... super-registers. As you can see, D1 has two super-registers, neither is more super than the other. We similarly define triples and quads of consecutive D-regis...

[LLVMdev] [RFC] Stripping unusable intrinsics

2014 Dec 23

[LLVMdev] [RFC] Stripping unusable intrinsics

...rnMatch template that takes a string. The switches are also straight-forward. In BasicAA, for example, instead of: switch (II->getIntrinsicID()) { default: break; case Intrinsic::memset: case Intrinsic::memcpy: case Intrinsic::memmove: { ... case Intrinsic::arm_neon_vld1: { assert(ArgIdx == 0 && "Invalid argument index"); assert(Loc.Ptr == II->getArgOperand(ArgIdx) && "Intrinsic location pointer not argument?"); // LLVM's vld1 and vst1 intrinsics currently only support a single // vecto...

[LLVMdev] NEON intrinsics preventing redundant load optimization?

2014 Dec 10

[LLVMdev] NEON intrinsics preventing redundant load optimization?

...hope for this improving in the future, or anything I can do now to improve the generated code? >>> >>> If I had to guess, I'd say the intrinsic got in the way of recognising >>> the pattern. vmulq_f32 got correctly lowered to IR as "fmul", but >>> vld1q_f32 is still kept as an intrinsic, so register allocators and >>> schedulers get confused and, when lowering to assembly, you're left >>> with garbage around it. > > FWIW, with top of tree clang, I get the same (good) code for both of the implementations of operator* i...

[LLVMdev] MachineOperand SubReg

2013 Apr 19

[LLVMdev] MachineOperand SubReg

Jakob Stoklund Olesen <stoklund at 2pi.dk> writes: >> A MachineOperand has both a getReg() and a getSubReg() interface. >> For a physical register operand, is getReg() guaranteed to be the >> "most super" register with getSubReg() providing the specific >> subregister information for the operand? If so then for my current >> purposes it seems I

[LLVMdev] NEON intrinsics preventing redundant load optimization?

2014 Dec 08

[LLVMdev] NEON intrinsics preventing redundant load optimization?

...store on the stack? Is there any hope for this improving in the future, or anything I can do now to improve the generated code? > > If I had to guess, I'd say the intrinsic got in the way of recognising > the pattern. vmulq_f32 got correctly lowered to IR as "fmul", but > vld1q_f32 is still kept as an intrinsic, so register allocators and > schedulers get confused and, when lowering to assembly, you're left > with garbage around it. > > Creating a bug for this is probably the best thing to do, since this > is a common pattern that needs looking into t...

[LLVMdev] Avoiding MCRegAliasIterator with register units

2013 May 22

[LLVMdev] Avoiding MCRegAliasIterator with register units

...ames to such pseudo-registers, again from ARM: SPR: (s0, s1, ...) 32-bit floating point registers. DPR: (d0, d1, ...) Even-odd pairs of consecutive S-registers. QPR: (q0, q1, ...) Even-odd pairs of consecutive D-registers. But not all constraints are given 'register' names by the ISA. One vld1 instruction variant can load two consecutive D-registers, both even-odd and odd-even pairs. An even-odd pair like {d0, d1} is also called q0, but an odd-even pair like {d1, d2} has no other ISA name. Since it's a bit random what an ISA decides to call a register and what it decides to call an...

search for: vld1