Sebastien DELDON-GNB
2012-Sep-21 09:57 UTC
[LLVMdev] RE : Question about LLVM NEON intrinsics
Hello Renato,
You're pointing me at the ARM intrinsics related to loads. The problem I
reported in my original e-mail is not support for vector loads but support for
'vmaxs'. For instance, the ARM ISA has no vector load of 16 floats,
yet it is legal to write this in LLVM:
; ModuleID = 'vadd.ll'
target datalayout =
"e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n32"
target triple = "armv7-none-linux-androideabi"
define void @vaddf32(<16 x float> *%C, <16 x float>* %A, <16 x
float>* %B) nounwind {
%tmp1 = load <16 x float>* %A
%tmp2 = load <16 x float>* %B
%tmp3 = fadd <16 x float> %tmp1, %tmp2
store <16 x float> %tmp3, <16 x float>* %C
ret void
}
and llc generates the following code:
vaddf32: @ @vaddf32
@ BB#0:
add r12, r1, #48
add r3, r2, #32
vld1.64 {d20, d21}, [r3, :128]
add r3, r2, #48
vld1.64 {d16, d17}, [r2, :128]
add r2, r2, #16
vld1.64 {d18, d19}, [r1, :128]
vld1.64 {d26, d27}, [r12, :128]
add r12, r1, #32
vld1.64 {d24, d25}, [r3, :128]
add r1, r1, #16
vadd.f32 q11, q9, q8
vld1.64 {d28, d29}, [r12, :128]
vadd.f32 q9, q13, q12
vadd.f32 q8, q14, q10
vld1.64 {d20, d21}, [r2, :128]
vld1.64 {d24, d25}, [r1, :128]
add r1, r0, #48
vadd.f32 q10, q12, q10
vst1.64 {d22, d23}, [r0, :128]
vst1.64 {d18, d19}, [r1, :128]
add r1, r0, #32
add r0, r0, #16
vst1.64 {d16, d17}, [r1, :128]
vst1.64 {d20, d21}, [r0, :128]
bx lr
.Ltmp0:
.size vaddf32, .Ltmp0-vaddf32
So the 'fadd' instruction operating on a <16 x float> vector is
legalized (split) into 4 vadd.f32 instructions. My assumption was that the same
process could apply to NEON LLVM intrinsics such as 'vmaxs'. That
doesn't seem to be the case, so I'm wondering whether this is an actual bug
or whether LLVM intrinsics are limited to the legal types of the targeted architecture.
Note that although <16 x float> loads are not supported by the ISA, LLVM is able to
generate them as a series of vld1.64 instructions.
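As an illustration of the splitting described above, here is a minimal sketch of doing it by hand in the IR, so that the intrinsic only ever sees a legal type. It assumes the <4 x float> form of the intrinsic (@llvm.arm.neon.vmaxs.v4f32) is accepted by the ARM backend and uses the same IR syntax as the example above; the function name @vmaxf32 is hypothetical and the code has not been run through llc.

```llvm
; Sketch only: split a <16 x float> max into four legal <4 x float> calls.
; @vmaxf32 is a hypothetical name; the v4f32 intrinsic form is an assumption.
target triple = "armv7-none-linux-androideabi"

declare <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float>, <4 x float>) nounwind readnone

define void @vmaxf32(<16 x float>* %C, <16 x float>* %A, <16 x float>* %B) nounwind {
  %a = load <16 x float>* %A
  %b = load <16 x float>* %B

  ; Slice each 16-lane operand into four 4-lane pieces.
  %a0 = shufflevector <16 x float> %a, <16 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %a1 = shufflevector <16 x float> %a, <16 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %a2 = shufflevector <16 x float> %a, <16 x float> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
  %a3 = shufflevector <16 x float> %a, <16 x float> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
  %b0 = shufflevector <16 x float> %b, <16 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %b1 = shufflevector <16 x float> %b, <16 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %b2 = shufflevector <16 x float> %b, <16 x float> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
  %b3 = shufflevector <16 x float> %b, <16 x float> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>

  ; One intrinsic call per 128-bit piece.
  %m0 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float> %a0, <4 x float> %b0)
  %m1 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float> %a1, <4 x float> %b1)
  %m2 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float> %a2, <4 x float> %b2)
  %m3 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float> %a3, <4 x float> %b3)

  ; Concatenate the four results back into a single <16 x float>.
  %lo = shufflevector <4 x float> %m0, <4 x float> %m1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %hi = shufflevector <4 x float> %m2, <4 x float> %m3, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %r = shufflevector <8 x float> %lo, <8 x float> %hi, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>

  store <16 x float> %r, <16 x float>* %C
  ret void
}
```

The wide load and store are left as generic operations, since the output above shows that llc already splits those on its own.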
I hope this clarifies my request.
Best Regards
Seb
________________________________________
From: rengolin at gmail.com [rengolin at gmail.com] on behalf of Renato Golin
[rengolin at systemcall.org]
Sent: Friday, 21 September 2012 11:14
To: Sebastien DELDON-GNB
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Question about LLVM NEON intrinsics
On 21 September 2012 09:28, Sebastien DELDON-GNB
<sebastien.deldon at st.com> wrote:
> declare <16 x float> @llvm.arm.neon.vmaxs.v16f32(<16 x float>, <16 x float>) nounwind readnone
>
> llc fails with following message:
>
> SplitVectorResult #0: 0x2258350: v16f32 = llvm.arm.neon.vmaxs 0x2258250, 0x2258050, 0x2258150 [ORD=3] [ID=0]
>
> LLVM ERROR: Do not know how to split the result of this operator!
>
> Is it a BUG ? If yes I'm happy to get some directions on how I can fix
> it. If not I would like to know how to determine valid type for a given LLVM
> intrinsics.
I may be wrong, but I don't think there is such a load intrinsic...
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0348c/BABDCGGF.html
--
cheers,
--renato
http://systemcall.org/
On 21 September 2012 10:57, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:
> You're pointing me at ARM intrinsics related to loads, problem that I've reported in original e-mail, is not support for vector loads, but support for 'vmaxs'. For instance, there is no vector loads of 16 floats in ARM ISA but it is legal to write in LLVM:
Oh, yes, sorry.
Still, Eli is right, you can't assume generic IR will convert to platform-specific intrinsics automagically.
This is not a bug, but could be a feature, if you want to write a NEON validator pass that pattern-matches generic LLVM IR operations into the respective (semantically correct) NEON intrinsics, or at least leave the IR operations in a state that the back-end will recognize it.
Honestly, I prefer the approach to have the front-end writing generic IR and having target-specific passes that will change the generic IR to target specific, so the back-end can deal with it. But it seems that the front-ends had to deal with that, so far, including the ones I wrote. :/
--
cheers,
--renato
http://systemcall.org/
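To make the "generic IR" side of this concrete, here is a minimal sketch of a floating-point max written only with target-independent operations, the kind of pattern a target-specific pass or back-end would have to recognize. The function name @generic_max is hypothetical, and whether any particular back-end folds this into a single max instruction is an assumption, not something stated in this thread; a compare-and-select also does not necessarily treat NaN the same way a hardware max instruction does.

```llvm
; Sketch only: element-wise max using generic compare and select.
; @generic_max is a hypothetical name used for illustration.
define <4 x float> @generic_max(<4 x float> %a, <4 x float> %b) nounwind readnone {
  ; Per-lane comparison: true where %a is ordered and greater than %b.
  %cmp = fcmp ogt <4 x float> %a, %b
  ; Pick %a in the lanes where the comparison holds, %b elsewhere.
  %max = select <4 x i1> %cmp, <4 x float> %a, <4 x float> %b
  ret <4 x float> %max
}
```

Because no target intrinsic is involved, this form leaves type legalization and instruction selection entirely to the back-end.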
Sebastien DELDON-GNB
2012-Sep-21 10:19 UTC
[LLVMdev] RE : Question about LLVM NEON intrinsics
Hi Renato,
I guess one solution could be to define an LLVM max intrinsic and have the LLVM backends generate the appropriate instructions (using SSE instructions for x86, NEON for ARM, etc.).
Seb
> -----Original Message-----
> From: rengolin at gmail.com [mailto:rengolin at gmail.com] On Behalf Of Renato Golin
> Sent: Friday, September 21, 2012 12:13 PM
> To: Sebastien DELDON-GNB
> Cc: llvmdev at cs.uiuc.edu
> Subject: Re: RE : [LLVMdev] Question about LLVM NEON intrinsics
>
> On 21 September 2012 10:57, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:
> > You're pointing me at ARM intrinsics related to loads, problem that I've
> > reported in original e-mail, is not support for vector loads, but support for
> > 'vmaxs'. For instance, there is no vector loads of 16 floats in ARM ISA but it is
> > legal to write in LLVM:
>
> Oh, yes, sorry.
>
> Still, Eli is right, you can't assume generic IR will convert to platform-specific
> intrinsics automagically.
>
> This is not a bug, but could be a feature, if you want to write a NEON validator
> pass that pattern-matches generic LLVM IR operations into the respective
> (semantically correct) NEON intrinsics, or at least leave the IR operations in a
> state that the back-end will recognize it.
>
> Honestly, I prefer the approach to have the front-end writing generic IR and
> having target-specific passes that will change the generic IR to target specific,
> so the back-end can deal with it. But it seems that the front-ends had to deal
> with that, so far, including the ones I wrote. :/
>
> --
> cheers,
> --renato
>
> http://systemcall.org/
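For readers coming to this thread later: the target-independent max intrinsic suggested at the top of this message resembles what later LLVM releases provide as llvm.maxnum. The sketch below uses that intrinsic with the newer IR syntax (opaque `ptr` type), so it postdates this thread and will not parse with the LLVM version discussed here. The function name @vmaxf32_generic is hypothetical, and how the call is lowered (and whether a wide type such as <16 x float> is split cleanly) depends on the target and LLVM version. Note also that llvm.maxnum has its own NaN behaviour, which may not match a given hardware max instruction.

```llvm
; Sketch only: generic max intrinsic from later LLVM releases.
; @vmaxf32_generic is a hypothetical name; requires a recent LLVM (opaque pointers).
declare <16 x float> @llvm.maxnum.v16f32(<16 x float>, <16 x float>)

define void @vmaxf32_generic(ptr %C, ptr %A, ptr %B) {
  %a = load <16 x float>, ptr %A
  %b = load <16 x float>, ptr %B
  ; Target-independent max: the back-end chooses the instructions.
  %r = call <16 x float> @llvm.maxnum.v16f32(<16 x float> %a, <16 x float> %b)
  store <16 x float> %r, ptr %C
  ret void
}
```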