thr3ads.net - llvm dev - [LLVMdev] Question about LLVM NEON intrinsics [Sep 2012]

If this information is useful, please help other people find it:
Share via:

Sebastien DELDON-GNB

2012-Sep-21 08:28 UTC

[LLVMdev] Question about LLVM NEON intrinsics

Hi all,

I would like to know if LLVM Neon intrinsics are designed to support only
'Legal' types for NEON units.
Using llc -march=arm -mcpu=cortex-a9 vmax4.ll -o vmax4.s on following ll code:


; ModuleID = 'vmax.ll'
target datalayout =
"e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n32"
target triple = "armv7-none-linux-androideabi"

define void @vmaxf32(<4 x float> *%C, <4 x float>* %A, <4 x
float>* %B) nounwind {
    %tmp1 = load <4 x float>* %A
    %tmp2 = load <4 x float>* %B
    %tmp3 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float>
%tmp1, <4 x float> %tmp2)
    store <4 x float> %tmp3, <4 x float>* %C
    ret void
}

declare <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float>, <4 x
float>) nounwind readnone

I've got following code generated:

...
vmaxf32:                                @ @vmaxf32
@ BB#0:
	vld1.64	{d16, d17}, [r2]
	vld1.64	{d18, d19}, [r1]
	vmax.f32	q8, q9, q8
	vst1.64	{d16, d17}, [r0]
	bx	lr
...

Now if use <16 x float> vectors instead of <4 x float>:

define void @vmaxf32(<16 x float> *%C, <16 x float>* %A, <16 x
float>* %B) nounwind {
    %tmp1 = load <16 x float>* %A
    %tmp2 = load <16 x float>* %B
    %tmp3 = call <16 x float> @llvm.arm.neon.vmaxs.v16f32(<16 x
float> %tmp1, <16 x float> %tmp2)
    store <16 x float> %tmp3, <16 x float>* %C
    ret void
}

declare <16 x float> @llvm.arm.neon.vmaxs.v16f32(<16 x float>,
<16 x float>) nounwind readnone

llc fails with following message:

SplitVectorResult #0: 0x2258350: v16f32 = llvm.arm.neon.vmaxs 0x2258250,
0x2258050, 0x2258150 [ORD=3] [ID=0]

LLVM ERROR: Do not know how to split the result of this operator!

Is it a BUG ? If yes I'm happy to get some directions on how I can fix it.
If not I would like to know how to determine valid type for a given LLVM
intrinsics.

Thanks for your answers
Best Regards
Seb

Renato Golin

2012-Sep-21 09:14 UTC

head link

[LLVMdev] Question about LLVM NEON intrinsics

On 21 September 2012 09:28, Sebastien DELDON-GNB
<sebastien.deldon at st.com> wrote:> declare <16 x float> @llvm.arm.neon.vmaxs.v16f32(<16 x float>,
<16 x float>) nounwind readnone
>
> llc fails with following message:
>
> SplitVectorResult #0: 0x2258350: v16f32 = llvm.arm.neon.vmaxs 0x2258250,
0x2258050, 0x2258150 [ORD=3] [ID=0]
>
> LLVM ERROR: Do not know how to split the result of this operator!
>
> Is it a BUG ? If yes I'm happy to get some directions on how I can fix
it. If not I would like to know how to determine valid type for a given LLVM
intrinsics.
I may be wrong, but I don't think there is such a load intrinsic...

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0348c/BABDCGGF.html


-- 
cheers,
--renato

http://systemcall.org/

Eli Friedman

2012-Sep-21 09:54 UTC

head link

[LLVMdev] Question about LLVM NEON intrinsics

On Fri, Sep 21, 2012 at 1:28 AM, Sebastien DELDON-GNB
<sebastien.deldon at st.com> wrote:> Hi all,
>
> I would like to know if LLVM Neon intrinsics are designed to support only
'Legal' types for NEON units.
> Using llc -march=arm -mcpu=cortex-a9 vmax4.ll -o vmax4.s on following ll
code:
>
>
> ; ModuleID = 'vmax.ll'
> target datalayout =
"e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n32"
> target triple = "armv7-none-linux-androideabi"
>
> define void @vmaxf32(<4 x float> *%C, <4 x float>* %A, <4 x
float>* %B) nounwind {
>     %tmp1 = load <4 x float>* %A
>     %tmp2 = load <4 x float>* %B
>     %tmp3 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x
float> %tmp1, <4 x float> %tmp2)
>     store <4 x float> %tmp3, <4 x float>* %C
>     ret void
> }
>
> declare <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float>,
<4 x float>) nounwind readnone
>
> I've got following code generated:
>
> ...
> vmaxf32:                                @ @vmaxf32
> @ BB#0:
>         vld1.64 {d16, d17}, [r2]
>         vld1.64 {d18, d19}, [r1]
>         vmax.f32        q8, q9, q8
>         vst1.64 {d16, d17}, [r0]
>         bx      lr
> ...
>
> Now if use <16 x float> vectors instead of <4 x float>:
>
> define void @vmaxf32(<16 x float> *%C, <16 x float>* %A, <16
x float>* %B) nounwind {
>     %tmp1 = load <16 x float>* %A
>     %tmp2 = load <16 x float>* %B
>     %tmp3 = call <16 x float> @llvm.arm.neon.vmaxs.v16f32(<16 x
float> %tmp1, <16 x float> %tmp2)
>     store <16 x float> %tmp3, <16 x float>* %C
>     ret void
> }
>
> declare <16 x float> @llvm.arm.neon.vmaxs.v16f32(<16 x float>,
<16 x float>) nounwind readnone
>
> llc fails with following message:
>
> SplitVectorResult #0: 0x2258350: v16f32 = llvm.arm.neon.vmaxs 0x2258250,
0x2258050, 0x2258150 [ORD=3] [ID=0]
>
> LLVM ERROR: Do not know how to split the result of this operator!
>
> Is it a BUG ? If yes I'm happy to get some directions on how I can fix
it.
No... platform-specific intrinsics have platform-specific semantics,
including what types they're defined for. NEON doesn't have 16 x float
vectors, at least not for that sort of operation.
> If not I would like to know how to determine valid type for a given LLVM
intrinsics.
The ARM reference manual is probably your best bet for ARM intrinsics.

-Eli

Sebastien DELDON-GNB

2012-Sep-21 09:57 UTC

head link

[LLVMdev] RE : Question about LLVM NEON intrinsics

Hello Renato,

You're pointing me at ARM intrinsics related to loads, problem that I've
reported in original e-mail, is not support for vector loads, but support for
'vmaxs'. For instance, there is no vector loads of 16 floats in ARM ISA
but it is legal to write in LLVM:

; ModuleID = 'vadd.ll'
target datalayout =
"e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n32"
target triple = "armv7-none-linux-androideabi"

define void @vaddf32(<16 x float> *%C, <16 x float>* %A, <16 x
float>* %B) nounwind {
    %tmp1 = load <16 x float>* %A
    %tmp2 = load <16 x float>* %B
    %tmp3 = fadd <16 x float> %tmp1, %tmp2
    store <16 x float> %tmp3, <16 x float>* %C
    ret void
}

and llc generates following code:

vaddf32:                                @ @vaddf32
@ BB#0:
	add	r12, r1, #48
	add	r3, r2, #32
	vld1.64	{d20, d21}, [r3, :128]
	add	r3, r2, #48
	vld1.64	{d16, d17}, [r2, :128]
	add	r2, r2, #16
	vld1.64	{d18, d19}, [r1, :128]
	vld1.64	{d26, d27}, [r12, :128]
	add	r12, r1, #32
	vld1.64	{d24, d25}, [r3, :128]
	add	r1, r1, #16
	vadd.f32	q11, q9, q8
	vld1.64	{d28, d29}, [r12, :128]
	vadd.f32	q9, q13, q12
	vadd.f32	q8, q14, q10
	vld1.64	{d20, d21}, [r2, :128]
	vld1.64	{d24, d25}, [r1, :128]
	add	r1, r0, #48
	vadd.f32	q10, q12, q10
	vst1.64	{d22, d23}, [r0, :128]
	vst1.64	{d18, d19}, [r1, :128]
	add	r1, r0, #32
	add	r0, r0, #16
	vst1.64	{d16, d17}, [r1, :128]
	vst1.64	{d20, d21}, [r0, :128]
	bx	lr
.Ltmp0:
	.size	vaddf32, .Ltmp0-vadd32

So 'fadd' instruction operating on vector of <16 x float> is
legalized (scalarized) into 4 vadd.f32 instructions. My assumption was that same
process could apply to NEON LLVM intrinsics such as 'vmaxs'.  It
doesn't seems to be the case so I'm wondering if this is an actual bug
or if LLVM intrinsics are limited to legal types for the targeted architecture.
Note that however <16 x float> loads are not supported LLVM is able to
generate them as a serie of vld1.i64 instructions.
Hope this clarify my request.

Best Regards
Seb

________________________________________
De : rengolin at gmail.com [rengolin at gmail.com] de la part de Renato Golin
[rengolin at systemcall.org]
Date d'envoi : vendredi 21 septembre 2012 11:14
À : Sebastien DELDON-GNB
Cc : llvmdev at cs.uiuc.edu
Objet : Re: [LLVMdev] Question about LLVM NEON intrinsics

On 21 September 2012 09:28, Sebastien DELDON-GNB
<sebastien.deldon at st.com> wrote:> declare <16 x float> @llvm.arm.neon.vmaxs.v16f32(<16 x float>,
<16 x float>) nounwind readnone
>
> llc fails with following message:
>
> SplitVectorResult #0: 0x2258350: v16f32 = llvm.arm.neon.vmaxs 0x2258250,
0x2258050, 0x2258150 [ORD=3] [ID=0]
>
> LLVM ERROR: Do not know how to split the result of this operator!
>
> Is it a BUG ? If yes I'm happy to get some directions on how I can fix
it. If not I would like to know how to determine valid type for a given LLVM
intrinsics.
I may be wrong, but I don't think there is such a load intrinsic...

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0348c/BABDCGGF.html


--
cheers,
--renato

http://systemcall.org/

Sebastien DELDON-GNB

2012-Sep-21 09:58 UTC

head link

[LLVMdev] RE : Question about LLVM NEON intrinsics

Hi Eli,

Thanks for the answer, it clarifies the situation for me. Do you know if there
is Pass in LLVM that could be adapted to 'legalize' intrinsics calls ?
Or shall I define my own intrinsics for non supported types ? 

Best Regards
Seb
________________________________________
De : Eli Friedman [eli.friedman at gmail.com]
Date d'envoi : vendredi 21 septembre 2012 11:54
À : Sebastien DELDON-GNB
Cc : llvmdev at cs.uiuc.edu
Objet : Re: [LLVMdev] Question about LLVM NEON intrinsics

On Fri, Sep 21, 2012 at 1:28 AM, Sebastien DELDON-GNB
<sebastien.deldon at st.com> wrote:> Hi all,
>
> I would like to know if LLVM Neon intrinsics are designed to support only
'Legal' types for NEON units.
> Using llc -march=arm -mcpu=cortex-a9 vmax4.ll -o vmax4.s on following ll
code:
>
>
> ; ModuleID = 'vmax.ll'
> target datalayout =
"e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n32"
> target triple = "armv7-none-linux-androideabi"
>
> define void @vmaxf32(<4 x float> *%C, <4 x float>* %A, <4 x
float>* %B) nounwind {
>     %tmp1 = load <4 x float>* %A
>     %tmp2 = load <4 x float>* %B
>     %tmp3 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x
float> %tmp1, <4 x float> %tmp2)
>     store <4 x float> %tmp3, <4 x float>* %C
>     ret void
> }
>
> declare <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float>,
<4 x float>) nounwind readnone
>
> I've got following code generated:
>
> ...
> vmaxf32:                                @ @vmaxf32
> @ BB#0:
>         vld1.64 {d16, d17}, [r2]
>         vld1.64 {d18, d19}, [r1]
>         vmax.f32        q8, q9, q8
>         vst1.64 {d16, d17}, [r0]
>         bx      lr
> ...
>
> Now if use <16 x float> vectors instead of <4 x float>:
>
> define void @vmaxf32(<16 x float> *%C, <16 x float>* %A, <16
x float>* %B) nounwind {
>     %tmp1 = load <16 x float>* %A
>     %tmp2 = load <16 x float>* %B
>     %tmp3 = call <16 x float> @llvm.arm.neon.vmaxs.v16f32(<16 x
float> %tmp1, <16 x float> %tmp2)
>     store <16 x float> %tmp3, <16 x float>* %C
>     ret void
> }
>
> declare <16 x float> @llvm.arm.neon.vmaxs.v16f32(<16 x float>,
<16 x float>) nounwind readnone
>
> llc fails with following message:
>
> SplitVectorResult #0: 0x2258350: v16f32 = llvm.arm.neon.vmaxs 0x2258250,
0x2258050, 0x2258150 [ORD=3] [ID=0]
>
> LLVM ERROR: Do not know how to split the result of this operator!
>
> Is it a BUG ? If yes I'm happy to get some directions on how I can fix
it.
No... platform-specific intrinsics have platform-specific semantics,
including what types they're defined for. NEON doesn't have 16 x float
vectors, at least not for that sort of operation.
> If not I would like to know how to determine valid type for a given LLVM
intrinsics.
The ARM reference manual is probably your best bet for ARM intrinsics.

-Eli

Jim Grosbach

2012-Sep-21 16:52 UTC

head link

[LLVMdev] Question about LLVM NEON intrinsics

On Sep 21, 2012, at 2:54 AM, Eli Friedman <eli.friedman at gmail.com>
wrote:
> On Fri, Sep 21, 2012 at 1:28 AM, Sebastien DELDON-GNB
> <sebastien.deldon at st.com> wrote:
>> Hi all,
>> 
>> I would like to know if LLVM Neon intrinsics are designed to support
only 'Legal' types for NEON units.
>> Using llc -march=arm -mcpu=cortex-a9 vmax4.ll -o vmax4.s on following
ll code:
>> 
>> 
>> ; ModuleID = 'vmax.ll'
>> target datalayout =
"e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n32"
>> target triple = "armv7-none-linux-androideabi"
>> 
>> define void @vmaxf32(<4 x float> *%C, <4 x float>* %A,
<4 x float>* %B) nounwind {
>>    %tmp1 = load <4 x float>* %A
>>    %tmp2 = load <4 x float>* %B
>>    %tmp3 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x
float> %tmp1, <4 x float> %tmp2)
>>    store <4 x float> %tmp3, <4 x float>* %C
>>    ret void
>> }
>> 
>> declare <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float>,
<4 x float>) nounwind readnone
>> 
>> I've got following code generated:
>> 
>> ...
>> vmaxf32:                                @ @vmaxf32
>> @ BB#0:
>>        vld1.64 {d16, d17}, [r2]
>>        vld1.64 {d18, d19}, [r1]
>>        vmax.f32        q8, q9, q8
>>        vst1.64 {d16, d17}, [r0]
>>        bx      lr
>> ...
>> 
>> Now if use <16 x float> vectors instead of <4 x float>:
>> 
>> define void @vmaxf32(<16 x float> *%C, <16 x float>* %A,
<16 x float>* %B) nounwind {
>>    %tmp1 = load <16 x float>* %A
>>    %tmp2 = load <16 x float>* %B
>>    %tmp3 = call <16 x float> @llvm.arm.neon.vmaxs.v16f32(<16 x
float> %tmp1, <16 x float> %tmp2)
>>    store <16 x float> %tmp3, <16 x float>* %C
>>    ret void
>> }
>> 
>> declare <16 x float> @llvm.arm.neon.vmaxs.v16f32(<16 x
float>, <16 x float>) nounwind readnone
>> 
>> llc fails with following message:
>> 
>> SplitVectorResult #0: 0x2258350: v16f32 = llvm.arm.neon.vmaxs
0x2258250, 0x2258050, 0x2258150 [ORD=3] [ID=0]
>> 
>> LLVM ERROR: Do not know how to split the result of this operator!
>> 
>> Is it a BUG ? If yes I'm happy to get some directions on how I can
fix it.
> 
> No... platform-specific intrinsics have platform-specific semantics,
> including what types they're defined for. NEON doesn't have 16 x
float
> vectors, at least not for that sort of operation.
> Right.

These backend intrinsics are designed for support of the functions in
arm_neon.h. Any use outside of that context is "there be dragons here"
territory.
>> If not I would like to know how to determine valid type for a given
LLVM intrinsics.
> 
> The ARM reference manual is probably your best bet for ARM intrinsics.
> 
> -Eli
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Apparently Analagous Threads

Search for more seemingly similar threads

llvm dev - Sep 2012 - [LLVMdev] Question about LLVM NEON intrinsics

[LLVMdev] Question about LLVM NEON intrinsics

[LLVMdev] Question about LLVM NEON intrinsics

[LLVMdev] Question about LLVM NEON intrinsics

[LLVMdev] RE : Question about LLVM NEON intrinsics

[LLVMdev] RE : Question about LLVM NEON intrinsics

[LLVMdev] Question about LLVM NEON intrinsics

Apparently Analagous Threads