thr3ads.net - llvm dev - [LLVMdev] Question about ARM/vfp/NEON code generation [May 2011]


David Dunkle

2011-May-27 01:04 UTC

[LLVMdev] Question about ARM/vfp/NEON code generation

I have a code generation question for ARM with VFP and NEON.

I am generating code for the following function as a test:
 
#include <stdio.h>

void FloatingPointTest(float f1, float f2, float f3)
{
     float f4 = f1 * f2;
     if (f4 > f3)
          printf("%f\n",f2);
     else
          printf("%f\n",f3);
}

I have tried compiling with:

	1. -mfloat-abi=softfp and -mfpu=neon
	2. -mfloat-abi=hard and -mfpu=neon
	3. -mfloat-abi=softfp and -mfpu=vfp3
	4. -mfloat-abi=hard and -mfpu=vfp3

When I instead use the --emit-llvm -c flags to generate bitcode and then
run llc to generate ARM assembly, I have tried passing these flag
variations to llc:

      5. llc -mattr=+neon
      6. llc -mattr=+vfp3

I am building for armv7-a.

In all cases, I get code that looks pretty much the same; it's like what
is below. However, I was expecting to see instruction-level differences
between the vfp3 and neon versions. When I do the same with gcc 4.2, I do
see differences in the generated code.

Am I mistaken in expecting to see a difference between NEON and VFP
instructions, or is there something else going on here?

thanks,
-David

        .private_extern _FloatingPointTest
        .globl  _FloatingPointTest
        .align  2
_FloatingPointTest:                     @ @FloatingPointTest
@ BB#0:                                 @ %entry
        sub     sp, sp, #8
        str     lr, [sp, #4]
        str     r7, [sp]
        mov     r7, sp
        sub     sp, sp, #36
        str     r0, [r7, #-4]
        vmov    s0, r0
        str     r1, [r7, #-8]
        vmov    s1, r1
        str     r2, [r7, #-12]
        vmov    s2, r2
        vldr.32 s3, [r7, #-4]
        vldr.32 s4, [r7, #-8]
        vmul.f32        s3, s3, s4
        vstr.32 s3, [r7, #-16]
        vldr.32 s4, [r7, #-12]
        vcmpe.f32       s3, s4
        vmrs    apsr_nzcv, fpscr
        vstr.32 s0, [sp, #16]
        vstr.32 s2, [sp, #12]
        vstr.32 s1, [sp, #8]
        ble     LBB20_2
@ BB#1:                                 @ %bb
        vldr.32 s0, [r7, #-16]
        ldr     r0, LCPI20_0

LPC20_0:
        add     r0, pc, r0
        vcvt.f64.f32    d1, s0
        vmov    r1, r2, d1
        bl      _printf
        str     r0, [sp, #4]
        b       LBB20_3
LBB20_2:                                @ %bb1
        vldr.32 s0, [r7, #-12]
        ldr     r0, LCPI20_1

LPC20_1:
        add     r0, pc, r0
        vcvt.f64.f32    d1, s0
        vmov    r1, r2, d1
        bl      _printf
        str     r0, [sp]
LBB20_3:                                @ %bb2
@ BB#4:                                 @ %return
        mov     sp, r7
        ldr     r7, [sp]
        ldr     lr, [sp, #4]
        add     sp, sp, #8
        bx      lr
@ BB#5:
        .align  2
LCPI20_0:
        .long   L_.str107-(LPC20_0+8)

        .align  2
LCPI20_1:
        .long   L_.str107-(LPC20_1+8)

David Dunkle

2011-May-27 17:49 UTC

[LLVMdev] Question about ARM/vfp/NEON code generation

Thanks, that helps a lot.

> All chips (to date) with NEON have VFP3, so it's safe to assume that a
> -mfpu=neon will have VFP3, so all the decisions about code generated for
> VFP3 can safely be assumed by targets with NEON.

Just to confirm my understanding, can I correctly say in general that
the llc code generator might blur distinctions between NEON and VFP3
when it can do so safely?

-David

-----Original Message-----
From: rengolin at gmail.com [mailto:rengolin at gmail.com] On Behalf Of Renato
Golin
Sent: Friday, May 27, 2011 2:38 AM
To: David Dunkle
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Question about ARM/vfp/NEON code generation

On 27 May 2011 02:04, David Dunkle <ddunkle at arxan.com> wrote:
> In all cases, I get code that looks pretty very the same; its like
> what is below. However, I am expecting to see instruction level
> differences between the vfp3 and neon versions. When I do the same
> with gcc 4.2 I do see differences in the generated code.

Hi David,

You could see different instructions (as gcc does, you say), but it's
not necessary.

Your example has only floating point arithmetic, which both VFP3 and
NEON can do, so the final assembly will be similar. If you start using
integer arithmetic, then you can see vector instructions for NEON (if
it's vectorized) and not for VFP3.
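
To illustrate the distinction (a sketch added for this archive, not code
from the thread): the helper names add_i32 and mul_f32 below are
hypothetical. The integer loop uses NEON intrinsics from arm_neon.h when
the compiler defines __ARM_NEON or __ARM_NEON__, and falls back to plain
scalar code otherwise, so it is the kind of code where a NEON build could
differ from a VFP3-only build. The scalar float multiply is the kind of
operation that VFP3 already covers, which is why it would be expected to
look the same either way. Which instructions a given compiler version
actually emits is an assumption to check against its assembly output.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: element-wise out[i] = a[i] + b[i] on 32-bit ints. */
void add_i32(int32_t *out, const int32_t *a, const int32_t *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* NEON path: process four 32-bit integer lanes per iteration. */
    for (; i + 4 <= n; i += 4) {
        int32x4_t va = vld1q_s32(a + i);
        int32x4_t vb = vld1q_s32(b + i);
        vst1q_s32(out + i, vaddq_s32(va, vb));
    }
#endif
    /* Scalar path: the leftover elements, or all of them without NEON. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Hypothetical helper: scalar single-precision multiply, like f1 * f2 in
 * the test function. VFP3 can do this, so no NEON-specific code is needed. */
float mul_f32(float x, float y)
{
    return x * y;
}
```

Compiling this file once with -mfpu=neon and once with -mfpu=vfp3 and
comparing the assembly for add_i32 is one way to look for the difference
described above.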

All chips (to date) with NEON have VFP3, so it's safe to assume that a
-mfpu=neon will have VFP3, so all the decisions about code generated for
VFP3 can safely be assumed by targets with NEON.

Hope that answers your questions.

cheers,
--renato
