thr3ads.net - llvm dev - [LLVMdev] Question about ARM/vfp/NEON code generation [May 2011]


Evan Cheng

2011-May-26 22:51 UTC

[LLVMdev] LLVM CodeGen Engineer job opening with Apple's compiler team

Hi all,

The LLVM CodeGen and Tools team at Apple is looking for exceptional compiler
engineers. This is a great opportunity to work with many of the leaders in the
LLVM community.

If you are interested in this position, please send your resume / CV and
relevant information to evan.cheng at apple.com.

Thanks,

Evan


Job description

The Apple compiler team is seeking an engineer who is strongly motivated to
build high-quality, high-performance compilers. We are focused on improving
the user experience by reducing compile time as well as maximizing the execution
speed of the code generated for Apple systems. As a key member of the Apple
Compiler Team, you will apply your strong, state-of-the-art background and
experience to the development of fast, highly optimized compiler products
that extract top performance from Apple systems.

You will join a small team of highly motivated senior engineers who build
first-class open-source compiler tools and apply them in new and innovative
ways.

Required Experience:

* The ideal candidate will have experience with LLVM, GCC, or other open-source
or commercial compilers.
* Strong background in compiler architecture, optimization, code generation and
overall design of compilers.
* Knowledge and experience with developing compilers for embedded devices is a
plus.
* Familiarity with analyzing generated code for optimization/code generation
opportunities.
* Strong communication and teamwork skills.



David Dunkle

2011-May-27 01:04 UTC


[LLVMdev] Question about ARM/vfp/NEON code generation

I have a code generation question for ARM with VFP and NEON.

I am generating code for the following function as a test:
 
#include <stdio.h>

void FloatingPointTest(float f1, float f2, float f3)
{
     float f4 = f1 * f2;
     if (f4 > f3)
          printf("%f\n",f4);
     else
          printf("%f\n",f3);
}

I have tried compiling with:

	1. -mfloat-abi=softfp and -mfpu=neon
	2. -mfloat-abi=hard and -mfpu=neon
	3. -mfloat-abi=softfp and -mfpu=vfp3
	4. -mfloat-abi=hard and -mfpu=vfp3
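A minimal sketch for confirming which float ABI a given compile actually used: test the compiler's predefined macros. This assumes GCC/Clang conventions on AAPCS (EABI) targets, where `__ARM_PCS_VFP` marks the hard-float calling convention and `__SOFTFP__` marks pure software floating point; other ABIs (Darwin's, for example) may not follow them. `float_abi_name` is a hypothetical helper, not part of any library.

```c
/* Hypothetical helper: names the ARM floating-point ABI this file was
 * compiled for, based on GCC/Clang predefined macros (assumed conventions
 * for AAPCS/EABI targets; other ABIs may differ). */
const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not a 32-bit ARM target";
#elif defined(__ARM_PCS_VFP)
    /* -mfloat-abi=hard: float/double arguments travel in VFP registers. */
    return "hard";
#elif defined(__SOFTFP__)
    /* -mfloat-abi=soft: no FP instructions; arithmetic via library calls. */
    return "soft";
#else
    /* -mfloat-abi=softfp: VFP/NEON instructions are allowed, but FP
     * arguments are still passed in core registers (r0-r3). */
    return "softfp";
#endif
}
```

Calling `printf("float ABI: %s\n", float_abi_name());` from each of the four builds would show whether the -mfloat-abi flag took effect, independently of the -mfpu setting.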

When I use the --emit-llvm -c flags to generate bitcode and then use llc to
generate ARM assembly, I have tried supplying these flag variations to
llc:

      5. llc -mattr=+neon
      6. llc -mattr=+vfp3

I am building for armv7-a.

In all cases, I get code that looks pretty much the same; it's like what
is below. However, I am expecting to see instruction-level differences
between the vfp3 and neon versions. When I do the same with gcc 4.2, I do
see differences in the generated code.

Am I mistaken in expecting different instructions for NEON and VFP,
or is there something else going on here?

thanks,
-David

        .private_extern _FloatingPointTest
        .globl  _FloatingPointTest
        .align  2
_FloatingPointTest:                     @ @FloatingPointTest
@ BB#0:                                 @ %entry
        sub     sp, sp, #8
        str     lr, [sp, #4]
        str     r7, [sp]
        mov     r7, sp
        sub     sp, sp, #36
        str     r0, [r7, #-4]
        vmov    s0, r0
        str     r1, [r7, #-8]
        vmov    s1, r1
        str     r2, [r7, #-12]
        vmov    s2, r2
        vldr.32 s3, [r7, #-4]
        vldr.32 s4, [r7, #-8]
        vmul.f32        s3, s3, s4
        vstr.32 s3, [r7, #-16]
        vldr.32 s4, [r7, #-12]
        vcmpe.f32       s3, s4
        vmrs    apsr_nzcv, fpscr
        vstr.32 s0, [sp, #16]
        vstr.32 s2, [sp, #12]
        vstr.32 s1, [sp, #8]
        ble     LBB20_2
@ BB#1:                                 @ %bb
        vldr.32 s0, [r7, #-16]
        ldr     r0, LCPI20_0

LPC20_0:
        add     r0, pc, r0
        vcvt.f64.f32    d1, s0
        vmov    r1, r2, d1
        bl      _printf
        str     r0, [sp, #4]
        b       LBB20_3
LBB20_2:                                @ %bb1
        vldr.32 s0, [r7, #-12]
        ldr     r0, LCPI20_1

LPC20_1:
        add     r0, pc, r0
        vcvt.f64.f32    d1, s0
        vmov    r1, r2, d1
        bl      _printf
        str     r0, [sp]
LBB20_3:                                @ %bb2
@ BB#4:                                 @ %return
        mov     sp, r7
        ldr     r7, [sp]
        ldr     lr, [sp, #4]
        add     sp, sp, #8
        bx      lr
@ BB#5:
        .align  2
LCPI20_0:
        .long   L_.str107-(LPC20_0+8)

        .align  2
LCPI20_1:
        .long   L_.str107-(LPC20_1+8)

Renato Golin

2011-May-27 09:37 UTC


[LLVMdev] Question about ARM/vfp/NEON code generation

On 27 May 2011 02:04, David Dunkle <ddunkle at arxan.com> wrote:
> In all cases, I get code that looks pretty much the same; it's like what
> is below. However, I am expecting to see instruction-level differences
> between the vfp3 and neon versions. When I do the same with gcc 4.2, I do
> see differences in the generated code.
Hi David,

You could see different instructions (as you say gcc does), but nothing
requires the compiler to emit them.

Your example has only scalar floating-point arithmetic, which both VFP3 and
NEON can do, so the final assembly will be similar. If you start using
integer arithmetic, then you can see vector instructions for NEON (if
the code is vectorized) and not for VFP3.
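To illustrate, here is a minimal sketch (the function name `add_i32` is hypothetical) of integer code where a NEON build can differ from a VFP3-only one. Rather than relying on an auto-vectorizer, it uses the NEON intrinsics from `arm_neon.h` when the compiler defines `__ARM_NEON` or `__ARM_NEON__` (GCC/Clang convention, assumed here) and otherwise falls back to plain C. A vectorizing compiler may turn the scalar loop into similar NEON code on its own, but whether it does depends on the compiler version and optimization flags.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical example: out[i] = a[i] + b[i] for n 32-bit integers. */
void add_i32(int32_t *out, const int32_t *a, const int32_t *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* NEON path: each vaddq_s32 adds four 32-bit lanes held in one
     * 128-bit q register (typically a single vadd.i32 instruction). */
    for (; i + 4 <= n; i += 4) {
        int32x4_t va = vld1q_s32(a + i);
        int32x4_t vb = vld1q_s32(b + i);
        vst1q_s32(out + i, vaddq_s32(va, vb));
    }
#endif
    /* Scalar path: handles the whole array on VFP-only (or non-ARM)
     * builds, and the leftover tail elements on NEON builds. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

On a VFP3-only build the `#if` block disappears and only the scalar loop remains: VFP has no integer vector unit, so there is nothing for it to use here.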

All chips (to date) with NEON also have VFP3, so it's safe to assume that
-mfpu=neon implies VFP3, and every code-generation decision made for VFP3
also holds for targets with NEON.
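A related compile-time check is sketched below, assuming a compiler that implements the ARM C Language Extensions (ACLE) macros: `__ARM_NEON` for Advanced SIMD and the `__ARM_FP` bitmask (0x4 = single precision, 0x8 = double precision) for scalar VFP support. Older compilers may define only `__ARM_NEON__` and not `__ARM_FP`, in which case the VFP fields stay false even when VFP is present. `struct fp_features` and `detect_fp_features` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical summary of what the compiler was told it may use. */
struct fp_features {
    bool neon;        /* Advanced SIMD (NEON) instructions allowed   */
    bool vfp_single;  /* scalar single precision, e.g. vmul.f32 s0.. */
    bool vfp_double;  /* scalar double precision, e.g. vcvt.f64.f32  */
};

struct fp_features detect_fp_features(void)
{
    struct fp_features f = { false, false, false };
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    f.neon = true;
#endif
#if defined(__ARM_FP)
    /* ACLE bitmask: bit 2 (0x4) = single, bit 3 (0x8) = double. */
    f.vfp_single = (__ARM_FP & 0x4) != 0;
    f.vfp_double = (__ARM_FP & 0x8) != 0;
#endif
    return f;
}
```

On an ACLE-aware compiler targeting armv7-a with -mfpu=neon and a VFP-capable float ABI, one would expect all three fields to be true, which is the sense in which a NEON target is also a VFP3 target.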

Hope that answers your questions.

cheers,
--renato
