Evan Cheng
2011-May-26 22:51 UTC
[LLVMdev] LLVM CodeGen Engineer job opening with Apple's compiler team
Hi all,

The LLVM CodeGen and Tools team at Apple is looking for exceptional compiler engineers. This is a great opportunity to work with many of the leaders in the LLVM community. If you are interested in this position, please send your resume / CV and relevant information to evan.cheng at apple.com

Thanks,

Evan

Job description

The Apple compiler team is seeking an engineer who is strongly motivated to build high-quality and high-performance compilers. We are focused on improving the user experience by reducing compile time as well as maximizing the execution speed of the code generated for the Apple systems. As a key member of the Apple Compiler Team, you will apply your strong state-of-the-art background and experience toward the development of fast, highly optimized compiler products that extract top performance from the Apple systems. You will join a small team of highly motivated senior engineers who build first-class open-source compiler tools and apply them in new and innovative ways.

Required Experience:

* The ideal candidate will have experience with LLVM, GCC, or other open source / commercial compilers.
* Strong background in compiler architecture, optimization, code generation and overall design of compilers.
* Knowledge and experience with developing compilers for embedded devices is a plus.
* Familiarity with analyzing generated code for optimization/code generation opportunities.
* Strong communication and teamwork skills.
I have a code generation question for ARM with VFP and NEON. I am generating code for the following function as a test:

    void FloatingPointTest(float f1, float f2, float f3)
    {
        float f4 = f1 * f2;
        if (f4 > f3)
            printf("%f\n",f2);
        else
            printf("%f\n",f3);
    }

I have tried compiling with:

1. -mfloat-abi=softfp and -mfpu=neon
2. -mfloat-abi=hard and -mfpu=neon
3. -mfloat-abi=softfp and -mfpu=vfp3
4. -mfloat-abi=hard and -mfpu=vfp3

When I use the --emit-llvm -c flags to generate bitcode, and then use llc to generate ARM assembler, I have tried supplying these flag variations to llc:

5. llc -mattr=+neon
6. llc -mattr=+vfp3

I am building for armv7-a. In all cases, I get code that looks pretty much the same; it's like what is below. However, I am expecting to see instruction-level differences between the vfp3 and neon versions. When I do the same with gcc 4.2 I do see differences in the generated code.

Am I mistaken in expecting to see a difference between NEON and VFP instructions, or is there something else going on here?
thanks,
-David

        .private_extern _FloatingPointTest
        .globl _FloatingPointTest
        .align 2
    _FloatingPointTest:                     @ @FloatingPointTest
    @ BB#0:                                 @ %entry
        sub sp, sp, #8
        str lr, [sp, #4]
        str r7, [sp]
        mov r7, sp
        sub sp, sp, #36
        str r0, [r7, #-4]
        vmov s0, r0
        str r1, [r7, #-8]
        vmov s1, r1
        str r2, [r7, #-12]
        vmov s2, r2
        vldr.32 s3, [r7, #-4]
        vldr.32 s4, [r7, #-8]
        vmul.f32 s3, s3, s4
        vstr.32 s3, [r7, #-16]
        vldr.32 s4, [r7, #-12]
        vcmpe.f32 s3, s4
        vmrs apsr_nzcv, fpscr
        vstr.32 s0, [sp, #16]
        vstr.32 s2, [sp, #12]
        vstr.32 s1, [sp, #8]
        ble LBB20_2
    @ BB#1:                                 @ %bb
        vldr.32 s0, [r7, #-16]
        ldr r0, LCPI20_0
    LPC20_0:
        add r0, pc, r0
        vcvt.f64.f32 d1, s0
        vmov r1, r2, d1
        bl _printf
        str r0, [sp, #4]
        b LBB20_3
    LBB20_2:                                @ %bb1
        vldr.32 s0, [r7, #-12]
        ldr r0, LCPI20_1
    LPC20_1:
        add r0, pc, r0
        vcvt.f64.f32 d1, s0
        vmov r1, r2, d1
        bl _printf
        str r0, [sp]
    LBB20_3:                                @ %bb2
    @ BB#4:                                 @ %return
        mov sp, r7
        ldr r7, [sp]
        ldr lr, [sp, #4]
        add sp, sp, #8
        bx lr
    @ BB#5:
        .align 2
    LCPI20_0:
        .long L_.str107-(LPC20_0+8)
        .align 2
    LCPI20_1:
        .long L_.str107-(LPC20_1+8)
On 27 May 2011 02:04, David Dunkle <ddunkle at arxan.com> wrote:
> In all cases, I get code that looks pretty much the same; it's like what
> is below. However, I am expecting to see instruction level differences
> between the vfp3 and neon versions. When I do the same with gcc 4.2 I do
> see differences in the generated code.

Hi David,

You could see different instructions (as gcc does, you say), but it's not necessary.

Your example has only floating point arithmetic, which both VFP3 and NEON can do, so the final assembly will be similar. If you start using integer arithmetic, then you can see vector instructions for NEON (if it's vectorized) and not for VFP3.

All chips (to date) with NEON have VFP3, so it's safe to assume that a target built with -mfpu=neon also has VFP3. Any code generation decision made for VFP3 therefore holds for targets with NEON as well.

Hope that answers your questions.

cheers,
--renato
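
To make the integer-arithmetic point concrete, here is a minimal sketch rather than anything taken from the thread. The function name add_i32 is hypothetical. It assumes the toolchain provides arm_neon.h and defines __ARM_NEON or __ARM_NEON__ when NEON is enabled. With -mfpu=neon the intrinsic path should lower to NEON vector loads, an integer vector add and a vector store; with -mfpu=vfp3 the macro should be undefined, so only the scalar loop is compiled.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: element-wise dst[i] = a[i] + b[i]. */
void add_i32(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* NEON path: process four 32-bit integer lanes per iteration. */
    for (; i + 4 <= n; i += 4) {
        int32x4_t va = vld1q_s32(a + i);
        int32x4_t vb = vld1q_s32(b + i);
        vst1q_s32(dst + i, vaddq_s32(va, vb));
    }
#endif
    /* Scalar path: handles the tail, or everything when NEON is unavailable. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Comparing the assembly for this function under the two -mfpu settings should show the kind of difference that the floating point test function does not, since VFP3 has no integer vector add to fall back on.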