thr3ads.net - search: "testvec4multiply"

[LLVMdev] NEON intrinsics preventing redundant load optimization?

2014 Dec 07

...iOS SDK: Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)

Here's a simplified test case:

    struct vec4 {
        float data[4];
    };

    vec4 operator* (vec4& a, vec4& b)
    {
        vec4 result;
        for(int i = 0; i < 4; ++i)
            result.data[i] = a.data[i] * b.data[i];
        return result;
    }

    void TestVec4Multiply(vec4& a, vec4& b, vec4& result)
    {
        result = a * b;
    }

With -O3 the loop gets vectorized and the code generated looks optimal:

    __Z16TestVec4MultiplyR4vec4S0_S0_:
    @ BB#0:
        vld1.32 {d16, d17}, [r1]
        vld1.32 {d18, d19}, [r0]
        vmul.f32 q8, q9, q8
        vst1.32 {d16, d17}, [r2]
        bx lr

However if...
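Since the thread's subject is NEON intrinsics, here is a hedged sketch, not the code from the truncated post, of the same 4-lane multiply written by hand with the arm_neon.h intrinsics vld1q_f32, vmulq_f32 and vst1q_f32. On 32-bit ARM these map to the vld1.32, vmul.f32 and vst1.32 instructions in the -O3 output above. The function name TestVec4MultiplyIntrinsics is hypothetical. The scalar fallback for non-NEON targets is also an assumption, added only so the sketch compiles on any host.

```cpp
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

struct vec4 { float data[4]; };

// Hypothetical helper (not from the original post): lane-wise multiply of
// a and b, written to result.
void TestVec4MultiplyIntrinsics(const vec4& a, const vec4& b, vec4& result)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // NEON path: one 128-bit load per operand, one vector multiply, one store.
    float32x4_t va = vld1q_f32(a.data);          // load 4 floats from a
    float32x4_t vb = vld1q_f32(b.data);          // load 4 floats from b
    vst1q_f32(result.data, vmulq_f32(va, vb));   // multiply lanes and store
#else
    // Assumed fallback for non-NEON targets: the scalar loop from the test case.
    for (int i = 0; i < 4; ++i)
        result.data[i] = a.data[i] * b.data[i];
#endif
}
```

For example, multiplying {1, 2, 3, 4} by {0.5, 2, -1, 10} gives {0.5, 4, -3, 40} on either path.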

search for: testvec4multiply