2014 Dec 07
[LLVMdev] NEON intrinsics preventing redundant load optimization?
...iOS SDK: Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)
Here's a simplified test case:
struct vec4
{
    float data[4];
};

vec4 operator* (vec4& a, vec4& b)
{
    vec4 result;
    for(int i = 0; i < 4; ++i)
        result.data[i] = a.data[i] * b.data[i];
    return result;
}

void TestVec4Multiply(vec4& a, vec4& b, vec4& result)
{
    result = a * b;
}
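
To sanity-check the test case on any host before looking at the generated code, a small harness along these lines may help. It is only a sketch: `CheckVec4Multiply` and the input values are hypothetical and not part of the original post, and the test case itself is repeated so the block compiles on its own.

```cpp
// Test case from the post, repeated so this block is self-contained.
struct vec4
{
    float data[4];
};

vec4 operator* (vec4& a, vec4& b)
{
    vec4 result;
    for(int i = 0; i < 4; ++i)
        result.data[i] = a.data[i] * b.data[i];
    return result;
}

void TestVec4Multiply(vec4& a, vec4& b, vec4& result)
{
    result = a * b;
}

// Hypothetical helper (an assumption, not from the original post):
// multiplies two fixed vectors and compares each lane with the product
// worked out by hand. The inputs are chosen so that every product is
// exactly representable as a float, so exact comparison is safe here.
bool CheckVec4Multiply()
{
    vec4 a = {{1.0f, 2.0f, 3.0f, 4.0f}};
    vec4 b = {{0.5f, 0.25f, 2.0f, -1.0f}};
    vec4 r = {{0.0f, 0.0f, 0.0f, 0.0f}};
    const float expected[4] = {0.5f, 0.5f, 6.0f, -4.0f};

    TestVec4Multiply(a, b, r);

    for(int i = 0; i < 4; ++i)
        if(r.data[i] != expected[i])
            return false;
    return true;
}
```

This only checks the arithmetic. It says nothing about which instructions the compiler emits, so the assembly still has to be inspected separately (for example with clang's -S option).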
With -O3, the loop gets vectorized and the generated code looks optimal:
__Z16TestVec4MultiplyR4vec4S0_S0_:
@ BB#0:
    vld1.32 {d16, d17}, [r1]
    vld1.32 {d18, d19}, [r0]
    vmul.f32 q8, q9, q8
    vst1.32 {d16, d17}, [r2]
    bx lr
However if...