Hal, James,
My plan to disable vectorization on NEON FP had two steps:
1. Create the infrastructure to detect unsafe FP maths and only allow
NEON FP under fast-math.
2. Use -mfpmath=neon/sse to fine-tune the flags even further, but this
needs a lot of work in IR.
The expected behaviour is to get the most performance with the fewest
options, while keeping correctness in mind. So, we can't vectorize FP
loops without either -ffast-math or -mfpmath=neon, but we want to tell
users that they could get more performance out of their compiler if
they chose either option.
If I require -ffast-math, many other deviations from IEEE-754 will be
allowed, not just denormal flushing, so you're left with either slow
or potentially wrong results. Using -mfpmath hits the right spot, but
it is less well known and is not plugged in yet.
Vectorizing FP loops is a correctness problem in NEON (and I assume
SSE), so it would be good to be safe. But I take it it's not a serious
correctness problem, so we can go about it the right way from the
beginning, which I'm ok with.
So...
If I got it right, we need to tell FP instructions that they allow
denormals. So far, I could only find flags about NaNs, Infs, signed
zeroes and reciprocals, as well as the "fast" flag that turns them all
on.
In the target transform info, we need to add a denormal flag that is
set on all FP instructions when fpmath=neon/sse/etc. The vectorizer
then just tests for that flag (which fast-math should also set).
The Darwin vs. Linux problem is, then, moved to the target transform
info, only setting the flag on ARM if...
isDarwin OR ARMISA >= v8 OR fastMath OR fpMath == NEON
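
To make the condition concrete, here is a rough sketch in plain C++. It
is not real LLVM code: TargetInfo, FPMath and allowsDenormalFlush are
names I made up for illustration, and the real thing would live in the
ARM target transform info.

```cpp
// Hypothetical stand-ins for illustration only; none of these are LLVM types.
enum class FPMath { Default, VFP, NEON };

struct TargetInfo {
  bool IsDarwin;      // target OS is Darwin
  unsigned ARMISA;    // ARM architecture version, e.g. 7 or 8
  bool FastMath;      // -ffast-math was given
  FPMath FPMathUnit;  // value of -mfpmath, if any
};

// Returns true when the proposed denormal flag would be set on ARM,
// i.e. when the vectorizer would be allowed to use NEON for FP loops.
constexpr bool allowsDenormalFlush(const TargetInfo &T) {
  return T.IsDarwin || T.ARMISA >= 8 || T.FastMath ||
         T.FPMathUnit == FPMath::NEON;
}
```

With this, a v7 Linux target with no extra flags would not vectorize FP
loops, while the same target with -mfpmath=neon or -ffast-math would.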
Makes sense?
cheers,
--renato