Displaying 5 results from an estimated 5 matches for "mrecip".
2020 Sep 01
2
Should llvm optimize 1.0 / x ?
Hi Quentin,
You are correct: I managed to get clang to use vrcpps, but not in
a satisfying way:
clang++ -O3 -march=native -mtune=native \
-Rpass=loop-vectorize -Rpass-missed=loop-vectorize \
-Rpass-analysis=loop-vectorize \
-ffast-math -ffp-model=fast -ffp-exception-behavior=ignore -ffp-contract=fast \
-c -o vec.o vec.cc
0000000000000140 <_Z4fct4Dv4_f>:
140: c5 f8 53 c8
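The mangled name _Z4fct4Dv4_f demangles to fct4(float __vector(4)), a function taking four packed floats. Its body is not shown, so the following is a hypothetical reconstruction, assuming it simply returns 1.0f / x lane by lane using the GCC/Clang vector_size extension. Which instruction clang picks for the division (vrcpps with optional refinement, or an exact vdivps) depends on the fast-math and -mrecip flags in use.

```cpp
#include <cassert>

// Hypothetical reconstruction: the body of fct4 is not in the message.
// GCC/Clang vector extension: four packed floats (one 128-bit register).
typedef float v4f __attribute__((vector_size(16)));

// Lane-wise reciprocal; the scalar 1.0f is broadcast to all four lanes.
// Depending on fast-math and -mrecip flags, clang may lower this division
// to an estimate (vrcpps, optionally refined) or to an exact vdivps.
v4f fct4(v4f x) {
    return 1.0f / x;
}
```

Applied to {1, 2, 4, 8} this should give roughly {1, 0.5, 0.25, 0.125}; with an estimate-based lowering and no refinement, the lanes may differ from the exact values in the low-order bits.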
2020 Sep 01
2
Vector evolution?
...i)
x[i] = 7 * x[i];
}
After compiling it with:
clang++ -O3 -march=native -mtune=native \
-Rpass='loop-vectorize|slp-vectorize' \
-Rpass-missed='loop-vectorize|slp-vectorize' \
-Rpass-analysis='loop-vectorize|slp-vectorize' \
-ffast-math -ffp-model=fast -ffp-exception-behavior=ignore \
-ffp-contract=fast -mrecip=all:0 \
-c -o vec.o vec.cc
I get the following codegen:
0000000000000160 <_Z4fct6PDv4_f>:
160: 31 c0 xor %eax,%eax
162: c4 e2 79 18 05 00 00 vbroadcastss 0x0(%rip),%xmm0 # 16b <_Z4fct6PDv4_f+0xb>
169: 00 00
16b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1...
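In clang's -mrecip syntax, an optional :N suffix sets the number of Newton-Raphson refinement steps applied after a hardware estimate, so -mrecip=all:0 asks for the raw estimates. The sketch below shows what those steps compute. It assumes a hypothetical recip_estimate that stands in for vrcpps by rounding the exact reciprocal to about 12 significant bits; the real instruction's error pattern differs in detail but has comparable precision.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a hardware reciprocal estimate such as vrcpps:
// the exact reciprocal rounded to about 12 significant bits.
float recip_estimate(float a) {
    assert(a != 0.0f && std::isfinite(a));
    int exp;
    float m = std::frexp(1.0f / a, &exp);                    // 0.5 <= |m| < 1
    m = std::ldexp(std::nearbyint(std::ldexp(m, 12)), -12);  // keep 12 bits
    return std::ldexp(m, exp);
}

// One Newton-Raphson step for f(x) = 1/x - a:  x' = x * (2 - a * x).
// Each step roughly doubles the number of correct bits.
float newton_recip_step(float a, float x) {
    return x * (2.0f - a * x);
}

// Estimate followed by `steps` refinements; steps == 0 corresponds to a
// ":0" suffix (use the raw estimate).
float approx_recip(float a, int steps) {
    float x = recip_estimate(a);
    for (int i = 0; i < steps; ++i)
        x = newton_recip_step(a, x);
    return x;
}
```

With a = 3, the raw estimate is off by about 1e-4 relative, and a single step brings it to roughly single-precision accuracy.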
2018 Jun 19
2
Naming clash: -DCLS=n and CLS in code
.../Frontend/Rewrite
-I/sw/src/clang_llvm_dev/clang/include -Itools/clang/include -Iinclude
-I/sw/src/clang_llvm_dev/llvm_trunk/include -fopt-info -pipe -Wall -Wextra
-Ofast -DCLS=64 -fPIC -floop-nest-optimize --param
simultaneous-prefetches=16 -fprefetch-loop-arrays -msse4.2 -mrecip=all
-funroll-loops -fdelete-null-pointer-checks --param
prefetch-latency=32 -ffast-math -ftree-vectorize -funsafe-math-optimizations
-Wno-error=unused-parameter -Wno-error=type-limits -fPIC
-fvisibility-inlines-hidden -Werror=date-time -std=c++1y -Wall -Wextra
-Wno-unused-parameter -...
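Passing -DCLS=64 is equivalent to placing #define CLS 64 ahead of every translation unit, so any later use of CLS as an ordinary identifier is rewritten by the preprocessor. The sketch below makes the substitution visible via stringification. The macro names STR, XSTR and MYPROJ_CACHE_LINE_SIZE, and the helper functions, are hypothetical and not taken from the build above.

```cpp
#include <cassert>

// -DCLS=64 on the command line acts as if this line preceded the source:
#define CLS 64

// Every later CLS token is now replaced by 64, so a declaration such as
// `struct Cfg { int CLS; };` would become `struct Cfg { int 64; };` and
// fail to compile. Stringification makes the substitution visible:
#define STR(x) #x       // stringizes its argument without expanding it
#define XSTR(x) STR(x)  // expands its argument first, then stringizes

const char* cls_as_written() { return STR(CLS); }    // "CLS"
const char* cls_as_expanded() { return XSTR(CLS); }  // "64"

// Hypothetical way to avoid the clash: a prefixed macro for the build-time
// knob, with a default, converted once into a typed constant.
#ifndef MYPROJ_CACHE_LINE_SIZE
#define MYPROJ_CACHE_LINE_SIZE 64
#endif
constexpr int kCacheLineSize = MYPROJ_CACHE_LINE_SIZE;
```

Giving build-time knobs a project-specific prefix and turning them into a typed constant in one place keeps them from colliding with identifiers elsewhere in the code.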
2017 Oct 03
2
Trouble when suppressing a portion of fast-math-transformations
...he
> not-quite-accurate-enough built in instruction.
This is what arcp is for (implying that you can use the reciprocal
estimate and not worry about getting the exact answer). Now there's a
separate question about how many Newton iterations to use, and we have a
separate flag for that (-mrecip=...). Check out the implementation of
TargetLoweringBase::getRecipEstimateSqrtEnabled to see how it's set up in
the backend. This is, however, per function, so we don't currently have a
per-operation control on this.
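For a reciprocal square root estimate, one Newton-Raphson step on f(y) = 1/(y*y) - a gives y' = y * (1.5 - 0.5 * a * y * y). The sketch below assumes a hypothetical rsqrt_estimate that imitates a roughly 12-bit hardware estimate (in the style of rsqrtps) by rounding the exact value; the steps argument plays the role of the refinement count chosen by an -mrecip setting.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a hardware rsqrt estimate (rsqrtps-style):
// the exact 1/sqrt(a) rounded to about 12 significant bits.
float rsqrt_estimate(float a) {
    assert(a > 0.0f && std::isfinite(a));
    int exp;
    float m = std::frexp(1.0f / std::sqrt(a), &exp);          // 0.5 <= m < 1
    m = std::ldexp(std::nearbyint(std::ldexp(m, 12)), -12);  // keep 12 bits
    return std::ldexp(m, exp);
}

// One Newton-Raphson step for f(y) = 1/(y*y) - a.
float newton_rsqrt_step(float a, float y) {
    return y * (1.5f - 0.5f * a * y * y);
}

// Estimate plus `steps` refinements (steps == 0 returns the raw estimate).
float approx_rsqrt(float a, int steps) {
    float y = rsqrt_estimate(a);
    for (int i = 0; i < steps; ++i)
        y = newton_rsqrt_step(a, y);
    return y;
}
```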
>
> The OpenCL spec defines a number of compile flags controlling
> ...
2017 Oct 02
2
Trouble when suppressing a portion of fast-math-transformations
I'm not aware of any additional bits being needed, but putting us right at the edge makes me uncomfortable, so an implementation that isn't limited by the 7 bits in SubclassOptionalData seems sensible.
Thanks,
-Warren
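LLVM IR names seven fast-math flags (reassoc, nnan, ninf, nsz, arcp, contract, afn), which is exactly what a 7-bit field such as SubclassOptionalData can hold. The sketch below is an illustrative encoding only: the FMF enum, its bit positions and the FlagField struct are hypothetical, not LLVM's definitions. It shows why the field is full and an eighth flag would not fit.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative encoding only (not LLVM's FastMathFlags): one bit per
// fast-math property, labeled with the IR flag names.
enum FMF : std::uint8_t {
    Reassoc  = 1u << 0,  // reassoc: allow reassociation
    NoNaNs   = 1u << 1,  // nnan
    NoInfs   = 1u << 2,  // ninf
    NoSZ     = 1u << 3,  // nsz: sign of zero may be ignored
    ARcp     = 1u << 4,  // arcp: x / y may become x * (1 / y)
    Contract = 1u << 5,  // contract: allow fused multiply-add
    Afn      = 1u << 6,  // afn: approximate math functions
};

// Hypothetical per-instruction storage with a 7-bit field, mirroring the
// width of SubclassOptionalData.
struct FlagField {
    unsigned char bits : 7;
};

constexpr unsigned kFieldMask = (1u << 7) - 1;  // 0x7F
constexpr unsigned kAllFlags =
    Reassoc | NoNaNs | NoInfs | NoSZ | ARcp | Contract | Afn;

static_assert((kAllFlags & ~kFieldMask) == 0, "seven flags fit in 7 bits");
static_assert(kAllFlags == kFieldMask, "and they use every bit");
static_assert(((1u << 7) & kFieldMask) == 0, "an eighth flag would not fit");

bool has_flag(const FlagField& f, FMF flag) { return (f.bits & flag) != 0; }
```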
From: Sanjay Patel [mailto:spatel at rotateright.com]
Sent: Monday, October 2, 2017 12:06 AM
To: Ristow, Warren
Cc: Hal Finkel; llvm-dev at lists.llvm.org
Subject: Re: