search for: mrecip

Displaying 5 results from an estimated 5 matches for "mrecip".

2020 Sep 01
2
Should llvm optimize 1.0 / x ?
Hi Quentin, You are correct, I could manage to get clang to use vrcpps, but not in a satisfying way:
    clang++ -O3 -march=native -mtune=native \
      -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize \
      -ffast-math -ffp-model=fast -ffp-exception-behavior=ignore -ffp-contract=fast \
      -c -o vec.o vec.cc
    0000000000000140 <_Z4fct4Dv4_f>:
     140: c5 f8 53 c8
2020 Sep 01
2
Vector evolution?
...i) x[i] = 7 * x[i]; } After compiling it with:
    clang++ -O3 -march=native -mtune=native \
      -Rpass=loop-vectorize,slp-vectorize -Rpass-missed=loop-vectorize,slp-vectorize -Rpass-analysis=loop-vectorize,slp-vectorize \
      -ffast-math -ffp-model=fast -ffp-exception-behavior=ignore -ffp-contract=fast -mrecip=all:0 \
      -c -o vec.o vec.cc
I get the following codegen:
    0000000000000160 <_Z4fct6PDv4_f>:
     160: 31 c0                 xor %eax,%eax
     162: c4 e2 79 18 05 00 00  vbroadcastss 0x0(%rip),%xmm0 # 16b <_Z4fct6PDv4_f+0xb>
     169: 00 00
     16b: 0f 1f 44 00 00        nopl 0x0(%rax,%rax,1...
2018 Jun 19
2
Naming clash: -DCLS=n and CLS in code
.../Frontend/Rewrite -I/sw/src/clang_llvm_dev/clang/include -Itools/clang/include -Iinclude -I/sw/src/clang_llvm_dev/llvm_trunk/include -fopt-info -pipe -Wall -Wextra -Ofast -DCLS=64 -fPIC -floop-nest-optimize --param simultaneous-prefetches=16 -fprefetch-loop-arrays -msse4.2 -mrecip=all -funroll-loops -fdelete-null-pointer-checks --param prefetch-latency=32 -ffast-math -ftree-vectorize -funsafe-math-optimizations -Wno-error=unused-parameter -Wno-error=type-limits -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++1y -Wall -Wextra -Wno-unused-parameter -...
2017 Oct 03
2
Trouble when suppressing a portion of fast-math-transformations
...he > not-quite-accurate-enough built in instruction. This is what arcp is for (implying that you can use the reciprocal estimate and not worry about getting the exact answer). Now there's a separate question about how many Newton iterations to use, and we have a separate flag for that (-mrecip=...). Check out the implementation of TargetLoweringBase::getRecipEstimateSqrtEnabled to see how it's set up in the backend. This is, however, per function, so we don't currently have a per-operation control on this. > > The OpenCL spec defines a number of compile flags controlling &...
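To make the "reciprocal estimate plus Newton iterations" idea in the snippet above concrete, here is a minimal numeric sketch. It is not LLVM's implementation: `crude_recip` and `refine_recip` are hypothetical names, and `crude_recip` only imitates a low-precision hardware estimate (such as vrcpps) by perturbing the exact reciprocal by an assumed relative error of about 1e-3.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a hardware reciprocal estimate (e.g. vrcpps):
// returns 1/x with a deliberate relative error of about 1e-3.
inline float crude_recip(float x) {
    return (1.0f / x) * (1.0f + 1e-3f);
}

// Refine an estimate r of 1/x with `steps` Newton-Raphson iterations.
// Each iteration computes r = r * (2 - x * r), which roughly squares
// the relative error of the estimate.
inline float refine_recip(float x, float r, int steps) {
    assert(steps >= 0);
    for (int i = 0; i < steps; ++i) {
        r = r * (2.0f - x * r);
    }
    return r;
}
```

With `steps` set to 0 the raw estimate is returned unchanged; a single step already brings the assumed 1e-3 relative error down to roughly 1e-6, which illustrates why the number of iterations is a separate accuracy/speed knob from permitting the estimate in the first place.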
2017 Oct 02
2
Trouble when suppressing a portion of fast-math-transformations
I'm not aware of any additional bits needed. But putting us right at the edge leaves me uncomfortable. So an implementation that isn't limited by the 7 bits in SubclassOptionalData seems sensible. Thanks, -Warren From: Sanjay Patel [mailto:spatel at rotateright.com] Sent: Monday, October 2, 2017 12:06 AM To: Ristow, Warren Cc: Hal Finkel; llvm-dev at lists.llvm.org Subject: Re: