similar to: Handling of FP denormal values

Displaying 20 results from an estimated 2000 matches similar to: "Handling of FP denormal values"

2019 Sep 17
2
[cfe-dev] Handling of FP denormal values
On Mon, Sep 16, 2019 at 9:43 PM Matt Arsenault via cfe-dev < cfe-dev at lists.llvm.org> wrote: > > > On Sep 16, 2019, at 19:57, Kaylor, Andrew via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > > Do we need an ftz fast-math flag? > > > This would be useful for matching a handful of AMDGPU instructions (a fmad > that only always flushes being the
2019 Mar 18
3
[RFC] Making space for a flush-to-zero flag in FastMathFlags
We knew the day when we needed another FMF bit was coming back in: https://reviews.llvm.org/D39304 ...it was just a question of 'when'. :) I'm guessing that an FTZ bit won't be the last new bit needed if we consider permutations between strict FP and fast-math. Even without that, denormals-as-zero (DAZ) might also be useful? So rather than continuing to carve these out bit-by-bit,
2009 Sep 23
1
High CPU usage
Hi Jeff, Hi Jean-Marc, I first modified the FPU control word to raise an exception whenever a denormal is used. Then I used the debugger to locate the exceptions and added VERY_SMALLs where they seem to fit well. Although I got CPU usage as low as 10%, I seriously lack knowledge of how things work inside speex. So just changing some code is not the best idea for me. My second attempt was to
2012 Jun 14
1
High CPU usage
Hi Mark, Code below: int16_t* samples; int16_t* fbSilenceFrame; void *fSpeexState; float eng(0.f); int speexFrameSize(0); speex_encoder_ctl(speexState, SPEEX_GET_FRAME_SIZE, &speexFrameSize); for (int i = 0; i < speexFrameSize; i++) { eng += samples[i] * samples[i]; } if (eng / speexFrameSize < 3.f) { memcpy(samples, silenceFrame, speexFrameSize * sizeof(int16_t)); } where
2019 Mar 16
3
[RFC] Making space for a flush-to-zero flag in FastMathFlags
Hi, I need to add a flush-denormals-to-zero (FTZ) flag to FastMathFlags, but we've already used up the 7 bits available in Value::SubclassOptionalData (the "backing storage" for FPMathOperator::getFastMathFlags()). These are the possibilities I can think of: 1. Increase the size of FPMathOperator. This gives us some additional bits for FTZ and other fastmath flags we'd want
2014 Jan 28
2
[LLVMdev] ldmxcsr reordering issue
Hi, I met troubles with jitting x86 codes when using Intrinsic::x86_sse_ldmxcsr. The target code must execute some SSE2 instruction with DAZ/FTZ modes enabled and others with DAZ/FTZ disabled. I'm trying to get this by emitting LDMXCSR instructions with proper flag words. It appeared however that execution engine sometimes reorders these instructions with computational ones (say with
2019 Mar 18
2
[RFC] Making space for a flush-to-zero flag in FastMathFlags
On Sun, Mar 17, 2019 at 1:47 PM Craig Topper <craig.topper at gmail.com> wrote: > Can we move HasValueHandle out of the byte used for SubClassOptionalData and move it to the flags at the bottom of value by shrinking NumUserOperands to 27? I like this approach because it is less work for me. :) But I agree with Sanjay below that this only kicks the can slightly further down the road
2018 Aug 21
4
different output with fast-math flag
This is of course not homework. I am trying to understand how fast math optimizations work in llvm. When I compared IR for both the programs, the only thing I have noticed is that fdiv and fmul are replaced with fdiv fast and fmul fast. Not sure what happens in fdiv fast and fmul fast. I feel that its because d/max is really small number and fast-math does not care about small numbers and consider
2012 Jun 13
0
High CPU usage
Hi Tanmay, >Does compiling speex API with DISABLE_FLOAT_API and DISABLE_VBR solve the >problem? I remember that this fixed the problem. But at that time I also needed VBR so this was not an option. As far as I know, it is related to some calculations that involve float denormals that cause the high CPU usage. Today I'm still using the following code before speex_encoder_init and
2016 Oct 12
3
[test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On Wed, Oct 12, 2016 at 10:28 AM, Hal Finkel <hfinkel at anl.gov> wrote: > ----- Original Message ----- >> From: "Renato Golin" <renato.golin at linaro.org> >> To: "Hal Finkel" <hfinkel at anl.gov> >> Cc: "Sebastian Paul Pop" <s.pop at samsung.com>, "llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias
2016 Mar 22
2
NEON FP flags
Hal, James, My plan to disable vectorization on NEON FP had two steps: 1. Create the infrastructure to detect unsafe FP maths and force NEON FP via fast-math. 2. Use -mfpmath=neon/sse to fine-tune the flags even further, but this needs a lot of work in IR. The expected behaviour is to have most performance with least options, but with correctness in mind. So, we can't vectorize FP loops
2014 Apr 07
9
[LLVMdev] 3.4.1 Release Plans
Hi Robert, Can you ping the code owners about these patches. It might be good to write a separate email per code owner and cc the appropriate -commits list. Thanks, Tom On Wed, Apr 02, 2014 at 06:16:44PM +0400, Robert Khasanov wrote: > Hi Tom, > > I would like to nominate the following patches to be backported to 3.4.1 > > Clang: > 1. r204742 - Zinovy Nis <zinovy.nis at
2016 Feb 15
2
Vectorization with fast-math on irregular ISA sub-sets
Hi, > James, is that a correct assessment? Yes, it is also my belief that the only way ARMv7 NEON differs from IEEE754 is lack of denormal support. James > On 11 Feb 2016, at 10:53, Renato Golin <renato.golin at linaro.org> wrote: > > Hal, > > I had a read on the ARM ARM about VFP and SIMD FP semantics and my > analysis is that NEON's only problem is the
2016 Oct 12
3
[test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On 12 October 2016 at 15:05, Hal Finkel <hfinkel at anl.gov> wrote: > This is something we need to understand. No, there's not always an error bar. With FMA formation and without non-IEEE-compliant optimizations (i.e. fast-math), the optimized answer should be identical to the non-optimized answer. What about architectures that this is never respected, like Darwin? In the general
2009 Sep 23
2
High CPU usage
Hi Jean-Marc, I recompiled with FIXED_POINT and CPU utilization stays below 4%. This is a great improvement. So how can I fix this to work with floating point ? Thanks. Mark -----Urspr?ngliche Nachricht----- Von: Jean-Marc Valin [mailto:jean-marc.valin at usherbrooke.ca] Betreff: Re: [Speex-dev] High CPU usage Hi, Sound like it could be the good old denormalised float problem on the Intel
2011 Sep 02
1
[LLVMdev] does new EH require newer linker?
Is the new EH scheme completely compatible with the existing linker in Xcode 4.1? I am finding that today's changes break the ability to link xplor-nih with dragonegg under FSF gcc 4.6.2... de-g++46 -c thread.cc -O3 -ffast-math -funroll-loops -g -DX_MMAP_FLAGS=0 -DFORTRAN_INIT -fno-common -DDARWIN -D_REENTRANT -DNDEBUG -I/Users/howarth/xplor-nih-2.27/vmd/
2016 Feb 08
2
Vectorization with fast-math on irregular ISA sub-sets
Folks, I'm now looking at https://llvm.org/bugs/show_bug.cgi?id=16274, which seems to have some support in the vectorizer, but not as we need for this particular case. I may have missed something obvious, please let me know if there is a better way. As you already know, ARM has two FP instruction sets: VFP and NEON. VFP applies to single FP registers while NEON is a full SIMD. The problem is
2016 Feb 11
2
Vectorization with fast-math on irregular ISA sub-sets
Our processor also has some issues regarding the handling of denormals - scalar and vector - and we ran into a related problem only a few days ago. The v3.8 compiler has done a lot of good work on optimisations for floating-point math, but ironically one of them broke our implementation of 'nextafterf'. The desired code fragment (FP32) is: float xAbs = fabsf(x); since we know our
2017 Jul 01
2
[PATCH 1/2] nv110/exa: Remove depbars
Removed explicit depar instructions as they're not used by the blob anymore. Signed-off-by: Aaryaman Vasishta <jem456.vasishta at gmail.com> --- src/shader/exac8nv110.fp | 5 ++--- src/shader/exac8nv110.fpc | 10 ++++------ src/shader/exacanv110.fp | 5 ++--- src/shader/exacanv110.fpc | 10 ++++------ src/shader/exacmnv110.fp | 5 ++--- src/shader/exacmnv110.fpc | 10 ++++------
2016 Oct 16
2
[PATCH] exa: add GM10x acceleration support
rendercheck -f a8r8g8b8 passes as much as on a GK208, and xv appears to work. Very lightly tested. Instead of sticking coordinates into pushbufs, the vertex shader is modified to read them from a constbuf, indexed by vertex id. This approach could be used for all nvc0 generations, but I didn't want to rock the boat. Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- Note: this