thr3ads.net - similar to: "Handling of FP denormal values"

Displaying 20 results from an estimated 2000 matches similar to: "Handling of FP denormal values"

[cfe-dev] Handling of FP denormal values

2019 Sep 17

On Mon, Sep 16, 2019 at 9:43 PM Matt Arsenault via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> On Sep 16, 2019, at 19:57, Kaylor, Andrew via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Do we need an ftz fast-math flag?
>
> This would be useful for matching a handful of AMDGPU instructions (a fmad that only always flushes being the

[RFC] Making space for a flush-to-zero flag in FastMathFlags

2019 Mar 18

We knew the day when we needed another FMF bit was coming back in: https://reviews.llvm.org/D39304 ...it was just a question of 'when'. :) I'm guessing that an FTZ bit won't be the last new bit needed if we consider permutations between strict FP and fast-math. Even without that, denormals-as-zero (DAZ) might also be useful? So rather than continuing to carve these out bit-by-bit,

High CPU usage

2009 Sep 23

Hi Jeff, Hi Jean-Marc, I first modified the FPU control word to raise an exception whenever a denormal is used. Then I used the debugger to locate the exceptions and added VERY_SMALLs where they seem to fit well. Although I got CPU usage as low as 10%, I seriously lack knowledge of how things work inside speex. So just changing some code is not the best idea for me. My second attempt was to
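The idea of adding a tiny constant so that a decaying value never reaches the denormal range can be sketched as follows. This is only an illustration, not Speex code: `decay`, `is_denormal`, and `kVerySmall` are hypothetical names, the value of `kVerySmall` is an assumption, and the sketch assumes default IEEE behaviour (no flush-to-zero mode, no `-ffast-math`).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical bias constant. The name echoes the VERY_SMALL mentioned in the
// post, but the value chosen here is an assumption for this sketch.
constexpr float kVerySmall = 1e-20f;

// Halve `state` `steps` times, adding `bias` after each halving.
// With bias == 0 the value shrinks towards zero and passes through the
// denormal range; with a small positive bias it settles near 2 * bias.
float decay(float state, int steps, float bias) {
    assert(steps >= 0);
    for (int i = 0; i < steps; ++i) {
        state = state * 0.5f + bias;
    }
    return state;
}

// True if x is a denormal (subnormal) float.
bool is_denormal(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL;
}
```

Starting from 1.0f, 130 halvings without a bias give 2^-130, which lies below the smallest normal float (2^-126), so `is_denormal(decay(1.0f, 130, 0.0f))` holds. With `kVerySmall` as the bias the result stays normal.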

High CPU usage

2012 Jun 14

Hi Mark, Code below:

    int16_t* samples;
    int16_t* fbSilenceFrame;
    void *fSpeexState;
    float eng(0.f);
    int speexFrameSize(0);
    speex_encoder_ctl(speexState, SPEEX_GET_FRAME_SIZE, &speexFrameSize);
    for (int i = 0; i < speexFrameSize; i++) {
        eng += samples[i] * samples[i];
    }
    if (eng / speexFrameSize < 3.f) {
        memcpy(samples, silenceFrame, speexFrameSize * sizeof(int16_t));
    }

where
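The fragment appears to replace low-energy frames with a prepared silence frame before encoding. A self-contained sketch of that check, without the Speex calls, might look like this. The function name `replace_if_quiet` and its `threshold` parameter are hypothetical. Because the post is truncated, the intent is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Overwrites `samples` with `silence` when the mean energy of the frame is
// below `threshold`. Returns true if the frame was replaced.
bool replace_if_quiet(std::int16_t* samples,
                      const std::int16_t* silence,
                      std::size_t frame_size,
                      float threshold) {
    assert(frame_size > 0);
    float energy = 0.f;
    for (std::size_t i = 0; i < frame_size; ++i) {
        // Convert to float before squaring so the product is computed in float.
        const float s = static_cast<float>(samples[i]);
        energy += s * s;
    }
    if (energy / static_cast<float>(frame_size) < threshold) {
        std::memcpy(samples, silence, frame_size * sizeof(std::int16_t));
        return true;
    }
    return false;
}
```

With a threshold of 3.f, as in the post, a frame of alternating +1 and -1 samples has mean energy 1 and is replaced, while a frame of ±1000 samples is left alone.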

[RFC] Making space for a flush-to-zero flag in FastMathFlags

2019 Mar 16

Hi, I need to add a flush-denormals-to-zero (FTZ) flag to FastMathFlags, but we've already used up the 7 bits available in Value::SubclassOptionalData (the "backing storage" for FPMathOperator::getFastMathFlags()). These are the possibilities I can think of: 1. Increase the size of FPMathOperator. This gives us some additional bits for FTZ and other fastmath flags we'd want
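The space problem described here can be illustrated with a small sketch. It is not LLVM's code: the seven existing flag names follow LLVM's fast-math flags, but the bit positions, `kFlushToZero`, `kStorageBits`, `fits_in_storage`, and `pack` are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative flag layout. Bit positions are assumptions for this sketch.
enum FastFlag : std::uint8_t {
    kAllowReassoc    = 1u << 0,
    kNoNaNs          = 1u << 1,
    kNoInfs          = 1u << 2,
    kNoSignedZeros   = 1u << 3,
    kAllowReciprocal = 1u << 4,
    kAllowContract   = 1u << 5,
    kApproxFunc      = 1u << 6,
    kFlushToZero     = 1u << 7,  // hypothetical eighth flag
};

// The post says 7 bits are available for the flags.
constexpr unsigned kStorageBits = 7;
constexpr std::uint8_t kStorageMask = (1u << kStorageBits) - 1;

// True if every set bit in `flags` fits in the 7-bit storage.
constexpr bool fits_in_storage(std::uint8_t flags) {
    return (flags & ~kStorageMask) == 0;
}

// Packs flags into the 7-bit field and rejects anything that does not fit.
std::uint8_t pack(std::uint8_t flags) {
    assert(fits_in_storage(flags));
    return static_cast<std::uint8_t>(flags & kStorageMask);
}

static_assert(fits_in_storage(kApproxFunc), "seventh flag still fits");
static_assert(!fits_in_storage(kFlushToZero), "an eighth flag needs more storage");
```

Any combination of the seven existing flags fits, but the hypothetical eighth bit does not. That is why the RFC looks at enlarging or relocating the storage.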

[LLVMdev] ldmxcsr reordering issue

2014 Jan 28

Hi, I ran into trouble JIT-compiling x86 code when using Intrinsic::x86_sse_ldmxcsr. The target code must execute some SSE2 instructions with DAZ/FTZ modes enabled and others with DAZ/FTZ disabled. I'm trying to achieve this by emitting LDMXCSR instructions with the proper flag words. It appears, however, that the execution engine sometimes reorders these instructions with computational ones (say with

[RFC] Making space for a flush-to-zero flag in FastMathFlags

2019 Mar 18

On Sun, Mar 17, 2019 at 1:47 PM Craig Topper <craig.topper at gmail.com> wrote:
> Can we move HasValueHandle out of the byte used for SubClassOptionalData and move it to the flags at the bottom of value by shrinking NumUserOperands to 27?
I like this approach because it is less work for me. :) But I agree with Sanjay below that this only kicks the can slightly further down the road

different output with fast-math flag

2018 Aug 21

This is of course not homework. I am trying to understand how fast-math optimizations work in LLVM. When I compared the IR for both programs, the only thing I noticed is that fdiv and fmul are replaced with fdiv fast and fmul fast. I am not sure what happens in fdiv fast and fmul fast. I feel that it's because d/max is a really small number and fast-math does not care about small numbers and consider
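One reason fast-math can change output is that its flags let the compiler rewrite floating-point expressions algebraically, for example by reassociating additions, and floating-point addition is not associative. The sketch below shows that effect in source code. It assumes the file is compiled without fast-math, so the two orders are evaluated as written. It is not a claim about what happened in the poster's program, and the function names are hypothetical.

```cpp
#include <cassert>

// Sum evaluated left to right: (a + b) + c.
float sum_left(float a, float b, float c) {
    return (a + b) + c;
}

// Same operands, reassociated: a + (b + c).
float sum_right(float a, float b, float c) {
    return a + (b + c);
}

// Self-check for one input where the two orders disagree.
void check_example() {
    // 1e20f + -1e20f is exactly 0, so adding 1.0f afterwards gives 1.0f.
    assert(sum_left(1e20f, -1e20f, 1.0f) == 1.0f);
    // -1e20f + 1.0f rounds back to -1e20f, so the total is 0.0f.
    assert(sum_right(1e20f, -1e20f, 1.0f) == 0.0f);
}
```

Under strict IEEE semantics the compiler must keep the order written in the source. With reassociation allowed, it may pick either order, so results can differ between builds.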

High CPU usage

2012 Jun 13

Hi Tanmay,
> Does compiling speex API with DISABLE_FLOAT_API and DISABLE_VBR solve the problem?
I remember that this fixed the problem. But at that time I also needed VBR so this was not an option. As far as I know, it is related to some calculations that involve float denormals that cause the high CPU usage. Today I'm still using the following code before speex_encoder_init and

[test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"

2016 Oct 12

On Wed, Oct 12, 2016 at 10:28 AM, Hal Finkel <hfinkel at anl.gov> wrote: > ----- Original Message ----- >> From: "Renato Golin" <renato.golin at linaro.org> >> To: "Hal Finkel" <hfinkel at anl.gov> >> Cc: "Sebastian Paul Pop" <s.pop at samsung.com>, "llvm-dev" <llvm-dev at lists.llvm.org>, "Matthias

NEON FP flags

2016 Mar 22

Hal, James, My plan to disable vectorization on NEON FP had two steps: 1. Create the infrastructure to detect unsafe FP maths and force NEON FP via fast-math. 2. Use -mfpmath=neon/sse to fine-tune the flags even further, but this needs a lot of work in IR. The expected behaviour is to have most performance with least options, but with correctness in mind. So, we can't vectorize FP loops

[LLVMdev] 3.4.1 Release Plans

2014 Apr 07

Hi Robert, Can you ping the code owners about these patches. It might be good to write a separate email per code owner and cc the appropriate -commits list. Thanks, Tom On Wed, Apr 02, 2014 at 06:16:44PM +0400, Robert Khasanov wrote: > Hi Tom, > > I would like to nominate the following patches to be backported to 3.4.1 > > Clang: > 1. r204742 - Zinovy Nis <zinovy.nis at

Vectorization with fast-math on irregular ISA sub-sets

2016 Feb 15

Hi,
> James, is that a correct assessment?
Yes, it is also my belief that the only way ARMv7 NEON differs from IEEE754 is lack of denormal support.
James
> On 11 Feb 2016, at 10:53, Renato Golin <renato.golin at linaro.org> wrote:
>
> Hal,
>
> I had a read on the ARM ARM about VFP and SIMD FP semantics and my analysis is that NEON's only problem is the

[test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"

2016 Oct 12

On 12 October 2016 at 15:05, Hal Finkel <hfinkel at anl.gov> wrote: > This is something we need to understand. No, there's not always an error bar. With FMA formation and without non-IEEE-compliant optimizations (i.e. fast-math), the optimized answer should be identical to the non-optimized answer. What about architectures that this is never respected, like Darwin? In the general

High CPU usage

2009 Sep 23

Hi Jean-Marc, I recompiled with FIXED_POINT and CPU utilization stays below 4%. This is a great improvement. So how can I fix this to work with floating point? Thanks. Mark
-----Original Message-----
From: Jean-Marc Valin [mailto:jean-marc.valin at usherbrooke.ca]
Subject: Re: [Speex-dev] High CPU usage
Hi, Sounds like it could be the good old denormalised float problem on the Intel

[LLVMdev] does new EH require newer linker?

2011 Sep 02

Is the new EH scheme completely compatible with the existing linker in Xcode 4.1? I am finding that today's changes break the ability to link xplor-nih with dragonegg under FSF gcc 4.6.2... de-g++46 -c thread.cc -O3 -ffast-math -funroll-loops -g -DX_MMAP_FLAGS=0 -DFORTRAN_INIT -fno-common -DDARWIN -D_REENTRANT -DNDEBUG -I/Users/howarth/xplor-nih-2.27/vmd/

Vectorization with fast-math on irregular ISA sub-sets

2016 Feb 08

Folks, I'm now looking at https://llvm.org/bugs/show_bug.cgi?id=16274, which seems to have some support in the vectorizer, but not as we need for this particular case. I may have missed something obvious, please let me know if there is a better way. As you already know, ARM has two FP instruction sets: VFP and NEON. VFP applies to single FP registers while NEON is a full SIMD. The problem is

Vectorization with fast-math on irregular ISA sub-sets

2016 Feb 11

Our processor also has some issues regarding the handling of denormals - scalar and vector - and we ran into a related problem only a few days ago. The v3.8 compiler has done a lot of good work on optimisations for floating-point math, but ironically one of them broke our implementation of 'nextafterf'. The desired code fragment (FP32) is: float xAbs = fabsf(x); since we know our

[PATCH 1/2] nv110/exa: Remove depbars

2017 Jul 01

Removed explicit depbar instructions as they're not used by the blob anymore.
Signed-off-by: Aaryaman Vasishta <jem456.vasishta at gmail.com>
---
 src/shader/exac8nv110.fp  |  5 ++---
 src/shader/exac8nv110.fpc | 10 ++++------
 src/shader/exacanv110.fp  |  5 ++---
 src/shader/exacanv110.fpc | 10 ++++------
 src/shader/exacmnv110.fp  |  5 ++---
 src/shader/exacmnv110.fpc | 10 ++++------

[PATCH] exa: add GM10x acceleration support

2016 Oct 16

rendercheck -f a8r8g8b8 passes as much as on a GK208, and xv appears to work. Very lightly tested. Instead of sticking coordinates into pushbufs, the vertex shader is modified to read them from a constbuf, indexed by vertex id. This approach could be used for all nvc0 generations, but I didn't want to rock the boat. Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- Note: this
