thr3ads.net - search: "accumated"

Displaying 20 results from an estimated 63 matches for "accumated".

2009 Oct 26

[PATCH] Fix miscompile of SSE resampler

From: Thorvald Natvig <slicer at users.sourceforge.net> Some optimizing compilers miscompile the current SSE optimizations when full optimizations are enabled. By using output value pointer instead of a return value, we can bypass this misbehaviour. --- libspeex/resample.c | 8 ++++---- libspeex/resample_sse.h | 24 ++++++++---------------- 2 files changed, 12 insertions(+), 20

2008 May 03

Resampler (no api)

.. And a version without the API changes. -------------- next part -------------- Index: libspeex/resample_sse.h =================================================================== --- libspeex/resample_sse.h (revision 0) +++ libspeex/resample_sse.h (revision 0) @@ -0,0 +1,128 @@ +/* Copyright (C) 2002-2008 Jean-Marc Valin + * Copyright (C) 2008 Thorvald Natvig + */ +/** + @file resample_sse.h +

2008 May 03

Resampler, memory only variant

Hi, Here's the (hopefully) final version of the resampler, now always using st->mem as the buffer area. It only allocates buffers on the stack when it's necessary to convert the output between int and float. -------------- next part -------------- Index: include/speex/speex_resampler.h =================================================================== ---

2018 Apr 07

SCEV and LoopStrengthReduction Formulae

> > I realize this is a micro-op saving a single cycle. But this reduces the instruction count, one less > instr to decode in a potentially hot path. If this all makes sense, and seems like a reasonable addition > to llvm, would it make sense to implement this as a supplemental LSR formula, or as a separate pass? This seems reasonable to me so long as rbx has no other uses that

2009 Sep 11

Accumulating results from "for" loop in a list/array

Dear R users, I would like to accumulate objects generated from 'for' loop to a list or array. To illustrate the problem, arbitrary data set and script is shown below, x <- data.frame(a = c(rep("n",3),rep("y",2),rep("n",3),rep("y",2)), b = c(rep("y",2),rep("n",4),rep("y",3),"n"), c =

2012 Dec 10

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

Hello all, I wanted to get some feedback on this patch for ScalarEvolution. It addresses a performance problem I am seeing for a simple benchmark. Starting with this C code:

    signed char foo(void)
    {
      const int count = 8000;
      signed char result = 0;
      int j;

      for (j = 0; j < count; ++j) {
        result += (result_t)(3);
      }

      return result;
    }

I
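The loop in the snippet adds a constant to a narrow accumulator a fixed number of times, so its result is computable without running the loop once an analysis can see through the narrowing conversion. Below is a minimal sketch of that equivalence, not a description of the patch itself. It assumes `result_t` is `signed char` (the snippet does not define it) and a compiler where out-of-range conversion to `signed char` wraps modulo 256 (this is implementation-defined in C). `foo_loop` and `foo_folded` are hypothetical names.

```c
/* Loop form, following the snippet; result_t is assumed to be signed char. */
static signed char foo_loop(void)
{
    const int count = 8000;
    signed char result = 0;
    int j;

    for (j = 0; j < count; ++j) {
        /* result is promoted to int, 3 is added, and the sum is
         * converted back to signed char on every iteration. */
        result += (signed char)3;
    }
    return result;
}

/* Closed form: one multiplication and a single narrowing conversion.
 * Equal to foo_loop() only under the wrap-around assumption above. */
static signed char foo_folded(void)
{
    return (signed char)(3 * 8000);
}
```

Under the stated assumption both functions return the same value, which is the kind of folding the loop form otherwise hides from the optimizer.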

2014 Jan 02

[PATCH] drm/nvc0-: Fix voltage obtained from vbios.

Coefficients are based on the formula: uV = 0.1 * arg[0] + 150.5 * arg[1] + 22.65025 * arg[2] It seems to be rounded downwards. I have no idea why the voltage isn't specified in the bios directly. Signed-off-by: Maarten Lankhorst <maarten.lankhorst at canonical.com> ---- diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/vmap.c b/drivers/gpu/drm/nouveau/core/subdev/bios/vmap.c

2003 May 20

mdct_backward with fused muladd?

Can anybody point me at any resources that would explain how to optimize mdct_backward for a cpu with a fused multiply-accumulate unit? From what I understand from responses to my older postings, Tremor's mdct_backward could be rewritten to take advantage of a muladd. My target machine can do either two-wide 32x32 + Accum(64) -> Accum(64) integer muladd or eight-wide 16x16 + Accum(32)
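For readers unfamiliar with the notation, "32x32 + Accum(64) -> Accum(64)" describes multiplying two 32-bit integers and adding the full product to a 64-bit accumulator. A minimal scalar sketch of that operation in C follows. It is not Tremor's mdct_backward, and `mac32_accumulate` is a hypothetical name chosen for illustration; a compiler for a target with such a unit may map the loop body to a single multiply-accumulate instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum of 32x32-bit products in a 64-bit accumulator. */
static int64_t mac32_accumulate(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t acc = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Widen one operand first so the product is computed in 64 bits. */
        acc += (int64_t)a[i] * b[i];
    }
    return acc;
}
```

Widening before the multiplication matters: multiplying two `int32_t` values directly would overflow before the result ever reached the 64-bit accumulator.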

2008 May 14

PWGL in wine, problems

Hello, I'm new on this list. First of all, thank you to all the developers of this great project! At the moment there is only an application that keeps me on both macos and windows, its name is PWGL a free environment for computer assisted composition in openGL. (http://www2.siba.fi/PWGL/) I'm running Ubuntu 8.04 and wine 0.9.59. I have to say that I also installed vcrun2005 and

2011 Sep 01

[PATCH 3/5] resample: Add NEON optimized inner_product_single for fixed point

From: Jyri Sarha <jsarha at ti.com> Semantics of inner_product_single have also been changed to contain the final right shift and saturation so it can also be implemented in the optimal way for the used platform. This change affects fixed point calculations only. I also added a new fixed point macro SATURATE32PSHR(x, shift, a). It does pretty much the same thing as SATURATE32(PSHR32(x,

2004 Aug 18

Hangups - SIGFPE in dsp.c

Hi, I'm running the latest CVS HEAD version of asterisk, and I'm experiencing hangups during voice conversation. This happens quite regularly and often. The problem is in dsp.c, line 1235, where it says accum /= len; But `len', at this point, is 0, resulting in a SIGFPE. The routine ast_frame *i4l_read() in channels/chan_modem_i4l.c:411 is setting p->fr.datalen to
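The failure described here is an integer division by a zero length. A minimal sketch of the kind of guard that avoids it is shown below. `mean_abs_level` and its parameters are hypothetical names for illustration and are not taken from asterisk's dsp.c.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not asterisk code: average absolute sample level.
 * Returns 0 for an empty frame instead of dividing by zero. */
static int mean_abs_level(const short *samples, size_t len)
{
    long accum = 0;
    size_t i;

    if (samples == NULL || len == 0)
        return 0;               /* zero-length packet: nothing to average */

    for (i = 0; i < len; i++)
        accum += labs((long)samples[i]);

    accum /= (long)len;         /* safe: len is known to be non-zero here */
    return (int)accum;
}
```

Checking the length before the division keeps the caller running when a driver hands over an empty frame; whether to log a warning at that point is a separate design choice.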

2018 Apr 03

SCEV and LoopStrengthReduction Formulae

I am attempting to implement a minor loop strength reduction optimization for targets that support compare and jump fusion, specifically TTI::canMacroFuseCmp(). My approach might be wrong; however, I am soliciting the idea for feedback, so that I can implement this correctly. My plan is to add a Supplemental LSR formula to LoopStrengthReduce.cpp that optimizes the following case, but perhaps

2010 Jan 20

Plot frame border to start at zero?

Hello, I am creating plots of hourly precipitation and accumulated precipitation (on different axes, see attached image). I was wondering how I can have the plot frame (black border) start at zero; it looks like it is plotted at less than zero. The code I use to create the png files is below: CairoPNG(PNG_file,width=1000, height=600, pointsize=14, bg="white") opar <-

2011 May 20

[LLVMdev] subregisters, def-kill

If I write

    %reg16506<def> = INSERT_SUBREG %reg16506, %reg16445, hi16; #1
    %reg16506<def> = INSERT_SUBREG %reg16506, %reg16468, lo16; #2
    store %reg16506 #3

it will not coalesce, as

    LiveVariables:
      on #2: %16506 gets #2 as a kill
         #3: %16506 gets #3 as an additional kill
    LiveIntervalAnalysis:

2013 Dec 18

[LLVMdev] LLVM ARM VMLA instruction

> http://llvm.org/bugs/show_bug.cgi?id=17188 > http://llvm.org/bugs/show_bug.cgi?id=17211 Ah, thanks. That makes a lot more sense now. > Correct - clang is different than gcc, icc, msvc, xlc, etc. on this. Still > haven't seen any explanation for how this is better though... That would be because it follows what C tells us a compiler has to do by default but provides overrides

2012 Apr 05

[PATCH] remove unnecesary typedef in bitwriter.c

--- src/libFLAC/bitwriter.c | 31 +++++++++++++++---------------- 1 file changed, 15 insertions(+), 16 deletions(-) diff --git a/src/libFLAC/bitwriter.c b/src/libFLAC/bitwriter.c index 651440d..7da4b15 100644 --- a/src/libFLAC/bitwriter.c +++ b/src/libFLAC/bitwriter.c @@ -43,12 +43,11 @@ /* Things should be fastest when this matches the machine word size */ /* WATCHOUT: if you change this

2013 Dec 19

[LLVMdev] LLVM ARM VMLA instruction

Thanks for the explanation, Tim! gcc 4.8.1 *does* generate an fma for your code example for an x86 target that supports fma. I'd bet that the HW vendors' compilers do the same, but I don't have any of those installed at the moment to test that theory. So this is a bug in those compilers? Do you know how they justify it? I see section 6.5 "Expressions" in the C standard, and

2004 Aug 19

Floating point exception help

Hi Manfred, I applied the patch and recompiled and reinstalled and I got the following warning during my first test call: Aug 19 12:26:51 WARNING[294927]: dsp.c:1234 __ast_dsp_silence: zero length packet It looks like that could be the problem... and the fix! I'll let you know if the problem reoccurs. Might it be an idea to submit the patch to the bugtracker? Thanks, Gary ----- Original

2013 Dec 18

[LLVMdev] LLVM ARM VMLA instruction

> "-ffp-contract=fast" is needed Correct - clang is different than gcc, icc, msvc, xlc, etc. on this. Still haven't seen any explanation for how this is better though... http://llvm.org/bugs/show_bug.cgi?id=17188 http://llvm.org/bugs/show_bug.cgi?id=17211 On Wed, Dec 18, 2013 at 6:02 AM, Tim Northover <t.p.northover at gmail.com>wrote: > > I believe that's the

2011 Sep 01

[PATCH 0/5] ARM NEON optimization for samplerate converter

From: Jyri Sarha <jsarha at ti.com> I optimized Speex resampler for NEON capable ARM CPUs. The first patch should speed up resampling on any platform that can spare the increased memory usage. It would be nice to have these merged to the master branch. Please let me know if there is anything I can do to help the merge. The patches have been rebased on top of master branch in
