thr3ads.net - search: "accum"

2009 Oct 26

1

[PATCH] Fix miscompile of SSE resampler

...iles changed, 12 insertions(+), 20 deletions(-) diff --git a/libspeex/resample.c b/libspeex/resample.c index 7b5a308..8131380 100644 --- a/libspeex/resample.c +++ b/libspeex/resample.c @@ -361,7 +361,7 @@ static int resampler_basic_direct_single(SpeexResamplerState *st, spx_uint32_t c sum = accum[0] + accum[1] + accum[2] + accum[3]; */ #else - sum = inner_product_single(sinc, iptr, N); + inner_product_single(&sum, sinc, iptr, N); #endif out[out_stride * out_sample++] = SATURATE32(PSHR32(sum, 15), 32767); @@ -412,7 +412,7 @@ static int resampler_basic_direct_double(...

2008 May 03

2

Resampler (no api)

...< 0;j++) - { - sum += MULT16_16(mem[last_sample+j],st->sinc_table[samp_frac_num*st->filt_len+j]); + const spx_word16_t *sinc = & sinc_table[samp_frac_num*N]; + const spx_word16_t *iptr = & in[last_sample]; + +#ifndef OVERRIDE_INNER_PRODUCT_SINGLE + float accum[4] = {0,0,0,0}; + + for(j=0;j<N;j+=4) { + accum[0] += sinc[j]*iptr[j]; + accum[1] += sinc[j+1]*iptr[j+1]; + accum[2] += sinc[j+2]*iptr[j+2]; + accum[3] += sinc[j+3]*iptr[j+3]; } - - /* Do the new part */ - if (in != NULL) + sum = accum...
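The four-way accumulation in this patch can be sketched in isolation. The block below is a minimal illustration and not the Speex code: the name `inner_product4` and the requirement that `n` be a multiple of 4 are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: dot product using four independent partial sums.
 * Assumes n is a multiple of 4, as the unrolled loop in the patch does. */
static float inner_product4(const float *a, const float *b, size_t n)
{
    float accum[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t j;

    assert(n % 4 == 0);
    for (j = 0; j < n; j += 4) {
        accum[0] += a[j]     * b[j];
        accum[1] += a[j + 1] * b[j + 1];
        accum[2] += a[j + 2] * b[j + 2];
        accum[3] += a[j + 3] * b[j + 3];
    }
    /* Combine the partial sums only once, after the loop. */
    return accum[0] + accum[1] + accum[2] + accum[3];
}
```

Keeping the partial sums separate removes the dependency between consecutive additions, which is what makes the loop easy to map onto a four-lane SIMD unit.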

2008 May 03

0

Resampler, memory only variant

...< 0;j++) - { - sum += MULT16_16(mem[last_sample+j],st->sinc_table[samp_frac_num*st->filt_len+j]); + const spx_word16_t *sinc = & sinc_table[samp_frac_num*N]; + const spx_word16_t *iptr = & in[last_sample]; + +#ifndef OVERRIDE_INNER_PRODUCT_SINGLE + float accum[4] = {0,0,0,0}; + + for(j=0;j<N;j+=4) { + accum[0] += sinc[j]*iptr[j]; + accum[1] += sinc[j+1]*iptr[j+1]; + accum[2] += sinc[j+2]*iptr[j+2]; + accum[3] += sinc[j+3]*iptr[j+3]; } - - /* Do the new part */ - if (in != NULL) + sum = accum...

2018 Apr 07

0

SCEV and LoopStrengthReduction Formulae

...timization for a living, I did similar tricks pretty much everywhere in DSP functions. It’d be pretty nice if the compiler could do it too. There is one alternate approach that I recall, which looks like this: Original code (example, pseudocode): int add_delta_256(uint8 *in1, uint8 *in2) { int accum = 0; for (int i = 0; i < 16; ++i) { uint8x16 a = load16(in1 + i *16); // NOTE: takes an extra addressing op because x86 uint8x16 b = load16(in2 + i *16); // NOTE: takes an extra addressing op because x86 accum += psadbw(a, b); } return accum; } end of loop: inc i cmp i, 16 jl loo...

2009 Sep 11

2

Accumulating results from "for" loop in a list/array

Dear R users, I would like to accumulate objects generated from 'for' loop to a list or array. To illustrate the problem, arbitrary data set and script is shown below, x <- data.frame(a = c(rep("n",3),rep("y",2),rep("n",3),rep("y",2)), b = c(rep("y",2),rep("n",...

2012 Dec 10

3

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

...Linux Foundation -------------- next part -------------- From e9ca53595ee7a274308c823d14cccd5a8598814d Mon Sep 17 00:00:00 2001 From: Matthew Curtis <mcurtis at codeaurora.org> Date: Mon, 3 Dec 2012 17:25:20 -0600 Subject: [PATCH] Teach ScalarEvolution to handle IV=add(zext(trunc(IV)), Accum) Code generated for 8-bit and 16-bit induction variables commonly produces this pattern (on hexagon), where the add is performed as a 32-bit operation, but is preceded by a truncate and zero-extend (or equivalent bitwise and). This change teaches ScalarEvolution to recognize this pattern and cre...

2014 Jan 02

0

[PATCH] drm/nvc0-: Fix voltage obtained from vbios.

...b/drivers/gpu/drm/nouveau/core/subdev/bios/vmap.c @@ -87,14 +87,25 @@ nvbios_vmap_entry_parse(struct nouveau_bios *bios, int idx, u8 *ver, u8 *len, u16 vmap = nvbios_vmap_entry(bios, idx, ver, len); memset(info, 0x00, sizeof(*info)); switch (!!vmap * *ver) { - case 0x10: + case 0x10: { + s32 accum, b, c; + info->link = 0xff; info->min = nv_ro32(bios, vmap + 0x00); info->max = nv_ro32(bios, vmap + 0x04); - info->arg[0] = nv_ro32(bios, vmap + 0x08); - info->arg[1] = nv_ro32(bios, vmap + 0x0c); - info->arg[2] = nv_ro32(bios, vmap + 0x10); + + accum = nv_ro...

2003 May 20

2

mdct_backward with fused muladd?

Can anybody point me at any resources that would explain how to optimize mdct_backward for a cpu with a fused multiply-accumulate unit? From what I understand from responses to my older postings, Tremor's mdct_backward could be rewritten to take advantage of a muladd. My target machine can do either two-wide 32x32 + Accum(64) -> Accum(64) integer muladd or eight-wide 16x16 + Accum(32) -> Accum(32) integer m...

2008 May 14

6

PWGL in wine, problems

Hello, I'm new on this list. First of all, thank you to all the developers of this great project! At the moment there is only an application that keeps me on both macos and windows, its name is PWGL a free environment for computer assisted composition in openGL. (http://www2.siba.fi/PWGL/) I'm running Ubuntu 8.04 and wine 0.9.59. I have to say that I also installed vcrun2005 and

2011 Sep 01

0

[PATCH 3/5] resample: Add NEON optimized inner_product_single for fixed point

...endif +#ifdef _USE_NEON +#include "resample_neon.h" +#endif + /* Numer of elements to allocate on the stack */ #ifdef VAR_ARRAYS #define FIXED_STACK_ALLOC 8192 @@ -360,11 +364,12 @@ static int resampler_basic_direct_single(SpeexResamplerState *st, spx_uint32_t c } sum = accum[0] + accum[1] + accum[2] + accum[3]; */ + sum = SATURATE32PSHR(sum, 15, 32767); #else sum = inner_product_single(sinc, iptr, N); #endif - out[out_stride * out_sample++] = SATURATE32(PSHR32(sum, 15), 32767); + out[out_stride * out_sample++] = sum; last_sample += int...

2004 Aug 18

1

Hangups - SIGFPE in dsp.c

Hi, I'm running the latest CVS HEAD version of asterisk, and I'm experiencing hangups during voice conversation. This happens quite regularly and often. The problem is in dsp.c, line 1235, where it says accum /= len; But `len', at this point, is 0, resulting in a SIGFPE. The routine ast_frame *i4l_read() in channels/chan_modem_i4l.c:411 is setting p->fr.datalen to p->obuflen which is zero. Has anybody noticed this, too? Since I don't know the code, I cannot suggest a fix, but maybe so...
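The failure described here, an integer division by a zero length, can be guarded against in a few lines. The sketch below is not the Asterisk code: `average_magnitude` and its convention of returning 0 for an empty buffer are hypothetical choices for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: mean absolute value of len samples.
 * Returns 0 for an empty buffer instead of dividing by zero. */
static int average_magnitude(const short *s, int len)
{
    int accum = 0;
    int x;

    if (len <= 0)
        return 0;               /* accum / 0 would typically raise SIGFPE */
    for (x = 0; x < len; x++)
        accum += abs(s[x]);
    return accum / len;
}
```

Checking the length before the loop keeps the division reachable only when `len` is positive.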

2018 Apr 03

4

SCEV and LoopStrengthReduction Formulae

I am attempting to implement a minor loop strength reduction optimization for targets that support compare and jump fusion, specifically TTI::canMacroFuseCmp(). My approach might be wrong; however, I am soliciting the idea for feedback, so that I can implement this correctly. My plan is to add a Supplemental LSR formula to LoopStrengthReduce.cpp that optimizes the following case, but perhaps

2010 Jan 20

2

Plot frame border to start at zero?

Hello, I am creating plots of hourly precipitation and accumulated precipitation (on different axes, see attached image). I was wondering how I can have the plot frame (black border) start at zero; it looks like it is plotted at less than zero. The code I use to create the png files is below: CairoPNG(PNG_file,width=1000, height=600, pointsize=14, bg="...

2011 May 20

1

[LLVMdev] subregisters, def-kill

...g16511:hi16<def> = COPY %reg16473:lo16<kill>, %reg16511<imp-def>; 844L %reg16511:lo16<def> = COPY %reg16478:lo16<kill>; 852L %r4<def,dead> = st_postMod %reg16511<kill>, %r4 ... 844L %reg16511:lo16<def> = COPY %reg16478:lo16<kill>; Accum:%reg16511,16478 Considering merging %reg16478 with reg%16511 to Accum RHS = %reg16478,0.000000e+00 = [804d,844d:0) 0 at 804d LHS = %reg16511,0.000000e+00 = [836d,844d:1)[844d,852d:0) 0 at 844d 1 at 836d Interference! It seems that there is a Live range from the...

2013 Dec 18

0

[LLVMdev] LLVM ARM VMLA instruction

...would be because it follows what C tells us a compiler has to do by default but provides overrides in either direction if you know what you're doing. The key point is that LLVM (currently) has no notion of statement boundaries, so it would fuse the operations in this function: float foo(float accum, float lhs, float rhs) { float product = lhs * rhs; return accum + product; } This isn't allowed even under FP_CONTRACT=on (the multiply and add do not occur within a single expression), so LLVM can't in good conscience enable these optimisations by default. Cheers. Tim.
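The distinction drawn here between a single expression and separate statements can be laid out side by side. This is a minimal sketch for illustration: both function names are hypothetical, and whether a given compiler actually fuses the first form depends on its flags and target.

```c
/* One expression: the C standard allows a compiler to contract the
 * multiply and the add into one fused operation when FP_CONTRACT is on. */
static float muladd_one_expression(float accum, float lhs, float rhs)
{
    return accum + lhs * rhs;
}

/* Two statements: the product is a complete expression of its own, so
 * the standard does not permit contracting it with the later add. */
static float muladd_two_statements(float accum, float lhs, float rhs)
{
    float product = lhs * rhs;
    return accum + product;
}
```

For inputs whose product is exactly representable the two forms agree; they can differ in the last bit when the fused form skips the intermediate rounding of the product.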

2012 Apr 05

1

[PATCH] remove unnecesary typedef in bitwriter.c

..._WORDS_TO_BITS(words) ((words) * FLAC__BITS_PER_WORD) #define FLAC__TOTAL_BITS(bw) (FLAC__WORDS_TO_BITS((bw)->words) + (bw)->bits) @@ -85,8 +84,8 @@ static const unsigned FLAC__BITWRITER_DEFAULT_INCREMENT = 4096u / sizeof(bwword) #endif struct FLAC__BitWriter { - bwword *buffer; - bwword accum; /* accumulator; bits are right-justified; when full, accum is appended to buffer */ + uint32_t *buffer; + uint32_t accum; /* accumulator; bits are right-justified; when full, accum is appended to buffer */ unsigned capacity; /* capacity of buffer in words */ unsigned words; /* # of complete wo...
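The comment on `accum` describes a common pattern: bits collect right-justified in one word and are appended to the buffer when the word fills. A simplified sketch of that pattern follows. It is not the FLAC implementation: `struct bit_writer`, `bit_writer_put`, and the error convention are hypothetical, and the caller is assumed to zero `accum` and `bits` before first use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified writer; not the FLAC__BitWriter layout. */
struct bit_writer {
    uint32_t *buffer;   /* completed 32-bit words */
    size_t capacity;    /* capacity of buffer in words */
    size_t words;       /* number of complete words in buffer */
    uint32_t accum;     /* pending bits, right-justified */
    unsigned bits;      /* number of pending bits in accum, 0..31 */
};

/* Append the low nbits (1..32) of val, most significant bit first.
 * Returns 0 on success, -1 on a bad bit count or a full buffer. */
static int bit_writer_put(struct bit_writer *bw, uint32_t val, unsigned nbits)
{
    unsigned left, rest;
    uint32_t word;

    if (nbits == 0 || nbits > 32)
        return -1;
    if (nbits < 32)
        val &= (1u << nbits) - 1u;

    left = 32 - bw->bits;               /* free bits in the accumulator */
    if (nbits < left) {                 /* fits without filling it */
        bw->accum = (bw->accum << nbits) | val;
        bw->bits += nbits;
        return 0;
    }
    if (bw->words >= bw->capacity)
        return -1;

    rest = nbits - left;                /* bits that spill past this word */
    word = (left == 32) ? 0 : (bw->accum << left);
    word |= val >> rest;
    bw->buffer[bw->words++] = word;     /* accumulator is full: append it */
    bw->accum = rest ? (val & ((1u << rest) - 1u)) : 0;
    bw->bits = rest;
    return 0;
}
```

The special case for `left == 32` avoids shifting a 32-bit value by 32, which is undefined behaviour in C.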

2013 Dec 19

2

[LLVMdev] LLVM ARM VMLA instruction

...C tells us a compiler has to do > by default but provides overrides in either direction if you know what > you're doing. > > The key point is that LLVM (currently) has no notion of statement > boundaries, so it would fuse the operations in this function: > > float foo(float accum, float lhs, float rhs) { > float product = lhs * rhs; > return accum + product; > } > > This isn't allowed even under FP_CONTRACT=on (the multiply and add do > not occur within a single expression), so LLVM can't in good > conscience enable these optimisations by de...

2004 Aug 19

2

Floating point exception help

...200 > @@ -1229,6 +1229,13 @@ > int x; > int res = 0; > > + /*PM BEGIN*/ > + if (len==0) { > + ast_log(LOG_WARNING, "zero length packet\n"); > + return 0; > + } > + /*PM END*/ > + > accum = 0; > for (x=0;x<len; x++) > accum += abs(s[x]); > > > _______________________________________________ > Asterisk-Users mailing list > Asterisk-Users@lists.digium.com > http://lists.digium.com/mailman/listinfo/asterisk-users > To UNSUBSCRIBE or u...

2013 Dec 18

2

[LLVMdev] LLVM ARM VMLA instruction

> "-ffp-contract=fast" is needed Correct - clang is different than gcc, icc, msvc, xlc, etc. on this. Still haven't seen any explanation for how this is better though... http://llvm.org/bugs/show_bug.cgi?id=17188 http://llvm.org/bugs/show_bug.cgi?id=17211 On Wed, Dec 18, 2013 at 6:02 AM, Tim Northover <t.p.northover at gmail.com>wrote: > > I believe that's the

2011 Sep 01

6

[PATCH 0/5] ARM NEON optimization for samplerate converter

From: Jyri Sarha <jsarha at ti.com> I optimized Speex resampler for NEON capable ARM CPUs. The first patch should speed up resampling on any platform that can spare the increased memory usage. It would be nice to have these merged to the master branch. Please let me know if there is anything I can do to help the merge. The patches have been rebased on top of master branch in
