Displaying 20 results from an estimated 63 matches for "accum".
2009 Oct 26
1
[PATCH] Fix miscompile of SSE resampler
...iles changed, 12 insertions(+), 20 deletions(-)
diff --git a/libspeex/resample.c b/libspeex/resample.c
index 7b5a308..8131380 100644
--- a/libspeex/resample.c
+++ b/libspeex/resample.c
@@ -361,7 +361,7 @@ static int resampler_basic_direct_single(SpeexResamplerState *st, spx_uint32_t c
sum = accum[0] + accum[1] + accum[2] + accum[3];
*/
#else
- sum = inner_product_single(sinc, iptr, N);
+ inner_product_single(&sum, sinc, iptr, N);
#endif
out[out_stride * out_sample++] = SATURATE32(PSHR32(sum, 15), 32767);
@@ -412,7 +412,7 @@ static int resampler_basic_direct_double(...
2008 May 03
2
Resampler (no api)
...< 0;j++)
- {
- sum += MULT16_16(mem[last_sample+j],st->sinc_table[samp_frac_num*st->filt_len+j]);
+ const spx_word16_t *sinc = & sinc_table[samp_frac_num*N];
+ const spx_word16_t *iptr = & in[last_sample];
+
+#ifndef OVERRIDE_INNER_PRODUCT_SINGLE
+ float accum[4] = {0,0,0,0};
+
+ for(j=0;j<N;j+=4) {
+ accum[0] += sinc[j]*iptr[j];
+ accum[1] += sinc[j+1]*iptr[j+1];
+ accum[2] += sinc[j+2]*iptr[j+2];
+ accum[3] += sinc[j+3]*iptr[j+3];
}
-
- /* Do the new part */
- if (in != NULL)
+ sum = accum...
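The unrolled loop in this patch keeps four independent partial sums and combines them once at the end. A minimal standalone sketch of that pattern follows, assuming a floating-point build (so the samples are plain `float`) and a length that is a multiple of 4; the name `inner_product_4acc` is hypothetical and is not a Speex function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Speex): dot product of two float
 * arrays using four independent partial sums, mirroring the unrolled
 * loop in the patch. n is assumed to be a multiple of 4. */
float inner_product_4acc(const float *sinc, const float *iptr, size_t n)
{
    float accum[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t j;

    assert(n % 4 == 0);
    for (j = 0; j < n; j += 4) {
        accum[0] += sinc[j]     * iptr[j];
        accum[1] += sinc[j + 1] * iptr[j + 1];
        accum[2] += sinc[j + 2] * iptr[j + 2];
        accum[3] += sinc[j + 3] * iptr[j + 3];
    }
    /* Combine the partial sums once, after the loop. */
    return accum[0] + accum[1] + accum[2] + accum[3];
}
```

Because the four sums are added in a different order than a single running sum, the floating-point result can differ slightly from a naive loop.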
2008 May 03
0
Resampler, memory only variant
...< 0;j++)
- {
- sum += MULT16_16(mem[last_sample+j],st->sinc_table[samp_frac_num*st->filt_len+j]);
+ const spx_word16_t *sinc = & sinc_table[samp_frac_num*N];
+ const spx_word16_t *iptr = & in[last_sample];
+
+#ifndef OVERRIDE_INNER_PRODUCT_SINGLE
+ float accum[4] = {0,0,0,0};
+
+ for(j=0;j<N;j+=4) {
+ accum[0] += sinc[j]*iptr[j];
+ accum[1] += sinc[j+1]*iptr[j+1];
+ accum[2] += sinc[j+2]*iptr[j+2];
+ accum[3] += sinc[j+3]*iptr[j+3];
}
-
- /* Do the new part */
- if (in != NULL)
+ sum = accum...
2018 Apr 07
0
SCEV and LoopStrengthReduction Formulae
...timization for a living, I did similar tricks pretty much everywhere in DSP functions. It’d be pretty nice if the compiler could do it too.
There is one alternate approach that I recall, which looks like this:
Original code (example, pseudocode):
int add_delta_256(uint8 *in1, uint8 *in2) {
    int accum = 0;
    for (int i = 0; i < 16; ++i) {
        uint8x16 a = load16(in1 + i * 16); // NOTE: takes an extra addressing op because x86
        uint8x16 b = load16(in2 + i * 16); // NOTE: takes an extra addressing op because x86
        accum += psadbw(a, b);
    }
    return accum;
}
end of loop:
inc i
cmp i, 16
jl loo...
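For readers without the vector types at hand, here is a portable scalar sketch of what the pseudocode above appears to compute: the sum of absolute differences over 256 bytes. This is an assumption about the intent (the real `psadbw` instruction produces per-lane partial sums), and `add_delta_256_scalar` is a hypothetical name.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar stand-in for the pseudocode (assumed intent): accumulate
 * |in1[k] - in2[k]| over 256 bytes, one byte per iteration instead of
 * 16 bytes per psadbw. */
int add_delta_256_scalar(const uint8_t *in1, const uint8_t *in2)
{
    int accum = 0;
    for (int k = 0; k < 256; ++k) {
        accum += abs((int)in1[k] - (int)in2[k]);
    }
    return accum;
}
```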
2009 Sep 11
2
Accumulating results from "for" loop in a list/array
Dear R users,
I would like to accumulate objects generated from 'for' loop to a list or
array.
To illustrate the problem, arbitrary data set and script is shown below,
x <- data.frame(a = c(rep("n",3),rep("y",2),rep("n",3),rep("y",2)), b =
c(rep("y",2),rep("n",...
2012 Dec 10
3
[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
...Linux Foundation
-------------- next part --------------
From e9ca53595ee7a274308c823d14cccd5a8598814d Mon Sep 17 00:00:00 2001
From: Matthew Curtis <mcurtis at codeaurora.org>
Date: Mon, 3 Dec 2012 17:25:20 -0600
Subject: [PATCH] Teach ScalarEvolution to handle IV=add(zext(trunc(IV)),
Accum)
Code generated for 8-bit and 16-bit induction variables commonly
produces this pattern (on hexagon), where the add is performed as a
32-bit operation, but is preceded by a truncate and zero-extend (or
equivalent bitwise and).
This change teaches ScalarEvolution to recognize this pattern and
cre...
2014 Jan 02
0
[PATCH] drm/nvc0-: Fix voltage obtained from vbios.
...b/drivers/gpu/drm/nouveau/core/subdev/bios/vmap.c
@@ -87,14 +87,25 @@ nvbios_vmap_entry_parse(struct nouveau_bios *bios, int idx, u8 *ver, u8 *len,
u16 vmap = nvbios_vmap_entry(bios, idx, ver, len);
memset(info, 0x00, sizeof(*info));
switch (!!vmap * *ver) {
- case 0x10:
+ case 0x10: {
+ s32 accum, b, c;
+
info->link = 0xff;
info->min = nv_ro32(bios, vmap + 0x00);
info->max = nv_ro32(bios, vmap + 0x04);
- info->arg[0] = nv_ro32(bios, vmap + 0x08);
- info->arg[1] = nv_ro32(bios, vmap + 0x0c);
- info->arg[2] = nv_ro32(bios, vmap + 0x10);
+
+ accum = nv_ro...
2003 May 20
2
mdct_backward with fused muladd?
Can anybody point me at any resources that would explain how to optimize
mdct_backward for a cpu with a fused multiply-accumulate unit?
From what I understand from responses to my older postings, Tremor's
mdct_backward could be rewritten to take advantage of a muladd.
My target machine can do either two-wide 32x32 + Accum(64) -> Accum(64)
integer muladd or eight-wide 16x16 + Accum(32) -> Accum(32) integer m...
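As a rough illustration of the 32x32 + Accum(64) -> Accum(64) form mentioned above, the scalar C below widens each product to 64 bits before accumulating. It is a generic sketch, not Tremor code, and `mac32_acc64` is a hypothetical name; whether a compiler maps it onto a hardware muladd depends on the target.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: multiply-accumulate of 32-bit inputs into a
 * 64-bit accumulator. Each product is computed in 64 bits, so a single
 * product cannot overflow. */
int64_t mac32_acc64(const int32_t *x, const int32_t *y, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += (int64_t)x[i] * (int64_t)y[i];
    }
    return acc;
}
```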
2008 May 14
6
PWGL in wine, problems
Hello,
I'm new on this list. First of all, thank you to all the developers of this
great project!
At the moment there is only one application that keeps me on both macOS and
Windows: PWGL, a free environment for computer-assisted
composition in OpenGL. (http://www2.siba.fi/PWGL/)
I'm running Ubuntu 8.04 and wine 0.9.59.
I have to say that I also installed vcrun2005 and
2011 Sep 01
0
[PATCH 3/5] resample: Add NEON optimized inner_product_single for fixed point
...endif
+#ifdef _USE_NEON
+#include "resample_neon.h"
+#endif
+
/* Numer of elements to allocate on the stack */
#ifdef VAR_ARRAYS
#define FIXED_STACK_ALLOC 8192
@@ -360,11 +364,12 @@ static int resampler_basic_direct_single(SpeexResamplerState *st, spx_uint32_t c
}
sum = accum[0] + accum[1] + accum[2] + accum[3];
*/
+ sum = SATURATE32PSHR(sum, 15, 32767);
#else
sum = inner_product_single(sinc, iptr, N);
#endif
- out[out_stride * out_sample++] = SATURATE32(PSHR32(sum, 15), 32767);
+ out[out_stride * out_sample++] = sum;
last_sample += int...
2004 Aug 18
1
Hangups - SIGFPE in dsp.c
Hi,
I'm running the latest CVS HEAD version of asterisk, and I'm experiencing
hangups during voice conversation. This happens quite regularly and
often.
The problem is in dsp.c, line 1235, where it says
accum /= len;
But `len', at this point, is 0, resulting in a SIGFPE. The routine
ast_frame *i4l_read() in channels/chan_modem_i4l.c:411 is
setting p->fr.datalen to p->obuflen which is zero.
Has anybody noticed this, too?
Since I don't know the code, I cannot suggest a fix, but maybe so...
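The failure described here is an integer division by a zero length. A minimal sketch of a guard follows; the function `mean_abs_level` is hypothetical and is not the Asterisk routine, and returning 0 for an empty frame is an assumption about what the caller would want.

```c
#include <stdlib.h>

/* Hypothetical helper (not from dsp.c): mean absolute sample value.
 * Returns 0 for an empty frame instead of dividing by zero. */
int mean_abs_level(const short *s, int len)
{
    int accum = 0;

    if (len <= 0) {
        return 0; /* nothing to average; avoids SIGFPE on accum /= len */
    }
    for (int x = 0; x < len; x++) {
        accum += abs(s[x]);
    }
    accum /= len;
    return accum;
}
```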
2018 Apr 03
4
SCEV and LoopStrengthReduction Formulae
I am attempting to implement a minor loop strength reduction optimization for
targets that support compare and jump fusion, specifically
TTI::canMacroFuseCmp(). My approach might be wrong; however, I am soliciting
the idea for feedback, so that I can implement this correctly. My plan is to
add a Supplemental LSR formula to LoopStrengthReduce.cpp that optimizes the
following case, but perhaps
2010 Jan 20
2
Plot frame border to start at zero?
Hello,
I am creating plots of hourly precipitation and accumulated
precipitation (on different axes, see attached image). I was wondering
how I can make the plot frame (black border) start at zero; it looks
like it is plotted below zero.
The code I use to create the png files is below:
CairoPNG(PNG_file,width=1000, height=600, pointsize=14, bg="...
2011 May 20
1
[LLVMdev] subregisters, def-kill
...g16511:hi16<def> = COPY %reg16473:lo16<kill>, %reg16511<imp-def>;
844L %reg16511:lo16<def> = COPY %reg16478:lo16<kill>;
852L %r4<def,dead> = st_postMod %reg16511<kill>, %r4
...
844L %reg16511:lo16<def> = COPY %reg16478:lo16<kill>; Accum:%reg16511,16478
Considering merging %reg16478 with reg%16511 to Accum
RHS = %reg16478,0.000000e+00 = [804d,844d:0) 0 at 804d
LHS = %reg16511,0.000000e+00 = [836d,844d:1)[844d,852d:0) 0 at 844d 1 at 836d
Interference!
It seems that there is a Live range from the...
2013 Dec 18
0
[LLVMdev] LLVM ARM VMLA instruction
...would be because it follows what C tells us a compiler has to do
by default but provides overrides in either direction if you know what
you're doing.
The key point is that LLVM (currently) has no notion of statement
boundaries, so it would fuse the operations in this function:
float foo(float accum, float lhs, float rhs) {
  float product = lhs * rhs;
  return accum + product;
}
This isn't allowed even under FP_CONTRACT=on (the multiply and add do
not occur within a single expression), so LLVM can't in good
conscience enable these optimisations by default.
Cheers.
Tim.
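To make the distinction concrete, the sketch below puts the two source forms side by side. Both function names are hypothetical. Under the C rules described in the message, only the first form (multiply and add inside one expression) may be contracted into a fused multiply-add when FP_CONTRACT is on; the second stores the product in a separate statement first.

```c
/* Single expression: a compiler honouring FP_CONTRACT=on is allowed
 * to fuse the multiply and the add here. */
float mla_single_expr(float accum, float lhs, float rhs)
{
    return accum + lhs * rhs;
}

/* Two statements: the product is a separate statement, so contraction
 * across the statement boundary is not permitted by FP_CONTRACT=on. */
float mla_two_stmts(float accum, float lhs, float rhs)
{
    float product = lhs * rhs;
    return accum + product;
}
```

For inputs whose product is exactly representable the two forms agree; they can differ in the last bit when the intermediate product would otherwise be rounded.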
2012 Apr 05
1
[PATCH] remove unnecesary typedef in bitwriter.c
..._WORDS_TO_BITS(words) ((words) * FLAC__BITS_PER_WORD)
#define FLAC__TOTAL_BITS(bw) (FLAC__WORDS_TO_BITS((bw)->words) + (bw)->bits)
@@ -85,8 +84,8 @@ static const unsigned FLAC__BITWRITER_DEFAULT_INCREMENT = 4096u / sizeof(bwword)
#endif
struct FLAC__BitWriter {
- bwword *buffer;
- bwword accum; /* accumulator; bits are right-justified; when full, accum is appended to buffer */
+ uint32_t *buffer;
+ uint32_t accum; /* accumulator; bits are right-justified; when full, accum is appended to buffer */
unsigned capacity; /* capacity of buffer in words */
unsigned words; /* # of complete wo...
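The comment on `accum` describes a right-justified bit accumulator that is appended to the buffer when full. A simplified sketch of that idea is below. It is not the libFLAC implementation: the struct, the fixed 16-word buffer, and `bw_write_bits` are all hypothetical, and byte-order handling is left out.

```c
#include <stdint.h>

#define BW_BITS_PER_WORD 32u
#define BW_CAPACITY 16u /* hypothetical fixed capacity, in words */

struct bit_writer {
    uint32_t buffer[BW_CAPACITY];
    uint32_t accum;  /* pending bits, right-justified */
    unsigned words;  /* number of complete words in buffer */
    unsigned bits;   /* number of bits currently held in accum */
};

/* Append the low nbits (1..32) of val. Returns 1 on success, 0 on a
 * bad bit count or when the buffer is full. */
int bw_write_bits(struct bit_writer *bw, uint32_t val, unsigned nbits)
{
    unsigned left, rest;
    uint32_t hi;

    if (nbits == 0 || nbits > 32) return 0;
    if (nbits < 32) val &= (1u << nbits) - 1u;

    left = BW_BITS_PER_WORD - bw->bits; /* free bits in accum */
    if (nbits < left) {
        bw->accum = (bw->accum << nbits) | val;
        bw->bits += nbits;
        return 1;
    }
    if (bw->words >= BW_CAPACITY) return 0;

    /* accum becomes full: top `left` bits of val complete the word. */
    rest = nbits - left;
    hi = (left == 32) ? 0 : (bw->accum << left);
    bw->buffer[bw->words++] = hi | (val >> rest);
    bw->accum = rest ? (val & ((1u << rest) - 1u)) : 0;
    bw->bits = rest;
    return 1;
}
```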
2013 Dec 19
2
[LLVMdev] LLVM ARM VMLA instruction
...C tells us a compiler has to do
> by default but provides overrides in either direction if you know what
> you're doing.
>
> The key point is that LLVM (currently) has no notion of statement
> boundaries, so it would fuse the operations in this function:
>
> float foo(float accum, float lhs, float rhs) {
> float product = lhs * rhs;
> return accum + product;
> }
>
> This isn't allowed even under FP_CONTRACT=on (the multiply and add do
> not occur within a single expression), so LLVM can't in good
> conscience enable these optimisations by de...
2004 Aug 19
2
Floating point exception help
...200
> @@ -1229,6 +1229,13 @@
> int x;
> int res = 0;
>
> + /*PM BEGIN*/
> + if (len==0) {
> + ast_log(LOG_WARNING, "zero length packet\n");
> + return 0;
> + }
> + /*PM END*/
> +
> accum = 0;
> for (x=0;x<len; x++)
> accum += abs(s[x]);
>
>
> _______________________________________________
> Asterisk-Users mailing list
> Asterisk-Users@lists.digium.com
> http://lists.digium.com/mailman/listinfo/asterisk-users
> To UNSUBSCRIBE or u...
2013 Dec 18
2
[LLVMdev] LLVM ARM VMLA instruction
> "-ffp-contract=fast" is needed
Correct - clang is different than gcc, icc, msvc, xlc, etc. on this. Still
haven't seen any explanation for how this is better though...
http://llvm.org/bugs/show_bug.cgi?id=17188
http://llvm.org/bugs/show_bug.cgi?id=17211
On Wed, Dec 18, 2013 at 6:02 AM, Tim Northover <t.p.northover at gmail.com> wrote:
> > I believe that's the
2011 Sep 01
6
[PATCH 0/5] ARM NEON optimization for samplerate converter
From: Jyri Sarha <jsarha at ti.com>
I optimized Speex resampler for NEON capable ARM CPUs. The first patch
should speed up resampling on any platform that can spare the
increased memory usage. It would be nice to have these merged to the
master branch. Please let me know if there is anything I can do to
help the merge. The patches have been rebased on top of master
branch in