Displaying 5 results from an estimated 5 matches for "interpolate_product_single".
2009 Oct 26
1
[PATCH] Fix miscompile of SSE resampler
...erpolate_single(SpeexResamplerState *st, spx_uint3
sum = MULT16_32_Q15(interp[0],SHR32(accum[0], 1)) + MULT16_32_Q15(interp[1],SHR32(accum[1], 1)) + MULT16_32_Q15(interp[2],SHR32(accum[2], 1)) + MULT16_32_Q15(interp[3],SHR32(accum[3], 1));
#else
cubic_coef(frac, interp);
- sum = interpolate_product_single(iptr, st->sinc_table + st->oversample + 4 - offset - 2, N, st->oversample, interp);
+ interpolate_product_single(&sum, iptr, st->sinc_table + st->oversample + 4 - offset - 2, N, st->oversample, interp);
#endif
out[out_stride * out_sample++] = SATURATE32(PS...
2011 Sep 01
0
[PATCH 3/5] resample: Add NEON optimized inner_product_single for fixed point
...erp);
sum = MULT16_32_Q15(interp[0],SHR32(accum[0], 1)) + MULT16_32_Q15(interp[1],SHR32(accum[1], 1)) + MULT16_32_Q15(interp[2],SHR32(accum[2], 1)) + MULT16_32_Q15(interp[3],SHR32(accum[3], 1));
+ sum = SATURATE32PSHR(sum, 15, 32767);
#else
cubic_coef(frac, interp);
sum = interpolate_product_single(iptr, st->sinc_table + st->oversample + 4 - offset - 2, N, st->oversample, interp);
#endif
- out[out_stride * out_sample++] = SATURATE32(PSHR32(sum, 14), 32767);
+ out[out_stride * out_sample++] = sum;
last_sample += int_advance;
samp_frac_num += frac_advan...
2008 May 03
2
Resampler (no api)
...a+i), _mm_loadu_ps(b+i)));
+ sum = _mm_add_ps(sum, _mm_mul_ps(_mm_loadu_ps(a+i+4), _mm_loadu_ps(b+i+4)));
+ }
+ sum = _mm_add_ps(sum, _mm_movehl_ps(sum, sum));
+ sum = _mm_add_ss(sum, _mm_shuffle_ps(sum, sum, 0x55));
+ _mm_store_ss(&ret, sum);
+ return ret;
+}
+
+#define OVERRIDE_INTERPOLATE_PRODUCT_SINGLE
+static inline float interpolate_product_single(const float *a, const float *b, unsigned int len, const spx_uint32_t oversample, float *frac) {
+ int i;
+ float ret;
+ __m128 sum = _mm_setzero_ps();
+ __m128 f = _mm_loadu_ps(frac);
+ for(i=0;i<len;i+=2)
+ {
+ sum = _mm_add_ps(sum, _mm_m...
2008 May 03
0
Resampler, memory only variant
...a+i), _mm_loadu_ps(b+i)));
+ sum = _mm_add_ps(sum, _mm_mul_ps(_mm_loadu_ps(a+i+4), _mm_loadu_ps(b+i+4)));
+ }
+ sum = _mm_add_ps(sum, _mm_movehl_ps(sum, sum));
+ sum = _mm_add_ss(sum, _mm_shuffle_ps(sum, sum, 0x55));
+ _mm_store_ss(&ret, sum);
+ return ret;
+}
+
+#define OVERRIDE_INTERPOLATE_PRODUCT_SINGLE
+static inline float interpolate_product_single(const float *a, const float *b, unsigned int len, const spx_uint32_t oversample, float *frac) {
+ int i;
+ float ret;
+ __m128 sum = _mm_setzero_ps();
+ __m128 f = _mm_loadu_ps(frac);
+ for(i=0;i<len;i+=2)
+ {
+ sum = _mm_add_ps(sum, _mm_m...
2011 Sep 01
6
[PATCH 0/5] ARM NEON optimization for samplerate converter
From: Jyri Sarha <jsarha at ti.com>
I optimized Speex resampler for NEON capable ARM CPUs. The first patch
should speed up resampling on any platform that can spare the
increased memory usage. It would be nice to have these merged to the
master branch. Please let me know if there is anything I can do to
help the the merge. The patches have been rebased on top of master
branch in