search for: inner_product_single

Displaying 7 results from an estimated 7 matches for "inner_product_single".

2011 Sep 01
0
[PATCH 5/5] resample: Add NEON optimized inner_product_single for floating point
..."vqmovn.s32 d0, q0\n" + "vmov.s16 %[ret], d0[0]\n" + : [ret] "=&r" (ret) + : [a] "r" (a) + : "q0"); + return ret; +} +#endif +#undef WORD2INT +#define WORD2INT(x) (saturate_32bit_to_16bit(x)) + #define OVERRIDE_INNER_PRODUCT_SINGLE /* Only works when len % 4 == 0 */ static inline int32_t inner_product_single(const int16_t *a, const int16_t *b, unsigned int len) @@ -97,4 +121,81 @@ static inline int32_t inner_product_single(const int16_t *a, const int16_t *b, u return ret; } +#elif defined(FLOATING_POINT) + +static i...
2009 Oct 26
1
[PATCH] Fix miscompile of SSE resampler
...eex/resample.c b/libspeex/resample.c index 7b5a308..8131380 100644 --- a/libspeex/resample.c +++ b/libspeex/resample.c @@ -361,7 +361,7 @@ static int resampler_basic_direct_single(SpeexResamplerState *st, spx_uint32_t c sum = accum[0] + accum[1] + accum[2] + accum[3]; */ #else - sum = inner_product_single(sinc, iptr, N); + inner_product_single(&sum, sinc, iptr, N); #endif out[out_stride * out_sample++] = SATURATE32(PSHR32(sum, 15), 32767); @@ -412,7 +412,7 @@ static int resampler_basic_direct_double(SpeexResamplerState *st, spx_uint32_t c } sum = accum[0] + accum[1]...
2011 Sep 01
0
[PATCH 3/5] resample: Add NEON optimized inner_product_single for fixed point
From: Jyri Sarha <jsarha at ti.com> Semantics of inner_product_single have also been changed to contain the final right shift and saturation so it can also be implemented in the optimal way for the used platform. This change affects fixed point calculations only. I also added a new fixed point macro SATURATE32PSHR(x, shift, a). It does pretty much the same thing as...
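For readers unfamiliar with the change described in this snippet, the following is a minimal scalar sketch of a fixed-point inner product that folds the final rounding right shift and saturation into the function itself. It is not the code from the patch: the names saturate32_pshr and inner_product_fixed_sketch are hypothetical, and the shift of 15 and limit of 32767 are assumptions borrowed from the SATURATE32(PSHR32(sum, 15), 32767) call visible in the 2009 resample.c snippet above.

```c
#include <stdint.h>

/* Hypothetical helper (not the patch's macro): round-shift x right by
 * `shift` bits, then clamp the result to [-a, a].
 * Assumes shift >= 1 and that >> on a negative value is an arithmetic
 * shift, which holds on common compilers but is implementation-defined. */
static int32_t saturate32_pshr(int64_t x, int shift, int32_t a)
{
    int64_t r = (x + ((int64_t)1 << (shift - 1))) >> shift;
    if (r > a)
        return a;
    if (r < -a)
        return -a;
    return (int32_t)r;
}

/* Hypothetical scalar reference: 16-bit dot product accumulated in
 * 64 bits so the sum cannot overflow, with the final shift and
 * saturation applied before returning. */
static int32_t inner_product_fixed_sketch(const int16_t *a, const int16_t *b,
                                          unsigned int len)
{
    int64_t sum = 0;
    unsigned int i;
    for (i = 0; i < len; i++)
        sum += (int32_t)a[i] * b[i];
    return saturate32_pshr(sum, 15, 32767);
}
```

Keeping the shift and clamp inside the function is what lets a platform-specific version, such as a NEON one, perform them with its own saturating instructions instead of leaving them to generic code at the call site.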
2011 Sep 01
6
[PATCH 0/5] ARM NEON optimization for samplerate converter
...r branch in http://git.xiph.org/speex.git and the relevant parts have been tested on ARM and x86 systems. Cheers, Jyri Jyri Sarha (5): resample: Calculate full sinc table (e.g. no sinc interpolation) configure.ac: Add --enable-resample-full-sinc-table conf flag resample: Add NEON optimized inner_product_single for fixed point configure.ac: Add ARM NEON support resample: Add NEON optimized inner_product_single for floating point configure.ac | 35 ++++++++ libspeex/arch.h | 1 + libspeex/fixed_generic.h | 4 + libspeex/resample.c | 14 +++- libspeex/resample_neon....
2008 Apr 04
1
Resampler experimental speedups
Hello :) The attached patch (which is not in any way finished) optimizes the resampler. (For those following the discussions on IRC; this version includes optimizations for both direct and interpolate cases). Using GCC 4.3, x86_64, Valgrind to measure instruction counts, resampling 10 frames of 320 floats at quality 3. Direct was measured with a 16=>48 resampling, and interpolate with a
2008 May 03
2
Resampler (no api)
...OWEVER CAUSED AND ON ANY THEORY OF + LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +*/ + +#include <xmmintrin.h> + +#define OVERRIDE_INNER_PRODUCT_SINGLE +static inline float inner_product_single(const float *a, const float *b, unsigned int len) +{ + int i; + float ret; + __m128 sum = _mm_setzero_ps(); + for (i=0;i<len;i+=8) + { + sum = _mm_add_ps(sum, _mm_mul_ps(_mm_loadu_ps(a+i), _mm_loadu_ps(b+i))); + sum = _mm_add_ps(sum,...
2008 May 03
0
Resampler, memory only variant
...OWEVER CAUSED AND ON ANY THEORY OF + LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +*/ + +#include <xmmintrin.h> + +#define OVERRIDE_INNER_PRODUCT_SINGLE +static inline float inner_product_single(const float *a, const float *b, unsigned int len) +{ + int i; + float ret; + __m128 sum = _mm_setzero_ps(); + for (i=0;i<len;i+=8) + { + sum = _mm_add_ps(sum, _mm_mul_ps(_mm_loadu_ps(a+i), _mm_loadu_ps(b+i))); + sum = _mm_add_ps(sum,...
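The SSE snippets in the two results above are cut off mid-loop, so the following is only a portable C sketch of the same idea, not a completion of that code. It assumes the loop consumes eight floats per iteration into four running lanes (as the i+=8 stride and the two _mm_add_ps lines suggest) and sums the lanes at the end. The name inner_product_lanes_sketch and the scalar tail loop are additions for illustration.

```c
/* Hypothetical portable sketch of a four-lane float inner product.
 * lane[0..3] stand in for the four floats of an SSE register. */
static float inner_product_lanes_sketch(const float *a, const float *b,
                                        unsigned int len)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    float sum;
    unsigned int i, k;

    /* Main loop: two groups of four multiply-adds per iteration. */
    for (i = 0; i + 8 <= len; i += 8) {
        for (k = 0; k < 4; k++) {
            lane[k] += a[i + k] * b[i + k];
            lane[k] += a[i + 4 + k] * b[i + 4 + k];
        }
    }

    /* Horizontal add of the four lanes. */
    sum = lane[0] + lane[1] + lane[2] + lane[3];

    /* Scalar tail for lengths that are not a multiple of 8
     * (an addition in this sketch). */
    for (; i < len; i++)
        sum += a[i] * b[i];

    return sum;
}
```

Because the lanes are summed in a different order than a plain left-to-right loop, results can differ from a scalar version in the last bits for general floating-point input.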