search for: override_inner_product_singl

Displaying 8 results from an estimated 8 matches for "override_inner_product_singl".

2011 Sep 01
0
[PATCH 5/5] resample: Add NEON optimized inner_product_single for floating point
..."vqmovn.s32 d0, q0\n" + "vmov.s16 %[ret], d0[0]\n" + : [ret] "=&r" (ret) + : [a] "r" (a) + : "q0"); + return ret; +} +#endif +#undef WORD2INT +#define WORD2INT(x) (saturate_32bit_to_16bit(x)) + #define OVERRIDE_INNER_PRODUCT_SINGLE /* Only works when len % 4 == 0 */ static inline int32_t inner_product_single(const int16_t *a, const int16_t *b, unsigned int len) @@ -97,4 +121,81 @@ static inline int32_t inner_product_single(const int16_t *a, const int16_t *b, u return ret; } +#elif defined(FLOATING_POINT) + +static...
2009 Jun 14
1
Resampler saturation, blackfin performance
...being that resample.patch converts the "unrolled by four" loop into a plain one that's easier on DSPs, right? Yes exactly, plus a little explanation in comments. I really have no idea of the performance difference on x86. But I think gcc/msvc can unroll. Up to you. Anyway I can OVERRIDE_INNER_PRODUCT_SINGLE. Talking about performance (still using generic version with VDSP compiler): 1. I got a pretty good boost by using a scratch buffer in SRAM. 2. Wideband Encode+Decode takes 79.1 + 7.2 MIPS on my BF536 400/133 MHz 3. Profiler says: vq_nbest 33.05% vq_nbest_sign 11.12%...
2011 Sep 01
6
[PATCH 0/5] ARM NEON optimization for samplerate converter
From: Jyri Sarha <jsarha at ti.com> I optimized Speex resampler for NEON capable ARM CPUs. The first patch should speed up resampling on any platform that can spare the increased memory usage. It would be nice to have these merged to the master branch. Please let me know if there is anything I can do to help the merge. The patches have been rebased on top of master branch in
2009 Oct 26
1
[PATCH] Fix miscompile of SSE resampler
...mple, interp); #endif out[out_stride * out_sample++] = PSHR32(sum,15); diff --git a/libspeex/resample_sse.h b/libspeex/resample_sse.h index 64be8a1..86ff35e 100644 --- a/libspeex/resample_sse.h +++ b/libspeex/resample_sse.h @@ -37,10 +37,9 @@ #include <xmmintrin.h> #define OVERRIDE_INNER_PRODUCT_SINGLE -static inline float inner_product_single(const float *a, const float *b, unsigned int len) +static inline void inner_product_single(float *ret, const float *a, const float *b, unsigned int len) { int i; - float ret; __m128 sum = _mm_setzero_ps(); for (i=0;i<len;i+=8) { @@ -4...
2008 Apr 04
1
Resampler experimental speedups
Hello :) The attached patch (which is not in any way finished) optimizes the resampler. (For those following the discussions on IRC; this version includes optimizations for both direct and interpolate cases). Using GCC 4.3, x86_64, Valgrind to measure instruction counts, resampling 10 frames of 320 floats at quality 3. Direct was measured with a 16=>48 resampling, and interpolate with a
2011 Sep 01
0
[PATCH 3/5] resample: Add NEON optimized inner_product_single for fixed point
...SED AND ON ANY THEORY OF + LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +*/ + +#include <arm_neon.h> + +#ifdef FIXED_POINT +#define OVERRIDE_INNER_PRODUCT_SINGLE +/* Only works when len % 4 == 0 */ +static inline int32_t inner_product_single(const int16_t *a, const int16_t *b, unsigned int len) +{ + int32_t ret; + uint32_t remainder = len % 16; + len = len - remainder; + + asm volatile (" cmp %[len], #0\n" + " bne 1f\n"...
2008 May 03
2
Resampler (no api)
...UPTION) HOWEVER CAUSED AND ON ANY THEORY OF + LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +*/ + +#include <xmmintrin.h> + +#define OVERRIDE_INNER_PRODUCT_SINGLE +static inline float inner_product_single(const float *a, const float *b, unsigned int len) +{ + int i; + float ret; + __m128 sum = _mm_setzero_ps(); + for (i=0;i<len;i+=8) + { + sum = _mm_add_ps(sum, _mm_mul_ps(_mm_loadu_ps(a+i), _mm_loadu_ps(b+i))); + sum = _mm_add_ps(sum,...
2008 May 03
0
Resampler, memory only variant
...UPTION) HOWEVER CAUSED AND ON ANY THEORY OF + LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +*/ + +#include <xmmintrin.h> + +#define OVERRIDE_INNER_PRODUCT_SINGLE +static inline float inner_product_single(const float *a, const float *b, unsigned int len) +{ + int i; + float ret; + __m128 sum = _mm_setzero_ps(); + for (i=0;i<len;i+=8) + { + sum = _mm_add_ps(sum, _mm_mul_ps(_mm_loadu_ps(a+i), _mm_loadu_ps(b+i))); + sum = _mm_add_ps(sum,...
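
The results above all revolve around one pattern: a platform header defines OVERRIDE_INNER_PRODUCT_SINGLE and supplies its own inner_product_single(), and the generic code is used otherwise. A minimal sketch of that pattern in plain C follows. It is an illustration under stated assumptions, not the code from any of the patches: the plain multiply-accumulate stands in for the resampler's real fixed-point arithmetic, and inner_product_unrolled4 is a hypothetical name added only to contrast the "unrolled by four" form with the plain loop mentioned in the 2009 thread.

```c
#include <stdint.h>

/* A platform header (SSE, NEON, Blackfin, ...) would be included here.
 * If it defines OVERRIDE_INNER_PRODUCT_SINGLE, it is expected to provide
 * its own inner_product_single() and the fallback below is skipped. */

#ifndef OVERRIDE_INNER_PRODUCT_SINGLE
/* Generic fallback: a plain loop with no restriction on len.
 * Assumption: simplified arithmetic, no scaling or saturation. */
static inline int32_t inner_product_single(const int16_t *a, const int16_t *b,
                                           unsigned int len)
{
    int32_t sum = 0;
    unsigned int i;
    for (i = 0; i < len; i++)
        sum += (int32_t)a[i] * (int32_t)b[i];
    return sum;
}
#endif

/* Hypothetical unrolled-by-four variant, shown for comparison only.
 * Like the snippets above, it is only valid when len % 4 == 0. */
static inline int32_t inner_product_unrolled4(const int16_t *a, const int16_t *b,
                                              unsigned int len)
{
    int32_t sum = 0;
    unsigned int i;
    for (i = 0; i < len; i += 4) {
        sum += (int32_t)a[i]     * (int32_t)b[i];
        sum += (int32_t)a[i + 1] * (int32_t)b[i + 1];
        sum += (int32_t)a[i + 2] * (int32_t)b[i + 2];
        sum += (int32_t)a[i + 3] * (int32_t)b[i + 3];
    }
    return sum;
}
```

Both functions return the same value whenever len is a multiple of four; only the plain loop is safe for other lengths. Whether the unrolled form is faster depends on the compiler and target, which is the point the 2009 thread leaves open.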