search for: celt_assert

Displaying 20 results from an estimated 36 matches for "celt_assert".

2017 Jun 06
4
Antw: Re: celt_inner_prod() and dual_inner_prod() NEON intrinsics
...> Assuming there's no issue with the patches, next week isn't too late. >> >> Also, I've started looking at your patches. So far there's one thing >> that puzzles me a bit. In the OPUS_CHECK_ASM section of patch 0004, you >> have: >> >> + celt_assert(ABS32(xy1_c - *xy1) <= VERY_SMALL); >> >> Given the normal range of the values (the xy values are often much >> larger than one) and the precision involved here (24-bit mantissa), it >> seems like this test can only succeed if the two values are actually >> equal. I...
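The point quoted in this snippet, that the tolerance test can only pass when both values are equal, can be checked with a small sketch. It assumes VERY_SMALL is 1e-30f, as in the float build of Opus's arch.h; `VERY_SMALL_SKETCH`, `abs_f`, `within_very_small` and `next_after_pow2` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <float.h>

/* Assumption: stands in for VERY_SMALL in a float build (taken to be 1e-30f). */
#define VERY_SMALL_SKETCH 1e-30f

/* Plain absolute value, so the sketch does not depend on libm. */
static float abs_f(float x)
{
    return x < 0 ? -x : x;
}

/* Hypothetical helper: the tolerance test from the quoted patch. */
static int within_very_small(float a, float b)
{
    return abs_f(a - b) <= VERY_SMALL_SKETCH;
}

/* Next representable float above a power of two p.
 * 1 + FLT_EPSILON is the float just above 1, and scaling by a
 * power of two is exact, so the product is the neighbour of p. */
static float next_after_pow2(float p)
{
    return p * (1.0f + FLT_EPSILON);
}
```

With a 24-bit mantissa, the gap between 1024.0f and its nearest larger neighbour is about 1.2e-4, far above 1e-30, so `within_very_small(1024.0f, next_after_pow2(1024.0f))` is false while `within_very_small(1024.0f, 1024.0f)` is true. For values of that size the test behaves like an equality check.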
2017 Jun 06
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Thanks, Ulrich! Yes, using celt_assert(1.0 + celt_inner_prod_neon_float_c_simulation(x, y, N) == 1.0 + xy); celt_assert(1.0 + xy1_c == 1.0 + *xy1); celt_assert(1.0 + xy2_c == 1.0 + *xy2); can avoid the use of VERY_SMALL. Hi Jean-Marc, I added { const opus_val32 xy_c = celt_inner_prod_neon_float_c_simulat...
2014 Nov 28
2
[RFC PATCHv1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...oat32x4_t XX[4]; > + float32x2_t XX_2; > + float32x4_t SUMM[4]; > + float *xi = x; > + float *yi = y; > + int cd = len/4; > + int cr = len%4; len is signed, so / and % are NOT equivalent to the corresponding >> and & (they are much slower). > + int j; > + > + celt_assert(len>=3); > + > + /* Initialize sums to 0 */ > + SUMM[0] = vdupq_n_f32(0); > + SUMM[1] = vdupq_n_f32(0); > + SUMM[2] = vdupq_n_f32(0); > + SUMM[3] = vdupq_n_f32(0); > + > + YY[0] = vld1q_f32(yi); > + > + /* Each loop consumes 8 floats in y vector > + * and 4 floa...
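The review comment in this snippet says that, for a signed `len`, `/` and `%` are not equivalent to `>>` and `&`. A minimal sketch of why: signed division in C (C99 and later) truncates toward zero, so the compiler has to add a correction for negative inputs instead of emitting a bare shift or mask. The function names below are hypothetical, and the mask result assumes a two's complement representation.

```c
#include <assert.h>

/* Signed division and remainder: truncate toward zero (C99 and later). */
static int div4(int len) { return len / 4; }
static int mod4(int len) { return len % 4; }

/* Bit mask on a signed value: differs from % for negative inputs
 * (result shown in the note below assumes two's complement). */
static int mask4(int len) { return len & 3; }

/* Unsigned versions: here / and % really are a shift and a mask. */
static unsigned udiv4(unsigned len) { return len >> 2; }
static unsigned umod4(unsigned len) { return len & 3u; }
```

For `len = -7`, `div4` gives -1 and `mod4` gives -3, while `mask4` gives 1. For non-negative values the two forms agree, which is why declaring the length unsigned (or using the shift and mask explicitly when the value is known to be non-negative) avoids the extra fix-up code.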
2008 Dec 21
0
[PATCH] Fix ectest to not check a case which isn't guaranteed to work, and which we don't use.
...; +#include "arch.h" void ec_byte_readinit(ec_byte_buffer *_b,unsigned char *_buf,long _bytes){ @@ -106,6 +107,8 @@ ec_uint32 ec_dec_uint(ec_dec *_this,ec_uint32 _ft){ unsigned s; int ftb; t=0; + /*In order to optimize EC_ILOG(), it is undefined for the value 0.*/ + celt_assert(_ft>1); _ft--; ftb=EC_ILOG(_ft); if(ftb>EC_UNIT_BITS){ diff --git a/libcelt/entenc.c b/libcelt/entenc.c index 3da351e..d0cbb0c 100644 --- a/libcelt/entenc.c +++ b/libcelt/entenc.c @@ -100,8 +100,10 @@ void ec_enc_uint(ec_enc *_this,ec_uint32 _fl,ec_uint32 _ft){ unsigned ft; un...
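The hunk in this snippet guards `ec_dec_uint()` with `celt_assert(_ft>1)` because `_ft` is decremented before being passed to `EC_ILOG()`, which is undefined for 0. The sketch below illustrates that ordering. `ilog_nonzero` is a hypothetical portable stand-in for `EC_ILOG()`, assumed here to return the number of bits needed to represent its argument, and `bits_for_range` is a hypothetical caller shaped like the quoted code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for EC_ILOG(): bit length of v.
 * Like the optimized macro, it is only meant to be called with v > 0. */
static int ilog_nonzero(uint32_t v)
{
    int bits = 0;
    assert(v > 0);
    while (v) {
        bits++;
        v >>= 1;
    }
    return bits;
}

/* Hypothetical caller following the order of the quoted hunk:
 * check ft > 1 first, so that ft - 1 is never 0 when it reaches the log. */
static int bits_for_range(uint32_t ft)
{
    assert(ft > 1);
    ft--;
    return ilog_nonzero(ft);
}
```

Under these assumptions, `bits_for_range(2)` is 1 and `bits_for_range(256)` is 8, while a call with `ft == 1` trips the assertion before the undefined case is reached.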
2014 Dec 01
0
[RFC PATCHv1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...>> + float *xi = x; >> + float *yi = y; >> + int cd = len/4; >> + int cr = len%4; > > len is signed, so / and % are NOT equivalent to the corresponding >> and > & (they are much slower). > >> + int j; >> + >> + celt_assert(len>=3); >> + >> + /* Initialize sums to 0 */ >> + SUMM[0] = vdupq_n_f32(0); >> + SUMM[1] = vdupq_n_f32(0); >> + SUMM[2] = vdupq_n_f32(0); >> + SUMM[3] = vdupq_n_f32(0); >> + >> + YY[0] = vld1q_f32(yi); >> + >> +...
2015 May 15
0
[RFC V3 5/8] aarch64: celt_pitch_xcorr: Fixed point intrinsics
...void xcorr_kernel_neon_fixed(const int16_t *x, const int16_t *y, + int32_t sum[4], int len) { + int16x8_t YY[3]; + int16x4_t YEXT[3]; + int16x8_t XX[2]; + int16x4_t XX_2, YY_2; + int32x4_t SUMM; + const int16_t *xi = x; + const int16_t *yi = y; + + celt_assert(len>4); + + YY[0] = vld1q_s16(yi); + YY_2 = vget_low_s16(YY[0]); + + SUMM = vdupq_n_s32(0); + + /* Consume 16 elements in x vector and 20 elements in y + * vector. However, the y[19] and beyond dont get accessed + * So, if len == 16, then we must only access y[0] to y[18] + * So...
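The comment in this snippet states that with len == 16 only y[0] to y[18] may be accessed. A scalar sketch of a four-lag cross-correlation kernel makes that bound visible: the highest index read is `y[len - 1 + 3]`. `xcorr_kernel_ref` is a hypothetical reference written for illustration, not the code from the patch, and it overwrites `sum` rather than accumulating into it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference: sum[k] = x[0]*y[k] + ... + x[len-1]*y[len-1+k]
 * for k = 0..3. Reads x[0..len-1] and y[0..len+2], nothing beyond. */
static void xcorr_kernel_ref(const int16_t *x, const int16_t *y,
                             int32_t sum[4], int len)
{
    int j, k;
    for (k = 0; k < 4; k++) {
        sum[k] = 0;
    }
    for (j = 0; j < len; j++) {
        for (k = 0; k < 4; k++) {
            /* Widen to 32 bits before multiplying. */
            sum[k] += (int32_t)x[j] * (int32_t)y[j + k];
        }
    }
}
```

A vectorized version that loads y in blocks of eight therefore has to treat the tail carefully: for len == 16 a 19-element y buffer is enough for this reference, so a wider load at the end could read past it.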
2015 May 08
0
[[RFC PATCH v2]: Ne10 fft fixed and previous 5/8] aarch64: celt_pitch_xcorr: Fixed point intrinsics
...void xcorr_kernel_neon_fixed(const int16_t *x, const int16_t *y, + int32_t sum[4], int len) { + int16x8_t YY[3]; + int16x4_t YEXT[3]; + int16x8_t XX[2]; + int16x4_t XX_2, YY_2; + int32x4_t SUMM; + const int16_t *xi = x; + const int16_t *yi = y; + + celt_assert(len>4); + + YY[0] = vld1q_s16(yi); + YY_2 = vget_low_s16(YY[0]); + + SUMM = vdupq_n_s32(0); + + /* Consume 16 elements in x vector and 20 elements in y + * vector. However, the y[19] and beyond dont get accessed + * So, if len == 16, then we must only access y[0] to y[18] + * So...
2014 Nov 21
4
[RFC PATCHv1] cover: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hello, I received feedback from engineers working on NE10 [1] that it would be better to use NE10 [1] for FFT optimizations for opus use cases. However, these FFT patches are currently in review and haven't been integrated into NE10 yet. While the FFT functions in NE10 are getting baked, I wanted to optimize the celt_pitch_xcorr (floating point only) and use it to introduce ARM NEON
2017 Jun 06
3
celt_inner_prod() and dual_inner_prod() NEON intrinsics
... don't wait if it's too late for 1.2 release. Assuming there's no issue with the patches, next week isn't too late. Also, I've started looking at your patches. So far there's one thing that puzzles me a bit. In the OPUS_CHECK_ASM section of patch 0004, you have: + celt_assert(ABS32(xy1_c - *xy1) <= VERY_SMALL); Given the normal range of the values (the xy values are often much larger than one) and the precision involved here (24-bit mantissa), it seems like this test can only succeed if the two values are actually equal. Is the float patch actually bit-exact? If so,...
2014 Nov 21
0
[RFC PATCHv1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...<arm_neon.h> +#include "../arch.h" + +static void xcorr_kernel_neon_float(float *x, float *y, float sum[4], int len) { + float32x4_t YY[5]; + float32x4_t XX[4]; + float32x2_t XX_2; + float32x4_t SUMM[4]; + float *xi = x; + float *yi = y; + int cd = len/4; + int cr = len%4; + int j; + + celt_assert(len>=3); + + /* Initialize sums to 0 */ + SUMM[0] = vdupq_n_f32(0); + SUMM[1] = vdupq_n_f32(0); + SUMM[2] = vdupq_n_f32(0); + SUMM[3] = vdupq_n_f32(0); + + YY[0] = vld1q_f32(yi); + + /* Each loop consumes 8 floats in y vector + * and 4 floats in x vector + */ + for (j = 0; j < cd; j++) { +...
2014 Dec 19
2
[PATCH v1] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi, Optimizes celt_pitch_xcorr for ARM NEON floating point. Changes from RFCv3: - celt_neon_intr.c - removed warnings due to not having constant pointers - Put simpler loop to take care of corner cases. Unrolling using intrinsics was not really mapping well to what was done in celt_pitch_xcorr_arm.s - Makefile.am Removed explicit -O3 optimization - test_unit_mathops.c,
2014 Dec 19
0
[PATCH v1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...] + */ +static void xcorr_kernel_neon_float(const float *x, const float *y, + float sum[4], int len) { + float32x4_t YY[3]; + float32x4_t YEXT[3]; + float32x4_t XX[2]; + float32x2_t XX_2; + float32x4_t SUMM; + const float *xi = x; + const float *yi = y; + + celt_assert(len>0); + + YY[0] = vld1q_f32(yi); + SUMM = vdupq_n_f32(0); + + /* Consume 8 elements in x vector and 12 elements in y + * vector. However, the 12'th element never really gets + * touched in this loop. So, if len == 8, then we only + * must access y[0] to y[10]. y[11] must not...
2014 Dec 07
0
[RFC PATCH v2] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...tores them in sum[4] + */ +void xcorr_kernel_neon_float(const float *x, const float *y, + float sum[4], int len) { + float32x4_t YY[3]; + float32x4_t YEXT[3]; + float32x4_t XX[2]; + float32x2_t XX_2; + float32x4_t SUMM; + float *xi = x; + float *yi = y; + + celt_assert(len>0); + + YY[0] = vld1q_f32(yi); + SUMM = vdupq_n_f32(0); + + /* Consume 8 elements in x vector and 12 elements in y + * vector. However, the 12'th element never really gets + * touched in this loop. So, if len == 8, then we only + * must access y[0] to y[10]. y[11] must not...
2014 Dec 10
0
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...hem in sum[4] + */ +static void xcorr_kernel_neon_float(const float *x, const float *y, + float sum[4], int len) { + float32x4_t YY[3]; + float32x4_t YEXT[3]; + float32x4_t XX[2]; + float32x2_t XX_2; + float32x4_t SUMM; + float *xi = x; + float *yi = y; + + celt_assert(len>0); + + YY[0] = vld1q_f32(yi); + SUMM = vdupq_n_f32(0); + + /* Consume 8 elements in x vector and 12 elements in y + * vector. However, the 12'th element never really gets + * touched in this loop. So, if len == 8, then we only + * must access y[0] to y[10]. y[11] must not...
2014 Dec 10
2
[RFC PATCH v3] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi, Optimizes celt_pitch_xcorr for floating point. Changes from RFCv2: - Changes recommended by Timothy for celt_neon_intr.c everything except, left the unrolled loop still unrolled - configure.ac - use AC_LINK_IFELSE instead of AC_COMPILE_IFELSE - Moved compile flags into Makefile.am - OPUS_ARM_NEON_INR --> typo --> OPUS_ARM_NEON_INTR Viswanath Puttagunta (1): armv7:
2014 Dec 07
3
[RFC PATCH v2] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
From: Viswanath Puttagunta <viswanath.puttagunta at linaro.org> Hi, Optimizes celt_pitch_xcorr for floating point. Changes from RFCv1: - Rebased on top of commit aad281878: Fix celt_pitch_xcorr_c signature. which got rid of ugly code around CELT_PITCH_XCORR_IMPL passing of "arch" parameter. - Unified with --enable-intrinsics used by x86 - Modified algorithm to be more
2014 Dec 19
2
[PATCH v1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...float *x, const float *y, > + float sum[4], int len) { > + float32x4_t YY[3]; > + float32x4_t YEXT[3]; > + float32x4_t XX[2]; > + float32x2_t XX_2; > + float32x4_t SUMM; > + const float *xi = x; > + const float *yi = y; > + > + celt_assert(len>0); > + > + YY[0] = vld1q_f32(yi); > + SUMM = vdupq_n_f32(0); > + > + /* Consume 8 elements in x vector and 12 elements in y > + * vector. However, the 12'th element never really gets > + * touched in this loop. So, if len == 8, then we only > + * m...
2014 Dec 07
2
[RFC PATCH v2] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi, Optimizes celt_pitch_xcorr for floating point. Changes from RFCv1: - Rebased on top of commit aad281878: Fix celt_pitch_xcorr_c signature. which got rid of ugly code around CELT_PITCH_XCORR_IMPL passing of "arch" parameter. - Unified with --enable-intrinsics used by x86 - Modified algorithm to be more in-line with algorithm in celt_pitch_xcorr_arm.s Viswanath Puttagunta
2008 Dec 20
5
ectest failed with gcc-4.2.4
Hi, when compiling the latest release 0.5.1 (as well as from git) with gcc-4.2.4 on Zenwalk (Slackware current), ectest fails; with gcc-3.4.6 all tests succeed.
2015 Dec 23
6
[AArch64 neon intrinsics v4 0/5] Rework Neon intrinsic code for Aarch64 patchset
Following Tim's comments, here are my reworked patches for the Neon intrinsic function patches of my Aarch64 patchset, i.e. replacing patches 5-8 of the v2 series. Patches 1-4 and 9-18 of the old series still apply unmodified. The one new (as opposed to changed) patch is the first one in this series, to add named constants for the ARM architecture variants. There are also some minor code