Displaying 12 results from an estimated 12 matches for "yy_2".
Did you mean:
ty_2
2015 May 15
0
[RFC V3 5/8] aarch64: celt_pitch_xcorr: Fixed point intrinsics
...------------
+ * Computes 8 correlation values and stores them in sum[8]
+ */
+static void xcorr_kernel_neon_fixed(const int16_t *x, const int16_t *y,
+ int32_t sum[4], int len) {
+ int16x8_t YY[3];
+ int16x4_t YEXT[3];
+ int16x8_t XX[2];
+ int16x4_t XX_2, YY_2;
+ int32x4_t SUMM;
+ const int16_t *xi = x;
+ const int16_t *yi = y;
+
+ celt_assert(len>4);
+
+ YY[0] = vld1q_s16(yi);
+ YY_2 = vget_low_s16(YY[0]);
+
+ SUMM = vdupq_n_s32(0);
+
+ /* Consume 16 elements in x vector and 20 elements in y
+ * vector. However, the y[19] and beyon...
2015 May 08
0
[[RFC PATCH v2]: Ne10 fft fixed and previous 5/8] aarch64: celt_pitch_xcorr: Fixed point intrinsics
...------------
+ * Computes 8 correlation values and stores them in sum[8]
+ */
+static void xcorr_kernel_neon_fixed(const int16_t *x, const int16_t *y,
+ int32_t sum[4], int len) {
+ int16x8_t YY[3];
+ int16x4_t YEXT[3];
+ int16x8_t XX[2];
+ int16x4_t XX_2, YY_2;
+ int32x4_t SUMM;
+ const int16_t *xi = x;
+ const int16_t *yi = y;
+
+ celt_assert(len>4);
+
+ YY[0] = vld1q_s16(yi);
+ YY_2 = vget_low_s16(YY[0]);
+
+ SUMM = vdupq_n_s32(0);
+
+ /* Consume 16 elements in x vector and 20 elements in y
+ * vector. However, the y[19] and beyon...
2014 Dec 19
2
[PATCH v1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...tch_xcorr_arm.s, the closest I came using intrinsics
is below code.. which didn't really put much dent in the performance..
so I just left
it out since above code submitted is much simpler to read than below
celt_pitch_xcorr_arm.s.. So, I request to leave it simple to read for now.
float32x2_t YY_2;
while (len > 0) {
switch(len) {
case 4:
case 3:
XX_2 = vld1_f32(xi);
xi += 2;
YY_2 = vld1_f32(yi+4);
YY[1] = vcombine_f32(YY_2, YY_2);
SUMM = vmlaq_lane_f32(SUMM, YY[0], XX_2, 0);
YEXT[0] = vextq_f32(YY[0], YY[1], 1);...
2014 Dec 19
2
[PATCH v1] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi,
Optimizes celt_pitch_xcorr for ARM NEON floating point.
Changes from RFCv3:
- celt_neon_intr.c
- removed warnings due to not having constant pointers
- Put simpler loop to take care of corner cases. Unrolling using
intrinsics was not really mapping well to what was done
in celt_pitch_xcorr_arm.s
- Makefile.am
Removed explicit -O3 optimization
- test_unit_mathops.c,
2014 Dec 19
0
[PATCH v1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...* ---------------------------------
+ * Computes single correlation values and stores in *sum
+ */
+static void xcorr_kernel_neon_float_process1(const float *x, const float *y,
+ float *sum, int len) {
+ float32x4_t XX[4];
+ float32x4_t YY[4];
+ float32x2_t XX_2;
+ float32x2_t YY_2;
+ float32x4_t SUMM;
+ float32x2_t SUMM_2[2];
+ const float *xi = x;
+ const float *yi = y;
+
+ SUMM = vdupq_n_f32(0);
+
+ /* Work on 16 values per iteration */
+ while (len >= 16) {
+ XX[0] = vld1q_f32(xi);
+ xi += 4;
+ XX[1] = vld1q_f32(xi);
+ xi += 4;
+...
2014 Dec 10
0
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...* ---------------------------------
+ * Computes single correlation values and stores in *sum
+ */
+static void xcorr_kernel_neon_float_process1(const float *x, const float *y,
+ float *sum, int len) {
+ float32x4_t XX[4];
+ float32x4_t YY[4];
+ float32x2_t XX_2;
+ float32x2_t YY_2;
+ float32x4_t SUMM;
+ float32x2_t SUMM_2[2];
+ float *xi = x;
+ float *yi = y;
+
+ SUMM = vdupq_n_f32(0);
+
+ /* Work on 16 values per iteration */
+ while (len >= 16) {
+ XX[0] = vld1q_f32(xi);
+ xi += 4;
+ XX[1] = vld1q_f32(xi);
+ xi += 4;
+ XX[2] = vld1...
2014 Dec 10
2
[RFC PATCH v3] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi,
Optimizes celt_pitch_xcorr for floating point.
Changes from RFCv2:
- Changes recommended by Timothy for celt_neon_intr.c
everything except, left the unrolled loop still unrolled
- configure.ac
- use AC_LINK_IFELSE instead of AC_COMPILE_IFELSE
- Moved compile flags into Makefile.am
- OPUS_ARM_NEON_INR --> typo --> OPUS_ARM_NEON_INTR
Viswanath Puttagunta (1):
armv7:
2016 Sep 13
4
[PATCH 12/15] Replace call of celt_inner_prod_c() (step 1)
Should call celt_inner_prod().
---
celt/bands.c | 7 ++++---
celt/bands.h | 2 +-
celt/celt_encoder.c | 6 +++---
celt/pitch.c | 2 +-
src/opus_multistream_encoder.c | 2 +-
5 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/celt/bands.c b/celt/bands.c
index bbe8a4c..1ab24aa 100644
--- a/celt/bands.c
+++ b/celt/bands.c
2015 Mar 31
6
[RFC PATCH v1 0/5] aarch64: celt_pitch_xcorr: Fixed point series
Hi Timothy,
As I mentioned earlier [1], I now fixed compile issues
with fixed point and resubmitting the patch.
I also have new patch that does intrinsics optimizations
for celt_pitch_xcorr targetting aarch64.
You can find my latest work-in-progress branch at [2]
For reference, you can use the Ne10 pre-built libraries
at [3]
Note that I am working with Phil at ARM to get my patch at [4]
2015 May 08
8
[RFC PATCH v2]: Ne10 fft fixed and previous 0/8]
Hi All,
As per Timothy's suggestion, disabling mdct_forward
for fixed point. Only effects
armv7,armv8: Extend fixed fft NE10 optimizations to mdct
Rest of patches are same as in [1]
For reference, latest wip code for opus is at [2]
Still working with NE10 team at ARM to get corner cases of
mdct_forward. Will update with another patch
when issue in NE10 gets fixed.
Regards,
Vish
[1]:
2015 May 15
11
[RFC V3 0/8] Ne10 fft fixed and previous
Hi All,
Changes from RFC v2 [1]
armv7,armv8: Extend fixed fft NE10 optimizations to mdct
- Overflow issue fixed by Phil at ARM. Ne10 wip at [2]. Should be upstream soon.
- So, re-enabled using fixed fft for mdct_forward which was disabled in RFCv2
armv7,armv8: Optimize fixed point fft using NE10 library
- Thanks to Jonathan Lennox, fixed some build fixes on iOS and some copy-paste errors
Rest
2015 Apr 28
10
[RFC PATCH v1 0/8] Ne10 fft fixed and previous
Hello Timothy / Jean-Marc / opus-dev,
This patch series is follow up on work I posted on [1].
In addition to what was posted on [1], this patch series mainly
integrates Fixed point FFT implementations in NE10 library into opus.
You can view my opus wip code at [2].
Note that while I found some issues both with the NE10 library(fixed fft)
and with Linaro toolchain (armv8 intrinsics), the work