Displaying 7 results from an estimated 7 matches for "vpadd_s32".
2015 May 15
0
[RFC V3 5/8] aarch64: celt_pitch_xcorr: Fixed point intrinsics
...en -= 8;
+ }
+
+ /* Work on 4 values */
+ if (len >= 4) {
+ XX_2 = vld1_s16(xi);
+ xi += 4;
+ YY_2 = vld1_s16(yi);
+ yi += 4;
+ SUMM = vmlal_s16(SUMM, YY_2, XX_2);
+ len -= 4;
+ }
+
+ SUMM_2 = vadd_s32(vget_high_s32(SUMM), vget_low_s32(SUMM));
+ SUMM_2 = vpadd_s32(SUMM_2, SUMM_2);
+ SUMM = vcombine_s32(SUMM_2, SUMM_2);
+
+ while (len > 0) {
+ XX_2 = vld1_dup_s16(xi++);
+ YY_2 = vld1_dup_s16(yi++);
+ SUMM = vmlal_s16(SUMM, XX_2, YY_2);
+ len--;
+ }
+ vst1q_lane_s32(sum, SUMM, 0);
+}
+
+opus_val32 celt_pitch_xcorr_fixed_neon(cons...
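The excerpt above widens 16-bit products into four 32-bit lanes with vmlal_s16, then reduces the lanes horizontally with vadd_s32/vpadd_s32 before handling the leftover samples one at a time. What follows is a minimal portable C sketch of that lane arithmetic, not the patch's actual code. The function name `inner_prod_lanes` is hypothetical, and the sketch assumes the running total fits in `int32_t`: signed overflow is undefined behaviour in C, whereas the NEON instructions simply wrap. The 8-sample loop that the search excerpt truncates is not modelled separately, because the final total does not depend on how samples are grouped into lanes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: portable model of the NEON lane arithmetic in the
 * excerpt. Returns the sum of x[i]*y[i] for i in [0, len).
 * Assumes the total fits in int32_t (NEON would wrap; C signed overflow is UB). */
static int32_t inner_prod_lanes(const int16_t *x, const int16_t *y, int len)
{
    int32_t summ[4] = {0, 0, 0, 0}; /* models the int32x4_t SUMM register */
    int32_t half[2];                /* models the int32x2_t SUMM_2 register */
    int32_t total;
    int i = 0;

    assert(len >= 0);

    /* Blocks of 4 samples: vmlal_s16 widens each 16x16 product to 32 bits
       and accumulates product k into lane k. */
    for (; i + 4 <= len; i += 4) {
        for (int k = 0; k < 4; k++)
            summ[k] += (int32_t)x[i + k] * y[i + k];
    }

    /* vadd_s32(vget_high_s32(SUMM), vget_low_s32(SUMM)):
       fold lanes 2 and 3 onto lanes 0 and 1. */
    half[0] = summ[0] + summ[2];
    half[1] = summ[1] + summ[3];

    /* vpadd_s32(SUMM_2, SUMM_2): pairwise add, so both lanes hold the total;
       vcombine_s32 then copies that total into all four lanes of SUMM. */
    total = half[0] + half[1];

    /* Tail: vld1_dup_s16 broadcasts one sample to every lane, so each lane
       receives the same product; vst1q_lane_s32(sum, SUMM, 0) keeps lane 0. */
    for (; i < len; i++)
        total += (int32_t)x[i] * y[i];

    return total;
}
```

For example, with x = {1, 2, 3, 4, 5, 6, 7} and y = {7, 6, 5, 4, 3, 2, 1}, one 4-sample block contributes 50 and the 3-sample tail contributes 34, for a total of 84. Because the tail uses broadcast loads, every lane stays equal to the running total, which is why storing lane 0 alone is enough.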
2015 May 08
0
[[RFC PATCH v2]: Ne10 fft fixed and previous 5/8] aarch64: celt_pitch_xcorr: Fixed point intrinsics
...en -= 8;
+ }
+
+ /* Work on 4 values */
+ if (len >= 4) {
+ XX_2 = vld1_s16(xi);
+ xi += 4;
+ YY_2 = vld1_s16(yi);
+ yi += 4;
+ SUMM = vmlal_s16(SUMM, YY_2, XX_2);
+ len -= 4;
+ }
+
+ SUMM_2 = vadd_s32(vget_high_s32(SUMM), vget_low_s32(SUMM));
+ SUMM_2 = vpadd_s32(SUMM_2, SUMM_2);
+ SUMM = vcombine_s32(SUMM_2, SUMM_2);
+
+ while (len > 0) {
+ XX_2 = vld1_dup_s16(xi++);
+ YY_2 = vld1_dup_s16(yi++);
+ SUMM = vmlal_s16(SUMM, XX_2, YY_2);
+ len--;
+ }
+ vst1q_lane_s32(sum, SUMM, 0);
+}
+
+opus_val32 celt_pitch_xcorr_fixed_neon(cons...
2016 Jul 14
6
Several patches of ARM NEON optimization
I rebased my previous 3 patches onto the current master with minor changes.
Patches 1 to 3 replace all my previous submitted patches.
Patches 4 and 5 are new.
Thanks,
Linfeng Zhang
2015 Mar 31
6
[RFC PATCH v1 0/5] aarch64: celt_pitch_xcorr: Fixed point series
Hi Timothy,
As I mentioned earlier [1], I have now fixed the compile issues
with fixed point and am resubmitting the patch.
I also have a new patch that does intrinsics optimizations
for celt_pitch_xcorr targeting aarch64.
You can find my latest work-in-progress branch at [2].
For reference, you can use the Ne10 pre-built libraries
at [3].
Note that I am working with Phil at ARM to get my patch at [4]
2015 May 08
8
[RFC PATCH v2]: Ne10 fft fixed and previous 0/8]
Hi All,
As per Timothy's suggestion, I am disabling mdct_forward
for fixed point. This only affects:
armv7,armv8: Extend fixed fft NE10 optimizations to mdct
The rest of the patches are the same as in [1].
For reference, the latest wip code for opus is at [2].
I am still working with the NE10 team at ARM to resolve corner cases of
mdct_forward. I will update with another patch
when the issue in NE10 gets fixed.
Regards,
Vish
[1]:
2015 May 15
11
[RFC V3 0/8] Ne10 fft fixed and previous
Hi All,
Changes from RFC v2 [1]
armv7,armv8: Extend fixed fft NE10 optimizations to mdct
- Overflow issue fixed by Phil at ARM. Ne10 wip at [2]. Should be upstream soon.
- So, re-enabled the fixed fft for mdct_forward, which was disabled in RFC v2
armv7,armv8: Optimize fixed point fft using NE10 library
- Thanks to Jonathan Lennox, fixed some build issues on iOS and some copy-paste errors
Rest
2015 Apr 28
10
[RFC PATCH v1 0/8] Ne10 fft fixed and previous
Hello Timothy / Jean-Marc / opus-dev,
This patch series is a follow-up to the work I posted at [1].
In addition to what was posted at [1], this patch series mainly
integrates the fixed-point FFT implementations in the NE10 library into opus.
You can view my opus wip code at [2].
Note that while I found some issues both with the NE10 library (fixed fft)
and with the Linaro toolchain (armv8 intrinsics), the work