thr3ads.net - search: "vpadd

[RFC V3 5/8] aarch64: celt_pitch_xcorr: Fixed point intrinsics

2015 May 15

0

...en -= 8;
+   }
+
+   /* Work on 4 values */
+   if (len >= 4) {
+      XX_2 = vld1_s16(xi);
+      xi += 4;
+      YY_2 = vld1_s16(yi);
+      yi += 4;
+      SUMM = vmlal_s16(SUMM, YY_2, XX_2);
+      len -= 4;
+   }
+
+   SUMM_2 = vadd_s32(vget_high_s32(SUMM), vget_low_s32(SUMM));
+   SUMM_2 = vpadd_s32(SUMM_2, SUMM_2);
+   SUMM = vcombine_s32(SUMM_2, SUMM_2);
+
+   while (len > 0) {
+      XX_2 = vld1_dup_s16(xi++);
+      YY_2 = vld1_dup_s16(yi++);
+      SUMM = vmlal_s16(SUMM, XX_2, YY_2);
+      len--;
+   }
+   vst1q_lane_s32(sum, SUMM, 0);
+}
+
+opus_val32 celt_pitch_xcorr_fixed_neon(cons...
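For readers without a NEON target at hand, the reduction in the snippet above can be modelled in portable scalar C. This is only a sketch, not code from the patch: the names acc4, mlal4, reduce4 and xcorr_model are hypothetical, and the part of the patch elided before "len -= 8" is not reproduced. It mirrors the visible steps: accumulate four widened 16-bit products per iteration (as vmlal_s16 does), fold the four 32-bit lanes into one total (as the vadd_s32 of the high and low halves followed by vpadd_s32 does), then handle the leftover elements one at a time.

```c
#include <stdint.h>

/* Hypothetical scalar stand-in for the four 32-bit lanes of SUMM. */
typedef struct { int32_t lane[4]; } acc4;

/* Models vmlal_s16: widen each 16-bit lane to 32 bits, multiply, accumulate. */
static acc4 mlal4(acc4 acc, const int16_t *x, const int16_t *y)
{
    for (int i = 0; i < 4; i++)
        acc.lane[i] += (int32_t)x[i] * (int32_t)y[i];
    return acc;
}

/* Models vadd_s32(high, low) followed by vpadd_s32(t, t):
 * the result is the sum of all four lanes. */
static int32_t reduce4(acc4 acc)
{
    int32_t t0 = acc.lane[2] + acc.lane[0]; /* high[0] + low[0] */
    int32_t t1 = acc.lane[3] + acc.lane[1]; /* high[1] + low[1] */
    return t0 + t1;                         /* pairwise add */
}

/* Dot product of x and y over len elements, in blocks of four plus a tail. */
int32_t xcorr_model(const int16_t *x, const int16_t *y, int len)
{
    acc4 acc = {{0, 0, 0, 0}};

    while (len >= 4) {
        acc = mlal4(acc, x, y);
        x += 4;
        y += 4;
        len -= 4;
    }

    int32_t sum = reduce4(acc);

    /* Tail: one element at a time, like the vld1_dup_s16 loop. */
    while (len > 0) {
        sum += (int32_t)(*x++) * (int32_t)(*y++);
        len--;
    }
    return sum;
}
```

In the NEON version the tail loop adds each product to every lane of SUMM, which works because vcombine_s32 has already placed the reduced total in all four lanes; only lane 0 is stored at the end. The scalar model keeps a single running sum instead.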

[[RFC PATCH v2]: Ne10 fft fixed and previous 5/8] aarch64: celt_pitch_xcorr: Fixed point intrinsics

2015 May 08

0

...en -= 8;
+   }
+
+   /* Work on 4 values */
+   if (len >= 4) {
+      XX_2 = vld1_s16(xi);
+      xi += 4;
+      YY_2 = vld1_s16(yi);
+      yi += 4;
+      SUMM = vmlal_s16(SUMM, YY_2, XX_2);
+      len -= 4;
+   }
+
+   SUMM_2 = vadd_s32(vget_high_s32(SUMM), vget_low_s32(SUMM));
+   SUMM_2 = vpadd_s32(SUMM_2, SUMM_2);
+   SUMM = vcombine_s32(SUMM_2, SUMM_2);
+
+   while (len > 0) {
+      XX_2 = vld1_dup_s16(xi++);
+      YY_2 = vld1_dup_s16(yi++);
+      SUMM = vmlal_s16(SUMM, XX_2, YY_2);
+      len--;
+   }
+   vst1q_lane_s32(sum, SUMM, 0);
+}
+
+opus_val32 celt_pitch_xcorr_fixed_neon(cons...

Several patches of ARM NEON optimization

2016 Jul 14

6

I rebased my previous 3 patches to the current master with minor changes. Patches 1 to 3 replace all my previously submitted patches. Patches 4 and 5 are new. Thanks, Linfeng Zhang

[RFC PATCH v1 0/5] aarch64: celt_pitch_xcorr: Fixed point series

2015 Mar 31

6

Hi Timothy, As I mentioned earlier [1], I have now fixed the compile issues with fixed point and am resubmitting the patch. I also have a new patch that does intrinsics optimizations for celt_pitch_xcorr targeting aarch64. You can find my latest work-in-progress branch at [2]. For reference, you can use the Ne10 pre-built libraries at [3]. Note that I am working with Phil at ARM to get my patch at [4]

[RFC PATCH v2]: Ne10 fft fixed and previous 0/8]

2015 May 08

8

Hi All, As per Timothy's suggestion, disabling mdct_forward for fixed point. This only affects "armv7,armv8: Extend fixed fft NE10 optimizations to mdct". The rest of the patches are the same as in [1]. For reference, the latest wip code for opus is at [2]. Still working with the NE10 team at ARM on the corner cases of mdct_forward. Will update with another patch when the issue in NE10 gets fixed. Regards, Vish [1]:

[RFC V3 0/8] Ne10 fft fixed and previous

2015 May 15

11

Hi All,
Changes from RFC v2 [1]:
armv7,armv8: Extend fixed fft NE10 optimizations to mdct
- Overflow issue fixed by Phil at ARM. Ne10 wip at [2]. Should be upstream soon.
- So, re-enabled using fixed fft for mdct_forward, which was disabled in RFCv2
armv7,armv8: Optimize fixed point fft using NE10 library
- Thanks to Jonathan Lennox, fixed some build issues on iOS and some copy-paste errors
Rest

[RFC PATCH v1 0/8] Ne10 fft fixed and previous

2015 Apr 28

10

Hello Timothy / Jean-Marc / opus-dev, This patch series is a follow-up to the work I posted in [1]. In addition to what was posted in [1], this patch series mainly integrates the fixed point FFT implementations in the NE10 library into opus. You can view my opus wip code at [2]. Note that while I found some issues both with the NE10 library (fixed fft) and with the Linaro toolchain (armv8 intrinsics), the work

search for: vpadd_s32