Linfeng Zhang
2017-Feb-15 19:22 UTC
[opus] [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi,

Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter(), and Patch 2 optimizes the new function celt_fir_permit_overflow() for ARM NEON.

Please recommend a better function name.

We have already done the same internal code review and testing.

Thanks,
Linfeng

Attachments (scrubbed by the archive):
0001-Refactor-silk_LPC_analysis_filter.patch (7941 bytes): <http://lists.xiph.org/pipermail/opus/attachments/20170215/c44c8029/attachment-0003.bin>
0002-Optimize-celt_fir_permit_overflow-for-ARM-NEON.patch (16004 bytes): <http://lists.xiph.org/pipermail/opus/attachments/20170215/c44c8029/attachment-0002.bin>
Jean-Marc Valin
2017-Feb-15 19:57 UTC
[opus] [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Linfeng,

On 15/02/17 02:22 PM, Linfeng Zhang wrote:
> Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter().
> And Patch 2 optimizes the new function celt_fir_permit_overflow() for
> ARM NEON.
>
> Please recommend a better function name.

In most other cases, we've just added the _ovflw() suffix to functions/macros where signed overflow is allowed (suppressed using unsigned cast).

> We did the same internal code review and testing already.

Thanks, I'll have a look.

Jean-Marc
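The _ovflw() convention Jean-Marc describes can be sketched as follows. This is not Opus source: add32_ovflw(), sub32_ovflw() and mac16_16_ovflw() are hypothetical helpers named after the suffix, meant only to show the unsigned-cast technique for letting 32-bit arithmetic wrap without signed-overflow undefined behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers illustrating the "_ovflw" naming convention: the
   arithmetic is done on uint32_t, where wraparound is well defined, instead
   of on int32_t, where signed overflow is undefined behavior. Converting an
   out-of-range unsigned result back to int32_t is implementation-defined in C,
   but wraps (two's complement) on GCC, Clang and MSVC. */
static inline int32_t add32_ovflw(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

static inline int32_t sub32_ovflw(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a - (uint32_t)b);
}

/* Multiply-accumulate that tolerates wraparound of the accumulator. A 16x16
   product always fits in 32 bits, so only the addition needs the cast. */
static inline int32_t mac16_16_ovflw(int32_t acc, int16_t a, int16_t b)
{
    return add32_ovflw(acc, (int32_t)a * b);
}

/* Small self-check showing the intended wraparound results. */
void ovflw_self_check(void)
{
    assert(add32_ovflw(INT32_MAX, 1) == INT32_MIN);
    assert(sub32_ovflw(INT32_MIN, 1) == INT32_MAX);
    assert(mac16_16_ovflw(INT32_MAX, 2, 3) == INT32_MIN + 5);
}
```

The suffix makes the intent visible at the call site: a reader (or a sanitizer report) can tell that wraparound there is deliberate rather than a bug.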
Jean-Marc Valin
2017-Feb-15 20:06 UTC
[opus] [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Linfeng,

Can you give me a bit more detail about the purpose of this patchset? It seems to me like it's mostly duplicating the celt_fir() optimizations? Did I miss anything?

Cheers,

Jean-Marc
Linfeng Zhang
2017-Feb-15 21:05 UTC
[opus] [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Jean-Marc,

The original celt_fir() is a little messy. It has two branches selected by #ifdef SMALL_FOOTPRINT. For floating point, the two branches are identical (except for the order in which x[i] is accumulated into the sum, which is not a big deal). For fixed point, the two branches differ. I split them into two functions: the new celt_fir(), and celt_fir_permit_overflow(), which is the SMALL_FOOTPRINT branch. The only fixed-point difference is:

celt_fir(): the sum is truncated first, then added to x[i] and saturated.
celt_fir_permit_overflow(): x[i] is accumulated into the sum first, then the result is truncated and saturated.

Maybe this is why silk_LPC_analysis_filter() switched its FIR from celt_fir() to celt_fir_permit_overflow() half a year ago. Because of silk_LPC_analysis_filter(), celt_fir_permit_overflow() must behave the same in both floating-point and fixed-point builds, which is why we defined ADD32_FIXED(), ..., PSHR32_FIXED(), etc. It's still a bit messy.

For the NEON optimization: the previous celt_fir() optimization calls xcorr_kernel(). We tested it and found that calling the xcorr_kernel() optimization didn't help much here. Measured over the whole encoder, the optimization in this patch is about 1% faster than simply calling xcorr_kernel(). Given how small the new optimization is, it's better not to call xcorr_kernel() and take the 1% gain.

Thanks,
Linfeng
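To make the two fixed-point orderings Linfeng describes concrete, here is a scalar sketch that computes one output sample each way. It is not code from either patch: the function names, the plain int16_t/int32_t types, the Q12 coefficient format (a rounding shift of 12), and the hist[] layout are all assumptions for illustration, and the sketch uses ordinary signed arithmetic, so it assumes no 32-bit intermediate overflows.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: Q12 filter coefficients, i.e. a rounding shift of 12. */
#define FIR_SHIFT 12

static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Rounding right shift; assumes >> on a negative int32_t is arithmetic,
   as it is on mainstream compilers. */
static int32_t pshr32(int32_t a, int shift)
{
    return (a + ((int32_t)1 << (shift - 1))) >> shift;
}

/* Order described for celt_fir(): shift (truncate) the filter sum first,
   then add the input sample x and saturate. hist[] holds the ord past
   samples that multiply num[0..ord-1]. No overflow is assumed. */
int16_t fir_sample_shift_then_add(int16_t x, const int16_t *num,
                                  const int16_t *hist, int ord)
{
    assert(ord >= 0);
    int32_t sum = 0;
    for (int j = 0; j < ord; j++)
        sum += (int32_t)num[j] * hist[j];
    return sat16(x + pshr32(sum, FIR_SHIFT));
}

/* Order described for celt_fir_permit_overflow(): fold x (scaled by 2^12)
   into the sum first, then shift (truncate) and saturate. */
int16_t fir_sample_add_then_shift(int16_t x, const int16_t *num,
                                  const int16_t *hist, int ord)
{
    assert(ord >= 0);
    int32_t sum = (int32_t)x * (1 << FIR_SHIFT);
    for (int j = 0; j < ord; j++)
        sum += (int32_t)num[j] * hist[j];
    return sat16(pshr32(sum, FIR_SHIFT));
}
```

Because x * 2^12 is an exact multiple of 2^12, adding it before the rounding shift gives the same result as adding x after the shift. In this sketch the two orders therefore agree whenever nothing overflows; they can only diverge when the 32-bit accumulator wraps, which is plausibly why the overflow-permitting variant needs its own function.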
Linfeng Zhang
2017-Feb-15 21:08 UTC
[opus] [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
On Wed, Feb 15, 2017 at 11:57 AM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
>
> In most other cases, we've just added the _ovflw() suffix to
> functions/macros where signed overflow is allowed (suppressed using
> unsigned cast).
>

I'll rename it to celt_fir_ovflw() later.

Thanks,
Linfeng