John Ridges
2015-Mar-12 20:24 UTC
[opus] [RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.
Nit: in dual_inner_prod_sse, why not do both horizontal sums at the same time? As in:

    xsum1 = _mm_add_ps(_mm_movelh_ps(xsum1, xsum2),
                       _mm_movehl_ps(xsum2, xsum1));
    xsum1 = _mm_add_ps(xsum1, _mm_shuffle_ps(xsum1, xsum1, 0xf5));
    _mm_store_ss(xy1, xsum1);
    _mm_store_ss(xy2, _mm_movehl_ps(xsum1, xsum1));

--John
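For readers following the lane arithmetic, here is a minimal sketch of the combined reduction as a standalone function. The name hsum_pair_sse and its signature are hypothetical and are not taken from the Opus sources; only the four intrinsic statements come from the suggestion in this message. It assumes an x86 target with SSE available.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper (not an Opus function): reduces two 4-float
 * accumulators a = xsum1 and b = xsum2 to their scalar sums. */
void hsum_pair_sse(__m128 xsum1, __m128 xsum2, float *xy1, float *xy2)
{
    /* movelh gives [a0, a1, b0, b1]; movehl gives [a2, a3, b2, b3].
     * Their sum is [a0+a2, a1+a3, b0+b2, b1+b3]. */
    xsum1 = _mm_add_ps(_mm_movelh_ps(xsum1, xsum2),
                       _mm_movehl_ps(xsum2, xsum1));

    /* 0xf5 selects lanes [1, 1, 3, 3], so after the add lane 0 holds
     * the full sum of a and lane 2 holds the full sum of b. Lanes 1
     * and 3 hold values that are not used. */
    xsum1 = _mm_add_ps(xsum1, _mm_shuffle_ps(xsum1, xsum1, 0xf5));

    /* Lane 0 is stored directly; movehl moves lane 2 down to lane 0. */
    _mm_store_ss(xy1, xsum1);
    _mm_store_ss(xy2, _mm_movehl_ps(xsum1, xsum1));
}
```

Both sums share the same two additions, which is the point of the suggestion: the two accumulators are reduced together instead of running a separate horizontal sum for each.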
Viswanath Puttagunta
2015-Mar-13 21:19 UTC
[opus] [RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.
This causes a regression for the ARMv7 NEON intrinsics. I'm fixing it now and will push out a patch this weekend.

Regards,
Vish

On 12 March 2015 at 15:24, John Ridges <jridges at masque.com> wrote:
> Nit: in dual_inner_prod_sse, why not do both horizontal sums at the same
> time? As in:
>
> xsum1 = _mm_add_ps(_mm_movelh_ps(xsum1, xsum2),
>                    _mm_movehl_ps(xsum2, xsum1));
> xsum1 = _mm_add_ps(xsum1, _mm_shuffle_ps(xsum1, xsum1, 0xf5));
> _mm_store_ss(xy1, xsum1);
> _mm_store_ss(xy2, _mm_movehl_ps(xsum1, xsum1));
>
> --John
Jonathan Lennox
2015-Mar-13 23:05 UTC
[opus] [RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.
The patch only takes the existing Opus SSE code and moves it out of header files and into C files, so that it can be used with RTCD. There aren't any changes to the code itself.

Improvements to the SSE code are very likely possible, but I think they should be contributed as a separate patch. This patch is too big already.

On Mar 12, 2015, at 4:24 PM, John Ridges <jridges at masque.com> wrote:
> Nit: in dual_inner_prod_sse, why not do both horizontal sums at the same
> time? As in:
>
> xsum1 = _mm_add_ps(_mm_movelh_ps(xsum1, xsum2),
>                    _mm_movehl_ps(xsum2, xsum1));
> xsum1 = _mm_add_ps(xsum1, _mm_shuffle_ps(xsum1, xsum1, 0xf5));
> _mm_store_ss(xy1, xsum1);
> _mm_store_ss(xy2, _mm_movehl_ps(xsum1, xsum1));
>
> --John
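As a rough illustration of why run-time CPU detection (RTCD) is easier with functions that live in C files, here is a generic sketch of dispatch through a function pointer table. Every name in it (inner_prod_fn, inner_prod_c, inner_prod_opt, inner_prod, the arch index) is hypothetical and does not reflect the actual Opus tables or macros. The "optimized" variant is plain C standing in for an SSE version.

```c
#include <stddef.h>

/* Hypothetical signature shared by every implementation of the kernel. */
typedef float (*inner_prod_fn)(const float *x, const float *y, size_t n);

/* Portable reference implementation. */
float inner_prod_c(const float *x, const float *y, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Stand-in for an optimized variant. In a real build this would sit in
 * its own .c file compiled with the matching instruction-set flags. */
float inner_prod_opt(const float *x, const float *y, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i] * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    float sum = (s0 + s1) + (s2 + s3);
    for (; i < n; i++)  /* remaining 0 to 3 elements */
        sum += x[i] * y[i];
    return sum;
}

/* Table indexed by a CPU capability level detected once at startup
 * (assumed here: 0 = generic, 1 = optimized). */
static const inner_prod_fn inner_prod_impl[2] = {
    inner_prod_c,
    inner_prod_opt
};

/* Dispatcher: picks the implementation for the detected level. */
float inner_prod(const float *x, const float *y, size_t n, int arch)
{
    return inner_prod_impl[arch & 1](x, y, n);
}
```

The table stores function addresses, so each variant needs to be an ordinary function with its own definition. Code that is inlined from a header is compiled with whatever flags the including file uses, which makes it awkward to build one variant per instruction set and choose among them at run time.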
Apparently Analogous Threads
- [RFC PATCH v3] Intrinsics/RTCD related fixes. Mostly x86.
- [RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.
- Bug fix in celt_lpc.c and some xcorr_kernel optimizations
- Patch cleaning up Opus x86 intrinsics configury
- [RFC PATCH v1 0/4] Enable aarch64 intrinsics/Ne10