Jonathan Lennox
2015-Mar-04 04:17 UTC
[opus] Patch cleaning up Opus x86 intrinsics configury
On Mar 3, 2015, at 11:08 PM, Viswanath Puttagunta <viswanath.puttagunta at linaro.org<mailto:viswanath.puttagunta at linaro.org>> wrote: On 3 March 2015 at 21:59, Jonathan Lennox <jonathan at vidyo.com<mailto:jonathan at vidyo.com>> wrote: Viswenath, My patch should be against the tip, but it?s the very recent tip, including some changes this past Friday (27 Feb). I mentioned in the IRC room a problem I discovered in creating my patch, and then later improved the fix Tim had made for the problem. Where do you get conflicts merging it to tip? In terms of merging, you posted your patch before I posted mine, so probably I should be the one on the hook to rebase after your fix goes in. Looking over your patch quickly, I don?t think any of my changes should be that difficult to merge with yours.>> Yeah.. trivial merge issues.. shouldn't take too long to resolve.I haven?t studied your patch in depth yet. Does Ne10 use its own RTCD code, or do you use Opus?s?>> Not using Ne10's RTCD feature at the moment.. using Opus's rtcd. libopus RTCD for linux/armv7 will not work for aarch64. It needs to be updated (or just disabled if system built for aarch64 since Neon support is mandatory)... It is in my todo list..My patch should handle aarch64 properly. One of the things it does is it discovers (in configure) whether a little test program using Neon intrinsics can be compiled without -mfpu=neon ? if so, it disables RTCD for Neon, and sets a #define OPUS_ARM_PRESUME_NEON_INTR. (This is analogous to the existing code for determining whether Neon assembly instructions can be assembled by default.) This thus handles not only aarch64, but also iOS armv7. I also have a number of aarch64 patches pending ? mine are entirely for fixed-point though. Are yours all floating-point? On Mar 3, 2015, at 5:18 PM, Viswanath Puttagunta <viswanath.puttagunta at linaro.org<mailto:viswanath.puttagunta at linaro.org>> wrote:> Hello Jonathan, > > I am unable to apply your patch cleanly on tip. > > Timothy/opus-dev, > > This patch has some conflicts with my ARM patch that does fft optimizations > http://lists.xiph.org/pipermail/opus/2015-March/002904.html > http://lists.xiph.org/pipermail/opus/2015-March/002905.html > > One of us probably has to rebase depending on which patch goes into opus first. > > Regards, > Vish > > > On 1 March 2015 at 20:47, Jonathan Lennox <jonathan at vidyo.com<mailto:jonathan at vidyo.com>> wrote: >> The attached patch cleans up Opus's x86 intrinsics configury. >> >> It: >> * Makes ?enable-intrinsics work with clang and other non-GCC compilers >> * Enables RTCD for the floating-point-mode SSE code in Celt. >> * Disables use of RTCD in cases where the compiler targets an instruction >> set by default. >> * Enables the SSE4.1 Silk optimizations that apply to the common parts of >> Silk when Opus is built in floating-point mode, not just in fixed-point >> mode. >> * Enables the SSE intrinsics (with RTCD when appropriate) in the Win32 >> build. >> * Fixes a case where GCC would compile SSE2 code as SSE4.1, causing a crash >> on non-SSE4.1 CPUs. >> * Allows configuration with compilers with non-GCC-flavor flags for enabling >> architecture options. >> * Hopefully makes the configuration and ifdef?s easier to follow and >> understand. >> >> This does not yet switch ?enable-intrinsics to be enabled by default on >> supported architectures, but I think it?d be ready to do so. >> >> Comments are welcome! >> >> >> >> ? >> Jonathan Lennox >> jonathan at vidyo.com<mailto:jonathan at vidyo.com> >> >> >> _______________________________________________ >> opus mailing list >> opus at xiph.org<mailto:opus at xiph.org> >> http://lists.xiph.org/mailman/listinfo/opus >>-------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.xiph.org/pipermail/opus/attachments/20150304/0dbc3dba/attachment.htm
Viswanath Puttagunta
2015-Mar-04 15:00 UTC
[opus] Patch cleaning up Opus x86 intrinsics configury
On 3 March 2015 at 22:17, Jonathan Lennox <jonathan at vidyo.com> wrote:> > On Mar 3, 2015, at 11:08 PM, Viswanath Puttagunta > <viswanath.puttagunta at linaro.org> wrote: > > > > On 3 March 2015 at 21:59, Jonathan Lennox <jonathan at vidyo.com> wrote: >> >> Viswenath, >> >> My patch should be against the tip, but it?s the very recent tip, >> including some changes this past Friday (27 Feb). I mentioned in the IRC >> room a problem I discovered in creating my patch, and then later improved >> the fix Tim had made for the problem. Where do you get conflicts merging it >> to tip? >> >> In terms of merging, you posted your patch before I posted mine, so >> probably I should be the one on the hook to rebase after your fix goes in. >> Looking over your patch quickly, I don?t think any of my changes should be >> that difficult to merge with yours. > >>> Yeah.. trivial merge issues.. shouldn't take too long to resolve. >> >> >> I haven?t studied your patch in depth yet. Does Ne10 use its own RTCD >> code, or do you use Opus?s? > >>> Not using Ne10's RTCD feature at the moment.. using Opus's rtcd. libopus >>> RTCD for linux/armv7 will not work for aarch64. It needs to be updated (or >>> just disabled if system built for aarch64 since Neon support is >>> mandatory)... It is in my todo list.. > > > My patch should handle aarch64 properly. One of the things it does is it > discovers (in configure) whether a little test program using Neon intrinsics > can be compiled without -mfpu=neon ? if so, it disables RTCD for Neon, and > sets a #define OPUS_ARM_PRESUME_NEON_INTR. (This is analogous to the > existing code for determining whether Neon assembly instructions can be > assembled by default.) > > This thus handles not only aarch64, but also iOS armv7.This sounds good to me. I wanted to do more thorough code review for your patch, but am waiting until it applies cleanly on tip.. But now that we agree that your patch should be rebased on top of my patch, I'll either wait for Timothy/opus-dev to take in my patch, and your patch to be rebased? And if this is taking too long for my patch to go in (Due to opus-dev maintainers time availability), I'll rebase (retaining your authorship ofcourse) your patch on my submitted patch... and put in my future work on top of your patch. But I really hope we can get some cycles from Timothy/Jean to get the patches into opus. It seems they are really busy for last month or so.> > I also have a number of aarch64 patches pending ? mine are entirely for > fixed-point though. Are yours all floating-point?Yes, so far they are floating point. Can you post your fixed-point optimizations? Also, are any of your patches using NE10 library? So far, what I submitted is to optimize opus_fft using NE10 (float only). I also modified mdct_forward to take advantage of this optimization. My next action plan: - use opus_ifft to optimize mdct_backwards (float via Ne10) - Re-work/add opus_fft and opus_ifft for fixed (again via Ne10) - Enable for Aarch64 and test (I'm hoping your patch already does this) - float-to-int and int-to-float conversions using neon - etc.> >> On Mar 3, 2015, at 5:18 PM, Viswanath Puttagunta >> <viswanath.puttagunta at linaro.org> wrote: >> >> > Hello Jonathan, >> > >> > I am unable to apply your patch cleanly on tip. >> > >> > Timothy/opus-dev, >> > >> > This patch has some conflicts with my ARM patch that does fft >> > optimizations >> > http://lists.xiph.org/pipermail/opus/2015-March/002904.html >> > http://lists.xiph.org/pipermail/opus/2015-March/002905.html >> > >> > One of us probably has to rebase depending on which patch goes into opus >> > first. >> > >> > Regards, >> > Vish >> > >> > >> > On 1 March 2015 at 20:47, Jonathan Lennox <jonathan at vidyo.com> wrote: >> >> The attached patch cleans up Opus's x86 intrinsics configury. >> >> >> >> It: >> >> * Makes ?enable-intrinsics work with clang and other non-GCC compilers >> >> * Enables RTCD for the floating-point-mode SSE code in Celt. >> >> * Disables use of RTCD in cases where the compiler targets an >> >> instruction >> >> set by default. >> >> * Enables the SSE4.1 Silk optimizations that apply to the common parts >> >> of >> >> Silk when Opus is built in floating-point mode, not just in fixed-point >> >> mode. >> >> * Enables the SSE intrinsics (with RTCD when appropriate) in the Win32 >> >> build. >> >> * Fixes a case where GCC would compile SSE2 code as SSE4.1, causing a >> >> crash >> >> on non-SSE4.1 CPUs. >> >> * Allows configuration with compilers with non-GCC-flavor flags for >> >> enabling >> >> architecture options. >> >> * Hopefully makes the configuration and ifdef?s easier to follow and >> >> understand. >> >> >> >> This does not yet switch ?enable-intrinsics to be enabled by default on >> >> supported architectures, but I think it?d be ready to do so. >> >> >> >> Comments are welcome! >> >> >> >> >> >> >> >> ? >> >> Jonathan Lennox >> >> jonathan at vidyo.com >> >> >> >> >> >> _______________________________________________ >> >> opus mailing list >> >> opus at xiph.org >> >> http://lists.xiph.org/mailman/listinfo/opus >> >> >> > >
Viswanath Puttagunta
2015-Mar-07 18:30 UTC
[opus] Patch cleaning up Opus x86 intrinsics configury
Hello Jonathan, Just FYI, I started doing review of your patch and will get back to you in few days. After review, I would like to rebase your patch (as necessary) myself and do some testing.. and re-submit. Regards, Vish On 4 March 2015 at 09:00, Viswanath Puttagunta <viswanath.puttagunta at linaro.org> wrote:> > On 3 March 2015 at 22:17, Jonathan Lennox <jonathan at vidyo.com> wrote: > > > > On Mar 3, 2015, at 11:08 PM, Viswanath Puttagunta > > <viswanath.puttagunta at linaro.org> wrote: > > > > > > > > On 3 March 2015 at 21:59, Jonathan Lennox <jonathan at vidyo.com> wrote: > >> > >> Viswenath, > >> > >> My patch should be against the tip, but it?s the very recent tip, > >> including some changes this past Friday (27 Feb). I mentioned in the IRC > >> room a problem I discovered in creating my patch, and then later improved > >> the fix Tim had made for the problem. Where do you get conflicts merging it > >> to tip? > >> > >> In terms of merging, you posted your patch before I posted mine, so > >> probably I should be the one on the hook to rebase after your fix goes in. > >> Looking over your patch quickly, I don?t think any of my changes should be > >> that difficult to merge with yours. > > > >>> Yeah.. trivial merge issues.. shouldn't take too long to resolve. > >> > >> > >> I haven?t studied your patch in depth yet. Does Ne10 use its own RTCD > >> code, or do you use Opus?s? > > > >>> Not using Ne10's RTCD feature at the moment.. using Opus's rtcd. libopus > >>> RTCD for linux/armv7 will not work for aarch64. It needs to be updated (or > >>> just disabled if system built for aarch64 since Neon support is > >>> mandatory)... It is in my todo list.. > > > > > > My patch should handle aarch64 properly. One of the things it does is it > > discovers (in configure) whether a little test program using Neon intrinsics > > can be compiled without -mfpu=neon ? if so, it disables RTCD for Neon, and > > sets a #define OPUS_ARM_PRESUME_NEON_INTR. (This is analogous to the > > existing code for determining whether Neon assembly instructions can be > > assembled by default.) > > > > This thus handles not only aarch64, but also iOS armv7. > This sounds good to me. I wanted to do more thorough code review for > your patch, but am waiting until it applies cleanly on tip.. But now > that we agree that your patch should be rebased on top of my patch, > I'll either wait for Timothy/opus-dev to take in my patch, and your > patch to be rebased? And if this is taking too long for my patch to go > in (Due to opus-dev maintainers time availability), I'll rebase > (retaining your authorship ofcourse) your patch on my submitted > patch... and put in my future work on top of your patch. > > But I really hope we can get some cycles from Timothy/Jean to get the > patches into opus. It seems they are really busy for last month or so. > > > > > I also have a number of aarch64 patches pending ? mine are entirely for > > fixed-point though. Are yours all floating-point? > Yes, so far they are floating point. Can you post your fixed-point > optimizations? Also, are any of your patches using NE10 library? > > So far, what I submitted is to optimize opus_fft using NE10 (float > only). I also modified mdct_forward to take advantage of this > optimization. > > My next action plan: > - use opus_ifft to optimize mdct_backwards (float via Ne10) > - Re-work/add opus_fft and opus_ifft for fixed (again via Ne10) > - Enable for Aarch64 and test (I'm hoping your patch already does this) > - float-to-int and int-to-float conversions using neon > - etc. > > > > > >> On Mar 3, 2015, at 5:18 PM, Viswanath Puttagunta > >> <viswanath.puttagunta at linaro.org> wrote: > >> > >> > Hello Jonathan, > >> > > >> > I am unable to apply your patch cleanly on tip. > >> > > >> > Timothy/opus-dev, > >> > > >> > This patch has some conflicts with my ARM patch that does fft > >> > optimizations > >> > http://lists.xiph.org/pipermail/opus/2015-March/002904.html > >> > http://lists.xiph.org/pipermail/opus/2015-March/002905.html > >> > > >> > One of us probably has to rebase depending on which patch goes into opus > >> > first. > >> > > >> > Regards, > >> > Vish > >> > > >> > > >> > On 1 March 2015 at 20:47, Jonathan Lennox <jonathan at vidyo.com> wrote: > >> >> The attached patch cleans up Opus's x86 intrinsics configury. > >> >> > >> >> It: > >> >> * Makes ?enable-intrinsics work with clang and other non-GCC compilers > >> >> * Enables RTCD for the floating-point-mode SSE code in Celt. > >> >> * Disables use of RTCD in cases where the compiler targets an > >> >> instruction > >> >> set by default. > >> >> * Enables the SSE4.1 Silk optimizations that apply to the common parts > >> >> of > >> >> Silk when Opus is built in floating-point mode, not just in fixed-point > >> >> mode. > >> >> * Enables the SSE intrinsics (with RTCD when appropriate) in the Win32 > >> >> build. > >> >> * Fixes a case where GCC would compile SSE2 code as SSE4.1, causing a > >> >> crash > >> >> on non-SSE4.1 CPUs. > >> >> * Allows configuration with compilers with non-GCC-flavor flags for > >> >> enabling > >> >> architecture options. > >> >> * Hopefully makes the configuration and ifdef?s easier to follow and > >> >> understand. > >> >> > >> >> This does not yet switch ?enable-intrinsics to be enabled by default on > >> >> supported architectures, but I think it?d be ready to do so. > >> >> > >> >> Comments are welcome! > >> >> > >> >> > >> >> > >> >> ? > >> >> Jonathan Lennox > >> >> jonathan at vidyo.com > >> >> > >> >> > >> >> _______________________________________________ > >> >> opus mailing list > >> >> opus at xiph.org > >> >> http://lists.xiph.org/mailman/listinfo/opus > >> >> > >> > > > >