thr3ads.net - opus - [opus] Patch cleaning up Opus x86 intrinsics configury [Mar 2015]

If this information is useful, please help other people find it:
Share via:

Jonathan Lennox

2015-Mar-04 04:17 UTC

[opus] Patch cleaning up Opus x86 intrinsics configury

On Mar 3, 2015, at 11:08 PM, Viswanath Puttagunta <viswanath.puttagunta at
linaro.org<mailto:viswanath.puttagunta at linaro.org>> wrote:

On 3 March 2015 at 21:59, Jonathan Lennox <jonathan at
vidyo.com<mailto:jonathan at vidyo.com>> wrote:
Viswenath,

My patch should be against the tip, but it?s the very recent tip, including some
changes this past Friday (27 Feb).  I mentioned in the IRC room a problem I
discovered in creating my patch, and then later improved the fix Tim had made
for the problem.  Where do you get conflicts merging it to tip?

In terms of merging, you posted your patch before I posted mine, so probably I
should be the one on the hook to rebase after your fix goes in.  Looking over
your patch quickly, I don?t think any of my changes should be that difficult to
merge with yours.>> Yeah.. trivial merge issues.. shouldn't take too long to resolve.
I haven?t studied your patch in depth yet.  Does Ne10 use its own RTCD code, or
do you use Opus?s?>> Not using Ne10's RTCD feature at the moment.. using Opus's
rtcd. libopus RTCD for linux/armv7 will not work for aarch64. It needs to be
updated (or just disabled if system built for aarch64 since Neon support is
mandatory)... It is in my todo list..
My patch should handle aarch64 properly.  One of the things it does is it
discovers (in configure) whether a little test program using Neon intrinsics can
be compiled without -mfpu=neon ? if so, it disables RTCD for Neon, and sets a
#define OPUS_ARM_PRESUME_NEON_INTR.  (This is analogous to the existing code for
determining whether Neon assembly instructions can be assembled by default.)

This thus handles not only aarch64, but also iOS armv7.

I also have a number of aarch64 patches pending ? mine are entirely for
fixed-point though.  Are yours all floating-point?

On Mar 3, 2015, at 5:18 PM, Viswanath Puttagunta <viswanath.puttagunta at
linaro.org<mailto:viswanath.puttagunta at linaro.org>> wrote:
> Hello Jonathan,
>
> I am unable to apply your patch cleanly on tip.
>
> Timothy/opus-dev,
>
> This patch has some conflicts with my ARM patch that does fft optimizations
> http://lists.xiph.org/pipermail/opus/2015-March/002904.html
> http://lists.xiph.org/pipermail/opus/2015-March/002905.html
>
> One of us probably has to rebase depending on which patch goes into opus
first.
>
> Regards,
> Vish
>
>
> On 1 March 2015 at 20:47, Jonathan Lennox <jonathan at
vidyo.com<mailto:jonathan at vidyo.com>> wrote:
>> The attached patch cleans up Opus's x86 intrinsics configury.
>>
>> It:
>> * Makes ?enable-intrinsics work with clang and other non-GCC compilers
>> * Enables RTCD for the floating-point-mode SSE code in Celt.
>> * Disables use of RTCD in cases where the compiler targets an
instruction
>> set by default.
>> * Enables the SSE4.1 Silk optimizations that apply to the common parts
of
>> Silk when Opus is built in floating-point mode, not just in fixed-point
>> mode.
>> * Enables the SSE intrinsics (with RTCD when appropriate) in the Win32
>> build.
>> * Fixes a case where GCC would compile SSE2 code as SSE4.1, causing a
crash
>> on non-SSE4.1 CPUs.
>> * Allows configuration with compilers with non-GCC-flavor flags for
enabling
>> architecture options.
>> * Hopefully makes the configuration and ifdef?s easier to follow and
>> understand.
>>
>> This does not yet switch ?enable-intrinsics to be enabled by default on
>> supported architectures, but I think it?d be ready to do so.
>>
>> Comments are welcome!
>>
>>
>>
>> ?
>> Jonathan Lennox
>> jonathan at vidyo.com<mailto:jonathan at vidyo.com>
>>
>>
>> _______________________________________________
>> opus mailing list
>> opus at xiph.org<mailto:opus at xiph.org>
>> http://lists.xiph.org/mailman/listinfo/opus
>>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
http://lists.xiph.org/pipermail/opus/attachments/20150304/0dbc3dba/attachment.htm

Viswanath Puttagunta

2015-Mar-04 15:00 UTC

head link

[opus] Patch cleaning up Opus x86 intrinsics configury

On 3 March 2015 at 22:17, Jonathan Lennox <jonathan at vidyo.com>
wrote:>
> On Mar 3, 2015, at 11:08 PM, Viswanath Puttagunta
> <viswanath.puttagunta at linaro.org> wrote:
>
>
>
> On 3 March 2015 at 21:59, Jonathan Lennox <jonathan at vidyo.com>
wrote:
>>
>> Viswenath,
>>
>> My patch should be against the tip, but it?s the very recent tip,
>> including some changes this past Friday (27 Feb).  I mentioned in the
IRC
>> room a problem I discovered in creating my patch, and then later
improved
>> the fix Tim had made for the problem.  Where do you get conflicts
merging it
>> to tip?
>>
>> In terms of merging, you posted your patch before I posted mine, so
>> probably I should be the one on the hook to rebase after your fix goes
in.
>> Looking over your patch quickly, I don?t think any of my changes should
be
>> that difficult to merge with yours.
>
>>> Yeah.. trivial merge issues.. shouldn't take too long to
resolve.
>>
>>
>> I haven?t studied your patch in depth yet.  Does Ne10 use its own RTCD
>> code, or do you use Opus?s?
>
>>> Not using Ne10's RTCD feature at the moment.. using Opus's
rtcd. libopus
>>> RTCD for linux/armv7 will not work for aarch64. It needs to be
updated (or
>>> just disabled if system built for aarch64 since Neon support is
>>> mandatory)... It is in my todo list..
>
>
> My patch should handle aarch64 properly.  One of the things it does is it
> discovers (in configure) whether a little test program using Neon
intrinsics
> can be compiled without -mfpu=neon ? if so, it disables RTCD for Neon, and
> sets a #define OPUS_ARM_PRESUME_NEON_INTR.  (This is analogous to the
> existing code for determining whether Neon assembly instructions can be
> assembled by default.)
>
> This thus handles not only aarch64, but also iOS armv7.This sounds good to me. I wanted to do more thorough code review for
your patch, but am waiting until it applies cleanly on tip.. But now
that we agree that your patch should be rebased on top of my patch,
I'll either wait for Timothy/opus-dev to take in my patch, and your
patch to be rebased? And if this is taking too long for my patch to go
in (Due to opus-dev maintainers time availability), I'll rebase
(retaining your authorship ofcourse) your patch on my submitted
patch... and put in my future work on top of your patch.

But I really hope we can get some cycles from Timothy/Jean to get the
patches into opus. It seems they are really busy for last month or so.
>
> I also have a number of aarch64 patches pending ? mine are entirely for
> fixed-point though.  Are yours all floating-point?Yes, so far they are floating point. Can you post your fixed-point
optimizations? Also, are any of your patches using NE10 library?

So far, what I submitted is to optimize opus_fft using NE10 (float
only). I also modified mdct_forward to take advantage of this
optimization.

My next action plan:
- use opus_ifft to optimize mdct_backwards (float via Ne10)
- Re-work/add opus_fft and opus_ifft for fixed (again via Ne10)
- Enable for Aarch64 and test (I'm hoping your patch already does this)
- float-to-int and int-to-float conversions using neon
- etc.

>
>> On Mar 3, 2015, at 5:18 PM, Viswanath Puttagunta
>> <viswanath.puttagunta at linaro.org> wrote:
>>
>> > Hello Jonathan,
>> >
>> > I am unable to apply your patch cleanly on tip.
>> >
>> > Timothy/opus-dev,
>> >
>> > This patch has some conflicts with my ARM patch that does fft
>> > optimizations
>> > http://lists.xiph.org/pipermail/opus/2015-March/002904.html
>> > http://lists.xiph.org/pipermail/opus/2015-March/002905.html
>> >
>> > One of us probably has to rebase depending on which patch goes
into opus
>> > first.
>> >
>> > Regards,
>> > Vish
>> >
>> >
>> > On 1 March 2015 at 20:47, Jonathan Lennox <jonathan at
vidyo.com> wrote:
>> >> The attached patch cleans up Opus's x86 intrinsics
configury.
>> >>
>> >> It:
>> >> * Makes ?enable-intrinsics work with clang and other non-GCC
compilers
>> >> * Enables RTCD for the floating-point-mode SSE code in Celt.
>> >> * Disables use of RTCD in cases where the compiler targets an
>> >> instruction
>> >> set by default.
>> >> * Enables the SSE4.1 Silk optimizations that apply to the
common parts
>> >> of
>> >> Silk when Opus is built in floating-point mode, not just in
fixed-point
>> >> mode.
>> >> * Enables the SSE intrinsics (with RTCD when appropriate) in
the Win32
>> >> build.
>> >> * Fixes a case where GCC would compile SSE2 code as SSE4.1,
causing a
>> >> crash
>> >> on non-SSE4.1 CPUs.
>> >> * Allows configuration with compilers with non-GCC-flavor
flags for
>> >> enabling
>> >> architecture options.
>> >> * Hopefully makes the configuration and ifdef?s easier to
follow and
>> >> understand.
>> >>
>> >> This does not yet switch ?enable-intrinsics to be enabled by
default on
>> >> supported architectures, but I think it?d be ready to do so.
>> >>
>> >> Comments are welcome!
>> >>
>> >>
>> >>
>> >> ?
>> >> Jonathan Lennox
>> >> jonathan at vidyo.com
>> >>
>> >>
>> >> _______________________________________________
>> >> opus mailing list
>> >> opus at xiph.org
>> >> http://lists.xiph.org/mailman/listinfo/opus
>> >>
>>
>
>

Viswanath Puttagunta

2015-Mar-07 18:30 UTC

head link

[opus] Patch cleaning up Opus x86 intrinsics configury

Hello Jonathan,

Just FYI, I started doing review of your patch and will get back to
you in few days. After review, I would like to rebase your patch (as
necessary) myself and do some testing.. and re-submit.

Regards,
Vish

On 4 March 2015 at 09:00, Viswanath Puttagunta
<viswanath.puttagunta at linaro.org> wrote:>
> On 3 March 2015 at 22:17, Jonathan Lennox <jonathan at vidyo.com>
wrote:
> >
> > On Mar 3, 2015, at 11:08 PM, Viswanath Puttagunta
> > <viswanath.puttagunta at linaro.org> wrote:
> >
> >
> >
> > On 3 March 2015 at 21:59, Jonathan Lennox <jonathan at
vidyo.com> wrote:
> >>
> >> Viswenath,
> >>
> >> My patch should be against the tip, but it?s the very recent tip,
> >> including some changes this past Friday (27 Feb).  I mentioned in
the IRC
> >> room a problem I discovered in creating my patch, and then later
improved
> >> the fix Tim had made for the problem.  Where do you get conflicts
merging it
> >> to tip?
> >>
> >> In terms of merging, you posted your patch before I posted mine,
so
> >> probably I should be the one on the hook to rebase after your fix
goes in.
> >> Looking over your patch quickly, I don?t think any of my changes
should be
> >> that difficult to merge with yours.
> >
> >>> Yeah.. trivial merge issues.. shouldn't take too long to
resolve.
> >>
> >>
> >> I haven?t studied your patch in depth yet.  Does Ne10 use its own
RTCD
> >> code, or do you use Opus?s?
> >
> >>> Not using Ne10's RTCD feature at the moment.. using
Opus's rtcd. libopus
> >>> RTCD for linux/armv7 will not work for aarch64. It needs to be
updated (or
> >>> just disabled if system built for aarch64 since Neon support
is
> >>> mandatory)... It is in my todo list..
> >
> >
> > My patch should handle aarch64 properly.  One of the things it does is
it
> > discovers (in configure) whether a little test program using Neon
intrinsics
> > can be compiled without -mfpu=neon ? if so, it disables RTCD for Neon,
and
> > sets a #define OPUS_ARM_PRESUME_NEON_INTR.  (This is analogous to the
> > existing code for determining whether Neon assembly instructions can
be
> > assembled by default.)
> >
> > This thus handles not only aarch64, but also iOS armv7.
> This sounds good to me. I wanted to do more thorough code review for
> your patch, but am waiting until it applies cleanly on tip.. But now
> that we agree that your patch should be rebased on top of my patch,
> I'll either wait for Timothy/opus-dev to take in my patch, and your
> patch to be rebased? And if this is taking too long for my patch to go
> in (Due to opus-dev maintainers time availability), I'll rebase
> (retaining your authorship ofcourse) your patch on my submitted
> patch... and put in my future work on top of your patch.
>
> But I really hope we can get some cycles from Timothy/Jean to get the
> patches into opus. It seems they are really busy for last month or so.
>
> >
> > I also have a number of aarch64 patches pending ? mine are entirely
for
> > fixed-point though.  Are yours all floating-point?
> Yes, so far they are floating point. Can you post your fixed-point
> optimizations? Also, are any of your patches using NE10 library?
>
> So far, what I submitted is to optimize opus_fft using NE10 (float
> only). I also modified mdct_forward to take advantage of this
> optimization.
>
> My next action plan:
> - use opus_ifft to optimize mdct_backwards (float via Ne10)
> - Re-work/add opus_fft and opus_ifft for fixed (again via Ne10)
> - Enable for Aarch64 and test (I'm hoping your patch already does this)
> - float-to-int and int-to-float conversions using neon
> - etc.
>
>
> >
> >> On Mar 3, 2015, at 5:18 PM, Viswanath Puttagunta
> >> <viswanath.puttagunta at linaro.org> wrote:
> >>
> >> > Hello Jonathan,
> >> >
> >> > I am unable to apply your patch cleanly on tip.
> >> >
> >> > Timothy/opus-dev,
> >> >
> >> > This patch has some conflicts with my ARM patch that does fft
> >> > optimizations
> >> > http://lists.xiph.org/pipermail/opus/2015-March/002904.html
> >> > http://lists.xiph.org/pipermail/opus/2015-March/002905.html
> >> >
> >> > One of us probably has to rebase depending on which patch
goes into opus
> >> > first.
> >> >
> >> > Regards,
> >> > Vish
> >> >
> >> >
> >> > On 1 March 2015 at 20:47, Jonathan Lennox <jonathan at
vidyo.com> wrote:
> >> >> The attached patch cleans up Opus's x86 intrinsics
configury.
> >> >>
> >> >> It:
> >> >> * Makes ?enable-intrinsics work with clang and other
non-GCC compilers
> >> >> * Enables RTCD for the floating-point-mode SSE code in
Celt.
> >> >> * Disables use of RTCD in cases where the compiler
targets an
> >> >> instruction
> >> >> set by default.
> >> >> * Enables the SSE4.1 Silk optimizations that apply to the
common parts
> >> >> of
> >> >> Silk when Opus is built in floating-point mode, not just
in fixed-point
> >> >> mode.
> >> >> * Enables the SSE intrinsics (with RTCD when appropriate)
in the Win32
> >> >> build.
> >> >> * Fixes a case where GCC would compile SSE2 code as
SSE4.1, causing a
> >> >> crash
> >> >> on non-SSE4.1 CPUs.
> >> >> * Allows configuration with compilers with non-GCC-flavor
flags for
> >> >> enabling
> >> >> architecture options.
> >> >> * Hopefully makes the configuration and ifdef?s easier to
follow and
> >> >> understand.
> >> >>
> >> >> This does not yet switch ?enable-intrinsics to be enabled
by default on
> >> >> supported architectures, but I think it?d be ready to do
so.
> >> >>
> >> >> Comments are welcome!
> >> >>
> >> >>
> >> >>
> >> >> ?
> >> >> Jonathan Lennox
> >> >> jonathan at vidyo.com
> >> >>
> >> >>
> >> >> _______________________________________________
> >> >> opus mailing list
> >> >> opus at xiph.org
> >> >> http://lists.xiph.org/mailman/listinfo/opus
> >> >>
> >>
> >
> >

Apparently Analagous Threads

Search for more reasonably related threads

opus - Mar 2015 - Patch cleaning up Opus x86 intrinsics configury

[opus] Patch cleaning up Opus x86 intrinsics configury

[opus] Patch cleaning up Opus x86 intrinsics configury

[opus] Patch cleaning up Opus x86 intrinsics configury

Apparently Analagous Threads