Linfeng Zhang

2017-Jun-01 22:33 UTC

[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics

Hi,

Attached are 5 patches related to celt_inner_prod() and dual_inner_prod()
NEON intrinsics optimization.

In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch, the
optimization changes the order in which the floating-point products are
accumulated, which changes the results. I created
celt_inner_prod_neon_float_c_simulation() and
dual_inner_prod_neon_float_c_simulation() to simulate the order of
floating-point operations in the NEON optimization and compare against
their results. Sorry that I cannot bound the difference between the
original C function and the NEON function by any reasonably small number
or ratio. It's easy to create an input for which 0 and 1,000 are both
correct results just by changing the order of accumulation.
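
To illustrate the ordering point, here is a minimal C sketch. It is not the code from the patch: the function names, the 4-lane accumulation model, the final reduction order and the example input are all assumptions chosen for illustration, and the lane-wise function requires n to be a multiple of 4 for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Reference order: a single accumulator, elements added left to right. */
static float inner_prod_sequential(const float *x, const float *y, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Hypothetical model of a 4-lane SIMD loop: lane k accumulates elements
 * k, k+4, k+8, ... and the four partial sums are combined at the end.
 * Assumption for brevity: n is a multiple of 4. */
static float inner_prod_4lane_model(const float *x, const float *y, size_t n)
{
    float lane[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4)
        for (size_t k = 0; k < 4; k++)
            lane[k] += x[i + k] * y[i + k];
    /* Horizontal reduction: add the high half to the low half, then add
     * the two remaining partial sums (one possible reduction order). */
    return (lane[0] + lane[2]) + (lane[1] + lane[3]);
}

/* Illustrative input: the small term is absorbed when it is added to the
 * large term first, and survives when the large terms cancel first. */
static const float example_x[4] = { 1e30f, 1000.0f, -1e30f, 0.0f };
static const float example_y[4] = { 1.0f, 1.0f, 1.0f, 1.0f };
```

With IEEE 754 single precision and no fast-math reassociation, calling both functions on example_x and example_y with n = 4 should return 0 for the sequential order and 1000 for the lane-wise order, which is why no fixed tolerance between the two orders can be guaranteed for arbitrary input.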

The total speed gain is about 1.0% for the fixed-point encoder and 1.8% for
the floating-point encoder, at complexity 8, tested on my Chromebook.

Thanks,
Linfeng
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0005-Clean-celt_pitch_xcorr_float_neon.patch
Type: text/x-patch
Size: 3960 bytes
Desc: not available
URL:
<http://lists.xiph.org/pipermail/opus/attachments/20170601/92c39072/attachment-0005.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch
Type: text/x-patch
Size: 8832 bytes
Desc: not available
URL:
<http://lists.xiph.org/pipermail/opus/attachments/20170601/92c39072/attachment-0006.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch
Type: text/x-patch
Size: 9812 bytes
Desc: not available
URL:
<http://lists.xiph.org/pipermail/opus/attachments/20170601/92c39072/attachment-0007.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Replace-call-of-celt_inner_prod_c-step-2.patch
Type: text/x-patch
Size: 7652 bytes
Desc: not available
URL:
<http://lists.xiph.org/pipermail/opus/attachments/20170601/92c39072/attachment-0008.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Replace-call-of-celt_inner_prod_c-step-1.patch
Type: text/x-patch
Size: 5706 bytes
Desc: not available
URL:
<http://lists.xiph.org/pipermail/opus/attachments/20170601/92c39072/attachment-0009.bin>

Jean-Marc Valin

2017-Jun-02 18:26 UTC

Hi Linfeng,

I'll look into your patches. Can you let me know what's the expected
effect on performance (if any) for each of your patches? Also, are these
all the patches you intend to merge for 1.2 or are there more upcoming ones?

Cheers,

	Jean-Marc


Linfeng Zhang

2017-Jun-05 19:28 UTC

Hi Jean-Marc,

I've attached a new version, inner_prod_5patches_v2.zip, synced to the
current master.

For fixed-point ARM, only
0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch changes the
performance.
For floating-point ARM, only
0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch changes the
performance.
Patches 1 and 2 are code clean-up and can only affect x86 performance.
Patch 5 has a negligible effect on floating-point ARM performance.

Thanks,
Linfeng

-------------- next part --------------
A non-text attachment was scrubbed...
Name: inner_prod_5patches_v2.zip
Type: application/zip
Size: 10997 bytes
Desc: not available
URL:
<http://lists.xiph.org/pipermail/opus/attachments/20170605/c8d5d402/attachment-0001.zip>

Jonathan Lennox

2017-Jun-06 20:09 UTC

Two comments on the various infrastructure for RTCD etc.

1. The 0002- patch changes the ABI of the celt_pitch_xcorr functions, but
doesn’t change the assembly in celt/arm/celt_pitch_xcorr_arm.s correspondingly. 
I suspect the ‘arch’ parameter can just be ignored by the assembly functions,
but at least the comments in that file should be updated to indicate the
register that’s used to pass it in, and that it’s ignored.

2. In the 0003- patch, you shouldn’t use the MAY_HAVE_NEON macro in your new
arm_celt_map tables, for the same reason we didn’t want it in the arm_silk_map
tables.
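
As a rough illustration of the kind of run-time dispatch discussed in these two points, here is a generic C sketch. Every name in it (sum_fn, sum_c, sum_opt, SUM_IMPL, ARCH_MASK, sum_dispatch) is hypothetical and none of it is taken from the Opus sources or from the patches. It only shows a function-pointer table indexed by an arch value, with implementations that accept the arch argument for a uniform signature and ignore it.

```c
#include <stddef.h>

/* Hypothetical signature: the dispatcher forwards `arch` to every
 * implementation so that all table entries share one ABI. */
typedef int (*sum_fn)(const int *x, size_t n, int arch);

/* Portable implementation; it has no use for `arch`. */
static int sum_c(const int *x, size_t n, int arch)
{
    int acc = 0;
    (void)arch; /* accepted for a uniform signature, deliberately ignored */
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Stand-in for an optimized variant (here just unrolled by two). */
static int sum_opt(const int *x, size_t n, int arch)
{
    int acc0 = 0, acc1 = 0;
    size_t i = 0;
    (void)arch; /* also ignored */
    for (; i + 1 < n; i += 2) {
        acc0 += x[i];
        acc1 += x[i + 1];
    }
    if (i < n)
        acc0 += x[i]; /* odd tail element */
    return acc0 + acc1;
}

/* Hypothetical table with one entry per detected CPU level; only the
 * highest level maps to the optimized variant in this sketch. */
#define ARCH_MASK 3
static const sum_fn SUM_IMPL[ARCH_MASK + 1] = {
    sum_c, sum_c, sum_c, sum_opt
};

/* Select the implementation at run time from the detected `arch`. */
static int sum_dispatch(const int *x, size_t n, int arch)
{
    return SUM_IMPL[arch & ARCH_MASK](x, n, arch);
}
```

In this sketch every table entry computes the same result, so a caller only needs to pass the arch value it detected at start-up. A hand-written assembly implementation placed in such a table would receive the extra argument in whatever register the platform calling convention assigns to it, which is why documenting that register matters even if the value is never read.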


Out of curiosity, what’s the CPU in the Chromebook you’re using to test?

Jean-Marc Valin

2017-Jun-06 20:15 UTC

Hi Linfeng,

On 06/06/17 04:09 PM, Jonathan Lennox wrote:
> Two comments on the various infrastructure for RTCD etc.
> 
> 1. The 0002- patch changes the ABI of the celt_pitch_xcorr functions,
> but doesn’t change the assembly in celt/arm/celt_pitch_xcorr_arm.s
> correspondingly.  I suspect the ‘arch’ parameter can just be ignored
> by the assembly functions, but at least the comments in that file
> should be updated to indicate the register that’s used to pass it in,
> and that it’s ignored.
> 
> 2. In the 0003- patch, you shouldn’t use the MAY_HAVE_NEON macro in
> your new arm_celt_map tables, for the same reason we didn’t want it
> in the arm_silk_map tables.

I have no further issues with your patches, so once you address the two
issues Jonathan pointed out, I'll be able to merge them.

Cheers,

	Jean-Marc
