Displaying 20 results from an estimated 900 matches similar to: "[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON"
2017 Feb 06
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc,
Thanks a lot for reviewing this huge assembly function!
silk_warped_autocorrelation_FIX_c()'s kernel part is
for( n = 0; n < length; n++ ) {
    tmp1_QS = silk_LSHIFT32( (opus_int32)input[ n ], QS );
    /* Loop over allpass sections */
    for( i = 0; i < order; i++ ) {
        /* Output of allpass section */
        tmp2_QS = silk_SMLAWB(
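Since the excerpt above is cut off, here is a minimal floating-point sketch of the same loop structure (one pass over the allpass sections per input sample). It is an illustration only, not the fixed-point Opus code: the name warped_autocorr_sketch, the SKETCH_MAX_ORDER limit, and the use of double instead of the Q-format macros are all assumptions made for this example.

```c
#include <stddef.h>

#define SKETCH_MAX_ORDER 24 /* assumed upper bound on order, for this sketch only */

/* Floating-point sketch of a warped autocorrelation.
 * corr must have room for order + 1 values; order must be <= SKETCH_MAX_ORDER. */
static void warped_autocorr_sketch(double *corr, const short *input,
                                   double warping, int length, int order)
{
    double state[SKETCH_MAX_ORDER + 1] = { 0.0 };
    int n, i;

    for (i = 0; i <= order; i++) {
        corr[i] = 0.0;
    }

    for (n = 0; n < length; n++) {
        double tmp1 = (double)input[n];
        /* Loop over allpass sections */
        for (i = 0; i < order; i++) {
            /* Output of allpass section, using the state left by the previous sample */
            double tmp2 = state[i] + warping * (state[i + 1] - tmp1);
            state[i] = tmp1;
            /* state[0] holds the current input sample at this point */
            corr[i] += tmp1 * state[0];
            tmp1 = tmp2;
        }
        state[order] = tmp1;
        corr[order] += tmp1 * state[0];
    }
}
```

With warping set to 0 each allpass section reduces to a one-sample delay, so the sketch computes an ordinary autocorrelation, which is a convenient sanity check.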
2017 Feb 07
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
This is a great idea. But the order (psEncC->shapingLPCOrder) can be
configured to 12, 14, 16, 20 and 24 according to complexity parameter.
It's hard to get a universal function to handle all these orders
efficiently. Any suggestions?
Thanks,
Linfeng
On Mon, Feb 6, 2017 at 12:40 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Hi Linfeng,
>
> On 06/02/17 02:51 PM,
2017 Feb 07
3
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc,
Thanks for your suggestions. Will get back to you once we have some updates.
Linfeng
On Mon, Feb 6, 2017 at 5:47 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Hi Linfeng,
>
> On 06/02/17 07:18 PM, Linfeng Zhang wrote:
> > This is a great idea. But the order (psEncC->shapingLPCOrder) can be
> > configured to 12, 14, 16, 20 and 24 according to
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
I attached a new patch with a small cleanup (the disassembly is identical to
that of the last patch). We have done the same internal testing as usual.
Also attached are 2 failed temporary versions that try to reduce code size
(for code review reference only).
The new patch of silk_warped_autocorrelation_FIX_neon() has a code size of
3,228 bytes (with gcc).
smaller_slower.c has a code size of 2,304
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Thanks, Jean-Marc!
The speedup percentages are all relative to the entire encoder.
Compared to master, this optimization patch speeds up the fixed-point SILK
encoder on NEON as follows:
Complexity 5: 6.1%
Complexity 6: 5.8%
Complexity 8: 5.5%
Complexity 10: 4.0%
when tested on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max
MHz: 2116.5
Thanks,
Linfeng
On Wed, Apr 5, 2017 at 11:02 AM,
2017 Feb 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Linfeng,
On 06/02/17 02:51 PM, Linfeng Zhang wrote:
> However, the critical thing is that all the states in each stage when
> processing input[i] are reused by the next input[i+1]. That is
> input[i+1] must wait input[i] for 1 stage, and input[i+2] must wait
> input[i+1] for 1 stage, etc.
That is indeed the tricky part... and the one I think you could do
slightly differently. If
2017 Feb 07
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Linfeng,
On 06/02/17 07:18 PM, Linfeng Zhang wrote:
> This is a great idea. But the order (psEncC->shapingLPCOrder) can be
> configured to 12, 14, 16, 20 and 24 according to complexity parameter.
>
> It's hard to get a universal function to handle all these orders
> efficiently. Any suggestions?
I can think of two ways of handling larger orders. The obvious one is
2017 Apr 03
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc,
Attached is the silk_warped_autocorrelation_FIX_neon() which implements
your idea.
Speed improvement vs the previous optimization:
Complexity 0-4: Doesn't call this function.
Complexity 5: 2.1% (order = 16)
Complexity 6: 1.0% (order = 20)
Complexity 8: 0.1% (order = 24)
Complexity 10: 0.1% (order = 24)
Code size of silk_warped_autocorrelation_FIX_neon() changes from 2,644
2017 Apr 05
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Linfeng,
Thanks for the updated patch. I'll have a look and get back to you. When
you report speedup percentages, is that relative to the entire encoder
or relative to just that function in C? Also, what's the speedup
compared to master?
Cheers,
Jean-Marc
On 05/04/17 12:14 PM, Linfeng Zhang wrote:
> I attached a new patch with small cleanup (disassembly is identical as
> the
2017 Apr 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Linfeng,
I had a closer look at your patch and the code looks good -- and
slightly simpler than I had anticipated, so that's good.
I did some profiling on a Cortex A57 and I've been seeing slightly less
improvement than you're reporting, more like 3.5% at complexity 8. It
appears that the warped autocorrelation function itself is only faster
by a factor of about 1.35. That's a
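To relate a per-function speedup to a whole-encoder speedup, as the message above does, Amdahl's law can be sketched as follows. The function names are hypothetical, and the sketch assumes "3.5%" means the whole encoder runs 1.035 times faster, which the message does not state explicitly.

```c
/* Overall speedup when a fraction of the run time is accelerated
 * by function_speedup and the rest is unchanged (Amdahl's law). */
static double overall_speedup(double fraction, double function_speedup)
{
    return 1.0 / ((1.0 - fraction) + fraction / function_speedup);
}

/* Inverse relation: the fraction of run time the function must have
 * occupied to explain a given overall speedup. */
static double implied_fraction(double overall, double function_speedup)
{
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / function_speedup);
}
```

Under that assumption, implied_fraction(1.035, 1.35) comes out at roughly 0.13, meaning the function would account for about 13% of encoder time before the optimization.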
2016 Jul 01
1
silk_warped_autocorrelation_FIX() NEON optimization
Hi all,
I'm sending the patch "Optimize silk_warped_autocorrelation_FIX() for ARM NEON" in a separate email.
It is based on Tim’s aarch64v8 branch https://git.xiph.org/?p=users/tterribe/opus.git;a=shortlog;h=refs/heads/aarch64v8
Thanks for your comments.
Linfeng
2016 Jul 14
6
Several patches of ARM NEON optimization
I rebased my previous 3 patches to the current master with minor changes.
Patches 1 to 3 replace all my previous submitted patches.
Patches 4 and 5 are new.
Thanks,
Linfeng Zhang
2008 Feb 25
6
[PATCH 0/4] ia64/xen: paravirtualization of hand written assembly code
Hi. The patch I sent before was too large, so it was dropped from
the mailing list. I'm sending it again at a smaller size.
This patch set is the Xen paravirtualization of hand-written assembly
code. I expect that much cleanup is necessary before merge.
We really need feedback before starting the actual cleanup, as Eddie
already said.
Eddie discussed how to clean up and suggested
2008 Mar 31
1
[03/15][PATCH] kvm/ia64: Add header files for kvm/ia64. V8
Hi Xiantao,
Some more nit-picking, though some of this is a bit more important
to fix up.
Cheers,
Jes
> +typedef struct thash_data {
Urgh! argh! Please avoid typedefs unless you really need them, see
Chapter 5 of Documentation/CodingStyle for details.
> diff --git a/include/asm-ia64/kvm_host.h b/include/asm-ia64/kvm_host.h
> new file mode 100644
> index 0000000..522bde0
> ---
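The typedef remark above can be illustrated with a short sketch: declare the structure with a plain struct tag and let callers write "struct thash_data" in full. The fields and the helper thash_data_init are hypothetical and only for illustration; they are not the layout from the patch under review.

```c
#include <stdint.h>

/* Hypothetical fields, for illustration only; not the layout from the patch. */
struct thash_data {
    uint64_t page_flags;
    uint64_t vadr;
};

/* Callers spell out "struct thash_data", so the type is visibly a struct
 * at every use site instead of being hidden behind a typedef name. */
static void thash_data_init(struct thash_data *entry, uint64_t vadr)
{
    entry->page_flags = 0;
    entry->vadr = vadr;
}
```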
2010 Aug 20
3
change object name within for loop
Hi,
I am writing a for loop that creates one object, say 'outn' on every
round of the loop. I would like the name of each object to include the
index of the loop as in, for example:
out1, out2, out3, ...
I would like the objects to be named automatically
as the loop proceeds.
Similarly, I would like to be able to call different objects (in1,
in2, in3,
2008 Feb 26
8
[PATCH 0/8] RFC: ia64/xen TAKE 2: paravirtualization of hand written assembly code
Hi. I rewrote the patch according to the comments. I adopted in-place
code generation because it looks like the quickest way.
The point Eddie wanted to discuss is how to generate code and its ABI,
i.e. in-place generation vs. direct jump vs. indirect function call.
An indirect function call doesn't make sense because ivt.S is compiled
multiple times. And it is up to pv instances to choose in-place
2007 Apr 17
3
acpitool
I downloaded and installed acpitool.
When I run it I get:
Battery Status <not available>
AC Adapter <not available>
Thermal Info <not available>
I was trying to find out how hot the CPU is.
Any ideas how to get this information?
Thanks,
jerry