search for: zanelli

Displaying 14 results from an estimated 14 matches for "zanelli".

2013 Jun 07
2
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Hi JM, I have no doubt that Mr. Zanelli's NEON code is faster, since hand tuned assembly is bound to be faster than using intrinsics. However I notice that his code can also read past the y buffer. Cheers, --John On 6/6/2013 9:22 PM, Jean-Marc Valin wrote: > Hi John, > > Thanks for the two fixes. They're in git now....
2013 Jun 11
0
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Although I've never used ARM's compiler, I admit I'm very surprised that it's not compatible with the NEON intrinsics. Given that and M. Zanelli's speed tests, it seems clear that M. Zanelli's code is the way to go. I look forward to its inclusion in the opus GIT. --John On 6/10/2013 1:00 PM, opus-request at xiph.org wrote: > Date: Mon, 10 Jun 2013 10:36:34 +0100 > From: Cliff Parris<cliff at espico.com> > Subject:...
2013 Jun 07
1
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Unfortunately I don't have a setup that lets me easily profile ARM code, so I really can't tell which method is faster (though I suspect Mr. Zanelli's code is). Let me offer up another intrinsic version of the NEON xcorr_kernel that is almost identical to the SSE version, and more in line with Mr. Zanelli's code: static inline void xcorr_kernel_neon(const opus_val16 *x, const opus_val16 *y, opus_val32 sum[4], int len) { int j;...
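The thread turns on what xcorr_kernel computes and how far it reads into y. Below is a minimal portable scalar sketch, assuming a fixed-point build where opus_val16 and opus_val32 are 16- and 32-bit signed integers. The typedefs and the name xcorr_kernel_ref are local stand-ins, not the Opus headers or either NEON version posted in the thread. It makes the read contract explicit: len samples of x and len + 3 samples of y.

```c
#include <stdint.h>

/* Assumption: fixed-point build. These typedefs are local stand-ins. */
typedef int16_t opus_val16;
typedef int32_t opus_val32;

/* Scalar reference: for k = 0..3, sum[k] += sum over j < len of x[j] * y[j + k].
   Reads x[0..len-1] and y[0..len+2], i.e. exactly len + 3 samples of y,
   so the caller must guarantee that many samples are valid. */
static void xcorr_kernel_ref(const opus_val16 *x, const opus_val16 *y,
                             opus_val32 sum[4], int len)
{
    for (int j = 0; j < len; j++) {
        opus_val32 xj = x[j];
        sum[0] += xj * y[j];
        sum[1] += xj * y[j + 1];
        sum[2] += xj * y[j + 2];
        sum[3] += xj * y[j + 3]; /* at j = len - 1 this reads y[len + 2] */
    }
}
```

A vectorized version that loads y in wider chunks than this can touch samples beyond y[len + 2], which is the kind of over-read raised in the thread.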
2013 Jun 10
0
opus Digest, Vol 53, Issue 2
...at jmvalin.ca> Cc: opus at xiph.org Message-ID: <51B263C8.5060203 at masque.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Unfortunately I don't have a setup that lets me easily profile ARM code, so I really can't tell which method is faster (though I suspect Mr. Zanelli's code is). Let me offer up another intrinsic version of the NEON xcorr_kernel that is almost identical to the SSE version, and more in line with Mr. Zanelli's code: static inline void xcorr_kernel_neon(const opus_val16 *x, const opus_val16 *y, opus_val32 sum[4], int len) { int j;...
2013 Jun 07
2
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Hi JM, At line 221 in celt_lpc.c (the celt_iir function) I think you really want the RESTORE_STACK statement to be before the #endif instead of after it. Also, I couldn't help notice that your SSE code for xcorr_kernel reads more than "len" elements of "_x". I don't know if that's really a problem when running the codec, but a tool like valgrind will have a
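One common way to keep a vectorized loop from reading more than len elements of its input is to run the wide body only over whole blocks and finish with a scalar tail. The sketch below shows that pattern in plain C on a dot product. mac4() is a hypothetical stand-in for one SIMD step; this is not the SSE code discussed in the thread.

```c
#include <stdint.h>

/* Hypothetical helper: four multiply-accumulates at once, standing in
   for one SIMD step (e.g. one SSE or NEON instruction group). */
static int32_t mac4(const int16_t *a, const int16_t *b)
{
    return (int32_t)a[0] * b[0] + (int32_t)a[1] * b[1]
         + (int32_t)a[2] * b[2] + (int32_t)a[3] * b[3];
}

/* Dot product that never reads a[len] or b[len] or beyond: whole blocks
   of 4 go through mac4(), and the 0 to 3 leftover samples are done singly. */
static int32_t dot_no_overread(const int16_t *a, const int16_t *b, int len)
{
    int32_t acc = 0;
    int j = 0;
    for (; j + 3 < len; j += 4)
        acc += mac4(a + j, b + j);
    for (; j < len; j++)
        acc += (int32_t)a[j] * b[j];
    return acc;
}
```

The scalar tail costs a few extra iterations, but it keeps memory-checking tools like valgrind quiet without requiring callers to pad their buffers.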
2013 May 17
1
Opus ARM optimizations
Hello, I've been working on optimizations for the ARMv5E architecture and Cortex-A8, for both the decoder and the encoder, and my company agrees to contribute them upstream. Could you tell me how to do it? Best regards, -- Aurélien Zanelli Parrot SA 174, quai de Jemmapes 75010 Paris France
2013 Jun 07
0
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
On 06/07/2013 02:33 PM, John Ridges wrote: > I have no doubt that Mr. Zanelli's NEON code is faster, since hand tuned > assembly is bound to be faster than using intrinsics. I was mostly curious about comparing vectorization approaches (assuming the two are different) than exact code. > However I notice > that his code can also read past the y buffer. Yeah we...
2013 May 23
2
ASM runtime detection and optimizations
...of optimized function. I followed the design that had been discussed on IRC. Also, I noticed a small drawback: we must propagate the arch index through functions which don't take the codec state as an argument. However, if it looks good, I will continue to implement it. Best regards, -- Aurélien Zanelli Parrot SA 174, quai de Jemmapes 75010 Paris France -------------- next part -------------- diff --git a/Makefile.am b/Makefile.am index f04e3bc..06d4283 100644 --- a/Makefile.am +++ b/Makefile.am @@ -5,7 +5,7 @@ lib_LTLIBRARIES = libopus.la DIST_SUBDIRS = doc -INCLUDES = -I$(top_srcdir)/includ...
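The runtime-detection message describes choosing an optimized function by an arch index, and the cost of passing that index through helpers that have no codec state. The following is a minimal sketch of that general idea using a function-pointer table. Every name in it (ARCH_C, ARCH_NEON, SUM_IMPL, detect_arch, block_sum) is hypothetical, and this is not the design in the posted patch, whose diff is truncated above.

```c
/* Hypothetical architecture indices. */
enum { ARCH_C = 0, ARCH_NEON = 1, ARCH_COUNT = 2 };

typedef int (*sum_fn)(const int *v, int n);

/* Plain C reference implementation. */
static int sum_c(const int *v, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Stand-in for an optimized variant; it reuses the C path so the
   sketch stays portable. */
static int sum_neon_stub(const int *v, int n)
{
    return sum_c(v, n);
}

/* One entry per arch index. */
static const sum_fn SUM_IMPL[ARCH_COUNT] = { sum_c, sum_neon_stub };

/* Hypothetical detection: a real one would query the CPU once at init and
   store the result in the codec state. This one always picks plain C. */
static int detect_arch(void)
{
    return ARCH_C;
}

/* A helper with no codec-state argument must receive `arch` explicitly,
   which is the propagation cost the message describes.
   The caller guarantees 0 <= arch < ARCH_COUNT. */
static int block_sum(const int *v, int n, int arch)
{
    return SUM_IMPL[arch](v, n);
}
```

The table lookup keeps the per-call overhead to one indirect call. The price is the extra `arch` parameter on every helper that sits below the codec-state layer.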
2013 May 17
1
[Patch]01-Add ARM5E macros
Hello, This is a first patch, which adds macros for ARMv5E. I also copied the copyright headers from other files and added the company name; tell me if that's wrong. If you have any questions or comments about it, feel free to contact me. Best regards, -- Aurélien Zanelli Parrot SA 174, quai de Jemmapes 75010 Paris France -------------- next part -------------- diff --git a/celt/fixed_arm5e.h b/celt/fixed_arm5e.h new file mode 100644 index 0000000..9eb970a --- /dev/null +++ b/celt/fixed_arm5e.h @@ -0,0 +1,99 @@ +/* Copyright (C) 2007-2009 Xiph.Org Foundation + Cop...
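ARMv5E fixed-point macros typically wrap DSP instructions such as SMULWB, which multiplies a 32-bit value by a signed 16-bit value and keeps the top 32 bits of the 48-bit product. A portable C reference is useful for checking such a macro against plain C. The function name is hypothetical and not taken from the patch, whose contents are truncated above.

```c
#include <stdint.h>

/* Portable reference for a 32x16 multiply keeping the top 32 bits of the
   48-bit product, i.e. (a * b) >> 16, which is what SMULWB computes when b
   is the bottom halfword of its operand. Note: right-shifting a negative
   int64_t is implementation-defined in C, though arithmetic on common
   compilers. */
static int32_t mul32x16_q16_ref(int32_t a, int16_t b)
{
    return (int32_t)(((int64_t)a * b) >> 16);
}
```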
2013 May 27
0
[Patch] Check if opus_compare is executable in run_vectors.sh
If opus_compare doesn't exist or isn't executable, the tests simply fail, which could be misleading. So test for its existence and mode to avoid this ambiguity. -- Aurélien Zanelli Parrot SA 174, quai de Jemmapes 75010 Paris France -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Check-if-opus_compare-is-executable-in-run_vectors.s.patch Type: text/x-patch Size: 0 bytes Desc: not available Url : http://lists.xiph.org/pipermail/opus/atta...
2013 Jun 07
0
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Hi John, Thanks for the two fixes. They're in git now. Your SSE version seems to also be slightly faster than mine -- probably due to the partial sums. As for the NEON code, it would be good to compare the performance with the code Aurélien Zanelli posted at http://darkosphere.fr/public/0002-Add-optimized-NEON-version-of-celt_fir-celt_iir-and-.patch Cheers, Jean-Marc On 06/06/2013 08:07 PM, John Ridges wrote: > Hi JM, > > At line 221 in celt_lpc.c (the celt_iir function) I think you really > want the RESTORE_STACK statement...
2013 May 21
2
[PATCH] 02-Add CELT filter optimizations
...a loop by 2. It increases performance when using optimized macros (e.g. ARMv5E). A possible side effect of the loop unrolling is that I don't check for odd lengths here. - Add NEON version of FIR filter and autocorr - Add a section in autoconf in order to check NEON support Best regards, -- Aurélien Zanelli Parrot SA 174, quai de Jemmapes 75010 Paris France -------------- next part -------------- diff --git a/celt/celt_lpc.c b/celt/celt_lpc.c index d2addbf..14a7839 100644 --- a/celt/celt_lpc.c +++ b/celt/celt_lpc.c @@ -33,6 +33,10 @@ #include "stack_alloc.h" #include "mathops.h"...
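The patch description mentions replacing `sum += a*b` with MAC16_16 and unrolling by 2, with the caveat that odd lengths are not checked. Below is a minimal sketch of that loop shape with the odd-length tail handled explicitly. MAC16_16 is redefined locally to match the generic fixed-point meaning (16-bit inputs, 32-bit accumulator), and fir_tap_sum is a hypothetical name, not the patched celt_fir.

```c
#include <stdint.h>

/* Local stand-in for the generic fixed-point MAC16_16(c, a, b): c + a*b with
   16-bit inputs and a 32-bit accumulator. On ARMv5E such a macro can map to a
   single SMLABB instruction; here it is plain C. */
#define MAC16_16(c, a, b) ((c) + (int32_t)(int16_t)(a) * (int32_t)(int16_t)(b))

/* FIR-style inner product unrolled by 2. The trailing `if` handles an odd
   ord, the case the patch description says it does not check for. */
static int32_t fir_tap_sum(const int16_t *x, const int16_t *num, int ord)
{
    int32_t sum = 0;
    int j = 0;
    for (; j + 1 < ord; j += 2) {
        sum = MAC16_16(sum, num[j], x[j]);
        sum = MAC16_16(sum, num[j + 1], x[j + 1]);
    }
    if (j < ord) /* odd ord: one tap left over */
        sum = MAC16_16(sum, num[j], x[j]);
    return sum;
}
```

Without the trailing `if`, an odd `ord` would silently drop the last tap, so an unrolled loop either needs this tail or a documented even-length precondition.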
2013 May 21
0
[PATCH] 02-
- Use MAC16_16 macros instead of (sum += a*b) and unroll a loop by 2. It increases performance when using optimized macros (e.g. ARMv5E). A possible side effect of the loop unrolling is that I don't check for odd lengths here. - Add NEON version of FIR filter and autocorr -- Aurélien Zanelli Parrot SA 174, quai de Jemmapes 75010 Paris France -------------- next part -------------- diff --git a/celt/celt_lpc.c b/celt/celt_lpc.c index d2addbf..14a7839 100644 --- a/celt/celt_lpc.c +++ b/celt/celt_lpc.c @@ -33,6 +33,10 @@ #include "stack_alloc.h" #include "mathops.h"...
2013 May 27
0
[PATCH] Check if opus_compare is executable in run_vectors.sh
If opus_compare doesn't exist or isn't executable, the tests simply fail, which could be misleading. So test for its existence and mode to avoid this ambiguity. --- tests/run_vectors.sh | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/tests/run_vectors.sh b/tests/run_vectors.sh index 1cc445d..116a743 100755 --- a/tests/run_vectors.sh +++ b/tests/run_vectors.sh @@ -57,6 +57,11 @@
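The patch adds an existence-and-mode check to a shell script. Expressed in C, the same check can use POSIX access() with X_OK, which fails both when the file is missing and when it is not executable by the caller. The helper name is hypothetical, and this is an analogue of a shell `[ -x "$path" ]` test, not the patch's actual script change (which is truncated above).

```c
#include <unistd.h>

/* Returns 1 if `path` exists and the calling user may execute it, 0 otherwise.
   access() returns -1 both for a missing file and for a permission failure,
   so a single call covers both conditions the patch is concerned with. */
static int is_executable(const char *path)
{
    return access(path, X_OK) == 0;
}
```

A test driver could call this before running any vectors and exit with a distinct message, so a missing tool is not mistaken for a decoding mismatch.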