thr3ads.net - similar to: "[PATCH 0/5] ARM NEON optimization for samplerate converter"

[PATCH 3/5] resample: Add NEON optimized inner_product_single for fixed point

2011 Sep 01

From: Jyri Sarha <jsarha at ti.com> The semantics of inner_product_single have also been changed to include the final right shift and saturation, so that it can be implemented in the optimal way for the target platform. This change affects fixed-point calculations only. I also added a new fixed-point macro, SATURATE32PSHR(x, shift, a). It does pretty much the same thing as SATURATE32(PSHR32(x,
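The description implies a fixed-point inner product that ends with a rounding right shift followed by saturation. The C sketch below illustrates that idea under stated assumptions: Q15 operands, a 32-bit accumulator, a shift of 15, and hypothetical helper names (`pshr32`, `saturate32_pshr`, `inner_product_single_ref`) that are not the speexdsp macros themselves. Clamping before shifting means the rounding add can never overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: rounding arithmetic right shift (round half up), in the
   spirit of a PSHR32-style macro. Assumes >> on a negative int32_t is an
   arithmetic shift, as it is with GCC and Clang. */
static int32_t pshr32(int32_t x, int shift)
{
    assert(shift >= 1 && shift < 31);
    return (x + ((int32_t)1 << (shift - 1))) >> shift;
}

/* Hypothetical: clamp to [-a, a] and shift in one step. Out-of-range inputs
   are caught before the rounding add, so that add cannot overflow. */
static int32_t saturate32_pshr(int32_t x, int shift, int32_t a)
{
    int32_t limit = a << shift; /* assumes a >= 0 and a << shift fits */
    if (x >= limit)
        return a;
    if (x <= -limit)
        return -a;
    return pshr32(x, shift);
}

/* Hypothetical scalar reference: Q15 x Q15 products accumulated in 32 bits,
   then shifted back to Q15 and saturated to [-32767, 32767]. A long filter
   with loud input can overflow this accumulator; real code must budget
   headroom for that. */
static int16_t inner_product_single_ref(const int16_t *a, const int16_t *b,
                                        unsigned len)
{
    int32_t sum = 0;
    for (unsigned i = 0; i < len; i++)
        sum += (int32_t)a[i] * b[i];
    return (int16_t)saturate32_pshr(sum, 15, 32767);
}
```

For example, four products of 0.5 × 0.5 (16384 in Q15) sum to 1.0, which lies outside the Q15 range, so the reference returns 32767 instead of wrapping.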

[PATCH 5/5] resample: Add NEON optimized inner_product_single for floating point

2011 Sep 01

From: Jyri Sarha <jsarha at ti.com> Also adds inline asm implementations of the WORD2INT(x) macro for fixed and floating point.
---
 libspeex/resample_neon.h | 101 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 101 insertions(+), 0 deletions(-)
diff --git a/libspeex/resample_neon.h b/libspeex/resample_neon.h
index ba93e41..e7e981e 100644
--- a/libspeex/resample_neon.h
+++
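The patch adds inline-assembly versions of WORD2INT(x). As a point of reference only, here is a portable C sketch of what such a conversion typically does in a floating-point build: round to nearest (halves toward +infinity) and clamp to the int16_t range. The name `word2int_float` and the exact thresholds are assumptions, not code from the patch; the floor is computed by hand so the sketch needs no libm.

```c
#include <stdint.h>

/* Hypothetical portable stand-in for a floating-point WORD2INT-style macro:
   values below -32767.5 or above 32766.5 clamp to the int16_t limits,
   everything else is rounded as floor(x + 0.5). NaN input is not handled. */
static int16_t word2int_float(float x)
{
    if (x < -32767.5f)
        return INT16_MIN;
    if (x > 32766.5f)
        return INT16_MAX;

    float y = x + 0.5f;
    int32_t t = (int32_t)y;   /* float-to-int conversion truncates toward zero */
    if ((float)t > y)         /* negative non-integer: step down to the floor */
        t--;
    return (int16_t)t;
}
```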

[PATCH] Fix miscompile of SSE resampler

2009 Oct 26

From: Thorvald Natvig <slicer at users.sourceforge.net> Some optimizing compilers miscompile the current SSE optimizations when full optimizations are enabled. By using an output value pointer instead of a return value, we can bypass this misbehaviour.
---
 libspeex/resample.c | 8 ++++----
 libspeex/resample_sse.h | 24 ++++++++----------------
 2 files changed, 12 insertions(+), 20
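To make the workaround concrete, this sketch contrasts the two calling conventions in plain C. The real patch changes SSE code in resample_sse.h; the function names here are hypothetical. Writing through `out` gives the compiler a store to caller-owned memory instead of a value in a return register, which is the shape the patch switches to. Whether that sidesteps a particular optimizer bug depends on the compiler.

```c
#include <stddef.h>

/* Hypothetical "before": the dot product is returned by value. */
static float inner_product_ret(const float *a, const float *b, size_t len)
{
    float sum = 0.0f;
    for (size_t i = 0; i < len; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hypothetical "after": the same dot product is stored through 'out';
   the caller owns the storage. */
static void inner_product_out(const float *a, const float *b, size_t len,
                              float *out)
{
    float sum = 0.0f;
    for (size_t i = 0; i < len; i++)
        sum += a[i] * b[i];
    *out = sum;
}
```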

[PATCH 4/5] configure.ac: Add ARM NEON support

2011 Sep 01

From: Jyri Sarha <jsarha at ti.com> Use --enable-neon to force NEON optimization on. The auto-detection should also work if your CFLAGS enable NEON.
---
 configure.ac | 32 +++++++++++++++++++++++++++++++-
 1 files changed, 31 insertions(+), 1 deletions(-)
diff --git a/configure.ac b/configure.ac
index 255c0b4..08d3d5f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -89,6 +89,23 @@
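One plausible reason CFLAGS matter for auto-detection is that compilers only predefine NEON feature macros when NEON is enabled, for example with `-mfpu=neon` on 32-bit ARM with GCC. This sketch shows such a compile-time check on `__ARM_NEON` (the ACLE name) and `__ARM_NEON__` (the older GCC spelling). It is an illustration, not the test the patch adds to configure.ac; the function name is hypothetical.

```c
/* Hypothetical helper: reports whether the current compiler flags enable
   NEON, based only on predefined feature macros. */
static int neon_enabled_at_compile_time(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}
```

Forcing the optimization regardless of detection is what `./configure --enable-neon` is for, per the patch description.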

Compilation failure in resample_neon.h on aarch64

2015 Jul 06

Hi all, I'm updating OpenEmbedded-core's speexdsp from 1.2rc1 (when it still was a part of the speex source tree) to 1.2rc3. I found out that building the new version for aarch64 fails in resample_neon.h (the target machine is OE-core's default qemuarm64 target). This is the error message:
.../speexdsp-1.2rc3/libspeexdsp/resample_neon.h:148:5: error: impossible constraint in
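The error points at inline assembly in resample_neon.h, which presumably dates from the 2011 ARMv7 NEON patches. A general pattern for keeping 32-bit ARM assembly out of an aarch64 build (not necessarily the fix the speexdsp developers adopted) is to guard it with predefined macros and fall back to portable C. In this sketch the helper name `sat_s32_to_s16` is hypothetical; `__arm__` is not predefined by GCC or Clang for aarch64, and `__ARM_FEATURE_SAT` is the ACLE macro signalling that SSAT is available.

```c
#include <stdint.h>

/* Hypothetical helper: saturate a 32-bit value to the int16_t range.
   The inline-asm path is compiled only for 32-bit ARM cores that have SSAT;
   aarch64 and every other target use the portable C path instead of
   hitting assembly the A64 instruction set cannot express. */
static int16_t sat_s32_to_s16(int32_t a)
{
#if defined(__arm__) && defined(__ARM_FEATURE_SAT)
    int32_t ret;
    __asm__("ssat %0, #16, %1" : "=r"(ret) : "r"(a));
    return (int16_t)ret;
#else
    if (a > INT16_MAX)
        return INT16_MAX;
    if (a < INT16_MIN)
        return INT16_MIN;
    return (int16_t)a;
#endif
}
```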

Compilation failure in resample_neon.h on aarch64

2015 Jul 07

On Tue, 2015-07-07 at 18:40 +0930, Ron wrote:
> On Mon, Jul 06, 2015 at 05:35:51PM +0300, Tanu Kaskinen wrote:
> > Hi all,
> >
> > I'm updating OpenEmbedded-core's speexdsp from 1.2rc1 (when it still was
> > a part of the speex source tree) to 1.2rc3. I found out that building
> > the new version for aarch64 fails in resample_neon.h (the target machine

[PATCH] resample: Fix input indexing bug from interleaved functions

2012 May 02

From: Jyri Sarha <jsarha at ti.com> This bug happens quite often when resampling from a low to a high sample rate with a big enough factor. In addition, the resampling call has to be limited by the output buffer size, and some unused samples need to be left in the input buffer. Sometimes, when up-sampling with a big factor, the resampling function wants to peek one more sample from the input buffer to
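The failure mode described here (up-sampling by a large factor while the output buffer, not the input, limits the call) is easier to see with a model of how a rational resampler tracks its input position. In this hypothetical sketch each output advances the position by `int_advance + frac_advance/den_rate` input samples; the loop never produces an output whose position lies at or beyond `in_len`, and it reports how much input was consumed so the caller can keep the rest for the next call. The struct and field names are assumptions for illustration, not speexdsp's code, and the model ignores the filter-length lookahead a real resampler also needs.

```c
#include <stdint.h>

/* Hypothetical resampler position state. */
typedef struct {
    uint32_t int_advance;   /* in_rate / out_rate */
    uint32_t frac_advance;  /* in_rate % out_rate */
    uint32_t den_rate;      /* out_rate */
    uint32_t last_sample;   /* integer input position */
    uint32_t samp_frac_num; /* fractional position, numerator over den_rate */
} resampler_pos;

/* Counts how many outputs (at most out_len) can be produced from in_len
   input samples, stores the number of consumed inputs in *consumed, and
   rebases the position so it is relative to the first unconsumed input. */
static uint32_t advance_outputs(resampler_pos *p, uint32_t in_len,
                                uint32_t out_len, uint32_t *consumed)
{
    uint32_t out = 0;
    while (out < out_len && p->last_sample < in_len) {
        out++; /* an output at input position last_sample would be computed here */
        p->last_sample += p->int_advance;
        p->samp_frac_num += p->frac_advance;
        if (p->samp_frac_num >= p->den_rate) {
            p->samp_frac_num -= p->den_rate;
            p->last_sample++;
        }
    }
    /* If the output buffer filled first, some input stays unconsumed. */
    *consumed = p->last_sample < in_len ? p->last_sample : in_len;
    p->last_sample -= *consumed;
    return out;
}
```

With in_rate = 1 and out_rate = 3 (int_advance = 0, frac_advance = 1, den_rate = 3), two input samples and room for four outputs yield four outputs while consuming only one input sample; the second input sample stays in the buffer for the next call.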

Cannot compile speexdsp 1.2rc3 on ARM64

2015 Mar 28

Hi all, I built successfully with speex-1.2rc2, and with speexdsp 1.2rc3 my builds for i386, X86_64, armv7 and armv7s all passed. But when I build for ARM64 (for iPhone 6), it fails with:
/Applications/Xcode.app/Contents/Developer/usr/bin/make all-recursive
Making all in libspeexdsp
CC preprocess.lo
CC jitter.lo
CC mdf.lo
CC fftwrap.lo
CC

Resampler experimental speedups

2008 Apr 04

Hello :) The attached patch (which is not in any way finished) optimizes the resampler. (For those following the discussions on IRC: this version includes optimizations for both the direct and interpolate cases.) Measurements used GCC 4.3 on x86_64, with Valgrind counting instructions, resampling 10 frames of 320 floats at quality 3. Direct was measured with a 16=>48 resampling, and interpolate with a

Cannot compile speexdsp 1.2rc3 on ARM64

2016 Jul 30

I've filed a bug for aarch64 at https://github.com/xiph/speexdsp/issues/7 and provided the port in a fork with a pull request. Do we need someone to review and merge the pull request? It provides the source code, but my testing was under Android builds, so some configure changes would be needed to build it standalone.
On Tue, Apr 19, 2016 at 4:32 PM, Frank Barchard <fbarchard at

[LLVMdev] NEON intrinsics preventing redundant load optimization?

2014 Dec 07

Hi all, I’m not sure if this is the right list, so apologies if not. While profiling, I noticed that some of my hand-tuned matrix multiply code using NEON intrinsics was much slower when called through a C++ template wrapper than when calling the intrinsics function directly. It turned out clang/LLVM was unable to eliminate a temporary even though the case seemed quite straightforward. Unfortunately any loads

Fwd: Cannot compile speexdsp 1.2rc3 on ARM64

2015 Apr 13

Hi,
On Sat, Mar 28, 2015 at 2:34 PM, Evan JIANG <firstfan at gmail.com> wrote:
> Hi all,
>
> (Sorry that may be duplicated that I was not a mail-list member before,
> so last mail sent failed)
>
> I build successfully with speex-1.2rc2. And with speexdsp 1.2rc3, I
> build with i386, X86_64, armv7 and armv7s all passed.
> But when I build for ARM64 (for

[LLVMdev] RE : Vector argument passing abi for ARM ?

2012 Jul 05

Hi Duncan, I also thought it was a bug, especially since it worked with LLVM 3.0, but since it is not defined by ABI, I was not sure if I need to submit it as a BUG. I wanted to be sure that it is an actual BUG before submitting it and got the not-a-bug answer. Here is a small example to reproduce the problem I'm experiencing:
; ModuleID = 'bugparam.ll'
target datalayout =

[LLVMdev] RE : Vector argument passing abi for ARM ?

2012 Jul 05

Hi Sebastien,
> I also thought it was a bug, especially since it worked with LLVM 3.0, but since it is not defined by ABI, I was not sure if I need to submit it as a BUG.
yes it is a bug.
> I wanted to be sure that it is an actual BUG before submitting it and got the not-a-bug answer.
I didn't read Nadav's reply as saying there was no bug, in fact he explicitly said in his email

[LLVMdev] Simple NEON optimization

2010 Nov 12

Hi folks, me again. I want to implement a simple optimization for a NEON case I've seen recently, mostly as an exercise, but it also simplifies the generated code a bit. The case is simple:
    uint32x2_t x, res;
    res = vceq_u32(x, vcreate_u32(0));
This will generate the following code:
    ; zero d16
    vmov.i32 d16, #0x0
    ; load a
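Per lane, `vceq_u32(x, vcreate_u32(0))` produces all ones where the lane of `x` is zero and zero elsewhere. The scalar C model below (not NEON code; the function name is hypothetical) spells that out. Because one operand is a constant zero, ARMv7 NEON can presumably use the compare-against-immediate form `VCEQ.I32 Dd, Dm, #0` instead of first materializing zero with `VMOV`, which appears to be the simplification the post is after.

```c
#include <stdint.h>

/* Scalar model of a two-lane compare-equal-to-zero: each result lane is
   all ones (0xFFFFFFFF) if the input lane is zero, otherwise 0. */
static void cmp_eq_zero_u32x2_model(const uint32_t x[2], uint32_t res[2])
{
    for (int lane = 0; lane < 2; lane++)
        res[lane] = (x[lane] == 0) ? UINT32_MAX : 0u;
}
```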

[LLVMdev] Vector argument passing abi for ARM ?

2012 Jul 05

Hi Sebastien,
> Thanks for the quick answer, how do I know which type is legal/illegal with respect to calling convention ?
the code generators are supposed to produce working code no matter what the parameter type is. The fact that the ARM ABI doesn't specify how <2 x i8> is passed just means that the code generators can pass it using whatever technique it feels like (since it

[PATCH 1/2] drm/atomic: Change drm_atomic_helper_swap_state to return an error.

2017 Jun 28

We want to change swap_state to wait indefinitely, but to do this swap_state should wait interruptibly. This requires propagating the error to each driver. All drivers have changes to deal with the clean up. In order to allow easy reverting, the commit that changes behavior is separate so someone only has to revert that for testing. Nouveau has a small bugfix, if drm_atomic_helper_wait_for_fences
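The change follows a common C pattern: a helper that previously could not fail now returns an error code when an interruptible wait is cut short, and every caller has to undo its partial work before passing the error up. This userspace sketch uses hypothetical names and `-EINTR` as a stand-in; it is not DRM code (in-kernel interruptible waits typically report `-ERESTARTSYS`).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical commit state tracked by a driver. */
struct commit {
    bool prepared; /* e.g. resources pinned before the swap */
    bool swapped;
};

/* Hypothetical helper: now returns an error instead of void. */
static int swap_state(struct commit *c, bool interrupted)
{
    if (interrupted)
        return -EINTR; /* the interruptible wait was cut short; nothing swapped */
    c->swapped = true;
    return 0;
}

/* Hypothetical driver caller: cleans up its own partial work on error. */
static int driver_commit(struct commit *c, bool interrupted)
{
    int ret;

    c->prepared = true;
    ret = swap_state(c, interrupted);
    if (ret) {
        c->prepared = false; /* driver-specific cleanup */
        return ret;
    }
    return 0;
}
```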

[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?

2013 Jul 01

Hi,
** Problematic **
I am looking for advices to share some logic between DAG combine and target lowering.
Basically, I need to know if a bitcast that is about to be inserted during target specific isel lowering will be eliminated during DAG combine.
Let me know if there is another, better supported, approach for this kind of problems.
** Motivating Example **
The motivating example comes

[Intel-gfx] [PATCH 1/2] drm/atomic: Change drm_atomic_helper_swap_state to return an error.

2017 Jul 03

On 30-06-17 at 15:56, Daniel Vetter wrote:
> On Wed, Jun 28, 2017 at 03:28:11PM +0200, Maarten Lankhorst wrote:
>> We want to change swap_state to wait indefinitely, but to do this
>> swap_state should wait interruptibly. This requires propagating
>> the error to each driver. All drivers have changes to deal with the
>> clean up. In order to allow easy reverting, the

[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?

2013 Jul 01

On Mon, Jul 1, 2013 at 11:30 AM, Quentin Colombet <qcolombet at apple.com> wrote:
> Hi,
>
> ** Problematic **
> I am looking for advices to share some logic between DAG combine and
> target lowering.
>
> Basically, I need to know if a bitcast that is about to be inserted during
> target specific isel lowering will be eliminated during DAG combine.
>
> Let me
