thr3ads.net - search: "16x32"

Displaying 18 results from an estimated 18 matches for "16x32".

2004 Nov 01

speex on TI C5x fixed-point DSP

Jean-Marc Valin wrote:
>>I have the encoder and decoder running now and have verified that the
>>encoder is bit-exact with respect to the fixed-point code running on x86 for the
>>same 30-second audio sample. Encode and decode together run in
>>real time for 8 kHz data, complexity=3, on a 120 MHz C5509 when code and
>>data are all in on-chip SRAM. I have not tested the

[PATCH 4/8] Arm64 assembly for Celt fixed-point math.

2015 Aug 05

...VER CAUSED AND ON ANY THEORY OF
+ LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+*/
+
+#ifndef FIXED_ARM64_H
+#define FIXED_ARM64_H
+
+/** 16x32 multiplication, followed by a 16-bit shift right. Results fits in 32 bits */
+#undef MULT16_32_Q16
+static OPUS_INLINE opus_val32 MULT16_32_Q16_arm64(opus_val16 a, opus_val32 b)
+{
+ opus_int64 rd;
+ __asm__(
+ "smull %x0, %w1, %w2\n\t"
+ : "=&r"(rd)
+ : "...

[Aarch64 06/11] Add aarch64 assembly for Celt fixed-point math.

2015 Nov 07

speex on TI C5x fixed-point DSP

2004 Nov 03

Jean-Marc Valin wrote:
>Well, I guess the first thing to look at is whether your DSP can actually
>do either 16x32=>48 or 16x32=>32 (keeping the MSBs), which is what the
>smulwb does on ARM. If that's the case, you can gain a lot of speed (use
>one instruction for 16x32 instead of three). Otherwise, replacing the
>32x32 multiplies by 16x16 is probably a good thing.
>
> One thing I'...

[Fast Int64 2/4] Add OPUS_FAST_INT64 flavors of celt/fixed_generic.h macros.

2015 Nov 16

...+++++++++++
 1 file changed, 16 insertions(+)
diff --git a/celt/fixed_generic.h b/celt/fixed_generic.h
index ac67d37..1cfd6d6 100644
--- a/celt/fixed_generic.h
+++ b/celt/fixed_generic.h
@@ -37,16 +37,32 @@
 #define MULT16_16SU(a,b) ((opus_val32)(opus_val16)(a)*(opus_val32)(opus_uint16)(b))
 /** 16x32 multiplication, followed by a 16-bit shift right. Results fits in 32 bits */
+#if OPUS_FAST_INT64
+#define MULT16_32_Q16(a,b) ((opus_val32)SHR((opus_int64)((opus_val16)(a))*(b),16))
+#else
 #define MULT16_32_Q16(a,b) ADD32(MULT16_16((a),SHR((b),16)), SHR(MULT16_16SU((a),((b)&0x0000ffff)),16))
+...
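The two MULT16_32_Q16 variants in the patch above should give the same result: one 64-bit multiply followed by a shift, or two 16x16 multiplies on the high and low halves of the 32-bit operand. A minimal sketch of that equivalence follows. It assumes plain <stdint.h> types in place of the Opus typedefs and an arithmetic right shift for negative signed values (implementation-defined in C, though common in practice). The function names are hypothetical and are not part of Opus.

```c
#include <stdint.h>

/* Hypothetical stand-in for the non-int64 macro: split b into a signed
   high half and an unsigned low half, multiply each by a, recombine.
   Assumes >> on negative signed values is an arithmetic shift. */
static int32_t mult16_32_q16_split(int16_t a, int32_t b)
{
    /* signed 16 x signed high 16 bits of b; already scaled by 2^-16 */
    int32_t hi = (int32_t)a * (b >> 16);
    /* signed 16 x unsigned low 16 bits of b, then drop 16 fractional bits */
    int32_t lo = ((int32_t)a * (int32_t)(b & 0xffff)) >> 16;
    return hi + lo;
}

/* Hypothetical stand-in for the OPUS_FAST_INT64 macro: one widening
   multiply, then a 16-bit right shift of the 48-bit product. */
static int32_t mult16_32_q16_int64(int16_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 16);
}
```

Both forms round toward negative infinity, because a * b equals a * (b >> 16) * 65536 plus a * (b & 0xffff), and the first term is an exact multiple of 65536.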

[Fast Int64 1/4] Move OPUS_FAST_INT64 definition to celt/arch.h.

2015 Nov 16

---
 celt/arch.h | 5 +++++
 silk/macros.h | 4 +---
 2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/celt/arch.h b/celt/arch.h
index 9f74ddd..670527b 100644
--- a/celt/arch.h
+++ b/celt/arch.h
@@ -78,6 +78,11 @@ static OPUS_INLINE void _celt_fatal(const char *str, const char *file, int line)
 #define UADD32(a,b) ((a)+(b))
 #define USUB32(a,b) ((a)-(b))
+/* Set this if opus_int64

[RFC][FFT][Fixed-Point][NEON] NEON-Optimize

2014 Dec 29

Hi Timothy, It requires some extra effort if twiddles and input/output have different bit width. Since Opus uses int32 for twiddles, we are going to do the same thing. Thanks, Phil Wang -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not

[RFC][FFT][Fixed-Point][NEON] NEON-Optimize

2015 Jan 19

...int32 for twiddles, we are going
> > to do the same thing.
>
> Actually, the existing Opus code has 16-bit twiddles, mostly because it makes
> it possible to use smulwb on ARMv5E. That being said, I agree that for Neon it
> makes sense to use 32-bit twiddles since there's no 16x32 multiplier.
>
> Cheers,
>
> Jean-Marc
>
> > Thanks,
> >
> > Phil Wang
> >
> > -- IMPORTANT NOTICE: The contents of this email and any attachments
> > are confidential and may also be privileged. If you are not the...

[Patch]01-Add ARM5E macros

2013 May 17

...VER CAUSED AND ON ANY THEORY OF
+ LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+*/
+
+#ifndef FIXED_ARM5E_H
+#define FIXED_ARM5E_H
+
+/** 16x32 multiplication, followed by a 16-bit shift right. Results fits in 32 bits */
+#undef MULT16_32_Q16
+static inline opus_val32 MULT16_32_Q16(opus_val16 a, opus_val32 b)
+{
+ int res;
+ __asm__(
+ "smulwb %0, %1, %2;\n"
+ : "=&r"(res)
+ : "%r"(b),"...

[Aarch64 v2 10/18] Clean up some intrinsics-related wording in configure.

2015 Nov 21

---
 configure.ac | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/configure.ac b/configure.ac
index f52d2c2..e1a6e9b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -190,7 +190,7 @@ AC_ARG_ENABLE([rtcd], [enable_rtcd=yes])
 AC_ARG_ENABLE([intrinsics],
- [AS_HELP_STRING([--disable-intrinsics], [Disable intrinsics optimizations for ARM(float) X86(fixed)])],,
+

Major internal changes, TI DSP build change

2006 Apr 22

> >I fixed it in svn. Could you check that?
>
> Now all platforms match again. Note that the measured SNR for this test
> sample is lower than with the broken code (10.87 vs 11.10), but of course
> this is no way to judge the real quality.

SNR, especially on a single sample, can be very misleading. Yet, could you just check that the DSP results match what you get on a PC?

[RFC][FFT][Fixed-Point][NEON] NEON-Optimize

2014 Dec 29

...t width. Since Opus uses int32 for twiddles, we are going
> to do the same thing.

Actually, the existing Opus code has 16-bit twiddles, mostly because it makes it possible to use smulwb on ARMv5E. That being said, I agree that for Neon it makes sense to use 32-bit twiddles since there's no 16x32 multiplier.

Cheers,
Jean-Marc

> Thanks,
>
> Phil Wang
>
> -- IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately an...

Major internal changes, TI DSP build change

2006 Apr 22

...ut by itself? No, I have done no assembly work on any of these DSPs. It has been a few years since I did assembly work on any DSP, and it does not look like I will need to for my applications. I just found the above instruction in the instruction set reference manual, and it seems perfect for 16x32 multiplies. When I look at the assembler output for filter.c, I do not see this instruction used, probably because there is always some shift in the result (like MULT_16_32_Q15, which takes 6 instructions to implement: two multiplies, two adds, a shift, and a store). So, never mind. >>...
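The operation count quoted above (two multiplies, two adds, a shift) matches the usual way of emulating a 16x32 Q15 multiply with 16x16 multiplies. A rough sketch of one such decomposition follows. It is not claimed to be the Speex source. The function names are hypothetical, plain <stdint.h> types are assumed, right shifts of negative values are assumed to be arithmetic, and the inputs are assumed to keep the result within 32 bits.

```c
#include <stdint.h>

/* Hypothetical Q15 16x32 multiply built from two 16x16 multiplies.
   Assumes >> on negative signed values is an arithmetic shift and that
   the true result fits in 32 bits (a = -32768 with b near INT32_MIN
   does not). */
static int32_t mult16_32_q15_split(int16_t a, int32_t b)
{
    /* a times the signed high half of b, doubled: 2^16 / 2^15 = 2 */
    int32_t hi = 2 * ((int32_t)a * (b >> 16));
    /* a times the unsigned low half of b, scaled down by 2^15 */
    int32_t lo = ((int32_t)a * (int32_t)(b & 0xffff)) >> 15;
    return hi + lo;
}

/* Reference form using one widening multiply, for comparison. */
static int32_t mult16_32_q15_wide(int16_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 15);
}
```

On a DSP with a native 16x32 multiplier that keeps the most significant bits of the product, the wide form would map to fewer instructions, which seems to be the point under discussion in this thread.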

Updated MIPs and memory requirements for TI c54x or c55 DSPs

2005 Aug 15

Hi, I can see that there has been some effort to compile the SPEEX codec to operate on the TI c54x and c55x DSPs and I am wondering if anyone would be able to update the mailing list with their current MIPs and Memory resource requirements for their c54x and/or c55x compilation? The only estimate I was able to find in the mailing list archive was 42MIPs but I'm not sure if this is an

Bug in ARM fixed-point ASM?

2015 Jul 19

Hi, folks, I've been hunting down some strange bugs in audio I've been doing. While hunting my bugs down, I tripped across what appears to be an Opus bug, but it's not clear where it's coming from. Note that the optimization choices differ between the two in the config.log below. How can I force them to be the same? Presumably I need to force the android version toward the

[PATCH 0/8] Patches for arm64 (aarch64) support

2015 Aug 05

This sequence of patches provides arm64 support for Opus. Tested on iOS, Android, and Ubuntu 14.04. The patch sequence was written on top of Viswanath Puttagunta's Ne10 patches, but all but the second ("Reorganize pitch_arm.h") should, I think, apply independently of it. It does depend on my previous intrinsics configury reorganization, however. Comments welcome. With this and

[Aarch64 00/11] Patches to enable Aarch64 (arm64) optimizations, rebased to current master.

2015 Nov 07

Here are my aarch64 patches rebased to the current tip of Opus master. They're largely the same as my previous patch set, with the addition of the final one (the Neon fixed-point implementation of xcorr_kernel). This replaces Viswanath's Neon fixed-point celt_pitch_xcorr, since xcorr_kernel is used in celt_fir and celt_iir as well. These have been tested for correctness under qemu

Updated MIPs and memory requirements for TI c54x or c55DSPs

2005 Aug 17

Hi, Just a couple tips to reduce complexity. First, I think you'd get a good speedup by enabling the PRECISION16 switch (if it's not done already). This (very) slightly reduces quality, but means you convert a lot of "emulated" 16x32 multiplications into 16x16. There are also several routines that would benefit from platform-specific optimizations. There are already optimizations for ARM (*_arm4.h), Blackfin (*_bfin.h) and SSE (*_sse.h), so you can see what functions are worth optimizing. For a DSP, there are also two specifi...
