search for: smull

Displaying 15 results from an estimated 15 matches for "smull".

2009 Jul 08
1
[LLVMdev] ARM cross compiling causes segmentation fault
Thanks. I was able to look at the lines, and all of them have an smull instruction like 'smull r0, r1, r0, r1'. Won On Wed, Jul 8, 2009 at 2:54 PM, Dale Johannesen <dalej at apple.com> wrote: > > On Jul 8, 2009, at 12:52 PM PDT, Won J Jeon wrote: > > I tried a couple of options (-mcpu=arm1136j-s, -mcpu=arm1136jf-s, > -march=armv6, ...)...
2005 Mar 25
2
Port speex to my iPAQ 1945
Hi, I want to port Speex to my Pocket PC, an iPAQ 1945, which has a Samsung 2410, an ARM9-based processor. I would like to write optimized code specific to this chip. I have some experience with DSP chips and fixed-point coding but know nothing about embedded systems or ARM. Could someone give me some hints on how to write optimized code for this Pocket PC? If you can give me some links that will
2015 Aug 05
0
[PATCH 4/8] Arm64 assembly for Celt fixed-point math.
...CH DAMAGE. +*/ + +#ifndef FIXED_ARM64_H +#define FIXED_ARM64_H + +/** 16x32 multiplication, followed by a 16-bit shift right. Results fits in 32 bits */ +#undef MULT16_32_Q16 +static OPUS_INLINE opus_val32 MULT16_32_Q16_arm64(opus_val16 a, opus_val32 b) +{ + opus_int64 rd; + __asm__( + "smull %x0, %w1, %w2\n\t" + : "=&r"(rd) + : "%r"(b), "r"(a<<16) + ); + return (rd >> 32); +} +#define MULT16_32_Q16(a, b) (MULT16_32_Q16_arm64(a, b)) + + +/** 16x32 multiplication, followed by a 15-bit shift right. Results fits in 32 bits */ +...
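For comparison with the inline assembly in the snippet above: smull multiplies b by (a << 16) into a 64-bit register, and the function returns the upper 32 bits, which works out to (a * b) >> 16. A minimal portable sketch of that arithmetic follows. It assumes opus_val16 and opus_val32 are 16-bit and 32-bit signed integers (the typedefs are not shown in the snippet) and that >> on a negative value is an arithmetic shift, as on common compilers. The name mult16_32_q16_ref is hypothetical and not part of the patch.

```c
#include <stdint.h>

/* Assumed stand-ins for the Opus fixed-point types; not shown in the snippet. */
typedef int16_t opus_val16;
typedef int32_t opus_val32;

/* Hypothetical reference: 16x32 multiply followed by a 16-bit right shift.
   Relies on arithmetic right shift of negative values, which is
   implementation-defined in C but usual in practice. */
static opus_val32 mult16_32_q16_ref(opus_val16 a, opus_val32 b)
{
    int64_t product = (int64_t)a * (int64_t)b;  /* widen first so the product cannot overflow */
    return (opus_val32)(product >> 16);
}
```

A reference like this can be useful for checking an assembly version against known inputs, for example under qemu.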
2015 Nov 07
0
[Aarch64 06/11] Add aarch64 assembly for Celt fixed-point math.
...CH DAMAGE. +*/ + +#ifndef FIXED_ARM64_H +#define FIXED_ARM64_H + +/** 16x32 multiplication, followed by a 16-bit shift right. Results fits in 32 bits */ +#undef MULT16_32_Q16 +static OPUS_INLINE opus_val32 MULT16_32_Q16_arm64(opus_val16 a, opus_val32 b) +{ + opus_int64 rd; + __asm__( + "smull %x0, %w1, %w2\n\t" + : "=&r"(rd) + : "%r"(b), "r"(a<<16) + ); + return (rd >> 32); +} +#define MULT16_32_Q16(a, b) (MULT16_32_Q16_arm64(a, b)) + + +/** 16x32 multiplication, followed by a 15-bit shift right. Results fits in 32 bits */ +...
2007 Dec 02
2
Optimised qmf_synth and iir_mem16
..._mul(const spx_sig_t *x, @ spx_sig_t *y, @ spx_word32_t scale, @ int len) .global signal_mul signal_mul: stmdb sp!, { r4-r8, lr } 0: ldmia r0!, { r5-r8 } @ Load four input samples smull r5, r12, r2, r5 mov r12, r12, lsl #18 @ Recombine upper and lower parts orr r5, r12, r5, lsr #14 smull r6, r12, r2, r6 mov r12, r12, lsl #18 orr r6, r12, r6, lsr #14 smull r7, r12, r2, r7 mov r12, r12, lsl #18 orr r7, r12,...
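The visible part of the listing above repeats one pattern per sample: smull produces a 64-bit product split across two registers, and the mov/orr pair recombines them as (hi << 18) | (lo >> 14), which selects bits 14 to 45 of the product. A rough C sketch of that single step is below. It assumes the sample and scale types are 32-bit signed integers (their definitions are not in the snippet), and both function names are hypothetical. The rest of the routine is truncated in the snippet and is not reconstructed here.

```c
#include <stdint.h>

/* Recombine the two halves of the 64-bit product the way the listing does:
   (hi << 18) | (lo >> 14), i.e. bits 14..45 of the product. */
static int32_t scale_sample_split(int32_t scale, int32_t x)
{
    int64_t product = (int64_t)scale * (int64_t)x;
    uint32_t lo = (uint32_t)product;                    /* low word, RdLo of smull */
    uint32_t hi = (uint32_t)((uint64_t)product >> 32);  /* high word, RdHi of smull */
    /* The conversion back to int32_t assumes two's-complement wraparound. */
    return (int32_t)((hi << 18) | (lo >> 14));
}

/* The same value written as a single 64-bit shift. Assumes arithmetic
   right shift for negative products (implementation-defined in C). */
static int32_t scale_sample_shift(int32_t scale, int32_t x)
{
    return (int32_t)(((int64_t)scale * (int64_t)x) >> 14);
}
```

With a scale of 16384 (1.0 if the scale is read as Q14), both functions return the input sample unchanged.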
2009 Jul 08
0
[LLVMdev] ARM cross compiling causes segmentation fault
On Jul 8, 2009, at 12:52 PM PDT, Won J Jeon wrote: > I tried a couple of options (-mcpu=arm1136j-s, -mcpu=arm1136jf-s, > -march=armv6, ...) to let the compiler know the specific ARM > processor, but the same issue is still there. I tried to take a look > at the .s file in the /tmp directory, but it's already cleaned up. Is it > because I enabled the optimization option when I
2015 Aug 05
0
[PATCH 5/8] Arm64 assembly for Silk math.
...******************************/ + +#ifndef SILK_MACROS_ARM64_H +#define SILK_MACROS_ARM64_H + +/* (a32 * b32) >> 16 */ +#undef silk_SMULWW +static OPUS_INLINE opus_int32 silk_SMULWW_arm64(opus_int32 a, opus_int32 b) +{ + opus_int64 rd; + __asm__( + "#silk_SMULWW\n\t" + "smull %x0, %w1, %w2\n\t" + : "=&r"(rd) + : "%r"(a), "r"(b) + ); + rd >>= 16; + rd &= 0xFFFFFFFF; + return rd; +} +#define silk_SMULWW(a, b) (silk_SMULWW_arm64(a, b)) + +#undef silk_SMLAWW +static OPUS_INLINE opus_int32 silk_SMLAWW_arm64(opus_int3...
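The silk_SMULWW snippet above shifts the 64-bit product right by 16 and then masks it to 32 bits, so a result that does not fit wraps instead of saturating. A small sketch of the same arithmetic without inline assembly, assuming opus_int32 is a 32-bit signed integer (the typedef is not shown in the snippet); silk_smulww_ref is a hypothetical name:

```c
#include <stdint.h>

/* Hypothetical reference for (a32 * b32) >> 16, keeping only the low 32 bits.
   Assumes arithmetic right shift of negative values and two's-complement
   wraparound on the final conversion (both implementation-defined in C). */
static int32_t silk_smulww_ref(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;
    return (int32_t)(uint32_t)(product >> 16);  /* the uint32_t cast mirrors rd &= 0xFFFFFFFF */
}
```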
2015 Nov 23
1
[Aarch64 v2 05/18] Add Neon intrinsics for Silk noise shape quantization.
...00000000005c ldr q1, [x1] 0000000000000060 ext.16b v0, v0, v1, #12 0000000000000064 ldur q1, [x1, #12] 0000000000000068 ldr q2, [x2] 000000000000006c sshll.4s v3, v2, #0 0000000000000070 sshll2.4s v2, v2, #0 0000000000000074 smull.2d v4, v0, v3 0000000000000078 smlal2.2d v4, v0, v3 000000000000007c smlal.2d v4, v1, v2 0000000000000080 smlal2.2d v4, v1, v2 0000000000000084 ext.16b v2, v4, v4, #8 0000000000000088 add d2, d4, d2 000000000000008c sshr d2,...
2009 Jul 08
3
[LLVMdev] ARM cross compiling causes segmentation fault
I tried a couple of options (-mcpu=arm1136j-s, -mcpu=arm1136jf-s, -march=armv6, ...) to let the compiler know the specific ARM processor, but the same issue is still there. I tried to take a look at the .s file in the /tmp directory, but it's already cleaned up. Is it because I enabled the optimization option when I compiled llvm? Regards, Won On Wed, Jul 8, 2009 at 1:28 PM, Dale Johannesen <dalej
2005 Mar 27
0
Port speex to my iPAQ 1945
...sults. Right now, many places use ARM4 assembly even on > ARM5E, so if you want even better results, you can rewrite those. The > main instructions you'll want to use are smulbb, smlabb, smulwb and > smlawb, which aren't present in ARM4 and are usually more efficient than > mul, smull and mla. > > Jean-Marc >
2015 Nov 20
2
[Aarch64 00/11] Patches to enable Aarch64
> On Nov 19, 2015, at 5:47 PM, John Ridges <jridges at masque.com> wrote: > > Any speedup from the intrinsics may just be swamped by the rest of the encode/decode process. But I think you really want SIG2WORD16 to be (vqmovns_s32(PSHR32((x), SIG_SHIFT))) Yes, you're right. I forgot to run the vectors under qemu with my previous version (oh, the embarrassment!) Fixed forthcoming
2015 Aug 05
8
[PATCH 0/8] Patches for arm64 (aarch64) support
This sequence of patches provides arm64 support for Opus. Tested on iOS, Android, and Ubuntu 14.04. The patch sequence was written on top of Viswanath Puttagunta's Ne10 patches, but all but the second ("Reorganize pitch_arm.h") should, I think, apply independently of it. It does depend on my previous intrinsics configury reorganization, however. Comments welcome. With this and
2002 Feb 08
2
Vorbis bitstream specification...
Hi, I'm looking for more documentation on the Vorbis bitstream format. The goal for me is to write an optimized decoder using only integer or fixed point math for use on the Phatnoise Car Audio System (see http://www.phatnoise.com). I've already found the info on the Ogg framing system and I've already written my own thing for parsing through Ogg frames (easy). Also, is RTP
2015 Nov 07
12
[Aarch64 00/11] Patches to enable Aarch64 (arm64) optimizations, rebased to current master.
Here are my aarch64 patches rebased to the current tip of Opus master. They're largely the same as my previous patch set, with the addition of the final one (the Neon fixed-point implementation of xcorr_kernel). This replaces Viswanath's Neon fixed-point celt_pitch_xcorr, since xcorr_kernel is used in celt_fir and celt_iir as well. These have been tested for correctness under qemu
2007 Nov 21
3
[LLVMdev] Add/sub with carry; widening multiply
I've been playing around with llvm lately and I was wondering something about the bitcode instructions for basic arithmetic. Is there any plan to provide instructions that perform widening multiply, or add with carry? It might be written as: mulw i32 %lhs %rhs -> i64 ; widening multiply addw i32 %lhs %rhs -> i33 ; widening add addc i32 %lhs, i32 %rhs, i1 %c -> i33 ; add with carry
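The operations proposed in the post above (mulw, addw, addc) can be approximated in C by widening the operands before the arithmetic. This is only a sketch of the intended semantics as the snippet describes them, not of any LLVM feature. The function names are hypothetical, the 33-bit results are held in 64-bit integers, and the post does not state signedness, so the signed multiply and unsigned adds here are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Widening multiply: i32 x i32 -> i64. Converting before multiplying keeps
   the full product. */
static int64_t mulw_i32(int32_t lhs, int32_t rhs)
{
    return (int64_t)lhs * (int64_t)rhs;
}

/* Widening add: the 33-bit sum of two 32-bit values, held in a uint64_t. */
static uint64_t addw_u32(uint32_t lhs, uint32_t rhs)
{
    return (uint64_t)lhs + (uint64_t)rhs;
}

/* Add with carry-in: bit 32 of the result is the carry-out. */
static uint64_t addc_u32(uint32_t lhs, uint32_t rhs, bool carry_in)
{
    return (uint64_t)lhs + (uint64_t)rhs + (carry_in ? 1u : 0u);
}
```

On ARM targets a compiler may lower the widening multiply to a single smull, which is why this thread matches the search.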