thr3ads.net - search: "add16"

Fwd: Re: Fixed Point on wideband-mode: Single Frame loss on 2000 Hz sine causes "freak off"

2010 Feb 04

1

OK, some more info: I just tested bandwidth widening to fix this, but I need gamma values below 0.9 for the output to become stable, which I think is clearly too much widening. Next I looked inside the Levinson-Durbin algorithm. The lines #ifdef FIXED_POINT r = DIV32_16(rr+PSHR32(error,1),ADD16(error,8)); #else r = rr/(error+.003*ac[0]); #endif look interesting. For floating point, .003*ac[0] is added to error, while for fixed point a constant value of 8 is added. When I alter this value, I get output without "freaking out" for values 1, 2, 3 and 5. For 4, 6 and 7 the s...
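To make the contrast in this snippet concrete, here is a minimal C sketch of the two denominators. It is not the Speex source: `refl_float` and `refl_fixed` are hypothetical names, the fixed-point macros are replaced by plain integer arithmetic under the assumption that PSHR32(x,1) is a rounding shift by one bit, the constant 8 is turned into a `bias` parameter, and any narrowing of the quotient is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Floating-point form quoted in the snippet: the bias added to the
 * prediction error scales with the signal energy ac[0]. */
float refl_float(float rr, float error, float ac0)
{
    float den = error + 0.003f * ac0;
    assert(den != 0.0f);
    return rr / den;
}

/* Fixed-point form quoted in the snippet, with the constant exposed as
 * `bias` (hypothetical parameter). Assumes PSHR32(error, 1) means
 * "shift right by one with rounding", i.e. (error + 1) >> 1. */
int32_t refl_fixed(int32_t rr, int16_t error, int16_t bias)
{
    int16_t den = (int16_t)(error + bias);   /* ADD16(error, bias) */
    assert(den != 0);
    return (rr + ((error + 1) >> 1)) / den;  /* DIV32_16(rr + PSHR32(error,1), den) */
}
```

The point of the sketch is only that the floating-point bias follows the input level while the fixed-point bias does not, which may be why changing the constant changes the behaviour described above.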

[LLVMdev] Register design decision for backend

2010 Aug 29

2

...tion to produce the following asm code: add r0, r2 addc r1, r3 ; add with carry ret However, I noticed this doesn't work. As a test, I removed the WDREGS class and passed everything in GPR8 regs; this way LLVM was able to expand the i16 add instruction into the code above. I first thought of making add16 a pseudo instr and expanding it manually, but I think this is a bad solution because LLVM would lose a lot of information in such a basic thing as an addition. Also, I would have to do the same for wider data types and for the rest of the arithmetic and logical instructions, so nearly everything would...
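The add/addc pair in this snippet can be modelled in plain C. The following is a minimal sketch, assuming each 16-bit value lives in a pair of 8-bit registers (low byte in r0 or r2, high byte in r1 or r3); the function name `add16_via_add8` is hypothetical and not part of any backend.

```c
#include <assert.h>
#include <stdint.h>

/* Models `add r0, r2` followed by `addc r1, r3` on 8-bit registers. */
uint16_t add16_via_add8(uint16_t a, uint16_t b)
{
    uint8_t a_lo = (uint8_t)(a & 0xFF), a_hi = (uint8_t)(a >> 8);
    uint8_t b_lo = (uint8_t)(b & 0xFF), b_hi = (uint8_t)(b >> 8);

    uint8_t lo    = (uint8_t)(a_lo + b_lo);          /* add  r0, r2 */
    uint8_t carry = (uint8_t)(lo < a_lo);            /* carry flag set on wrap-around */
    uint8_t hi    = (uint8_t)(a_hi + b_hi + carry);  /* addc r1, r3 */

    uint16_t result = (uint16_t)(((uint16_t)hi << 8) | lo);
    assert(result == (uint16_t)(a + b));             /* matches a native 16-bit add */
    return result;
}
```

For example, `add16_via_add8(0x00FF, 0x0001)` carries out of the low byte and yields `0x0100`.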

[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020 May 19

5

...i_u*)&buf[i]); + __m128i in8_2 = sse_load_si128((__m128i_u*)&buf[i + 16]); + + // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... 2*[int16*8] + // Fastest, even though multiply by 1 + __m128i mul_one = _mm_set1_epi8(1); + __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1); + __m128i add16_2 = sse_maddubs_epi16(mul_one, in8_2); + + // (4*buf[i] + 3*buf[i+1]), (2*buf[i+2], buf[i+3]), ... 2*[int16*8] + __m128i mul_const = _mm_set1_epi32(4 + (3 << 8) + (2 << 16) + (1 << 24)); +...
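The "multiply by 1" trick in this snippet can be illustrated with a scalar model. This sketch assumes the patch's `sse_maddubs_epi16` wrapper behaves like the SSSE3 intrinsic `_mm_maddubs_epi16` (unsigned bytes from the first operand times signed bytes from the second, adjacent products added with signed 16-bit saturation); `maddubs_lane` and `pairwise_add16` are hypothetical names, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one 16-bit output lane of _mm_maddubs_epi16(u, s). */
int16_t maddubs_lane(uint8_t u0, int8_t s0, uint8_t u1, int8_t s1)
{
    int32_t sum = (int32_t)u0 * s0 + (int32_t)u1 * s1;
    if (sum > INT16_MAX) sum = INT16_MAX;   /* signed saturation */
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* With both multipliers set to 1, each lane is buf[2k] + buf[2k+1],
 * which is what the add16_1 / add16_2 values in the snippet hold. */
void pairwise_add16(const int8_t *buf, size_t n_pairs, int16_t *out)
{
    assert(buf != NULL && out != NULL);
    for (size_t k = 0; k < n_pairs; k++)
        out[k] = maddubs_lane(1, buf[2 * k], 1, buf[2 * k + 1]);
}
```

With multipliers of 1 the saturation branch can never trigger, since two signed bytes sum to at most 254 in magnitude; it is kept only so the lane model matches the instruction.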

[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020 May 18

6

...st*)&buf[i]); + __m128i in8_2 = sse_load_si128((void const*)&buf[i + 16]); + + // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... 2*[int16*8] + // Fastest, even though multiply by 1 + __m128i mul_one = _mm_set1_epi8(1); + __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1); + __m128i add16_2 = sse_maddubs_epi16(mul_one, in8_2); + + // (4*buf[i] + 3*buf[i+1]), (2*buf[i+2], buf[i+3]), ... 2*[int16*8] + __m128i mul_const = _mm_set1_epi32(4 + (3 << 8) + (2 << 16) + (1 << 24)); +...

[LLVMdev] Register design decision for backend

2010 Aug 31

0

...: > > add r0, r2 > addc r1, r3 ; add with carry > ret > > However, I noticed this doesn't work. As a test, I removed the WDREGS class > and passed everything in GPR8 regs; this way LLVM was able to expand the i16 > add instruction into the code above. I first thought of making add16 a > pseudo instr and expanding it manually, but I think this is a bad solution > because LLVM would lose a lot of information in such a basic thing as an > addition. Also, I would have to do the same for wider data types and for the > rest of the arithmetic and logical instructions, so near...

[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020 May 18

0

...__m128i in8_2 = sse_load_si128((void const*)&buf[i + 16]); > + > + // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... > 2*[int16*8] > + // Fastest, even though multiply by 1 > + __m128i mul_one = _mm_set1_epi8(1); > + __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1); > + __m128i add16_2 = sse_maddubs_epi16(mul_one, in8_2); > + > + // (4*buf[i] + 3*buf[i+1]), (2*buf[i+2], buf[i+3]), ... > 2*[int16*8] > + __m128i mul_const = _mm_set1_epi32(4 + (3 << 8) + (2 << >...

fixed point macros

2004 Aug 06

1

...There are two fixed-point types: > spx_word16_t and spx_word32_t. Both of them are defined as float when > compiling normally and are defined to short (16 bits) and int (32 bits) > for fixed-point. As for the macros, here are some of them (the rest > should be easy to guess): > > ADD16, ADD32 adders for 16 and 32 bits > MULT16_16 multiply a 16 bit value by another 16 bit value (result in > 32) > MAC16_16 same but also adds to the first argument > MULT16_16_Q15 multiply a 16 bit value by another 16 bit value and shift > right by 15 (result assumed to fit in 16 bits...
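The macro descriptions in this snippet can be sketched as small C functions. This is an illustration under stated assumptions, not the Speex headers: `word16` and `word32` stand in for spx_word16_t and spx_word32_t in a fixed-point build, the lowercase function names are hypothetical, and the Q15 helper assumes that right-shifting a negative value is an arithmetic shift, which is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t word16;  /* stand-in for spx_word16_t (fixed-point build) */
typedef int32_t word32;  /* stand-in for spx_word32_t (fixed-point build) */

/* ADD16 / ADD32: adders for 16 and 32 bits. */
word16 add16(word16 a, word16 b) { return (word16)(a + b); }
word32 add32(word32 a, word32 b) { return a + b; }

/* MULT16_16: 16 bit x 16 bit, result in 32 bits. */
word32 mult16_16(word16 a, word16 b) { return (word32)a * (word32)b; }

/* MAC16_16: same, but also adds to the first argument. */
word32 mac16_16(word32 c, word16 a, word16 b) { return add32(c, mult16_16(a, b)); }

/* MULT16_16_Q15: multiply, then shift right by 15; the result is
 * assumed to fit in 16 bits, which the assert checks here. */
word16 mult16_16_q15(word16 a, word16 b)
{
    word32 p = mult16_16(a, b) >> 15;
    assert(p >= INT16_MIN && p <= INT16_MAX);
    return (word16)p;
}
```

In Q15, 16384 represents 0.5, so `mult16_16_q15(16384, 16384)` gives 8192, which represents 0.25.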

Fwd: Re: Fixed Point on wideband-mode: Single Frame loss on 2000 Hz sine causes "freak off"

2010 Feb 05

0

...spx_word32_t rr = NEG32(SHL32(EXTEND32(ac[i + 1]),13)); for (j = 0; j < i; j++) rr = SUB32(rr,MULT16_16(lpc[j],ac[i - j])); #ifdef FIXED_POINT // stop calculation if error < 30 if ( error <= 30 ) { return error; } //r = DIV32_16(rr+PSHR32(error,1),ADD16(error,10 )); r = DIV32_16(rr+PSHR32(error,1),error); #else r = rr/(error+.003*ac[0]); #endif This improves the situation. There's no more "freak out" for most cases. I tested with 2000 Hz, 2200 Hz and 3000 Hz input for different complexity and quality settings. Neverthe...

[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020 May 18

2

...d_si128((void const*)&buf[i + 16]); >> + >> + // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... >> 2*[int16*8] >> + // Fastest, even though multiply by 1 >> + __m128i mul_one = _mm_set1_epi8(1); >> + __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1); >> + __m128i add16_2 = sse_maddubs_epi16(mul_one, in8_2); >> + >> + // (4*buf[i] + 3*buf[i+1]), (2*buf[i+2], buf[i+3]), ... 2*[int16*8] >> + __m128i mul_const = _mm_set1_epi32(4 + (3 << 8) + (2 <...

[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020 May 20

0

...__m128i in8_2 = sse_load_si128((__m128i_u*)&buf[i + 16]); > + > + // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... > 2*[int16*8] > + // Fastest, even though multiply by 1 > + __m128i mul_one = _mm_set1_epi8(1); > + __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1); > + __m128i add16_2 = sse_maddubs_epi16(mul_one, in8_2); > + > + // (4*buf[i] + 3*buf[i+1]), (2*buf[i+2], buf[i+3]), ... 2*[int16*8] > + __m128i mul_const = _mm_set1_epi32(4 + (3 << 8) + (2 << > 16) +...

[ANNOUNCE] PocketPC Port for speex-1.1.5 with sample code

2004 Aug 06

0

...efine LSP_SCALING 1. #define GAMMA_SCALING 1. #define GAIN_SCALING 1. #define GAIN_SCALING_1 1. #define LPC_SHIFT 0 #define SIG_SHIFT 0 #define VERY_SMALL 1e-30 #define PSHR(a,shift) (a) #define SHR(a,shift) (a) #define SHL(a,shift) (a) #define SATURATE(x,a) (x) #define ADD16(a,b) ((a)+(b)) #define SUB16(a,b) ((a)-(b)) #define ADD32(a,b) ((a)+(b)) #define SUB32(a,b) ((a)-(b)) #define ADD64(a,b) ((a)+(b)) #define MULT16_16_16(a,b) ((a)*(b)) #define MULT16_16(a,b) ((a)*(b)) #define MAC16_16(c,a,b) ((c)+(a)*(b)) #define MULT16_32_Q11(a,b) ((a)*(b)) #define...

[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020 May 18

3

What do you base this on? Per https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html : "For the x86-32 compiler, you must use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default." That reads to me like we're fine for SSE2. As stated in my comments, SSSE3 support must be

[GlobalISel] A Proposal for global instruction selection

2015 Nov 18

13

...the remaining (G) MachineInstr to MachineIntr. ** Implications ** As part of the bring-up of the prototype, we need to extend some of the core MachineInstr-level APIs: - Need to remember FastMath flags for each MachineInstr. - Need to know the type of each MachineInstr. We don’t want ADD8, ADD16, etc. - Extend the MachineRegisterInfo to support size as well as register classes for virtual registers. I have sketched the changes in the attached patches to help picturing how the changes would impact the existing APIs. Note: I do not intend to commit those changes as they are. They will go...

[LLVMdev] Register design decision for backend

2010 Aug 31

2

...>> addc r1, r3 ; add with carry >> ret >> >> However, I noticed this doesn't work. As a test, I removed the WDREGS class >> and passed everything in GPR8 regs; this way LLVM was able to expand the i16 >> add instruction into the code above. I first thought of making add16 a >> pseudo instr and expanding it manually, but I think this is a bad solution >> because LLVM would lose a lot of information in such a basic thing as an >> addition. Also, I would have to do the same for wider data types and for the >> rest of the arithmetic and logical instr...

[ANNOUNCE] PocketPC Port for speex-1.1.5 with sample code

2004 Aug 06

2

Hi Jean-Marc, Based on the wonderful Speex project, I've created SpeexOutLoud, essentially a Speex codec port for Windows Mobile 2003 devices. I've included a sample project intended to show the usage of SpeexOutLoud codec in a Pocket PC application based on .NET Compact Framework. I'd request you to please go through the attached build, and include it as a contribution to the

[GlobalISel] A Proposal for global instruction selection

2016 Jan 07

2

...neIntr. > > > > ** Implications ** > > As part of the bring-up of the prototype, we need to extend some of the core MachineInstr-level APIs: > - Need to remember FastMath flags for each MachineInstr. > - Need to know the type of each MachineInstr. We don’t want ADD8, ADD16, etc. > - Extend the MachineRegisterInfo to support size as well as register classes for virtual registers. > > I have sketched the changes in the attached patches to help picturing how the changes would impact the existing APIs. > > Note: I do not intend to commit those changes...

[GlobalISel] A Proposal for global instruction selection

2016 Jan 11

2

...neIntr. > > > > ** Implications ** > > As part of the bring-up of the prototype, we need to extend some of the core MachineInstr-level APIs: > - Need to remember FastMath flags for each MachineInstr. > - Need to know the type of each MachineInstr. We don’t want ADD8, ADD16, etc. > - Extend the MachineRegisterInfo to support size as well as register classes for virtual registers. > > I have sketched the changes in the attached patches to help picturing how the changes would impact the existing APIs. > > Note: I do not intend to commit those changes...

[GlobalISel] A Proposal for global instruction selection

2016 Jan 12

4

...eIntr. > > > > ** Implications ** > > As part of the bring-up of the prototype, we need to extend some of the > core MachineInstr-level APIs: > - Need to remember FastMath flags for each MachineInstr. > - Need to know the type of each MachineInstr. We don’t want ADD8, ADD16, > etc. > - Extend the MachineRegisterInfo to support size as well as register > classes for virtual registers. > > I have sketched the changes in the attached patches to help picturing how > the changes would impact the existing APIs. > > > > Note: I do not intend t...

[GlobalISel] A Proposal for global instruction selection

2015 Nov 18

2

...neIntr. > > > > ** Implications ** > > As part of the bring-up of the prototype, we need to extend some of the core MachineInstr-level APIs: > - Need to remember FastMath flags for each MachineInstr. > - Need to know the type of each MachineInstr. We don’t want ADD8, ADD16, etc. > - Extend the MachineRegisterInfo to support size as well as register classes for virtual registers. > > I have sketched the changes in the attached patches to help picturing how the changes would impact the existing APIs. > > Note: I do not intend to commit those changes...

[GlobalISel] A Proposal for global instruction selection

2016 Jan 12

2

...neIntr. > > > > ** Implications ** > > As part of the bring-up of the prototype, we need to extend some of the core MachineInstr-level APIs: > - Need to remember FastMath flags for each MachineInstr. > - Need to know the type of each MachineInstr. We don’t want ADD8, ADD16, etc. > - Extend the MachineRegisterInfo to support size as well as register classes for virtual registers. > > I have sketched the changes in the attached patches to help picturing how the changes would impact the existing APIs. > > Note: I do not intend to commit those changes...
