search for: mul_const

Displaying 6 results from an estimated 6 matches for "mul_const".

2020 May 19
5
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...multiply by 1 + __m128i mul_one = _mm_set1_epi8(1); + __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1); + __m128i add16_2 = sse_maddubs_epi16(mul_one, in8_2); + + // (4*buf[i] + 3*buf[i+1]), (2*buf[i+2], buf[i+3]), ... 2*[int16*8] + __m128i mul_const = _mm_set1_epi32(4 + (3 << 8) + (2 << 16) + (1 << 24)); + __m128i mul_add16_1 = sse_maddubs_epi16(mul_const, in8_1); + __m128i mul_add16_2 = sse_maddubs_epi16(mul_const, in8_2); + + // s2 += 32*s1 + ss2 = _mm_add_epi32(ss2, _mm_slli_epi3...
2020 May 18
6
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...multiply by 1 + __m128i mul_one = _mm_set1_epi8(1); + __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1); + __m128i add16_2 = sse_maddubs_epi16(mul_one, in8_2); + + // (4*buf[i] + 3*buf[i+1]), (2*buf[i+2], buf[i+3]), ... 2*[int16*8] + __m128i mul_const = _mm_set1_epi32(4 + (3 << 8) + (2 << 16) + (1 << 24)); + __m128i mul_add16_1 = sse_maddubs_epi16(mul_const, in8_1); + __m128i mul_add16_2 = sse_maddubs_epi16(mul_const, in8_2); + + // s2 += 32*s1 + ss2 = _mm_add_epi32(ss2, _mm_slli_epi3...
2020 May 18
0
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...m128i mul_one = _mm_set1_epi8(1); > + __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1); > + __m128i add16_2 = sse_maddubs_epi16(mul_one, in8_2); > + > + // (4*buf[i] + 3*buf[i+1]), (2*buf[i+2], buf[i+3]), ... > 2*[int16*8] > + __m128i mul_const = _mm_set1_epi32(4 + (3 << 8) + (2 << > 16) + (1 << 24)); > + __m128i mul_add16_1 = sse_maddubs_epi16(mul_const, in8_1); > + __m128i mul_add16_2 = sse_maddubs_epi16(mul_const, in8_2); > + > + // s2 += 32*s1 > + ss2 = _m...
2020 May 18
2
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
..._mm_set1_epi8(1); >> + __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1); >> + __m128i add16_2 = sse_maddubs_epi16(mul_one, in8_2); >> + >> + // (4*buf[i] + 3*buf[i+1]), (2*buf[i+2], buf[i+3]), ... 2*[int16*8] >> + __m128i mul_const = _mm_set1_epi32(4 + (3 << 8) + (2 << >> 16) + (1 << 24)); >> + __m128i mul_add16_1 = sse_maddubs_epi16(mul_const, in8_1); >> + __m128i mul_add16_2 = sse_maddubs_epi16(mul_const, in8_2); >> + >> + // s2 += 32*s1 >&g...
2020 May 20
0
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...__m128i mul_one = _mm_set1_epi8(1); > + __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1); > + __m128i add16_2 = sse_maddubs_epi16(mul_one, in8_2); > + > + // (4*buf[i] + 3*buf[i+1]), (2*buf[i+2], buf[i+3]), ... 2*[int16*8] > + __m128i mul_const = _mm_set1_epi32(4 + (3 << 8) + (2 << > 16) + (1 << 24)); > + __m128i mul_add16_1 = sse_maddubs_epi16(mul_const, in8_1); > + __m128i mul_add16_2 = sse_maddubs_epi16(mul_const, in8_2); > + > + // s2 += 32*s1 > + ss2 = _m...
2020 May 18
3
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
What do you base this on? Per https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html : "For the x86-32 compiler, you must use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default." That reads to me like we're fine for SSE2. As stated in my comments, SSSE3 support must be