search for: mul_one

Displaying 6 results from an estimated 6 matches for "mul_one".

Did you mean: malone
2020 May 19
5
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...__m128i in8_1 = sse_load_si128((__m128i_u*)&buf[i]); + __m128i in8_2 = sse_load_si128((__m128i_u*)&buf[i + 16]); + + // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... 2*[int16*8] + // Fastest, even though multiply by 1 + __m128i mul_one = _mm_set1_epi8(1); + __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1); + __m128i add16_2 = sse_maddubs_epi16(mul_one, in8_2); + + // (4*buf[i] + 3*buf[i+1]), (2*buf[i+2], buf[i+3]), ... 2*[int16*8] + __m128i mul_const = _mm_set1_epi32(4 + (3 <<...
2020 May 18
6
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...__m128i in8_1 = sse_load_si128((void const*)&buf[i]); + __m128i in8_2 = sse_load_si128((void const*)&buf[i + 16]); + + // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... 2*[int16*8] + // Fastest, even though multiply by 1 + __m128i mul_one = _mm_set1_epi8(1); + __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1); + __m128i add16_2 = sse_maddubs_epi16(mul_one, in8_2); + + // (4*buf[i] + 3*buf[i+1]), (2*buf[i+2], buf[i+3]), ... 2*[int16*8] + __m128i mul_const = _mm_set1_epi32(4 + (3 <<...
2020 May 18
0
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...oad_si128((void const*)&buf[i]); > + __m128i in8_2 = sse_load_si128((void const*)&buf[i + 16]); > + > + // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... > 2*[int16*8] > + // Fastest, even though multiply by 1 > + __m128i mul_one = _mm_set1_epi8(1); > + __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1); > + __m128i add16_2 = sse_maddubs_epi16(mul_one, in8_2); > + > + // (4*buf[i] + 3*buf[i+1]), (2*buf[i+2], buf[i+3]), ... > 2*[int16*8] > + __m128i mul_const = _...
2020 May 18
2
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...amp;buf[i]); >> + __m128i in8_2 = sse_load_si128((void const*)&buf[i + 16]); >> + >> + // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... >> 2*[int16*8] >> + // Fastest, even though multiply by 1 >> + __m128i mul_one = _mm_set1_epi8(1); >> + __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1); >> + __m128i add16_2 = sse_maddubs_epi16(mul_one, in8_2); >> + >> + // (4*buf[i] + 3*buf[i+1]), (2*buf[i+2], buf[i+3]), ... 2*[int16*8] >> + __m128...
2020 May 20
0
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
..._load_si128((__m128i_u*)&buf[i]); > + __m128i in8_2 = sse_load_si128((__m128i_u*)&buf[i + 16]); > + > + // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... > 2*[int16*8] > + // Fastest, even though multiply by 1 > + __m128i mul_one = _mm_set1_epi8(1); > + __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1); > + __m128i add16_2 = sse_maddubs_epi16(mul_one, in8_2); > + > + // (4*buf[i] + 3*buf[i+1]), (2*buf[i+2], buf[i+3]), ... 2*[int16*8] > + __m128i mul_const = _mm_se...
2020 May 18
3
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
What do you base this on? Per https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html : "For the x86-32 compiler, you must use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default." That reads to me like we're fine for SSE2. As stated in my comments, SSSE3 support must be