search for: in8_2

Displaying 6 results from an estimated 6 matches for "in8_2".

2020 May 19
5
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...oad_si128((__m128i_u*)mul_t1_buf); + __m128i ss1 = _mm_setzero_si128(); + __m128i ss2 = _mm_setzero_si128(); + + for (i = 0; i < (len-32); i+=32) { + // Load ... 2*[int8*16] + __m128i in8_1 = sse_load_si128((__m128i_u*)&buf[i]); + __m128i in8_2 = sse_load_si128((__m128i_u*)&buf[i + 16]); + + // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... 2*[int16*8] + // Fastest, even though multiply by 1 + __m128i mul_one = _mm_set1_epi8(1); + __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1);...
2020 May 18
6
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...d_si128((void const*)mul_t1_buf); + __m128i ss1 = _mm_setzero_si128(); + __m128i ss2 = _mm_setzero_si128(); + + for (i = 0; i < (len-32); i+=32) { + // Load ... 2*[int8*16] + __m128i in8_1 = sse_load_si128((void const*)&buf[i]); + __m128i in8_2 = sse_load_si128((void const*)&buf[i + 16]); + + // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... 2*[int16*8] + // Fastest, even though multiply by 1 + __m128i mul_one = _mm_set1_epi8(1); + __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1);...
2020 May 18
0
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...> + __m128i ss1 = _mm_setzero_si128(); > + __m128i ss2 = _mm_setzero_si128(); > + > + for (i = 0; i < (len-32); i+=32) { > + // Load ... 2*[int8*16] > + __m128i in8_1 = sse_load_si128((void const*)&buf[i]); > + __m128i in8_2 = sse_load_si128((void const*)&buf[i + 16]); > + > + // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... > 2*[int16*8] > + // Fastest, even though multiply by 1 > + __m128i mul_one = _mm_set1_epi8(1); > + __m128i add16_1 = sse_...
2020 May 18
2
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...= _mm_setzero_si128(); >> + __m128i ss2 = _mm_setzero_si128(); >> + >> + for (i = 0; i < (len-32); i+=32) { >> + // Load ... 2*[int8*16] >> + __m128i in8_1 = sse_load_si128((void const*)&buf[i]); >> + __m128i in8_2 = sse_load_si128((void const*)&buf[i + 16]); >> + >> + // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... >> 2*[int16*8] >> + // Fastest, even though multiply by 1 >> + __m128i mul_one = _mm_set1_epi8(1); >> +...
2020 May 20
0
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...> + __m128i ss1 = _mm_setzero_si128(); > + __m128i ss2 = _mm_setzero_si128(); > + > + for (i = 0; i < (len-32); i+=32) { > + // Load ... 2*[int8*16] > + __m128i in8_1 = sse_load_si128((__m128i_u*)&buf[i]); > + __m128i in8_2 = sse_load_si128((__m128i_u*)&buf[i + 16]); > + > + // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... > 2*[int16*8] > + // Fastest, even though multiply by 1 > + __m128i mul_one = _mm_set1_epi8(1); > + __m128i add16_1 = sse_m...
2020 May 18
3
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
What do you base this on? Per https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html : "For the x86-32 compiler, you must use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default." That reads to me like we're fine for SSE2. As stated in my comments, SSSE3 support must be
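The reply above distinguishes SSE2, which the quoted GCC documentation says is on by default for x86-64, from SSSE3, which is not. A minimal sketch of how code might branch on that at compile time follows. It relies on the `__SSE2__` and `__SSSE3__` macros that GCC and Clang predefine when the corresponding instruction sets are enabled. The names `simd_level_name` and `sum_bytes` are hypothetical and are not taken from the patch, and `sum_bytes` is a plain unsigned byte sum used for illustration, not rsync's get_checksum1().

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: report which instruction set the compiler was
 * allowed to target, based on GCC/Clang predefined macros. */
const char *simd_level_name(void)
{
#if defined(__SSSE3__)
    return "SSSE3";   /* only defined with e.g. -mssse3 or a suitable -march */
#elif defined(__SSE2__)
    return "SSE2";    /* defined by default when targeting x86-64 */
#else
    return "scalar";
#endif
}

/* Hypothetical example: sum of unsigned bytes, using SSE2 when available
 * and a scalar loop otherwise (and for the tail). */
uint32_t sum_bytes(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;

#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = _mm_setzero_si128();

    /* i + 16 <= len avoids unsigned underflow when len < 16. */
    for (; i + 16 <= len; i += 16) {
        /* Unaligned load of 16 bytes. */
        __m128i v = _mm_loadu_si128((const __m128i *)&buf[i]);
        /* Sum of absolute differences against zero gives two 64-bit
         * partial sums, one per 8-byte half. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(v, zero));
    }
    /* Fold the upper 64-bit lane onto the lower one. */
    acc = _mm_add_epi64(acc, _mm_srli_si128(acc, 8));
    sum = (uint32_t)_mm_cvtsi128_si32(acc);
#endif

    /* Scalar tail, or the whole buffer when SSE2 is unavailable. */
    for (; i < len; i++)
        sum += buf[i];

    return sum;
}
```

Compiled with default flags for x86-64, `simd_level_name()` would be expected to report SSE2. Reporting SSSE3 requires passing something like `-mssse3` explicitly, which matches the point made in the reply.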