Displaying 6 results from an estimated 6 matches for "in8_2".
2020 May 19
5
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...oad_si128((__m128i_u*)mul_t1_buf);
+ __m128i ss1 = _mm_setzero_si128();
+ __m128i ss2 = _mm_setzero_si128();
+
+ for (i = 0; i < (len-32); i+=32) {
+ // Load ... 2*[int8*16]
+ __m128i in8_1 = sse_load_si128((__m128i_u*)&buf[i]);
+ __m128i in8_2 = sse_load_si128((__m128i_u*)&buf[i + 16]);
+
+ // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... 2*[int16*8]
+ // Fastest, even though multiply by 1
+ __m128i mul_one = _mm_set1_epi8(1);
+ __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1);...
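The multiply-by-one step in the snippet above can be modeled in plain C. This is only a sketch: it assumes that sse_maddubs_epi16 in the patch behaves like the SSSE3 intrinsic _mm_maddubs_epi16 (unsigned bytes from the first operand times signed bytes from the second, with adjacent products added into 16-bit lanes). The helper name pairwise_sum16 is hypothetical and does not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model, not taken from the patch.
 * Assumes in8 holds 16 signed bytes and every multiplier is 1,
 * so each 16-bit output lane is the sum of two adjacent input bytes. */
static void pairwise_sum16(const int8_t in8[16], int16_t out16[8])
{
    assert(in8 != NULL && out16 != NULL);
    for (size_t j = 0; j < 8; j++) {
        /* 1*in8[2j] + 1*in8[2j+1]; the result lies in -256..254,
         * so the saturation of the real instruction never triggers. */
        out16[j] = (int16_t)(in8[2 * j] + in8[2 * j + 1]);
    }
}
```

Calling pairwise_sum16 on each 16-byte half of a 32-byte block gives the two groups of eight 16-bit sums that the comment in the patch writes as 2*[int16*8].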
2020 May 18
6
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...d_si128((void const*)mul_t1_buf);
+ __m128i ss1 = _mm_setzero_si128();
+ __m128i ss2 = _mm_setzero_si128();
+
+ for (i = 0; i < (len-32); i+=32) {
+ // Load ... 2*[int8*16]
+ __m128i in8_1 = sse_load_si128((void const*)&buf[i]);
+ __m128i in8_2 = sse_load_si128((void const*)&buf[i + 16]);
+
+ // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... 2*[int16*8]
+ // Fastest, even though multiply by 1
+ __m128i mul_one = _mm_set1_epi8(1);
+ __m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1);...
2020 May 18
0
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...> + __m128i ss1 = _mm_setzero_si128();
> + __m128i ss2 = _mm_setzero_si128();
> +
> + for (i = 0; i < (len-32); i+=32) {
> + // Load ... 2*[int8*16]
> + __m128i in8_1 = sse_load_si128((void const*)&buf[i]);
> + __m128i in8_2 = sse_load_si128((void const*)&buf[i + 16]);
> +
> + // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... 2*[int16*8]
> + // Fastest, even though multiply by 1
> + __m128i mul_one = _mm_set1_epi8(1);
> + __m128i add16_1 = sse_...
2020 May 18
2
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...= _mm_setzero_si128();
>> + __m128i ss2 = _mm_setzero_si128();
>> +
>> + for (i = 0; i < (len-32); i+=32) {
>> + // Load ... 2*[int8*16]
>> + __m128i in8_1 = sse_load_si128((void const*)&buf[i]);
>> + __m128i in8_2 = sse_load_si128((void const*)&buf[i + 16]);
>> +
>> + // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... 2*[int16*8]
>> + // Fastest, even though multiply by 1
>> + __m128i mul_one = _mm_set1_epi8(1);
>> +...
2020 May 20
0
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
...> + __m128i ss1 = _mm_setzero_si128();
> + __m128i ss2 = _mm_setzero_si128();
> +
> + for (i = 0; i < (len-32); i+=32) {
> + // Load ... 2*[int8*16]
> + __m128i in8_1 = sse_load_si128((__m128i_u*)&buf[i]);
> + __m128i in8_2 = sse_load_si128((__m128i_u*)&buf[i + 16]);
> +
> + // (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... 2*[int16*8]
> + // Fastest, even though multiply by 1
> + __m128i mul_one = _mm_set1_epi8(1);
> + __m128i add16_1 = sse_m...
2020 May 18
3
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
What do you base this on?
Per https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html :
"For the x86-32 compiler, you must use -march=cpu-type, -msse or
-msse2 switches to enable SSE extensions and make this option
effective. For the x86-64 compiler, these extensions are enabled by
default."
That reads to me like we're fine for SSE2. As stated in my comments,
SSSE3 support must be
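
One way to check what a given build actually enables is to look at the compiler's predefined macros. The sketch below assumes GCC or Clang, which define __SSE2__ and __SSSE3__ when the corresponding instruction sets are enabled at compile time. The helper names are hypothetical and do not come from the thread or the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helpers, not from the thread: report which SIMD
 * extensions the compiler was allowed to emit for this translation
 * unit, based on GCC/Clang predefined macros. */
static int built_with_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

static int built_with_ssse3(void)
{
#ifdef __SSSE3__
    return 1;
#else
    return 0;
#endif
}

static void print_simd_build_flags(void)
{
    printf("SSE2 enabled at compile time:  %d\n", built_with_sse2());
    printf("SSSE3 enabled at compile time: %d\n", built_with_ssse3());
}
```

On an x86-64 GCC build with default flags, the first line would be expected to report 1, in line with the documentation quoted above. The second line depends on options such as -mssse3 or a suitable -march setting. Note that this only reflects compile-time settings, not what the CPU running the binary supports.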