thr3ads.net - search: "mul32"

[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64

2013 Jul 31

1

Thanks Tom. I really appreciate your insight. I'm able to use the Custom action to get the 64-bit multiply to go to a subroutine, and for the 32-bit multiply I generate XXXISD::MUL32. I'm not sure, then, what you mean by "overriding" ReplaceNodeResults. For ReplaceNodeResults, I'm doing: SDValue Res = LowerOperation(SDValue(N, 0), DAG); for (unsigned I = 0, E = Res->getNumValues(); I != E; ++I) Results.push_back(Res.getValue(I)); I did have...

[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64

2013 Jul 31

0

...Custom", you should be able to interfere in the type legalisation phase (before it gets promoted to a 64-bit MUL) by overriding the "ReplaceNodeResults" function. You could either expand it to a different libcall directly there, or replace it with a target-specific node (say XXXISD::MUL32) which claims to take i64 types but you really know is the 32-bit multiply. Then you'd have to take care of that node elsewhere, of course. Cheers. Tim.
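The behaviour such a node would need can be sketched in plain C. This is only a model of the arithmetic described in the thread (a 64-bit result computed from the low 32 bits of two 64-bit operands), not LLVM code: the names mul32_node and mul_i32 are hypothetical, and treating the low halves as unsigned is an assumption, since the thread does not say whether the multiply is signed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a target node like XXXISD::MUL32: operands and
   result are 64-bit values, but only the low 32 bits of each operand
   take part in the multiply. Not an LLVM API. */
uint64_t mul32_node(uint64_t a, uint64_t b)
{
    uint64_t lo_a = a & 0xFFFFFFFFu;   /* low 32 bits of a */
    uint64_t lo_b = b & 0xFFFFFFFFu;   /* low 32 bits of b */
    return lo_a * lo_b;                /* at most (2^32-1)^2, fits in 64 bits */
}

/* The original i32 multiply is the low 32 bits of the node's result.
   The low half of the product is the same for signed and unsigned inputs. */
uint32_t mul_i32(uint32_t a, uint32_t b)
{
    return (uint32_t)mul32_node(a, b);
}

/* Small self-check showing that the upper operand bits are ignored. */
void mul32_check(void)
{
    assert(mul32_node(0xFFFFFFFF00000003ULL, 0x1234567800000005ULL) == 15);
    assert(mul_i32(0xFFFFFFFFu, 2u) == 0xFFFFFFFEu);
}
```

Because the upper 32 bits of each operand are ignored, whatever the promotion step leaves in them does not affect the result, which is why a node like this can claim i64 operand types safely.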

[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020 May 19

5

.../ s2 += 32*s1 + ss2 = _mm_add_epi32(ss2, _mm_slli_epi32(ss1, 5)); + + // [sum(t1[0]..t1[6]), X, X, X] [int32*4]; faster than multiple _mm_hadds_epi16 + // Shifting left, then shifting right again and shuffling (rather than just + // shifting right as with mul32 below) to cheaply end up with the correct sign + // extension as we go from int16 to int32. + __m128i sum_add32 = _mm_add_epi16(add16_1, add16_2); + sum_add32 = _mm_add_epi16(sum_add32, _mm_slli_si128(sum_add32, 2)); + sum_add32 = _mm_add_epi16(sum_add32,...
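The first line of the excerpt, which the patch comment labels as s2 += 32*s1, can be modelled in scalar C. This sketch assumes four 32-bit lanes per __m128i and uses hypothetical names (add_shifted_lanes, LANES); it illustrates only the shift-and-add step and is not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4  /* an __m128i holds four 32-bit lanes */

/* Scalar model of ss2 = _mm_add_epi32(ss2, _mm_slli_epi32(ss1, 5)):
   each lane of ss1 is shifted left by 5 bits (multiplied by 32,
   modulo 2^32) and added to the matching lane of ss2. */
void add_shifted_lanes(uint32_t ss2[LANES], const uint32_t ss1[LANES])
{
    for (int i = 0; i < LANES; ++i)
        ss2[i] += ss1[i] << 5;
}

/* Small self-check, including a lane whose shifted value wraps. */
void add_shifted_lanes_check(void)
{
    uint32_t ss1[LANES] = {1, 2, 3, 0x08000001u};
    uint32_t ss2[LANES] = {10, 20, 30, 40};
    add_shifted_lanes(ss2, ss1);
    assert(ss2[0] == 10 + 32 * 1);
    assert(ss2[1] == 20 + 32 * 2);
    assert(ss2[2] == 30 + 32 * 3);
    assert(ss2[3] == 40 + 32);  /* high bit of 0x08000001 << 5 is discarded */
}
```

The shift stands in for a multiply by 32 in every lane at once, so no vector multiply is needed for this step.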

[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64

2013 Jul 31

2

...t integers > > > > Problem: > > > > MUL on i32 is getting promoted to MUL on i64 > > > > MUL on i64 is getting expanded to a library call in compiler-rt > > > > > > Can you fix this by marking i64 MUL as Legal? > > > the problem is that MUL32 gets promoted and then converted into a > > subroutine call because it is now type i64, even though I want the MUL > I32 > > to remain as an operation in the architecture. MUL i32 would generate a > > 64-bit results from the lower 32-bit portions of 64-bit source operands. >...

[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020 May 18

6

.../ s2 += 32*s1 + ss2 = _mm_add_epi32(ss2, _mm_slli_epi32(ss1, 5)); + + // [sum(t1[0]..t1[6]), X, X, X] [int32*4]; faster than multiple _mm_hadds_epi16 + // Shifting left, then shifting right again and shuffling (rather than just + // shifting right as with mul32 below) to cheaply end up with the correct sign + // extension as we go from int16 to int32. + __m128i sum_add32 = _mm_add_epi16(add16_1, add16_2); + sum_add32 = _mm_add_epi16(sum_add32, _mm_slli_si128(sum_add32, 2)); + sum_add32 = _mm_add_epi16(sum_add32,...

[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020 May 18

0

...= _mm_add_epi32(ss2, _mm_slli_epi32(ss1, 5)); > + > + // [sum(t1[0]..t1[6]), X, X, X] [int32*4]; faster than > multiple _mm_hadds_epi16 > + // Shifting left, then shifting right again and shuffling > (rather than just > + // shifting right as with mul32 below) to cheaply end up > with the correct sign > + // extension as we go from int16 to int32. > + __m128i sum_add32 = _mm_add_epi16(add16_1, add16_2); > + sum_add32 = _mm_add_epi16(sum_add32, > _mm_slli_si128(sum_add32, 2)); > + sum_ad...

[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020 May 18

2

...m_slli_epi32(ss1, 5)); >> + >> + // [sum(t1[0]..t1[6]), X, X, X] [int32*4]; faster than >> multiple _mm_hadds_epi16 >> + // Shifting left, then shifting right again and shuffling >> (rather than just >> + // shifting right as with mul32 below) to cheaply end up >> with the correct sign >> + // extension as we go from int16 to int32. >> + __m128i sum_add32 = _mm_add_epi16(add16_1, add16_2); >> + sum_add32 = _mm_add_epi16(sum_add32, _mm_slli_si128(sum_add32, 2)); >> +...

[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020 May 20

0

...= _mm_add_epi32(ss2, _mm_slli_epi32(ss1, 5)); > + > + // [sum(t1[0]..t1[6]), X, X, X] [int32*4]; faster than > multiple _mm_hadds_epi16 > + // Shifting left, then shifting right again and shuffling > (rather than just > + // shifting right as with mul32 below) to cheaply end up > with the correct sign > + // extension as we go from int16 to int32. > + __m128i sum_add32 = _mm_add_epi16(add16_1, add16_2); > + sum_add32 = _mm_add_epi16(sum_add32, _mm_slli_si128(sum_add32, 2)); > + sum_add32 =...

[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020 May 18

3

What do you base this on? Per https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html : "For the x86-32 compiler, you must use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default." That reads to me like we're fine for SSE2. As stated in my comments, SSSE3 support must be

[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64

2013 Jul 30

3

I'll try to run through the scenario: a target with a 64-bit register type (all registers have 64 bits); all 32-bit integers are getting promoted to 64-bit integers. Problem: MUL on i32 is getting promoted to MUL on i64, and MUL on i64 is getting expanded to a library call in compiler-rt. The problem is that MUL32 gets promoted and then converted into a subroutine call because it is now type i64, even though I want the MUL i32 to remain as an operation in the architecture. MUL i32 would generate a 64-bit result from the lower 32-bit portions of the 64-bit source operands. In customize for the operations, I am...

[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64

2013 Jul 30

0

...> > all 32-bits are getting promoted to 64-bit integers > > Problem: > > MUL on i32 is getting promoted to MUL on i64 > > MUL on i64 is getting expanded to a library call in compiler-rt > > Can you fix this by marking i64 MUL as Legal? > the problem is that MUL32 gets promoted and then converted into a > subroutine call because it is now type i64, even though I want the MUL I32 > to remain as an operation in the architecture. MUL i32 would generate a > 64-bit results from the lower 32-bit portions of 64-bit source operands. > > In customize...
