Displaying 3 results from an estimated 3 matches for "_mm_setr_ps".
2004 Aug 06
3
[PATCH] Make SSE Run Time option.
...int ord, float *_mem)
{
__m128 num[3], den[3], mem[3];
int i;
/* Copy numerator, denominator and memory to aligned xmm */
for (i=0;i<2;i++)
{
mem[i] = _mm_loadu_ps(_mem+4*i);
num[i] = _mm_loadu_ps(_num+4*i+1);
den[i] = _mm_loadu_ps(_den+4*i+1);
}
mem[2] = _mm_setr_ps(_mem[8], _mem[9], 0, 0);
num[2] = _mm_setr_ps(_num[9], _num[10], 0, 0);
den[2] = _mm_setr_ps(_den[9], _den[10], 0, 0);
for (i=0;i<N;i++)
{
__m128 xx;
__m128 yy;
/* Compute next filter result */
xx = _mm_load_ps1(x+i);
yy = _mm_add_ss(xx, mem[0]);...
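The packing step in the excerpt above (two unaligned loads plus a zero-padded tail) depends on `_mm_setr_ps` placing its first argument in the lowest lane. A minimal sketch of that pattern follows, assuming a 10-element input as the `mem` indexing suggests; `load10` and `lane` are hypothetical helper names that do not appear in the patch.

```c
#include <xmmintrin.h>

/* Hypothetical helper (not from the patch): pack 10 floats into three
 * SSE registers, zero-padding the last one, as the excerpt does for mem. */
static void load10(const float *src, __m128 out[3])
{
    int i;
    for (i = 0; i < 2; i++)
        out[i] = _mm_loadu_ps(src + 4 * i);   /* unaligned 4-float loads */
    /* _mm_setr_ps takes its arguments in memory order: the first
     * argument ends up in lane 0, the last in lane 3. */
    out[2] = _mm_setr_ps(src[8], src[9], 0.0f, 0.0f);
}

/* Hypothetical helper: read one lane back by storing to memory. */
static float lane(__m128 v, int i)
{
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    return tmp[i];
}
```

With `_mm_set_ps` the same arguments would have to be written in reverse order to get the same layout.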
2004 Aug 06
5
[PATCH] Make SSE Run Time option.
> Personally, I don't think much of PNI. The complex arithmetic stuff they
> added sets you up for a lot of permute overhead that is inefficient --
> especially on a processor that is already weak on permute. In my opinion,
Actually, the new instructions make it possible to do complex multiplies
without the need to permute and separate the add and subtract. The really useful
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...x),
_mm_castsi128_ps(_mm_set1_epi32(0x7f << 23)));
const __m128 exp = _mm_cvtepi32_ps(iexp);
const __m128i quot = _mm_cvttps_epi32(_mm_div_ps(exp, _mm_set1_ps(3.f)));
const __m128i rem = _mm_sub_epi32(iexp, _mm_mullo_epi16(quot,
_mm_set1_epi32(0x10003)));
const __m128 entry = _mm_setr_ps( // 'rem' gets spilled depending on version of lane extractor used
table[geti(rem, 0)],
table[geti(rem, 1)],
table[geti(rem, 2)],
table[geti(rem, 3)]);
return _mm_set1_ps(.5f) * entry;
}
int main(int argc, char** argv)
{
r[0] = testee(x[0]);
return 0;
}
In the above function 'testee...
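The lookup pattern in the excerpt, four scalar table reads gathered into one vector with `_mm_setr_ps`, can be sketched as below. The definition of `geti` is not shown in the excerpt, so `geti_sketch` is a hypothetical stand-in that extracts a lane by storing the vector to memory; it is only one possible lane extractor and not necessarily either of the versions the thread compares. `gather4` is likewise an illustrative name.

```c
#include <emmintrin.h>

/* Hypothetical lane extractor (assumption: the thread's geti is not shown).
 * Stores the whole vector and reads back the requested 32-bit lane. */
static int geti_sketch(__m128i v, int lane)
{
    int tmp[4];
    _mm_storeu_si128((__m128i *)tmp, v);
    return tmp[lane];
}

/* Hypothetical helper: build a float vector from four table lookups,
 * lane i receiving table[idx lane i]. */
static __m128 gather4(const float *table, __m128i idx)
{
    return _mm_setr_ps(table[geti_sketch(idx, 0)],
                       table[geti_sketch(idx, 1)],
                       table[geti_sketch(idx, 2)],
                       table[geti_sketch(idx, 3)]);
}
```

How the compiler lowers the extraction (through memory, shuffles, or dedicated extract instructions) is the kind of detail that can change register pressure around the gather.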