Displaying 4 results from an estimated 4 matches for "cvttps2dq".
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...0x2004c8(%rip),%xmm0 # 6009c0 <x>
4004f8: vpsrld $0x17,%xmm0,%xmm0
4004fd: vpaddd 0x17b(%rip),%xmm0,%xmm0 # 400680 <__dso_handle+0x8>
400505: vcvtdq2ps %xmm0,%xmm1
400509: vdivps 0x17f(%rip),%xmm1,%xmm1 # 400690 <__dso_handle+0x18>
400511: vcvttps2dq %xmm1,%xmm1
400515: vpmullw 0x183(%rip),%xmm1,%xmm1 # 4006a0 <__dso_handle+0x28>
40051d: vpsubd %xmm1,%xmm0,%xmm0
400521: vmovq %xmm0,%rax
400526: movslq %eax,%rcx
400529: sar $0x20,%rax
40052d: vpextrq $0x1,%xmm0,%rdx
400533: movslq %edx,%rsi
400536:...
2004 Aug 06
2
[PATCH] Make SSE Run Time option.
> Please note that dot products of simple vector floats are usually faster
> in the scalar units. The add across and transfer to scalar is just too
> expensive.
Or do four at once, with some shuffling (which is basically free);
almost the same code as a 4x4 matrix/vector multiply.
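
A minimal sketch of the "four at once" idea, not code from the thread: multiply four pairs of 4-float vectors element-wise, transpose the four product vectors (the shuffling step), then add the rows vertically so each lane ends up holding one complete dot product. The helper name dot4x4 and the flat 16-float layout are assumptions made for illustration, and a scalar fallback is used when SSE is not available.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: a and b each hold four consecutive 4-float vectors.
 * Computes out[i] = dot(a[4*i .. 4*i+3], b[4*i .. 4*i+3]) for i = 0..3. */
static void dot4x4(const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE__)
    /* Four element-wise products, one mulps each. */
    __m128 p0 = _mm_mul_ps(_mm_loadu_ps(a + 0),  _mm_loadu_ps(b + 0));
    __m128 p1 = _mm_mul_ps(_mm_loadu_ps(a + 4),  _mm_loadu_ps(b + 4));
    __m128 p2 = _mm_mul_ps(_mm_loadu_ps(a + 8),  _mm_loadu_ps(b + 8));
    __m128 p3 = _mm_mul_ps(_mm_loadu_ps(a + 12), _mm_loadu_ps(b + 12));

    /* Shuffle step: after the transpose, row j holds term j of all
     * four dot products. */
    _MM_TRANSPOSE4_PS(p0, p1, p2, p3);

    /* Three vertical addps finish all four sums; no add across lanes. */
    _mm_storeu_ps(out, _mm_add_ps(_mm_add_ps(p0, p1), _mm_add_ps(p2, p3)));
#else
    /* Scalar fallback for targets without SSE. */
    for (int i = 0; i < 4; i++) {
        float s = 0.0f;
        for (int j = 0; j < 4; j++)
            s += a[4 * i + j] * b[4 * i + j];
        out[i] = s;
    }
#endif
}
```

Calling dot4x4(a, b, out) with two 16-float arrays fills out[0..3] with the four dot products. The cost of the transpose is shared by four results, which is why this layout avoids the per-result add across and transfer to scalar that the quoted text objects to.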
Segher
2004 Aug 06
2
[PATCH] Make SSE Run Time option.
Hi Jean Marc,
I think there is just some confusion over terminology going on here. I agree
that support for the base version of 3DNow! may not necessarily be relevant.
However, even though extended 3DNow! is a bastardized version of SSE, it still
supports the same instructions, and that is what is important. I don't think
we intend to add any AMD-specific code.
The real issue is cross CPU SSE support,
2004 Aug 06
5
[PATCH] Make SSE Run Time option.
...d to permute and separate the add and subtract. The
really useful instruction here is the "addsubps".
> I find it hard to believe you will never need SSE2. There are some
> instructions that are legitimately useful to single precision floating
> point work, such as cvtps2dq and cvttps2dq.
There are so few conversions in Speex in the first place that it's not
even worth bothering with that. You get all the gain from just addps and
mulps (and the "glue instructions" that allow you to use them, like movaps
and shufps).
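
As a rough illustration of that point, and not code taken from Speex, here is a scale-and-accumulate loop that needs only SSE1: one mulps and one addps per four samples, plus loads, stores, and a broadcast as glue. The function name scale_accumulate is hypothetical. Unaligned loads are used so the sketch works on any buffer; with 16-byte-aligned data the aligned forms (movaps) would apply.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: y[i] += gain * x[i] for i in [0, n). */
static void scale_accumulate(float *y, const float *x, float gain, size_t n)
{
    assert(y != NULL && x != NULL);
    size_t i = 0;
#if defined(__SSE__)
    /* Broadcast the gain to all four lanes (usually compiled to a shuffle). */
    const __m128 g = _mm_set1_ps(gain);
    for (; i + 4 <= n; i += 4) {
        __m128 xv = _mm_loadu_ps(x + i);          /* load 4 inputs  */
        __m128 yv = _mm_loadu_ps(y + i);          /* load 4 outputs */
        yv = _mm_add_ps(yv, _mm_mul_ps(g, xv));   /* mulps + addps  */
        _mm_storeu_ps(y + i, yv);
    }
#endif
    /* Scalar tail, and the whole loop on targets without SSE. */
    for (; i < n; i++)
        y[i] += gain * x[i];
}
```

No float-to-integer conversion appears anywhere in the loop, so nothing here depends on the SSE2 conversion instructions mentioned in the quoted text.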
Jean-Marc
--
Jean-Marc Valin, M.Sc.A., ing. jr.
L...