search for: cvttps2dq

Displaying 4 results from an estimated 4 matches for "cvttps2dq".

2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...0x2004c8(%rip),%xmm0 # 6009c0 <x>
4004f8: vpsrld $0x17,%xmm0,%xmm0
4004fd: vpaddd 0x17b(%rip),%xmm0,%xmm0 # 400680 <__dso_handle+0x8>
400505: vcvtdq2ps %xmm0,%xmm1
400509: vdivps 0x17f(%rip),%xmm1,%xmm1 # 400690 <__dso_handle+0x18>
400511: vcvttps2dq %xmm1,%xmm1
400515: vpmullw 0x183(%rip),%xmm1,%xmm1 # 4006a0 <__dso_handle+0x28>
40051d: vpsubd %xmm1,%xmm0,%xmm0
400521: vmovq %xmm0,%rax
400526: movslq %eax,%rcx
400529: sar $0x20,%rax
40052d: vpextrq $0x1,%xmm0,%rdx
400533: movslq %edx,%rsi
400536:...
2004 Aug 06
2
[PATCH] Make SSE Run Time option.
> Please note that dot products of simple vector floats are usually faster
> in the scalar units. The add across and transfer to scalar is just too
> expensive.
Or do four at once, with some shuffling (which is basically free); almost the same code as a 4x4 matrix/vector multiply.
Segher
2004 Aug 06
2
[PATCH] Make SSE Run Time option.
Hi Jean-Marc, I think there is just a confusion over terminology going on here. I agree that support for the 3DNow base version may not necessarily be relevant. However, even though 3DNow extended is a bastardized version of SSE, it still supports the same instructions, and that is what is important. I don't think we intend to add any AMD-specific code. The real issue is cross-CPU SSE support,
2004 Aug 06
5
[PATCH] Make SSE Run Time option.
...d to permute and separate the add and subtract. The really useful instruction here is the "addsubps".
> I find it hard to believe you will never need SSE2. There are some
> instructions that are legitimately useful to single precision floating
> point work, such as cvtps2dq and cvttps2dq.
There are so few conversions in Speex in the first place that it's not even bothering with that. You get all the gain from just addps and mulps (and the "glue instructions" that let you use them, like movaps and shufps).
Jean-Marc -- Jean-Marc Valin, M.Sc.A., ing. jr. L...
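
For readers unfamiliar with the two conversions named in the quoted text: cvttps2dq converts packed single-precision floats to 32-bit integers by truncating toward zero, while cvtps2dq rounds according to the current rounding mode (round-to-nearest-even by default). The following is only a portable scalar sketch of that per-lane behavior, not code from the thread. The function names are hypothetical, the default rounding mode is assumed, and inputs are assumed to be small enough (well within the int32 range) for the arithmetic to be exact.

```c
#include <stdint.h>

/* Hypothetical helper: models one lane of cvttps2dq.
 * A C float-to-int cast truncates toward zero. */
static int32_t convert_truncate(float x) {
    return (int32_t)x;
}

/* Hypothetical helper: models one lane of cvtps2dq under the default
 * rounding mode (round to nearest, ties to even).
 * Assumes |x| is small enough that x - (float)t is exact. */
static int32_t convert_round(float x) {
    int32_t t = (int32_t)x;       /* integer part, toward zero */
    float frac = x - (float)t;    /* fractional part, in (-1, 1) */
    int odd = (t % 2) != 0;

    if (frac > 0.5f || (frac == 0.5f && odd)) {
        return t + 1;             /* round up, ties go to the even value */
    }
    if (frac < -0.5f || (frac == -0.5f && odd)) {
        return t - 1;             /* round down, ties go to the even value */
    }
    return t;
}

/* Apply both conversions to four lanes, mirroring one packed register. */
static void convert4(const float in[4], int32_t truncated[4], int32_t rounded[4]) {
    for (int i = 0; i < 4; ++i) {
        truncated[i] = convert_truncate(in[i]);
        rounded[i] = convert_round(in[i]);
    }
}
```

The two results differ only when the fractional part is at least one half in magnitude, which is why code that wants C-style cast semantics uses the truncating form.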