search for: cvtps2dq

Displaying 3 results from an estimated 3 matches for "cvtps2dq".

2004 Aug 06
2
[PATCH] Make SSE Run Time option.
> Please note that dot products of simple vector floats are usually faster in the scalar units. The add across and transfer to scalar is just too expensive.
Or do four at once, with some shuffling (which is basically free); it is almost the same code as a 4x4 matrix/vector multiply.
Segher
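As a rough illustration of the "four at once" idea, the sketch below computes four 4-element dot products together using SSE intrinsics: multiply the four vector pairs, transpose the four product registers with shuffles, then add them vertically. The function name `dot4x4` and the flat 16-float layout are assumptions made for this example, not code from the thread, and the scalar branch exists only so the sketch still compiles on non-x86 targets.

```c
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE: __m128, _mm_mul_ps, _MM_TRANSPOSE4_PS */
#endif

/* Hypothetical helper: a and b each hold four 4-float vectors back to back.
 * Writes out[i] = dot(a[4*i .. 4*i+3], b[4*i .. 4*i+3]) for i = 0..3. */
void dot4x4(const float *a, const float *b, float *out)
{
#if defined(__SSE__)
    /* Element-wise products of each vector pair (mulps). */
    __m128 p0 = _mm_mul_ps(_mm_loadu_ps(a + 0),  _mm_loadu_ps(b + 0));
    __m128 p1 = _mm_mul_ps(_mm_loadu_ps(a + 4),  _mm_loadu_ps(b + 4));
    __m128 p2 = _mm_mul_ps(_mm_loadu_ps(a + 8),  _mm_loadu_ps(b + 8));
    __m128 p3 = _mm_mul_ps(_mm_loadu_ps(a + 12), _mm_loadu_ps(b + 12));

    /* Shuffle-based transpose: afterwards lane i of each register holds
     * one of the four products that belong to dot product i. */
    _MM_TRANSPOSE4_PS(p0, p1, p2, p3);

    /* Vertical adds (addps) replace the expensive "add across". */
    _mm_storeu_ps(out, _mm_add_ps(_mm_add_ps(p0, p1), _mm_add_ps(p2, p3)));
#else
    /* Scalar fallback (assumption: only here for non-SSE builds). */
    for (int i = 0; i < 4; i++) {
        out[i] = a[4 * i] * b[4 * i] + a[4 * i + 1] * b[4 * i + 1]
               + a[4 * i + 2] * b[4 * i + 2] + a[4 * i + 3] * b[4 * i + 3];
    }
#endif
}
```

Calling `dot4x4(a, b, out)` with two 16-float arrays fills `out` with the four results. No horizontal add or transfer to a scalar register is needed, which is the cost the quoted message warns about.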
2004 Aug 06
2
[PATCH] Make SSE Run Time option.
Hi Jean-Marc, I think there is just some confusion over terminology here. I agree that support for the base version of 3DNow may not necessarily be relevant. However, even though extended 3DNow is a bastardized version of SSE, it still supports the same instructions, and that is what is important. I don't think we intend to add any AMD-specific code. The real issue is cross CPU SSE support,
2004 Aug 06
5
[PATCH] Make SSE Run Time option.
...thout the need to permute and separate the add and subtract. The really useful instruction here is the "addsubps".
> I find it hard to believe you will never need SSE2. There are some instructions that are legitimately useful for single-precision floating-point work, such as cvtps2dq and cvttps2dq.
There are so few conversions in Speex in the first place that it's not even worth bothering with that. You get all the gain from just addps and mulps (and the "glue instructions" that allow you to use them, like movaps and shufps).
Jean-Marc
-- Jean-Marc Valin, M.Sc....
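For readers unfamiliar with the two SSE2 conversions named in the quoted text, here is a minimal sketch of how they differ: cvtps2dq (intrinsic `_mm_cvtps_epi32`) rounds according to the current MXCSR rounding mode, which is round-to-nearest-even by default, while cvttps2dq (intrinsic `_mm_cvttps_epi32`) always truncates toward zero. The function name `round_vs_truncate` is made up for this example, and the scalar branch is an assumed stand-in for non-x86 builds that only matches for small magnitudes under the default rounding mode.

```c
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2: _mm_cvtps_epi32, _mm_cvttps_epi32 */
#endif

/* Hypothetical helper: converts four floats to ints in both ways. */
void round_vs_truncate(const float *in, int *rounded, int *truncated)
{
#if defined(__SSE2__)
    __m128 v = _mm_loadu_ps(in);
    /* cvtps2dq: uses the MXCSR rounding mode (nearest-even by default). */
    _mm_storeu_si128((__m128i *)rounded, _mm_cvtps_epi32(v));
    /* cvttps2dq: always truncates toward zero. */
    _mm_storeu_si128((__m128i *)truncated, _mm_cvttps_epi32(v));
#else
    /* Scalar stand-in (assumption): valid only for |x| < 2^22 and the
     * default round-to-nearest-even floating-point mode. */
    for (int i = 0; i < 4; i++) {
        /* Adding 1.5 * 2^23 forces rounding to an integer in float. */
        volatile float biased = in[i] + 12582912.0f;
        rounded[i] = (int)(biased - 12582912.0f);
        truncated[i] = (int)in[i];   /* C casts truncate toward zero */
    }
#endif
}
```

With the input `{1.5f, 2.5f, -1.7f, 0.4f}` and the default rounding mode, the rounded results are `{2, 2, -2, 0}` and the truncated results are `{1, 2, -1, 0}`. The two ties show the nearest-even behaviour: 1.5 goes up to 2 while 2.5 goes down to 2.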