search for: pmaddwd

Displaying 6 results from an estimated 6 matches for "pmaddwd".

2006 May 25
2
Compilation issues with s390
Hi all, I'm trying to compile Asterisk on the mainframe (s390 / s390x) and I am running into issues. Could somebody give me a hand? I think I should be able to do this; I have noticed that Debian even has binary RPMs out for Asterisk now. I'm trying to do this on SuSE SLES8 (with the 2.4 kernel). What I see is an issue that arch=s390 isn't
2004 Aug 24
5
MMX/mmxext optimisations
Quite some speed improvement indeed. Attached is the updated patch to apply to svn/trunk. j [Non-text attachment scrubbed. Name: theora-mmx.patch.gz; Type: application/x-gzip; Size: 8648 bytes; Url: http://lists.xiph.org/pipermail/theora-dev/attachments/20040824/5a5f2731/theora-mmx.patch-0001.bin]
2015 Nov 25
2
[RFC] Introducing a vector reduction add instruction.
...reduction phi node, and this is usually true as long as a reduction loop can be vectorized). For example, if we let the result be [s0+s1, 0, s2+s3, 0] or [0, 0, s0+s1+s2+s3, 0], the reduction result won't change. This enables us to detect SAD or dot-product patterns and use SSE's psadbw and pmaddwd instructions. Please see my response to your other email for more details. Thanks! Cong > > Thanks again, > Hal > > ----- Original Message ----- >> From: "Cong Hou via llvm-dev" <llvm-dev at lists.llvm.org> >> To: "llvm-dev" <llvm-dev at l...
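The lane-placement claim in this snippet can be checked with a small scalar sketch. It is only an illustration: the function names and the sample values are hypothetical, and Python lists stand in for 4-lane vector registers. It is not the LLVM implementation discussed in the thread.

```python
# Sketch: a reduction-add only cares about the total across lanes, so the
# per-iteration partial sums may be placed in any lane of the accumulator.
# All names below are hypothetical and exist only for this example.

def layout_plain(s0, s1, s2, s3):
    return [s0, s1, s2, s3]

def layout_paired(s0, s1, s2, s3):
    # Shape produced by a pmaddwd-style pairwise add: [s0+s1, 0, s2+s3, 0]
    return [s0 + s1, 0, s2 + s3, 0]

def layout_collapsed(s0, s1, s2, s3):
    # Shape produced by a psadbw-style full add: [0, 0, s0+s1+s2+s3, 0]
    return [0, 0, s0 + s1 + s2 + s3, 0]

def reduce_add(chunks, layout):
    acc = [0, 0, 0, 0]                      # vector accumulator (the "phi")
    for s0, s1, s2, s3 in chunks:
        step = layout(s0, s1, s2, s3)
        acc = [x + y for x, y in zip(acc, step)]
    return sum(acc)                         # final horizontal reduction

chunks = [(1, 2, 3, 4), (5, 6, 7, 8)]      # made-up sample data
results = [reduce_add(chunks, f)
           for f in (layout_plain, layout_paired, layout_collapsed)]
```

All three layouts give the same total, which is the freedom the snippet relies on when it mentions matching psadbw and pmaddwd.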
2015 Nov 25
2
[RFC] Introducing a vector reduction add instruction.
> ...this is usually true as long as a reduction loop can be vectorized). For example, if we let the result be [s0+s1, 0, s2+s3, 0] or [0, 0, s0+s1+s2+s3, 0], the reduction result won't change. This enables us to detect SAD or dot-product patterns and use SSE's psadbw and pmaddwd instructions. Please see my response to your other email for more details. > Thanks! > Cong > > > Thanks again, > > Hal > > > > ----- Original Message ----- > >> From: "Cong Hou via llvm-dev" < llvm-de...
2018 Jul 23
4
[LoopVectorizer] Improving the performance of dot product reduction loop
> ...e it as well as gcc and icc. The IR we are getting from the loop vectorizer has several v8i32 adds and muls inside the loop. These are fed by v8i16 loads and sexts from v8i16 to v8i32. The x86 backend recognizes that these are addition reductions of multiplication so we use the vpmaddwd instruction which calculates 32-bit products from 16-bit inputs and does a horizontal add of adjacent pairs. A vpmaddwd given two v8i16 inputs will produce a v4i32 result. > > That godbolt link seems wrong. It wasn't supposed to be clang IR. This should be right. ...
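The vpmaddwd behavior described in this snippet (v8i16 inputs, 32-bit products, pairwise horizontal add, v4i32 result) can be modeled in scalar code. The following is a minimal sketch under the assumption of the 128-bit form; `pmaddwd` and `dot_product` are hypothetical helper names for this example, not an API from LLVM or any library.

```python
# Scalar model of the 128-bit (v)pmaddwd: 8 signed 16-bit lanes per input,
# 4 signed 32-bit lanes out. Helper names are hypothetical.

def pmaddwd(a, b):
    assert len(a) == len(b) == 8
    assert all(-32768 <= x <= 32767 for x in list(a) + list(b))
    out = []
    for i in range(0, 8, 2):
        # 16x16 -> 32-bit products, then add the adjacent pair
        s = a[i] * b[i] + a[i + 1] * b[i + 1]
        # Wrap to signed 32 bits; this only matters when all four inputs
        # of a pair are -32768, where the sum is exactly 2**31.
        s = (s + 2**31) % 2**32 - 2**31
        out.append(s)
    return out

def dot_product(a, b):
    # Dot product over inputs whose length is a multiple of 8, keeping a
    # 4-lane accumulator as a vectorized loop would. Python integers do
    # not overflow, so accumulator wraparound is not modeled here.
    assert len(a) == len(b) and len(a) % 8 == 0
    acc = [0, 0, 0, 0]
    for i in range(0, len(a), 8):
        step = pmaddwd(a[i:i + 8], b[i:i + 8])
        acc = [x + y for x, y in zip(acc, step)]
    return sum(acc)
```

For example, `pmaddwd([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8)` yields four lanes, each the sum of one adjacent pair of products.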
2015 Nov 19
5
[RFC] Introducing a vector reduction add instruction.
After some attempts to implement reduce-add in LLVM, I found an easier way to detect reduce-add without introducing new IR operations. The basic idea is to annotate the phi node instead of the add (so that it is easier to handle other reduction operations). In the PHINode class, we can add a flag indicating whether the phi node is a reduction one (the flag can be set in the loop vectorizer for vectorized phi nodes).