search for: andps

Displaying 16 results from an estimated 16 matches for "andps".

2009 Dec 08
2
[LLVMdev] LLVM intrinsic for SSE ANDPS instruction
Hi, The arguments to the 'and' instruction must be integer types or vectors of integer types. If I have a compiler whose source language has support for andps by having its own intrinsics, then I would have to generate code to convert the float vector into an int vector before passing it to llvm's and instruction, then convert the result back. Zoltan On Tue, Dec 8, 2009 at 8:20 PM, Evan Cheng <...
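The bitcast, 'and', bitcast sequence described in this message can be sketched lane by lane in portable C++. This is only an illustration, not LLVM's lowering: `Vec4f`, `Vec4i`, `bitcast_to_int`, `bitcast_to_float`, `and_int` and `and_ps` are hypothetical names, and the sketch assumes a 32-bit IEEE-754 `float`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for <4 x float> and <4 x i32>.
using Vec4f = std::array<float, 4>;
using Vec4i = std::array<std::uint32_t, 4>;

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Reinterpret the bits of each float lane as an integer (no value conversion).
Vec4i bitcast_to_int(const Vec4f& v) {
    Vec4i out{};
    std::memcpy(out.data(), v.data(), sizeof out);
    return out;
}

// Reinterpret the bits of each integer lane as a float.
Vec4f bitcast_to_float(const Vec4i& v) {
    Vec4f out{};
    std::memcpy(out.data(), v.data(), sizeof out);
    return out;
}

// Lane-wise integer AND, the only form the 'and' instruction accepts.
Vec4i and_int(const Vec4i& a, const Vec4i& b) {
    Vec4i out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = a[i] & b[i];
    }
    return out;
}

// What ANDPS computes: a bitwise AND of two float vectors.
Vec4f and_ps(const Vec4f& a, const Vec4f& b) {
    return bitcast_to_float(and_int(bitcast_to_int(a), bitcast_to_int(b)));
}
```

As a usage example, AND-ing the integer view of a vector with `0x7FFFFFFF` in every lane clears each sign bit, which is the familiar absolute-value idiom.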
2009 Dec 08
2
[LLVMdev] LLVM intrinsic for SSE ANDPS instruction
Hi, LLVM used to have an llvm.x86.and_ps intrinsic for the ANDPS instruction, but it seems to be gone, and it is a bit hard to synthesize it from vector instructions, since 'and' only works on vectors of integer types. Would a patch be accepted which adds this and related instructions back? Zoltan ...
2009 Dec 08
0
[LLVMdev] LLVM intrinsic for SSE ANDPS instruction
...n in actual code. --Sam Crow > >From: Zoltan Varga <vargaz at gmail.com> >To: Evan Cheng <evan.cheng at apple.com> >Cc: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu> >Sent: Tue, December 8, 2009 1:39:18 PM >Subject: Re: [LLVMdev] LLVM intrinsic for SSE ANDPS instruction > >Hi, > > The arguments to the 'and' instruction must be integer types or vectors of integer types. If >I have a compiler whose source language has support for andps by having its own intrinsics, >then I would have to generate code to convert the float vector...
2009 Dec 08
0
[LLVMdev] LLVM intrinsic for SSE ANDPS instruction
On Dec 8, 2009, at 11:18 AM, Zoltan Varga wrote: > Hi, > > LLVM used to have an llvm.x86.and_ps intrinsic for the ANDPS instruction, but it seems to be gone, and it is a bit hard to > synthesize it from vector instructions, since 'and' only works on vectors of integer types. Would a patch be accepted which adds this and related instructions back? No. It won't be. Why not just generate an llvm and ins...
2010 May 11
2
[LLVMdev] How does SSEDomainFix work?
Hello. This is my first post. I have tried the SSE execution domain fixup pass, but I am not able to see any improvement. I expected the example below to use MOVDQA, PAND, etc. (On Nehalem, ANDPS is much slower than PAND.) Please tell me if I am doing something wrong. Thank you. Takumi Host: i386-mingw32 Build: trunk at 103373 foo.ll: define <4 x i32> @foo(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z) nounwind readnone { entry: %0 = and <4 x i32> %x...
2010 May 11
0
[LLVMdev] How does SSEDomainFix work?
...kumi wrote: > Hello. This is my first post. Welcome! > I have tried the SSE execution domain fixup pass. > But I am not able to see any improvement. Did you actually measure runtime, or did you look at assembly? > I expected the example below to use MOVDQA, PAND, etc. > (On Nehalem, ANDPS is much slower than PAND.) Are you sure? The andps and pand instructions are actually the same speed, but on Nehalem there is a latency penalty for moving data between the int and float domains. The SSE execution domain pass tries to minimize the extra latency by switching instructions. In y...
2008 Jun 17
2
[LLVMdev] VFCmp failing when unordered or UnsafeFPMath on x86
...the following C++ code: if(v[0] < 0) v[0] += 1.0f; if(v[1] < 0) v[1] += 1.0f; if(v[2] < 0) v[2] += 1.0f; if(v[3] < 0) v[3] += 1.0f; With SSE assembly this would be as simple as: movaps xmm1, xmm0 // v in xmm0 cmpltps xmm1, zero // zero = {0.0f, 0.0f, 0.0f, 0.0f} andps xmm1, one // one = {1.0f, 1.0f, 1.0f, 1.0f} addps xmm0, xmm1 With the current definition of VFCmp this seems hard if not impossible to achieve. Vector compare instructions that return all 1's or all 0's per element are very common, and they are quite powerful in my opinion...
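The compare, mask, add idiom in this message (cmpltps, andps, addps) can be modelled one lane at a time in portable C++. The sketch below is only an illustration of the idea, not generated code: `Vec4f` and `add_one_where_negative` are hypothetical names, and a 32-bit IEEE-754 `float` is assumed.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 4-lane float vector such as an XMM register.
using Vec4f = std::array<float, 4>;

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Branch-free equivalent of: if (v[i] < 0) v[i] += 1.0f; for each lane.
Vec4f add_one_where_negative(const Vec4f& v) {
    const float one = 1.0f;
    std::uint32_t one_bits = 0;
    std::memcpy(&one_bits, &one, sizeof one_bits);

    Vec4f out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        // cmpltps: all ones where the lane is below zero, all zeros elsewhere.
        const std::uint32_t mask = (v[i] < 0.0f) ? 0xFFFFFFFFu : 0u;

        // andps: keeps the bits of 1.0f where the mask is set, +0.0f elsewhere.
        const std::uint32_t addend_bits = mask & one_bits;
        float addend = 0.0f;
        std::memcpy(&addend, &addend_bits, sizeof addend);

        // addps: unconditional add; unselected lanes just add zero.
        out[i] = v[i] + addend;
    }
    return out;
}
```

The trick depends on the compare producing a full-width all-ones or all-zeros value per element, which is why the message argues for vector compares that return such masks.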
2011 Jun 07
2
[LLVMdev] AVX Status?
...ecause >>> xor<8 x i32> %m, %m >>> works, probably because it can get rid of all bitcasts. >> >> And it can use xorps to implement the operation. > > Yes, that makes sense. But why does the same not work with "and" and > "or" (-> VANDPS/VORPS) ? It can. Maybe the pattern for ANDPS isn't there yet. I'd have to dig deeper into the failure. The fact that there are inconsistencies like this is one of the motivations behind the SIMD reorg. There are plenty of such inconsistencies in the existing SSE spec. Hopefully after...
2011 Jun 04
0
[LLVMdev] AVX Status?
...ms to be some code for this because >> xor<8 x i32> %m, %m >> works, probably because it can get rid of all bitcasts. > > And it can use xorps to implement the operation. Yes, that makes sense. But why does the same not work with "and" and "or" (-> VANDPS/VORPS) ? Anyway, I am looking forward to testing your patches. Would it be possible to send around a notification when the stuff goes upstream? Thanks a lot :). Best, Ralf
2008 Jun 16
0
[LLVMdev] VFCmp failing when unordered or UnsafeFPMath on x86
On Jun 13, 2008, at 12:27 AM, Nicolas Capens wrote: > Hi all, > > When trying to generate a VFCmp instruction when UnsafeFPMath is set > to true I get an assert “Unexpected CondCode” on my x86 system. This > also happens with UnsafeFPMath set to false and using an unordered > compare. Could someone look into this? > > While I’m at it, is there any reason why only the
2013 Oct 15
0
[LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
...loat> @llvm.x86.sse.min.ps(<4 x float> %mul310, <4 x float> zeroinitializer) nounwind ; <<4 x float>> [#uses=1] >> + %bitcast.i3 = bitcast <4 x float> %mul310 to <4 x i32> ; <<4 x i32>> [#uses=1] >> + %andps.i5 = and <4 x i32> %bitcast.i3, zeroinitializer ; <<4 x i32>> [#uses=1] >> + >> + call void null(<4 x float> %mul313, <4 x float> %cmpunord.i11, <4 x float> %tmp83, <4 x float> zeroinitializer, %struct.__ImageExecInfo* null, <4...
2016 Mar 16
3
the as-if rule / perf vs. security
...turn (v4i32) { x0, x1, 0, x3 }; } For x86, we notice that we have nearly a v4i32 vector's worth of loads, so we just turn that into a vector load and mask out the element that's getting set to zero: movups (%rdi), %xmm0 ; load 128-bits instead of three 32-bit elements andps LCPI0_0(%rip), %xmm0 ; put zero bits into the 3rd element of the vector Should that optimization be disabled by a hypothetical -fextra-secure flag? On Wed, Mar 16, 2016 at 7:59 AM, Craig, Ben <ben.craig at codeaurora.org> wrote: > Regarding accessing extra data, there are at least...
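The transformation discussed here (one full-width load followed by a mask, instead of three scalar loads) can be sketched in portable C++. The original function is truncated in this snippet, so the sketch assumes it reads elements 0, 1 and 3 from a pointer; `Vec4i` and `load_and_zero_third` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a v4i32 vector.
using Vec4i = std::array<std::int32_t, 4>;

// Produces { p[0], p[1], 0, p[3] } the way the optimized code does:
// read all four elements, then AND with a constant mask.
// Unlike the assumed source-level code, this also reads p[2], so in this
// sketch the caller must guarantee that all four elements are readable.
Vec4i load_and_zero_third(const std::int32_t* p) {
    // Plays the role of the LCPI0_0 constant: zero bits only in lane 2.
    const std::array<std::uint32_t, 4> mask = {
        0xFFFFFFFFu, 0xFFFFFFFFu, 0u, 0xFFFFFFFFu};

    Vec4i out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        // movups loads the lane; andps clears it where the mask is zero.
        const std::uint32_t bits = static_cast<std::uint32_t>(p[i]) & mask[i];
        out[i] = static_cast<std::int32_t>(bits);
    }
    return out;
}
```

The result matches the three-load version, but the extra read of the third element is the behaviour the thread questions from a security standpoint.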
2011 Jun 03
2
[LLVMdev] AVX Status?
...>> } > > That would be nice indeed Some lowering code would be needed to convert from i1 masks to i8 masks (the so-called packed vs. sparse mask issue). I don't think I've added anything to do this as our vectorizer doesn't generate code this way. >> -> VCMPPS, VANDPS, BLENDVPS >> >> Nadav Rotem sent around a patch a few weeks ago in which he implemented >> codegen for the select for SSE, unfortunately I did not have time to >> look at it in more depth so far. >> >> Can anybody comment on the current status of AVX? > > N...
2016 Mar 16
3
the as-if rule / perf vs. security
...> > For x86, we notice that we have nearly a v4i32 vector's worth of loads, so > we just turn that into a vector load and mask out the element that's > getting set to zero: > movups (%rdi), %xmm0 ; load 128-bits instead of three > 32-bit elements > andps LCPI0_0(%rip), %xmm0 ; put zero bits into the 3rd element of > the vector > > Should that optimization be disabled by a hypothetical -fextra-secure flag? > > > > On Wed, Mar 16, 2016 at 7:59 AM, Craig, Ben <ben.craig at codeaurora.org> > wrote: > >> Regardi...
2008 Jun 13
6
[LLVMdev] VFCmp failing when unordered or UnsafeFPMath on x86
Hi all, When trying to generate a VFCmp instruction when UnsafeFPMath is set to true I get an assert "Unexpected CondCode" on my x86 system. This also happens with UnsafeFPMath set to false and using an unordered compare. Could someone look into this? While I'm at it, is there any reason why only the most significant bit of the return value of VFCmp is defined (according to
2016 Mar 15
3
the as-if rule / perf vs. security
[cc'ing cfe-dev because this may require some interpretation of language law] My understanding is that the compiler has the freedom to access extra data in C/C++ (not sure about other languages); AFAIK, the LLVM LangRef is silent about this. In C/C++, this is based on the "as-if rule": http://en.cppreference.com/w/cpp/language/as_if So the question is: where should the optimizer