similar to: [LLVMdev] LLVM intrinsic for SSE ANDPS instruction


2009 Dec 08
0
[LLVMdev] LLVM intrinsic for SSE ANDPS instruction
On Dec 8, 2009, at 11:18 AM, Zoltan Varga wrote: > Hi, > > LLVM used to have an llvm.x86.and_ps intrinsic for the ANDPS instruction, but it seems to be gone, and it is a bit hard to > synthesize it from vector instructions, since 'and' only works on vectors of integer types. Would a patch be accepted which adds this and related instructions back? No. It won't be.
2009 Dec 08
2
[LLVMdev] LLVM intrinsic for SSE ANDPS instruction
Hi, The arguments to the 'and' instruction must be integer types or vectors of integer types. If I have a compiler whose source language has support for andps by having its own intrinsics, then I would have to generate code to convert the float vector into an int vector before passing it to llvm's and instruction, then convert the result back.
2009 Dec 08
0
[LLVMdev] LLVM intrinsic for SSE ANDPS instruction
Hi Zoltan, I think the bitcast operation is rather painless to use. And if you want to be able to execute it on a float vector you could try putting the and operation in a function with inline linkage and that would be all that's needed to convert over and back. BTW, bitcasting is a no-op conversion in actual code. --Sam Crow > >From: Zoltan Varga <vargaz at gmail.com>
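The bitcast round trip described in this reply can be modelled outside LLVM. What follows is a minimal Python sketch, not LLVM code: it reinterprets each 32-bit float lane as an unsigned integer, applies the integer AND, and reinterprets the result back, which is the lane-wise effect of ANDPS on a 4 x float vector. The helper names (float_to_bits, bits_to_float, and_ps) are hypothetical and chosen only for illustration, and the mask is passed as raw integers for simplicity.

```python
import struct

def float_to_bits(x):
    """Reinterpret a 32-bit float as an unsigned 32-bit integer (the 'bitcast')."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(bits):
    """Reinterpret an unsigned 32-bit integer as a 32-bit float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def and_ps(lanes, masks):
    """Lane-wise AND of float lanes with 32-bit integer masks.

    Hypothetical helper: models bitcast -> and -> bitcast on each lane.
    """
    return [bits_to_float(float_to_bits(x) & m) for x, m in zip(lanes, masks)]

# Clearing the sign bit of every lane yields the absolute value.
ABS_MASK = 0x7FFFFFFF
values = [-1.5, 2.0, -0.0, 3.25]
cleared = and_ps(values, [ABS_MASK] * 4)
```

The conversions in the sketch only change how the same 32 bits are interpreted, which is why the reply can describe the bitcast as a no-op in generated code.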
2010 May 11
2
[LLVMdev] How does SSEDomainFix work?
Hello. This is my first post. I have tried the SSE execution domain fixup pass, but I am not able to see any improvement. I expected the example below to use MOVDQA, PAND, etc. (On Nehalem, ANDPS is much slower than PAND.) Please tell me if I am doing something wrong. Thank you. Takumi Host: i386-mingw32 Build: trunk at 103373 foo.ll: define <4 x i32> @foo(<4 x i32> %x,
2010 May 11
0
[LLVMdev] How does SSEDomainFix work?
On May 10, 2010, at 9:07 PM, NAKAMURA Takumi wrote: > Hello. This is my first post. Welcome! > I have tried the SSE execution domain fixup pass, > but I am not able to see any improvement. Did you actually measure runtime, or did you look at assembly? > I expected the example below to use MOVDQA, PAND, etc. > (On Nehalem, ANDPS is much slower than PAND.) Are you sure? The
2009 May 04
4
[LLVMdev] [PATCH] Add support for accessing the FS segment register on X86
Hi, Here is an updated version of the patch using address space 257. Zoltan On Mon, May 4, 2009 at 11:36 PM, Shantonu Sen <ssen at apple.com> wrote: > Maybe 257 would be better (or other unused), because of r70197, which gives > special behavior for <256 > > Shantonu Sen > ssen at apple.com > > Sent from my Mac Pro > > > On May 4, 2009,
2008 Dec 09
3
[LLVMdev] [PATH] Add sub.ovf/mul.ovf intrinsics
Hi, Attached is the final version of the patch, adding the requested FIXME. If this is ok, can somebody check it in ? thanks Zoltan On Tue, Dec 9, 2008 at 9:58 PM, Bill Wendling <isanbard at gmail.com> wrote: > On Tue, Dec 9, 2008 at 6:11 AM, Zoltan Varga <vargaz at gmail.com> wrote: >> Hi, >>
2009 Sep 05
4
[LLVMdev] loads from a null address and optimizations
Hi, I don't intentionally want to induce a trap; the load null is created by an llvm optimization pass from code like: v = null; ..... v.Call (); Zoltan On Sat, Sep 5, 2009 at 11:39 PM, Bill Wendling <isanbard at gmail.com> wrote: > Hi Zoltan, > > We've come across this before where people meant to induce a trap by > dereferencing a null. It
2009 May 04
3
[LLVMdev] [PATH] Fixes for the amd64 JIT code
Hi, If this looks ok, could somebody check it in ? thanks Zoltan Evan Cheng-2 wrote: > > Looks good. Thanks. > > Evan > > On May 1, 2009, at 8:40 AM, Zoltan Varga wrote: > >> Hi, >> >> The attached patch contains the following changes: >> >> * X86InstrInfo.cpp: Synchronize a few places with the code
2009 Jun 01
3
[LLVMdev] [PATH] Fix support for .umul.with.overflow on x86 + fix c binding
Hi, The first patch fixes the implementation of umul.with.overflow on x86, which was throwing a 'Cannot yet select' error. The second patch fixes the definition of LLVMTypeKind in the C binding by syncing it with the C++ counterpart. Please review and commit if it looks ok. thanks Zoltan
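For readers unfamiliar with the intrinsic named in this post, the following Python sketch models what llvm.umul.with.overflow.i32 computes: the product truncated to 32 bits together with a flag saying whether truncation lost information. It is an illustration of the semantics only, under the assumption of 32-bit unsigned operands, and says nothing about how the x86 backend selects instructions for it. The function name is hypothetical.

```python
U32_MAX = 0xFFFFFFFF

def umul_with_overflow_i32(a, b):
    """Model of unsigned 32-bit multiply with an overflow flag.

    Hypothetical helper; returns (wrapped_product, overflowed).
    """
    if not (0 <= a <= U32_MAX and 0 <= b <= U32_MAX):
        raise ValueError("operands must fit in an unsigned 32-bit integer")
    full = a * b                 # exact product (Python integers do not wrap)
    wrapped = full & U32_MAX     # what a 32-bit register would hold
    return wrapped, full > U32_MAX

small = umul_with_overflow_i32(3, 5)
big = umul_with_overflow_i32(0x10000, 0x10000)
```

Here `small` carries a cleared overflow flag, while `big` wraps to zero and sets it.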
2009 May 05
1
[LLVMdev] [PATH] Fixes for the amd64 JIT code
Hi, It looks like the problem was with the RIP relative addressing. The original patch mistakenly removed the || DispForReloc part because I thought that the RIP relative addressing was done by the SIB encodings, but it is actually done by the shorter ones. The attached patch seems to work for me on linux and when simulating darwin by forcing some variables in X86TargetMachine.cpp to their darwin
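The encoding rule at issue in this thread can be summarised in a few lines. In x86-64, a ModRM r/m field of 100 means a SIB byte follows, and mod=00 with r/m=101 means a bare 32-bit displacement, which 64-bit mode interprets as RIP-relative. The Python sketch below is a simplified model of that decision, not the logic of X86CodeEmitter.cpp; the function names and register constants are hypothetical.

```python
# Hardware register numbers; only the low three bits go into ModRM.r/m.
RAX, RSP, RBP, R12, R13 = 0, 4, 5, 12, 13

def needs_sib(base=None, index=None, rip_relative=False):
    """Return True if the memory operand must be encoded with a SIB byte."""
    if rip_relative:
        return False          # mod=00, r/m=101 + disp32: the short form
    if index is not None:
        return True           # any scaled index requires SIB
    if base is None:
        return True           # absolute disp32: short form would mean RIP-relative
    return (base & 7) == 4    # RSP and R12 collide with the SIB escape value

def needs_explicit_disp(base):
    """RBP and R13 as a base need a displacement, since mod=00 r/m=101 is taken."""
    return (base & 7) == 5

rsp_case = needs_sib(base=RSP)
rip_case = needs_sib(rip_relative=True)
```

Masking with 7 is why a check written for ESP/EBP also covers R12/R13: the two pairs share the same low three bits.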
2009 Sep 05
3
[LLVMdev] loads from a null address and optimizations
Hi, Currently, llvm treats the loads from a null address as unreachable code, i.e.: load i32* null is transformed by some optimization pass into unreachable This presents problems in JIT compilers like mono which implement null pointer checks by trapping SIGSEGV signals. It also looks incorrect since it changes program behavior, which might be undefined in general, but it is quite
2009 May 05
2
[LLVMdev] [PATH] Fixes for the amd64 JIT code
Hi Zoltan, The part that determines whether SIB byte is needed caused a lot of regressions last night (see Geryon-X86-64 etc.). I've reverted it for now. Please take a look. Thanks, Evan On May 4, 2009, at 3:49 PM, Evan Cheng wrote: > Committed as revision 70929. Thanks. > > Evan > > On May 3, 2009, at 8:29 PM, vargaz wrote: > >> >> Hi, >> >>
2009 May 04
1
[LLVMdev] [PATCH] Add support for accessing the FS segment register on X86
Hi, If I'm writing a JIT, and want to access the TLS variables of the app containing the JIT, I can't use thread_local since that only works for variables declared in LLVM IR and/or managed by the ExecutionEngine. This patch, by contrast, allows a JIT to generate the TLS accesses itself, if it knows the TLS offset of the variable in question. Zoltan On Tue, May 5, 2009 at
2008 Dec 09
1
[LLVMdev] [PATH] Add sub.ovf/mul.ovf intrinsics
Hi, The add.with.overflow intrinsics don't seem to work with constant arguments, i.e. changing the call in add-with-overflow.ll to: %t = call {i32, i1} @llvm.sadd.with.overflow.i32(i32 0, i32 0) causes the following exception when running the codegen tests: llc: DAGCombiner.cpp:646: void<unnamed>::DAGCombiner::Run(llvm::CombineLevel): Assertion `N->getValueType(0) ==
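As a reference point for the call quoted in this post, the Python sketch below models the value that llvm.sadd.with.overflow.i32 is expected to produce: the two's-complement 32-bit sum plus a flag that is set when the exact sum does not fit. It is a model of the semantics only and does not reproduce or explain the assertion failure; the function name is hypothetical.

```python
I32_MIN = -(2 ** 31)
I32_MAX = 2 ** 31 - 1

def sadd_with_overflow_i32(a, b):
    """Model of signed 32-bit add with an overflow flag.

    Hypothetical helper; returns (wrapped_sum, overflowed).
    """
    if not (I32_MIN <= a <= I32_MAX and I32_MIN <= b <= I32_MAX):
        raise ValueError("operands must fit in a signed 32-bit integer")
    full = a + b                                   # exact sum
    wrapped = (full - I32_MIN) % (2 ** 32) + I32_MIN  # two's-complement wrap
    return wrapped, wrapped != full

# The constant-argument call from the post: both operands are zero.
zero_case = sadd_with_overflow_i32(0, 0)
```

For the constant arguments in the post, the expected result is a zero sum with the overflow flag clear.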
2009 Sep 14
3
[LLVMdev] merge request for 2.6
Hi, Would it be possible to merge this commit: http://llvm.org/viewvc/llvm-project?view=rev&revision=80960 to the llvm 2.6 branch ? Without it, incomplete unwind info is generated for functions with 0 stack size. thanks Zoltan
2009 May 04
0
[LLVMdev] [PATCH] Add support for accessing the FS segment register on X86
Hello, The preferred way to do TLS is to use the thread_local keyword. There is x86-64 support for thread_local on ELF; if you need it for other targets, I recommend looking at adapting it. Dan On May 4, 2009, at 2:59 PM, Zoltan Varga wrote: > Hi, > > Here is an updated version of the patch using address space 257. > > Zoltan > > On Mon, May 4, 2009 at
2008 Dec 09
0
[LLVMdev] [PATH] Add sub.ovf/mul.ovf intrinsics
Applied. Thanks, Zoltan! -bw On Tue, Dec 9, 2008 at 1:12 PM, Zoltan Varga <vargaz at gmail.com> wrote: > Hi, > > Attached is the final version of the patch, adding the requested > FIXME. If this is ok, can > somebody check it in ? > > thanks > > Zoltan > > On Tue, Dec 9, 2008 at 9:58 PM,
2009 May 05
0
[LLVMdev] [PATH] Fixes for the amd64 JIT code
Hi, I can't reproduce these failures on my linux machine. The test machine seems to be running darwin. I suspect that the problem might be with RIP relative addressing, or with the encoding of R12/R13, but the code seems to handle the latter, since it checks for ESP/EBP which is the same as R12/R13. Zoltan On Tue, May 5, 2009 at 8:18 PM, Evan Cheng <evan.cheng at
2009 May 01
2
[LLVMdev] [PATH] Fixes for the amd64 JIT code
Hi, The attached patch contains the following changes: * X86InstrInfo.cpp: Synchronize a few places with the code in X86CodeEmitter.cpp * X86CodeEmitter.cpp: Avoid the longer SIB encoding on amd64 if it is not needed. Zoltan