similar to: [LLVMdev] Vector code

2008 May 08
2
[LLVMdev] Vector code
Hi Chris, Thanks for the advice, but I'm actually not trying to compile code from text. For now I'm just trying to construct the function directly. Think of it as the vector equivalent of the HowToUseJIT.cpp example. Cheers, -Nicolas -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Chris Lattner Sent: Thursday, 08
2008 May 08
0
[LLVMdev] Vector code
On Thu, 8 May 2008, Nicolas Capens wrote: > Thanks for the advice, but I'm actually not trying to compile code from > text. For now I'm just trying to construct the function directly. Think of > it as the vector equivalent of the HowToUseJIT.cpp example. There is a one-to-one mapping between text and IR. If you understand what to generate, it is much easier to generate it.
2008 May 08
0
[LLVMdev] Vector code
On Thu, 8 May 2008, Nicolas Capens wrote: > I'm trying to use LLVM to generate SIMD code at runtime (in particular Intel > SSE). But I'm having a bit of trouble understanding how to create even the > simplest function; adding two vectors of four single-precision > floating-point elements. I can get it to add the elements one at a time but > not using one vector instruction.
2008 May 08
2
[LLVMdev] Vector code
Nicolas, > Thanks for the advice, but I'm actually not trying to compile code from > text. For now I'm just trying to construct the function directly. Think of > it as the vector equivalent of the HowToUseJIT.cpp example. llvm2cpp is your friend then. It's now a separate 'target' in llc. It will generate C++ code, which will construct the provided IR. -- With best
2010 Sep 22
1
[LLVMdev] LLVM 2.8 and MMX
Assign the bug to me and I'll fix it in TOT next week! Thanks for narrowing it down! On Wednesday, September 22, 2010, Nicolas Capens <nicolas.capens at gmail.com> wrote: > Hi all, > > I think I figured it out: > 112804 causes 64-bit UNPCKLBW to no longer be selected for certain cases. > 112805 is benign. > 112806 causes 64-bit UNPCKHBW to no longer be selected for
2010 Sep 22
1
[LLVMdev] LLVM 2.8 and MMX
On Sep 21, 2010, at 5:30 PM PDT, Bill Wendling wrote: > LLVM isn't going to stop generating MMX instructions altogether. We can't do that. :-) If the user specifically wants MMX (by, say, using the builtins), we have to support that still. The plan to cease generating MMX for generic vectors is a work-in-progress right now. It's not in 2.8. > > -bw Right, early on there
2008 Sep 05
1
[LLVMdev] Keeping values in memory
Hi all, It looks like LLVM is quite eager to load values into registers when they have multiple uses. Unfortunately, this increases register pressure, specifically on x86. In my experience modern x86 processors are very capable of using memory operands as source. In fact the only cases where a register is preferred over repeatedly using the same memory operand is when multiple instructions
2008 May 08
0
[LLVMdev] Vector code
On Thu, May 8, 2008 8:24 am, Nicolas Capens wrote: > Hi all, > > > > I'm trying to use LLVM to generate SIMD code at runtime (in particular > Intel > SSE). But I'm having a bit of trouble understanding how to create even the > simplest function; adding two vectors of four single-precision > floating-point elements. I can get it to add the elements one at a time
2011 Oct 20
4
[LLVMdev] Lowering to MMX
Hi all, I'm working on a graphics project which uses LLVM for dynamic code generation, and I noticed a major performance regression when upgrading from LLVM 2.8 to 3.0-rc1 (LLVM 2.9 didn't support Win64 so I skipped it entirely). I found out that the performance regression is due to removing support for lowering 64-bit vector operations to MMX, and using SSE2 instead. My code uses a
2008 May 08
1
[LLVMdev] Vector code
Nicolas Capens wrote: > Here's essentially what I try to generate: > > void add(float z[4], float x[4], float y[4]) > { > z[0] = x[0] + y[0]; > z[1] = x[1] + y[1]; > z[2] = x[2] + y[2]; > z[3] = x[3] + y[3]; > } This is the vectorized llvm-assembly equivalent: ----- define void @add(<4 x float>* %z, <4 x float>* %x, <4 x float>* %y) {
2011 Oct 26
2
[LLVMdev] Lowering to MMX
Hi Bill, Comments inline: On 24/10/2011 9:50 PM, Bill Wendling wrote: > On Oct 20, 2011, at 8:42 AM, Nicolas Capens wrote: > >> Hi all, >> >> I'm working on a graphics project which uses LLVM for dynamic code >> generation, and I noticed a major performance regression when upgrading >> from LLVM 2.8 to 3.0-rc1 (LLVM 2.9 didn't support Win64 so I
2008 May 08
3
[LLVMdev] Vector code
Hi Nicolas (at least, I suspect your signing of your mail with "Anton" was not intentional :-p), > I assume that's the same as the online demo's "Show LLVM C++ API code" > option (http://llvm.org/demo/)? I've tried that with a structure containing > four floating-point components but it also appears to add them individually > using extract/insert. Maybe
2008 May 08
2
[LLVMdev] Vector code
LLVM does not automatically vectorize your scalar code (at least for now). You have to write gcc generic vector code or use vector builtins. Evan On May 8, 2008, at 1:46 PM, Nicolas Capens wrote: > Hi Matthijs, > > Yes, I've turned off the link-time optimizations (otherwise it just > propagates my constant vectors and immediately prints the result). :-) > > Here's
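As a rough illustration of the "gcc generic vector code" mentioned in the reply above, here is a minimal sketch of the four-float `add` function from this thread, written with the GCC/Clang `vector_size` extension. The type name `v4sf` and the use of `memcpy` to sidestep alignment are assumptions made for this example and do not come from the original messages; whether the compiler emits a single SSE `addps` depends on the target and flags.

```c
#include <assert.h>
#include <string.h>

/* GCC/Clang generic vector type: four floats packed into 16 bytes.
   The name v4sf is an assumption for this sketch. */
typedef float v4sf __attribute__((vector_size(16)));

/* Adds x and y element-wise and stores the result in z. */
void add(float z[4], const float x[4], const float y[4])
{
    v4sf vx, vy, vz;

    /* Sanity check: the vector type holds exactly four floats. */
    assert(sizeof(v4sf) == 4 * sizeof(float));

    /* memcpy avoids assuming the float arrays are 16-byte aligned. */
    memcpy(&vx, x, sizeof vx);
    memcpy(&vy, y, sizeof vy);

    /* One vector addition instead of four scalar additions. */
    vz = vx + vy;

    memcpy(z, &vz, sizeof vz);
}
```

Calling `add(z, x, y)` with `x = {1, 2, 3, 4}` and `y = {10, 20, 30, 40}` fills `z` with the element-wise sums.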
2008 May 20
2
[LLVMdev] Making use of SSE intrinsics
Hi all, I'd like to make use of some specific x86 Streaming SIMD Extension instructions, but I don't know where to start. For instance the 'rcpps' instruction computes a low-precision but fast reciprocal. I've noticed that LLVM supports intrinsics, but I couldn't find any information on how to use them. I've tried digging through the LLVM-GCC code but it's just
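For readers unfamiliar with `rcpps`, the sketch below shows how the instruction is usually reached from C, through the `_mm_rcp_ps` intrinsic in `<xmmintrin.h>`. This is the C-level compiler intrinsic, not the LLVM IR intrinsic the post asks about; the function name `approx_recip4` and the scalar fallback for non-SSE targets are assumptions added for this example.

```c
#include <assert.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Writes an approximate 1/in[i] to out[i] for four floats.
   approx_recip4 is a hypothetical helper name for this sketch. */
void approx_recip4(float out[4], const float in[4])
{
    assert(out && in);
#if defined(__SSE__)
    /* Unaligned load/store so callers need not align their arrays. */
    __m128 v = _mm_loadu_ps(in);
    __m128 r = _mm_rcp_ps(v);   /* maps to rcpps: fast, low precision */
    _mm_storeu_ps(out, r);
#else
    /* Fallback for targets without SSE: exact scalar division. */
    for (int i = 0; i < 4; ++i)
        out[i] = 1.0f / in[i];
#endif
}
```

On SSE targets the results are approximations, so callers should compare against a tolerance rather than expect exact reciprocals.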
2008 May 08
0
[LLVMdev] Vector code
Hi Matthijs, Yes, I've turned off the link-time optimizations (otherwise it just propagates my constant vectors and immediately prints the result). :-) Here's essentially what I try to generate: void add(float z[4], float x[4], float y[4]) { z[0] = x[0] + y[0]; z[1] = x[1] + y[1]; z[2] = x[2] + y[2]; z[3] = x[3] + y[3]; } And here's part of the output from the online
2010 Sep 08
8
[LLVMdev] LLVM 2.8 and MMX
On Wed, Sep 8, 2010 at 12:35 AM, Nicolas Capens <nicolas.capens at gmail.com> wrote: > Hi Chris, > > It's not broken, but the performance is crippled. > > I noticed that the code still contains some MMX instructions, but several > operations get expanded (apparently swizzling and such get expanded to a > large number of byte moves). I think some changes related to
2014 Aug 07
3
[LLVMdev] How to broaden the SLP vectorizer's search
On 7 August 2014 17:33, Chad Rosier <mcrosier at codeaurora.org> wrote: > You might consider filing a bug (llvm.org/bugs) requesting a flag, but I > don't know if the code owners want to expose such a flag. I'm not sure that's a good idea as a raw access to that limit, as there are no guarantees that it'll stay the same. But maybe a flag turning some
2011 Oct 25
0
[LLVMdev] Lowering to MMX
On Oct 20, 2011, at 8:42 AM, Nicolas Capens wrote: > Hi all, > > I'm working on a graphics project which uses LLVM for dynamic code > generation, and I noticed a major performance regression when upgrading > from LLVM 2.8 to 3.0-rc1 (LLVM 2.9 didn't support Win64 so I skipped it > entirely). > > I found out that the performance regression is due to removing
2010 Sep 21
1
[LLVMdev] LLVM 2.8 and MMX
This thread confuses me. I thought Chris said that LLVM 2.8 will not lower generic vectors to MMX because it breaks x87 code, and I didn't see an answer to your question about a switch to tell the code generator otherwise. However, you're complaining that MMX performance is subpar, even though LLVM 2.8 isn't supposed to generate MMX instructions. Can someone clarify the situation
2008 Jun 13
6
[LLVMdev] VFCmp failing when unordered or UnsafeFPMath on x86
Hi all, When trying to generate a VFCmp instruction when UnsafeFPMath is set to true I get an assert "Unexpected CondCode" on my x86 system. This also happens with UnsafeFPMath set to false and using an unordered compare. Could someone look into this? While I'm at it, is there any reason why only the most significant bit of the return value of VFCmp is defined (according to