thr3ads.net - similar to: "[LLVMdev] Haswell New Instructions"

2011 Jun 13

[LLVMdev] Haswell New Instructions

On Jun 13, 2011, at 6:48 AM, Dan Gohman wrote: > On Jun 13, 2011, at 4:41 AM, Nicolas Capens wrote: >> >> So I was wondering whether in LLVM a gather operation is best represented with a 'load' instruction taking vector operands, or whether it's better to define it as a separate 'gather' instruction. What would be the pros and cons of each approach, and what do

2011 Jun 13

[LLVMdev] Haswell New Instructions

Hi all, Intel has just revealed its AVX2 instruction set, to be supported by the 2013 Haswell architecture, and it's looking quite revolutionary: http://software.intel.com/en-us/forums/showthread.php?t=83399&o=a&s=lr It includes powerful 'gather' instructions, which allow reading

2011 Jun 13

[LLVMdev] Haswell New Instructions

The important thing, IMO, is to not represent the gather operation as an instruction which takes a vector of pointers, because that's too restrictive for architectures with 64-bit pointers. What one most frequently wants to do on those architectures is to specify a 64-bit scalar base pointer with a vector of 32-bit offsets. This fits what the VGATHERxxx instructions described in the spec provide, and this
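The addressing form described in this post (one scalar base pointer plus a vector of 32-bit offsets) can be illustrated with a scalar reference loop. This is only a sketch of the semantics, not LLVM IR and not the AVX2 encoding; the helper name gather_f32 and the choice of float elements are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for a gather: dst[i] = base[idx[i]] for every lane.
 * gather_f32 is a hypothetical helper, not an LLVM or Intel API. */
static void gather_f32(float *dst, const float *base,
                       const int32_t *idx, size_t lanes)
{
    assert(dst != NULL && base != NULL && idx != NULL);
    for (size_t i = 0; i < lanes; ++i) {
        /* One pointer-sized base plus a sign-extended 32-bit index per lane,
         * instead of a full-width pointer per lane. */
        dst[i] = base[(ptrdiff_t)idx[i]];
    }
}
```

With 64-bit pointers, eight lanes of 32-bit indices occupy 32 bytes, while eight full pointers would occupy 64 bytes, which is the restriction the post alludes to.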

2008 Jul 14

[LLVMdev] Spilled variables using unaligned moves

Hi all, It looks like vector spills don't use aligned moves even though the stack is aligned. This seems like an optimization opportunity. The attached replacement of fibonacci.cpp generates x86 code like this:

03A70010 push ebp
03A70011 mov ebp,esp
03A70013 and esp,0FFFFFFF0h
03A70019 sub esp,1A0h
...
03A7006C movups xmmword ptr

2008 Jul 14

[LLVMdev] Spilled variables using unaligned moves

This is on Windows / Cygwin? I think the dynamic stack pointer re-alignment doesn't happen until post-register allocation. Assuming there aren't other instructions between the prologue and the first movups that mess up esp (there shouldn't), this is indeed a bug. Please file a bug and attach a bc file. Thanks. Evan On Jul 14, 2008, at 7:43 AM, Nicolas Capens wrote: > Hi

2011 Jun 15

[LLVMdev] Haswell New Instructions

Jose Fonseca <jfonseca at vmware.com> writes: > The important thing IMO, is to not represent the gather operation as > an instruction which takes a vector of pointers, because that's too > restrictive for architectures with 64bits pointers. How is it restrictive? > What one most frequently wants to do in those architectures is to specify a > 64bit scalar base pointer

2008 Jul 15

[LLVMdev] Spilled variables using unaligned moves

Hi Evan, Could you maybe point me to the source files where this issue might originate? I'd like to learn more about LLVM's innards but so far I've just scraped the surface and I don't know where what phase of instruction selection / register allocation / stack layout / etc. happens. If I understand correctly this issue might be fixed by moving stack pointer alignment

2011 Jun 15

[LLVMdev] Haswell New Instructions

greened at obbligato.org (David A. Greene) writes: > Jose Fonseca <jfonseca at vmware.com> writes: > >> The important thing IMO, is to not represent the gather operation as >> an instruction which takes a vector of pointers, because that's too >> restrictive for architectures with 64bits pointers. > > How is it restrictive? Ah, I think you mean you

2008 Jul 10

[LLVMdev] InstructionCombining forgets alignment of globals

Hi all, The InstructionCombining pass causes alignment of globals to be ignored. I've attached a replacement of Fibonacci.cpp which reproduces this (I used 2.3 release). Here's the x86 code it produces:

03C20019 movaps xmm0,xmmword ptr ds:[164E799h]
03C20020 mulps xmm0,xmmword ptr ds:[164E79Ah]
03C20027 movaps xmmword ptr ds:[164E799h],xmm0
03C2002E

2008 Jul 14

[LLVMdev] Spilled variables using unaligned moves

On Jul 14, 2008, at 7:43 AM, Nicolas Capens wrote: > Hi all, > > It looks like vector spills don’t use aligned moves even though the > stack is aligned. This seems like an optimization opportunity. What target is this? Linux doesn't have a 16-byte aligned stack. -Chris > > The attached replacement of fibonacci.cpp generates x86 code like > this: > > 03A70010

2014 Mar 12

[LLVMdev] Autovectorization questions

Hi, I'm reading "http://llvm.org/docs/Vectorizers.html" and have a few questions. I hope someone has answers to them. The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions that scatter/gather memory (http://llvm.org/docs/Vectorizers.html#scatter-gather):

int foo(int *A, int *B, int n, int k) {
  for (int i = 0; i < n; ++i)
    A[i*7] += B[i*k];
}

I

2013 Jul 10

[LLVMdev] unaligned AVX store gets split into two instructions

Hi, Yes. On Sandy Bridge, 256-bit loads/stores are double pumped. This means that they go in one after the other in two cycles. On Haswell the memory ports are wide enough to allow a 256-bit memory operation in one cycle. So, on Sandy Bridge we split unaligned memory operations into two 128-bit parts to allow them to execute in two separate ports. This is also what GCC and ICC do. It is very

2013 Jul 10

[LLVMdev] unaligned AVX store gets split into two instructions

I've narrowed this down to a single kernel (kernel.ll), which does a fixed-size matrix-matrix multiply:

# ~/llvm-32-final/bin/llc kernel.ll -o kernel32.s
# ~/llvm-33-final/bin/llc kernel.ll -o kernel33.s
# ~/llvm-32-final/bin/clang++ harness.cpp kernel32.s -o harness32
# ~/llvm-32-final/bin/clang++ harness.cpp kernel33.s -o harness33
# time ./harness32
real 0m0.584s
user 0m0.581s
sys 0m0.001s

2013 Jul 10

[LLVMdev] unaligned AVX store gets split into two instructions

Thanks for all the info! I'm still in the process of narrowing down the performance difference in my code. I'm no longer convinced it's related only to the unaligned loads/stores, since extracting this part of the kernel makes the performance difference disappear. I will try to narrow down what is going on, and if it seems related to LLVM, I will post an example. Thanks again, Zach

2013 Sep 19

[LLVMdev] unaligned AVX store gets split into two instructions

Nadav, We see multiple regressions after r172868 in the ISPC compiler (based on the LLVM optimizer). The regressions are due to spills/reloads, which are due to increased register pressure. This matches Zach's analysis. We've filed bug 17285 for this problem. Is there any possibility to avoid splitting in the case of multiple loads going together? Dmitry. On Wed, Jul 10, 2013 at 1:12 PM, Zach

2014 Mar 12

[LLVMdev] Autovectorization questions

In order to vectorize code like this, LLVM needs to prove that “A[i*7]” does not wrap in the address space. It fails to do so, and so LLVM doesn’t vectorize this loop even if we try to force it. The following loop will be vectorized if we force it:

int foo(int * A, int * B, int n, int k) {
  for (int i = 0; i < 1024; ++i)
    A[i] += B[i*k];
}

So will this loop: int foo(int * restrict A, int

2014 May 18

[LLVMdev] Legalizing v32i1, v64i1 for Haswell pext/pdep instructions

I have a group of students working with me on some LLVM projects related to our Parabix research. One interesting issue that has come up for us is code generation support for the Haswell new instructions pext and pdep. These instructions shuffle bits within a 64-bit word, either gathering all selected bits to the beginning (pext) or scattering some initial bits throughout (pdep). A natural
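The bit-gathering and bit-scattering behaviour described for pext and pdep can be modelled with a portable bit-by-bit loop. This is a reference sketch of the semantics only, assuming 64-bit operands; the names pext64_ref, pdep64_ref and pext_pdep_demo are hypothetical, and the code makes no claim about how LLVM legalizes or selects these instructions.

```c
#include <assert.h>
#include <stdint.h>

/* pext: collect the bits of src selected by mask into the low bits of the result. */
static uint64_t pext64_ref(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    uint64_t out_bit = 1;                      /* next free low-order result bit */
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);  /* isolate lowest set mask bit */
        if (src & lowest)
            result |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1;                      /* clear that mask bit */
    }
    return result;
}

/* pdep: spread the low bits of src out to the positions selected by mask. */
static uint64_t pdep64_ref(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    uint64_t in_bit = 1;                       /* next low-order source bit */
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);
        if (src & in_bit)
            result |= lowest;
        in_bit <<= 1;
        mask &= mask - 1;
    }
    return result;
}

/* Small usage example: pdep undoes pext for the bits covered by the mask. */
static void pext_pdep_demo(void)
{
    uint64_t packed = pext64_ref(0xB2u, 0xF0u);   /* selects bits 4..7: 1011 */
    assert(packed == 0xBu);
    assert(pdep64_ref(packed, 0xF0u) == 0xB0u);
}
```

Each loop iteration consumes one set bit of the mask, so the cost is proportional to the number of selected bits rather than a fixed 64 iterations.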

2013 Jul 10

[LLVMdev] unaligned AVX store gets split into two instructions

On Tue, Jul 9, 2013 at 9:01 PM, Zach Devito <zdevito at gmail.com> wrote: > I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector loads > on AVX. > 3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as a > single instruction (details below). > In a matrix-matrix inner-kernel, I see a ~25% decrease in performance, which > seems to be

2011 Nov 23

[LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP

Duncan, Thanks for the quick review! Here is a short description (design) of where I am going with this patch: 1. Motivation: Vectors-of-pointers is the first step in supporting scatter/gather instructions (available in AVX2, for example). I believe that this feature was requested on the mailing list before. As mentioned by Hal Finkel earlier today, this feature is desired by autovectorizers as

2013 Jul 10

[LLVMdev] unaligned AVX store gets split into two instructions

I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector loads on AVX. 3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as a single instruction (details below). In a matrix-matrix inner-kernel, I see a ~25% decrease in performance, which seems to be due to this. Any ideas why this changed? Thanks! Zach LLVM Code: define <4 x double> @vstore(<4 x
