similar to: [LLVMdev] Legalizing v32i1, v64i1 for Haswell pext/pdep instructions

Displaying 20 results from an estimated 100 matches similar to: "[LLVMdev] Legalizing v32i1, v64i1 for Haswell pext/pdep instructions"

2014 Apr 03
3
[LLVMdev] SIMD Projects with LLVM
Hi everyone. After lurking for a while, this is my first post to the list. I am working with some graduate students on the general topic of compiler support for SIMD programming and specific projects related to LLVM and my own Parabix technology (parabix.costar.sfu.ca). Right now we have a few course projects on the go and already a question arising out of one of them (SSE2 Hoisting).
2018 Jan 07
2
Beginner question: Calling intrinsic
Hello, I’m not sure if this is the right place to ask beginner questions. If not, please direct me to the appropriate place. I’m writing my first llvm program and I’m trying to call an intrinsic, but failing. So far this is what I have: declare ccc i32 @llvm.x86.bmi.pdep.32(i32, i32) @.str2 = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1 declare i32 @printf(i8*, ...)
2018 Jan 07
0
Beginner question: Calling intrinsic
Hi John, What target are you trying to compile this for? I imagine this is just a case of using an X86 intrinsic on a non-X86 back end. Or is this an intrinsic you added and didn't provide a selection pattern for? In any case, this intrinsic makes it into the selection DAG and the instruction selector tries to select a sequence of instructions for it. However, it fails to find a pattern that
2018 Jan 08
1
Beginner question: Calling intrinsic
If you are using x86, you probably need to pass something like -mcpu=haswell or -mattr=bmi2 to enable support for the intrinsic. It seems -mcpu=native doesn't work for lli, so it can't just autodetect your CPU. ~Craig On Sun, Jan 7, 2018 at 7:54 AM, Nemanja Ivanovic via llvm-dev < llvm-dev at lists.llvm.org> wrote: > Hi John, > What target are you trying to compile this
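For readers who have not met the instruction behind the `@llvm.x86.bmi.pdep.32` intrinsic discussed in this thread, the following is a minimal software model of the BMI2 parallel bit deposit (PDEP) and its inverse, parallel bit extract (PEXT). It is only a sketch of the bit-level semantics in Python, not code from the thread. The function names `pdep32` and `pext32` and the (source, mask) argument order are assumptions made for illustration.

```python
def pdep32(src: int, mask: int) -> int:
    """Model of 32-bit PDEP: scatter the low bits of src to the set bits of mask."""
    result = 0
    src_bit = 0  # index of the next low-order bit of src to consume
    for pos in range(32):
        if (mask >> pos) & 1:
            if (src >> src_bit) & 1:
                result |= 1 << pos
            src_bit += 1
    return result


def pext32(src: int, mask: int) -> int:
    """Model of 32-bit PEXT: gather the bits of src selected by mask into the low bits."""
    result = 0
    out_bit = 0  # index of the next low-order result bit to fill
    for pos in range(32):
        if (mask >> pos) & 1:
            if (src >> pos) & 1:
                result |= 1 << out_bit
            out_bit += 1
    return result
```

With these definitions, `pext32(pdep32(x, m), m)` returns the low popcount(m) bits of `x`, which is a handy sanity check when comparing against the output of the real intrinsic.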
2016 Nov 16
3
InstCombine question on combineLoadToOperationType
Hello, Context: We have a backend where v32i1 is a Legal type, but the storage for v32i1 is not 32-bits/uses a different instruction sequence. We ran into an issue because combineLoadToOperationType changed v32i1 loads into i32 loads, so a sequence like: define void @bits(<32 x i1>* %A, <32 x i1>* %B) { %a = load <32 x i1>, <32 x i1>* %A store <32 x i1> %a,
2018 May 15
0
Rotates, once again
Thanks for writing this up. I'd like to have this intrinsic too. Another argument for having the intrinsic is shown in PR37426: https://bugs.llvm.org/show_bug.cgi?id=37426 Vectorization goes overboard because the throughput cost model used by the vectorizers doesn't match the 6 IR instructions that correspond to 1 x86 rotate instruction. Instead, we have: $ opt 37426prevectorize.ll -S
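As a rough illustration of why a single rotate can expand into several IR operations, here is a Python model of one common branch-free rotate-left idiom. That the "6 IR instructions" in the post are exactly this and/shl/sub/and/lshr/or sequence is an assumption, as are the names `rotl32` and `MASK32`.

```python
MASK32 = 0xFFFFFFFF  # keeps Python integers within 32 bits


def rotl32(x: int, n: int) -> int:
    """Rotate the 32-bit value x left by n, written as six primitive steps."""
    left_amt = n & 31                   # 1: and
    left = (x << left_amt) & MASK32     # 2: shl (masked to model 32-bit wraparound)
    neg = 0 - n                         # 3: sub
    right_amt = neg & 31                # 4: and
    right = (x & MASK32) >> right_amt   # 5: lshr
    return left | right                 # 6: or
```

Masking both shift amounts keeps the idiom well defined for n = 0 and n = 32, where a naive `32 - n` shift amount would be out of range for a 32-bit shift.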
2006 Jan 17
1
How to loop a Vobis sound ?
The sound file plays correctly the first time, but when I rewind to the initial position and then copy the PCM to the buffer, OpenAL reports an error. It seems that OpenAL doesn't recognize the PCM data. The OpenAL error number : AL_INVALID_VALUE 0xA003 void Buffer::PCMData (ALuint id, ALenum eFormat, ALvoid *data, ALsizei size, ALsizei freq) { // Copy
2011 Mar 10
0
[LLVMdev] Vector select/compare support in LLVM
"Rotem, Nadav" <nadav.rotem at intel.com> writes: > One of the arguments for packing masks is that it reduces > vector-registers pressure. Auto-vectorizing compilers maintain > multiple masks for different execution paths (for each loop nesting, > etc). Saving masks in xmm registers may result in vector-register > pressure which will cause spilling of these
2011 Mar 10
2
[LLVMdev] Vector select/compare support in LLVM
Hi David, The MOVMSKPS instruction is cheap (2 cycles). Not to be confused with VMASKMOV, the AVX masked move, which is expensive. One of the arguments for packing masks is that it reduces vector-registers pressure. Auto-vectorizing compilers maintain multiple masks for different execution paths (for each loop nesting, etc). Saving masks in xmm registers may result in vector-register
2011 Mar 14
1
[LLVMdev] Vector select/compare support in LLVM
David, The problem with the sparse representation is that it is word-width dependent. For 32-bit data-types, the mask is the 32nd bit, while for 64-bit types the mask is the 64th bit. How would you legalize the mask for the following code ? %mask = cmp nge <4 x float> %A, %B ; <4 x i1> %val = select <4 x i1> %mask, <4 x double> %X, %Y ; <4 x
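The width dependence described in this post can be made concrete with a small Python model of the two mask representations discussed in the thread. This is a sketch under stated assumptions, not code from the discussion: the helper names `sparse_mask` and `pack_sign_bits` are hypothetical, and lane 0 is assumed to map to bit 0 of the packed result, as MOVMSKPS does for four 32-bit lanes.

```python
def sparse_mask(lanes, width=32):
    """Sparse form: each true lane becomes an all-ones element of the given width."""
    all_ones = (1 << width) - 1
    return [all_ones if lane else 0 for lane in lanes]


def pack_sign_bits(elements, width=32):
    """Packed form: collect the top bit of each element, lane 0 into bit 0."""
    packed = 0
    for index, element in enumerate(elements):
        packed |= ((element >> (width - 1)) & 1) << index
    return packed


lanes = [True, False, True, True]
mask32 = sparse_mask(lanes, width=32)

# Read at the width it was built for, the sparse mask packs to 0b1101.
packed_ok = pack_sign_bits(mask32, width=32)

# Read as 64-bit elements, the same values have a clear top bit, so the mask is lost.
packed_wrong = pack_sign_bits(mask32, width=64)
```

The second call shows the legalization problem in miniature: a mask produced by a 32-bit compare cannot drive a select over 64-bit elements without first being widened.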
2016 Jun 28
2
Question about VectorLegalizer::ExpandStore() with v4i1
On Tue, Jun 28, 2016 at 2:45 AM, jingu kang via llvm-dev <llvm-dev at lists.llvm.org> wrote: > Hi All, > > Can someone comment below question whether it is wrong or not please? > > 2016-06-25 7:52 GMT+01:00 jingu kang <jaykang10 at gmail.com>: >> Hi All, >> >> I have a problem with VectorLegalizer::ExpandStore() with v4i1. >> >> Let's
2011 Mar 10
2
[LLVMdev] Vector select/compare support in LLVM
After I implemented a new type of legalization (the packing of i1 vectors), I found that x86 does not have a way to load packed masks into SSE registers. So, I guess that legalizing of <4 x i1> to <4 x i32> is the way to go. Cheers, Nadav -----Original Message----- From: Rotem, Nadav Sent: Thursday, March 10, 2011 11:04 To: 'David A. Greene' Cc: llvmdev at cs.uiuc.edu
2018 May 14
5
Rotates, once again
Hi everyone! I recently ran into some interesting issues with generation of rotate instructions - the details are in the bug tracker (https://bugs.llvm.org/show_bug.cgi?id=37387 and related bugs) for those interested - and it brought up the issue of rotates in the IR again. Now this is a proposal that has been made (and been rejected) several times, but I've been told that this time round we
2011 Mar 10
0
[LLVMdev] Vector select/compare support in LLVM
Hey, I am currently forced to create the BLENDVPS intrinsic as an external call (via Intrinsic::x86_sse41_blendvps) which has the following signature (from IntrinsicsX86.td): def int_x86_sse41_blendvps : GCCBuiltin<"__builtin_ia32_blendvps">, Intrinsic<[llvm_v4f32_ty],[llvm_v4f32_ty, llvm_v4f32_ty, llvm_v4f32_ty],[IntrNoMem]> Thus, it expects the mask (first operand if
2016 Oct 20
2
[Bug 1092] New: nft v0.6 segfault in must_print_eq_op at expression.c:520 during 'nft monitor trace' in netdev filter
https://bugzilla.netfilter.org/show_bug.cgi?id=1092 Bug ID: 1092 Summary: nft v0.6 segfault in must_print_eq_op at expression.c:520 during 'nft monitor trace' in netdev filter Product: nftables Version: unspecified Hardware: x86_64 OS: All Status: NEW
2009 Mar 27
2
[LLVMdev] GSoc 2009 (Bad Subject in the previous email)
Dear all I am a PhD student of Computer Science at Simon Fraser University ( http://www.cs.sfu.ca) interested in applying to GSoC. My PhD is focused on theoretical computer science, but since Sep. 2008 I have started working on Software projects again. Currently I am working in COSTAR lab ( http://costar.sfu.ca/) on a high performance regular expression engine based on Parallel bit streams
2009 Mar 30
0
[LLVMdev] GSoc 2009 (Bad Subject in the previous email)
Hi Ehsan, All of the projects you have listed are quite interesting. If I were to advocate for one, it would be #2. I think the scope of work is perfect for GSoc. I'd encourage you to send out a more concrete proposal when you're ready. Thanks, Evan On Mar 27, 2009, at 2:35 PM, Ehsan Amiri wrote: > Dear all > > I am a PhD student of Computer Science at Simon Fraser University
2009 Apr 01
4
[LLVMdev] GSoc 2009 (Bad Subject in the previous email)
Hi Evan Thanks for the email. I had a look at gcc implementation of TBAA and I think three main steps in implementation of TBAA for LLVM will be this: April 20 ~ May 23: I will read gcc implementation in depth and play around with LLVM code. May 23 ~ July 6: Implementation and test of a simple version of TBAA that does not work with all aggregate types. I think part of the coding required for
2015 Mar 12
2
MEMORY PROBLEM WHEN COMPUTING PERMUTATIONS
Good afternoon, friends. I'm back with another problem. I have the following array: > MuestraS [1] 1 0 0 0 1 0 1 1 1 1 1 0 I want to generate all possible permutations and then take a small random sample; I have to do this several times, increasing the length of the array "MuestraS". The problem is that when generating the permutations with this array of 12
2011 Mar 09
0
[LLVMdev] Vector select/compare support in LLVM
"Rotem, Nadav" <nadav.rotem at intel.com> writes: > I can think of two ways to represent masks in x86: sparse and > packed. In the sparse method, the masks are kept in <4 x 32bit> > registers, which are mapped to xmm registers. This is the ‘native’ way > of using masks. This argues for the sparse representation, I think. > _Sparse_ After my discussion with