thr3ads.net - search: "movmskps"

Displaying 15 results from an estimated 15 matches for "movmskps".

[LLVMdev] branch on vector compare?

2012 Sep 04

...ht not produce anything good, the > free sext requires llvm 2.7 on x86 to work at all (certainly shouldn't > be a problem nowadays but on other backends it might be different) and > for the ptest sequence very recent svn is required. > I don't think the current code can generate movmskps + test (probably > the next best thing without sse41) instead of ptest though if you only > got sse. Thanks Roland, sign extending gets me part of the way at least. I'm on version 3.1 and as you say in bug report, there are a few extraneous instructions. For the record, casting to a <...

[LLVMdev] branch on vector compare?

2012 Sep 05

...nything good, the >> free sext requires llvm 2.7 on x86 to work at all (certainly shouldn't >> be a problem nowadays but on other backends it might be different) and >> for the ptest sequence very recent svn is required. >> I don't think the current code can generate movmskps + test (probably >> the next best thing without sse41) instead of ptest though if you only >> got sse. > > > Thanks Roland, sign extending gets me part of the way at least. > I'm on version 3.1 and as you say in bug report, there are a > few extraneous instructions....

[LLVMdev] Implementing a new feature in LVVM

2013 Oct 09

...nt a feature, so I want your advice on its feasibility and difficulty. I want the code generator to support such code : %1 = ; <4 x float> vector %2 = ; <4 x float> vector %3 = fcmp ogt <4 x float> %1, %2 %4 = bitcast <4 x i1> %3 to i4 %5 = zext i4 %4 to i32 and generate a movmskps instruction (correct me if it's not possible). So the code generator would have to find such pattern (fcmp followed by a bitcast and/or zext) and generate correct assembly (using movmskps or a slower code path). Currently, LLVM 3.3 seems to generate wrong code (see bug 17479 : http://llvm.org/b...
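The IR pattern described above (fcmp ogt on <4 x float>, bitcast of the <4 x i1> result to i4, zext to i32) computes the same 4-bit value that CMPPS with the greater-than predicate followed by MOVMSKPS produces. A minimal C sketch of that equivalence, assuming lane i maps to bit i (the little-endian order used on x86) and a compiler that provides `<xmmintrin.h>`; the function names are hypothetical, and the scalar version models the IR semantics directly:

```c
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Scalar model of:
 *   %3 = fcmp ogt <4 x float> %1, %2
 *   %4 = bitcast <4 x i1> %3 to i4
 *   %5 = zext i4 %4 to i32
 * assuming element i of the <4 x i1> lands in bit i. */
static uint32_t gt_mask_scalar(const float a[4], const float b[4]) {
    uint32_t m = 0;
    for (int i = 0; i < 4; i++)
        if (a[i] > b[i])        /* ordered compare: false if either is NaN */
            m |= 1u << i;
    return m;
}

#if defined(__SSE__)
/* Same mask via CMPPS (greater-than) followed by MOVMSKPS, which packs
 * the sign bit of each 32-bit lane into bits 0..3 of a GPR. */
static uint32_t gt_mask_sse(const float a[4], const float b[4]) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    return (uint32_t)_mm_movemask_ps(_mm_cmpgt_ps(va, vb));
}
#endif
```

Both `fcmp ogt` and `_mm_cmpgt_ps` are ordered comparisons, so a lane involving a NaN contributes a 0 bit in either version.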

Question about VectorLegalizer::ExpandStore() with v4i1

2016 Jun 29

...ng how it is consumed, it is best to look at what happens to v8i1. We can then let the same optimizer work to get the optimal ASM code out in the end, whether vectorization factor is 4 or 8. In the end, I may be agreeing to Rob, but not because of the reasons Rob mentioned. One of the headaches is movmskps/pmovmskb do not have a quick reverse instruction (MIC-AVX512 and below). I do not know LLVM's X86 CodeGen enough to say whether it internally has mask-to/from-vector nodes. If it has, I'd hope X86 CodeGen can cancel out such things in a peephole manner very efficiently so that blindly going...
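Because MOVMSKPS/PMOVMSKB have no quick reverse instruction below AVX-512, expanding a packed 4-bit mask back into a sparse per-lane mask takes several instructions. One common SSE2 idiom is shown here as a hedged C sketch (function names are hypothetical, and this is not necessarily what LLVM's X86 backend emits): broadcast the mask, AND each lane with its own bit, then compare for equality.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar model: lane i becomes all-ones if bit i of the mask is set. */
static void mask_to_lanes_scalar(unsigned mask, uint32_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = ((mask >> i) & 1u) ? 0xFFFFFFFFu : 0u;
}

#if defined(__SSE2__)
/* SSE2 expansion: broadcast, isolate bit i in lane i, compare. */
static __m128i mask_to_lanes_sse2(unsigned mask) {
    const __m128i bits = _mm_setr_epi32(1, 2, 4, 8); /* lane i holds 1 << i */
    __m128i v = _mm_set1_epi32((int)mask);           /* mask in every lane */
    return _mm_cmpeq_epi32(_mm_and_si128(v, bits), bits);
}
#endif
```

Feeding the result back through `_mm_movemask_ps(_mm_castsi128_ps(...))` recovers the original 4-bit mask, which is a quick way to check the expansion.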

[LLVMdev] branch on vector compare?

2012 Sep 04

...ec this sequence might not produce anything good, the free sext requires llvm 2.7 on x86 to work at all (certainly shouldn't be a problem nowadays but on other backends it might be different) and for the ptest sequence very recent svn is required. I don't think the current code can generate movmskps + test (probably the next best thing without sse41) instead of ptest though if you only got sse. Roland
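The two sequences Roland contrasts can be sketched in C with SSE intrinsics: MOVMSKPS on the compare result followed by an integer test (plain SSE), or PTEST on the whole 128-bit result (SSE4.1). This is an illustrative sketch assuming an x86 compiler; the function names are hypothetical, and which instructions a given LLVM version actually selects for the equivalent IR is not guaranteed.

```c
#include <stdbool.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif
#if defined(__SSE4_1__)
#include <smmintrin.h>
#endif

/* Branch if any lane of a > b: MOVMSKPS + TEST with plain SSE. */
static bool any_gt(const float a[4], const float b[4]) {
#if defined(__SSE__)
    __m128 cmp = _mm_cmpgt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    return _mm_movemask_ps(cmp) != 0;      /* nonzero: some lane true */
#else
    for (int i = 0; i < 4; i++)            /* portable fallback */
        if (a[i] > b[i]) return true;
    return false;
#endif
}

#if defined(__SSE4_1__)
/* With SSE4.1, PTEST checks the 128-bit compare result directly. */
static bool any_gt_ptest(const float a[4], const float b[4]) {
    __m128i cmp = _mm_castps_si128(
        _mm_cmpgt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
    return !_mm_testz_si128(cmp, cmp);     /* testz is 1 only if all bits 0 */
}
#endif
```

Testing `== 0xF` instead of `!= 0` on the MOVMSKPS result gives the "all lanes" variant of the same branch.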

[LLVMdev] branch on vector compare?

2012 Sep 03

> > which goes through memory. Is there some idiom I'm missing so that it would use > > for instance movmsk for SSE or vcmpgt & cr6 for altivec? > > I don't think you are missing anything: LLVM IR has no support for horizontal > operations like or'ing the elements of a vector of boolean together. The code > generators do try to recognize a few idioms and

[LLVMdev] Vector select/compare support in LLVM

2011 Mar 08

...rse method, the masks are kept in <4 x 32bit> registers, which are mapped to xmm registers. This is the ‘native’ way of using masks. In the second representation, the packed method, the MSB bits are collected from the xmm register into a packed general purpose register. Luckily, SSE has the MOVMSKPS instruction, which converts sparse masks to packed masks. I am not sure which representation is better, but both are reasonable. The former may cause register pressure in some cases, while the latter may add the packing-unpacking overhead. _Sparse_ After my discussion with Duncan, last week, I st...
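The two mask representations can be modelled in portable C to make the trade-off concrete. In this sketch (type and function names are hypothetical), a sparse mask holds one all-ones or all-zeros 32-bit lane per element, as a CMPPS result does in an XMM register; a packed mask holds one bit per element in a general-purpose register; and `pack` is a scalar model of MOVMSKPS, which collects each lane's most significant bit.

```c
#include <stdint.h>

/* Sparse: one 32-bit lane per element, all-ones or all-zeros. */
typedef struct { uint32_t lane[4]; } sparse_mask;

/* Packed: one bit per element in a general-purpose register. */
typedef uint32_t packed_mask;

/* Scalar model of MOVMSKPS: MSB of lane i goes to bit i. */
static packed_mask pack(sparse_mask s) {
    packed_mask p = 0;
    for (int i = 0; i < 4; i++)
        p |= (s.lane[i] >> 31) << i;
    return p;
}

/* Combining two sparse masks: a lane-wise AND (ANDPS on hardware). */
static sparse_mask sparse_and(sparse_mask a, sparse_mask b) {
    sparse_mask r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] & b.lane[i];
    return r;
}

/* Combining two packed masks: a single integer AND. */
static packed_mask packed_and(packed_mask a, packed_mask b) {
    return a & b;
}
```

Both routes give the same packed result; the packed form frees vector registers at the cost of the pack (and later unpack) steps, which is the register-pressure versus packing-overhead trade-off described above.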

[LLVMdev] Vector select/compare support in LLVM

2011 Mar 09

"Rotem, Nadav" <nadav.rotem at intel.com> writes: > I can think of two ways to represent masks in x86: sparse and > packed. In the sparse method, the masks are kept in <4 x 32bit> > registers, which are mapped to xmm registers. This is the ‘native’ way > of using masks. This argues for the sparse representation, I think. > _Sparse_ After my discussion with

[LLVMdev] Vector select/compare support in LLVM

2011 Mar 10

...alizing of <4 x i1> to <4 x i32> is the way to go. Cheers, Nadav -----Original Message----- From: Rotem, Nadav Sent: Thursday, March 10, 2011 11:04 To: 'David A. Greene' Cc: llvmdev at cs.uiuc.edu Subject: RE: [LLVMdev] Vector select/compare support in LLVM Hi David, The MOVMSKPS instruction is cheap (2 cycles). Not to be confused with VMASKMOV, the AVX masked move, which is expensive. One of the arguments for packing masks is that it reduces vector-registers pressure. Auto-vectorizing compilers maintain multiple masks for different execution paths (for each loop nestin...

[LLVMdev] vector compare

2008 Dec 26

On Dec 25, 2008, at 11:02 AM, Eli Friedman wrote: > On Thu, Dec 25, 2008 at 1:54 AM, Eli Friedman > <eli.friedman at gmail.com> wrote: >> On Thu, Dec 25, 2008 at 1:28 AM, Claudio Basile <cbasile at tempo- >> da.com> wrote: >>> Hi all, >>> >>> is there any way to compare two 128bit values? >>> I have tried 3 different approaches

[LLVMdev] vector compare

2008 Dec 25

On Thu, Dec 25, 2008 at 1:54 AM, Eli Friedman <eli.friedman at gmail.com> wrote: > On Thu, Dec 25, 2008 at 1:28 AM, Claudio Basile <cbasile at tempo-da.com> wrote: >> Hi all, >> >> is there any way to compare two 128bit values? >> I have tried 3 different approaches and they all fail with an internal >> assertion. >> I'm running llvm 2.4 on
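The excerpt stops before the three approaches are shown. Assuming the comparison wanted is equality, one way to compare two 128-bit values on x86 uses the byte-wise analogue of MOVMSKPS: PCMPEQB sets each equal byte to 0xFF and PMOVMSKB gathers one bit per byte, so the values are equal exactly when the mask is 0xFFFF. A minimal C sketch (the function name is hypothetical; `memcmp` stands in when SSE2 is unavailable):

```c
#include <stdbool.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Equality of two 16-byte values. */
static bool eq128(const void *a, const void *b) {
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* PCMPEQB: 0xFF per equal byte; PMOVMSKB: one bit per byte. */
    return _mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) == 0xFFFF;
#else
    return memcmp(a, b, 16) == 0;          /* portable fallback */
#endif
}
```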

[PATCH 2/4] x86/emulator: add emulation of SIMD FP moves

2011 Nov 30

...regs.eip = (unsigned long)&instr[0]; + regs.ecx = 0; + regs.edx = (unsigned long)res; + rc = x86_emulate(&ctxt, &emulops); + if ( rc != X86EMUL_OKAY ) + goto fail; + asm ( "cmpeqps %1, %%xmm7\n\t" + "movmskps %%xmm7, %0" : "=r" (rc) : "m" (res[8]) ); + if ( rc != 0xf ) + goto fail; + printf("okay\n"); + } + else + printf("skipped\n"); + for ( j = 1; j <= 2; j++ ) { #if defined(__i386__) --- a/xen/arch/x86/x8...
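The inline-asm check in the patch (CMPEQPS against memory, MOVMSKPS, compare with 0xf) can also be written with SSE intrinsics. This is a hedged C sketch that assumes two in-memory arrays of four floats rather than the patch's `%xmm7` and `res` layout; the function name is hypothetical.

```c
#include <stdbool.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* True if all four float lanes compare equal. CMPEQPS sets each lane
 * to all-ones where equal; MOVMSKPS packs the four sign bits, so 0xf
 * means every lane matched. */
static bool all_eq_ps(const float x[4], const float y[4]) {
#if defined(__SSE__)
    __m128 eq = _mm_cmpeq_ps(_mm_loadu_ps(x), _mm_loadu_ps(y));
    return _mm_movemask_ps(eq) == 0xf;
#else
    for (int i = 0; i < 4; i++)            /* portable fallback */
        if (!(x[i] == y[i])) return false;
    return true;
#endif
}
```

Because CMPEQPS is a floating-point comparison, a lane holding NaN never compares equal (even to itself), while +0.0 and -0.0 do; a bit-exact check would compare the raw bytes instead.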

[LLVMdev] Vector select/compare support in LLVM

2011 Mar 10

...> > Cheers, > Nadav > > -----Original Message----- > From: Rotem, Nadav > Sent: Thursday, March 10, 2011 11:04 > To: 'David A. Greene' > Cc: llvmdev at cs.uiuc.edu > Subject: RE: [LLVMdev] Vector select/compare support in LLVM > > Hi David, > > The MOVMSKPS instruction is cheap (2 cycles). Not to be confused with VMASKMOV, the AVX masked move, which is expensive. > > One of the arguments for packing masks is that it reduces vector-registers pressure. Auto-vectorizing compilers maintain multiple masks for different execution paths (for each loo...

Question about VectorLegalizer::ExpandStore() with v4i1

2016 Jun 28

On Tue, Jun 28, 2016 at 2:45 AM, jingu kang via llvm-dev <llvm-dev at lists.llvm.org> wrote: > Hi All, > > Can someone comment below question whether it is wrong or not please? > > 2016-06-25 7:52 GMT+01:00 jingu kang <jaykang10 at gmail.com>: >> Hi All, >> >> I have a problem with VectorLegalizer::ExpandStore() with v4i1. >> >> Let's

[LLVMdev] Vector select/compare support in LLVM

2011 Mar 10

Hi David, The MOVMSKPS instruction is cheap (2 cycles). Not to be confused with VMASKMOV, the AVX masked move, which is expensive. One of the arguments for packing masks is that it reduces vector-registers pressure. Auto-vectorizing compilers maintain multiple masks for different execution paths (for each loop nestin...
