thr3ads.net - similar to: "[LLVMdev] loop vectorizer erroneously finds 256 bit vectors"

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

2013 Nov 10

I looked more into this. For the previously sent IR the vector width of 256 bit is found mistakenly (and reproducibly) on this hardware:
model name : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
For the same IR the loop vectorizer finds the correct vector width (128 bit) on:
model name : Intel(R) Xeon(R) CPU E5630 @ 2.53GHz
model name : Intel(R) Core(TM) i7 CPU M 640 @

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

2013 Nov 10

Hi Frank,
I'm not an Intel expert, but it seems that your Xeon E5 supports AVX, which does have 256-bit vectors. The other two only support SSE instructions, which are only 128 bits wide.
cheers, --renato
On 10 November 2013 06:05, Frank Winter <fwinter at jlab.org> wrote:
> I looked more into this. For the previously sent IR the vector width of
> 256 bit is found mistakenly

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

2013 Nov 10

Hi Renato, you are right! There is 'avx' support:
fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave
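
The flag listing in this reply looks like the "flags" line of /proc/cpuinfo. As a rough sketch of how one might check such a line for a feature by hand, the helper below searches a whitespace-separated flag string for an exact token. The name `has_cpu_flag` is hypothetical and is not part of LLVM or libc; this is not how LLVM itself detects host features, only a way to read the listing without mistaking `sse` for `sse2`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` occurs as a whole whitespace-separated token in
 * `flags`, 0 otherwise. Hypothetical helper for illustration only. */
int has_cpu_flag(const char *flags, const char *name)
{
    assert(flags != NULL && name != NULL);

    size_t len = strlen(name);
    if (len == 0)
        return 0;

    const char *p = flags;
    while (*p != '\0') {
        /* Skip the separators in front of the next token. */
        p += strspn(p, " \t\n");
        /* Length of the token that starts at p (0 at end of string). */
        size_t tok = strcspn(p, " \t\n");
        if (tok == len && strncmp(p, name, len) == 0)
            return 1;
        p += tok;
    }
    return 0;
}
```

Called on a flags line, `has_cpu_flag(line, "avx")` answers the question raised in the thread. Comparing whole tokens matters because many flag names share a prefix.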

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

The following IR implements the following nested loop:
for (int i = start ; i < end ; ++i )
  for (int p = 0 ; p < 4 ; ++p )
    a[i*4+p] = b[i*4+p] + c[i*4+p];
define void @main(i64 %arg0, i64 %arg1, i1 %arg2, i64 %arg3, float* noalias %arg4, float* noalias %arg5, float* noalias %arg6) {
entrypoint:
  br i1 %arg2, label %L0, label %L1
L0:
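
For reference, the nested loop from the post can be written as a standalone C function, next to a flattened form that touches the same elements in one loop. This is only a sketch: the function names are made up, `restrict` stands in for the `noalias` attributes in the IR, and `long` for the `i64` bounds. It says nothing about what the vectorizer actually does with either form.

```c
#include <assert.h>

/* Nested form as in the post: the inner loop has a fixed trip count of 4. */
void add_nested(float *restrict a, const float *restrict b,
                const float *restrict c, long start, long end)
{
    assert(start <= end);
    for (long i = start; i < end; ++i)
        for (int p = 0; p < 4; ++p)
            a[i * 4 + p] = b[i * 4 + p] + c[i * 4 + p];
}

/* Flattened form: one loop over the same index range [4*start, 4*end). */
void add_flat(float *restrict a, const float *restrict b,
              const float *restrict c, long start, long end)
{
    assert(start <= end);
    for (long k = start * 4; k < end * 4; ++k)
        a[k] = b[k] + c[k];
}
```

Both functions write exactly the elements `a[4*start]` through `a[4*end - 1]`, so the inner loop carries no information beyond its trip count.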

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

The loop vectorizer relies on cleanup passes to be run after it. From Transforms/IPO/PassManagerBuilder.cpp:
// Add the various vectorization passes and relevant cleanup passes for
// them since we are no longer in the middle of the main scalar pipeline.
MPM.add(createLoopVectorizePass(DisableUnrollLoops));
MPM.add(createInstructionCombiningPass());

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

The instcombine pass cleans up a lot. Any idea why there are still shufflevector, insertelement, *and* bitcast (!!) etc. instructions left? The original loop is so clean, a textbook example I'd say. There is no need to shuffle anything. At least I don't see it.
Frank
vector.ph: ; preds = %L5
%broadcast.splatinsert1 = insertelement <4 x

[LLVMdev] Upcoming API change: FAdd, FSub, FMul

2009 Jun 15

Hello, The LLVM IR opcodes Add, Sub, and Mul have been each split into two. Add, Sub, and Mul now only handle integer types, and three new opcodes, FAdd, FSub, and FMul now handle floating-point types. The main LLVM APIs are currently preserving backwards compatibility, transparently mapping integer opcodes to corresponding floating-point opcodes when the operands have floating-point types.

[LLVMdev] Upcoming API change: FAdd, FSub, FMul

2009 Jun 16

On Jun 16, 2009, at 7:34 AM, Aaron Gray wrote:
>> The LLVM IR opcodes Add, Sub, and Mul have been each split into
>> two. Add, Sub, and Mul now only handle integer types, and three
>> new opcodes, FAdd, FSub, and FMul now handle floating-point types.
>
> Dan,
>
> Wondering the reason why there is no FDiv ?
FDiv already exists; div was split quite a while ago.
Dan

Fusing contract fadd/fsub with normal fmul

2017 Jun 10

Hi, On LLVM 5.0 (current trunk), fadd/fsub and fmul that are both marked with `contract` or `fast` can be merged into an fma instruction by the backend. I'm wondering about the exact semantics of this new flag as well as `fast` and, in particular, whether it would be valid to do this when only the `fadd`/`fsub` (and not the `fmul`) is marked with `contract` or at least `fast`. The reasoning is that doing

[LLVMdev] Upcoming API change: FAdd, FSub, FMul

2009 Jun 16

> The LLVM IR opcodes Add, Sub, and Mul have been each split into
> two. Add, Sub, and Mul now only handle integer types, and three
> new opcodes, FAdd, FSub, and FMul now handle floating-point types.
Dan,
Wondering the reason why there is no FDiv ?
Thanks, Aaron

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

2013 Nov 01

I am trying a setup where the one loop is rewritten as two loops. This avoids the 'rem' and 'div' instructions in the index calculation (which give the loop vectorizer a hard time). However, with this setup the loop vectorizer complains that the loop is too small.
LV: Checking a loop in "main"
LV: Found a loop: L3
LV: Found a loop with a very small trip count. This loop
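
The post does not show the index calculation, so the following is only an illustrative sketch under an assumed layout: a single loop that recovers an outer and an inner index from one counter with division and remainder, and the same work written as two loops. The function names, the inner extent of 4, and the `p * n_outer + i` layout are all assumptions made for the example.

```c
#include <assert.h>

/* Single loop: the (i, p) pair is recovered from k with div and rem. */
void scale_single_loop(float *dst, const float *src, long n_outer)
{
    assert(n_outer >= 0);
    for (long k = 0; k < n_outer * 4; ++k) {
        long i = k / 4; /* outer index */
        long p = k % 4; /* inner index */
        dst[p * n_outer + i] = 2.0f * src[p * n_outer + i];
    }
}

/* Two loops: no div or rem, but the inner loop only runs 4 times. */
void scale_two_loops(float *dst, const float *src, long n_outer)
{
    assert(n_outer >= 0);
    for (long i = 0; i < n_outer; ++i)
        for (long p = 0; p < 4; ++p)
            dst[p * n_outer + i] = 2.0f * src[p * n_outer + i];
}
```

In the second form the innermost loop has a trip count of 4, which is the kind of loop the "very small trip count" message in the log refers to.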

[LLVMdev] Bug 16257 - fmul of undef ConstantExpr not folded to undef

2014 Aug 27

Hi Duncan,
Thanks a lot for taking the time to provide that great and informative explanation. Now the "undef" logic makes much more sense to me.
>> /You are wrong to say that "div undef, %X" is folded to "undef" by InstructionSimplify, it is folded to zero./
My mistake. I meant to say "fdiv undef, %X" is folded to "undef" (not

[LLVMdev] Bug 16257 - fmul of undef ConstantExpr not folded to undef

2014 Aug 27

Duncan,
> Hi Oleg,
>
>> /This is either a mistake, or a decision that in LLVM IR snans are always
>> considered to be signalling. /
>> Yes, this seems to be an agreement to treat "undef" as a SNaN for "fdiv".
>
> "undef" is whatever bit pattern you want it to be, i.e. the compiler
> can assume it is any

AVX2 codegen - question reg. FMA generation

2019 Sep 02

On Mon, 2 Sep 2019 at 16:59, Roman Lebedev <lebedev.ri at gmail.com> wrote:
>
> It appears you need 'reassoc' on fmul/fadd:
> https://godbolt.org/z/nuTzx2
Thanks very much, that was it. Either that or providing -enable-unsafe-fp-math to llc yielded FMAs. I didn't expect this since using FMAs here instead of mul/add appears to be safer (the reverse is unsafe).
~ Uday

[LLVMdev] API change: add, sub, and mul no longer do floating-point

2010 May 03

Quick heads up: On LLVM trunk, the add, sub, and mul instructions no longer accept floating-point operands. The fadd, fsub, and fmul instructions should be used instead. This change actually happened back in LLVM 2.6; since then, LLVM has just been silently converting add into fadd and so on, and the change today is that it no longer does this silent conversion. Dan

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

2013 Nov 01

In the case when coming from C it was probably the loop unroller and SLP vectorizer which vectorized the code. Potentially I could do the same in the IR. However, the loop body that is generated in the IR can get very large. Thus, the loop unroller will refuse to unroll the loop in a large number of (important) cases. Isn't there a way to convince the loop vectorizer that it should

[LLVMdev] What's the Alias Analysis does clang use ?

2013 Nov 11

Hi, LLVM community:
I found that basicaa does not seem to report must-not-alias for __restrict__ arguments in C/C++. It only compares two pointers and the underlying objects they point to. I wonder how clang does alias analysis for the C/C++ keyword restrict. Let's assume we compile the following code:
$ cat myalias.cc
float foo(float * __restrict__ v0, float * __restrict__ v1, float * __restrict__ v2, float *

Paging in waves.

2013 Dec 06

I've been working on writing a subroutine to page groups of phones at once and I'm having some difficulty. My goal is to have a user call an extension, I record the page they wish to play, I then page out that recorded file to the phones in groups.
[sub-masspage]
exten => s,1,NoOP
same => n,Answer
same => n,Set(filename=$PAGE)
same => n,Wait(1)
same =>

[RFC] Changes to llvm.experimental.vector.reduce intrinsics

2019 Apr 05

On 05/04/2019 09:37, Simon Pilgrim via llvm-dev wrote:
> On 04/04/2019 14:11, Sander De Smalen wrote:
>> Proposed change:
>>
>> ----------------------------
>>
>> In this RFC I propose changing the intrinsics for
>> llvm.experimental.vector.reduce.fadd and
>> llvm.experimental.vector.reduce.fmul (see options A and B). I also
>> propose renaming

[LLVMdev] What's the Alias Analysis does clang use ?

2013 Nov 12

Hi,
Your problem is that the function arguments, which are marked as noalias, are not being directly used as the base objects of the array accesses:
> %v0.addr = alloca float*, align 8
> %v1.addr = alloca float*, align 8
> %v2.addr = alloca float*, align 8
> %t.addr = alloca float*, align 8
...
> store float* %v0, float** %v0.addr, align 8
> store float* %v1, float** %v1.addr,