thr3ads.net - similar to: "[RFC] Matrix support (take 2)"

Displaying 20 results from an estimated 80000 matches similar to: "[RFC] Matrix support (take 2)"

2018 Dec 19

[RFC] Matrix support (take 2)

Hi Chris, > On Dec 18, 2018, at 8:45 PM, Chris Lattner <clattner at nondot.org> wrote: > > >> On Dec 5, 2018, at 10:41 AM, Adam Nemet <anemet at apple.com> wrote: >> Hi all, >> >> After the previous RFC[1], there were multiple discussions on the ML and in person at the DevMtg. I will summarize the options discussed and propose a path forward.

[PATCH] D70246: [InstCombine] remove identity shuffle simplification for mask with undefs

2019 Dec 09

[PATCH] D70246: [InstCombine] remove identity shuffle simplification for mask with undefs

Sanjay, I'm looking at some missed optimizations caused by D70246. Here's a test case: define <4 x float> @f(i32 %t32, <4 x float>* %t24) { .entry: %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2 %t44 = bitcast <3 x i32> %t43 to <3 x float> %t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 0, i32 undef,

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

The instcombine pass cleans up a lot. Any idea why there are still shufflevector, insertelement, *and* bitcast (!!) etc. instructions left? The original loop is so clean, a textbook example I'd say. There is no need to shuffle anything.At least I don't see it. Frank vector.ph: ; preds = %L5 %broadcast.splatinsert1 = insertelement <4 x

[RFC] Changes to llvm.experimental.vector.reduce intrinsics

2019 Apr 04

[RFC] Changes to llvm.experimental.vector.reduce intrinsics

Hi all, While working on a patch to improve codegen for fadd reductions on AArch64, I stumbled upon an outstanding issue with the experimental.vector.reduce.fadd/fmul intrinsics where the accumulator argument is ignored when fast-math is set on the intrinsic call. This behaviour is described in the LangRef (https://www.llvm.org/docs/LangRef.html#id1905) and is mentioned in

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

Yes, you need the latest ToT version of llvm or you run -loop-vectorize -earlycse -instcombine -simplifycfg The bitcast essentially is a noop to satisfy the type system. This is how your example looks like for me: vector.body: ; preds = %vector.body, %vector.ph %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] %.lhs = shl i64 %6, 2

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

The loop vectorizer relies on cleanup passes to be run after it: from Transforms/IPO/PassManagerBuilder.cpp: // Add the various vectorization passes and relevant cleanup passes for // them since we are no longer in the middle of the main scalar pipeline. MPM.add(createLoopVectorizePass(DisableUnrollLoops)); MPM.add(createInstructionCombiningPass());

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

The following IR implements the following nested loop: for (int i = start ; i < end ; ++i ) for (int p = 0 ; p < 4 ; ++p ) a[i*4+p] = b[i*4+p] + c[i*4+p]; define void @main(i64 %arg0, i64 %arg1, i1 %arg2, i64 %arg3, float* noalias %arg4, float* noalias %arg5, float* noalias %arg6) { entrypoint: br i1 %arg2, label %L0, label %L1 L0:

[RFC] Changes to llvm.experimental.vector.reduce intrinsics

2019 Apr 05

[RFC] Changes to llvm.experimental.vector.reduce intrinsics

On 05/04/2019 09:37, Simon Pilgrim via llvm-dev wrote: > On 04/04/2019 14:11, Sander De Smalen wrote: >> Proposed change: >> >> ---------------------------- >> >> In this RFC I propose changing the intrinsics for >> llvm.experimental.vector.reduce.fadd and >> llvm.experimental.vector.reduce.fmul (see options A and B). I also >> propose renaming

[RFC] Changes to llvm.experimental.vector.reduce intrinsics

2019 Apr 10

[RFC] Changes to llvm.experimental.vector.reduce intrinsics

> On 8 Apr 2019, at 11:37, Simon Moll <moll at cs.uni-saarland.de> wrote: > > Hi, > > On 4/5/19 10:47 AM, Simon Pilgrim via llvm-dev wrote: >> On 05/04/2019 09:37, Simon Pilgrim via llvm-dev wrote: >>> On 04/04/2019 14:11, Sander De Smalen wrote: >>>> Proposed change: >>>> ---------------------------- >>>> In this RFC I

llc error

2016 Sep 03

llc error

Hi all, The attached LLVM assembly file fails to generate x86 code when compiled using llc. compilation command - ../llvm-build/bin/llc -filetype=asm -march=x86-64 -mcpu=core-avx2 ex4.ll The error message is, LLVM ERROR: Cannot select: t95: v8f32 = X86ISD::SUBV_BROADCAST t17 t17: v4f32,ch = load<LD16[%scevgep](tbaa=<0x4dbcd98>)> t0, t16, undef:i64 t16: i64 = add t2,

[LLVMdev] bb-vectorizer transforms only part of the block

2015 Jun 22

[LLVMdev] bb-vectorizer transforms only part of the block

The loads, stores and float arithmetic in attached function should be completely vectorizable. The bb-vectorizer does a good job at first, but from instruction %96 on it messes up by adding unnecessary vectorshuffles. (The function was designed so that no shuffle would be needed in order to vectorize it). I tested this with llvm 3.6 with the following command:

llc error

2016 Sep 03

llc error

I updated to the latest revision and now llvm does not build and quits cmake with CMake Error at cmake/modules/LLVMProcessSources.cmake:83 (message): Found unknown source file ../llvm-revec/lib/CodeGen/MachineFunctionAnalysis.cpp Please update ../llvm-revec/lib/CodeGen/CMakeLists.txt Thanks On Sat, Sep 3, 2016 at 2:09 AM, Craig Topper <craig.topper at gmail.com> wrote: >

AVX2 codegen - question reg. FMA generation

2019 Sep 02

AVX2 codegen - question reg. FMA generation

Hello, On the appended reasonably simple test case that has an fmul/fadd sequence on <8 x float> vector types, I don't see the x86-64 code generator (with cpu set to haswell or later types) turning it into an AVX2 FMA instructions. Here's the snippet in the output it generates: $ llc -O3 -mcpu=skylake --------------------- .LBB0_2: # =>This Inner

AVX2 codegen - question reg. FMA generation

2019 Sep 02

AVX2 codegen - question reg. FMA generation

On Mon, 2 Sep 2019 at 16:59, Roman Lebedev <lebedev.ri at gmail.com> wrote: > > It appears you need 'reassoc' on fmul/fadd: > https://godbolt.org/z/nuTzx2 Thanks very much, that was it. Either that or providing -enable-unsafe-fp-math to llc yielded FMAs. I didn't expect this since using FMAs here instead of mul/add appears to be safer (the reverse is unsafe). ~ Uday

[RFC] Changes to llvm.experimental.vector.reduce intrinsics

2019 May 16

[RFC] Changes to llvm.experimental.vector.reduce intrinsics

Hello again, I've been meaning to follow up on this thread for the last couple of weeks, my apologies for the delay. To summarise the feedback on the proposal for vector.reduce.fadd/fmul: There seems to be consensus to keep the explicit start value to better accommodate chained reductions (as opposed to generating IR that performs the reduction of the first element using extract/fadd/insert

[LLVMdev] X86 - Help on fixing a poor code generation bug

2013 Dec 05

[LLVMdev] X86 - Help on fixing a poor code generation bug

Hi Andrea, Thanks for working on this. I can see two approaches to solving this problem. The first one (that you suggested) is to catch this pattern after register allocation. The second approach is to eliminate this redundancy during instruction selection. Can you please look into catching this pattern during iSel? The idea is that ADDSS does an ADD plus BLEND operations, and you can easily

RFC: Matrix math support

2019 Oct 28

RFC: Matrix math support

Hello, This is a follow-up to Adam’s original RFC (RFC: First-class Matrix type, [1] <http://lists.llvm.org/pipermail/llvm-dev/2018-October/126871.html>) <https://confluence.sd.apple.com/pages/createpage.action?spaceKey=DPT&title=1&linkCreation=true&fromPageId=1360610956> and his follow-up ([RFC] Matrix support (take 2, [2

LangRef semantics for shufflevector with undef mask is incorrect

2019 Nov 27

LangRef semantics for shufflevector with undef mask is incorrect

Ok, makes sense. My suggestion is that we patch the IR Verifier to ensure that the mask is indeed a vector of constants and/or undefs. Right now it only runs the standard checks for instructions. We will also run Alive2 on the test suite to make sure undef is never replaced in practice. Thanks, Nuno -----Original Message----- From: Eli Friedman <efriedma at quicinc.com> Sent: 27 de

[LLVMdev] [Vectorization] Mis match in code generated

2014 Nov 10

[LLVMdev] [Vectorization] Mis match in code generated

Hi Suyog, Thanks for looking at this. This has recently got itself onto my TODO list too. > I am not sure how much all this will improve the code quality for horizontal reduction > (donno how frequently such pattern of horizontal reduction from same array occurs in real world/SPECS). Actually the main loop of 470.lbm can be SLP vectorized like this. We have three parts to it: A fully

llvm-stress crash

2017 Mar 14

llvm-stress crash

Hi, Using llvm-stress, I got a crash after Post-RA pseudo expansion, with machine verifier. A 128 bit register %vreg233:subreg_l32<def,read-undef> = LLCRMux %vreg119; GR128Bit:%vreg233 GRX32Bit:%vreg119 gets spilled: %vreg265:subreg_l32<def,read-undef> = LLCRMux %vreg119; GR128Bit:%vreg265 GRX32Bit:%vreg119 ST128 %vreg265, <fi#10>, 0, %noreg;

similar to: [RFC] Matrix support (take 2)