Displaying 20 results from an estimated 80000 matches similar to: "[RFC] Matrix support (take 2)"
2018 Dec 19
3
[RFC] Matrix support (take 2)
Hi Chris,
> On Dec 18, 2018, at 8:45 PM, Chris Lattner <clattner at nondot.org> wrote:
>
>
>> On Dec 5, 2018, at 10:41 AM, Adam Nemet <anemet at apple.com> wrote:
>> Hi all,
>>
>> After the previous RFC[1], there were multiple discussions on the ML and in person at the DevMtg. I will summarize the options discussed and propose a path forward.
2019 Dec 09
2
[PATCH] D70246: [InstCombine] remove identity shuffle simplification for mask with undefs
Sanjay,
I'm looking at some missed optimizations caused by D70246. Here's a test case:
define <4 x float> @f(i32 %t32, <4 x float>* %t24) {
.entry:
%t43 = insertelement <3 x i32> undef, i32 %t32, i32 2
%t44 = bitcast <3 x i32> %t43 to <3 x float>
%t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32>
<i32 0, i32 undef,
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The instcombine pass cleans up a lot.
Any idea why there are still shufflevector, insertelement, *and* bitcast
(!!) etc. instructions left? The original loop is so clean, a textbook
example I'd say. There is no need to shuffle anything.At least I don't
see it.
Frank
vector.ph: ; preds = %L5
%broadcast.splatinsert1 = insertelement <4 x
2019 Apr 04
5
[RFC] Changes to llvm.experimental.vector.reduce intrinsics
Hi all,
While working on a patch to improve codegen for fadd reductions on AArch64, I stumbled upon an outstanding issue with the experimental.vector.reduce.fadd/fmul intrinsics where the accumulator argument is ignored when fast-math is set on the intrinsic call. This behaviour is described in the LangRef (https://www.llvm.org/docs/LangRef.html#id1905) and is mentioned in
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
Yes, you need the latest ToT version of llvm or you run
-loop-vectorize -earlycse -instcombine -simplifycfg
The bitcast essentially is a noop to satisfy the type system.
This is how your example looks like for me:
vector.body: ; preds = %vector.body, %vector.ph
%index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
%.lhs = shl i64 %6, 2
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The loop vectorizer relies on cleanup passes to be run after it:
from Transforms/IPO/PassManagerBuilder.cpp:
// Add the various vectorization passes and relevant cleanup passes for
// them since we are no longer in the middle of the main scalar pipeline.
MPM.add(createLoopVectorizePass(DisableUnrollLoops));
MPM.add(createInstructionCombiningPass());
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The following IR implements the following nested loop:
for (int i = start ; i < end ; ++i )
for (int p = 0 ; p < 4 ; ++p )
a[i*4+p] = b[i*4+p] + c[i*4+p];
define void @main(i64 %arg0, i64 %arg1, i1 %arg2, i64 %arg3, float*
noalias %arg4, float* noalias %arg5, float* noalias %arg6) {
entrypoint:
br i1 %arg2, label %L0, label %L1
L0:
2019 Apr 05
4
[RFC] Changes to llvm.experimental.vector.reduce intrinsics
On 05/04/2019 09:37, Simon Pilgrim via llvm-dev wrote:
> On 04/04/2019 14:11, Sander De Smalen wrote:
>> Proposed change:
>>
>> ----------------------------
>>
>> In this RFC I propose changing the intrinsics for
>> llvm.experimental.vector.reduce.fadd and
>> llvm.experimental.vector.reduce.fmul (see options A and B). I also
>> propose renaming
2019 Apr 10
2
[RFC] Changes to llvm.experimental.vector.reduce intrinsics
> On 8 Apr 2019, at 11:37, Simon Moll <moll at cs.uni-saarland.de> wrote:
>
> Hi,
>
> On 4/5/19 10:47 AM, Simon Pilgrim via llvm-dev wrote:
>> On 05/04/2019 09:37, Simon Pilgrim via llvm-dev wrote:
>>> On 04/04/2019 14:11, Sander De Smalen wrote:
>>>> Proposed change:
>>>> ----------------------------
>>>> In this RFC I
2016 Sep 03
4
llc error
Hi all,
The attached LLVM assembly file fails to generate x86 code when compiled
using llc.
compilation command - ../llvm-build/bin/llc -filetype=asm -march=x86-64
-mcpu=core-avx2 ex4.ll
The error message is,
LLVM ERROR: Cannot select: t95: v8f32 = X86ISD::SUBV_BROADCAST t17
t17: v4f32,ch = load<LD16[%scevgep](tbaa=<0x4dbcd98>)> t0, t16, undef:i64
t16: i64 = add t2,
2015 Jun 22
2
[LLVMdev] bb-vectorizer transforms only part of the block
The loads, stores and float arithmetic in attached function should be
completely vectorizable. The bb-vectorizer does a good job at first, but
from instruction %96 on it messes up by adding unnecessary
vectorshuffles. (The function was designed so that no shuffle would be
needed in order to vectorize it).
I tested this with llvm 3.6 with the following command:
2016 Sep 03
2
llc error
I updated to the latest revision and now llvm does not build and quits
cmake with
CMake Error at cmake/modules/LLVMProcessSources.cmake:83 (message):
Found unknown source file
../llvm-revec/lib/CodeGen/MachineFunctionAnalysis.cpp
Please update
../llvm-revec/lib/CodeGen/CMakeLists.txt
Thanks
On Sat, Sep 3, 2016 at 2:09 AM, Craig Topper <craig.topper at gmail.com> wrote:
>
2019 Sep 02
3
AVX2 codegen - question reg. FMA generation
Hello,
On the appended reasonably simple test case that has an fmul/fadd
sequence on <8 x float> vector types, I don't see the x86-64 code
generator (with cpu set to haswell or later types) turning it into an
AVX2 FMA instructions. Here's the snippet in the output it generates:
$ llc -O3 -mcpu=skylake
---------------------
.LBB0_2: # =>This Inner
2019 Sep 02
2
AVX2 codegen - question reg. FMA generation
On Mon, 2 Sep 2019 at 16:59, Roman Lebedev <lebedev.ri at gmail.com> wrote:
>
> It appears you need 'reassoc' on fmul/fadd:
> https://godbolt.org/z/nuTzx2
Thanks very much, that was it. Either that or providing
-enable-unsafe-fp-math to llc yielded FMAs. I didn't expect this since
using FMAs here instead of mul/add appears to be safer (the reverse is
unsafe).
~ Uday
2019 May 16
4
[RFC] Changes to llvm.experimental.vector.reduce intrinsics
Hello again,
I've been meaning to follow up on this thread for the last couple of weeks, my apologies for the delay.
To summarise the feedback on the proposal for vector.reduce.fadd/fmul:
There seems to be consensus to keep the explicit start value to better accommodate chained reductions (as opposed to generating IR that performs the reduction of the first element using extract/fadd/insert
2013 Dec 05
0
[LLVMdev] X86 - Help on fixing a poor code generation bug
Hi Andrea,
Thanks for working on this. I can see two approaches to solving this problem. The first one (that you suggested) is to catch this pattern after register allocation. The second approach is to eliminate this redundancy during instruction selection. Can you please look into catching this pattern during iSel? The idea is that ADDSS does an ADD plus BLEND operations, and you can easily
2019 Oct 28
6
RFC: Matrix math support
Hello,
This is a follow-up to Adam’s original RFC (RFC: First-class Matrix type, [1] <http://lists.llvm.org/pipermail/llvm-dev/2018-October/126871.html>) <https://confluence.sd.apple.com/pages/createpage.action?spaceKey=DPT&title=1&linkCreation=true&fromPageId=1360610956> and his follow-up ([RFC] Matrix support (take 2, [2
2019 Nov 27
2
LangRef semantics for shufflevector with undef mask is incorrect
Ok, makes sense.
My suggestion is that we patch the IR Verifier to ensure that the mask is
indeed a vector of constants and/or undefs. Right now it only runs the
standard checks for instructions.
We will also run Alive2 on the test suite to make sure undef is never
replaced in practice.
Thanks,
Nuno
-----Original Message-----
From: Eli Friedman <efriedma at quicinc.com>
Sent: 27 de
2014 Nov 10
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Suyog,
Thanks for looking at this. This has recently got itself onto my TODO list
too.
> I am not sure how much all this will improve the code quality for
horizontal reduction
> (donno how frequently such pattern of horizontal reduction from same
array occurs in real world/SPECS).
Actually the main loop of 470.lbm can be SLP vectorized like this. We have
three parts to it: A fully
2017 Mar 14
3
llvm-stress crash
Hi,
Using llvm-stress, I got a crash after Post-RA pseudo expansion, with
machine verifier.
A 128 bit register
%vreg233:subreg_l32<def,read-undef> = LLCRMux %vreg119;
GR128Bit:%vreg233 GRX32Bit:%vreg119
gets spilled:
%vreg265:subreg_l32<def,read-undef> = LLCRMux %vreg119;
GR128Bit:%vreg265 GRX32Bit:%vreg119
ST128 %vreg265, <fi#10>, 0, %noreg;