similar to: [RFC] Changes to llvm.experimental.vector.reduce intrinsics

Displaying 20 results from an estimated 6000 matches similar to: "[RFC] Changes to llvm.experimental.vector.reduce intrinsics"

2019 Apr 05
4
[RFC] Changes to llvm.experimental.vector.reduce intrinsics
On 05/04/2019 09:37, Simon Pilgrim via llvm-dev wrote: > On 04/04/2019 14:11, Sander De Smalen wrote: >> Proposed change: >> >> ---------------------------- >> >> In this RFC I propose changing the intrinsics for >> llvm.experimental.vector.reduce.fadd and >> llvm.experimental.vector.reduce.fmul (see options A and B). I also >> propose renaming
2019 May 16
4
[RFC] Changes to llvm.experimental.vector.reduce intrinsics
Hello again, I've been meaning to follow up on this thread for the last couple of weeks, my apologies for the delay. To summarise the feedback on the proposal for vector.reduce.fadd/fmul: There seems to be consensus to keep the explicit start value to better accommodate chained reductions (as opposed to generating IR that performs the reduction of the first element using extract/fadd/insert
2019 Apr 10
2
[RFC] Changes to llvm.experimental.vector.reduce intrinsics
> On 8 Apr 2019, at 11:37, Simon Moll <moll at cs.uni-saarland.de> wrote: > > Hi, > > On 4/5/19 10:47 AM, Simon Pilgrim via llvm-dev wrote: >> On 05/04/2019 09:37, Simon Pilgrim via llvm-dev wrote: >>> On 04/04/2019 14:11, Sander De Smalen wrote: >>>> Proposed change: >>>> ---------------------------- >>>> In this RFC I
2018 Aug 22
4
Condition code in DAGCombiner::visitFADDForFMACombine?
On 22.08.2018 13:29, Ryan Taylor wrote: > The example starts as SPIR-V with the NoContraction decoration flag on > the fmul. > > I think what you are saying seems valid in that if the user had put the > flag on the fadd instead of the fmul it would not contract and so in > this example the user needs to put the NoContraction on the fadd though > I'm not sure
2018 Aug 23
3
Condition code in DAGCombiner::visitFADDForFMACombine?
I don't think the global fast math flag should override the NoContraction decoration as that's mostly the point of that decoration to begin with, to have fine granular control while still having a broad sweeping optimization. Did I miss your point? I feel like I did. On Thu, Aug 23, 2018, 3:42 PM Michael Berg <michael_c_berg at apple.com> wrote: > Ryan, > > Given that the
2018 Aug 22
2
Condition code in DAGCombiner::visitFADDForFMACombine?
On 21.08.2018 16:08, Ryan Taylor via llvm-dev wrote: > So I have a test case where: > > %20 = fmul nnan arcp float %15, %19 > %21 = fadd reassoc nnan arcp contract float %20, -1.000000e+00 > > is being contracted in DAG to fmad. Is this correct since the fmul has > no reassoc or contract fast math flag? By having the reassoc and contract flags on fadd, the frontend is
2018 Aug 23
2
Condition code in DAGCombiner::visitFADDForFMACombine?
Nicolai, Can you do without the use of -fp-contract=fast (Options.AllowFPOpFusion == FPOpFusion::Fast ) and without Unsafe? As I SPIR-V’s usage of NoContraction flies in the face of both. If so, you should be able to get what you want, as then you are down to just IR flags. You will need a model to generate the correct behavior though in your SPIR-V implementation wrt IR flag emissions.
2018 Aug 20
3
Condition code in DAGCombiner::visitFADDForFMACombine?
I'm curious why the condition to fuse is this: // Floating-point multiply-add with intermediate rounding. bool HasFMAD = (LegalOperations && TLI.isOperationLegal(ISD::FMAD, VT)); static bool isContractable(SDNode *N) { SDNodeFlags F = N->getFlags(); return F.hasAllowContract() || F.hasAllowReassociation(); } bool CanFuse = Options.UnsafeFPMath || isContractable(N); bool
2018 Aug 23
2
Condition code in DAGCombiner::visitFADDForFMACombine?
Maybe there is a cleaner solution but it seems like adding a 'nocontract' flag is close to the intention of spir-v and is an easy check in the DAGCombiner without breaking anything else and its intentions are very clear. Right now the DAGCombiner logic doesn't seem to be able to handle the case of having fast math globally with instruction level flags to turn off fast math. Right now,
2018 Aug 23
2
Condition code in DAGCombiner::visitFADDForFMACombine?
Michael, >From the spec with regards to reassoc: – 15225 Include no re-association as a constraint required by the NoContraction Decoration. I don't see a solution given the situation where -fp-contract=fast and we want to contract. Furthermore, I think a 'nocontract' flag will allow the IR to be more readable in it's intention. The problem is you can have 2 fp arith
2018 Aug 21
3
Condition code in DAGCombiner::visitFADDForFMACombine?
> On Aug 21, 2018, at 17:57, Ryan Taylor <ryta1203 at gmail.com> wrote: > > Matt, > I'm sorry, actually it's fma not fmad. > > In the post-legalizer DAG combine for the given code it's producing fma not fmad. That doens't seem correct. > The contract is on the fadd. I’m not really sure what the rule is supposed to be for contract between the nodes.
2018 Aug 22
2
Condition code in DAGCombiner::visitFADDForFMACombine?
On 22.08.2018 17:52, Ryan Taylor wrote: > This is probably going to effect on other backends and break llvm-lit > for them? Very likely, yes. Can you take a look at how big the fallout is? This might give us a hint about what other frontends might expect, and who needs to be involved in the discussion (if one is needed). Cheers, Nicolai > > On Wed, Aug 22, 2018 at 11:41 AM
2018 Aug 21
2
Condition code in DAGCombiner::visitFADDForFMACombine?
> On Aug 21, 2018, at 17:08, Ryan Taylor via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > So I have a test case where: > > %20 = fmul nnan arcp float %15, %19 > %21 = fadd reassoc nnan arcp contract float %20, -1.000000e+00 > > is being contracted in DAG to fmad. Is this correct since the fmul has no reassoc or contract fast math flag? > > Thanks. fmad
2019 Sep 02
3
AVX2 codegen - question reg. FMA generation
Hello, On the appended reasonably simple test case that has an fmul/fadd sequence on <8 x float> vector types, I don't see the x86-64 code generator (with cpu set to haswell or later types) turning it into an AVX2 FMA instructions. Here's the snippet in the output it generates: $ llc -O3 -mcpu=skylake --------------------- .LBB0_2: # =>This Inner
2018 Aug 21
2
Condition code in DAGCombiner::visitFADDForFMACombine?
For this code: %20 = fmul reassoc nnan arcp contract float %15, %19 %21 = fadd nnan arcp float %20, -1.000000e+00 This does not result in fused multiply-add. it seems like the logic to contact the fmul from the fadd is different than whether to decide to contract the fadd. I would think the logic would be the same for each instruction in the pair. On Tue, Aug 21, 2018 at 2:05 PM Ryan
2019 Sep 02
2
AVX2 codegen - question reg. FMA generation
On Mon, 2 Sep 2019 at 16:59, Roman Lebedev <lebedev.ri at gmail.com> wrote: > > It appears you need 'reassoc' on fmul/fadd: > https://godbolt.org/z/nuTzx2 Thanks very much, that was it. Either that or providing -enable-unsafe-fp-math to llc yielded FMAs. I didn't expect this since using FMAs here instead of mul/add appears to be safer (the reverse is unsafe). ~ Uday
2017 Mar 15
2
Data structure improvement for the SLP vectorizer
There was some discussion of this on the llvm-commits list, but I wanted to raise the topic for discussion here. The background of the -commits discussion was that r296863 added the ability to sort memory access when the SLP vectorizer reached a load (the SLP vectorizer starts at a store or some other sink, and tries to go up the tree vectorizing as it goes along - if the input is in a different
2013 Jul 18
0
[LLVMdev] SIMD instructions and memory alignment on X86
Are you able to send any IR for others to reproduce this issue? On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman <peter at uformia.com> wrote: > Unfortunately, this doesn't appear to be the bug I'm hitting. I applied > the fix to my source and it didn't make a difference. > > Also further testing found me getting the same behavior with other SIMD > instructions.
2010 Sep 29
3
[LLVMdev] spilling & xmm register usage
Hello everybody, I have stumbled upon a test case (the attached module is a slightly reduced version) that shows extremely reduced performance on linux compared to windows when executed using LLVM's JIT. We narrowed the problem down to the actual code being generated, the source IR on both systems is the same. Try compiling the attached module: llc -O3 -filetype=asm -o BAD.s BAD.ll Under
2013 Jul 18
2
[LLVMdev] SIMD instructions and memory alignment on X86
Unfortunately, this doesn't appear to be the bug I'm hitting. I applied the fix to my source and it didn't make a difference. Also further testing found me getting the same behavior with other SIMD instructions. The common factor is in each case, ECX is set to 0x7fffffff, and it's an operation using xmm ptr ecx+offset . Additionally, turning the optimization level passed to