Similar to: "loop vectorizer disabling"

Displaying 20 results from an estimated 3000 matches similar to: "loop vectorizer disabling"

2019 Jul 15 (2) - Tail-Loop Folding/Predication
I am looking for feedback to add support for a new loop pragma to Clang/LLVM. With "#pragma tail_predicate" the idea would be to indicate that a loop epilogue/tail can, or should be, folded into the main loop. I see two use cases for this pragma. First, this could be interesting for the vectorizer. It currently supports tail folding by masking all loop instructions/blocks, but does this
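For illustration, a minimal sketch of how the proposed hint might look at source level; note that "#pragma tail_predicate" is only proposed in this message, so the exact spelling and placement are assumptions:

    // Hypothetical use of the proposed pragma (not in Clang today): hint that
    // this loop's epilogue/tail can be folded into the masked main loop.
    void scale(int *a, const int *b, int n) {
    #pragma tail_predicate
      for (int i = 0; i < n; i++)
        a[i] = b[i] * 2;
    }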
2019 Oct 02 (2) - vectorize.enable
Hi Michael and Florian, (+ llvm-dev for visibility) I would like to quickly follow up on "Pragma vectorize_width() implies vectorize(enable)", which got reverted with commit 858a1ae for 2 reasons, see also that revert commit message. Ignore the assert, that's been fixed now. The other thing is that with the patch the behaviour changes slightly and we could get a diagnostic we
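For reference, the two existing Clang loop hints under discussion; the question in the thread is whether the first should imply the second (minimal example, function and names made up):

    void saxpy_hints(float *a, const float *b, int n) {
      // Request a specific vectorization factor. The thread asks whether this
      // hint alone should also imply vectorize(enable).
    #pragma clang loop vectorize_width(4)
      for (int i = 0; i < n; i++)
        a[i] += 2.0f * b[i];
    }

    void saxpy_enable(float *a, const float *b, int n) {
      // Explicitly enable the loop vectorizer for this loop.
    #pragma clang loop vectorize(enable)
      for (int i = 0; i < n; i++)
        a[i] += 2.0f * b[i];
    }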
2020 Jun 18 (3) - FileCheck
On Thu, Jun 18, 2020 at 3:37 PM Chris Tetreault <ctetreau at quicinc.com> wrote: > We’re talking about verbose output right? Verbose isn’t the default. > I'm fairly certain the issue in this thread is just the verbosity of -dump-input=fail. Yes, -vv makes it even more verbose by annotating input lines with good matches, etc., but that's not part of the "new
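A sketch of the options being discussed, assuming a typical RUN-style invocation (the test file name is made up):

    # Dump the full input only when the test fails.
    llc -o - test.ll | FileCheck -dump-input=fail test.ll

    # -vv additionally annotates the dumped input with good matches etc.,
    # which is the extra verbosity mentioned above.
    llc -o - test.ll | FileCheck -dump-input=fail -vv test.ll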
2020 Jun 19 (3) - FileCheck
Sorry if I wasn't clear about my use case. In my daily dev work, I do many local "ninja check"s, or "llvm-lit" on a subdirectory as a quick(er) smoke test if I am making changes in that area (e.g. "llvm-lit ../llvm/test/CodeGen"). Nothing wrong here, as indeed nothing changed here. But in case of a test failure, I want to run just that test: bin/llvm-lit
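A sketch of that workflow from the build directory; the paths are examples only, not taken from the message:

    # Quick(er) smoke test of a subdirectory.
    bin/llvm-lit -v ../llvm/test/CodeGen

    # Re-run just the failing test; -v prints the failing RUN line and its output.
    bin/llvm-lit -v ../llvm/test/CodeGen/ARM/some-failing-test.ll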
2020 May 04 (3) - LV: predication
> The harm comes if the intrinsic ends up with the wrong value, or attached to the wrong loop. The intrinsic is marked as IntrNoDuplicate, so I wasn't worried about it ending up somewhere else. Also, it is a property of a specific loop, a tail-folded vector loop, that holds even after it is transformed, I think. I.e. unrolling a vector loop is probably not what you want, but even if you do
2020 May 04 (3) - LV: predication
Hi Roger, That's a good example that shows most of the moving parts involved here. In a nutshell, the difference, and what we would like to make explicit, is the vector trip count versus the scalar loop trip count. In your IR example, the loads/stores are predicated on a mask that is calculated from a splat induction variable, which is compared with the vector trip count. Illustrated with your
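A reduced IR sketch (not taken from Roger's mail) of the pattern being described: a vector induction variable compared against a broadcast loop-invariant bound, with the result used as the mask of the predicated memory accesses. Names and the back-edge taken count interpretation are assumptions:

    ; %btc is the scalar back-edge taken count, %vec.iv the vector induction variable.
    %btc.ins   = insertelement <4 x i32> undef, i32 %btc, i32 0
    %btc.splat = shufflevector <4 x i32> %btc.ins, <4 x i32> undef, <4 x i32> zeroinitializer
    %mask      = icmp ule <4 x i32> %vec.iv, %btc.splat
    %wide.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %ptr, i32 4,
                                                                <4 x i1> %mask, <4 x i32> undef)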
2020 Jun 19 (2) - FileCheck
> I don't know how you proceed to debug FileCheck failures, but for me most of the time I'll have to figure out which "RUN" line fails and try to execute it manually, and then remove the FileCheck pipe to get the raw input and then painfully try to match the FileCheck error to the actual input. Yeah, not very different from what you described here. If I'm creating or
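Roughly the manual workflow described above, with made-up file names and pass flags:

    # The RUN line from the failing test, executed by hand...
    opt -loop-vectorize -S test.ll | FileCheck test.ll

    # ...and again with the FileCheck pipe removed, to inspect the raw input
    # that FileCheck was matching against.
    opt -loop-vectorize -S test.ll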
2020 Mar 16 (2) - Redundant copies
Yep, exactly that. We see quite a lot of them, most of them get cleaned up, but not always... Cheers. ________________________________ From: Roger Ferrer Ibáñez <rofirrim at gmail.com> Sent: 16 March 2020 08:53 To: Sjoerd Meijer <Sjoerd.Meijer at arm.com> Cc: LLVM-Dev <llvm-dev at lists.llvm.org>; Sam Parker <Sam.Parker at arm.com> Subject: Re: [llvm-dev] Redundant copies
2020 Jun 18 (2) - FileCheck
Hi Chris, On Thu, Jun 18, 2020 at 1:37 PM Chris Tetreault via llvm-dev < llvm-dev at lists.llvm.org> wrote: > The thing I use normally only shows the first N lines by default (I don’t > know off hand what N is). Honestly, I don’t feel very strongly about the > specific order, but it’s not useful when somebody proposes something on the > list, and nobody voices any dissent
2020 Jun 18 (4) - FileCheck
I would guess that in a CI system the order doesn't matter much because you look at a webpage? I looked at some build bots today/yesterday that now also show this, and yeah, it's fine either way, I was guessing. My primary use-case is usage in a terminal, and displaying the errors first followed by all input makes this pretty unusable. ________________________________ From: Chris
2020 May 19 (3) - LV: predication
Hi Simon, Thanks for reposting the example; looking at it more carefully, I think it is very similar to my first proposal. This was met with some resistance here because it dumps loop information in the vector preheader. Doing it this early (we want to emit this in the vectoriser) puts a restriction on (future) optimisations that transform vector loops to honour/update/support this intrinsic
2020 May 18 (2) - LV: predication
> You have similar problems with https://reviews.llvm.org/D79100 The new revision of D79100 <https://reviews.llvm.org/D79100> solves your comment 1), and I don't think your comments 2) and 3) apply as there are no vendor-specific intrinsics involved at all here. Just to quickly discuss the optimisation pipeline, D79100 <https://reviews.llvm.org/D79100> is a small extension for the
2020 May 19 (2) - LV: predication
Invitation accepted, I am happy to help out with reviews, like I did with the previous VP patches. And of course agreed that things should be well defined, and that we shouldn't paint ourselves in a corner, but I don't think that this is the case. And it's not that I am in a rush, but I don't think this change needs to be predicated on a big change landing first like the LV
2019 Sep 05 (2) - ARM vectorized fp16 support
Thanks for the reply. I was using LLVM 8.0. Let me try trunk and will let you know if it works. On Wed, Sep 4, 2019 at 11:19 PM Sjoerd Meijer <Sjoerd.Meijer at arm.com> wrote: > > Hi, > Which version of Clang are you using? I do get a "vfma.f16" with a recent trunk build. I haven't looked at older versions and when this landed, but we had an effort to plug the remaining
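The original reproducer is not shown in the thread, so the following is only a guess at the kind of kernel that should select vfma.f16 once the target has FP16 support. _Float16 is used to keep the arithmetic in half precision, and flags such as -march=armv8.2-a+fp16 -O2 are assumptions:

    // Expecting this loop to vectorise and use vfma.f16 on an FP16-capable
    // ARM target (e.g. armv8.2-a+fp16); flags are an assumption.
    void fma_f16(_Float16 *restrict a, const _Float16 *restrict b,
                 const _Float16 *restrict c, int n) {
      for (int i = 0; i < n; i++)
        a[i] += b[i] * c[i];
    }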
2020 Mar 26 (5) - canonical form loops
Hello, Quick question to see if I haven't missed anything: I would like to convert counting-down loops, i.e. loops with a constant -1 step value, to counting-up loops, because the vectoriser is able to better deal with these loops (see e.g. D76838 that I was discussing today with Ayal). It looks like LoopSimplifyCFG and IndVarSimplify don't do this. So I was just curious if I haven't
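To make the question concrete, a sketch of the rewrite being asked about (the function and loop body are made up):

    // Counting-down loop: constant -1 step.
    void f_down(int *a, int n) {
      for (int i = n - 1; i >= 0; i--)
        a[i] += 1;
    }

    // Equivalent counting-up form, which the vectoriser reportedly
    // handles better.
    void f_up(int *a, int n) {
      for (int i = 0; i < n; i++)
        a[i] += 1;
    }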
2020 Sep 29 (2) - Improved jump-threading in LLVM for finite state automata
Hi Sjoerd, We (at Huawei) also have a pass for this. Originally we implemented this back in 2018 and meant to upstream it, but there were some issues with the implementation that required some changes in the code. We started revising it a few weeks ago. I thought that now that there are multiple options, maybe we can discuss our approaches and see if there is a preference in the community for one
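For readers unfamiliar with the transformation, the typical shape is a switch over a state variable inside a loop, where the successor state is a constant at the end of each case, so the back edge can be threaded straight to the next case. An illustrative sketch (not code from either pass):

    enum state { S0, S1, S2 };

    int run(const unsigned char *in, int n) {
      enum state s = S0;
      int acc = 0;
      for (int i = 0; i < n; i++) {
        switch (s) {
        case S0: acc += in[i]; s = S1; break;  // successor state is a constant,
        case S1: acc ^= in[i]; s = S2; break;  // so the dispatch on the next
        case S2: acc -= in[i]; s = S0; break;  // iteration can be threaded away
        }
      }
      return acc;
    }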
2020 May 18 (2) - LV: predication
Hi, I abandoned that approach and followed Eli's suggestion, see somewhere earlier in this thread, and emit an intrinsic that represents/calculates the active mask. I've just uploaded a new revision for D79100 that implements this. Cheers. ________________________________ From: Simon Moll <Simon.Moll at EMEA.NEC.COM> Sent: 18 May 2020 13:32 To: Sjoerd Meijer <Sjoerd.Meijer at
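For context, a sketch of the shape this took, assuming the intrinsic ended up as llvm.get.active.lane.mask (its name in later LLVM releases); operand names are illustrative, and the exact meaning of the second operand varied between revisions of D79100:

    ; %index is the scalar induction variable of the vector loop, %n the trip
    ; count (or back-edge taken count, depending on the revision).
    %active = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %index, i32 %n)
    %load   = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %p, i32 4,
                                                             <4 x i1> %active, <4 x i32> undef)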
2020 May 21 (2) - LV: predication
> The compare of interest is clear, I think. It compares a Vector Induction Variable with a broadcasted loop invariant value, aka the BTC. Obtaining the latter operand is the goal, clearly, but to do so, the former operand needs to be recognized as a VIV. Yep, exactly that. > What if this compare is not generated by LV’s fold-tail-by-masking transformation? Not sure I completely follow
2020 May 01 (5) - LV: predication
Hi Eli, > The problem with your proposal, as written, is that the vectorizer is producing the intrinsic. Because we don’t impose any ordering on optimizations before codegen, every optimization pass in LLVM would have to be taught to preserve any @llvm.set.loop.elements.i32 whenever it makes any change. This is completely impractical because the intrinsic isn’t related to anything
2019 May 03 (3) - Llvm-mca library.
Hi Sjoerd, On Fri, May 3, 2019 at 8:19 AM Sjoerd Meijer via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > I read that out-of-order cores are supported. How about in-order cores? > Would it be easy/difficult to add support for that? > > Cheers, > Sjoerd. > > I don't think that it would be difficult to support in-order superscalar cores. However, it would
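The kind of invocation in question, for reference; the input file is made up and the in-order CPU is only an example (in-order pipeline support landed after this thread):

    # Analyse a snippet of AArch64 assembly for an in-order core.
    llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 -timeline kernel.s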