Displaying 20 results from an estimated 3000 matches similar to: "loop vectorizer disabling"
2019 Jul 15
2
Tail-Loop Folding/Predication
I am looking for feedback to add support for a new loop pragma to Clang/LLVM.
With "#pragma tail_predicate" the idea would be to indicate that a loop
epilogue/tail can, or should, be folded into the main loop. I see two use
cases for this pragma.
First, this could be interesting for the vectorizer. It currently supports tail
folding by masking all loop instructions/blocks, but does this
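A scalar C emulation can make the two tail strategies concrete. This is a sketch under stated assumptions, not the vectorizer's output: the vectorization factor `VF` of 4 and the function names are hypothetical, and the inner `lane` loop stands in for a single vector instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VF 4 /* hypothetical vectorization factor */

/* Vector body over full VF-wide chunks, then a scalar epilogue (the "tail")
   for the remaining n % VF elements. */
void add_one_epilogue(int *a, const int *b, size_t n) {
    size_t i = 0;
    for (; i + VF <= n; i += VF)               /* vector body: every lane active */
        for (size_t lane = 0; lane < VF; ++lane)
            a[i + lane] = b[i + lane] + 1;
    for (; i < n; ++i)                         /* scalar epilogue */
        a[i] = b[i] + 1;
}

/* Tail folded into the main loop: ceil(n / VF) iterations, each lane
   predicated on a mask, so no epilogue is needed. */
void add_one_folded(int *a, const int *b, size_t n) {
    assert(n <= SIZE_MAX - (VF - 1));          /* i += VF must not wrap */
    for (size_t i = 0; i < n; i += VF)
        for (size_t lane = 0; lane < VF; ++lane)
            if (i + lane < n)                  /* per-lane mask */
                a[i + lane] = b[i + lane] + 1; /* masked load and store */
}
```

With n = 7, the first version runs one full vector iteration followed by three scalar epilogue iterations; the folded version runs two vector iterations, the second with only three active lanes.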
2019 Oct 02
2
vectorize.enable
Hi Michael and Florian,
( + llvm-dev for visibility)
I would like to quickly follow up on "Pragma vectorize_width() implies vectorize(enable)",
which got reverted in commit 858a1ae for two reasons (see also that revert's commit message). Ignore the assert; that has been fixed now.
The other thing is that with the patch, behaviour is slightly changed and we could get a diagnostic we
2020 Jun 18
3
FileCheck
On Thu, Jun 18, 2020 at 3:37 PM Chris Tetreault <ctetreau at quicinc.com>
wrote:
> We’re talking about verbose output right? Verbose isn’t the default.
>
I'm fairly certain the issue in this thread is just the verbosity of
-dump-input=fail. Yes, -vv makes it even more verbose by annotating input
lines with good matches, etc., but that's not part of the "new
2020 Jun 19
3
FileCheck
Sorry if I wasn't clear about my use case. In my daily dev work, I do many local "ninja check"s, or "llvm-lit" on a subdirectory as a quick(er) smoke test if I am making changes in that area (e.g. "llvm-lit ../llvm/test/CodeGen"). Nothing wrong here, as indeed nothing changed here. But in case of a test failure, I want to run just that test:
bin/llvm-lit
2020 May 04
3
LV: predication
> The harm comes if the intrinsic ends up with the wrong value, or attached to the wrong loop.
The intrinsic is marked as IntrNoDuplicate, so I wasn't worried about it ending up somewhere else. Also, it is a property of a specific loop, a tail-folded vector loop, that holds even after it is transformed, I think. That is, unrolling a vector loop is probably not what you want, but even if you do
2020 May 04
3
LV: predication
Hi Roger,
That's a good example that shows most of the moving parts involved here. In a nutshell, the difference, and what we would like to make explicit, is the vector trip count versus the scalar loop trip count. In your IR example, the loads/stores are predicated on a mask that is calculated from a splat induction variable, which is compared with the vector trip count. Illustrated with your
2020 Jun 19
2
FileCheck
> I don't know how you proceed to debug FileCheck failures, but for me most of the time I'll have to figure out which "RUN" line fails and try to execute it manually and then remove the FileCheck pipe to get the raw input and then painfully try to match the FileCheck error to the actual input.
Yeah, not very different from what you described here. If I'm creating or
2020 Mar 16
2
Redundant copies
Yep, exactly that. We see quite a lot of them, most of them get cleaned up, but not always...
Cheers.
________________________________
From: Roger Ferrer Ibáñez <rofirrim at gmail.com>
Sent: 16 March 2020 08:53
To: Sjoerd Meijer <Sjoerd.Meijer at arm.com>
Cc: LLVM-Dev <llvm-dev at lists.llvm.org>; Sam Parker <Sam.Parker at arm.com>
Subject: Re: [llvm-dev] Redundant copies
2020 Jun 18
2
FileCheck
Hi Chris,
On Thu, Jun 18, 2020 at 1:37 PM Chris Tetreault via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> The thing I use normally only shows the first N lines by default (I don’t
> know off hand what N is). Honestly, I don’t feel very strongly about the
> specific order, but it’s not useful when somebody proposes something on the
> list, and nobody voices any dissent
2020 Jun 18
4
FileCheck
I would guess that in a CI system the order doesn't matter much because you look at a webpage? I looked at some build bots today/yesterday that now also show this, and yeah, it's fine either way, I was guessing.
My primary use-case is usage in a terminal, and displaying the errors first followed by all input makes this pretty unusable.
________________________________
From: Chris
2020 May 19
3
LV: predication
Hi Simon,
Thanks for reposting the example, and looking at it more carefully, I think it is very similar to my first proposal. This was met with some resistance here because it dumps loop information in the vector preheader. Doing it this early (we want to emit this in the vectoriser) puts a restriction on (future) optimisations that transform vector loops, requiring them to honour/update/support this intrinsic
2020 May 18
2
LV: predication
> You have similar problems with https://reviews.llvm.org/D79100
The new revision of D79100 (https://reviews.llvm.org/D79100) solves your comment 1), and I don't think your comments 2) and 3) apply, as there are no vendor-specific intrinsics involved here at all. Just to quickly discuss the optimisation pipeline, D79100 is a small extension for the
2020 May 19
2
LV: predication
Invitation accepted, I am happy to help out with reviews, like I did with the previous VP patches.
And of course agreed that things should be well defined, and that we shouldn't paint ourselves in a corner, but I don't think that this is the case. And it's not that I am in a rush, but I don't think this change needs to be predicated on a big change landing first like the LV
2019 Sep 05
2
ARM vectorized fp16 support
Thanks for the reply. I was using LLVM 8.0. Let me try trunk and I'll let
you know if it works.
On Wed, Sep 4, 2019 at 11:19 PM Sjoerd Meijer <Sjoerd.Meijer at arm.com> wrote:
>
> Hi,
> Which version of Clang are you using? I do get a "vfma.f16" with a recent trunk build. I haven't looked at older versions and when this landed, but we had an effort to plug the remaining
2020 Mar 26
5
canonical form loops
Hello,
Quick question to see if I haven't missed anything: I would like to convert counting-down loops, i.e. loops with a constant -1 step value, to counting-up loops, because the vectoriser is able to deal better with these loops (see e.g. D76838 that I was discussing today with Ayal). It looks like LoopSimplifyCFG and IndVarSimplify don't do this. So I was just curious if I haven't
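At the source level, the rewrite being asked about can be pictured as introducing a new induction variable that steps by +1 and recomputing the original index from it. The sketch below (function names hypothetical) only shows that both loops visit the same indices in the same order; it says nothing about how LoopSimplifyCFG, IndVarSimplify or any other pass would implement the transformation.

```c
#include <assert.h>
#include <stddef.h>

/* Counting-down loop: the induction variable i steps by -1 from n-1 to 0.
   Each visited index is written to order[] so both forms can be compared. */
void visit_down(size_t *order, size_t n) {
    size_t k = 0;
    for (size_t i = n; i-- > 0;)      /* i = n-1, n-2, ..., 0 */
        order[k++] = i;
}

/* Counting-up form: a new induction variable j steps by +1 from 0 to n-1,
   and the original index is recomputed from it as n - 1 - j. */
void visit_up(size_t *order, size_t n) {
    for (size_t j = 0; j < n; ++j) {
        size_t i = n - 1 - j;         /* same index sequence as visit_down */
        assert(i < n);                /* cannot underflow because j < n */
        order[j] = i;
    }
}
```

The memory accesses still run in reverse; only the direction of the loop counter changes, which is the property the question is about.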
2020 Sep 29
2
Improved jump-threading in LLVM for finite state automata
Hi Sjoerd
We (at Huawei) also have a pass for this. Originally we implemented this
back in 2018 and meant to upstream it, but there were some issues with the
implementation that required some changes in the code. We started revising
it a few weeks ago.
I thought that, now that there are multiple options, maybe we can discuss our
approaches, and see if there is a preference in the community for one
2020 May 18
2
LV: predication
Hi,
I abandoned that approach and followed Eli's suggestion (see earlier in this thread), and now emit an intrinsic that represents/calculates the active mask. I've just uploaded a new revision of D79100 that implements this.
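As a rough sketch of what such an active mask computes, the hypothetical C helper below returns a bitmask in which bit `lane` is set when `base + lane < n`. This assumes the `base + i < n` formulation that LLVM's `llvm.get.active.lane.mask` intrinsic is documented with; the revision discussed here may have used a different second operand (for example the backedge-taken count).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bit `lane` of the result is set iff that lane is
   active, i.e. base + lane < n, evaluated without wrap-around. */
uint32_t active_lane_mask(uint32_t base, uint32_t n, unsigned vf) {
    assert(vf > 0 && vf <= 32);                /* result must fit in 32 bits */
    uint32_t mask = 0;
    for (unsigned lane = 0; lane < vf; ++lane) {
        uint64_t idx = (uint64_t)base + lane;  /* widen so the add cannot wrap */
        if (idx < n)
            mask |= UINT32_C(1) << lane;
    }
    return mask;
}
```

For example, with `base = 8`, `n = 10` and four lanes, only lanes 0 and 1 (elements 8 and 9) are active, giving `0x3`; a predicated load or store would touch memory only for those lanes.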
Cheers.
________________________________
From: Simon Moll <Simon.Moll at EMEA.NEC.COM>
Sent: 18 May 2020 13:32
To: Sjoerd Meijer <Sjoerd.Meijer at
2020 May 21
2
LV: predication
> The compare of interest is clear, I think. It compares a Vector Induction Variable with a broadcasted loop invariant value, aka the BTC. Obtaining the latter operand is the goal, clearly, but to do so, the former operand needs to be recognized as a VIV.
Yep, exactly that.
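The compare can be emulated in C to check that comparing the VIV with a splat of the backedge-taken count (BTC) using an unsigned `<=` selects the same lanes as comparing it against the trip count TC = BTC + 1 using `<`. The names `VF`, `mask_from_btc` and `mask_from_tc` are hypothetical, and the sketch assumes that neither `iv + lane` nor `BTC + 1` wraps.

```c
#include <assert.h>
#include <stdint.h>

#define VF 4 /* hypothetical vectorization factor */

/* Lanes selected by  VIV ule splat(BTC),  where
   VIV = splat(iv) + <0, 1, ..., VF-1>  (the vector induction variable)
   and BTC is the backedge-taken count. Bit `lane` is set iff it is active. */
uint32_t mask_from_btc(uint32_t iv, uint32_t btc) {
    assert(iv <= UINT32_MAX - (VF - 1)); /* assume iv + lane never wraps */
    uint32_t mask = 0;
    for (uint32_t lane = 0; lane < VF; ++lane) {
        uint32_t viv = iv + lane;        /* one element of the VIV */
        if (viv <= btc)                  /* unsigned "ule" compare */
            mask |= UINT32_C(1) << lane;
    }
    return mask;
}

/* The same lanes selected with  VIV ult splat(TC),  where TC = BTC + 1 is the
   scalar trip count; equivalent to mask_from_btc while BTC + 1 does not wrap. */
uint32_t mask_from_tc(uint32_t iv, uint32_t tc) {
    assert(iv <= UINT32_MAX - (VF - 1)); /* assume iv + lane never wraps */
    uint32_t mask = 0;
    for (uint32_t lane = 0; lane < VF; ++lane)
        if (iv + lane < tc)              /* unsigned "ult" compare */
            mask |= UINT32_C(1) << lane;
    return mask;
}
```

Either form only describes a tail-folding mask if its left operand really is a VIV; that is why, as the quoted text says, the VIV has to be recognized before the BTC operand can be extracted.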
> What if this compare is not generated by LV’s fold-tail-by-masking transformation?
Not sure I completely follow
2020 May 01
5
LV: predication
Hi Eli,
> The problem with your proposal, as written, is that the vectorizer is producing the intrinsic. Because we don’t impose any ordering on optimizations before codegen, every optimization pass in LLVM would have to be taught to preserve any @llvm.set.loop.elements.i32 whenever it makes any change. This is completely impractical because the intrinsic isn’t related to anything
2019 May 03
3
Llvm-mca library.
Hi Sjoerd,
On Fri, May 3, 2019 at 8:19 AM Sjoerd Meijer via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
> I read that out-of-order cores are supported. How about in-order cores?
> Would it be easy/difficult to add support for that?
>
>
> Cheers,
> Sjoerd.
>
>
I don't think that it would be difficult to support in-order superscalar
cores.
However, it would