Displaying 10 results from an estimated 10 matches for "d79100".

2020 May 18
2
LV: predication
> You have similar problems with https://reviews.llvm.org/D79100
The new revision D79100 <https://reviews.llvm.org/D79100> solves your comment 1), and I don't think your comments 2) and 3) apply, as there are no vendor-specific intrinsics involved at all here. Just to quickly discuss the optimisation pipeline, D79100 <https://reviews.llvm.org/D79100>...
2020 May 19
3
LV: predication
...l. This was met with some resistance here because it dumps loop information in the vector preheader. Doing it this early (we want to emit this in the vectoriser) puts a restriction on (future) optimisations that transform vector loops: they have to honour/update/support this intrinsic and loop information. In D79100, it is an integral part of the vector body and has some semantics (I will update it today), and thus doesn't have these disadvantages. Also, the vectoriser isn't using the VP intrinsics yet, so using them is a bridge too far for me at this point. But we should definitely re-evaluate at some po...
2020 May 18
2
LV: predication
Hi, I abandoned that approach and followed Eli's suggestion (see earlier in this thread) to emit an intrinsic that represents/calculates the active mask. I've just uploaded a new revision for D79100 that implements this. Cheers. ________________________________ From: Simon Moll <Simon.Moll at EMEA.NEC.COM> Sent: 18 May 2020 13:32 To: Sjoerd Meijer <Sjoerd.Meijer at arm.com> Cc: Roger Ferrer Ibáñez <rofirrim at gmail.com>; Eli Friedman <efriedma at quicinc.com>; listmai...
2020 May 19
2
LV: predication
...l. This was met with some resistance here because it dumps loop information in the vector preheader. Doing it this early (we want to emit this in the vectoriser) puts a restriction on (future) optimisations that transform vector loops: they have to honour/update/support this intrinsic and loop information. In D79100, it is an integral part of the vector body and has some semantics (I will update it today), and thus doesn't have these disadvantages. The difference is that in the VP version there is an explicit dependence of every vector operation in the loop on the set.num.elements intrinsic. This dependence i...
2020 May 04
3
LV: predication
...AnyInt, AnyInt) It produces a <N x i1> predicate based on its two arguments, the number of elements and the vector trip count, and it will be used by the predicated masked load/store instructions in the vector body. I will start drafting an implementation for this and continue with this in D79100. Thanks, Sjoerd. ________________________________ From: Eli Friedman <efriedma at quicinc.com> Sent: 01 May 2020 21:11 To: Sjoerd Meijer <Sjoerd.Meijer at arm.com>; llvm-dev <llvm-dev at lists.llvm.org> Subject: RE: [llvm-dev] LV: predication From: Sjoerd Meijer <Sjoer...
2020 May 01
5
LV: predication
...lements.i32(i32 ) This represents the number of data elements processed by a vector loop, and will be emitted in the preheader block of the vector loop after querying TTI that the backend understands this intrinsic and that it should be emitted for that loop. The vectoriser patch is available in D79100, and the ARM backend picks this intrinsic up in D79175. Context: we are working on a form of predication that we call tail-predication: a vector hardware loop has an implicit form of predication that sets the active/inactive lanes for the last iteration of the vector loop. Thus, the scalar ep...
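To make the tail-predication idea above concrete, here is a minimal scalar C sketch. It is an illustration under stated assumptions, not the vectoriser's or the ARM backend's actual code: a hypothetical vectorisation factor `VF` of 4 lanes, and a per-iteration lane mask derived from the number of data elements still remaining. Because inactive lanes in the last iteration simply do nothing, no scalar epilogue is needed. `VF`, `lane_mask_from_remaining` and `add_tail_predicated` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define VF 4 /* hypothetical vectorisation factor, e.g. 4 x i32 lanes */

/* Hypothetical helper: lane i is active while i is below the number of
   elements that still remain to be processed. */
static void lane_mask_from_remaining(size_t remaining, bool mask[VF]) {
    for (size_t i = 0; i < VF; ++i)
        mask[i] = i < remaining;
}

/* Hypothetical example loop: c[k] = a[k] + b[k] for k < n, processed VF
   lanes per iteration. The last iteration is predicated instead of being
   handled by a scalar epilogue. */
static void add_tail_predicated(size_t n, int *restrict c,
                                const int *restrict a,
                                const int *restrict b) {
    size_t remaining = n; /* data elements not yet processed */
    for (size_t base = 0; base < n; base += VF) {
        bool mask[VF];
        lane_mask_from_remaining(remaining, mask);
        for (size_t i = 0; i < VF; ++i)
            if (mask[i]) /* models a masked load/add/store per lane */
                c[base + i] = a[base + i] + b[base + i];
        remaining = remaining > VF ? remaining - VF : 0;
    }
}
```

For `n = 10`, the third iteration starts at index 8 with two elements remaining, so only lanes 0 and 1 are active and nothing is read or written past index 9.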
2020 May 04
3
LV: predication
...AnyInt, AnyInt) It produces a <N x i1> predicate based on its two arguments, the number of elements and the vector trip count, and it will be used by the predicated masked load/store instructions in the vector body. I will start drafting an implementation for this and continue with this in D79100. I'm curious about this, because it looks very similar to the code that -prefer-predicate-over-epilog is already emitting for the "outer mask" of a tail-folded loop. The following code void foo(int N, int *restrict c, int *restrict a, int *restrict b) { #pragma clang loop v...
2020 May 20
2
LV: predication
...Mobileye) <ayal.zaks at intel.com> Sent: 20 May 2020 20:39 To: Sjoerd Meijer <Sjoerd.Meijer at arm.com>; Eli Friedman <efriedma at quicinc.com> Cc: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org> Subject: RE: [llvm-dev] LV: predication I realize this discussion and D79100 have progressed, sorry, but could we revisit the “simplest path” of deriving the desired number? > This is what we are currently doing, and it works excellently for simpler cases. For the more complicated cases that we now want to handle as well, the pattern matching just becomes a bit too horrible...
2020 May 01
3
LV: predication
....elements.i32(i32 ) This represents the number of data elements processed by a vector loop, and will be emitted in the preheader block of the vector loop after querying TTI that the backend understands this intrinsic and that it should be emitted for that loop. The vectoriser patch is available in D79100, and the ARM backend picks this intrinsic up in D79175. Context: we are working on a form of predication that we call tail-predication: a vector hardware loop has an implicit form of predication that sets the active/inactive lanes for the last iteration of the vector loop. Thus, the scalar epilog...
2020 May 21
2
LV: predication
...trinsic. But quickly checking the example we have: %vec.ind = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, %vector.ph ], [ %vec.ind.next, %vector.body ] %4 = icmp ult <4 x i32> %vec.ind, <i32 998, i32 998, i32 998, i32 998> and everything looks good here. With the change in D79100, we don't emit @llvm.get.active.lane.mask() because we don't have exactly the VIV <= BTC case: here VIV is a vector phi, and it looks like this follows a different code path in the vectoriser. This is definitely a case we want to support too, but that's a different story. A previous con...
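As a rough illustration of the `VIV <= BTC` condition mentioned above, the C sketch below models the per-lane mask in scalar code. It is an assumption for illustration only, not the definition of `@llvm.get.active.lane.mask` as implemented in D79100; `VF`, `active_lane_mask` and `active_lane_mask_ult` are hypothetical names. In the IR snippet, `%vec.ind` is compared with `icmp ult` against 998, which corresponds to a trip count of 998 and therefore a backedge-taken count (BTC) of 997, so both functions mark the same lanes active.

```c
#include <stdbool.h>
#include <stdint.h>

#define VF 4 /* lane count, matching the <4 x i32> vectors in the IR snippet */

/* Hypothetical scalar model of the "VIV <= BTC" mask: lane i is active iff
   the induction value base + i is <= the backedge-taken count btc.
   The sum is widened to 64 bits so it cannot wrap around. */
static void active_lane_mask(uint32_t base, uint32_t btc, bool mask[VF]) {
    for (uint32_t i = 0; i < VF; ++i)
        mask[i] = (uint64_t)base + i <= btc;
}

/* The same mask written as an unsigned "less than trip count" comparison,
   like the icmp ult against 998 in the snippet; it agrees with
   active_lane_mask whenever tc == btc + 1. */
static void active_lane_mask_ult(uint32_t base, uint64_t tc, bool mask[VF]) {
    for (uint32_t i = 0; i < VF; ++i)
        mask[i] = (uint64_t)base + i < tc;
}
```

For example, with `base = 996` and `btc = 997`, only the first two lanes (induction values 996 and 997) are active, which is the same result the `icmp ult ..., 998` comparison gives for that iteration.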