search for: ayal

Displaying 20 results from an estimated 106 matches for "ayal".

2016 Sep 01
2
enabling interleaved access loop vectorization
...fectively have power-of-2 strides and/or alignment. > So, unfortunately, it turns out I don't have access to DENBench. If you like we could test your patch to see how it (mis)behaves. From: Michael Kuperstein [mailto:mkuper at google.com] Sent: Thursday, August 18, 2016 03:57 To: Zaks, Ayal <ayal.zaks at intel.com> Cc: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Renato Golin <renato.golin at linaro.org>; Matthew Simpson <mssimpso at codeaurora.org>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; Sanjay Patel <spatel at rotateright.com>; llvm-de...
2016 Aug 17
2
enabling interleaved access loop vectorization
Thanks Ayal! On Wed, Aug 17, 2016 at 2:14 PM, Zaks, Ayal <ayal.zaks at intel.com> wrote: > Hi Michael, > > > > Don’t quite have a full reproducer for you yet. You’re welcome to try and > see what’s happening in 32 bit mode when enabling interleaving for the > following, based on “...
2016 Aug 16
2
enabling interleaved access loop vectorization
Hi Ayal, Elena, I'd really like to enable this by default. As I wrote above, I didn't see any regressions in internal benchmarks, and there doesn't seem to be anything in SPEC2006 either. I do see a performance improvement in an internal benchmark (that is, a real workload). Would you be abl...
2016 Aug 09
2
enabling interleaved access loop vectorization
Thanks Ayal! I'll take a look at DENBench. As another data point - I tried enabling this on our internal benchmarks. I'm seeing one regression, and it seems to be a regression of the "good" kind - without interleaving we don't vectorize the innermost loop, and with interleaving we do. T...
2016 Aug 07
2
enabling interleaved access loop vectorization
...n <renato.golin at linaro.org> Cc: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Matthew Simpson <mssimpso at codeaurora.org>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; Sanjay Patel <spatel at rotateright.com>; llvm-dev <llvm-dev at lists.llvm.org>; Zaks, Ayal <ayal.zaks at intel.com> Subject: Re: [llvm-dev] enabling interleaved access loop vectorization On Fri, Aug 5, 2016 at 4:37 PM, Renato Golin <renato.golin at linaro.org<mailto:renato.golin at linaro.org>> wrote: On 6 August 2016 at 00:18, Michael Kuperstein <mkuper at google...
2014 Dec 24
2
[LLVMdev] Indexed Load and Store Intrinsics - proposal
----- Original Message ----- > From: "Xinmin Tian" <xinmin.tian at intel.com> > To: "Hal Finkel" <hfinkel at anl.gov>, "Ayal Zaks" <ayal.zaks at intel.com> > Cc: dag at cray.com, "Robert Khasanov" <robert.khasanov at intel.com>, llvmdev at cs.uiuc.edu > Sent: Tuesday, December 23, 2014 7:36:44 PM > Subject: RE: [LLVMdev] Indexed Load and Store Intrinsics - proposal > > For non-z...
2017 Mar 14
10
[Proposal][RFC] Epilog loop vectorization
...halu/LayoutDescription.png Approach-1 looks feasible, please comment if any objections. Regards, Ashutosh From: Nema, Ashutosh Sent: Wednesday, March 1, 2017 10:42 AM To: 'Daniel Berlin' <dberlin at dberlin.org> Cc: anemet at apple.com; Hal Finkel <hfinkel at anl.gov>; Zaks, Ayal <ayal.zaks at intel.com>; Renato Golin <renato.golin at linaro.org>; mkuper at google.com; Mehdi Amini <mehdi.amini at apple.com>; llvm-dev <llvm-dev at lists.llvm.org> Subject: RE: [llvm-dev] [Proposal][RFC] Epilog loop vectorization Sorry I misunderstood, gvn/newgvn/gvnho...
2017 Jul 21
2
[SPIR/PTX] Divergence analysis for BasicBlocks
Hello, Yes? Where is allActive defined? I couldn't find it. Basically, a BB is control divergent if its execution depends on a branch that itself depends on a divergent ssa value. On Fri, Jul 21, 2017 at 4:13 PM, Zaks, Ayal <ayal.zaks at intel.com> wrote: > What would be the definition of “isControlDivergent(BasicBlock*)”; the > complementary of “allActive(BasicBlock*)” – blocks known to execute all > lanes, whenever reached? Note the (distinct) term “rewire targets” that > Ralf Karrenberg used in h...
2014 Dec 24
2
[LLVMdev] Indexed Load and Store Intrinsics - proposal
----- Original Message ----- > From: "Ayal Zaks" <ayal.zaks at intel.com> > To: "Philip Reames" <listmail at philipreames.com>, dag at cray.com, "Elena Demikhovsky" <elena.demikhovsky at intel.com> > Cc: "Robert Khasanov" <robert.khasanov at intel.com>, llvmdev at cs.uiuc.edu...
2017 Mar 14
2
[Proposal][RFC] Epilog loop vectorization
...March 1, 2017 10:42 AM > *To:* 'Daniel Berlin' <dberlin at dberlin.org > <mailto:dberlin at dberlin.org>> > *Cc:* anemet at apple.com <mailto:anemet at apple.com>; Hal Finkel > <hfinkel at anl.gov <mailto:hfinkel at anl.gov>>; Zaks, Ayal > <ayal.zaks at intel.com <mailto:ayal.zaks at intel.com>>; Renato Golin > <renato.golin at linaro.org <mailto:renato.golin at linaro.org>>; > mkuper at google.com <mailto:mkuper at google.com>; Mehdi Amini > <mehdi.amini at apple.com <...
2014 Oct 24
2
[LLVMdev] Adding masked vector load and store intrinsics
...as select(mask, load(addr), passthru)? This suggests masked-off lanes are free to speculatively load from memory. Whereas proposed semantics is that: > The addressed memory will not be touched for masked-off lanes. In > particular, if all lanes are masked off no address will be accessed. Ayal. -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Hal Finkel Sent: Friday, October 24, 2014 15:50 To: Demikhovsky, Elena Cc: dag at cray.com; llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Adding masked vector load and store intrin...
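The distinction drawn in the snippet above, between a select over a full load and a load that never touches masked-off lanes, can be illustrated with a small Python model. This is only a sketch: the function names, the dictionary standing in for memory, and the "touched" list are hypothetical and are not part of the proposed intrinsics or of LLVM.

```python
def select_style_load(memory, addrs, mask, passthru):
    """Model of select(mask, load(addr), passthru): every lane's address is read."""
    touched = []
    loaded = []
    for a in addrs:
        touched.append(a)          # the load happens for all lanes, masked or not
        loaded.append(memory[a])
    result = [v if m else p for v, m, p in zip(loaded, mask, passthru)]
    return result, touched


def masked_load(memory, addrs, mask, passthru):
    """Model of the quoted semantics: masked-off lanes never access memory."""
    touched = []
    result = []
    for a, m, p in zip(addrs, mask, passthru):
        if m:
            touched.append(a)      # only active lanes read their address
            result.append(memory[a])
        else:
            result.append(p)       # inactive lanes take the passthru value
    return result, touched


# Hypothetical data for illustration only.
memory = {0: 10, 1: 11, 2: 12, 3: 13}
addrs = [0, 1, 2, 3]
mask = [True, False, True, False]
passthru = [-1, -1, -1, -1]
```

Both functions return the same lane values for the same inputs; they differ only in which addresses are read, and with an all-false mask the second one reads none.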
2017 Feb 28
3
[Proposal][RFC] Epilog loop vectorization
...ation factor for epilog vector loop. Regards, Ashutosh From: anemet at apple.com [mailto:anemet at apple.com] Sent: Tuesday, February 28, 2017 1:33 AM To: Hal Finkel <hfinkel at anl.gov> Cc: Daniel Berlin <dberlin at dberlin.org>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; Zaks, Ayal <ayal.zaks at intel.com>; Renato Golin <renato.golin at linaro.org>; mkuper at google.com; Mehdi Amini <mehdi.amini at apple.com>; llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] [Proposal][RFC] Epilog loop vectorization On Feb 27, 2017, at 12:01 PM, Hal Fink...
2017 Mar 14
2
[Proposal][RFC] Epilog loop vectorization
...Wednesday, March 1, 2017 10:42 AM >> *To:*'Daniel Berlin' <dberlin at dberlin.org <mailto:dberlin at dberlin.org>> >> *Cc:*anemet at apple.com <mailto:anemet at apple.com>; Hal Finkel >> <hfinkel at anl.gov <mailto:hfinkel at anl.gov>>; Zaks, Ayal >> <ayal.zaks at intel.com <mailto:ayal.zaks at intel.com>>; Renato Golin >> <renato.golin at linaro.org >> <mailto:renato.golin at linaro.org>>;mkuper at google.com >> <mailto:mkuper at google.com>; Mehdi Amini <mehdi.amini at apple.com...
2020 Apr 01
2
canonical form loops
...> > Quick question to see if I haven't missed anything: I would like to convert > counting down loops, i.e. loops with a constant -1 step value, to counting up > loops, because the vectoriser is able to better deal with these loops (see e.g. > D76838 that I was discussing today with Ayal). It looks like LoopSimplifyCFG > and IndVarSimplify don't do this. So was just curious if I haven't missed > anything here or in another pass I haven't yet considered. I was perhaps also > expecting this to be the canonical form of loops, but couldn't find any > evidenc...
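As a rough illustration of the rewrite asked about in the snippet above, a loop with a constant -1 step can be expressed as a counting-up loop that recomputes the original index. The Python below is a minimal sketch with hypothetical function names; it does not show how any LLVM pass performs, or would perform, the transformation.

```python
def count_down(n):
    """Original shape: induction variable starts at n - 1 and steps by -1."""
    visited = []
    i = n - 1
    while i >= 0:
        visited.append(i)
        i -= 1
    return visited


def count_up_rewritten(n):
    """Rewritten shape: j counts up from 0 and the old index is n - 1 - j."""
    visited = []
    j = 0
    while j < n:
        visited.append(n - 1 - j)
        j += 1
    return visited
```

Both loops visit the same indices in the same order, so a loop body indexed by the old variable sees identical values after the rewrite.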
2020 May 21
2
LV: predication
...bust, and a simple way of passing this information on from the vectoriser to the backend. I still might have skipped a few details here, but this is what it boils down to, and hopefully you've got a good impression of the problem. Cheers, Sjoerd. ________________________________ From: Zaks, Ayal (Mobileye) <ayal.zaks at intel.com> Sent: 21 May 2020 18:44 To: Sjoerd Meijer <Sjoerd.Meijer at arm.com>; Eli Friedman <efriedma at quicinc.com> Cc: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org> Subject: RE: [llvm-dev] LV: predication > The compare that we are...
2016 Apr 12
2
llvm outlining question
I'm not aware of anything else in LLVM that performs outlining. Ayal (CCed) may be able to help you regarding CodeExtractor fixes. Thanks, Michael On 12 April 2016 at 14:21, Minghwa Wang <mwang2 at cse.scu.edu> wrote: > Thank you Michael and Tom for the quick reply. > > According to your experience and comments, CodeExtractor is buggy only > w...
2016 Feb 24
5
Fwd: [PATCH] D17497: Support arbitrary address space for intrinsics
...ddress space for intrinsics Date: Mon, 22 Feb 2016 08:39:38 +0000 From: Elena Demikhovsky <elena.demikhovsky at intel.com> Reply-To: reviews+D17497+public+90f3d1b9468ba8ca at reviews.llvm.org To: elena.demikhovsky at intel.com, apilipenko at azulsystems.com, listmail at philipreames.com, ayal.zaks at intel.com, Matthew.Arsenault at amd.com, pjcoup at gmail.com CC: llvm-commits at lists.llvm.org delena created this revision. delena added reviewers: apilipenko, reames, Ayal, arsenm, pjcoup. delena added a subscriber: llvm-commits. delena set the repository for this revision to rL LLV...
2017 Mar 14
1
[Proposal][RFC] Epilog loop vectorization
...*To:* 'Daniel Berlin' <dberlin at dberlin.org >> <mailto:dberlin at dberlin.org>> >> *Cc:* anemet at apple.com <mailto:anemet at apple.com>; Hal Finkel >> <hfinkel at anl.gov <mailto:hfinkel at anl.gov>>; Zaks, Ayal >> <ayal.zaks at intel.com <mailto:ayal.zaks at intel.com>>; Renato >> Golin <renato.golin at linaro.org >> <mailto:renato.golin at linaro.org>>; mkuper at google.com >> <mailto:mkuper at google.com>; Mehdi Amini...
2017 Feb 27
4
[Proposal][RFC] Epilog loop vectorization
...og vectorization. This count should not be small, as that may degrade performance; in my limited tests I have observed that 16 is a point where one of our internal benchmarks shows gains. More experiments and testing are required to decide what the minimum width should be. 5) Unrolling issues: As Ayal mentioned, with a large unroll factor the next profitable EpilogVF could be equal to VF. For the same reason the current patch enforces UF=1, as unrolling can reduce the chance of executing the epilog vector loop. Example to understand the new layout: void foo (char *A, char *B, char *C, int len...
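To make the idea of an epilog vector loop concrete, the following Python sketch splits a trip count between a main vector loop, a narrower epilog vector loop, and a scalar remainder. It is not the truncated example from the message above; the function name and the values VF=16, EpilogVF=8 and a trip count of 45 are assumptions chosen for illustration, with UF=1 as the snippet says the patch enforces.

```python
def split_trip_count(n, vf, epilog_vf):
    """Return how many of n scalar iterations each loop covers (UF assumed to be 1)."""
    main = (n // vf) * vf                    # iterations handled by the main vector loop
    rem = n - main                           # iterations left after the main vector loop
    epilog = (rem // epilog_vf) * epilog_vf  # handled by the epilog vector loop
    scalar = rem - epilog                    # handled by the scalar remainder loop
    return main, epilog, scalar
```

With these assumed values, `split_trip_count(45, 16, 8)` gives 32 iterations in the main vector loop, 8 in the epilog vector loop and 5 in the scalar loop; without the epilog vector loop all 13 leftover iterations would run as scalar code.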
2016 Jun 16
2
[RFC] Allow loop vectorizer to choose vector widths that generate illegal types
...produced by the vectorizer than by an unroll-and-jam pass. BTW, taken to the extreme, one could vectorize to the full trip count of the loop, as in http://impact.crhc.illinois.edu/shared/Papers/tr2014.mxpa.pdf, where memory spatial locality is deemed more important to optimize than register usage. Ayal. From: Michael Kuperstein [mailto:mkuper at google.com] Sent: Thursday, June 16, 2016 10:42 To: Nadav Rotem <nadav.rotem at me.com> Cc: Hal Finkel <hfinkel at anl.gov>; Zaks, Ayal <ayal.zaks at intel.com>; Demikhovsky, Elena <elena.demikhovsky at intel.com>; Adam Nemet <...