thr3ads.net - search: "rotateright"

Displaying 20 results from an estimated 287 matches for "rotateright".

how to force llvm generate gather intrinsic

2016 Feb 26

...e great if we have profitability mode to see the necessity to use gathers. Or it also would be good if there is a compiler option for the users to enable LLVM to generate the gather instructions no matter it is faster or slow. Best, Zhi On Fri, Feb 26, 2016 at 12:49 PM, Sanjay Patel <spatel at rotateright.com> wrote: > If I'm understanding correctly, you're saying that vgather* is slow on all > of Excavator, Haswell, Broadwell, and Skylake (client). Therefore, we will > not generate it for any of those machines. > > Even if that's true, we should not define "gathe...

how to force llvm generate gather intrinsic

2016 Feb 26

...Elena < elena.demikhovsky at intel.com> wrote: > No. Gather operation is slow on AVX2 processors. > > > > - * Elena* > > > > *From:* zhi chen [mailto:zchenhn at gmail.com] > *Sent:* Thursday, February 25, 2016 20:48 > *To:* Sanjay Patel <spatel at rotateright.com> > *Cc:* Demikhovsky, Elena <elena.demikhovsky at intel.com>; Nema, Ashutosh < > Ashutosh.Nema at amd.com>; llvm-dev <llvm-dev at lists.llvm.org> > > *Subject:* Re: [llvm-dev] how to force llvm generate gather intrinsic > > > > It seems that http://r...

how to force llvm generate gather intrinsic

2016 Feb 25

It seems that http://reviews.llvm.org/D15690 only implemented gather/scatter for AVX-512, but not for AVX/AVX2. Is there any plan to enable gather for AVX/2? Thanks. Best, Zhi On Thu, Feb 25, 2016 at 8:28 AM, Sanjay Patel <spatel at rotateright.com> wrote: > I don't think gather has been enabled for AVX2 as of r261875. > Masked load/store were enabled for AVX with: > http://reviews.llvm.org/D16528 / http://reviews.llvm.org/rL258675 > > On Wed, Feb 24, 2016 at 11:39 PM, Demikhovsky, Elena < > elena.demikhovsky...

how to force llvm generate gather intrinsic

2016 Feb 26

No. Gather operation is slow on AVX2 processors. - Elena From: zhi chen [mailto:zchenhn at gmail.com] Sent: Thursday, February 25, 2016 20:48 To: Sanjay Patel <spatel at rotateright.com> Cc: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] how to force llvm generate gather intrinsic It seems that http://reviews.llvm.org/D15690 only implemented gath...

Redundant promotion of integer values in x86 target

2016 Feb 01

...der if you are still working on it and have a plan to submit your changes for the review. Thanks, Taewook From: "Smith, Kevin B" <kevin.b.smith at intel.com> Date: Monday, February 1, 2016 at 3:30 PM To: 'Sanjay Patel' <spatel at rotateright.com>, Taewook Oh <twoh at fb.com> Cc: "llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org> Subject: RE: [llvm-dev] Red...

GEP transformation by InstCombiner

2018 Jan 15

...ains pointer size, but how can I conclude that the GEP index can't be widened? - Elena From: Hal Finkel [mailto:hfinkel at anl.gov] Sent: Monday, January 15, 2018 20:34 To: Demikhovsky, Elena <elena.demikhovsky at intel.com>; llvm-dev at lists.llvm.org; Sanjay Patel (spatel at rotateright.com) <spatel at rotateright.com>; Chandler Carruth (chandlerc at gmail.com) <chandlerc at gmail.com>; Quentin Colombet (qcolombet at apple.com) <qcolombet at apple.com>; Craig Topper (craig.topper at gmail.com) <craig.topper at gmail.com> Cc: Breger, Igor <igor.breger at...

GEP transformation by InstCombiner

2018 Jan 15

...n DataLayout. -Hal > > > - */ Elena/* > > > > *From:*Hal Finkel [mailto:hfinkel at anl.gov] > *Sent:* Monday, January 15, 2018 20:34 > *To:* Demikhovsky, Elena <elena.demikhovsky at intel.com>; > llvm-dev at lists.llvm.org; Sanjay Patel (spatel at rotateright.com) > <spatel at rotateright.com>; Chandler Carruth (chandlerc at gmail.com) > <chandlerc at gmail.com>; Quentin Colombet (qcolombet at apple.com) > <qcolombet at apple.com>; Craig Topper (craig.topper at gmail.com) > <craig.topper at gmail.com> > *Cc:* Brege...

FMA canonicalization in IR

2016 Nov 20

...do today to fuse them back together again? On Sat, Nov 19, 2016 at 8:29 PM Hal Finkel <hfinkel at anl.gov> wrote: > ----- Original Message ----- > > From: "Hal J. via llvm-dev Finkel" <llvm-dev at lists.llvm.org> > > To: "Sanjay Patel" <spatel at rotateright.com> > > Cc: "llvm-dev" <llvm-dev at lists.llvm.org> > > Sent: Saturday, November 19, 2016 10:58:27 AM > > Subject: Re: [llvm-dev] FMA canonicalization in IR > > > > > > Sent from my Verizon Wireless 4G LTE DROID > > On Nov 19, 2016 10:26...

how to force llvm generate gather intrinsic

2016 Jan 23

Thanks for your response, Sanjay. I know there are intrinsics available in C/C++. But the problem is that I want to instrument my code at the IR level and generate those instructions. I don't want to touch the source code. Best, Zhi On Fri, Jan 22, 2016 at 4:54 PM, Sanjay Patel <spatel at rotateright.com> wrote: > I was just looking at the related masked load/store operations, and I > think there are at least 2 bugs: > > 1. X86TTIImpl::isLegalMaskedLoad/Store() should be legal for FP types with > AVX1 (not just AVX2). > 2. X86TTIImpl::isLegalMaskedGather/Scatter() should b...

[LLVMdev] Contributing the Apple ARM64 compiler backend

2014 Jun 26

...add w8, w9, w8 str w8, [x0, w1, sxtw #2] ret The sext can be matched as part of the addressing mode for AArch64 – maybe it’s something in codegenprepare for x86 going awry? Cheers, James From: Sanjay Patel [mailto:spatel at rotateright.com] Sent: 26 June 2014 18:11 To: Manjunath DN Cc: James Molloy; llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Contributing the Apple ARM64 compiler backend >> We've also seen similar instances where multiple registers are used to compute very similar >> addresses (such as x+0...

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 24

> On Jan 24, 2017, at 7:18 AM, Sanjay Patel <spatel at rotateright.com> wrote: > > > > On Mon, Jan 23, 2017 at 10:53 PM, Mehdi Amini <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote: > >> On Jan 23, 2017, at 3:48 PM, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm....

how to force llvm generate gather intrinsic

2016 Feb 25

Yes, masked load/store/gather/scatter are completed. - Elena From: zhi chen [mailto:zchenhn at gmail.com] Sent: Thursday, February 25, 2016 01:20 To: Demikhovsky, Elena <elena.demikhovsky at intel.com> Cc: Sanjay Patel <spatel at rotateright.com>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] how to force llvm generate gather intrinsic Hi Elena, Are the masked_load and gather working now? Best, Zhi On Sat, Jan 23, 2016 at 12:06 PM, Demikhovsky, Elena <elen...

failing to optimize boolean ops on cmps

2017 Jul 13

...'t be an instsimplify though? The values we want in these cases do not exist already: %res = or i8 %b, %a %res = or i1 %cmp, %c On Thu, Jul 13, 2017 at 5:10 PM, Daniel Berlin <dberlin at dberlin.org> wrote: > > > On Thu, Jul 13, 2017 at 2:12 PM, Sanjay Patel <spatel at rotateright.com> > wrote: > >> We have several optimizations in InstCombine for bitwise logic ops >> (and/or/xor) that fail to handle compare patterns with the equivalent >> bitwise logic. Example: >> >> define i8 @or_and_not(i8 %a, i8 %b) { >> %nota = xor i8 %a,...

[FPEnv] FNEG instruction

2018 Oct 01

...n't want to over-constrain allowable optimizations. Fneg folds shouldn't be disabled just because we changed the FP exception state? On Mon, Oct 1, 2018 at 12:20 PM Cameron McInally <cameron.mcinally at nyu.edu> wrote: > On Thu, Sep 27, 2018 at 10:14 AM Sanjay Patel <spatel at rotateright.com> > wrote: > >> Regarding non-IEEE targets: yes, we definitely support those, so we do >> have to be careful about not breaking them. I know because I have broken >> them. :) >> See the discussion and related links here: >> https://reviews.llvm.org/D19391 ...

Question about canonicalizing cmp+select

2018 Jul 03

I linked the wrong patch review. Here's the patch that was actually committed: https://reviews.llvm.org/D48508 https://reviews.llvm.org/rL335433 On Tue, Jul 3, 2018 at 4:39 PM, Sanjay Patel <spatel at rotateright.com> wrote: > [adding back llvm-dev and cc'ing Craig] > > I think you are asking if we are missing a fold (or your target is missing > enabling another hook) to transform the sext+add into shift+or? I think the > answer is 'yes'. We probably should add that fold. This...

FMA canonicalization in IR

2016 Nov 19

Sent from my Verizon Wireless 4G LTE DROID On Nov 19, 2016 10:26 AM, Sanjay Patel <spatel at rotateright.com<mailto:spatel at rotateright.com>> wrote: > > If I have my FMA intrinsics story straight now (thanks for the explanation, Hal!), I think it raises another question about IR canonicalization (and may affect the proposed revision to IR FMF): No, I think that we specifically don...

Invalid transformation in LibCallSimplifier::replacePowWithSqrt?

2020 Sep 14

...d your example and the problem. I see now where LibCallSimplifier creates the select...but we are immediately erasing that select with the code from the godbolt example. Does the real motivating case have no uses of the pow() result value? On Mon, Sep 14, 2020 at 1:03 PM Sanjay Patel <spatel at rotateright.com> wrote: > Yes, I mean just bail out on the transform in > LibCallSimplifier::replacePowWithSqrt() -> getSqrtCall(). If we can't prove > the call behaves the same with errno, then give up. > I'm not sure where the select / branching happens, but I don't see that ...

enabling interleaved access loop vectorization

2016 Sep 01

...8, 2016 03:57 To: Zaks, Ayal <ayal.zaks at intel.com> Cc: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Renato Golin <renato.golin at linaro.org>; Matthew Simpson <mssimpso at codeaurora.org>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; Sanjay Patel <spatel at rotateright.com>; llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] enabling interleaved access loop vectorization So, at least for this example, it looks like we actually want to vectorize with -enable-interleaved-mem-accesses, we just need the backend to generate good code for the vecto...

analysis based on nonnull attribute

2016 Dec 16

...*From: *"Michael Kuperstein" <michael.kuperstein at gmail.com > <mailto:michael.kuperstein at gmail.com>> > *To: *"Hal Finkel" <hfinkel at anl.gov <mailto:hfinkel at anl.gov>> > *Cc: *"Sanjay Patel" <spatel at rotateright.com > <mailto:spatel at rotateright.com>>, "llvm-dev" > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>, > "Michael Kuperstein" <mkuper at google.com > <mailto:mkuper at google.com>> ...

RFC: New intrinsics masked.expandload and masked.compressstore

2016 Sep 25
