thr3ads.net - similar to: "Folding zext from i1 into PHI nodes with only zwo incoming values."

Displaying 20 results from an estimated 1100 matches similar to: "Folding zext from i1 into PHI nodes with only zwo incoming values."

Folding zext from i1 into PHI nodes with only zwo incoming values.

2017 Jan 30

Folding zext from i1 into PHI nodes with only zwo incoming values.

Hi Sanjay, unfortunately that patch does not help in my case. Here's the IR that fails to get fully optimized: target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" define fastcc zeroext i1 @testfunc(i8** noalias nocapture readonly dereferenceable(8)) unnamed_addr { entry-block: %1 = load i8*, i8**

Folding zext from i1 into PHI nodes with only zwo incoming values.

2017 Jan 30

Folding zext from i1 into PHI nodes with only zwo incoming values.

Hi Björn and Daniel, FoldPHIArgOpIntoPHI() is for unary ops (casts), but it calls FoldPHIArgBinOpIntoPHI() for binops which takes care of almost everything else? My minimal patch idea is to ease the restriction in ShouldChangeType because i1 is special. I tried the patch below, and it works on the example...and nothing in 'make check' failed. :) Index:

Folding zext from i1 into PHI nodes with only zwo incoming values.

2017 Jan 31

Folding zext from i1 into PHI nodes with only zwo incoming values.

Hi Sanjay, 2017-01-30 22:22 GMT+01:00 Sanjay Patel <spatel at rotateright.com>: > My minimal patch idea is to ease the restriction in ShouldChangeType > because i1 is special. I tried the patch below, and it works on the > example...and nothing in 'make check' failed. :) > Yeah, that would work for me as well, I just wasn't sure about the implications that has.

Folding zext from i1 into PHI nodes with only zwo incoming values.

2017 Jan 31

Folding zext from i1 into PHI nodes with only zwo incoming values.

On Tue, Jan 31, 2017 at 3:09 AM, Björn Steinbrink <bsteinbr at gmail.com> wrote: > Hi Sanjay, > > 2017-01-30 22:22 GMT+01:00 Sanjay Patel <spatel at rotateright.com>: > >> My minimal patch idea is to ease the restriction in ShouldChangeType >> because i1 is special. I tried the patch below, and it works on the >> example...and nothing in 'make

Folding zext from i1 into PHI nodes with only zwo incoming values.

2017 Jan 31

Folding zext from i1 into PHI nodes with only zwo incoming values.

2017-01-31 16:59 GMT+01:00 Sanjay Patel <spatel at rotateright.com>: > > > On Tue, Jan 31, 2017 at 3:09 AM, Björn Steinbrink <bsteinbr at gmail.com> > wrote: > >> Hi Sanjay, >> >> 2017-01-30 22:22 GMT+01:00 Sanjay Patel <spatel at rotateright.com>: >> >>> My minimal patch idea is to ease the restriction in ShouldChangeType

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 23

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

Confirm there is no change in IR if the hack is disabled in the sources. David wrote that these instructions are created by SCEV. Are other targets affected by the changes, e.g. X86? Kind regards, Evgeny Astigeevich Senior Compiler Engineer Compilation Tools ARM From: Sanjay Patel [mailto:spatel at rotateright.com] Sent: Sunday, January 22, 2017 10:45 PM To: Evgeny Astigeevich Cc: llvm-dev; nd

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 22

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

Thank you for information. I’ll build clang without the hack and re-run the benchmark tomorrow. -Evgeny From: Sanjay Patel [mailto:spatel at rotateright.com] Sent: Sunday, January 22, 2017 8:00 PM To: Evgeny Astigeevich Cc: llvm-dev; nd Subject: Re: [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines > Do you mean to

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 22

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

Hi Sanjay, The benchmark source file: http://www.llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/Benchmarks/Shootout/sieve.c?view=markup Clang options used to produce the initial IR: clang -DNDEBUG -O3 -DNDEBUG -mcpu=cortex-a53 -fomit-frame-pointer -O3 -DNDEBUG -w -Werror=date-time -c sieve.c -S -emit-llvm -mllvm -disable-llvm-optzns --target=aarch64-arm-linux Opt options: opt -O3

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 24

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

> On Jan 23, 2017, at 3:48 PM, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > All targets are likely affected in some way by the icmp+shl fold introduced with r292492. It's a basic pattern that occurs in lots of code. Did you see any perf wins on your targets with this commit? > > Sadly, it is also likely that many (all?) targets are negatively

always allow canonicalizing to 8- and 16-bit ops?

2018 Jan 17

always allow canonicalizing to 8- and 16-bit ops?

Example: define i8 @narrow_add(i8 %x, i8 %y) { %x32 = zext i8 %x to i32 %y32 = zext i8 %y to i32 %add = add nsw i32 %x32, %y32 %tr = trunc i32 %add to i8 ret i8 %tr } With no data-layout or with an x86 target where 8-bit integer is in the data-layout, we reduce to: $ ./opt -instcombine narrowadd.ll -S define i8 @narrow_add(i8 %x, i8 %y) { %add = add i8 %x, %y ret i8 %add } But on

always allow canonicalizing to 8- and 16-bit ops?

2018 Jan 22

always allow canonicalizing to 8- and 16-bit ops?

Thanks for the perf testing. I assume that DAG legalization is equipped to handle these cases fairly well, or someone would've complained by now... FWIW (and at least some of this can be blamed on me), instcombine already does the narrowing transforms without checking shouldChangeType() for binops like and/or/xor/udiv. The justification was that narrower ops are always better for

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 24

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

> On Jan 24, 2017, at 7:18 AM, Sanjay Patel <spatel at rotateright.com> wrote: > > > > On Mon, Jan 23, 2017 at 10:53 PM, Mehdi Amini <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote: > >> On Jan 23, 2017, at 3:48 PM, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >>

always allow canonicalizing to 8- and 16-bit ops?

2018 Jan 22

always allow canonicalizing to 8- and 16-bit ops?

Hello Thanks for looking into this. I can't be very confident what the knock on result of a change like that would be, especially on architectures that are not Arm. What I can do though, is run some benchmarks and look at that results. Using this patch: --- a/lib/Transforms/InstCombine/InstructionCombining.cpp +++ b/lib/Transforms/InstCombine/InstructionCombining.cpp @@ -150,6 +150,9 @@

IR canonicalization: shufflevector or vector trunc?

2017 Jan 17

IR canonicalization: shufflevector or vector trunc?

We use InstCombiner::ShouldChangeType() to prevent transforms to illegal integer types, but I'm not sure how that would apply to vector types. Ie, let's say v256 is a legal type in your example. DataLayout doesn't appear to specify what configurations of a 256-bit vector are legal, so I don't think we can currently use that to say v2i128 should be treated differently than v16i16.

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 24

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

Hi Sanjay, Thank you for your analysis. It’s interesting why the x86 machine is not affected. Maybe the x86 backend is smarter than the AArch64 backend, or it might be micro-architectural differences. I don’t mind to keep the changes on trunk. What I’d like to see is who will/should be involved in solving the issue. What kind of help/support is needed? Should we (ARM Compilation Tools) start

[cfe-dev] New buildbot with -Werror

2016 Mar 23

[cfe-dev] New buildbot with -Werror

On Tue, Mar 22, 2016 at 5:03 PM, David Blaikie <dblaikie at gmail.com> wrote: > Could we change the buildbot config to treat > warnings-as-errors-that-continue in the build step? (so we get the benefit > of further execution tests, while still sending fail-mail, etc for the > warning?) > > That sounds reasonable, although the buildbot implementation seems to be pretty

IR canonicalization: shufflevector or vector trunc?

2017 Jan 21

IR canonicalization: shufflevector or vector trunc?

On Thu, Jan 19, 2017 at 9:17 AM, Rackover, Zvi <zvi.rackover at intel.com> wrote: > Hi Sanjay, > > > > I agree we should also discuss **if** this canonicalization is beneficial. > > For starters, do we have a concrete case where we would benefit from > canonicalizing shuffles <-> truncates in LLVM IR? > > IMO, we should not count benefits for codegen

[cfe-dev] New buildbot with -Werror

2016 Mar 23

[cfe-dev] New buildbot with -Werror

Could we change the buildbot config to treat warnings-as-errors-that-continue in the build step? (so we get the benefit of further execution tests, while still sending fail-mail, etc for the warning?) On Tue, Mar 22, 2016 at 5:00 PM, David Jones via cfe-dev < cfe-dev at lists.llvm.org> wrote: > Greetings, > > I would like to propose adding a buildbot which builds with -Werror. The

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

2012 Dec 18

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

On Tue, Dec 18, 2012 at 9:56 AM, Matthew Curtis <mcurtis at codeaurora.org> wrote: > > Here's how I'm evaluating the expression (in my head): > > 00: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[n],3), i8), i32),3) > | > 01: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[0],3), i8), i32),3) >

[LLVMdev] Making a CopyToReg/CopyFromReg into a zext/sext?

2015 Jan 27

[LLVMdev] Making a CopyToReg/CopyFromReg into a zext/sext?

I have a CopyToReg that is copying from different size types, what's the best way to change that to a zext or sext node based on signed or unsigned? I'm fairly unfamiliar with SelectionDAG process (outside of the docs on llvm website). It seems like I should be able to insert a custom hook using the register class to identify the type, potentially in ISelDAGToDag.cpp or is there a better

similar to: Folding zext from i1 into PHI nodes with only zwo incoming values.