similar to: [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

Displaying 20 results from an estimated 1000 matches similar to: "[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines"

2017 Jan 22
2
[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
Hi Sanjay, The benchmark source file: http://www.llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/Benchmarks/Shootout/sieve.c?view=markup Clang options used to produce the initial IR: clang -DNDEBUG -O3 -DNDEBUG -mcpu=cortex-a53 -fomit-frame-pointer -O3 -DNDEBUG -w -Werror=date-time -c sieve.c -S -emit-llvm -mllvm -disable-llvm-optzns --target=aarch64-arm-linux Opt options: opt -O3
2017 Jan 22
2
[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
Thank you for information. I’ll build clang without the hack and re-run the benchmark tomorrow. -Evgeny From: Sanjay Patel [mailto:spatel at rotateright.com] Sent: Sunday, January 22, 2017 8:00 PM To: Evgeny Astigeevich Cc: llvm-dev; nd Subject: Re: [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines > Do you mean to
2017 Jan 23
2
[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
Confirm there is no change in IR if the hack is disabled in the sources. David wrote that these instructions are created by SCEV. Are other targets affected by the changes, e.g. X86? Kind regards, Evgeny Astigeevich Senior Compiler Engineer Compilation Tools ARM From: Sanjay Patel [mailto:spatel at rotateright.com] Sent: Sunday, January 22, 2017 10:45 PM To: Evgeny Astigeevich Cc: llvm-dev; nd
2017 Jan 24
2
[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
> On Jan 23, 2017, at 3:48 PM, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > All targets are likely affected in some way by the icmp+shl fold introduced with r292492. It's a basic pattern that occurs in lots of code. Did you see any perf wins on your targets with this commit? > > Sadly, it is also likely that many (all?) targets are negatively
2017 Jan 24
3
[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
> On Jan 24, 2017, at 7:18 AM, Sanjay Patel <spatel at rotateright.com> wrote: > > > > On Mon, Jan 23, 2017 at 10:53 PM, Mehdi Amini <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote: > >> On Jan 23, 2017, at 3:48 PM, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >>
2017 Jan 24
3
[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
Hi Sanjay, Thank you for your analysis. It’s interesting why the x86 machine is not affected. Maybe the x86 backend is smarter than the AArch64 backend, or it might be micro-architectural differences. I don’t mind to keep the changes on trunk. What I’d like to see is who will/should be involved in solving the issue. What kind of help/support is needed? Should we (ARM Compilation Tools) start
2017 Jan 24
3
[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
On Tue, Jan 24, 2017 at 1:20 PM, Sanjay Patel via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > I started looking at the log files that you attached, and I'm confused. > The code that is supposedly causing the perf regression is created by the > loop vectorizer, right? Except the bad code is not in the "vector.body", so > is there something peculiar about
2009 Jun 24
1
[LLVMdev] Handling SMax(N, N - constInt) in Scalar Evolution pass
Hi all, I'm working on a project which tries to prove an access to an array is safe. For example, int foo(int s) { int * p = malloc(s * sizeof int); ... int q = p[s - 2]; } then the access of p[s - 2] always stays in bound. I implemented a prototype using the Scalar Evolution pass. Here are the pseudo-code of the implementation: const SCEV * offset =
2009 Jun 24
2
[LLVMdev] Handling SMax(N, N - constInt) in Scalar Evolution pass
On Tue, 2009-06-23 at 22:55 -0700, Nick Lewycky wrote: > Mai, Haohui wrote: > > Hi all, > > > > I'm working on a project which tries to prove an access to an array is > > safe. For example, > > > > int foo(int s) { > > int * p = malloc(s * sizeof int); > > ... > > int q = p[s - 2]; > > } > > > > then the access
2015 Jul 21
6
[LLVMdev] GlobalsModRef (and thus LTO) is completely broken
Based on function names and structures, this is some version of GCC :) Any way you can post the entire .ll file? Because it's globalsmodref, it's hard to debug without the other functions, since it goes over all the functions to determine address takenness, etc :) On Tue, Jul 21, 2015 at 3:23 PM, Michael Zolotukhin <mzolotukhin at apple.com> wrote: > Hi Chandler, > > We
2009 Jun 24
0
[LLVMdev] Handling SMax(N, N - constInt) in Scalar Evolution pass
Mai, Haohui wrote: > Hi all, > > I'm working on a project which tries to prove an access to an array is > safe. For example, > > int foo(int s) { > int * p = malloc(s * sizeof int); > ... > int q = p[s - 2]; > } > > then the access of p[s - 2] always stays in bound. > > I implemented a prototype using the Scalar Evolution pass. Here are the
2009 Jun 24
0
[LLVMdev] Handling SMax(N, N - constInt) in Scalar Evolution pass
Nick, It might be a little bit difficult to handle SMax correctly. But is it possible to reduce A+(-A) to 0 in SAddExpr? Haohui On Wed, 2009-06-24 at 01:05 -0500, Mai, Haohui wrote: > On Tue, 2009-06-23 at 22:55 -0700, Nick Lewycky wrote: > > Mai, Haohui wrote: > > > Hi all, > > > > > > I'm working on a project which tries to prove an access to an array
2009 Jun 24
4
[LLVMdev] killing vicmp and vfcmp
Now that icmp and fcmp have supported returning vectors of i1 for a while, I think it's time to remove the vicmp and vfcmp instructions from LLVM. The good news is that we've never shipped a release that included them so we won't be providing auto-upgrade support. There is some existing backend support for vicmp and vfcmp that looks different from what icmp and fcmp do. If this
2009 Jun 24
1
[LLVMdev] Handling SMax(N, N - constInt) in Scalar Evolution pass
Mai, Haohui wrote: > Nick, > > It might be a little bit difficult to handle SMax correctly. But is it > possible to reduce A+(-A) to 0 in SAddExpr? Yes, it should already do that. Here's a quick test I wrote up: $ cat x.ll define i8 @test(i8 %x) { %neg = sub i8 0, %x %sum = add i8 %x, %neg ret i8 %sum } $ llvm-as < x.ll | opt -analyze
2015 Jul 17
2
[LLVMdev] GlobalsModRef (and thus LTO) is completely broken
On Fri, Jul 17, 2015 at 9:13 AM Evgeny Astigeevich < evgeny.astigeevich at arm.com> wrote: > It’s Dhrystone. > Dhrystone has historically not been a good indicator of real-world performance fluctuations, especially at this small of a shift. I'd like to see if we see any fluctuation on larger and more realistic application benchmarks. One advantage of the flag being set is that we
2017 Nov 10
5
[RFC] Enable Partial Inliner by default
Hi Graham, Thank you for offering help. I am trying to create a reproducer. The problem is that the crashes happen whilst LTO is used. One thing I am sure about IR is broken at compile time. Thanks, Evgeny From: Graham Yiu <gyiu at ca.ibm.com> Date: Friday, 10 November 2017 at 16:09 To: Evgeny Astigeevich <Evgeny.Astigeevich at arm.com> Cc: "junbuml at codeaurora.org"
2017 Jan 27
2
Reversion of rL292621 caused about 7% performance regressions on Cortex-M
Hi Wei, Thank you for information. Please let me know about any progress in fixing the failures. I can help with checking that the final patch gives the same level of performance improvements. Kind regards, Evgeny Astigeevich Senior Compiler Engineer Compilation Tools ARM > -----Original Message----- > From: Wei Mi [mailto:wmi at google.com] > Sent: Friday, January 27, 2017 6:20 PM
2017 Nov 10
2
[RFC] Making .eh_frame more linker-friendly
> But if we still need to deal with CIEs and generate .eh_frame_hdr in a special way, > does it make sense to make this change to simplify only a small part of a linker? For huge C++ projects this could improve link time if GC is a bottleneck. It will also improve eh_frame_hdr build time because you don’t spend time on parsing garbage. However a linker will have to have two versions of GC:
2017 Jan 27
2
Reversion of rL292621 caused about 7% performance regressions on Cortex-M
Hi Evgeny, Quentin and Matthias found it was a problem about subreg live range update and will push a fix soon (http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170123/424126.html). Thanks, Wei. On Fri, Jan 27, 2017 at 10:35 AM, Wei Mi <wmi at google.com> wrote: > Sure. Will keep you posted. > > Thanks, > Wei. > > On Fri, Jan 27, 2017 at 10:31 AM, Evgeny
2017 Nov 10
2
[RFC] Making .eh_frame more linker-friendly
Hi Igor, > It sounds like the linker has to be aware of the .eh_frame section details to be able to generate .eh_frame_hdr and eliminate duplicate CIEs, right? Yes, a linker needs some details but not all of them. It needs to know sizes of records and initial locations (PC Begin) to find out which functions FDEs belong to. > So, is there any difference whether it knows that in one place