Displaying 20 results from an estimated 800 matches similar to: "[RFC] New pass: LoopExitValues"
2015 Sep 01
2
[RFC] New pass: LoopExitValues
On Mon, Aug 31, 2015 at 5:52 PM, Jake VanAdrighem
<jvanadrighem at gmail.com> wrote:
> Do you have some specific performance measurements?
Averaging 4 runs of 10000 iterations each of Coremark on my X86_64
desktop showed:
-O2 performance: +2.9% faster with the L.E.V. pass
-Os size: 1.5% smaller with the L.E.V. pass
In the case of Coremark, the benefit comes mainly from the matrix
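For context, a minimal illustration (not from the thread; the function and
variable names are invented) of the kind of redundancy a loop-exit-values
pass targets: the address computed for the next row equals the final value
the inner loop already produced.

    // Illustration only: at the end of the inner loop, 'p' already points
    // at the start of the next row, so recomputing 'row + N' duplicates
    // the inner loop's own exit value.
    void rowsum(const int *A, int N, int *out) {
      const int *row = A;
      for (int i = 0; i < N; ++i) {
        int s = 0;
        const int *end = row + N;        // exit value of 'p' below
        for (const int *p = row; p != end; ++p)
          s += *p;
        out[i] = s;
        row += N;                        // same value as 'end'; a pass like
                                         // LoopExitValues can reuse it
      }
    }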
2015 Feb 13
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
I submitted the problem report to clang's bugzilla but no one seems to
care so I have to send it to the mailing list.
clang 3.7 svn (trunk 229055 at the time I reported this problem)
generates slower code than 3.5 (Apple LLVM version 6.0
(clang-600.0.56) (based on LLVM 3.5svn)) for the following code.
It is an "8 queens puzzle" solver written as an educational example. As
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
The regressions in the performance of generated code, introduced
by the llvm 3.6 release, don't seem to be limited to this "8 queens
puzzle" solver test case. See...
http://www.phoronix.com/scan.php?page=article&item=llvm-clang-3.5-3.6-rc1&num=1
where a big hit in the performance of the Sparse Matrix Multiply test
of the SciMark v2.0 benchmark was observed, as well as in others.
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
Using the SciMark 2.0 code from
http://math.nist.gov/scimark2/scimark2_1c.zip compiled with the
same...
make CFLAGS="-O3 -march=native"
I am able to reproduce the 22% performance regression in the run time
of the Sparse matmult benchmark.
For 10 runs of the scimark2 benchmark, I get 998.439+/-0.4828 with
the release llvm clang 3.5.1 compiler
and 1217.363+/-1.1004 for the current
2015 Sep 23
3
[RFC] New pass: LoopExitValues
On Wed, Sep 23, 2015 at 12:00 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>
>> Should we try the patch in its current location, namely after LSR?
>
> Sure; post the patch as you have it so we can look at what's going on.
>
http://reviews.llvm.org/D12494
One particular point: The algorithm checks that SCEVs are equal when
their raw pointers are equal. Is
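For background on that question: SCEV expressions are uniqued by
ScalarEvolution, so pointer equality is a sound, though conservative, test
for "provably the same expression". A minimal sketch (the helper name is
invented):

    #include "llvm/Analysis/ScalarEvolution.h"
    #include "llvm/IR/Value.h"
    using namespace llvm;

    // Sound but conservative: equal pointers imply equal expressions,
    // while semantically equal expressions may still compare unequal if
    // ScalarEvolution cannot canonicalize them to a single node.
    static bool sameSCEV(ScalarEvolution &SE, Value *A, Value *B) {
      return SE.getSCEV(A) == SE.getSCEV(B);
    }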
2016 Jun 25
3
Tail call optimization is getting affected due to local function related optimization with IPRA
Hello LLVM Community,
To improve Interprocedural Register Allocation (IPRA) we are trying to
force caller-saved registers for local functions (which have linkage type
local). To achieve this I have modified
TargetFrameLowering::determineCalleeSaves() to return early for any
function that satisfies if (F->hasLocalLinkage() && !F->hasAddressTaken()),
and am also reflecting the fact that for local
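A minimal sketch of the modification described above (assumed shape,
reconstructed from the description; exact header paths and signatures vary
across LLVM versions):

    // Returning early leaves SavedRegs empty, so no callee-saved registers
    // are assigned and the local function effectively uses caller-saved
    // registers everywhere.
    void TargetFrameLowering::determineCalleeSaves(MachineFunction &MF,
                                                   BitVector &SavedRegs,
                                                   RegScavenger *RS) const {
      const Function *F = MF.getFunction();
      if (F->hasLocalLinkage() && !F->hasAddressTaken())
        return; // force caller-saved registers for this local function
      // ... default logic marks the target's callee-saved registers ...
    }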
2015 Sep 11
5
[RFC] New pass: LoopExitValues
Hi Steve,
It seems the general consensus is that the patch feels like a work-around for a problem with LSR (and possibly other loop transformations) that introduces redundant instructions. It is probably best to file a bug with a few of your test cases.
Thanks
Gerolf
> On Sep 10, 2015, at 4:37 PM, Steve King via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Thu, Sep 10, 2015
2013 Sep 12
1
[LLVMdev] bug in X86 disasm code?
hi,
i found this code in X86DisassemblerDecoder.h
#define EA_BASES_32BIT \
ENTRY(EAX) \
ENTRY(ECX) \
ENTRY(EDX) \
ENTRY(EBX) \
ENTRY(sib) \
ENTRY(EBP) \
ENTRY(ESI) \
ENTRY(EDI) \
ENTRY(R8D) \
ENTRY(R9D) \
ENTRY(R10D) \
ENTRY(R11D) \
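The excerpt is truncated, but the shape is the classic "X macro" table. A
small self-contained example of the pattern (all names here are invented,
not taken from the decoder):

    // Each ENTRY(x) expands differently depending on how ENTRY is defined
    // at the point of use; here the table builds an enum.
    #define MY_BASES \
      ENTRY(EAX)     \
      ENTRY(ECX)     \
      ENTRY(EDX)

    #define ENTRY(x) BASE_##x,
    enum Base { MY_BASES BASE_MAX };
    #undef ENTRY
    // expands to: enum Base { BASE_EAX, BASE_ECX, BASE_EDX, BASE_MAX };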
2019 Dec 23
2
Register Dataflow Analysis on X86
Hi Scott,
That #1073741833 is a register mask. Register masks are treated as aggregate registers (essentially sets of registers), so if the mask includes R9D and R11D, it will be treated as aliased with both.
These separate defs are there because they reach disjoint registers.
--
Krzysztof Parzyszek kparzysz at quicinc.com, AI tools development
From: Scott
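A minimal sketch of how such a mask operand is interpreted (using LLVM's
MachineOperand API; the helper name is invented):

    #include "llvm/CodeGen/MachineInstr.h"
    #include "llvm/CodeGen/MachineOperand.h"
    using namespace llvm;

    // A regmask operand encodes a set of registers: a clear bit means the
    // register is clobbered, so one operand can alias R9D, R11D, and more.
    static bool maskClobbers(const MachineInstr &MI, unsigned PhysReg) {
      for (const MachineOperand &MO : MI.operands())
        if (MO.isRegMask() &&
            MachineOperand::clobbersPhysReg(MO.getRegMask(), PhysReg))
          return true;
      return false;
    }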
2020 Jan 10
2
Register Dataflow Analysis on X86
Hi Scott,
Sorry for the late reply, I was out of office during the holidays.
1. A def node can reach either a use node or another def node. In the highlighted phi node (p3224), the def (d3225) reaches another def (d1598) in statement (s1597); that's why it's needed.
2. The reason why the def of R11 in s1578 is not connected directly to the use in s1725 is that there may be an intervening
2016 Jun 26
3
Tail call optimization is getting affected due to local function related optimization with IPRA
According to http://llvm.org/docs/CodeGenerator.html#tail-call-section,
adding a new CC for the purpose of local function optimization seems a
good idea, because tail call optimization only takes place when both
caller and callee use the fastcc, GHC, or HiPE calling convention.
-Vivek
On Sun, Jun 26, 2016 at 1:26 AM, vivek pandya <vivekvpandya at gmail.com>
wrote:
>
2016 Jun 28
2
Tail call optimization is getting affected due to local function related optimization with IPRA
> On Jun 27, 2016, at 12:25 PM, vivek pandya <vivekvpandya at gmail.com> wrote:
>
> Hello ,
>
> To solve this bug locally, I have given preference to tail call optimization over the local-function-related optimization in IPRA. I have added the following method to achieve this:
>
> bool isEligibleForTailCallOptimization(Function *F) {
> CallingConv::ID CC =
2016 Jun 25
0
Tail call optimization is getting affected due to local function related optimization with IPRA
On Sat, Jun 25, 2016 at 11:03 PM, vivek pandya <vivekvpandya at gmail.com>
wrote:
> Hello LLVM Community,
>
> To improve Interprocedural Register Allocation (IPRA) we are trying to
> force caller-saved registers for local functions (which have linkage
> type local). To achieve this I have modified
> TargetFrameLowering::determineCalleeSaves() to return
> early for
>
2016 Jun 27
0
Tail call optimization is getting affected due to local function related optimization with IPRA
Hello,
To solve this bug locally, I have given preference to tail call
optimization over the local-function-related optimization in IPRA. I have
added the following method to achieve this:
bool isEligibleForTailCallOptimization(Function *F) {
  CallingConv::ID CC = F->getCallingConv();
  if (CC == CallingConv::Fast || CC == CallingConv::GHC ||
      CC == CallingConv::HiPE)
    return true;
  return false;
}
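One way the predicate could be wired in (a hypothetical sketch, not the
committed fix): gate the IPRA early-return inside the modified
determineCalleeSaves() from the earlier message, so functions eligible for
tail call optimization keep the default callee-saved behavior.

    // Only force caller-saved registers for local functions that cannot
    // be tail-call optimized.
    if (F->hasLocalLinkage() && !F->hasAddressTaken() &&
        !isEligibleForTailCallOptimization(F))
      return;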
2019 Nov 08
2
Register Dataflow Analysis on X86
Do you know whether it has been fixed in the 8.0.1 release?
Scott
On Fri, Nov 8, 2019 at 9:45 AM Krzysztof Parzyszek <kparzysz at quicinc.com> wrote:
The one blocking issue that existed in the past has been fixed. I haven’t had time to do any work on it lately, but I’m not aware of any fundamental problems that would make it not work on x86.
--
2015 Sep 23
2
[RFC] New pass: LoopExitValues
On Mon, Sep 21, 2015 at 11:13 AM, Wei Mi <wmi at google.com> wrote:
> Maybe it can follow the "Delete dead loops" pass, which comes after the
> Induction Variable Simplification pass, so it will not affect the
> existing exit-value rewriting in Induction Variable Simplification
> that finds and deletes dead loops?
>
> Existing pass pipeline:
>
2015 Sep 03
2
[RFC] New pass: LoopExitValues
On Wed, Sep 2, 2015 at 5:36 AM, James Molloy <james at jamesmolloy.co.uk> wrote:
> Hi,
>
> Coremark really isn't a good enough test - have you run the LLVM test suite
> with this patch, and what were the performance differences?
For the test-suite SingleSource benchmarks, 235 tests improved
performance, 2 regressed, and 705 were unchanged. That seems very
optimistic.
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank you.
It means
  vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15]
loads 64-bit constant values which are the indexes here (zmm22 = 8, 9,
10, 11, 12, 13, 14, 15), not the values loaded from those locations, and
zmm2 contains the constant 4000. So
  vpmuludq zmm14, zmm10, zmm2
will multiply the index values by 4000, as for array b the stride is 4000.
zmm14=
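A scalar equivalent of that vector arithmetic (illustration only; the real
kernel keeps these values in zmm registers):

    #include <cstdint>

    int main() {
      std::uint64_t idx[8], off[8];
      for (int lane = 0; lane < 8; ++lane) {
        idx[lane] = 8 + lane;           // zmm22: constant indexes 8..15
        off[lane] = idx[lane] * 4000;   // vpmuludq: index * 4000-byte stride
      }
      return 0;
    }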
2016 Jun 28
0
Tail call optimization is getting affected due to local function related optimization with IPRA
On Tue, Jun 28, 2016 at 8:11 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> On Jun 27, 2016, at 12:25 PM, vivek pandya <vivekvpandya at gmail.com> wrote:
>
> Hello ,
>
> To solve this bug locally, I have given preference to tail call
> optimization over the local-function-related optimization in IPRA. I
> have added the following method to achieve this:
>
>
2015 Sep 26
2
[RFC] New pass: LoopExitValues
Hi Steve,
Do you primarily find this to help for nested loops? If so, that
could be because LSR explicitly bails out of processing them:
  // Skip nested loops until we can model them better with formulae.
  if (!L->empty()) {
    DEBUG(dbgs() << "LSR skipping outer loop " << *L << "\n");
    return;
  }
I don't know how much time you're