similar to: [LLVMdev] trunk's optimizer generates slower code than 3.5

Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] trunk's optimizer generates slower code than 3.5"

2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
The regressions in the performance of generated code, introduced by the llvm 3.6 release, don't seem to be limited to this 8 queens puzzle" solver test case. See... http://www.phoronix.com/scan.php?page=article&item=llvm-clang-3.5-3.6-rc1&num=1 where a bit hit in the performance of the Sparse Matrix Multiply test of the SciMark v2.0 benchmark was observed as well as others.
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
Using the SciMark 2.0 code from http://math.nist.gov/scimark2/scimark2_1c.zip compiled with the same... make CFLAGS="-O3 -march=native" I am able to reproduce the 22% performance regression in the run time of the Sparse matmult benchmark. For 10 runs of the scimark2 benechmark, I get 998.439+/-0.4828 with the release llvm clang 3.5.1 compiler and 1217.363+/-1.1004 for the current
2013 Sep 12
1
[LLVMdev] bug in X86 disasm code?
hi, i found this code in X86DisassemblerDecoder.h #define EA_BASES_32BIT \ ENTRY(EAX) \ ENTRY(ECX) \ ENTRY(EDX) \ ENTRY(EBX) \ ENTRY(sib) \ ENTRY(EBP) \ ENTRY(ESI) \ ENTRY(EDI) \ ENTRY(R8D) \ ENTRY(R9D) \ ENTRY(R10D) \ ENTRY(R11D) \
2016 Jun 25
3
Tail call optimization is getting affected due to local function related optimization with IPRA
Hello LLVM Community, To improve Interprocedural Register Allocation (IPRA) we are trying to force caller saved registers for local functions (which has likage type local). To achive it I have modified TargetFrameLowering::determineCalleeSaves() to return early for function which satisfies if (F->hasLocalLinkage() && !F->hasAddressTaken()) and also reflecting the fact that for local
2016 Jun 26
3
Tail call optimization is getting affected due to local function related optimization with IPRA
According to this http://llvm.org/docs/CodeGenerator.html#tail-call-section, it seems that adding a new CC for the purpose of local function optimization seems a good idea because tail call optimization only takes place when both caller and callee have fastcc or GHC or HiPE calling convention. -Vivek On Sun, Jun 26, 2016 at 1:26 AM, vivek pandya <vivekvpandya at gmail.com> wrote: >
2016 Jun 30
4
Help required regarding IPRA and Local Function optimization
Hello Mentors, I am currently finding bug in Local Function related optimization due to which runtime failures are observed in some test cases, as those test cases are containing very large function with recursion and object oriented code so I am not able to find a pattern which is causing failure. So I tried following simple case to understand expected behavior from this optimization. Consider
2016 Jun 25
0
Tail call optimization is getting affected due to local function related optimization with IPRA
On Sat, Jun 25, 2016 at 11:03 PM, vivek pandya <vivekvpandya at gmail.com> wrote: > Hello LLVM Community, > > To improve Interprocedural Register Allocation (IPRA) we are trying to > force caller > saved registers for local functions (which has likage type local). To > achive it > I have modified TargetFrameLowering::determineCalleeSaves() to return > early for >
2016 Jun 28
2
Tail call optimization is getting affected due to local function related optimization with IPRA
> On Jun 27, 2016, at 12:25 PM, vivek pandya <vivekvpandya at gmail.com> wrote: > > Hello , > > To solve this bug locally I have given preference to tail call optimization over local function related optimization in IPRA. I have added following method to achieve this: > > bool isEligibleForTailCallOptimization(Function *F) { > CallingConv::ID CC =
2016 Jun 27
0
Tail call optimization is getting affected due to local function related optimization with IPRA
Hello , To solve this bug locally I have given preference to tail call optimization over local function related optimization in IPRA. I have added following method to achieve this: bool isEligibleForTailCallOptimization(Function *F) { CallingConv::ID CC = F->getCallingConv(); if (CC == CallingConv::Fast || CC == CallingConv::GHC || CC == CallingConv::HiPE) return true; return false;
2011 Mar 19
2
[LLVMdev] Apparent optimizer bug on X86_64
Compiling a simple automaton created by GNU bison with -O1 or -O2 resulted in the following machine code: 1300 /*-----------------------------. 1301 | yyreduce -- Do a reduction. | 1302 `-----------------------------*/ 1303 yyreduce: 1304 /* yyn is the number of a rule to reduce with. */ 1305 yylen = yyr2[yyn]; 0x0000000000400c14 <rpcalc_parse+628>: mov
2016 Jun 28
0
Tail call optimization is getting affected due to local function related optimization with IPRA
On Tue, Jun 28, 2016 at 8:11 PM, Mehdi Amini <mehdi.amini at apple.com> wrote: > > On Jun 27, 2016, at 12:25 PM, vivek pandya <vivekvpandya at gmail.com> wrote: > > Hello , > > To solve this bug locally I have given preference to tail call > optimization over local function related optimization in IPRA. I have added > following method to achieve this: > >
2019 Dec 23
2
Register Dataflow Analysis on X86
Hi Scott, That #1073741833 is a register mask. They are treated as aggregate registers (essentially sets of registers), so if it includes R9D and R11D, it will be treated as being aliased with both. These separate defs are there because they reach disjoint registers. -- Krzysztof Parzyszek kparzysz at quicinc.com<mailto:kparzysz at quicinc.com> AI tools development From: Scott
2018 Feb 06
3
What does a dead register mean?
Hi, My understanding of a "dead" register is a def that is never used. However, when I dump the MI after reg alloc on a simple program I see the following sequence: ADJCALLSTACKDOWN64 0, 0, 0, *implicit-def dead %rsp*, implicit-def dead %eflags, implicit-def dead %ssp, implicit %rsp, implicit %ssp CALL64pcrel32 @foo, <regmask %bh %bl %bp %bpl %bx %ebp %ebx %rbp %rbx %r12 %r13 %r14
2020 Jan 10
2
Register Dataflow Analysis on X86
Hi Scott, Sorry for the late reply, I was out of office during the holidays. 1. A def node can reach either a use node, or another def node. In the highlighted phi node (p3224), the def (d3225) reaches another def (1598) in statement (s1597), that’s why it’s needed. 2. The reason why the def of R11 in s1578 is not connected directly to the use in s1725 is that there may be an intervening
2020 May 22
2
[PATCH] Optimized assembler version of md5_process() for x86-64
This patch introduces an optimized assembler version of md5_process(), the inner loop of MD5 checksumming. It affects the performance of all MD5 operations in rsync - including block matching and whole-file checksums. Performance gain is 5-10% depending on the specific CPU. Originally created by Marc Bevand and placed in the public domain, later integrated into OpenSSL. This is the original
2019 Nov 08
2
Register Dataflow Analysis on X86
Do you know whether it has been fixed on the 8.0.1 release? Scott On Fri, Nov 8, 2019 at 9:45 AM Krzysztof Parzyszek <kparzysz at quicinc.com<mailto:kparzysz at quicinc.com>> wrote: The one blocking issue that existed in the past has been fixed. I haven’t had time to do any work on it lately, but I’m not aware of any fundamental problems that would make it not work on x86. --
2016 Jan 02
13
[Bug 93557] New: Kernel Panic on Linux Kernel 4.4 when loading KDE/KDM on Nvidia GeForce 7025 / nForce 630a
https://bugs.freedesktop.org/show_bug.cgi?id=93557 Bug ID: 93557 Summary: Kernel Panic on Linux Kernel 4.4 when loading KDE/KDM on Nvidia GeForce 7025 / nForce 630a Product: xorg Version: unspecified Hardware: x86-64 (AMD64) OS: Linux (All) Status: NEW Severity: blocker
2016 Jun 27
3
Finding caller-saved registers at a function call site
Hi Sanjoy, I'm having trouble finding caller-saved registers using the RegMask operand you've mentioned. As an example, I've got a C function that looks like this: double recurse(int depth, double val) { if(depth < max_depth) return recurse(depth + 1, val * 1.2) + val; else return outer_func(val); } As a quick refresher, all "xmm" registers are considered
2017 Aug 12
3
Mischeduler: Unknown reason for peak register pressure increase
I am working on a project where we are integrating an existing pre-RA scheduler into LLVM and we are trying to match our peak register pressure values with the machine instruction schedulers values while using X86. I am finding some mismatches in test cases like the one attached. The registers "AH" and "AL" are live-out but not live-in and I don't see that they are defined
2018 Feb 06
0
What does a dead register mean?
You are right about your interpretation of "dead". The case here is that RSP is a reserved register and so its liveness isn't really tracked. The "implicit-def dead" is an idiom used to mean that the register (reserved or not) is clobbered. The other implicit uses/defs can come from instruction definitions to indicate that this instruction uses and/or modifies a given