thr3ads.net - similar to: "[LLVMdev] trunk's optimizer generates slower code than 3.5"

Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] trunk's optimizer generates slower code than 3.5"

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 14

[LLVMdev] trunk's optimizer generates slower code than 3.5

The regressions in the performance of generated code, introduced by the llvm 3.6 release, don't seem to be limited to this 8 queens puzzle" solver test case. See... http://www.phoronix.com/scan.php?page=article&item=llvm-clang-3.5-3.6-rc1&num=1 where a bit hit in the performance of the Sparse Matrix Multiply test of the SciMark v2.0 benchmark was observed as well as others.

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 14

[LLVMdev] trunk's optimizer generates slower code than 3.5

Using the SciMark 2.0 code from http://math.nist.gov/scimark2/scimark2_1c.zip compiled with the same... make CFLAGS="-O3 -march=native" I am able to reproduce the 22% performance regression in the run time of the Sparse matmult benchmark. For 10 runs of the scimark2 benechmark, I get 998.439+/-0.4828 with the release llvm clang 3.5.1 compiler and 1217.363+/-1.1004 for the current

[LLVMdev] bug in X86 disasm code?

2013 Sep 12

[LLVMdev] bug in X86 disasm code?

hi, i found this code in X86DisassemblerDecoder.h #define EA_BASES_32BIT \ ENTRY(EAX) \ ENTRY(ECX) \ ENTRY(EDX) \ ENTRY(EBX) \ ENTRY(sib) \ ENTRY(EBP) \ ENTRY(ESI) \ ENTRY(EDI) \ ENTRY(R8D) \ ENTRY(R9D) \ ENTRY(R10D) \ ENTRY(R11D) \

Tail call optimization is getting affected due to local function related optimization with IPRA

2016 Jun 25

Tail call optimization is getting affected due to local function related optimization with IPRA

Hello LLVM Community, To improve Interprocedural Register Allocation (IPRA) we are trying to force caller saved registers for local functions (which has likage type local). To achive it I have modified TargetFrameLowering::determineCalleeSaves() to return early for function which satisfies if (F->hasLocalLinkage() && !F->hasAddressTaken()) and also reflecting the fact that for local

Tail call optimization is getting affected due to local function related optimization with IPRA

2016 Jun 26

Tail call optimization is getting affected due to local function related optimization with IPRA

According to this http://llvm.org/docs/CodeGenerator.html#tail-call-section, it seems that adding a new CC for the purpose of local function optimization seems a good idea because tail call optimization only takes place when both caller and callee have fastcc or GHC or HiPE calling convention. -Vivek On Sun, Jun 26, 2016 at 1:26 AM, vivek pandya <vivekvpandya at gmail.com> wrote: >

Help required regarding IPRA and Local Function optimization

2016 Jun 30

Help required regarding IPRA and Local Function optimization

Hello Mentors, I am currently finding bug in Local Function related optimization due to which runtime failures are observed in some test cases, as those test cases are containing very large function with recursion and object oriented code so I am not able to find a pattern which is causing failure. So I tried following simple case to understand expected behavior from this optimization. Consider

Tail call optimization is getting affected due to local function related optimization with IPRA

2016 Jun 25

Tail call optimization is getting affected due to local function related optimization with IPRA

On Sat, Jun 25, 2016 at 11:03 PM, vivek pandya <vivekvpandya at gmail.com> wrote: > Hello LLVM Community, > > To improve Interprocedural Register Allocation (IPRA) we are trying to > force caller > saved registers for local functions (which has likage type local). To > achive it > I have modified TargetFrameLowering::determineCalleeSaves() to return > early for >

Tail call optimization is getting affected due to local function related optimization with IPRA

2016 Jun 28

Tail call optimization is getting affected due to local function related optimization with IPRA

> On Jun 27, 2016, at 12:25 PM, vivek pandya <vivekvpandya at gmail.com> wrote: > > Hello , > > To solve this bug locally I have given preference to tail call optimization over local function related optimization in IPRA. I have added following method to achieve this: > > bool isEligibleForTailCallOptimization(Function *F) { > CallingConv::ID CC =

Tail call optimization is getting affected due to local function related optimization with IPRA

2016 Jun 27

Tail call optimization is getting affected due to local function related optimization with IPRA

Hello , To solve this bug locally I have given preference to tail call optimization over local function related optimization in IPRA. I have added following method to achieve this: bool isEligibleForTailCallOptimization(Function *F) { CallingConv::ID CC = F->getCallingConv(); if (CC == CallingConv::Fast || CC == CallingConv::GHC || CC == CallingConv::HiPE) return true; return false;

[LLVMdev] Apparent optimizer bug on X86_64

2011 Mar 19

[LLVMdev] Apparent optimizer bug on X86_64

Compiling a simple automaton created by GNU bison with -O1 or -O2 resulted in the following machine code: 1300 /*-----------------------------. 1301 | yyreduce -- Do a reduction. | 1302 `-----------------------------*/ 1303 yyreduce: 1304 /* yyn is the number of a rule to reduce with. */ 1305 yylen = yyr2[yyn]; 0x0000000000400c14 <rpcalc_parse+628>: mov

Tail call optimization is getting affected due to local function related optimization with IPRA

2016 Jun 28

Tail call optimization is getting affected due to local function related optimization with IPRA

On Tue, Jun 28, 2016 at 8:11 PM, Mehdi Amini <mehdi.amini at apple.com> wrote: > > On Jun 27, 2016, at 12:25 PM, vivek pandya <vivekvpandya at gmail.com> wrote: > > Hello , > > To solve this bug locally I have given preference to tail call > optimization over local function related optimization in IPRA. I have added > following method to achieve this: > >

2019 Dec 23

Hi Scott, That #1073741833 is a register mask. They are treated as aggregate registers (essentially sets of registers), so if it includes R9D and R11D, it will be treated as being aliased with both. These separate defs are there because they reach disjoint registers. -- Krzysztof Parzyszek kparzysz at quicinc.com<mailto:kparzysz at quicinc.com> AI tools development From: Scott

What does a dead register mean?

2018 Feb 06

What does a dead register mean?

Hi, My understanding of a "dead" register is a def that is never used. However, when I dump the MI after reg alloc on a simple program I see the following sequence: ADJCALLSTACKDOWN64 0, 0, 0, *implicit-def dead %rsp*, implicit-def dead %eflags, implicit-def dead %ssp, implicit %rsp, implicit %ssp CALL64pcrel32 @foo, <regmask %bh %bl %bp %bpl %bx %ebp %ebx %rbp %rbx %r12 %r13 %r14

2020 Jan 10

Hi Scott, Sorry for the late reply, I was out of office during the holidays. 1. A def node can reach either a use node, or another def node. In the highlighted phi node (p3224), the def (d3225) reaches another def (1598) in statement (s1597), that’s why it’s needed. 2. The reason why the def of R11 in s1578 is not connected directly to the use in s1725 is that there may be an intervening

[PATCH] Optimized assembler version of md5_process() for x86-64

2020 May 22

[PATCH] Optimized assembler version of md5_process() for x86-64

This patch introduces an optimized assembler version of md5_process(), the inner loop of MD5 checksumming. It affects the performance of all MD5 operations in rsync - including block matching and whole-file checksums. Performance gain is 5-10% depending on the specific CPU. Originally created by Marc Bevand and placed in the public domain, later integrated into OpenSSL. This is the original

2019 Nov 08

Do you know whether it has been fixed on the 8.0.1 release? Scott On Fri, Nov 8, 2019 at 9:45 AM Krzysztof Parzyszek <kparzysz at quicinc.com<mailto:kparzysz at quicinc.com>> wrote: The one blocking issue that existed in the past has been fixed. I haven’t had time to do any work on it lately, but I’m not aware of any fundamental problems that would make it not work on x86. --

[Bug 93557] New: Kernel Panic on Linux Kernel 4.4 when loading KDE/KDM on Nvidia GeForce 7025 / nForce 630a

2016 Jan 02

[Bug 93557] New: Kernel Panic on Linux Kernel 4.4 when loading KDE/KDM on Nvidia GeForce 7025 / nForce 630a

https://bugs.freedesktop.org/show_bug.cgi?id=93557 Bug ID: 93557 Summary: Kernel Panic on Linux Kernel 4.4 when loading KDE/KDM on Nvidia GeForce 7025 / nForce 630a Product: xorg Version: unspecified Hardware: x86-64 (AMD64) OS: Linux (All) Status: NEW Severity: blocker

Finding caller-saved registers at a function call site

2016 Jun 27

Finding caller-saved registers at a function call site

Hi Sanjoy, I'm having trouble finding caller-saved registers using the RegMask operand you've mentioned. As an example, I've got a C function that looks like this: double recurse(int depth, double val) { if(depth < max_depth) return recurse(depth + 1, val * 1.2) + val; else return outer_func(val); } As a quick refresher, all "xmm" registers are considered

Mischeduler: Unknown reason for peak register pressure increase

2017 Aug 12

Mischeduler: Unknown reason for peak register pressure increase

I am working on a project where we are integrating an existing pre-RA scheduler into LLVM and we are trying to match our peak register pressure values with the machine instruction schedulers values while using X86. I am finding some mismatches in test cases like the one attached. The registers "AH" and "AL" are live-out but not live-in and I don't see that they are defined

What does a dead register mean?

2018 Feb 06

What does a dead register mean?

You are right about your interpretation of "dead". The case here is that RSP is a reserved register and so its liveness isn't really tracked. The "implicit-def dead" is an idiom used to mean that the register (reserved or not) is clobbered. The other implicit uses/defs can come from instruction definitions to indicate that this instruction uses and/or modifies a given

similar to: [LLVMdev] trunk's optimizer generates slower code than 3.5