Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] trunk's optimizer generates slower code than 3.5"
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
The regressions in the performance of generated code, introduced
by the llvm 3.6 release, don't seem to be limited to this 8 queens
puzzle" solver test case. See...
http://www.phoronix.com/scan.php?page=article&item=llvm-clang-3.5-3.6-rc1&num=1
where a bit hit in the performance of the Sparse Matrix Multiply test
of the SciMark v2.0 benchmark was observed as well as others.
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
Using the SciMark 2.0 code from
http://math.nist.gov/scimark2/scimark2_1c.zip compiled with the
same...
make CFLAGS="-O3 -march=native"
I am able to reproduce the 22% performance regression in the run time
of the Sparse matmult benchmark.
For 10 runs of the scimark2 benechmark, I get 998.439+/-0.4828 with
the release llvm clang 3.5.1 compiler
and 1217.363+/-1.1004 for the current
2013 Sep 12
1
[LLVMdev] bug in X86 disasm code?
hi,
i found this code in X86DisassemblerDecoder.h
#define EA_BASES_32BIT \
ENTRY(EAX) \
ENTRY(ECX) \
ENTRY(EDX) \
ENTRY(EBX) \
ENTRY(sib) \
ENTRY(EBP) \
ENTRY(ESI) \
ENTRY(EDI) \
ENTRY(R8D) \
ENTRY(R9D) \
ENTRY(R10D) \
ENTRY(R11D) \
2016 Jun 25
3
Tail call optimization is getting affected due to local function related optimization with IPRA
Hello LLVM Community,
To improve Interprocedural Register Allocation (IPRA) we are trying to
force caller
saved registers for local functions (which has likage type local). To
achive it
I have modified TargetFrameLowering::determineCalleeSaves() to return early
for
function which satisfies if (F->hasLocalLinkage() && !F->hasAddressTaken())
and
also reflecting the fact that for local
2016 Jun 26
3
Tail call optimization is getting affected due to local function related optimization with IPRA
According to this http://llvm.org/docs/CodeGenerator.html#tail-call-section,
it seems that adding a new CC for the purpose of local function
optimization seems a good idea because tail call optimization only takes
place when both caller and callee have fastcc or GHC or HiPE calling
convention.
-Vivek
On Sun, Jun 26, 2016 at 1:26 AM, vivek pandya <vivekvpandya at gmail.com>
wrote:
>
2016 Jun 30
4
Help required regarding IPRA and Local Function optimization
Hello Mentors,
I am currently finding bug in Local Function related optimization due to
which runtime failures are observed in some test cases, as those test cases
are containing very large function with recursion and object oriented code
so I am not able to find a pattern which is causing failure. So I tried
following simple case to understand expected behavior from this
optimization.
Consider
2016 Jun 25
0
Tail call optimization is getting affected due to local function related optimization with IPRA
On Sat, Jun 25, 2016 at 11:03 PM, vivek pandya <vivekvpandya at gmail.com>
wrote:
> Hello LLVM Community,
>
> To improve Interprocedural Register Allocation (IPRA) we are trying to
> force caller
> saved registers for local functions (which has likage type local). To
> achive it
> I have modified TargetFrameLowering::determineCalleeSaves() to return
> early for
>
2016 Jun 28
2
Tail call optimization is getting affected due to local function related optimization with IPRA
> On Jun 27, 2016, at 12:25 PM, vivek pandya <vivekvpandya at gmail.com> wrote:
>
> Hello ,
>
> To solve this bug locally I have given preference to tail call optimization over local function related optimization in IPRA. I have added following method to achieve this:
>
> bool isEligibleForTailCallOptimization(Function *F) {
> CallingConv::ID CC =
2016 Jun 27
0
Tail call optimization is getting affected due to local function related optimization with IPRA
Hello ,
To solve this bug locally I have given preference to tail call optimization
over local function related optimization in IPRA. I have added following
method to achieve this:
bool isEligibleForTailCallOptimization(Function *F) {
CallingConv::ID CC = F->getCallingConv();
if (CC == CallingConv::Fast || CC == CallingConv::GHC || CC ==
CallingConv::HiPE)
return true;
return false;
2011 Mar 19
2
[LLVMdev] Apparent optimizer bug on X86_64
Compiling a simple automaton created by GNU bison with -O1 or -O2
resulted in the following machine code:
1300 /*-----------------------------.
1301 | yyreduce -- Do a reduction. |
1302 `-----------------------------*/
1303 yyreduce:
1304 /* yyn is the number of a rule to reduce with. */
1305 yylen = yyr2[yyn];
0x0000000000400c14 <rpcalc_parse+628>: mov
2016 Jun 28
0
Tail call optimization is getting affected due to local function related optimization with IPRA
On Tue, Jun 28, 2016 at 8:11 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> On Jun 27, 2016, at 12:25 PM, vivek pandya <vivekvpandya at gmail.com> wrote:
>
> Hello ,
>
> To solve this bug locally I have given preference to tail call
> optimization over local function related optimization in IPRA. I have added
> following method to achieve this:
>
>
2019 Dec 23
2
Register Dataflow Analysis on X86
Hi Scott,
That #1073741833 is a register mask. They are treated as aggregate registers (essentially sets of registers), so if it includes R9D and R11D, it will be treated as being aliased with both.
These separate defs are there because they reach disjoint registers.
--
Krzysztof Parzyszek kparzysz at quicinc.com<mailto:kparzysz at quicinc.com> AI tools development
From: Scott
2018 Feb 06
3
What does a dead register mean?
Hi,
My understanding of a "dead" register is a def that is never used. However,
when I dump the MI after reg alloc on a simple program I see the following
sequence:
ADJCALLSTACKDOWN64 0, 0, 0, *implicit-def dead %rsp*, implicit-def dead
%eflags, implicit-def dead %ssp, implicit %rsp, implicit %ssp
CALL64pcrel32 @foo, <regmask %bh %bl %bp %bpl %bx %ebp %ebx %rbp %rbx %r12
%r13 %r14
2020 Jan 10
2
Register Dataflow Analysis on X86
Hi Scott,
Sorry for the late reply, I was out of office during the holidays.
1. A def node can reach either a use node, or another def node. In the highlighted phi node (p3224), the def (d3225) reaches another def (1598) in statement (s1597), that’s why it’s needed.
2. The reason why the def of R11 in s1578 is not connected directly to the use in s1725 is that there may be an intervening
2020 May 22
2
[PATCH] Optimized assembler version of md5_process() for x86-64
This patch introduces an optimized assembler version of md5_process(),
the inner loop of MD5 checksumming. It affects the performance of all
MD5 operations in rsync - including block matching and whole-file
checksums.
Performance gain is 5-10% depending on the specific CPU.
Originally created by Marc Bevand and placed in the public domain,
later integrated into OpenSSL. This is the original
2019 Nov 08
2
Register Dataflow Analysis on X86
Do you know whether it has been fixed on the 8.0.1 release?
Scott
On Fri, Nov 8, 2019 at 9:45 AM Krzysztof Parzyszek <kparzysz at quicinc.com<mailto:kparzysz at quicinc.com>> wrote:
The one blocking issue that existed in the past has been fixed. I haven’t had time to do any work on it lately, but I’m not aware of any fundamental problems that would make it not work on x86.
--
2016 Jan 02
13
[Bug 93557] New: Kernel Panic on Linux Kernel 4.4 when loading KDE/KDM on Nvidia GeForce 7025 / nForce 630a
https://bugs.freedesktop.org/show_bug.cgi?id=93557
Bug ID: 93557
Summary: Kernel Panic on Linux Kernel 4.4 when loading KDE/KDM
on Nvidia GeForce 7025 / nForce 630a
Product: xorg
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: blocker
2016 Jun 27
3
Finding caller-saved registers at a function call site
Hi Sanjoy,
I'm having trouble finding caller-saved registers using the RegMask operand
you've mentioned. As an example, I've got a C function that looks like
this:
double recurse(int depth, double val)
{
if(depth < max_depth) return recurse(depth + 1, val * 1.2) + val;
else return outer_func(val);
}
As a quick refresher, all "xmm" registers are considered
2017 Aug 12
3
Mischeduler: Unknown reason for peak register pressure increase
I am working on a project where we are integrating an existing pre-RA scheduler into LLVM and we are trying to match our peak register pressure values with the machine instruction schedulers values while using X86. I am finding some mismatches in test cases like the one attached. The registers "AH" and "AL" are live-out but not live-in and I don't see that they are defined
2018 Feb 06
0
What does a dead register mean?
You are right about your interpretation of "dead". The case here is that
RSP is a reserved register and so its liveness isn't really tracked. The
"implicit-def dead" is an idiom used to mean that the register (reserved
or not) is clobbered. The other implicit uses/defs can come from
instruction definitions to indicate that this instruction uses and/or
modifies a given