thr3ads.net - similar to: "Where's the optimiser gone? (part 5.b): missed tail calls, and more..."

Displaying 20 results from an estimated 10000 matches similar to: "Where's the optimiser gone? (part 5.b): missed tail calls, and more..."

Where's the optimiser gone? (part 5.c): missed tail calls, and more...

2018 Dec 01

Where's the optimiser gone? (part 5.c): missed tail calls, and more...

Compile the following functions with "-O3 -target i386-win32" (see <https://godbolt.org/z/exmjWY>): __int64 __fastcall div(__int64 foo, __int64 bar) { return foo / bar; } On the left the generated code; on the right the expected, properly optimised code: push dword ptr [esp + 16] | push dword ptr [esp + 16] | push dword ptr [esp + 16] |

Where's the optimiser gone (part 11): use the proper instruction for sign extension

2019 Mar 04

Where's the optimiser gone (part 11): use the proper instruction for sign extension

Compile with -O3 -m32 (see <https://godbolt.org/z/yCpBpM>): long lsign(long x) { return (x > 0) - (x < 0); } long long llsign(long long x) { return (x > 0) - (x < 0); } While the code generated for the "long" version of this function is quite OK, the code for the "long long" version misses an obvious optimisation: lsign: # @lsign mov

KNL Assembly Code for Matrix Multiplication

2017 Jul 01

KNL Assembly Code for Matrix Multiplication

Thank You, It means vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] zmm22 will contain 64 bit constant values which are indexes here zmm22=8, 9, 10, 11, 12,13,14,15. not the values loaded from these locations. and zmm2 contains constant 4000. so, vpmuludq zmm14, zmm10, zmm2 ; will multiply the indexes values with 4000, as for array b the stride is 4000. zmm14=

[LLVMdev] Area for improvement

2005 Feb 22

[LLVMdev] Area for improvement

On Mon, 21 Feb 2005, Jeff Cohen wrote: > I noticed that fourinarow is one of the programs in which LLVM is much slower > than GCC, so I decided to take a look and see why that is so. The program > has many loops that look like this: > > #define ROWS 6 > #define COLS 7 > > void init_board(char b[COLS][ROWS+1]) > { > int i,j; > > for

[LLVMdev] Area for improvement

2005 Feb 22

[LLVMdev] Area for improvement

Sorry, I thought I was running selection dag isel but I screwed up when trying out the really big array. You're right, it does clean it up except for the multiplication. So LoopStrengthReduce is not ready for prime time and doesn't actually get used? I might consider whipping it into shape. Does it still have to handle getelementptr in its full generality? Chris Lattner wrote:

(Question regarding the) incomplete "builtins library" of "Compiler-RT"

2018 Nov 30

(Question regarding the) incomplete "builtins library" of "Compiler-RT"

"Friedman, Eli" <efriedma at codeaurora.org> wrote: > On 11/30/2018 8:31 AM, Stefan Kanthak via llvm-dev wrote: >> Hi @ll, >> >> compiler-rt implements (for example) the MSVC (really Windows) >> specific routines compiler-rt/lib/builtins/i386/chkstk.S and >> compiler-rt/lib/builtins/x86_64/chkstk.S as __chkstk_ms() >> See

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 13

[LLVMdev] trunk's optimizer generates slower code than 3.5

I submitted the problem report to clang's bugzilla but no one seems to care so I have to send it to the mailing list. clang 3.7 svn (trunk 229055 as the time I was to report this problem) generates slower code than 3.5 (Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)) for the following code. It is a "8 queens puzzle" solver written as an educational example. As

Where's the optimiser gone? (part 5.a): missed tail calls, and more...

2018 Dec 01

Where's the optimiser gone? (part 5.a): missed tail calls, and more...

Compile the following functions with "-O3 -target amd64" (see <https://godbolt.org/z/5xqYhH>): __int128 div(__int128 foo, __int128 bar) { return foo / bar; } On the left the generated code; on the right the expected, properly optimised code: div: # @div push rbp | mov rbp, rsp | call __divti3 | jmp __divti3 pop rbp | ret

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 14

[LLVMdev] trunk's optimizer generates slower code than 3.5

The regressions in the performance of generated code, introduced by the llvm 3.6 release, don't seem to be limited to this 8 queens puzzle" solver test case. See... http://www.phoronix.com/scan.php?page=article&item=llvm-clang-3.5-3.6-rc1&num=1 where a bit hit in the performance of the Sparse Matrix Multiply test of the SciMark v2.0 benchmark was observed as well as others.

[LLVMdev] trunk's optimizer generates slower code than 3.5

2015 Feb 14

[LLVMdev] trunk's optimizer generates slower code than 3.5

Using the SciMark 2.0 code from http://math.nist.gov/scimark2/scimark2_1c.zip compiled with the same... make CFLAGS="-O3 -march=native" I am able to reproduce the 22% performance regression in the run time of the Sparse matmult benchmark. For 10 runs of the scimark2 benechmark, I get 998.439+/-0.4828 with the release llvm clang 3.5.1 compiler and 1217.363+/-1.1004 for the current

[LLVMdev] SIMD instructions and memory alignment on X86

2013 Jul 19

[LLVMdev] SIMD instructions and memory alignment on X86

Hmm, I'm not able to get those .ll files to compile if I disable SSE and I end up with SSE instructions(including sqrtpd) if I don't disable it. On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote: > Is there something specifically required to enable SSE? If it's not > detected as available (based from the target triple?) then I don't think

[LLVMdev] About JIT by LLVM 2.9 or later

2011 Nov 02

[LLVMdev] About JIT by LLVM 2.9 or later

Hello guys, Thanks for your help when you are busing. I am working on an open source project. It supports shader language and I want JIT feature, so LLVM is used. But now I find the ABI & Calling Convention did not co-work with MSVC. For example, following code I have: struct float4 { float x, y, z, w; }; struct float4x4 { float4 x, y, z, w; }; float4 fetch_vs( float4x4* mat

What does a dead register mean?

2018 Feb 06

What does a dead register mean?

Hi, My understanding of a "dead" register is a def that is never used. However, when I dump the MI after reg alloc on a simple program I see the following sequence: ADJCALLSTACKDOWN64 0, 0, 0, *implicit-def dead %rsp*, implicit-def dead %eflags, implicit-def dead %ssp, implicit %rsp, implicit %ssp CALL64pcrel32 @foo, <regmask %bh %bl %bp %bpl %bx %ebp %ebx %rbp %rbx %r12 %r13 %r14

[LLVMdev] Area for improvement

2005 Feb 22

[LLVMdev] Area for improvement

I noticed that fourinarow is one of the programs in which LLVM is much slower than GCC, so I decided to take a look and see why that is so. The program has many loops that look like this: #define ROWS 6 #define COLS 7 void init_board(char b[COLS][ROWS+1]) { int i,j; for (i=0;i<COLS;i++) for (j=0;j<ROWS;j++) b[i][j]='.';

(Question regarding the) incomplete "builtins library" of "Compiler-RT"

2018 Nov 30

(Question regarding the) incomplete "builtins library" of "Compiler-RT"

Hi @ll, compiler-rt implements (for example) the MSVC (really Windows) specific routines compiler-rt/lib/builtins/i386/chkstk.S and compiler-rt/lib/builtins/x86_64/chkstk.S as __chkstk_ms() See <http://msdn.microsoft.com/en-us/library/ms648426.aspx> Is there any special reason why compiler-rt doesn't implement other MSVC specific functions (alias builtins or "compiler

[LLVMdev] fptoui calling a function that modifies ECX

2013 Jul 19

[LLVMdev] fptoui calling a function that modifies ECX

Try adding ECX to the Defs of this part of lib/Target/X86/X86InstrCompiler.td like I've done below. I don't have a Windows machine to test myself. let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in { def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src), "# win32 fptoui", [(X86WinFTOL RFP32:$src)]>,

[LLVMdev] fptoui calling a function that modifies ECX

2013 Jul 19

[LLVMdev] fptoui calling a function that modifies ECX

I don't think that's going to work. On Fri, Jul 19, 2013 at 12:24 AM, Peter Newman <peter at uformia.com> wrote: > Thank you, I'm trying this now. > > > On 19/07/2013 5:23 PM, Craig Topper wrote: > > Try adding ECX to the Defs of this part of > lib/Target/X86/X86InstrCompiler.td like I've done below. I don't have a > Windows machine to test

[LLVMdev] Boostrap Failure -- Expected Differences?

2007 Apr 27

[LLVMdev] Boostrap Failure -- Expected Differences?

The saga continues. I've been tracking the interface changes and merging them with the refactoring work I'm doing. I got as far as building stage3 of llvm-gcc but the object files from stage2 and stage3 differ: warning: ./cc1-checksum.o differs warning: ./cc1plus-checksum.o differs (Are the above two ok?) The list below is clearly bad. I think it's every object file in the

[LLVMdev] fptoui calling a function that modifies ECX

2013 Jul 19

[LLVMdev] fptoui calling a function that modifies ECX

Here's my attempt at a fix. Adding Jakob to make sure I did this right. On Fri, Jul 19, 2013 at 2:34 AM, Peter Newman <peter at uformia.com> wrote: > That does appear to have worked. All my tests are passing now. > > I'll hand this out to our other devs & testers and make sure it's working > for them as well (not just on my machine). > > Thank you,

[LLVMdev] Microsoft constructors implementation problem.

2012 Feb 27

[LLVMdev] Microsoft constructors implementation problem.

Hi all. I am working on constructors implementation for MS ABI. Itanium ABI has 2 constructor types - base & complete. MS ABI has only 1 type. How it works I'll show on example. class first { public: virtual void g(){} }; class second : public virtual first { public : virtual void g(){} }; When construct instance of second we will have next code push 1 lea ecx,[f]

similar to: Where's the optimiser gone? (part 5.b): missed tail calls, and more...