thr3ads.net - similar to: "Where's the optimiser gone? (part 5.c): missed tail calls, and more..."

Displaying 20 results from an estimated 6000 matches similar to: "Where's the optimiser gone? (part 5.c): missed tail calls, and more..."

Where's the optimiser gone? (part 5.b): missed tail calls, and more...

2018 Dec 01

Compile the following functions with "-O3 -target i386" (see <https://godbolt.org/z/VmKlXL>):
    long long div(long long foo, long long bar) { return foo / bar; }
On the left the generated code; on the right the expected, properly optimised code:
    div: # @div
        push ebp |
        mov ebp, esp |
        push dword ptr [ebp + 20] | push

Where's the optimiser gone (part 11): use the proper instruction for sign extension

2019 Mar 04

Compile with -O3 -m32 (see <https://godbolt.org/z/yCpBpM>):
    long lsign(long x) { return (x > 0) - (x < 0); }
    long long llsign(long long x) { return (x > 0) - (x < 0); }
While the code generated for the "long" version of this function is quite OK, the code for the "long long" version misses an obvious optimisation:
    lsign: # @lsign
        mov

[LLVMdev] Area for improvement

2005 Feb 22

On Mon, 21 Feb 2005, Jeff Cohen wrote:
> I noticed that fourinarow is one of the programs in which LLVM is much slower
> than GCC, so I decided to take a look and see why that is so. The program
> has many loops that look like this:
>
> #define ROWS 6
> #define COLS 7
>
> void init_board(char b[COLS][ROWS+1])
> {
> int i,j;
>
> for

[LLVMdev] Area for improvement

2005 Feb 22

Sorry, I thought I was running selection dag isel but I screwed up when trying out the really big array. You're right, it does clean it up except for the multiplication. So LoopStrengthReduce is not ready for prime time and doesn't actually get used? I might consider whipping it into shape. Does it still have to handle getelementptr in its full generality? Chris Lattner wrote:

Where's the optimiser gone? (part 5.a): missed tail calls, and more...

2018 Dec 01

Compile the following functions with "-O3 -target amd64" (see <https://godbolt.org/z/5xqYhH>):
    __int128 div(__int128 foo, __int128 bar) { return foo / bar; }
On the left the generated code; on the right the expected, properly optimised code:
    div: # @div
        push rbp |
        mov rbp, rsp |
        call __divti3 | jmp __divti3
        pop rbp |
        ret

KNL Assembly Code for Matrix Multiplication

2017 Jul 01

Thank you. It means that vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] loads zmm22 with 64-bit constant values that are indexes (8, 9, 10, 11, 12, 13, 14, 15), not the values loaded from those locations. zmm2 contains the constant 4000, so vpmuludq zmm14, zmm10, zmm2 multiplies the index values by 4000, since the stride for array b is 4000. zmm14=

[LLVMdev] Area for improvement

2005 Feb 22

I noticed that fourinarow is one of the programs in which LLVM is much slower than GCC, so I decided to take a look and see why that is so. The program has many loops that look like this:
    #define ROWS 6
    #define COLS 7
    void init_board(char b[COLS][ROWS+1])
    {
        int i,j;
        for (i=0;i<COLS;i++)
            for (j=0;j<ROWS;j++)
                b[i][j]='.';

[LLVMdev] LLVM Optmizer

2009 Jan 06

The following C code:
    #include <stdio.h>
    #include <stdlib.h>
    int TESTE2( int parami , int paraml ,double paramd )
    {
        int varx=0,vary;
        int nI =0;
        //varx= parami;
        if( parami > 0 )
        {
            varx = parami;
            vary = varx + 1;
        }
        else
        {
            varx = vary + 1;
            vary = paraml;
        }
        varx = varx + parami + paraml;
        for( nI = 1 ; nI <= paraml; nI++)
        {
            varx =

[LLVMdev] Area for improvement

2005 Feb 22

On Mon, 21 Feb 2005, Jeff Cohen wrote:
> Sorry, I thought I was running selection dag isel but I screwed up when
> trying out the really big array. You're right, it does clean it up except
> for the multiplication.
>
> So LoopStrengthReduce is not ready for prime time and doesn't actually get
> used?
I don't know what the status of it is. You could try it out,

[LLVMdev] LLVM optmization

2009 Jan 07

The following C test program was compiled using LLVM with the -O3 option and MSVC with /O2. The MSVC one is about 600 times faster than the one compiled with LLVM. We can see that the for loop in the MSVC assembler output is resolved by the optimization pass more efficiently than in LLVM. Is there a way to get an optimization result in LLVM like that of MSVC? Manoel Teixeira #include

[PATCH] core: Fix 'trackbuf' descriptor list byte length

2011 Mar 06

(Tested using a Linux bzImage, with and without an initrd.)
Per shuffle_and_boot documentation, %ecx must contain the descriptor list byte length, but it's set with such list end address instead. Fix.
Signed-off-by: Ahmed S. Darwish <darwish.07 at gmail.com>
--
 core/bcopy32.inc | 2 ++
 core/bcopyxx.inc | 2 ++
 core/bootsect.inc | 8 +++++---
 core/runkernel.inc |

[LLVMdev] fptoui calling a function that modifies ECX

2013 Jul 19