similar to: [LLVMdev] Optimization opportunity


2004 Aug 27
0
[LLVMdev] Optimization opportunity
On Thu, 26 Aug 2004, Jeff Cohen wrote: > There seems to be a disadvantage to the approach of allocating all > locals on the stack using alloca. Consider the following code: There is nothing intrinsic in LLVM that prevents this from happening, we just have not yet implemented 'stack packing'. > We have two arrays, b and c, only one of which can exist at any given > time.
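Jeff's example code is elided in this excerpt, so the following C sketch is a hypothetical stand-in. In `sum_separate`, the arrays `b` and `c` sit in disjoint scopes, so only one is ever live, but without stack packing each can still get its own stack slot. `sum_packed` does the packing by hand with a union, so both arrays share one slot. Both function names are made up, and the sketch illustrates the idea only, not how LLVM would implement it.

```c
/* Hypothetical stand-in for the elided example: only one of b and c is
   ever live, yet each can get its own stack slot if nothing packs them. */
int sum_separate(int use_b, int n)
{
    if (use_b) {
        int b[100];
        int s = 0;
        for (int i = 0; i < n && i < 100; i++) {
            b[i] = i;
            s += b[i];
        }
        return s;
    } else {
        int c[100];
        int s = 0;
        for (int i = 0; i < n && i < 100; i++) {
            c[i] = 2 * i;
            s += c[i];
        }
        return s;
    }
}

/* Stack packing done by hand: the two arrays overlay one slot in a union. */
int sum_packed(int use_b, int n)
{
    union {
        int b[100];
        int c[100];
    } slot;
    int s = 0;
    for (int i = 0; i < n && i < 100; i++) {
        if (use_b) {
            slot.b[i] = i;
            s += slot.b[i];
        } else {
            slot.c[i] = 2 * i;
            s += slot.c[i];
        }
    }
    return s;
}
```

With 4-byte `int`, the union occupies 400 bytes, whereas two separate slots would take 800.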
2004 Aug 29
3
[LLVMdev] Optimization opportunity
On Fri, 27 Aug 2004 02:20:34 -0500 (CDT) Chris Lattner <sabre at nondot.org> wrote: > On Thu, 26 Aug 2004, Jeff Cohen wrote: > > > Also, the store into the arrays generates two x86 machine > > instructions: > > > > lea %ECX, DWORD PTR [%ESP + 16] > > mov DWORD PTR [%ECX + <element offset>], %EAX > > > > These can be combined into a
2004 Aug 29
0
[LLVMdev] Optimization opportunity
Jeff, Chris isn't likely to respond to this for a while as he's on vacation. I'll take a look at it and will commit it if it looks good. Since code gen isn't my specialty, could you increase my comfort level a little by giving me some examples of the test results you got when testing your patches? Ideally, I'd like to see some of the test/Programs/MultiSource programs working
2005 Mar 11
5
[LLVMdev] FP Intrinsics
Hello, I am trying to make the FP intrinsics (abs, sin, cos, sqrt) I've added work with the X86ISelPattern, but I'm having some difficulties understanding what needs to be done. I assume I have to add new nodetypes for the FP instructions to SelectionDAGNodes.h, and make nodes for these in SelectionDAGLowering::visitCall when I find the intrinsic... The part I don't quite
2015 Feb 13
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
I submitted the problem report to clang's bugzilla, but no one seems to care, so I am sending it to the mailing list. clang 3.7 svn (trunk 229055 at the time I reported this problem) generates slower code than 3.5 (Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)) for the following code. It is an "8 queens puzzle" solver written as an educational example. As
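The reporter's solver is not included in this excerpt. For reference, here is a minimal backtracking N-queens counter in C. It is a hypothetical stand-in, not the code from the bug report, and it says nothing about the 3.5 vs. trunk regression itself.

```c
/* Returns 1 if a queen at (row, col) is not attacked by the queens
   already placed in rows 0..row-1 (cols[r] is the column in row r). */
static int safe(const int *cols, int row, int col)
{
    for (int r = 0; r < row; r++) {
        int c = cols[r];
        /* same column, or same diagonal in either direction */
        if (c == col || c - col == r - row || c - col == row - r)
            return 0;
    }
    return 1;
}

/* Places queens row by row, backtracking; returns the number of solutions. */
static int place(int *cols, int row, int n)
{
    if (row == n)
        return 1;
    int count = 0;
    for (int col = 0; col < n; col++) {
        if (safe(cols, row, col)) {
            cols[row] = col;
            count += place(cols, row + 1, n);
        }
    }
    return count;
}

int count_queens(int n)
{
    int cols[32];
    if (n < 1 || n > 32)
        return 0;
    return place(cols, 0, n);
}
```

`count_queens(8)` returns 92, the number of solutions to the 8 queens puzzle.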
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
The regressions in the performance of generated code introduced by the llvm 3.6 release don't seem to be limited to this "8 queens puzzle" solver test case. See... http://www.phoronix.com/scan.php?page=article&item=llvm-clang-3.5-3.6-rc1&num=1 where a big hit in the performance of the Sparse Matrix Multiply test of the SciMark v2.0 benchmark was observed, as well as others.
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
Using the SciMark 2.0 code from http://math.nist.gov/scimark2/scimark2_1c.zip compiled with the same... make CFLAGS="-O3 -march=native", I am able to reproduce the 22% performance regression in the run time of the Sparse matmult benchmark. For 10 runs of the scimark2 benchmark, I get 998.439+/-0.4828 with the release llvm clang 3.5.1 compiler and 1217.363+/-1.1004 for the current
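The Sparse matmult kernel multiplies a matrix stored in compressed sparse row (CSR) form by a vector. The sketch below is modeled on that idea; the function name `csr_matvec` and its parameter order are assumptions, not SciMark's actual source. Its inner loop combines an indirect load, `x[col[i]]`, with a floating-point reduction.

```c
/* y = A * x for an M-row matrix A in compressed sparse row (CSR) form:
   val[k] is the k-th nonzero, col[k] is its column, and the nonzeros of
   row r occupy indices row[r] .. row[r + 1] - 1. */
void csr_matvec(int M, double *y, const double *val, const int *row,
                const int *col, const double *x)
{
    for (int r = 0; r < M; r++) {
        double sum = 0.0;
        for (int i = row[r]; i < row[r + 1]; i++)
            sum += x[col[i]] * val[i];
        y[r] = sum;
    }
}
```

For the 3x3 matrix with rows (1 0 2), (0 3 0) and (4 0 5), the arrays are `val = {1,2,3,4,5}`, `col = {0,2,1,0,2}` and `row = {0,2,3,5}`.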
2005 Feb 22
0
[LLVMdev] Area for improvement
On Mon, 21 Feb 2005, Jeff Cohen wrote: > I noticed that fourinarow is one of the programs in which LLVM is much slower > than GCC, so I decided to take a look and see why that is so. The program > has many loops that look like this: > > #define ROWS 6 > #define COLS 7 > > void init_board(char b[COLS][ROWS+1]) > { > int i,j; > > for
2005 Feb 22
2
[LLVMdev] Area for improvement
Sorry, I thought I was running selection dag isel but I screwed up when trying out the really big array. You're right, it does clean it up except for the multiplication. So LoopStrengthReduce is not ready for prime time and doesn't actually get used? I might consider whipping it into shape. Does it still have to handle getelementptr in its full generality? Chris Lattner wrote:
2005 Feb 22
5
[LLVMdev] Area for improvement
I noticed that fourinarow is one of the programs in which LLVM is much slower than GCC, so I decided to take a look and see why that is so. The program has many loops that look like this:
    #define ROWS 6
    #define COLS 7
    void init_board(char b[COLS][ROWS+1])
    {
      int i,j;
      for (i=0;i<COLS;i++)
        for (j=0;j<ROWS;j++)
          b[i][j]='.';
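Below is the loop from the post (the elided remainder of the original function is not reconstructed) next to a version with strength reduction applied by hand. The name `init_board_reduced` is hypothetical. It shows the kind of code loop strength reduction aims for, with the row multiply `i*(ROWS+1)` in each address replaced by a pointer that advances one row per outer iteration. It is not actual LLVM output.

```c
#define ROWS 6
#define COLS 7

/* The loop as posted: each b[i][j] address is b + i*(ROWS+1) + j. */
void init_board(char b[COLS][ROWS+1])
{
    int i, j;
    for (i = 0; i < COLS; i++)
        for (j = 0; j < ROWS; j++)
            b[i][j] = '.';
}

/* Same effect after hand-applied strength reduction: no multiply, just a
   row pointer stepping by ROWS+1 bytes and a char pointer stepping by 1. */
void init_board_reduced(char b[COLS][ROWS+1])
{
    for (char (*row)[ROWS+1] = b; row < b + COLS; row++)
        for (char *p = *row; p < *row + ROWS; p++)
            *p = '.';
}
```

Like the original, both versions leave the last column of each row (index `ROWS`) untouched.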
2005 Mar 11
0
[LLVMdev] FP Intrinsics
Update: I have been working on this all day, and I finally got it working more or less with the pattern instruction selector... However, the generated code is not very good, and I haven't implemented the expansion to calls for targets that do not support these FP instructions. As an example, in the following function the sub, abs, and compare compile to 13 instructions! Also it has changed the
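The function from the post is not shown in this excerpt. As a hypothetical illustration, `nearly_equal` below is a subtract/abs/compare sequence. `fabs_bits` shows one way a target without a native FP abs instruction could expand abs inline by clearing the IEEE-754 sign bit, as an alternative to the library-call expansion the post mentions. Both names are made up for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* abs for double by clearing the IEEE-754 sign bit (bit 63). memcpy is
   the portable way to reinterpret the bits without aliasing problems. */
double fabs_bits(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    u &= ~(UINT64_C(1) << 63);
    memcpy(&x, &u, sizeof x);
    return x;
}

/* A "subtract, abs, compare" sequence of the kind described above. */
int nearly_equal(double a, double b, double eps)
{
    return fabs_bits(a - b) < eps;
}
```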
2011 Nov 02
5
[LLVMdev] About JIT by LLVM 2.9 or later
Hello guys, thanks for your help when you are busy. I am working on an open source project. It supports a shader language and I want a JIT feature, so I use LLVM. But now I find that the ABI and calling convention do not work together with MSVC. For example, I have the following code: struct float4 { float x, y, z, w; }; struct float4x4 { float4 x, y, z, w; }; float4 fetch_vs( float4x4* mat
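The `fetch_vs` function is cut off in this excerpt, so `mul_by_value` below is a hypothetical stand-in that uses the post's structs. Assuming Windows x64, a 16-byte struct such as `float4` is neither returned in registers nor passed by value in a register. Instead, the caller supplies a hidden pointer for the result and passes a pointer to a copy of the argument. `mul_lowered` (also hypothetical) writes that convention out as an explicit C signature. This is roughly the shape JIT-generated code must match to interoperate with MSVC-compiled callers. It is a sketch under that platform assumption, not the poster's code.

```c
typedef struct float4 { float x, y, z, w; } float4;
typedef struct float4x4 { float4 x, y, z, w; } float4x4;

static float dot4(float4 a, float4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Source-level form: a 16-byte struct passed and returned by value. */
float4 mul_by_value(const float4x4 *m, float4 v)
{
    float4 r = { dot4(m->x, v), dot4(m->y, v), dot4(m->z, v), dot4(m->w, v) };
    return r;
}

/* Roughly what the Win64 convention makes of it: hidden result pointer
   first, and the 16-byte argument passed by reference to a copy. */
void mul_lowered(float4 *ret, const float4x4 *m, const float4 *v)
{
    /* compute into a temporary first, in case ret aliases v */
    float4 r = { dot4(m->x, *v), dot4(m->y, *v), dot4(m->z, *v), dot4(m->w, *v) };
    *ret = r;
}
```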
2012 Feb 27
3
[LLVMdev] Microsoft constructors implementation problem.
Hi all. I am working on the constructor implementation for the MS ABI. The Itanium ABI has 2 constructor types, base and complete; the MS ABI has only 1 type. I'll show how it works with an example. class first { public: virtual void g(){} }; class second : public virtual first { public : virtual void g(){} }; When constructing an instance of second, we get the following code: push 1 lea ecx,[f]
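The following C model illustrates the MS ABI scheme suggested by the `push 1` above. Instead of separate base and complete constructors, the constructor of a class with virtual bases takes a hidden int flag. Only the call for the most derived object (flag 1) constructs the virtual base. Everything here is a hypothetical simplification. The struct layouts, the `vb` pointer standing in for the vbptr lookup, and the derived class `third` are all assumptions, and vftable setup is omitted.

```c
/* Counts how often the virtual base's constructor runs. */
int first_ctor_calls = 0;

typedef struct first { int initialized; } first;

/* second : virtual first. 'vb' stands in for the vbptr/vbtable lookup
   that locates the single shared 'first' subobject. */
typedef struct second { first *vb; int second_data; } second;

/* Complete objects: this is where 'first' actually lives. */
typedef struct second_object { second s; first first_sub; } second_object;
typedef struct third_object { second s; first first_sub; } third_object;

void first_ctor(first *self)
{
    first_ctor_calls++;
    self->initialized = 1;
}

/* MSVC-style constructor: the extra int says whether this call is for the
   most derived object and must therefore construct the virtual base. */
void second_ctor(second *self, first *vb, int is_most_derived)
{
    self->vb = vb;
    if (is_most_derived)
        first_ctor(vb);
    self->second_data = 2;
}

/* 'second' as the complete object: the caller passes 1 (the "push 1"). */
void make_second(second_object *obj)
{
    second_ctor(&obj->s, &obj->first_sub, 1);
}

/* A class derived from 'second': it builds the virtual base itself, then
   constructs its 'second' base subobject with the flag set to 0. */
void third_ctor(third_object *obj, int is_most_derived)
{
    if (is_most_derived)
        first_ctor(&obj->first_sub);
    second_ctor(&obj->s, &obj->first_sub, 0);
}
```

In both cases the virtual base is constructed exactly once, which is what the base/complete constructor split achieves in the Itanium ABI.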
2018 Nov 20
2
A pattern for portable __builtin_add_overflow()
Hi LLVM, clang, I'm trying to write a portable version of __builtin_add_overflow() in a way that the compiler would recognize the pattern and use the add_overflow intrinsic / the best possible machine instruction. Here are docs about these builtins: https://clang.llvm.org/docs/LanguageExtensions.html#checked-arithmetic-builtins . With unsigned types this is easy: int uaddo_native(unsigned
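The unsigned example in the post is cut off, so `uaddo_portable` below is an assumed version of the common idiom. Unsigned wraparound is well defined, so the add overflowed exactly when the result is smaller than an operand. `saddo_portable` extends this to signed `int`. It performs the addition in unsigned arithmetic to avoid signed-overflow UB, then reports overflow when the result's sign differs from the signs of both operands. Both names are hypothetical. This sketch does not claim that a particular clang version folds these patterns into `llvm.uadd.with.overflow` / `llvm.sadd.with.overflow`.

```c
#include <limits.h>

/* Unsigned: wraparound is defined, so overflow happened iff sum < a. */
int uaddo_portable(unsigned a, unsigned b, unsigned *sum)
{
    *sum = a + b;
    return *sum < a;
}

/* Signed: add as unsigned, then overflow happened iff the result's sign
   bit differs from the sign bits of both a and b. */
int saddo_portable(int a, int b, int *sum)
{
    unsigned ua = (unsigned)a, ub = (unsigned)b, ur = ua + ub;
    /* unsigned -> int conversion is implementation-defined when out of
       range; mainstream two's-complement targets simply wrap */
    *sum = (int)ur;
    return (int)(((ua ^ ur) & (ub ^ ur)) >> (sizeof(unsigned) * CHAR_BIT - 1));
}
```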
2018 Nov 30
2
(Question regarding the) incomplete "builtins library" of "Compiler-RT"
"Friedman, Eli" <efriedma at codeaurora.org> wrote: > On 11/30/2018 8:31 AM, Stefan Kanthak via llvm-dev wrote: >> Hi @ll, >> >> compiler-rt implements (for example) the MSVC (really Windows) >> specific routines compiler-rt/lib/builtins/i386/chkstk.S and >> compiler-rt/lib/builtins/x86_64/chkstk.S as __chkstk_ms() >> See
2018 Feb 06
3
What does a dead register mean?
Hi, My understanding of a "dead" register is a def that is never used. However, when I dump the MI after reg alloc on a simple program I see the following sequence: ADJCALLSTACKDOWN64 0, 0, 0, *implicit-def dead %rsp*, implicit-def dead %eflags, implicit-def dead %ssp, implicit %rsp, implicit %ssp CALL64pcrel32 @foo, <regmask %bh %bl %bp %bpl %bx %ebp %ebx %rbp %rbx %r12 %r13 %r14
2014 Mar 25
3
[LLVMdev] Getting the Debugging JIT-ed Code with GDB example to work
I'm trying to run the example described at: http://llvm.org/docs/DebuggingJITedCode.html I followed the sample command line session (below, with versions numbers for everything), but gdb doesn't stop at the breakpoints as described. Any idea what is wrong? Thanks, Zach zdevito at derp:~/terra/tests$ > ~/clang+llvm-3.4-x86_64-unknown-ubuntu12.04/bin/clang -cc1 -O0 -g >
2011 Dec 04
3
[LLVMdev] Implement implicit TLS on Windows - need advice
Hi! LLVM currently does not implement the implicit TLS model on Windows. This model is easy: - a thread local variable ends up in the .tls section - to access a thread local variable, you have to do (1) load pointer to thread local storage from TEB On x86_64, this is gs:0x58, on x86 it is fs:0x2C. (2) load pointer to thread local state. In general, the index is stored in variable
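The access steps above reduce to a short address computation. In this sketch, the TEB load and the module's TLS index variable are replaced by plain parameters, which are hypothetical stand-ins. The sketch also assumes the usual final step, adding the variable's offset within the `.tls` section, which the excerpt cuts off before describing.

```c
#include <stddef.h>

/* Implicit-TLS address computation with the OS pieces as parameters:
   tls_array - the pointer that step (1) loads from the TEB
               (gs:0x58 on x86_64, fs:0x2C on x86)
   tls_index - this module's index, loaded in step (2)
   offset    - assumed final step: the variable's offset within .tls */
void *tls_address(char **tls_array, unsigned tls_index, size_t offset)
{
    char *module_block = tls_array[tls_index]; /* this module's TLS block */
    return module_block + offset;              /* address of the variable */
}
```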
2011 Dec 06
0
[LLVMdev] Implement implicit TLS on Windows - need advice
On Sun, Dec 4, 2011 at 9:18 AM, Kai <kai at redstar.de> wrote: > Hi! > > LLVM currently does not implement the implicit TLS model on Windows. This > model is easy: > > - a thread local variable ends up in the .tls section > - to access a thread local variable, you have to do >  (1) load pointer to thread local storage from TEB >      On x86_64, this is gs:0x58, on
2012 Feb 27
0
[LLVMdev] Microsoft constructors implementation problem.
On Mon, Feb 27, 2012 at 3:42 AM, r4start <r4start at gmail.com> wrote: > Hi all. > > I am working on constructors implementation for MS ABI. Itanium ABI has > 2 constructor types - base & complete. MS ABI has only 1 type. > How it works I'll show on example. > class first { > public: >   virtual void g(){} > }; > > class second : public virtual first