thr3ads.net - similar to: "[LLVMdev] llc -O# / opt -O# differences"

Displaying 20 results from an estimated 3000 matches similar to: "[LLVMdev] llc -O# / opt -O# differences"

[LLVMdev] How can I compile a c source file to use SSE2 Data Movement Instructions?

2012 Jan 04

I wrote a small function and tested it under clang and gcc. File test.c: double X[100]; double Y[100]; double DA = 0.3; int f() { int i; for (i = 0; i < 100; i++) Y[i] = Y[i] - DA * X[i]; return 0; } clang -S -O3 -o test.s test.c -march=native -ccc-echo result: "D:/work/trunk/bin/Release/clang.exe" -cc1 -triple i686-pc-win32 -S -disable-free -disable-llvm-verifier

[Codegen bug in LLVM 3.8?] br following `fcmp une` is present in ll, absent in asm

2017 Mar 01

Hi, We seem to have found a bug in the LLVM 3.8 code generator. We are using MCJIT and have isolated working.ll and broken.ll after middle-end optimizations -- in the block merge128, notice that broken.ll has a fcmp une comparison to zero and a jump based on that branch: merge128: ; preds = %true71, %false72 %_rtB_724 = load %B_repro_T*, %B_repro_T**

[LLVMdev] XMM in X86 Backend

2010 Jun 07

Hi all, I am observing excessive use of xmm registers in the output assembly produced by the x86 backend. Basically, for code like this double test(double a, double b) { double c; c = 1.0 + sin (a + b*b); return c; } llc produced something like.... movsd 16(%ebp), %xmm0 mulsd %xmm0, %xmm0 addsd 8(%ebp), %xmm0 movsd %xmm0, (%esp) ....... fstpl

[RFC][InlineCost] Modeling JumpThreading (or similar) in inline cost model

2017 Aug 04

All, I'm working on an improvement to the inline cost model, but I'm unsure how to proceed. Let me begin by first describing the problem I'm trying to solve. Consider the following pseudo C code: typedef struct element { unsigned idx; } element_t; static inline unsigned char fn2 (element_t *dst_ptr, const element_t *a_ptr, const element_t *b_ptr,

[RFC][InlineCost] Modeling JumpThreading (or similar) in inline cost model

2017 Aug 07

On 8/7/2017 1:02 PM, Daniel Berlin wrote: > Can someone fill me in on the issue with the dominator tree, > precisely, during inlining? > We now have the capability of quickly keeping it up to date without > too much trouble (it may require pushing it through a bunch of places, > but the actual changes to do should be easy). If I'm not mistaken (which I very well could be

[RFC][InlineCost] Modeling JumpThreading (or similar) in inline cost model

2017 Aug 04

On 8/4/2017 2:06 PM, Daniel Berlin wrote: > A few notes: > I'm a bit surprised IPO copy/constant propagation doesn't get this > case, but i didn't look if the lattice supports variables. > In particular, in your example, given no other call sites, it should > eliminate the dead code. > (In a real program, it may require cloning). In the actual program

[LLVMdev] 2.6 JIT using wrong address for external functions

2009 Dec 07

I have an external function that was added with ExecutionEngine::addGlobalMapping... and the JIT is burning the wrong address into the emitted function. All of the addresses have 0xffffff8d00000000 added to them. Does this look familiar to anyone? I'm using 2.6 on Solaris10/x64 if it matters. This has been working for about two months and I can't readily figure out what I changed to break

[RFC][InlineCost] Modeling JumpThreading (or similar) in inline cost model

2017 Aug 07

Hi, Coincidentally I've been working to optimize this same case last week. I was struggling a bit to determine where to put this functionality and eventually went for the pragmatic approach of creating an experimental pass. Probably not the eventual solution, but it may provide some useful input to the discussion here. Basically, I experimented with a 'pre-inlining-transform' pass

[LLVMdev] 2.6 JIT using wrong address for external functions

2009 Dec 07

I had that problem too: http://llvm.org/bugs/show_bug.cgi?id=5116. To work around the problem, you can:
* Switch to the thread-unsafe lazy jit.
* Allocate your JIT code within 2GB of your text segment.
* Find a way to look up the external function with dlsym or maybe the ExecutionEngine's LazyFunctionCreator instead of addGlobalMapping.
* Upgrade to the top of the svn tree.
On Mon, Dec 7,

[RFC][llvm-mca] Adding binary support to llvm-mca.

2018 Nov 15

Introduction
-----------------
Currently llvm-mca only accepts assembly code as input. We would like to extend llvm-mca to support object files, allowing users to analyze the performance of binaries. The proposed changes (which involve both clang and llvm) optionally introduce an object file section, but this can be stripped out if desired. For the llvm-mca binary support feature to be useful, a

Use GPU in R with .Call

2012 Jul 21

Hi All, I am a newbie to GPU programming. I wonder if anyone can help me with using the GPU from .Call in R. Basically, I want to write a function that calculates the sum of two double-type vectors and implement it on the GPU. My final goal is to make such an implementation callable from R. (a) First, I wrote an R-C interface that handles the R object using .Call (saved as VecAdd_cuda.c

Finding caller-saved registers at a function call site

2016 Jun 27

Hi Sanjoy, I'm having trouble finding caller-saved registers using the RegMask operand you've mentioned. As an example, I've got a C function that looks like this: double recurse(int depth, double val) { if(depth < max_depth) return recurse(depth + 1, val * 1.2) + val; else return outer_func(val); } As a quick refresher, all "xmm" registers are considered

[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX

2013 Jul 19

(Changing subject line as diagnosis has changed) I'm attaching the compiled code that I've been getting, both with CodeGenOpt::Default and CodeGenOpt::None. The crash isn't occurring with CodeGenOpt::None, but that seems to be because ECX isn't being used - it still gets set to 0x7fffffff by one of the calls to 76719BA1. I notice that X86::SQRTPD[m|r] appear in

Finding caller-saved registers at a function call site

2016 Jun 27

Ah, I see -- the registers left out of the mask are considered clobbered. Got it! At a high level, I'm interested in finding the locations of all values that are live at a given call site. You can think of it like a debugger, e.g. gdb -- I'd like to be able to unwind the stack, frame by frame, and locate all the live values for each function invocation (i.e., where they are in a

[RFC][llvm-mca] Adding binary support to llvm-mca.

2018 Nov 21

Hi Andrea, Thanks for your input. On Wed, Nov 21, 2018 at 12:43:52PM +0000, Andrea Di Biagio wrote: [... snip ...] > About the suggested design: > I like the idea of being able to identify code regions using a numeric > identifier. > However, what happens if a code region spans through multiple basic blocks? The current patch does not take into consideration cases where the region

[LLVMdev] Suboptimal code due to excessive spilling

2012 Apr 05

I don't know much about this, but maybe -mllvm -unroll-count=1 can be used as a workaround? /Patrik Hägglund -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Brent Walker Sent: den 28 mars 2012 03:18 To: llvmdev Subject: [LLVMdev] Suboptimal code due to excessive spilling Hi, I have run into the following strange behavior

[LLVMdev] Suboptimal code due to excessive spilling

2012 Mar 28

Hi, I have run into the following strange behavior and wanted to ask for some advice. For the C program below, function sum() gets inlined in foo() but the code generated looks very suboptimal (the code is an extract from a larger program). Below I show the 32-bit x86 assembly as produced by the demo page on the llvm home page ("Output A"). As you can see from the assembly, after

[LLVMdev] Odd weak symbol thing on i386

2012 Jan 13

Hi, I'm compiling lldiv.c from the NetBSD standard library. It works on ARM, Mips, Microblaze, ppc, ppc64, and x86_64. On i386 a very strange thing happens. Here's the source: #include <stdlib.h> #define __weak_alias(sym) __attribute__ ((weak, alias (#sym))) lldiv_t lldiv(long long int num, long long int denom) __weak_alias(_lldiv); lldiv_t _lldiv(long long num, long

[LLVMdev] Odd weak symbol thing on i386

2012 Jan 13

On Fri, Jan 13, 2012 at 2:53 PM, Richard Pennington <rich at pennware.com> wrote: > Hi, > > I'm compiling lldiv.c from the NetBSD standard library. It works on ARM, Mips, > Microblaze,ppc, ppc64, and x86_64. On i386 a very strange thing happens. > Here's the source: > > #include <stdlib.h> > #define __weak_alias(sym) __attribute__ ((weak, alias

[RFC][llvm-mca] Adding binary support to llvm-mca.

2018 Nov 27

Thanks for clarifying it, Matt. In general, I quite like your suggested design. My only concern is about the semantics of the two new intrinsics. Your design doesn't allow mca ranges to span multiple basic blocks. That constraint is acceptable for now, since llvm-mca doesn't know how to deal with control flow. However, I am a bit concerned about what might happen in future if we
