thr3ads.net - similar to: "[LLVMdev] Target specific type modifications"

Displaying 20 results from an estimated 6000 matches similar to: "[LLVMdev] Target specific type modifications"

2010 Sep 29

[LLVMdev] spilling & xmm register usage

On Sep 29, 2010, at 8:35 AM PDT, Ralf Karrenberg wrote: > Hello everybody, > > I have stumbled upon a test case (the attached module is a slightly > reduced version) that shows extremely reduced performance on Linux > compared to Windows when executed using LLVM's JIT. > > We narrowed the problem down to the actual code being generated, the > source IR on both systems

2013 Nov 08

[LLVMdev] Loads moving across barriers

Hi, For a long time we've been having a problem we've been working around in OpenCL where loads are moving across an intrinsic used for a barrier. Attached is the testcase, and the result of opt -S -basicaa -gvn on it. This example is essentially this: void foo(global float2* result, local float2* restrict data0, ...) { int id = get_local_id(0); // ... data0[id] = ...;

2014 Feb 19

[LLVMdev] better code for IV

Hi Andrew, The issue below refers to LSR, so I'd appreciate your feedback. It also refers to instruction combining and might impact backends other than X86, so if you know of others who might be interested, you are more than welcome to add them. Thanks, Anat _____________________________________________ From: Shemer, Anat Sent: Tuesday, February 18, 2014 15:07 To: 'llvmdev at

2013 Aug 11

[LLVMdev] Address space extension

Hello Micah, I first apologize for the mail length, but I think that using an example would be better to clarify the case and the objections. > [Micah Villmow] In the case of OpenCL, you can't correctly use the standard C calling convention and still be OpenCL compliant, the C calling convention is too permissive. The second you use OpenCL, you are using an OpenCL specific calling

2013 Feb 04

[LLVMdev] Vectorizer using Instruction, not opcodes

The loop vectorizer does not estimate the cost of vectorization by looking at the IR you list below. It does not vectorize and then run the CostAnalysis pass. It estimates the cost itself before it even performs the vectorization. The way it works is that it looks at all the scalar instructions and asks: What is the cost if I execute the scalar instruction as a vector instruction? Therefore, it

2014 Sep 18

[LLVMdev] [Vectorization] Mis match in code generated

Hi, I am trying to understand LLVM vectorization implementation and was looking into both loop and SLP vectorization. Test case 1: int foo(int *a) { int sum = 0, i; for (i = 0; i < 16; i++) sum += a[i]; return sum; } This code is vectorized by loop vectorizer where we calculate scalar loop cost as 4 and vector loop cost as 2. Since vector loop cost is less and above reduction is legal to

2014 Sep 18

[LLVMdev] [Vectorization] Mis match in code generated

Hi Nadav, Thanks for the quick reply! OK, so as of now we lack the capability to handle flat large reductions. I did go through function vectorizeChainsInBlock() (line number 2862). In this function, we try to vectorize if we have phi nodes in the IR (several if's check for phi nodes), i.e., we try to construct a tree that starts at chains. Any pointers on how to join multiple trees? I

2016 Jan 11

Some llvm questions (for tgsi backend)

On Mon, Jan 11, 2016 at 6:07 AM, Hans de Goede <hdegoede at redhat.com> wrote: > Hi, > > After a few distractions I'm back to work on the llvm tgsi backend. I've > added clang integration and I can now compile a simple opencl program > to something which sort of looks like tgsi. > > You can find my latest work on this here: >

2016 Jan 11

Some llvm questions (for tgsi backend)

On Mon, Jan 11, 2016 at 12:07:14PM +0100, Hans de Goede wrote: > Hi, > > After a few distractions I'm back to work on the llvm tgsi backend. I've > added clang integration and I can now compile a simple opencl program > to something which sort of looks like tgsi. > > You can find my latest work on this here: > http://cgit.freedesktop.org/~jwrdegoede/llvm >

2011 Jan 04

[LLVMdev] Bug in MachineInstr::isIdenticalTo

On Jan 4, 2011, at 11:08 AM, Villmow, Micah wrote: > I have run across a case where the function isIdenticalTo returns true for instructions that are not equivalent. The instructions in question are load/store instructions, and this is causing a problem with MachineBranchFolding. The problem is this: I have two branches of a switch statement that are identical, except for the size of the store.

2017 Mar 02

Structurizing multi-exit regions

Hi, I'm trying to solve a problem from StructurizeCFG not actually handling regions with multiple exits. Sample IR attached. StructurizeCFG doesn't touch this function, exiting early on the isTopLevelRegion check. SIAnnotateControlFlow then gets confused and ends up inserting an if into one of the blocks, and the matching end.cf into one of the return/unreachable blocks. The input to

2011 Jan 04

[LLVMdev] Bug in MachineInstr::isIdenticalTo

I have run across a case where the function isIdenticalTo returns true for instructions that are not equivalent. The instructions in question are load/store instructions, and this is causing a problem with MachineBranchFolding. The problem is this: I have two branches of a switch statement that are identical, except for the size of the store. Here is some cut-down LLVM-IR to showcase the issue:

2016 Mar 05

[AMDGPU] non-hsa intrinsic with hsa target

Dear Developers, I compiled an OpenCL kernel before (in November last year) like __kernel void g(__global float* array) { array[get_global_id(0)] = 1; } with libclc, which would originally use intrinsics like llvm.r600.read.local.size.x(). I executed the generated object file with one version of the hsa-runtime [1] provided by Mr. Stellard; when there was more than one workgroup, the output

2016 Jan 12

Some llvm questions (for tgsi backend)

Hi Tom, Thanks for taking the time to answer this. On 11-01-16 18:10, Tom Stellard wrote: > On Mon, Jan 11, 2016 at 12:07:14PM +0100, Hans de Goede wrote: >> Hi, >> >> After a few distractions I'm back to work on the llvm tgsi backend. I've >> added clang integration and I can now compile a simple opencl program >> to something which sort of looks like

2014 Nov 10

[LLVMdev] [Vectorization] Mis match in code generated

Hi Suyog, Thanks for looking at this. This has recently got itself onto my TODO list too. > I am not sure how much all this will improve the code quality for horizontal reduction > (donno how frequently such pattern of horizontal reduction from same array occurs in real world/SPECS). Actually the main loop of 470.lbm can be SLP vectorized like this. We have three parts to it: A fully

2014 Sep 19

[LLVMdev] [Vectorization] Mis match in code generated

Hi Arnold, Thanks for your reply. I tried the test case you suggested: void foo(int *a, int *sum) { *sum = a[0]+a[1]+a[2]+a[3]+a[4]+a[5]+a[6]+a[7]+a[8]+a[9]+a[10]+a[11]+a[12]+a[13]+a[14]+a[15]; } so that it has a 'store' in its IR. IR before vectorization: target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128" target triple =

2016 Mar 05

[AMDGPU] non-hsa intrinsic with hsa target

Hi Mr. Liu, Thanks for your quick reply. I compiled the code with the libclc_trunk and linked the bitcode file under $LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc. After looking into the libclc, it is currently using the new workitem intrinsics (commit ba9858caa1e927a6fcc601e3466faa693835db5e). In the linked bitcode ($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc), it has the following code segment,

2016 Jan 11

Some llvm questions (for tgsi backend)

Hi, After a few distractions I'm back to work on the llvm tgsi backend. I've added clang integration and I can now compile a simple opencl program to something which sort of looks like tgsi. You can find my latest work on this here: http://cgit.freedesktop.org/~jwrdegoede/llvm http://cgit.freedesktop.org/~jwrdegoede/clang (the latter may still need to sync) I've a little test

2008 Oct 06

[LLVMdev] Address calculation

I am attempting to get indexing code generation working with my backend. However, it seems that the addresses being calculated are being multiplied by the width of the data type. define void @test_input_index_constant_int(i32 addrspace(11)* %input, i32 addrspace(11)* %result) { entry: %input.addr = alloca i32 addrspace(11)* ; <i32 addrspace(11)**> [#uses=2]

2013 Dec 31

[LLVMdev] [Patch][RFC] Change R600 data layout

Hi, I've prepared patches for both LLVM and Clang to change the datalayout for R600. This may seem like a bold move, but I think it is warranted. R600/SI is a strange architecture in that it uses 64-bit pointers but does not support 64-bit arithmetic except for load/store operations that roughly map onto getelementptr. The current datalayout for r600 includes n32:64, which is odd
