Displaying 20 results from an estimated 6000 matches similar to: "[LLVMdev] Target specific type modifications"
2010 Sep 29
0
[LLVMdev] spilling & xmm register usage
On Sep 29, 2010, at 8:35 AM PDT, Ralf Karrenberg wrote:
> Hello everybody,
>
> I have stumbled upon a test case (the attached module is a slightly
> reduced version) that shows extremely reduced performance on linux
> compared to windows when executed using LLVM's JIT.
>
> We narrowed the problem down to the actual code being generated, the
> source IR on both systems
2013 Nov 08
3
[LLVMdev] Loads moving across barriers
Hi,
For a long time we have been working around a problem in OpenCL where
loads move across an intrinsic used as a barrier.
Attached is the testcase, and the result of opt -S -basicaa -gvn on it.
This example is essentially this:
void foo(global float2* result, local float2* restrict data0, ...)
{
    int id = get_local_id(0);
    // ...
    data0[id] = ...;
2014 Feb 19
2
[LLVMdev] better code for IV
Hi Andrew,
The issue below refers to LSR, so I would appreciate your feedback. It also refers to instruction combining and might impact backends other than X86, so if you know of others who might be interested, you are more than welcome to add them.
Thanks, Anat
_____________________________________________
From: Shemer, Anat
Sent: Tuesday, February 18, 2014 15:07
To: 'llvmdev at
2013 Aug 11
0
[LLVMdev] Address space extension
Hello Micah,
I first apologize for the mail length, but I think that using an example would
be better to clarify the case and the objections.
> [Micah Villmow] In the case of OpenCL, you can't correctly use the standard C calling convention and still be OpenCL compliant; the C calling convention is too permissive. The second you use OpenCL, you are using an OpenCL-specific calling
2013 Feb 04
0
[LLVMdev] Vectorizer using Instruction, not opcodes
The loop vectorizer does not estimate the cost of vectorization by looking at the IR you list below. It does not vectorize and then run the CostAnalysis pass. It estimates the cost itself before it even performs the vectorization.
The way it works is that it looks at all the scalar instructions and asks: what is the cost if I execute the scalar instruction as a vector instruction? Therefore, it
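The cost estimate described in this excerpt can be illustrated with a toy model. This is only a sketch under stated assumptions, not LLVM's implementation: the opcode names, the cost tables, and the helpers `scalar_loop_cost`, `vector_loop_cost_per_lane`, and `pick_vf` are all hypothetical, and the real numbers come from the target's cost model.

```python
# Toy model of "ask each scalar instruction what it would cost as a vector
# instruction". All names and numbers below are hypothetical.

# Assumed cost of executing one scalar instruction of each kind.
SCALAR_COST = {"load": 1, "add": 1, "store": 1}

# Assumed cost of one vector instruction of each kind (covers `vf` lanes).
VECTOR_COST = {"load": 1, "add": 1, "store": 1}


def scalar_loop_cost(opcodes):
    """Sum the scalar cost of every instruction in the loop body."""
    return sum(SCALAR_COST[op] for op in opcodes)


def vector_loop_cost_per_lane(opcodes, vf):
    """Sum the vector cost of every instruction, normalised per scalar iteration."""
    return sum(VECTOR_COST[op] for op in opcodes) / vf


def pick_vf(opcodes, candidates=(2, 4, 8)):
    """Return (vf, cost) with the lowest per-iteration cost; vf == 1 means stay scalar."""
    best_vf, best_cost = 1, float(scalar_loop_cost(opcodes))
    for vf in candidates:
        cost = vector_loop_cost_per_lane(opcodes, vf)
        if cost < best_cost:
            best_vf, best_cost = vf, cost
    return best_vf, best_cost


if __name__ == "__main__":
    body = ["load", "add"]
    print(scalar_loop_cost(body), pick_vf(body))
```

The point of the sketch is only that the estimate is made from the scalar instructions, before any vector IR exists.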
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi,
I am trying to understand LLVM vectorization implementation and was looking
into both loop and SLP vectorization.
test case 1:
int foo(int *a) {
    int sum = 0, i;
    for (i = 0; i < 16; i++)
        sum += a[i];
    return sum;
}
This code is vectorized by loop vectorizer where we calculate scalar loop
cost as 4 and vector loop cost as 2.
Since vector loop cost is less and above reduction is legal to
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Nadav,
Thanks for the quick reply !!
Ok, so as of now we are lacking capability to handle flat large reductions.
I did go through function vectorizeChainsInBlock() (line number 2862). In
this function,
we try to vectorize if we have phi nodes in the IR (several if's check for
phi nodes) i.e we try to
construct tree that starts at chains.
Any pointers on how to join multiple trees? I
2016 Jan 11
0
Some llvm questions (for tgsi backend)
On Mon, Jan 11, 2016 at 6:07 AM, Hans de Goede <hdegoede at redhat.com> wrote:
> Hi,
>
> After a few distractions I'm back to work on the llvm tgsi backend. I've
> added clang integration and I can now compile a simple opencl program
> to something which sort of looks like tgsi.
>
> You can find my latest work on this here:
>
2016 Jan 11
0
Some llvm questions (for tgsi backend)
On Mon, Jan 11, 2016 at 12:07:14PM +0100, Hans de Goede wrote:
> Hi,
>
> After a few distractions I'm back to work on the llvm tgsi backend. I've
> added clang integration and I can now compile a simple opencl program
> to something which sort of looks like tgsi.
>
> You can find my latest work on this here:
> http://cgit.freedesktop.org/~jwrdegoede/llvm
>
2011 Jan 04
0
[LLVMdev] Bug in MachineInstr::isIdenticalTo
On Jan 4, 2011, at 11:08 AM, Villmow, Micah wrote:
> I have run across a case where the function isIdenticalTo returns true for instructions that are not equivalent. The instructions in question are load/store instructions, and this is causing a problem with MachineBranchFolding. The problem is this: I have two branches of a switch statement that are identical, except for the size of the store.
2017 Mar 02
5
Structurizing multi-exit regions
Hi,
I'm trying to solve a problem from StructurizeCFG not actually handling
regions with multiple exits. Sample IR attached.
StructurizeCFG doesn't touch this function, exiting early on the
isTopLevelRegion check. SIAnnotateControlFlow then gets confused and
ends up inserting an if into one of the blocks, and the matching end.cf
into one of the return/unreachable blocks. The input to
2011 Jan 04
4
[LLVMdev] Bug in MachineInstr::isIdenticalTo
I have run across a case where the function isIdenticalTo returns true for instructions that are not equivalent. The instructions in question are load/store instructions, and this is causing a problem with MachineBranchFolding. The problem is this: I have two branches of a switch statement that are identical, except for the size of the store. Here is some cut-down LLVM-IR to showcase the issue:
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
Dear Developers,
I compiled an OpenCL kernel before (in November of last year) like
__kernel void g(__global float* array)
{
    array[get_global_id(0)] = 1;
}
with libclc, which would originally use intrinsics like
llvm.r600.read.local.size.x().
I executed the generated object file with one version of the hsa-runtime
[1] provided by Mr. Stellard. When there was more than one workgroup, the
output
2016 Jan 12
1
Some llvm questions (for tgsi backend)
Hi Tom,
Thanks for taking the time to answer this.
On 11-01-16 18:10, Tom Stellard wrote:
> On Mon, Jan 11, 2016 at 12:07:14PM +0100, Hans de Goede wrote:
>> Hi,
>>
>> After a few distractions I'm back to work on the llvm tgsi backend. I've
>> added clang integration and I can now compile a simple opencl program
>> to something which sort of looks like
2014 Nov 10
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Suyog,
Thanks for looking at this. This has recently got itself onto my TODO list
too.
> I am not sure how much all this will improve the code quality for
> horizontal reduction
> (donno how frequently such pattern of horizontal reduction from same
> array occurs in real world/SPECS).
Actually the main loop of 470.lbm can be SLP vectorized like this. We have
three parts to it: A fully
2014 Sep 19
3
[LLVMdev] [Vectorization] Mis match in code generated
Hi Arnold,
Thanks for your reply.
I tried test case as suggested by you.
void foo(int *a, int *sum) {
    *sum = a[0]+a[1]+a[2]+a[3]+a[4]+a[5]+a[6]+a[7]+a[8]+a[9]+a[10]+a[11]+a[12]+a[13]+a[14]+a[15];
}
so that it has a 'store' in its IR.
IR before vectorization:
target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
target triple =
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
Hi Mr. Liu,
Thanks for your quick reply.
I compiled the code with the libclc_trunk and linked the bitcode file under
$LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc. After looking into the libclc,
it is currently using the new workitem intrinsics
(commit ba9858caa1e927a6fcc601e3466faa693835db5e). In the linked bitcode
($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc), it has the following code
segment,
2016 Jan 11
4
Some llvm questions (for tgsi backend)
Hi,
After a few distractions I'm back to work on the llvm tgsi backend. I've
added clang integration and I can now compile a simple opencl program
to something which sort of looks like tgsi.
You can find my latest work on this here:
http://cgit.freedesktop.org/~jwrdegoede/llvm
http://cgit.freedesktop.org/~jwrdegoede/clang
(the latter may still need to sync)
I've a little test
2008 Oct 06
3
[LLVMdev] Address calculation
I am attempting to get indexing code generation working with my backend.
However, it seems that the address being calculated is being
multiplied by the width of the data type.
define void @test_input_index_constant_int(i32 addrspace(11)* %input,
                                           i32 addrspace(11)* %result) {
entry:
  %input.addr = alloca i32 addrspace(11)* ; <i32 addrspace(11)**> [#uses=2]
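
The multiplication described in this excerpt matches how getelementptr indexing is defined: an index counts elements, so the byte offset is the index times the element's size. A minimal sketch of that arithmetic follows, assuming a byte-addressed target and a 4-byte i32; the helper names `gep_byte_offset` and `element_address` are hypothetical and not part of any LLVM API.

```python
# Sketch of element-index to byte-address scaling, assuming byte addressing.
# Helper names are hypothetical; they only illustrate the arithmetic.

I32_SIZE_BYTES = 4  # assumed allocation size of i32 on the target


def gep_byte_offset(index, elem_size_bytes):
    """Byte offset of element `index` in an array of fixed-size elements."""
    return index * elem_size_bytes


def element_address(base, index, elem_size_bytes=I32_SIZE_BYTES):
    """Address of element `index`, starting from byte address `base`."""
    return base + gep_byte_offset(index, elem_size_bytes)


if __name__ == "__main__":
    # Element 3 of an i32 array at 0x1000 sits 12 bytes past the base.
    print(hex(element_address(0x1000, 3)))
```

On a target whose memory is addressed in units other than bytes, the scale factor would have to be adjusted by the backend; whether that applies to the backend in this thread is not stated in the excerpt.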
2013 Dec 31
4
[LLVMdev] [Patch][RFC] Change R600 data layout
Hi,
I've prepared patches for both LLVM and Clang to change the
datalayout for R600. This may seem like a bold move, but I think it is
warranted. R600/SI is a strange architecture in that it uses 64-bit
pointers but does not support 64-bit arithmetic except for load/store
operations that roughly map onto getelementptr.
The current datalayout for r600 includes n32:64, which is odd