thr3ads.net - similar to: "[LLVMdev] 16bit loads being promoted to 32bit?"

Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] 16bit loads being promoted to 32bit?"

[LLVMdev] 16bit loads being promoted to 32bit?

2009 Feb 13

On Thu, Feb 12, 2009 at 4:53 PM, Villmow, Micah <Micah.Villmow at amd.com> wrote: > The > problem that I am having is somewhere along the line the 16bit load is being > promoted to a 32bit load For the given testcase, that's clearly illegal. Either there's a serious bug in LLVM, or you're misinterpreting the meaning of the DAG. Are you sure you aren't seeing a

[LLVMdev] ReduceLoadWidth, DAGCombiner and non 8bit loads/extloads question.

2015 Mar 03

I'm curious about this code in ReduceLoadWidth (and in DAGCombiner in general): if (LegalOperations && !TLI.isLoadExtLegal(ExtType, ExtVT)) return SDValue(); LegalOperations is false for the first pre-legalize pass and true for the post-legalize pass. The first pass is target-independent, yes? So that makes sense.

[LLVMdev] spilling & xmm register usage

2010 Sep 29

On Sep 29, 2010, at 8:35 AM PDT, Ralf Karrenberg wrote: > Hello everybody, > > I have stumbled upon a test case (the attached module is a slightly > reduced version) that shows extremely reduced performance on linux > compared to windows when executed using LLVM's JIT. > > We narrowed the problem down to the actual code being generated, the > source IR on both systems

Structurizing multi-exit regions

2017 Mar 02

Hi, I'm trying to solve a problem from StructurizeCFG not actually handling regions with multiple exits. Sample IR attached. StructurizeCFG doesn't touch this function, exiting early on the isTopLevelRegion check. SIAnnotateControlFlow then gets confused and ends up inserting an if into one of the blocks, and the matching end.cf into one of the return/unreachable blocks. The input to

[LLVMdev] Loads moving across barriers

2013 Nov 08

Hi, For a long time we've been having a problem we've been working around in OpenCL where loads are moving across an intrinsic used for a barrier. Attached is the testcase, and the result of opt -S -basicaa -gvn on it. This example is essentially this: void foo(global float2* result, local float2* restrict data0, ...) { int id = get_local_id(0); // ... data0[id] = ...;

[LLVMdev] ReduceLoadWidth, DAGCombiner and non 8bit loads/extloads question.

2015 Mar 03

1) It's crashing because LD1 is produced due to LegalOperations=false in pre-legalize pass. Then Legalization does not know how to handle it so it asserts on a default case. I don't know if it's a reasonable expectation or not but we do not have support for it. I have not tried overriding shouldReduceLoadWidth. 2) I see, that makes sense to some degree, I'm curious if you can

[LLVMdev] Some questions on SelectionDAG

2011 Sep 02

Hi all, I am studying the ARM backend on SelectionDAG, and I have the following questions: 1. Each operator of SDNode in SelectionDAG is required to be defined by SDNode<ISD::XXX,XXX,XXX> in .td file, right? But several operators are not defined in .td file, why? (e.g., ISD::BR_CC, ISD::CopyToReg, ISD::AssertSext) 2. The MVT::glue value is used to ensure two nodes are scheduled

[LLVMdev] Custom Lowering of ARM zero-extending loads

2013 Mar 04

Hi, For my research, I need to reshape the current ARM backend to support armv2a. The zero-extending halfword load (ldrh) is not supported by armv2a, so I need to stop code generation from emitting ldrh instructions. I want to replace all those instances with a 32-bit load (ldr) and then AND the result with 0xffff to mask out the upper bits. These are the modifications that I have made to
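The load-then-mask idea in this snippet can be sketched in plain C++, outside of any LLVM lowering code. This is only an illustration under stated assumptions: `loadWordLE` and `loadHalfZext` are hypothetical helper names, the memory is treated as little-endian, the halfword address is assumed to be 2-byte aligned, and the 32-bit load is taken from the enclosing 4-byte-aligned word (so a shift is needed before the AND with 0xffff when the halfword sits in the upper half). It is not the poster's actual patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 32-bit load (ldr): assembles a little-endian
// word from four bytes so the result does not depend on host endianness.
std::uint32_t loadWordLE(const unsigned char *p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: emulates a zero-extending 16-bit load using only a
// 32-bit load, a shift, and an AND with 0xffff.
std::uint32_t loadHalfZext(const unsigned char *base, std::size_t byteOffset) {
  // Assumption: the halfword is 2-byte aligned.
  assert((byteOffset & 1) == 0);
  // Round down to the 4-byte-aligned word that contains the halfword.
  std::size_t wordOffset = byteOffset & ~static_cast<std::size_t>(3);
  std::uint32_t word = loadWordLE(base + wordOffset);
  // Shift by 0 or 16 bits depending on which half of the word is wanted.
  unsigned shift = static_cast<unsigned>(byteOffset & 2) * 8;
  // Mask out the upper bits, as described in the post.
  return (word >> shift) & 0xffffu;
}
```

If the halfword address happens to be 4-byte aligned, the shift is zero and the sequence reduces to exactly the ldr plus AND with 0xffff described above.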

BitcodeReader non explicit error

2016 May 24

Hi, I'm working on OpenCL and I'm using clang as the compiler (based on clang 3.7.0). I have an issue: I'm generating a bitcode file (that I can print before the generation). But when I try to read it again with clang, I get this error: "error: Invalid record" How can I find out where it comes from? Thank you, Romaric Here is what is printed before the

[LLVMdev] Question on equivalence of pointer types

2014 Dec 05

Is copy.0 semantically equivalent to copy.1 in the following example? define void @copy.0(i8 addrspace(1)* addrspace(1)* %src, i8 addrspace(1)* addrspace(1)* %dst) { entry: %val = load i8 addrspace(1)* addrspace(1)* %src store i8 addrspace(1)* %val, i8 addrspace(1)* addrspace(1)* %dst ret void } define void @copy.1(i8 addrspace(1)* addrspace(1)* %src, i8 addrspace(1)* addrspace(1)* %dst)

[LLVMdev] Question on equivalence of pointer types

2014 Dec 09

> On Dec 8, 2014, at 5:12 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: > > Partially answering my own question, in general these are not > equivalent because LLVM allows for pointers in different address > spaces to have different sizes. However, are they equivalent if > pointers in addrspace(1) have the same size as pointers in > addrspace(0)? > >

RFC: Representing unions in TBAA

2017 Feb 13

Hello all, I'm new to the llvm community. I'm learning how things work. I noticed that there has been some interest in improving how unions are handled. Bug 21725 is one example. I figured it might be an interesting place to start. I discussed this with a couple of people, and below is a suggestion on how to represent unions. I would like some comments on how this fits in with how

[LLVMdev] Address calculation

2008 Oct 06

I am attempting to get indexing code generation working with my backend. However, it seems that the address being calculated is being multiplied by the width of the data type. define void @test_input_index_constant_int(i32 addrspace(11)* %input, i32 addrspace(11)* %result) { entry: %input.addr = alloca i32 addrspace(11)* ; <i32 addrspace(11)**> [#uses=2]
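As background for the scaling this snippet describes: indexing through a typed pointer is element-wise, so the byte offset is the index times the element size. The sketch below shows this with ordinary C++ pointer arithmetic, which follows the same rule. The helper names `elementByteOffset` and `measuredByteOffset` are hypothetical and used only for illustration, and nothing here reproduces the poster's backend.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the byte offset that element-wise indexing implies,
// i.e. index scaled by the width of the element type.
constexpr std::size_t elementByteOffset(std::size_t index,
                                        std::size_t elemSize) {
  return index * elemSize;
}

// Hypothetical helper: the byte distance between &base[0] and &base[index],
// measured directly. base + index must stay within the array (or one past).
std::size_t measuredByteOffset(const std::int32_t *base, std::size_t index) {
  return static_cast<std::size_t>(
      reinterpret_cast<const char *>(base + index) -
      reinterpret_cast<const char *>(base));
}
```

A target whose memory is addressed in units other than bytes would need to undo or adjust this scaling during lowering. Whether that applies to the poster's backend is not stated in the snippet.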

[LLVMdev] Weird problems with cos (was Re: [PATCH v3 2/3] R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO)

2014 Oct 03

Hi Tom, Matt, I'm running into strange issues with the cos test (piglit generated_tests/cl/builtin/math/builtin-float-cos-1.0.generated.c). I have been seeing random failures (incorrect results) for some time and tried to investigate. The weird part is that the failures are not 100% reproducible; sometimes the tests pass, or partly pass (it's usually float8 and float16 subtests that

[LLVMdev] SCEV getMulExpr() not propagating Wrap flags

2013 Nov 13

Hi folks, I'm trying to analyse this piece of IR: for.body: ; preds = %for.body, %entry %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] %0 = shl nsw i64 %indvars.iv, 1 %arrayidx = getelementptr inbounds i32* %b, i64 %0 %1 = load i32* %arrayidx, align 4, !tbaa !1 %add = add nsw i32 %1, %I %arrayidx3 = getelementptr

Optimisation passes introducing address space casts

2017 Jan 03

OK, I’ve hit one more existing regression test that I’m wary of: define void @test2_addrspacecast() { %A = alloca %T %B = alloca %T %a = addrspacecast %T* %A to i8 addrspace(1)* %b = addrspacecast %T* %B to i8 addrspace(1)* call void @llvm.memcpy.p1i8.p0i8.i64(i8 addrspace(1)* %a, i8* bitcast (%T* @G to i8*), i64 124, i32 4, i1 false) call void

RFC: atomic operations on SI+

2016 Mar 28

On Fri, Mar 25, 2016 at 02:22:11PM -0400, Jan Vesely wrote: > Hi Tom, Matt, > > I'm working on a project that needs few coherent atomic operations (HSA > mode: load, store, compare-and-swap) for std::atomic_uint in HCC. > > the attached patch implements atomic compare and swap for SI+ > (untested). I tried to stay within what was available, but there are > few issues

byval argument causes llvm to crash after inlining

2018 Sep 25

Hello, With the following reduced test case, cmd "opt -always-inline t.ll" crashes after inlining. Notice that byval argument %a will be remapped to %1 below, which consequently produces an illegal store. %1 = alloca i32, align 4 store i32 * %1, i32 addrspace(1)** %a.addr, align 8 It looks like the Inliner assumes that byval arguments are from address space 0. Or is this just a bug in the inliner?

byval argument causes llvm to crash after inlining

2018 Sep 25

It is problematic when byval argument is not from address space 0. When the default alloca address space is 0, should we consider this IR illegal? define internal i32 @bar(i32 addrspace(1)* byval %a) alwaysinline From: Reid Kleckner [mailto:rnk at google.com] Sent: Tuesday, September 25, 2018 2:38 PM To: Pan, Wei <wei.pan at intel.com> Cc: llvm-dev <llvm-dev at lists.llvm.org>

Using store with operands in non-zero address space

2019 Jun 21

Hello, LLVM devs. I have the following IR: %x = alloca i32, align 4 %p = alloca i32*, align 8 store i32* %x, i32** %p, align 8 Now I change module's data layout and run InferAddressSpacePass. This turns that piece of code into %x = alloca i32, align 4, addrspace(1) %p = alloca i32*, align 8, addrspace(1) store i32 addrspace(1)* %x, i32* addrspace(1)* %p, align 8 But Verifier