thr3ads.net - search: "iadds"

Displaying 20 results from an estimated 29 matches for "iadds".

[LLVMdev] IndVar widening in IndVarSimplify causing performance regression on GPU programs

2014 Oct 24

Hi, I noticed a significant performance regression (up to 40%) on some internal CUDA benchmarks (a reduced example presented below). The root cause of this regression seems to be that IndVarSimplify widens induction variables, assuming that arithmetic on wider integer types is as cheap as arithmetic on narrower ones. However, this assumption is wrong, at least for the NVPTX64 target. Although the NVPTX64 target
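A minimal C sketch of the two loop shapes involved may help; the function names are hypothetical and this is not the reduced CUDA example from the post. sum_narrow keeps a 32-bit induction variable and sign-extends it at each array access, while sum_widened is roughly what widening produces: a 64-bit induction variable with no per-iteration extension. Both return the same value. The cost difference only appears on targets where 64-bit integer adds and compares are more expensive than 32-bit ones; on NVIDIA GPUs a 64-bit integer add is typically carried out as a pair of 32-bit operations with carry.

```c
#include <stdint.h>

/* Narrow form (hypothetical name): i stays 32-bit, and each access
   sign-extends it to 64 bits for the address computation. */
int64_t sum_narrow(const int32_t *a, int32_t n) {
    int64_t s = 0;
    for (int32_t i = 0; i < n; ++i)
        s += a[(int64_t)i]; /* the cast mirrors the sext the IR needs */
    return s;
}

/* Widened form (hypothetical name): the induction variable itself is 64-bit,
   so no per-iteration sign extension is needed, but every increment and
   compare of i is now a 64-bit operation. */
int64_t sum_widened(const int32_t *a, int32_t n) {
    int64_t s = 0;
    for (int64_t i = 0; i < (int64_t)n; ++i)
        s += a[i];
    return s;
}
```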

[LLVMdev] type legalization/operation action

2015 Feb 05

Dear all, I have a target that supports 32-bit operations natively. Now I want to make it support 16-bit operations as well. My initial thought is: (1) I can add something like “CCIfType<[i16], CCPromoteToType<i32>>” to CallingConv.td; then “all” the 16-bit operands will be automatically promoted to 32 bits and it will be all set. But it looks like it is not
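The arithmetic side of promotion can be checked in plain C. For add, sub, mul and the bitwise ops, the low 16 bits of a 32-bit result depend only on the low 16 bits of the inputs, so "extend, operate in 32 bits, truncate" is exact. Division (and likewise right shifts and comparisons) additionally needs the extension to match the signedness of the operation. Note that a CallingConv.td rule such as CCPromoteToType only governs how arguments and return values are passed; the operations themselves still go through type legalization. The helper names below are hypothetical.

```c
#include <stdint.h>

/* 16-bit add emulated with a 32-bit add: zero-extend, add, truncate.
   Truncation discards the carry out of bit 15, matching i16 wraparound. */
uint16_t add16_via32(uint16_t a, uint16_t b) {
    uint32_t wide = (uint32_t)a + (uint32_t)b;
    return (uint16_t)wide;
}

/* Unsigned 16-bit division: the inputs must be ZERO-extended. */
uint16_t udiv16_via32(uint16_t a, uint16_t b) {
    return (uint16_t)((uint32_t)a / (uint32_t)b);
}

/* Signed 16-bit division: the inputs must be SIGN-extended instead.
   Zero-extending 0xFFFF (-1) would give 65535 and a wrong quotient. */
int16_t sdiv16_via32(int16_t a, int16_t b) {
    return (int16_t)((int32_t)a / (int32_t)b);
}
```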

[LLVMdev] why llvm does not have uadd, iadd node

2015 Feb 17

So if overflow happens in either case, will the return value be implementation-dependent? best kevin On Feb 17, 2015, at 2:01 PM, Tim Northover <t.p.northover at gmail.com> wrote: > Hi Kevin, > > On 17 February 2015 at 10:41, kewuzhang <kewu.zhang at amd.com> wrote: >> I just noticed that the LLVM has some node for signed/unsigned type( like udiv,
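As far as LLVM IR is concerned, a plain add wraps modulo 2^n; the nsw/nuw flags let the frontend promise that no signed/unsigned overflow occurs, and if that promise is broken the result is poison. When a program actually needs to know whether overflow happened, LLVM provides the llvm.sadd.with.overflow and llvm.uadd.with.overflow intrinsics. The C sketch below uses the GCC/Clang __builtin_add_overflow extension, which Clang typically lowers to those intrinsics; the helper names are hypothetical. It shows that the wrapped sum is the same bits either way, while the overflow check differs by signedness.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrapping add on uint32_t (unsigned overflow is well defined in C).
   The resulting bits are the same whether a and b are read as signed or not. */
uint32_t add_wrap(uint32_t a, uint32_t b) { return a + b; }

/* Overflow depends on how the bits are interpreted, so the checks differ
   even though the sum does not. __builtin_add_overflow (GCC/Clang extension)
   stores the wrapped result in r and returns true if overflow occurred. */
bool sadd_overflows(int32_t a, int32_t b) {
    int32_t r;
    return __builtin_add_overflow(a, b, &r);
}

bool uadd_overflows(uint32_t a, uint32_t b) {
    uint32_t r;
    return __builtin_add_overflow(a, b, &r);
}
```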

[LLVMdev] why llvm does not have uadd, iadd node

2015 Feb 17

Hi guys, I just noticed that LLVM has separate nodes for signed/unsigned types (like udiv, sdiv), but why do ADD and SUB not have counterparts such as sadd, uadd? best kevin
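The reason add and sub need no signed/unsigned split is that, in two's complement, the result bits of an addition are the same whichever way the operands are interpreted. Operations whose result does depend on signedness get separate forms in LLVM: udiv/sdiv, urem/srem, lshr/ashr, and the signed vs. unsigned icmp predicates. A small C check, with hypothetical helper names; the conversions to int32_t assume the modulo behaviour that GCC and Clang implement:

```c
#include <stdint.h>

/* Add the same 32-bit patterns as unsigned, and as signed (computed in a
   wider type to avoid signed-overflow UB), then compare the low 32 bits. */
int add_bits_agree(uint32_t a, uint32_t b) {
    uint32_t u = a + b;
    int64_t s = (int64_t)(int32_t)a + (int64_t)(int32_t)b;
    return u == (uint32_t)s;
}

/* Division does depend on signedness: 0xFFFFFFFF is 4294967295 as unsigned
   but -1 as signed, so the quotients differ. Hence udiv vs. sdiv. */
uint32_t udiv_bits(uint32_t a, uint32_t b) { return a / b; }

uint32_t sdiv_bits(uint32_t a, uint32_t b) {
    return (uint32_t)((int32_t)a / (int32_t)b);
}
```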

[LLVMdev] Using patterns inside patterns

2008 Oct 30

I do not have access to a subtraction routine, as it is considered an add with negation on the second parameter, so I have this pattern: // integer subtraction // a - b ==> a + (-b) def ISUB : Pat<(sub GPRI32:$src0, GPRI32:$src1), (IADD GPRI32:$src0, (INEGATE GPRI32:$src1))>; I am attempting to do 64-bit integer shifts and am using the following pattern: def LSHL :
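The arithmetic behind both patterns can be sketched in C for a target with only 32-bit registers (helper names hypothetical; this is not the poster's TableGen). Subtraction uses the two's-complement identity a - b = a + (~b + 1). A 64-bit left shift operates on a (hi, lo) pair of 32-bit halves, with amounts below 32, at least 32, and exactly 0 handled separately; the zero case avoids a shift by 32, which is undefined in C. LLVM's legalizer can also expand i64 shifts for such targets itself, via nodes such as ISD::SHL_PARTS, so a hand-written pattern may not be needed.

```c
#include <stdint.h>

/* a - b computed as a + (-b), where -b is ~b + 1 in two's complement. */
uint32_t sub_via_add(uint32_t a, uint32_t b) { return a + (~b + 1u); }

/* A 64-bit value held as two 32-bit halves. */
typedef struct { uint32_t lo, hi; } u64pair;

/* 64-bit left shift built from 32-bit shifts; s is taken modulo 64. */
u64pair shl64(u64pair v, unsigned s) {
    u64pair r;
    s &= 63;
    if (s == 0) {
        r = v;                                   /* avoid v.lo >> 32 (UB) */
    } else if (s < 32) {
        r.hi = (v.hi << s) | (v.lo >> (32 - s)); /* bits carried out of lo */
        r.lo = v.lo << s;
    } else {
        r.hi = v.lo << (s - 32);                 /* lo moves entirely into hi */
        r.lo = 0;
    }
    return r;
}
```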

[LLVMdev] Using patterns inside patterns

2008 Oct 30

I am not sure what you are looking to do. Please provide a mark up example. Evan On Oct 28, 2008, at 11:00 AM, Villmow, Micah wrote: > Is there currently a way to use a pattern inside of another pattern? > > Micah Villmow > Systems Engineer > Advanced Technology & Performance > Advanced Micro Devices Inc. > 4555 Great America Pkwy, > Santa Clara, CA. 95054 > P:

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

2009 Feb 16

Alex, from my experience working with GPU vector registers, there is no support for swizzles in the manner that you would normally code them, and in my case I have 6^4 permutations on src registers and 24 combinations in the dst registers. The way I ended up handling this was to have different register classes for 1-, 2-, 3- and 4-component vectors. This made the generic cases very simple

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

2009 Feb 16

Evan Cheng-2 wrote: > > Well, how many possible permutations are there? Is it possible to > model each case as a separate physical register? > > Evan > I don't think so. There are 4x4x4x4 = 256 permutations. For example: * xyzw: default * zxyw * yyyy: splat Even if I can model each of these 256 cases as a separate physical register, how can I model the use of r0.xyzw in
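One alternative to a physical register per permutation is to keep the swizzle as an immediate operand of the instruction rather than as part of the register. With 2 bits per destination lane, the 4x4x4x4 = 256 swizzles fit in one byte. The C sketch below shows such an encoding and its semantics; the encoding, type and function names are hypothetical.

```c
#include <stdint.h>

/* A 4-component vector: c[0..3] are the x, y, z, w components. */
typedef struct { float c[4]; } vec4;

/* Pack a swizzle such as "zxyw" into 8 bits: 2 bits per destination lane,
   each selecting a source component (x=0, y=1, z=2, w=3). 4^4 = 256 codes. */
uint8_t swizzle_encode(unsigned dx, unsigned dy, unsigned dz, unsigned dw) {
    return (uint8_t)((dx & 3u) | ((dy & 3u) << 2) |
                     ((dz & 3u) << 4) | ((dw & 3u) << 6));
}

/* Apply an encoded swizzle: lane i reads the component named by bits 2i..2i+1. */
vec4 swizzle_apply(vec4 v, uint8_t code) {
    vec4 r;
    for (int lane = 0; lane < 4; ++lane)
        r.c[lane] = v.c[(code >> (2 * lane)) & 3];
    return r;
}
```

Under this encoding the default xyzw is 0xE4, and a splat such as yyyy is just another code, so no extra register classes are needed for it.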

[LLVMdev] Using patterns inside patterns

2008 Oct 28

Is there currently a way to use a pattern inside of another pattern? Micah Villmow Systems Engineer Advanced Technology & Performance Advanced Micro Devices Inc. 4555 Great America Pkwy, Santa Clara, CA. 95054 P: 408-572-6219 F: 408-572-6596

Help with simple dll wrapper around linux so

2010 Jun 09

I've recently got MetaTrader to work on Linux under Wine and would now like to see if I can import a DLL wrapper so I can use some code I wrote in Linux. I'm trying something like this (based on http://www.winehq.org/docs/winelib-guide/bindlls): add.c: int add(int a, int b) { return a + b; } add.h: int add(int, int); WinAdd.c: #include <windef.h> #include

[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI

2017 Jun 13

On 13.06.2017 at 02:05, Ilia Mirkin wrote: > On Mon, Jun 12, 2017 at 7:57 PM, Roland Scheidegger <sroland at vmware.com> wrote: >> FWIW surely on nv50 you could keep a single mad instruction for umad >> (sad maybe too?). (I'm actually wondering if the hw really can't do >> unfused float multiply+add as a single instruction but I know next to >> nothing

[LLVMdev] Performance problems with FORTRAN allocatable arrays

2012 Feb 15

I've noticed that LLVM does a bad job of optimizing array indexing code for FORTRAN arrays declared using the ALLOCATABLE keyword. For example if you have something like the following: DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:,:) :: QAV ... ALLOCATE( QAV( -2:IMAX+2,-2:JMAX+2,-2:KMAX+2,ND) ) ... DO L = 1, 5 DO K = K1, K2 DO J = J1, J2 DO I = I1, I2 II = I +
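For an array like QAV(-2:IMAX+2, -2:JMAX+2, -2:KMAX+2, ND), each access QAV(I,J,K,L) turns into column-major offset arithmetic whose lower bounds and extents, for an ALLOCATABLE array, are only known at run time and live in an array descriptor. The C sketch below shows roughly what that arithmetic looks like; the descriptor layout and names are hypothetical (real Fortran runtimes use their own layouts). One plausible factor in poor code is that if the optimizer cannot prove the descriptor fields are unchanged inside the loop nest, it has to reload them instead of hoisting the loop-invariant parts of the offset.

```c
/* Simplified, hypothetical descriptor for a rank-4 allocatable array. */
typedef struct {
    double *base;
    long lbound[4];   /* declared lower bound of each dimension */
    long extent[4];   /* number of elements in each dimension */
} desc4;

/* Column-major offset of element (i, j, k, l): the first index varies fastest.
   Every term reads bounds/extents from the descriptor at run time. */
long offset4(const desc4 *d, long i, long j, long k, long l) {
    return (i - d->lbound[0])
         + d->extent[0] * ((j - d->lbound[1])
         + d->extent[1] * ((k - d->lbound[2])
         + d->extent[2] * (l - d->lbound[3])));
}

/* Address of the element, as the compiled loop body would compute it. */
double *elem4(const desc4 *d, long i, long j, long k, long l) {
    return d->base + offset4(d, i, j, k, l);
}
```

For example, with IMAX = JMAX = KMAX = 1 and ND = 2, the extents are 5, 5, 5, 2 and QAV(-2,-2,-2,2) sits 125 elements past QAV(-2,-2,-2,1).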

[LLVMdev] Performance problems with FORTRAN allocatable arrays

2012 Feb 15

Hi Wonsun, can you please provide a testcase. Best wishes, Duncan. > I've noticed that LLVM does a bad job of optimizing array indexing > code for FORTRAN arrays declared using the ALLOCATABLE keyword. > > For example if you have something like the following: > > DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:,:) :: QAV > ... > ALLOCATE( QAV(

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 15

On 06/14/2017 05:05 PM, Connor Abbott wrote: > On Tue, Jun 13, 2017 at 6:13 PM, Tom Stellard <tstellar at redhat.com> wrote: >> On 06/13/2017 07:33 PM, Matt Arsenault wrote: >>> >>>> On Jun 12, 2017, at 17:23, Tom Stellard <tstellar at redhat.com <mailto:tstellar at redhat.com>> wrote: >>>> >>>> On 06/12/2017 08:03 PM, Connor

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 15

I'm wondering about the focus on bound_cntl. Any cleared bit in the row_mask or bank_mask will also disable updating the result. Brian -----Original Message----- From: Connor Abbott [mailto:cwabbott0 at gmail.com] Sent: Wednesday, June 14, 2017 6:13 PM To: tstellar at redhat.com Cc: Matt Arsenault; llvm-dev at lists.llvm.org; Kolton, Sam; Sumner, Brian; Pykhtin, Valery Subject: Re:
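To make the masking rule concrete, here is a simplified C model of a DPP "row shift right" move, written from the description above (a cleared row_mask or bank_mask bit disables the update) plus an assumed reading of bound control: when the source lane falls outside its 16-lane row, either zero is written or the destination is left unchanged. This is an illustration only; the exact encodings and bound_ctrl behaviour should be checked against the AMD GCN ISA documentation, and all names here are hypothetical.

```c
#include <stdint.h>

#define WAVE 64 /* lanes in a wave64 */
#define ROW  16 /* DPP rows are 16 lanes wide; a bank is 4 lanes of a row */

/* Simplified model: dst[lane] is updated only if the lane's row bit in
   row_mask AND its bank bit in bank_mask are set. If the source lane is
   outside the row, bound_ctrl != 0 writes 0; otherwise dst is left as is. */
void dpp_row_shr(uint32_t dst[WAVE], const uint32_t src[WAVE], int n,
                 unsigned row_mask, unsigned bank_mask, int bound_ctrl) {
    for (int lane = 0; lane < WAVE; ++lane) {
        int row = lane / ROW, bank = (lane % ROW) / 4;
        if (!((row_mask >> row) & 1u) || !((bank_mask >> bank) & 1u))
            continue;                      /* masked off: no update */
        int in_row = lane % ROW - n;       /* source position within the row */
        if (in_row >= 0 && in_row < ROW)
            dst[lane] = src[row * ROW + in_row];
        else if (bound_ctrl)
            dst[lane] = 0;                 /* invalid source reads as zero */
        /* else: invalid source, dst[lane] keeps its old value */
    }
}
```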

[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI

2017 Jun 12

This looks like the right idea to me too. It may sound a bit weird to do that per instruction, but d3d11 does that as well. (Some d3d versions just have a global flag basically forbidding or allowing any such fast math optimizations in the assembly, but I'm not actually sure everybody honors that without tesselation...) For 1/9: Reviewed-by: Roland Scheidegger <sroland at vmware.com>
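A quick illustration of why such fast-math optimizations need explicit permission: floating-point addition is not associative, so an optimizer that reassociates can change the result. The C sketch below (hypothetical function names; volatile is used only to force each intermediate to be rounded to double and to keep the compiler from rearranging anything) computes the same three-term sum in two orders.

```c
/* (a + b) + c, with every intermediate rounded to double. */
double sum_left(double a, double b, double c) {
    volatile double t = a + b;
    volatile double r = t + c;
    return r;
}

/* a + (b + c): mathematically equal, but rounded at different points. */
double sum_right(double a, double b, double c) {
    volatile double t = b + c;
    volatile double r = a + t;
    return r;
}
```

With a = 1e16, b = -1e16 and c = 1.0, the left-to-right order gives 1.0, but b + c rounds back to -1e16 (adjacent doubles near 1e16 are 2 apart), so the other order gives 0.0. Fusing a mul + add into a single mad can change results in a similar way, because the intermediate product is no longer rounded.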

[LLVMdev] Question on tablegen

2009 May 08

Manjunath, I had a very similar problem and I solved it using a custom vector shuffle and addition instead of mov. For example, Vector_shuffle s1, s2, <0,3> is mapped to a custom instruction where I transform the swizzle to a 32-bit integer mask and an inverted mask. So I have dst, src0, src1, imm1, imm2 And I have my asm look similar to: Add dst, src0.imm1, src1.imm2 and then in the asm

[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI

2017 Jun 13

On Mon, Jun 12, 2017 at 7:57 PM, Roland Scheidegger <sroland at vmware.com> wrote: > FWIW surely on nv50 you could keep a single mad instruction for umad > (sad maybe too?). (I'm actually wondering if the hw really can't do > unfused float multiply+add as a single instruction but I know next to > nothing about nvidia hw...) The compiler should reassociate a mul + add

[LLVMdev] Question on tablegen

2009 May 08

Dan, Thanks a lot. Using a modifier in the assembly string works for this case. I am trying to solve a related problem. I am trying to print out a set of "mov" ops for the vector_shuffle node. Since the source of the "mov" is from one of the sources to vector_shuffle, depending on the mask, I am not sure what assembly string to emit. For example, if I have d <-
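The difficulty is that the source register of each mov is chosen per lane by the shuffle mask, so no single fixed assembly string covers every case; the printed text has to be driven by the mask. A hypothetical C sketch of that mapping, independent of TableGen, assuming 4-wide operands, mask entries 0..3 selecting from s1, 4..7 selecting from s2, and negative entries meaning undef (the register names and mov syntax are made up for illustration):

```c
#include <stdio.h>
#include <string.h>

/* Writes one "mov" line per defined lane of a 4-lane vector_shuffle(s1, s2)
   into out (capacity cap, at least 1) and returns the number of movs emitted.
   mask[i] in 0..3 selects s1, 4..7 selects s2; anything else is skipped. */
int emit_shuffle_movs(const int mask[4], char *out, size_t cap) {
    static const char comp[4] = {'x', 'y', 'z', 'w'};
    int count = 0;
    size_t used = 0;
    out[0] = '\0';
    for (int lane = 0; lane < 4; ++lane) {
        int m = mask[lane];
        if (m < 0 || m > 7)
            continue;                         /* undef lane: no mov needed */
        const char *src = m < 4 ? "s1" : "s2";
        int n = snprintf(out + used, cap - used, "mov d.%c, %s.%c\n",
                         comp[lane], src, comp[m & 3]);
        if (n < 0 || (size_t)n >= cap - used)
            break;                            /* out of buffer space */
        used += (size_t)n;
        ++count;
    }
    return count;
}
```

For the mask <0, 5, undef, 3> this prints three movs: d.x from s1.x, d.y from s2.y and d.w from s1.w.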

[LLVMdev] Possible miscompilation?

2008 Jun 11

Hi all, I'm trying to figure out a weird bug I'm seeing. I'm hoping it's something simple in my IR but I can't see anything wrong so I'm hoping someone here can see something. I'm using LLVM to compile Java bytecode into native functions. My code keeps track of the Java local variables in an array of llvm::Value pointers which get phi'd up at various points. The
