search for: iadd

Displaying 20 results from an estimated 29 matches for "iadd".

2014 Oct 24
3
[LLVMdev] IndVar widening in IndVarSimplify causing performance regression on GPU programs
...ensive than their 32-bit counterparts. Indeed, the SASS code (disassembly of the actual machine code running on GPUs) of the version with widening looks significantly longer. Without widening (7 instructions): .L_1: /*0048*/ IMUL R2, R0, R0; /*0050*/ IADD R0, R0, 0x1; /*0058*/ ST.E [R4], R2; /*0060*/ ISETP.NE.AND P0, PT, R0, c[0x0][0x140], PT; /*0068*/ IADD R4.CC, R4, 0x4; /*0070*/ IADD.X R5, R5, RZ; /*0078*/ @P0 BRA `(.L_1); With...
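A minimal C sketch (an assumption about the shape of the kernel, not the poster's actual code) of the loop behind the SASS above: a store of i*i driven by a 32-bit induction variable. Widening that variable to 64 bits is what replaces the single 32-bit IADD with the longer 64-bit add/multiply sequences.

/* Hypothetical loop matching the 7-instruction SASS: IMUL computes i*i,
   IADD bumps the 32-bit counter, ST.E stores through a 64-bit pointer. */
void square_all(int *out, int n)
{
    for (int i = 0; i != n; ++i)
        out[i] = i * i;        /* widening turns 'i' into a 64-bit value */
}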
2015 Feb 05
8
[LLVMdev] type legalization/operation action
Dear there, I have a target which supports the 32-bit operations natively. Right now, I want to make it support the 16-bit operations as well. My initial thought is: (1) I can add something like "CCIfType<[i16], CCPromoteToType<i32>>" to the CallingConv.td, then "all" the 16-bit operands will be automatically promoted to 32 bits and it will be all set, but it looks like it is not
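For illustration only, a C analogy (add16 is a hypothetical function, not part of the legalization machinery) of what promoting i16 to i32 means: the value is extended, the native 32-bit instruction does the work, and the result is truncated back to 16 bits.

/* Hypothetical C rendering of 'promote i16 to i32': extend the operands,
   add in 32 bits, truncate the result back to 16 bits. */
short add16(short a, short b)
{
    return (short)((int)a + (int)b);
}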
2015 Feb 17
2
[LLVMdev] why llvm does not have uadd, iadd node
So if the overflow happens for either one of the cases, the return value will be implementation dependent? best kevin On Feb 17, 2015, at 2:01 PM, Tim Northover <t.p.northover at gmail.com> wrote: > Hi Kevin, > > On 17 February 2015 at 10:41, kewuzhang <kewu.zhang at amd.com> wrote: >> I just noticed that LLVM has some nodes for signed/unsigned types (like udiv,
2015 Feb 17
5
[LLVMdev] why llvm does not have uadd, iadd node
Hi guys, I just noticed that LLVM has some nodes for signed/unsigned types (like udiv, sdiv), but why do ADD and SUB not have the counterparts sadd and uadd? best kevin
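A small C illustration of why one add node is enough (my own example, not taken from the thread): in two's complement the result bits of an addition are identical whether the operands are read as signed or unsigned, so only operations whose results genuinely differ by signedness (division, remainder, comparisons, extensions) come in s/u flavours.

#include <assert.h>
#include <stdint.h>

/* Signed and unsigned interpretations of the same operands add to the same
   bit pattern, so one ADD covers both; sdiv and udiv really do differ. */
int main(void)
{
    int32_t  sa = -7,           sb = 100;
    uint32_t ua = (uint32_t)sa, ub = (uint32_t)sb;
    assert((uint32_t)(sa + sb) == ua + ub);   /* 93 either way */
    assert(sa / sb != (int32_t)(ua / ub));    /* division is sign-sensitive */
    return 0;
}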
2008 Oct 30
1
[LLVMdev] Using patterns inside patterns
I do not have access to a subtraction routine, as it is considered add with negation on the second parameter, so I have this pattern: // integer subtraction // a - b ==> a + (-b) def ISUB : Pat<(sub GPRI32:$src0, GPRI32:$src1), (IADD GPRI32:$src0, (INEGATE GPRI32:$src1))>; I am attempting to do 64-bit integer shifts and using the following pattern: def LSHL : Pat<(shl GPRI64:$src0, GPRI32:$src1), (LCREATE (ISHL (LLO GPRI64:$src0), GPRI32:$src1), (IOR (ISHL (LHI GPRI64:$src0), GPRI32:$src1), (IOR (U...
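For readers following along, here is a hedged C sketch (shl64 is my reconstruction of what LSHL has to compute from 32-bit halves, not the poster's pattern); the interesting case is the carry of high bits of the low word into the high word.

#include <stdint.h>

/* 64-bit shl built from 32-bit operations; 'n' is assumed to be < 64. */
void shl64(uint32_t lo, uint32_t hi, unsigned n,
           uint32_t *lo_out, uint32_t *hi_out)
{
    if (n == 0) {
        *lo_out = lo;
        *hi_out = hi;
    } else if (n < 32) {
        *lo_out = lo << n;
        *hi_out = (hi << n) | (lo >> (32 - n));   /* spill low bits upward */
    } else {
        *lo_out = 0;
        *hi_out = lo << (n - 32);
    }
}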
2008 Oct 30
0
[LLVMdev] Using patterns inside patterns
I am not sure what you are looking to do. Please provide a mark up example. Evan On Oct 28, 2008, at 11:00 AM, Villmow, Micah wrote: > Is there currently a way to use a pattern inside of another pattern? > > Micah Villmow > Systems Engineer > Advanced Technology & Performance > Advanced Micro Devices Inc. > 4555 Great America Pkwy, > Santa Clara, CA. 95054 > P:
2009 Feb 16
0
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
...printer by decoding the integer constant. This does require having extra moves, but your example below would end up being something like the following: dp4 r100, r1, r2 mov r0.x, r100 (float4 => float1 extract_vector_elt) dp4 r101, r4, r5 mov r3.x, r101 (float4 => float1 extract_vector_elt) iadd r6.xy__, r0.x000, r3.0x00 (float1 + float1 => float2 build_vector) dp4 r7.x, r8, r9 <as above> dp4 r10.x, r11, r12 <as above> iadd r13.xy__, r7.x000, r10.0x00 (float1 + float1 => float2 build_vector) iadd r14, r13.xy00, r6.00xy (float2 + float2 => float4 build_vector) sub r15, r1...
2009 Feb 16
2
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
Evan Cheng-2 wrote: > > Well, how many possible permutations are there? Is it possible to > model each case as a separate physical register? > > Evan > I don't think so. There are 4x4x4x4 = 256 permutations. For example: * xyzw: default * zxyw * yyyy: splat Even if I can model each of these 256 cases as a separate physical register, how can I model the use of r0.xyzw in
2008 Oct 28
4
[LLVMdev] Using patterns inside patterns
Is there currently a way to use a pattern inside of another pattern? Micah Villmow Systems Engineer Advanced Technology & Performance Advanced Micro Devices Inc. 4555 Great America Pkwy, Santa Clara, CA. 95054 P: 408-572-6219 F: 408-572-6596
2010 Jun 09
2
Help with simple dll wrapper around linux so
...WinAdd.c: Code: #include <windef.h> #include "add.h" int WINAPI WinAdd (int a,int b) { return add(a,b); } WinAdd.dll.spec: Code: 2 stdcall WinAdd (long long) WinAdd Now, I have these all in a directory called test. I type: Code: winemaker . --nosource-fix --nomfc -iadd --single-target WinAdd -L"." (I've already compiled libadd.so for Linux), and then run Code: make And I get: Code: winegcc -o WinAdd.so add.o WinAdd.o -L. -ladd /usr/lib/wine/libwinecrt0.a(exe_main.o): In function `main': (.text+0xa0): undefined reference to `WinMa...
2017 Jun 13
1
[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
...ally can't do >> unfused float multiply+add as a single instruction but I know next to >> nothing about nvidia hw...) > > The compiler should reassociate a mul + add into a mad where possible. > In actuality, IMAD is actually super-slow... allegedly slower than > IMUL + IADD. Not sure why. Maxwell added a XMAD operation which is > faster but we haven't figured out how to operate it yet. I'm not aware > of a muladd version of fma on fermi and newer (GL 4.0). The tesla > series does have a floating point mul+add (but no fma). > Interesting. radeons...
2012 Feb 15
2
[LLVMdev] Performance problems with FORTRAN allocatable arrays
...red using the ALLOCATABLE keyword. For example if you have something like the following: DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:,:) :: QAV ... ALLOCATE( QAV( -2:IMAX+2,-2:JMAX+2,-2:KMAX+2,ND) ) ... DO L = 1, 5 DO K = K1, K2 DO J = J1, J2 DO I = I1, I2 II = I + IADD IBD = II - IBDD ICD = II + IBDD QAV(I,J,K,L) = R6I * (2.0D0 * Q(IBD,J,K,L,N) + > 5.0D0 * Q( II,J,K,L,N) - > Q(ICD,J,K,L,N)) END DO END DO END DO END DO Most...
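To make the cost concrete, here is a hedged C sketch (load4d is an assumption about the generated address arithmetic, not code from the thread) of the column-major indexing a compiler must emit for a 4-D allocatable array whose bounds and extents are only known at run time; each access expands into a chain of IADD/IMUL unless the loop optimizer can hoist and strength-reduce it.

/* Hypothetical address computation for an access like QAV(i,j,k,l) with
   run-time extents ext[] and lower bounds lb[] taken from the array
   descriptor. Fortran is column-major, so the first subscript varies
   fastest. */
double load4d(const double *base, const long ext[4], const long lb[4],
              long i, long j, long k, long l)
{
    long off = (i - lb[0])
             + ext[0] * ((j - lb[1])
             + ext[1] * ((k - lb[2])
             + ext[2] * (l - lb[3])));
    return base[off];
}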
2012 Feb 15
0
[LLVMdev] Performance problems with FORTRAN allocatable arrays
...u have something like the following: > > DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:,:) :: QAV > ... > ALLOCATE( QAV( -2:IMAX+2,-2:JMAX+2,-2:KMAX+2,ND) ) > ... > DO L = 1, 5 > DO K = K1, K2 > DO J = J1, J2 > DO I = I1, I2 > II = I + IADD > IBD = II - IBDD > ICD = II + IBDD > > QAV(I,J,K,L) = R6I * (2.0D0 * Q(IBD,J,K,L,N) + >> 5.0D0 * Q( II,J,K,L,N) - >> Q(ICD,J,K,L,N)) > END D...
2017 Jun 15
2
Implementing cross-thread reduction in the AMDGPU backend
...t of this was to be able to express > stuff like: > > v_min_f32 v1, v0, v1 (dpp control) > > where you take the minimum of v1 and the swizzled v0, except where you > would've read an invalid lane for v0, you read the old value for v1 > instead. For operations like add and iadd where the identity is 0, you > can set bound_ctrl = 1, and then the optimizer can safely fold the > v_mov_b32 into the operation itself. That is, you'd do: > > %swizzled = i32 llvm.amdgcn.update.dpp i32 0, %update, (dpp control) > %new = i32 add %swizzled, %old > > and af...
2017 Jun 15
1
Implementing cross-thread reduction in the AMDGPU backend
...xpress >> stuff like: >> >> v_min_f32 v1, v0, v1 (dpp control) >> >> where you take the minimum of v1 and the swizzled v0, except where >> you would've read an invalid lane for v0, you read the old value for >> v1 instead. For operations like add and iadd where the identity is 0, >> you can set bound_ctrl = 1, and then the optimizer can safely fold >> the >> v_mov_b32 into the operation itself. That is, you'd do: >> >> %swizzled = i32 llvm.amdgcn.update.dpp i32 0, %update, (dpp control) >> %new = i32 add %s...
2017 Jun 12
3
[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
This looks like the right idea to me too. It may sound a bit weird to do that per instruction, but d3d11 does that as well. (Some d3d versions just have a global flag basically forbidding or allowing any such fast math optimizations in the assembly, but I'm not actually sure everybody honors that without tesselation...) For 1/9: Reviewed-by: Roland Scheidegger <sroland at vmware.com>
2009 May 08
0
[LLVMdev] Question on tablegen
...m1, imm2 And I have my asm look similar to: Add dst, src0.imm1, src1.imm2 and then in the asm printer I intercept vector_shuffle and I convert the integer to x, y, z, w, 0, 1 or _. For example, if the mask is to take x from s1 and yzw from s2, I would generate 0x1000 and 0x0234. So my result looks like Iadd d0, s1.x000, s2.0yzw This allows you to do your vector shuffle in a single instruction. It's not the cleanest approach, but it works for me and I can encode up to 8 swizzles per immediate so it works on vector sizes up to 8 in length. Hope this helps, Micah -----Original Message----- From: llvmd...
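A hypothetical helper (decode_swizzle is my own sketch of the encoding described above, not Micah's code) that turns such a nibble-packed swizzle immediate back into the per-element characters the asm printer emits; with four bits per element it scales to the eight-wide vectors mentioned.

/* Decode a swizzle immediate where each 4-bit field selects: 0 -> constant 0,
   1..4 -> x, y, z, w. E.g. 0x1000 -> "x000", 0x0234 -> "0yzw". Other selector
   values (the constant 1, the unused '_') would need their own codes. */
void decode_swizzle(unsigned mask, int nelts, char *out)
{
    static const char sel[] = "0xyzw";
    for (int e = 0; e < nelts; ++e) {
        unsigned field = (mask >> (4 * (nelts - 1 - e))) & 0xF;
        out[e] = (field <= 4) ? sel[field] : '?';
    }
    out[nelts] = '\0';
}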
2017 Jun 13
0
[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
...ually wondering if the hw really can't do > unfused float multiply+add as a single instruction but I know next to > nothing about nvidia hw...) The compiler should reassociate a mul + add into a mad where possible. In actuality, IMAD is actually super-slow... allegedly slower than IMUL + IADD. Not sure why. Maxwell added a XMAD operation which is faster but we haven't figured out how to operate it yet. I'm not aware of a muladd version of fma on fermi and newer (GL 4.0). The tesla series does have a floating point mul+add (but no fma).
2009 May 08
2
[LLVMdev] Question on tablegen
Dan, Thanks a lot. Using a modifier in the assembly string works for this case. I am trying to solve a related problem. I am trying to print out a set of "mov" ops for the vector_shuffle node. Since the source of the "mov" is from one of the sources to vector_shuffle, depending on the mask, I am not sure what assembly string to emit. For example, if I have d <-
2008 Jun 11
3
[LLVMdev] Possible miscompilation?
Hi all, I'm trying to figure out a weird bug I'm seeing. I'm hoping it's something simple in my IR but I can't see anything wrong so I'm hoping someone here can see something. I'm using LLVM to compile Java bytecode into native functions. My code keeps track of the Java local variables in an array of llvm::Value pointers which get phi'd up at various points. The