search for: xyzw

Displaying 20 results from an estimated 50 matches for "xyzw".

2014 Oct 03
2
[LLVMdev] Weird problems with cos (was Re: [PATCH v3 2/3] R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO)
...RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG --check-prefix=FUNC %s > +; RUN: llc < %s -march=r600 -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=SI --check-prefix=FUNC %s > > ;FUNC-LABEL: @test1: > -;EG-CHECK: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > +;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} > > -;SI-CHECK: V_ADD_I32_e32 [[REG:v[0-9]+]], {{v[0-9]+, v[0-9]+}} > -;SI-CHECK-NOT: [[REG]] > -;SI-CHECK: BUFFER_STORE_DWORD [[REG]], > +;SI: V_ADD_I32_e32 [[REG...
2015 Nov 18
1
[Mesa-dev] llvm TGSI backend (WIP) questions
...gis TEMP1z, [TEMP1y] UADDs TEMP1y, TEMP1y, 4 LOADgis TEMP1y, [TEMP1y] UADDs TEMP1y, TEMP1z, TEMP1y STOREgis [TEMP1x], TEMP1y UADDs TEMP0x, TEMP0x, 0 RET ENDSUB and add.ll has: ;FUNC-LABEL: {{^}}test1: ;EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}} ;SI: v_add_i32_e32 [[REG:v[0-9]+]], vcc, {{v[0-9]+, v[0-9]+}} ;SI-NOT: [[REG]] ;SI: buffer_store_dword [[REG]], define void @test1(i32 addrspace(1)* %out, i32 addrspace(1)* %in) { %b_ptr = getelementptr i32, i32 addrspace(1)* %in, i32 1 %a = load i32, i32...
2015 Dec 22
0
Translating tests/trivial/compute.c gallium tests to opencl (input / help wanted)
...test_system_values(struct context *ctx) " UADD TEMP[0].xy, TEMP[0].xyxy, TEMP[0].zwzw\n" " UADD TEMP[0].x, TEMP[0].xxxx, TEMP[0].yyyy\n" " UMUL TEMP[0].x, TEMP[0], IMM[0]\n" - " STORE RES[0].xyzw, TEMP[0], SV[0]\n" + " LOAD TEMP[1].x, RINPUT, IMM[2]\n" + " UADD TEMP[0].x, TEMP[0], TEMP[1]\n" + " STORE RGLOBAL.xyzw, TEMP[0], SV[0]\n" " UADD TEMP[0].x, TEMP[0], IMM[1]\n" -...
2005 Sep 17
1
[LLVMdev] Subword register allocation
..., I try to elaborate it again. Pardon. All registers are 128-bit. Each register can be divided into four 32-bit subwords. Each subword can be independently read and written. A symbolic name is given to each subword: x, y, z, w. MUL r0.xyz, r1.xyz, r2.xxx SUB r0.w, r3,y, r4.z ADD r5.xyzw, r0.xyzw, r2.xyzw MUL defines the three subwords of r0, and SUB defines the rest one. Note that ADD uses the four subwords defined by the previous two instructions. The register allocate must be aware of this, otherwise additional MOV instructions may have to be inserted: MUL r0.xyz, r1.xyz,...
2014 Sep 05
3
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com> wrote: > Unfortunately, another team, while doing internal testing has seen the > new path generating illegal insertps masks. A sample here: > > vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3] > vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3] >
2014 Sep 05
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...recently though and has been much less well tested. I'll start fuzz >> testing it and should hopefully uncover the bug. > > Here's two small test cases. Hope they are of use. > > Thanks, > Rob. > > ------ > define <4 x float> @test(<4 x float> %xyzw, <4 x float> %abcd) { > %1 = extractelement <4 x float> %xyzw, i32 0 > %2 = insertelement <4 x float> undef, float %1, i32 0 > %3 = insertelement <4 x float> %2, float 0.000000e+00, i32 1 > %4 = shufflevector <4 x float> %3, <4 x float> %xyzw, <...
2014 Sep 06
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...s well tested. I'll start fuzz >> testing it and should hopefully uncover the bug. >> >> >> Here's two small test cases. Hope they are of use. >> >> Thanks, >> Rob. >> >> ------ >> define <4 x float> @test(<4 x float> %xyzw, <4 x float> %abcd) { >> %1 = extractelement <4 x float> %xyzw, i32 0 >> %2 = insertelement <4 x float> undef, float %1, i32 0 >> %3 = insertelement <4 x float> %2, float 0.000000e+00, i32 1 >> %4 = shufflevector <4 x float> %3, <4 x floa...
2009 Feb 16
2
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
Evan Cheng-2 wrote: > > Well, how many possible permutations are there? Is it possible to > model each case as a separate physical register? > > Evan > I don't think so. There are 4x4x4x4 = 256 permutations. For example: * xyzw: default * zxyw * yyyy: splat Even if can model each of these 256 cases as a separate physical register, how can I model the use of r0.xyzw in the following example: // dp4 = dot product 4-element dp4 r0.x, r1, r2 dp4 r0.y, r3, r4 dp4 r0.z, r5, r6 dp4 r0.w, r7, r8 sub r5, r0.xyzw, r6 -- View t...
2009 Feb 13
0
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
...27). > > In this way I ensure MUL and ADD write to the same physical > register. This > replacement is done in the other FunctionPass *after* register > allocation. > > MUL and ADD have an 'OptionalDefOperand' writemask. By default the > writemask is > "xyzw" (all elmenets are written). > > // 0xF == all elements are written by default > def WRITEMASK : OptionalDefOperand<OtherVT, (ops i32imm), (ops > (i32 0xF))> > {...} > > def MUL : MyInst<(outs REG4X32:$dst), > (ins REG4X32:$src0, R...
2005 May 06
3
[LLVMdev] avoid live range overlap of "vector" registers
a "vector" register r0 is composed of four 32-bit floating scalar registers, r0.x, r0.y, r0.z, r0.w. each scalar reg can be assigned individually, e.g. mov r0.x, r1.y add r0.y, r1,x, r2.z or assigned simultaneously with vector instructions, e.g. add r0.xyzw, r1.xzyw, r2.xyzw My question is how to define the register in .td file to avoid the code generator overlaps the live ranges of vector registers? i could define a 'definition' for each scalar register, but it's tedious: class FooReg<string n> : Register<n> {} def r0_x...
2015 Nov 13
6
llvm TGSI backend (WIP) questions
Hi All, So as discussed I've started working on a TGSI backend for llvm to use as a way to get compute going on nouveau (and other gpu-s). I'm still learning all the ins and outs of llvm so I do not have much to show yet. I've rebased Francisco's (curro's) latest version on top of llvm trunk, and added a commit on top to actual get it build with the latest trunk. So
2009 Feb 13
3
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
...ination register of MUL (%reg1024) and ADD(%reg1027). In this way I ensure MUL and ADD write to the same physical register. This replacement is done in the other FunctionPass *after* register allocation. MUL and ADD have an 'OptionalDefOperand' writemask. By default the writemask is "xyzw" (all elmenets are written). // 0xF == all elements are written by default def WRITEMASK : OptionalDefOperand<OtherVT, (ops i32imm), (ops (i32 0xF))> {...} def MUL : MyInst<(outs REG4X32:$dst), (ins REG4X32:$src0, REG4X32:$src1, WRITEMASK:$wm), In the s...
2014 Sep 08
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...>>>> testing it and should hopefully uncover the bug. >>> >>> Here's two small test cases. Hope they are of use. >>> >>> Thanks, >>> Rob. >>> >>> ------ >>> define <4 x float> @test(<4 x float> %xyzw, <4 x float> %abcd) { >>> %1 = extractelement <4 x float> %xyzw, i32 0 >>> %2 = insertelement <4 x float> undef, float %1, i32 0 >>> %3 = insertelement <4 x float> %2, float 0.000000e+00, i32 1 >>> %4 = shufflevector <4 x float>...
2010 May 18
1
runes of Magic doesn't display login
...0.xy = (-R1.ww + R0.xy); fixme:d3d_shader:shader_glsl_dump_program_source A0.w = (int(floor(abs(R0.w) + 0.5) * sign(R0.w))); fixme:d3d_shader:shader_glsl_dump_program_source A0.xy = (ivec2(floor(abs(R0.xy) + vec2(0.5)) * sign(R0.xy))); fixme:d3d_shader:shader_glsl_dump_program_source R1.xyzw = (VC[A0.w + 2].xyzw); fixme:d3d_shader:shader_glsl_dump_program_source R0.xy = (R1.xy + -VC[A0.x + 2].xy); fixme:d3d_shader:shader_glsl_dump_program_source R0.xy = ((attrib0.zz * R0.xy) + VC[A0.x + 2].xy); fixme:d3d_shader:shader_glsl_dump_program_source R0.zw = (R1.zw + -VC[A0.x + 2]....
2005 May 10
0
[LLVMdev] avoid live range overlap of "vector" registers
..." register r0 is composed of four 32-bit floating scalar > registers, r0.x, r0.y, r0.z, r0.w. > > each scalar reg can be assigned individually, e.g. > > mov r0.x, r1.y > add r0.y, r1,x, r2.z > > or assigned simultaneously with vector instructions, e.g. > > add r0.xyzw, r1.xzyw, r2.xyzw > > My question is how to define the register in .td file to avoid the > code generator overlaps the live ranges of vector registers? If you want to access each part individually, I would suggest doing the tedious thing and including them all. The IA64 backend has 3*12...
2005 Dec 15
3
[LLVMdev] Vector LLVM extension v.s. DirectX Shaders
...pattern in the program semantic tree. For example, to match 'mul' and 'add', and merge them to a single instruction 'mad' (multiple-and-add). For another example, to vectorize several scalar operations like: add r0.xy, r1.xy, r2.xy add r0.zw, r1.zw, r2.zw to: add r0.xyzw, r1.xyzw, r2.xyzw If the write mask and swizzles are 'supported' in the each instruction per se. The syntax/signature of LLVM assembly will need to be changed from: <result> = add <ty> <var1>, <var2> to: <result>.<writemask> = add <ty> <...
2005 May 01
3
win32-dir 0.1.0 compile problems
I tried to download/compile/install win32-dir, but I couldn't get it to go. Over a private email Daniel Berger had me... "Curious. What platform are you on exactly? Try modifying the extconf.rb file. Add 'have_library("SHFolder")' above 'have_library("shell32")'. If that doesn't work, try uncommenting the other
1999 Apr 01
2
Swat password syncronization - HELP
Hi All I've installed Samba 2.0.3 and couldn't put "unix password sync" to work. I set: passwd program = /usr/bin/passwd %u passwd chat = *New*password:* %n\n *Re-enter*new*password:* %n\n*changed.* passwd chat debug = Yes unix password sync = Yes log level = 100 The password page says "The password for 'user' has been changed". In fact it has been
2005 Apr 20
1
[LLVMdev] adding new instructions to support "swizzle" and "writemask"
...dware further "source register swizzle" and "writemask". For example: # The following two instructions are equivalent. # They cost the same instruction slot, and have same # execution time. Four channels are added in parallel. add r0, r1, r2 add r0.xyzw, r1.xyzw, r2.xyzw # equivalent to: # r0.x = r1.yy + r2.w # r0.z = r1.yy + r2.x # r0.y and r0.w remains unchanged add r0.xz, r1.y, r2.wx Note that the channel y of r1 is replicated in the third instruction. Detailed documentation: <http://msdn.microsoft.com/lib...
2009 Feb 16
0
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
...or registers, again (with my implementation) Evan Cheng-2 wrote: > > Well, how many possible permutations are there? Is it possible to > model each case as a separate physical register? > > Evan > I don't think so. There are 4x4x4x4 = 256 permutations. For example: * xyzw: default * zxyw * yyyy: splat Even if can model each of these 256 cases as a separate physical register, how can I model the use of r0.xyzw in the following example: // dp4 = dot product 4-element dp4 r0.x, r1, r2 dp4 r0.y, r3, r4 dp4 r0.z, r5, r6 dp4 r0.w, r7, r8 sub r5, r0.xyzw, r6 -- View t...
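
Several of the results above describe source-register swizzles and destination write masks on four-channel registers (for example "add r0.xz, r1.y, r2.wx", and the count of 4x4x4x4 = 256 swizzles). The Python sketch below is only an illustration of those semantics as the snippets describe them. The helper names (swizzle, masked_write, add) are hypothetical, and it assumes that a single-channel swizzle is replicated across all written channels and that swizzle channels pair up in order with the channels named in the write mask. It is not taken from any of the quoted posts or from LLVM.

```python
from itertools import product

CHANNELS = "xyzw"


def swizzle(reg, pattern, count):
    """Read `count` channels of a 4-element register in swizzle order.

    Assumption: a one-letter pattern such as "y" is replicated ("yy...").
    """
    if len(pattern) == 1:
        pattern = pattern * count
    return [reg[CHANNELS.index(c)] for c in pattern[:count]]


def masked_write(dst, mask, values):
    """Return a copy of dst in which only the channels named in mask change."""
    out = list(dst)
    for channel, value in zip(mask, values):
        out[CHANNELS.index(channel)] = value
    return out


def add(dst, mask, src0, swz0, src1, swz1):
    """Model of 'add dst.mask, src0.swz0, src1.swz1' on 4-element lists."""
    a = swizzle(src0, swz0, len(mask))
    b = swizzle(src1, swz1, len(mask))
    return masked_write(dst, mask, [x + y for x, y in zip(a, b)])


# Every 4-channel selection, repeats such as "yyyy" included: 4**4 of them.
ALL_SWIZZLES = ["".join(p) for p in product(CHANNELS, repeat=4)]

if __name__ == "__main__":
    r0 = [0, 0, 0, 0]
    r1 = [1, 2, 3, 4]
    r2 = [10, 20, 30, 40]
    # add r0.xz, r1.y, r2.wx: r0.x = r1.y + r2.w, r0.z = r1.y + r2.x
    print(add(r0, "xz", r1, "y", r2, "wx"))
    # add r0.xyzw, r1.xyzw, r2.xyzw: plain four-wide add
    print(add(r0, "xyzw", r1, "xyzw", r2, "xyzw"))
    print(len(ALL_SWIZZLES))
```

Note that the 256 figure counts every selection of four channels, including repeats such as yyyy, so it is larger than the 24 strict reorderings of x, y, z and w.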