thr3ads.net - similar to: "[LLVMdev] Vector LLVM extension v.s. DirectX Shaders"

Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] Vector LLVM extension v.s. DirectX Shaders"

[LLVMdev] Vector LLVM extension v.s. DirectX Shaders

2005 Dec 15

[LLVMdev] Vector LLVM extension v.s. DirectX Shaders

On Thu, 15 Dec 2005, Tzu-Chien Chiu wrote: > To write a compiler for Microsoft Direct3D shaders from our hardware, > I have a program which translates the Direct3D shader assembly to LLVM > assembly. I added several intrinsics for this purpose. > It's a vector ISA and has some special instructions like: > * rcp (reciprocal) > * frc (the fractional portion of each input

[LLVMdev] RE: LLVM extension v.s. DirectX Shaders

2006 Apr 08

[LLVMdev] RE: LLVM extension v.s. DirectX Shaders

Way back on Wed Dec 14, 2005, Tzu-Chien Chiu wrote: > To write a compiler for Microsoft Direct3D shaders from our hardware, > I have a program which translates the Direct3D shader assembly to LLVM > assembly. I added several intrinsics for this purpose. > It's a vector ISA and has some special instructions like: > * rcp (reciprocal) > ... > These operations are very

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

2009 Feb 13

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

It seems to me that LLVM sub-register is not for the following hardware architecture. All instructions of a hardware are vector instructions. All registers contains 4 32-bit FP sub-registers. They are called r0.x, r0.y, r0.z, r0.w. Most instructions write more than one elements in this way: mul r0.xyw, r1, r2 add r0.z, r3, r4 sub r5, r0, r1 Notice that the four elements of r0 are written

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

2009 Feb 16

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

Evan Cheng-2 wrote: > > Well, how many possible permutations are there? Is it possible to > model each case as a separate physical register? > > Evan > I don't think so. There are 4x4x4x4 = 256 permutations. For example: * xyzw: default * zxyw * yyyy: splat Even if can model each of these 256 cases as a separate physical register, how can I model the use of r0.xyzw in

[LLVMdev] How to define complicated instruction in TableGen (Direct3D shader instruction)

2005 Jul 27

[LLVMdev] How to define complicated instruction in TableGen (Direct3D shader instruction)

Each register is a 4-component (namely, r, g, b, a) vector register. They are actually defined as llvm packed [4xfloat]. The instruction: add_sat r0.a, r1_bias.xxyy, r3_x2.zzzz Explaination: '.a' is a writemask. only the specified component will be update '.xxyy' and '.zzzz' are swizzle masks, specify the component permutation, simliar to the Intel SSE permutation

[LLVMdev] How to define complicated instruction in TableGen (Direct3D shader instruction)

2005 Jul 29

[LLVMdev] How to define complicated instruction in TableGen (Direct3D shader instruction)

Actually the problems that Tzu-Chien Chiu are encountering are similar to what should be done for generating SSE code in the X86 backend and also other SIMD instruction sets. I think LLVM neeeds to add instructions for permuting components, extracting and injecting elements in packed types. If the architecture has instructions which can do permutations for each instruction (for example

[LLVMdev] adding new instructions to support "swizzle" and "writemask"

2005 Apr 20

[LLVMdev] adding new instructions to support "swizzle" and "writemask"

Hello, everyone: I am writing a compiler for a programmable graphics hardware. Each registers of the hardware has four channels, namely 'r', 'b', 'g', 'a', and each channel is a 32-bit floating point. It's similar to the high and low 8-bit of an x86 16-bit general purpose register "AX" can be individually referenced as "AH" and

[PATCH] nouveau: codegen: Take src swizzle into account on loads

2016 Apr 08

[PATCH] nouveau: codegen: Take src swizzle into account on loads

Hi, On 08-04-16 17:02, Ilia Mirkin wrote: > On Fri, Apr 8, 2016 at 5:27 AM, Hans de Goede <hdegoede at redhat.com> wrote: >> Hi, >> >> On 07-04-16 15:58, Ilia Mirkin wrote: >>> >>> That's wrong. >> >> >> It used to work with the old RES[] code and if one cannot specify >> a source swizzle, then how can I do something like

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

2009 Feb 13

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

On Feb 13, 2009, at 9:47 AM, Alex wrote: > It seems to me that LLVM sub-register is not for the following > hardware architecture. > > All instructions of a hardware are vector instructions. All > registers contains > 4 32-bit FP sub-registers. They are called r0.x, r0.y, r0.z, r0.w. > > Most instructions write more than one elements in this way: > > mul

[PATCH] nouveau: codegen: Take src swizzle into account on loads

2016 Apr 08

[PATCH] nouveau: codegen: Take src swizzle into account on loads

Hi, On 07-04-16 15:58, Ilia Mirkin wrote: > That's wrong. It used to work with the old RES[] code and if one cannot specify a source swizzle, then how can I do something like LOAD TEMP[0].y, MEMORY[0], address And get the data at absolute global memory address "address" into TEMP[0].y ? This is a must-have for llvm to be able to generate working TGSI code, I do not see any

[LLVMdev] Vector swizzling and write masks code generation

2007 Sep 27

[LLVMdev] Vector swizzling and write masks code generation

Hey, as some of you may know we're in process of experimenting with LLVM in Gallium3D (Mesa's new driver model), where LLVM would be used both in the software only (by just JIT executing shaders) and hardware (drivers will implement LLVM code-generators) cases. While the software only case is pretty straight forward I just realized I missed something in my initial evaluation. That

[PATCH] nouveau: codegen: Take src swizzle into account on loads

2016 Apr 08

[PATCH] nouveau: codegen: Take src swizzle into account on loads

Hi, On 08-04-16 17:45, Ilia Mirkin wrote: > On Fri, Apr 8, 2016 at 11:28 AM, Hans de Goede <hdegoede at redhat.com> wrote: >> When dealing with non vector variables the llvm register allocator >> will use TEMP[0].x then TEMP[0].y, etc. >> >> When loading something from a global buffer it will calculate the >> address to use, and store that in say TEMP[0].x,

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

2009 Feb 16

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

Alex, From my experience in working with GPU vector registers; there is no support for swizzles in the manner that you would normally code them, and in my case I have 6^4 permutations on src registers and 24 combinations in the dst registers. The way that I ended up handling this was to have different register classes for 1, 2, 3 and 4 component vectors. This made the generic cases very simple

[LLVMdev] [Mesa3d-dev] Folding vector instructions

2008 Dec 30

[LLVMdev] [Mesa3d-dev] Folding vector instructions

Alex wrote: > Hello. > > Sorry I am not sure this question should go to llvm or mesa3d-dev mailing > list, so I post it to both. > > I am writing a llvm backend for a modern graphics processor which has a ISA > very similar to that of Direct 3D. > > I am reading the code in Gallium-3D driver in a mesa3d branch, which > converts the shader programs (TGSI tokens) to

[LLVMdev] Folding vector instructions

2008 Dec 30

[LLVMdev] Folding vector instructions

Hello. Sorry I am not sure this question should go to llvm or mesa3d-dev mailing list, so I post it to both. I am writing a llvm backend for a modern graphics processor which has a ISA very similar to that of Direct 3D. I am reading the code in Gallium-3D driver in a mesa3d branch, which converts the shader programs (TGSI tokens) to LLVM IR. For the shader instruction also found in LLVM IR,

[LLVMdev] [Mesa3d-dev] Folding vector instructions

2008 Dec 30

[LLVMdev] [Mesa3d-dev] Folding vector instructions

On Dec 30, 2008, at 6:39 AM, Corbin Simpson wrote: >> However, the special instrucions cannot directly be mapped to LLVM >> IR, like >> "min", the conversion involves in 'extract' the vector, create >> less-than-compare, create 'select' instruction, and create 'insert- >> element' >> instruction. Using scalar operations

[LLVMdev] Vector swizzling and write masks code generation

2007 Sep 27

[LLVMdev] Vector swizzling and write masks code generation

On Thu, 27 Sep 2007, Zack Rusin wrote: > as some of you may know we're in process of experimenting with LLVM in > Gallium3D (Mesa's new driver model), where LLVM would be used both in the > software only (by just JIT executing shaders) and hardware (drivers will > implement LLVM code-generators) cases. Yep, nifty! > That is graphics hardware (basically every single

[LLVMdev] avoid live range overlap of "vector" registers

2005 May 11

[LLVMdev] avoid live range overlap of "vector" registers

On Wed, 11 May 2005, Tzu-Chien Chiu wrote: > On Tue May 10 2005, Chris Lattner wrote: >> On Tue, 10 May 2005, Morten Ofstad wrote: >>> Actually, I think it would be better to define the registers as a machine >>> value type for packed float x4, and providing some 'extract' and 'inject' >>> instructions to access individual components... There

[LLVMdev] avoid live range overlap of "vector" registers

2005 May 11

[LLVMdev] avoid live range overlap of "vector" registers

Chris Lattner wrote: > None, that documentation is out of date and doesn't make a ton of sense > for your application. I would suggest that you implement it in the > context of the SelectionDAG framework that all of the code generators > either currently use or are moving to. I updated the documentation > here:

[PATCH] nouveau: codegen: Take src swizzle into account on loads

2016 Apr 08

[PATCH] nouveau: codegen: Take src swizzle into account on loads

On Fri, Apr 8, 2016 at 5:27 AM, Hans de Goede <hdegoede at redhat.com> wrote: > Hi, > > On 07-04-16 15:58, Ilia Mirkin wrote: >> >> That's wrong. > > > It used to work with the old RES[] code and if one cannot specify > a source swizzle, then how can I do something like > > LOAD TEMP[0].y, MEMORY[0], address > > And get the data at absolute

similar to: [LLVMdev] Vector LLVM extension v.s. DirectX Shaders