search for: writemask

Displaying 20 results from an estimated 51 matches for "writemask".

2005 Jul 27
3
[LLVMdev] How to define complicated instruction in TableGen (Direct3D shader instruction)
Each register is a 4-component (namely, r, g, b, a) vector register. They are actually defined as llvm packed [4xfloat]. The instruction: add_sat r0.a, r1_bias.xxyy, r3_x2.zzzz Explanation: '.a' is a writemask; only the specified component will be updated. '.xxyy' and '.zzzz' are swizzle masks that specify the component permutation, similar to the Intel SSE permutation instruction SHUFPD. '_bias' and '_x2' are modifiers. They modify the value of source operands and send the mod...
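The instruction in the snippet above can be modelled lane by lane. The following is only an illustrative sketch: the helper names are hypothetical, and the meanings given to `_bias` (subtract 0.5), `_x2` (multiply by 2) and `_sat` (clamp to [0, 1]) are assumptions based on the usual Direct3D pixel-shader conventions, since the excerpt is cut off before it defines them.

```python
# Illustrative model only. Helper names are hypothetical, and the modifier
# definitions are assumed from common Direct3D conventions, not from the post.

def lane(name):
    """Map a component letter (x, y, z, w or r, g, b, a) to a lane index 0..3."""
    return "xyzw".index(name) if name in "xyzw" else "rgba".index(name)

def swizzle(vec, pattern):
    """Build a new 4-vector whose lanes are picked from vec by pattern."""
    return [vec[lane(c)] for c in pattern]

def bias(vec):
    """Assumed meaning of the _bias source modifier: subtract 0.5 per lane."""
    return [v - 0.5 for v in vec]

def x2(vec):
    """Assumed meaning of the _x2 source modifier: double each lane."""
    return [v * 2.0 for v in vec]

def saturate(vec):
    """Assumed meaning of the _sat instruction modifier: clamp to [0, 1]."""
    return [min(max(v, 0.0), 1.0) for v in vec]

def write_masked(dst, result, mask):
    """Copy only the lanes named in mask from result into a copy of dst."""
    out = list(dst)
    for c in mask:
        out[lane(c)] = result[lane(c)]
    return out

def add_sat(dst, mask, src0, src1):
    """Add all four lanes, saturate, then honour the writemask."""
    total = [a + b for a, b in zip(src0, src1)]
    return write_masked(dst, saturate(total), mask)

r0 = [9.0, 9.0, 9.0, 9.0]
r1 = [1.0, 0.75, 0.0, 0.0]
r3 = [0.0, 0.0, 0.125, 0.0]

# add_sat r0.a, r1_bias.xxyy, r3_x2.zzzz
r0 = add_sat(r0, "a", swizzle(bias(r1), "xxyy"), swizzle(x2(r3), "zzzz"))
print(r0)  # only the fourth lane (a/w) changes
```

All four lanes are computed, but the writemask `.a` lets only the last lane reach `r0`; the other three keep their previous contents.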
2005 Jul 29
0
[LLVMdev] How to define complicated instruction in TableGen (Direct3D shader instruction)
...ction. m. Tzu-Chien Chiu wrote: > Each register is a 4-component (namely, r, g, b, a) vector register. > They are actually defined as llvm packed [4xfloat]. > > The instruction: > > add_sat r0.a, r1_bias.xxyy, r3_x2.zzzz > > Explanation: > > '.a' is a writemask. only the specified component will be updated > > '.xxyy' and '.zzzz' are swizzle masks, specify the component > permutation, similar to the Intel SSE permutation instruction SHUFPD > > '_bias' and '_x2' are modifiers. they modify the value of source...
2005 Apr 20
1
[LLVMdev] adding new instructions to support "swizzle" and "writemask"
...ach channel is a 32-bit floating point. It's similar to the high and low 8-bit of an x86 16-bit general purpose register "AX" can be individually referenced as "AH" and "AL". What's different is that the hardware further supports "source register swizzle" and "writemask". For example: # The following two instructions are equivalent. # They cost the same instruction slot, and have the same # execution time. Four channels are added in parallel. add r0, r1, r2 add r0.xyzw, r1.xyzw, r2.xyzw # equivalent to: # r0.x = r1.y...
2009 Feb 13
3
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
...destination register allocated to the destination register of MUL (%reg1024) and ADD(%reg1027). In this way I ensure MUL and ADD write to the same physical register. This replacement is done in the other FunctionPass *after* register allocation. MUL and ADD have an 'OptionalDefOperand' writemask. By default the writemask is "xyzw" (all elements are written). // 0xF == all elements are written by default def WRITEMASK : OptionalDefOperand<OtherVT, (ops i32imm), (ops (i32 0xF))> {...} def MUL : MyInst<(outs REG4X32:$dst), (ins REG4X32:$src0,...
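The snippet above stores the writemask as an integer immediate whose default, 0xF, means "all elements are written". A minimal sketch of that encoding follows; the function names are hypothetical, and the bit order (x in bit 0 through w in bit 3) is an assumption, since the excerpt only fixes the all-ones default.

```python
# Hypothetical helpers. The bit assignment x=bit 0, y=bit 1, z=bit 2, w=bit 3
# is an assumption; the post only states that 0xF selects every element.

def encode_writemask(components="xyzw"):
    """Turn a component string such as "xw" into a 4-bit immediate."""
    mask = 0
    for c in components:
        mask |= 1 << "xyzw".index(c)
    return mask

def decode_writemask(mask):
    """Turn a 4-bit immediate back into its component string."""
    return "".join(c for i, c in enumerate("xyzw") if mask & (1 << i))

print(hex(encode_writemask()))   # default mask, all four elements
print(decode_writemask(0x5))     # elements selected by bits 0 and 2
```

Making the operand optional with an all-ones default means the plain form of an instruction and the form with an explicit `.xyzw` mask share the same encoding.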
2009 Feb 13
0
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
...destination register of MUL (%reg1024) and ADD(%reg1027). > > In this way I ensure MUL and ADD write to the same physical > register. This > replacement is done in the other FunctionPass *after* register > allocation. > > MUL and ADD have an 'OptionalDefOperand' writemask. By default the > writemask is > "xyzw" (all elements are written). > > // 0xF == all elements are written by default > def WRITEMASK : OptionalDefOperand<OtherVT, (ops i32imm), (ops > (i32 0xF))> > {...} > > def MUL : MyInst<(outs REG4X32:$...
2013 Nov 26
2
[LLVMdev] R600/SI build failure on Leopard (Use of C++11)
Hi Christian, Ryan just reported to me that llvm-3.4 is no longer building on OS X Leopard (https://trac.macports.org/ticket/41548). It seems the issue is with a commit that you made back in April (referenced below) which added this to SIISelLowering.cpp: // Adjust the writemask in the node std::vector<SDValue> Ops; Ops.push_back(DAG.getTargetConstant(NewDmask, MVT::i32)); for (unsigned i = 1, e = Node->getNumOperands(); i != e; ++i) Ops.push_back(Node->getOperand(i)); Node = (MachineSDNode*)DAG.UpdateNodeOperands(Node, Ops.data(), Ops.size()); Tha...
2013 Nov 26
0
[LLVMdev] R600/SI build failure on Leopard (Use of C++11)
...: > Hi Christian, > > Ryan just reported to me that llvm-3.4 is no longer building on OS X Leopard (https://trac.macports.org/ticket/41548). It seems the issue is with a commit that you made back in April (referenced below) which added this to SIISelLowering.cpp: > > // Adjust the writemask in the node > std::vector<SDValue> Ops; > Ops.push_back(DAG.getTargetConstant(NewDmask, MVT::i32)); > for (unsigned i = 1, e = Node->getNumOperands(); i != e; ++i) > Ops.push_back(Node->getOperand(i)); > Node = (MachineSDNode*)DAG.UpdateNodeOperands(Node, Ops....
2008 Dec 30
0
[LLVMdev] [Mesa3d-dev] Folding vector instructions
...'t know what the status of this is, I think it is partially implemented but may not be complete yet. >> I don't have experience of the new vector instructions in LLVM, and >> perhaps >> that's why it makes me feel it's complicated to fold the swizzle and >> writemask. We have really good support for swizzling operations already with the shuffle_vector instruction. I'm not sure about writemask. > > Um, I was thinking that we should eventually create intrinsic > functions > for some of the commands, like LIT, that might not be > single-i...
2008 Dec 30
2
[LLVMdev] [Mesa3d-dev] Folding vector instructions
...ake the instruction selection easier (no folding and > complicated pattern-matching in the instruction selection DAG). > > I don't have experience of the new vector instructions in LLVM, and perhaps > that's why it makes me feel it's complicated to fold the swizzle and > writemask. > > Thanks. I hope marcheu sees this too. Um, I was thinking that we should eventually create intrinsic functions for some of the commands, like LIT, that might not be single-instruction, but that can be lowered eventually, and for commands like LG2, that might be single-instruction for s...
2016 Apr 08
2
[PATCH] nouveau: codegen: Take src swizzle into account on loads
...the coordinates provided (4 sequential dwords > from src1.x in the case of buffer/memory, RGBA colors from src1.xyz in > the case of images) > (b) swizzle them according to the swizzle on the MEMORY/BUFFER/IMAGE argument > (c) store that swizzled result into the destination based on the writemask > > That would sound reasonable to me, and if I understand correctly, is > option 2 of your proposal. Yes that is option 2, and is basically what the patch which started this thread does. So that would work for me :) > We'd need some docs updates and buy-in from the other gallium...
2005 Dec 15
3
[LLVMdev] Vector LLVM extension v.s. DirectX Shaders
...w, r1.zw, r2.zw to: add r0.xyzw, r1.xyzw, r2.xyzw If the write mask and swizzles are 'supported' in each instruction per se. The syntax/signature of LLVM assembly will need to be changed from: <result> = add <ty> <var1>, <var2> to: <result>.<writemask> = add <ty> <var1>.<swizzle>, <var2>.<swizzle> This could be easier for the frontend transformations to recognize/identify the real program semantics, without the additional extract, combine, and permute instruction sequences. From the point of view writing fr...
2016 Apr 08
2
[PATCH] nouveau: codegen: Take src swizzle into account on loads
...and by my comment of "working as intended". But that doesn't mean > the intent can't be changed :) > > For memory/buffers, LOAD takes the address at TEMP[0].x and loads 16 > bytes (4 words), and sticks them into the destination's .xyzw. If you > happen to have a writemask, then only some of those are written out. > > It seems that you're trying to add additional meaning to the swizzle > on the "memory" argument. However I don't believe that such a thing is > defined. (And definitely not used anywhere, at least not on purpose.) > >...
2008 Dec 30
2
[LLVMdev] Folding vector instructions
...min Function' will make the instruction selection easier (no folding and complicated pattern-matching in the instruction selection DAG). I don't have experience of the new vector instructions in LLVM, and perhaps that's why it makes me feel it's complicated to fold the swizzle and writemask. Thanks.
2007 Sep 27
3
[LLVMdev] Vector swizzling and write masks code generation
..., i32 1 %tmp4 = insertelement <2 x float> %tmp2, float %tmp3, i32 1 store <2 x float> %tmp4, <2 x float>* @vec2 or the like. So I think my options come down to: 1) figure out a way of having code generator be actually able to combine all those IR instructions back into OP dst.writemask src1.swizzle1 src2.swizzle2 2) have some kind of instruction level support for it in LLVM IR With my limited knowledge of code generators in LLVM I don't see a way of doing #1 and I'm afraid #2 might be the only option. I'd appreciate any ideas and/or comments that could potentially...
2009 Sep 10
0
[PATCH 02/13] nv50: add functions for swizzle resolution
...may also emit the final
+ * result to a write-only register.
+ */
+static struct nv50_reg *
+tgsi_broadcast_dst(struct nv50_pc *pc,
+                   const struct tgsi_full_dst_register *fd, unsigned mask)
+{
+    if (fd->DstRegister.File == TGSI_FILE_TEMPORARY) {
+        int c = ffs(~mask & fd->DstRegister.WriteMask);
+        if (c)
+            return tgsi_dst(pc, c - 1, fd);
+    } else {
+        int c = ffs(fd->DstRegister.WriteMask) - 1;
+        if ((1 << c) == fd->DstRegister.WriteMask)
+            return tgsi_dst(pc, c, fd);
+    }
+
+    return NULL;
+}
+
 static unsigned
 load_fp_attrib(struct nv50_pc *pc, int i, unsigned *acc, in...
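The two bit tests in the patch excerpt above can be tried in isolation. This is a rough Python sketch, not the driver code: `ffs` here imitates the C library function of the same name, and the other function names are hypothetical.

```python
# Sketch of the bit tricks only; function names other than ffs are hypothetical.

def ffs(value):
    """Imitate C ffs(): 1-based index of the lowest set bit, 0 if none is set."""
    return (value & -value).bit_length()

def first_component_outside(write_mask, mask):
    """Lowest write-masked lane that is not in mask, or None.

    Mirrors the ffs(~mask & WriteMask) test in the TEMPORARY branch.
    """
    c = ffs(~mask & write_mask & 0xF)
    return c - 1 if c else None

def single_component(write_mask):
    """Lane index if exactly one lane is written, else None.

    Mirrors the (1 << c) == WriteMask test in the else branch.
    """
    c = ffs(write_mask) - 1
    if c >= 0 and (1 << c) == write_mask:
        return c
    return None

print(first_component_outside(0xF, 0x3))  # lanes x and y are excluded by mask
print(single_component(0x4))              # only lane z is written
print(single_component(0x6))              # two lanes written, so no single lane
```

The second test works because a mask with exactly one bit set equals `1 << c` for its lowest set bit `c`; any additional bit makes the comparison fail.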
2017 Jun 11
0
[RFC 3/9] st/glsl_to_tgsi: handle precise modifier
...>src[3]);
+ new_inst = emit_asm(ir, inst->op, l, inst->src[0], inst->src[1], inst->src[2], inst->src[3],
+ is_precise(ir->lhs->variable_referenced()));
 new_inst->saturate = inst->saturate;
 inst->dead_mask = inst->dst[0].writemask;
 } else {
@@ -4072,16 +4122,16 @@ glsl_to_tgsi_visitor::calc_deref_offsets(ir_dereference *tail,
 deref_arr->array_index->accept(this);
 if (*array_elements != 1)
- emit_asm(NULL, TGSI_OPCODE_MUL, temp_dst, this->result, st_src_reg_for_int(*array_elements...
2005 Dec 15
0
[LLVMdev] Vector LLVM extension v.s. DirectX Shaders
...ld be done in the dag combiner (not in instruction selection), but shouldn't conceptually be a problem. > The syntax/signature of LLVM assembly will need to be changed > from: > > <result> = add <ty> <var1>, <var2> > > to: > <result>.<writemask> = add <ty> <var1>.<swizzle>, <var2>.<swizzle> > > This could be easier for the frontend transformations to > recognize/identify the real program semantics, without the additional > extract, combine, and permute instruction sequences. I don't agree....
2016 Apr 08
0
[PATCH] nouveau: codegen: Take src swizzle into account on loads
...fetch 4 values from the coordinates provided (4 sequential dwords from src1.x in the case of buffer/memory, RGBA colors from src1.xyz in the case of images) (b) swizzle them according to the swizzle on the MEMORY/BUFFER/IMAGE argument (c) store that swizzled result into the destination based on the writemask That would sound reasonable to me, and if I understand correctly, is option 2 of your proposal. We'd need some docs updates and buy-in from the other gallium driver developers. STORE remains unchanged, as the MEMORY/etc is in the destination, where there is a writemask, which is presently use...
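Steps (a) to (c) of the LOAD behaviour proposed in the snippet above could be sketched as follows for the buffer/memory case. This is a toy model under stated assumptions: memory is a flat Python list of dwords, and the function and parameter names are hypothetical rather than anything defined by gallium.

```python
# Toy model of the proposed LOAD semantics. Names and the flat-list memory
# are assumptions made for illustration; this is not gallium or nouveau code.

def load(memory, address, src_swizzle, dst, writemask):
    # (a) fetch four sequential dwords starting at the given address
    fetched = memory[address:address + 4]
    # (b) swizzle them according to the swizzle on the memory argument
    swizzled = [fetched["xyzw".index(c)] for c in src_swizzle]
    # (c) store the swizzled result into the destination, honouring the writemask
    out = list(dst)
    for c in writemask:
        i = "xyzw".index(c)
        out[i] = swizzled[i]
    return out

memory = [10, 11, 12, 13, 14]
print(load(memory, 1, "wzyx", [0, 0, 0, 0], "xy"))
```

With the identity swizzle `"xyzw"`, step (b) is a no-op and the model reduces to the existing behaviour described elsewhere in the thread: four words land in the destination's `.xyzw`, filtered by the writemask.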
2007 Sep 27
0
[LLVMdev] Vector swizzling and write masks code generation
...f operand (e.g. vec4 -> vec4) you can use the shufflevector instruction, but if not, you have to use insert/extract. > So I think my options come down to: > > 1) figure out a way of having code generator be actually able to combine all > those IR instructions back into > OP dst.writemask src1.swizzle1 src2.swizzle2 Yep. If you're using the LLVM code generator, it makes it reasonably easy to pattern match on this sort of thing and/or introduce machine specific abstractions to describe them. -Chris -- http://nondot.org/sabre/ http://llvm.org/
2009 Sep 10
0
[PATCH 01/13] nv50: extend insn src mask function
...orted(const struct tgsi_full_instruction *insn, int i)
 }
 }
+/* Return a read mask for source registers deduced from opcode & write mask. */
+static unsigned
+nv50_tgsi_src_mask(const struct tgsi_full_instruction *insn, int c)
+{
+    unsigned x, mask = insn->FullDstRegisters[0].DstRegister.WriteMask;
+
+    switch (insn->Instruction.Opcode) {
+    case TGSI_OPCODE_COS:
+    case TGSI_OPCODE_SIN:
+        return (mask & 0x8) | ((mask & 0x7) ? 0x1 : 0x0);
+    case TGSI_OPCODE_DP3:
+        return 0x7;
+    case TGSI_OPCODE_DP4:
+    case TGSI_OPCODE_DPH:
+    case TGSI_OPCODE_KIL: /* WriteMask ignored */
+        return 0...
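The idea of deducing a source read mask from the opcode and the destination writemask can be illustrated with the cases that are fully visible in the excerpt above. The Python function below is a hypothetical port of those cases only; the DP4/DPH/KIL return value is cut off in the excerpt and is deliberately left out, and treating ADD and MUL as componentwise is an assumption added for illustration.

```python
# Hypothetical port of the visible cases. ADD/MUL being componentwise is an
# assumption; opcodes whose result is truncated in the excerpt are rejected.

COMPONENTWISE = {"ADD", "MUL"}

def src_read_mask(opcode, write_mask):
    """Return which source lanes (bit 0 = x ... bit 3 = w) must be read."""
    if opcode in ("COS", "SIN"):
        # As in the excerpt: w is read only if w is written, and x is read
        # whenever any of x, y, z is written.
        return (write_mask & 0x8) | (0x1 if write_mask & 0x7 else 0x0)
    if opcode == "DP3":
        # A 3-component dot product reads x, y, z whatever the writemask is.
        return 0x7
    if opcode in COMPONENTWISE:
        # Assumed: each written lane depends only on the same source lane.
        return write_mask
    raise NotImplementedError(opcode)

print(hex(src_read_mask("COS", 0x6)))
print(hex(src_read_mask("DP3", 0x1)))
```

A read mask like this lets a backend skip fetching or swizzling source lanes that cannot influence any written destination lane.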