thr3ads.net - search: "s32"

[GlobalISel][MIPS] Legality and instruction combining

2018 Sep 14

2

Hi Daniel, On 13.09.2018. 19:32, Daniel Sanders wrote: > Could you clarify what you mean here? The new legalizer info can > define this with: > getActionDefinitionsBuilder(G_SELECT).clampScalar(1, s32, s32) > so I'm guessing you mean that code to mutate the G_SELECT is currently > missing Yes, LegalizerHelper::widenScalar widens only TypeIdx==0, it doesn't do that for TypeIdx==1. Is it intentionally implemented this way? >> b) Is the plan to sometimes let s1 as legal type a...

[GlobalISel] Legalize generic instructions that also depend on type of scalar, not only scalar size

2018 Sep 21

2

Hi, Mips32 has 64-bit floating-point instructions, while i64 instructions have to be emulated with i32 instructions. This means that G_LOAD should be custom legalized for an s64 integer value, and be legal for an s64 floating-point value. There are also other generic instructions with the same problem: G_STORE,...

[LLVMdev] Re: LLVM to SUIF-MACH VM binary

2005 Jan 19

0

...any of the code generator components). If it does, making > use of the code generator infrastructure would make sense. > > -Chris > Sample from SUIF disassembler (done by someone else): lda $vr10.p32 <- main.A cvt $vr11.p32 <- $vr10.p32 add $vr12.p32 <- $vr11.p32,$vr9.s32 lod $vr13.s32 <- 0($vr12.p32) cvt $vr8.s32 <- $vr13.s32 mul $vr6.s32 <- $vr7.s32,$vr8.s32 ldc $vr15.s32 <- 5 ldc $vr18.s32 <- 1 add $vr17.s32 <- main.i,$vr18.s32 ******************** So I guess it is RISC. Lots of virtual registers, so I guess allocation isn't a big p...

[LLVMdev] Re: LLVM to SUIF-MACH VM binary

2005 Jan 18

2

On Tue, 18 Jan 2005, John Cortes wrote: >> Can you say a little bit about MACH-SUIF? With a brief google search, I >> didn't turn up anything that described the architecture. Is it a RISC-like >> machine with 32-bit instruction words? >> > > It's another VM representation. I haven't really gotten to know the nitty > gritty of the language so

[GlobalISel][MIPS] Legality and instruction combining

2018 Sep 13

2

...always i1, and test argument (type 1) in select is also i1. Here is an .ll example: define i32 @f(i32 %a, i32 %b, i32 %c, i32 %d) { entry: %cmp = icmp slt i32 %a, %b %cond = select i1 %cmp, i32 %d, i32 %c ret i32 %cond } and corresponding MIR snippet: %4:_(s1) = G_ICMP intpred(sgt), %0(s32), %1 %5:_(s32) = G_SELECT %4(s1), %2, %3 On mips 32, integer compare uses i32 as result and that result is zero extended. For G_SELECT, we will select instructions that check whether "test register" was zero or not (selected instructions are movz or movn). Test register has size 32,...

[PATCH 2/4] nvc0/ir: Handle reverse subop for OP_EXTBF when folding constant expressions

2014 May 29

2

...f4..93f7c2a 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -529,8 +529,18 @@ ConstantFolding::expr(Instruction *i, lshift = 32 - width - offset; } switch (i->dType) { - case TYPE_S32: res.data.s32 = (a->data.s32 << lshift) >> rshift; break; - case TYPE_U32: res.data.u32 = (a->data.u32 << lshift) >> rshift; break; + case TYPE_S32: { + res.data.s32 = (res.data.s32 << lshift) >> rshift; + if (i->subOp == NV50_I...
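The shift pair in this hunk is the usual way to extract a bitfield: shift left so the field's top bit lands in bit 31, then shift right so the field ends at bit 0. A minimal sketch of both type cases follows. The function names are hypothetical, and `rshift = 32 - width` is an assumption, since the snippet only shows how `lshift` is computed.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned extract: the logical right shift fills the upper bits with zeros.
inline uint32_t extractBitsU32(uint32_t value, unsigned offset, unsigned width) {
    assert(width >= 1 && offset + width <= 32);
    const unsigned lshift = 32 - width - offset; // as in the snippet
    const unsigned rshift = 32 - width;          // assumption: not shown in the snippet
    return (value << lshift) >> rshift;
}

// Signed extract: the left shift is done on the unsigned value to avoid
// signed-overflow issues; the right shift is then done on int32_t so the
// field's top bit is replicated (arithmetic shift on mainstream compilers,
// guaranteed since C++20).
inline int32_t extractBitsS32(uint32_t value, unsigned offset, unsigned width) {
    assert(width >= 1 && offset + width <= 32);
    const unsigned lshift = 32 - width - offset;
    const unsigned rshift = 32 - width;
    const int32_t top = static_cast<int32_t>(value << lshift);
    return top >> rshift;
}
```

For example, extracting 8 bits at offset 4 from `0x00000F50` gives `0xF5` in the unsigned case and `-11` in the signed case, because bit 7 of the field is set.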

[GlobalISel] Narrowing uneven/non-pow-2 types

2020 Mar 25

2

...on G_UNMERGE_VALUES and therefore requires the source type to be a multiple of the narrow type. Oftentimes these instructions can be widened without any problem to a fitting type. >> >> This has us writing legalization rules like `.widenScalarToNextPow2(0, /*MinSize*/ 32).maxScalar(0, s32)` instead of the much simpler `.clampScalar(0, s32, s32)`. >> >> Although this works and has the desired effect, we feel that such a rule requires internal knowledge of the legalizer, which can change at any point in the future. Ideally we would only want to say `clampScalar` and l...

[PATCH 10/13] x86/alternative: Support indirect call replacement

2017 Oct 04

1

...hecked before the opcode to avoid - * accessing uninitialized bytes for zero-length replacements. + * Fix the address offsets for call and jump instructions which + * use PC-relative addressing. */ if (a->replacementlen == 5 && *insnbuf == 0xe8) { + /* direct call */ *(s32 *)(insnbuf + 1) += replacement - instr; - DPRINTK("Fix CALL offset: 0x%x, CALL 0x%lx", + DPRINTK("Fix direct CALL offset: 0x%x, CALL 0x%lx", *(s32 *)(insnbuf + 1), (unsigned long)instr + *(s32 *)(insnbuf + 1) + 5); - } - if (a->replacementlen && is_j...
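The offset fix in this hunk follows from how a direct `call` (opcode `0xe8`) encodes its target: as a 32-bit displacement relative to the end of the 5-byte instruction. If the bytes are assembled at `replacement` but executed at `instr`, adding `replacement - instr` to the displacement keeps the absolute target unchanged. Below is a minimal user-space sketch of that arithmetic; all function names are hypothetical and addresses are plain integers rather than kernel pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Read the rel32 displacement that follows the 0xe8 opcode byte.
inline int32_t readRel32(const uint8_t *insn) {
    int32_t rel;
    std::memcpy(&rel, insn + 1, sizeof rel);
    return rel;
}

// Absolute target of a 5-byte direct call located at address `addr`.
inline uint64_t callTarget(const uint8_t *insn, uint64_t addr) {
    return addr + 5 + static_cast<uint64_t>(static_cast<int64_t>(readRel32(insn)));
}

// Adjust the displacement of a direct call that was encoded for
// `replacementAddr` but will execute at `instrAddr`.
inline bool fixDirectCall(uint8_t *insnbuf, std::size_t len,
                          uint64_t replacementAddr, uint64_t instrAddr) {
    if (len != 5 || insnbuf[0] != 0xe8)
        return false; // not a direct call; nothing to fix here
    uint32_t rel;
    std::memcpy(&rel, insnbuf + 1, sizeof rel);
    rel += static_cast<uint32_t>(replacementAddr - instrAddr); // wraps modulo 2^32
    std::memcpy(insnbuf + 1, &rel, sizeof rel);
    return true;
}

// Demo helper: build a call with displacement `rel` as seen from
// `replacementAddr`, relocate it to `instrAddr`, and return the new target.
inline uint64_t relocatedCallTarget(int32_t rel, uint64_t replacementAddr,
                                    uint64_t instrAddr) {
    uint8_t buf[5] = {0xe8, 0, 0, 0, 0};
    std::memcpy(buf + 1, &rel, sizeof rel);
    fixDirectCall(buf, sizeof buf, replacementAddr, instrAddr);
    return callTarget(buf, instrAddr);
}
```

With a displacement of `0x10` encoded at `0x2000`, the original target is `0x2015`; after relocating to `0x1000` the displacement becomes `0x1010` and the target is still `0x2015`.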

[PATCH 2/4] nvc0/ir: Handle reverse subop for OP_EXTBF when folding constant expressions

2014 May 29

1

...gen/nv50_ir_peephole.cpp >> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >> @@ -529,8 +529,18 @@ ConstantFolding::expr(Instruction *i, >> lshift = 32 - width - offset; >> } >> switch (i->dType) { >> - case TYPE_S32: res.data.s32 = (a->data.s32 << lshift) >> rshift; break; >> - case TYPE_U32: res.data.u32 = (a->data.u32 << lshift) >> rshift; break; >> + case TYPE_S32: { >> + res.data.s32 = (res.data.s32 << lshift) >> rshift; >>...

[PATCH] 02-

2013 May 21

0

...Doing 16 samples filtering at a time */ + "vmlal.s16 q7, d8, d10;\n" + "vmlal.s16 q8, d8, d11;\n" + "vmlal.s16 q9, d8, d12;\n" + "vmlal.s16 q10, d8, d13;\n" + + /* Reduce filter sum to 16 bits for y output */ + "vrshrn.s32 d4, q7, %[SIGSHIFT];\n" + "vrshrn.s32 d5, q8, %[SIGSHIFT];\n" + "vrshrn.s32 d6, q9, %[SIGSHIFT];\n" + "vrshrn.s32 d7, q10, %[SIGSHIFT];\n" + + "pld [%0, #0];\n" + + /* Duplicate last x sample to q5 for next "previous" s...

[PATCH v2 0/4] Constant folding of new Instructions

2014 Jun 03

8

And another try for constant folding of Instructions for nvc0. Please Review this! Thanks, Tobias Klausmann Tobias Klausmann (4): nvc0/ir: clear subop when folding constant expressions nvc0/ir: Handle reverse subop for OP_EXTBF when folding constant expressions nvc0/ir: Handle OP_BFIND when folding constant expressions nvc0/ir: Handle OP_POPCNT when folding constant expressions

[PATCH 10/13] x86/alternative: Support indirect call replacement

2017 Nov 16

1

...zed bytes for zero-length replacements. > > + * Fix the address offsets for call and jump instructions which > > + * use PC-relative addressing. > > */ > > if (a->replacementlen == 5 && *insnbuf == 0xe8) { > > + /* direct call */ > > *(s32 *)(insnbuf + 1) += replacement - instr; > > - DPRINTK("Fix CALL offset: 0x%x, CALL 0x%lx", > > + DPRINTK("Fix direct CALL offset: 0x%x, CALL 0x%lx", > > *(s32 *)(insnbuf + 1), > > (unsigned long)instr + *(s32 *)(insnbuf + 1) + 5); > >...

[GlobalISel] Narrowing uneven/non-pow-2 types

2020 Mar 24

3

...ng code relies on G_UNMERGE_VALUES and therefore requires the source type to be a multiple of the narrow type. Oftentimes these instructions can be widened without any problem to a fitting type. This has us writing legalization rules like `.widenScalarToNextPow2(0, /*MinSize*/ 32).maxScalar(0, s32)` instead of the much simpler `.clampScalar(0, s32, s32)`. Although this works and has the desired effect, we feel that such a rule requires internal knowledge of the legalizer, which can change at any point in the future. Ideally we would only want to say `clampScalar` and let the legali...
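To see why the two rule chains differ even though both end at s32, it helps to model scalar types by bit width alone. The sketch below is not the LLVM API: the helper names only mirror the rule names, and the 48-bit input is an illustrative choice. It shows that both chains give 32 bits, but only the widen-first chain passes through a size (64) that is a multiple of 32 and so can be split evenly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a scalar type is just its width in bits.

// Round up to the next power of two, but never below minSize.
inline uint32_t widenToNextPow2(uint32_t bits, uint32_t minSize) {
    assert(bits > 0 && bits <= 0x80000000u);
    uint32_t p = 1;
    while (p < bits)
        p <<= 1;
    return p < minSize ? minSize : p;
}

// Narrow anything wider than maxSize down to maxSize.
inline uint32_t maxScalar(uint32_t bits, uint32_t maxSize) {
    return bits > maxSize ? maxSize : bits;
}

// Widen anything below lo, narrow anything above hi.
inline uint32_t clampScalar(uint32_t bits, uint32_t lo, uint32_t hi) {
    if (bits < lo)
        return lo;
    return bits > hi ? hi : bits;
}

// Narrowing by splitting into equal pieces needs an exact multiple.
inline bool splitsEvenly(uint32_t sourceBits, uint32_t narrowBits) {
    return sourceBits % narrowBits == 0;
}
```

For a 48-bit scalar, `clampScalar(48, 32, 32)` asks for a direct 48-to-32 narrowing, and `splitsEvenly(48, 32)` is false. The longer chain first widens to `widenToNextPow2(48, 32) == 64`, which does split evenly into two 32-bit pieces.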

[RESEND/PATCH] nv50/ir: Handle OP_CVT when folding constant expressions

2015 Jan 09

3

..., + INT16_MAX)); + else + res.data.s16 = util_iround(imm0.reg.data.f64); + break; + default: + return; + } + i->setSrc(0, bld.mkImm(res.data.s16)); + break; + case TYPE_S32: + switch (i->sType) { + case TYPE_F32: + if (i->saturate) + res.data.s32 = util_iround(CLAMP(imm0.reg.data.f32, INT32_MIN, + INT32_MAX)); + else + res.data.s32 = util_iround(imm0.r...
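In this version of the patch the saturating path clamps the floating-point value first and rounds afterwards, so the value handed to the rounding step is already within the integer range. A minimal sketch of that ordering for the s32 case follows. `std::llround` is used here as a stand-in for `util_iround`, the function name is hypothetical, and mapping NaN to 0 is an assumption of this sketch, not something the snippet shows.

```cpp
#include <cmath>
#include <cstdint>

// Saturating float-to-s32 conversion: clamp in the floating-point domain,
// then round. The input is taken as double so that both range limits are
// exactly representable; an f32 source converts to double without loss.
inline int32_t saturatingRoundToS32(double x) {
    if (std::isnan(x))
        return 0; // assumption: NaN handling is not shown in the snippet
    if (x <= static_cast<double>(INT32_MIN))
        return INT32_MIN;
    if (x >= static_cast<double>(INT32_MAX))
        return INT32_MAX;
    // In range: round to nearest, halfway cases away from zero.
    return static_cast<int32_t>(std::llround(x));
}
```

Clamping after an integer conversion would come too late, because an out-of-range float cannot be converted to `int32_t` reliably in the first place.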

[PATCH] 02-Add CELT filter optimizations

2013 May 21

2

...Doing 16 samples filtering at a time */ + "vmlal.s16 q7, d8, d10;\n" + "vmlal.s16 q8, d8, d11;\n" + "vmlal.s16 q9, d8, d12;\n" + "vmlal.s16 q10, d8, d13;\n" + + /* Reduce filter sum to 16 bits for y output */ + "vrshrn.s32 d4, q7, %[SIGSHIFT];\n" + "vrshrn.s32 d5, q8, %[SIGSHIFT];\n" + "vrshrn.s32 d6, q9, %[SIGSHIFT];\n" + "vrshrn.s32 d7, q10, %[SIGSHIFT];\n" + + "pld [%0, #0];\n" + + /* Duplicate last x sample to q5 for next "previous" s...
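The "reduce filter sum to 16 bits" step uses `vrshrn`, a rounding shift right that narrows each 32-bit lane to 16 bits. A scalar model of one lane, as a sketch: the function name is hypothetical, and the shift amount is a parameter because the value of `SIGSHIFT` is not shown in the snippet.

```cpp
#include <cassert>
#include <cstdint>

// One lane of a rounding shift right with narrowing from 32 to 16 bits:
// add half of the divisor, shift right, keep the low 16 bits (no saturation).
inline int16_t roundingShiftRightNarrow(int32_t x, unsigned shift) {
    assert(shift >= 1 && shift <= 16);
    // 64-bit arithmetic so the rounding constant cannot overflow. The right
    // shift of a negative value is arithmetic on mainstream compilers and
    // guaranteed since C++20.
    const int64_t rounded =
        (static_cast<int64_t>(x) + (int64_t{1} << (shift - 1))) >> shift;
    return static_cast<int16_t>(static_cast<uint16_t>(rounded));
}
```

For instance, with a shift of 1 the input 5 becomes 3 and the input -5 becomes -2, which is the round-half-up behaviour the added constant produces.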

[PATCH 1/2] nv50/ir: fix s32 x s32 -> high s32 multiply logic

2014 May 18

1

...an unsigned multiply. static bool expandIntegerMUL(BuildUtil *bld, Instruction *mul) { const bool highResult = mul->subOp == NV50_IR_SUBOP_MUL_HIGH; - DataType fTy = mul->sType; // full type - DataType hTy; + DataType fTy; // full type + switch (mul->sType) { + case TYPE_S32: fTy = TYPE_U32; break; + case TYPE_S64: fTy = TYPE_U64; break; + default: fTy = mul->sType; break; + } + + DataType hTy; // half type switch (fTy) { - case TYPE_S32: hTy = TYPE_S16; break; case TYPE_U32: hTy = TYPE_U16; break; case TYPE_U64: hTy = TYPE_U32; break; - case...
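The hunk maps the signed full types to unsigned ones before expanding the multiply. The rest of the patch is not visible here, so the following is not its implementation. It is a sketch of one standard identity that makes such a mapping workable for the high half: compute the unsigned high word, then subtract `b` if `a` is negative and `a` if `b` is negative, all modulo 2^32. The function names are hypothetical.

```cpp
#include <cstdint>

// High 32 bits of the 64-bit unsigned product.
inline uint32_t mulHighU32(uint32_t a, uint32_t b) {
    return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

// High 32 bits of the signed product, derived from the unsigned one.
// Reinterpreting a negative a as unsigned adds 2^32 to it, which adds b to
// the high word of the product (and likewise for b), so both are subtracted.
inline int32_t mulHighS32(int32_t a, int32_t b) {
    const uint32_t ua = static_cast<uint32_t>(a);
    const uint32_t ub = static_cast<uint32_t>(b);
    uint32_t hi = mulHighU32(ua, ub);
    if (a < 0)
        hi -= ub;
    if (b < 0)
        hi -= ua;
    return static_cast<int32_t>(hi);
}

// Reference: do the signed multiply in 64 bits and take the upper word.
inline int32_t mulHighS32Reference(int32_t a, int32_t b) {
    const int64_t product = static_cast<int64_t>(a) * b;
    return static_cast<int32_t>(
        static_cast<uint32_t>(static_cast<uint64_t>(product) >> 32));
}
```

The reference function is only there to check the identity against a plain 64-bit multiply.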

[PATCH v2 2/4] nvc0/ir: Handle reverse subop for OP_EXTBF when folding constant expressions

2014 Jun 03

0

...f4..a214ffc 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -529,8 +529,20 @@ ConstantFolding::expr(Instruction *i, lshift = 32 - width - offset; } switch (i->dType) { - case TYPE_S32: res.data.s32 = (a->data.s32 << lshift) >> rshift; break; - case TYPE_U32: res.data.u32 = (a->data.u32 << lshift) >> rshift; break; + case TYPE_S32: + if (i->subOp == NV50_IR_SUBOP_EXTBF_REV) + res.data.s32 = util_bitreverse(a->data.s...

[PATCH v4] nv50/ir: Handle OP_CVT when folding constant expressions

2014 Jul 05

1

...imm0.reg.data.f64), INT16_MIN, + INT16_MAX); + else res.data.s16 = util_iround(imm0.reg.data.f64); + break; + default: + return; + } + i->setSrc(0, bld.mkImm(res.data.s16)); + break; + case TYPE_S32: + switch (i->sType) { + case TYPE_F32: + if (i->saturate) + res.data.s32 = CLAMP(util_iround(imm0.reg.data.f32), INT32_MIN, + INT32_MAX); + else res.data.s32 = util_iround(imm0.reg.data.f32); +...

[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU

2013 Mar 01

4

...2F_square( > > .reg .b64 examples_2E_mandelbrot_2F_square_param_0 > > ) > > { > > .reg .pred %p<396>; > > .reg .s16 %rc<396>; > > .reg .s16 %rs<396>; > > .reg .s32 %r<396>; > > .reg .s64 %rl<396>; > > .reg .f32 %f<396>; > > .reg .f64 %fl<396>; > > > > mov.f64 %fl0, examples_2E_mandelbrot_2F_square_param_0; > > mul.f64...
