thr3ads.net - search: "mul2"

Displaying 17 results from an estimated 17 matches for "mul2".

should we do this time-consuming transform in InstCombine?

2018 Dec 18

...Tue, Dec 18, 2018 at 10:18 AM Zheng CZ Chen via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hi, Hi. > There is an opportunity in instCombine for following instruction pattern: > > %mul = mul nsw i32 %b, %a > %cmp = icmp sgt i32 %mul, -1 > %sub = sub i32 0, %a > %mul2 = mul nsw i32 %sub, %b > %cond = select i1 %cmp, i32 %mul, i32 %mul2 > > Source code for above pattern: > return (a*b) >=0 ? (a*b) : -a*b; > > Currently, llvm(-O3) can not recognize this as abs(a*b). > > I initially think we could do this in instCombine phase in opt. Belo...

should we do this time-consuming transform in InstCombine?

2018 Dec 18

Hi, There is an opportunity in instCombine for following instruction pattern:

  %mul = mul nsw i32 %b, %a
  %cmp = icmp sgt i32 %mul, -1
  %sub = sub i32 0, %a
  %mul2 = mul nsw i32 %sub, %b
  %cond = select i1 %cmp, i32 %mul, i32 %mul2

Source code for above pattern:

  return (a*b) >=0 ? (a*b) : -a*b;

Currently, llvm(-O3) can not recognize this as abs(a*b). I initially think we could do this in instCombine phase in opt. Below is what I think: %res = OP i32...
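The equivalence the post relies on can be checked at the source level. The following is only an illustrative sketch: the function names `abs_product_select`, `abs_product_direct`, and `check_abs_product` are hypothetical, and it assumes that neither `a*b` nor `-a` overflows, which is what the `nsw` flags in the quoted IR assert.

```cpp
#include <cassert>
#include <cstdlib>

// Source-level pattern from the post: pick a*b when it is non-negative,
// otherwise (-a)*b. Assumes no signed overflow, as the nsw flags imply.
int abs_product_select(int a, int b) {
    return (a * b) >= 0 ? (a * b) : -a * b;
}

// The form the poster would like the optimizer to recognize instead.
int abs_product_direct(int a, int b) {
    return std::abs(a * b);
}

// Spot-check that both forms agree on a few small, overflow-free inputs.
void check_abs_product() {
    const int samples[] = {-7, -1, 0, 1, 5};
    for (int a : samples)
        for (int b : samples)
            assert(abs_product_select(a, b) == abs_product_direct(a, b));
}
```

Inputs whose product is `INT_MIN` are deliberately left out, since negating or taking the absolute value of that product overflows.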

[PATCH] nv50/ir: take postFactor into account when doing peephole optimizations

2015 Mar 25

...Instruction *i,
 i->src(0).mod = Modifier(0);
 i->src(1).mod = Modifier(0);
+ i->postFactor = 0;
 i->setSrc(0, new_ImmediateValue(i->bb->getProgram(), res.data.u32));
 i->setSrc(1, NULL);
@@ -682,7 +685,7 @@ ConstantFolding::tryCollapseChainedMULs(Instruction *mul2,
 Instruction *insn;
 Instruction *mul1 = NULL; // mul1 before mul2
 int e = 0;
- float f = imm2.reg.data.f32;
+ float f = imm2.reg.data.f32 * exp2f(mul2->postFactor);
 ImmediateValue imm1;
 assert(mul2->op == OP_MUL && mul2->dType == TYPE_F32);
@@ -782,9 +785,...
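Read in isolation, the changed line suggests that `postFactor` acts as a power-of-two exponent applied to the multiply's result. That is an inference from the `exp2f` call, not something the truncated snippet states. Under that assumption, the arithmetic can be sketched as follows; `effective_immediate` and `check_effective_immediate` are hypothetical names and not part of the nv50 IR code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: the immediate as it would look once a power-of-two
// postFactor is folded in, mirroring `imm2.reg.data.f32 * exp2f(postFactor)`.
float effective_immediate(float imm, int postFactor) {
    return imm * std::exp2(static_cast<float>(postFactor));
}

// Small exponents keep every value exactly representable in float.
void check_effective_immediate() {
    assert(effective_immediate(1.5f, 0) == 1.5f);
    assert(effective_immediate(1.5f, 2) == 6.0f);
    assert(effective_immediate(3.0f, -1) == 1.5f);
}
```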

[PATCH] nv50/ir: use unordered_set instead of list to keep our instructions in uses

2014 Jul 08

...eephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index c162ac4..8d052c5 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -686,7 +686,7 @@ ConstantFolding::tryCollapseChainedMULs(Instruction *mul2,
 // b = mul a, imm
 // d = mul b, c -> d = mul_x_imm a, c
 int s2, t2;
- insn = mul2->getDef(0)->uses.front()->getInsn();
+ insn = (*mul2->getDef(0)->uses.begin())->getInsn();
 if (!insn) return;
 mul1 = mul2;
diff --git a/sr...

[LLVMdev] folding x * 0 = 0

2010 Mar 03

...f; return cos(0.5) * sin(0.5) * x; }; after compiling it with clang (cpp mode) and renaming _ZSt3sinf to sin and _ZSt3cosf to cos I get the following:

  define float @_Z3fooff(float %a, float %b) nounwind {
  entry:
    %mul = fmul float %a, %b ; <float> [#uses=1]
    %mul2 = fmul float %mul, 0.000000e+000 ; <float> [#uses=1]
    %mul6 = fmul float 0x3FDAED54A0000000, %mul2 ; <float> [#uses=1]
    ret float %mul6
  }

the sin and cos calls are folded, but not the mul by zero. Maybe this is missing in llvm::ConstantFoldInstOperands in ConstantFoldi...
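One likely reason the multiplication by zero survives, which the truncated snippet does not state, is that under IEEE 754 arithmetic `x * 0.0` is not always `+0.0`: a negative `x` gives `-0.0`, and an infinite or NaN `x` gives NaN. The sketch below illustrates this; the function names are hypothetical, and it assumes the code is compiled without fast-math style options.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Returns true only when x * 0.0f is exactly positive zero
// (not NaN and not negative zero).
bool mul_by_zero_is_plus_zero(float x) {
    const float r = x * 0.0f;
    return r == 0.0f && !std::signbit(r);
}

// Cases where replacing x * 0.0f by +0.0f would change the result.
void check_mul_by_zero() {
    assert(mul_by_zero_is_plus_zero(2.0f));
    assert(!mul_by_zero_is_plus_zero(-2.0f));  // result is -0.0f
    assert(!mul_by_zero_is_plus_zero(std::numeric_limits<float>::infinity()));
    assert(!mul_by_zero_is_plus_zero(std::numeric_limits<float>::quiet_NaN()));
}
```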

[RFC] A new multidimensional array indexing intrinsic

2019 Jul 23

...on-integer for illustration purposes: > > %arrayidx = call i64 @llvm.multidim.array.index.i64.p0f64.i64.i64.i64.i64 double* %A, i64 %str_1, i64 %idx_1, i64 %str_2, i64 %idx_2 > > According to the RFC, that would get lowered to this: > > %mul1 = mul nsw i64 %str_1, %idx_1 > %mul2 = mul nsw i64 %str_2, %idx_2 > %total = add nsw i64 %mul2, %mul1 > %arrayidx = getelementptr inbounds double, double* %A, i64 %total, !multidim !1 > > The problem I'm having is that the source element type in the GEP instruction in the lowering can only be inferred from the poi...

Data structure improvement for the SLP vectorizer

2017 Mar 15

...%element33 = getelementptr inbounds double, double* %data3, i32 3
%load30 = load double, double* %data3
%load31 = load double, double* %element31
%load32 = load double, double* %element32
%load33 = load double, double* %element33
%mul1 = fmul fast double %load20, %load10
%mul2 = fmul fast double %load21, %load11
%mul3 = fmul fast double %load22, %load10
%mul4 = fmul fast double %load23, %load11
%add1 = fadd fast double %load30, %mul1
%add2 = fadd fast double %load31, %mul2
%add3 = fadd fast double %load32, %mul3
%add4 = fadd fast double %load33,...

[LLVMdev] constant folding for standard math functions

2010 Mar 01

Hi! I'd like to replace all calls to standard math functions (e.g. sin(0.5)) by their result. What strategy do you recommend? Should I write a pass that does only this or should I copy and modify the SCCP pass? A problem with an extra pass could be that I need to alternate my pass and SCCP several times since the results of the math functions could be folded again. -Jochen

[LLVMdev] constant folding for standard math functions

2010 Mar 01

On Mar 1, 2010, at 9:44 AM, Jochen Wilhelmy wrote: > Hi! > > I'd like to replace all calls to standard math functions (e.g. sin(0.5)) by > their result. > What strategy do you recommend? > Should I write a pass that does only this or should I copy and > modify the SCCP pass? > > A problem with an extra pass could be that I need to alternate > my pass and

[LLVMdev] folding x * 0 = 0

2010 Mar 05

...}; > > after compiling it with clang (cpp mode) and renaming _ZSt3sinf to sin > and _ZSt3cosf to cos I get the following: > > define float @_Z3fooff(float %a, float %b) nounwind { > entry: > %mul = fmul float %a, %b ;<float> [#uses=1] > %mul2 = fmul float %mul, 0.000000e+000 ;<float> [#uses=1] > %mul6 = fmul float 0x3FDAED54A0000000, %mul2 ;<float> [#uses=1] > ret float %mul6 > } > > the sin and cos calls are folded, but not the mul by zero. > May be this is missing in llvm::ConstantFo...

[RFC] A new multidimensional array indexing intrinsic

2019 Jul 22

...> ``` > > > %arrayidx = llvm.multidim.array.index.* i64 i64* %A, %str_1, %idx_1, %str_2, %idx_2 > > > ``` > > > > > > is lowered to: > > > > > > ``` > > > %mul1 = mul nsw i64 %str_1, %idx_1 > > > %mul2 = mul nsw i64 %str_2, %idx_2 > > > %total = add nsw i64 %mul2, %mul1 > > > %arrayidx = getelementptr inbounds i64, i64* %A, i64 %total, !multidim !1 > > > ``` > > > with guarantees that the first term in each multiplication is the stride > >...

Data structure improvement for the SLP vectorizer

2017 Mar 15

There was some discussion of this on the llvm-commits list, but I wanted to raise the topic for discussion here. The background of the -commits discussion was that r296863 added the ability to sort memory access when the SLP vectorizer reached a load (the SLP vectorizer starts at a store or some other sink, and tries to go up the tree vectorizing as it goes along - if the input is in a different

[RFC] A new multidimensional array indexing intrinsic

2019 Jul 21

...d by the lowered GEP is guaranteed to be in a canonical form which allows the analysis to infer stride and index sizes. A multidim index of the form:

```
%arrayidx = llvm.multidim.array.index.* i64 i64* %A, %str_1, %idx_1, %str_2, %idx_2
```

is lowered to:

```
%mul1 = mul nsw i64 %str_1, %idx_1
%mul2 = mul nsw i64 %str_2, %idx_2
%total = add nsw i64 %mul2, %mul1
%arrayidx = getelementptr inbounds i64, i64* %A, i64 %total, !multidim !1
```

with guarantees that the first term in each multiplication is the stride and the second term in each multiplication is the index. (What happens if intermedia...
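The lowered arithmetic can be mirrored in ordinary C++ to see what address the GEP computes. This is a sketch of the two-dimensional case only, not the RFC's implementation: the function names are hypothetical, the variable names follow the IR above, and it assumes the products and the sum do not overflow (as the `nsw` flags assert).

```cpp
#include <cassert>
#include <cstdint>

// Element offset for a two-dimensional access: each stride is multiplied
// by its index (stride first, index second), then the terms are summed.
std::int64_t multidim_offset(std::int64_t str_1, std::int64_t idx_1,
                             std::int64_t str_2, std::int64_t idx_2) {
    const std::int64_t mul1 = str_1 * idx_1;
    const std::int64_t mul2 = str_2 * idx_2;
    return mul2 + mul1;
}

// Counterpart of the getelementptr: base pointer plus element offset.
const std::int64_t* multidim_element(const std::int64_t* A,
                                     std::int64_t str_1, std::int64_t idx_1,
                                     std::int64_t str_2, std::int64_t idx_2) {
    return A + multidim_offset(str_1, idx_1, str_2, idx_2);
}

// A 3x4 row-major array has row stride 4 and column stride 1.
void check_multidim_offset() {
    std::int64_t data[12];
    for (int i = 0; i < 12; ++i) data[i] = i;
    assert(multidim_offset(4, 2, 1, 3) == 11);
    assert(*multidim_element(data, 4, 2, 1, 3) == 11);
}
```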

[RFC] A new multidimensional array indexing intrinsic

2019 Jul 22

...stride and index sizes. >> >> A multidim index of the form: >> ``` >> %arrayidx = llvm.multidim.array.index.* i64 i64* %A, %str_1, %idx_1, %str_2, %idx_2 >> ``` >> >> is lowered to: >> >> ``` >> %mul1 = mul nsw i64 %str_1, %idx_1 >> %mul2 = mul nsw i64 %str_2, %idx_2 >> %total = add nsw i64 %mul2, %mul1 >> %arrayidx = getelementptr inbounds i64, i64* %A, i64 %total, !multidim !1 >> ``` >> with guarantees that the first term in each multiplication is the stride >> and the second term in each multiplicat...

[RFC] A new multidimensional array indexing intrinsic

2019 Jul 22

...to infer stride and index sizes. > > A multidim index of the form: > ``` > %arrayidx = llvm.multidim.array.index.* i64 i64* %A, %str_1, %idx_1, %str_2, %idx_2 > ``` > > is lowered to: > > ``` > %mul1 = mul nsw i64 %str_1, %idx_1 > %mul2 = mul nsw i64 %str_2, %idx_2 > %total = add nsw i64 %mul2, %mul1 > %arrayidx = getelementptr inbounds i64, i64* %A, i64 %total, !multidim !1 > ``` > with guarantees that the first term in each multiplication is the stride > and the second term in each multiplicat...

[RFC] A new multidimensional array indexing intrinsic

2019 Jul 25

...> A multidim index of the form: >>> ``` >>> %arrayidx = llvm.multidim.array.index.* i64 i64* %A, %str_1, %idx_1, %str_2, %idx_2 >>> ``` >>> >>> is lowered to: >>> >>> ``` >>> %mul1 = mul nsw i64 %str_1, %idx_1 >>> %mul2 = mul nsw i64 %str_2, %idx_2 >>> %total = add nsw i64 %mul2, %mul1 >>> %arrayidx = getelementptr inbounds i64, i64* %A, i64 %total, !multidim !1 >>> ``` >>> with guarantees that the first term in each multiplication is the stride >>> and the second term...

Replacing a function from one module into another one

2018 Sep 06

Hi Philip, The error happens when the program finishes and it automatically calls the destructors, so it is not an error specifically inside my program. Here's the full code: #include "llvm/ExecutionEngine/ExecutionEngine.h" #include "llvm/ExecutionEngine/MCJIT.h" #include "llvm/IRReader/IRReader.h" #include "llvm/Support/TargetSelect.h" #include
