thr3ads.net - search: "conv2"

Displaying 20 results from an estimated 39 matches for "conv2".


InstCombine wrongful (?) optimization on BinOp with SameOperands

2015 Sep 30


...re forwarding it to the backend I develop for my company and while building define i32 @test_extract_subreg_func(i32 %x, i32 %y) #0 { entry: %conv = zext i32 %x to i64 %conv1 = zext i32 %y to i64 %mul = mul nuw i64 %conv1, %conv %shr = lshr i64 %mul, 32 %xor = xor i64 %shr, %mul %conv2 = trunc i64 %xor to i32 ret i32 %conv2 } I came upon the following optimization (during instcombine): *IC: Visiting: %mul = mul nuw i64 %conv, %conv1 IC: Visiting: %shr = lshr i64 %mul, 32 IC: Visiting: %conv2 = trunc i64 %shr to i32 IC: Visiting: %conv3 = trunc i64 %mul to i32 IC: Visi...

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

2012 Dec 20


...'s really happening. Ignore my previous statements concerning %add :) Again, given: 05: for.body: ; preds = %entry, %for.body 06: %j.04 = phi i32 [ 0, %entry ], [ %inc, %for.body ] 07: %result.03 = phi i32 [ 0, %entry ], [ %add, %for.body ] 08: %conv2 = and i32 %result.03, 255 09: %add = add nsw i32 %conv2, 3 10: %inc = add nsw i32 %j.04, 1 11: %cmp = icmp slt i32 %inc, 8000 12: br i1 %cmp, label %for.body, label %for.end LLVM executes the following: 01: createSCEV(%conv2 = and i32 %result.03, 255) 02: calls getSCEV(%result.03)...

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

2012 Dec 21


...ous statements concerning %add :) > > Again, given: > > 05: for.body: ; preds = %entry, > %for.body > 06: %j.04 = phi i32 [ 0, %entry ], [ %inc, %for.body ] > 07: %result.03 = phi i32 [ 0, %entry ], [ %add, %for.body ] > 08: %conv2 = and i32 %result.03, 255 > 09: %add = add nsw i32 %conv2, 3 > 10: %inc = add nsw i32 %j.04, 1 > 11: %cmp = icmp slt i32 %inc, 8000 > 12: br i1 %cmp, label %for.body, label %for.end > > LLVM executes the following: > > 01: createSCEV(%conv2 = and i32 %result.03,...

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

2012 Dec 10


...01: define signext i8 @foo() nounwind readnone { 02: entry: 03: br label %for.body 04: 05: for.body: ; preds = %entry, %for.body 06: %j.04 = phi i32 [ 0, %entry ], [ %inc, %for.body ] 07: %result.03 = phi i32 [ 0, %entry ], [ %add, %for.body ] 08: %conv2 = and i32 %result.03, 255 09: %add = add nsw i32 %conv2, 3 10: %inc = add nsw i32 %j.04, 1 11: %cmp = icmp slt i32 %inc, 8000 12: br i1 %cmp, label %for.body, label %for.end 13: 14: for.end: ; preds = %for.body 15: %conv1 = trunc i32 %add to i8 16:...
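The loop in this excerpt can be simulated directly to see what value the induction expression takes. The sketch below is a minimal Python model of lines 05 to 15 of the quoted IR. It assumes the function returns `%conv1`, which the truncated excerpt does not show, and the `trip_count` parameter is a hypothetical addition for illustration.

```python
def foo(trip_count=8000):
    """Model of the quoted loop; returns the low 8 bits of %add (unsigned)."""
    # The IR loop is bottom-tested, so this model assumes trip_count >= 1.
    result = 0              # %result.03 starts at 0 on entry
    add = 0
    for _ in range(trip_count):
        conv2 = result & 255    # %conv2 = and i32 %result.03, 255
        add = conv2 + 3         # %add = add nsw i32 %conv2, 3
        result = add            # fed back through the phi
    return add & 0xFF           # %conv1 = trunc i32 %add to i8

def closed_form(trip_count=8000):
    """The same value as an 8-bit recurrence with start 0 and step 3."""
    return (3 * trip_count) % 256

if __name__ == "__main__":
    print(foo(), closed_form())
```

The masking with 255 makes the recurrence wrap at 8 bits, so the low byte after n iterations is 3n mod 256 even though `%add` itself is an i32.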

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

2012 Dec 18


On Tue, Dec 18, 2012 at 9:56 AM, Matthew Curtis <mcurtis at codeaurora.org> wrote: > > Here's how I'm evaluating the expression (in my head): > > 00: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[n],3), i8), i32),3) > | > 01: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[0],3), i8), i32),3) >

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

2012 Dec 18


...>> 03: br label %for.body >> 04: >> 05: for.body: ; preds = %entry, >> %for.body >> 06: %j.04 = phi i32 [ 0, %entry ], [ %inc, %for.body ] >> 07: %result.03 = phi i32 [ 0, %entry ], [ %add, %for.body ] >> 08: %conv2 = and i32 %result.03, 255 >> 09: %add = add nsw i32 %conv2, 3 >> 10: %inc = add nsw i32 %j.04, 1 >> 11: %cmp = icmp slt i32 %inc, 8000 >> 12: br i1 %cmp, label %for.body, label %for.end >> 13: >> 14: for.end: ; pre...

[LLVMdev] Aliasing bug or feature?

2012 Mar 01


...ign extends *** define void @test() nounwind { entry: store i8 0, i8* @s, align 1, !tbaa !0 %0 = load i8** @p, align 4, !tbaa !2 %1 = load i8* %0, align 1, !tbaa !0 %conv = zext i8 %1 to i32 %arrayidx1 = getelementptr inbounds i8* %0, i32 1 %2 = load i8* %arrayidx1, align 1, !tbaa !0 %conv2 = zext i8 %2 to i32 %3 = load i8** @q, align 4, !tbaa !2 <<< Can this load be bypassed by the store below? %4 = load i8* %3, align 1, !tbaa !0 %conv5 = zext i8 %4 to i32 %add = add i32 %conv2, %conv %add7 = add i32 %add, %conv5 %conv8 = trunc i32 %add7 to i8 store i8 %conv8,...

[LLVMdev] How to vectorize a vector type cast?

2012 Feb 28


...(i32 %in.coerce) nounwind uwtable readnone { entry: %0 = bitcast i32 %in.coerce to <4 x i8> %1 = extractelement <4 x i8> %0, i32 0 %conv = uitofp i8 %1 to float %vecinit = insertelement <4 x float> undef, float %conv, i32 0 %2 = extractelement <4 x i8> %0, i32 1 %conv2 = uitofp i8 %2 to float %vecinit3 = insertelement <4 x float> %vecinit, float %conv2, i32 1 %3 = extractelement <4 x i8> %0, i32 2 %conv4 = uitofp i8 %3 to float %vecinit5 = insertelement <4 x float> %vecinit3, float %conv4, i32 2 %4 = extractelement <4 x i8> %0, i...

[LLVMdev] Aliasing bug or feature?

2012 Mar 01


...() nounwind { > entry: > store i8 0, i8* @s, align 1, !tbaa !0 > %0 = load i8** @p, align 4, !tbaa !2 > %1 = load i8* %0, align 1, !tbaa !0 > %conv = zext i8 %1 to i32 > %arrayidx1 = getelementptr inbounds i8* %0, i32 1 > %2 = load i8* %arrayidx1, align 1, !tbaa !0 > %conv2 = zext i8 %2 to i32 > %3 = load i8** @q, align 4, !tbaa !2 <<< Can this load be bypassed by the > store below? > %4 = load i8* %3, align 1, !tbaa !0 > %conv5 = zext i8 %4 to i32 > %add = add i32 %conv2, %conv > %add7 = add i32 %add, %conv5 > %conv8 = trunc i32 %a...

[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors

2020 Jan 11


...th PPC). But what I am proposing here is actually handling something like this: define dso_local <2 x double> @test(<2 x i64> %a) { entry: %vecext = extractelement <2 x i64> %a, i32 0 %vecext1 = extractelement <2 x i64> %a, i32 1 %conv = sitofp i64 %vecext to double %conv2 = sitofp i64 %vecext1 to double %vecinit = insertelement <2 x double> undef, double %conv, i32 0 %vecinit3 = insertelement <2 x double> %vecinit, double %conv2, i32 1 ret <2 x double> %vecinit3 } With this type conversion, InstCombine will actually simplify this as expected....

smoothing 2D vector field

2009 Feb 20


Hi all, is there a function / package in R that provides a function like Matlab's conv2 or filter2 for smoothing a vector- / velocity- field. I unfortunately could not find anything. Thanks a lot.
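For readers unfamiliar with what a conv2-style smoother does, the operation asked about can be sketched as a 2D convolution applied to each component of the field separately. The code below is a minimal pure-Python illustration, not an R answer and not a reimplementation of Matlab's function. The names `conv2_same` and `smooth_field`, the zero padding at the borders, and the odd-sized box kernel are all assumptions made for this example.

```python
def conv2_same(field, kernel):
    """2D convolution with zero padding; output has the same shape as field.

    Assumes an odd-sized kernel so that it has a well-defined centre.
    """
    rows, cols = len(field), len(field[0])
    krows, kcols = len(kernel), len(kernel[0])
    cr, cc = krows // 2, kcols // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for m in range(krows):
                for n in range(kcols):
                    ii, jj = i + cr - m, j + cc - n
                    # Samples outside the grid count as zero.
                    if 0 <= ii < rows and 0 <= jj < cols:
                        acc += field[ii][jj] * kernel[m][n]
            out[i][j] = acc
    return out

def smooth_field(u, v, size=3):
    """Smooth both components of a velocity field with a box kernel."""
    weight = 1.0 / (size * size)
    kernel = [[weight] * size for _ in range(size)]
    return conv2_same(u, kernel), conv2_same(v, kernel)

if __name__ == "__main__":
    u = [[1.0] * 5 for _ in range(5)]
    v = [[float(j) for j in range(5)] for _ in range(5)]
    su, sv = smooth_field(u, v)
    print(su[2][2], sv[2][2])
```

With zero padding, values near the border are pulled towards zero; a different boundary treatment may be preferable for real velocity data.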

[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)

2011 Sep 08


...t, align 4 %y.addr = alloca float, align 4 store float* %result, float** %result.addr, align 8 store float %x, float* %x.addr, align 4 store float %y, float* %y.addr, align 4 %tmp = load float* %x.addr, align 4 %conv = fpext float %tmp to double %tmp1 = load float* %y.addr, align 4 %conv2 = fpext float %tmp1 to double %div = fdiv double %conv, %conv2 %conv3 = fptrunc double %div to float %tmp4 = load float** %result.addr, align 8 store float %conv3, float* %tmp4 ret void } ----- With optimisations turned on: ----- define void @dpdiv(float* nocapture %result, float %x, fl...

[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors

2020 Jan 11


...ling something like this: >> define dso_local <2 x double> @test(<2 x i64> %a) { >> entry: >> %vecext = extractelement <2 x i64> %a, i32 0 >> %vecext1 = extractelement <2 x i64> %a, i32 1 >> %conv = sitofp i64 %vecext to double >> %conv2 = sitofp i64 %vecext1 to double >> %vecinit = insertelement <2 x double> undef, double %conv, i32 0 >> %vecinit3 = insertelement <2 x double> %vecinit, double %conv2, i32 1 >> ret <2 x double> %vecinit3 >> } >> With this type conversion, InstCom...

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

2012 Dec 17


...readnone { > 02: entry: > 03: br label %for.body > 04: > 05: for.body: ; preds = %entry, > %for.body > 06: %j.04 = phi i32 [ 0, %entry ], [ %inc, %for.body ] > 07: %result.03 = phi i32 [ 0, %entry ], [ %add, %for.body ] > 08: %conv2 = and i32 %result.03, 255 > 09: %add = add nsw i32 %conv2, 3 > 10: %inc = add nsw i32 %j.04, 1 > 11: %cmp = icmp slt i32 %inc, 8000 > 12: br i1 %cmp, label %for.body, label %for.end > 13: > 14: for.end: ; preds = %for.body > 15:...

Lowering ISD::TRUNCATE

2018 Aug 06


...entry: %val1.addr = alloca i8, align 1 store i8 %val1, i8* %val1.addr, align 1 %0 = load i8, i8* %val1.addr, align 1 %conv = zext i8 %0 to i16 %1 = load i8, i8* %val1.addr, align 1 %conv1 = zext i8 %1 to i16 %add = add nsw i16 %conv, %conv1 %conv2 = trunc i16 %add to i8 ret i8 %conv2 } I looked into the X86 backend, which has a Z80-like register design, i.e. being able to access the subregs AL (and AH) from AX directly, without any specific truncation operation necessary. But, to be honest, I do not really understand from the...

[LLVMdev] Multiply i8 operands promotes to i32

2012 Oct 08


...or MUL_I16 in order to do the correct lowering? Thanks in advance, Pedro P.S: I add C code and corresponding LLVM code. C code: void (const u_int16_t in_data, u_int16_t* out) { u_int8_t kk = in_data&0xFF; u_int16_t kk16 = kk * kk; *out = kk16; } LLVM: %1 = load i8* %kk, align 1 %conv2 = zext i8 %1 to i32 %2 = load i8* %kk, align 1 %conv3 = zext i8 %2 to i32 %mul = mul nsw i32 %conv2, %conv3 %conv4 = trunc i32 %mul to i16 store i16 %conv4, i16* %kk16, align 2 -- Pedro Malagón - Profesor ayudante 91 549 57 00 - ext. 4220 Departamento de Ingeniería Electrónica Escuela T...
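The arithmetic behind the question can be checked by brute force: for 8-bit operands, multiplying at 32 bits and truncating to 16 bits gives the same result as multiplying at 16 bits. The sketch below is a minimal Python model of the quoted IR under that reading. The function names are hypothetical, and it says nothing about how a particular backend should implement the lowering.

```python
def mul_via_i32(kk):
    """Model of the quoted IR: zext i8 to i32, mul, trunc to i16."""
    a = kk & 0xFF                    # zext i8 %1 to i32
    mul = (a * a) & 0xFFFFFFFF       # mul i32; a*a <= 65025, so no wrap
    return mul & 0xFFFF              # trunc i32 %mul to i16

def mul_via_i16(kk):
    """Assumed alternative: zext i8 to i16 and multiply at 16 bits."""
    a = kk & 0xFF
    return (a * a) & 0xFFFF

if __name__ == "__main__":
    print(all(mul_via_i32(k) == mul_via_i16(k) for k in range(256)))
```

The largest product, 255 * 255 = 65025, fits in 16 bits, so no information is lost at either width.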

Remove zext-unfolding from InstCombine

2016 Jul 27


...generates for `foo` and `goo` just before they are passed to InstCombine: ``` define signext i8 @foo_before_InstCombine(i8 signext %a, i8 signext %b) local_unnamed_addr #0 { entry: %conv = sext i8 %a to i32 %and = and i32 %conv, 1 %cmp = icmp eq i32 %and, 0 %conv1 = zext i1 %cmp to i32 %conv2 = sext i8 %b to i32 %cmp3 = icmp eq i32 %conv2, 0 %conv4 = zext i1 %cmp3 to i32 %or = or i32 %conv1, %conv4 %conv5 = trunc i32 %or to i8 ret i8 %conv5 } ; Function Attrs: nounwind ssp uwtable define signext i8 @goo_before_InstCombine(i8 signext %a, i8 signext %b) local_unnamed_addr #0 {...

[LLVMdev] Related constant folding of floating point values

2013 Oct 09

...d float* %a, align 4 %conv = fpext float %0 to double %sub = fsub double %conv, 8.100000e+00 %cmp = fcmp oge double %sub, 0x3E8000000102F4FD br i1 %cmp, label %if.then, label %lor.lhs.false lor.lhs.false: ; preds = %entry %1 = load float* %a, align 4 %conv2 = fpext float %1 to double %sub3 = fsub double %conv2, 8.100000e+00 %cmp4 = fcmp ole double %sub3, 0xBE8000000102F4FD br i1 %cmp4, label %if.then, label %if.else ... during the transformation the %conv is replaced with "double 0x4020333340000000" and then the result of comparison i...

Speedups with Ra and jit

2008 May 02


...s is using Ra with R-2.7.0. > conv1 <- function(a, b) { > ### with Ra and jit require(jit) jit(1) ab <- numeric(length(a)+length(b)-1) for(i in 1:length(a)) for(j in 1:length(b)) ab[i+j-1] <- ab[i+j-1] + a[i]*b[j] ab } > > conv2 <- function(a, b) { > ### with just Ra ab <- numeric(length(a)+length(b)-1) for(i in 1:length(a)) for(j in 1:length(b)) ab[i+j-1] <- ab[i+j-1] + a[i]*b[j] ab } > > x <- 1:2000 > y <- 1:500 > system.time(tst1 <- conv1(x, y))...
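The two R functions in this excerpt share the same double loop, a direct 1D convolution of `a` and `b`. For reference, the same loop can be sketched in Python as below. The name `conv` is hypothetical, indices are shifted to 0-based, and the sketch makes no claim about the timings discussed in the post.

```python
def conv(a, b):
    """Direct 1D convolution: ab[i + j] accumulates a[i] * b[j]."""
    ab = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            # R's ab[i+j-1] with 1-based i, j becomes ab[i+j] here.
            ab[i + j] += ai * bj
    return ab

if __name__ == "__main__":
    print(conv([1, 2, 3], [1, 1]))
```

The cost grows with `len(a) * len(b)`, which is why an interpreted loop of this shape is a natural benchmark for a JIT.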

[LLVMdev] Folding an insertelt chain

2012 Feb 17


On Feb 17, 2012, at 12:50 AM, Ivan Llopard wrote: > Hello, > > I've added a little combining operation in DAGCombiner to fold a chain of insertelt nodes if that chain is proved to fully overwrite the very first source vector. In which case, I supposed a build_vector is better. It seems to be safe but I don't know if it is correctly implemented or if it is already done somewhere
