thr3ads.net - search: "conv3"

Displaying 14 results from an estimated 14 matches for "conv3".

[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)

2011 Sep 08

...** %result.addr, align 8 store float %x, float* %x.addr, align 4 store float %y, float* %y.addr, align 4 %tmp = load float* %x.addr, align 4 %conv = fpext float %tmp to double %tmp1 = load float* %y.addr, align 4 %conv2 = fpext float %tmp1 to double %div = fdiv double %conv, %conv2 %conv3 = fptrunc double %div to float %tmp4 = load float** %result.addr, align 8 store float %conv3, float* %tmp4 ret void } ----- With optimisations turned on: ----- define void @dpdiv(float* nocapture %result, float %x, float %y) nounwind uwtable { entry: %conv3 = fdiv float %x, %y store flo...
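The optimised IR in this snippet replaces the widen, divide, truncate sequence (fpext, fdiv double, fptrunc) with a single float division. A minimal C sketch of why the two forms can agree, assuming IEEE 754 binary32 and binary64 arithmetic with correctly rounded division; the helper names below are hypothetical and do not appear in the thread:

```c
#include <stddef.h>

/* Hypothetical helper: mirrors the unoptimised IR
   (fpext both operands, fdiv double, fptrunc the result). */
static float div_via_double(float x, float y) {
    return (float)((double)x / (double)y);
}

/* Hypothetical helper: mirrors the optimised IR (a single fdiv float). */
static float div_direct(float x, float y) {
    return x / y;
}

/* Counts sample pairs for which the two forms disagree.
   The sample values are arbitrary illustrations, not data from the thread. */
static int count_mismatches(void) {
    static const float xs[] = {1.0f, 2.0f, 10.0f, 0.1f, 3.14159f};
    static const float ys[] = {3.0f, 7.0f, 0.3f, 9.0f, 1e-3f};
    int mismatches = 0;
    for (size_t i = 0; i < sizeof xs / sizeof xs[0]; ++i) {
        for (size_t j = 0; j < sizeof ys / sizeof ys[0]; ++j) {
            if (div_via_double(xs[i], ys[j]) != div_direct(xs[i], ys[j])) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

Calling count_mismatches() reports how many of the sample pairs differ between the two forms. The sketch says nothing about relaxed-accuracy division, which is the topic of the proposal itself.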

InstCombine wrongful (?) optimization on BinOp with SameOperands

2015 Sep 30

...%xor = xor i64 %shr, %mul %conv2 = trunc i64 %xor to i32 ret i32 %conv2 } I came upon the following optimization (during instcombine): *IC: Visiting: %mul = mul nuw i64 %conv, %conv1 IC: Visiting: %shr = lshr i64 %mul, 32 IC: Visiting: %conv2 = trunc i64 %shr to i32 IC: Visiting: %conv3 = trunc i64 %mul to i32 IC: Visiting: %xor = xor i32 %conv3, %conv2 IC: ADD: %xor6 = xor i64 %mul, %shr IC: Old = %xor = xor i32 %conv3, %conv2 New = <badref> = trunc i64 %xor6 to i32 * which seems to be performed by SDValue DAGCombiner::SimplifyBinOpWithSameOpcodeHands(SDNode *...
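The log in this snippet shows an xor of two truncated values (the "Old" form) being rewritten as a single truncation of a 64-bit xor (the "New" form). A minimal C sketch of the underlying identity, that truncation distributes over xor; the function names are hypothetical, and treating %mul as a product of two zero-extended 32-bit inputs is an assumption, since the snippet is truncated before %conv and %conv1 are defined:

```c
#include <stdint.h>

/* Mirrors the "Old" form: truncate each 64-bit operand, then xor in 32 bits. */
static uint32_t xor_of_truncs(uint64_t mul, uint64_t shr) {
    return (uint32_t)mul ^ (uint32_t)shr;
}

/* Mirrors the "New" form: xor in 64 bits, then truncate once. */
static uint32_t trunc_of_xor(uint64_t mul, uint64_t shr) {
    return (uint32_t)(mul ^ shr);
}

/* Hypothetical driver: builds mul and shr the way the IR names suggest
   (assumption: both inputs are zero-extended 32-bit values) and returns 1
   when the two forms agree, 0 otherwise. */
static int forms_agree(uint32_t a, uint32_t b) {
    uint64_t mul = (uint64_t)a * (uint64_t)b;
    uint64_t shr = mul >> 32;
    return xor_of_truncs(mul, shr) == trunc_of_xor(mul, shr);
}
```

Because xor works bit by bit, the low 32 bits of the 64-bit result depend only on the low 32 bits of each operand, so forms_agree returns 1 for every input pair.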

[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)

2011 Sep 08

Peter, Is there a way to make this flag globally available? Metadata can be fairly expensive to handle at each node when in many cases it is a global flag and not a per operation flag. > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Robert Quill > Sent: Thursday, September 08, 2011 3:24 AM > To: Peter

[LLVMdev] Multiply i8 operands promotes to i32

2012 Oct 08

...advance, Pedro P.S: I add C code and corresponding LLVM code. C code: void (const u_int16_t in_data, u_int16_t* out) { u_int8_t kk = in_data&0xFF; u_int16_t kk16 = kk * kk; *out = kk16; } LLVM: %1 = load i8* %kk, align 1 %conv2 = zext i8 %1 to i32 %2 = load i8* %kk, align 1 %conv3 = zext i8 %2 to i32 %mul = mul nsw i32 %conv2, %conv3 %conv4 = trunc i32 %mul to i16 store i16 %conv4, i16* %kk16, align 2 -- Pedro Malagón - Profesor ayudante 91 549 57 00 - ext. 4220 Departamento de Ingeniería Electrónica Escuela Técnica Superior de Ingenieros de Telecomunicación Universi...
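The zext-to-i32, mul, trunc-to-i16 sequence in this snippet matches C's integer promotion rules: operands narrower than int are promoted to int before the multiplication, and the result is converted back on assignment. A minimal sketch of the same computation, assuming a platform where int is 32 bits; the function name square_low_byte is hypothetical (the function in the post is unnamed), the sketch returns the value instead of writing through a pointer, and it uses the standard <stdint.h> types in place of u_int8_t and u_int16_t:

```c
#include <stdint.h>

/* Hypothetical name and signature: the post's function is unnamed and
   stores its result through an out pointer. */
static uint16_t square_low_byte(uint16_t in_data) {
    uint8_t kk = in_data & 0xFF;
    /* Both uint8_t operands are promoted to int, so the multiply is done
       at int width (the zext and mul i32 in the IR). */
    uint16_t kk16 = kk * kk;
    /* The assignment converts the int result to 16 bits (the trunc). */
    return kk16;
}

/* Size in bytes of the type the multiplication is actually carried out in. */
static unsigned promoted_product_size(void) {
    uint8_t a = 1, b = 1;
    return (unsigned)sizeof(a * b);
}
```

For example, square_low_byte(0x12FF) keeps only the low byte 0xFF and returns 255 * 255 = 65025, which still fits in 16 bits.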

[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)

2011 Sep 08

Hi Peter, This sounds like a really good idea. One thing that did occur to me though from an OpenCL point of view is that ULP accuracy requirements can differ for embedded and full profile so that may need to be handled somehow. Thanks, Rob On Wed, 2011-09-07 at 21:55 +0100, Peter Collingbourne wrote: > Hi, > > This is my proposal to add floating point accuracy support to LLVM. >

Speedups with Ra and jit

2008 May 02

...0.00 0.55 > system.time(tst2 <- conv2(x, y)) user system elapsed 9.49 0.00 9.56 > all.equal(tst1, tst2) [1] TRUE > > 9.56/0.55 [1] 17.38182 > However for this example you can achieve speed-ups like that or better just using vectorised code intelligently: > conv3 <- local({ conv <- function(a, b, na, nb) { r <- numeric(na + nb -1) ij <- 1:nb for(e in a) { r[ij] <- r[ij] + e*b ij <- ij + 1 } r } function(a, b) { na <- length(a) nb <- len...

[LLVMdev] Promoting i16 load to i32

2011 Feb 07

...oca i16, align 2 %y.addr = alloca i16, align 2 store i16 %x, i16* %x.addr, align 2 store i16 %y, i16* %y.addr, align 2 %tmp = load i16* %x.addr, align 2 %conv = zext i16 %tmp to i32 %tmp1 = load i16* %y.addr, align 2 %conv2 = zext i16 %tmp1 to i32 %add = add nsw i32 %conv, %conv2 %conv3 = trunc i32 %add to i16 ret i16 %conv3 } Upon compiling I get this failed assertion: llc: LegalizeDAG.cpp:1309: llvm::SDValue<unnamed>::SelectionDAGLegalize::LegalizeOp(llvm::SDValue): Assertion `0 && "This action is not supported yet!"' failed. I initially expected...

[LLVMdev] What's the Alias Analysis does clang use ?

2013 Nov 11

...lign 4 %6 = load float* %arrayidx1, align 4 store float %6, float* %y, align 4 %7 = load float* %arrayidx2, align 4 store float %7, float* %z, align 4 %8 = load float* %x, align 4 %conv = fpext float %8 to double %mul = fmul double %conv, 6.700000e-01 %9 = load float* %y, align 4 %conv3 = fpext float %9 to double %mul4 = fmul double %conv3, 1.700000e-01 %add = fadd double %mul, %mul4 %10 = load float* %z, align 4 %conv5 = fpext float %10 to double %mul6 = fmul double %conv5, 1.600000e-01 %add7 = fadd double %add, %mul6 %conv8 = fptrunc double %add7 to float store f...

[LLVMdev] Unnecessary moves after sign-extension in 2-address target

2009 Apr 20

...{ return (signed char) a + (signed short) b + c; } I get this IR: define i32 @sext(i32 %a, i32 %b, i32 %c) nounwind readnone { entry: %conv = trunc i32 %a to i8 ; <i8> [#uses=1] %conv1 = sext i8 %conv to i32 ; <i32> [#uses=1] %conv3 = trunc i32 %b to i16 ; <i16> [#uses=1] %conv4 = sext i16 %conv3 to i32 ; <i32> [#uses=1] %add = add i32 %conv1, %c ; <i32> [#uses=1] %add6 = add i32 %add, %conv4 ; <i32> [#uses=1] ret i32 %add6 } And this not-...

[LLVMdev] What's the Alias Analysis does clang use ?

2013 Nov 12

...* %arrayidx1, align 4 > store float %6, float* %y, align 4 > %7 = load float* %arrayidx2, align 4 > store float %7, float* %z, align 4 > %8 = load float* %x, align 4 > %conv = fpext float %8 to double > %mul = fmul double %conv, 6.700000e-01 > %9 = load float* %y, align 4 > %conv3 = fpext float %9 to double > %mul4 = fmul double %conv3, 1.700000e-01 > %add = fadd double %mul, %mul4 > %10 = load float* %z, align 4 > %conv5 = fpext float %10 to double > %mul6 = fmul double %conv5, 1.600000e-01 > %add7 = fadd double %add, %mul6 > %conv8 = fptrunc double %ad...

[LLVMdev] global type legalization?

2010 Sep 14

Returning to an old discussion here.... On Aug 18, 2010, at 10:42 AM, Chris Lattner wrote: > On Aug 18, 2010, at 10:27 AM, Bob Wilson wrote: >>> I tend to think that it isn't worth the compile time to try to microoptimize out every compare, but I could be convinced otherwise if there are important use cases we're failing to handle. I also do think that whole-function

[LLVMdev] Bug in MachineInstr::isIdenticalTo

2011 Jan 04

...53 = extractelement <4 x i32> %format, i32 1 ; <i32> [#uses=1] switch i32 %tmp53, label %if.then [ i32 1, label %switch.case55 i32 2, label %switch.case61 ] switch.case55: ; preds = %switch.case %arrayidx = getelementptr i8 addrspace(1)* %conv3, i32 %tmp22 ; <i8 addrspace(1)*> [#uses=1] %tmp59 = extractelement <4 x i32> %9, i32 0 ; <i32> [#uses=1] %conv60 = trunc i32 %tmp59 to i8 ; <i8> [#uses=1] store i8 %conv60, i8 addrspace(1)* %arrayidx ret void switch.case61:...

[LLVMdev] Bug in MachineInstr::isIdenticalTo

2011 Jan 04

...2> %format, i32 1 ; <i32> [#uses=1] > switch i32 %tmp53, label %if.then [ > i32 1, label %switch.case55 > i32 2, label %switch.case61 > ] > switch.case55: ; preds = %switch.case > %arrayidx = getelementptr i8 addrspace(1)* %conv3, i32 %tmp22 ; <i8 addrspace(1)*> [#uses=1] > %tmp59 = extractelement <4 x i32> %9, i32 0 ; <i32> [#uses=1] > %conv60 = trunc i32 %tmp59 to i8 ; <i8> [#uses=1] > store i8 %conv60, i8 addrspace(1)* %arrayidx > ret void > switch.case61:...

[LLVMdev] LiveIntervals analysis problem

2013 Feb 14

...nv2.i.i.i.i.i = trunc i32 %or.i.i.i.i.i to i16 br label %if.end.i126.i.i.i.i if.end.i126.i.i.i.i: ; preds = %if.then.i.i.i.i.i, %for.body.i.i.i.i.i %bits.1.i.i.i.i.i = phi i16 [ %conv2.i.i.i.i.i, %if.then.i.i.i.i.i ], [ %bits.024.i.i.i.i.i, %for.body.i.i.i.i.i ] %conv3.i.i.i.i.i = zext i16 %194 to i32 %shl.i.i.i.i.i = shl nuw nsw i32 %conv3.i.i.i.i.i, 1 %conv5.i.i.i.i.i = zext i16 %bits.1.i.i.i.i.i to i32 %and6.i.i.i.i.i = lshr i32 %conv5.i.i.i.i.i, 1 %and6.lobit.i.i.i.i.i = and i32 %and6.i.i.i.i.i, 1 %storemerge.in.i.i.i.i.i = or i32 %and6.lobit.i.i.i....
