
Displaying 20 results from an estimated 28 matches for "load3".

2013 Dec 16
2
[LLVMdev] Question about Pre-RA-schedule in LLVM3.3
...uctions would be executed with fewer stalls and cycles. However, in the latest version of LLVM, Pre-RA-sched builds the scheduling graph (original graph) shown below. //original graph ----> data flow ====> control flow load1 ----> store1 ====> load2 ----> store2 ====> load3 ----> store3 //end original graph So Pre-RA-sched cannot schedule the load/store instruction pairs apart. Because of the resulting live ranges in the register allocation stage, every load/store instruction pair is allocated the same register. If we change the control flow in the above original graph, the modified...
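The C sketch below makes the two orderings concrete. It models the six nodes of the graph above and runs a toy list scheduler over them.

It assumes a hypothetical greedy heuristic that picks a ready load before a ready store. This is not LLVM's actual Pre-RA scheduler. The names build_dag, list_schedule and max_live are illustrative only.

With the store ====> load chain edges, the only legal order is load1 store1 load2 store2 load3 store3, so at most one loaded value is live at a time. Without those edges the loads can be hoisted. That hides load latency but leaves three values live at once.

```c
#include <assert.h>
#include <stdbool.h>

/* Node ids: 2k = load(k+1), 2k+1 = store(k+1), for k = 0..2. */
enum { N_PAIRS = 3, N_NODES = 2 * N_PAIRS };

static bool is_load(int n) { return n % 2 == 0; }

/* dep[a][b] == true means node b must be scheduled after node a. */
static void build_dag(bool dep[N_NODES][N_NODES], bool chain_edges) {
    for (int a = 0; a < N_NODES; a++)
        for (int b = 0; b < N_NODES; b++)
            dep[a][b] = false;
    for (int k = 0; k < N_PAIRS; k++) {
        dep[2 * k][2 * k + 1] = true;            /* data flow: load ----> store */
        if (chain_edges && k + 1 < N_PAIRS)
            dep[2 * k + 1][2 * (k + 1)] = true;  /* chain: store ====> next load */
    }
}

/* Hypothetical greedy list scheduler: at each step take the lowest-numbered
   ready load, otherwise the lowest-numbered ready store. */
static void list_schedule(bool dep[N_NODES][N_NODES], int out[N_NODES]) {
    bool done[N_NODES] = {false};
    for (int step = 0; step < N_NODES; step++) {
        int pick = -1;
        for (int n = 0; n < N_NODES; n++) {
            if (done[n]) continue;
            bool ready = true;
            for (int p = 0; p < N_NODES; p++)
                if (dep[p][n] && !done[p]) ready = false;
            if (!ready) continue;
            if (pick == -1 || (is_load(n) && !is_load(pick))) pick = n;
        }
        assert(pick != -1); /* the graph is acyclic, so some node is ready */
        done[pick] = true;
        out[step] = pick;
    }
}

/* Peak number of loaded values that are live (loaded but not yet stored). */
static int max_live(const int order[N_NODES]) {
    int live = 0, peak = 0;
    for (int i = 0; i < N_NODES; i++) {
        live += is_load(order[i]) ? 1 : -1;
        if (live > peak) peak = live;
    }
    return peak;
}
```

With chain edges the schedule is 0,1,2,3,4,5 and max_live returns 1. Without them it becomes 0,2,4,1,3,5 and max_live returns 3.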
2013 Jan 18
2
[LLVMdev] Weird volatile propagation ?
...{ i16, i16 } @addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8 define void @test(i16 zeroext %a) nounwind uwtable { %r.sroa.0 = alloca i16, align 2 %r.sroa.1 = alloca i16, align 2 store i16 %a, i16* %r.sroa.0, align 2 store i16 1, i16* %r.sroa.1, align 2 %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32 %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2 ret void } When I would have exp...
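The C sketch below shows one plausible reading of what the poster expected. A non-volatile local struct is copied into a volatile destination. Only the two 16-bit stores are volatile; the reads of the local copy are ordinary.

The original C source is not shown in the snippet, so several names here are hypothetical:
- the field names lo/hi;
- the function copy_to_device;
- the static `device` object, which stands in for the memory-mapped address 416 so the sketch can run anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors %struct.R = type { i16, i16 }; field names are hypothetical. */
struct R {
    uint16_t lo; /* field 0: address 416 in the IR */
    uint16_t hi; /* field 1: address 418 in the IR */
};

static_assert(offsetof(struct R, hi) == 2, "field 1 sits at 416 + 2 = 418");

/* Hypothetical stand-in for the device at address 416; real code would
   use (volatile struct R *)416 instead. */
static volatile struct R device;

/* Copy a non-volatile local aggregate into a volatile destination: the two
   stores through dst are volatile accesses, the reads of r are not. */
static void copy_to_device(volatile struct R *dst, uint16_t a) {
    struct R r = { a, 1 };
    dst->lo = r.lo;
    dst->hi = r.hi;
}
```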
2013 Dec 21
0
[LLVMdev] Question about Pre-RA-schedule in LLVM3.3
...with fewer stalls and cycles. > However, in the latest version of LLVM, Pre-RA-sched builds the scheduling graph (original graph) shown below. > //original graph > ----> data flow > ====> control flow > load1 ----> store1 ====> load2 ----> store2 ====> load3 ----> store3 > //end original graph > So Pre-RA-sched cannot schedule the load/store instruction pairs apart. > Because of the resulting live ranges in the register allocation stage, every load/store instruction pair is allocated the same register. > > If we change the control flow in the above ori...
2013 Jan 20
0
[LLVMdev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
...{ i16, i16 } @addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8 define void @test(i16 zeroext %a) nounwind uwtable { %r.sroa.0 = alloca i16, align 2 %r.sroa.1 = alloca i16, align 2 store i16 %a, i16* %r.sroa.0, align 2 store i16 1, i16* %r.sroa.1, align 2 %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32 %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2 ret void } The problem is in how...
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
...23 = getelementptr float* %arg5, i64 %22 %24 = bitcast float* %23 to <4 x float>* %wide.load = load <4 x float>* %24, align 16 %25 = extractelement <4 x i64> %21, i32 0 %26 = getelementptr float* %arg6, i64 %25 %27 = bitcast float* %26 to <4 x float>* %wide.load3 = load <4 x float>* %27, align 16 %28 = fadd <4 x float> %wide.load3, %wide.load %29 = extractelement <4 x i64> %21, i32 0 %30 = getelementptr float* %arg4, i64 %29 %31 = bitcast float* %30 to <4 x float>* store <4 x float> %28, <4 x float>* %31, a...
2013 Jan 28
4
[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
...{ i16, i16 } @addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8 define void @test(i16 zeroext %a) nounwind uwtable { %r.sroa.0 = alloca i16, align 2 %r.sroa.1 = alloca i16, align 2 store i16 %a, i16* %r.sroa.0, align 2 store i16 1, i16* %r.sroa.1, align 2 %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32 %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2 ret void } In the generated ir, t...
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
...ody ] %.lhs = shl i64 %6, 2 %7 = add i64 %.lhs, %index %8 = getelementptr float* %arg5, i64 %7 %9 = bitcast float* %8 to <4 x float>* %wide.load = load <4 x float>* %9, align 16 %10 = getelementptr float* %arg6, i64 %7 %11 = bitcast float* %10 to <4 x float>* %wide.load3 = load <4 x float>* %11, align 16 %12 = fadd <4 x float> %wide.load3, %wide.load %13 = getelementptr float* %arg4, i64 %7 %14 = bitcast float* %13 to <4 x float>* store <4 x float> %12, <4 x float>* %14, align 16 %index.next = add i64 %index, 4 %15 = icmp e...
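Below is a scalar C model of what the clean vector body above computes. It assumes the loop is `arg4[i] = arg5[i] + arg6[i]` over a contiguous range starting at some offset. The snippet derives that offset from `%6` with a shift that is not visible here, so `base` is a hypothetical parameter.

Each iteration of add_vf4 mirrors one vector iteration:
- two 4-wide loads (%wide.load, %wide.load3);
- a lane-wise fadd;
- one 4-wide store;
- then index += 4.

In this form only one scalar address is needed per vector. The other snippets in this thread instead build a <4 x i64> index vector and extract lane 0 from it.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: arg4[base + i] = arg5[base + i] + arg6[base + i]. */
static void add_scalar(float *arg4, const float *arg5, const float *arg6,
                       size_t base, size_t n) {
    for (size_t i = 0; i < n; i++)
        arg4[base + i] = arg5[base + i] + arg6[base + i];
}

/* Hand-written model of the VF=4 vector body. */
static void add_vf4(float *arg4, const float *arg5, const float *arg6,
                    size_t base, size_t n) {
    assert(n % 4 == 0); /* this sketch omits the scalar remainder loop */
    for (size_t index = 0; index < n; index += 4) {
        size_t idx = base + index;  /* one scalar address per vector */
        float wide_load[4], wide_load3[4], sum[4];
        for (int lane = 0; lane < 4; lane++) {
            wide_load[lane] = arg5[idx + lane];   /* %wide.load  */
            wide_load3[lane] = arg6[idx + lane];  /* %wide.load3 */
        }
        for (int lane = 0; lane < 4; lane++)      /* fadd <4 x float> */
            sum[lane] = wide_load3[lane] + wide_load[lane];
        for (int lane = 0; lane < 4; lane++)      /* 4-wide store */
            arg4[idx + lane] = sum[lane];
    }
}
```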
2013 Dec 15
0
[LLVMdev] Question about Pre-RA-schedule in LLVM3.3
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Haishan > Subject: [LLVMdev] Question about Pre-RA-schedule in LLVM3.3 > My clang version is 3.3 and debug build. > //test.c > int a[6] = {1, 2, 3, 4, 5, 6}; > int main() { >  a[0] = a[5]; >  a[1] = a[4]; >  a[2] = a[5]; > } > //end test.c > Then test.dump is
2013 Jan 20
2
[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
...t; arnaud.allarddegrandmaison at parrot.com> wrote: > define void @test(i16 zeroext %a) nounwind uwtable { > %r.sroa.0 = alloca i16, align 2 > %r.sroa.1 = alloca i16, align 2 > store i16 %a, i16* %r.sroa.0, align 2 > store i16 1, i16* %r.sroa.1, align 2 > %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 > store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), > align 32 > %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 > store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), > align 2 > Just...
2013 Jan 29
0
[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
...t %struct.R* inttoptr (i64 416 to %struct.R*), align 8 > > define void @test(i16 zeroext %a) nounwind uwtable { > %r.sroa.0 = alloca i16, align 2 > %r.sroa.1 = alloca i16, align 2 > store i16 %a, i16* %r.sroa.0, align 2 > store i16 1, i16* %r.sroa.1, align 2 > %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 > store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32 > %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 > store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2 > ret void > } &g...
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
...= getelementptr float* %arg6, i64 %47 %49 = insertelement <4 x float*> %46, float* %48, i32 3 %50 = extractelement <4 x i64> %21, i32 0 %51 = getelementptr float* %arg6, i64 %50 %52 = getelementptr float* %51, i32 0 %53 = bitcast float* %52 to <4 x float>* %wide.load3 = load <4 x float>* %53 %54 = fadd <4 x float> %wide.load3, %wide.load %55 = extractelement <4 x i64> %21, i32 0 %56 = getelementptr float* %arg4, i64 %55 %57 = insertelement <4 x float*> undef, float* %56, i32 0 %58 = extractelement <4 x i64> %21, i32 1...
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
...loat* %arg6, i64 %47 > %49 = insertelement <4 x float*> %46, float* %48, i32 3 > %50 = extractelement <4 x i64> %21, i32 0 > %51 = getelementptr float* %arg6, i64 %50 > %52 = getelementptr float* %51, i32 0 > %53 = bitcast float* %52 to <4 x float>* > %wide.load3 = load <4 x float>* %53 > %54 = fadd <4 x float> %wide.load3, %wide.load > %55 = extractelement <4 x i64> %21, i32 0 > %56 = getelementptr float* %arg4, i64 %55 > %57 = insertelement <4 x float*> undef, float* %56, i32 0 > %58 = extractelement <4 x i64...
2004 Oct 05
0
Asterisk CLI Prompt : Small hack
...orking. Flames/comments/suggestions: Matt or Matt flewid@flewid.ca sideshow@terahertz.net now you can use %n in your prompts to give a newline. A prompt example is below. export ASTERISK_PROMPT="%n[ %d/%t ]%n[ Load1: %l1 Load2: %l2 Load3: %l3 ]%n[ Processes: %l4 PID: %l5 ]%n[ %H ] %%%# " or env ASTERISK_PROMPT="%n[ %d/%t ]%n[ Load1: %l1 Load2: %l2 Load3: %l3 ]%n[ Processes: %l4 PID: %l5 ]%n[ %H ] %%%# " --- begin patch [ apply in asterisk/ with patch -p0 < ] --- asterisk.c.original 2004-10-05 10:26:21.00000000...
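Below is a hypothetical C sketch of the %n expansion described above. It is not the code from the Asterisk patch, which is cut off here. It rewrites %n to a newline and copies every other %-sequence through, including %%. This lets whatever later pass handles %d, %t, %l1, %H and %# still see an escaped percent such as the one in %%%#.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: expand "%n" to '\n', keep "%%" as a two-character
   escape, and copy all other characters unchanged. Output is truncated to
   fit outsz (including the NUL). Returns the number of chars written. */
static size_t expand_newlines(const char *in, char *out, size_t outsz) {
    size_t o = 0;
    assert(in != NULL && out != NULL && outsz > 0);
    for (size_t i = 0; in[i] != '\0'; i++) {
        if (in[i] == '%' && in[i + 1] == '%') {
            if (o + 2 >= outsz) break;  /* emit "%%" whole or not at all */
            out[o++] = '%';
            out[o++] = '%';
            i++;                        /* skip the escaped '%' */
        } else if (in[i] == '%' && in[i + 1] == 'n') {
            if (o + 1 >= outsz) break;
            out[o++] = '\n';
            i++;                        /* skip the 'n' */
        } else {
            if (o + 1 >= outsz) break;
            out[o++] = in[i];
        }
    }
    out[o] = '\0';
    return o;
}
```

The key design choice is keeping %% as a pair rather than collapsing it to a single '%'. Collapsing it here would let a later pass misread the leftover '%' as the start of a new escape.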
2013 Dec 15
3
[LLVMdev] Question about Pre-RA-schedule in LLVM3.3
Hi, I compiled a test case (test.c) into an object file (test.o) using clang as follows: "clang -target arm -integrated-as -c test.c -o test.o" My clang version is 3.3, a debug build. //test.c int a[6] = {1, 2, 3, 4, 5, 6}; int main() { a[0] = a[5]; a[1] = a[4]; a[2] = a[5]; } //end test.c Then test.dump is generated with the objdump tool. //test.dump ldr r1, [r0, #20]
2013 Jan 21
0
[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
...parrot.com>> wrote: > > define void @test(i16 zeroext %a) nounwind uwtable { > %r.sroa.0 = alloca i16, align 2 > %r.sroa.1 = alloca i16, align 2 > store i16 %a, i16* %r.sroa.0, align 2 > store i16 1, i16* %r.sroa.1, align 2 > %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 > store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to > i16*), align 32 > %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 > store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to > i16*),...
2015 Dec 22
2
Question about TargetLowering::SimplifyDemandedBits with AND
...{ struct A x[1]; x[0].b1 = false; int s = 0; s = x[0].b1 ? 1 : 0; <--- Here is the problem. if (s != 0) __builtin_abort (); return 0; } /* IR of "s = x[0].b1 ? 1 : 0;" */ ... %b12 = getelementptr inbounds %struct.A, %struct.A* %arrayidx1, i32 0, i32 3 %bf.load3 = load i8, i8* %b12, align 2 %bf.clear4 = and i8 %bf.load3, 1 %bf.cast = trunc i8 %bf.clear4 to i1 %cond = select i1 %bf.cast, i32 1, i32 0 store i32 %cond, i32* %s, align 4 ... /* Initial Selection DAG of "s = x[0].b1 ? 1 : 0;" */ ... 0x81d17c0: i8,ch = load 0x81cca20, 0x...
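The IR above only ever uses bit 0 of %bf.load3. The `and i8 ..., 1` clears the other bits, and the trunc/select turns the result into 0 or 1.

The C sketch below transcribes that IR; it is not code from SimplifyDemandedBits itself. It checks exhaustively that the chain is equivalent to a zero-extended `x & 1`, which is the kind of fact a demanded-bits simplification relies on. It does not reproduce the problem reported in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Literal transcription of the IR for "s = x[0].b1 ? 1 : 0;". */
static int32_t select_form(uint8_t bf_load3) {
    uint8_t bf_clear4 = bf_load3 & 1u;   /* %bf.clear4 = and i8 %bf.load3, 1 */
    bool bf_cast = bf_clear4 != 0;       /* trunc to i1 keeps bit 0; after the
                                            mask that equals "!= 0" */
    return bf_cast ? 1 : 0;              /* %cond = select i1 ..., 1, 0 */
}

/* Simplified form: only bit 0 of the byte is demanded, so the whole
   and/trunc/select chain is a zero-extended (x & 1). */
static int32_t masked_form(uint8_t bf_load3) {
    return (int32_t)(bf_load3 & 1u);
}

/* Exhaustively check that both forms agree for every possible byte. */
static void check_all_bytes(void) {
    for (unsigned v = 0; v < 256; v++)
        assert(select_form((uint8_t)v) == masked_form((uint8_t)v));
}
```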
2013 Jan 31
0
[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
...truct.R* inttoptr (i64 416 to %struct.R*), align 8 > > define void @test(i16 zeroext %a) nounwind uwtable { > %r.sroa.0 = alloca i16, align 2 > %r.sroa.1 = alloca i16, align 2 > store i16 %a, i16* %r.sroa.0, align 2 > store i16 1, i16* %r.sroa.1, align 2 > %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 > store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32 > %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 > store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2 > ret void >...
2013 Feb 03
0
[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
...o %struct.R*), align 8 >> >> define void @test(i16 zeroext %a) nounwind uwtable { >> %r.sroa.0 = alloca i16, align 2 >> %r.sroa.1 = alloca i16, align 2 >> store i16 %a, i16* %r.sroa.0, align 2 >> store i16 1, i16* %r.sroa.1, align 2 >> %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 >> store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32 >> %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 >> store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2 >>...
2013 Feb 04
6
[LLVMdev] Vectorizer using Instruction, not opcodes
...6 x i32]* %c, i32 0, i32 %25 %27 = insertelement <4 x i32*> %24, i32* %26, i32 3 %28 = extractelement <4 x i32> %induction, i32 0 %29 = getelementptr inbounds [256 x i32]* %c, i32 0, i32 %28 %30 = getelementptr i32* %29, i32 0 %31 = bitcast i32* %30 to <4 x i32>* %wide.load3 = load <4 x i32>* %31, align 4 %32 = mul nsw <4 x i32> %wide.load3, %wide.load %33 = extractelement <4 x i32> %induction, i32 0 %34 = getelementptr inbounds [256 x i32]* %a, i32 0, i32 %33 %35 = insertelement <4 x i32*> undef, i32* %34, i32 0 %36 = extractelement &...
2013 Feb 04
0
[LLVMdev] Vectorizer using Instruction, not opcodes
Hi all, My take on this is that, as you state below, at the IR level we are at best only roughly estimating cost (otherwise we would have to lower the code and then estimate cost, something we don't want to do). I would propose estimating the "worst case costs" and seeing how far we get with this. My rationale here is that we don't want vectorization to decrease performance relative
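One way to read the proposal above, sketched in C under explicitly hypothetical assumptions:
- each IR instruction gets a (best, worst) cost estimate;
- loop bodies are costed with the worst case;
- the vectorizer proceeds only when the worst-case vector body is cheaper than the VF scalar iterations it replaces.

The struct cost_range, both functions and the cost numbers used with them are invented for illustration. They are not LLVM's TargetTransformInfo API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical IR-level cost estimate: a range, because the real cost is
   only known after lowering. */
struct cost_range {
    unsigned best;
    unsigned worst;
};

/* Cost a loop body pessimistically by summing worst-case estimates. */
static unsigned worst_case_total(const struct cost_range *body, int n) {
    unsigned total = 0;
    for (int i = 0; i < n; i++) {
        assert(body[i].best <= body[i].worst);
        total += body[i].worst;
    }
    return total;
}

/* Vectorize by factor vf only if one vector iteration, costed at its worst
   case, is cheaper than the vf scalar iterations it replaces. */
static bool vectorize_profitable(unsigned scalar_body, unsigned vector_body,
                                 unsigned vf) {
    return vector_body < scalar_body * vf;
}
```

For example, a hypothetical load/load/mul/store body costing 4 in scalar form would be vectorized at VF=4 if the worst-case vector body costs 5. It would be rejected if an expensive vector multiply pushed the worst case to 23.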