thr3ads.net - search: "load3"

[LLVMdev] Question about Pre-RA-schedule in LLVM3.3

2013 Dec 16

2

...uctions would be executed with less stalls and cycles. However, in the latest version of LLVM, the Pre-RA-sched builds a scheduling graph(original graph) which is shown following.
//original graph
----> data flow
====> control flow
load1 ----> store1 ====> load2 ----> store2 ====> load3 ----> store3
//end original graph
So, Pre-RA-sched is unable to schedule apart load/store instruction pair. Due to LiveRange in the Register Allocation stage, all load/store instruction pair are allocated the same register. If we change the control flow in the above original graph, the modified...

[LLVMdev] Weird volatile propagation ?

2013 Jan 18

2

...{ i16, i16 } @addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8 define void @test(i16 zeroext %a) nounwind uwtable { %r.sroa.0 = alloca i16, align 2 %r.sroa.1 = alloca i16, align 2 store i16 %a, i16* %r.sroa.0, align 2 store i16 1, i16* %r.sroa.1, align 2 %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32 %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2 ret void } When I would have exp...

[LLVMdev] Question about Pre-RA-schedule in LLVM3.3

2013 Dec 21

0

...with less stalls and cycles.
> However, in the latest version of LLVM, the Pre-RA-sched builds a scheduling graph(original graph) which is shown following.
> //original graph
> ----> data flow
> ====> control flow
> load1 ----> store1 ====> load2 ----> store2 ====> load3 ----> store3
> //end original graph
> So, Pre-RA-sched is unable to schedule apart load/store instruction pair.
> Due to LiveRange in the Register Allocation stage, all load/store instruction pair are allocated the same register.
>
> If we change the control flow in the above ori...

[LLVMdev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)

2013 Jan 20

0

...{ i16, i16 } @addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8 define void @test(i16 zeroext %a) nounwind uwtable { %r.sroa.0 = alloca i16, align 2 %r.sroa.1 = alloca i16, align 2 store i16 %a, i16* %r.sroa.0, align 2 store i16 1, i16* %r.sroa.1, align 2 %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32 %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2 ret void } The problem is in how...

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

2

...23 = getelementptr float* %arg5, i64 %22 %24 = bitcast float* %23 to <4 x float>* %wide.load = load <4 x float>* %24, align 16 %25 = extractelement <4 x i64> %21, i32 0 %26 = getelementptr float* %arg6, i64 %25 %27 = bitcast float* %26 to <4 x float>* %wide.load3 = load <4 x float>* %27, align 16 %28 = fadd <4 x float> %wide.load3, %wide.load %29 = extractelement <4 x i64> %21, i32 0 %30 = getelementptr float* %arg4, i64 %29 %31 = bitcast float* %30 to <4 x float>* store <4 x float> %28, <4 x float>* %31, a...

[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics

2013 Jan 28

4

...{ i16, i16 } @addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8 define void @test(i16 zeroext %a) nounwind uwtable { %r.sroa.0 = alloca i16, align 2 %r.sroa.1 = alloca i16, align 2 store i16 %a, i16* %r.sroa.0, align 2 store i16 1, i16* %r.sroa.1, align 2 %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32 %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2 ret void } In the generated ir, t...

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

0

...ody ] %.lhs = shl i64 %6, 2 %7 = add i64 %.lhs, %index %8 = getelementptr float* %arg5, i64 %7 %9 = bitcast float* %8 to <4 x float>* %wide.load = load <4 x float>* %9, align 16 %10 = getelementptr float* %arg6, i64 %7 %11 = bitcast float* %10 to <4 x float>* %wide.load3 = load <4 x float>* %11, align 16 %12 = fadd <4 x float> %wide.load3, %wide.load %13 = getelementptr float* %arg4, i64 %7 %14 = bitcast float* %13 to <4 x float>* store <4 x float> %12, <4 x float>* %14, align 16 %index.next = add i64 %index, 4 %15 = icmp e...

[LLVMdev] Question about Pre-RA-schedule in LLVM3.3

2013 Dec 15

0

> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Haishan
> Subject: [LLVMdev] Question about Pre-RA-schedule in LLVM3.3
> My clang version is 3.3 and debug build.
> //test.c
> int a[6] = {1, 2, 3, 4, 5, 6};
> int main() {
> a[0] = a[5];
> a[1] = a[4];
> a[2] = a[5];
> }
> //end test.c
> Then test.dump is

[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)

2013 Jan 20

2

...<arnaud.allarddegrandmaison at parrot.com> wrote: > define void @test(i16 zeroext %a) nounwind uwtable { > %r.sroa.0 = alloca i16, align 2 > %r.sroa.1 = alloca i16, align 2 > store i16 %a, i16* %r.sroa.0, align 2 > store i16 1, i16* %r.sroa.1, align 2 > %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 > store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), > align 32 > %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 > store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), > align 2 > Just...

[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics

2013 Jan 29

0

...t %struct.R* inttoptr (i64 416 to %struct.R*), align 8 > > define void @test(i16 zeroext %a) nounwind uwtable { > %r.sroa.0 = alloca i16, align 2 > %r.sroa.1 = alloca i16, align 2 > store i16 %a, i16* %r.sroa.0, align 2 > store i16 1, i16* %r.sroa.1, align 2 > %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 > store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32 > %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 > store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2 > ret void > } >...

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

2

...= getelementptr float* %arg6, i64 %47 %49 = insertelement <4 x float*> %46, float* %48, i32 3 %50 = extractelement <4 x i64> %21, i32 0 %51 = getelementptr float* %arg6, i64 %50 %52 = getelementptr float* %51, i32 0 %53 = bitcast float* %52 to <4 x float>* %wide.load3 = load <4 x float>* %53 %54 = fadd <4 x float> %wide.load3, %wide.load %55 = extractelement <4 x i64> %21, i32 0 %56 = getelementptr float* %arg4, i64 %55 %57 = insertelement <4 x float*> undef, float* %56, i32 0 %58 = extractelement <4 x i64> %21, i32 1...

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

0

...loat* %arg6, i64 %47 > %49 = insertelement <4 x float*> %46, float* %48, i32 3 > %50 = extractelement <4 x i64> %21, i32 0 > %51 = getelementptr float* %arg6, i64 %50 > %52 = getelementptr float* %51, i32 0 > %53 = bitcast float* %52 to <4 x float>* > %wide.load3 = load <4 x float>* %53 > %54 = fadd <4 x float> %wide.load3, %wide.load > %55 = extractelement <4 x i64> %21, i32 0 > %56 = getelementptr float* %arg4, i64 %55 > %57 = insertelement <4 x float*> undef, float* %56, i32 0 > %58 = extractelement <4 x i64...

Asterisk CLI Prompt : Small hack

2004 Oct 05

0

...orking. Flames/comments/suggestions: Matt or Matt flewid@flewid.ca sideshow@terahertz.net now you can use %n in your prompts to give a newline. A prompt example is below. export ASTERISK_PROMPT="%n[ %d/%t ]%n[ Load1: %l1 Load2: %l2 Load3: %l3 ]%n[ Processes: %l4 PID: %l5 ]%n[ %H ] %%%# " or env ASTERISK_PROMPT="%n[ %d/%t ]%n[ Load1: %l1 Load2: %l2 Load3: %l3 ]%n[ Processes: %l4 PID: %l5 ]%n[ %H ] %%%# " --- begin patch [ apply in asterisk/ with patch -p0 < ] --- asterisk.c.original 2004-10-05 10:26:21.00000000...

[LLVMdev] Question about Pre-RA-schedule in LLVM3.3

2013 Dec 15

3

Hi, I compile a case (test.c) to get object machine file (test.o) using clang as follows: "clang -target arm -integrated-as -c test.c -o test.o" My clang version is 3.3 and debug build.
//test.c
int a[6] = {1, 2, 3, 4, 5, 6};
int main() {
  a[0] = a[5];
  a[1] = a[4];
  a[2] = a[5];
}
//end test.c
Then test.dump is generated by using the objdump tool.
//test.dump
ldr r1, [r0, #20]

[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)

2013 Jan 21

0

...parrot.com>> wrote: > > define void @test(i16 zeroext %a) nounwind uwtable { > %r.sroa.0 = alloca i16, align 2 > %r.sroa.1 = alloca i16, align 2 > store i16 %a, i16* %r.sroa.0, align 2 > store i16 1, i16* %r.sroa.1, align 2 > %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 > store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to > i16*), align 32 > %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 > store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to > i16*),...

Question about TargetLowering::SimplifyDemandedBits with AND

2015 Dec 22

2

...{ struct A x[1]; x[0].b1 = false; int s = 0; s = x[0].b1 ? 1 : 0; <--- Here is problem. if (s != 0) __builtin_abort (); return 0; } /* IR of "s = x[0].b1 ? 1 : 0;" */ ... %b12 = getelementptr inbounds %struct.A, %struct.A* %arrayidx1, i32 0, i32 3 %bf.load3 = load i8, i8* %b12, align 2 %bf.clear4 = and i8 %bf.load3, 1 %bf.cast = trunc i8 %bf.clear4 to i1 %cond = select i1 %bf.cast, i32 1, i32 0 store i32 %cond, i32* %s, align 4 ... /* Initial Selection DAG of "s = x[0].b1 ? 1 : 0;" */ ... 0x81d17c0: i8,ch = load 0x81cca20, 0x...

[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics

2013 Jan 31

0

...truct.R* inttoptr (i64 416 to %struct.R*), align 8 > > define void @test(i16 zeroext %a) nounwind uwtable { > %r.sroa.0 = alloca i16, align 2 > %r.sroa.1 = alloca i16, align 2 > store i16 %a, i16* %r.sroa.0, align 2 > store i16 1, i16* %r.sroa.1, align 2 > %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 > store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32 > %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 > store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2 > ret void >...

[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics

2013 Feb 03

0

...o %struct.R*), align 8 >> >> define void @test(i16 zeroext %a) nounwind uwtable { >> %r.sroa.0 = alloca i16, align 2 >> %r.sroa.1 = alloca i16, align 2 >> store i16 %a, i16* %r.sroa.0, align 2 >> store i16 1, i16* %r.sroa.1, align 2 >> %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2 >> store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align 32 >> %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2 >> store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2 >>...

[LLVMdev] Vectorizer using Instruction, not opcodes

2013 Feb 04

6

...6 x i32]* %c, i32 0, i32 %25 %27 = insertelement <4 x i32*> %24, i32* %26, i32 3 %28 = extractelement <4 x i32> %induction, i32 0 %29 = getelementptr inbounds [256 x i32]* %c, i32 0, i32 %28 %30 = getelementptr i32* %29, i32 0 %31 = bitcast i32* %30 to <4 x i32>* %wide.load3 = load <4 x i32>* %31, align 4 %32 = mul nsw <4 x i32> %wide.load3, %wide.load %33 = extractelement <4 x i32> %induction, i32 0 %34 = getelementptr inbounds [256 x i32]* %a, i32 0, i32 %33 %35 = insertelement <4 x i32*> undef, i32* %34, i32 0 %36 = extractelement <...

[LLVMdev] Vectorizer using Instruction, not opcodes

2013 Feb 04

0

Hi all, My take on this is that, as you state below, at the IR level we are only roughly estimating cost, at best (or we would have to lower the code and then estimate cost - something we don't want to do). I would propose for estimating the "worst case costs" and see how far we get with this. My rational here is that we don't want vectorization to decrease performance relative
