thr3ads.net - search: "output

2010 May 29

3

[LLVMdev] Vectorized LLVM IR

...inbounds float** %outputs, i64 0 %output0 = load float** %output_array_ptr0, align 8 %out = icmp sgt i32 %count, 0 br i1 %out, label %convert, label %return convert: %count_64 = zext i32 %count to i64 br label %loop loop: %indvar = phi i64 [ 0, %convert ], [ %indvar.next, %loop ] %output_ptr0 = getelementptr float* %output0, i64 %indvar %input_ptr1 = getelementptr float* %input1, i64 %indvar %fTemp0 = load float* %input_ptr1, align 4 %input_ptr0 = getelementptr float* %input0, i64 %indvar %fTemp1 = load float* %input_ptr0, align 4 %fTemp2 = fadd float %fTemp1, %fTemp0 %input...

2010 May 29

0

[LLVMdev] Vectorized LLVM IR

...gt i32 %count, 0 > br i1 %out, label %convert, label %return > convert: > %count_64 = zext i32 %count to i64 > br label %loop > loop: > %indvar = phi i64 [ 0, %convert ], [ %indvar.next, %loop ] > %output_ptr0 = getelementptr float* %output0, i64 %indvar > %input_ptr1 = getelementptr float* %input1, i64 %indvar > %fTemp0 = load float* %input_ptr1, align 4 > %input_ptr0 = getelementptr float* %input0, i64 %indvar > %fTemp1 = load floa...

2010 May 29

1

[LLVMdev] Vectorized LLVM IR

... br i1 %out, label %convert, label %return >> convert: >> %count_64 = zext i32 %count to i64 >> br label %loop >> loop: >> %indvar = phi i64 [ 0, %convert ], [ %indvar.next, %loop ] >> %output_ptr0 = getelementptr float* %output0, i64 %indvar >> %input_ptr1 = getelementptr float* %input1, i64 %indvar >> %fTemp0 = load float* %input_ptr1, align 4 >> %input_ptr0 = getelementptr float* %input0, i64 %indvar >> %fT...

2010 May 28

0

[LLVMdev] Vectorized LLVM IR

Hi Stéphane, The SSE support in the LLVM backend is fine. What is the code that's generated? Do you have some short examples of where LLVM doesn't do as well as the equivalent scalar code? -bw On May 28, 2010, at 12:13 PM, Stéphane Letz wrote: > Hi, > > We are experimenting with directly generating vectorized LLVM IR (using <8 x float> kind of types), then compiling the code

2010 May 28

3

[LLVMdev] Vectorized LLVM IR

Hi, We are experimenting with directly generating vectorized LLVM IR (using <8 x float> kind of types), then compiling the code to SSE on a 64-bit machine. Right now the equivalent code in scalar mode still outperforms the SSE one. What is the quality of the SSE support in the X86 LLVM backend? Are there any specific things to be aware of to improve the speed? Thanks Stéphane Letz
