Hello.

I have compiled this simple program:

#include <stdio.h>
#include <stdlib.h>

int v1[10000];

int main()
{
    int i;

    for (i = 0; i < 10000; i++) {
        v1[i] = i;
    }

    for (i = 0; i < 10000; i++) {
        printf("%d ", v1[i]);
    }

    return 0;
}

Next, I disassembled the executable file and did not find any SSE instructions.
I know that LLVM supports SSE.
So my questions:
1. Does this happen only on my computer?
2. If it is not a problem on my side, are there no SSE optimizations in LLVM?
3. Has anyone already worked on this problem?

--
Serg Anohovsky.
On 05/22/2011 08:07 PM, Serg Anohovsky wrote:
> Hello.
> I have compiled the simple program:
> [...]

This program has no floating point, no vector data types, and no vector intrinsics. AFAIK those are the only situations where LLVM would produce SSE code.

GCC indeed produces some SSE instructions at -O3, because unlike LLVM it has auto-vectorization support. I doubt that for this particular loop the difference would be significant though.

Best regards,
--Edwin
On Sun, May 22, 2011 at 1:07 PM, Serg Anohovsky <serg.anohovsky at gmail.com> wrote:
> int v1[10000];
>
> int main()
> {
>     int i;
>
>     for (i = 0; i < 10000; i++) {
>         v1[i] = i;
>     }

This loop is not really vectorizable, even if LLVM had an auto-vectorizer. You need the same operation (floating-point or integer) applied to contiguous elements in a vector. An example of a vectorizable loop body would be "v1[i] = v1[i] * v1[i]". Then, you could use SSE (or any other vector instruction set) to get a substantial speed improvement.

--
Thanks,
Justin Holewinski
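For illustration, the loop body Justin gives, v1[i] = v1[i] * v1[i], can be written by hand with a vector data type. This is only a sketch: it assumes the GCC/Clang vector_size extension, which nobody in the thread names, and the identifiers v4si and square_in_place are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four 32-bit ints in one 16-byte vector.
   Assumption: GCC/Clang vector extension, not ISO C. */
typedef int v4si __attribute__((vector_size(16)));

/* Squares each element of a[0..n-1] in place, four elements per step. */
void square_in_place(int *a, size_t n)
{
    size_t i = 0;

    assert(a != NULL || n == 0);

    for (; i + 4 <= n; i += 4) {
        v4si x;
        memcpy(&x, &a[i], sizeof x);   /* load that does not need alignment */
        x = x * x;                     /* same multiply applied to all four lanes */
        memcpy(&a[i], &x, sizeof x);   /* store that does not need alignment */
    }
    for (; i < n; i++)                 /* scalar tail when n is not a multiple of 4 */
        a[i] = a[i] * a[i];
}
```

Calling square_in_place(v1, 10000) would stand in for the scalar loop. Which instructions the compiler picks for x * x depends on the target and flags.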
On May 22, 2011, at 10:47 AM, Justin Holewinski wrote:
> This loop is not really vectorizable, even if LLVM had an auto-vectorizer.
> You need the same operation (floating-point or integer) applied to
> contiguous elements in a vector. An example of a vectorizable loop body
> would be "v1[i] = v1[i] * v1[i]". Then, you could use SSE (or any other
> vector instruction set) to get a substantial speed improvement.

This is vectorizable. Just start out with a vector of constants <0, 1, 2, 3> and do a store of it every time through the loop, adding <4,4,4,4> as you go.

-Chris
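The transformation Chris describes can be sketched by hand as follows. This is an illustration under assumptions, not what any compiler emits: it uses the GCC/Clang vector_size extension, and the names v4si and fill_iota are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four 32-bit ints in one 16-byte vector.
   Assumption: GCC/Clang vector extension, not ISO C. */
typedef int v4si __attribute__((vector_size(16)));

/* Writes a[i] = i for i in [0, n); n must be a multiple of 4. */
void fill_iota(int *a, size_t n)
{
    v4si iv = {0, 1, 2, 3};          /* vector induction variable */
    const v4si step = {4, 4, 4, 4};  /* added to every lane each iteration */
    size_t i;

    assert(n % 4 == 0);

    for (i = 0; i < n; i += 4) {
        memcpy(&a[i], &iv, sizeof iv);  /* store four consecutive values */
        iv = iv + step;                 /* <0,1,2,3> -> <4,5,6,7> -> ... */
    }
}
```

For the array in the original program, fill_iota(v1, 10000) runs 10000 / 4 = 2500 iterations instead of 10000.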
---------- Forwarded message ----------
From: Serg Anohovsky <serg.anohovsky at gmail.com>
Date: 2011/5/22
Subject: Re: [LLVMdev] No SSE instructions
To: Chris Lattner <clattner at apple.com>

2011/5/22 Chris Lattner <clattner at apple.com>
> This is vectorizable. Just start out with a vector of constants <0, 1, 2,
> 3> and do a store of it every time through the loop, adding <4,4,4,4> as
> you go.
>
> -Chris

Thanks for your notes. In my opinion, there is no difference. So here is another example:

#include <stdio.h>
#include <stdlib.h>

int v0[10000];
int v1[10000];

int main()
{
    int i;

    for (i = 0; i < 10000; i++) {
        v0[i] = i;
    }

    for (i = 0; i < 10000; i++) {
        v1[i] = v0[i] * v0[i] * 4;
    }

    for (i = 0; i < 10000; i++) {
        printf("%d ", v1[i]);
    }

    return 0;
}

This should be optimized, but LLVM has not optimized this program. The questions were not about this specific example. I want to understand what vector optimizations LLVM has. How well developed is this area in LLVM?
On Sun, May 22, 2011 at 2:10 PM, Serg Anohovsky <serg.anohovsky at gmail.com> wrote:
> This should be optimized, but LLVM has not optimized this program. The
> questions were not about this specific example. I want to understand what
> vector optimizations LLVM has.
> How well developed is this area in LLVM?

When asking this type of question, you should be specific about how you built the program, i.e. did you use clang, llvm-gcc, or dragonegg, and which options did you use. From your message, I can't tell if you built at -O0 or -O3.

In this case, no, LLVM does not have any auto-vectorization optimizations. However, LLVM does have good support for vector intrinsics, so if you use xmmintrin.h you should be able to get good performance.

Reid
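As a rough sketch of the intrinsics route Reid suggests, the middle loop of Serg's second example (v1[i] = v0[i] * v0[i] * 4) could be written by hand as below. Some assumptions to flag: xmmintrin.h declares the original single-precision SSE intrinsics, while the integer operations used here are SSE2 and come from emmintrin.h; the function name square_times4 is made up; and the code falls back to the plain scalar loop when the compiler does not define __SSE2__.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 integer intrinsics on __m128i */
#endif

/* Computes out[i] = in[i] * in[i] * 4 for i in [0, n). */
void square_times4(const int *in, int *out, size_t n)
{
    size_t i = 0;

    assert((in != NULL && out != NULL) || n == 0);

#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        __m128i x = _mm_loadu_si128((const __m128i *)&in[i]);

        /* SSE2 has no 32-bit lane-wise multiply, so square lanes 0,2 and
           lanes 1,3 separately with the 32x32->64 multiply, then keep the
           low 32 bits of each product. */
        __m128i x_odd = _mm_srli_si128(x, 4);        /* move lanes 1,3 to 0,2 */
        __m128i even  = _mm_mul_epu32(x, x);         /* products of lanes 0,2 */
        __m128i odd   = _mm_mul_epu32(x_odd, x_odd); /* products of lanes 1,3 */
        __m128i sq    = _mm_unpacklo_epi32(
            _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0)),
            _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0)));

        /* Multiply by 4 with a left shift by 2 in every lane. */
        _mm_storeu_si128((__m128i *)&out[i], _mm_slli_epi32(sq, 2));
    }
#endif
    for (; i < n; i++)                 /* scalar tail, or whole loop without SSE2 */
        out[i] = in[i] * in[i] * 4;
}
```

In the example program this would be called as square_times4(v0, v1, 10000). The unaligned load and store forms are used so that the sketch makes no assumption about how the arrays are aligned.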
Hi Serg,

> Next, I disasseble the executable file and have not found any SSE instructions.
> I know that LLVM support SSE.
> So my questions:
> 1. It is occur only in my computer?
> 2. If it is not only my bug, then there are not SSE optimizations in LLVM?
> 3. Have anyone, already worked on this problem?

The gcc-4.5 tree vectorizer vectorizes this (see LLVM IR below) but LLVM does not yet have an auto-vectorizer that can do this.

Ciao, Duncan.

IR produced by dragonegg using -O3 and -fplugin-arg-dragonegg-enable-gcc-optzns:

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-f128:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

module asm "\09.ident\09\22GCC: (GNU) 4.5.4 20110506 (prerelease) LLVM: 131851M\22"

@v1 = common global [10000 x i32] zeroinitializer, align 32
@.cst = private constant [4 x i8] c"%d \00", align 8

define i32 @main() nounwind {
entry:
  br label %"<bb 3>"

"<bb 3>":                                         ; preds = %"<bb 3>", %entry
  %indvar2 = phi i64 [ %indvar.next3, %"<bb 3>" ], [ 0, %entry ]
  %vect_vec_iv_.8_10 = phi <4 x i32> [ %vect_vec_iv_.8_24, %"<bb 3>" ], [ <i32 0, i32 1, i32 2, i32 3>, %entry ]
  %tmp6 = shl i64 %indvar2, 2
  %scevgep7 = getelementptr [10000 x i32]* @v1, i64 0, i64 %tmp6
  %scevgep78 = bitcast i32* %scevgep7 to <4 x i32>*
  %vect_vec_iv_.8_24 = add nsw <4 x i32> %vect_vec_iv_.8_10, <i32 4, i32 4, i32 4, i32 4>
  store <4 x i32> %vect_vec_iv_.8_10, <4 x i32>* %scevgep78, align 16
  %indvar.next3 = add i64 %indvar2, 1
  %exitcond4 = icmp eq i64 %indvar.next3, 2500
  br i1 %exitcond4, label %"<bb 5>", label %"<bb 3>"

"<bb 5>":                                         ; preds = %"<bb 3>", %"<bb 5>"
  %indvar = phi i64 [ %indvar.next, %"<bb 5>" ], [ 0, %"<bb 3>" ]
  %scevgep = getelementptr [10000 x i32]* @v1, i64 0, i64 %indvar
  %D.3943_6 = load i32* %scevgep, align 4
  %0 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]* @.cst, i64 0, i64 0), i32 %D.3943_6) nounwind
  %indvar.next = add i64 %indvar, 1
  %exitcond = icmp eq i64 %indvar.next, 10000
  br i1 %exitcond, label %"<bb 6>", label %"<bb 5>"

"<bb 6>":                                         ; preds = %"<bb 5>"
  ret i32 0
}

declare i32 @printf(i8* nocapture, ...) nounwind
2011/5/22 Chris Lattner <clattner at apple.com>
> LLVM does not have an autovectorizer.
>
> -Chris

Could you please tell me whether you are going to implement an autovectorizer in LLVM in the near future?

--
Serg Anohovsky
On May 22, 2011, at 12:31 PM, Serg Anohovsky wrote:
> 2011/5/22 Chris Lattner <clattner at apple.com>
> > LLVM does not have an autovectorizer.
> >
> > -Chris
>
> Could you please tell me whether you are going to implement an autovectorizer in LLVM in the near future?

I'm confident it will happen but have no idea on what timeline.

-Chris
---------- Forwarded message ----------
From: Serg Anohovsky <serg.anohovsky at gmail.com>
Date: 2011/5/23
Subject: Re: [LLVMdev] No SSE instructions
To: Chris Lattner <clattner at apple.com>

Thank you all for your explanations. This is a really interesting topic for me.

--
Serg Anohovsky
On 05/22/2011 04:31 PM, Serg Anohovsky wrote:
> 2011/5/22 Chris Lattner <clattner at apple.com>
> > LLVM does not have an autovectorizer.
> >
> > -Chris
>
> Could you please tell me whether you are going to implement an autovectorizer in
> LLVM in the near future?

Hi Serg,

there is some preliminary work on autovectorization in the Polly project [1], though we mainly work on loop transformations that will expose more vectorization opportunities.

If you are interested in doing research in this area, Polly may be a good start.

Cheers
Tobi

[1] http://polly.grosser.es