thr3ads.net - llvm dev - [LLVMdev] Fwd: No SSE instructions [May 2011]

Serg Anohovsky

2011-May-22 17:07 UTC

[LLVMdev] No SSE instructions

Hello.
I have compiled the simple program:

#include <stdio.h>
#include <stdlib.h>

int v1[10000];

int main()
{
        int i;

        for (i = 0; i < 10000; i++) {
                v1[i] = i;
        }

        for (i = 0; i < 10000; i++) {
                printf("%d ", v1[i]);
        }

        return 0;
}

Next, I disassembled the executable file and did not find any SSE
instructions.
I know that LLVM supports SSE.
So my questions:
  1. Does this occur only on my computer?
  2. If it is not just my problem, are there no SSE optimizations in
LLVM?
  3. Has anyone already worked on this problem?

--
Serg Anohovsky.

Török Edwin

2011-May-22 17:26 UTC

[LLVMdev] No SSE instructions

On 05/22/2011 08:07 PM, Serg Anohovsky wrote:
> Hello.
> I have compiled the simple program:
> 
> #include <stdio.h>
> #include <stdlib.h>
> 
> int v1[10000];
> 
> int main()
> {
>         int i;
> 
>         for (i = 0; i < 10000; i++) {
>                 v1[i] = i;
>         }
> 
>         for (i = 0; i < 10000; i++) {
>                 printf("%d ", v1[i]);
>         }
> 
>         return 0;
> }
> 
This program has no floating point, and no vector data types, and no
vector intrinsics.
AFAIK those are the only situations where LLVM would produce SSE code.

GCC indeed produces some SSE instructions at -O3, because unlike LLVM it
has auto-vectorization support.

I doubt that for this particular loop the difference would be
significant though.

Best regards,
--Edwin

Justin Holewinski

2011-May-22 17:47 UTC

[LLVMdev] No SSE instructions

On Sun, May 22, 2011 at 1:07 PM, Serg Anohovsky <serg.anohovsky at gmail.com> wrote:
> Hello.
> I have compiled the simple program:
>
> #include <stdio.h>
> #include <stdlib.h>
>
> int v1[10000];
>
> int main()
> {
>         int i;
>
>         for (i = 0; i < 10000; i++) {
>                 v1[i] = i;
>         }
>

This loop is not really vectorizable, even if LLVM had an auto-vectorizer.
You need the same operation (floating-point or integer) applied to
contiguous elements in a vector.  An example of a vectorizable loop body
would be "v1[i] = v1[i] * v1[i]".  Then, you could use SSE (or any other
vector instruction set) to get a substantial speed improvement.

> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>

-- 

Thanks,

Justin Holewinski

Chris Lattner

2011-May-22 17:51 UTC

[LLVMdev] No SSE instructions

On May 22, 2011, at 10:47 AM, Justin Holewinski wrote:
> On Sun, May 22, 2011 at 1:07 PM, Serg Anohovsky <serg.anohovsky at gmail.com> wrote:
> Hello.
> I have compiled the simple program:
> 
> #include <stdio.h>
> #include <stdlib.h>
> 
> int v1[10000];
> 
> int main()
> {
>         int i;
> 
>         for (i = 0; i < 10000; i++) {
>                 v1[i] = i;
>         }
> 
> 
> This loop is not really vectorizable, even if LLVM had an auto-vectorizer.
> You need the same operation (floating-point or integer) applied to contiguous
> elements in a vector.  An example of a vectorizable loop body would be
> "v1[i] = v1[i] * v1[i]"  Then, you could use SSE (or any other vector
> instruction set) to get a substantial speed improvement.

This is vectorizable.  Just start out with a vector of constants <0, 1, 2, 3>
and do a store of it every time through the loop, adding <4,4,4,4> as you go.

-Chris
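
A minimal sketch of the scheme described above, written in C with the
GCC/Clang vector_size extension so that it is not tied to one instruction
set. The names v4si and fill_iota are made up for illustration, and the code
assumes the element count is a multiple of 4 (as 10000 is). On x86 a compiler
would typically turn the loop body into one 128-bit store and one packed add,
but that is an expectation, not something shown in this thread.

```c
#include <assert.h>
#include <string.h>

/* GCC/Clang vector extension: four 32-bit ints in one 128-bit value. */
typedef int v4si __attribute__((vector_size(16)));

/* Fill dst[0..n-1] with 0, 1, 2, ... four elements per iteration.
   Assumption: n is a multiple of 4, so no scalar tail loop is needed. */
static void fill_iota(int *dst, int n)
{
    v4si iv   = {0, 1, 2, 3};   /* induction vector: current four values */
    v4si step = {4, 4, 4, 4};   /* added to every lane after each store  */
    int i;

    assert(n % 4 == 0);
    for (i = 0; i < n; i += 4) {
        /* memcpy keeps the store valid for any alignment of dst;
           compilers normally lower it to a single vector store. */
        memcpy(&dst[i], &iv, sizeof iv);
        iv += step;
    }
}
```

Calling fill_iota(v1, 10000) would stand in for the first loop of the
original program.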

Serg Anohovsky

2011-May-22 18:10 UTC

[LLVMdev] Fwd: No SSE instructions

---------- Forwarded message ----------
From: Serg Anohovsky <serg.anohovsky at gmail.com>
Date: 2011/5/22
Subject: Re: [LLVMdev] No SSE instructions
To: Chris Lattner <clattner at apple.com>




2011/5/22 Chris Lattner <clattner at apple.com>
>
> On May 22, 2011, at 10:47 AM, Justin Holewinski wrote:
>
>> This loop is not really vectorizable, even if LLVM had an auto-vectorizer.
>> You need the same operation (floating-point or integer) applied to
>> contiguous elements in a vector.  An example of a vectorizable loop body
>> would be "v1[i] = v1[i] * v1[i]"  Then, you could use SSE (or any other
>> vector instruction set) to get a substantial speed improvement.
>
> This is vectorizable.  Just start out with a vector of constants
> <0, 1, 2, 3> and do a store of it every time through the loop, adding
> <4,4,4,4> as you go.
>
> -Chris

Thanks for your notes. In my opinion, there is no difference. So here is
another example:

#include <stdio.h>
#include <stdlib.h>

int v0[10000];
int v1[10000];

int main()
{
        int i;

        for (i = 0; i < 10000; i++) {
                v0[i] = i;
        }

        for (i = 0; i < 10000; i++) {
                v1[i] = v0[i] * v0[i] * 4;
        }

        for (i = 0; i < 10000; i++) {
                printf("%d ", v1[i]);
        }

        return 0;
}

This should be optimized, but LLVM has not optimized this program. My
questions were not about this specific example. I want to understand what
vector optimizations LLVM has.
How well is this area implemented in LLVM?

Reid Kleckner

2011-May-22 18:17 UTC

[LLVMdev] Fwd: No SSE instructions

On Sun, May 22, 2011 at 2:10 PM, Serg Anohovsky
<serg.anohovsky at gmail.com> wrote:
> This should be optimized, but LLVM has not optimized this program. My
> questions were not about this specific example. I want to understand what
> vector optimizations LLVM has.
> How well is this area implemented in LLVM?

When asking this type of question, you should be specific about how
you built the program, i.e. did you use clang, llvm-gcc, or dragonegg,
and which options did you use.  From your message, I can't tell if you
built at O0 or O3.

In this case, no, LLVM does not have any auto-vectorization
optimizations.  However, LLVM does have good support for vector
intrinsics, so if you use xmmintrin.h you should be able to get good
performance.

Reid
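
A minimal sketch of the intrinsics route mentioned above, assuming an x86
target with SSE. It computes dst[i] = src[i] * src[i] * 4 on float arrays
rather than the int arrays from the thread, because xmmintrin.h covers only
the single-precision SSE operations (a packed 32-bit integer multiply,
_mm_mullo_epi32, needs SSE4.1). The function name square_times4 is
hypothetical. The scalar loop handles the tail and also serves as a fallback
when __SSE__ is not defined.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* dst[i] = src[i] * src[i] * 4.0f for every i in [0, n). */
static void square_times4(float *dst, const float *src, size_t n)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);
#if defined(__SSE__)
    {
        const __m128 four = _mm_set1_ps(4.0f);     /* <4, 4, 4, 4> */
        for (; i + 4 <= n; i += 4) {
            __m128 x = _mm_loadu_ps(&src[i]);      /* unaligned load, 4 floats */
            __m128 r = _mm_mul_ps(_mm_mul_ps(x, x), four);
            _mm_storeu_ps(&dst[i], r);             /* unaligned store */
        }
    }
#endif
    /* Scalar tail, or the whole loop when SSE is unavailable. */
    for (; i < n; i++)
        dst[i] = src[i] * src[i] * 4.0f;
}
```

The unaligned load and store variants are used so the sketch does not depend
on how the arrays are allocated. With 16-byte aligned arrays, _mm_load_ps and
_mm_store_ps could be used instead.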

Duncan Sands

2011-May-22 19:27 UTC

[LLVMdev] No SSE instructions

Hi Serg,
> Next, I disassembled the executable file and did not find any SSE
> instructions.
> I know that LLVM supports SSE.
> So my questions:
>    1. Does this occur only on my computer?
>    2. If it is not just my problem, are there no SSE optimizations in LLVM?
>    3. Has anyone already worked on this problem?

the gcc-4.5 tree vectorizer vectorizes this (see LLVM IR below) but LLVM does
not yet have an auto-vectorizer that can do this.

Ciao, Duncan.

IR produced by dragonegg using -O3 and -fplugin-arg-dragonegg-enable-gcc-optzns:

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-f128:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

module asm "\09.ident\09\22GCC: (GNU) 4.5.4 20110506 (prerelease) LLVM: 131851M\22"

@v1 = common global [10000 x i32] zeroinitializer, align 32
@.cst = private constant [4 x i8] c"%d \00", align 8

define i32 @main() nounwind {
entry:
   br label %"<bb 3>"

"<bb 3>":                                         ; preds = %"<bb 3>", %entry
   %indvar2 = phi i64 [ %indvar.next3, %"<bb 3>" ], [ 0, %entry ]
   %vect_vec_iv_.8_10 = phi <4 x i32> [ %vect_vec_iv_.8_24, %"<bb 3>" ], [ <i32 0, i32 1, i32 2, i32 3>, %entry ]
   %tmp6 = shl i64 %indvar2, 2
   %scevgep7 = getelementptr [10000 x i32]* @v1, i64 0, i64 %tmp6
   %scevgep78 = bitcast i32* %scevgep7 to <4 x i32>*
   %vect_vec_iv_.8_24 = add nsw <4 x i32> %vect_vec_iv_.8_10, <i32 4, i32 4, i32 4, i32 4>
   store <4 x i32> %vect_vec_iv_.8_10, <4 x i32>* %scevgep78, align 16
   %indvar.next3 = add i64 %indvar2, 1
   %exitcond4 = icmp eq i64 %indvar.next3, 2500
   br i1 %exitcond4, label %"<bb 5>", label %"<bb 3>"

"<bb 5>":                                         ; preds = %"<bb 3>", %"<bb 5>"
   %indvar = phi i64 [ %indvar.next, %"<bb 5>" ], [ 0, %"<bb 3>" ]
   %scevgep = getelementptr [10000 x i32]* @v1, i64 0, i64 %indvar
   %D.3943_6 = load i32* %scevgep, align 4
   %0 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]* @.cst, i64 0, i64 0), i32 %D.3943_6) nounwind
   %indvar.next = add i64 %indvar, 1
   %exitcond = icmp eq i64 %indvar.next, 10000
   br i1 %exitcond, label %"<bb 6>", label %"<bb 5>"

"<bb 6>":                                         ; preds = %"<bb 5>"
   ret i32 0
}

declare i32 @printf(i8* nocapture, ...) nounwind

Serg Anohovsky

2011-May-22 19:31 UTC

[LLVMdev] No SSE instructions

2011/5/22 Chris Lattner <clattner at apple.com>
>
> LLVM does not have an autovectorizer.
>
> -Chris
>

Could you please tell me whether you are going to implement an autovectorizer
in LLVM in the near future?

--
Serg Anohovsky

Chris Lattner

2011-May-22 21:41 UTC

[LLVMdev] No SSE instructions

On May 22, 2011, at 12:31 PM, Serg Anohovsky wrote:
> 
> 
> 2011/5/22 Chris Lattner <clattner at apple.com>
> 
> LLVM does not have an autovectorizer.
> 
> -Chris
> 
> Could you please tell me whether you are going to implement an autovectorizer
> in LLVM in the near future?

I'm confident it will happen but have no idea on what timeline.

-Chris


Serg Anohovsky

2011-May-23 07:58 UTC

[LLVMdev] Fwd: No SSE instructions

---------- Forwarded message ----------
From: Serg Anohovsky <serg.anohovsky at gmail.com>
Date: 2011/5/23
Subject: Re: [LLVMdev] No SSE instructions
To: Chris Lattner <clattner at apple.com>


Thank you all for your explanations. This is a really interesting topic for
me.

--
Serg Anohovsky

Tobias Grosser

2011-May-23 10:49 UTC

[LLVMdev] No SSE instructions

On 05/22/2011 04:31 PM, Serg Anohovsky wrote:
>
> 2011/5/22 Chris Lattner <clattner at apple.com>
>
>     LLVM does not have an autovectorizer.
>
>     -Chris
>
> Could you please tell me whether you are going to implement an autovectorizer
> in LLVM in the near future?

Hi Serg,

there is some preliminary work done in the Polly project[1] on
autovectorization, though we mainly work on loop transformations that
will expose more vectorization opportunities. If you are interested in doing
research in this area, Polly may be a good start.

Cheers
Tobi

[1] http://polly.grosser.es
