Serge Preis via llvm-dev
2017-Jul-03 05:19 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
Hello,
This is not equivalent rewrite: your original code definitely shouldn't
vectorize because it has backward cross-iteration (loop-carried) dependency:
your value on iteration j+1 depend on value from iteration j you've just
written. In case of vectorization you need to do load-operation-store on
multiple consecutive values at once and it is impossible in this case.
In your new code all values of a[] in right-hand side of the main loop are from
previous k iteration (because on current k iteration you're writing to
'b'). So there is no way to vectorize loop in its original form, but new
form is definitely vectorizable.
I am second to recommend you filing a bug over 'restrict' behavior. And
you may in fact save some memory by making 'b' 1D array (this is not
equivalent rewrite once again though)
// This function computes 2D-5 point Jacobi stencil
void stencil(int a[][N], int b[N])
{
int i, j, k;
for (k = 0; k < N; k++) {
for (i = 1; i <= N-2; i++) {
for (j = 1; j <= N-2; j++)
b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +
a[i][j+1]);
for (j = 1; j <= N-2; j++)
a[i][j] = b[j];
}
}
}
There is a way to vectorize your code even more efficient. This will look like
more low-level, but probably closest to what you would expect from
vectorization.
void stencil(int a[][N])
{
int i, j, k;
for (k = 0; k < 100; k++) {
for (i = 1; i <= N-2; i++) {
for (j = 1; j <= N-2; j+=16) {
int b[16]; // This should become a register on KNL
for (v = 0; v < 16; v++) {
b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] +
a[i][j-1+v] + a[i][j+1+v]);
}
for (v = 0; v < 16; v++) { // This will be a single store
operation
a[i][j+v] = b[v];
}
}
// You should explicitly take care about the tail of j-loop
#if !MASKED_SHORT_LOOP_VECTORIZATION_SUPPORTED // This is not an actual name,
just a designator
for (;j <= N-2; j++) {
a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +
a[i][j+1]);
}
#else
for (v = 0; v < 16-(N-2-j); v++) { // This would become masked
non-loop
b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] + a[i][j-1+v]
+ a[i][j+1+v]);
}
for (v = 0; v < 16-(N-2-j); v++) { // This will be a single
masked store operation
a[i][j+v] = b[v];
}
#endif
}
}
Unfortunalely compiler cannot do this for you: this is not equivalent
transformation of original code. I am also not aware of any way to express this
desired behavior less explicitly (e.g. OpenMP SIMD pragma won't work in this
case).
Minor note: You're using 'int' for data, than multiply by 0.25
(divide by 4) and than write it back to 'int'. This will cost you 2
conversion to/from double while you may just place (...) / 4 which should be
optimized to simple sequecnce with shifts (not to single shift due to
signedness, but still better than conversions with changes of element size
4->8->4 and data size INT->FP->INT).
And by the way why do you divide by 4, not by 5 as number of points suggest?
Serge Preis
02.07.2017, 05:11, "hameeza ahmed via llvm-dev" <llvm-dev at
lists.llvm.org>:> further i modified the code to the following;
>
> #include <stdio.h>
> #define N 100351
>
> // This function computes 2D-5 point Jacobi stencil
> void stencil(int a[restrict][N], int b[restrict][N])
> {
> int i, j, k;
> for (k = 0; k < N; k++) {
> for (i = 1; i <= N-2; i++)
> for (j = 1; j <= N-2; j++)
> b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +
a[i][j+1]);
>
> for (i = 1; i <= N-2; i++)
> for (j = 1; j <= N-2; j++)
> a[i][j] = b[i][j];
>
> }
> }
>
> but still no vectorization in IR. Also, when I set vector width
explicitly to 64, it gives the following error:
>
> remark: <unknown>:0:0: loop not vectorized: call instruction cannot
be vectorized
> remark: <unknown>:0:0: loop not vectorized: value that could not be
identified as reduction is used outside the loop
>
> I need serious help on this. Please guide me.
>
> On Sun, Jul 2, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at
gmail.com> wrote:
>> Does it happen due to loop carried dependence? if yes what is the
solution to vectorize such codes?
>>
>> please reply. i m waiting.
>>
>> On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at
gmail.com> wrote:
>>> I even tried polly but still my llvm IR does not contain vector
instructions. i used the following command;
>>>
>>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm -polly -mllvm
-polly-vectorizer=stripmine -o stencil_poly.ll
>>>
>>> Please specify what is wrong with my code?
>>>
>>> On Sat, Jul 1, 2017 at 4:08 PM, hameeza ahmed <hahmed2305 at
gmail.com> wrote:
>>>> Hello,
>>>>
>>>> I am trying to vectorize following stencil code;
>>>>
>>>> #include <stdio.h>
>>>> #define N 100351
>>>>
>>>> // This function computes 2D-5 point Jacobi stencil
>>>> void stencil(int a[restrict][N])
>>>> {
>>>> int i, j, k;
>>>> for (k = 0; k < 100; k++)
>>>> { for (i = 1; i <= N-2; i++)
>>>> { for (j = 1; j <= N-2; j++)
>>>> { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
a[i][j-1] + a[i][j+1]);
>>>> }
>>>> }
>>>> }}
>>>>
>>>> I have used the following commands
>>>>
>>>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm
-disable-llvm-optzns -o stencil.ll
>>>>
>>>> opt -S -O3 stencil.ll -o stencil_o3.ll
>>>>
>>>> llc -x86-asm-syntax=intel stencil_o3.ll -o stencil.s
>>>>
>>>> But the code is not vectorized. It still uses the scalar
instructions;
>>>>
>>>> Please correct me.
>>>>
>>>> Thank You
> ,
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
hameeza ahmed via llvm-dev
2017-Oct-21 20:21 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
Hello,
The following stencil function that you wrote few months ago (for
computing 2D-5 point Jacobi stencil) is vectorizable.
void stencil(int a[][N], int b[N])
{
int i, j, k;
for (k = 0; k < N; k++) {
for (i = 1; i <= N-2; i++) {
for (j = 1; j <= N-2; j++)
b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1]
+ a[i][j+1]);
for (j = 1; j <= N-2; j++)
a[i][j] = b[j];
}
}
}
but when i write the above code in main i am getting error
opt -S -O3 stencil.ll -o stencil_o3.ll
remark: <unknown>:0:0: loop not vectorized: value that could not be
identified as reduction is used outside the loop
Why is that so?
my code is follows:
int a[N][N], b[N];
int main(void)
{
int i, j, k;
for (k = 0; k < N; k++) {
for (i = 1; i <= N-2; i++) {
for (j = 1; j <= N-2; j++)
b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1]
+ a[i][j+1]);
for (j = 1; j <= N-2; j++)
a[i][j] = b[j];
}
}
}
Please help.
Thank You.
On Mon, Jul 3, 2017 at 10:19 AM, Serge Preis <spreis at yandex-team.ru>
wrote:
> Hello,
>
> This is not equivalent rewrite: your original code definitely shouldn't
> vectorize because it has backward cross-iteration (loop-carried)
> dependency: your value on iteration j+1 depend on value from iteration j
> you've just written. In case of vectorization you need to do
> load-operation-store on multiple consecutive values at once and it is
> impossible in this case.
>
> In your new code all values of a[] in right-hand side of the main loop are
> from previous k iteration (because on current k iteration you're
writing to
> 'b'). So there is no way to vectorize loop in its original form,
but new
> form is definitely vectorizable.
>
> I am second to recommend you filing a bug over 'restrict' behavior.
And
> you may in fact save some memory by making 'b' 1D array (this is
not
> equivalent rewrite once again though)
>
> // This function computes 2D-5 point Jacobi stencil
> void stencil(int a[][N], int b[N])
> {
> int i, j, k;
> for (k = 0; k < N; k++) {
> for (i = 1; i <= N-2; i++) {
> for (j = 1; j <= N-2; j++)
> b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
> a[i][j-1] + a[i][j+1]);
> for (j = 1; j <= N-2; j++)
> a[i][j] = b[j];
> }
> }
> }
>
> There is a way to vectorize your code even more efficient. This will look
> like more low-level, but probably closest to what you would expect from
> vectorization.
>
> void stencil(int a[][N])
> {
> int i, j, k;
> for (k = 0; k < 100; k++) {
> for (i = 1; i <= N-2; i++) {
> for (j = 1; j <= N-2; j+=16) {
> int b[16]; // This should become a register on KNL
> for (v = 0; v < 16; v++) {
> b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] +
> a[i][j-1+v] + a[i][j+1+v]);
> }
> for (v = 0; v < 16; v++) { // This will be a single
store
> operation
> a[i][j+v] = b[v];
> }
> }
> // You should explicitly take care about the tail of j-loop
> #if !MASKED_SHORT_LOOP_VECTORIZATION_SUPPORTED // This is not an
> actual name, just a designator
> for (;j <= N-2; j++) {
> a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
> a[i][j-1] + a[i][j+1]);
> }
> #else
> for (v = 0; v < 16-(N-2-j); v++) { // This would become
masked
> non-loop
> b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] +
> a[i][j-1+v] + a[i][j+1+v]);
> }
> for (v = 0; v < 16-(N-2-j); v++) { // This will be a single
> masked store operation
> a[i][j+v] = b[v];
> }
> #endif
> }
> }
>
> Unfortunalely compiler cannot do this for you: this is not equivalent
> transformation of original code. I am also not aware of any way to express
> this desired behavior less explicitly (e.g. OpenMP SIMD pragma won't
work
> in this case).
>
> Minor note: You're using 'int' for data, than multiply by 0.25
(divide by
> 4) and than write it back to 'int'. This will cost you 2 conversion
to/from
> double while you may just place (...) / 4 which should be optimized to
> simple sequecnce with shifts (not to single shift due to signedness, but
> still better than conversions with changes of element size 4->8->4
and data
> size INT->FP->INT).
>
> And by the way why do you divide by 4, not by 5 as number of points
> suggest?
>
> Serge Preis
>
>
> 02.07.2017, 05:11, "hameeza ahmed via llvm-dev" <llvm-dev at
lists.llvm.org>:
> > further i modified the code to the following;
> >
> > #include <stdio.h>
> > #define N 100351
> >
> > // This function computes 2D-5 point Jacobi stencil
> > void stencil(int a[restrict][N], int b[restrict][N])
> > {
> > int i, j, k;
> > for (k = 0; k < N; k++) {
> > for (i = 1; i <= N-2; i++)
> > for (j = 1; j <= N-2; j++)
> > b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1]
+
> a[i][j+1]);
> >
> > for (i = 1; i <= N-2; i++)
> > for (j = 1; j <= N-2; j++)
> > a[i][j] = b[i][j];
> >
> > }
> > }
> >
> > but still no vectorization in IR. Also, when I set vector width
> explicitly to 64, it gives the following error:
> >
> > remark: <unknown>:0:0: loop not vectorized: call instruction
cannot be
> vectorized
> > remark: <unknown>:0:0: loop not vectorized: value that could not
be
> identified as reduction is used outside the loop
> >
> > I need serious help on this. Please guide me.
> >
> > On Sun, Jul 2, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at
gmail.com>
> wrote:
> >> Does it happen due to loop carried dependence? if yes what is the
> solution to vectorize such codes?
> >>
> >> please reply. i m waiting.
> >>
> >> On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305
at gmail.com> wrote:
> >>> I even tried polly but still my llvm IR does not contain
vector
> instructions. i used the following command;
> >>>
> >>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm -polly
-mllvm
> -polly-vectorizer=stripmine -o stencil_poly.ll
> >>>
> >>> Please specify what is wrong with my code?
> >>>
> >>> On Sat, Jul 1, 2017 at 4:08 PM, hameeza ahmed <hahmed2305
at gmail.com>
> wrote:
> >>>> Hello,
> >>>>
> >>>> I am trying to vectorize following stencil code;
> >>>>
> >>>> #include <stdio.h>
> >>>> #define N 100351
> >>>>
> >>>> // This function computes 2D-5 point Jacobi stencil
> >>>> void stencil(int a[restrict][N])
> >>>> {
> >>>> int i, j, k;
> >>>> for (k = 0; k < 100; k++)
> >>>> { for (i = 1; i <= N-2; i++)
> >>>> { for (j = 1; j <= N-2; j++)
> >>>> { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] +
a[i+1][j] +
> a[i][j-1] + a[i][j+1]);
> >>>> }
> >>>> }
> >>>> }}
> >>>>
> >>>> I have used the following commands
> >>>>
> >>>> clang -S -emit-llvm stencil.c -march=knl -O3 -mllvm
> -disable-llvm-optzns -o stencil.ll
> >>>>
> >>>> opt -S -O3 stencil.ll -o stencil_o3.ll
> >>>>
> >>>> llc -x86-asm-syntax=intel stencil_o3.ll -o stencil.s
> >>>>
> >>>> But the code is not vectorized. It still uses the scalar
instructions;
> >>>>
> >>>> Please correct me.
> >>>>
> >>>> Thank You
> > ,
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20171022/c765aadb/attachment.html>
Serge Preis via llvm-dev
2017-Oct-23 06:37 UTC
[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing
<div> </div><div> </div><div>Hello,</div><div> </div><div>To
me this is an issue in llvm loop vectorizer (if N is large enough to prevent
complete unrolling of
j-loop).</div><div> </div><div>Woud you mind to share
stencil.ll than I would say more definitely what the issue
is.</div><div> </div><div>Regards,</div><div>Serge.</div><div> </div><div> </div><div>22.10.2017,
03:21, "hameeza ahmed"
<hahmed2305@gmail.com>:</div><blockquote
type="cite"><div>Hello,<div> </div><div>The
following stencil function that you wrote few months ago (for computing 2D-5
point Jacobi stencil) is
vectorizable.</div><div> </div><div><div>void
stencil(int a[][N], int b[N])</div><div>{</div><div>
int i, j, k;</div><div> for (k = 0; k < N; k++)
{</div><div> for (i = 1; i <= N-2; i++)
{</div><div> for (j = 1; j <= N-2;
j++)</div><div> b[j] = 0.25 * (a[i][j] + a[i-1][j] +
a[i+1][j] + a[i][j-1] + a[i][j+1]);</div><div> for (j =
1; j <= N-2; j++)</div><div> a[i][j] =
b[j];</div><div> }</div><div>
}</div><div>}</div></div><div> </div><div>but
when i write the above code in main i am getting
error </div><div> </div><div><div>opt -S -O3
stencil.ll -o stencil_o3.ll</div><div>remark: <unknown>:0:0:
loop not vectorized: value that could not be identified as reduction is used
outside the loop</div></div><div> </div><div>Why
is that so?</div><div> </div><div>my code is
follows:</div><div>int a[N][N],
b[N];</div><div><div>int
main(void)</div><div>{</div><div> int i, j,
k;</div><div> for (k = 0; k < N; k++)
{</div><div> for (i = 1; i <= N-2; i++)
{</div><div> for (j = 1; j <= N-2;
j++)</div><div> b[j] = 0.25 * (a[i][j] + a[i-1][j] +
a[i+1][j] + a[i][j-1] + a[i][j+1]);</div><div> for (j =
1; j <= N-2; j++)</div><div> a[i][j] =
b[j];</div><div> }</div><div>
}</div><div>}</div></div><div> </div><div> </div><div>Please
help.</div><div> </div><div>Thank
You.</div><div> </div></div><div> <div>On
Mon, Jul 3, 2017 at 10:19 AM, Serge Preis <span><<a
target="_blank"
href="mailto:spreis@yandex-team.ru">spreis@yandex-team.ru</a>></span>
wrote:<blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc
solid;padding-left:1ex;">Hello,<br /><br />This is not
equivalent rewrite: your original code definitely shouldn't vectorize
because it has backward cross-iteration (loop-carried) dependency: your value on
iteration j+1 depend on value from iteration j you've just written. In case
of vectorization you need to do load-operation-store on multiple consecutive
values at once and it is impossible in this case.<br /><br />In your
new code all values of a[] in right-hand side of the main loop are from previous
k iteration (because on current k iteration you're writing to 'b').
So there is no way to vectorize loop in its original form, but new form is
definitely vectorizable.<br /><br />I am second to recommend you
filing a bug over 'restrict' behavior. And you may in fact save some
memory by making 'b' 1D array (this is not equivalent rewrite once again
though)<br /><br /><span>// This function computes 2D-5 point
Jacobi stencil</span><br />void stencil(int a[][N], int b[N])<br
/><span>{<br /> int i, j, k;<br /> for (k = 0; k <
N; k++) {<br /> for (i = 1; i <= N-2; i++) {<br />
for (j = 1; j <= N-2; j++)</span><br /> b[j] =
0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);<br
/><span> for (j = 1; j <= N-2; j++)</span><br
/> a[i][j] = b[j];<br /> }<br />
}<br />}<br /><br />There is a way to vectorize your code even
more efficient. This will look like more low-level, but probably closest to what
you would expect from vectorization.<br /><br />void stencil(int
a[][N])<br /><span>{<br /> int i, j, k;<br /> for
(k = 0; k < 100; k++) { <br /> for (i = 1; i <= N-2; i++)
{</span><br /> for (j = 1; j <= N-2; j+=16) {<br
/> int b[16]; // This should become a register on KNL<br
/> for (v = 0; v < 16; v++) {<br />
b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] + a[i][j-1+v] +
a[i][j+1+v]);<br /> }<br /> for (v = 0;
v < 16; v++) { // This will be a single store operation<br />
a[i][j+v] = b[v];<br /> }<br />
}<br /> // You should explicitly take care about the tail of
j-loop<br />#if !MASKED_SHORT_LOOP_VECTORIZATION_SUPPORTED // This is
not an actual name, just a designator<br /> for (;j <= N-2;
j++) {<br /><span> a[i][j] = 0.25 * (a[i][j] +
a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);<br />
}</span><br />#else<br /> for (v = 0; v <
16-(N-2-j); v++) { // This would become masked non-loop<br />
b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] + a[i][j-1+v] +
a[i][j+1+v]);<br /> }<br /> for (v = 0; v <
16-(N-2-j); v++) { // This will be a single masked store operation<br />
a[i][j+v] = b[v];<br /> }<br />#endif<br
/> }<br />}<br /><br />Unfortunalely compiler cannot do
this for you: this is not equivalent transformation of original code. I am also
not aware of any way to express this desired behavior less explicitly (e.g.
OpenMP SIMD pragma won't work in this case).<br /><br />Minor
note: You're using 'int' for data, than multiply by 0.25 (divide by
4) and than write it back to 'int'. This will cost you 2 conversion
to/from double while you may just place (...) / 4 which should be optimized to
simple sequecnce with shifts (not to single shift due to signedness, but still
better than conversions with changes of element size 4->8->4 and data size
INT->FP->INT).<br /><br />And by the way why do you divide by
4, not by 5 as number of points suggest?<br /><br />Serge
Preis<br /><br /><br />02.07.2017, 05:11, "hameeza ahmed
via llvm-dev" <<a
href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>>:<div><div>>
further i modified the code to the following;<br />><br />>
#include <stdio.h><br />> #define N 100351<br />><br
/>> // This function computes 2D-5 point Jacobi stencil<br />>
void stencil(int a[restrict][N], int b[restrict][N])<br />> {<br
/>> int i, j, k;<br />> for (k = 0; k < N; k++) {<br
/>> for (i = 1; i <= N-2; i++)<br />> for (j
= 1; j <= N-2; j++)<br />> b[i][j] = 0.25 * (a[i][j] +
a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);<br />><br />>
for (i = 1; i <= N-2; i++)<br />> for (j = 1; j <= N-2;
j++)<br />> a[i][j] = b[i][j];<br />><br />> }<br
/>> }<br />><br />> but still no vectorization in IR.
Also, when I set vector width explicitly to 64, it gives the following
error:<br />><br />> remark: <unknown>:0:0: loop not
vectorized: call instruction cannot be vectorized<br />> remark:
<unknown>:0:0: loop not vectorized: value that could not be identified as
reduction is used outside the loop<br />><br />> I need
serious help on this. Please guide me.<br />><br />> On Sun,
Jul 2, 2017 at 1:54 AM, hameeza ahmed <<a
href="mailto:hahmed2305@gmail.com">hahmed2305@gmail.com</a>>
wrote:<br />>> Does it happen due to loop carried dependence? if yes
what is the solution to vectorize such codes?<br />>><br
/>>> please reply. i m waiting.<br />>><br />>>
On Jul 1, 2017 12:30 PM, "hameeza ahmed" <<a
href="mailto:hahmed2305@gmail.com">hahmed2305@gmail.com</a>>
wrote:<br />>>> I even tried polly but still my llvm IR does not
contain vector instructions. i used the following command;<br
/>>>><br />>>> clang -S -emit-llvm stencil.c
-march=knl -O3 -mllvm -polly -mllvm -polly-vectorizer=stripmine -o
stencil_poly.ll<br />>>><br />>>> Please specify
what is wrong with my code?<br />>>><br />>>> On
Sat, Jul 1, 2017 at 4:08 PM, hameeza ahmed <<a
href="mailto:hahmed2305@gmail.com">hahmed2305@gmail.com</a>>
wrote:<br />>>>> Hello,<br />>>>><br
/>>>>> I am trying to vectorize following stencil code;<br
/>>>>><br />>>>> #include <stdio.h><br
/>>>>> #define N 100351<br />>>>><br
/>>>>> // This function computes 2D-5 point Jacobi stencil<br
/>>>>> void stencil(int a[restrict][N])<br
/>>>>> {<br />>>>> int i, j, k;<br
/>>>>> for (k = 0; k < 100; k++)<br
/>>>>> { for (i = 1; i <= N-2; i++)<br
/>>>>> { for (j = 1; j <= N-2; j++)<br
/>>>>> { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] +
a[i+1][j] + a[i][j-1] + a[i][j+1]);<br />>>>>
}<br />>>>> }<br />>>>> }}<br
/>>>>><br />>>>> I have used the following
commands<br />>>>><br />>>>> clang -S
-emit-llvm stencil.c -march=knl -O3 -mllvm -disable-llvm-optzns -o
stencil.ll<br />>>>><br />>>>> opt -S -O3
stencil.ll -o stencil_o3.ll<br />>>>><br
/>>>>> llc -x86-asm-syntax=intel stencil_o3.ll -o stencil.s<br
/>>>>><br />>>>> But the code is not
vectorized. It still uses the scalar instructions;<br
/>>>>><br />>>>> Please correct me.<br
/>>>>><br />>>>> Thank
You</div></div>> ,<br />><br />>
_______________________________________________<br />> LLVM Developers
mailing list<br />> <a
href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br
/>> <a target="_blank"
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a></blockquote></div></div></blockquote>