thr3ads.net - llvm dev - [llvm-dev] [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines [Jan 2017]

If this information is useful, please help other people find it:
Share via:

Evgeny Astigeevich via llvm-dev

2017-Jan-20 17:11 UTC

[llvm-dev] [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

Hi,

We found that today's 17.30%/11.37% performance regressions in LNT
SingleSource/Benchmarks/Shootout/sieve on LNT-AArch64-A53-O3__clang_DEV__aarch64
and LNT-Thumb2v7-A15-O3__clang_DEV__thumbv7
(http://llvm.org/perf/db_default/v4/nts/daily_report/2017/1/20?filter-machine-regex=aarch64%7Carm%7Cthumb%7Cgreen)
are caused by changes [rL292492] in InstCombine:

https://reviews.llvm.org/D28406 "[InstCombine] icmp sgt (shl nsw X, C1), C0
--> icmp sgt X, C0 >> C1"

The Loop Vectorizer generates code with more instructions:

==== Loop Vectorizer from rL292492  ===for.body5:                               
; preds = %for.inc16.for.body5_crit_edge, %for.cond.preheader
  %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0,
%for.cond.preheader ]
  %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1,
%for.cond.preheader ]
  %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0,
%for.cond.preheader ]
  %i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2,
%for.cond.preheader ]
  %2 = add i64 %indvar, 2
  %3 = shl i64 %indvar, 1
  %4 = add i64 %3, 4
  %5 = add i64 %indvar, 2
  %6 = shl i64 %indvar, 1
  %7 = add i64 %6, 4
  %8 = add i64 %indvar, 2
  %9 = mul i64 %indvar, 3
  %10 = add i64 %9, 6
  %11 = icmp sgt i64 %10, 8193
  %smax = select i1 %11, i64 %10, i64 8193
  %12 = mul i64 %indvar, -2
  %13 = add i64 %12, -5
  %14 = add i64 %smax, %13
  %15 = add i64 %indvar, 2
  %16 = udiv i64 %14, %15
  %17 = add i64 %16, 1
  %tobool7 = icmp eq i8 %1, 0
  br i1 %tobool7, label %for.inc16, label %if.then
===============================
The code generated by the Loop Vectorizer before the changes:

==== Loop Vectorizer from rL292487 ===for.body5:                                
; preds = %for.inc16.for.body5_crit_edge, %for.cond.preheader
  %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0,
%for.cond.preheader ]
  %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1,
%for.cond.preheader ]
  %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0,
%for.cond.preheader ]
  %i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2,
%for.cond.preheader ]
  %2 = add i64 %indvar, 2
  %3 = shl i64 %indvar, 1
  %4 = add i64 %3, 4
  %5 = add i64 %indvar, 2
  %6 = shl i64 %indvar, 1
  %7 = add i64 %6, 4
  %8 = add i64 %indvar, 2
  %9 = mul i64 %indvar, -2
  %10 = add i64 %9, 8188
  %11 = add i64 %indvar, 2
  %12 = udiv i64 %10, %11
  %13 = add i64 %12, 1
  %tobool7 = icmp eq i8 %1, 0
  br i1 %tobool7, label %for.inc16, label %if.then
===============================
I have not investigated yet why the behaviour of the Vectorizer is changed.

Kind regards,
Evgeny Astigeevich
Senior Compiler Engineer
Compilation Tools
ARM

IMPORTANT NOTICE: The contents of this email and any attachments are
confidential and may also be privileged. If you are not the intended recipient,
please notify the sender immediately and do not disclose the contents to any
other person, use it for any purpose, or store or copy the information in any
medium. Thank you.

Sanjay Patel via llvm-dev

2017-Jan-20 18:16 UTC

head link

[llvm-dev] [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

Thanks for letting me know about this problem!

There's no 'shl nsw' visible in the earlier (r292487) code, so it
would be
better to see exactly what the IR looks like before that added transform
fires.

But I see a red flag:
  %smax = select i1 %11, i64 %10, i64 8193

The new icmp transform allowed us to create an smax, but we have this hack
in InstCombiner::visitICmpInst():

  // Test if the ICmpInst instruction is used exclusively by a select as
  // part of a minimum or maximum operation. If so, refrain from doing
  // any other folding. This helps out other analyses which understand
  // non-obfuscated minimum and maximum idioms, such as ScalarEvolution
  // and CodeGen. And in this case, at least one of the comparison
  // operands has at least one user besides the compare (the select),
  // which would often largely negate the benefit of folding anyway.

...so that prevented folding the icmp into the earlier math.

I am actively working on trying to get rid of that bail-out by improving
min/max value tracking and icmp/select folding. In fact, we might be able
to remove it right now, but I don't know the history of that code or what
cases it was supposed to help.

If it's possible, can you remove that check locally, rebuild, and try the
benchmark again on your system? I'd love to know if that change alone would
solve the problem.


On Fri, Jan 20, 2017 at 10:11 AM, Evgeny Astigeevich <
Evgeny.Astigeevich at arm.com> wrote:
> Hi,
>
> We found that today's 17.30%/11.37% performance regressions in LNT
> SingleSource/Benchmarks/Shootout/sieve on
LNT-AArch64-A53-O3__clang_DEV__aarch64
> and LNT-Thumb2v7-A15-O3__clang_DEV__thumbv7 (http://llvm.org/perf/db_defau
> lt/v4/nts/daily_report/2017/1/20?filter-machine-regex=aarch6
> 4%7Carm%7Cthumb%7Cgreen) are caused by changes [rL292492] in InstCombine:
>
> https://reviews.llvm.org/D28406 "[InstCombine] icmp sgt (shl nsw X,
C1),
> C0 --> icmp sgt X, C0 >> C1"
>
> The Loop Vectorizer generates code with more instructions:
>
> ==== Loop Vectorizer from rL292492  ===> for.body5:                     
; preds > %for.inc16.for.body5_crit_edge, %for.cond.preheader
>   %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0,
> %for.cond.preheader ]
>   %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1,
> %for.cond.preheader ]
>   %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0,
> %for.cond.preheader ]
>   %i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2,
> %for.cond.preheader ]
>   %2 = add i64 %indvar, 2
>   %3 = shl i64 %indvar, 1
>   %4 = add i64 %3, 4
>   %5 = add i64 %indvar, 2
>   %6 = shl i64 %indvar, 1
>   %7 = add i64 %6, 4
>   %8 = add i64 %indvar, 2
>   %9 = mul i64 %indvar, 3
>   %10 = add i64 %9, 6
>   %11 = icmp sgt i64 %10, 8193
>   %smax = select i1 %11, i64 %10, i64 8193
>   %12 = mul i64 %indvar, -2
>   %13 = add i64 %12, -5
>   %14 = add i64 %smax, %13
>   %15 = add i64 %indvar, 2
>   %16 = udiv i64 %14, %15
>   %17 = add i64 %16, 1
>   %tobool7 = icmp eq i8 %1, 0
>   br i1 %tobool7, label %for.inc16, label %if.then
> ===============================>
> The code generated by the Loop Vectorizer before the changes:
>
> ==== Loop Vectorizer from rL292487 ===> for.body5:                      
; preds > %for.inc16.for.body5_crit_edge, %for.cond.preheader
>   %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0,
> %for.cond.preheader ]
>   %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1,
> %for.cond.preheader ]
>   %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0,
> %for.cond.preheader ]
>   %i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2,
> %for.cond.preheader ]
>   %2 = add i64 %indvar, 2
>   %3 = shl i64 %indvar, 1
>   %4 = add i64 %3, 4
>   %5 = add i64 %indvar, 2
>   %6 = shl i64 %indvar, 1
>   %7 = add i64 %6, 4
>   %8 = add i64 %indvar, 2
>   %9 = mul i64 %indvar, -2
>   %10 = add i64 %9, 8188
>   %11 = add i64 %indvar, 2
>   %12 = udiv i64 %10, %11
>   %13 = add i64 %12, 1
>   %tobool7 = icmp eq i8 %1, 0
>   br i1 %tobool7, label %for.inc16, label %if.then
> ===============================>
> I have not investigated yet why the behaviour of the Vectorizer is changed.
>
> Kind regards,
> Evgeny Astigeevich
> Senior Compiler Engineer
> Compilation Tools
> ARM
>
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170120/8b5b1150/attachment.html>

Evgeny Astigeevich via llvm-dev

2017-Jan-22 19:40 UTC

head link

[llvm-dev] [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

Hi Sanjay,

The benchmark source file:
http://www.llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/Benchmarks/Shootout/sieve.c?view=markup
Clang options used to produce the initial IR: clang -DNDEBUG  -O3 -DNDEBUG
-mcpu=cortex-a53 -fomit-frame-pointer -O3 -DNDEBUG   -w -Werror=date-time -c
sieve.c -S -emit-llvm -mllvm -disable-llvm-optzns --target=aarch64-arm-linux
Opt options: opt -O3 -o /dev/null -print-before-all -print-after-all sieve.ll
>& sieve.log

I used the IR (in attached sieve.zip) created with the r292487 version.
The attached sieve contains the output of ‘-print-before-all -print-after-all’ 
for r292487 and rL292492.
> If it's possible, can you remove that check locally, rebuild,
> and try the benchmark again on your system? I'd love to know
> if that change alone would solve the problem.
Do you mean to remove the hack in InstCombiner::visitICmpInst()?

Kind regards,
Evgeny Astigeevich
Senior Compiler Engineer
Compilation Tools
ARM


From: Sanjay Patel [mailto:spatel at rotateright.com]
Sent: Friday, January 20, 2017 6:16 PM
To: Evgeny Astigeevich
Cc: llvm-dev; Renato Golin; t.p.northover at gmail.com; hfinkel at anl.gov
Subject: Re: [InstCombine] rL292492 affected LoopVectorizer and caused
17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

Thanks for letting me know about this problem!

There's no 'shl nsw' visible in the earlier (r292487) code, so it
would be better to see exactly what the IR looks like before that added
transform fires.

But I see a red flag:
  %smax = select i1 %11, i64 %10, i64 8193
The new icmp transform allowed us to create an smax, but we have this hack in
InstCombiner::visitICmpInst():

  // Test if the ICmpInst instruction is used exclusively by a select as
  // part of a minimum or maximum operation. If so, refrain from doing
  // any other folding. This helps out other analyses which understand
  // non-obfuscated minimum and maximum idioms, such as ScalarEvolution
  // and CodeGen. And in this case, at least one of the comparison
  // operands has at least one user besides the compare (the select),
  // which would often largely negate the benefit of folding anyway.
...so that prevented folding the icmp into the earlier math.
I am actively working on trying to get rid of that bail-out by improving min/max
value tracking and icmp/select folding. In fact, we might be able to remove it
right now, but I don't know the history of that code or what cases it was
supposed to help.

If it's possible, can you remove that check locally, rebuild, and try the
benchmark again on your system? I'd love to know if that change alone would
solve the problem.


On Fri, Jan 20, 2017 at 10:11 AM, Evgeny Astigeevich <Evgeny.Astigeevich at
arm.com<mailto:Evgeny.Astigeevich at arm.com>> wrote:
Hi,

We found that today's 17.30%/11.37% performance regressions in LNT
SingleSource/Benchmarks/Shootout/sieve on LNT-AArch64-A53-O3__clang_DEV__aarch64
and LNT-Thumb2v7-A15-O3__clang_DEV__thumbv7
(http://llvm.org/perf/db_default/v4/nts/daily_report/2017/1/20?filter-machine-regex=aarch64%7Carm%7Cthumb%7Cgreen)
are caused by changes [rL292492] in InstCombine:

https://reviews.llvm.org/D28406 "[InstCombine] icmp sgt (shl nsw X, C1), C0
--> icmp sgt X, C0 >> C1"

The Loop Vectorizer generates code with more instructions:

==== Loop Vectorizer from rL292492  ===for.body5:                               
; preds = %for.inc16.for.body5_crit_edge, %for.cond.preheader
  %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0,
%for.cond.preheader ]
  %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1,
%for.cond.preheader ]
  %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0,
%for.cond.preheader ]
  %i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2,
%for.cond.preheader ]
  %2 = add i64 %indvar, 2
  %3 = shl i64 %indvar, 1
  %4 = add i64 %3, 4
  %5 = add i64 %indvar, 2
  %6 = shl i64 %indvar, 1
  %7 = add i64 %6, 4
  %8 = add i64 %indvar, 2
  %9 = mul i64 %indvar, 3
  %10 = add i64 %9, 6
  %11 = icmp sgt i64 %10, 8193
  %smax = select i1 %11, i64 %10, i64 8193
  %12 = mul i64 %indvar, -2
  %13 = add i64 %12, -5
  %14 = add i64 %smax, %13
  %15 = add i64 %indvar, 2
  %16 = udiv i64 %14, %15
  %17 = add i64 %16, 1
  %tobool7 = icmp eq i8 %1, 0
  br i1 %tobool7, label %for.inc16, label %if.then
===============================
The code generated by the Loop Vectorizer before the changes:

==== Loop Vectorizer from rL292487 ===for.body5:                                
; preds = %for.inc16.for.body5_crit_edge, %for.cond.preheader
  %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0,
%for.cond.preheader ]
  %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1,
%for.cond.preheader ]
  %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0,
%for.cond.preheader ]
  %i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2,
%for.cond.preheader ]
  %2 = add i64 %indvar, 2
  %3 = shl i64 %indvar, 1
  %4 = add i64 %3, 4
  %5 = add i64 %indvar, 2
  %6 = shl i64 %indvar, 1
  %7 = add i64 %6, 4
  %8 = add i64 %indvar, 2
  %9 = mul i64 %indvar, -2
  %10 = add i64 %9, 8188
  %11 = add i64 %indvar, 2
  %12 = udiv i64 %10, %11
  %13 = add i64 %12, 1
  %tobool7 = icmp eq i8 %1, 0
  br i1 %tobool7, label %for.inc16, label %if.then
===============================
I have not investigated yet why the behaviour of the Vectorizer is changed.

Kind regards,
Evgeny Astigeevich
Senior Compiler Engineer
Compilation Tools
ARM


-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170122/df0cd762/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sieve.zip
Type: application/x-zip-compressed
Size: 40139 bytes
Desc: sieve.zip
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170122/df0cd762/attachment-0001.bin>

David Majnemer via llvm-dev

2017-Jan-22 22:46 UTC

head link

[llvm-dev] [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

On Fri, Jan 20, 2017 at 10:16 AM, Sanjay Patel via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Thanks for letting me know about this problem!
>
> There's no 'shl nsw' visible in the earlier (r292487) code, so
it would be
> better to see exactly what the IR looks like before that added transform
> fires.
>
> But I see a red flag:
>   %smax = select i1 %11, i64 %10, i64 8193
>
> The new icmp transform allowed us to create an smax, but we have this hack
> in InstCombiner::visitICmpInst():
>
>   // Test if the ICmpInst instruction is used exclusively by a select as
>   // part of a minimum or maximum operation. If so, refrain from doing
>   // any other folding. This helps out other analyses which understand
>   // non-obfuscated minimum and maximum idioms, such as ScalarEvolution
>   // and CodeGen. And in this case, at least one of the comparison
>   // operands has at least one user besides the compare (the select),
>   // which would often largely negate the benefit of folding anyway.
>
> ...so that prevented folding the icmp into the earlier math.
>
> I am actively working on trying to get rid of that bail-out by improving
> min/max value tracking and icmp/select folding. In fact, we might be able
> to remove it right now, but I don't know the history of that code or
what
> cases it was supposed to help.
>
I think the primary reason is SCEV.  It has SCEVSMax/SCEVUMax expressions.
Our generic select handling for things like knownbits is going to be less
precise.

>
> If it's possible, can you remove that check locally, rebuild, and try
the
> benchmark again on your system? I'd love to know if that change alone
would
> solve the problem.
>
>
> On Fri, Jan 20, 2017 at 10:11 AM, Evgeny Astigeevich <
> Evgeny.Astigeevich at arm.com> wrote:
>
>> Hi,
>>
>> We found that today's 17.30%/11.37% performance regressions in LNT
>> SingleSource/Benchmarks/Shootout/sieve on
LNT-AArch64-A53-O3__clang_DEV__aarch64
>> and LNT-Thumb2v7-A15-O3__clang_DEV__thumbv7 (
>> http://llvm.org/perf/db_default/v4/nts/daily_report/2017/1/
>> 20?filter-machine-regex=aarch64%7Carm%7Cthumb%7Cgreen) are caused by
>> changes [rL292492] in InstCombine:
>>
>> https://reviews.llvm.org/D28406 "[InstCombine] icmp sgt (shl nsw
X, C1),
>> C0 --> icmp sgt X, C0 >> C1"
>>
>> The Loop Vectorizer generates code with more instructions:
>>
>> ==== Loop Vectorizer from rL292492  ===>> for.body5:             
; preds >> %for.inc16.for.body5_crit_edge, %for.cond.preheader
>>   %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [
>> 0, %for.cond.preheader ]
>>   %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1,
>> %for.cond.preheader ]
>>   %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [
0,
>> %for.cond.preheader ]
>>   %i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2,
>> %for.cond.preheader ]
>>   %2 = add i64 %indvar, 2
>>   %3 = shl i64 %indvar, 1
>>   %4 = add i64 %3, 4
>>   %5 = add i64 %indvar, 2
>>   %6 = shl i64 %indvar, 1
>>   %7 = add i64 %6, 4
>>   %8 = add i64 %indvar, 2
>>   %9 = mul i64 %indvar, 3
>>   %10 = add i64 %9, 6
>>   %11 = icmp sgt i64 %10, 8193
>>   %smax = select i1 %11, i64 %10, i64 8193
>>   %12 = mul i64 %indvar, -2
>>   %13 = add i64 %12, -5
>>   %14 = add i64 %smax, %13
>>   %15 = add i64 %indvar, 2
>>   %16 = udiv i64 %14, %15
>>   %17 = add i64 %16, 1
>>   %tobool7 = icmp eq i8 %1, 0
>>   br i1 %tobool7, label %for.inc16, label %if.then
>> ===============================>>
>> The code generated by the Loop Vectorizer before the changes:
>>
>> ==== Loop Vectorizer from rL292487 ===>> for.body5:              
; preds >> %for.inc16.for.body5_crit_edge, %for.cond.preheader
>>   %indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [
>> 0, %for.cond.preheader ]
>>   %1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1,
>> %for.cond.preheader ]
>>   %count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [
0,
>> %for.cond.preheader ]
>>   %i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2,
>> %for.cond.preheader ]
>>   %2 = add i64 %indvar, 2
>>   %3 = shl i64 %indvar, 1
>>   %4 = add i64 %3, 4
>>   %5 = add i64 %indvar, 2
>>   %6 = shl i64 %indvar, 1
>>   %7 = add i64 %6, 4
>>   %8 = add i64 %indvar, 2
>>   %9 = mul i64 %indvar, -2
>>   %10 = add i64 %9, 8188
>>   %11 = add i64 %indvar, 2
>>   %12 = udiv i64 %10, %11
>>   %13 = add i64 %12, 1
>>   %tobool7 = icmp eq i8 %1, 0
>>   br i1 %tobool7, label %for.inc16, label %if.then
>> ===============================>>
>> I have not investigated yet why the behaviour of the Vectorizer is
>> changed.
>>
>> Kind regards,
>> Evgeny Astigeevich
>> Senior Compiler Engineer
>> Compilation Tools
>> ARM
>>
>> IMPORTANT NOTICE: The contents of this email and any attachments are
>> confidential and may also be privileged. If you are not the intended
>> recipient, please notify the sender immediately and do not disclose the
>> contents to any other person, use it for any purpose, or store or copy
the
>> information in any medium. Thank you.
>>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170122/d35587d7/attachment.html>

Possibly Parallel Threads

Search for more maybe matching threads

llvm dev - Jan 2017 - [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

[llvm-dev] [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

[llvm-dev] [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

[llvm-dev] [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

[llvm-dev] [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

Possibly Parallel Threads