Jonathan Roelofs via llvm-dev
2017-Jan-18 23:21 UTC
[llvm-dev] llvm is getting slower, January edition
On 1/18/17 3:55 PM, Davide Italiano via llvm-dev wrote:
> On Tue, Jan 17, 2017 at 6:02 PM, Mikhail Zolotukhin
> <mzolotukhin at apple.com> wrote:
>> Hi,
>>
>> Continuing recent efforts in understanding compile-time slowdowns, I looked at some historical data: I picked one test and tried to pinpoint the commits that affected its compile time. The data I have is not 100% accurate, but hopefully it helps to provide an overview of what's going on with compile time in LLVM and a better understanding of which changes usually impact it.
>>
>> Configuration:
>> The test I used is tramp3d-v4 from the LLVM test-suite. It consists of a single source file, but still takes a noticeable time to compile, which makes it very convenient for this kind of experiment. The file was compiled with -Os for arm64 on an x86 host.
>>
>> Results:
>> The attached PDF has a compile-time graph, on which I marked the points where compile time changed, with a list of the corresponding commits. A textual version of the list is below, but it may be much harder to comprehend the data without the graph. The number at the end shows the compile-time change after the given commit:
>>
>> 1. r239821: [InstSimplify] Allow folding of fdiv X, X with just NaNs ignored. +1%
>> 2. r241886: [InstCombine] Employ AliasAnalysis in FindAvailableLoadedValue. +1%
>> 3. r245118: [SCEV] Apply NSW and NUW flags via poison value analysis for sub, mul and shl. +2%
>> 4. r246694: [RemoveDuplicatePHINodes] Start over after removing a PHI. -1%
>> 5. r247269: [ADT] Rewrite the StringRef::find implementation to be simpler... +1%
>>    r247240: [LPM] Use a map from analysis ID to immutable passes in the legacy pass manager... +3%
>>    r247264: Enable GlobalsAA by default. +1%
>> 6. r247674: [GlobalsAA] Disable globals-aa by default. -1%
>> 7. r248638: [SCEV] Reapply 'Teach isLoopBackedgeGuardedByCond to exploit trip counts'. +2%
>> 8. r249802: [SCEV] Call `StrengthenNoWrapFlags` after `GroupByComplexity`; NFCI. +4%
>> 9. r250157: [GlobalsAA] Turn GlobalsAA on again by default. +1%
>> 10. r251049: [SCEV] Mark AddExprs as nsw or nuw if legal. +23%
>> 11. No data
>> 12. r259252: AttributeSetImpl: Summarize existing function attributes in a bitset. -1%
>>     r259256: Add LoopSimplifyCFG pass. -2%
>> 13. r262250: Enable LoopLoadElimination by default. +3%
>> 14. r262839: Revert "Enable LoopLoadElimination by default". -3%
>> 15. r263393: Remove PreserveNames template parameter from IRBuilder. -3%
>> 16. r263595: Turn LoopLoadElimination on again. +3%
>> 17. r267672: [LoopDist] Add llvm.loop.distribute.enable loop metadata. +4%
>> 18. r268509: Do not disable completely loop unroll when optimizing for size. -34%
>> 19. r269124: Loop unroller: set thresholds for optsize and minsize functions to zero. +50%
>> 20. r269392: [LoopDist] Only run LAA for loops with the pragma. -4%
>> 21. r270630: Re-enable "[LoopUnroll] Enable advanced unrolling analysis by default" one more time. -28%
>> 22. r270881: Don't allocate in APInt::slt. NFC. -2%
>>     r270959: Don't allocate unnecessarily in APInt::operator[+-]. NFC. -1%
>>     r271020: Don't generate unnecessary signed ConstantRange during multiply. NFC. -3%
>> 23. r271615: [LoopUnroll] Set correct thresholds for new recently enabled unrolling heuristic. +22%
>> 24. r276942: Don't invoke getName() from Function::isIntrinsic(). -1%
>>     r277087: Revert "Don't invoke getName() from Function::isIntrinsic().", rL276942. +1%
>> 25. r279585: [LoopUnroll] By default disable unrolling when optimizing for size.
>> 26. r286814: [InlineCost] Remove skew when calculating call costs. +3%
>> 27. r289755: Make processing @llvm.assume more efficient by using operand bundles. +6%
>> 28. r290086: Revert @llvm.assume with operator bundles (r289755-r289757). -6%
>>
>> Disclaimer:
>> The data is specific to this particular test, so I could have missed commits that affect compile time on other workloads/configurations.
>> The data I have is not perfect, so I could have missed some commits even if they impacted compile time on this test case.
>> The same commits might have a different impact on a different test/configuration, up to the opposite of the one listed.
>> I didn't mean to label any commits as 'good' or 'bad' by posting these numbers. It's expected that some commits increase compile time; we just need to be aware of it and avoid unnecessary slowdowns.
>>
>> Conclusions:
>> Changes in optimization thresholds/cost models usually have the biggest impact on compile time. However, these are usually well assessed, and the trade-offs are discussed and agreed on.
>> Introducing a pass doesn't necessarily mean a compile-time slowdown. Sometimes the total compile time might even decrease, because we save some work for later passes.
>> There are many commits which individually have a low compile-time impact, but which together add up to a noticeable slowdown.
>> Conscious efforts to reduce compile time definitely help - thanks to everyone who has been working on this!
>>
>> Thanks for reading; any comments or suggestions on how to make LLVM faster are welcome! I hope we'll see this graph going down this year :-)
>>
>> Michael
>>
>
> This is great, thanks for the January update :)
> Do you mind sharing how you collected the numbers (script etc.) and
> how you plotted the graph, so I can try repeating it at home with my
> own testcases?

Out of pure curiosity, I would love to see the performance of the resulting binary co-plotted against the same horizontal axis as this compile-duration data.

Jon

> Thanks,

--
Jon Roelofs
jonathan at codesourcery.com
CodeSourcery / Mentor Embedded
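The measurement harness Davide asks about is not included in the thread. What follows is only a minimal sketch of the kind of per-revision timing sweep being described; the revision list, the builds/<rev>/bin/clang layout, and the arm64-apple-ios target triple are illustrative assumptions, not the actual setup behind the graph.

#!/usr/bin/env python3
# Minimal sketch of a per-revision compile-time sweep over tramp3d-v4.cpp.
# Not the harness used for the graph; the revision list, build layout, and
# target triple are assumptions for illustration only.
import subprocess
import time

REVISIONS = ["r251049", "r269124", "r271615"]  # sample points from the list above
SOURCE = "tramp3d-v4.cpp"                      # single-file test from the LLVM test-suite

def time_compile(clang, source, runs=3):
    # Best-of-N wall-clock time for one -Os arm64 compile, to damp noise.
    best = float("inf")
    for _ in range(runs):
        start = time.monotonic()
        subprocess.run([clang, "-Os", "-target", "arm64-apple-ios",
                        "-c", source, "-o", "/dev/null"], check=True)
        best = min(best, time.monotonic() - start)
    return best

if __name__ == "__main__":
    for rev in REVISIONS:
        clang = f"builds/{rev}/bin/clang"      # assumed: one pre-built clang per revision
        print(f"{rev}: {time_compile(clang, SOURCE):.2f} s")

In practice the LNT/test-suite infrastructure does this bookkeeping, but the core of the measurement is essentially a repeated, timed clang invocation like the one above.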
Mikhail Zolotukhin via llvm-dev
2017-Jan-18 23:35 UTC
[llvm-dev] llvm is getting slower, January edition
> On Jan 18, 2017, at 3:21 PM, Jonathan Roelofs <jonathan at codesourcery.com> wrote:
>
> Out of pure curiosity, I would love to see the performance of the resulting binary co-plotted against the same horizontal axis as this compile-duration data.

LNT doesn't allow plotting them on the same graph, so the best I could do was align one with the other - see the attached PerformanceAndCompileTime.pdf. Though I don't know how representative this test is for performance, the jump at the beginning looks very interesting.

Michael
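LNT itself won't overlay the two metrics, but the overlay can be done outside it. Below is a minimal matplotlib sketch, assuming the two LNT series have been exported to hypothetical compile_time.csv and exec_time.csv files of revision,seconds rows covering the same revisions; none of these file names come from the thread.

# Minimal sketch: co-plot compile time and execution time against a shared
# revision axis. The CSV exports and file names are assumptions, not LNT output.
import csv
import matplotlib.pyplot as plt

def read_series(path):
    # Read a "revision,value" CSV into two parallel lists.
    revs, vals = [], []
    with open(path) as f:
        for rev, val in csv.reader(f):
            revs.append(rev)
            vals.append(float(val))
    return revs, vals

revs, compile_times = read_series("compile_time.csv")  # hypothetical export
_, exec_times = read_series("exec_time.csv")            # same revisions assumed

fig, ax1 = plt.subplots()
ax1.plot(revs, compile_times, color="tab:blue")
ax1.set_ylabel("compile time (s)", color="tab:blue")
ax1.set_xlabel("revision")

ax2 = ax1.twinx()                                        # shared x-axis, second y-scale
ax2.plot(revs, exec_times, color="tab:red")
ax2.set_ylabel("execution time (s)", color="tab:red")

fig.tight_layout()
plt.savefig("compile_vs_exec.png")

twinx() gives the second series its own y-scale while both share the revision axis, which is the co-plotting Jon asked for.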
Sean Silva via llvm-dev
2017-Jan-19 06:00 UTC
[llvm-dev] llvm is getting slower, January edition
Am I reading this right that over the course of the graph we have gotten about 50% slower at compiling this benchmark, and the execution time of the benchmark has tripled? Those are significant regressions along both dimensions.

-- Sean Silva

On Wed, Jan 18, 2017 at 3:35 PM, Mikhail Zolotukhin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> LNT doesn't allow plotting them on the same graph, so the best I could do was align one with the other - see the attached PerformanceAndCompileTime.pdf.
>
> Though I don't know how representative this test is for performance, the jump at the beginning looks very interesting.
>
> Michael