Evgeny Stupachenko via llvm-dev
2017-Feb-21 07:48 UTC
[llvm-dev] An alternative way to resolve complex LSR solutions (need perf testing)
Hi All, I've committed an alternative way to resolve complex LSR solutions (those with more than UINT16_MAX variants) under the option "-lsr-exp-narrow", which is temporarily set to true by default: https://reviews.llvm.org/D29862, r295704. I'll switch the option to false once I get your feedback. The method is based on the mathematical expectation of the number of registers and should generally be closer to the optimal solution. However, there could be corner cases, so please let me know if you see gains or regressions on your benchmarks. The biggest performance changes are on 32-bit x86 (as there are not many registers). On my benchmark set there are more gains than regressions. Compile-time changes are also important (however, I don't expect much change, as complex solutions are not that frequent in hot loops). Thanks in advance, Evgeny
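To illustrate what "mathematical expectation of the number of registers" can mean, here is a minimal sketch. It is not the code from D29862. It assumes a simplified model in which each LSR use has several candidate formulae, each formula is just the set of registers it needs, every formula of a use is equally likely to be chosen, and uses are chosen independently. The names `Formula`, `Use`, `probNotSelected` and `expectedRegisters` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified model: a formula is the set of registers it needs,
// and a use is the list of candidate formulae that could implement it.
using Formula = std::set<std::string>;
using Use = std::vector<Formula>;

// Probability that each register is NOT needed by the final solution,
// assuming a uniform, independent choice of one formula per use.
std::map<std::string, double> probNotSelected(const std::vector<Use>& uses) {
  std::map<std::string, double> prob;
  for (const Use& use : uses)
    for (const Formula& f : use)
      for (const std::string& reg : f)
        prob.emplace(reg, 1.0);

  for (auto& entry : prob) {
    for (const Use& use : uses) {
      assert(!use.empty() && "every use needs at least one formula");
      std::size_t without = 0;
      for (const Formula& f : use)
        if (f.count(entry.first) == 0)
          ++without;
      // Fraction of this use's formulae that avoid the register.
      entry.second *= static_cast<double>(without) / use.size();
    }
  }
  return prob;
}

// By linearity of expectation, the expected number of distinct registers is
// the sum over registers of the probability that the register is selected.
double expectedRegisters(const std::vector<Use>& uses) {
  double expected = 0.0;
  for (const auto& entry : probNotSelected(uses))
    expected += 1.0 - entry.second;
  return expected;
}
```

Under this model, a narrowing heuristic could drop the formulae that contribute most to the expected register count until the search space is small enough to solve exactly. Whether the committed patch ranks formulae this way is not stated in this message.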
David Green via llvm-dev
2017-Feb-23 12:01 UTC
[llvm-dev] An alternative way to resolve complex LSR solutions (need perf testing)
Hello, I originally missed this email, but we did notice the results in our internal benchmarks. Some results are up and some are down, as you might expect. A good place to start would be the LNT results here: http://llvm.org/perf/db_default/v4/nts/daily_report/2017/2/21?day_start=16 They show Shootout-C++/matrix-c++ down, and internally we see Shootout/matrix down in more configurations. Some of our other internal benchmarks with code patterns similar to the matrix benchmarks are down too. They appear to have similar nested for loops and similar access patterns. We don't have compile-time numbers handy, but the LNT results above seem to show them near the bottom. They can sometimes be noisy, but hopefully they are of some help. Dave
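For readers who want a concrete picture of the pattern, the following is an illustrative sketch of a nested-loop matrix multiply. It is not the Shootout source and not one of the internal benchmarks mentioned above. The `Matrix` alias and the `multiply` function are hypothetical names chosen for this example. The innermost loop walks one operand along a row and the other down a column, so several address computations in it are candidates for strength reduction.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix type used only for this illustration.
using Matrix = std::vector<std::vector<int>>;

// Triple-nested loop: c[i][j] = sum over k of a[i][k] * b[k][j].
// Assumes a is rows x inner and b is inner x cols.
Matrix multiply(const Matrix& a, const Matrix& b) {
  const std::size_t rows = a.size();
  const std::size_t inner = b.size();
  const std::size_t cols = inner ? b[0].size() : 0;

  Matrix c(rows, std::vector<int>(cols, 0));
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      int sum = 0;
      for (std::size_t k = 0; k < inner; ++k)
        sum += a[i][k] * b[k][j];  // row access on a, column access on b
      c[i][j] = sum;
    }
  }
  return c;
}
```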