Ahmed Bougacha via llvm-dev
2017-Apr-06 19:06 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
On Thu, Apr 6, 2017 at 6:53 AM, Kristof Beyls via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I've been digging a little bit deeper into the biggest performance
> regressions I've observed.
>
> What I've observed so far is:
> * A lot of the biggest regressions are caused by unnecessarily moving
>   floating point values through general purpose registers. I've raised
>   http://bugs.llvm.org/show_bug.cgi?id=32550 for this. I think this one
>   definitely needs fixing before enabling GlobalISel by default at -O0.
> * FastISel seems to transform division-by-constant-power-of-2 into right
>   shift (see
>   https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/FastISel.cpp#L456-L468).
>   GlobalISel does not. It seems to me that at -O0 there may be reasons not
>   to perform this transformation, but maybe there is a good reason why
>   FastISel does this?

So, FastISel on AArch64 isn't really an "O0" selector: it has a lot of smarts and peepholes, because some JIT users had it as the main optimizing selector for a while. In that sense, it's a pretty aggressive target that IMO we don't have to match.

> * FastISel doesn't seem to handle functions with switch statements, so it
>   falls back to DAGISel. DAGISel produces code that's a lot better than
>   GlobalISel for switch statements at -O0. I'm not sure if we need to do
>   something here before enabling GlobalISel by default. I'm thinking we may
>   need to add a smarter way to lower switch statements rather than just a
>   cascaded sequence of conditional branches.

D31080 seems promising, I've been wanting to take a look, hoping we can use that to emit an optimized lowering. I'm not sure we want that at O0 though (even if only for FastISel+DAGISel parity).

> I'll try to add the above content to the document Diana created at
> https://goo.gl/IS2Bdw too.

Thanks for the investigation! These are also some of the biggest problems I've seen (in particular the FP regbanks).
I'll make sure I find the time to file bugs for all the other issues I'm aware of. (sorry I haven't done that earlier!)

-Ahmed

> Thanks,
>
> Kristof
>
> On 3 Apr 2017, at 17:10, Kristof Beyls <Kristof.Beyls at arm.com> wrote:
>
> I've kicked off a run to compare "-O0 -g" versus "-O0 -g -mllvm -global-isel
> -mllvm -global-isel-abort=2".
> I've selected the test-suite (albeit a version which is a couple of months
> old now) and a few short-running proprietary benchmarks to get data back
> quickly for an initial feel of where things are.
> This was running on Cortex-A57 AArch64 Linux.
>
> I saw one assertion failure in GlobalISel, see
> http://bugs.llvm.org/show_bug.cgi?id=32471. This is in a program compiled at
> -O2 (my out-dated test-suite still overrides -O0 and instead uses -O for
> that program). The root cause of the failure seems to be due to LowLevelType
> not supporting vectors of pointers. I think this demonstrates that for
> correctness, we should be trying to test more than -O0, or even more than
> just LLVM-IR produced by clang, as other front-ends could run into this even
> at -O0.
>
> Due to this assertion failure and the infrastructure I used, the numbers
> below do not include test-suite/MultiSource/Benchmarks results.
>
> On the non-correctness aspects, LNT tells me that:
> - The programs that report execution time, on geomean are about 17% slower.
> - The programs that report scores, on geomean are about 21% slower.
> - Code size is up on geomean about 11%.
> I'm afraid I don't have compile time numbers, nor any feel for debug info
> quality.
>
> I'll need quite a bit more time to dig into the details to come up with
> something actionable, although the fact that LowLevelType doesn't support
> vectors of pointers is already actionable.
> Nevertheless, I thought to share what I see as is, to see if others see
> similar results so far.
>
> I thought Diana was going to look into fallback rate on the test-suite on
> AArch64 linux?
>
> Thanks,
>
> Kristof
>
> On 30 Mar 2017, at 10:54, Renato Golin <renato.golin at linaro.org> wrote:
>
> On 30 March 2017 at 00:27, Quentin Colombet <qcolombet at apple.com> wrote:
>
> On iOS we are at 100% pass rate in O0 g for the LLVM test suite, standard
> benchmarks and unit tests. In about 5% of all functions GlobalIsel falls
> back to SDIsel.
> (Kristof Beyls would have the linux numbers.)
> The self host compiler correctly builds and runs the LLVM test suite in O0.
>
> Having done no tests at all on my side, I think we need to have
> similar numbers on Linux to be able to flip across the board.
>
> I don't want to flip it only for Darwin and not Linux, as that will
> fragment the effort too much.
>
> I'll check with Diana and Kristof to know what's the best way forward,
> but it should be reasonably quick.
>
> cheers,
> --renato
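For readers unfamiliar with the division-by-constant-power-of-2 point raised in the message above, here is a minimal arithmetic sketch of why the rewrite is a peephole rather than a trivial substitution. It is plain Python, not LLVM code, and it does not reproduce the FastISel lines linked in the thread; the function names are hypothetical. It assumes C-style signed division that truncates toward zero, and relies on Python's `>>` behaving as an arithmetic shift on negative integers.

```python
def is_power_of_two(c: int) -> bool:
    """True for 1, 2, 4, 8, ..."""
    return c > 0 and (c & (c - 1)) == 0


def udiv_as_shift(x: int, c: int) -> int:
    """Unsigned x / c, for c a power of two, as a right shift by log2(c)."""
    assert x >= 0 and is_power_of_two(c)
    return x >> (c.bit_length() - 1)


def sdiv_trunc(x: int, c: int) -> int:
    """Reference signed division that truncates toward zero (C semantics)."""
    q = abs(x) // abs(c)
    return q if (x >= 0) == (c >= 0) else -q


def sdiv_as_shift(x: int, c: int) -> int:
    """Signed x / c, for c a power of two, using an arithmetic shift.

    An arithmetic shift rounds toward negative infinity, so negative
    dividends are biased by c - 1 first to get truncation toward zero.
    """
    assert is_power_of_two(c)
    k = c.bit_length() - 1
    bias = (c - 1) if x < 0 else 0
    return (x + bias) >> k


# The unbiased shift disagrees with truncating division for inexact
# negative dividends, which is why the signed case needs extra care.
print(-7 >> 2, sdiv_trunc(-7, 4), sdiv_as_shift(-7, 4))
```

The unsigned case is a single shift. The signed case is only a single shift when the division is known to leave no remainder; otherwise it needs the bias step, which is the kind of extra logic the thread is weighing against keeping -O0 selection simple.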
Kristof Beyls via llvm-dev
2017-Apr-07 08:14 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
On 6 Apr 2017, at 21:06, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote:
> On Thu, Apr 6, 2017 at 6:53 AM, Kristof Beyls via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> I've been digging a little bit deeper into the biggest performance
>> regressions I've observed.
>>
>> What I've observed so far is:
>> * A lot of the biggest regressions are caused by unnecessarily moving
>>   floating point values through general purpose registers. I've raised
>>   http://bugs.llvm.org/show_bug.cgi?id=32550 for this. I think this one
>>   definitely needs fixing before enabling GlobalISel by default at -O0.
>> * FastISel seems to transform division-by-constant-power-of-2 into right
>>   shift (see
>>   https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/FastISel.cpp#L456-L468).
>>   GlobalISel does not. It seems to me that at -O0 there may be reasons not
>>   perform this transformation, but maybe there is a good reason why
>>   FastISel does this?
>
> So, FastISel on AArch64 isn't really an "O0" selector: it has a lot of
> smarts and peepholes, because some JIT users had it as the main optimizing
> selector for a while. In that sense, it's a pretty aggressive target that
> IMO we don't have to match.

OK, that makes sense to me now, and indeed it doesn't seem a good idea to try and do lots of peepholes at -O0.

>> * FastISel doesn't seem to handle functions with switch statements, so it
>>   falls back to DAGISel. DAGISel produces code that's a lot better than
>>   GlobalISel for switch statement at -O0. I'm not sure if we need to do
>>   something here before enabling GlobalISel by default. I'm thinking we may
>>   need to add a smarter way to lower switch statements rather than just a
>>   cascaded sequence of conditional branches.
>
> D31080 seems promising, I've been wanting to take a look, hoping we can use
> that to emit an optimized lowering. I'm not sure we want that at O0 though
> (even if only for FastISel+DAGISel parity).

I wasn't aware of D31080: good to know!
My thinking here is that one of the reasons people use -O0 is they want a pretty straightforward mapping between source code and the generated assembly code. For switch statements, mapping to a cascaded sequence of conditional branches, a jump table, a binary search tree, or any of the other ways to lower switch statements is equally good from this perspective, I think. So if one of these other lowering schemes is as good as a cascaded sequence of branches for aspects such as compile time and debug info quality, I think it's best to choose one of these alternative lowering schemes.

It seems to me that e.g. on MultiSource/Applications/sqlite3/sqlite3, this may be the cause of the almost 2x slowdown compared to non-globalisel -O0.

>> I'll try to add the above content to the document Diana created at
>> https://goo.gl/IS2Bdw too.
>
> Thanks for the investigation! These are also some of the biggest problems
> I've seen (in particular the FP regbanks).
> I'll make sure I find the time to file bugs for all the other issues I'm
> aware of. (sorry I haven't done that earlier!)

I've seen you added 2 bugs so far. I've slotted them in to https://goo.gl/IS2Bdw.
I'm starting to think that it may be easiest if we had a "Meta bug" in bugzilla that combines all the issues we think should be fixed before GlobalISel can be enabled by default at -O0 for AArch64. In the same style as e.g. http://bugs.llvm.org/show_bug.cgi?id=32061. What do you think?

Thanks!

Kristof
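The switch-lowering strategies named in the message above (cascaded conditional branches, binary search tree, jump table) can be compared with a small model. This is an illustrative Python sketch only, not how any LLVM selector is implemented; the function names, the `(case_value, target)` list shape, and the "comparisons" cost model are assumptions made for the example.

```python
def cascade_lookup(cases, value):
    """Cascaded conditional branches: test each case in order.

    cases is a list of (case_value, target) pairs.
    Returns (target, number_of_comparisons).
    """
    comparisons = 0
    for case_value, target in cases:
        comparisons += 1
        if value == case_value:
            return target, comparisons
    return "default", comparisons


def binary_search_lookup(cases, value):
    """Binary search over cases sorted by case_value.

    Each loop iteration is counted as one comparison step.
    """
    lo, hi = 0, len(cases) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        case_value, target = cases[mid]
        if value == case_value:
            return target, comparisons
        if value < case_value:
            hi = mid - 1
        else:
            lo = mid + 1
    return "default", comparisons


def jump_table_lookup(cases, value):
    """Jump table over a dense, sorted range: one bounds check, then index."""
    low, high = cases[0][0], cases[-1][0]
    table = ["default"] * (high - low + 1)
    for case_value, target in cases:
        table[case_value - low] = target
    if value < low or value > high:
        return "default", 1
    return table[value - low], 1


# A dense 64-way switch, looking up the last case.
cases = [(i, f"bb{i}") for i in range(64)]
for lookup in (cascade_lookup, binary_search_lookup, jump_table_lookup):
    print(lookup.__name__, lookup(cases, 63))
```

In this model the cascade needs one comparison per case in the worst case, while the other two grow logarithmically or stay constant, which is consistent with the suspicion in the thread that switch-heavy code suffers most from the cascaded lowering.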
Diana Picus via llvm-dev
2017-Apr-07 08:55 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
On 7 April 2017 at 10:14, Kristof Beyls <Kristof.Beyls at arm.com> wrote:
> I'm starting to think that it may be easiest if we had a "Meta bug" in
> bugzilla that combines all the issues we think should be fixed before
> GlobalISel can be enabled by default at -O0 for AArch64. In the same
> style as e.g. http://bugs.llvm.org/show_bug.cgi?id=32061.

+1, we already do this for lots of other things (inline assembly bugs, bugs building the Linux kernel etc) and it's a good way to keep things organized.