thr3ads.net - llvm dev - [llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try! [Apr 2017]

Kristof Beyls via llvm-dev

2017-Apr-03 15:10 UTC

[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

I've kicked off a run to compare "-O0 -g" versus "-O0 -g
-mllvm -global-isel -mllvm -global-isel-abort=2".
I've selected the test-suite (albeit a version which is a couple of months
old now) and a few short-running proprietary benchmarks to get data back quickly
for an initial feel of where things are.
This was running on Cortex-A57 AArch64 Linux.

I saw one assertion failure in GlobalISel, see
http://bugs.llvm.org/show_bug.cgi?id=32471. This is in a program compiled at -O2
(my out-dated test-suite still overrides -O0 and instead uses -O for that
program). The root cause seems to be that LowLevelType does not support
vectors of pointers. I think this demonstrates that, for correctness, we should
be trying to test more than -O0, or even more than just the LLVM-IR produced by
clang, as other front-ends could run into this even at -O0.

Due to this assertion failure and the infrastructure I used, the numbers below
do not include test-suite/MultiSource/Benchmarks results.

On the non-correctness aspects, LNT tells me that:
- Programs that report execution time are about 17% slower on geomean.
- Programs that report scores are about 21% slower on geomean.
- Code size is up about 11% on geomean.
I'm afraid I don't have compile-time numbers, nor any feel for debug
info quality.
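
For readers unfamiliar with the "geomean" figures above, here is a minimal sketch of how a geometric-mean slowdown can be computed from per-program ratios. The function name and the sample ratios are illustrative assumptions; they are not the measured data from this LNT run, and LNT's own implementation may differ.

```python
import math


def geomean(ratios):
    """Geometric mean of positive ratios (new time / baseline time)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-program execution-time ratios (GlobalISel / baseline).
# These values are made up for illustration only.
sample_ratios = [1.00, 1.50, 1.10]

slowdown_pct = (geomean(sample_ratios) - 1.0) * 100.0
print(f"geomean slowdown: {slowdown_pct:.1f}%")
```

The geometric mean is the usual choice for averaging ratios because a 2x slowdown in one program and a 2x speedup in another cancel out exactly.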

I'll need quite a bit more time to dig into the details to come up with
something actionable, although the fact that LowLevelType doesn't support
vectors of pointers is already actionable.
Nevertheless, I thought to share what I see as is, to see if others see similar
results so far.

I thought Diana was going to look into the fallback rate on the test-suite on
AArch64 Linux?

Thanks,

Kristof

On 30 Mar 2017, at 10:54, Renato Golin <renato.golin at linaro.org> wrote:

> On 30 March 2017 at 00:27, Quentin Colombet <qcolombet at apple.com> wrote:
>> On iOS we are at 100% pass rate in -O0 -g for the LLVM test suite, standard
>> benchmarks and unit tests. In about 5% of all functions GlobalISel falls
>> back to SDISel.
>> (Kristof Beyls would have the Linux numbers.)
>> The self-hosted compiler correctly builds and runs the LLVM test suite in O0.
>
> Having done no tests at all on my side, I think we need to have
> similar numbers on Linux to be able to flip across the board.
>
> I don't want to flip it only for Darwin and not Linux, as that will
> fragment the effort too much.
>
> I'll check with Diana and Kristof to know what's the best way forward,
> but it should be reasonably quick.
>
> cheers,
> --renato

Diana Picus via llvm-dev

2017-Apr-04 13:55 UTC

Hi,

Here are my results so far:

On the test-suite, we get 2 timeouts during execution (paq8p and
scimark2). Other than that, everything seems to run just fine. We have
about 7410 unique fallbacks out of 52367 unique functions, assuming I
counted right (please let me know if this looks fishy; I personally
find the total number of functions terrifyingly small).

On a stage 2 build of clang, we run check-all successfully. We have
about 64784 unique fallbacks out of 661461 functions.
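
As a quick sanity check on the counts above, the fallback rates work out to roughly 14% for the test-suite and just under 10% for the stage 2 clang build, compared with the "about 5%" quoted earlier for iOS. The helper below is only a sketch of that arithmetic; the function name is an assumption.

```python
def fallback_rate(fallbacks, functions):
    """Return the share of functions that fell back, as a percentage."""
    return 100.0 * fallbacks / functions


# Counts quoted in this message.
test_suite_rate = fallback_rate(7410, 52367)
stage2_clang_rate = fallback_rate(64784, 661461)

print(f"test-suite:    {test_suite_rate:.1f}%")
print(f"stage 2 clang: {stage2_clang_rate:.1f}%")
```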

I'm currently trying to run a stage 3 build to compare binaries (just
for kicks) and I'll also try to do some runs without fallbacks and
count what problems we run into most often.

I've been counting the total number of functions by running
objdump -t on all the .o files in the build directory, grabbing everything
with an "F" flag and removing duplicates. If anyone knows a better way
to do this I'm all ears.
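
The counting approach described above could be scripted roughly as follows. This is a sketch, not the script actually used: it assumes GNU objdump's symbol-table layout (an address, a space, then a 7-character flags field whose last character is "F" for functions), and the function names are made up for illustration.

```python
import subprocess
from pathlib import Path

HEX_DIGITS = set("0123456789abcdefABCDEF")


def function_symbols(objdump_output):
    """Return the set of symbol names flagged 'F' in `objdump -t` text."""
    names = set()
    for line in objdump_output.splitlines():
        addr, _, rest = line.partition(" ")
        # Symbol lines start with a hex address; skip headers and blanks.
        if not addr or any(c not in HEX_DIGITS for c in addr):
            continue
        # Assumed layout: 7 flag characters, then section, size and name.
        flags, tail = rest[:7], rest[7:].split()
        if len(flags) == 7 and flags[6] == "F" and tail:
            names.add(tail[-1])
    return names


def count_unique_functions(build_dir):
    """Run objdump -t on every .o file under build_dir and count unique names."""
    unique = set()
    for obj in Path(build_dir).rglob("*.o"):
        result = subprocess.run(
            ["objdump", "-t", str(obj)],
            capture_output=True, text=True, check=True,
        )
        unique |= function_symbols(result.stdout)
    return len(unique)
```

Note that deduplicating by name merges distinct functions that happen to share a name (for example, each program's own main), so the result undercounts slightly.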

Thanks,
Diana


Tim Northover via llvm-dev

2017-Apr-04 16:55 UTC

On 4 April 2017 at 06:55, Diana Picus via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> On the test-suite, we get 2 timeouts during execution (paq8p and
> scimark2).

Interesting. I'd not seen those failures in the configurations I'd
run. I'll look into them (other than that my best bet for debugging is
a kernel panic, this has to be easier!).

> Other than that, everything seems to run just fine. We have
> about 7410 unique fallbacks out of 52367 unique functions. Assuming I
> counted right (please let me know if this looks fishy, I personally
> find the total number of functions terrifyingly small).

It's about 275000 before uniquing (which roughly matches an earlier
measurement I did). The duplication seems to be dominated by the
halide tests, though you have to be a little careful with "main"
(which occurs about 500 times). At a glance I'd put the total closer
to 53000 (52420 + 492 extra copies of main), but that's a small
difference from your figure.

Tim.

Kristof Beyls via llvm-dev

2017-Apr-06 13:53 UTC

I've been digging a little bit deeper into the biggest performance
regressions I've observed.

What I've observed so far is:
* A lot of the biggest regressions are caused by unnecessarily moving floating
point values through general purpose registers. I've raised
http://bugs.llvm.org/show_bug.cgi?id=32550 for this. I think this one definitely
needs fixing before enabling GlobalISel by default at -O0.
* FastISel seems to transform division-by-constant-power-of-2 into right shift
(see
https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/FastISel.cpp#L456-L468).
GlobalISel does not. It seems to me that at -O0 there may be reasons not to
perform this transformation, but maybe there is a good reason why FastISel does
this?
* FastISel doesn't seem to handle functions with switch statements, so it
falls back to DAGISel. DAGISel produces code that's a lot better than
GlobalISel for switch statements at -O0. I'm not sure if we need to do
something here before enabling GlobalISel by default. I'm thinking we may
need to add a smarter way to lower switch statements rather than just a cascaded
sequence of conditional branches.
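
To illustrate the division-by-power-of-2 point, the sketch below models the arithmetic in Python. It is not a rendering of the FastISel code linked above, and the helper names are assumptions. For unsigned values the shift is always equivalent to the division; for signed values a plain arithmetic shift only matches a truncating division when the division is exact.

```python
def udiv_by_shift(x, k):
    """Unsigned x / 2**k via a right shift (x must be >= 0)."""
    return x >> k


def sdiv_trunc(x, d):
    """Signed division truncating toward zero, like C's '/' on ints."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q


def sdiv_by_shift(x, k):
    """Arithmetic right shift; Python's >> rounds toward minus infinity."""
    return x >> k


# Unsigned: the shift matches the division for every input.
for x in range(1024):
    assert udiv_by_shift(x, 3) == x // 8

# Signed, exact division: shift and truncating division agree.
assert sdiv_trunc(-16, 8) == sdiv_by_shift(-16, 3)

# Signed, inexact division: they disagree, so a bare shift would be wrong.
assert sdiv_trunc(-7, 8) != sdiv_by_shift(-7, 3)
```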

I'll try to add the above content to the document Diana created at
https://goo.gl/IS2Bdw too.

Thanks,

Kristof

Ahmed Bougacha via llvm-dev

2017-Apr-06 19:06 UTC

On Thu, Apr 6, 2017 at 6:53 AM, Kristof Beyls via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> I've been digging a little bit deeper into the biggest performance
> regressions I've observed.
>
> What I've observed so far is:
> * A lot of the biggest regressions are caused by unnecessarily moving
> floating point values through general purpose registers. I've raised
> http://bugs.llvm.org/show_bug.cgi?id=32550 for this. I think this one
> definitely needs fixing before enabling GlobalISel by default at -O0.
> * FastISel seems to transform division-by-constant-power-of-2 into right
> shift (see
> https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/FastISel.cpp#L456-L468).
> GlobalISel does not. It seems to me that at -O0 there may be reasons not to
> perform this transformation, but maybe there is a good reason why FastISel
> does this?

So, FastISel on AArch64 isn't really an "O0" selector: it has a lot
of smarts and peepholes, because some JIT users had it as the main
optimizing selector for a while.

In that sense, it's a pretty aggressive target that IMO we don't have to
match.

> * FastISel doesn't seem to handle functions with switch statements, so it
> falls back to DAGISel. DAGISel produces code that's a lot better than
> GlobalISel for switch statements at -O0. I'm not sure if we need to do
> something here before enabling GlobalISel by default. I'm thinking we may
> need to add a smarter way to lower switch statements rather than just a
> cascaded sequence of conditional branches.

D31080 seems promising, I've been wanting to take a look, hoping we
can use that to emit an optimized lowering.  I'm not sure we want that
at O0 though (even if only for FastISel+DAGISel parity).
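
To make the trade-off under discussion concrete, here is a toy Python model contrasting a cascaded chain of compare-and-branch tests with a jump-table lookup. It is purely illustrative: the function names are assumptions, and it does not describe what DAGISel, GlobalISel or D31080 actually emit.

```python
def lower_as_branch_chain(cases, default):
    """Model a cascaded chain: compare against each case value in order."""
    def dispatch(value):
        compares = 0
        for case_value, target in cases:
            compares += 1
            if value == case_value:
                return target, compares
        return default, compares
    return dispatch


def lower_as_jump_table(cases, default):
    """Model a jump table: one range check, then an indexed lookup."""
    lo = min(v for v, _ in cases)
    hi = max(v for v, _ in cases)
    table = [default] * (hi - lo + 1)
    for case_value, target in cases:
        table[case_value - lo] = target

    def dispatch(value):
        if lo <= value <= hi:
            return table[value - lo], 1
        return default, 1
    return dispatch


# Eight dense cases: the chain needs up to 8 compares, the table just 1 check.
cases = [(i, f"bb{i}") for i in range(8)]
chain = lower_as_branch_chain(cases, "default")
table = lower_as_jump_table(cases, "default")
print(chain(7), table(7))
```

The chain's cost grows with the number of cases, while the table's cost stays constant at the price of extra data, which is the kind of difference that shows up in the switch-heavy benchmarks mentioned above.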

> I'll try to add the above content to the document Diana created at
> https://goo.gl/IS2Bdw too.

Thanks for the investigation!  These are also some of the biggest
problems I've seen (in particular the FP regbanks).

I'll make sure I find the time to file bugs for all the other issues
I'm aware of.  (sorry I haven't done that earlier!)

-Ahmed

Quentin Colombet via llvm-dev

2017-Apr-26 23:48 UTC

Hi Kristof,

> On Apr 6, 2017, at 6:53 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>
> I've been digging a little bit deeper into the biggest performance
> regressions I've observed.
>
> What I've observed so far is:
> * A lot of the biggest regressions are caused by unnecessarily moving
> floating point values through general purpose registers. I've raised
> http://bugs.llvm.org/show_bug.cgi?id=32550 for this. I think this one
> definitely needs fixing before enabling GlobalISel by default at -O0.

I commented in the PR. This is a known problem and we have a solution. Given
that this is an optimization, in the sense that it does not affect the
correctness of the program, we didn’t push for fixing it now.

For O0 we wanted to focus on generating correct code. Unless the
regressions you are seeing are preventing debugging/running of the program, I
wouldn’t block the flip of the switch on that.

What do you think?

> * FastISel seems to transform division-by-constant-power-of-2 into right
> shift (see
> https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/FastISel.cpp#L456-L468).
> GlobalISel does not. It seems to me that at -O0 there may be reasons not to
> perform this transformation, but maybe there is a good reason why FastISel
> does this?

I think FastISel tries to generate the best code it can no matter what. For
GISel O0 however, not doing this optimization sounds sensible to me.
Now, I would say that the same remark as for the previous bullet point applies:
we shouldn’t do it unless it gets in the way of running/debugging the program.

> * FastISel doesn't seem to handle functions with switch statements, so it
> falls back to DAGISel. DAGISel produces code that's a lot better than
> GlobalISel for switch statements at -O0. I'm not sure if we need to do
> something here before enabling GlobalISel by default. I'm thinking we may
> need to add a smarter way to lower switch statements rather than just a
> cascaded sequence of conditional branches.

Sounds optimization-ish to me. Same remark.