thr3ads.net - llvm dev - [LLVMdev] Proposal: AArch64/ARM64 merge from EuroLLVM [Apr 2014]


Ana Pazos

2014-Apr-23 20:58 UTC

[LLVMdev] Proposal: AArch64/ARM64 merge from EuroLLVM

Hi Gerolf,

 

Sorry for the delayed response. I had to get permission to share more
details. 

 

I am allowed to share relative numbers but not absolute numbers.

 

Any missing tests are due to runtime failures (e.g., the gcc benchmark failed
because of the fused multiply pattern bug, which Tim fixed later on).

 

Thanks,

Ana.

 


Benchmark     ARM64 vs GCC 4.9 %   ARM64 vs AArch64 %   ARM64 vs AArch64 patched %
SPEC 2000
  art                -10                  -5                      -1
  bzip2               -3                   5                       5
  crafty              -5                   1                       3
  gap                 -8                   1                       2
  gzip                 0                   4                       3
  mcf                 -2                  -1                      -1
  mesa               -15                  -3                      -1
  parser             -10                  -2                       4
  perlbmk              5                   7                       5
  vortex              -3                  -6                      -4
  vpr                -15                  -1                       0
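The thread does not say how the per-benchmark percentages were computed or aggregated. As a hedged cross-check, the Python sketch below assumes each entry is the rounded value of (ARM64 score / baseline score - 1) * 100, and folds the SPEC2000 rows from the table above into a geometric mean of those ratios. The `SPEC2000` dictionary and the `geomean_pct` helper are illustrative names, not part of any benchmark tooling.

```python
import math

# SPEC2000 entries from the table above, in percent:
# (ARM64 vs GCC 4.9, ARM64 vs AArch64, ARM64 vs AArch64 patched).
SPEC2000 = {
    "art":     (-10, -5, -1),
    "bzip2":   (-3, 5, 5),
    "crafty":  (-5, 1, 3),
    "gap":     (-8, 1, 2),
    "gzip":    (0, 4, 3),
    "mcf":     (-2, -1, -1),
    "mesa":    (-15, -3, -1),
    "parser":  (-10, -2, 4),
    "perlbmk": (5, 7, 5),
    "vortex":  (-3, -6, -4),
    "vpr":     (-15, -1, 0),
}


def geomean_pct(pcts):
    """Geometric mean of ratios given as percentages.

    Assumption: each percentage p encodes the ratio 1 + p / 100.
    """
    logs = [math.log(1.0 + p / 100.0) for p in pcts]
    return (math.exp(sum(logs) / len(logs)) - 1.0) * 100.0


columns = ("vs GCC 4.9", "vs AArch64", "vs AArch64 patched")
for i, name in enumerate(columns):
    value = geomean_pct([row[i] for row in SPEC2000.values()])
    print(f"{name}: {value:.1f}%")
```

Under that assumption the three columns come out at about -6.2%, -0.1% and 1.3%, which round to the -6, 0 and 1 in the SPEC2000 geomean row of the April 8 message quoted below. The agreement is only suggestive, since the inputs are already rounded to whole percentages.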

 

 

From: Gerolf Hoflehner [mailto:ghoflehner at apple.com] 
Sent: Tuesday, April 08, 2014 4:46 PM
To: Ana Pazos
Cc: Tim Northover; LLVM Developers Mailing List
Subject: Re: [LLVMdev] Proposal: AArch64/ARM64 merge from EuroLLVM

 

Hi Ana,

 

Could you share the SPEC2000 data per suite and per benchmark?

 

Thanks

Gerolf

 

On Apr 8, 2014, at 1:33 AM, Ana Pazos <apazos at codeaurora.org> wrote:





Hi folks,

 

As Tim pointed out, we recently had the opportunity to collect 64-bit
benchmark performance data for the GCC 4.9, AArch64 and ARM64 compilers on
real hardware: a Cortex-A53 device. For proprietary reasons we cannot share
the full hardware configuration.

 

The preliminary results were shared at the hackers' lab at EuroLLVM
yesterday. For those who could not make it, the summarized performance data
is below.

 

A positive number means the ARM64 run is better by that percentage. A
negative number means the baseline (GCC 4.9 or AArch64) is better by that
percentage.
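A minimal Python sketch of that sign convention follows. It assumes, since the thread does not say, that entries are relative differences of benchmark scores (higher is better), and that for time-based metrics the ratio is inverted so a positive value still favours ARM64. The `relative_pct` function and its inputs are hypothetical.

```python
def relative_pct(arm64: float, baseline: float, higher_is_better: bool = True) -> float:
    """Percentage by which ARM64 beats the baseline (negative: baseline is better).

    Assumption: scores use (ARM64 / baseline - 1) * 100; run times use the
    inverse ratio, so a positive result always favours ARM64.
    """
    if arm64 <= 0 or baseline <= 0:
        raise ValueError("scores and times must be positive")
    ratio = arm64 / baseline if higher_is_better else baseline / arm64
    return (ratio - 1.0) * 100.0


# Hypothetical inputs, not data from the thread.
print(round(relative_pct(105.0, 100.0)))                        # score metric
print(round(relative_pct(11.0, 10.0, higher_is_better=False)))  # time metric
```

With these hypothetical inputs it prints 5 (ARM64 scores 5% higher) and -9 (ARM64 takes 10% longer, so the baseline is about 9% better).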

 

The AArch64 backend has not yet been fully tuned for this processor (some
initial work on modeling the Cortex-A53 has started). However, we quickly
investigated the poor vectorized code in some of the tests (Linpack, for
example) and identified straightforward fixes that improved AArch64
performance (similar changes are already present in ARM64, e.g. the default
loop-unroll limit and unaligned memory accesses). These patches are being
sent to the AArch64 commits list for review.

 

This experiment indicates that, from the point of view of correctness and
performance, either ARM64 or AArch64 could be the base compiler of choice,
provided the known correctness issues (in ARM64) and the lack of
performance tuning (in AArch64) are addressed.

 

However, much more work is needed to catch up with GCC 4.9's middle-end
and backend optimizations.

 


Benchmark                      ARM64 vs GCC 4.9 %   ARM64 vs AArch64 %   ARM64 vs AArch64 patched %
EEMBC (no consumer) geomean           -17                   1                      -2
EEMBC (consumer only) geomean         -21                  -2                      -5
Linpack Double                        -29                  45                      -1
Linpack Single                        -51                  40                       1
SPEC2000 geomean                       -6                   0                       1

 

Thanks,

Ana.

 

 

 

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of Tim Northover
Sent: Tuesday, April 08, 2014 12:04 AM
To: LLVM Developers Mailing List
Subject: Re: [LLVMdev] Proposal: AArch64/ARM64 merge from EuroLLVM

 

Hi again,

 

In my original message I was attempting to summarise the key arguments as I
saw them. Other points came up in the discussion, which Ana kindly recorded
and I'll summarise here:

 

First, extra arguments brought up in favour of each backend (I'll mention
duplicates too so that the list is as complete as possible):

 

+ Register class usage in ARM64 is cleaner.

+ FastISel is implemented in ARM64, but not in AArch64. Some TableGen work
  will be needed to enable it in AArch64 because of how patterns are written
  there.

+ There is no macro support in AArch64.

+ Both NEON syntax variants (general & iOS) are supported by ARM64 now.

+ ARM64 assumes NEON is enabled by default, and indeed has no notion that
  a CPU might not have NEON. Instructions will need to be predicated to check
  that NEON is present, probably with some corresponding .cpp changes where
  it was also assumed.

+ Inline asm is possibly better in ARM64.

+ Anecdotal evidence suggests it's easier to debug MC layer issues on
  ARM64 than on AArch64.

 

Other important points that we discussed:

 

+ We need to set up a buildbot for performance using some real hardware
  (volunteers with hardware?) so patches can be validated on the supported
  targets, and also one for correctness using qemu.

+ Google is working on a framework to build and run benchmarks (available
  soon?), which should enable the buildbot setup from the item above.

+ We need to sort out differences between the cortex-a53 and Cyclone model
  descriptions (both use the new approach for the MI scheduler, but one
  requires annotating instructions and the other does not). We should pin
  down Andy and get him to describe the perfect machine model.

 

Cheers.

 

Tim

 

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


 

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140423/c18032c0/attachment.html>

JF Bastien

2014-Apr-23 22:31 UTC

[LLVMdev] Proposal: AArch64/ARM64 merge from EuroLLVM

A few things that we discussed at the EuroLLVM meeting:

   - It would be useful to have numbers generated with the same method but
   comparing A32 to GCC, to see which performance differences are "inherent"
   to LLVM's architecture-independent optimizations, and which are actual
   ARM64 shortcomings due to its immaturity compared to A32.
   - SPEC2k6 may be a better measure, since SPEC2k can often be gamed or fits
   in cache.
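One way such A32 numbers could be used is sketched below in Python. It rests on a simplifying assumption that is not a method proposed in the thread: that relative performance factors compose multiplicatively, so dividing the ARM64-vs-GCC ratio by the A32-vs-GCC ratio cancels a shared, architecture-independent LLVM gap. The `backend_residual_pct` name and the sample inputs are hypothetical.

```python
def backend_residual_pct(a64_vs_gcc_pct: float, a32_vs_gcc_pct: float) -> float:
    """Part of the ARM64-vs-GCC gap not explained by the A32-vs-GCC gap.

    Assumption (illustrative only): relative performance composes
    multiplicatively, so a factor common to both targets divides out.
    Inputs use the thread's sign convention (negative: GCC is better).
    """
    a64_ratio = 1.0 + a64_vs_gcc_pct / 100.0
    a32_ratio = 1.0 + a32_vs_gcc_pct / 100.0
    return (a64_ratio / a32_ratio - 1.0) * 100.0


# Hypothetical numbers for one benchmark, not measurements from the thread:
# LLVM is 10% behind GCC on A64 and 8% behind GCC on A32.
print(f"{backend_residual_pct(-10.0, -8.0):.1f}%")
```

For these hypothetical inputs it prints -2.2%, i.e. most of the 10% gap would be attributed to factors shared with A32 rather than to the ARM64 backend itself.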

The conclusion at EuroLLVM was that the ARM64 vs AArch64 numbers after
patching were acceptable. Does that still seem to be the case?




Ana Pazos

2014-Apr-23 23:20 UTC

[LLVMdev] Proposal: AArch64/ARM64 merge from EuroLLVM

Hi JF,

 

Yes, I remember those discussion points. And yes, we all agreed to switch to
the ARM64 code base.

 

We will run A32 versus GCC 4.9 and post the results to the list soon.

 

The A53 device we have has 1 GB of RAM, which is not enough for SPEC2006
workloads. We are expecting (by next week) a firmware fix for the Juno A57
board to clock it at a more meaningful frequency. We will then be able to run
SPEC2006 and share the results.

 

Thanks,

Ana.

 

From: JF Bastien [mailto:jfb at google.com] 
Sent: Wednesday, April 23, 2014 3:31 PM
To: Ana Pazos
Cc: Gerolf Hoflehner; Kipping, David; rajav at codeaurora.org; LLVM Developers
Mailing List
Subject: Re: [LLVMdev] Proposal: AArch64/ARM64 merge from EuroLLVM

 

A few things that we discussed at the EuroLLVM meeting:

*	It would be useful to have numbers generated with the same method but
comparing A32 to GCC, to see which performance differences are "inherent"
to LLVM's architecture-independent optimizations, and which are actual ARM64
shortcomings due to its immaturity compared to A32.
*	SPEC2k6 may be a better measure, since SPEC2k can often be gamed or fits in cache.

The conclusion at EuroLLVM was that the ARM64 vs AArch64 numbers after patching
were acceptable. Does that still seem to be the case?

 

 

