This sounds reasonable. Thanks, all.

> - CSE of ADRP optimization (Jiangning)

Quentin may have some input here. He’s done quite a lot of optimizations for ADRP sequences.

-Jim

On Apr 12, 2014, at 12:08 AM, Tim Northover <t.p.northover at gmail.com> wrote:

> Hi again,
>
> Having heard no howls of protest, those of us remaining on the
> Wednesday decided to get down to planning a few more details of the
> merge.
>
> David Kipping very kindly took notes, and we've produced the summary
> of the discussion below:
>
> On Wednesday after the EuroLLVM meeting, a group met to continue
> discussing the ARMv8 backend merge and how to accelerate completion.
> Attending were James, Bradley, Tim, Jiangning, Kristof, Vinod,
> Chandler, Pierre, Ana, and David.
>
> EuroLLVM provided a timely and convenient opportunity to meet in
> person to discuss this topic. But it is important to note that this is
> only one meeting and some issues have likely been missed, and that not
> everyone involved in the discussion was at EuroLLVM; everything below
> is open for further discussion and revision on the community lists.
>
> Later in the mail are details on the work to complete the merge, but
> there is a lot of it and participation from the community is warmly
> welcome. This is an excellent opportunity if you want to learn more
> about backends, the ARMv8 architecture, or just want to ensure that the
> community ARMv8 backend is of the highest quality and performance.
> Some of the areas that have been identified as needing help are:
>
> - Code reviews (there will be lots of changes, and quality of review
>   and timeliness are critical)
> - Merging regression tests from both ARMv8 backends - Tim will lead
>   this effort but is looking for help
> - Inline ASM (I think Eric said at the Hackers Lab that he might be
>   willing to do this)
> - Fix bugs
> - For others who want to help test, compiling and running your
>   codebases on QEMU (no crypto extensions)
> - Code coverage analysis of backend
> - Clean up the codebase (C++11-ify it, for example) - Jim will lead
>   this effort
> - In addition, any of the work items identified later in this mail
>
> -----------------------------------------------------------------------------------------------------------------
>
> Summary of the meeting:
>
> The meeting reaffirmed the conclusion of the discussion at the
> EuroLLVM Hackerlab of using the ARM64 backend as the merge target.
>
> It's important to merge as quickly as possible to avoid fragmentation
> of community efforts across two backends. Completing the merge in
> time for the 3.5 release shall be a stretch goal, but this will be
> very difficult because of the short time remaining, and may be missed.
> More important than schedule is to make sure the merge is done right,
> with a good design that is maintainable.
>
> When the merge is complete, we need to delete AArch64 and rename ARM64
> to AArch64 to avoid confusion. Also, alias together the arm64 and
> aarch64 triples to the merged backend.
>
> We should try to minimize patches to AArch64 during the merge, but it is
> important to realize that this backend is being used for product
> releases and there are contributions in flight and more expected. Bugs
> should be filed for ARM64 when appropriate.
>
> Work that must be completed before the merge is considered complete:
>
> - No significant regressions: correctness, features, stability,
>   performance.
>   There will likely be exceptions, particularly in some
>   performance subtests, that need to be addressed on a case-by-case
>   basis
>
> - Correctness
> -- Merged backend passes LLVM test suite
> -- Merged backend passes the invested parties' internal tests (Apple,
>    ARM, QuIC) and should not have significant regressions. It should be
>    recognized that this is a special situation as there are commercial
>    releases being made on the two backends, and for adoption of the merge
>    it is critical that there are no regressions. Examples of tests are:
>    SPEC2000, SPEC2006, EEMBC, Geekbench, Coremark, MCHammer, Emperor
>    (NEON)
>
> - Performance - Difficult to have a precise and fixed baseline for
>   measuring performance regressions on the merged backends because of
>   variability in hardware, but all significant performance regressions
>   must be investigated and justified as fix/notfix
>
> - Feature parity - to the level found in the ARM64 and AArch64 backends today
> -- big-endian
> -- Optionality of ARMv8 architecture extension sets (no fpu, crypto, crc, ...)
> -- A53 scheduler
> -- Inline assembly
> -- ACLE 2.0
> --- Neon (chapter 12 of ACLE); probably there already on the ARM64 backend
> --- Predefines
> -- Proper guarding of platform-specific features (Cyclone, Darwin, ELF, …)
> -- Regression tests from both backends merged
>
> The following patches were identified in order to swap in the merged
> backend once the merge is completed:
>
> - Delete AArch64 backend
> - Move and rename ARM64 to AArch64 (changes filenames and class names,
>   replaces all non-comment ARM64 strings with AArch64)
> - Retarget ARM64 triples to merged backend
> - Clean up any ARM64 references elsewhere in llvm subprojects
>
> The following is the anticipated sequence of work leading to a merge:
>
> - During merge, invested parties will frequently run their internal
>   correctness, stability and performance tests.
>   Report bugs as appropriate (ALL)
> - System registers redesign, refactoring to use some more of tablegen
>   resources, and bug fixes (45 patches from ARM were reviewed during the
>   meeting)
> - A53 scheduler - (Dave E, Ana, Andy) have already started discussing
> - LLVM test suite run and report failures (Jiangning/Kevin/Hao)
> - LLVM test suite enabled in the buildbot and testing ARM64 (Gabor)
> - CSE of ADRP optimization (Jiangning)
> - Making the optional ARMv8 architecture extension sets optional in LLVM;
>   no fpu, crypto, crc, ... (Jiangning/Kevin/Hao)
> - Proper guarding of platform-specific features (Cyclone, Darwin, ELF, …) (Tim)
> - Big-endian (James/Bradley/Kristof)
> - Predefines (Bradley)
> - Fix bugs (ALL)
> - Backend switch patch-sets (Tim)
>
> Communication during the merge
> - Primary discussions will take place on llvmdev, llvm-commits, and IRC
> - A top-level bug: http://llvm.org/bugs/show_bug.cgi?id=19392
> - Depending on how things go, we may want to get together for some
>   kind of telephone call. We'll send a message to the list if that
>   happens.
>
> I think that about covers it. If anyone has any questions, ask away!
>
> Cheers.
>
> Tim.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Hi Jim,

2014-04-15 4:28 GMT+08:00 Jim Grosbach <grosbach at apple.com>:

> This sounds reasonable. Thanks, all.
>
> > - CSE of ADRP optimization (Jiangning)
>
> Quentin may have some input here. He’s done quite a lot of optimizations
> for ADRP sequences.
>
> -Jim

Thanks for letting me know that Quentin may have thought deeply about this.

ARM64 generates the pseudo instructions ARM64::MOVaddr and friends at the ISel stage, which is intended to guarantee address serialization (page address + in-page address), and only exposes the adrp later, in the ExpandPseudoInsts pass. The ARM64 solution assumes that we don't know at compile time whether the in-page offset can be fused into the load/store. That assumption no longer holds for the global merge solution I proposed with the patch.

If we simply apply the global merge solution to ARM64, we should probably avoid generating the MOVaddr pseudo instruction and friends at the ISel stage, but I'm not sure whether the LOH solution would still work, because:
1) ARM64 link-time optimization depends on LOH.
2) We don't see a linker plug-in in LLVM trunk, so it would be hard for me to verify any ideas.

Any concrete suggestions for combining those different ADRP CSE solutions and tests would be appreciated!

Thanks,
-Jiangning
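For readers following the "page address + in-page address" point above, here is a minimal sketch of the arithmetic, assuming the 4 KiB page granule that ADRP uses. The function names are hypothetical and exist only for illustration; this models the address split, not any LLVM code.

```python
PAGE_SIZE = 4096  # ADRP addresses memory in 4 KiB pages


def split_address(addr):
    """Split an absolute address into (page base, in-page offset)."""
    page = addr & ~(PAGE_SIZE - 1)   # the part an ADRP would materialize
    offset = addr & (PAGE_SIZE - 1)  # low 12 bits, supplied by ADD or the load/store
    return page, offset


def materialize_with_add(addr):
    """ADRP followed by ADD: the full address ends up in a register."""
    page, offset = split_address(addr)
    return page + offset


def fold_into_load(addr):
    """ADRP followed by a load that carries the in-page offset as its immediate."""
    page, offset = split_address(addr)
    return {"base_register": page, "load_immediate": offset}
```

Both forms reach the same address; the question raised in the mail is whether the compiler knows early enough that the second form is available.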
+Quentin.
Quentin Colombet
2014-Apr-15 17:33 UTC
[LLVMdev] Proposal: AArch64/ARM64 merge from EuroLLVM
Hi Jiangning,

On Apr 14, 2014, at 10:31 PM, Jiangning Liu <liujiangning1 at gmail.com> wrote:

> Hi Jim,
>
> 2014-04-15 4:28 GMT+08:00 Jim Grosbach <grosbach at apple.com>:
> This sounds reasonable. Thanks, all.
>
> > - CSE of ADRP optimization (Jiangning)
>
> Quentin may have some input here. He’s done quite a lot of optimizations for ADRP sequences.
>
> -Jim
>
> Thanks for letting me know Quentin may have deep thought around this.
>
> ARM64 generates pseudo instructions ARM64::MOVaddr and friends in ISEL stage, which intends to guarantee address serialization (page address + in-page address), and exposes adrp finally by pass ExpandPseudoInsts. The assumption of ARM64 solution is we don't know the in-page offset can be fused into load/store or not at compile time, and this assumption would turn to be not true any longer for the solution of using global merge as I proposed with the patch.

I think this is orthogonal. If you happen to merge globals, they will have the same base address (i.e., the same pseudo instruction) but different offsets. CSE and such will work like a charm for the pseudos.

Assuming you emit the actual instructions at isel time instead of the pseudos, you will create ADRP, LOADGot, or ADD with symbols. Since you do not know anything about the symbols, CSE will match only the ones that are identical. You will have a finer granularity for CSE, but I am not sure it will help that much. On the other hand, you lose the rematerialization capability, because that feature can only handle one instruction at a time. So you will still be able to rematerialize ADRP, but not the LOADGot and ADD with symbols.

> If simply apply the global merge solution to ARM64, probably we should avoid generating pseudo instruction MOVaddr and friends in ISEL stage, but I'm not sure if the LOH solution would still work or not, because,
> 1) ARM64 link-time optimization depends on LOH.
> 2) We don't see linker plug-in in LLVM trunk and it would be hard for me to verify any thoughts.

The LOH solution is also orthogonal. You can see it as a last-chance way to optimize those accesses. That said, if you CSE the ADRP and not the LOADGot, you will indeed create far fewer candidates for the LOHs, because you will have ADRPs with several uses, which LOHs do not support.

FYI, the LOH optimization is not a link-time optimization in the LLVM sense; it really is a link-time optimization, performed on the binary.

> Any concrete suggestion of combining those different ADRP CSE solutions and tests would be appreciated!

The bottom line is that whatever you do with merge globals is orthogonal to LOHs. That said, I think it is best to keep the pseudo instructions. Of course I may be wrong, and the best way to check would be to measure what happens if you get rid of the pseudo instructions. Do not be too concerned about the impact on the LOHs.

Thanks,
-Quentin

> Thanks,
> -Jiangning
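The two claims in this reply (merged globals share one pseudo, so CSE collapses them; an ADRP with several uses stops being an LOH candidate) can be illustrated with a toy model. This is only a sketch under simplifying assumptions: instructions are plain tuples, "CSE" just drops identical copies, and the symbol names and helper functions are hypothetical. It is not how LLVM's MachineCSE or the LOH collection actually work.

```python
from collections import Counter


def lower_with_pseudo(accesses):
    """Emit one address pseudo per access; the offset stays on the memory access."""
    return [("MOVaddr", base) for base, _offset in accesses]


def cse(instructions):
    """Toy CSE: keep the first copy of each identical instruction, in order."""
    return list(dict.fromkeys(instructions))


def single_use_bases(accesses):
    """Bases whose address computation, once CSE'd, would have exactly one use.

    In this toy model only those remain candidates for an LOH.
    """
    uses = Counter(base for base, _offset in accesses)
    return [base for base, count in uses.items() if count == 1]


# Three separate globals versus the same three packed into one merged global.
separate = [("a", 0), ("b", 0), ("c", 0)]
merged = [("merged", 0), ("merged", 4), ("merged", 8)]

print(cse(lower_with_pseudo(separate)))  # three distinct pseudos survive
print(cse(lower_with_pseudo(merged)))    # one pseudo serves all three accesses
print(single_use_bases(separate), single_use_bases(merged))
```

In the merged case a single address computation feeds every access, which is the CSE win, and the same sharing is what leaves no single-use candidates for the LOHs.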