Frank Schaefer via llvm-dev
2018-Oct-03 03:02 UTC
[llvm-dev] [RFC] Rearchitect Gnu toolchain driver to simplify multilib support
Hi all,

I've been poking around with llvm+clang+compiler-rt, trying to get it working on Linux ARM soft-float (yes, ARM soft-float support is pretty broken). Along the way I tried writing a multilib toolchain driver for ARM soft/hard float, with only partial success. For reference, see https://reviews.llvm.org/D52705#inline-464117.

One thing I noticed while doing this (and a few other people seem to agree) is that the entire Gnu toolchain driver set could be greatly simplified. So far, it seems like every time someone has encountered a new multilib case (either a new arch or a new distro arrangement), the response has been to pile on another custom multilib driver, or to add a bunch of corner-case codepaths to an existing driver. That's been done so many times that the existing driver set is honestly starting to collapse under its own weight. :-(

I'm now contemplating what it would take to reduce the entire driver set to something that simply figures out all the multilib/multiarch distinctions by querying the existing gcc installation. This could theoretically cover all Gnu multilib cases in a single codepath.

Some background: current GNU toolchains (gcc+glibc+binutils) tend to encapsulate all multilib knowledge in gcc, including:

* What flags trigger a specific multilib selection
* What directories are associated with a particular multilib selection (what we know as osSuffix()/gccSuffix())
* What run-time linker (/lib/ld-<arch>.so.<ver>) to use for a particular multilib selection

This is highly customizable at gcc build time via a bunch of arch+OS+ABI configuration fragments in the "gcc/config" directory of the gcc source tree, and a lot of Linux distros have taken their own liberties with this configuration. That's part of why clang's Gnu toolchain driver is in the state it's in.

The rough outline of what I would propose:

1. clang's CMakeLists can scan the spec tokens for a selected gcc installation (available via "gcc -dumpspecs") and pick out the important ones (so far I know this includes "*multilib", "*multilib_matches", "*multilib_defaults", "*multilib_options", and "*link").
2. clang's Gnu driver can be re-coded to parse the relevant spec tokens.
3. clang's Gnu driver can build up a complete, unified MultilibSet based on these tokens.

Some potential complications I anticipate:

1. I don't know how consistently gcc has used these spec tokens, or how their formatting has evolved over time. Mimicking the current (gcc 8.2.0) format seems sensible, but what we pull from older gcc installations may not comport with what we expect.
2. I don't see anything in the spec tokens that describes the system header arrangement. Vanilla multilib-enabled gcc seems to honor /usr/include/<os-suffix> (where <os-suffix> seems to conform to the output of "gcc <flags> -print-multiarch"). Note that this doesn't necessarily match the osSuffix; I've produced functional GNU toolchains that honor a standard-triple osSuffix, but don't honor _anything_ like it under /usr/include.
3. g++, OTOH, expects all C++ headers to be under /usr/include/c++/<version>. Vanilla g++ keeps some headers in a further <os-suffix> subdirectory, with some of those nested again under <gcc-suffix> for non-default multilib cases. Just to complicate things, Debian/Ubuntu g++ has apparently been adapted to use /usr/include/<os-suffix> for multilib-specific C++ headers. If other distros do their own thing here, I see no straightforward way to autodetect anything but a few obvious cases.

To address the above complications, I would suggest adding CMake options for users to supply their own multilib descriptor tokens, in case whatever's in the gcc specs doesn't work for them. We might even allow for an extra token or two to better describe the C/C++ header layout.
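Steps 1-3 of the proposal could be sketched roughly like this. It is a minimal C++ illustration under stated assumptions, not clang's actual Multilib/MultilibSet API. It assumes "gcc -dumpspecs" prints each section as a "*name:" line followed by body lines up to a blank line, and that each ";"-separated "*multilib" entry has the form "gccdir[:osdir] flag !flag ...", where "!" marks a flag that must be absent. The names SpecMultilib, extractSpec, parseMultilibSpec, and selectMultilib are hypothetical. The sketch ignores "*multilib_defaults" and "*multilib_matches", and its first-match selection is a simplification that may not reproduce gcc's exact matching rules.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// One entry of gcc's "*multilib:" spec, e.g. "32:../lib32 m32 !m64".
// Hypothetical struct; field names only echo clang's gccSuffix()/osSuffix().
struct SpecMultilib {
  std::string gccDir;                 // e.g. "." or "32"
  std::string osDir;                  // e.g. "../lib32"; empty if no ":osdir"
  std::vector<std::string> required;  // flags that must be present
  std::vector<std::string> excluded;  // flags written as "!flag"
};

// Return the body of the "*name:" section of `gcc -dumpspecs` output,
// assuming each section ends at the first blank line. Empty if not found.
std::string extractSpec(const std::string &dumpspecs, const std::string &name) {
  std::istringstream in(dumpspecs);
  std::string line, body;
  bool inSection = false;
  while (std::getline(in, line)) {
    if (!inSection) {
      inSection = (line == "*" + name + ":");
      continue;
    }
    if (line.empty())
      break;  // blank line terminates the section
    if (!body.empty())
      body += '\n';
    body += line;
  }
  return body;
}

// Split a "*multilib" body on ';' and decode each "dir[:osdir] flags..." entry.
std::vector<SpecMultilib> parseMultilibSpec(const std::string &spec) {
  std::vector<SpecMultilib> result;
  std::istringstream entries(spec);
  std::string entry;
  while (std::getline(entries, entry, ';')) {
    std::istringstream words(entry);
    std::string dirs;
    if (!(words >> dirs))
      continue;  // skip whitespace-only entries
    SpecMultilib m;
    auto colon = dirs.find(':');
    m.gccDir = dirs.substr(0, colon);
    if (colon != std::string::npos)
      m.osDir = dirs.substr(colon + 1);
    std::string flag;
    while (words >> flag) {
      if (flag[0] == '!')
        m.excluded.push_back(flag.substr(1));
      else
        m.required.push_back(flag);
    }
    result.push_back(m);
  }
  return result;
}

// Pick the first entry whose constraints `active` satisfies. Assumes `active`
// already has defaults/matches folded in (not modeled here).
std::optional<SpecMultilib> selectMultilib(const std::vector<SpecMultilib> &set,
                                           const std::vector<std::string> &active) {
  auto has = [&](const std::string &f) {
    return std::find(active.begin(), active.end(), f) != active.end();
  };
  for (const auto &m : set) {
    if (std::all_of(m.required.begin(), m.required.end(), has) &&
        std::none_of(m.excluded.begin(), m.excluded.end(), has))
      return m;
  }
  return std::nullopt;
}
```

For example, given an x86_64-style body such as ". !m32 !m64 !mx32;32:../lib32 m32 !m64 !mx32;x32:../libx32 !m32 !m64 mx32;" (illustrative only; real specs vary by distro and gcc version), selectMultilib picks the "." entry when no flags are active and the "32:../lib32" entry for "m32". A contradictory set like "m32" plus "m64" matches nothing. A real driver would turn each SpecMultilib into a clang Multilib rather than keep a parallel type.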
This would all require a LOT of planning and testing, especially across the multiple targets/distros the Gnu toolchain driver currently supports. I'm not sure how to access suitable testbeds for a lot of it (I count myself lucky just to have a reasonably powerful ARM build box). At least initially, I think we would have to keep the old hodgepodge driver code around alongside the new unified driver code.

--
Frank

"If a server dies in a server farm and no one pings it, does it still cost four figures to fix?"