> > (from the context, you might have meant 'tuple' where
> > you've written 'triple'. I'm answering based on the assumption
> > you meant 'triple')
> I did mean what I wrote.
I thought I ought to check since it's very easy to mix up triples and tuples
and the context sounded off. I'm glad I picked the right assumption.
> > The proposed TargetTuple is a direct replacement for the GNU triple
> > and is intended to resolve this ambiguity and move away from a
> > string-based implementation (we need to keep a string serialization
> > though, see below). Essentially, I'm trying to push the ambiguity
> > out of the internals and give the distributor control of how the
> > ambiguity is resolved for their environment. Once that is done, we'll be
> > able to rely on the TargetTuple for information about the target such
> > as ABIs, architecture revisions, endianness, etc.
> This is pretty vague.
Unfortunately, it's a very broad topic. I've been approaching this from
the opposite direction to Renato. His reply explains his use case very well and,
as I understand it, he is trying to implement a means of understanding user-level
descriptions of the target such as command line options. I understand that
there's significant inconsistency in the strings used, which makes this
non-trivial for ARM in particular.
In my case, I've been fixing a number of details in the backend and have
often found that the only information available is the triple, which I know to be
unreliable. For example, our backend currently considers mips64-linux-gnu to be
a 64-bit architecture using the N64 ABI. This is a problem because having a
64-bit CPU does not require the ABI to be N64 or even 64-bit (N32/N64). It's
valid to produce O32 code for a 64-bit processor (and triple) and many of my
test systems (e.g. 32-bit Debian on a MIPS64R2 processor) actually need this
since the host triple detected by config.guess is mips64-linux-gnu. As things
stand, attempting to emit O32 code on a mips64-linux-gnu host crashes the
compiler unless you cross-compile by adding '-target mips-linux-gnu'. My
'native' LLVM releases are arguably cross-compilers (the target triple
!= the host triple), albeit ones that cross-compile to the same target they
execute on.
This turns out to be very difficult to fix since the majority of the Mips target
is using Triple::mips and Triple::mips64 to make assumptions about the
architecture (e.g. are registers 64 bit?) or the ABI (Triple::mips implies O32,
Triple::mips64 implies N64 or, in cases where this has been partially fixed,
N32/N64). Unfortunately, in many of these cases the triple is the best
information I have. At this point, I've hit this wall and seen misuse of
triples in code reviews enough times that my desire to find a general solution
to this is very high. I also know of some impending work that is likely to make
matters worse. I want to be able to, for example, ask the TargetTuple whether I
am targeting a 64-bit ISA and whether I'm supposed to treat it as a 32-bit
ISA (e.g. O32 on MIPS64R2) in many areas of LLVM (including those where
MipsSubtarget and similar are not available) and be able to rely on the answer.
At the moment, we incorrectly conflate 'is it a 64-bit ISA?' with
'is the CPU 64-bit?' as well as 'is the ABI 64-bit?' with
'is the ISA 64-bit?'
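To make the three questions concrete, here is a minimal sketch of a target description that stores the ISA width and the ABI as separate facts instead of deriving both from the triple's arch component. The type and member names are hypothetical and are not part of LLVM; only the MIPS ABI facts (O32 uses 32-bit registers, N32 uses 64-bit registers with 32-bit pointers, N64 uses 64-bit pointers) are taken as given.

```cpp
#include <cassert>

// Hypothetical enumeration of the MIPS ABIs discussed in this thread.
enum class MipsABI { O32, N32, N64 };

// Hypothetical target description: ISA width and ABI are recorded separately
// rather than both being inferred from "mips" vs "mips64" in the triple.
struct MipsTargetDesc {
  bool Is64BitISA; // e.g. true for MIPS64R2, even under a 32-bit OS
  MipsABI ABI;     // chosen by the distributor or by options such as -mabi

  // "Is it a 64-bit ISA?"
  bool has64BitISA() const { return Is64BitISA; }

  // "Is the ABI 64-bit?" N32 and N64 both use 64-bit registers.
  bool uses64BitABI() const { return ABI != MipsABI::O32; }

  // "Should the ISA be treated as 32-bit?" True for O32, even on MIPS64R2.
  bool treatAs32BitISA() const { return !uses64BitABI(); }

  // Only N64 has 64-bit pointers; N32 keeps 32-bit pointers.
  unsigned pointerSizeInBits() const { return ABI == MipsABI::N64 ? 64 : 32; }

  // A 64-bit ABI needs a 64-bit ISA; O32 works on either.
  bool isValid() const { return Is64BitISA || ABI == MipsABI::O32; }
};
```

With this split, 32-bit Debian on a MIPS64R2 processor is simply `{true, MipsABI::O32}`: the ISA is 64-bit, but code generation treats it as 32-bit.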
> > The string serialization I mentioned above is useful for LLVM-IR as
> > part of a direct replacement for the 'target triple' statement.
<snip>
> My first impression of using this serialization as is that it's
> something I'm against. Keep in mind that being able to parse the string
> can't invoke a target backend to handle the rest of the parsing.
> It'd need to be as generic as a DataLayout if you want to do this
> sort of thing and I'm entirely uncertain this is possible for the goals
> you (and I) have in mind here.
This may be moot since Renato has almost convinced me we don't need a string
serialization, but I don't see why we can't have target-dependent
parsing. One approach is a factory method that selects the appropriate
TargetTuple subclass based on the first few characters of the serialization, and
another is to do the same but with a member of the TargetTuple class and a
target-dependent portion of the string. The implementations would have to be in
the support library with the base class, but I believe we need that anyway.
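A minimal sketch of the first approach, assuming a hypothetical serialization in which the field before the first ':' names the target family and everything after it is parsed by the selected subclass. The class names and the format are invented for illustration; the exact serialization was still t.b.d.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical base class holding the target-independent interface.
class TargetTupleBase {
public:
  virtual ~TargetTupleBase() = default;
  virtual std::string family() const = 0;
};

// Hypothetical MIPS subclass that parses its own, target-dependent portion.
class MipsTargetTuple : public TargetTupleBase {
public:
  explicit MipsTargetTuple(const std::string &Rest)
      : LittleEndian(Rest.rfind("el", 0) == 0) {} // Rest starts with "el"
  std::string family() const override { return "mips"; }
  bool LittleEndian;
};

// Hypothetical x86 subclass; it ignores its portion in this sketch.
class X86TargetTuple : public TargetTupleBase {
public:
  explicit X86TargetTuple(const std::string &) {}
  std::string family() const override { return "x86"; }
};

// Factory: the leading field selects the subclass and the rest of the string
// is handed to that subclass. Unknown or malformed input yields nullptr.
std::unique_ptr<TargetTupleBase> parseTargetTuple(const std::string &S) {
  std::string::size_type Sep = S.find(':');
  if (Sep == std::string::npos)
    return nullptr;
  std::string Family = S.substr(0, Sep);
  std::string Rest = S.substr(Sep + 1);
  if (Family == "mips")
    return std::make_unique<MipsTargetTuple>(Rest);
  if (Family == "x86")
    return std::make_unique<X86TargetTuple>(Rest);
  return nullptr;
}
```

Because every subclass lives next to the base class, the parsing never needs a registered target backend, which is the constraint raised in the quoted reply.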
> Let me be clear, I do agree with you that the Triple by itself is
> insufficient for what we want long term in the backends,
> however, we won't be able to get rid of it completely. It's too
> ingrained into how cross compilation is done as a base.
> It is, however, possible to design an API that includes the Triple and the
> relevant information to augment sufficiently.
> My vision for this is an API that has a base part that is going to be
> generic across all targets (think the current arguments
> to the TargetMachine constructor), and additional target specific
> information that can be passed in via user customization
> (i.e. command line options etc).
I'm not trying to get rid of it completely. I'm trying to push it to the
outskirts of the API.
I see llvm::Triple remaining on the periphery but not being used in the core of
LLVM. Its design is that of a parser for the GNU triple, but it's also
being used as the representation of a target. My intent is to split these two
concepts apart and have llvm::Triple be the parser it really is and
llvm::TargetTuple be the parsed meaning. This meaning can then be mutated by
tool options before being passed into the core APIs of LLVM.
> I blame the mips backend for this one. We can do -m32/-m64 just fine for
> x86 as an example. Some backends have this problem, others don't.
Certainly we made it a lot worse than it needed to be, and the number of Mips
ABIs hasn't helped. It's also the fault of historical triple usage
and of thinking it's ok to ignore it. In any case, we need to fix it.
-m32/-m64 works by switching the triple for a new one. For example, -m64 on i386
calls Triple::get64BitArchVariant(), which changes the triple to x86_64-....
Similarly, -m32 on x86_64 changes the triple to i386-*.
It's fairly ugly but it works. The TargetTuple version of this is
essentially the same thing but with a clear separation between the parser and
the target description.
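A much-simplified sketch of that triple switch follows. The function name and the small arch table are illustrative only and cover far fewer cases than LLVM's Triple::get64BitArchVariant(). The point is that only the arch component changes: rewriting mips to mips64 says nothing about whether N32 or N64 is wanted, so there is nowhere in the triple to record that choice.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified -m64 handling: replace the arch component of a
// GNU triple with its 64-bit counterpart and keep the rest unchanged.
// Returns an empty string when no 64-bit variant is known.
std::string get64BitVariant(const std::string &Triple) {
  static const std::map<std::string, std::string> To64 = {
      {"i386", "x86_64"}, {"x86_64", "x86_64"},
      {"mips", "mips64"}, {"mipsel", "mips64el"}};
  std::string::size_type Dash = Triple.find('-');
  std::string Arch = Triple.substr(0, Dash); // whole string if no '-'
  auto It = To64.find(Arch);
  if (It == To64.end())
    return "";
  return It->second + (Dash == std::string::npos ? "" : Triple.substr(Dash));
}
```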
> > Various details (ELF headers, label prefixes, exception personality,
> > JIT target, etc.) depend on the ABI and OS Distribution rather than just 32-bit
> > vs 64-bit
> Sure?
Yep.
Some OS Distributions have FPXX enabled by default. Likewise for NAN2008
(although this is currently rare). Some (will) permit mixing IEEE754-1985 and
IEEE754-2008 NAN encodings (they aren't the same on Mips due to historical
choices). This ends up in the ELF headers. It's related to the second class
of issues discussed below.
Label prefixes are normally '.L' but can also be '$' on O32.
Apparently it's an old Irix thing that for one reason or another didn't
actually depend on the OS being Irix. As such '$' is the conventional
prefix on O32 but I plan to make '.L' the conventional prefix in clang
for all ABIs with '$' as a supported alternative on O32.
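The planned label-prefix policy fits in a few lines once the ABI is available as data rather than inferred from the triple. This is a sketch; the function and its `PreferDollar` flag are hypothetical stand-ins for whatever option would select the alternative prefix.

```cpp
#include <cassert>
#include <string>

enum class MipsABI { O32, N32, N64 };

// Hypothetical policy: '.L' is the conventional private label prefix for all
// ABIs; '$' is honoured only on O32, otherwise the request falls back to '.L'.
std::string privateLabelPrefix(MipsABI ABI, bool PreferDollar) {
  if (PreferDollar && ABI == MipsABI::O32)
    return "$";
  return ".L";
}
```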
Pointers in exception tables are 32-bit for O32/N32 and (should be) 64-bit for
N64. The type encodings and other information should change accordingly. We
can't use 64-bit PC-relative references due to lack of assembler support, so where it
matters we need to use absolute references (we currently use 32-bit PC-relative
references and hope the target is within range).
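A sketch of the ABI-driven choice described above. The struct and function are hypothetical, and keeping 32-bit PC-relative references for O32/N32 is an assumption made for illustration; the paragraph only states the pointer sizes and the need for absolute 64-bit references on N64.

```cpp
#include <cassert>

enum class MipsABI { O32, N32, N64 };

// Hypothetical description of how exception-table pointers are encoded.
struct EHPointerEncoding {
  unsigned SizeInBytes;
  bool PCRelative;
};

// N64 pointers are 64-bit; without assembler support for 64-bit PC-relative
// references, an absolute 64-bit reference is used. O32/N32 pointers are
// 32-bit (assumed here to stay PC-relative).
EHPointerEncoding ehPointerEncoding(MipsABI ABI) {
  if (ABI == MipsABI::N64)
    return {8, false};
  return {4, true};
}
```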
A JIT on an O32-ABI-based-OS should generate O32 code even if the host triple is
mips64-linux-gnu.
> > * It's not possible to implement clang in a way that can support
> > all of mips-linux-gnu's possible meanings.
> > mips-mti-linux-gnu, and mips-img-linux-gnu have the same problem to
> > a lesser degree
>
> I'm really not sure what any of these things are bringing up. You
> haven't actually said what communication problem you're trying to solve
> between the user and the compiler here. How about we start this from another
> perspective? Can you give some examples of what you'd like to do to
> communicate the information you think you need to various parts of the
> backend and how you'd like to communicate it?
There are two main classes of issue here. The first is that we don't
represent the target accurately, or in enough detail, throughout all areas of LLVM
to be able to implement the correct behaviour. The second is that the
distributor is unable to convey their environment to LLVM.
I'm going to focus on the second issue here since the first issue is covered
elsewhere.
Suppose the Foo port of BarOS (a Linux-based distribution) has the triple
foo-linux-gnu and therefore provides gcc and clang commands that target foo-linux-gnu.
In BarOS version 1, it targets Foo architecture revision 1. Over time, many
revisions of the Foo architecture are released and Foo revision 7 is now commonplace.
At this point BarOS version 2 decides to raise the minimum Foo revision to 7.
It is currently impossible to change clang in such a way that it can be built for both
BarOS version 1 and 2. We can change clang's interpretation of the foo-linux-gnu
triple, but BarOS version 1 would change too and would have to either stop
taking updates or maintain patches to keep its default target. Alternatively,
BarOS version 2 must maintain patches. We need a mechanism for BarOS to decide
which Foo revision they want.
Now suppose there's another distribution called FredsOS. It too has the
triple foo-linux-gnu but wants to target Foo revision 3. If clang changed to
accommodate BarOS, then FredsOS is in the same position as BarOS version 2 was.
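The mechanism being asked for amounts to making the interpretation of a triple a vendor decision rather than an upstream constant. A minimal sketch, with every name hypothetical and the Foo revisions taken from the example above:

```cpp
#include <cassert>
#include <string>

// Hypothetical per-vendor setting chosen when the toolchain is built.
struct VendorConfig {
  std::string Name;
  unsigned FooRevisionForFooLinuxGnu; // what "foo-linux-gnu" means here
};

// Hypothetical interpreted target.
struct FooTarget {
  std::string Triple;
  unsigned FooRevision;
};

// The 'usual' meaning, used when a vendor has not customized a triple.
const unsigned DefaultFooRevision = 1;

// Interpret the triple according to the vendor's rules: the same string can
// legitimately mean different revisions for different vendors.
FooTarget interpretTriple(const std::string &Triple, const VendorConfig &Vendor) {
  FooTarget T{Triple, DefaultFooRevision};
  if (Triple == "foo-linux-gnu")
    T.FooRevision = Vendor.FooRevisionForFooLinuxGnu;
  return T;
}
```

Built with `{"BarOS 2", 7}`, the same clang source yields revision 7 for foo-linux-gnu, while a FredsOS build with `{"FredsOS", 3}` yields revision 3, and neither needs to carry a patch against the other's choice.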
The same thing happens with cross-compilers. Using the mips-mti-linux-gnu
toolchain as an example:
Mentor release a mips-mti-linux-gnu toolchain with a vast array of multilibs (36
IIRC, but it used to be >70). Imagination Technologies also release a
mips-mti-linux-gnu toolchain with a much smaller set of multilibs. Furthermore,
its sysroot has a different layout. Header files aren't duplicated, the
directory names differ, etc.
Much like in the distribution example, clang has no means to support both
toolchains. One or both of Mentor and Imagination Technologies must maintain
patches. Until now we've taken the union of the two and hoped for the best
but this no longer works.
> I promise I'm not trying to be (on purpose at least) particularly dense
> here, but I just don't have enough information to work with here. I agree
> that we probably have an API problem - some of which I solved for the mips
> backend at one point using MCOptions (which I don't really like as a general
> solution), but a more general solution that'll work and be cleaner is
> definitely a direction I'd like us to go.
Don't worry, I'm not taking it that way. I'm aware that a lot of
this goes against most people's opinion of what a triple is and I remember that I
took a lot of convincing too. I was rather skeptical of the things Matthew
Fortune (GCC maintainer) was telling me years ago about triples until I dug into
it myself recently and found that much of what I thought I knew about triples
was wrong.
From: Eric Christopher [mailto:echristo at gmail.com]
Sent: 30 July 2015 07:52
To: Daniel Sanders; LLVM Developers Mailing List (llvmdev at cs.uiuc.edu)
Cc: Renato Golin (renato.golin at linaro.org); Jim Grosbach (grosbach at apple.com)
Subject: Re: The Trouble with Triples
Hi Daniel,
(from the context, you might have meant 'tuple' where you've written
'triple'. I'm answering based on the assumption you meant
'triple')
I did mean what I wrote.
The GNU triple is already used as a way of encoding a large amount of the target
data in a string but unfortunately, while this data is passed throughout LLVM,
it isn't reliable because GNU triples are ambiguous and inconsistent. For
example, in GCC toolchains mips-linux-gnu probably means a MIPS target on
Gnu/Linux but anything beyond that (ISA revision, default ABI, multilib layout,
etc.) is up to the person who built the toolchain and may change over time.
Another example is that Debian's definition for i386-linux-gnu has been i486
and i586 at various points in time.
Sorta...
The proposed TargetTuple is a direct replacement for the GNU triple and is
intended to resolve this ambiguity and move away from a string-based
implementation (we need to keep a string serialization though, see below).
Essentially, I'm trying to push the ambiguity out of the internals and give
the distributor control of how the ambiguity is resolved for their environment.
Once that is done, we'll be able to rely on the TargetTuple for information
about the target such as ABIs, architecture revisions, endianness, etc.
This is pretty vague.
I agree that we should open up the API to specify the appropriate data and that
is something that TargetTuple will acquire during steps 4 and 7 of the plan
(mostly step 7 where compiler/tool options begin mutating the target tuple). I
don't agree with keeping the GNU triple around though for two main reasons.
The first is that most people believe that GNU triples accurately describe the
target and there will be a strong temptation to inappropriately base logic on
them. The second is that the meaning of the triple varies between toolchain
builds and over time and there is a significant potential for bugs where
different parts of the toolchain use different meanings for the same GNU triple
(due to rebuilding or switching toolchains, or moving objects from system to
system). We ought to resolve the ambiguity once and then stick to that
interpretation.
The string serialization I mentioned above is useful for LLVM-IR as part of a
direct replacement for the 'target triple' statement. We could split
this statement up into smaller pieces but the migration to target tuples is
already difficult so I think it would be best to do a direct replacement first
and redesign the IR statements later if we want to. The serialization is also
useful for command line options on internal tools such as llc to give us precise
control over our tests that the GNU triple can't deliver. This will be
particularly important when distributors can apply their own disambiguations to
GNU triples. The serialization may also be useful as part of a C API but I
haven't given the C API much thought beyond preserving the current API.
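The value of a serialization for tools like llc is that every field is spelled out, so reading it back never depends on a vendor's interpretation of a GNU triple. A minimal round-trip sketch; the `key=value;...` format and all names are hypothetical, since the real serialization was still t.b.d.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical fully explicit target description (e.g. arch, endian, abi).
struct TupleFields {
  std::map<std::string, std::string> Fields;
};

// Serialize as "key=value;key=value". std::map orders keys, so the same
// description always produces the same string (useful in tests).
std::string serialize(const TupleFields &T) {
  std::string Out;
  for (const auto &KV : T.Fields) {
    if (!Out.empty())
      Out += ';';
    Out += KV.first + '=' + KV.second;
  }
  return Out;
}

// Parse the format above; returns false on an item without a key.
bool deserialize(const std::string &S, TupleFields &T) {
  T.Fields.clear();
  std::istringstream In(S);
  std::string Item;
  while (std::getline(In, Item, ';')) {
    std::string::size_type Eq = Item.find('=');
    if (Eq == std::string::npos || Eq == 0)
      return false;
    T.Fields[Item.substr(0, Eq)] = Item.substr(Eq + 1);
  }
  return true;
}
```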
My first impression of using this serialization as is that it's something
I'm against. Keep in mind that being able to parse the string can't
invoke a target backend to handle the rest of the parsing. It'd need to be
as generic as a DataLayout if you want to do this sort of thing and I'm
entirely uncertain this is possible for the goals you (and I) have in mind here.
Hopefully, that helps clear up your concerns. Let me know if there's
anything that still seems strange.
Not really. I don't see much of a sketch on what you have in mind for your
"TargetTuple" here other than "it'll be a bunch of things
together".
Let me be clear, I do agree with you that the Triple by itself is insufficient
for what we want long term in the backends, however, we won't be able to get
rid of it completely. It's too ingrained into how cross compilation is done
as a base. It is, however, possible to design an API that includes the Triple
and the relevant information to augment sufficiently. My vision for this is an
API that has a base part that is going to be generic across all targets (think
the current arguments to the TargetMachine constructor), and additional target
specific information that can be passed in via user customization (i.e. command
line options etc).
> My suggestion on a route forward here is that we should look at the particular
> API and areas of the backend that you're having an issue with and figure out
> how to best communicate the data you'd like to the appropriate area. I realize
> this probably seems a little vague and handwavy, but I don't know what areas
> you've been having problems with lately. I'll absolutely help with this effort if
> you need assistance or guidance in any way.
The MIPS specific problems are broad and varied. Some of the bigger ones are:
* Building clang on 32-bit Debian running on a 64-bit MIPS processor produces a
compiler that cannot target the native system. The release packages work around
this by 'cross-compiling' from the host triple to the target triple
which are different strings (mips-linux-gnu vs mips64-linux-gnu) but have the
same meaning.
* It's not possible to produce a clang that can generate code for both
32-bit and 64-bit MIPS without one of them needing a -target option to change
the GNU triple. This is because we based the logic on the triple and lack
anything else to use.
I blame the mips backend for this one. We can do -m32/-m64 just fine for x86 as
an example. Some backends have this problem, others don't.
* Various details (ELF headers, label prefixes, exception personality, JIT
target, etc.) depend on the ABI and OS Distribution rather than just 32-bit vs
64-bit
Sure?
* It's not possible to implement clang in a way that can support all of
mips-linux-gnu's possible meanings. mips-mti-linux-gnu, and
mips-img-linux-gnu have the same problem to a lesser degree
I'm really not sure what any of these things are bringing up. You
haven't actually said what communication problem you're trying to solve
between the user and the compiler here. How about we start this from another
perspective? Can you give some examples of what you'd like to do to
communicate the information you think you need to various parts of the backend
and how you'd like to communicate it?
I promise I'm not trying to be (on purpose at least) particularly dense
here, but I just don't have enough information to work with here. I agree
that we probably have an API problem - some of which I solved for the mips
backend at one point using MCOptions (which I don't really like as a general
solution), but a more general solution that'll work and be cleaner is
definitely a direction I'd like us to go.
-eric
________________________________________
From: Eric Christopher [echristo at gmail.com]
Sent: 29 July 2015 21:44
To: Daniel Sanders; LLVM Developers Mailing List (llvmdev at cs.uiuc.edu)
Cc: Renato Golin (renato.golin at linaro.org); Jim Grosbach (grosbach at apple.com)
Subject: Re: The Trouble with Triples
Hi Daniel,
I'm not sure I agree with the basic idea of using the target triple as a way
of encoding all of the pieces of target data as a string. I think in a number of
cases what we need to do is either open up API to the back end to specify
things, or encode the information into the IR when it's different from the
generic triple. Ideally the triple will have enough information to do basic
layout and anything else can be either gotten from the IR or passed via option.
My suggestion on a route forward here is that we should look at the particular
API and areas of the backend that you're having an issue with and figure out
how to best communicate the data you'd like to the appropriate area. I
realize this probably seems a little vague and handwavy, but I don't know
what areas you've been having problems with lately. I'll absolutely help
with this effort if you need assistance or guidance in any way.
Thanks!
-eric
On Wed, Jul 8, 2015 at 7:31 AM Daniel Sanders <Daniel.Sanders at imgtec.com> wrote:
Hi,
In http://reviews.llvm.org/D10969, Eric asked me to explain on llvmdev the wider
context of the TargetTuple object that is replacing Triple, so here it is.
Before I start, I'm sure I don't know the full extent of GNU triple
ambiguity and lack of canonicity. Additional examples are welcome.
The Problem
As you know, LLVM uses a GNU Triple as a target description that can be
relied upon to make decisions. It's used for various decisions such as the
default CPU, the alignment of types, the object format, the names for libcalls,
and a wide variety of others.
In using it like this, LLVM assumes that triples are unambiguous and have a
specific defined meaning. Unfortunately, this assumption fails for a number of
reasons.
The first reason is that compiler options can overrule the triple but leave it
unchanged. For example, in GCC mips-linux-gnu-gcc normally produces 32-bit
MIPS-I output using the O32 ABI, but 'mips-linux-gnu-gcc -mips64'
normally produces 64-bit MIPS-III output using the N32 ABI. Like GCC, compiler
options to mips-linux-gnu-clang should (and mostly do but MIPS has a few
crashing cases caused by triple misuse) overrule the triple. However, we
don't mutate the triple to reflect this so any decisions based on the
overridable state cannot rely on the triple to accurately reflect the desired
behaviour.
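The failure mode can be shown in a few lines. In this sketch (all names hypothetical), `abiFromTripleOnly` mirrors the assumption that Triple::mips implies O32 and Triple::mips64 implies N64, while `applyOption` models the GCC behaviour described above, where -mips64 selects a 64-bit ISA and the N32 ABI without touching the triple.

```cpp
#include <cassert>
#include <string>

enum class MipsABI { O32, N32, N64 };

// What logic that inspects only the triple string concludes.
MipsABI abiFromTripleOnly(const std::string &Triple) {
  return Triple.rfind("mips64", 0) == 0 ? MipsABI::N64 : MipsABI::O32;
}

// Hypothetical mutable description that options are applied to.
struct MipsTarget {
  std::string Triple; // left unchanged by options, as happens today
  bool Is64BitISA;
  MipsABI ABI;
};

// Apply one option: the effective target changes but the triple does not.
void applyOption(MipsTarget &T, const std::string &Opt) {
  if (Opt == "-mips64") {
    T.Is64BitISA = true;
    T.ABI = MipsABI::N32;
  }
}
```

After `applyOption(T, "-mips64")` on a mips-linux-gnu target, `T.ABI` is N32 but `abiFromTripleOnly(T.Triple)` still answers O32, which is exactly the disagreement described above.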
It's worth mentioning here that some targets have hacks to partially mutate
the triple in clang to work around issues they would otherwise have in the
backend but this is done on an ad-hoc basis for specific details (e.g. mips
<-> mipsel for -EL and -EB).
The second reason is that there is no canonical meaning for a given GNU Triple;
it varies between vendors and over time. There is also no requirement for
vendors to have a unique GNU Triple for their toolchain. For GCC, it's
fairly common for distributors to change the meanings of triples using options
like --with-arch, --with-cpu, --with-abi, etc. There are also some
target-specific options such as --with-mode to select ARM/Thumb by default and
--with-nan for MIPS NAN encoding selection. Different vendors use different
configure options and may change them at will. When they do change them, the
vendors often desire to keep the same triple to be able to drop in the new
version without causing wider impact on their environment. For example, assuming
I'm reading debian/rules2 for Debian's gcc-4.9 package correctly then
the i386-linux-gnu means i486 on Debian Etch and Lenny but means i586 on more
recent versions. On a similar note, on Debian, mips-linux-gnu targets MIPS-II
(optimised for typical MIPS32 implementations) rather than the usual MIPS-I. The
last example of this ambiguity I'd like to reference is that mentioned by
https://wiki.debian.org/Multiarch/Tuples#Why_not_use_GNU_triplets.3F. In that
example, hard-float and soft-float on ARM both used arm-linux-gnueabi but were
mutually incompatible. The Multiarch tuples described on that page are an
attempt to resolve the ambiguity but I'm told that they aren't likely to
be universally adopted.
The third reason is that different triples can mean the same thing. Jim
Grosbach has mentioned that the prefixes of the GNU Triple are different between
Linux and Darwin for ARM despite sharing the same meaning (presumably subject to
the issues above). As a result, decisions based on the string have to take care
of multiple possible values. Mips has a similar issue too since a host triple
(and therefore default target triple) of mips64-linux-gnu needs to behave like
mips-linux-gnu on a 32-bit Mips port of Debian.
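A sketch of resolving that equivalence once, at interpretation time, so that later code sees identical ABIs for both strings. The function and the `OSIsO32Port` flag are hypothetical; treating mips64 as N64 outside an O32 port is an assumption matching the current backend default.

```cpp
#include <cassert>
#include <string>

enum class MipsABI { O32, N32, N64 };

// Hypothetical interpreted host target.
struct HostTarget {
  bool Is64BitISA;
  MipsABI ABI;
};

// Interpret a host triple in the context of the OS it runs on. On a 32-bit
// (O32) Debian port, config.guess may report mips64-linux-gnu on a 64-bit
// CPU, but the system expects O32 code, just like mips-linux-gnu.
HostTarget interpretHostTriple(const std::string &Triple, bool OSIsO32Port) {
  bool Is64BitCPU = Triple.rfind("mips64", 0) == 0;
  if (OSIsO32Port)
    return {Is64BitCPU, MipsABI::O32};
  // Assumed default elsewhere: mips64 means N64, mips means O32.
  return {Is64BitCPU, Is64BitCPU ? MipsABI::N64 : MipsABI::O32};
}
```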
Although not included in the description of the assumption above, one additional
flaw in the use of GNU Triples is that they are sometimes inadequate as a
description of the target. One example affecting MIPS in particular is that the
ABI is not represented in the GNU Triple, so we require significant API changes to
get this information where we need it. It would be helpful to be able to pass
such information through the existing plumbing.
The Planned Solution
The plan is to split the GNU Triple represented by the llvm::Triple object into
two pieces. The first piece is the existing llvm::Triple and is responsible for
parsing the GNU triple and canonicalizing it. The second piece is a mutable
target description named llvm::TargetTuple. TargetTuple is responsible for
interpreting the triple according to the vendor's rules, providing an
interface to allow mutation by tools, and authoritatively describing the target
without the ambiguity of GNU Triples. As an example,
'mips-linux-gnu-clang -EL ...' would:
// Parse the GNU Triple
llvm::Triple GnuTriple("mips-linux-gnu");
// Convert it to a TargetTuple according to the (possibly customized) meanings
// in use by the vendor.
llvm::TargetTuple TT(GnuTriple);
// Then mutate the TargetTuple according to the compiler options (or
// equivalent depending on the tool, for example disassemblers would mutate it
// according to the object headers).
if (hasOption("-EL"))
  TT.setLittleEndian();
...
At this point, TT would be
"+mipsel-unknown-linux-gnu-elf32-some-other-stuff" (exact
serialization is t.b.d and may end up target dependent) which we can then rely
on in the rest of LLVM. This split resolves the issue of llvm::Triple objects
not being reliable when used as a target description since TargetTuple will
reflect the result of interpreting the triple as well as applying appropriate
options. It also provides a suitable place for vendors to define the meanings of
their GNU Triples.
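Because llvm::TargetTuple was only proposed, the snippet above cannot be compiled as is. The following self-contained sketch walks through the same parse, interpret and mutate flow using hypothetical stand-ins (`SimpleTriple`, `SimpleTargetTuple`) that only understand MIPS arch names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for llvm::Triple: it only splits the GNU triple into components.
struct SimpleTriple {
  std::vector<std::string> Components;
  explicit SimpleTriple(const std::string &Str) {
    std::istringstream In(Str);
    std::string Part;
    while (std::getline(In, Part, '-'))
      Components.push_back(Part);
  }
  const std::string &arch() const { return Components.at(0); }
};

static bool endsWith(const std::string &S, const std::string &Suffix) {
  return S.size() >= Suffix.size() &&
         S.compare(S.size() - Suffix.size(), Suffix.size(), Suffix) == 0;
}

// Stand-in for the proposed TargetTuple: the interpreted, mutable meaning.
class SimpleTargetTuple {
public:
  // Interpretation happens here; a vendor would customize these defaults.
  explicit SimpleTargetTuple(const SimpleTriple &T)
      : Is64Bit(T.arch().rfind("mips64", 0) == 0),
        LittleEndian(endsWith(T.arch(), "el")) {}

  void setLittleEndian() { LittleEndian = true; }
  bool isLittleEndian() const { return LittleEndian; }

  // Arch name reflecting the current (possibly mutated) state.
  std::string archName() const {
    return std::string(Is64Bit ? "mips64" : "mips") + (LittleEndian ? "el" : "");
  }

private:
  bool Is64Bit;
  bool LittleEndian;
};
```

Note that mutating the tuple for -EL leaves the parsed triple untouched, so the original string is still available at the periphery while the rest of the compiler consults the tuple.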
One significant detail is the way vendors customize the meaning of their
Triples. Currently, the plan is to nominate a constructor
(TargetTuple::TargetTuple(const Triple &)) a vendor can patch to redefine
their triples with the default implementation being the 'usual' meaning
(the meaning that should be used in the absence of customization). One nice
benefit of this configure-by-source-patch approach is that vendors can customize
multiple triples as easily as their native triple or intended target triple. To
use Debian as an example again, they would be able to customize all their
supported triples such that 'clang -target arm-linux-gnueabihf' on the
amd64 port targets their armhf port using the same customization that makes
'clang' on the armhf port do the right thing natively. Android and
toolchains for heterogeneous platforms would likely benefit from this too. This
configure-by-source-patch approach seems to make some people uncomfortable so we
may have to find another way to configure the triples (tablegen?).
To reach this result the plan is to do the following:
1. Replace any remaining std::strings and StringRefs containing
GNU triples with Triple objects.
2. Split the llvm::Triple class into llvm::Triple and llvm::TargetTuple
classes. Both are identical in implementation and almost identical in interface
at this stage.
3. Gradually replace Triples with TargetTuples until the C APIs and the
LLVM-IR are the only place inside LLVM where Triples are still used.
4. Change the implementation of TargetTuple to whatever is convenient for
LLVM's internals and decide on a serialization.
5. Replace serialized Triples with serialized TargetTuples in LLVM-IR.
a. Maintain backwards compatibility with IR using triples, at least for a
while.
6. Add TargetTuple support to the C API. Exact API is t.b.d.
7. Have the API users mutate the TargetTuple appropriately.
Renato: This has been revised slightly from the last one we discussed due to
public C++ APIs being used internally as well as externally.
Where we are now
I've just started posting patches for steps 2 and 3 of the plan. My working
copy is nearly at step 4.
What's next
Upstream steps 2 and 3 and then begin replacing the TargetTuple implementation as
per step 4.
Previous Discussions
http://thread.gmane.org/gmane.comp.compilers.llvm.devel/86020/focus=86073. I
should mention that I've since been made aware that the original topic of
private label prefixes could be solved in a much simpler way than previously
thought. The triple related discussion is still relevant though.
I understand from Renato that there are more threads over the last few years but
I haven't looked for them.
Daniel Sanders
Leading Software Design Engineer, MIPS Processor IP
Imagination Technologies Limited
www.imgtec.com