thr3ads.net - llvm dev - [llvm-dev] The Trouble with Triples [Sep 2015]

Daniel Sanders via llvm-dev

2015-Sep-24 13:18 UTC

[llvm-dev] The Trouble with Triples

> > > The word 'all' is what still bothers me here. If any one piece of the
> > > information is derived from incorrect information in the triple, then
> > > the behaviour will likely be incorrect.
> >
> > If it's possible to be derived from the triple then it's going to be
> > correct or the triple is incorrect.
> > If it's something that's overridden later because it can't be
> > represented by a triple then that default needs to be overridden. This
> > make sense? My mipsel example is what comes to mind here the most.
>
> I think so although the triple is often incorrect, particularly with
> Triple::mips/mips64.
>
> How's the triple incorrect? Is it user error, configuration error, or
> just doesn't represent the full value of the options the user may have
> specified?

It doesn't reflect the effects of the command line options but the backend
believes it does.

It also conflates having a mips64 architecture with intending to use a mips64
architecture but these aren't quite the same thing. For example,
'mips-linux-gnu -mips64' means O32 on a MIPS64 which is effectively
MIPS32 and O32 in terms of what the compiler is permitted to use. This is the
main reason I want to separate the textual llvm::Triple from the understanding
of what the llvm::Triple means. I want to be able to take an initial meaning and
mutate it according to the options. This can be done in-place in the
llvm::Triple by string substitution (and that's what -EL and -m64 do) but
that's a rather ugly way to do it. The other reason I want to separate the
two is that it provides a home for triple-related information that is implied by
the triple but isn't textually present such as the default CPU that clang
should use. The separation provides a good point to account for
configure-time/run-time triple customization.
Additionally, Renato wants this separation because interpreting ARM triples is
hard and he needs somewhere to store the interpreted meaning.
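
As a rough illustration of the separation described above (not code from LLVM or from any patch in this thread), the interpreted meaning could live in a small value type that options mutate directly. Every name here (`TargetMeaning`, `initialMeaning`, `applyOption`) is hypothetical, and the starting assumptions (mips means O32, mips64 means N64) are simply the ones the backend is said to make today.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration; none of these are LLVM types.
enum class Isa { Mips32, Mips64 };
enum class Abi { O32, N32, N64 };

struct TargetMeaning {
  Isa isa;
  Abi abi;
};

// Derive the initial meaning from the arch component of the triple, using
// the assumption described in the thread: mips -> O32, mips64 -> N64.
inline TargetMeaning initialMeaning(const std::string &arch) {
  const bool is32 = arch == "mips" || arch == "mipsel";
  const bool is64 = arch == "mips64" || arch == "mips64el";
  assert((is32 || is64) && "sketch only handles the MIPS arch names");
  (void)is32; // avoid an unused-variable warning when asserts are disabled
  return is64 ? TargetMeaning{Isa::Mips64, Abi::N64}
              : TargetMeaning{Isa::Mips32, Abi::O32};
}

// Mutate the meaning for one option; the triple text is never rewritten.
// Returns false for options this sketch does not model.
inline bool applyOption(TargetMeaning &m, const std::string &opt) {
  if (opt == "-mips32") { m.isa = Isa::Mips32; return true; }
  if (opt == "-mips64") { m.isa = Isa::Mips64; return true; }
  if (opt == "-mabi=32") { m.abi = Abi::O32; return true; }
  if (opt == "-mabi=n32") { m.abi = Abi::N32; return true; }
  if (opt == "-mabi=64") { m.abi = Abi::N64; return true; }
  return false;
}
```

With this shape, 'mips-linux-gnu -mips64' ends up as a MIPS64 ISA with the O32 ABI still in place, which is the case the arch-only checks cannot express.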

Similarly, the backend also believes that Triple::mips implies O32 and
Triple::mips64 implies N64 (N32 isn't represented) but we already have a
plan to sort that out.

> Sounds like it was a bug that was fixed. :) That's pretty crazy though.

That's how I see it. The endian mismatch wasn't even the weirdest thing
about it.

From: Eric Christopher [mailto:echristo at gmail.com]
Sent: 24 September 2015 05:19
To: Daniel Sanders; Renato Golin; Jim Grosbach
Cc: llvm-dev at lists.llvm.org; Matthew Fortune
Subject: Re: The Trouble with Triples

On Wed, Sep 23, 2015 at 4:05 PM Daniel Sanders <Daniel.Sanders at imgtec.com>
wrote:

> > The word 'all' is what still bothers me here. If any one piece of the
> > information is derived from incorrect information in the triple, then the
> > behaviour will likely be incorrect.
>
> If it's possible to be derived from the triple then it's going to be
> correct or the triple is incorrect.
> If it's something that's overridden later because it can't be represented
> by a triple then that default needs to be overridden. This make sense? My
> mipsel example is what comes to mind here the most.

I think so although the triple is often incorrect, particularly with
Triple::mips/mips64.

How's the triple incorrect? Is it user error, configuration error, or just
doesn't represent the full value of the options the user may have specified?

> (Though, if you wouldn't mind, where did you get this mips-linux toolchain
> that had a --target=mips-linux-gnu at gcc compile time, but was somehow
> little endian? How did the little endianness get configured in?)

It used to be found at
http://community.imgtec.com/developers/mips/tools/compilers/clang-llvm/ but that
link (thankfully) redirects to our main page now which in turn links to
llvm.org. I'm not sure how long it was our published LLVM+GCC toolchain since
I only found out when a user asked about it.

Sounds like it was a bug that was fixed. :) That's pretty crazy though.

-eric

________________________________
From: Eric Christopher [echristo at gmail.com]
Sent: 23 September 2015 23:23
To: Daniel Sanders; Renato Golin; Jim Grosbach
Cc: llvm-dev at lists.llvm.org; Matthew Fortune
Subject: Re: The Trouble with Triples

On Wed, Sep 23, 2015 at 3:00 PM Daniel Sanders <Daniel.Sanders at imgtec.com>
wrote:

> > Note that the same problems exist and that they are unrelated to the
> > existence of TargetMachine or not since TargetMachine gets the relevant
> > information from the Triple it holds. This information is incorrect, even
> > as a starting point.
>
> I believe we're going to disagree here as the TargetMachine does not get all
> of its information from the Triple - except where the Triple is the
> canonical place for that information and it isn't overridden in any way.

The word 'all' is what still bothers me here. If any one piece of the
information is derived from incorrect information in the triple, then the
behaviour will likely be incorrect.

If it's possible to be derived from the triple then it's going to be
correct or the triple is incorrect. If it's something that's overridden
later because it can't be represented by a triple then that default needs to
be overridden. This make sense? My mipsel example is what comes to mind here the
most.

(Though, if you wouldn't mind, where did you get this mips-linux toolchain
that had a --target=mips-linux-gnu at gcc compile time, but was somehow little
endian? How did the little endianness get configured in?)

Are you saying that almost everything up to and including the architecture (e.g.
Triple::mips) should be represented directly in the (MC)TargetMachine?

I'm saying it should be one way or another yes.

> I think Matthew is going down the correct path in his email and I've
> responded there.

I have some questions on that but I'll reply there.

Cool deal.

-eric

________________________________
From: Eric Christopher [echristo at gmail.com]
Sent: 23 September 2015 22:28
To: Daniel Sanders; Renato Golin; Jim Grosbach
Cc: llvm-dev at lists.llvm.org; Matthew Fortune
Subject: Re: The Trouble with Triples

On Wed, Sep 23, 2015 at 2:19 PM Daniel Sanders <Daniel.Sanders at imgtec.com>
wrote:
Rewrote the ABI example in terms of clang -cc1as which is a supported tool.

Which is still just calling the same set of APIs I mentioned in my previous
email so everything there still holds.

Note that the same problems exist and that they are unrelated to the existence
of TargetMachine or not since TargetMachine gets the relevant information from
the Triple it holds. This information is incorrect, even as a starting point.

I believe we're going to disagree here as the TargetMachine does not get all
of its information from the Triple - except where the Triple is the canonical
place for that information and it isn't overridden in any way.

I think Matthew is going down the correct path in his email and I've
responded there.

Thanks!

-eric

Please do read the other examples in my previous email. It contains a number of
problems that need to be addressed and are completely unrelated to the MC layer.

ABI

Let's start at ExecuteAssembler() in cc1as_main.cpp. Here's a sketch of
what happens:
* Call TargetRegistry::lookupTarget() to get a llvm::Target.
* Call createMCRegInfo(Triple, ...)
* Call createMCAsmInfo(..., Triple)
  * MipsMCAsmInfo::PointerSize is incorrect for the N32 ABI (should be 4 but
    gets 8 since it checks for Triple::mips64/mips64el)
  * MipsMCAsmInfo::CalleeSaveStackSlotSize is incorrect for mips-linux-gnu
    -mips64 -mabi=64, since it too checks for Triple::mips64/mips64el
  * MipsMCAsmInfo::PrivateLabelPrefix and MipsMCAsmInfo::PrivateGlobalPrefix
    are wrong (currently "$", should be ".L") for N32/N64 but it's possible to
    fix this. However, O32 should permit "$" in addition to ".L". Even if
    MipsMCAsmInfo supported multiple prefixes (which is easy enough to add),
    checking for Triple::mips/mipsel would not yield the correct result on
    mips64-linux-gnu -mabi=32.
* Construct an MCObjectFileInfo
* InitMCObjectFileInfo()
  * FDECFIEncoding is incorrect for N32 (should be sdata4 but gets sdata8
    since it checks for Triple::mips64/mips64el)
  * PersonalityEncoding and TTypeEncoding are correct but only because we
    don't have a R_MIPS_PC64 relocation yet. If we had such a relocation this
    would have the same problem as FDECFIEncoding.
* Call createMCInstrInfo
* Call createMCSubtargetInfo(Triple, ...)
* If emitting assembly:
  * Call createMCInstPrinter(Triple, ...)
  * If emitting encodings:
    * Call createMCCodeEmitter()
    * Call createMCAsmBackend(..., Triple, ...)
  * Call createMCAsmStreamer()
* If emitting objects:
  * Call createMCCodeEmitter()
  * Call createMCAsmBackend(..., Triple, ...)
  * createMCObjectStreamer()
    * This in turn calls createObjectWriter() and tells it to emit ELF32/ELF64
      objects. This information comes from MipsAsmBackend and ultimately comes
      from Triple::mips/mipsel vs Triple::mips64/mips64el. This is incorrect
      for N32 (which should be ELF32 but has Triple::mips64/mips64el) and for
      mips-linux-gnu -mips64 (which should be ELF32 since it should target
      O32).
* Call createMCAsmParser()
* Call a different createMCAsmParser().
Other places that get ABI information wrong:
* AddressSanitizer: Uses Triple::mips64/mips64el to mean the N64 ABI. N32 is a
Triple::mips64/mips64el that should behave as the Triple::mips/mipsel cases do.
* DataFlowSanitizer: Is heading down the same road but hasn't implemented
O32/N32 yet.
* MemorySanitizer: Is heading down the same road but hasn't implemented
O32/N32 yet.
* Many places where hasMips64*() or isGP64bit() are used in the backend.
  * MSA intrinsic lowering
  * Legalization configuration
  * Instruction selection
  * MipsTargetLowering::getOptimalMemOpType()
  * And many more. I can provide more detail if you want.

Other notables:
* RuntimeDyldELF gets it right but only because it can read the ELF headers
instead of the Triple. It went down the same road for a while.

I'll provide a CodeGen example tomorrow if you want.
________________________________
From: Eric Christopher [echristo at gmail.com]
Sent: 23 September 2015 19:49
To: Daniel Sanders; Renato Golin; Jim Grosbach
Cc: llvm-dev at lists.llvm.org
Subject: Re: The Trouble with Triples

On Wed, Sep 23, 2015 at 11:38 AM Daniel Sanders <Daniel.Sanders at imgtec.com>
wrote:

> OK, I'm going to just reply to the last because I think it's the most
> important part of all this and would like to try to not have us side tracked
> again. If you'd like I can reply to it, but let's take the last part first :)
>
> > > Could you please provide some examples of things that are impossible
> > > right now with command lines, how those interact with the TargetMachine,
> > > and how you see it being impossible to deal with?
> >
> > There's some examples above but I'll give the detail in the morning. It's
> > 11:30pm at the moment :-).
>
> Let's talk through one of your examples here when you write things up. I
> think tracing the execution as you see it will be important to coming to a
> mutual understanding here. I know that you have a solution that you see is
> going to solve the problems you see, but I think the problems that you and I
> are seeing are possibly not the same thing. So let's walk through this
> execution trace and see what we can do.

ABI

Let's start at llvm-mc's main(). It's important to note that llvm-mc
does not create a TargetMachine. Here's a sketch of what happens:

So, we can just stop here.

A couple problems:

a) llvm-mc isn't a supported product, but that's not the real issue.
b) The lack of a TargetMachine at the MC level was something I brought up a long
time ago in this thread with my proposed solutions. This is what needs to be
fixed, especially given that targets can switch ISA, ABI, floating point, etc
within a single assemble action.

I even brought up a lot of these problems originally when I was fixing MIPS to
work with the current subtarget rewrite.

-eric

* Initialize LLVM
* Parse the command line
* Construct an MCTargetOptions from the flags
* Normalize the triple
* Construct a llvm::Target
  * If the triple is not given, we fetch the default
  * We normalize the triple
  * We call TargetRegistry::lookupTarget() to get a llvm::Target.
    * If -march is given, and Triple::getArchTypeForLLVMName() doesn't return
      Triple::UnknownArch, the new arch mutates the triple. Otherwise it
      applies the -march correctly but doesn't change the triple to match. In
      this way, it's possible to end up with i586-linux-gnu targeting the
      foobar architecture.
* Call createMCRegInfo()
* Call createMCAsmInfo()
  * MipsMCAsmInfo::PointerSize is incorrect for the N32 ABI (should be 4 but
    gets 8 since it checks for Triple::mips64/mips64el)
  * MipsMCAsmInfo::CalleeSaveStackSlotSize is incorrect for mips-linux-gnu
    -mips64 -mabi=64, since it too checks for Triple::mips64/mips64el
  * MipsMCAsmInfo::PrivateLabelPrefix and MipsMCAsmInfo::PrivateGlobalPrefix
    are wrong (currently "$", should be ".L") for N32/N64 but it's possible to
    fix this. However, O32 should permit "$" in addition to ".L". Even if
    MipsMCAsmInfo supported multiple prefixes (which is easy enough to add),
    checking for Triple::mips/mipsel would not yield the correct result on
    mips64-linux-gnu -mabi=32.
* InitMCObjectFileInfo()
  * FDECFIEncoding is incorrect for N32 (should be sdata4 but gets sdata8
    since it checks for Triple::mips64/mips64el)
  * PersonalityEncoding and TTypeEncoding are correct but only because we
    don't have a R_MIPS_PC64 relocation yet. If we had such a relocation this
    would have the same problem as FDECFIEncoding.
* createMCInstrInfo()
* createMCInstPrinter()
* createMCCodeEmitter()
* createMCAsmBackend()
* If emitting assembly, createMCAsmStreamer()
* If emitting object, createMCObjectStreamer()
  * This in turn calls createObjectWriter() and tells it to emit ELF32/ELF64
    objects. This information comes from MipsAsmBackend and ultimately comes
    from Triple::mips/mipsel vs Triple::mips64/mips64el. This is incorrect for
    N32 (which should be ELF32 but has Triple::mips64/mips64el) and for
    mips-linux-gnu -mips64 (which should be ELF32 since it should target O32).
* If assembling, createMCAsmParser()
* If disassembling:
  * createMCRegInfo() (again)
  * createMCAsmInfo() (again)
    * This has the same issues as the first call.
  * createMCDisassembler()

Clang does pretty much the same thing as this but additionally has to deal with
using the correct default ABI for the given triple. I'll cover this kind of
problem in 'CPU Defaults' below.
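
To make the -march step in the llvm-mc walkthrough above concrete, here is a small sketch of the described behaviour. It is not llvm-mc's code: `isKnownArchName` is a hypothetical stand-in for the Triple::getArchTypeForLLVMName() check, and its list of names is deliberately tiny.

```cpp
#include <cassert>
#include <string>

// Result of applying -march: the (possibly rewritten) triple text and the
// architecture name that is actually used to pick a backend.
struct Selection {
  std::string triple;
  std::string backend;
};

// Hypothetical stand-in for "the arch name is recognised by the triple
// parser". Only a few MIPS spellings are listed for illustration.
inline bool isKnownArchName(const std::string &name) {
  return name == "mips" || name == "mipsel" || name == "mips64" ||
         name == "mips64el";
}

// Recognised names replace the arch component of the triple. Unrecognised
// names still select the backend but leave the triple text untouched, so the
// two can disagree.
inline Selection applyMarch(const std::string &triple,
                            const std::string &march) {
  const std::string::size_type dash = triple.find('-');
  assert(dash != std::string::npos && "expected an arch-os-env style triple");
  if (isKnownArchName(march))
    return {march + triple.substr(dash), march};
  return {triple, march};
}
```

Under these assumptions, `applyMarch("i586-linux-gnu", "foobar")` keeps the i586 triple while selecting the foobar backend, which is the inconsistent state mentioned in the list.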

Other places that get ABI information wrong:
* AddressSanitizer: Uses Triple::mips64/mips64el to mean the N64 ABI. N32 is a
  Triple::mips64/mips64el that should behave as the Triple::mips/mipsel cases
  do.
* DataFlowSanitizer: Is heading down the same road but hasn't implemented
  O32/N32 yet.
* MemorySanitizer: Is heading down the same road but hasn't implemented
  O32/N32 yet.
* Many places where hasMips64*() or isGP64bit() are used in the backend.
  * MSA intrinsic lowering
  * Legalization configuration
  * Instruction selection
  * MipsTargetLowering::getOptimalMemOpType()
  * And many more. I can provide more detail if you want.

Other notables:
* RuntimeDyldELF gets it right but only because it can read the ELF headers
  instead of the Triple. It went down the same road for a while.

I'll provide a CodeGen example tomorrow if you want. I'd intended to
include one but this email took longer to type up than I expected.

Endian Defaults

The toolchain is mips-linux-gnu and targets little endian by default. Here's
what currently happens:
* We parse the triple (mips-linux-gnu) and get Triple::mips
* No command line flags modify this
* We construct a TargetMachine and all the other objects using this
  llvm::Triple.
* The architecture was Triple::mips so everything configures for big-endian
  even though the target was supposed to be little endian.
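
The sequence above can be sketched as a resolution order in which the arch name is only the last fallback. This is an illustration under assumptions: the `toolchainDefault` input is hypothetical (nothing in the current code supplies it), and all names are invented for the example.

```cpp
#include <cassert>
#include <string>

enum class Endian { Big, Little };
enum class EndianChoice { Unspecified, Big, Little };

// Arch-only rule, as applied today: names ending in "el" are little endian,
// everything else is big endian.
inline Endian endianFromArch(const std::string &arch) {
  assert(arch.compare(0, 4, "mips") == 0 &&
         "sketch only models MIPS arch names");
  const bool little =
      arch.size() >= 2 && arch.compare(arch.size() - 2, 2, "el") == 0;
  return little ? Endian::Little : Endian::Big;
}

// Hypothetical resolution order: an explicit command-line choice wins, then
// a per-toolchain default, and only then the arch name.
inline Endian resolveEndian(const std::string &arch,
                            EndianChoice toolchainDefault,
                            EndianChoice commandLine) {
  const EndianChoice order[] = {commandLine, toolchainDefault};
  for (EndianChoice c : order) {
    if (c == EndianChoice::Big) return Endian::Big;
    if (c == EndianChoice::Little) return Endian::Little;
  }
  return endianFromArch(arch);
}
```

With only `endianFromArch`, a mips-linux-gnu toolchain that is meant to be little endian comes out big endian; the extra input is what lets the toolchain's own default be honoured.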

CPU Defaults

In LLVM, the default CPU is hardcoded to be MIPS32 (in
MipsABIInfo::computeTargetABI()). In Clang, the default CPU for this triple is
hardcoded to be MIPS32R2 (in mips::getMipsCPUAndABI()) and clang always passes
an explicit CPU to the backend via -target-cpu.

On Debian, the default CPU for mipsel-linux-gnu is MIPS-II. On Fedora, the
default CPU for mipsel-linux-gnu is MIPS32R2. It is not possible to hardcode the
default both ways.
How would you resolve this conflict?

In my opinion, the only choices to resolve this conflict are configure-time
options or run-time config files. Configure-time options to select the default
CPU are faster to implement and produce a (slightly) faster clang, while
run-time config files are more flexible but slower to implement and produce a
slower clang. To me, configure-time is the sensible short term choice followed
by moving to run-time config files once the pressure to achieve an initial
release is gone.
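
For illustration, a configure-time default of the kind described above could be as small as a macro that the build system overrides. The macro name `SKETCH_DEFAULT_MIPS_CPU` and the function `selectCpu` are hypothetical, and the CPU strings are only examples; this is not how clang is configured today.

```cpp
#include <cassert>
#include <string>

// A distribution's build could pass, for example,
//   -DSKETCH_DEFAULT_MIPS_CPU="\"mips2\""
// to choose its own baseline. The fallback below mirrors the MIPS32R2
// default that the thread attributes to Clang.
#ifndef SKETCH_DEFAULT_MIPS_CPU
#define SKETCH_DEFAULT_MIPS_CPU "mips32r2"
#endif

// An explicitly requested CPU always wins; otherwise use the default that
// was baked in at configure time.
inline std::string selectCpu(const std::string &explicitCpu) {
  if (!explicitCpu.empty())
    return explicitCpu;
  const std::string configured = SKETCH_DEFAULT_MIPS_CPU;
  assert(!configured.empty() && "configure-time default must not be empty");
  return configured;
}
```

The trade-off is the one stated above: the default is fixed per build, so one binary cannot serve both the Debian and the Fedora baseline without a run-time mechanism.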

Now let's consider JITs. JITs should default to the host CPU as
defined by the host triple so that they generate code for the same target as the
rest of the system. There is a reasonable argument that the default CPU should
be the auto-detected CPU for performance reasons but it may not be possible to
auto-detect the CPU in all circumstances. We therefore need a default to fall
back on. This default should be the same as the default for the native compiler
on this host (MIPS-II for Debian, MIPS32R2 for Fedora).

In my opinion, the default CPU is a property of the target platform since the
platform specifies the minimum CPU it is intended to run on. Our representation
of the target platform is called llvm::Triple so the default CPU belongs in this
object. Being in this object means that tools such as clang, or APIs such
as Target::createTargetMachine() will always get the defaults corresponding to
the triple. These defaults, as we discussed above, vary according to the OS
(MIPS-II on Debian, MIPS32R2 on Fedora).

This kind of problem also exists in other forms such as Softfloat vs Hardfloat
defaults, NAN1985 vs NAN2008 defaults, default ABIs, etc.

Other things to mention

MIPS64 is not a fundamentally different architecture from MIPS32. If we had a
representation of the ABI in the triple then we wouldn't need
Triple::mips64/mips64el.

From: Eric Christopher [mailto:echristo at gmail.com]
Sent: 23 September 2015 01:34
To: Daniel Sanders; Renato Golin; Jim Grosbach
Cc: llvm-dev at lists.llvm.org
Subject: Re: The Trouble with Triples

OK, I'm going to just reply to the last because I think it's the most
important part of all this and would like to try to not have us side tracked
again. If you'd like I can reply to it, but let's take the last part first :)

> Could you please provide some examples of things that are impossible right
> now with command lines, how those interact with the TargetMachine, and how
> you see it being impossible to deal with?

There's some examples above but I'll give the detail in the morning. It's
11:30pm at the moment :-).

Let's talk through one of your examples here when you write things up. I
think tracing the execution as you see it will be important to coming to a
mutual understanding here. I know that you have a solution that you see is going
to solve the problems you see, but I think the problems that you and I are
seeing are possibly not the same thing. So let's walk through this execution
trace and see what we can do.

Thanks!

-eric

________________________________
From: Eric Christopher [echristo at gmail.com]
Sent: 22 September 2015 20:40
To: Daniel Sanders; Renato Golin; Jim Grosbach
Cc: llvm-dev at lists.llvm.org
Subject: Re: The Trouble with Triples

On Thu, Sep 17, 2015 at 6:21 AM Daniel Sanders <Daniel.Sanders at imgtec.com>
wrote:
I think we need to take a step further back and re-enter from the right starting
point. The thing that's bothering me about the push back so far is that
it's trying to discuss and understand the consequences of resolving the core
problem while seemingly ignoring the core problem itself. The reason I've
been steering everything back to GNU Triples being ambiguous and
inconsistent is because it's the root of all the problems and the fixes to
the various issues fall out naturally once this core point has been addressed.

*sigh*

Here's the line of thought that I'd like people to start with:

* Triples don't describe the target. They look like they should, but they
  don't. They're really just arbitrary strings.

Triples are used as a starting point, but no more.

* LLVM relies on Triple as a description of the target. It defines the
  backend to use, the binary format to use, OS and Vendor specific quirks to
  enable/disable, the default CPU, the default ABI, the endian, and countless
  other details about the target.

These two statements aren't necessarily true in whole.

a) We don't use the Triple to fully specify the target.
b) We don't use the Triple to fully specify the ABI.
c) We don't use the Triple to fully specify the CPU.
d) We do use the triple to handle endianness since most, if not all, triples
actually bother to encode endianness.
e) The rest of the "countless details" may or may not be relevant, you
haven't given an example of what you care about.

From here on your email relies on all of these assumptions being true. So
I'm going to skip past that part and go to where you answer some of my
questions.

At this point, in the MC layer we have a number of classes that need to know the
ABI but lack this information. Our TargetMachine has an accurate TargetTuple
object that describes the invariants of the desired target. The desired ABI is
an invariant too so why not have it in the TargetTuple which is already plumbed
in everywhere we need it? After all, it's a property of the target
OS/Environment. If we have the ABI in the TargetTuple, then we don't need
any other means to set the ABI, tools can set it up front in the TargetTuple and
we don't need any command-line option handling for it in the backend.

This isn't sufficient anyways as I don't want to depend on a weird
serialization format to deal with something a simple command line can deal with
(or you've said this in a way that's confused me). I see you saying you
want:

-tuple mips-linux-gnu-abio32-el

to specify on a command line to, say, llvm-mc or a new assembler interface, or
heck, to clang itself, that you want to compile for:

-triple mipsel-linux-gnu -mabi=o32

right? Basically? (Bikeshedding of how to actually serialize things aside?)

Meanwhile, in clang we have a number of command line options that change the
desired target. Let's say we've constructed a Triple and resolved it to
TargetTuple (more on that below). We're now processing the -EL option. At
the moment, we substitute our mips-linux-gnu triple for a mipsel-linux-gnu
triple, construct a Triple object from it and resolve the new Triple to a
TargetTuple. But why do we need to bother with that kind of weird hackery when
we can simply do Obj.setEndian(Little)? This is what Phase 7 of the plan is
about. We end up with a cleaner way to process target changes that, until now,
have required weird triple hacking to handle.
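
The two approaches contrasted above can be put side by side. This is a sketch only: `substituteLittleEndian` imitates the string substitution described for -EL, and `TupleSketch` is a hypothetical stand-in for the proposed TargetTuple, whose real interface is not shown in this thread beyond the `setEndian(Little)` example.

```cpp
#include <cassert>
#include <string>

// Approach 1: rewrite the arch component of the triple text for -EL.
inline std::string substituteLittleEndian(const std::string &triple) {
  const std::string::size_type dash = triple.find('-');
  assert(dash != std::string::npos && "expected an arch-os-env style triple");
  std::string arch = triple.substr(0, dash);
  if (arch == "mips")
    arch = "mipsel";
  else if (arch == "mips64")
    arch = "mips64el";
  // The caller must then re-parse and re-resolve the new string.
  return arch + triple.substr(dash);
}

// Approach 2: mutate a structured description directly.
enum class Endian { Big, Little };

class TupleSketch {
public:
  explicit TupleSketch(Endian e) : endian_(e) {}
  void setEndian(Endian e) { endian_ = e; }
  Endian endian() const { return endian_; }

private:
  Endian endian_;
};
```

The first approach needs one hard-coded rename per architecture spelling and a second parse; the second is a single field update on an object that has already been resolved.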

This is something else I don't understand. Here is the first time you start
talking about APIs which is what I'm particularly asking about in my earlier
mails. I'd like to see how you plan on changing the TargetMachine and MC
level APIs to deal with this. It seems like the Tuple is going to be a way to
side-load information around to the MC layer and while I agree that something is
necessary there, I don't think that this solution is the right one. (As I
said earlier in the thread)

I skipped the Triple -> TargetTuple resolution a moment ago and I should
address that now. We already know that mapping Triple to TargetTuple is a many
to many mapping. One Triple has many possible TargetTuples depending on the
environment. One TargetTuple can be formed from multiple possible Triples. In an
ideal world, we'd like to bake in all of these mappings so that one clang
binary supports everything. Unfortunately, being a many to many mapping, some of
these mappings are mutually exclusive. Note that this isn't a new problem
resulting from this project. The problem has always been there but has been
ignored until now. To resolve this, we need to provide configure-time and
possibly run-time controls for how this conversion is disambiguated. This
resolution is performed as early as possible so that the middle/back-ends
don't need to know anything about the ambiguity problem.

The minute you start talking about configure time controls we've already
lost. This, for me, is a non-starter. That said, I'd like to see the
examples you think show that things are impossible to deal with in the current
architecture.

---

To reply more directly to your email:

Thanks :)

> What can't be done to TargetMachine to avoid this serialization?

TargetMachine already has the serialization (see TargetMachine::TargetTriple).
We're not doing anything new here. We're simply replacing one object
holding faulty information with a new object holding reliable information.

This is side stepping my question and making it about Triple. I've
specifically said that TargetMachine does not and is not completely dependent
upon Triple.

> And a followup question: What can't be serialized at the function level in
> the IR to make certain things clear that aren't global? We already do this
> for a lot of command line options.

The data I want to fix is global. I think the bit you may be getting hung up on
here is that small portions of this global data can also be overridden at the
function level. Those overrides aren't a problem and continue to operate in
the same way as they do today.

Examples please.

> And one more: What global options do we need to consider here?

I'm not certain I understand this question. If you're talking command
line options, it's things like -EL, -EB, -mips32, -mips32r[2356], -mips64,
-mips64r[2356], -mabi=…. If you're talking about Triple -> TargetTuple
mappings, there's quite a wide variety but the main ones for Mips are
endian, architecture, default CPU, and default ABI.

All of these are representable right now in the TargetMachine as far as I can
tell. What examples are you having problems with?

> The goal of the configuration level of the TargetMachine is that it controls
> things that don't change at the object level. This is a fairly recently
> stated goal, but I think it makes sense for LLVM in general.
> TargetSubtargetInfo takes care of everything that resides under this (as
> much as possible, some bits are still in transition, e.g. TargetOptions).
> This is part of my suggestion to Daniel about the problems with
> MCSubtargetInfo and the assembler. Targets like Mips and ARM were
> unfortunately designed to change things on the fly during assembly and need
> to collate or at least change defaults as we're processing code. I
> definitely had to deal with a lot of the pain you're talking about when I
> was rewriting some of the handling there during the TargetSubtargetInfo work.

I generally agree with this. The key bit I need to draw attention to is that the
'defaults' don't change, but are instead overridden. These constant
defaults are stored in TargetMachine and particularly
TargetMachine::TargetTriple. These defaults are wrong for some toolchains since
the information stored in TargetMachine::TargetTriple is wrong. It's the
defaults I'm trying to fix rather than the overrides.

I don't understand what you mean here.

I think I understand your proposed plan now and it's a few steps ahead of
where we are and where we need to be. I agree that overridable state should be
in TargetSubtargetInfo, however I can't initialize that state without the
default values which come from the faulty information in
TargetMachine::TargetTriple. This triple work is a pre-requisite to your plan
and at first I don't need to override ABIs.

Can you provide an example of using a tool that you're having problems with?

> Right now I see TargetTuple as trying to take over all of the various
> arguments to TargetMachine and encapsulate them into a single thing. I also
> don't see this as bad, but I also don't see it taking all of them right now
> and I'm not sure how it solves some of the existing problems with data
> sharing that we've got which is where the push back you're both getting is
> coming from here. Ultimately library-wise I can agree with some of the
> directions you're headed - I just don't see the unification and interactions
> right now.

I think we'll end up with TargetTuple taking over many arguments to
TargetMachine but that's not my goal at this stage. My goal is simply to fix
the faulty information currently held in Triple and use the now-accurate
information in TargetTuple to fix various blocking issues that prevent a proper
Mips toolchain product based on Clang/LLVM. At the end of Phase 7, it becomes
possible to fix a number of issues that are impossible to fix right now because
the available data we can consult at the moment is incorrect.

Could you please provide some examples of things that are impossible right now
with command lines, how those interact with the TargetMachine, and how you see
it being impossible to deal with?

Thanks

-eric

From: Eric Christopher [mailto:echristo at gmail.com]
Sent: 16 September 2015 23:52
To: Renato Golin; Jim Grosbach
Cc: Daniel Sanders; llvm-dev at lists.llvm.org
Subject: Re: The Trouble with Triples

Let's take a step back here.

It appears that you and Daniel are trying to solve some problems. I think
solving problems is good, I just want to make sure that we're solving them
in a way that gets us a decent API at the end. I also want to make sure
we're solving the right problems.

TargetTuple appears to be related to the TargetParser as you bring up in this
mail. They're two separate parts of similar problems - people trying to both
serialize command line options and communication from the front end to the
backend with respect to target information.

This leads me to a question: What can't be done to TargetMachine to avoid
this serialization?
And a followup question: What can't be serialized at the function level in
the IR to make certain things clear that aren't global? We already do this
for a lot of command line options.
And one more: What global options do we need to consider here?

The goal of the configuration level of the TargetMachine is that it controls
things that don't change at the object level. This is a fairly recently
stated goal, but I think it makes sense for LLVM in general. TargetSubtargetInfo
takes care of everything that resides under this (as much as possible, some bits
are still in transition, e.g. TargetOptions). This is part of my suggestion to
Daniel about the problems with MCSubtargetInfo and the assembler. Targets like
Mips and ARM were unfortunately designed to change things on the fly during
assembly and need to collate or at least change defaults as we're processing
code. I definitely had to deal with a lot of the pain you're talking about
when I was rewriting some of the handling there during the TargetSubtargetInfo
work.

Now a bit more on TargetParser + TargetTuple:

TargetParser appears to be trying to solve the parsing in Triple in a nice way
for ARM and also some of the "what kind of subtarget feature
canonicalization can we do in llvm that makes sense to communicate to the front
end". I like this particular idea and have often wanted a library of
feature handling, but it seems to have stabilized at an ARM specific set of code
with no defined interface. I can't even figure out how I'd use it in
lib/Basic right now for any target other than ARM. This isn't a condemnation
of TargetParser, but I think it's something that needs to be thought through
a bit more. It's been hooked up well before I'd expected it to and right
now if we moved it to the ARM backend from Support it'd make just as much
sense as it does where it is now other than making clang depend on the ARM
backend as well as the X86 backend :)

Right now I see TargetTuple as trying to take over all of the various arguments
to TargetMachine and encapsulate them into a single thing. I don't see this as
bad, but I also don't see it taking all of them right now, and I'm not sure how
it solves some of the existing data-sharing problems that we've got, which is
where the push back you're both getting is coming from. Ultimately, library-wise,
I can agree with some of the directions you're headed - I just don't see the
unification and interactions right now.

As a suggestion for a way forward, let's see if we can get my questions above
answered and also show how the interactions between llvm's libraries are going
to get fixed, moved to a better place, etc.

Thanks!

-eric

On Wed, Sep 16, 2015 at 3:02 PM Renato Golin <renato.golin at linaro.org> wrote:
On 16 September 2015 at 21:56, Jim Grosbach <grosbach at apple.com> wrote:
> Why do we care about GAS? We have an assembler.

It's not that simple.

There is a lot of old code out there, including the Linux kernel,
which we do care about a lot, that only compiles with GAS. We're slowly
moving the legacy code up to modern standards, and specifically some
kernel folks are happy to move up not only the asm syntax, but the C
standard and move away from GNU-specific behaviour. But we're not
quite there yet, and might not be for a few more years. So, yes, we
still care about GAS.

But this is not just about GAS.

As I said in my previous email, this is about clearing the bloat in
target descriptions by both removing the need for adding numerous CPU
names, target features, architecture names (xscale, strongarm, etc),
and making sure all parties (front/middle/back-ends) speak the same
language, produced from the same source.

The TargetTuple is that common language, and the TargetParser created
from the TableGen files is the common source. The Triple becomes a
legacy constructor value for the Tuple. All other target information
classes are already (or should be) generated from the TableGen files,
so the ultimate source becomes the TableGen description, which I think
is what you were aiming at in your comment.

For simple architectures, like x86, you don't even need a
TargetParser. You can easily construct the Tuple from a triple and use
the Tuple as you've always used the triple. No harm done. But for the
complex ones like ARM and MIPS, having a common interface generated
from the same place the other interfaces are is important to avoid
more bridges between front and middle and back end interpretations of
the same target. Whatever legacy ARM or MIPS carry can be isolated in
their own implementation, leaving the rest of the targets with a clean
and simple interface.

cheers,
--renato

Matthew Fortune via llvm-dev

2015-Sep-24 13:46 UTC

[llvm-dev] The Trouble with Triples

I think this all matches up with Eric’s comments. Personally I’d advocate
getting to a point where the triple can flow through unmutated (because it
becomes unused as a source of information that can be overridden) but it doesn’t
matter much; there will be some extra data flowing alongside it for all the
other implied or overridden settings. The useful bits of the triple become cpu
family ‘mips’ (for any of mips, mipsel, mips64, mips64el, mips64orion,
mips64octeon), OS ‘linux’ and environment ‘gnu’ (for any of gnu, gnuabi32,
gnuabi64, gnueabihf etc).

Rationalising the suggestion for cpu: While I fully agree that a mips- triple
must be big endian default and mipsel- must be little endian there is nothing to
stop an opposing endian ‘multilib’ in such a toolchain. That’s why I suggest
that endian should not end up coming from the triple (internally in LLVM) for
architectures that support both endians.
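
A minimal sketch of that idea, assuming a standalone record that sits beside the triple. Every name here (MipsTargetInfo, interpretArch, overrideEndian) is hypothetical and not part of LLVM. The arch name only seeds a default endianness, and consumers read the effective value from the record instead of re-deriving it from the triple text:

```cpp
#include <string>

// Hypothetical types for illustration only; none of this is LLVM API.
enum class Endian { Big, Little };

struct MipsTargetInfo {
  bool Known;            // false if the arch name was not recognised
  Endian DefaultEndian;  // what the arch component of the triple implies
  bool HasOverride;      // true once an option or multilib choice is applied
  Endian Override;       // only meaningful when HasOverride is true

  // Consumers ask the record, never the triple string.
  Endian endian() const { return HasOverride ? Override : DefaultEndian; }

  // Records an opposing-endian choice without touching the default.
  void overrideEndian(Endian E) {
    HasOverride = true;
    Override = E;
  }
};

// Maps an arch name to the 'mips' family and its default endianness.
// Only the four plain names are handled in this sketch.
MipsTargetInfo interpretArch(const std::string &Arch) {
  MipsTargetInfo Info = {false, Endian::Big, false, Endian::Big};
  if (Arch == "mips" || Arch == "mips64") {
    Info.Known = true;
  } else if (Arch == "mipsel" || Arch == "mips64el") {
    Info.Known = true;
    Info.DefaultEndian = Endian::Little;
  }
  return Info;
}
```

With this shape, a big-endian 'mips' toolchain that selects a little-endian multilib would call overrideEndian(Endian::Little) once, and the triple itself could flow through unmutated.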

I think we are done here. Just need an implementation to deal with the finer
points; I’ll resist commenting further on this thread.

Thanks for the discussion everyone, it’s been quite educational.
Matthew

From: Daniel Sanders
Sent: 24 September 2015 14:18
To: Eric Christopher; Renato Golin; Jim Grosbach
Cc: llvm-dev at lists.llvm.org; Matthew Fortune
Subject: RE: The Trouble with Triples
> > > The word 'all' is what still bothers me here. If any one piece of the
> > > information is derived from incorrect information in the triple, then
> > > the behaviour will likely be incorrect.
> >
> > If it's possible to be derived from the triple then it's going to be
> > correct or the triple is incorrect.
> > If it's something that's overridden later because it can't be
> > represented by a triple then that default needs to be overridden. This
> > make sense? My mipsel example is what comes to mind here the most.
> I think so although the triple is often incorrect, particularly with
> Triple::mips/mips64.
>
> How's the triple incorrect? Is it user error, configuration error, or
> just doesn't represent the full value of the options the user may have
> specified?

It doesn't reflect the effects of the command line options but the backend
believes it does.

It also conflates having a mips64 architecture with intending to use a mips64
architecture but these aren't quite the same thing. For example,
'mips-linux-gnu -mips64' means O32 on a MIPS64 which is effectively
MIPS32 and O32 in terms of what the compiler is permitted to use. This is the
main reason I want to separate the textual llvm::Triple from the understanding
of what the llvm::Triple means. I want to be able to take an initial meaning and
mutate it according to the options. This can be done in-place in the
llvm::Triple by string substitution (and that's what -EL and -m64 do) but
that's a rather ugly way to do it. The other reason I want to separate the
two is that it provides a home for triple-related information that is implied by
the triple but isn't textually present such as the default CPU that clang
should use. The separation provides a good point to account for
configure-time/run-time triple customization.
Additionally, Renato wants this separation because interpreting ARM triples is
hard and he needs somewhere to store the interpreted meaning.

Similarly, the backend also believes that Triple::mips implies O32 and
Triple::mips64 implies N64 (N32 isn't represented) but we already have a
plan to sort that out.
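
To make the separation concrete, here is a rough sketch under stated assumptions. MipsMeaning, initialMeaning and applyOption are hypothetical names and not the proposed TargetTuple interface. The initial meaning uses the defaults described above (mips implies O32, mips64 implies N64), and options then mutate the meaning rather than the triple text. The '-mabi=n32' spelling is used only as an illustrative ABI option:

```cpp
#include <string>

// Hypothetical types for illustration; this is not LLVM's API.
enum class MipsABI { O32, N32, N64 };

struct MipsMeaning {
  bool Is64BitCPU;  // the CPU has a MIPS64 architecture
  MipsABI ABI;      // what the compiler is permitted to target
};

// Initial meaning implied by the arch component alone:
// "mips*" -> O32, "mips64*" -> N64 (N32 has no arch spelling).
MipsMeaning initialMeaning(const std::string &Arch) {
  bool Is64 = Arch.compare(0, 6, "mips64") == 0;
  return MipsMeaning{Is64, Is64 ? MipsABI::N64 : MipsABI::O32};
}

// Applies one option to the meaning; the triple string is left alone.
void applyOption(MipsMeaning &M, const std::string &Opt) {
  if (Opt == "-mips64") {
    M.Is64BitCPU = true;   // the CPU changes, the ABI stays as it was
  } else if (Opt == "-mabi=n32") {
    M.ABI = MipsABI::N32;  // an ABI the arch name cannot express
  }
  // Unrecognised options are ignored in this sketch.
}
```

Starting from "mips" and applying "-mips64" leaves the ABI at O32 on a 64-bit CPU, which is the 'mips-linux-gnu -mips64' case, and no string substitution on the triple is involved.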
> Sounds like it was a bug that was fixed. :) That's pretty crazy though.

That's how I see it. The endian mismatch wasn't even the weirdest thing
about it.



Daniel Sanders via llvm-dev

2015-Sep-24 14:46 UTC

[llvm-dev] The Trouble with Triples

If Triple is available then people will likely misuse it. Where sensible,
I'd like it to have boundaries where it stops flowing and hands over to
better representations in TargetMachine/MCTargetMachine/something-else.

> Rationalising the suggestion for cpu: While I fully agree that a mips-
> triple must be big endian default and mipsel- must be little endian there
> is nothing to stop an opposing endian ‘multilib’ in such a toolchain.
> That’s why I suggest that endian should not end up coming from the triple
> (internally in LLVM) for architectures that support both endians.

You're talking about the place to look for the information and not the
origin of that information, right?
The initial value has to originate from the interpretation of the triple (plus
customizations), there's nowhere else to get it.

Renato Golin via llvm-dev

2015-Sep-24 16:59 UTC

[llvm-dev] The Trouble with Triples

On 24 September 2015 at 06:18, Daniel Sanders <Daniel.Sanders at imgtec.com> wrote:
> Additionally, Renato wants this separation because interpreting ARM triples
> is hard and he needs somewhere to store the interpreted meaning.

I have two main problems:

1. Parsing all the numerous options, not just triples, is slow and
should only ever be done once. However we do the passing of
information to the back-end, it has to be in a way that passes and
codegen classes can query to understand about the target. I believe
TargetMachine and friends can do that, we just have to make sure that
*all* knowledge is accessible via those classes.

2. CPU names can change the Arch name and vice-versa, and we had
numerous problems of things disappearing and all moving back to
armv4t. That used to happen because at different stages of the driver,
the string parsing routines would fail and return null or empty
strings, and the rest of the driver couldn't cope with it.

Moving all parsing into one location (TargetParser) was the first
step, but all places that used to do string parsing are still calling
those methods, which means we're not that much better off. If we move
all parsing *usage* into one location and only deal with enum values,
not only will it be a lot faster, but we would also be able to reason
about every convoluted option in fewer, bigger methods, reducing the
opportunity for bad assumptions.
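
As a rough illustration of "parse once, then only deal with enum values", here is a sketch with made-up names. ArchKind, parseArchOnce and canonicalName are hypothetical and are not the real TargetParser interface. The point is that a failed parse yields an explicit Invalid value that callers must handle, instead of an empty string that silently degrades to armv4t further down the driver:

```cpp
#include <string>

// Hypothetical enum for illustration; not the real TargetParser interface.
enum class ArchKind { Invalid, ARMv4T, ARMv7A, ARMv8A };

// The only function that ever looks at the text.
ArchKind parseArchOnce(const std::string &Name) {
  if (Name == "armv4t")
    return ArchKind::ARMv4T;
  if (Name == "armv7-a" || Name == "armv7a")
    return ArchKind::ARMv7A;
  if (Name == "armv8-a" || Name == "armv8a")
    return ArchKind::ARMv8A;
  return ArchKind::Invalid;  // explicit failure, no silent fallback
}

// Later stages take the enum and never re-parse strings.
const char *canonicalName(ArchKind K) {
  switch (K) {
  case ArchKind::ARMv4T:
    return "armv4t";
  case ArchKind::ARMv7A:
    return "armv7-a";
  case ArchKind::ARMv8A:
    return "armv8-a";
  case ArchKind::Invalid:
    break;
  }
  return "invalid";
}
```

A driver built this way would call parseArchOnce at the boundary, reject ArchKind::Invalid there, and pass only the enum to everything downstream.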

We still need the parser in the back-end, though, as assembler options
use the same syntax and generally mean the same thing, so we need to
make it safe to use it to change the TargetMachine while assembling.

None of this goes against any of what you guys are discussing, I
believe, I just wanted to make it clear what my issues were. :)

cheers,
--renato
