thr3ads.net - llvm dev - [llvm-dev] [RFC] lld: Dropping TLS relaxations in favor of TLSDESC [Nov 2017]

Rui Ueyama via llvm-dev

2017-Nov-08 04:49 UTC

[llvm-dev] [RFC] lld: Dropping TLS relaxations in favor of TLSDESC

On Tue, Nov 7, 2017 at 8:16 PM, Rafael Avila de Espindola <rafael.espindola at gmail.com> wrote:
> Rui Ueyama <ruiu at google.com> writes:
>
> >> So I am strongly against removing either non-TLSDESC support or support
> >> for the relaxations.
> >>
> >
> > It's still pretty arguable. By default, compilers use the General Dynamic
> > model with -fpic, and Initial Exec without -fpic.
>
> It is more complicated than that. You can get all 4 modes with clang
>
> -------------------------------
> __thread int bar = 42;
> int *foo(void) {  return &bar; }
> -------------------------------
> without -fPIC: local exec.
>
> -------------------------------
> extern __thread int bar;
> int *foo(void) {  return &bar; }
> -------------------------------
> without -fPIC: initial exec.
> with -fPIC: general dynamic
>
> -------------------------------
> __attribute__((visibility("hidden"))) extern __thread int bar;
> int *foo(void) {  return &bar; }
> -------------------------------
> with -fPIC: local dynamic.

The other case is

__attribute__((visibility("hidden"))) extern __thread int bar;
int *foo(void) {  return &bar; }

without -fPIC, which chooses Local Exec.
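The cases above boil down to two questions the compiler can answer on its own: is the code compiled with -fPIC, and is the variable known to be defined in the module being built (a definition in this file, or hidden visibility)? The sketch below encodes only that. It is a simplification assuming plain clang defaults that ignores -fpie, -ftls-model and per-variable `tls_model` attributes, and the enum and function names are hypothetical.

```c
#include <stdbool.h>

/* The four ELF TLS access models, most general first. */
enum tls_model {
    TLS_GENERAL_DYNAMIC, /* __tls_get_addr call for each variable */
    TLS_LOCAL_DYNAMIC,   /* one __tls_get_addr call for the module, then
                            constant offsets within its TLS block */
    TLS_INITIAL_EXEC,    /* offset loaded from the GOT, added to the
                            thread pointer */
    TLS_LOCAL_EXEC       /* offset from the thread pointer is a link-time
                            constant */
};

/* Hypothetical helper summarizing the examples in this thread.
   pic:          compiled with -fPIC
   defined_here: the variable is defined in this translation unit
   hidden:       the declaration has visibility("hidden") */
enum tls_model default_tls_model(bool pic, bool defined_here, bool hidden)
{
    if (pic)
        /* The code may end up in a DSO: a default-visibility variable may be
           preempted, so only a hidden one is known to live in this module,
           which allows the cheaper Local Dynamic. */
        return hidden ? TLS_LOCAL_DYNAMIC : TLS_GENERAL_DYNAMIC;

    /* The code ends up in the executable, whose own TLS block sits at a
       fixed offset from the thread pointer. */
    return (defined_here || hidden) ? TLS_LOCAL_EXEC : TLS_INITIAL_EXEC;
}
```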

>
> > lld doesn't do any relaxation if -shared is given. So, if you are
> > creating a DSO, thread-local variables in the DSO are accessed using the
> > General Dynamic model. No relaxations are involved.
>
> There are not a lot of opportunities there. If one patches one access at
> a time, LD is as expensive as GD. The linker also doesn't know whether the
> .so will be used with dlopen or not, so it cannot relax to IE. I guess a
> linker could have a command line option for the second part.
>
> Now that I spell that out, it is easy to see the big advantage of
> TLSDESC: it can optimize the case the static linker cannot.

Because of this fact, DSOs such as libc that use thread-local variables are
already compiled with -ftls-model=initial-exec. So the authors of DSOs in
which the performance of thread-local variables matters are already aware of
the issue and know how to work around it.
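Rafael's observation above, that TLSDESC can optimize the case the static linker cannot, can be pictured with a small model: the choice between a fast and a slow access path moves to the dynamic loader, which does know whether a module was loaded at startup or later via dlopen(). The sketch below is plain C under simplifying assumptions. `struct tlsdesc` and all function names are hypothetical, and the real ABI (for example on x86-64) calls the resolver with a special calling convention that C cannot express, so this only illustrates the idea.

```c
#include <stddef.h>

/* Hypothetical, simplified TLS descriptor: the dynamic loader fills in a
   resolver and its argument; compiled code just calls the resolver. */
struct tlsdesc {
    ptrdiff_t (*resolver)(const struct tlsdesc *);
    ptrdiff_t arg;
};

/* Fast path, used when the module's TLS block was allocated at startup
   (static TLS): the thread-pointer offset is already known. */
static ptrdiff_t resolve_static(const struct tlsdesc *d)
{
    return d->arg;
}

/* Stand-in for the slow path used for dlopen()ed modules. A real loader
   would find or allocate the module's dynamic TLS block here; this sketch
   only counts the calls and returns arg unchanged. */
int dynamic_resolver_calls = 0;

static ptrdiff_t resolve_dynamic(const struct tlsdesc *d)
{
    dynamic_resolver_calls++;
    return d->arg;
}

/* Done once by the dynamic loader when it processes the relocation. */
void loader_fill_desc(struct tlsdesc *d, int loaded_at_startup,
                      ptrdiff_t offset)
{
    d->resolver = loaded_at_startup ? resolve_static : resolve_dynamic;
    d->arg = offset;
}

/* Done by compiled code on every access: one indirect call. */
ptrdiff_t tls_offset(const struct tlsdesc *d)
{
    return d->resolver(d);
}
```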

> > If you are creating an executable and if your executable is not
> > position-independent, you're using the Initial Exec model by default, which
> > is as fast as variables accessed through the GOT. If you really want to use
> > the Local Exec model, you can pass -ftls-model=local-exec to compilers.
>
> But then all the used variables have to be defined in the same
> executable. You can't have even one from a shared library (think errno).
>
Not really -- you can still use Local Exec on a per-variable basis using the
visibility attribute. I don't think we could observe a noticeable
difference in performance between Initial Exec and Local Exec except in a
synthetic benchmark, though.
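A minimal sketch of the per-variable approach Rui describes, assuming a non-PIC executable: the extern declaration (as it might appear in a header shared by several source files) carries hidden visibility, which tells the compiler the definition is in the same executable, so Local Exec can be used even from other translation units. Variables that really come from a shared library keep their default model. The variable and function names are hypothetical.

```c
/* Declaration, e.g. in a header included by many translation units.
   visibility("hidden") promises the definition lives in this module, so
   in a non-PIC executable every access can use Local Exec. */
__attribute__((visibility("hidden"))) extern __thread int app_depth;

/* The single definition, in one translation unit. */
__attribute__((visibility("hidden"))) __thread int app_depth;

int app_enter(void)
{
    return ++app_depth;
}

int app_leave(void)
{
    return --app_depth;
}
```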

> The nice thing about linker relaxations is that they are very user
> friendly. The linker is the first point in the toolchain where some
> useful fact is known, and it can optimize the result with no user
> intervention.

I think I agree with this point. Automatic linker code relaxation is
convenient and if it makes a difference, we should implement that. But I
doubt that TLS relaxation is actually effective. George implemented them
because there's a spec defining how to relax them, and I accepted the
patches without thinking hard enough, but I didn't see a convincing
benchmark result (or even a non-convincing one) that shows that these
relaxations actually make real-world programs faster. Do you know of
any? It is funny that even the creator of TLSDESC found that their
optimization didn't actually make NPTL faster, as mentioned in the
"Conclusion" section of
http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt.

So I don't think I'm proposing that we simplify our code by degrading users'
code. It feels more like we are putting too much effort into something that
doesn't produce any measurable difference in real life.
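To make the disputed optimization concrete, here is a simplified model of what link-time TLS relaxation decides for a single access, following the rules discussed in this thread: nothing is relaxed when linking with -shared; in an executable, an access to a variable defined in the executable becomes Local Exec, and a General Dynamic access to a variable from a shared library becomes Initial Exec. This is an illustration under those assumptions, not lld's actual code; it ignores TLSDESC and per-architecture details, and the names are hypothetical.

```c
#include <stdbool.h>

enum tls_model { TLS_GD, TLS_LD, TLS_IE, TLS_LE };

/* Hypothetical model of link-time TLS relaxation for one access.
   from_compiler:  model chosen by the compiler for this access
   shared:         the link uses -shared
   defined_in_exe: the symbol resolves to a definition in the executable */
enum tls_model relax_tls(enum tls_model from_compiler, bool shared,
                         bool defined_in_exe)
{
    /* A DSO may be dlopen()ed, so its TLS offsets are unknown: keep the
       compiler's choice. */
    if (shared)
        return from_compiler;

    /* The executable's own TLS block has a fixed thread-pointer offset. */
    if (defined_in_exe)
        return TLS_LE;

    /* The symbol lives in a DSO loaded at startup, so its offset is fixed
       at load time and can be read from the GOT. */
    return from_compiler == TLS_GD ? TLS_IE : from_compiler;
}
```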

> > So I don't see a strong reason to do complicated instruction rewriting in
> > the linker. I'd rather have the linker do whatever it is instructed to do
> > by command line options and input object files. You are, for example,
> > free to pass the -fPIC option to create object files and still let the
> > linker create a non-PIC executable, even though such combinations don't
> > make much sense and produce slightly inefficient binaries. If you don't
> > like it, you can fix the compiler options. Thread-local variables can be
> > considered in the same way, no?
>
> They are considered in the same way: we also relax GOT accesses :-)
>
> The proposal is making the linker worse for our users to make our lives
> easier. I really don't think we should do it.
>
> It is likely that we can code the existing optimization in a simpler
> way. Even if we cannot, I don't think we should remove them.
>
> Linker relaxations are extremely convenient. We use the example you
> gave (-fPIC .o in an executable) all the time in llvm. That way we build
> only one .o that is used in lib/ and bin/.
>
> Linker relaxations are also fundamental to how RISC-V works.
>
> Cheers,
> Rafael

Rafael Avila de Espindola via llvm-dev

2017-Nov-08 17:33 UTC

[llvm-dev] [RFC] lld: Dropping TLS relaxations in favor of TLSDESC

Rui Ueyama <ruiu at google.com> writes:
>> > If you are creating an executable and if your executable is not
>> > position-independent, you're using the Initial Exec model by default, which
>> > is as fast as variables accessed through the GOT. If you really want to use
>> > the Local Exec model, you can pass -ftls-model=local-exec to compilers.
>>
>> But then all the used variables have to be defined in the same
>> executable. You can't have even one from a shared library (think errno).
>>
>
> Not really -- you can still use Local Exec on a per-variable basis using the
> visibility attribute. I don't think we could observe a noticeable
> difference in performance between Initial Exec and Local Exec except in a
> synthetic benchmark, though.

There is nothing that the linker can do that the compiler could not have
done in the first place. The point is that to switch to lld and keep
performance, users should not have to annotate all TLS variables with
tls-model.
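For reference, this is roughly what such an annotation looks like: besides the global -ftls-model option, GCC and Clang accept a per-variable `tls_model` attribute. A minimal sketch, assuming the file is built into a shared library with -fPIC; the variable and function names are made up. The usual caveat for Initial Exec in a DSO applies: its TLS must fit in the static TLS space reserved at program startup, so loading such a library later with dlopen() can fail.

```c
/* Intended to be built into a shared object with -fPIC. Without the
   attribute, the access below would use General Dynamic; with it, the
   compiler emits an Initial Exec sequence (load the offset from the GOT
   and add the thread pointer). */
__attribute__((tls_model("initial-exec"))) __thread int lib_counter;

int lib_next(void)
{
    return ++lib_counter;
}
```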

>> The nice thing about linker relaxations is that they are very user
>> friendly. The linker is the first point in the toolchain where some
>> useful fact is known, and it can optimize the result with no user
>> intervention.
>
> I think I agree with this point. Automatic linker code relaxation is
> convenient and if it makes a difference, we should implement that. But I
> doubt that TLS relaxation is actually effective. George implemented them
> because there's a spec defining how to relax them, and I accepted the
> patches without thinking hard enough, but I didn't see a convincing
> benchmark result (or even a non-convincing one) that shows that these
> relaxations actually make real-world programs faster. Do you know of
> any? It is funny that even the creator of TLSDESC found that their
> optimization didn't actually make NPTL faster, as mentioned in the
> "Conclusion" section of
> http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt.
>
> So I don't think I'm proposing that we simplify our code by degrading users'
> code. It feels more like we are putting too much effort into something that
> doesn't produce any measurable difference in real life.

*PLEASE* let us keep it. It is bad enough that we are regressing
performance in the name of having code that you find nicer. It would be
really annoying to see us drop a working feature just to reduce our
code a bit.

The code is working, please let it be!

At the very least we should keep it until we are in a position to
actually measure it. As is this is just guesswork. We would need a
*much* bigger adoption before we could measure this.

Cheers,
Rafael

Rui Ueyama via llvm-dev

2017-Nov-08 18:55 UTC

head link

[llvm-dev] [RFC] lld: Dropping TLS relaxations in favor of TLSDESC

On Wed, Nov 8, 2017 at 9:33 AM, Rafael Avila de Espindola <rafael.espindola at gmail.com> wrote:
> *PLEASE* let us keep it. It is bad enough that we are regressing
> performance in the name of having code that you find nicer. It would be
> really annoying to see us drop a working feature just to reduce our
> code a bit.
>
Please take it easy. :) I'm not saying that I'm going to remove it.
Instead, I'm bringing a (possibly crazy) idea to the table for discussion, and
that is IMO a good thing. Part of the reason lld is successful is its
relatively radical design choices, such as Windows-ish library semantics,
which might have seemed a somewhat crazy idea. So, I think "stop, think and
re-evaluate what has traditionally been done" is what we are good at,
regardless of the conclusion of the assessment. And as you know we
(including you) have been making reasonable decisions on technical design
choices.

> The code is working, please let it be!
>
> At the very least we should keep it until we are in a position to
> actually measure it. As is this is just guesswork. We would need a
> *much* bigger adoption before we could measure this.
>
> Cheers,
> Rafael