Rui Ueyama via llvm-dev
2017-Nov-08 02:27 UTC
[llvm-dev] [RFC] lld: Dropping TLS relaxations in favor of TLSDESC
tl;dr: TLSDESC has solved most problems in formerly inefficient TLS access models, so I think we can drop TLS relaxation support from lld.

lld's code to handle relocations is a mess; the code consists of a lot of cascading "if"s and needs a lot of prior knowledge to understand what it is doing. Honestly it is head-scratching and needs serious refactoring. I'm trying to simplify it to make it manageable again, and I'm now focusing on the TLS relaxations.

Thread-local variables in ELF are complicated. The ELF TLS specification [1] defines 4 different access models: General Dynamic, Local Dynamic, Initial Exec and Local Exec.

I'm not going into the details of the spec here, but the reason why we have so many different models for the same feature is that they differed in speed, and we have to use (formerly) slow models when we know less about the run-time memory layout at compile-time or link-time. So, there was a trade-off between generality and performance. For example, if you want to use thread-local variables in a dlopen(2)'able DSO, you need to choose the slowest model. If a linker knows at link-time that a more restricted access model is applicable (e.g. if it is linking a main executable, it knows for sure that it is not creating a DSO that will be used via dlopen), the linker is allowed to rewrite the instructions that load thread-local variables to use a faster access model.

What makes the situation more complicated is the presence of a new method of accessing thread-local variables. After the ELF TLS spec was defined, TLSDESC [2] was proposed and implemented. With that method, the General Dynamic and Local Dynamic models (which were pretty slow in the original spec) are as fast as the much faster Initial Exec model. TLSDESC doesn't have a trade-off between dlopen'ability and access speed. According to [2], it also reduces the size of generated DSOs. So it seems like TLSDESC is strictly a better way of accessing thread-local variables than the old way, and the thread-local variable performance problem (which the ELF TLS spec was trying to address by defining four different access models and relaxations between them) doesn't seem to be a real issue anymore.

lld supports all TLS relaxations as defined by the ELF TLS spec. I accepted the patches to implement all these features without thinking hard enough about it, but on second thought, that was likely a wrong decision. Being a new linker, we don't need to trace the history of the evolution of the ELF spec. Instead, we should have implemented whatever makes sense now.

So, I'd like to propose we drop TLS relaxations from lld, including Initial Exec → Local Exec. Dropping IE→LE is strictly speaking a degradation, but I don't think that is important. We don't have optimizations for much more frequent variable access patterns such as locally-accessed variables that have GOT slots (where in theory we can skip the GOT access because GOT slot values are known at link-time), so it is odd that we are only serious about TLS variables, which are usually much less important. Even if it turns out that we want it after implementing more important relaxations, I'd like to drop it for now and reimplement it in a different way later.

This should greatly simplify the code because it not only reduces the complexity and amount of the existing code, but also reduces the amount of knowledge you need to have to read the code, without sacrificing performance of lld-generated files in practice.

Thoughts?

[1] https://www.akkadia.org/drepper/tls.pdf
[2] http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt
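For readers who want the relaxation rules from the message above in one place, here is a minimal Python sketch of the decision a linker could make. It is not lld's code: the `TlsModel` enum, the `relax` function and its two boolean parameters are hypothetical names chosen for illustration, and the rules are a simplified reading of the ELF TLS spec [1] (no rewriting for DSOs; in an executable, a variable that cannot be preempted can use Local Exec, otherwise Initial Exec is the best available).

```python
from enum import Enum


class TlsModel(Enum):
    GENERAL_DYNAMIC = "GD"
    LOCAL_DYNAMIC = "LD"
    INITIAL_EXEC = "IE"
    LOCAL_EXEC = "LE"


def relax(model: TlsModel, is_shared: bool, is_preemptible: bool) -> TlsModel:
    """Return the access model a linker could rewrite `model` to.

    is_shared:      True when the output is a DSO (hypothetical flag).
    is_preemptible: True when the variable may be defined by another
                    module at run time (hypothetical flag).
    """
    # A DSO may be dlopen'ed later, so this sketch never rewrites its code.
    if is_shared:
        return model
    # Local Dynamic only refers to variables of the module being linked,
    # so in an executable their offsets are known at link time.
    if model is TlsModel.LOCAL_DYNAMIC:
        return TlsModel.LOCAL_EXEC
    # GD and IE accesses can use Local Exec when the variable is defined in
    # the executable itself; otherwise Initial Exec is the fastest option.
    if model in (TlsModel.GENERAL_DYNAMIC, TlsModel.INITIAL_EXEC):
        return TlsModel.INITIAL_EXEC if is_preemptible else TlsModel.LOCAL_EXEC
    # Local Exec is already the most restricted model.
    return model
```

Dropping relaxations, as proposed, would correspond to `relax` returning `model` unchanged in every case.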
Rafael Avila de Espindola via llvm-dev
2017-Nov-08 02:59 UTC
[llvm-dev] [RFC] lld: Dropping TLS relaxations in favor of TLSDESC
Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org> writes:

> tl;dr: TLSDESC has solved most problems in formerly inefficient TLS access
> models, so I think we can drop TLS relaxation support from lld.
> [...]
> Thoughts?

I don't think we can do it.

The main thing we have to keep in mind is that not everyone is using TLSDESC. In fact, clang doesn't even support -mtls-dialect=gnu2.

If everyone switches to TLSDESC, then I am OK with dropping optimizations for the old model.

But even with TLSDESC we still need linker relaxations. The TLSDESC idea solves some of the GD -> IE cost in the case where the .so is not dlopened, but that is it. Note that AArch64, which is TLSDESC-only, has relaxations.

So I am strongly against removing either non-TLSDESC support or support for the relaxations.

Cheers,
Rafael
Rui Ueyama via llvm-dev
2017-Nov-08 03:39 UTC
[llvm-dev] [RFC] lld: Dropping TLS relaxations in favor of TLSDESC
On Tue, Nov 7, 2017 at 6:59 PM, Rafael Avila de Espindola <rafael.espindola at gmail.com> wrote:

> Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org> writes:
>
> > tl;dr: TLSDESC has solved most problems in formerly inefficient TLS
> > access models, so I think we can drop TLS relaxation support from lld.
> > [...]
> > Thoughts?
>
> I don't think we can do it.
>
> The main thing we have to keep in mind is that not everyone is using
> TLSDESC. In fact, clang doesn't even support -mtls-dialect=gnu2.

Oh, okay, that is a surprise to me. There's no reason not to support that and make it the default; I hadn't even tried it. We definitely should support that.

> If everyone switches to TLSDESC, then I am OK with dropping
> optimizations for the old model.
>
> But even with TLSDESC we still need linker relaxations. The TLSDESC idea
> solves some of the GD -> IE cost in the case where the .so is not
> dlopened, but that is it. Note that AArch64, which is TLSDESC-only, has
> relaxations.
>
> So I am strongly against removing either non-TLSDESC support or support
> for the relaxations.

It's still pretty arguable. By default, compilers use the General Dynamic model with -fpic, and Initial Exec without -fpic. lld doesn't do any relaxation if -shared is given. So, if you are creating a DSO, thread-local variables in the DSO are accessed using the General Dynamic model. No relaxations are involved.

If you are creating an executable and your executable is not position-independent, you're using the Initial Exec model by default, which is as fast as variables accessed through the GOT. If you really want to use the Local Exec model, you can pass -ftls-model=local-exec to compilers.

If you are creating a position-independent executable and you want to use Initial Exec or Local Exec, you can do that by passing -ftls-model={initial-exec,local-exec} to compilers.

So I don't see a strong reason to do complicated instruction rewriting in the linker. I feel more like we should do whatever we are instructed to do by command line options and input object files. You are, for example, free to pass the -fPIC option to create object files and still let the linker create a non-PIC executable, even though that combination doesn't make much sense and produces a slightly inefficient binary. If you don't like it, you can fix the compiler options. Thread-local variables can be considered in the same way, no?
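The flag-to-model mapping described in the message above can be written down as a small Python sketch. It only encodes what the message states (General Dynamic with -fpic, Initial Exec without, and -ftls-model as an explicit override); the function name `default_tls_model` and its parameters are hypothetical, and real compilers also take symbol binding and visibility into account, which this sketch ignores.

```python
from typing import Optional

# Values accepted by the -ftls-model option.
VALID_MODELS = ("global-dynamic", "local-dynamic", "initial-exec", "local-exec")


def default_tls_model(pic: bool, tls_model_flag: Optional[str] = None) -> str:
    """Pick the TLS access model as described in the message above.

    pic:            True when compiling with -fpic/-fPIC.
    tls_model_flag: value of -ftls-model=..., or None if not given.
    """
    # An explicit -ftls-model always wins over the default.
    if tls_model_flag is not None:
        if tls_model_flag not in VALID_MODELS:
            raise ValueError(f"unknown TLS model: {tls_model_flag}")
        return tls_model_flag
    # Simplified default from the message: GD for PIC code, IE otherwise.
    return "global-dynamic" if pic else "initial-exec"
```

Under the proposal, the model selected here at compile time is what ends up in the output, since the linker would no longer rewrite it.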
Mark Kettenis via llvm-dev
2017-Nov-08 09:05 UTC
[llvm-dev] [RFC] lld: Dropping TLS relaxations in favor of TLSDESC
> Date: Tue, 7 Nov 2017 18:27:37 -0800
> From: Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org>
>
> tl;dr: TLSDESC has solved most problems in formerly inefficient TLS access
> models, so I think we can drop TLS relaxation support from lld.
> [...]
> Thoughts?

Not sure what the impact of this would be. Does this mean that some TLS relocations will no longer be supported? Or is it that they just won't be optimized? How about static binaries? Don't they rely on the local exec model?

Does this affect linking code generated by older compilers (say GCC 4.2.1) in any way?
Joerg Sonnenberger via llvm-dev
2017-Nov-08 12:53 UTC
[llvm-dev] [RFC] lld: Dropping TLS relaxations in favor of TLSDESC
On Tue, Nov 07, 2017 at 06:27:37PM -0800, Rui Ueyama via llvm-dev wrote:

> tl;dr: TLSDESC has solved most problems in formerly inefficient TLS access
> models, so I think we can drop TLS relaxation support from lld.

I've skipped over the description and I have some difficulty sharing this conclusion. I don't see how it makes any significant difference. I also don't know if any system beyond glibc implements it.

Side note: position independent executables that are properly compiled behave like non-position independent executables.

Side note 2: I strongly question the assertions about frequency of dlopen vs direct linking from the TLSDESC paper. Quite a few hacks on the dynamic linker side are a direct result of people wanting to dlopen libGL from scripting languages like Python.

Joerg
Peter Smith via llvm-dev
2017-Nov-08 14:10 UTC
[llvm-dev] [RFC] lld: Dropping TLS relaxations in favor of TLSDESC
I'm assuming it means the instruction sequences wouldn't be optimized; I don't think it would be practical to remove support for the relocations.

For Arm (and, I think, Mips is similar) there isn't any TLS relaxation of instructions, as the TLS relocations act on data and not instructions. There are some cases where dynamic relocations can be omitted: for example, the module-id of an executable is defined to be 1, so there is no need for the dynamic linker to fill this in. For static linking the linker knows the module and the offsets of all the TLS symbols, so it can resolve all the dynamic relocations. I don't know off the top of my head whether this would apply to other architectures, although I think the general principles should hold the same. The last time I looked, relaxation was the technique used to support static linking on targets other than Arm and Mips.

I have a vague memory of the OpenGL folk being sensitive to TLS performance, particularly as the library is often shared. I think that TLS relaxation isn't going to show up in many traditional benchmarking suites, as much of the performance-critical code is going to be in the application, which is unlikely to have much TLS in it. I'm thinking that it would need something like a real-world application that makes heavy use of shared libraries with TLS (games, web browsers or perhaps HPC?). Given that getting convincing data either way about the impact of TLS relaxation could be difficult, we should err towards keeping it.

Peter

On 8 November 2017 at 09:05, Mark Kettenis via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> [...]
> Not sure what the impact of this would be. Does this mean that some
> TLS relocations will no longer be supported? Or is it that they just
> won't be optimized? How about static binaries? Don't they rely on
> the local exec model?
>
> Does this affect linking code generated by older compilers (say GCC
> 4.2.1) in any way?
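The static-linking point in the message above (module ID fixed at 1, all TLS offsets known to the linker) can be illustrated with a short Python sketch. The function `resolve_gd_entries`, the symbol names and the offsets are all hypothetical; this is a toy model of filling in (module ID, offset) pairs at link time instead of leaving dynamic relocations, not a description of any real linker's data structures.

```python
from typing import Dict, Tuple

# Module ID of the main executable, as stated in the message above.
EXECUTABLE_MODULE_ID = 1


def resolve_gd_entries(tls_offsets: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return a (module ID, offset) pair for each TLS symbol.

    tls_offsets maps a symbol name to its offset within the executable's
    TLS block. In a static link both values are known, so no dynamic
    relocation has to be left for the dynamic linker.
    """
    return {
        name: (EXECUTABLE_MODULE_ID, offset)
        for name, offset in tls_offsets.items()
    }


if __name__ == "__main__":
    # Hypothetical symbols and offsets, for illustration only.
    print(resolve_gd_entries({"errno_slot": 0, "thread_cache": 8}))
```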
Rafael Avila de Espindola via llvm-dev
2017-Nov-08 17:38 UTC
[llvm-dev] [RFC] lld: Dropping TLS relaxations in favor of TLSDESC
Joerg Sonnenberger via llvm-dev <llvm-dev at lists.llvm.org> writes:

> On Tue, Nov 07, 2017 at 06:27:37PM -0800, Rui Ueyama via llvm-dev wrote:
>> tl;dr: TLSDESC has solved most problems in formerly inefficient TLS access
>> models, so I think we can drop TLS relaxation support from lld.
>
> I've skipped over the description and I have some difficulty sharing
> this conclusion. I don't see how it makes any significant difference. I
> also don't know if any system beyond glibc implements it.

musl implements it.

Cheers,
Rafael