Hi!

LLVM currently does not implement the implicit TLS model on Windows. This model is easy:

- a thread local variable ends up in the .tls section
- to access a thread local variable, you have to do
  (1) load the pointer to thread local storage from the TEB.
      On x86_64, this is gs:0x58, on x86 it is fs:0x2C.
  (2) load the pointer to the thread local state. In general, the index is stored in the variable _tls_index. For a .exe, _tls_index is always 0.
  (3) load the offset of the variable from the start of the .tls section.
  (4) the thread local variable can now be accessed by adding the results of steps 2 and 3.

For x86_64, something like the following should be generated for the tls1.ll test case:

(1) mov rdx, qword [gs:abs 58H]
(2) mov ecx, dword [rel _tls_index]
    mov rcx, qword [rdx+rcx*8]
(3) mov eax, .tls$:i
(4) mov eax, dword [rax+rcx]
    ret

(See the PECOFF spec, chapter 5.7 and http://www.nynaeve.net/?p=185 for reference.)

I tried to implement this. With the attached patch tls1.patch, a thread local variable ends up in the .tls section. This looks fine.
With the second patch tls2.patch I try to implement the code sequence. Here I have a couple of questions:

- To get the offset from the start of the .tls section, I have created a new MachineOperand flag. Is this the right approach? If yes, then I need a hint where to implement this in the WinCOFFObjectWriter.
- How can I code the load of the variable _tls_index in SelectionDAG? I have some trouble using DAG.getExternalSymbol and then loading the value.

Thanks for your help.

Kai

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: tls2.patch
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111204/2878ec05/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: tls1.patch
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111204/2878ec05/attachment-0001.ksh>
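For readers who prefer C to assembly, the four-step lookup described in this message can be modeled roughly as below. This is only a sketch under stated assumptions: `fake_teb` and `tls_address` are hypothetical names introduced for illustration, and the struct merely stands in for the single TEB field that the real code reads through gs:0x58 (x86_64) or fs:0x2C (x86). Nothing here touches the real TEB.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the one TEB field this sketch needs.
   The real field is read via gs:0x58 on x86_64 and fs:0x2C on x86. */
typedef struct {
    void **thread_local_storage_pointer; /* one slot per module */
} fake_teb;

/* Model of steps (1)-(4): TEB -> per-module TLS block -> block + offset. */
static void *tls_address(const fake_teb *teb, uint32_t tls_index,
                         size_t section_offset)
{
    /* Step 1: load the pointer to the thread local storage array. */
    void **tls_array = teb->thread_local_storage_pointer;

    /* Step 2: index it with _tls_index to get this module's TLS block. */
    unsigned char *tls_block = tls_array[tls_index];

    /* Steps 3 and 4: add the variable's offset from the start of .tls. */
    return tls_block + section_offset;
}
```

With two fake TLS blocks in an array, `tls_address(&teb, 0, 2 * sizeof(int))` yields the address of the third `int` in the first block, which mirrors the `[rdx+rcx*8]` load followed by the `[rax+rcx]` access in the assembly above.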
Michael Spencer
2011-Dec-06 00:22 UTC
[LLVMdev] Implement implicit TLS on Windows - need advice
On Sun, Dec 4, 2011 at 9:18 AM, Kai <kai at redstar.de> wrote:
> Hi!
>
> LLVM currently does not implement the implicit TLS model on Windows. This
> model is easy:
>
> - a thread local variable ends up in the .tls section
> - to access a thread local variable, you have to do
>   (1) load pointer to thread local storage from TEB
>       On x86_64, this is gs:0x58, on x86 it is fs:0x2C.
>   (2) load pointer to thread local state. In general, the index is stored in
>       variable _tls_index. For a .exe, _tls_index is always 0.
>   (3) load offset of variable to start of .tls section.
>   (4) the thread local variable can now be accessed with the add of step 2 and
>       3.
>
> For x86_64, something like the following should be generated for the tls1.ll
> test case:
>
> (1) mov rdx, qword [gs:abs 58H]
> (2) mov ecx, dword [rel _tls_index]
>     mov rcx, qword [rdx+rcx*8]
> (3) mov eax, .tls$:i
> (4) mov eax, dword [rax+rcx]
>     ret
>
> (See the PECOFF spec, chapter 5.7 and http://www.nynaeve.net/?p=185 for
> reference.)
>
> I tried to implement this. With the attached patch tls1.patch, a thread
> local variable ends up in the .tls section. This looks fine.
> With the second patch tls2.patch I try to implement the code sequence. Here
> I have a couple of questions:
>
> - To get the offset to start of .tls section, I have created a new
> MachineOperand flag. Is this the right approach? If yes then I need a hint
> where to implement this in the WinCOFFObjectWriter.
> - How can I code the load of variable _tls_index in SelectionDAG? I have
> some trouble using DAG.getExternalSymbol and then loading the value.
>
> Thanks for your help.
>
> Kai

Thanks for working on this!

The first patch looks fine except that it's emitting to .tls when it should be .tls$. Also, you need to add tests.

As for the second patch, that's not how MSVC 2010 emits code (and it needs tests).

thread_local.c:

#ifdef _MSC_VER
#define __thread __declspec(thread)
#endif
__thread int i = 0;

int foo() {
  return i++;
}

thread_local.asm:

PUBLIC	_i
_TLS	SEGMENT
_i	DD	00H
_TLS	ENDS
PUBLIC	_foo
EXTRN	__tls_array:DWORD
EXTRN	__tls_index:DWORD
; Function compile flags: /Ogtpy
_TEXT	SEGMENT
_foo	PROC
; File c:\users\mspencer\projects\llvm-project\test\thread_local.c
; Line 7
	mov	eax, DWORD PTR __tls_index
	mov	ecx, DWORD PTR fs:__tls_array
	mov	ecx, DWORD PTR [ecx+eax*4]
	mov	eax, DWORD PTR _i[ecx]
	lea	edx, DWORD PTR [eax+1]
	mov	DWORD PTR _i[ecx], edx
; Line 8
	ret	0
_foo	ENDP
_TEXT	ENDS
END

llvm-objdump -d -r -s thread_local.obj:

Disassembly of section .text:
_foo:
       0:	a1 00 00 00 00          	movl	0, %eax
			1: IMAGE_REL_I386_DIR32	__tls_index
       5:	64 8b 0d 00 00 00 00    	movl	%fs:0, %ecx
			8: IMAGE_REL_I386_DIR32	__tls_array
       c:	8b 0c 81                	movl	(%ecx,%eax,4), %ecx
       f:	8b 81 00 00 00 00       	movl	(%ecx), %eax
			11: IMAGE_REL_I386_SECREL	_i
      15:	8d 50 01                	leal	1(%eax), %edx
      18:	89 91 00 00 00 00       	movl	%edx, (%ecx)
			1a: IMAGE_REL_I386_SECREL	_i
      1e:	c3                      	ret

Contents of section .tls$:
 0000 00000000                             ....
Contents of section .text:
 0000 a1000000 00648b0d 00000000 8b0c818b  .....d..........
 0010 81000000 008d5001 89910000 0000c3    ......P........

- Michael Spencer
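
The two IMAGE_REL_I386_SECREL entries in the dump above are what make the `_i[ecx]` accesses work: each asks for the 32-bit displacement field to be filled with the offset of `_i` from the beginning of its section. As a rough sketch of what resolving such a fixup amounts to, the C below patches a little-endian 32-bit field in a byte buffer. `read_le32`, `write_le32` and `apply_secrel32` are hypothetical helpers written for this illustration; they are not functions from LLVM or from any linker, and they assume the caller already knows the symbol's offset within the final section.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value byte by byte (no alignment assumed). */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit little-endian value byte by byte. */
static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Hypothetical helper: resolve one SECREL-style fixup by adding the
   symbol's offset within its section to the addend already stored in
   the instruction's 32-bit displacement field. */
static void apply_secrel32(uint8_t *section_data, size_t fixup_offset,
                           uint32_t symbol_section_offset)
{
    uint32_t addend = read_le32(section_data + fixup_offset);
    write_le32(section_data + fixup_offset, addend + symbol_section_offset);
}
```

Applied to the .text bytes above at offsets 0x11 and 0x1a, an offset of 0 leaves the displacement as `00 00 00 00`, which matches `_i` sitting at offset 0000 of the .tls$ contents in the dump. If `_i` were instead placed 8 bytes into the section (an assumed value, not one taken from this thread), the field would become `08 00 00 00`.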
Hi Michael!

Thanks for your answer. I got a step further - I can now generate some code which does not look too bad. And yes - I am aware that test cases are still missing.

Thanks for pointing out that the 32-bit code is a bit different from the 64-bit code. I have a real use case for the 64-bit code, so this is my first target. I added an assert for the 32-bit case. I also changed the name of the section to .tls$.

I still have some questions:

1) In WinCOFFObjectWriter::RecordRelocation I check for the new MCSymbolRefExpr::VK_SECREL. Is this the right approach, or would it be better to create a new fixup kind?
2) Is there a way to lower the code so that an expression like rax+8*rbx is generated by default?

Thank you!

Kai

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: tls.diff
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111208/ff268844/attachment.ksh>