Taddeus Kroes via llvm-dev
2017-Aug-02 21:03 UTC
[llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing
Hi Eli,

Thanks, I’ll look into that then!

Cheers,
Taddeüs

From: Friedman, Eli
Sent: Wednesday, 2 August 2017 19:48
To: Taddeus; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing

On 8/2/2017 9:03 AM, Taddeus via llvm-dev wrote:
> Hi all,
>
> I am experiencing a problem with the representation of addresses in
> the x86_64 TableGen backend and was hoping someone can tell me if it
> is fixable. Any comments or hints to send me in the right direction
> would be greatly appreciated. I am using LLVM version 3.8, commit 251286.
>
> I have an IR pass that stores metadata in the upper 32 bits of 64-bit
> pointers in order to implement memory safety. The pass instruments
> loads and stores to do an AND of the address with 0xffffffff to mask
> out that metadata. E.g., when loading a 4-byte value from memory
> pointed to by %rcx, this translates to the following asm:
>     mov %ecx,%ecx   ; zeroes the upper bits, removing the metadata
>     mov (%rcx),%eax
>
> This leads to quite some overhead (12% on SPEC CPU2006) so I am
> looking into possibilities for backend modifications to optimize this.
> The first mov introduces unnecessary extra cycles and the second mov
> has to wait for its result, potentially stalling the pipeline. On top
> of that, it increases register pressure when the original pointer must
> be preserved for later use (e.g. the mask would be "mov %esi,%ecx"
> after which %rcx is dereferenced, instead of just dereferencing %esi).
>
> So, what I would like to generate instead is the following:
>     mov (%ecx),%eax
> I.e., don't do the masking in a separate mov, but by using a
> subregister for the address (which is zero-extended, effectively
> ignoring the metadata bits). As a side note, GCC does emit the second
> snippet as expected.
>
> Looking at the TableGen files I found two problems:
>
> 1. The AND of the address with 0xffffffff is replaced with
>    SUBREG_TO_REG(MOV32rr (EXTRACT_SUBREG ...)) in
>    lib/Target/X86/X86InstrCompiler.td (line 1326). That MOV32rr emits an
>    explicit mov instruction later. I think I need to replace this with
>    (i32 (EXTRACT_SUBREG ...)) to get rid of the mov, but that produces a
>    32-bit value, which leads me to the next, more general problem:
>
> 2. The x86 backend currently does not support dereferencing 32-bit
>    addresses in 64-bit mode. Specifically, addresses are defined as an
>    iPTR type in X86InstrInfo.td, which I assume is expanded to 4 or 8
>    bytes depending on whether 32-bit or 64-bit mode is active:
>      def addr : ComplexPattern<iPTR, 5, "selectAddr", [],
>                                [SDNPWantParent]>;
>    The dereferencing mov instruction looks like this:
>      def MOV32rm : I<0x8B, MRMSrcMem, (outs GR32:$dst), (ins i32mem:$src),
>                      "mov{l}\t{$src, $dst|$dst, $src}",
>                      [(set GR32:$dst, (loadi32 addr:$src))], IIC_MOV_MEM>, OpSize32;
>    So it expects a source address of type 'addr', which is 8 bytes. This
>    leads to the following code being emitted when I apply my solution to
>    problem 1:
>      mov (%rcx),%eax
>    In other words, the upper bits are not ignored.
>
> I am currently not sure where best to solve this problem. Ideally the
> 'addr' type would have a dynamic size, but I don't know how to do
> this. Any ideas on this?

A TableGen pattern can only match one specific type; you'll need a
separate pattern to match a 32-bit address. Yes, this means you'll need
to write your own separate pattern for every load/store instruction, but
there isn't really any way around that.

There are some existing patterns involving MOV32rm, if you want
inspiration; for example, the following pattern is from
X86InstrCompiler.td:

    def : Pat<(extloadi64i32 addr:$src),
              (SUBREG_TO_REG (i64 0), (MOV32rm addr:$src), sub_32bit)>;

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
Linux Foundation Collaborative Project
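For readers following the thread, here is a minimal C++ sketch of the masking that the original message describes: metadata in the upper 32 bits, the address in the lower 32, and an AND with 0xffffffff before each access. All names (tagPointer, stripMetadata, zeroExtendLow32, metadataOf) are hypothetical and are not taken from Taddeus's pass; the sketch only models the arithmetic, on the assumption that every real address fits in 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: metadata in bits 63..32, address in bits 31..0.
constexpr std::uint64_t kAddrMask = 0xffffffffULL;

// Attach 32 bits of metadata to an address that fits in 32 bits.
inline std::uint64_t tagPointer(std::uint64_t addr, std::uint32_t meta) {
  assert((addr & ~kAddrMask) == 0 && "address must fit in the low 32 bits");
  return (static_cast<std::uint64_t>(meta) << 32) | addr;
}

// What the instrumentation's AND does before each load/store
// (the explicit "mov %ecx,%ecx" in the generated code).
inline std::uint64_t stripMetadata(std::uint64_t tagged) {
  return tagged & kAddrMask;
}

// What addressing through a 32-bit register does implicitly: the low
// half is taken and zero-extended to 64 bits.
inline std::uint64_t zeroExtendLow32(std::uint64_t tagged) {
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(tagged));
}

// Recover the metadata from the upper half.
inline std::uint32_t metadataOf(std::uint64_t tagged) {
  return static_cast<std::uint32_t>(tagged >> 32);
}
```

Both stripMetadata and zeroExtendLow32 yield the same value for any input, which is why "mov (%ecx),%eax" could stand in for the two-instruction sequence.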
Craig Topper via llvm-dev
2017-Aug-02 21:17 UTC
[llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing
Getting the instruction to actually use (%ecx) as the address requires
putting a 0x67 prefix on the instruction. I'm not sure how to convince
X86MCCodeEmitter.cpp to do that for you. That is assuming you want to
generate binary and not textual assembly.

~Craig
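To make the 0x67 remark concrete, here is a small C++ sketch that builds the machine-code bytes for the two loads discussed in the thread, restricted to the simplest case (a bare base register, no displacement, no REX prefix). The Reg enum and encodeLoad32 are hypothetical helpers written for this illustration; they do not reflect how X86MCCodeEmitter.cpp is structured.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Low three bits of the x86 register number, as placed in ModRM.
// SP (4) and BP (5) are left out on purpose: as a base with mod=00 they
// select a SIB byte and a disp32/RIP-relative form respectively.
enum Reg : std::uint8_t { AX = 0, CX = 1, DX = 2, BX = 3, SI = 6, DI = 7 };

// Encode "mov (base), dst32" for 64-bit mode.
//   addr32 == false -> base is the 64-bit register, e.g. mov (%rcx),%eax
//   addr32 == true  -> base is the 32-bit register, e.g. mov (%ecx),%eax
std::vector<std::uint8_t> encodeLoad32(Reg base, Reg dst, bool addr32) {
  assert(base < 8 && base != 4 && base != 5 && "unsupported base register");
  assert(dst < 8 && "destination needs a REX prefix, not handled here");

  std::vector<std::uint8_t> bytes;
  if (addr32)
    bytes.push_back(0x67);  // address-size override prefix
  bytes.push_back(0x8B);    // opcode: MOV r32, r/m32
  // ModRM: mod=00 (memory, no displacement), reg=dst, rm=base.
  bytes.push_back(static_cast<std::uint8_t>((dst << 3) | base));
  return bytes;
}
```

With these helpers, encodeLoad32(CX, AX, false) gives the bytes 8B 01 and encodeLoad32(CX, AX, true) gives 67 8B 01, so the 32-bit-address form costs one extra byte but no extra instruction.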
Craig Topper via llvm-dev
2017-Aug-02 21:22 UTC
[llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing
Maybe the code emitter will just work: it detects the register size,
since we have to support hand-written assembly.

~Craig