thr3ads.net - llvm dev - [llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing [Aug 2017]

Taddeus Kroes via llvm-dev

2017-Aug-02 21:03 UTC

[llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing

Hi Eli,
Thanks, I’ll look into that then!

Cheers,
Taddeüs

From: Friedman, Eli
Sent: Wednesday, 2 August 2017 19:48
To: Taddeus; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing

On 8/2/2017 9:03 AM, Taddeus via llvm-dev wrote:
> Hi all,
>
> I am experiencing a problem with the representation of addresses in
> the x86_64 TableGen backend and was hoping someone can tell me if it
> is fixable. Any comments or hints to send me in the right direction
> would be greatly appreciated. I am using LLVM version 3.8, commit 251286.
>
>
> I have an IR pass that stores metadata in the upper 32 bits of 64-bit 
> pointers in order to implement memory safety. The pass instruments 
> loads and stores to do an AND of the address with 0xffffffff to mask 
> out that metadata. E.g., when loading a 4-byte value from memory
> pointed to by %rcx, this translates to the following asm:
>     mov    %ecx,%ecx   ; zeroes the upper bits, removing the metadata
>     mov    (%rcx),%eax
>
> This leads to quite some overhead (12% on SPEC CPU2006) so I am 
> looking into possibilities for backend modifications to optimize this. 
> The first mov introduces unnecessary extra cycles and the second mov 
> has to wait for its results, potentially stalling the pipeline. On top 
> of that, it increases register pressure when the original pointer must
> be preserved for later use (e.g. the mask would be "mov %esi,%ecx",
> after which %rcx is dereferenced, instead of just dereferencing %esi).
>
> So, what I would like to generate instead is the following:
>     mov    (%ecx),%eax
> I.e., don't do the masking in a separate mov, but by using a 
> subregister for the address (which is zero-extended, effectively 
> ignoring the metadata bits). As a side note, GCC does emit the second 
> snippet as expected.
>
>
> Looking at the TableGen files I found two problems:
>
> 1. The AND of the address with 0xffffffff is replaced with 
> SUBREG_TO_REG(MOV32rr (EXTRACT_SUBREG ...)) in 
> lib/Target/X86/X86InstrCompiler.td (line 1326). That MOV32rr emits an 
> explicit mov instruction later. I think I need to replace this with 
> (i32 (EXTRACT_SUBREG ...)) to get rid of the mov, but that produces a 
> 32-bit value, which leads me to the next, more general problem:
>
> 2. The x86 backend currently does not support dereferencing 32-bit
> addresses in 64-bit mode. Specifically, addresses are defined as an
> iPTR type in X86InstrInfo.td, which I assume is expanded to 4 or 8
> bytes depending on whether 32-bit or 64-bit mode is active:
>     def addr : ComplexPattern<iPTR, 5, "selectAddr", [], [SDNPWantParent]>;
> The dereferencing mov instruction looks like this:
>    def MOV32rm : I<0x8B, MRMSrcMem, (outs GR32:$dst), (ins i32mem:$src),
>         "mov{l}\t{$src, $dst|$dst, $src}",
>         [(set GR32:$dst, (loadi32 addr:$src))], IIC_MOV_MEM>, OpSize32;
> So it expects a source address of type 'addr' which is 8 bytes. This
> leads to the following code being emitted when I apply my solution to
> problem 1:
>      mov    (%rcx),%eax
> In other words, the upper bits are not ignored.
>
>
> I am currently not sure what is the best place to solve this problem.
> The best would be to give the 'addr' type a dynamic size but I don't
> know how to do this. Any ideas on this?

A TableGen pattern can only match one specific type; you'll need a 
separate pattern to match a 32-bit address.  Yes, this means you'll need 
to write your own separate pattern for every load/store instruction, but 
there isn't really any way around that.

There are some existing patterns involving MOV32rm, if you want 
inspiration; for example, the following pattern is from X86InstrCompiler.td:

def : Pat<(extloadi64i32 addr:$src),
           (SUBREG_TO_REG (i64 0), (MOV32rm addr:$src), sub_32bit)>;

-Eli
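
For readers following the thread, the masking scheme from the original question can be modelled in a few lines of Python. This is only an illustrative sketch, not code from the pass or from LLVM: pointers are plain integers, and the helper names (tag_pointer, mask_pointer, effective_address_32) are made up here.

```python
MASK32 = 0xFFFFFFFF  # the low 32 bits hold the real address


def tag_pointer(addr: int, metadata: int) -> int:
    """Pack 32-bit metadata into the upper half of a 64-bit pointer value."""
    if not (0 <= addr <= MASK32 and 0 <= metadata <= MASK32):
        raise ValueError("addr and metadata must each fit in 32 bits")
    return (metadata << 32) | addr


def mask_pointer(tagged: int) -> int:
    """Model of the instrumented AND with 0xffffffff (the `mov %ecx,%ecx`)."""
    return tagged & MASK32


def effective_address_32(base: int, disp: int = 0) -> int:
    """Model of a 32-bit address-size access such as `mov (%ecx),%eax`.

    The address is computed modulo 2**32 and then zero-extended to 64 bits.
    """
    return (base + disp) & MASK32


if __name__ == "__main__":
    p = tag_pointer(0x00401000, 0xDEADBEEF)
    print(hex(p))
    print(hex(mask_pointer(p)))
    print(mask_pointer(p) == effective_address_32(p))
```

For a base-only operand the two models agree, which is why `mov (%ecx),%eax` can stand in for the explicit mask. With a displacement, the 32-bit form wraps around at 4 GiB, whereas masking first and then adding in 64 bits does not.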

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project



Craig Topper via llvm-dev

2017-Aug-02 21:17 UTC

[llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing

Getting the instruction to actually use (%ecx) as the address requires
putting a 0x67 prefix on the instruction. I'm not sure how to convince
X86MCCodeEmitter.cpp to do that for you, assuming you want to generate
binary and not textual assembly.

~Craig
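
To make the 0x67 prefix concrete, here is a small Python sketch that hand-encodes the load for the simplest case: 64-bit mode, a plain base register, no index or displacement. It is not how X86MCCodeEmitter.cpp works; the function name and register table are invented for illustration, and registers that need a SIB byte, a displacement or a REX prefix are deliberately left out.

```python
# Register numbers that can be used directly in ModRM with mod=00.
# sp and bp are omitted on purpose: rm=100 selects a SIB byte and
# rm=101 selects RIP-relative addressing in 64-bit mode.
REG_NUM = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "si": 6, "di": 7}


def encode_mov32_load(dst: str, base: str) -> bytes:
    """Encode `mov (base), dst` for 64-bit mode.

    dst is a 32-bit register such as "eax"; base is "rcx" (64-bit
    addressing) or "ecx" (32-bit addressing, needs the 0x67 prefix).
    """
    if dst[0] != "e" or dst[1:] not in REG_NUM:
        raise ValueError(f"unsupported destination register: {dst}")
    if base[0] not in ("r", "e") or base[1:] not in REG_NUM:
        raise ValueError(f"unsupported base register: {base}")

    # ModRM: mod=00 (memory, no displacement), reg=dst, rm=base.
    modrm = (0b00 << 6) | (REG_NUM[dst[1:]] << 3) | REG_NUM[base[1:]]
    # 0x67 is the address-size override: 64-bit addressing becomes 32-bit.
    prefix = b"\x67" if base[0] == "e" else b""
    # 0x8B is the MOV r32, r/m32 opcode, as in the MOV32rm definition.
    return prefix + bytes([0x8B, modrm])


if __name__ == "__main__":
    print(encode_mov32_load("eax", "rcx").hex(" "))
    print(encode_mov32_load("eax", "ecx").hex(" "))
```

For comparison, `mov %ecx,%ecx` encodes as `89 c9`, so the two-instruction sequence from the original question takes four bytes, while the single prefixed load takes three.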


Craig Topper via llvm-dev

2017-Aug-02 21:22 UTC

[llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing

Maybe the code emitter will just work, because it detects the register size,
since we have to support hand-written assembly.

~Craig
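
The rule an encoder would have to apply when it "detects the register size" can be sketched as follows. This illustrates the general x86 address-size rule only and makes no claim about the emitter's actual code; the function name is hypothetical.

```python
def needs_address_size_prefix(mode_bits: int, addr_reg_bits: int) -> bool:
    """Return True if a memory operand needs the 0x67 prefix.

    mode_bits is the CPU mode (16, 32 or 64); addr_reg_bits is the width
    of the base/index registers used in the memory operand.
    """
    # Address size selected by 0x67 in each mode; without the prefix the
    # address size equals the mode.
    overridden = {16: 32, 32: 16, 64: 32}
    if mode_bits not in overridden:
        raise ValueError(f"unknown mode: {mode_bits}")
    if addr_reg_bits == mode_bits:
        return False  # default address size, no prefix
    if addr_reg_bits == overridden[mode_bits]:
        return True   # reachable only through the 0x67 override
    raise ValueError(
        f"{addr_reg_bits}-bit addressing is not encodable in {mode_bits}-bit mode"
    )


if __name__ == "__main__":
    print(needs_address_size_prefix(64, 64))  # mov (%rcx),%eax
    print(needs_address_size_prefix(64, 32))  # mov (%ecx),%eax
```

Under this rule, `(%ecx)` in 64-bit mode is exactly the case where the prefix is required.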

