thr3ads.net - llvm dev - [llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing [Aug 2017]

Taddeus via llvm-dev

2017-Aug-02 16:03 UTC

[llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing

Hi all,

I am experiencing a problem with the representation of addresses in the
x86_64 TableGen backend and was hoping someone could tell me if it is
fixable. Any comments or hints to send me in the right direction
would be greatly appreciated. I am using LLVM version 3.8, commit
251286.


I have an IR pass that stores metadata in the upper 32 bits of 64-bit 
pointers in order to implement memory safety. The pass instruments 
loads and stores to do an AND of the address with 0xffffffff to mask 
out that metadata. E.g., when loading a 4-byte value from memory
pointed to by %rcx, this translates to the following asm:
    mov    %ecx,%ecx   ; zeroes the upper bits, removing the metadata
    mov    (%rcx),%eax
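
To make the scheme concrete, here is a minimal Python sketch of the tagging and masking arithmetic, modelling 64-bit registers as plain integers. The helper names (`tag_pointer`, `strip_tag`) and the example values are hypothetical and are not taken from the pass itself; the sketch only assumes the layout described above (metadata in the upper 32 bits, address in the lower 32 bits).

```python
MASK32 = 0xFFFFFFFF


def tag_pointer(addr: int, meta: int) -> int:
    """Pack 32 bits of metadata above a 32-bit address (assumed layout)."""
    assert 0 <= addr <= MASK32 and 0 <= meta <= MASK32
    return (meta << 32) | addr


def strip_tag(ptr: int) -> int:
    """Model the AND with 0xffffffff inserted before each load/store.

    A 32-bit register write such as `mov %ecx,%ecx` has the same effect,
    because it zero-extends into the full 64-bit register.
    """
    return ptr & MASK32


# Illustrative values only.
tagged = tag_pointer(0x00401000, 0xDEADBEEF)
print(hex(tagged))             # 0xdeadbeef00401000
print(hex(strip_tag(tagged)))  # 0x401000
```

Dereferencing `tagged` directly would use all 64 bits, which is why every access has to go through the equivalent of `strip_tag` first.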

This leads to considerable overhead (12% on SPEC CPU2006), so I am looking
into possible backend modifications to optimize this. The
first mov costs extra cycles and the second mov has to
wait for its result, potentially stalling the pipeline. On top of
that, it increases register pressure when the original pointer must be
preserved for later use (e.g. the mask would be "mov %esi,%ecx", after
which %rcx is dereferenced, instead of just dereferencing %esi).

So, what I would like to generate instead is the following:
    mov    (%ecx),%eax
I.e., do the masking not in a separate mov but by using a 32-bit
subregister for the address (which is zero-extended, effectively
ignoring the metadata bits). As a side note, GCC does emit the second
snippet as expected.
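
The equivalence of the two forms can be sketched in Python as follows. The model assumes that, with a 32-bit address size in 64-bit mode, the effective address is computed in 32 bits and then zero-extended, so the upper half of the register never reaches the memory access. It also illustrates one caveat that should apply once a displacement is involved: the 32-bit form wraps the sum at 2^32, whereas masking first and then adding in 64 bits does not. The function names are hypothetical and the values are illustrative.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def ea_mask_then_64bit(reg: int, disp: int = 0) -> int:
    """Model `mov %ecx,%ecx` followed by `mov disp(%rcx),%eax`:
    mask the register first, then add the displacement in 64 bits."""
    return ((reg & MASK32) + disp) & MASK64


def ea_32bit_address_size(reg: int, disp: int = 0) -> int:
    """Model `mov disp(%ecx),%eax`:
    the sum is truncated to 32 bits and zero-extended."""
    return (reg + disp) & MASK32


tagged = 0xDEADBEEF00401000

# Without a displacement both forms yield the same address.
print(hex(ea_mask_then_64bit(tagged)))     # 0x401000
print(hex(ea_32bit_address_size(tagged)))  # 0x401000

# Near the 4 GiB boundary a displacement makes them differ.
edge = 0xDEADBEEFFFFFFFFC
print(hex(ea_mask_then_64bit(edge, 8)))     # 0x100000004
print(hex(ea_32bit_address_size(edge, 8)))  # 0x4
```

For the plain `(%ecx)` case discussed here the two forms agree; the wrap-around difference only matters for accesses whose base plus offset crosses the 4 GiB boundary.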


Looking at the TableGen files I found two problems:

1. The AND of the address with 0xffffffff is replaced with 
SUBREG_TO_REG(MOV32rr (EXTRACT_SUBREG ...)) in 
lib/Target/X86/X86InstrCompiler.td (line 1326). That MOV32rr emits an 
explicit mov instruction later. I think I need to replace this with 
(i32 (EXTRACT_SUBREG ...)) to get rid of the mov, but that produces a 
32-bit value, which leads me to the next, more general problem:

2. The x86 backend currently does not support dereferencing 32-bit 
addresses in 64-bit mode. Specifically, addresses are defined as an 
iPTR type in X86InstrInfo.td, which I assume expands to 4 or 8 bytes
depending on whether 32-bit or 64-bit mode is active:
    def addr : ComplexPattern<iPTR, 5, "selectAddr", [], [SDNPWantParent]>;
The dereferencing mov instruction looks like this:
   def MOV32rm : I<0x8B, MRMSrcMem, (outs GR32:$dst), (ins i32mem:$src),
        "mov{l}\t{$src, $dst|$dst, $src}",
        [(set GR32:$dst, (loadi32 addr:$src))], IIC_MOV_MEM>, OpSize32;
So it expects a source address of type 'addr' which is 8 bytes. This 
leads to the following code being emitted when I apply my solution to 
problem 1:
     mov    (%rcx),%eax
In other words, the upper bits are not ignored.


I am currently not sure where this problem is best solved.
Ideally the 'addr' type would have a dynamic size, but I don't
know how to do this. Any ideas on this?

Cheers,
Taddeüs

Friedman, Eli via llvm-dev

2017-Aug-02 17:47 UTC

[llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing

On 8/2/2017 9:03 AM, Taddeus via llvm-dev wrote:
> I am currently not sure what is the best place to solve this problem.
> The best would be to give the 'addr' type a dynamic size but I don't
> know how to do this. Any ideas on this?

A TableGen pattern can only match one specific type; you'll need a 
separate pattern to match a 32-bit address.  Yes, this means you'll need 
to write your own separate pattern for every load/store instruction, but 
there isn't really any way around that.

There are some existing patterns involving MOV32rm, if you want 
inspiration; for example, the following pattern is from X86InstrCompiler.td:

def : Pat<(extloadi64i32 addr:$src),
           (SUBREG_TO_REG (i64 0), (MOV32rm addr:$src), sub_32bit)>;

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project

Taddeus Kroes via llvm-dev

2017-Aug-02 21:03 UTC

[llvm-dev] Efficiently ignoring upper 32 pointer bits when dereferencing

Hi Eli,
Thanks, I’ll look into that then!

Cheers,
Taddeüs
