This is a follow-up discussion on
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20140324/101899.html.
The front-end change was already pushed in r204677, so we want to continue
with the IR optimization.

In general, we want to write an IR pass to convert generic address space
usage to non-generic address space usage, because accessing the generic
address space in CUDA and OpenCL is significantly slower than accessing
non-generic ones (such as shared and constant).

Here is an example Justin gave:

%ptr = ...
%val = load i32* %ptr

In this case, %ptr is a generic address space pointer (assuming an address
space mapping where 0 is generic). But if an analysis can prove that the
pointer %ptr was originally addrspacecast'd from a specific address space
(or there is some other mechanism through which the pointer's specific
address space can be determined), it may be beneficial to explicitly
convert the IR to something like:

%ptr = ...
%ptr.0 = addrspacecast i32* %ptr to i32 addrspace(3)*
%val = load i32 addrspace(3)* %ptr.0

Such a translation may generate better code for some targets.

There are two major design decisions we need to make:

1. Where does this pass live? Target-independent or target-dependent?

Both the NVPTX and R600 backends want this optimization, which seems like a
good justification for making it target-independent.

However, we have three concerns about this:
a) I doubt this optimization is valid for all targets, because the LLVM
language reference
(http://llvm.org/docs/LangRef.html#addrspacecast-to-instruction) says
addrspacecast "can be a no-op cast or a complex value modification,
depending on the target and the address space pair."
b) NVPTX and R600 use different address space numbering for the generic
address space, which makes things more complicated.
c) We don't have a good understanding of the R600 backend.

Therefore, I would vote for making this optimization NVPTX-specific for
now. If other targets need this, we can later think about how to reuse the
code.

2. How effective do we want this optimization to be?

In the short term, I want it to be able to eliminate the unnecessary
non-generic-to-generic addrspacecasts the front-end generates for the NVPTX
target. For example,

%p1 = addrspacecast i32 addrspace(3)* %p0 to i32*
%v = load i32* %p1

=>

%v = load i32 addrspace(3)* %p0

We want similar optimizations for store+addrspacecast and gep+addrspacecast
as well.

In the long term, we could certainly improve this optimization to handle
more instructions and more patterns.

Jingyue
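To make the gep+addrspacecast case above concrete, here is a minimal sketch
of the intended rewrite (an illustrative example, not one from the thread),
assuming the NVPTX convention that addrspace(3) is shared memory:

; before: the cast into the generic address space forces a generic load
%p1  = addrspacecast float addrspace(3)* %p0 to float*
%gep = getelementptr inbounds float* %p1, i64 %i
%v   = load float* %gep

; after: the gep and the load stay in the shared address space, and the
; addrspacecast becomes dead
%gep = getelementptr inbounds float addrspace(3)* %p0, i64 %i
%v   = load float addrspace(3)* %gep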
On 03/25/2014 02:31 PM, Jingyue Wu wrote:
> However, we have three concerns about this:
> a) I doubt this optimization is valid for all targets, because the LLVM
> language reference
> (http://llvm.org/docs/LangRef.html#addrspacecast-to-instruction) says
> addrspacecast "can be a no-op cast or a complex value modification,
> depending on the target and the address space pair."

I think most of the simple cast optimizations would be acceptable. The
addrspacecasted pointer still needs to point to the same memory location,
so changing an access to use a different address space would be OK. I think
canonicalizing accesses to use the original address space of a casted
pointer, when possible, would make sense.

> b) NVPTX and R600 use different address space numbering for the generic
> address space, which makes things more complicated.
> c) We don't have a good understanding of the R600 backend.

R600 currently does not support the flat address space instructions
intended to be used for the generic address space. I posted a patch a while
ago that half added it, which I can try to finish if it would help.

I also do not understand how NVPTX uses address spaces, particularly how it
can use 0 as the generic address space.

> 2. How effective do we want this optimization to be?
>
> In the short term, I want it to be able to eliminate the unnecessary
> non-generic-to-generic addrspacecasts the front-end generates for the
> NVPTX target. For example,
>
> %p1 = addrspacecast i32 addrspace(3)* %p0 to i32*
> %v = load i32* %p1
>
> =>
>
> %v = load i32 addrspace(3)* %p0
>
> We want similar optimizations for store+addrspacecast and
> gep+addrspacecast as well.
>
> In the long term, we could certainly improve this optimization to handle
> more instructions and more patterns.

I believe most of the cast simplifications that apply to bitcasts of
pointers also apply to addrspacecast. I have some patches waiting that
extend some of the more basic ones to understand addrspacecast (e.g.
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140120/202296.html),
plus a few more that I haven't posted yet. Mostly they are little cast
simplifications like your example in InstCombine, but also SROA changes to
eliminate allocas that are addrspacecasted.

-Matt
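One of the bitcast-style simplifications Matt alludes to can be sketched as
follows (a hypothetical example, not taken from his patches): a cast into
the generic address space that is immediately cast back folds away, because
both sides refer to the same memory location.

; before: round trip through the generic address space
%g = addrspacecast i32 addrspace(3)* %p to i32*
%s = addrspacecast i32* %g to i32 addrspace(3)*
store i32 0, i32 addrspace(3)* %s

; after: the round trip folds to the original shared-memory pointer
store i32 0, i32 addrspace(3)* %p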
On Tue, Mar 25, 2014 at 3:21 PM, Matt Arsenault <Matthew.Arsenault at amd.com> wrote:

> On 03/25/2014 02:31 PM, Jingyue Wu wrote:
>> However, we have three concerns about this:
>> a) I doubt this optimization is valid for all targets, because the LLVM
>> language reference
>> (http://llvm.org/docs/LangRef.html#addrspacecast-to-instruction) says
>> addrspacecast "can be a no-op cast or a complex value modification,
>> depending on the target and the address space pair."
>
> I think most of the simple cast optimizations would be acceptable. The
> addrspacecasted pointer still needs to point to the same memory location,
> so changing an access to use a different address space would be OK. I
> think canonicalizing accesses to use the original address space of a
> casted pointer, when possible, would make sense.

The language reference also says "if the address space conversion is legal
then both result and operand refer to the same memory location". I don't
quite understand this sentence. Does "the same memory location" mean the
same numeric value?

>> b) NVPTX and R600 use different address space numbering for the generic
>> address space, which makes things more complicated.
>> c) We don't have a good understanding of the R600 backend.
>
> R600 currently does not support the flat address space instructions
> intended to be used for the generic address space. I posted a patch a
> while ago that half added it, which I can try to finish if it would help.
>
> I also do not understand how NVPTX uses address spaces, particularly how
> it can use 0 as the generic address space.

The NVPTX backend generates ld.f32 for reading from the generic address
space. Is there no special machine instruction to read/write from/to the
generic address space in R600?

>> 2. How effective do we want this optimization to be?
>>
>> In the short term, I want it to be able to eliminate the unnecessary
>> non-generic-to-generic addrspacecasts the front-end generates for the
>> NVPTX target. For example,
>>
>> %p1 = addrspacecast i32 addrspace(3)* %p0 to i32*
>> %v = load i32* %p1
>>
>> =>
>>
>> %v = load i32 addrspace(3)* %p0
>>
>> We want similar optimizations for store+addrspacecast and
>> gep+addrspacecast as well.
>>
>> In the long term, we could certainly improve this optimization to handle
>> more instructions and more patterns.
>
> I believe most of the cast simplifications that apply to bitcasts of
> pointers also apply to addrspacecast. I have some patches waiting that
> extend some of the more basic ones to understand addrspacecast (e.g.
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140120/202296.html),
> plus a few more that I haven't posted yet. Mostly they are little cast
> simplifications like your example in InstCombine, but also SROA changes
> to eliminate allocas that are addrspacecasted.

We also think InstCombine is a good place to put this optimization, if we
decide to make it target-independent. Looking forward to your patches!

> -Matt
On 03/25/2014 06:21 PM, Matt Arsenault wrote:
> On 03/25/2014 02:31 PM, Jingyue Wu wrote:
>> However, we have three concerns about this:
>> a) I doubt this optimization is valid for all targets, because the LLVM
>> language reference
>> (http://llvm.org/docs/LangRef.html#addrspacecast-to-instruction) says
>> addrspacecast "can be a no-op cast or a complex value modification,
>> depending on the target and the address space pair."
>
> I think most of the simple cast optimizations would be acceptable. The
> addrspacecasted pointer still needs to point to the same memory location,
> so changing an access to use a different address space would be OK. I
> think canonicalizing accesses to use the original address space of a
> casted pointer, when possible, would make sense.
>
>> b) NVPTX and R600 use different address space numbering for the generic
>> address space, which makes things more complicated.
>> c) We don't have a good understanding of the R600 backend.
>
> R600 currently does not support the flat address space instructions
> intended to be used for the generic address space. I posted a patch a
> while ago that half added it, which I can try to finish if it would help.
>
> I also do not understand how NVPTX uses address spaces, particularly how
> it can use 0 as the generic address space.

We handle alloca by expanding it to a local stack reservation plus a
pointer conversion to the generic address space. So if we have IR like the
following:

%ptr = alloca i32
store i32 0, i32* %ptr

this will really get expanded to something like the following at the
MachineInstr level (in pseudo-code):

%local_ptr = %SP+offset   ; stack pointer (in the thread-local [private] address space)
%ptr = convert %local_ptr to generic address
store.generic.i32 [%ptr], 0

With the proposed optimization, this would be optimized back to a
non-generic store:

%local_ptr = %SP+offset
%ptr = convert %local_ptr to generic address
%ptr.0 = convert %ptr to thread-local address space
store.local.i32 [%ptr.0], 0

This turns the address space conversion sequence into a no-op (assuming no
other users) that can be eliminated, and a non-generic store is likely to
be more efficient than a generic store.

>> 2. How effective do we want this optimization to be?
>>
>> In the short term, I want it to be able to eliminate the unnecessary
>> non-generic-to-generic addrspacecasts the front-end generates for the
>> NVPTX target. For example,
>>
>> %p1 = addrspacecast i32 addrspace(3)* %p0 to i32*
>> %v = load i32* %p1
>>
>> =>
>>
>> %v = load i32 addrspace(3)* %p0
>>
>> We want similar optimizations for store+addrspacecast and
>> gep+addrspacecast as well.
>>
>> In the long term, we could certainly improve this optimization to handle
>> more instructions and more patterns.
>
> I believe most of the cast simplifications that apply to bitcasts of
> pointers also apply to addrspacecast. I have some patches waiting that
> extend some of the more basic ones to understand addrspacecast (e.g.
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140120/202296.html),
> plus a few more that I haven't posted yet. Mostly they are little cast
> simplifications like your example in InstCombine, but also SROA changes
> to eliminate allocas that are addrspacecasted.
>
> -Matt
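The MachineInstr-level rewrite above can also be sketched at the IR level
(an illustrative sketch, not IR the backend emits today), assuming NVPTX's
convention that addrspace(5) is the thread-local (local) space:

; a generic store through a pointer known to come from the local space
%ptr = addrspacecast i32 addrspace(5)* %local_ptr to i32*
store i32 0, i32* %ptr

; after the rewrite: the store uses the local address space directly,
; and the addrspacecast becomes dead
store i32 0, i32 addrspace(5)* %local_ptr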
On Tue, Mar 25, 2014 at 02:31:05PM -0700, Jingyue Wu wrote:
> This is a follow-up discussion on
> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20140324/101899.html.
> The front-end change was already pushed in r204677, so we want to
> continue with the IR optimization.
>
> In general, we want to write an IR pass to convert generic address space
> usage to non-generic address space usage, because accessing the generic
> address space in CUDA and OpenCL is significantly slower than accessing
> non-generic ones (such as shared and constant).
>
> Here is an example Justin gave:
>
> %ptr = ...
> %val = load i32* %ptr
>
> In this case, %ptr is a generic address space pointer (assuming an
> address space mapping where 0 is generic). But if an analysis can prove
> that the pointer %ptr was originally addrspacecast'd from a specific
> address space (or there is some other mechanism through which the
> pointer's specific address space can be determined), it may be beneficial
> to explicitly convert the IR to something like:
>
> %ptr = ...
> %ptr.0 = addrspacecast i32* %ptr to i32 addrspace(3)*
> %val = load i32 addrspace(3)* %ptr.0
>
> Such a translation may generate better code for some targets.

I think a slight variation of this optimization may be useful for the R600
backend. One thing I have been working on is migrating allocas to different
address spaces, which in some cases may improve performance. Here is an
example:

%ptr = alloca [5 x i32]
...

would become:

@local_mem = internal addrspace(3) unnamed_addr global [5 x i32] zeroinitializer

%ptr = addrspacecast [5 x i32] addrspace(3)* @local_mem to [5 x i32]*
...

In this case I would like all users of %ptr to read and write address
space 3 rather than address space 0, and it sounds like your proposed
optimization pass could do this (see the sketch after this message).

> There are two major design decisions we need to make:
>
> 1. Where does this pass live? Target-independent or target-dependent?
>
> Both the NVPTX and R600 backends want this optimization, which seems like
> a good justification for making it target-independent.

I agree here.

> However, we have three concerns about this:
> a) I doubt this optimization is valid for all targets, because the LLVM
> language reference
> (http://llvm.org/docs/LangRef.html#addrspacecast-to-instruction) says
> addrspacecast "can be a no-op cast or a complex value modification,
> depending on the target and the address space pair."

Does it matter that it isn't valid for all targets as long as it is valid
for some? We could add it, but not run it by default.

> b) NVPTX and R600 use different address space numbering for the generic
> address space, which makes things more complicated.

Could we add a TargetLowering callback that the pass can use to determine
whether or not it is profitable to replace one address space with another?

-Tom

> c) We don't have a good understanding of the R600 backend.
>
> Therefore, I would vote for making this optimization NVPTX-specific for
> now. If other targets need this, we can later think about how to reuse
> the code.
>
> 2. How effective do we want this optimization to be?
>
> In the short term, I want it to be able to eliminate the unnecessary
> non-generic-to-generic addrspacecasts the front-end generates for the
> NVPTX target. For example,
>
> %p1 = addrspacecast i32 addrspace(3)* %p0 to i32*
> %v = load i32* %p1
>
> =>
>
> %v = load i32 addrspace(3)* %p0
>
> We want similar optimizations for store+addrspacecast and
> gep+addrspacecast as well.
>
> In the long term, we could certainly improve this optimization to handle
> more instructions and more patterns.
>
> Jingyue
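To illustrate what rewriting "all users of %ptr" could look like for Tom's
alloca-promotion example (a hypothetical sketch reusing his @local_mem
global; the initializer and the gep user are assumptions for illustration):

@local_mem = internal addrspace(3) unnamed_addr global [5 x i32] zeroinitializer

; before: users go through the generic pointer produced by the cast
%ptr = addrspacecast [5 x i32] addrspace(3)* @local_mem to [5 x i32]*
%elt = getelementptr inbounds [5 x i32]* %ptr, i64 0, i64 2
store i32 42, i32* %elt

; after: the gep and the store are rewritten to address space 3, and the
; addrspacecast becomes dead
%elt = getelementptr inbounds [5 x i32] addrspace(3)* @local_mem, i64 0, i64 2
store i32 42, i32 addrspace(3)* %elt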
On 03/25/2014 02:31 PM, Jingyue Wu wrote:
> This is a follow-up discussion on
> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20140324/101899.html.
> The front-end change was already pushed in r204677, so we want to
> continue with the IR optimization.
>
> In general, we want to write an IR pass to convert generic address space
> usage to non-generic address space usage, because accessing the generic
> address space in CUDA and OpenCL is significantly slower than accessing
> non-generic ones (such as shared and constant).
>
> Here is an example Justin gave:
>
> %ptr = ...
> %val = load i32* %ptr
>
> In this case, %ptr is a generic address space pointer (assuming an
> address space mapping where 0 is generic). But if an analysis can prove
> that the pointer %ptr was originally addrspacecast'd from a specific
> address space (or there is some other mechanism through which the
> pointer's specific address space can be determined), it may be beneficial
> to explicitly convert the IR to something like:
>
> %ptr = ...
> %ptr.0 = addrspacecast i32* %ptr to i32 addrspace(3)*
> %val = load i32 addrspace(3)* %ptr.0
>
> Such a translation may generate better code for some targets.

Just a note of caution: for some of us, address spaces are semantically
important (i.e. having a cast introduced from one to another would be
incorrect). I have no problem with the mechanism you're describing being
implemented, but it needs to be an opt-in feature.

> There are two major design decisions we need to make:
>
> 1. Where does this pass live? Target-independent or target-dependent?
>
> Both the NVPTX and R600 backends want this optimization, which seems like
> a good justification for making it target-independent.
>
> However, we have three concerns about this:
> a) I doubt this optimization is valid for all targets, because the LLVM
> language reference
> (http://llvm.org/docs/LangRef.html#addrspacecast-to-instruction) says
> addrspacecast "can be a no-op cast or a complex value modification,
> depending on the target and the address space pair."
> b) NVPTX and R600 use different address space numbering for the generic
> address space, which makes things more complicated.
> c) We don't have a good understanding of the R600 backend.
>
> Therefore, I would vote for making this optimization NVPTX-specific for
> now. If other targets need this, we can later think about how to reuse
> the code.

No opinion, but if it is target-independent, it needs to be behind an
opt-in target hook.

> 2. How effective do we want this optimization to be?
>
> In the short term, I want it to be able to eliminate the unnecessary
> non-generic-to-generic addrspacecasts the front-end generates for the
> NVPTX target. For example,
>
> %p1 = addrspacecast i32 addrspace(3)* %p0 to i32*
> %v = load i32* %p1
>
> =>
>
> %v = load i32 addrspace(3)* %p0
>
> We want similar optimizations for store+addrspacecast and
> gep+addrspacecast as well.
>
> In the long term, we could certainly improve this optimization to handle
> more instructions and more patterns.

Just to note, this last bit raises far fewer worries for me about the
correctness of my work. If you're loading from a pointer which was casted
from a different address space, it seems very logical to combine that cast
with the load. We'd also never generate code like that. :)

To restate my concern in general terms, it's the introduction of *new*
casts which worries me, not the exploitation/optimization of existing ones.

Philip
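Philip's distinction can be spelled out with a small sketch (an
illustrative example, not from the thread; @get_ptr is a hypothetical
opaque pointer producer): folding an existing cast into a load is the safe
case, while materializing a cast that was never in the IR is the case that
needs to be opt-in.

; exploiting an existing cast: the addrspacecast is already in the IR,
; and the load is simply rewritten to use the original pointer
%p1 = addrspacecast i32 addrspace(3)* %p0 to i32*
%v  = load i32* %p1
; =>
%v  = load i32 addrspace(3)* %p0

; introducing a new cast: no cast exists near the use, so the pass would
; have to create one based only on an analysis result
%q   = call i32* @get_ptr()
%q.3 = addrspacecast i32* %q to i32 addrspace(3)*
%v2  = load i32 addrspace(3)* %q.3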