thr3ads.net - llvm dev - [LLVMdev] Address space extension [Aug 2013]

Pete Cooper

2013-Aug-07 22:24 UTC

[LLVMdev] Address space extension

On Aug 7, 2013, at 2:54 PM, Michele Scandale <michele.scandale at gmail.com> wrote:
>> I don’t know if CUDA has aliasing address spaces, but that would also be
>> useful to consider.  Something simple like this might work.  Note I’m
>> using the examples from the clang discussion, that is "1 = opencl/cuda
>> global, 2 = opencl_local/cuda_shared, 3 = opencl/cuda constant"
> 
> You are assuming that the target device has different physical address
> spaces (like PTX, R600, or TCE). What about targets with a unique address
> space (e.g. X86, ARM), where all OpenCL/CUDA address spaces are mapped
> (correctly) to target address space 0?

That seems like something only the backend needs to care about, but it is a very
important thing to consider.

You could extend my approach below with one more field which, for each address
space, tells you the HW address space it maps to.  Then the selection DAG builder
can use that information (if it exists) to do the translation.  That's perhaps
not the cleanest implementation, but it would work.
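
A minimal sketch of that extra field, modelled in plain Python rather than in LLVM itself. The names `AddressSpaceInfo`, `hw_space`, `lower_address_space`, and `FLAT_TARGET` are hypothetical and only illustrate the idea; nothing here is an existing LLVM API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AddressSpaceInfo:
    number: int                      # logical address space used in the IR
    parent: Optional[int]            # parent in the alias tree, None for the root
    properties: frozenset = frozenset()
    hw_space: Optional[int] = None   # hypothetical extra field: HW address space


def lower_address_space(table, logical):
    """Return the HW address space for a logical one.

    Falls back to the logical number when no mapping is recorded,
    which models the "if it exists" caveat.
    """
    info = table.get(logical)
    if info is None or info.hw_space is None:
        return logical
    return info.hw_space


# Assumed example: a target with one flat address space, so every
# logical address space maps to HW address space 0.
FLAT_TARGET = {
    0: AddressSpaceInfo(0, None, hw_space=0),
    1: AddressSpaceInfo(1, 0, hw_space=0),
    2: AddressSpaceInfo(2, 0, hw_space=0),
    3: AddressSpaceInfo(3, 0, frozenset({"constant"}), hw_space=0),
}
```

With a table like this, the logical numbers (and properties such as "constant") stay available for optimization, while the translation to the HW address space is a single lookup at lowering time.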

I was going to suggest that an alternative is to pass this information in to the
load/store instructions in the backend, but it looks like that information is
already available.  That is, MachinePointerInfo has a getAddrSpace() method.
This could potentially allow you to optimize MachineInstrs using the same
knowledge you have here, e.g., constness for addrspace(3) in MachineLICM.

>> 
>> !address_spaces = !{!0, !1, !2, !3}
>> 
>> ; Address space tuple.  { address space number, parent address space, additional properties }
>> !0 = metadata !{ i32 0, !{}, !{} }
>> !1 = metadata !{ i32 1, !0, !{} }
>> !2 = metadata !{ i32 2, !0, !{} }
>> !3 = metadata !{ i32 3, !0, !4 }
>> 
>> !4 = metadata !{ "constant" }
>> 
>> 
>> This corresponds to 3 address spaces which all are members of address
>> space 0, but which otherwise do not alias each other.  I think this is
>> roughly how TBAA does things.  You can introduce any nodes in the tree
>> of address spaces you need to make children in the tree alias each other.
>> 
>> Additionally, the last address space is marked as constant which could
>> be used for optimization, e.g. LICM.
> 
> You mean that 1, 2, 3 do not alias each other, but they all alias with 0,
> right? Address space 0 is used to represent the OpenCL __private address space;
> I think it would not alias with the others…

Yeah, that's right, I have them all alias 0.  If 0 is private and doesn’t alias
anything then that's even better.  Potentially that means that the optimizer will
be able to reorder any access to globals with any other access to the stack, for
example.  That will really help it optimize very well.
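
To make the aliasing rule concrete, here is a rough Python model of the `!address_spaces` tuples from the proposal. It assumes a TBAA-like rule (two address spaces may alias only if one is an ancestor of the other, or they are the same); the table layout and the function names are hypothetical, not part of LLVM.

```python
# Hypothetical model of the proposed tuples: number -> (parent, properties).
ADDRESS_SPACES = {
    0: (None, frozenset()),
    1: (0, frozenset()),
    2: (0, frozenset()),
    3: (0, frozenset({"constant"})),
}


def ancestors(space, table):
    """Return the set containing `space` and all of its ancestors."""
    chain = set()
    while space is not None:
        chain.add(space)
        space = table[space][0]
    return chain


def may_alias(a, b, table=ADDRESS_SPACES):
    """Assumed rule: spaces may alias when one is an ancestor of the other."""
    return a in ancestors(b, table) or b in ancestors(a, table)


def is_constant(space, table=ADDRESS_SPACES):
    """True if the space carries the "constant" property."""
    return "constant" in table[space][1]
```

Under this model 1, 2 and 3 each alias 0 but not one another. Making 0 fully disjoint, as suggested for OpenCL `__private`, would amount to giving 1, 2 and 3 a parent other than 0 (or no parent at all).
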
> BTW, I like the approach: it allows a fine-grained description of the
> relationships between address spaces that can be used in the middle-end, and
> the frontend is responsible for correctly emitting this language-specific
> information. That's great!

Thanks :)

Michele Scandale

2013-Aug-07 22:52 UTC

[LLVMdev] Address space extension

On 08/08/2013 12:24 AM, Pete Cooper wrote:
>
> On Aug 7, 2013, at 2:54 PM, Michele Scandale <michele.scandale at gmail.com> wrote:
>
>>> I don’t know if CUDA has aliasing address spaces, but that would also be
>>> useful to consider.  Something simple like this might work.  Note I’m
>>> using the examples from the clang discussion, that is "1 = opencl/cuda
>>> global, 2 = opencl_local/cuda_shared, 3 = opencl/cuda constant"
>>
>> You are assuming that the target device has different physical address
>> spaces (like PTX, R600, or TCE). What about targets with a unique address
>> space (e.g. X86, ARM), where all OpenCL/CUDA address spaces are mapped
>> (correctly) to target address space 0?
>
> That seems like something only the backend needs to care about, but it is a
> very important thing to consider.
>
> You could extend my approach below with one more field which, for each
> address space, tells you the HW address space it maps to.  Then the selection
> DAG builder can use that information (if it exists) to do the translation.
> That's perhaps not the cleanest implementation, but it would work.
>
> I was going to suggest that an alternative is to pass this information in
> to the load/store instructions in the backend, but it looks like that
> information is already available.  That is, MachinePointerInfo has a
> getAddrSpace() method.  This could potentially allow you to optimize
> MachineInstrs using the same knowledge you have here, e.g., constness for
> addrspace(3) in MachineLICM.

From here: http://llvm.org/docs/LangRef.html#pointer-type

"The semantics of non-zero address spaces are target-specific."

My interpretation is that address spaces are TARGET dependent, so they 
are meant to represent the physical address spaces. So it is *bad* to 
cheat by using this modifier with a translation that does not reflect 
the target's features. The assumption I see is that the backend knows 
how to handle the address space numbers used here. So using this 
modifier would imply that every backend must be aware of the semantics 
of OpenCL/CUDA address spaces.
I discussed this for a related issue on cfe-commits (please 
follow the message chain starting from here 
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20130715/084011.html).

The idea of using metadata to represent the mapping sounds good, *but 
the semantics of the addrspace modifier in the IR must change*.

Do you agree with this?

Indeed, the lowering of address spaces must be made explicit somewhere.
I agree with you that, in general, instruction selection is a fine place 
for it, but the high-level information should not be dropped: the target 
address space must be used for instruction selection, but the high-level 
information must remain accessible from the MachineInstr if needed.
From what I see, MachinePointerInfo::getAddrSpace uses the associated IR 
Value* and returns the address space saved in the IR. If the IR holds 
the logical address space, I expect the physical one to be available 
somewhere else (recomputing it may be fine).

>> You mean that 1, 2, 3 do not alias each other, but they all alias with
>> 0, right? Address space 0 is used to represent the OpenCL __private address
>> space; I think it would not alias with the others…
>
> Yeah, that's right, I have them all alias 0.  If 0 is private and doesn’t
> alias anything then that's even better.  Potentially that means that the
> optimizer will be able to reorder any access to globals with any other access
> to the stack, for example.  That will really help it optimize very well.

The OpenCL specification says that the four address spaces are 
disjoint, hence my conclusion that 0 does not alias the others.

I hope that the discussion will bring us to a nice and clear solution :-).

Thanks.

-Michele

Matt Arsenault

2013-Aug-07 22:55 UTC

[LLVMdev] Address space extension

On 08/07/2013 03:52 PM, Michele Scandale wrote:
>
> The OpenCL specification says that the four address spaces are 
> disjoint, hence my conclusion that 0 does not alias the others.

In OpenCL 2.0, you can cast between the generic address space and 
global/local/private, so there's also that to consider.
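
One way the tree-based proposal from earlier in the thread could accommodate this, sketched in Python under the assumption that a castable "generic" space is modelled as the parent of global, local and private, while constant stays outside it. The `PARENT` table and function names are hypothetical and use names instead of address space numbers to avoid implying any particular numbering.

```python
# Hypothetical tree: each address space name maps to its parent (None = root).
PARENT = {
    "generic": None,
    "global": "generic",
    "local": "generic",
    "private": "generic",
    "constant": None,   # kept outside generic in this sketch
}


def is_ancestor_or_self(a, b):
    """True if `a` is `b` or one of the ancestors of `b`."""
    while b is not None:
        if a == b:
            return True
        b = PARENT[b]
    return False


def may_alias(a, b):
    """Assumed rule: spaces may alias only along an ancestor chain."""
    return is_ancestor_or_self(a, b) or is_ancestor_or_self(b, a)
```

Here a generic pointer may alias any of global, local or private, while those three remain pairwise disjoint.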

Justin Holewinski

2013-Aug-07 23:19 UTC

[LLVMdev] Address space extension

On Wed, Aug 7, 2013 at 6:24 PM, Pete Cooper <peter_cooper at apple.com> wrote:
>
> On Aug 7, 2013, at 2:54 PM, Michele Scandale <michele.scandale at gmail.com>
> wrote:
>
> >> I don’t know if CUDA has aliasing address spaces, but that would also be
> >> useful to consider.  Something simple like this might work.  Note I’m
> >> using the examples from the clang discussion, that is "1 = opencl/cuda
> >> global, 2 = opencl_local/cuda_shared, 3 = opencl/cuda constant"
> >
> > You are assuming that the target device has different physical address
> > spaces (like PTX, R600, or TCE). What about targets with a unique
> > address space (e.g. X86, ARM), where all OpenCL/CUDA address spaces are
> > mapped (correctly) to target address space 0?
>
> That seems like something only the backend needs to care about, but it is
> a very important thing to consider.
>
> You could extend my approach below with one more field which, for each
> address space, tells you the HW address space it maps to.  Then the
> selection DAG builder can use that information (if it exists) to do the
> translation.  That's perhaps not the cleanest implementation, but it would
> work.
>
> I was going to suggest that an alternative is to pass this information in
> to the load/store instructions in the backend, but it looks like that
> information is already available.  That is, MachinePointerInfo has a
> getAddrSpace() method.  This could potentially allow you to optimize
> MachineInstrs using the same knowledge you have here, e.g., constness for
> addrspace(3) in MachineLICM.
>
I don't believe MachinePointerInfo is guaranteed to be meaningful for all
loads/stores.  It is populated with an llvm::Value*, but loads/stores
generated in a backend may not be associated with a Value*.
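
If that is the case, any backend query built on this information would need a conservative fallback. A small Python sketch of the idea, where `None` stands for a load/store with no address space information attached; the `DISJOINT` table and `may_alias` function are hypothetical illustrations, not LLVM code.

```python
from typing import Optional

# Hypothetical alias facts: pairs of address spaces known never to overlap.
DISJOINT = {frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 3})}


def may_alias(a: Optional[int], b: Optional[int]) -> bool:
    """Conservative query over optional address space numbers.

    None means the memory operand carries no address space
    information, e.g. because no IR Value was associated with it.
    """
    if a is None or b is None:
        return True   # unknown: assume the accesses may overlap
    if a == b:
        return True
    return frozenset({a, b}) not in DISJOINT
```

The design choice is simply that missing information never enables an optimization: only pairs explicitly recorded as disjoint are treated as non-aliasing.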

> >
> >>
> >> !address_spaces = !{!0, !1, !2, !3}
> >>
> >> ; Address space tuple.  { address space number, parent address space, additional properties }
> >> !0 = metadata !{ i32 0, !{}, !{} }
> >> !1 = metadata !{ i32 1, !0, !{} }
> >> !2 = metadata !{ i32 2, !0, !{} }
> >> !3 = metadata !{ i32 3, !0, !4 }
> >>
> >> !4 = metadata !{ "constant" }
> >>
> >>
> >> This corresponds to 3 address spaces which all are members of address
> >> space 0, but which otherwise do not alias each other.  I think this is
> >> roughly how TBAA does things.  You can introduce any nodes in the tree
> >> of address spaces you need to make children in the tree alias each other.
> >>
> >> Additionally, the last address space is marked as constant which could
> >> be used for optimization, e.g. LICM.
> >
> > You mean that 1, 2, 3 do not alias each other, but they all alias with
> > 0, right? Address space 0 is used to represent the OpenCL __private address
> > space; I think it would not alias with the others…
>
> Yeah, that's right, I have them all alias 0.  If 0 is private and doesn’t
> alias anything then that's even better.  Potentially that means that the
> optimizer will be able to reorder any access to globals with any other
> access to the stack, for example.  That will really help it optimize very
> well.
> >
> > BTW, I like the approach: it allows a fine-grained description of the
> > relationships between address spaces that can be used in the middle-end,
> > and the frontend is responsible for correctly emitting this
> > language-specific information. That's great!
>
> Thanks :)


-- 

Thanks,

Justin Holewinski