On Aug 7, 2013, at 2:54 PM, Michele Scandale <michele.scandale at gmail.com> wrote:

>> I don't know if CUDA has aliasing address spaces, but that would also be
>> useful to consider. Something simple like this might work. Note I'm
>> using the examples from the clang discussion, that is "1 = opencl/cuda
>> global, 2 = opencl_local/cuda_shared, 3 = opencl/cuda constant"
>
> You are assuming that the target device has different physical address spaces (like PTX, R600 or TCE). What about targets with a single address space (e.g. X86, ARM), where all OpenCL/CUDA address spaces are (correctly) mapped to target address space 0?

That seems like something only the backend needs to care about, but it is a very important thing to consider.

You could extend my approach below with one more field which, for each address space, tells you the HW address space it maps to. Then the SelectionDAG builder can use that information (if it exists) to do the translation. That's perhaps not the cleanest implementation, but it would work.

I was going to suggest that an alternative is to pass this information in to the load/store instructions in the backend, but it looks like that information is already available: MachinePointerInfo has a getAddrSpace() method. This could potentially allow you to optimize MachineInstrs using the same knowledge you have here, e.g., constness for addrspace(3) in MachineLICM.

>
>>
>> !address_spaces = !{!0, !1, !2, !3}
>>
>> ; Address space tuple. { address space number, parent address space, additional properties }
>> !0 = metadata !{ i32 0, !{}, !{} }
>> !1 = metadata !{ i32 1, !0, !{} }
>> !2 = metadata !{ i32 2, !0, !{} }
>> !3 = metadata !{ i32 3, !0, !4 }
>>
>> !4 = metadata !{ !"constant" }
>>
>>
>> This corresponds to 3 address spaces which all are members of address
>> space 0, but which otherwise do not alias each other. I think this is
>> roughly how TBAA does things. You can introduce any nodes in the tree
>> of address spaces you need to make children in the tree alias each other.
>>
>> Additionally, the last address space is marked as constant, which could
>> be used for optimization, e.g. LICM.
>
> You mean that 1, 2, 3 do not alias each other, but they all alias with 0, right? Address space 0 is used to represent the OpenCL __private address space; I think it would not alias with the others…

Yeah, that's right, I have them all alias 0. If 0 is private and doesn't alias anything, then that's even better. Potentially that means the optimizer will be able to reorder any access to globals with any other access to the stack, for example. That will really help it optimize very well.

>
> BTW, I like the approach: it allows a fine description of the relationship between address spaces that can be used in the middle-end, and the frontend is responsible for the correct emission of this language-specific information. That's great!

Thanks :)
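The aliasing rule described above can be modelled as a TBAA-style ancestor check over the address-space tree. The Python below is a hypothetical sketch for illustration only: `AddrSpace`, `may_alias` and `is_constant` are not LLVM APIs, and `PETE_TREE` simply mirrors the `!address_spaces` metadata quoted above (1, 2 and 3 are children of 0; 3 carries the "constant" property). `DISJOINT` models Michele's reading, where private (0) does not alias anything else.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Set


@dataclass(frozen=True)
class AddrSpace:
    """One node of a hypothetical address-space tree: number, parent, properties."""
    number: int
    parent: Optional[int]  # None marks a root
    props: FrozenSet[str] = frozenset()


def ancestors(spaces: Dict[int, AddrSpace], n: int) -> Set[int]:
    """Return n together with every ancestor reached by following parent links."""
    seen: Set[int] = set()
    cur: Optional[int] = n
    while cur is not None:
        seen.add(cur)
        cur = spaces[cur].parent
    return seen


def may_alias(spaces: Dict[int, AddrSpace], a: int, b: int) -> bool:
    """Two address spaces may alias only if one is an ancestor of (or equal to) the other."""
    return a in ancestors(spaces, b) or b in ancestors(spaces, a)


def is_constant(spaces: Dict[int, AddrSpace], n: int) -> bool:
    """True if the address space carries the "constant" property (useful for LICM)."""
    return "constant" in spaces[n].props


# Tree from the metadata above: 1, 2 and 3 are children of 0; 3 is constant.
PETE_TREE = {
    0: AddrSpace(0, None),
    1: AddrSpace(1, 0),
    2: AddrSpace(2, 0),
    3: AddrSpace(3, 0, frozenset({"constant"})),
}

# Disjoint reading: all four spaces are roots, so only identical spaces alias.
DISJOINT = {n: AddrSpace(n, None, PETE_TREE[n].props) for n in range(4)}

if __name__ == "__main__":
    print(may_alias(PETE_TREE, 1, 2))  # False: siblings under 0
    print(may_alias(PETE_TREE, 1, 0))  # True: 0 is an ancestor of 1
    print(may_alias(DISJOINT, 1, 0))   # False: private no longer aliases global
```

With every space a root, the only remaining alias relation is reflexive, which is what would let an optimizer reorder global accesses around private (stack) accesses.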
On 08/08/2013 12:24 AM, Pete Cooper wrote:
>
> On Aug 7, 2013, at 2:54 PM, Michele Scandale <michele.scandale at gmail.com> wrote:
>
>>> I don't know if CUDA has aliasing address spaces, but that would also be
>>> useful to consider. Something simple like this might work. Note I'm
>>> using the examples from the clang discussion, that is "1 = opencl/cuda
>>> global, 2 = opencl_local/cuda_shared, 3 = opencl/cuda constant"
>>
>> You are assuming that the target device has different physical address spaces (like PTX, R600 or TCE). What about targets with a single address space (e.g. X86, ARM), where all OpenCL/CUDA address spaces are (correctly) mapped to target address space 0?
> That seems like something only the backend needs to care about, but it is a very important thing to consider.
>
> You could extend my approach below with one more field which, for each address space, tells you the HW address space it maps to. Then the SelectionDAG builder can use that information (if it exists) to do the translation. That's perhaps not the cleanest implementation, but it would work.
>
> I was going to suggest that an alternative is to pass this information in to the load/store instructions in the backend, but it looks like that information is already available: MachinePointerInfo has a getAddrSpace() method. This could potentially allow you to optimize MachineInstrs using the same knowledge you have here, e.g., constness for addrspace(3) in MachineLICM.

From here: http://llvm.org/docs/LangRef.html#pointer-type

"The semantics of non-zero address spaces are target-specific."

My interpretation is that address spaces are TARGET dependent, so they are meant to represent the physical address spaces. So it is *bad* to cheat with this modifier by adding a translation that does not reflect the target's features. The assumption I see is that the backend knows how to handle the address space numbers used here. So using this modifier this way would imply that every backend must be aware of the semantics of OpenCL/CUDA address spaces.

I discussed this in the context of a related issue on cfe-commits (please follow the message chain starting here: http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20130715/084011.html).

The idea of using metadata to represent the mapping sounds good, *but the semantics of the addrspace modifier in the IR must change*. Do you agree with this?

Indeed, the lowering of address spaces must be made explicit somewhere. I agree with you that, in general, instruction selection is a fine place to do it, but the high-level information should not be dropped: the target address space must be used for instruction selection, but the high-level information must remain accessible from the MachineInstr if needed. From what I see, MachinePointerInfo::getAddrSpace uses the associated IR Value* and returns the address space stored in the IR. If the IR holds the logical address space, I expect the physical one to be available somewhere else (recomputing it may be fine).

>> You mean that 1, 2, 3 do not alias each other, but they all alias with 0, right? Address space 0 is used to represent the OpenCL __private address space; I think it would not alias with the others…
> Yeah, that's right, I have them all alias 0. If 0 is private and doesn't alias anything, then that's even better. Potentially that means the optimizer will be able to reorder any access to globals with any other access to the stack, for example. That will really help it optimize very well.

The OpenCL specification says that the four address spaces are disjoint, hence my conclusion that 0 does not alias the others.

I hope that the discussion will bring us to a nice and clear solution :-).

Thanks.

-Michele
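Keeping the logical address space in the IR and computing the physical one only at lowering time, combined with the extra per-address-space field suggested earlier, might look roughly like the hypothetical Python model below. It is not LLVM code: `LoweredAccess`, `lower` and `provably_disjoint` are illustrative names, `FLAT_CPU` reflects the single-address-space case (e.g. X86/ARM) from the thread, and the `HYPOTHETICAL_GPU` numbers are placeholders, not any real target's numbering. `provably_disjoint` assumes the disjoint-spaces reading of OpenCL.

```python
from dataclasses import dataclass
from typing import Dict

# Logical (language-level) numbering from the thread:
# 0 = private, 1 = global, 2 = local/shared, 3 = constant.
LOGICAL_SPACES = (0, 1, 2, 3)

# Per-target "extra field": logical address space -> hardware address space.
FLAT_CPU: Dict[int, int] = {n: 0 for n in LOGICAL_SPACES}    # one physical space
HYPOTHETICAL_GPU: Dict[int, int] = {0: 5, 1: 1, 2: 3, 3: 4}  # placeholder numbers


@dataclass(frozen=True)
class LoweredAccess:
    """A memory access after lowering that still remembers its logical space."""
    logical_as: int   # what alias analysis / LICM should reason about
    physical_as: int  # what instruction selection should use


def lower(logical_as: int, table: Dict[int, int]) -> LoweredAccess:
    """Translate for instruction selection without dropping the high-level info."""
    return LoweredAccess(logical_as, table[logical_as])


def provably_disjoint(a: LoweredAccess, b: LoweredAccess) -> bool:
    """Assuming disjoint OpenCL spaces, different logical spaces never overlap."""
    return a.logical_as != b.logical_as
```

On `FLAT_CPU`, a global access and a private access both lower to physical space 0, yet `provably_disjoint` still reports them as disjoint because the logical number survives lowering.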
On 08/07/2013 03:52 PM, Michele Scandale wrote:
>
> The OpenCL specification says that the four address spaces are
> disjoint, hence my conclusion that 0 does not alias the others.

In OpenCL 2.0, you can cast between the generic address space and global/local/private, so there's also that to consider.
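One way to fit the OpenCL 2.0 generic address space into the same tree model is to make generic the parent of private, global and local (the spaces named above as castable to and from generic), while constant stays a separate root. This is a hypothetical sketch: the number 4 for generic is an arbitrary placeholder, and `may_alias` is a TBAA-style ancestor check written for illustration, not an LLVM API.

```python
from typing import Dict, Optional, Set

# Parent links; None marks a root. 4 is a hypothetical number for "generic".
OPENCL20_TREE: Dict[int, Optional[int]] = {
    4: None,  # generic
    0: 4,     # private  (castable to/from generic)
    1: 4,     # global   (castable to/from generic)
    2: 4,     # local    (castable to/from generic)
    3: None,  # constant (kept outside generic in this sketch)
}


def ancestors(tree: Dict[int, Optional[int]], n: int) -> Set[int]:
    """Return n and all of its ancestors."""
    seen: Set[int] = set()
    cur: Optional[int] = n
    while cur is not None:
        seen.add(cur)
        cur = tree[cur]
    return seen


def may_alias(tree: Dict[int, Optional[int]], a: int, b: int) -> bool:
    """A generic pointer may alias any of its children; siblings stay disjoint."""
    return a in ancestors(tree, b) or b in ancestors(tree, a)
```

In this model an access through a generic pointer must be assumed to overlap global, local and private accesses, while accesses through two concrete spaces (say global and local) remain disjoint.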
On Wed, Aug 7, 2013 at 6:24 PM, Pete Cooper <peter_cooper at apple.com> wrote:

>
> On Aug 7, 2013, at 2:54 PM, Michele Scandale <michele.scandale at gmail.com>
> wrote:
>
> >> I don't know if CUDA has aliasing address spaces, but that would also be
> >> useful to consider. Something simple like this might work. Note I'm
> >> using the examples from the clang discussion, that is "1 = opencl/cuda
> >> global, 2 = opencl_local/cuda_shared, 3 = opencl/cuda constant"
> >
> > You are assuming that the target device has different physical address
> > spaces (like PTX, R600 or TCE). What about targets with a single
> > address space (e.g. X86, ARM), where all OpenCL/CUDA address spaces are
> > (correctly) mapped to target address space 0?
> That seems like something only the backend needs to care about, but it is
> a very important thing to consider.
>
> You could extend my approach below with one more field which, for each
> address space, tells you the HW address space it maps to. Then the
> SelectionDAG builder can use that information (if it exists) to do the
> translation. That's perhaps not the cleanest implementation, but it would
> work.
>
> I was going to suggest that an alternative is to pass this information in
> to the load/store instructions in the backend, but it looks like that
> information is already available: MachinePointerInfo has a
> getAddrSpace() method. This could potentially allow you to optimize
> MachineInstrs using the same knowledge you have here, e.g., constness for
> addrspace(3) in MachineLICM.
>

I don't believe MachinePointerInfo is guaranteed to be meaningful for all loads/stores. It is populated with an llvm::Value*, but loads/stores generated in a backend may not be associated with a Value*.

>
> >
> >>
> >> !address_spaces = !{!0, !1, !2, !3}
> >>
> >> ; Address space tuple. { address space number, parent address space,
> >> additional properties }
> >> !0 = metadata !{ i32 0, !{}, !{} }
> >> !1 = metadata !{ i32 1, !0, !{} }
> >> !2 = metadata !{ i32 2, !0, !{} }
> >> !3 = metadata !{ i32 3, !0, !4 }
> >>
> >> !4 = metadata !{ !"constant" }
> >>
> >>
> >> This corresponds to 3 address spaces which all are members of address
> >> space 0, but which otherwise do not alias each other. I think this is
> >> roughly how TBAA does things. You can introduce any nodes in the tree
> >> of address spaces you need to make children in the tree alias each
> >> other.
> >>
> >> Additionally, the last address space is marked as constant, which could
> >> be used for optimization, e.g. LICM.
> >
> > You mean that 1, 2, 3 do not alias each other, but they all alias with
> > 0, right? Address space 0 is used to represent the OpenCL __private
> > address space; I think it would not alias with the others…
> Yeah, that's right, I have them all alias 0. If 0 is private and doesn't
> alias anything, then that's even better. Potentially that means the
> optimizer will be able to reorder any access to globals with any other
> access to the stack, for example. That will really help it optimize very
> well.
> >
> > BTW, I like the approach: it allows a fine description of the relationship
> > between address spaces that can be used in the middle-end, and the frontend
> > is responsible for the correct emission of this language-specific
> > information. That's great!
> Thanks :)
>

--
Thanks,

Justin Holewinski
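Justin's caveat implies that any MachineInstr-level use of address-space knowledge (such as the MachineLICM idea above) has to fall back to a conservative answer when the pointer information is missing. The hypothetical Python model below illustrates that fallback: `PointerInfo` only stands in for the idea of per-access pointer info and is not MachinePointerInfo's real interface, `None` models a backend-generated load/store with no associated Value*, and `CONSTANT_SPACES` assumes addrspace(3) is constant as in the thread.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical: logical address spaces known to be constant (addrspace(3) in the thread).
CONSTANT_SPACES: FrozenSet[int] = frozenset({3})


@dataclass(frozen=True)
class PointerInfo:
    """Stand-in for per-access pointer info; addr_space is None when no IR value is known."""
    addr_space: Optional[int]


def may_alias(a: PointerInfo, b: PointerInfo) -> bool:
    """Assume disjoint OpenCL spaces, but answer 'may alias' whenever info is missing."""
    if a.addr_space is None or b.addr_space is None:
        return True  # conservative: a backend-generated access could point anywhere
    return a.addr_space == b.addr_space


def load_from_constant_space(info: PointerInfo) -> bool:
    """Necessary (not sufficient) condition for a MachineLICM-style hoist:
    the load must be known to read constant memory; the address must also be
    loop-invariant, which this sketch does not check."""
    return info.addr_space is not None and info.addr_space in CONSTANT_SPACES
```

The key design choice is that missing information never enables an optimization: unknown accesses alias everything and are never treated as reads from constant memory.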