thr3ads.net - llvm dev - [LLVMdev] RFC: Representation of OpenCL Memory Spaces [Oct 2011]

Justin Holewinski

2011-Oct-13 13:46 UTC

[LLVMdev] RFC: Representation of OpenCL Memory Spaces

The problem I want to address in this discussion is the representation of
OpenCL/CUDA memory spaces in LLVM IR. As support for OpenCL and CUDA matures
within Clang, it is important that we provide a way to represent memory
spaces in a way that is (1) sufficiently generic that other language
front-ends can easily emit the needed annotations, and (2) sufficiently
specific that LLVM optimization passes can perform aggressive optimizations.

*1. Introduction*

Support for OpenCL/CUDA, and potentially future language extensions,
requires the compiler to differentiate between different types of memory.
For example, OpenCL has a "__global" memory space which corresponds to
globally-accessible data, and is usually off-chip memory in most
GPU configurations; and a "__local" memory space which corresponds to
work-group data (not accessible by work items outside of the current work
group), and is usually on-chip scratchpad memory in most GPU configurations.
 This information is currently represented in Clang/LLVM using the
addrspace() attribute on pointer types, where the OpenCL memory space to
target address space mapping is defined by the requested target (e.g. PTX,
X86, etc.).

This leads to a few issues.  First, some existing targets already use LLVM
address spaces for other purposes, so supporting OpenCL (as currently
supported in Clang) on these targets would require significant
re-structuring in the back-end.  Second, LLVM address spaces do not provide
enough semantic knowledge for optimization passes. For example, consider
pointer aliasing in the following kernel:

__kernel
void foo(__global float* a, __local float* b) {
  b[0] = a[0];
}

If we compile this with Clang targeting PTX, the resulting LLVM IR will be:

target datalayout = "e-p:32:32-i64:64:64-f64:64:64-n1:8:16:32:64"
target triple = "ptx32--"

define ptx_kernel void @foo(float* nocapture %a, float addrspace(4)*
nocapture %b) nounwind noinline {
entry:
  %0 = load float* %a, align 4, !tbaa !1
  store float %0, float addrspace(4)* %b, align 4, !tbaa !1
  ret void
}

!opencl.kernels = !{!0}

!0 = metadata !{void (float*, float addrspace(4)*)* @foo}
!1 = metadata !{metadata !"float", metadata !2}
!2 = metadata !{metadata !"omnipotent char", metadata !3}
!3 = metadata !{metadata !"Simple C/C++ TBAA", null}

Does the load from %a alias the store to %b?  Using the semantics of OpenCL,
they cannot alias since they correspond to two different memory spaces.
 However, if we just look at the information in the LLVM IR, then basic
alias analysis cannot determine if aliasing occurs because disjoint memory
is not a property of LLVM address spaces. Therefore, we are not able to
optimize as much as we could.

It is becoming increasingly clear to me that LLVM address spaces are not the
general solution to OpenCL/CUDA memory spaces. They are a convenient hack to
get things working in the short term, but I think a more long-term approach
should be discussed and decided upon now before the OpenCL and CUDA
implementations in Clang/LLVM get too mature. To be clear, I am not
advocating that *targets* change to a different method for representing
device memory spaces. The current use of address spaces to represent
different types of device memory is perfectly valid, IMHO. However, this
knowledge should not be encoded in front-ends and pre-SelectionDAG
optimization passes.


*2. Solutions*

A couple of solutions to this problem are presented here, with the hope that
the Clang/LLVM community will offer a constructive discussion on how best to
proceed with OpenCL/CUDA support in Clang/LLVM. The following list is in no
way meant to be exhaustive; it merely serves as a starting basis for
discussion.


*2A. Extend TBAA*

In theory, the type-based alias analysis pass could be extended to
(properly) support aliasing queries for pointers in OpenCL kernels.
 Currently, it has no way of knowing if two pointers in different address
spaces can alias, and in fact cannot know if this is the case given the
definition of LLVM address spaces.  Instead of programming it with
target-specific knowledge, it can be extended with language-specific
knowledge.  Instead of considering address spaces, the Clang portion of TBAA
can be programmed to use OpenCL attributes to extend its pointer metadata.
 Specifically, pointers to different memory spaces are in essence different
types and cannot alias.  For the kernel shown above, the resulting LLVM IR
could be:

; ModuleID = 'test1.cl'
target datalayout = "e-p:32:32-i64:64:64-f64:64:64-n1:8:16:32:64"
target triple = "ptx32--"

define ptx_kernel void @foo(float* nocapture %a, float addrspace(4)*
nocapture %b) nounwind noinline {
entry:
  %0 = load float* %a, align 4, !tbaa !1
  store float %0, float addrspace(4)* %b, align 4, !tbaa *!2*
  ret void
}

!opencl.kernels = !{!0}

!0 = metadata !{void (float*, float addrspace(4)*)* @foo}
*!1 = metadata !{metadata !"float$__global", metadata !3}*
*!2 = metadata !{metadata !"float$__local", metadata !3}*
!3 = metadata !{metadata !"omnipotent char", metadata !4}
!4 = metadata !{metadata !"Simple C/C++ TBAA", null}

Differences are bolded.  Here, the TBAA pass would be able to identify that
the loads and stores do not alias.  Of course, when compiling in
non-OpenCL/CUDA mode, TBAA would work just as before.
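
As a rough illustration of why the two tags above would be treated as
non-aliasing, here is a simplified Python model of the tree walk that scalar
TBAA performs: two accesses may alias only if one tag is the other or an
ancestor of it. This is a sketch of the reasoning, not LLVM's implementation;
the dictionary and function names are made up for the example, and the node
names are taken from the metadata shown above.

```python
# Each TBAA node maps to its parent, mirroring the metadata in the example:
# "float$__global" and "float$__local" are both children of "omnipotent char",
# whose parent is the root "Simple C/C++ TBAA".
TBAA_PARENT = {
    "float$__global": "omnipotent char",
    "float$__local": "omnipotent char",
    "omnipotent char": "Simple C/C++ TBAA",
    "Simple C/C++ TBAA": None,
}


def ancestors(node):
    """Return the node itself followed by every ancestor up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = TBAA_PARENT[node]
    return chain


def may_alias(tag_a, tag_b):
    """Accesses may alias only if one tag equals or is an ancestor of the other."""
    return tag_a in ancestors(tag_b) or tag_b in ancestors(tag_a)
```

With these tags, may_alias("float$__global", "float$__local") is False because
the two nodes are siblings, while either tag still may alias an access tagged
"omnipotent char".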

*Pros:*

Relatively easy to implement

*Cons:*

Does not solve the full problem, such as how to represent OpenCL memory
spaces in other backends, such as X86 which uses LLVM address spaces for
different purposes.

I see this solution as more of a short-term hack to solve the pointer
aliasing issue without actually addressing the larger issues.


*2B. Emit OpenCL/CUDA-specific Metadata or Attributes*

Instead of using LLVM address spaces to represent OpenCL/CUDA memory spaces,
language-specific annotations can be provided on types.  This can take the
form of metadata, or additional LLVM IR attributes on types and parameters,
such as:

; ModuleID = 'test1.cl'
target datalayout = "e-p:32:32-i64:64:64-f64:64:64-n1:8:16:32:64"
target triple = "ptx32--"

define *ocl_kernel* void @foo(float* nocapture *ocl_global* %a, float*
nocapture *ocl_local* %b) nounwind noinline {
entry:
  %0 = load float* %a, align 4
  store float %0, float* %b, align 4
  ret void
}

Instead of extending the LLVM IR language, this information could also be
encoded as metadata by either (1) emitting some global metadata that binds
useful properties to globals and parameters, or (2) extending LLVM IR to
allow attributes on parameters and globals.

Optimization passes can make use of these additional attributes to derive
useful properties, such as %a cannot alias %b. Then, back-ends can use these
attributes to emit proper code sequences based on the pointer attributes.
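
To make the two uses of such attributes concrete, the following Python sketch
models (1) an alias query that answers "no alias" when two parameters carry
different memory-space attributes, and (2) a back-end choosing an instruction
form from the attribute. Everything here is hypothetical: ocl_global and
ocl_local are the proposed attribute names from the example above, the tables
and function names are invented for illustration, and the sketch assumes the
OpenCL memory spaces are disjoint.

```python
# Hypothetical per-parameter annotations, as the proposed ocl_global and
# ocl_local attributes would supply for the kernel @foo.
PARAM_SPACE = {"%a": "ocl_global", "%b": "ocl_local"}

# Illustrative mapping from the proposed attributes to PTX state spaces.
PTX_STATE_SPACE = {"ocl_global": "global", "ocl_local": "shared"}


def may_alias(p, q):
    """Pointers in different memory spaces cannot alias; unknown is conservative."""
    space_p = PARAM_SPACE.get(p)
    space_q = PARAM_SPACE.get(q)
    if space_p is None or space_q is None:
        return True  # no annotation: assume the pointers may alias
    return space_p == space_q


def ptx_opcode(op, space):
    """Build a PTX f32 load/store mnemonic for the given memory-space attribute."""
    return f"{op}.{PTX_STATE_SPACE[space]}.f32"
```

Under these assumptions may_alias("%a", "%b") is False, and the load and store
in @foo would be selected as ld.global.f32 and st.shared.f32 respectively.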

*Pros:*

If done right, would solve the general problem

*Cons:*

Large implementation commitment; could potentially touch many parts of LLVM.


Any comments?

-- 

Thanks,

Justin Holewinski

Peter Collingbourne

2011-Oct-13 15:57 UTC

[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces

Hi Justin,

Thanks for bringing this up, I think it's important to discuss
these issues here.

On Thu, Oct 13, 2011 at 09:46:28AM -0400, Justin Holewinski wrote:
> It is becoming increasingly clear to me that LLVM address spaces are not the
> general solution to OpenCL/CUDA memory spaces. They are a convenient hack to
> get things working in the short term, but I think a more long-term approach
> should be discussed and decided upon now before the OpenCL and CUDA
> implementations in Clang/LLVM get too mature. To be clear, I am not
> advocating that *targets* change to a different method for representing
> device memory spaces. The current use of address spaces to represent
> different types of device memory is perfectly valid, IMHO. However, this
> knowledge should not be encoded in front-ends and pre-SelectionDAG
> optimization passes.
I disagree.  The targets should expose all the address spaces they
provide, and the frontend should know about the various address spaces
it needs to know about.  It is incumbent on the frontend to deliver
a valid IR for a particular language implementation, and part of
that involves knowing about the ABI requirements for the language
implementation (which may involve using specific address spaces)
and the capabilities of each target (including the capabilities of
the target's address spaces), together with the language semantics.
It is not the job of the optimisers or backend to know the semantics
for a specific language, a specific implementation of that language
or a specific ABI.
> 
> 
> *2. Solutions*
> 
> A couple of solutions to this problem are presented here, with the hope that
> the Clang/LLVM community will offer a constructive discussion on how best to
> proceed with OpenCL/CUDA support in Clang/LLVM. The following list is in no
> way meant to be exhaustive; it merely serves as a starting basis for
> discussion.
> 
> 
> *2A. Extend TBAA*
> 
> In theory, the type-based alias analysis pass could be extended to
> (properly) support aliasing queries for pointers in OpenCL kernels.
>  Currently, it has no way of knowing if two pointers in different address
> spaces can alias, and in fact cannot know if this is the case given the
> definition of LLVM address spaces.  Instead of programming it with
> target-specific knowledge, it can be extended with language-specific
> knowledge.  Instead of considering address spaces, the Clang portion of TBAA
> can be programmed to use OpenCL attributes to extend its pointer metadata.
>  Specifically, pointers to different memory spaces are in essence different
> types and cannot alias.  For the kernel shown above, the resulting LLVM IR
> could be:
> 
> ; ModuleID = 'test1.cl'
> target datalayout = "e-p:32:32-i64:64:64-f64:64:64-n1:8:16:32:64"
> target triple = "ptx32--"
> 
> define ptx_kernel void @foo(float* nocapture %a, float addrspace(4)*
> nocapture %b) nounwind noinline {
> entry:
>   %0 = load float* %a, align 4, !tbaa !1
>   store float %0, float addrspace(4)* %b, align 4, !tbaa *!2*
>   ret void
> }
> 
> !opencl.kernels = !{!0}
> 
> !0 = metadata !{void (float*, float addrspace(4)*)* @foo}
> *!1 = metadata !{metadata !"float$__global", metadata !3}*
> *!2 = metadata !{metadata !"float$__local", metadata !3}*
> !3 = metadata !{metadata !"omnipotent char", metadata !4}
> !4 = metadata !{metadata !"Simple C/C++ TBAA", null}
> 
> Differences are bolded.  Here, the TBAA pass would be able to identify that
> the loads and stores do not alias.  Of course, when compiling in
> non-OpenCL/CUDA mode, TBAA would work just as before.
I have to say that I much prefer the TBAA solution, as it encodes the
language semantics using the existing metadata for language semantics.
> *Pros:*
> 
> Relatively easy to implement
> 
> *Cons:*
> 
> Does not solve the full problem, such as how to represent OpenCL memory
> spaces in other backends, such as X86 which uses LLVM address spaces for
> different purposes.
This presupposes that we need a way of representing OpenCL address
spaces in IR targeting X86 (and targets which lack GPU-like address
spaces).  As far as I can tell, the only real representations of
OpenCL address spaces on such targets that we need are a way of
distinguishing the different address spaces for alias analysis
and a representation for __local variables allocated on the stack.
TBAA metadata would solve the first problem, and we already have
mechanisms in the frontend that could be used to solve the second.
> I see this solution as more of a short-term hack to solve the pointer
> aliasing issue without actually addressing the larger issues.
I remain to be persuaded that there are any "larger issues" to solve.
> *2B. Emit OpenCL/CUDA-specific Metadata or Attributes*
> 
> Instead of using LLVM address spaces to represent OpenCL/CUDA memory spaces,
> language-specific annotations can be provided on types.  This can take the
> form of metadata, or additional LLVM IR attributes on types and parameters,
> such as:
> 
> ; ModuleID = 'test1.cl'
> target datalayout = "e-p:32:32-i64:64:64-f64:64:64-n1:8:16:32:64"
> target triple = "ptx32--"
> 
> define *ocl_kernel* void @foo(float* nocapture *ocl_global* %a, float*
> nocapture *ocl_local* %b) nounwind noinline {
> entry:
>   %0 = load float* %a, align 4
>   store float %0, float* %b, align 4
>   ret void
> }
> 
> Instead of extending the LLVM IR language, this information could also be
> encoded as metadata by either (1) emitting some global metadata that binds
> useful properties to globals and parameters, or (2) extending LLVM IR to
> allow attributes on parameters and globals.
> 
> Optimization passes can make use of these additional attributes to derive
> useful properties, such as %a cannot alias %b. Then, back-ends can use these
> attributes to emit proper code sequences based on the pointer attributes.
> 
> *Pros:*
>
> If done right, would solve the general problem
>
> *Cons:*
>
> Large implementation commitment; could potentially touch many parts of LLVM.
You are being vague about what is required here.  A complete solution
following 2B would involve allowing these attributes on all pointer
types.  It would be very expensive to allow custom attributes or
metadata on pointer types, since they are used frequently in the IR,
and the common case is not to have attributes or metadata.  Also,
depending on how this is implemented, this would encode far too much
language specific information in the IR.

Thanks,
-- 
Peter

Villmow, Micah

2011-Oct-13 18:59 UTC

[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces

Justin,
Out of these options, I would take the metadata approach for AA support.

This doesn't solve the problem of different frontends/backends choosing
different address space representations for the same language, but it is the
correct approach for providing extra information to the optimizations.

The issue about memory spaces in general is a little different. For example,
based on the code you posted, address space 0 (default) is global in CUDA, but
in OpenCL, the default address space is private. So, how does the ptx backend
handle the differences? I think this is problematic as address spaces
are language constructs and hardcoded at the frontend, but the backend needs
to be able to interpret them differently based on the source language.

One way this could be done is to have the backends have options, but then
each backend would need to implement this. I think a better approach is 
to have some way to represent address spaces generically in the module.
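
One way to picture a generic, module-level representation is a table that
records what each numbered address space means for the module's source
language, so the backend looks the meaning up instead of hardcoding it. The
Python sketch below is purely illustrative: the table layout and names are
hypothetical, and only the two meanings of address space 0 come from the
example in this message.

```python
# Hypothetical per-language table carried alongside the module. Only the
# meaning of address space 0 comes from the message; the layout and every
# name here are illustrative.
ADDRESS_SPACE_MEANING = {
    "CUDA": {0: "global"},
    "OpenCL": {0: "private"},
}


def interpret(source_language, address_space):
    """Return what a numbered address space means for the given source language."""
    table = ADDRESS_SPACE_MEANING.get(source_language, {})
    # Fall back to an explicit marker rather than guessing a meaning.
    return table.get(address_space, "unknown")
```

A backend consulting such a table would treat address space 0 as global for a
CUDA module and as private for an OpenCL module, without a per-backend option.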

Micah

Justin Holewinski

2011-Oct-13 20:14 UTC

[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces

On Thu, Oct 13, 2011 at 11:57 AM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> Hi Justin,
>
> Thanks for bringing this up, I think it's important to discuss
> these issues here.
>
> On Thu, Oct 13, 2011 at 09:46:28AM -0400, Justin Holewinski wrote:
> > It is becoming increasingly clear to me that LLVM address spaces are not the
> > general solution to OpenCL/CUDA memory spaces. They are a convenient hack to
> > get things working in the short term, but I think a more long-term approach
> > should be discussed and decided upon now before the OpenCL and CUDA
> > implementations in Clang/LLVM get too mature. To be clear, I am not
> > advocating that *targets* change to a different method for representing
> > device memory spaces. The current use of address spaces to represent
> > different types of device memory is perfectly valid, IMHO. However, this
> > knowledge should not be encoded in front-ends and pre-SelectionDAG
> > optimization passes.
>
> I disagree.  The targets should expose all the address spaces they
> provide, and the frontend should know about the various address spaces
> it needs to know about.  It is incumbent on the frontend to deliver
> a valid IR for a particular language implementation, and part of
> that involves knowing about the ABI requirements for the language
> implementation (which may involve using specific address spaces)
> and the capabilities of each target (including the capabilities of
> the target's address spaces), together with the language semantics.
> It is not the job of the optimisers or backend to know the semantics
> for a specific language, a specific implementation of that language
> or a specific ABI.
>
But this is assuming that a target's address spaces have a valid 1 to 1
mapping between OpenCL memory spaces and back-end address spaces.  What
happens for a target such as x86?  Do we introduce pseudo address spaces
into the back-end just to satisfy the front-end OpenCL requirements?

> >
> >
> > *2. Solutions*
> >
> > A couple of solutions to this problem are presented here, with the hope that
> > the Clang/LLVM community will offer a constructive discussion on how best to
> > proceed with OpenCL/CUDA support in Clang/LLVM. The following list is in no
> > way meant to be exhaustive; it merely serves as a starting basis for
> > discussion.
> >
> >
> > *2A. Extend TBAA*
> >
> > In theory, the type-based alias analysis pass could be extended to
> > (properly) support aliasing queries for pointers in OpenCL kernels.
> >  Currently, it has no way of knowing if two pointers in different address
> > spaces can alias, and in fact cannot know if this is the case given the
> > definition of LLVM address spaces.  Instead of programming it with
> > target-specific knowledge, it can be extended with language-specific
> > knowledge.  Instead of considering address spaces, the Clang portion of TBAA
> > can be programmed to use OpenCL attributes to extend its pointer metadata.
> >  Specifically, pointers to different memory spaces are in essence different
> > types and cannot alias.  For the kernel shown above, the resulting LLVM IR
> > could be:
> >
> > ; ModuleID = 'test1.cl'
> > target datalayout = "e-p:32:32-i64:64:64-f64:64:64-n1:8:16:32:64"
> > target triple = "ptx32--"
> >
> > define ptx_kernel void @foo(float* nocapture %a, float addrspace(4)*
> > nocapture %b) nounwind noinline {
> > entry:
> >   %0 = load float* %a, align 4, !tbaa !1
> >   store float %0, float addrspace(4)* %b, align 4, !tbaa *!2*
> >   ret void
> > }
> >
> > !opencl.kernels = !{!0}
> >
> > !0 = metadata !{void (float*, float addrspace(4)*)* @foo}
> > *!1 = metadata !{metadata !"float$__global", metadata !3}*
> > *!2 = metadata !{metadata !"float$__local", metadata !3}*
> > !3 = metadata !{metadata !"omnipotent char", metadata !4}
> > !4 = metadata !{metadata !"Simple C/C++ TBAA", null}
> >
> > Differences are bolded.  Here, the TBAA pass would be able to identify that
> > the loads and stores do not alias.  Of course, when compiling in
> > non-OpenCL/CUDA mode, TBAA would work just as before.
>
> I have to say that I much prefer the TBAA solution, as it encodes the
> language semantics using the existing metadata for language semantics.
>
It's certainly the easiest to implement and would have the least impact
(practically zero) on existing passes.

>
> > *Pros:*
> >
> > Relatively easy to implement
> >
> > *Cons:*
> >
> > Does not solve the full problem, such as how to represent OpenCL memory
> > spaces in other backends, such as X86 which uses LLVM address spaces for
> > different purposes.
>
> This presupposes that we need a way of representing OpenCL address
> spaces in IR targeting X86 (and targets which lack GPU-like address
> spaces).  As far as I can tell, the only real representations of
> OpenCL address spaces on such targets that we need are a way of
> distinguishing the different address spaces for alias analysis
> and a representation for __local variables allocated on the stack.
> TBAA metadata would solve the first problem, and we already have
> mechanisms in the frontend that could be used to solve the second.
>
Which mechanisms could be used to differentiate between thread-private and
__local data?

>
> > I see this solution as more of a short-term hack to solve the pointer
> > aliasing issue without actually addressing the larger issues.
>
> I remain to be persuaded that there are any "larger issues" to solve.
>
> > *2B. Emit OpenCL/CUDA-specific Metadata or Attributes*
> >
> > Instead of using LLVM address spaces to represent OpenCL/CUDA memory spaces,
> > language-specific annotations can be provided on types.  This can take the
> > form of metadata, or additional LLVM IR attributes on types and parameters,
> > such as:
> >
> > ; ModuleID = 'test1.cl'
> > target datalayout = "e-p:32:32-i64:64:64-f64:64:64-n1:8:16:32:64"
> > target triple = "ptx32--"
> >
> > define *ocl_kernel* void @foo(float* nocapture *ocl_global* %a, float*
> > nocapture *ocl_local* %b) nounwind noinline {
> > entry:
> >   %0 = load float* %a, align 4
> >   store float %0, float* %b, align 4
> >   ret void
> > }
> >
> > Instead of extending the LLVM IR language, this information could also be
> > encoded as metadata by either (1) emitting some global metadata that binds
> > useful properties to globals and parameters, or (2) extending LLVM IR to
> > allow attributes on parameters and globals.
> >
> > Optimization passes can make use of these additional attributes to derive
> > useful properties, such as %a cannot alias %b. Then, back-ends can use these
> > attributes to emit proper code sequences based on the pointer attributes.
> >
> > *Pros:*
> >
> > If done right, would solve the general problem
> >
> > *Cons:*
> >
> > Large implementation commitment; could potentially touch many parts of LLVM.
>
> You are being vague about what is required here.  A complete solution
> following 2B would involve allowing these attributes on all pointer
> types.  It would be very expensive to allow custom attributes or
> metadata on pointer types, since they are used frequently in the IR,
> and the common case is not to have attributes or metadata.  Also,
> depending on how this is implemented, this would encode far too much
> language specific information in the IR.
>
I agree that this would be expensive, and I'm not necessarily advocating it.
If the consensus is that TBAA extensions are sufficient for all cases, then
I'm fine with that.  It's much less work. :)

I just want to make sure we're covering all of our bases before we proceed
too far with this.

>
> Thanks,
> --
> Peter
>


-- 

Thanks,

Justin Holewinski