Peter Collingbourne
2011-Oct-13 15:57 UTC
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
Hi Justin,

Thanks for bringing this up, I think it's important to discuss
these issues here.

On Thu, Oct 13, 2011 at 09:46:28AM -0400, Justin Holewinski wrote:
> It is becoming increasingly clear to me that LLVM address spaces are not the
> general solution to OpenCL/CUDA memory spaces. They are a convenient hack to
> get things working in the short term, but I think a more long-term approach
> should be discussed and decided upon now before the OpenCL and CUDA
> implementations in Clang/LLVM get too mature. To be clear, I am not
> advocating that *targets* change to a different method for representing
> device memory spaces. The current use of address spaces to represent
> different types of device memory is perfectly valid, IMHO. However, this
> knowledge should not be encoded in front-ends and pre-SelectionDAG
> optimization passes.

I disagree. The targets should expose all the address spaces they
provide, and the frontend should know about the various address spaces
it needs to know about. It is incumbent on the frontend to deliver
a valid IR for a particular language implementation, and part of
that involves knowing about the ABI requirements for the language
implementation (which may involve using specific address spaces)
and the capabilities of each target (including the capabilities of
the target's address spaces), together with the language semantics.
It is not the job of the optimisers or backend to know the semantics
for a specific language, a specific implementation of that language
or a specific ABI.

> *2. Solutions*
>
> A couple of solutions to this problem are presented here, with the hope that
> the Clang/LLVM community will offer a constructive discussion on how best to
> proceed with OpenCL/CUDA support in Clang/LLVM. The following list is in no
> way meant to be exhaustive; it merely serves as a starting basis for
> discussion.
>
> *2A. Extend TBAA*
>
> In theory, the type-based alias analysis pass could be extended to
> (properly) support aliasing queries for pointers in OpenCL kernels.
> Currently, it has no way of knowing if two pointers in different address
> spaces can alias, and in fact cannot know if this is the case given the
> definition of LLVM address spaces. Instead of programming it with
> target-specific knowledge, it can be extended with language-specific
> knowledge. Instead of considering address spaces, the Clang portion of TBAA
> can be programmed to use OpenCL attributes to extend its pointer metadata.
> Specifically, pointers to different memory spaces are in essence different
> types and cannot alias. For the kernel shown above, the resulting LLVM IR
> could be:
>
>   ; ModuleID = 'test1.cl'
>   target datalayout = "e-p:32:32-i64:64:64-f64:64:64-n1:8:16:32:64"
>   target triple = "ptx32--"
>
>   define ptx_kernel void @foo(float* nocapture %a,
>                               float addrspace(4)* nocapture %b) nounwind noinline {
>   entry:
>     %0 = load float* %a, align 4, !tbaa !1
>     store float %0, float addrspace(4)* %b, align 4, !tbaa *!2*
>     ret void
>   }
>
>   !opencl.kernels = !{!0}
>
>   !0 = metadata !{void (float*, float addrspace(4)*)* @foo}
>   *!1 = metadata !{metadata !"float$__global", metadata !3}*
>   *!2 = metadata !{metadata !"float$__local", metadata !3}*
>   !3 = metadata !{metadata !"omnipotent char", metadata !4}
>   !4 = metadata !{metadata !"Simple C/C++ TBAA", null}
>
> Differences are bolded. Here, the TBAA pass would be able to identify that
> the loads and stores do not alias. Of course, when compiling in
> non-OpenCL/CUDA mode, TBAA would work just as before.

I have to say that I much prefer the TBAA solution, as it encodes the
language semantics using the existing metadata for language semantics.

> *Pros:*
>
> Relatively easy to implement
>
> *Cons:*
>
> Does not solve the full problem, such as how to represent OpenCL memory
> spaces in other backends, such as X86 which uses LLVM address spaces for
> different purposes.

This presupposes that we need a way of representing OpenCL address
spaces in IR targeting X86 (and targets which lack GPU-like address
spaces). As far as I can tell, the only real representations of
OpenCL address spaces on such targets that we need are a way of
distinguishing the different address spaces for alias analysis
and a representation for __local variables allocated on the stack.
TBAA metadata would solve the first problem, and we already have
mechanisms in the frontend that could be used to solve the second.

> I see this solution as more of a short-term hack to solve the pointer
> aliasing issue without actually addressing the larger issues.

I remain to be persuaded that there are any "larger issues" to solve.

> *2B. Emit OpenCL/CUDA-specific Metadata or Attributes*
>
> Instead of using LLVM address spaces to represent OpenCL/CUDA memory spaces,
> language-specific annotations can be provided on types. This can take the
> form of metadata, or additional LLVM IR attributes on types and parameters,
> such as:
>
>   ; ModuleID = 'test1.cl'
>   target datalayout = "e-p:32:32-i64:64:64-f64:64:64-n1:8:16:32:64"
>   target triple = "ptx32--"
>
>   define *ocl_kernel* void @foo(float* nocapture *ocl_global* %a,
>                                 float* nocapture *ocl_local* %b) nounwind noinline {
>   entry:
>     %0 = load float* %a, align 4
>     store float %0, float* %b, align 4
>     ret void
>   }
>
> Instead of extending the LLVM IR language, this information could also be
> encoded as metadata by either (1) emitting some global metadata that binds
> useful properties to globals and parameters, or (2) extending LLVM IR to
> allow attributes on parameters and globals.
>
> Optimization passes can make use of these additional attributes to derive
> useful properties, such as %a cannot alias %b. Then, back-ends can use these
> attributes to emit proper code sequences based on the pointer attributes.
>
> *Pros:*
>
> If done right, would solve the general problem
>
> *Cons:*
>
> Large implementation commitment; could potentially touch many parts of LLVM.

You are being vague about what is required here. A complete solution
following 2B would involve allowing these attributes on all pointer
types. It would be very expensive to allow custom attributes or
metadata on pointer types, since they are used frequently in the IR,
and the common case is not to have attributes or metadata. Also,
depending on how this is implemented, this would encode far too much
language specific information in the IR.

Thanks,
--
Peter
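[Editor's note: the TBAA scheme discussed above boils down to a tree query. Two access tags may alias only if one lies on the other's path to the root, so the sibling tags "float$__global" and "float$__local" are declared non-aliasing, while "omnipotent char" (their common parent) may alias either. This is an illustrative model of that rule, not LLVM's actual TBAA implementation:]

```python
# Parent links of the TBAA tree from the IR example above.
PARENT = {
    "float$__global": "omnipotent char",
    "float$__local": "omnipotent char",
    "omnipotent char": "Simple C/C++ TBAA",
}

def ancestors(tag):
    # Walk from a tag up to the root of the TBAA tree.
    while tag is not None:
        yield tag
        tag = PARENT.get(tag)

def may_alias(a, b):
    # Accesses may alias iff one tag is an ancestor of the other
    # (which includes the case a == b).
    return a in ancestors(b) or b in ancestors(a)

print(may_alias("float$__global", "float$__local"))   # False: sibling tags
print(may_alias("omnipotent char", "float$__local"))  # True: parent tag
```

Under this rule the load via %a (tagged !1) and the store via %b (tagged !2) are provably non-aliasing, which is exactly the answer the extended TBAA pass would give.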
Peter Collingbourne
2011-Oct-13 20:16 UTC
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
On Thu, Oct 13, 2011 at 06:59:47PM +0000, Villmow, Micah wrote:
> Justin,
>  Out of these options, I would take the metadata approach for AA support.
>
> This doesn't solve the problem of different frontend/backends choosing
> different address space representations for the same language, but is the
> correct approach for providing extra information to the optimizations.
>
> The issue about memory spaces in general is a little different. For example,
> based on the code you posted below, address space 0 (default) is global in
> CUDA, but in OpenCL, the default address space is private. So, how does the
> ptx backend handle the differences? I think this is problematic as address
> spaces are language constructs and hardcoded at the frontend, but the
> backend needs to be able to interpret them differently based on the source
> language.
>
> One way this could be done is to have the backends have options, but then
> each backend would need to implement this. I think a better approach is
> to have some way to represent address spaces generically in the module.

Address space 0 (i.e. the default address space) should always be the
address space on which the stack resides. This is a requirement for
alloca to work correctly. So for PTX, I think that address space 0
should be the local state space (but I noticed that at the moment it is
the global state space, which seems wrong IMHO).

As I mentioned in my previous email, I don't think that the backend
should interpret address spaces for the source language, as this places
too much language-specific functionality in the backend.

The situation regarding default address spaces in CUDA is more complex,
but suffice it to say that there is usually no such thing as a "default"
address space in CUDA, because the language does not contain support
for address space qualified pointer types (only address space qualified
declarations). NVIDIA's CUDA compiler, nvopencc, determines the correct
address space for each pointer using type inference (there is an
explanation of nvopencc's algorithm in the src/doc/ssa_memory_space.txt
file in the nvopencc distribution). Our compiler should eventually
contain a similar algorithm.

Thanks,
--
Peter
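[Editor's note: the type-inference idea mentioned above can be sketched as a forward dataflow fixed point: seed each pointer whose root is known (a kernel parameter, a __shared__ variable, a stack slot) with that space, propagate spaces through pointer copies, and demote any pointer that receives conflicting spaces to a generic space. This is only an illustrative sketch under those assumptions, not nvopencc's actual algorithm; all names are invented:]

```python
GENERIC = "generic"

def infer_spaces(roots, copies):
    """roots: {ptr: known space}; copies: list of (dst, src) pointer copies."""
    spaces = dict(roots)
    changed = True
    while changed:  # iterate until no assignment changes (fixed point)
        changed = False
        for dst, src in copies:
            src_space = spaces.get(src)
            if src_space is None:
                continue  # source space not yet known
            old = spaces.get(dst)
            # A pointer fed by conflicting spaces must use the generic space.
            new = src_space if old in (None, src_space) else GENERIC
            if new != old:
                spaces[dst] = new
                changed = True
    return spaces

spaces = infer_spaces(
    roots={"%a": "global", "%s": "shared"},
    copies=[("%p", "%a"), ("%q", "%s"), ("%r", "%p"), ("%r", "%q")],
)
print(spaces["%r"])  # fed by both global and shared -> "generic"
```

A pointer that ends up in the generic space would then be lowered to generic (tagged or software-translated) addressing, while pointers with a unique inferred space can use the cheaper space-specific load/store forms.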
Justin Holewinski
2011-Oct-13 20:16 UTC
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
On Thu, Oct 13, 2011 at 2:59 PM, Villmow, Micah <Micah.Villmow at amd.com> wrote:
> Justin,
>  Out of these options, I would take the metadata approach for AA support.
>
> This doesn't solve the problem of different frontend/backends choosing
> different address space representations for the same language, but is the
> correct approach for providing extra information to the optimizations.
>
> The issue about memory spaces in general is a little different. For example,
> based on the code you posted below, address space 0 (default) is global in
> CUDA, but in OpenCL, the default address space is private. So, how does the
> ptx backend handle the differences? I think this is problematic as address
> spaces are language constructs and hardcoded at the frontend, but the
> backend needs to be able to interpret them differently based on the source
> language.
>
> One way this could be done is to have the backends have options, but then
> each backend would need to implement this. I think a better approach is
> to have some way to represent address spaces generically in the module.

That's sort of where I was trying to go with this. I'm thinking of some
sort of annotation like address spaces, but with semantic properties
associated with them instead of leaving the definitions solely up to the
target. Then again, this may be too high-level for LLVM IR, which is
target dependent to begin with.

> [Peter Collingbourne's message of Thursday, October 13, 2011, quoted in
> full above, snipped]

--
Thanks,

Justin Holewinski
Mon P Wang
2011-Oct-13 23:56 UTC
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
Hi,

Tanya and I also prefer the extended TBAA solution as it naturally fits
with LLVM. From my understanding of TBAA, it seems to provide the power
to describe the relationship between address spaces for alias analysis,
i.e., it can describe whether two address spaces are disjoint or whether
one may nest within another. For OpenCL, it is most useful to indicate
that address spaces are disjoint from the point of view of alias
analysis even though the underlying memory may be the same, as on x86.
The question is: is there something missing in TBAA such that it can't
properly describe the semantics we want for an address space?

-- Mon Ping

On Oct 13, 2011, at 1:14 PM, Justin Holewinski wrote:
> On Thu, Oct 13, 2011 at 11:57 AM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>> I disagree. The targets should expose all the address spaces they
>> provide, and the frontend should know about the various address spaces
>> it needs to know about. [...]
>
> But this is assuming that a target's address spaces have a valid 1 to 1
> mapping between OpenCL memory spaces and back-end address spaces. What
> happens for a target such as x86? Do we introduce pseudo address spaces
> into the back-end just to satisfy the front-end OpenCL requirements?
>
>> I have to say that I much prefer the TBAA solution, as it encodes the
>> language semantics using the existing metadata for language semantics.
>
> It's certainly the easiest to implement and would have the least impact
> (practically zero) on existing passes.
>
>> [...] As far as I can tell, the only real representations of
>> OpenCL address spaces on such targets that we need are a way of
>> distinguishing the different address spaces for alias analysis
>> and a representation for __local variables allocated on the stack.
>> TBAA metadata would solve the first problem, and we already have
>> mechanisms in the frontend that could be used to solve the second.
>
> Which mechanisms could be used to differentiate between thread-private
> and __local data?
>
>> It would be very expensive to allow custom attributes or
>> metadata on pointer types, since they are used frequently in the IR,
>> and the common case is not to have attributes or metadata. [...]
>
> I agree that this would be expensive, and I'm not necessarily advocating
> it. If the consensus is that TBAA extensions are sufficient for all
> cases, then I'm fine with that. It's much less work. :)
>
> I just want to make sure we're covering all of our bases before we
> proceed too far with this.
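[Editor's note: Mon Ping's point that address spaces can be "disjoint from the point of view of alias analysis even though the underlying memory may be the same" amounts to a per-target lowering table that is many-to-one on flat-memory targets. The sketch below illustrates this with hypothetical address-space numbers (the PTX values are assumptions modelled loosely on the thread's IR example, not authoritative): on PTX the OpenCL spaces lower to distinct numbers, while on x86 they all collapse to 0, so disjointness must be carried by TBAA tags rather than by the numeric space:]

```python
# Hypothetical per-target maps from OpenCL memory spaces to the numeric
# LLVM address spaces a backend would actually use.
PTX_MAP = {"__private": 0, "__global": 1, "__constant": 2, "__local": 4}
X86_MAP = {"__private": 0, "__global": 0, "__constant": 0, "__local": 0}

def lower(space, target_map):
    # Lowering is a plain table lookup; it may be many-to-one.
    return target_map[space]

# PTX keeps the spaces distinct; x86 folds them all into space 0.
print(lower("__global", PTX_MAP) != lower("__local", PTX_MAP))  # True
print(lower("__global", X86_MAP) == lower("__local", X86_MAP))  # True
```

On the x86-style target the numeric address space carries no aliasing information at all, which is precisely why the disjointness fact has to survive in language-level metadata such as the extended TBAA tags.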
Peter Collingbourne
2011-Oct-14 17:40 UTC
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
On Thu, Oct 13, 2011 at 04:14:09PM -0400, Justin Holewinski wrote:
> But this is assuming that a target's address spaces have a valid 1 to 1
> mapping between OpenCL memory spaces and back-end address spaces. What
> happens for a target such as x86? Do we introduce pseudo address spaces
> into the back-end just to satisfy the front-end OpenCL requirements?

I don't see how anything I wrote implies that. For x86, there would
presumably be a many-to-one mapping.

> > This presupposes that we need a way of representing OpenCL address
> > spaces in IR targeting X86 (and targets which lack GPU-like address
> > spaces). As far as I can tell, the only real representations of
> > OpenCL address spaces on such targets that we need are a way of
> > distinguishing the different address spaces for alias analysis
> > and a representation for __local variables allocated on the stack.
> > TBAA metadata would solve the first problem, and we already have
> > mechanisms in the frontend that could be used to solve the second.
>
> Which mechanisms could be used to differentiate between thread-private
> and __local data?

In OpenCL C, it is illegal to declare a variable with static storage
duration in the __private address space (section 6.5: "All program
scope variables must be declared in the __constant address space.";
section 6.8g: "The extern, static, auto and register storage-class
specifiers are not supported."). This implies that there is no way for
pointers to the __private address space to be usefully shared between
work-items without invoking undefined behaviour, so the question is
moot (i.e. __private does not need to be implemented using thread-local
storage).

It is possible to write OpenCL C code which shares pointers to
__private memory using barrier synchronisation, but since there is no
way to queue a memory fence across __private memory (only __local and
__global), any access to that memory would invoke undefined behaviour.
For example, consider the following (2 work-items in a work-group):

  __kernel void foo() {
    int x = 0;
    int *__local p;
    if (get_local_id(0) == 0)
      p = &x;
    barrier(CLK_LOCAL_MEM_FENCE);
    if (get_local_id(0) == 1)
      *p = 1;
    barrier(CLK_LOCAL_MEM_FENCE);
    // what is the value of x in work-item 0 here?
  }

The value of x at the comment is undefined, because no fence across
__private memory was queued.

Perhaps more straightforwardly, referring to the following passage in
section 3.3 ("memory model") of the OpenCL specification: "Private
Memory: A region of memory private to a work-item. Variables defined
in one work-item's private memory are not visible to another
work-item." We can interpret the term "not visible" here as meaning
that accesses across work-items invoke undefined behaviour, so in the
example above, the write to x via p would itself be undefined.

Thanks,
--
Peter