On Aug 7, 2013, at 7:23 PM, Michele Scandale <michele.scandale at gmail.com> wrote:

> On 08/08/2013 03:52 AM, Pete Cooper wrote:
>>> Why should a backend be responsible for (meaning have knowledge of) a
>>> mapping between high-level address spaces and low-level address spaces?
>> That's true. I'm thinking entirely from the perspective of the backend
>> doing CL/CUDA. But actually LLVM is language agnostic. That is still
>> something the metadata could solve. The front-end could generate the
>> metadata I suggested earlier, which will tell the backend how to do the
>> mapping. Then the backend only needs to read the metadata.
>
> From here I understand that in the IR there are addrspace(N) where
> N=0,1,2,3,... according to the target-independent mapping done by the
> frontend to represent different address spaces (for OpenCL 1.2,
> 0 = private, 1 = global, 2 = local, 3 = constant).
>
> Then the frontend emits metadata that contains the map from "language
> address spaces" to "target address spaces" (for X86 this would be
> 0->0, 1->0, 2->0, 3->0).
>
> Finally, instruction selection will use this information to perform the
> selection correctly, tagging the machine instruction with both the
> logical and the physical address space.

Sounds good.

>>> Why should the X86 backend be aware of OpenCL address spaces, or any
>>> other address spaces?
>> The only reason I can think of is that this allows the address-space
>> alias analysis to occur, and all of the optimizations you might want to
>> implement on top of it. Otherwise you'll need the front-end to put
>> everything in address space 0, and you'll have lost some opportunity to
>> optimize in that way for x86.
>
> The mapping phase will allow the backend precondition (no address spaces
> other than zero) to be satisfied. With both pieces of information kept in
> the IR and afterwards, the alias analysis should be feasible.
>>> Like for other aspects, I find it more direct and intuitive to
>>> anticipate target information in the frontend (this is already done
>>> and accepted) than to make the middle-end and back-end dependent on
>>> the source language (no specific language knowledge is required,
>>> because different frontends could be built on top of this).
>>>
>>> Maybe a way to decouple the frontend and the specific target is
>>> possible, in order to have in the target-independent part of the code
>>> generator support for a set of languages with common concepts (like
>>> OpenCL/CUDA), but it's still language dependent!
>> Yes, that could work. Actually the numbers are probably not the
>> important thing. It's the names that really tell you what the address
>> space is for. The backend needs to know what loading from a local
>> means. It's almost unimportant what specific number a front-end chooses
>> for that address space. We know the front-end is really going to choose
>> 2 (from what you said earlier), but the backend just needs to know how
>> to load/store a local.
>>
>> So perhaps the front-end should really be generating metadata which
>> tells the target what address space it chose for a memory space. That is:
>>
>> !private_memory = metadata !{ i32 0 }
>> !global_memory = metadata !{ i32 1 }
>> !local_memory = metadata !{ i32 2 }
>> !constant_memory = metadata !{ i32 3 }
>>
>> Unfortunately you'd have to essentially reserve those metadata names for
>> your use (better names than I chose, of course), but this might be
>> reasonable. You could alternatively use the example I first gave, but
>> just add a name field to it.
>>
>> I guess targets would have to either assert or default to address space
>> 0 when they see an address space without associated metadata.
>
> This part is not clear; still, in the X86 backend private/global/local
> memories are meaningless. Indeed, this is limited to the set of languages
> that support these abstractions.

Yeah.
The address spaces don't mean anything in terms of instruction selection for
x86. You mentioned earlier putting the physical and logical address spaces on
the machine instr. If you wanted, you could use these to perform code motion
on x86 which would otherwise not be possible, but that's the only reason I
can think of for why x86 would benefit from address space information in the
backend.

> IMO a more general solution would be to fully delegate the mapping
> resolution to the frontend, generating the map from logical to physical
> address spaces.
>
> Considering also the fact that addrspace is used to support the C address
> space extension, which maps from C to physically numbered address spaces,
> maybe a default implicit identity function as the mapping would be fine
> when no metadata is provided.

Yeah, I think a default identity mapping is a good idea. x86, for example,
uses address spaces 256 and 257 for the fs and gs segments. Without this
default mapping, tests using those segments would fail.

Thanks,
Pete

> Thanks again.
>
> -Michele
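The default-identity lookup Michele proposes and Pete agrees with above can be sketched in a few lines. This is a hypothetical helper, not actual LLVM API; the std::map stands in for the frontend-emitted mapping metadata:

```cpp
#include <map>

// Hypothetical sketch of the proposed lookup: logical address spaces that
// the frontend's metadata maps are translated, and anything unmapped
// (e.g. x86's 256/257 for the fs/gs segments, which carry no metadata)
// falls through unchanged via the identity mapping.
unsigned mapAddrSpace(const std::map<unsigned, unsigned> &Mapping,
                      unsigned Logical) {
  auto It = Mapping.find(Logical);
  return It == Mapping.end() ? Logical : It->second;
}
```

With an OpenCL-style mapping {0->0, 1->0, 2->0, 3->0}, every language address space lowers to 0 on x86, while the segment address spaces 256 and 257 keep their numbers, so existing tests using those segments keep working.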
On 8 Aug 2013, at 04:23, Pete Cooper <peter_cooper at apple.com> wrote:

> On Aug 7, 2013, at 7:23 PM, Michele Scandale <michele.scandale at gmail.com> wrote:
>
>> On 08/08/2013 03:52 AM, Pete Cooper wrote:
>>
>> From here I understand that in the IR there are addrspace(N) where
>> N=0,1,2,3,... according to the target-independent mapping done by the
>> frontend to represent different address spaces (for OpenCL 1.2,
>> 0 = private, 1 = global, 2 = local, 3 = constant).
>>
>> Then the frontend emits metadata that contains the map from "language
>> address spaces" to "target address spaces" (for X86 this would be
>> 0->0, 1->0, 2->0, 3->0).
>>
>> Finally, instruction selection will use this information to perform the
>> selection correctly, tagging the machine instruction with both the
>> logical and the physical address space.
>
> Sounds good.

What happens when I link together two IR modules from different front ends
that have different language-specific address spaces?

I would be very hesitant about using address spaces until we've fixed their
semantics to disallow bitcasts between different address spaces and require
an explicit address space cast.
To illustrate the problem, consider the following trivial example:

typedef __attribute__((address_space(256))) int* gsptr;

int *toglobal(gsptr foo)
{
	return (int*)foo;
}

int load(int *foo)
{
	return *foo;
}

int loadgs(gsptr foo)
{
	return *foo;
}

int loadgs2(gsptr foo)
{
	return *toglobal(foo);
}

When we compile this to LLVM IR with clang (disabling asynchronous unwind
tables for clarity), at -O2 we get this:

define i32* @toglobal(i32 addrspace(256)* %foo) nounwind readnone ssp {
  %1 = bitcast i32 addrspace(256)* %foo to i32*
  ret i32* %1
}

define i32 @load(i32* nocapture %foo) nounwind readonly ssp {
  %1 = load i32* %foo, align 4, !tbaa !0
  ret i32 %1
}

define i32 @loadgs(i32 addrspace(256)* nocapture %foo) nounwind readonly ssp {
  %1 = load i32 addrspace(256)* %foo, align 4, !tbaa !0
  ret i32 %1
}

define i32 @loadgs2(i32 addrspace(256)* nocapture %foo) nounwind readonly ssp {
  %1 = bitcast i32 addrspace(256)* %foo to i32*
  %2 = load i32* %1, align 4, !tbaa !0
  ret i32 %2
}

Note that in loadgs2, the call to toglobal has been inlined, and so the back
end will just see a bitcast, which SelectionDAG treats as a no-op. The
assembly we get from this is:

_toglobal:                              ## @toglobal
## BB#0:
	pushq	%rbp
	movq	%rsp, %rbp
	movq	%rdi, %rax
	popq	%rbp
	ret

load:                                   ## @load
## BB#0:
	pushq	%rbp
	movq	%rsp, %rbp
	movl	(%rdi), %eax
	popq	%rbp
	ret

	.globl	_loadgs
	.align	4, 0x90
loadgs:                                 ## @loadgs
## BB#0:
	pushq	%rbp
	movq	%rsp, %rbp
	movl	%gs:(%rdi), %eax
	popq	%rbp
	ret

	.globl	_loadgs2
	.align	4, 0x90
loadgs2:                                ## @loadgs2
## BB#0:
	pushq	%rbp
	movq	%rsp, %rbp
	movl	(%rdi), %eax
	popq	%rbp
	ret

loadgs() has been compiled correctly: it uses the parameter as a gs-relative
address and performs the load. The assembly for load() and loadgs2(),
however, is identical: both are treating the parameter as a linear (not
gs-relative) address. The cast has been lost. This is even clearer when you
look at toglobal(), which has just become a no-op.
The correct code for this should be (I believe):

_toglobal:                              ## @toglobal
## BB#0:
	pushq	%rbp
	movq	%rsp, %rbp
	lea	%gs:(%rdi), %rax
	popq	%rbp
	ret

In the inlined version, the lea and movl should be combined into a single
gs-relative movl.

Until we can generate correct code from IR containing address spaces,
discussion of how to optimise this IR seems premature.

David
My view is that modules with different data layouts should be considered
incompatible. Data layouts are inherently target/language specific, and I
don't view this any differently than combining IR modules compiled for
different architectures.

Micah

> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of David Chisnall
> Sent: Thursday, August 08, 2013 2:04 AM
> To: Pete Cooper
> Cc: LLVM Developers Mailing List
> Subject: Re: [LLVMdev] Address space extension
>
> On 8 Aug 2013, at 04:23, Pete Cooper <peter_cooper at apple.com> wrote:
>
> > On Aug 7, 2013, at 7:23 PM, Michele Scandale
> > <michele.scandale at gmail.com> wrote:
> >
> >> On 08/08/2013 03:52 AM, Pete Cooper wrote:
> >>
> >> From here I understand that in the IR there are addrspace(N) where
> >> N=0,1,2,3,... according to the target-independent mapping done by the
> >> frontend to represent different address spaces (for OpenCL 1.2,
> >> 0 = private, 1 = global, 2 = local, 3 = constant).
> >>
> >> Then the frontend emits metadata that contains the map from "language
> >> address spaces" to "target address spaces" (for X86 this would be
> >> 0->0, 1->0, 2->0, 3->0).
> >>
> >> Finally, instruction selection will use this information to perform
> >> the selection correctly, tagging the machine instruction with both
> >> logical and physical address spaces.
> > Sounds good.
>
> What happens when I link together two IR modules from different front
> ends that have different language-specific address spaces?
>
> I would be very hesitant about using address spaces until we've fixed
> their semantics to disallow bitcasts between different address spaces
> and require an explicit address space cast.
> To illustrate the problem, consider the following trivial example:
>
> [...]
>
> loadgs() has been compiled correctly. It uses the parameter as a
> gs-relative address and performs the load. The assembly for load() and
> loadgs2(), however, is identical: both are treating the parameter as a
> linear (not gs-relative) address. The cast has been lost.
>
> Until we can generate correct code from IR containing address spaces,
> discussion of how to optimise this IR seems premature.
>
> David
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
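Micah's rule at the top of this message amounts to a simple precondition a linker could check before combining modules. A minimal sketch of that check (a hypothetical helper, not the actual llvm::Linker API, which performs considerably more validation):

```cpp
#include <string>

// Hypothetical precondition check sketching Micah's rule: refuse to
// combine IR modules whose target data layout strings differ, the same
// way one would refuse modules built for different architectures.
bool canLinkModules(const std::string &LayoutA, const std::string &LayoutB) {
  // Any mismatch means the modules are incompatible.
  return LayoutA == LayoutB;
}
```

Under this rule, address-space incoherence between modules surfaces as a hard link-time failure rather than silently merged IR.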
On 08/08/2013 11:04 AM, David Chisnall wrote:
> What happens when I link together two IR modules from different front
> ends that have different language-specific address spaces?

I agree with Micah: if, when linking two IR modules, there are
inconsistencies (e.g. in module1 2 -> 1 and in module2 2 -> 3), then the
modules are incompatible and the link process should fail.

> I would be very hesitant about using address spaces until we've fixed
> their semantics to disallow bitcasts between different address spaces
> and require an explicit address space cast. To illustrate the problem,
> consider the following trivial example:
>
> [...]
>
> Note that in loadgs2, the call to toglobal has been inlined and so the
> back end will just see a bitcast, which SelectionDAG treats as a no-op.
> The assembly we get from this is:
>
> [...]
>
> loadgs() has been compiled correctly. It uses the parameter as a
> gs-relative address and performs the load. The assembly for load() and
> loadgs2(), however, is identical: both are treating the parameter as a
> linear (not gs-relative) address. The cast has been lost. This is even
> clearer when you look at toglobal(), which has just become a no-op.
> The correct code for this should be (I believe):
>
> [...]
>
> In the inlined version, the lea and movl should be combined into a
> single gs-relative movl.
>
> Until we can generate correct code from IR containing address spaces,
> discussion of how to optimise this IR seems premature.

I've done a quick test: the problem is that the BITCAST node is not
generated during SelectionDAG building. If you look at
SelectionDAGBuilder::visitBitCast, you will see that the node is generated
only if the operand value of the bitcast operation and the result value
have different EVTs. The address space information is not handled in EVT,
so pointers in different address spaces are mapped to the same EVT, which
implies a missing BITCAST node.

Maybe rethinking the way address spaces are handled at the interface
between the middle-end and the backend would also allow fixing these kinds
of problems.

BTW, I think this specific problem can be used for a bug report :-).

Thanks.

-Michele
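Michele's diagnosis can be modelled in a few lines. This is a toy model, not LLVM's actual EVT class: because the EVT of a pointer records only its kind and bit width, pointers into different address spaces compare equal, and visitBitCast's "emit only if the EVTs differ" rule silently drops the cast:

```cpp
// Toy model of the bug: a pointer's EVT keeps its kind and bit width but
// drops the address space, so i32* and i32 addrspace(256)* look identical
// to SelectionDAGBuilder::visitBitCast.
struct EVT {
  unsigned Kind;
  unsigned Bits;
  bool operator==(const EVT &O) const {
    return Kind == O.Kind && Bits == O.Bits;
  }
};

struct PointerType {
  unsigned AddrSpace; // 0 = default, 256 = x86 gs segment
};

EVT getEVT(const PointerType &) {
  return EVT{/*Kind=*/1, /*Bits=*/64}; // address space is not recorded
}

// visitBitCast creates a BITCAST node only when the two EVTs differ.
bool emitsBitcastNode(const PointerType &Src, const PointerType &Dst) {
  return !(getEVT(Src) == getEVT(Dst));
}
```

Here emitsBitcastNode({256}, {0}) is false, so the addrspace(256)-to-addrspace(0) cast disappears, which is exactly the failure David's loadgs2() shows.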
On Aug 8, 2013, at 3:04 AM, David Chisnall wrote:

> The correct code for this should be (I believe):
>
> _toglobal:                              ## @toglobal
> ## BB#0:
> 	pushq	%rbp
> 	movq	%rsp, %rbp
> 	lea	%gs:(%rdi), %rax
> 	popq	%rbp
> 	ret

This won't have the effect you're hoping for. LEA stands for "Load Effective
Address"; it only operates on the offset part of a logical (far) address.
It's no different from before, when RDI was MOV'd into RAX. In fact, there
is no instruction you can use to turn a seg:offset logical address into a
linear address. That's why most systems that use the FS and GS registers for
thread-specific data have a field for the linear address of the TSD
structure.

Chip
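Chip's point is that linearising a seg:offset address needs the segment's base, which no ordinary instruction will hand you; the arithmetic itself is trivial once the base is known, which is why systems store the base (e.g. the TSD's own linear address) in a field the program can read. A sketch of that arithmetic (illustrative only; segBase would come from such a stored field, not from an instruction):

```cpp
#include <cstdint>

// Illustrative only: once the segment base is known (read from a stored
// slot such as the TSD self-pointer), the linear address in the flat
// model is just base + offset.
uint64_t toLinearAddress(uint64_t segBase, uint64_t offset) {
  return segBase + offset;
}
```

The hardware performs this same addition internally on every %gs:(%rdi) access, but it never exposes the resulting linear address to software.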