thr3ads.net - llvm dev - [LLVMdev] [RFC] Proposal to make LLVM-IR endian agnostic [Oct 2011]

Villmow, Micah

2011-Oct-03 18:36 UTC

[LLVMdev] [RFC] Proposal to make LLVM-IR endian agnostic

One of the projects I am working on with others is to make LLVM-IR endian
agnostic.

So, I am sending out this proposal for feedback to the LLVM community. I've
attached a pretty version of the proposal in PDF format and pasted an
80-column-safe text version below.

I'm looking forward to comments and feedback.

Thanks,
Micah Villmow


Text of Proposal:
===============================================================================
RFC - ENDIAN AGNOSTIC IR REPRESENTATION.
===============================================================================
-------------------------------------------------------------------------------
MOTIVATION:
-------------------------------------------------------------------------------
    In our current compilation model, a compiler translates a source language
to an intermediate representation (IR), performs optimizations, and then uses
the LLVM code generation infrastructure to target a specific backend. This
approach works well for homogeneous devices, but compute languages such as
OpenCL now target heterogeneous devices (e.g. x86 core + GPU). In some cases
the devices have different pointer sizes and byte orderings, and this
undermines, at the binary level, the cross-device compatibility that the
source language provides.

    The major problem is that a program for a heterogeneous system needs a
different compilation path for each device on every source compilation.
OpenCL, an open standard for compute on various types of devices, defines a
source language that is portable across devices (i.e. the same source on GPUs
and CPUs). While compilation from source is portable, compilation from binary
is not, and diverging compilation paths cause problems for both maintenance
and testing.

-------------------------------------------------------------------------------
PROBLEM QUESTION:
-------------------------------------------------------------------------------
How does a vendor simplify the compiler stack across multiple target devices
by removing endianness from the IR representation?

-------------------------------------------------------------------------------
PROPOSAL:
-------------------------------------------------------------------------------
I am proposing an extension to LLVM[1] that abstracts away all endian-related
IR operations with a series of intrinsic calls. These intrinsic calls allow
consumers of the IR to quickly reconstruct either the original IR, or an
equivalent IR, with respect to the byte ordering of the target device. This IR
representation provides an abstraction layer similar to the hton[sl]() family
of function calls in network programming.

-------------------------------------------------------------------------------
OTHER SYSTEMS:
-------------------------------------------------------------------------------
This approach is similar to Google's PNaCl[3], which attempts to provide an
ISA-neutral representation, but the goals of this proposal are slightly
different: we want not only an ISA-neutral representation but also an
endian-neutral one. Therefore, where PNaCl represents LLVM-IR before codegen
(see figure 1 of [3]), this approach provides a portable representation after
the frontend and before LLVM-bitcode generation. The PNaCl representation
makes assumptions about address space, data types, byte order, concurrency and
runtime system. This proposal inherits its assumptions about data types,
address sizes, concurrency and runtime systems from OpenCL[4].

-------------------------------------------------------------------------------
DEFINITIONS:
-------------------------------------------------------------------------------
Global Memory - Memory that is visible to all threads in a process/program,
e.g. video ram. This includes all read-only, write-only and read-write memories
on the system that are visible to all threads.

-------------------------------------------------------------------------------
INTRINSICS:
-------------------------------------------------------------------------------
This proposal introduces a new set of intrinsics: two load intrinsics and two
store intrinsics. They are as follows:

declare <type> @llvm.portable.load.e.<type>(<type>* ptr, i32 alignment,
    i1 host, i1 atomic, i1 volatile, i1 nontemporal, i1 singlethread)
    // little endian load

declare <type> @llvm.portable.load.E.<type>(<type>* ptr, i32 alignment,
    i1 host, i1 atomic, i1 volatile, i1 nontemporal, i1 singlethread)
    // big endian load

declare void @llvm.portable.store.e.<type>(<type> data, <type>* ptr,
    i32 alignment, i1 host, i1 atomic, i1 volatile, i1 nontemporal,
    i1 singlethread) // little endian store

declare void @llvm.portable.store.E.<type>(<type> data, <type>* ptr,
    i32 alignment, i1 host, i1 atomic, i1 volatile, i1 nontemporal,
    i1 singlethread) // big endian store


A second smaller set could be:

declare <type> @llvm.portable.load.<type>(<type>* ptr, i32 alignment,
    i1 host, i1 littleEndian, i1 atomic, i1 volatile,
    i1 nontemporal, i1 singlethread)

declare void @llvm.portable.store.<type>(<type> data, <type>* ptr,
    i32 alignment, i1 host, i1 littleEndian, i1 atomic, i1 volatile,
    i1 nontemporal, i1 singlethread)

    Valid values for <type> are the scalar types i8, i16, i32, i64, f16, f32
and f64, and vectors of those types with 2, 3, 4, 8 or 16 elements. Only
pointers into the global address space are valid pointer arguments. The global
address space is given pointer address space 1 to separate it from LLVM's
default address space, which is 0. A separate address space is needed because
OpenCL requires the default address space to be private memory, which
conflicts with LLVM's default address space referring to globally visible
memory. For brevity, the possible combinations are not all enumerated here.
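
To make the intended semantics concrete, here is a minimal C sketch of what the
i32 variants of the load/store intrinsics above would compute. The function
names are hypothetical, and the host, alignment, atomic, volatile, nontemporal
and singlethread arguments are left out; only the byte-order behaviour is
modelled.

```c
#include <stdint.h>

/* Hypothetical analogue of llvm.portable.load.e.i32: read four bytes that are
   stored in little-endian order, whatever the byte order of the machine. */
uint32_t portable_load_le32(const uint8_t *ptr) {
    return (uint32_t)ptr[0] | ((uint32_t)ptr[1] << 8) |
           ((uint32_t)ptr[2] << 16) | ((uint32_t)ptr[3] << 24);
}

/* Hypothetical analogue of llvm.portable.load.E.i32: big-endian load. */
uint32_t portable_load_be32(const uint8_t *ptr) {
    return ((uint32_t)ptr[0] << 24) | ((uint32_t)ptr[1] << 16) |
           ((uint32_t)ptr[2] << 8) | (uint32_t)ptr[3];
}

/* Hypothetical analogue of llvm.portable.store.e.i32: little-endian store. */
void portable_store_le32(uint32_t data, uint8_t *ptr) {
    ptr[0] = (uint8_t)data;
    ptr[1] = (uint8_t)(data >> 8);
    ptr[2] = (uint8_t)(data >> 16);
    ptr[3] = (uint8_t)(data >> 24);
}

/* Hypothetical analogue of llvm.portable.store.E.i32: big-endian store. */
void portable_store_be32(uint32_t data, uint8_t *ptr) {
    ptr[0] = (uint8_t)(data >> 24);
    ptr[1] = (uint8_t)(data >> 16);
    ptr[2] = (uint8_t)(data >> 8);
    ptr[3] = (uint8_t)data;
}
```

Because every access names its byte order explicitly, the same code gives the
same memory image on a little-endian and a big-endian machine, which is the
property the proposal asks of the portable IR.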

Another issue is the data layout. A third endianness option is added to the
data layout description in the LLVM reference manual, defined as follows:
"p Specifies that the IR is in endian-portable form, i.e. code produced by
little- and big-endian target back ends will be functionally equivalent (in
their effect on global memory). The IR must be converted to a target format
before the IR is valid LLVM-IR."
Using this data layout option allows the compiler to quickly determine
whether the IR is in endian-portable form.

-------------------------------------------------------------------------------
PARAMETERS:
-------------------------------------------------------------------------------
host - True when the load/store is from the host machine and false when the
load/store is from the device.
atomic/volatile/nontemporal/singlethread - Follow the same semantics as the
arguments of the same names on the load/store instructions in LLVM-IR.
See the LLVM Lang Ref[1].

-------------------------------------------------------------------------------
POINTER ATTRIBUTES:
-------------------------------------------------------------------------------
In OpenCL, a pointer can have attributes attached, and this information needs
to be encoded. In LLVM, extra information is encoded via metadata nodes, and
these are used here so that the intrinsics do not need to be modified to carry
extra information. One example of this is the endian(host) attribute that can
be attached to a pointer argument (see 6.10.3 of the OpenCL 1.1 spec). This
information can be encoded in a metadata node which is attached to the
intrinsic. An example encoding of this information is as follows:
!0 = metadata !{
  i32,      ;; Tag = <OpenCL version number> using the official OpenCL version macro
  i1,       ;; Boolean value: true if the load is from the host, false if from the device
  metadata  ;; List of attributes for this intrinsic instruction
}

-------------------------------------------------------------------------------
CONSTRAINTS:
-------------------------------------------------------------------------------
Except for the data and ptr arguments, all arguments must be compile-time
constants.
Optimizations that rely on the byte ordering of memory, or that modify the
program's interactions with global memory, must not be performed on the IR
while it is in the portable form.
All accesses to global memory must be done through these intrinsic calls.

-------------------------------------------------------------------------------
LINKS:
-------------------------------------------------------------------------------
1. http://llvm.org/docs/LangRef.html
2. http://llvm.org/docs/Atomics.html
3. http://nativeclient.googlecode.com/svn/data/site/pnacl.pdf
4. http://www.khronos.org/opencl/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: RFC-SPIR.pdf
Type: application/pdf
Size: 269380 bytes
Desc: RFC-SPIR.pdf
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20111003/972c6dea/attachment.pdf>

Török Edwin

2011-Oct-03 21:00 UTC

On 10/03/2011 09:36 PM, Villmow, Micah wrote:
> One of the projects I am working on with others is to make LLVM-IR endian
> agnostic.
>
> So, I am sending out this proposal for feedback to the LLVM community. I've
> attached pretty version of the proposal in PDF format and pasted a 80-column
> safe text version below.
>
> A second smaller set could be:
>
> declare <type> @llvm.portable.load.<type>(<type>* ptr, i32 alignment,
>     i1 host, i1 littleEndian, i1 atomic, i1 volatile,
>     i1 nontemporal, i1 singlethread)
>
> declare void @llvm.portable.store.<type>(<type> data, <type>* ptr,
>     i32 alignment, i1 host, i1 littleEndian, i1 atomic, i1 volatile,
>     i1 nontemporal, i1 singlethread)

FWIW here is another way to do it (which is approximately what ClamAV does
currently) by introducing just one intrinsic:

declare i1 @llvm.is_bigendian()

The advantage is that you can implement htonl()- and ntohl()-like functionality
without using a temporary memory location.
Actually I think having the 2 intrinsics you suggest and the is_bigendian()
intrinsic would be optimal: you can use your 2 intrinsics for initial codegen,
and mem2reg can transform them to is_bigendian().

For load/store:
%val = load <type>* %ptr
%sval = call <type> @llvm.bswap.<type>(<type> %val)
%be = call i1 @llvm.is_bigendian()
%result = select i1 %be, <type> %val, <type> %sval

For htonl():
%sval = call <type> @llvm.bswap.<type>(<type> %val)
%be = call i1 @llvm.is_bigendian()
%result = select i1 %be, <type> %val, <type> %sval

(store is similar, byteswap before the store)

At bytecode JIT time / assembly emission time @llvm.is_bigendian() is a known
constant, and constant propagation is used to throw away the unwanted code
path, so it becomes either:

%result = load <type>* %ptr

or

%val = load <type>* %ptr
%result = call <type> @llvm.bswap.<type>(<type> %val)
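
The select-on-a-constant pattern above can be sketched in C roughly as follows.
This is only an illustration: the parameter target_is_big_endian is a
hypothetical stand-in for the proposed @llvm.is_bigendian() intrinsic, and
bswap32 is written out by hand rather than taken from any library.

```c
#include <stdint.h>

/* Reverse the four bytes of v (what llvm.bswap.i32 computes). */
uint32_t bswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* htonl()-like conversion. target_is_big_endian is a hypothetical stand-in
   for @llvm.is_bigendian(); once it is a known constant, a compiler can fold
   the conditional and keep only one of the two paths. */
uint32_t to_big_endian32(uint32_t val, int target_is_big_endian) {
    uint32_t sval = bswap32(val);
    return target_is_big_endian ? val : sval;
}
```

With the flag fixed to 1 the function reduces to returning val unchanged, and
with the flag fixed to 0 it reduces to a single byte swap, matching the two
folded forms shown above.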

Best regards,
--Edwin

James Molloy

2011-Oct-04 07:06 UTC

Hi Micah,

I'm no core developer, but FWIW here are my thoughts:

In general I think the patch is too OpenCL oriented, and I have some
niggling qualms about other parts. Specifically (comments inline):

From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of Villmow, Micah
Sent: 03 October 2011 19:37
To: llvmdev at cs.uiuc.edu
Subject: [LLVMdev] [RFC] Proposal to make LLVM-IR endian agnostic

> One of the projects I am working on with others is to make LLVM-IR endian
> agnostic.
>
> So, I am sending out this proposal for feedback to the LLVM community. I've
> attached pretty version of the proposal in PDF format and pasted a 80-column
> safe text version below.
>
> I'm looking forward to comments and feedback.

> -----------------------------------------------------------------------------
> PROBLEM QUESTION:
> -----------------------------------------------------------------------------
> How does a vendor simplify the compiler stack across multiple target devices
> by removing endianness from the IR representation?

This is not the question that your RFC answers. Your RFC answers a superset
of just "represent endianness".

> -----------------------------------------------------------------------------
> DEFINITIONS:
> -----------------------------------------------------------------------------
> Global Memory - Memory that is visible to all threads in a process/program,
> e.g. video ram. This includes all read-only, write-only and read-write
> memories on the system that are visible to all threads.

What has this got to do with endianness?

> -----------------------------------------------------------------------------
> INTRINSICS:
> -----------------------------------------------------------------------------
> This proposal introduces new sets of intrinsics, two load intrinsics and two
> store intrinsics. The sets are as follows:
>
> declare <type> @llvm.portable.load.e.<type>(<type>* ptr, i32 alignment,
>     i1 host, i1 atomic, i1 volatile, i1 nontemporal, i1 singlethread)
>     // little endian load
>
> declare <type> @llvm.portable.load.E.<type>(<type>* ptr, i32 alignment,
>     i1 host, i1 atomic, i1 volatile, i1 nontemporal, i1 singlethread)
>     // big endian load
>
> declare void @llvm.portable.store.e.<type>(<type> data, <type>* ptr,
>     i32 alignment, i1 host, i1 atomic, i1 volatile, i1 nontemporal,
>     i1 singlethread) // little endian store
>
> declare void @llvm.portable.store.E.<type>(<type> data, <type>* ptr,
>     i32 alignment, i1 host, i1 atomic, i1 volatile, i1 nontemporal,
>     i1 singlethread) // big endian store

* I don't like the 'e'/'E' representation. If there were only little or big
  endian loads throughout an IR file, it wouldn't be obvious to me what the
  'e'/'E' meant. It's only seeing the two in tandem where it jumps out at
  me. I'd prefer the standard 'le'/'be'.

* You've put the OpenCL concept of "host" and "device" in a supposedly
  target-agnostic IR. Why should there be only one device? More importantly,
  why is host/device an attribute of the load or store as opposed to the
  pointer to load/store to? Does it semantically make sense to have both a
  host load and a device load of the same memory location in the same module?

> -----------------------------------------------------------------------------
> POINTER ATTRIBUTES:
> -----------------------------------------------------------------------------
> In OpenCL, a pointer can have attributes attached, and this information
> needs to be encoded. In LLVM, the method of encoding extra information is
> via metadata nodes and this is used so that the intrinsic do not need to be
> modified to add extra information. One example of this is the endian(host)
> attribute that can be attached to a pointer argument(see 6.10.3 of OpenCL
> 1.1 spec). This information can be encoded in a metadata node which is
> attached to the intrinsic. An example encoding of this information is as
> follows:
> !0 = metadata !{
>   i32, ;; Tag = <OpenCL version number> using the official OpenCL version macro
>   i1, ;; Boolean value to specify that load is from host on true, device on false
>   metadata ;; List of attributes for this intrinsic instruction
> }

Does this subsection add anything extra to the RFC? It talks about a format
for metadata, but doesn't appear to really add any suggestions or
requirements for changing LLVM IR.

If your intention was just to make the IR endian-agnostic, I don't see why
you wouldn't just propose an extra attribute on the load/store instructions
(load be %0, load le %0) instead of recreating all loads and stores in a new
form and having to make all passes interact with them.

My general summary is that I think your suggestions take a "somewhat
language-agnostic and somewhat target-agnostic" IR and turn it into a
"somewhat language-dependent and more target-agnostic" IR, by embedding
OpenCL specifics. I'm not sure I think that's the best way to go.

Cheers,

James

> Thanks,
> Micah Villmow

Duncan Sands

2011-Oct-04 07:28 UTC

Hi Edwin,

> FWIW here is another way to do it (which is approximately what ClamAV does
> currently) by introducing just one intrinsic:
> declare i1 @llvm.is_bigendian()

why is an intrinsic needed?  It is easy to write a small LLVM IR function
that computes this.  For example:

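define i1 @is_big_endian() {
   %ip = alloca i16                ; two-byte stack slot
   store i16 1, i16* %ip           ; write the 16-bit value 1
   %cp = bitcast i16* %ip to i8*   ; view the slot as bytes
   %c = load i8* %cp               ; read the byte at the lowest address
   %r = icmp eq i8 %c, 0           ; 0 there means the high byte is stored first
   ret i1 %r
}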
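
For readers more at home in C, the same probe can be sketched as below. This is
an illustrative translation of the IR function above, not code from the thread;
memcpy is used in place of the pointer cast to read the first byte.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big-endian machine and 0 on a little-endian one. */
int is_big_endian(void) {
    uint16_t i = 1;            /* corresponds to: store i16 1, i16* %ip */
    uint8_t c;
    memcpy(&c, &i, sizeof c);  /* read the byte at the lowest address */
    return c == 0;             /* the low byte (1) is first on little-endian */
}
```

The check works because a little-endian machine places the least significant
byte, here 1, at the lowest address, while a big-endian machine places the most
significant byte, here 0, there.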

Ciao, Duncan.

Villmow, Micah

2011-Oct-04 16:36 UTC

> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Török Edwin
> Sent: Monday, October 03, 2011 2:00 PM
> To: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] [RFC] Proposal to make LLVM-IR endian agnostic
> 
> On 10/03/2011 09:36 PM, Villmow, Micah wrote:
> > One of the projects I am working on with others is to make LLVM-IR
> > endian agnostic.
> >
> > So, I am sending out this proposal for feedback to the LLVM
> > community. I've attached pretty version of the proposal in PDF format
> > and pasted a 80-column safe text version below.
> >
> > A second smaller set could be:
> >
> > declare <type> @llvm.portable.load.<type>(<type>* ptr, i32 alignment,
> >     i1 host, i1 littleEndian, i1 atomic, i1 volatile,
> >     i1 nontemporal, i1 singlethread)
> >
> > declare void @llvm.portable.store.<type>(<type> data, <type>* ptr,
> >     i32 alignment, i1 host, i1 littleEndian, i1 atomic, i1 volatile,
> >     i1 nontemporal, i1 singlethread)
> 
> FWIW here is another way to do it (which is approximately what ClamAV
> does currently) by introducing just one intrinsic:
> declare i1 @llvm.is_bigendian()

[Villmow, Micah] I think the big difference in our requirements is that we can
have both big-endian (host) and little-endian (device) accesses, or vice versa,
to the same pointer. So a global is_bigendian intrinsic would not work for what
we are attempting to accomplish.

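
A small C sketch may help show the difference being described here: the byte
order is a property of each individual access, not of the whole module, so two
accesses to the same pointer can disagree. The enum and function names are
hypothetical and only illustrate the idea.

```c
#include <stdint.h>

/* Hypothetical per-access byte-order tag, playing the role of the .e/.E
   suffix (or the littleEndian flag) on the proposed intrinsics. */
typedef enum { ACCESS_LITTLE, ACCESS_BIG } access_order;

/* Load a 16-bit value from ptr using the byte order named by this access. */
uint16_t load16(const uint8_t *ptr, access_order order) {
    if (order == ACCESS_LITTLE)
        return (uint16_t)(ptr[0] | (ptr[1] << 8));
    return (uint16_t)((ptr[0] << 8) | ptr[1]);
}
```

Given the bytes 0x12 0x34 at one address, a big-endian access reads 0x1234 and a
little-endian access reads 0x3412. A single module-wide is_bigendian() answer
cannot express both accesses in the same program.
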
> The advantage is that you can implement htonl() and ntohl() like
> functionality without using a temporary memory location.
> Actually I think having the 2 intrinsics you suggest and the
> is_bigendian() intrinsic would be optimal:
> you can use your 2 intrinsics for initial codegen, and mem2reg can
> transform it to is_bigendian().
> 
> For load/store:
> <type> %val = load <type>* %ptr
> <type> %sval = bswap.i<type> %val
> %result = <type> select @llvm.is_bigendian(), %val, %sval
> 
> For htonl():
> <type> %sval = bswap.i<type> %val
> %result = <type> select @llvm.is_bigendian(), %val, %sval
> 
> (store is similar, byteswap before the store)
> 
> At bytecode JIT time / assembly emission time @llvm.is_bigendian() is a
> known constant, and constant propagation is
> used to throw away the unwanted code path, so it becomes either:
> 
> <type> %result = load <type>* %ptr
> 
> or
> 
> <type> %val = load <type>* %ptr
> <type> %result = bswap.i<type> %val
> 
> Best regards,
> --Edwin

Samuel Crow

2011-Oct-04 16:48 UTC

> From: "Villmow, Micah" <Micah.Villmow at amd.com>
> To: "llvmdev at cs.uiuc.edu" <llvmdev at cs.uiuc.edu>
> Sent: Monday, October 3, 2011 1:36 PM
> Subject: [LLVMdev] [RFC] Proposal to make LLVM-IR endian agnostic
>
> One of the projects I am working on with others is to make LLVM-IR endian
> agnostic.
>
> So, I am sending out this proposal for feedback to the LLVM community. I've
> attached pretty version of the proposal in PDF format and pasted a 80-column
> safe text version below.
>
> I'm looking forward to comments and feedback.
>
> Thanks,
> Micah Villmow
>
> --snip--

Hello Micah,

Without having read a lot into your plan I'd like to make a few suggestions:
Some game systems use mixed-endian data layouts as a form of lockout against
homebrew software.  While I believe it isn't a terribly effective mechanism,
it does leave LLVM unable to be used for such game systems.  I think LLVM
should allow some sort of swizzle mechanism to allow such mixed-endian data
layouts.  (I think swizzle is the correct term.)

Also, as a co-developer of Clang's AROS backend, it would be really handy to
have an endian-agnostic bitcode format since our OS covers about 5 different
CPU architectures, some of which are big-endian.  We were hoping to build an
endian-agnostic superset of the ELF loader based on PNaCl's bitcode format.

Thanks for taking this challenge on,

--Samuel Crow
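
As an illustration of what such a swizzle could look like, here is a C sketch
for 32-bit values. The message does not say which mixed-endian layout is meant,
so this assumes a PDP-11-style layout in which 16-bit halves and the bytes
inside them can be ordered independently; the function names are hypothetical.

```c
#include <stdint.h>

/* Swap the two 16-bit halves of a 32-bit value. */
uint32_t swap_halfwords(uint32_t v) {
    return (v << 16) | (v >> 16);
}

/* Swap the two bytes inside each 16-bit half. */
uint32_t swap_bytes_in_halfwords(uint32_t v) {
    return ((v & 0x00FF00FFu) << 8) | ((v & 0xFF00FF00u) >> 8);
}
```

Applying both steps in either order gives a full 32-bit byte swap, so plain
big- and little-endian conversion is a special case of this kind of swizzle,
and applying only one of them gives a mixed-endian ordering.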

Villmow, Micah

2011-Oct-04 17:07 UTC

From: James Molloy [mailto:james.molloy at arm.com]
Sent: Tuesday, October 04, 2011 12:06 AM
To: Villmow, Micah; llvmdev at cs.uiuc.edu
Subject: RE: [RFC] Proposal to make LLVM-IR endian agnostic

Hi Micah,

I'm no core developer, but FWIW here are my thoughts:

In general I think the patch is too OpenCL oriented, and I have some niggling
qualms about other parts. Specifically (comments inline):
[Villmow, Micah] I agree that it is OpenCL oriented, but this is mainly to
solve a problem that is unique to OpenCL or related technologies (CUDA,
DirectCompute, etc.).

From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of Villmow, Micah
Sent: 03 October 2011 19:37
To: llvmdev at cs.uiuc.edu
Subject: [LLVMdev] [RFC] Proposal to make LLVM-IR endian agnostic

One of the projects I am working on with others is to make LLVM-IR endian
agnostic.

So, I am sending out this proposal for feedback to the LLVM community. I've
attached pretty version of the proposal in PDF format and pasted a 80-column
safe text version below.

I'm looking forward to comments and feedback.

-------------------------------------------------------------------------------
PROBLEM QUESTION:
-------------------------------------------------------------------------------
How does a vendor simplify the compiler stack across multiple target devices
by removing endianess from the IR representation?

This is not the question that your RFC answers. Your RFC answers a superset of
just "represent endianness".
[Villmow, Micah] Maybe I didn't go into enough detail on how this proposal
helps solve this problem. Currently our compiler stack has to handle
big-endian vs little-endian devices along with 32-bit vs 64-bit devices (the
latter is outside the scope of this proposal). If we want a common binary that
we can use to compile for all devices, then we must store both endian versions
of the LLVM-IR, increasing binary size and compile time. By abstracting away
the endianness representation, the 4 variations collapse into 2, allowing the
fat binary to store 1 LLVM-IR representation for each bitness instead of 2. So
by abstracting endian assumptions out of the LLVM-IR, we are simplifying the
compiler stack.

-------------------------------------------------------------------------------
DEFINITIONS:
-------------------------------------------------------------------------------
Global Memory - Memory that is visible to all threads in a process/program,
e.g. video ram. This includes all read-only, write-only and read-write memories
on the system that are visible to all threads.

What has this got to do with endianness?
[Villmow, Micah] This just defines the type of memory we are interested in.
Other types of memory are not covered by this proposal, as they should not have
this problem. For example, endianness agnostic load/stores to private memory are
meaningless as it is only visible within a thread.

-------------------------------------------------------------------------------
INTRINSICS:
-------------------------------------------------------------------------------
This proposal introduces new sets of intrinsics, two load intrinsics and two
store intrinsics. The sets are as follows:

declare <type> @llvm.portable.load.e.<type>(<type>* ptr, i32 alignment,
    i1 host, i1 atomic, i1 volatile, i1 nontemporal, i1 singlethread)
    // little endian load

declare <type> @llvm.portable.load.E.<type>(<type>* ptr, i32 alignment,
    i1 host, i1 atomic, i1 volatile, i1 nontemporal, i1 singlethread)
    // big endian load

declare void @llvm.portable.store.e.<type>(<type> data, <type>* ptr,
    i32 alignment, i1 host, i1 atomic, i1 volatile, i1 nontemporal,
    i1 singlethread) // little endian store

declare void @llvm.portable.store.E.<type>(<type> data, <type>* ptr,
    i32 alignment, i1 host, i1 atomic, i1 volatile, i1 nontemporal,
    i1 singlethread) // big endian store

* I don't like the 'e'/'E' representation. If there were only little or big
endian loads throughout an IR file, it wouldn't be obvious to me what the
'e'/'E' meant. It's only seeing the two in tandem where it jumps out at me.
I'd prefer the standard 'le'/'be'.
[Villmow, Micah] Good suggestion. I was using 'e' and 'E' because that is what
the target data description in the LLVM spec uses.
* You've put the OpenCL concept of "host" and "device" in a supposedly
target-agnostic IR. Why should there be only one device? More importantly, why
is host/device an attribute of the load or store as opposed to the pointer to
load/store to? Does it semantically make sense to have both a host load and a
device load of the same memory location in the same module?
[Villmow, Micah] Abstracting LLVM-IR so it can encode multiple-device execution
information from a single compilation unit is outside the scope of this
proposal, hence the single device. The reason for not adding the attribute to
the pointer is that each load/store can be unique in how it represents the
endianness of the memory it points to. As for the third question, it isn't a
host load and a device load; it is a load with host endianness and a load with
device endianness. A hypothetical example of this is a simple embedded
co-processor attached to a general purpose processor (e.g. AMD's Torrenza
initiative), where the co-processor has no hardware to convert between the
endiannesses but its memory spans both its own memory and the system memory.
In this case, the compiler, when it generates executables, needs to make sure
that loads from the host have memory ordered correctly for the device. Again,
this is just an example, but a possibly valid situation.

-------------------------------------------------------------------------------
POINTER ATTRIBUTES:
-------------------------------------------------------------------------------
In OpenCL, a pointer can have attributes attached, and this information needs
to be encoded. In LLVM, the method of encoding extra information is via
metadata nodes and this is used so that the intrinsic do not need to be
modified to add extra information. One example of this is the endian(host)
attribute that can be attached to a pointer argument(see 6.10.3 of OpenCL
1.1 spec). This information can be encoded in a metadata node which is attached
to the intrinsic.  An example encoding of this information is as follows:
!0 = metadata !{
  i32, ;; Tag = <OpenCL version number> using the official OpenCL version macro
  i1, ;; Boolean value to specify that load is from host on true, device on false
  metadata ;; List of attributes for this intrinsic instruction
}

Does this subsection add anything extra to the RFC? It talks about a format for
metadata, but doesn't appear to really add any suggestions or requirements
for changing LLVM IR.
[Villmow, Micah] You're right, this is more about how to encode pointer
information; it can be ignored.

If your intention was just to make the IR endian-agnostic, I don't see why
you wouldn't just propose an extra attribute on the load/store instructions
(load be %0, load le %0) instead of recreating all loads and stores in a new
form and having to make all passes interact with them.
[Villmow, Micah] While we could go this route, it would make the
endian-agnostic IR compatible with LLVM-IR passes, which we don't want, hence
the use of intrinsics. Basically we want the endian-agnostic IR to be mostly
compatible with LLVM-IR, but to require a transformation pass that generates
the correct load/store instructions for the device it is being compiled for. I
believe there were other reasons brought up by other contributors, but they
escape me right now.
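
The decision such a transformation pass has to make for each portable load can
be sketched in C as follows. This is a simplified model under stated
assumptions, not the pass itself: raw stands for the value a plain native load
on the target would return, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reverse the four bytes of v. */
uint32_t bswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* A byte swap is needed exactly when the byte order requested by the access
   differs from the byte order of the target being compiled for. */
bool needs_bswap(bool access_little_endian, bool target_little_endian) {
    return access_little_endian != target_little_endian;
}

/* Model of lowering one portable i32 load: keep the native load as-is, or
   follow it with a byte swap. */
uint32_t lower_load32(uint32_t raw, bool access_little_endian,
                      bool target_little_endian) {
    return needs_bswap(access_little_endian, target_little_endian)
               ? bswap32(raw)
               : raw;
}
```

Since both flags are compile-time constants once the target is chosen, each
portable access lowers to either a plain load or a load followed by a swap,
with no runtime test left in the generated code.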

My general summary is that I think your suggestions take a "somewhat
language-agnostic and somewhat target-agnostic" IR and turn it into a
"somewhat language-dependent and more target-agnostic" IR, by embedding
OpenCL specifics. I'm not sure I think that's the best way to go.
[Villmow, Micah] That is correct, that is basically what we are attempting to
do. We want a single IR that can be used across multiple devices for OpenCL.
If this can be modified to be less language dependent while staying target
agnostic and still fulfilling our needs, then we are willing to go down that
path. This is why we believe involving the LLVM community is important: we can
get this kind of feedback and hopefully agree on something that we can use and
that other, unrelated projects can also use.

Cheers,

James

Thanks,
Micah Villmow

