thr3ads.net - llvm dev - [LLVMdev] n-bit bytes for clang/llvm [Mar 2015]

David Chisnall

2015-Mar-18 07:31 UTC

[LLVMdev] n-bit bytes for clang/llvm

On 17 Mar 2015, at 13:11, Tyro Software <softwaretyro at gmail.com> wrote:
>
> As an alternative to fixing the "char == 8 bits" presumption would using
> non-uniform pointer types have been another possible approach, e.g. keep
> char as 8 bit but have char* encode both the word address and the byte
> location within it (i.e. one extra bit in this 16-bit case). Of course this
> is only a less intrusive (to LLVM) approach if LLVM readily supports such
> pointers, which may be close to asking "could 8086 small/large/huge
> pointers be implemented?"
>
> One obvious drawback to such an approach is that dereferencing char*
> becomes relatively expensive, though for the sort of code being
> predominantly run on a DSP that might be acceptable.

We're using multiple address spaces to describe two pointer representations
for CHERI: AS0 is a 64-bit pointer that's represented as an integer, AS200
is a capability (256-bit fat pointer with base, length, permissions, enforced in
hardware).  We had to fix a few things where LLVM assumes that pointers are
integers, but the different size pointers in different address spaces part works
very well.  The biggest weakness is in TableGen / SelectionDAG, where you
can't write patterns on iPTR that depend on a specific AS (actually, you
can't really write patterns on iPTR at all, as LLVM tries to lower iPTR to
some integer type first, even when this doesn't make any sense [e.g. on an
architecture with separate address and integer registers]).

Having AS0 be a byte pointer, which the back end would lower to two words, and
some target-specific AS be a word pointer would likely work quite well.

David
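
To make the cost of dereferencing such a char* concrete, here is a minimal C model of the two-word byte pointer discussed above. It is only a sketch under assumptions that are not from the thread: the names `byte_ptr`, `byte_load`, `byte_store`, `byte_ptr_add` and `MEM_WORDS` are hypothetical, memory is modelled as an array of 16-bit words, and byte selector 0 is assumed to name the low half of a word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical target: memory is addressable only in 16-bit words. */
#define MEM_WORDS 1024
static uint16_t memory[MEM_WORDS];

/* Byte pointer lowered to two words: a word address plus a byte selector.
 * Assumption: selector 0 is the low 8 bits of the word, 1 the high 8 bits. */
typedef struct {
    uint16_t word; /* index into memory[] */
    uint8_t byte;  /* 0 = low half, 1 = high half */
} byte_ptr;

/* Reading through the pointer: fetch the word, then shift and mask. */
static uint8_t byte_load(byte_ptr p)
{
    assert(p.word < MEM_WORDS && p.byte < 2);
    return (uint8_t)((memory[p.word] >> (8u * p.byte)) & 0xFFu);
}

/* Writing through the pointer: read-modify-write of the containing word. */
static void byte_store(byte_ptr p, uint8_t value)
{
    assert(p.word < MEM_WORDS && p.byte < 2);
    unsigned shift = 8u * p.byte;
    uint16_t mask = (uint16_t)(0xFFu << shift);
    uint16_t kept = (uint16_t)(memory[p.word] & ~mask);
    memory[p.word] = (uint16_t)(kept | (uint16_t)((unsigned)value << shift));
}

/* Pointer arithmetic in units of bytes: go through a linear byte index. */
static byte_ptr byte_ptr_add(byte_ptr p, int32_t n)
{
    int32_t linear = (int32_t)p.word * 2 + p.byte + n;
    assert(linear >= 0 && linear < 2 * MEM_WORDS);
    byte_ptr r = { (uint16_t)(linear >> 1), (uint8_t)(linear & 1) };
    return r;
}
```

Every load costs an extra shift and mask, and every store becomes a read-modify-write of the whole word, which is the expense the original question anticipates. A word pointer in a separate address space would skip all of this and index `memory[]` directly.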

Tyro Software

2015-Mar-18 10:25 UTC

[LLVMdev] n-bit bytes for clang/llvm

Thanks - that's a really helpful steer.

So if I'm understanding correctly, the CHERI address spaces are equivalent
as regards actual memory addresses, with the "fatness" being the type,
access, etc metadata? (somehow I'd formed the impression that LLVM address
spaces needed to be disjoint)

Tyro

Tyro Software

2015-Mar-18 10:59 UTC

[LLVMdev] n-bit bytes for clang/llvm

Hi Patrik

Indeed I am hoping to avoid the n-bitian-fork approach (laziness more than
anything; the pain of keeping patches moving forwards with the clang/llvm
mainstream). And luckily, for a toy architecture, legacy code compatibility is
less of a concern, at least until I sleepwalk into the "port Linux"
state...

Tyro

David Chisnall

2015-Mar-18 12:06 UTC

[LLVMdev] n-bit bytes for clang/llvm

On 18 Mar 2015, at 12:25, Tyro Software <softwaretyro at gmail.com> wrote:
>
> So if I'm understanding correctly, the CHERI address spaces are
> equivalent as regards actual memory addresses, with the "fatness"
> being the type, access, etc metadata? (somehow I'd formed the impression
> that LLVM address spaces needed to be disjoint)

We're slightly abusing the address space mechanism, but there's no
requirement that they be disjoint - if they were then there would be no need for
an address space cast instruction.  For us, whether they are disjoint is a
run-time property: non-capability loads and stores are relative to a global base
capability, which may be the entire virtual address space or may be quite
restricted.

David
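
The point about disjointness being a run-time property can be illustrated with a small software model in C. This is a rough sketch, not CHERI's actual semantics or encoding: the `capability` fields follow the base, length and permissions mentioned earlier in the thread, while the field widths, the permission bits and the helpers `cap_check`, `cap_load`, `as0_load` and `as0_to_cap` are all hypothetical. Plain (AS0) pointers are modelled as offsets from a global base capability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical permission bits. */
enum { PERM_LOAD = 1, PERM_STORE = 2 };

/* Software model of a fat pointer: base, length, permissions. */
typedef struct {
    uint64_t base;   /* first byte the capability may touch */
    uint64_t length; /* number of bytes it covers */
    uint32_t perms;  /* PERM_* bits */
} capability;

/* Is a one-byte access at cap.base + offset allowed? */
static bool cap_check(capability cap, uint64_t offset, uint32_t need)
{
    return offset < cap.length && (cap.perms & need) == need;
}

/* Load through a capability (the AS200-style pointer in this model). */
static bool cap_load(const uint8_t *mem, capability cap, uint64_t offset,
                     uint8_t *out)
{
    assert(out);
    if (!cap_check(cap, offset, PERM_LOAD))
        return false;
    *out = mem[cap.base + offset];
    return true;
}

/* Load through a plain pointer: it is checked against, and relative to,
 * the global base capability. */
static bool as0_load(const uint8_t *mem, capability global, uint64_t ptr,
                     uint8_t *out)
{
    return cap_load(mem, global, ptr, out);
}

/* Rough analogue of an address space cast: derive a capability that
 * starts at a plain pointer and inherits the remaining bounds. */
static bool as0_to_cap(capability global, uint64_t ptr, capability *out)
{
    assert(out);
    if (ptr > global.length)
        return false;
    out->base = global.base + ptr;
    out->length = global.length - ptr;
    out->perms = global.perms;
    return true;
}
```

In this model a plain pointer and the capability derived from it name the same byte, so the two address spaces overlap. How much of memory plain pointers can reach depends only on the bounds of the global capability at run time.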
