On 17 Mar 2015, at 13:11, Tyro Software <softwaretyro at gmail.com> wrote:
> As an alternative to fixing the "char == 8 bits" presumption would using non-uniform pointer types have been another possible approach, e.g. keep char as 8 bit but have char* encode both the word address and the byte location within it (i.e. one extra bit in this 16-bit case). Of course this is only a less intrusive (to LLVM) approach if LLVM readily supports such pointers, which may be close to asking "could 8086 small/large/huge pointers be implemented?"
>
> One obvious drawback to such an approach is that dereferencing char* becomes relatively expensive, though for the sort of code being predominantly run on a DSP that might be acceptable.

We're using multiple address spaces to describe two pointer representations for CHERI: AS0 is a 64-bit pointer that's represented as an integer, AS200 is a capability (256-bit fat pointer with base, length, permissions, enforced in hardware). We had to fix a few things where LLVM assumes that pointers are integers, but the different size pointers in different address spaces part works very well. The biggest weakness is in TableGen / SelectionDAG, where you can't write patterns on iPTR that depend on a specific AS (actually, you can't really write patterns on iPTR at all, as LLVM tries to lower iPTR to some integer type first, even when this doesn't make any sense [e.g. on an architecture with separate address and integer registers]).

Having AS0 be a byte pointer, which the back end would lower to two words, and some target-specific AS be a word pointer would likely work quite well.

David
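For anyone following the thread, the byte-pointer idea discussed above can be sketched in plain C. This is only a software model under stated assumptions, not what any LLVM back end emits: the memory array `mem`, the `byte_ptr` type, and the helpers `make_byte_ptr`, `load_byte` and `store_byte` are all hypothetical names, and putting byte 0 in the low half of the word is an arbitrary choice. The byte selector sits in bit 0 and the word address in the bits above it, so a byte access costs a word access plus a shift and mask, which is the "relatively expensive" dereference mentioned in the quoted message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a small word-addressed memory with 16-bit words. */
#define MEM_WORDS 1024u
static uint16_t mem[MEM_WORDS];

/* Hypothetical byte pointer: word address in the upper bits, byte selector
 * in bit 0. It needs one bit more than a plain word address. */
typedef uint32_t byte_ptr;

static byte_ptr make_byte_ptr(uint32_t word_addr, unsigned byte_index) {
    assert(word_addr < MEM_WORDS && byte_index < 2u);
    return (word_addr << 1) | byte_index;
}

/* Load one byte: fetch the containing word, then shift and truncate.
 * Assumption: byte 0 is the low half of the word. */
static uint8_t load_byte(byte_ptr p) {
    uint16_t word = mem[p >> 1];
    unsigned shift = (unsigned)(p & 1u) * 8u;
    return (uint8_t)(word >> shift);
}

/* Store one byte: read-modify-write of the containing word. */
static void store_byte(byte_ptr p, uint8_t value) {
    unsigned shift = (unsigned)(p & 1u) * 8u;
    uint16_t mask = (uint16_t)(0xFFu << shift);
    uint16_t *word = &mem[p >> 1];
    *word = (uint16_t)((*word & (uint16_t)~mask) | ((uint16_t)value << shift));
}
```

With this encoding, adding 1 to a `byte_ptr` steps to the next byte and carries into the word address on its own, so ordinary integer arithmetic keeps working on the byte pointer. A word pointer in a separate address space would simply be `p >> 1`.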
Thanks - that's a really helpful steer.

So if I'm understanding correctly, the CHERI address spaces are equivalent as regards actual memory addresses, with the "fatness" being the type, access, etc metadata? (somehow I'd formed the impression that LLVM address spaces needed to be disjoint)

Tyro

On Wed, Mar 18, 2015 at 8:31 AM, David Chisnall <David.Chisnall at cl.cam.ac.uk> wrote:
> [...]
> We're using multiple address spaces to describe two pointer representations for CHERI: AS0 is a 64-bit pointer that's represented as an integer, AS200 is a capability (256-bit fat pointer with base, length, permissions, enforced in hardware).
> [...]
Hi Patrik

Indeed I am hoping to avoid the n-bitian-fork approach (laziness more than anything; the pain of keeping patches moving forwards with the clang/llvm mainstream)

And luckily for a toy architecture legacy code compatibility is less of a concern, at least until I sleepwalk into the "port Linux" state...

Tyro

On Wed, Mar 18, 2015 at 11:25 AM, Tyro Software <softwaretyro at gmail.com> wrote:
> Thanks - that's a really helpful steer.
> [...]
On 18 Mar 2015, at 12:25, Tyro Software <softwaretyro at gmail.com> wrote:
> So if I'm understanding correctly, the CHERI address spaces are equivalent as regards actual memory addresses, with the "fatness" being the type, access, etc metadata? (somehow I'd formed the impression that LLVM address spaces needed to be disjoint)

We're slightly abusing the address space mechanism, but there's no requirement that they be disjoint - if they were then there would be no need for an address space cast instruction. For us, whether they are disjoint is a run-time property: non-capability loads and stores are relative to a global base capability, which may be the entire virtual address space or may be quite restricted.

David
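The "run-time property" point above may be easier to see in a toy software model. The sketch below is an illustration only and makes no claim about the real CHERI encoding or instruction set: the `capability` struct, the `default_cap` global, and the helpers `cap_load`, `legacy_load` and `cap_restrict` are hypothetical names, and the bounds and permission checks that CHERI enforces in hardware are modelled here as ordinary C checks. An integer-pointer ("AS0"-style) load is treated as an offset into a global base capability, so the same byte can be reached through both kinds of pointer until that base capability is narrowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permission bits for the model. */
enum { PERM_LOAD = 1, PERM_STORE = 2 };

/* Hypothetical fat pointer: base, length and permissions. */
typedef struct {
    uint8_t *base;
    size_t length;
    unsigned perms;
} capability;

/* Backing store standing in for the virtual address space. */
static uint8_t backing[64];

/* Global base capability used by non-capability loads. Here it starts out
 * covering the whole backing store. */
static capability default_cap = { backing, sizeof backing, PERM_LOAD | PERM_STORE };

/* Capability load: checked against the capability's bounds and permissions. */
static bool cap_load(capability c, size_t offset, uint8_t *out) {
    assert(out != NULL);
    if (!(c.perms & PERM_LOAD) || offset >= c.length)
        return false;
    *out = c.base[offset];
    return true;
}

/* Integer-pointer load: the address is an offset relative to default_cap. */
static bool legacy_load(size_t addr, uint8_t *out) {
    return cap_load(default_cap, addr, out);
}

/* Derive a narrower capability; an out-of-range request yields an empty one. */
static capability cap_restrict(capability c, size_t offset, size_t length) {
    capability r = { c.base, 0, 0 };
    if (offset <= c.length && length <= c.length - offset) {
        r.base = c.base + offset;
        r.length = length;
        r.perms = c.perms;
    }
    return r;
}
```

In this model, a byte written at `backing[10]` is visible both through `legacy_load(10, ...)` and through a capability derived with `cap_restrict(default_cap, 8, 4)` at offset 2, so the two views overlap. After `default_cap` is narrowed to the first 8 bytes, the same `legacy_load(10, ...)` fails while the derived capability still works, which is the sense in which overlap depends on run-time state.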