Dmitriy Borisenkov via llvm-dev
2019-Oct-31 11:17 UTC
[llvm-dev] RFC: On non 8-bit bytes and the target for it
David, just to clarify a misconception I might have introduced, we do not have linear memory in the sense that all data is stored as a trie. We do support arrays, structures and GEPs, however, as well as all relevant features in C by modeling memory.

So regarding concepts of byte, all 5 statements you gave are true for our target. Either due to the specification or because of performance (gas consumption) issues. But if there are architectures that need less from the notion of byte, we should try to figure out the common denominator. It's probably ok to be less restrictive about a byte.

--
Kind regards,
Dmitry

On Wed, Oct 30, 2019 at 5:19 PM David Chisnall via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> On 30/10/2019 10:07, Jeroen Dobbelaere via llvm-dev wrote:
> > We (Synopsys ASIP Designer team) and our customers tend to disagree: our
> > customers do create plenty of cpu architectures with non-8-bit characters
> > (and non-8-bit addressable memories). We are able to provide them with a
> > working c/c++ compiler solution.
> > Maybe some support libraries are not supported out of the box, but for
> > these kinds of architectures that is acceptable.
> > (Besides that, llvm is also more than just c/c++)
>
> My main concern in this discussion is that we're conflating several
> concepts of a 'byte':
>
> - The smallest unit that can be loaded / stored at a time.
> - The smallest unit that can be addressed with a raw pointer in a
>   specific address space.
> - The largest unit whose encoding is opaque to anything above the ISA.
> - The type used to represent `char` in C.
> - The type that has a size that all other types are a multiple of.
>
> In POSIX C (which imposes some extra constraints not found in ISO C),
> when lowered to LLVM IR, all of these are the same type:
>
> - Loads and stores of values smaller than i8 or not a multiple of i8
>   may be widened to a multiple of i8. Bitfield fields that are smaller
>   than i8 must use i8 or wider operations and masking.
> - GEP indexes are not well defined for anything that is not a multiple
>   of i8.
> - There is no defined bit order of i8 (or bit order for larger types,
>   only an assumption that, for example, i32 is 4 i8s in a specific order
>   specified by the data layout).
> - char is lowered to i8.
> - All ABI-visible types have a size that is a multiple of 8 bits.
>
> It's not clear to me that saying 'a byte is 257 bits' means changing all
> of these to 257 or changing only some of them to 257 (which?). For
> example, when compiling C for 16-bit-addressable historic
> architectures, typically:
>
> - char is 8 bits.
> - char* and void* are represented as a pointer plus a 1-bit offset
>   (sometimes encoded in the low bit, so the load / store sequence is a
>   right shift by one, a load, and then a mask or mask and shift depending
>   on the low bit).
> - Other pointer types are 16-bit aligned.
>
> IBM's 36-bit word machines use a broadly similar strategy, though with
> some important differences and I would imagine that most Synopsys cores
> are going to use some variation on this approach.
>
> This probably involves a quite different design to a model with 257-bit
> registers, but most of the concerns don't exist if you don't have memory
> that can store byte arrays and so involve very different design decisions.
>
> TL;DR: A proposal for supporting non-8-bit bytes needs to explain what
> their expected lowerings are and what they mean by a byte.
>
> David
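The char* lowering described in the quoted message for 16-bit word-addressed machines can be illustrated with a small model. This is only a sketch under stated assumptions: the memory is a Python list of 16-bit words, the low bit of a char pointer selects the half of the word, and a set low bit is assumed to mean the upper 8 bits. The function names (`load_char`, `store_char`, `char_ptr_from_word_ptr`) are hypothetical and do not come from any real toolchain.

```python
# Sketch of "pointer plus 1-bit offset" char access on a 16-bit
# word-addressed memory. All names and the choice of which half the
# low bit selects are assumptions made for illustration.
WORD_BITS = 16
CHAR_BITS = 8
WORD_MASK = (1 << WORD_BITS) - 1
CHAR_MASK = (1 << CHAR_BITS) - 1


def char_ptr_from_word_ptr(word_ptr):
    """Convert a word pointer to a char pointer addressing its low half."""
    return word_ptr << 1


def load_char(memory, char_ptr):
    """Right shift by one, load the word, then mask (or shift and mask)."""
    word = memory[char_ptr >> 1]      # drop the offset bit, load the word
    if char_ptr & 1:                  # low bit set: upper half (assumed)
        return (word >> CHAR_BITS) & CHAR_MASK
    return word & CHAR_MASK           # low bit clear: lower half


def store_char(memory, char_ptr, value):
    """Read-modify-write of the containing word."""
    index = char_ptr >> 1
    shift = CHAR_BITS if (char_ptr & 1) else 0
    word = memory[index] & ~(CHAR_MASK << shift) & WORD_MASK
    memory[index] = word | ((value & CHAR_MASK) << shift)
```

Note that a char store becomes a read-modify-write of the whole word, which is one reason the "smallest loadable unit" and "the type of `char`" can be different notions of a byte on such a target.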
David Chisnall via llvm-dev
2019-Oct-31 11:48 UTC
[llvm-dev] RFC: On non 8-bit bytes and the target for it
On 31/10/2019 11:17, Dmitriy Borisenkov wrote:
> David, just to clarify a misconception I might have introduced, we do
> not have linear memory in the sense that all data is stored as a trie.
> We do support arrays, structures and GEPs, however, as well as all
> relevant features in C by modeling memory.

So, if I understand correctly, your memory is a key-value store where the keys are 257-bit values and the values are arrays of 257-bit values? Or are the values themselves 257-bit values? To help the discussion, please can you explain how the following are represented:

- A pointer to an object.
- A pointer to a field in an object.
- An arbitrary void*.
- The C string "hello world"

> So regarding concepts of byte, all 5 statements you gave are true for
> our target. Either due to the specification or because of
> performance (gas consumption) issues. But if there are architectures
> that need less from the notion of byte, we should try to figure out the
> common denominator. It's probably ok to be less restrictive about a byte.

It seems odd to encode a C string as an array of 257-bit values, rather than as an array of 8-bit values that are stored in 32-char chunks.

David
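The "32-char chunks" alternative can be sketched as follows: 32 characters of 8 bits each occupy 256 bits, which fits in one 257-bit value. This is a minimal illustration, not a description of any existing backend. The helper names (`pack_string`, `char_at`) and the choice to put character 0 in the lowest bits of a cell are assumptions.

```python
# Sketch: storing a NUL-terminated C string as 8-bit chars packed
# 32 per 257-bit cell. Names and in-cell character order are
# hypothetical choices made for illustration.
CELL_BITS = 257
CHAR_BITS = 8
CHARS_PER_CELL = 32   # 32 * 8 = 256 bits, which fits in 257


def pack_string(data):
    """Pack bytes plus a trailing NUL into a list of cell-sized integers."""
    data = data + b"\x00"
    cells = []
    for start in range(0, len(data), CHARS_PER_CELL):
        chunk = data[start:start + CHARS_PER_CELL]
        # Little-endian: character 0 of the chunk lands in the lowest bits.
        cells.append(int.from_bytes(chunk, "little"))
    return cells


def char_at(cells, index):
    """Read one character: select the cell, then shift and mask."""
    cell = cells[index // CHARS_PER_CELL]
    shift = CHAR_BITS * (index % CHARS_PER_CELL)
    return (cell >> shift) & 0xFF
```

With this layout "hello world" plus its terminator occupies a single cell rather than twelve, at the cost of a divide, shift and mask on every character access.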
Dmitriy Borisenkov via llvm-dev
2019-Oct-31 13:41 UTC
[llvm-dev] RFC: On non 8-bit bytes and the target for it
> So, if I understand correctly, your memory is a key-value store where the
> keys are 257-bit values and the values are arrays of 257-bit values?

Both the keys and the values are 257 bits wide:

- A pointer to an object is a 257-bit integer.
- The same goes for a pointer to a field of an object.
- An arbitrary void* is also a 257-bit integer.
- "Hello, world" is an array of 257-bit characters.

It's indeed redundant for letters and pointers to occupy that much space. However, a realistic contract that is able to run on the virtual machine without exceeding gas limits can't use strings and memory extensively, so we've chosen the simplest implementation possible. If other targets that have non-8-bit bytes pack multiple 8-bit characters into a single byte, and it's convenient for the community to maintain this kind of design, we can probably reimplement strings this way too.

Persistent data, which is kept in the blockchain, is more compact, but it requires explicit intrinsic calls to deserialize the data, after which the programmer can manipulate it as 257-bit integers.
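The memory model described in this reply, where keys and values are both 257 bits wide, can be approximated as below. This is a rough sketch only: a Python dict stands in for the trie, cells are treated as unsigned, unwritten cells are assumed to read as zero, and a field pointer is assumed to be the object's base address plus a field index. None of these details are confirmed by the thread, and all names (`CellMemory`, `store_string`, `field_address`) are hypothetical.

```python
# Sketch of a key-value memory whose addresses and cells are both
# 257-bit integers. The dict stands in for the trie; unsigned cells,
# zero-default reads and base+index field pointers are assumptions.
CELL_BITS = 257
CELL_MASK = (1 << CELL_BITS) - 1


class CellMemory:
    def __init__(self):
        self._cells = {}

    def store(self, address, value):
        """Write one cell; address and value are truncated to 257 bits."""
        self._cells[address & CELL_MASK] = value & CELL_MASK

    def load(self, address):
        """Read one cell; unwritten cells read as zero (assumed)."""
        return self._cells.get(address & CELL_MASK, 0)


def field_address(base, field_index):
    """A pointer to a field is just another 257-bit integer key."""
    return (base + field_index) & CELL_MASK


def store_string(memory, base, text):
    """Store a C string one character per cell, NUL-terminated."""
    for offset, byte in enumerate(text.encode("ascii") + b"\x00"):
        memory.store(field_address(base, offset), byte)
    return base
```

In this model "Hello, world" takes thirteen cells, one per character plus the terminator, which is the space redundancy the reply acknowledges.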