Dmitriy Borisenkov via llvm-dev
2019-Oct-23 09:16 UTC
[llvm-dev] RFC: On non 8-bit bytes and the target for it
This RFC is to ask whether the community is interested in further discussion of iN bytes support. The issue was last on the agenda in May, when the discussion was triggered by Jesper Antonsson's patches (see https://lists.llvm.org/pipermail/llvm-dev/2019-May/132080.html).

It seems that, while some downstream users benefit from non-8-bit byte support, the feature is barely maintainable given the lack of upstream targets that use it. The reason I would like to raise the matter again is that we, the TON Labs team, would like to upstream our backend.

The backend generates code for the TON virtual machine (TVM), which is designed to run smart contracts on the TON blockchain (see the original specifications for TVM and TON at https://test.ton.org/tvm.pdf and https://test.ton.org/tblkch.pdf respectively).

The target has the following key particularities:

- stack-based virtual machine
- 257-bit wide integers, signed magnitude representation
- no floating-point arithmetic support
- persistent storage
- no "native" memory; modeling is possible but costly
- presence of custom types (it is exactly the reason for upstreaming)

Given that the TVM only operates on 257-bit wide numbers, we changed LLVM downstream to get a 257-bit byte. At the moment, we have a hacky implementation with the new byte size hardcoded. For reference: the change touched approximately 20 files in LLVM and about a dozen in Clang. Later on, we plan to integrate the new byte size with the data layout according to https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/. And if the community decides to move on, we will upstream and maintain it.

We realize that a 257-bit byte is quite unusual, but for smart contracts it is OK to have numbers of at least 256 bits.
The leading VM for smart contracts, the Ethereum VM, introduced this practice and other blockchain VMs followed. Thus, while TVM might be the first LLVM-based blockchain target that needs the feature, it is not necessarily the last one. We also found mentions of 12-, 16- and 24-bit wide bytes in past non-8-bit byte discussions (in reverse chronological order):

- https://lists.llvm.org/pipermail/llvm-dev/2019-May/132080.html
- http://lists.llvm.org/pipermail/llvm-dev/2017-January/109335.html
- http://lists.llvm.org/pipermail/llvm-dev/2017-January/108901.html
- http://lists.llvm.org/pipermail/llvm-dev/2015-March/083177.html
- http://lists.llvm.org/pipermail/llvm-dev/2014-September/076543.html
- http://lists.llvm.org/pipermail/llvm-dev/2009-September/026027.html

Our toolchain is going to be based only on OSS, so the backend can be used without obtaining any proprietary software. We also hope that implementing a target like TVM would help to generalize some concepts in LLVM and make the whole framework better suited to non-mainstream architectures.

Aside from non-i8 bytes, we would like to bring stack machine support to the target-independent code generator. The matter will be discussed at the developers' meeting, see http://llvm.org/devmtg/2019-10/talk-abstracts.html#bof2.

LLVM and Clang for TVM are available at https://github.com/tonlabs/TON-Compiler. The code is currently based on LLVM 7 and can only produce assembly; we have not specified our object file format yet. Moreover, we have introduced custom IR types to model Tuples, Slices, Builders and Cells from the specification.
We are going to do an LLVM update and consider using opaque types before starting to upstream.

--
Kind regards, Dmitry Borisenkov
JF Bastien via llvm-dev
2019-Oct-24 21:21 UTC
[llvm-dev] RFC: On non 8-bit bytes and the target for it
I’d like to understand what programming model you see programmers using. You don’t need 257 bits per byte if you only offer 257-bit integers. Rather, bytes aren’t really a thing at that point. LLVM kinda handles iN already, and your backend would legalize everything to exactly this type and nothing else, right? Would it be sufficient to expose something like int<unsigned Size> with Size=257 for your programming environment?

It would also be useful to understand what other changes you’re proposing, especially your mention of Tuples, Slices, Builders, Cells.
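To make the int<unsigned Size> idea concrete, here is a minimal Python sketch of a fixed-width signed integer whose width is a parameter rather than a multiple of a byte. The class name FixedInt and its methods are hypothetical, not an LLVM or Clang API. The sketch assumes two's-complement wraparound modulo 2**bits, which is how LLVM's add, sub and mul behave on iN; it does not model the signed magnitude representation the RFC attributes to TVM, nor the VM's actual overflow behaviour.

```python
class FixedInt:
    """Hypothetical signed integer with a fixed bit width (assumed two's complement)."""

    def __init__(self, bits: int, value: int = 0):
        if bits < 1:
            raise ValueError("bits must be positive")
        self.bits = bits
        self.value = self._wrap(value)

    def _wrap(self, value: int) -> int:
        # Reduce modulo 2**bits, then reinterpret the top bit as the sign bit.
        value &= (1 << self.bits) - 1
        if value >> (self.bits - 1):
            value -= 1 << self.bits
        return value

    def _operand(self, other: "FixedInt") -> int:
        # Only values of the same width may be combined, as with LLVM's iN.
        if not isinstance(other, FixedInt) or other.bits != self.bits:
            raise TypeError("operands must be FixedInt of the same width")
        return other.value

    def __add__(self, other: "FixedInt") -> "FixedInt":
        return FixedInt(self.bits, self.value + self._operand(other))

    def __sub__(self, other: "FixedInt") -> "FixedInt":
        return FixedInt(self.bits, self.value - self._operand(other))

    def __mul__(self, other: "FixedInt") -> "FixedInt":
        return FixedInt(self.bits, self.value * self._operand(other))

    def __repr__(self) -> str:
        return f"i{self.bits}({self.value})"


# Size=257: the largest positive value is 2**256 - 1; adding 1 wraps around.
largest = FixedInt(257, 2**256 - 1)
wrapped = largest + FixedInt(257, 1)
assert wrapped.value == -(2**256)
```

Nothing in the sketch refers to a byte size: the width alone determines the value range, which is the point of the question above.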
David Chisnall via llvm-dev
2019-Oct-24 23:02 UTC
[llvm-dev] RFC: On non 8-bit bytes and the target for it
On 24/10/2019 14:21, JF Bastien via llvm-dev wrote:
> I’d like to understand what programming model you see programmers using.
> You don’t need 257 bits per byte if you only offer 257 bit integers.
> Rather, bytes aren’t really a thing at that point. LLVM kinda handles iN
> already, and your backend would legalize everything to exactly this type
> and nothing else, right? Would it be sufficient to expose something like
> int<unsigned Size> with Size=257 for your programming environment?

To add to what JF says:

Typically, a byte means some combination of:

1. The smallest unit that can be indexed in memory (irrelevant for you, you have no memory).

2. The smallest unit that can be stored in a register in such a way that its representation is opaque to software (i.e. you can't tell the bit order of a byte in a multi-byte word). For you, it's not clear if this is 257 bits or something smaller.

3. The smallest unit that is used to build complex types in software. Since you have no memory, it's not clear that you can build structs or arrays, and therefore this doesn't seem to apply.

From your description of your VM, it doesn't sound as if you can translate from any language with a vaguely C-like abstract machine, so I'm not certain why the size of a byte actually matters to you. LLVM IR has a quite C-like abstract machine, and several of these features seem like they will be problematic for you. There is quite a limited subset of LLVM IR that can be expressed for your VM and it would be helpful if you could enumerate what you expect to be able to support (and why going via LLVM is useful, given that you are unlikely to be able to take advantage of any existing front ends, many optimisations, or most of the target-agnostic code generator).

David
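As an illustration of the first meaning in the list above (the smallest indexable unit of memory), the following Python sketch shows how the number of addressable units occupied by a value depends on the assumed byte width. The function name size_in_bytes is hypothetical and is not an LLVM API; the sketch only performs the ceiling division that any "size in bytes" computation implies.

```python
def size_in_bytes(bit_width: int, byte_bits: int = 8) -> int:
    """Whole bytes needed to hold bit_width bits, for a byte of byte_bits bits."""
    if bit_width < 1 or byte_bits < 1:
        raise ValueError("widths must be positive")
    # Ceiling division without going through floating point.
    return -(-bit_width // byte_bits)


# Compare the byte widths mentioned in this thread for two integer types.
for byte_bits in (8, 16, 257):
    print(f"byte = {byte_bits:3d} bits: "
          f"i32 -> {size_in_bytes(32, byte_bits)} bytes, "
          f"i257 -> {size_in_bytes(257, byte_bits)} bytes")
```

With an 8-bit byte an i257 value spans 33 addressable units, while with a 257-bit byte it spans exactly one. That distinction only matters if something is actually indexed in memory, which is the question raised in point 1.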