thr3ads.net - llvm dev - [llvm-dev] RFC: On non 8-bit bytes and the target for it [Oct 2019]


David Chisnall via llvm-dev

2019-Oct-24 23:02 UTC

[llvm-dev] RFC: On non 8-bit bytes and the target for it

On 24/10/2019 14:21, JF Bastien via llvm-dev wrote:
> I’d like to understand what programming model you see programmers using.
> You don’t need 257 bits per byte if you only offer 257 bit integers.
> Rather, bytes aren’t really a thing at that point. LLVM kinda handles iN
> already, and your backend would legalize everything to exactly this type
> and nothing else, right? Would it be sufficient to expose something like
> int<unsigned Size> with Size=257 for your programming environment?

To add to what JF says:

Typically, a byte means some combination of:

1. The smallest unit that can be indexed in memory (irrelevant for you, 
you have no memory).
2. The smallest unit that can be stored in a register in such a way that 
its representation is opaque to software (i.e. you can't tell the bit 
order of a byte in a multi-byte word).  For you, it's not clear if this 
is 257 bits or something smaller.
3. The smallest unit that is used to build complex types in software. 
Since you have no memory, it's not clear that you can build structs or 
arrays, and therefore this doesn't seem to apply.

From your description of your VM, it doesn't sound as if you can
translate from any language with a vaguely C-like abstract machine, so
I'm not certain why the size of a byte actually matters to you.  LLVM IR
has a quite C-like abstract machine, and several of these features seem
like they will be problematic for you.  There is quite a limited subset
of LLVM IR that can be expressed for your VM and it would be helpful if
you could enumerate what you expect to be able to support (and why going
via LLVM is useful, given that you are unlikely to be able to take
advantage of any existing front ends, many optimisations, or most of the
target-agnostic code generator).

David

Dmitriy Borisenkov via llvm-dev

2019-Oct-25 11:44 UTC

Just to clarify, the VM indeed doesn't have memory, but we emulate
memory with dictionaries (address -> value), which are native to TVM. Thus
you can work with arrays and structures in TVM.
However, access to a dictionary is very expensive in terms of gas (the fee
you pay for contract execution in the blockchain). We really don't want to
have unaligned memory accesses and things like that. Aside from that, our
"ALU" only supports 257-bit operations, and handling overflows of smaller
types is an additional expense for the user. So we set sizeof(char) ==
sizeof(short) == sizeof(int) == sizeof(long) == sizeof(long long) == 1
byte == 257 bits in C. Luckily, the C spec allows it. We do not have a
specification requirement to do so, but we found it natural from the
implementation and user experience points of view.
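
As a rough illustration of the memory model described above, here is a small
C++ sketch of memory emulated by a dictionary from address to value, where
each address holds one whole cell. The names Cell, DictMemory, load, store
and sumArray are hypothetical and are not taken from TVM or from our backend,
and a 64-bit integer stands in for the 257-bit value because standard C++ has
no such type.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Stand-in for one 257-bit value; a real cell would need a big-integer type.
using Cell = std::int64_t;

// Hypothetical dictionary-backed memory: every address names one whole cell,
// so an unaligned or sub-cell access cannot even be expressed.
class DictMemory {
public:
    // Store one cell at the given address (one dictionary update).
    void store(std::uint64_t address, Cell value) {
        cells_[address] = value;
        ++accesses_;
    }

    // Load one cell; addresses never written read as zero in this sketch.
    Cell load(std::uint64_t address) {
        ++accesses_;
        auto it = cells_.find(address);
        return it == cells_.end() ? 0 : it->second;
    }

    // Number of dictionary accesses so far, a crude proxy for gas cost.
    std::uint64_t accesses() const { return accesses_; }

private:
    std::map<std::uint64_t, Cell> cells_;
    std::uint64_t accesses_ = 0;
};

// An array of n integers occupies n consecutive addresses, because every
// integer type is exactly one byte (one cell) wide in this model.
Cell sumArray(DictMemory &mem, std::uint64_t base, std::uint64_t n) {
    Cell total = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        total += mem.load(base + i);
    }
    return total;
}
```

In this sketch, storing three elements and summing them costs six dictionary
accesses, which is the kind of per-access expense that makes narrower types
and unaligned accesses unattractive.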

Our goal is to allow using general-purpose languages to develop smart
contracts, since we believe it was a shortcoming of Ethereum to focus solely
on Solidity. That is why we decided to use LLVM. As for the LLVM
specification coverage, at the moment we support operations with memory
(they are probably not well tested yet, but there is a bunch of tests on
arrays) and structures, all integer arithmetic and bitwise operations,
control-flow instructions excluding exception handling and indirectbr,
comparisons, extensions and truncations (we do have values smaller than i257
that are stored in persistent memory, where a user pays for data storage;
but persistent memory is a different story: it will likely become a
different address space in the future, but for now it's only accessible
through intrinsics). We also support memcpy and memset in non-persistent
memory.

As for Slices, Builders and the rest, we aren't crazy enough to really
propose upstreaming them - they are very specific to our VM. It's an
implementation detail at the moment - we did introduce these entities as
types, basically because of time pressure on the project. We want to switch
to opaque types if it's possible without losing the correctness of our
backend. If it's impossible, well, we will probably start looking for a way
to change the framework so that a target could introduce its own type, but
I really hope it won't come to that.

So the scope of the changes we'd like to introduce is:
1. Getting rid of the byte-size assumption in LLVM and Clang (adding the
byte size to the data layout, removing the magic number 8 (where it means
the size of a byte) from LLVM and Clang, introducing the notion of a byte
for memcpy and memset). The C spec doesn't have this constraint, so I'm not
sure that LLVM should be more restrictive here.
2. Adding support for stack machines in the backend (generalizing the
algorithms for converting register-based instructions to stack-based ones,
a generic implementation of scheduling appropriate for a stack machine,
and an implementation of stack-aware (i.e. configurable) reassociation).
This was discussed during the BoF session at the recent conference. We are
going to summarize the results soon.
3. The backend itself.
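
To give a feel for what item 2 involves, here is a deliberately simplified
C++ sketch: it walks an expression tree in post-order to emit stack code and
then runs that code on a tiny evaluator. It assumes every value is used
exactly once, so it shows none of the scheduling or reassociation work
mentioned above. Expr, StackInst, constant, binary, emit and run are
hypothetical names for illustration and do not correspond to anything in
LLVM or in our backend.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical expression node: a constant leaf ('k') or a binary operation.
struct Expr {
    char op = 'k';
    std::int64_t value = 0;  // used only when op == 'k'
    std::unique_ptr<Expr> lhs, rhs;
};

std::unique_ptr<Expr> constant(std::int64_t v) {
    auto e = std::make_unique<Expr>();
    e->value = v;
    return e;
}

std::unique_ptr<Expr> binary(char op, std::unique_ptr<Expr> l,
                             std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// One stack-machine instruction: push a constant ('k') or apply an operator.
struct StackInst {
    char op;
    std::int64_t imm;
};

// Post-order walk: operands are emitted first, so they are already on the
// stack when the operator executes and no register names are needed.
void emit(const Expr &e, std::vector<StackInst> &out) {
    if (e.op == 'k') {
        out.push_back({'k', e.value});
        return;
    }
    emit(*e.lhs, out);
    emit(*e.rhs, out);
    out.push_back({e.op, 0});
}

// Tiny evaluator for the emitted code.
std::int64_t run(const std::vector<StackInst> &code) {
    std::vector<std::int64_t> stack;
    for (const StackInst &inst : code) {
        if (inst.op == 'k') {
            stack.push_back(inst.imm);
            continue;
        }
        // The right operand was pushed last, so it is popped first.
        std::int64_t rhs = stack.back(); stack.pop_back();
        std::int64_t lhs = stack.back(); stack.pop_back();
        switch (inst.op) {
            case '+': stack.push_back(lhs + rhs); break;
            case '-': stack.push_back(lhs - rhs); break;
            case '*': stack.push_back(lhs * rhs); break;
            default: assert(false && "unknown operator");
        }
    }
    return stack.back();
}
```

Values with several uses are where this stops being trivial: they need
explicit stack manipulation, and the order in which operands are produced
starts to affect code size.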

So basically, we believe that (1) is beneficial for Embecosm, Ericsson and
other companies that were actively involved in previous iterations of the
non-8-bit byte discussion. (3) addresses the main concern of the
community: the testability of these changes. (2) benefits WebAssembly and
any further stack machines implemented in LLVM.

--
Kind regards, Dmitry


Jesper Antonsson via llvm-dev

2019-Oct-25 15:46 UTC

Hi Dmitriy,

I can confirm that Ericsson remains interested in the byte-size issue.
We would be more than happy to contribute/collaborate on patches,
suggestions and reviews in that area, should your upstreaming effort
win community approval.

Best regards, Jesper



Chris Lattner via llvm-dev

2019-Oct-26 02:56 UTC

Right.  A 257-bit target is a bit crazy, but there are lots of other targets
that only have 16-bit or 32-bit addressable memory.  I’ve heard various people
say that they have out-of-tree patches to support non-8-bit-byte targets,
but because there is no in-tree target that uses them, it is very difficult to
merge these patches upstream.

I for one would love to see some of these patches get upstreamed.  If the only
problem is one of testing, then maybe we could make a virtual target exist, or
maybe we could accept the patches without test cases (so long as they don’t
break 8-bit-byte targets, obviously).

-Chris

Dmitriy Borisenkov via llvm-dev

2019-Oct-29 19:11 UTC

Thanks, Chris, for supporting the idea of having non-8-bit bytes in LLVM.

I want to clarify the scope and then analyze the options we have.

The scope:
1. BitsPerByte or a similar variable should be introduced to the data layout;
include/CodeGen/ValueTypes.h and some other generic headers also need to be
updated and will probably become dependent on the data layout.
2. The magic number 8 should be replaced with BitsPerByte. We found that 8 is
used as "size of a byte in bits" in the Selection DAG, the asm printer, and
analysis and transformation passes. Some of the passes are currently
independent of any target-specific information. Downstream, we changed about
ten passes before our testing succeeded, but we might have missed some cases
due to the incompleteness of our tests.
3. &255 and other bit manipulations. We didn't catch many of those with our
downstream testing. But again, at the moment, our tests are not
sufficiently good to support any claims here.
4. The concept of a byte should probably be introduced to Type.h. The
assumption that Type::getInt8Ty returns the type for a byte is baked into the
code generator, builtins (notably memcpy and memset) and more than ten
analysis and transformation passes.
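
Items 2 and 3 of the scope can be pictured with a small self-contained C++
sketch. LayoutSketch, storeSizeInBytes and lowByte are hypothetical names
that only stand in for the proposed data-layout field; this is not the real
llvm::DataLayout API, and values are kept in a 64-bit integer purely for
illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the byte-width field proposed for the data layout.
struct LayoutSketch {
    unsigned bitsPerByte;
};

// Number of bytes needed to store a value of the given bit width, rounding
// up. With the magic number baked in, this would read (bits + 7) / 8.
std::uint64_t storeSizeInBytes(const LayoutSketch &dl, std::uint64_t bits) {
    return (bits + dl.bitsPerByte - 1) / dl.bitsPerByte;
}

// Low byte of a value. With the magic number baked in, this would read
// value & 255. Valid only for bitsPerByte < 64 in this sketch, because the
// value lives in a 64-bit integer.
std::uint64_t lowByte(const LayoutSketch &dl, std::uint64_t value) {
    assert(dl.bitsPerByte > 0 && dl.bitsPerByte < 64);
    return value & ((std::uint64_t{1} << dl.bitsPerByte) - 1);
}
```

With an 8-bit layout both functions reproduce today's behaviour; with a
16-bit layout a 32-bit value takes two bytes and the low byte of 0x123456 is
0x3456, which an unconverted & 255 would silently get wrong.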

It is worth noting that these changes should apply to upcoming patches as
well as to the existing code: if we decide to move on, developers should no
longer assume that a byte is 8 bits wide, except in target-dependent pieces
of code.

The options we have:
1. Perform 1 - 4 without any testing upstream. This seems a very fragile
solution to me. Without any non-8-bit target upstream, it's unlikely
that contributors will differentiate between getInt8Ty() and getByteTy().
So I guess that after a couple of months we'll get a mix of 8s and
BitsPerBytes in the code, and no test will fail as a result. The remedy
is probably an active downstream contributor who stays on top of trunk and
checks new patches against their tests daily.
2. Test with a dummy target. It might work if we have a group of
contributors who are willing to rewrite and upstream some of their
downstream tests as well as to design and implement the target itself. The
issue here might be in functional tests, so we'd probably need to implement
a dummy virtual machine to run them, because lit tests are unlikely to catch
all issues from paragraphs (2) and (3) of the scope described.
3. TON Labs can provide its crazy target or some lightweight version of
it. From the testing point of view, this works similarly to the second
solution, but it doesn't require inventing anything. I could create a
separate RFC about the target to find out if the community thinks it's
appropriate.

--
Kind regards, Dmitry.

