thr3ads.net - llvm dev - [llvm-dev] RFC: On non 8-bit bytes and the target for it [Oct 2019]

Jeroen Dobbelaere via llvm-dev

2019-Oct-30 10:07 UTC

[llvm-dev] RFC: On non 8-bit bytes and the target for it

> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of JF Bastien via
[..]
> Is it relevant to any modern compiler though?
>
> I strongly agree with Tim. As I said in previous threads, unless people will have
> actual testable targets for this type of thing, I think we shouldn’t add
> maintenance burden. This isn’t really C or C++ anymore because so much code
> assumes CHAR_BIT == 8, or at a minimum CHAR_BIT % 8 == 0, that we’re
> supporting a different language. IMO they should use a different language, and
> C / C++ should only allow CHAR_BIT % 8 == 0 (and only for small values of
> CHAR_BIT).

We (Synopsys ASIP Designer team) and our customers tend to disagree: our customers do create plenty of CPU architectures
with non-8-bit characters (and non-8-bit addressable memories). We are able to provide them with a working C/C++ compiler solution.
Maybe some support libraries are not supported out of the box, but for these kinds of architectures that is acceptable.
(Besides that, LLVM is also more than just C/C++.)

Greetings,

Jeroen Dobbelaere

JF Bastien via llvm-dev

2019-Oct-30 13:35 UTC

[llvm-dev] RFC: On non 8-bit bytes and the target for it

> On Oct 30, 2019, at 3:07 AM, Jeroen Dobbelaere <Jeroen.Dobbelaere at synopsys.com> wrote:
>
>> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of JF Bastien via
> [..]
>> Is it relevant to any modern compiler though?
>>
>> I strongly agree with Tim. As I said in previous threads, unless people will have
>> actual testable targets for this type of thing, I think we shouldn’t add
>> maintenance burden. This isn’t really C or C++ anymore because so much code
>> assumes CHAR_BIT == 8, or at a minimum CHAR_BIT % 8 == 0, that we’re
>> supporting a different language. IMO they should use a different language, and
>> C / C++ should only allow CHAR_BIT % 8 == 0 (and only for small values of
>> CHAR_BIT).
>
> We (Synopsys ASIP Designer team) and our customers tend to disagree: our customers do create plenty of cpu architectures
> with non-8-bit characters (and non-8-bit addressable memories). We are able to provide them with a working c/c++ compiler solution.
> Maybe some support libraries are not supported out of the box, but for these kind of architectures that is acceptable.

That’s the kind of use case I’d happily support if we had upstream testing, say
through a backend. I’m also happy if we remove magic numbers.

Can you share the values you see for CHAR_BIT?
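To make the CHAR_BIT question concrete: a common way code bakes in 8-bit bytes is by multiplying a `sizeof` result by a literal 8. The sketch below (function names are illustrative only, not from any library) contrasts that with the portable form that uses `CHAR_BIT` from `<limits.h>`. On a host with 8-bit bytes the two agree; they diverge exactly on the targets Jeroen describes.

```c
#include <limits.h> /* CHAR_BIT */
#include <stddef.h> /* size_t */

/* Hidden assumption: one byte is 8 bits. Wrong whenever CHAR_BIT != 8. */
static size_t object_bits_assuming_octets(size_t size_in_bytes) {
    return size_in_bytes * 8;
}

/* Portable: sizeof counts bytes of CHAR_BIT bits each. */
static size_t object_bits(size_t size_in_bytes) {
    return size_in_bytes * CHAR_BIT;
}
```

For instance, `object_bits(sizeof(long))` reports the full object width of a `long` on any conforming target, while the octet version would undercount by half on a target with CHAR_BIT == 16. Code that truly requires octets can state the assumption with C11 `_Static_assert(CHAR_BIT == 8, "...")`, turning a silent miscompile into a compile error.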


> (Besides that, llvm is also more than just c/c++)
Agreed, I bring up C and C++ because they were the languages discussed in the
previous proposals.
> Greetings,
> 
> Jeroen Dobbelaere
> 
>

David Chisnall via llvm-dev

2019-Oct-30 15:18 UTC

[llvm-dev] RFC: On non 8-bit bytes and the target for it

On 30/10/2019 10:07, Jeroen Dobbelaere via llvm-dev wrote:
> We (Synopsys ASIP Designer team) and our customers tend to disagree: our customers do create plenty of cpu architectures
> with non-8-bit characters (and non-8-bit addressable memories). We are able to provide them with a working c/c++ compiler solution.
> Maybe some support libraries are not supported out of the box, but for these kind of architectures that is acceptable.
> (Besides that, llvm is also more than just c/c++)

My main concern in this discussion is that we're conflating several 
concepts of a 'byte':

  - The smallest unit that can be loaded / stored at a time.

  - The smallest unit that can be addressed with a raw pointer in a 
specific address space.

  - The largest unit whose encoding is opaque to anything above the ISA.

  - The type used to represent `char` in C.

  - The type that has a size that all other types are a multiple of.

In POSIX C (which imposes some extra constraints not found in ISO C), 
when lowered to LLVM IR, all of these are the same type:

  - Loads and stores of values smaller than i8 or not a multiple of i8 
may be widened to a multiple of i8.  Bitfields that are smaller 
than i8 must use i8 or wider operations and masking.

  - GEP indexes are not well defined for anything that is not a multiple 
of i8.

  - There is no defined bit order within an i8, nor for larger types; 
there is only an assumption that, for example, i32 is 4 i8s in a specific 
order specified by the data layout.

  - char is lowered to i8.

  - All ABI-visible types have a size that is a multiple of 8 bits.
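The masking mentioned above for sub-i8 bitfields amounts to a read-modify-write of a whole storage unit. Below is a minimal sketch in C for a hypothetical 3-bit field at bit offset 2 of one 8-bit unit; the field position is an assumption, it relies on `uint8_t` (and therefore on 8-bit bytes, as in the POSIX setting described here), and real compilers may choose a wider storage unit and a different bit numbering.

```c
#include <stdint.h>

/* Hypothetical layout: a 3-bit field starting at bit 2 of an 8-bit unit. */
enum { FIELD_SHIFT = 2, FIELD_WIDTH = 3 };
#define FIELD_MASK ((uint8_t)(((1u << FIELD_WIDTH) - 1u) << FIELD_SHIFT))

/* Write: load the whole unit, clear the field, merge the new bits, store. */
static void set_field(uint8_t *unit, unsigned value) {
    uint8_t old = *unit;                                   /* i8 load  */
    uint8_t bits = (uint8_t)((value << FIELD_SHIFT) & FIELD_MASK);
    *unit = (uint8_t)((old & (uint8_t)~FIELD_MASK) | bits); /* i8 store */
}

/* Read: load the whole unit, mask out the field, shift it down. */
static unsigned get_field(const uint8_t *unit) {
    return (unsigned)((*unit & FIELD_MASK) >> FIELD_SHIFT);
}
```

Values wider than the field are truncated to its low 3 bits, and the neighbouring bits of the unit are preserved across every write.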

It's not clear to me whether saying 'a byte is 257 bits' means changing all 
of these to 257 or changing only some of them to 257 (which?).  For 
example, when compiling C for 16-bit word-addressable historic 
architectures, typically:

  - char is 8 bits.

  - char* and void* are represented as a pointer plus a 1-bit offset 
(sometimes encoded in the low bit, so the load / store sequence is a 
right shift by one, a load, and then a mask, or a mask and shift, depending 
on the low bit).

  - Other pointer types are 16-bit aligned.
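The pointer-plus-offset scheme above can be simulated in a few lines of C. This is a sketch under stated assumptions, not any particular historic ABI: memory is modelled as an array of 16-bit words, a `char*` is encoded as `(word_address << 1) | half`, and offset 0 is assumed to name the high-order half of the word. All type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t word_t;   /* one addressable machine word */
typedef size_t char_ptr_t; /* encoded char pointer: (word << 1) | half */

/* Load: right shift by one, load the word, then mask or shift-and-mask. */
static unsigned load_char(const word_t *mem, char_ptr_t p) {
    word_t w = mem[p >> 1];
    if (p & 1u)
        return w & 0xFFu;        /* low half: mask only */
    return (w >> 8) & 0xFFu;     /* high half: shift and mask */
}

/* Store: no 8-bit store exists, so read the word, merge, write it back. */
static void store_char(word_t *mem, char_ptr_t p, unsigned c) {
    word_t w = mem[p >> 1];
    c &= 0xFFu;
    if (p & 1u)
        w = (word_t)((w & 0xFF00u) | c);
    else
        w = (word_t)((w & 0x00FFu) | (c << 8));
    mem[p >> 1] = w;
}

/* Other pointers are plain word addresses; converting one to char*
   adds the extra offset bit (here set to 0). */
static char_ptr_t char_ptr_from_word_ptr(size_t word_address) {
    return (char_ptr_t)word_address << 1;
}
```

The store path shows why such targets are awkward for a single notion of "byte": every `char` write becomes a word load, a merge and a full-word store.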

IBM's 36-bit word machines use a broadly similar strategy, though with 
some important differences, and I would imagine that most Synopsys cores 
are going to use some variation on this approach.

This probably involves a quite different design from a model with 257-bit 
registers, but most of the concerns don't exist if you don't have memory 
that can store byte arrays, so such a target involves very different design decisions.

TL;DR: A proposal for supporting non-8-bit bytes needs to explain what 
their expected lowerings are and what they mean by a byte.

David

Chris Lattner via llvm-dev

2019-Oct-30 22:30 UTC

[llvm-dev] RFC: On non 8-bit bytes and the target for it

> On Oct 30, 2019, at 3:07 AM, Jeroen Dobbelaere via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of JF Bastien via
> [..]
>> Is it relevant to any modern compiler though?
>>
>> I strongly agree with Tim. As I said in previous threads, unless people will have
>> actual testable targets for this type of thing, I think we shouldn’t add
>> maintenance burden. This isn’t really C or C++ anymore because so much code
>> assumes CHAR_BIT == 8, or at a minimum CHAR_BIT % 8 == 0, that we’re
>> supporting a different language. IMO they should use a different language, and
>> C / C++ should only allow CHAR_BIT % 8 == 0 (and only for small values of
>> CHAR_BIT).
>
> We (Synopsys ASIP Designer team) and our customers tend to disagree: our customers do create plenty of cpu architectures
> with non-8-bit characters (and non-8-bit addressable memories). We are able to provide them with a working c/c++ compiler solution.
> Maybe some support libraries are not supported out of the box, but for these kind of architectures that is acceptable.
> (Besides that, llvm is also more than just c/c++)

I agree - there are a lot of weird accelerators with LLVM backends, many of them
aren’t targeted by C compilers/code.  The ones that do have C frontends often
use weird dialects or lots of builtins, but they are still useful to support.

I find this thread to be a bit confusing: it seems that people are aware that
such chips exist (even today) but some folks are reluctant to add generic
support for them.  While I can see the concern about inventing new backends just
for testing, I don’t see an argument against generalizing the core and leaving
it untested (in master).  If any bugs creep in, then people with downstream
targets can fix them in core.

-Chris

Mikael Holmén via llvm-dev

2019-Oct-31 06:50 UTC

[llvm-dev] RFC: On non 8-bit bytes and the target for it

On Wed, 2019-10-30 at 15:30 -0700, Chris Lattner via llvm-dev wrote:
> > On Oct 30, 2019, at 3:07 AM, Jeroen Dobbelaere via llvm-dev <
> > llvm-dev at lists.llvm.org> wrote:
> > 
> > > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of JF
> > > Bastien via
> > 
> > [..]
> > > Is it relevant to any modern compiler though?
> > > 
> > > I strongly agree with Tim. As I said in previous threads, unless
> > > people will have
> > > actual testable targets for this type of thing, I think we
> > > shouldn’t add
> > > maintenance burden. This isn’t really C or C++ anymore because so
> > > much code
> > > assumes CHAR_BIT == 8, or at a minimum CHAR_BIT % 8 == 0, that
> > > we’re
> > > supporting a different language. IMO they should use a different
> > > language, and
> > > C / C++ should only allow CHAR_BIT % 8 == 0 (and only for small
> > > values of
> > > CHAR_BIT).
> > 
> > We (Synopsys ASIP Designer team) and our customers tend to
> > disagree: our customers do create plenty of cpu architectures
> > with non-8-bit characters (and non-8-bit addressable memories). We
> > are able to provide them with a working c/c++ compiler solution.
> > Maybe some support libraries are not supported out of the box, but
> > for these kind of architectures that is acceptable. 
> > (Besides that, llvm is also more than just c/c++)
> 
> I agree - there are a lot of weird accelerators with LLVM backends,
> many of them aren’t targeted by C compilers/code.  The ones that do
> have C frontends often use weird dialects or lots of builtins, but
> they are still useful to support.
> 
> I find this thread to be a bit confusing: it seems that people are
> aware that such chips exists (even today) but some folks are reticent
> to add generic support for them.  While I can see the concern about
> inventing new backends just for testing, I don’t see an argument
> against generalizing the core and leaving it untested (in
> master).  If any bugs creep in, then people with downstream targets
> can fix them in core.
Thanks Chris! This is what we would like to see as well!

We have a 16bit byte target downstream and we live pretty much on top-
of-tree since we pull from llvm every day. Every now and then we find
new 8bit byte assumptions in the code that break things for us that we
fix downstream.

If we were allowed, we would be happy to upstream such fixes which
would make life easier both for us (as we would need to maintain fewer
downstream diffs) and (hopefully) for others living downstream with
other non-8bit byte targets.

Now, while we try to fix things in ways that would work for several
different byte sizes, what _we_ actually really test is 16bit bytes, so
I'm sure we fail to generalize things enough for all sizes, but at
least our contributions will make things more general than today. 
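A typical instance of the 8-bit assumptions Mikael describes is a size computation written as `(bits + 7) / 8`, which rounds a bit width up to whole 8-bit bytes. Generalizing it means threading the byte width through as a parameter. The sketch below uses a hypothetical `byte_width` argument; it is not an existing LLVM API.

```c
#include <assert.h>

/* Whole bytes needed to hold `bits` bits on a target whose byte is
   `byte_width` bits wide (ceiling division). With byte_width == 8 this
   reduces to the familiar (bits + 7) / 8. */
static unsigned long store_size_in_bytes(unsigned long bits,
                                         unsigned long byte_width) {
    assert(byte_width > 0);
    return (bits + byte_width - 1) / byte_width;
}
```

With 16-bit bytes an i32 occupies 2 bytes, whereas the hard-coded formula would still report 4: exactly the kind of silent breakage that only a target like Mikael's exercises.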

And I imagine that if other downstream targets use other byte sizes
than us they would also notice when things break and would also pitch
in and generalize it further so that it in the end works for all users.

/Mikael
> 
> -Chris

Dmitriy Borisenkov via llvm-dev

2019-Oct-31 11:17 UTC

[llvm-dev] RFC: On non 8-bit bytes and the target for it

David, just to clarify a misconception I might have introduced: we do not
have linear memory, in the sense that all data is stored as a trie. We do,
however, support arrays, structures and GEPs, as well as all relevant C
features, by modeling memory.

So, regarding the concepts of a byte: all five statements you gave are true
for our target, either because of the specification or because of
performance (gas consumption) issues. But if there are architectures that
need less from the notion of a byte, we should try to figure out the common
denominator. It's probably OK to be less restrictive about what a byte is.
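One way to frame the "common denominator" question is to record David's five notions of a byte separately for each target and ask whether they coincide. The struct and function below are a hypothetical sketch for discussion, not a proposed or existing LLVM interface; the example widths in the note that follows are illustrative assumptions.

```c
#include <stdbool.h>

/* Hypothetical per-target record of the five notions of "byte" from
   David's list. Every field is a width in bits. */
struct byte_model {
    unsigned load_store_unit;  /* smallest unit loaded/stored at a time */
    unsigned addressable_unit; /* smallest unit a raw pointer can address */
    unsigned opaque_unit;      /* largest unit opaque above the ISA */
    unsigned char_width;       /* width of C `char` */
    unsigned size_quantum;     /* all type sizes are a multiple of this */
};

/* True when all five notions agree, so one byte width describes the target. */
static bool has_single_byte_width(const struct byte_model *m) {
    return m->load_store_unit == m->addressable_unit &&
           m->addressable_unit == m->opaque_unit &&
           m->opaque_unit == m->char_width &&
           m->char_width == m->size_quantum;
}
```

For POSIX C lowered to LLVM IR all five fields would be 8 and the check passes; a target where, as Dmitriy reports, all five notions agree (assuming, say, a 257-bit byte) also passes. David's 16-bit word-addressed example, with an 8-bit `char` but 16-bit raw-pointer granularity, fails, which is precisely the case where "the byte width" is ambiguous.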

--
Kind regards, Dmitry
