Jeroen Dobbelaere via llvm-dev
2019-Oct-30 10:07 UTC
[llvm-dev] RFC: On non 8-bit bytes and the target for it
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of JF Bastien via
[..]
> Is it relevant to any modern compiler though?
>
> I strongly agree with Tim. As I said in previous threads, unless people will have
> actual testable targets for this type of thing, I think we shouldn’t add
> maintenance burden. This isn’t really C or C++ anymore because so much code
> assumes CHAR_BIT == 8, or at a minimum CHAR_BIT % 8 == 0, that we’re
> supporting a different language. IMO they should use a different language, and
> C / C++ should only allow CHAR_BIT % 8 == 0 (and only for small values of
> CHAR_BIT).

We (Synopsys ASIP Designer team) and our customers tend to disagree: our customers do create plenty of cpu architectures with non-8-bit characters (and non-8-bit addressable memories). We are able to provide them with a working c/c++ compiler solution. Maybe some support libraries are not supported out of the box, but for these kind of architectures that is acceptable.

(Besides that, llvm is also more than just c/c++)

Greetings,

Jeroen Dobbelaere
JF Bastien via llvm-dev
2019-Oct-30 13:35 UTC
[llvm-dev] RFC: On non 8-bit bytes and the target for it
> On Oct 30, 2019, at 3:07 AM, Jeroen Dobbelaere <Jeroen.Dobbelaere at synopsys.com> wrote:
>
> We (Synopsys ASIP Designer team) and our customers tend to disagree: our customers do create plenty of cpu architectures
> with non-8-bit characters (and non-8-bit addressable memories). We are able to provide them with a working c/c++ compiler solution.
> Maybe some support libraries are not supported out of the box, but for these kind of architectures that is acceptable.

That’s the kind of use case I’d happily support if we had upstream testing, say through a backend. I’m also happy if we remove magic numbers. Can you share the values you see for CHAR_BIT?

> (Besides that, llvm is also more than just c/c++)

Agreed, I bring up C and C++ because they were the languages discussed in the previous proposals.
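To make the CHAR_BIT point concrete, here is a minimal sketch in C of the difference between code that hard-codes 8 and code that does not. All function names are hypothetical and chosen for illustration only; nothing here comes from LLVM or from the proposals under discussion. The octet helpers store one octet per `unsigned char` element, which is an assumption about the desired layout, not something the thread specifies.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Count the value bits of unsigned char at run time. Since unsigned char
 * has no padding bits, the result equals CHAR_BIT on any conforming host. */
static unsigned uchar_value_bits(void)
{
    unsigned n = 0;
    for (unsigned v = UCHAR_MAX; v != 0; v >>= 1)
        n++;
    return n;
}

/* Storage bits of unsigned int, written with the magic number 8.
 * This is only correct when CHAR_BIT == 8. */
static unsigned uint_bits_assuming_octets(void)
{
    return (unsigned)sizeof(unsigned) * 8u;
}

/* The same quantity without the magic number. */
static unsigned uint_bits(void)
{
    return (unsigned)sizeof(unsigned) * CHAR_BIT;
}

/* Serialise a 32-bit value as four octets, one octet per array element.
 * The explicit 0xFF mask keeps each element in octet range even when
 * unsigned char is wider than 8 bits. */
static void put_u32_le(unsigned char out[4], uint_least32_t v)
{
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)((v >> (8 * i)) & 0xFFu);
}

/* Inverse of put_u32_le. */
static uint_least32_t get_u32_le(const unsigned char in[4])
{
    uint_least32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint_least32_t)(in[i] & 0xFFu) << (8 * i);
    return v;
}

/* Round-trip check that holds whatever the host's CHAR_BIT is. */
static void self_check(void)
{
    unsigned char buf[4];
    put_u32_le(buf, 0x11223344u);
    assert(get_u32_le(buf) == 0x11223344u);
    assert(uchar_value_bits() == CHAR_BIT);
}
```

On a host with 8-bit char, `uint_bits_assuming_octets()` and `uint_bits()` agree; on a target with, say, 16-bit char they would differ, which is the kind of silent breakage being discussed.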
David Chisnall via llvm-dev
2019-Oct-30 15:18 UTC
[llvm-dev] RFC: On non 8-bit bytes and the target for it
On 30/10/2019 10:07, Jeroen Dobbelaere via llvm-dev wrote:
> We (Synopsys ASIP Designer team) and our customers tend to disagree: our customers do create plenty of cpu architectures
> with non-8-bit characters (and non-8-bit addressable memories). We are able to provide them with a working c/c++ compiler solution.
> Maybe some support libraries are not supported out of the box, but for these kind of architectures that is acceptable.
> (Besides that, llvm is also more than just c/c++)

My main concern in this discussion is that we're conflating several concepts of a 'byte':

 - The smallest unit that can be loaded / stored at a time.
 - The smallest unit that can be addressed with a raw pointer in a specific address space.
 - The largest unit whose encoding is opaque to anything above the ISA.
 - The type used to represent `char` in C.
 - The type that has a size that all other types are a multiple of.

In POSIX C (which imposes some extra constraints not found in ISO C), when lowered to LLVM IR, all of these are the same type:

 - Loads and stores of values smaller than i8 or not a multiple of i8 may be widened to a multiple of i8. Bitfield fields that are smaller than i8 must use i8 or wider operations and masking.
 - GEP indexes are not well defined for anything that is not a multiple of i8.
 - There is no defined bit order of i8 (or bit order for larger types, only an assumption that, for example, i32 is 4 i8s in a specific order specified by the data layout).
 - char is lowered to i8.
 - All ABI-visible types have a size that is a multiple of 8 bits.

It's not clear to me whether saying 'a byte is 257 bits' means changing all of these to 257 or changing only some of them to 257 (which?). For example, when compiling C for historic architectures with 16-bit word-addressable memory, typically:

 - char is 8 bits.
 - char* and void* are represented as a pointer plus a 1-bit offset (sometimes encoded in the low bit, so the load / store sequence is a right shift by one, a load, and then a mask or mask and shift depending on the low bit).
 - Other pointer types are 16-bit aligned.

IBM's 36-bit word machines use a broadly similar strategy, though with some important differences, and I would imagine that most Synopsys cores are going to use some variation on this approach.

This probably involves a quite different design to a model with 257-bit registers, but most of the concerns don't exist if you don't have memory that can store byte arrays and so involve very different design decisions.

TL;DR: A proposal for supporting non-8-bit bytes needs to explain what their expected lowerings are and what they mean by a byte.

David
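The char* lowering described above for 16-bit word-addressable memory can be sketched in C as a host-side simulation. This is an illustration under stated assumptions, not any particular compiler's output: the `byte_ptr` type and the helper names are hypothetical, the selector is assumed to live in the low bit of the pointer, and selector 0 is assumed to pick the low half of the word (real targets may choose the opposite order).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding of a char pointer: (word_index << 1) | select,
 * where select picks one of the two 8-bit halves of a 16-bit word. */
typedef uint32_t byte_ptr;

/* Build a byte pointer from a word index and a selector
 * (assumed here: 0 = low half, 1 = high half). */
static byte_ptr make_byte_ptr(uint32_t word_index, unsigned select)
{
    assert(select <= 1u);
    return (word_index << 1) | select;
}

/* Load: shift right by one to recover the word address, load the word,
 * then mask (low half) or shift and mask (high half). */
static uint8_t load_byte(const uint16_t *mem, byte_ptr p)
{
    uint16_t word = mem[p >> 1];
    return (p & 1u) ? (uint8_t)(word >> 8) : (uint8_t)(word & 0xFFu);
}

/* Store: the memory cannot write half a word, so this is a
 * read-modify-write of the containing 16-bit word. */
static void store_byte(uint16_t *mem, byte_ptr p, uint8_t value)
{
    uint16_t word = mem[p >> 1];
    if (p & 1u)
        word = (uint16_t)((word & 0x00FFu) | ((uint16_t)value << 8));
    else
        word = (uint16_t)((word & 0xFF00u) | value);
    mem[p >> 1] = word;
}
```

In this model the smallest loadable unit is 16 bits while `char` is 8 bits, so two of the five notions of 'byte' listed above already disagree, which is why a proposal has to say which one it means.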
Chris Lattner via llvm-dev
2019-Oct-30 22:30 UTC
[llvm-dev] RFC: On non 8-bit bytes and the target for it
> On Oct 30, 2019, at 3:07 AM, Jeroen Dobbelaere via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> We (Synopsys ASIP Designer team) and our customers tend to disagree: our customers do create plenty of cpu architectures
> with non-8-bit characters (and non-8-bit addressable memories). We are able to provide them with a working c/c++ compiler solution.
> Maybe some support libraries are not supported out of the box, but for these kind of architectures that is acceptable.
> (Besides that, llvm is also more than just c/c++)

I agree - there are a lot of weird accelerators with LLVM backends, and many of them aren’t targeted by C compilers/code. The ones that do have C frontends often use weird dialects or lots of builtins, but they are still useful to support.

I find this thread to be a bit confusing: it seems that people are aware that such chips exist (even today) but some folks are reticent to add generic support for them. While I can see the concern about inventing new backends just for testing, I don’t see an argument against generalizing the core and leaving it untested (in master). If any bugs creep in, then people with downstream targets can fix them in core.

-Chris
Mikael Holmén via llvm-dev
2019-Oct-31 06:50 UTC
[llvm-dev] RFC: On non 8-bit bytes and the target for it
On Wed, 2019-10-30 at 15:30 -0700, Chris Lattner via llvm-dev wrote:
> I agree - there are a lot of weird accelerators with LLVM backends,
> many of them aren’t targeted by C compilers/code. The ones that do
> have C frontends often use weird dialects or lots of builtins, but
> they are still useful to support.
>
> I find this thread to be a bit confusing: it seems that people are
> aware that such chips exists (even today) but some folks are reticent
> to add generic support for them. While I can see the concern about
> inventing new backends just for testing, I don’t see an argument
> against generalizing the core and leaving it untested (in
> master). If any bugs creep in, then people with downstream targets
> can fix them in core.

Thanks Chris! This is what we would like to see as well!

We have a 16-bit byte target downstream and we live pretty much on top-of-tree since we pull from llvm every day. Every now and then we find new 8-bit byte assumptions in the code that break things for us, and we fix them downstream.

If we were allowed, we would be happy to upstream such fixes, which would make life easier both for us (as we would need to maintain fewer downstream diffs) and (hopefully) for others living downstream with other non-8-bit byte targets.

Now, while we try to fix things in ways that would work for several different byte sizes, what _we_ actually really test is 16-bit bytes, so I'm sure we fail to generalize things enough for all sizes, but at least our contributions will make things more general than today. And I imagine that if other downstream targets use other byte sizes than us they would also notice when things break and would also pitch in and generalize it further so that it in the end works for all users.

/Mikael
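As a rough illustration of what an "8-bit byte assumption" looks like and how such a fix generalizes it, here is a small C sketch. Both helper names are hypothetical and invented for this example; they are not LLVM functions, and the actual downstream fixes are not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hard-coded assumption: one addressable unit holds 8 bits.
 * Rounds a size in bits up to a count of addressable units. */
static uint64_t size_in_units_hardcoded(uint64_t bits)
{
    return (bits + 7) / 8;
}

/* Generalized form: the width of the addressable unit is a parameter,
 * so the same code serves 8-bit, 16-bit, or other unit sizes. */
static uint64_t size_in_units(uint64_t bits, unsigned bits_per_unit)
{
    assert(bits_per_unit > 0);
    return (bits + bits_per_unit - 1) / bits_per_unit;
}
```

For a 32-bit value, `size_in_units(32, 8)` gives 4 units while `size_in_units(32, 16)` gives 2; the hard-coded version would report 4 on a 16-bit byte target and over-allocate or mis-index accordingly.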
Dmitriy Borisenkov via llvm-dev
2019-Oct-31 11:17 UTC
[llvm-dev] RFC: On non 8-bit bytes and the target for it
David,

just to clarify a misconception I might have introduced: we do not have linear memory, in the sense that all data is stored as a trie. We do, however, support arrays, structures and GEPs, as well as all relevant features in C, by modeling memory.

So regarding concepts of byte, all 5 statements you gave are true for our target, either due to the specification or because of performance (gas consumption) issues. But if there are architectures that need less from the notion of byte, we should try to figure out the common denominator. It's probably ok to be less restrictive about a byte.

--
Kind regards, Dmitry