Jesper Antonsson via llvm-dev
2019-May-06 08:25 UTC
[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes
I agree, addressable unit size is probably a better abstraction. However, in the lib/CodeGen directory alone, there's some 785 uses of the word "byte" and a significant fraction of the code that we want to modify is using the byte terminology today. An example of unmodified code from my showcase patch set: assert(!(Shift & 0x7) == 0 && "Shifts not aligned on Bytes are not supported."); uint64_t Offset = Shift / 8; unsigned TySizeInBytes = Origin->getValueSizeInBits(0) / 8; assert(!(Origin->getValueSizeInBits(0) & 0x7) == 0 && "The size of the original loaded type is not a "multiple of a byte."); How would you prefer we handle this? If we only remove the magic numbers using getAddressableUnitSize() instead of getBitsPerByte() we'd get some mixed terminology. If the community is ok with that, we're happy to do this. If we would go for changing the terminology overall, then the work and the patch sizes would grow considerably. BR, Jesper On Fri, 2019-05-03 at 20:17 +0000, Finkel, Hal J. via llvm-dev wrote:> On 5/3/19 2:58 PM, llvm-dev wrote: > > Jeroen Dobbelaere via llvm-dev <llvm-dev at lists.llvm.org> writes: > > > > > Hi Jesper, > > > > > > > Thanks, these are interesting differences. The CHAR_BIT and > > > > byte > > > > relation is established in the C standard and I would prefer > > > > the byte > > > > terminology. It means the same thing as addressable unit but is > > > > a bit > > > > shorter and probably more widely known. > > > > > > Looking purely from a c/c++ language viewpoint, this makes > > > sense. We > > > settled on using 'addressable unit size', but any abstraction > > > will > > > already be helpful. > > > > Given that f18 has just been accepted as an LLVM project, we > > probably > > shouldn't be using C/C++ or any specific language terminology in > > LLVM. > > Regardless of f18, we shouldn't anyway, unless it's terminology we > independently define in our language reference. LLVM has supported > many different language frontends for a long time :-) > -Hal > > > It seems useful to distinguish "addressable unit size" from "data > > size" > > and talk about things using such abstract terminology. > > > > -David > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
James Courtier-Dutton via llvm-dev
2019-May-06 09:12 UTC
[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes
On Mon, 6 May 2019 at 09:26, Jesper Antonsson via llvm-dev < llvm-dev at lists.llvm.org> wrote:> I agree, addressable unit size is probably a better abstraction. > However, in the lib/CodeGen directory alone, there's some 785 uses of > the word "byte" and a significant fraction of the code that we want to > modify is using the byte terminology today. An example of unmodified > code from my showcase patch set: > > assert(!(Shift & 0x7) == 0 && > "Shifts not aligned on Bytes are not supported."); > uint64_t Offset = Shift / 8; > unsigned TySizeInBytes = Origin->getValueSizeInBits(0) / 8; > assert(!(Origin->getValueSizeInBits(0) & 0x7) == 0 && > "The size of the original loaded type is not a > "multiple of a byte."); > > How would you prefer we handle this? If we only remove the magic > numbers using getAddressableUnitSize() instead of getBitsPerByte() we'd > get some mixed terminology. If the community is ok with that, we're > happy to do this. If we would go for changing the terminology overall, > then the work and the patch sizes would grow considerably. > > >Although the above is mentioning bytes, looking at the "/ 8" and "& 0x7" makes it look like the author meant octets and not bytes. Bytes can be any size of bits. Octets are only ever 8 bits. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190506/b612ecde/attachment.html>
Tim Northover via llvm-dev
2019-May-06 09:43 UTC
[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes
On Mon, 6 May 2019 at 10:13, James Courtier-Dutton via llvm-dev <llvm-dev at lists.llvm.org> wrote:> Although the above is mentioning bytes, looking at the "/ 8" and "& 0x7" makes it look like the author meant octets and not bytes. > Bytes can be any size of bits.I don't think you'll have much luck trying to make that stick for a general audience, or even a general compiler-writer audience. Byte is far too strongly associated with 8 bits these days.> Octets are only ever 8 bits.You might be able to convert all uses of byte to octet and abandon byte entirely, but at that point why bother? It feels like a change just for the sake of pedantry. I like the "addressable unit" name, though it's a bit long (AddrUnit seems OK). It at least signals to a reader that there might be something weird going on. Getting someone writing new code to think in those terms is a different matter, of course, but I don't think any of the changes under discussion really help there. BTW, is there an open source backend (in a fork, I assume) that does this? So that we can get some kind of idea of the real scope of the changes needed. Cheers. Tim.