thr3ads.net - llvm dev - [llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes [May 2019]

If this information is useful, please help other people find it:
Share via:

Jesper Antonsson via llvm-dev

2019-May-06 08:25 UTC

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

I agree, addressable unit size is probably a better abstraction.
However, in the lib/CodeGen directory alone, there's some 785 uses of
the word "byte" and a significant fraction of the code that we want to
modify is using the byte terminology today. An example of unmodified
code from my showcase patch set:

    assert(!(Shift & 0x7) == 0 &&
           "Shifts not aligned on Bytes are not supported.");
    uint64_t Offset = Shift / 8;
    unsigned TySizeInBytes = Origin->getValueSizeInBits(0) / 8;
    assert(!(Origin->getValueSizeInBits(0) & 0x7) == 0 &&
           "The size of the original loaded type is not a 
           "multiple of a byte.");

How would you prefer we handle this? If we only remove the magic
numbers using getAddressableUnitSize() instead of getBitsPerByte() we'd
get some mixed terminology. If the community is ok with that, we're
happy to do this. If we would go for changing the terminology overall,
then the work and the patch sizes would grow considerably.

BR,
Jesper


On Fri, 2019-05-03 at 20:17 +0000, Finkel, Hal J. via llvm-dev
wrote:> On 5/3/19 2:58 PM, llvm-dev wrote:
> > Jeroen Dobbelaere via llvm-dev <llvm-dev at lists.llvm.org>
writes:
> > 
> > > Hi Jesper,
> > > 
> > > > Thanks, these are interesting differences. The CHAR_BIT and
> > > > byte
> > > > relation is established in the C standard and I would prefer
> > > > the byte
> > > > terminology. It means the same thing as addressable unit but
is
> > > > a bit
> > > > shorter and probably more widely known.
> > > 
> > > Looking purely from a c/c++ language viewpoint, this makes
> > > sense.  We
> > > settled on using 'addressable unit size', but any
abstraction
> > > will
> > > already be helpful.
> > 
> > Given that f18 has just been accepted as an LLVM project, we
> > probably
> > shouldn't be using C/C++ or any specific language terminology in
> > LLVM.
> 
> Regardless of f18, we shouldn't anyway, unless it's terminology we
> independently define in our language reference. LLVM has supported
> many different language frontends for a long time :-)
>  -Hal
> 
> > It seems useful to distinguish "addressable unit size" from
"data
> > size"
> > and talk about things using such abstract terminology.
> > 
> >                             -David
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

James Courtier-Dutton via llvm-dev

2019-May-06 09:12 UTC

head link

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

On Mon, 6 May 2019 at 09:26, Jesper Antonsson via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> I agree, addressable unit size is probably a better abstraction.
> However, in the lib/CodeGen directory alone, there's some 785 uses of
> the word "byte" and a significant fraction of the code that we
want to
> modify is using the byte terminology today. An example of unmodified
> code from my showcase patch set:
>
>     assert(!(Shift & 0x7) == 0 &&
>            "Shifts not aligned on Bytes are not supported.");
>     uint64_t Offset = Shift / 8;
>     unsigned TySizeInBytes = Origin->getValueSizeInBits(0) / 8;
>     assert(!(Origin->getValueSizeInBits(0) & 0x7) == 0 &&
>            "The size of the original loaded type is not a
>            "multiple of a byte.");
>
> How would you prefer we handle this? If we only remove the magic
> numbers using getAddressableUnitSize() instead of getBitsPerByte() we'd
> get some mixed terminology. If the community is ok with that, we're
> happy to do this. If we would go for changing the terminology overall,
> then the work and the patch sizes would grow considerably.
>
>
>Although the above is mentioning bytes, looking at the "/ 8"   and
"& 0x7"
makes it look like the author meant octets and not bytes.
Bytes can be any size of bits. Octets are only ever 8 bits.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190506/b612ecde/attachment.html>

Tim Northover via llvm-dev

2019-May-06 09:43 UTC

head link

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

On Mon, 6 May 2019 at 10:13, James Courtier-Dutton via llvm-dev
<llvm-dev at lists.llvm.org> wrote:> Although the above is mentioning bytes, looking at the "/ 8"  
and "& 0x7" makes it look like the author meant octets and not
bytes.
> Bytes can be any size of bits.
I don't think you'll have much luck trying to make that stick for a
general audience, or even a general compiler-writer audience. Byte is
far too strongly associated with 8 bits these days.
> Octets are only ever 8 bits.
You might be able to convert all uses of byte to octet and abandon
byte entirely, but at that point why bother? It feels like a change
just for the sake of pedantry.

I like the "addressable unit" name, though it's a bit long
(AddrUnit
seems OK). It at least signals to a reader that there might be
something weird going on. Getting someone writing new code to think in
those terms is a different matter, of course, but I don't think any of
the changes under discussion really help there.

BTW, is there an open source backend (in a fork, I assume) that does
this? So that we can get some kind of idea of the real scope of the
changes needed.

Cheers.

Tim.

llvm dev - May 2019 - RFC: On removing magic numbers assuming 8-bit bytes

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes