thr3ads.net - llvm dev - [llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes [May 2019]

Jesper Antonsson via llvm-dev

2019-May-03 11:22 UTC

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

On Thu, 2019-05-02 at 19:54 +0200, Pavel Šnobl wrote:
> Hi Jesper,
> 
> thank you for working on this. My company (Codasip) would definitely
> be interested in having this feature upstream. I think that this is
> actually important for a surprisingly large number of people who
> currently have to maintain their changes downstream. I have a couple
> of questions and comments:
> 
> 1. Do you plan on supporting truly arbitrary values as the byte size
> or are there in fact going to be limitations (e.g. the value has to
> be a multiple of 8 and lower or equal to 64)? I recall that we had a
> customer asking about 36-bit bytes.
We plan on supporting arbitrary sizes with a lower limit of 8, not
necessarily power-of-two or multiples of 8. I have to admit that I
haven't thought very much about what the upper limit might be. We might
leave it up to other interested parties to explore that and if we
receive suggestions on how to generalize also in that respect, we'll
certainly consider them.
> 2. If you define a byte to be e.g. 16 bits wide, does it mean that
> "char" is also 16 bits wide? If yes then how do you define types
> like int8_t from stdint.h?
Yes, char is the same. The int8_t type is optional according to the
standard and we don't define it for our OOT target. The int_least8_t is
required, but we just define it to be byte sized. 
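
The int8_t / int_least8_t distinction can be checked from portable C++. The following is only a sketch: the names HasExactInt8, leastInt8Bits and bytesForBits are made up for illustration and do not come from the OOT target or from LLVM.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// int_least8_t must exist on every conforming implementation, but it only
// promises at least 8 bits. With 16-bit bytes it would occupy 16 bits.
static_assert(sizeof(std::int_least8_t) * CHAR_BIT >= 8,
              "int_least8_t must hold at least 8 bits");

// int8_t is optional. INT8_MAX is defined only when the exact-width type is.
#ifdef INT8_MAX
constexpr bool HasExactInt8 = true;
#else
constexpr bool HasExactInt8 = false;
#endif

// Storage size in bits of int_least8_t on the target this is compiled for.
constexpr unsigned leastInt8Bits() {
  return static_cast<unsigned>(sizeof(std::int_least8_t) * CHAR_BIT);
}

// Number of chars (bytes) needed to hold Bits bits on this target.
inline unsigned bytesForBits(unsigned Bits) {
  assert(Bits > 0 && "expected a non-zero width");
  return (Bits + CHAR_BIT - 1) / CHAR_BIT;
}
```

On a target with 16-bit bytes, one would expect HasExactInt8 to be false and leastInt8Bits() to be 16, which matches the "just define it to be byte sized" approach described above.
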
> 3. Have you thought about the possibility to support different byte
> sizes for data and code?
Not really, but I saw that Jeroen Dobbelaere just suggested supporting
memory spaces with different byte sizes.
> 4. I realize that this is a separate issue but fully supporting non-
> 8-bit bytes requires also changes to other parts of a typical
> toolchain, namely linker (ld/lld) and debugger (gdb/lldb). Do you
> maintain out-of-tree changes in this area as well?
That's true, we do. I've also seen some community interest in those
areas, e.g. from Embecosm:
https://www.embecosm.com/2018/02/26/how-much-does-a-compiler-cost/

and from within Ericsson:
https://www.youtube.com/watch?v=HAqtEZmci70

Thanks,
Jesper

> Thank you,
> Pavel
> 
> On Thu, May 2, 2019 at 2:20 PM Jesper Antonsson via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> > This RFC outlines a proposal regarding non-8-bit-byte support that
> > got positive reception at a Round Table at EuroLLVM19. The general
> > topic has been brought up several times before and one good overview
> > can be found in a FOSDEM 2017 presentation by Jones and Cook:
> > https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/
> > 
> > In a nutshell, the proposal is for the llvm community to
> > allow/encourage interested parties to gradually remove "magic
> > numbers", e.g. assumptions on the size of bytes, from the codebase.
> > An overview, rationale and some example refactorings follow.
> > 
> > Overview:
> > 
> > LLVM currently assumes 8-bit bytes, while there exist a few
> > out-of-tree llvm targets that utilize bytes of other sizes, including
> > our (Ericsson's) proprietary target. The main issues are the magic
> > number 8 and "/8" and "*8" all over the place and the use of i8
> > pointers.
> > 
> > There's considerable agreement that the use of magic numbers is not
> > good coding style, and removing these would be of particular
> > benefit, even though the effort would not be complete and no in-tree
> > target with tests exists to guarantee that all gains are maintained.
> > 
> > Ericsson is willing to drive this effort. During EuroLLVM19, there
> > seemed to be sufficient positive interest from other companies for us
> > to expect help with reviewing patch sets. Ericsson has been performing
> > nightly integration towards top-of-tree with this backend for years,
> > catching and fixing new 8-bit-byte assumptions continuously. Thus
> > we're able to commit to doing similar upstream fixes for the long
> > haul in a no-drama way.
> > 
> > Rationale:
> > 
> > Benefits of moving toward a byte-size agnostic llvm include:
> > * Fewer magic numbers in the codebase.
> > * A reduced effort to maintain out-of-tree targets with non-8-bit
> >   bytes as contributors follow the established patterns. (One company
> >   has told us that they created but eventually gave up on a 16-bit
> >   byte target due to a too-high integration burden.)
> > * A reduction in duplicate efforts as some of the adaptation work
> >   would happen in-tree rather than in several out-of-tree targets.
> > * For up-and-coming targets that have non-8-bit-byte sizes, time to
> >   market using llvm would be far shorter.
> > * A higher probability of LLVM being the compiler of choice for such
> >   targets.
> > * Eventually, as the patch set required to make llvm fully byte-size
> >   agnostic becomes small enough, the effort to provide a mock in-tree
> >   target with some other byte size should be surmountable.
> > 
> > As cons, one could see a burden for the in-tree community to maintain
> > whatever gains have been made. However, the onus should be on
> > interested parties to mend any bit-rot. Having fewer magic numbers
> > should, if anything, make the code easier to understand. The
> > permission to go ahead would be under the condition that significant
> > added complexities are avoided. Another con would be added
> > compilation time, e.g. in cases where the byte size is a run-time
> > variable rather than a constant. However, this cost seems negligible
> > in practice.
> > 
> > Refactoring examples:
> > https://reviews.llvm.org/D61432
> > 
> > Best Regards,
> > Jesper
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
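
The kind of change the RFC describes ("/8" and "*8" replaced by a named quantity) can be sketched as below. This is not taken from D61432 and does not use any real LLVM API: TargetInfoSketch, bitsToBytesMagic, bitsToBytes and bytesToBits are hypothetical names chosen for illustration, and passing the byte size at run time is just one possible design.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for whatever object describes the target.
struct TargetInfoSketch {
  unsigned BitsPerByte; // 8 on mainstream targets, e.g. 16 or 24 elsewhere
};

// Before: the byte size is hard-coded as the magic number 8.
inline std::uint64_t bitsToBytesMagic(std::uint64_t Bits) {
  return (Bits + 7) / 8;
}

// After: the byte size is a named property of the target. Rounds up, so a
// 36-bit value on a 24-bit-byte target needs 2 bytes.
inline std::uint64_t bitsToBytes(std::uint64_t Bits, TargetInfoSketch TI) {
  assert(TI.BitsPerByte >= 8 && "the thread assumes a lower limit of 8 bits");
  return (Bits + TI.BitsPerByte - 1) / TI.BitsPerByte;
}

// The "*8" direction: size in bytes back to size in bits.
inline std::uint64_t bytesToBits(std::uint64_t Bytes, TargetInfoSketch TI) {
  return Bytes * TI.BitsPerByte;
}
```

With BitsPerByte set to 8 both versions agree, so a refactoring of this shape should be behavior-preserving for in-tree targets. The division uses a run-time value here, which is the small compile-time cost the RFC mentions.
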

Rui Ueyama via llvm-dev

2019-May-07 05:23 UTC

From: Jesper Antonsson via llvm-dev <llvm-dev at lists.llvm.org>
Date: Fri, May 3, 2019 at 8:23 PM
To: snobl at codasip.com
Cc: llvm-dev at lists.llvm.org

> On Thu, 2019-05-02 at 19:54 +0200, Pavel Šnobl wrote:
> > Hi Jesper,
> >
> > thank you for working on this. My company (Codasip) would definitely
> > be interested in having this feature upstream. I think that this is
> > actually important for a surprisingly large number of people who
> > currently have to maintain their changes downstream. I have a couple
> > of questions and comments:
> >
> > 1. Do you plan on supporting truly arbitrary values as the byte size
> > or are there in fact going to be limitations (e.g. the value has to
> > be a multiple of 8 and lower or equal to 64)? I recall that we had a
> > customer asking about 36-bit bytes.
>
> We plan on supporting arbitrary sizes with a lower limit of 8, not
> necessarily power-of-two or multiples of 8. I have to admit that I
> haven't thought very much about what the upper limit might be. We might
> leave it up to other interested parties to explore that and if we
> receive suggestions on how to generalize also in that respect, we'll
> certainly consider them.
>
> > 2. If you define a byte to be e.g. 16 bits wide, does it mean that
> > "char" is also 16 bits wide? If yes then how do you define types
> > like int8_t from stdint.h?
>
> Yes, char is the same. The int8_t type is optional according to the
> standard and we don't define it for our OOT target. The int_least8_t is
> required, but we just define it to be byte sized.
>
> > 3. Have you thought about the possibility to support different byte
> > sizes for data and code?
>
> Not really, but I saw that Jeroen Dobbelaere just suggested supporting
> memory spaces with different byte sizes.
>
> > 4. I realize that this is a separate issue but fully supporting non-
> > 8-bit bytes requires also changes to other parts of a typical
> > toolchain, namely linker (ld/lld) and debugger (gdb/lldb). Do you
> > maintain out-of-tree changes in this area as well?
>
> That's true, we do. I've also seen some community interest in those
> areas, e.g. from Embecosm:
> https://www.embecosm.com/2018/02/26/how-much-does-a-compiler-cost/
>
> and from within Ericsson:
> https://www.youtube.com/watch?v=HAqtEZmci70

What are you using for the executable file format for machines whose byte
size is not 8? Looks like the ELF spec assumes that a byte is 8 bits long.


Jesper Antonsson via llvm-dev

2019-May-08 07:52 UTC

On Tue, 2019-05-07 at 14:23 +0900, Rui Ueyama wrote:
> From: Jesper Antonsson via llvm-dev <llvm-dev at lists.llvm.org>
> Date: Fri, May 3, 2019 at 8:23 PM
> To: snobl at codasip.com
> Cc: llvm-dev at lists.llvm.org
> 
> What are you using for the executable file format for machines whose
> byte size is not 8? Looks like the ELF spec assumes that a byte is 8
> bits long.
We use ELF. Architectures can have a different byte-size to the on-disk 
representation in ELF/DWARF, and the ELF/DWARF specs are not good at
differentiating between octets and bytes. Thus it's probably easier to
keep ELF/DWARF in the 8-bit byte world and we have to convert from
machine byte width to 8-bit bytes/octets at some point. This might be
one additional reason to use the "addressable unit" terminology
instead.
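
The conversion from machine byte width to octets can be sketched as follows. The thread does not say how the out-of-tree toolchain lays units out on disk, so the layout here is purely an assumption for illustration: each addressable unit is padded to a whole number of octets and written least significant octet first. The names OctetBits, unitsToOctets and unitAddressToOctetOffset are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// An octet is always 8 bits, independent of the target's byte size.
constexpr unsigned OctetBits = 8;

// Serializes addressable units of UnitBits bits each into octets.
// Assumed layout: ceil(UnitBits / 8) octets per unit, low octet first.
inline std::vector<std::uint8_t>
unitsToOctets(const std::vector<std::uint32_t> &Units, unsigned UnitBits) {
  assert(UnitBits >= 8 && UnitBits <= 32 && "sketch handles 8..32-bit units");
  const unsigned OctetsPerUnit = (UnitBits + OctetBits - 1) / OctetBits;
  std::vector<std::uint8_t> Out;
  Out.reserve(Units.size() * OctetsPerUnit);
  for (std::uint32_t U : Units)
    for (unsigned I = 0; I < OctetsPerUnit; ++I)
      Out.push_back(static_cast<std::uint8_t>((U >> (OctetBits * I)) & 0xFF));
  return Out;
}

// Maps an address counted in addressable units to an octet offset in the
// serialized image, under the same assumed layout.
inline std::uint64_t unitAddressToOctetOffset(std::uint64_t UnitAddr,
                                              unsigned UnitBits) {
  const unsigned OctetsPerUnit = (UnitBits + OctetBits - 1) / OctetBits;
  return UnitAddr * OctetsPerUnit;
}
```

Under this assumed layout, two 16-bit units 0x1234 and 0xABCD become the four octets 0x34 0x12 0xCD 0xAB, and unit address 3 corresponds to octet offset 6. Keeping "unit" and "octet" as separate names in the code is one way to apply the "addressable unit" terminology mentioned above.
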
