thr3ads.net - llvm dev - [llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes [May 2019]

If this information is useful, please help other people find it:
Share via:

Jesper Antonsson via llvm-dev

2019-May-02 12:20 UTC

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

A. This RFC outlines a proposal regarding non-8-bit-byte support that
      got positive reception at a Round Table at EuroLLVM19. The general
      topic has been brought up several times before and one good overview
      can be found in a FOSDEM 2017 presentation by Jones and Cook:
https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/

In a nutshell, the proposal is for the llvm community to
allow/encourage interested parties to gradually remove "magic
numbers",
e.g. assumptions on the size of bytes from the codebase. Overview,
rationale and some example refactorings follows.

Overview:

LLVM currently assumes 8-bit bytes, while there exist a few out-of-tree 
llvm targets that utilize bytes of other sizes, including our
(Ericsson's) proprietary target. The main issues are the magic number 8
and "/8" and "*8" all over the place and the use of i8
pointers.

There's considerable agreement that the use of magic numbers is not
good coding style, and removing these ones would be of particular
benefit, even though the effort would not be complete and no in-tree
target with tests exist to guarantee that all gains are maintained.

Ericsson is willing to drive this effort. During EuroLLVM19, there
seemed to be sufficient positive interest from other companies for us
to expect help with reviewing patch sets. Ericsson has been performing
nightly integration towards top-of-tree with this backend for years,
catching and fixing new 8-bit-byte continuously. Thus we're able to
commit to doing similar upstream fixes for the long haul in a no-drama
way.

Rationale:

Benefits of moving toward a byte-size agnostic llvm include:
* Less magic numbers in the codebase.
* A reduced effort to maintain out-of-tree targets with non-8-bit bytes
as contributors follow the established patterns. (One company has told
us that they created but eventually gave up on a 16-bit byte target due
to too-high integration burden.)
* A reduction in duplicate efforts as some of the adaptation work would
happen in-tree rather than in several out-of-tree targets.
* For up-and-coming targets that have non-8-bit-byte sizes, time to
market using llvm would be far quicker.
* A higher probability of LLVM being the compiler of choice for such
targets.
* Eventually, as the patch set required to make llvm fully byte size
agnostic becomes small enough, the effort to provide a mock in-tree
target with some other byte size should be surmountable.

As cons, one could see a burden for the in-tree community to maintain
whatever gains that have been had. However the onus should be on
interested parties to mend any bit-rot. The impact of not having as
much magic numbers and such should if anything make the code more easy
to understand. The permission to go ahead would be under the condition
that significant added complexities are avoided. Another con would be
added compilation time e.g. in cases where the byte size is a run-time
variable rather than a constant. However, this cost seems negligible in
practice.

Refactoring examples:
https://reviews.llvm.org/D61432

Best Regards,
Jesper

JF Bastien via llvm-dev

2019-May-02 16:43 UTC

head link

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

I’m not a fan of C and C++ supporting anything but 8 bits per byte.
Realistically, C and C++ on such targets are different languages from
8-bit-per-byte C and C++, and therefore code isn’t portable from one to the
other. I intend to propose that C++23 support only 8 bits per byte, ditto C. I’m
therefore not a fan of teaching clang about this.

Separately, teaching LLVM about unusual-sized bytes seems fine to me, if the
maintenance burden is low enough and the targets are supported in-tree and are
maintained. I agree that you can’t just plop in a target without support, so it
makes sense to first clean things up and then land a target. However, I don’t
think a mock target makes sense. I’d much rather see a real target.

Are we only talking about powers-of-two here, or “anything goes”? What
restrictions are you proposing to impose?

I’m really not convinced by this “magic number” argument. 8 really isn’t that
bad to see.

> On May 2, 2019, at 5:20 AM, Jesper Antonsson via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
>   A. This RFC outlines a proposal regarding non-8-bit-byte support that
>      got positive reception at a Round Table at EuroLLVM19. The general
>      topic has been brought up several times before and one good overview
>      can be found in a FOSDEM 2017 presentation by Jones and Cook:
> https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/
> 
> In a nutshell, the proposal is for the llvm community to
> allow/encourage interested parties to gradually remove "magic
numbers",
> e.g. assumptions on the size of bytes from the codebase. Overview,
> rationale and some example refactorings follows.
> 
> Overview:
> 
> LLVM currently assumes 8-bit bytes, while there exist a few out-of-tree 
> llvm targets that utilize bytes of other sizes, including our
> (Ericsson's) proprietary target. The main issues are the magic number 8
> and "/8" and "*8" all over the place and the use of i8
pointers.
> 
> There's considerable agreement that the use of magic numbers is not
> good coding style, and removing these ones would be of particular
> benefit, even though the effort would not be complete and no in-tree
> target with tests exist to guarantee that all gains are maintained.
> 
> Ericsson is willing to drive this effort. During EuroLLVM19, there
> seemed to be sufficient positive interest from other companies for us
> to expect help with reviewing patch sets. Ericsson has been performing
> nightly integration towards top-of-tree with this backend for years,
> catching and fixing new 8-bit-byte continuously. Thus we're able to
> commit to doing similar upstream fixes for the long haul in a no-drama
> way.
> 
> Rationale:
> 
> Benefits of moving toward a byte-size agnostic llvm include:
> * Less magic numbers in the codebase.
> * A reduced effort to maintain out-of-tree targets with non-8-bit bytes
> as contributors follow the established patterns. (One company has told
> us that they created but eventually gave up on a 16-bit byte target due
> to too-high integration burden.)
> * A reduction in duplicate efforts as some of the adaptation work would
> happen in-tree rather than in several out-of-tree targets.
> * For up-and-coming targets that have non-8-bit-byte sizes, time to
> market using llvm would be far quicker.
> * A higher probability of LLVM being the compiler of choice for such
> targets.
> * Eventually, as the patch set required to make llvm fully byte size
> agnostic becomes small enough, the effort to provide a mock in-tree
> target with some other byte size should be surmountable.
> 
> As cons, one could see a burden for the in-tree community to maintain
> whatever gains that have been had. However the onus should be on
> interested parties to mend any bit-rot. The impact of not having as
> much magic numbers and such should if anything make the code more easy
> to understand. The permission to go ahead would be under the condition
> that significant added complexities are avoided. Another con would be
> added compilation time e.g. in cases where the byte size is a run-time
> variable rather than a constant. However, this cost seems negligible in
> practice.
> 
> Refactoring examples:
> https://reviews.llvm.org/D61432
> 
> Best Regards,
> Jesper
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

via llvm-dev

2019-May-02 17:21 UTC

head link

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of JF
> Bastien via llvm-dev
> 
> I’m not a fan of C and C++ supporting anything but 8 bits per byte.
> Realistically, C and C++ on such targets are different languages from 8-
> bit-per-byte C and C++, and therefore code isn’t portable from one to the
> other.
Having done it, I promise you that it's reasonable to write portable C
targeting both 7-bit and 8-bit 'char'.  It was too long ago to remember
anything in detail, but the brain cells still remaining from that era
believe it was pretty clean.
> I intend to propose that C++23 support only 8 bits per byte, ditto
> C. I’m therefore not a fan of teaching clang about this.
My impression is that non-8-bit-byte machines are (these days) generally 
small and likely for embedded or other special purposes, so a proposal 
to stop trying to squeeze the bloated monster that C++ has become onto 
those is probably fine. C, on the other hand, appears to be the language
of choice for all sorts of weird things, and that's less likely to fly.
(Just sayin'. I'm not on either committee and have no vested interest.)
--paulr

Pavel Šnobl via llvm-dev

2019-May-02 17:54 UTC

head link

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

Hi Jesper,

thank you for working on this. My company (Codasip) would definitely be
interested in having this feature upstream. I think that this is actually
important for a suprisingly large number of people who currently have to
maintain their changes downstream. I have a couple of questions and
comments:

1. Do you plan on supporting truly arbitrary values as the byte size or are
there in fact going to be limitations (e.g. the value has to be a multiple
of 8 and lower or equal to 64)? I recall that we had a customer asking
about 36-bit bytes.
2. If you define a byte to be e.g. 16 bits wide, does it mean that
"char"
is also 16 bits wide? If yes then how to do you define types like int8_t
from stdint.h?
3. Have you thought about the possibility to support different byte sizes
for data and code?
4. I realize that this is a separate issue but fully supporting non-8-bit
bytes requires also changes to other parts of a typical toolchain, namely
linker (ld/lld) and debugger (gdb/lldb). Do you maintain out-of-tree
changes in this area as well?

Thank you,
Pavel

On Thu, May 2, 2019 at 2:20 PM Jesper Antonsson via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>    A. This RFC outlines a proposal regarding non-8-bit-byte support that
>       got positive reception at a Round Table at EuroLLVM19. The general
>       topic has been brought up several times before and one good overview
>       can be found in a FOSDEM 2017 presentation by Jones and Cook:
> https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/
>
> In a nutshell, the proposal is for the llvm community to
> allow/encourage interested parties to gradually remove "magic
numbers",
> e.g. assumptions on the size of bytes from the codebase. Overview,
> rationale and some example refactorings follows.
>
> Overview:
>
> LLVM currently assumes 8-bit bytes, while there exist a few out-of-tree
> llvm targets that utilize bytes of other sizes, including our
> (Ericsson's) proprietary target. The main issues are the magic number 8
> and "/8" and "*8" all over the place and the use of i8
pointers.
>
> There's considerable agreement that the use of magic numbers is not
> good coding style, and removing these ones would be of particular
> benefit, even though the effort would not be complete and no in-tree
> target with tests exist to guarantee that all gains are maintained.
>
> Ericsson is willing to drive this effort. During EuroLLVM19, there
> seemed to be sufficient positive interest from other companies for us
> to expect help with reviewing patch sets. Ericsson has been performing
> nightly integration towards top-of-tree with this backend for years,
> catching and fixing new 8-bit-byte continuously. Thus we're able to
> commit to doing similar upstream fixes for the long haul in a no-drama
> way.
>
> Rationale:
>
> Benefits of moving toward a byte-size agnostic llvm include:
> * Less magic numbers in the codebase.
> * A reduced effort to maintain out-of-tree targets with non-8-bit bytes
> as contributors follow the established patterns. (One company has told
> us that they created but eventually gave up on a 16-bit byte target due
> to too-high integration burden.)
> * A reduction in duplicate efforts as some of the adaptation work would
> happen in-tree rather than in several out-of-tree targets.
> * For up-and-coming targets that have non-8-bit-byte sizes, time to
> market using llvm would be far quicker.
> * A higher probability of LLVM being the compiler of choice for such
> targets.
> * Eventually, as the patch set required to make llvm fully byte size
> agnostic becomes small enough, the effort to provide a mock in-tree
> target with some other byte size should be surmountable.
>
> As cons, one could see a burden for the in-tree community to maintain
> whatever gains that have been had. However the onus should be on
> interested parties to mend any bit-rot. The impact of not having as
> much magic numbers and such should if anything make the code more easy
> to understand. The permission to go ahead would be under the condition
> that significant added complexities are avoided. Another con would be
> added compilation time e.g. in cases where the byte size is a run-time
> variable rather than a constant. However, this cost seems negligible in
> practice.
>
> Refactoring examples:
> https://reviews.llvm.org/D61432
>
> Best Regards,
> Jesper
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190502/c077685a/attachment-0001.html>

David Greene via llvm-dev

2019-May-02 21:04 UTC

head link

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

JF Bastien via llvm-dev <llvm-dev at lists.llvm.org> writes:
> I’m not a fan of C and C++ supporting anything but 8 bits per
> byte. Realistically, C and C++ on such targets are different languages
> from 8-bit-per-byte C and C++, and therefore code isn’t portable from
> one to the other. I intend to propose that C++23 support only 8 bits
> per byte, ditto C. I’m therefore not a fan of teaching clang about
> this.
It depends on what you mean by "byte."  Is "byte" a unit of
addressing
or a unit of data?

We have worked with machines that had word-level (32-bit) addressing.
8-bit data works just fine on such machines, you simply treat them like
bitfields.  Types like sint8_t are reasonable on such machines.  Types
like sint8_t * can also be reasonable.

I think it would be pretty bad to restrict either language to 8-bit
bytes.

                            -David

Sean Kilmurray via llvm-dev

2019-May-03 08:27 UTC

head link

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

Hi Jesper,

My company (CML Microsystems) would definitely be interested in having this
feature upstream too.

We currently maintain an out of tree backend that has a minimum addressable size
of 16 bits and this is implemented using the method outlined by Jones and Cook
of Embecosm that you refer to in the RFC.

Our implementation is slightly different than the one you’ve proposed  in that
we  used the concept of bitPerChar and only support multiples of 8 bits for that
char width.

I would be happy to help out with the work in any way I can.

Regards
Sean Kilmurray

-----Original Message-----
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Jesper
Antonsson via llvm-dev
Sent: 02 May 2019 13:21
To: llvm-dev at lists.llvm.org
Subject: [llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

   A. This RFC outlines a proposal regarding non-8-bit-byte support that
      got positive reception at a Round Table at EuroLLVM19. The general
      topic has been brought up several times before and one good overview
      can be found in a FOSDEM 2017 presentation by Jones and Cook:
https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/

In a nutshell, the proposal is for the llvm community to
allow/encourage interested parties to gradually remove "magic
numbers",
e.g. assumptions on the size of bytes from the codebase. Overview,
rationale and some example refactorings follows.

Overview:

LLVM currently assumes 8-bit bytes, while there exist a few out-of-tree 
llvm targets that utilize bytes of other sizes, including our
(Ericsson's) proprietary target. The main issues are the magic number 8
and "/8" and "*8" all over the place and the use of i8
pointers.

There's considerable agreement that the use of magic numbers is not
good coding style, and removing these ones would be of particular
benefit, even though the effort would not be complete and no in-tree
target with tests exist to guarantee that all gains are maintained.

Ericsson is willing to drive this effort. During EuroLLVM19, there
seemed to be sufficient positive interest from other companies for us
to expect help with reviewing patch sets. Ericsson has been performing
nightly integration towards top-of-tree with this backend for years,
catching and fixing new 8-bit-byte continuously. Thus we're able to
commit to doing similar upstream fixes for the long haul in a no-drama
way.

Rationale:

Benefits of moving toward a byte-size agnostic llvm include:
* Less magic numbers in the codebase.
* A reduced effort to maintain out-of-tree targets with non-8-bit bytes
as contributors follow the established patterns. (One company has told
us that they created but eventually gave up on a 16-bit byte target due
to too-high integration burden.)
* A reduction in duplicate efforts as some of the adaptation work would
happen in-tree rather than in several out-of-tree targets.
* For up-and-coming targets that have non-8-bit-byte sizes, time to
market using llvm would be far quicker.
* A higher probability of LLVM being the compiler of choice for such
targets.
* Eventually, as the patch set required to make llvm fully byte size
agnostic becomes small enough, the effort to provide a mock in-tree
target with some other byte size should be surmountable.

As cons, one could see a burden for the in-tree community to maintain
whatever gains that have been had. However the onus should be on
interested parties to mend any bit-rot. The impact of not having as
much magic numbers and such should if anything make the code more easy
to understand. The permission to go ahead would be under the condition
that significant added complexities are avoided. Another con would be
added compilation time e.g. in cases where the byte size is a run-time
variable rather than a constant. However, this cost seems negligible in
practice.

Refactoring examples:
https://reviews.llvm.org/D61432

Best Regards,
Jesper
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Disclaimer

The information contained in this communication from the sender is confidential.
It is intended solely for use by the recipient and others authorized to receive
it. If you are not the recipient, you are hereby notified that any disclosure,
copying, distribution or taking action in relation of the contents of this
information is strictly prohibited and may be unlawful.

This email has been scanned for viruses and malware, and may have been
automatically archived by Mimecast Ltd, an innovator in Software as a Service
(SaaS) for business. Providing a safer and more useful place for your human
generated data. Specializing in; Security, archiving and compliance. To find out
more visit the Mimecast website.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190503/a9afe1ae/attachment.html>
-------------- next part --------------
?Hi Jesper,

My company (CML Microsystems) would definitely be interested in having this
feature upstream too.

We currently maintain an out of tree backend that has a minimum addressable size
of 16 bits and this is implemented using the method outlined by Jones and Cook
of Embecosm that you refer to in the RFC.

Our implementation is slightly different than the one you’ve proposed  in that
we  used the concept of bitPerChar and only support multiples of 8 bits for that
char width.

I would be happy to help out with the work in any way I can.

Regards
Sean Kilmurray


PLEASE READ: Information in this email, including any attachments, is
intended solely for the addressee(s). Access to this information by anyone
else is unauthorised and in these circumstances the use, disclosure, copying
or distribution of this information is strictly prohibited. If you are not the
intended recipient, please let us know by replying to the sender and
immediately delete this email from your system.
This information has been transmitted over a public network and neither 
CML nor or any of its controlled entities accepts responsibility for the
accuracy or completeness of this message. Unless otherwise stated, opinions
expressed in this e-mail are those of the author and are not endorsed by CML.
-----Original Message-----
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Jesper
Antonsson via llvm-dev
Sent: 02 May 2019 13:21
To: llvm-dev at lists.llvm.org
Subject: [llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

   A. This RFC outlines a proposal regarding non-8-bit-byte support that
      got positive reception at a Round Table at EuroLLVM19. The general
      topic has been brought up several times before and one good overview
      can be found in a FOSDEM 2017 presentation by Jones and Cook:
https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/

In a nutshell, the proposal is for the llvm community to
allow/encourage interested parties to gradually remove "magic
numbers",
e.g. assumptions on the size of bytes from the codebase. Overview,
rationale and some example refactorings follows.

Overview:

LLVM currently assumes 8-bit bytes, while there exist a few out-of-tree
llvm targets that utilize bytes of other sizes, including our
(Ericsson's) proprietary target. The main issues are the magic number 8
and "/8" and "*8" all over the place and the use of i8
pointers.

There's considerable agreement that the use of magic numbers is not
good coding style, and removing these ones would be of particular
benefit, even though the effort would not be complete and no in-tree
target with tests exist to guarantee that all gains are maintained.

Ericsson is willing to drive this effort. During EuroLLVM19, there
seemed to be sufficient positive interest from other companies for us
to expect help with reviewing patch sets. Ericsson has been performing
nightly integration towards top-of-tree with this backend for years,
catching and fixing new 8-bit-byte continuously. Thus we're able to
commit to doing similar upstream fixes for the long haul in a no-drama
way.

Rationale:

Benefits of moving toward a byte-size agnostic llvm include:
* Less magic numbers in the codebase.
* A reduced effort to maintain out-of-tree targets with non-8-bit bytes
as contributors follow the established patterns. (One company has told
us that they created but eventually gave up on a 16-bit byte target due
to too-high integration burden.)
* A reduction in duplicate efforts as some of the adaptation work would
happen in-tree rather than in several out-of-tree targets.
* For up-and-coming targets that have non-8-bit-byte sizes, time to
market using llvm would be far quicker.
* A higher probability of LLVM being the compiler of choice for such
targets.
* Eventually, as the patch set required to make llvm fully byte size
agnostic becomes small enough, the effort to provide a mock in-tree
target with some other byte size should be surmountable.

As cons, one could see a burden for the in-tree community to maintain
whatever gains that have been had. However the onus should be on
interested parties to mend any bit-rot. The impact of not having as
much magic numbers and such should if anything make the code more easy
to understand. The permission to go ahead would be under the condition
that significant added complexities are avoided. Another con would be
added compilation time e.g. in cases where the byte size is a run-time
variable rather than a constant. However, this cost seems negligible in
practice.

Refactoring examples:
https://reviews.llvm.org/D61432

Best Regards,
Jesper
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Jesper Antonsson via llvm-dev

2019-May-03 09:18 UTC

head link

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

On Thu, 2019-05-02 at 09:43 -0700, JF Bastien wrote:> I’m not a fan of C and C++ supporting anything but 8 bits per byte.
> Realistically, C and C++ on such targets are different languages from
> 8-bit-per-byte C and C++, and therefore code isn’t portable from one
> to the other. I intend to propose that C++23 support only 8 bits per
> byte, ditto C. I’m therefore not a fan of teaching clang about this.
On portability, the same is true for byte order and more. Also, the
standard is what it is currently and the non-8-bit byte targets do
exist. However, we don't suggest clang changes for now.
> Separately, teaching LLVM about unusual-sized bytes seems fine to me,
> if the maintenance burden is low enough and the targets are supported
> in-tree and are maintained. I agree that you can’t just plop in a
> target without support, so it makes sense to first clean things up
> and then land a target. However, I don’t think a mock target makes
> sense. I’d much rather see a real target.
I'd also much rather see a real target. Hopefully, the cleanup will
make it more likely to happen.
> Are we only talking about powers-of-two here, or “anything goes”?
> What restrictions are you proposing to impose?
We're proposing "anything goes" larger than 8 as that's what
the
standards says, and as we've talked to people having non-power-of-two
architectures. Also, we feel there's no major disadvantage of going for
that. Yes, we can't use masks and shifts the same way, but we feel that
won't have a big impact. (However, our target has 16-bit bytes, so if
the community would rather see powers-of-two, we could live with that.)
> I’m really not convinced by this “magic number” argument. 8 really
> isn’t that bad to see.
Though the meaning isn't always clear, i.e. if it's handling bytes or
octets. And perhaps it doesn't have to be, for as long as you have an
8-bit byte architecture, but when you start to clean up for another
architecture, it becomes a pain and is not always obvious. Especially
not when you're mucking around with Dwarf. Also, there's "&
7", ">> 3"
as well for instance. Not that bad either (as magic numbers often
aren't in context), but if you grep for them, you often have to look a
bit extra to see if it's e.g. a flag, a byte or an octet.

Regards,
Jesper

> 
> 
> > On May 2, 2019, at 5:20 AM, Jesper Antonsson via llvm-dev <
> > llvm-dev at lists.llvm.org> wrote:
> > 
> >   A. This RFC outlines a proposal regarding non-8-bit-byte support
> > that
> >      got positive reception at a Round Table at EuroLLVM19. The
> > general
> >      topic has been brought up several times before and one good
> > overview
> >      can be found in a FOSDEM 2017 presentation by Jones and Cook:
> > 
https://protect2.fireeye.com/url?k=b58a506e-e9015b52-b58a10f5-86ef624f95b6-937d68ba77c32042&u=https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/> > 
> > In a nutshell, the proposal is for the llvm community to
> > allow/encourage interested parties to gradually remove "magic
> > numbers",
> > e.g. assumptions on the size of bytes from the codebase. Overview,
> > rationale and some example refactorings follows.
> > 
> > Overview:
> > 
> > LLVM currently assumes 8-bit bytes, while there exist a few out-of-
> > tree 
> > llvm targets that utilize bytes of other sizes, including our
> > (Ericsson's) proprietary target. The main issues are the magic
> > number 8
> > and "/8" and "*8" all over the place and the use
of i8 pointers.
> > 
> > There's considerable agreement that the use of magic numbers is
not
> > good coding style, and removing these ones would be of particular
> > benefit, even though the effort would not be complete and no in-
> > tree
> > target with tests exist to guarantee that all gains are maintained.
> > 
> > Ericsson is willing to drive this effort. During EuroLLVM19, there
> > seemed to be sufficient positive interest from other companies for
> > us
> > to expect help with reviewing patch sets. Ericsson has been
> > performing
> > nightly integration towards top-of-tree with this backend for
> > years,
> > catching and fixing new 8-bit-byte continuously. Thus we're able
to
> > commit to doing similar upstream fixes for the long haul in a no-
> > drama
> > way.
> > 
> > Rationale:
> > 
> > Benefits of moving toward a byte-size agnostic llvm include:
> > * Less magic numbers in the codebase.
> > * A reduced effort to maintain out-of-tree targets with non-8-bit
> > bytes
> > as contributors follow the established patterns. (One company has
> > told
> > us that they created but eventually gave up on a 16-bit byte target
> > due
> > to too-high integration burden.)
> > * A reduction in duplicate efforts as some of the adaptation work
> > would
> > happen in-tree rather than in several out-of-tree targets.
> > * For up-and-coming targets that have non-8-bit-byte sizes, time to
> > market using llvm would be far quicker.
> > * A higher probability of LLVM being the compiler of choice for
> > such
> > targets.
> > * Eventually, as the patch set required to make llvm fully byte
> > size
> > agnostic becomes small enough, the effort to provide a mock in-tree
> > target with some other byte size should be surmountable.
> > 
> > As cons, one could see a burden for the in-tree community to
> > maintain
> > whatever gains that have been had. However the onus should be on
> > interested parties to mend any bit-rot. The impact of not having as
> > much magic numbers and such should if anything make the code more
> > easy
> > to understand. The permission to go ahead would be under the
> > condition
> > that significant added complexities are avoided. Another con would
> > be
> > added compilation time e.g. in cases where the byte size is a run-
> > time
> > variable rather than a constant. However, this cost seems
> > negligible in
> > practice.
> > 
> > Refactoring examples:
> > https://reviews.llvm.org/D61432
> > 
> > Best Regards,
> > Jesper
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
>

Jeroen Dobbelaere via llvm-dev

2019-May-03 10:33 UTC

head link

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

Hi Jesper,

We (ASIP Designer team, Synopsys) are also interested in a cleaner approach
without those magic constants.

Instead of 'n-bit bytes', we tend to talk about 'word
addressing' and 'addressable unit size'.

We support C/C++ for various architectures that our customers create with our
tools.
Some of those have multiple memories where the addressable unit size depends on
the memory (address space).

NOTE: the addressable unit size can be any value (like 10 or 41 or 2048)
 
We also keep the bits in 'char' separate from the addressable unit size.
As such, an 8 bit char can
have a 'storage size' of 24 bits in one memory and 64 bits in another.
(but a char can also be 24 bits, or 10 or something else).

I think that a cleaner abstraction in datalayout is in general useful.

We use:
   unsigned getAddresssableUnitSize(unsigned AddressSpace=0);

which, for the main llvm, could be implemented as :
   unsigned getAddresssableUnitSize(unsigned AS=0) { return 8; }

Greetings,

Jeroen Dobbelaere

> -----Original Message-----
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of
Jesper
> Antonsson via llvm-dev
> Sent: Thursday, May 2, 2019 14:21
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes
> 
>    A. This RFC outlines a proposal regarding non-8-bit-byte support that
>       got positive reception at a Round Table at EuroLLVM19. The general
>       topic has been brought up several times before and one good overview
>       can be found in a FOSDEM 2017 presentation by Jones and Cook:
> https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/
>
> In a nutshell, the proposal is for the llvm community to
> allow/encourage interested parties to gradually remove "magic
numbers",
> e.g. assumptions on the size of bytes from the codebase. Overview,
> rationale and some example refactorings follows.
> 
> Overview:
> 
> LLVM currently assumes 8-bit bytes, while there exist a few out-of-tree
> llvm targets that utilize bytes of other sizes, including our
> (Ericsson's) proprietary target. The main issues are the magic number 8
> and "/8" and "*8" all over the place and the use of i8
pointers.
> 
> There's considerable agreement that the use of magic numbers is not
> good coding style, and removing these ones would be of particular
> benefit, even though the effort would not be complete and no in-tree
> target with tests exist to guarantee that all gains are maintained.
> 
> Ericsson is willing to drive this effort. During EuroLLVM19, there
> seemed to be sufficient positive interest from other companies for us
> to expect help with reviewing patch sets. Ericsson has been performing
> nightly integration towards top-of-tree with this backend for years,
> catching and fixing new 8-bit-byte continuously. Thus we're able to
> commit to doing similar upstream fixes for the long haul in a no-drama
> way.
> 
> Rationale:
> 
> Benefits of moving toward a byte-size agnostic llvm include:
> * Less magic numbers in the codebase.
> * A reduced effort to maintain out-of-tree targets with non-8-bit bytes
> as contributors follow the established patterns. (One company has told
> us that they created but eventually gave up on a 16-bit byte target due
> to too-high integration burden.)
> * A reduction in duplicate efforts as some of the adaptation work would
> happen in-tree rather than in several out-of-tree targets.
> * For up-and-coming targets that have non-8-bit-byte sizes, time to
> market using llvm would be far quicker.
> * A higher probability of LLVM being the compiler of choice for such
> targets.
> * Eventually, as the patch set required to make llvm fully byte size
> agnostic becomes small enough, the effort to provide a mock in-tree
> target with some other byte size should be surmountable.
> 
> As cons, one could see a burden for the in-tree community to maintain
> whatever gains that have been had. However the onus should be on
> interested parties to mend any bit-rot. The impact of not having as
> much magic numbers and such should if anything make the code more easy
> to understand. The permission to go ahead would be under the condition
> that significant added complexities are avoided. Another con would be
> added compilation time e.g. in cases where the byte size is a run-time
> variable rather than a constant. However, this cost seems negligible in
> practice.
> 
> Refactoring examples:
> https://reviews.llvm.org/D61432
> 
> Best Regards,
> Jesper

Jesper Antonsson via llvm-dev

2019-May-03 11:22 UTC

head link

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

On Thu, 2019-05-02 at 19:54 +0200, Pavel Šnobl wrote:
> Hi Jesper,
> 
> thank you for working on this. My company (Codasip) would definitely
> be interested in having this feature upstream. I think that this is
> actually important for a suprisingly large number of people who
> currently have to maintain their changes downstream. I have a couple
> of questions and comments:
> 
> 1. Do you plan on supporting truly arbitrary values as the byte size
> or are there in fact going to be limitations (e.g. the value has to
> be a multiple of 8 and lower or equal to 64)? I recall that we had a
> customer asking about 36-bit bytes.
We plan on supporting arbitrary sizes with a lower limit of 8, not
necessarily power-of-two or multiples of 8. I have to admit that I
haven't thought very much about what the upper limit might be. We might
leave it up to other interested parties to explore that and if we
receive suggestions on how to generalize also in that respect, we'll
certainly consider them.
> 2. If you define a byte to be e.g. 16 bits wide, does it mean that
> "char" is also 16 bits wide? If yes then how to do you define
types
> like int8_t from stdint.h?
Yes, char is the same. The int8_t type is optional according to the
standard and we don't define it for our OOT target. The int_least8_t is
required, but we just define it to be byte sized. 
> 3. Have you thought about the possibility to support different byte
> sizes for data and code?
Not really, but I saw that Jeroen Dobbelaere just suggested supporting
memory spaces with different byte sizes.
> 4. I realize that this is a separate issue but fully supporting non-
> 8-bit bytes requires also changes to other parts of a typical
> toolchain, namely linker (ld/lld) and debugger (gdb/lldb). Do you
> maintain out-of-tree changes in this area as well?
That's true, we do. I've also seen some community interest in those
areas, e.g. from Embecosm:
https://www.embecosm.com/2018/02/26/how-much-does-a-compiler-cost/

and from within Ericsson:
https://www.youtube.com/watch?v=HAqtEZmci70

Thanks,
Jesper

> Thank you,
> Pavel
> 
> On Thu, May 2, 2019 at 2:20 PM Jesper Antonsson via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >    A. This RFC outlines a proposal regarding non-8-bit-byte support
> > that
> >       got positive reception at a Round Table at EuroLLVM19. The
> > general
> >       topic has been brought up several times before and one good
> > overview
> >       can be found in a FOSDEM 2017 presentation by Jones and Cook:
> > https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/
> > 
> > In a nutshell, the proposal is for the llvm community to
> > allow/encourage interested parties to gradually remove "magic
> > numbers",
> > e.g. assumptions on the size of bytes from the codebase. Overview,
> > rationale and some example refactorings follows.
> > 
> > Overview:
> > 
> > LLVM currently assumes 8-bit bytes, while there exist a few out-of-
> > tree 
> > llvm targets that utilize bytes of other sizes, including our
> > (Ericsson's) proprietary target. The main issues are the magic
> > number 8
> > and "/8" and "*8" all over the place and the use
of i8 pointers.
> > 
> > There's considerable agreement that the use of magic numbers is
not
> > good coding style, and removing these ones would be of particular
> > benefit, even though the effort would not be complete and no in-
> > tree
> > target with tests exist to guarantee that all gains are maintained.
> > 
> > Ericsson is willing to drive this effort. During EuroLLVM19, there
> > seemed to be sufficient positive interest from other companies for
> > us
> > to expect help with reviewing patch sets. Ericsson has been
> > performing
> > nightly integration towards top-of-tree with this backend for
> > years,
> > catching and fixing new 8-bit-byte continuously. Thus we're able
to
> > commit to doing similar upstream fixes for the long haul in a no-
> > drama
> > way.
> > 
> > Rationale:
> > 
> > Benefits of moving toward a byte-size agnostic llvm include:
> > * Less magic numbers in the codebase.
> > * A reduced effort to maintain out-of-tree targets with non-8-bit
> > bytes
> > as contributors follow the established patterns. (One company has
> > told
> > us that they created but eventually gave up on a 16-bit byte target
> > due
> > to too-high integration burden.)
> > * A reduction in duplicate efforts as some of the adaptation work
> > would
> > happen in-tree rather than in several out-of-tree targets.
> > * For up-and-coming targets that have non-8-bit-byte sizes, time to
> > market using llvm would be far quicker.
> > * A higher probability of LLVM being the compiler of choice for
> > such
> > targets.
> > * Eventually, as the patch set required to make llvm fully byte
> > size
> > agnostic becomes small enough, the effort to provide a mock in-tree
> > target with some other byte size should be surmountable.
> > 
> > As cons, one could see a burden for the in-tree community to
> > maintain
> > whatever gains that have been had. However the onus should be on
> > interested parties to mend any bit-rot. The impact of not having as
> > much magic numbers and such should if anything make the code more
> > easy
> > to understand. The permission to go ahead would be under the
> > condition
> > that significant added complexities are avoided. Another con would
> > be
> > added compilation time e.g. in cases where the byte size is a run-
> > time
> > variable rather than a constant. However, this cost seems
> > negligible in
> > practice.
> > 
> > Refactoring examples:
> > https://reviews.llvm.org/D61432
> > 
> > Best Regards,
> > Jesper
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Jesper Antonsson via llvm-dev

2019-May-03 14:51 UTC

head link

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

Hi Jeroen,

Thanks, these are interesting differences. The CHAR_BIT and byte
relation is established in the C standard and I would prefer the byte
terminology. It means the same thing as addressable unit but is a bit
shorter and probably more widely known.

Do you suggest any in-tree changes at some call sites using the
suggested AdressSpace parameter or would we rely on the default value
always? I could definitely include this, but if the parameter is never
used, perhaps we can leave it out-of-tree for now?

Best Regards,
Jesper


On Fri, 2019-05-03 at 10:33 +0000, Jeroen Dobbelaere
wrote:> Hi Jesper,
> 
> We (ASIP Designer team, Synopsys) are also interested in a cleaner
> approach without those magic constants.
> 
> Instead of 'n-bit bytes', we tend to talk about 'word
addressing' and
> 'addressable unit size'.
> 
> We support C/C++ for various architectures that our customers create
> with our tools.
> Some of those have multiple memories where the addressable unit size
> depends on the memory (address space).
> 
> NOTE: the addressable unit size can be any value (like 10 or 41 or
> 2048)
>  
> We also keep the bits in 'char' separate from the addressable unit
> size. As such, an 8 bit char can
> have a 'storage size' of 24 bits in one memory and 64 bits in
> another.
> (but a char can also be 24 bits, or 10 or something else).
> 
> I think that a cleaner abstraction in datalayout is in general
> useful.
> 
> We use:
>    unsigned getAddresssableUnitSize(unsigned AddressSpace=0);
> 
> which, for the main llvm, could be implemented as :
>    unsigned getAddresssableUnitSize(unsigned AS=0) { return 8; }
> 
> Greetings,
> 
> Jeroen Dobbelaere
> 
> 
> > -----Original Message-----
> > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of
> > Jesper
> > Antonsson via llvm-dev
> > Sent: Thursday, May 2, 2019 14:21
> > To: llvm-dev at lists.llvm.org
> > Subject: [llvm-dev] RFC: On removing magic numbers assuming 8-bit
> > bytes
> > 
> >    A. This RFC outlines a proposal regarding non-8-bit-byte support
> > that
> >       got positive reception at a Round Table at EuroLLVM19. The
> > general
> >       topic has been brought up several times before and one good
> > overview
> >       can be found in a FOSDEM 2017 presentation by Jones and Cook:
> > 
https://protect2.fireeye.com/url?k=2d1c7421-7195ae31-2d1c34ba-0cc47ad93e2a-4cae3fa5d066a45a&u=https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/> > 
> > In a nutshell, the proposal is for the llvm community to
> > allow/encourage interested parties to gradually remove "magic
> > numbers",
> > e.g. assumptions on the size of bytes from the codebase. Overview,
> > rationale and some example refactorings follows.
> > 
> > Overview:
> > 
> > LLVM currently assumes 8-bit bytes, while there exist a few out-of-
> > tree
> > llvm targets that utilize bytes of other sizes, including our
> > (Ericsson's) proprietary target. The main issues are the magic
> > number 8
> > and "/8" and "*8" all over the place and the use
of i8 pointers.
> > 
> > There's considerable agreement that the use of magic numbers is
not
> > good coding style, and removing these ones would be of particular
> > benefit, even though the effort would not be complete and no in-
> > tree
> > target with tests exist to guarantee that all gains are maintained.
> > 
> > Ericsson is willing to drive this effort. During EuroLLVM19, there
> > seemed to be sufficient positive interest from other companies for
> > us
> > to expect help with reviewing patch sets. Ericsson has been
> > performing
> > nightly integration towards top-of-tree with this backend for
> > years,
> > catching and fixing new 8-bit-byte continuously. Thus we're able
to
> > commit to doing similar upstream fixes for the long haul in a no-
> > drama
> > way.
> > 
> > Rationale:
> > 
> > Benefits of moving toward a byte-size agnostic llvm include:
> > * Less magic numbers in the codebase.
> > * A reduced effort to maintain out-of-tree targets with non-8-bit
> > bytes
> > as contributors follow the established patterns. (One company has
> > told
> > us that they created but eventually gave up on a 16-bit byte target
> > due
> > to too-high integration burden.)
> > * A reduction in duplicate efforts as some of the adaptation work
> > would
> > happen in-tree rather than in several out-of-tree targets.
> > * For up-and-coming targets that have non-8-bit-byte sizes, time to
> > market using llvm would be far quicker.
> > * A higher probability of LLVM being the compiler of choice for
> > such
> > targets.
> > * Eventually, as the patch set required to make llvm fully byte
> > size
> > agnostic becomes small enough, the effort to provide a mock in-tree
> > target with some other byte size should be surmountable.
> > 
> > As cons, one could see a burden for the in-tree community to
> > maintain
> > whatever gains that have been had. However the onus should be on
> > interested parties to mend any bit-rot. The impact of not having as
> > much magic numbers and such should if anything make the code more
> > easy
> > to understand. The permission to go ahead would be under the
> > condition
> > that significant added complexities are avoided. Another con would
> > be
> > added compilation time e.g. in cases where the byte size is a run-
> > time
> > variable rather than a constant. However, this cost seems
> > negligible in
> > practice.
> > 
> > Refactoring examples:
> > https://reviews.llvm.org/D61432
> > 
> > Best Regards,
> > Jesper
> 
>

Jesper Antonsson via llvm-dev

2019-May-14 11:08 UTC

head link

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

Thanks to everyone that have weighed in on this! There have been
remarks along the lines that the proposal lacks consensus to move
forward, and that might very well be true. However, the decision model
is not fully clear to me, so before giving up, I'd like to summarize
the discussion to the best of my ability and ask for clarification on
how the community makes its decisions.

(First, I would like to emphasize that this is merely my
interpretation, which may be flawed. If so, my apologies.) The RFC
tl;dr is "without supporting non-8-bit bytes, we want to abstract the
magic 8's used for bytesize to a DataLayout getter".

It garnered some five supporters, including four companies representing
out-of-tree backends: my own Ericsson, Synopsys, CML Microsystems and
Codasip. I also count James Y Knight's (Google) remarks as positive.

Two commenters I interpret as clearly against, among other things
citing the need for a committment for an in-tree-backend using non-8-
bit bytes to undertake such a change: JF Bastien and Philip Reames.

Nine additional commenters provided side remarks to the discussion, and
my interpretation was that five were in some small regard abstraction-
positive as they provided input on the details of the abstraction,
whereas two were more skeptical, and two were neither.

Now to the decision model: If full consensus is required, we clearly
don't have it, and we can stop here. However, if the abstraction were
in place, we'd have a large majority against replacing them with 8s.
And I guess it would be hard to move forward on very many things if
everyone in a huge community can veto?

Or should we take an insider-outsider perspective, then perhaps the
opposers are in  majority as their in-tree voices are what counts?

Or should we take a company perspective, then there's four companies
backing this, while I'm unsure if e.g. JF is speaking for Apple?

Or should we apply some reputation weights, so some people can veto?

Or is it an informal mix of all of the above? I'm sorry if I'm being
obnoxious, I really don't mean to. I'd just sleep better with clarity
in closure, so that the discussion don't just fizzle out and I give up
while having a sense that the community is leaning toward the positive
on this.

BR,
Jesper

On Thu, 2019-05-02 at 12:20 +0000, Jesper Antonsson via llvm-dev
wrote:>    A. This RFC outlines a proposal regarding non-8-bit-byte support
> that
>       got positive reception at a Round Table at EuroLLVM19. The
> general
>       topic has been brought up several times before and one good
> overview
>       can be found in a FOSDEM 2017 presentation by Jones and Cook:
> 
https://protect2.fireeye.com/url?k=0092ea36-5c19e105-0092aaad-865bb277df6a-f2acface76df5bb9&u=https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/> 
> In a nutshell, the proposal is for the llvm community to
> allow/encourage interested parties to gradually remove "magic
> numbers",
> e.g. assumptions on the size of bytes from the codebase. Overview,
> rationale and some example refactorings follows.
> 
> Overview:
> 
> LLVM currently assumes 8-bit bytes, while there exist a few out-of-
> tree 
> llvm targets that utilize bytes of other sizes, including our
> (Ericsson's) proprietary target. The main issues are the magic number
> 8
> and "/8" and "*8" all over the place and the use of i8
pointers.
> 
> There's considerable agreement that the use of magic numbers is not
> good coding style, and removing these ones would be of particular
> benefit, even though the effort would not be complete and no in-tree
> target with tests exist to guarantee that all gains are maintained.
> 
> Ericsson is willing to drive this effort. During EuroLLVM19, there
> seemed to be sufficient positive interest from other companies for us
> to expect help with reviewing patch sets. Ericsson has been
> performing
> nightly integration towards top-of-tree with this backend for years,
> catching and fixing new 8-bit-byte continuously. Thus we're able to
> commit to doing similar upstream fixes for the long haul in a no-
> drama
> way.
> 
> Rationale:
> 
> Benefits of moving toward a byte-size agnostic llvm include:
> * Less magic numbers in the codebase.
> * A reduced effort to maintain out-of-tree targets with non-8-bit
> bytes
> as contributors follow the established patterns. (One company has
> told
> us that they created but eventually gave up on a 16-bit byte target
> due
> to too-high integration burden.)
> * A reduction in duplicate efforts as some of the adaptation work
> would
> happen in-tree rather than in several out-of-tree targets.
> * For up-and-coming targets that have non-8-bit-byte sizes, time to
> market using llvm would be far quicker.
> * A higher probability of LLVM being the compiler of choice for such
> targets.
> * Eventually, as the patch set required to make llvm fully byte size
> agnostic becomes small enough, the effort to provide a mock in-tree
> target with some other byte size should be surmountable.
> 
> As cons, one could see a burden for the in-tree community to maintain
> whatever gains that have been had. However the onus should be on
> interested parties to mend any bit-rot. The impact of not having as
> much magic numbers and such should if anything make the code more
> easy
> to understand. The permission to go ahead would be under the
> condition
> that significant added complexities are avoided. Another con would be
> added compilation time e.g. in cases where the byte size is a run-
> time
> variable rather than a constant. However, this cost seems negligible
> in
> practice.
> 
> Refactoring examples:
> https://reviews.llvm.org/D61432
> 
> Best Regards,
> Jesper
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

James Y Knight via llvm-dev

2019-May-23 21:02 UTC

head link

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

You have raised an interesting question here about model for
decision-making, which unfortunately I don't think anyone really has an
answer for. I certainly don't.

Anyways, yes, my response was indeed intended to be mostly-positive
encouragement.

It seems to me that LLVM, as a project, generally encourages doing as much
development in the upstream project as possible, while also being generally
welcoming and helpful to closed-source downstream forks. That said, we
cannot ignore the critical requirement that upstream developers continue to
be able to make changes without having to remember all of the arbitrary
untested/undocumented constraints closed-source downstream project may have.

That's why I like the formulation of this proposal as a "code
clean-up",
rather than a "feature". As a feature, it really cannot exist without
a
backend using it upstream, because we have no ability to test it, so it
will be necessarily be broken. On the other hand, as a cleanup, it's
entirely acceptable that only some of the code has abstracted away the
number 8.

I'd suggest that this proposed cleanup should be allowed to happen -- but
that upstream, we do NOT expose it as an overrideable value (e.g., don't
put it in a target hook). Instead, make a non-overrideble function which
always just returns 8. Document it as returning the number of bits in a
character -- and that it's always 8 right now, but with the possibility
that someday llvm may handle other bit-widths.

Any downstream users who wish to change the value now, will need to patch
it to be a target hook (or whatever) in their fork -- and also fix anything
that's not using the abstraction properly, because we don't actually
support any other values upstream (due to no ability to test them).

On Tue, May 14, 2019 at 7:09 AM Jesper Antonsson via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Thanks to everyone that have weighed in on this! There have been
> remarks along the lines that the proposal lacks consensus to move
> forward, and that might very well be true. However, the decision model
> is not fully clear to me, so before giving up, I'd like to summarize
> the discussion to the best of my ability and ask for clarification on
> how the community makes its decisions.
>
> (First, I would like to emphasize that this is merely my
> interpretation, which may be flawed. If so, my apologies.) The RFC
> tl;dr is "without supporting non-8-bit bytes, we want to abstract the
> magic 8's used for bytesize to a DataLayout getter".
>
> It garnered some five supporters, including four companies representing
> out-of-tree backends: my own Ericsson, Synopsys, CML Microsystems and
> Codasip. I also count James Y Knight's (Google) remarks as positive.
>
> Two commenters I interpret as clearly against, among other things
> citing the need for a committment for an in-tree-backend using non-8-
> bit bytes to undertake such a change: JF Bastien and Philip Reames.
>
> Nine additional commenters provided side remarks to the discussion, and
> my interpretation was that five were in some small regard abstraction-
> positive as they provided input on the details of the abstraction,
> whereas two were more skeptical, and two were neither.
>
>
> Now to the decision model: If full consensus is required, we clearly
> don't have it, and we can stop here. However, if the abstraction were
> in place, we'd have a large majority against replacing them with 8s.
> And I guess it would be hard to move forward on very many things if
> everyone in a huge community can veto?
>
> Or should we take an insider-outsider perspective, then perhaps the
> opposers are in  majority as their in-tree voices are what counts?
>
> Or should we take a company perspective, then there's four companies
> backing this, while I'm unsure if e.g. JF is speaking for Apple?
>
> Or should we apply some reputation weights, so some people can veto?
>
> Or is it an informal mix of all of the above? I'm sorry if I'm
being
> obnoxious, I really don't mean to. I'd just sleep better with
clarity
> in closure, so that the discussion don't just fizzle out and I give up
> while having a sense that the community is leaning toward the positive
> on this.
>
> BR,
> Jesper
>
>
> On Thu, 2019-05-02 at 12:20 +0000, Jesper Antonsson via llvm-dev wrote:
> >    A. This RFC outlines a proposal regarding non-8-bit-byte support
> > that
> >       got positive reception at a Round Table at EuroLLVM19. The
> > general
> >       topic has been brought up several times before and one good
> > overview
> >       can be found in a FOSDEM 2017 presentation by Jones and Cook:
> >
>
>
https://protect2.fireeye.com/url?k=0092ea36-5c19e105-0092aaad-865bb277df6a-f2acface76df5bb9&u=https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/
> >
> > In a nutshell, the proposal is for the llvm community to
> > allow/encourage interested parties to gradually remove "magic
> > numbers",
> > e.g. assumptions on the size of bytes from the codebase. Overview,
> > rationale and some example refactorings follows.
> >
> > Overview:
> >
> > LLVM currently assumes 8-bit bytes, while there exist a few out-of-
> > tree
> > llvm targets that utilize bytes of other sizes, including our
> > (Ericsson's) proprietary target. The main issues are the magic
number
> > 8
> > and "/8" and "*8" all over the place and the use
of i8 pointers.
> >
> > There's considerable agreement that the use of magic numbers is
not
> > good coding style, and removing these ones would be of particular
> > benefit, even though the effort would not be complete and no in-tree
> > target with tests exist to guarantee that all gains are maintained.
> >
> > Ericsson is willing to drive this effort. During EuroLLVM19, there
> > seemed to be sufficient positive interest from other companies for us
> > to expect help with reviewing patch sets. Ericsson has been
> > performing
> > nightly integration towards top-of-tree with this backend for years,
> > catching and fixing new 8-bit-byte continuously. Thus we're able
to
> > commit to doing similar upstream fixes for the long haul in a no-
> > drama
> > way.
> >
> > Rationale:
> >
> > Benefits of moving toward a byte-size agnostic llvm include:
> > * Less magic numbers in the codebase.
> > * A reduced effort to maintain out-of-tree targets with non-8-bit
> > bytes
> > as contributors follow the established patterns. (One company has
> > told
> > us that they created but eventually gave up on a 16-bit byte target
> > due
> > to too-high integration burden.)
> > * A reduction in duplicate efforts as some of the adaptation work
> > would
> > happen in-tree rather than in several out-of-tree targets.
> > * For up-and-coming targets that have non-8-bit-byte sizes, time to
> > market using llvm would be far quicker.
> > * A higher probability of LLVM being the compiler of choice for such
> > targets.
> > * Eventually, as the patch set required to make llvm fully byte size
> > agnostic becomes small enough, the effort to provide a mock in-tree
> > target with some other byte size should be surmountable.
> >
> > As cons, one could see a burden for the in-tree community to maintain
> > whatever gains that have been had. However the onus should be on
> > interested parties to mend any bit-rot. The impact of not having as
> > much magic numbers and such should if anything make the code more
> > easy
> > to understand. The permission to go ahead would be under the
> > condition
> > that significant added complexities are avoided. Another con would be
> > added compilation time e.g. in cases where the byte size is a run-
> > time
> > variable rather than a constant. However, this cost seems negligible
> > in
> > practice.
> >
> > Refactoring examples:
> > https://reviews.llvm.org/D61432
> >
> > Best Regards,
> > Jesper
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190523/f44a6110/attachment.html>

Philip Reames via llvm-dev

2019-May-28 18:20 UTC

head link

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

Jesper, on the abstract decision model side of things, let me share my
understanding.  Note that I'm specifically not speaking for the project
as a whole, just the way I think about things.

In practice, we generally use a loose consensus model.  We don't require
full consensus, but we do require there to be a) an agreement on
direction being generally valuable to more than the contributor, b) a
lack of strong objections from established contributors, and c) an
(often implicit) commitment from the proposer to invest enough effort in
the project to justify any downsides other contributors might experience. 

Point (c) is frequently where a lot of large proposals from new
contributors fail.  Unless there's someone else strongly motivated to
drive it forward who can step in to satisfy (c), then a proposal is
likely to die on the vine.   Some of the most important feedback any
proposal gets is about how to reduce the cost to other contributors.  If
that feedback is ignored, then the proposal is almost certain dead.  (I
see this all the time.)

Point (b) is a major stumbling block, but can almost always be worked
through.  There's a couple of implicit points worth noting.  1) We
require that objections be largely technical in nature.  2) Anyone
objecting strongly is expect to be themselves a long term contributor
whose been through the process before. 3) There's a lot of horse trading
which goes on behind the scenes, and 4) because a strong objection is so
powerful, it's frequently waived.  Objections based on (1) can
frequently be addressed through offline direct conversation, and
frequently lead to revisions in proposal.  (Usually, a reduction in
initial scope, sometimes the opposite.)  (3) is a practical necessity as
no one has the time to review and think through *everything*; I'm much
more likely to invest time in a proposal coming from someone who I've
worked closely with in the past than a stranger.  (4) shows up in subtle
ways.  If you've seen me or someone else say something to the effect of
"Drive by comment", "minor concern", "deferring to X on
point Y", those
are all indications that I've chosen explicitly *not* to express a
strong objection.

Point (a) may seem like the core point, but it's generally the easiest. 
Once (b) has been addressed, (a) almost always follows. 

Again, speaking only for myself. 

Philip


On 5/14/19 4:08 AM, Jesper Antonsson via llvm-dev wrote:> Thanks to everyone that have weighed in on this! There have been
> remarks along the lines that the proposal lacks consensus to move
> forward, and that might very well be true. However, the decision model
> is not fully clear to me, so before giving up, I'd like to summarize
> the discussion to the best of my ability and ask for clarification on
> how the community makes its decisions.
>
> (First, I would like to emphasize that this is merely my
> interpretation, which may be flawed. If so, my apologies.) The RFC
> tl;dr is "without supporting non-8-bit bytes, we want to abstract the
> magic 8's used for bytesize to a DataLayout getter".
>
> It garnered some five supporters, including four companies representing
> out-of-tree backends: my own Ericsson, Synopsys, CML Microsystems and
> Codasip. I also count James Y Knight's (Google) remarks as positive.
>
> Two commenters I interpret as clearly against, among other things
> citing the need for a committment for an in-tree-backend using non-8-
> bit bytes to undertake such a change: JF Bastien and Philip Reames.
>
> Nine additional commenters provided side remarks to the discussion, and
> my interpretation was that five were in some small regard abstraction-
> positive as they provided input on the details of the abstraction,
> whereas two were more skeptical, and two were neither.
>
>
> Now to the decision model: If full consensus is required, we clearly
> don't have it, and we can stop here. However, if the abstraction were
> in place, we'd have a large majority against replacing them with 8s.
> And I guess it would be hard to move forward on very many things if
> everyone in a huge community can veto?
>
> Or should we take an insider-outsider perspective, then perhaps the
> opposers are in  majority as their in-tree voices are what counts?
>
> Or should we take a company perspective, then there's four companies
> backing this, while I'm unsure if e.g. JF is speaking for Apple?
>
> Or should we apply some reputation weights, so some people can veto?
>
> Or is it an informal mix of all of the above? I'm sorry if I'm
being
> obnoxious, I really don't mean to. I'd just sleep better with
clarity
> in closure, so that the discussion don't just fizzle out and I give up
> while having a sense that the community is leaning toward the positive
> on this.
>
> BR,
> Jesper
>
>
> On Thu, 2019-05-02 at 12:20 +0000, Jesper Antonsson via llvm-dev wrote:
>>    A. This RFC outlines a proposal regarding non-8-bit-byte support
>> that
>>       got positive reception at a Round Table at EuroLLVM19. The
>> general
>>       topic has been brought up several times before and one good
>> overview
>>       can be found in a FOSDEM 2017 presentation by Jones and Cook:
>>
>
https://protect2.fireeye.com/url?k=0092ea36-5c19e105-0092aaad-865bb277df6a-f2acface76df5bb9&u=https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/
>> In a nutshell, the proposal is for the llvm community to
>> allow/encourage interested parties to gradually remove "magic
>> numbers",
>> e.g. assumptions on the size of bytes from the codebase. Overview,
>> rationale and some example refactorings follows.
>>
>> Overview:
>>
>> LLVM currently assumes 8-bit bytes, while there exist a few out-of-
>> tree 
>> llvm targets that utilize bytes of other sizes, including our
>> (Ericsson's) proprietary target. The main issues are the magic
number
>> 8
>> and "/8" and "*8" all over the place and the use of
i8 pointers.
>>
>> There's considerable agreement that the use of magic numbers is not
>> good coding style, and removing these ones would be of particular
>> benefit, even though the effort would not be complete and no in-tree
>> target with tests exist to guarantee that all gains are maintained.
>>
>> Ericsson is willing to drive this effort. During EuroLLVM19, there
>> seemed to be sufficient positive interest from other companies for us
>> to expect help with reviewing patch sets. Ericsson has been
>> performing
>> nightly integration towards top-of-tree with this backend for years,
>> catching and fixing new 8-bit-byte continuously. Thus we're able to
>> commit to doing similar upstream fixes for the long haul in a no-
>> drama
>> way.
>>
>> Rationale:
>>
>> Benefits of moving toward a byte-size agnostic llvm include:
>> * Less magic numbers in the codebase.
>> * A reduced effort to maintain out-of-tree targets with non-8-bit
>> bytes
>> as contributors follow the established patterns. (One company has
>> told
>> us that they created but eventually gave up on a 16-bit byte target
>> due
>> to too-high integration burden.)
>> * A reduction in duplicate efforts as some of the adaptation work
>> would
>> happen in-tree rather than in several out-of-tree targets.
>> * For up-and-coming targets that have non-8-bit-byte sizes, time to
>> market using llvm would be far quicker.
>> * A higher probability of LLVM being the compiler of choice for such
>> targets.
>> * Eventually, as the patch set required to make llvm fully byte size
>> agnostic becomes small enough, the effort to provide a mock in-tree
>> target with some other byte size should be surmountable.
>>
>> As cons, one could see a burden for the in-tree community to maintain
>> whatever gains that have been had. However the onus should be on
>> interested parties to mend any bit-rot. The impact of not having as
>> much magic numbers and such should if anything make the code more
>> easy
>> to understand. The permission to go ahead would be under the
>> condition
>> that significant added complexities are avoided. Another con would be
>> added compilation time e.g. in cases where the byte size is a run-
>> time
>> variable rather than a constant. However, this cost seems negligible
>> in
>> practice.
>>
>> Refactoring examples:
>> https://reviews.llvm.org/D61432
>>
>> Best Regards,
>> Jesper
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Seemingly Similar Threads

Search for more seemingly similar threads

llvm dev - May 2019 - RFC: On removing magic numbers assuming 8-bit bytes

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

Seemingly Similar Threads