Jesper Antonsson via llvm-dev
2019-May-03 14:51 UTC
[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes
Hi Jeroen, Thanks, these are interesting differences. The CHAR_BIT and byte relation is established in the C standard and I would prefer the byte terminology. It means the same thing as addressable unit but is a bit shorter and probably more widely known. Do you suggest any in-tree changes at some call sites using the suggested AdressSpace parameter or would we rely on the default value always? I could definitely include this, but if the parameter is never used, perhaps we can leave it out-of-tree for now? Best Regards, Jesper On Fri, 2019-05-03 at 10:33 +0000, Jeroen Dobbelaere wrote:> Hi Jesper, > > We (ASIP Designer team, Synopsys) are also interested in a cleaner > approach without those magic constants. > > Instead of 'n-bit bytes', we tend to talk about 'word addressing' and > 'addressable unit size'. > > We support C/C++ for various architectures that our customers create > with our tools. > Some of those have multiple memories where the addressable unit size > depends on the memory (address space). > > NOTE: the addressable unit size can be any value (like 10 or 41 or > 2048) > > We also keep the bits in 'char' separate from the addressable unit > size. As such, an 8 bit char can > have a 'storage size' of 24 bits in one memory and 64 bits in > another. > (but a char can also be 24 bits, or 10 or something else). > > I think that a cleaner abstraction in datalayout is in general > useful. > > We use: > unsigned getAddresssableUnitSize(unsigned AddressSpace=0); > > which, for the main llvm, could be implemented as : > unsigned getAddresssableUnitSize(unsigned AS=0) { return 8; } > > Greetings, > > Jeroen Dobbelaere > > > > -----Original Message----- > > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of > > Jesper > > Antonsson via llvm-dev > > Sent: Thursday, May 2, 2019 14:21 > > To: llvm-dev at lists.llvm.org > > Subject: [llvm-dev] RFC: On removing magic numbers assuming 8-bit > > bytes > > > > A. This RFC outlines a proposal regarding non-8-bit-byte support > > that > > got positive reception at a Round Table at EuroLLVM19. The > > general > > topic has been brought up several times before and one good > > overview > > can be found in a FOSDEM 2017 presentation by Jones and Cook: > >https://protect2.fireeye.com/url?k=2d1c7421-7195ae31-2d1c34ba-0cc47ad93e2a-4cae3fa5d066a45a&u=https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/> > > > In a nutshell, the proposal is for the llvm community to > > allow/encourage interested parties to gradually remove "magic > > numbers", > > e.g. assumptions on the size of bytes from the codebase. Overview, > > rationale and some example refactorings follows. > > > > Overview: > > > > LLVM currently assumes 8-bit bytes, while there exist a few out-of- > > tree > > llvm targets that utilize bytes of other sizes, including our > > (Ericsson's) proprietary target. The main issues are the magic > > number 8 > > and "/8" and "*8" all over the place and the use of i8 pointers. > > > > There's considerable agreement that the use of magic numbers is not > > good coding style, and removing these ones would be of particular > > benefit, even though the effort would not be complete and no in- > > tree > > target with tests exist to guarantee that all gains are maintained. > > > > Ericsson is willing to drive this effort. During EuroLLVM19, there > > seemed to be sufficient positive interest from other companies for > > us > > to expect help with reviewing patch sets. Ericsson has been > > performing > > nightly integration towards top-of-tree with this backend for > > years, > > catching and fixing new 8-bit-byte continuously. Thus we're able to > > commit to doing similar upstream fixes for the long haul in a no- > > drama > > way. > > > > Rationale: > > > > Benefits of moving toward a byte-size agnostic llvm include: > > * Less magic numbers in the codebase. > > * A reduced effort to maintain out-of-tree targets with non-8-bit > > bytes > > as contributors follow the established patterns. (One company has > > told > > us that they created but eventually gave up on a 16-bit byte target > > due > > to too-high integration burden.) > > * A reduction in duplicate efforts as some of the adaptation work > > would > > happen in-tree rather than in several out-of-tree targets. > > * For up-and-coming targets that have non-8-bit-byte sizes, time to > > market using llvm would be far quicker. > > * A higher probability of LLVM being the compiler of choice for > > such > > targets. > > * Eventually, as the patch set required to make llvm fully byte > > size > > agnostic becomes small enough, the effort to provide a mock in-tree > > target with some other byte size should be surmountable. > > > > As cons, one could see a burden for the in-tree community to > > maintain > > whatever gains that have been had. However the onus should be on > > interested parties to mend any bit-rot. The impact of not having as > > much magic numbers and such should if anything make the code more > > easy > > to understand. The permission to go ahead would be under the > > condition > > that significant added complexities are avoided. Another con would > > be > > added compilation time e.g. in cases where the byte size is a run- > > time > > variable rather than a constant. However, this cost seems > > negligible in > > practice. > > > > Refactoring examples: > > https://reviews.llvm.org/D61432 > > > > Best Regards, > > Jesper > >
Jeroen Dobbelaere via llvm-dev
2019-May-03 18:00 UTC
[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes
Hi Jesper,> Thanks, these are interesting differences. The CHAR_BIT and byte > relation is established in the C standard and I would prefer the byte > terminology. It means the same thing as addressable unit but is a bit > shorter and probably more widely known.Looking purely from a c/c++ language viewpoint, this makes sense. We settled on using 'addressable unit size', but any abstraction will already be helpful.> > Do you suggest any in-tree changes at some call sites using the > suggested AdressSpace parameter or would we rely on the default value > always? I could definitely include this, but if the parameter is never > used, perhaps we can leave it out-of-tree for now?Whenever optimizations are done on store/load/pointers, the address space should be taken into account. For us it is fine to manage that part out-of-tree. Greetings, Jeroen Dobbelaere
David Greene via llvm-dev
2019-May-03 19:58 UTC
[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes
Jeroen Dobbelaere via llvm-dev <llvm-dev at lists.llvm.org> writes:> Hi Jesper, > >> Thanks, these are interesting differences. The CHAR_BIT and byte >> relation is established in the C standard and I would prefer the byte >> terminology. It means the same thing as addressable unit but is a bit >> shorter and probably more widely known. > > Looking purely from a c/c++ language viewpoint, this makes sense. We > settled on using 'addressable unit size', but any abstraction will > already be helpful.Given that f18 has just been accepted as an LLVM project, we probably shouldn't be using C/C++ or any specific language terminology in LLVM. It seems useful to distinguish "addressable unit size" from "data size" and talk about things using such abstract terminology. -David