Dylan McKay via llvm-dev
2017-Jul-11 22:18 UTC
[llvm-dev] RFC: Harvard architectures and default address spaces
Hello Hal,

> Add this information to DataLayout and to use that information in
> relevant places.

This sounds like a much better/cleaner idea, thanks!

On Wed, Jul 12, 2017 at 1:13 AM, Hal Finkel <hfinkel at anl.gov> wrote:

> On 07/11/2017 12:54 AM, Dylan McKay via llvm-dev wrote:
>
> > Hello all, I’m looking into solving an AVR-specific issue and would love
> > to hear people’s thoughts on how best to fix it.
> >
> > Background
> >
> > As you may or may not know, I maintain the in-tree AVR backend, which also
> > happens to be (to the best of my knowledge) the first in-tree backend for a
> > Harvard architecture.
> >
> > In this architecture, code lives inside the ‘program memory’ space
> > (numbered 1), whereas data lives inside RAM “data space”, which corresponds
> > to the default address space 0. This is important because loads/stores use
> > different instruction/pointer formats depending on the address space used,
> > and so we need correct address space information available to the backend
> > itself.
> >
> > Because address spaces in LLVM default to 0, all global or constant
> > variables default to living inside data space. This causes a few issues,
> > including the fact that the SimplifyCFG pass creates switch lookup tables,
> > which default to data space, causing us to emit broken table lookups and
> > also wasting precious RAM.
> >
> > The problem - emitting pointers as operands
> >
> > *NOTE*: Feel free to skip to the tl;dr of this section if you don’t care
> > too much about the details.
> >
> > Different instructions require different fixups to be applied depending
> > on whether pointers are located in data space or program space.
> >
> > Take the ICALL instruction - it performs an indirect call to the pointer
> > stored in the Z register.
> >
> > We must first load the pointer into Z via the ‘ldi’ instruction. If the
> > pointer is actually a pointer to a symbol, we need to emit an
> > AVR_LO8_LDI_GS relocation, otherwise we emit an AVR_LO8_LDI relocation.
> > There are a few other cases, but they’re irrelevant for this discussion.
> >
> > We can quite easily look at the GlobalValue* that corresponds to the
> > pointer if it is a symbol and select the fixup based on that, but that
> > assumes that the address spaces are always correct.
> >
> > Now, imagine that the pointer is actually a function pointer. LLVM does
> > not expose any way to set the address space in the IR for functions, but
> > because a function is derived from GlobalValue, it does have an address
> > space, and that address space defaults to zero. Because of this, every
> > single function pointer in the AVR backend that gets loaded by the ldi
> > will be associated with data space, and not with program space, where it
> > actually belongs.
> >
> > *tl;dr* functions default to address space zero, even though they are in
> > a different space on Harvard architectures, which causes silent codegen
> > bugs when we rely on the address space of a global value.
> >
> > Proposed solution
> >
> > It would be impossible to set the address space correctly on creation of
> > llvm::Function objects because at that point in the pipeline, we do not
> > know the target architecture.
> >
> > Because of this, I’d like to extend TargetTransformInfo with hooks like
> > getSwitchTableAddressSpace() and getFunctionAddressSpace(). I already
> > have a WIP patch for this here <https://reviews.llvm.org/D34983>.
> >
> > Once we have that information available to TargetTransformInfo, I propose
> > we add a pass (very early in the codegen pipeline) that sets the address
> > space of all functions to whatever value is specified in the hooks.
> >
> > This works well because we don’t let frontends specify address space on
> > functions, nor do we even mention that functions have address spaces in the
> > language reference.
> >
> > The downside is that you wouldn’t normally expect something like an
> > address space to change midway through the compilation process. To
> > counter that, however, I doubt the pre-codegen code cares much about the
> > value of function address spaces, if at all.
> >
> > On top of this, at the current point in time,
> > Pointer<Function>::getAddressSpace is downright incorrect on any Harvard
> > architecture, and for other architectures, the address space for
> > functions will still stay the default of zero and will not change at all.
> >
> > Does anybody know anything I haven’t thought of? Any reasons why this
> > solution is suboptimal?
>
> Hi, Dylan,
>
> Being able to specify the address space of functions, etc. is a good idea.
> Given the current design, you can't put this into TargetTransformInfo,
> however, because nothing in TTI may be required for correctness (because
> your target's implementation might not be available). Information required
> for correctness must go in DataLayout (because it must always be
> available). You should propose patches to add this information to
> DataLayout and to use that information in relevant places.
>
> -Hal
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
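The failure mode described in this message, and the DataLayout-based fix Hal suggests, can be illustrated with a small self-contained C++ model. This is only a sketch under stated assumptions and does not use LLVM's real classes: `ToyGlobal`, `programAddressSpace`, `assignFunctionAddressSpace`, and `selectLdiFixup` are hypothetical names. The layout string uses a `P<n>` component to name the program address space, which is the spelling LLVM's DataLayout later documented, but the parser here is a toy. Pairing the `_GS` fixup with program space is an assumption drawn from the description above.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Toy stand-in for a global symbol; this is NOT llvm::GlobalValue.
struct ToyGlobal {
    std::string name;
    bool isFunction;
    unsigned addressSpace; // 0 = data space, 1 = program space on AVR
};

// Program memory is numbered 1 on AVR, per the message above.
constexpr unsigned kAvrProgramSpace = 1;

// Returns the address space named by a "P<n>" component of a
// '-'-separated layout string, or 0 when no such component exists.
// Lowercase "p:..." components (pointer sizes) are deliberately ignored.
unsigned programAddressSpace(const std::string &layout) {
    std::istringstream in(layout);
    std::string spec;
    while (std::getline(in, spec, '-')) {
        if (spec.size() > 1 && spec[0] == 'P')
            return static_cast<unsigned>(std::stoul(spec.substr(1)));
    }
    return 0;
}

// Moves functions into the program address space taken from the layout.
// Non-function globals keep whatever address space they already have.
void assignFunctionAddressSpace(ToyGlobal &g, const std::string &layout) {
    if (g.isFunction)
        g.addressSpace = programAddressSpace(layout);
}

enum class Fixup { LO8_LDI, LO8_LDI_GS };

// Assumption: symbols in program space want the _GS fixup, all others
// the plain one. The result is only as good as g.addressSpace.
Fixup selectLdiFixup(const ToyGlobal &g) {
    return g.addressSpace == kAvrProgramSpace ? Fixup::LO8_LDI_GS
                                              : Fixup::LO8_LDI;
}
```

With a function left at the default address space 0, `selectLdiFixup` silently picks the data-space fixup. After `assignFunctionAddressSpace` is called with a layout string such as `"e-P1-p:16:8-i8:8"` (an illustrative string, not necessarily AVR's real one), the same function selects the `_GS` variant. Because the layout string travels with the module, this information stays available even when the target's own implementation is not, which is the correctness point Hal raises about TargetTransformInfo.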
David Chisnall via llvm-dev
2017-Jul-12 15:26 UTC
[llvm-dev] RFC: Harvard architectures and default address spaces
On 11 Jul 2017, at 23:18, Dylan McKay via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> > Add this information to DataLayout and to use that information in relevant places.
>
> This sounds like a much better/cleaner idea, thanks!

I’d suggest taking a look at the alloca address space changes, which were recently added based on a cleaned-up version of our code. We have a similar issue (function and data pointers have the same representation for us, but casting requires different handling[1]) and have considered adding address spaces to functions.

David

[1] Probably not relevant for this discussion, but if anyone cares: in our world we have 128-bit fat pointers that contain base, bounds and permissions, and 64-bit pointers that are implicitly relative to one of two special fat pointer registers, one for code and one for data. We must therefore handle 64-bit to 128-bit pointer casts differently depending on whether we’re casting code or data pointers. We currently do this with some fairly ugly hacks, but being able to put all functions in a different AS would make this much easier for us.
Björn Pettersson A via llvm-dev
2017-Jul-13 10:38 UTC
[llvm-dev] RFC: Harvard architectures and default address spaces
My experience of having the address space for functions (or function pointers) in the DataLayout is that when the .ll file is parsed we need to parse the DataLayout before any function declarations. That is needed because we want to attribute the functions with the correct address space (according to DataLayout) when inserting them in the symbol table. An alternative would be to update the address space info for functions after having parsed the DataLayout.

Is the DataLayout normally used when parsing the .ll file (or .bc)? Or would this be the first case of doing that?

Is it guaranteed that the DataLayout is specified/parsed before any function declaration? Or is the DataLayout specification context sensitive, valid only for the declarations that follow it?

What if there are several address spaces for functions? Or is that a silly thing that no one will ever use? Having the address space specified in the DataLayout would be insufficient, since we would need to attribute the functions separately, right?

I am not saying that having the info in the DataLayout is a totally bad idea (since our out-of-tree target is using that trick), but I think it might pose some problems as well. And perhaps it isn't the most general solution.

/Björn

> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of David
> Chisnall via llvm-dev
> Sent: 12 July 2017 17:26
> To: Dylan McKay <me at dylanmckay.io>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>; Carl Peto <carl.peto at me.com>
> Subject: Re: [llvm-dev] RFC: Harvard architectures and default address
> spaces
>
> On 11 Jul 2017, at 23:18, Dylan McKay via llvm-dev <llvm-dev at lists.llvm.org>
> wrote:
> >
> > > Add this information to DataLayout and to use that information in
> > > relevant places.
> >
> > This sounds like a much better/cleaner idea, thanks!
>
> I’d suggest taking a look at the alloca address space changes, which were
> recently added based on a cleaned-up version of our code. We have a similar
> issue (function and data pointers have the same representation for us, but
> casting requires different handling[1]) and have considered adding address
> spaces to functions.
>
> David
>
> [1] Probably not relevant for this discussion, but if anyone cares: in our world
> we have 128-bit fat pointers that contain base, bounds and permissions, and
> 64-bit pointers that are implicitly relative to one of two special fat pointer
> registers, one for code and one for data. We must therefore handle 64-bit to
> 128-bit pointer casts differently depending on whether we’re casting code or
> data pointers. We currently do this with some fairly ugly hacks, but being
> able to put all functions in a different AS would make this much easier for us.
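The ordering concern in this message, and the "update afterwards" alternative it mentions, can be sketched with a small self-contained C++ model. It is not LLVM's `.ll` parser: the line-based input format, `ToyModule`, `programASFromLayout`, and `parseToyModule` are all hypothetical, and a `P<n>` layout component is assumed as the source of the function address space. Functions declared before the layout line are inserted with address space 0 and patched once the layout is known.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Toy module: NOT llvm::Module. Maps function name -> address space.
struct ToyModule {
    unsigned programAS = 0; // stays 0 until a layout line is seen
    std::map<std::string, unsigned> functions;
};

// Extracts <n> from a "P<n>" component of a '-'-separated layout
// string; returns 0 when there is no such component.
static unsigned programASFromLayout(const std::string &layout) {
    std::istringstream in(layout);
    std::string spec;
    while (std::getline(in, spec, '-')) {
        if (spec.size() > 1 && spec[0] == 'P')
            return static_cast<unsigned>(std::stoul(spec.substr(1)));
    }
    return 0;
}

// Hypothetical input format: one directive per line, either
// "datalayout <string>" or "declare <name>".
ToyModule parseToyModule(const std::string &text) {
    ToyModule m;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string keyword, arg;
        if (!(fields >> keyword >> arg))
            continue; // skip blank or malformed lines
        if (keyword == "declare") {
            // Uses whatever is known so far; 0 if no layout yet.
            m.functions[arg] = m.programAS;
        } else if (keyword == "datalayout") {
            m.programAS = programASFromLayout(arg);
            // Fix-up step: patch functions that were declared earlier.
            for (auto &entry : m.functions)
                entry.second = m.programAS;
        }
    }
    return m;
}
```

Parsing `"declare early\ndatalayout e-P1-p:16:8\ndeclare late\n"` leaves both functions in address space 1 regardless of where the layout line appears. The fix-up loop rewrites every function to the same value, so this model only works with a single function address space. The case of several function address spaces raised above would need a per-function annotation instead.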