David Chisnall
2014-Feb-20 09:02 UTC
[LLVMdev] RFC: GEP as canonical form for pointer addressing
On 20 Feb 2014, at 06:11, Ivan Godard <ivan at ootbcomp.com> wrote:

> It's not just old mainframes, it's some of the newest architecture as well.
> The Mill general-purpose architecture (http://ootbcomp.com) has non-integer
> pointers and distinct pointer operations too. That LLVM loses pointerhood is
> the biggest problem that we have identified while looking into using LLVM as
> our supported compiler. It may be a killer, and we may have to fall back to
> gcc. That would be a shame, but it does appear that the IR makes rash
> assumptions about machine architecture.
>
> "There'd be a lot more work needed to support this" is not encouraging to
> see, especially for a startup company with limited resources and little
> prior exposure to LLVM internals.

Just to add, I spend a fair bit of my time in the computer architecture research community, and pointers that are not integers are an increasingly common model. They simplify various dependency analysis paths in the pipeline (giving fewer pipeline flushes) and make certain kinds of security features significantly easier to implement. Architectures that separate pointers from integers are not becoming rarer. They are increasingly common in application-specific processors and likely to reappear in mainstream processors over the next 5-10 years.

We have managed to get LLVM working (and building nontrivial amounts of code) on a MIPS-derived architecture that has non-integer pointers, and the representation in the IR itself is fine. We have a few hacks in optimisations that are far too coarse grained (i.e. don't do this optimisation if you're dealing with this kind of pointer, even though many of them [SCEV in particular] should work but the code makes invalid assumptions). We do end up having to add more after every merge.

We start to hit problems when we get to SelectionDAG, which makes a lot of assumptions about the underlying architecture and has an annoying habit of thinking it knows better than the back end and undoing transformations that the back end has done.

David

P.S. The Mill is a very interesting architecture, but I'm very glad I'm not the one responsible for instruction scheduling on it...
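
The "loss of pointerhood" discussed above can be sketched in C. This is an illustrative sketch rather than code from the thread, and the function names are hypothetical. The first function keeps the value a pointer throughout, which a compiler such as Clang would typically express as a GEP. The second goes through an integer, which before optimisation typically becomes a ptrtoint, an add and an inttoptr. The two only agree on targets where a pointer is effectively an integer address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pointer arithmetic: the value stays a pointer, so the address
   computation can be represented as a GEP in the IR. */
char *advance_as_pointer(char *p, size_t n) {
    return p + n;
}

/* Integer round trip: pointer -> integer -> pointer. The result of the
   cast back is implementation-defined in C, and on a target with
   non-integer pointers the intermediate integer may not carry everything
   the pointer did. */
char *advance_as_integer(char *p, size_t n) {
    return (char *)((uintptr_t)p + n);
}

/* Assumption: a conventional flat-address target, where both forms
   produce the same address. */
void demo_flat_target(void) {
    char buf[16] = {0};
    assert(advance_as_pointer(buf, 3) == advance_as_integer(buf, 3));
}
```

On a conventional flat-address machine `demo_flat_target()` passes. That equivalence is the kind of assumption that does not necessarily hold on the architectures described in this message.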
Philip Reames
2014-Feb-24 23:19 UTC
[LLVMdev] RFC: GEP as canonical form for pointer addressing
On 02/20/2014 01:02 AM, David Chisnall wrote:

> We have managed to get LLVM working (and building nontrivial amounts of code) on a MIPS-derived architecture that has non-integer pointers, and the representation in the IR itself is fine. We have a few hacks in optimisations that are far too coarse grained (i.e. don't do this optimisation if you're dealing with this kind of pointer, even though many of them [SCEV in particular] should work but the code makes invalid assumptions). We do end up having to add more after every merge.

Any chance you'd be willing to share patches? Or even just a list of optimizations affected? This is work I'm likely to be duplicating in the very near future.

> We start to hit problems when we get to SelectionDAG, which makes a lot of assumptions about the underlying architecture and has an annoying habit of thinking it knows better than the back end and undoing transformations that the back end has done.

We looked at trying to preserve pointer vs integer information post SelectionDAG and quickly gave up. I believe this to be the right long term direction - i.e. years from now - but we didn't believe it would be viable in the near term. Instead, we've chosen to encode the information we actually need - which values to rewrite - at an earlier phase and construct the IR such that - we hope - nothing can insert uses after we insert safepoints.

The fact you've gotten this working all the way through is an impressive accomplishment and gives me hope for the long term direction.

Philip
David Chisnall
2014-Feb-25 08:16 UTC
[LLVMdev] RFC: GEP as canonical form for pointer addressing
On 24 Feb 2014, at 23:19, Philip Reames <listmail at philipreames.com> wrote:

> On 02/20/2014 01:02 AM, David Chisnall wrote:
>> We have managed to get LLVM working (and building nontrivial amounts of code) on a MIPS-derived architecture that has non-integer pointers, and the representation in the IR itself is fine. We have a few hacks in optimisations that are far too coarse grained (i.e. don't do this optimisation if you're dealing with this kind of pointer, even though many of them [SCEV in particular] should work but the code makes invalid assumptions). We do end up having to add more after every merge.
> Any chance you'd be willing to share patches? Or even just a list of optimizations affected? This is work I'm likely to be duplicating in the very near future.
>> We start to hit problems when we get to SelectionDAG, which makes a lot of assumptions about the underlying architecture and has an annoying habit of thinking it knows better than the back end and undoing transformations that the back end has done.
> We looked at trying to preserve pointer vs integer information post SelectionDAG and quickly gave up. I believe this to be the right long term direction - i.e. years from now - but we didn't believe it would be viable in the near term. Instead, we've chosen to encode the information we actually need - which values to rewrite - at an earlier phase and construct the IR such that - we hope - nothing can insert uses after we insert safepoints.
>
> The fact you've gotten this working all the way through is an impressive accomplishment and gives me hope for the long term direction.

Our LLVM and Clang repositories are here:

https://github.com/CTSRD-CHERI/llvm
https://github.com/CTSRD-CHERI/clang

(the cheri branch in both is currently the active one, but it will be renamed head soon - we've just made some changes to our ISA and are waiting for everyone to have the updated version before we sync everything).

I'm in the process of cleaning up the MIPS IV support for upstreaming, and I'm happy to upstream anything else that is more generally useful. I haven't yet for two reasons:

- Without an architecture that has a pointer-integer distinction in tree (and a lot of tests!), the support will likely bit rot. Our architecture does not yet have a stable ISA (and, since it's a research platform, probably never will), so would not be a good choice.

- Lots of things have '//FIXME: This is a really ugly hack!' above them and, while they do make things work for us, they are not always the right approach for long-term support.

I would be interested in working with anyone who wants to get better support for this kind of architecture into the architecture-neutral parts of LLVM. Having the address space cast instruction has simplified things for us quite a bit. We still actually lower it to an inttoptr or ptrtoint in the back end, but at least the optimisers no longer randomly elide the casts or break things. They used to do so because they assumed that ptrtoint -> inttoptr is a bitcast, even if the two pointers ended up in different address spaces [which had different sizes].

David
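
To illustrate why treating ptrtoint -> inttoptr as a bitcast is unsafe when the pointer is wider than the integer, here is a toy C model. It is only a sketch under stated assumptions: `wide_ptr`, its fields and the helper functions are hypothetical and are not the representation used by the architecture discussed above or by LLVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wide pointer: an address plus metadata that has no place
   in a plain 64-bit integer. */
typedef struct {
    uint64_t address;
    uint64_t length; /* bytes accessible from address; 0 means none */
} wide_ptr;

/* Analogue of ptrtoint: only the address survives. */
uint64_t wide_to_int(wide_ptr p) {
    return p.address;
}

/* Analogue of inttoptr: the metadata cannot be rebuilt from the integer,
   so this model yields a pointer with no accessible bytes. */
wide_ptr int_to_wide(uint64_t address) {
    wide_ptr p = { address, 0 };
    return p;
}

bool wide_equal(wide_ptr a, wide_ptr b) {
    return a.address == b.address && a.length == b.length;
}

/* True only if converting to an integer and back changes nothing. */
bool round_trip_is_identity(wide_ptr p) {
    return wide_equal(p, int_to_wide(wide_to_int(p)));
}

void demo_round_trip(void) {
    wide_ptr p = { 0x1000, 64 };
    assert(wide_to_int(p) == 0x1000);   /* the address is preserved */
    assert(!round_trip_is_identity(p)); /* the metadata is not */
}
```

In this model an optimiser that replaced the round trip with the original pointer, or removed the casts altogether, would change the program's meaning. A dedicated cast between address spaces gives the optimisers something they cannot fold away under that assumption.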