On 28 Sep 2016, at 16:50, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> At this point, my general view is that widening transformations of any kind should be done very late. Ideally, this is something the backend would do, but doing it as a CGP like fixup pass over the IR is also reasonable.

I’m really glad to see that this is gone in GVN - it will reduce our diffs a lot when we do the next import. The GVN load widening is not sound in the presence of hardware-enforced spatial memory safety, so we ended up with the compiler inserting things that caused hardware bounds checks to fail, and we had to disable it in a few places.

If you’re reintroducing it, please can we have a backend option to specify whether it’s valid to widen loads beyond the extents (for example, for us it is not valid to widen an i16 load followed by an i8 load to an i32 load). Ideally, we’d also want the back end to supply some cost function: on MIPS, even when this is valid, the i32 load followed by the masks and shifts is a lot more expensive than the two loads (and increases register pressure, so has knock-on effects that mean that turning off this bit of GVN gave us *much* better code, even without any of the later optimisation opportunities being exploited).

David
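For concreteness, the invalid widening being described might look like the following hypothetical IR fragment (names, offsets, and alignments invented for illustration): the two original loads cover bytes 0-2 of %p, but the widened load also reads byte 3, which the program never accesses - exactly the access a byte-granularity bounds check will reject.

    ; original code: an i16 load and an i8 load covering bytes 0-2 of %p
    %p16 = bitcast i8* %p to i16*
    %lo  = load i16, i16* %p16, align 2
    %hip = getelementptr i8, i8* %p, i64 2
    %hi  = load i8, i8* %hip, align 1

    ; widened form: a single i32 load covering bytes 0-3 - byte 3 was
    ; never part of the original program's accesses
    %p32  = bitcast i8* %p to i32*
    %wide = load i32, i32* %p32, align 4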
Hi David,

David Chisnall via llvm-dev wrote:
> On 28 Sep 2016, at 16:50, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> At this point, my general view is that widening transformations of any kind should be done very late. Ideally, this is something the backend would do, but doing it as a CGP like fixup pass over the IR is also reasonable.
>
> I’m really glad to see that this is gone in GVN - it will reduce our
> diffs a lot when we do the next import. The GVN load widening is not
> sound in the presence of hardware-enforced spatial memory safety, so
> we ended up with the compiler inserting things that caused hardware
> bounds checks to fail and had to disable it in a few places.
>
> If you’re reintroducing it, please can we have a backend option to
> specify whether it’s valid to widen loads beyond the extents (for
> example, for us it is not valid to widen an i16 load followed by an i8
> load to an i32 load). Ideally, we’d also want the back end to supply

Don't you have to mark the functions you generate as
"Attribute::SanitizeAddress"? We should definitely make the
speculative form of this transform (i.e. "load i32, load i16" -->
"load i64") predicated on Attribute::SanitizeAddress.

-- Sanjoy
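In IR terms, the speculative case might look like the hypothetical function below (names invented for illustration): the i32 and i16 loads cover bytes 0-5, so a widened i64 load would also read bytes 6-7, which the function never touches. The suggestion above is that when the function carries the sanitize_address attribute, that extra speculative read must not be introduced.

    define i64 @pair(i8* %p) sanitize_address {
      %p32  = bitcast i8* %p to i32*
      %lo   = load i32, i32* %p32, align 4        ; bytes 0-3
      %hip  = getelementptr i8, i8* %p, i64 4
      %p16  = bitcast i8* %hip to i16*
      %hi   = load i16, i16* %p16, align 2        ; bytes 4-5
      ; widening these to a single "load i64" would also read bytes 6-7
      %lo64 = zext i32 %lo to i64
      %hi64 = zext i16 %hi to i64
      %hish = shl i64 %hi64, 32
      %r    = or i64 %lo64, %hish
      ret i64 %r
    }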
Just to chime in that we have been seeing a bunch of instances on aarch64 as well where the load widening prevented LICM and, probably worse, inserted additional shift instructions into the critical path. This was showing up especially after r246985 (see for example http://llvm.org/PR25321), which leads to additional widening possibilities.

So I'd also strongly suggest optimizing only the obvious cases and staying away from variations that lead to additional shift instructions being inserted. (I missed D24096, but will check whether that fixes our performance problems as well.)

- Matthias

> On Sep 28, 2016, at 5:25 PM, Sanjoy Das via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...]
> Don't you have to mark the functions you generate as
> "Attribute::SanitizeAddress"? We should definitely make the
> speculative form of this transform (i.e. "load i32, load i16" -->
> "load i64") predicated on Attribute::SanitizeAddress.
>
> -- Sanjoy
On 29 Sep 2016, at 01:25, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> [...]
> Don't you have to mark the functions you generate as
> "Attribute::SanitizeAddress"? We should definitely make the
> speculative form of this transform (i.e. "load i32, load i16" -->
> "load i64") predicated on Attribute::SanitizeAddress.

Nope, we’re not using the address sanitiser. Our architecture supports byte-granularity bounds checking in hardware. Note that, even without this, for pure MIPS code without our extensions, load widening generates significantly worse code than when it doesn’t happen.

I’m actually finding it difficult to come up with a microarchitecture where a 16-bit load followed by an 8-bit load from the same cache line would give worse performance than a 32-bit load, a mask, and a shift. In an in-order design, it’s more instructions to do the same work, and therefore slower. In an out-of-order design, the two loads within the cache line will likely be dispatched simultaneously and you’ll have less pressure on the register rename engine.

It seems like something that will only give wins if it exposes optimisation opportunities to other transforms and, as such, should probably be exposed as an analysis so that the later transforms can do the combine if it actually makes sense to do so.

David
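To put a rough shape on that cost comparison (hypothetical IR, little-endian layout assumed; at the IR level the extraction shows up as shifts and truncates, with the mask appearing during lowering): the widened form replaces two independent loads with one load feeding a dependent chain, and the wide value stays live until both pieces have been extracted.

    ; widened replacement for the i16 + i8 pair: one memory op, but a
    ; dependent chain of extra ALU instructions, and %wide stays live
    ; until both original values have been re-extracted
    %wide = load i32, i32* %p32, align 4
    %lo.w = trunc i32 %wide to i16      ; low 16 bits = the original i16
    %sh   = lshr i32 %wide, 16
    %hi.w = trunc i32 %sh to i8         ; bits 16-23  = the original i8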