Hi,

I find that LSR is not doing enough to avoid unfoldable offsets on SystemZ. When the loop has three stores with unfoldable offsets, LSR rewrites the IV in a good way. However, if I add another store whose offset is already foldable, LSR fails to rewrite the three stores.

And if I instead add a store with a too-big *positive* offset (the first three were negative), only the positive one gets transformed.

* LSR is not rewriting the IV to get three foldable offsets rather than one.

* It would actually be preferable in this case to use a second address register for the offset that is too far away from the others.

Does anyone have an idea how best to handle this? Can LSR "split" an IV to use an extra register? Or would this need to be done in a target-specific pass?

For a reduced test case for this problem, see https://bugs.llvm.org//show_bug.cgi?id=32548.

Thanks,

Jonas
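To make the desired outcome concrete, here is a minimal sketch of the choice described above: given the constant offsets used against one IV, pick the rebased IV that lets the most accesses fold, and report the ones that would need a second address register. This is not LSR code. The helper name pickBestBase, the RebaseResult struct, and the assumption that a foldable displacement is an unsigned 12-bit value (0 to 4095) are illustrative only, and the offsets in the usage note are made up rather than taken from the bug report.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: the addressing mode folds an unsigned 12-bit displacement.
constexpr int64_t MaxDisp = 4095;

struct RebaseResult {
  int64_t Base = 0;              // constant to add to the original IV
  std::vector<int64_t> Folded;   // original offsets covered by the new base
  std::vector<int64_t> LeftOver; // offsets that still need another register
};

// Hypothetical helper: choose the base that makes the most offsets foldable.
RebaseResult pickBestBase(std::vector<int64_t> Offsets) {
  std::sort(Offsets.begin(), Offsets.end());

  // Sliding window over the sorted offsets: [Lo, Hi) is the longest run
  // starting at Lo whose members are all within MaxDisp of Offsets[Lo].
  std::size_t BestLo = 0, BestCount = 0;
  for (std::size_t Lo = 0, Hi = 0; Lo < Offsets.size(); ++Lo) {
    while (Hi < Offsets.size() && Offsets[Hi] - Offsets[Lo] <= MaxDisp)
      ++Hi;
    if (Hi - Lo > BestCount) {
      BestCount = Hi - Lo;
      BestLo = Lo;
    }
  }
  assert(BestLo + BestCount <= Offsets.size());

  RebaseResult R;
  if (Offsets.empty())
    return R;

  // Rebasing on the smallest member of the window makes every displacement
  // in the window (Offset - Base) fall in the range 0..MaxDisp.
  R.Base = Offsets[BestLo];
  for (std::size_t I = 0; I < Offsets.size(); ++I) {
    if (I >= BestLo && I < BestLo + BestCount)
      R.Folded.push_back(Offsets[I]);
    else
      R.LeftOver.push_back(Offsets[I]);
  }
  return R;
}
```

For example, with the made-up offsets {-8192, -8000, -7808, 100000}, the sketch rebases on -8192 so that the three negative offsets fold, and leaves 100000 as the single access that would want its own address register.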
On 04/10/2017 08:47 AM, Jonas Paulsson via llvm-dev wrote:
> Hi,
>
> I find that LSR is not helping enough on avoiding unfoldable offsets
> for SystemZ. When the loop has three stores with unfoldable offsets,
> LSR rewrites the IV in a good way. However, if adding another store
> with a foldable offset that fits already, LSR fails to rewrite the
> three stores.
>
> And if I happen to add a too big *positive* offset (the first three
> were negative) instead of a foldable one, only the positive gets
> transformed.
>
> * LSR is not rewriting the IV to have three foldable offsets rather
> than one.
>
> * It would actually be preferred in this case to use a second address
> register for the offset that is too far away from the others.
>
> Has anyone any idea on how to best handle this? Can LSR "split" an IV
> to use an extra register? Or would this need to be done in a target
> specific pass?

When you say "an extra address register", would this imply LSR adding an additional PHI?

 -Hal

>
> For a reduced test case for this problem, see
> https://bugs.llvm.org//show_bug.cgi?id=32548.
>
> Thanks,
>
> Jonas

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
>> Has anyone any idea on how to best handle this? Can LSR "split" an IV
>> to use an extra register? Or would this need to be done in a target
>> specific pass?
>
> When you say "an extra address register" would this imply LSR adding
> an additional PHI?
>
> -Hal
>

Yes, that would have worked well, at least in this type of loop. Can LSR do this?

I experimented with adding a check for a 12-bit offset distance in isProfitableIncrement() (checking against all members of the chain), which resulted in LSR producing several chains instead of just one. The chains that formed now had immediate offsets that were close to each other, so they should result in addresses with 12-bit offsets. But, to my disappointment, LSR did not handle these different chains by generating new PHI nodes for each of them (or by skipping those that ended up with just one store in the chain); instead it still output the stores in the same way as before.

I also see that LSR is thinking in terms of increments between the memory accesses. In the loop I am working with, it's disappointing to see that before each memory access the base address is loaded into a register, then the offset is added, and then the access is made, which is three instructions. Per LSR's intentions, it should have been just an add/sub after the previous access and before the memory access.

I wonder where this is supposed to be handled: in some sort of target pre-isel pass that chains the GEPs? Or is this just folded more often on other targets?

/Jonas
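The effect of the distance check on chain formation can be sketched outside LLVM as follows. This is only an illustration of the grouping described above, not the actual isProfitableIncrement() logic or its signature. The names closeToAllMembers and formChains, the Chain type, and the 4096-byte window standing in for a 12-bit displacement are all hypothetical, and the sketch says nothing about how LSR later emits code for each chain.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: offsets less than 4096 apart can share one base register
// when the displacement field is 12 bits wide.
constexpr int64_t DispRange = 4096;

using Chain = std::vector<int64_t>; // immediate offsets of one chain

// Hypothetical stand-in for the extra check: the candidate must be within
// the displacement range of every member already in the chain.
bool closeToAllMembers(const Chain &C, int64_t Offset) {
  for (int64_t Member : C) {
    int64_t Distance = Offset - Member;
    if (Distance <= -DispRange || Distance >= DispRange)
      return false;
  }
  return true;
}

// Greedily place each offset into the first chain that accepts it,
// starting a new chain when none does.
std::vector<Chain> formChains(const std::vector<int64_t> &Offsets) {
  std::vector<Chain> Chains;
  for (int64_t Offset : Offsets) {
    bool Placed = false;
    for (Chain &C : Chains) {
      assert(!C.empty() && "chains are created with one member");
      if (closeToAllMembers(C, Offset)) {
        C.push_back(Offset);
        Placed = true;
        break;
      }
    }
    if (!Placed)
      Chains.push_back(Chain{Offset});
  }
  return Chains;
}
```

With the made-up offsets {-8192, -8000, 16, -7808, 100000}, this yields three chains: one holding the three negative offsets and two single-member chains for 16 and 100000. Single-member chains are the ones that could be skipped, or given their own PHI, in the scenario discussed above.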