>> Has anyone any idea on how to best handle this? Can LSR "split" an IV >> to use an extra register? Or would this need to be done in a target >> specific pass? > > When you say "an extra address register" would this imply LSR adding > an additional PHI? > > -Hal >Yes, that would have worked well at least in this type of loop. Can LSR do this? I experimented with adding a check for 12 bit offsets distance in isProfitableIncrement() (checking against all members of the chain), which resulted in several chains being produced by LSR, instead of just one. The chains that now formed now had immediate offsets that were close to each other, so that they should result in addresses with 12 bit offsets. But, to my disappointment, LSR did not handle these different chains by generating new PHI-nodes for each of them (or by skipping those that ended up with just one store in the chain), but instead it still output the stores in the same way as before. I also see that LSR is thinking in terms of increments between the memory accesses. In the loop I am working with it's disappointing to see that before each memory access, the base address is loaded into register, and then the offset is added, and then the access, which is 3 instructions. It should have been just an add/sub after the previous access before the memory access, per LSRs intentions. I wonder where this is supposed to be handled: In some sort of target pre-isel pass that chains the GEPs? Or is this just folded more often on other targets? /Jonas
On 04/11/2017 08:52 AM, Jonas Paulsson wrote:> >>> Has anyone any idea on how to best handle this? Can LSR "split" an >>> IV to use an extra register? Or would this need to be done in a >>> target specific pass? >> >> When you say "an extra address register" would this imply LSR adding >> an additional PHI? >> >> -Hal >> > Yes, that would have worked well at least in this type of loop. Can > LSR do this?No, LSR won't add new PHIs. This is a long-standing deficiency (and also, in part, prevents it from properly handling pre/post-inc addressing modes, which motives some target-specific passes such as lib/Target/PowerPC/PPCLoopPreIncPrep.cpp).> > I experimented with adding a check for 12 bit offsets distance in > isProfitableIncrement() (checking against all members of the chain), > which resulted in several chains being produced by LSR, instead of > just one. The chains that now formed now had immediate offsets that > were close to each other, so that they should result in addresses with > 12 bit offsets. But, to my disappointment, LSR did not handle these > different chains by generating new PHI-nodes for each of them (or by > skipping those that ended up with just one store in the chain), but > instead it still output the stores in the same way as before. > > I also see that LSR is thinking in terms of increments between the > memory accesses. In the loop I am working with it's disappointing to > see that before each memory access, the base address is loaded into > register, and then the offset is added, and then the access, which is > 3 instructions. It should have been just an add/sub after the previous > access before the memory access, per LSRs intentions. I wonder where > this is supposed to be handled: In some sort of target pre-isel pass > that chains the GEPs? Or is this just folded more often on other targets?As I recall, it does not do this now (although this is also needed for handling pre/post-inc addressing modes properly). -Hal> > /Jonas >-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory
Hi Hal,> > No, LSR won't add new PHIs. This is a long-standing deficiency (and > also, in part, prevents it from properly handling pre/post-inc > addressing modes, which motives some target-specific passes such as > lib/Target/PowerPC/PPCLoopPreIncPrep.cpp). >I do find in this regression that gcc manages to use four address registers, while llvm uses just one, with a lot of extra address building instructions as a result, where the gcc loop looks very clean. I suspect that this could be a needed feature. Would you recommend doing this (splitting PHIs) as a "LSRPrep" pass, perhaps in the target, or would you try to extend LSR itself?>> >> >> I also see that LSR is thinking in terms of increments between the >> memory accesses. In the loop I am working with it's disappointing to >> see that before each memory access, the base address is loaded into >> register, and then the offset is added, and then the access, which is >> 3 instructions. It should have been just an add/sub after the >> previous access before the memory access, per LSRs intentions. I >> wonder where this is supposed to be handled: In some sort of target >> pre-isel pass that chains the GEPs? Or is this just folded more often >> on other targets? > > As I recall, it does not do this now (although this is also needed for > handling pre/post-inc addressing modes properly). >Same question - It might be simpler to do a separate post-LSR GEP handling (in CodeGenPrepare, perhaps?), but I suspect it would also be possible to extend LSR to do this instead? /Jonas