On 22-Jul-08, at 1:22 PM, Dan Gohman wrote:> On Jul 21, 2008, at 3:51 PM, Stefanus Du Toit wrote: >> - build a custom alias analysis pass that indicates that these loads >> never alias with any stores in a given function >> - declare these globals as external constants within the module > > If you can convince yourself that no interprocedural optimization > will ever get in trouble, the second approach here sounds reasonable > and simpler. But if the values aren't really constant, it may be > difficult to be sure.Yes, we definitely can be sure.> Building a custom alias analysis is reasonable too.Right, and is helpful for other reasons.> One way to fix this would be to have AliasSetTracker pretend that > pointers to constant memory never alias anything. That's a little > sneaky though, so offhand I think an approach such as what's in > your patch is better.That does seem a little... unethical? :) At the very least it seems this should still return must-alias results correctly, otherwise this could hurt optimizations (I assume). I was going to suggest that at least getModRefInfo should handle this for stores, but it looks like it already does.>> If I apply the patch found further down, LICM moves the loads out (as >> expected), but of course this is a fairly specific fix. > > Slightly better than checking for GlobalVariabls directly > is to call the AliasAnalysis' pointsToConstantMemory method. > BasicAliasAnalysis' implementation of that does exactly the same > thing, > checking for constant GlobalVariables, but it would allow alias > analyses to do more sophisticated things. Could you submit a patch > for this?Ah, I was going to ask if there was such a method. I'll submit a patch once I'm done testing it.>> What's the right way to handle this? Should Basic AA handle this >> case? >> Will the RA be aware that it can remat these loads or do I need to do >> something else to allow it to know this? Will the scheduler be aware >> that it can reorder them? > > It would be nice to have an AA that's smart enough to do things > like this. However for now, having code use > AliasAnalysis::pointsToConstantMemory should cover many of the > obvious cases.OK, are there any other obvious places this should go? I expect optimizations that use getModRefInfo on stores rather than alias sets won't need any changes.>> Obviously I can also move the loads to the entry block of the >> function, but that does not address the RA/scheduling issues and is >> difficult to do in general due to some additional semantics in our >> frontend. > > In the scheduling department, LLVM is not yet using any alias > information. You can experiment with the -combiner-alias-analysis and > -combiner-global-alias-analysis options, which use AliasAnalysis > queries and do a pretty good job, but aren't very efficient and not > very widely tested. Ideally we'd like to do something better here.Could you expand on this a bit (or point me to past discussions/...)? This can be pretty important on architectures like Cell SPU.> For register allocation, LLVM currently has some simple hooks which > individual targets use to specify which loads are rematerializable. > See isReallyTriviallyReMaterializable. Currently this code is all > target-specific and doesn't use AliasAnalysis information, but I > think it could be reasonably generalized to use the new > MachineMemOperand information to be less target-dependent and to > make at least AliasAnalysis::pointsToConstantMemory queries.OK, I will have a look. I assume the reference to M_REMATERIALIZABLE in the comment for it should really be TID::Rematerializable? I also noticed that the documentation for TargetInstrDesc::isRematerializable() says "This flag is deprecated, please don't use it anymore" -- could you explain what replaces it? Stefanus -- Stefanus Du Toit <stefanus.dutoit at rapidmind.com> RapidMind Inc. phone: +1 519 885 5455 x116 -- fax: +1 519 885 1463
On Jul 22, 2008, at 2:51 PM, Stefanus Du Toit wrote:> \ > >> For register allocation, LLVM currently has some simple hooks which >> individual targets use to specify which loads are rematerializable. >> See isReallyTriviallyReMaterializable. Currently this code is all >> target-specific and doesn't use AliasAnalysis information, but I >> think it could be reasonably generalized to use the new >> MachineMemOperand information to be less target-dependent and to >> make at least AliasAnalysis::pointsToConstantMemory queries. > > OK, I will have a look. I assume the reference to M_REMATERIALIZABLE > in the comment for it should really be TID::Rematerializable? I also > noticed that the documentation for > TargetInstrDesc::isRematerializable() says "This flag is deprecated, > please don't use it anymore" -- could you explain what replaces it?It's not *yet* deprecated. The intent is to replace it with hasSideEffects, mayHaveSideEffects, neverHasSideEffects, and more analysis. It hasn't happened yet. Evan> > > Stefanus > > -- > Stefanus Du Toit <stefanus.dutoit at rapidmind.com> > RapidMind Inc. > phone: +1 519 885 5455 x116 -- fax: +1 519 885 1463 > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Jul 22, 2008, at 2:51 PM, Stefanus Du Toit wrote:> On 22-Jul-08, at 1:22 PM, Dan Gohman wrote: >>> If I apply the patch found further down, LICM moves the loads out >>> (as >>> expected), but of course this is a fairly specific fix. >> >> Slightly better than checking for GlobalVariabls directly >> is to call the AliasAnalysis' pointsToConstantMemory method. >> BasicAliasAnalysis' implementation of that does exactly the same >> thing, >> checking for constant GlobalVariables, but it would allow alias >> analyses to do more sophisticated things. Could you submit a patch >> for this? > > Ah, I was going to ask if there was such a method. > > I'll submit a patch once I'm done testing it.Thanks!> > >>> What's the right way to handle this? Should Basic AA handle this >>> case? >>> Will the RA be aware that it can remat these loads or do I need to >>> do >>> something else to allow it to know this? Will the scheduler be aware >>> that it can reorder them? >> >> It would be nice to have an AA that's smart enough to do things >> like this. However for now, having code use >> AliasAnalysis::pointsToConstantMemory should cover many of the >> obvious cases. > > OK, are there any other obvious places this should go? I expect > optimizations that use getModRefInfo on stores rather than alias sets > won't need any changes.At a glance, it looks like DeadStoreElimination and MemoryDependenceAnalysis could also use this.> > >>> Obviously I can also move the loads to the entry block of the >>> function, but that does not address the RA/scheduling issues and is >>> difficult to do in general due to some additional semantics in our >>> frontend. >> >> In the scheduling department, LLVM is not yet using any alias >> information. You can experiment with the -combiner-alias-analysis and >> -combiner-global-alias-analysis options, which use AliasAnalysis >> queries and do a pretty good job, but aren't very efficient and not >> very widely tested. Ideally we'd like to do something better here. > > Could you expand on this a bit (or point me to past discussions/...)?The codegen dependency graphs and the scheduler are capable of working with precise memory dependencies, but right now they're being given conservative dependencies. Take a look at the -view-sunit-dags graphs on code with lots of simple loads and stores, and you'll often see tall chains of dashed blue lines. That's what we'd like to avoid :-). This can be fixed by either teaching SelectionDAGISel.cpp how to use AliasAnalysis information to compute better dependencies when building the graph, or by teaching DAGCombiner.cpp how to use AliasAnalysis information to optimize the dependencies. The options I mentioned above take the latter approach, though they probably have room for improvement. One other factor here is that some of the transformations that LoopStrengthReduction and CodeGenPrepare do lower getelementptrs to ptrtoint+arithmetic+inttoptr, and this interferes with BasicAliasAnalysis. There's been some discussion of this on llvmdev, but I don't have a link handy.> This can be pretty important on architectures like Cell SPU.I agree that it's an important capability. We're gradually working on adding the requisite pieces, and we welcome help :-).> > >> For register allocation, LLVM currently has some simple hooks which >> individual targets use to specify which loads are rematerializable. >> See isReallyTriviallyReMaterializable. Currently this code is all >> target-specific and doesn't use AliasAnalysis information, but I >> think it could be reasonably generalized to use the new >> MachineMemOperand information to be less target-dependent and to >> make at least AliasAnalysis::pointsToConstantMemory queries. > > OK, I will have a look. I assume the reference to M_REMATERIALIZABLE > in the comment for it should really be TID::Rematerializable? I also > noticed that the documentation for > TargetInstrDesc::isRematerializable() says "This flag is deprecated, > please don't use it anymore" -- could you explain what replaces it?I think the idea is to move towards having more complete instruction descriptions so that the rematerializability of an instruction can be inferred instead of being an explicit flag, but I don't know what stage this transition is in. Dan
Hello Stefanus, Your LICM patch is fine on its own, but it turns out that since CodeGen can't always rematerialize loads of GlobalVariable constants, it causes some performance regressions. I've disabled this feature by default for now. You can enable it with the -enable-licm-constant-variables option. I didn't want to do this :-), so I made an attempt to fix CodeGen by adding some basic remat improvements. They help out in some cases, however it turns out that they're not sufficient in the face of PIC-mode complications. It looks this won't be easily fixable until after some of the major infrastructure projects are done. Along the way, I noticed that AliasAnalsis::pointsToConstantMemory could be used when building SelectionDAGs, which allows us to freely schedule loads from constant global variables. The larger problem of using alias-analysis to improve scheduling is still open, but this change does address the specific concern you originally asked about :-). Dan