Our frontend can guarantee that loads from globals are rematerializable and do not alias with any stores in any function in the given module. We'd like the optimization passes (and ideally the register allocator as well) to be able to use this fact. The globals are not constant "forever" but are constant during the calling of any given function in the module. There seem to be two major ways to expose this to the optimization passes and code gen: - build a custom alias analysis pass that indicates that these loads never alias with any stores in a given function - declare these globals as external constants within the module The former should give optimizations like LICM the freedom to move these loads around, allow them to be CSE'd, etc. The latter should technically allow the same freedom to these optimizations, but doesn't currently seem to. Furthermore, the latter should give the RA enough information to rematerialize these loads instead of spilling them if necessary. Below is a simple example module that illustrates this. It's just a memcpy loop copying between two external arrays. With unmodified TOT, opt -basicaa -licm for example will not move the invariant loads of @b and @a (into %tmp3 and %tmp5) out of the body of the for loop. If I apply the patch found further down, LICM moves the loads out (as expected), but of course this is a fairly specific fix. What's the right way to handle this? Should Basic AA handle this case? Will the RA be aware that it can remat these loads or do I need to do something else to allow it to know this? Will the scheduler be aware that it can reorder them? Obviously I can also move the loads to the entry block of the function, but that does not address the RA/scheduling issues and is difficult to do in general due to some additional semantics in our frontend. Thanks! Stefanus === Example ==@b = external constant float* @a = external constant float* define void @test(i32 %count) { entry: br label %forcond forcond: ; preds = %forinc, %entry %i.0 = phi i32 [ 0, %entry ], [ %inc, %forinc ] ; <i32> [#uses=4] %cmp = icmp ult i32 %i.0, %count ; <i1> [#uses=1] br i1 %cmp, label %forbody, label %afterfor forbody: ; preds = %forcond %tmp3 = load float** @b ; <float*> [#uses=1] %arrayidx = getelementptr float* %tmp3, i32 %i. 0 ; <float*> [#uses=1] %tmp5 = load float** @a ; <float*> [#uses=1] %arrayidx6 = getelementptr float* %tmp5, i32 %i. 0 ; <float*> [#uses=1] %tmp7 = load float* %arrayidx6 ; <float> [#uses=1] store float %tmp7, float* %arrayidx br label %forinc forinc: ; preds = %forbody %inc = add i32 %i.0, 1 ; <i32> [#uses=1] br label %forcond afterfor: ; preds = %forcond ret void } === Patch ==Index: lib/Transforms/Scalar/LICM.cpp ==================================================================--- lib/Transforms/Scalar/LICM.cpp (revision 53694) +++ lib/Transforms/Scalar/LICM.cpp (working copy) @@ -36,6 +36,7 @@ #include "llvm/Constants.h" #include "llvm/DerivedTypes.h" #include "llvm/Instructions.h" +#include "llvm/GlobalVariable.h" #include "llvm/Target/TargetData.h" #include "llvm/Analysis/LoopInfo.h" #include "llvm/Analysis/LoopPass.h" @@ -370,6 +371,12 @@ if (LI->isVolatile()) return false; // Don't hoist volatile loads! + if (GlobalVariable* gv = dyn_cast<GlobalVariable>(LI- >getOperand(0))) { + if (gv->isConstant()) { + return true; + } + } + // Don't hoist loads which have may-aliased stores in loop. unsigned Size = 0; if (LI->getType()->isSized()) -- Stefanus Du Toit <stefanus.dutoit at rapidmind.com> RapidMind Inc. phone: +1 519 885 5455 x116 -- fax: +1 519 885 1463
On Mon, Jul 21, 2008 at 3:51 PM, Stefanus Du Toit <sdt at rapidmind.com> wrote:> The latter should technically allow the same freedom to these > optimizations, but doesn't currently seem to. Furthermore, the latter > should give the RA enough information to rematerialize these loads > instead of spilling them if necessary.Yes... the issue is roughly that alias analysis works between addresses, not instructions, so it can't tell that a store can't alias a load from a constant global.> If I apply the patch found further down, LICM moves the loads out (as > expected), but of course this is a fairly specific fix.Yes... the attached patch looks correct.> What's the right way to handle this? Should Basic AA handle this case? > Will the RA be aware that it can remat these loads or do I need to do > something else to allow it to know this? Will the scheduler be aware > that it can reorder them?The best way to go about this is probably case-by-case, at least at the moment... the const-ness of a global provides information that isn't otherwise possible to express. Remat can probably be extended to be aware of const-ness when remat-ing loads. This will probably be fixed properly once LLVM gets type-based alias information. -Eli
On Jul 21, 2008, at 3:51 PM, Stefanus Du Toit wrote:> Our frontend can guarantee that loads from globals are > rematerializable and do not alias with any stores in any function in > the given module. We'd like the optimization passes (and ideally the > register allocator as well) to be able to use this fact. The globals > are not constant "forever" but are constant during the calling of any > given function in the module. > > There seem to be two major ways to expose this to the optimization > passes and code gen: > - build a custom alias analysis pass that indicates that these loads > never alias with any stores in a given function > - declare these globals as external constants within the moduleIf you can convince yourself that no interprocedural optimization will ever get in trouble, the second approach here sounds reasonable and simpler. But if the values aren't really constant, it may be difficult to be sure. Building a custom alias analysis is reasonable too.> > The former should give optimizations like LICM the freedom to move > these loads around, allow them to be CSE'd, etc. > > The latter should technically allow the same freedom to these > optimizations, but doesn't currently seem to. Furthermore, the latter > should give the RA enough information to rematerialize these loads > instead of spilling them if necessary. > > Below is a simple example module that illustrates this. It's just a > memcpy loop copying between two external arrays. With unmodified TOT, > opt -basicaa -licm for example will not move the invariant loads of @b > and @a (into %tmp3 and %tmp5) out of the body of the for loop.Good catch! One way to fix this would be to have AliasSetTracker pretend that pointers to constant memory never alias anything. That's a little sneaky though, so offhand I think an approach such as what's in your patch is better.> If I apply the patch found further down, LICM moves the loads out (as > expected), but of course this is a fairly specific fix.Slightly better than checking for GlobalVariabls directly is to call the AliasAnalysis' pointsToConstantMemory method. BasicAliasAnalysis' implementation of that does exactly the same thing, checking for constant GlobalVariables, but it would allow alias analyses to do more sophisticated things. Could you submit a patch for this?> What's the right way to handle this? Should Basic AA handle this case? > Will the RA be aware that it can remat these loads or do I need to do > something else to allow it to know this? Will the scheduler be aware > that it can reorder them?It would be nice to have an AA that's smart enough to do things like this. However for now, having code use AliasAnalysis::pointsToConstantMemory should cover many of the obvious cases.> > Obviously I can also move the loads to the entry block of the > function, but that does not address the RA/scheduling issues and is > difficult to do in general due to some additional semantics in our > frontend.In the scheduling department, LLVM is not yet using any alias information. You can experiment with the -combiner-alias-analysis and -combiner-global-alias-analysis options, which use AliasAnalysis queries and do a pretty good job, but aren't very efficient and not very widely tested. Ideally we'd like to do something better here. For register allocation, LLVM currently has some simple hooks which individual targets use to specify which loads are rematerializable. See isReallyTriviallyReMaterializable. Currently this code is all target-specific and doesn't use AliasAnalysis information, but I think it could be reasonably generalized to use the new MachineMemOperand information to be less target-dependent and to make at least AliasAnalysis::pointsToConstantMemory queries. Dan
Hi,> One way to fix this would be to have AliasSetTracker pretend that > pointers to constant memory never alias anything. That's a little > sneaky though, ...on the other hand it is simple and (presumably) effective. Do you think it really could cause trouble? Ciao, Duncan.
On 22-Jul-08, at 1:22 PM, Dan Gohman wrote:> On Jul 21, 2008, at 3:51 PM, Stefanus Du Toit wrote: >> - build a custom alias analysis pass that indicates that these loads >> never alias with any stores in a given function >> - declare these globals as external constants within the module > > If you can convince yourself that no interprocedural optimization > will ever get in trouble, the second approach here sounds reasonable > and simpler. But if the values aren't really constant, it may be > difficult to be sure.Yes, we definitely can be sure.> Building a custom alias analysis is reasonable too.Right, and is helpful for other reasons.> One way to fix this would be to have AliasSetTracker pretend that > pointers to constant memory never alias anything. That's a little > sneaky though, so offhand I think an approach such as what's in > your patch is better.That does seem a little... unethical? :) At the very least it seems this should still return must-alias results correctly, otherwise this could hurt optimizations (I assume). I was going to suggest that at least getModRefInfo should handle this for stores, but it looks like it already does.>> If I apply the patch found further down, LICM moves the loads out (as >> expected), but of course this is a fairly specific fix. > > Slightly better than checking for GlobalVariabls directly > is to call the AliasAnalysis' pointsToConstantMemory method. > BasicAliasAnalysis' implementation of that does exactly the same > thing, > checking for constant GlobalVariables, but it would allow alias > analyses to do more sophisticated things. Could you submit a patch > for this?Ah, I was going to ask if there was such a method. I'll submit a patch once I'm done testing it.>> What's the right way to handle this? Should Basic AA handle this >> case? >> Will the RA be aware that it can remat these loads or do I need to do >> something else to allow it to know this? Will the scheduler be aware >> that it can reorder them? > > It would be nice to have an AA that's smart enough to do things > like this. However for now, having code use > AliasAnalysis::pointsToConstantMemory should cover many of the > obvious cases.OK, are there any other obvious places this should go? I expect optimizations that use getModRefInfo on stores rather than alias sets won't need any changes.>> Obviously I can also move the loads to the entry block of the >> function, but that does not address the RA/scheduling issues and is >> difficult to do in general due to some additional semantics in our >> frontend. > > In the scheduling department, LLVM is not yet using any alias > information. You can experiment with the -combiner-alias-analysis and > -combiner-global-alias-analysis options, which use AliasAnalysis > queries and do a pretty good job, but aren't very efficient and not > very widely tested. Ideally we'd like to do something better here.Could you expand on this a bit (or point me to past discussions/...)? This can be pretty important on architectures like Cell SPU.> For register allocation, LLVM currently has some simple hooks which > individual targets use to specify which loads are rematerializable. > See isReallyTriviallyReMaterializable. Currently this code is all > target-specific and doesn't use AliasAnalysis information, but I > think it could be reasonably generalized to use the new > MachineMemOperand information to be less target-dependent and to > make at least AliasAnalysis::pointsToConstantMemory queries.OK, I will have a look. I assume the reference to M_REMATERIALIZABLE in the comment for it should really be TID::Rematerializable? I also noticed that the documentation for TargetInstrDesc::isRematerializable() says "This flag is deprecated, please don't use it anymore" -- could you explain what replaces it? Stefanus -- Stefanus Du Toit <stefanus.dutoit at rapidmind.com> RapidMind Inc. phone: +1 519 885 5455 x116 -- fax: +1 519 885 1463
On Tue, Jul 22, 2008 at 10:22 AM, Dan Gohman <gohman at apple.com> wrote:> Slightly better than checking for GlobalVariabls directly > is to call the AliasAnalysis' pointsToConstantMemory method.Ah, I didn't realize there was such a method; that's definitely the way to go. -Eli
On Jul 21, 2008, at 3:51 PM, Stefanus Du Toit wrote:> There seem to be two major ways to expose this to the optimization > passes and code gen: > - build a custom alias analysis pass that indicates that these loads > never alias with any stores in a given function > - declare these globals as external constants within the moduleSince the second has already been examined down-thread, I thought I'd talk about the first one. Instead of implementing a whole new alias analysis, you can build on the lib/Analysis/LibCallAliasAnalysis.cpp/LibCallSemantics.cpp stuff. This provides a slightly higher level API to define that "objects" exist (with custom predicates to determine whether a pointer provably points to the 'object') and then define their relations to standard functions that can't be marked readnone. This is mostly useful for things like libm calls, which don't modify anything except errno, and errno has a very weird and target-specific description. If your language has similar things, it could be useful for that. -Chris