Hi Sundeep,

I am also interested in the load-store lifting transformation.

For static globals, as in your example, the transformation in general
would rely on better static global aliasing information, which is
currently in review: http://reviews.llvm.org/D10059

For non-static globals, one problem with loop-based analysis alone is
that in a popular embedded benchmark suite, you get serious gains if
you can localize the globals in code like

    int *G1;
    int *G2;

    foo () {
      G1 = malloc(...);
      for (...) {
        // Lots of stuff with G1 and G2 worth localizing
      }
    }

That malloc is very important to consider when doing an alias query,
because now the aliasing infrastructure knows G1 and G2 don't alias, and
you won't see it from a loop pass. If you were to try to modify LICM
to localize the globals, for example, it would have to assume G1 and G2
MayAlias. I believe this implies we must use a FunctionPass, and I
have a prototype that catches cases like the above, as well as the
simpler ones. However, I can't commit to when that will be ready, so
I'm just sharing what I've found out; maybe it's helpful.

I'd be interested to hear your thoughts / approach in greater detail.

Thanks!
--Charlie.

On 21 July 2015 at 06:10, <sundeepk at codeaurora.org> wrote:
> typo corrected
>
> lcl_var = gbl_var;
> for () {
>   ...access lcl_var...
> }
> gbl_var = lcl_var;
>
>> Hello all,
>>
>> I am writing to get some feedback on an optimization that I would like to
>> upstream. The basic idea is to localize global variables inside loops so
>> that they can be allocated into registers. For example, transform the
>> following sequence
>>
>> static int gbl_var;
>> void foo() {
>>
>>   for () {
>>     ...access gbl_var...
>>   }
>>
>> }
>>
>> into something like
>>
>> static int gbl_var;
>> void foo() {
>>   int lcl_var;
>>
>>   lcl_var = gbl_var;
>>   for () {
>>     ...access clc_var...
>>   }
>>   gbl_var = lcl_var;
>>
>> }
>>
>> This transformation helps a couple of EEMBC benchmarks on both AArch64 and
>> Hexagon backends. I was wondering if there is interest to get this
>> optimization upstreamed or if there is a better way of doing this.
>>
>> Thanks,
>> Sundeep
>>
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>> hosted by The Linux Foundation
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
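
The non-static example above can be made concrete with a small C sketch.
This is only one possible reading of the "..." in the original snippet: the
loop body (reads through G2, writes through G1) and the function names
fill_original and fill_localized are assumptions made for illustration.
The hand-localized version shows what a compiler could emit once it has
proved that the stores through G1 cannot modify G1, G2, or the data read
through G2.

```c
#include <assert.h>
#include <stdlib.h>

int *G1; /* non-static globals, as in the example above */
int *G2;

/* Original shape: unless the compiler can prove the stores through G1 do
 * not clobber G1, G2, or the data behind G2, it has to reload the globals
 * on every iteration. */
int fill_original(int n) {
    G1 = malloc((size_t)n * sizeof *G1);
    assert(G1 != NULL);
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        G1[i] = G2[i] * 2; /* hypothetical loop body */
        sum += G1[i];
    }
    return sum;
}

/* Hand-localized shape: both pointers are read once before the loop.
 * This is valid here because G1 points at freshly allocated memory that
 * nothing else refers to yet. */
int fill_localized(int n) {
    G1 = malloc((size_t)n * sizeof *G1);
    assert(G1 != NULL);
    int *g1 = G1; /* local copies that can live in registers */
    int *g2 = G2;
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        g1[i] = g2[i] * 2;
        sum += g1[i];
    }
    return sum;
}
```

The caller is expected to point G2 at an array of at least n ints before
calling either function and to free(G1) afterwards. Both functions return
the same value for the same input, which is the property the
transformation has to preserve.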
----- Original Message -----
> From: "Charlie Turner" <charlesturner7c5 at gmail.com>
> To: sundeepk at codeaurora.org
> Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, July 21, 2015 9:22:04 AM
> Subject: Re: [LLVMdev] Loop localize global variables
>
> Hi Sundeep,
>
> I am also interested in the load-store lifting transformation.

I am as well, LICM should certainly be taught to speculatively load/store
to conditionally-accessed dereferenceable addresses when that is likely to
be profitable.

> For static globals as-in your example, the transformation in general
> would rely on a better static global aliasing information that is
> currently in review http://reviews.llvm.org/D10059

This will help, yes, but the transformation is quite useful regardless.

> For non-static globals, one problem with loop-based analysis alone is
> that in a popular embedded benchmark suite, you get serious gains if
> you can localize the globals in code like
>
> int *G1;
> int *G2;
>
> foo () {
>   G1 = malloc(...);
>   for (...) {
>     // Lots of stuff with G1 and G2 worth localizing
>   }
> }
>
> That malloc is very important to consider when doing an alias query
> because now the aliasing infrastructure knows G1, G2 don't alias, and
> you won't see it from a loop pass. If you were to try and modify LICM
> to localize the globals for example, it would have to assume G1 and
> G2 MayAlias.

Why?

 -Hal

> I believe this implies we must use a FunctionPass, and I
> have a prototype that catches cases like the above, as well as the
> simpler ones. I can't commit myself however to when that will be
> ready, so I'm just sharing what I've found out, maybe it's helpful.
>
> I'd be interested to hear your thoughts / approach in greater detail.
>
> Thanks!
> --Charlie.

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
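
Hal's point about conditionally-accessed addresses can be illustrated with
a short C sketch. It is not LICM's actual implementation; the global
`hits` and the functions count_original and count_promoted are
hypothetical names. The promoted version performs an unconditional load
before the loop and an unconditional store after it, even if the condition
never fires. That is only safe under the assumptions that the address is
known to be dereferenceable and writable, and that no other thread can
observe the introduced store.

```c
#include <assert.h>

static int hits; /* hypothetical global, written only on some iterations */

/* Original shape: the global is loaded and stored only when the condition
 * holds, so a plain "guaranteed to execute" rule would not hoist it. */
int count_original(const int *v, int n, int threshold) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        if (v[i] > threshold)
            hits += 1;
    }
    return hits;
}

/* Speculatively promoted shape: one load before the loop, one store after.
 * Assumption: `hits` is dereferenceable and accessed by a single thread,
 * so the extra load and store cannot fault or race. */
int count_promoted(const int *v, int n, int threshold) {
    assert(n >= 0);
    int local = hits; /* speculative load */
    for (int i = 0; i < n; ++i) {
        if (v[i] > threshold)
            local += 1;
    }
    hits = local; /* unconditional write-back */
    return hits;
}
```

Whether this is profitable depends on how often the condition holds: when
it rarely does, the promoted form adds a load and a store that the
original never executed.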
Hi Charlie,

My prototype only handles the static case. It's a very simple
implementation. It relies on ProcessInternalGlobal in GlobalOpt.cpp to
check the safety of the static variables that can be localized
(address-not-taken, volatile, etc.). Then I run a LoopPass that pretty
much goes through each BB of the loop and collects all static variables.
It also checks some more safety conditions (no calls, no inline asm
blocks, etc.). If the static variable is safe to localize in the loop,
the pass creates an alloca, loads from the GV and stores into the alloca
in the pre-header, loads from the alloca and stores into the GV in the
exit block, and replaces all uses of the GV in the loop with the alloca.

The malloc case you mentioned is very interesting, but I don't follow why
you need a function pass to handle this case.

Thanks,
Sundeep
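
The steps Sundeep describes, and the reason for the "no calls" check, can
be sketched at the C level. This is an illustration rather than the
prototype itself: the helper peek() and the three function names are
hypothetical, and the local variable stands in for the alloca that the
pass would create.

```c
#include <assert.h>

static int gbl_var;

/* Stands in for any call inside the loop that reads the global. */
static int peek(void) { return gbl_var; }

/* Original loop: every update goes to memory, so peek() sees it. */
int run_original(int n) {
    assert(n >= 0);
    int seen = 0;
    for (int i = 0; i < n; ++i) {
        gbl_var += i;
        seen += peek();
    }
    return seen;
}

/* Localization applied without the "no calls" check: peek() still reads
 * gbl_var from memory and therefore observes a stale value. */
int run_naive_localized(int n) {
    assert(n >= 0);
    int lcl_var = gbl_var; /* pre-header: load GV, store to local */
    int seen = 0;
    for (int i = 0; i < n; ++i) {
        lcl_var += i;      /* uses of GV replaced by the local */
        seen += peek();    /* wrong: does not see lcl_var */
    }
    gbl_var = lcl_var;     /* exit block: load local, store to GV */
    return seen;
}

/* Safe case: no calls in the loop, so nothing can observe gbl_var between
 * the pre-header load and the write-back. */
int accumulate_localized(int n) {
    assert(n >= 0);
    int lcl_var = gbl_var;
    for (int i = 0; i < n; ++i)
        lcl_var += i;
    gbl_var = lcl_var;
    return gbl_var;
}
```

Starting from gbl_var == 0, run_original and run_naive_localized leave the
same final value in gbl_var but return different results, which is why a
loop containing such a call has to be rejected.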