Liu Xin
2013-Oct-30 09:04 UTC
[LLVMdev] Is there pass to break down <4 x float> to scalars
Hi Richard,

Your decompose-vectors patch works perfectly on my side. Unfortunately, I still get poor code because LLVM's '-dse' fails when it runs after '-decompose-vectors'. I read the DSE code, and it is definitely capable of eliminating unused memory stores if its alias analysis works. I don't think basic AA works for me: my program has complex memory accesses, such as two-dimensional arrays.

Sorry, I am not good at AA. My understanding is that TBAA is just for C++. Do you mean that you can make use of TBAA to help DSE? Why does TBAA give me nothing at all for my program? basicaa does even better than -tbaa.

liuxin at rd58:~/testbed$ opt -tbaa -aa-eval -decompose-vectors -mem2reg -dse test.bc -debug-pass=Structure -o test.opt.bc -stats
Pass Arguments: -targetlibinfo -no-aa -tbaa -aa-eval -decompose-vectors -domtree -mem2reg -memdep -dse -preverify -verify
Target Library Information
No Alias Analysis (always returns 'may' alias)
Type-Based Alias Analysis
ModulePass Manager
FunctionPass Manager
Exhaustive Alias Analysis Precision Evaluator
Decompose vector operations into smaller pieces
Dominator Tree Construction
Promote Memory to Register
Memory Dependence Analysis
Dead Store Elimination
Preliminary module verification
Module Verifier
Bitcode Writer
===== Alias Analysis Evaluator Report =====
1176 Total Alias Queries Performed
0 no alias responses (0.0%)
1176 may alias responses (100.0%)
0 partial alias responses (0.0%)
0 must alias responses (0.0%)
Alias Analysis Evaluator Pointer Alias Summary: 0%/100%/0%/0%
49 Total ModRef Queries Performed
0 no mod/ref responses (0.0%)
0 mod responses (0.0%)
0 ref responses (0.0%)
49 mod & ref responses (100.0%)
Alias Analysis Evaluator Mod/Ref Summary: 0%/0%/0%/100%

Our C/C++ compiler uses Steensgaard's points-to algorithm, so I went looking for -steens-aa. It seems that LLVM's poolalloc implements steens-aa, right? Is it still maintained? I found I cannot build rDSA against the latest LLVM headers.
Thanks,
--lx

On Fri, Oct 25, 2013 at 10:19 PM, Richard Sandiford <rsandifo at linux.vnet.ibm.com> wrote:
> Liu Xin <navy.xliu at gmail.com> writes:
> > I think we are solving the same problem. I am working on a shader language too. I am not satisfied with current binaries because vector operations are kept in llvm opt.
> >
> > The GLSL shading language has an operation called "swizzle". It can select sub-components of a vector. If a shader only takes components "xy" of a vec4, it's certainly wasteful to generate 4 operations for a scalar processor.
> >
> > I think a good solution for llvm is in codegen. Many compilers have a codegen optimizer. A DSE is good enough.
> >
> > Which posted patch about TBAA? Do you have yet another solution besides decompose-vectors?
>
> Ah, no, the TBAA thing is separate really. llvmpipe generally operates on 4 rows at a time, so some functions end up with patterns like:
>
> load <16 x i8> row0 ...
> load <16 x i8> row1 ...
> load <16 x i8> row2 ...
> load <16 x i8> row3 ...
> ... do stuff ...
> store <16 x i8> row0 ...
> store <16 x i8> row1 ...
> store <16 x i8> row2 ...
> store <16 x i8> row3 ...
>
> Since the row stride is variable, llvm doesn't have enough information to tell that these rows don't alias. So it has to keep the loads and stores in order. And z only has 16 general registers, so a naively-scalarised 16 x i8 operation rapidly runs out. With unmodified llvmpipe IR we get lots of spills.
>
> Since z also has x86-like register-memory operations, a few spills are usually OK. But in this case we have to load i8s and immediately spill them.
>
> So the idea was to add TBAA information to the llvmpipe IR to say that the rows don't alias. (At the moment I'm only doing that by hand on saved IR; I've not done it in llvmpipe itself yet.) Combined with -combiner-alias-analysis -combiner-global-alias-analysis, this allows the loads and stores to be reordered, which gives much better code.
>
> However, the problem at the moment is that there are other scalar loads that get rewritten by DAGCombiner and the legalisation code, and in the process lose their TBAA info. This then interferes with the optimisation above. So I wanted to make sure that the TBAA information is kept around:
>
> http://llvm-reviews.chandlerc.com/D1894
>
> It was just that if I had a choice of only getting one of the two patches in, it'd definitely be the D1894 one. It sounds like there's more interest in the DecomposeVectors patch than I'd expected though, so I'll get back to it.
>
> Maybe as a first cut we can have a TargetTransformInfo hook to enable or disable the pass wholesale, with a command-line option to override it.
>
> Thanks to you and Renato for the feedback.
>
> Richard
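Liu's point that DSE is only as good as the alias answers it receives can be shown with a toy model. This is a minimal sketch under stated assumptions, not LLVM's DSE or its AliasAnalysis interface: a single straight-line block of whole-location loads and stores on named pointers, where the same name means the same address and size, plus a pluggable may-alias oracle. The names `Inst`, `MayAlias` and `deadStores` are hypothetical. With an oracle that always answers "may alias" (which is what the -aa-eval report above shows for every query), any intervening load keeps an earlier store alive.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy straight-line block: every instruction loads or stores a whole
// location named by a symbolic pointer. Hypothetical model, not LLVM IR.
struct Inst {
  enum Kind { Load, Store } kind;
  std::string ptr;  // assumption: same name => same address and size
};

// Alias oracle: true if the two pointers MAY refer to overlapping memory.
using MayAlias = std::function<bool(const std::string&, const std::string&)>;

// Returns the indices of stores that are overwritten by a later store to the
// same pointer before any load that may read them (i.e. dead stores).
std::vector<std::size_t> deadStores(const std::vector<Inst>& bb,
                                    const MayAlias& mayAlias) {
  std::vector<std::size_t> dead;
  for (std::size_t i = 0; i < bb.size(); ++i) {
    if (bb[i].kind != Inst::Store) continue;
    for (std::size_t j = i + 1; j < bb.size(); ++j) {
      if (bb[j].kind == Inst::Load && mayAlias(bb[j].ptr, bb[i].ptr))
        break;  // the stored value might be read, so store i stays
      if (bb[j].kind == Inst::Store && bb[j].ptr == bb[i].ptr) {
        dead.push_back(i);  // fully overwritten before any possible read
        break;
      }
    }
  }
  return dead;
}
```

For the block `store p; load q; store p`, an always-"may" oracle yields no dead stores, while an oracle that knows `p` and `q` are distinct reports the first store (index 0) as dead. Stores to other pointers are skipped rather than treated as kills, which keeps the sketch conservative.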
Richard Sandiford
2013-Oct-30 12:13 UTC
Liu Xin <navy.xliu at gmail.com> writes:
> Your decompose-vectors patch works perfectly on my side. Unfortunately, I still get poor code because LLVM's '-dse' fails when it runs after '-decompose-vectors'.
> I read the DSE code, and it is definitely capable of eliminating unused memory stores if its alias analysis works. I don't think basic AA works for me: my program has complex memory accesses, such as two-dimensional arrays.
>
> Sorry, I am not good at AA. My understanding is that TBAA is just for C++.

Well, as I understand it, what's called TBAA in LLVM is mostly an alias set hierarchy. You can use the same infrastructure for any situation in which you know that two accesses can't overlap, even if that doesn't really map to "types" in the language sense. So it's more than just C++ (or other languages, as LangRef.rst says).

In my case, I'm using TBAA for IR generated by llvmpipe. The information I'm adding isn't really related to the C types in llvmpipe (or gallium/mesa generally). It just says that two accesses can't overlap because they refer to different arrays, or different rows/slices of the same array.

> Do you mean that you can make use of TBAA to help DSE?

The main reason I wanted TBAA is to help scheduling. None of the accesses are dead, but I want to be able to interleave them to reduce register pressure.

> Why does TBAA give me nothing at all for my program? basicaa does even better than -tbaa.
>
> liuxin at rd58:~/testbed$ opt -tbaa -aa-eval -decompose-vectors -mem2reg -dse test.bc -debug-pass=Structure -o test.opt.bc -stats
> Pass Arguments: -targetlibinfo -no-aa -tbaa -aa-eval -decompose-vectors -domtree -mem2reg -memdep -dse -preverify -verify
> Target Library Information
> No Alias Analysis (always returns 'may' alias)
> Type-Based Alias Analysis
> ModulePass Manager
> FunctionPass Manager
> Exhaustive Alias Analysis Precision Evaluator
> Decompose vector operations into smaller pieces
> Dominator Tree Construction
> Promote Memory to Register
> Memory Dependence Analysis
> Dead Store Elimination
> Preliminary module verification
> Module Verifier
> Bitcode Writer
> ===== Alias Analysis Evaluator Report =====
> 1176 Total Alias Queries Performed
> 0 no alias responses (0.0%)
> 1176 may alias responses (100.0%)
> 0 partial alias responses (0.0%)
> 0 must alias responses (0.0%)
> Alias Analysis Evaluator Pointer Alias Summary: 0%/100%/0%/0%
> 49 Total ModRef Queries Performed
> 0 no mod/ref responses (0.0%)
> 0 mod responses (0.0%)
> 0 ref responses (0.0%)
> 49 mod & ref responses (100.0%)
> Alias Analysis Evaluator Mod/Ref Summary: 0%/0%/0%/100%
>
> Our C/C++ compiler uses Steensgaard's points-to algorithm, so I went looking for -steens-aa. It seems that LLVM's poolalloc implements steens-aa, right? Is it still maintained?
> I found I cannot build rDSA against the latest LLVM headers.

Sorry, I don't really know this part of LLVM, so I'm not sure what to suggest. Hopefully someone else will comment.

Thanks,
Richard
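Richard's description of TBAA as an alias set hierarchy can be modelled roughly as below. This is a simplified sketch of the original scalar TBAA rule as I understand it, not LLVM's implementation (newer releases use a struct-path TBAA format with different rules): two tagged accesses in the same tree may alias only if one tag is an ancestor of, or equal to, the other; untagged accesses and tags from unrelated trees get the conservative "may alias" answer. The class `TagTree` and the tag names `llvmpipe`, `row0`, `row1` are hypothetical. The untagged case also suggests a likely reason for the 100% may-alias report above: -tbaa can presumably only answer no-alias for accesses that carry !tbaa metadata, and that pipeline had no -basicaa, so the remaining queries fell through to -no-aa.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified model of the original scalar TBAA rule: tags form trees, each
// tag names its parent, and the empty string means "no tag" / "no parent".
// Hypothetical tag names; not LLVM's implementation or metadata format.
// Tags must be added before they are queried (map::at throws otherwise).
class TagTree {
 public:
  void addRoot(const std::string& tag) { parent_[tag] = ""; }
  void add(const std::string& tag, const std::string& parent) { parent_[tag] = parent; }

  // May two accesses carrying tags a and b alias?
  bool mayAlias(const std::string& a, const std::string& b) const {
    if (a.empty() || b.empty()) return true;  // untagged: no information
    if (root(a) != root(b)) return true;      // unrelated trees: conservative
    // Same tree: alias only if one tag is an ancestor of (or equal to) the other.
    return isAncestorOrSelf(a, b) || isAncestorOrSelf(b, a);
  }

 private:
  std::map<std::string, std::string> parent_;  // tag -> parent ("" for a root)

  std::string root(std::string tag) const {
    while (!parent_.at(tag).empty()) tag = parent_.at(tag);
    return tag;
  }
  bool isAncestorOrSelf(const std::string& anc, std::string tag) const {
    while (true) {
      if (tag == anc) return true;
      const std::string& p = parent_.at(tag);
      if (p.empty()) return false;
      tag = p;
    }
  }
};
```

For example, with `row0` and `row1` registered as children of a root `llvmpipe`, `mayAlias("row0", "row1")` is false, which is the "different rows of the same array" fact Richard describes, while `mayAlias("row0", "")` stays true because an untagged access carries no information.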
Liu Xin
2013-Oct-31 14:56 UTC
Richard,

Thank you. Building a points-to analysis is non-trivial. I will look into this further. Thank you for the suggestion!

--lx
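Since Liu's compiler already uses Steensgaard's algorithm, and building a points-to analysis is non-trivial, it may help to see that the core of Steensgaard's analysis (unification over a union-find structure, where each equivalence class points to at most one other class) fits in a short sketch. This is a standalone, flow- and field-insensitive illustration covering four kinds of pointer assignment; the class `Steensgaard` and its methods `addrOf`, `copy`, `load`, `store` and `mayAlias` are hypothetical names, and it is neither poolalloc's implementation nor an LLVM AliasAnalysis pass. Calls, heap allocation sites and struct fields are ignored.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal, flow- and field-insensitive sketch of Steensgaard's points-to
// analysis. Every variable is a node; union-find merges nodes into classes,
// and each class points to at most one other class (pts_ on the root).
class Steensgaard {
 public:
  void addrOf(const std::string& p, const std::string& x) {  // p = &x
    join(pointee(id(p)), id(x));
  }
  void copy(const std::string& p, const std::string& q) {  // p = q
    join(pointee(id(p)), pointee(id(q)));
  }
  void load(const std::string& p, const std::string& q) {  // p = *q
    join(pointee(id(p)), pointee(pointee(id(q))));
  }
  void store(const std::string& p, const std::string& q) {  // *p = q
    join(pointee(pointee(id(p))), pointee(id(q)));
  }
  // p and q may alias if their targets were unified into one class.
  bool mayAlias(const std::string& p, const std::string& q) {
    return find(pointee(id(p))) == find(pointee(id(q)));
  }

 private:
  std::unordered_map<std::string, int> ids_;
  std::vector<int> parent_;  // union-find parent links
  std::vector<int> pts_;     // pointee node of a class root, or -1

  int newNode() {
    parent_.push_back(static_cast<int>(parent_.size()));
    pts_.push_back(-1);
    return static_cast<int>(parent_.size()) - 1;
  }
  int id(const std::string& v) {
    auto it = ids_.find(v);
    if (it != ids_.end()) return it->second;
    int n = newNode();
    ids_.emplace(v, n);
    return n;
  }
  int find(int n) {
    while (parent_[n] != n) {
      parent_[n] = parent_[parent_[n]];  // path halving
      n = parent_[n];
    }
    return n;
  }
  // Pointee of n's class, created on demand as a fresh node.
  int pointee(int n) {
    int r = find(n);
    if (pts_[r] < 0) {
      int fresh = newNode();
      pts_[r] = fresh;
    }
    return pts_[r];
  }
  // Unify two classes; if both point somewhere, unify their pointees too.
  void join(int a, int b) {
    a = find(a);
    b = find(b);
    if (a == b) return;
    int pa = pts_[a], pb = pts_[b];
    parent_[b] = a;
    if (pa < 0) pts_[a] = pb;
    else if (pb >= 0) join(pa, pb);
  }
};
```

With `p = &a; q = &b; r = p;` the sketch keeps `p` and `q` apart, but a further `r = q` unifies the targets of `r` and `q`, and `r` already shares its target class with `p`, so `a` and `b` collapse into one class and `p` and `q` are reported as possibly aliasing even though they never point to the same variable. That over-approximation is the price Steensgaard's analysis pays for its near-linear running time.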