Star Tan
2013-Jul-26  04:01 UTC
[LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass
Hi Sebastian, Recently, I found the "Polly - Calculate dependences" pass would lead to significant compile-time overhead when compiling some loop-intensive source code. Tobias told me you found similar problem as follows: http://llvm.org/bugs/show_bug.cgi?id=14240 My evaluation shows that "Polly - Calculate dependences" pass consumes 96.4% of total compile-time overhead when compiling the program you proposed. It seems that the expensive compile-time overhead comes from those loop operations in ISL library. Some preliminary results can be seen on http://llvm.org/bugs/show_bug.cgi?id=14240. Sebastian, I wonder whether you have further results or suggestions besides those words posted on the Bugzilla page? Any information or suggestion would be appreciated. Thanks, Star Tan -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130726/79eb5206/attachment.html>
Tobias Grosser
2013-Jul-26  06:14 UTC
[LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass
On 07/25/2013 09:01 PM, Star Tan wrote:> Hi Sebastian, > > > Recently, I found the "Polly - Calculate dependences" pass would lead to significant compile-time overhead when compiling some loop-intensive source code. Tobias told me you found similar problem as follows: > http://llvm.org/bugs/show_bug.cgi?id=14240 > > > My evaluation shows that "Polly - Calculate dependences" pass consumes 96.4% of total compile-time overhead when compiling the program you proposed. It seems that the expensive compile-time overhead comes from those loop operations in ISL library. Some preliminary results can be seen on http://llvm.org/bugs/show_bug.cgi?id=14240. > > > Sebastian, I wonder whether you have further results or suggestions besides those words posted on the Bugzilla page? Any information or suggestion would be appreciated.Hi Star Tan, thanks for looking into this. Just to sum up, the test case that shows this slowness is the following: void bar(); void foo(short *input) { short x0, x1; char i, ctr; for(i = 0; i < 8; i++) { // SCoP begin for (ctr = 0; ctr < 8; ctr++) { x0 = input[i*64 + ctr*8 + 0] ; x1 = input[i*64 + ctr*8 + 1] ; input[i*64 + ctr*8 + 0] = x0 - x1; input[i*64 + ctr*8 + 1] = x0 + x1; input[i*64 + ctr*8 + 2] = x0 * x1; } // SCoP end bar(); // Unknown function call stops further expansion of SCoP } } Which is translated to the following scop: Context: [p_0, p_1, p_2] -> { : p_0 >= -2147483648 and p_0 <= 2147483647 and p_1 >= -2147483648 and p_1 <= 2147483647 and p_2 >= -2147483648 and p_2 <= 2147483647 } p_0: {0,+,128}<%for.cond2.preheader> p_1: {2,+,128}<%for.cond2.preheader> p_2: {4,+,128}<%for.cond2.preheader> [...] The interesting observation is, that Polly introduces three parameters (p_0, p_1, p_2) for this SCoP, even though in the C source code only the variable 'i' is SCoP invariant. However, due to the way SCEVExpr(essions) in LLVM are nested, Polly sees three scop-invariant SCEVExpr(essions) which are all translated into independent parameters. However, as we can see, the only difference between the three parameters is a different constant in the base of the AddRecExpr. If we would just introduce p_0 (the parameter where the scev-base is zero) and express any use of p_1 as p_0 + 2 and p_2 as p_0 + 4, isl could solve this problem very quickly. There are other ways to improve performance which include further tuning isl or reducing the precision of the analysis, but in this case I don't think we need to look into them. The above fix should give us good performance and additionally also increases the precision of the result (as isl now understands the relation between the different parameters). To fix this, you probably would like to look into the SCEVAffinator class and the way parameters are handled. Cheers, Tobias
Star Tan
2013-Jul-26  15:30 UTC
[LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass
At 2013-07-26 14:14:51,"Tobias Grosser" <tobias at grosser.es> wrote:>On 07/25/2013 09:01 PM, Star Tan wrote: >> Hi Sebastian, >> >> >> Recently, I found the "Polly - Calculate dependences" pass would lead to significant compile-time overhead when compiling some loop-intensive source code. Tobias told me you found similar problem as follows: >> http://llvm.org/bugs/show_bug.cgi?id=14240 >> >> >> My evaluation shows that "Polly - Calculate dependences" pass consumes 96.4% of total compile-time overhead when compiling the program you proposed. It seems that the expensive compile-time overhead comes from those loop operations in ISL library. Some preliminary results can be seen on http://llvm.org/bugs/show_bug.cgi?id=14240. >> >> >> Sebastian, I wonder whether you have further results or suggestions besides those words posted on the Bugzilla page? Any information or suggestion would be appreciated. > >Hi Star Tan, > >thanks for looking into this. Just to sum up, the test case that >shows this slowness is the following: > >void bar(); >void foo(short *input) { > short x0, x1; > char i, ctr; > > for(i = 0; i < 8; i++) { > > // SCoP begin > for (ctr = 0; ctr < 8; ctr++) { > x0 = input[i*64 + ctr*8 + 0] ; > x1 = input[i*64 + ctr*8 + 1] ; > > input[i*64 + ctr*8 + 0] = x0 - x1; > input[i*64 + ctr*8 + 1] = x0 + x1; > input[i*64 + ctr*8 + 2] = x0 * x1; > } > // SCoP end > > bar(); // Unknown function call stops further expansion of SCoP > } >} > >Which is translated to the following scop: > > Context: > [p_0, p_1, p_2] -> { : p_0 >= -2147483648 and p_0 <= 2147483647 >and p_1 >= -2147483648 and p_1 <= 2147483647 and p_2 >= -2147483648 and >p_2 <= 2147483647 } > p_0: {0,+,128}<%for.cond2.preheader> > p_1: {2,+,128}<%for.cond2.preheader> > p_2: {4,+,128}<%for.cond2.preheader> >[...] > >The interesting observation is, that Polly introduces three parameters >(p_0, p_1, p_2) for this SCoP, even though in the C source code only the >variable 'i' is SCoP invariant. However, due to the way >SCEVExpr(essions) in LLVM are nested, Polly sees three scop-invariant >SCEVExpr(essions) which are all translated into independent parameters. >However, as we can see, the only difference between the three >parameters is a different constant in the base of the AddRecExpr. If we >would just introduce p_0 (the parameter where the scev-base is zero) and >express any use of p_1 as p_0 + 2 and p_2 as p_0 + 4, isl could solve >this problem very quickly. >There are other ways to improve performance which include further tuning >isl or reducing the precision of the analysis, but in this case >I don't think we need to look into them. The above fix should give us >good performance and additionally also increases the precision of the >result (as isl now understands the relation between the different >parameters). > >To fix this, you probably would like to look into the SCEVAffinator >class and the way parameters are handled. >Thank you so much. Maybe I was in the wrong direction at first -:) Yes, you are right. I should first investigate why Polly introduces three parameters rather than just one parameter. Best wishes, Star Tan -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130726/ec4c2b2d/attachment.html>
Tobias Grosser
2013-Jul-28  23:42 UTC
[LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass
On 07/28/2013 06:52 AM, Star Tan wrote:> Hi Tobias, > > I tried to investigated the problem related to ScopInfo, but I need your > help on handling some problems about ISL and SCEV.I copied the list as the discussion may be helpful for others. @Sven, no need to read all. Just search for your name. [..]>>The interesting observation is, that Polly introduces three parameters >>(p_0, p_1, p_2) for this SCoP, even though in the C source code only the >>variable 'i' is SCoP invariant. However, due to the way >>SCEVExpr(essions) in LLVM are nested, Polly sees three scop-invariant >>SCEVExpr(essions) which are all translated into independent parameters. >>However, as we can see, the only difference between the three >>parameters is a different constant in the base of the AddRecExpr. If we >>would just introduce p_0 (the parameter where the scev-base is zero) and >>express any use of p_1 as p_0 + 2 and p_2 as p_0 + 4, isl could solve >>this problem very quickly. > > Currently, Polly introduces three parameters because it treats (base, step) as a whole for each parameter. You are right, it seems we could usea single parameter to represent all these accesses. > > I have attached a patch file for this purpose. This patch file is just a hack file to see whether three parameters can be merged into one parameter and whether the compile-time could be reduce after that. The key of this patch is the added source code inScop::addParams, which only add parameters if the new parameter has different "step" value. In other ways, we skip the "base" value when compare parameters.> With this patch file, the Scop becomes: > > Context: > p0: {0,+,128}<%for.cond2.preheader> > Statements { > Stmt_for_body6 > Domain :> { Stmt_for_body6[i0] : i0 >= 0 and i0 <= 7 }; > Scattering :> { Stmt_for_body6[i0] -> scattering[0, i0, 0] }; > ReadAccess :> [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 16i0 }; > ReadAccess :> [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 2 + p_0 + 16i0 }; > MustWriteAccess :> [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 16i0 }; > MustWriteAccess :> [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 2 + p_0 + 16i0 }; > MustWriteAccess :> [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 4 + p_0 + 16i0 }; > } > > Unfortunately, the compile-time is still very high!This _looks_ already a lot better. Very nice!> I have investigated a little about it. Maybe this is becauseisl_union_map_add_mapin Polly-dependencewould still introduces new parameters based on the resulted produced by ScopInfo. > > Without this patch file, the Read, Write, MayWrite (input to ISL library)calculated by collectInfo are: > > Read:[p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : (2o0 = p_0 + 16i0 and i0 >= 0 and i0 <= 7) or (2o0 = p_1 + 16i0 and i0 >= 0 and i0 <= 7) } > Write:[p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : (2o0 = p_0 + 16i0 and i0 >= 0 and i0 <= 7) or (2o0 = p_1 + 16i0 and i0 >= 0 and i0 <= 7) or (2o0 = p_2 + 16i0 and i0 >= 0 and i0 <= 7) } > Maywrite:[p_0, p_1, p_2] -> { } > Schedule:[p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> scattering[0, i0, 0] } > > But with our patch file, they become: > > Read: [p_0, p_0'] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : (2o0 = p_0' + 16i0 and i0 >= 0 and i0 <= 7) or (2o0 = 2 + p_0 + 16i0 and i0 >= 0 and i0 <= 7) } > Write: [p_0, p_0', p_0''] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : (2o0 = p_0'' + 16i0 and i0 >= 0 and i0 <= 7) or (2o0 = 2 + p_0' + 16i0 and i0 >= 0 and i0 <= 7) or (2o0 = 4 + p_0 + 16i0 and i0 >= 0 and i0 <= 7) } > MayWrite: { } > Schedule: { Stmt_for_body6[i0] -> scattering[0, i0, 0] } > It seems that theisl_union_map_add_map function called from Dependences::collectInfoautomatically introduces extra parameters, but I have no idea why this happens. > > Could you give me some further suggestion on this problem?Oh, you are on the right way. Even though we only see parameters 'p_0', isl seems to believe those values 'p_0' are not identical. We see this as isl renames them such that the difference becomes obvious. Unfortunately, as long as isl does not believe those parameters are identical, it will still be stuck in long calculations. To understand the above behaviour we need to understand how isl determines if two parameter dimensions refer to the same parameter. What you would probably expect is that the union of the two sets [p] -> {[i] = p} and [p] -> {[i] = p} is a single set [p] -> {[i] = p} as the two parameters have the very same name. However, this is not always true. Even though parameters share the same name, they may in fact be different isl_id objects. isl_id_alloc() takes a parameter name and a pointer and only if both of them are identical, the very same isl_id object is returned. Sven: In terms of making the behaviour of isl easier to understand, it may make sense to fail/assert in case operands have parameters that are named identical, but that refer to different pointer values. So the question is why do you obtain such different isl_ids and what can be done to fix this. I don't have a solution ready, but I have some questions in the patch that may help you when looking into this.> diff --git a/include/polly/ScopInfo.h b/include/polly/ScopInfo.h > index 8c56582..bab7763 100755 > --- a/include/polly/ScopInfo.h > +++ b/include/polly/ScopInfo.h > @@ -445,6 +445,7 @@ class Scop { > /// The isl_ids that are used to represent the parameters > typedef std::map<const SCEV *, int> ParamIdType; > ParamIdType ParameterIds; > + ParamIdType ParameterIdsSpace;Why are you introducing another member variable here? Why don't you keep using ParameterIds? What is the this variable supposed to track?> diff --git a/lib/Analysis/Dependences.cpp b/lib/Analysis/Dependences.cpp > index 5a185d0..9f918f3 100644 > --- a/lib/Analysis/Dependences.cpp > +++ b/lib/Analysis/Dependences.cpp > @@ -30,6 +30,9 @@ > #include <isl/map.h> > #include <isl/set.h> > > +#define DEBUG_TYPE "polly-dependence" > +#include "llvm/Support/Debug.h" > + > using namespace polly; > using namespace llvm; > > @@ -88,8 +91,15 @@ void Dependences::collectInfo(Scop &S, isl_union_map **Read, > void Dependences::calculateDependences(Scop &S) { > isl_union_map *Read, *Write, *MayWrite, *Schedule; > > + DEBUG(dbgs() << "Scop: " << S << "\n"); > + > collectInfo(S, &Read, &Write, &MayWrite, &Schedule); > > + DEBUG(dbgs() << "Read: " << Read << "\n"; > + dbgs() << "Write: " << Write << "\n"; > + dbgs() << "MayWrite: " << MayWrite << "\n"; > + dbgs() << "Schedule: " << Schedule << "\n"); > + > if (OptAnalysisType == VALUE_BASED_ANALYSIS) { > isl_union_map_compute_flow( > isl_union_map_copy(Read), isl_union_map_copy(Write), > @@ -131,6 +141,8 @@ void Dependences::calculateDependences(Scop &S) { > RAW = isl_union_map_coalesce(RAW); > WAW = isl_union_map_coalesce(WAW); > WAR = isl_union_map_coalesce(WAR); > + > + DEBUG(printScop(dbgs())); > }This patch looks good/helpful by itself. I propose to submit it separately for commit.> bool Dependences::runOnScop(Scop &S) { > diff --git a/lib/Analysis/ScopInfo.cpp b/lib/Analysis/ScopInfo.cpp > index aa72f3e..6fc838e 100644 > --- a/lib/Analysis/ScopInfo.cpp > +++ b/lib/Analysis/ScopInfo.cpp > @@ -86,6 +86,14 @@ public: > isl_aff *Affine > isl_aff_zero_on_domain(isl_local_space_from_space(Space)); > Affine = isl_aff_add_coefficient_si(Affine, isl_dim_param, 0, 1); > + if (Scev->getSCEVType() == scAddRecExpr) { > + const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(Scev);The canonical pattern for this is: if (SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Scev)) {> + const SCEVConstant *c = cast<SCEVConstant>(AR->getOperand(0));This is obviously a hack. The base is not always a constant. You can probably just call use something like, isl_pw_aff *BaseValue = visit(AR->getOperand(0)) Affine = isl_pw_aff_sum(Affine, BaseValue);> + isl_int t; > + isl_int_init(t); > + isl_int_set_si(t, c->getValue()->getSExtValue());We now use isl_val instead of isl_int.> return isl_pw_aff_alloc(Domain, Affine); > } > @@ -717,6 +725,34 @@ void Scop::addParams(std::vector<const SCEV *> NewParameters) { > if (ParameterIds.find(Parameter) != ParameterIds.end()) > continue; > > + if (Parameter->getSCEVType() == scAddRecExpr) { > + int index, maxIndex = Parameters.size(); > + const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(Parameter); > + > + for (index = 0; index < Parameters.size(); ++index) { > + const SCEV *pOld = Parameters[index]; > + > + if (pOld->getSCEVType() != scAddRecExpr) > + continue; > + > + const SCEVAddRecExpr *OAR = cast<SCEVAddRecExpr>(pOld); > + if (AR->getNumOperands() != OAR->getNumOperands()) > + continue; > + > + unsigned i, e; > + for (i = 1, e = AR->getNumOperands(); i != e; ++i){ > + if (AR->getOperand(i) != OAR->getOperand(i)) > + break; > + } > + > + if (i == e) { > + // Parameters.push_back(Parameter); > + ParameterIds[Parameter] = index; > + return; > + } > + } > + }I think this is the right idea, but probably the wrong place to put it. I would put this into SCEVValidator::visitAddRecExpr. This function always adds the AddRecExpr itself as a parameter, whenever it is found to be parametric. However, what we should do is to create a new ScevExpr that starts at zero and is otherwise identical. We then add this as a parameter. When doing this, we now also need to keep all the parameters that have been found previously in the base expression.> int dimension = Parameters.size(); > > Parameters.push_back(Parameter); > @@ -777,9 +813,9 @@ void Scop::addParameterBounds() { > > void Scop::realignParams() { > // Add all parameters into a common model. > - isl_space *Space = isl_space_params_alloc(IslCtx, ParameterIds.size()); > + isl_space *Space = isl_space_params_alloc(IslCtx, ParameterIdsSpace.size()); > > - for (ParamIdType::iterator PI = ParameterIds.begin(), PE = ParameterIds.end(); > + for (ParamIdType::iterator PI = ParameterIdsSpace.begin(), PE = ParameterIdsSpace.end();I do not see why those changes are necessary.> PI != PE; ++PI) { > const SCEV *Parameter = PI->first; > isl_id *id = getIdForParam(Parameter);> @@ -787,7 +823,7 @@ void Scop::realignParams() { > } > > // Align the parameters of all data structures to the model. > - Context = isl_set_align_params(Context, Space); > +// Context = isl_set_align_params(Context, Space);Also no idea why this one is necessary. Hope this helps, Tobias
Tobias Grosser
2013-Jul-29  14:37 UTC
[LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass
On 07/29/2013 03:18 AM, Sven Verdoolaege wrote:> On Sun, Jul 28, 2013 at 04:42:25PM -0700, Tobias Grosser wrote: >> Sven: In terms of making the behaviour of isl easier to understand, >> it may make sense to fail/assert in case operands have parameters that >> are named identical, but that refer to different pointer values. > > No, you are allowed to have different identifiers with the same name. > I could optionally print the pointer values, but then I'd have > to think about what to do with them when reading a textual > representation of a set with such pointer values in them.Yes, this is how it is today. I wondered if there is actually a need to allow the use of different identifiers with the same name (except all being unnamed?). I personally do not see such a need and would prefer isl to assert/fail in case someone tries to do so. This may avoid confusions as happened here. Do you see a reason why isl should allow this? Cheers, Tobias
Tobias Grosser
2013-Jul-29  16:27 UTC
[LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass
On 07/29/2013 09:15 AM, Sven Verdoolaege wrote:> On Mon, Jul 29, 2013 at 07:37:14AM -0700, Tobias Grosser wrote: >> On 07/29/2013 03:18 AM, Sven Verdoolaege wrote: >>> On Sun, Jul 28, 2013 at 04:42:25PM -0700, Tobias Grosser wrote: >>>> Sven: In terms of making the behaviour of isl easier to understand, >>>> it may make sense to fail/assert in case operands have parameters that >>>> are named identical, but that refer to different pointer values. >>> >>> No, you are allowed to have different identifiers with the same name. >>> I could optionally print the pointer values, but then I'd have >>> to think about what to do with them when reading a textual >>> representation of a set with such pointer values in them. >> >> Yes, this is how it is today. > > No, the pointer values are currently not printed.I was referring to the first sentence. I do not think printing pointer values is what we want. It would make the output unpredictable not only when address space randomisation is involved.>> I wondered if there is actually a need to >> allow the use of different identifiers with the same name (except all being >> unnamed?). I personally do not see such a need and would prefer isl to >> assert/fail in case someone tries to do so. This may avoid confusions as >> happened here. Do you see a reason why isl should allow this? > > Removing this feature would break existing users.Even if it would, the benefits for future users may outweigh this. Also, are you aware of a user that actually breaks? Anyway, on the Polly side we know the behaviour and can handle it. So this is nothing I am very strong about. I just mentioned it as it seemed to be a good idea. Cheers, Tobias
Star Tan
2013-Jul-31  04:13 UTC
[LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass
Hi Tobias and Sven,
Thanks for your discussion and suggestion. 
@Sven: ISL actually allows users to have different identifiers with the same
name. The problem that we have discussed is caused by incorrect usage of
isl_space in Polly, so please do not worry about ISL library. You can skip the
following information related to Polly implementation.
@Tobias and Polly developers:
I have attached a patch file to fix this problem. The key idea is to keep ISL
library always seeing the same single parameter for those memory accesses that
use the same index variables even though they have different constant base
values.
For the example we have seen before:
for(i = 0; i < 8; i++) { for (ctr = 0; ctr < 8; ctr++) { x1 = input[i*64 +
ctr*8 + 1] ; x0 = input[i*64 + ctr*8 + 0] ; input[i*64 + ctr*8 + 0] = x0 - x1;
input[i*64 + ctr*8 + 1] = x0 + x1; input[i*64 + ctr*8 + 2] = x0 * x1; }
Without this patch file, Polly would produce the Context as follows:
  Context:
[p_0, p_1, p_2] -> { : p_0 >= -9223372036854775808 and p_0 <=
9223372036854775807 and p_1 >= -9223372036854775808 and p_1 <=
9223372036854775807 and p_2 >= -9223372036854775808 and p_2 <=
9223372036854775807 } p0: {0,+,128}<%for.cond2.preheader> p1:
{2,+,128}<%for.cond2.preheader> p2: {4,+,128}<%for.cond2.preheader>
Statements { Stmt_for_body6 Domain := [p_0, p_1, p_2] -> { Stmt_for_body6[i0]
: i0 >= 0 and i0 <= 7 }; Scattering := [p_0, p_1, p_2] -> {
Stmt_for_body6[i0] -> scattering[0, i0, 0] }; ReadAccess := [p_0, p_1, p_2]
-> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 16i0 };
ReadAccess := [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> MemRef_input[o0]
: 2o0 = p_1 + 16i0 }; MustWriteAccess := [p_0, p_1, p_2] -> {
Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 16i0 }; MustWriteAccess
:= [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_1
+ 16i0 }; MustWriteAccess := [p_0, p_1, p_2] -> { Stmt_for_body6[i0] ->
MemRef_input[o0] : 2o0 = p_2 + !
 16i0 }; }
Furthermore, it will produce very complex RAW, WAW and WAR dependence as
follows:
RAW dependences: [p_0, p_1, p_2] -> { Stmt_for_body6[i0] ->
Stmt_for_body6[o0] : (exists (e0 = [(p_1)/2], e1 = [(7p_1 + p_2 + 112i0)/16]:
16o0 = -p_0 + p_1 + 16i0 and 2e0 = p_1 and 16i0 >= -p_1 + p_2 and 16i0 <=
112 + p_0 - p_1 and p_1 >= 16 + p_0 and i0 >= 0 and 16e1 >= -15 + 7p_1
+ p_2 + 112i0 and 16e1 <= -1 + 7p_1 + p_2 + 112i0 and p_2 >= 16 + p_0)) or
... or (exists (e0 = [(p_1)/2], e1 = [(-p_1 + p_2)/16]: 16o0 = p_0 - p_1 + 16i0
and 2e0 = p_1 and 16e1 = -p_1 + p_2 and 16i0 >= -p_0 + p_2 and 16i0 <= 112
- p_0 + p_1 and p_1 <= -16 + p_0 and p_2 >= 16 + p_0)) }
It not only leads to significant compile-time overhead, but also produces
incorrect dependence results.  As we can see from the source code, there should
be no RAW, WAW and WAR across loop iterations at all.
With this patch file, Polly would produce the following Context and RAW, WAW,
WAR dependence:
Context:
    [p_0] -> {  : p_0 >= -9223372036854775808 and p_0 <=
9223372036854775807 }
    p0: {0,+,128}<%for.cond2.preheader>
    Statements {
    Stmt_for_body6
            Domain :                [p_0] -> { Stmt_for_body6[i0] : i0 >=
0 and i0 <= 7 };
            Scattering :                [p_0] -> { Stmt_for_body6[i0] ->
scattering[0, i0, 0] };
            ReadAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 =
p_0 + 16i0 };
            ReadAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 =
2 + p_0 + 16i0 };
            MustWriteAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 =
p_0 + 16i0 };
            MustWriteAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 =
2 + p_0 + 16i0 };
            MustWriteAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 =
4 + p_0 + 16i0 };
    }
RAW dependences:
[p_0] -> {  }
WAR dependences:
[p_0] -> {  }
WAW dependences:
[p_0] -> {  }
We can see it only creates a single parameter and it can catch the exact RAR,
WAW and WAW dependence.
To see whether our patch file could catch normal data dependence, I have
constructed another example as follows:
  for(i = 0; i < 8; i++) {
    for (ctr = 0; ctr < 8; ctr++) {
      x0 = input[i*64 + ctr + 0] ; 
      x1 = input[i*64 + ctr + 1] ;
      input[i*64 + ctr + 0] = x0 - x1;
      input[i*64 + ctr + 1] = x0 + x1;
      input[i*64 + ctr + 2] = x0 * x1;
    }
The original Polly would produce similar complex results. However, with the
attached patch file, Polly would produce the following Context and RAW, WAW, WAR
dependence:
Context:
    [p_0] -> {  : p_0 >= -2147483648 and p_0 <= 2147483647 }
    p0: {0,+,128}<%for.cond2.preheader>
    Statements {
    Stmt_for_body6
            Domain :                [p_0] -> { Stmt_for_body6[i0] : i0 >=
0 and i0 <= 7 };
            Scattering :                [p_0] -> { Stmt_for_body6[i0] ->
scattering[0, i0, 0] };
            ReadAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 =
p_0 + 2i0 };
            ReadAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 =
2 + p_0 + 2i0 };
            MustWriteAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 =
p_0 + 2i0 };
            MustWriteAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 =
2 + p_0 + 2i0 };
            MustWriteAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 =
4 + p_0 + 2i0 };
    }
RAW dependences:
[p_0] -> { Stmt_for_body6[i0] -> Stmt_for_body6[1 + i0] : exists (e0 =
[(p_0)/2]: 2e0 = p_0 and i0 >= 0 and i0 <= 6) }
WAR dependences:
[p_0] -> {  }
WAW dependences:
[p_0] -> { Stmt_for_body6[i0] -> Stmt_for_body6[1 + i0] : exists (e0 =
[(p_0)/2]: 2e0 = p_0 and i0 >= 0 and i0 <= 6)
Results show that it still creates a single parameter and catches the exact RAR,
WAW and WAW dependence.  Of course, this patch file makes the compiling very
faster, i.e., compile-time is reduced from several minutes to less then 1
second.
@Tobias:>This is obviously a hack. The base is not always a constant.
>You can probably just call use something like,
>isl_pw_aff *BaseValue = visit(AR->getOperand(0))
>Affine = isl_pw_aff_sum(Affine, BaseValue);
Currently, I only handle constant base values because I have not found a good
way to handle general base values. isl_pw_aff_add requires two isl_pw_aff
parameters, but Affine is actually of isl_aff type. Perhaps we could first
commit a patch file to handle common cases, then we can considering submitting
another patch file to handle general cases.>I think this is the right idea, but probably the wrong place to put it.
>I would put this into SCEVValidator::visitAddRecExpr. This function 
>always adds the AddRecExpr itself as a parameter, whenever it is found 
>to be parametric. However, what we should do is to create a new ScevExpr
>that starts at zero and is otherwise identical. We then add this as a 
>parameter. When doing this, we now also need to keep all the parameters
>that have been found previously in the base expression.
A lot of Polly functions access ParameterIds and Parameters using existing
ScevExpr. If we create new ScevExpr and put them into Parameters and
ParameterIds, it may require some extra tricks to handle the mapping from
existing ScevExpr to newly created ScevExpr. I think we can consider fixing it
later.
Cheers,
Star Tan
At 2013-07-30 00:27:27,"Tobias Grosser" <tobias at grosser.es>
wrote:>On 07/29/2013 09:15 AM, Sven Verdoolaege wrote:
>> On Mon, Jul 29, 2013 at 07:37:14AM -0700, Tobias Grosser wrote:
>>> On 07/29/2013 03:18 AM, Sven Verdoolaege wrote:
>>>> On Sun, Jul 28, 2013 at 04:42:25PM -0700, Tobias Grosser wrote:
>>>>> Sven: In terms of making the behaviour of isl easier to
understand,
>>>>> it may make sense to fail/assert in case operands have
parameters that
>>>>> are named identical, but that refer to different pointer
values.
>>>>
>>>> No, you are allowed to have different identifiers with the same
name.
>>>> I could optionally print the pointer values, but then I'd
have
>>>> to think about what to do with them when reading a textual
>>>> representation of a set with such pointer values in them.
>>>
>>> Yes, this is how it is today.
>>
>> No, the pointer values are currently not printed.
>
>I was referring to the first sentence. I do not think printing pointer 
>values is what we want. It would make the output unpredictable not only 
>when address space randomisation is involved.
>
>>> I wondered if there is actually a need to
>>> allow the use of different identifiers with the same name (except
all being
>>> unnamed?). I personally do not see such a need and would prefer isl
to
>>> assert/fail in case someone tries to do so. This may avoid
confusions as
>>> happened here. Do you see a reason why isl should allow this?
>>
>> Removing this feature would break existing users.
>
>Even if it would, the benefits for future users may outweigh this.
>Also, are you aware of a user that actually breaks?
>
>Anyway, on the Polly side we know the behaviour and can handle it. So 
>this is nothing I am very strong about. I just mentioned it as it seemed 
>to be a good idea.
>
>Cheers,
>Tobias
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130731/274bbbb4/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-ScopInfo-Create-shared-parameter-for-memory-accesses.patch
Type: application/octet-stream
Size: 6694 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130731/274bbbb4/attachment.obj>
Possibly Parallel Threads
- [LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass
- [LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass
- [LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass
- [LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
- [LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops