Chris Lattner wrote:> On Mon, 21 Feb 2005, Jeff Cohen wrote: > >> Sorry, I thought I was running selection dag isel but I screwed up >> when trying out the really big array. You're right, it does clean it >> up except for the multiplication. >> >> So LoopStrengthReduce is not ready for prime time and doesn't >> actually get used? > > > I don't know what the status of it is. You could try it out, and see > what it does. :) I would suggest adding it to gccas immediately after > the SCCP pass.I know it has a lot of FIXMEs in it.> >> I might consider whipping it into shape. Does it still have to >> handle getelementptr in its full generality? > > > What do you mean? > > -ChrisConsider the bytecode I originally posted for the nested for loop. If the getelementptrs haven't been replaced with something simpler, LSR will have to take them apart itself. The multiplication cannot be replaced with addition without the two-dimensional access being replaced with one-dimensional access plus some pointer arithmetic. I would guess that they will not have been decomposed before LSR is run, correct? It also appears the current LSR reduces each GEP in isolation of all the others. That clearly is bad for this example, as all the GEPs in this loop do the same thing with the induction variable. Creating a separate pointer for each one only makes things worse. Ordinarily the addressing is explicit so the redundancy would have been removed and the LSR algorithm doesn't have to worry about it, but here it does. Bottom line is it has to strength reduce operations that are implicit, so it first has to determine what those operations are and detect common subexpressions among those operations itself (possibly across multiple basic blocks). It doesn't appear to handle explicit multiplications at all (probably doesn't add much value to do so). It's a challenge :)
On Mon, 21 Feb 2005, Jeff Cohen wrote:>>> trying out the really big array. You're right, it does clean it up except >>> for the multiplication. >>> >>> So LoopStrengthReduce is not ready for prime time and doesn't actually get >>> used? >> >> >> I don't know what the status of it is. You could try it out, and see what >> it does. :) I would suggest adding it to gccas immediately after the SCCP >> pass. > > I know it has a lot of FIXMEs in it.Okay, fair enough. I think these are due to unimplemented features. I think that Nate (wisely) started simple, and decided to handle the complex cases later.>>> I might consider whipping it into shape. Does it still have to handle >>> getelementptr in its full generality? >> >> >> What do you mean? >> >> -Chris > > Consider the bytecode I originally posted for the nested for loop. If the > getelementptrs haven't been replaced with something simpler, LSR will have to > take them apart itself. The multiplication cannot be replaced with addition > without the two-dimensional access being replaced with one-dimensional access > plus some pointer arithmetic. I would guess that they will not have been > decomposed before LSR is run, correct?No, this is exactly what LSR should do in this case.> It also appears the current LSR reduces each GEP in isolation of all the > others. That clearly is bad for this example, as all the GEPs in this loop > do the same thing with the induction variable. Creating a separate pointer > for each one only makes things worse. Ordinarily the addressing is explicit > so the redundancy would have been removed and the LSR algorithm doesn't have > to worry about it, but here it does. Bottom line is it has to strength > reduce operations that are implicit, so it first has to determine what those > operations are and detect common subexpressions among those operations itself > (possibly across multiple basic blocks).Yes, if you mean something like this: for (i) for (j) A[i][j].X = 0; A[i][j].Y = 0; It should be sufficient to reduce this to (just considering the inner loop): for (i) p = &A[i]; // this would be futher reduced later. for (j) p[j].X = 0; p[j].Y = 0; (this needs to do the auto-cse of p into the two GEPs). Further strength reduction should simplify this to: for (i) p = &A[i]; // this would be futher reduced later. for (j) p->X = 0; p->Y = 0; ++p; ... which is what we want.> It doesn't appear to handle > explicit multiplications at all (probably doesn't add much value to do so).True, this can be added later. Other operations, including div and rem, can be strength reduced as well.> It's a challenge :):) I would suggest continuing Nate's approach. Start with the most trivial of simple loops (e.g. for (i) A[i] = 0; ), and work up from there, adding complexity as necessary. The idea should be to always have a pass that works, but continue to add new capabilities to it. -Chris -- http://nondot.org/sabre/ http://llvm.cs.uiuc.edu/
Well, what about this? for (i) for (j) A[i][j].X = 0; if (func(j)) A[i][j].Y = 0; else A[i][j].Z = 0; This is CSE across multiple basic blocks, i.e. GCSE. Or even: for (i) for (j) A[i][j].X = 0; if (func(j)) A[i-1][j].Y = 0; else A[i+1][j].Z = 0; Much easier if the GEPs have been decomposed first and the usual global opts already performed :) Chris Lattner wrote:>> Consider the bytecode I originally posted for the nested for loop. >> If the getelementptrs haven't been replaced with something simpler, >> LSR will have to take them apart itself. The multiplication cannot >> be replaced with addition without the two-dimensional access being >> replaced with one-dimensional access plus some pointer arithmetic. I >> would guess that they will not have been decomposed before LSR is >> run, correct? > > > No, this is exactly what LSR should do in this case. > >> It also appears the current LSR reduces each GEP in isolation of all >> the others. That clearly is bad for this example, as all the GEPs in >> this loop do the same thing with the induction variable. Creating a >> separate pointer for each one only makes things worse. Ordinarily >> the addressing is explicit so the redundancy would have been removed >> and the LSR algorithm doesn't have to worry about it, but here it >> does. Bottom line is it has to strength reduce operations that are >> implicit, so it first has to determine what those operations are and >> detect common subexpressions among those operations itself (possibly >> across multiple basic blocks). > > > Yes, if you mean something like this: > > for (i) > for (j) > A[i][j].X = 0; > A[i][j].Y = 0; > > It should be sufficient to reduce this to (just considering the inner > loop): > > for (i) > p = &A[i]; // this would be futher reduced later. > for (j) > p[j].X = 0; > p[j].Y = 0; > > (this needs to do the auto-cse of p into the two GEPs). Further > strength reduction should simplify this to: > > for (i) > p = &A[i]; // this would be futher reduced later. > for (j) > p->X = 0; > p->Y = 0; > ++p; > > ... which is what we want. > >> It doesn't appear to handle explicit multiplications at all (probably >> doesn't add much value to do so). > > > True, this can be added later. Other operations, including div and > rem, can be strength reduced as well. > >> It's a challenge :) > > > :) I would suggest continuing Nate's approach. Start with the most > trivial of simple loops (e.g. for (i) A[i] = 0; ), and work up from > there, adding complexity as necessary. > > The idea should be to always have a pass that works, but continue to > add new capabilities to it. > > -Chris >