Geoff Berry via llvm-dev
2017-Aug-08 19:17 UTC
[llvm-dev] [ScalarEvolution][SCEV] no-wrap flags dependent on order of getSCEV() calls
On 8/8/2017 1:37 PM, Friedman, Eli wrote:> On 8/8/2017 10:22 AM, Geoff Berry via llvm-dev wrote: >> Hi all, >> >> I'm looking into resolving a FIXME in the LoopDataPrefetch (and FalkorMarkStridedAccesses) pass by marking both of these passes as preserving the ScalarEvolution analysis. Unfortunately, when this change is made, LSR will generate different code. One of the root causes seems to be that SCEV will return different nsw/nuw flags for the same Value, depending on what order the SCEVs are computed, due to the fact that the SCEV object unique-ing doesn't take the nsw/nuw flags into account. Since LoopDataPrefetch computes SCEVs in a different order than LSR, the nsw/nuw flags seen by LSR will differ based on whether the SCEVs are preserved from LoopDataPrefetch. >> >> I believe this issue has been discussed before, but I just wanted to check if this is indeed the current expected behavior, and if anyone has any plans/ideas for addressing this issue. > > The general issue that SCEV nsw is weird is known... see, for example https://bugs.llvm.org/show_bug.cgi?id=23527. > >> For reference, below is a reduced loop where this problem occurs. The SCEV for %i.07.i will have <nuw> or not depending on whether %idxprom.i was computed before it: > > %idxprom.i, the zext? I'm not sure how you're getting that particular effect. ScalarEvolution::getSCEV for a zext immediately calls getSCEV on its operand.Here is an abridged record of the getSCEV results as seen by each pass with/without preserving SCEVAnalysis. In the first case, when the SCEV is invalidated, the SCEV for %i.07.i is computed in LSR as {0,+,1}<%for.body.i>. In the second case, the SCEV for %i.07.i is computed in LSR the same way, but because the SCEV for %idxprom.i from FalkorHWPFFix unique's to the same value but has the nuw flag set and is still present in the foldingset, {0,+,1}<nuw><%for.body.i> is returned for %i.07.i in LSR instead. The SCEVs for other values differ too, but I thought I'd start with this one. ********** FalkorHWPFFix ********** Created SCEV for %i.07.i = phi i32 [ %inc.i, %for.inc.i ], [ 0, %for.body.i.preheader ]: {0,+,1}<%for.body.i> Created SCEV for %idxprom.i = zext i32 %i.07.i to i64: (zext i32 {0,+,1}<%for.body.i> to i64) Created SCEV for %arrayidx.i = getelementptr inbounds i32, i32* %rot, i64 %idxprom.i: ((4 * (zext i32 {0,+,1}<%for.body.i> to i64))<nuw><nsw> + %rot)<nsw> Created SCEV for %inc.i = add i32 %i.07.i, 1: {1,+,1}<%for.body.i> Created SCEV for %idxprom.i = zext i32 %i.07.i to i64: {0,+,1}<nuw><%for.body.i> Created SCEV for %arrayidx.i = getelementptr inbounds i32, i32* %rot, i64 %idxprom.i: {%rot,+,4}<nw><%for.body.i> ********** SCEV invalidated ********** ********** LSR ********** Created SCEV for %i.07.i = phi i32 [ %inc.i, %for.inc.i ], [ 0, %for.body.i.preheader ]: {0,+,1}<%for.body.i> Created SCEV for %idxprom.i = zext i32 %i.07.i to i64: (zext i32 {0,+,1}<%for.body.i> to i64) Created SCEV for %arrayidx.i = getelementptr inbounds i32, i32* %rot, i64 %idxprom.i: ((4 * (zext i32 {0,+,1}<%for.body.i> to i64))<nuw><nsw> + %rot)<nsw> Created SCEV for %inc.i = add i32 %i.07.i, 1: {1,+,1}<%for.body.i> Created SCEV for %cmp.i = icmp ult i32 %inc.i, %size: %cmp.i vs. ********** FalkorHWPFFix ********** Created SCEV for %i.07.i = phi i32 [ %inc.i, %for.inc.i ], [ 0, %for.body.i.preheader ]: {0,+,1}<%for.body.i> Created SCEV for %idxprom.i = zext i32 %i.07.i to i64: (zext i32 {0,+,1}<%for.body.i> to i64) Created SCEV for %arrayidx.i = getelementptr inbounds i32, i32* %rot, i64 %idxprom.i: ((4 * (zext i32 {0,+,1}<%for.body.i> to i64))<nuw><nsw> + %rot)<nsw> Created SCEV for %inc.i = add i32 %i.07.i, 1: {1,+,1}<%for.body.i> Created SCEV for %idxprom.i = zext i32 %i.07.i to i64: {0,+,1}<nuw><%for.body.i> Created SCEV for %arrayidx.i = getelementptr inbounds i32, i32* %rot, i64 %idxprom.i: {%rot,+,4}<nw><%for.body.i> ********** SCEV preserved ********** ********** LSR ********** Created SCEV for %i.07.i = phi i32 [ %inc.i, %for.inc.i ], [ 0, %for.body.i.preheader ]: {0,+,1}<nuw><%for.body.i> ^^^^^ Created SCEV for %inc.i = add i32 %i.07.i, 1: {1,+,1}<nuw><%for.body.i> Existing SCEV for %idxprom.i = zext i32 %i.07.i to i64: {0,+,1}<nuw><%for.body.i> Existing SCEV for %arrayidx.i = getelementptr inbounds i32, i32* %rot, i64 %idxprom.i: {%rot,+,4}<nw><%for.body.i> Created SCEV for %cmp.i = icmp ult i32 %inc.i, %size: %cmp.i Created SCEV for %lsr.iv = phi i64 [ 0, %for.body.i.preheader ], [ %lsr.iv.next, %for.inc.i ]: {0,+,1}<nuw><nsw><%for.body.i>> > -Eli >-- Geoff Berry Employee of Qualcomm Datacenter Technologies, Inc. Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Friedman, Eli via llvm-dev
2017-Aug-08 19:58 UTC
[llvm-dev] [ScalarEvolution][SCEV] no-wrap flags dependent on order of getSCEV() calls
On 8/8/2017 12:17 PM, Geoff Berry wrote:> On 8/8/2017 1:37 PM, Friedman, Eli wrote: >> On 8/8/2017 10:22 AM, Geoff Berry via llvm-dev wrote: >>> Hi all, >>> >>> I'm looking into resolving a FIXME in the LoopDataPrefetch (and >>> FalkorMarkStridedAccesses) pass by marking both of these passes as >>> preserving the ScalarEvolution analysis. Unfortunately, when this >>> change is made, LSR will generate different code. One of the root >>> causes seems to be that SCEV will return different nsw/nuw flags for >>> the same Value, depending on what order the SCEVs are computed, due >>> to the fact that the SCEV object unique-ing doesn't take the nsw/nuw >>> flags into account. Since LoopDataPrefetch computes SCEVs in a >>> different order than LSR, the nsw/nuw flags seen by LSR will differ >>> based on whether the SCEVs are preserved from LoopDataPrefetch. >>> >>> I believe this issue has been discussed before, but I just wanted to >>> check if this is indeed the current expected behavior, and if anyone >>> has any plans/ideas for addressing this issue. >> >> The general issue that SCEV nsw is weird is known... see, for example >> https://bugs.llvm.org/show_bug.cgi?id=23527. >> >>> For reference, below is a reduced loop where this problem occurs. >>> The SCEV for %i.07.i will have <nuw> or not depending on whether >>> %idxprom.i was computed before it: >> >> %idxprom.i, the zext? I'm not sure how you're getting that >> particular effect. ScalarEvolution::getSCEV for a zext immediately >> calls getSCEV on its operand. > > Here is an abridged record of the getSCEV results as seen by each pass > with/without preserving SCEVAnalysis. > In the first case, when the SCEV is invalidated, the SCEV for %i.07.i > is computed in LSR as {0,+,1}<%for.body.i>. In the second case, the > SCEV for %i.07.i is computed in LSR the same way, but because the SCEV > for %idxprom.i from FalkorHWPFFix unique's to the same value but has > the nuw flag set and is still present in the foldingset, > {0,+,1}<nuw><%for.body.i> is returned for %i.07.i in LSR instead. The > SCEVs for other values differ too, but I thought I'd start with this one.Oh, I see... yes, we do stupid things involving mutating NoWrap flags after a SCEV is created. (grep for setNoWrapFlags in ScalarEvolution.cpp.) -Eli -- Employee of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Sanjoy Das via llvm-dev
2017-Aug-09 00:38 UTC
[llvm-dev] [ScalarEvolution][SCEV] no-wrap flags dependent on order of getSCEV() calls
Hi, On Tue, Aug 8, 2017 at 12:58 PM, Friedman, Eli <efriedma at codeaurora.org> wrote:> Oh, I see... yes, we do stupid things involving mutating NoWrap flags after > a SCEV is created. (grep for setNoWrapFlags in ScalarEvolution.cpp.)That's really a compile time hack -- we defer some expensive tricks to prove nsw/nuw on an add recurrences to when we've been asked to sign/zero extend said add recurrence. I would be okay if you wanted to expose those tricks via a helper in SCEV that could be called independently of creating zero or sign extensions. That way SCEV clients could trade-off compile time for more precise information in a granular way. -- Sanjoy> > -Eli > > -- > Employee of Qualcomm Innovation Center, Inc. > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux > Foundation Collaborative Project >