Philip Reames via llvm-dev
2016-Mar-22 20:31 UTC
[llvm-dev] RFC: A change in InstCombine canonical form
I feel very strongly that blocking work on making optimization bitcast-ignorant on the typeless pointer work would be a major mistake. Unless we expected the typeless pointer work to be concluded within the near term (say 3-6 months maximum), we should not block any development which would be accepted in the typeless pointer work wasn't planned. In my view, this is one of the largest mistakes we've made with the pass manager work, it has seriously cost us, and I don't want to see it happening again. Philip On 03/22/2016 01:09 PM, David Blaikie wrote:> Ultimately everything is going to be made to not rely on the types of > pointers - that's nearly equivalent to bitcast-ignorant (the > difference being that the presence of an extra instruction (the > bitcast) might trip up some optimizations - but the presence of the > /type/ information implied by the bitcast should not trip up or be > necessary for optimizations (two sides of the same coin)) > > If we're talking about making an optimization able to ignore the > bitcast instructions - yes, that work is unnecessary & perhaps > questionable given the typeless pointer work. Not outright off limits, > but the same time might be better invested in moving typeless pointers > forward if the contributor is so inclined/able to shift in that direction. > > But if we're talking about the work to make the optimization not use > the type of pointers as a crutch - that work is a necessary precursor > to the typeless pointer work and would be most welcome. > > - David > > On Tue, Mar 22, 2016 at 12:39 PM, Mehdi Amini <mehdi.amini at apple.com > <mailto:mehdi.amini at apple.com>> wrote: > > I don't really mind, but the intermediate stage will not be very > nice: that a lot of code / tests that needs to be written with > bitcast, and all of that while they are deemed to disappear. The > added value isn't clear to me considering the added work. I'm not > sure it wouldn't add more work for all the cleanup required by the > "typeless pointer", but I'm not sure what's involved here and if > David thinks the intermediate steps of handling bit casts > everywhere is not making it harder I'm fine with it. > > -- > Mehdi > >> On Mar 22, 2016, at 12:36 PM, Philip Reames >> <listmail at philipreames.com <mailto:listmail at philipreames.com>> wrote: >> >> I'd phrase this differently: being pointer-bitcast agnostic is a >> step towards support typeless pointers. :) We can either become >> bitcast agnostic all in one big change or incrementally. >> Personally, I'd prefer the later since it reduces the risk >> associated with enabling typeless pointers in the end. >> >> Philip >> >> On 03/22/2016 12:16 PM, Mehdi Amini via llvm-dev wrote: >>> I don't know enough about the tradeoff for 1, but 2 seems like a >>> bandaid for something that is not a correctness issue neither a >>> regression. I'm not sure it justifies "bandaid patches" while >>> there is a clear path forward, i.e. typeless pointers, unless >>> there is an acknowledgement that typeless pointers won't be >>> there before a couple of years. >>> >>> -- >>> Mehdi >>> >>>> On Mar 22, 2016, at 11:32 AM, Ehsan Amiri <ehsanamiri at gmail.com >>>> <mailto:ehsanamiri at gmail.com>> wrote: >>>> >>>> Back to the discussion on the RFC, I still see some advantage >>>> in following the proposed solution. I see two paths forward: >>>> >>>> 1- Change canonical form, possibly lower memcpy to non-integer >>>> load and store in InstCombine. Then teach the backends to >>>> convert that to integer load and store if that is more >>>> profitable. Notice that we are talking about loads that have no >>>> use other than store. So it is a fairly simple change in the >>>> backends. >>>> >>>> 2- Do not change the canonical form. Then for this bug, we need >>>> to teach select speculation to see through bitcasts. We will >>>> probably need to teach other optimizations to see though >>>> bitcasts in the future as problems are uncovered. That is until >>>> typeless pointer work is complete. Once the typeless pointer >>>> work is complete, we have some extra code in each optimization >>>> for seeing through bitcasts which is possibly no longer needed. >>>> >>>> Based on this I think (1) is the right thing to do. But there >>>> might be other reasons for the current canonical form that I am >>>> not aware of. Please let me know what you think. >>>> >>>> Thanks >>>> Ehsan >>>> >>>> On Wed, Mar 16, 2016 at 2:13 PM, David Blaikie >>>> <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote: >>>> >>>> >>>> >>>> On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri >>>> <ehsanamiri at gmail.com <mailto:ehsanamiri at gmail.com>> wrote: >>>> >>>> David, >>>> >>>> Could you give us an update on the status of typeless >>>> pointer work? How much work is left and when you think >>>> it might be ready? >>>> >>>> >>>> It's a bit of an onion peel, really - since it will >>>> eventually involve generalizing/fixing every optimization >>>> that's currently leaning on typed pointers to keep the >>>> performance while removing the crutch they're currently >>>> leaning on. (in cases where bitcasts are literally just >>>> getting in the way, those won't require cleaning up & >>>> should just become "free performance wins" once we remove >>>> them, though) >>>> >>>> At the moment we can roundtrip every LLVM IR test case >>>> through bitcode and textual IR (reading/writing both >>>> formats) while using only a narrow whitelist of places that >>>> request the type of a pointer (things like the verifier, >>>> the parser/printer where it actually needs the typed >>>> pointer to verify it matches the explicit type, etc). >>>> >>>> The next thing on the list is probably figuring out the >>>> byval/inalloca representation (add an explicit pointee >>>> type? just make the number of bytes explicit with no type >>>> information?). >>>> >>>> Then start migrating optimizations over - doing the same >>>> sort of testing I did for the IR/bitcode roundtripping - >>>> assert that the pointee type is not accessed, whitelist >>>> places that need it until the bitcasts go away, fix >>>> anything else... it'll still be a fair bit of work & I >>>> don't really know how much. It should parallelize pretty >>>> well (doing any of this work is really helpful, each >>>> optimization is indepednent, etc) if anyone wants to/is >>>> able to help. >>>> >>>> - Dave >>>> >>>> >>>> Thanks >>>> Ehsan >>>> >>>> >>>> On Wed, Mar 16, 2016 at 1:12 PM, David Blaikie >>>> <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote: >>>> >>>> >>>> >>>> On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini via >>>> llvm-dev <llvm-dev at lists.llvm.org >>>> <mailto:llvm-dev at lists.llvm.org>> wrote: >>>> >>>> Hi, >>>> >>>> How do it interact with the "typeless pointers" >>>> work? >>>> >>>> >>>> Right - the goal of the typeless pointer work is to >>>> fix all these bugs related to "didn't look through >>>> bitcasts" in optimizations. Sometimes that's going >>>> to mean more work (because the code is leaning on >>>> the absence of bitcasts & the presence of >>>> convenient (but not guaranteed) type information to >>>> inform optimization decisions) but if we remove >>>> typed pointers while keeping optimization quality >>>> in the cases we have today, then we should've also >>>> fixed the cases that were broken because the type >>>> information didn't end up aligning to produce the >>>> optimal output. >>>> >>>> & I know I've been off the typeless pointer stuff >>>> for a few months working on llvm-dwp - but happy >>>> for any help (the next immediate piece is probably >>>> figuring out teh right representation for byval and >>>> inalloca - there were some code reviews sent out >>>> for that that I'll need to come back around to - >>>> but also any optimizations people want to help >>>> rework/improve would be great too & I can provide >>>> some techniques/tools to help people approach those) >>>> >>>> - Dave >>>> >>>> >>>> Thanks, >>>> >>>> -- >>>> Mehdi >>>> >>>>> On Mar 16, 2016, at 6:41 AM, Ehsan Amiri via >>>>> llvm-dev <llvm-dev at lists.llvm.org >>>>> <mailto:llvm-dev at lists.llvm.org>> wrote: >>>>> >>>>> === PROBLEM === (See this bug >>>>> https://llvm.org/bugs/show_bug.cgi?id=26445) >>>>> >>>>> IR contains code for loading a float from >>>>> float * and storing it to a float * address. >>>>> After canonicalization of load in InstCombine >>>>> [1], new bitcasts are added to the IR (see >>>>> bottom of the email for code samples). This >>>>> prevents select speculation in SROA to work. >>>>> Also after SROA we have bitcasts from int32 to >>>>> float. (Whereas originally after instCombine, >>>>> bitcasts are only done on pointer types). >>>>> >>>>> === PROPOSED SOLUTION==>>>>> >>>>> [1] implies that we need load canonicalization >>>>> when we load a value only to store it again. >>>>> The reason is to avoid generating slightly >>>>> different code (due to different ways of >>>>> adding bitcasts), in different situations. In >>>>> all examples presented in [1] there is a >>>>> non-zero number of bitcasts. I think when we >>>>> load a value of type T from a T* address and >>>>> store it as a type T value to one or more T* >>>>> address (and there is no other use or store), >>>>> we can redefine canonical form to mean there >>>>> should not be any bitcasts. So we still have a >>>>> canonical form, but its definition is slightly >>>>> different. >>>>> >>>>> === REASONS FOR / AGAINST==>>>>> >>>>> Hal Finkel warns that while this may be useful >>>>> for power pc, this may hurt more than one >>>>> other platform and become a very large >>>>> project. Despite this he is fine with bringing >>>>> up the issue to the mailing list to get >>>>> feedback, mostly because this seems inline >>>>> with our future direction of having a unique >>>>> type for all pointers. (Hal please correct me >>>>> if I misunderstood your comment) >>>>> >>>>> This is a much simpler fix compared to >>>>> alternatives. (ignoring potential regressions) >>>>> >>>>> === ALTERNATIVE SOLUTION ==>>>>> >>>>> Fix select speculation in SROA to see through >>>>> bitcasts. Handle remaining bitcasts during >>>>> code gen. Other alternative solutions are welcome. >>>>> >>>>> Should I implement the proposed solution or is >>>>> it too risky? I understand that we may need to >>>>> undo it if it breaks too many things. Comments >>>>> are welcome. >>>>> >>>>> >>>>> [1] >>>>> http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html >>>>> r226781 git commit id: b778cbc0c8 >>>>> >>>>> >>>>> >>>>> Code Samples (only relevant part is copied): >>>>> >>>>> -------------------- Before Canonicalization >>>>> (contains call to std::max): -------------------- >>>>> entry: >>>>> %max_value = alloca float, align 4 >>>>> %1 = load float, float* %input, align 4, >>>>> !tbaa !1 >>>>> store float %1, float* %max_value, align 4, >>>>> !tbaa !1 >>>>> >>>>> for.body: >>>>> %call = call dereferenceable(4) float* >>>>> @_ZSt3maxIfERKT_S2_S2_(float* >>>>> dereferenceable(4) %max_value, float* >>>>> dereferenceable(4) %arrayidx1) >>>>> %3 = load float, float* %call, align 4, !tbaa !1 >>>>> store float %3, float* %max_value, align 4, >>>>> !tbaa !1 >>>>> >>>>> -------------------- After Canonicalization >>>>> (contains call to std::max):-------------------- >>>>> >>>>> entry: >>>>> %max_value = alloca float, align 4 >>>>> %1 = bitcast float* %input to i32* >>>>> %2 = load i32, i32* %1, align 4, !tbaa !1 >>>>> %3 = bitcast float* %max_value to i32* >>>>> store i32 %2, i32* %3, align 4, !tbaa !1 >>>>> >>>>> for.body: >>>>> %call = call dereferenceable(4) float* >>>>> @_ZSt3maxIfERKT_S2_S2_(float* nonnull >>>>> dereferenceable(4) %max_value, float* >>>>> dereferenceable(4) %arrayidx1) >>>>> %5 = bitcast float* %call to i32* >>>>> %6 = load i32, i32* %5, align 4, !tbaa !1 >>>>> %7 = bitcast float* %max_value to i32* >>>>> store i32 %6, i32* %7, align 4, !tbaa !1 >>>>> >>>>> -------------------- After SROA (the call to >>>>> std::max is inlined now):-------------------- >>>>> entry: >>>>> %max_value.sroa.0 = alloca i32 >>>>> %0 = bitcast float* %input to i32* >>>>> %1 = load i32, i32* %0, align 4, !tbaa !1 >>>>> store i32 %1, i32* %max_value.sroa.0 >>>>> >>>>> for.body: >>>>> %max_value.sroa.0.0.max_value.sroa.0.0.6 >>>>> load i32, i32* %max_value.sroa.0 >>>>> %3 = bitcast i32 >>>>> %max_value.sroa.0.0.max_value.sroa.0.0.6 to float >>>>> %max_value.sroa.0.0.max_value.sroa_cast8 >>>>> bitcast i32* %max_value.sroa.0 to float* >>>>> %__b.__a.i = select i1 %cmp.i, float* >>>>> %arrayidx1, float* >>>>> %max_value.sroa.0.0.max_value.sroa_cast8 >>>>> %5 = bitcast float* %__b.__a.i to i32* >>>>> %6 = load i32, i32* %5, align 4, !tbaa !1 >>>>> store i32 %6, i32* %max_value.sroa.0 >>>>> >>>>> -------------------- After SROA when >>>>> Canonicalization is turned >>>>> off-------------------- >>>>> entry: >>>>> %0 = load float, float* %input, align 4, >>>>> !tbaa !1 >>>>> >>>>> for.cond: ; preds = %for.body, %entry >>>>> %max_value.0 = phi float [ %0, %entry ], [ >>>>> %.sroa.speculated, %for.body ] >>>>> >>>>> for.body: >>>>> %1 = load float, float* %arrayidx1, align 4, >>>>> !tbaa !1 >>>>> %cmp.i = fcmp olt float %max_value.0, %1 >>>>> %.sroa.speculate.load.true = load float, >>>>> float* %arrayidx1, align 4, !tbaa !1 >>>>> %.sroa.speculated = select i1 %cmp.i, float >>>>> %.sroa.speculate.load.true, float %max_value.0 >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> llvm-dev at lists.llvm.org >>>>> <mailto:llvm-dev at lists.llvm.org> >>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> llvm-dev at lists.llvm.org >>>> <mailto:llvm-dev at lists.llvm.org> >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>> >>>> >>>> >>>> >>>> >>> >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160322/a61d8417/attachment-0001.html>
Mehdi Amini via llvm-dev
2016-Mar-22 20:34 UTC
[llvm-dev] RFC: A change in InstCombine canonical form
This is roughly what I wrote...> On Mar 22, 2016, at 1:31 PM, Philip Reames <listmail at philipreames.com> wrote: > > I feel very strongly that blocking work on making optimization bitcast-ignorant on the typeless pointer work would be a major mistake. Unless we expected the typeless pointer work to be concluded within the near term (say 3-6 months maximum), we should not block any development which would be accepted in the typeless pointer work wasn't planned. > > In my view, this is one of the largest mistakes we've made with the pass manager work, it has seriously cost us, and I don't want to see it happening again. > > Philip > > On 03/22/2016 01:09 PM, David Blaikie wrote: >> Ultimately everything is going to be made to not rely on the types of pointers - that's nearly equivalent to bitcast-ignorant (the difference being that the presence of an extra instruction (the bitcast) might trip up some optimizations - but the presence of the /type/ information implied by the bitcast should not trip up or be necessary for optimizations (two sides of the same coin)) >> >> If we're talking about making an optimization able to ignore the bitcast instructions - yes, that work is unnecessary & perhaps questionable given the typeless pointer work. Not outright off limits, but the same time might be better invested in moving typeless pointers forward if the contributor is so inclined/able to shift in that direction. >> >> But if we're talking about the work to make the optimization not use the type of pointers as a crutch - that work is a necessary precursor to the typeless pointer work and would be most welcome. >> >> - David >> >> On Tue, Mar 22, 2016 at 12:39 PM, Mehdi Amini <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote: >> I don't really mind, but the intermediate stage will not be very nice: that a lot of code / tests that needs to be written with bitcast, and all of that while they are deemed to disappear. The added value isn't clear to me considering the added work. I'm not sure it wouldn't add more work for all the cleanup required by the "typeless pointer", but I'm not sure what's involved here and if David thinks the intermediate steps of handling bit casts everywhere is not making it harder I'm fine with it. >> >> -- >> Mehdi >> >>> On Mar 22, 2016, at 12:36 PM, Philip Reames <listmail at philipreames.com <mailto:listmail at philipreames.com>> wrote: >>> >>> I'd phrase this differently: being pointer-bitcast agnostic is a step towards support typeless pointers. :) We can either become bitcast agnostic all in one big change or incrementally. Personally, I'd prefer the later since it reduces the risk associated with enabling typeless pointers in the end. >>> >>> Philip >>> >>> On 03/22/2016 12:16 PM, Mehdi Amini via llvm-dev wrote: >>>> I don't know enough about the tradeoff for 1, but 2 seems like a bandaid for something that is not a correctness issue neither a regression. I'm not sure it justifies "bandaid patches" while there is a clear path forward, i.e. typeless pointers, unless there is an acknowledgement that typeless pointers won't be there before a couple of years. >>>> >>>> -- >>>> Mehdi >>>> >>>>> On Mar 22, 2016, at 11:32 AM, Ehsan Amiri < <mailto:ehsanamiri at gmail.com>ehsanamiri at gmail.com <mailto:ehsanamiri at gmail.com>> wrote: >>>>> >>>>> Back to the discussion on the RFC, I still see some advantage in following the proposed solution. I see two paths forward: >>>>> >>>>> 1- Change canonical form, possibly lower memcpy to non-integer load and store in InstCombine. Then teach the backends to convert that to integer load and store if that is more profitable. Notice that we are talking about loads that have no use other than store. So it is a fairly simple change in the backends. >>>>> >>>>> 2- Do not change the canonical form. Then for this bug, we need to teach select speculation to see through bitcasts. We will probably need to teach other optimizations to see though bitcasts in the future as problems are uncovered. That is until typeless pointer work is complete. Once the typeless pointer work is complete, we have some extra code in each optimization for seeing through bitcasts which is possibly no longer needed. >>>>> >>>>> Based on this I think (1) is the right thing to do. But there might be other reasons for the current canonical form that I am not aware of. Please let me know what you think. >>>>> >>>>> Thanks >>>>> Ehsan >>>>> >>>>> On Wed, Mar 16, 2016 at 2:13 PM, David Blaikie < <mailto:dblaikie at gmail.com>dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote: >>>>> >>>>> >>>>> On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri < <mailto:ehsanamiri at gmail.com>ehsanamiri at gmail.com <mailto:ehsanamiri at gmail.com>> wrote: >>>>> David, >>>>> >>>>> Could you give us an update on the status of typeless pointer work? How much work is left and when you think it might be ready? >>>>> >>>>> It's a bit of an onion peel, really - since it will eventually involve generalizing/fixing every optimization that's currently leaning on typed pointers to keep the performance while removing the crutch they're currently leaning on. (in cases where bitcasts are literally just getting in the way, those won't require cleaning up & should just become "free performance wins" once we remove them, though) >>>>> >>>>> At the moment we can roundtrip every LLVM IR test case through bitcode and textual IR (reading/writing both formats) while using only a narrow whitelist of places that request the type of a pointer (things like the verifier, the parser/printer where it actually needs the typed pointer to verify it matches the explicit type, etc). >>>>> >>>>> The next thing on the list is probably figuring out the byval/inalloca representation (add an explicit pointee type? just make the number of bytes explicit with no type information?). >>>>> >>>>> Then start migrating optimizations over - doing the same sort of testing I did for the IR/bitcode roundtripping - assert that the pointee type is not accessed, whitelist places that need it until the bitcasts go away, fix anything else... it'll still be a fair bit of work & I don't really know how much. It should parallelize pretty well (doing any of this work is really helpful, each optimization is indepednent, etc) if anyone wants to/is able to help. >>>>> >>>>> - Dave >>>>> >>>>> >>>>> Thanks >>>>> Ehsan >>>>> >>>>> >>>>> On Wed, Mar 16, 2016 at 1:12 PM, David Blaikie < <mailto:dblaikie at gmail.com>dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote: >>>>> >>>>> >>>>> On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini via llvm-dev < <mailto:llvm-dev at lists.llvm.org>llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >>>>> Hi, >>>>> >>>>> How do it interact with the "typeless pointers" work? >>>>> >>>>> Right - the goal of the typeless pointer work is to fix all these bugs related to "didn't look through bitcasts" in optimizations. Sometimes that's going to mean more work (because the code is leaning on the absence of bitcasts & the presence of convenient (but not guaranteed) type information to inform optimization decisions) but if we remove typed pointers while keeping optimization quality in the cases we have today, then we should've also fixed the cases that were broken because the type information didn't end up aligning to produce the optimal output. >>>>> >>>>> & I know I've been off the typeless pointer stuff for a few months working on llvm-dwp - but happy for any help (the next immediate piece is probably figuring out teh right representation for byval and inalloca - there were some code reviews sent out for that that I'll need to come back around to - but also any optimizations people want to help rework/improve would be great too & I can provide some techniques/tools to help people approach those) >>>>> >>>>> - Dave >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> -- >>>>> Mehdi >>>>> >>>>>> On Mar 16, 2016, at 6:41 AM, Ehsan Amiri via llvm-dev < <mailto:llvm-dev at lists.llvm.org>llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >>>>>> >>>>>> === PROBLEM === (See this bug <https://llvm.org/bugs/show_bug.cgi?id=26445>https://llvm.org/bugs/show_bug.cgi?id=26445 <https://llvm.org/bugs/show_bug.cgi?id=26445>) >>>>>> >>>>>> IR contains code for loading a float from float * and storing it to a float * address. After canonicalization of load in InstCombine [1], new bitcasts are added to the IR (see bottom of the email for code samples). This prevents select speculation in SROA to work. Also after SROA we have bitcasts from int32 to float. (Whereas originally after instCombine, bitcasts are only done on pointer types). >>>>>> >>>>>> === PROPOSED SOLUTION==>>>>>> >>>>>> [1] implies that we need load canonicalization when we load a value only to store it again. The reason is to avoid generating slightly different code (due to different ways of adding bitcasts), in different situations. In all examples presented in [1] there is a non-zero number of bitcasts. I think when we load a value of type T from a T* address and store it as a type T value to one or more T* address (and there is no other use or store), we can redefine canonical form to mean there should not be any bitcasts. So we still have a canonical form, but its definition is slightly different. >>>>>> >>>>>> === REASONS FOR / AGAINST==>>>>>> >>>>>> Hal Finkel warns that while this may be useful for power pc, this may hurt more than one other platform and become a very large project. Despite this he is fine with bringing up the issue to the mailing list to get feedback, mostly because this seems inline with our future direction of having a unique type for all pointers. (Hal please correct me if I misunderstood your comment) >>>>>> >>>>>> This is a much simpler fix compared to alternatives. (ignoring potential regressions) >>>>>> >>>>>> === ALTERNATIVE SOLUTION ==>>>>>> >>>>>> Fix select speculation in SROA to see through bitcasts. Handle remaining bitcasts during code gen. Other alternative solutions are welcome. >>>>>> >>>>>> Should I implement the proposed solution or is it too risky? I understand that we may need to undo it if it breaks too many things. Comments are welcome. >>>>>> >>>>>> >>>>>> [1] <http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html>http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html <http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html> r226781 git commit id: b778cbc0c8 >>>>>> >>>>>> >>>>>> >>>>>> Code Samples (only relevant part is copied): >>>>>> >>>>>> -------------------- Before Canonicalization (contains call to std::max): -------------------- >>>>>> entry: >>>>>> %max_value = alloca float, align 4 >>>>>> %1 = load float, float* %input, align 4, !tbaa !1 >>>>>> store float %1, float* %max_value, align 4, !tbaa !1 >>>>>> >>>>>> for.body: >>>>>> %call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float* dereferenceable(4) %max_value, float* dereferenceable(4) %arrayidx1) >>>>>> %3 = load float, float* %call, align 4, !tbaa !1 >>>>>> store float %3, float* %max_value, align 4, !tbaa !1 >>>>>> >>>>>> -------------------- After Canonicalization (contains call to std::max):-------------------- >>>>>> >>>>>> entry: >>>>>> %max_value = alloca float, align 4 >>>>>> %1 = bitcast float* %input to i32* >>>>>> %2 = load i32, i32* %1, align 4, !tbaa !1 >>>>>> %3 = bitcast float* %max_value to i32* >>>>>> store i32 %2, i32* %3, align 4, !tbaa !1 >>>>>> >>>>>> for.body: >>>>>> %call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float* nonnull dereferenceable(4) %max_value, float* dereferenceable(4) %arrayidx1) >>>>>> %5 = bitcast float* %call to i32* >>>>>> %6 = load i32, i32* %5, align 4, !tbaa !1 >>>>>> %7 = bitcast float* %max_value to i32* >>>>>> store i32 %6, i32* %7, align 4, !tbaa !1 >>>>>> >>>>>> -------------------- After SROA (the call to std::max is inlined now):-------------------- >>>>>> entry: >>>>>> %max_value.sroa.0 = alloca i32 >>>>>> %0 = bitcast float* %input to i32* >>>>>> %1 = load i32, i32* %0, align 4, !tbaa !1 >>>>>> store i32 %1, i32* %max_value.sroa.0 >>>>>> >>>>>> for.body: >>>>>> %max_value.sroa.0.0.max_value.sroa.0.0.6 = load i32, i32* %max_value.sroa.0 >>>>>> %3 = bitcast i32 %max_value.sroa.0.0.max_value.sroa.0.0.6 to float >>>>>> %max_value.sroa.0.0.max_value.sroa_cast8 = bitcast i32* %max_value.sroa.0 to float* >>>>>> %__b.__a.i = select i1 %cmp.i, float* %arrayidx1, float* %max_value.sroa.0.0.max_value.sroa_cast8 >>>>>> %5 = bitcast float* %__b.__a.i to i32* >>>>>> %6 = load i32, i32* %5, align 4, !tbaa !1 >>>>>> store i32 %6, i32* %max_value.sroa.0 >>>>>> >>>>>> -------------------- After SROA when Canonicalization is turned off-------------------- >>>>>> entry: >>>>>> %0 = load float, float* %input, align 4, !tbaa !1 >>>>>> >>>>>> for.cond: ; preds = %for.body, %entry >>>>>> %max_value.0 = phi float [ %0, %entry ], [ %.sroa.speculated, %for.body ] >>>>>> >>>>>> for.body: >>>>>> %1 = load float, float* %arrayidx1, align 4, !tbaa !1 >>>>>> %cmp.i = fcmp olt float %max_value.0, %1 >>>>>> %.sroa.speculate.load.true = load float, float* %arrayidx1, align 4, !tbaa !1 >>>>>> %.sroa.speculated = select i1 %cmp.i, float %.sroa.speculate.load.true, float %max_value.0 >>>>>> _______________________________________________ >>>>>> LLVM Developers mailing list >>>>>> <mailto:llvm-dev at lists.llvm.org>llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >>>>>> <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> >>>>> >>>>> >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> <mailto:llvm-dev at lists.llvm.org>llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >>>>> <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> >>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> >>> >> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160322/fd290826/attachment.html>
Philip Reames via llvm-dev
2016-Mar-22 20:37 UTC
[llvm-dev] RFC: A change in InstCombine canonical form
But not what David was stating, unless I misread? I was specifically responding to David's wording: "If we're talking about making an optimization able to ignore the bitcast instructions - yes, that work is unnecessary & perhaps questionable given the typeless pointer work. Not outright off limits, but the same time might be better invested in moving typeless pointers forward if the contributor is so inclined/able to shift in that direction." Both "perhaps questionable" and "not outright off limits" seem to strongly imply such work should be discouraged. I disagree with that view which is why I wrote my response. Philip On 03/22/2016 01:34 PM, Mehdi Amini wrote:> This is roughly what I wrote... > >> On Mar 22, 2016, at 1:31 PM, Philip Reames <listmail at philipreames.com >> <mailto:listmail at philipreames.com>> wrote: >> >> I feel very strongly that blocking work on making optimization >> bitcast-ignorant on the typeless pointer work would be a major >> mistake. Unless we expected the typeless pointer work to be >> concluded within the near term (say 3-6 months maximum), we should >> not block any development which would be accepted in the typeless >> pointer work wasn't planned. >> >> In my view, this is one of the largest mistakes we've made with the >> pass manager work, it has seriously cost us, and I don't want to see >> it happening again. >> >> Philip >> >> On 03/22/2016 01:09 PM, David Blaikie wrote: >>> Ultimately everything is going to be made to not rely on the types >>> of pointers - that's nearly equivalent to bitcast-ignorant (the >>> difference being that the presence of an extra instruction (the >>> bitcast) might trip up some optimizations - but the presence of the >>> /type/ information implied by the bitcast should not trip up or be >>> necessary for optimizations (two sides of the same coin)) >>> >>> If we're talking about making an optimization able to ignore the >>> bitcast instructions - yes, that work is unnecessary & perhaps >>> questionable given the typeless pointer work. Not outright off >>> limits, but the same time might be better invested in moving >>> typeless pointers forward if the contributor is so inclined/able to >>> shift in that direction. >>> >>> But if we're talking about the work to make the optimization not use >>> the type of pointers as a crutch - that work is a necessary >>> precursor to the typeless pointer work and would be most welcome. >>> >>> - David >>> >>> On Tue, Mar 22, 2016 at 12:39 PM, Mehdi Amini <mehdi.amini at apple.com >>> <mailto:mehdi.amini at apple.com>> wrote: >>> >>> I don't really mind, but the intermediate stage will not be very >>> nice: that a lot of code / tests that needs to be written with >>> bitcast, and all of that while they are deemed to disappear. The >>> added value isn't clear to me considering the added work. I'm >>> not sure it wouldn't add more work for all the cleanup required >>> by the "typeless pointer", but I'm not sure what's involved here >>> and if David thinks the intermediate steps of handling bit casts >>> everywhere is not making it harder I'm fine with it. >>> >>> -- >>> Mehdi >>> >>>> On Mar 22, 2016, at 12:36 PM, Philip Reames >>>> <listmail at philipreames.com <mailto:listmail at philipreames.com>> >>>> wrote: >>>> >>>> I'd phrase this differently: being pointer-bitcast agnostic is >>>> a step towards support typeless pointers. :) We can either >>>> become bitcast agnostic all in one big change or >>>> incrementally. Personally, I'd prefer the later since it >>>> reduces the risk associated with enabling typeless pointers in >>>> the end. >>>> >>>> Philip >>>> >>>> On 03/22/2016 12:16 PM, Mehdi Amini via llvm-dev wrote: >>>>> I don't know enough about the tradeoff for 1, but 2 seems like >>>>> a bandaid for something that is not a correctness issue >>>>> neither a regression. I'm not sure it justifies "bandaid >>>>> patches" while there is a clear path forward, i.e. typeless >>>>> pointers, unless there is an acknowledgement that typeless >>>>> pointers won't be there before a couple of years. >>>>> >>>>> -- >>>>> Mehdi >>>>> >>>>>> On Mar 22, 2016, at 11:32 AM, Ehsan Amiri >>>>>> <ehsanamiri at gmail.com> wrote: >>>>>> >>>>>> Back to the discussion on the RFC, I still see some advantage >>>>>> in following the proposed solution. I see two paths forward: >>>>>> >>>>>> 1- Change canonical form, possibly lower memcpy to >>>>>> non-integer load and store in InstCombine. Then teach the >>>>>> backends to convert that to integer load and store if that is >>>>>> more profitable. Notice that we are talking about loads that >>>>>> have no use other than store. So it is a fairly simple change >>>>>> in the backends. >>>>>> >>>>>> 2- Do not change the canonical form. Then for this bug, we >>>>>> need to teach select speculation to see through bitcasts. We >>>>>> will probably need to teach other optimizations to see though >>>>>> bitcasts in the future as problems are uncovered. That is >>>>>> until typeless pointer work is complete. Once the typeless >>>>>> pointer work is complete, we have some extra code in each >>>>>> optimization for seeing through bitcasts which is possibly no >>>>>> longer needed. >>>>>> >>>>>> Based on this I think (1) is the right thing to do. But there >>>>>> might be other reasons for the current canonical form that I >>>>>> am not aware of. Please let me know what you think. >>>>>> >>>>>> Thanks >>>>>> Ehsan >>>>>> >>>>>> On Wed, Mar 16, 2016 at 2:13 PM, David Blaikie >>>>>> <dblaikie at gmail.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri >>>>>> <ehsanamiri at gmail.com> wrote: >>>>>> >>>>>> David, >>>>>> >>>>>> Could you give us an update on the status of typeless >>>>>> pointer work? How much work is left and when you >>>>>> think it might be ready? >>>>>> >>>>>> >>>>>> It's a bit of an onion peel, really - since it will >>>>>> eventually involve generalizing/fixing every optimization >>>>>> that's currently leaning on typed pointers to keep the >>>>>> performance while removing the crutch they're currently >>>>>> leaning on. (in cases where bitcasts are literally just >>>>>> getting in the way, those won't require cleaning up & >>>>>> should just become "free performance wins" once we remove >>>>>> them, though) >>>>>> >>>>>> At the moment we can roundtrip every LLVM IR test case >>>>>> through bitcode and textual IR (reading/writing both >>>>>> formats) while using only a narrow whitelist of places >>>>>> that request the type of a pointer (things like the >>>>>> verifier, the parser/printer where it actually needs the >>>>>> typed pointer to verify it matches the explicit type, etc). >>>>>> >>>>>> The next thing on the list is probably figuring out the >>>>>> byval/inalloca representation (add an explicit pointee >>>>>> type? just make the number of bytes explicit with no type >>>>>> information?). >>>>>> >>>>>> Then start migrating optimizations over - doing the same >>>>>> sort of testing I did for the IR/bitcode roundtripping - >>>>>> assert that the pointee type is not accessed, whitelist >>>>>> places that need it until the bitcasts go away, fix >>>>>> anything else... it'll still be a fair bit of work & I >>>>>> don't really know how much. It should parallelize pretty >>>>>> well (doing any of this work is really helpful, each >>>>>> optimization is indepednent, etc) if anyone wants to/is >>>>>> able to help. >>>>>> >>>>>> - Dave >>>>>> >>>>>> >>>>>> Thanks >>>>>> Ehsan >>>>>> >>>>>> >>>>>> On Wed, Mar 16, 2016 at 1:12 PM, David Blaikie >>>>>> <dblaikie at gmail.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini via >>>>>> llvm-dev <llvm-dev at lists.llvm.org> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> How do it interact with the "typeless >>>>>> pointers" work? >>>>>> >>>>>> >>>>>> Right - the goal of the typeless pointer work is >>>>>> to fix all these bugs related to "didn't look >>>>>> through bitcasts" in optimizations. Sometimes >>>>>> that's going to mean more work (because the code >>>>>> is leaning on the absence of bitcasts & the >>>>>> presence of convenient (but not guaranteed) type >>>>>> information to inform optimization decisions) but >>>>>> if we remove typed pointers while keeping >>>>>> optimization quality in the cases we have today, >>>>>> then we should've also fixed the cases that were >>>>>> broken because the type information didn't end up >>>>>> aligning to produce the optimal output. >>>>>> >>>>>> & I know I've been off the typeless pointer stuff >>>>>> for a few months working on llvm-dwp - but happy >>>>>> for any help (the next immediate piece is >>>>>> probably figuring out teh right representation >>>>>> for byval and inalloca - there were some code >>>>>> reviews sent out for that that I'll need to come >>>>>> back around to - but also any optimizations >>>>>> people want to help rework/improve would be great >>>>>> too & I can provide some techniques/tools to help >>>>>> people approach those) >>>>>> >>>>>> - Dave >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> -- >>>>>> Mehdi >>>>>> >>>>>>> On Mar 16, 2016, at 6:41 AM, Ehsan Amiri via >>>>>>> llvm-dev <llvm-dev at lists.llvm.org> wrote: >>>>>>> >>>>>>> === PROBLEM === (See this bug >>>>>>> https://llvm.org/bugs/show_bug.cgi?id=26445) >>>>>>> >>>>>>> IR contains code for loading a float from >>>>>>> float * and storing it to a float * address. >>>>>>> After canonicalization of load in >>>>>>> InstCombine [1], new bitcasts are added to >>>>>>> the IR (see bottom of the email for code >>>>>>> samples). This prevents select speculation >>>>>>> in SROA to work. Also after SROA we have >>>>>>> bitcasts from int32 to float. (Whereas >>>>>>> originally after instCombine, bitcasts are >>>>>>> only done on pointer types). >>>>>>> >>>>>>> === PROPOSED SOLUTION==>>>>>>> >>>>>>> [1] implies that we need load >>>>>>> canonicalization when we load a value only >>>>>>> to store it again. The reason is to avoid >>>>>>> generating slightly different code (due to >>>>>>> different ways of adding bitcasts), in >>>>>>> different situations. In all examples >>>>>>> presented in [1] there is a non-zero number >>>>>>> of bitcasts. I think when we load a value of >>>>>>> type T from a T* address and store it as a >>>>>>> type T value to one or more T* address (and >>>>>>> there is no other use or store), we can >>>>>>> redefine canonical form to mean there should >>>>>>> not be any bitcasts. So we still have a >>>>>>> canonical form, but its definition is >>>>>>> slightly different. >>>>>>> >>>>>>> === REASONS FOR / AGAINST==>>>>>>> >>>>>>> Hal Finkel warns that while this may be >>>>>>> useful for power pc, this may hurt more than >>>>>>> one other platform and become a very large >>>>>>> project. Despite this he is fine with >>>>>>> bringing up the issue to the mailing list to >>>>>>> get feedback, mostly because this seems >>>>>>> inline with our future direction of having a >>>>>>> unique type for all pointers. (Hal please >>>>>>> correct me if I misunderstood your comment) >>>>>>> >>>>>>> This is a much simpler fix compared to >>>>>>> alternatives. (ignoring potential regressions) >>>>>>> >>>>>>> === ALTERNATIVE SOLUTION ==>>>>>>> >>>>>>> Fix select speculation in SROA to see >>>>>>> through bitcasts. Handle remaining bitcasts >>>>>>> during code gen. Other alternative solutions >>>>>>> are welcome. >>>>>>> >>>>>>> Should I implement the proposed solution or >>>>>>> is it too risky? I understand that we may >>>>>>> need to undo it if it breaks too many >>>>>>> things. Comments are welcome. >>>>>>> >>>>>>> >>>>>>> [1] >>>>>>> http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html >>>>>>> r226781 git commit id: b778cbc0c8 >>>>>>> >>>>>>> >>>>>>> >>>>>>> Code Samples (only relevant part is copied): >>>>>>> >>>>>>> -------------------- Before Canonicalization >>>>>>> (contains call to std::max): >>>>>>> -------------------- >>>>>>> entry: >>>>>>> %max_value = alloca float, align 4 >>>>>>> %1 = load float, float* %input, align 4, >>>>>>> !tbaa !1 >>>>>>> store float %1, float* %max_value, align >>>>>>> 4, !tbaa !1 >>>>>>> >>>>>>> for.body: >>>>>>> %call = call dereferenceable(4) float* >>>>>>> @_ZSt3maxIfERKT_S2_S2_(float* >>>>>>> dereferenceable(4) %max_value, float* >>>>>>> dereferenceable(4) %arrayidx1) >>>>>>> %3 = load float, float* %call, align 4, >>>>>>> !tbaa !1 >>>>>>> store float %3, float* %max_value, align >>>>>>> 4, !tbaa !1 >>>>>>> >>>>>>> -------------------- After Canonicalization >>>>>>> (contains call to >>>>>>> std::max):-------------------- >>>>>>> >>>>>>> entry: >>>>>>> %max_value = alloca float, align 4 >>>>>>> %1 = bitcast float* %input to i32* >>>>>>> %2 = load i32, i32* %1, align 4, !tbaa !1 >>>>>>> %3 = bitcast float* %max_value to i32* >>>>>>> store i32 %2, i32* %3, align 4, !tbaa !1 >>>>>>> >>>>>>> for.body: >>>>>>> %call = call dereferenceable(4) float* >>>>>>> @_ZSt3maxIfERKT_S2_S2_(float* nonnull >>>>>>> dereferenceable(4) %max_value, float* >>>>>>> dereferenceable(4) %arrayidx1) >>>>>>> %5 = bitcast float* %call to i32* >>>>>>> %6 = load i32, i32* %5, align 4, !tbaa !1 >>>>>>> %7 = bitcast float* %max_value to i32* >>>>>>> store i32 %6, i32* %7, align 4, !tbaa !1 >>>>>>> >>>>>>> -------------------- After SROA (the call to >>>>>>> std::max is inlined now):-------------------- >>>>>>> entry: >>>>>>> %max_value.sroa.0 = alloca i32 >>>>>>> %0 = bitcast float* %input to i32* >>>>>>> %1 = load i32, i32* %0, align 4, !tbaa !1 >>>>>>> store i32 %1, i32* %max_value.sroa.0 >>>>>>> >>>>>>> for.body: >>>>>>> %max_value.sroa.0.0.max_value.sroa.0.0.6 >>>>>>> load i32, i32* %max_value.sroa.0 >>>>>>> %3 = bitcast i32 >>>>>>> %max_value.sroa.0.0.max_value.sroa.0.0.6 to >>>>>>> float >>>>>>> %max_value.sroa.0.0.max_value.sroa_cast8 >>>>>>> bitcast i32* %max_value.sroa.0 to float* >>>>>>> %__b.__a.i = select i1 %cmp.i, float* >>>>>>> %arrayidx1, float* >>>>>>> %max_value.sroa.0.0.max_value.sroa_cast8 >>>>>>> %5 = bitcast float* %__b.__a.i to i32* >>>>>>> %6 = load i32, i32* %5, align 4, !tbaa !1 >>>>>>> store i32 %6, i32* %max_value.sroa.0 >>>>>>> >>>>>>> -------------------- After SROA when >>>>>>> Canonicalization is turned >>>>>>> off-------------------- >>>>>>> entry: >>>>>>> %0 = load float, float* %input, align 4, >>>>>>> !tbaa !1 >>>>>>> >>>>>>> for.cond: ; preds = %for.body, %entry >>>>>>> %max_value.0 = phi float [ %0, %entry ], [ >>>>>>> %.sroa.speculated, %for.body ] >>>>>>> >>>>>>> for.body: >>>>>>> %1 = load float, float* %arrayidx1, align >>>>>>> 4, !tbaa !1 >>>>>>> %cmp.i = fcmp olt float %max_value.0, %1 >>>>>>> %.sroa.speculate.load.true = load float, >>>>>>> float* %arrayidx1, align 4, !tbaa !1 >>>>>>> %.sroa.speculated = select i1 %cmp.i, float >>>>>>> %.sroa.speculate.load.true, float %max_value.0 >>>>>>> _______________________________________________ >>>>>>> LLVM Developers mailing list >>>>>>> llvm-dev at lists.llvm.org >>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> LLVM Developers mailing list >>>>>> llvm-dev at lists.llvm.org >>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>> >>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160322/0f9ac7d6/attachment-0001.html>