Juneyoung Lee via llvm-dev
2019-Jan-15 16:11 UTC
[llvm-dev] Reducing the number of ptrtoint/inttoptrs that are generated by LLVM
Hello Chandler,> ``` > void f(int *arr1, int *arr2, int length) { > intptr_t rel_offset = arr2 - arr1; > int *arr1_end = arr1 + length; > for (int *p = arr1; p < arr1_end; p += sizeof(int)) { > do_something(p, p + rel_offset); > } > } > ``` > > For example, a common loop optimization technique is something like thefollowing in *pseudocode* (note that this is> not necessarily valid C but we might reasonably want to express this kindof optimization within LLVM: When LLVM need to insert subtraction between two pointers, like the example you wrote, I believe we can use 'sub(ptrtoint p, ptrtoint q)' as before. This addresses concerns regarding correctness of existing pointer analysis. So you don't lose expressiveness, while you gain precision when possible. A concrete suggestion is to add a parameter like 'bool use_psub' to IRBuilder::CreatePtrDiff. When Clang inserts a pointer subtraction, use_psub is set to true. When LLVM inserts it, it is set as false. Default value of use_psub is false, so existing LLVM passes will still insert sub (ptrtoint, ptrtoint) but most of ptrtoints are inserted from Clang, so # of ptrtoint will significantly decrease as well. Adding an "inbounds" flag to psub doesn't seem necessary since you can achieve the same thing with sub(ptrtoint, ptrtoint).> This doesn't make a lot of sense to me... the pointer subtractionssemantics depend on whether the> input was rinsed through `inttoptr`? It also doesn't handle the case Idescribe above. The motivation is that (as far as I understand) LLVM already treat pointers casted from inttoptr differently from pointers from gep. We tentatively call these "physical pointers" (as opposed to "logical pointers"). For example, GEP document says that inttoptr(i + ptrtoint p) is different from gep(p, i) - it says 'most of GEP’s special aliasing rules do not apply to pointers computed from ptrtoint, arithmetic, and inttoptr sequences', implying that it can touch any object in LLVM. Juneyoung Lee On Mon, Jan 14, 2019 at 10:59 PM Chandler Carruth <chandlerc at gmail.com> wrote:> Note that we should keep the discussion of instcombine and > canonicalization completely separate. It is an independent question IMO. > Only responding to the pointer subtraction point here. > > On Mon, Jan 14, 2019 at 12:56 PM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr> > wrote: > >> > First and foremost - how do you address correctness issues here? >> Because the subtraction `A - B` can escape/capture more things. >> Specifically, if one of `A` or `B` is escaped/captured, the >> > subtraction can be used to escape or capture the other pointer. So >> *some* of the conservative treatment is necessary. What is the plan to >> update all the analyses to remain correct? What >> > correctness testing have you done? >> >> Correctness of psub is guaranteed by the specification of pointer >> subtraction of C/C++. >> > > LLVM is not a C/C++ compiler backend. We cannot rely on frontend language > semantics to make the optimizer correct. > > LLVM has its own semantic model and we need *it* to be correct. > > Also, the reality is that we *do* have the case I described. Even if the > source language does not form this directly, LLVM itself can form the exact > pattern I describe? We would need rules in LLVM's IR that preclude this at > a semantic level and to find and remove optimizations that violate that. > Have we done this? > > I think this may be much harder than it seems due to fundamental > transformations. If we prove that `C = (A - B)`, even though we originally > got `C` directly, we might replace it with `A - B`. This subtraction > doesn't even have to be something we can compute, we just need a proof that > it is equivalent. > > For example, a common loop optimization technique is something like the > following in *pseudocode* (note that this is not necessarily valid C but we > might reasonably want to express this kind of optimization within LLVM: > > ``` > void f(int *arr1, int *arr2, int length) { > intptr_t rel_offset = arr2 - arr1; > int *arr1_end = arr1 + length; > for (int *p = arr1; p < arr1_end; p += sizeof(int)) { > do_something(p, p + rel_offset); > } > } > ``` > > This kind of thing should be representable in LLVM's IR in my opinion. > > An analogous situation is with `getelementptr` vs `getelementptr > inbounds`. I think you'll need something similar, and you'll want to update > the optimizer to track the non-inbounds version as potentially escaping > (much as ptrtoint is potentially escaping). This is still distinct from > ptrtoint as things like address spaces and other properties of the pointers > are not necessarily lost by this operation. > > When two pointers are subtracted, both shall point to elements of the same >> array object, or one past the last element of the array object (6.5.6.9). >> So, if the two pointers p and q point to different objects, we can define >> llvm.psub(p,q) as poison. >> Other than meeting C specification, correctness of llvm.psub is tested >> with SPEC CPU2017 and LLVM Nightly Tests as well. >> > > SPEC and the nightly test suite are great at finding problems, but quite > bad at giving confindence that there are not problems. For subtle things > like this, I would expect you need to look an a much larger sampling of > real-world application code. > > >> >> But it is true that sometimes pointer subtraction is used to get distance >> between two objects. >> Most common case is doing something like 'p - NULL', and this pattern >> exists in SPEC CPU2017, for example spec_qsort.c in mcf_r . >> Our suggestion is to define 'p - q' correctly return the distance between >> p and q if either p or q is based on inttoptr(i). This naturally explains >> 'p - NULL' because NULL is equivalent to inttoptr(0). >> > > This doesn't make a lot of sense to me... the pointer subtractions > semantics depend on whether the input was rinsed through `inttoptr`? It > also doesn't handle the case I describe above. > > >> Regarding analysis - what I've observed is that analysis was done after >> pointer subtraction was optimized into another form. >> For example, if '(p - q) == 0' was given, this is transformed into 'p =>> q', and then some analysis was done. >> Good thing is that these transformations can be simply applied to >> llvm.psub as well, which will reenable analysis. >> Also we're adding a new operation here, so existing analysis wouldn't be >> incorrect, but wouldn't fire. >> Fortunately, the performance impact after changing llvm.psub wasn't big. >> >> > Second - an intrinsic seems a poor fit here given the significance of >> this operation. We have an instruction that covers most pointer arithmetic >> (`getelementptr`), and I can imagine growing >> > pointer subtraction, but it seems like it should be an instruction if >> we're going to have it. Based on the above, we will need to use it very >> often in analysis. >> >> I also think that defining psub as instruction is fine. :) >> >> > Regarding the instcombine, it should be very easy to keep loads and >> stores of pointers as pointer typed in instcombine. Likely just a missing >> case in the code I added/touched there. >> >> That's really good. :) I found that combineLoadToOperationType from >> InstCombineLoadStoreAlloca was responsible for the transformation. >> I can upload a patch for that if ok. >> >> Best Regards, >> Juneyoung Lee >> >> On Mon, Jan 14, 2019 at 5:36 PM Chandler Carruth <chandlerc at gmail.com> >> wrote: >> >>> While I'm very interested in the end result here, I have some questions >>> that don't seem well answered yet around pointer subtraction... >>> >>> First and foremost - how do you address correctness issues here? Because >>> the subtraction `A - B` can escape/capture more things. Specifically, if >>> one of `A` or `B` is escaped/captured, the subtraction can be used to >>> escape or capture the other pointer. So *some* of the conservative >>> treatment is necessary. What is the plan to update all the analyses to >>> remain correct? What correctness testing have you done? >>> >>> Second - an intrinsic seems a poor fit here given the significance of >>> this operation. We have an instruction that covers most pointer arithmetic >>> (`getelementptr`), and I can imagine growing pointer subtraction, but it >>> seems like it should be an instruction if we're going to have it. Based on >>> the above, we will need to use it very often in analysis. >>> >>> >>> Regarding the instcombine, it should be very easy to keep loads and >>> stores of pointers as pointer typed in instcombine. Likely just a missing >>> case in the code I added/touched there. >>> >>> On Mon, Jan 14, 2019 at 3:23 AM Juneyoung Lee via llvm-dev < >>> llvm-dev at lists.llvm.org> wrote: >>> >>>> Hello all, >>>> >>>> This is a proposal for reducing # of ptrtoint/inttoptr casts which are >>>> not >>>> written by programmers but rather generated by LLVM passes. >>>> Currently the majority of ptrtoint/inttoptr casts are generated by LLVM; >>>> when compiling SPEC 2017 with LLVM r348082 (Dec 2 2018) with -O3, >>>> the output IR contains 22,771 inttoptr instructions. However, when >>>> compiling it with -O0, there are only 1048 inttoptrs, meaning that 95.4% >>>> of them are generated by LLVM passes. >>>> >>>> This trend is similar in ptrtoint instruction as well. When compiling >>>> SPEC 2017 >>>> with -O0, there are 23,208 ptrtoint instructions, but among them 22,016 >>>> (94.8%) >>>> are generated by Clang frontend to represent pointer subtraction. >>>> They aren't effectively optimized out because there are even more >>>> ptrtoints (31,721) after -O3. >>>> This is bad for performance because existence of ptrtoint makes >>>> analysis return conservative >>>> result as a pointer can be escaped through the cast. >>>> Memory accesses to a pointer came from inttoptr is assumed >>>> to possibly access anywhere, therefore it may block >>>> store-to-load forwarding, merging two same loads, etc. >>>> >>>> I believe this can be addressed by applying two patches - first one is >>>> representing pointer subtraction with a dedicated intrinsic function, >>>> llvm.psub, and second one is disabling InstCombine transformation >>>> >>>> %q = load i8*, i8** %p1 >>>> store i8* %q, i8** %p2 >>>> => >>>> %1 = bitcast i8** %p1 to i64* >>>> %q1 = load i64, i64* %1, align 8 >>>> %2 = bitcast i8** %p2 to i64* >>>> store i64 %q1, i64* %2, align 8 >>>> >>>> This transformation can introduce inttoptrs later if loads are followed >>>> (https://godbolt.org/z/wsZ3II ). Both are discussed in >>>> https://bugs.llvm.org/show_bug.cgi?id=39846 as well. >>>> After llvm.psub is used & this transformation is disabled, # of >>>> inttoptrs decreases from 22,771 to 1,565 (6.9%), and # of ptrtoints >>>> decreases from 31,721 to 7,772 (24.5%). >>>> >>>> I'll introduce llvm.psub patch first. >>>> >>>> >>>> --- Adding llvm.psub --- >>>> >>>> By defining pointer subtraction intrinsic, we can get performance gain >>>> because it gives more undefined behavior than just subtracting two >>>> ptrtoints. >>>> >>>> Patch https://reviews.llvm.org/D56598 adds llvm.psub(p1,p2) intrinsic >>>> function, which subtracts two pointers and returns the difference. Its >>>> semantic is as follows. >>>> If p1 and p2 point to different objects, and neither of them is based >>>> on a pointer casted from an integer, `llvm.psub(p1, p2)` returns poison. >>>> For example, >>>> >>>> %p = alloca >>>> %q = alloca >>>> %i = llvm.psub(p, q) ; %i is poison >>>> >>>> This allows aggressive escape analysis on pointers. Given i >>>> llvm.psub(p1, p2), if neither of p1 and p2 is based on a pointer casted >>>> from an integer, the llvm.psub call does not make p1 or p2 escape. ( >>>> https://reviews.llvm.org/D56601 ) >>>> >>>> If either p1 or p2 is based on a pointer casted from integer, or p1 and >>>> p2 point to a same object, it returns the result of subtraction (in bytes); >>>> for example, >>>> >>>> %p = alloca >>>> %q = inttoptr %x >>>> %i = llvm.psub(p, q) ; %i is equivalent to (ptrtoint %p) - %x >>>> >>>> `null` is regarded as a pointer casted from an integer because >>>> it is equivalent to `inttoptr 0`. >>>> >>>> Adding llvm.psub allows LLVM to utilize significant portion of >>>> ptrtoints & reduce a portion of inttoptrs. After llvm.psub is used, when >>>> SPECrate 2017 is compiled with -O3, # of inttoptr decreases to ~13,500 >>>> (59%) and # of ptrtoint decreases to ~14,300 (45%). >>>> >>>> To see the performance change, I ran SPECrate 2017 (thread # = 1) with >>>> three versions of LLVM, which are r313797 (Sep 21, 2017), LLVM 6.0 >>>> official, and r348082 (Dec 2, 2018). >>>> Running r313797 shows that 505.mcf_r has consistent 2.0% speedup over 3 >>>> different machines (which are i3-6100, i5-6600, i7-7700). For LLVM 6.0 and >>>> r348082, there's neither consistent speedup nor slowdown, but the average >>>> speedup is near 0. I believe there's still a room of improvement because >>>> there are passes which are not aware of llvm.psub. >>>> >>>> Thank you for reading this, and any comment is welcome. >>>> >>>> Best Regards, >>>> Juneyoung Lee >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> llvm-dev at lists.llvm.org >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>> >>> >> >> -- >> >> Juneyoung Lee >> Software Foundation Lab, Seoul National University >> >-- Juneyoung Lee Software Foundation Lab, Seoul National University -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190115/33f69897/attachment.html>
Chandler Carruth via llvm-dev
2019-Jan-15 21:52 UTC
[llvm-dev] Reducing the number of ptrtoint/inttoptrs that are generated by LLVM
On Tue, Jan 15, 2019 at 8:11 AM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr> wrote:> Hello Chandler, > > > ``` > > void f(int *arr1, int *arr2, int length) { > > intptr_t rel_offset = arr2 - arr1; > > int *arr1_end = arr1 + length; > > for (int *p = arr1; p < arr1_end; p += sizeof(int)) { > > do_something(p, p + rel_offset); > > } > > } > > ``` > > > > For example, a common loop optimization technique is something like the > following in *pseudocode* (note that this is > > not necessarily valid C but we might reasonably want to express this > kind of optimization within LLVM: > > When LLVM need to insert subtraction between two pointers, like the > example you wrote, I believe we can use 'sub(ptrtoint p, ptrtoint q)' as > before. This addresses concerns regarding correctness of existing pointer > analysis. So you don't lose expressiveness, while you gain precision when > possible. > > A concrete suggestion is to add a parameter like 'bool use_psub' to > IRBuilder::CreatePtrDiff. When Clang inserts a pointer subtraction, > use_psub is set to true. When LLVM inserts it, it is set as false. Default > value of use_psub is false, so existing LLVM passes will still insert sub > (ptrtoint, ptrtoint) but most of ptrtoints are inserted from Clang, so # of > ptrtoint will significantly decrease as well. > > Adding an "inbounds" flag to psub doesn't seem necessary since you can > achieve the same thing with sub(ptrtoint, ptrtoint). >I don't follow -- if we continue to carry based-on aliasing semantics in GEPs (as you mention below), then it seems like the same kinds of motivations fro non-inbounds GEPs would motivate non-inbounds pointer subtraction (without degrading to integers and losing all "based on" aliasing information). As a concrete example, one use case for non-inbounds GEPs is to allow two displacements to be applied to a pointer that may move it outside the allocation, but combined will return to the allocation and be valid. It would seem useful to be able to *compute* such displacements as well. I think an important design point is that any valid pointer and any *intermediate* pointers in a GEP computation should be valid operands to pointer subtraction.> > > This doesn't make a lot of sense to me... the pointer subtractions > semantics depend on whether the > > input was rinsed through `inttoptr`? It also doesn't handle the case I > describe above. > > The motivation is that (as far as I understand) LLVM already treat > pointers casted from inttoptr differently from pointers from gep. > We tentatively call these "physical pointers" (as opposed to "logical > pointers"). > For example, GEP document says that inttoptr(i + ptrtoint p) is different > from gep(p, i) - it says 'most of GEP’s special aliasing rules > do not apply to pointers computed from ptrtoint, arithmetic, and inttoptr > sequences', implying that it can touch any object in LLVM. >I understand that the LangRef currently says all of these things. But I am not sure that it reflects reality in the face of CSE-like transformations. Those tend to break "based on" aliasing systems in many cases. I'd appreciate the folks who have been studying the failure to fully formally specify the aliasing semantics to chime in -- maybe all of the issues here have been thoroughly addressed. If so, great. If not, then I think we will need to be careful here. I've already suggested several patterns for how you could evaluate the safety in practical ways (as opposed to citing the language reference), and I think that plus hearing from folks who have worked on these formalizing conflicts in the IR are the next steps.> > Juneyoung Lee > > On Mon, Jan 14, 2019 at 10:59 PM Chandler Carruth <chandlerc at gmail.com> > wrote: > >> Note that we should keep the discussion of instcombine and >> canonicalization completely separate. It is an independent question IMO. >> Only responding to the pointer subtraction point here. >> >> On Mon, Jan 14, 2019 at 12:56 PM Juneyoung Lee < >> juneyoung.lee at sf.snu.ac.kr> wrote: >> >>> > First and foremost - how do you address correctness issues here? >>> Because the subtraction `A - B` can escape/capture more things. >>> Specifically, if one of `A` or `B` is escaped/captured, the >>> > subtraction can be used to escape or capture the other pointer. So >>> *some* of the conservative treatment is necessary. What is the plan to >>> update all the analyses to remain correct? What >>> > correctness testing have you done? >>> >>> Correctness of psub is guaranteed by the specification of pointer >>> subtraction of C/C++. >>> >> >> LLVM is not a C/C++ compiler backend. We cannot rely on frontend language >> semantics to make the optimizer correct. >> >> LLVM has its own semantic model and we need *it* to be correct. >> >> Also, the reality is that we *do* have the case I described. Even if the >> source language does not form this directly, LLVM itself can form the exact >> pattern I describe? We would need rules in LLVM's IR that preclude this at >> a semantic level and to find and remove optimizations that violate that. >> Have we done this? >> >> I think this may be much harder than it seems due to fundamental >> transformations. If we prove that `C = (A - B)`, even though we originally >> got `C` directly, we might replace it with `A - B`. This subtraction >> doesn't even have to be something we can compute, we just need a proof that >> it is equivalent. >> >> For example, a common loop optimization technique is something like the >> following in *pseudocode* (note that this is not necessarily valid C but we >> might reasonably want to express this kind of optimization within LLVM: >> >> ``` >> void f(int *arr1, int *arr2, int length) { >> intptr_t rel_offset = arr2 - arr1; >> int *arr1_end = arr1 + length; >> for (int *p = arr1; p < arr1_end; p += sizeof(int)) { >> do_something(p, p + rel_offset); >> } >> } >> ``` >> >> This kind of thing should be representable in LLVM's IR in my opinion. >> >> An analogous situation is with `getelementptr` vs `getelementptr >> inbounds`. I think you'll need something similar, and you'll want to update >> the optimizer to track the non-inbounds version as potentially escaping >> (much as ptrtoint is potentially escaping). This is still distinct from >> ptrtoint as things like address spaces and other properties of the pointers >> are not necessarily lost by this operation. >> >> When two pointers are subtracted, both shall point to elements of the >>> same array object, or one past the last element of the array object >>> (6.5.6.9). >>> So, if the two pointers p and q point to different objects, we can >>> define llvm.psub(p,q) as poison. >>> Other than meeting C specification, correctness of llvm.psub is tested >>> with SPEC CPU2017 and LLVM Nightly Tests as well. >>> >> >> SPEC and the nightly test suite are great at finding problems, but quite >> bad at giving confindence that there are not problems. For subtle things >> like this, I would expect you need to look an a much larger sampling of >> real-world application code. >> >> >>> >>> But it is true that sometimes pointer subtraction is used to get >>> distance between two objects. >>> Most common case is doing something like 'p - NULL', and this pattern >>> exists in SPEC CPU2017, for example spec_qsort.c in mcf_r . >>> Our suggestion is to define 'p - q' correctly return the distance >>> between p and q if either p or q is based on inttoptr(i). This naturally >>> explains 'p - NULL' because NULL is equivalent to inttoptr(0). >>> >> >> This doesn't make a lot of sense to me... the pointer subtractions >> semantics depend on whether the input was rinsed through `inttoptr`? It >> also doesn't handle the case I describe above. >> >> >>> Regarding analysis - what I've observed is that analysis was done after >>> pointer subtraction was optimized into another form. >>> For example, if '(p - q) == 0' was given, this is transformed into 'p =>>> q', and then some analysis was done. >>> Good thing is that these transformations can be simply applied to >>> llvm.psub as well, which will reenable analysis. >>> Also we're adding a new operation here, so existing analysis wouldn't be >>> incorrect, but wouldn't fire. >>> Fortunately, the performance impact after changing llvm.psub wasn't big. >>> >>> > Second - an intrinsic seems a poor fit here given the significance of >>> this operation. We have an instruction that covers most pointer arithmetic >>> (`getelementptr`), and I can imagine growing >>> > pointer subtraction, but it seems like it should be an instruction if >>> we're going to have it. Based on the above, we will need to use it very >>> often in analysis. >>> >>> I also think that defining psub as instruction is fine. :) >>> >>> > Regarding the instcombine, it should be very easy to keep loads and >>> stores of pointers as pointer typed in instcombine. Likely just a missing >>> case in the code I added/touched there. >>> >>> That's really good. :) I found that combineLoadToOperationType from >>> InstCombineLoadStoreAlloca was responsible for the transformation. >>> I can upload a patch for that if ok. >>> >>> Best Regards, >>> Juneyoung Lee >>> >>> On Mon, Jan 14, 2019 at 5:36 PM Chandler Carruth <chandlerc at gmail.com> >>> wrote: >>> >>>> While I'm very interested in the end result here, I have some questions >>>> that don't seem well answered yet around pointer subtraction... >>>> >>>> First and foremost - how do you address correctness issues here? >>>> Because the subtraction `A - B` can escape/capture more things. >>>> Specifically, if one of `A` or `B` is escaped/captured, the subtraction can >>>> be used to escape or capture the other pointer. So *some* of the >>>> conservative treatment is necessary. What is the plan to update all the >>>> analyses to remain correct? What correctness testing have you done? >>>> >>>> Second - an intrinsic seems a poor fit here given the significance of >>>> this operation. We have an instruction that covers most pointer arithmetic >>>> (`getelementptr`), and I can imagine growing pointer subtraction, but it >>>> seems like it should be an instruction if we're going to have it. Based on >>>> the above, we will need to use it very often in analysis. >>>> >>>> >>>> Regarding the instcombine, it should be very easy to keep loads and >>>> stores of pointers as pointer typed in instcombine. Likely just a missing >>>> case in the code I added/touched there. >>>> >>>> On Mon, Jan 14, 2019 at 3:23 AM Juneyoung Lee via llvm-dev < >>>> llvm-dev at lists.llvm.org> wrote: >>>> >>>>> Hello all, >>>>> >>>>> This is a proposal for reducing # of ptrtoint/inttoptr casts which are >>>>> not >>>>> written by programmers but rather generated by LLVM passes. >>>>> Currently the majority of ptrtoint/inttoptr casts are generated by >>>>> LLVM; >>>>> when compiling SPEC 2017 with LLVM r348082 (Dec 2 2018) with -O3, >>>>> the output IR contains 22,771 inttoptr instructions. However, when >>>>> compiling it with -O0, there are only 1048 inttoptrs, meaning that >>>>> 95.4% >>>>> of them are generated by LLVM passes. >>>>> >>>>> This trend is similar in ptrtoint instruction as well. When compiling >>>>> SPEC 2017 >>>>> with -O0, there are 23,208 ptrtoint instructions, but among them >>>>> 22,016 (94.8%) >>>>> are generated by Clang frontend to represent pointer subtraction. >>>>> They aren't effectively optimized out because there are even more >>>>> ptrtoints (31,721) after -O3. >>>>> This is bad for performance because existence of ptrtoint makes >>>>> analysis return conservative >>>>> result as a pointer can be escaped through the cast. >>>>> Memory accesses to a pointer came from inttoptr is assumed >>>>> to possibly access anywhere, therefore it may block >>>>> store-to-load forwarding, merging two same loads, etc. >>>>> >>>>> I believe this can be addressed by applying two patches - first one is >>>>> representing pointer subtraction with a dedicated intrinsic function, >>>>> llvm.psub, and second one is disabling InstCombine transformation >>>>> >>>>> %q = load i8*, i8** %p1 >>>>> store i8* %q, i8** %p2 >>>>> => >>>>> %1 = bitcast i8** %p1 to i64* >>>>> %q1 = load i64, i64* %1, align 8 >>>>> %2 = bitcast i8** %p2 to i64* >>>>> store i64 %q1, i64* %2, align 8 >>>>> >>>>> This transformation can introduce inttoptrs later if loads are >>>>> followed (https://godbolt.org/z/wsZ3II ). Both are discussed in >>>>> https://bugs.llvm.org/show_bug.cgi?id=39846 as well. >>>>> After llvm.psub is used & this transformation is disabled, # of >>>>> inttoptrs decreases from 22,771 to 1,565 (6.9%), and # of ptrtoints >>>>> decreases from 31,721 to 7,772 (24.5%). >>>>> >>>>> I'll introduce llvm.psub patch first. >>>>> >>>>> >>>>> --- Adding llvm.psub --- >>>>> >>>>> By defining pointer subtraction intrinsic, we can get performance gain >>>>> because it gives more undefined behavior than just subtracting two >>>>> ptrtoints. >>>>> >>>>> Patch https://reviews.llvm.org/D56598 adds llvm.psub(p1,p2) intrinsic >>>>> function, which subtracts two pointers and returns the difference. Its >>>>> semantic is as follows. >>>>> If p1 and p2 point to different objects, and neither of them is based >>>>> on a pointer casted from an integer, `llvm.psub(p1, p2)` returns poison. >>>>> For example, >>>>> >>>>> %p = alloca >>>>> %q = alloca >>>>> %i = llvm.psub(p, q) ; %i is poison >>>>> >>>>> This allows aggressive escape analysis on pointers. Given i >>>>> llvm.psub(p1, p2), if neither of p1 and p2 is based on a pointer casted >>>>> from an integer, the llvm.psub call does not make p1 or p2 escape. ( >>>>> https://reviews.llvm.org/D56601 ) >>>>> >>>>> If either p1 or p2 is based on a pointer casted from integer, or p1 >>>>> and p2 point to a same object, it returns the result of subtraction (in >>>>> bytes); for example, >>>>> >>>>> %p = alloca >>>>> %q = inttoptr %x >>>>> %i = llvm.psub(p, q) ; %i is equivalent to (ptrtoint %p) - %x >>>>> >>>>> `null` is regarded as a pointer casted from an integer because >>>>> it is equivalent to `inttoptr 0`. >>>>> >>>>> Adding llvm.psub allows LLVM to utilize significant portion of >>>>> ptrtoints & reduce a portion of inttoptrs. After llvm.psub is used, when >>>>> SPECrate 2017 is compiled with -O3, # of inttoptr decreases to ~13,500 >>>>> (59%) and # of ptrtoint decreases to ~14,300 (45%). >>>>> >>>>> To see the performance change, I ran SPECrate 2017 (thread # = 1) with >>>>> three versions of LLVM, which are r313797 (Sep 21, 2017), LLVM 6.0 >>>>> official, and r348082 (Dec 2, 2018). >>>>> Running r313797 shows that 505.mcf_r has consistent 2.0% speedup over >>>>> 3 different machines (which are i3-6100, i5-6600, i7-7700). For LLVM 6.0 >>>>> and r348082, there's neither consistent speedup nor slowdown, but the >>>>> average speedup is near 0. I believe there's still a room of improvement >>>>> because there are passes which are not aware of llvm.psub. >>>>> >>>>> Thank you for reading this, and any comment is welcome. >>>>> >>>>> Best Regards, >>>>> Juneyoung Lee >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> llvm-dev at lists.llvm.org >>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>> >>>> >>> >>> -- >>> >>> Juneyoung Lee >>> Software Foundation Lab, Seoul National University >>> >> > > -- > > Juneyoung Lee > Software Foundation Lab, Seoul National University >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190115/b87652b0/attachment.html>
Juneyoung Lee via llvm-dev
2019-Jan-16 14:15 UTC
[llvm-dev] Reducing the number of ptrtoint/inttoptrs that are generated by LLVM
> I don't follow -- if we continue to carry based-on aliasing semantics inGEPs (as you mention below), then it seems like the same kinds of motivations fro non-inbounds GEPs would motivate non-inbounds pointer subtraction (without degrading to integers and losing all "based on" aliasing information).> As a concrete example, one use case for non-inbounds GEPs is to allow twodisplacements to be applied to a pointer that may move it outside the allocation, but combined will return to the allocation and be valid. It would seem useful to be able to *compute* such displacements as well.> I think an important design point is that any valid pointer and any*intermediate* pointers in a GEP computation should be valid operands to pointer subtraction. I see - so if my understanding is correct, having 'inbounds' flag to psub may give benefit which is analogous to GEP inbounds: a = alloca [10 x i8] p = f(a) i = psub inbounds p, a // We can assume that 0 <= i <= 10 To synchronize, this is the semantics including psub inbounds: 'psub inbounds p q' returns poison if (p and q point to different objects) \/ (either p or q is not in-bounds). Otherwise, it returns 'sub(ptrtoint p, ptrtoint q)'. 'psub p q' returns poison if p and q point to different objects. Otherwise, it returns 'sub(ptrtoint p, ptrtoint q)'. sub(ptrtoint p, ptrtoint q) returns non-poison integer unless either p or q is poison.> I understand that the LangRef currently says all of these things. But Iam not sure that it reflects reality in the face of CSE-like transformations. Those tend to break "based on" aliasing systems in many cases.> I'd appreciate the folks who have been studying the failure to fullyformally specify the aliasing semantics to chime in -- maybe all of the issues here have been thoroughly addressed. If so, great. If not, then I think we will need to be careful here. I've already suggested several patterns for how you could evaluate the safety in practical ways (as opposed to citing the language reference), and I think that plus hearing from folks who have worked on these formalizing conflicts in the IR are the next steps. Yep, discussing with people about more LLVM IR patterns that possibly involve psub would be great. I believe gradually changing use of sub(ptrtoint ,ptrtoint) to psub can somehow satisfy people's concerns. Regarding the CSE-like transformation: if we introduce 'psub', we can again support replacing a pointer with another in this case: i = psub p, q // i is poison if p and q are based on different objects if (i == 0) { use(p) // replace this with q is fine, because p and q are guaranteed to point to the same object } else { use(q) } This transformation is correct because psub returns poison if p and q point to different objects. On Tue, Jan 15, 2019 at 9:52 PM Chandler Carruth <chandlerc at gmail.com> wrote:> On Tue, Jan 15, 2019 at 8:11 AM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr> > wrote: > >> Hello Chandler, >> >> > ``` >> > void f(int *arr1, int *arr2, int length) { >> > intptr_t rel_offset = arr2 - arr1; >> > int *arr1_end = arr1 + length; >> > for (int *p = arr1; p < arr1_end; p += sizeof(int)) { >> > do_something(p, p + rel_offset); >> > } >> > } >> > ``` >> > >> > For example, a common loop optimization technique is something like the >> following in *pseudocode* (note that this is >> > not necessarily valid C but we might reasonably want to express this >> kind of optimization within LLVM: >> >> When LLVM need to insert subtraction between two pointers, like the >> example you wrote, I believe we can use 'sub(ptrtoint p, ptrtoint q)' as >> before. This addresses concerns regarding correctness of existing pointer >> analysis. So you don't lose expressiveness, while you gain precision when >> possible. >> >> A concrete suggestion is to add a parameter like 'bool use_psub' to >> IRBuilder::CreatePtrDiff. When Clang inserts a pointer subtraction, >> use_psub is set to true. When LLVM inserts it, it is set as false. Default >> value of use_psub is false, so existing LLVM passes will still insert sub >> (ptrtoint, ptrtoint) but most of ptrtoints are inserted from Clang, so # of >> ptrtoint will significantly decrease as well. >> >> Adding an "inbounds" flag to psub doesn't seem necessary since you can >> achieve the same thing with sub(ptrtoint, ptrtoint). >> > > I don't follow -- if we continue to carry based-on aliasing semantics in > GEPs (as you mention below), then it seems like the same kinds of > motivations fro non-inbounds GEPs would motivate non-inbounds pointer > subtraction (without degrading to integers and losing all "based on" > aliasing information). > > As a concrete example, one use case for non-inbounds GEPs is to allow two > displacements to be applied to a pointer that may move it outside the > allocation, but combined will return to the allocation and be valid. It > would seem useful to be able to *compute* such displacements as well. > > I think an important design point is that any valid pointer and any > *intermediate* pointers in a GEP computation should be valid operands to > pointer subtraction. > > >> >> > This doesn't make a lot of sense to me... the pointer subtractions >> semantics depend on whether the >> > input was rinsed through `inttoptr`? It also doesn't handle the case I >> describe above. >> >> The motivation is that (as far as I understand) LLVM already treat >> pointers casted from inttoptr differently from pointers from gep. >> We tentatively call these "physical pointers" (as opposed to "logical >> pointers"). >> For example, GEP document says that inttoptr(i + ptrtoint p) is different >> from gep(p, i) - it says 'most of GEP’s special aliasing rules >> do not apply to pointers computed from ptrtoint, arithmetic, and inttoptr >> sequences', implying that it can touch any object in LLVM. >> > > I understand that the LangRef currently says all of these things. But I am > not sure that it reflects reality in the face of CSE-like transformations. > Those tend to break "based on" aliasing systems in many cases. > > I'd appreciate the folks who have been studying the failure to fully > formally specify the aliasing semantics to chime in -- maybe all of the > issues here have been thoroughly addressed. If so, great. If not, then I > think we will need to be careful here. I've already suggested several > patterns for how you could evaluate the safety in practical ways (as > opposed to citing the language reference), and I think that plus hearing > from folks who have worked on these formalizing conflicts in the IR are the > next steps. > > >> >> Juneyoung Lee >> >> On Mon, Jan 14, 2019 at 10:59 PM Chandler Carruth <chandlerc at gmail.com> >> wrote: >> >>> Note that we should keep the discussion of instcombine and >>> canonicalization completely separate. It is an independent question IMO. >>> Only responding to the pointer subtraction point here. >>> >>> On Mon, Jan 14, 2019 at 12:56 PM Juneyoung Lee < >>> juneyoung.lee at sf.snu.ac.kr> wrote: >>> >>>> > First and foremost - how do you address correctness issues here? >>>> Because the subtraction `A - B` can escape/capture more things. >>>> Specifically, if one of `A` or `B` is escaped/captured, the >>>> > subtraction can be used to escape or capture the other pointer. So >>>> *some* of the conservative treatment is necessary. What is the plan to >>>> update all the analyses to remain correct? What >>>> > correctness testing have you done? >>>> >>>> Correctness of psub is guaranteed by the specification of pointer >>>> subtraction of C/C++. >>>> >>> >>> LLVM is not a C/C++ compiler backend. We cannot rely on frontend >>> language semantics to make the optimizer correct. >>> >>> LLVM has its own semantic model and we need *it* to be correct. >>> >>> Also, the reality is that we *do* have the case I described. Even if the >>> source language does not form this directly, LLVM itself can form the exact >>> pattern I describe? We would need rules in LLVM's IR that preclude this at >>> a semantic level and to find and remove optimizations that violate that. >>> Have we done this? >>> >>> I think this may be much harder than it seems due to fundamental >>> transformations. If we prove that `C = (A - B)`, even though we originally >>> got `C` directly, we might replace it with `A - B`. This subtraction >>> doesn't even have to be something we can compute, we just need a proof that >>> it is equivalent. >>> >>> For example, a common loop optimization technique is something like the >>> following in *pseudocode* (note that this is not necessarily valid C but we >>> might reasonably want to express this kind of optimization within LLVM: >>> >>> ``` >>> void f(int *arr1, int *arr2, int length) { >>> intptr_t rel_offset = arr2 - arr1; >>> int *arr1_end = arr1 + length; >>> for (int *p = arr1; p < arr1_end; p += sizeof(int)) { >>> do_something(p, p + rel_offset); >>> } >>> } >>> ``` >>> >>> This kind of thing should be representable in LLVM's IR in my opinion. >>> >>> An analogous situation is with `getelementptr` vs `getelementptr >>> inbounds`. I think you'll need something similar, and you'll want to update >>> the optimizer to track the non-inbounds version as potentially escaping >>> (much as ptrtoint is potentially escaping). This is still distinct from >>> ptrtoint as things like address spaces and other properties of the pointers >>> are not necessarily lost by this operation. >>> >>> When two pointers are subtracted, both shall point to elements of the >>>> same array object, or one past the last element of the array object >>>> (6.5.6.9). >>>> So, if the two pointers p and q point to different objects, we can >>>> define llvm.psub(p,q) as poison. >>>> Other than meeting C specification, correctness of llvm.psub is tested >>>> with SPEC CPU2017 and LLVM Nightly Tests as well. >>>> >>> >>> SPEC and the nightly test suite are great at finding problems, but quite >>> bad at giving confindence that there are not problems. For subtle things >>> like this, I would expect you need to look an a much larger sampling of >>> real-world application code. >>> >>> >>>> >>>> But it is true that sometimes pointer subtraction is used to get >>>> distance between two objects. >>>> Most common case is doing something like 'p - NULL', and this pattern >>>> exists in SPEC CPU2017, for example spec_qsort.c in mcf_r . >>>> Our suggestion is to define 'p - q' correctly return the distance >>>> between p and q if either p or q is based on inttoptr(i). This naturally >>>> explains 'p - NULL' because NULL is equivalent to inttoptr(0). >>>> >>> >>> This doesn't make a lot of sense to me... the pointer subtractions >>> semantics depend on whether the input was rinsed through `inttoptr`? It >>> also doesn't handle the case I describe above. >>> >>> >>>> Regarding analysis - what I've observed is that analysis was done after >>>> pointer subtraction was optimized into another form. >>>> For example, if '(p - q) == 0' was given, this is transformed into 'p >>>> == q', and then some analysis was done. >>>> Good thing is that these transformations can be simply applied to >>>> llvm.psub as well, which will reenable analysis. >>>> Also we're adding a new operation here, so existing analysis wouldn't >>>> be incorrect, but wouldn't fire. >>>> Fortunately, the performance impact after changing llvm.psub wasn't big. >>>> >>>> > Second - an intrinsic seems a poor fit here given the significance of >>>> this operation. We have an instruction that covers most pointer arithmetic >>>> (`getelementptr`), and I can imagine growing >>>> > pointer subtraction, but it seems like it should be an instruction if >>>> we're going to have it. Based on the above, we will need to use it very >>>> often in analysis. >>>> >>>> I also think that defining psub as instruction is fine. :) >>>> >>>> > Regarding the instcombine, it should be very easy to keep loads and >>>> stores of pointers as pointer typed in instcombine. Likely just a missing >>>> case in the code I added/touched there. >>>> >>>> That's really good. :) I found that combineLoadToOperationType from >>>> InstCombineLoadStoreAlloca was responsible for the transformation. >>>> I can upload a patch for that if ok. >>>> >>>> Best Regards, >>>> Juneyoung Lee >>>> >>>> On Mon, Jan 14, 2019 at 5:36 PM Chandler Carruth <chandlerc at gmail.com> >>>> wrote: >>>> >>>>> While I'm very interested in the end result here, I have some >>>>> questions that don't seem well answered yet around pointer subtraction... >>>>> >>>>> First and foremost - how do you address correctness issues here? >>>>> Because the subtraction `A - B` can escape/capture more things. >>>>> Specifically, if one of `A` or `B` is escaped/captured, the subtraction can >>>>> be used to escape or capture the other pointer. So *some* of the >>>>> conservative treatment is necessary. What is the plan to update all the >>>>> analyses to remain correct? What correctness testing have you done? >>>>> >>>>> Second - an intrinsic seems a poor fit here given the significance of >>>>> this operation. We have an instruction that covers most pointer arithmetic >>>>> (`getelementptr`), and I can imagine growing pointer subtraction, but it >>>>> seems like it should be an instruction if we're going to have it. Based on >>>>> the above, we will need to use it very often in analysis. >>>>> >>>>> >>>>> Regarding the instcombine, it should be very easy to keep loads and >>>>> stores of pointers as pointer typed in instcombine. Likely just a missing >>>>> case in the code I added/touched there. >>>>> >>>>> On Mon, Jan 14, 2019 at 3:23 AM Juneyoung Lee via llvm-dev < >>>>> llvm-dev at lists.llvm.org> wrote: >>>>> >>>>>> Hello all, >>>>>> >>>>>> This is a proposal for reducing # of ptrtoint/inttoptr casts which >>>>>> are not >>>>>> written by programmers but rather generated by LLVM passes. >>>>>> Currently the majority of ptrtoint/inttoptr casts are generated by >>>>>> LLVM; >>>>>> when compiling SPEC 2017 with LLVM r348082 (Dec 2 2018) with -O3, >>>>>> the output IR contains 22,771 inttoptr instructions. However, when >>>>>> compiling it with -O0, there are only 1048 inttoptrs, meaning that >>>>>> 95.4% >>>>>> of them are generated by LLVM passes. >>>>>> >>>>>> This trend is similar in ptrtoint instruction as well. When compiling >>>>>> SPEC 2017 >>>>>> with -O0, there are 23,208 ptrtoint instructions, but among them >>>>>> 22,016 (94.8%) >>>>>> are generated by Clang frontend to represent pointer subtraction. >>>>>> They aren't effectively optimized out because there are even more >>>>>> ptrtoints (31,721) after -O3. >>>>>> This is bad for performance because existence of ptrtoint makes >>>>>> analysis return conservative >>>>>> result as a pointer can be escaped through the cast. >>>>>> Memory accesses to a pointer came from inttoptr is assumed >>>>>> to possibly access anywhere, therefore it may block >>>>>> store-to-load forwarding, merging two same loads, etc. >>>>>> >>>>>> I believe this can be addressed by applying two patches - first one >>>>>> is representing pointer subtraction with a dedicated intrinsic function, >>>>>> llvm.psub, and second one is disabling InstCombine transformation >>>>>> >>>>>> %q = load i8*, i8** %p1 >>>>>> store i8* %q, i8** %p2 >>>>>> => >>>>>> %1 = bitcast i8** %p1 to i64* >>>>>> %q1 = load i64, i64* %1, align 8 >>>>>> %2 = bitcast i8** %p2 to i64* >>>>>> store i64 %q1, i64* %2, align 8 >>>>>> >>>>>> This transformation can introduce inttoptrs later if loads are >>>>>> followed (https://godbolt.org/z/wsZ3II ). Both are discussed in >>>>>> https://bugs.llvm.org/show_bug.cgi?id=39846 as well. >>>>>> After llvm.psub is used & this transformation is disabled, # of >>>>>> inttoptrs decreases from 22,771 to 1,565 (6.9%), and # of ptrtoints >>>>>> decreases from 31,721 to 7,772 (24.5%). >>>>>> >>>>>> I'll introduce llvm.psub patch first. >>>>>> >>>>>> >>>>>> --- Adding llvm.psub --- >>>>>> >>>>>> By defining pointer subtraction intrinsic, we can get performance >>>>>> gain because it gives more undefined behavior than just subtracting two >>>>>> ptrtoints. >>>>>> >>>>>> Patch https://reviews.llvm.org/D56598 adds llvm.psub(p1,p2) >>>>>> intrinsic function, which subtracts two pointers and returns the >>>>>> difference. Its semantic is as follows. >>>>>> If p1 and p2 point to different objects, and neither of them is based >>>>>> on a pointer casted from an integer, `llvm.psub(p1, p2)` returns poison. >>>>>> For example, >>>>>> >>>>>> %p = alloca >>>>>> %q = alloca >>>>>> %i = llvm.psub(p, q) ; %i is poison >>>>>> >>>>>> This allows aggressive escape analysis on pointers. Given i >>>>>> llvm.psub(p1, p2), if neither of p1 and p2 is based on a pointer casted >>>>>> from an integer, the llvm.psub call does not make p1 or p2 escape. ( >>>>>> https://reviews.llvm.org/D56601 ) >>>>>> >>>>>> If either p1 or p2 is based on a pointer casted from integer, or p1 >>>>>> and p2 point to a same object, it returns the result of subtraction (in >>>>>> bytes); for example, >>>>>> >>>>>> %p = alloca >>>>>> %q = inttoptr %x >>>>>> %i = llvm.psub(p, q) ; %i is equivalent to (ptrtoint %p) - %x >>>>>> >>>>>> `null` is regarded as a pointer casted from an integer because >>>>>> it is equivalent to `inttoptr 0`. >>>>>> >>>>>> Adding llvm.psub allows LLVM to utilize significant portion of >>>>>> ptrtoints & reduce a portion of inttoptrs. After llvm.psub is used, when >>>>>> SPECrate 2017 is compiled with -O3, # of inttoptr decreases to ~13,500 >>>>>> (59%) and # of ptrtoint decreases to ~14,300 (45%). >>>>>> >>>>>> To see the performance change, I ran SPECrate 2017 (thread # = 1) >>>>>> with three versions of LLVM, which are r313797 (Sep 21, 2017), LLVM 6.0 >>>>>> official, and r348082 (Dec 2, 2018). >>>>>> Running r313797 shows that 505.mcf_r has consistent 2.0% speedup over >>>>>> 3 different machines (which are i3-6100, i5-6600, i7-7700). For LLVM 6.0 >>>>>> and r348082, there's neither consistent speedup nor slowdown, but the >>>>>> average speedup is near 0. I believe there's still a room of improvement >>>>>> because there are passes which are not aware of llvm.psub. >>>>>> >>>>>> Thank you for reading this, and any comment is welcome. >>>>>> >>>>>> Best Regards, >>>>>> Juneyoung Lee >>>>>> _______________________________________________ >>>>>> LLVM Developers mailing list >>>>>> llvm-dev at lists.llvm.org >>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>> >>>>> >>>> >>>> -- >>>> >>>> Juneyoung Lee >>>> Software Foundation Lab, Seoul National University >>>> >>> >> >> -- >> >> Juneyoung Lee >> Software Foundation Lab, Seoul National University >> >-- Juneyoung Lee Software Foundation Lab, Seoul National University -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190116/39a9c52c/attachment-0001.html>