Dinar Temirbulatov via llvm-dev
2017-Sep-13 21:46 UTC
[llvm-dev] RFC phantom memory intrinsic
Hi Michael,>I have a case where InstCombine removes a store and your approach would be >valuable for me if the entire access to an aggregate could be restored.Yes, no problem and we could add the aggregate pointer to this new intrinsic and in my particular case I should ignore it, but I am looking now at "speculation_marker" metadata and I am still not sure how to implement it better. Thanks, Dinar. On Wed, Sep 13, 2017 at 3:23 PM, Haidl, Michael <michael.haidl at uni-muenster.de> wrote:> Hi Dinar, > > I am asking because I am maintaining an out-of-tree pass which does > exactly what SLP does not. It is a pass designed for GPUs to combine > loads and stores, e.g., when consecutive fields of a structure have the > same type it merges the loads and stores to vector loads and stores. I > have a case where InstCombine removes a store and your approach would be > valuable for me if the entire access to an aggregate could be restored. > Second thing I am concerned is that this intrinsics "just" fix a > specific problem of IC where potentially a more generic solution is needed. > > Cheers, > Michael >> Hi Michael, >>> Interesting approach but how do you handle more complex offsets, e.g., when the pointer is part of an aggregate? Only one offset does not seem enough to handle generic cases. >> Yes, correct, this a little bit changed example is not working. >> #include <x86intrin.h> >> >> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) { >> __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] }; >> return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 ); >> } >> But with the aggregate case it is a new level of complexity, should we >> we care about? There might be some logic that probably would be mark >> as dead by InstCombine and we don't want to keep it. >> BTW: Looks like SLP could not recognize the case either : >> define <4 x double> @vsht_d4_fold(double* %ptr, i64 %i) local_unnamed_addr #0 { >> entry: >> %arrayidx = getelementptr inbounds double, double* %ptr, i64 %i >> %0 = load double, double* %arrayidx, align 8 >> %vecinit = insertelement <4 x double> undef, double %0, i32 0 >> %add = add i64 %i, 1 >> %arrayidx1 = getelementptr inbounds double, double* %ptr, i64 %add >> %1 = load double, double* %arrayidx1, align 8 >> %vecinit2 = insertelement <4 x double> %vecinit, double %1, i32 1 >> %add3 = add i64 %i, 2 >> %arrayidx4 = getelementptr inbounds double, double* %ptr, i64 %add3 >> %2 = load double, double* %arrayidx4, align 8 >> %vecinit5 = insertelement <4 x double> %vecinit2, double %2, i32 2 >> %add6 = add i64 %i, 3 >> %arrayidx7 = getelementptr inbounds double, double* %ptr, i64 %add6 >> %3 = load double, double* %arrayidx7, align 8 >> %vecinit8 = insertelement <4 x double> %vecinit5, double %3, i32 3 >> %shuffle = shufflevector <4 x double> %vecinit8, <4 x double> >> %vecinit8, <4 x i32> <i32 3, i32 3, i32 2, i32 2> >> ret <4 x double> %shuffle >> } >> >> Thanks, Dinar. >> >> On Tue, Sep 12, 2017 at 8:26 PM, Haidl, Michael >> <michael.haidl at uni-muenster.de> wrote: >>> Interesting approach but how do you handle more complex offsets, e.g., when >>> the pointer is part of an aggregate? Only one offset does not seem enough to >>> handle generic cases. >>> >>> -----Original Message----- >>> From: Dinar Temirbulatov via llvm-dev [llvm-dev at lists.llvm.org] >>> Received: Dienstag, 12 Sep. 2017, 9:57 >>> To: llvm-dev at lists.llvm.org [llvm-dev at lists.llvm.org] >>> CC: Filipe Cabecinhas [me at filcab.net] >>> Subject: [llvm-dev] RFC phantom memory intrinsic >>> >>> Hi, >>> For PR21780 solution, I plan to add a new functionality to restore >>> memory operations that was once deleted, in this particular case it is >>> the load operations that were deleted by InstCombine, please note that >>> once the load was removed there is no way to restore it back and that >>> prevents us from vectorizing the shuffle operation. There are probably >>> more similar issues where this approach could be applied. >>> I added phatom_mem(llvm_anyptr_ty, llvm_i64_ty) intrinsic for that, >>> indicating that for particular pointer let's call it %ptr we observed >>> maximum possible offset at which there was reference by its type in a >>> function. After InstCombine deleted the load operation, it could be >>> restored in SLPVectorizer and we could restore chains of GEPs, Loads >>> and Inserts in case we encounter phatom_mem intrinsic. >>> >>> Here is two part review: >>> https://reviews.llvm.org/D37579 - InstCombine part. >>> https://reviews.llvm.org/D37648 - SLP part. >>> >>> Also, there might be different approaches in describing deleted memory >>> operations, for example, for my case: phantom_load(llvm_anyptr_ty, >>> llvm_i64_ty). First parameter describes pointer and second parameter >>> offset from pointer this loaded was deleted, for example. This two >>> operations: >>> >>> %arrayidx1 = getelementptr inbounds double, double* %ptr, i64 1 >>> %ld1 = load double, double* %arrayidx1 >>> >>> could be represented in the IR with this one: "void phantom_load(%ptr, >>> 1)" after removal. But, the approach that is already implemented in >>> both reviews looks better to me since we don't need to add intrinsic >>> for every removed operation in the IR. Also, while constructing such >>> form in the IR we have to be careful since some pointer operations >>> might be in loops and as the result we might end up construction an >>> incorrect IR. So, I just avoid to notice any pointer operation if it >>> is belong to a loop, except those where the the whole chain of >>> operations pointer origin, GEP, Load, Shuffle operation are in the >>> same loop and in the same basic block. >>> Thanks, Dinar. >>> >>> Here is the thread for this issue regarding using metadata: >>> http://lists.llvm.org/pipermail/llvm-dev/2017-July/115730.html >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>
On 09/13/2017 04:46 PM, Dinar Temirbulatov via llvm-dev wrote:> Hi Michael, >> I have a case where InstCombine removes a store and your approach would be >> valuable for me if the entire access to an aggregate could be restored. > Yes, no problem and we could add the aggregate pointer to this new > intrinsic and in my particular case I should ignore it, but I am > looking now at "speculation_marker" metadata and I am still not sure > how to implement it better.Are you primarily concerned with being able to widen loads later in the pipeline? Could we attached metadata to the remaining loads indicating that it would be legal to widen them? -Hal> Thanks, Dinar. > > On Wed, Sep 13, 2017 at 3:23 PM, Haidl, Michael > <michael.haidl at uni-muenster.de> wrote: >> Hi Dinar, >> >> I am asking because I am maintaining an out-of-tree pass which does >> exactly what SLP does not. It is a pass designed for GPUs to combine >> loads and stores, e.g., when consecutive fields of a structure have the >> same type it merges the loads and stores to vector loads and stores. I >> have a case where InstCombine removes a store and your approach would be >> valuable for me if the entire access to an aggregate could be restored. >> Second thing I am concerned is that this intrinsics "just" fix a >> specific problem of IC where potentially a more generic solution is needed. >> >> Cheers, >> Michael >>> Hi Michael, >>>> Interesting approach but how do you handle more complex offsets, e.g., when the pointer is part of an aggregate? Only one offset does not seem enough to handle generic cases. >>> Yes, correct, this a little bit changed example is not working. >>> #include <x86intrin.h> >>> >>> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) { >>> __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] }; >>> return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 ); >>> } >>> But with the aggregate case it is a new level of complexity, should we >>> we care about? There might be some logic that probably would be mark >>> as dead by InstCombine and we don't want to keep it. >>> BTW: Looks like SLP could not recognize the case either : >>> define <4 x double> @vsht_d4_fold(double* %ptr, i64 %i) local_unnamed_addr #0 { >>> entry: >>> %arrayidx = getelementptr inbounds double, double* %ptr, i64 %i >>> %0 = load double, double* %arrayidx, align 8 >>> %vecinit = insertelement <4 x double> undef, double %0, i32 0 >>> %add = add i64 %i, 1 >>> %arrayidx1 = getelementptr inbounds double, double* %ptr, i64 %add >>> %1 = load double, double* %arrayidx1, align 8 >>> %vecinit2 = insertelement <4 x double> %vecinit, double %1, i32 1 >>> %add3 = add i64 %i, 2 >>> %arrayidx4 = getelementptr inbounds double, double* %ptr, i64 %add3 >>> %2 = load double, double* %arrayidx4, align 8 >>> %vecinit5 = insertelement <4 x double> %vecinit2, double %2, i32 2 >>> %add6 = add i64 %i, 3 >>> %arrayidx7 = getelementptr inbounds double, double* %ptr, i64 %add6 >>> %3 = load double, double* %arrayidx7, align 8 >>> %vecinit8 = insertelement <4 x double> %vecinit5, double %3, i32 3 >>> %shuffle = shufflevector <4 x double> %vecinit8, <4 x double> >>> %vecinit8, <4 x i32> <i32 3, i32 3, i32 2, i32 2> >>> ret <4 x double> %shuffle >>> } >>> >>> Thanks, Dinar. >>> >>> On Tue, Sep 12, 2017 at 8:26 PM, Haidl, Michael >>> <michael.haidl at uni-muenster.de> wrote: >>>> Interesting approach but how do you handle more complex offsets, e.g., when >>>> the pointer is part of an aggregate? Only one offset does not seem enough to >>>> handle generic cases. >>>> >>>> -----Original Message----- >>>> From: Dinar Temirbulatov via llvm-dev [llvm-dev at lists.llvm.org] >>>> Received: Dienstag, 12 Sep. 2017, 9:57 >>>> To: llvm-dev at lists.llvm.org [llvm-dev at lists.llvm.org] >>>> CC: Filipe Cabecinhas [me at filcab.net] >>>> Subject: [llvm-dev] RFC phantom memory intrinsic >>>> >>>> Hi, >>>> For PR21780 solution, I plan to add a new functionality to restore >>>> memory operations that was once deleted, in this particular case it is >>>> the load operations that were deleted by InstCombine, please note that >>>> once the load was removed there is no way to restore it back and that >>>> prevents us from vectorizing the shuffle operation. There are probably >>>> more similar issues where this approach could be applied. >>>> I added phatom_mem(llvm_anyptr_ty, llvm_i64_ty) intrinsic for that, >>>> indicating that for particular pointer let's call it %ptr we observed >>>> maximum possible offset at which there was reference by its type in a >>>> function. After InstCombine deleted the load operation, it could be >>>> restored in SLPVectorizer and we could restore chains of GEPs, Loads >>>> and Inserts in case we encounter phatom_mem intrinsic. >>>> >>>> Here is two part review: >>>> https://reviews.llvm.org/D37579 - InstCombine part. >>>> https://reviews.llvm.org/D37648 - SLP part. >>>> >>>> Also, there might be different approaches in describing deleted memory >>>> operations, for example, for my case: phantom_load(llvm_anyptr_ty, >>>> llvm_i64_ty). First parameter describes pointer and second parameter >>>> offset from pointer this loaded was deleted, for example. This two >>>> operations: >>>> >>>> %arrayidx1 = getelementptr inbounds double, double* %ptr, i64 1 >>>> %ld1 = load double, double* %arrayidx1 >>>> >>>> could be represented in the IR with this one: "void phantom_load(%ptr, >>>> 1)" after removal. But, the approach that is already implemented in >>>> both reviews looks better to me since we don't need to add intrinsic >>>> for every removed operation in the IR. Also, while constructing such >>>> form in the IR we have to be careful since some pointer operations >>>> might be in loops and as the result we might end up construction an >>>> incorrect IR. So, I just avoid to notice any pointer operation if it >>>> is belong to a loop, except those where the the whole chain of >>>> operations pointer origin, GEP, Load, Shuffle operation are in the >>>> same loop and in the same basic block. >>>> Thanks, Dinar. >>>> >>>> Here is the thread for this issue regarding using metadata: >>>> http://lists.llvm.org/pipermail/llvm-dev/2017-July/115730.html >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> llvm-dev at lists.llvm.org >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory
Dinar Temirbulatov via llvm-dev
2017-Sep-26 13:31 UTC
[llvm-dev] RFC phantom memory intrinsic
Hi Hal,>Are you primarily concerned with being able to widen loads later in the pipeline? Could we attached metadata to the remaining loads indicating that it would be legal to widen them?no, I don't have any concerns about intrinsic way of implementation, and intrinsic way looks safer for me since we somehow detach our information about memory from that actual load instruction. I updated https://reviews.llvm.org/D37579 and https://reviews.llvm.org/D37648 with adding the aggregate pointer as second parameter as Michael asked. So now, the intrinsic look like this: void phantom_mem(any_pointer base, any_pointer aggregate, uint64_t maximum_offset) For PR21780, I don't need to use aggregate so it is set to null, but for other similar issues this aggregate parameter might be useful. Thanks, Dinar. On Tue, Sep 26, 2017 at 5:52 AM, Hal Finkel <hfinkel at anl.gov> wrote:> > On 09/13/2017 04:46 PM, Dinar Temirbulatov via llvm-dev wrote: >> >> Hi Michael, >>> >>> I have a case where InstCombine removes a store and your approach would >>> be >>> valuable for me if the entire access to an aggregate could be restored. >> >> Yes, no problem and we could add the aggregate pointer to this new >> intrinsic and in my particular case I should ignore it, but I am >> looking now at "speculation_marker" metadata and I am still not sure >> how to implement it better. > > > Are you primarily concerned with being able to widen loads later in the > pipeline? Could we attached metadata to the remaining loads indicating that > it would be legal to widen them? > > -Hal > > >> Thanks, Dinar. >> >> On Wed, Sep 13, 2017 at 3:23 PM, Haidl, Michael >> <michael.haidl at uni-muenster.de> wrote: >>> >>> Hi Dinar, >>> >>> I am asking because I am maintaining an out-of-tree pass which does >>> exactly what SLP does not. It is a pass designed for GPUs to combine >>> loads and stores, e.g., when consecutive fields of a structure have the >>> same type it merges the loads and stores to vector loads and stores. I >>> have a case where InstCombine removes a store and your approach would be >>> valuable for me if the entire access to an aggregate could be restored. >>> Second thing I am concerned is that this intrinsics "just" fix a >>> specific problem of IC where potentially a more generic solution is >>> needed. >>> >>> Cheers, >>> Michael >>>> >>>> Hi Michael, >>>>> >>>>> Interesting approach but how do you handle more complex offsets, e.g., >>>>> when the pointer is part of an aggregate? Only one offset does not seem >>>>> enough to handle generic cases. >>>> >>>> Yes, correct, this a little bit changed example is not working. >>>> #include <x86intrin.h> >>>> >>>> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) { >>>> __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] }; >>>> return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 ); >>>> } >>>> But with the aggregate case it is a new level of complexity, should we >>>> we care about? There might be some logic that probably would be mark >>>> as dead by InstCombine and we don't want to keep it. >>>> BTW: Looks like SLP could not recognize the case either : >>>> define <4 x double> @vsht_d4_fold(double* %ptr, i64 %i) >>>> local_unnamed_addr #0 { >>>> entry: >>>> %arrayidx = getelementptr inbounds double, double* %ptr, i64 %i >>>> %0 = load double, double* %arrayidx, align 8 >>>> %vecinit = insertelement <4 x double> undef, double %0, i32 0 >>>> %add = add i64 %i, 1 >>>> %arrayidx1 = getelementptr inbounds double, double* %ptr, i64 %add >>>> %1 = load double, double* %arrayidx1, align 8 >>>> %vecinit2 = insertelement <4 x double> %vecinit, double %1, i32 1 >>>> %add3 = add i64 %i, 2 >>>> %arrayidx4 = getelementptr inbounds double, double* %ptr, i64 %add3 >>>> %2 = load double, double* %arrayidx4, align 8 >>>> %vecinit5 = insertelement <4 x double> %vecinit2, double %2, i32 2 >>>> %add6 = add i64 %i, 3 >>>> %arrayidx7 = getelementptr inbounds double, double* %ptr, i64 %add6 >>>> %3 = load double, double* %arrayidx7, align 8 >>>> %vecinit8 = insertelement <4 x double> %vecinit5, double %3, i32 3 >>>> %shuffle = shufflevector <4 x double> %vecinit8, <4 x double> >>>> %vecinit8, <4 x i32> <i32 3, i32 3, i32 2, i32 2> >>>> ret <4 x double> %shuffle >>>> } >>>> >>>> Thanks, Dinar. >>>> >>>> On Tue, Sep 12, 2017 at 8:26 PM, Haidl, Michael >>>> <michael.haidl at uni-muenster.de> wrote: >>>>> >>>>> Interesting approach but how do you handle more complex offsets, e.g., >>>>> when >>>>> the pointer is part of an aggregate? Only one offset does not seem >>>>> enough to >>>>> handle generic cases. >>>>> >>>>> -----Original Message----- >>>>> From: Dinar Temirbulatov via llvm-dev [llvm-dev at lists.llvm.org] >>>>> Received: Dienstag, 12 Sep. 2017, 9:57 >>>>> To: llvm-dev at lists.llvm.org [llvm-dev at lists.llvm.org] >>>>> CC: Filipe Cabecinhas [me at filcab.net] >>>>> Subject: [llvm-dev] RFC phantom memory intrinsic >>>>> >>>>> Hi, >>>>> For PR21780 solution, I plan to add a new functionality to restore >>>>> memory operations that was once deleted, in this particular case it is >>>>> the load operations that were deleted by InstCombine, please note that >>>>> once the load was removed there is no way to restore it back and that >>>>> prevents us from vectorizing the shuffle operation. There are probably >>>>> more similar issues where this approach could be applied. >>>>> I added phatom_mem(llvm_anyptr_ty, llvm_i64_ty) intrinsic for that, >>>>> indicating that for particular pointer let's call it %ptr we observed >>>>> maximum possible offset at which there was reference by its type in a >>>>> function. After InstCombine deleted the load operation, it could be >>>>> restored in SLPVectorizer and we could restore chains of GEPs, Loads >>>>> and Inserts in case we encounter phatom_mem intrinsic. >>>>> >>>>> Here is two part review: >>>>> https://reviews.llvm.org/D37579 - InstCombine part. >>>>> https://reviews.llvm.org/D37648 - SLP part. >>>>> >>>>> Also, there might be different approaches in describing deleted memory >>>>> operations, for example, for my case: phantom_load(llvm_anyptr_ty, >>>>> llvm_i64_ty). First parameter describes pointer and second parameter >>>>> offset from pointer this loaded was deleted, for example. This two >>>>> operations: >>>>> >>>>> %arrayidx1 = getelementptr inbounds double, double* %ptr, i64 1 >>>>> %ld1 = load double, double* %arrayidx1 >>>>> >>>>> could be represented in the IR with this one: "void phantom_load(%ptr, >>>>> 1)" after removal. But, the approach that is already implemented in >>>>> both reviews looks better to me since we don't need to add intrinsic >>>>> for every removed operation in the IR. Also, while constructing such >>>>> form in the IR we have to be careful since some pointer operations >>>>> might be in loops and as the result we might end up construction an >>>>> incorrect IR. So, I just avoid to notice any pointer operation if it >>>>> is belong to a loop, except those where the the whole chain of >>>>> operations pointer origin, GEP, Load, Shuffle operation are in the >>>>> same loop and in the same basic block. >>>>> Thanks, Dinar. >>>>> >>>>> Here is the thread for this issue regarding using metadata: >>>>> http://lists.llvm.org/pipermail/llvm-dev/2017-July/115730.html >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> llvm-dev at lists.llvm.org >>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > -- > Hal Finkel > Lead, Compiler Technology and Programming Languages > Leadership Computing Facility > Argonne National Laboratory >