thr3ads.net - llvm dev - [llvm-dev] RFC phantom memory intrinsic [Sep 2017]

If this information is useful, please help other people find it:
Share via:

Dinar Temirbulatov via llvm-dev

2017-Sep-26 13:31 UTC

[llvm-dev] RFC phantom memory intrinsic

Hi Hal,>Are you primarily concerned with being able to widen loads later in the
pipeline? Could we attached metadata to the remaining loads indicating that it
would be legal to widen them?no, I don't have any concerns about intrinsic way of implementation,
and intrinsic way looks safer for me since we somehow detach our
information about memory from that actual load instruction. I updated
https://reviews.llvm.org/D37579 and https://reviews.llvm.org/D37648
with adding the aggregate pointer as second parameter as Michael
asked. So now, the intrinsic look like this:

void phantom_mem(any_pointer base, any_pointer aggregate, uint64_t
maximum_offset)

For PR21780, I don't need to use aggregate so it is set to null, but
for other similar issues this aggregate parameter might be useful.
                             Thanks, Dinar.

On Tue, Sep 26, 2017 at 5:52 AM, Hal Finkel <hfinkel at anl.gov>
wrote:>
> On 09/13/2017 04:46 PM, Dinar Temirbulatov via llvm-dev wrote:
>>
>> Hi Michael,
>>>
>>> I have a case where InstCombine removes a store and your approach
would
>>> be
>>> valuable for me if the entire access to an aggregate could be
restored.
>>
>> Yes, no problem and we could add the aggregate pointer to this new
>> intrinsic and in my particular case I should ignore it, but I am
>> looking now at "speculation_marker" metadata and I am still
not sure
>> how to implement it better.
>
>
> Are you primarily concerned with being able to widen loads later in the
> pipeline? Could we attached metadata to the remaining loads indicating that
> it would be legal to widen them?
>
>  -Hal
>
>
>>                    Thanks, Dinar.
>>
>> On Wed, Sep 13, 2017 at 3:23 PM, Haidl, Michael
>> <michael.haidl at uni-muenster.de> wrote:
>>>
>>> Hi Dinar,
>>>
>>> I am asking because I am maintaining an out-of-tree pass which does
>>> exactly what SLP does not. It is a pass designed for GPUs to
combine
>>> loads and stores, e.g., when consecutive fields of a structure have
the
>>> same type it merges the loads and stores to vector loads and
stores. I
>>> have a case where InstCombine removes a store and your approach
would be
>>> valuable for me if the entire access to an aggregate could be
restored.
>>> Second thing I am concerned is that this intrinsics
"just" fix a
>>> specific problem of IC where potentially a more generic solution is
>>> needed.
>>>
>>> Cheers,
>>> Michael
>>>>
>>>> Hi Michael,
>>>>>
>>>>> Interesting approach but how do you handle more complex
offsets, e.g.,
>>>>> when the pointer is part of an aggregate? Only one offset
does not seem
>>>>> enough to handle generic cases.
>>>>
>>>> Yes, correct, this a little bit changed example is not working.
>>>> #include <x86intrin.h>
>>>>
>>>> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) {
>>>>     __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2],
ptr[i+3] };
>>>>     return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 );
>>>> }
>>>> But with the aggregate case it is a new level of complexity,
should we
>>>> we care about? There might be some logic that probably would be
mark
>>>> as dead by InstCombine and we don't want to keep it.
>>>> BTW: Looks like SLP could not recognize the case either :
>>>> define <4 x double> @vsht_d4_fold(double* %ptr, i64 %i)
>>>> local_unnamed_addr #0 {
>>>> entry:
>>>>     %arrayidx = getelementptr inbounds double, double* %ptr,
i64 %i
>>>>     %0 = load double, double* %arrayidx, align 8
>>>>     %vecinit = insertelement <4 x double> undef, double
%0, i32 0
>>>>     %add = add i64 %i, 1
>>>>     %arrayidx1 = getelementptr inbounds double, double* %ptr,
i64 %add
>>>>     %1 = load double, double* %arrayidx1, align 8
>>>>     %vecinit2 = insertelement <4 x double> %vecinit,
double %1, i32 1
>>>>     %add3 = add i64 %i, 2
>>>>     %arrayidx4 = getelementptr inbounds double, double* %ptr,
i64 %add3
>>>>     %2 = load double, double* %arrayidx4, align 8
>>>>     %vecinit5 = insertelement <4 x double> %vecinit2,
double %2, i32 2
>>>>     %add6 = add i64 %i, 3
>>>>     %arrayidx7 = getelementptr inbounds double, double* %ptr,
i64 %add6
>>>>     %3 = load double, double* %arrayidx7, align 8
>>>>     %vecinit8 = insertelement <4 x double> %vecinit5,
double %3, i32 3
>>>>     %shuffle = shufflevector <4 x double> %vecinit8,
<4 x double>
>>>> %vecinit8, <4 x i32> <i32 3, i32 3, i32 2, i32 2>
>>>>     ret <4 x double> %shuffle
>>>> }
>>>>
>>>>           Thanks, Dinar.
>>>>
>>>> On Tue, Sep 12, 2017 at 8:26 PM, Haidl, Michael
>>>> <michael.haidl at uni-muenster.de> wrote:
>>>>>
>>>>> Interesting approach but how do you handle more complex
offsets, e.g.,
>>>>> when
>>>>> the pointer is part of an aggregate? Only one offset does
not seem
>>>>> enough to
>>>>> handle generic cases.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Dinar Temirbulatov via llvm-dev [llvm-dev at
lists.llvm.org]
>>>>> Received: Dienstag, 12 Sep. 2017, 9:57
>>>>> To: llvm-dev at lists.llvm.org [llvm-dev at lists.llvm.org]
>>>>> CC: Filipe Cabecinhas [me at filcab.net]
>>>>> Subject: [llvm-dev] RFC phantom memory intrinsic
>>>>>
>>>>> Hi,
>>>>> For PR21780 solution, I plan to add a new functionality to
restore
>>>>> memory operations that was once deleted, in this particular
case it is
>>>>> the load operations that were deleted by InstCombine,
please note that
>>>>> once the load was removed there is no way to restore it
back and that
>>>>> prevents us from vectorizing the shuffle operation. There
are probably
>>>>> more similar issues where this approach could be applied.
>>>>> I added phatom_mem(llvm_anyptr_ty, llvm_i64_ty) intrinsic
for that,
>>>>> indicating that for particular pointer let's call it
%ptr we observed
>>>>> maximum possible offset at which there was reference by its
type in a
>>>>> function. After InstCombine deleted the load operation, it
could be
>>>>> restored in SLPVectorizer and we could restore chains of
GEPs, Loads
>>>>> and Inserts in case we encounter phatom_mem intrinsic.
>>>>>
>>>>>    Here is two part review:
>>>>>             https://reviews.llvm.org/D37579 - InstCombine
part.
>>>>>             https://reviews.llvm.org/D37648 - SLP part.
>>>>>
>>>>> Also, there might be different approaches in describing
deleted memory
>>>>> operations, for example, for my case:
phantom_load(llvm_anyptr_ty,
>>>>> llvm_i64_ty). First parameter describes pointer and second
parameter
>>>>> offset from pointer this loaded was deleted, for example.
This two
>>>>> operations:
>>>>>
>>>>>     %arrayidx1 = getelementptr inbounds double, double*
%ptr, i64 1
>>>>>     %ld1 = load double, double* %arrayidx1
>>>>>
>>>>> could be represented in the IR with this one: "void
phantom_load(%ptr,
>>>>> 1)" after removal. But, the approach that is already
implemented in
>>>>> both reviews looks better to me since we don't need to
add intrinsic
>>>>> for every removed operation in the IR. Also, while
constructing such
>>>>> form in the IR we have to be careful since some pointer
operations
>>>>> might be in loops and as the result we might end up
construction an
>>>>> incorrect IR.  So, I just avoid to notice any pointer
operation if it
>>>>> is belong to a loop, except those where the the whole chain
of
>>>>> operations pointer origin, GEP, Load, Shuffle operation are
in the
>>>>> same loop and in the same basic block.
>>>>>                                            Thanks, Dinar.
>>>>>
>>>>> Here is the thread for this issue regarding using metadata:
>>>>>
http://lists.llvm.org/pipermail/llvm-dev/2017-July/115730.html
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>

Hal Finkel via llvm-dev

2017-Sep-26 23:21 UTC

head link

[llvm-dev] RFC phantom memory intrinsic

On 09/26/2017 08:31 AM, Dinar Temirbulatov wrote:> Hi Hal,
>> Are you primarily concerned with being able to widen loads later in the
pipeline? Could we attached metadata to the remaining loads indicating that it
would be legal to widen them?
> no, I don't have any concerns about intrinsic way of implementation,
> and intrinsic way looks safer for me since we somehow detach our
> information about memory from that actual load instruction.
In general, our use of intrinsics vs. (metadata or attributes), 
especially an intrinsic that would be automatically introduced during 
canonicalization, is characterized as "only if there's no other
way".
The reason is that intrinsics are expensive, they add uses to 
otherwise-single-use values (which block optimization) (*), and keep 
otherwise-dead code alive.

I believe that there are uses for an intrinsic like this. Such uses 
require that the information implied by the intrinsic be anchored to a 
particular place in the CFG (and, importantly, not be hoisted or 
removed). This may be useful to convey at what point in the CFG some 
memory is dereferenceable, thus allowing, for example, sunk accesses to 
be re-hoisted. In your case, however, you don't need that information 
(AFAIKT). Instead, you just need to know that you can widen certain 
loads. This is valuable information, but it can be conveyed with 
metadata on the relevant loads. That would be a cheaper solution. If 
this seems impractical, I'd certainly like to understand why.

(*) CodeMetrics has a way to collect "ephemeral values" to avoid extra
values affecting some of the cost modeling, but they still block 
optimization.

Thanks again,
Hal
>   I updated
> https://reviews.llvm.org/D37579 and https://reviews.llvm.org/D37648
> with adding the aggregate pointer as second parameter as Michael
> asked. So now, the intrinsic look like this:
>
> void phantom_mem(any_pointer base, any_pointer aggregate, uint64_t
> maximum_offset)
>
> For PR21780, I don't need to use aggregate so it is set to null, but
> for other similar issues this aggregate parameter might be useful.
>                               Thanks, Dinar.
>
> On Tue, Sep 26, 2017 at 5:52 AM, Hal Finkel <hfinkel at anl.gov>
wrote:
>> On 09/13/2017 04:46 PM, Dinar Temirbulatov via llvm-dev wrote:
>>> Hi Michael,
>>>> I have a case where InstCombine removes a store and your
approach would
>>>> be
>>>> valuable for me if the entire access to an aggregate could be
restored.
>>> Yes, no problem and we could add the aggregate pointer to this new
>>> intrinsic and in my particular case I should ignore it, but I am
>>> looking now at "speculation_marker" metadata and I am
still not sure
>>> how to implement it better.
>>
>> Are you primarily concerned with being able to widen loads later in the
>> pipeline? Could we attached metadata to the remaining loads indicating
that
>> it would be legal to widen them?
>>
>>   -Hal
>>
>>
>>>                     Thanks, Dinar.
>>>
>>> On Wed, Sep 13, 2017 at 3:23 PM, Haidl, Michael
>>> <michael.haidl at uni-muenster.de> wrote:
>>>> Hi Dinar,
>>>>
>>>> I am asking because I am maintaining an out-of-tree pass which
does
>>>> exactly what SLP does not. It is a pass designed for GPUs to
combine
>>>> loads and stores, e.g., when consecutive fields of a structure
have the
>>>> same type it merges the loads and stores to vector loads and
stores. I
>>>> have a case where InstCombine removes a store and your approach
would be
>>>> valuable for me if the entire access to an aggregate could be
restored.
>>>> Second thing I am concerned is that this intrinsics
"just" fix a
>>>> specific problem of IC where potentially a more generic
solution is
>>>> needed.
>>>>
>>>> Cheers,
>>>> Michael
>>>>> Hi Michael,
>>>>>> Interesting approach but how do you handle more complex
offsets, e.g.,
>>>>>> when the pointer is part of an aggregate? Only one
offset does not seem
>>>>>> enough to handle generic cases.
>>>>> Yes, correct, this a little bit changed example is not
working.
>>>>> #include <x86intrin.h>
>>>>>
>>>>> __m256d vsht_d4_fold(const double* ptr, unsigned long long
i) {
>>>>>      __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2],
ptr[i+3] };
>>>>>      return __builtin_shufflevector( foo, foo, 3, 3, 2, 2
);
>>>>> }
>>>>> But with the aggregate case it is a new level of
complexity, should we
>>>>> we care about? There might be some logic that probably
would be mark
>>>>> as dead by InstCombine and we don't want to keep it.
>>>>> BTW: Looks like SLP could not recognize the case either :
>>>>> define <4 x double> @vsht_d4_fold(double* %ptr, i64
%i)
>>>>> local_unnamed_addr #0 {
>>>>> entry:
>>>>>      %arrayidx = getelementptr inbounds double, double*
%ptr, i64 %i
>>>>>      %0 = load double, double* %arrayidx, align 8
>>>>>      %vecinit = insertelement <4 x double> undef,
double %0, i32 0
>>>>>      %add = add i64 %i, 1
>>>>>      %arrayidx1 = getelementptr inbounds double, double*
%ptr, i64 %add
>>>>>      %1 = load double, double* %arrayidx1, align 8
>>>>>      %vecinit2 = insertelement <4 x double> %vecinit,
double %1, i32 1
>>>>>      %add3 = add i64 %i, 2
>>>>>      %arrayidx4 = getelementptr inbounds double, double*
%ptr, i64 %add3
>>>>>      %2 = load double, double* %arrayidx4, align 8
>>>>>      %vecinit5 = insertelement <4 x double>
%vecinit2, double %2, i32 2
>>>>>      %add6 = add i64 %i, 3
>>>>>      %arrayidx7 = getelementptr inbounds double, double*
%ptr, i64 %add6
>>>>>      %3 = load double, double* %arrayidx7, align 8
>>>>>      %vecinit8 = insertelement <4 x double>
%vecinit5, double %3, i32 3
>>>>>      %shuffle = shufflevector <4 x double> %vecinit8,
<4 x double>
>>>>> %vecinit8, <4 x i32> <i32 3, i32 3, i32 2, i32
2>
>>>>>      ret <4 x double> %shuffle
>>>>> }
>>>>>
>>>>>            Thanks, Dinar.
>>>>>
>>>>> On Tue, Sep 12, 2017 at 8:26 PM, Haidl, Michael
>>>>> <michael.haidl at uni-muenster.de> wrote:
>>>>>> Interesting approach but how do you handle more complex
offsets, e.g.,
>>>>>> when
>>>>>> the pointer is part of an aggregate? Only one offset
does not seem
>>>>>> enough to
>>>>>> handle generic cases.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Dinar Temirbulatov via llvm-dev [llvm-dev at
lists.llvm.org]
>>>>>> Received: Dienstag, 12 Sep. 2017, 9:57
>>>>>> To: llvm-dev at lists.llvm.org [llvm-dev at
lists.llvm.org]
>>>>>> CC: Filipe Cabecinhas [me at filcab.net]
>>>>>> Subject: [llvm-dev] RFC phantom memory intrinsic
>>>>>>
>>>>>> Hi,
>>>>>> For PR21780 solution, I plan to add a new functionality
to restore
>>>>>> memory operations that was once deleted, in this
particular case it is
>>>>>> the load operations that were deleted by InstCombine,
please note that
>>>>>> once the load was removed there is no way to restore it
back and that
>>>>>> prevents us from vectorizing the shuffle operation.
There are probably
>>>>>> more similar issues where this approach could be
applied.
>>>>>> I added phatom_mem(llvm_anyptr_ty, llvm_i64_ty)
intrinsic for that,
>>>>>> indicating that for particular pointer let's call
it %ptr we observed
>>>>>> maximum possible offset at which there was reference by
its type in a
>>>>>> function. After InstCombine deleted the load operation,
it could be
>>>>>> restored in SLPVectorizer and we could restore chains
of GEPs, Loads
>>>>>> and Inserts in case we encounter phatom_mem intrinsic.
>>>>>>
>>>>>>     Here is two part review:
>>>>>>              https://reviews.llvm.org/D37579 -
InstCombine part.
>>>>>>              https://reviews.llvm.org/D37648 - SLP
part.
>>>>>>
>>>>>> Also, there might be different approaches in describing
deleted memory
>>>>>> operations, for example, for my case:
phantom_load(llvm_anyptr_ty,
>>>>>> llvm_i64_ty). First parameter describes pointer and
second parameter
>>>>>> offset from pointer this loaded was deleted, for
example. This two
>>>>>> operations:
>>>>>>
>>>>>>      %arrayidx1 = getelementptr inbounds double,
double* %ptr, i64 1
>>>>>>      %ld1 = load double, double* %arrayidx1
>>>>>>
>>>>>> could be represented in the IR with this one:
"void phantom_load(%ptr,
>>>>>> 1)" after removal. But, the approach that is
already implemented in
>>>>>> both reviews looks better to me since we don't need
to add intrinsic
>>>>>> for every removed operation in the IR. Also, while
constructing such
>>>>>> form in the IR we have to be careful since some pointer
operations
>>>>>> might be in loops and as the result we might end up
construction an
>>>>>> incorrect IR.  So, I just avoid to notice any pointer
operation if it
>>>>>> is belong to a loop, except those where the the whole
chain of
>>>>>> operations pointer origin, GEP, Load, Shuffle operation
are in the
>>>>>> same loop and in the same basic block.
>>>>>>                                             Thanks,
Dinar.
>>>>>>
>>>>>> Here is the thread for this issue regarding using
metadata:
>>>>>>
http://lists.llvm.org/pipermail/llvm-dev/2017-July/115730.html
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>> --
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Dinar Temirbulatov via llvm-dev

2017-Sep-26 23:38 UTC

head link

[llvm-dev] RFC phantom memory intrinsic

>In general, our use of intrinsics vs. (metadata or attributes), especially
an intrinsic that would be automatically introduced during canonicalization, is
characterized as "only if there's no other way". The reason is
that intrinsics are ?>expensive, they add uses to otherwise-single-use values
(which block optimization) (*), and keep otherwise-dead code alive.
>
>I believe that there are uses for an intrinsic like this. Such uses require
that the information implied by the intrinsic be anchored to a particular place
in the CFG (and, importantly, not be hoisted or removed). This may be useful to
convey >at what point in the CFG some memory is dereferenceable, thus
allowing, for example, sunk accesses to be re-hoisted. In your case, however,
you don't need that information (AFAIKT). Instead, you just need to know
that you can widen >certain loads. This is valuable information, but it can
be conveyed with metadata on the relevant loads. That would be a cheaper
solution. If this seems impractical, I'd certainly like to understand why.ok, I see, yes it would be certainly cheaper by using metadata. Thank
you Philip for pointing for important use-case in D37648.
                            Thanks, Dinar.

On Wed, Sep 27, 2017 at 2:21 AM, Hal Finkel <hfinkel at anl.gov>
wrote:>
> On 09/26/2017 08:31 AM, Dinar Temirbulatov wrote:
>>
>> Hi Hal,
>>>
>>> Are you primarily concerned with being able to widen loads later in
the
>>> pipeline? Could we attached metadata to the remaining loads
indicating that
>>> it would be legal to widen them?
>>
>> no, I don't have any concerns about intrinsic way of
implementation,
>> and intrinsic way looks safer for me since we somehow detach our
>> information about memory from that actual load instruction.
>
>
> In general, our use of intrinsics vs. (metadata or attributes), especially
> an intrinsic that would be automatically introduced during
canonicalization,
> is characterized as "only if there's no other way". The
reason is that
> intrinsics are expensive, they add uses to otherwise-single-use values
> (which block optimization) (*), and keep otherwise-dead code alive.
>
> I believe that there are uses for an intrinsic like this. Such uses require
> that the information implied by the intrinsic be anchored to a particular
> place in the CFG (and, importantly, not be hoisted or removed). This may be
> useful to convey at what point in the CFG some memory is dereferenceable,
> thus allowing, for example, sunk accesses to be re-hoisted. In your case,
> however, you don't need that information (AFAIKT). Instead, you just
need to
> know that you can widen certain loads. This is valuable information, but it
> can be conveyed with metadata on the relevant loads. That would be a
cheaper
> solution. If this seems impractical, I'd certainly like to understand
why.
>
> (*) CodeMetrics has a way to collect "ephemeral values" to avoid
extra
> values affecting some of the cost modeling, but they still block
> optimization.
>
> Thanks again,
> Hal
>
>
>>   I updated
>> https://reviews.llvm.org/D37579 and https://reviews.llvm.org/D37648
>> with adding the aggregate pointer as second parameter as Michael
>> asked. So now, the intrinsic look like this:
>>
>> void phantom_mem(any_pointer base, any_pointer aggregate, uint64_t
>> maximum_offset)
>>
>> For PR21780, I don't need to use aggregate so it is set to null,
but
>> for other similar issues this aggregate parameter might be useful.
>>                               Thanks, Dinar.
>>
>> On Tue, Sep 26, 2017 at 5:52 AM, Hal Finkel <hfinkel at anl.gov>
wrote:
>>>
>>> On 09/13/2017 04:46 PM, Dinar Temirbulatov via llvm-dev wrote:
>>>>
>>>> Hi Michael,
>>>>>
>>>>> I have a case where InstCombine removes a store and your
approach would
>>>>> be
>>>>> valuable for me if the entire access to an aggregate could
be restored.
>>>>
>>>> Yes, no problem and we could add the aggregate pointer to this
new
>>>> intrinsic and in my particular case I should ignore it, but I
am
>>>> looking now at "speculation_marker" metadata and I am
still not sure
>>>> how to implement it better.
>>>
>>>
>>> Are you primarily concerned with being able to widen loads later in
the
>>> pipeline? Could we attached metadata to the remaining loads
indicating
>>> that
>>> it would be legal to widen them?
>>>
>>>   -Hal
>>>
>>>
>>>>                     Thanks, Dinar.
>>>>
>>>> On Wed, Sep 13, 2017 at 3:23 PM, Haidl, Michael
>>>> <michael.haidl at uni-muenster.de> wrote:
>>>>>
>>>>> Hi Dinar,
>>>>>
>>>>> I am asking because I am maintaining an out-of-tree pass
which does
>>>>> exactly what SLP does not. It is a pass designed for GPUs
to combine
>>>>> loads and stores, e.g., when consecutive fields of a
structure have the
>>>>> same type it merges the loads and stores to vector loads
and stores. I
>>>>> have a case where InstCombine removes a store and your
approach would
>>>>> be
>>>>> valuable for me if the entire access to an aggregate could
be restored.
>>>>> Second thing I am concerned is that this intrinsics
"just" fix a
>>>>> specific problem of IC where potentially a more generic
solution is
>>>>> needed.
>>>>>
>>>>> Cheers,
>>>>> Michael
>>>>>>
>>>>>> Hi Michael,
>>>>>>>
>>>>>>> Interesting approach but how do you handle more
complex offsets,
>>>>>>> e.g.,
>>>>>>> when the pointer is part of an aggregate? Only one
offset does not
>>>>>>> seem
>>>>>>> enough to handle generic cases.
>>>>>>
>>>>>> Yes, correct, this a little bit changed example is not
working.
>>>>>> #include <x86intrin.h>
>>>>>>
>>>>>> __m256d vsht_d4_fold(const double* ptr, unsigned long
long i) {
>>>>>>      __m256d foo = (__m256d){ ptr[i], ptr[i+1],
ptr[i+2], ptr[i+3] };
>>>>>>      return __builtin_shufflevector( foo, foo, 3, 3, 2,
2 );
>>>>>> }
>>>>>> But with the aggregate case it is a new level of
complexity, should we
>>>>>> we care about? There might be some logic that probably
would be mark
>>>>>> as dead by InstCombine and we don't want to keep
it.
>>>>>> BTW: Looks like SLP could not recognize the case either
:
>>>>>> define <4 x double> @vsht_d4_fold(double* %ptr,
i64 %i)
>>>>>> local_unnamed_addr #0 {
>>>>>> entry:
>>>>>>      %arrayidx = getelementptr inbounds double, double*
%ptr, i64 %i
>>>>>>      %0 = load double, double* %arrayidx, align 8
>>>>>>      %vecinit = insertelement <4 x double> undef,
double %0, i32 0
>>>>>>      %add = add i64 %i, 1
>>>>>>      %arrayidx1 = getelementptr inbounds double,
double* %ptr, i64
>>>>>> %add
>>>>>>      %1 = load double, double* %arrayidx1, align 8
>>>>>>      %vecinit2 = insertelement <4 x double>
%vecinit, double %1, i32 1
>>>>>>      %add3 = add i64 %i, 2
>>>>>>      %arrayidx4 = getelementptr inbounds double,
double* %ptr, i64
>>>>>> %add3
>>>>>>      %2 = load double, double* %arrayidx4, align 8
>>>>>>      %vecinit5 = insertelement <4 x double>
%vecinit2, double %2, i32
>>>>>> 2
>>>>>>      %add6 = add i64 %i, 3
>>>>>>      %arrayidx7 = getelementptr inbounds double,
double* %ptr, i64
>>>>>> %add6
>>>>>>      %3 = load double, double* %arrayidx7, align 8
>>>>>>      %vecinit8 = insertelement <4 x double>
%vecinit5, double %3, i32
>>>>>> 3
>>>>>>      %shuffle = shufflevector <4 x double>
%vecinit8, <4 x double>
>>>>>> %vecinit8, <4 x i32> <i32 3, i32 3, i32 2, i32
2>
>>>>>>      ret <4 x double> %shuffle
>>>>>> }
>>>>>>
>>>>>>            Thanks, Dinar.
>>>>>>
>>>>>> On Tue, Sep 12, 2017 at 8:26 PM, Haidl, Michael
>>>>>> <michael.haidl at uni-muenster.de> wrote:
>>>>>>>
>>>>>>> Interesting approach but how do you handle more
complex offsets,
>>>>>>> e.g.,
>>>>>>> when
>>>>>>> the pointer is part of an aggregate? Only one
offset does not seem
>>>>>>> enough to
>>>>>>> handle generic cases.
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Dinar Temirbulatov via llvm-dev [llvm-dev at
lists.llvm.org]
>>>>>>> Received: Dienstag, 12 Sep. 2017, 9:57
>>>>>>> To: llvm-dev at lists.llvm.org [llvm-dev at
lists.llvm.org]
>>>>>>> CC: Filipe Cabecinhas [me at filcab.net]
>>>>>>> Subject: [llvm-dev] RFC phantom memory intrinsic
>>>>>>>
>>>>>>> Hi,
>>>>>>> For PR21780 solution, I plan to add a new
functionality to restore
>>>>>>> memory operations that was once deleted, in this
particular case it
>>>>>>> is
>>>>>>> the load operations that were deleted by
InstCombine, please note
>>>>>>> that
>>>>>>> once the load was removed there is no way to
restore it back and that
>>>>>>> prevents us from vectorizing the shuffle operation.
There are
>>>>>>> probably
>>>>>>> more similar issues where this approach could be
applied.
>>>>>>> I added phatom_mem(llvm_anyptr_ty, llvm_i64_ty)
intrinsic for that,
>>>>>>> indicating that for particular pointer let's
call it %ptr we observed
>>>>>>> maximum possible offset at which there was
reference by its type in a
>>>>>>> function. After InstCombine deleted the load
operation, it could be
>>>>>>> restored in SLPVectorizer and we could restore
chains of GEPs, Loads
>>>>>>> and Inserts in case we encounter phatom_mem
intrinsic.
>>>>>>>
>>>>>>>     Here is two part review:
>>>>>>>              https://reviews.llvm.org/D37579 -
InstCombine part.
>>>>>>>              https://reviews.llvm.org/D37648 - SLP
part.
>>>>>>>
>>>>>>> Also, there might be different approaches in
describing deleted
>>>>>>> memory
>>>>>>> operations, for example, for my case:
phantom_load(llvm_anyptr_ty,
>>>>>>> llvm_i64_ty). First parameter describes pointer and
second parameter
>>>>>>> offset from pointer this loaded was deleted, for
example. This two
>>>>>>> operations:
>>>>>>>
>>>>>>>      %arrayidx1 = getelementptr inbounds double,
double* %ptr, i64 1
>>>>>>>      %ld1 = load double, double* %arrayidx1
>>>>>>>
>>>>>>> could be represented in the IR with this one:
"void
>>>>>>> phantom_load(%ptr,
>>>>>>> 1)" after removal. But, the approach that is
already implemented in
>>>>>>> both reviews looks better to me since we don't
need to add intrinsic
>>>>>>> for every removed operation in the IR. Also, while
constructing such
>>>>>>> form in the IR we have to be careful since some
pointer operations
>>>>>>> might be in loops and as the result we might end up
construction an
>>>>>>> incorrect IR.  So, I just avoid to notice any
pointer operation if it
>>>>>>> is belong to a loop, except those where the the
whole chain of
>>>>>>> operations pointer origin, GEP, Load, Shuffle
operation are in the
>>>>>>> same loop and in the same basic block.
>>>>>>>                                             Thanks,
Dinar.
>>>>>>>
>>>>>>> Here is the thread for this issue regarding using
metadata:
>>>>>>>
http://lists.llvm.org/pipermail/llvm-dev/2017-July/115730.html
>>>>>>> _______________________________________________
>>>>>>> LLVM Developers mailing list
>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>> --
>>> Hal Finkel
>>> Lead, Compiler Technology and Programming Languages
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>

llvm dev - Sep 2017 - RFC phantom memory intrinsic

[llvm-dev] RFC phantom memory intrinsic

[llvm-dev] RFC phantom memory intrinsic

[llvm-dev] RFC phantom memory intrinsic