Displaying 10 results from an estimated 10 matches for "vecinit".
2020 Jan 11
2
[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors
...ather than on the input
vectors themselves.
In the PR you linked, there is an example that shows the difference
(simplified to <2 x double> for brevity):
define dso_local <2 x double> @test(i64 %a, i64 %b) {
entry:
%conv = uitofp i64 %a to double
%conv1 = uitofp i64 %b to double
%vecinit = insertelement <2 x double> undef, double %conv, i32 0
%vecinit2 = insertelement <2 x double> %vecinit, double %conv1, i32 1
ret <2 x double> %vecinit2
}
The inputs here are scalars so I suppose it is quite possible (perhaps
likely) that on some targets, doing the insert wit...
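For comparison, here is a hedged sketch of the alternative form that the snippet above appears to contrast with: the two i64 scalars are inserted into a <2 x i64> vector first, and a single vector uitofp converts both lanes. The function name @test_vec and the value names are hypothetical and not taken from the thread. Which form is cheaper depends on the target.

```llvm
; Illustrative only: @test_vec is a hypothetical name, not from the thread.
define <2 x double> @test_vec(i64 %a, i64 %b) {
entry:
  ; Build the integer vector from the scalar inputs first.
  %vecinit = insertelement <2 x i64> undef, i64 %a, i32 0
  %vecinit2 = insertelement <2 x i64> %vecinit, i64 %b, i32 1
  ; One vector conversion replaces the two scalar uitofp instructions.
  %conv = uitofp <2 x i64> %vecinit2 to <2 x double>
  ret <2 x double> %conv
}
```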
2020 Jan 11
2
[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors
...you linked, there is an example that shows the difference
>> (simplified to <2 x double> for brevity):
>> define dso_local <2 x double> @test(i64 %a, i64 %b) {
>> entry:
>> %conv = uitofp i64 %a to double
>> %conv1 = uitofp i64 %b to double
>> %vecinit = insertelement <2 x double> undef, double %conv, i32 0
>> %vecinit2 = insertelement <2 x double> %vecinit, double %conv1, i32 1
>> ret <2 x double> %vecinit2
>> }
>>
>> The inputs here are scalars so I suppose it is quite possible (perhaps
>>...
2012 Feb 28
1
[LLVMdev] How to vectorize a vector type cast?
...it-llvm" and then through "opt -O2 -S", produces the following IR:
define <4 x float> @to_float4(i32 %in.coerce) nounwind uwtable readnone {
entry:
%0 = bitcast i32 %in.coerce to <4 x i8>
%1 = extractelement <4 x i8> %0, i32 0
%conv = uitofp i8 %1 to float
%vecinit = insertelement <4 x float> undef, float %conv, i32 0
%2 = extractelement <4 x i8> %0, i32 1
%conv2 = uitofp i8 %2 to float
%vecinit3 = insertelement <4 x float> %vecinit, float %conv2, i32 1
%3 = extractelement <4 x i8> %0, i32 2
%conv4 = uitofp i8 %3 to float
%...
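As a rough illustration of what a vectorized version of this cast could look like, the sketch below applies uitofp directly to the <4 x i8> value instead of extracting, converting, and re-inserting each lane. It assumes the truncated remainder of the function above repeats the same per-lane pattern for lanes 2 and 3. The name @to_float4_vec is hypothetical, and this is not presented as the output of any particular pass.

```llvm
; Illustrative only: @to_float4_vec is a hypothetical name.
define <4 x float> @to_float4_vec(i32 %in.coerce) {
entry:
  ; Reinterpret the 32-bit input as four bytes, as in the scalarized form.
  %0 = bitcast i32 %in.coerce to <4 x i8>
  ; uitofp accepts vector operands, so all four lanes convert at once.
  %conv = uitofp <4 x i8> %0 to <4 x float>
  ret <4 x float> %conv
}
```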
2020 Jan 10
2
[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors
I have added a few PPC-specific DAG combines in the past that follow this
pattern on specific operations. Now that it appears that this would be
useful to do on yet another operation, I'm wondering what people think
about doing this in the target-independent DAG Combiner for any
legal/custom operation on the target.
TL;DR:
The generic pattern would look like this:
(build_vector (op
2017 Sep 13
2
RFC phantom memory intrinsic
...nd we don't want to keep it.
BTW: Looks like SLP could not recognize the case either:
define <4 x double> @vsht_d4_fold(double* %ptr, i64 %i) local_unnamed_addr #0 {
entry:
%arrayidx = getelementptr inbounds double, double* %ptr, i64 %i
%0 = load double, double* %arrayidx, align 8
%vecinit = insertelement <4 x double> undef, double %0, i32 0
%add = add i64 %i, 1
%arrayidx1 = getelementptr inbounds double, double* %ptr, i64 %add
%1 = load double, double* %arrayidx1, align 8
%vecinit2 = insertelement <4 x double> %vecinit, double %1, i32 1
%add3 = add i64 %i, 2...
2017 Sep 13
2
RFC phantom memory intrinsic
...SLP could not recognize the case either:
>> define <4 x double> @vsht_d4_fold(double* %ptr, i64 %i) local_unnamed_addr #0 {
>> entry:
>> %arrayidx = getelementptr inbounds double, double* %ptr, i64 %i
>> %0 = load double, double* %arrayidx, align 8
>> %vecinit = insertelement <4 x double> undef, double %0, i32 0
>> %add = add i64 %i, 1
>> %arrayidx1 = getelementptr inbounds double, double* %ptr, i64 %add
>> %1 = load double, double* %arrayidx1, align 8
>> %vecinit2 = insertelement <4 x double> %vecinit, dou...
2017 Sep 26
0
RFC phantom memory intrinsic
...e the case either:
>>> define <4 x double> @vsht_d4_fold(double* %ptr, i64 %i) local_unnamed_addr #0 {
>>> entry:
>>> %arrayidx = getelementptr inbounds double, double* %ptr, i64 %i
>>> %0 = load double, double* %arrayidx, align 8
>>> %vecinit = insertelement <4 x double> undef, double %0, i32 0
>>> %add = add i64 %i, 1
>>> %arrayidx1 = getelementptr inbounds double, double* %ptr, i64 %add
>>> %1 = load double, double* %arrayidx1, align 8
>>> %vecinit2 = insertelement <4 x doub...
2017 Sep 26
2
RFC phantom memory intrinsic
...define <4 x double> @vsht_d4_fold(double* %ptr, i64 %i)
>>>> local_unnamed_addr #0 {
>>>> entry:
>>>> %arrayidx = getelementptr inbounds double, double* %ptr, i64 %i
>>>> %0 = load double, double* %arrayidx, align 8
>>>> %vecinit = insertelement <4 x double> undef, double %0, i32 0
>>>> %add = add i64 %i, 1
>>>> %arrayidx1 = getelementptr inbounds double, double* %ptr, i64 %add
>>>> %1 = load double, double* %arrayidx1, align 8
>>>> %vecinit2 = insertelem...
2017 Sep 26
0
RFC phantom memory intrinsic
...> @vsht_d4_fold(double* %ptr, i64 %i)
>>>>> local_unnamed_addr #0 {
>>>>> entry:
>>>>> %arrayidx = getelementptr inbounds double, double* %ptr, i64 %i
>>>>> %0 = load double, double* %arrayidx, align 8
>>>>> %vecinit = insertelement <4 x double> undef, double %0, i32 0
>>>>> %add = add i64 %i, 1
>>>>> %arrayidx1 = getelementptr inbounds double, double* %ptr, i64 %add
>>>>> %1 = load double, double* %arrayidx1, align 8
>>>>> %v...
2017 Sep 12
3
RFC phantom memory intrinsic
Hi,
To solve PR21780, I plan to add new functionality for restoring
memory operations that were once deleted; in this particular case, the
load operations that InstCombine deleted. Please note that once a
load is removed there is no way to restore it, and that
prevents us from vectorizing the shuffle operation. There are probably
more similar issues where this approach could