Currently, when a loop with loads or stores is partially unrolled, the
generated code ends up looking like:
for.body74:                                       ; preds = %for.body74, %for.cond71.preheader
  %indvars.iv = phi i64 [ 0, %for.cond71.preheader ], [ %indvars.iv.next.9, %for.body74 ]
  ...
  %indvars.iv.next.1 = add i64 %indvars.iv, 2
  %arrayidx78.2 = getelementptr inbounds [200 x [200 x double]]* @a, i64 0, i64 %indvars.iv10, i64 %indvars.iv.next.1
  ...
  %indvars.iv.next.2 = add i64 %indvars.iv, 3
  %arrayidx78.3 = getelementptr inbounds [200 x [200 x double]]* @a, i64 0, i64 %indvars.iv10, i64 %indvars.iv.next.2
  ...
  %indvars.iv.next.3 = add i64 %indvars.iv, 4
  %arrayidx78.4 = getelementptr inbounds [200 x [200 x double]]* @a, i64 0, i64 %indvars.iv10, i64 %indvars.iv.next.3
  ...
I think it would be better to compute the base pointer once at the
beginning of the unrolled body and then address each access as a constant
offset from it. Is that possible? For one thing,
llvm::GetPointerBaseWithConstantOffset currently cannot tell that all of
the loads and stores reference the same base pointer with a constant
offset. If it could, that would facilitate vectorization (and possibly
other optimizations as well).
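
To make the two addressing shapes concrete, here is a source-level C++
sketch. It is only an illustration, not what the unroller emits: the
function names, the unroll factor of 4, and the assumption that the trip
count is a multiple of 4 are all made up for this example.

```cpp
#include <cstddef>

constexpr std::size_t N = 200;
static double a[N][N];  // stands in for the global @a in the listing

// Shape produced today: every unrolled access recomputes its address from
// the array base using a separately incremented index (j, j + 1, ...).
double sumRowIndexed(std::size_t i) {
  double s = 0.0;
  for (std::size_t j = 0; j < N; j += 4) {
    s += a[i][j];
    s += a[i][j + 1];
    s += a[i][j + 2];
    s += a[i][j + 3];
  }
  return s;
}

// Shape being asked for: one base pointer per unrolled iteration, with
// every access expressed as a constant offset from that pointer.
double sumRowBased(std::size_t i) {
  double s = 0.0;
  for (std::size_t j = 0; j < N; j += 4) {
    const double *base = &a[i][j];  // computed once per iteration
    s += base[0];
    s += base[1];
    s += base[2];
    s += base[3];
  }
  return s;
}
```

Both functions read the same elements in the same order; only the second
makes the shared base and the constant offsets explicit.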
If the current scheme is the only way (given LLVM's abstraction model),
would it make sense for llvm::GetPointerBaseWithConstantOffset to peel
away constant adds and subtracts from the index operands and then
construct a new base pointer / offset pair from the result?
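
Roughly what I have in mind, as a toy model in plain C++. Nothing here is
LLVM API: Index, peelConstant, and constantByteDistance are hypothetical
names, and the sketch ignores overflow and wrapping, which a real
implementation would have to consider.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Toy stand-in for an IR index value (NOT an LLVM class): either an opaque
// leaf (operand == nullptr) or "operand + constant".
struct Index {
  const Index *operand;   // nullptr for a leaf value
  std::int64_t constant;  // constant added to operand (negative = subtract)
};

// Strip nested constant adds/subtracts and return the remaining leaf
// together with the accumulated constant.
std::pair<const Index *, std::int64_t> peelConstant(const Index *idx) {
  std::int64_t total = 0;
  while (idx->operand != nullptr) {
    total += idx->constant;
    idx = idx->operand;
  }
  return {idx, total};
}

// Two accesses that differ only in their last index share a base when both
// indices peel down to the same leaf. In that case the byte distance
// between them is a compile-time constant.
bool constantByteDistance(const Index *x, const Index *y,
                          std::int64_t elemSize, std::int64_t &distance) {
  assert(elemSize > 0);
  const auto px = peelConstant(x);
  const auto py = peelConstant(y);
  if (px.first != py.first)
    return false;  // different underlying values: nothing can be said
  distance = (py.second - px.second) * elemSize;
  return true;
}
```

Applied to the listing above, %indvars.iv.next.1 peels to (%indvars.iv, 2)
and %indvars.iv.next.2 peels to (%indvars.iv, 3), so the two GEPs are one
double, i.e. 8 bytes, apart.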
Also, why is partial unrolling not turned on by default?
Thanks in advance,
Hal
--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory