Thomas Preud'homme via llvm-dev
2021-Jun-04 09:06 UTC
[llvm-dev] Hardware loop and SpeculateAroundPHIs pass
Hi,
We've experienced a regression due to SpeculateAroundPHIs on our
non-upstream target: code that used a hardware loop now uses a regular branch
loop. Apologies for only picking this up now; we only started using the new
pass manager when we merged in the commit that enabled it by default.
An example of what causes us trouble is:
unsigned KnownDec(unsigned *arr) {
  unsigned x = 0x2000;
  unsigned z = 0;
  while (x) {
    z += arr[x-1];
    x--;
  }
  return z;
}
We get the following IR after the SpeculateAroundPHIs pass:
entry:
  %sub.0 = add nsw i32 2000, -1
  br label %while.body

while.body:                       ; preds = %while.body.while.body_crit_edge, %entry
  %z.07 = phi i32 [ 0, %entry ], [ %add, %while.body.while.body_crit_edge ]
  %sub.phi = phi i32 [ %sub.0, %entry ], [ %sub.1, %while.body.while.body_crit_edge ]
  %arrayidx = getelementptr inbounds i32, i32* %arr, i32 %sub.phi
  %0 = load i32, i32* %arrayidx, align 4, !tbaa !2
  %add = add i32 %0, %z.07
  %tobool.not = icmp eq i32 %sub.phi, 0
  br i1 %tobool.not, label %while.end, label %while.body.while.body_crit_edge, !llvm.loop !6

while.body.while.body_crit_edge:  ; preds = %while.body
  %sub.1 = add nsw i32 %sub.phi, -1
  br label %while.body

while.end:                        ; preds = %while.body
  ret i32 %add
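To make the control flow above easier to follow, here is a rough C transliteration of that IR, with one label per basic block. It is only an illustrative sketch: the function name KnownDecSpeculated is made up, and the constant trip count is replaced by a parameter n, assumed nonzero since %entry falls straight into %while.body. What it shows is that the exit test sits in while.body, while the decrement lives alone in the critical-edge block that forms the latch.

```c
/* Illustrative transliteration of the post-SpeculateAroundPHIs IR.
   Hypothetical function; n is assumed nonzero, matching the constant
   trip count that lets %entry branch straight into %while.body. */
unsigned KnownDecSpeculated(const unsigned *arr, unsigned n) {
    unsigned z = 0;         /* %z.07 incoming from %entry */
    unsigned sub = n - 1u;  /* %sub.0: x-1 speculated into %entry */

while_body:                 /* while.body */
    z += arr[sub];          /* %arrayidx, %0 and %add */
    if (sub == 0u)          /* %tobool.not: the exit test lives here... */
        goto while_end;
    /* while.body.while.body_crit_edge: ...while the latch only
       decrements and jumps back unconditionally. */
    sub = sub - 1u;         /* %sub.1 */
    goto while_body;

while_end:                  /* while.end */
    return z;
}
```

Because the latch block ends in an unconditional branch, there is no compare-and-branch on the back edge that a decrement-and-branch instruction could replace.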
The fact that the condition check is separate from the loop latch means we
cannot use a hardware loop instruction. Similar code is generated for PowerPC
when using 0x10000 instead of 2000, but (i) PowerPC runs EarlyCSE afterwards,
which sinks the value back down into the PHI, and (ii) it has a pass that
canonicalizes the loop form for its addressing mode, which incidentally moves
the getelementptr from the critical edge (where the loop strength reduction
pass inserted it) back into the main loop body. This feels rather lucky: after
all, the EarlyCSE and ppc-loop-instr-form-prep passes were added to the PPC
pipeline long before the PHI speculation one.
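For contrast, a sketch of the loop shape a hardware loop instruction wants: the counter update and the exit test both sit at the bottom of the loop, so the back edge can become a single "decrement and branch if nonzero". This is only a C-level illustration under stated assumptions (hypothetical function name KnownDecLatchForm, trip count passed as a nonzero parameter n); whether a particular target's hardware loop lowering accepts it depends on that target.

```c
/* Illustrative only: the same reduction with the decrement and the exit
   test together in the latch. Hypothetical function; n is assumed
   nonzero, as for the constant trip count in the example. */
unsigned KnownDecLatchForm(const unsigned *arr, unsigned n) {
    unsigned z = 0;
    unsigned x = n;
    do {
        z += arr[x - 1u];
    } while (--x != 0u);  /* decrement, compare and back edge in one place */
    return z;
}
```

It computes the same sum as KnownDec above for a trip count of n; only the position of the exit test relative to the back edge differs.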
Is the expectation that targets with hardware loops deal with the result of PHI
speculation? If so, could we have a hook for those targets to disable the pass
when they suspect a hardware loop instruction might be used?
Best regards,
Thomas
Sjoerd Meijer via llvm-dev
2021-Jun-04 09:19 UTC
[llvm-dev] Hardware loop and SpeculateAroundPHIs pass
FWIW, looks related to https://bugs.llvm.org/show_bug.cgi?id=48821.
________________________________
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Thomas Preud'homme via llvm-dev <llvm-dev at lists.llvm.org>
Sent: 04 June 2021 10:06
To: llvm-dev <llvm-dev at lists.llvm.org>; Chandler Carruth <chandlerc at gmail.com>
Subject: [llvm-dev] Hardware loop and SpeculateAroundPHIs pass