Hi,
I ran into an issue caused by the Simplify the CFG pass (SimplifyCFG). We have the following instructions:
sw.bb: ; preds = %if.then63
  %bf.load65 = load i192, i192* %13, align 4
  %bf.lshr66 = lshr i192 %bf.load65, 80

sw.bb70: ; preds = %if.then63
  %bf.load73 = load i192, i192* %15, align 4
  %bf.lshr74 = lshr i192 %bf.load73, 96

sw.bb78: ; preds = %if.then63
  %bf.load81 = load i192, i192* %17, align 4
  %bf.lshr82 = lshr i192 %bf.load81, 112

sw.bb86: ; preds = %if.then63
  %bf.load89 = load i192, i192* %19, align 4
  %bf.lshr90 = lshr i192 %bf.load89, 128

sw.bb94: ; preds = %if.then63
  %bf.load97 = load i192, i192* %21, align 4
  %bf.lshr98 = lshr i192 %bf.load97, 144
Each load/lshr pair is in a different block, and SimplifyCFG sinks them all into a common end block.
It also creates a PHI node, %.sink, for the constant shift amount of the lshr:
sw.epilog.sink.split: ; preds = %if.then63, %sw.bb, %sw.bb78, %sw.bb86, %sw.bb94
  %.sink = phi i192 [ 144, %sw.bb94 ], [ 128, %sw.bb86 ], [ 112, %sw.bb78 ],
                    [ 80, %sw.bb ], [ 96, %if.then63 ]
  %bf.load97 = load i192, i192* %13, align 4
  %bf.lshr98 = lshr i192 %bf.load97, %.sink
Before the lshr instructions are sunk, our backend can tell during lowering which 32 bits of
the i192 are needed and loads only those 32 bits.
After this pass, however, the value of %.sink is unknown at compile time, so the backend
emits the full i192 load (8 32-bit loads).
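
To make the narrowing concrete, here is a minimal Python sketch (not our backend's code). It assumes a little-endian layout and a 32-bit legal load width; the 12-bit field comes from the `and i32 ..., 4095` mask in the IR below. The helper names are made up for illustration. With a constant shift, the word index and the in-word shift can be computed up front, and that is exactly what is lost once the shift amount becomes %.sink.

```python
WORD_BITS = 32  # assumed legal load width on the target


def narrow_access(shift, field_bits=12, word_bits=WORD_BITS):
    """Map a constant shift to (word index, shift inside that word).

    Returns None when the field would straddle two words.
    """
    word, offset = divmod(shift, word_bits)
    if offset + field_bits > word_bits:
        return None
    return word, offset


def wide_extract(data, shift, field_bits=12):
    """Reference: load all 192 bits (little-endian), then lshr/trunc/and."""
    value = int.from_bytes(data, "little")
    return (value >> shift) & ((1 << field_bits) - 1)


def narrow_extract(data, shift, field_bits=12):
    """Load only the single 32-bit word that holds the field."""
    access = narrow_access(shift, field_bits)
    if access is None:
        raise ValueError("field straddles two words; one narrow load is not enough")
    word, offset = access
    start = word * (WORD_BITS // 8)
    chunk = int.from_bytes(data[start:start + WORD_BITS // 8], "little")
    return (chunk >> offset) & ((1 << field_bits) - 1)


data = bytes(range(1, 25))  # 24 bytes = 192 bits of sample data
for shift in (80, 96, 112, 128, 144):
    # Each constant shift from the switch needs exactly one 32-bit word.
    assert narrow_extract(data, shift) == wide_extract(data, shift)
    print(shift, narrow_access(shift))
```

For the five shift amounts in the switch, the field always fits in a single 32-bit word, so one narrow load per case is enough.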
I have some ideas about how to handle this, but I would still like some feedback on
the best approach.
(The LLVM IR before and after SimplifyCFG is shown below.)
Thank you.
Best regards,
Ning Xie
*** Before SimplifyCFG is applied, we have the following LLVM IR ***
if.then63: ; preds = %if.end
  %trunc = trunc i8 %11 to i3
  switch i3 %trunc, label %sw.epilog [
    i3 0, label %sw.bb
    i3 1, label %sw.bb70
    i3 2, label %sw.bb78
    i3 3, label %sw.bb86
    i3 -4, label %sw.bb94
  ]

sw.bb: ; preds = %if.then63
  %13 = getelementptr inbounds %struct.C0000294C, %struct.C0000294C* %C0000159C,
        i32 0, i32 5, i32 3, i32 %conv50, i32 0
  %bf.load65 = load i192, i192* %13, align 4
  %bf.lshr66 = lshr i192 %bf.load65, 80
  %14 = trunc i192 %bf.lshr66 to i32
  %bf.cast68 = and i32 %14, 4095
  br label %sw.epilog

sw.bb70: ; preds = %if.then63
  %15 = getelementptr inbounds %struct.C0000294C, %struct.C0000294C* %C0000159C,
        i32 0, i32 5, i32 3, i32 %conv50, i32 0
  %bf.load73 = load i192, i192* %15, align 4
  %bf.lshr74 = lshr i192 %bf.load73, 96
  %16 = trunc i192 %bf.lshr74 to i32
  %bf.cast76 = and i32 %16, 4095
  br label %sw.epilog

sw.bb78: ; preds = %if.then63
  %17 = getelementptr inbounds %struct.C0000294C, %struct.C0000294C* %C0000159C,
        i32 0, i32 5, i32 3, i32 %conv50, i32 0
  %bf.load81 = load i192, i192* %17, align 4
  %bf.lshr82 = lshr i192 %bf.load81, 112
  %18 = trunc i192 %bf.lshr82 to i32
  %bf.cast84 = and i32 %18, 4095
  br label %sw.epilog

sw.bb86: ; preds = %if.then63
  %19 = getelementptr inbounds %struct.C0000294C, %struct.C0000294C* %C0000159C,
        i32 0, i32 5, i32 3, i32 %conv50, i32 0
  %bf.load89 = load i192, i192* %19, align 4
  %bf.lshr90 = lshr i192 %bf.load89, 128
  %20 = trunc i192 %bf.lshr90 to i32
  %bf.cast92 = and i32 %20, 4095
  br label %sw.epilog

sw.bb94: ; preds = %if.then63
  %21 = getelementptr inbounds %struct.C0000294C, %struct.C0000294C* %C0000159C,
        i32 0, i32 5, i32 3, i32 %conv50, i32 0
  %bf.load97 = load i192, i192* %21, align 4
  %bf.lshr98 = lshr i192 %bf.load97, 144
  %22 = trunc i192 %bf.lshr98 to i32
  %bf.cast100 = and i32 %22, 4095
  br label %sw.epilog

*** IR Dump After Simplify the CFG ***
if.then63: ; preds = %if.end
  %trunc = trunc i8 %11 to i3
  switch i3 %trunc, label %sw.epilog [
    i3 0, label %sw.bb
    i3 1, label %sw.epilog.sink.split
    i3 2, label %sw.bb78
    i3 3, label %sw.bb86
    i3 -4, label %sw.bb94
  ]

sw.bb: ; preds = %if.then63
  br label %sw.epilog.sink.split

sw.bb78: ; preds = %if.then63
  br label %sw.epilog.sink.split

sw.bb86: ; preds = %if.then63
  br label %sw.epilog.sink.split

sw.bb94: ; preds = %if.then63
  br label %sw.epilog.sink.split

sw.epilog.sink.split: ; preds = %if.then63, %sw.bb, %sw.bb78, %sw.bb86, %sw.bb94
  %.sink = phi i192 [ 144, %sw.bb94 ], [ 128, %sw.bb86 ], [ 112, %sw.bb78 ],
                    [ 80, %sw.bb ], [ 96, %if.then63 ]
  %13 = getelementptr inbounds %struct.C0000294C, %struct.C0000294C* %C0000159C,
        i32 0, i32 5, i32 3, i32 %conv50, i32 0
  %bf.load97 = load i192, i192* %13, align 4
  %bf.lshr98 = lshr i192 %bf.load97, %.sink
  %14 = trunc i192 %bf.lshr98 to i32
  %bf.cast100 = and i32 %14, 4095
  br label %sw.epilog

On 8/3/2017 3:19 PM, Ning XIE via llvm-dev wrote:
> Before the lshr instructions are sunk, our backend can tell during
> lowering which 32 bits of the i192 are needed and loads only those
> 32 bits.
> After this pass, however, the value of %.sink is unknown at compile
> time, so the backend emits the full i192 load (8 32-bit loads).
> I have some ideas about how to handle this, but I would still like
> some feedback on the best approach.

I think we need to improve the cost modeling for sinking code.
Fundamentally, the problem is that "lshr i192 %bf.load97, %.sink" is a
lot more expensive than "lshr i192 %bf.load97, 80", and we don't really
account for that in the code which decides whether to sink the shift.

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
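
As a rough illustration of the kind of check the reply points at, the Python sketch below models a sinking decision that treats a wide shift as a special case. This is a toy model and not SimplifyCFG's actual logic or any LLVM API: the function name, the string opcodes, and the 32-bit legal width are all assumptions made for the example.

```python
LEGAL_BITS = 32  # assumed widest legal integer type on the target

SHIFT_OPCODES = {"shl", "lshr", "ashr"}


def sinking_is_profitable(opcode, bit_width, shift_amounts, legal_bits=LEGAL_BITS):
    """Toy check: return False when sinking would make a wide shift variable.

    shift_amounts holds the shift operand seen in each predecessor block;
    an int stands for a constant, anything else for a non-constant value.
    """
    if opcode not in SHIFT_OPCODES or bit_width <= legal_bits:
        return True  # not the case this sketch is concerned with
    all_constant = all(isinstance(a, int) for a in shift_amounts)
    same_amount = len(set(shift_amounts)) == 1
    # Differing constants would be merged into a PHI, which turns a cheap
    # constant shift into a variable shift on an illegal (too wide) type.
    return not (all_constant and not same_amount)


# The case from this thread: five different constant shifts of an i192.
print(sinking_is_profitable("lshr", 192, [80, 96, 112, 128, 144]))
```

A real cost model would weigh the code-size savings from sinking against the target's cost for the variable shift instead of rejecting the transform outright; the sketch only shows where the constant-versus-variable distinction would enter the decision.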