thr3ads.net - llvm dev - [llvm-dev] A CFG issue [Aug 2017]

Ning XIE via llvm-dev

2017-Aug-03 22:19 UTC

[llvm-dev] A CFG issue

Hi,

I ran into an issue caused by the Simplify the CFG pass. We have the following instructions:

sw.bb:                                            ; preds = %if.then63
  %bf.load65 = load i192, i192* %13, align 4
  %bf.lshr66 = lshr i192 %bf.load65, 80

sw.bb70:                                          ; preds = %if.then63
  %bf.load73 = load i192, i192* %15, align 4
  %bf.lshr74 = lshr i192 %bf.load73, 96

sw.bb78:                                          ; preds = %if.then63
  %bf.load81 = load i192, i192* %17, align 4
  %bf.lshr82 = lshr i192 %bf.load81, 112

sw.bb86:                                          ; preds = %if.then63
  %bf.load89 = load i192, i192* %19, align 4
  %bf.lshr90 = lshr i192 %bf.load89, 128

sw.bb94:                                          ; preds = %if.then63
  %bf.load97 = load i192, i192* %21, align 4
  %bf.lshr98 = lshr i192 %bf.load97, 144

Each load/lshr pair sits in a different block, and the pass sinks all of them into
one common end block. It also creates a PHI node, %.sink, for the constant
shift-amount operand of lshr:

sw.epilog.sink.split:                             ; preds = %if.then63, %sw.bb, %sw.bb78, %sw.bb86, %sw.bb94
  %.sink = phi i192 [ 144, %sw.bb94 ], [ 128, %sw.bb86 ], [ 112, %sw.bb78 ], [ 80, %sw.bb ], [ 96, %if.then63 ]
  %bf.load97 = load i192, i192* %13, align 4
  %bf.lshr98 = lshr i192 %bf.load97, %.sink
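
To make the effect of the sinking concrete, here is a toy Python model of the
two shapes. It is only a sketch, not LLVM code: the function names and the
SHIFT_FOR_CASE table are made up for illustration, and the trunc/and steps are
taken from the full IR dumps further down. It shows that the sunk form computes
the same value, but with a shift amount that is only known at run time.

```python
MASK_I32 = (1 << 32) - 1

# Shift amount per switch case, read off the IR above.
# The IR's "i3 -4" is the bit pattern 0b100, i.e. 4 when read as unsigned.
SHIFT_FOR_CASE = {0: 80, 1: 96, 2: 112, 3: 128, 4: 144}


def finish(shifted):
    """Mirror the trunc-to-i32 and 'and 4095' that follow each lshr."""
    return (shifted & MASK_I32) & 4095


def before_sinking(case, bitfield):
    """One block per case, each shifting by its own literal constant."""
    if case == 0:
        return finish(bitfield >> 80)
    if case == 1:
        return finish(bitfield >> 96)
    if case == 2:
        return finish(bitfield >> 112)
    if case == 3:
        return finish(bitfield >> 128)
    if case == 4:
        return finish(bitfield >> 144)
    return None  # default case: branches straight to sw.epilog


def after_sinking(case, bitfield):
    """One shared block; the shift amount arrives through a PHI-like lookup."""
    if case not in SHIFT_FOR_CASE:
        return None  # default case still bypasses the shared block
    sink = SHIFT_FOR_CASE[case]  # plays the role of %.sink
    return finish(bitfield >> sink)
```

In before_sinking every shift amount is a literal that a backend can inspect;
in after_sinking the only shift left takes a variable operand.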


Before the lshr instructions are sunk, our lowering backend can tell which 32 bits
of the i192 are needed and loads only those 32 bits.
But after this CFG pass the shift amount %.sink is no longer a constant, so the
backend has to emit the full i192 load (8 32-bit loads).
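
The narrowing can be sketched in Python as follows. This is an illustration
under stated assumptions, not our backend's code: it assumes a little-endian
layout in which 32-bit word i holds bits 32*i to 32*i+31, a 12-bit field (from
the "and ... 4095" in the full IR), and helper names invented for this example.

```python
WORD_BITS = 32


def split_words(value, total_bits=192):
    """Split a wide integer into 32-bit words, lowest bits first (assumed layout)."""
    return [(value >> (i * WORD_BITS)) & 0xFFFFFFFF
            for i in range(total_bits // WORD_BITS)]


def words_needed(shift, width):
    """Indices of the 32-bit words covering bits [shift, shift + width)."""
    first = shift // WORD_BITS
    last = (shift + width - 1) // WORD_BITS
    return list(range(first, last + 1))


def extract_wide(value, shift, width):
    """Reference result: shift the whole wide integer, then mask."""
    return (value >> shift) & ((1 << width) - 1)


def extract_narrow(words, shift, width):
    """Same result, touching only the words that contain the field."""
    chunk = 0
    for position, index in enumerate(words_needed(shift, width)):
        chunk |= words[index] << (position * WORD_BITS)
    return (chunk >> (shift % WORD_BITS)) & ((1 << width) - 1)
```

With a constant shift, words_needed can be evaluated at compile time, and for
each of the five shift amounts here it names a single word. With %.sink as the
shift amount, the word index is only known at run time.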

I have some ideas for how to handle this, but I would still like some feedback on
the best way to do it.
(The LLVM IR before and after the CFG pass is shown below.)

Thank you.

Best regards,
Ning Xie


*** Before simplify CFG is applied, we have the following LLVM IR ***

if.then63:                                        ; preds = %if.end
  %trunc = trunc i8 %11 to i3
  switch i3 %trunc, label %sw.epilog [
    i3 0, label %sw.bb
    i3 1, label %sw.bb70
    i3 2, label %sw.bb78
    i3 3, label %sw.bb86
    i3 -4, label %sw.bb94
  ]

sw.bb:                                            ; preds = %if.then63
  %13 = getelementptr inbounds %struct.C0000294C, %struct.C0000294C* %C0000159C,
i32 0, i32 5, i32 3, i32 %conv50, i32 0
  %bf.load65 = load i192, i192* %13, align 4
  %bf.lshr66 = lshr i192 %bf.load65, 80
  %14 = trunc i192 %bf.lshr66 to i32
  %bf.cast68 = and i32 %14, 4095
  br label %sw.epilog

sw.bb70:                                          ; preds = %if.then63
  %15 = getelementptr inbounds %struct.C0000294C, %struct.C0000294C* %C0000159C,
i32 0, i32 5, i32 3, i32 %conv50, i32 0
  %bf.load73 = load i192, i192* %15, align 4
  %bf.lshr74 = lshr i192 %bf.load73, 96
  %16 = trunc i192 %bf.lshr74 to i32
  %bf.cast76 = and i32 %16, 4095
  br label %sw.epilog

sw.bb78:                                          ; preds = %if.then63
  %17 = getelementptr inbounds %struct.C0000294C, %struct.C0000294C* %C0000159C,
i32 0, i32 5, i32 3, i32 %conv50, i32 0
  %bf.load81 = load i192, i192* %17, align 4
  %bf.lshr82 = lshr i192 %bf.load81, 112
  %18 = trunc i192 %bf.lshr82 to i32
  %bf.cast84 = and i32 %18, 4095
  br label %sw.epilog

sw.bb86:                                          ; preds = %if.then63
  %19 = getelementptr inbounds %struct.C0000294C, %struct.C0000294C* %C0000159C,
i32 0, i32 5, i32 3, i32 %conv50, i32 0
  %bf.load89 = load i192, i192* %19, align 4
  %bf.lshr90 = lshr i192 %bf.load89, 128
  %20 = trunc i192 %bf.lshr90 to i32
  %bf.cast92 = and i32 %20, 4095
  br label %sw.epilog

sw.bb94:                                          ; preds = %if.then63
  %21 = getelementptr inbounds %struct.C0000294C, %struct.C0000294C* %C0000159C,
i32 0, i32 5, i32 3, i32 %conv50, i32 0
  %bf.load97 = load i192, i192* %21, align 4
  %bf.lshr98 = lshr i192 %bf.load97, 144
  %22 = trunc i192 %bf.lshr98 to i32
  %bf.cast100 = and i32 %22, 4095
  br label %sw.epilog


*** IR Dump After Simplify the CFG ***

if.then63:                                        ; preds = %if.end
  %trunc = trunc i8 %11 to i3
  switch i3 %trunc, label %sw.epilog [
    i3 0, label %sw.bb
    i3 1, label %sw.epilog.sink.split
    i3 2, label %sw.bb78
    i3 3, label %sw.bb86
    i3 -4, label %sw.bb94
  ]

sw.bb:                                            ; preds = %if.then63
  br label %sw.epilog.sink.split

sw.bb78:                                          ; preds = %if.then63
  br label %sw.epilog.sink.split

sw.bb86:                                          ; preds = %if.then63
  br label %sw.epilog.sink.split

sw.bb94:                                          ; preds = %if.then63
  br label %sw.epilog.sink.split

sw.epilog.sink.split:                             ; preds = %if.then63, %sw.bb, %sw.bb78, %sw.bb86, %sw.bb94
  %.sink = phi i192 [ 144, %sw.bb94 ], [ 128, %sw.bb86 ], [ 112, %sw.bb78 ], [ 80, %sw.bb ], [ 96, %if.then63 ]
  %13 = getelementptr inbounds %struct.C0000294C, %struct.C0000294C* %C0000159C,
i32 0, i32 5, i32 3, i32 %conv50, i32 0
  %bf.load97 = load i192, i192* %13, align 4
  %bf.lshr98 = lshr i192 %bf.load97, %.sink
  %14 = trunc i192 %bf.lshr98 to i32
  %bf.cast100 = and i32 %14, 4095
  br label %sw.epilog


Friedman, Eli via llvm-dev

2017-Aug-04 18:10 UTC


[llvm-dev] A CFG issue

On 8/3/2017 3:19 PM, Ning XIE via llvm-dev wrote:
> [...]
> Before lshr nodes are sunk, our lowering backend can understand which
> 32 bits of i192 are needed and load that 32 bits only.
> But after this CFG pass, %.sink is unknown, then there will be i192
> load (8 32-bit loads).
> [...]

I think we need to improve the cost modeling for sinking code.
Fundamentally, the problem is that "lshr i192 %bf.load97, %.sink" is a
lot more expensive than "lshr i192 %bf.load97, 80", and we don't really
account for that in the code which decides whether to sink the shift.
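
As a rough illustration of that kind of check, here is a hypothetical Python
sketch. It is not SimplifyCFG's logic or any existing LLVM cost model: the
32-bit native width, the cost formulas, and the function names are all
assumptions chosen only to show the shape of the comparison.

```python
NATIVE_BITS = 32  # assumption: the target's registers are 32 bits wide


def shift_cost(type_bits, amount_is_constant):
    """Made-up cost of one lshr, in units of native operations."""
    words = -(-type_bits // NATIVE_BITS)  # ceiling division
    if words <= 1:
        return 1  # fits in a register: constant or variable, one shift
    if amount_is_constant:
        # Each result word combines at most two known source words.
        return 2 * words
    # Unknown amount: each result word may come from any source word.
    return words * words


def should_sink(type_bits):
    """Sink only if the variable shift costs no more than a constant one."""
    variable = shift_cost(type_bits, amount_is_constant=False)
    constant = shift_cost(type_bits, amount_is_constant=True)
    return variable <= constant
```

Under these invented numbers an i32 shift would still be sunk, while the i192
shift from this thread would be left in its original blocks.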

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project
