thr3ads.net - llvm dev - [llvm-dev] Approaches for handling branches [Mar 2016]

If this information is useful, please help other people find it:
Share via:

Jared C. Davis via llvm-dev

2016-Mar-12 13:45 UTC

[llvm-dev] Approaches for handling branches

Hi,

LLVM newbie here.  I'm working on writing a compiler from a toy language to
LLVM assembly, and hoping to get some advice about how to handle certain
kinds of branches.

Consider an input program that might look like this if written in C:

   x1 = cond1 ? f(a1)
      : cond2 ? f(a2)
      : cond3 ? f(a1)
      :         f(a4);

   x2 = cond3 ? f(a3)
      : cond4 ? f(a1)
      :         f(a4);

   return g(x1, x2);

I'm hoping to compile something like the above in a "good" way
that avoids
unnecessarily calling F.

To mock up how this might work, I wrote some LLVM assembly to try to model
the above (full code below: "try1").  I ended up with the following
for x1,
where @f is declared as an external function, and @ite is a simple
if-then-else function that seems to nicely get inlined:

   %fa1 = call i64 @f (i64 %a1)
   %fa2 = call i64 @f (i64 %a2)
   %fa4 = call i64 @f (i64 %a4)

   %x1.cond3 = call i64 @ite (i64 %cond3, i64 %fa1, i64 %fa4)
   %x1.cond2 = call i64 @ite (i64 %cond2, i64 %fa2, i64 %x1.cond3)
   %x1       = call i64 @ite (i64 %cond1, i64 %fa1, i64 %x1.cond2)

Unfortunately this seems to produce "bad" x86 code, at least when
compiled
like this:

     opt demo.ll -O2 -o=demo.ll.opt && llc -o demo.s demo.ll.opt

In particular, the resulting demo.s appears to be doing all of the calls of F
first, followed by some conditional moves to select the right answer.  (This
would of course be great if F is very fast, but it would be bad if F is
slow.)

I've tried adding some promising looking switches to the opt command (like
-delinearize and -sink) and using different register allocation schemes in
the llc command, but so far I don't seem to be able to get LLVM to reorder
the instructions to push the calls of F into the branches.  (I'm afraid I
don't really know much about what the switches do.)

I seem to be able to do much better by (manually) rewriting the assembly so
that I explicitly check the conditions first, and only call F on the various
arguments in the branches where they are needed (code below: "try2").

So I could write my frontend so that it generates code like try2.  But I
worry about this approach because the code for try2 ends up repeating
computations like the call of f(a1) and f(a4), and it seems like this will
lead to code blowup.  My "real" programs will have much more deeply
nested
branches, and the expressions in the branches are going to be bigger than
just a single function call.  So if I try to put these expressions only in
the branches where they are computed, it seems like that will result in code
that grows exponentially in the number of branches that lead to a common
subexpression.

So I have many questions.  I wonder:

  - Are there other command-line options I should look into that might help
    LLVM be able able to optimize "try1" to push the calls down into
the
    branches where they are used and avoid the unnecessary calls of F?

  - Is this kind of reordering anything I should expect LLVM to be able to
    take care of in the first place, or should I expect to need to work on
    making sure that my frontend produces code that is closer to what I want
    the machine to execute?

  - Are there other approaches that would work better than try1/try2 that I
    haven't considered?

  - Are there any examples of developing LLVM frontends for languages that
    deal with terms that are rich in branches and structure sharing?  My real
    goal is to build a good compiler for efficiently executing DAGs of 64-bit
    integer operations, where some nodes fan out to several operations, but
    large chunks of the DAG may not be needed depending on the inputs.  I'd
    be very interested in seeing other approaches to this kind of problem.

Thanks very much for your time.

Cheers,

Jared

------------------------------

;; Source code for try1/try2
;;
;; I'm using LLVM on Linux Mint:
;;   opt --version: LLVM version 3.4
;;   llc --version: LLVM version 3.4

declare i64 @f (i64 %a) nounwind readonly
declare i64 @g (i64 %a, i64 %b) nounwind readonly


define i64 @ite (i64 %cond, i64 %then, i64 %else)
{
    ;; If/Then/Else, which will be inlined with -inline
    %cond.zero = icmp eq i64 %cond, 0
    br i1 %cond.zero, label %case.zero, label %case.nonzero
  case.nonzero:
    ret i64 %then
  case.zero:
    ret i64 %else
}

define i64 @try1 (i64* %in)
{
    ;; Arguments from an input array.
    %cond1.addr = getelementptr i64* %in, i32 0
    %cond2.addr = getelementptr i64* %in, i32 1
    %cond3.addr = getelementptr i64* %in, i32 2
    %cond4.addr = getelementptr i64* %in, i32 3
    %a1.addr = getelementptr i64* %in, i32 4
    %a2.addr = getelementptr i64* %in, i32 5
    %a3.addr = getelementptr i64* %in, i32 6
    %a4.addr = getelementptr i64* %in, i32 7

    ;; Load arguments into temporaries
    %cond1 = load i64* %cond1.addr
    %cond2 = load i64* %cond2.addr
    %cond3 = load i64* %cond3.addr
    %cond4 = load i64* %cond3.addr
    %a1 = load i64* %a1.addr
    %a2 = load i64* %a2.addr
    %a3 = load i64* %a3.addr
    %a4 = load i64* %a4.addr

    ;; Compute F(Ai)
    %fa1 = call i64 @f (i64 %a1)
    %fa2 = call i64 @f (i64 %a2)
    %fa3 = call i64 @f (i64 %a3)
    %fa4 = call i64 @f (i64 %a4)

    ; Compute x1 = cond1 ? f(a1)
    ;            : cond2 ? f(a2)
    ;            : cond3 ? f(a1)
    ;            :         f(a4);

    %x1.cond3 = call i64 @ite (i64 %cond3, i64 %fa1, i64 %fa4)
    %x1.cond2 = call i64 @ite (i64 %cond2, i64 %fa2, i64 %x1.cond3)
    %x1     = call i64 @ite (i64 %cond1, i64 %fa1, i64 %x1.cond2)

    ; Compute x2 = cond3 ? f(a3)
    ;            : cond4 ? f(a1)
    ;            :         f(a4);

    %x2.cond4 = call i64 @ite (i64 %cond4, i64 %fa1, i64 %fa4)
    %x2       = call i64 @ite (i64 %cond3, i64 %fa3, i64 %x2.cond4)

    ; Return g(x1, x2);
    %ret = call i64 @g(i64 %x1, i64 %x2)
    ret i64 %ret
}



define i64 @try2 (i64* %in)
{
    ;; Arguments from an input array.
    %cond1.addr = getelementptr i64* %in, i32 0
    %cond2.addr = getelementptr i64* %in, i32 1
    %cond3.addr = getelementptr i64* %in, i32 2
    %cond4.addr = getelementptr i64* %in, i32 3
    %a1.addr = getelementptr i64* %in, i32 4
    %a2.addr = getelementptr i64* %in, i32 5
    %a3.addr = getelementptr i64* %in, i32 6
    %a4.addr = getelementptr i64* %in, i32 7

    ;; Load arguments into temporaries
    %cond1 = load i64* %cond1.addr
    %cond2 = load i64* %cond2.addr
    %cond3 = load i64* %cond3.addr
    %cond4 = load i64* %cond3.addr
    %a1 = load i64* %a1.addr
    %a2 = load i64* %a2.addr
    %a3 = load i64* %a3.addr
    %a4 = load i64* %a4.addr

    ; Compute x1 = cond1 ? f(a1)
    ;            : cond2 ? f(a2)
    ;            : cond3 ? f(a1)
    ;            :         f(a4);

    %cond1.nonzero = icmp ne i64 %cond1, 0
    br i1 %cond1.nonzero, label %x1.cond1.true, label %x1.cond1.false

  x1.cond1.false:
    %cond2.nonzero = icmp ne i64 %cond2, 0
    br i1 %cond2.nonzero, label %x1.cond2.true, label %x1.cond2.false

  x1.cond2.false:
    %cond3.nonzero = icmp ne i64 %cond3, 0
    br i1 %cond3.nonzero, label %x1.cond3.true, label %x1.otherwise


  x1.cond1.true:                           ; cond1 ? f(a1)
    %x1.cond1.ans = call i64 @f (i64 %a1)
    br label %x1.finish

  x1.cond2.true:                           ; cond2 ? f(a2)
    %x1.cond2.ans = call i64 @f (i64 %a2)
    br label %x1.finish

  x1.cond3.true:                           ; cond3 ? f(a1)
    %x1.cond3.ans = call i64 @f (i64 %a1)
    br label %x1.finish

  x1.otherwise:                            ; otherwise f(a4)
    %x1.otherwise.ans = call i64 @f (i64 %a4)
    br label %x1.finish

  x1.finish:
    %x1 = phi i64 [%x1.cond1.ans, %x1.cond1.true],
                  [%x1.cond2.ans, %x1.cond2.true],
                  [%x1.cond3.ans, %x1.cond3.true],
                  [%x1.otherwise.ans, %x1.otherwise]



    ; Compute x2 = cond3 ? f(a3)
    ;            : cond4 ? f(a1)
    ;            :         f(a4);

    %cond3.nonzero_for_x2 = icmp ne i64 %cond3, 0
    br i1 %cond3.nonzero_for_x2, label %x2.cond3.true, label %x2.cond3.false

  x2.cond3.false:
    %cond4.nonzero = icmp ne i64 %cond4, 0
    br i1 %cond4.nonzero, label %x2.cond4.true, label %x2.otherwise

  x2.cond3.true:                            ; cond3 ? f(a3)
    %x2.cond3.ans = call i64 @f (i64 %a3)
    br label %x2.finish

  x2.cond4.true:                            ; cond4 ? f(a1)
    %x2.cond4.ans = call i64 @f (i64 %a1)
    br label %x2.finish

  x2.otherwise:
    %x2.otherwise.ans = call i64 @f (i64 %a4)
    br label %x2.finish

  x2.finish:
    %x2 = phi i64 [%x2.cond3.ans, %x2.cond3.true],
                  [%x2.cond4.ans, %x2.cond4.true],
                  [%x2.otherwise.ans, %x2.otherwise]

    ; Return g(x1, x2);
    %ret = call i64 @g(i64 %x1, i64 %x2)
    ret i64 %ret

}




-- 
Jared C. Davis <jared at cs.utexas.edu>
11410 Windermere Meadows
Austin, TX 78759
http://www.cs.utexas.edu/users/jared/

David Blaikie via llvm-dev

2016-Mar-12 21:58 UTC

head link

[llvm-dev] Approaches for handling branches

Try writing the same C code, feeding it through Clang and see how it
optimizes, that should give you a sense of what's possible/practical.

The main issue here, I think, is that LLVM doesn't know/isn't guaranteed
that your function calls don't have side effects (printing things, changing
the contents of memory, etc) so it can't cause there to be extra (or fewer)
invocations of the function than are present in your original program
without making the program incorrect/behave differently.

If you know your function calls have no side effects and can be called
multiple times I think the attribute you want to add to the function
declarations is 'readnone', but I'm not entirely sure. If you add
that
attribute to your functions, you should be able to see them potentially
cloned or coalesced as is 'best' for the code.

But if you don't have that guarantee from your input program/language spec,
then you need to make sure there aren't excessive/duplicate calls in your
input you give to LLVM to optimize.

On Sat, Mar 12, 2016 at 5:45 AM, Jared C. Davis via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi,
>
> LLVM newbie here.  I'm working on writing a compiler from a toy
language to
> LLVM assembly, and hoping to get some advice about how to handle certain
> kinds of branches.
>
> Consider an input program that might look like this if written in C:
>
>    x1 = cond1 ? f(a1)
>       : cond2 ? f(a2)
>       : cond3 ? f(a1)
>       :         f(a4);
>
>    x2 = cond3 ? f(a3)
>       : cond4 ? f(a1)
>       :         f(a4);
>
>    return g(x1, x2);
>
> I'm hoping to compile something like the above in a "good"
way that avoids
> unnecessarily calling F.
>
> To mock up how this might work, I wrote some LLVM assembly to try to model
> the above (full code below: "try1").  I ended up with the
following for x1,
> where @f is declared as an external function, and @ite is a simple
> if-then-else function that seems to nicely get inlined:
>
>    %fa1 = call i64 @f (i64 %a1)
>    %fa2 = call i64 @f (i64 %a2)
>    %fa4 = call i64 @f (i64 %a4)
>
>    %x1.cond3 = call i64 @ite (i64 %cond3, i64 %fa1, i64 %fa4)
>    %x1.cond2 = call i64 @ite (i64 %cond2, i64 %fa2, i64 %x1.cond3)
>    %x1       = call i64 @ite (i64 %cond1, i64 %fa1, i64 %x1.cond2)
>
> Unfortunately this seems to produce "bad" x86 code, at least when
compiled
> like this:
>
>      opt demo.ll -O2 -o=demo.ll.opt && llc -o demo.s demo.ll.opt
>
> In particular, the resulting demo.s appears to be doing all of the calls
> of F
> first, followed by some conditional moves to select the right answer.
> (This
> would of course be great if F is very fast, but it would be bad if F is
> slow.)
>
> I've tried adding some promising looking switches to the opt command
(like
> -delinearize and -sink) and using different register allocation schemes in
> the llc command, but so far I don't seem to be able to get LLVM to
reorder
> the instructions to push the calls of F into the branches.  (I'm afraid
I
> don't really know much about what the switches do.)
>
> I seem to be able to do much better by (manually) rewriting the assembly so
> that I explicitly check the conditions first, and only call F on the
> various
> arguments in the branches where they are needed (code below:
"try2").
>
> So I could write my frontend so that it generates code like try2.  But I
> worry about this approach because the code for try2 ends up repeating
> computations like the call of f(a1) and f(a4), and it seems like this will
> lead to code blowup.  My "real" programs will have much more
deeply nested
> branches, and the expressions in the branches are going to be bigger than
> just a single function call.  So if I try to put these expressions only in
> the branches where they are computed, it seems like that will result in
> code
> that grows exponentially in the number of branches that lead to a common
> subexpression.
>
> So I have many questions.  I wonder:
>
>   - Are there other command-line options I should look into that might help
>     LLVM be able able to optimize "try1" to push the calls down
into the
>     branches where they are used and avoid the unnecessary calls of F?
>
>   - Is this kind of reordering anything I should expect LLVM to be able to
>     take care of in the first place, or should I expect to need to work on
>     making sure that my frontend produces code that is closer to what I
> want
>     the machine to execute?
>
>   - Are there other approaches that would work better than try1/try2 that I
>     haven't considered?
>
>   - Are there any examples of developing LLVM frontends for languages that
>     deal with terms that are rich in branches and structure sharing?  My
> real
>     goal is to build a good compiler for efficiently executing DAGs of
> 64-bit
>     integer operations, where some nodes fan out to several operations, but
>     large chunks of the DAG may not be needed depending on the inputs. 
I'd
>     be very interested in seeing other approaches to this kind of problem.
>
> Thanks very much for your time.
>
> Cheers,
>
> Jared
>
> ------------------------------
>
> ;; Source code for try1/try2
> ;;
> ;; I'm using LLVM on Linux Mint:
> ;;   opt --version: LLVM version 3.4
> ;;   llc --version: LLVM version 3.4
>
> declare i64 @f (i64 %a) nounwind readonly
> declare i64 @g (i64 %a, i64 %b) nounwind readonly
>
>
> define i64 @ite (i64 %cond, i64 %then, i64 %else)
> {
>     ;; If/Then/Else, which will be inlined with -inline
>     %cond.zero = icmp eq i64 %cond, 0
>     br i1 %cond.zero, label %case.zero, label %case.nonzero
>   case.nonzero:
>     ret i64 %then
>   case.zero:
>     ret i64 %else
> }
>
> define i64 @try1 (i64* %in)
> {
>     ;; Arguments from an input array.
>     %cond1.addr = getelementptr i64* %in, i32 0
>     %cond2.addr = getelementptr i64* %in, i32 1
>     %cond3.addr = getelementptr i64* %in, i32 2
>     %cond4.addr = getelementptr i64* %in, i32 3
>     %a1.addr = getelementptr i64* %in, i32 4
>     %a2.addr = getelementptr i64* %in, i32 5
>     %a3.addr = getelementptr i64* %in, i32 6
>     %a4.addr = getelementptr i64* %in, i32 7
>
>     ;; Load arguments into temporaries
>     %cond1 = load i64* %cond1.addr
>     %cond2 = load i64* %cond2.addr
>     %cond3 = load i64* %cond3.addr
>     %cond4 = load i64* %cond3.addr
>     %a1 = load i64* %a1.addr
>     %a2 = load i64* %a2.addr
>     %a3 = load i64* %a3.addr
>     %a4 = load i64* %a4.addr
>
>     ;; Compute F(Ai)
>     %fa1 = call i64 @f (i64 %a1)
>     %fa2 = call i64 @f (i64 %a2)
>     %fa3 = call i64 @f (i64 %a3)
>     %fa4 = call i64 @f (i64 %a4)
>
>     ; Compute x1 = cond1 ? f(a1)
>     ;            : cond2 ? f(a2)
>     ;            : cond3 ? f(a1)
>     ;            :         f(a4);
>
>     %x1.cond3 = call i64 @ite (i64 %cond3, i64 %fa1, i64 %fa4)
>     %x1.cond2 = call i64 @ite (i64 %cond2, i64 %fa2, i64 %x1.cond3)
>     %x1     = call i64 @ite (i64 %cond1, i64 %fa1, i64 %x1.cond2)
>
>     ; Compute x2 = cond3 ? f(a3)
>     ;            : cond4 ? f(a1)
>     ;            :         f(a4);
>
>     %x2.cond4 = call i64 @ite (i64 %cond4, i64 %fa1, i64 %fa4)
>     %x2       = call i64 @ite (i64 %cond3, i64 %fa3, i64 %x2.cond4)
>
>     ; Return g(x1, x2);
>     %ret = call i64 @g(i64 %x1, i64 %x2)
>     ret i64 %ret
> }
>
>
>
> define i64 @try2 (i64* %in)
> {
>     ;; Arguments from an input array.
>     %cond1.addr = getelementptr i64* %in, i32 0
>     %cond2.addr = getelementptr i64* %in, i32 1
>     %cond3.addr = getelementptr i64* %in, i32 2
>     %cond4.addr = getelementptr i64* %in, i32 3
>     %a1.addr = getelementptr i64* %in, i32 4
>     %a2.addr = getelementptr i64* %in, i32 5
>     %a3.addr = getelementptr i64* %in, i32 6
>     %a4.addr = getelementptr i64* %in, i32 7
>
>     ;; Load arguments into temporaries
>     %cond1 = load i64* %cond1.addr
>     %cond2 = load i64* %cond2.addr
>     %cond3 = load i64* %cond3.addr
>     %cond4 = load i64* %cond3.addr
>     %a1 = load i64* %a1.addr
>     %a2 = load i64* %a2.addr
>     %a3 = load i64* %a3.addr
>     %a4 = load i64* %a4.addr
>
>     ; Compute x1 = cond1 ? f(a1)
>     ;            : cond2 ? f(a2)
>     ;            : cond3 ? f(a1)
>     ;            :         f(a4);
>
>     %cond1.nonzero = icmp ne i64 %cond1, 0
>     br i1 %cond1.nonzero, label %x1.cond1.true, label %x1.cond1.false
>
>   x1.cond1.false:
>     %cond2.nonzero = icmp ne i64 %cond2, 0
>     br i1 %cond2.nonzero, label %x1.cond2.true, label %x1.cond2.false
>
>   x1.cond2.false:
>     %cond3.nonzero = icmp ne i64 %cond3, 0
>     br i1 %cond3.nonzero, label %x1.cond3.true, label %x1.otherwise
>
>
>   x1.cond1.true:                           ; cond1 ? f(a1)
>     %x1.cond1.ans = call i64 @f (i64 %a1)
>     br label %x1.finish
>
>   x1.cond2.true:                           ; cond2 ? f(a2)
>     %x1.cond2.ans = call i64 @f (i64 %a2)
>     br label %x1.finish
>
>   x1.cond3.true:                           ; cond3 ? f(a1)
>     %x1.cond3.ans = call i64 @f (i64 %a1)
>     br label %x1.finish
>
>   x1.otherwise:                            ; otherwise f(a4)
>     %x1.otherwise.ans = call i64 @f (i64 %a4)
>     br label %x1.finish
>
>   x1.finish:
>     %x1 = phi i64 [%x1.cond1.ans, %x1.cond1.true],
>                   [%x1.cond2.ans, %x1.cond2.true],
>                   [%x1.cond3.ans, %x1.cond3.true],
>                   [%x1.otherwise.ans, %x1.otherwise]
>
>
>
>     ; Compute x2 = cond3 ? f(a3)
>     ;            : cond4 ? f(a1)
>     ;            :         f(a4);
>
>     %cond3.nonzero_for_x2 = icmp ne i64 %cond3, 0
>     br i1 %cond3.nonzero_for_x2, label %x2.cond3.true, label
> %x2.cond3.false
>
>   x2.cond3.false:
>     %cond4.nonzero = icmp ne i64 %cond4, 0
>     br i1 %cond4.nonzero, label %x2.cond4.true, label %x2.otherwise
>
>   x2.cond3.true:                            ; cond3 ? f(a3)
>     %x2.cond3.ans = call i64 @f (i64 %a3)
>     br label %x2.finish
>
>   x2.cond4.true:                            ; cond4 ? f(a1)
>     %x2.cond4.ans = call i64 @f (i64 %a1)
>     br label %x2.finish
>
>   x2.otherwise:
>     %x2.otherwise.ans = call i64 @f (i64 %a4)
>     br label %x2.finish
>
>   x2.finish:
>     %x2 = phi i64 [%x2.cond3.ans, %x2.cond3.true],
>                   [%x2.cond4.ans, %x2.cond4.true],
>                   [%x2.otherwise.ans, %x2.otherwise]
>
>     ; Return g(x1, x2);
>     %ret = call i64 @g(i64 %x1, i64 %x2)
>     ret i64 %ret
>
> }
>
>
>
>
> --
> Jared C. Davis <jared at cs.utexas.edu>
> 11410 Windermere Meadows
> Austin, TX 78759
> http://www.cs.utexas.edu/users/jared/
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160312/3942c904/attachment.html>

llvm dev - Mar 2016 - Approaches for handling branches

[llvm-dev] Approaches for handling branches

[llvm-dev] Approaches for handling branches