Is there any way to disable short-circuit evaluation of expressions in Clang/LLVM?

Let's say I have C code like the following:

  bool validX = get_group_id(0) > 32;

  int globalIndexY0 = get_group_id(1)*186 + 6*get_local_id(1) + 0 + 1;
  bool valid0 = validX && globalIndexY0 >= 4 && globalIndexY0 < 3910;

  int globalIndexY1 = get_group_id(1)*186 + 6*get_local_id(1) + 1 + 1;
  bool valid1 = validX && globalIndexY1 >= 4 && globalIndexY1 < 3910;

  int globalIndexY2 = get_group_id(1)*186 + 6*get_local_id(1) + 2 + 1;
  bool valid2 = validX && globalIndexY2 >= 4 && globalIndexY2 < 3910;

Clang, even at -O0, is performing short-circuit evaluation of these expressions, resulting in a fair number of branch instructions being generated. For most targets, this is a beneficial optimization. However, for my target (PTX), it would be most beneficial to actually evaluate the entire expression and remove the unneeded branches.

Is this possible with current Clang/LLVM?

--
Thanks,

Justin Holewinski
David A. Greene
2011-Oct-10 14:29 UTC
[LLVMdev] [cfe-dev] Disable Short-Circuit Evaluation?
Justin Holewinski <justin.holewinski at gmail.com> writes:

> int globalIndexY2 = get_group_id(1)*186 + 6*get_local_id(1) + 2 + 1;
> bool valid2 = validX && globalIndexY2 >= 4 && globalIndexY2 < 3910;
>
> Clang, even at -O0, is performing short-circuit evaluation of these
> expressions, resulting in a fair number of branch instructions being
> generated.

It has to. This is the semantics of C. Short-circuiting is used to defend against all sorts of undefined behavior in real code.

> For most targets, this is a beneficial optimizations.

For all targets. If the code doesn't work, it's pretty useless. :)

> However, for my target (PTX), it would be most beneficial to actually
> evaluate the entire expression and remove the unneeded branches. Is
> this possible with current Clang/LLVM?

So for PTX what you want is if-conversion. I believe there is a pass that does this in the ARM codegen. Of course the PTX backend will have to support mask bits. I don't know if it does currently.

                              -Dave
A compilable testcase:

  extern int get_group_id (int);
  extern int get_local_id (int);
  extern void check (bool, bool, bool);

  void foo (void)
  {
    bool validX = get_group_id (0) > 32;

    int globalIndexY0 = get_group_id (1) * 186 + 6 * get_local_id (1) + 0 + 1;
    bool valid0 = validX && globalIndexY0 >= 4 && globalIndexY0 < 3910;

    int globalIndexY1 = get_group_id (1) * 186 + 6 * get_local_id (1) + 1 + 1;
    bool valid1 = validX && globalIndexY1 >= 4 && globalIndexY1 < 3910;

    int globalIndexY2 = get_group_id (1) * 186 + 6 * get_local_id (1) + 2 + 1;
    bool valid2 = validX && globalIndexY2 >= 4 && globalIndexY2 < 3910;

    check (valid0, valid1, valid2);
  }
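The operands in this testcase are plain integer comparisons with no side effects, so one possible source-level workaround is to combine them with bitwise & instead of &&, which evaluates every operand unconditionally. The following is only a sketch under that assumption; valid_short_circuit and valid_eager are hypothetical helper names that do not appear in the original code, and whether the backend then emits branch-free code still depends on the optimizer and target.

```c
#include <stdbool.h>

/* Hypothetical helper: the short-circuit form used in the testcase. */
bool valid_short_circuit(bool validX, int y)
{
    return validX && y >= 4 && y < 3910;
}

/* Hypothetical helper: the eager form. A bool and the result of a
   comparison are each 0 or 1, so bitwise & produces the same truth
   value as &&, but all three operands are always evaluated.
   This is only safe when the operands have no side effects and
   cannot trigger undefined behavior. */
bool valid_eager(bool validX, int y)
{
    return validX & (y >= 4) & (y < 3910);
}
```

The two forms agree for every input here because nothing on the right-hand side depends on the left-hand side being true. That is exactly the property the compiler cannot assume in general.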
Konstantin Tokarev
2011-Oct-10 15:19 UTC
[LLVMdev] [cfe-dev] Disable Short-Circuit Evaluation?
10.10.2011, 18:29, "David A. Greene" <greened at obbligato.org>:

> Justin Holewinski <justin.holewinski at gmail.com> writes:
>
>> int globalIndexY2 = get_group_id(1)*186 + 6*get_local_id(1) + 2 + 1;
>> bool valid2 = validX && globalIndexY2 >= 4 && globalIndexY2 < 3910;
>>
>> Clang, even at -O0, is performing short-circuit evaluation of these
>> expressions, resulting in a fair number of branch instructions being
>> generated.
>
> It has to. This is the semantics of C. Short-circuiting is used to
> defend against all sorts of undefined behavior in real code.

More precisely, && and || are sequence points (though in C++ they may not be sequence points if the respective operator is overloaded).

[1] http://en.wikipedia.org/wiki/Sequence_point

--
Regards,
Konstantin
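To make the point about C semantics concrete, here is a small sketch showing that the right operand of && is not evaluated at all when the left operand is false, which is why code can rely on it to guard a dereference. All names here (touch, count_rhs_evaluations, first_is_positive) are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Counts how many times touch() has been called. */
static int calls = 0;

/* Hypothetical operand with a visible side effect. */
bool touch(bool result)
{
    calls++;
    return result;
}

/* Returns how many times the right operand of && was evaluated
   for a given left operand. */
int count_rhs_evaluations(bool lhs)
{
    calls = 0;
    bool r = lhs && touch(true);
    (void)r;  /* result unused; only the side effect matters here */
    return calls;
}

/* Typical guard: *p is only read when p is non-null. Evaluating
   both operands eagerly would dereference a null pointer. */
bool first_is_positive(const int *p)
{
    return p != NULL && *p > 0;
}
```

A compiler may still evaluate both sides speculatively when it can prove that doing so is unobservable, as with the side-effect-free comparisons in the original question, but it cannot do so for operands like touch() or *p.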