Nicolas Brunie via llvm-dev
2015-Sep-30 06:01 UTC
[llvm-dev] InstCombine wrongful (?) optimization on BinOp with SameOperands
Hi all, I have been looking at the way LLVM optimizes code before forwarding it to the backend I develop for my company and while building define i32 @test_extract_subreg_func(i32 %x, i32 %y) #0 { entry: %conv = zext i32 %x to i64 %conv1 = zext i32 %y to i64 %mul = mul nuw i64 %conv1, %conv %shr = lshr i64 %mul, 32 %xor = xor i64 %shr, %mul %conv2 = trunc i64 %xor to i32 ret i32 %conv2 } I came upon the following optimization (during instcombine): *IC: Visiting: %mul = mul nuw i64 %conv, %conv1 IC: Visiting: %shr = lshr i64 %mul, 32 IC: Visiting: %conv2 = trunc i64 %shr to i32 IC: Visiting: %conv3 = trunc i64 %mul to i32 IC: Visiting: %xor = xor i32 %conv3, %conv2 IC: ADD: %xor6 = xor i64 %mul, %shr IC: Old = %xor = xor i32 %conv3, %conv2 New = <badref> = trunc i64 %xor6 to i32 * which seems to be performed by SDValue DAGCombiner::SimplifyBinOpWithSameOpcodeHands(SDNode *N) In my backend's architecture truncate is free, but zext is not (and i64 is not a desirable type for xor or any binary operation in general), so I would expect this optimization to be bypassed but because of the following statement : (N0.getOpcode() == ISD::TRUNCATE && (!TLI.isZExtFree(VT, Op0VT) || !TLI.isTruncateFree(Op0VT, VT)) it is not (as isZExtFree return false for my architecture while isTruncateFree returns true). The comment on binop simplification says that binop over truncs should be optimize only if trunc is not free, so I do not understand the point of adding !isZExtFree at this point. Can someone enlighten my ignorance on this optimization ? best regards, Nicolas Brunie -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150930/fecff467/attachment.html>
Hal Finkel via llvm-dev
2015-Oct-26 17:40 UTC
[llvm-dev] InstCombine wrongful (?) optimization on BinOp with SameOperands
----- Original Message -----> From: "Nicolas Brunie via llvm-dev" <llvm-dev at lists.llvm.org> > To: llvm-dev at lists.llvm.org > Sent: Wednesday, September 30, 2015 1:01:52 AM > Subject: [llvm-dev] InstCombine wrongful (?) optimization on BinOp with SameOperands > > > Hi all, > I have been looking at the way LLVM optimizes code before forwarding > it to the backend I develop for my company and while building > define i32 @test_extract_subreg_func(i32 %x, i32 %y) #0 { > entry: > %conv = zext i32 %x to i64 > %conv1 = zext i32 %y to i64 > %mul = mul nuw i64 %conv1, %conv > %shr = lshr i64 %mul, 32 > %xor = xor i64 %shr, %mul > %conv2 = trunc i64 %xor to i32 > ret i32 %conv2 > } > > I came upon the following optimization (during instcombine): > IC: Visiting: %mul = mul nuw i64 %conv, %conv1 > IC: Visiting: %shr = lshr i64 %mul, 32 > IC: Visiting: %conv2 = trunc i64 %shr to i32 > IC: Visiting: %conv3 = trunc i64 %mul to i32 > IC: Visiting: %xor = xor i32 %conv3, %conv2 > IC: ADD: %xor6 = xor i64 %mul, %shr > IC: Old = %xor = xor i32 %conv3, %conv2 > New = <badref> = trunc i64 %xor6 to i32 > > which seems to be performed by SDValue > DAGCombiner::SimplifyBinOpWithSameOpcodeHands(SDNode *N)You might have figured this out by now, but no, InstCombine and DAGCombine are two completely different pieces of code. One is driven by the code in lib/Transforms/InstCombine/* and the other in lib/CodeGen/SelectionDAG/DAGCombiner.cpp. InstCombine's job is to move the IR toward our chosen canonical form, which is designed to simplify operations in a way that exposes further optimization opportunities (as well as being generally beneficial). It does not take target costs into account.> > In my backend's architecture truncate is free, but zext is not (and > i64 is not a desirable type for xor or any binary operation in > general),Why, then, have you listed i64 as a legal type? -Hal> so I would expect this optimization to be bypassed but > because of the following statement : > (N0.getOpcode() == ISD::TRUNCATE && (!TLI.isZExtFree(VT, Op0VT) || > !TLI.isTruncateFree(Op0VT, VT)) > it is not (as isZExtFree return false for my architecture while > isTruncateFree returns true). The comment on binop simplification > says that binop over truncs should be optimize only if trunc is not > free, so I do not understand the point of adding !isZExtFree at this > point. > Can someone enlighten my ignorance on this optimization ? > > best regards, > Nicolas Brunie > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Nicolas Brunie via llvm-dev
2015-Oct-26 19:05 UTC
[llvm-dev] InstCombine wrongful (?) optimization on BinOp with SameOperands
----- Mail original ----- De: "Hal Finkel" <hfinkel at anl.gov> À: "Nicolas Brunie" <nicolas.brunie at kalray.eu> Cc: llvm-dev at lists.llvm.org Envoyé: Lundi 26 Octobre 2015 18:40:54 Objet: Re: [llvm-dev] InstCombine wrongful (?) optimization on BinOp with SameOperands ----- Original Message -----> From: "Nicolas Brunie via llvm-dev" <llvm-dev at lists.llvm.org> > To: llvm-dev at lists.llvm.org > Sent: Wednesday, September 30, 2015 1:01:52 AM > Subject: [llvm-dev] InstCombine wrongful (?) optimization on BinOp with SameOperands > > > Hi all, > I have been looking at the way LLVM optimizes code before forwarding > it to the backend I develop for my company and while building > define i32 @test_extract_subreg_func(i32 %x, i32 %y) #0 { > entry: > %conv = zext i32 %x to i64 > %conv1 = zext i32 %y to i64 > %mul = mul nuw i64 %conv1, %conv > %shr = lshr i64 %mul, 32 > %xor = xor i64 %shr, %mul > %conv2 = trunc i64 %xor to i32 > ret i32 %conv2 > } > > I came upon the following optimization (during instcombine): > IC: Visiting: %mul = mul nuw i64 %conv, %conv1 > IC: Visiting: %shr = lshr i64 %mul, 32 > IC: Visiting: %conv2 = trunc i64 %shr to i32 > IC: Visiting: %conv3 = trunc i64 %mul to i32 > IC: Visiting: %xor = xor i32 %conv3, %conv2 > IC: ADD: %xor6 = xor i64 %mul, %shr > IC: Old = %xor = xor i32 %conv3, %conv2 > New = <badref> = trunc i64 %xor6 to i32 > > which seems to be performed by SDValue > DAGCombiner::SimplifyBinOpWithSameOpcodeHands(SDNode *N)You might have figured this out by now, but no, InstCombine and DAGCombine are two completely different pieces of code. One is driven by the code in lib/Transforms/InstCombine/* and the other in lib/CodeGen/SelectionDAG/DAGCombiner.cpp. InstCombine's job is to move the IR toward our chosen canonical form, which is designed to simplify operations in a way that exposes further optimization opportunities (as well as being generally beneficial). It does not take target costs into account. Yes indeed, I went on my visit of LLVM sources and discover my mistake. But your explanation helps my understanding, than you.> > In my backend's architecture truncate is free, but zext is not (and > i64 is not a desirable type for xor or any binary operation in > general),Why, then, have you listed i64 as a legal type? Because for operation such as mul, add, and in fact xor ... the targets does in fact supports i64, it is just more costly than i32 : the target is a VLIW which can do two 32b add or a single 64b one each cycle. So when possible I would like LLLVM to forward the information it gathers about use of result : i.e. if only the 32 MSB of a i64 result are not used it will be better if only the 32b operations was performed and this optimization was recursively applied to the 64b DAG until a node whose 64b are effectively required. It may well be that I did not described my target correctly to LLVM and thus the 64b DAG is not simplified to 32b. I was under the impression that I should declare i32 as a "preffered" type for these operations and i64 as legal because I do not want i64 operations to be legalize/expanded just simplified (but maybe this is the point of the "legal" declaration). Thank you a lot for digging-up this thread, and for the info Regards, Nicolas
Possibly Parallel Threads
- Speedups with Ra and jit
- Lowering ISD::TRUNCATE
- [LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)
- [LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
- [LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)