Hi,

In lib/CodeGen/MachineCombiner.cpp, the function MachineCombiner::preservesCriticalPathLen tries to determine whether the new combined instruction lengthens the critical path. To do this, it computes the depth and latency of the current instruction sequence (MUL+ADD) and of the alternate instruction (MADD). However, it calls two different sets of APIs for the current and the new instructions.

For the new instruction we use:

    unsigned NewRootDepth = getDepth(InsInstrs, InstrIdxForVirtReg, BlockTrace);
    unsigned NewRootLatency = getLatency(Root, NewRoot, BlockTrace);

While for the current instruction we use:

    unsigned RootDepth = BlockTrace.getInstrCycles(Root).Depth;
    unsigned RootLatency = TSchedModel.computeInstrLatency(Root);

This was introduced in the following commit:

    commit e4fa341dde3c9521b7f11bd53ecdcbeb3f8fcbda
    Author: Gerolf Hoflehner <ghoflehner at apple.com>
    Date:   Thu Aug 7 21:40:58 2014 +0000

        MachineCombiner Pass for selecting faster instruction sequence on AArch64

For this example code sequence:

    %mul = mul nuw nsw i32 %conv2, %conv
    %mul7 = mul nuw nsw i32 %conv6, %conv4
    %add = add nuw nsw i32 %mul7, %mul
    ret i32 %add

we generate the following assembly:

    mul w8, w0, w1
    mul w9, w2, w3
    add w0, w9, w8
    ret

I expected the second MUL and the ADD to be combined into a MADD. Because they are not, I see degraded performance in several of my tests.

Could someone please explain why we use two different APIs to compute the depth and latency of the two instructions?

Thanks,
Mandeep
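
P.S. To make the comparison I am asking about concrete, here is a minimal standalone sketch. It is not the LLVM code: PathCost and preservesCriticalPathLenSketch are hypothetical names, and the model assumes the check only compares depth plus latency of the old root against depth plus latency of the new root. The real function may account for additional terms that are omitted here.

```cpp
// Hypothetical stand-in for the depth/latency pair discussed above.
struct PathCost {
  unsigned Depth;   // cycle at which the instruction's operands are ready
  unsigned Latency; // cycles from issue until the result is available
};

// Simplified model of the check (an assumption, not the LLVM implementation):
// the replacement is accepted only if its result is ready no later than the
// result of the original root instruction.
constexpr bool preservesCriticalPathLenSketch(PathCost OldRoot,
                                              PathCost NewRoot) {
  return NewRoot.Depth + NewRoot.Latency <= OldRoot.Depth + OldRoot.Latency;
}

// Made-up numbers, not taken from any real scheduling model:
// the ADD is ready at cycle 4 and takes 1 cycle, so it finishes at cycle 5;
// a MADD that is also ready at cycle 4 but takes 4 cycles finishes at cycle 8.
static_assert(!preservesCriticalPathLenSketch(PathCost{4, 1}, PathCost{4, 4}),
              "a slower replacement is rejected in this model");
static_assert(preservesCriticalPathLenSketch(PathCost{4, 1}, PathCost{1, 4}),
              "a replacement that finishes no later is accepted");
```

In this model the outcome depends entirely on how the two depth and latency values are obtained, which is why the difference between the two sets of APIs matters to me.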