Hi everyone, The MachineLICM pass has a heuristic such that, even in low-register-pressure situations, it will refuse to hoist "cheap" instructions out of loops. By default, when an itinerary is available, this means that all of the defined operands are available in at most 1 cycle. ARM overrides this, and provides this more-customized definition: bool ARMBaseInstrInfo:: hasLowDefLatency(const InstrItineraryData *ItinData, const MachineInstr *DefMI, unsigned DefIdx) const { if (!ItinData || ItinData->isEmpty()) return false; unsigned DDomain = DefMI->getDesc().TSFlags & ARMII::DomainMask; if (DDomain == ARMII::DomainGeneral) { unsigned DefClass = DefMI->getDesc().getSchedClass(); int DefCycle = ItinData->getOperandCycle(DefClass, DefIdx); return (DefCycle != -1 && DefCycle <= 2); } return false; } So it won't hoist instructions that have defined operands ready in at most two cycles for general-domain instructions. Regardless, I don't understand the logic behind this heuristic. High-register-pressure situations are one thing, but why is it ever profitable in low-register-pressure situations not to hoist even "cheap" instructions out of loop bodies? Thanks again, Hal -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Hi Hal, I haven’t looked at the implementation but could it be that we leave room for expensive instructions to be hoisted? The rational being that hoisting a cheap instruction will increase the register pressure and then we may block expensive instruction (assuming we use some kind of iterative algorithm). Other than that, I would say that seem rather silly. Cheers, Q. On Jan 8, 2015, at 1:45 PM, Hal Finkel <hfinkel at anl.gov> wrote:> Hi everyone, > > The MachineLICM pass has a heuristic such that, even in low-register-pressure situations, it will refuse to hoist "cheap" instructions out of loops. By default, when an itinerary is available, this means that all of the defined operands are available in at most 1 cycle. ARM overrides this, and provides this more-customized definition: > > bool ARMBaseInstrInfo:: > hasLowDefLatency(const InstrItineraryData *ItinData, > const MachineInstr *DefMI, unsigned DefIdx) const { > if (!ItinData || ItinData->isEmpty()) > return false; > > unsigned DDomain = DefMI->getDesc().TSFlags & ARMII::DomainMask; > if (DDomain == ARMII::DomainGeneral) { > unsigned DefClass = DefMI->getDesc().getSchedClass(); > int DefCycle = ItinData->getOperandCycle(DefClass, DefIdx); > return (DefCycle != -1 && DefCycle <= 2); > } > return false; > } > > So it won't hoist instructions that have defined operands ready in at most two cycles for general-domain instructions. Regardless, I don't understand the logic behind this heuristic. High-register-pressure situations are one thing, but why is it ever profitable in low-register-pressure situations not to hoist even "cheap" instructions out of loop bodies? > > Thanks again, > Hal > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
----- Original Message -----> From: "Quentin Colombet" <qcolombet at apple.com> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: "Dev" <llvmdev at cs.uiuc.edu> > Sent: Thursday, January 8, 2015 5:06:24 PM > Subject: Re: [LLVMdev] Machine LICM and cheap instructions? > > Hi Hal, > > I haven’t looked at the implementation but could it be that we leave > room for expensive instructions to be hoisted? > The rational being that hoisting a cheap instruction will increase > the register pressure and then we may block expensive instruction > (assuming we use some kind of iterative algorithm).Interesting point. That cuts both ways, however, because hoisting a cheap instruction can then make an expensive instruction loop invariant. Perhaps we need two passes over the instructions to make a good decision here. Thanks, Hal> > Other than that, I would say that seem rather silly. > > Cheers, > Q. > On Jan 8, 2015, at 1:45 PM, Hal Finkel <hfinkel at anl.gov> wrote: > > > Hi everyone, > > > > The MachineLICM pass has a heuristic such that, even in > > low-register-pressure situations, it will refuse to hoist "cheap" > > instructions out of loops. By default, when an itinerary is > > available, this means that all of the defined operands are > > available in at most 1 cycle. ARM overrides this, and provides > > this more-customized definition: > > > > bool ARMBaseInstrInfo:: > > hasLowDefLatency(const InstrItineraryData *ItinData, > > const MachineInstr *DefMI, unsigned DefIdx) const { > > if (!ItinData || ItinData->isEmpty()) > > return false; > > > > unsigned DDomain = DefMI->getDesc().TSFlags & ARMII::DomainMask; > > if (DDomain == ARMII::DomainGeneral) { > > unsigned DefClass = DefMI->getDesc().getSchedClass(); > > int DefCycle = ItinData->getOperandCycle(DefClass, DefIdx); > > return (DefCycle != -1 && DefCycle <= 2); > > } > > return false; > > } > > > > So it won't hoist instructions that have defined operands ready in > > at most two cycles for general-domain instructions. Regardless, I > > don't understand the logic behind this heuristic. > > High-register-pressure situations are one thing, but why is it > > ever profitable in low-register-pressure situations not to hoist > > even "cheap" instructions out of loop bodies? > > > > Thanks again, > > Hal > > > > -- > > Hal Finkel > > Assistant Computational Scientist > > Leadership Computing Facility > > Argonne National Laboratory > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Some cheap and trivially remateriablizable instructions are free. It makes little sense to hoist them and increase register pressure. In some cases, the register allocator will end up rematerialize them at use sites again. The heuristics is obviously conservative. Due to phase ordering reasons, register pressure estimates cannot be 100% accurate. It’s here to prevent harm. During my testings from way back then, hoisting these cheap instructions can indeed harm performance. It probably can help in some cases as well. But I’d wager it’s pretty much a wash at best. Evan> On Jan 8, 2015, at 1:45 PM, Hal Finkel <hfinkel at anl.gov> wrote: > > Hi everyone, > > The MachineLICM pass has a heuristic such that, even in low-register-pressure situations, it will refuse to hoist "cheap" instructions out of loops. By default, when an itinerary is available, this means that all of the defined operands are available in at most 1 cycle. ARM overrides this, and provides this more-customized definition: > > bool ARMBaseInstrInfo:: > hasLowDefLatency(const InstrItineraryData *ItinData, > const MachineInstr *DefMI, unsigned DefIdx) const { > if (!ItinData || ItinData->isEmpty()) > return false; > > unsigned DDomain = DefMI->getDesc().TSFlags & ARMII::DomainMask; > if (DDomain == ARMII::DomainGeneral) { > unsigned DefClass = DefMI->getDesc().getSchedClass(); > int DefCycle = ItinData->getOperandCycle(DefClass, DefIdx); > return (DefCycle != -1 && DefCycle <= 2); > } > return false; > } > > So it won't hoist instructions that have defined operands ready in at most two cycles for general-domain instructions. Regardless, I don't understand the logic behind this heuristic. High-register-pressure situations are one thing, but why is it ever profitable in low-register-pressure situations not to hoist even "cheap" instructions out of loop bodies? > > Thanks again, > Hal > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory
On Hexagon it's generally better to hoist everything out of the loop and then rematerialize cheap instructions if necessary. Hexagon executes instruction in packets, and each extra instruction in the loop adds the likelihood of having more packets than necessary. In case of small loops this can be particularly expensive and predicting packetization is extremely difficult without actually performing it. The "hasLowDefLatency" is not really a good tool to model this, since cheap instructions are still cheap, and yet they are better hoisted out. -Krzysztof On 1/8/2015 3:45 PM, Hal Finkel wrote:> Hi everyone, > > The MachineLICM pass has a heuristic such that, even in low-register-pressure situations, it will refuse to hoist "cheap" instructions out of loops. By default, when an itinerary is available, this means that all of the defined operands are available in at most 1 cycle. ARM overrides this, and provides this more-customized definition: > > bool ARMBaseInstrInfo:: > hasLowDefLatency(const InstrItineraryData *ItinData, > const MachineInstr *DefMI, unsigned DefIdx) const { > if (!ItinData || ItinData->isEmpty()) > return false; > > unsigned DDomain = DefMI->getDesc().TSFlags & ARMII::DomainMask; > if (DDomain == ARMII::DomainGeneral) { > unsigned DefClass = DefMI->getDesc().getSchedClass(); > int DefCycle = ItinData->getOperandCycle(DefClass, DefIdx); > return (DefCycle != -1 && DefCycle <= 2); > } > return false; > } > > So it won't hoist instructions that have defined operands ready in at most two cycles for general-domain instructions. Regardless, I don't understand the logic behind this heuristic. High-register-pressure situations are one thing, but why is it ever profitable in low-register-pressure situations not to hoist even "cheap" instructions out of loop bodies? > > Thanks again, > Hal >-- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation