via llvm-dev
2017-Nov-17 19:15 UTC
[llvm-dev] Less aggressive on the first allocation of CSR if detecting an early exit
On 2017-11-17 13:10, Quentin Colombet wrote:>> On Nov 16, 2017, at 2:31 PM, junbuml at codeaurora.org wrote: >> On 2017-11-14 17:22, Quentin Colombet wrote: >> >>> Hi, >>> I think it is kind of artificial to tie the CSRCost with the >>> presence >>> of calls. >>> I think I’ve already mentioned it in one of the review, but I >>> believe it would be better to differentiate when we want to use a >>> CSR >>> to avoid spilling or to avoid splitting. CSR instead of spilling >>> is >>> good, CSR instead of splitting, not so good :). >> >> About this, I can see your previous comment in D27366 (I copied it >> below) : >> Also, that's possible that the right fix/simple fix is to have one >> CSRCost for split and one for spill. >> Indeed, IIRC, right now the returned cost for both spilling and >> splitting is going to be the sum of the frequencies where the >> split/spill happen and given the spill and copy have different cost, >> we may want to have different comparison. >> E.g., CSRCostForSpill = 5 (ok to trade against more than 5 executed >> spill/reload) but CSRCostForSpilt = 20 (ok to trade against more >> than 20 executed copies) >> >> If this is what you meant here, is the CSRCostForSpilt the actual >> cost directly comparable with the split cost? >> Or, it should be multiplied with the entry frequency to be >> comparable with the split cost, considering that the CSRCost is the >> cost of spilling CSR in the entry? > > I believe it should be compared directly with the split cost. I.e., > CSRCostXXX against CostOfTheOtherChoiceWithFrequency.As far as I know the spill/split cost which is compared with CSRCost is the sum of the frequencies of spots where spill/split happen, and the CSRCost is the cost of spilling CSR at the entry. If we say, for example, 20 copies from pre-split are okay to trade against 1 csr spill at the entry, then shouldn't we multiply FreqOfEntry with the 20 (number of tradable copies)? Please me know if I miss something here?> >> By doing this, I would expect we mechanically get the desired >> behavior >> that CSRs get used for live-ranges that go through calls (otherwise >> we >> would have spilled). >> My 2c. >> Cheers, >> -Quentin >> On Nov 10, 2017, at 12:34 PM, junbuml at codeaurora.org wrote: >> On 2017-11-10 07:47, Nemanja Ivanovic wrote: >> One thing I thought about doing a while back and never really >> wrote a >> POC for is the following: >> - Make FirstCSRCost a property of the MachineBasicBlock (or create >> a >> map of MBB* -> FirstCSRCost) >> - Implement a pre-RA pass that will populate the map as follows: >> - Identify all blocks with calls >> - Find the nearest common dominator (NCD) to all those blocks >> (not >> strict so that a block with a call might be that NCD) >> - If the NCD is the entry block, CSR allocation is cheap in all >> blocks >> - Make CSR allocation cheap in blocks that are in the dominator >> tree >> rooted at NCD >> The idea would be to favour CSR allocation in blocks that might be >> eligible for the prologue and favour splitting in blocks that we'd >> prefer not to have a prologue in (or before). >> Then a CFG such as this: >> A >> / \ >> B C >> | / \ >> | D E >> | | / >> | | / >> | |/ >> | / >> |/ >> F >> - Assume calls are in B and ANY_OF(C,D,E): CSR allocation is cheap >> everywhere >> - Assume calls are in C or ALL_OF(D,E): CSR allocation is cheap in >> ALL_OF(C,D,E); CSR allocation is expensive in ALL_OF(A,B,F) >> - Assume only call is in ANY_OF(B,D,E): CSR allocation is cheap >> only >> in that block, expensive everywhere else >> I think this construction would give us what we want, but there >> may be >> [obvious] counter-examples I haven't thought of. >> Thanks Nemanja for sharing your idea. I think this might cover the >> case I was targeting, and we may not need to be limited only in the >> entry block. >> However, I bit worry if this cause RegAllc to slow in allocating >> CSRs. What if we have a call-site in the early exit path and no >> call-site in all other blocks. Then, conservative allocation of CSRs >> might not be a good choice if the reg pressure is high in the all >> other blocks. As our first step, would it make sense to limit this >> only when we detect an early exit? I guess Quentin may have some >> comment. >> thanks, >> Jun >> On Tue, Oct 31, 2017 at 8:38 PM, via llvm-dev >> <llvm-dev at lists.llvm.org> wrote: >> On 2017-10-30 21:20, Hal Finkel wrote: >> On 10/30/2017 12:20 PM, junbuml at codeaurora.org wrote: >> On 2017-10-27 19:50, Hal Finkel wrote: >> On 10/27/2017 03:32 PM, Jun Lim via llvm-dev wrote: >> When compiling C code below for AArach64, I saw that >> shrink-wrapping >> didn't happen due to the very early uses of CSRs in the entry block. >> So CSR spills/reloads are executed even when the early exit block is >> taken. >> int getI(int i); >> int foo(int *P, int i) { >> if (i>0) >> return P[i]; >> i = getI(i); >> return P[i]; >> } >> It's not that hard to find such cases where RegAllocGreedy >> aggressively allocates a CSRs when a live range expands across a >> call-site. That's because of the conservatively initialized >> CSRCost, causing RegAllocGreedy to strongly favour allocating a CSR >> over splitting a region. Since allocation of CSRs requires the cost >> of spilling CSRs, allocating CSRs is not always beneficial. Like the >> case above, if a function has an early exit code, we may want to be >> less aggressive on the first allocation of CSR in the entry block by >> increasing the CSRCost. >> Previously, I proposed https://reviews.llvm.org/D34608 [1] in this >> matter, but the way I detect the profitable cases and the way I >> increase the CRSCost was somewhat unclear. Now, I'm thinking to less >> aggressive on the first allocation of CSR in the entry block in case >> where the function has an early exit so that encourage more >> shrink-wrapping and avoid executing CSR spill/recover when the early >> exit is taken. By sending this out, I just want to get any high >> level feedback early. Please let me know if anyone has any opinion >> about this. >> So the heuristic will have nothing to do with the presence of calls? >> Might this increase spilling in loops? >> -Hal >> Before allowing the first allocation from CSRs, I will check if the >> virtual register is really live across a call in other blocks. If >> the >> function have a call in entry or exit, we don't need to increase the >> CSRCost. This heuristic will be applied only for the very first >> allocation from CSRs only in the entry block when the function has >> an >> early exit; if a CSR is already allocated in the function, we will >> use >> the current default global CSRCost. >> Even after increasing the CSRCost, we should end up allocating the >> first CSR if the cost of splitting the live-range is still higher >> than >> the increased CSRCost. I believe the amount we want to increase the >> CSRCost must be target-dependent, but it must be conservative enough >> to avoid too many copies in the related spots. >> Thanks for explaining. >> I suppose you'll want to make sure that the call(s) in question come >> after the early exit (i.e., that there aren't calls before the early >> exit)? >> I will check if a virtual register is live across only calls which >> will be executed when the early exit is not taken. By increasing >> CSRcost for such case, we increase chances to avoid executing CSR >> spill/recover when the early exit is taken. >> When you say "only in the entry block" you mean that the live range >> starts in the entry block, right (i.e., that it is a function >> parameter or a vreg defined by some instruction in the entry block)? >> Yes. >> Does it matter that it is in the entry block, or do you only need it >> to come before the early exit and have an execution frequency <= to >> the entry block's execution frequency? >> The reason I specifically check the entry block is because for now I >> see the early exit happen only in entry block; limiting that the >> early >> exit condition is checked in the entry block and branch to the exit >> block directly from the entry block if hitting the condition. If >> overall approach is reasonable, we can certainly extend it. >> -Hal >> Thanks, >> Jun >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev [2] >> -- Hal Finkel >> Lead, Compiler Technology and Programming Languages >> Leadership Computing Facility >> Argonne National Laboratory >> -- >> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm >> Technologies, Inc. >> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a >> Linux Foundation Collaborative Project. >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev [2] >> Links: >> ------ >> [1] https://reviews.llvm.org/D34608 >> [2] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > -- > Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm > Technologies, Inc. > Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a > Linux Foundation Collaborative Project. > > -- > Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm > Technologies, Inc. > Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a > Linux Foundation Collaborative Project.-- Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Quentin Colombet via llvm-dev
2017-Nov-18 04:50 UTC
[llvm-dev] Less aggressive on the first allocation of CSR if detecting an early exit
> On Nov 17, 2017, at 11:15 AM, junbuml at codeaurora.org wrote: > > On 2017-11-17 13:10, Quentin Colombet wrote: >>> On Nov 16, 2017, at 2:31 PM, junbuml at codeaurora.org wrote: >>> On 2017-11-14 17:22, Quentin Colombet wrote: >>>> Hi, >>>> I think it is kind of artificial to tie the CSRCost with the >>>> presence >>>> of calls. >>>> I think I’ve already mentioned it in one of the review, but I >>>> believe it would be better to differentiate when we want to use a >>>> CSR >>>> to avoid spilling or to avoid splitting. CSR instead of spilling >>>> is >>>> good, CSR instead of splitting, not so good :). >>> About this, I can see your previous comment in D27366 (I copied it >>> below) : >>> Also, that's possible that the right fix/simple fix is to have one >>> CSRCost for split and one for spill. >>> Indeed, IIRC, right now the returned cost for both spilling and >>> splitting is going to be the sum of the frequencies where the >>> split/spill happen and given the spill and copy have different cost, >>> we may want to have different comparison. >>> E.g., CSRCostForSpill = 5 (ok to trade against more than 5 executed >>> spill/reload) but CSRCostForSpilt = 20 (ok to trade against more >>> than 20 executed copies) >>> If this is what you meant here, is the CSRCostForSpilt the actual >>> cost directly comparable with the split cost? >>> Or, it should be multiplied with the entry frequency to be >>> comparable with the split cost, considering that the CSRCost is the >>> cost of spilling CSR in the entry? >> I believe it should be compared directly with the split cost. I.e., >> CSRCostXXX against CostOfTheOtherChoiceWithFrequency. > > As far as I know the spill/split cost which is compared with CSRCost is the sum of the frequencies of spots where spill/split happen, and the CSRCost is the cost of spilling CSR at the entry. If we say, for example, 20 copies from pre-split are okay to trade against 1 csr spill at the entry, then shouldn't we multiply FreqOfEntry with the 20 (number of tradable copies)?Yes, you’re right with FreqOfEntry. I assumed this was 1.> Please me know if I miss something here? > > >>> By doing this, I would expect we mechanically get the desired >>> behavior >>> that CSRs get used for live-ranges that go through calls (otherwise >>> we >>> would have spilled). >>> My 2c. >>> Cheers, >>> -Quentin >>> On Nov 10, 2017, at 12:34 PM, junbuml at codeaurora.org wrote: >>> On 2017-11-10 07:47, Nemanja Ivanovic wrote: >>> One thing I thought about doing a while back and never really >>> wrote a >>> POC for is the following: >>> - Make FirstCSRCost a property of the MachineBasicBlock (or create >>> a >>> map of MBB* -> FirstCSRCost) >>> - Implement a pre-RA pass that will populate the map as follows: >>> - Identify all blocks with calls >>> - Find the nearest common dominator (NCD) to all those blocks >>> (not >>> strict so that a block with a call might be that NCD) >>> - If the NCD is the entry block, CSR allocation is cheap in all >>> blocks >>> - Make CSR allocation cheap in blocks that are in the dominator >>> tree >>> rooted at NCD >>> The idea would be to favour CSR allocation in blocks that might be >>> eligible for the prologue and favour splitting in blocks that we'd >>> prefer not to have a prologue in (or before). >>> Then a CFG such as this: >>> A >>> / \ >>> B C >>> | / \ >>> | D E >>> | | / >>> | | / >>> | |/ >>> | / >>> |/ >>> F >>> - Assume calls are in B and ANY_OF(C,D,E): CSR allocation is cheap >>> everywhere >>> - Assume calls are in C or ALL_OF(D,E): CSR allocation is cheap in >>> ALL_OF(C,D,E); CSR allocation is expensive in ALL_OF(A,B,F) >>> - Assume only call is in ANY_OF(B,D,E): CSR allocation is cheap >>> only >>> in that block, expensive everywhere else >>> I think this construction would give us what we want, but there >>> may be >>> [obvious] counter-examples I haven't thought of. >>> Thanks Nemanja for sharing your idea. I think this might cover the >>> case I was targeting, and we may not need to be limited only in the >>> entry block. >>> However, I bit worry if this cause RegAllc to slow in allocating >>> CSRs. What if we have a call-site in the early exit path and no >>> call-site in all other blocks. Then, conservative allocation of CSRs >>> might not be a good choice if the reg pressure is high in the all >>> other blocks. As our first step, would it make sense to limit this >>> only when we detect an early exit? I guess Quentin may have some >>> comment. >>> thanks, >>> Jun >>> On Tue, Oct 31, 2017 at 8:38 PM, via llvm-dev >>> <llvm-dev at lists.llvm.org> wrote: >>> On 2017-10-30 21:20, Hal Finkel wrote: >>> On 10/30/2017 12:20 PM, junbuml at codeaurora.org wrote: >>> On 2017-10-27 19:50, Hal Finkel wrote: >>> On 10/27/2017 03:32 PM, Jun Lim via llvm-dev wrote: >>> When compiling C code below for AArach64, I saw that >>> shrink-wrapping >>> didn't happen due to the very early uses of CSRs in the entry block. >>> So CSR spills/reloads are executed even when the early exit block is >>> taken. >>> int getI(int i); >>> int foo(int *P, int i) { >>> if (i>0) >>> return P[i]; >>> i = getI(i); >>> return P[i]; >>> } >>> It's not that hard to find such cases where RegAllocGreedy >>> aggressively allocates a CSRs when a live range expands across a >>> call-site. That's because of the conservatively initialized >>> CSRCost, causing RegAllocGreedy to strongly favour allocating a CSR >>> over splitting a region. Since allocation of CSRs requires the cost >>> of spilling CSRs, allocating CSRs is not always beneficial. Like the >>> case above, if a function has an early exit code, we may want to be >>> less aggressive on the first allocation of CSR in the entry block by >>> increasing the CSRCost. >>> Previously, I proposed https://reviews.llvm.org/D34608 [1] in this >>> matter, but the way I detect the profitable cases and the way I >>> increase the CRSCost was somewhat unclear. Now, I'm thinking to less >>> aggressive on the first allocation of CSR in the entry block in case >>> where the function has an early exit so that encourage more >>> shrink-wrapping and avoid executing CSR spill/recover when the early >>> exit is taken. By sending this out, I just want to get any high >>> level feedback early. Please let me know if anyone has any opinion >>> about this. >>> So the heuristic will have nothing to do with the presence of calls? >>> Might this increase spilling in loops? >>> -Hal >>> Before allowing the first allocation from CSRs, I will check if the >>> virtual register is really live across a call in other blocks. If >>> the >>> function have a call in entry or exit, we don't need to increase the >>> CSRCost. This heuristic will be applied only for the very first >>> allocation from CSRs only in the entry block when the function has >>> an >>> early exit; if a CSR is already allocated in the function, we will >>> use >>> the current default global CSRCost. >>> Even after increasing the CSRCost, we should end up allocating the >>> first CSR if the cost of splitting the live-range is still higher >>> than >>> the increased CSRCost. I believe the amount we want to increase the >>> CSRCost must be target-dependent, but it must be conservative enough >>> to avoid too many copies in the related spots. >>> Thanks for explaining. >>> I suppose you'll want to make sure that the call(s) in question come >>> after the early exit (i.e., that there aren't calls before the early >>> exit)? >>> I will check if a virtual register is live across only calls which >>> will be executed when the early exit is not taken. By increasing >>> CSRcost for such case, we increase chances to avoid executing CSR >>> spill/recover when the early exit is taken. >>> When you say "only in the entry block" you mean that the live range >>> starts in the entry block, right (i.e., that it is a function >>> parameter or a vreg defined by some instruction in the entry block)? >>> Yes. >>> Does it matter that it is in the entry block, or do you only need it >>> to come before the early exit and have an execution frequency <= to >>> the entry block's execution frequency? >>> The reason I specifically check the entry block is because for now I >>> see the early exit happen only in entry block; limiting that the >>> early >>> exit condition is checked in the entry block and branch to the exit >>> block directly from the entry block if hitting the condition. If >>> overall approach is reasonable, we can certainly extend it. >>> -Hal >>> Thanks, >>> Jun >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev [2] >>> -- Hal Finkel >>> Lead, Compiler Technology and Programming Languages >>> Leadership Computing Facility >>> Argonne National Laboratory >>> -- >>> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm >>> Technologies, Inc. >>> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a >>> Linux Foundation Collaborative Project. >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev [2] >>> Links: >>> ------ >>> [1] https://reviews.llvm.org/D34608 >>> [2] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> -- >> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm >> Technologies, Inc. >> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a >> Linux Foundation Collaborative Project. >> -- >> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm >> Technologies, Inc. >> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a >> Linux Foundation Collaborative Project. > > -- > Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. > Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171117/f3f8ef25/attachment.html>
via llvm-dev
2017-Nov-21 15:59 UTC
[llvm-dev] Less aggressive on the first allocation of CSR if detecting an early exit
On 2017-11-17 23:50, Quentin Colombet wrote:>> On Nov 17, 2017, at 11:15 AM, junbuml at codeaurora.org wrote: >> On 2017-11-17 13:10, Quentin Colombet wrote: >> On Nov 16, 2017, at 2:31 PM, junbuml at codeaurora.org wrote: >> On 2017-11-14 17:22, Quentin Colombet wrote: >> Hi, >> I think it is kind of artificial to tie the CSRCost with the >> presence >> of calls. >> I think I’ve already mentioned it in one of the review, but I >> believe it would be better to differentiate when we want to use a >> CSR >> to avoid spilling or to avoid splitting. CSR instead of spilling >> is >> good, CSR instead of splitting, not so good :). >> About this, I can see your previous comment in D27366 (I copied it >> below) : >> Also, that's possible that the right fix/simple fix is to have one >> CSRCost for split and one for spill. >> Indeed, IIRC, right now the returned cost for both spilling and >> splitting is going to be the sum of the frequencies where the >> split/spill happen and given the spill and copy have different cost, >> we may want to have different comparison. >> E.g., CSRCostForSpill = 5 (ok to trade against more than 5 executed >> spill/reload) but CSRCostForSpilt = 20 (ok to trade against more >> than 20 executed copies) >> If this is what you meant here, is the CSRCostForSpilt the actual >> cost directly comparable with the split cost? >> Or, it should be multiplied with the entry frequency to be >> comparable with the split cost, considering that the CSRCost is the >> cost of spilling CSR in the entry? > I believe it should be compared directly with the split cost. I.e., > CSRCostXXX against CostOfTheOtherChoiceWithFrequency. > > As far as I know the spill/split cost which is compared with CSRCost > is the sum of the frequencies of spots where spill/split happen, and > the CSRCost is the cost of spilling CSR at the entry. If we say, for > example, 20 copies from pre-split are okay to trade against 1 csr > spill at the entry, then shouldn't we multiply FreqOfEntry with the 20 > (number of tradable copies)? > > Yes, you’re right with FreqOfEntry. I assumed this was 1.Considering current implementation in initializeCSRCost() where the CSRCost is scaled by the FixedEntry(1 << 14), if we want to set 20 for CSRCostForSplit (20 copies are tradable with one CSR spill in the entry), then I think the initial raw value for CSRCostForSplit should be 20 * (1 << 14) to allow us to multiply 20 with the actual frequency of the entry. Do you agree with it?> >> Please me know if I miss something here? >> >> By doing this, I would expect we mechanically get the desired >> behavior >> that CSRs get used for live-ranges that go through calls (otherwise >> we >> would have spilled). >> My 2c. >> Cheers, >> -Quentin >> On Nov 10, 2017, at 12:34 PM, junbuml at codeaurora.org wrote: >> On 2017-11-10 07:47, Nemanja Ivanovic wrote: >> One thing I thought about doing a while back and never really >> wrote a >> POC for is the following: >> - Make FirstCSRCost a property of the MachineBasicBlock (or create >> a >> map of MBB* -> FirstCSRCost) >> - Implement a pre-RA pass that will populate the map as follows: >> - Identify all blocks with calls >> - Find the nearest common dominator (NCD) to all those blocks >> (not >> strict so that a block with a call might be that NCD) >> - If the NCD is the entry block, CSR allocation is cheap in all >> blocks >> - Make CSR allocation cheap in blocks that are in the dominator >> tree >> rooted at NCD >> The idea would be to favour CSR allocation in blocks that might be >> eligible for the prologue and favour splitting in blocks that we'd >> prefer not to have a prologue in (or before). >> Then a CFG such as this: >> A >> / \ >> B C >> | / \ >> | D E >> | | / >> | | / >> | |/ >> | / >> |/ >> F >> - Assume calls are in B and ANY_OF(C,D,E): CSR allocation is cheap >> everywhere >> - Assume calls are in C or ALL_OF(D,E): CSR allocation is cheap in >> ALL_OF(C,D,E); CSR allocation is expensive in ALL_OF(A,B,F) >> - Assume only call is in ANY_OF(B,D,E): CSR allocation is cheap >> only >> in that block, expensive everywhere else >> I think this construction would give us what we want, but there >> may be >> [obvious] counter-examples I haven't thought of. >> Thanks Nemanja for sharing your idea. I think this might cover the >> case I was targeting, and we may not need to be limited only in the >> entry block. >> However, I bit worry if this cause RegAllc to slow in allocating >> CSRs. What if we have a call-site in the early exit path and no >> call-site in all other blocks. Then, conservative allocation of CSRs >> might not be a good choice if the reg pressure is high in the all >> other blocks. As our first step, would it make sense to limit this >> only when we detect an early exit? I guess Quentin may have some >> comment. >> thanks, >> Jun >> On Tue, Oct 31, 2017 at 8:38 PM, via llvm-dev >> <llvm-dev at lists.llvm.org> wrote: >> On 2017-10-30 21:20, Hal Finkel wrote: >> On 10/30/2017 12:20 PM, junbuml at codeaurora.org wrote: >> On 2017-10-27 19:50, Hal Finkel wrote: >> On 10/27/2017 03:32 PM, Jun Lim via llvm-dev wrote: >> When compiling C code below for AArach64, I saw that >> shrink-wrapping >> didn't happen due to the very early uses of CSRs in the entry block. >> So CSR spills/reloads are executed even when the early exit block is >> taken. >> int getI(int i); >> int foo(int *P, int i) { >> if (i>0) >> return P[i]; >> i = getI(i); >> return P[i]; >> } >> It's not that hard to find such cases where RegAllocGreedy >> aggressively allocates a CSRs when a live range expands across a >> call-site. That's because of the conservatively initialized >> CSRCost, causing RegAllocGreedy to strongly favour allocating a CSR >> over splitting a region. Since allocation of CSRs requires the cost >> of spilling CSRs, allocating CSRs is not always beneficial. Like the >> case above, if a function has an early exit code, we may want to be >> less aggressive on the first allocation of CSR in the entry block by >> increasing the CSRCost. >> Previously, I proposed https://reviews.llvm.org/D34608 [1] in this >> matter, but the way I detect the profitable cases and the way I >> increase the CRSCost was somewhat unclear. Now, I'm thinking to less >> aggressive on the first allocation of CSR in the entry block in case >> where the function has an early exit so that encourage more >> shrink-wrapping and avoid executing CSR spill/recover when the early >> exit is taken. By sending this out, I just want to get any high >> level feedback early. Please let me know if anyone has any opinion >> about this. >> So the heuristic will have nothing to do with the presence of calls? >> Might this increase spilling in loops? >> -Hal >> Before allowing the first allocation from CSRs, I will check if the >> virtual register is really live across a call in other blocks. If >> the >> function have a call in entry or exit, we don't need to increase the >> CSRCost. This heuristic will be applied only for the very first >> allocation from CSRs only in the entry block when the function has >> an >> early exit; if a CSR is already allocated in the function, we will >> use >> the current default global CSRCost. >> Even after increasing the CSRCost, we should end up allocating the >> first CSR if the cost of splitting the live-range is still higher >> than >> the increased CSRCost. I believe the amount we want to increase the >> CSRCost must be target-dependent, but it must be conservative enough >> to avoid too many copies in the related spots. >> Thanks for explaining. >> I suppose you'll want to make sure that the call(s) in question come >> after the early exit (i.e., that there aren't calls before the early >> exit)? >> I will check if a virtual register is live across only calls which >> will be executed when the early exit is not taken. By increasing >> CSRcost for such case, we increase chances to avoid executing CSR >> spill/recover when the early exit is taken. >> When you say "only in the entry block" you mean that the live range >> starts in the entry block, right (i.e., that it is a function >> parameter or a vreg defined by some instruction in the entry block)? >> Yes. >> Does it matter that it is in the entry block, or do you only need it >> to come before the early exit and have an execution frequency <= to >> the entry block's execution frequency? >> The reason I specifically check the entry block is because for now I >> see the early exit happen only in entry block; limiting that the >> early >> exit condition is checked in the entry block and branch to the exit >> block directly from the entry block if hitting the condition. If >> overall approach is reasonable, we can certainly extend it. >> -Hal >> Thanks, >> Jun >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev [2] >> -- Hal Finkel >> Lead, Compiler Technology and Programming Languages >> Leadership Computing Facility >> Argonne National Laboratory >> -- >> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm >> Technologies, Inc. >> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a >> Linux Foundation Collaborative Project. >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev [2] >> Links: >> ------ >> [1] https://reviews.llvm.org/D34608 >> [2] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> -- >> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm >> Technologies, Inc. >> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a >> Linux Foundation Collaborative Project. >> -- >> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm >> Technologies, Inc. >> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a >> Linux Foundation Collaborative Project. > > -- > Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm > Technologies, Inc. > Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a > Linux Foundation Collaborative Project.-- Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.