Serge Pavlov via llvm-dev
2019-Oct-01 05:35 UTC
[llvm-dev] [RFC] Using basic block attributes to implement non-default floating point environment
Hi all,

This proposal is aimed at supporting a floating-point environment in which some properties, like rounding mode or exception behavior, differ from those used by default. This includes in particular support for 'pragma STDC FENV_ACCESS' and 'pragma STDC FENV_ROUND', as well as some other related facilities.

Problem

On many processors, using a non-default floating-point mode requires modifying global state by writing into some register. This makes implementation difficult, because a floating-point instruction must not be moved to code that executes with a different floating-point state. To prevent such moves, the current solution represents FP operations with special (constrained) instructions, which do not participate in optimizations (http://lists.llvm.org/pipermail/cfe-dev/2017-August/055325.html). Importantly, if constrained FP operations are used in some part of a function, they must be used everywhere in that function, including inlined calls.

The main concern with this approach is the performance drop. Using constrained FP operations means that optimizations on FP operations are turned off; this is the main reason for using them. Even if a non-default FP environment is used in only a small piece of a function, optimizations are turned off in the entire function. For many practical applications this is unacceptable.

Although this approach prevents instructions from being moved, it does not prevent basic blocks from being moved. Code that uses a non-default FP environment must at some point set the appropriate state registers, do the necessary operations, and then restore the original mode. If this activity is scattered over several basic blocks, block-level optimizations can break this arrangement; for instance, a basic block with default FP operations can be moved after the block that sets the non-default FP environment.

Solution

The proposed approach is based on an extension of basic blocks. It is assumed that all code in a basic block executes in the same FP environment. This assumption is consistent with the rules for using 'pragma STDC FENV_ACCESS' and similar facilities. If the environment differs from the default, the block has a pointer to an object that keeps the block attributes, including FP settings. All basic blocks obtained from the same block where 'pragma STDC FENV_ACCESS' is specified share the same attribute object. In bitcode these attributes are represented by metadata attached to the basic blocks.

With basic block attributes, the compiler can assert the validity of an instruction move by comparing the attributes of the source and destination BBs. An instruction should keep its pointer to the BB attributes even while it is detached from a BB, to support the common technique of moving instructions. Similarly, the compiler can verify the validity of BB movement.

Such an approach allows developing an implementation in which constrained FP operations are 'jailed' in their basic blocks. Other parts of the function can still use the usual FP operations and benefit from optimizations. Depending on the target hardware, some FP operations may be allowed to cross the 'jail' boundary, for instance, if they correspond to instructions that directly encode the rounding mode and the FP environment change affects only the rounding mode.

Is this solution feasible? What are the obstacles, difficulties, or drawbacks? Are there any improvements? Any feedback is welcome.

Thanks,
--Serge
Finkel, Hal J. via llvm-dev
2019-Oct-02 22:12 UTC
[llvm-dev] [RFC] Using basic block attributes to implement non-default floating point environment
On 10/1/19 12:35 AM, Serge Pavlov via llvm-dev wrote:
> [...] It is important that the constrained FP operations must be used everywhere in entire function including inlined calls, if they are used in some part of it.
>
> The main concern about such approach is performance drop. Using constrained FP operations means that optimizations on FP operations are turned off, this is the main reason of using them. Even if non-default FP environment is used in a small piece of a function, optimizations are turned off in entire function. For many practical application this is unacceptable.

The reason, as you're likely aware, that the constrained FP operations must be used within the entire function is that, if you mix the constrained FP operations with the normal ones, there's no way to prevent code motion from intermixing them.

The solution I recall being discussed to this problem of a function which requires constrained operations only in part is outlining in Clang - this does introduce function-call overhead (although perhaps some MI-level inlining pass could mitigate that in part), but otherwise permits normal optimization of the normal FP operations.

> Although this approach prevents from moving instructions, it does not prevent from moving basic blocks. The code that uses non-default FP environment at some point must set appropriate state registers, do necessary operations and then restore the original mode. If this activity is scattered by several basic blocks, block-level optimizations can break these arrangement, for instance a basic block with default FP operations can be moved after the block that sets non-default FP environment.

Can you please provide some pseudocode to illustrate this problem? Moving basic blocks moves the instructions within them, and I don't see how our current semantics would prevent illegal reorderings of the instructions but not prevent illegal reorderings of groups of those same instructions. At the LLVM level, we currently model the FP-environment state as a kind of memory, and so the operations which adjust the FP-environment state must also be marked as writing to memory, but that's true with essentially all external program state, and that should prevent all illegal reordering.

Thanks,
Hal

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
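Hal's point that the FP environment is modeled "as a kind of memory" can be illustrated with a toy dependence check. Everything here is hypothetical (the `Op` struct and `may_reorder` are not LLVM APIs): each operation records whether it reads or writes the FP environment, and two adjacent operations may be swapped only if they do not conflict on it, just as with a shared memory location.

```c
#include <stdbool.h>

/* Hypothetical summary of an operation's effect on the FP environment,
   which this model treats like a single shared memory location. */
typedef struct {
    const char *name;
    bool reads_fpenv;    /* result depends on rounding mode / exception state */
    bool writes_fpenv;   /* changes rounding mode or exception state */
} Op;

/* Adjacent operations a; b (in program order) may be swapped only if they
   do not conflict on the FP environment: write/write, write/read and
   read/write pairs conflict, as they would for an ordinary memory location. */
bool may_reorder(const Op *a, const Op *b)
{
    if (a->writes_fpenv && (b->reads_fpenv || b->writes_fpenv))
        return false;
    if (b->writes_fpenv && a->reads_fpenv)
        return false;
    return true;
}
```

In this simplified model a constrained add (marked as reading the environment) cannot cross a mode switch, but a regular `fadd`, which is not marked as touching the environment, may move past it in either direction. That is why mixing the two forms in one function is unsafe unless something else, such as outlining, keeps them apart.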
Finkel, Hal J. via llvm-dev
2019-Oct-03 00:01 UTC
[llvm-dev] [RFC] Using basic block attributes to implement non-default floating point environment
On 10/2/19 5:12 PM, Hal Finkel wrote:
> The reason, as you're likely aware, that the constrained FP operations must be used within the entire function is that, if you mix the constrained FP operations with the normal ones, there's no way to prevent code motion from intermixing them. The solution I recall being discussed to this problem of a function which requires constrained operations only in part is outlining in Clang - this does introduce function-call overhead (although perhaps some MI-level inlining pass could mitigate that in part), but otherwise permits normal optimization of the normal FP operations.

Johannes and I discussed the outlining here offline, and two notes:

1. The outlining itself will prevent the undesired code motion today, but in the future we'll have IPO transformations that will need to be specifically taught to avoid moving FP operations into these outlined functions.

2. The outlined functions will need to be marked with noinline and also noimplicitfloat. In fact, all functions using the constrained intrinsics might need to be marked with noimplicitfloat. The above-mentioned restrictions on IPO passes might be conditioned on the noimplicitfloat attribute.

 -Hal
Kaylor, Andrew via llvm-dev
2019-Oct-03 18:43 UTC
[llvm-dev] [RFC] Using basic block attributes to implement non-default floating point environment
I’d like to emphasize that the constrained intrinsics prevent optimizations *by default*. We have a plan to go back and teach individual optimizations how to handle these intrinsics. The idea is that if an optimization knows nothing about the constrained intrinsics then it won’t try to transform them, but if an optimization has been taught to handle the intrinsics correctly then it isn’t limited by anything other than the semantics of the constraints. Once we’ve updated an optimization pass, it will be able to do everything with a constrained intrinsic that has the “relaxed” settings (“fpexcept.ignore” and “fpround.tonearest”) that it would be able to do with the regular operation.

This philosophy is key to the way that we’re approaching FPENV support. One of the primary goals is that any optimization that isn’t specifically aware of the mechanisms we’re using will automatically get conservatively correct behavior. The problem with relying on basic block attributes is that it requires teaching all current optimizations to look for the attribute.

We had a somewhat similar problem when we implemented Windows exception handling. The implementation introduced basic blocks that instructions shouldn’t be hoisted or sunk into. We ended up having to chase down a lot of cases where our rules were violated. I think this stems from not having a single place to check the legality of code motion.

-Andy
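Andy's "conservative by default, opt-in per pass" philosophy can be sketched as two versions of the same legality query. This is a hypothetical model, not LLVM code: `FPOp`, `naive_can_fold`, and `aware_can_fold` are invented for illustration. A pass that has not been taught about constrained operations refuses to touch them; a pass that has been taught treats a constrained operation with the fully relaxed settings like the regular one. The strings follow the spelling in the LLVM Language Reference, where the rounding argument is written "round.tonearest".

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical IR-level view of an FP operation. */
typedef struct {
    bool constrained;        /* true: constrained-intrinsic form */
    const char *rounding;    /* e.g. "round.tonearest", "round.dynamic" */
    const char *except;      /* e.g. "fpexcept.ignore", "fpexcept.strict" */
} FPOp;

/* A pass that has NOT been updated: it transforms only regular FP
   operations and leaves every constrained one alone. Always correct,
   not always fast. */
bool naive_can_fold(const FPOp *op)
{
    return !op->constrained;
}

/* A pass that HAS been updated: a constrained operation with the relaxed
   settings is treated exactly like the regular operation. In this
   simplified model, any stricter setting still blocks the transformation. */
bool aware_can_fold(const FPOp *op)
{
    if (!op->constrained)
        return true;
    return strcmp(op->rounding, "round.tonearest") == 0 &&
           strcmp(op->except, "fpexcept.ignore") == 0;
}
```

The key property is that forgetting to update a pass only costs performance: `naive_can_fold` never returns true for a constrained operation, so an unaware pass cannot produce wrong code.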
Doerfert, Johannes via llvm-dev
2019-Oct-03 18:54 UTC
[llvm-dev] [RFC] Using basic block attributes to implement non-default floating point environment
On 10/03, Kaylor, Andrew via llvm-dev wrote:
> I’d like to emphasize that the constrained intrinsics prevent
> optimizations *by default*. We have a plan to go back and teach
> individual optimizations how to handle these intrinsics. The idea is
> that if an optimization knows nothing about the constrained intrinsics
> then it won’t try to transform them, but if an optimization has been
> taught to handle the intrinsics correctly then it isn’t limited by
> anything other than the semantics of the constraints. Once we’ve
> updated an optimization pass, it will be able to do everything with a
> constrained intrinsic that has the “relaxed” settings
> (“fpexcept.ignore” and “fpround.tonearest”) that it would be able to
> do with the regular operation.

The way I understood it, the constrained intrinsics are not the only problem; the regular ones can be, too. That is, optimizations will move around/combine/replace/... regular floating-point operations in the presence of constrained intrinsics, because they do not impact each other (other than through def-use). If that understanding is correct, and this is a problem, then I doubt that we want basic block attributes.

Also, given that the constrained intrinsics are inaccessible_mem_only, optimizations will work with them as they work with other opaque instructions for which certain effects are known. (Btw. is it intentional that these can unwind?)

> This philosophy is key to the way that we’re approaching FPENV
> support. One of the primary goals is that any optimization that isn’t
> specifically aware of the mechanisms we’re using will automatically
> get conservatively correct behavior. The problem with relying on basic
> block attributes is that it requires teaching all current
> optimizations to look for the attribute.

Agreed, totally.

> We had a somewhat similar problem when we implemented Windows
> exception handling. The implementation introduced basic blocks that
> instructions shouldn’t be hoisted or sunk into. We ended up having to
> chase down a lot of cases where our rules were violated. I think this
> stems from not having a single place to check the legality of code
> motion.

Agreed. Outlining seems a reasonable approach to avoid code motion, or at least to restrict the locations that need to know about the constraints. Given that we already have noimplicitfloat, it seems natural to use it here and make sure IPOs honor it.

Cheers,
  Johannes
Reid Kleckner via llvm-dev
2019-Oct-03 21:04 UTC
[llvm-dev] [RFC] Using basic block attributes to implement non-default floating point environment
Basic block attributes would be a pretty major change for LLVM. If we were to add something like this to LLVM, it should be really well designed, and support other use cases beyond just the FP environment.

General regions could support things like:
- replace lifetime.start/end
- asynch exception handling
- better windows EH
- better replacement for inalloca
- I'm sure there are use cases in parallelization that I'm unfamiliar with

As is, no, I don't think we should put attributes on blocks.
Matt Arsenault via llvm-dev
2019-Oct-03 21:13 UTC
[llvm-dev] [RFC] Using basic block attributes to implement non-default floating point environment
> On Oct 3, 2019, at 14:04, Reid Kleckner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Basic block attributes would be a pretty major change for LLVM. If we were to add something like this to LLVM, it should be really well designed, and support other use cases beyond just the FP environment.
>
> General regions could support things like:
> - replace lifetime.start/end
> - asynch exception handling
> - better windows EH
> - better replacement for inalloca
> - I'm sure there are use cases in parallelization that I'm unfamiliar with
>
> As is, no, I don't think we should put attributes on blocks.

+1

We have a variety of code motion issues to solve for GPUs, but something block level won’t really help much.

-Matt
Serge Pavlov via llvm-dev
2019-Oct-08 07:38 UTC
[llvm-dev] [RFC] Using basic block attributes to implement non-default floating point environment
I see this approach is not supported, so I am trying to elaborate another solution. Nevertheless, I'd like to address some comments, just for reference.

On Fri, Oct 4, 2019 at 1:43 AM Kaylor, Andrew <andrew.kaylor at intel.com> wrote:
> I’d like to emphasize that the constrained intrinsics prevent
> optimizations *by default*. We have a plan to go back and teach
> individual optimizations how to handle these intrinsics.
> The idea is that if an optimization knows nothing about the constrained
> intrinsics then it won’t try to transform them, but if an optimization has
> been taught to handle the intrinsics correctly then it isn’t limited by
> anything other than the semantics of the constraints. Once we’ve updated an
> optimization pass, it will be able to do everything with a constrained
> intrinsic that has the “relaxed” settings (“fpexcept.ignore” and
> “fpround.tonearest”) that it would be able to do with the regular operation.

This work is necessary for any approach, but for the current one it is vital. Because constrained intrinsics are used in the entire function body, the amount of code where the solution must work correctly and fast is larger. The performance drop makes this solution inappropriate for many users; they wouldn't use it until the performance becomes close to the case without constrained intrinsics. In contrast, basic block attributes limit the constrained intrinsics to only part of the function code, so it would be easier to make the solution suitable for use in production code. Of course, when reasoning about performance, it would be nice to have numbers.

> This philosophy is key to the way that we’re approaching FPENV support.
> One of the primary goals is that any optimization that isn’t specifically
> aware of the mechanisms we’re using will automatically get conservatively
> correct behavior. The problem with relying on basic block attributes is
> that it requires teaching all current optimizations to look for the
> attribute.

All these optimizations must eventually be modified in the current approach as well. If a transformation makes a dangerous instruction or basic block move, it must be taught to process constrained intrinsics correctly, or it becomes a source of performance drop. But you are right: implementing basic block attributes requires implementing a mechanism that checks the validity of instruction and basic block moves. Once that mechanism exists, finding the places where transformations require modification becomes simpler.

On Fri, Oct 4, 2019 at 1:54 AM Doerfert, Johannes <jdoerfert at anl.gov> wrote:
> The way I understood it, the constraint intrinsics are not the only
> problem but the regular ones can be. That is, optimizations will move
> around/combine/replace/... regular floating operations in the presence
> of constraint intrinsics because they do not impact each other (other
> than def-use). If that understanding is correct, and this is a problem,
> then I doubt that we want basic block attributes.

Basic block attributes allow partitioning function code into realms, where FP operations are represented either by constrained intrinsics or by regular nodes. Code that moves instructions checks whether a particular instruction is allowed to cross a realm boundary. This mechanism prevents mixing constrained intrinsics with regular FP nodes, but still allows optimizations like inlining.

On Thu, Oct 3, 2019 at 10:45 PM Doerfert, Johannes <jdoerfert at anl.gov> wrote:
> On 10/03, Serge Pavlov wrote:
> >
> > Outlining is an interesting solution but unfortunately it is not an option
> > for processors for machine learning. Branching is expensive on them and
> > some processors do not have call instruction, all function calls must be
> > eventually inlined.
>
> Would "really late" inlining be an option?

Late inlining means fewer optimization possibilities. If the resulting code represents a single function (as in the case of kernels), it is usually more profitable to do early inlining.

Thanks,
--Serge
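The realm check Serge describes, together with the 'jail' exception from the original RFC, can be sketched as a small legality query. This is an editor's toy model under stated assumptions, not an LLVM API: `FPEnvAttrs`, `BasicBlock`, `Instruction`, and `can_move_instruction` are hypothetical; a NULL attribute pointer stands for the default environment (round-to-nearest, exceptions ignored); and "same realm" is tested by pointer identity, because in the RFC all blocks derived from one pragma region share a single attribute object.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the RFC; none of these types exist in LLVM. */
typedef enum { RM_NEAREST, RM_UPWARD, RM_DOWNWARD, RM_TOWARDZERO, RM_DYNAMIC } RoundingMode;
typedef enum { EXC_IGNORE, EXC_MAYTRAP, EXC_STRICT } ExceptionBehavior;

/* Shared attribute object: every block derived from the same
   FENV_ACCESS region points at the same instance. */
typedef struct {
    RoundingMode rounding;
    ExceptionBehavior exceptions;
} FPEnvAttrs;

typedef struct {
    const FPEnvAttrs *attrs;       /* NULL = default FP environment */
} BasicBlock;

typedef struct {
    bool is_fp_op;                 /* result depends on the FP environment */
    bool encodes_rounding;         /* target instruction carries its own rounding mode */
    const FPEnvAttrs *attrs;       /* kept even while detached from a block */
} Instruction;

/* Exception behavior of a realm; the default environment ignores exceptions. */
static ExceptionBehavior exceptions_of(const FPEnvAttrs *a)
{
    return a ? a->exceptions : EXC_IGNORE;
}

/* May `inst` move into `dest` without changing the FP environment it runs in? */
bool can_move_instruction(const Instruction *inst, const BasicBlock *dest)
{
    if (!inst->is_fp_op)
        return true;               /* unaffected by the FP environment */
    if (inst->attrs == dest->attrs)
        return true;               /* same realm: same shared attribute object */
    /* 'Jail' exception from the RFC: an operation whose instruction encodes
       its own rounding mode may cross a boundary that changes only the
       rounding mode, so the exception behavior must still match. */
    return inst->encodes_rounding &&
           exceptions_of(inst->attrs) == exceptions_of(dest->attrs);
}
```

Pointer identity keeps the check cheap, but it treats two separate pragma regions with identical settings as different realms; comparing the fields instead would let instructions move between them. Either way, the check only protects code if every transformation calls it, which is the objection Andy raised earlier in the thread.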