Quentin Colombet
2013-Jul-01 20:33 UTC
[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
On Jul 1, 2013, at 11:52 AM, Eli Friedman <eli.friedman at gmail.com> wrote:> On Mon, Jul 1, 2013 at 11:30 AM, Quentin Colombet <qcolombet at apple.com> wrote: > Hi, > > ** Problematic ** > I am looking for advices to share some logic between DAG combine and target lowering. > > Basically, I need to know if a bitcast that is about to be inserted during target specific isel lowering will be eliminated during DAG combine. > > Let me know if there is another, better supported, approach for this kind of problems. > > ** Motivating Example ** > The motivating example comes form the lowering of vector code on armv7. > More specifically, the build_vector node is lowered to a target specific ARMISD::build_vector where all the parameters are bitcasted to floating point types. > > This works well, unless the inserted bitcasts survive until instruction selection. In that case, they incur moves between integer unit and floating point unit that may result in inefficient code. > > Attached motivating_example.ll shows such a case: > llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o - > ldr r0, [r1] > ldr r1, [r2] > vmov s1, r1 > vmov s0, r0 > Here each ldr, vmov sequences could have been replaced by a simple vld1.32. > > ** Proposed Solution ** > Lower to more vector friendly code (using a sequence of insert_vector_elt), when bit casts will not be free. > The attached patch demonstrates that, but is missing the proper check to know what DAG combine will do (see TODO). > > I think you're approaching this backwards: the obvious thing to do is to generate the insert_vector_elt sequence unconditionally, and DAGCombine that sequence to a build_vector when appropriate.Hi Eli, I have started to look into the direction you gave me. I may have miss something but I do not see how the proposed direction solves the issue. Indeed to be able to DAGCombine a insert_vector_elt sequences into a ARMISD::build_vector, I still need to know if it would be profitable, i.e., if DAGCombine will remove the bitcasts that combining/lowering is about to insert. Since target specific DAGCombine are also done in TargetLowering I do not have access to more DAGCombine logic (at least DAGCombineInfo is not providing the require information). What did I miss? Thanks, -Quentin> > -Eli-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130701/5a04d7c5/attachment.html>
Eli Friedman
2013-Jul-01 20:45 UTC
[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
On Mon, Jul 1, 2013 at 1:33 PM, Quentin Colombet <qcolombet at apple.com>wrote:> On Jul 1, 2013, at 11:52 AM, Eli Friedman <eli.friedman at gmail.com> wrote: > > On Mon, Jul 1, 2013 at 11:30 AM, Quentin Colombet <qcolombet at apple.com> > wrote: > >> Hi, >> >> ** Problematic ** >> I am looking for advices to share some logic between DAG combine and >> target lowering. >> >> Basically, I need to know if a bitcast that is about to be inserted >> during target specific isel lowering will be eliminated during DAG combine. >> >> Let me know if there is another, better supported, approach for this kind >> of problems. >> >> ** Motivating Example ** >> The motivating example comes form the lowering of vector code on armv7. >> More specifically, the build_vector node is lowered to a target specific >> ARMISD::build_vector where all the parameters are bitcasted to floating >> point types. >> >> This works well, unless the inserted bitcasts survive until instruction >> selection. In that case, they incur moves between integer unit and floating >> point unit that may result in inefficient code. >> >> Attached motivating_example.ll shows such a case: >> llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o - >> ldr r0, [r1] >> ldr r1, [r2] >> vmov s1, r1 >> vmov s0, r0 >> Here each ldr, vmov sequences could have been replaced by a simple >> vld1.32. >> >> ** Proposed Solution ** >> Lower to more vector friendly code (using a sequence of >> insert_vector_elt), when bit casts will not be free. >> The attached patch demonstrates that, but is missing the proper check to >> know what DAG combine will do (see TODO). >> > > I think you're approaching this backwards: the obvious thing to do is to > generate the insert_vector_elt sequence unconditionally, and DAGCombine > that sequence to a build_vector when appropriate. > > Hi Eli, > > I have started to look into the direction you gave me. > > I may have miss something but I do not see how the proposed direction > solves the issue. Indeed to be able to DAGCombine a insert_vector_elt > sequences into a ARMISD::build_vector, I still need to know if it would be > profitable, i.e., if DAGCombine will remove the bitcasts that > combining/lowering is about to insert. > > Since target specific DAGCombine are also done in TargetLowering I do not > have access to more DAGCombine logic (at least DAGCombineInfo is not > providing the require information). > > What did I miss? > >Err, wait, sorry, my fault; I missed that you only insert the bitcasts on the other side of the branch. You should be able to do it the other way, though: generate the build_vector unconditionally, and pull insert_vector_elts out of it in a DAGCombine. (At this point, you know whether DAGCombine will remove the bit casts because if it could, it would have already done it.) -Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130701/dd67c722/attachment.html>
Quentin Colombet
2013-Jul-01 20:58 UTC
[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
On Jul 1, 2013, at 1:45 PM, Eli Friedman <eli.friedman at gmail.com> wrote:> On Mon, Jul 1, 2013 at 1:33 PM, Quentin Colombet <qcolombet at apple.com> wrote: > On Jul 1, 2013, at 11:52 AM, Eli Friedman <eli.friedman at gmail.com> wrote: > >> On Mon, Jul 1, 2013 at 11:30 AM, Quentin Colombet <qcolombet at apple.com> wrote: >> Hi, >> >> ** Problematic ** >> I am looking for advices to share some logic between DAG combine and target lowering. >> >> Basically, I need to know if a bitcast that is about to be inserted during target specific isel lowering will be eliminated during DAG combine. >> >> Let me know if there is another, better supported, approach for this kind of problems. >> >> ** Motivating Example ** >> The motivating example comes form the lowering of vector code on armv7. >> More specifically, the build_vector node is lowered to a target specific ARMISD::build_vector where all the parameters are bitcasted to floating point types. >> >> This works well, unless the inserted bitcasts survive until instruction selection. In that case, they incur moves between integer unit and floating point unit that may result in inefficient code. >> >> Attached motivating_example.ll shows such a case: >> llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o - >> ldr r0, [r1] >> ldr r1, [r2] >> vmov s1, r1 >> vmov s0, r0 >> Here each ldr, vmov sequences could have been replaced by a simple vld1.32. >> >> ** Proposed Solution ** >> Lower to more vector friendly code (using a sequence of insert_vector_elt), when bit casts will not be free. >> The attached patch demonstrates that, but is missing the proper check to know what DAG combine will do (see TODO). >> >> I think you're approaching this backwards: the obvious thing to do is to generate the insert_vector_elt sequence unconditionally, and DAGCombine that sequence to a build_vector when appropriate. > > Hi Eli, > > I have started to look into the direction you gave me. > > I may have miss something but I do not see how the proposed direction solves the issue. Indeed to be able to DAGCombine a insert_vector_elt sequences into a ARMISD::build_vector, I still need to know if it would be profitable, i.e., if DAGCombine will remove the bitcasts that combining/lowering is about to insert. > > Since target specific DAGCombine are also done in TargetLowering I do not have access to more DAGCombine logic (at least DAGCombineInfo is not providing the require information). > > What did I miss? > > > Err, wait, sorry, my fault; I missed that you only insert the bitcasts on the other side of the branch. > > You should be able to do it the other way, though: generate the build_vector unconditionally, and pull insert_vector_elts out of it in a DAGCombine. (At this point, you know whether DAGCombine will remove the bit casts because if it could, it would have already done it.)Make sense. I will try that. Thanks! -Quentin -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130701/3d5acf26/attachment.html>
Quentin Colombet
2013-Jul-02 00:48 UTC
[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
On Jul 1, 2013, at 1:45 PM, Eli Friedman <eli.friedman at gmail.com> wrote:> On Mon, Jul 1, 2013 at 1:33 PM, Quentin Colombet <qcolombet at apple.com> wrote: > On Jul 1, 2013, at 11:52 AM, Eli Friedman <eli.friedman at gmail.com> wrote: > >> On Mon, Jul 1, 2013 at 11:30 AM, Quentin Colombet <qcolombet at apple.com> wrote: >> Hi, >> >> ** Problematic ** >> I am looking for advices to share some logic between DAG combine and target lowering. >> >> Basically, I need to know if a bitcast that is about to be inserted during target specific isel lowering will be eliminated during DAG combine. >> >> Let me know if there is another, better supported, approach for this kind of problems. >> >> ** Motivating Example ** >> The motivating example comes form the lowering of vector code on armv7. >> More specifically, the build_vector node is lowered to a target specific ARMISD::build_vector where all the parameters are bitcasted to floating point types. >> >> This works well, unless the inserted bitcasts survive until instruction selection. In that case, they incur moves between integer unit and floating point unit that may result in inefficient code. >> >> Attached motivating_example.ll shows such a case: >> llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o - >> ldr r0, [r1] >> ldr r1, [r2] >> vmov s1, r1 >> vmov s0, r0 >> Here each ldr, vmov sequences could have been replaced by a simple vld1.32. >> >> ** Proposed Solution ** >> Lower to more vector friendly code (using a sequence of insert_vector_elt), when bit casts will not be free. >> The attached patch demonstrates that, but is missing the proper check to know what DAG combine will do (see TODO). >> >> I think you're approaching this backwards: the obvious thing to do is to generate the insert_vector_elt sequence unconditionally, and DAGCombine that sequence to a build_vector when appropriate. > > Hi Eli, > > I have started to look into the direction you gave me. > > I may have miss something but I do not see how the proposed direction solves the issue. Indeed to be able to DAGCombine a insert_vector_elt sequences into a ARMISD::build_vector, I still need to know if it would be profitable, i.e., if DAGCombine will remove the bitcasts that combining/lowering is about to insert. > > Since target specific DAGCombine are also done in TargetLowering I do not have access to more DAGCombine logic (at least DAGCombineInfo is not providing the require information). > > What did I miss? > > > Err, wait, sorry, my fault; I missed that you only insert the bitcasts on the other side of the branch. > > You should be able to do it the other way, though: generate the build_vector unconditionally, and pull insert_vector_elts out of it in a DAGCombine. (At this point, you know whether DAGCombine will remove the bit casts because if it could, it would have already done it.) >Thanks again Eli for the direction. That works great! I have sent a proposal with that solution. -Quentin -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130701/c035c391/attachment.html>
Possibly Parallel Threads
- [LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
- [LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
- [LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
- [LLVMdev] Modeling GPU vector registers, again (with my implementation)
- [LLVMdev] Simple NEON optimization