thr3ads.net - llvm dev - [LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering? [Jul 2013]

Quentin Colombet

2013-Jul-01 18:30 UTC

[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?

Hi,

** Problem **
I am looking for advice on how to share some logic between DAG combine and
target lowering.

Basically, I need to know whether a bitcast that is about to be inserted during
target-specific isel lowering will be eliminated during DAG combine.

Let me know if there is another, better-supported approach for this kind of
problem.

** Motivating Example **
The motivating example comes from the lowering of vector code on ARMv7.
More specifically, the build_vector node is lowered to a target-specific
ARMISD::build_vector where all the parameters are bitcast to floating-point
types.

This works well, unless the inserted bitcasts survive until instruction
selection. In that case, they incur moves between the integer unit and the
floating-point unit, which may result in inefficient code.

Attached motivating_example.ll shows such a case:
  llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o -
	ldr	r0, [r1]
	ldr	r1, [r2]
	vmov	s1, r1
	vmov	s0, r0
Here, each ldr/vmov sequence could have been replaced by a simple vld1.32.

** Proposed Solution **
Lower to more vector-friendly code (using a sequence of insert_vector_elt) when
the bitcasts will not be free.
The attached patch demonstrates that, but it is missing the proper check to
know what DAG combine will do (see the TODO).
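
A toy sketch of the decision described above, written as standalone C++ and
not against the real LLVM API. The Operand struct, its bitcastIsFree flag, and
the chooseLowering/lower helpers are all hypothetical names introduced for
illustration. In particular, bitcastIsFree stands in for the missing check
(the TODO in the patch); how to compute it is exactly the open question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for one scalar operand of a build_vector (not an LLVM type).
struct Operand {
  std::string name;
  bool bitcastIsFree; // assumption: "DAG combine would remove this bitcast"
};

enum class Lowering { TargetBuildVector, InsertEltChain };

// Pick the bitcast-based lowering only if every inserted bitcast is expected
// to disappear; otherwise fall back to a chain of insert_vector_elt.
Lowering chooseLowering(const std::vector<Operand> &ops) {
  for (const Operand &op : ops)
    if (!op.bitcastIsFree)
      return Lowering::InsertEltChain;
  return Lowering::TargetBuildVector;
}

// Render the chosen lowering as a readable pseudo-DAG string.
std::string lower(const std::vector<Operand> &ops) {
  assert(!ops.empty() && "build_vector needs at least one operand");
  if (chooseLowering(ops) == Lowering::TargetBuildVector) {
    std::string out = "ARMISD::build_vector(";
    for (std::size_t i = 0; i < ops.size(); ++i) {
      if (i != 0)
        out += ", ";
      out += "bitcast(" + ops[i].name + ")";
    }
    return out + ")";
  }
  // Start from an undefined vector and insert one lane at a time.
  std::string out = "undef";
  for (std::size_t i = 0; i < ops.size(); ++i)
    out = "insert_vector_elt(" + out + ", " + ops[i].name + ", " +
          std::to_string(i) + ")";
  return out;
}
```

With two operands whose bitcasts are both assumed free, lower() renders the
ARMISD::build_vector form; as soon as one bitcast is assumed to survive, it
renders the insert_vector_elt chain instead.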

Thanks for your help.

Cheers,

-Quentin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ARMISelLowering.patch
Type: application/octet-stream
Size: 1288 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130701/c1dd6e5a/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: motivating_example.ll
Type: application/octet-stream
Size: 1114 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130701/c1dd6e5a/attachment-0001.obj>

Eli Friedman

2013-Jul-01 18:52 UTC

On Mon, Jul 1, 2013 at 11:30 AM, Quentin Colombet <qcolombet at apple.com> wrote:
> ** Proposed Solution **
> Lower to more vector-friendly code (using a sequence of
> insert_vector_elt) when the bitcasts will not be free.
> The attached patch demonstrates that, but it is missing the proper check to
> know what DAG combine will do (see the TODO).
>
I think you're approaching this backwards: the obvious thing to do is to
generate the insert_vector_elt sequence unconditionally, and DAGCombine
that sequence to a build_vector when appropriate.

-Eli
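
A toy sketch of the direction suggested above: emit the insert_vector_elt
chain unconditionally, then let a combine step fold a complete chain into a
build_vector when a profitability test passes. This is standalone C++ and not
the real SelectionDAG API; Node, insertElt, combineToBuildVector, and the
profitable callback are hypothetical names, and the callback is an assumption
standing in for whatever target-specific test would be used.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy DAG node (not llvm::SDNode).
struct Node {
  enum Kind { Undef, Scalar, InsertElt, BuildVector };
  Kind kind = Undef;
  std::string name;                       // Scalar only
  std::shared_ptr<Node> vec, elt;         // InsertElt only
  unsigned idx = 0;                       // InsertElt only
  std::vector<std::shared_ptr<Node>> ops; // BuildVector only
};
using NodePtr = std::shared_ptr<Node>;

NodePtr undef() { return std::make_shared<Node>(); }

NodePtr scalar(std::string name) {
  auto n = std::make_shared<Node>();
  n->kind = Node::Scalar;
  n->name = std::move(name);
  return n;
}

NodePtr insertElt(NodePtr vec, NodePtr elt, unsigned idx) {
  auto n = std::make_shared<Node>();
  n->kind = Node::InsertElt;
  n->vec = std::move(vec);
  n->elt = std::move(elt);
  n->idx = idx;
  return n;
}

// Fold a chain of insert_vector_elt rooted at undef into one build_vector.
// Returns the original root unchanged if the chain is incomplete, does not
// start from undef, or any lane fails the (assumed) profitability test.
NodePtr combineToBuildVector(
    const NodePtr &root, unsigned numLanes,
    const std::function<bool(const Node &)> &profitable) {
  std::vector<NodePtr> lanes(numLanes);
  const Node *cur = root.get();
  while (cur->kind == Node::InsertElt) {
    if (cur->idx >= numLanes)
      return root;
    if (!lanes[cur->idx]) // the outermost insert into a lane wins
      lanes[cur->idx] = cur->elt;
    cur = cur->vec.get();
  }
  if (cur->kind != Node::Undef)
    return root;
  for (const NodePtr &lane : lanes)
    if (!lane || !profitable(*lane))
      return root;
  auto bv = std::make_shared<Node>();
  bv->kind = Node::BuildVector;
  bv->ops = lanes;
  return bv;
}
```

Note that the profitable callback is still where the original question lives:
the sketch only moves the decision into the combine step, it does not say how
to answer it.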

Quentin Colombet

2013-Jul-01 19:07 UTC

On Jul 1, 2013, at 11:52 AM, Eli Friedman <eli.friedman at gmail.com> wrote:
> I think you're approaching this backwards: the obvious thing to do is
> to generate the insert_vector_elt sequence unconditionally, and DAGCombine
> that sequence to a build_vector when appropriate.

Thanks Eli.

I will try that approach.

-Quentin

Quentin Colombet

2013-Jul-01 20:33 UTC

On Jul 1, 2013, at 11:52 AM, Eli Friedman <eli.friedman at gmail.com> wrote:
> I think you're approaching this backwards: the obvious thing to do is
> to generate the insert_vector_elt sequence unconditionally, and DAGCombine
> that sequence to a build_vector when appropriate.

Hi Eli,

I have started to look into the direction you gave me.

I may have missed something, but I do not see how the proposed direction solves
the issue. Indeed, to be able to DAGCombine an insert_vector_elt sequence into
an ARMISD::build_vector, I still need to know whether it would be profitable,
i.e., whether DAGCombine will remove the bitcasts that combining/lowering is
about to insert.

Since target-specific DAG combines are also done in TargetLowering, I do not
have access to more DAGCombine logic (at least DAGCombinerInfo does not provide
the required information).

What did I miss?

Thanks,

-Quentin
