David Jones via llvm-dev
2016-Dec-30  00:27 UTC
[llvm-dev] RFC: Inline expansion of memcmp vs call to standard library
Can I make another suggestion: create an intrinsic for memory equality, e.g. llvm.memcmp_eq.p0i8.p0i8.i64(i8*a, i8*b, i64 len). This intrinsic would return zero if the memory regions are equal, and nonzero otherwise. However, it does NOT return any notion of "greater" or "less". Many applications require only determining equality, rather than a total ordering. Given that "greater" and "less" also require some knowledge of endianness, even a fancy lowered version of memcmp can be slower than an equality-only compare. On Thu, Dec 29, 2016 at 4:14 PM, Philip Reames via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Improving lowering for memcmp is definitely something we should do for all > targets. Doing it in a target specific way is decidedly non-ideal. > > It looks like we already have some code in SelectionDAGBuilder which tries > to optimize the lowering for the memcpy library call. I am a bit confused > by the problem you are trying to solve. Are you specifically interested in > lowering for constant lengths greater than a legal size? (i.e. do you need > the loop?) > > If so, there are two approaches you might consider: > - Expand the memcmp call into the loop form in CodeGenPrep (or a similar > timed pass) where working with multiple basic blocks is much easier. Long > term, the "right place" for this type of thing is clearly GlobalISEL, but > we have a number of other such hacks in lowering today and continuing to > build off of that seems reasonable. > - Emit the non-early exit form for small constant values (a[0] == b[0] && > a[1] == b[1] ...). Assuming your backend has handling for efficiently > lowering and chains using branches, you may very well get the code you > want. > > Using the psuedo instruction here feels messy. In particular, I don't > like the fact it basically opts out of all of the combines which might > further improve the lowering. > > Philip > > > On 12/29/2016 11:35 AM, Zaara Syeda via llvm-dev wrote: > > Currently on PowerPC, calls to memcmp are not expanded and are left as > library calls. In certain conditions, expansion can improve performance > rather than calling the library function as done for functions like memcpy, > memmove, etc. This patch *(**https://reviews.llvm.org/D28163* > <https://reviews.llvm.org/D28163>*)* is an initial implementation for > PowerPC to expand memcmp when the size is an 8 byte multiple. > > The approach currently added for this expansion tries to use the existing > infrastructure by overriding the virtual function EmitTargetCodeForMemcmp. > This function works on the SelectionDAG, but the expansion requires control > flow for early exit. So, instead of implementing the expansion within > EmitTargetCodeForMemcmp, a new pseudo instruction is added for memcmp and a > SelectionDAG node for this new pseudo is created in > EmitTargetCodeForMemcmp. This pseudo instruction is then expanded during > lowering in EmitInstrWithCustomInserter. > > The advantage of this approach is that it uses the existing infrastructure > and does not impact other targets. If other targets would like to expand > memcmp, they can also override EmitTargetCodeForMemcmp and create their own > expansion. > > Another option to consider is adding a new optimization pass for this > expansion that isn’t target specific if other targets would benefit from a > more general infrastructure. > > Please provide feedback if this approach should be continued to implement > the PowerPC specific memcmp expansions or whether the community is > interested in devising a more general approach. > > Thanks, > > Zaara Syeda > > > _______________________________________________ > LLVM Developers mailing listllvm-dev at lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161229/60659b73/attachment.html>
Martin J. O'Riordan via llvm-dev
2016-Dec-30  10:27 UTC
[llvm-dev] RFC: Inline expansion of memcmp vs call to standard library
With the intrinsic support for ‘memcpy’ and ‘memset’ the operands also have
associated alignment operands.  I think that ‘memcmp’ should also provide the
alignment information for each of the source operands (when statically known). 
In some cases this will lead to more optimal alignment aware lowering, and for
targets for which unaligned access is costly or fatal, it can be lowered safely.
 
            MartinO
 
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of David
Jones via llvm-dev
Sent: 30 December 2016 00:28
To: Philip Reames <listmail at philipreames.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>; Zaara Syeda <syzaara at
ca.ibm.com>
Subject: Re: [llvm-dev] RFC: Inline expansion of memcmp vs call to standard
library
 
Can I make another suggestion: create an intrinsic for memory equality, e.g.
llvm.memcmp_eq.p0i8.p0i8.i64(i8*a, i8*b, i64 len).  This intrinsic would return
zero if the memory regions are equal, and nonzero otherwise. However, it does
NOT return any notion of "greater" or "less".
Many applications require only determining equality, rather than a total
ordering. Given that "greater" and "less" also require some
knowledge of endianness, even a fancy lowered version of memcmp can be slower
than an equality-only compare.
 
On Thu, Dec 29, 2016 at 4:14 PM, Philip Reames via llvm-dev <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > wrote:
Improving lowering for memcmp is definitely something we should do for all
targets.  Doing it in a target specific way is decidedly non-ideal.
It looks like we already have some code in SelectionDAGBuilder which tries to
optimize the lowering for the memcpy library call.  I am a bit confused by the
problem you are trying to solve.  Are you specifically interested in lowering
for constant lengths greater than a legal size?  (i.e. do you need the loop?)
If so, there are two approaches you might consider:
- Expand the memcmp call into the loop form in CodeGenPrep (or a similar timed
pass) where working with multiple basic blocks is much easier.  Long term, the
"right place" for this type of thing is clearly GlobalISEL, but we
have a number of other such hacks in lowering today and continuing to build off
of that seems reasonable.
- Emit the non-early exit form for small constant values (a[0] == b[0]
&& a[1] == b[1] ...).  Assuming your backend has handling for
efficiently lowering and chains using branches, you may very well get the code
you want.
Using the psuedo instruction here feels messy.  In particular, I don't like
the fact it basically opts out of all of the combines which might further
improve the lowering.
Philip
On 12/29/2016 11:35 AM, Zaara Syeda via llvm-dev wrote:
Currently on PowerPC, calls to memcmp are not expanded and are left as library
calls. In certain conditions, expansion can improve performance rather than
calling the library function as done for functions like memcpy, memmove, etc.
This patch ( <https://reviews.llvm.org/D28163>
https://reviews.llvm.org/D28163) is an initial implementation for PowerPC to
expand memcmp when the size is an 8 byte multiple.
The approach currently added for this expansion tries to use the existing
infrastructure by overriding the virtual function EmitTargetCodeForMemcmp. This
function works on the SelectionDAG, but the expansion requires control flow for
early exit. So, instead of implementing the expansion within
EmitTargetCodeForMemcmp, a new pseudo instruction is added for memcmp and a
SelectionDAG node for this new pseudo is created in EmitTargetCodeForMemcmp.
This pseudo instruction is then expanded during lowering in
EmitInstrWithCustomInserter.
The advantage of this approach is that it uses the existing infrastructure and
does not impact other targets. If other targets would like to expand memcmp,
they can also override EmitTargetCodeForMemcmp and create their own expansion.
Another option to consider is adding a new optimization pass for this expansion
that isn’t target specific if other targets would benefit from a more general
infrastructure.
Please provide feedback if this approach should be continued to implement the
PowerPC specific memcmp expansions or whether the community is interested in
devising a more general approach.
Thanks,
Zaara Syeda
 
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> 
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
 
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> 
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161230/bc686704/attachment.html>
Philip Reames via llvm-dev
2016-Dec-30  20:20 UTC
[llvm-dev] RFC: Inline expansion of memcmp vs call to standard library
I'd definitely support having a memcmp intrinsic for the reasons previously specified. However, this is somewhat orthogonal from the original direction of the patch. We can easily improve the lowering of the existing target function and then introduce the intrinsic. Porting the existing lowering code over should be straight forward. I'm only point this out so that we don't get blocked on the eventual end goal and fail to make progress. Philip On 12/30/2016 02:27 AM, Martin J. O'Riordan wrote:> > With the intrinsic support for ‘memcpy’ and ‘memset’ the operands also > have associated alignment operands. I think that ‘memcmp’ should also > provide the alignment information for each of the source operands > (when statically known). In some cases this will lead to more optimal > alignment aware lowering, and for targets for which unaligned access > is costly or fatal, it can be lowered safely. > > MartinO > > *From:*llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of > *David Jones via llvm-dev > *Sent:* 30 December 2016 00:28 > *To:* Philip Reames <listmail at philipreames.com> > *Cc:* llvm-dev <llvm-dev at lists.llvm.org>; Zaara Syeda <syzaara at ca.ibm.com> > *Subject:* Re: [llvm-dev] RFC: Inline expansion of memcmp vs call to > standard library > > Can I make another suggestion: create an intrinsic for memory > equality, e.g. llvm.memcmp_eq.p0i8.p0i8.i64(i8*a, i8*b, i64 len). > This intrinsic would return zero if the memory regions are equal, and > nonzero otherwise. However, it does NOT return any notion of "greater" > or "less". > > Many applications require only determining equality, rather than a > total ordering. Given that "greater" and "less" also require some > knowledge of endianness, even a fancy lowered version of memcmp can be > slower than an equality-only compare. > > On Thu, Dec 29, 2016 at 4:14 PM, Philip Reames via llvm-dev > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > > Improving lowering for memcmp is definitely something we should do > for all targets. Doing it in a target specific way is decidedly > non-ideal. > > It looks like we already have some code in SelectionDAGBuilder > which tries to optimize the lowering for the memcpy library call. > I am a bit confused by the problem you are trying to solve. Are > you specifically interested in lowering for constant lengths > greater than a legal size? (i.e. do you need the loop?) > > If so, there are two approaches you might consider: > - Expand the memcmp call into the loop form in CodeGenPrep (or a > similar timed pass) where working with multiple basic blocks is > much easier. Long term, the "right place" for this type of thing > is clearly GlobalISEL, but we have a number of other such hacks in > lowering today and continuing to build off of that seems reasonable. > - Emit the non-early exit form for small constant values (a[0] => b[0] && a[1] == b[1] ...). Assuming your backend has handling for > efficiently lowering and chains using branches, you may very well > get the code you want. > > Using the psuedo instruction here feels messy. In particular, I > don't like the fact it basically opts out of all of the combines > which might further improve the lowering. > > Philip > > > On 12/29/2016 11:35 AM, Zaara Syeda via llvm-dev wrote: > > Currently on PowerPC, calls to memcmp are not expanded and are > left as library calls. In certain conditions, expansion can > improve performance rather than calling the library function > as done for functions like memcpy, memmove, etc. This patch > *(**https://reviews.llvm.org/D28163**)*is an initial > implementation for PowerPC to expand memcmp when the size is > an 8 byte multiple. > > The approach currently added for this expansion tries to use > the existing infrastructure by overriding the virtual function > EmitTargetCodeForMemcmp. This function works on the > SelectionDAG, but the expansion requires control flow for > early exit. So, instead of implementing the expansion within > EmitTargetCodeForMemcmp, a new pseudo instruction is added for > memcmp and a SelectionDAG node for this new pseudo is created > in EmitTargetCodeForMemcmp. This pseudo instruction is then > expanded during lowering in EmitInstrWithCustomInserter. > > The advantage of this approach is that it uses the existing > infrastructure and does not impact other targets. If other > targets would like to expand memcmp, they can also override > EmitTargetCodeForMemcmp and create their own expansion. > > Another option to consider is adding a new optimization pass > for this expansion that isn’t target specific if other targets > would benefit from a more general infrastructure. > > Please provide feedback if this approach should be continued > to implement the PowerPC specific memcmp expansions or whether > the community is interested in devising a more general approach. > > Thanks, > > Zaara Syeda > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161230/15466ee8/attachment.html>
Reasonably Related Threads
- RFC: Inline expansion of memcmp vs call to standard library
- RFC: Inline expansion of memcmp vs call to standard library
- RFC: Inline expansion of memcmp vs call to standard library
- CFI directives for callee saved registers
- [LLVMdev] RFC: new intrinsic llvm.memcmp?