thr3ads.net - llvm dev - [llvm-dev] Early legalization pass ? Doing early legalization in an existing pass ? [Jan 2017]

If this information is useful, please help other people find it:
Share via:

Nemanja Ivanovic via llvm-dev

2017-Jan-24 12:47 UTC

[llvm-dev] Early legalization pass ? Doing early legalization in an existing pass ?

I may be wrong here, but legalizing early seems like something that is more
likely to prevent optimizations than it is to encourage them.

But I guess I don't follow why things like TTI, TII and TLI queries
don't
suffice for this. CodeGenPrepare will break this sequence up. I would
imagine that if the target returns false for isCheapToSpeculateCtlz() and
false for canInsertSelect(), the code would look the way you'd like it to.

But as I said, I'm mostly speculating here and I might be very wrong.

On Mon, Jan 23, 2017 at 5:02 PM, via llvm-dev <llvm-dev at lists.llvm.org>
wrote:
>
> > On Jan 23, 2017, at 4:06 AM, Amaury SECHET via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >
> > Hi all,
> >
> > Some non trivial legalization of operations which aren't supported
by
> the backend would benefit from having the optimizer pass on them. I noticed
> some example trying to optimize various pieces of code over the past weeks.
> >
> > One offender is the cttz/ctlz intrinsic when defined on 0. On X86, BSR
> and NSF are undefined on 0, and only recent CPU have the LZCNT and TZCNT
> instructions that are properly defined for 0. The backend insert code with
> a branch that checks for 0 and use bsf/bsr or just use a constant.
> >
> > But if we are to branch anyway, and one path of the branch set the
value
> as a constant, there are some obvious optimization which can be done,
> starting with constant folding. None of these happen in the backend and it
> doesn't seems to be the right place anyway. See for instance the sample
> code from a serialization/deserialization routine (the code has been tuned
> to illustrate the problem in a brief way) :
> >
> > auto a = ctlz(n, false);
> > auto b = ((a * 36) + 35) >> 8;
> >
> > Which will be synthesized as follow:
> >
> > auto a = (n == 0) ? 64 : ctlz(n, true);
> > auto b = ((a * 36) + 35) >> 8;
> >
> > But obviously, recomputing b in the case n is 0 is completely
pointless
> work. A better codegen would be something like:
> >
> > if (n == 0) {
> >   a = 64;
> >   b = 0;
> > } else {
> >   a = ctlz(n, true);
> >   b = ((a * 36) + 35) >> 8;
> > }
> >
> > The optimizer knows how to do these kind of transformations, but the
> backend do not. I encountered the same issue a some time back in a memory
> allocator, and worked around it, but as I'm encountering it again in
the
> serialization library, I'm assuming there may be some untapped source
of
> optimizations here.
> >
> > I was unsure about where these optimizations should take place.
Clearly,
> we want to do them early in the pipeline so that other passes can pick up
> on it. I was looking around but it didn't seemed like there was a good
> place to add this transformation.
> >
> > Other examples of legalization that may benefit from the optimizer are
> splitting of large integral that the backend do not support into multiple
> operations on smaller integrals.
> >
> > Would a EarlyLegalization pass be worth it ? It could use infos from
the
> backend and do various transformations that the backend would have to do
> anyway, which will expose optimization opportunities. Or is there a place
> that is appropriate to insert theses ?
>
> At least in theory, SelectionDAG is supposed to be able to do a lot of
> these kind of optimizations. The goal is to do legalization, then clean up
> the results of that in combine2.
>
> —escha
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170124/6a4798aa/attachment.html>

Amaury SECHET via llvm-dev

2017-Jan-24 23:08 UTC

head link

[llvm-dev] Early legalization pass ? Doing early legalization in an existing pass ?

2017-01-24 13:47 GMT+01:00 Nemanja Ivanovic <nemanja.i.ibm at gmail.com>:
> I may be wrong here, but legalizing early seems like something that is
> more likely to prevent optimizations than it is to encourage them.
>
>I guess it really depends. My gut feeling is that you are right for simple
things like using proper integer sizes, but that it is probably not true
for anything involving select/control flow.

>
> But I guess I don't follow why things like TTI, TII and TLI queries
don't
> suffice for this. CodeGenPrepare will break this sequence up. I would
> imagine that if the target returns false for isCheapToSpeculateCtlz() and
> false for canInsertSelect(), the code would look the way you'd like it
to.
>
> But as I said, I'm mostly speculating here and I might be very wrong.
>
>I got a fair amount of bad codegen here because a branch is added at the
last minute but at this point, all the passes doing anything interesting
with the control flow are over. For instance:

auto x = ctlz(n);
auto y = ((x * 36) + 35) >> 8;

It's fairly obvious that you'd like to constant fold y in the case a
branch
is required. That is something that nothing after CodeGenPrepare knows how
to do.

> On Mon, Jan 23, 2017 at 5:02 PM, via llvm-dev <llvm-dev at
lists.llvm.org>
> wrote:
>
>>
>> > On Jan 23, 2017, at 4:06 AM, Amaury SECHET via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>> >
>> > Hi all,
>> >
>> > Some non trivial legalization of operations which aren't
supported by
>> the backend would benefit from having the optimizer pass on them. I
noticed
>> some example trying to optimize various pieces of code over the past
weeks.
>> >
>> > One offender is the cttz/ctlz intrinsic when defined on 0. On X86,
BSR
>> and NSF are undefined on 0, and only recent CPU have the LZCNT and
TZCNT
>> instructions that are properly defined for 0. The backend insert code
with
>> a branch that checks for 0 and use bsf/bsr or just use a constant.
>> >
>> > But if we are to branch anyway, and one path of the branch set the
>> value as a constant, there are some obvious optimization which can be
done,
>> starting with constant folding. None of these happen in the backend and
it
>> doesn't seems to be the right place anyway. See for instance the
sample
>> code from a serialization/deserialization routine (the code has been
tuned
>> to illustrate the problem in a brief way) :
>> >
>> > auto a = ctlz(n, false);
>> > auto b = ((a * 36) + 35) >> 8;
>> >
>> > Which will be synthesized as follow:
>> >
>> > auto a = (n == 0) ? 64 : ctlz(n, true);
>> > auto b = ((a * 36) + 35) >> 8;
>> >
>> > But obviously, recomputing b in the case n is 0 is completely
pointless
>> work. A better codegen would be something like:
>> >
>> > if (n == 0) {
>> >   a = 64;
>> >   b = 0;
>> > } else {
>> >   a = ctlz(n, true);
>> >   b = ((a * 36) + 35) >> 8;
>> > }
>> >
>> > The optimizer knows how to do these kind of transformations, but
the
>> backend do not. I encountered the same issue a some time back in a
memory
>> allocator, and worked around it, but as I'm encountering it again
in the
>> serialization library, I'm assuming there may be some untapped
source of
>> optimizations here.
>> >
>> > I was unsure about where these optimizations should take place.
>> Clearly, we want to do them early in the pipeline so that other passes
can
>> pick up on it. I was looking around but it didn't seemed like there
was a
>> good place to add this transformation.
>> >
>> > Other examples of legalization that may benefit from the optimizer
are
>> splitting of large integral that the backend do not support into
multiple
>> operations on smaller integrals.
>> >
>> > Would a EarlyLegalization pass be worth it ? It could use infos
from
>> the backend and do various transformations that the backend would have
to
>> do anyway, which will expose optimization opportunities. Or is there a
>> place that is appropriate to insert theses ?
>>
>> At least in theory, SelectionDAG is supposed to be able to do a lot of
>> these kind of optimizations. The goal is to do legalization, then clean
up
>> the results of that in combine2.
>>
>> —escha
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170125/aadfa44b/attachment.html>

Nemanja Ivanovic via llvm-dev

2017-Jan-25 06:31 UTC

head link

[llvm-dev] Early legalization pass ? Doing early legalization in an existing pass ?

On Wed, Jan 25, 2017 at 12:08 AM, Amaury SECHET <deadalnix at gmail.com>
wrote:
>
>
> 2017-01-24 13:47 GMT+01:00 Nemanja Ivanovic <nemanja.i.ibm at
gmail.com>:
>
>> I may be wrong here, but legalizing early seems like something that is
>> more likely to prevent optimizations than it is to encourage them.
>>
>>
> I guess it really depends. My gut feeling is that you are right for simple
> things like using proper integer sizes, but that it is probably not true
> for anything involving select/control flow.
>This sounds reasonable.

>
>> But I guess I don't follow why things like TTI, TII and TLI queries
don't
>> suffice for this. CodeGenPrepare will break this sequence up. I would
>> imagine that if the target returns false for isCheapToSpeculateCtlz()
and
>> false for canInsertSelect(), the code would look the way you'd like
it to.
>>
>> But as I said, I'm mostly speculating here and I might be very
wrong.
>>
>>
> I got a fair amount of bad codegen here because a branch is added at the
> last minute but at this point, all the passes doing anything interesting
> with the control flow are over. For instance:
>
> auto x = ctlz(n);
> auto y = ((x * 36) + 35) >> 8;
>
> It's fairly obvious that you'd like to constant fold y in the case
a
> branch is required. That is something that nothing after CodeGenPrepare
> knows how to do.
>I wonder if adding some cleanup to CodeGenPrepare for any basic blocks that
end up having PHI's with some constant operands might be useful in general.

>
>
>> On Mon, Jan 23, 2017 at 5:02 PM, via llvm-dev <llvm-dev at
lists.llvm.org>
>> wrote:
>>
>>>
>>> > On Jan 23, 2017, at 4:06 AM, Amaury SECHET via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > Some non trivial legalization of operations which aren't
supported by
>>> the backend would benefit from having the optimizer pass on them. I
noticed
>>> some example trying to optimize various pieces of code over the
past weeks.
>>> >
>>> > One offender is the cttz/ctlz intrinsic when defined on 0. On
X86, BSR
>>> and NSF are undefined on 0, and only recent CPU have the LZCNT and
TZCNT
>>> instructions that are properly defined for 0. The backend insert
code with
>>> a branch that checks for 0 and use bsf/bsr or just use a constant.
>>> >
>>> > But if we are to branch anyway, and one path of the branch set
the
>>> value as a constant, there are some obvious optimization which can
be done,
>>> starting with constant folding. None of these happen in the backend
and it
>>> doesn't seems to be the right place anyway. See for instance
the sample
>>> code from a serialization/deserialization routine (the code has
been tuned
>>> to illustrate the problem in a brief way) :
>>> >
>>> > auto a = ctlz(n, false);
>>> > auto b = ((a * 36) + 35) >> 8;
>>> >
>>> > Which will be synthesized as follow:
>>> >
>>> > auto a = (n == 0) ? 64 : ctlz(n, true);
>>> > auto b = ((a * 36) + 35) >> 8;
>>> >
>>> > But obviously, recomputing b in the case n is 0 is completely
>>> pointless work. A better codegen would be something like:
>>> >
>>> > if (n == 0) {
>>> >   a = 64;
>>> >   b = 0;
>>> > } else {
>>> >   a = ctlz(n, true);
>>> >   b = ((a * 36) + 35) >> 8;
>>> > }
>>> >
>>> > The optimizer knows how to do these kind of transformations,
but the
>>> backend do not. I encountered the same issue a some time back in a
memory
>>> allocator, and worked around it, but as I'm encountering it
again in the
>>> serialization library, I'm assuming there may be some untapped
source of
>>> optimizations here.
>>> >
>>> > I was unsure about where these optimizations should take
place.
>>> Clearly, we want to do them early in the pipeline so that other
passes can
>>> pick up on it. I was looking around but it didn't seemed like
there was a
>>> good place to add this transformation.
>>> >
>>> > Other examples of legalization that may benefit from the
optimizer are
>>> splitting of large integral that the backend do not support into
multiple
>>> operations on smaller integrals.
>>> >
>>> > Would a EarlyLegalization pass be worth it ? It could use
infos from
>>> the backend and do various transformations that the backend would
have to
>>> do anyway, which will expose optimization opportunities. Or is
there a
>>> place that is appropriate to insert theses ?
>>>
>>> At least in theory, SelectionDAG is supposed to be able to do a lot
of
>>> these kind of optimizations. The goal is to do legalization, then
clean up
>>> the results of that in combine2.
>>>
>>> —escha
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170125/eb5ce65a/attachment.html>

Mehdi Amini via llvm-dev

2017-Jan-25 16:12 UTC

head link

[llvm-dev] Early legalization pass ? Doing early legalization in an existing pass ?

> On Jan 24, 2017, at 3:08 PM, Amaury SECHET via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> 
> 
> 2017-01-24 13:47 GMT+01:00 Nemanja Ivanovic <nemanja.i.ibm at gmail.com
<mailto:nemanja.i.ibm at gmail.com>>:
> I may be wrong here, but legalizing early seems like something that is more
likely to prevent optimizations than it is to encourage them.
> 
> 
> I guess it really depends. My gut feeling is that you are right for simple
things like using proper integer sizes, but that it is probably not true for
anything involving select/control flow.
>  
> 
> But I guess I don't follow why things like TTI, TII and TLI queries
don't suffice for this. CodeGenPrepare will break this sequence up. I would
imagine that if the target returns false for isCheapToSpeculateCtlz() and false
for canInsertSelect(), the code would look the way you'd like it to.
> 
> But as I said, I'm mostly speculating here and I might be very wrong.
> 
> 
> I got a fair amount of bad codegen here because a branch is added at the
last minute but at this point, all the passes doing anything interesting with
the control flow are over. For instance:
> 
> auto x = ctlz(n);
> auto y = ((x * 36) + 35) >> 8;
> 
> It's fairly obvious that you'd like to constant fold y in the case
a branch is required. That is something that nothing after CodeGenPrepare knows
how to do.
It seems to me that this is something that could be done at the MI level (I’d
need to see the MI dump though).

GobalISel may also help here by being able to look beyond single blocks (CC
Tim/Quentin, see below for the example with the branch).

— 
Mehdi


>  
> On Mon, Jan 23, 2017 at 5:02 PM, via llvm-dev <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
> 
> > On Jan 23, 2017, at 4:06 AM, Amaury SECHET via llvm-dev <llvm-dev
at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
> >
> > Hi all,
> >
> > Some non trivial legalization of operations which aren't supported
by the backend would benefit from having the optimizer pass on them. I noticed
some example trying to optimize various pieces of code over the past weeks.
> >
> > One offender is the cttz/ctlz intrinsic when defined on 0. On X86, BSR
and NSF are undefined on 0, and only recent CPU have the LZCNT and TZCNT
instructions that are properly defined for 0. The backend insert code with a
branch that checks for 0 and use bsf/bsr or just use a constant.
> >
> > But if we are to branch anyway, and one path of the branch set the
value as a constant, there are some obvious optimization which can be done,
starting with constant folding. None of these happen in the backend and it
doesn't seems to be the right place anyway. See for instance the sample code
from a serialization/deserialization routine (the code has been tuned to
illustrate the problem in a brief way) :
> >
> > auto a = ctlz(n, false);
> > auto b = ((a * 36) + 35) >> 8;
> >
> > Which will be synthesized as follow:
> >
> > auto a = (n == 0) ? 64 : ctlz(n, true);
> > auto b = ((a * 36) + 35) >> 8;
> >
> > But obviously, recomputing b in the case n is 0 is completely
pointless work. A better codegen would be something like:
> >
> > if (n == 0) {
> >   a = 64;
> >   b = 0;
> > } else {
> >   a = ctlz(n, true);
> >   b = ((a * 36) + 35) >> 8;
> > }
> >
> > The optimizer knows how to do these kind of transformations, but the
backend do not. I encountered the same issue a some time back in a memory
allocator, and worked around it, but as I'm encountering it again in the
serialization library, I'm assuming there may be some untapped source of
optimizations here.
> >
> > I was unsure about where these optimizations should take place.
Clearly, we want to do them early in the pipeline so that other passes can pick
up on it. I was looking around but it didn't seemed like there was a good
place to add this transformation.
> >
> > Other examples of legalization that may benefit from the optimizer are
splitting of large integral that the backend do not support into multiple
operations on smaller integrals.
> >
> > Would a EarlyLegalization pass be worth it ? It could use infos from
the backend and do various transformations that the backend would have to do
anyway, which will expose optimization opportunities. Or is there a place that
is appropriate to insert theses ?
> 
> At least in theory, SelectionDAG is supposed to be able to do a lot of
these kind of optimizations. The goal is to do legalization, then clean up the
results of that in combine2.
> 
> —escha
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170125/e3e23e1d/attachment.html>

llvm dev - Jan 2017 - Early legalization pass ? Doing early legalization in an existing pass ?

[llvm-dev] Early legalization pass ? Doing early legalization in an existing pass ?

[llvm-dev] Early legalization pass ? Doing early legalization in an existing pass ?

[llvm-dev] Early legalization pass ? Doing early legalization in an existing pass ?

[llvm-dev] Early legalization pass ? Doing early legalization in an existing pass ?