On 01/13/2017 10:19 AM, Krzysztof Parzyszek via llvm-dev wrote:
> Hi Catello,
>
> LLVM does have a "loop idiom recognition" pass which, in principle,
> does exactly that kind of thing: it recognizes loops that perform
> memcpy/memset operations. It does not recognize any target-specific
> idioms, though, and there isn't really much in it that would make such
> recognition easier. We have some cases like yours on Hexagon, where we
> want to replace certain loops with Hexagon-specific intrinsics, and
> the way we do it is that we have (in our own compiler) a separate pass
> that runs at around the same time, but which does "Hexagon-specific
> loop idiom recognition". That pass is not present in llvm.org, mostly
> because it hooks up a target-specific pass in a way that is not
> "officially supported".
>
> If LLVM supported adding such target-specific passes at that point in
> the optimization pipeline, you could just write your own pass and plug
> it in there.

This certainly seems like a reasonable thing to support, but the question
is: why should your pass run early in the mid-level optimizer (i.e., in
the part of the pipeline we generally consider canonicalization) instead
of as an early IR pass in the backend? Adding IR-level passes early in
the backend is well supported. There are plenty of potential answers here
for why earlier is better (e.g., affecting inlining decisions, idioms
might be significantly more difficult to recognize after vectorization,
etc.), but I think we need to discuss the use case.

 -Hal

> -Krzysztof
>
> On 1/13/2017 9:45 AM, Catello Cioffi via llvm-dev wrote:
>> Good afternoon,
>>
>> I'm working on modifying the Mips backend in order to add new
>> functionality. I've successfully implemented the intrinsics, but I
>> want to recognize a pattern like this:
>>
>> int seq[max];
>> int cnt = 0;
>>
>> for (int i = 0; i < max; i++)
>> {
>>   for (int j = 0; j < 16; j++)
>>   {
>>     char hn = (seq[i] & (3 << (j*2))) >> (j*2);
>>     if (hn == 2)
>>     {
>>       cnt++;
>>     }
>>   }
>> }
>>
>> and transform it into something like:
>>
>> int seq[max];
>> int cnt = 0;
>>
>> for (int i = 0; i < max; i++)
>> {
>>   cnt += intrinsic(seq[i], 2);
>> }
>>
>> Do you know what I can use to transform the loop? Or does something
>> similar already exist in LLVM?
>>
>> Thanks,
>>
>> Catello
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
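For context on Hal's remark that adding IR-level passes early in the
backend is well supported: the usual mechanism is to override
addIRPasses() in the target's TargetPassConfig. The sketch below (legacy
pass manager, as used at the time of this thread) shows the general
shape; the names MipsIdiomRecognize and createMipsIdiomRecognizePass are
hypothetical placeholders, not existing Mips backend code.

// Sketch only: MipsIdiomRecognize and createMipsIdiomRecognizePass are
// hypothetical names used for illustration; they are not in-tree.
#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"

namespace {

// A FunctionPass that would scan loops for the 2-bit-field counting idiom
// from the original post and rewrite matches to the target intrinsic.
class MipsIdiomRecognize : public llvm::FunctionPass {
public:
  static char ID;
  MipsIdiomRecognize() : llvm::FunctionPass(ID) {}

  bool runOnFunction(llvm::Function &F) override {
    // Pattern matching and rewriting would go here.
    return false; // This sketch makes no changes.
  }
};

char MipsIdiomRecognize::ID = 0;

} // end anonymous namespace

llvm::FunctionPass *createMipsIdiomRecognizePass() {
  return new MipsIdiomRecognize();
}

// In the target's pass configuration (e.g. a MipsPassConfig subclass of
// TargetPassConfig), the pass is added before the common IR passes the
// backend already runs:
//
//   void MipsPassConfig::addIRPasses() {
//     addPass(createMipsIdiomRecognizePass());
//     TargetPassConfig::addIRPasses();
//   }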
On 1/13/2017 10:52 AM, Hal Finkel wrote:
> On 01/13/2017 10:19 AM, Krzysztof Parzyszek via llvm-dev wrote:
>>
>> If LLVM supported adding such target-specific passes at that point in
>> the optimization pipeline, you could just write your own pass and plug
>> it in there.
>
> This certainly seems like a reasonable thing to support, but the
> question is: why should your pass run early in the mid-level optimizer
> (i.e., in the part of the pipeline we generally consider
> canonicalization) instead of as an early IR pass in the backend? Adding
> IR-level passes early in the backend is well supported. There are
> plenty of potential answers here for why earlier is better (e.g.,
> affecting inlining decisions, idioms might be significantly more
> difficult to recognize after vectorization, etc.), but I think we need
> to discuss the use case.

The reason is that the idiom code may end up looking different each time
one of the preceding optimizations is changed. Also, some of the
optimizations (the instruction combiner, for example) have a tendency to
greatly obfuscate the code, making it really hard to extract useful data
from the idiom code. It is not always enough to simply recognize a
pattern; to replace it with an intrinsic, some additional parameters may
need to be obtained from the initial code. When the code has been mangled
by the combiner, this process can be a lot harder. Also, the combiner is
one of those things that changes quite often. For recognizing loop
idioms, the loop optimizations may be the main problem: the idiom code
may end up getting unrolled, rotated, or otherwise rendered
unrecognizable.

-Krzysztof

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
> On Jan 13, 2017, at 9:13 AM, Krzysztof Parzyszek via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On 1/13/2017 10:52 AM, Hal Finkel wrote:
>> On 01/13/2017 10:19 AM, Krzysztof Parzyszek via llvm-dev wrote:
>>>
>>> If LLVM supported adding such target-specific passes at that point in
>>> the optimization pipeline, you could just write your own pass and plug
>>> it in there.
>>
>> This certainly seems like a reasonable thing to support, but the
>> question is: why should your pass run early in the mid-level optimizer
>> (i.e., in the part of the pipeline we generally consider
>> canonicalization) instead of as an early IR pass in the backend? Adding
>> IR-level passes early in the backend is well supported. There are
>> plenty of potential answers here for why earlier is better (e.g.,
>> affecting inlining decisions, idioms might be significantly more
>> difficult to recognize after vectorization, etc.), but I think we need
>> to discuss the use case.
>
> The reason is that the idiom code may end up looking different each time
> one of the preceding optimizations is changed. Also, some of the
> optimizations (the instruction combiner, for example) have a tendency to
> greatly obfuscate the code

This seems quite contradictory with what I have always been told about
the goal of inst-combine, i.e. that it is supposed to canonicalize the IR
to make it easier for subsequent passes to recognize patterns.

> , making it really hard to extract useful data from the idiom code. It
> is not always enough to simply recognize a pattern; to replace it with
> an intrinsic, some additional parameters may need to be obtained from
> the initial code. When the code has been mangled by the combiner, this
> process can be a lot harder. Also, the combiner is one of those things
> that changes quite often. For recognizing loop idioms, the loop
> optimizations may be the main problem: the idiom code may end up
> getting unrolled, rotated, or otherwise rendered unrecognizable.

That part (the loop optimizations) was acknowledged in Hal's answer,
IIUC.

— Mehdi
On 1/13/2017 10:52 AM, Hal Finkel wrote:
> but I think we need to discuss the use case.

The main case for us was recognizing polynomial multiplications. Hexagon
has instructions that do that, and the goal was to replace loops that do
that with intrinsics. The problem is that these loops often get unrolled
and intertwined with other code, making the replacement hard, or
impossible. This is especially true if some of the multiplication code is
combined with instructions that were not originally part of it (I don't
remember 100% whether that was happening, but the loop optimizations were
the main culprit in making it hard).

-Krzysztof

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
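To make the idiom concrete for readers who have not seen one: below is a
minimal illustration (not the actual Hexagon code, and the intrinsic
named in the comment is invented) of the kind of polynomial
multiplication over GF(2) that such a target-specific pass would try to
match and collapse into a single instruction.

#include <cstdint>

// Carry-less (GF(2)) multiplication of two 32-bit polynomials into a
// 64-bit product. The whole loop is the "idiom"; a target-specific pass
// would try to recognize it and replace it with one intrinsic call.
uint64_t polyMul32(uint32_t a, uint32_t b) {
  uint64_t result = 0;
  for (int i = 0; i < 32; ++i) {
    if (b & (1u << i))
      result ^= (uint64_t)a << i; // XOR-accumulate: addition in GF(2)
  }
  return result;
}

// Conceptually, after recognition the loop collapses to something like
//   result = __builtin_target_pmpy(a, b);   // hypothetical intrinsic
// Once earlier passes have unrolled the loop or interleaved it with
// surrounding code, matching this shape becomes much harder, which is
// the difficulty described above.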
On 01/13/2017 12:53 PM, Krzysztof Parzyszek wrote:
> On 1/13/2017 10:52 AM, Hal Finkel wrote:
>> but I think we need to discuss the use case.
>
> The main case for us was recognizing polynomial multiplications.
> Hexagon has instructions that do that, and the goal was to replace
> loops that do that with intrinsics. The problem is that these loops
> often get unrolled and intertwined with other code, making the
> replacement hard, or impossible. This is especially true if some of
> the multiplication code is combined with instructions that were not
> originally part of it (I don't remember 100% whether that was
> happening, but the loop optimizations were the main culprit in making
> it hard).

Is this integer multiplication or floating-point multiplication? If it is
integer multiplication, I'd expect that using SCEV would be the easiest
way to recognize the relevant patterns; SCEV is supposed to understand
all of the unobfuscation tricks. Do these instructions contain an
implicit loop (of runtime trip count), or are you trying to match loops
of some fixed trip count?

 -Hal

> -Krzysztof

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
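A rough sketch of the SCEV-based approach Hal mentions, for reference.
The pass name below is a placeholder; the analysis queries shown
(getSCEV, getBackedgeTakenCount, SCEVAddRecExpr) are standard
ScalarEvolution facilities, used here with the legacy pass manager.

#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Pass.h"

using namespace llvm;

namespace {

// Hypothetical analysis-only pass showing how idiom recognition could
// lean on SCEV instead of matching the surface IR directly.
class IdiomSketch : public FunctionPass {
public:
  static char ID;
  IdiomSketch() : FunctionPass(ID) {}

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.addRequired<LoopInfoWrapperPass>();
    AU.addRequired<ScalarEvolutionWrapperPass>();
    AU.setPreservesAll();
  }

  bool runOnFunction(Function &F) override {
    auto &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
    auto &SE = getAnalysis<ScalarEvolutionWrapperPass>().getSE();

    for (Loop *L : LI) {
      // SCEV reports the trip count in a form that does not depend on
      // how the loop was rotated or how the IV was rewritten.
      const SCEV *TripCount = SE.getBackedgeTakenCount(L);

      // Induction-variable-based computations show up as add
      // recurrences, regardless of surface-level obfuscation.
      if (PHINode *IV = L->getCanonicalInductionVariable()) {
        if (auto *AR = dyn_cast<SCEVAddRecExpr>(SE.getSCEV(IV))) {
          (void)AR;        // Start, step, etc. would be inspected here...
          (void)TripCount; // ...and matched against the target idiom.
        }
      }
    }
    return false; // Analysis-only sketch: nothing is rewritten.
  }
};

char IdiomSketch::ID = 0;

} // end anonymous namespace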