On 01/13/2017 10:19 AM, Krzysztof Parzyszek via llvm-dev wrote:
> Hi Catello,
>
> LLVM does have a "loop idiom recognition" pass which, in principle,
> does exactly that kind of thing: it recognizes loops that perform
> memcpy/memset operations. It does not recognize any target-specific
> idioms, though, and there isn't really much in it that would make such
> recognition easier. We have some cases like yours on Hexagon, where we
> want to replace certain loops with Hexagon-specific intrinsics, and
> the way we do it is that we have (in our own compiler) a separate pass
> that runs at around the same time, but which does "Hexagon-specific
> loop idiom recognition". That pass is not present in llvm.org, mostly
> because it hooks up a target-specific pass in a way that is not
> "officially supported".
>
> If LLVM supported adding such target-specific passes at that point in
> the optimization pipeline, you could just write your own pass and plug
> it in there.

This certainly seems like a reasonable thing to support, but the question
is: why should your pass run early in the mid-level optimizer (i.e., in
the part of the pipeline we generally consider canonicalization) instead
of as an early IR pass in the backend? Adding IR-level passes early in
the backend is well supported. There are plenty of potential answers here
for why earlier is better (e.g., affecting inlining decisions, idioms
might be significantly more difficult to recognize after vectorization,
etc.), but I think we need to discuss the use case.

 -Hal

> -Krzysztof
>
> On 1/13/2017 9:45 AM, Catello Cioffi via llvm-dev wrote:
>> Good afternoon,
>>
>> I'm working on modifying the Mips backend in order to add new
>> functionality. I've successfully implemented the intrinsics, but I
>> want to recognize a pattern like this:
>>
>> int seq[max];
>> int cnt = 0;
>>
>> for (int i = 0; i < max; i++)
>> {
>>   for (int j = 0; j < 16; j++)
>>   {
>>     char hn = (seq[i] & (3 << (j*2))) >> (j*2);
>>     if (hn == 2)
>>     {
>>       cnt++;
>>     }
>>   }
>> }
>>
>> and transform it into something like:
>>
>> int seq[max];
>> int cnt = 0;
>>
>> for (int i = 0; i < max; i++)
>> {
>>   cnt += intrinsic(seq[i], 2);
>> }
>>
>> Do you know what I can use to transform the loop? Or does something
>> similar already exist in LLVM?
>>
>> Thanks,
>>
>> Catello
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
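For context on Hal's remark that adding IR-level passes early in the
backend is well supported: the usual mechanism is to override
addIRPasses() in the target's TargetPassConfig. The sketch below (legacy
pass manager, as used at the time of this thread) shows the general
shape; the names MipsIdiomRecognize and createMipsIdiomRecognizePass are
hypothetical placeholders, not existing Mips backend code.

// Sketch only: MipsIdiomRecognize and createMipsIdiomRecognizePass are
// hypothetical names used for illustration; they are not in-tree.
#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"

namespace {

// A FunctionPass that would scan loops for the 2-bit-field counting idiom
// from the original post and rewrite matches to the target intrinsic.
class MipsIdiomRecognize : public llvm::FunctionPass {
public:
  static char ID;
  MipsIdiomRecognize() : llvm::FunctionPass(ID) {}

  bool runOnFunction(llvm::Function &F) override {
    // Pattern matching and rewriting would go here.
    return false; // This sketch makes no changes.
  }
};

char MipsIdiomRecognize::ID = 0;

} // end anonymous namespace

llvm::FunctionPass *createMipsIdiomRecognizePass() {
  return new MipsIdiomRecognize();
}

// In the target's pass configuration (e.g. a MipsPassConfig subclass of
// TargetPassConfig), the pass is added before the common IR passes the
// backend already runs:
//
//   void MipsPassConfig::addIRPasses() {
//     addPass(createMipsIdiomRecognizePass());
//     TargetPassConfig::addIRPasses();
//   }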
On 1/13/2017 10:52 AM, Hal Finkel wrote:
> On 01/13/2017 10:19 AM, Krzysztof Parzyszek via llvm-dev wrote:
>>
>> If LLVM supported adding such target-specific passes at that point in
>> the optimization pipeline, you could just write your own pass and plug
>> it in there.
>
> This certainly seems like a reasonable thing to support, but the
> question is: why should your pass run early in the mid-level optimizer
> (i.e., in the part of the pipeline we generally consider
> canonicalization) instead of as an early IR pass in the backend? Adding
> IR-level passes early in the backend is well supported. There are
> plenty of potential answers here for why earlier is better (e.g.,
> affecting inlining decisions, idioms might be significantly more
> difficult to recognize after vectorization, etc.), but I think we need
> to discuss the use case.

The reason is that the idiom code may end up looking different each time
one of the preceding optimizations is changed. Also, some of the
optimizations (the instruction combiner, for example) have a tendency to
greatly obfuscate the code, making it really hard to extract useful data
from the idiom code. It is not always enough to simply recognize a
pattern; to replace it with an intrinsic, some additional parameters may
need to be obtained from the initial code. When the code has been mangled
by the combiner, this process can be a lot harder. Also, the combiner is
one of those things that changes quite often. For recognizing loop
idioms, the loop optimizations may be the main problem: the idiom code
may end up getting unrolled, rotated, or otherwise rendered
unrecognizable.

-Krzysztof

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
> On Jan 13, 2017, at 9:13 AM, Krzysztof Parzyszek via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On 1/13/2017 10:52 AM, Hal Finkel wrote:
>> On 01/13/2017 10:19 AM, Krzysztof Parzyszek via llvm-dev wrote:
>>>
>>> If LLVM supported adding such target-specific passes at that point in
>>> the optimization pipeline, you could just write your own pass and plug
>>> it in there.
>>
>> This certainly seems like a reasonable thing to support, but the
>> question is: why should your pass run early in the mid-level optimizer
>> (i.e., in the part of the pipeline we generally consider
>> canonicalization) instead of as an early IR pass in the backend? Adding
>> IR-level passes early in the backend is well supported. There are
>> plenty of potential answers here for why earlier is better (e.g.,
>> affecting inlining decisions, idioms might be significantly more
>> difficult to recognize after vectorization, etc.), but I think we need
>> to discuss the use case.
>
> The reason is that the idiom code may end up looking different each time
> one of the preceding optimizations is changed. Also, some of the
> optimizations (the instruction combiner, for example) have a tendency to
> greatly obfuscate the code

This seems quite contradictory with what I have always been told about
the goal of inst-combine, i.e. that it is supposed to canonicalize the IR
to make it easier for subsequent passes to recognize patterns.

> , making it really hard to extract useful data from the idiom code. It
> is not always enough to simply recognize a pattern; to replace it with
> an intrinsic, some additional parameters may need to be obtained from
> the initial code. When the code has been mangled by the combiner, this
> process can be a lot harder. Also, the combiner is one of those things
> that changes quite often. For recognizing loop idioms, the loop
> optimizations may be the main problem: the idiom code may end up
> getting unrolled, rotated, or otherwise rendered unrecognizable.

That part (the loop optimizations) was acknowledged in Hal's answer,
IIUC.

— Mehdi
On 1/13/2017 10:52 AM, Hal Finkel wrote:
> but I think we need to discuss the use case.

The main case for us was recognizing polynomial multiplications. Hexagon
has instructions that do that, and the goal was to replace loops that do
that with intrinsics. The problem is that these loops often get unrolled
and intertwined with other code, making the replacement hard, or
impossible. This is especially true if some of the multiplication code is
combined with instructions that were not originally part of it (I don't
remember 100% whether that was happening, but the loop optimizations were
the main culprit in making it hard).

-Krzysztof

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
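To make the idiom concrete for readers who have not seen one: below is a
minimal illustration (not the actual Hexagon code, and the intrinsic
named in the comment is invented) of the kind of polynomial
multiplication over GF(2) that such a target-specific pass would try to
match and collapse into a single instruction.

#include <cstdint>

// Carry-less (GF(2)) multiplication of two 32-bit polynomials into a
// 64-bit product. The whole loop is the "idiom"; a target-specific pass
// would try to recognize it and replace it with one intrinsic call.
uint64_t polyMul32(uint32_t a, uint32_t b) {
  uint64_t result = 0;
  for (int i = 0; i < 32; ++i) {
    if (b & (1u << i))
      result ^= (uint64_t)a << i; // XOR-accumulate: addition in GF(2)
  }
  return result;
}

// Conceptually, after recognition the loop collapses to something like
//   result = __builtin_target_pmpy(a, b);   // hypothetical intrinsic
// Once earlier passes have unrolled the loop or interleaved it with
// surrounding code, matching this shape becomes much harder, which is
// the difficulty described above.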
On 01/13/2017 12:53 PM, Krzysztof Parzyszek wrote:
> On 1/13/2017 10:52 AM, Hal Finkel wrote:
>> but I think we need to discuss the use case.
>
> The main case for us was recognizing polynomial multiplications.
> Hexagon has instructions that do that, and the goal was to replace
> loops that do that with intrinsics. The problem is that these loops
> often get unrolled and intertwined with other code, making the
> replacement hard, or impossible. This is especially true if some of
> the multiplication code is combined with instructions that were not
> originally part of it (I don't remember 100% whether that was
> happening, but the loop optimizations were the main culprit in making
> it hard).

Is this integer multiplication or floating-point multiplication? If it is
integer multiplication, I'd expect that using SCEV would be the easiest
way to recognize the relevant patterns; SCEV is supposed to understand
all of the unobfuscation tricks. Do these instructions contain an
implicit loop (of runtime trip count), or are you trying to match loops
of some fixed trip count?

 -Hal

> -Krzysztof

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
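A rough sketch of the SCEV-based approach Hal mentions, for reference.
The pass name below is a placeholder; the analysis queries shown
(getSCEV, getBackedgeTakenCount, SCEVAddRecExpr) are standard
ScalarEvolution facilities, used here with the legacy pass manager.

#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Pass.h"

using namespace llvm;

namespace {

// Hypothetical analysis-only pass showing how idiom recognition could
// lean on SCEV instead of matching the surface IR directly.
class IdiomSketch : public FunctionPass {
public:
  static char ID;
  IdiomSketch() : FunctionPass(ID) {}

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.addRequired<LoopInfoWrapperPass>();
    AU.addRequired<ScalarEvolutionWrapperPass>();
    AU.setPreservesAll();
  }

  bool runOnFunction(Function &F) override {
    auto &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
    auto &SE = getAnalysis<ScalarEvolutionWrapperPass>().getSE();

    for (Loop *L : LI) {
      // SCEV reports the trip count in a form that does not depend on
      // how the loop was rotated or how the IV was rewritten.
      const SCEV *TripCount = SE.getBackedgeTakenCount(L);

      // Induction-variable-based computations show up as add
      // recurrences, regardless of surface-level obfuscation.
      if (PHINode *IV = L->getCanonicalInductionVariable()) {
        if (auto *AR = dyn_cast<SCEVAddRecExpr>(SE.getSCEV(IV))) {
          (void)AR;        // Start, step, etc. would be inspected here...
          (void)TripCount; // ...and matched against the target idiom.
        }
      }
    }
    return false; // Analysis-only sketch: nothing is rewritten.
  }
};

char IdiomSketch::ID = 0;

} // end anonymous namespace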