thr3ads.net - llvm dev - [llvm-dev] Enabling IRCE pass or Adding something similar in the pipeline of new pass manager [May 2021]


Jie He via llvm-dev

2021-May-12 02:41 UTC

[llvm-dev] Enabling IRCE pass or Adding something similar in the pipeline of new pass manager

Yes, but the current deopt lowering generates statepoint IR, which, as
mentioned in the LLVM GC documentation, is currently supported only on
x86-64.

IRCE doesn't rely on a GCed language; I remembered wrong. But it's not
very smart right now: it can't handle bounds checks as well as Java's RCE
does.

On Tue, 11 May 2021 at 23:04, Philip Reames <listmail at philipreames.com>
wrote:
> This is incorrect.
>
> IRCE's current sole known user happens to be a compiler for a GCed
> language, but there is no (intentional) dependence on that fact.  It should
> work on arbitrary IR.
>
> Loop predication (the form in IndVars) triggers for arbitrary IR.  The
> separate pass depends on semantics of guards which is related to deopt
> semantics, but *not* GC.
>
> Philip
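To make the widening idea behind loop predication concrete, here is a minimal C sketch under our own assumptions. `sum_prefix_checked` and `sum_prefix_widened` are hypothetical names, returning -1 stands in for throwing an exception, and the fallback to the checked loop plays roughly the part a deoptimizing guard would play in LLVM. It illustrates the idea only and is not the output of either LLVM transform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checked loop: sums a[0..n-1] but range-checks every access
   against len. Returning -1 stands in for throwing an exception. */
long sum_prefix_checked(const long *a, size_t len, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i >= len)
            return -1;            /* per-iteration range check */
        s += a[i];
    }
    return s;
}

/* Widened form: i takes the values 0..n-1, so every check "i < len"
   passes exactly when n <= len. That loop-invariant condition is tested
   once before the loop. When it fails we fall back to the checked loop,
   standing in for a guard that deoptimizes. */
long sum_prefix_widened(const long *a, size_t len, size_t n) {
    if (n <= len) {               /* widened, loop-invariant check */
        long s = 0;
        for (size_t i = 0; i < n; i++)
            s += a[i];            /* no check inside the loop */
        return s;
    }
    return sum_prefix_checked(a, len, n);   /* fallback path */
}
```

Without deopt semantics, the fallback must re-run the original loop so that side effects before a failing check still happen in order; that is why the standalone pass depends on guards.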
> On 5/11/21 7:17 AM, Jie He wrote:
>
> As far as I know, the current IRCE implementation relies on some
> preconditions. It's intended for language runtimes with garbage
> collection, not for loop vectorization.
> The same is true for loop predication, which is also helpful for
> eliminating condition checks within a loop.
>
> Jie He
> B.R
>
> On Tue, 11 May 2021 at 20:50, Jingu Kang via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi Philip,
>>
>> I have extended your suggestion slightly further, as shown below.
>>
>>                                  newbound1 = min(n, c)
>>                                  newbound2 = max(n, c)
>>      while (iv < n) {            while (iv < newbound1) {
>>        A                           A
>>        if (iv < c)                 B
>>          B                         C
>>        C                         }
>>      }                           iv = newbound1
>>                                  while (iv < newbound2) {
>>                                    A
>>                                    C
>>                                  }
>>
>> I have implemented a simple pass that splits the bound of a loop
>> containing a conditional branch on the IV, as in the example above
>> (https://reviews.llvm.org/D102234). It is an initial version; if
>> possible, please review it.
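A minimal C sketch of the split shown above, assuming A, B and C are hypothetical blocks that only record which of them ran (the names `run_original`, `run_split` and `split_matches_original` are ours). Two details differ from the pseudocode in the message. First, the tail loop is bounded by `n`: with `newbound2 = max(n, c)` it would execute extra iterations whenever `c > n` (for `n = 2, c = 5` it would run iterations 2 through 4, which the original never executes). Second, there is no `iv = newbound1` reset: the first loop already leaves `iv` at `newbound1` when it runs, and the reset would move `iv` backwards when it starts above `newbound1`. The exhaustive comparison exercises both cases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for blocks A, B and C: each records an event
   code (3 * iv + kind) so both versions can be compared step by step. */
enum { EV_A = 0, EV_B = 1, EV_C = 2 };

/* Original: while (iv < n) { A; if (iv < c) B; C; } */
size_t run_original(int iv, int n, int c, int *trace) {
    size_t len = 0;
    while (iv < n) {
        trace[len++] = 3 * iv + EV_A;
        if (iv < c)
            trace[len++] = 3 * iv + EV_B;
        trace[len++] = 3 * iv + EV_C;
        iv++;
    }
    return len;
}

/* Split: the first loop covers iterations where iv < c is known to hold,
   so B runs unconditionally; the second loop covers the rest, where
   iv >= c and B never runs. */
size_t run_split(int iv, int n, int c, int *trace) {
    size_t len = 0;
    int newbound = n < c ? n : c;           /* min(n, c) */
    while (iv < newbound) {
        trace[len++] = 3 * iv + EV_A;
        trace[len++] = 3 * iv + EV_B;       /* branch removed */
        trace[len++] = 3 * iv + EV_C;
        iv++;
    }
    while (iv < n) {                        /* bounded by n */
        trace[len++] = 3 * iv + EV_A;
        trace[len++] = 3 * iv + EV_C;
        iv++;
    }
    return len;
}

/* Compares both versions for iv, n, c in [-3, 8]: at most 11 iterations
   * 3 events = 33 trace entries, so 64 slots suffice. */
int split_matches_original(void) {
    int t1[64], t2[64];
    for (int iv = -3; iv <= 8; iv++)
        for (int n = -3; n <= 8; n++)
            for (int c = -3; c <= 8; c++) {
                size_t l1 = run_original(iv, n, c, t1);
                size_t l2 = run_split(iv, n, c, t2);
                if (l1 != l2)
                    return 0;
                for (size_t k = 0; k < l1; k++)
                    if (t1[k] != t2[k])
                        return 0;
            }
    return 1;
}
```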
>>
>> Thanks
>>
>> JinGu Kang
>>
>> *From:* Jingu Kang <Jingu.Kang at arm.com>
>> *Sent:* 04 May 2021 12:45
>> *To:* Philip Reames <listmail at philipreames.com>; Jingu Kang <Jingu.Kang at arm.com>
>> *Cc:* llvm-dev at lists.llvm.org
>> *Subject:* RE: [llvm-dev] Enabling IRCE pass or Adding something similar in the pipeline of new pass manager
>>
>> Philip, I appreciate your kind comments.
>>
>> >In this example, forming the full pre/main/post loop structure of IRCE
>> >is overkill.  Instead, we could simply restrict the loop bounds in the
>> >following manner:
>>
>> >loop.ph:
>> >  ;; Warning: pseudo code, might have edge conditions wrong
>> >  %c = icmp sgt %iv, %n
>> >  %min = umin(%n, %a)
>> >  br i1 %c, label %exit, label %loop.ph.split
>> >
>> >loop.ph.split:
>> >  br label %loop
>> >
>> >loop:
>> >  %iv = phi i64 [ %inc, %loop ], [ 1, %loop.ph.split ]
>> >  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
>> >  %val = load i64, i64* %src.arrayidx
>> >  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
>> >  store i64 %val, i64* %dst.arrayidx
>> >  %inc = add nuw nsw i64 %iv, 1
>> >  %cond = icmp eq i64 %inc, %min
>> >  br i1 %cond, label %exit, label %loop
>> >
>> >exit:
>> >  ret void
>> >}
>>
>> >I'm not quite sure what to call this transform, but it's not IRCE.  If
>> >this example is actually general enough to cover your use cases, it's
>> >going to be a lot easier to judge profitability on than the general
>> >form of iteration set splitting.
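The bound restriction described above can be sketched in C as follows. The source loop is an assumption, since the original loop is not quoted here: it copies `src[iv]` to `dst[iv]` for `iv` from 1 while `iv < a`. The hypothetical `copy_restricted` folds that condition into the trip count by looping to `min(n, a)`, the role `%min` plays in the IR, and uses a top-tested `iv < bound` so empty ranges need no separate guard.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical source loop (an assumption): copy element iv only while
   iv < a. */
void copy_guarded(long *dst, const long *src, long n, long a) {
    for (long iv = 1; iv < n; iv++)
        if (iv < a)
            dst[iv] = src[iv];
}

/* Restricted bounds: iterations with iv >= a do nothing, and iv only
   grows, so the loop can stop at min(n, a) and drop the branch. */
void copy_restricted(long *dst, const long *src, long n, long a) {
    long bound = n < a ? n : a;   /* plays the role of %min */
    for (long iv = 1; iv < bound; iv++)
        dst[iv] = src[iv];
}

/* Compares both versions for n in [0, 16] and a in [-2, 18]. */
int restricted_matches_guarded(void) {
    long src[16], d1[16], d2[16];
    for (int k = 0; k < 16; k++)
        src[k] = 100 + k;
    for (long n = 0; n <= 16; n++)
        for (long a = -2; a <= 18; a++) {
            memset(d1, 0, sizeof d1);
            memset(d2, 0, sizeof d2);
            copy_guarded(d1, src, n, a);
            copy_restricted(d2, src, n, a);
            if (memcmp(d1, d2, sizeof d1) != 0)
                return 0;
        }
    return 1;
}
```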
>>
>>
>>
>> I agree with you. If the LLVM community is OK with accepting the above
>> approach as a pass or as part of an existing pass, I would be happy to
>> implement it, because I am aiming to handle this case in upstream LLVM.
>>
>>
>>
>> >Another way to frame this special case might be to recognize the
>> >conditional block can be inverted into an early exit.  (Reasoning: %iv
>> >is strictly increasing, condition is monotonic, path if not taken has
>> >no observable effect)  Consider:
>>
>> >loop.ph:
>> >  br label %loop
>> >
>> >loop:
>> >  %iv = phi i64 [ %inc, %for.inc ], [ 1, %loop.ph ]
>> >  %cmp = icmp sge i64 %iv, %a
>> >  br i1 %cmp, label %exit, label %for.inc
>> >
>> >for.inc:
>> >  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
>> >  %val = load i64, i64* %src.arrayidx
>> >  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
>> >  store i64 %val, i64* %dst.arrayidx
>> >  %inc = add nuw nsw i64 %iv, 1
>> >  %cond = icmp eq i64 %inc, %n
>> >  br i1 %cond, label %exit, label %loop
>> >
>> >exit:
>> >  ret void
>> >}
>>
>> >Once that's done, the multiple exit vectorization work should vectorize
>> >this loop. Thinking about it, I really like this variant.
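Under an assumed source loop (copy `src[iv]` to `dst[iv]` for `iv` from 1 while `iv < a`; the function names are hypothetical), the early-exit variant can be sketched in C like this. The `break` corresponds to the IR's `%cmp` exit: because `iv` only grows and `iv < a` is monotonic, the first failing iteration ends all observable work, so exiting there is equivalent to skipping every remaining iteration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical source loop (an assumption): copy element iv only while
   iv < a. */
void copy_guarded(long *dst, const long *src, long n, long a) {
    for (long iv = 1; iv < n; iv++)
        if (iv < a)
            dst[iv] = src[iv];
}

/* Early-exit form: once iv >= a it stays true and the untaken path has
   no effect, so the conditional block becomes a second loop exit. */
void copy_early_exit(long *dst, const long *src, long n, long a) {
    for (long iv = 1; iv < n; iv++) {
        if (iv >= a)
            break;                /* inverted condition as early exit */
        dst[iv] = src[iv];
    }
}

/* Compares both versions for n in [0, 16] and a in [-2, 18]. */
int early_exit_matches_guarded(void) {
    long src[16], d1[16], d2[16];
    for (int k = 0; k < 16; k++)
        src[k] = 100 + k;
    for (long n = 0; n <= 16; n++)
        for (long a = -2; a <= 18; a++) {
            memset(d1, 0, sizeof d1);
            memset(d2, 0, sizeof d2);
            copy_guarded(d1, src, n, a);
            copy_early_exit(d2, src, n, a);
            if (memcmp(d1, d2, sizeof d1) != 0)
                return 0;
        }
    return 1;
}
```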
>>
>> I have not looked at the multiple-exit vectorization work yet, but it
>> looks like we could treat the inverted condition as the early exit's
>> condition.
>>
>> >The costing here seems quite off.  I have not looked at how the
>> >vectorizer costs predicated loads on hardware without predication, but
>> >needing to scalarize a conditional VF times and form a vector again
>> >does not have a cost of 3 million.  This could definitely be improved.
>>
>> I agree with you.
>>
>>
>>
>> Additionally, if possible, I would like to suggest enabling or adding
>> transformations that help vectorization. For example, as with removing
>> the conditional branch inside the loop, we could split a loop with a
>> dependence that blocks vectorization into a vectorizable loop and a
>> non-vectorizable one, using transformations like loop distribution. I am
>> not sure why these features have not been enabled by default in the pass
>> manager, but enabling them would make more loops vectorizable.
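As an illustration of the loop distribution idea, here is a hypothetical C loop where one statement carries a dependence across iterations and another does not. The function names and the no-aliasing assumption (expressed with `restrict`) are ours. Distribution is only legal when, as here, running all instances of one statement before the other preserves every dependence.

```c
#include <assert.h>

/* Hypothetical loop mixing a loop-carried dependence (S1) with an
   independent statement (S2). restrict encodes the assumption that the
   arrays do not overlap. */
void fused(long *restrict a, const long *restrict b,
           long *restrict c, const long *restrict d, int n) {
    for (int i = 0; i < n; i++) {
        a[i + 1] = a[i] + b[i];   /* S1: reads the previous iteration's a */
        c[i] = d[i] * 2;          /* S2: independent across iterations */
    }
}

/* Distributed: S1 stays in a scalar loop, while S2 gets its own loop,
   which a vectorizer can handle. Legal because S2 neither reads nor
   writes anything S1 touches. */
void distributed(long *restrict a, const long *restrict b,
                 long *restrict c, const long *restrict d, int n) {
    for (int i = 0; i < n; i++)
        a[i + 1] = a[i] + b[i];
    for (int i = 0; i < n; i++)
        c[i] = d[i] * 2;
}
```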
>>
>>
>>
>> Thanks
>>
>> JinGu Kang
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
>
> --
> Best Regards
> He Jie 何杰
>
>
-- 
Best Regards
He Jie 何杰
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210512/97d4f353/attachment.html>

Philip Reames via llvm-dev

2021-May-12 18:45 UTC


On 5/11/21 7:41 PM, Jie He wrote:
> Yes, but the current deopt lowering generates statepoint IR, which, as
> mentioned in the LLVM GC documentation, is currently supported only on
> x86-64.

I believe this is supported on at least AArch64, if memory serves.

>
> IRCE doesn't rely on a GCed language; I remembered wrong. But it's not
> very smart right now: it can't handle bounds checks as well as Java's RCE
> does.

Er, I think you're either misunderstanding or need to clarify your
point.  IRCE does exactly the standard pre/main/post loop technique
which was used in C2 back in the day.  LoopPred does the widening
transformation.  Do you have a particular case in mind?
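A minimal C sketch of the pre/main/post loop structure mentioned above, under our own assumptions: `sum_range_checked` is a hypothetical loop whose every access is checked against `[0, len)`, with -1 standing in for an exception, and `sum_range_irce` splits its iteration space so the main loop covers only `[max(start, 0), min(end, len))`, where the check cannot fail and is dropped. It is a source-level illustration, not what IRCE emits.

```c
#include <assert.h>

/* Hypothetical source loop: every access is checked against [0, len).
   Returning -1 stands in for throwing an out-of-bounds exception. */
long sum_range_checked(const long *arr, long len, long start, long end) {
    long s = 0;
    for (long i = start; i < end; i++) {
        if (i < 0 || i >= len)
            return -1;
        s += arr[i];
    }
    return s;
}

/* Pre/main/post split: the main loop runs over [main_lo, main_hi), where
   0 <= i < len is guaranteed, so it carries no check. The pre and post
   loops keep the check for iterations outside that safe range. */
long sum_range_irce(const long *arr, long len, long start, long end) {
    long s = 0;
    long main_lo = start > 0 ? start : 0;      /* max(start, 0) */
    long main_hi = end < len ? end : len;      /* min(end, len) */
    long i = start;
    for (; i < end && i < main_lo; i++) {      /* pre loop, checked */
        if (i < 0 || i >= len)
            return -1;
        s += arr[i];
    }
    for (; i < main_hi; i++)                   /* main loop, unchecked */
        s += arr[i];
    for (; i < end; i++) {                     /* post loop, checked */
        if (i < 0 || i >= len)
            return -1;
        s += arr[i];
    }
    return s;
}

/* Exhaustive comparison for len in [0, 8] and start, end in [-3, 10]. */
int irce_matches_checked(void) {
    long arr[8] = {3, 1, 4, 1, 5, 9, 2, 6};
    for (long len = 0; len <= 8; len++)
        for (long start = -3; start <= 10; start++)
            for (long end = -3; end <= 10; end++)
                if (sum_range_checked(arr, len, start, end) !=
                    sum_range_irce(arr, len, start, end))
                    return 0;
    return 1;
}
```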
