thr3ads.net - llvm dev - [llvm-dev] Delinearization validity checks in DependenceAnalysis [May 2019]

If this information is useful, please help other people find it:
Share via:

David Green via llvm-dev

2019-May-16 19:50 UTC

[llvm-dev] Delinearization validity checks in DependenceAnalysis

Hello

Under the proviso that it's been a while since I looked into any of these
things...

On 05/15, Bardia Mahjour via llvm-dev wrote:> I also get correct results for my example (for a 64-bit target) if the
upper
> bounds are changed to unsigned. The reason is simply because clang
zero-extends
> `m` for address calculations but sign-extends it for the loop upper bound.
This
> prevents SCEV from canceling out the 'm' term from the difference
expression
> that looks like `(-3 + (sext i32 %m to i64) + (-1 * (zext i32 %m to
> i64))<nsw>)`. While we cannot reduce expressions of this form in
general, it
> does pose a sever limitation for the vast majority of cases where the loop
> upper bound is non-negative.
>
There are already some function like isKnownPredicate that attempt to deal with
some of this is DA. Can we extend the other for these cases? Perhaps with some
extra information that we have that SCEV would in general not know, or do
something extra that it would not compute. Or ideally can we just improve SCEV
to get this right?
> Regarding your example, I'm not sure I fully understand the concern and
I
> would appreciate it if you could clarify that for me. My understanding is
> that if the intended shape of 'a' is in fact 'a[n][m]' then
the example, as
> provided, has an out-of-bound access to start with. To avoid this out-bound
> access one would need to change the upper bound of the j-loop to be
'm-2'.
> Interestingly, if we do that, the current validity checks start to pass and
> we get [0 2] as the dependence vector. Here's a slightly modified
version
> of your example that I tried:
> 
> ```
> typedef unsigned long long TT;
> void foo(TT n, TT m, int *a) {
>   for (TT i = 0; i < n; ++i)
>     for (TT j = 0; j < m-2; ++j) {
>       a[i*m + j] = a[i*m + j+2];
>     }
> }
> ```
Sure, but is wouldn't be invalid from C to use the "for (TT j = 0; j
< m; ++j)" bound, so long as the size of the object passed through to
"a" was large enough that it didn't overflow. We don't have
enough information to prove this won't happen, so unfortunately we need to
be a conservative.

So this loop:

void test(int * A, signed N, int M)
{
    for(int i = 1; i < N; i+=1) {
        for(int j = 0; j < M; j+=1) {
            A[i*M + j] = 2;
            A[i*M + j - 1] = 4;
        }
    }
}

Needs to not get the dependency vector [0 >], otherwise unroll and jam might
think it was safe to reorder (unroll and jam) and miscompile the code.
> 
> If the concern is that the actual intended shape of the array may have been
> something other than a[n][m], and that we are indexing it such that the
> accesses are in-bound with respect to the memory of the whole array but not
> with respect to individual dimensions, then I'm not sure we can reason
> about *any* delinearization statically (except for the limited cases where
> the bounds are compile-time known).
>
> Am I misunderstanding the issue?
> 
> Bardia Mahjour
> Compiler Optimizations
> Hydra Squad Lead
> IBM Toronto Software Lab
> bmahjour at ca.ibm.com (905) 413-2336
> 
> 
> 
> 
> 
> From: David Green <David.Green at arm.com>
> To:   "llvm-dev at lists.llvm.org" <llvm-dev at
lists.llvm.org>, Bardia
>             Mahjour <bmahjour at ca.ibm.com>
> Cc:   nd <nd at arm.com>
> Date: 2019/05/14 02:50 PM
> Subject:      [EXTERNAL] Re: [llvm-dev] Delinearization validity checks in
>             DependenceAnalysis
> 
> 
> 
> Hello
> 
> Interestingly, the example you provide works for me so long as either
it's
> a 32bit target, or the array bounds (n and m) are changed to unsigned.
> 
> For a bit of history, DA used to have a different delinearisation method
> based on geps, but it was known to be invalid in same cases and was
> eventually removed. There was no delinearisation for a while until these
> checks were fixed, enough to be correct, but not as powerful as they could
> be. I believe Pollys delinearisation is much more powerful, and can version
> loops with runtime conditions.
> 
> IIRC, the checks are there for cases like this:
> void foo(unsigned n, unsigned m, int *a) {
>   for (int i = 0; i < n; ++i)
>     for (int j = 0; j < m; ++j) {
>       a[i*m + j] = a[i*m + j+2];
>     }
> }
> 
> The "j-2" can underflow into the previous i iterations space. So
the
> distance vector isn't [0 2] (and isn't even [= <]).
Unfortunately this is
> perfectly valid in C (I think for the int a[][m] case too).
> 
> See this comment for why they were needed and perhaps a better way to fix
> it:
> 
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_llvm_llvm-2Dproject_commit_d143c65de3c884d09197da279d2f04f094efaf15-23diff-2D57473b376037dd3637516348b9b02556L3274&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=aihobyOnVzXW7OPSK1-NiSYQkq7oP3ZSUVc4BemvrVo&m=46eKxI_sFjjeBzn7X-OLXSEUwHN-HVCD16TF9OuyIqc&s=adPvJDhPtFMlaTWihZmvWjXqFUFHDnzcV84oaDGlryM&e>
> 
> Any improvements to the delinearisation code would be most welcome.
> Dave
> 
> 
> 
> 
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of
Bardia
> Mahjour via llvm-dev <llvm-dev at lists.llvm.org>
> Sent: 13 May 2019 14:48
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] Delinearization validity checks in DependenceAnalysis
> 
> Hi all,
> 
> I have been looking at the `DependenceAnalysis` pass in
> `llvm/include/llvm/Analysis/DependenceAnalysis.h`.
> In order for this analysis to produce accurate dependence vectors for
> multi-dimensional arrays in nested loops, it needs to
"delinearize" array
> element accesses to recover the subscripts in each dimension of the array.
> I believe that the current implementation of delinearization is based on
> this paper:
> 
https://urldefense.proofpoint.com/v2/url?u=http-3A__web.cse.ohio-2Dstate.edu_-7Epouchet.2_doc_ics-2Darticle.15a.pdf&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=aihobyOnVzXW7OPSK1-NiSYQkq7oP3ZSUVc4BemvrVo&m=46eKxI_sFjjeBzn7X-OLXSEUwHN-HVCD16TF9OuyIqc&s=nraVp9R56UlYi0We27kmGSjZGnM296r0HFpbRR62Fzs&e>
.
> 
> This paper describes how to delinearize the subscripts, and as a last step
> it requires certain conditions to be met in order to validate that the
> delinearized indexes are correct. The `tryDelinearize` function in
> `DependenceAnalysis.cpp` appears to be checking for cases where these
> conditions can be validated *at compile time*:
> 
> ```
> // Statically check that the array bounds are in-range. The first subscript
> we
> // don't have a size for and it cannot overflow into another subscript,
so
> is
> // always safe. The others need to be 0 <= subscript[i] < bound, for
both
> src
> // and dst.
> // FIXME: It may be better to record these sizes and add them as
> constraints
> // to the dependency checks.
> for (int i = 1; i < size; ++i) {
> if (!isKnownNonNegative(SrcSubscripts[i], SrcPtr))
> return false;
> 
> if (!isKnownLessThan(SrcSubscripts[i], Sizes[i - 1]))
> return false;
> 
> if (!isKnownNonNegative(DstSubscripts[i], DstPtr))
> return false;
> 
> if (!isKnownLessThan(DstSubscripts[i], Sizes[i - 1]))
> return false;
> }
> ```
> 
> The problem is that in a lot of cases these conditions cannot be proven
> statically, even though the delinearized indexes are in fact correct. For
> example consider this simple loop:
> 
> ```
> void foo(int n, int m, int a[][m]) {
> for (int i = 0; i < n-1; ++i)
> for (int j = 2; j < m; ++j) {
> a[i][j] = a[i+1][j-2];
> }
> }
> 
> clang test.c -c -O3 -S -Xclang -disable-llvm-passes -emit-llvm
> opt -mem2reg -instcombine -indvars -loop-simplify -loop-rotate -inline
> -simplifycfg test.ll -S -o test.simp.ll
> opt test.simp.ll -analyze -da
> ```
> 
> will produce:
> 
> ```
> da analyze - none!
> da analyze - anti [* *|<]!
> da analyze - output [* *]!
> ```
> 
> If the validity checks were not present, the result would be much more
> accurate dependence vector with accurate dependence distances:
> 
> ```
> da analyze - none!
> da analyze - consistent anti [1 -2]!
> da analyze - none!
> ```
> 
> In my experience the static validity checks tend to fail in many common
> cases (involving loops with symbolic bounds). Sometimes this happens
> because SCEV is not able to simplify the expressions due to presence of
> type casts and sign/zero extensions, but other times the conditions are
> just not computable at compile-time.
> 
> So far I haven't been able to come up with an example where the
validity
> checks in the current implementation legitimately catch a case of invalid
> delinearization. I've also disabled these checks and run some tests
without
> finding any failures (other than improvements in the dependence analysis
> LIT tests).
> 
> I also had a quick look at Polly which benefits from delinearization. From
> what I saw, a much more optimistic approach is taken where the validity
> checks seem to be avoided.
> 
> My questions are:
> 1. Does anyone have more information about why these validity checks are
> necessary in the current implementation, perhaps with some examples showing
> an incorrect delinearization that's possible without those checks?
> 2. Are there any concerns with putting these validity checks under an
> option so that we can more easily disable them and experiment? This could
> also help us improve LIT tests, since some of them have been pessimised to
> compensate for DA's inability to delinearize, and could fail to catch
> regressions as a result of bad changes to the data dependence analysis.
> 
> Looking forward to your help on this.
> 
> Thank you,
> 
> Bardia Mahjour
> Compiler Optimizations
> Hydra Squad Lead
> IBM Toronto Software Lab
> bmahjour at ca.ibm.com (905) 413-2336
> 
> 
> 
> 
> 

> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov
    
-------------- next part --------------
A non-text attachment was scrubbed...
Name: da3.c
Type: text/x-csrc
Size: 525 bytes
Desc: da3.c
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190516/b2b50a68/attachment.c>

Bardia Mahjour via llvm-dev

2019-May-22 17:09 UTC

head link

[llvm-dev] Delinearization validity checks in DependenceAnalysis

Hi David,

Sorry for my delayed response.
> There are already some function like isKnownPredicate that attempt todeal with some of this is DA. Can we extend the other for these cases?
Perhaps with some extra information that we have that SCEV would in general
not know, or do something extra that it would not compute. Or ideally can
we just improve SCEV to get this right?

It looks like SCEV already tries to simplify identically sign and zero
extended terms in an expression. However expressions that contain mix of
sex and zex like this: `(-3 + (sext i32 %m to i64) + (-1 * (zext i32 %m
to> i64))<nsw>)` cannot be simplified unless we can assume that %m is
nevernegative. Although in practice %m will not be negative in most cases, the
compiler cannot prove that at compile-time so I don't think we can
"improve" SCEV to catch these cases.

> So this loop:
> void test(int * A, signed N, int M)
> {
>    for(int i = 1; i < N; i+=1) {
>        for(int j = 0; j < M; j+=1) {
>            A[i*M + j] = 2;
>            A[i*M + j - 1] = 4;
>        }
>    }
> }
> Needs to not get the dependency vector [0 >], otherwise unroll and jammight think it was safe to reorder (unroll and jam) and miscompile the
code.

Without the delinearization validity checks, the access functions A[i*M +
j] and A[i*M + j - 1] would get delinearized as follows:

SrcSCEV = {{((4 * %M) + %A)<nsw>,+,(4 *
%M)}<%for.body>,+,4}<%for.body4>
DstSCEV = {{(-4 + (4 * %M) + %A),+,(4 *
%M)}<%for.body>,+,4}<%for.body4>

SrcSubscripts: {1,+,1}<%for.body>{0,+,1}<%for.body4>
DstSubscripts: {1,+,1}<%for.body>{-1,+,1}<%for.body4>   
delinearized
    subscript 0
        src = {1,+,1}<%for.body>
        dst = {1,+,1}<%for.body>
        class = 1
        loops = {1}
    subscript 1
        src = {0,+,1}<%for.body4>
        dst = {-1,+,1}<%for.body4>
        class = 1
        loops = {2}
    Separable = {0 1}
    Coupled = {}

Why is this not a valid delinearization? The fact that the {-1,+,1}
subscript goes out of bounds with respect to the second dimension is a
problem that originates from the user code (and can be fixed by
initializing j to 1 instead of 0). It is not a problem introduced by
delinearization. Nevertheless delinearization validity checks fail because
they find that the subscript in the second dimension could be negative and
think that we didn't delinearize correctly.

Do you have any objections to adding an option to disable these checks, at
least for the test cases where we know the delinearization would be
correct, but cannot be guaranteed by the compiler?

Do we still need the validity step in the delinearization algorithm, if we
could guarantee that the array size parameters collected (guessed) in the
first step of the algorithm are in fact the correct array dimensions?





From:	David Green <David.Green at arm.com>
To:	"Doerfert, Johannes" <jdoerfert at anl.gov>, Bardia Mahjour
            <bmahjour at ca.ibm.com>
Cc:	"llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org>,
nd
            <nd at arm.com>
Date:	2019/05/16 03:50 PM
Subject:	[EXTERNAL] Re: [llvm-dev] Delinearization validity checks in
            DependenceAnalysis



Hello

Under the proviso that it's been a while since I looked into any of these
things...

On 05/15, Bardia Mahjour via llvm-dev wrote:> I also get correct results for my example (for a 64-bit target) if the
upper> bounds are changed to unsigned. The reason is simply because clang
zero-extends> `m` for address calculations but sign-extends it for the loop upper
bound. This> prevents SCEV from canceling out the 'm' term from the difference
expression> that looks like `(-3 + (sext i32 %m to i64) + (-1 * (zext i32 %m to
> i64))<nsw>)`. While we cannot reduce expressions of this form in
general,
it> does pose a sever limitation for the vast majority of cases where the
loop> upper bound is non-negative.
>
There are already some function like isKnownPredicate that attempt to deal
with some of this is DA. Can we extend the other for these cases? Perhaps
with some extra information that we have that SCEV would in general not
know, or do something extra that it would not compute. Or ideally can we
just improve SCEV to get this right?
> Regarding your example, I'm not sure I fully understand the concern and
I
> would appreciate it if you could clarify that for me. My understanding is
> that if the intended shape of 'a' is in fact 'a[n][m]' then
the example,
as> provided, has an out-of-bound access to start with. To avoid this
out-bound> access one would need to change the upper bound of the j-loop to be
'm-2'.> Interestingly, if we do that, the current validity checks start to pass
and> we get [0 2] as the dependence vector. Here's a slightly modified
version
> of your example that I tried:
>
> ```
> typedef unsigned long long TT;
> void foo(TT n, TT m, int *a) {
>   for (TT i = 0; i < n; ++i)
>     for (TT j = 0; j < m-2; ++j) {
>       a[i*m + j] = a[i*m + j+2];
>     }
> }
> ```
Sure, but is wouldn't be invalid from C to use the "for (TT j = 0; j
< m; +
+j)" bound, so long as the size of the object passed through to
"a" was
large enough that it didn't overflow. We don't have enough information
to
prove this won't happen, so unfortunately we need to be a conservative.

So this loop:

void test(int * A, signed N, int M)
{
    for(int i = 1; i < N; i+=1) {
        for(int j = 0; j < M; j+=1) {
            A[i*M + j] = 2;
            A[i*M + j - 1] = 4;
        }
    }
}

Needs to not get the dependency vector [0 >], otherwise unroll and jam
might think it was safe to reorder (unroll and jam) and miscompile the
code.
>
> If the concern is that the actual intended shape of the array may have
been> something other than a[n][m], and that we are indexing it such that the
> accesses are in-bound with respect to the memory of the whole array but
not> with respect to individual dimensions, then I'm not sure we can reason
> about *any* delinearization statically (except for the limited cases
where> the bounds are compile-time known).
>
> Am I misunderstanding the issue?
>
> Bardia Mahjour
> Compiler Optimizations
> Hydra Squad Lead
> IBM Toronto Software Lab
> bmahjour at ca.ibm.com (905) 413-2336
>
>
>
>
>
> From: David Green <David.Green at arm.com>
> To:   "llvm-dev at lists.llvm.org" <llvm-dev at
lists.llvm.org>, Bardia
>             Mahjour <bmahjour at ca.ibm.com>
> Cc:   nd <nd at arm.com>
> Date: 2019/05/14 02:50 PM
> Subject:      [EXTERNAL] Re: [llvm-dev] Delinearization validity checks
in>             DependenceAnalysis
>
>
>
> Hello
>
> Interestingly, the example you provide works for me so long as either
it's> a 32bit target, or the array bounds (n and m) are changed to unsigned.
>
> For a bit of history, DA used to have a different delinearisation method
> based on geps, but it was known to be invalid in same cases and was
> eventually removed. There was no delinearisation for a while until these
> checks were fixed, enough to be correct, but not as powerful as they
could> be. I believe Pollys delinearisation is much more powerful, and can
version> loops with runtime conditions.
>
> IIRC, the checks are there for cases like this:
> void foo(unsigned n, unsigned m, int *a) {
>   for (int i = 0; i < n; ++i)
>     for (int j = 0; j < m; ++j) {
>       a[i*m + j] = a[i*m + j+2];
>     }
> }
>
> The "j-2" can underflow into the previous i iterations space. So
the
> distance vector isn't [0 2] (and isn't even [= <]).
Unfortunately this is
> perfectly valid in C (I think for the int a[][m] case too).
>
> See this comment for why they were needed and perhaps a better way to fix
> it:
>
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_llvm_llvm-2Dproject_commit_d143c65de3c884d09197da279d2f04f094efaf15-23diff-2D57473b376037dd3637516348b9b02556L3274&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=aihobyOnVzXW7OPSK1-NiSYQkq7oP3ZSUVc4BemvrVo&m=46eKxI_sFjjeBzn7X-OLXSEUwHN-HVCD16TF9OuyIqc&s=adPvJDhPtFMlaTWihZmvWjXqFUFHDnzcV84oaDGlryM&e>
>
> Any improvements to the delinearisation code would be most welcome.
> Dave
>
>
>
>
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of
Bardia
> Mahjour via llvm-dev <llvm-dev at lists.llvm.org>
> Sent: 13 May 2019 14:48
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] Delinearization validity checks in DependenceAnalysis
>
> Hi all,
>
> I have been looking at the `DependenceAnalysis` pass in
> `llvm/include/llvm/Analysis/DependenceAnalysis.h`.
> In order for this analysis to produce accurate dependence vectors for
> multi-dimensional arrays in nested loops, it needs to
"delinearize" array
> element accesses to recover the subscripts in each dimension of the
array.> I believe that the current implementation of delinearization is based on
> this paper:
>
https://urldefense.proofpoint.com/v2/url?u=http-3A__web.cse.ohio-2Dstate.edu_-7Epouchet.2_doc_ics-2Darticle.15a.pdf&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=aihobyOnVzXW7OPSK1-NiSYQkq7oP3ZSUVc4BemvrVo&m=46eKxI_sFjjeBzn7X-OLXSEUwHN-HVCD16TF9OuyIqc&s=nraVp9R56UlYi0We27kmGSjZGnM296r0HFpbRR62Fzs&e> .
>
> This paper describes how to delinearize the subscripts, and as a last
step> it requires certain conditions to be met in order to validate that the
> delinearized indexes are correct. The `tryDelinearize` function in
> `DependenceAnalysis.cpp` appears to be checking for cases where these
> conditions can be validated *at compile time*:
>
> ```
> // Statically check that the array bounds are in-range. The first
subscript> we
> // don't have a size for and it cannot overflow into another subscript,
so> is
> // always safe. The others need to be 0 <= subscript[i] < bound, for
both
> src
> // and dst.
> // FIXME: It may be better to record these sizes and add them as
> constraints
> // to the dependency checks.
> for (int i = 1; i < size; ++i) {
> if (!isKnownNonNegative(SrcSubscripts[i], SrcPtr))
> return false;
>
> if (!isKnownLessThan(SrcSubscripts[i], Sizes[i - 1]))
> return false;
>
> if (!isKnownNonNegative(DstSubscripts[i], DstPtr))
> return false;
>
> if (!isKnownLessThan(DstSubscripts[i], Sizes[i - 1]))
> return false;
> }
> ```
>
> The problem is that in a lot of cases these conditions cannot be proven
> statically, even though the delinearized indexes are in fact correct. For
> example consider this simple loop:
>
> ```
> void foo(int n, int m, int a[][m]) {
> for (int i = 0; i < n-1; ++i)
> for (int j = 2; j < m; ++j) {
> a[i][j] = a[i+1][j-2];
> }
> }
>
> clang test.c -c -O3 -S -Xclang -disable-llvm-passes -emit-llvm
> opt -mem2reg -instcombine -indvars -loop-simplify -loop-rotate -inline
> -simplifycfg test.ll -S -o test.simp.ll
> opt test.simp.ll -analyze -da
> ```
>
> will produce:
>
> ```
> da analyze - none!
> da analyze - anti [* *|<]!
> da analyze - output [* *]!
> ```
>
> If the validity checks were not present, the result would be much more
> accurate dependence vector with accurate dependence distances:
>
> ```
> da analyze - none!
> da analyze - consistent anti [1 -2]!
> da analyze - none!
> ```
>
> In my experience the static validity checks tend to fail in many common
> cases (involving loops with symbolic bounds). Sometimes this happens
> because SCEV is not able to simplify the expressions due to presence of
> type casts and sign/zero extensions, but other times the conditions are
> just not computable at compile-time.
>
> So far I haven't been able to come up with an example where the
validity
> checks in the current implementation legitimately catch a case of invalid
> delinearization. I've also disabled these checks and run some tests
without> finding any failures (other than improvements in the dependence analysis
> LIT tests).
>
> I also had a quick look at Polly which benefits from delinearization.
From> what I saw, a much more optimistic approach is taken where the validity
> checks seem to be avoided.
>
> My questions are:
> 1. Does anyone have more information about why these validity checks are
> necessary in the current implementation, perhaps with some examples
showing> an incorrect delinearization that's possible without those checks?
> 2. Are there any concerns with putting these validity checks under an
> option so that we can more easily disable them and experiment? This could
> also help us improve LIT tests, since some of them have been pessimised
to> compensate for DA's inability to delinearize, and could fail to catch
> regressions as a result of bad changes to the data dependence analysis.
>
> Looking forward to your help on this.
>
> Thank you,
>
> Bardia Mahjour
> Compiler Optimizations
> Hydra Squad Lead
> IBM Toronto Software Lab
> bmahjour at ca.ibm.com (905) 413-2336
>
>
>
>
>

> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
>https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=aihobyOnVzXW7OPSK1-NiSYQkq7oP3ZSUVc4BemvrVo&m=fUw7wlIqUAftj6s2uRGSJGlKnQDG-pboQ8ywJ5Rttgo&s=iXNavqa8jVkQGLIgSy7xSbPHg_R7_QAlfjUZ4LBeK7U&e


--

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov
    [attachment "da3.c" deleted by Bardia Mahjour/Toronto/IBM]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190522/a1f35492/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190522/a1f35492/attachment.gif>

David Green via llvm-dev

2019-May-22 22:01 UTC

head link

[llvm-dev] Delinearization validity checks in DependenceAnalysis

Hello

Yes, I agree that the SCEV cannot be simplified.  Is my understanding correct
that it is passed to a function like "isKnownNegative"? Which could
still be able to prove is always true.

The delinearisation may be valid, depending on exactly how you define
delinearisation (under what conditions it should be giving results). It would be
invalid for DA to return a dependency of [0 >] though. The code is perfectly
valid from the C or llvm-ir specifications and we can't change the meaning
of it just because we don't like it :)  One of the observable effects of the
program is the last values written into A, and unroll-and-jam will reorder the
stores if it does not understand it is unsafe, leading to different output. i.e
a mis-compile. C isn't very friendly to us here, and llvm is linearising the
access. Some other languages have stronger guarantees about their array
dimensions accesses being in range. But this being a flat C array, there is
nothing out-of-bounds going on.

I'm not sure about just disabling the checks, at least without some other
way of keeping the results correct. Having a debug option probably wouldn't
hurt, but I'm not sure about turning it on for tests if it reduces the test
coverage. Other people may have different opinions though.

Any improvements to DA in general would be excellent, and it has a lot of room
for improvement.
Dave





From: Bardia Mahjour
Sent: 22 May 2019 18:09
To: David Green
Cc: Doerfert, Johannes; llvm-dev at lists.llvm.org; nd
Subject: Re: Re: [llvm-dev] Delinearization validity checks in
DependenceAnalysis
 

Hi David,

Sorry for my delayed response.
> There are already some function like isKnownPredicate that attempt to deal
with some of this is DA. Can we extend the other for these cases? Perhaps with
some extra information that we have that SCEV would in general not know, or do
something extra that it would not compute. Or ideally can we just improve SCEV
to get this right?
It looks like SCEV already tries to simplify identically sign and zero extended
terms in an expression. However expressions that contain mix of sex and zex like
this: `(-3 + (sext i32 %m to i64) + (-1 * (zext i32 %m
to> i64))<nsw>)` cannot be simplified unless we can assume that %m is
never negative. Although in practice %m will not be negative in most cases, the
compiler cannot prove that at compile-time so I don't think we can
"improve" SCEV to catch these cases.
> So this loop:
> void test(int * A, signed N, int M)
> {
>    for(int i = 1; i < N; i+=1) {
>        for(int j = 0; j < M; j+=1) {
>            A[i*M + j] = 2;
>            A[i*M + j - 1] = 4;
>        }
>    }
> }
> Needs to not get the dependency vector [0 >], otherwise unroll and jam
might think it was safe to reorder (unroll and jam) and miscompile the code.
Without the delinearization validity checks, the access functions A[i*M + j] and
A[i*M + j - 1] would get delinearized as follows:

SrcSCEV = {{((4 * %M) + %A)<nsw>,+,(4 *
%M)}<%for.body>,+,4}<%for.body4>
DstSCEV = {{(-4 + (4 * %M) + %A),+,(4 *
%M)}<%for.body>,+,4}<%for.body4>

SrcSubscripts: {1,+,1}<%for.body>{0,+,1}<%for.body4>
DstSubscripts: {1,+,1}<%for.body>{-1,+,1}<%for.body4> delinearized
subscript 0
src = {1,+,1}<%for.body>
dst = {1,+,1}<%for.body>
class = 1
loops = {1}
subscript 1
src = {0,+,1}<%for.body4>
dst = {-1,+,1}<%for.body4>
class = 1
loops = {2}
Separable = {0 1}
Coupled = {}

Why is this not a valid delinearization? The fact that the {-1,+,1} subscript
goes out of bounds with respect to the second dimension is a problem that
originates from the user code (and can be fixed by initializing j to 1 instead
of 0). It is not a problem introduced by delinearization. Nevertheless
delinearization validity checks fail because they find that the subscript in the
second dimension could be negative and think that we didn't delinearize
correctly.

Do you have any objections to adding an option to disable these checks, at least
for the test cases where we know the delinearization would be correct, but
cannot be guaranteed by the compiler?

Do we still need the validity step in the delinearization algorithm, if we could
guarantee that the array size parameters collected (guessed) in the first step
of the algorithm are in fact the correct array dimensions?



David Green ---2019/05/16 03:50:55 PM---Hello Under the proviso that it's
been a while since I looked into any of these things...

From: David Green <David.Green at arm.com>
To: "Doerfert, Johannes" <jdoerfert at anl.gov>, Bardia Mahjour
<bmahjour at ca.ibm.com>
Cc: "llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org>,
nd <nd at arm.com>
Date: 2019/05/16 03:50 PM
Subject: [EXTERNAL] Re: [llvm-dev] Delinearization validity checks in
DependenceAnalysis





Hello

Under the proviso that it's been a while since I looked into any of these
things...

On 05/15, Bardia Mahjour via llvm-dev wrote:> I also get correct results for my example (for a 64-bit target) if the
upper
> bounds are changed to unsigned. The reason is simply because clang
zero-extends
> `m` for address calculations but sign-extends it for the loop upper bound.
This
> prevents SCEV from canceling out the 'm' term from the difference
expression
> that looks like `(-3 + (sext i32 %m to i64) + (-1 * (zext i32 %m to
> i64))<nsw>)`. While we cannot reduce expressions of this form in
general, it
> does pose a sever limitation for the vast majority of cases where the loop
> upper bound is non-negative.
>
There are already some function like isKnownPredicate that attempt to deal with
some of this is DA. Can we extend the other for these cases? Perhaps with some
extra information that we have that SCEV would in general not know, or do
something extra that it would not compute. Or ideally can we just improve SCEV
to get this right?
> Regarding your example, I'm not sure I fully understand the concern and
I
> would appreciate it if you could clarify that for me. My understanding is
> that if the intended shape of 'a' is in fact 'a[n][m]' then
the example, as
> provided, has an out-of-bound access to start with. To avoid this out-bound
> access one would need to change the upper bound of the j-loop to be
'm-2'.
> Interestingly, if we do that, the current validity checks start to pass and
> we get [0 2] as the dependence vector. Here's a slightly modified
version
> of your example that I tried:
>
> ```
> typedef unsigned long long TT;
> void foo(TT n, TT m, int *a) {
>   for (TT i = 0; i < n; ++i)
>     for (TT j = 0; j < m-2; ++j) {
>       a[i*m + j] = a[i*m + j+2];
>     }
> }
> ```
Sure, but is wouldn't be invalid from C to use the "for (TT j = 0; j
< m; ++j)" bound, so long as the size of the object passed through to
"a" was large enough that it didn't overflow. We don't have
enough information to prove this won't happen, so unfortunately we need to
be a conservative.

So this loop:

void test(int * A, signed N, int M)
{
   for(int i = 1; i < N; i+=1) {
       for(int j = 0; j < M; j+=1) {
           A[i*M + j] = 2;
           A[i*M + j - 1] = 4;
       }
   }
}

Needs to not get the dependency vector [0 >], otherwise unroll and jam might
think it was safe to reorder (unroll and jam) and miscompile the code.
>
> If the concern is that the actual intended shape of the array may have been
> something other than a[n][m], and that we are indexing it such that the
> accesses are in-bound with respect to the memory of the whole array but not
> with respect to individual dimensions, then I'm not sure we can reason
> about *any* delinearization statically (except for the limited cases where
> the bounds are compile-time known).
>
> Am I misunderstanding the issue?
>
> Bardia Mahjour
> Compiler Optimizations
> Hydra Squad Lead
> IBM Toronto Software Lab
> bmahjour at ca.ibm.com (905) 413-2336
>
>
>
>
>
> From: David Green <David.Green at arm.com>
> To:   "llvm-dev at lists.llvm.org" <llvm-dev at
lists.llvm.org>, Bardia
>             Mahjour <bmahjour at ca.ibm.com>
> Cc:   nd <nd at arm.com>
> Date: 2019/05/14 02:50 PM
> Subject:      [EXTERNAL] Re: [llvm-dev] Delinearization validity checks in
>             DependenceAnalysis
>
>
>
> Hello
>
> Interestingly, the example you provide works for me so long as either
it's
> a 32bit target, or the array bounds (n and m) are changed to unsigned.
>
> For a bit of history, DA used to have a different delinearisation method
> based on geps, but it was known to be invalid in same cases and was
> eventually removed. There was no delinearisation for a while until these
> checks were fixed, enough to be correct, but not as powerful as they could
> be. I believe Pollys delinearisation is much more powerful, and can version
> loops with runtime conditions.
>
> IIRC, the checks are there for cases like this:
> void foo(unsigned n, unsigned m, int *a) {
>   for (int i = 0; i < n; ++i)
>     for (int j = 0; j < m; ++j) {
>       a[i*m + j] = a[i*m + j+2];
>     }
> }
>
> The "j-2" can underflow into the previous i iterations space. So
the
> distance vector isn't [0 2] (and isn't even [= <]).
Unfortunately this is
> perfectly valid in C (I think for the int a[][m] case too).
>
> See this comment for why they were needed and perhaps a better way to fix
> it:
>
 https://github.com/llvm/llvm-project/commit/d143c65de3c884d09197da279d2f04f094efaf15#diff-57473b376037dd3637516348b9b02556L3274
>
>
> Any improvements to the delinearisation code would be most welcome.
> Dave
>
>
>
>
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of
Bardia
> Mahjour via llvm-dev <llvm-dev at lists.llvm.org>
> Sent: 13 May 2019 14:48
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] Delinearization validity checks in DependenceAnalysis
>
> Hi all,
>
> I have been looking at the `DependenceAnalysis` pass in
> `llvm/include/llvm/Analysis/DependenceAnalysis.h`.
> In order for this analysis to produce accurate dependence vectors for
> multi-dimensional arrays in nested loops, it needs to
"delinearize" array
> element accesses to recover the subscripts in each dimension of the array.
> I believe that the current implementation of delinearization is based on
> this paper:
>  http://web.cse.ohio-state.edu/~pouchet.2/doc/ics-article.15a.pdf
> .
>
> This paper describes how to delinearize the subscripts, and as a last step
> it requires certain conditions to be met in order to validate that the
> delinearized indexes are correct. The `tryDelinearize` function in
> `DependenceAnalysis.cpp` appears to be checking for cases where these
> conditions can be validated *at compile time*:
>
> ```
> // Statically check that the array bounds are in-range. The first subscript
> we
> // don't have a size for and it cannot overflow into another subscript,
so
> is
> // always safe. The others need to be 0 <= subscript[i] < bound, for
both
> src
> // and dst.
> // FIXME: It may be better to record these sizes and add them as
> constraints
> // to the dependency checks.
> for (int i = 1; i < size; ++i) {
> if (!isKnownNonNegative(SrcSubscripts[i], SrcPtr))
> return false;
>
> if (!isKnownLessThan(SrcSubscripts[i], Sizes[i - 1]))
> return false;
>
> if (!isKnownNonNegative(DstSubscripts[i], DstPtr))
> return false;
>
> if (!isKnownLessThan(DstSubscripts[i], Sizes[i - 1]))
> return false;
> }
> ```
>
> The problem is that in a lot of cases these conditions cannot be proven
> statically, even though the delinearized indexes are in fact correct. For
> example consider this simple loop:
>
> ```
> void foo(int n, int m, int a[][m]) {
> for (int i = 0; i < n-1; ++i)
> for (int j = 2; j < m; ++j) {
> a[i][j] = a[i+1][j-2];
> }
> }
>
> clang test.c -c -O3 -S -Xclang -disable-llvm-passes -emit-llvm
> opt -mem2reg -instcombine -indvars -loop-simplify -loop-rotate -inline
> -simplifycfg test.ll -S -o test.simp.ll
> opt test.simp.ll -analyze -da
> ```
>
> will produce:
>
> ```
> da analyze - none!
> da analyze - anti [* *|<]!
> da analyze - output [* *]!
> ```
>
> If the validity checks were not present, the result would be much more
> accurate dependence vector with accurate dependence distances:
>
> ```
> da analyze - none!
> da analyze - consistent anti [1 -2]!
> da analyze - none!
> ```
>
> In my experience the static validity checks tend to fail in many common
> cases (involving loops with symbolic bounds). Sometimes this happens
> because SCEV is not able to simplify the expressions due to presence of
> type casts and sign/zero extensions, but other times the conditions are
> just not computable at compile-time.
>
> So far I haven't been able to come up with an example where the
validity
> checks in the current implementation legitimately catch a case of invalid
> delinearization. I've also disabled these checks and run some tests
without
> finding any failures (other than improvements in the dependence analysis
> LIT tests).
>
> I also had a quick look at Polly which benefits from delinearization. From
> what I saw, a much more optimistic approach is taken where the validity
> checks seem to be avoided.
>
> My questions are:
> 1. Does anyone have more information about why these validity checks are
> necessary in the current implementation, perhaps with some examples showing
> an incorrect delinearization that's possible without those checks?
> 2. Are there any concerns with putting these validity checks under an
> option so that we can more easily disable them and experiment? This could
> also help us improve LIT tests, since some of them have been pessimised to
> compensate for DA's inability to delinearize, and could fail to catch
> regressions as a result of bad changes to the data dependence analysis.
>
> Looking forward to your help on this.
>
> Thank you,
>
> Bardia Mahjour
> Compiler Optimizations
> Hydra Squad Lead
> IBM Toronto Software Lab
> bmahjour at ca.ibm.com (905) 413-2336
>
>
>
>
>

> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov
   [attachment "da3.c" deleted by Bardia Mahjour/Toronto/IBM]


-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: graycol.gif
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190522/ac045836/attachment.gif>

Apparently Analagous Threads

Search for more reasonably related threads

llvm dev - May 2019 - Delinearization validity checks in DependenceAnalysis

[llvm-dev] Delinearization validity checks in DependenceAnalysis

[llvm-dev] Delinearization validity checks in DependenceAnalysis

[llvm-dev] Delinearization validity checks in DependenceAnalysis

Apparently Analagous Threads