thr3ads.net - llvm dev - [llvm-dev] [RFC] A new multidimensional array indexing intrinsic [Jul 2019]

If this information is useful, please help other people find it:
Share via:

Michael Kruse via llvm-dev

2019-Jul-22 15:59 UTC

[llvm-dev] [RFC] A new multidimensional array indexing intrinsic

Am Mo., 22. Juli 2019 um 10:50 Uhr schrieb Doerfert, Johannes
<jdoerfert at anl.gov>:> Why introduce a new intrinsic (family)? It seems that would require us
> to support GEPs and GEP + "multi-dim" semantics in various
places. What is
> the benefit over a GEP extension?
Adding an intrinsic is easier than adding or extending an existing
instruction, as suggested by
https://llvm.org/docs/ExtendingLLVM.html#introduction-and-warning

Extending GEP would require all passes to understand the added
semantics on day 1, while a new intrinsic allows us to gradually add
support.
We can still make the intrinsic into an instruction when existing
passes understand the new functionality.

> > However, alas, this is illegal, for the C language does not provide
> > semantics that allow the final inference above. It is conceivable that
> > `x1 != x2, y1 != y2`, but the indices do actually alias, since
> > according to C semantics, the two indices alias if the _flattened
> > representation of the indices alias_. Consider the parameter
> > values:
> >
> > ```
> > n = m = 3
> > x1 = 1, y1 = 0; B[x1][y1] = nx1+y1 = 3*1+0=3
> > x2 = 0, y2 = 3; B[x2][y2] = nx2+y2 = 3*0+3=3
> > ```
> >
> > Hence, the array elements `B[x1][y1]` and `B[x2][y2]` _can alias_, and
> > so the transformation proposed in `ex1_opt` is unsound in general.
>
> I'm unsure your example actually showcases the problem:
>
> C standard, N1570 draft, Page 560, Appendix J.2 Undefined Behavior:
>
> An array subscript is out of range, even if an object is apparently
> accessible with the given subscript (as in the lvalue expression a[1][7]
> given the declaration inta[4][5]) (6.5.6).

GEP requires an array type, which in LLVM has static size. For
dynamically-sized array (In C/C++ language family, this is only
supported by C99 VLAs), clang emits the the linearized index
expression followed by a single-dimensional GEP.

The multi-dimensional GEP instruction has this semantics with the
`inrange` modifier, which clang currently does not generate for
subscript expressions. Given that there might be code around relying
on this behavior, and the C++ standard not containing this paragraph,
I am not sure we can change it to undefined behavior, even if the C
standard would permit it. I think this is a separate discussion since
the main motivation of the RFC are languages that have broader use of
runtime-length arrays.

But for the showcase, you are right in that according to the C
standard, this aliasing might be undefined behavior.

Michael

Doerfert, Johannes via llvm-dev

2019-Jul-22 16:55 UTC

head link

[llvm-dev] [RFC] A new multidimensional array indexing intrinsic

On 07/22, Michael Kruse wrote:> Am Mo., 22. Juli 2019 um 10:50 Uhr schrieb Doerfert, Johannes
> <jdoerfert at anl.gov>:
> > Why introduce a new intrinsic (family)? It seems that would require us
> > to support GEPs and GEP + "multi-dim" semantics in various
places. What is
> > the benefit over a GEP extension?
> 
> Adding an intrinsic is easier than adding or extending an existing
> instruction, as suggested by
> https://llvm.org/docs/ExtendingLLVM.html#introduction-and-warning
They compare *new intrinsic* vs *new instruction*, not *new intrinsic*
vs *extending an instruction*. Also, this is motivated as a missing core
functionality, the discussion if this should have native support from
day one should take place even if it means more work.

> Extending GEP would require all passes to understand the added
> semantics on day 1, while a new intrinsic allows us to gradually add
> support.  We can still make the intrinsic into an instruction when
> existing passes understand the new functionality.
Here, extending GEP means *restricting* the semantics. So, we could add
it in a way that would not require any pass to understand the semantics,
e.g., we drop "multi-dim" whenever a GEP is modified, cloned, etc.
Preferably, we would change the locations that modify GEPs (inplace),
which I believe is reasonable.

Arguably, for any real use case, you need to use extended GEPs or modify
quite a few passes to get the performance you want. Everything that
analyzes GEPs nowadays needs to be aware of the intrinsics because the
pointer is otherwise opaque, which makes it basically impossible to
transform.

If modifying the GEP is not viable, for any reason, we should
investigate ways that allows to use GEPs anyway. One method could be
to translate the following code into the IR shown below.

type A[n][m];
x = A[i][j];

%ptr = gep %A, %i, %j
%md_ptr = llvm.multidim.access(%A, %n, %m, %i, %j)
%cmp = icmp eq %ptr, %md_ptr
llvm.assume(%cmp)
%x = load %ptr

or maybe:

%ptr = gep %A, %i, %j
%md_ptr = llvm.multidim.access(%A, %n, %m, %i, %j)
%x = load %ptr, mutli-dim %md_ptr


> > > However, alas, this is illegal, for the C language does not
> > > provide semantics that allow the final inference above. It is
> > > conceivable that `x1 != x2, y1 != y2`, but the indices do
actually
> > > alias, since according to C semantics, the two indices alias if
> > > the _flattened representation of the indices alias_. Consider the
> > > parameter values:
> > >
> > > ``` n = m = 3 x1 = 1, y1 = 0; B[x1][y1] = nx1+y1 = 3*1+0=3 x2 =
0,
> > > y2 = 3; B[x2][y2] = nx2+y2 = 3*0+3=3 ```
> > >
> > > Hence, the array elements `B[x1][y1]` and `B[x2][y2]` _can
alias_,
> > > and so the transformation proposed in `ex1_opt` is unsound in
> > > general.
> >
> > I'm unsure your example actually showcases the problem:
> >
> > C standard, N1570 draft, Page 560, Appendix J.2 Undefined Behavior:
> >
> > An array subscript is out of range, even if an object is apparently
> > accessible with the given subscript (as in the lvalue expression
> > a[1][7] given the declaration inta[4][5]) (6.5.6).
> 
> 
> GEP requires an array type, which in LLVM has static size. For
> dynamically-sized array (In C/C++ language family, this is only
> supported by C99 VLAs), clang emits the the linearized index
> expression followed by a single-dimensional GEP.
> 
> The multi-dimensional GEP instruction has this semantics with the
> `inrange` modifier, which clang currently does not generate for
> subscript expressions. Given that there might be code around relying
> on this behavior, and the C++ standard not containing this paragraph,
> I am not sure we can change it to undefined behavior, even if the C
> standard would permit it. I think this is a separate discussion since
> the main motivation of the RFC are languages that have broader use of
> runtime-length arrays.
> 
> But for the showcase, you are right in that according to the C
> standard, this aliasing might be undefined behavior.
> 
> Michael
-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190722/d5ba6440/attachment.sig>

Michael Kruse via llvm-dev

2019-Jul-23 01:05 UTC

head link

[llvm-dev] [RFC] A new multidimensional array indexing intrinsic

After having spoken to Johannes, I think we had a classic
misunderstanding on what "extending" means.


1.
The most obvious why for me was changing GEP to allow variable-sized
multi-dimensional arrays in the first argument, such as

    %1 = getelementptr double, double* %ptr, inrange i64 %i, inrange i64 %j

(normally GEP would only allow a single index argument for a
pointer-typed base pointer).
Since %A has variable size, there is not enough information to compute
the result, we need to pass at least the stride of the innermost
dimension, such as:

    %1 = getelementptr double, double* %A, inrange i64 %i, inrange i64
%j, i64 %n

It should be clear that this breaks a lot of assumptions then went
into writing code that handle GEPs, not only the number of arguments
the GEP should have. As a result, existing code may trigger
assertions, crash, or miscompile when encountering such a modified
GEP. I think it is unfeasible to change all code to properly handle
the new form at once.


2.
Johannes' interpretation is to add some kind of metadata to GEPs, in
addition to the naive computation, such as:

    %offset1= mul i64. %i, %n
    %offset2 = add i64, %j, %offset1
    %1 = getelementptr double, double* %A, inrange i64 %offset2 [
"multi-dim"(i64 %n) ]

This was not obvious to me. The code above uses operand bundle syntax.
During our discussing for this RFC we briefly discussed metadata,
which unfortunately do not allow referencing local SSA values.


3.
For completeness, here is Johannes other suggestion without modifying
GEP/Load/Store:

    %offset1= mul i64. %i, %n
    %offset2= add i64, %j, %offset1
    %1 = getelementptr double, double* %A, inrange i64 %offset2
    %2 = llvm.multidim.access.pdouble.pdouble.i64.i64.i64.i64(double*
%A, i64 %n, i64 %m, i64 %i, i64 %j)
    %cmp = icmp eq %1, %2
    call void @llvm.assume(i1 %cmp)


The main motivation is to avoid that unprepared analyses such as
BasicAA have to give up when encountering the new intrinsic. 2. and 3.
only add things around as they were before such multidimensional
accesses.

Subjectively, I don't think that having to give up is such a big
problem. In the beginning, there will be no frontend that generates
the new intrinsic. More important passes such as BasicAA and
ScalarEvolution can be adapted before even Chapel would emit such
intrinsics. In the long term I would indeed try to fuse it with GEP
when all there is to do is changing the code determining whether it is
a multi-dimensional intrinsic to determining whether it has
variable-length form.


Michael

Philip Reames via llvm-dev

2019-Jul-23 16:31 UTC

head link

[llvm-dev] [RFC] A new multidimensional array indexing intrinsic

On 7/22/19 8:59 AM, Michael Kruse via llvm-dev wrote:> Am Mo., 22. Juli 2019 um 10:50 Uhr schrieb Doerfert, Johannes
> <jdoerfert at anl.gov>:
>> Why introduce a new intrinsic (family)? It seems that would require us
>> to support GEPs and GEP + "multi-dim" semantics in various
places. What is
>> the benefit over a GEP extension?
> Adding an intrinsic is easier than adding or extending an existing
> instruction, as suggested by
> https://llvm.org/docs/ExtendingLLVM.html#introduction-and-warning
In this case, that's probably bad advice.>
> Extending GEP would require all passes to understand the added
> semantics on day 1, while a new intrinsic allows us to gradually add
> support.
> We can still make the intrinsic into an instruction when existing
> passes understand the new functionality.
>
>
>>> However, alas, this is illegal, for the C language does not provide
>>> semantics that allow the final inference above. It is conceivable
that
>>> `x1 != x2, y1 != y2`, but the indices do actually alias, since
>>> according to C semantics, the two indices alias if the _flattened
>>> representation of the indices alias_. Consider the parameter
>>> values:
>>>
>>> ```
>>> n = m = 3
>>> x1 = 1, y1 = 0; B[x1][y1] = nx1+y1 = 3*1+0=3
>>> x2 = 0, y2 = 3; B[x2][y2] = nx2+y2 = 3*0+3=3
>>> ```
>>>
>>> Hence, the array elements `B[x1][y1]` and `B[x2][y2]` _can alias_,
and
>>> so the transformation proposed in `ex1_opt` is unsound in general.
>> I'm unsure your example actually showcases the problem:
>>
>> C standard, N1570 draft, Page 560, Appendix J.2 Undefined Behavior:
>>
>> An array subscript is out of range, even if an object is apparently
>> accessible with the given subscript (as in the lvalue expression
a[1][7]
>> given the declaration inta[4][5]) (6.5.6).
>
> GEP requires an array type, which in LLVM has static size.This is false.  See the last paragraph in 
http://llvm.org/docs/LangRef.html#array-type> For
> dynamically-sized array (In C/C++ language family, this is only
> supported by C99 VLAs), clang emits the the linearized index
> expression followed by a single-dimensional GEP.
This could be changed if desired.>
> The multi-dimensional GEP instruction has this semantics with the
> `inrange` modifier, which clang currently does not generate for
> subscript expressions. Given that there might be code around relying
> on this behavior, and the C++ standard not containing this paragraph,
> I am not sure we can change it to undefined behavior, even if the C
> standard would permit it. I think this is a separate discussion since
> the main motivation of the RFC are languages that have broader use of
> runtime-length arrays.
>
> But for the showcase, you are right in that according to the C
> standard, this aliasing might be undefined behavior.
>
> Michael
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

llvm dev - Jul 2019 - [RFC] A new multidimensional array indexing intrinsic

[llvm-dev] [RFC] A new multidimensional array indexing intrinsic

[llvm-dev] [RFC] A new multidimensional array indexing intrinsic

[llvm-dev] [RFC] A new multidimensional array indexing intrinsic

[llvm-dev] [RFC] A new multidimensional array indexing intrinsic