I am asking for some collective wisdom/guidance.

What sort of IR construct should one use to implement filling each element in an array (or vector) with the same value? In C++, this might arise in "std::fill" or "std::fill_n" when the element values in the vector are identical. In the D language, one can fill an array or a slice of an array by an assignment, e.g. "A[2..10] = 42;".

1. What I would prefer is an explicit intrinsic, call it "llvm.fill.*", that would work similarly to the "llvm.memset.*" intrinsic. The memset intrinsic only works with a single repeated byte value, but provides wonderful optimizations in the various code generators. Hopefully, similar optimizations would be implemented for "llvm.fill.*".

2. Given that I probably won't get my wish, I note that some front-ends use vector assignment:

   store <8 x i16> <i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42>, <8 x i16>* %14, align 2

Does this work well for architectures without SIMD? What chunk size should be used for the vector, and is that architecture dependent?

3. If vectors are not used, but rather an explicit loop of stores, element-by-element, will this be recognized as an idiom for architecture-dependent optimizations?

Thanks in advance.
Hi,

An alternative is to perform what is done for the equivalent C construct:

void foo() {
    char bar[20] = "hello";
}

->

@foo.bar = private unnamed_addr constant [20 x i8] c"hello\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00", align 16
define void @foo() #0 {
  %1 = alloca [20 x i8], align 16
  %2 = bitcast [20 x i8]* %1 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* getelementptr inbounds ([20 x i8], [20 x i8]* @foo.bar, i32 0, i32 0), i64 20, i32 16, i1 false)
  ret void
}

— Mehdi

> On Nov 10, 2016, at 1:25 PM, Bagel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...]
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> On Nov 10, 2016, at 1:30 PM, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...]

Obviously that isn't great for patterns, and LLVM can recognize patterns that fit memset_pattern on targets that support it, which looks closer to what you'd like, I think.

— Mehdi
Yes, I know this works peachy keen for char arrays. I'm looking at something like this (which is hard to express in C):

void foo() {
    int bar[20] = { 42, 42, ..., 42 };
}

I don't want to do a memcpy of the 20-element constant array, and memset doesn't work here. I want an intrinsic that copies the scalar int constant 42 to each element of the int array.

bagel

On 11/10/2016 03:30 PM, Mehdi Amini wrote:
> [...]
Take a look at memset (byte patterns) and memset_patternX (multi-byte patterns, currently supported only for selected targets). In general, support for fill idioms is something we could stand to improve, and it's something I or someone on my team is likely to be working on within the next year.

Today, the naive store loop is probably your best choice to have emitted by the frontend. This loop will be nicely vectorized by the loop vectorizer, specialized if the loop length is known to be small, and otherwise decently handled. The only serious problem with this implementation strategy is that you end up with many copies of the fill loop scattered throughout your code (code bloat). (I'm assuming this gets aggressively inlined. If it doesn't, well, then there are bigger problems.)

Moving forward, any further support we added would definitely handle pattern matching the naive loop constructs. Given that, it's also reasonably future proof as well.

Philip

On 11/10/2016 01:25 PM, Bagel via llvm-dev wrote:
> [...]
Philip, thank you for your comments. I already use memset for byte patterns but was unaware of memset_patternX; I will look into it. Based on your observations, I guess I will go ahead with the naive store loop approach for now and hope for "llvm.fill.*" in the future.

Thanks, bagel.

On 11/25/2016 05:46 PM, Philip Reames wrote:
> [...]