thr3ads.net - llvm dev - [LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata) [Feb 2013]

Hal Finkel

2013-Feb-20 04:24 UTC

[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)

----- Original Message -----
> From: "Vladimir Guzma" <vladimir.guzma at tut.fi>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi>, "Tobias Grosser" <tobias at grosser.es>, "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, February 19, 2013 9:44:02 PM
> Subject: Re: [LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
> 
> Hi all,
> 
> >> Hal, this OpenCL WG autovectorization work, unfortunately, is not my first
> >> priority task at work currently (more like a pet project), so I cannot make
> >> any promises on when I might find time to work on it again. So, if you want
> >> to see the parallel loop iteration MD happen sooner, I'd recommend you dig
> >> into it. I think we'd like to start from scratch for the bbvectorizer
> >> utilization in pocl anyways, that is, we would add the metadata support
> >> first and then use it in a fresh bbvectorizer version. The current hacked
> >> version in pocl seems not to be upstreamable easily as it has lagged behind
> >> some LLVM versions and is rather dirty.
> > 
> > Understood. If you have some time, it seems that there are several
> > sub-tasks:
> > 
> > - Update the language reference
> > - Update the loop vectorizer (to update the metadata when it
> > unrolls)
> > - Update the regular unroller
> > - Update the alias analysis (maybe this is sufficient for basic
> > support in BBVectorize) - is your current code close enough for
> > this?
> 
> Current implementation of AA uses work item metadata as well as
> 'region' metadata identifiers (regions being separated by barriers).
> So in order to provide similar functionality, parallel loop metadata
> would need both a 'loop ID' and a 'loop iteration ID'.
> 
> This is perhaps not much of an immediate concern with respect to the
> current discussion, but it can become trouble in case there are two
> consecutive loops, both marked with parallel loop metadata, both
> fully unrolled (or partially unrolled followed by some loop
> fusion/combining pass). In this case there is a need for a 'loop ID' to
> distinguish origin, since the 'loop iteration ID' is not enough.

Agreed, but I think that the current loop.parallel metadata scheme already has
that. The memory access metadata refers to the metadata marking the loop
backedges.
> 
> > - Update the BB vectorizer to prefer pairings from different
> > iterations
> 
> Updating BB vectorizer to use work item metadata was a rather trivial
> addition of a test for difference in identifiers, very similar to the
> one in AA (though we also record the position of the instruction in the
> originating block), and it should be trivial to add to BBvectorizer as
> well, using parallel metadata.
> It became a major mess once complaints started about speed, e.g. the pocl
> version does not have any search limits or maximum instructions per group
> etc., so finding all candidate pairs became rather time consuming.
> So the candidate selection is not compatible with BB vectorizer, and
> a whole bunch of code was removed…

Interesting.
> 
> Anyways, perhaps interesting parts for integrating into BBVectorizer
> could be the (crude) caching during replaceOutputs to be used when
> vectorizing phi nodes. There is also some vectorization of
> getelementptr instructions, creation of vectors of allocas to get
> better vector memory accesses, some magic about computing addresses
> of strided memory accesses using vectors, some tweaks to eliminate
> unneeded shuffle instructions in replacement inputs, etc. There are
> a lot of assumptions that the instructions to be vectorized are really
> identical from different work items (due to the recorded position in the
> originating code), which may not be the case in general BB vectorized
> cases.

To clarify, are these features that you've implemented in your version?
> 
> Anyways, if the loop metadata gets updated, I can have a look at
> updating AA and moving it from pocl to LLVM, but not likely this
> week (maybe Pekka can provide it sooner if there is a rush).
> I can not make any promises about BBVectorization atm, unfortunately.

Great, thanks!

 -Hal
> 
> regards
> Vlado
> > 
> > Thanks again,
> > Hal
> > 
> >> 
> >> BR,
> >> --
> >> --Pekka
> >> 
> >> 
> > 
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
>
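
For readers following the 'loop ID' versus 'loop iteration ID' point in the message above, here is a minimal C++ sketch of the kind of check being discussed. It is not the pocl or LLVM implementation: the AccessTag struct, its fields, and provablyIndependent are hypothetical names, and the sketch assumes each memory access still carries both identifiers after unrolling.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the metadata attached to one memory access.
// Neither field name comes from LLVM or pocl.
struct AccessTag {
    unsigned loopId;       // which parallel loop the access came from
    unsigned iterationId;  // which iteration of that loop (after unrolling)
};

// Returns true only when the tags alone justify treating the two accesses
// as independent: same parallel loop, different iterations.
// Missing tags or different loops yield false, meaning "no information";
// a real alias analysis would then fall back to its other queries.
bool provablyIndependent(const std::optional<AccessTag>& a,
                         const std::optional<AccessTag>& b) {
    if (!a || !b) return false;
    if (a->loopId != b->loopId) return false;
    return a->iterationId != b->iterationId;
}

// Usage: two consecutive, fully unrolled parallel loops (ids 1 and 2).
void usageExample() {
    AccessTag loop1Iter0{1, 0}, loop1Iter1{1, 1}, loop2Iter1{2, 1};
    // Same loop, different iterations: independent.
    assert(provablyIndependent(loop1Iter0, loop1Iter1));
    // Different loops: differing iteration IDs alone prove nothing.
    assert(!provablyIndependent(loop1Iter0, loop2Iter1));
}
```

Without the loop ID, the second case would compare iteration 0 against iteration 1 and wrongly conclude independence, which is the situation described for two unrolled loops placed back to back.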

Vladimir Guzma

2013-Feb-20 04:37 UTC

[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)

> Agreed, but I think that the current loop.parallel metadata scheme already
> has that. The memory access metadata refers to the metadata marking the loop
> back edges.

Good.

> To clarify, are these features that you've implemented in your version?

Yes, these are there. As well as some stuff to clean up after
vectorizer...
From a performance point of view, the addition of vectors of phi nodes was
most beneficial for our main target (TTA architecture).

regards
Vlado

Hal Finkel

2013-Feb-20 05:33 UTC

[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)

----- Original Message -----
> From: "Vladimir Guzma" <vladimir.guzma at tut.fi>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi>, "Tobias Grosser" <tobias at grosser.es>, "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, February 19, 2013 10:37:38 PM
> Subject: Re: [LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
> 
> > To clarify, are these features that you've implemented in your
> > version?
> 
> Yes, these are there. As well as some stuff to clean up after
> vectorizer...
> From a performance point of view, the addition of vectors of phi nodes was
> most beneficial for our main target (TTA architecture).

It looks like the source is here:
http://bazaar.launchpad.net/~pocl/pocl/trunk/view/head:/lib/llvmopencl/WIVectorize.cc

Are there test cases for these new features?

I'll look at this version; hopefully we'll be able to get most if not
all of the improvements upstream. It looks like you've left your version
under LLVM's license; is that correct? If not, may I have your explicit
permission for relicensing?

I've made some significant performance improvements in BBVectorize recently,
and you may want to adopt those changes in your version as well (especially if
people have complained about speed).

Thanks again,
Hal
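
The candidate-pair test described earlier in the thread (different iteration identifiers, same recorded position in the originating block), combined with a search limit of the kind the pocl version is said to lack, might look roughly like the C++ sketch below. This is an illustration under stated assumptions, not code from WIVectorize.cc or BBVectorize: Candidate, findPairs, and the window parameter are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-instruction record; not a pocl or LLVM type.
struct Candidate {
    unsigned iterationId;  // work item / loop iteration the instruction came from
    unsigned position;     // index of the instruction in the originating block
};

// Collects index pairs (i, j), i < j, whose instructions sit at the same
// original position but come from different iterations. `window` caps how
// far past i the search looks, so the work is bounded by n * window
// comparisons instead of growing quadratically with the block size.
std::vector<std::pair<std::size_t, std::size_t>>
findPairs(const std::vector<Candidate>& insts, std::size_t window) {
    assert(window > 0 && "search window must be positive");
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t i = 0; i < insts.size(); ++i) {
        for (std::size_t j = i + 1; j < insts.size() && j - i <= window; ++j) {
            if (insts[i].position == insts[j].position &&
                insts[i].iterationId != insts[j].iterationId) {
                pairs.emplace_back(i, j);
            }
        }
    }
    return pairs;
}
```

A small window keeps the search cheap but can miss pairs that lie far apart in the unrolled block, so the limit trades vectorization opportunities for compile time.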
