thr3ads.net - llvm dev - [LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata) [Feb 2013]


Hal Finkel

2013-Feb-19 15:51 UTC

[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)

----- Original Message -----
> From: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi>
> To: "Nadav Rotem" <nrotem at apple.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Tobias Grosser" <tobias at grosser.es>,
> "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, February 19, 2013 2:27:09 AM
> Subject: Re: [LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
> 
> Hi,
> 
> On 02/19/2013 02:31 AM, Nadav Rotem wrote:
> > vectorization. However, I don't completely understand something. If we
> > already have the information that consecutive iterations of the loops
> > are independent, then the loop vectorizer should already vectorize the
> > loop.
> 
> Yes, the loop vectorizer should be a better match for the parallel
> (inner) loops.
> 
> That's why I've been pushing the parallel loop metadata: to move towards
> using the generic loop vectorizer instead of the hacked bbvectorizer for
> work-group autovectorization in pocl. Unfortunately, it still needs some
> more work to be efficient for this purpose (as discussed), but a step
> towards it has now been made, and it can vectorize some work-group
> functions with pocl. That's good.
> 
> BTW, there's at least one thing the bbvectorizer handles better now:
> intra-kernel/function vector datatypes. IIRC, it doesn't choke when it
> sees vectors already present in the input, but calmly tries to combine
> multiple vector instructions into a larger one. Thus, it might be useful
> in cases where the loop at hand is not nicely and cleanly vectorizable
> (e.g., nasty memory access patterns) to still provide some level of
> vector ISA utilization.
Indeed, that's the idea.
> 
> Hal, this OpenCL WG autovectorization work, unfortunately, is not my
> first priority task at work currently (more like a pet project), so I
> cannot make any promises on when I might find time to work on it again.
> So, if you want to see the parallel loop iteration MD happen sooner, I'd
> recommend you dig into it. I think we'd like to start from scratch with
> the bbvectorizer utilization in pocl anyway, that is, add the metadata
> support first and then use it in a fresh bbvectorizer version. The
> current hacked version in pocl does not seem easily upstreamable, as it
> has lagged behind by some LLVM versions and is rather dirty.
Understood. If you have some time, it seems that there are several sub-tasks:

 - Update the language reference
 - Update the loop vectorizer (to update the metadata when it unrolls)
 - Update the regular unroller
 - Update the alias analysis (maybe this is sufficient for basic support in
   BBVectorize) - is your current code close enough for this?
 - Update the BB vectorizer to prefer pairings from different iterations

Thanks again,
Hal

Pekka Jääskeläinen

2013-Feb-19 16:25 UTC


[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)

On 02/19/2013 05:51 PM, Hal Finkel wrote:
> Understood. If you have some time, it seems that there are several
> sub-tasks:
>
>   - Update the language reference
Document the additional optional iteration id argument to
llvm.mem.parallel_loop_access? I'll do this.
>   - Update the loop vectorizer (to update the metadata when it unrolls)
>   - Update the regular unroller
I'll first update pocl's work-item replicator (whose output is effectively
the same as a fully unrolled parallel wiloop) to mark the iterations with the
iteration argument, and see how far the upstream BBVectorizer gets with WG
vectorization.
>   - Update the alias analysis (maybe this is sufficient for basic support
> in BBVectorize) - is your current code close enough for this?
The AA should be trivial. I can take a look at this too and try
to prepare a patch. This can be the first consumer of the optional
parallel_loop_access argument, so it fits in the same patch.

The work-item AA implementation seems to be rather clean in pocl so it
should not require much additional work to generalize it to support
parallel loop iterations of any kind:

http://bazaar.launchpad.net/~pocl/pocl/trunk/view/head:/lib/llvmopencl/WorkItemAliasAnalysis.cc
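As a rough illustration of what generalizing the work-item AA could mean, here is a minimal Python sketch. It is not pocl's code, which is C++ at the link above. It assumes each memory access carries a hypothetical mapping from parallel loop id to iteration id. Two accesses that come from different iterations of a parallel loop they both belong to are treated as independent, which is the case where such an AA could answer NoAlias. All names are made up for illustration.

```python
class Access:
    """Hypothetical record of a memory access and the parallel loops it is in."""

    def __init__(self, name, iterations):
        self.name = name
        # Maps each enclosing parallel loop id to the iteration the access came from.
        self.iterations = iterations


def from_different_parallel_iterations(a, b):
    """True if a and b share a parallel loop but come from different iterations of it.

    Parallel loop iterations have no memory dependencies on each other, so a
    parallel-loop-aware AA could report such a pair as not aliasing. Otherwise
    nothing is concluded and the query is left to the other AA passes.
    """
    shared_loops = a.iterations.keys() & b.iterations.keys()
    return any(a.iterations[loop] != b.iterations[loop] for loop in shared_loops)


# Two work-items of a fully replicated work-group loop ("wiloop" is a made-up id).
store_wi0 = Access("store wi0", {"wiloop": 0})
store_wi1 = Access("store wi1", {"wiloop": 1})
load_wi0 = Access("load wi0", {"wiloop": 0})

print(from_different_parallel_iterations(store_wi0, store_wi1))  # True
print(from_different_parallel_iterations(store_wi0, load_wi0))   # False
```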

I might try to upstream the OpenCL disjoint address space AA part
while I'm at it. It can use the OpenCL kernel metadata to check that
the function being processed is an OpenCL kernel.
>   - Update the BB vectorizer to prefer pairings from different iterations
I'd leave this as the last task. It might find good enough pairings with
the additional AA alone, as you wrote. Let's see.

-- 
Pekka

Vladimir Guzma

2013-Feb-20 03:44 UTC


[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)

Hi all,
> Understood. If you have some time, it seems that there are several
> sub-tasks:
> 
> - Update the language reference
> - Update the loop vectorizer (to update the metadata when it unrolls)
> - Update the regular unroller
> - Update the alias analysis (maybe this is sufficient for basic support in
> BBVectorize) - is your current code close enough for this?
The current AA implementation uses work-item metadata as well as 'region'
metadata identifiers (regions being separated by barriers). To provide similar
functionality, the parallel loop metadata would therefore need both a 'loop ID'
and a 'loop iteration ID'.

This is perhaps not an immediate concern for the current discussion, but it can
become a problem when two consecutive loops are both marked with parallel loop
metadata and both fully unrolled (or partially unrolled and then merged by some
loop fusion/combining pass). In that case a 'loop ID' is needed to distinguish
the origin of each access, since the 'loop iteration ID' alone is not enough.
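The failure case described above can be made concrete with a small Python sketch. It assumes each access is tagged with a hypothetical (loop id, iteration id) pair. In the example, loop B reads in its iteration 0 an element that loop A wrote in its iteration 1. Once both loops are fully unrolled into one block, comparing iteration ids alone would wrongly declare the two accesses independent.

```python
# Hypothetical tags (loop id, iteration id) after fully unrolling two
# consecutive parallel loops into the same basic block:
#   loop A: for i in range(2): buf[i] = ...
#   loop B: for i in range(2): out[i] = buf[i + 1]
store_a_iter1 = ("loopA", 1)  # writes buf[1]
load_b_iter0 = ("loopB", 0)   # reads buf[1], after loop A has finished


def independent_by_iteration_only(a, b):
    # Unsound: different iteration ids are trusted even across different loops.
    return a[1] != b[1]


def independent_by_loop_and_iteration(a, b):
    # Different iterations only count when they belong to the same parallel loop.
    return a[0] == b[0] and a[1] != b[1]


print(independent_by_iteration_only(store_a_iter1, load_b_iter0))      # True (wrong)
print(independent_by_loop_and_iteration(store_a_iter1, load_b_iter0))  # False
```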
> - Update the BB vectorizer to prefer pairings from different iterations
Updating the BB vectorizer to use work-item metadata was a rather trivial
addition of a test for a difference in identifiers, very similar to the one in
the AA (though we also record the position of the instruction in the
originating block), and it should be equally trivial to add to BBVectorizer
using the parallel loop metadata.
It became a major mess once complaints about speed started: the pocl version
does not have any search limits, maximum instructions per group, etc., so
finding all candidate pairs became rather time consuming.
So the candidate selection is not compatible with the BB vectorizer, and a whole
bunch of code was removed…
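The identifier test mentioned above might look roughly like the following Python sketch. The fields (region id, work-item id, position in the originating block) are assumptions modeled on the description in this message, not pocl's actual data structures. Two instructions are pairing candidates only if they are the same operation at the same position in the same region, but come from different work-items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    opcode: str
    region: int     # hypothetical: id of the barrier-delimited region
    work_item: int  # hypothetical: work-item (loop iteration) id
    position: int   # hypothetical: index in the originating (pre-replication) block


def is_candidate_pair(a, b):
    """Pair only the "same" instruction replicated for two different work-items."""
    return (
        a.opcode == b.opcode
        and a.region == b.region
        and a.position == b.position
        and a.work_item != b.work_item
    )


load_wi0 = Instr("load", region=0, work_item=0, position=3)
load_wi1 = Instr("load", region=0, work_item=1, position=3)
add_wi1 = Instr("add", region=0, work_item=1, position=4)

print(is_candidate_pair(load_wi0, load_wi1))  # True
print(is_candidate_pair(load_wi0, add_wi1))   # False
```

The position check encodes the assumption that paired instructions really are identical across work-items, which does not hold for general basic-block vectorization.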

Anyway, parts that might be interesting to integrate into BBVectorizer include
the (crude) caching during replaceOutputs, used when vectorizing phi nodes.
There is also some vectorization of getelementptr instructions, creation of
vectors of allocas to get better vector memory accesses, some magic for
computing addresses of strided memory accesses using vectors, some tweaks to
eliminate unneeded shuffle instructions in replacement inputs, etc. There are a
lot of assumptions that the instructions to be vectorized are really identical
across different work items (due to the recorded position in the originating
code), which may not be the case for general BB vectorization.

Anyway, if the loop metadata gets updated, I can have a look at updating the AA
and moving it from pocl to LLVM, but probably not this week (maybe Pekka can
provide it sooner if there is a rush).
I cannot make any promises about BB vectorization at the moment, unfortunately.

regards
Vlado

Hal Finkel

2013-Feb-20 04:24 UTC


[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)

----- Original Message -----
> From: "Vladimir Guzma" <vladimir.guzma at tut.fi>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi>, "Tobias Grosser" <tobias at grosser.es>,
> "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, February 19, 2013 9:44:02 PM
> Subject: Re: [LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
> 
> Hi all,
> 
> The current AA implementation uses work-item metadata as well as 'region'
> metadata identifiers (regions being separated by barriers). To provide
> similar functionality, the parallel loop metadata would therefore need
> both a 'loop ID' and a 'loop iteration ID'.
> 
> This is perhaps not an immediate concern for the current discussion, but
> it can become a problem when two consecutive loops are both marked with
> parallel loop metadata and both fully unrolled (or partially unrolled and
> then merged by some loop fusion/combining pass). In that case a 'loop ID'
> is needed to distinguish the origin of each access, since the 'loop
> iteration ID' alone is not enough.
Agreed, but I think that the current loop.parallel metadata scheme already has
that. The memory access metadata refers to the metadata marking the loop
backedges.
> 
> > - Update the BB vectorizer to prefer pairings from different
> > iterations
> 
> Updating the BB vectorizer to use work-item metadata was a rather
> trivial addition of a test for a difference in identifiers, very similar
> to the one in the AA (though we also record the position of the
> instruction in the originating block), and it should be equally trivial
> to add to BBVectorizer using the parallel loop metadata.
> It became a major mess once complaints about speed started: the pocl
> version does not have any search limits, maximum instructions per group,
> etc., so finding all candidate pairs became rather time consuming.
> So the candidate selection is not compatible with the BB vectorizer, and
> a whole bunch of code was removed…
Interesting.
> 
> Anyway, parts that might be interesting to integrate into BBVectorizer
> include the (crude) caching during replaceOutputs, used when vectorizing
> phi nodes. There is also some vectorization of getelementptr
> instructions, creation of vectors of allocas to get better vector memory
> accesses, some magic for computing addresses of strided memory accesses
> using vectors, some tweaks to eliminate unneeded shuffle instructions in
> replacement inputs, etc. There are a lot of assumptions that the
> instructions to be vectorized are really identical across different work
> items (due to the recorded position in the originating code), which may
> not be the case for general BB vectorization.
To clarify, are these features that you've implemented in your version?
> 
> Anyway, if the loop metadata gets updated, I can have a look at
> updating the AA and moving it from pocl to LLVM, but probably not this
> week (maybe Pekka can provide it sooner if there is a rush).
> I cannot make any promises about BB vectorization at the moment, unfortunately.
Great, thanks!

 -Hal

Pekka Jääskeläinen

2013-Mar-12 11:02 UTC


[LLVMdev] Pointer Context Metadata

Hi,
On 02/19/2013 06:25 PM, Pekka Jääskeläinen wrote:
> On 02/19/2013 05:51 PM, Hal Finkel wrote:
>> Understood. If you have some time, it seems that there are several
>> sub-tasks:
>>
>> - Update the language reference
>
> Document the additional optional iteration id argument to
> llvm.mem.parallel_loop_access? I'll do this.
I finally found some time to think about this. My current idea for marking the
unrolled parallel iterations is the following.

In the current llvm.mem.parallel_loop_access metadata format the accesses
are marked like this (a store in a 2-level nested loop):

...
store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
...

!0 = metadata !{ metadata !1, metadata !2 } ; a list of parallel loop identifiers
!1 = metadata !{ metadata !1 } ; an identifier for the inner parallel loop
!2 = metadata !{ metadata !2 } ; an identifier for the outer parallel loop

The idea was to add an optional extra iteration identifier to the MD of the
parallel_loop_access node. In essence, we want to identify the loop and
the unrolled iteration of each access to help (at least) the alias analysis.

Say we unroll the inner loop once:

...
store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
...
store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !3
...

!0 = metadata !{ metadata !{ metadata !1, i64 0 }, metadata !2 }
!1 = metadata !{ metadata !1 }  ; loop id inner
!2 = metadata !{ metadata !2 }  ; loop id outer
!3 = metadata !{ metadata !{ metadata !1, i64 1 }, metadata !2 }

The llvm.loop.parallel MD of the loop branch will still point to !1 and !2
directly, because the unrolled iterations should still not alias with accesses
from any (rolled) iteration. The outer loop entry !2 carries no explicit
iteration id, so it implicitly has "iteration id" 0.
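To make the matching rule for this integer-based format explicit, here is a Python sketch. It models an access's parallel_loop_access list as plain Python values, either a bare loop id string or a (loop id, iteration) tuple. That encoding is an assumption of this sketch. Following the description above, a bare loop id is treated as iteration 0.

```python
def normalize(parallel_loop_access):
    """Map loop id -> iteration id; a bare loop id means implicit iteration 0."""
    return dict(
        entry if isinstance(entry, tuple) else (entry, 0)
        for entry in parallel_loop_access
    )


def independent(md_a, md_b):
    """True if the accesses come from different iterations of a shared parallel loop."""
    a, b = normalize(md_a), normalize(md_b)
    return any(a[loop] != b[loop] for loop in a.keys() & b.keys())


# Mirrors !0 and !3 above: "inner" stands for !1 and "outer" for !2.
md0 = [("inner", 0), "outer"]
md3 = [("inner", 1), "outer"]

print(independent(md0, md3))  # True: different unrolled iterations of the inner loop
print(independent(md0, md0))  # False: same iteration, no conclusion
```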

Or should we use the "unique self-referring metadata node" trick here as well
to mark the iteration identifiers? The integer ids are rather arbitrary, and
the unrollers would need to find a unique integer if they decide to unroll
further.

Could the following work more robustly?

!0 = metadata !{ metadata !{ metadata !1, metadata !0 }, metadata !2 }
!1 = metadata !{ metadata !1 }  ; loop id inner
!2 = metadata !{ metadata !2 }  ; loop id outer
!3 = metadata !{ metadata !{ metadata !1, metadata !3 }, metadata !2 }

Here the self-reference back to the "iteration" metadata (!0 and !3) would
force each iteration MD node to be unique. If we now also unroll the outer
loop once, we get:

; for the inner loop iterations:
!0 = metadata !{ metadata !{ metadata !1, metadata !0 }, metadata !2 }
!1 = metadata !{ metadata !1 }  ; loop id inner
!2 = metadata !{ metadata !2 }  ; loop id outer
!3 = metadata !{ metadata !{ metadata !1, metadata !3 }, metadata !2 }

; for the outer loop iterations:
!4 = metadata !{
   metadata !{ metadata !1, metadata !4 },
   metadata !{ metadata !2, metadata !4 } }

!7 = metadata !{
   metadata !{ metadata !1, metadata !3 },
   metadata !{ metadata !2, metadata !7 } }
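The difference between integer ids and self-referring nodes can be sketched in Python. In this model, object identity plays the role of a distinct self-referential MD node; that is an assumption of the model, not how LLVM represents metadata. Creating a new node is enough to get a fresh iteration id, whereas the integer scheme requires the unroller to know which ids are already taken.

```python
class IterationNode:
    """Stand-in for a self-referring iteration MD node: each instance is unique
    and compared by identity, so no numbering is needed to keep ids apart."""

    __slots__ = ()


def fresh_int_id(used_ids):
    # Integer scheme, for contrast: the unroller must scan the ids already in use.
    return max(used_ids, default=-1) + 1


def independent(a, b):
    # a, b map loop id -> iteration node; distinct nodes mean distinct iterations.
    return any(a[loop] is not b[loop] for loop in a.keys() & b.keys())


outer_iter = IterationNode()  # both accesses below share one outer iteration
acc0 = {"inner": IterationNode(), "outer": outer_iter}  # like !0
acc3 = {"inner": IterationNode(), "outer": outer_iter}  # like !3
# After also unrolling the outer loop, a copy gets fresh nodes for both loops.
acc_outer1 = {"inner": IterationNode(), "outer": IterationNode()}

print(fresh_int_id({0, 1}))           # 2
print(independent(acc0, acc3))        # True
print(independent(acc0, acc0))        # False
print(independent(acc0, acc_outer1))  # True
```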

The MD should still fulfill the requirement that passes need not know about
the metadata: if an unroller unknowingly copies the iteration MD, we just
won't get the AA benefits, but nothing should break. The new unrolled
instructions would look like a single parallel loop iteration, whose
accesses may alias with each other. If it doesn't copy the MD, the loop
loses the "parallel property".
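The safety argument can be checked with a small Python model. As in an identity-based model of the self-referring nodes, all names here are hypothetical. An unroller that is unaware of the iteration MD copies it verbatim, so every copy carries the same iteration node. The rule then never claims the copies are independent; they may simply alias, which is the conservative outcome.

```python
class IterationNode:
    """Hypothetical stand-in for a unique (self-referring) iteration MD node."""

    __slots__ = ()


def independent(a, b):
    # Independent only if some shared parallel loop has two distinct iteration nodes.
    return any(a[loop] is not b[loop] for loop in a.keys() & b.keys())


def unaware_unroll(body, factor):
    """Hypothetical unroller that knows nothing about iteration ids: it copies
    each access's metadata verbatim (the copies share the same node objects)."""
    return [dict(access) for _ in range(factor) for access in body]


body = [{"loop": IterationNode()}]
copies = unaware_unroll(body, 2)

print(independent(copies[0], copies[1]))  # False: treated as one iteration, may alias
dropped = {}  # an unroller that drops the MD entirely
print(independent(dropped, copies[0]))    # False: no longer known to be parallel
```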

The llvm.loop.parallel metadata would still work as before: it points only
to the loop id metadata nodes, which the iteration id nodes also point to.

Any comments/thoughts/better ideas before I move forward with this? Next
I plan to emit the MD from pocl and add an alias analyzer to LLVM that
understands it.

Thanks,
-- 
Pekka
