2009/11/8 Chris Lattner <clattner at apple.com>:
> The first step is loop dependence analysis. This is required to determine
> loop reuse information and is the basis for a lot of vectorization and
> parallelization loop transformations.
I suppose all dependencies can be determined with function passes and
module-wide analysis.
LLVM does unroll small loops, but once the number of iterations is too
big, it does not even attempt to unroll by a multiple of the iteration count.
for(i=0;i<4;++i) unrolls to four flat calls but
for(i=0;i<400;++i) doesn't unroll to 100 iterations of four flat calls...
Is there any vector instruction in the IR? Or does it need to go as
metadata for the codegen routines? So, instead of unrolling at the IR
level, we could have some MISD/SIMD instructions covering the whole range
and let the codegen decide what low-level instructions to use in each
case. So, a processor without VFP would unroll the loop, while one
with VFP could use the VFP instructions instead of unrolling.
Collapsing memset-like loop:
multistore i32 %value, [ 400 x i32 ]* %array
Collapsing memcpy-like loop:
multicopy [ 400 x i32 ]* %orig, [ 400 x i32 ]* %dest
Like MSVC, we could also detect pointer-copy loops and convert them to a
memcpy call. If a loop is executed more than a few times, it might be
better (if space optimisations are not on) to create a region in
memory to copy from with memcpy. This is particularly useful for
repetitive calls that reset an array for the next iteration of a
specific parallel computation.
In that case, instead of creating new instructions, we could use those
functions, inline them as often as possible and optimise them to VFP
instructions later.
cheers,
--renato
Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm