2017 Jun 24
AVX Scheduling and Parallelism
Hello,
After generating AVX code for a large number of iterations, I realized that
it still uses only two registers, zmm0 and zmm1, even when the loop unroll
factor is 1024.
I wonder whether this register allocation allows operations to run in parallel?
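
A minimal sketch of the kind of loop being asked about, assuming an element-wise add over 32-bit integer arrays. The original code was not posted, so the function name `add_i32` and its parameters are hypothetical. With AVX-512 enabled, a compiler may vectorize such a loop into 16-element chunks and reuse the same few register names (for example zmm0 and zmm1) in every unrolled copy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel, not the poster's actual code: element-wise add of
 * 32-bit integers. Each iteration reads only a[i] and b[i] and writes only
 * out[i], so no iteration needs the result of another one. */
void add_i32(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

The iterations are independent of each other in the source, which is the situation the question about parallelism across instructions refers to.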
Also, I know all the elements within a single vector instruction are
computed in parallel, but are the elements of multiple instructions
also computed in parallel? like are