Diego Novillo
2014-Jan-16 18:02 UTC
[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info
On Thu, Jan 16, 2014 at 9:26 AM, Nadav Rotem <nrotem at apple.com> wrote:> Hi Diego, > > It looks like the problem is with the code in the vectorizer that tries to estimate the most profitable vectorization factor: > >> LV: Found an estimated cost of 6 for VF 2 For instruction: %3 = load >> i64* %state, align 8, !dbg !58, !tbaa !61 > > > It looks like a cost model problem. The vectorizer thinks that loading %3 (above) is non consecutive and would require scatter/gather. Is that correct? I wonder that SCEV is reporting. Is there an index overflow problem that is preventing us from loading consecutive elements?Yes, I forgot to mention that. The access is non-consecutive: for(i=0; i<reg->size; i++) { /* Flip the target bit of each basis state */ reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target); } The code is writing to the 'state' field. The data structures look like this: typedef struct quantum_matrix_struct quantum_matrix; struct quantum_reg_node_struct { COMPLEX_FLOAT amplitude; /* alpha_j */ MAX_UNSIGNED state; /* j */ }; typedef struct quantum_reg_node_struct quantum_reg_node; /* The quantum register */ struct quantum_reg_struct { int width; /* number of qubits in the qureg */ int size; /* number of non-zero vectors */ int hashw; /* width of the hash array */ quantum_reg_node *node; int *hash; }; If you do the trick of writing to a separate array, then the loop can be vectorized. Diego.
Nadav Rotem
2014-Jan-16 18:06 UTC
[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info
On Jan 16, 2014, at 10:02 AM, Diego Novillo <dnovillo at google.com> wrote:> On Thu, Jan 16, 2014 at 9:26 AM, Nadav Rotem <nrotem at apple.com> wrote: >> Hi Diego, >> >> It looks like the problem is with the code in the vectorizer that tries to estimate the most profitable vectorization factor: >> >>> LV: Found an estimated cost of 6 for VF 2 For instruction: %3 = load >>> i64* %state, align 8, !dbg !58, !tbaa !61 >> >> >> It looks like a cost model problem. The vectorizer thinks that loading %3 (above) is non consecutive and would require scatter/gather. Is that correct? I wonder that SCEV is reporting. Is there an index overflow problem that is preventing us from loading consecutive elements? > > Yes, I forgot to mention that. The access is non-consecutive: > > for(i=0; i<reg->size; i++) > { > /* Flip the target bit of each basis state */ > reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target); > } > > The code is writing to the 'state' field. The data structures look like this: > > typedef struct quantum_matrix_struct quantum_matrix; > > struct quantum_reg_node_struct > { > COMPLEX_FLOAT amplitude; /* alpha_j */ > MAX_UNSIGNED state; /* j */ > }; > > typedef struct quantum_reg_node_struct quantum_reg_node; > > /* The quantum register */ > > struct quantum_reg_struct > { > int width; /* number of qubits in the qureg */ > int size; /* number of non-zero vectors */ > int hashw; /* width of the hash array */ > quantum_reg_node *node; > int *hash; > }; > > If you do the trick of writing to a separate array, then the loop can > be vectorized. >Do you mean that we should change the layout of the array? Or is there another trick?> > Diego.
Diego Novillo
2014-Jan-16 18:09 UTC
[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info
On Thu, Jan 16, 2014 at 10:06 AM, Nadav Rotem <nrotem at apple.com> wrote:>> If you do the trick of writing to a separate array, then the loop can >> be vectorized. >> > > Do you mean that we should change the layout of the array? Or is there another trick?No, sorry. I meant that if I change the code to instead do the xor operation into a temp array, then I expect the loop to be vectorized. But the overhead would likely negate the vectorization benefits (haven't tried it, though). Incidentally, GCC gives up on vectorizing this loop for essentially the same reason. It decides that the access pattern is too complex to bother. Diego.
Sean Silva
2014-Jan-16 19:57 UTC
[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info
On Thu, Jan 16, 2014 at 1:02 PM, Diego Novillo <dnovillo at google.com> wrote:> On Thu, Jan 16, 2014 at 9:26 AM, Nadav Rotem <nrotem at apple.com> wrote: > > Hi Diego, > > > > It looks like the problem is with the code in the vectorizer that tries > to estimate the most profitable vectorization factor: > > > >> LV: Found an estimated cost of 6 for VF 2 For instruction: %3 = load > >> i64* %state, align 8, !dbg !58, !tbaa !61 > > > > > > It looks like a cost model problem. The vectorizer thinks that loading > %3 (above) is non consecutive and would require scatter/gather. Is that > correct? I wonder that SCEV is reporting. Is there an index overflow > problem that is preventing us from loading consecutive elements? > > Yes, I forgot to mention that. The access is non-consecutive: > > for(i=0; i<reg->size; i++) > { > /* Flip the target bit of each basis state */ > reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target); > } > > The code is writing to the 'state' field. The data structures look like > this: > > typedef struct quantum_matrix_struct quantum_matrix; > > struct quantum_reg_node_struct > { > COMPLEX_FLOAT amplitude; /* alpha_j */ > MAX_UNSIGNED state; /* j */ > }; > > typedef struct quantum_reg_node_struct quantum_reg_node; > > /* The quantum register */ > > struct quantum_reg_struct > { > int width; /* number of qubits in the qureg */ > int size; /* number of non-zero vectors */ > int hashw; /* width of the hash array */ > quantum_reg_node *node; > int *hash; > }; > > If you do the trick of writing to a separate array, then the loop can > be vectorized. >Or use scatter-gather. -- Sean Silva> > > Diego. > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140116/67bdabd8/attachment.html>