thr3ads.net - llvm dev - [LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info [Jan 2014]

If this information is useful, please help other people find it:
Share via:

Diego Novillo

2014-Jan-16 18:02 UTC

[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

On Thu, Jan 16, 2014 at 9:26 AM, Nadav Rotem <nrotem at apple.com>
wrote:> Hi Diego,
>
> It looks like the problem is with the code in the vectorizer that tries to
estimate the most profitable vectorization factor:
>
>> LV: Found an estimated cost of 6 for VF 2 For instruction:   %3 = load
>> i64* %state, align 8, !dbg !58, !tbaa !61
>
>
> It looks like a cost model problem.  The vectorizer thinks that loading %3
(above) is non consecutive and would require scatter/gather.  Is that correct? I
wonder that SCEV is reporting. Is there an index overflow problem that is
preventing us from loading consecutive elements?
Yes, I forgot to mention that. The access is non-consecutive:

      for(i=0; i<reg->size; i++)
         {
             /* Flip the target bit of each basis state */
             reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target);
}

The code is writing to the 'state' field. The data structures look like
this:

typedef struct quantum_matrix_struct quantum_matrix;

struct quantum_reg_node_struct
{
  COMPLEX_FLOAT amplitude; /* alpha_j */
  MAX_UNSIGNED state;      /* j */
};

typedef struct quantum_reg_node_struct quantum_reg_node;

/* The quantum register */

struct quantum_reg_struct
{
  int width;    /* number of qubits in the qureg */
  int size;     /* number of non-zero vectors */
  int hashw;    /* width of the hash array */
  quantum_reg_node *node;
  int *hash;
};

If you do the trick of writing to a separate array, then the loop can
be vectorized.


Diego.

Nadav Rotem

2014-Jan-16 18:06 UTC

head link

[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

On Jan 16, 2014, at 10:02 AM, Diego Novillo <dnovillo at google.com>
wrote:
> On Thu, Jan 16, 2014 at 9:26 AM, Nadav Rotem <nrotem at apple.com>
wrote:
>> Hi Diego,
>> 
>> It looks like the problem is with the code in the vectorizer that tries
to estimate the most profitable vectorization factor:
>> 
>>> LV: Found an estimated cost of 6 for VF 2 For instruction:   %3 =
load
>>> i64* %state, align 8, !dbg !58, !tbaa !61
>> 
>> 
>> It looks like a cost model problem.  The vectorizer thinks that loading
%3 (above) is non consecutive and would require scatter/gather.  Is that
correct? I wonder that SCEV is reporting. Is there an index overflow problem
that is preventing us from loading consecutive elements?
> 
> Yes, I forgot to mention that. The access is non-consecutive:
> 
>      for(i=0; i<reg->size; i++)
>         {
>             /* Flip the target bit of each basis state */
>             reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target);
> }
> 
> The code is writing to the 'state' field. The data structures look
like this:
> 
> typedef struct quantum_matrix_struct quantum_matrix;
> 
> struct quantum_reg_node_struct
> {
>  COMPLEX_FLOAT amplitude; /* alpha_j */
>  MAX_UNSIGNED state;      /* j */
> };
> 
> typedef struct quantum_reg_node_struct quantum_reg_node;
> 
> /* The quantum register */
> 
> struct quantum_reg_struct
> {
>  int width;    /* number of qubits in the qureg */
>  int size;     /* number of non-zero vectors */
>  int hashw;    /* width of the hash array */
>  quantum_reg_node *node;
>  int *hash;
> };
> 
> If you do the trick of writing to a separate array, then the loop can
> be vectorized.
> 
Do you mean that we should change the layout of the array?  Or is there another
trick?

> 
> Diego.

Diego Novillo

2014-Jan-16 18:09 UTC

head link

[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

On Thu, Jan 16, 2014 at 10:06 AM, Nadav Rotem <nrotem at apple.com> wrote:
>> If you do the trick of writing to a separate array, then the loop can
>> be vectorized.
>>
>
> Do you mean that we should change the layout of the array?  Or is there
another trick?
No, sorry. I meant that if I change the code to instead do the xor
operation into a temp array, then I expect the loop to be vectorized.
But the overhead would likely negate the vectorization benefits
(haven't tried it, though). Incidentally, GCC gives up on vectorizing
this loop for essentially the same reason. It decides that the access
pattern is too complex to bother.


Diego.

Sean Silva

2014-Jan-16 19:57 UTC

head link

[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

On Thu, Jan 16, 2014 at 1:02 PM, Diego Novillo <dnovillo at google.com>
wrote:
> On Thu, Jan 16, 2014 at 9:26 AM, Nadav Rotem <nrotem at apple.com>
wrote:
> > Hi Diego,
> >
> > It looks like the problem is with the code in the vectorizer that
tries
> to estimate the most profitable vectorization factor:
> >
> >> LV: Found an estimated cost of 6 for VF 2 For instruction:   %3 =
load
> >> i64* %state, align 8, !dbg !58, !tbaa !61
> >
> >
> > It looks like a cost model problem.  The vectorizer thinks that
loading
> %3 (above) is non consecutive and would require scatter/gather.  Is that
> correct? I wonder that SCEV is reporting. Is there an index overflow
> problem that is preventing us from loading consecutive elements?
>
> Yes, I forgot to mention that. The access is non-consecutive:
>
>       for(i=0; i<reg->size; i++)
>          {
>              /* Flip the target bit of each basis state */
>              reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target);
> }
>
> The code is writing to the 'state' field. The data structures look
like
> this:
>
> typedef struct quantum_matrix_struct quantum_matrix;
>
> struct quantum_reg_node_struct
> {
>   COMPLEX_FLOAT amplitude; /* alpha_j */
>   MAX_UNSIGNED state;      /* j */
> };
>
> typedef struct quantum_reg_node_struct quantum_reg_node;
>
> /* The quantum register */
>
> struct quantum_reg_struct
> {
>   int width;    /* number of qubits in the qureg */
>   int size;     /* number of non-zero vectors */
>   int hashw;    /* width of the hash array */
>   quantum_reg_node *node;
>   int *hash;
> };
>
> If you do the trick of writing to a separate array, then the loop can
> be vectorized.
>
Or use scatter-gather.

-- Sean Silva

>
>
> Diego.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140116/67bdabd8/attachment.html>

llvm dev - Jan 2014 - [LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info