thr3ads.net - llvm dev - [LLVMdev] loop vectorizer erroneously finds 256 bit vectors [Nov 2013]

Renato Golin

2013-Nov-10 14:39 UTC

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

Hi Frank,

I'm not an Intel expert, but it seems that your Xeon E5 supports AVX, which
does have 256-bit vectors. The other two only support SSE instructions,
which are only 128 bits wide.

cheers,
--renato
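
To make the arithmetic behind the quoted debug output concrete: with a widest
element type of 32 bits, a 256-bit AVX register holds 8 lanes and a 128-bit
SSE register holds 4, which matches the "VF = 8" reported in the log below.
The sketch only illustrates that division. The function name is hypothetical,
and this is not LLVM's actual cost model, which also compares the estimated
cost of each candidate width before selecting one.

```python
def max_vectorization_factor(widest_register_bits: int, widest_type_bits: int) -> int:
    """Upper bound on lanes: how many elements of the widest type fit in one register.

    Hypothetical helper for illustration; not an LLVM API.
    """
    if widest_type_bits <= 0:
        raise ValueError("widest_type_bits must be positive")
    return widest_register_bits // widest_type_bits


# Values taken from the debug output: widest type 32 bits, widest register 256 bits.
avx_vf = max_vectorization_factor(256, 32)
# The same loop on an SSE-only machine with 128-bit registers.
sse_vf = max_vectorization_factor(128, 32)

print("AVX upper bound:", avx_vf)
print("SSE upper bound:", sse_vf)
```

The upper bound alone does not decide the outcome: as noted in the original
report, with -O3 the vectorizer still sees 256 bits but later decides that
vectorizing is not worthwhile.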


On 10 November 2013 06:05, Frank Winter <fwinter at jlab.org> wrote:
> I looked more into this. For the previously sent IR the vector width of
> 256 bit is found mistakenly (and reproducibly) on this hardware:
>
> model name    : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
>
> For the same IR the loop vectorizer finds the correct vector width (128
> bit) on:
>
> model name    : Intel(R) Xeon(R) CPU           E5630  @ 2.53GHz
> model name    : Intel(R) Core(TM) i7 CPU       M 640  @ 2.80GHz
>
> Thus, the behavior depends on which hardware I run on.
>
> I am using the JIT execution engine (original interface).
>
> Frank
>
>
>
> On 09/11/13 23:50, Frank Winter wrote:
>
>> The loop vectorizer is doing an amazing job so far. Most of the time.
>> I just came across one function which led to unexpected behavior:
>>
>> On this function the loop vectorizer finds a 256 bit vector as the
>> widest vector type for the x86-64 architecture. (!)
>>
>> This is strange, as it was always finding the correct size of 128 bit
>> as the widest type. I isolated the IR of the function to check if this
>> is reproducible outside of my application. And to my surprise it is!
>>
>> If I run
>>
>> opt -O1 -loop-vectorize -debug-only=loop-vectorize
>> -vectorizer-min-trip-count=4 strange.ll -S
>>
>> on the IR found below I get:
>>
>> LV: Checking a loop in "main"
>> LV: Found a loop: L3
>> LV: Found an induction variable.
>> LV: We need to do 0 pointer comparisons.
>> LV: We don't need a runtime memory check.
>> LV: We can vectorize this loop!
>> LV: Found trip count: 4
>> LV: The Widest type: 32 bits.
>> LV: The Widest register is: 256 bits.
>>
>> Wow, a Sandybridge processor with 256 bit SIMD?
>>
>> The vectorizer carries on and figures that 8 would be the best to go
>> for.
>>
>> LV: Vector loop of width 8 costs: 38.
>> LV: Selecting VF = : 8.
>>
>> I didn't look into the machine code but I guess there is something
>> going wrong earlier.
>>
>> I am wondering why it's reproducible and depending on the IR?!
>>
>> PS When running with -O3 it still finds 256 bit, but later decides that
>> it's not worth vectorizing.
>>
>> Frank
>>
>>
>>
>>
>>
>>
>> target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
>>
>> target triple = "x86_64-unknown-linux-elf"
>>
>> define void @main(i64 %arg0, i64 %arg1, i64 %arg2, i1 %arg3, i64 %arg4,
>> float* noalias %arg5, float* noalias %arg6, float* noalias %arg7,
>> double* noalias %arg8) {
>> entrypoint:
>>   br i1 %arg3, label %L0, label %L1
>>
>> L0:                                               ; preds = %entrypoint
>>   %0 = add nsw i64 %arg0, %arg4
>>   %1 = add nsw i64 %arg1, %arg4
>>   br label %L2
>>
>> L1:                                               ; preds = %entrypoint
>>   br label %L2
>>
>> L2:                                               ; preds = %L0, %L1
>>   %2 = phi i64 [ %arg0, %L1 ], [ %0, %L0 ]
>>   %3 = phi i64 [ %arg1, %L1 ], [ %1, %L0 ]
>>   %4 = sdiv i64 %2, 4
>>   %5 = sdiv i64 %3, 4
>>   br label %L5
>>
>> L3:                                               ; preds = %L7, %L5
>>   %6 = phi i64 [ %528, %L7 ], [ 0, %L5 ]
>>   %7 = mul i64 %527, 4
>>   %8 = add nsw i64 %7, %6
>>   %9 = mul i64 %527, 1
>>   %10 = add nsw i64 %9, 0
>>   %11 = mul i64 %10, 9
>>   %12 = add nsw i64 %11, 0
>>   %13 = mul i64 %12, 2
>>   %14 = add nsw i64 %13, 0
>>   %15 = mul i64 %14, 4
>>   %16 = add nsw i64 %15, %6
>>   %17 = mul i64 %527, 4
>>   %18 = add nsw i64 %17, %6
>>   %19 = mul i64 %527, 1
>>   %20 = add nsw i64 %19, 0
>>   %21 = mul i64 %20, 9
>>   %22 = add nsw i64 %21, 0
>>   %23 = mul i64 %22, 2
>>   %24 = add nsw i64 %23, 1
>>   %25 = mul i64 %24, 4
>>   %26 = add nsw i64 %25, %6
>>   %27 = mul i64 %527, 4
>>   %28 = add nsw i64 %27, %6
>>   %29 = mul i64 %527, 1
>>   %30 = add nsw i64 %29, 0
>>   %31 = mul i64 %30, 9
>>   %32 = add nsw i64 %31, 1
>>   %33 = mul i64 %32, 2
>>   %34 = add nsw i64 %33, 0
>>   %35 = mul i64 %34, 4
>>   %36 = add nsw i64 %35, %6
>>   %37 = mul i64 %527, 4
>>   %38 = add nsw i64 %37, %6
>>   %39 = mul i64 %527, 1
>>   %40 = add nsw i64 %39, 0
>>   %41 = mul i64 %40, 9
>>   %42 = add nsw i64 %41, 1
>>   %43 = mul i64 %42, 2
>>   %44 = add nsw i64 %43, 1
>>   %45 = mul i64 %44, 4
>>   %46 = add nsw i64 %45, %6
>>   %47 = mul i64 %527, 4
>>   %48 = add nsw i64 %47, %6
>>   %49 = mul i64 %527, 1
>>   %50 = add nsw i64 %49, 0
>>   %51 = mul i64 %50, 9
>>   %52 = add nsw i64 %51, 2
>>   %53 = mul i64 %52, 2
>>   %54 = add nsw i64 %53, 0
>>   %55 = mul i64 %54, 4
>>   %56 = add nsw i64 %55, %6
>>   %57 = mul i64 %527, 4
>>   %58 = add nsw i64 %57, %6
>>   %59 = mul i64 %527, 1
>>   %60 = add nsw i64 %59, 0
>>   %61 = mul i64 %60, 9
>>   %62 = add nsw i64 %61, 2
>>   %63 = mul i64 %62, 2
>>   %64 = add nsw i64 %63, 1
>>   %65 = mul i64 %64, 4
>>   %66 = add nsw i64 %65, %6
>>   %67 = mul i64 %527, 4
>>   %68 = add nsw i64 %67, %6
>>   %69 = mul i64 %527, 1
>>   %70 = add nsw i64 %69, 0
>>   %71 = mul i64 %70, 9
>>   %72 = add nsw i64 %71, 3
>>   %73 = mul i64 %72, 2
>>   %74 = add nsw i64 %73, 0
>>   %75 = mul i64 %74, 4
>>   %76 = add nsw i64 %75, %6
>>   %77 = mul i64 %527, 4
>>   %78 = add nsw i64 %77, %6
>>   %79 = mul i64 %527, 1
>>   %80 = add nsw i64 %79, 0
>>   %81 = mul i64 %80, 9
>>   %82 = add nsw i64 %81, 3
>>   %83 = mul i64 %82, 2
>>   %84 = add nsw i64 %83, 1
>>   %85 = mul i64 %84, 4
>>   %86 = add nsw i64 %85, %6
>>   %87 = mul i64 %527, 4
>>   %88 = add nsw i64 %87, %6
>>   %89 = mul i64 %527, 1
>>   %90 = add nsw i64 %89, 0
>>   %91 = mul i64 %90, 9
>>   %92 = add nsw i64 %91, 4
>>   %93 = mul i64 %92, 2
>>   %94 = add nsw i64 %93, 0
>>   %95 = mul i64 %94, 4
>>   %96 = add nsw i64 %95, %6
>>   %97 = mul i64 %527, 4
>>   %98 = add nsw i64 %97, %6
>>   %99 = mul i64 %527, 1
>>   %100 = add nsw i64 %99, 0
>>   %101 = mul i64 %100, 9
>>   %102 = add nsw i64 %101, 4
>>   %103 = mul i64 %102, 2
>>   %104 = add nsw i64 %103, 1
>>   %105 = mul i64 %104, 4
>>   %106 = add nsw i64 %105, %6
>>   %107 = mul i64 %527, 4
>>   %108 = add nsw i64 %107, %6
>>   %109 = mul i64 %527, 1
>>   %110 = add nsw i64 %109, 0
>>   %111 = mul i64 %110, 9
>>   %112 = add nsw i64 %111, 5
>>   %113 = mul i64 %112, 2
>>   %114 = add nsw i64 %113, 0
>>   %115 = mul i64 %114, 4
>>   %116 = add nsw i64 %115, %6
>>   %117 = mul i64 %527, 4
>>   %118 = add nsw i64 %117, %6
>>   %119 = mul i64 %527, 1
>>   %120 = add nsw i64 %119, 0
>>   %121 = mul i64 %120, 9
>>   %122 = add nsw i64 %121, 5
>>   %123 = mul i64 %122, 2
>>   %124 = add nsw i64 %123, 1
>>   %125 = mul i64 %124, 4
>>   %126 = add nsw i64 %125, %6
>>   %127 = mul i64 %527, 4
>>   %128 = add nsw i64 %127, %6
>>   %129 = mul i64 %527, 1
>>   %130 = add nsw i64 %129, 0
>>   %131 = mul i64 %130, 9
>>   %132 = add nsw i64 %131, 6
>>   %133 = mul i64 %132, 2
>>   %134 = add nsw i64 %133, 0
>>   %135 = mul i64 %134, 4
>>   %136 = add nsw i64 %135, %6
>>   %137 = mul i64 %527, 4
>>   %138 = add nsw i64 %137, %6
>>   %139 = mul i64 %527, 1
>>   %140 = add nsw i64 %139, 0
>>   %141 = mul i64 %140, 9
>>   %142 = add nsw i64 %141, 6
>>   %143 = mul i64 %142, 2
>>   %144 = add nsw i64 %143, 1
>>   %145 = mul i64 %144, 4
>>   %146 = add nsw i64 %145, %6
>>   %147 = mul i64 %527, 4
>>   %148 = add nsw i64 %147, %6
>>   %149 = mul i64 %527, 1
>>   %150 = add nsw i64 %149, 0
>>   %151 = mul i64 %150, 9
>>   %152 = add nsw i64 %151, 7
>>   %153 = mul i64 %152, 2
>>   %154 = add nsw i64 %153, 0
>>   %155 = mul i64 %154, 4
>>   %156 = add nsw i64 %155, %6
>>   %157 = mul i64 %527, 4
>>   %158 = add nsw i64 %157, %6
>>   %159 = mul i64 %527, 1
>>   %160 = add nsw i64 %159, 0
>>   %161 = mul i64 %160, 9
>>   %162 = add nsw i64 %161, 7
>>   %163 = mul i64 %162, 2
>>   %164 = add nsw i64 %163, 1
>>   %165 = mul i64 %164, 4
>>   %166 = add nsw i64 %165, %6
>>   %167 = mul i64 %527, 4
>>   %168 = add nsw i64 %167, %6
>>   %169 = mul i64 %527, 1
>>   %170 = add nsw i64 %169, 0
>>   %171 = mul i64 %170, 9
>>   %172 = add nsw i64 %171, 8
>>   %173 = mul i64 %172, 2
>>   %174 = add nsw i64 %173, 0
>>   %175 = mul i64 %174, 4
>>   %176 = add nsw i64 %175, %6
>>   %177 = mul i64 %527, 4
>>   %178 = add nsw i64 %177, %6
>>   %179 = mul i64 %527, 1
>>   %180 = add nsw i64 %179, 0
>>   %181 = mul i64 %180, 9
>>   %182 = add nsw i64 %181, 8
>>   %183 = mul i64 %182, 2
>>   %184 = add nsw i64 %183, 1
>>   %185 = mul i64 %184, 4
>>   %186 = add nsw i64 %185, %6
>>   %187 = getelementptr float* %arg6, i64 %16
>>   %188 = load float* %187
>>   %189 = getelementptr float* %arg6, i64 %26
>>   %190 = load float* %189
>>   %191 = getelementptr float* %arg6, i64 %36
>>   %192 = load float* %191
>>   %193 = getelementptr float* %arg6, i64 %46
>>   %194 = load float* %193
>>   %195 = getelementptr float* %arg6, i64 %56
>>   %196 = load float* %195
>>   %197 = getelementptr float* %arg6, i64 %66
>>   %198 = load float* %197
>>   %199 = getelementptr float* %arg6, i64 %76
>>   %200 = load float* %199
>>   %201 = getelementptr float* %arg6, i64 %86
>>   %202 = load float* %201
>>   %203 = getelementptr float* %arg6, i64 %96
>>   %204 = load float* %203
>>   %205 = getelementptr float* %arg6, i64 %106
>>   %206 = load float* %205
>>   %207 = getelementptr float* %arg6, i64 %116
>>   %208 = load float* %207
>>   %209 = getelementptr float* %arg6, i64 %126
>>   %210 = load float* %209
>>   %211 = getelementptr float* %arg6, i64 %136
>>   %212 = load float* %211
>>   %213 = getelementptr float* %arg6, i64 %146
>>   %214 = load float* %213
>>   %215 = getelementptr float* %arg6, i64 %156
>>   %216 = load float* %215
>>   %217 = getelementptr float* %arg6, i64 %166
>>   %218 = load float* %217
>>   %219 = getelementptr float* %arg6, i64 %176
>>   %220 = load float* %219
>>   %221 = getelementptr float* %arg6, i64 %186
>>   %222 = load float* %221
>>   %223 = mul i64 %527, 4
>>   %224 = add nsw i64 %223, %6
>>   %225 = mul i64 %527, 1
>>   %226 = add nsw i64 %225, 0
>>   %227 = mul i64 %226, 9
>>   %228 = add nsw i64 %227, 0
>>   %229 = mul i64 %228, 2
>>   %230 = add nsw i64 %229, 0
>>   %231 = mul i64 %230, 4
>>   %232 = add nsw i64 %231, %6
>>   %233 = mul i64 %527, 4
>>   %234 = add nsw i64 %233, %6
>>   %235 = mul i64 %527, 1
>>   %236 = add nsw i64 %235, 0
>>   %237 = mul i64 %236, 9
>>   %238 = add nsw i64 %237, 0
>>   %239 = mul i64 %238, 2
>>   %240 = add nsw i64 %239, 1
>>   %241 = mul i64 %240, 4
>>   %242 = add nsw i64 %241, %6
>>   %243 = mul i64 %527, 4
>>   %244 = add nsw i64 %243, %6
>>   %245 = mul i64 %527, 1
>>   %246 = add nsw i64 %245, 0
>>   %247 = mul i64 %246, 9
>>   %248 = add nsw i64 %247, 1
>>   %249 = mul i64 %248, 2
>>   %250 = add nsw i64 %249, 0
>>   %251 = mul i64 %250, 4
>>   %252 = add nsw i64 %251, %6
>>   %253 = mul i64 %527, 4
>>   %254 = add nsw i64 %253, %6
>>   %255 = mul i64 %527, 1
>>   %256 = add nsw i64 %255, 0
>>   %257 = mul i64 %256, 9
>>   %258 = add nsw i64 %257, 1
>>   %259 = mul i64 %258, 2
>>   %260 = add nsw i64 %259, 1
>>   %261 = mul i64 %260, 4
>>   %262 = add nsw i64 %261, %6
>>   %263 = mul i64 %527, 4
>>   %264 = add nsw i64 %263, %6
>>   %265 = mul i64 %527, 1
>>   %266 = add nsw i64 %265, 0
>>   %267 = mul i64 %266, 9
>>   %268 = add nsw i64 %267, 2
>>   %269 = mul i64 %268, 2
>>   %270 = add nsw i64 %269, 0
>>   %271 = mul i64 %270, 4
>>   %272 = add nsw i64 %271, %6
>>   %273 = mul i64 %527, 4
>>   %274 = add nsw i64 %273, %6
>>   %275 = mul i64 %527, 1
>>   %276 = add nsw i64 %275, 0
>>   %277 = mul i64 %276, 9
>>   %278 = add nsw i64 %277, 2
>>   %279 = mul i64 %278, 2
>>   %280 = add nsw i64 %279, 1
>>   %281 = mul i64 %280, 4
>>   %282 = add nsw i64 %281, %6
>>   %283 = mul i64 %527, 4
>>   %284 = add nsw i64 %283, %6
>>   %285 = mul i64 %527, 1
>>   %286 = add nsw i64 %285, 0
>>   %287 = mul i64 %286, 9
>>   %288 = add nsw i64 %287, 3
>>   %289 = mul i64 %288, 2
>>   %290 = add nsw i64 %289, 0
>>   %291 = mul i64 %290, 4
>>   %292 = add nsw i64 %291, %6
>>   %293 = mul i64 %527, 4
>>   %294 = add nsw i64 %293, %6
>>   %295 = mul i64 %527, 1
>>   %296 = add nsw i64 %295, 0
>>   %297 = mul i64 %296, 9
>>   %298 = add nsw i64 %297, 3
>>   %299 = mul i64 %298, 2
>>   %300 = add nsw i64 %299, 1
>>   %301 = mul i64 %300, 4
>>   %302 = add nsw i64 %301, %6
>>   %303 = mul i64 %527, 4
>>   %304 = add nsw i64 %303, %6
>>   %305 = mul i64 %527, 1
>>   %306 = add nsw i64 %305, 0
>>   %307 = mul i64 %306, 9
>>   %308 = add nsw i64 %307, 4
>>   %309 = mul i64 %308, 2
>>   %310 = add nsw i64 %309, 0
>>   %311 = mul i64 %310, 4
>>   %312 = add nsw i64 %311, %6
>>   %313 = mul i64 %527, 4
>>   %314 = add nsw i64 %313, %6
>>   %315 = mul i64 %527, 1
>>   %316 = add nsw i64 %315, 0
>>   %317 = mul i64 %316, 9
>>   %318 = add nsw i64 %317, 4
>>   %319 = mul i64 %318, 2
>>   %320 = add nsw i64 %319, 1
>>   %321 = mul i64 %320, 4
>>   %322 = add nsw i64 %321, %6
>>   %323 = mul i64 %527, 4
>>   %324 = add nsw i64 %323, %6
>>   %325 = mul i64 %527, 1
>>   %326 = add nsw i64 %325, 0
>>   %327 = mul i64 %326, 9
>>   %328 = add nsw i64 %327, 5
>>   %329 = mul i64 %328, 2
>>   %330 = add nsw i64 %329, 0
>>   %331 = mul i64 %330, 4
>>   %332 = add nsw i64 %331, %6
>>   %333 = mul i64 %527, 4
>>   %334 = add nsw i64 %333, %6
>>   %335 = mul i64 %527, 1
>>   %336 = add nsw i64 %335, 0
>>   %337 = mul i64 %336, 9
>>   %338 = add nsw i64 %337, 5
>>   %339 = mul i64 %338, 2
>>   %340 = add nsw i64 %339, 1
>>   %341 = mul i64 %340, 4
>>   %342 = add nsw i64 %341, %6
>>   %343 = mul i64 %527, 4
>>   %344 = add nsw i64 %343, %6
>>   %345 = mul i64 %527, 1
>>   %346 = add nsw i64 %345, 0
>>   %347 = mul i64 %346, 9
>>   %348 = add nsw i64 %347, 6
>>   %349 = mul i64 %348, 2
>>   %350 = add nsw i64 %349, 0
>>   %351 = mul i64 %350, 4
>>   %352 = add nsw i64 %351, %6
>>   %353 = mul i64 %527, 4
>>   %354 = add nsw i64 %353, %6
>>   %355 = mul i64 %527, 1
>>   %356 = add nsw i64 %355, 0
>>   %357 = mul i64 %356, 9
>>   %358 = add nsw i64 %357, 6
>>   %359 = mul i64 %358, 2
>>   %360 = add nsw i64 %359, 1
>>   %361 = mul i64 %360, 4
>>   %362 = add nsw i64 %361, %6
>>   %363 = mul i64 %527, 4
>>   %364 = add nsw i64 %363, %6
>>   %365 = mul i64 %527, 1
>>   %366 = add nsw i64 %365, 0
>>   %367 = mul i64 %366, 9
>>   %368 = add nsw i64 %367, 7
>>   %369 = mul i64 %368, 2
>>   %370 = add nsw i64 %369, 0
>>   %371 = mul i64 %370, 4
>>   %372 = add nsw i64 %371, %6
>>   %373 = mul i64 %527, 4
>>   %374 = add nsw i64 %373, %6
>>   %375 = mul i64 %527, 1
>>   %376 = add nsw i64 %375, 0
>>   %377 = mul i64 %376, 9
>>   %378 = add nsw i64 %377, 7
>>   %379 = mul i64 %378, 2
>>   %380 = add nsw i64 %379, 1
>>   %381 = mul i64 %380, 4
>>   %382 = add nsw i64 %381, %6
>>   %383 = mul i64 %527, 4
>>   %384 = add nsw i64 %383, %6
>>   %385 = mul i64 %527, 1
>>   %386 = add nsw i64 %385, 0
>>   %387 = mul i64 %386, 9
>>   %388 = add nsw i64 %387, 8
>>   %389 = mul i64 %388, 2
>>   %390 = add nsw i64 %389, 0
>>   %391 = mul i64 %390, 4
>>   %392 = add nsw i64 %391, %6
>>   %393 = mul i64 %527, 4
>>   %394 = add nsw i64 %393, %6
>>   %395 = mul i64 %527, 1
>>   %396 = add nsw i64 %395, 0
>>   %397 = mul i64 %396, 9
>>   %398 = add nsw i64 %397, 8
>>   %399 = mul i64 %398, 2
>>   %400 = add nsw i64 %399, 1
>>   %401 = mul i64 %400, 4
>>   %402 = add nsw i64 %401, %6
>>   %403 = getelementptr float* %arg7, i64 %232
>>   %404 = load float* %403
>>   %405 = getelementptr float* %arg7, i64 %242
>>   %406 = load float* %405
>>   %407 = getelementptr float* %arg7, i64 %252
>>   %408 = load float* %407
>>   %409 = getelementptr float* %arg7, i64 %262
>>   %410 = load float* %409
>>   %411 = getelementptr float* %arg7, i64 %272
>>   %412 = load float* %411
>>   %413 = getelementptr float* %arg7, i64 %282
>>   %414 = load float* %413
>>   %415 = getelementptr float* %arg7, i64 %292
>>   %416 = load float* %415
>>   %417 = getelementptr float* %arg7, i64 %302
>>   %418 = load float* %417
>>   %419 = getelementptr float* %arg7, i64 %312
>>   %420 = load float* %419
>>   %421 = getelementptr float* %arg7, i64 %322
>>   %422 = load float* %421
>>   %423 = getelementptr float* %arg7, i64 %332
>>   %424 = load float* %423
>>   %425 = getelementptr float* %arg7, i64 %342
>>   %426 = load float* %425
>>   %427 = getelementptr float* %arg7, i64 %352
>>   %428 = load float* %427
>>   %429 = getelementptr float* %arg7, i64 %362
>>   %430 = load float* %429
>>   %431 = getelementptr float* %arg7, i64 %372
>>   %432 = load float* %431
>>   %433 = getelementptr float* %arg7, i64 %382
>>   %434 = load float* %433
>>   %435 = getelementptr float* %arg7, i64 %392
>>   %436 = load float* %435
>>   %437 = getelementptr float* %arg7, i64 %402
>>   %438 = load float* %437
>>   %439 = fmul float %406, %188
>>   %440 = fmul float %404, %190
>>   %441 = fadd float %440, %439
>>   %442 = fmul float %406, %190
>>   %443 = fmul float %404, %188
>>   %444 = fsub float %443, %442
>>   %445 = fmul float %410, %200
>>   %446 = fmul float %408, %202
>>   %447 = fadd float %446, %445
>>   %448 = fmul float %410, %202
>>   %449 = fmul float %408, %200
>>   %450 = fsub float %449, %448
>>   %451 = fadd float %444, %450
>>   %452 = fadd float %441, %447
>>   %453 = fmul float %414, %212
>>   %454 = fmul float %412, %214
>>   %455 = fadd float %454, %453
>>   %456 = fmul float %414, %214
>>   %457 = fmul float %412, %212
>>   %458 = fsub float %457, %456
>>   %459 = fadd float %451, %458
>>   %460 = fadd float %452, %455
>>   %461 = fmul float %418, %192
>>   %462 = fmul float %416, %194
>>   %463 = fadd float %462, %461
>>   %464 = fmul float %418, %194
>>   %465 = fmul float %416, %192
>>   %466 = fsub float %465, %464
>>   %467 = fadd float %459, %466
>>   %468 = fadd float %460, %463
>>   %469 = fmul float %422, %204
>>   %470 = fmul float %420, %206
>>   %471 = fadd float %470, %469
>>   %472 = fmul float %422, %206
>>   %473 = fmul float %420, %204
>>   %474 = fsub float %473, %472
>>   %475 = fadd float %467, %474
>>   %476 = fadd float %468, %471
>>   %477 = fmul float %426, %216
>>   %478 = fmul float %424, %218
>>   %479 = fadd float %478, %477
>>   %480 = fmul float %426, %218
>>   %481 = fmul float %424, %216
>>   %482 = fsub float %481, %480
>>   %483 = fadd float %475, %482
>>   %484 = fadd float %476, %479
>>   %485 = fmul float %430, %196
>>   %486 = fmul float %428, %198
>>   %487 = fadd float %486, %485
>>   %488 = fmul float %430, %198
>>   %489 = fmul float %428, %196
>>   %490 = fsub float %489, %488
>>   %491 = fadd float %483, %490
>>   %492 = fadd float %484, %487
>>   %493 = fmul float %434, %208
>>   %494 = fmul float %432, %210
>>   %495 = fadd float %494, %493
>>   %496 = fmul float %434, %210
>>   %497 = fmul float %432, %208
>>   %498 = fsub float %497, %496
>>   %499 = fadd float %491, %498
>>   %500 = fadd float %492, %495
>>   %501 = fmul float %438, %220
>>   %502 = fmul float %436, %222
>>   %503 = fadd float %502, %501
>>   %504 = fmul float %438, %222
>>   %505 = fmul float %436, %220
>>   %506 = fsub float %505, %504
>>   %507 = fadd float %499, %506
>>   %508 = fadd float %500, %503
>>   %509 = getelementptr double* %arg8, i32 0
>>   %510 = load double* %509
>>   %511 = fpext float %507 to double
>>   %512 = fmul double %510, %511
>>   %513 = mul i64 %527, 4
>>   %514 = add nsw i64 %513, %6
>>   %515 = mul i64 %527, 1
>>   %516 = add nsw i64 %515, 0
>>   %517 = mul i64 %516, 1
>>   %518 = add nsw i64 %517, 0
>>   %519 = mul i64 %518, 1
>>   %520 = add nsw i64 %519, 0
>>   %521 = mul i64 %520, 4
>>   %522 = add nsw i64 %521, %6
>>   %523 = getelementptr float* %arg5, i64 %522
>>   %524 = fptrunc double %512 to float
>>   store float %524, float* %523
>>   br label %L7
>>
>> L4:                                               ; preds = %L7
>>   %525 = add nsw i64 %527, 1
>>   %526 = icmp sge i64 %525, %5
>>   br i1 %526, label %L6, label %L5
>>
>> L5:                                               ; preds = %L4, %L2
>>   %527 = phi i64 [ %525, %L4 ], [ %4, %L2 ]
>>   br label %L3
>>
>> L6:                                               ; preds = %L4
>>   ret void
>>
>> L7:                                               ; preds = %L3
>>   %528 = add nsw i64 %6, 1
>>   %529 = icmp sge i64 %528, 4
>>   br i1 %529, label %L4, label %L3
>> }
>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>

Frank Winter

2013-Nov-10 14:48 UTC

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

Hi Renato,

you are right! There is 'avx' support:

fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb 
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology 
nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est 
tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx 
lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority 
ept vpid

This is still strange: why is this the only IR for which 256-bit vectors
are found, while for all other functions the vectorizer finds 128-bit
vectors? This should be independent of the IR, right? So far I have
tested about 20 functions generated and vectorized with this method. For
all but this one, the loop vectorizer finds 128-bit vectors as the
widest. Only for this IR does the loop vectorizer 'see' the 256-bit
version. Any idea?

Frank
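
Whether a host advertises AVX can be checked from the flags line of
/proc/cpuinfo, as in the list above. A minimal sketch, assuming a simplified
mapping (an "avx" flag means 256-bit registers, an "sse" flag means 128-bit)
and a hypothetical helper name. The first flag string is a subset of the
flags listed above; the second is an illustrative SSE-only subset, not copied
from a real machine:

```python
def widest_simd_bits(cpuinfo_flags: str) -> int:
    """Return an assumed widest SIMD register width for a cpuinfo flags string.

    Simplified mapping for illustration only: "avx" -> 256, "sse" -> 128, else 0.
    Flags are compared as whole tokens, so "sse4_1" does not count as "sse".
    """
    flags = set(cpuinfo_flags.split())
    if "avx" in flags:
        return 256
    if "sse" in flags:
        return 128
    return 0


# Subset of the flags reported for the Xeon E5-2650 in this message.
e5_2650_flags = "mmx fxsr sse sse2 ssse3 sse4_1 sse4_2 popcnt aes xsave avx"
# Hypothetical SSE-only flag list (assumption, for comparison).
sse_only_flags = "mmx fxsr sse sse2 ssse3 sse4_1 sse4_2 popcnt"

print("E5-2650 subset:", widest_simd_bits(e5_2650_flags))
print("SSE-only example:", widest_simd_bits(sse_only_flags))
```

This says nothing about why a particular IR ends up with a particular width;
it only shows how the two classes of hardware in this thread differ.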


On 10/11/13 09:39, Renato Golin wrote:
> Hi Frank,
>
> I'm not an Intel expert, but it seems that your Xeon E5 supports AVX, 
> which does have 256-bit vectors. The other two only support SSE
> instructions, which are only 128 bits wide.
>
> cheers,
> --renato
>
>
> On 10 November 2013 06:05, Frank Winter <fwinter at jlab.org> wrote:
>
>     I looked more into this. For the previously sent IR the vector
>     width of 256 bit is found mistakenly (and reproducibly) on this
>     hardware:
>
>     model name    : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
>
>     For the same IR the loop vectorizer finds the correct vector width
>     (128 bit) on:
>
>     model name    : Intel(R) Xeon(R) CPU           E5630  @ 2.53GHz
>     model name    : Intel(R) Core(TM) i7 CPU       M 640  @ 2.80GHz
>
>     Thus, the behavior depends on which hardware I run on.
>
>     I am using the JIT execution engine (original interface).
>
>     Frank
>
>
>
>     On 09/11/13 23:50, Frank Winter wrote:
>
>         The loop vectorizer is doing an amazing job so far. Most of
>         the time.
>         I just came across one function which led to unexpected behavior:
>
>         On this function the loop vectorizer finds a 256 bit vector as the
>         widest vector type for the x86-64 architecture. (!)
>
>         This is strange, as it was always finding the correct size of
>         128 bit
>         as the widest type. I isolated the IR of the function to check
>         if this
>         is reproducible outside of my application. And to my surprise
>         it is!
>
>         If I run
>
>         opt -O1 -loop-vectorize -debug-only=loop-vectorize
>         -vectorizer-min-trip-count=4 strange.ll -S
>
>         on the IR found below I get:
>
>         LV: Checking a loop in "main"
>         LV: Found a loop: L3
>         LV: Found an induction variable.
>         LV: We need to do 0 pointer comparisons.
>         LV: We don't need a runtime memory check.
>         LV: We can vectorize this loop!
>         LV: Found trip count: 4
>         LV: The Widest type: 32 bits.
>         LV: The Widest register is: 256 bits.
>
>         Wow, a Sandybridge processor with 256 bit SIMD?
>
>         The vectorizer carries on and figures that 8 would be the best
>         to go for.
>
>         LV: Vector loop of width 8 costs: 38.
>         LV: Selecting VF = : 8.
>
>         I didn't look into the machine code but I guess there is
>         something going wrong earlier.
>
>         I am wondering why it's reproducible and depending on the IR?!
>
>         PS When running with -O3 it still finds 256 bit, but later
>         decides that it's not worth vectorizing.
>
>         Frank
>
>
>
>
>
>
>         target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
>
>         target triple = "x86_64-unknown-linux-elf"
>
>         define void @main(i64 %arg0, i64 %arg1, i64 %arg2, i1 %arg3,
>         i64 %arg4, float* noalias %arg5, float* noalias %arg6, float*
>         noalias %arg7, double* noalias %arg8) {
>         entrypoint:
>           br i1 %arg3, label %L0, label %L1
>
>         L0:                                               ; preds = %entrypoint
>           %0 = add nsw i64 %arg0, %arg4
>           %1 = add nsw i64 %arg1, %arg4
>           br label %L2
>
>         L1:                                               ; preds = %entrypoint
>           br label %L2
>
>         L2:                                               ; preds = %L0, %L1
>           %2 = phi i64 [ %arg0, %L1 ], [ %0, %L0 ]
>           %3 = phi i64 [ %arg1, %L1 ], [ %1, %L0 ]
>           %4 = sdiv i64 %2, 4
>           %5 = sdiv i64 %3, 4
>           br label %L5
>
>         L3:                                               ; preds = %L7, %L5
>           %6 = phi i64 [ %528, %L7 ], [ 0, %L5 ]
>           %7 = mul i64 %527, 4
>           %8 = add nsw i64 %7, %6
>           %9 = mul i64 %527, 1
>           %10 = add nsw i64 %9, 0
>           %11 = mul i64 %10, 9
>           %12 = add nsw i64 %11, 0
>           %13 = mul i64 %12, 2
>           %14 = add nsw i64 %13, 0
>           %15 = mul i64 %14, 4
>           %16 = add nsw i64 %15, %6
>           %17 = mul i64 %527, 4
>           %18 = add nsw i64 %17, %6
>           %19 = mul i64 %527, 1
>           %20 = add nsw i64 %19, 0
>           %21 = mul i64 %20, 9
>           %22 = add nsw i64 %21, 0
>           %23 = mul i64 %22, 2
>           %24 = add nsw i64 %23, 1
>           %25 = mul i64 %24, 4
>           %26 = add nsw i64 %25, %6
>           %27 = mul i64 %527, 4
>           %28 = add nsw i64 %27, %6
>           %29 = mul i64 %527, 1
>           %30 = add nsw i64 %29, 0
>           %31 = mul i64 %30, 9
>           %32 = add nsw i64 %31, 1
>           %33 = mul i64 %32, 2
>           %34 = add nsw i64 %33, 0
>           %35 = mul i64 %34, 4
>           %36 = add nsw i64 %35, %6
>           %37 = mul i64 %527, 4
>           %38 = add nsw i64 %37, %6
>           %39 = mul i64 %527, 1
>           %40 = add nsw i64 %39, 0
>           %41 = mul i64 %40, 9
>           %42 = add nsw i64 %41, 1
>           %43 = mul i64 %42, 2
>           %44 = add nsw i64 %43, 1
>           %45 = mul i64 %44, 4
>           %46 = add nsw i64 %45, %6
>           %47 = mul i64 %527, 4
>           %48 = add nsw i64 %47, %6
>           %49 = mul i64 %527, 1
>           %50 = add nsw i64 %49, 0
>           %51 = mul i64 %50, 9
>           %52 = add nsw i64 %51, 2
>           %53 = mul i64 %52, 2
>           %54 = add nsw i64 %53, 0
>           %55 = mul i64 %54, 4
>           %56 = add nsw i64 %55, %6
>           %57 = mul i64 %527, 4
>           %58 = add nsw i64 %57, %6
>           %59 = mul i64 %527, 1
>           %60 = add nsw i64 %59, 0
>           %61 = mul i64 %60, 9
>           %62 = add nsw i64 %61, 2
>           %63 = mul i64 %62, 2
>           %64 = add nsw i64 %63, 1
>           %65 = mul i64 %64, 4
>           %66 = add nsw i64 %65, %6
>           %67 = mul i64 %527, 4
>           %68 = add nsw i64 %67, %6
>           %69 = mul i64 %527, 1
>           %70 = add nsw i64 %69, 0
>           %71 = mul i64 %70, 9
>           %72 = add nsw i64 %71, 3
>           %73 = mul i64 %72, 2
>           %74 = add nsw i64 %73, 0
>           %75 = mul i64 %74, 4
>           %76 = add nsw i64 %75, %6
>           %77 = mul i64 %527, 4
>           %78 = add nsw i64 %77, %6
>           %79 = mul i64 %527, 1
>           %80 = add nsw i64 %79, 0
>           %81 = mul i64 %80, 9
>           %82 = add nsw i64 %81, 3
>           %83 = mul i64 %82, 2
>           %84 = add nsw i64 %83, 1
>           %85 = mul i64 %84, 4
>           %86 = add nsw i64 %85, %6
>           %87 = mul i64 %527, 4
>           %88 = add nsw i64 %87, %6
>           %89 = mul i64 %527, 1
>           %90 = add nsw i64 %89, 0
>           %91 = mul i64 %90, 9
>           %92 = add nsw i64 %91, 4
>           %93 = mul i64 %92, 2
>           %94 = add nsw i64 %93, 0
>           %95 = mul i64 %94, 4
>           %96 = add nsw i64 %95, %6
>           %97 = mul i64 %527, 4
>           %98 = add nsw i64 %97, %6
>           %99 = mul i64 %527, 1
>           %100 = add nsw i64 %99, 0
>           %101 = mul i64 %100, 9
>           %102 = add nsw i64 %101, 4
>           %103 = mul i64 %102, 2
>           %104 = add nsw i64 %103, 1
>           %105 = mul i64 %104, 4
>           %106 = add nsw i64 %105, %6
>           %107 = mul i64 %527, 4
>           %108 = add nsw i64 %107, %6
>           %109 = mul i64 %527, 1
>           %110 = add nsw i64 %109, 0
>           %111 = mul i64 %110, 9
>           %112 = add nsw i64 %111, 5
>           %113 = mul i64 %112, 2
>           %114 = add nsw i64 %113, 0
>           %115 = mul i64 %114, 4
>           %116 = add nsw i64 %115, %6
>           %117 = mul i64 %527, 4
>           %118 = add nsw i64 %117, %6
>           %119 = mul i64 %527, 1
>           %120 = add nsw i64 %119, 0
>           %121 = mul i64 %120, 9
>           %122 = add nsw i64 %121, 5
>           %123 = mul i64 %122, 2
>           %124 = add nsw i64 %123, 1
>           %125 = mul i64 %124, 4
>           %126 = add nsw i64 %125, %6
>           %127 = mul i64 %527, 4
>           %128 = add nsw i64 %127, %6
>           %129 = mul i64 %527, 1
>           %130 = add nsw i64 %129, 0
>           %131 = mul i64 %130, 9
>           %132 = add nsw i64 %131, 6
>           %133 = mul i64 %132, 2
>           %134 = add nsw i64 %133, 0
>           %135 = mul i64 %134, 4
>           %136 = add nsw i64 %135, %6
>           %137 = mul i64 %527, 4
>           %138 = add nsw i64 %137, %6
>           %139 = mul i64 %527, 1
>           %140 = add nsw i64 %139, 0
>           %141 = mul i64 %140, 9
>           %142 = add nsw i64 %141, 6
>           %143 = mul i64 %142, 2
>           %144 = add nsw i64 %143, 1
>           %145 = mul i64 %144, 4
>           %146 = add nsw i64 %145, %6
>           %147 = mul i64 %527, 4
>           %148 = add nsw i64 %147, %6
>           %149 = mul i64 %527, 1
>           %150 = add nsw i64 %149, 0
>           %151 = mul i64 %150, 9
>           %152 = add nsw i64 %151, 7
>           %153 = mul i64 %152, 2
>           %154 = add nsw i64 %153, 0
>           %155 = mul i64 %154, 4
>           %156 = add nsw i64 %155, %6
>           %157 = mul i64 %527, 4
>           %158 = add nsw i64 %157, %6
>           %159 = mul i64 %527, 1
>           %160 = add nsw i64 %159, 0
>           %161 = mul i64 %160, 9
>           %162 = add nsw i64 %161, 7
>           %163 = mul i64 %162, 2
>           %164 = add nsw i64 %163, 1
>           %165 = mul i64 %164, 4
>           %166 = add nsw i64 %165, %6
>           %167 = mul i64 %527, 4
>           %168 = add nsw i64 %167, %6
>           %169 = mul i64 %527, 1
>           %170 = add nsw i64 %169, 0
>           %171 = mul i64 %170, 9
>           %172 = add nsw i64 %171, 8
>           %173 = mul i64 %172, 2
>           %174 = add nsw i64 %173, 0
>           %175 = mul i64 %174, 4
>           %176 = add nsw i64 %175, %6
>           %177 = mul i64 %527, 4
>           %178 = add nsw i64 %177, %6
>           %179 = mul i64 %527, 1
>           %180 = add nsw i64 %179, 0
>           %181 = mul i64 %180, 9
>           %182 = add nsw i64 %181, 8
>           %183 = mul i64 %182, 2
>           %184 = add nsw i64 %183, 1
>           %185 = mul i64 %184, 4
>           %186 = add nsw i64 %185, %6
>           %187 = getelementptr float* %arg6, i64 %16
>           %188 = load float* %187
>           %189 = getelementptr float* %arg6, i64 %26
>           %190 = load float* %189
>           %191 = getelementptr float* %arg6, i64 %36
>           %192 = load float* %191
>           %193 = getelementptr float* %arg6, i64 %46
>           %194 = load float* %193
>           %195 = getelementptr float* %arg6, i64 %56
>           %196 = load float* %195
>           %197 = getelementptr float* %arg6, i64 %66
>           %198 = load float* %197
>           %199 = getelementptr float* %arg6, i64 %76
>           %200 = load float* %199
>           %201 = getelementptr float* %arg6, i64 %86
>           %202 = load float* %201
>           %203 = getelementptr float* %arg6, i64 %96
>           %204 = load float* %203
>           %205 = getelementptr float* %arg6, i64 %106
>           %206 = load float* %205
>           %207 = getelementptr float* %arg6, i64 %116
>           %208 = load float* %207
>           %209 = getelementptr float* %arg6, i64 %126
>           %210 = load float* %209
>           %211 = getelementptr float* %arg6, i64 %136
>           %212 = load float* %211
>           %213 = getelementptr float* %arg6, i64 %146
>           %214 = load float* %213
>           %215 = getelementptr float* %arg6, i64 %156
>           %216 = load float* %215
>           %217 = getelementptr float* %arg6, i64 %166
>           %218 = load float* %217
>           %219 = getelementptr float* %arg6, i64 %176
>           %220 = load float* %219
>           %221 = getelementptr float* %arg6, i64 %186
>           %222 = load float* %221
>           %223 = mul i64 %527, 4
>           %224 = add nsw i64 %223, %6
>           %225 = mul i64 %527, 1
>           %226 = add nsw i64 %225, 0
>           %227 = mul i64 %226, 9
>           %228 = add nsw i64 %227, 0
>           %229 = mul i64 %228, 2
>           %230 = add nsw i64 %229, 0
>           %231 = mul i64 %230, 4
>           %232 = add nsw i64 %231, %6
>           %233 = mul i64 %527, 4
>           %234 = add nsw i64 %233, %6
>           %235 = mul i64 %527, 1
>           %236 = add nsw i64 %235, 0
>           %237 = mul i64 %236, 9
>           %238 = add nsw i64 %237, 0
>           %239 = mul i64 %238, 2
>           %240 = add nsw i64 %239, 1
>           %241 = mul i64 %240, 4
>           %242 = add nsw i64 %241, %6
>           %243 = mul i64 %527, 4
>           %244 = add nsw i64 %243, %6
>           %245 = mul i64 %527, 1
>           %246 = add nsw i64 %245, 0
>           %247 = mul i64 %246, 9
>           %248 = add nsw i64 %247, 1
>           %249 = mul i64 %248, 2
>           %250 = add nsw i64 %249, 0
>           %251 = mul i64 %250, 4
>           %252 = add nsw i64 %251, %6
>           %253 = mul i64 %527, 4
>           %254 = add nsw i64 %253, %6
>           %255 = mul i64 %527, 1
>           %256 = add nsw i64 %255, 0
>           %257 = mul i64 %256, 9
>           %258 = add nsw i64 %257, 1
>           %259 = mul i64 %258, 2
>           %260 = add nsw i64 %259, 1
>           %261 = mul i64 %260, 4
>           %262 = add nsw i64 %261, %6
>           %263 = mul i64 %527, 4
>           %264 = add nsw i64 %263, %6
>           %265 = mul i64 %527, 1
>           %266 = add nsw i64 %265, 0
>           %267 = mul i64 %266, 9
>           %268 = add nsw i64 %267, 2
>           %269 = mul i64 %268, 2
>           %270 = add nsw i64 %269, 0
>           %271 = mul i64 %270, 4
>           %272 = add nsw i64 %271, %6
>           %273 = mul i64 %527, 4
>           %274 = add nsw i64 %273, %6
>           %275 = mul i64 %527, 1
>           %276 = add nsw i64 %275, 0
>           %277 = mul i64 %276, 9
>           %278 = add nsw i64 %277, 2
>           %279 = mul i64 %278, 2
>           %280 = add nsw i64 %279, 1
>           %281 = mul i64 %280, 4
>           %282 = add nsw i64 %281, %6
>           %283 = mul i64 %527, 4
>           %284 = add nsw i64 %283, %6
>           %285 = mul i64 %527, 1
>           %286 = add nsw i64 %285, 0
>           %287 = mul i64 %286, 9
>           %288 = add nsw i64 %287, 3
>           %289 = mul i64 %288, 2
>           %290 = add nsw i64 %289, 0
>           %291 = mul i64 %290, 4
>           %292 = add nsw i64 %291, %6
>           %293 = mul i64 %527, 4
>           %294 = add nsw i64 %293, %6
>           %295 = mul i64 %527, 1
>           %296 = add nsw i64 %295, 0
>           %297 = mul i64 %296, 9
>           %298 = add nsw i64 %297, 3
>           %299 = mul i64 %298, 2
>           %300 = add nsw i64 %299, 1
>           %301 = mul i64 %300, 4
>           %302 = add nsw i64 %301, %6
>           %303 = mul i64 %527, 4
>           %304 = add nsw i64 %303, %6
>           %305 = mul i64 %527, 1
>           %306 = add nsw i64 %305, 0
>           %307 = mul i64 %306, 9
>           %308 = add nsw i64 %307, 4
>           %309 = mul i64 %308, 2
>           %310 = add nsw i64 %309, 0
>           %311 = mul i64 %310, 4
>           %312 = add nsw i64 %311, %6
>           %313 = mul i64 %527, 4
>           %314 = add nsw i64 %313, %6
>           %315 = mul i64 %527, 1
>           %316 = add nsw i64 %315, 0
>           %317 = mul i64 %316, 9
>           %318 = add nsw i64 %317, 4
>           %319 = mul i64 %318, 2
>           %320 = add nsw i64 %319, 1
>           %321 = mul i64 %320, 4
>           %322 = add nsw i64 %321, %6
>           %323 = mul i64 %527, 4
>           %324 = add nsw i64 %323, %6
>           %325 = mul i64 %527, 1
>           %326 = add nsw i64 %325, 0
>           %327 = mul i64 %326, 9
>           %328 = add nsw i64 %327, 5
>           %329 = mul i64 %328, 2
>           %330 = add nsw i64 %329, 0
>           %331 = mul i64 %330, 4
>           %332 = add nsw i64 %331, %6
>           %333 = mul i64 %527, 4
>           %334 = add nsw i64 %333, %6
>           %335 = mul i64 %527, 1
>           %336 = add nsw i64 %335, 0
>           %337 = mul i64 %336, 9
>           %338 = add nsw i64 %337, 5
>           %339 = mul i64 %338, 2
>           %340 = add nsw i64 %339, 1
>           %341 = mul i64 %340, 4
>           %342 = add nsw i64 %341, %6
>           %343 = mul i64 %527, 4
>           %344 = add nsw i64 %343, %6
>           %345 = mul i64 %527, 1
>           %346 = add nsw i64 %345, 0
>           %347 = mul i64 %346, 9
>           %348 = add nsw i64 %347, 6
>           %349 = mul i64 %348, 2
>           %350 = add nsw i64 %349, 0
>           %351 = mul i64 %350, 4
>           %352 = add nsw i64 %351, %6
>           %353 = mul i64 %527, 4
>           %354 = add nsw i64 %353, %6
>           %355 = mul i64 %527, 1
>           %356 = add nsw i64 %355, 0
>           %357 = mul i64 %356, 9
>           %358 = add nsw i64 %357, 6
>           %359 = mul i64 %358, 2
>           %360 = add nsw i64 %359, 1
>           %361 = mul i64 %360, 4
>           %362 = add nsw i64 %361, %6
>           %363 = mul i64 %527, 4
>           %364 = add nsw i64 %363, %6
>           %365 = mul i64 %527, 1
>           %366 = add nsw i64 %365, 0
>           %367 = mul i64 %366, 9
>           %368 = add nsw i64 %367, 7
>           %369 = mul i64 %368, 2
>           %370 = add nsw i64 %369, 0
>           %371 = mul i64 %370, 4
>           %372 = add nsw i64 %371, %6
>           %373 = mul i64 %527, 4
>           %374 = add nsw i64 %373, %6
>           %375 = mul i64 %527, 1
>           %376 = add nsw i64 %375, 0
>           %377 = mul i64 %376, 9
>           %378 = add nsw i64 %377, 7
>           %379 = mul i64 %378, 2
>           %380 = add nsw i64 %379, 1
>           %381 = mul i64 %380, 4
>           %382 = add nsw i64 %381, %6
>           %383 = mul i64 %527, 4
>           %384 = add nsw i64 %383, %6
>           %385 = mul i64 %527, 1
>           %386 = add nsw i64 %385, 0
>           %387 = mul i64 %386, 9
>           %388 = add nsw i64 %387, 8
>           %389 = mul i64 %388, 2
>           %390 = add nsw i64 %389, 0
>           %391 = mul i64 %390, 4
>           %392 = add nsw i64 %391, %6
>           %393 = mul i64 %527, 4
>           %394 = add nsw i64 %393, %6
>           %395 = mul i64 %527, 1
>           %396 = add nsw i64 %395, 0
>           %397 = mul i64 %396, 9
>           %398 = add nsw i64 %397, 8
>           %399 = mul i64 %398, 2
>           %400 = add nsw i64 %399, 1
>           %401 = mul i64 %400, 4
>           %402 = add nsw i64 %401, %6
>           %403 = getelementptr float* %arg7, i64 %232
>           %404 = load float* %403
>           %405 = getelementptr float* %arg7, i64 %242
>           %406 = load float* %405
>           %407 = getelementptr float* %arg7, i64 %252
>           %408 = load float* %407
>           %409 = getelementptr float* %arg7, i64 %262
>           %410 = load float* %409
>           %411 = getelementptr float* %arg7, i64 %272
>           %412 = load float* %411
>           %413 = getelementptr float* %arg7, i64 %282
>           %414 = load float* %413
>           %415 = getelementptr float* %arg7, i64 %292
>           %416 = load float* %415
>           %417 = getelementptr float* %arg7, i64 %302
>           %418 = load float* %417
>           %419 = getelementptr float* %arg7, i64 %312
>           %420 = load float* %419
>           %421 = getelementptr float* %arg7, i64 %322
>           %422 = load float* %421
>           %423 = getelementptr float* %arg7, i64 %332
>           %424 = load float* %423
>           %425 = getelementptr float* %arg7, i64 %342
>           %426 = load float* %425
>           %427 = getelementptr float* %arg7, i64 %352
>           %428 = load float* %427
>           %429 = getelementptr float* %arg7, i64 %362
>           %430 = load float* %429
>           %431 = getelementptr float* %arg7, i64 %372
>           %432 = load float* %431
>           %433 = getelementptr float* %arg7, i64 %382
>           %434 = load float* %433
>           %435 = getelementptr float* %arg7, i64 %392
>           %436 = load float* %435
>           %437 = getelementptr float* %arg7, i64 %402
>           %438 = load float* %437
>           %439 = fmul float %406, %188
>           %440 = fmul float %404, %190
>           %441 = fadd float %440, %439
>           %442 = fmul float %406, %190
>           %443 = fmul float %404, %188
>           %444 = fsub float %443, %442
>           %445 = fmul float %410, %200
>           %446 = fmul float %408, %202
>           %447 = fadd float %446, %445
>           %448 = fmul float %410, %202
>           %449 = fmul float %408, %200
>           %450 = fsub float %449, %448
>           %451 = fadd float %444, %450
>           %452 = fadd float %441, %447
>           %453 = fmul float %414, %212
>           %454 = fmul float %412, %214
>           %455 = fadd float %454, %453
>           %456 = fmul float %414, %214
>           %457 = fmul float %412, %212
>           %458 = fsub float %457, %456
>           %459 = fadd float %451, %458
>           %460 = fadd float %452, %455
>           %461 = fmul float %418, %192
>           %462 = fmul float %416, %194
>           %463 = fadd float %462, %461
>           %464 = fmul float %418, %194
>           %465 = fmul float %416, %192
>           %466 = fsub float %465, %464
>           %467 = fadd float %459, %466
>           %468 = fadd float %460, %463
>           %469 = fmul float %422, %204
>           %470 = fmul float %420, %206
>           %471 = fadd float %470, %469
>           %472 = fmul float %422, %206
>           %473 = fmul float %420, %204
>           %474 = fsub float %473, %472
>           %475 = fadd float %467, %474
>           %476 = fadd float %468, %471
>           %477 = fmul float %426, %216
>           %478 = fmul float %424, %218
>           %479 = fadd float %478, %477
>           %480 = fmul float %426, %218
>           %481 = fmul float %424, %216
>           %482 = fsub float %481, %480
>           %483 = fadd float %475, %482
>           %484 = fadd float %476, %479
>           %485 = fmul float %430, %196
>           %486 = fmul float %428, %198
>           %487 = fadd float %486, %485
>           %488 = fmul float %430, %198
>           %489 = fmul float %428, %196
>           %490 = fsub float %489, %488
>           %491 = fadd float %483, %490
>           %492 = fadd float %484, %487
>           %493 = fmul float %434, %208
>           %494 = fmul float %432, %210
>           %495 = fadd float %494, %493
>           %496 = fmul float %434, %210
>           %497 = fmul float %432, %208
>           %498 = fsub float %497, %496
>           %499 = fadd float %491, %498
>           %500 = fadd float %492, %495
>           %501 = fmul float %438, %220
>           %502 = fmul float %436, %222
>           %503 = fadd float %502, %501
>           %504 = fmul float %438, %222
>           %505 = fmul float %436, %220
>           %506 = fsub float %505, %504
>           %507 = fadd float %499, %506
>           %508 = fadd float %500, %503
>           %509 = getelementptr double* %arg8, i32 0
>           %510 = load double* %509
>           %511 = fpext float %507 to double
>           %512 = fmul double %510, %511
>           %513 = mul i64 %527, 4
>           %514 = add nsw i64 %513, %6
>           %515 = mul i64 %527, 1
>           %516 = add nsw i64 %515, 0
>           %517 = mul i64 %516, 1
>           %518 = add nsw i64 %517, 0
>           %519 = mul i64 %518, 1
>           %520 = add nsw i64 %519, 0
>           %521 = mul i64 %520, 4
>           %522 = add nsw i64 %521, %6
>           %523 = getelementptr float* %arg5, i64 %522
>           %524 = fptrunc double %512 to float
>           store float %524, float* %523
>           br label %L7
>
>         L4:                                               ; preds = %L7
>           %525 = add nsw i64 %527, 1
>           %526 = icmp sge i64 %525, %5
>           br i1 %526, label %L6, label %L5
>
>         L5:                                               ; preds = %L4, %L2
>           %527 = phi i64 [ %525, %L4 ], [ %4, %L2 ]
>           br label %L3
>
>         L6:                                               ; preds = %L4
>           ret void
>
>         L7:                                               ; preds = %L3
>           %528 = add nsw i64 %6, 1
>           %529 = icmp sge i64 %528, 4
>           br i1 %529, label %L4, label %L3
>         }
>
>
>
>         _______________________________________________
>         LLVM Developers mailing list
>         LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>         http://llvm.cs.uiuc.edu
>         http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Renato Golin

2013-Nov-10 15:06 UTC

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

On 10 November 2013 06:48, Frank Winter <fwinter at jlab.org> wrote:
>  This is still strange: Why is this the only IR where 256 bit vectors are
> found and for all other functions it finds 128 bit vectors. This should be
> independent of the IR, right? So far I have tested a bunch of about 20
> functions generated and vectorized with this method. When processing all
> but this one the loop vectorizer finds 128 bit vectors as the widest. Only
> for this IR the loop vectorizer 'sees' the 256 bit version. Any idea?
>
Hi Frank,

The vectorization factor is computed from the element type size and the
ability to use instructions at the various vector widths. That depends
not only on the register width, but also on the instructions available on
the target. Furthermore, the cost of operations is taken into account: the
cost of shuffling/scattering 8 lanes might be twice that of 4 lanes, and
that might stop or redirect vectorization.

I don't know AVX that well, but it's possible that some operations are not
available on 256-bit wide vectors, so the vectorizer falls back to the
lowest common width, which in your other examples was 128-bit.
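
To make the reasoning above concrete, here is a minimal sketch of how a
width choice like this could fall out of a cost comparison. It is a toy
model, not LLVM's actual cost model or API: the function names, the
`cost_of` callback and the cost numbers are all made up for illustration.

```python
# Toy model only: NOT LLVM's cost model. All names and costs are hypothetical.

def max_vf(register_bits, element_bits):
    """Upper bound on lanes: how many elements fit in one vector register."""
    return register_bits // element_bits


def pick_vf(register_bits, element_bits, cost_of):
    """Return the power-of-two vectorization factor with the lowest cost per lane.

    cost_of(vf) is a hypothetical callback giving the estimated cost of one
    vector iteration at that factor, or None if some operation in the loop
    is not available at that width on the target.
    """
    best_vf, best_cost = 1, float(cost_of(1))
    vf = 2
    while vf <= max_vf(register_bits, element_bits):
        cost = cost_of(vf)
        # Skip widths the target cannot handle; otherwise compare per-lane cost.
        if cost is not None and cost / vf < best_cost:
            best_vf, best_cost = vf, cost / vf
        vf *= 2
    return best_vf


# Made-up per-iteration costs for a loop over 32-bit floats.
cheap_wide = {1: 10, 2: 12, 4: 16, 8: 24}   # 8 lanes pay off
costly_wide = {1: 10, 2: 12, 4: 16, 8: 40}  # 8-lane shuffles are expensive

print(pick_vf(256, 32, cheap_wide.get))   # 256-bit registers, cheap 8 lanes
print(pick_vf(256, 32, costly_wide.get))  # 256-bit registers, costly 8 lanes
print(pick_vf(128, 32, cheap_wide.get))   # 128-bit registers cap the factor at 4
```

Under these assumed numbers the same 256-bit target picks 8 lanes for one
loop and 4 lanes for another, which matches the observation that the
chosen width depends on both the hardware and the IR being vectorized.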

cheers,
--renato
