thr3ads.net - llvm dev - [LLVMdev] loop vectorizer erroneously finds 256 bit vectors [Nov 2013]


Frank Winter

2013-Nov-10 04:50 UTC

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

The loop vectorizer is doing an amazing job so far. Most of the time.
I just came across one function which led to unexpected behavior:

On this function the loop vectorizer reports 256 bits as the
widest vector register for the x86-64 architecture (!)

This is strange, as it always found the correct size of 128 bits
as the widest register. I isolated the IR of the function to check whether this
is reproducible outside of my application, and to my surprise it is!

If I run

opt -O1 -loop-vectorize -debug-only=loop-vectorize -vectorizer-min-trip-count=4 strange.ll -S

on the IR found below I get:

LV: Checking a loop in "main"
LV: Found a loop: L3
LV: Found an induction variable.
LV: We need to do 0 pointer comparisons.
LV: We don't need a runtime memory check.
LV: We can vectorize this loop!
LV: Found trip count: 4
LV: The Widest type: 32 bits.
LV: The Widest register is: 256 bits.
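The two "Widest" lines bound the vectorization factor. Assuming the upper bound is simply the widest register divided by the widest element type (a simplification, not a transcription of LLVM's code), 256 / 32 gives 8 lanes here, whereas a 128-bit register would give 4. The Python sketch below pulls both numbers out of the debug output and does that division; the helper names `_bits` and `max_vf_from_lv_log` are made up for illustration.

```python
import re


def _bits(pattern: str, log: str) -> int:
    """Return the integer captured by `pattern`, or raise if the line is missing."""
    match = re.search(pattern, log)
    if match is None:
        raise ValueError(f"line not found: {pattern!r}")
    return int(match.group(1))


def max_vf_from_lv_log(log: str) -> tuple[int, int, int]:
    """Read the widest element type and widest register (both in bits) from
    -debug-only=loop-vectorize output and return (type_bits, reg_bits, max_vf)."""
    type_bits = _bits(r"LV: The Widest type: (\d+) bits", log)
    reg_bits = _bits(r"LV: The Widest register is: (\d+) bits", log)
    # Simplified upper bound: how many lanes of the widest type fit in one register.
    return type_bits, reg_bits, reg_bits // type_bits


log = """\
LV: Found trip count: 4
LV: The Widest type: 32 bits.
LV: The Widest register is: 256 bits.
"""
print(max_vf_from_lv_log(log))  # (32, 256, 8)
```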

Wow, a Sandy Bridge processor with 256-bit SIMD?

The vectorizer carries on and decides that a vectorization factor of 8 is the best choice.

LV: Vector loop of width 8 costs: 38.
LV: Selecting VF = : 8.
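How a width is then chosen can be sketched as a cost-per-lane comparison, modelled loosely (an assumption, not LLVM's actual source) on the vectorizer's cost model: every candidate width, including the scalar loop (VF = 1), is scored by its estimated cost divided by its width, and the cheapest per lane wins. If the scalar loop comes out cheapest, the loop is left alone, which is one way a loop can end up "not worth vectorizing". The function name `select_vf` and all cost figures are hypothetical.

```python
def select_vf(costs: dict[int, float]) -> int:
    """Return the vectorization factor with the lowest estimated cost per lane.

    `costs` maps each candidate VF (1 = the scalar loop) to the estimated cost
    of one iteration of the loop at that width. A wider VF must be strictly
    cheaper per lane to win, so ties keep the narrower choice.
    """
    if 1 not in costs:
        raise ValueError("the scalar cost (VF = 1) is needed as the baseline")
    best_vf, best_per_lane = 1, costs[1]
    for vf in sorted(costs):
        per_lane = costs[vf] / vf
        if per_lane < best_per_lane:
            best_vf, best_per_lane = vf, per_lane
    return best_vf


# Hypothetical costs, not taken from the log above.
print(select_vf({1: 40, 2: 60, 4: 30, 8: 36}))  # 8 (36 / 8 = 4.5 per lane is cheapest)
print(select_vf({1: 4, 8: 36}))                 # 1 (vectorizing does not pay off)
```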

I didn't look into the machine code, but I guess something is going
wrong earlier.

I am wondering why it is reproducible and yet depends on the IR?!

PS: When running with -O3 it still finds 256 bits, but later decides
that vectorizing is not worthwhile.

Frank






target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

target triple = "x86_64-unknown-linux-elf"

define void @main(i64 %arg0, i64 %arg1, i64 %arg2, i1 %arg3, i64 %arg4, 
float* noalias %arg5, float* noalias %arg6, float* noalias %arg7, 
double* noalias %arg8) {
entrypoint:
   br i1 %arg3, label %L0, label %L1

L0:                                               ; preds = %entrypoint
   %0 = add nsw i64 %arg0, %arg4
   %1 = add nsw i64 %arg1, %arg4
   br label %L2

L1:                                               ; preds = %entrypoint
   br label %L2

L2:                                               ; preds = %L0, %L1
   %2 = phi i64 [ %arg0, %L1 ], [ %0, %L0 ]
   %3 = phi i64 [ %arg1, %L1 ], [ %1, %L0 ]
   %4 = sdiv i64 %2, 4
   %5 = sdiv i64 %3, 4
   br label %L5

L3:                                               ; preds = %L7, %L5
   %6 = phi i64 [ %528, %L7 ], [ 0, %L5 ]
   %7 = mul i64 %527, 4
   %8 = add nsw i64 %7, %6
   %9 = mul i64 %527, 1
   %10 = add nsw i64 %9, 0
   %11 = mul i64 %10, 9
   %12 = add nsw i64 %11, 0
   %13 = mul i64 %12, 2
   %14 = add nsw i64 %13, 0
   %15 = mul i64 %14, 4
   %16 = add nsw i64 %15, %6
   %17 = mul i64 %527, 4
   %18 = add nsw i64 %17, %6
   %19 = mul i64 %527, 1
   %20 = add nsw i64 %19, 0
   %21 = mul i64 %20, 9
   %22 = add nsw i64 %21, 0
   %23 = mul i64 %22, 2
   %24 = add nsw i64 %23, 1
   %25 = mul i64 %24, 4
   %26 = add nsw i64 %25, %6
   %27 = mul i64 %527, 4
   %28 = add nsw i64 %27, %6
   %29 = mul i64 %527, 1
   %30 = add nsw i64 %29, 0
   %31 = mul i64 %30, 9
   %32 = add nsw i64 %31, 1
   %33 = mul i64 %32, 2
   %34 = add nsw i64 %33, 0
   %35 = mul i64 %34, 4
   %36 = add nsw i64 %35, %6
   %37 = mul i64 %527, 4
   %38 = add nsw i64 %37, %6
   %39 = mul i64 %527, 1
   %40 = add nsw i64 %39, 0
   %41 = mul i64 %40, 9
   %42 = add nsw i64 %41, 1
   %43 = mul i64 %42, 2
   %44 = add nsw i64 %43, 1
   %45 = mul i64 %44, 4
   %46 = add nsw i64 %45, %6
   %47 = mul i64 %527, 4
   %48 = add nsw i64 %47, %6
   %49 = mul i64 %527, 1
   %50 = add nsw i64 %49, 0
   %51 = mul i64 %50, 9
   %52 = add nsw i64 %51, 2
   %53 = mul i64 %52, 2
   %54 = add nsw i64 %53, 0
   %55 = mul i64 %54, 4
   %56 = add nsw i64 %55, %6
   %57 = mul i64 %527, 4
   %58 = add nsw i64 %57, %6
   %59 = mul i64 %527, 1
   %60 = add nsw i64 %59, 0
   %61 = mul i64 %60, 9
   %62 = add nsw i64 %61, 2
   %63 = mul i64 %62, 2
   %64 = add nsw i64 %63, 1
   %65 = mul i64 %64, 4
   %66 = add nsw i64 %65, %6
   %67 = mul i64 %527, 4
   %68 = add nsw i64 %67, %6
   %69 = mul i64 %527, 1
   %70 = add nsw i64 %69, 0
   %71 = mul i64 %70, 9
   %72 = add nsw i64 %71, 3
   %73 = mul i64 %72, 2
   %74 = add nsw i64 %73, 0
   %75 = mul i64 %74, 4
   %76 = add nsw i64 %75, %6
   %77 = mul i64 %527, 4
   %78 = add nsw i64 %77, %6
   %79 = mul i64 %527, 1
   %80 = add nsw i64 %79, 0
   %81 = mul i64 %80, 9
   %82 = add nsw i64 %81, 3
   %83 = mul i64 %82, 2
   %84 = add nsw i64 %83, 1
   %85 = mul i64 %84, 4
   %86 = add nsw i64 %85, %6
   %87 = mul i64 %527, 4
   %88 = add nsw i64 %87, %6
   %89 = mul i64 %527, 1
   %90 = add nsw i64 %89, 0
   %91 = mul i64 %90, 9
   %92 = add nsw i64 %91, 4
   %93 = mul i64 %92, 2
   %94 = add nsw i64 %93, 0
   %95 = mul i64 %94, 4
   %96 = add nsw i64 %95, %6
   %97 = mul i64 %527, 4
   %98 = add nsw i64 %97, %6
   %99 = mul i64 %527, 1
   %100 = add nsw i64 %99, 0
   %101 = mul i64 %100, 9
   %102 = add nsw i64 %101, 4
   %103 = mul i64 %102, 2
   %104 = add nsw i64 %103, 1
   %105 = mul i64 %104, 4
   %106 = add nsw i64 %105, %6
   %107 = mul i64 %527, 4
   %108 = add nsw i64 %107, %6
   %109 = mul i64 %527, 1
   %110 = add nsw i64 %109, 0
   %111 = mul i64 %110, 9
   %112 = add nsw i64 %111, 5
   %113 = mul i64 %112, 2
   %114 = add nsw i64 %113, 0
   %115 = mul i64 %114, 4
   %116 = add nsw i64 %115, %6
   %117 = mul i64 %527, 4
   %118 = add nsw i64 %117, %6
   %119 = mul i64 %527, 1
   %120 = add nsw i64 %119, 0
   %121 = mul i64 %120, 9
   %122 = add nsw i64 %121, 5
   %123 = mul i64 %122, 2
   %124 = add nsw i64 %123, 1
   %125 = mul i64 %124, 4
   %126 = add nsw i64 %125, %6
   %127 = mul i64 %527, 4
   %128 = add nsw i64 %127, %6
   %129 = mul i64 %527, 1
   %130 = add nsw i64 %129, 0
   %131 = mul i64 %130, 9
   %132 = add nsw i64 %131, 6
   %133 = mul i64 %132, 2
   %134 = add nsw i64 %133, 0
   %135 = mul i64 %134, 4
   %136 = add nsw i64 %135, %6
   %137 = mul i64 %527, 4
   %138 = add nsw i64 %137, %6
   %139 = mul i64 %527, 1
   %140 = add nsw i64 %139, 0
   %141 = mul i64 %140, 9
   %142 = add nsw i64 %141, 6
   %143 = mul i64 %142, 2
   %144 = add nsw i64 %143, 1
   %145 = mul i64 %144, 4
   %146 = add nsw i64 %145, %6
   %147 = mul i64 %527, 4
   %148 = add nsw i64 %147, %6
   %149 = mul i64 %527, 1
   %150 = add nsw i64 %149, 0
   %151 = mul i64 %150, 9
   %152 = add nsw i64 %151, 7
   %153 = mul i64 %152, 2
   %154 = add nsw i64 %153, 0
   %155 = mul i64 %154, 4
   %156 = add nsw i64 %155, %6
   %157 = mul i64 %527, 4
   %158 = add nsw i64 %157, %6
   %159 = mul i64 %527, 1
   %160 = add nsw i64 %159, 0
   %161 = mul i64 %160, 9
   %162 = add nsw i64 %161, 7
   %163 = mul i64 %162, 2
   %164 = add nsw i64 %163, 1
   %165 = mul i64 %164, 4
   %166 = add nsw i64 %165, %6
   %167 = mul i64 %527, 4
   %168 = add nsw i64 %167, %6
   %169 = mul i64 %527, 1
   %170 = add nsw i64 %169, 0
   %171 = mul i64 %170, 9
   %172 = add nsw i64 %171, 8
   %173 = mul i64 %172, 2
   %174 = add nsw i64 %173, 0
   %175 = mul i64 %174, 4
   %176 = add nsw i64 %175, %6
   %177 = mul i64 %527, 4
   %178 = add nsw i64 %177, %6
   %179 = mul i64 %527, 1
   %180 = add nsw i64 %179, 0
   %181 = mul i64 %180, 9
   %182 = add nsw i64 %181, 8
   %183 = mul i64 %182, 2
   %184 = add nsw i64 %183, 1
   %185 = mul i64 %184, 4
   %186 = add nsw i64 %185, %6
   %187 = getelementptr float* %arg6, i64 %16
   %188 = load float* %187
   %189 = getelementptr float* %arg6, i64 %26
   %190 = load float* %189
   %191 = getelementptr float* %arg6, i64 %36
   %192 = load float* %191
   %193 = getelementptr float* %arg6, i64 %46
   %194 = load float* %193
   %195 = getelementptr float* %arg6, i64 %56
   %196 = load float* %195
   %197 = getelementptr float* %arg6, i64 %66
   %198 = load float* %197
   %199 = getelementptr float* %arg6, i64 %76
   %200 = load float* %199
   %201 = getelementptr float* %arg6, i64 %86
   %202 = load float* %201
   %203 = getelementptr float* %arg6, i64 %96
   %204 = load float* %203
   %205 = getelementptr float* %arg6, i64 %106
   %206 = load float* %205
   %207 = getelementptr float* %arg6, i64 %116
   %208 = load float* %207
   %209 = getelementptr float* %arg6, i64 %126
   %210 = load float* %209
   %211 = getelementptr float* %arg6, i64 %136
   %212 = load float* %211
   %213 = getelementptr float* %arg6, i64 %146
   %214 = load float* %213
   %215 = getelementptr float* %arg6, i64 %156
   %216 = load float* %215
   %217 = getelementptr float* %arg6, i64 %166
   %218 = load float* %217
   %219 = getelementptr float* %arg6, i64 %176
   %220 = load float* %219
   %221 = getelementptr float* %arg6, i64 %186
   %222 = load float* %221
   %223 = mul i64 %527, 4
   %224 = add nsw i64 %223, %6
   %225 = mul i64 %527, 1
   %226 = add nsw i64 %225, 0
   %227 = mul i64 %226, 9
   %228 = add nsw i64 %227, 0
   %229 = mul i64 %228, 2
   %230 = add nsw i64 %229, 0
   %231 = mul i64 %230, 4
   %232 = add nsw i64 %231, %6
   %233 = mul i64 %527, 4
   %234 = add nsw i64 %233, %6
   %235 = mul i64 %527, 1
   %236 = add nsw i64 %235, 0
   %237 = mul i64 %236, 9
   %238 = add nsw i64 %237, 0
   %239 = mul i64 %238, 2
   %240 = add nsw i64 %239, 1
   %241 = mul i64 %240, 4
   %242 = add nsw i64 %241, %6
   %243 = mul i64 %527, 4
   %244 = add nsw i64 %243, %6
   %245 = mul i64 %527, 1
   %246 = add nsw i64 %245, 0
   %247 = mul i64 %246, 9
   %248 = add nsw i64 %247, 1
   %249 = mul i64 %248, 2
   %250 = add nsw i64 %249, 0
   %251 = mul i64 %250, 4
   %252 = add nsw i64 %251, %6
   %253 = mul i64 %527, 4
   %254 = add nsw i64 %253, %6
   %255 = mul i64 %527, 1
   %256 = add nsw i64 %255, 0
   %257 = mul i64 %256, 9
   %258 = add nsw i64 %257, 1
   %259 = mul i64 %258, 2
   %260 = add nsw i64 %259, 1
   %261 = mul i64 %260, 4
   %262 = add nsw i64 %261, %6
   %263 = mul i64 %527, 4
   %264 = add nsw i64 %263, %6
   %265 = mul i64 %527, 1
   %266 = add nsw i64 %265, 0
   %267 = mul i64 %266, 9
   %268 = add nsw i64 %267, 2
   %269 = mul i64 %268, 2
   %270 = add nsw i64 %269, 0
   %271 = mul i64 %270, 4
   %272 = add nsw i64 %271, %6
   %273 = mul i64 %527, 4
   %274 = add nsw i64 %273, %6
   %275 = mul i64 %527, 1
   %276 = add nsw i64 %275, 0
   %277 = mul i64 %276, 9
   %278 = add nsw i64 %277, 2
   %279 = mul i64 %278, 2
   %280 = add nsw i64 %279, 1
   %281 = mul i64 %280, 4
   %282 = add nsw i64 %281, %6
   %283 = mul i64 %527, 4
   %284 = add nsw i64 %283, %6
   %285 = mul i64 %527, 1
   %286 = add nsw i64 %285, 0
   %287 = mul i64 %286, 9
   %288 = add nsw i64 %287, 3
   %289 = mul i64 %288, 2
   %290 = add nsw i64 %289, 0
   %291 = mul i64 %290, 4
   %292 = add nsw i64 %291, %6
   %293 = mul i64 %527, 4
   %294 = add nsw i64 %293, %6
   %295 = mul i64 %527, 1
   %296 = add nsw i64 %295, 0
   %297 = mul i64 %296, 9
   %298 = add nsw i64 %297, 3
   %299 = mul i64 %298, 2
   %300 = add nsw i64 %299, 1
   %301 = mul i64 %300, 4
   %302 = add nsw i64 %301, %6
   %303 = mul i64 %527, 4
   %304 = add nsw i64 %303, %6
   %305 = mul i64 %527, 1
   %306 = add nsw i64 %305, 0
   %307 = mul i64 %306, 9
   %308 = add nsw i64 %307, 4
   %309 = mul i64 %308, 2
   %310 = add nsw i64 %309, 0
   %311 = mul i64 %310, 4
   %312 = add nsw i64 %311, %6
   %313 = mul i64 %527, 4
   %314 = add nsw i64 %313, %6
   %315 = mul i64 %527, 1
   %316 = add nsw i64 %315, 0
   %317 = mul i64 %316, 9
   %318 = add nsw i64 %317, 4
   %319 = mul i64 %318, 2
   %320 = add nsw i64 %319, 1
   %321 = mul i64 %320, 4
   %322 = add nsw i64 %321, %6
   %323 = mul i64 %527, 4
   %324 = add nsw i64 %323, %6
   %325 = mul i64 %527, 1
   %326 = add nsw i64 %325, 0
   %327 = mul i64 %326, 9
   %328 = add nsw i64 %327, 5
   %329 = mul i64 %328, 2
   %330 = add nsw i64 %329, 0
   %331 = mul i64 %330, 4
   %332 = add nsw i64 %331, %6
   %333 = mul i64 %527, 4
   %334 = add nsw i64 %333, %6
   %335 = mul i64 %527, 1
   %336 = add nsw i64 %335, 0
   %337 = mul i64 %336, 9
   %338 = add nsw i64 %337, 5
   %339 = mul i64 %338, 2
   %340 = add nsw i64 %339, 1
   %341 = mul i64 %340, 4
   %342 = add nsw i64 %341, %6
   %343 = mul i64 %527, 4
   %344 = add nsw i64 %343, %6
   %345 = mul i64 %527, 1
   %346 = add nsw i64 %345, 0
   %347 = mul i64 %346, 9
   %348 = add nsw i64 %347, 6
   %349 = mul i64 %348, 2
   %350 = add nsw i64 %349, 0
   %351 = mul i64 %350, 4
   %352 = add nsw i64 %351, %6
   %353 = mul i64 %527, 4
   %354 = add nsw i64 %353, %6
   %355 = mul i64 %527, 1
   %356 = add nsw i64 %355, 0
   %357 = mul i64 %356, 9
   %358 = add nsw i64 %357, 6
   %359 = mul i64 %358, 2
   %360 = add nsw i64 %359, 1
   %361 = mul i64 %360, 4
   %362 = add nsw i64 %361, %6
   %363 = mul i64 %527, 4
   %364 = add nsw i64 %363, %6
   %365 = mul i64 %527, 1
   %366 = add nsw i64 %365, 0
   %367 = mul i64 %366, 9
   %368 = add nsw i64 %367, 7
   %369 = mul i64 %368, 2
   %370 = add nsw i64 %369, 0
   %371 = mul i64 %370, 4
   %372 = add nsw i64 %371, %6
   %373 = mul i64 %527, 4
   %374 = add nsw i64 %373, %6
   %375 = mul i64 %527, 1
   %376 = add nsw i64 %375, 0
   %377 = mul i64 %376, 9
   %378 = add nsw i64 %377, 7
   %379 = mul i64 %378, 2
   %380 = add nsw i64 %379, 1
   %381 = mul i64 %380, 4
   %382 = add nsw i64 %381, %6
   %383 = mul i64 %527, 4
   %384 = add nsw i64 %383, %6
   %385 = mul i64 %527, 1
   %386 = add nsw i64 %385, 0
   %387 = mul i64 %386, 9
   %388 = add nsw i64 %387, 8
   %389 = mul i64 %388, 2
   %390 = add nsw i64 %389, 0
   %391 = mul i64 %390, 4
   %392 = add nsw i64 %391, %6
   %393 = mul i64 %527, 4
   %394 = add nsw i64 %393, %6
   %395 = mul i64 %527, 1
   %396 = add nsw i64 %395, 0
   %397 = mul i64 %396, 9
   %398 = add nsw i64 %397, 8
   %399 = mul i64 %398, 2
   %400 = add nsw i64 %399, 1
   %401 = mul i64 %400, 4
   %402 = add nsw i64 %401, %6
   %403 = getelementptr float* %arg7, i64 %232
   %404 = load float* %403
   %405 = getelementptr float* %arg7, i64 %242
   %406 = load float* %405
   %407 = getelementptr float* %arg7, i64 %252
   %408 = load float* %407
   %409 = getelementptr float* %arg7, i64 %262
   %410 = load float* %409
   %411 = getelementptr float* %arg7, i64 %272
   %412 = load float* %411
   %413 = getelementptr float* %arg7, i64 %282
   %414 = load float* %413
   %415 = getelementptr float* %arg7, i64 %292
   %416 = load float* %415
   %417 = getelementptr float* %arg7, i64 %302
   %418 = load float* %417
   %419 = getelementptr float* %arg7, i64 %312
   %420 = load float* %419
   %421 = getelementptr float* %arg7, i64 %322
   %422 = load float* %421
   %423 = getelementptr float* %arg7, i64 %332
   %424 = load float* %423
   %425 = getelementptr float* %arg7, i64 %342
   %426 = load float* %425
   %427 = getelementptr float* %arg7, i64 %352
   %428 = load float* %427
   %429 = getelementptr float* %arg7, i64 %362
   %430 = load float* %429
   %431 = getelementptr float* %arg7, i64 %372
   %432 = load float* %431
   %433 = getelementptr float* %arg7, i64 %382
   %434 = load float* %433
   %435 = getelementptr float* %arg7, i64 %392
   %436 = load float* %435
   %437 = getelementptr float* %arg7, i64 %402
   %438 = load float* %437
   %439 = fmul float %406, %188
   %440 = fmul float %404, %190
   %441 = fadd float %440, %439
   %442 = fmul float %406, %190
   %443 = fmul float %404, %188
   %444 = fsub float %443, %442
   %445 = fmul float %410, %200
   %446 = fmul float %408, %202
   %447 = fadd float %446, %445
   %448 = fmul float %410, %202
   %449 = fmul float %408, %200
   %450 = fsub float %449, %448
   %451 = fadd float %444, %450
   %452 = fadd float %441, %447
   %453 = fmul float %414, %212
   %454 = fmul float %412, %214
   %455 = fadd float %454, %453
   %456 = fmul float %414, %214
   %457 = fmul float %412, %212
   %458 = fsub float %457, %456
   %459 = fadd float %451, %458
   %460 = fadd float %452, %455
   %461 = fmul float %418, %192
   %462 = fmul float %416, %194
   %463 = fadd float %462, %461
   %464 = fmul float %418, %194
   %465 = fmul float %416, %192
   %466 = fsub float %465, %464
   %467 = fadd float %459, %466
   %468 = fadd float %460, %463
   %469 = fmul float %422, %204
   %470 = fmul float %420, %206
   %471 = fadd float %470, %469
   %472 = fmul float %422, %206
   %473 = fmul float %420, %204
   %474 = fsub float %473, %472
   %475 = fadd float %467, %474
   %476 = fadd float %468, %471
   %477 = fmul float %426, %216
   %478 = fmul float %424, %218
   %479 = fadd float %478, %477
   %480 = fmul float %426, %218
   %481 = fmul float %424, %216
   %482 = fsub float %481, %480
   %483 = fadd float %475, %482
   %484 = fadd float %476, %479
   %485 = fmul float %430, %196
   %486 = fmul float %428, %198
   %487 = fadd float %486, %485
   %488 = fmul float %430, %198
   %489 = fmul float %428, %196
   %490 = fsub float %489, %488
   %491 = fadd float %483, %490
   %492 = fadd float %484, %487
   %493 = fmul float %434, %208
   %494 = fmul float %432, %210
   %495 = fadd float %494, %493
   %496 = fmul float %434, %210
   %497 = fmul float %432, %208
   %498 = fsub float %497, %496
   %499 = fadd float %491, %498
   %500 = fadd float %492, %495
   %501 = fmul float %438, %220
   %502 = fmul float %436, %222
   %503 = fadd float %502, %501
   %504 = fmul float %438, %222
   %505 = fmul float %436, %220
   %506 = fsub float %505, %504
   %507 = fadd float %499, %506
   %508 = fadd float %500, %503
   %509 = getelementptr double* %arg8, i32 0
   %510 = load double* %509
   %511 = fpext float %507 to double
   %512 = fmul double %510, %511
   %513 = mul i64 %527, 4
   %514 = add nsw i64 %513, %6
   %515 = mul i64 %527, 1
   %516 = add nsw i64 %515, 0
   %517 = mul i64 %516, 1
   %518 = add nsw i64 %517, 0
   %519 = mul i64 %518, 1
   %520 = add nsw i64 %519, 0
   %521 = mul i64 %520, 4
   %522 = add nsw i64 %521, %6
   %523 = getelementptr float* %arg5, i64 %522
   %524 = fptrunc double %512 to float
   store float %524, float* %523
   br label %L7

L4:                                               ; preds = %L7
   %525 = add nsw i64 %527, 1
   %526 = icmp sge i64 %525, %5
   br i1 %526, label %L6, label %L5

L5:                                               ; preds = %L4, %L2
   %527 = phi i64 [ %525, %L4 ], [ %4, %L2 ]
   br label %L3

L6:                                               ; preds = %L4
   ret void

L7:                                               ; preds = %L3
   %528 = add nsw i64 %6, 1
   %529 = icmp sge i64 %528, 4
   br i1 %529, label %L4, label %L3
}

Frank Winter

2013-Nov-10 14:05 UTC


[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

I looked into this further. For the IR I sent earlier, the vector width of
256 bits is found mistakenly (and reproducibly) on this hardware:

model name    : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz

For the same IR, the loop vectorizer finds the correct vector width
(128 bits) on:

model name    : Intel(R) Xeon(R) CPU           E5630  @ 2.53GHz
model name    : Intel(R) Core(TM) i7 CPU       M 640  @ 2.80GHz

Thus, the behavior depends on which hardware I run on.
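One difference between these machines looks relevant: the Xeon E5-2650 is a Sandy Bridge part and supports AVX, whose YMM registers are 256 bits wide, while the Xeon E5630 and the Core i7 M 640 are Westmere-generation parts limited to SSE (128-bit XMM registers). The Python sketch that follows checks what a Linux host advertises. It assumes, as a simplification, that AVX implies a 256-bit widest register and SSE alone a 128-bit one (AVX-512 is ignored), and the function names are hypothetical. On a real machine you would pass it the contents of /proc/cpuinfo.

```python
def has_avx(cpuinfo_text: str) -> bool:
    """True if the first 'flags' line of /proc/cpuinfo-style text lists 'avx'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # Exact token match, so 'avx2' or 'avx512f' alone do not count.
            return "avx" in value.split()
    return False


def widest_vector_register_bits(cpuinfo_text: str) -> int:
    # Simplifying assumption: AVX means 256-bit YMM registers, otherwise
    # 128-bit SSE XMM registers (AVX-512 is deliberately ignored).
    return 256 if has_avx(cpuinfo_text) else 128


# Trimmed, made-up excerpts; on Linux, pass open("/proc/cpuinfo").read() instead.
with_avx = "model name\t: example\nflags\t\t: fpu sse sse2 sse4_2 avx\n"
without_avx = "model name\t: example\nflags\t\t: fpu sse sse2 sse4_2\n"
print(widest_vector_register_bits(with_avx))     # 256
print(widest_vector_register_bits(without_avx))  # 128
```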

I am using the JIT execution engine (original interface).

Frank


On 09/11/13 23:50, Frank Winter wrote:
> The loop vectorizer is doing an amazing job so far. Most of the time.
> [...]
>   %418 = load float* %417
>   %419 = getelementptr float* %arg7, i64 %312
>   %420 = load float* %419
>   %421 = getelementptr float* %arg7, i64 %322
>   %422 = load float* %421
>   %423 = getelementptr float* %arg7, i64 %332
>   %424 = load float* %423
>   %425 = getelementptr float* %arg7, i64 %342
>   %426 = load float* %425
>   %427 = getelementptr float* %arg7, i64 %352
>   %428 = load float* %427
>   %429 = getelementptr float* %arg7, i64 %362
>   %430 = load float* %429
>   %431 = getelementptr float* %arg7, i64 %372
>   %432 = load float* %431
>   %433 = getelementptr float* %arg7, i64 %382
>   %434 = load float* %433
>   %435 = getelementptr float* %arg7, i64 %392
>   %436 = load float* %435
>   %437 = getelementptr float* %arg7, i64 %402
>   %438 = load float* %437
>   %439 = fmul float %406, %188
>   %440 = fmul float %404, %190
>   %441 = fadd float %440, %439
>   %442 = fmul float %406, %190
>   %443 = fmul float %404, %188
>   %444 = fsub float %443, %442
>   %445 = fmul float %410, %200
>   %446 = fmul float %408, %202
>   %447 = fadd float %446, %445
>   %448 = fmul float %410, %202
>   %449 = fmul float %408, %200
>   %450 = fsub float %449, %448
>   %451 = fadd float %444, %450
>   %452 = fadd float %441, %447
>   %453 = fmul float %414, %212
>   %454 = fmul float %412, %214
>   %455 = fadd float %454, %453
>   %456 = fmul float %414, %214
>   %457 = fmul float %412, %212
>   %458 = fsub float %457, %456
>   %459 = fadd float %451, %458
>   %460 = fadd float %452, %455
>   %461 = fmul float %418, %192
>   %462 = fmul float %416, %194
>   %463 = fadd float %462, %461
>   %464 = fmul float %418, %194
>   %465 = fmul float %416, %192
>   %466 = fsub float %465, %464
>   %467 = fadd float %459, %466
>   %468 = fadd float %460, %463
>   %469 = fmul float %422, %204
>   %470 = fmul float %420, %206
>   %471 = fadd float %470, %469
>   %472 = fmul float %422, %206
>   %473 = fmul float %420, %204
>   %474 = fsub float %473, %472
>   %475 = fadd float %467, %474
>   %476 = fadd float %468, %471
>   %477 = fmul float %426, %216
>   %478 = fmul float %424, %218
>   %479 = fadd float %478, %477
>   %480 = fmul float %426, %218
>   %481 = fmul float %424, %216
>   %482 = fsub float %481, %480
>   %483 = fadd float %475, %482
>   %484 = fadd float %476, %479
>   %485 = fmul float %430, %196
>   %486 = fmul float %428, %198
>   %487 = fadd float %486, %485
>   %488 = fmul float %430, %198
>   %489 = fmul float %428, %196
>   %490 = fsub float %489, %488
>   %491 = fadd float %483, %490
>   %492 = fadd float %484, %487
>   %493 = fmul float %434, %208
>   %494 = fmul float %432, %210
>   %495 = fadd float %494, %493
>   %496 = fmul float %434, %210
>   %497 = fmul float %432, %208
>   %498 = fsub float %497, %496
>   %499 = fadd float %491, %498
>   %500 = fadd float %492, %495
>   %501 = fmul float %438, %220
>   %502 = fmul float %436, %222
>   %503 = fadd float %502, %501
>   %504 = fmul float %438, %222
>   %505 = fmul float %436, %220
>   %506 = fsub float %505, %504
>   %507 = fadd float %499, %506
>   %508 = fadd float %500, %503
>   %509 = getelementptr double* %arg8, i32 0
>   %510 = load double* %509
>   %511 = fpext float %507 to double
>   %512 = fmul double %510, %511
>   %513 = mul i64 %527, 4
>   %514 = add nsw i64 %513, %6
>   %515 = mul i64 %527, 1
>   %516 = add nsw i64 %515, 0
>   %517 = mul i64 %516, 1
>   %518 = add nsw i64 %517, 0
>   %519 = mul i64 %518, 1
>   %520 = add nsw i64 %519, 0
>   %521 = mul i64 %520, 4
>   %522 = add nsw i64 %521, %6
>   %523 = getelementptr float* %arg5, i64 %522
>   %524 = fptrunc double %512 to float
>   store float %524, float* %523
>   br label %L7
>
> L4:                                               ; preds = %L7
>   %525 = add nsw i64 %527, 1
>   %526 = icmp sge i64 %525, %5
>   br i1 %526, label %L6, label %L5
>
> L5:                                               ; preds = %L4, %L2
>   %527 = phi i64 [ %525, %L4 ], [ %4, %L2 ]
>   br label %L3
>
> L6:                                               ; preds = %L4
>   ret void
>
> L7:                                               ; preds = %L3
>   %528 = add nsw i64 %6, 1
>   %529 = icmp sge i64 %528, 4
>   br i1 %529, label %L4, label %L3
> }
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
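The quoted IR is easier to follow as C. What follows is our reading of `@main`, not code from the thread, and the names `trace_kernel` and `elem` are hypothetical.

Each `mul`/`add` chain computes the offset `((i*9 + k)*2 + p)*4 + j`:

- `i` is the outer index `%527`.
- `k` (0..8) selects an element of a 3x3 complex matrix.
- `p` selects the real or imaginary part.
- `j` is the inner induction variable `%6`.

The 18 loads from `%arg7` and `%arg6` pair element (r,c) of one matrix with element (c,r) of the other. The loop therefore computes the real part of tr(A·B) for each of the four lanes, multiplies it by `%arg8[0]`, and stores a float to `%arg5`. `%arg2` is unused in the IR.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: offset of lane j, part p (0 = real, 1 = imag) of
   complex element k (k = 3*row + col) for outer index i, as read off the
   mul/add chains in the IR: ((i*9 + k)*2 + p)*4 + j. */
static size_t elem(long i, int k, int p, int j)
{
    return (size_t)(((i * 9 + k) * 2 + p) * 4 + j);
}

/* Our reading of @main: arg5 = out, arg6 = b, arg7 = a, arg8 = scale.
   arg2 is omitted because the IR never uses it. */
void trace_kernel(long arg0, long arg1, bool arg3, long arg4,
                  float *out, const float *b, const float *a,
                  const double *scale)
{
    long lo = arg3 ? arg0 + arg4 : arg0;   /* blocks L0/L1/L2 */
    long hi = arg3 ? arg1 + arg4 : arg1;
    long i = lo / 4;                       /* %4: sdiv truncates like C '/' */
    do {                                   /* outer loop L5 -> L3 ... L4 */
        for (int j = 0; j < 4; ++j) {      /* inner loop L3/L7 (%6), trip count 4 */
            float re = 0.0f;
            for (int r = 0; r < 3; ++r)
                for (int c = 0; c < 3; ++c) {
                    int ka = 3 * r + c, kb = 3 * c + r;  /* A[r][c] pairs with B[c][r] */
                    float ar = a[elem(i, ka, 0, j)], ai = a[elem(i, ka, 1, j)];
                    float br = b[elem(i, kb, 0, j)], bi = b[elem(i, kb, 1, j)];
                    re += ar * br - ai * bi;             /* Re(A[r][c] * B[c][r]) */
                }
            /* The imaginary part (%508) is computed in the IR but never used. */
            out[i * 4 + j] = (float)(scale[0] * (double)re);
        }
        ++i;                               /* %525; exit once i >= %5 */
    } while (i < hi / 4);
}
```

The summation order follows the IR's chain of `fadd`s. Results should therefore match the IR, unless the C compiler contracts multiplies and adds into FMAs.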

Renato Golin

2013-Nov-10 14:39 UTC


[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

Hi Frank,

I'm not an Intel expert, but it seems that your Xeon E5 supports AVX, which
does have 256-bit vectors. The other two only support SSE, whose vector
registers are only 128 bits wide.
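Renato's explanation can be checked on each machine. The sketch below assumes GCC or Clang targeting x86/x86-64 and uses their `__builtin_cpu_supports` builtin. The function name is hypothetical. Mapping "AVX present" to 256 bits and "SSE only" to 128 bits is a simplification that ignores later extensions such as AVX-512.

This is not how LLVM itself detects the host. LLVM has its own code for that, for example `llvm::sys::getHostCPUName()`.

```c
/* Sketch: assumes GCC or Clang targeting x86/x86-64, where
   __builtin_cpu_init() and __builtin_cpu_supports() are available.
   The name and the 256/128 mapping are illustrative assumptions. */
unsigned widest_vector_register_bits(void)
{
    __builtin_cpu_init();                /* initialise the compiler's CPU feature data */
    if (__builtin_cpu_supports("avx"))   /* Sandy Bridge, e.g. Xeon E5-2650 */
        return 256u;                     /* 256-bit YMM registers */
    return 128u;                         /* SSE only, e.g. Xeon E5630, Core i7 M 640 */
}
```

Call this from a small `main` and print the result. If AVX support is indeed the difference, it should report 256 on the Xeon E5-2650 and 128 on the E5630 and the Core i7 M 640.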

cheers,
--renato


On 10 November 2013 06:05, Frank Winter <fwinter at jlab.org> wrote:
> I looked into this further. For the IR I sent earlier, the 256-bit vector
> width is found (mistakenly, and reproducibly) on this hardware:
>
> model name    : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
>
> For the same IR, the loop vectorizer finds the correct vector width (128
> bits) on:
>
> model name    : Intel(R) Xeon(R) CPU           E5630  @ 2.53GHz
> model name    : Intel(R) Core(TM) i7 CPU       M 640  @ 2.80GHz
>
> Thus, the behavior depends on which hardware I run on.
>
> I am using the JIT execution engine (original interface).
>
> Frank

Nadav Rotem

2013-Nov-11 05:38 UTC


[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

On Nov 9, 2013, at 8:50 PM, Frank Winter <fwinter at jlab.org> wrote:
> Wow, a Sandybridge processor with 256 bit SIMD?

Yes. Sandy Bridge (originally codenamed Gesher) was the first processor to feature AVX.
http://en.wikipedia.org/wiki/Sandy_Bridge
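With AVX available, the numbers in the original debug log follow from simple arithmetic. The largest vectorization factor is the widest register divided by the widest element type:

- Sandy Bridge (AVX): 256 / 32 = 8.
- SSE only: 128 / 32 = 4.

As we understand it, the cost model then evaluates power-of-two factors up to that bound. That would explain why the -O3 run can still report 256-bit registers yet decide vectorizing is not worthwhile. The helper names below are hypothetical and only mirror that arithmetic. They are not LLVM's API.

```c
/* Hypothetical helpers, not LLVM's API: they mirror the arithmetic behind
   "Widest type: 32 bits", "Widest register is: 256 bits", "Selecting VF = 8". */
unsigned max_vf(unsigned widest_register_bits, unsigned widest_type_bits)
{
    return widest_register_bits / widest_type_bits;
}

/* Writes the power-of-two candidates 1, 2, 4, ..., max_vf into `out`
   (at most `cap` entries) and returns how many were written. */
unsigned candidate_vfs(unsigned widest_register_bits, unsigned widest_type_bits,
                       unsigned *out, unsigned cap)
{
    unsigned limit = max_vf(widest_register_bits, widest_type_bits);
    unsigned n = 0;
    for (unsigned vf = 1; vf <= limit && n < cap; vf *= 2)
        out[n++] = vf;                  /* each candidate would then be costed */
    return n;
}
```

For the loop in this thread, `candidate_vfs(256, 32, ...)` yields 1, 2, 4, 8 on the E5-2650. On the SSE-only machines, `candidate_vfs(128, 32, ...)` stops at 4.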

