thr3ads.net - llvm dev - [LLVMdev] SIMD instructions and memory alignment on X86 [Jul 2013]

Peter Newman

2013-Jul-19 04:09 UTC

[LLVMdev] SIMD instructions and memory alignment on X86

I've attached the module->dump() that our code is producing. 
Unfortunately this is the smallest test case I have available.

This is before any optimization passes are applied. Two separate 
modules exist at the time, and there are no guarantees about the order 
in which the surrounding code calls their functions, so there may be 
some interaction between them. There shouldn't be, as they don't refer 
to any common memory. No multi-threading is occurring.

The function in module-dump.ll (called crashfunc in this file) is called 
with
-        func_params    0x0018f3b0    double [3]
         [0x0]    -11.339976634695301    double
         [0x1]    -9.7504239056205506    double
         [0x2]    -5.2900856817382804    double
at the time of the exception.

This is compiled for the "i686-pc-win32" triple. All of the non-intrinsic
functions referred to in these modules are the standard equivalents from 
the MSVC library (e.g. @asin is the standard C library function
double asin(double)).

Hopefully this is reproducible for you.

--
PeterN

On 18/07/2013 4:37 PM, Craig Topper wrote:
> Are you able to send any IR for others to reproduce this issue?
>
>
> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman <peter at uformia.com> wrote:
>
>     Unfortunately, this doesn't appear to be the bug I'm hitting. I
>     applied the fix to my source and it didn't make a difference.
>
>     Further testing also showed the same behavior with other SIMD
>     instructions. The common factor is that in each case ECX is set
>     to 0x7fffffff and the operation uses xmm ptr ecx+offset.
>
>     Additionally, turning the optimization level passed to createJIT
>     down appears to avoid it, so I'm now leaning towards a bug in one
>     of the optimization passes.
>
>     I'm going to dig through the passes controlled by that parameter
>     and see if I can narrow down which optimization is causing it.
>
>     Peter N
>
>
>     On 17/07/2013 1:58 PM, Solomon Boulos wrote:
>
>         As someone off list just told me, perhaps my new bug is the
>         same issue:
>
>         http://llvm.org/bugs/show_bug.cgi?id=16640
>
>         Do you happen to be using FastISel?
>
>         Solomon
>
>         On Jul 16, 2013, at 6:39 PM, Peter Newman <peter at uformia.com> wrote:
>
>             Hello all,
>
>             I'm currently debugging a crash occurring in our
>             program. In LLVM 3.2 and 3.3 it appears that JIT-generated
>             code is attempting to access unaligned memory with an SSE2
>             instruction. However, this only happens under certain
>             conditions that seem (but may not be) related to the
>             stack's state on calling the function.
>
>             Our program acts as a front-end, using the LLVM C++ API to
>             generate a JIT-compiled function. This function is
>             primarily mathematical, so we use the Vector types to take
>             advantage of SIMD instructions (as well as a few SSE2
>             intrinsics).
>
>             This worked in LLVM 2.8 but started failing in 3.2 and has
>             continued to fail in 3.3. It fails with no optimizations
>             applied to the LLVM Function/Module. It crashes with what
>             is reported as a memory access error (accessing
>             0xffffffff); however, it's been suggested that this is how
>             an SSE fault appears when it is raised.
>
>             The generated instruction varies, but it often seems to be
>             similar to (I don't have it in front of me, sorry):
>             movapd xmm0, xmm[ecx+0x???????]
>             where the xmm register changes and the second operand
>             is a memory access.
>             ECX is always set to 0x7fffffff; however, I don't know
>             whether this is part of the SSE error reporting process or
>             part of the situation causing the error.
>
>             I haven't worked out exactly what code path etc is causing
>             this crash. I'm hoping that someone can tell me if there
>             were any changed requirements for working with SIMD in
>             LLVM 3.2 (or earlier, we haven't tried 3.0 or 3.1). I
>             currently suspect the use of GlobalVariable (we first
>             discovered the crash when using a feature that uses them),
>             however I have attempted using setAlignment on the
>             GlobalVariables without any change.
>
>             --
>             Peter N
>             _______________________________________________
>             LLVM Developers mailing list
>             LLVMdev at cs.uiuc.edu
>             http://llvm.cs.uiuc.edu
>             http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
> -- 
> ~Craig
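
A minimal sketch of the alignment rule behind the quoted report, assuming only what the thread states: MOVAPD with a memory operand needs a 16-byte-aligned effective address. The helper name `isAligned` is hypothetical; the two constants are the `func_params` address and the ECX value reported in this thread.

```cpp
#include <cstdint>

// Hypothetical helper: true if `addr` is a multiple of `alignment`.
// `alignment` is assumed to be a power of two (16 for MOVAPD operands).
constexpr bool isAligned(std::uint64_t addr, std::uint64_t alignment) {
    return (addr & (alignment - 1)) == 0;
}

// Address of func_params taken from the debugger dump in this thread.
constexpr std::uint64_t kFuncParams = 0x0018f3b0;

// ECX value reported at the time of the fault.
constexpr std::uint64_t kFaultingEcx = 0x7fffffff;
```

With these values, `isAligned(kFuncParams, 16)` holds while `isAligned(kFaultingEcx, 16)` does not. That alone does not prove the faulting access was misaligned, since the instruction addresses ECX plus an offset, but it is consistent with ECX holding something other than a valid base pointer.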
-------------- next part --------------
; ModuleID = 'crashmodule'

@"460" = private constant [12 x <2 x double>] [
  <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer,
  <2 x double> zeroinitializer, <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer, <2 x double> zeroinitializer,
  <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer]
@"461" = private constant [12 x <2 x double>] [
  <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer,
  <2 x double> zeroinitializer, <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer, <2 x double> zeroinitializer,
  <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer]
@"462" = private constant [24 x <2 x double>] [
  <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer,
  <2 x double> zeroinitializer, <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer, <2 x double> zeroinitializer,
  <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer,
  <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer,
  <2 x double> zeroinitializer, <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer, <2 x double> zeroinitializer,
  <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer]

define double @crashfunc(double* %params) {
body:
  %0 = alloca <2 x double>
  %1 = alloca <4 x double>
  %2 = alloca { <2 x double>, <2 x double>, <2 x double> }
  %3 = load { <2 x double>, <2 x double>, <2 x double> }* %2
  %4 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %3, 0
  %5 = getelementptr double* %params, i32 0
  %6 = load double* %5
  %7 = insertelement <2 x double> %4, double %6, i32 0
  %8 = insertelement <2 x double> %7, double %6, i32 1
  %9 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %3, <2 x double> %8, 0
  %10 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %9, 1
  %11 = getelementptr double* %params, i32 1
  %12 = load double* %11
  %13 = insertelement <2 x double> %10, double %12, i32 0
  %14 = insertelement <2 x double> %13, double %12, i32 1
  %15 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %9, <2 x double> %14, 1
  %16 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %15, 2
  %17 = getelementptr double* %params, i32 2
  %18 = load double* %17
  %19 = insertelement <2 x double> %16, double %18, i32 0
  %20 = insertelement <2 x double> %19, double %18, i32 1
  %21 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %15, <2 x double> %20, 2
  store <4 x double> zeroinitializer, <4 x double>* %1
  store <2 x double> zeroinitializer, <2 x double>* %0
  br label %array_loop

array_loop:                                       ; preds = %array_loop_tail, %body
  %22 = load <4 x double>* %1
  %23 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %21, 0
  %24 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %21, 1
  %25 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %21, 2
  %26 = extractelement <4 x double> %22, i32 0
  %27 = insertelement <2 x double> zeroinitializer, double %26, i32 0
  %28 = insertelement <2 x double> %27, double %26, i32 1
  %29 = fmul <2 x double> %28, <double 1.000000e+00, double 1.000000e+00>
  %30 = fsub <2 x double> %23, %29
  %31 = fmul <2 x double> %28, zeroinitializer
  %32 = fsub <2 x double> %24, %31
  %33 = fmul <2 x double> %28, zeroinitializer
  %34 = fsub <2 x double> %25, %33
  %35 = extractelement <4 x double> %22, i32 1
  %36 = insertelement <2 x double> zeroinitializer, double %35, i32 0
  %37 = insertelement <2 x double> %36, double %35, i32 1
  %38 = fmul <2 x double> %37, zeroinitializer
  %39 = fsub <2 x double> %30, %38
  %40 = fmul <2 x double> %37, <double 1.000000e+00, double 1.000000e+00>
  %41 = fsub <2 x double> %32, %40
  %42 = fmul <2 x double> %37, zeroinitializer
  %43 = fsub <2 x double> %34, %42
  %44 = extractelement <4 x double> %22, i32 2
  %45 = insertelement <2 x double> zeroinitializer, double %44, i32 0
  %46 = insertelement <2 x double> %45, double %44, i32 1
  %47 = fmul <2 x double> %46, zeroinitializer
  %48 = fsub <2 x double> %39, %47
  %49 = fmul <2 x double> %46, zeroinitializer
  %50 = fsub <2 x double> %41, %49
  %51 = fmul <2 x double> %46, <double 2.000000e+01, double 2.000000e+01>
  %52 = fsub <2 x double> %43, %51
  %53 = extractelement <4 x double> %22, i32 0
  %54 = fptoui double %53 to i32
  %55 = mul i32 %54, 12
  %56 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %55
  %57 = load <2 x double>* %56
  %58 = add i32 %55, 1
  %59 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %58
  %60 = load <2 x double>* %59
  %61 = add i32 %58, 1
  %62 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %61
  %63 = load <2 x double>* %62
  %64 = add i32 %61, 1
  %65 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %64
  %66 = load <2 x double>* %65
  %67 = add i32 %64, 1
  %68 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %67
  %69 = load <2 x double>* %68
  %70 = add i32 %67, 1
  %71 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %70
  %72 = load <2 x double>* %71
  %73 = add i32 %70, 1
  %74 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %73
  %75 = load <2 x double>* %74
  %76 = add i32 %73, 1
  %77 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %76
  %78 = load <2 x double>* %77
  %79 = add i32 %76, 1
  %80 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %79
  %81 = load <2 x double>* %80
  %82 = add i32 %79, 1
  %83 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %82
  %84 = load <2 x double>* %83
  %85 = add i32 %82, 1
  %86 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %85
  %87 = load <2 x double>* %86
  %88 = add i32 %85, 1
  %89 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %88
  %90 = load <2 x double>* %89
  %91 = fmul <2 x double> %52, %63
  %92 = fmul <2 x double> %50, %60
  %93 = fmul <2 x double> %48, %57
  %94 = fadd <2 x double> %93, %92
  %95 = fadd <2 x double> %94, %91
  %96 = fadd <2 x double> %95, %66
  %97 = fmul <2 x double> %52, %75
  %98 = fmul <2 x double> %50, %72
  %99 = fmul <2 x double> %48, %69
  %100 = fadd <2 x double> %99, %98
  %101 = fadd <2 x double> %100, %97
  %102 = fadd <2 x double> %101, %78
  %103 = fmul <2 x double> %52, %87
  %104 = fmul <2 x double> %50, %84
  %105 = fmul <2 x double> %48, %81
  %106 = fadd <2 x double> %105, %104
  %107 = fadd <2 x double> %106, %103
  %108 = fadd <2 x double> %107, %90
  %109 = extractelement <4 x double> %22, i32 1
  %110 = fptoui double %109 to i32
  %111 = mul i32 %110, 12
  %112 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %111
  %113 = load <2 x double>* %112
  %114 = add i32 %111, 1
  %115 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %114
  %116 = load <2 x double>* %115
  %117 = add i32 %114, 1
  %118 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %117
  %119 = load <2 x double>* %118
  %120 = add i32 %117, 1
  %121 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %120
  %122 = load <2 x double>* %121
  %123 = add i32 %120, 1
  %124 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %123
  %125 = load <2 x double>* %124
  %126 = add i32 %123, 1
  %127 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %126
  %128 = load <2 x double>* %127
  %129 = add i32 %126, 1
  %130 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %129
  %131 = load <2 x double>* %130
  %132 = add i32 %129, 1
  %133 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %132
  %134 = load <2 x double>* %133
  %135 = add i32 %132, 1
  %136 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %135
  %137 = load <2 x double>* %136
  %138 = add i32 %135, 1
  %139 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %138
  %140 = load <2 x double>* %139
  %141 = add i32 %138, 1
  %142 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %141
  %143 = load <2 x double>* %142
  %144 = add i32 %141, 1
  %145 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %144
  %146 = load <2 x double>* %145
  %147 = fmul <2 x double> %108, %119
  %148 = fmul <2 x double> %102, %116
  %149 = fmul <2 x double> %96, %113
  %150 = fadd <2 x double> %149, %148
  %151 = fadd <2 x double> %150, %147
  %152 = fadd <2 x double> %151, %122
  %153 = fmul <2 x double> %108, %131
  %154 = fmul <2 x double> %102, %128
  %155 = fmul <2 x double> %96, %125
  %156 = fadd <2 x double> %155, %154
  %157 = fadd <2 x double> %156, %153
  %158 = fadd <2 x double> %157, %134
  %159 = fmul <2 x double> %108, %143
  %160 = fmul <2 x double> %102, %140
  %161 = fmul <2 x double> %96, %137
  %162 = fadd <2 x double> %161, %160
  %163 = fadd <2 x double> %162, %159
  %164 = fadd <2 x double> %163, %146
  %165 = extractelement <4 x double> %22, i32 2
  %166 = fptoui double %165 to i32
  %167 = mul i32 %166, 12
  %168 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %167
  %169 = load <2 x double>* %168
  %170 = add i32 %167, 1
  %171 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %170
  %172 = load <2 x double>* %171
  %173 = add i32 %170, 1
  %174 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %173
  %175 = load <2 x double>* %174
  %176 = add i32 %173, 1
  %177 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %176
  %178 = load <2 x double>* %177
  %179 = add i32 %176, 1
  %180 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %179
  %181 = load <2 x double>* %180
  %182 = add i32 %179, 1
  %183 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %182
  %184 = load <2 x double>* %183
  %185 = add i32 %182, 1
  %186 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %185
  %187 = load <2 x double>* %186
  %188 = add i32 %185, 1
  %189 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %188
  %190 = load <2 x double>* %189
  %191 = add i32 %188, 1
  %192 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %191
  %193 = load <2 x double>* %192
  %194 = add i32 %191, 1
  %195 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %194
  %196 = load <2 x double>* %195
  %197 = add i32 %194, 1
  %198 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %197
  %199 = load <2 x double>* %198
  %200 = add i32 %197, 1
  %201 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %200
  %202 = load <2 x double>* %201
  %203 = fmul <2 x double> %164, %175
  %204 = fmul <2 x double> %158, %172
  %205 = fmul <2 x double> %152, %169
  %206 = fadd <2 x double> %205, %204
  %207 = fadd <2 x double> %206, %203
  %208 = fadd <2 x double> %207, %178
  %209 = fmul <2 x double> %164, %187
  %210 = fmul <2 x double> %158, %184
  %211 = fmul <2 x double> %152, %181
  %212 = fadd <2 x double> %211, %210
  %213 = fadd <2 x double> %212, %209
  %214 = fadd <2 x double> %213, %190
  %215 = fmul <2 x double> %164, %199
  %216 = fmul <2 x double> %158, %196
  %217 = fmul <2 x double> %152, %193
  %218 = fadd <2 x double> %217, %216
  %219 = fadd <2 x double> %218, %215
  %220 = fadd <2 x double> %219, %202
  %221 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %21, <2 x double> %208, 0
  %222 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %221, <2 x double> %214, 1
  %223 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %222, <2 x double> %220, 2
  %224 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %223, 0
  %225 = fsub <2 x double> %224, zeroinitializer
  %226 = fmul <2 x double> %225, <double 1.000000e-01, double 1.000000e-01>
  %227 = fmul <2 x double> %226, %226
  %228 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %223, 1
  %229 = fsub <2 x double> %228, zeroinitializer
  %230 = fmul <2 x double> %229, <double 1.000000e-01, double 1.000000e-01>
  %231 = fmul <2 x double> %230, %230
  %232 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %223, 2
  %233 = fsub <2 x double> %232, zeroinitializer
  %234 = fmul <2 x double> %233, <double 1.000000e-01, double 1.000000e-01>
  %235 = fmul <2 x double> %234, %234
  %236 = fadd <2 x double> %227, %231
  %237 = fadd <2 x double> %236, %235
  %238 = fsub <2 x double> <double 1.000000e+00, double 1.000000e+00>, %237
  %239 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %223, 0
  %240 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %223, 1
  %241 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %223, 2
  %242 = fsub <2 x double> %239, <double 0x402BD3D97C583BCD, double 0x402BD3D97C583BCD>
  %243 = fsub <2 x double> %240, <double 0x3FB9CFA0EA0F0EC0, double 0x3FB9CFA0EA0F0EC0>
  %244 = fsub <2 x double> %241, zeroinitializer
  %245 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %223, <2 x double> %242, 0
  %246 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %245, <2 x double> %243, 1
  %247 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %246, <2 x double> %244, 2
  %248 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %247, 0
  %249 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %247, 1
  %250 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %247, 2
  %251 = fsub <2 x double> %248, zeroinitializer
  %252 = fsub <2 x double> %249, zeroinitializer
  %253 = fsub <2 x double> %250, zeroinitializer
  %254 = fmul <2 x double> %251, <double 1.000000e+00, double 1.000000e+00>
  %255 = fmul <2 x double> %253, zeroinitializer
  %256 = fsub <2 x double> %254, %255
  %257 = fmul <2 x double> %251, zeroinitializer
  %258 = fmul <2 x double> %253, <double 1.000000e+00, double 1.000000e+00>
  %259 = fadd <2 x double> %257, %258
  %260 = fmul <2 x double> %256, <double 0xBFE4226452DA8FA3, double 0xBFE4226452DA8FA3>
  %261 = fmul <2 x double> %252, <double 0x3FE8DF30A2958450, double 0x3FE8DF30A2958450>
  %262 = fadd <2 x double> %260, %261
  %263 = fmul <2 x double> %256, <double 0x3FE8DF30A2958450, double 0x3FE8DF30A2958450>
  %264 = fmul <2 x double> %252, <double 0xBFE4226452DA8FA3, double 0xBFE4226452DA8FA3>
  %265 = fsub <2 x double> %264, %263
  %266 = fmul <2 x double> %265, <double 1.000000e+00, double 1.000000e+00>
  %267 = fmul <2 x double> %259, zeroinitializer
  %268 = fadd <2 x double> %266, %267
  %269 = fmul <2 x double> %265, zeroinitializer
  %270 = fmul <2 x double> %259, <double 1.000000e+00, double 1.000000e+00>
  %271 = fsub <2 x double> %270, %269
  %272 = fadd <2 x double> %262, zeroinitializer
  %273 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %247, <2 x double> %272, 0
  %274 = fadd <2 x double> %268, zeroinitializer
  %275 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %273, <2 x double> %274, 1
  %276 = fadd <2 x double> %271, zeroinitializer
  %277 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %275, <2 x double> %276, 2
  %278 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %277, 0
  %279 = fsub <2 x double> %278, zeroinitializer
  %280 = fmul <2 x double> %279, <double 0x3FC77E683470D9F8, double 0x3FC77E683470D9F8>
  %281 = fmul <2 x double> %280, %280
  %282 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %277, 1
  %283 = fsub <2 x double> %282, zeroinitializer
  %284 = fmul <2 x double> %283, <double 0x3FC77E683470D9F8, double 0x3FC77E683470D9F8>
  %285 = fmul <2 x double> %284, %284
  %286 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %277, 2
  %287 = fsub <2 x double> %286, zeroinitializer
  %288 = fmul <2 x double> %287, <double 0x3FC77E683470D9F8, double 0x3FC77E683470D9F8>
  %289 = fmul <2 x double> %288, %288
  %290 = fadd <2 x double> %281, %285
  %291 = fadd <2 x double> %290, %289
  %292 = fsub <2 x double> <double 1.000000e+00, double 1.000000e+00>, %291
  %293 = fmul <2 x double> %238, <double 1.000000e+00, double 1.000000e+00>
  %294 = fmul <2 x double> %293, %293
  %295 = fmul <2 x double> %292, <double 1.000000e+00, double 1.000000e+00>
  %296 = fmul <2 x double> %295, %295
  %297 = fadd <2 x double> %294, %296
  %298 = fadd <2 x double> %297, <double 1.000000e+00, double 1.000000e+00>
  %299 = extractelement <2 x double> %298, i32 0
  %300 = fdiv double 1.000000e+00, %299
  %301 = insertelement <2 x double> %298, double %300, i32 0
  %302 = extractelement <2 x double> %301, i32 1
  %303 = fdiv double 1.000000e+00, %302
  %304 = insertelement <2 x double> %301, double %303, i32 1
  %305 = fmul <2 x double> %304, <double 5.000000e-01, double 5.000000e-01>
  %306 = fmul <2 x double> %292, %292
  %307 = fmul <2 x double> %238, %238
  %308 = fadd <2 x double> %307, %306
  %309 = call <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double> %308)
  %310 = fadd <2 x double> %238, %292
  %311 = fadd <2 x double> %310, %309
  %312 = fadd <2 x double> %311, %305
  %313 = load <4 x double>* %1
  %314 = extractelement <4 x double> %313, i32 0
  %315 = extractelement <4 x double> %313, i32 1
  %316 = fadd double %314, %315
  %317 = extractelement <4 x double> %313, i32 2
  %318 = fadd double %316, %317
  %319 = fcmp oeq double %318, 0.000000e+00
  %320 = load <2 x double>* %0
  %321 = fmul <2 x double> %312, <double 1.000000e+00, double 1.000000e+00>
  %322 = fmul <2 x double> %321, %321
  %323 = fmul <2 x double> %320, %320
  %324 = fadd <2 x double> %323, %322
  %325 = call <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double> %324)
  %326 = fadd <2 x double> %320, %321
  %327 = fadd <2 x double> %326, %325
  %328 = select i1 %319, <2 x double> %312, <2 x double> %327
  store <2 x double> %328, <2 x double>* %0
  br label %array_loop_tail

array_loop_tail:                                  ; preds = %array_loop
  %329 = extractelement <4 x double> %313, i32 0
  %330 = fadd double %329, 1.000000e+00
  %331 = fcmp oge double %330, 1.000000e+00
  %332 = select i1 %331, double 0.000000e+00, double %330
  %333 = insertelement <4 x double> %313, double %332, i32 0
  %334 = extractelement <4 x double> %333, i32 1
  %335 = fadd double %334, 1.000000e+00
  %336 = select i1 %331, double %335, double %334
  %337 = fcmp oge double %336, 1.000000e+00
  %338 = select i1 %337, double 0.000000e+00, double %336
  %339 = insertelement <4 x double> %333, double %338, i32 1
  %340 = extractelement <4 x double> %339, i32 2
  %341 = fadd double %340, 1.000000e+00
  %342 = select i1 %337, double %341, double %340
  %343 = fcmp oge double %342, 2.000000e+00
  %344 = select i1 %343, double 0.000000e+00, double %342
  %345 = insertelement <4 x double> %339, double %344, i32 2
  store <4 x double> %345, <4 x double>* %1
  br i1 %343, label %array_loop_end, label %array_loop

array_loop_end:                                   ; preds = %array_loop_tail
  %346 = load <2 x double>* %0
  %347 = extractelement <2 x double> %346, i32 0
  ret double %347
}

; Function Attrs: nounwind readonly
declare double @llvm.sin.f64(double) #0

; Function Attrs: nounwind readonly
declare double @llvm.cos.f64(double) #0

; Function Attrs: nounwind readnone
declare double @asin(double) #1

; Function Attrs: nounwind readnone
declare double @acos(double) #1

; Function Attrs: nounwind readnone
declare double @atan(double) #1

; Function Attrs: nounwind readnone
declare double @flr(double) #1

; Function Attrs: nounwind readonly
declare double @llvm.exp.f64(double) #0

; Function Attrs: nounwind readonly
declare double @llvm.log.f64(double) #0

; Function Attrs: nounwind readnone
declare void @dump(double) #1

; Function Attrs: nounwind readonly
declare double @llvm.pow.f64(double, double) #0

; Function Attrs: nounwind readnone
declare <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double>) #1

attributes #0 = { nounwind readonly }
attributes #1 = { nounwind readnone }
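
To make the table indexing in crashfunc easier to follow, here is a rough C++ sketch of the counter logic in array_loop and array_loop_tail. It assumes the IR above is read correctly and is not the front-end's actual code; `TableIndices` and `simulateCrashfuncLoop` are hypothetical names. It records the base index each iteration uses for the tables @"460", @"461" and @"462" (each iteration then reads 12 consecutive elements from that base).

```cpp
#include <vector>

// Base indices used by one loop iteration for @"460", @"461" and @"462".
struct TableIndices {
    unsigned idx460;
    unsigned idx461;
    unsigned idx462;
};

// Hypothetical re-implementation of the loop counters held in the
// <4 x double> alloca %1; c0..c2 mirror its elements 0..2.
std::vector<TableIndices> simulateCrashfuncLoop() {
    std::vector<TableIndices> visited;
    double c0 = 0.0, c1 = 0.0, c2 = 0.0;  // store zeroinitializer to %1
    while (true) {
        // array_loop: fptoui each counter, then multiply by 12
        // (%54/%55, %110/%111, %166/%167).
        visited.push_back({static_cast<unsigned>(c0) * 12,
                           static_cast<unsigned>(c1) * 12,
                           static_cast<unsigned>(c2) * 12});

        // array_loop_tail: increment with carry (%329 through %345).
        double n0 = c0 + 1.0;
        bool carry0 = n0 >= 1.0;             // %331
        c0 = carry0 ? 0.0 : n0;              // %332
        double n1 = carry0 ? c1 + 1.0 : c1;  // %336
        bool carry1 = n1 >= 1.0;             // %337
        c1 = carry1 ? 0.0 : n1;              // %338
        double n2 = carry1 ? c2 + 1.0 : c2;  // %342
        bool done = n2 >= 2.0;               // %343
        c2 = done ? 0.0 : n2;                // %344
        if (done) break;                     // br to array_loop_end
    }
    return visited;
}
```

Under this reading the loop runs twice, and the largest element touched is index 23 of the 24-element @"462", so the table loads themselves look in-bounds. That would point away from the indexing as the source of the fault, though it is only a model of the IR, not of the generated machine code.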
-------------- next part --------------
A non-text attachment was scrubbed...
Name: module-dump-2.zip
Type: application/octet-stream
Size: 23030 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130719/0ace5f38/attachment.obj>

Peter Newman

2013-Jul-19 05:12 UTC

[LLVMdev] SIMD instructions and memory alignment on X86

After stepping through the produced assembly, I believe I have a culprit.

One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value of ECX 
- while the produced code is expecting it to still contain its previous 
value.

Peter N
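
For context on why a call could legitimately change ECX: under the common 32-bit x86 calling conventions (cdecl, stdcall), EAX, ECX and EDX are caller-saved, so a callee may overwrite them. The sketch below only encodes that rule; the names `Reg`, `isCallerSaved` and `survivesCall` are hypothetical, and it assumes the sqrt call is lowered to an ordinary function call rather than an inline instruction.

```cpp
// 32-bit x86 general-purpose registers.
enum class Reg { EAX, ECX, EDX, EBX, ESI, EDI, EBP, ESP };

// Under cdecl/stdcall on 32-bit x86, EAX, ECX and EDX are caller-saved:
// a callee may overwrite them without restoring the previous value.
constexpr bool isCallerSaved(Reg r) {
    return r == Reg::EAX || r == Reg::ECX || r == Reg::EDX;
}

// Hypothetical check: can a caller rely on the value in `r` still being
// there after an ordinary call returns?
constexpr bool survivesCall(Reg r) {
    return !isCallerSaved(r);
}
```

If the generated code keeps a base pointer in ECX across what turns out to be a real call, `survivesCall(Reg::ECX)` being false is exactly the kind of mismatch that would leave ECX holding an unrelated value afterwards.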


Craig Topper

2013-Jul-19 05:25 UTC


[LLVMdev] SIMD instructions and memory alignment on X86

What is "frep.x86.sse2.sqrt.pd"? I'm only familiar with things prefixed
with "llvm.x86".
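
For readers following the naming question: LLVM reserves the "llvm." prefix for intrinsic function names, so a function declared under any other prefix is treated as an ordinary external call rather than an intrinsic. A minimal C++ sketch of that prefix check follows. The helper name hasIntrinsicPrefix is hypothetical and not part of the LLVM API, and "llvm.x86.sse2.sqrt.pd" is assumed here to be the spelling that was intended.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not an LLVM API): returns true when `name` begins
// with the "llvm." prefix that LLVM reserves for intrinsic functions.
bool hasIntrinsicPrefix(const std::string &name) {
    const std::string prefix = "llvm.";
    // compare() on the leading prefix.size() characters; a shorter name
    // simply fails to match.
    return name.compare(0, prefix.size(), prefix) == 0;
}
```

Under this check, "llvm.x86.sse2.sqrt.pd" passes while "frep.x86.sse2.sqrt.pd" does not, which is consistent with the latter being emitted as a plain function call.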


On Thu, Jul 18, 2013 at 10:12 PM, Peter Newman <peter at uformia.com> wrote:
>  After stepping through the produced assembly, I believe I have a culprit.
>
> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value of ECX -
> while the produced code is expecting it to still contain its previous value.
>
> Peter N
>
>
> On 19/07/2013 2:09 PM, Peter Newman wrote:
>
> I've attached the module->dump() that our code is producing. Unfortunately
> this is the smallest test case I have available.
>
> This is before any optimization passes are applied. There are two separate
> modules in existence at the time, and there are no guarantees about the
> order the surrounding code calls those functions, so there may be some
> interaction between them? There shouldn't be, they don't refer to any
> common memory etc. There is no multi-threading occurring.
>
> The function in module-dump.ll (called crashfunc in this file) is called
> with
> -        func_params    0x0018f3b0    double [3]
>         [0x0]    -11.339976634695301    double
>         [0x1]    -9.7504239056205506    double
>         [0x2]    -5.2900856817382804    double
> at the time of the exception.
>
> This is compiled on a "i686-pc-win32" triple. All of the non-intrinsic
> functions referred to in these modules are the standard equivalents from
> the MSVC library (e.g. @asin is the standard C lib double asin( double )).
>
> Hopefully this is reproducible for you.
>
> --
> PeterN
>
> On 18/07/2013 4:37 PM, Craig Topper wrote:
>
> Are you able to send any IR for others to reproduce this issue?
>
>
> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman <peter at uformia.com> wrote:
>
>> Unfortunately, this doesn't appear to be the bug I'm hitting. I applied
>> the fix to my source and it didn't make a difference.
>>
>> Also further testing found me getting the same behavior with other SIMD
>> instructions. The common factor is in each case, ECX is set to 0x7fffffff,
>> and it's an operation using xmm ptr ecx+offset .
>>
>> Additionally, turning the optimization level passed to createJIT down
>> appears to avoid it, so I'm now leaning towards a bug in one of the
>> optimization passes.
>>
>> I'm going to dig through the passes controlled by that parameter and see
>> if I can narrow down which optimization is causing it.
>>
>> Peter N
>>
>>
>> On 17/07/2013 1:58 PM, Solomon Boulos wrote:
>>
>>> As someone off list just told me, perhaps my new bug is the same issue:
>>>
>>>    http://llvm.org/bugs/show_bug.cgi?id=16640
>>>
>>> Do you happen to be using FastISel?
>>>
>>> Solomon
>>>
>>> On Jul 16, 2013, at 6:39 PM, Peter Newman <peter at uformia.com> wrote:
>>>
>>>> Hello all,
>>>>
>>>> I'm currently in the process of debugging a crash occurring in our
>>>> program. In LLVM 3.2 and 3.3 it appears that JIT generated code is
>>>> attempting to access unaligned memory with an SSE2 instruction.
>>>> However this only happens under certain conditions that seem (but may not
>>>> be) related to the stack's state on calling the function.
>>>>
>>>> Our program acts as a front-end, using the LLVM C++ API to generate a
>>>> JIT generated function. This function is primarily mathematical, so we use
>>>> the Vector types to take advantage of SIMD instructions (as well as a few
>>>> SSE2 intrinsics).
>>>>
>>>> This worked in LLVM 2.8 but started failing in 3.2 and has continued to
>>>> fail in 3.3. It fails with no optimizations applied to the LLVM
>>>> Function/Module. It crashes with what is reported as a memory access error
>>>> (accessing 0xffffffff), however it's suggested that this is how the SSE
>>>> fault raising mechanism appears.
>>>>
>>>> The generated instruction varies, but it seems to often be similar to
>>>> (I don't have it in front of me, sorry):
>>>> movapd xmm0, xmm[ecx+0x???????]
>>>> Where the xmm register changes, and the second parameter is a memory
>>>> access.
>>>> ECX is always set to 0x7fffffff - however I don't know if this is part
>>>> of the SSE error reporting process or is part of the situation causing the
>>>> error.
>>>>
>>>> I haven't worked out exactly what code path etc is causing this crash.
>>>> I'm hoping that someone can tell me if there were any changed requirements
>>>> for working with SIMD in LLVM 3.2 (or earlier, we haven't tried 3.0 or
>>>> 3.1). I currently suspect the use of GlobalVariable (we first discovered
>>>> the crash when using a feature that uses them), however I have attempted
>>>> using setAlignment on the GlobalVariables without any change.
>>>>
>>>> --
>>>> Peter N
>>>>
>>>
>>
>
>
>
>  --
> ~Craig
>
>
>
>

-- 
~Craig
