Peter Newman
2013-Jul-19 04:09 UTC
[LLVMdev] SIMD instructions and memory alignment on X86
I've attached the module->dump() that our code is producing. Unfortunately this is the smallest test case I have available.

This is before any optimization passes are applied. There are two separate modules in existence at the time, and there are no guarantees about the order the surrounding code calls those functions, so there may be some interaction between them? There shouldn't be, they don't refer to any common memory etc. There is no multi-threading occurring.

The function in module-dump.ll (called crashfunc in this file) is called with
- func_params 0x0018f3b0 double [3]
    [0x0] -11.339976634695301 double
    [0x1] -9.7504239056205506 double
    [0x2] -5.2900856817382804 double
at the time of the exception.

This is compiled on a "i686-pc-win32" triple. All of the non-intrinsic functions referred to in these modules are the standard equivalents from the MSVC library (e.g. @asin is the standard C lib double asin( double ) ).

Hopefully this is reproducible for you.

--
PeterN

On 18/07/2013 4:37 PM, Craig Topper wrote:
> Are you able to send any IR for others to reproduce this issue?
>
> On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman <peter at uformia.com> wrote:
>
> > Unfortunately, this doesn't appear to be the bug I'm hitting. I
> > applied the fix to my source and it didn't make a difference.
> >
> > Also further testing found me getting the same behavior with other
> > SIMD instructions. The common factor is in each case, ECX is set
> > to 0x7fffffff, and it's an operation using xmm ptr ecx+offset .
> >
> > Additionally, turning the optimization level passed to createJIT
> > down appears to avoid it, so I'm now leaning towards a bug in one
> > of the optimization passes.
> >
> > I'm going to dig through the passes controlled by that parameter
> > and see if I can narrow down which optimization is causing it.
> >
> > Peter N
> >
> > On 17/07/2013 1:58 PM, Solomon Boulos wrote:
> >
> > > As someone off list just told me, perhaps my new bug is the
> > > same issue:
> > >
> > > http://llvm.org/bugs/show_bug.cgi?id=16640
> > >
> > > Do you happen to be using FastISel?
> > >
> > > Solomon
> > >
> > > On Jul 16, 2013, at 6:39 PM, Peter Newman <peter at uformia.com> wrote:
> > >
> > > > Hello all,
> > > >
> > > > I'm currently in the process of debugging a crash occurring in
> > > > our program. In LLVM 3.2 and 3.3 it appears that JIT generated
> > > > code is attempting to perform access unaligned memory with a
> > > > SSE2 instruction. However this only happens under certain
> > > > conditions that seem (but may not be) related to the stacks
> > > > state on calling the function.
> > > >
> > > > Our program acts as a front-end, using the LLVM C++ API to
> > > > generate a JIT generated function. This function is primarily
> > > > mathematical, so we use the Vector types to take advantage of
> > > > SIMD instructions (as well as a few SSE2 intrinsics).
> > > >
> > > > This worked in LLVM 2.8 but started failing in 3.2 and has
> > > > continued to fail in 3.3. It fails with no optimizations
> > > > applied to the LLVM Function/Module. It crashes with what is
> > > > reported as a memory access error (accessing 0xffffffff),
> > > > however it's suggested that this is how the SSE fault raising
> > > > mechanism appears.
> > > >
> > > > The generated instruction varies, but it seems to often be
> > > > similar to (I don't have it in front of me, sorry):
> > > > movapd xmm0, xmm[ecx+0x???????]
> > > > Where the xmm register changes, and the second parameter is a
> > > > memory access.
> > > > ECX is always set to 0x7ffffff - however I don't know if this
> > > > is part of the SSE error reporting process or is part of the
> > > > situation causing the error.
> > > >
> > > > I haven't worked out exactly what code path etc is causing
> > > > this crash. I'm hoping that someone can tell me if there were
> > > > any changed requirements for working with SIMD in LLVM 3.2 (or
> > > > earlier, we haven't tried 3.0 or 3.1).
> > > > I currently suspect the use of GlobalVariable (we first
> > > > discovered the crash when using a feature that uses them),
> > > > however I have attempted using setAlignment on the
> > > > GlobalVariables without any change.
> > > >
> > > > --
> > > > Peter N
> > > > _______________________________________________
> > > > LLVM Developers mailing list
> > > > LLVMdev at cs.uiuc.edu
> > > > http://llvm.cs.uiuc.edu
> > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
> --
> ~Craig
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130719/0ace5f38/attachment.html>
-------------- next part --------------
; ModuleID = 'crashmodule'

@"460" = private constant [12 x <2 x double>] [<2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer]
@"461" = private constant [12 x <2 x double>] [<2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer]
@"462" = private constant [24 x <2 x double>] [<2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer, <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> zeroinitializer, <2 x double> <double 1.000000e+00, double 1.000000e+00>, <2 x double> zeroinitializer]

define double @crashfunc(double* %params) {
body:
  %0 = alloca <2 x double>
  %1 = alloca <4 x double>
  %2 = alloca { <2 x double>, <2 x double>, <2 x double> }
  %3 = load { <2 x double>, <2 x double>, <2 x double> }* %2
  %4 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %3, 0
  %5 = getelementptr double* %params, i32 0
  %6 = load double* %5
  %7 = insertelement <2 x double> %4, double %6, i32 0
  %8 = insertelement <2 x double> %7, double %6, i32 1
  %9 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %3, <2 x double> %8, 0
  %10 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %9, 1
  %11 = getelementptr double* %params, i32 1
  %12 = load double* %11
  %13 = insertelement <2 x double> %10, double %12, i32 0
  %14 = insertelement <2 x double> %13, double %12, i32 1
  %15 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %9, <2 x double> %14, 1
  %16 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %15, 2
  %17 = getelementptr double* %params, i32 2
  %18 = load double* %17
  %19 = insertelement <2 x double> %16, double %18, i32 0
  %20 = insertelement <2 x double> %19, double %18, i32 1
  %21 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %15, <2 x double> %20, 2
  store <4 x double> zeroinitializer, <4 x double>* %1
  store <2 x double> zeroinitializer, <2 x double>* %0
  br label %array_loop

array_loop:                                       ; preds = %array_loop_tail, %body
  %22 = load <4 x double>* %1
  %23 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %21, 0
  %24 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %21, 1
  %25 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %21, 2
  %26 = extractelement <4 x double> %22, i32 0
  %27 = insertelement <2 x double> zeroinitializer, double %26, i32 0
  %28 = insertelement <2 x double> %27, double %26, i32 1
  %29 = fmul <2 x double> %28, <double 1.000000e+00, double 1.000000e+00>
  %30 = fsub <2 x double> %23, %29
  %31 = fmul <2 x double> %28, zeroinitializer
  %32 = fsub <2 x double> %24, %31
  %33 = fmul <2 x double> %28, zeroinitializer
  %34 = fsub <2 x double> %25, %33
  %35 = extractelement <4 x double> %22, i32 1
  %36 = insertelement <2 x double> zeroinitializer, double %35, i32 0
  %37 = insertelement <2 x double> %36, double %35, i32 1
  %38 = fmul <2 x double> %37, zeroinitializer
  %39 = fsub <2 x double> %30, %38
  %40 = fmul <2 x double> %37, <double 1.000000e+00, double 1.000000e+00>
  %41 = fsub <2 x double> %32, %40
  %42 = fmul <2 x double> %37, zeroinitializer
  %43 = fsub <2 x double> %34, %42
  %44 = extractelement <4 x double> %22, i32 2
  %45 = insertelement <2 x double> zeroinitializer, double %44, i32 0
  %46 = insertelement <2 x double> %45, double %44, i32 1
  %47 = fmul <2 x double> %46, zeroinitializer
  %48 = fsub <2 x double> %39, %47
  %49 = fmul <2 x double> %46, zeroinitializer
  %50 = fsub <2 x double> %41, %49
  %51 = fmul <2 x double> %46, <double 2.000000e+01, double 2.000000e+01>
  %52 = fsub <2 x double> %43, %51
  %53 = extractelement <4 x double> %22, i32 0
  %54 = fptoui double %53 to i32
  %55 = mul i32 %54, 12
  %56 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %55
  %57 = load <2 x double>* %56
  %58 = add i32 %55, 1
  %59 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %58
  %60 = load <2 x double>* %59
  %61 = add i32 %58, 1
  %62 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %61
  %63 = load <2 x double>* %62
  %64 = add i32 %61, 1
  %65 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %64
  %66 = load <2 x double>* %65
  %67 = add i32 %64, 1
  %68 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %67
  %69 = load <2 x double>* %68
  %70 = add i32 %67, 1
  %71 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %70
  %72 = load <2 x double>* %71
  %73 = add i32 %70, 1
  %74 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %73
  %75 = load <2 x double>* %74
  %76 = add i32 %73, 1
  %77 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %76
  %78 = load <2 x double>* %77
  %79 = add i32 %76, 1
  %80 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %79
  %81 = load <2 x double>* %80
  %82 = add i32 %79, 1
  %83 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %82
  %84 = load <2 x double>* %83
  %85 = add i32 %82, 1
  %86 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %85
  %87 = load <2 x double>* %86
  %88 = add i32 %85, 1
  %89 = getelementptr [12 x <2 x double>]* @"460", i32 0, i32 %88
  %90 = load <2 x double>* %89
  %91 = fmul <2 x double> %52, %63
  %92 = fmul <2 x double> %50, %60
  %93 = fmul <2 x double> %48, %57
  %94 = fadd <2 x double> %93, %92
  %95 = fadd <2 x double> %94, %91
  %96 = fadd <2 x double> %95, %66
  %97 = fmul <2 x double> %52, %75
  %98 = fmul <2 x double> %50, %72
  %99 = fmul <2 x double> %48, %69
  %100 = fadd <2 x double> %99, %98
  %101 = fadd <2 x double> %100, %97
  %102 = fadd <2 x double> %101, %78
  %103 = fmul <2 x double> %52, %87
  %104 = fmul <2 x double> %50, %84
  %105 = fmul <2 x double> %48, %81
  %106 = fadd <2 x double> %105, %104
  %107 = fadd <2 x double> %106, %103
  %108 = fadd <2 x double> %107, %90
  %109 = extractelement <4 x double> %22, i32 1
  %110 = fptoui double %109 to i32
  %111 = mul i32 %110, 12
  %112 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %111
  %113 = load <2 x double>* %112
  %114 = add i32 %111, 1
  %115 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %114
  %116 = load <2 x double>* %115
  %117 = add i32 %114, 1
  %118 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %117
  %119 = load <2 x double>* %118
  %120 = add i32 %117, 1
  %121 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %120
  %122 = load <2 x double>* %121
  %123 = add i32 %120, 1
  %124 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %123
  %125 = load <2 x double>* %124
  %126 = add i32 %123, 1
  %127 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %126
  %128 = load <2 x double>* %127
  %129 = add i32 %126, 1
  %130 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %129
  %131 = load <2 x double>* %130
  %132 = add i32 %129, 1
  %133 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %132
  %134 = load <2 x double>* %133
  %135 = add i32 %132, 1
  %136 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %135
  %137 = load <2 x double>* %136
  %138 = add i32 %135, 1
  %139 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %138
  %140 = load <2 x double>* %139
  %141 = add i32 %138, 1
  %142 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %141
  %143 = load <2 x double>* %142
  %144 = add i32 %141, 1
  %145 = getelementptr [12 x <2 x double>]* @"461", i32 0, i32 %144
  %146 = load <2 x double>* %145
  %147 = fmul <2 x double> %108, %119
  %148 = fmul <2 x double> %102, %116
  %149 = fmul <2 x double> %96, %113
  %150 = fadd <2 x double> %149, %148
  %151 = fadd <2 x double> %150, %147
  %152 = fadd <2 x double> %151, %122
  %153 = fmul <2 x double> %108, %131
  %154 = fmul <2 x double> %102, %128
  %155 = fmul <2 x double> %96, %125
  %156 = fadd <2 x double> %155, %154
  %157 = fadd <2 x double> %156, %153
  %158 = fadd <2 x double> %157, %134
  %159 = fmul <2 x double> %108, %143
  %160 = fmul <2 x double> %102, %140
  %161 = fmul <2 x double> %96, %137
  %162 = fadd <2 x double> %161, %160
  %163 = fadd <2 x double> %162, %159
  %164 = fadd <2 x double> %163, %146
  %165 = extractelement <4 x double> %22, i32 2
  %166 = fptoui double %165 to i32
  %167 = mul i32 %166, 12
  %168 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %167
  %169 = load <2 x double>* %168
  %170 = add i32 %167, 1
  %171 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %170
  %172 = load <2 x double>* %171
  %173 = add i32 %170, 1
  %174 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %173
  %175 = load <2 x double>* %174
  %176 = add i32 %173, 1
  %177 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %176
  %178 = load <2 x double>* %177
  %179 = add i32 %176, 1
  %180 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %179
  %181 = load <2 x double>* %180
  %182 = add i32 %179, 1
  %183 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %182
  %184 = load <2 x double>* %183
  %185 = add i32 %182, 1
  %186 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %185
  %187 = load <2 x double>* %186
  %188 = add i32 %185, 1
  %189 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %188
  %190 = load <2 x double>* %189
  %191 = add i32 %188, 1
  %192 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %191
  %193 = load <2 x double>* %192
  %194 = add i32 %191, 1
  %195 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %194
  %196 = load <2 x double>* %195
  %197 = add i32 %194, 1
  %198 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %197
  %199 = load <2 x double>* %198
  %200 = add i32 %197, 1
  %201 = getelementptr [24 x <2 x double>]* @"462", i32 0, i32 %200
  %202 = load <2 x double>* %201
  %203 = fmul <2 x double> %164, %175
  %204 = fmul <2 x double> %158, %172
  %205 = fmul <2 x double> %152, %169
  %206 = fadd <2 x double> %205, %204
  %207 = fadd <2 x double> %206, %203
  %208 = fadd <2 x double> %207, %178
  %209 = fmul <2 x double> %164, %187
  %210 = fmul <2 x double> %158, %184
  %211 = fmul <2 x double> %152, %181
  %212 = fadd <2 x double> %211, %210
  %213 = fadd <2 x double> %212, %209
  %214 = fadd <2 x double> %213, %190
  %215 = fmul <2 x double> %164, %199
  %216 = fmul <2 x double> %158, %196
  %217 = fmul <2 x double> %152, %193
  %218 = fadd <2 x double> %217, %216
  %219 = fadd <2 x double> %218, %215
  %220 = fadd <2 x double> %219, %202
  %221 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %21, <2 x double> %208, 0
  %222 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %221, <2 x double> %214, 1
  %223 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %222, <2 x double> %220, 2
  %224 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %223, 0
  %225 = fsub <2 x double> %224, zeroinitializer
  %226 = fmul <2 x double> %225, <double 1.000000e-01, double 1.000000e-01>
  %227 = fmul <2 x double> %226, %226
  %228 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %223, 1
  %229 = fsub <2 x double> %228, zeroinitializer
  %230 = fmul <2 x double> %229, <double 1.000000e-01, double 1.000000e-01>
  %231 = fmul <2 x double> %230, %230
  %232 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %223, 2
  %233 = fsub <2 x double> %232, zeroinitializer
  %234 = fmul <2 x double> %233, <double 1.000000e-01, double 1.000000e-01>
  %235 = fmul <2 x double> %234, %234
  %236 = fadd <2 x double> %227, %231
  %237 = fadd <2 x double> %236, %235
  %238 = fsub <2 x double> <double 1.000000e+00, double 1.000000e+00>, %237
  %239 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %223, 0
  %240 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %223, 1
  %241 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %223, 2
  %242 = fsub <2 x double> %239, <double 0x402BD3D97C583BCD, double 0x402BD3D97C583BCD>
  %243 = fsub <2 x double> %240, <double 0x3FB9CFA0EA0F0EC0, double 0x3FB9CFA0EA0F0EC0>
  %244 = fsub <2 x double> %241, zeroinitializer
  %245 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %223, <2 x double> %242, 0
  %246 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %245, <2 x double> %243, 1
  %247 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %246, <2 x double> %244, 2
  %248 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %247, 0
  %249 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %247, 1
  %250 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %247, 2
  %251 = fsub <2 x double> %248, zeroinitializer
  %252 = fsub <2 x double> %249, zeroinitializer
  %253 = fsub <2 x double> %250, zeroinitializer
  %254 = fmul <2 x double> %251, <double 1.000000e+00, double 1.000000e+00>
  %255 = fmul <2 x double> %253, zeroinitializer
  %256 = fsub <2 x double> %254, %255
  %257 = fmul <2 x double> %251, zeroinitializer
  %258 = fmul <2 x double> %253, <double 1.000000e+00, double 1.000000e+00>
  %259 = fadd <2 x double> %257, %258
  %260 = fmul <2 x double> %256, <double 0xBFE4226452DA8FA3, double 0xBFE4226452DA8FA3>
  %261 = fmul <2 x double> %252, <double 0x3FE8DF30A2958450, double 0x3FE8DF30A2958450>
  %262 = fadd <2 x double> %260, %261
  %263 = fmul <2 x double> %256, <double 0x3FE8DF30A2958450, double 0x3FE8DF30A2958450>
  %264 = fmul <2 x double> %252, <double 0xBFE4226452DA8FA3, double 0xBFE4226452DA8FA3>
  %265 = fsub <2 x double> %264, %263
  %266 = fmul <2 x double> %265, <double 1.000000e+00, double 1.000000e+00>
  %267 = fmul <2 x double> %259, zeroinitializer
  %268 = fadd <2 x double> %266, %267
  %269 = fmul <2 x double> %265, zeroinitializer
  %270 = fmul <2 x double> %259, <double 1.000000e+00, double 1.000000e+00>
  %271 = fsub <2 x double> %270, %269
  %272 = fadd <2 x double> %262, zeroinitializer
  %273 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %247, <2 x double> %272, 0
  %274 = fadd <2 x double> %268, zeroinitializer
  %275 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %273, <2 x double> %274, 1
  %276 = fadd <2 x double> %271, zeroinitializer
  %277 = insertvalue { <2 x double>, <2 x double>, <2 x double> } %275, <2 x double> %276, 2
  %278 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %277, 0
  %279 = fsub <2 x double> %278, zeroinitializer
  %280 = fmul <2 x double> %279, <double 0x3FC77E683470D9F8, double 0x3FC77E683470D9F8>
  %281 = fmul <2 x double> %280, %280
  %282 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %277, 1
  %283 = fsub <2 x double> %282, zeroinitializer
  %284 = fmul <2 x double> %283, <double 0x3FC77E683470D9F8, double 0x3FC77E683470D9F8>
  %285 = fmul <2 x double> %284, %284
  %286 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %277, 2
  %287 = fsub <2 x double> %286, zeroinitializer
  %288 = fmul <2 x double> %287, <double 0x3FC77E683470D9F8, double 0x3FC77E683470D9F8>
  %289 = fmul <2 x double> %288, %288
  %290 = fadd <2 x double> %281, %285
  %291 = fadd <2 x double> %290, %289
  %292 = fsub <2 x double> <double 1.000000e+00, double 1.000000e+00>, %291
  %293 = fmul <2 x double> %238, <double 1.000000e+00, double 1.000000e+00>
  %294 = fmul <2 x double> %293, %293
  %295 = fmul <2 x double> %292, <double 1.000000e+00, double 1.000000e+00>
  %296 = fmul <2 x double> %295, %295
  %297 = fadd <2 x double> %294, %296
  %298 = fadd <2 x double> %297, <double 1.000000e+00, double 1.000000e+00>
  %299 = extractelement <2 x double> %298, i32 0
  %300 = fdiv double 1.000000e+00, %299
  %301 = insertelement <2 x double> %298, double %300, i32 0
  %302 = extractelement <2 x double> %301, i32 1
  %303 = fdiv double 1.000000e+00, %302
  %304 = insertelement <2 x double> %301, double %303, i32 1
  %305 = fmul <2 x double> %304, <double 5.000000e-01, double 5.000000e-01>
  %306 = fmul <2 x double> %292, %292
  %307 = fmul <2 x double> %238, %238
  %308 = fadd <2 x double> %307, %306
  %309 = call <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double> %308)
  %310 = fadd <2 x double> %238, %292
  %311 = fadd <2 x double> %310, %309
  %312 = fadd <2 x double> %311, %305
  %313 = load <4 x double>* %1
  %314 = extractelement <4 x double> %313, i32 0
  %315 = extractelement <4 x double> %313, i32 1
  %316 = fadd double %314, %315
  %317 = extractelement <4 x double> %313, i32 2
  %318 = fadd double %316, %317
  %319 = fcmp oeq double %318, 0.000000e+00
  %320 = load <2 x double>* %0
  %321 = fmul <2 x double> %312, <double 1.000000e+00, double 1.000000e+00>
  %322 = fmul <2 x double> %321, %321
  %323 = fmul <2 x double> %320, %320
  %324 = fadd <2 x double> %323, %322
  %325 = call <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double> %324)
  %326 = fadd <2 x double> %320, %321
  %327 = fadd <2 x double> %326, %325
  %328 = select i1 %319, <2 x double> %312, <2 x double> %327
  store <2 x double> %328, <2 x double>* %0
  br label %array_loop_tail

array_loop_tail:                                  ; preds = %array_loop
  %329 = extractelement <4 x double> %313, i32 0
  %330 = fadd double %329, 1.000000e+00
  %331 = fcmp oge double %330, 1.000000e+00
  %332 = select i1 %331, double 0.000000e+00, double %330
  %333 = insertelement <4 x double> %313, double %332, i32 0
  %334 = extractelement <4 x double> %333, i32 1
  %335 = fadd double %334, 1.000000e+00
  %336 = select i1 %331, double %335, double %334
  %337 = fcmp oge double %336, 1.000000e+00
  %338 = select i1 %337, double 0.000000e+00, double %336
  %339 = insertelement <4 x double> %333, double %338, i32 1
  %340 = extractelement <4 x double> %339, i32 2
  %341 = fadd double %340, 1.000000e+00
  %342 = select i1 %337, double %341, double %340
  %343 = fcmp oge double %342, 2.000000e+00
  %344 = select i1 %343, double 0.000000e+00, double %342
  %345 = insertelement <4 x double> %339, double %344, i32 2
  store <4 x double> %345, <4 x double>* %1
  br i1 %343, label %array_loop_end, label %array_loop

array_loop_end:                                   ; preds = %array_loop_tail
  %346 = load <2 x double>* %0
  %347 = extractelement <2 x double> %346, i32 0
  ret double %347
}

; Function Attrs: nounwind readonly
declare double @llvm.sin.f64(double) #0

; Function Attrs: nounwind readonly
declare double @llvm.cos.f64(double) #0

; Function Attrs: nounwind readnone
declare double @asin(double) #1

; Function Attrs: nounwind readnone
declare double @acos(double) #1

; Function Attrs: nounwind readnone
declare double @atan(double) #1

; Function Attrs: nounwind readnone
declare double @flr(double) #1

; Function Attrs: nounwind readonly
declare double @llvm.exp.f64(double) #0

; Function Attrs: nounwind readonly
declare double @llvm.log.f64(double) #0

; Function Attrs: nounwind readnone
declare void @dump(double) #1

; Function Attrs: nounwind readonly
declare double @llvm.pow.f64(double, double) #0

; Function Attrs: nounwind readnone
declare <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double>) #1

attributes #0 = { nounwind readonly }
attributes #1 = { nounwind readnone }
-------------- next part --------------
A non-text attachment was scrubbed...
Name: module-dump-2.zip
Type: application/octet-stream
Size: 23030 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130719/0ace5f38/attachment.obj>
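For readers tracing the attached IR, the loop control in array_loop_tail and the table indexing in array_loop can be modelled in a few lines of C++. This is only a sketch read off the dump above, not code from the poster's front-end: the names Counter, advance, table_base and bases_for_462 are hypothetical, and the comments point at the IR values each line is meant to mirror.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the three counters kept in the <4 x double> slot %1.
using Counter = std::array<double, 3>;

// One pass through array_loop_tail: bump element 0 and carry into 1 and 2.
// Returns true when the branch on %343 would leave the loop.
bool advance(Counter& c) {
    double e0 = c[0] + 1.0;                    // %330
    bool carry0 = e0 >= 1.0;                   // %331
    c[0] = carry0 ? 0.0 : e0;                  // %332
    double e1 = carry0 ? c[1] + 1.0 : c[1];    // %336
    bool carry1 = e1 >= 1.0;                   // %337
    c[1] = carry1 ? 0.0 : e1;                  // %338
    double e2 = carry1 ? c[2] + 1.0 : c[2];    // %342
    bool done = e2 >= 2.0;                     // %343
    c[2] = done ? 0.0 : e2;                    // %344
    return done;
}

// Base index into a lookup table: fptoui followed by mul i32 ..., 12.
std::uint32_t table_base(double counter) {
    assert(counter >= 0.0);  // fptoui is only defined for in-range values
    return static_cast<std::uint32_t>(counter) * 12u;
}

// Runs the loop the way the IR does (body first, exit test in the tail) and
// records the base index used for @"462" (%167) on each iteration.
std::vector<std::uint32_t> bases_for_462() {
    Counter c{0.0, 0.0, 0.0};
    std::vector<std::uint32_t> bases;
    bool done = false;
    while (!done) {
        bases.push_back(table_base(c[2]));
        done = advance(c);
    }
    return bases;
}
```

Under this reading the body runs twice, with base indices 0 and 12 into @"462", so the highest element touched is 23 and every constant-table access stays inside the declared [24 x <2 x double>] bounds. The first two counters are always 0 when the body runs, so @"460" and @"461" are only ever read at indices 0 to 11.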
Peter Newman
2013-Jul-19 05:12 UTC
[LLVMdev] SIMD instructions and memory alignment on X86
After stepping through the produced assembly, I believe I have a culprit.

One of the calls to @frep.x86.sse2.sqrt.pd is modifying the value of ECX - while the produced code is expecting it to still contain its previous value.

Peter N
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130719/9de718fe/attachment.html>
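The symptom described in this thread (a movapd whose address is ECX plus a displacement, with ECX reported as 0x7fffffff) can be checked arithmetically. movapd requires a 16-byte-aligned memory operand, so the sketch below computes the effective address and tests its low four bits. The helper names are hypothetical, and the displacements used in the usage note are illustrative only, since the thread leaves the real offset elided.

```cpp
#include <cstdint>

// movapd faults unless its memory operand is 16-byte aligned.
constexpr bool is_aligned16(std::uint32_t address) {
    return (address & 0xFu) == 0;
}

// Effective address of [ecx + disp] on a 32-bit target (wraps modulo 2^32).
constexpr std::uint32_t effective_address(std::uint32_t ecx, std::uint32_t disp) {
    return ecx + disp;
}
```

With ECX = 0x7fffffff, any displacement that is itself a multiple of 16 gives a misaligned address (for example 0x7fffffff + 0x10 = 0x8000000f). That is at least consistent with ECX having been overwritten between the point where the address was set up and the aligned load, though it does not by itself show which instruction overwrote it.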
Craig Topper
2013-Jul-19 05:25 UTC
[LLVMdev] SIMD instructions and memory alignment on X86
What is "frep.x86.sse2.sqrt.pd"? I'm only familiar with things prefixed with "llvm.x86".

--
~Craig
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130718/21541c86/attachment.html>