Chengyu Song
2014-Dec-08 04:21 UTC
[LLVMdev] Incorrect loop optimization when building the Linux kernel
I was trying to build the Linux kernel with clang and observed a crash due to incorrect loop optimization: drivers/base/firmware_class.c extern struct builtin_fw __start_builtin_fw[]; extern struct builtin_fw __end_builtin_fw[]; static bool fw_get_builtin_firmware(struct firmware *fw, const char *name) { struct builtin_fw *b_fw; for (b_fw = __start_builtin_fw; b_fw != __end_builtin_fw; b_fw++) { if (strcmp(name, b_fw->name) == 0) { fw->size = b_fw->size; fw->data = b_fw->data; return true; } } return false; } When compiled with -O0, the comparison (b_fw != __end_builtin_fw) is executed before loop body, which is the expected behavior. But with -O1 optimization (opt -O1), the comparison is moved to the end of the loop, which causes the strcmp to use incorrect argument and finally crashes the kernel. define internal fastcc i1 @fw_get_builtin_firmware(%struct.firmware* nocapture %fw, i8* nocapture readonly %name) #0 { tail call void @llvm.dbg.value(metadata !{%struct.firmware* %fw}, i64 0, metadata !3985, metadata !3750), !dbg !3986 tail call void @llvm.dbg.value(metadata !{i8* %name}, i64 0, metadata !3987, metadata !3750), !dbg !3988 tail call void @llvm.dbg.value(metadata !3839, i64 0, metadata !3989, metadata !3750), !dbg !3990 br label %1, !dbg !3991 ; <label>:1 ; preds = %13, %0 %b_fw.02 = phi %struct.builtin_fw* [ getelementptr inbounds ([0 x %struct.builtin_fw]* @__start_builtin_fw, i64 0, i64 0), %0 ], [ %14, %13 ] %2 = getelementptr inbounds %struct.builtin_fw* %b_fw.02, i64 0, i32 0, !dbg !3995 %3 = load i8** %2, align 8, !dbg !3995 %4 = tail call i32 @strcmp(i8* %name, i8* %3) #8, !dbg !3995 %5 = icmp eq i32 %4, 0, !dbg !3995 br i1 %5, label %6, label %13, !dbg !3995 ; <label>:6 ; preds = %1 %b_fw.02.lcssa = phi %struct.builtin_fw* [ %b_fw.02, %1 ] %7 = getelementptr inbounds %struct.builtin_fw* %b_fw.02.lcssa, i64 0, i32 2, !dbg !3999 %8 = load i64* %7, align 8, !dbg !3999 %9 = getelementptr inbounds %struct.firmware* %fw, i64 0, i32 0, !dbg !3999 store i64 %8, i64* %9, align 8, !dbg !3999 %10 = getelementptr inbounds %struct.builtin_fw* %b_fw.02.lcssa, i64 0, i32 1, !dbg !4001 %11 = load i8** %10, align 8, !dbg !4001 %12 = getelementptr inbounds %struct.firmware* %fw, i64 0, i32 1, !dbg !4001 store i8* %11, i8** %12, align 8, !dbg !4001 br label %.loopexit, !dbg !4002 ; <label>:13 ; preds = %1 %14 = getelementptr %struct.builtin_fw* %b_fw.02, i64 1, !dbg !400 tail call void @llvm.dbg.value(metadata !{%struct.builtin_fw* %14}, i64 0, metadata !3989, metadata !3750), !dbg !3990 %15 = icmp eq %struct.builtin_fw* %14, getelementptr inbounds ([0 x %struct.builtin_fw]* @__end_builtin_fw, i64 0, i64 0), !dbg !3991 br i1 %15, label %.loopexit.loopexit, label %1, !dbg !3991 .loopexit.loopexit: ; preds = %13 br label %.loopexit .loopexit: ; preds = %.loopexit.loopexit, %6 %.0 = phi i1 [ true, %6 ], [ false, %.loopexit.loopexit ] ret i1 %.0, !dbg !4004 } Could anyone explain why this is unhappening and how to avoid it. Thanks, Chengyu
Tim Northover
2014-Dec-08 05:05 UTC
[LLVMdev] Incorrect loop optimization when building the Linux kernel
> Could anyone explain why this is unhappening and how to avoid it.It's difficult to say without a full example, but I'm very suspicious of those global declarations. I think the compiler would be entirely justified in assuming you could *never* get from __start_builtin_fw to __end_builtin_fw, let alone on the first iteration: they're distinct array objects and by definition (within C99) can't overlap. Trouble is, any perturbation of that code would change things so drastically that it's going to be difficult to prove that's the cause without digging into the exact LLVM pass that does the transformation to see what it's basing the decision on. Cheers. Tim.
Sanjoy Das
2014-Dec-08 05:09 UTC
[LLVMdev] Incorrect loop optimization when building the Linux kernel
LLVM seems to assume that two non-weak declarations can never point to the same thing (so "__start_builtin_fw == __end_builtin_fw" is always false). In fact, foo in the following program constant folds to return false (with a warning from clang): struct F {}; extern struct F a[]; extern struct F b[]; int foo() { return a == b; } I don't know enough about the difference between different linkage types to say if llvm's behavior is reasonable or not. One place where this property is exploited (there are probably others): (llvm/lib/IR/ConstantFold.cpp) static ICmpInst::Predicate areGlobalsPotentiallyEqual(const GlobalValue *GV1, const GlobalValue *GV2) { // Don't try to decide equality of aliases. if (!isa<GlobalAlias>(GV1) && !isa<GlobalAlias>(GV2)) if (!GV1->hasExternalWeakLinkage() || !GV2->hasExternalWeakLinkage()) return ICmpInst::ICMP_NE; return ICmpInst::BAD_ICMP_PREDICATE; } On Sun, Dec 7, 2014 at 8:21 PM, Chengyu Song <csong84 at gatech.edu> wrote:> I was trying to build the Linux kernel with clang and observed a crash due to incorrect loop optimization: > > drivers/base/firmware_class.c > > extern struct builtin_fw __start_builtin_fw[]; > extern struct builtin_fw __end_builtin_fw[]; > > static bool fw_get_builtin_firmware(struct firmware *fw, const char *name) > { > struct builtin_fw *b_fw; > for (b_fw = __start_builtin_fw; b_fw != __end_builtin_fw; b_fw++) { > if (strcmp(name, b_fw->name) == 0) { > fw->size = b_fw->size; > fw->data = b_fw->data; > return true; > } > } > return false; > } > > When compiled with -O0, the comparison (b_fw != __end_builtin_fw) is executed before loop body, which is the expected behavior. > > But with -O1 optimization (opt -O1), the comparison is moved to the end of the loop, which causes the strcmp to use incorrect argument and finally crashes the kernel. > > define internal fastcc i1 @fw_get_builtin_firmware(%struct.firmware* nocapture %fw, i8* nocapture readonly %name) #0 { > tail call void @llvm.dbg.value(metadata !{%struct.firmware* %fw}, i64 0, metadata !3985, metadata !3750), !dbg !3986 > tail call void @llvm.dbg.value(metadata !{i8* %name}, i64 0, metadata !3987, metadata !3750), !dbg !3988 > tail call void @llvm.dbg.value(metadata !3839, i64 0, metadata !3989, metadata !3750), !dbg !3990 > br label %1, !dbg !3991 > > ; <label>:1 ; preds = %13, %0 > %b_fw.02 = phi %struct.builtin_fw* [ getelementptr inbounds ([0 x %struct.builtin_fw]* @__start_builtin_fw, i64 0, i64 0), %0 ], [ %14, %13 ] > %2 = getelementptr inbounds %struct.builtin_fw* %b_fw.02, i64 0, i32 0, !dbg !3995 > %3 = load i8** %2, align 8, !dbg !3995 > %4 = tail call i32 @strcmp(i8* %name, i8* %3) #8, !dbg !3995 > %5 = icmp eq i32 %4, 0, !dbg !3995 > br i1 %5, label %6, label %13, !dbg !3995 > > ; <label>:6 ; preds = %1 > %b_fw.02.lcssa = phi %struct.builtin_fw* [ %b_fw.02, %1 ] > %7 = getelementptr inbounds %struct.builtin_fw* %b_fw.02.lcssa, i64 0, i32 2, !dbg !3999 > %8 = load i64* %7, align 8, !dbg !3999 > %9 = getelementptr inbounds %struct.firmware* %fw, i64 0, i32 0, !dbg !3999 > store i64 %8, i64* %9, align 8, !dbg !3999 > %10 = getelementptr inbounds %struct.builtin_fw* %b_fw.02.lcssa, i64 0, i32 1, !dbg !4001 > %11 = load i8** %10, align 8, !dbg !4001 > %12 = getelementptr inbounds %struct.firmware* %fw, i64 0, i32 1, !dbg !4001 > store i8* %11, i8** %12, align 8, !dbg !4001 > br label %.loopexit, !dbg !4002 > > ; <label>:13 ; preds = %1 > %14 = getelementptr %struct.builtin_fw* %b_fw.02, i64 1, !dbg !400 > tail call void @llvm.dbg.value(metadata !{%struct.builtin_fw* %14}, i64 0, metadata !3989, metadata !3750), !dbg !3990 > %15 = icmp eq %struct.builtin_fw* %14, getelementptr inbounds ([0 x %struct.builtin_fw]* @__end_builtin_fw, i64 0, i64 0), !dbg !3991 > br i1 %15, label %.loopexit.loopexit, label %1, !dbg !3991 > > .loopexit.loopexit: ; preds = %13 > br label %.loopexit > > .loopexit: ; preds = %.loopexit.loopexit, %6 > %.0 = phi i1 [ true, %6 ], [ false, %.loopexit.loopexit ] > ret i1 %.0, !dbg !4004 > } > > Could anyone explain why this is unhappening and how to avoid it. > > Thanks, > Chengyu > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Chengyu Song
2014-Dec-08 05:31 UTC
[LLVMdev] Incorrect loop optimization when building the Linux kernel
> It's difficult to say without a full example, but I'm very suspicious > of those global declarations. I think the compiler would be entirely > justified in assuming you could *never* get from __start_builtin_fw to > __end_builtin_fw, let alone on the first iteration: they're distinct > array objects and by definition (within C99) can't overlap.I think this should be the root cause. Once I changed the declaration to pointers (instead of arrays): extern struct builtin_fw* __start_builtin_fw; extern struct builtin_fw* __end_builtin_fw; The generated code will not skip the first comparison. Sadly, Linux kernel may contain more such declarations. Thanks a lot! Chengyu
Possibly Parallel Threads
- [LLVMdev] Incorrect loop optimization when building the Linux kernel
- [LLVMdev] Incorrect loop optimization when building the Linux kernel
- [LLVMdev] Incorrect loop optimization when building the Linux kernel
- [PATCH v1 15/27] compiler: Option to default to hidden symbols
- [PATCH v1 15/27] compiler: Option to default to hidden symbols