thr3ads.net - llvm dev - [LLVMdev] Incorrect loop optimization when building the Linux kernel [Dec 2014]

If this information is useful, please help other people find it:
Share via:

Chengyu Song

2014-Dec-08 04:21 UTC

[LLVMdev] Incorrect loop optimization when building the Linux kernel

I was trying to build the Linux kernel with clang and observed a crash due to
incorrect loop optimization:

drivers/base/firmware_class.c

extern struct builtin_fw __start_builtin_fw[];
extern struct builtin_fw __end_builtin_fw[];

static bool fw_get_builtin_firmware(struct firmware *fw, const char *name)
{
	struct builtin_fw *b_fw;
	for (b_fw = __start_builtin_fw; b_fw != __end_builtin_fw; b_fw++) {
		if (strcmp(name, b_fw->name) == 0) {
			fw->size = b_fw->size;
			fw->data = b_fw->data;
			return true;
		}
	}
	return false;
}

When compiled with -O0, the comparison (b_fw != __end_builtin_fw) is executed
before loop body, which is the expected behavior.

But with -O1 optimization (opt -O1), the comparison is moved to the end of the
loop, which causes the strcmp to use incorrect argument and finally crashes the
kernel.

define internal fastcc i1 @fw_get_builtin_firmware(%struct.firmware* nocapture
%fw, i8* nocapture readonly %name) #0 {
	tail call void @llvm.dbg.value(metadata !{%struct.firmware* %fw}, i64 0,
metadata !3985, metadata !3750), !dbg !3986
	tail call void @llvm.dbg.value(metadata !{i8* %name}, i64 0, metadata !3987,
metadata !3750), !dbg !3988
	tail call void @llvm.dbg.value(metadata !3839, i64 0, metadata !3989, metadata
!3750), !dbg !3990
	br label %1, !dbg !3991

; <label>:1                                       ; preds = %13, %0
	%b_fw.02 = phi %struct.builtin_fw* [ getelementptr inbounds ([0 x
%struct.builtin_fw]* @__start_builtin_fw, i64 0, i64 0), %0 ], [ %14, %13 ]
	%2 = getelementptr inbounds %struct.builtin_fw* %b_fw.02, i64 0, i32 0, !dbg
!3995
	%3 = load i8** %2, align 8, !dbg !3995
	%4 = tail call i32 @strcmp(i8* %name, i8* %3) #8, !dbg !3995
	%5 = icmp eq i32 %4, 0, !dbg !3995
	br i1 %5, label %6, label %13, !dbg !3995

; <label>:6                                       ; preds = %1
	%b_fw.02.lcssa = phi %struct.builtin_fw* [ %b_fw.02, %1 ]
	%7 = getelementptr inbounds %struct.builtin_fw* %b_fw.02.lcssa, i64 0, i32 2,
!dbg !3999
	%8 = load i64* %7, align 8, !dbg !3999
	%9 = getelementptr inbounds %struct.firmware* %fw, i64 0, i32 0, !dbg !3999
	store i64 %8, i64* %9, align 8, !dbg !3999
	%10 = getelementptr inbounds %struct.builtin_fw* %b_fw.02.lcssa, i64 0, i32 1,
!dbg !4001
	%11 = load i8** %10, align 8, !dbg !4001
	%12 = getelementptr inbounds %struct.firmware* %fw, i64 0, i32 1, !dbg !4001
	store i8* %11, i8** %12, align 8, !dbg !4001
	br label %.loopexit, !dbg !4002

; <label>:13                                      ; preds = %1
	%14 = getelementptr %struct.builtin_fw* %b_fw.02, i64 1, !dbg !400
	tail call void @llvm.dbg.value(metadata !{%struct.builtin_fw* %14}, i64 0,
metadata !3989, metadata !3750), !dbg !3990
	%15 = icmp eq %struct.builtin_fw* %14, getelementptr inbounds ([0 x
%struct.builtin_fw]* @__end_builtin_fw, i64 0, i64 0), !dbg !3991
	br i1 %15, label %.loopexit.loopexit, label %1, !dbg !3991

.loopexit.loopexit:                               ; preds = %13
	br label %.loopexit

.loopexit:                                        ; preds = %.loopexit.loopexit,
%6
	%.0 = phi i1 [ true, %6 ], [ false, %.loopexit.loopexit ]
	ret i1 %.0, !dbg !4004
}

Could anyone explain why this is unhappening and how to avoid it.

Thanks,
Chengyu

Tim Northover

2014-Dec-08 05:05 UTC

head link

[LLVMdev] Incorrect loop optimization when building the Linux kernel

> Could anyone explain why this is unhappening and how to avoid it.
It's difficult to say without a full example, but I'm very suspicious
of those global declarations. I think the compiler would be entirely
justified in assuming you could *never* get from __start_builtin_fw to
__end_builtin_fw, let alone on the first iteration: they're distinct
array objects and by definition (within C99) can't overlap.

Trouble is, any perturbation of that code would change things so
drastically that it's going to be difficult to prove that's the cause
without digging into the exact LLVM pass that does the transformation
to see what it's basing the decision on.

Cheers.

Tim.

Sanjoy Das

2014-Dec-08 05:09 UTC

head link

[LLVMdev] Incorrect loop optimization when building the Linux kernel

LLVM seems to assume that two non-weak declarations can never point to
the same thing (so "__start_builtin_fw == __end_builtin_fw" is always
false).  In fact, foo in the following program constant folds to
return false (with a warning from clang):

struct F {};
extern struct F a[];
extern struct F b[];

int foo() {
  return a == b;
}

I don't know enough about the difference between different linkage
types to say if llvm's behavior is reasonable or not.


One place where this property is exploited (there are probably others):

(llvm/lib/IR/ConstantFold.cpp)

static ICmpInst::Predicate areGlobalsPotentiallyEqual(const GlobalValue *GV1,
                                                      const GlobalValue *GV2) {
  // Don't try to decide equality of aliases.
  if (!isa<GlobalAlias>(GV1) && !isa<GlobalAlias>(GV2))
    if (!GV1->hasExternalWeakLinkage() || !GV2->hasExternalWeakLinkage())
      return ICmpInst::ICMP_NE;
  return ICmpInst::BAD_ICMP_PREDICATE;
}


On Sun, Dec 7, 2014 at 8:21 PM, Chengyu Song <csong84 at gatech.edu>
wrote:> I was trying to build the Linux kernel with clang and observed a crash due
to incorrect loop optimization:
>
> drivers/base/firmware_class.c
>
> extern struct builtin_fw __start_builtin_fw[];
> extern struct builtin_fw __end_builtin_fw[];
>
> static bool fw_get_builtin_firmware(struct firmware *fw, const char *name)
> {
>         struct builtin_fw *b_fw;
>         for (b_fw = __start_builtin_fw; b_fw != __end_builtin_fw; b_fw++) {
>                 if (strcmp(name, b_fw->name) == 0) {
>                         fw->size = b_fw->size;
>                         fw->data = b_fw->data;
>                         return true;
>                 }
>         }
>         return false;
> }
>
> When compiled with -O0, the comparison (b_fw != __end_builtin_fw) is
executed before loop body, which is the expected behavior.
>
> But with -O1 optimization (opt -O1), the comparison is moved to the end of
the loop, which causes the strcmp to use incorrect argument and finally crashes
the kernel.
>
> define internal fastcc i1 @fw_get_builtin_firmware(%struct.firmware*
nocapture %fw, i8* nocapture readonly %name) #0 {
>         tail call void @llvm.dbg.value(metadata !{%struct.firmware* %fw},
i64 0, metadata !3985, metadata !3750), !dbg !3986
>         tail call void @llvm.dbg.value(metadata !{i8* %name}, i64 0,
metadata !3987, metadata !3750), !dbg !3988
>         tail call void @llvm.dbg.value(metadata !3839, i64 0, metadata
!3989, metadata !3750), !dbg !3990
>         br label %1, !dbg !3991
>
> ; <label>:1                                       ; preds = %13, %0
>         %b_fw.02 = phi %struct.builtin_fw* [ getelementptr inbounds ([0 x
%struct.builtin_fw]* @__start_builtin_fw, i64 0, i64 0), %0 ], [ %14, %13 ]
>         %2 = getelementptr inbounds %struct.builtin_fw* %b_fw.02, i64 0,
i32 0, !dbg !3995
>         %3 = load i8** %2, align 8, !dbg !3995
>         %4 = tail call i32 @strcmp(i8* %name, i8* %3) #8, !dbg !3995
>         %5 = icmp eq i32 %4, 0, !dbg !3995
>         br i1 %5, label %6, label %13, !dbg !3995
>
> ; <label>:6                                       ; preds = %1
>         %b_fw.02.lcssa = phi %struct.builtin_fw* [ %b_fw.02, %1 ]
>         %7 = getelementptr inbounds %struct.builtin_fw* %b_fw.02.lcssa, i64
0, i32 2, !dbg !3999
>         %8 = load i64* %7, align 8, !dbg !3999
>         %9 = getelementptr inbounds %struct.firmware* %fw, i64 0, i32 0,
!dbg !3999
>         store i64 %8, i64* %9, align 8, !dbg !3999
>         %10 = getelementptr inbounds %struct.builtin_fw* %b_fw.02.lcssa,
i64 0, i32 1, !dbg !4001
>         %11 = load i8** %10, align 8, !dbg !4001
>         %12 = getelementptr inbounds %struct.firmware* %fw, i64 0, i32 1,
!dbg !4001
>         store i8* %11, i8** %12, align 8, !dbg !4001
>         br label %.loopexit, !dbg !4002
>
> ; <label>:13                                      ; preds = %1
>         %14 = getelementptr %struct.builtin_fw* %b_fw.02, i64 1, !dbg !400
>         tail call void @llvm.dbg.value(metadata !{%struct.builtin_fw* %14},
i64 0, metadata !3989, metadata !3750), !dbg !3990
>         %15 = icmp eq %struct.builtin_fw* %14, getelementptr inbounds ([0 x
%struct.builtin_fw]* @__end_builtin_fw, i64 0, i64 0), !dbg !3991
>         br i1 %15, label %.loopexit.loopexit, label %1, !dbg !3991
>
> .loopexit.loopexit:                               ; preds = %13
>         br label %.loopexit
>
> .loopexit:                                        ; preds =
%.loopexit.loopexit, %6
>         %.0 = phi i1 [ true, %6 ], [ false, %.loopexit.loopexit ]
>         ret i1 %.0, !dbg !4004
> }
>
> Could anyone explain why this is unhappening and how to avoid it.
>
> Thanks,
> Chengyu
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Chengyu Song

2014-Dec-08 05:31 UTC

head link

[LLVMdev] Incorrect loop optimization when building the Linux kernel

> It's difficult to say without a full example, but I'm very
suspicious
> of those global declarations. I think the compiler would be entirely
> justified in assuming you could *never* get from __start_builtin_fw to
> __end_builtin_fw, let alone on the first iteration: they're distinct
> array objects and by definition (within C99) can't overlap.
I think this should be the root cause. Once I changed the declaration to
pointers (instead of arrays):

extern struct builtin_fw* __start_builtin_fw;
extern struct builtin_fw* __end_builtin_fw;

The generated code will not skip the first comparison.

Sadly, Linux kernel may contain more such declarations.

Thanks a lot!
Chengyu

Possibly Parallel Threads

Search for more possibly parallel threads

llvm dev - Dec 2014 - [LLVMdev] Incorrect loop optimization when building the Linux kernel

[LLVMdev] Incorrect loop optimization when building the Linux kernel

[LLVMdev] Incorrect loop optimization when building the Linux kernel

[LLVMdev] Incorrect loop optimization when building the Linux kernel

[LLVMdev] Incorrect loop optimization when building the Linux kernel

Possibly Parallel Threads