Hey everyone,

I'm running the stock LLVM 3.1 release. Both llc and opt accept -O# arguments, but the results look somewhat different. Here's a silly, unoptimized bit of IR that my LLVM-backed program generates:

    ; ModuleID = 'foo'

    %Coord = type { double, double, double }

    define double @foo(%Coord*, %Coord*) nounwind uwtable ssp {
    entry:
      %dx_ptr = alloca double
      %b_ptr = alloca %Coord*
      %a_ptr = alloca %Coord*
      store %Coord* %0, %Coord** %a_ptr
      store %Coord* %1, %Coord** %b_ptr
      %a = load %Coord** %a_ptr
      %addr = getelementptr %Coord* %a, i64 0
      %2 = getelementptr inbounds %Coord* %addr, i32 0, i32 0
      %3 = load double* %2
      %b = load %Coord** %b_ptr
      %addr1 = getelementptr %Coord* %b, i64 0
      %4 = getelementptr inbounds %Coord* %addr1, i32 0, i32 0
      %5 = load double* %4
      %sub = fsub double %3, %5
      store double %sub, double* %dx_ptr
      %dx = load double* %dx_ptr
      %dx2 = load double* %dx_ptr
      %mult = fmul double %dx, %dx2
      ret double %mult
    }

This roughly matches the following C code:

    struct Coord {
        double x;
        double y;
        double z;
    };

    double foo(struct Coord *a, struct Coord *b) {
        double dx = a[0].x - b[0].x;
        return dx * dx;
    }

Running it through opt:

    $ llvm-as < x.ll | opt -O3 | llc > y.s

produces the following:

    _foo:                                   ## @foo
        .cfi_startproc
    ## BB#0:                                ## %entry
        movsd   (%rdi), %xmm0
        subsd   (%rsi), %xmm0
        mulsd   %xmm0, %xmm0
        ret
        .cfi_endproc

This also matches what clang compiles from the C function. However, running it through llc with the same optimization flag:

    $ llc -O3 x.ll -o z.s

produces:

    _foo:                                   ## @foo
        .cfi_startproc
    ## BB#0:                                ## %entry
        movq    %rdi, -24(%rsp)
        movq    %rsi, -16(%rsp)
        movq    -24(%rsp), %rax
        movsd   (%rax), %xmm0
        subsd   (%rsi), %xmm0
        movsd   %xmm0, -8(%rsp)
        mulsd   %xmm0, %xmm0
        ret
        .cfi_endproc

This matches the results of LLVMCreateTargetMachine with CodeGenLevelAggressive followed by LLVMTargetMachineEmitToFile, which is what I'm using.

Is the llc/opt difference expected? I'm a bit confused, since I'd expect the same -O level to run the same optimization passes.
I have to admit I'm not well versed in assembly, but to me it looks like opt produces something that eliminates a bunch of stack loads and stores. I'd appreciate any insight into this.

Thanks,
Dimitri
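To make the stack traffic concrete, here is a rough C analogy of the IR above. It is a sketch, not output from any LLVM tool, and the names `foo_slots` and `foo_promoted` are hypothetical. `foo_slots` mirrors the unoptimized IR line by line, writing each `alloca` as a local that is stored to and then loaded back. `foo_promoted` is what remains once those slots are promoted to plain SSA values, which is the kind of rewrite IR passes such as -scalarrepl perform. (A C compiler would, of course, promote `foo_slots`'s locals on its own; the point is only the correspondence with the IR.)

```c
#include <assert.h>
#include <stddef.h>

struct Coord { double x; double y; double z; };

/* Mirrors the unoptimized IR: every value makes a round trip through a
   stack slot (alloca, store, then load). */
double foo_slots(struct Coord *arg0, struct Coord *arg1) {
    assert(arg0 != NULL && arg1 != NULL);
    double dx_slot;            /* %dx_ptr = alloca double            */
    struct Coord *b_slot;      /* %b_ptr  = alloca %Coord*           */
    struct Coord *a_slot;      /* %a_ptr  = alloca %Coord*           */
    a_slot = arg0;             /* store %Coord* %0, %Coord** %a_ptr  */
    b_slot = arg1;             /* store %Coord* %1, %Coord** %b_ptr  */
    double ax = a_slot[0].x;   /* %a = load, two GEPs, %3 = load     */
    double bx = b_slot[0].x;   /* %b = load, two GEPs, %5 = load     */
    dx_slot = ax - bx;         /* %sub = fsub, then store to %dx_ptr */
    double dx = dx_slot;       /* %dx  = load double* %dx_ptr        */
    double dx2 = dx_slot;      /* %dx2 = load double* %dx_ptr        */
    return dx * dx2;           /* %mult = fmul double %dx, %dx2      */
}

/* The same computation once the stack slots are promoted to SSA values:
   one load, one subtract from memory and one multiply, as in the
   movsd/subsd/mulsd sequence produced by opt -O3 | llc. */
double foo_promoted(const struct Coord *a, const struct Coord *b) {
    double dx = a[0].x - b[0].x;
    return dx * dx;
}
```

For example, with a.x = 4.0 and b.x = 1.5, both functions return 6.25.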
Dimitri Tcaciuc wrote:
> Is the llc/opt difference expected?

Yes. "opt" runs the optimizers, which take LLVM IR as input and produce LLVM IR as output. "opt -O2 -debug-pass=Arguments" will show you a list of the individual optimizations (and analyses) that opt -O2 performs. It's possible to run them individually (opt -scalarrepl -instcombine) to create a list that's better for your own compiler, but -O2 has the ones we think are good for a C/C++ compiler.

"llc" is the code generator, which takes LLVM IR in and produces machine code. There are some places in the code generator where it has the choice between spending compile time to produce good code, or getting the code out quickly, and the -O flag to llc specifies that choice. For example, you can do register allocation by trying to figure out the most efficient registers that minimize the number of spills, or you can just pick the registers starting from one, and spill it if it's already used. Any optimizations llc does are things that can't possibly happen in an IR-to-IR pass (since the IR is in SSA form, we can't do register allocation there).

If you want optimized code, you'd run the IR optimizers and ask the code generator to spend time producing good code. Or if you want unoptimized code, don't run any IR optimizers and ask the code generator to produce code as quickly as it can. You can of course choose some other combination by running opt and llc yourself, as you noticed.

Nick
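Nick's register-allocation example can be illustrated with a toy linear-scan allocator. This is a deliberately simplified sketch, not any of LLVM's actual allocators: each value is modelled as a live interval (the `struct interval` type and `count_spills` function are hypothetical), and intervals are assumed to be sorted by start. When no register is free, the quick strategy simply spills the new value; the slower strategy spends a little more work evicting whichever live value ends last, so that its register comes free sooner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REGS 16

/* A value is live from its definition (start) to its last use (end). */
struct interval { int start; int end; };

/* Toy linear scan over intervals sorted by start; returns how many values
   end up spilled to the stack. If spill_furthest is false, a value that finds
   no free register is spilled immediately (the cheap decision). If true, the
   live value that ends last is evicted instead when it outlives the new one. */
int count_spills(const struct interval *iv, size_t n, int num_regs,
                 bool spill_furthest) {
    assert(num_regs > 0 && num_regs <= MAX_REGS);
    int holder[MAX_REGS];              /* interval index per register, -1 if free */
    for (int r = 0; r < num_regs; r++)
        holder[r] = -1;
    int spills = 0;

    for (size_t i = 0; i < n; i++) {
        /* Release registers whose value died before this one is defined. */
        for (int r = 0; r < num_regs; r++)
            if (holder[r] >= 0 && iv[holder[r]].end < iv[i].start)
                holder[r] = -1;

        int free_reg = -1;
        for (int r = 0; r < num_regs && free_reg < 0; r++)
            if (holder[r] < 0)
                free_reg = r;

        if (free_reg >= 0) {
            holder[free_reg] = (int)i;
            continue;
        }

        spills++;                      /* someone has to go to the stack */
        if (spill_furthest) {
            /* Every register is occupied here; find the longest-lived value. */
            int victim = 0;
            for (int r = 1; r < num_regs; r++)
                if (iv[holder[r]].end > iv[holder[victim]].end)
                    victim = r;
            if (iv[holder[victim]].end > iv[i].end)
                holder[victim] = (int)i;   /* evict it; new value takes its register */
            /* otherwise the new value itself is the one spilled */
        }
    }
    return spills;
}
```

With one register and the intervals {0, 10}, {1, 2}, {3, 4}, the quick strategy spills two values while the evicting strategy spills only one; with two registers, neither spills.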
Great, thanks for the info! So, to extrapolate (referring to the LLVM C bindings): populating a PassManager through a PassManagerBuilder at its own OptLevel and running it takes care of a different category of optimizations, and will not step on what the TargetMachine's CodeGenLevel argument plus TargetMachineEmitToFile are working on?

Dimitri

On Sat, Jun 30, 2012 at 1:28 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
> Dimitri Tcaciuc wrote:
>> Is the llc/opt difference expected?
>
> Yes. "opt" runs the optimizers, which take LLVM IR as input and produce LLVM
> IR as output. [...]