Bjorn De Sutter
2011-Nov-14 16:30 UTC
[LLVMdev] algebraic (de)optimizations form long chains of dependent operations
Hi,

When I compile (clang -O3) and optimize (opt -O3) C code like this:

  sum1 = val1 + val2;
  sum2 = val3 + val4;
  sum3 = val5 + val6;
  sum4 = val7 + val8;
  sum5 = sum1 + sum2;
  sum6 = sum3 + sum4;
  sum7 = sum5 + sum6;
  sum += sum7;

I get bitcode like this:

if.end152:                                  ; preds = %if.then150, %if.else146, %if.end137
  %val8.0 = phi i32 [ -2048, %if.then150 ], [ %conv38, %if.else146 ], [ 2047, %if.end137 ]
  %conv154 = trunc i32 %val8.0 to i16
  store i16 %conv154, i16* %add.ptr36, align 2
  %add = add i32 %val1.0, %sum.1
  %add161 = add i32 %add, %val2.0
  %add170 = add i32 %add161, %val3.0
  %add167 = add i32 %add170, %val4.0
  %add173 = add i32 %add167, %val5.0
  %add164 = add i32 %add173, %val6.0
  %add176 = add i32 %add164, %val7.0
  %add179 = add i32 %add176, %val8.0
  %cmp181 = icmp eq i32 %rem, 56
  br i1 %cmp181, label %if.then183, label %for.inc

The problem with this bitcode is that all the additions form one long chain of dependent additions, rather than the tree of additions in the original source code. For parallel architectures such as VLIW architectures, this is detrimental to performance.

Any idea how I can avoid this?

Thanks,

prof. Bjorn De Sutter
Computer Systems Lab
Ghent University
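
P.S. For comparison, a balanced, tree-shaped reduction of the same values (mirroring the structure of the C source) would look roughly like the following sketch; the register names are only illustrative, not what any pass actually emits:

  ; the four leaf additions are mutually independent
  %s1 = add i32 %val1.0, %val2.0
  %s2 = add i32 %val3.0, %val4.0
  %s3 = add i32 %val5.0, %val6.0
  %s4 = add i32 %val7.0, %val8.0
  ; two further independent additions combine the partial sums
  %s5 = add i32 %s1, %s2
  %s6 = add i32 %s3, %s4
  ; only the final two additions are serialized
  %s7 = add i32 %s5, %s6
  %sum.next = add i32 %sum.1, %s7

In this form the critical path is four dependent additions instead of eight, and the independent adds at each level could be issued in the same VLIW cycle.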