So it would appear that llvm-gcc and clang are both slower than gcc4,
which is infamous for being slow at compiling code, and yes, this is
with a release build/--enable-optimizations.

This seems to go against notes such as
http://clang.llvm.org/features.html#performance
which claim clang is significantly faster than gcc.

Below are some times and the larger object files from compiling an
i386 OpenBSD kernel at -O2 on an Intel Atom based laptop.  The
significantly larger code size is rather disturbing, as it means
binaries can't fit on space-constrained installation media, for
example.

As the large object files appear with both llvm-gcc and clang, it
would appear to be a problem with the X86 backend.  Is this the
expected behaviour at this point?  There is quite a difference in code
size between the llvm-based compilers and the pure GCC ones: clang's
text segment is roughly 16% larger than gcc 4.2.4's (7412287 vs
6373332 bytes).

I'm happy to supply more configuration/object size details if someone
is interested.

========================

clang 20090717 svn --enable-optimizations
clang version 1.0 (http://llvm.org/svn/llvm-project/cfe/trunk 76176M)

   text    data     bss      dec     hex
7412287  160516 1049460  8622263  8390b7
   19m4.11s real    16m56.35s user     1m44.23s system

-rw-r--r--  1 jsg  wsrc  122K Jul 18 01:41 if_spppsubr.o
-rw-r--r--  1 jsg  wsrc  127K Jul 18 01:36 aic79xx.o
-rw-r--r--  1 jsg  wsrc  130K Jul 18 01:46 pci_subr.o
-rw-r--r--  1 jsg  wsrc  161K Jul 18 01:38 bwi.o
-rw-r--r--  1 jsg  wsrc  349K Jul 18 01:47 isp_pci.o
-rwxr-xr-x  1 jsg  wsrc  7.9M Jul 18 01:54 bsd

========================

gcc 3.3.5

   text    data     bss      dec     hex
6358820  182268 1051044  7592132  73d8c4
   14m7.04s real    13m5.64s user     1m15.96s system

-rw-r--r--  1 jsg  wsrc  85.2K Jul 18 02:25 isp.o
-rw-r--r--  1 jsg  wsrc  97.5K Jul 18 02:26 bwi.o
-rw-r--r--  1 jsg  wsrc   100K Jul 18 02:24 aic79xx.o
-rw-r--r--  1 jsg  wsrc   124K Jul 18 02:32 pci_subr.o
-rw-r--r--  1 jsg  wsrc   350K Jul 18 02:32 isp_pci.o
-rwxr-xr-x  1 jsg  wsrc   6.9M Jul 18 02:38 bsd

========================

gcc 4.2.4

   text    data     bss      dec     hex
6373332  169212 1051044  7593588  73de74
   17m49.36s real    16m44.82s user     1m15.59s system

-rw-r--r--  1 jsg  wsrc  98.6K Jul 18 02:00 aic79xx.o
-rw-r--r--  1 jsg  wsrc  99.3K Jul 18 02:02 bwi.o
-rw-r--r--  1 jsg  wsrc   124K Jul 18 02:09 pci_subr.o
-rw-r--r--  1 jsg  wsrc   351K Jul 18 02:10 isp_pci.o
-rwxr-xr-x  1 jsg  wsrc   6.9M Jul 18 02:17 bsd

========================

llvm-gcc 4.2.1 svn 20090717
gcc version 4.2.1 (Based on Apple Inc. build 5646) (LLVM build)

ld -Ttext 0xD0200120 -e start -N --warn-common -S -x -o bsd ${SYSTEM_OBJ} vers.o

   text    data     bss      dec     hex
7235924  179256 1053092  8468272  813730
   18m4.05s real    16m35.42s user     1m35.06s system

-rw-r--r--  1 jsg  wsrc  118K Jul 18 01:12 if_spppsubr.o
-rw-r--r--  1 jsg  wsrc  124K Jul 18 01:16 pci_subr.o
-rw-r--r--  1 jsg  wsrc  125K Jul 18 01:07 aic79xx.o
-rw-r--r--  1 jsg  wsrc  159K Jul 18 01:09 bwi.o
-rw-r--r--  1 jsg  wsrc  349K Jul 18 01:17 isp_pci.o
-rwxr-xr-x  1 jsg  wsrc  7.8M Jul 18 01:24 bsd
On Fri, Jul 17, 2009 at 4:14 PM, Jonathan Gray <jsg at goblin.cx> wrote:
> This seems to go against notes such as
> http://clang.llvm.org/features.html#performance
> which claim clang is significantly faster than gcc.

I think the URL you want is actually
http://clang.llvm.org/performance.html.  The difference isn't as
dramatic when you consider code generation, at least for the moment.

> Below are some times and the larger object files when
> compiling an i386 OpenBSD kernel at -O2 on an Intel Atom
> based laptop.  The significantly larger code size
> is rather disturbing as it means binaries can't fit on
> space constrained installation media for example.

Are you building with -g?  The debug info implementation is known to
be relatively inefficient.

-Eli
On Fri, Jul 17, 2009 at 04:41:55PM -0700, Eli Friedman wrote:
> On Fri, Jul 17, 2009 at 4:14 PM, Jonathan Gray <jsg at goblin.cx> wrote:
> > This seems to go against notes such as
> > http://clang.llvm.org/features.html#performance
> > which claim clang is significantly faster than gcc.
>
> I think the URL you want is actually
> http://clang.llvm.org/performance.html.  The difference isn't as
> dramatic when you consider code generation, at least for the moment.

Are the scripts used to break down the time spent in the different
stages available somewhere?

> > Below are some times and the larger object files when
> > compiling an i386 OpenBSD kernel at -O2 on an Intel Atom
> > based laptop.  The significantly larger code size
> > is rather disturbing as it means binaries can't fit on
> > space constrained installation media for example.
>
> Are you building with -g?  The debug info implementation is known to
> be relatively inefficient.

No -g/debug info.  The flags are as below, plus some additional -D
defines:

-Wall -Wstrict-prototypes -Wmissing-prototypes -Wno-uninitialized
-Wno-format -fno-builtin -fno-zero-initialized-in-bss
-mno-mmx -mno-sse -mno-sse2 -mno-sse3 -O2 -pipe -nostdinc
Hi Jonathan,

Please pick one or two files you think are representative of the file
size problem and file bugs with the preprocessed inputs and command
lines here:
  http://llvm.org/bugs

As for the compile performance, it's hard to do much with those full
build numbers.  Can you see if the problem manifests itself with some
individual file, and file a bug if so (please include the preprocessed
input and command lines)?  Given the difference in code size, llvm-gcc
and clang would appear to be doing more work, so that will of course
negatively impact build times.  However, I'm still surprised by your
times and would be happy to investigate if you can get me a test case.

 - Daniel

On Fri, Jul 17, 2009 at 4:14 PM, Jonathan Gray <jsg at goblin.cx> wrote:
> So it would appear that llvm-gcc and clang are both slower than
> gcc4 which is infamous for being slow at compiling code, and
> yes this is with a release build/--enable-optimizations.
> [...]
On Jul 17, 2009, at 4:41 PM, Eli Friedman wrote:

> On Fri, Jul 17, 2009 at 4:14 PM, Jonathan Gray <jsg at goblin.cx> wrote:
>> This seems to go against notes such as
>> http://clang.llvm.org/features.html#performance
>> which claim clang is significantly faster than gcc.
>
> I think the URL you want is actually
> http://clang.llvm.org/performance.html.  The difference isn't as
> dramatic when you consider code generation, at least for the moment.

On Mac OS X / x86, llvm-gcc is easily > 20% faster than gcc 4.2 at
-O2 / -O3.

>> Below are some times and the larger object files when
>> compiling an i386 OpenBSD kernel at -O2 on an Intel Atom
>> based laptop.  The significantly larger code size
>> is rather disturbing as it means binaries can't fit on
>> space constrained installation media for example.
>
> Are you building with -g?  The debug info implementation is known to
> be relatively inefficient.

That shouldn't matter much unless it's -O0.  llvm-generated code size
is very close to gcc 4.2 (slightly worse on x86, slightly better on
x86_64).

Given the code size difference, I am inclined to believe something
isn't being set up right.  Jonathan, please file a bug and attach the
complete build logs.  Also, please attach some preprocessed files
(e.g. bwi.i).  Perhaps some .o files will help as well.

Evan
We have some results that are somewhat entertaining and that relate to
the size/speed discussion.

The basic idea is exhaustive generation of C functions, where
"exhaustive" is qualified by some structural restrictions (depth of
AST, node type, etc.).

For one particular set of restrictions we ended up with about 7
million C functions.  We then compiled each of these functions with 7
compilers: llvm-gcc, clang, Intel cc, Sun cc, and various versions of
gcc.

We then looked for functions where a particular pair of compilers
exhibited widely differing abilities to optimize.  For example,
consider this function:

int ZZ_0000728f(int x,int y){return o1s(m8s(x,-2),(x?1:y));}

gcc-3.4 can see that it always returns 0, and emits code doing exactly
that.  On the other hand, llvm-gcc emits 228 bytes of object code (at
-Os) to compute the same zeroes.

The funny-named functions are little safe-math utilities that avoid
undefined behavior for all inputs.  "o1s" is "mod 16-bit signed" and
"m8s" is "multiply 8-bit signed".

Why is this interesting?  Because it provides a way to systematically
find areas of weakness in an optimizer, relative to a collection of
other optimizers.

If people would find it useful, I can put the full set of results on
the web when time permits.  I call the resulting codes "maximally
embarrassing" since each function represents some significant failure
to optimize.

The globally maximally embarrassing function is one where various
versions of gcc (including llvm-gcc) emit code returning constant 0
and clang emits 762 bytes of x86.  The C code is this:

int ZZ_00005bbd(int x,int y){return m1s((x?0:x),a8s(y,y));}

The other embarrassing thing about these functions is that most
compilers miscompile some of the 7 million functions.  llvm-gcc and
clang are the only ones we tested that actually get them all right.

To compile these functions, this code needs to be prepended:

#include <limits.h>
#include <stdint.h>
#include "safe_abbrev.h"
#include "safe_math.h"

The safe math headers are here:

http://www.cs.utah.edu/~regehr/safe_math/

Anyway, I just throw this out there.  People on this list have told me
before that missed-optimization bugs are not considered very
interesting.  The ideal result (from my point of view as a compiler
consumer) would be for a few people from one or more of these
compilers' development communities to take seriously the job of
eliminating these embarrassments.

John Regehr
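(As a rough illustration of what these wrappers do -- this is a sketch
only, not the actual definition from safe_math.h, and the real names,
types, and overflow fallback may differ -- a checked 8-bit signed
multiply could be written along these lines:)

#include <stdint.h>

/* Illustrative sketch only -- not the actual safe_math.h definition.
 * A "safe" 8-bit signed multiply: compute in a wider type (plain int,
 * via the usual arithmetic conversions) so the multiplication itself
 * can never overflow, then replace out-of-range results with a defined
 * fallback value instead of invoking undefined behavior. */
static int8_t safe_mul_int8(int8_t a, int8_t b)
{
    int wide = (int)a * (int)b;            /* always fits in int */
    if (wide > INT8_MAX || wide < INT8_MIN)
        return 0;                          /* arbitrary, but defined */
    return (int8_t)wide;
}

(Because every operation goes through a wrapper whose behavior is fully
defined for all inputs, an optimizer is free to constant-fold the whole
call chain whenever the operands are known.)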
On Fri, Jul 17, 2009 at 10:21 PM, John Regehr <regehr at cs.utah.edu> wrote:
> The globally maximally embarrassing function is one where various versions
> of gcc (including llvm-gcc) emit code returning constant 0 and clang emits
> 762 bytes of x86.  The C code is this:
>
> int ZZ_00005bbd(int x,int y){return m1s((x?0:x),a8s(y,y));}

The thing about testcases like that is that they heavily stress
instcombine; as far as I can tell, that testcase basically boils down
to instcombine interacting badly with clang's output for
"(short)(x?0:x)>0" (it ends up turning it into something roughly
equivalent to "(short)x>0 && x==0", which it doesn't know how to
fold).  And fixing weird instcombine cases like that tends to be
tedious and not very relevant to real-world code.

> The other embarrassing thing about these functions is that most compilers
> miscompile some of the 7 million functions.  llvm-gcc and clang are the
> only ones we tested that actually get them all right.

Well, that's good :)

-Eli
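(To make the missed fold concrete: assuming m1s() returns the ordinary
product whenever it is representable -- and 0 times anything always is
-- and that a8s() is side-effect free, the function is equivalent to
the hand-folded version below, which is the result gcc and llvm-gcc
manage to emit.)

/* Hand-folded version of ZZ_00005bbd, under the assumptions stated
 * above.  Both arms of (x ? 0 : x) are 0, so the first operand of
 * m1s() is always 0, so the product is 0 regardless of a8s(y, y). */
int ZZ_00005bbd_folded(int x, int y)
{
    (void)x;   /* both branches of (x ? 0 : x) yield 0 */
    (void)y;   /* a8s(y, y) is multiplied by 0, so its value is moot */
    return 0;
}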
On Fri, Jul 17, 2009 at 10:21 PM, John Regehr <regehr at cs.utah.edu> wrote:
> We have some results that are somewhat entertaining and that relate to the
> size/speed discussion.
> [...]
> Anyway I just throw this out there.  People on this list have told me
> before that missed-optimization bugs are not considered very interesting.
> The ideal result (from my point of view as a compiler consumer) would be
> for a few people from one or more of these compilers' development
> communities to take seriously the job of eliminating these embarrassments.

Interesting and cool work.  I wonder if there isn't a way to somehow
collate the results?  As Eli points out, slogging through a bunch of
such reports may not be very valuable.  However, if there was a way to
group results so that multiple failures could be "guessed" to be the
same bug, then I'd imagine the top N results off that list would be
interesting & definitely worth fixing.

Since your generator already has a significant amount of information
about the structure of the tests -- and that information probably
correlates with the root missed optimization -- there is probably a
reasonable way to implement the "guess" using machine learning.  Got
an AI grad student with free time handy?

 - Daniel
John Regehr wrote:
> We have some results that are somewhat entertaining and that relate to the
> size/speed discussion.
> [...]
> Anyway I just throw this out there.  People on this list have told me
> before that missed-optimization bugs are not considered very interesting.
> The ideal result (from my point of view as a compiler consumer) would be
> for a few people from one or more of these compilers' development
> communities to take seriously the job of eliminating these embarrassments.

I'm moderately interested.  The nice thing about these sorts of bugs is
that they interact very well with our other optimizations.  However, as
they aren't real-world cases I can't consider them high priority.  I'd
just like to have the list to look over and fix a few every once in a
while.

I don't want you to go through too much trouble to put it on the web,
but it sounds like you've already done the hard part: not only
producing all the functions but also scoring the compilers' results!
I'm really impressed by this and particularly like your systematic
approach.

Nick
Very interesting study!  However, as others have asked, I would like to
see some aggregated data.

Also, how do you verify that the generated code is correct?  How are
you systematically generating these tests?  In summary, do you have a
paper coming along? :)

Thanks,
Nuno

----- Original Message -----
From: "John Regehr" <regehr at cs.utah.edu>
To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
Sent: Saturday, July 18, 2009 6:21 AM
Subject: Re: [LLVMdev] speed and code size issues

> We have some results that are somewhat entertaining and that relate to the
> size/speed discussion.
> [...]
> John Regehr