Kelly, Terence P (HP Labs Researcher)
2009-Oct-08 22:26 UTC
[LLVMdev] strace for whole-program bitcodes (was: RE: building whole-program bitcode with LLVM)
Hi,

It would be nice if it were easier for relative novices to build whole-program bitcodes for large, complex applications with hairy build systems. Several readers of this list have been trying various approaches for a few months, but as far as I know we haven't yet found a good general solution. Approaches that have been tried include 1) placing wrappers for the usual tools (gcc, ar, as, ld, etc.) first on the $PATH, and having the wrappers pass the buck to the LLVM equivalent tools after cleaning up the arguments; and 2) using the Gold plugin.

Recently another possibility occurred to me, and I'm wondering if anyone has tried it. The basic idea goes like this: A) use the "strace" utility to trace the default build system and log all invocations of all tools; B) extract from the log a build recipe in the form of tool invocations, with the default tools replaced by LLVM equivalents.

I started thinking along these lines after finding some genuine madness in a build system (it used AWK to munge together existing .c files into new ones midway through the build). I want a method that's guaranteed to mimic faithfully an arbitrarily nutty default build system, and an strace-based approach seemed like a "Gordian knot" solution. However, I haven't tried it yet and I'm wondering if anyone else has, or if anyone can think of situations where it will fail.

Thanks!

-- Terence

> -----Original Message-----
> From: Kelly, Terence P (HP Labs Researcher)
> Sent: Friday, July 31, 2009 6:12 PM
> To: 'llvmdev at cs.uiuc.edu'
> Cc: 'Vikram S. Adve'
> Subject: building whole-program bitcode with LLVM
>
> Hi,
>
> Professor Adve suggested that we post this question to llvm-dev.
> Thanks in advance for your advice.
>
> My colleagues and I want to create whole-program bitcode for large
> real programs like Apache, BIND, OpenLDAP, etc. We want the
> whole-program bitcode to include every part of the program for which
> we have source code. For example, in the case of Apache's "httpd"
> server, we want to create a whole-program bitcode file "httpd.bc"
> containing functions that the default build system stashes in various
> application-specific auxiliary libraries (e.g., Apache's libapr and
> libaprutil).
>
> Our motive is *not* link-time optimization; we're interested in
> analyzing and modifying the whole-program bitcode in other ways.
> Once we have created a whole-program bitcode, we want to compile it
> to native assembly, then pass it thru the native assembler & linker
> to obtain a native executable whose behavior (except for performance)
> is identical to that of an executable obtained from the default build
> system. We do *not* want standard libraries like libc and libpthread
> to be incorporated as bitcode in the whole-program bitcode; they can
> be linked in at the final step, after we have converted the
> whole-program bitcode to native assembly and assembled & linked it.
>
> We have been able to achieve our goal for small programs consisting
> of a handful of translation units, so we know that our goal is
> attainable in principle. Problems start when we tackle big programs
> with complex build systems. We want to find a generic strategy that
> works with most real world open source C/C++ programs without too
> much fuss, because we want to use it on at least a dozen different
> programs. Ideally we want a strategy that works with unmodified
> default build systems, because eventually we hope to produce a tool
> that is easy for other developers to use.
>
> Initially we had hoped simply to replace gcc, as, ld, etc. with their
> LLVM counterparts in the standard build systems, but we haven't been
> able to make that strategy work. Several different approaches along
> these lines fail in various ways. Some have recommended the Gold
> plugin, but it's not clear from the documentation that it does what
> we want, and we haven't been successful in installing it yet.
>
> Does anyone have experience in constructing whole-program bitcodes
> that include app-specific libraries for large open-source programs?
> If you could share the right tricks, that would be very helpful.
>
> Thanks!
>
> -- Terence Kelly, HP Labs
>
> ________________________________
>
> From: Vikram S. Adve [mailto:vadve at cs.uiuc.edu]
> Sent: Friday, July 24, 2009 8:05 PM
> To: Kelly, Terence P (HP Labs Researcher)
> Cc: Swarup Sahoo
> Subject: Re: building complex software with LLVM
>
> Hi Terence,
>
> ...
>
> I also recommend sending any such technical
> questions about LLVM to llvmdev at cs.uiuc.edu.
> There are a large number of active (and very
> helpful) LLVM users on that list. Replies
> go to the list so you should join the list
> to see them.
>
> Good luck!
>
> --Vikram
> Associate Professor, Computer Science
> University of Illinois at Urbana-Champaign
> http://llvm.org/~vadve
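The strace idea proposed in this message (steps A and B) could be sketched roughly as follows. This is only an illustration under stated assumptions, not something anyone in the thread has built: it assumes the log was produced by something like `strace -f -e trace=execve -s 4096 -o build.log make`, that each successful call appears on one line in the usual `execve("path", ["arg0", ...], ...) = 0` form, and that a simple name-for-name tool mapping is enough. The names `TOOL_MAP`, `parse_execve`, and `recipe_from_log` are hypothetical.

```python
import re
import shlex

# Assumed mapping from default tools to the LLVM tools named in this thread.
TOOL_MAP = {"gcc": "llvm-gcc", "g++": "llvm-g++", "ld": "llvm-ld"}

# Matches a successful execve line; failed calls (= -1 ...) do not match.
EXECVE_RE = re.compile(r'execve\("[^"]+", \[(?P<argv>.*?)\], .*\)\s*=\s*0\s*$')
# One double-quoted argument, allowing backslash escapes inside it.
ARG_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')


def parse_execve(line):
    """Return the argv list of a successful execve line, or None."""
    match = EXECVE_RE.search(line)
    if match is None:
        return None
    # Only \" and \\ are decoded here; other strace escapes are left as-is.
    return [re.sub(r'\\(["\\])', r"\1", arg)
            for arg in ARG_RE.findall(match.group("argv"))]


def recipe_from_log(lines):
    """Turn logged tool invocations into LLVM-tool command lines."""
    recipe = []
    for line in lines:
        argv = parse_execve(line)
        if not argv:
            continue
        tool = argv[0].rsplit("/", 1)[-1]
        if tool in TOOL_MAP:
            recipe.append(shlex.join([TOOL_MAP[tool]] + argv[1:]))
    return recipe
```

A real recipe would also have to record each process's working directory and environment, and this sketch passes every flag through unchanged, so it does nothing about the argument differences between the GCC and LLVM tools.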
"Kelly, Terence P (HP Labs Researcher)" <terence.p.kelly at hp.com> writes:> and I'm wondering if anyone else has, or if > anyone can think of situations where it will > fail.This will fail when the input files are temporary files that are removed after the build process. You will also be building the program twice. http://saturn.stanford.edu/pages/relatedindex.html folks do something similar but only look at the build log.
Tianwei
2009-Oct-15 13:10 UTC
[LLVMdev] strace for whole-program bitcodes (was: RE: building whole-program bitcode with LLVM)
Hi, Kelly,

Have you found a solution to this problem? I ran into a similar problem when I was trying to test MySQL 5.0 with LLVM. My steps are listed below, but the build still failed because llvm-ld does not recognize some gcc link flags.

1. During configuration, use a script such as llvm-gcc.sh; at this stage the script only invokes gcc. This is necessary because GNU configure tests the compiler before configuring.
2. For configuration, specify CC and CXX as llvm-gcc.sh and llvm-g++.sh, and also pass a special LDFLAG.
3. After configuration, rewrite llvm-gcc.sh and llvm-g++.sh to parse the LDFLAGS and decide whether to run "llvm-gcc --emit-llvm" or "llvm-ld".
4. In the end, I still hit the following error:

   libtool: link: mycc.sh -g -DDBUG_ON -DSAFE_MUTEX -O0 -g3 -shit-shit -rdynamic -o comp_sql comp_sql.o -lpthread -lcrypt -lnsl -lm -lpthread
   llvm-ld: Unknown command line argument '-g'. Try: 'llvm-ld --help'
   llvm-ld: Unknown command line argument '-DDBUG_ON'. Try: 'llvm-ld --help'
   llvm-ld: Unknown command line argument '-DSAFE_MUTEX'. Try: 'llvm-ld --help'
   llvm-ld: Unknown command line argument '-O0'. Try: 'llvm-ld --help'
   llvm-ld: Unknown command line argument '-g3'. Try: 'llvm-ld --help'
   llvm-ld: Unknown command line argument '-rdynamic'. Try: 'llvm-ld --help'

Someone suggested that I use the gold plugin. I know nothing about it yet, but I will give it a try later. Does anyone have a good solution for this problem?

Thanks.

Tianwei

--
Sheng, Tianwei
Inst. of High Performance Computing
Dept. of Computer Sci. & Tech.
Tsinghua Univ.
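The failure above comes from handing gcc-style flags straight to llvm-ld. A wrapper of the kind described in step 3 could decide between compiling and linking and drop the flags the linker rejects. The sketch below is only a guess at what such a wrapper might do: `plan_command` is a hypothetical name, "a `-c` on the command line means compile" is a simplification, and the allowlist (`-o`, `-l...`, `-L...`, and plain input files) is an assumption about what llvm-ld accepts that should be checked against `llvm-ld --help`.

```python
def plan_command(args):
    """Return the command a gcc-style wrapper might run for these arguments.

    args is the wrapper's argument list without the program name.
    """
    if "-c" in args:
        # Compile step: keep every flag and ask for bitcode output.
        return ["llvm-gcc", "-emit-llvm"] + list(args)

    # Link step: keep only arguments on the assumed llvm-ld allowlist.
    kept = []
    remaining = iter(args)
    for arg in remaining:
        if arg == "-o":
            target = next(remaining, None)
            if target is not None:
                kept += ["-o", target]
        elif arg.startswith(("-l", "-L")) or not arg.startswith("-"):
            # Libraries, library search paths, and input files pass through.
            kept.append(arg)
        # Everything else (-g, -D..., -O0, -rdynamic, ...) is dropped.
    return ["llvm-ld"] + kept
```

Applied to a link line like the one in the log, this keeps `-o comp_sql comp_sql.o` and the `-l` options and discards `-g`, `-DDBUG_ON`, `-O0`, and `-rdynamic`. Whether silently dropping a flag such as `-rdynamic` is acceptable depends on the program being built.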
Tianwei <tianwei.sheng at gmail.com> writes:

> someone suggested me to use gold-plugin, I know nothing about it yet, I will
> have a try later. Does anyone have a good solution for this problem?

Afaik gold does not help here. I tried it and only managed to generate native code.

I'm currently investigating an alternative approach to produce whole-program bitcodes:

1) add /tmp/wrap to PATH
2) create /tmp/wrap/gcc with the following contents

   #!/bin/sh
   exec llvm-gcc -specs /tmp/wrap/gcc.specs "$@"

3) llvm-gcc -dumpspecs > /tmp/wrap/gcc.specs
4) modify /tmp/wrap/gcc.specs so that it always passes -emit-llvm to cc1
5) modify /tmp/wrap/gcc.specs so that it calls llvm-ld* instead of real ld and does not pass any unknown flags to it.

With this approach I was able to compile and run airstrike (a 2d dogfighting game) in bitcode form very transparently with:

   $ make-bitcode fakeroot apt-get --build source airstrike
   $ sudo dpkg -i airstrike*.deb
   $ airstrike

If you are interested, I can try to rework my scripts into a shape where they could be used by somebody else.

(*) I am not actually calling llvm-ld directly. Instead I have an "llvm-ld-exe" wrapper that calls llvm-ld and then uses "anytoexe" to pack the resulting bitcode into a shell script that can execute itself with lli and use the correct -load options.
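Steps 1 and 2 of the recipe above are mechanical enough to script. The following is a minimal sketch of that part only, assuming the wrapper directory and tool names given in the message; `write_wrapper` and `path_with_wrapper` are hypothetical helper names, and the edits to gcc.specs in steps 4 and 5 are not covered because the message does not spell them out.

```python
import os
import shlex
import stat


def write_wrapper(wrap_dir, tool="gcc", real="llvm-gcc"):
    """Create wrap_dir/tool as a shell script that execs the real compiler
    with the specs file expected at wrap_dir/gcc.specs."""
    os.makedirs(wrap_dir, exist_ok=True)
    specs = os.path.join(wrap_dir, "gcc.specs")
    script = os.path.join(wrap_dir, tool)
    with open(script, "w") as handle:
        handle.write("#!/bin/sh\n")
        handle.write('exec %s -specs %s "$@"\n' % (real, shlex.quote(specs)))
    # Add execute permission for user, group, and others.
    mode = os.stat(script).st_mode
    os.chmod(script, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return script


def path_with_wrapper(wrap_dir, current_path):
    """Return a PATH value with wrap_dir searched first."""
    return wrap_dir + os.pathsep + current_path
```

Calling `write_wrapper("/tmp/wrap")` and then running the build with `PATH` set to `path_with_wrapper("/tmp/wrap", os.environ["PATH"])` would reproduce the layout described above; the specs file itself still has to be generated with `llvm-gcc -dumpspecs` and edited by hand.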
Daniel Dunbar
2009-Oct-15 15:12 UTC
[LLVMdev] strace for whole-program bitcodes (was: RE: building whole-program bitcode with LLVM)
Hi Terence,

I believe that this is in fact similar to an approach Coverity uses (or used at one time) as a robust solution to determine what was done during a build. I can imagine that one can build a robust system following this technique, but it also seems like it might be quite a bit of work.

Another possible alternative not mentioned is to teach the compiler driver (clang, most likely) to understand how to deal with bitcode files on platforms with no LLVM linker support. This isn't terribly difficult, and would work as long as all access to the tools was done through the driver (e.g., CC). There might still be problems with build systems that call tools like ar/ld directly.

- Daniel
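Whatever component ends up handling bitcode, whether a driver or a wrapper script, it first has to tell bitcode inputs apart from native objects and archives. A minimal sketch of that check is shown below, based on file signatures: raw LLVM bitcode starts with the bytes 'B', 'C', 0xC0, 0xDE, ELF objects start with 0x7F "ELF", and ar archives start with "!<arch>" followed by a newline. The function names are hypothetical, and this is not a description of how clang does or would do it.

```python
BITCODE_MAGIC = b"BC\xc0\xde"   # raw LLVM bitcode signature
ELF_MAGIC = b"\x7fELF"          # native ELF object or executable
AR_MAGIC = b"!<arch>\n"         # ar archive (e.g. a .a library)


def classify(header):
    """Classify a file from its first few bytes."""
    if header.startswith(BITCODE_MAGIC):
        return "bitcode"
    if header.startswith(ELF_MAGIC):
        return "elf"
    if header.startswith(AR_MAGIC):
        return "archive"
    return "other"


def classify_file(path):
    """Read just enough of the file to classify it."""
    with open(path, "rb") as handle:
        return classify(handle.read(len(AR_MAGIC)))
```

An archive can itself contain bitcode members, so a build that calls ar directly would need its members inspected as well; this sketch stops at the outer file.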
Kelly, Terence P (HP Labs Researcher)
2009-Oct-15 23:52 UTC
[LLVMdev] strace for whole-program bitcodes (was: RE: building whole-program bitcode with LLVM)
Hi Daniel,

Thanks for your reply.

Do we know if the LLVM developers intend to address this problem in a comprehensive way? The existing LLVM tools are not quite drop-in replacements for their standard GCC counterparts; that's the source of the problems that various people have encountered when trying to develop a fully general way to get whole-program bitcodes.

If the LLVM tools *were* fully compatible, I think that would remove an impediment to much wider usage of LLVM. Is full compatibility a goal for the LLVM developers?

-- Terence