Kelly, Terence P (HP Labs Researcher)
2009-Aug-01 01:11 UTC
[LLVMdev] building whole-program bitcode with LLVM
Hi,

Professor Adve suggested that we post this question to llvm-dev. Thanks in advance for your advice.

My colleagues and I want to create whole-program bitcode for large real programs like Apache, BIND, OpenLDAP, etc. We want the whole-program bitcode to include every part of the program for which we have source code. For example, in the case of Apache's "httpd" server, we want to create a whole-program bitcode file "httpd.bc" containing functions that the default build system stashes in various application-specific auxiliary libraries (e.g., Apache's libapr and libaprutil).

Our motive is *not* link-time optimization; we're interested in analyzing and modifying the whole-program bitcode in other ways. Once we have created a whole-program bitcode, we want to compile it to native assembly, then pass it thru the native assembler & linker to obtain a native executable whose behavior (except for performance) is identical to that of an executable obtained from the default build system. We do *not* want standard libraries like libc and libpthread to be incorporated as bitcode in the whole-program bitcode; they can be linked in at the final step, after we have converted the whole-program bitcode to native assembly and assembled & linked it.

We have been able to achieve our goal for small programs consisting of a handful of translation units, so we know that our goal is attainable in principle. Problems start when we tackle big programs with complex build systems. We want to find a generic strategy that works with most real world open source C/C++ programs without too much fuss, because we want to use it on at least a dozen different programs. Ideally we want a strategy that works with unmodified default build systems, because eventually we hope to produce a tool that is easy for other developers to use.

Initially we had hoped simply to replace gcc, as, ld, etc. with their LLVM counterparts in the standard build systems, but we haven't been able to make that strategy work. Several different approaches along these lines fail in various ways. Some have recommended the Gold plugin, but it's not clear from the documentation that it does what we want, and we haven't been successful in installing it yet.

Does anyone have experience in constructing whole-program bitcodes that include app-specific libraries for large open-source programs? If you could share the right tricks, that would be very helpful.

Thanks!

-- Terence Kelly, HP Labs

________________________________
From: Vikram S. Adve [mailto:vadve at cs.uiuc.edu]
Sent: Friday, July 24, 2009 8:05 PM
To: Kelly, Terence P (HP Labs Researcher)
Cc: Swarup Sahoo
Subject: Re: building complex software with LLVM

Hi Terence,

...

I also recommend sending any such technical questions about LLVM to llvmdev at cs.uiuc.edu. There are a large number of active (and very helpful) LLVM users on that list. Replies go to the list so you should join the list to see them.

Good luck!

--Vikram
Associate Professor, Computer Science
University of Illinois at Urbana-Champaign
http://llvm.org/~vadve
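The four-stage flow described in the question (per-file bitcode, one merged module, native assembly, native link against the system libraries) can be sketched as a list of command lines. This is only an illustration under stated assumptions: the thread does not say which front end was used, so the `clang` driver is assumed here; the function name `whole_program_commands` and all file names are hypothetical; and a real build would also need each file's original include paths, defines, and possibly code-generation flags.

```python
from pathlib import Path


def whole_program_commands(sources, program, native_libs=()):
    """Build (but do not run) the command lines for a bitcode-first build.

    sources:     C source files (hypothetical names)
    program:     name of the final executable
    native_libs: system libraries that stay native, e.g. ["pthread"]
    """
    commands = []
    bitcode_files = []
    for src in sources:
        bc = str(Path(src).with_suffix(".bc"))
        bitcode_files.append(bc)
        # 1. Compile each translation unit to LLVM bitcode, not a native object.
        #    (Assumes the clang driver; per-file -I/-D flags are omitted.)
        commands.append(["clang", "-c", "-emit-llvm", src, "-o", bc])

    whole_bc = f"{program}.bc"
    asm = f"{program}.s"
    # 2. Merge every bitcode file into a single whole-program module.
    commands.append(["llvm-link", *bitcode_files, "-o", whole_bc])
    # 3. Lower the merged module to native assembly.
    commands.append(["llc", whole_bc, "-o", asm])
    # 4. Assemble and link natively; libc, libpthread, etc. join only here.
    libs = [f"-l{lib}" for lib in native_libs]
    commands.append(["cc", asm, "-o", program, *libs])
    return commands


if __name__ == "__main__":
    for cmd in whole_program_commands(["main.c", "util.c"], "httpd", ["pthread"]):
        print(" ".join(cmd))
```

Any analysis or transformation of the whole program would slot in between steps 2 and 3, operating on the merged `.bc` file. The hard part raised in the question is not this sequence but getting an existing build system to produce the step 1 inputs.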
Hi,

For my PhD work, I have used LLVM to transform whole-program bitcode modules of systems like Quake 3 and Parrot VM. As build system integration is a very complex problem in general, integrating LLVM in medium to large build systems was not straightforward, although I guess things should be easier now with the help of the gold plugin and libLTO.

In short, I was not able to find a fully automated, generic approach to integrate LLVM, as every build system is unique, and often contains subtle mistakes (invoking gcc directly instead of via $CC, ...). Instead, I used a tool-supported, manual approach consisting of the following 3 steps:

1. Visualize and understand the existing build system
2. Plan how my tool fits in
3. Change the makefiles

In step 1, I used my MAKAO tool (http://users.ugent.be/~badams/makao/) to visualize the build dependency graph of a run of the existing build system. This gives an idea about all libraries and executables that are built, how they fit together and which makefile rules are responsible for them.

Based on the information of step 1, I then determined in step 2 which libraries and executables I wanted to transform.

Finally, step 3 involved making system-dependent physical changes to the build system in order to deploy my tools the way I planned to in step 2. Sometimes, this could be done without touching the original makefiles, e.g. by overriding build variables. Often, more invasive changes were needed, such as splitting existing build rules or adding new ones.

From my experience, having a good understanding of the build system at hand (see step 1) is indispensable when doing this kind of build change in large systems. More information can be found in sections 7.3.1, 9.3.1 and 10.3.1 of my PhD (http://users.ugent.be/~badams/publications/2008/PhD.pdf).
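One of the "subtle mistakes" mentioned above, recipes that invoke gcc directly instead of going through $CC, can be spotted with a simple scan before deciding whether overriding build variables will be enough. The sketch below is an illustration only and is unrelated to MAKAO: the function name `find_hardcoded_tools` and the tool list are hypothetical, and it inspects raw makefile text rather than a real build run, so it will miss tools hidden behind variables, shell scripts, or generated makefiles.

```python
# Tools that a recipe should normally reach through $(CC), $(CXX), $(LD), ...
HARDCODED_TOOLS = {"gcc", "g++", "cc", "ld"}


def find_hardcoded_tools(makefile_text):
    """Return (line_number, tool) pairs for recipe lines naming a tool literally."""
    hits = []
    for lineno, line in enumerate(makefile_text.splitlines(), start=1):
        # In make, recipe lines start with a tab; skip rules and assignments.
        if not line.startswith("\t"):
            continue
        tokens = line.split()
        if not tokens:
            continue
        # Drop make's recipe prefixes on the command word:
        # '@' (do not echo), '-' (ignore errors), '+' (always run).
        tokens[0] = tokens[0].lstrip("@-+")
        for token in tokens:
            if token in HARDCODED_TOOLS:
                hits.append((lineno, token))
    return hits


if __name__ == "__main__":
    example = (
        "CC = gcc\n"
        "app: a.o b.o\n"
        "\t$(CC) -o app a.o b.o\n"
        "a.o: a.c\n"
        "\t@gcc -c a.c\n"
    )
    for lineno, tool in find_hardcoded_tools(example):
        print(f"line {lineno}: recipe calls {tool} directly")
```

A rule flagged this way will ignore `make CC=...`, so it is a candidate for the more invasive step 3 changes described above.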

Kind regards,

Bram Adams
SAIL, Queen's University (Canada)
On Saturday 01 August 2009 03:11:57 Kelly, Terence P (HP Labs Researcher) wrote:
> Initially we had hoped simply to replace gcc, as, ld, etc. with their
> LLVM counterparts in the standard build systems, but we haven't been
> able to make that strategy work. Several different approaches along
> these lines fail in various ways. Some have recommended the Gold
> plugin, but it's not clear from the documentation that it does what
> we want, and we haven't been successful in installing it yet.

Could you summarize the failures/issues that you found with this kind of approach? We'll have to do something similar eventually, and up to now I expected that customizing the compiler/linker should work most of the time, perhaps with the help of a custom-written linker-wrapper (using llvmc).

It would be great if you could share howtos / Makefile patches / ... once you can build some of these large applications with the LLVM tools.

Thanks,
Torvald
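The core decision a linker wrapper of the kind mentioned above has to make is which inputs belong in the whole-program bitcode and which flags should be deferred to the final native link. The sketch below shows one way to partition a link command line. It is a hypothetical illustration, not llvmc and not any existing tool: the function name `split_link_args` is invented, only the joined `-lfoo` / `-L/dir` forms are handled, and every other option is silently dropped.

```python
def split_link_args(args):
    """Partition the arguments of a link command.

    Returns (inputs, deferred_flags, output):
      inputs         - object, archive, and bitcode files to merge as bitcode
      deferred_flags - -l/-L flags kept for the final native link
      output         - the name given after -o, or None
    """
    inputs, deferred_flags, output = [], [], None
    it = iter(args)
    for arg in it:
        if arg == "-o":
            # The output name is the next argument.
            output = next(it, None)
        elif arg.startswith(("-l", "-L")):
            # Library references stay native (libc, libpthread, ...).
            deferred_flags.append(arg)
        elif arg.endswith((".o", ".a", ".bc")):
            # Files named explicitly are treated as application code.
            inputs.append(arg)
        # Everything else (-O2, -Wl,..., etc.) is ignored in this sketch.
    return inputs, deferred_flags, output


if __name__ == "__main__":
    link_line = ["-o", "httpd", "main.o", "libapr.a", "-L/usr/lib", "-lpthread"]
    print(split_link_args(link_line))
```

One limitation matters for the use case in this thread: an application-specific library referenced as `-lapr` rather than by file name would be deferred along with the system libraries, so a real wrapper would need a list of which `-l` names are application code.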
> Initially we had hoped simply to replace gcc, as, ld, etc. with their
> LLVM counterparts in the standard build systems, but we haven't been
> able to make that strategy work. Several different approaches along
> these lines fail in various ways. Some have recommended the Gold
> plugin, but it's not clear from the documentation that it does what
> we want, and we haven't been successful in installing it yet.

Right now it will produce a native object file, but it might be possible to hack it to dump an IL file on the side. It is in a good position to do so since the linker passes it all the IL files, including the ones that are inside archives.

> Thanks!
>
> -- Terence Kelly, HP Labs

Cheers,
--
Rafael Avila de Espindola

Google | Gordon House | Barrow Street | Dublin 4 | Ireland
Registered in Dublin, Ireland | Registration Number: 368047