David Jones via llvm-dev
2016-Jul-08 21:18 UTC
[llvm-dev] Incremental compilation and recognizing distinct bitcode
For my project, the step of using LLVM to optimize and generate machine code for a module is much slower than everything else. I realize a significant performance improvement if I can do "incremental compilation" and avoid invoking the LLVM code generator if the underlying object has not changed.

My current strategy is as follows: for each module:
- write bitcode out to "module.bc.new"
- if "module.bc" exists, then compare (byte-by-byte) with "module.bc.new". If they match, then skip compilation
- move "module.bc.new" to "module.bc" (known to be different at this point)
- generate "module.o" (expensive step)

However, I am finding that occasionally I will write out different bitcode for the same input, which causes gratuitous recompilation. If I run llvm-dis on "module.bc" and "module.bc.new" in these cases, the output is identical, as expected.

Is it expected that the actual bitcode may change from run to run, perhaps as a result of ASLR?

Is there a better way for me to check that a Module* structure just built is (not) identical to that from a previous run?
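The four-step strategy described above can be sketched roughly as follows. This is an illustration only, not the poster's actual code: the function name `rebuild_if_changed` and the `compile_fn` callback (standing in for the expensive LLVM code generation step) are hypothetical, and the sketch deletes the ".new" file on a match, which the original description does not specify.

```python
import filecmp
import os
from typing import Callable


def rebuild_if_changed(new_bitcode: bytes, bc_path: str,
                       compile_fn: Callable[[str], None]) -> bool:
    """Write new bitcode, compare it with the previous copy, and recompile
    only when the bytes differ. Returns True if compile_fn was invoked.

    compile_fn is a hypothetical stand-in for "generate module.o".
    """
    new_path = bc_path + ".new"

    # Step 1: write bitcode out to "module.bc.new".
    with open(new_path, "wb") as f:
        f.write(new_bitcode)

    # Step 2: byte-by-byte comparison against the previous "module.bc".
    # shallow=False forces a content comparison instead of a stat() check.
    if os.path.exists(bc_path) and filecmp.cmp(bc_path, new_path, shallow=False):
        os.remove(new_path)  # assumption: discard the redundant copy
        return False         # identical bitcode, skip compilation

    # Step 3: move "module.bc.new" over "module.bc" (known to differ here).
    os.replace(new_path, bc_path)

    # Step 4: the expensive step.
    compile_fn(bc_path)
    return True
```

Note that this scheme is only as good as the determinism of the bitcode writer's input: if two runs emit different bytes for the same source, the comparison fails and the module is recompiled needlessly, which is exactly the symptom reported here.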
Eli Friedman via llvm-dev
2016-Jul-08 22:39 UTC
[llvm-dev] Incremental compilation and recognizing distinct bitcode
On Fri, Jul 8, 2016 at 2:18 PM, David Jones via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> For my project, the step of using LLVM to optimize and generate machine
> code for a module is much slower than everything else. I realize a
> significant performance improvement if I can do "incremental compilation"
> and avoid invoking the LLVM code generator if the underlying object has not
> changed.
>
> My current strategy is as follows: for each module:
> - write bitcode out to "module.bc.new"
> - if "module.bc" exists, then compare (byte-by-byte) with
> "module.bc.new". If they match, then skip compilation
> - move "module.bc.new" to "module.bc" (known to be different at this point)
> - generate "module.o" (expensive step)
>
> However, I am finding that occasionally I will write out different bitcode
> for the same input, which causes gratuitous recompilation. If I run
> llvm-dis on "module.bc" and "module.bc.new" in these cases, the output is
> identical, as expected.
>
> Is it expected that the actual bitcode may change from run to run, perhaps
> as a result of ASLR?

If the input is exactly the same, LLVM should generate the same bitcode from run to run. That said, we don't have very good testing infrastructure for this, so it's possible you're tripping over a bug.

Note that llvm-dis by default doesn't print out all the information in a .bc file; try passing "-preserve-ll-uselistorder=true".

> Is there a better way for me to check that a Module* structure just built
> is (not) identical to that from a previous run?

I don't think so. It's not really a common operation. (Compilers which support incremental compilation generally use some other mechanism.)

-Eli
Mehdi Amini via llvm-dev
2016-Jul-08 23:30 UTC
[llvm-dev] Incremental compilation and recognizing distinct bitcode
> On Jul 8, 2016, at 2:18 PM, David Jones via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> For my project, the step of using LLVM to optimize and generate machine code for a module is much slower than everything else. I realize a significant performance improvement if I can do "incremental compilation" and avoid invoking the LLVM code generator if the underlying object has not changed.
>
> My current strategy is as follows: for each module:
> - write bitcode out to "module.bc.new"
> - if "module.bc" exists, then compare (byte-by-byte) with "module.bc.new". If they match, then skip compilation
> - move "module.bc.new" to "module.bc" (known to be different at this point)
> - generate "module.o" (expensive step)
>
> However, I am finding that occasionally I will write out different bitcode for the same input, which causes gratuitous recompilation. If I run llvm-dis on "module.bc" and "module.bc.new" in these cases, the output is identical, as expected.
>
> Is it expected that the actual bitcode may change from run to run, perhaps as a result of ASLR?

No. For instance, clang is not expected to generate different bitcode from run to run. I assume you’re using your own frontend to generate the IR? It may not be deterministic when creating it. Diffing the output of "llvm-bcanalyzer -dump" may help.

> Is there a better way for me to check that a Module* structure just built is (not) identical to that from a previous run?

You may check what we do with ThinLTO (lib/LTO/ThinLTOCodeGenerator) to perform incremental LTO, i.e. hashing the module content and checking on disk whether an entry for that hash already exists. This may or may not fit into your flow more nicely than scripting, for instance.

-Mehdi
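The general idea of "hash the module content and check on disk if it exists" can be sketched as a content-addressed cache. This is a minimal illustration of the technique, not the ThinLTO implementation: the function name `cached_object`, the SHA-256 key, the cache layout, and the `compile_fn` callback (standing in for LLVM code generation) are all assumptions made for the example.

```python
import hashlib
import os
from typing import Callable


def cached_object(bitcode: bytes, cache_dir: str,
                  compile_fn: Callable[[bytes], bytes]) -> str:
    """Return the path of the object file for this bitcode, compiling only
    when no cache entry exists for the bitcode's hash.

    compile_fn is a hypothetical stand-in that maps bitcode bytes to
    object-file bytes.
    """
    # The cache key is derived purely from the serialized module content.
    key = hashlib.sha256(bitcode).hexdigest()
    obj_path = os.path.join(cache_dir, key + ".o")

    if not os.path.exists(obj_path):
        os.makedirs(cache_dir, exist_ok=True)
        # Write to a temporary name first so an interrupted compile never
        # leaves a truncated file under the final cache key.
        tmp_path = obj_path + ".tmp"
        with open(tmp_path, "wb") as f:
            f.write(compile_fn(bitcode))
        os.replace(tmp_path, obj_path)

    return obj_path
```

Compared with keeping a single previous "module.bc", a hash-keyed cache also gets a hit when a module returns to an earlier state. It still relies on the frontend emitting byte-identical bitcode for identical input, so the nondeterminism discussed in this thread would have to be fixed either way.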