thr3ads.net - llvm dev - [llvm-dev] Incremental compilation and recognizing distinct bitcode [Jul 2016]


David Jones via llvm-dev

2016-Jul-08 21:18 UTC

[llvm-dev] Incremental compilation and recognizing distinct bitcode

For my project, the step of using LLVM to optimize and generate machine
code for a module is much slower than everything else. I realize a
significant performance improvement if I can do "incremental compilation"
and avoid invoking the LLVM code generator if the underlying object has not
changed.

My current strategy is as follows: for each module:
- write bitcode out to "module.bc.new"
- if "module.bc" exists, then compare (byte-by-byte) with "module.bc.new".
  If they match, then skip compilation
- move "module.bc.new" to "module.bc" (known to be different at this point)
- generate "module.o" (expensive step)
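The steps above can be sketched as a small helper. This is only an illustration of the compare-and-replace flow, not code from the project in question: the function name needs_codegen and the idea of passing the bitcode as a bytes object are assumptions made for the example.

```python
import os
import tempfile
from pathlib import Path


def needs_codegen(bc_path: Path, new_bitcode: bytes) -> bool:
    """Store new_bitcode and report whether the expensive codegen step must run.

    Hypothetical helper: returns False only when an existing bc_path is
    byte-for-byte identical to the freshly written "<name>.new" file.
    """
    new_path = bc_path.with_name(bc_path.name + ".new")
    new_path.write_bytes(new_bitcode)

    # Byte-by-byte comparison against the previous run, if there was one.
    if bc_path.exists() and bc_path.read_bytes() == new_path.read_bytes():
        new_path.unlink()          # nothing changed, keep the old module.bc
        return False

    # The files differ (or there was no previous run): promote the new file.
    os.replace(new_path, bc_path)
    return True


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    bc = workdir / "module.bc"
    for payload in (b"first", b"first", b"second"):
        print(payload, "-> regenerate module.o:", needs_codegen(bc, payload))
```

Note that this check is only as good as the determinism of the bitcode writer and of the frontend that builds the module: any run-to-run variation in the bytes forces a rebuild.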

However, I am finding that occasionally I will write out different bitcode
for the same input, which causes gratuitous recompilation.  If I run
llvm-dis on "module.bc" and "module.bc.new" in these cases, the output is
identical, as expected.

Is it expected that the actual bitcode may change from run to run, perhaps
as a result of ASLR?

Is there a better way for me to check that a Module* structure just built
is (not) identical to that from a previous run?

Eli Friedman via llvm-dev

2016-Jul-08 22:39 UTC


On Fri, Jul 8, 2016 at 2:18 PM, David Jones via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> For my project, the step of using LLVM to optimize and generate machine
> code for a module is much slower than everything else. I realize a
> significant performance improvement if I can do "incremental compilation"
> and avoid invoking the LLVM code generator if the underlying object has not
> changed.
>
> My current strategy is as follows: for each module:
> - write bitcode out to "module.bc.new"
> - if "module.bc" exists, then compare (byte-by-byte) with
>   "module.bc.new".  If they match, then skip compilation
> - move "module.bc.new" to "module.bc" (known to be different at this point)
> - generate "module.o" (expensive step)
>
> However, I am finding that occasionally I will write out different bitcode
> for the same input, which causes gratuitous recompilation.  If I run
> llvm-dis on "module.bc" and "module.bc.new" in these cases, the output is
> identical, as expected.
>
> Is it expected that the actual bitcode may change from run to run, perhaps
> as a result of ASLR?
>
If the input is exactly the same, LLVM should generate the same bitcode
from run to run.  That said, we don't have very good testing infrastructure
for this, so it's possible you're tripping over a bug.

Note that llvm-dis by default doesn't print out all the information in a
.bc file; try passing "-preserve-ll-uselistorder=true".

> Is there a better way for me to check that a Module* structure just built
> is (not) identical to that from a previous run?
>
I don't think so.  It's not really a common operation.  (Compilers which
support incremental compilation generally use some other mechanism.)

-Eli

Mehdi Amini via llvm-dev

2016-Jul-08 23:30 UTC


> On Jul 8, 2016, at 2:18 PM, David Jones via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> For my project, the step of using LLVM to optimize and generate machine
> code for a module is much slower than everything else. I realize a
> significant performance improvement if I can do "incremental compilation"
> and avoid invoking the LLVM code generator if the underlying object has not
> changed.
>
> My current strategy is as follows: for each module:
> - write bitcode out to "module.bc.new"
> - if "module.bc" exists, then compare (byte-by-byte) with
>   "module.bc.new".  If they match, then skip compilation
> - move "module.bc.new" to "module.bc" (known to be different at this point)
> - generate "module.o" (expensive step)
>
> However, I am finding that occasionally I will write out different bitcode
> for the same input, which causes gratuitous recompilation.  If I run
> llvm-dis on "module.bc" and "module.bc.new" in these cases, the output is
> identical, as expected.
>
> Is it expected that the actual bitcode may change from run to run, perhaps
> as a result of ASLR?
No. For instance, clang is not expected to generate different bitcode from
one run to the next.
I assume you're using your own frontend to generate the IR? It may not be
deterministic when creating it.

Diffing the output of "llvm-bcanalyzer -dump" may help.
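A minimal sketch of that kind of diff, assuming llvm-bcanalyzer and llvm-dis are on PATH. The helper names dump_bitcode and diff_dumps are made up for this example; only the "-dump" and "-preserve-ll-uselistorder" options come from this thread.

```python
import difflib
import subprocess


def dump_bitcode(path: str, tool: str = "llvm-bcanalyzer") -> list[str]:
    """Return a textual dump of a bitcode file as a list of lines.

    Hypothetical helper; requires the LLVM tools to be installed.
    """
    if tool == "llvm-bcanalyzer":
        cmd = ["llvm-bcanalyzer", "-dump", path]
    else:
        # Ask llvm-dis to also print use-list order, which it omits by default.
        cmd = ["llvm-dis", "-preserve-ll-uselistorder", "-o", "-", path]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.splitlines()


def diff_dumps(old: list[str], new: list[str]) -> list[str]:
    """Unified diff of two dumps; an empty list means no textual difference."""
    return list(difflib.unified_diff(old, new, "module.bc", "module.bc.new",
                                     lineterm=""))


if __name__ == "__main__":
    lines = diff_dumps(dump_bitcode("module.bc"), dump_bitcode("module.bc.new"))
    print("\n".join(lines) or "no differences in the dump")
```

Records that differ between two runs on the same input point at the part of the module that is being built in a different order.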

>
> Is there a better way for me to check that a Module* structure just built
> is (not) identical to that from a previous run?

You may check what we do with ThinLTO (lib/LTO/ThinLTOCodeGenerator) to
perform incremental LTO, i.e. hashing the module content and checking whether
an entry for that hash already exists on disk. Depending on your flow, this
may or may not integrate more cleanly than scripting the comparison.
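The general idea of a hash-keyed on-disk cache can be sketched as follows. This is not the ThinLTOCodeGenerator implementation; the function cached_codegen, the SHA-256 key and the "<hash>.o" cache layout are all assumptions chosen for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def cached_codegen(bitcode: bytes, cache_dir: Path, out_path: Path,
                   codegen: Callable[[bytes, Path], None]) -> bool:
    """Produce out_path from bitcode, reusing a cached object file if present.

    Hypothetical helper. Returns True on a cache hit (codegen was skipped).
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(bitcode).hexdigest()
    entry = cache_dir / f"{key}.o"

    if entry.exists():
        shutil.copyfile(entry, out_path)   # hit: reuse the earlier result
        return True

    codegen(bitcode, out_path)             # miss: run the expensive step
    shutil.copyfile(out_path, entry)       # remember it for later runs
    return False


if __name__ == "__main__":
    def fake_codegen(bc: bytes, out: Path) -> None:
        # Stand-in for the real optimizer and code generator.
        out.write_bytes(b"object code for " + bc)

    work = Path(tempfile.mkdtemp())
    for payload in (b"module A", b"module A", b"module B"):
        hit = cached_codegen(payload, work / "cache", work / "module.o",
                             fake_codegen)
        print(payload, "-> cache hit:", hit)
```

A usable cache key would also have to cover everything else that influences the output, such as the optimization level, target options and the LLVM version. Like the byte-by-byte comparison, hashing only helps if the same input produces the same bitcode on every run.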


— 
Mehdi
