Hi there,

I am working on a project on software control flow checking, which instruments a program to check whether the control flow at runtime matches the control flow graph computed at compile time.

My instrumentation needs control flow information, such as the control flow graph and the dominator/post-dominator trees, so it is best implemented as part of the compiler. On the other hand, I don't want any transformation pass to disturb the added instrumentation code, so the instrumentation has to run after all other transformation passes are complete. Therefore, I'd like to implement it as the last pass before the machine intermediate representation (MIR) is translated to native assembly code.

My instrumentation also needs to take basic block execution frequencies into account, so I have to compile the same program twice. The first compilation adds code to collect execution frequencies. Once the frequencies have been collected, the same program is compiled again to add control flow checking instructions, guided by those frequencies.

Obviously, the profiled program and the program instrumented with control flow checks have to be consistent. At the very least, they must have the same basic blocks and identical control flow graphs.

So my question is this: if I compile the same program twice using Clang, with the same command line, is it guaranteed that the MIRs are identical at the point right before they are converted to native assembly code?

Thank you!

Ming Zhang
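One way to probe the question empirically is to run the same build several times and compare digests of the outputs. The following is only a rough sketch: the helper names (`build_is_reproducible`, `clang_asm_cmd`) are made up for illustration, and it assumes that comparing the final assembly from `clang -O2 -S` is an acceptable proxy for comparing the MIR just before emission. Matching outputs on a few runs are evidence, not a guarantee.

```python
import hashlib
import subprocess
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_is_reproducible(make_cmd, runs: int = 2) -> bool:
    """Run the same build `runs` times and compare the output digests.

    `make_cmd(out_path)` must return the argv list of a command that
    writes its result to `out_path`.
    """
    digests = set()
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(runs):
            out = Path(tmp) / f"out{i}.s"
            subprocess.run(make_cmd(out), check=True)
            digests.add(sha256_of(out))
    # A single distinct digest means every run produced identical bytes.
    return len(digests) == 1


def clang_asm_cmd(source: str):
    """Hypothetical helper: argv factory for `clang -O2 -S` on `source`."""
    return lambda out: ["clang", "-O2", "-S", source, "-o", str(out)]


# Example (assumes clang is on PATH and foo.c exists):
#   build_is_reproducible(clang_asm_cmd("foo.c"), runs=5)
```

Passing the command as a factory keeps the check independent of the compiler, so the same function can be pointed at an `llc` invocation instead.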
Alexandre Isoard via llvm-dev
2017-Jul-16 19:49 UTC
[llvm-dev] Is clang+llvm deterministisc?
Hi Ming Zhang,

If you don't want to rely on Clang reproducibility, you could save the IR into a .bc file. Clang can take a .bc file directly as input. You then:

- instrument a copy of that .bc file and run your counting
- add control flow checking to another copy of the original .bc file, and you have your final binary

As for reproducibility, I think we try to preserve it, but sometimes we lose it; you may have to specify -frandom-seed.

Alexandre Isoard
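The workflow suggested in this reply could be laid out roughly as follows. This is a sketch under assumptions: the function name, the file naming scheme, and the `build` directory are invented for illustration, and the instrumentation step itself is left out because the thread does not define how it would be invoked. The point is only that the front end and IR optimizations run once, and both phases start from byte-identical bitcode.

```python
from pathlib import Path


def plan_shared_bitcode_build(source: str, workdir: str = "build"):
    """Return argv lists for a two-phase build that shares one .bc file.

    The working directory is assumed to exist already.
    """
    work = Path(workdir)
    stem = Path(source).stem
    base = work / f"{stem}.bc"
    profiled = work / f"{stem}.profile.bc"
    checked = work / f"{stem}.checked.bc"
    return [
        # 1. Run the front end and IR-level optimizations exactly once.
        ["clang", "-O2", "-c", "-emit-llvm", source, "-o", str(base)],
        # 2. Both phases start from byte-identical copies of that file.
        ["cp", str(base), str(profiled)],
        ["cp", str(base), str(checked)],
        # 3. Lower each copy to an object file. The counting and checking
        #    instrumentation (hypothetical, not shown) would be applied
        #    to the respective copy before this step.
        ["llc", "-O2", "-filetype=obj", str(profiled),
         "-o", str(profiled.with_suffix(".o"))],
        ["llc", "-O2", "-filetype=obj", str(checked),
         "-o", str(checked.with_suffix(".o"))],
    ]


for argv in plan_shared_bitcode_build("foo.c"):
    print(" ".join(argv))
```

Note that this only pins down the IR; the backend still runs separately on each copy, so it does not by itself settle whether the two MIRs match.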
Thank you for your quick reply!

"If you don't want to rely on Clang reproducibility, you could save the IR into a .bc file. Clang can directly take a .bc file as input."

I'm aware of LLVM bitcode, but I'm not quite sure what it really is, since I haven't found any official documentation that clarifies the concept. According to "Bitcode Demystified" (https://lowlevelbits.org/bitcode-demystified/), LLVM bitcode seems to be just another form of LLVM IR. If that is true, then bitcode still has to go through all the backend passes, e.g., instruction selection, register allocation, etc. I don't think I can work with LLVM bitcode, because I don't want those backend passes to interfere with my instrumentation.

Is it possible to save the intermediate result at the point right before the machine intermediate representation (MIR) is translated into native assembly code, so that a new run of clang/llc can read it and continue compiling correctly? I have read LLVM's official documentation on MIR (http://llvm.org/docs/MIRLangRef.html). It seems that MIR is currently used only for testing and still lacks some important features. I'm afraid that the missing features may lead the compiler to generate incorrect native assembly code.

"For the reproducibility, I think we try to preserve that, but sometime we lose it, you may have to specify -frandom-seed."

The comments in the source code of clang (in clang/lib/Driver/tools.cpp) indicate that the -frandom-seed option is not supported. I noticed that llc has a -rng-seed option, but I haven't found any documentation on it. Could you please tell me more about why clang/llvm needs a random number generator, and about the related command line options?

Thank you!
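For reference, llc has `-stop-after` and `-start-after` options that can dump MIR after a named pass and resume from it. A sketch of the two invocations follows; the function name is made up, and the pass name `machine-cp` is only a placeholder, not a recommendation for where to stop. Whether the serialized MIR carries enough state for the resumed run to produce correct code is exactly the concern raised in this message, so the result would need to be verified rather than trusted.

```python
from pathlib import Path


def mir_round_trip_cmds(ir_file: str, pass_name: str = "machine-cp"):
    """Return (dump, resume) argv lists for an llc MIR round trip.

    `pass_name` is a placeholder; the right stopping point depends on
    the target and on where the instrumentation is meant to run.
    """
    mir = str(Path(ir_file).with_suffix(".mir"))
    asm = str(Path(ir_file).with_suffix(".s"))
    # Run the backend up to and including `pass_name`, then write MIR.
    dump = ["llc", f"-stop-after={pass_name}", ir_file, "-o", mir]
    # Read the MIR back and run only the passes after `pass_name`.
    resume = ["llc", f"-start-after={pass_name}", mir, "-o", asm]
    return dump, resume


dump, resume = mir_round_trip_cmds("foo.ll")
print(" ".join(dump))
print(" ".join(resume))
```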
I searched the source code of LLVM/Clang 4.0.0 for 'random_seed' with grep. It seems that the -frandom-seed option is not supported.

The -rng-seed option appears to be defined in ./lib/Support/RandomNumberGenerator.cpp, which implements class RandomNumberGenerator. The constructor of RandomNumberGenerator is private and is called only by Module::createRNG (defined in lib/IR/Module.cpp). But Module::createRNG does not seem to be called anywhere except by a unit test.

I also tried adding a line to Module::createRNG that prints a message. The modified code compiles without error, but when I run clang and llc to compile a simple C program, the message is not printed. This confirms that Module::createRNG is not called by clang or llc, at least for this program.
2017-07-15 20:22 GMT-07:00 章明 via llvm-dev <llvm-dev at lists.llvm.org>:

> My instrumentation process has to make use of control flow information,
> including as control flow graph and dominator/post-dominator trees, so it
> is better part of the compiler. On the other hand, I don't want any
> transformation pass to mess up the additional instrumentation code,

This isn't totally clear to me: the usual way to design instrumentation is to make it robust to IR transformations. What makes this instrumentation special, such that transforming the instrumented IR would break it?

> so my instrumentation process has to be run after other transformation
> passes are complete. Therefore, I'd like to implement my instrumentation
> process as the last pass before the machine intermediate representation
> (MIR) is translated to native assembly code.
>
> My instrumentation process also needs to take basic block execution
> frequencies into consideration. So I have to compile the same program
> twice. First, the program is compiled, adding code to collect execution
> frequencies. Then, when the execution frequencies have been collected, the
> same program is compiled again to add control flow checking instructions,
> which takes execution frequencies into consideration.

This is exactly what PGO does, if I understand your description correctly.

> Obviously, the program profiled to collect execution frequencies and the
> program instrumented with control flow checking instructions have to be
> consistent. At least, they have to have the same basic blocks and identical
> control flow graphs. So my question is this: If I compile the same program
> twice using Clang, with the same command line, is it guaranteed that, at
> the point right before the MIRs are converted to native assembly code, the
> MIRs are identical?

I believe any non-determinism here is considered a bug in clang/LLVM and should be fixed.

--
Mehdi
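For comparison, the PGO workflow referred to in this reply follows the same compile, run, recompile shape. The sketch below lists the usual instrumentation-based PGO steps from the Clang user's manual; the function name and the file names `app` and `code.profdata` are arbitrary choices for illustration, and it assumes the instrumented binary writes `default.profraw` to the current directory, which is the default when LLVM_PROFILE_FILE is not set.

```python
def pgo_cmds(source: str, exe: str = "app", profdata: str = "code.profdata"):
    """Return argv lists for a two-pass instrumentation-based PGO build."""
    return [
        # Pass 1: build with profiling counters inserted.
        ["clang", "-O2", "-fprofile-instr-generate", source, "-o", exe],
        # Run a representative workload; this writes default.profraw.
        [f"./{exe}"],
        # Convert the raw profile into the indexed format clang reads.
        ["llvm-profdata", "merge", f"-output={profdata}", "default.profraw"],
        # Pass 2: rebuild the same source, guided by the profile.
        ["clang", "-O2", f"-fprofile-instr-use={profdata}", source, "-o", exe],
    ]


for argv in pgo_cmds("foo.c"):
    print(" ".join(argv))
```

This covers collecting execution frequencies only; it does not add control flow checks, which would still need a separate pass.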