thr3ads.net - llvm dev - [llvm-dev] Is clang+llvm deterministic? [Jul 2017]

章明 via llvm-dev

2017-Jul-17 07:36 UTC

[llvm-dev] Is clang+llvm deterministic?

I grepped the LLVM/Clang 4.0.0 source code for 'random_seed'.
It seems the -frandom-seed option is not supported.

The -rng-seed option appears to be defined in
./lib/Support/RandomNumberGenerator.cpp, which implements class
RandomNumberGenerator. The constructor of RandomNumberGenerator is private
and is called only by Module::createRNG (defined in lib/IR/Module.cpp). But
Module::createRNG does not seem to be called anywhere except in a unit test.

I also tried adding a line to Module::createRNG that prints a message. The
modified code compiles without error. However, when I ran clang and llc on a
simple C program, the message was not printed. This confirms that
Module::createRNG is not called by clang or llc.


-----Original Messages-----
From:"Alexandre Isoard" <alexandre.isoard at gmail.com>
Sent Time:2017-07-17 03:49:48 (Monday)
To: "章明" <editing at zju.edu.cn>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Is clang+llvm deterministic?


Hi Ming Zhang,


If you don't want to rely on Clang reproducibility, you could save the IR
into a .bc file. Clang can directly take a .bc file as input.


You then:
- instrument a copy of that .bc file and run your counting
- add control flow checking to another copy of the original .bc file, and you
have your final binary
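
The steps above can be sketched as a short list of command lines. This is only an illustration in Python, not part of the original suggestion: the file names (`foo.profile.bc`, `foo.check.bc`) and the `-O2` level are assumptions, and the instrumentation step itself is left out because it depends on your own tool. The sketch only builds and prints the commands; it does not run them.

```python
import shlex


def bitcode_workflow(source, stem, opt_level="-O2"):
    """Return the command lines for the save-the-IR-once workflow.

    `stem` is a hypothetical base name for the generated files.
    """
    base = stem + ".bc"
    profile_bc = stem + ".profile.bc"
    check_bc = stem + ".check.bc"
    return [
        # 1. Run the front end and optimizer once; keep the IR as bitcode.
        ["clang", opt_level, "-emit-llvm", "-c", source, "-o", base],
        # 2. Make two copies of the same bitcode.
        ["cp", base, profile_bc],
        ["cp", base, check_bc],
        # 3. Instrument each copy with your own tool here (not shown).
        # 4. Let clang run code generation on each copy.
        ["clang", "-c", profile_bc, "-o", stem + ".profile.o"],
        ["clang", "-c", check_bc, "-o", stem + ".check.o"],
    ]


def render(commands):
    """Format the commands as shell lines for inspection."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

Calling `print(render(bitcode_workflow("foo.c", "foo")))` lists the five commands. Note that this only pins down the IR; the back end still runs separately on each copy, so it does not by itself guarantee identical machine code.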


As for reproducibility, I think we try to preserve it, but sometimes we lose
it; you may have to specify -frandom-seed.


On Sun, Jul 16, 2017 at 4:22 AM, 章明 via llvm-dev <llvm-dev at lists.llvm.org> wrote:


Hi, there,




I am working on a project on software control flow checking, which instruments a
program to check if the control flow at runtime matches the control flow graph
computed at compile-time.




My instrumentation process has to make use of control flow information,
such as the control flow graph and dominator/post-dominator trees, so it is
better implemented as part of the compiler. On the other hand, I don't want any
transformation pass to disturb the added instrumentation code, so my
instrumentation has to run after all other transformation passes are
complete. Therefore, I'd like to implement it as the
last pass before the machine intermediate representation (MIR) is translated to
native assembly code.




My instrumentation process also needs to take basic block execution frequencies
into consideration. So I have to compile the same program twice. First, the
program is compiled, adding code to collect execution frequencies. Then, when
the execution frequencies have been collected, the same program is compiled
again to add control flow checking instructions, which takes execution
frequencies into consideration. Obviously, the program profiled to collect
execution frequencies and the program instrumented with control flow checking
instructions have to be consistent. At least, they have to have the same basic
blocks and identical control flow graphs. So my question is this: If I compile
the same program twice using Clang, with the same command line, is it guaranteed
that, at the point right before the MIRs are converted to native assembly code,
the MIRs are identical?
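
One empirical way to probe this question is to compile the same file several times and compare the results. The sketch below is an illustration only and proves nothing in general: it compares the assembly output of `clang -S` as a stand-in for the MIR, and the `-O2` level and the helper names are assumptions. Identical output on a few runs is evidence, not a guarantee.

```python
import subprocess
import tempfile
from pathlib import Path


def first_difference(a, b):
    """Return the 1-based number of the first differing line, or None."""
    lines_a, lines_b = a.splitlines(), b.splitlines()
    for number, (x, y) in enumerate(zip(lines_a, lines_b), start=1):
        if x != y:
            return number
    if len(lines_a) != len(lines_b):
        # One text is a strict prefix of the other.
        return min(len(lines_a), len(lines_b)) + 1
    return None


def compile_to_asm(source, out):
    """Compile `source` to assembly with a fixed command line."""
    subprocess.run(
        ["clang", "-O2", "-S", str(source), "-o", str(out)], check=True
    )


def looks_reproducible(source, runs=2):
    """Compile `runs` times and report whether all outputs are identical."""
    with tempfile.TemporaryDirectory() as tmp:
        texts = []
        for i in range(runs):
            out = Path(tmp) / "run{}.s".format(i)
            compile_to_asm(source, out)
            texts.append(out.read_text())
    return all(first_difference(texts[0], t) is None for t in texts[1:])
```

For example, `looks_reproducible("foo.c", runs=5)` returns True when all five assembly files match line for line; `first_difference` can then be used on a failing pair to locate where the outputs diverge.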




Thank you!




Ming Zhang


_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev







--

Alexandre Isoard

Stephen Crane via llvm-dev

2017-Jul-19 20:19 UTC

[llvm-dev] Is clang+llvm deterministic?

That RNG is currently not used. There are some old stalled patches
that use it, but they haven't been committed. These patches
specifically use that RNG for intentionally randomizing compiler
output.

I don't know of other major problems for reproducible control flow,
but I'm not an expert. I guess there could always be weird edge cases
like unstable iteration of hash tables of pointers?
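
The hash-table edge case can be illustrated outside of LLVM. The following is a minimal sketch in Python, not LLVM code; the `Block` class and its `number` field are made up for illustration. In CPython, the default hash of an object is derived from its address, so the iteration order of a set of such objects can change from run to run, much like iterating a hash table keyed by pointers. Sorting by a stable identifier before iterating removes the dependence on addresses.

```python
class Block:
    """Hypothetical stand-in for a basic block, hashed by identity."""

    def __init__(self, number, name):
        self.number = number  # stable ID assigned in creation order
        self.name = name


def address_order(blocks):
    """Iterate a set directly: the order depends on object addresses."""
    return [b.name for b in set(blocks)]


def stable_order(blocks):
    """Iterate in order of the stable ID: the same on every run."""
    return [b.name for b in sorted(set(blocks), key=lambda b: b.number)]
```

Both functions visit the same blocks, but only `stable_order` is guaranteed to visit them in the same sequence every time. As far as I know, LLVM's own insertion-ordered containers such as MapVector and SetVector exist for the same reason.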

- stephen


章明 via llvm-dev

2017-Jul-20 08:17 UTC

[llvm-dev] Is clang+llvm deterministic?

Thank you for clarifying the status of the RNG feature!

What worries me is possible non-determinism in code generation in the latest
release of LLVM/Clang.
It seems that I'll have to rely on the native assembly output of LLVM to provide
a consistent view of the control flow graph.
I may also have to dump the dominator trees and loop information produced by LLVM
so that my instrumentation process can use them.
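
As a cross-check on dumped dominator information, the dominator tree can also be recomputed from the control flow graph itself. The sketch below is only an illustration under stated assumptions: the CFG is assumed to be already extracted into a dictionary from block label to successor labels, the labels and the JSON dump are hypothetical and not an LLVM format, and the simple iterative set-based algorithm is used rather than whatever LLVM implements.

```python
import json


def reachable(cfg, entry):
    """Return the set of blocks reachable from `entry`."""
    seen, stack = set(), [entry]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(cfg.get(node, []))
    return seen


def dominators(cfg, entry):
    """Map each reachable block to the set of blocks that dominate it."""
    nodes = sorted(reachable(cfg, entry))
    preds = {n: [] for n in nodes}
    for n in nodes:
        for succ in cfg.get(n, []):
            preds[succ].append(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def immediate_dominators(cfg, entry):
    """Map each non-entry block to its immediate dominator."""
    dom = dominators(cfg, entry)
    # The strict dominators of a block form a chain; the closest one
    # is the one with the largest dominator set of its own.
    return {
        n: max(ds - {n}, key=lambda d: len(dom[d]))
        for n, ds in dom.items()
        if n != entry
    }


def dump(idom):
    """Serialize with sorted keys so the dump is byte-for-byte stable."""
    return json.dumps(idom, sort_keys=True)
```

For a diamond-shaped graph `{"entry": ["a", "b"], "a": ["exit"], "b": ["exit"], "exit": []}`, every other block has `entry` as its immediate dominator. Sorting the keys in `dump` keeps the dump itself from becoming another source of run-to-run differences.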

