thr3ads.net - llvm dev - [llvm-dev] Problem of array index manipulation collection of LLVM IR [Jul 2016]


Qingkun Meng via llvm-dev

2016-Jul-21 12:07 UTC

[llvm-dev] Fwd: Problem of array index manipulation collection of LLVM IR

Hi there,

I am new to LLVM, and here is my situation. Assume that there is a function F
that contains a loop named L and an array b[100]. I want to collect
statistics on the array index operations op(i) (say, simply add and mul)
applied to i in the loop L. Pseudocode is listed below.

void F(arg1, arg2){
    int b[100];
    for(int i=0; i<n; i++){
        op1(i);
        op2(i);
        ......
        b[op1(i)]=n1;
        b[op2(i)]=n2;    // n1 and n2 are just common constants
    }
}

After the code fragment is compiled to LLVM IR, I want to count how many
operations (like add and mul) are applied to i. However, these operations are
not easy to extract, because many temporary variables obscure the trace of the
variable. Does anyone have ideas on how to solve this, or know of an open
source project that does this job?

Thank you very much!
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160721/0cfb2ead/attachment.html>

Mehdi Amini via llvm-dev

2016-Jul-21 22:38 UTC


[llvm-dev] Problem of array index manipulation collection of LLVM IR

> On Jul 21, 2016, at 5:07 AM, Qingkun Meng via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> The code fragment is compiled to LLVM IR, I want to collect how many times
> are operations (like add and mul) put on i. However the operations are not
> easily obtained because there are many temp variables mix the variable trace.
> Does anyone have ideas to solve this or some open source project do this job?

In short: there is no reliable way in absolute terms. The optimizer will make
transformations that completely lose any relationship with the source code.
Also, if you are interested in what actually gets *executed*, some of these
computations will be folded into the addressing mode, depending on the
architecture.

Some people do these kinds of analyses using debug info to map back to the
source code; it may be enough if you don't need precise results, or results
that are accurate with respect to the final optimized binary instruction stream.

— 
Mehdi

Mehdi Amini via llvm-dev

2016-Jul-22 03:48 UTC


[llvm-dev] Problem of array index manipulation collection of LLVM IR

> On Jul 21, 2016, at 8:28 PM, Qingkun Meng <mengqingkun1988 at gmail.com> wrote:
> 
> > if you are interested about what gets actually *executed*, some of these
> > computation will be folded in the addressing mode depending on the
> > architecture
> 
> If I just want to collect array index manipulation lexically, is there any
> reliable solution?

It depends what you expect exactly. What would be the ideal output for you on
the example you provided before?
Also what is the use-case? (I.e. *why* do you want this information).

> 
> By noting this
> > Some people are doing these kind of analyses using debug info to map
> > back to the source code
> do you mean reversing to source code from LLVM IR? Is there any open source
> project? I am very appreciated you could refer it to me.

I meant the debug information that clang generates with -g.
For instance, try a simple example:

$ cat test.c
int foo(int a, int b) {
  return a + b;
}

And look at the difference in the output when compiled with -g or not (i.e.
`clang -emit-llvm -S test.c -O3 -o -` and `clang -emit-llvm -S test.c -O3 -o - -g`).
In the first you’ll get something like:

define i32 @foo(i32, i32) #0 {
  %3 = add nsw i32 %1, %0
  ret i32 %3
}

while in the second case it will look like this (stripped to keep only the
relevant information):

define i32 @foo(i32, i32) #0 !dbg !7 {
  tail call void @llvm.dbg.value(metadata i32 %0, i64 0, metadata !12, metadata !14), !dbg !15
  tail call void @llvm.dbg.value(metadata i32 %1, i64 0, metadata !13, metadata !14), !dbg !16
  %3 = add nsw i32 %1, %0, !dbg !17
  ret i32 %3, !dbg !18
}
[…]
!1 = !DIFile(filename: "test.c", directory: "…")
[…]
!7 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !8, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: true, unit: !0, variables: !11)
[…]
!12 = !DILocalVariable(name: "a", arg: 1, scope: !7, file: !1, line: 1, type: !10)
!13 = !DILocalVariable(name: "b", arg: 2, scope: !7, file: !1, line: 1, type: !10)
!14 = !DIExpression()
!15 = !DILocation(line: 1, column: 13, scope: !7)
!16 = !DILocation(line: 1, column: 20, scope: !7)
!17 = !DILocation(line: 2, column: 12, scope: !7)
!18 = !DILocation(line: 2, column: 3, scope: !7)


Now from there you can analyze the IR and see that there is an addition of two
values (%0 and %1), and the calls to llvm.dbg.value point you to some
information about these variables (name, type, source location). The !dbg !17
attachment on the add itself maps it to line 2, column 12 of test.c (the `+` in
`a + b`).

— 
Mehdi






Qingkun Meng via llvm-dev

2016-Jul-22 09:38 UTC


[llvm-dev] Problem of array index manipulation collection of LLVM IR

> It depends what you expect exactly. What would be the ideal output for you on
> the example you provided before?
> Also what is the use-case? (I.e. *why* do you want this information).

I want to collect the frequency of array index manipulations in a loop of a
function. I recently read a paper named "Dowsing for Overflows: A
guided Fuzzer to Find Buffer Boundary Violations". It says that array index
manipulations are related to buffer boundary violations, so I want to implement
its approach, since I can't get the source code from the authors. The paper
analyzes LLVM bitcode, which is why I posted this problem. Is there any
solution?

