thr3ads.net - llvm dev - [LLVMdev] GEP instructions: is it possible to reverse-engineer array accesses? [Oct 2011]

Gabriel Rodríguez

2011-Oct-18 17:00 UTC

[LLVMdev] GEP instructions: is it possible to reverse-engineer array accesses?

Dear All, 


As of late I am having a hard time getting my head around how array accesses
are translated by Clang into LLVM IR: the often misunderstood GEP instruction.
I am trying to reverse-engineer array accesses to discover the number of
dimensions and actual indexes of the original, and I am beginning to wonder
whether this is possible at all. To illustrate (some of) my troubles, consider
the following code and the LLVM IR for both 32 and 64 bit memory addresses:


-- 
original C: 



#define N 1000

int main(int argc, char **argv)
{
  int i, k;
  float aux, A[N][N];

  aux = A[k][i];
}


-- 
32-bit addresses LLVM IR (relevant part): 



%4 = load i32* %i, align 4 
%5 = load i32* %k, align 4 
%6 = getelementptr inbounds [1000 x [1000 x float]]* %A, i32 0, i32 %5 
%7 = getelementptr inbounds [1000 x float]* %6, i32 0, i32 %4 
%8 = load float* %7 
store float %8, float* %aux, align 4 


-- 
64-bit addresses LLVM IR (relevant part): 



%4 = load i32* %i, align 4 
%5 = load i32* %k, align 4 
%6 = getelementptr inbounds [1000 x [1000 x float]]* %A, i32 0, i32 0 
%7 = sext i32 %5 to i64 
%8 = getelementptr inbounds [1000 x float]* %6, i64 %7 
%9 = load float* %8 
store float %9, float* %aux, align 4 


-- 




Why does the 64-bit addresses version use two leading 0s instead of one? I have
tried reading http://llvm.org/docs/GetElementPtr.html and I don't think the
explanation provided is accurate, or at least I can't see how to apply it to
this particular case.


Besides, there is an incredible diversity of ways in which arrays can be
represented and accessed in C code, leading to my final question: is it really
possible to reverse-engineer array accesses? If so, any insights?




Thanks in advance, and best regards, 
Gabriel 

Duncan Sands

2011-Oct-18 17:24 UTC


Hi Gabriel, I suggest you don't bother with testcases like this that are doing
undefined things.  For example, neither i nor k is initialized, so the result
of accessing the array is undefined.  Thus the frontend can (and apparently
does) produce any strange thing it likes.  What is more, the result aux is
unused, so there is no obligation to compute it correctly.  I think you will
get more understandable results with a more sensible testcase.

Ciao, Duncan.
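
A more sensible testcase along the lines Duncan describes might look like the
sketch below. The function name read_elem and the use of a global array are
assumptions made for illustration; they do not come from the original post.
The indices arrive as function arguments, so they are defined values, and the
loaded element is returned, so the access has to be computed.

```c
#define N 1000

/* Global storage, so the roughly 4 MB array does not live on the stack. */
float A[N][N];

/* Hypothetical testcase: k and i are arguments (defined, but unknown to the
   compiler), and the loaded element is the return value, so the load cannot
   be dropped as dead code. */
float read_elem(int k, int i)
{
  return A[k][i];
}
```

Compiling a file like this to LLVM IR for both targets should give a cleaner
basis for comparing the GEPs than the original main(), though the exact IR
will still depend on the clang version and optimization level.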

Dan Gohman

2011-Oct-18 18:13 UTC


On Oct 18, 2011, at 10:00 AM, Gabriel Rodríguez wrote:
> Dear All,
>
> As of late I am having a hard time getting my head around how array accesses
> are translated by Clang into LLVM IR: the often misunderstood GEP instruction.
> I am trying to reverse-engineer array accesses to discover the number of
> dimensions and actual indexes of the original, and I am beginning to wonder
> whether this is possible at all. To illustrate (some of) my troubles, consider
> the following code and the LLVM IR for both 32 and 64 bit memory addresses:
>
> --
> original C:
>
> #define N 1000
>
> int main(int argc, char **argv)
> {
>   int i, k;
>   float aux, A[N][N];

See "How does VLA addressing work with GEPs?" in the GEP FAQ.  In short, you
have to reverse-engineer, and the ScalarEvolution library can help you with
that.

Also, one thing not mentioned in the FAQ is that if you want to assume that the
dimensions are independent (in other words, that the inner dimension is never
over-indexed), you have to prove it for yourself. Even though over-indexing may
be prohibited at the source level, it's valid at the LLVM IR level, and some
LLVM optimizations do use it. ScalarEvolution can help with this as well,
though it doesn't do everything.

Dan
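
To make the over-indexing point concrete, here is a minimal sketch in C of the
arithmetic involved. The helper names linear_index and delinearize are
hypothetical, and this is plain integer arithmetic, not ScalarEvolution. For
A[k][i] with rows of N floats, the two GEPs in the 32-bit IR add up to an
element offset of k*N + i, and splitting that offset back into (k, i) is
unambiguous only if 0 <= i < N is known to hold.

```c
#define N 1000  /* inner dimension, as in the original testcase */

/* Element offset reached by A[k][i]: k whole rows of N floats, then i floats.
   Multiply by sizeof(float) to get the byte offset from the start of A. */
long linear_index(long k, long i)
{
  return k * N + i;
}

/* Naive delinearization of a non-negative element offset into (row, column).
   The result matches the source-level indices only if the original access
   satisfied 0 <= i < N; that assumption has to be proven separately. */
void delinearize(long offset, long *k, long *i)
{
  *k = offset / N;
  *i = offset % N;
}
```

For example, linear_index(2, 5) and linear_index(1, N + 5) both give 2005, and
delinearize(2005, ...) yields (2, 5) in either case. An over-indexed access
such as (1, 1005) is therefore indistinguishable from (2, 5) once it has been
flattened, which is why the independence of the dimensions cannot simply be
assumed.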


Gabriel Rodríguez

2011-Oct-18 19:08 UTC


Thank you both for your answers.

Actually, as a result of Duncan's comment I've been investigating how this code
comes to be generated. It seems to happen only with one particular version of
clang. I can't reproduce it on different machines cross-compiling for the
original target (x86_64-apple-darwin10.0.0), so I will assume I don't need to
consider this particular case.

I had read that part about ScalarEvolution helping with reverse engineering,
but haven't looked at it in any depth yet. I will eventually come to that, I
guess. Thanks for the tip.


Best, 
Gabriel 


