thr3ads.net - llvm dev - [LLVMdev] Signed/unsigned value type resolution [Nov 2011]

Sarah Thompson

2011-Nov-01 22:23 UTC

[LLVMdev] Signed/unsigned value type resolution

Hi all,

I am currently working on a static analysis aimed at detecting integer
arithmetic overflow/underflow. We are attempting to build a
sound abstract domain (based on Cousot & Cousot-style abstract
interpretation), but practically speaking this really requires the
ability to figure out the word size and signedness of values in the
intermediate representation. I'm well aware that LLVM leverages the
(usual) equivalence of certain arithmetic operations in two's
complement form with respect to signedness, but from a program
analysis point of view it can be very important to know whether, for
example, 0xFFFF means 65535 or -1 (assuming 16 bits), particularly
when values are represented by conceptually infinitary abstract
domains.
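
To make the ambiguity concrete, here is a minimal C sketch of the two
readings of one 16-bit pattern. The helper names as_unsigned16 and
as_signed16 are made up for illustration and are not part of LLVM.

```c
#include <stdint.h>

/* Read a 16-bit pattern as an unsigned integer (0 .. 65535). */
long as_unsigned16(uint16_t bits) {
    return (long)bits;
}

/* Read the same 16-bit pattern as a two's complement signed integer
 * (-32768 .. 32767). Subtracting 2^16 by hand when the sign bit is set
 * avoids relying on an implementation-defined narrowing conversion. */
long as_signed16(uint16_t bits) {
    return (bits & 0x8000u) ? (long)bits - 65536L : (long)bits;
}
```

With all 16 bits set, the first helper yields 65535 and the second yields
-1, so an abstract domain over unbounded integers has to pick one reading
before it can assign an interval to the value.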

There seems to be some support in the head version in the DIType class  
(specifically DIType::isUnsignedDIType()) for extracting this  
information from debug metadata, though this member function is  
missing in 2.9. It is also sometimes possible to infer signedness from  
context, since certain instructions imply it, but I'm finding that  
doing that still leaves many cases unresolved.
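
As a rough sketch of the contextual inference mentioned above, the table
below sorts a few LLVM instruction mnemonics by whether they treat an
integer operand or result as signed or unsigned. The SignHint type and
the sign_hint_for_opcode function are hypothetical names for this
example, and the lists are not claimed to be complete.

```c
#include <stddef.h>
#include <string.h>

typedef enum { SIGN_UNKNOWN, SIGN_SIGNED, SIGN_UNSIGNED } SignHint;

/* Return the signedness that an instruction mnemonic implies, if any. */
SignHint sign_hint_for_opcode(const char *mnemonic) {
    /* Instructions that interpret integer bits as signed. */
    static const char *const signed_ops[] = {
        "sdiv", "srem", "ashr", "sext", "sitofp", "fptosi"
    };
    /* Instructions that interpret integer bits as unsigned. */
    static const char *const unsigned_ops[] = {
        "udiv", "urem", "lshr", "zext", "uitofp", "fptoui"
    };
    size_t i;

    for (i = 0; i < sizeof signed_ops / sizeof signed_ops[0]; ++i)
        if (strcmp(mnemonic, signed_ops[i]) == 0)
            return SIGN_SIGNED;
    for (i = 0; i < sizeof unsigned_ops / sizeof unsigned_ops[0]; ++i)
        if (strcmp(mnemonic, unsigned_ops[i]) == 0)
            return SIGN_UNSIGNED;

    /* add, sub, mul, and, or, xor, shl and friends give no hint:
     * they compute the same bits under either reading. */
    return SIGN_UNKNOWN;
}
```

Values that only ever flow through instructions in the last group are
the unresolved cases.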

What's the best way to go with this?

Thank you in advance,
Sarah Thompson
NASA Ames (back doing LLVM stuff again after a while working on  
robotics)

Duncan Sands

2011-Nov-02 08:09 UTC

Hi Sarah, this is a hopeless task because both signed and unsigned
variables in C can map to the *same* LLVM IR register.  Consider the
following example:

#include <stdlib.h>  /* for abort() */

void foo(int x, int y) {
   if ((x < y) || ((unsigned)x < (unsigned)y))
     abort();
}

In LLVM IR this becomes:

define void @foo(i32 %x, i32 %y) {
entry:
   %0 = icmp slt i32 %x, %y
   %1 = icmp ult i32 %x, %y
   %2 = or i1 %0, %1
   br i1 %2, label %"3", label %return

"3":
   tail call void @abort() noreturn nounwind
   unreachable

return:
   ret void
}

Note how both the signed variable x and the unsigned variable (unsigned)x
have become the same IR register %x.  The underlying problem here is that
casts from signed to unsigned or unsigned to signed in C become no-ops in
LLVM IR, so signed and unsigned values that are different in C become the
same register in LLVM IR.  Thus it is logically impossible to correctly
assign "signed" or "unsigned" labels to LLVM IR registers.
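
To illustrate, here is a small C sketch that models the two comparisons
on raw 32-bit patterns, the way the two icmp instructions see %x and %y.
The function names less_signed and less_unsigned are invented for this
example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models "icmp ult": compare the bit patterns as unsigned integers. */
bool less_unsigned(uint32_t x, uint32_t y) {
    return x < y;
}

/* Models "icmp slt": compare the same bit patterns as two's complement
 * signed integers. Flipping the sign bit of both operands maps signed
 * order onto unsigned order. */
bool less_signed(uint32_t x, uint32_t y) {
    return (x ^ 0x80000000u) < (y ^ 0x80000000u);
}
```

For x = 0xFFFFFFFF and y = 1 the signed comparison is true (-1 < 1) and
the unsigned one is false (4294967295 < 1 fails), although both read
exactly the same registers.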

Instead of trying to fight LLVM's type system, I think you need to
embrace it: accept that only operations have signs, and adapt your
algorithms to work with that.
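
One way to read that advice for overflow detection, as a sketch only:
keep values as plain bit patterns and attach the signed or unsigned
interpretation to each check. The function names below are hypothetical
and the code assumes 32-bit two's complement values.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a + b wraps when both operands are read as unsigned. */
bool add_overflows_unsigned(uint32_t a, uint32_t b) {
    uint32_t sum = (uint32_t)(a + b);
    return sum < a;
}

/* True if a + b overflows when both operands are read as signed.
 * Overflow happens exactly when the operands share a sign bit and the
 * sign bit of the wrapped sum differs from it. */
bool add_overflows_signed(uint32_t a, uint32_t b) {
    uint32_t sum = (uint32_t)(a + b);
    return (~(a ^ b) & (a ^ sum) & 0x80000000u) != 0;
}
```

The same pair of operands can overflow under one reading and not the
other: 0x7FFFFFFF + 1 overflows only as a signed addition, while
0xFFFFFFFF + 1 overflows only as an unsigned one.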

Ciao, Duncan.
