Hi Sarah, this is a hopeless task because both signed and unsigned
variables in C can map to the *same* LLVM IR register. Consider the
following example:
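    #include <stdlib.h>  /* for abort() */

    void foo(int x, int y) {
      /* The same two values are compared once as signed, once as unsigned. */
      if ((x < y) || ((unsigned)x < (unsigned)y))
        abort();
    }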
In LLVM IR this becomes:
    define void @foo(i32 %x, i32 %y) {
    entry:
      %0 = icmp slt i32 %x, %y
      %1 = icmp ult i32 %x, %y
      %2 = or i1 %0, %1
      br i1 %2, label %"3", label %return
    "3":
      tail call void @abort() noreturn nounwind
      unreachable
    return:
      ret void
    }
Note how both the signed variable x and the unsigned variable (unsigned)x
have become the same IR register %x. The underlying problem here is that
casts from signed to unsigned or unsigned to signed in C become no-ops in
LLVM IR, so signed and unsigned values that are different in C become the
same register in LLVM IR. Thus it is logically impossible to correctly
assign "signed" or "unsigned" labels to LLVM IR registers.
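To see the no-op cast from the C side, here is a minimal sketch. The helper name same_bits_after_cast is made up for illustration, and it assumes a two's complement target (which is what every mainstream compiler gives you). It just checks that x and (unsigned)x are byte-for-byte identical, which is why the IR has nothing left to distinguish them by:

```c
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Returns 1 when x and (unsigned)x occupy identical bytes in memory.
 * Assumes a two's complement representation of int. */
int same_bits_after_cast(int x) {
    unsigned u = (unsigned)x;  /* the cast that disappears in LLVM IR */
    return memcmp(&x, &u, sizeof x) == 0;
}
```

Calling same_bits_after_cast(-1) should report identical bytes under that assumption, even though C reads one value as -1 and the other as UINT_MAX.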
Instead of trying to fight LLVM's type system, I think you need to
embrace it: accept that only operations have signs, and adapt your
algorithms to work with that.
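As a rough sketch of what "only operations have signs" could look like in an analysis, here is some C that models a sign-less 32-bit register and puts the signedness in the operations instead. All the names (reg32, cmp_slt, cmp_ult, add_overflows_signed, add_overflows_unsigned) are invented for this example and are not LLVM APIs; it also assumes uint32_t is no wider than unsigned int's arithmetic, as on common targets. The idea is that an overflow checker would ask the question per operation (the way icmp slt and icmp ult differ) rather than per value:

```c
#include <stdint.h>

/* A sign-less 32-bit "register", loosely analogous to an LLVM i32.
 * All names in this block are hypothetical. */
typedef uint32_t reg32;

/* Mirrors `icmp slt`: flipping the sign bit maps signed order onto
 * unsigned order, so no signed type is needed. */
int cmp_slt(reg32 a, reg32 b) {
    return (a ^ 0x80000000u) < (b ^ 0x80000000u);
}

/* Mirrors `icmp ult`: plain unsigned comparison of the same bits. */
int cmp_ult(reg32 a, reg32 b) {
    return a < b;
}

/* Unsigned interpretation: the add overflows iff the sum wraps. */
int add_overflows_unsigned(reg32 a, reg32 b) {
    reg32 sum = (reg32)(a + b);
    return sum < a;
}

/* Signed interpretation: the add overflows iff both operands have the
 * same sign bit and the sum's sign bit differs from theirs. */
int add_overflows_signed(reg32 a, reg32 b) {
    reg32 sum = (reg32)(a + b);
    return (int)((((reg32)~(a ^ b)) & (a ^ sum)) >> 31);
}
```

With the register holding 0xFFFFFFFF, adding 1 overflows under the unsigned reading but not under the signed one (-1 + 1), so the answer depends on which operation is asking, not on any label attached to the register.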
Ciao, Duncan.
> I am currently working on a static analysis aimed at integer
> arithmetic overflow/underflow detection. We are attempting to build a
> sound abstract domain (based on Cousot & Cousot-style abstract
> interpretation), but practically speaking this really requires the
> ability to figure out the word size and signedness of values in the
> intermediate representation. I'm well aware that LLVM leverages the
> (usual) equivalence of certain arithmetic operations in two's
> complement form with respect to signedness, but from a program
> analysis point of view it can be very important to know whether, for
> example, 0xFFFF means 65535 or -1 (assuming 16 bits), particularly
> when values are represented by conceptually infinitary abstract
> domains.
>
> There seems to be some support in the head version in the DIType class
> (specifically DIType::isUnsignedDIType()) for extracting this
> information from debug metadata, though this member function is
> missing in 2.9. It is also sometimes possible to infer signedness from
> context, since certain instructions imply it, but I'm finding that
> doing that still leaves many cases unresolved.
>
> What's the best way to go with this?
>
> Thank you in advance,
> Sarah Thompson
> NASA Ames (back doing LLVM stuff again after a while working on
> robotics)