Hello,

I was wondering about the case below. I tried to find any information in the C standard, but I found nothing.
In this case, variable "i" is uninitialized, but it is the _same_ value passed as an argument in both calls, so only one of "a" or "b" should be printed.

What I found is that with -O2:
  LLVM (trunk) prints both "a" and "b"
  GCC (4.2) prints both "a" and "b"
  GCC (trunk) prints "b" only.

As I said, I don't know what the standard says here.

  #include <stdio.h>

  void f(int i) __attribute__((noinline));
  void g(int i) __attribute__((noinline));

  void f(int i) {
    if (i) printf("a\n");
  }

  void g(int i) {
    if (!i) printf("b\n");
  }

  int main() {
    int i;
    f(i);
    g(i);
  }

- Kuba
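For contrast, here is a minimal sketch of the same two functions called with a value that has actually been stored. In that case both calls see the same value and exactly one line is printed. The `int` return values and the `lines_printed` helper are additions for illustration only and are not part of the original program; the `noinline` attributes are left out because they do not matter once the argument is initialized.

```c
#include <stdio.h>

/* Prints "a" and returns 1 when i is non-zero; otherwise returns 0. */
int f(int i) {
    if (i) {
        printf("a\n");
        return 1;
    }
    return 0;
}

/* Prints "b" and returns 1 when i is zero; otherwise returns 0. */
int g(int i) {
    if (!i) {
        printf("b\n");
        return 1;
    }
    return 0;
}

/* Hypothetical helper: passes one initialized value to both functions
   and reports how many of them printed a line. */
int lines_printed(int i) {
    return f(i) + g(i);
}
```

Calling `lines_printed(0)` prints only "b", and calling it with any non-zero value prints only "a". The question in this thread is whether the same guarantee survives when nothing was ever stored in `i`.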
Hi Jakub,

> I was wondering about the case below. I tried to find any information in C standard, but I found nothing.
> In this case, variable "i" is uninitialized, but it is the _same_ value passed as an argument, so only one of "a" or "b" should be printed.

I'm no C expert, but my understanding is that making any use of an uninitialized variable results in totally undefined behaviour, where "totally" means anything goes (e.g. erasing the contents of your hard drive). From this point of view you are lucky it only printed some funny output rather than sending a killer sound pulse from your speakers into your brain (though trying to wrap your head around C semantics may produce a similar brain-curdling result!).

The situation in other languages is quite different. For example, using an uninitialized variable in Ada can result in funny effects, but the language standard carefully delimits just how far they can go, and it's not that far. As LLVM's "undef" is modelled on C's extreme behaviour, getting correct Ada semantics in LLVM IR is rather tricky, and in fact I haven't solved it yet.

Ciao, Duncan.

>
> What I found is that with -O2:
> LLVM (trunk) prints both "a" and "b"
> GCC (4.2) prints both "a" and "b"
> GCC (trunk) prints "b" only.
>
> As I said, I don't know what standard says here.
>
>
> #include <stdio.h>
>
> void f(int i) __attribute__((noinline));
> void g(int i) __attribute__((noinline));
>
> void f(int i) {
>   if (i) printf("a\n");
> }
>
> void g(int i) {
>   if (!i) printf("b\n");
> }
>
> int main() {
>   int i;
>   f(i);
>   g(i);
> }
>
>
> - Kuba
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
On 24/11/2012, at 9:08 PM, Jakub Staszak wrote:

> Hello,
>
> I was wondering about the case below. I tried to find any information in C standard, but I found nothing.
> In this case, variable "i" is uninitialized, but it is the _same_ value passed as an argument, so only one of "a" or "b" should be printed.
>
> What I found is that with -O2:
> LLVM (trunk) prints both "a" and "b"
> GCC (4.2) prints both "a" and "b"
> GCC (trunk) prints "b" only.
>
> As I said, I don't know what standard says here.

Depends which Standard you read, and which part of it :)

ISO C99, 6.2.6.1/5:

  Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined.

ISO C99 says, in 6.5/5:

  "If an exceptional condition occurs during the evaluation of an expression (THAT IS, IF THE RESULT IS NOT MATHEMATICALLY DEFINED ... ) the behaviour is undefined." [Emphasis mine]

This suggests that the use of an unspecified value leads to undefined behaviour, in which case all the results you got are permitted.

Unfortunately the C Standard is not known for good wording or consistency. C99 attempted to over-constrain integer type representations (IMHO), and the specifications are themselves not well defined and therefore not normative. (This is what happens when the political driving forces don't know any mathematics and barely understand any CS.)

I haven't seen the C++11 wording on this.
Hopefully they didn't follow ISO C here.

--
john skaller
skaller at users.sourceforge.net
http://felix-lang.org
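As an aside on the 6.5/5 wording quoted above: that clause is usually read as covering arithmetic whose result does not exist, with division by zero as the textbook case. The sketch below shows one way to guard against it. The name `checked_div` and its bool-plus-out-parameter interface are hypothetical choices for illustration; nothing like it is proposed in this thread.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: divides num by den only when the quotient is
   well defined. Returns false and leaves *out untouched otherwise. */
bool checked_div(int num, int den, int *out) {
    if (den == 0) {
        return false;  /* the quotient is not mathematically defined */
    }
    if (num == INT_MIN && den == -1) {
        return false;  /* the quotient would not fit in an int */
    }
    *out = num / den;  /* C99 division truncates toward zero */
    return true;
}
```

The caller checks the return value before using `*out`, so no undefined operation is ever evaluated. Whether the same clause also covers merely reading an uninitialized `int` is exactly what is being debated here.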
On 11/24/2012 02:08 AM, Jakub Staszak wrote:

> Hello,
>
> I was wondering about the case below. I tried to find any information in C standard, but I found nothing.
> In this case, variable "i" is uninitialized, but it is the _same_ value passed as an argument, so only one of "a" or "b" should be printed.
>
> What I found is that with -O2:
> LLVM (trunk) prints both "a" and "b"
> GCC (4.2) prints both "a" and "b"
> GCC (trunk) prints "b" only.
>
> As I said, I don't know what standard says here.
>
>
> #include<stdio.h>
>
> void f(int i) __attribute__((noinline));
> void g(int i) __attribute__((noinline));
>
> void f(int i) {
>   if (i) printf("a\n");
> }
>
> void g(int i) {
>   if (!i) printf("b\n");
> }
>
> int main() {
>   int i;
>   f(i);
>   g(i);
> }

Passing an uninitialized value as a function argument is undefined behaviour on the spot, regardless of what the callee does (even if it never references that argument).

That aside, there is no way that 'i' has the same value, since it has no value. This:

  int i;
  printf("%d\n", i);
  printf("%d\n", i);

is allowed to print two different values (or, applying the above rule, format your hard drive). The way the standard defines the behaviour that you see when reading a value is by stating that it must have the value that was last stored in it. When no value was stored, there are no rules to apply about what you get each time you look at it, and there is no other guarantee of a consistent value anywhere in the standard. This absence of any guarantee permits your program to print both 'a' and 'b', or neither.

I should mention that the above is for C++, and I don't have a copy of any of the standards handy, but I expect the rules to be the same for C and C++ here.

Nick
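The "value that was last stored" rule described above can be illustrated with a small sketch in which a store does happen before the two reads. The helper name `reads_agree` is hypothetical and exists only for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: stores `value` into i first, then reads i twice.
   Because a store happened, both reads must yield that stored value. */
int reads_agree(int value) {
    int i = value;   /* the store that the uninitialized version lacks */
    int first = i;   /* first read */
    int second = i;  /* second read, with no store in between */
    printf("%d\n", first);
    printf("%d\n", second);
    return first == second;
}
```

With the initializer in place the two printed lines are always identical. Removing it takes away the only rule that tied the two reads together, which is the point being made about the original program.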
I think that the relevant part in C11 is section 6.2.6.1, which tells you that accessing a trap representation, _other than using a char type_, is undefined. Objects with automatic storage duration that don't have an initializer have an indeterminate value, which is either an unspecified value or a trap representation.

> What I found is that with -O2:
> LLVM (trunk) prints both "a" and "b"

I can't reproduce this with r168538. I only get "a".

Regards,
Patrik Hägglund

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Jakub Staszak
Sent: 24 November 2012 11:08
To: llvmdev at cs.uiuc.edu
Subject: [LLVMdev] Uninitialized variable - question
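The character-type exception mentioned above is what makes byte-wise inspection of an object legitimate. The sketch below reads an `int` through `unsigned char` lvalues and rebuilds it the same way. The names `dump_bytes` and `rebuild_int` are hypothetical, and the example deliberately uses an initialized `int`, so it only illustrates the access path and makes no claim about what the bytes of an uninitialized object would contain.

```c
#include <stddef.h>

/* Hypothetical helper: copies the object representation of *value into
   out[], one byte at a time, using unsigned char lvalues only.
   Returns the number of bytes written. */
size_t dump_bytes(const int *value, unsigned char *out) {
    const unsigned char *p = (const unsigned char *)value;
    for (size_t k = 0; k < sizeof *value; k++) {
        out[k] = p[k];
    }
    return sizeof *value;
}

/* Hypothetical helper: rebuilds an int from bytes produced by
   dump_bytes, again writing only through unsigned char lvalues. */
int rebuild_int(const unsigned char *bytes) {
    int result;
    unsigned char *q = (unsigned char *)&result;
    for (size_t k = 0; k < sizeof result; k++) {
        q[k] = bytes[k];
    }
    return result;
}
```

Round-tripping a value through `dump_bytes` and `rebuild_int` gives back the original `int` on any byte order, because the bytes are written back in the order they were read.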
On Sat, Nov 24, 2012 at 1:57 PM, Patrik Hägglund H <patrik.h.hagglund at ericsson.com> wrote:

> I think that the relevant part in C11 is section 6.2.6.1, which tells you that accessing a trap representation, _other than using a char type_, is undefined. Objects of automatic storage, which don't have an initializer are of indeterminate value, which either is an unspecified value or a trap representation.
>
>> What I found is that with -O2:
>> LLVM (trunk) prints both "a" and "b"
>
> I can't reproduce this with r168538. I only get "a".

I can reproduce if 'noinline' is dropped.

Dmitri

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
On 24/11/2012, at 10:21 PM, Nick Lewycky wrote:

> Passing an uninitialized value as a function argument is undefined behaviour on the spot, regardless of what the callee does (even if it never references that argument).

Cite reference? No? Then you're guessing ;)

> That aside, there is no way that 'i' has the same value, since it has no value.

This is definitely NOT correct in ISO C. It has an unspecified value, and in C99 that may be a "trap value".

You state the rules generically, but both C99 and C++ have special rules for unsigned char, where use of an uninitialised value is definitely not undefined.

> I should mention that the above is for C++, and I don't have a copy of any of the standards handy, but I expect the rules to be the same for C and C++ here.

It's VERY unwise to make such assumptions regarding conformance issues, since C and C++ have completely distinct conformance models. They also treat uninitialised variables distinctly: the rules in C++ were constructed independently of the ISO C rules, particularly as in C++ there are classes with constructors etc. to consider, and generalised rules covering such cases as well as scalars and aggregates are likely to be distinct and have different consequences in their details.

One must be aware that Standards are imperfect documents, and often a specification in one place is incomplete or even wrong unless some other place is also considered. You need to be a Standards guru to really know where to find all the relevant clauses. Even then, as I pointed out in my prior post on this topic, the Standard itself can be inconsistent, or fail to achieve a normative requirement despite the intent of the committee.

This is the case with the integer representation rules in C99: they look reasonable but are actually non-normative gibberish. However the rules do have an impact, and a very unfortunate one, in over-constraining integer representations.
In particular if, by specification of your vendor, you have a full two's complement representation of "int", all possible values of an uninitialised int variable are valid ints, and the behaviour of all mathematical operations and copying is then specified by the usual rules: it's undefined only if there is overflow, division by zero, or whatever.

On a 64 bit machine like x86_64 the usual representations of integers are full, and therefore copying and other operations are well defined (allowing for undefined behaviour on division by zero etc).

In particular:

  int x[2];
  int y[2];
  x[0]=y[0];
  x[1]=y[1];

is a perfectly valid way to copy a possibly incompletely filled array, provided int has a full representation.

Historically, C was a real mess, with the most traditional copying of arrays of chars aliasing other values being undefined. C++ did NOT follow C here. It invented its own, more consistent, set of rules. Not sure about C++11 though.

--
john skaller
skaller at users.sourceforge.net
http://felix-lang.org
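A commonly used alternative to the element-wise copy above is `memcpy`, which the C standard describes as copying characters rather than `int` values. The sketch below assumes, as is commonly understood, that this sidesteps the question of whether `int` has a full representation, because no `int` lvalue is ever read. The name `copy_ints` is a hypothetical wrapper used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `count` ints from src to dst as raw
   bytes. The two arrays must not overlap, as memcpy requires. */
void copy_ints(int *dst, const int *src, size_t count) {
    memcpy(dst, src, count * sizeof *src);
}
```

For the two-element arrays in the message above this would be called as `copy_ints(x, y, 2)`. Elements of `y` that were never written still carry no guaranteed value in `x` afterwards; the sketch only changes how the copy itself is performed.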