Dear all.

After much grief I have finally found the source of some weird
discrepancies in results generated using R. It turns out that this is
due to the way R handles multi-line expressions. Here is an example
with R version 2.8.1:

----------------------------------------------------
# R-script...

r_parse_error <- function ()
{
  a <- 1;
  b <- 1;
  c <- 1;
  d <- a + b + c;
  e <- a +
    b +
    c;
  f <- a
    + b
    + c;
  cat('a',a,"\n");
  cat('b',b,"\n");
  cat('c',c,"\n");
  cat('d',d,"\n");
  cat('e',e,"\n");
  cat('f',f,"\n");
}
----------------------------------------------------
> r_parse_error();
a 1
b 1
c 1
d 3
e 3
f 1
----------------------------------------------------

As far as I am concerned f should have the value 3.

This is causing me endless problems since case f is our house style
for breaking up expressions for readability. All our code will need to
be rechecked as a result. Is this behaviour a bug? If not, is it
possible to get R to generate a warning that several lines of an
expression are potentially being ignored, perhaps by turning on a
strict mode which requires the semi-colons?

Thank you,

Paul
This is a perfectly legal expression:

  f <- a
    + b
    + c;

Type it in at the console, and it will assign a to f and then print out
the values of b and c. When parsing, 'f <- a' is a complete expression.
You may be confused because you think that semicolons terminate an
expression; that is not the case in R. If you write 'f <- a +' and then
continue on the next line, R recognizes that the parsing of the
expression is not complete and will continue looking.

So it is not a bug; just a misunderstanding of what the syntax is and
how it works. There are similar questions when people type in the
following type of statements:

  if (1 == 1) print (TRUE)
  else print (FALSE)

At the console you get:

> if (1 == 1) print (TRUE)
[1] TRUE
> else print (FALSE)
Error: unexpected 'else' in "else"
No suitable frames for recover()
>

because the parsing of the 'if' is complete. Instead you should be doing:

> if (1 == 1) {print (TRUE)
+ } else {print (FALSE)}
[1] TRUE
>

so the parser knows that the initial 'if' is not complete on the single line.

On Fri, Mar 13, 2009 at 8:55 AM, Paul Suckling <paul.suckling at gmail.com> wrote:
> <...>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
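One way to check the point made in this reply is to ask the parser how many expressions it sees. The following is only a small sketch using base R's parse(); the variable names (src_leading, src_trailing) are made up for illustration.

```r
# Operator-leading layout from the original post:
# every line is already a complete expression.
src_leading <- "f <- a\n  + b\n  + c;"

# Operator-trailing layout: a trailing '+' leaves the
# expression incomplete, so the parser keeps reading.
src_trailing <- "e <- a +\n  b +\n  c;"

n_leading <- length(parse(text = src_leading))
n_trailing <- length(parse(text = src_trailing))

cat("operator-leading layout:", n_leading, "expression(s)\n")
cat("operator-trailing layout:", n_trailing, "expression(s)\n")
```

The operator-leading text comes back as three expressions and the operator-trailing text as one, which matches the f = 1 and e = 3 results in the original post.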
On Fri, 13 Mar 2009, Paul Suckling wrote:

> <...>
> As far as I am concerned f should have the value 3.

That is most unfortunate for you.

> Is this behaviour a bug?

No.

> If not, is it
> possible to get R to generate a warning that several lines of an
> expression are potentially being ignored, perhaps by turning on a
> strict mode which requires the semi-colons?

No. R is not ignoring several lines of an expression.

  f <- a
    + b
    + c;

is three perfectly legitimate expressions over three lines. R evaluates
  f <- a
then evaluates
  +b
then evaluates
  +c

For people who like semicolons, it's the same as if you had

  f <- a; +b; +c;

The semicolons are just an alternative to a newline, so a semicolon at
the end of a line is purely cosmetic.

Modifying the parser to require a semicolon to terminate a statement
would break essentially every piece of R code and documentation in
existence, so it's probably easier to change your house style.

You could fairly easily write a tool that parsed your scripts and
checked that all your expressions were either assignments or function
calls and that the top-level expressions did not include unary plus or
minus.

     -thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
tlumley at u.washington.edu     University of Washington, Seattle
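The checking tool suggested above could be sketched roughly as follows. This is an illustration only: find_stray_unary is a hypothetical helper, not an existing R function, and it goes slightly beyond the suggestion by also looking at statements inside { } blocks, since the example in this thread sits in a function body. It checks only for statements that are a bare unary plus or minus, so it would also flag a legitimate final -x in a braced block.

```r
# Hypothetical helper: collect statements that consist solely of a
# unary "+" or "-" call, at top level or directly inside a { } block.
find_stray_unary <- function(exprs) {
  hits <- list()

  is_stray <- function(e) {
    is.call(e) && length(e) == 2L &&
      (identical(e[[1L]], as.name("+")) || identical(e[[1L]], as.name("-")))
  }

  walk <- function(e, is_statement) {
    if (!is.call(e)) return(invisible(NULL))
    if (is_statement && is_stray(e)) hits[[length(hits) + 1L]] <<- e
    in_braces <- identical(e[[1L]], as.name("{"))
    for (i in seq_along(e)[-1L]) {
      child <- e[[i]]
      if (missing(child)) next   # empty argument, as in x[, 1]
      walk(child, is_statement = in_braces)
    }
    invisible(NULL)
  }

  for (e in as.list(exprs)) walk(e, is_statement = TRUE)
  hits
}

# A cut-down version of the script from the original post.
script <- paste(
  "r_parse_error <- function ()",
  "{",
  "  a <- 1; b <- 1; c <- 1;",
  "  f <- a",
  "    + b",
  "    + c;",
  "  f",
  "}",
  sep = "\n"
)

hits <- find_stray_unary(parse(text = script))
for (h in hits) cat("suspicious statement:", deparse(h), "\n")
```

Running it over the house-style code reports the +b and +c statements that were silently split off from the assignment.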
Paul Suckling wrote:
<...>
> ----------------------------------------------------
> # R-script...
>
> r_parse_error <- function ()
> {
<...>
>   f <- a
>     + b
>     + c;
> }
<...>
> f 1
> ----------------------------------------------------
>
> As far as I am concerned f should have the value 3.

as far as you intend, perhaps. note, the above snippet says:

    f <- a; +b; +c

not

    f <- a + b + c

> This is causing me endless problems since case f is our house style
> for breaking up expressions for readability. All our code will need to
> be rechecked as a result. Is this behaviour a bug?

clearly not. and it's hardly idiosyncratic to r. you'd have the same
behaviour in, e.g., python, though there you can explicitly demand that
the lines form a single statement by ending the first two with a
backslash. there have been similar discussions on mailing lists of a
number of programming/scripting languages.

> If not, is it
> possible to get R to generate a warning that several lines of an
> expression are potentially being ignored,

they're not ignored! you demand to compute +b and +c, and it's
certainly done. (i don't think r is smart enough to optimize these
away).

> perhaps by turning on a
> strict mode which requires the semi-colons?

that's an idea, but the proposed solution would have to be optional to
avoid annoying those who don't like semicolons in r code.

vQ
I get it. Thanks everyone for the feedback.

Now that I understand how it works, my comment would be that this
system is dangerous since it makes it difficult to read the code and
easy to make errors when typing it. I recognise that this is something
so fundamental that it is unlikely to be changed so I'll have to adapt
to it.

My feeling is that such confusion could be avoided however, by
introducing a line continuation character (or group of characters)
into R that could be used to indicate to the parser (and reader of the
code) that the expression continues onto the next line. Something like

  f <- a ...
    + b ...
    + c

Cheers,

Paul

2009/3/13 Paul Suckling <paul.suckling at gmail.com>:
> <...>

--
Nashi Power.
http://nashi.podzone.org/ Registered address: 7 Trescoe Gardens, Harrow, Middx., U.K.
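R has no continuation character of the kind proposed above, but wrapping the right-hand side in parentheses has a similar effect, because the parser cannot end an expression while a parenthesis is still open. A minimal sketch follows; the function names with_parens and without_parens are made up for illustration.

```r
# Operator-leading layout kept as one expression by the open parenthesis.
with_parens <- function(a, b, c) {
  f <- (a
        + b
        + c)
  f
}

# Same layout without parentheses: 'f <- a' is complete on its own,
# and '+ b' and '+ c' are evaluated as separate statements.
without_parens <- function(a, b, c) {
  f <- a
    + b
    + c
  f
}

cat("with parentheses:   ", with_parens(1, 1, 1), "\n")
cat("without parentheses:", without_parens(1, 1, 1), "\n")
```

This keeps the operator-leading house style readable at the cost of one pair of parentheses per broken expression.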
If all your code has semicolons you could write a program that puts
each statement on one line based on the semicolons; passing the result
through R will then reformat it in a standard way. See Rtidy.bat in
the batchfiles distribution for the reformatting part:

http://batchfiles.googlecode.com

On Fri, Mar 13, 2009 at 8:55 AM, Paul Suckling <paul.suckling at gmail.com> wrote:
> <...>
On 13-Mar-09 12:55:35, Paul Suckling wrote:
> <...>

The lines are not being ignored! In

  e <- a +
    b +
    c;

each line (until the last) is syntactically incomplete, so the R
parser continues on to the next line until the expression is
complete; and the ";" is irrelevant for this purpose. Unlike C,
but like (say) 'awk', the ";" in R serves to terminate an expression
when this is followed on the same line by another one, so it is
basically a separator.

In

  f <- a
    + b
    + c;

however, "f <- a" is complete, so the value of 'a' is assigned to f.
The line "+ b" would have sent the value of 'b' (the "+" being the
unary operator "+" which does not change anything) to the console
if it did not occur inside a function definition. As it is, although
"+ b" is evaluated, because it is inside the function no output
is produced. Similarly for "+ c;" (and, once again, the ";" is
irrelevant since a ";" at the end of a line does nothing -- unless
the line was syntactically incomplete at that point, in which case
";" as the expression terminator would trigger a syntax error
since an incomplete expression was being terminated).

So

  f <- a
    + b
    + c;

is not a multiline expression. It is three expressions on three
separate lines.

The only suggestion I can make is that you have to change your
"house style" -- it is at odds with the way the R parser works,
and is bound to cause "much grief".

Best wishes, and good luck!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 13-Mar-09                                       Time: 14:16:08
------------------------------ XFMail ------------------------------
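The role of ";" as a separator rather than a terminator, as described in this reply, can be checked with a short sketch using base R's parse(). The variable names (n_two, n_one, incomplete) are chosen only for this example.

```r
# Two expressions on one line, separated by ';'.
n_two <- length(parse(text = "a <- 1; b <- 2"))

# A ';' at the end of a line adds nothing: still a single expression.
n_one <- length(parse(text = "a <- 1;"))

# A ';' placed after a syntactically incomplete expression is an error.
incomplete <- tryCatch(
  {
    parse(text = "f <- a +;")
    "parsed"
  },
  error = function(e) "syntax error"
)

cat("expressions in 'a <- 1; b <- 2':", n_two, "\n")
cat("expressions in 'a <- 1;':", n_one, "\n")
cat("result for 'f <- a +;':", incomplete, "\n")
```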