William Dunlap
2009-Jun-03 03:17 UTC
[Rd] reference counting bug related to break and next in loops
One of our R users here just showed me the following problem while investigating the return value of a while loop. I added some information on a similar bug in for loops. I think he was using 2.9.0, but I see the same problem on today's development version of 2.10.0 (svn 48703).

Should the semantics of while and for loops be changed slightly to avoid the memory buildup that fixing this to reflect the current docs would entail? S+'s loops return nothing useful - that change was made long ago to avoid memory buildup resulting from semantics akin to R's present semantics.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

--------------------Forwarded (and edited) message below--------------------

I think I have found another reference counting bug.

If you type in the following in R you get what I think is the wrong result.

> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break }; i <- i + 1; y}; q
 [1] 42 42 42 42 42 42 42 42  9 10

I had expected
 [1] 42 42 42 42 42 42 42  8  9 10
which is what you get if you add 0 to y in the last statement in the while loop:

> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break }; i <- i + 1; y + 0}; q
 [1] 42 42 42 42 42 42 42  8  9 10

Also,

> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break }; i<-i+1 ; if (i<=8&&i>3)next ; cat("Completing iteration", i, "\n"); y}; q
Completing iteration 2
Completing iteration 3
 [1] 42 42 42 42 42 42 42 42  9 10

but if the last statement in the while loop is y+0 instead of y I get the expected result:

> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break }; i<-i+1 ; if (i<=8&&i>3)next ; cat("Completing iteration", i, "\n"); y+0L}; q
Completing iteration 2
Completing iteration 3
 [1] 42 42  3  4  5  6  7  8  9 10

Some background to the problem: in R a while loop returns the value of the last iteration. However, there is an exception if an iteration is terminated by a break or a next. Then the value is the value of the previously completed iteration that did not execute a break or next. Thus, in an extreme case, the value of the while may be the value of the very first iteration even though it executed a million iterations.

To implement that correctly, one needs to keep a reference to the value of the last non-terminated iteration. It seems as if the current R implementation does that but does not increase the reference counter, which explains the odd behavior: the saved value is still the same object as y, so later assignments to y[i] change it too.

The for loop example is

> z<-{ tmp<-rep(pi,10);for(i in 1:10){ tmp[i]<-i^2;if(i==9)break ; if(i<9&&i>3)next ; tmp } }
> z
 [1]  1.000000  4.000000  9.000000 16.000000 25.000000 36.000000 49.000000
 [8] 64.000000 81.000000  3.141593
> z<-{ tmp<-rep(pi,10);for(i in 1:10){ tmp[i]<-i^2;if(i==9)break ; if(i<9&&i>3)next ; tmp+0 } }
> z
 [1] 1.000000 4.000000 9.000000 3.141593 3.141593 3.141593 3.141593 3.141593
 [9] 3.141593 3.141593

I can think of a couple of ways to solve this.

1. Increment the reference counter. This solves the bug but may have serious performance implications. In the while example above it needs to copy y in every iteration.

2. Change the semantics of while loops by getting rid of the exception described above. When a loop is terminated with a break the value of the loop would be NULL. Thus there is no need to keep a reference to the value of the last non-terminated iteration.

Any opinions?
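For readers who want to check the expected results quoted above without relying on the loop's own return value, here is a minimal sketch in R. It is not how R implements loops internally; it keeps the value of the last iteration that finished without break or next in an ordinary variable. The function name last_completed and its argument skip_with_next are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: rerun the loop from the report, but store the value of
# the last iteration that completed without break/next in `last`.
last_completed <- function(skip_with_next = FALSE) {
  i <- 1L
  y <- 1:10
  last <- NULL                      # no iteration has completed yet
  while (TRUE) {
    y[i] <- 42L
    if (i == 8L) break              # iteration 8 is cut short: `last` not updated
    i <- i + 1L
    # Optional second variant from the report: skip the rest of the body.
    if (skip_with_next && i <= 8L && i > 3L) next
    last <- y                       # snapshot; later changes to y do not alter it
  }
  last
}

last_completed()        # 42 seven times, then 8 9 10
last_completed(TRUE)    # 42 42, then 3 4 5 6 7 8 9 10
```

Because `last <- y` is an ordinary assignment, R's copy-on-modify rules protect the snapshot from the later `y[i] <- 42L`, which is the protection the report says the loop value was missing.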
Wacek Kusnierczyk
2009-Jun-03 08:47 UTC
[Rd] reference counting bug related to break and next in loops
William Dunlap wrote:
> I think I have found another reference counting bug.
>
> If you type in the following in R you get what I think is the wrong result.
>
> > i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break }; i <- i + 1; y}; q
>  [1] 42 42 42 42 42 42 42 42  9 10
>
> I had expected [1] 42 42 42 42 42 42 42 8 9 10 which is what you get
> if you add 0 to y in the last statement in the while loop:

a simplified example may help to get a clear picture:

    i = 1; y = 1:3;
    (while(TRUE) {
       y[i] = 0
       if (i == 2) break
       i = i + 1
       y })
    # 0 0 3

    i = 1; y = 1:3;
    (while(TRUE) {
       y[i] = 0
       if (i == 2) break
       i = i + 1
       y + 0 })
    # 0 2 3

the test on i is done after the assignment to y[i]. when the loop breaks, y is 0 0 3, and one might expect this to be the final result. it looks like the result is the value of y from the previous iteration, and it does not seem particularly intuitive to me. (using common sense, i mean; an informed expert on the copy-when-scared semantics may have a different opinion, but why should a casual user ever suspect such magic.) anyway, i'd rather expect NULL to be returned.

for the oracle, ?'while', says:

    "'for', 'while' and 'repeat' return the value of the last expression evaluated (or 'NULL' if none was), invisibly. [...] 'if' returns the value of the expression evaluated, or 'NULL' if none was. [...] 'break' and 'next' have value 'NULL', although it would be strange to look for a return value."

when i is 2, i == 2 is TRUE. hence, if (i == 2) break evaluates to break. break evaluates to NULL, breaks the loop, and the return value should be NULL. while it is, following the docs, strange to have q <- while(...) ... in the code, the result above is not compliant with the docs at all -- seems like a plain bug. there is no reason for while to return the value of y, be it 0 0 3 or 0 2 3.

one might naively suspect that it is the syntactically last expression in the body of while that provides the return value, but the docs explicitly say "the last expression evaluated". and indeed,

    (while (TRUE) { break; 'foo' })
    # NULL

however,

    i = FALSE
    (while (TRUE) { if (i) break; i = !i; i })
    # TRUE

which again reveals the bug. one could suspect that "the last expression evaluated" is actually the whole body of the while loop; so in the above, the value of { if (i) break; i = !i; i } should be returned, even if the loop breaks in the middle. hence, the result "should" be TRUE (or maybe FALSE?). however,

    (while (TRUE) { break; while(TRUE) { 'foo' } })
    # NULL

has no problem with returning NULL -- obviously, so to speak.

it seems to me that the bug is not in reference counting, but in that the while loop incorrectly returns the value of the *previous* iteration while executing a break, instead of the break's NULL. likewise,

    (for (i in 1:2) {
       if (i == 2) break
       i })
    # 1

instead of the specification-promised NULL.

> [...]
> A background to the problem is that in R a while-loop returns the value
> of the last iteration.

not according to the docs; the "last expression evaluated". specifically, not the value of the last non-break-broken iteration.

vQ
luke at stat.uiowa.edu
2009-Jun-10 20:05 UTC
[Rd] reference counting bug related to break and next in loops
Thanks for the report.

It turns out that a similar issue arises in while() loops without break/next being involved because the test expression is evaluated after the final body evaluation. After some discussion we decided it was simplest both for implementation and documentation to have the value of a loop expression always be NULL.

This is now implemented in R-devel.

luke

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
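The decision that a loop expression always has the value NULL can be checked with a short sketch, assuming an R version that includes this change (the thread places it in R-devel on the way to 2.10.0). The variable names v_while, v_for, v_repeat, k and n are hypothetical; the loop bodies are adapted from the examples earlier in the thread.

```r
# Assumption: R >= 2.10.0, where the value of a loop expression is always NULL.

# while loop that completes one iteration and then breaks
i <- FALSE
v_while <- while (TRUE) { if (i) break; i <- !i; i }

# for loop that breaks in its second iteration
v_for <- for (k in 1:2) { if (k == 2) break; k }

# repeat loop that breaks in its third iteration
n <- 0L
v_repeat <- repeat { n <- n + 1L; if (n >= 3L) break; n }

# None of the loops hands back a value from an earlier iteration.
stopifnot(is.null(v_while), is.null(v_for), is.null(v_repeat))
```

Code that relied on the old behavior would need to assign the value it wants to a variable inside the loop body and read that variable after the loop.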