hin-tak.leung at cimr.cam.ac.uk
2006-Oct-16 15:23 UTC
[Rd] Bugs with partial name matching during partial replacement (PR#9299)
This is rather interesting, but I don't think it is a bug - it is
just one of those things that "you are not supposed to do"... You are
assuming a certain evaluation order of the 4 "$" operators in
" D$ABC[D$M] = D$V[D$M] ", as in:

    temp1 <- D$M            # 2nd and 4th
    temp2 <- D$V[temp1]     # 3rd
    D$ABC[temp1] = temp2    # 1st

What R did was this:

    temp4 <- D$ABC          # make reference, expand to D$ABCD , 1st
    temp1 <- D$M            # 2nd, and 4th
    temp2 <- D$V[temp1]     # 3rd

    temp4[temp1] <- temp2   # oh dear, it looks as if we are
    D$ABC <- temp4          # trying to write to a reference,
                            # better make a copy instead

R is doing the 4 $'s roughly from left to right, if you have some idea
how R works inside. (I am not saying this behavior is a "good" thing,
but at least it is consistent.) Basically it is a very bad habit to
write code that depends on the evaluation order of operators at the same
precedence.

The difference in behavior between the two cases is probably due to
coercion (and also to how lazily R does make-a-reference versus "oops, you
seem to be trying to write to a reference so I had better copy it"), but
I'll leave you to think about the order in which R combines the
4 $'s and coerces between types...

c.f. this bit of C code:

    i =0;
    ++i = ++i + ++i;

What value do you think "i" should be?

amaliy1 at uic.edu wrote:
> Hello,
>
> First the version info:
> platform powerpc-apple-darwin8.6.0
> arch     powerpc
> os       darwin8.6.0
> system   powerpc, darwin8.6.0
> status
> major    2
> minor    3.1
> year     2006
> month    06
> day      01
> svn rev  38247
> language R
> version.string Version 2.3.1 (2006-06-01)
>
> I have encountered some unusual behavior when trying to create new
> columns in a data frame that have names that would generate a partial
> match with an existing column with a longer name. It is my
> understanding that replacement operations shouldn't have partial
> matching, but it is not clear to me whether this applies only when
> the named column exists and not for new assignments.
>
> The first example:
>
> > D = data.frame(M=c(T,T,F,F,F,T,F,T,F,F,T,T,T),V=I(sprintf("ZZ%02d", 1:13)),ABCD=13:1)
> > D
>        M    V ABCD
> 1   TRUE ZZ01   13
> 2   TRUE ZZ02   12
> 3  FALSE ZZ03   11
> 4  FALSE ZZ04   10
> 5  FALSE ZZ05    9
> 6   TRUE ZZ06    8
> 7  FALSE ZZ07    7
> 8   TRUE ZZ08    6
> 9  FALSE ZZ09    5
> 10 FALSE ZZ10    4
> 11  TRUE ZZ11    3
> 12  TRUE ZZ12    2
> 13  TRUE ZZ13    1
> > D$CBA[D$M] = D$V[D$M]
> > D
>        M    V ABCD  CBA
> 1   TRUE ZZ01   13 ZZ01
> 2   TRUE ZZ02   12 ZZ02
> 3  FALSE ZZ03   11 <NA>
> 4  FALSE ZZ04   10 <NA>
> 5  FALSE ZZ05    9 <NA>
> 6   TRUE ZZ06    8 ZZ06
> 7  FALSE ZZ07    7 <NA>
> 8   TRUE ZZ08    6 ZZ08
> 9  FALSE ZZ09    5 <NA>
> 10 FALSE ZZ10    4 <NA>
> 11  TRUE ZZ11    3 ZZ11
> 12  TRUE ZZ12    2 ZZ12
> 13  TRUE ZZ13    1 ZZ13
> > D$ABC[D$M] = D$V[D$M]
> > D
>        M    V ABCD  CBA  ABC
> 1   TRUE ZZ01   13 ZZ01 ZZ01
> 2   TRUE ZZ02   12 ZZ02 ZZ02
> 3  FALSE ZZ03   11 <NA>   11
> 4  FALSE ZZ04   10 <NA>   10
> 5  FALSE ZZ05    9 <NA>    9
> 6   TRUE ZZ06    8 ZZ06 ZZ06
> 7  FALSE ZZ07    7 <NA>    7
> 8   TRUE ZZ08    6 ZZ08 ZZ08
> 9  FALSE ZZ09    5 <NA>    5
> 10 FALSE ZZ10    4 <NA>    4
> 11  TRUE ZZ11    3 ZZ11 ZZ11
> 12  TRUE ZZ12    2 ZZ12 ZZ12
> 13  TRUE ZZ13    1 ZZ13 ZZ13
>
> I expected ABC to equal CBA with NA values in rows not assigned, but
> instead it appears that an extraction from D$ABCD and coercion to
> string is being performed in the process of creating D$ABC.
>
> Here is something I believe is definitely a bug:
>
> > D = data.frame(M=c(T,T,F,F,F,T,F,T,F,F,T,T,T),V=1:13,ABCD=13:1)
> > D
>        M  V ABCD
> 1   TRUE  1   13
> 2   TRUE  2   12
> 3  FALSE  3   11
> 4  FALSE  4   10
> 5  FALSE  5    9
> 6   TRUE  6    8
> 7  FALSE  7    7
> 8   TRUE  8    6
> 9  FALSE  9    5
> 10 FALSE 10    4
> 11  TRUE 11    3
> 12  TRUE 12    2
> 13  TRUE 13    1
> > D$CBA[D$M] = D$V[D$M]
> > D
>        M  V ABCD CBA
> 1   TRUE  1   13   1
> 2   TRUE  2   12   2
> 3  FALSE  3   11  NA
> 4  FALSE  4   10  NA
> 5  FALSE  5    9  NA
> 6   TRUE  6    8   6
> 7  FALSE  7    7  NA
> 8   TRUE  8    6   8
> 9  FALSE  9    5  NA
> 10 FALSE 10    4  NA
> 11  TRUE 11    3  11
> 12  TRUE 12    2  12
> 13  TRUE 13    1  13
> > D$ABC[D$M] = D$V[D$M]
> > D
>        M  V ABCD CBA ABC
> 1   TRUE  1    1   1   1
> 2   TRUE  2    2   2   2
> 3  FALSE  3   11  NA  11
> 4  FALSE  4   10  NA  10
> 5  FALSE  5    9  NA   9
> 6   TRUE  6    6   6   6
> 7  FALSE  7    7  NA   7
> 8   TRUE  8    8   8   8
> 9  FALSE  9    5  NA   5
> 10 FALSE 10    4  NA   4
> 11  TRUE 11   11  11  11
> 12  TRUE 12   12  12  12
> 13  TRUE 13   13  13  13
>
> ABC is created as before with values from ABCD in the unassigned
> rows, but ABCD is being modified as well. The only difference from
> the previous example is that V is now just a numeric column.
>
> Anil Maliyekkel
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
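
The extraction step described above as "expand to D$ABCD" can be seen in isolation. The following is only a minimal sketch, assuming a current R version and a hypothetical 3-row stand-in for the data frame in the report; it illustrates how "$" and "[[" differ on extraction and does not reproduce the R 2.3.1 internals discussed in the thread.

```r
# Hypothetical 3-row stand-in for the data frame in the report.
D <- data.frame(M = c(TRUE, TRUE, FALSE), V = 1:3, ABCD = 3:1)

# Extraction with `$` matches names partially: there is no column "ABC",
# so the unique completion "ABCD" is returned instead.
partial <- D$ABC

# `[[` matches names exactly by default, so the missing column gives NULL.
exact <- D[["ABC"]]

print(partial)
print(exact)
```

Because the left-hand side of D$ABC[D$M] = ... has to extract D$ABC before it can replace part of it, this difference in matching is what lets values from ABCD leak into the new column.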
Thomas Lumley
2006-Oct-16 19:54 UTC
[Rd] PR#9299:Re: Bugs with partial name matching during partial replacement (PR#9299)
On Mon, 16 Oct 2006, hin-tak.leung at cimr.cam.ac.uk wrote:

> This is rather interesting, but I don't think it is a bug - it is
> just things that "you are not supposed to do"

It was a bug. It has been fixed in R 2.4.0.

Unfortunately, since you didn't quote the PR# of the original bug in the
subject line you have just filed a new bug report for it.

    -thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
tlumley at u.washington.edu    University of Washington, Seattle
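
For code that has to behave the same regardless of R version, one way to sidestep the question is to avoid partial matching on the left-hand side altogether. This is a hedged sketch, not the fix that went into R 2.4.0: the 4-row data frame and the helper variable "rows" are illustrative assumptions, and the approach simply creates the new column explicitly before filling part of it.

```r
# Hypothetical 4-row stand-in for the data frame in the report.
D <- data.frame(M = c(TRUE, TRUE, FALSE, TRUE), V = 1:4, ABCD = 4:1)

rows <- D$M                   # evaluate the logical index once

# Create the new column explicitly, so the later extraction of "ABC"
# finds an exact match and never falls back to "ABCD".
D[["ABC"]] <- NA_integer_

# Fill only the selected rows; unselected rows keep NA.
D[["ABC"]][rows] <- D[["V"]][rows]

print(D)
```

With the column created first, the unassigned rows hold NA, as the original poster expected, and ABCD is left untouched.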