Angel Rodriguez
2014-Aug-29 08:53 UTC
[R] Unexpected behavior when giving a value to a new variable based on the value of another variable
Dear subscribers,

I've found that if there is a variable in the data frame with a name very similar to a new variable, R does not give the correct values to this latter variable based on the values of a third variable:

> M <- structure(list(V1 = c(67, 62, 74, 61, 60, 55, 60, 59, 58)), .Names = c("age"), row.names = c(NA, -9L),
+ class = "data.frame")
> M$sample[M$age >= 65] <- 1
> M
  age sample
1  67      1
2  62     NA
3  74      1
4  61     NA
5  60     NA
6  55     NA
7  60     NA
8  59     NA
9  58     NA
> N <- structure(list(V1 = c(67, 62, 74, 61, 60, 55, 60, 59, 58), V2 = c(NA, 1, 1, 1, 1, 1, 1, 1, NA)),
+ .Names = c("age", "samplem"), row.names = c(NA, -9L), class = "data.frame")
> N$sample[N$age >= 65] <- 1
> N
  age samplem sample
1  67      NA      1
2  62       1      1
3  74       1      1
4  61       1      1
5  60       1      1
6  55       1      1
7  60       1      1
8  59       1      1
9  58      NA     NA

Any clue for this behavior?

My specifications:

R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252  LC_MONETARY=Spanish_Spain.1252
[4] LC_NUMERIC=C  LC_TIME=Spanish_Spain.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] foreign_0.8-61

loaded via a namespace (and not attached):
[1] tools_3.1.1

Thank you very much.

Angel Rodriguez-Laso
Research project manager
Matia Instituto Gerontologico
jim holtman
2014-Aug-29 12:45 UTC
[R] Unexpected behavior when giving a value to a new variable based on the value of another variable
You are being bitten by the "partial matching" of the "$" operator (see ?"$" for a better explanation). Here is a solution that works:

**original**
> N <- structure(list(V1 = c(67, 62, 74, 61, 60, 55, 60, 59, 58), V2 = c(NA, 1, 1, 1, 1, 1, 1, 1, NA)),
+ .Names = c("age", "samplem"), row.names = c(NA, -9L), class = "data.frame")
> N$sample[N$age >= 65] <- 1
> N
  age samplem sample
1  67      NA      1
2  62       1      1
3  74       1      1
4  61       1      1
5  60       1      1
6  55       1      1
7  60       1      1
8  59       1      1
9  58      NA     NA
>
>
> N <- structure(list(V1 = c(67, 62, 74, 61, 60, 55, 60, 59, 58), V2 = c(NA, 1, 1, 1, 1, 1, 1, 1, NA)),
+ .Names = c("age", "samplem"), row.names = c(NA, -9L), class = "data.frame")
> N[["sample"]][N$age >= 65] <- 1  # use the '[[' operation for complete matching
> N
  age samplem sample
1  67      NA      1
2  62       1     NA
3  74       1      1
4  61       1     NA
5  60       1     NA
6  55       1     NA
7  60       1     NA
8  59       1     NA
9  58      NA     NA

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
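As a sketch of why the two forms above differ, assuming only base R: an indexed assignment such as N$sample[i] <- 1 behaves roughly like "extract, modify the copy, assign back". The extraction step is where the partial match to "samplem" happens, while the assign-back step creates a column named exactly "sample". The names tmp, i, and N2 below are illustrative only.

```r
# Rebuild the data frame from the thread.
N <- data.frame(age = c(67, 62, 74, 61, 60, 55, 60, 59, 58),
                samplem = c(NA, 1, 1, 1, 1, 1, 1, 1, NA))
i <- N$age >= 65

# Step 1: extraction with `$` partially matches "samplem".
tmp <- N$sample
# Step 2: the selected elements of the extracted copy are overwritten.
tmp[i] <- 1
# Step 3: assignment with `$<-` does not partially match, so a new
# column called "sample" is created from the modified copy.
N$sample <- tmp
print(N)

# `[[` matches names exactly by default, so the extraction yields NULL
# and the new column is built from nothing.
N2 <- N[c("age", "samplem")]
N2[["sample"]][i] <- 1
print(N2)
```

The first printout should reproduce the surprising result from the original post, and the second the corrected one.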
John McKown
2014-Aug-29 12:46 UTC
[R] Unexpected behavior when giving a value to a new variable based on the value of another variable
On Fri, Aug 29, 2014 at 3:53 AM, Angel Rodriguez <angel.rodriguez at matiainstituto.net> wrote:
>
> Dear subscribers,
>
> I've found that if there is a variable in the dataframe with a name very similar to a new variable, R does not give the correct values to this latter variable based on the values of a third value:
>
<snip>
>
> Any clue for this behavior?
>
<snip>
>
> Thank you very much.
>
> Angel Rodriguez-Laso
> Research project manager
> Matia Instituto Gerontologico

That is unusual, but appears to be documented in a section from ?`[`

<quote>
Character indices

Character indices can in some circumstances be partially matched (see pmatch) to the names or dimnames of the object being subsetted (but never for subassignment). Unlike S (Becker et al p. 358), R never uses partial matching when extracting by [, and partial matching is not by default used by [[ (see argument exact). Thus the default behaviour is to use partial matching only when extracting from recursive objects (except environments) by $. Even in that case, warnings can be switched on by options(warnPartialMatchDollar = TRUE).

Neither empty ("") nor NA indices match any names, not even empty nor missing names. If any object has no names or appropriate dimnames, they are taken as all "" and so match nothing.
</quote>

Note the comment about "partial matching" in the middle paragraph in the quote above.

--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown
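The rules quoted above can be checked with a small sketch, assuming only base R. It compares extraction by `$`, by `[[` with its default exact matching, and by `[[` with exact = FALSE, and then switches on the warnPartialMatchDollar option mentioned in the help text. The three-row data frame and the variable names are illustrative, not from the thread.

```r
N <- data.frame(age = c(67, 62, 74),
                samplem = c(NA, 1, 1))

# `$` falls back to a unique partial match on a list or data frame.
by_dollar <- N$sample

# `[[` matches exactly by default; exact = FALSE opts in to partial matching.
by_exact   <- N[["sample"]]
by_partial <- N[["sample", exact = FALSE]]

print(by_dollar)
print(by_exact)
print(by_partial)

# With this option set, a partial match through `$` raises a warning.
old <- options(warnPartialMatchDollar = TRUE)
msg <- tryCatch(N$sample, warning = function(w) conditionMessage(w))
options(old)  # restore the previous setting
print(msg)
```

Setting the option in a startup file is one way to have this kind of accidental match reported instead of passing silently.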
Jeff Newmiller
2014-Aug-29 13:33 UTC
[R] Unexpected behavior when giving a value to a new variable based on the value of another variable
One clue is the help file for "$"...

?"$"

In particular, see there the discussion of character indices and the "exact" argument.

You can also find this discussed in the Introduction to R document that comes with the software.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
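One possible way to keep using `$` without running into the partial match, sketched here under the assumption that the new column should be NA wherever the condition is false: create the column explicitly first. Once a column named exactly "sample" exists, `$` finds the exact match and never falls back to "samplem".

```r
N <- data.frame(age = c(67, 62, 74, 61, 60, 55, 60, 59, 58),
                samplem = c(NA, 1, 1, 1, 1, 1, 1, 1, NA))

# Create the column explicitly so that later `$` extractions find an
# exact match and do not fall back to "samplem".
N$sample <- NA_real_
N$sample[N$age >= 65] <- 1
print(N)
```

This is only one option; the `[[` form shown earlier in the thread achieves the same result without the initialisation step.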