hpages at fhcrc.org
2007-Mar-27 03:48 UTC
[Rd] Unexpected result of as.character() and unlist() applied to a data frame
Hi,

> dd <- data.frame(A=c("b","c","a"), B=3:1)
> dd
  A B
1 b 3
2 c 2
3 a 1
> unlist(dd)
A1 A2 A3 B1 B2 B3
 2  3  1  3  2  1

Someone else might get something different. It all depends on the value of their 'stringsAsFactors' option:

> dd2 <- data.frame(A=c("b","c","a"), B=3:1, stringsAsFactors=FALSE)
> dd2
  A B
1 b 3
2 c 2
3 a 1
> unlist(dd2)
 A1  A2  A3  B1  B2  B3
"b" "c" "a" "3" "2" "1"

Same thing with as.character:

> as.character(dd)
[1] "c(2, 3, 1)" "c(3, 2, 1)"
> as.character(dd2)
[1] "c(\"b\", \"c\", \"a\")" "c(3, 2, 1)"

Bug or "feature"?

Note that as.character applied directly to dd$A doesn't have this "feature":

> as.character(dd$A)
[1] "b" "c" "a"
> as.character(dd2$A)
[1] "b" "c" "a"

Cheers,
H.
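A minimal R sketch of what is going on, assuming any recent R version. Both data frames are built with an explicit stringsAsFactors argument rather than relying on the session default (which changed to FALSE in R 4.0.0). It shows that unlist() returns the factor's integer codes, while applying as.character() column by column keeps the labels. The names `u1`, `u2` and `chr_cols` are illustrative only.

```r
# Build both variants explicitly so the result does not depend on
# the session's default for stringsAsFactors.
dd  <- data.frame(A = c("b", "c", "a"), B = 3:1, stringsAsFactors = TRUE)
dd2 <- data.frame(A = c("b", "c", "a"), B = 3:1, stringsAsFactors = FALSE)

# A factor stores integer codes into its sorted levels ("a", "b", "c"),
# so the values "b", "c", "a" are stored as 2, 3, 1.
print(levels(dd$A))
print(as.integer(dd$A))

# unlist() on a mix of a factor and an integer column drops the factor
# attributes and keeps only the integer codes.
u1 <- unlist(dd)
print(u1)

# With a character column, everything is coerced to character instead.
u2 <- unlist(dd2)
print(u2)

# Applying as.character() to each column (rather than to the data frame
# as a whole) converts factor labels, not codes. Result: a 3 x 2 matrix.
chr_cols <- vapply(dd, as.character, character(nrow(dd)))
print(chr_cols)
```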
Liaw, Andy
2007-Mar-27 14:03 UTC
[Rd] Unexpected result of as.character() and unlist() applied to a data frame
Given that the behavior is exactly as I expected it to be, I would call it a "feature" (and IMHO not a very special one). The two data frames are simply different (try str() on them: A in dd is a factor, while A in dd2 is character), so I don't know why you'd expect unlist() to give you the same answer for both.

Andy
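Following the suggestion to inspect the columns first, here is a small sketch, assuming a data frame that mixes factor and non-factor columns. It reports each column's class and turns only the factor columns into character before flattening, so that unlist() keeps the labels whatever stringsAsFactors was. `col_classes`, `dd_chr`, `is_fac` and `flat` are illustrative names.

```r
dd <- data.frame(A = c("b", "c", "a"), B = 3:1, stringsAsFactors = TRUE)

# str() prints the structure; vapply() collects the classes as a vector.
str(dd)
col_classes <- vapply(dd, function(x) class(x)[1], character(1))
print(col_classes)

# Convert only the factor columns to character, leaving the others untouched.
dd_chr <- dd
is_fac <- vapply(dd_chr, is.factor, logical(1))
dd_chr[is_fac] <- lapply(dd_chr[is_fac], as.character)

# unlist() now coerces to character using the labels, not the codes.
flat <- unlist(dd_chr)
print(flat)
```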
Martin Maechler
2007-Mar-27 15:25 UTC
[Rd] Unexpected result of as.character() and unlist() applied to a data frame
>>>>> "Herve" == Herve Pages <hpages at fhcrc.org>
>>>>>     on Mon, 26 Mar 2007 20:48:33 -0700 writes:

    Herve> Someone else might get something different. It all
    Herve> depends on the values of its 'stringsAsFactors' option:

yes, and I don't like that (last) fact either.
IMO, an option should never be allowed to influence such a basic function as data.frame().

I know I would have had time earlier to start discussing this, but for some (probably good) reasons, I didn't get to it at the time.
As Andy comments, everything is behaving as it should / as documented, including the 'stringsAsFactors' option; but personally, I would really like to consider changing the default for data.frame()'s stringsAsFactors back (as pre-R-2.4.0) to 'TRUE' instead of default.stringsAsFactors(), which is a smart version of getOption("stringsAsFactors").
I find it OK ("acceptable") if it influences read.table(), but feel differently about data.frame().

Martin
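The distinction drawn here between data.frame() and read.table() matters less if code never relies on the session default. A sketch of that defensive pattern, assuming the goal is reproducible column types: pass stringsAsFactors explicitly to both functions. The inline `text =` argument of read.table() stands in for a file, and `txt`, `t_fac`, `t_chr`, `d_fac`, `d_chr` are illustrative names.

```r
txt <- "A B
b 3
c 2
a 1"

# The same explicit argument controls both constructors.
t_fac <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
t_chr <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
d_fac <- data.frame(A = c("b", "c", "a"), B = 3:1, stringsAsFactors = TRUE)
d_chr <- data.frame(A = c("b", "c", "a"), B = 3:1, stringsAsFactors = FALSE)

# Column A is a factor in the *_fac versions and character in the *_chr ones.
str(t_fac)
str(t_chr)
```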
Heinz Tuechler
2007-Mar-28 11:16 UTC
[Rd] Unexpected result of as.character() and unlist() applied to a data frame
At 17:25 27.03.2007 +0200, Martin Maechler wrote:
>IMO, an option should never be allowed to influence such a basic
>function as data.frame().

Martin! I see the problem with options influencing "such a basic function as data.frame()", but in my view the difficulty starts earlier. In my understanding, data.frame() is _the_ basic way to store empirical source data in R, and I found the earlier default behaviour of converting character variables to factors problematic. If converting character variables to factors were only an internal process, invisible to the user, I would not mind, but putting a character variable into a data frame and getting a factor out of it is somewhat disturbing. A naive user like me was especially confused by the fact that I could read an SPSS file with spss.get (default: charfactor=FALSE) and get a character variable in the resulting data.frame, but when I put it into a different data.frame it was changed to a factor.
I would like a data.frame() function that behaves as a "data container", with rows (= cases) and columns (= variables), but without changing the mode/class of the objects.

Heinz
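The "data container" behaviour can be approximated with existing tools. A sketch, assuming the aim is only that columns keep the class of the vectors they were built from: wrapping a vector in I() protects it from conversion even when stringsAsFactors = TRUE, and passing stringsAsFactors = FALSE keeps every character input as character. `x`, `inputs`, `df_asis` and `df_keep` are illustrative names.

```r
x <- c("b", "c", "a")

# I() marks the vector "as is", so data.frame() leaves it alone
# (it gains the extra class "AsIs" but stays a character vector).
df_asis <- data.frame(A = I(x), B = 3:1, stringsAsFactors = TRUE)
print(class(df_asis$A))

# Alternatively, build from a named list and check that every column
# keeps the class of its input vector.
inputs  <- list(A = x, B = 3:1, C = c(TRUE, FALSE, TRUE))
df_keep <- do.call(data.frame, c(inputs, stringsAsFactors = FALSE))
print(identical(lapply(df_keep, class), lapply(inputs, class)))
```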