Mike Prager
2007-Feb-28 19:01 UTC
[Rd] Help with "row.names = as.integer(c(NA, 5))" in file from dput
I am trying to understand why syntax used by dput() to write rownames is valid (say, when read by dget()). I ask this because I desire to emulate its actions *reliably* in my For2R routines, and I won't be comfortable until I understand what R is doing.

Given data set "fred":

    > fred
        id      var1
    1 1991 0.4388587
    2 1992 0.8772471
    3 1993 0.6230486
    4 1994 0.2340929
    5 1995 0.5005605

we can try this--

    > dput(ats, control = "all")
    structure(list(id = c(1991, 1992, 1993, 1994, 1995), var1 =
    c(0.4388587, 0.8772471, 0.6230486, 0.2340929, 0.5005605)),
    .Names = c("id", "var1"), row.names = as.integer(c(NA, 5)),
    class = "data.frame")

In the above result, why is the following part valid?

    row.names = as.integer(c(NA, 5))

given that the length of the RHS expression is 2, while the needed length is 5.

Moreover, the following doesn't work:

    > row.names(fred) <- as.integer(c(NA,5))
    Error in `row.names<-.data.frame`(`*tmp*`, value = c(NA, 5)) :
      invalid 'row.names' length

Is there any reason why the expression

    c(NA,5)

is better here than the more natural

    1:5

here?

I will appreciate help from anyone with time to reply.

MHP

--
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.
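For readers who want to reproduce the puzzle, here is a minimal sketch. It assumes a reasonably recent R session; the object names n_rows, rn and res are illustrative only and do not appear in the post. It rebuilds the data frame from the dput() output above, then checks what the accessors report and whether the replacement function accepts the same two-element vector.

```r
# Rebuild the data frame exactly as dput() wrote it, including the
# two-element row.names attribute that prompted the question.
fred <- structure(
  list(id   = c(1991, 1992, 1993, 1994, 1995),
       var1 = c(0.4388587, 0.8772471, 0.6230486, 0.2340929, 0.5005605)),
  .Names    = c("id", "var1"),
  row.names = as.integer(c(NA, 5)),
  class     = "data.frame")

# The accessors see five rows, even though the attribute has length 2.
n_rows <- nrow(fred)
rn     <- row.names(fred)

# The replacement function validates the length against the number of
# rows, so the same vector is rejected there. The error is captured
# instead of stopping the script.
res <- tryCatch({
  row.names(fred) <- as.integer(c(NA, 5))
  "ok"
}, error = function(e) conditionMessage(e))

print(n_rows)
print(rn)
print(res)
```

The contrast suggests that the short form is handled when the attribute is set at a low level, as structure() does, while `row.names<-` treats its argument as an ordinary vector of names.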
Peter Dalgaard
2007-Feb-28 20:52 UTC
[Rd] Help with "row.names = as.integer(c(NA, 5))" in file from dput
Mike Prager wrote:
> I am trying to understand why syntax used by dput() to write
> rownames is valid (say, when read by dget()). I ask this
> because I desire to emulate its actions *reliably* in my For2R
> routines, and I won't be comfortable until I understand what R
> is doing.
>
> Given data set "fred":
>
>     > fred
>         id      var1
>     1 1991 0.4388587
>     2 1992 0.8772471
>     3 1993 0.6230486
>     4 1994 0.2340929
>     5 1995 0.5005605
>
> we can try this--
>
>     > dput(ats, control = "all")
>     structure(list(id = c(1991, 1992, 1993, 1994, 1995), var1 =
>     c(0.4388587, 0.8772471, 0.6230486, 0.2340929, 0.5005605)),
>     .Names = c("id", "var1"), row.names = as.integer(c(NA, 5)),
>     class = "data.frame")
>
> In the above result, why is the following part valid?
>
>     row.names = as.integer(c(NA, 5))
>
> given that the length of the RHS expression is 2, while the
> needed length is 5.
>
> Moreover, the following doesn't work:
>
>     > row.names(fred) <- as.integer(c(NA,5))
>     Error in `row.names<-.data.frame`(`*tmp*`, value = c(NA, 5)) :
>       invalid 'row.names' length
>
> Is there any reason why the expression
>
>     c(NA,5)
>
> is better here than the more natural
>
>     1:5
>
> here?

It's mainly a space-saving device. Originally, row.names was a character vector, but storage of character vectors is quite inefficient, so we now allow integer names and also a very short form where 1:n is stored just using the single value n. To distinguish the latter two, we use the c(NA, n) form, because row names are not allowed to be missing.

Consider the following and notice how the string row names take up roughly 36 bytes per record where the actual data are only 8 bytes per record.

    > d<-data.frame(x=rnorm(1000))
    > object.size(d)
    [1] 8392
    > row.names(d)<-as.character(1:1000)
    > object.size(d)
    [1] 44384
    > row.names(d)<-1000:1
    > object.size(d)
    [1] 12384
    > row.names(d)<-NULL
    > object.size(d)
    [1] 8392

> I will appreciate help from anyone with time to reply.
>
> MHP
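The three storage forms described above can be inspected directly. The sketch below is not part of the original reply. It assumes an R version that provides the internal helper .row_names_info() in base, where type 0L returns the stored attribute as-is and type 2L returns the implied number of rows. The names compact, n_implied, as_int and as_chr are illustrative.

```r
# Ten rows with automatic row names (no random data needed here).
d <- data.frame(x = numeric(10))

# Short form: the raw attribute is a length-2 integer whose first
# element is NA. Only the magnitude of the second element is checked
# below, since its sign is an internal detail.
compact   <- .row_names_info(d, 0L)
n_implied <- .row_names_info(d, 2L)

# Integer row names that are not 1:n are stored in full.
row.names(d) <- 10:1
as_int <- .row_names_info(d, 0L)

# Character row names are stored in full as strings.
row.names(d) <- as.character(1:10)
as_chr <- .row_names_info(d, 0L)

print(compact)
print(length(as_int))
print(typeof(as_chr))
```

Because a genuine row name can never be NA, a leading NA in a length-2 integer vector is unambiguous, which is why it can serve as the marker for the short form.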