thr3ads.net - R devel - [Rd] Help with "row.names = as.integer(c(NA, 5))" in file from dput [Feb 2007]

Mike Prager

2007-Feb-28 19:01 UTC

[Rd] Help with "row.names = as.integer(c(NA, 5))" in file from dput

I am trying to understand why syntax used by dput() to write
rownames is valid (say, when read by dget()).  I ask this
because I desire to emulate its actions *reliably* in my For2R
routines, and I won't be comfortable until I understand what R
is doing.

Given data set "fred":
> fred
    id      var1
1 1991 0.4388587
2 1992 0.8772471
3 1993 0.6230486
4 1994 0.2340929
5 1995 0.5005605

we can try this--
> dput(ats, control = "all")
structure(list(id = c(1991, 1992, 1993, 1994, 1995), var1 = c(0.4388587,
0.8772471, 0.6230486, 0.2340929, 0.5005605)),
.Names = c("id", "var1"), row.names = as.integer(c(NA, 5)),
class = "data.frame")

In the above result, why is the following part valid?

row.names = as.integer(c(NA, 5))

given that the length of the RHS expression is 2, while the
needed length is 5.

Moreover, the following doesn't work:
> row.names(fred) <- as.integer(c(NA,5))
Error in `row.names<-.data.frame`(`*tmp*`, value = c(NA, 5)) : 
        invalid 'row.names' length

Is there any reason why the expression

c(NA,5) 

is better than the more natural

1:5 

here?

I will appreciate help from anyone with time to reply.

MHP

-- 
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.

Peter Dalgaard

2007-Feb-28 20:52 UTC

[Rd] Help with "row.names = as.integer(c(NA, 5))" in file from dput

Mike Prager wrote:
> I am trying to understand why syntax used by dput() to write
> rownames is valid (say, when read by dget()).  I ask this
> because I desire to emulate its actions *reliably* in my For2R
> routines, and I won't be comfortable until I understand what R
> is doing.
>
> Given data set "fred":
>
>> fred
>     id      var1
> 1 1991 0.4388587
> 2 1992 0.8772471
> 3 1993 0.6230486
> 4 1994 0.2340929
> 5 1995 0.5005605
>
> we can try this--
>
>> dput(ats, control = "all")
> structure(list(id = c(1991, 1992, 1993, 1994, 1995), var1 = c(0.4388587,
> 0.8772471, 0.6230486, 0.2340929, 0.5005605)),
> .Names = c("id", "var1"), row.names = as.integer(c(NA, 5)),
> class = "data.frame")
>
> In the above result, why is the following part valid?
>
> row.names = as.integer(c(NA, 5))
>
> given that the length of the RHS expression is 2, while the
> needed length is 5.
>
> Moreover, the following doesn't work:
>
>> row.names(fred) <- as.integer(c(NA,5))
> Error in `row.names<-.data.frame`(`*tmp*`, value = c(NA, 5)) : 
>         invalid 'row.names' length
>
> Is there any reason why the expression
>
> c(NA,5) 
>
> is better here than the more natural
>
> 1:5 
>
> here?
>

It's mainly a space-saving device. Originally, row.names was a character 
vector, but storage of character vectors is quite inefficient, so we now 
allow integer names and also a very short form where 1:n is stored just 
using the single value n. To distinguish the latter two, we use the 
c(NA, n) form, because row names are not allowed to be missing.
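
A minimal sketch of the compact form in action, assuming a current R installation. It rebuilds the data frame from the question with structure(); .row_names_info() is a base R helper, and the exact internal representation may differ between R versions:

```r
# Rebuild the data frame from the thread using the compact row-name form.
fred <- structure(
  list(id   = c(1991, 1992, 1993, 1994, 1995),
       var1 = c(0.4388587, 0.8772471, 0.6230486, 0.2340929, 0.5005605)),
  names     = c("id", "var1"),
  row.names = as.integer(c(NA, 5)),   # compact stand-in for 1:5
  class     = "data.frame")

nrow(fred)                  # 5 rows, although the attribute was given as length 2
row.names(fred)             # expanded on access to the names "1" through "5"
.row_names_info(fred, 2L)   # number of rows implied by the stored attribute

# The replacement function takes its value literally, so the same
# two-element vector is rejected for a five-row data frame.
res <- try(row.names(fred) <- as.integer(c(NA, 5)), silent = TRUE)
inherits(res, "try-error")
```

The compact form is therefore understood when the attribute is set directly, as structure() does, but not by the user-level row.names<- replacement function, which expects one name per row.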

Consider the following and notice how the string row names take up 
roughly 36 bytes per record, whereas the actual data take only 8 bytes 
per record.

 > d<-data.frame(x=rnorm(1000))
 > object.size(d)
[1] 8392
 > row.names(d)<-as.character(1:1000)
 > object.size(d)
[1] 44384
 > row.names(d)<-1000:1
 > object.size(d)
[1] 12384
 > row.names(d)<-NULL
 > object.size(d)
[1] 8392
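
For a program that writes dput-style files itself, here is a rough sketch suggesting that either spelling of the row names should read back the same way through dget(). The helper read_back() is hypothetical and exists only for this illustration; the data values are those from the question:

```r
# Hypothetical helper: write a dput-style expression to a temporary file
# with the given row.names expression, then read it back with dget().
read_back <- function(row_names_expr) {
  txt <- sprintf(
    'structure(list(id = c(1991, 1992, 1993, 1994, 1995),
       var1 = c(0.4388587, 0.8772471, 0.6230486, 0.2340929, 0.5005605)),
       row.names = %s, class = "data.frame")',
    row_names_expr)
  tmp <- tempfile(fileext = ".R")
  on.exit(unlink(tmp))      # remove the temporary file when done
  writeLines(txt, tmp)
  dget(tmp)
}

a <- read_back("as.integer(c(NA, 5))")   # compact form, as written by dput()
b <- read_back("1:5")                    # the "more natural" form

identical(row.names(a), row.names(b))    # same row names either way
identical(dim(a), dim(b))                # same shape either way
```

Under these assumptions, a writer that emits 1:n is as acceptable to dget() as one that emits the compact form; the compact form only saves space.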



> I will appreciate help from anyone with time to reply.
>
> MHP
>
>
