Hi all,

I'm a new R user and am confused about how R behaves when converting a vector to a data frame with the data.frame function. I'm specifically interested in cases where the vector is expressed as a subset of another data frame. For example, say I want to create a data frame from the last three rows of the third column of the data frame, d, that I've created below:

    a <- 1:10
    b <- 11:20
    c <- 21:30
    d <- data.frame(a, b, c)

To do that, I know that I could do:

    e <- d[8:10, "c"]
    f <- data.frame(e)

However, I would like the single column in the data frame, f, to be named "c". Obviously, I could just use the vector, c <- d[8:10, "c"], in place of the vector e. However, I wonder why I can't do:

    g <- data.frame(d[8:10, "c"])

This expression returns the proper values, but the resulting variable is named "d.8.10...c.." and not "c" as I expected.

Could someone explain the mechanics of this statement and tell me why it produced such an oddly named variable? I'm especially confused as to why I get the result I expect if I use the data.frame function on multiple columns, as in:

    g2 <- data.frame(d[8:10, c("b", "c")])

which produces a data frame with columns named "b" and "c".

Many thanks in advance,
Alec
Joshua Wiley
2011-Jan-24 00:22 UTC
[R] How does the data.frame function generate column names?
Hi,

Welcome to R! What you have run into is a feature of how subsetting works. By default, "[" drops its result to the lowest possible dimension, so selecting a single column returns a plain vector, and a vector carries no column name. data.frame() names each column after whatever was passed to it, unless the argument is named or already has column names. The odd name you see, "d.8.10...c..", is therefore an attempt to convert the expression d[8:10, "c"] into a valid name. R does this approximately by converting disallowed characters (like ":") into periods (.).

Here is some code (you should be able to copy and paste), with comments that explain a bit further and hopefully give you a better feel for indexing and creating data frame objects.

Cheers,

Josh

################################################
## your data (in one step)
d <- data.frame(a = 1:10, b = 11:20, c = 21:30)

## because only one column of 'd' is selected, the result is
## dropped to the lowest possible dimension (a plain vector),
## which loses its column name, so use drop = FALSE
f <- data.frame(d[8:10, "c", drop = FALSE])

## another option is to explicitly name the column
g <- data.frame(c = d[8:10, "c"])

## here you have selected two columns, so the result must
## keep two dimensions, and the names are kept
g2 <- data.frame(d[8:10, c("b", "c")])

## to "see" what is happening
d[8:10, "c", drop = FALSE]
d[8:10, "c", drop = TRUE] # default

## for more details, see the documentation
?"[" # see the "drop" argument description
?data.frame # under the "value" section on names
################################################

On Sun, Jan 23, 2011 at 1:53 PM, H Roark <hrbuilder at hotmail.com> wrote:
> Hi all,
>
> I'm a new R user and am confused about how R behaves when converting a vector to a data frame when using the data.frame function.
> [...]

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
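
To make the naming step described in the reply concrete, here is a minimal sketch that reproduces the odd name by hand. It assumes that data.frame() behaves roughly like "deparse the unevaluated argument, then pass the text through make.names()"; that is an approximation for illustration, not the function's actual source. The variable names arg_text, by_hand and renamed are hypothetical.

```r
## same data as in the thread
d <- data.frame(a = 1:10, b = 11:20, c = 21:30)

## step 1: the unevaluated argument, turned into text
arg_text <- deparse(quote(d[8:10, "c"]))

## step 2: characters that are not allowed in a name become periods
by_hand <- make.names(arg_text)
by_hand                                  # "d.8.10...c.."

## data.frame() arrives at the same name for an unnamed vector argument
names(data.frame(d[8:10, "c"]))          # "d.8.10...c.."

## the default drop = TRUE returns a plain vector (no column name) ...
is.data.frame(d[8:10, "c"])                  # FALSE
## ... while drop = FALSE returns a one-column data frame named "c"
names(d[8:10, "c", drop = FALSE])            # "c"

## a third option, beyond the two in the reply: rename afterwards
renamed <- setNames(data.frame(d[8:10, "c"]), "c")
names(renamed)                               # "c"
```

Naming the argument, as in data.frame(c = d[8:10, "c"]), or using drop = FALSE is usually clearer than renaming afterwards, because the intended name is visible where the data frame is created.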