When "should" I use a dataframe vs. a matrix? What are the pros and cons? If I have data of all the same type, am I usually better off using a matrix and not a dataframe? What are the advantages, if any, of using a dataframe vs. a matrix? (Row names and column names, perhaps?)
Anika, these are good questions, and many on the list could expatiate on them. These erudite people are also busy, however, and that is why the R-help posting guide suggests that you study an introductory book on R before asking general questions.

(In reply to Anika Masters, 27-Jun-13, 11:26 AM.)

Don McKenzie, Research Ecologist
Pacific Wildland Fire Sciences Lab
US Forest Service
phone: 206-732-7824
Affiliate Professor
School of Environmental and Forest Sciences
University of Washington
Hi,

A matrix can hold only one type, so converting a data.frame with mixed column types to a matrix coerces every column to a common type (here character):

set.seed(24)
dat1 <- data.frame(X = sample(letters, 20, replace = TRUE), Y = sample(1:40, 20, replace = TRUE), stringsAsFactors = FALSE)
mat1 <- as.matrix(dat1)
sapply(dat1, class)
#          X           Y
#"character"   "integer"
sapply(split(mat1, col(mat1)), class)
#          1           2
#"character" "character"
str(as.data.frame(mat1))
#'data.frame':   20 obs. of  2 variables:
# $ X: Factor w/ 14 levels "b","d","f","g",..: 5 3 11 8 10 14 5 12 13 4 ...
# $ Y: Factor w/ 14 levels "10","13","15",..: 12 5 9 13 14 8 12 6 7 4 ...

(This output comes from the R version current in 2013. Since R 3.6.0, sample() gives different draws for the same seed, and since R 4.0.0, as.data.frame() no longer converts character columns to factors by default, so both columns would come back as character rather than factor.)

If all your data are of the same type, a matrix will usually be faster than a data.frame. Part of the difference is that rowSums() has to convert a data.frame to a matrix internally before summing:

set.seed(245)
mat2 <- matrix(sample(1:50, 3*1e7, replace = TRUE), ncol = 3)
dat2 <- as.data.frame(mat2)
system.time(res1 <- rowSums(mat2))
#   user  system elapsed
#  0.132   0.016   0.201
system.time(res2 <- rowSums(dat2))
#   user  system elapsed
#  0.376   0.056   0.447
identical(res1, res2)
#[1] TRUE

A.K.

----- Original Message -----
From: Anika Masters <anika.masters at gmail.com>
To: R help <r-help at r-project.org>
Sent: Thursday, June 27, 2013 2:26 PM
Subject: [R] when to use & pros/cons of dataframe vs. matrix?
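A minimal sketch of why the column types collapse: a data.frame is a list of equal-length columns, each keeping its own type, while a matrix is a single atomic vector with a dim attribute, so it can hold only one type. The object names df_mixed, m_mixed and m_num are hypothetical, chosen only for illustration.

```r
# A data.frame is a list of equal-length columns; each column keeps its own type.
df_mixed <- data.frame(X = c("a", "b", "c"), Y = 1:3, stringsAsFactors = FALSE)
is.list(df_mixed)            # TRUE
sapply(df_mixed, typeof)     # X = "character", Y = "integer"

# A matrix is one atomic vector plus a dim attribute, so it has exactly one type.
m_mixed <- as.matrix(df_mixed)
typeof(m_mixed)              # "character": the integer column Y was coerced
dim(m_mixed)                 # 3 2

# With data of a single numeric type, nothing is lost by using a matrix.
m_num <- matrix(1:6, ncol = 2)
typeof(m_num)                # "integer"
attributes(m_num)            # only $dim
```

Both structures support column names (and row names), so naming alone is not a reason to prefer one over the other; the deciding factor is whether the columns need different types.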
Hello,

Arun's answer shows that matrices are faster, so if your data are all of the same type, that may be a point in favor of matrices.

data.frames are better for modeling, because they work with the formula interface of the many modeling functions. For instance, the example below is _not_ possible with a matrix:

set.seed(1234)
dat <- data.frame(x = rnorm(100), A = sample(letters[1:4], 100, TRUE), y = rnorm(100))
model <- lm(y ~ x + A, data = dat)  # not possible with a matrix
# predict.lm() expects a data.frame as newdata
newdat <- data.frame(x = c(1, 3, 4), A = rep("a", 3))
predict(model, newdata = newdat)

There are many other examples like this one. If you are doing data modeling, use data frames.

Hope this helps,

Rui Barradas

(In reply to Anika Masters, 27-06-2013 19:26.)
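A small sketch of the modeling point, using hypothetical data (m, d, fit, fit2 are illustrative names). In current R versions, passing a plain matrix as the data argument of lm() raises an error from model.frame(); converting it with as.data.frame() makes the formula interface work, and a data.frame can also carry a factor column next to numeric ones, which a numeric matrix cannot.

```r
set.seed(1234)
# Hypothetical numeric-only data stored as a matrix with column names.
m <- cbind(x = rnorm(20), y = rnorm(20))

# A plain matrix is rejected as 'data' by the formula interface.
fit_try <- tryCatch(lm(y ~ x, data = m), error = function(e) e)
inherits(fit_try, "error")   # TRUE

# Converting to a data.frame makes the same call work.
fit <- lm(y ~ x, data = as.data.frame(m))
coef(fit)

# A data.frame can mix a factor with numeric columns; lm() expands the
# factor into dummy variables (treatment contrasts by default).
d <- data.frame(x = m[, "x"], y = m[, "y"], A = factor(rep(c("a", "b"), 10)))
fit2 <- lm(y ~ x + A, data = d)
names(coef(fit2))            # "(Intercept)" "x" "Ab"
```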