dadrivr
2011-Sep-07 14:02 UTC
[R] Reshaping data from wide to tall format for multilevel modeling
Hi,

I'm trying to reshape my data set from wide to tall format for multilevel modeling. Unfortunately, the function I typically use (make.univ from the multilevel package) does not appear to work with unbalanced data frames, which is what I'm dealing with.

Below is an example of the columns of a data frame similar to what I'm working with:

    ID a1 a2 a4 b2 b3 b4 b5 b6

Below is what I want the columns to be after reshaping the data to long format:

    ID a b time

Here is an example data frame that I want to reshape:

    ID <- c(1,2,3)
    a1 <- c(NA, rnorm(2))
    a2 <- c(NA, rnorm(1), NA)
    a4 <- c(NA, rnorm(2))
    b2 <- c(rnorm(2), NA)
    b3 <- rnorm(3)
    b4 <- NA
    b5 <- rnorm(3)
    b6 <- rnorm(3)
    mydata <- as.data.frame(cbind(ID,a1,a2,a4,b2,b3,b4,b5,b6))

What is the best way to do this efficiently with MANY variables with widely differing time ranges? Note that I will have to manually enter the time for a given measurement because in the wide format, the time is in the variable names. By the way, I have a fairly large data set, with some variables occurring at 2 time points and other variables occurring at 20 time points.

Thanks for your help!
Jim Lemon
2011-Sep-08 11:36 UTC
[R] Reshaping data from wide to tall format for multilevel modeling
Hi dadrivr,

I think you can do what you want using the rep_n_stack function in the prettyR package. If you want a data frame at the end, you will have to pad out your input data frame so that every variable has a column for each time point. You'll get lots of NAs, but without them, you won't get a data frame.

    mydata$a3<-NA
    mydata$a5<-NA
    mydata$a6<-NA
    mydata$b1<-NA
    mydata

Now you have equal numbers of "a" and "b" columns.
Reshaping this into three columns is easy:

    rep_n_stack(mydata,
     to.stack=c("a1","a2","a3","a4","a5","a6",
      "b1","b2","b3","b4","b5","b6"),
     stack.names=c("ab","time"))

If you want the "a" and "b" columns separate, try this:

    rep_n_stack(mydata,
     to.stack=matrix(c(2,3,10,4,11,12,13,5,6,7,8,9),nrow=2,byrow=TRUE),
     stack.names=c("a","time","b","time"))

Currently you have to pass the column indices directly to get the correct order in the output. The first row of the matrix holds the positions of a1 to a6 and the second row those of b1 to b6; the indices look out of order because the padded columns were appended at the end of the data frame. I hadn't anticipated the missing column problem when I wrote the function.

Jim
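
With many variables, typing the padding columns and index matrix by hand gets tedious. The following is a minimal sketch in base R, not part of the original reply, that derives the missing columns from the column names and then reshapes with reshape(). It assumes every measurement column is named as a letter-only stem followed by an integer time (a1, b6, ...); the seed and the helper names (meas, stems, times, wanted, idx, long) are introduced here for illustration only.

```r
# Example data from the original post, with a seed added for reproducibility.
set.seed(1)
mydata <- data.frame(
  ID = 1:3,
  a1 = c(NA, rnorm(2)), a2 = c(NA, rnorm(1), NA), a4 = c(NA, rnorm(2)),
  b2 = c(rnorm(2), NA), b3 = rnorm(3), b4 = NA_real_,
  b5 = rnorm(3), b6 = rnorm(3)
)

# Split each measurement name into its stem ("a", "b") and its time.
# Assumption: names look like <letters><digits>.
meas  <- setdiff(names(mydata), "ID")
stems <- unique(sub("[0-9]+$", "", meas))
times <- sort(unique(as.integer(sub("^[^0-9]+", "", meas))))

# Pad: add an all-NA column for every stem/time combination that is absent.
# 'wanted' is ordered by time, then stem (a1, b1, a2, b2, ...).
wanted <- as.vector(t(outer(times, stems, function(tm, s) paste0(s, tm))))
for (nm in setdiff(wanted, names(mydata))) mydata[[nm]] <- NA_real_

# Column positions by name, one row per stem. This is the kind of index
# matrix the reply passes to rep_n_stack(to.stack = ...), computed rather
# than typed by hand.
idx <- t(sapply(stems, function(s) match(paste0(s, times), names(mydata))))

# Base R alternative: reshape() splits "a1" into v.name "a" and time 1
# when sep = "", giving the columns ID, time, a, b.
long <- reshape(mydata[c("ID", wanted)], direction = "long",
                idvar = "ID", varying = wanted, sep = "", timevar = "time")
long <- long[order(long$ID, long$time), ]
rownames(long) <- NULL

# Optionally drop rows where every measurement is missing.
long_obs <- long[rowSums(!is.na(long[stems])) > 0, ]
```

The time values come straight from the column names, so nothing has to be entered manually. Stems that themselves contain digits would need a different splitting rule.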